# code chunk for R
Midterm
The United States Geological Survey (USGS) continuously monitors earth quakes and makes the information available to the public in a data feed. The dataset all-month.csv included in this repository was pulled from USGS on Monday Mar 3, 2025. It contains all worldwide earth quakes for the last 30 days.
An detailed description of all the variables in the all_month
data (and more) is available from in the ComCat Documentation
For all of the following questions, provide answers and include your code. You can work in R or in python or in a mix of the two. Make sure that the final document you submit renders without errors. In the rendered document avoid any lengthy output.
- Load the data into the object
eq
using R or python. Make sure that you are working with an object with 9815 rows and 22 columns.
# python code chunk
- Read the ComCat Documentation for the variables
time
,latitude
,longitude
,depth
,mag
,magType
,nst
,id
, andplace
. Ensure that the format of the variables matches the description.
If you work on R, use the lubridate
package to convert the variable time
to a time object. If you work in python, use the datetime
package.
# R chunk
library(lubridate)
# python chunk
import datetime
- Include each of the variables
time
,latitude
,longitude
,depth
,mag
,magType
, andnst
in a data chart. Summarise the main finding(s) in each chart in 1-2 sentences.
- What is the time frame under consideration? How many earthquakes were there (show that the number of earthquakes is equal to the number of rows in the dataset)? When and where was the strongest earthquake? What was its magnitude?
depth
is an estimate (in kilometers) of the depth in which an earthquake begins to rupture. The documentation suggests that in some cases default depths are used (when there is no further information available). Based on chart(s), argue which values are used for default depths and where?
- The variable
nst
contains the information on the number of seismic stations used to determine an earthquake’s location.
- What is the average number of seismic stations used to determine the location of an earthquake?
- What is the minimum number of stations?
- 1,592 earthquakes do not have a recorded number of stations. What else do these earthquakes have in common?
- By introducing a suitable grouping variable, create a data set
daily_summary
that contains the daily summary values of the following statistics:
- the total number of earthquakes
- the number of earthquakes with a magnitude of 4 or higher
- the average magnitude of earthquakes
- the location, place and magnitude of the strongest earthquake
Based on the values in daily_summary
discuss the statements: On average, there are about 50 earthquakes globally with a magnitude of at least 4, days with more than 750 seismic events are not uncommon.
“The Ring of fire” is a horseshoe-shaped belt along the rim of the Pacific (tectonic) Plate reaching from New Zealand to the tip of South America. The Ring of Fire contains hundreds of active and dormant volcanoes and is the location of a lot of seismic activity.
Introduce a transformation of the longitude variable
longitude_rof
to the data set that allows you to visualize the horseshoe shape of the ring of fire. (draw that chart)
Update the data set: write a function
update_eq
with argumentsdata
,url
andverbose
that downloads the most recent 30 days of earthquake activity and incorporates that information into the provided data setdata
. Make sure to include each earthquake only once (in case there are multiple, choose the most recent update given byupdated
). Ifverbose
is set to true, provide a summary of the changes made to the data set of the form (replace the stars by the corresponding numbers):number of earthquakes added: ***
number of earthquakes updated: ***
Call your function with the eq
object set as the data, a url
of https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv in the verbose mode. Save the resulting data set as all_month_updated.csv
. Include the function, the call to your function and the updated file in your repository.