Update: The 6.0 Napa earthquake occured a week after I originally published this post. I re-ran the code and analysis to include the latest activity.

I decided to do a bit of data exploration around major California earthquakes in the last 30 years. Specifically, I will look at earthquakes within 200 km of San Francisco and Los Angeles between August 1984 and August 2014.

USGS has a very accessible Earthquake API and online query form. I wrote a simple script in Python3 which queries the API and downloads the result.

Data preparation

Fetching the data

To pull down the data, use usgs.APIquery with the desired parameters. All parameters are defined here.

import usgs

# obtain data for the greater San Francisco area, start of 1984 through August 24, 2014 in PST
usgs.APIquery(starttime = "1984-01-01T07:00:00", endtime = "2014-08-25T07:00:00",
              minmagnitude = "1.5",
              latitude = "37.77", longitude = "-122.44",
              minradiuskm = "0", maxradiuskm = "200",
              reviewstatus = "reviewed",
              filename = "usgsQuery_SF_84-14.csv",
              format = "csv")

# obtain data for the greater Los Angeles area, start of 1984 through August 24, 2014 in PST
usgs.APIquery(starttime = "1984-01-01T07:00:00", endtime = "2014-08-25T07:00:00",
              minmagnitude = "1.5",
              latitude = "34.05", longitude = "-118.26",
              minradiuskm = "0", maxradiuskm = "200",
              reviewstatus = "reviewed",
              filename = "usgsQuery_LA_84-14.csv",
              format = "csv")

Switching over to use R, I then read the CSVs into dataframes. For convenience, I define an area label and combine the dataframes into one.

SFquakes = read.csv("usgsQuery_SF_84-14.csv", header = TRUE, stringsAsFactors = FALSE)
LAquakes = read.csv("usgsQuery_LA_84-14.csv", header = TRUE, stringsAsFactors = FALSE)
SFquakes$area = "SF"
LAquakes$area = "LA"
quakes = rbind(SFquakes, LAquakes)

Data scrubbing and binning

# remove columns with all NA values
quakes = quakes[,-7:-10]

# create column with time object
quakes$ptime = as.POSIXlt(strptime(quakes$time, "%Y-%m-%dT%T"))

# remove instances of mwb, an alternative form of magnitude measurement
quakes = quakes[-which(quakes$magType == "mwb"),]

# create columns with year bins
quakes$yearbins = strftime(cut(quakes$ptime, "year", right = F), "%Y")

# keep one significant figure after the decimal point for magnitude
quakes$mag = round(quakes$mag, 1)

# create columns with half and whole magnitude bins
quakes$Mhalfbins = cut(quakes$mag, seq(2.0,7.5,.5), right = F)
quakes$Mwholebins = cut(quakes$mag, seq(2.0,8.0,1), right = F)

# create a column with varied magnitude bins, necessary if plotting without a log scale
quakes$Mvariedbins = cut(quakes$mag, c(2.0,2.5,3.0,3.5,4.0,5.0,7.5), right = F)

Magnitude-energy conversion

One way to comprehend the destructive force of earthquakes is to convert from earthquake magnitude to a more imaginative unit of energy: the ton of TNT. To do this conversion, we need to be able to convert from magnitude to energy, perhaps with the help of Beno and Charles. Also, it is useful to know that one ton of TNT is equivalent to about 4.184e9 Joules.

The Gutenberg and Richter energy-magnitude relation (in Joules):

E[M] = 10^(1.5M + 4.8)

quakes$Ejoules = 10^(1.5*quakes$mag + 4.8) # units of Joules
quakes$Etnt = quakes$Ejoules/4.184e9       # units of kilotonnes, TNT

Calculate approximate distance from epicenter to city center

For this calculation I decided to use the Haversine formula. A great reference to implementing this method and other methods can be found here.

Rearth = 6371 # average earth radius

# approximate city centroids
quakes$area_lon = 0
quakes$area_lon[quakes$area == "SF"] = -122.44
quakes$area_lon[quakes$area == "LA"] = -118.26
quakes$area_lat = 0
quakes$area_lat[quakes$area == "SF"] = 37.77
quakes$area_lat[quakes$area == "LA"] = 34.05

# haversine formula
lat1 = quakes$area_lat*pi/180
lat2 = quakes$latitude*pi/180
dlat = (quakes$area_lat - quakes$latitude)*pi/180
dlon = (quakes$area_lon - quakes$longitude)*pi/180
a = sin(dlat/2)*sin(dlat/2) + cos(lat1)*cos(lat2)*sin(dlon/2)*sin(dlon/2)
c = 2*atan2(sqrt(a), sqrt(1 - a))
quakes$dist = Rearth*c

Ask the data

What were the largest earthquakes in recent history?

quakes = quakes[order(quakes$mag, decreasing = TRUE),] # sort by magnitude
print(quakes[quakes$mag >= 6.0,c("ptime", "mag", "area", "Etnt", "dist")])

Output (with annotation):

                    ptime mag area       Etnt      dist
15826 1992-06-28 11:57:38 7.3   LA 1344028.02 159.76987 <-- Landers
12577 1999-10-16 09:46:46 7.2   LA  951498.97 175.46360 <-- Hector Mine
6365  1989-10-18 00:04:16 6.9   SF  337604.58  94.25933 <-- Loma Prieta
14411 1994-01-17 12:30:55 6.7   LA  169203.10  31.30647 <-- Northridge
15814 1992-06-28 15:05:33 6.5   LA   84802.44 135.38916 <-- related to Landers
7197  1984-04-24 21:15:20 6.1   SF   21301.41  82.69884 <-- Morgan Hill
16096 1992-04-23 04:50:23 6.1   LA   21301.41 162.05213 <-- Joshua Tree, preceeded Landers
6     2014-08-24 10:20:44 6.0   SF   15080.24  51.28585 <-- Napa

To compare the largest earthquakes in each area, the Landers quake had an explosive force of 1.3 megatonnes of TNT, while the Loma Prieta had about 340 kilotonnes of explosive force.

Since epicenter location is a huge factor in the destructive power of an earthquake, it is interesting to note that the Landers quake was about 160 km away from the densely populated LA metro area, while the Northridge quake was 30 km away, and about 4.0 (10^.6) times less powerful.

Which major city was most affected by earthquakes?

# Mean distance, magnitude, and Energy in TNT
print(aggregate(data = quakes, cbind(dist, mag, Etnt) ~ area, mean))

Output:

  area     dist      mag      Etnt
1   LA 124.8353 2.796353 273.54752
2   SF 115.9873 2.589565  56.07969

The question of which city has been more affected is more complex than a simple calculation of the 30-year mean of distance and magnitude. However, on average, recent earthquakes surrounding LA have been about 10% closer than Bay Area earthquakes, 1.6 (10^.21) times more severe in magnitude, and with over 4 times more explosive force.

What is the combined yearly mean distance, magnitude, and energy?

# yearly averages
print(aggregate(data = quakes, cbind(dist, mag, Etnt) ~ yearbins, mean))

Output:

yearbins      dist      mag         Etnt
1      1984 109.04522 3.351741  118.2368472
2      1985 119.81664 3.106047    7.7798894
3      1986 123.08929 3.196345   57.9139893
4      1987  97.24530 3.098084   28.2857204
5      1988 117.30689 3.134848   14.5052295
6      1989  99.45154 3.223005  804.5486309
7      1990 109.68262 3.072021   23.2815528
8      1991 113.49903 3.020319   24.0027741
9      1992 156.32608 3.237759  983.2753462
10     1993 135.49608 2.986907    2.7064472
11     1994  70.18247 3.157971  168.1940170
12     1995 138.14896 3.124582   17.2462753
13     1996 129.99745 3.138735    6.0027307
14     1997 115.75065 3.156890    6.6688196
15     1998 117.41048 3.114228    6.7265345
16     1999 172.62893 3.256908 1310.8670301
17     2000 149.68578 3.105828    4.8142657
18     2001 137.13644 3.147203    7.0122122
19     2002 117.39979 3.076856    4.3661773
20     2003 121.76297 3.115162    6.3490834
21     2004 119.03760 3.097368    8.9307051
22     2005 121.67829 3.138785   13.8969863
23     2006 121.59305 3.050495    3.9256528
24     2007 114.70162 3.046964   19.8885765
25     2008 113.91730 2.963265   12.9691261
26     2009  96.35781 2.678468    3.1106334
27     2010 104.13835 2.665835    1.2364846
28     2011  99.73551 2.676221    1.5248688
29     2012 104.98631 2.606157    1.1377617
30     2013 125.22960 1.933424    0.3044841
31     2014 118.96839 1.942429    6.8136214

Due to the exponential nature of the data, the Etnt feature is a clear indicator for seismologically active years. Also, it appears we are seeing increased detection efficiency in the last few years with the average magnitude decreasing. Note, however, that 2014 is not a complete year at the time of this blog post.

What is the magnitude-frequency distribution for the two areas of interest?

freqSF = as.data.frame(table(quakes[quakes$area == "SF","Mhalfbins"]))
names(freqSF) = c("magSF", "freqSF")
freqLA = as.data.frame(table(quakes[quakes$area == "LA","Mhalfbins"]))
names(freqLA) = c("magLA", "freqLA")

# event frequency
print(cbind(freqSF, freqLA))

Output:

     magSF freqSF   magLA freqLA
1  [2,2.5)   1215 [2,2.5)   1018
2  [2.5,3)   2069 [2.5,3)   2600
3  [3,3.5)   1540 [3,3.5)   2848
4  [3.5,4)    460 [3.5,4)   1055
5  [4,4.5)    169 [4,4.5)    365
6  [4.5,5)     44 [4.5,5)     87
7  [5,5.5)      9 [5,5.5)     35
8  [5.5,6)      2 [5.5,6)     10
9  [6,6.5)      2 [6,6.5)      1
10 [6.5,7)      1 [6.5,7)      2
11 [7,7.5)      0 [7,7.5)      2

Note: the units for frequency are in Event Counts per 30 years.

I am not an expert in seismology. However, my guess would be that this magnitude-frequency should theoretically follow a power-law where at one end, as the magnitude approaches zero, the frequency increases exponentially, and at the other end, as the magnitude exceeds 10 the frequency diminishes to almost 0. The fact that the frequency distribution peaks between [3,3.5) and falls off as the magnitude approaches 0, given my uneducated guess, suggests that the detection efficiency (exponentially) decreases as it approaches 0. See the magnitude versus frequency plot below.

Plots

Time for some plots. Let’s have a look at the following correlations:

  • time vs. magnitude
  • magnitude-frequency correlation

Load ggplot2 and scales. The scales package is necessary for log tick marks.

library("ggplot2")
library("scales")

time vs. magnitude

png("SF-LA_timeVmag.png", width = 1000, height = 800)
timeVmag = ggplot(na.omit(quakes), aes(ptime, mag)) +
           geom_point(aes(size = mag, color = dist)) +
           ggtitle("SF and LA Earthquakes, 1984-2014") +
           xlab("time") +
           ylab("earthquake magnitude") + 
           guides(size = guide_legend(title = "magnitude")) +
           guides(color = guide_legend(title = "distance [km]")) +
           scale_color_gradient(low = "red", high = "dark gray") +
           theme_grey(base_size = 12) +
           theme(text = element_text(size = 22)) +
           facet_wrap(~ area, ncol = 1)
print(timeVmag)
dev.off()

Output:

Magnitude over Time

The major earthquakes that pop out in this plot are: Landers in 1992, Loma Prieta in 1989, and Northridge at the beginning of 1994. What is also interesting is the amount of aftershocks and related earthquakes for the major Los Angeles area earthquakes. Another interesting aspect of this plot is the increased detection efficiency of 2 to 2.5 magnitude earthquakes since 2009. Perhaps this is due to the Quake-Catcher Network (QCN)?

magnitude-frequency correlation

freq = as.data.frame(table(quakes[quakes$mag >= 2.0, c("area","mag")]))
freq = freq[order(freq$area),]
freq[freq$Freq == 0.0,] = NA # bins with zero counts cause log plot issues
png("SF-LA_magVfreq.png", width = 1000, height = 800)
magVfreq = ggplot(na.omit(freq), aes(mag, Freq)) +
           geom_point(aes(color = area), size = 4) + 
           ggtitle("SF and LA Earthquakes, 1984-2014") + 
           ylab("frequency [event count per 31 years]") + 
           scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                         labels = trans_format("log10", math_format(10^.x))) +
           scale_x_discrete("magnitude", breaks = seq(2,7,1)) + 
           geom_smooth(method = "loess", aes(group = 1), color = "black") +
           theme_grey(base_size = 12) + 
           theme(text = element_text(size = 22))
print(magVfreq)
dev.off()

Output:

Magnitude versus Frequency

The correlation between magnitude and frequency for most of the Richter scale has a slope of about 10^1 events over one order of magnitude. Furthermore, as the magnitude increases, the spread of the data increases due to the insufficient amount of counts.

Code available on Github

The code for this post can be found under the exploration directory on github in my usgs repository: quakes.R

Many thanks to usgs.gov for providing easy access to their earthquake data.