-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy path01_z08_alsan_2015.rmd
339 lines (266 loc) · 10.7 KB
/
01_z08_alsan_2015.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
### Project Overview
---
**Objective:**
Create TseTse fly suitability index used in @alsan2015effect (find the paper [here](https://www.aeaweb.org/articles?id=10.1257/aer.20130604)) from temperature and humidity raster datasets.
---
**Datasets**
* daily temperature and humidity datasets
---
**GIS tasks**
* read raster data files in the NetCDF format
- use `stars::read_ncdf()`
* aggregate raster data
- use `stars::st_apply()`
* read vector data in the geojson format
- use `stars::st_read()`
* shift (rotate) longitude of a raster dataset
- use `terra::rotate()`
* convert `stars` objects to `SpatRaster` objects, and vice versa
- use `as(, "SpatRaster")`
- use `stars::st_as_stars()`
* define variables inside a `stars` object
- use `mutate()`
* subset (crop) raster data to a region specified by a vector dataset
- use `[]`
* spatially aggregate raster data by regions specified by a vector dataset
- use `aggregate()`
- use `exactextractr::exact_extract()`
+ create maps
* use the `ggplot2` package
---
**Preparation for replication**
Run the following code to install or load (if already installed) the `pacman` package, and then install or load (if already installed) the listed package inside the `pacman::p_load()` function.
```{r demo1_packages}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
sf, # vector data operations
tidyverse, # data wrangling
stars,
exactextractr
)
```
```{r "Setup", results = 'hide', echo = F, warning = F, message = F}
theme_map <- function(...) {
theme_bw() +
theme(
line = element_blank(),
rect = element_blank(),
strip.text = element_blank(),
axis.text = element_blank(),
plot.title = element_blank(),
axis.title = element_blank(),
plot.margin = structure(c(0, 0, -1, -1), unit = "lines", valid.unit = 3L, class = "unit"),
legend.position = "none",
...
)
}
```
### Project Demonstration
Marcella Alsan's 2015 AER paper "The Effect of the TseTse Fly on African Development" tests a hypothesis about historical African economic development. The hypothesis considers the TseTse fly -- a fly indigenous to Africa that is lethal to crops. The theory posits that the fly prevented agricultural surplus and hence stunted historical economic development in impacted regions of Africa. To test the hypothesis, @Alsan_2015 whats to compare African tribes that were in regions highly affected by the TseTse fly to areas not highly affected. To do so, they use a "sutability index" that is based on an areas average temperature and humidity.
For this replication, we will recreate the suitability index and aggregate the data at the tribe level. For reference, the original figure from the article is reproduced in Figure \@ref(fig:alsan-orig).
```{r "alsan-orig", echo = F, out.width = "70%", fig.cap = "TseTse Suitability Index from Alsan (2015)", eval = F}
knitr::include_graphics("Data/alsan-fig.png")
```
#### Scientific Details {.unnumbered}
Understanding the details of the suitablity index is not necessary, but below I implement the following to derive the index. Feel free to skip this if you would like.
Let B represent the birth rate, which is temperature dependent, and M represent the mortality of adult flies from dessication. The growth rate, $\lambda$ is defined as:
$$
\lambda= \max(B - M, 0)
$$
The formula for the birth rate and the mortality rate are determined by scientific experiments and have the following form
$$
B(t) = (-0.0058 * t^2 + .2847 t -2.467)
$$
$$
M(t, h) = -0.0003 * satdef^2 + 0.0236 * satdef + .235,
$$
where $t$ is temperature, $h$ is humidity and $satdef$ is defined as:
$$
satdef = \frac{100-h}{100} \left( 6.1078 * exp(\frac{17.2694t}{t+237}) \right)
$$
A "second form of mortality that is not due to climate, but rather attributable to competition among flies, is introduce. This is known as density dependent mortality, $\Delta$; and can be expressed as:"
$$
\Delta = \phi (N)^\psi
$$
This yields a steady state equilibrium population of
$$
N^* = (\frac{\lambda}{\phi})^{1/\psi}
$$
which is calibrated with \phi = 0.025 \text{ and } \psi= 1.25.
Lastly, the TseTse Suitability Index is the Z-score of $N^*$.
#### Load and Prepare Data {-}
To generate the data, we first download historical weather data in 1871 from [NOAA-CIRES 20th Century Reanalysis](https://psl.noaa.gov/data/gridded/data.20thC_ReanV2c.html). This data comes in a `NetCDF` file format (with extension `.nc`). We can use `stars::read_ncdf()` read a `NetCDF` file.
```{r "Load Weather Data", message=FALSE, warning=FALSE}
# NOAA-CIRES 20th Century Reanalysis version 2
#--- temperature ---#
(
temp_raw <- stars::read_ncdf(
here::here("data/alsan_2015/air.sig995.1871.nc")
)
)
#--- humidity ---#
(
humidity_raw <- stars::read_ncdf(
here::here("data/alsan_2015/rhum.sig995.1871.nc")
)
)
```
Since these raster files contain daily observations (see the `time` dimension above), Alsan aggregates them to the annual level. To average across a dimension (e.g. time), we will use the function `stars::st_apply`.
```{r "Aggregate Data to Annual"}
# Aggregate to annual average
temp <-
stars::st_apply(
X = temp_raw, MARGIN = c("lon", "lat"), FUN = mean
) %>%
# Rename "mean" attribute which was created from st_apply
# Convert Kelvin to Celsius
mutate(temp = mean - 273.15) %>%
dplyr::select(temp)
humidity <-
stars::st_apply(
X = humidity_raw, MARGIN = c("lon", "lat"), FUN = mean
) %>%
# Rename "mean" attribute which was created from st_apply
mutate(hum = mean) %>%
dplyr::select(hum)
```
We then combine the two to a single weather dataset.
```{r}
(
weather <- c(temp, humidity)
)
```
The second piece of data needed is a shape file containing the Tribal boundaries. The original drawings come from Murdock (1959), but were digitized by Nathan Nunn and coauthors and is available [here](https://scholar.harvard.edu/nunn/pages/data-0).
```{r load-tribal-boundary-data, message = FALSE}
# African Tribes, originally from Murdock (1959)
tribes <-
sf::st_read("Data/alsan_2015/Murdock_shapefile/borders_tribes.geojson") %>%
#--- reproject to the CRS of temp_raw ---#
st_transform(st_crs(temp_raw))
# Africa
africa <-
sf::st_read(here::here("data/alsan_2015/africa.geojson")) %>%
#--- reproject to the CRS of temp_raw ---#
st_transform(st_crs(temp_raw))
```
Here is the map of triabl boundaries superimposed on top of Africa.
```{r g-trial-boundary}
ggplot() +
geom_sf(data = tribes) +
geom_sf(data = africa, color = "#F26D21", fill = NA) +
coord_sf() +
theme_map()
```
There is a common problem in working with raster data for countries that cross the Prime Meridian. To see the problem, let's plot our weather data:
```{r weather-before-rotation}
ggplot() +
geom_stars(data = weather, mapping = aes(x = lon, y = lat, fill = temp)) +
geom_sf(data = africa, color = "#F26D21", fill = NA) +
coord_sf() +
scale_fill_distiller(type = "seq", palette = "Greys", direction = 1, na.value = NA) +
theme_map()
```
As you can see, the portion of Africa east of the Prime Meridian is wrapped to the other side of the map. That is because there are two ways to handle longitude: either from [0,360] or [-180,180]. Since our data is in [0,360] form, Africa will be cut in half. To convert, we can use the `terra::rotate()` function in the `terra` package. This means that you first need to convert `weather`, which is a `stars`, to a `SpatRaster` object, and then apply `terra::rotate()` to it.
```{r}
(
weather_raster <-
#--- convert to SpatRaster ---#
as(weather, "SpatRaster") %>%
terra::rotate()
)
```
Conversion back to a `stars` object with multiple layers needs some work. Specfically, we turn each layer into a `stars` object and then combine them using `c()`.
```{r}
weather <-
c(
st_as_stars(weather_raster$temp),
st_as_stars(weather_raster$hum)
)
```
Things are looking good now.
```{r rotated-weather}
ggplot() +
geom_stars(data = weather, mapping = aes(x = x, y = y, fill = temp)) +
geom_sf(data = africa, color = "#F26D21", fill = NA) +
coord_sf() +
scale_fill_distiller(
type = "seq", palette = "Greys", direction = 1, na.value = NA
) +
theme_map()
```
#### Calculate TseTse Suitability Index {-}
Following the scientific formulae above, we add those variables to the `stars` object using `mutate()` function as if you are wrangling a `data.frame`.
```{r create-tsetse-index}
weather <-
weather %>%
mutate(
B = -0.0058 * temp^2 + .2847 * temp - 2.467,
satdef = (6.1078 * exp((17.2694 * temp) / (temp + 237))) - (hum / 100) * (6.1078 * exp((17.2694 * temp) / (temp + 237))),
M = -0.0003 * satdef^2 + 0.0236 * satdef + .235,
lambda = B - M,
lambda = ifelse(lambda < 0, 0, lambda),
Nstar = (lambda / 0.025)^(1 / 1.25),
)
```
Let's subset `weather` to Africa and then calculate tsetse.
```{r}
sf_use_s2(FALSE)
weather_africa <-
#--- subset to Africa ---#
weather[tribes] %>%
#--- calculate TseTse suitability index---#
mutate(
tsetse = (Nstar - mean(Nstar, na.rm = TRUE)) / sd(Nstar, na.rm = TRUE)
)
```
Here is the map of TseTse suitability index.
```{r tsetse-suitability-index}
ggplot() +
geom_stars(
data = weather_africa,
mapping = aes(x = x, y = y, fill = tsetse)
) +
coord_sf() +
scale_fill_distiller(
type = "seq", palette = "Greys", direction = -1, na.value = NA
) +
theme_map()
```
Now that we have our raster of the standardized TseTse suitability index, we want to aggregate this to the tribal level. This can be done using `aggregate()`.
```{r aggregate-tsetse}
agg_tsetse <-
aggregate(
x = weather_africa["tsetse"],
by = tribes,
FUN = mean,
na.rm = TRUE,
as_points = FALSE
)
```
However, we are going to run into a problem due to the size of the tribes relative to the size of the raster cells.
```{r map-agg-tsetse}
plot(agg_tsetse)
```
Notice all the holes in the map that we have!^[This problem is documented well in [this thread](https://twitter.com/kylefbutts/status/1270815765948579841?s=20)]. This problem can be fixed using the `exactextractr` package. Since it does not work with `stars` object, we need to convert `weather` to a `SpatRaster` objective from the `terra` package (you can alternatively convert to a `raster` object).
```{r exactextractr-tsetse}
tribes$tsetse <-
exactextractr::exact_extract(
x = as(weather["Nstar"], "SpatRaster"),
# x = as(weather["Nstar"], "Raster"),
y = tribes,
fun = "mean",
progress = FALSE # not display progress bar
)
```
```{r map-tsese-final}
ggplot() +
geom_sf(data = tribes, aes(fill = tsetse)) +
coord_sf() +
scale_fill_distiller(
type = "seq",
palette = "Greys",
direction = 1,
na.value = "red"
)
```
There we have it! We successfully dealt with a slew of issues but now have created the tsetse susceptability index at the tribal level!