-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy path01_z05_groundwater_irrigation.rmd
224 lines (175 loc) · 7.65 KB
/
01_z05_groundwater_irrigation.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
### Project Overview
---
**Objective**
+ Understand the impact of monthly precipitation on groundwater use for agricultural irrigation
---
**Datasets**
+ Annual groundwater pumping by irrigation wells in Kansas for 2010 and 2011 (originally obtained from the Water Information Management & Analysis System (WIMAS) database)
+ Daymet^[[Daymet website](https://daymet.ornl.gov/)] daily precipitation and maximum temperature downloaded using `daymetr` package
---
**Econometric Model**
The econometric model we would like to estimate is:
$$
y_{i,t} = \alpha + P_{i,t} \beta + T_{i,t} \gamma + \phi_i + \eta_t + v_{i,t}
$$
where $y$ is the total groundwater extracted in year $t$, $P_{i,t}$ and $T_{i,t}$ is the collection of monthly total precipitation and mean maximum temperature April through September in year $t$, respectively, $\phi_i$ is the well fixed effect, $\eta_t$ is the year fixed effect, and $v_{i,t}$ is the error term.
---
**GIS tasks**
+ download Daymet precipitation and maximum temperature data for each well from within R in parallel
* use `daymetr::download_daymet()` and `future.apply::future_lapply()`
---
**Preparation for replication**
+ Run the following code to install or load (if already installed) the `pacman` package, and then install or load (if already installed) the listed package inside the `pacman::p_load()` function.
```{r demo5_packages}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
daymetr, # get Daymet data
sf, # vector data operations
dplyr, # data wrangling
data.table, # data wrangling
ggplot2, # for map creation
RhpcBLASctl, # to get the number of available cores
future.apply, # parallelization
lfe, # fast regression with many fixed effects
modelsummary # regression table generation
)
```
### Project Demonstration
We have already collected annual groundwater pumping data by irrigation wells in 2010 and 2011 in Kansas from the Water Information Management & Analysis System (WIMAS) database. Let's read in the groundwater use data.
```{r Demo5_gw}
#--- read in the data ---#
(
gw_KS_sf <- readRDS("Data/gw_KS_sf.rds")
)
```
We have `r length(unique(gw_KS_sf$well_id))` wells in total, and each well has records of groundwater pumping (`af_used`) for years 2010 and 2011. Here is the spatial distribution of the wells.
```{r Demo5_map}
KS_counties <- readRDS("Data/KS_county_borders.rds")
tm_shape(KS_counties) +
tm_polygons() +
tm_shape(gw_KS_sf) +
tm_symbols(size = 0.05, col = "black")
```
<!--
#=========================================
# Daymet data download and processing
#=========================================
-->
---
We now need to get monthly precipitation and maximum temperature data. We have decided that we use [Daymet](https://daymet.ornl.gov/) weather data. Here we use the `download_daymet()` function from the `daymetr` package^[[daymetr vignette](https://cran.r-project.org/web/packages/daymetr/vignettes/daymetr-vignette.html)] that allows us to download all the weather variables for a specified geographic location and time period^[See [here]() for a fuller explanation of how to use the `daymetr` package.]. We write a wrapper function that downloads Daymet data and then processes it to find monthly total precipitation and mean maximum temperature^[This may not be ideal for a real research project because the original raw data is not kept. It is often the case that your econometric plan changes on the course of your project (e.g., using other weather variables or using different temporal aggregation of weather variables instead of monthly aggregation). When this happens, you need to download the same data all over again.]. We then loop over the `r nrow(gw_KS_sf)` wells, which is parallelized using the `future_apply()` function^[For parallelized computation, see Chapter \@ref(par-comp)] from the `future.apply` package. This process takes about an hour on my Mac with parallelization on 7 cores. The data is available in the data repository for this course (named as "all_daymet.rds").
```{r Demo5_getdaymet}
#--- get the geographic coordinates of the wells ---#
well_locations <-
gw_KS_sf %>%
unique(by = "well_id") %>%
dplyr::select(well_id) %>%
cbind(., st_coordinates(.))
#--- define a function that downloads Daymet data by well and process it ---#
get_daymet <- function(i) {
temp_site <- well_locations[i, ]$well_id
temp_long <- well_locations[i, ]$X
temp_lat <- well_locations[i, ]$Y
data_temp <-
download_daymet(
site = temp_site,
lat = temp_lat,
lon = temp_long,
start = 2010,
end = 2011,
#--- if TRUE, tidy data is returned ---#
simplify = TRUE,
#--- if TRUE, the downloaded data can be assigned to an R object ---#
internal = TRUE
) %>%
data.table() %>%
#--- keep only precip and tmax ---#
.[measurement %in% c("prcp..mm.day.", "tmax..deg.c."), ] %>%
#--- recover calender date from Julian day ---#
.[, date := as.Date(paste(year, yday, sep = "-"), "%Y-%j")] %>%
#--- get month ---#
.[, month := month(date)] %>%
#--- keep only April through September ---#
.[month %in% 4:9, ] %>%
.[, .(site, year, month, date, measurement, value)] %>%
#--- long to wide ---#
dcast(site + year + month + date ~ measurement, value.var = "value") %>%
#--- change variable names ---#
setnames(c("prcp..mm.day.", "tmax..deg.c."), c("prcp", "tmax")) %>%
#--- find the total precip and mean tmax by month-year ---#
.[, .(prcp = sum(prcp), tmax = mean(tmax)), by = .(month, year)] %>%
.[, well_id := temp_site]
return(data_temp)
gc()
}
```
Here is what one run (for the first well) of `get_daymet()` returns
```{r Demo5_one_run}
#--- one run ---#
(
returned_data <- get_daymet(1)[]
)
```
We get the number of cores you can use by `RhpcBLASctl::get_num_procs()` and parallelize the loop over wells using `future_lapply()`.^[For Mac users, `mclapply` or `pbmclapply` (`mclapply` with progress bar) are good alternatives.]
```{r eval = FALSE}
#--- prepare for parallelization ---#
num_cores <- get_num_procs() - 1 # number of cores
plan(multiprocess, workers = num_cores) # set up cores
#--- run get_daymet with parallelization ---#
(
all_daymet <-
future_lapply(
1:nrow(well_locations),
get_daymet
) %>%
rbindlist()
)
```
```{r all_daymet_read, echo = FALSE}
all_daymet <- readRDS("Data/all_daymet.rds")
all_daymet
```
---
Before merging the Daymet data, we need to reshape the data into a wide format to get monthly precipitation and maximum temperature as columns.
```{r Demo5_long_wide}
#--- long to wide ---#
daymet_to_merge <-
all_daymet %>%
dcast(
well_id + year ~ month,
value.var = c("prcp", "tmax")
)
#--- take a look ---#
daymet_to_merge
```
Now, let's merge the weather data to the groundwater pumping dataset.
```{r Demo5_merge_weather}
(
reg_data <-
data.table(gw_KS_sf) %>%
#--- keep only the relevant variables ---#
.[, .(well_id, year, af_used)] %>%
#--- join ---#
daymet_to_merge[., on = c("well_id", "year")]
)
```
---
Let's run regression and display the results.
```{r Demo5_reg_display}
#--- run FE ---#
reg_results <-
fixest::feols(
af_used ~ # dependent variable
prcp_4 + prcp_5 + prcp_6 + prcp_7 + prcp_8 + prcp_9
+ tmax_4 + tmax_5 + tmax_6 + tmax_7 + tmax_8 + tmax_9 |
well_id + year, # FEs
cluster = "well_id",
data = reg_data
)
#--- display regression results ---#
modelsummary::modelsummary(
reg_results,
stars = TRUE,
gof_omit = "IC|Log|Adj|Within|Pseudo"
)
```
That's it. Do not bother to try to read into the regression results. Again, this is just an illustration of how R can be used to prepare a regression-ready dataset with spatial variables.