-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy path01_z01_groundwater_pumping.rmd
310 lines (243 loc) · 10.4 KB
/
01_z01_groundwater_pumping.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
### Project Overview
---
**Objective:**
* Understand the impact of groundwater pumping on groundwater level.
---
**Datasets**
* Groundwater pumping by irrigation wells in Chase, Dundy, and Perkins Counties in the southwest corner of Nebraska
* Groundwater levels observed at USGS monitoring wells located in the three counties and retrieved from the National Water Information System (NWIS) maintained by USGS using the `dataRetrieval` package.
---
**Econometric Model**
In order to achieve the project objective, we will estimate the following model:
$$
y_{i,t} - y_{i,t-1} = \alpha + \beta gw_{i,t-1} + v
$$
where $y_{i,t}$ is the depth to groundwater table^[the distance from the surface to the top of the aquifer] in March^[For our geographic focus of southwest Nebraska, corn is the dominant crop type. Irrigation for corn happens typically between April through September. For example, this means that changes in groundwater level ($y_{i,2012} - y_{i,2011}$) captures the impact of groundwater pumping that occurred April through September in 2011.] in year $t$ at USGS monitoring well $i$, and $gw_{i,t-1}$ is the total amount of groundwater pumping that happened within the 2-mile radius of the monitoring well $i$.
---
**GIS tasks**
* read an ESRI shape file as an `sf` (spatial) object
- use `sf::st_read()`
* download depth to water table data using the `dataRetrieval` package developed by USGS
- use `dataRetrieval::readNWISdata()` and `dataRetrieval::readNWISsite()`
* create a buffer around USGS monitoring wells
- use `sf::st_buffer()`
* convert a regular `data.frame` (non-spatial) with geographic coordinates into an `sf` (spatial) objects
- use `sf::st_as_sf()` and `sf::st_set_crs()`
* reproject an `sf` object to another CRS
- use `sf::st_transform()`
* identify irrigation wells located inside the buffers and calculate total pumping
- use `sf::st_join()`
+ create maps
* use the `tmap` package
---
**Preparation for replication**
Run the following code to install or load (if already installed) the `pacman` package, and then install or load (if already installed) the listed package inside the `pacman::p_load()` function.
```{r demo1_packages}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
sf, # vector data operations
dplyr, # data wrangling
dataRetrieval, # download USGS NWIS data
lubridate, # Date object handling
modelsummary, # regression table generation
lfe # fast regression with many fixed effects
)
```
### Project Demonstration
The geographic focus of the project is the southwest corner of Nebraska consisting of Chase, Dundy, and Perkins County (see Figure \@ref(fig:NE-county) for their locations within Nebraska). Let's read a shape file of the three counties represented as polygons. We will use it later to spatially filter groundwater level data downloaded from NWIS.
```{r NE_county_data, echo = FALSE, results = "hide"}
#--- Nebraska counties ---#
NE_county <-
st_read(
dsn = "Data",
layer = "cb_2018_us_county_20m"
) %>%
filter(STATEFP == "31") %>%
mutate(NAME = as.character(NAME)) %>%
st_transform(32614)
three_counties <-
st_read(dsn = "Data", layer = "urnrd") %>%
st_transform(32614)
```
```{r Demo1_read_urnrd_borders}
three_counties <-
st_read(dsn = "Data", layer = "urnrd") %>%
#--- project to WGS84/UTM 14N ---#
st_transform(32614)
```
```{r NE-county, fig.cap = "The location of Chase, Dundy, and Perkins County in Nebraska", echo = F}
#--- map the three counties ---#
tm_shape(NE_county) +
tm_polygons() +
tm_shape(three_counties) +
tm_polygons(col = "blue", alpha = 0.3) +
tm_layout(frame = FALSE)
```
---
We have already collected groundwater pumping data, so let's import it.
```{r Demo1_urnrd_gw_read}
#--- groundwater pumping data ---#
(
urnrd_gw <- readRDS("Data/urnrd_gw_pumping.rds")
)
```
`well_id` is the unique irrigation well identifier, and `vol_af` is the amount of groundwater pumped in acre-feet. This dataset is just a regular `data.frame` with coordinates. We need to convert this dataset into a object of class `sf` so that we can later identify irrigation wells located within a 2-mile radius of USGS monitoring wells (see Figure \@ref(fig:sp-dist-wells) for the spatial distribution of the irrigation wells).
```{r convert_to_sf}
urnrd_gw_sf <-
urnrd_gw %>%
#--- convert to sf ---#
st_as_sf(coords = c("lon", "lat")) %>%
#--- set CRS WGS UTM 14 (you need to know the CRS of the coordinates to do this) ---#
st_set_crs(32614)
#--- now sf ---#
urnrd_gw_sf
```
```{r sp-dist-wells, fig.cap = "Spatial distribution of irrigation wells", echo = F}
tm_shape(three_counties) +
tm_polygons() +
tm_shape(unique(urnrd_gw_sf, by = "well_id")) +
tm_symbols(size = 0.1, col = "blue") +
tm_layout(frame = FALSE)
```
---
Here are the rest of the steps we will take to create a regression-ready dataset for our analysis.
1. download groundwater level data observed at USGS monitoring wells from National Water Information System (NWIS) using the `dataRetrieval` package
2. identify the irrigation wells located within the 2-mile radius of the USGS wells and calculate the total groundwater pumping that occurred around each of the USGS wells by year
3. merge the groundwater pumping data to the groundwater level data
---
Let's download groundwater level data from NWIS first. The following code downloads groundwater level data for Nebraska from Jan 1, 1990, through Jan 1, 2016.
```{r gwl_data_download, eval = F}
#--- download groundwater level data ---#
NE_gwl <-
readNWISdata(
stateCd = "Nebraska",
startDate = "1990-01-01",
endDate = "2016-01-01",
service = "gwlevels"
) %>%
dplyr::select(site_no, lev_dt, lev_va) %>%
rename(date = lev_dt, dwt = lev_va)
#--- take a look ---#
head(NE_gwl, 10)
```
```{r read_NW_gwl, echo = F}
NE_gwl <- readRDS("Data/NE_gwl.rds")
#--- take a look ---#
head(NE_gwl, 10)
```
`site_no` is the unique monitoring well identifier, `date` is the date of groundwater level monitoring, and `dwt` is depth to water table.
We calculate the average groundwater level in March by USGS monitoring well (right before the irrigation season starts):^[`month()` and `year()` are from the `lubridate` package. They extract month and year from a `Date` object.]
```{r avg_march_deptn}
#--- Average depth to water table in March ---#
NE_gwl_march <-
NE_gwl %>%
mutate(
date = as.Date(date),
month = month(date),
year = year(date),
) %>%
#--- select observation in March ---#
filter(year >= 2007, month == 3) %>%
#--- gwl average in March ---#
group_by(site_no, year) %>%
summarize(dwt = mean(dwt))
#--- take a look ---#
head(NE_gwl_march, 10)
```
Since `NE_gwl` is missing geographic coordinates for the monitoring wells, we will download them using the `readNWISsite()` function and select only the monitoring wells that are inside the three counties.
```{r NE_sites}
#--- get the list of site ids ---#
NE_site_ls <- NE_gwl$site_no %>% unique()
#--- get the locations of the site ids ---#
sites_info <-
readNWISsite(siteNumbers = NE_site_ls) %>%
dplyr::select(site_no, dec_lat_va, dec_long_va) %>%
#--- turn the data into an sf object ---#
st_as_sf(coords = c("dec_long_va", "dec_lat_va")) %>%
#--- NAD 83 ---#
st_set_crs(4269) %>%
#--- project to WGS UTM 14 ---#
st_transform(32614) %>%
#--- keep only those located inside the three counties ---#
.[three_counties, ]
```
---
We now identify irrigation wells that are located within the 2-mile radius of the monitoring wells^[This can alternatively be done using the `st_is_within_distance()` function.]. We first create polygons of 2-mile radius circles around the monitoring wells (see Figure \@ref(fig:buffer-map)).
```{r create_buffer}
buffers <- st_buffer(sites_info, dist = 2 * 1609.34) # in meter
```
```{r buffer-map, fig.cap = "2-mile buffers around USGS monitoring wells", echo = F}
tm_shape(three_counties) +
tm_polygons() +
tm_shape(buffers) +
tm_polygons(col = "red", alpha = 0, 2) +
tm_shape(sites_info) +
tm_symbols(size = 0.1) +
tm_layout(frame = FALSE)
```
We now identify which irrigation wells are inside each of the buffers and get the associated groundwater pumping values. The `st_join()` function from the `sf` package will do the trick.
```{r Demo_join_buffer_gw, cache = FALSE}
#--- find irrigation wells inside the buffer and calculate total pumping ---#
pumping_nearby <- st_join(buffers, urnrd_gw_sf)
```
Let's take a look at a USGS monitoring well (`site_no` = $400012101323401$).
```{r take_a_look}
filter(pumping_nearby, site_no == 400012101323401, year == 2010)
```
As you can see, this well has seven irrigation wells within its 2-mile radius in 2010.
Now, we will get total nearby pumping by monitoring well and year.
```{r Demo1_summary_by_buffer, cache = TRUE}
(
total_pumping_nearby <-
pumping_nearby %>%
#--- calculate total pumping by monitoring well ---#
group_by(site_no, year) %>%
summarize(nearby_pumping = sum(vol_af, na.rm = TRUE)) %>%
#--- NA means 0 pumping ---#
mutate(
nearby_pumping = ifelse(is.na(nearby_pumping), 0, nearby_pumping)
)
)
```
---
We now merge nearby pumping data to the groundwater level data, and transform the data to obtain the dataset ready for regression analysis.
```{r Demo_nearby_merge}
#--- regression-ready data ---#
reg_data <-
NE_gwl_march %>%
#--- pick monitoring wells that are inside the three counties ---#
filter(site_no %in% unique(sites_info$site_no)) %>%
#--- merge with the nearby pumping data ---#
left_join(., total_pumping_nearby, by = c("site_no", "year")) %>%
#--- lead depth to water table ---#
arrange(site_no, year) %>%
group_by(site_no) %>%
mutate(
#--- lead depth ---#
dwt_lead1 = dplyr::lead(dwt, n = 1, default = NA, order_by = year),
#--- first order difference in dwt ---#
dwt_dif = dwt_lead1 - dwt
)
#--- take a look ---#
dplyr::select(reg_data, site_no, year, dwt_dif, nearby_pumping)
```
---
Finally, we estimate the model using `feols()` from the `fixest` package (see [here](https://cran.r-project.org/web/packages/fixest/vignettes/fixest_walkthrough.html) for an introduction).
```{r Demo_reg_dwt}
#--- OLS with site_no and year FEs (error clustered by site_no) ---#
reg_dwt <-
feols(
dwt_dif ~ nearby_pumping | site_no + year,
cluster = "site_no",
data = reg_data
)
```
Here is the regression result.
```{r reg_dwt_disp}
modelsummary(
reg_dwt,
stars = TRUE,
gof_omit = "IC|Log|Adj|Within|Pseudo"
)
```
---