r/exercises_with_answers.Rmd (+2 -3)
@@ -91,12 +91,11 @@ dbDisconnect(con)
 ```
 
 
-## Exercise: `dbplyr`
+## Exercise: `dplyr`
 
-Connect to the dvdrental database. Repeat [Exercise: Joining and Grouping 2](https://github.com/nuitrcs/databases_workshop/blob/master/sql/part2_exercises_with_answers.md) from Part 2 using `dbplyr`.
+Connect to the dvdrental database. Repeat [Exercise: Joining and Grouping 2](https://github.com/nuitrcs/databases_workshop/blob/master/sql/part2_exercises_with_answers.md) from Part 2 using `dplyr`.
 <p>We used paste function above because we have control over offset – it would be better to use a prepared query, but since we aren’t getting input from a user, it’s not super dangerous.</p>
 
 <p>An alternative approach, which could work well if the table isn’t too big, is to retrieve all of the IDs, and then randomly sample the IDs, and retrieve just those rows.</p>
-<p>Connect to the dvdrental database. Repeat <a href="https://github.com/nuitrcs/databases_workshop/blob/master/sql/part2_exercises_with_answers.md">Exercise: Joining and Grouping 2</a> from Part 2 using <code>dbplyr</code>.</p>
-<pre class="r"><code>library(dbplyr)
-library(dplyr)</code></pre>
+<div id="exercise-dplyr" class="section level2">
+<h2>Exercise: <code>dplyr</code></h2>
+<p>Connect to the dvdrental database. Repeat <a href="https://github.com/nuitrcs/databases_workshop/blob/master/sql/part2_exercises_with_answers.md">Exercise: Joining and Grouping 2</a> from Part 2 using <code>dplyr</code>.</p>
+<pre class="r"><code>library(dplyr)</code></pre>
 <div id="solution-1" class="section level4">
 <h4>Solution</h4>
 <p>Set your connection information as appropriate for the workshop:</p>
r/r_databases.Rmd (+43 -14)
@@ -35,7 +35,7 @@ library(RPostgres)
 
 We connect with a function call like the following.
 
-Note: this code was generated on my local machine connected to a local copy of the database.
+Note: this code was generated on my local machine connected to a local copy of the database. Your connection details will be different. I also have permission to modify this database.
 
 ```{r}
 con <- dbConnect(RPostgres::Postgres(), host="localhost", dbname="dvdrental")
@@ -88,7 +88,7 @@ If you want part of your query to be determined by a variable -- especially if i
 ```{r}
 # YES
 myquery <- dbSendQuery(con, "select * from actor where actor_id = $1")
-dbBind(myquery, list(5))
+dbBind(myquery, list(4))
 dbFetch(myquery)
 ```
 
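The parameterized query in this hunk needs a running PostgreSQL server. As a rough sketch of the same `dbSendQuery()` / `dbBind()` / `dbFetch()` flow that runs anywhere, the code below assumes the DBI and RSQLite packages are installed and uses a small made-up `actor` table in an in-memory SQLite database. One difference to keep in mind: RPostgres uses `$1`, `$2`, ... placeholders, while RSQLite uses `?`.

```r
# Sketch only: assumes the DBI and RSQLite packages are installed.
# The tiny `actor` table is made-up data, not the dvdrental table.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "actor", data.frame(
  actor_id = 1:3,
  first_name = c("Ann", "Bob", "Cy"),
  stringsAsFactors = FALSE
))

wanted <- 2L  # e.g. a value that came from user input

# Send the SQL with a placeholder, then bind the value separately, so it
# is never pasted into the query string. (RSQLite uses `?`; RPostgres uses $1.)
res <- dbSendQuery(con, "SELECT * FROM actor WHERE actor_id = ?")
dbBind(res, list(wanted))
via_bind <- dbFetch(res)
dbClearResult(res)  # release the result set when finished

# The same query in a single call, using the `params` argument
via_params <- dbGetQuery(con, "SELECT * FROM actor WHERE actor_id = ?",
                         params = list(wanted))
print(via_params)
dbDisconnect(con)
```

The one-call `dbGetQuery(..., params = )` form fetches the rows and clears the result for you, which is convenient when you only need to run the query once.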
@@ -124,8 +124,12 @@ Which are ok, but could get annoying.
 
 If you're not a superuser on the `dvdrental` database, just try connecting to a database you can modify. Then the basic function is `dbSendQuery` for any command you want to execute where you aren't retrieving results.
 
+Note that by default, statements take effect immediately; they are not in a transaction that you need to commit. To use transactions, see below.
+
 ```{r, eval=FALSE}
-dbSendQuery(con, statement="update actor set actor_id=5000 where actor_id=5")
+res <- dbSendQuery(con, statement="update actor set first_name='Jenn' where actor_id=4")
+There are also methods for managing transactions if you need them: `dbBegin`, `dbRollback`, `dbCommit`. Transactions are key when you need to be sure that a sequence of SQL commands (e.g. `UPDATE`, `CREATE`, `DROP`, `DELETE`, etc.) executes correctly before it is made permanent (i.e. "committed").
+
+
+```{r, eval=FALSE}
+dbBegin(con)
+dbWriteTable(con, "mynewtable", mytbl)
+dbRollback(con)
+dbGetQuery(con, "SELECT * FROM mynewtable")
+```
+
+The above will produce an error:
+
+```
+Error in result_create(conn@ptr, statement) :
+Failed to prepare query: ERROR: relation "mynewtable" does not exist
+LINE 1: SELECT * FROM mynewtable
+```
+
+because the transaction was rolled back, not committed.
+
 ## Close Connection
 
 Connections will get closed when you quit R, but it's good practice to explicitly close them.
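The `dbBegin()` / `dbRollback()` example in this hunk shows an explicit rollback. A common pattern is to commit only if every statement succeeds and roll back otherwise. The sketch below assumes DBI and RSQLite are installed; the `accounts` table and the `transfer()` helper are hypothetical and exist only for illustration.

```r
# Sketch only: assumes the DBI and RSQLite packages are installed.
# The `accounts` table and transfer() helper are hypothetical examples.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
dbExecute(con, "INSERT INTO accounts VALUES (1, 100), (2, 50)")

# Move `amount` between two accounts: commit only if both UPDATEs succeed,
# otherwise roll back so that neither change is kept.
transfer <- function(con, from, to, amount) {
  dbBegin(con)
  tryCatch({
    dbExecute(con, "UPDATE accounts SET balance = balance - ? WHERE id = ?",
              params = list(amount, from))
    n <- dbExecute(con, "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                   params = list(amount, to))
    if (n != 1) stop("destination account not found")
    dbCommit(con)
    TRUE
  }, error = function(e) {
    dbRollback(con)  # also undoes the first UPDATE
    FALSE
  })
}

ok1 <- transfer(con, 1L, 2L, 30)   # both rows updated, then committed
ok2 <- transfer(con, 1L, 99L, 30)  # no account 99, so everything is rolled back
balances <- dbGetQuery(con, "SELECT id, balance FROM accounts ORDER BY id")
print(balances)  # account 1 keeps 70, not 40
dbDisconnect(con)
```

DBI also provides `dbWithTransaction()`, which wraps the same begin/commit/rollback logic around a block of code.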
@@ -152,17 +178,15 @@ Connections will get closed when you quit R, but it's good practice to explicitl
 dbDisconnect(con)
 ```
 
-## Transactions
 
-There are also methods for managing transactions if you need: `dbBegin`, `dbRollback`, `dbCommit`. Transactions are key for when you need to be sure that a sequence of SQL commands (e.g. `UPDATE`, `CREATE`, `DROP`, `DELETE`, etc.) execute correctly before they're made permanent (i.e. "committed").
 
 
 # Use `dplyr`
 
 For more complete info, see the [RStudio databases site](http://db.rstudio.com/dplyr/).
@@ -193,7 +216,7 @@ If we look at this object, it doesn't have data in it:
 str(actortbl)
 ```
 
-It just has connection information. `dbplyr` will try to perform operations within the database where it can, instead of pulling all of the data into R.
+It just has connection information. `dplyr` will try to perform operations within the database where it can, instead of pulling all of the data into R.
 
 Yet you can print the object and see observations:
 
@@ -229,7 +252,7 @@ rentaltbl %>%
 show_query()
 ```
 
-You can use `collect` to pull down all of the data (tell `dbplyr` to stop being lazy).
+You can use `collect` to pull down all of the data (tell `dplyr` to stop being lazy).
 
 ```{r, echo=TRUE}
 # First, without collecting
@@ -242,14 +265,20 @@ df1
 Looks OK, except:
 
 ```{r, eval=FALSE}
-df[1,]
+df1[1,]
 ```
 
 Gives you:
 
-`Error in df[1, ] : object of type 'closure' is not subsettable`
+`Error in df1[1, ] : incorrect number of dimensions`
+
+The dimensions are wrong because `df1` isn't actually a data.frame:
+
+```{r}
+str(df1)
+```
 
-Which is a strange error, but it is telling us we need to collect the data.
+This tells us we need to collect the data first to actually pull it into R.
 
 ```{r, echo=TRUE}
 # Then with collecting
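To see the lazy-versus-collected difference described in this hunk without the dvdrental database, here is a minimal sketch. It assumes dplyr, dbplyr, DBI and RSQLite are installed; the toy `rental` table is made up for illustration.

```r
# Sketch only: assumes dplyr, dbplyr, DBI and RSQLite are installed.
# The toy `rental` table stands in for the dvdrental data.
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "rental", data.frame(
  rental_id = 1:4,
  customer_id = c(1L, 1L, 2L, 3L)
))

# Lazy: this only builds SQL; no rows are pulled into R yet
lazy <- tbl(con, "rental") %>%
  count(customer_id)
show_query(lazy)

# collect() runs the query and returns an ordinary local tibble
local_df <- collect(lazy)
print(local_df[1, ])  # row indexing works on the collected data

dbDisconnect(con)
```

`lazy` is a query object rather than a data frame, which is why base-R indexing on it fails until `collect()` has been called.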
@@ -276,14 +305,14 @@ custtbl %>%
 ```
 
 
-You could create a table with `copy_to` (if you have write permissions)
+You could create a table with `copy_to` (if you have the correct permissions)
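A minimal sketch of `copy_to`, assuming dplyr, dbplyr, DBI and RSQLite are installed. The data frame `scores` and the table name `mynewtable` are placeholders. With dbplyr, `copy_to()` creates a temporary table unless `temporary = FALSE` is passed.

```r
# Sketch only: assumes dplyr, dbplyr, DBI and RSQLite are installed.
# `scores` and "mynewtable" are placeholder names for illustration.
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
scores <- data.frame(id = 1:3, score = c(10, 20, 30))

# temporary = FALSE asks for a regular table; dbplyr's default is a temporary one
remote <- copy_to(con, scores, name = "mynewtable", temporary = FALSE)

tables <- dbListTables(con)
# The returned object is a lazy table, so verbs run in the database
back <- remote %>% filter(score > 15) %>% collect()
print(back)
dbDisconnect(con)
```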