Exercise: Selecting Random Data

One thing that isn’t easy to do with SQL is selecting random rows. There are functions to generate random values, but generating a new random column on a large table, and then sorting by that column (or computing the max value and then selecting an observation) is costly. This is one scenario when working with a database from R (or another program) is useful.

Use the code below to create a table in the database. Then figure out how to select 3 random rows from that table (as if you didn’t have access to the code or values that created the table). Do this without reading the entire table into R. Hint: you’ll probably want to use some combination of sorting the table, limiting the number of rows you retrieve, and offsetting results (learn more here or here if you need a refresher).

Remember to set the connection information as appropriate.

The new argument bigint in the connection function tells the package how to translate data from the database into R.

library(RPostgres)
con <- dbConnect(RPostgres::Postgres(), host="localhost", dbname="workshop",
                 bigint="numeric") 

make_random_name <- function() {
  paste(sample(LETTERS, 5), collapse="")
}

measure <- data.frame(id=sample(1:1000, 100), 
                      name=replicate(100, make_random_name()))

dbWriteTable(con, "measure", measure,
             row.names = FALSE,
             overwrite=TRUE) # in case code is run multiple times

Exercise: dbplyr

Connect to the dvdrental database. Repeat Exercise: Joining and Grouping 2 from Part 2 using dbplyr.

library(dbplyr)
library(dplyr)

Exercise: Create and Populate Tables

Connect to a database where you have permission to create tables. (If you’re doing this on your own, you could use SQLite instead of PostgreSQL and write a local database.)

Create three tables with appropriate columns:

Populate the tables with information about your friends and/or family. Hint: think about how you want to handle IDs for people so that you can use them in the relationship table. Hint 2: think about how to make it clear in the relationship table what the direction of the relationship is (e.g. who is the child and who is the parent?). Hint 3: you can use dbWriteTable to add data to an existing table – set the append argument.

Once you have the tables created, select data from the database and print out sentences describing the family relationships.

Note: the process of creating database tables and populating them from R is a bit messy and doesn’t have a good natural workflow. This exercise is to help you get used to doing it, but you generally want to avoid creating databases from R in most circumstances. Use R to select data from databases, not create databases (at least stable, production ones).