One thing that isn’t easy to do with SQL is selecting random rows. There are functions to generate random values, but generating a new random column on a large table, and then sorting by that column (or computing the max value and then selecting an observation) is costly. This is one scenario when working with a database from R (or another program) is useful.
Use the code below to create a table in the database. Then figure out how to select 3 random rows from that table (as if you didn’t have access to the code or values that created the table). Do this without reading the entire table into R. Hint: you’ll probably want to use some combination of sorting the table, limiting the number of rows you retrieve, and offsetting results (learn more here or here if you need a refresher).
Remember to set the connection information as appropriate.
The new argument bigint
in the connection function tells the package how to translate data from the database into R.
library(RPostgres)
con <- dbConnect(RPostgres::Postgres(), host="localhost", dbname="workshop",
bigint="numeric")
make_random_name <- function() {
paste(sample(LETTERS, 5), collapse="")
}
measure <- data.frame(id=sample(1:1000, 100),
name=replicate(100, make_random_name()))
dbWriteTable(con, "measure", measure,
row.names = FALSE,
overwrite=TRUE) # in case code is run multiple times
dbplyr
Connect to the dvdrental database. Repeat Exercise: Joining and Grouping 2 from Part 2 using dbplyr
.
library(dbplyr)
library(dplyr)
Connect to a database where you have permission to create tables. (If you’re doing this on your own, you could use SQLite instead of PostgreSQL and write a local database.)
Create three tables with appropriate columns:
person
: at least an ID and name, maybe other characteristics of a personrelationship
: links people together and labels it with a relationship typerelationship_type
: a table defining the allowed set of relationship types in the relationship
tablePopulate the tables with information about your friends and/or family. Hint: think about how you want to handle IDs for people so that you can use them in the relationship table. Hint 2: think about how to make it clear in the relationship table what the direction of the relationship is (e.g. who is the child and who is the parent?). Hint 3: you can use dbWriteTable
to add data to an existing table – set the append
argument.
Once you have the tables created, select data from the database and print out sentences describing the family relationships.
Note: the process of creating database tables and populating them from R is a bit messy and doesn’t have a good natural workflow. This exercise is to help you get used to doing it, but you generally want to avoid creating databases from R in most circumstances. Use R to select data from databases, not create databases (at least stable, production ones).