databases_workshop

Databases

There are four databases workshops that cover parts of the material in this repo.

The workshops use PostgreSQL. Much of the material applies generically to SQL and other relational database systems, but some of it is specific to PostgreSQL.

Some of the workshops use the database discussed in, and follow much of the content of, the PostgreSQL Tutorial, which makes use of the Saklia database.

Selecting and Joining Data

Intro presentation

The workshop follows the files below, which have links to the exercises.

Part 1

Part 2

Exploring Data

See Exploring Data for materials and exercises.

This workshop uses data files from the nycflights13 R package and open data from the city of Evanston.

Updating and Changing Data

Part 3

Part 4

Creating and Designing Databases

Creating and Designing Materials

R and Python

The R and Python materials may also be of interest to those taking any of the above workshops.

Note

These materials are created for an in-person workshop. Participants will be connecting to a database server that will only be active for the duration of the workshop. In-person workshop participants do not need to install PostgreSQL.

Those wishing to work through the materials on their own will need to install PostgreSQL on their own systems or run a database instance/server via a cloud computing provider.

See Section 1. Getting started with PostgreSQL, from the PostgreSQL Tutorial for details on how to set up your own database server.

Supplementary Software

While in-person workshop participants do not need to install PostgreSQL, you will need a terminal program capable of creating an SSH connection to a remote server. On a Mac, the built-in Terminal program will work. On Windows, we suggest PuTTY if you don’t already have another program installed.

This repository also includes materials for connecting to a database using Python or R. For Python, you will need to install the psycopg2 package. For R, you will need the package RPostgres.

Resources

Background

Basic Explanation of Relational Databases: from the BBC, a quick explanation of relational databases

Software

DataGrip Tutorial: video on how to use the DataGrip program; it even uses the same database we use in this workshop.

Reference

PostgreSQL cheat sheet: a list of basic commands and patterns for statements

PostgreSQL Documentation: official documentation

dvdrental Diagram: entity-relationship diagram for the database used in the workshop

psql commands cheat sheet: describe commands, other slash commands

Additional Exercises/Tutorials

These resources use PostgreSQL or SQL generally.

Mode SQL Tutorials: good introduction, and you can try running queries on their platform

PostgreSQL Exercises: interactive, online exercises to practice SQL skills in a PostgreSQL environment.

SQL Tutorial: from SQL Zoo. Not specific to PostgreSQL. Also has interactive exercises.

Learn SQL from Code Academy with a free interactive online course

Try SQL: from Code School; the basic course is free. Interactive, online tutorial.

Intermediate SQL Tutorial: intermediate level SQL tutorial from Dataquest; uses PostgreSQL and Python (pandas, psycopg2) and includes exercises. For when you’re ready for more practice beyond the basics.

Welcome to SQL from Khan Academy provides a one-hour intro to databases that includes writing SQL to create tables, insert data, and query data.

Intro to SQL: Querying and managing data from Khan Academy

Intro to SQL from Kaggle uses Google’s BigQuery instead of PostgreSQL. Commands are generally standard SQL though.

SQL for Data Analysis from Udacity - an online self-paced course