Note: new exercises coming soon.

Exercise: Explore Differences in Selecting from `data.frame`

Make a data frame, called mydata, that has 3 variables: var1 which has the numbers 1 to 10, var2 which has the letters A-J, and var3 which has any 10 numbers you want to include.

After you’ve made the data frame, add a fourth column, month, containing the names of the first 10 months:

mydata<-data.frame(var1=1:10, var2=LETTERS[1:10], 
                   var3=c(9,4.5,33,45.6,-14,3,7,0.2,0,7))
mydata$month <- month.name[1:10]
mydata

##    var1 var2  var3     month
## 1     1    A   9.0   January
## 2     2    B   4.5  February
## 3     3    C  33.0     March
## 4     4    D  45.6     April
## 5     5    E -14.0       May
## 6     6    F   3.0      June
## 7     7    G   7.0      July
## 8     8    H   0.2    August
## 9     9    I   0.0 September
## 10   10    J   7.0   October

Run each command below to figure out what type of data you get back.

Hint: Use the function typeof() to examine what is returned in each case.

mydata[1]
mydata[[1]]
mydata$var2
mydata["var2"]
mydata[1, 1]
mydata[, 1]
mydata[1,]
mydata[-1,]

Answer

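Why is mydata$var2 of type integer? Factors are stored as integers. Why is var2 a factor in the first place? Take a look at the help page for data.frame to see if you can find the option that turned the letters into factors. Note that this default changed in R 4.0.0: in current versions of R, var2 stays a character vector, so typeof(mydata$var2) returns "character".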
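One way to check all eight commands at once is sketched below. It assumes you pass stringsAsFactors = FALSE explicitly, so the results are the same in every version of R; the comments give the typeof() result for each command.

```r
# Recreate mydata; stringsAsFactors = FALSE keeps var2 as character in any R version
mydata <- data.frame(var1 = 1:10, var2 = LETTERS[1:10],
                     var3 = c(9, 4.5, 33, 45.6, -14, 3, 7, 0.2, 0, 7),
                     stringsAsFactors = FALSE)
mydata$month <- month.name[1:10]

typeof(mydata[1])       # "list": single brackets return a one-column data.frame
typeof(mydata[[1]])     # "integer": double brackets extract the column itself
typeof(mydata$var2)     # "character": $ also extracts the column itself
typeof(mydata["var2"])  # "list": a one-column data.frame, selected by name
typeof(mydata[1, 1])    # "integer": a single value
typeof(mydata[, 1])     # "integer": a single column is simplified to a vector
typeof(mydata[1, ])     # "list": a one-row data.frame
typeof(mydata[-1, ])    # "list": every row except the first, still a data.frame

# A factor version of var2 shows why older R versions reported "integer"
typeof(factor(mydata$var2))  # "integer": factor levels are stored as integer codes
class(mydata[1])             # "data.frame": class() separates data.frames from plain lists
```

typeof() reports how an object is stored, so every data.frame shows up as "list"; class() is the better tool when you want to know whether you got a data.frame or a vector back.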

Exercise: Working with a `data.frame`

R has some built-in data sets. One of them is called iris. You can use the iris object (it’s a data.frame) directly, without creating it first.

Hint: if you want iris to show up in the Environment tab, load it into your environment with data(iris). Otherwise, you can still use it, but it may not show in that tab.

Get the dimensions of iris. Then get its column names.

Output the first 10 rows of iris.

View the iris data frame in the RStudio data viewer.

Select from the iris data frame the observations where the Sepal.Width is less than 2.5. Do the same, but add the condition that the Sepal.Length must also be less than 5.

Using which.max, select a row with the maximum Sepal.Length. Is there only one row with the maximum value?

Rename the column Petal.Width to petalwidth. Challenge: do this without hard-coding the column number.

Answer

dim(iris)

## [1] 150   5

names(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"

head(iris, 10)

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa

View(iris) # or click on the name iris in the Environment tab to the right if you've loaded it there.

iris[iris$Sepal.Width < 2.5,]

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 42           4.5         2.3          1.3         0.3     setosa
## 54           5.5         2.3          4.0         1.3 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 120          6.0         2.2          5.0         1.5  virginica

iris[iris$Sepal.Width < 2.5 & iris$Sepal.Length < 5,]

##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 42          4.5         2.3          1.3         0.3     setosa
## 58          4.9         2.4          3.3         1.0 versicolor

iris[which.max(iris$Sepal.Length),] # which.max returns only the first index of a maximum value

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4           2 virginica

iris[iris$Sepal.Length == max(iris$Sepal.Length),] # check for other max Sepal.Length rows

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4           2 virginica

names(iris)[names(iris)=='Petal.Width'] <- 'petalwidth'
names(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "petalwidth"  
## [5] "Species"

Exercise: Working with a `data.frame`: `mtcars`

Another built-in data set is called mtcars.

Explore mtcars. Look at the help page for mtcars to see the variable definitions.

Get the cars with 6 cylinders
Get just the weight (wt) column
Make a new data frame called fuel_efficient that contains the cars with mpg > 25
Do the fuel efficient cars have lower average horsepower (hp) than the overall average?
Challenge: Which car with 3 gears has the highest horsepower (hp)? Hint: subset first, save the result as a new variable, then find the maximum

Answer

mtcars[mtcars$cyl==6,]

##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

mtcars$wt # or mtcars[,"wt"]

##  [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440
## [12] 4.070 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520
## [23] 3.435 3.840 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780

fuel_efficient <- mtcars[mtcars$mpg > 25,]
mean(fuel_efficient$hp) < mean(mtcars$hp)

## [1] TRUE
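Printing the two averages side by side shows how large the gap is. This short sketch recreates fuel_efficient so it runs on its own; the values in the comments come from the built-in mtcars data.

```r
fuel_efficient <- mtcars[mtcars$mpg > 25, ]  # cars with mpg above 25
nrow(fuel_efficient)      # 6 cars qualify
mean(fuel_efficient$hp)   # 75.5
mean(mtcars$hp)           # 146.6875
```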

tmp <- mtcars[mtcars$gear==3,]
tmp[which.max(tmp$hp),]

##             mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Duster 360 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
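As with iris above, which.max() returns only the first row that reaches the maximum. Checking for ties with == shows that two 3-gear cars share the top horsepower:

```r
tmp <- mtcars[mtcars$gear == 3, ]             # cars with 3 gears
# Keep every row whose hp equals the maximum, not just the first one
tmp[tmp$hp == max(tmp$hp), c("hp", "gear")]   # Duster 360 and Camaro Z28, both 245 hp
```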

Challenge Exercise: Use `subset`

Look up the help for the subset function. Can you use it to repeat some of the operations in the two exercises above?

Note: we’ll cover this function later in the workshop.
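If you want to check your work, here is one possible sketch that repeats a few of the earlier selections with subset(). Inside subset(), column names such as Sepal.Width can be used without the data frame name.

```r
# iris rows with Sepal.Width < 2.5 and Sepal.Length < 5 (same rows as the [ ] version)
subset(iris, Sepal.Width < 2.5 & Sepal.Length < 5)

# mtcars rows with 6 cylinders
subset(mtcars, cyl == 6)

# Only the wt column; select = keeps the result as a one-column data frame
subset(mtcars, select = wt)

# The fuel-efficient cars, saved as a new data frame
fuel_efficient <- subset(mtcars, mpg > 25)
```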

Exercise: Select elements from a list

The repurrrsive package has example list objects in it. Load the package (install first if needed). Then:

View the sw_people list with the View function. What’s this a list of?
Get the contents of the first element in the list
Get the name of the first person in the list
What’s the gender of the last person in the list?

Answer

install.packages("repurrrsive") # only needed if the package isn't installed yet

library(repurrrsive)

View(sw_people)

sw_people[[1]]

## $name
## [1] "Luke Skywalker"
## 
## $height
## [1] "172"
## 
## $mass
## [1] "77"
## 
## $hair_color
## [1] "blond"
## 
## $skin_color
## [1] "fair"
## 
## $eye_color
## [1] "blue"
## 
## $birth_year
## [1] "19BBY"
## 
## $gender
## [1] "male"
## 
## $homeworld
## [1] "http://swapi.co/api/planets/1/"
## 
## $films
## [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/"
## [3] "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
## [5] "http://swapi.co/api/films/7/"
## 
## $species
## [1] "http://swapi.co/api/species/1/"
## 
## $vehicles
## [1] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
## 
## $starships
## [1] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
## 
## $created
## [1] "2014-12-09T13:50:51.644000Z"
## 
## $edited
## [1] "2014-12-20T21:17:56.891000Z"
## 
## $url
## [1] "http://swapi.co/api/people/1/"

sw_people[[1]]$name

## [1] "Luke Skywalker"

sw_people[[length(sw_people)]]$gender

## [1] "female"

Exercise: Fix subsetting errors

A researcher is trying to select observations from the iris data frame for the setosa species, but gets an error:

iris[Species="setosa"]

## Error in `[.data.frame`(iris, Species = "setosa"): unused argument (Species = "setosa")

Correct this expression to select the observations the researcher wants. Hint: there may be more than 1 thing wrong.

Answer

There are three mistakes, listed in the order in which R reports them: 1) use ==, not =, to test for equality; 2) include the name of the data frame before the column name: iris$Species; and 3) add a comma after the condition so that all columns of the data frame are selected.

iris[iris$Species=="setosa",]

##    Sepal.Length Sepal.Width Petal.Length petalwidth Species
## 1           5.1         3.5          1.4        0.2  setosa
## 2           4.9         3.0          1.4        0.2  setosa
## 3           4.7         3.2          1.3        0.2  setosa
## 4           4.6         3.1          1.5        0.2  setosa
## 5           5.0         3.6          1.4        0.2  setosa
## 6           5.4         3.9          1.7        0.4  setosa
## 7           4.6         3.4          1.4        0.3  setosa
## 8           5.0         3.4          1.5        0.2  setosa
## 9           4.4         2.9          1.4        0.2  setosa
## 10          4.9         3.1          1.5        0.1  setosa
## 11          5.4         3.7          1.5        0.2  setosa
## 12          4.8         3.4          1.6        0.2  setosa
## 13          4.8         3.0          1.4        0.1  setosa
## 14          4.3         3.0          1.1        0.1  setosa
## 15          5.8         4.0          1.2        0.2  setosa
## 16          5.7         4.4          1.5        0.4  setosa
## 17          5.4         3.9          1.3        0.4  setosa
## 18          5.1         3.5          1.4        0.3  setosa
## 19          5.7         3.8          1.7        0.3  setosa
## 20          5.1         3.8          1.5        0.3  setosa
## 21          5.4         3.4          1.7        0.2  setosa
## 22          5.1         3.7          1.5        0.4  setosa
## 23          4.6         3.6          1.0        0.2  setosa
## 24          5.1         3.3          1.7        0.5  setosa
## 25          4.8         3.4          1.9        0.2  setosa
## 26          5.0         3.0          1.6        0.2  setosa
## 27          5.0         3.4          1.6        0.4  setosa
## 28          5.2         3.5          1.5        0.2  setosa
## 29          5.2         3.4          1.4        0.2  setosa
## 30          4.7         3.2          1.6        0.2  setosa
## 31          4.8         3.1          1.6        0.2  setosa
## 32          5.4         3.4          1.5        0.4  setosa
## 33          5.2         4.1          1.5        0.1  setosa
## 34          5.5         4.2          1.4        0.2  setosa
## 35          4.9         3.1          1.5        0.2  setosa
## 36          5.0         3.2          1.2        0.2  setosa
## 37          5.5         3.5          1.3        0.2  setosa
## 38          4.9         3.6          1.4        0.1  setosa
## 39          4.4         3.0          1.3        0.2  setosa
## 40          5.1         3.4          1.5        0.2  setosa
## 41          5.0         3.5          1.3        0.3  setosa
## 42          4.5         2.3          1.3        0.3  setosa
## 43          4.4         3.2          1.3        0.2  setosa
## 44          5.0         3.5          1.6        0.6  setosa
## 45          5.1         3.8          1.9        0.4  setosa
## 46          4.8         3.0          1.4        0.3  setosa
## 47          5.1         3.8          1.6        0.2  setosa
## 48          4.6         3.2          1.4        0.2  setosa
## 49          5.3         3.7          1.5        0.2  setosa
## 50          5.0         3.3          1.4        0.2  setosa
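To see each of the three mistakes in turn, you can fix them one at a time. The sketch below wraps each attempt in tryCatch() so that the error message is returned instead of stopping the script; show_error() is a hypothetical helper defined only for this illustration.

```r
# Hypothetical helper: evaluate an expression and return its error message, if any
show_error <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

# Fix 1 only (== instead of =): Species is not a variable outside the data frame
show_error(iris[Species == "setosa"])       # "object 'Species' not found"

# Fixes 1 and 2: without a comma, the logical vector is used to select columns
show_error(iris[iris$Species == "setosa"])  # "undefined columns selected"

# All three fixes: the condition now selects rows, and all columns are kept
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)                                 # 50
```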

Exercise: Counting with Conditionals

Using the built-in data set mtcars:

How many cars have mpg greater than 30?
How many have horsepower (hp) less than 100?

Hint: you can use the sum function to count the number of TRUE observations in a vector (as we did with sum(is.na(x))). This works because as.numeric(TRUE) == 1 and as.numeric(FALSE) == 0.

Answer

sum(mtcars$mpg > 30)

## [1] 4

sum(mtcars$hp < 100)

## [1] 9
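The hint can be checked directly: coercing logicals to numbers shows the 1/0 encoding that sum() relies on. Taking mean() of the same logical vector gives the proportion instead of the count.

```r
as.numeric(c(TRUE, FALSE))   # 1 0: TRUE counts as 1, FALSE as 0
high_mpg <- mtcars$mpg > 30  # one TRUE/FALSE value per car
sum(high_mpg)                # 4: the number of cars with mpg > 30
mean(high_mpg)               # 0.125: the proportion of the 32 cars (4 / 32)
```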

Challenge Exercise: Matrix Manipulation

Using information from Linear Algebra in R by Søren Højsgaard if needed, create a 10 x 5 matrix of random normal numbers and a 5 x 1 vector of values 1 to 5 and multiply them using matrix multiplication. Transpose your result.

Create a 6 x 6 matrix of random normal draws and take its inverse (solve it). Extract the diagonal from the result.

Hint: Generate random normal draws with rnorm: look it up to see the options.

Hint: Matrix multiplication operator is %*%.

Answer

X <- matrix(rnorm(50), nrow=10)
y <- 1:5
t(X %*% y) # t() is for transpose

##           [,1]      [,2]      [,3]     [,4]      [,5]     [,6]      [,7]
## [1,] -6.186134 -8.698666 -13.45169 9.365158 -11.67129 6.173199 -2.014841
##          [,8]       [,9]      [,10]
## [1,] 7.671724 -0.8886068 -0.1088036
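In the answer above, y is a plain vector, and %*% treats it as a 5 x 1 column. A minimal sketch that makes the 5 x 1 shape explicit with matrix() and checks the dimensions at each step (set.seed(123) is only there to make the draws reproducible):

```r
set.seed(123)                      # any integer makes the random draws reproducible
X <- matrix(rnorm(50), nrow = 10)  # 10 x 5 matrix of standard normal draws
y <- matrix(1:5, ncol = 1)         # explicit 5 x 1 column vector
result <- X %*% y                  # (10 x 5) %*% (5 x 1) gives a 10 x 1 matrix
dim(result)                        # 10 1
dim(t(result))                     # 1 10 after transposing
```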

X <- matrix(rnorm(36), nrow=6)
solve(X)

##              [,1]      [,2]       [,3]       [,4]        [,5]       [,6]
## [1,]  0.898008661 -4.371979  1.5687571 -1.6835242  0.68364196  1.1551904
## [2,]  0.307268033 -1.911628  0.8315138 -1.0306877  0.32797496  0.6538460
## [3,]  0.217433175 -1.684304  0.8171005 -0.5388321  0.33416961  0.7782129
## [4,] -0.005916087 -1.897348  0.7975908 -0.9158237 -0.03031322  0.4244748
## [5,]  0.487403907 -2.252167  1.1872060 -0.7517260  0.73619933  0.3407939
## [6,] -0.193256628  1.992749 -0.3031673  0.7206709 -0.52226175 -0.5259394

diag(solve(X))

## [1]  0.8980087 -1.9116280  0.8171005 -0.9158237  0.7361993 -0.5259394

Your numbers will differ from those shown here because the draws are random. To get the same numbers every time you run the code, first set the random seed with set.seed(123), where 123 can be any integer.
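One way to confirm that solve() returned the inverse is to multiply it back: X times its inverse should be the 6 x 6 identity matrix, up to floating-point rounding. This sketch uses all.equal(), which compares numbers with a small tolerance, rather than ==.

```r
set.seed(123)                     # reproducible random draws
X <- matrix(rnorm(36), nrow = 6)  # 6 x 6 matrix of standard normal draws
X_inv <- solve(X)                 # solve() with one argument returns the inverse
# Should return TRUE: the product matches diag(6) up to rounding error
all.equal(X %*% X_inv, diag(6))
diag(X_inv)                       # the diagonal of the inverse, as in the exercise
```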

Challenge Exercise: List Indexing

Select the vector of letters from the following object.

Do it both using names (printing the object may help) and using index numbers.

Extra hint: use the object viewer in RStudio to view the list after you create it. See if there’s any functionality to help you with this challenge.

nested_list <- list(level1 = list(level2 = list(letters = LETTERS)))

Answer

nested_list$level1$level2$letters

##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

nested_list[[1]][[1]][[1]]

##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
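You can also mix the two approaches. Using [[ with a name at each level is equivalent to $, and passing a vector to a single [[ indexes recursively, one level per element (see the help page for Extract):

```r
nested_list <- list(level1 = list(level2 = list(letters = LETTERS)))

by_name   <- nested_list[["level1"]][["level2"]][["letters"]]  # [[ with a name at each level
recursive <- nested_list[[c("level1", "level2", "letters")]]    # one [[ with a vector of names
by_index  <- nested_list[[c(1, 1, 1)]]                          # one [[ with a vector of indices

identical(by_name, LETTERS)    # TRUE
identical(recursive, LETTERS)  # TRUE
identical(by_index, LETTERS)   # TRUE
```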

Challenge Exercise: Use `apply`

Use the apply function to get the average value of each variable in mtcars. You’ll need to look at the help page to figure out how to use it.

Answer

apply(mtcars, 2, mean)

##        mpg        cyl       disp         hp       drat         wt 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250 
##       qsec         vs         am       gear       carb 
##  17.848750   0.437500   0.406250   3.687500   2.812500
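Here MARGIN = 2 tells apply() to work on columns (1 would mean rows). apply() converts the data frame to a matrix first, which is fine here because every mtcars column is numeric. For column means, colMeans() and sapply() give the same values; this sketch checks that with all.equal():

```r
apply_means  <- apply(mtcars, 2, mean)  # MARGIN = 2: apply mean() to each column
col_means    <- colMeans(mtcars)        # built-in shortcut for column means
sapply_means <- sapply(mtcars, mean)    # a data frame is a list of columns

all.equal(apply_means, col_means)       # TRUE
all.equal(apply_means, sapply_means)    # TRUE
```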

Exercises Part 2: Data Frames and Other Data Structures

Christina Maimone

2018-06-27
