Exercise: If Statement

Use an if() statement to print a suitable message reporting whether there are any records from 2002 in the gapminder dataset. Now do the same for 2012.

Hint: use the any function.

library(gapminder) # install first if needed
data(gapminder)

Answer

year<-2002
if(any(gapminder$year == year)){
   print(paste("Record(s) for the year",year,"found."))
} else {
  print(paste("No records for year",year))
}
## [1] "Record(s) for the year 2002 found."

Exercise: Loop and If Statements with gapminder

Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.

Then, modify the script to loop over each country. This time print out whether the life expectancy is smaller than 50, between 50 and 70, or greater than 70.

Answer

thresholdValue <- 50

for(iContinent in unique(gapminder$continent)) {
   tmp <- mean(gapminder$lifeExp[gapminder$continent==iContinent])
   
   if(tmp < thresholdValue) {
       print(paste("Average Life Expectancy in", iContinent, "is less than", thresholdValue))
   } else {
       print(paste("Average Life Expectancy in", iContinent, "is greater than", thresholdValue))
   } # end if else condition
   
} # end for loop
## [1] "Average Life Expectancy in Asia is greater than 50"
## [1] "Average Life Expectancy in Europe is greater than 50"
## [1] "Average Life Expectancy in Africa is less than 50"
## [1] "Average Life Expectancy in Americas is greater than 50"
## [1] "Average Life Expectancy in Oceania is greater than 50"
lowerThreshold <- 50
upperThreshold <- 70
 
for(iCountry in unique(gapminder$country)){
    tmp <- mean(gapminder$lifeExp[gapminder$country==iCountry])
    
    if(tmp < lowerThreshold){
        print(paste("Average Life Expectancy in", iCountry, "is less than", lowerThreshold))
    }
    else if(tmp > lowerThreshold & tmp < upperThreshold){
        print(paste("Average Life Expectancy in", iCountry, "is between", lowerThreshold, "and", upperThreshold))
    }
    else{
        print(paste("Average Life Expectancy in", iCountry, "is greater than", upperThreshold))
    }
   
}

Ouput for above not printed in the interest of space.

Exercise: Loop with mtcars

Loop through the observations in the built-in mtcars and print the name of the car (it’s the rowname). For each car, also print out whether it has a “manual” or “automatic” transmission.

Hint: Look at the help page for mtcars to see which column refers to transmission type and which value equals which type.

data(mtcars) #load the data

Answer

for (i in 1:nrow(mtcars)) {
  if (mtcars$am[i] == 0) {
    transtype <- "automatic"
  } else {
    transtype <- "manual"
  }
  print(paste(rownames(mtcars)[i], transtype, sep=': '))
}
## [1] "Mazda RX4: manual"
## [1] "Mazda RX4 Wag: manual"
## [1] "Datsun 710: manual"
## [1] "Hornet 4 Drive: automatic"
## [1] "Hornet Sportabout: automatic"
## [1] "Valiant: automatic"
## [1] "Duster 360: automatic"
## [1] "Merc 240D: automatic"
## [1] "Merc 230: automatic"
## [1] "Merc 280: automatic"
## [1] "Merc 280C: automatic"
## [1] "Merc 450SE: automatic"
## [1] "Merc 450SL: automatic"
## [1] "Merc 450SLC: automatic"
## [1] "Cadillac Fleetwood: automatic"
## [1] "Lincoln Continental: automatic"
## [1] "Chrysler Imperial: automatic"
## [1] "Fiat 128: manual"
## [1] "Honda Civic: manual"
## [1] "Toyota Corolla: manual"
## [1] "Toyota Corona: automatic"
## [1] "Dodge Challenger: automatic"
## [1] "AMC Javelin: automatic"
## [1] "Camaro Z28: automatic"
## [1] "Pontiac Firebird: automatic"
## [1] "Fiat X1-9: manual"
## [1] "Porsche 914-2: manual"
## [1] "Lotus Europa: manual"
## [1] "Ford Pantera L: manual"
## [1] "Ferrari Dino: manual"
## [1] "Maserati Bora: manual"
## [1] "Volvo 142E: manual"

Remember that there are other ways to write the loop and conditional statements that are still valid.

Exercise: ifelse

Use the ifelse function to add a new column to mtcars, transmission_name, that contains “automatic” or “manual” as appropriate.

Answer

mtcars$transmission_name <- ifelse(mtcars$am == 0, "automatic", "manual")

Exercise: Write Functions

Create a function that given a data frame will print the name of each column and the class of data it contains. An example test data frame is supplied below. Hint: Use mode() to get the class of the data in each column.

testdf <- data.frame(val1=1:5, val2=c("A","B","C","D","E"), 
                     stringsAsFactors = FALSE)

Create a function that given a vector and an integer will return how many times the integer appears in the vector.

Create a function that given a vector will print the mean and the standard deviation, it will optionally also print the median.

Answer

Note: Some of these were taken or modified from https://www.r-bloggers.com/functions-exercises/

data_frame_info <- function(df) {
  for (col in names(df)) {
    print(paste0(col, ": ", mode(df[,col])))
  }
}
data_frame_info(testdf)
## [1] "val1: numeric"
## [1] "val2: character"
appearances <- function(x, val) {
  sum(x == val)
}
appearances(c(1,2,2,3,3,3,4), 4)
## [1] 1
vector_info <- function(x, include_median=FALSE) {
  print(paste("Mean:", mean(x)))
  print(paste("Standard Deviation:", sd(x)))
  if (include_median) {
    print(paste("Median:", median(x)))
  }
}
vector_info(c(1,2,2,3,3,3,4))
## [1] "Mean: 2.57142857142857"
## [1] "Standard Deviation: 0.975900072948533"
vector_info(c(1,2,2,3,3,3,4), TRUE)
## [1] "Mean: 2.57142857142857"
## [1] "Standard Deviation: 0.975900072948533"
## [1] "Median: 3"

Challenge Exercise: The Birthday Problem

Say we choose 25 people at random. What is the probability two or more of them have the same birthday?

Write R code to figure this out by sampling birthdays (instead of solving it theoretically). Generalize your code to figure out how the probability changes as you increase the number of people in the room. Compute for 5 people up to 50 people, at increments of 5. If you know how to make a plot in R, plot the probability as a function of the number of people.

Model simplifications:

Hints:

Note: Learn more about the birthday problem here.

Answer

Let’s start by thinking of one room. We need to sample 25 birthdays and check how many are repeated. We can simulate this by sampling from a vector of values of 1 to 365, since that represents all possible birthdays.

set.seed(94705) # to ensure answer text below makes sense
birthdays <- sample(1:365, 25, replace=TRUE)
birthdays
##  [1] 135 225 191 242 351 276  10 214 330 288 242 322 129 234  21  86  77
## [18]  17 103  46  23 289  85  54 174

We now want to know how many of those birthdays are repeated.

duplicated(birthdays)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE
sum(duplicated(birthdays))
## [1] 1

One birthday is a duplicate. Two people have the same birthday!

Now we need to repeat this process many times, to see how frequently it happens.

We start by writing the function; make the number of people an argument since we’ll want to change it later:

birthday_function <- function(people=25){
    
    # we populate the room
    birthdays <- sample(1:365, people, replace=TRUE)

    # get the unique number of bdays
    shared_bdays <- sum(duplicated(birthdays))

    # return whether at least one bday is repeated.
    return(shared_bdays>0)
}
set.seed(94705) # use same seed to make sure everything is working
birthday_function(people=25)
## [1] TRUE

Thus, we have one repeat birthday!

Now we can use replicate to repeat the process 1,000 times:

many_sims <- replicate(1000, birthday_function()) # people defaults to 25

To approximate the probability of at least one matching birthday, we can just take the mean of this vector (it’s a vector of booleans, but mean will convert to numeric:

mean(many_sims)
## [1] 0.578

Now want to use R to see how this probability changes as the number of people in the room changes. How can we do this? We can use our function but incorporating it into a loop which varies the number of people in the room:

sims <- data.frame(numpeople=seq(5,50,5),
                   prob=NA)

for(i in 1:nrow(sims)){
    many_sims <- replicate(1000, birthday_function(people=sims$numpeople[i]))
    sims[i,"prob"] <- mean(many_sims)
}

sims
##    numpeople  prob
## 1          5 0.039
## 2         10 0.125
## 3         15 0.250
## 4         20 0.422
## 5         25 0.593
## 6         30 0.689
## 7         35 0.827
## 8         40 0.881
## 9         45 0.942
## 10        50 0.971

Plot the probability:

plot(sims$numpeople, sims$prob, 
     pch=16, col="blue", type="b",# just nice settings
     xlab="Number of People", 
     ylab="Probability of at Least One Match",
     main="Birthday Problem Results")

Challenge Exercise: Write a Substitution Cipher

Note: This challenge requires multiple functions/ideas not covered in the workshop.

Write a function that encodes a string with a substitution cipher.

Do this by:

Then also write a function that decodes from a supplied cipher text and an encoded string.

Check your functions by making them call each other.

Hint: You may want to convert all text to upper or lower case to keep everything consistent. There’s a toupper function. What are you going to do with spaces and punctuation?

Hint 2: Look at the sample function to randomly generate a cipher.

Hint 3: Use match to get the index position of a value in a vector.

Hint 4: strsplit – you can split on an empty string ''

Note: the answer provided doesn’t use it, but the chartr function might be useful.

Answer

encode <- function(input_text) {
  cipher <- sample(LETTERS, 26)
  std_input <- strsplit(toupper(input_text), '')[[1]]
  result <- rep(NA, nchar(input_text))
  for (i in 1:length(std_input)) {
    if (std_input[i] %in% LETTERS) {
      result[i] <- cipher[match(std_input[i], LETTERS)]
    } else {
      result[i] <- std_input[i]
    }
  }
  return(list(cipher=paste(cipher, collapse=""), 
              result=paste(result, collapse="")))
}

encoded_result <- encode("This is my secret text.")
encoded_result
## $cipher
## [1] "UODVRJXNGLKPCZMEAIBYSWFQHT"
## 
## $result
## [1] "YNGB GB CH BRDIRY YRQY."
decode <- function(encoded_text, cipher) {
  std_cipher <- str_split(cipher, '')[[1]]
  std_input <- strsplit(toupper(encoded_text), '')[[1]]
  result <- rep(NA, nchar(encoded_text))
  for (i in 1:length(std_input)) {
    if (std_input[i] %in% std_cipher) {
      result[i] <- LETTERS[match(std_input[i], std_cipher)]
    } else {
      result[i] <- std_input[i]
    }
  }
  return(paste(result=paste(result, collapse="")))
}

decode(encoded_result$result, encoded_result$cipher)
## [1] "THIS IS MY SECRET TEXT."