Exercise: Make a vector

Start by making a vector, named x, with the numbers 1 through 26.

Multiply the vector by 2.

Give the resulting vector names A through Z. Hint: see the documentation page for names for an example. Hint 2: there is a built-in vector called LETTERS.

Print the vector with the elements in reverse order. Hint: you don’t need a function to do this (although there is one that will); you can do it through the indices you use to select from the vector.

Answer

x <- 1:26
x <- x * 2
names(x) <- LETTERS
x
##  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y 
##  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 
##  Z 
## 52
x[26:1]
##  Z  Y  X  W  V  U  T  S  R  Q  P  O  N  M  L  K  J  I  H  G  F  E  D  C  B 
## 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 20 18 16 14 12 10  8  6  4 
##  A 
##  2
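Indexing with 26:1 works here but hard-codes the length of the vector. Below is a minimal sketch of two alternatives that don't depend on the length: building the reversed index with length(), or calling base R's rev(), which is likely the function the hint refers to. The vector is rebuilt first so the snippet runs on its own.

```r
x <- (1:26) * 2
names(x) <- LETTERS

# Build the index sequence from the vector's own length instead of typing 26
reversed_by_index <- x[length(x):1]

# rev() reverses a vector directly and keeps the names attached
reversed_by_rev <- rev(x)

identical(reversed_by_index, reversed_by_rev)  # TRUE
```

One reason to prefer rev(): for an empty vector, length(x):1 evaluates to c(0, 1), which selects an NA, whereas rev() returns the empty vector unchanged.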

Exercise: Manipulating Vectors

Set the 10th and 13th elements of the x vector you made in the last exercise to missing (NA).

Then select every other element of the vector, starting with the second. The help page for seq may be useful here.

Take the mean of the log of the values in x. Exclude missing values.

Make a second vector, badvals, that contains the values 4, 6, 8, and 10. Use it to select the values of x that aren’t in badvals. Hint: remember the not operator? Put it at the beginning of the expression inside the []. You may also want to look at the help page for match if you don’t remember how to test whether values are in another vector.

Answer

x[c(10, 13)] <- NA
x
##  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y 
##  2  4  6  8 10 12 14 16 18 NA 22 24 NA 28 30 32 34 36 38 40 42 44 46 48 50 
##  Z 
## 52
x[seq(2, length(x), 2)]
##  B  D  F  H  J  L  N  P  R  T  V  X  Z 
##  4  8 12 16 NA 24 28 32 36 40 44 48 52
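An alternative to seq() relies on R recycling a logical index to the length of the vector, so c(FALSE, TRUE) skips the first element, keeps the second, and repeats that pattern. A minimal sketch that rebuilds x from the previous steps and checks that both approaches agree:

```r
x <- (1:26) * 2
names(x) <- LETTERS
x[c(10, 13)] <- NA

# seq() builds the positions 2, 4, ..., 26 explicitly
by_position <- x[seq(2, length(x), by = 2)]

# A logical index is recycled to the length of x, so c(FALSE, TRUE)
# keeps every second element, starting with the second one
by_recycling <- x[c(FALSE, TRUE)]

identical(by_position, by_recycling)  # TRUE
```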
mean(log(x), na.rm=TRUE)
## [1] 3.042904
badvals <- c(4, 6, 8, 10)
x[!x %in% badvals]
##  A  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z 
##  2 12 14 16 18 NA 22 24 NA 28 30 32 34 36 38 40 42 44 46 48 50 52
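Two details in this answer can be checked with a small, made-up vector (the values below are illustrative, not the x from the exercise). First, %in% has higher precedence than !, so !x %in% badvals is read as !(x %in% badvals). Second, %in% is documented as a wrapper around match(): a value with no match, including NA when badvals contains no NA, gives FALSE, which is why the NA elements of x survive the filter above.

```r
x <- c(2, 4, NA, 8, 12)
badvals <- c(4, 6, 8, 10)

# %in% binds more tightly than !, so these two expressions are the same
identical(x[!x %in% badvals], x[!(x %in% badvals)])  # TRUE

# match() gives the position of each value of x in badvals, or NA if absent
match(x, badvals)  # NA 1 NA 3 NA

# %in% turns "no match" (including NA) into FALSE
x %in% badvals     # FALSE TRUE FALSE TRUE FALSE
```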

Exercise: Looking up Values

The code below generates a vector of randomly selected (lowercase) letters. Check whether your first initial is in the vector. At which index positions does it appear?

Make a table of the frequencies of the letters in random_letters. Sort the table with the sort function.

Challenge: Replace occurrences of the most frequent letter in random_letters with * (don’t worry about ties). Do this without hard-coding the most frequent letter (that is, use variables rather than typing the letter itself). Hint: you’ll probably need the names function at some point to get the most frequent letter.

random_letters <- sample(letters, 50, replace = TRUE)

Answer

Note that your output may differ from what’s below because the values are selected at random.

'c' %in% random_letters # test for inclusion
## [1] FALSE
which(random_letters == 'c') # get actual position
## integer(0)
sort(table(random_letters))
## random_letters
## d i j n p s x a b q r u w g l m y k o f v 
## 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 4 4 5 6

which returns every matching position in the vector; match would only return the index of the first match.
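To see the difference between which and match without relying on random data, here is a sketch using a small fixed vector (fixed_letters is a hypothetical stand-in for random_letters):

```r
fixed_letters <- c("b", "c", "a", "c", "c")  # hypothetical example data

which(fixed_letters == "c")  # 2 4 5: every matching position
match("c", fixed_letters)    # 2: only the first matching position
"c" %in% fixed_letters       # TRUE: just whether it occurs at all
```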

# save the table results
table_results <- sort(table(random_letters), decreasing=TRUE)

# get the name of the first element
freq_letter <- names(table_results)[1]

# use that to replace values
random_letters[random_letters == freq_letter] <- '*'

random_letters
##  [1] "o" "j" "i" "m" "k" "g" "*" "y" "p" "k" "*" "u" "*" "f" "d" "*" "f"
## [18] "*" "b" "l" "x" "w" "s" "f" "k" "q" "u" "r" "b" "o" "a" "o" "w" "n"
## [35] "q" "m" "m" "*" "y" "r" "l" "f" "o" "k" "y" "g" "l" "g" "a" "f"

If you searched the internet for an answer, you may have found other solutions.
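One such alternative, sketched below under the assumption that ties don't matter, uses which.max(), which returns the position of the first maximum in the table, named by its letter, so no sorting is needed. Because the table is ordered alphabetically, a tie would go to the letter that comes first in the alphabet. set.seed() is added only to make the random draw repeatable.

```r
set.seed(1)  # makes the random draw reproducible
random_letters <- sample(letters, 50, replace = TRUE)

counts <- table(random_letters)

# which.max() returns the index of the first maximum; its name is the letter
freq_letter <- names(which.max(counts))

# Replace every occurrence of that letter with "*"
random_letters[random_letters == freq_letter] <- "*"
```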

Exercise: rep and seq

Create a vector in which A appears 1 time, B appears 2 times, C appears 3 times, and so on through Z. How long is the resulting vector?

Answer

x <- rep(LETTERS, times=1:26)
x
##   [1] "A" "B" "B" "C" "C" "C" "D" "D" "D" "D" "E" "E" "E" "E" "E" "F" "F"
##  [18] "F" "F" "F" "F" "G" "G" "G" "G" "G" "G" "G" "H" "H" "H" "H" "H" "H"
##  [35] "H" "H" "I" "I" "I" "I" "I" "I" "I" "I" "I" "J" "J" "J" "J" "J" "J"
##  [52] "J" "J" "J" "J" "K" "K" "K" "K" "K" "K" "K" "K" "K" "K" "K" "L" "L"
##  [69] "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "M" "M" "M" "M" "M" "M" "M"
##  [86] "M" "M" "M" "M" "M" "M" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
## [103] "N" "N" "N" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O"
## [120] "O" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P" "P"
## [137] "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q" "Q"
## [154] "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R" "R"
## [171] "R" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S"
## [188] "S" "S" "S" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"
## [205] "T" "T" "T" "T" "T" "T" "U" "U" "U" "U" "U" "U" "U" "U" "U" "U" "U"
## [222] "U" "U" "U" "U" "U" "U" "U" "U" "U" "U" "V" "V" "V" "V" "V" "V" "V"
## [239] "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "V" "W" "W"
## [256] "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W"
## [273] "W" "W" "W" "W" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X"
## [290] "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "Y" "Y" "Y" "Y" "Y" "Y"
## [307] "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y" "Y"
## [324] "Y" "Y" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z"
## [341] "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z" "Z"
length(x)
## [1] 351
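The length can also be worked out without printing the vector: letter k appears k times, so the total is 1 + 2 + ... + 26, which by the formula n(n + 1)/2 is 26 * 27 / 2 = 351. A quick check in R:

```r
x <- rep(LETTERS, times = 1:26)

# Letter k appears k times, so the length is the sum 1 + 2 + ... + 26
sum(1:26)               # 351
26 * 27 / 2             # 351, the closed form n(n + 1)/2 with n = 26
length(x) == sum(1:26)  # TRUE

# table() confirms the per-letter counts (1 for A, 2 for B, ...)
table(x)
```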

Exercise: Joining Strings

To concatenate (join) character (string, text) data, use the paste function. By default, paste puts a single space between the strings it joins; you can change this with the sep argument. paste0 joins strings with no separator.

paste("Ahmad", "Haddad", sep=" ")
## [1] "Ahmad Haddad"
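A few variations on the example above, as a quick sketch of how sep and paste0 behave. The last line shows that paste also works element by element on vectors, recycling the shorter argument.

```r
paste("Ahmad", "Haddad")             # "Ahmad Haddad": default sep is " "
paste("Ahmad", "Haddad", sep = "_")  # "Ahmad_Haddad"
paste0("Ahmad", "Haddad")            # "AhmadHaddad": no separator
paste("file", 1:3, sep = "")         # "file1" "file2" "file3"
```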

Create a variable with your age. Use paste to create a sentence with your age in it.

Answer

age <- 25
paste("My age is", age, sep=": ") # just one example
## [1] "My age is: 25"

Exercise: Workspace management

Remove the x vector you created above from your workspace. Check to make sure it’s gone.

Answer

rm(x)
ls() # list all objects; check by looking
## [1] "age"            "answers"        "badvals"        "freq_letter"   
## [5] "iknit"          "objs"           "params"         "random_letters"
## [9] "table_results"
"x" %in% ls() # is "x" in the output?
## [1] FALSE
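Besides searching the output of ls(), the base function exists() answers the same question directly. A minimal sketch, with x recreated first so the snippet runs on its own; inherits = FALSE restricts the search to the workspace (the global environment), because by default exists() also looks in attached packages and could find an object with the same name there.

```r
x <- 1:3
rm(x)

# FALSE once x has been removed from the workspace
exists("x", envir = globalenv(), inherits = FALSE)
```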

You can also see the list of objects in the Environment tab in the upper-right pane of RStudio.

Exercise: Packages

Install the fortunes package. Look at the package documentation to figure out what it does, and call its main function.

Answer

install.packages("fortunes")

Alternatively, use the Packages tab in the bottom-right pane of RStudio to install the package through the GUI. To load a package in RStudio, check the box next to it in the Packages tab. To read the documentation, click the package name in the packages list.

library(fortunes) # don't forget to load it!
fortune()
## 
## Andrew Thomas: ...and if something goes wrong here it is probably not
## WinBUGS since that has been running for more than 10 years...
## Peter Green (from the back): ... and it still hasn't converged!
##    -- Andrew Thomas and Peter Green (during the talk about 'BRugs')
##       gR 2003, Aalborg (September 2003)

Challenge Exercise: Working with Strings

Part 1: Create a variable called age with your age. Using that variable, output your age as part of a sentence like “My age is 25.” Hint: look up the paste function.

Part 2: Concatenate all of the letters in LETTERS into a single string, with a “-” between each pair of adjacent letters. Hint: you’ll need the collapse argument of paste. Save the result as a new object called allletters.

Load the stringr package. Using a function from stringr (look at the package documentation to find one), split allletters on “-” so that you have individual letters again, and save the result in another object. What type of object did you get back? How can you convert it back to a vector?

Part 3 (advanced): Find a function in base R (not from the stringr package) that also splits strings (you may need to search online). Compare its documentation with that of the function you found above. What’s the difference? Hint: try supplying a vector of different split patterns to each function and compare the output.

Answer

age <- 25
paste("My age is", age)
## [1] "My age is 25"
allletters <- paste(LETTERS, collapse = "-")
allletters
## [1] "A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"
library(stringr)
split_letters <- str_split(allletters, "-")
split_letters
## [[1]]
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
typeof(split_letters)
## [1] "list"
split_letters <- split_letters[[1]]
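The result is a list because str_split accepts a vector of strings and returns one list element per input string. Besides [[1]], unlist() flattens the list into a single character vector; the two agree when there is only one input string. The sketch below uses base R's strsplit(), which returns the same list structure, so it runs without stringr installed.

```r
allletters <- paste(LETTERS, collapse = "-")

# strsplit() also returns a list with one element per input string
split_list <- strsplit(allletters, "-")
class(split_list)  # "list"

# Two ways back to a character vector
first_element <- split_list[[1]]  # take the first (and only) list element
flattened <- unlist(split_list)   # flatten all elements into one vector

identical(first_element, LETTERS)  # TRUE
identical(flattened, LETTERS)      # TRUE
```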

strsplit will also split strings. Besides some small differences in how regular expressions are handled, the main difference from str_split in the stringr package is that str_split is vectorized over both the input strings and the patterns, while strsplit only recycles its split patterns along the input strings. This means you can use str_split to split the same string on multiple patterns in a single call; with strsplit you’d have to repeat the string or make multiple calls.
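A small sketch of this recycling difference, using a made-up string s. The strsplit() calls run in base R; the str_split() behaviour is only described in a comment, so the snippet doesn't require stringr.

```r
s <- "a-b_c"  # hypothetical string containing both separators

# strsplit() recycles split along x only: with one input string,
# the second pattern is never used
strsplit(s, c("-", "_"))
# list(c("a", "b_c"))

# Repeating the string gives one input per pattern
strsplit(rep(s, 2), c("-", "_"))
# list(c("a", "b_c"), c("a-b", "c"))

# stringr::str_split(s, c("-", "_")) recycles the string against the
# patterns and returns the two-element result directly
```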

In most circumstances, you could use either function. The stringr package reimplements some functionality that already exists in base R so that a larger set of string-manipulation functions share consistent names and argument conventions.

If you don’t know what regular expressions are, take a look at Regular Expression in R, from Gloria Li and Jenny Bryan, or the Regular Expressions vignette from the stringr package to get started. If you work with text data, or really any large data files, they can save you a lot of time when reformatting files and data.
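As a small taste of what regular expressions can do, here is a sketch using base R's gsub() and grepl() on the allletters string from the previous exercise. The POSIX character classes [[:punct:]] and [[:upper:]] are used instead of ranges like [A-Z], since R's regex documentation warns that the interpretation of character ranges depends on the locale.

```r
allletters <- paste(LETTERS, collapse = "-")

# [[:punct:]] matches any punctuation character (here, the hyphens);
# gsub() replaces every match, in this case with the empty string
gsub("[[:punct:]]", "", allletters)  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# ^ and $ anchor the pattern, so the whole string must be uppercase letters
grepl("^[[:upper:]]+$", c("A", "a", "AB"))  # TRUE FALSE TRUE
```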

Challenge Exercise: Packages and Strings

Install the cowsay package. Combine the output of the fortunes package (which you installed and used in an exercise above; if not, go back and do that) with the functionality of the cowsay package. Hint: you will also need to use the paste function with the collapse argument. To keep the formatting intact, also use the capture.output function.

Answer

This exercise is based on a blog post by Ista Zahn.

install.packages("cowsay")
library(cowsay)

say( # the cowsay command
  paste( # paste together individual lines of output
    capture.output( # captures printed output as character data - each line is a separate element in a vector
      fortune() # prints a fortune to the screen
      ), 
    collapse="\n" # argument to paste that tells it to join vector elements with a newline
    )
  )

Or drop capture.output:

say(paste(fortune(), collapse="\n"))

To understand the above better, try running the steps from the inside out to see what each stage feeds the next.

fortune()
capture.output(fortune())
paste(capture.output(fortune()), collapse="\n")
paste(fortune(), collapse="\n") # how is this different from above?

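fortune() returns a list whose elements hold the parts of the quotation (the quote, its author, and so on). paste joins those elements directly, without the layout used when a fortune is printed, whereas capture.output captures the formatted, printed output, including its spacing and line breaks.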
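The same contrast can be seen without the fortunes package. In the sketch below, quote_parts is a hypothetical named list standing in for a fortune: paste() joins only the element values, while capture.output() records the lines that print() would display, including the element names and blank lines.

```r
# Hypothetical stand-in for a fortune: a named list of text fields
quote_parts <- list(quote = "It works!", author = "Someone")

# paste() converts each element to text and joins them;
# element names and print layout are not included
paste(quote_parts, collapse = "\n")  # "It works!\nSomeone"

# capture.output() returns the printed output, one line per vector element
printed <- capture.output(print(quote_parts))
printed
```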

Exercise: Factors

Use the code below to create a vector of randomly selected month names. Then turn this vector into a factor, and use the table function to count how often each value occurs.

Then remake the factor, explicitly setting ordered levels. Hint: month.abb is a built-in vector that lists the month abbreviations in order.

Make a table again. What’s the difference?

months <- sample(month.abb, size = 40, replace = TRUE)

Answer

months <- sample(month.abb, size = 40, replace = TRUE)
months <- factor(months)
table(months) # months should be alphabetical
## months
## Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep 
##   1   4   4   1   3   4   4   4   6   2   3   4
months <- factor(months, levels = month.abb, ordered = TRUE)
table(months) # months should be in order
## months
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##   3   1   4   1   6   4   4   4   4   3   2   4
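With 40 random draws, every month usually appears, which can hide a second difference. The sketch below uses a hypothetical three-value vector: with explicit levels, months that never occur still show up in the table with a count of 0, and because the factor is ordered, comparisons follow calendar order.

```r
few_months <- c("Mar", "Jan", "Mar")  # hypothetical example data

# Without explicit levels, only the values present become levels,
# sorted alphabetically
levels(factor(few_months))  # "Jan" "Mar"

# With levels = month.abb, all 12 months are kept in calendar order,
# including months that never occur (count 0)
ordered_months <- factor(few_months, levels = month.abb, ordered = TRUE)
table(ordered_months)

# Ordered factors also support comparisons in calendar order
ordered_months < "Feb"  # FALSE TRUE FALSE
```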

Challenge Exercise: Dates

Use R to calculate how many days you’ve been alive. Then calculate it in terms of weeks. Find out what day of the week you were born on. You’ll need to search the internet to figure out what functions to use, and use the R help as necessary.

Hint: You might want to start with the help page for Date, the class of date objects in R. Alternatively, take a look at the lubridate package.

Answer

If you were born on January 1, 1980:

Sys.Date()
## [1] "2018-06-27"
Sys.Date() - as.Date("1980-01-01")
## Time difference of 14057 days
difftime(Sys.Date(), as.Date("1980-01-01"), units="weeks")
## Time difference of 2008.143 weeks
weekdays(as.Date("1980-01-01"))
## [1] "Tuesday"
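The results above change every day because they use Sys.Date(). Here is a sketch that fixes the end date (the 2018-06-27 date from the output, chosen only for reproducibility) and converts the difftime results to plain numbers with as.numeric(). The %u format code gives the weekday as a number from 1 (Monday) to 7, which doesn't depend on the language setting, unlike the day names returned by weekdays().

```r
birth <- as.Date("1980-01-01")
as_of <- as.Date("2018-06-27")  # fixed date so the result is reproducible

# as.numeric() drops the "Time difference" wrapper and returns a plain number
days_alive <- as.numeric(as_of - birth)                             # 14057
weeks_alive <- as.numeric(difftime(as_of, birth, units = "weeks"))  # about 2008.143

# ISO weekday number: 1 = Monday, so "2" means Tuesday
format(birth, "%u")  # "2"
```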

Or, using the lubridate package:

library(lubridate)
today()
## [1] "2018-06-27"
today() - ymd("1980-01-01")
## Time difference of 14057 days
interval(ymd("1980-01-01"), today()) / weeks(1)
## [1] 2008.143
wday(ymd("1980-01-01"), label=TRUE)
## [1] Tue
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat