Practical 1

Requirements

  • R and RStudio must be installed prior to the session - you can follow the instructions below

Lecture

Lab

1. R basics

In the first module of the workshop, the goals are to familiarize yourself with the language and the logic behind it; get started with RStudio and create your first project; create your first .R file to write down the live code; and get comfortable with installing packages.

Module content:
- Seeking help
- Installing packages

Script

R Markdown File

R Markdown files act as notebooks which can make it easier to document and share code. All the results from the code as well as your comments and descriptions can be combined into one, tidy document. This makes it easy to combine insights, protocols and figures in a practical and organized fashion.

Setting up:

Installing a package

To install a package, we simply use the function install.packages() with the name of the desired package in quotation marks (i.e. as a string!)

install.packages("packageName1")

# for multiple packages:
install.packages(c("packageName1", "packageName2"))
Loading packages

To load a package, use the library() function. Note that the function accepts the name of an installed package without quotation marks.

library(packageName1)
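A common pattern, sketched here as an illustration and not as part of the workshop script, is to install a package only when it is not already available. requireNamespace() is a base R function; the helper name install_if_missing is hypothetical.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")
library(stats)
```

Installing once and loading in every session is the usual split: install.packages() touches your library on disk, while library() only affects the current session.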

R Basics:

Variable names

There are several naming conventions in coding: snake_case and CamelCase are the most common. There is no particular preference in R for naming, but make sure you are consistent and that your variable names are meaningful.

Case sensitivity

R is case sensitive. This means that var1 and Var1 represent two different objects.

1 based indexing

Indexing (calling elements of a vector or list) in R starts at 1. The first element has index 1, the second has index 2, etc.

# Initialising a vector
var1 <- c('a', 'b', "c", "d") # note ' and " are the same in R :)

# Here we call elements 1 and 2 of var1
var1[1]
var1[2]

#Now we look at how long var1 is using the length() function

length(var1)
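As a small sketch of what 1-based indexing implies at the edges of a vector (the vector v is made up for this illustration):

```r
v <- c("a", "b", "c", "d")

v[1]          # first element: "a"
v[length(v)]  # last element: "d"
v[0]          # index 0 selects nothing and returns an empty character vector
v[10]         # an index past the end returns NA rather than an error
```

Because an out-of-range index returns NA silently, it is worth checking length() before indexing when the size of a vector is not known in advance.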

Exercises

1. Create a file

Create a new .R script in RStudio. Save it as Script01 in the IntroToR folder.

2. Install base packages

In Script01, install the following packages: "data.table","datasets","devtools","dplyr","ggplot2","plyr","medicaldata","gapminder","RColorBrewer","rmarkdown","stringr","tidyr","tidyverse","viridis".

3. Load a package

Load all packages.

Solutions

1. Creating a file
2. Install base packages
install.packages(c("data.table","datasets","devtools","dplyr","ggplot2","plyr","medicaldata","gapminder","RColorBrewer","rmarkdown","stringr","tidyr","tidyverse","viridis"))
3. load packages
library(data.table)
library(datasets)
library(devtools)
library(dplyr)
library(ggplot2)
library(plyr)
library(medicaldata)
library(gapminder)
library(RColorBrewer)
library(rmarkdown)
library(stringr)
library(tidyr)
library(tidyverse)
library(viridis)

2. Data types: attributes and built-in functions

In this section participants will learn the differences between classes, objects and data types in R; compute arithmetic operations; use logical operators; create objects of different types, learn about their attributes and apply some built-in functions in R; subset and index objects; and get comfortable with vectorized operations.

Module content:
- Vectors
- Lists
- Factors
- Data frames
- Arrays
- Coercion

Script

Examples

In this section we go over the material.

Comments

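Comments are embedded into your code to help explain its purpose. This is extremely important for reproducibility and for documentation. To start a comment, use #; everything after it on that line is ignored by R. R has no dedicated multi-line comment syntax, so every line of a longer comment must start with #. The #' prefix is a convention used by roxygen2 for documentation comments; to R it is an ordinary comment.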

# This is a comment line 
# print('hello world')

#' start comment
#' 
#' end comment!
Creating variables

You ‘assign’ a value to a variable using the assigning operator: <-

# The convention is to use leftward assignment
var1 <- 12
var2 <- "hello world"

Hint: Look at your environment!

Printing Variables
var1
var2

print(var1)
print(var2)
# It is also possible to use the '=' sign, but this is NOT the convention and is therefore not considered good practice.
var1 = 13
var2 = "hello world"
var1
var2
# Now what happens when you use the function rm()? hint: look at your environment!
rm(var1)
Data types and data structures
Atomic Classes

Atomic classes are the fundamental data types found in R. All subsequent data structures are used to store entries of different atomic classes.

1.Numeric

Numeric values are stored as doubles, i.e. with a decimal component. The term double refers to double-precision floating point: each value occupies 8 bytes (64 bits) and is accurate to roughly 15 to 16 significant digits.

2.Integer

They store numbers that can be written without a decimal component. Adding an L after a whole number tells R to store it as an integer instead of a numeric.

3.Logical

They store the outputs of logical statements - TRUE or FALSE. Can be converted to integer where TRUE = 1 and FALSE = 0.

4.Character

Represents text. Can either be a single character or a word/sentence.

5.Missing Value

Represented as NA and used by R to indicate a missing data entry. Useful for manipulating data sets where missing entries are common.
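The atomic classes above can be inspected with typeof(). The following is a minimal sketch; the literal values are arbitrary examples chosen for illustration.

```r
typeof(3.14)     # "double" (numeric)
typeof(3L)       # "integer"
typeof(TRUE)     # "logical"
typeof("mango")  # "character"
typeof(NA)       # "logical": a bare NA is logical by default

# Logical values convert to integers: TRUE becomes 1 and FALSE becomes 0
as.integer(c(TRUE, FALSE))

# A missing value propagates through arithmetic
NA + 1           # NA
```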

Arithmetic Operations
# Addition  
2+100000
# Subtraction  
3-5
# Multiplication  
71*9
# Division  
90/((3*5) + 4)
# Power  
2^3
Logical operators
# First create two numeric variables  
var1 <- 35  
var2 <- 27
# Equal to  
var1 == var2
# Less than or equal to  
var1 <= var2   

# Not equal
var1 != var2
# They also work with other classes  
var1 <- "mango" 
var2 <- "mangos"
var1 == var2

Strings are compared character by character until they are not equal or there are no more characters left to compare.

var1 < var2

We can test whether a value is contained in another object using the %in% operator.

"c" %in% letters  
"c" %in% LETTERS

# note letters and LETTERS are built in constants. letters is the lowercase alphabet as a vector, and LETTERS uppercase.
Data Structures
Vectors

Key points:
- Can only contain objects of the same class
- Most basic type of R object
- Even a variable holding a single value is a vector of length one

1.Numeric

Creating a numeric vector using c()

x <- c(0.3, 0.1)
x

Using the vector() function

x <- vector(mode = "numeric",length = 10)
x

Using the numeric() function

x <- numeric(length = 10)
x

Creating a numeric vector with a sequence of numbers

# x <- seq(1,10,1)
# x

x <- seq(1,10,2)
x

x <- rep(2,10)
x

Check length of vector with length()

x
length(x)

y <- rep(2,5)
y
length(y)

length(x) == length(y)

2.Integer

Creating an integer vector using c()

x <- c(1L,2L,3L,4L,5L)  
x

Creating an integer vector from a sequence of numbers

x <- 1:10
x

3.Logical

Creating a logical vector with c()

x <- c(TRUE,FALSE,T,F)
x

Creating a logical vector with vector()

x <- vector(mode = "logical",length = 5)
x

Creating a logical vector using logical()

x <- logical(length = 10)
x

4.Character

x<-c("a","b","c")
x

x<-vector(mode = "character",length=10)
x

x<-character(length = 3)
x

Some useful functions to modify strings

tolower(LETTERS)
toupper(letters)
paste(letters,1:length(letters),sep="_") # Note the implicit coercion

5.Vector attributes

The elements of a vector can have names

x<-1:5
names(x)<-c("one","two","three","four","five")
x

x<-logical(length = 4)
names(x)<-c("F1","F2","F3","F4")
x
Built-in functions

To inspect the contents of a vector

is.vector(x) # Check if it is a vector
is.na(x) # Check, element by element, whether values are missing (NA)
is.null(x) # Check if it is NULL
is.numeric(x) # Check if it is numeric
is.logical(x) # Check if it is logical
is.character(x) # Check if it is character

To know what kind of vector you are working with

class(x) # Class of the object (e.g. "integer", "logical", "matrix", "data.frame")
typeof(x) # Internal storage type (e.g. "integer", "double", "logical", "character", "list")
str(x)

To know more about the data contained in the vector

Mathematical operations
sum(x)
min(x) 
max(x)
x <- seq(1,10,1)
mean(x) 
median(x) 
sd(x)
log(x) 
exp(x)

Other operations

length(x)
table(x)
summary(x)

Grouping elements in a vector using tapply

measurements<-sample(1:1000,6) 
samples<-factor(c(rep("case",3),rep("control",3)), 
                levels = c("control", "case"))
tapply(measurements, samples, mean)

Vector Operations

x<-1:10
y<-11:20
x*2
x+y
x*y
x^y

Recycling

If one of the vectors is shorter than the other, operations are still possible. R will recycle (repeat) the shorter vector to enable the operation to occur.

IMPORTANT: if the length of the longer vector is NOT a multiple of the length of the shorter vector, recycling still occurs but stops at the length of the longer vector, and R issues a warning.

x<-1:10
y<-c(1,2,3)
x+y
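To make the two recycling cases concrete, here is a small sketch. The vectors a and b are invented for this illustration, and tryCatch() is used only to capture the warning text so that it can be inspected.

```r
a <- 1:6
b <- c(10, 20)

# Length 6 is a multiple of length 2, so b is repeated three times without complaint
a + b   # 11 22 13 24 15 26

# Lengths 10 and 3: R still computes a result when run directly, but warns.
# Here tryCatch() intercepts the warning and returns its message instead.
res <- tryCatch(1:10 + c(1, 2, 3),
                warning = function(w) conditionMessage(w))
res
```

Silent recycling is convenient for operations such as x * 2, but the warning in the second case usually signals a mistake in the lengths of the inputs.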
Indexing and subsetting

For this example, let's create a vector of 15 random numbers between 1 and 100.

x<-sample(x = 1:100,size = 15,replace = F) 
x

Using the index/position

x[1] # Get the first element
x[13] # Get the thirteenth element

Using a vector of indices

x[1:12] # The first 12 numbers
x[c(1,5,6,8,9,13)] # Specific positions only

names(x) <- letters[1:length(x)]

x[c('a','c','d')]

Using a logical vector

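x<10 # Logical vector: TRUE where an element is less than 10
x[x>95] # Only numbers that are greater than 95
# 
# # Only even numbers 
# x%%2 == 0
# x[x%%2 == 0]
x<=10 # Logical vector: TRUE where an element is less than or equal to 10
x[x<=10] # Only numbers that are less than or equal to 10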

Skipping elements using indices

x[c(-1, -5)]

Skipping elements using names

x<-1:10
names(x)<-letters[1:10]
x[names(x) != "a"]
Factors

Key points:

  • Useful for categorical data
  • Can have implicit order, if needed
  • Each element has a label or level
  • Some operations behave differently on factors

Creating factors with factor

cols<-factor(x = c(rep("red",4),
                   rep("blue",5),
                   rep("green",2)),              
             levels = c("red","blue","green"))
cols
samples <- c("case", "control", "control", "case") 
samples 
samples_factor <- factor(samples, levels = c("control", "case")) 
samples_factor 
str(samples_factor)
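The key points about levels and order can be sketched as follows. The sizes factor is a made-up example; ordered = TRUE is the argument of factor() that turns the level order into a meaningful ranking.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

levels(sizes)       # "small" "medium" "large"
as.integer(sizes)   # underlying integer codes: 1 3 2 1
table(sizes)        # counts per level, reported in level order
sizes < "large"     # comparisons work because the factor is ordered
```

The integer codes explain why some operations behave differently on factors: functions may act on the codes or on the labels depending on how they treat factors.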
Lists

Key points:
- Can contain objects of multiple classes
- Extremely powerful when combined with some R built-in functions

Creating lists with different data types

l <- list(1:10, list("hello",'hi'), TRUE)
l

Assigning names as we create the list

l<-list(title = "Numbers", 
        numbers = 1:10, 
        logic = TRUE )
l
names(l)

l$numbers
Indexing and subsetting

Using [[]] instead of []

l[[1]]

Using $ for named lists

l$logic
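The difference between [ ] and [[ ]] is easiest to see by checking what each returns. This sketch rebuilds a small list under a different name (lst) so that it does not overwrite l.

```r
lst <- list(title = "Numbers", numbers = 1:10, logic = TRUE)

class(lst["numbers"])     # "list": single brackets keep the list wrapper
class(lst[["numbers"]])   # "integer": double brackets extract the element itself
lst[["numbers"]][3]       # third element of the extracted vector: 3

# $ is shorthand for [[ ]] with a name
identical(lst$numbers, lst[["numbers"]])   # TRUE
```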
Built-in functions
l<-list(sample(1:100,10),
        sample(1:100,10),
        sample(1:100,10))
names(l)<-c("r1","r2","r3")
l

Performing operations on all elements of the list using lapply

lsums<-lapply(l,sum)
lsums

lsums <- lapply(l,function(a){
  sum(a)^2
})
lsums
Matrices

Creating a matrix full of zeros with matrix()

m<-matrix(0, ncol=6, nrow=3)
m
class(m)
typeof(m)

Creating a matrix from a vector of numbers

m<-matrix(1:5, ncol=2, nrow=5)
m
Attributes

Names of each dimension

colnames(m)<-letters[1:2]
rownames(m)<-LETTERS[1:5]
m
str(m)
Built-in functions

To know the size of the matrix

dim(m)
ncol(m)
nrow(m)
Data frames

Key points:

  • Columns in data frames are vectors

  • Each column can be of a different data type

  • A data frame is essentially a list of vectors

Creating a data frame using data.frame()

df<-data.frame(numbers=1:10,
               low_letters=letters[1:10],
               logical_values=rep(c(T,F),each=5))
df
class(df)
typeof(df)
str(df)

Re-naming columns

colnames(df)[2]<-"lowercase"
head(df)
View(df)
Indexing and sub-setting
df$numbers
df["numbers"]
df[1,]
df[,1]

df[3,3]
Coercion

Converting between data types with as. functions

x<-1:10
as.list(x)
l<-list(numbers=1:10,
        lowercase=letters[1:10])
l
typeof(l)
df<-as.data.frame(l)
df
typeof(df)
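Coercion also happens implicitly when values of different types are combined in one vector. The following sketch, with arbitrary example values, contrasts implicit coercion with explicit as. calls.

```r
# Implicit coercion: c() promotes everything to the most general type present
typeof(c(1L, 2.5))        # "double"
typeof(c(1, "a", TRUE))   # "character"
c(1, "a", TRUE)           # "1" "a" "TRUE"

# Explicit coercion with as.* functions
as.numeric("3.5")              # 3.5
as.integer(3.9)                # 3 (the decimal part is dropped)
as.logical(c("TRUE", "yes"))   # TRUE NA ("yes" is not recognised)

# Values that cannot be converted become NA; R normally warns about this,
# and suppressWarnings() is used here only to keep the output quiet
suppressWarnings(as.numeric("abc"))
```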

Exercises

Either complete these in the .Rmd file or create a new .R script and complete them.

1. Write a piece of code that stores a number in a variable and then check if it is greater than 5. Try to use comments!

Bonus: Is there a way to store the result of checking if the number is greater than 5?

2. Make a vector with the numbers 1 through 26. Multiply the vector by 2, and give the resulting vector names A through Z

hint: there is a built in vector called LETTERS

3. Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Did the matrix function fill your matrix by column, or by row, as its default behavior? Once you have figured it out, try to change the default.

hint: read the documentation for matrix. To automatically call in documentation type ? before the function, package, etc.

Bonus: Which of the following commands was used to generate the matrix below?

##      [,1] [,2]
## [1,]    4    1
## [2,]    9    5
## [3,]   10    7

  • matrix(c(4, 1, 9, 5, 10, 7), nrow = 3)
  • matrix(c(4, 9, 10, 1, 5, 7), ncol = 2, byrow = TRUE)
  • matrix(c(4, 9, 10, 1, 5, 7), nrow = 2)
  • matrix(c(4, 1, 9, 5, 10, 7), ncol = 2, byrow = TRUE)
4. Create a list of length two containing a character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
5. Data frames: there are several subtly different ways to call variables, observations and elements from data frames. Try them all and discuss with your team what they return.

Hint, use the function typeof()

  • iris[1]
  • iris[[1]]
  • iris$Species
  • iris["Species"]
  • iris[1,1]
  • iris[,1]
  • iris[1,]
6. Take the list you created in 4 and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”
7. Create a vector x of numbers from 1 to 100. Find all the odd numbers in x

Hint: 3 %% 2 = 1 and 4 %% 2 = 0

Solutions

1. Write a piece of code that stores a number in a variable and then check if it is greater than 5. Try to use comments!

Bonus: Is there a way to store the result after checking the number?

x <- 10
x > 5

#Bonus
y <- x > 5

print(y)
2. Make a vector with the numbers 1 through 26. Multiply the vector by 2, and give the resulting vector names A through Z

hint: there is a built in vector called LETTERS

x <- 1:26
x <- x * 2
names(x) <- LETTERS
3. Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Did the matrix function fill your matrix by column, or by row, as its default behavior? Once you have figured it out, try to change the default.

hint: read the documentation for matrix. To automatically call in documentation type ? before the function, package, etc.

# By default the matrix is filled by columns, we can change this behavior using byrow=TRUE
m<-matrix(1:50,ncol = 5,nrow = 10,byrow = T)

Bonus: Which of the following commands was used to generate the matrix below?

##      [,1] [,2]
## [1,]    4    1
## [2,]    9    5
## [3,]   10    7

  • matrix(c(4, 1, 9, 5, 10, 7), nrow = 3)
  • matrix(c(4, 9, 10, 1, 5, 7), ncol = 2, byrow = TRUE)
  • matrix(c(4, 9, 10, 1, 5, 7), nrow = 2)
  • matrix(c(4, 1, 9, 5, 10, 7), ncol = 2, byrow = TRUE)
matrix(c(4, 1, 9, 5, 10, 7), ncol = 2, byrow = TRUE)
4. Create a list of length two containing a character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
dt <- c('double', 'complex', 'integer', 'character', 'logical')
ds <- c('data.frame', 'vector', 'factor', 'list', 'matrix')
data.sections <- list(dt, ds)
5. Data frames: there are several subtly different ways to call variables, observations and elements from data frames. Try them all and discuss with your team what they return.

Hint, use the function typeof()

  • iris[1]
  • iris[[1]]
  • iris$Species
  • iris["Species"]
  • iris[1,1]
  • iris[,1]
  • iris[1,]
# Single brackets [1] return the first slice of the list, as another list. Here that is a data frame containing only the first column.
iris[1]
# Double brackets [[1]] return the contents of the list item. Here that is the first column (Sepal.Length), a numeric vector of type double.
iris[[1]]
# The $ operator addresses items by name. Species is a factor (typeof() reports "integer" because factors are stored as integer codes).
iris$Species
# Single brackets with the column name instead of the index number also return a data frame (a list), as in the first example.
iris["Species"]
# Element in the first row and first column. The returned element is a double.
iris[1,1]
# First column. Returns a numeric vector.
iris[,1]
# First row. Returns a one-row data frame (typeof() reports "list") with all the values in the first row.
iris[1,]
6. Take the list you created in 4 and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”
df<-as.data.frame(data.sections)
colnames(df)<-c("dataTypes","dataStructures")
7. Create a vector x of numbers from 1 to 100. Find all the odd numbers in x

Hint: 3 %% 2 = 1 and 4 %% 2 = 0

x <- c(1:100)
x[x %% 2 ==1]

3. Control structures

In this module, participants learn to use if-else statements and while and for loops, build their own functions, and learn what packages are.

Module content:
- If..Else
- While loop
- For loop
- Functions
- Packages

Script

Examples

In this section we go over the material.

If… Else

We can tell our code to perform certain tasks if a certain condition is met.

x <- 5

#If the number is greater than 5, get its square
if (x > 5){
  x^2
} else{
  #If not, multiply it by 2
  x*2
}
# Can incorporate multiple conditions
x <- 7

if (x > 5 && x < 10){
  #If x is greater than 5 AND is less than 10
  x^2
} else if ( x < 0 || x > 10){
  #If x is less than 0 OR greater than 10
  -x
} else {
  x * 2
}
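if expects a single TRUE or FALSE; recent versions of R raise an error when the condition has more than one element. To apply the same kind of rule to every element of a vector, base R provides ifelse(). The following is a minimal sketch with an invented vector vals.

```r
vals <- c(-3, 7, 12)

# ifelse() tests every element and picks from the second or third argument
result <- ifelse(vals > 5, vals^2, vals * 2)
result   # -6 49 144

# & and | compare element by element; && and || expect single values
vals > 5 & vals < 10   # FALSE TRUE FALSE
```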
For & While Loops

For loops let you iterate over the elements of a vector

#Iterate from 1 to 10

for (i in 1:10){
  print(i)
}
#Iterate through strings
x <- c('a','ab','cab','taxi')

for (i in x){
  print(i)
}

While loops let you iterate over a piece of code until a condition is no longer met.

x <- 5
#while x is greater than 0
while (x > 0){
  print(x)
  x <- x -1
}
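When a loop needs both the position and the value, one common approach (a sketch, not the only option) is to iterate over seq_along() and store results in a pre-allocated vector. The names words and lengths_out are made up for this example.

```r
words <- c("a", "ab", "cab", "taxi")

# Pre-allocate the result so the vector is not regrown on every iteration
lengths_out <- integer(length(words))

for (i in seq_along(words)) {
  lengths_out[i] <- nchar(words[i])   # number of characters in the i-th word
}
lengths_out   # 1 2 3 4

# Many base functions are already vectorized, so the loop can often be skipped
nchar(words)
```

seq_along(words) is preferred over 1:length(words) because it yields an empty sequence for an empty vector, so the loop body is never run by mistake.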
Functions

Sometimes, we need to repeatedly use a sequence of code. Creating functions helps reduce the amount of clutter in our code.

#Example Function that calculates mean

new_mean <- function(values){
  print(values)
  #return()
  sum(values)/length(values)
}

x <- 1:5

mean(x) == new_mean(x)
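Functions can also take default arguments, validate their input and return a value explicitly. The function scaled_mean below is a hypothetical helper written for this illustration; it is not part of the workshop script.

```r
# Hypothetical helper: mean of `values` multiplied by `scale`
scaled_mean <- function(values, scale = 1, na.rm = FALSE) {
  # Fail early with a clear message on unsupported input
  if (!is.numeric(values)) {
    stop("values must be numeric")
  }
  m <- mean(values, na.rm = na.rm)
  return(m * scale)
}

scaled_mean(1:5)                         # 3
scaled_mean(1:5, scale = 10)             # 30
scaled_mean(c(1, NA, 3), na.rm = TRUE)   # 2
```

Without return(), an R function returns the value of its last evaluated expression, which is why new_mean() above works with return() commented out.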
Installing packages

There are multiple sources and ways to do this.

CRAN
install.packages(c("dplyr","ggplot2","gapminder","medicaldata"))
BioConductor

For more details about the project you can visit https://www.bioconductor.org

To install packages from BioConductor you first need to install the BiocManager package.

if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.16")

Then you can install any package you want by using BiocManager::install().

BiocManager::install("DESeq2")
GitHub

If you want to install the development version of a package, or you are installing something that is only available on GitHub you can use devtools

devtools::install_github('andreacirilloac/updateR')

Exercises

1. Initialise variable x with numbers 1 through 10. Calculate the sum of x using base R. Then write a function to perform the sum of x.
2. Create a function that returns TRUE if the number is even, and FALSE if the number passed is odd. Test it.

Hint: Use if and else!

Solutions

Note these solutions are not unique :)

1. Initialise variable x = c(1,2,3,4,5,6,7,8,9,10) using base R. Calculate the sum of x. Then write a function to perform the sum of x.
x <- c(1,2,3,4,5,6,7,8,9,10)

sum(x)


new.sum <- function(x){
  
  len <- length(x) # also base R!
  
  y <- 0
  
  for(i in seq_len(len)){ # seq_len() also behaves correctly when len is 0
    
    y <- y + x[i]

  }
  
  return(y)
  
}

new.sum(x)
2. Create a function that returns TRUE if the number is even, and FALSE if the number passed is odd. Test it.

Hint: Use if and else!

is.even <- function(x){
  
  if(x %%2 == 0){
    y <- TRUE
  }else{
    y <- FALSE
  }
  
  return(y)
}

4. Basic data manipulation

In this module participants will learn how to read/write data to/from files with different formats (.tsv, .csv); become familiar with basic data-frame operations; index and subset data frames using base R; and manipulate individual data frame columns.

Module content:
- Reading/writing data
- Exploring data frames

Script

Examples

In this section we go over the material.

.Rmd files and paths

Even inside an .Rproj project, the working directory for a .Rmd file is automatically the folder the .Rmd file is in.

getwd() #function to get the working directory

Saving a file to a desired path therefore becomes a bit more complicated. Fortunately the following shorthands exist: . signifies the working directory, so setting a path as ./ will save the object to the current directory. .. signifies going up a folder. In this case, ../ will signify the Exercises folder. To save an object in the IntroToR folder, say, we therefore write ../../.
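Relative paths like these can be assembled with file.path() instead of pasting strings by hand. This is a small sketch; the folder name data and the file name penguins.csv are hypothetical and used only for illustration.

```r
# "data" and "penguins.csv" are hypothetical names used only for illustration
here_path <- file.path(".", "penguins.csv")            # "./penguins.csv"
up_one    <- file.path("..", "data", "penguins.csv")   # "../data/penguins.csv"

basename(up_one)   # "penguins.csv"
dirname(up_one)    # "../data"

# normalizePath() expands . and .. into an absolute path
normalizePath(".")
```

Comparing normalizePath(".") with getwd() is a quick way to confirm where a relative path will actually point before writing a file.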

Read/Write

Several nifty functions exist to read and write files.

?read.csv() #reads .csv files into data.frames
?read.table()
?readRDS()
?write.csv()
?write.table()
?saveRDS()

Many packages also allow you to read/write different files. Technically read.csv() comes from the utils package, which is installed directly with R.

?readr::read_csv()
?terra::rast() #in spatial ecology we use the terra package to read spatial data
Data frames

Data frames are the easiest way to manipulate, analyse, and visualize data in R. For these examples we will use data from the package palmerpenguins, a toy data set for exploration and visualization. See https://allisonhorst.github.io/palmerpenguins/.

#install.packages("Rtools")
install.packages("palmerpenguins")
library(palmerpenguins)
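# Let us first call and save a copy of the penguins dataset.
# Note: the column names used below (bill_len, bill_dep, flipper_len, body_mass) are those of the penguins dataset bundled with R 4.5 and later (datasets::penguins).
# palmerpenguins::penguins names the same columns bill_length_mm, bill_depth_mm, flipper_length_mm and body_mass_g, so adjust the names if that is the version you have loaded.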
df <- penguins
Indexing

We have seen ways to index data frames in ./02DataTypes.Rmd. As a refresher, we will show ways to index the palmer penguins data set.

df[1] # first column (returned as a data frame)
df[1,1] #first row and first column
df[1,] # first row
df[,1] # first column
df$species #species column
df[, "island"] # named column
Subset-ing

Subset-ing is very similar to indexing, and usually combines indexing with logical operators.

df[df$species == "Adelie",] #rows where the species is Adelie
df[df$sex == "female" & df$year > 2006, ]
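One caveat with logical subsetting is that a condition evaluating to NA produces a row filled with NA. The sketch below uses a small invented data frame (toy) rather than the penguins data, so that it runs on its own; which() and subset() are both base R.

```r
# Invented toy data with one missing value in `sex`
toy <- data.frame(id   = 1:4,
                  sex  = c("female", NA, "male", "female"),
                  year = c(2007, 2008, 2008, 2009))

# The condition is NA for row 2, so an all-NA row appears in the result
with_na <- toy[toy$sex == "female" & toy$year > 2006, ]
nrow(with_na)   # 3

# which() keeps only positions where the condition is TRUE
no_na <- toy[which(toy$sex == "female" & toy$year > 2006), ]
nrow(no_na)     # 2

# subset() also drops rows where the condition is NA
sub <- subset(toy, sex == "female" & year > 2006)
nrow(sub)       # 2
```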

The colon operator : acts as a “through” operator, i.e. 3:6 represents 3, 4, 5, 6. We can use this to subset our data frame.

df[1:3, 1:3]
Manipulating

We can apply mean(), sum(), etc. to data frame columns.

mean(df$bill_len)

Note we get NA as an answer. Why is that? There are NAs in the dataset. We can remove them, or we can use the na.rm argument of mean():

mean(df$bill_len, na.rm = TRUE)

To identify NAs, we use base R function is.na().

# removing NAs uses logical indexing with is.na()
df_noNa <- df[!is.na(df$bill_len), ]
mean(df_noNa$bill_len)

Bonus! What is “wrong” with df_noNa ?

Next we can go in and change values! Let’s say all the NAs in bill_len actually should be 0s.

df[is.na(df$bill_len), "bill_len"] <- 0

Bonus! where did these changes save?

mean(df$bill_len)
Adding Data

We can add columns and rows to the data!

df$new_col <- NA #creates a new column with NA values

This can also be done using functions

?cbind()
cbind(df, df$species) #adding the species column to the df

Bonus! Did the above snippet of code modify df?

Apply

apply() belongs to a family of base R functions which apply a function over a vector, array, or list. The family also includes lapply(), tapply(), sapply() and mapply().

# Here we subset df to numerical columns (which are cols. 3 to 6, where : indicates through). We apply the function mean over the columns, and remove NAs.
apply(df[, 3:6], 2, mean, na.rm = TRUE)
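The members of the apply family differ mainly in what they iterate over and what they return. The following sketch uses invented inputs (nums and mat) so that it runs on its own.

```r
nums <- list(r1 = c(1, 2, 3), r2 = c(10, 20), r3 = c(5, NA))

# lapply() always returns a list, one element per input element
as_list <- lapply(nums, mean, na.rm = TRUE)

# sapply() simplifies the result to a vector when it can
as_vec <- sapply(nums, mean, na.rm = TRUE)
as_vec   # r1 = 2, r2 = 15, r3 = 5

# apply() works on the margins of a matrix: 1 = rows, 2 = columns
mat <- matrix(1:6, nrow = 2)
apply(mat, 1, sum)   # row sums: 9 12
apply(mat, 2, sum)   # column sums: 3 7 11
```

Extra arguments such as na.rm = TRUE are passed on to the applied function, which is how the call above removes missing values.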
Example R/W

Finally, let's write our toy data to the data folder as a .csv file.

df <- penguins # let's overwrite df with the penguin data so that we save the original

write.csv(df, "./exercises/exercises_module1/data/penguins.csv" , row.names = F)

To make sure we did it right, we will read the data we saved.

read.csv("./exercises/exercises_module1/data/penguins.csv")
##       species    island bill_len bill_dep flipper_len body_mass    sex year
## 1      Adelie Torgersen     39.1     18.7         181      3750   male 2007
## 2      Adelie Torgersen     39.5     17.4         186      3800 female 2007
## 3      Adelie Torgersen     40.3     18.0         195      3250 female 2007
## 4      Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
## 5      Adelie Torgersen     36.7     19.3         193      3450 female 2007
## 6      Adelie Torgersen     39.3     20.6         190      3650   male 2007
## 7      Adelie Torgersen     38.9     17.8         181      3625 female 2007
## 8      Adelie Torgersen     39.2     19.6         195      4675   male 2007
## 9      Adelie Torgersen     34.1     18.1         193      3475   <NA> 2007
## 10     Adelie Torgersen     42.0     20.2         190      4250   <NA> 2007
## 11     Adelie Torgersen     37.8     17.1         186      3300   <NA> 2007
## 12     Adelie Torgersen     37.8     17.3         180      3700   <NA> 2007
## 13     Adelie Torgersen     41.1     17.6         182      3200 female 2007
## 14     Adelie Torgersen     38.6     21.2         191      3800   male 2007
## 15     Adelie Torgersen     34.6     21.1         198      4400   male 2007
## 16     Adelie Torgersen     36.6     17.8         185      3700 female 2007
## 17     Adelie Torgersen     38.7     19.0         195      3450 female 2007
## 18     Adelie Torgersen     42.5     20.7         197      4500   male 2007
## 19     Adelie Torgersen     34.4     18.4         184      3325 female 2007
## 20     Adelie Torgersen     46.0     21.5         194      4200   male 2007
## 21     Adelie    Biscoe     37.8     18.3         174      3400 female 2007
## 22     Adelie    Biscoe     37.7     18.7         180      3600   male 2007
## 23     Adelie    Biscoe     35.9     19.2         189      3800 female 2007
## 24     Adelie    Biscoe     38.2     18.1         185      3950   male 2007
## 25     Adelie    Biscoe     38.8     17.2         180      3800   male 2007
## 26     Adelie    Biscoe     35.3     18.9         187      3800 female 2007
## 27     Adelie    Biscoe     40.6     18.6         183      3550   male 2007
## 28     Adelie    Biscoe     40.5     17.9         187      3200 female 2007
## 29     Adelie    Biscoe     37.9     18.6         172      3150 female 2007
## 30     Adelie    Biscoe     40.5     18.9         180      3950   male 2007
## 31     Adelie     Dream     39.5     16.7         178      3250 female 2007
## 32     Adelie     Dream     37.2     18.1         178      3900   male 2007
## 33     Adelie     Dream     39.5     17.8         188      3300 female 2007
## 34     Adelie     Dream     40.9     18.9         184      3900   male 2007
## 35     Adelie     Dream     36.4     17.0         195      3325 female 2007
## 36     Adelie     Dream     39.2     21.1         196      4150   male 2007
## 37     Adelie     Dream     38.8     20.0         190      3950   male 2007
## 38     Adelie     Dream     42.2     18.5         180      3550 female 2007
## 39     Adelie     Dream     37.6     19.3         181      3300 female 2007
## 40     Adelie     Dream     39.8     19.1         184      4650   male 2007
## 41     Adelie     Dream     36.5     18.0         182      3150 female 2007
## 42     Adelie     Dream     40.8     18.4         195      3900   male 2007
## 43     Adelie     Dream     36.0     18.5         186      3100 female 2007
## 44     Adelie     Dream     44.1     19.7         196      4400   male 2007
## 45     Adelie     Dream     37.0     16.9         185      3000 female 2007
## 46     Adelie     Dream     39.6     18.8         190      4600   male 2007
## 47     Adelie     Dream     41.1     19.0         182      3425   male 2007
## 48     Adelie     Dream     37.5     18.9         179      2975   <NA> 2007
## 49     Adelie     Dream     36.0     17.9         190      3450 female 2007
## 50     Adelie     Dream     42.3     21.2         191      4150   male 2007
## 51     Adelie    Biscoe     39.6     17.7         186      3500 female 2008
## 52     Adelie    Biscoe     40.1     18.9         188      4300   male 2008
## 53     Adelie    Biscoe     35.0     17.9         190      3450 female 2008
## 54     Adelie    Biscoe     42.0     19.5         200      4050   male 2008
## 55     Adelie    Biscoe     34.5     18.1         187      2900 female 2008
## 56     Adelie    Biscoe     41.4     18.6         191      3700   male 2008
## 57     Adelie    Biscoe     39.0     17.5         186      3550 female 2008
## 58     Adelie    Biscoe     40.6     18.8         193      3800   male 2008
## 59     Adelie    Biscoe     36.5     16.6         181      2850 female 2008
## 60     Adelie    Biscoe     37.6     19.1         194      3750   male 2008
## 61     Adelie    Biscoe     35.7     16.9         185      3150 female 2008
## 62     Adelie    Biscoe     41.3     21.1         195      4400   male 2008
## 63     Adelie    Biscoe     37.6     17.0         185      3600 female 2008
## 64     Adelie    Biscoe     41.1     18.2         192      4050   male 2008
## 65     Adelie    Biscoe     36.4     17.1         184      2850 female 2008
## 66     Adelie    Biscoe     41.6     18.0         192      3950   male 2008
## 67     Adelie    Biscoe     35.5     16.2         195      3350 female 2008
## 68     Adelie    Biscoe     41.1     19.1         188      4100   male 2008
## 69     Adelie Torgersen     35.9     16.6         190      3050 female 2008
## 70     Adelie Torgersen     41.8     19.4         198      4450   male 2008
## 71     Adelie Torgersen     33.5     19.0         190      3600 female 2008
## 72     Adelie Torgersen     39.7     18.4         190      3900   male 2008
## 73     Adelie Torgersen     39.6     17.2         196      3550 female 2008
## 74     Adelie Torgersen     45.8     18.9         197      4150   male 2008
## 75     Adelie Torgersen     35.5     17.5         190      3700 female 2008
## 76     Adelie Torgersen     42.8     18.5         195      4250   male 2008
## 77     Adelie Torgersen     40.9     16.8         191      3700 female 2008
## 78     Adelie Torgersen     37.2     19.4         184      3900   male 2008
## 79     Adelie Torgersen     36.2     16.1         187      3550 female 2008
## 80     Adelie Torgersen     42.1     19.1         195      4000   male 2008
## 81     Adelie Torgersen     34.6     17.2         189      3200 female 2008
## 82     Adelie Torgersen     42.9     17.6         196      4700   male 2008
## 83     Adelie Torgersen     36.7     18.8         187      3800 female 2008
## 84     Adelie Torgersen     35.1     19.4         193      4200   male 2008
## 85     Adelie     Dream     37.3     17.8         191      3350 female 2008
## 86     Adelie     Dream     41.3     20.3         194      3550   male 2008
## 87     Adelie     Dream     36.3     19.5         190      3800   male 2008
## 88     Adelie     Dream     36.9     18.6         189      3500 female 2008
## 89     Adelie     Dream     38.3     19.2         189      3950   male 2008
## 90     Adelie     Dream     38.9     18.8         190      3600 female 2008
## 91     Adelie     Dream     35.7     18.0         202      3550 female 2008
## 92     Adelie     Dream     41.1     18.1         205      4300   male 2008
## 93     Adelie     Dream     34.0     17.1         185      3400 female 2008
## 94     Adelie     Dream     39.6     18.1         186      4450   male 2008
## 95     Adelie     Dream     36.2     17.3         187      3300 female 2008
## 96     Adelie     Dream     40.8     18.9         208      4300   male 2008
## 97     Adelie     Dream     38.1     18.6         190      3700 female 2008
## 98     Adelie     Dream     40.3     18.5         196      4350   male 2008
## 99     Adelie     Dream     33.1     16.1         178      2900 female 2008
## 100    Adelie     Dream     43.2     18.5         192      4100   male 2008
## 101    Adelie    Biscoe     35.0     17.9         192      3725 female 2009
## 102    Adelie    Biscoe     41.0     20.0         203      4725   male 2009
## 103    Adelie    Biscoe     37.7     16.0         183      3075 female 2009
## 104    Adelie    Biscoe     37.8     20.0         190      4250   male 2009
## 105    Adelie    Biscoe     37.9     18.6         193      2925 female 2009
## 106    Adelie    Biscoe     39.7     18.9         184      3550   male 2009
## 107    Adelie    Biscoe     38.6     17.2         199      3750 female 2009
## 108    Adelie    Biscoe     38.2     20.0         190      3900   male 2009
## 109    Adelie    Biscoe     38.1     17.0         181      3175 female 2009
## 110    Adelie    Biscoe     43.2     19.0         197      4775   male 2009
## 111    Adelie    Biscoe     38.1     16.5         198      3825 female 2009
## 112    Adelie    Biscoe     45.6     20.3         191      4600   male 2009
## 113    Adelie    Biscoe     39.7     17.7         193      3200 female 2009
## 114    Adelie    Biscoe     42.2     19.5         197      4275   male 2009
## 115    Adelie    Biscoe     39.6     20.7         191      3900 female 2009
## 116    Adelie    Biscoe     42.7     18.3         196      4075   male 2009
## 117    Adelie Torgersen     38.6     17.0         188      2900 female 2009
## 118    Adelie Torgersen     37.3     20.5         199      3775   male 2009
## 119    Adelie Torgersen     35.7     17.0         189      3350 female 2009
## 120    Adelie Torgersen     41.1     18.6         189      3325   male 2009
## 121    Adelie Torgersen     36.2     17.2         187      3150 female 2009
## 122    Adelie Torgersen     37.7     19.8         198      3500   male 2009
## 123    Adelie Torgersen     40.2     17.0         176      3450 female 2009
## 124    Adelie Torgersen     41.4     18.5         202      3875   male 2009
## 125    Adelie Torgersen     35.2     15.9         186      3050 female 2009
## 126    Adelie Torgersen     40.6     19.0         199      4000   male 2009
## 127    Adelie Torgersen     38.8     17.6         191      3275 female 2009
## 128    Adelie Torgersen     41.5     18.3         195      4300   male 2009
## 129    Adelie Torgersen     39.0     17.1         191      3050 female 2009
## 130    Adelie Torgersen     44.1     18.0         210      4000   male 2009
## 131    Adelie Torgersen     38.5     17.9         190      3325 female 2009
## 132    Adelie Torgersen     43.1     19.2         197      3500   male 2009
## 133    Adelie     Dream     36.8     18.5         193      3500 female 2009
## 134    Adelie     Dream     37.5     18.5         199      4475   male 2009
## 135    Adelie     Dream     38.1     17.6         187      3425 female 2009
## 136    Adelie     Dream     41.1     17.5         190      3900   male 2009
## 137    Adelie     Dream     35.6     17.5         191      3175 female 2009
## 138    Adelie     Dream     40.2     20.1         200      3975   male 2009
## 139    Adelie     Dream     37.0     16.5         185      3400 female 2009
## 140    Adelie     Dream     39.7     17.9         193      4250   male 2009
## 141    Adelie     Dream     40.2     17.1         193      3400 female 2009
## 142    Adelie     Dream     40.6     17.2         187      3475   male 2009
## 143    Adelie     Dream     32.1     15.5         188      3050 female 2009
## 144    Adelie     Dream     40.7     17.0         190      3725   male 2009
## 145    Adelie     Dream     37.3     16.8         192      3000 female 2009
## 146    Adelie     Dream     39.0     18.7         185      3650   male 2009
## 147    Adelie     Dream     39.2     18.6         190      4250   male 2009
## 148    Adelie     Dream     36.6     18.4         184      3475 female 2009
## 149    Adelie     Dream     36.0     17.8         195      3450 female 2009
## 150    Adelie     Dream     37.8     18.1         193      3750   male 2009
## 151    Adelie     Dream     36.0     17.1         187      3700 female 2009
## 152    Adelie     Dream     41.5     18.5         201      4000   male 2009
## 153    Gentoo    Biscoe     46.1     13.2         211      4500 female 2007
## 154    Gentoo    Biscoe     50.0     16.3         230      5700   male 2007
## 155    Gentoo    Biscoe     48.7     14.1         210      4450 female 2007
## 156    Gentoo    Biscoe     50.0     15.2         218      5700   male 2007
## 157    Gentoo    Biscoe     47.6     14.5         215      5400   male 2007
## 158    Gentoo    Biscoe     46.5     13.5         210      4550 female 2007
## 159    Gentoo    Biscoe     45.4     14.6         211      4800 female 2007
## 160    Gentoo    Biscoe     46.7     15.3         219      5200   male 2007
## 161    Gentoo    Biscoe     43.3     13.4         209      4400 female 2007
## 162    Gentoo    Biscoe     46.8     15.4         215      5150   male 2007
## 163    Gentoo    Biscoe     40.9     13.7         214      4650 female 2007
## 164    Gentoo    Biscoe     49.0     16.1         216      5550   male 2007
## 165    Gentoo    Biscoe     45.5     13.7         214      4650 female 2007
## 166    Gentoo    Biscoe     48.4     14.6         213      5850   male 2007
## 167    Gentoo    Biscoe     45.8     14.6         210      4200 female 2007
## 168    Gentoo    Biscoe     49.3     15.7         217      5850   male 2007
## 169    Gentoo    Biscoe     42.0     13.5         210      4150 female 2007
## 170    Gentoo    Biscoe     49.2     15.2         221      6300   male 2007
## 171    Gentoo    Biscoe     46.2     14.5         209      4800 female 2007
## 172    Gentoo    Biscoe     48.7     15.1         222      5350   male 2007
## 173    Gentoo    Biscoe     50.2     14.3         218      5700   male 2007
## 174    Gentoo    Biscoe     45.1     14.5         215      5000 female 2007
## 175    Gentoo    Biscoe     46.5     14.5         213      4400 female 2007
## 176    Gentoo    Biscoe     46.3     15.8         215      5050   male 2007
## 177    Gentoo    Biscoe     42.9     13.1         215      5000 female 2007
## 178    Gentoo    Biscoe     46.1     15.1         215      5100   male 2007
## 179    Gentoo    Biscoe     44.5     14.3         216      4100   <NA> 2007
## 180    Gentoo    Biscoe     47.8     15.0         215      5650   male 2007
## 181    Gentoo    Biscoe     48.2     14.3         210      4600 female 2007
## 182    Gentoo    Biscoe     50.0     15.3         220      5550   male 2007
## 183    Gentoo    Biscoe     47.3     15.3         222      5250   male 2007
## 184    Gentoo    Biscoe     42.8     14.2         209      4700 female 2007
## 185    Gentoo    Biscoe     45.1     14.5         207      5050 female 2007
## 186    Gentoo    Biscoe     59.6     17.0         230      6050   male 2007
## 187    Gentoo    Biscoe     49.1     14.8         220      5150 female 2008
## 188    Gentoo    Biscoe     48.4     16.3         220      5400   male 2008
## 189    Gentoo    Biscoe     42.6     13.7         213      4950 female 2008
## 190    Gentoo    Biscoe     44.4     17.3         219      5250   male 2008
## 191    Gentoo    Biscoe     44.0     13.6         208      4350 female 2008
## 192    Gentoo    Biscoe     48.7     15.7         208      5350   male 2008
## 193    Gentoo    Biscoe     42.7     13.7         208      3950 female 2008
## 194    Gentoo    Biscoe     49.6     16.0         225      5700   male 2008
## 195    Gentoo    Biscoe     45.3     13.7         210      4300 female 2008
## 196    Gentoo    Biscoe     49.6     15.0         216      4750   male 2008
## 197    Gentoo    Biscoe     50.5     15.9         222      5550   male 2008
## 198    Gentoo    Biscoe     43.6     13.9         217      4900 female 2008
## 199    Gentoo    Biscoe     45.5     13.9         210      4200 female 2008
## 200    Gentoo    Biscoe     50.5     15.9         225      5400   male 2008
## 201    Gentoo    Biscoe     44.9     13.3         213      5100 female 2008
## 202    Gentoo    Biscoe     45.2     15.8         215      5300   male 2008
## 203    Gentoo    Biscoe     46.6     14.2         210      4850 female 2008
## 204    Gentoo    Biscoe     48.5     14.1         220      5300   male 2008
## 205    Gentoo    Biscoe     45.1     14.4         210      4400 female 2008
## 206    Gentoo    Biscoe     50.1     15.0         225      5000   male 2008
## 207    Gentoo    Biscoe     46.5     14.4         217      4900 female 2008
## 208    Gentoo    Biscoe     45.0     15.4         220      5050   male 2008
## 209    Gentoo    Biscoe     43.8     13.9         208      4300 female 2008
## 210    Gentoo    Biscoe     45.5     15.0         220      5000   male 2008
## 211    Gentoo    Biscoe     43.2     14.5         208      4450 female 2008
## 212    Gentoo    Biscoe     50.4     15.3         224      5550   male 2008
## 213    Gentoo    Biscoe     45.3     13.8         208      4200 female 2008
## 214    Gentoo    Biscoe     46.2     14.9         221      5300   male 2008
## 215    Gentoo    Biscoe     45.7     13.9         214      4400 female 2008
## 216    Gentoo    Biscoe     54.3     15.7         231      5650   male 2008
## 217    Gentoo    Biscoe     45.8     14.2         219      4700 female 2008
## 218    Gentoo    Biscoe     49.8     16.8         230      5700   male 2008
## 219    Gentoo    Biscoe     46.2     14.4         214      4650   <NA> 2008
## 220    Gentoo    Biscoe     49.5     16.2         229      5800   male 2008
## 221    Gentoo    Biscoe     43.5     14.2         220      4700 female 2008
## 222    Gentoo    Biscoe     50.7     15.0         223      5550   male 2008
## 223    Gentoo    Biscoe     47.7     15.0         216      4750 female 2008
## 224    Gentoo    Biscoe     46.4     15.6         221      5000   male 2008
## 225    Gentoo    Biscoe     48.2     15.6         221      5100   male 2008
## 226    Gentoo    Biscoe     46.5     14.8         217      5200 female 2008
## 227    Gentoo    Biscoe     46.4     15.0         216      4700 female 2008
## 228    Gentoo    Biscoe     48.6     16.0         230      5800   male 2008
## 229    Gentoo    Biscoe     47.5     14.2         209      4600 female 2008
## 230    Gentoo    Biscoe     51.1     16.3         220      6000   male 2008
## 231    Gentoo    Biscoe     45.2     13.8         215      4750 female 2008
## 232    Gentoo    Biscoe     45.2     16.4         223      5950   male 2008
## 233    Gentoo    Biscoe     49.1     14.5         212      4625 female 2009
## 234    Gentoo    Biscoe     52.5     15.6         221      5450   male 2009
## 235    Gentoo    Biscoe     47.4     14.6         212      4725 female 2009
## 236    Gentoo    Biscoe     50.0     15.9         224      5350   male 2009
## 237    Gentoo    Biscoe     44.9     13.8         212      4750 female 2009
## 238    Gentoo    Biscoe     50.8     17.3         228      5600   male 2009
## 239    Gentoo    Biscoe     43.4     14.4         218      4600 female 2009
## 240    Gentoo    Biscoe     51.3     14.2         218      5300   male 2009
## 241    Gentoo    Biscoe     47.5     14.0         212      4875 female 2009
## 242    Gentoo    Biscoe     52.1     17.0         230      5550   male 2009
## 243    Gentoo    Biscoe     47.5     15.0         218      4950 female 2009
## 244    Gentoo    Biscoe     52.2     17.1         228      5400   male 2009
## 245    Gentoo    Biscoe     45.5     14.5         212      4750 female 2009
## 246    Gentoo    Biscoe     49.5     16.1         224      5650   male 2009
## 247    Gentoo    Biscoe     44.5     14.7         214      4850 female 2009
## 248    Gentoo    Biscoe     50.8     15.7         226      5200   male 2009
## 249    Gentoo    Biscoe     49.4     15.8         216      4925   male 2009
## 250    Gentoo    Biscoe     46.9     14.6         222      4875 female 2009
## 251    Gentoo    Biscoe     48.4     14.4         203      4625 female 2009
## 252    Gentoo    Biscoe     51.1     16.5         225      5250   male 2009
## 253    Gentoo    Biscoe     48.5     15.0         219      4850 female 2009
## 254    Gentoo    Biscoe     55.9     17.0         228      5600   male 2009
## 255    Gentoo    Biscoe     47.2     15.5         215      4975 female 2009
## 256    Gentoo    Biscoe     49.1     15.0         228      5500   male 2009
## 257    Gentoo    Biscoe     47.3     13.8         216      4725   <NA> 2009
## 258    Gentoo    Biscoe     46.8     16.1         215      5500   male 2009
## 259    Gentoo    Biscoe     41.7     14.7         210      4700 female 2009
## 260    Gentoo    Biscoe     53.4     15.8         219      5500   male 2009
## 261    Gentoo    Biscoe     43.3     14.0         208      4575 female 2009
## 262    Gentoo    Biscoe     48.1     15.1         209      5500   male 2009
## 263    Gentoo    Biscoe     50.5     15.2         216      5000 female 2009
## 264    Gentoo    Biscoe     49.8     15.9         229      5950   male 2009
## 265    Gentoo    Biscoe     43.5     15.2         213      4650 female 2009
## 266    Gentoo    Biscoe     51.5     16.3         230      5500   male 2009
## 267    Gentoo    Biscoe     46.2     14.1         217      4375 female 2009
## 268    Gentoo    Biscoe     55.1     16.0         230      5850   male 2009
## 269    Gentoo    Biscoe     44.5     15.7         217      4875   <NA> 2009
## 270    Gentoo    Biscoe     48.8     16.2         222      6000   male 2009
## 271    Gentoo    Biscoe     47.2     13.7         214      4925 female 2009
## 272    Gentoo    Biscoe       NA       NA          NA        NA   <NA> 2009
## 273    Gentoo    Biscoe     46.8     14.3         215      4850 female 2009
## 274    Gentoo    Biscoe     50.4     15.7         222      5750   male 2009
## 275    Gentoo    Biscoe     45.2     14.8         212      5200 female 2009
## 276    Gentoo    Biscoe     49.9     16.1         213      5400   male 2009
## 277 Chinstrap     Dream     46.5     17.9         192      3500 female 2007
## 278 Chinstrap     Dream     50.0     19.5         196      3900   male 2007
## 279 Chinstrap     Dream     51.3     19.2         193      3650   male 2007
## 280 Chinstrap     Dream     45.4     18.7         188      3525 female 2007
## 281 Chinstrap     Dream     52.7     19.8         197      3725   male 2007
## 282 Chinstrap     Dream     45.2     17.8         198      3950 female 2007
## 283 Chinstrap     Dream     46.1     18.2         178      3250 female 2007
## 284 Chinstrap     Dream     51.3     18.2         197      3750   male 2007
## 285 Chinstrap     Dream     46.0     18.9         195      4150 female 2007
## 286 Chinstrap     Dream     51.3     19.9         198      3700   male 2007
## 287 Chinstrap     Dream     46.6     17.8         193      3800 female 2007
## 288 Chinstrap     Dream     51.7     20.3         194      3775   male 2007
## 289 Chinstrap     Dream     47.0     17.3         185      3700 female 2007
## 290 Chinstrap     Dream     52.0     18.1         201      4050   male 2007
## 291 Chinstrap     Dream     45.9     17.1         190      3575 female 2007
## 292 Chinstrap     Dream     50.5     19.6         201      4050   male 2007
## 293 Chinstrap     Dream     50.3     20.0         197      3300   male 2007
## 294 Chinstrap     Dream     58.0     17.8         181      3700 female 2007
## 295 Chinstrap     Dream     46.4     18.6         190      3450 female 2007
## 296 Chinstrap     Dream     49.2     18.2         195      4400   male 2007
## 297 Chinstrap     Dream     42.4     17.3         181      3600 female 2007
## 298 Chinstrap     Dream     48.5     17.5         191      3400   male 2007
## 299 Chinstrap     Dream     43.2     16.6         187      2900 female 2007
## 300 Chinstrap     Dream     50.6     19.4         193      3800   male 2007
## 301 Chinstrap     Dream     46.7     17.9         195      3300 female 2007
## 302 Chinstrap     Dream     52.0     19.0         197      4150   male 2007
## 303 Chinstrap     Dream     50.5     18.4         200      3400 female 2008
## 304 Chinstrap     Dream     49.5     19.0         200      3800   male 2008
## 305 Chinstrap     Dream     46.4     17.8         191      3700 female 2008
## 306 Chinstrap     Dream     52.8     20.0         205      4550   male 2008
## 307 Chinstrap     Dream     40.9     16.6         187      3200 female 2008
## 308 Chinstrap     Dream     54.2     20.8         201      4300   male 2008
## 309 Chinstrap     Dream     42.5     16.7         187      3350 female 2008
## 310 Chinstrap     Dream     51.0     18.8         203      4100   male 2008
## 311 Chinstrap     Dream     49.7     18.6         195      3600   male 2008
## 312 Chinstrap     Dream     47.5     16.8         199      3900 female 2008
## 313 Chinstrap     Dream     47.6     18.3         195      3850 female 2008
## 314 Chinstrap     Dream     52.0     20.7         210      4800   male 2008
## 315 Chinstrap     Dream     46.9     16.6         192      2700 female 2008
## 316 Chinstrap     Dream     53.5     19.9         205      4500   male 2008
## 317 Chinstrap     Dream     49.0     19.5         210      3950   male 2008
## 318 Chinstrap     Dream     46.2     17.5         187      3650 female 2008
## 319 Chinstrap     Dream     50.9     19.1         196      3550   male 2008
## 320 Chinstrap     Dream     45.5     17.0         196      3500 female 2008
## 321 Chinstrap     Dream     50.9     17.9         196      3675 female 2009
## 322 Chinstrap     Dream     50.8     18.5         201      4450   male 2009
## 323 Chinstrap     Dream     50.1     17.9         190      3400 female 2009
## 324 Chinstrap     Dream     49.0     19.6         212      4300   male 2009
## 325 Chinstrap     Dream     51.5     18.7         187      3250   male 2009
## 326 Chinstrap     Dream     49.8     17.3         198      3675 female 2009
## 327 Chinstrap     Dream     48.1     16.4         199      3325 female 2009
## 328 Chinstrap     Dream     51.4     19.0         201      3950   male 2009
## 329 Chinstrap     Dream     45.7     17.3         193      3600 female 2009
## 330 Chinstrap     Dream     50.7     19.7         203      4050   male 2009
## 331 Chinstrap     Dream     42.5     17.3         187      3350 female 2009
## 332 Chinstrap     Dream     52.2     18.8         197      3450   male 2009
## 333 Chinstrap     Dream     45.2     16.6         191      3250 female 2009
## 334 Chinstrap     Dream     49.3     19.9         203      4050   male 2009
## 335 Chinstrap     Dream     50.2     18.8         202      3800   male 2009
## 336 Chinstrap     Dream     45.6     19.4         194      3525 female 2009
## 337 Chinstrap     Dream     51.9     19.5         206      3950   male 2009
## 338 Chinstrap     Dream     46.8     16.5         189      3650 female 2009
## 339 Chinstrap     Dream     45.7     17.0         195      3650 female 2009
## 340 Chinstrap     Dream     55.8     19.8         207      4000   male 2009
## 341 Chinstrap     Dream     43.5     18.1         202      3400 female 2009
## 342 Chinstrap     Dream     49.6     18.2         193      3775   male 2009
## 343 Chinstrap     Dream     50.8     19.0         210      4100   male 2009
## 344 Chinstrap     Dream     50.2     18.7         198      3775 female 2009

Exercises

These can be done in the .Rmd or in a separate .R script. To start, install and load the gapminder package.

1. Write a data processing snippet to include only the data points collected after 1995 in Asian countries and save a CSV file
2. Separate the gapminder data frame into 5 individual data frames, one for each continent. Store those 5 data frames as an RData file in the results folder called continents.RData.
3. Finish exploring the gapminder data frame and a) Find the number of rows and the number of columns, b) Print the data type of each column, c) Explain the meaning of everything that str(gapminder) prints
4. In which years has the GDP of Canada been larger than the average of all data points?
5. Find the mean life expectancy of Switzerland before and after 2000
6. You discovered that all the entries from 2007 are actually from 2008. Create a copy of the full gapminder data frame in an object called gp. Then change the year column to correct the entries from 2007.
Bonus - Find the mean life expectancy and mean gdp per continent using the function tapply

hint: to understand the tapply function, use the ? operator, e.g. ?tapply

Solutions

# install.packages("gapminder")
library(gapminder)
1. Write a data processing snippet to include only the data points collected after 1995 in Asian countries and save a CSV file
asia<-gapminder[gapminder$year > 1995 & gapminder$continent=="Asia", ]

# quote = TRUE keeps country names that contain a comma (e.g. "Korea, Rep.") in a single field
write.table(asia,
            file = "data/gapminder_after1995_asia.csv",
            sep = ",", 
            quote = TRUE, 
            row.names = FALSE)
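
Several gapminder country names contain a comma (for example "Korea, Rep." and "Hong Kong, China"), so it matters whether fields are quoted when a comma-separated file is written. The sketch below is only an illustration: it uses a small made-up data frame (toy is not part of gapminder) and a temporary file to check that a quoted file reads back with the same columns.

```r
# Toy data frame; the values are made up for illustration only
toy <- data.frame(country = c("Korea, Rep.", "Japan"),
                  year = c(1997, 2002))

# Temporary file so that nothing is written into the project folder
csv_file <- tempfile(fileext = ".csv")

# quote = TRUE wraps character fields in double quotes, which keeps
# the comma inside "Korea, Rep." from being read as a column separator
write.table(toy, file = csv_file, sep = ",", quote = TRUE, row.names = FALSE)

# Read the file back and check its shape and the country column
toy_back <- read.csv(csv_file)
dim(toy_back)
toy_back$country
```
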
2. Separate the gapminder data frame into 5 individual data frames, one for each continent. Store those 5 data frames as an RData file in the results folder called continents.RData.
asia<-gapminder[gapminder$continent=="Asia", ]
africa<-gapminder[gapminder$continent=="Africa", ]
oceania<-gapminder[gapminder$continent=="Oceania", ]
europe<-gapminder[gapminder$continent=="Europe", ]
americas<-gapminder[gapminder$continent=="Americas", ]

save(asia,africa,oceania,europe,americas,file="../results/continents.RData")
3. Finish exploring the gapminder data frame and a) Find the number of rows and the number of columns, b) Print the data type of each column, c) Explain the meaning of everything that str(gapminder) prints
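# a) number of rows and number of columns
dim(gapminder)

# b) data type of each column
# note: typeof() reports how the values are stored, so factor columns show up as "integer"
typeof(gapminder$country)
typeof(gapminder$continent)
typeof(gapminder$year)
typeof(gapminder$lifeExp)
typeof(gapminder$pop)
typeof(gapminder$gdpPercap)

# c) str() prints the class of the object, its dimensions, and for each column
# its name, its type and the first few values
str(gapminder)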
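
As a side note on (b), typeof() and class() answer slightly different questions, which matters for factor columns such as country and continent. A minimal sketch with a made-up factor (not the gapminder column itself):

```r
# A small factor, similar in kind to gapminder$continent
continent <- factor(c("Asia", "Europe", "Asia"))

typeof(continent)   # "integer": factors are stored as integer codes
class(continent)    # "factor": how R treats the object
levels(continent)   # "Asia" "Europe": the labels behind the codes
```
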
4. In which years has the GDP of Canada been larger than the average of all data points?
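canada<-gapminder[gapminder$country=="Canada",]
# gdpPercap is GDP per capita; here the average is taken over Canada's own records.
# Use mean(gapminder$gdpPercap) instead to compare against the whole data set.
mgdp<-mean(canada$gdpPercap)
canada[canada$gdpPercap>mgdp,"year"]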
5. Find the mean life expectancy of Switzerland before and after 2000
swiss<-gapminder[gapminder$country=="Switzerland",]
mean(swiss[swiss$year<2000,]$lifeExp) # Before
mean(swiss[swiss$year>2000,]$lifeExp) # After
6. You discovered that all the entries from 2007 are actually from 2008. Create a copy of the full gapminder data frame in an object called gp. Then change the year column to correct the entries from 2007.
gp<-gapminder
gp[gp$year==2007,"year"]<-2008
gp[gp$year==2008,]
Bonus - Find the mean life expectancy and mean gdp per continent using the function tapply

hint: to understand the tapply function, use the ? operator, e.g. ?tapply

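?tapply
tapply(gapminder$lifeExp, gapminder$continent, mean)   # mean life expectancy per continent
tapply(gapminder$gdpPercap, gapminder$continent, mean) # mean GDP per capita per continent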
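
To see what tapply() does on a scale that can be checked by hand, here is a small sketch with made-up numbers (the vectors life_exp and continent are hypothetical and only mimic the gapminder columns):

```r
# Toy values and a grouping vector of the same length (made-up numbers)
life_exp  <- c(70, 80, 60, 50)
continent <- c("Europe", "Europe", "Africa", "Africa")

# tapply(X, INDEX, FUN): split X by the groups in INDEX,
# then apply FUN to each piece
means <- tapply(life_exp, continent, mean)
means  # Africa: 55, Europe: 75
```
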

5. Advanced data manipulation

In this module, participants will familiarize themselves with the dplyr syntax; create pipes with the %>% operator; perform operations on data frames using dplyr and tidyr functions; learn how to join columns and rows of different data frames; and implement functions from external packages by reading their documentation in R.

Module content: - Handling data frames with dplyr - Other useful packages - Hands-on: advanced data manipulation

Script

Examples

Here we go over the material.

# install.packages("dplyr")
library(dplyr)
library(tidyr)
library(palmerpenguins)

df <- penguins
Dplyr
?dplyr
browseVignettes(package = "dplyr") # note this opens a window

dplyr is a package widely used in R analysis that makes it easier to work with large datasets. It provides useful functions as well as a “grammar” called piping, which chains these functions together.

Piping uses the %>% operator. In short, this operator “passes” the result of the code on its left to the function on its right, as that function’s first argument.

Let us look at the following example:

df %>% 
  # select() is a dplyr function that lets you select columns using names and types.
  # The pipe passes df as an input to select().
  dplyr::select(species)

# alternatively, one could do select(df, species)

Note the packageName::functionName notation. This is good practice for two reasons: 1) traceability, and 2) some packages mask functions of the same name from other packages.
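
To illustrate the second reason: once dplyr is loaded, a plain filter() call refers to dplyr::filter() rather than to stats::filter(), which does something quite different. A small sketch with a made-up data frame (toy is hypothetical):

```r
library(dplyr)

toy <- data.frame(a = 1:5)

# dplyr::filter() keeps the rows of a data frame that match a condition
dplyr::filter(toy, a > 3)

# stats::filter() applies a linear filter to a numeric series
# (here a centred moving average over three values)
stats::filter(1:5, rep(1 / 3, 3))
```

Writing the package name explicitly makes it clear which of the two is meant, regardless of the order in which packages were loaded.
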

Let us try more functions. Common, and useful, functions include select(), filter(), mutate(), group_by(), and summarise(). Piping becomes more useful once you start to chain more than one function; the output of the previous function is passed onto the next.

df %>% 
  # select specified columns
  dplyr::select(species, island, year) %>% 
  
  # keep rows where year is greater than 2007
  dplyr::filter(year > 2007) %>% 
  
  # add a logical column that is TRUE for Torgersen and FALSE for the other islands
  dplyr::mutate(Torgersen = island == "Torgersen") %>% 
  
  # create groups which are a combination of the values in species and Torgersen
  dplyr::group_by(species, Torgersen) %>%
  
  # Count the number of entries in each group
  dplyr::summarise(number = dplyr::n())
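
A chain of pipes is equivalent to nesting the same calls inside one another; the pipe mainly changes the reading order. The sketch below shows both forms on a made-up data frame (toy, x and g are hypothetical names):

```r
library(dplyr)

toy <- data.frame(x = c(1, 2, 3, 4), g = c("a", "a", "b", "b"))

# With pipes: read from top to bottom
piped <- toy %>%
  dplyr::filter(x > 1) %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(total = sum(x))

# Without pipes: the same calls, nested from the inside out
nested <- dplyr::summarise(
  dplyr::group_by(dplyr::filter(toy, x > 1), g),
  total = sum(x)
)

# Both forms give one row per group with the same totals
piped
nested
```
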
Mutating joins

Mutating joins are a group of dplyr functions which allow you to join two datasets together. Different joins exist: inner_join(), left_join(), right_join(), and full_join().

In the following example, we will create a new data frame with a code for each species, and join it to df.

# first we look at what species are present in the dataset
df %>% select(species) %>% unique(.)
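# then we create the code data frame, with one row per species
# (distinct() drops the repeated rows; without it, the join below would
# match every penguin to every row of its species)
df_code <- df %>% 
  dplyr::select(species) %>%
  dplyr::distinct() %>%
  dplyr::mutate(code = dplyr::case_when(
    species == "Adelie" ~ 1,
    species == "Gentoo" ~ 2,
    species == "Chinstrap" ~ 3
  ))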

head(df_code)
# Now we join the two together.
df %>%
  dplyr::left_join(., df_code, by = "species")
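
The four mutating joins differ in which rows they keep when a key is present in only one of the two tables. A minimal sketch with two made-up tables (left_tbl and right_tbl are hypothetical) that share the key column id:

```r
library(dplyr)

# Two toy tables sharing the key column "id" (made-up values)
left_tbl  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

# Rows kept by each join
nrow(dplyr::inner_join(left_tbl, right_tbl, by = "id"))  # ids 2, 3
nrow(dplyr::left_join(left_tbl, right_tbl, by = "id"))   # ids 1, 2, 3 (y is NA for id 1)
nrow(dplyr::right_join(left_tbl, right_tbl, by = "id"))  # ids 2, 3, 4 (x is NA for id 4)
nrow(dplyr::full_join(left_tbl, right_tbl, by = "id"))   # ids 1, 2, 3, 4
```
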
Other Packages
Stringr

For fun, let’s look at and implement stringr, a string manipulation package.

# install.packages("stringr")
library(stringr)

We will extract the first three letters of each island to create an island code. To do this, we will use stringr::str_sub():

df %>%
  dplyr::mutate(island_code = stringr::str_sub(island, 1, 3))
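
Outside of a data frame, the same call can be tried on a plain character vector. A minimal sketch, where the vector islands simply lists the three island names that appear in the data:

```r
library(stringr)

islands <- c("Torgersen", "Biscoe", "Dream")

# Characters 1 to 3 of each string
stringr::str_sub(islands, 1, 3)
# "Tor" "Bis" "Dre"
```
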
Tidyr

tidyr is a useful package that helps clean/sort/manipulate data.

Useful functions include pivot_longer() and pivot_wider(), which are often used in combination with ggplot2 to plot data (don’t worry, we will see plotting in the next module!).

Let’s use pivot_wider() as an example. These functions can also be piped.

df %>% 
  pivot_wider(names_from = island, values_from = species)

Note! Doing the above really isn’t useful in this situation, but there are times when it is useful to have a wider data frame. For example, in ecology, this data set can be adapted using pivot_wider() to create a site by species data frame.
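
As a sketch of the site by species idea, the example below counts observations per site and species and then spreads the species into columns. The data frame obs and its values are made up for illustration; with the penguins, island would play the role of site.

```r
library(dplyr)
library(tidyr)

# Toy observations: one row per animal seen (made-up data)
obs <- data.frame(
  site    = c("A", "A", "A", "B", "B"),
  species = c("sp1", "sp1", "sp2", "sp2", "sp3")
)

site_by_species <- obs %>%
  dplyr::count(site, species) %>%           # one row per site/species pair
  tidyr::pivot_wider(names_from = species,  # one column per species
                     values_from = n,
                     values_fill = 0)       # absent combinations become 0

site_by_species
```

The result has one row per site and one count column per species, which is the layout many community ecology functions expect.
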

Exercises

You can do these in the notebook or in a separate .R script. We again use the gapminder library. If it is not installed, install it, and if it is not loaded, load it. Note! If you use a separate .R script, you must load the necessary additional libraries.

1. Write one command (can span multiple lines) using pipes that will output a data frame that has only the columns lifeExp, country and year for the records before the year 2000 from African countries, but not for other Continents.
2. Calculate the average life expectancy per country. Which country has the longest average life expectancy and which one the shortest average life expectancy?
3. In the previous hands-on you discovered that all the entries from 2007 are actually from 2008. Write a command to edit the data accordingly using pipes. In the same command filter only the entries from 2008 to verify the change.
4. Let’s add data from a different .csv to our data frame.
a) Read in the co2_pcap_cons.csv found here.
b) Look at the data. What do you notice? What will you need to change to be able to join the two dataframes together?
c) Perform the changes you identified in (b)

hint: use stringr!

d) join both data frames

Solutions

library(gapminder)
library(dplyr)
library(tidyr)
library(stringr) # needed for str_extract() in exercise 4
1. Write one command (can span multiple lines) using pipes that will output a data frame that has only the columns lifeExp, country and year for the records before the year 2000 from African countries, but not for other Continents.
tidy_africa <- gapminder %>%
                # keep African records from before the year 2000
                dplyr::filter(continent == "Africa", year < 2000) %>%
                dplyr::select(year, country, lifeExp)
head(tidy_africa)
2. Calculate the average life expectancy per country. Which country has the longest average life expectancy and which one the shortest average life expectancy?
gapminder %>%
   dplyr::group_by(country) %>%
   dplyr::summarize(mean_lifeExp = mean(lifeExp)) %>%
   dplyr::filter(mean_lifeExp == min(mean_lifeExp) | mean_lifeExp == max(mean_lifeExp))
3. In the previous hands-on you discovered that all the entries from 2007 are actually from 2008. Write a command to edit the data accordingly using pipes. In the same command filter only the entries from 2008 to verify the change.
gapminder %>%
  dplyr::mutate(year = ifelse(year==2007,2008,year)) %>%
  dplyr::filter(year==2008) %>%
  head()
4. Let’s add data from a different .csv to our data frame.
a) Read in the co2_pcap_cons.csv found here.
new_data <- read.csv("./exercises/exercises_module1/data/co2_pcap_cons.csv")
b) Look at the data. What do you notice? What will you need to change to be able to join the two dataframes together?
  • Change the data frame such that country , year, and emissions are separate columns.

  • Change year to numeric.

c) Perform the changes you identified in (b)

hint: use stringr!

new_data_t <- new_data %>% 
  pivot_longer(cols = -country, names_to = "year", values_to = "emissions") %>%
  mutate(year = as.numeric(str_extract(year, "\\d+")))

Note that here I use str_extract() combined with a regular expression, but you can also use str_sub() with start = 2 and end = 5, as seen in the examples above!

d) join both data frames
gapminder %>% left_join(., new_data_t, by = c("country", "year"))
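
The two options mentioned in (c) can be compared on a few column names. The sketch below assumes that the year columns arrive with names such as X1800; this is what read.csv() produces by default for column names that start with a digit, but the vector col_names itself is made up for illustration.

```r
library(stringr)

# Assumed shape of the year column names after read.csv()
col_names <- c("X1800", "X1950", "X2007")

# Option 1: pull out the run of digits with a regular expression
as.numeric(stringr::str_extract(col_names, "\\d+"))

# Option 2: take characters 2 to 5 by position
as.numeric(stringr::str_sub(col_names, 2, 5))
```

The regular expression does not depend on where the digits sit in the name, while str_sub() relies on every name having the same layout.
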

6. Generating visual outputs

This section will show participants how to (1) create basic plots using base R functions; (2) connect data frames with ggplot2; (3) create basic graphs with ggplot2; (4) use factors to customize graphics in ggplot2; and (5) use RMarkdown to generate customized reports.

Module content: - Figures with base R - Graphics with ggplot2

Script

Examples

Here we go over the material.

We will continue using the palmer penguins as an example.

library(ggplot2)
library(palmerpenguins)
library(tidyr)
library(dplyr)

df <- penguins
Basic Plots

Base R has a number of basic plotting functions. The most basic of these is plot().

plot(df$species, df$bill_length_mm)
plot(df$bill_depth_mm, df$bill_length_mm)

A similar function exists to look at distributions: hist()

hist(df$body_mass_g)
Using ggplot2

Most visualizations, however, are done with the package ggplot2. This package allows for more control over the graphs, and produces better visualizations. Furthermore, ggplot2 is tailored to making visuals from data frames.

In this section, we go over the basics of ggplot2, but https://ggplot2.tidyverse.org/ has a handy cheatsheet!

The basic architecture of a ggplot2 plot is ggplot(df, aes(x, y)) + ..., where df represents the data frame of data, x your x variable, and y your y variable. aes stands for aesthetic. The type of plot is specified after the first call, for example a scatter plot is created via ggplot(df, aes(x, y)) + geom_point().
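
As a minimal sketch of this architecture, the example below uses mtcars, a data set that ships with R, rather than the penguins; wt and mpg are two of its numeric columns.

```r
library(ggplot2)

# Data frame first, then the aesthetic mapping, then the plot type
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()  # draw one point per row

p  # printing the object draws the plot
```

Further layers, scales and themes are added to the same object with +, as in the examples that follow.
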

Let’s first set up a color scheme:

colorcode <- c("darkorange","purple","cyan4") # scheme from citation("palmerpenguins")

Now we look at the number of penguins of each species on each island. This calls for a bar chart; in ggplot2 you do not specify a y variable for geom_bar(), as the count is calculated automatically. Here we also supply a fill aesthetic to color the bars by species.

df %>%
  ggplot(aes(x = island, fill = species))+
  geom_bar()+
  scale_fill_manual(values = colorcode, 
                    guide = NULL) +
  # basic theme
  theme_minimal()

If we want to separate by species, we can use facet_wrap():

df %>% 
  ggplot(aes(x = island, fill = species))+ 
  geom_bar()+ 
  scale_fill_manual(values = colorcode, guide = NULL) + 
  theme_minimal()+
  facet_wrap(.~species)

Note that the same plot can be created a different way, specifying the y aesthetic. Notice the differences!

df %>% 
  
  # we calculate the number of penguins of each species at each island
  group_by(island, species) %>%
  summarise(n = n()) %>%
  
  #now we can specify n
  ggplot(aes(x = island, y = n, fill = species))+ 
  
  # since we specify n, we need to tell the function via stat = "identity"
  geom_bar(stat = "identity")+ 
  scale_fill_manual(values = colorcode, guide = NULL) + 
  theme_minimal()+
  facet_wrap(.~species)
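
When the heights are already calculated, ggplot2 also offers geom_col(), which uses the supplied y value without needing stat = "identity". A minimal sketch with made-up counts (the data frame counts is hypothetical):

```r
library(ggplot2)

# Toy counts, already summarised (made-up numbers)
counts <- data.frame(island = c("A", "B", "C"), n = c(10, 25, 5))

# geom_bar() needs stat = "identity" to use a supplied y value...
p_bar <- ggplot(counts, aes(x = island, y = n)) +
  geom_bar(stat = "identity")

# ...whereas geom_col() uses the supplied y value by default
p_col <- ggplot(counts, aes(x = island, y = n)) +
  geom_col()
```
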

Let’s look at bill length in each species.

df %>%
  ggplot(aes(x = species, y = bill_length_mm, fill = species ))+
  geom_boxplot()+
  scale_fill_manual(values = colorcode, guide = NULL) + 
  theme_minimal()

Now let’s plot the bill length and depth against each other:

df %>% 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = island))+ 
  geom_point()+ 
  scale_color_manual(values = colorcode, guide = NULL) + 
  theme_minimal()

Now let’s look at how many penguins of each sex and species there are using a bar chart.

df %>% 
  ggplot(aes(x = sex, fill = species)) +
  
  # alpha corresponds to transparency. the smaller, the more transparent
  geom_bar(alpha = 0.8) +
  
  # code to control the 'fill' aes
  scale_fill_manual(values = colorcode, 
                    guide = NULL) +
  
  # basic theme
  theme_minimal() +
  
  # one panel per species, stacked in a single column, with the axes flipped
  facet_wrap(~species, ncol = 1) +
  coord_flip()

# note this snippet of code is adapted from citation("palmerpenguins")

Let’s look at the distribution of body mass by sex.

df %>%
  ggplot(aes(x = body_mass_g, fill = species))+
  geom_histogram(alpha = 0.8) +
  scale_fill_manual(values = colorcode, 
                    guide = NULL) +
  theme_minimal() +
  facet_wrap(. ~ sex)

If we want to separate all of them out in a nice grid format, we can use facet_grid():

df %>%
  ggplot(aes(x = body_mass_g, fill = species))+
  geom_histogram(alpha = 0.8) +
  scale_fill_manual(values = colorcode, 
                    guide = NULL) +
  theme_minimal() +
  facet_grid(species ~ sex)
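
When no bin settings are given, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better value. A hedged sketch of setting the bin width explicitly, on made-up body masses (toy is an assumption standing in for df, and 250 g is an arbitrary choice to illustrate the argument):

```r
library(ggplot2)

# made-up body masses in grams, standing in for df$body_mass_g
toy <- data.frame(
  body_mass_g = c(3200, 3450, 3700, 3750, 4100, 4300, 4650, 5000, 5400, 5850)
)

# binwidth is given in the units of the x variable, here grams
p_hist <- ggplot(toy, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250, alpha = 0.8) +
  theme_minimal()

# every observation lands in exactly one bin
bin_counts <- layer_data(p_hist)$count
```

Trying a few bin widths is worthwhile, since the shape of a histogram can change noticeably with the choice.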

Finally, let’s look at this data over time. Specifically, the number of penguins found each year!

df %>%
  group_by(year, species) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = year, y = n, color = species))+
  geom_point()+
  geom_line()+
  scale_color_manual(values = colorcode, 
                    guide = NULL) +
  theme_minimal()+
  
  # here we label the x and y axes. We set the x-axis breaks so that we don't end up with 2007.5 and 2008.5 in the graph.
  xlab("Year")+
  ylab("Number of penguins")+
  scale_x_continuous(breaks = c(2007, 2008, 2009))
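
The breaks do not have to be typed by hand. As a sketch, they can be taken from the data, so the axis stays correct if more years are added. The counts below are invented for illustration, and toy_counts is a hypothetical stand-in for the summarised table above.

```r
library(ggplot2)

# invented yearly counts standing in for the summarised penguin table
toy_counts <- data.frame(
  year    = c(2007, 2008, 2009, 2007, 2008, 2009),
  species = rep(c("Adelie", "Gentoo"), each = 3),
  n       = c(5, 6, 4, 3, 5, 5)
)

# one break per year that actually occurs in the data
year_breaks <- sort(unique(toy_counts$year))

p_time <- ggplot(toy_counts, aes(x = year, y = n, color = species)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = year_breaks) +
  labs(x = "Year", y = "Number of penguins") +  # same effect as xlab() + ylab()
  theme_minimal()
```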

Many more graph types exist, and many aspects of a graph can be modified in ggplot2. The best way to learn is by making them yourself!

Exercises

You can do these in the notebook or in a separate .R script.

For these exercises, you need the packages ggplot2, medicaldata, dplyr and tidyr. All questions use the covid_testing data set from medicaldata.

1. Let’s start by looking at the clinics.
a) How many clinics participated in the study?
b) How many valid tests were performed at each one?
c) Did the testing trend vary over time?
2. Now looking at the patients.
a) How many patients tested positive vs negative in the first 100 days of the pandemic?
b) Do you notice any difference with the age of the patients?

Hint: You can make two age groups and calculate the percentage of positive vs negative tests in each age group.

3. Processing times
a) Look at the time from specimen collection to receipt. Did the sample processing times improve over the first 100 days of the pandemic?
b) Plot the median processing times of each day over the course of the pandemic and then compare the summary statistics of the first 50 vs the last 50 days.
4. Bonus - Higher viral loads are detected in fewer PCR cycles. What can you observe about the viral load of positive vs negative samples? Do you notice any differences in viral load across ages in the positive samples?

Hint: Also split the data into two age groups and try using geom_boxplot()

Solutions

suppressPackageStartupMessages(library(ggplot2)) 
suppressPackageStartupMessages(library(medicaldata))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))

Use the covid_testing data set and everything you’ve learned so far to answer the following questions:

1. Let’s start by looking at the clinics.
covid <- medicaldata::covid_testing
a) How many clinics participated in the study?
clinics<- covid %>%
          dplyr::select(subject_id,clinic_name,result,pan_day) %>%
          dplyr::distinct()
length(unique(clinics$clinic_name))
b) How many valid tests were performed at each one?
clinics %>%
  dplyr::filter(result!="invalid") %>%
  dplyr::group_by(clinic_name) %>%
  dplyr::summarize(n_test = length(clinic_name)) %>%
  dplyr::arrange(desc(n_test))
c) Did the testing trend vary over time?
covid %>%
  filter(pan_day<=100) %>%
  group_by(pan_day) %>%
  summarize(n=length(result)) %>%
  ggplot(.,aes(x=pan_day,y=n))+
    geom_point()+
    geom_line()+
    ylab("Number of tests per day")+
    xlab("Pandemic day")
2. Now looking at the patients.
a) How many patients tested positive vs negative in the first 100 days of the pandemic?
covid %>%
  filter(result!="invalid" & pan_day<=100) %>%
  group_by(result) %>%
  summarize(n=length(subject_id))
b) Do you notice any difference with the age of the patients?

Hint: You can make two age groups and calculate the percentage of positive vs negative tests in each age group.

tsts_age<-covid %>%
            filter(result!="invalid" & pan_day<=100) %>%
            mutate(age_group=ifelse(age<=21,"children","adults")) %>%
            group_by(age_group,result) %>%
            summarize(n=length(subject_id)) %>%
            mutate(percent_total=n/sum(n)*100)
tsts_age
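
Why percent_total adds up to 100 within each age group: summarize() drops the last grouping variable (result), so the table is still grouped by age_group when mutate() runs, and sum(n) is the total for that age group only. A small sketch with invented rows (toy is hypothetical, not the covid data; .groups = "drop_last" only makes the default behaviour explicit):

```r
library(dplyr)

# invented test results standing in for the covid data
toy <- data.frame(
  age_group = c("adults", "adults", "adults", "children", "children"),
  result    = c("negative", "negative", "positive", "negative", "positive")
)

toy_pct <- toy %>%
  group_by(age_group, result) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # still grouped by age_group
  mutate(percent_total = n / sum(n) * 100)       # sum(n) is per age group

# check: percentages sum to 100 inside each age group
totals <- toy_pct %>% summarize(total = sum(percent_total))
totals
```

Calling ungroup() before mutate() would instead give each row's share of all tests.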
3. Processing times
a) Look at the time from specimen collection to receipt (col_rec_tat). Did the sample processing times improve over the first 100 days of the pandemic?
covid %>%
  # keep the first 100 days, as the question asks
  filter(pan_day<=100) %>%
  group_by(pan_day) %>%
  dplyr::summarise(median_col_rec_tat=median(col_rec_tat)) %>%
  ggplot(.,aes(x=pan_day,y=median_col_rec_tat)) +
  geom_point()+
  geom_line()
b) Plot the median processing times of each day over the course of the pandemic and then compare the summary statistics of the first 50 vs the last 50 days.
covid %>%
  # days up to and including day 50 vs. all later days
  mutate(pan_day_group=ifelse(pan_day<=50,"first_50","last_50")) %>%
  group_by(pan_day_group) %>%
  dplyr::summarise(mean_col_rec_tat=mean(col_rec_tat),
                   median_col_rec_tat=median(col_rec_tat),
                   min_col_rec_tat=min(col_rec_tat),
                   max_col_rec_tat=max(col_rec_tat))
4. Bonus - Higher viral loads are detected in fewer PCR cycles. What can you observe about the viral load of positive vs negative samples? Do you notice any differences in viral load across ages in the positive samples?

Hint: Also split the data into two age groups and try using geom_boxplot()

ggplot(covid,aes(y=ct_result,x=result,color=result))+
  geom_boxplot()
covid %>%
  mutate(age_group=ifelse(age<=21,"children","adults")) %>%
  ggplot(.,aes(y=ct_result,x=result,color=age_group))+
    geom_boxplot()

Repo structure

├── README.md
├── .gitignore
├── _config.yaml
├── Intro_Mon-2510.Rproj
├── Exercises
│   ├── README.md
│   ├── data
│   │   ├── co2_pcap_cons.csv
│   │   └── penguins.csv
│   └── scripts
│       ├── 01GettingStarted.Rmd and .html
│       ├── 02DataTypes.Rmd and .html
│       ├── 03ControlStructuresAndFunctions.Rmd and .html
│       ├── 04BasicDataManipulation.Rmd and .html
│       ├── 05AdvancedDataManipulation.Rmd and .html
│       └── 06GeneratingOutputs.Rmd and .html
├── Slides
│   └── IntroToR_CBHWorkshop.pptx
└── Outline
    └── workshopoutline.pdf #using the template provided

References

The materials for this workshop were based on the following sources:
- Base R Cheat Sheet
- Google’s R Style Guide
- Mastering Software Development in R

Data in this workshop comes from:
- Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218
- Gapminder data free from www.gapminder.org, CC-BY LICENSE
- Bryan J (2025). gapminder: Data from Gapminder. R package version 1.0.1. https://CRAN.R-project.org/package=gapminder. doi: 10.32614/CRAN.package.gapminder
- Higgins P (2021). medicaldata: Data Package for Medical Datasets. R package version 0.2.0. https://CRAN.R-project.org/package=medicaldata. doi: 10.32614/CRAN.package.medicaldata

Workshop created as part of the McGill Initiative in Computational Medicine