Bioinformatics for Cancer Genomics

CBW BiCG Unix and R Review Session

Introduction

Welcome to the review for Unix and R! This lab will introduce you to the the command line and using R.

After this lab, you will be able to:

  • use common commands in Unix
  • read data into R
  • produce basic plots in R

Before We Begin

Read these directions for information on how to log in to your assigned Amazon node.

Unix Review

Basic commands:

  • ls: list the files in the directory
ls
  • mkdir: make a new directory
mkdir Review_Session
  • cd: change directories;
cd Review_Session

cd ... will take you up one directory.

  • pwd: print working directory
pwd
  • echo: print what it typed
echo 'Hello World!' > test.txt
  • cat: print the contents of a file
cat test.txt
  • curl: used to get contents from a URL
curl https://raw.githubusercontent.com/bioinformaticsdotca/Genomic_Med_2017/master/test.fasta > test.fasta
curl https://raw.githubusercontent.com/bioinformaticsdotca/BiCG_2017/master/Review_session/Gene_R_example.csv > Gene_R_Example.csv
  • head and tail: get the beginning or end of a file
head test.fasta
tail test.fasta
  • less and more: look at the contents of a file
less test.fasta

To exit less, press q.

  • cp: copy
cp test.fasta test2.fasta
  • mv: move
mv test2.fasta test3.fasta
  • rm: remove
rm test3.fasta

Note: you should be using rm -i to avoid accidentally deleting a wanted file.

  • grep: pattern matching
grep TTT test.fasta

You can use the pipe character | to string commands together:

head test.fasta | grep TTT

R Review

Connecting To RStudio

  • Open an internet browser.
  • In the URL bar, enter http://##.oicrcbw.ca:8080 replacing xx with your provided student number.
  • Enter the supplied username and password.

RStudio Notebooks

Information on RStudio Notebooks can be found here.

RStudio notebooks are written in R Markdown and contain text that can be executed independently.

To start a new notebook, File -> New File -> R Notebook.

To run code chunks, place your cursor within the code chunk and press Cmd+Shift+Enter on Mac and Crtl+Shift+Enter or click the green triangle run button. The output of the chunk appears below the code chunk.

Copy the code here into the notebook you have created.

Getting Around

The Hard Way

# get the current working directory:
getwd()
# set a new working directory:
setwd("C:/myPATH")
setwd("~/myPATH") # on Mac
setwd("/Users/david/myPATH") # on Mac
# list the files in the current working directory:
list.files()
# list objects in the current R session:
ls()

The Easy Way

In RStudio we can use “Session” > “Set Working Directory”.

Data Types

Vectors

numeric.vector <- c(1,2,3,4,5,6,2,1)
numeric.vector

character.vector <- c("Fred", "Barney", "Wilma", "Betty")
character.vector

logical.vector <- c(TRUE, TRUE, FALSE, TRUE)
logical.vector

To refer to elements in the vector:

character.vector
character.vector[2]
character.vector[2:3]
character.vector[c(2,4)]

Matrices

You can create a 3x4 numeric matrix with:

matrix.example <- matrix(1:12, nrow = 3, ncol=4, byrow = FALSE)
matrix.example

matrix.example <- matrix(1:12, nrow = 3, ncol=4, byrow = TRUE)
matrix.example

Alternatively, you can create a matrix by combining vectors:

dataset.a <- c(1,22,3,4,5)
dataset.b <- c(10,11,13,14,15)
dataset.a
dataset.b

rbind.together <- rbind(dataset.a, dataset.b)
rbind.together

cbind.together <- cbind(dataset.a, dataset.b)
cbind.together

To get elements of the matrix:

matrix.example[2,4]
matrix.example[2,]
matrix.example[,4]

You can add column and row names to the matrix and use the new names to get the elements of the matrix:

colnames(matrix.example) <- c("Sample1","Sample2","Sample3","Sample4")
rownames(matrix.example) <- paste("gene",1:3,sep="_")
matrix.example

matrix.example[,"Sample2"]
matrix.example[1,"Sample2"]
matrix.example["gene_1","Sample2"]

Note that all columns in a matrix must have the same mode(numeric, character, etc.) and the same length.

Dataframes

Dataframes are similar to arrays but different columns can have different modes (numeric, character, factor, etc.).

people.summary <- data.frame(
                             age = c(30,29,25,25),
                             names = c("Fred", "Barney", "Wilma", "Betty"),
                             gender = c("m", "m", "f", "f")
                             )
people.summary

To get elements of the dataframe:

people.summary[2,1]
people.summary[2,]
people.summary[,1]
people.summary$age

Lists

Lists gather together a collection of objects under one name.

together.list <- list(
                      vector.example = dataset.a, 
                      matrix.example = matrix.example,
                      data.frame.example = people.summary
                      )
together.list

There are several ways to get elements of a list:


together.list$matrix.example
together.list$matrix.example[,3]
together.list["matrix.example"]
together.list[["matrix.example"]]
together.list[["matrix.example"]][,2]

Reading Data In

We use read.data or read.csv to read in data.

gene_example <- read_csv("Gene_R_example.csv")

In RStudio, we can use the “File” navigation window instead. Navigate to the directory containing the Gene_R_example.csv that we downloaded previously. Click on the file name then click “Import Dataset.” A new window appears allowing you to modify attributes of your file. Rename the file to the object “gene_example”.

Commands like head and tail also work in R.

head(gene_example)
View(gene_example)

Basic Plotting

A very basic plot:

plot(x=gene_example$Control, y=gene_example$Treated)

A nicer plot:

plot(x=gene_example$Control, y=gene_example$Treated,
    xlab = "Control",
    ylab = "Treated",
    cex.lab = 1.5,
    main = "A nice scatter plot",
    pch = 16,
    bty = "n",
    col = "dark blue",
    las = 1
    )
## las
## How to change the axes label style in R
## To change the axes label style, use the graphics option las (label style). This changes the orientation angle of the labels:
## 0: The default, parallel to the axis
## 1: Always horizontal
## 2: Perpendicular to the axis
## 3: Always vertical

## bty
## To change the type of box round the plot area, use the option bty (box type):
## "o": The default value draws a complete rectangle around the plot.
## "n": Draws nothing around the plot.

Connecting the dots:

plot(x=gene_example$Control, y=gene_example$Treated,
    xlab = "Control",
    ylab = "Treated",
    cex.lab = 1.5,
    main = "A nice scatter plot",
    pch = 16,
    bty = "n",
    col = "dark blue",
	type = "b",
	las = 1
	)

Histograms

hist(gene_example$Control)

hist(gene_example$Control,
    xlab = "Expression",
    ylab = "Number of Genes",
    cex.lab = 1.5,
    main = "A nice histogram",
    col = "cyan",
    breaks = 10,
    las = 1
    )

Boxplots

 boxplot(gene_example[,2:3])
boxplot(gene_example[,2:3],
	width = c(3,1),
	col = "red",
	border = "dark blue",
	names = c("Control", "Treatment"),
	main = "My boxplot",
	notch = TRUE,
	horizontal = TRUE
	)

Saving your plots as PDFs

pdf("myfigure.pdf", height=10, width=6)
par(mfrow=c(2,1))

plot(x=gene_example$Control, y=gene_example$Treated,
    xlab = "Control",
    ylab = "Treated",
    cex.lab = 1.5,
    main = "A nice scatter plot",
    pch = 16,
    bty = "n",
    col = "dark blue",
	type = "b",
	las = 1
	)
	
boxplot(gene_example[,2:3],
	width = c(3,1),
	col = "red",
	border = "dark blue",
	names = c("Control", "Treatment"),
	main = "My boxplot",
	notch = TRUE,
	horizontal = TRUE
	)
  
  dev.off()