Informatics for High-throughput Sequencing Data Analysis 2019 Integrated Assignment
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.
CBW HT-seq Integrative Assignment
Written originally by Mathieu Bourgey, edited by Florence Cavalli
Task
We will perform the same analysis as in Module 3 but using the mother and father samples i.e sample NA12891 and NA12891.
The fastq files are in the following directory of the cloud instance: ~/CourseData/HT_data/Module3/
* raw_reads/NA12891/NA12891_CBW_chr1_R1.fastq.gz
* raw_reads/NA12891/NA12891_CBW_chr1_R2.fastq.gz
* raw_reads/NA12892/NA12892_CBW_chr1_R1.fastq.gz
* raw_reads/NA12892/NA12892_CBW_chr1_R2.fastq.gz
Environment setup
# Start a fresh docker container:
docker run --privileged -v /tmp:/tmp --network host -it -w $PWD -v $HOME:$HOME -v /media:/media --user $UID:$GROUPS -v /etc/group:/etc/group -v /etc/passwd:/etc/passwd c3genomics/genpipes:0.8
Then inside the docker container:
#set up environment variables
export REF=$WORK_DIR/reference/
export WORK_DIR=~/workspace/HTseq/Integrative_Assignment/
rm -rf $WORK_DIR
mkdir -p $WORK_DIR
cd $WORK_DIR
ln -fs ~/CourseData/HT_data/Module3/* .
# Load the software modules
module load mugqic/java/openjdk-jdk1.8.0_72 mugqic/bvatools/1.6 mugqic/trimmomatic/0.36 mugqic/samtools/1.9 mugqic/bwa/0.7.17 mugqic/GenomeAnalysisTK/4.1.0.0 mugqic/R_Bioconductor/3.5.0_3.7
Task list:
-
Check read QC
-
Trim unreliable bases from the read ends
-
Align the reads to the reference
-
Sort the alignments by chromosome position
-
Realign short indels
-
Fixe mate issues (optional)
-
Mark duplicates
-
Recalibrate the Base Quality
-
Generate alignment metrics
Discussion/Questions:
-
Explain the purpose of each step
-
Which software tool can be used for each step
The full commands can be downloaded here solution