Gene Fusions Tutorial - Installation
General installation for CBW tutorial
Environment setup
This tutorial assumes that the environment variable $TUTORIAL_HOME
has been set to an existing directory to which the user has write access.
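The assumption above can be checked before any installation step runs. The following is a minimal sketch, not part of the tutorial repo; check_tutorial_home is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: verify that TUTORIAL_HOME is usable before continuing.
check_tutorial_home() {
    # Fail if the variable is unset or empty.
    if [ -z "${TUTORIAL_HOME:-}" ]; then
        echo "TUTORIAL_HOME is not set" >&2
        return 1
    fi
    # Fail if it does not name an existing directory we can write to.
    if [ ! -d "$TUTORIAL_HOME" ] || [ ! -w "$TUTORIAL_HOME" ]; then
        echo "TUTORIAL_HOME is not an existing writable directory: $TUTORIAL_HOME" >&2
        return 1
    fi
    echo "TUTORIAL_HOME is ready: $TUTORIAL_HOME"
}
```

Calling check_tutorial_home after setting the variable returns a non-zero status and prints a message to standard error if either condition is not met.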
All binaries used in this tutorial will be installed using conda. Modify the PATH environment variable to point to binaries in the anaconda installation.
export PATH="/home/ubuntu/CourseData/CG_data/Module7/anaconda/bin:$PATH"
Installation
Tutorial directory structure
Create a directory for the reference data.
mkdir -p $TUTORIAL_HOME/refdata
Install tutorial scripts
All content for this tutorial is located in a Bitbucket repo (https://dranew@bitbucket.org/dranew/cbw_tutorial.git). Some of the data, scripts and config files from this repo will be used in the tutorial. Clone the tutorial repo so we have a copy of the tutorial scripts in a known location.
cd $TUTORIAL_HOME/
git clone https://bitbucket.org/dranew/cbw_tutorial.git
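Running git clone a second time into an existing, non-empty directory fails, so a setup script that may be re-run can guard the clone. A minimal sketch, where clone_once is a hypothetical helper name and not part of the tutorial repo:

```shell
# Hypothetical helper: clone a repository only if the destination
# is not already a git checkout.
clone_once() {
    # $1 = repository URL, $2 = destination directory
    if [ -d "$2/.git" ]; then
        echo "already cloned: $2"
    else
        git clone "$1" "$2"
    fi
}
```

For this tutorial the call would be clone_once https://bitbucket.org/dranew/cbw_tutorial.git "$TUTORIAL_HOME/cbw_tutorial".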
Install Anaconda
All binaries used in this tutorial will be installed using conda. Download and install anaconda with the prefix $TUTORIAL_HOME/anaconda.
Packages in conda are stored in channels. Several additional channels hosting bioinformatics-specific software will be required. Add additional channels using conda config.
conda config --add channels r
conda config --add channels bioconda
conda config --add channels BioBuilds
conda config --add channels https://conda.anaconda.org/dranew
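Each conda config --add channels call places the new channel at the top of the channel list, so the channel added last is searched first. A minimal sketch for inspecting the resulting order, assuming conda is already on PATH; show_channels is a hypothetical helper name, not a conda command:

```shell
# Hypothetical helper: list the configured channels, or report that
# conda is unavailable.
show_channels() {
    if command -v conda >/dev/null 2>&1; then
        # Prints the configured channels together with their priority order.
        conda config --get channels
    else
        echo "conda not found on PATH" >&2
        return 1
    fi
}
```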
Install samtools
The samtools package is a widely used set of tools for manipulating high-throughput sequence data stored in the 'bam' format.
Install samtools using conda.
conda install samtools
Install picard tools
Picard tools is a useful set of utilities for manipulating sequence data in bam/sam format.
Install picard using conda.
conda install picard
Install igv tools
The igvtools package provides utilities for preprocessing bam files for quicker viewing in IGV.
Install igvtools using conda.
conda install igvtools
Install bowtie and bowtie2
Install bowtie and bowtie2 using conda.
conda install bowtie bowtie2
Install the gmap aligner
Install gmap using conda.
conda install gmap
Install the bwa aligner
Install bwa using conda.
conda install bwa
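After the installs above, it can be useful to confirm that the tools resolve on PATH. The sketch below assumes each conda package provides an executable with the same name as the package, which may not hold for every package version; missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names, taken from the package names used above.
missing_tools samtools picard igvtools bowtie bowtie2 gmap bwa
```

An empty result means every listed command was found; any name printed points to an install step worth repeating or a PATH entry worth checking.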
Installation of the ChimeraScan gene fusion prediction tool
Environment setup
Set variable for index directory.
CHIMERASCAN_INDEX=$TUTORIAL_HOME/refdata/chimerascan/indices/
Installation
Install ChimeraScan
Install ChimeraScan using conda.
conda install chimerascan
Install the required reference data files
Install the reference data in a subdirectory of the tutorial ref data.
mkdir -p $TUTORIAL_HOME/refdata/chimerascan/
cd $TUTORIAL_HOME/refdata/chimerascan/
Download the gene models from ChimeraScan’s Google Code site as specified in the instructions.
wget https://chimerascan.googlecode.com/files/hg19.ucsc_genes.txt.gz
gunzip hg19.ucsc_genes.txt.gz
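Build the chimerascan indices using the chimerascan_index.py command. The command reads the path of the reference genome FASTA file from $UCSC_GENOME_FILENAME, which is not set anywhere in this section and must be defined beforehand.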
mkdir -p $CHIMERASCAN_INDEX
chimerascan_index.py \
$UCSC_GENOME_FILENAME hg19.ucsc_genes.txt \
$CHIMERASCAN_INDEX
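The indexing command depends on shell variables being set. A minimal guard can be sketched as follows; require_vars is a hypothetical helper name, and the sketch only checks that the variables are non-empty, not that the paths they hold exist.

```shell
# Hypothetical helper: fail if any of the named variables is unset or empty.
require_vars() {
    status=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running require_vars UCSC_GENOME_FILENAME CHIMERASCAN_INDEX before chimerascan_index.py reports every missing variable at once and returns a non-zero status if any is missing.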
Installation of the deFuse gene fusion prediction tool
Environment setup
Set variables for the config filename and the reference data directory.
DEFUSE_CONFIG=$TUTORIAL_HOME/cbw_tutorial/config/defuse_chr1.txt
DEFUSE_REF_DATA=$TUTORIAL_HOME/refdata/defuse/
Installation
Install deFuse
Install deFuse using conda.
conda install defuse
Install the required reference data files
The reference data files can be downloaded and indexed automatically using the defuse_create_ref.pl script.
defuse_create_ref.pl -c $DEFUSE_CONFIG -d $DEFUSE_REF_DATA
Installation of the STAR RNA-Seq aligner
Environment
Specify the directory in which the reference genome data will be stored.
STAR_GENOME_INDEX=$TUTORIAL_HOME/refdata/star/
STAR_FUSION_GENOME_INDEX=$TUTORIAL_HOME/refdata/star/GRCh37_gencode_v19_CTAT_lib/
Installation
Install STAR using conda. Also install perl and the perl package Set::IntervalTree, required by STAR-Fusion.
conda install perl-threaded
conda install perl-set-intervaltree
conda install star
Create genome
For STAR-Fusion, we require an additional reference dataset.
mkdir -p $STAR_GENOME_INDEX
cd $STAR_GENOME_INDEX
wget https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/GRCh37_gencode_v19_CTAT_lib.tar.gz
tar -xvf GRCh37_gencode_v19_CTAT_lib.tar.gz
cd GRCh37_gencode_v19_CTAT_lib/
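Optionally subset the reference to the tutorial chromosome. The commands assume $TUTORIAL_CHROMOSOME holds the chromosome name without the chr prefix (for example 1); it is not set in this section.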
samtools faidx ref_genome.fa chr$TUTORIAL_CHROMOSOME \
> ref_genome.chr$TUTORIAL_CHROMOSOME.fa
mv ref_genome.chr$TUTORIAL_CHROMOSOME.fa ref_genome.fa
rm ref_genome.fa.fai
grep -P "^chr$TUTORIAL_CHROMOSOME\t" ref_annot.gtf \
> ref_annot.chr$TUTORIAL_CHROMOSOME.gtf
mv ref_annot.chr$TUTORIAL_CHROMOSOME.gtf ref_annot.gtf
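The trailing tab in the grep pattern matters: it anchors the match to the whole first column, so selecting chr1 does not also select chr10 or chr11. The sketch below illustrates the same filter on a toy file. It uses awk with an exact first-column comparison instead of grep -P, and both subset_gtf and the three records are made up for illustration.

```shell
# Hypothetical helper: keep only records whose first column is exactly chr<N>.
subset_gtf() {
    # $1 = chromosome name without the "chr" prefix, $2 = input GTF path
    awk -F '\t' -v chrom="chr$1" '$1 == chrom' "$2"
}

# Toy demonstration on a three-line, tab-separated file (made-up records).
toy_gtf=$(mktemp)
printf 'chr1\tsrc\tgene\t1\t100\n' > "$toy_gtf"
printf 'chr10\tsrc\tgene\t1\t100\n' >> "$toy_gtf"
printf 'chr2\tsrc\tgene\t1\t100\n' >> "$toy_gtf"

# Prints only the chr1 record; the chr10 record is not matched.
subset_gtf 1 "$toy_gtf"
```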
Prepare the genome and annotations for STAR-Fusion.
prep_genome_lib.pl \
--genome_fa ref_genome.fa \
--gtf ref_annot.gtf \
--blast_pairs blast_pairs.outfmt6.gz
Installation of the Trinity RNA-Seq assembler
Installation
Install trinity using conda.
conda install trinity