Gene Fusions Tutorial - Installation
General installation for CBW tutorial
Environment setup
For this tutorial, we assume the environment variable $TUTORIAL_HOME has been set to an existing directory to which the user has write access.
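Before continuing, you can check that assumption with a small guard. The sketch below is illustrative only. check_tutorial_home is a hypothetical helper and is not a script from the tutorial repository.

```shell
# Hypothetical helper: fail early if TUTORIAL_HOME is unset, is not an
# existing directory, or is not writable by the current user.
check_tutorial_home() {
  if [ -z "${TUTORIAL_HOME:-}" ]; then
    echo "TUTORIAL_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$TUTORIAL_HOME" ]; then
    echo "TUTORIAL_HOME ($TUTORIAL_HOME) is not a directory" >&2
    return 1
  fi
  if [ ! -w "$TUTORIAL_HOME" ]; then
    echo "TUTORIAL_HOME ($TUTORIAL_HOME) is not writable" >&2
    return 1
  fi
  return 0
}
```

For example, a setup script could begin with check_tutorial_home || exit 1.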
All binaries used in this tutorial are installed with conda. Prepend the bin directory of the Anaconda installation to the PATH environment variable so that these binaries are found before any system-wide copies.
export PATH="/home/ubuntu/CourseData/CG_data/Module7/anaconda/bin:$PATH"
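If you put the export line above in a shell startup file that is sourced more than once, PATH accumulates duplicate entries. Here is a minimal sketch of an idempotent alternative. path_prepend is a hypothetical helper name.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed, so re-sourcing a setup file does not keep growing PATH.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put it first
  esac
  export PATH
}

path_prepend /home/ubuntu/CourseData/CG_data/Module7/anaconda/bin
```

With this design, an entry already present further down PATH stays where it is and is not moved to the front.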
Installation
Tutorial directory structure
Create a directory for the reference data.
mkdir -p $TUTORIAL_HOME/refdata
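The later sections keep their reference data under refdata/chimerascan, refdata/defuse and refdata/star. As a sketch, assuming TUTORIAL_HOME is already set, you can create these subdirectories up front with a hypothetical helper, make_refdata_dirs.

```shell
# Hypothetical helper: create the per-tool reference data directories
# used later in the tutorial. Requires TUTORIAL_HOME to be set.
make_refdata_dirs() {
  local tool
  if [ -z "${TUTORIAL_HOME:-}" ]; then
    echo "TUTORIAL_HOME is not set" >&2
    return 1
  fi
  for tool in chimerascan defuse star; do
    mkdir -p "$TUTORIAL_HOME/refdata/$tool" || return 1
  done
}
```

Call make_refdata_dirs once, after TUTORIAL_HOME has been set.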
Install tutorial scripts
All content for this tutorial is located in a Bitbucket repository at https://dranew@bitbucket.org/dranew/cbw_tutorial.git. The tutorial uses some of the data, scripts and config files from this repository. Clone it so that you have a copy of the tutorial scripts in a known location.
cd $TUTORIAL_HOME/
git clone https://bitbucket.org/dranew/cbw_tutorial.git
Install Anaconda
Download Anaconda and install it with the prefix $TUTORIAL_HOME/anaconda.
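The Anaconda installer can run non-interactively: -b selects batch mode and -p sets the installation prefix. The sketch below assumes you have already downloaded the installer script. install_anaconda is a hypothetical wrapper, and its argument is the path of the file you downloaded.

```shell
# Hypothetical wrapper: batch-install Anaconda into $TUTORIAL_HOME/anaconda.
# ASSUMPTION: the installer script has already been downloaded; its path
# is passed as the first argument.
install_anaconda() {
  local installer="$1"
  local prefix="$TUTORIAL_HOME/anaconda"
  if [ -x "$prefix/bin/conda" ]; then
    # Skip re-installation if conda is already present at the prefix
    echo "conda already installed in $prefix"
    return 0
  fi
  # -b: batch mode (no prompts); -p: installation prefix
  bash "$installer" -b -p "$prefix"
}
```

For example: install_anaconda <path-to-downloaded-installer>.sh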
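Conda packages are distributed through channels. This tutorial requires several additional channels that host bioinformatics-specific software. Add them using conda config. Each --add places the channel at the top of the channel list, so the channel added last takes precedence.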
conda config --add channels r
conda config --add channels bioconda
conda config --add channels BioBuilds
conda config --add channels https://conda.anaconda.org/dranew
Install samtools
Samtools is a widely used package for manipulating high-throughput sequencing data stored in the BAM format.
Install samtools using conda.
conda install samtools
Install picard tools
Picard tools is a set of utilities for manipulating sequencing data in SAM/BAM format.
Install picard using conda.
conda install picard
Install igv tools
The igvtools package provides utilities for preprocessing bam files for quicker viewing in IGV.
Install igvtools using conda.
conda install igvtools
Install bowtie and bowtie2
Install bowtie and bowtie2 using conda.
conda install bowtie bowtie2
Install the gmap aligner
Install gmap using conda.
conda install gmap
Install the bwa aligner
Install bwa using conda.
conda install bwa
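Once the packages above are installed, a quick check that their executables are reachable on PATH can save debugging later. This is a sketch. check_tools is hypothetical, and it assumes each package installs an executable with the same name as the package (for example, a picard wrapper script). That may not hold for every channel build.

```shell
# Hypothetical helper: report every named executable that is not on PATH.
# Returns success only if all of them were found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example: check_tools samtools picard igvtools bowtie bowtie2 gmap bwa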
Installation of the ChimeraScan gene fusion prediction tool
Environment setup
Set a variable for the index directory.
CHIMERASCAN_INDEX=$TUTORIAL_HOME/refdata/chimerascan/indices/
Installation
Install ChimeraScan
Install ChimeraScan using conda.
conda install chimerascan
Install the required reference data files
Install the reference data in a subdirectory of the tutorial reference data directory.
mkdir -p $TUTORIAL_HOME/refdata/chimerascan/
cd $TUTORIAL_HOME/refdata/chimerascan/
Download the gene models from ChimeraScan's Google Code site, as specified in the ChimeraScan installation instructions.
wget https://chimerascan.googlecode.com/files/hg19.ucsc_genes.txt.gz
gunzip hg19.ucsc_genes.txt.gz
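Build the ChimeraScan indices using chimerascan_index.py. It takes three positional arguments: the reference genome FASTA, the gene model file and the output index directory. $UCSC_GENOME_FILENAME is not set in this section. It must already point to the hg19 reference genome in FASTA format.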
mkdir -p $CHIMERASCAN_INDEX
chimerascan_index.py \
$UCSC_GENOME_FILENAME hg19.ucsc_genes.txt \
$CHIMERASCAN_INDEX
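Index building is slow, so it may be worth validating the inputs first. The sketch below uses a hypothetical helper, check_chimerascan_inputs. It assumes $UCSC_GENOME_FILENAME points to a FASTA file whose first character is the > of a sequence header.

```shell
# Hypothetical helper: sanity-check the genome FASTA and gene model file
# before running chimerascan_index.py.
check_chimerascan_inputs() {
  local genome="$1" genes="$2"
  if [ -z "$genome" ] || [ ! -s "$genome" ]; then
    echo "genome FASTA missing or empty: '$genome'" >&2
    return 1
  fi
  # A FASTA file should start with a '>' header line
  if [ "$(head -c 1 "$genome")" != ">" ]; then
    echo "not a FASTA file (no leading '>'): $genome" >&2
    return 1
  fi
  if [ ! -s "$genes" ]; then
    echo "gene model file missing or empty: '$genes'" >&2
    return 1
  fi
  return 0
}
```

For example, from $TUTORIAL_HOME/refdata/chimerascan/ before the index build: check_chimerascan_inputs "$UCSC_GENOME_FILENAME" hg19.ucsc_genes.txt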
Installation of the deFuse gene fusion prediction tool
Environment setup
Set variables for the config filename and the reference data directory.
DEFUSE_CONFIG=$TUTORIAL_HOME/cbw_tutorial/config/defuse_chr1.txt
DEFUSE_REF_DATA=$TUTORIAL_HOME/refdata/defuse/
Installation
Install deFuse
Install deFuse using conda.
conda install defuse
Install the required reference data files
The reference data files can be downloaded and indexed automatically using the defuse_create_ref.pl script.
defuse_create_ref.pl -c $DEFUSE_CONFIG -d $DEFUSE_REF_DATA
Installation of the STAR RNA-Seq aligner
Environment setup
Specify the directory in which the reference genome data will be stored.
STAR_GENOME_INDEX=$TUTORIAL_HOME/refdata/star/
STAR_FUSION_GENOME_INDEX=$TUTORIAL_HOME/refdata/star/GRCh37_gencode_v19_CTAT_lib/
Installation
Install STAR using conda. Also install Perl (the perl-threaded package) and the Perl module Set::IntervalTree, both required by STAR-Fusion.
conda install perl-threaded
conda install perl-set-intervaltree
conda install star
Create genome
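STAR-Fusion requires an additional reference dataset, the CTAT resource library. Create the STAR reference directory if it does not already exist, then download the library and unpack it into that directory.
mkdir -p $STAR_GENOME_INDEX
cd $STAR_GENOME_INDEX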
wget https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/GRCh37_gencode_v19_CTAT_lib.tar.gz
tar -xvf GRCh37_gencode_v19_CTAT_lib.tar.gz
cd GRCh37_gencode_v19_CTAT_lib/
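The CTAT library archive is large, so a re-run of this step should ideally skip the download when the archive is already present. Here is a sketch using a hypothetical helper, fetch_once, built on wget -O.

```shell
# Hypothetical helper: download URL to DEST (default: the URL's basename)
# only if DEST does not already exist as a non-empty file.
fetch_once() {
  local url="$1" dest="$2"
  if [ -z "$dest" ]; then
    dest=$(basename "$url")
  fi
  if [ -s "$dest" ]; then
    echo "already downloaded: $dest"
    return 0
  fi
  # Download to a temporary name and rename on success, so an interrupted
  # transfer does not leave a truncated file under the final name.
  wget -O "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

From $STAR_GENOME_INDEX, fetch_once https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/GRCh37_gencode_v19_CTAT_lib.tar.gz can replace the wget command. The check only confirms that the file is non-empty, not that an archive fetched by some other means is complete.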
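Optionally, subset the genome FASTA and the GTF annotation to the tutorial chromosome. $TUTORIAL_CHROMOSOME is not defined in this section. It is assumed to be already set to the chromosome name without the chr prefix.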
samtools faidx ref_genome.fa chr$TUTORIAL_CHROMOSOME \
> ref_genome.chr$TUTORIAL_CHROMOSOME.fa
mv ref_genome.chr$TUTORIAL_CHROMOSOME.fa ref_genome.fa
rm ref_genome.fa.fai
grep -P "^chr$TUTORIAL_CHROMOSOME\t" ref_annot.gtf \
> ref_annot.chr$TUTORIAL_CHROMOSOME.gtf
mv ref_annot.chr$TUTORIAL_CHROMOSOME.gtf ref_annot.gtf
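The grep -P call above relies on a GNU grep extension that is not available in every grep implementation. Below is a portable sketch using awk. It matches the first GTF column exactly, so chr1 does not also match chr10. subset_gtf is a hypothetical helper.

```shell
# Hypothetical helper: keep only GTF records whose first tab-separated
# column (the sequence name) is exactly "chr<CHROM>".
subset_gtf() {
  local chrom="$1" in_gtf="$2" out_gtf="$3"
  awk -F '\t' -v c="chr$chrom" '$1 == c' "$in_gtf" > "$out_gtf"
}
```

For example: subset_gtf "$TUTORIAL_CHROMOSOME" ref_annot.gtf ref_annot.chr$TUTORIAL_CHROMOSOME.gtf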
Prepare the genome and annotations for STAR-Fusion.
prep_genome_lib.pl \
--genome_fa ref_genome.fa \
--gtf ref_annot.gtf \
--blast_pairs blast_pairs.outfmt6.gz
Installation of the Trinity RNA-Seq assembler
Installation
Install Trinity using conda.
conda install trinity