Introduction to ChIP-seq and analysis

Introduction to ChIP-seq and analysis

by Martin Hirst and Edmund Su

Table of contents:

  1. Common tools of the trade

  2. Module 1 Alignment

  3. Module 2 Peak Calling

  4. Module 3 Differential Analysis

Common tools of the trade


- Explaination of tools

BWA Genomic Sequence Read alignment tool

PICARD Toolkit for BAM file manipulation

SAMTOOLS Toolkit for BAM file manipulation

BEDTOOLSToolkit for BED/BEDGRAPH file manipulation

MACS2 Enriched region identifier for ChIP-seq

UCSCtoolsManipulation of Wigs and Bed/Bedgraph to binary forms

- Resource files

  1. BWA INDEX:
    • ~/CourseData/EPI_data/Module1/BWA_index/
  2. Enhancer file.bed:
    • ~/CourseData/EPI_data/Module1/QC_resources/
    • download various state7 and merge https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/
  3. TSS.bed:
    • ~/CourseData/EPI_data/Module1/QC_resources/
    • Generated by downloading Ensemblv79 GTF convert to Bed +/-2kb of TSS. See https://www.biostars.org/p/56280/

4.HOX regions.bed

  • ~/CourseData/EPI_data/Module1/QC_resources/
  • Generated by downloading Ensemblv79 GTF convert to Bed then selecting for HOX
  1. Hg38 Black list regions
    • ~/CourseData/EPI_data/Module1/QC_resources/
    • https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz

- Resource files

1.CEMT Pooled Breast Basal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69069.CEEHRC.CEMT0035.H3K27ac.peak_calls.bigBed

2.CEMT Pooled Breast Basal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69067.CEEHRC.CEMT0035.H3K27ac.signal_unstranded.bigWig

3.CEMT Pooled Breast Basal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69063.CEEHRC.CEMT0035.H3K27me3.peak_calls.bigBed

4.CEMT Pooled Breast Basal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69061.CEEHRC.CEMT0035.H3K27me3.signal_unstranded.bigWig

5.CEMT Pooled Breast Basal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69054.CEEHRC.CEMT0035.H3K4me1.peak_calls.bigBed

6.CEMT Pooled Breast Basal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69052.CEEHRC.CEMT0035.H3K4me1.signal_unstranded.bigWig

7.CEMT Pooled Breast Basal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69057.CEEHRC.CEMT0035.H3K4me3.peak_calls.bigBed

8.CEMT Pooled Breast Basal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69055.CEEHRC.CEMT0035.H3K4me3.signal_unstranded.bigWig

9.CEMT Pooled Breast Basal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69070.CEEHRC.CEMT0035.Input.signal_unstranded.bigWig

10.CEMT Pooled Breast Stromal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69088.CEEHRC.CEMT0036.H3K27ac.peak_calls.bigBed

11.CEMT Pooled Breast Stromal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69086.CEEHRC.CEMT0036.H3K27ac.signal_unstranded.bigWig

12.CEMT Pooled Breast Stromal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69082.CEEHRC.CEMT0036.H3K27me3.peak_calls.bigBed

13.CEMT Pooled Breast Stromal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69080.CEEHRC.CEMT0036.H3K27me3.signal_unstranded.bigWig

14.CEMT Pooled Breast Stromal-

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69073.CEEHRC.CEMT0036.H3K4me1.peak_calls.bigBed

15.CEMT Pooled Breast Stromal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69071.CEEHRC.CEMT0036.H3K4me1.signal_unstranded.bigWig

16.CEMT Pooled Breast Stromal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69076.CEEHRC.CEMT0036.H3K4me3.peak_calls.bigBed

17.CEMT Pooled Breast Stromal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69074.CEEHRC.CEMT0036.H3K4me3.signal_unstranded.bigWig

18.CEMT Pooled Breast Stromal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69089.CEEHRC.CEMT0036.Input.signal_unstranded.bigWig

19.CEMT Pooled Breast Luminal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69107.CEEHRC.CEMT0037.H3K27ac.peak_calls.bigBed

20.CEMT Pooled Breast Luminal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69105.CEEHRC.CEMT0037.H3K27ac.signal_unstranded.bigWig
  1. CEMT Pooled Breast Luminal
    • ~/CourseData/EPI_data/Module1/CHEERC_resources
    • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69101.CEEHRC.CEMT0037.H3K27me3.peak_calls.bigBed

22.CEMT Pooled Breast Luminal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69099.CEEHRC.CEMT0037.H3K27me3.signal_unstranded.bigWig

23.CEMT Pooled Breast Luminal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69092.CEEHRC.CEMT0037.H3K4me1.peak_calls.bigBed

24.CEMT Pooled Breast Luminal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69090.CEEHRC.CEMT0037.H3K4me1.signal_unstranded.bigWig

25.CEMT Pooled Breast Luminal

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69095.CEEHRC.CEMT0037.H3K4me3.peak_calls.bigBed

26.CEMT Pooled Breast Luminal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69093.CEEHRC.CEMT0037.H3K4me3.signal_unstranded.bigWig

27.CEMT Pooled Breast Luminal

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69108.CEEHRC.CEMT0037.Input.signal_unstranded.bigWig

28.CEMT Pooled Breast Luminal Progenitor

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69126.CEEHRC.CEMT0038.H3K27ac.peak_calls.bigBed

29.CEMT Pooled Breast Luminal Progenitor

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69124.CEEHRC.CEMT0038.H3K27ac.signal_unstranded.bigWig

30.CEMT Pooled Breast Luminal Progenitor

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69120.CEEHRC.CEMT0038.H3K27me3.peak_calls.bigBed

31.CEMT Pooled Breast Luminal Progenitor

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69118.CEEHRC.CEMT0038.H3K27me3.signal_unstranded.bigWig

32.CEMT Pooled Breast Luminal Progenitor

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69111.CEEHRC.CEMT0038.H3K4me1.peak_calls.bigBed

33.CEMT Pooled Breast Luminal Progenitor

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69109.CEEHRC.CEMT0038.H3K4me1.signal_unstranded.bigWig

34.CEMT Pooled Breast Luminal Progenitor

  • ~/CourseData/EPI_data/Module1/CHEERC_resources
  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69114.CEEHRC.CEMT0038.H3K4me3.peak_calls.bigBed

35.CEMT Pooled Breast Luminal Progenitor

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69112.CEEHRC.CEMT0038.H3K4me3.signal_unstranded.bigWig

36.CEMT Pooled Breast Luminal Progenitor

  • https://epigenomesportal.ca/tracks/CEEHRC/hg38/69127.CEEHRC.CEMT0038.Input.signal_unstranded.bigWig

Module 1. BWA Alignment + BAM post processing


1. Have a genome reference file ready (DONE)
2. Index genome reference file (DONE)
3. Run quality Check
4. Run alignment
5. Coordinate Sort alignment File
6. Duplicate marking alignment
7. Flagstats
7. Clean up

Setup

mkdir ~/workspace/{alignments,fastqc}

1. Make a reference directory and download appropriate fasta reference @ https://hgdownload.cse.ucsc.edu/goldenpath/ (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz). For the live tutorial this step was done for you. We will be using a shortened version containining only chr19

Code:
mkdir ~/CourseData/BWA_index
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz -O ~/CourseData/BWA_index/
gunzip ~/CourseData/EPI_data/Module1/BWA_index/hg38.fa.gz

2 .Index Fasta using BWA. For the live tutorial, this step was done for you.

Code:
bwa index ~/CourseData/EPI_data/Module1/BWA_index/chr19.hg38_no_alt.fa
ls ~/CourseData/EPI_data/Module1/BWA_index/ -lth
Output:
total 155M
8126488 -rw-rw-r-- 1 ubuntu ubuntu 28M Sep  7 19:22 chr19.hg38_no_alt.fa.sa
8126486 -rw-rw-r-- 1 ubuntu ubuntu 108 Sep  7 19:22 chr19.hg38_no_alt.fa.amb
8126485 -rw-rw-r-- 1 ubuntu ubuntu 153 Sep  7 19:22 chr19.hg38_no_alt.fa.ann
8126484 -rw-rw-r-- 1 ubuntu ubuntu 14M Sep  7 19:22 chr19.hg38_no_alt.fa.pac
8126487 -rw-rw-r-- 1 ubuntu ubuntu 56M Sep  7 19:22 chr19.hg38_no_alt.fa.bwt
8126483 -rw-rw-r-- 1 ubuntu ubuntu 57M Sep  7 18:36 chr19.hg38_no_alt.fa

3. Run quality Check

Code:
mkdir -p ~/workspace/fastqc
fastq_file_A=~/CourseData/EPI_data/Module1/MCF10A_resources/R1_input.fastq.gz
fastq_file_B=~/CourseData/EPI_data/Module1/MCF10A_resources/R1_h3k27ac.fastq.gz
fastqc ${fastq_file_A} -o ~/workspace/fastqc
fastqc ${fastq_file_B} -o ~/workspace/fastqc
Output:
Started analysis of R1_input.fastq.gz
Approx 5% complete for R1_input.fastq.gz
Approx 10% complete for R1_input.fastq.gz
Approx 15% complete for R1_input.fastq.gz
Approx 20% complete for R1_input.fastq.gz
Approx 25% complete for R1_input.fastq.gz
Approx 30% complete for R1_input.fastq.gz
Approx 35% complete for R1_input.fastq.gz
Approx 40% complete for R1_input.fastq.gz
Approx 45% complete for R1_input.fastq.gz
Approx 50% complete for R1_input.fastq.gz
Approx 55% complete for R1_input.fastq.gz
Approx 60% complete for R1_input.fastq.gz
Approx 65% complete for R1_input.fastq.gz
Approx 70% complete for R1_input.fastq.gz
Approx 75% complete for R1_input.fastq.gz
Approx 80% complete for R1_input.fastq.gz
Approx 85% complete for R1_input.fastq.gz
Approx 90% complete for R1_input.fastq.gz
Approx 95% complete for R1_input.fastq.gz
Analysis complete for R1_input.fastq.gz
Comment:
Check out the HTML file produced!
Note the GC distribution of Input VS H3K27ac
Input

H3K27ac

4. Run alignment

Code:
mkdir -p ~/workspace/alignments
ref=~/CourseData/EPI_data/Module1/BWA_index/chr19.hg38_no_alt.fa
read1=~/CourseData/EPI_data/Module1/MCF10A_resources/R1_input.fastq.gz
read2=~/CourseData/EPI_data/Module1/MCF10A_resources/R2_input.fastq.gz
sample=MCF10A_input_chr19
bwa mem -M -t 4 ${ref} ${read1} ${read2} 2>~/workspace/alignments/alignment.log | samtools view -hbS -o ~/workspace/alignments/${sample}.bam
Output:
Check log
Code Breakdown:
mkdir -p ~/workspace/alignments # Make working directory
ref=~/CourseData/EPI_data/Module1/BWA_index/chr19.hg38_no_alt.fa # Preset set variables
read1=~/CourseData/EPI_data/Module1/MCF10A_resources/R1_input.fastq.gz
read2=~/CourseData/EPI_data/Module1/MCF10A_resources/R2_input.fastq.gz
sample=MCF10A_input_chr19

bwa mem -M -t 4 ${ref} ${read1} ${read2} 2>~/workspace/alignments/alignment.log # Alignment step {"-t 2":Peform alignment using four threads. Our tutorial uses m5.xlarge therefore provides four cores,"-M":mark short split hits as secondary}
| samtools view -hbS -o ~/workspace/alignments/${sample}.bam # Converts SAM to BAM format {"h":Include Header, "b": output BAM, "S":detect input format}

5. Coordinate Sort alignment File

Code:
sample=MCF10A_input_chr19
samtools sort  ~/workspace/alignments/${sample}.bam -o ~/workspace/alignments/${sample}.sorted.bam
samtools view ~/workspace/alignments/${sample}.bam | head
samtools view ~/workspace/alignments/${sample}.sorted.bam | head
Output:
Output from ~/workspace/alignments/MCF10A_input_chr19.bam:
HS10_346:2:1101:3349:1883       83      chr19   49469101        60      75M     =       49468882        -294    CTTGACAAGAAGGTTTTGAGGCCCCGCCCTTAGGACTCAAGTTACTAAGGAAGAGGCTGTCCTTAGCAACAGGGN     DEEDDDDDDDDDDDDDDDDDDDFJJJJJJJJJJJJJJJJIIIIIIJJJJJJJJJJJJJIHJJHHHHHFFFFD=1#       NM:i:1  MD:Z:74G0       MC:Z:75M        AS:i:74 XS:i:0
HS10_346:2:1101:3349:1883       163     chr19   49468882        60      75M     =       49469101        294     CCAGCAGGCCTGGCCAACGTGGTGACAGGAGACCGGGACCATCTGACCCGCTGCCTGGCCTTGCACCAAGACGTC     CCCFFFFFGHGHGIJJJJJFHIEHIIJJJICHHIIGGIJGIJJIJJJJHGBEDDEEDDDDDDDCCDCDDDDDDBD       NM:i:0  MD:Z:75 MC:Z:75M        AS:i:75 XS:i:0
HS10_346:2:1101:3408:1953       83      chr19   23790237        60      75M     =       23790104        -208    TCCACCCGCCTAGGCTTCCCAAAGTGGTGGGATTACAGACGTGAGCCACTGGACCCGGCCTGATTTTCTCTTGAN     BDB9DDDEEFFFFFHHHHEJJJJJJJJIJJJJJJGIJJJIIJJJJJHGJJJIHJJJJJJJJJHHHHHFFFFD=1#       NM:i:2  MD:Z:47G26A0    MC:Z:5M1I69M    AS:i:69 XS:i:35
HS10_346:2:1101:3408:1953       163     chr19   23790104        60      5M1I69M =       23790237        208     ATTCTCCTGTCTCAGTCTCCTGAGTAGCTGGGATTACAGGCGCCCGCCACTGTGCCCGGCTACTTTTTGTATTTT     CCCFFFFFDFFHHJJ?FHGIJJEHEFHIIGJJEGGIIFIIGHIJIJJJJIFIGJJIGGHEDEFEFEEED?DCFEE       NM:i:1  MD:Z:74 MC:Z:75M        AS:i:69 XS:i:50
HS10_346:2:1101:11940:1853      99      chr19   42651286        60      75M     =       42651536        325     NTTTCTCCATCAACTTAGCTGGCAGCTCCTGTCCCCAGCAGCATCAGAGGCCCCATGAAAAGAGCTCCAGCAGGG     #1=DFFFFHHHHHJJJJJJJJJIJGIJJJIIIIJJJJIJJJJJJJJJJJJJJJJJJIJIJJJJJHHHHHFFFFFD       NM:i:1  MD:Z:0G74       MC:Z:75M        AS:i:74 XS:i:31
HS10_346:2:1101:11940:1853      147     chr19   42651536        60      75M     =       42651286        -325    GGGAGGAGAGAAGGGAACTGTAGGCCAATGGCTTTATTGGGTCTAGGGTGTTATTGACAGGTTTCCAGAAGGGAG     HHHHJIJJJJJJIJIGHJJJJJJIHJJJJJIJJJJJJJJJJIGIJJJJJIHFCCCJJJJJJJHHHHHFFFFFCCC       NM:i:0  MD:Z:75 MC:Z:75M        AS:i:75 XS:i:0
HS10_346:2:1101:20208:1926      83      chr19   5876554 60      75M     =       5876421 -208    GCGGTGAGGCCATCTATGCCCCTCGTTGGGGTCCTGGTCTTCATTGGACACCCCAGCTCCTCCCTCAGCCTGGGN     DDDDDDDDDDDEDDDDDDDDFFHHHJJJJJJJJJJJJJJJJJIJJIHDD:JJJJJJJJJJJJHHHHHFFFFD=4#       NM:i:2  MD:Z:15G58C0    MC:Z:32M1D24M2I17M      AS:i:69 XS:i:24
HS10_346:2:1101:20208:1926      163     chr19   5876421 60      32M1D24M2I17M   =       5876554 208     GGGGCTCTCTTGCCCTCCCAGCCAGATCATCCTTCTACTGGCTCCTCCAACCACCCCTGTGCCCCTGATTCTAGG     CCCFFFFFHHHHHJJJJJIJJJJJJJJJJIJIIJIJJJJJIJJJICHIGHHHIIJJJCGEGGHHH=DBEFFDDEE       NM:i:3  MD:Z:32^T41     MC:Z:75M        AS:i:58 XS:i:0
HS10_346:2:1101:21139:1859      83      chr19   41375797        60      75M     =       41375616        -256    AGCAAGACCCCGTTTTTTAAAAAATAATAATAAAAAAAAAATCCGCCGGGCGCGGTTGCTCACGCCTGTAATCCN     ####################################################################?<@?70#       NM:i:4  MD:Z:13C29T12G17T0      MC:Z:75M        AS:i:59 XS:i:28
HS10_346:2:1101:21139:1859      163     chr19   41375616        60      75M     =       41375797        256     AACAAGCCAACACGCCTTCAGCACTCCTCCGCAAAAAAACACCCCTAAACAAAATAGGCCAGGCGCGGTGACTCA     :+:=?;+<<A7=<)<1+1+22?@3@B>B>30?A04==AA####################################       NM:i:2  MD:Z:13C31A29   MC:Z:75M        AS:i:65 XS:i:24
Output from ~/workspace/alignments/$MCF10A_input_chr19.sorted.bam:
HS10_346:2:1304:19247:16557     117     chr19   60146   0       *       =       60146   0       GTTGAGTAATTGCTGAGATGGGCAGTAGAGATGCTCAGGTCTGTGGTCCCTTTCCATCCCCACTTGATCTATTTT     ###########################################################################       MC:Z:54M21S     AS:i:0  XS:i:0
HS10_346:2:1304:19247:16557     185     chr19   60146   60      54M21S  =       60146   0       TACAAGGATAATCTGACCTGCAGGGTCGAGGAGTTGACGGTGCTGAGTTCCCTGGATGGCACCAAGATCGGCCCT     DCCDDDCCCDCDECEEEFFFFHGHHIIHFGHIGGEIFAHCIEHGGIGHGHEHEDGEIIIHFG?HFFDDFFFD@@@       NM:i:0  MD:Z:54 AS:i:54 XS:i:0
HS10_346:2:1314:2495:92223      163     chr19   60167   60      75M     =       60313   221     AGGGTCAAGGAGTTGACGGTGCTGAGTTCCCTGCACTCTCAGTAGGGACAGGCCCTATGCTGCCACCTGTACATG     @@?DDDDDHHHHFGHJIJJFEHJJJJEIIIGHIJJJJGEIBHHGIJJIIJJJJJJJJIJJIJJJHGHHFDDFCEF       NM:i:1  MD:Z:6G68       MC:Z:75M        AS:i:70 XS:i:0
HS10_346:2:1207:4332:57794      99      chr19   60172   60      75M     =       60350   253     CGAGGAGTTGACGGTGCTGAGTTCCCTGCACTCTCAGTAGGGACAGGCCCTATGCTGCCACCTGTACATGCTATC     @C@FFDFDFFFHHIHIIJHHGGIIJGIEEHIJIJJJIJJIJJGIGHIHIHHJJJJJJJJIHHHEEHHHHFFFFFF       NM:i:0  MD:Z:75 MC:Z:75M        AS:i:75 XS:i:0
HS10_346:2:2315:10492:97635     121     chr19   60173   60      75M     =       60173   0       GAGGAGTGGACGGTGCTGAGTTCCCTGCTCTCTCAGTAGGGACAGGCCCTATGCTGCCACCTGTACATGCTATCA     ###########################################################################       NM:i:3  MD:Z:7T20A45T0  AS:i:64 XS:i:0
HS10_346:2:2315:10492:97635     181     chr19   60173   0       *       =       60173   0       AAAAATCGAAAATACTTTTAACAATTTGTATTTGATTTATAACTTTTAAACATTTTTATAATGACATTTAAAAAA     IJIGIHFCBHEC=8GHJIIIHIEEEIHDGGHGIGIGGHGEEEGCIHHAC?HEGJJGJIIGHBHDDHHFFFFD@@@       MC:Z:75M        AS:i:0  XS:i:0
HS10_346:2:1314:17241:18282     117     chr19   60211   0       *       =       60211   0       CTCTGTGATCTTCTCCATGGCAGGATCTCCCAGCAGGTAAAGCAGAGCCGGAGCCAGGTGCAGGCCATTGGAGAG     @CCD@@EEEHFHEA=HGGC@CF7=@@F;HF<CCBBDDB9DB@<>GG@E@HEIHACHDDHFDADF>HBDDDDB?@@       MC:Z:75M        AS:i:0  XS:i:0
HS10_346:2:1314:17241:18282     185     chr19   60211   60      75M     =       60211   0       GGGACAGGCCCTATGCTGCCACCTGTACATGCTATCTGAAGGACAGCCTCCAGGGCACACAGAGGATGGTATTTA     ##AA=5;;BA;BB=(7.8/**)A7=909<A??BB????*A:7@11)22+3<3?3,33<2<=?CA<,C7?A?===:       NM:i:0  MD:Z:75 AS:i:75 XS:i:0
HS10_346:2:1201:2781:75049      99      chr19   60221   60      75M     =       60484   338     CTATGCTGCCACCTGTACATGCTATCTGAAGGACAGCCTCCAGGGCACACAGAGGATGGTATTTACACATGCACA     CCCFFFFFHGGHHJJIJJJJJJJJJJJJJJJJJJJIIJJJJJGIJIIJJJJJIJJJJJJ@GIIJJJJIJHHHHHF       NM:i:0  MD:Z:75 MC:Z:75M        AS:i:75 XS:i:0
HS10_346:2:1314:2495:92223      83      chr19   60313   60      75M     =       60167   -221    CAAGCACTTCACAACCCCTCATGATCACGTGCAGCAGACAAAGTGGCCTCTGCAGAGGGGGAACGGAGACCGGAG     DCADDDDEEDCDFDHIIIEJJJJIJIHFHGGIEHGIHIIJJJJJJJIJIGIJIIJJJJJJIHFGHHHFDFFFCCC       NM:i:1  MD:Z:41T33      MC:Z:75M        AS:i:70 XS:i:0
Code Breakdown:
Coordinate sorts
Comment:
Take a look at the coordinate sorted bam vs original. When we view, notice the coordinates in the sorted bam were altered

6. Duplicate marking alignment

Code:
sample=MCF10A_input_chr19
java -jar /usr/local/picard/picard.jar MarkDuplicates I=~/workspace/alignments/${sample}.sorted.bam O=~/workspace/alignments/${sample}.dup_marked.sorted.bam M=~/workspace/alignments/${sample}.dup_marked.output.log ASSUME_SORTED=TRUE VALIDATION_STRINGENCY=LENIENT > ~/workspace/alignments/${sample}.dup_marked.error.log
Code Breakdown:
java -jar \ # Requires java to intepret
/usr/local/picard/picard.jar \ #Point to toolkit
MarkDuplicates \  #Specify function from toolkit
I=~/workspace/alignments/${sample}.sorted.bam \ #Input
O=~/workspace/alignments/${sample}.dup_marked.sorted.bam \ #output
M=~/workspace/alignments/${sample}.dup_marked.output.log \ #Work log
ASSUME_SORTED=TRUE \ #B/C we already sorted this will be true
VALIDATION_STRINGENCY=LENIENT \ #Emit warnings but keep going if possible
> ~/workspace/alignments/{sample}.dup_marked.error.log
Output:
See log. Its a bit long but the breakdown as follows:
1. A summary of the command used (so we can check the parameters)
2. A metric rollup equivalent to the flagstat step run later on
3. A histogram where, Col 1 is expected coverage col 2 is actual
"In case of many duplicates, the second column will result in much lower values, indicating that sequencing more will not add proportionally to the obtained effective coverage."
https://github.com/broadinstitute/picard/issues/917

7. Flagstats

Code:
sample=MCF10A_input_chr19
samtools flagstat ~/workspace/alignments/${sample}.dup_marked.sorted.bam > ~/workspace/alignments/${sample}.dup_marked.sorted.flagstat
Output:
1726243 + 0 in total (QC-passed reads + QC-failed reads)
34545 + 0 secondary
0 + 0 supplementary
64636 + 0 duplicates
1545613 + 0 mapped (89.54% : N/A)
1691698 + 0 paired in sequencing
845849 + 0 read1
845849 + 0 read2
1192976 + 0 properly paired (70.52% : N/A)
1330944 + 0 with itself and mate mapped
180124 + 0 singletons (10.65% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Comment:
Run flagstat on MCF10A_input_chr19.sorted.bam. What are the differences?
Try running the alignment with MCF10A_H3K27ac_chr19!

8. Clean up!

Code:
rm ~/workspace/alignments/${sample}.sorted.bam
rm ~/workspace/alignments/${sample}.bam
Comment:
Good practice to remove redundant files

Module 2. Peak Calling


1. Dedup BAM file
2. Running a peak caller
3. Remove blacklist regions
4. Visualization

Setup

mkdir ~/workspace/peaks
samtools index -@4 ~/CourseData/EPI_data/Module1/MCF10A_resources/*.bam

1. Dedup BAM file

Code:
treatment_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_H3K27ac.bam
treatment_dedup=~/workspace/peaks/MCF10A_H3K27ac.dedup.bam
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 -o ${treatment_dedup}
input_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_Input.bam
input_dedup=~/workspace/peaks/MCF10A_Input.dedup.bam
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 -o ${treatment_dedup}
Code Breakdown:
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 chr19# {"-F1028": removes reads that unmapped and duplicated,
"-@4": uses two threads,
"-bh": outputs in binary,
"q10": MAPQ must be >=10

Comments:
This step will take a while, copies have been prepared in resources. Smallest file taking 3 minutes. Longest taking 10.

2. Running a peak caller

Code:
mkdir -p ~/workspace/peaks
treatment_frag=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_H3K27ac.dedup.bam
input_frag=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_Input.dedup.bam
name=MCF10A_H3K27ac

macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n ${name} --keep-dup all --outdir ~/workspace/peaks/ --bdg 1> ~/workspace/peaks/${name}.out.log 2> ~/workspace/peaks/${name}.err.log
Code Breakdown:
macs3 callpeak \ #General purpose peak calling mode
-t ${treatment_frag} \ #Treatment file, can provide more than one to "pool"
-c ${input_frag} \ #Control File/Input
-f BAMPE \ #Instructs MACS2 on what kind of file to expect. Single/Paired-end bed/bam
-g hs \ #Sets appropriate genome size for background sampling hs=human, mm=humouse
-n ${name} \ #name
#-q 0.05 #FDR q value default
#-broad #"broad mode" for broad marks - stitches small peaks together
--outdir ~/workspace/peaks/ \ #where to output files otherwise stores in current working directory
--bdg \ #outputs pileup into bedgraphs
1> ~/workspace/peaks/${name}.out.log \ #output log
2>  ~/workspace/peaks/${name}.err.log #error log

cat  ~/workspace/peaks/${name}.err.lo
Comments:
List for common marks but is not exhaustive. Best practice would be to observe trends on genome browser and then decide.
Narrow Marks:
H3K27ac
H3K4me3

Broad Marks:
H3K27me3
H3K4me1
H3K36me3
H3K36me2
H3K9me3

Take a look at the err.log we see:
1. Summary of command (a good way of troubleshooting if the wrong paramters were provided
2. Progress of loading the two bams
3. Note the line "mean fragment size is determined as 227.0 bp from treatment"

Output:
See logs

2 Black list removal/Inspection

  • 2.1 Black list removal

    Code:
    blacklist=~/CourseData/EPI_data/Module1/QC_resources/hg38_blacklist.bed
    sample="MCF10A_H3K27ac_peaks"
    bedtools intersect -v -a ~/workspace/peaks/${sample}.narrowPeak -b ${blacklist} > ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak
    bedtools intersect -u -a ~/workspace/peaks/${sample}.narrowPeak -b ${blacklist} > ~/workspace/peaks/${sample}.blacklist.narrowPeak
    
    Output:
    No log. Bedtools completes Silently
  • 2.2 NarrowPeak inspection

    Code:
sample="MCF10A_H3K27ac_peaks"
wc -l ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak
head ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak -n5
Output:
32384 /home/ubuntu/workspace/peaks/MCF10A_H3K27ac_peaks.blacklistRemoved.narrowPeak
chr1    10003   10467   MCF10A_H3K27ac_peak_1   2160    .       47.2436 221.596 216.028 179
chr1    17283   17514   MCF10A_H3K27ac_peak_2   27      .       3.9983  5.1706  2.76136 82
chr1    180312  180997  MCF10A_H3K27ac_peak_3   803     .       28.8766 85.1306 80.3433 461
chr1    777953  778720  MCF10A_H3K27ac_peak_4   210     .       10.1154 24.4429 21.0334 503
chr1    778984  779223  MCF10A_H3K27ac_peak_5   44      .       4.87285 7.00538 4.45379 69

Comments:
chromosome name
peak start
peak stop
peak name
int(-10*log10pvalue)
N/A
Fold change at peak summit
-log10 P-value at Peak
-log10 Q-value at Peak
Summit position relative to peak
  • 2.3 Quality control via % of reads in key genomic areas

    Code:
sample="MCF10A_H3K27ac"
query_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/${sample}.bam
samtools view -@4 -q 10 -F 1028 $query_bam -c
samtools view -@4 -q 10 -F 1028 $query_bam -L ~/CourseData/EPI_data/Module1/QC_resources/encode_enhancers_liftover.bed -c
samtools view -@4 -q 10 -F 1028 $query_bam -L ~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak -c
Output:
19632094
10752392
2028377

Comments:
samtools view #Viewing entirety or subset of BAM file
-q 10 \ #Parsing for reads with 5>=MAPQ
-F 1028 \ # Remove reads that have the following flags:UNMAP,DUP #https://broadinstitute.github.io/picard/explain-flags.html
$query_bam \ # Bam of interest
-c # Count number of reads instead of displaying
-L # Region of interest. Bed File
Enrichment at key regions:
  H3K4me1 H3K27me3 H3K27ac H3K4me3
Reads Count        
Total 71,545,746 55,438,916 19,632,094 26,381,511
Enhancer 48,466,245 25,742,977 10,752,392 12,302,265
HOX 13,866 13,085 4,504 28,674
TSS 7,303,584 5,111,631 2,702,194 18,041,080
In Peaks 27,539,876 25,072,097 2,028,377 20,920,281
         
         
  H3K4me1 H3K27me3 H3K27ac H3K4me3
Percent of Total        
Total 100.00 100.00 100.00 100.00
Enhancer 67.74 46.43 54.77 46.63
HOX 0.02 0.02 0.02 0.11
TSS 10.21 9.22 13.76 68.39
In Peaks 38.49 45.22 10.33 79.30

3. Visualization

  • 3.1 Visualization (Coverage)

    Code:
    mkdir -p ~/workspace/{bigBed,bigWig}
    sample="MCF10A_H3K27ac"
    input_bedgraph=~/workspace/peaks/${sample}_treat_pileup.bdg
    output_bigwig=~/workspace/bigWig/${sample}_treat_pileup.bw
    chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
    sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
    bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
    rm ~/workspace/bigWig/tmp
    
input_bedgraph=~/workspace/peaks/${sample}_control_lambda.bdg
output_bigwig=~/workspace/bigWig/${sample}_control_lambda.bw
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
rm ~/workspace/bigWig/tmp
Output:
N/A
Code Breakdown:
mkdir -p ~/workspace/{bigBed,bigWig}
sample="MCF10A_H3K27ac" # Specify variables
input_bedgraph=~/workspace/peaks/${sample}_treat_pileup.bdg #
output_bigwig=~/workspace/bigWig/${sample}_treat_pileup.bw #
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes #
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp  # Sorts peaks as to match alphabetical order in col 1 and numerical in col 2 #$(chrom_sizes) into a temporary file
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig} # Conversion step
rm ~/workspace/bigWig/tmp #remove temporary file
  • 3.2 Visualization (Peaks)

    Code:
    mkdir -p ~/workspace/{bigBed,bigWig}
    sample="MCF10A_H3K27ac"
    input_bed=~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak
    output_bigbed=~/workspace/bigBed/${sample}_peaks.blacklistRemoved.bb
    chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
    cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n  > ~/workspace/bigBed/tmp
    bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed}
    rm ~/workspace/bigBed/tmp
    
Output:
N/A
Code Breakdown:
sample="MCF10A_H3K27ac"
input_bed=~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak
output_bigbed=~/workspace/bigBed/${sample}_peaks.blacklistRemoved.bb
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n  > ~/workspace/bigBed/tmp
bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed} # Largely the same as above, except we trim our narrowPeak file to just 3 columns
rm ~/workspace/bigBed/tmp
  • 3.3 Visualization

    1. Download tracks via public url
    2. Upload onto IGV
    3. Explore!
      Comments:

      ```Shell If you’re following via amazon AWS, you should have a public port open to the IPv4 associated with your instance in which you can interact via browser The files you would want to import would be:

/BigBed//MCF10A_H3K27ac_peaks.blacklistRemoved.bb //bigWig/bigWig/${sample}_treat_pileup.bw /bigWig/${sample}_control_lambda.bw ``` ```Shell Check out the provided regions! Filtered out region due to intersect w/ blacklist chr16:34,145,433-34,160,335 ``` ```Shell Artifact region: chr16:34514779-34732558 ``` ```Shell HOXA region: chr7:26981738-27365875 ``` ### 4. Homework ##### Comments: ```Shell Try Running H3K27me3, H3K4me1 and H3K4me3 on your own! When running H3K27me3 and H3K4me1 note the comments and code breakdown of step 2 ``` ## Module 3. Differential analysis *** ### Setup ```Shell mkdir analysis ``` #### - Comparing regions ##### Code: ```Shell MCF10A_H3K27ac=~/workspace/peaks/MCF10A_H3K27ac_peaks.blacklistRemoved.narrowPeak Basal_H3K27ac=~/CourseData/EPI_data/Module1/CEEHRC_resources/Basal_69069.CEEHRC.CEMT0035.H3K27ac.peak.bed mkdir -p ~/workspace/analysis bedtools intersect -u -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | wc -l bedtools intersect -v -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | cut -f1-3 > ~/workspace/analysis/MCF10A_H3K27ac_unique.bed bedtools intersect -v -b ${MCF10A_H3K27ac} -a ${Basal_H3K27ac} | wc -l wc -l ~/workspace/analysis/MCF10A_H3K27ac_unique.bed head ${MCF10A_H3K27ac} head ${Basal_H3K27ac} bedtools intersect -u -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | head bedtools intersect -u -b ${Basal_H3K27ac} -a ${MCF10A_H3K27ac} | head bedtools intersect -u -b ${MCF10A_H3K27ac} -a ${Basal_H3K27ac} | head ``` ##### Output: ```Shell 18355 107949 14029 ``` ##### Comments: ```Shell bedtools intersect \ #Tool intersects regions from A onto B to perform a function -u \ # A region of A could intersect the multiple regions of B, thus we do not want to over count and only report it once -a ${MCF10A_H3K27ac} \ # Depending on the operation bedtools will only provide the output relative to "A", thus if we want "B" we would have to switch the command order -b ${Basal_H3K27ac} \ # Comperator | wc -l # Line count bedtools intersect -u -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | wc -l # Reports number of unique regions in A that are in common with B bedtools intersect -v -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | wc -l # Reports number of regions that are not found in B bedtools intersect -v -b ${MCF10A_H3K27ac} -a ${Basal_H3K27ac} | wc -l # We swapped A with B to report regions unique to B head ${MCF10A_H3K27ac} # Note the # of coluns : 7 head ${Basal_H3K27ac} # Note the # of coluns : 3 bedtools intersect -u -a ${MCF10A_H3K27ac} -b ${Basal_H3K27ac} | head # Bedtools always returns the input of "-a" queried against "-b" so in this case our output has 7 cols bedtools intersect -u -b ${Basal_H3K27ac} -a ${MCF10A_H3K27ac} | head # If we swap the command positions, does this change? No b/c the command is essentially the same bedtools intersect -u -b ${MCF10A_H3K27ac} -a ${Basal_H3K27ac} | head # Here we swapped the files around and returned Basal_H3K27ac (3 cols) ``` ##### Peak Count Results: |Mark| |MCF10A|Basal|Luminal|LuminalProgenitor|Stromal| |--|--|--|--|--|--|--| |H3K27ac||32384|126386|76203|102867|149943| |||A-B|14029|13643|13267|14560| |||AuB|18355|18741|19117|17824| |||B-A|107949|59645|84630|131251| |H3K27me3||184034|314557|383257|344099|361087| |||A-B|87203|86349|93197|84175| |||AuB|96831|97685|90837|99859| |||B-A|140374|198506|178309|177294| |H3K4me1||141821|186957|170000|177095|204897| |||A-B|83813|88651|86570|89159| |||AuB|58008|53170|55251|52662| |||B-A|91024|84548|87121|114524| |H3K4me3||27632|26230|23216|29104|24974| |||A-B|9578|10599|9281|10532| |||AuB|18054|17033|18351|17100| |||B-A|7993|6370|10669|7789| ## Cheat sheet *** #### Dedup Bams ##### Code: ```Shell mkdir -p ~/workspace/peaks for mark in H3K27ac H3K4me3 H3K27me3 H3K4me1 Input; do bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${mark}.bam dedup=~/workspace/peaks/MCF10A_${mark}.dedup.bam samtools view -@4 ${bam} -bh -q10 -F1028 -o ${dedup} done ``` #### Peak Calling all the marks ##### Code: ```Shell input_frag=~/workspace/peaks/MCF10A_Input.dedup.bam for narrow_mark in H3K27ac H3K4me3; do treatment_frag=~/workspace/peaks/MCF10A_${narrow_mark}.dedup.bam macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n MCF10A_${narrow_mark} --keep-dup all --outdir ~/workspace/peaks/ --bdg 1> ~/workspace/peaks/${narrow_mark}.out.log 2> ~/workspace/peaks/${narrow_mark}.err.log done for broad_mark in H3K4me1 H3K27me3; do treatment_frag=~/workspace/peaks/MCF10A_${broad_mark}.dedup.bam macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n MCF10A_${broad_mark} --keep-dup all --outdir ~/workspace/peaks/ --bdg --broad 1> ~/workspace/peaks/${broad_mark}.out.log 2> ~/workspace/peaks/${broad_mark}.err.log done ``` #### Blacklist Filtering all Peaks ##### Code: ```Shell blacklist=~/CourseData/EPI_data/Module1/QC_resources/hg38_blacklist.bed for narrow_mark in H3K27ac H3K4me3; do bedtools intersect -v -a ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.narrowPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak bedtools intersect -u -a ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.narrowPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklist.narrowPeak done for broad_mark in H3K4me1 H3K27me3; do bedtools intersect -v -a ~/workspace/peaks/MCF10A_${broad_mark}_peaks.broadPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak bedtools intersect -u -a ~/workspace/peaks/MCF10A_${broad_mark}_peaks.broadPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklist.broadPeak done ``` #### BigWig converting all your *treat_pileup.bdg ##### Code: ```Shell mkdir -p ~/workspace/{bigBed,bigWig} chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes for mark in H3K27ac H3K4me3 H3K27me3 H3K4me1 do input_bedgraph=~/workspace/peaks/MCF10A_${mark}_treat_pileup.bdg output_bigwig=~/workspace/bigWig/MCF10A_${mark}_treat_pileup.bw sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig} rm ~/workspace/bigWig/tmp done sample="MCF10A_H3K27ac" input_bedgraph=~/workspace/peaks/${sample}_control_lambda.bdg output_bigwig=~/workspace/bigWig/${sample}_control_lambda.bw chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig} rm ~/workspace/bigWig/tmp ``` #### BigBed converting all your *Peaks ##### Code: ```Shell mkdir -p ~/workspace/{bigBed,bigWig} chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes for narrow_mark in H3K27ac H3K4me3; do input_bed=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak output_bigbed=~/workspace/bigBed/MCF10A_${narrow_mark}_peaks.blacklistRemoved.bb chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed} done for broad_mark in H3K4me1 H3K27me3; do input_bed=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak output_bigbed=~/workspace/bigBed/MCF10A_${broad_mark}_peaks.blacklistRemoved.bb chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed} done ``` #### Total read counts per BAM ##### Code: ```Shell for narrow_mark in H3K27ac H3K4me3; do bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${narrow_mark}.bam peak=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak echo total reads in ${narrow_mark} $(samtools view -@4 ${bam} -q10 -F1028 -c ) done for broad_mark in H3K4me1 H3K27me3; do bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${broad_mark}.bam peak=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.narrowPeak echo total reads in ${broad_mark} $(samtools view -@4 ${bam} -bh -q10 -F1028 -c ) done ``` #### Reads in Enriched region per BAM ##### Code: ```Shell for narrow_mark in H3K27ac H3K4me3; do bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${narrow_mark}.bam peak=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak echo enriched reads in ${narrow_mark} $(samtools view -@4 ${bam} -q10 -F1028 -c -L ${peak}) done for broad_mark in H3K4me1 H3K27me3; do bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${broad_mark}.bam peak=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak echo enriched reads in ${broad_mark} $(samtools view -@4 ${bam} -bh -q10 -F1028 -c -L ${peak}) done ``` #### Reads in Domain regions per BAM ##### Code: ```Shell for file in $(ls ~/CourseData/EPI_data/Module1/QC_resources/* | grep bed | grep -v blacklist); do for bam in $(ls ~/CourseData/EPI_data/Module1/MCF10A_resources/*bam ); do echo $(basename ${bam}) $(basename ${file}) $(samtools view -@4 -q 10 -F 1028 ${bam} -L $file -c ) done done ``` #### Counts total peaks per file ##### Code: ```Shell for file in $(ls ~/workspace/peaks/*.blacklistRemoved.*Peak); do wc -l $file done for file in $(ls ~/CourseData/EPI_data/Module1/CEEHRC_resources/*.bed); do wc -l $file done ``` #### Counts total peaks per overlap ##### Code: ```Shell for histone_mark in H3K27ac H3K27me3 H3K4me3 H3K4me1; do for fileA in $(ls ~/workspace/peaks/*${histone_mark}*.blacklistRemoved*Peak); do for fileB in $(ls ~/CourseData/EPI_data/Module1/CEEHRC_resources/*${histone_mark}*.bed); do echo $(basename $fileA) $(basename $fileB) AuB $(bedtools intersect -u -a $fileA -b $fileB | wc -l) echo $(basename $fileA) $(basename $fileB) A-B $(bedtools intersect -v -a $fileA -b $fileB | wc -l) echo $(basename $fileA) $(basename $fileB) B-A $(bedtools intersect -v -b $fileA -a $fileB | wc -l) done done done ```