Cancer Analysis 2021
CAN Module 2 - Viewing cancer alignments in IGV
Suggested Answers
Below are suggested answers to the questions from lab 2. You might have thought of additional reasons and answers!
Visualization Part 2: Inspecting small variants in the normal sample
Heterozygous and Homozygous SNPs
- Which variant is heterozygous and which is homozygous?
- Left is heterozygous (about half the reads are marked as mismatches at this location) and right is homozygous (all reads are marked as mismatches).
- What are the variant allele frequencies for each SNP? To find out you can click/hover on the colored bars in the coverage track
- 59% (left) and 100% (right)
- Do these look like true SNPs? What evidence is there for this?
- Yes. The allele frequencies are close to what we would expect for heterozygous and homozygous SNPs, all of the mismatched bases are of high quality, and there is no strand bias. Additionally, both line up with known SNPs in the dbSNP track.
- Are these sequencing errors, SNPs, or SNVs?
- Sequencing errors and SNVs. The low quality mismatches are most likely sequencing errors. The higher quality mismatches that only occur in one or two reads are most likely somatic SNVs. They are not SNPs because they are not known germline mutations.
- What does “Shade base by quality” do and how might this be helpful?
- Adjusts the mismatch color to be dark when the base is of high quality and light when low quality. This helps us differentiate between sequencing errors and SNVs.
- Can a normal sample have somatic SNVs?
- Yes! There is always a chance of a mutation or error during genome replication.
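The allele-frequency reasoning used for the two SNPs above can be sketched in a few lines of Python. This is only an illustration: the read counts in the usage note are hypothetical (chosen to reproduce 59% and 100%), and the thresholds in `classify_zygosity` are assumptions, not values from the lab.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def classify_zygosity(vaf: float, het_range=(0.3, 0.7), hom_min=0.9) -> str:
    """Rough germline call from a VAF.

    The cutoffs are illustrative assumptions; real callers also weigh
    base quality, strand bias, and coverage.
    """
    if vaf >= hom_min:
        return "homozygous"
    if het_range[0] <= vaf <= het_range[1]:
        return "heterozygous"
    return "uncertain"
```

For example, 59 variant reads out of 100 would be classified as heterozygous, and 100 out of 100 as homozygous. A VAF well outside both ranges, such as a mismatch seen in only one or two reads, falls into "uncertain".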
Homozygous deletion
- How large is this deletion?
- 24 bp, as indicated by the black horizontal line marking a break in the read (relative to the reference genome).
- Is it homozygous or heterozygous?
- Homozygous: no reads span this region, and there is a sharp drop in the coverage track.
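The black line that IGV draws inside a read corresponds to a deletion (`D`) operation in that read's CIGAR string. As a minimal sketch, the deletion length can be read straight from the CIGAR. The example CIGAR in the usage note is hypothetical and is not taken from the lab's BAM file.

```python
import re
from typing import List

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_lengths(cigar: str) -> List[int]:
    """Return the length of every deletion (D) operation in a CIGAR string."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "D"]
```

A read with the hypothetical CIGAR `40M24D36M` would give `deletion_lengths("40M24D36M") == [24]`, matching a 24 bp deletion relative to the reference.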
GC coverage
- What do you notice about the coverage track and GC Percentage track?
- The coverage track gradually dips to 0 and then rises again, and the GC Percentage track shows a clear drop to 0%.
- What does coloring alignments by “read strand” tell you?
- No forward reads are to the left of this dip and no reverse reads are to the right.
- Do you think this is a deletion? Compare this region to the previous region.
- This is a loss in coverage due to no GC content, not a deletion. Regions of the genome with a low GC% are notoriously difficult to sequence. The lack of reads spanning this region indicates that these fragments could not be sequenced at all.
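The GC Percentage track reports the share of G and C bases in windows along the reference. A minimal sketch of that calculation is below. The window size and the example sequence are assumptions for illustration, and this is not IGV's own implementation.

```python
from typing import List


def gc_percent(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def windowed_gc(seq: str, window: int = 5) -> List[float]:
    """GC percentage in consecutive, non-overlapping windows."""
    return [
        gc_percent(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]
```

On the toy sequence `"GCGCGATATATATATGCGCG"`, `windowed_gc` returns `[100.0, 0.0, 0.0, 100.0]`: GC content drops to 0% in the middle, which is the pattern seen in the track above the coverage dip.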
Visualization Part 3: Inspecting small somatic variants in the tumor sample
Somatic SNVs
- How many SNVs are in this region for each sample?
- Normal: 3, Tumor: 4
- What is the variant allele frequency for the extra SNV in the tumor sample? How did it get this high?
- 37%. A mutation might have occurred in a cell that gained a growth advantage, producing a clone (a group of related cells) that makes up a large proportion of the tumor sample.
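To connect the VAF to the size of the clone, here is a rough sketch under strong simplifying assumptions that are not stated in the lab: the locus is diploid, the mutation is heterozygous (one mutant copy per mutated cell), and there is no copy-number change. Under those assumptions each mutated cell contributes one variant allele out of two, so the fraction of cells carrying the mutation is about twice the VAF.

```python
def mutated_cell_fraction(vaf: float) -> float:
    """Estimate the fraction of cells carrying a heterozygous mutation.

    Assumes a diploid locus with one mutant copy per mutated cell
    (an illustrative simplification). The result is capped at 1.0.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    return min(1.0, 2.0 * vaf)
```

With a VAF of 0.37, this gives roughly 0.74, meaning about three quarters of the sampled cells would carry the mutation if the assumptions hold.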
Somatic SNP with change in heterozygosity
- What are the variant allele frequencies for each sample?
- Normal: 46%, Tumor: 80%
- Why are these frequencies different?
- Some cells experienced a loss of heterozygosity at this locus, leading to the variant being more frequent in the tumor population overall.
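The shift in allele frequency can be illustrated with a simple mixture model. This is a sketch under assumptions not taken from the lab: a fraction `f` of the sampled cells has lost the reference allele, the remaining cells are ordinary diploid heterozygotes, and reads sample alleles in proportion to their copy number. Two textbook scenarios are shown, since the IGV view alone does not say which one applies.

```python
def vaf_after_copy_loss_loh(f: float) -> float:
    """Expected VAF when a fraction f of cells deleted the reference allele.

    LOH cells keep one variant copy; other cells have one variant and one
    reference copy. Variant copies per cell average 1, total copies 2 - f.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return 1.0 / (2.0 - f)


def vaf_after_copy_neutral_loh(f: float) -> float:
    """Expected VAF when a fraction f of cells replaced the reference
    allele with a second variant copy (total copies stay at 2)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return (1.0 + f) / 2.0
```

Under this model, a tumor VAF near 80% would correspond to about 75% of cells with copy-loss LOH, or about 60% of cells with copy-neutral LOH. With `f = 0`, both functions return 0.5, the expected value for a heterozygous site in the normal sample.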
Somatic indel next to SNP with change in heterozygosity
- What type of variant is in your centre line?
- Small deletion
- What do you notice about the variant in the normal sample to the right?
- The variant allele is less frequent in the tumor sample and the reads with the deletion do not have the variant allele
- What might be an explanation for what happened?
- Some cells experienced a loss of heterozygosity at this site, leading to the reference allele being more frequent in the tumor population overall. This LOH event might also have been associated with the occurrence of a deletion nearby.
Visualization Part 4: Inspecting structural variants in NA12878
Inversion
- What do you notice about the blue and teal reads at the top?
- The blue read pairs are both reverse facing, whereas the teal read pairs are both forward facing
- How does “View as pairs” help understand that this is an inversion?
- It shows us that not all read pairs in this region are forward-reverse as expected
- Is this inversion heterozygous or homozygous?
- Heterozygous, as there are lots of properly oriented read pairs as well
Duplication
- What do you notice about the green reads at the top?
- The read pairs are in the opposite of the expected orientation (reverse-forward instead of forward-reverse).
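The orientation patterns described in the inversion and duplication answers can be summarized in a small lookup. The sketch below assumes a standard paired-end library in which properly paired mates face each other (forward-reverse). The function name and the wording of the interpretations are illustrative, not IGV's API.

```python
def pair_orientation(left_is_reverse: bool, right_is_reverse: bool) -> str:
    """Orientation code for a read pair, leftmost mate first.

    'F' means the mate aligns to the forward strand, 'R' to the reverse.
    """
    left = "R" if left_is_reverse else "F"
    right = "R" if right_is_reverse else "F"
    return left + right


# What each orientation suggests, following the answers above.
INTERPRETATION = {
    "FR": "expected orientation",
    "FF": "both mates forward: possible inversion",
    "RR": "both mates reverse: possible inversion",
    "RF": "mates face away from each other: possible duplication",
}
```

For instance, `INTERPRETATION[pair_orientation(True, False)]` describes the reverse-forward pairs shown in green, while `pair_orientation(False, True)` returns `"FR"`, the expected orientation.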
Large deletion
- What do the red read pairs indicate?
- Read pairs with a larger insert size than expected. Since the sequence in between is missing in the sample, the read pairs came from an appropriately sized fragment, but when mapped to the reference genome they are much further apart than they would be if the reference sequence were present.
- What other track can we look at to see that this is a deletion?
- There is a dip in the coverage track, indicating that fewer reads mapped here.
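The insert-size reasoning above can be sketched as a simple filter. The threshold, the typical insert size, and the example values in the usage note are hypothetical, and this is not how IGV itself chooses which pairs to color.

```python
from typing import List


def flag_large_inserts(insert_sizes: List[int], max_expected: int) -> List[int]:
    """Return the insert sizes that exceed the expected maximum.

    Absolute values are used because insert sizes can be reported as
    negative for the rightmost mate of a pair.
    """
    return [size for size in insert_sizes if abs(size) > max_expected]


def estimate_deletion_size(observed_insert: int, typical_insert: int) -> int:
    """Rough deletion size: how much further apart the mates map than a
    typical fragment would allow."""
    return abs(observed_insert) - typical_insert
```

With hypothetical insert sizes `[310, 295, 5200, 330]` and a cutoff of 1000, only the 5200 bp pair is flagged. If fragments are typically about 300 bp, that pair suggests a deletion of roughly 4900 bp.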