Lab Module 8 - g:Profiler
Lab Module 8 - Pathway Over-representation Analysis
By Jüri Reimand
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.
Let’s get a few gene lists for analysis
Highlights:
- Using the integrated data sets, the authors identified 127 significantly mutated genes as candidate cancer driver genes
- Genes under positive selection, either in individual or multiple tumour types, tend to display higher mutation frequencies above background.
- The statistical analysis identified 127 such genes
- The mutational significance in cancer (MuSiC) package was used to identify significant genes for both individual tumour types and the Pan-Cancer sample cohort (Dees et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 2012).
- These significantly mutated genes are involved in a wide range of cellular processes, including transcription factors/regulators, histone modifiers, genome integrity, receptor tyrosine kinase signalling, cell cycle, mitogen-activated protein kinases (MAPK) signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2).
Supplementary Data, Table 4
- globally significant, frequency >= 1% for glioblastoma multiforme (GBM): 46
- globally significant, frequency >= 1% for kidney renal clear cell carcinoma (KIRC): 53
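The filtering step above can also be scripted. The following is a minimal sketch, assuming the supplementary table has been reduced to (gene, mutation frequency in %, p-value) records. The record layout, the placeholder gene names and the numbers are all hypothetical and are not taken from Supplementary Table 4. It keeps genes with frequency >= 1% and orders them by p-value, which is the order g:Profiler's Ordered Query option expects.

```python
# Hypothetical records: (gene symbol, mutation frequency in %, p-value).
# These values are placeholders for illustration only.
records = [
    ("GENE_A", 12.0, 1e-12),
    ("GENE_B", 0.4, 1e-20),
    ("GENE_C", 3.5, 1e-15),
    ("GENE_D", 1.0, 1e-6),
]


def ordered_gene_list(rows, min_freq=1.0):
    """Keep genes with frequency >= min_freq, most significant first."""
    kept = [row for row in rows if row[1] >= min_freq]
    kept.sort(key=lambda row: row[2])  # ascending p-value
    return [gene for gene, _freq, _p in kept]


genes = ordered_gene_list(records)
# One gene per line, ready to paste into the query box.
print("\n".join(genes))
```

GENE_B is dropped here because its frequency is below the 1% threshold, even though its p-value is the smallest.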
Let’s use g:Profiler to obtain enrichment results
First set the parameters and filter gene sets to be analysed:
Note that input genes are ordered according to p-value so the Ordered Query option is appropriate.
Then paste the gene list (GBM shown here) and press g:Profile to start the analysis.
Scroll down to see significantly enriched pathways and processes. Scroll right to see gene annotations of GO processes (colored) and Reactome pathways (black; scroll further down).
Browse results:
- Click on numbers in the column n. of common genes to find genes that are part of a given process.
- Click the graph icon to see how processes are related to each other.
- Check the No electronic GO annotations checkbox and run the query again. The enrichment analysis now uses only high-confidence gene annotations.
- Uncheck the Hierarchical sorting checkbox to rank results by corrected p-value.
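To make the statistic behind an over-representation result more concrete, here is a minimal sketch of a one-sided hypergeometric test for a single gene set. It only illustrates the general technique. It does not reproduce g:Profiler's own implementation, its multiple-testing correction, or its handling of ordered queries, and the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k pathway genes in a query of n genes.

    N: number of annotated background genes
    K: number of background genes that belong to the pathway
    n: number of genes in the query list
    k: number of query genes that belong to the pathway
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 46 query genes, 8 of them in a pathway of 200 genes,
# with 20000 annotated genes as background.
p = enrichment_p_value(k=8, n=46, K=200, N=20000)
print(f"uncorrected p-value: {p:.3g}")
```

The value is uncorrected. When many gene sets are tested, a multiple-testing correction is still needed before calling a pathway significant.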
g:Convert - gene ID conversion:
- Paste gene list into the Query box.
- In the Target Database list, select the desired type of gene/protein identifiers, for example UNIPROTSWISSPROT. Click Convert IDs to continue.
For Enrichment Map construction:
- Set Output Type to Generic Enrichment Map (TAB).
- Click on g:Profile to run analysis again.
- Right-click on Download data in Generic Enrichment Map (GEM) format to save the file.
- Browse the downloaded file in a text editor. Note the lists of genes in the rightmost column. These genes are part of the input list and are responsible for the given pathway enrichment.
- At the bottom of Advanced Options, find Download g:Profiler data as GMT and right-click the link name to save the zip file with gene-set annotations.
- In the zip file, you will need the file hsapiens.pathways.NAME.gmt.
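If you want to inspect the gene-set definitions outside a text editor, a GMT file is easy to read programmatically: each line holds a gene-set identifier, a description, and then the member genes, all separated by tabs. The sketch below parses that layout. The function name, the identifiers and the gene names in the example are hypothetical placeholders.

```python
import io


def read_gmt(handle):
    """Return {set_id: (description, set_of_genes)} from GMT-formatted lines."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without any genes
        set_id, description, *genes = fields
        gene_sets[set_id] = (description, {g for g in genes if g})
    return gene_sets


# Small in-memory example standing in for a real GMT file.
example = io.StringIO(
    "SET:0001\tfirst hypothetical pathway\tGENE_A\tGENE_B\tGENE_C\n"
    "SET:0002\tsecond hypothetical pathway\tGENE_B\tGENE_D\n"
)
gene_sets = read_gmt(example)
for set_id, (description, genes) in gene_sets.items():
    print(set_id, description, len(genes))
```

For a real file, pass an open file handle for the extracted GMT file instead of the in-memory example.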
Building an Enrichment Map visualization in Cytoscape
- From the main menu, select Apps>Enrichment Map>Create Enrichment Map. If you don’t have this app, install version 2.2.1 using Cytoscape App Manager or Cytoscape App Store.
- First load all the files:
  - GMT file: has gene-set definitions
  - Enrichment of data-set 1: has enrichment statistics for GBM
  - Enrichment of data-set 2: has enrichment statistics for KIRC
- Set analysis parameters.
  a. FDR-corrected P-values are filtered in g:Profiler, so no further filtering is needed in Enrichment Map.
  b. The Jaccard + overlap combined coefficient defines the stringency of edges between gene sets and the granularity of the map. Use this parameter to create denser or sparser networks. The value 0.66 is a relatively stringent similarity cutoff and leads to sparser maps.
- Click ‘Build’. An enrichment map will be generated.
- The resulting map links the node fill colour to the first gene list (GBM) and the node border colour to the second gene list (KIRC). Entirely red nodes are pathways with significant enrichment in both gene lists. Red nodes with grey borders are significant only in the first gene list, and grey nodes with red borders are significant only in the second list.
- Map the second gene list to blue tones to make this visualization easier to read. The simplest way is to open the KIRC enrichment file in a spreadsheet editor such as Excel, edit the phenotype column and replace all values with -1. Make sure you save the file as a text file. Review the new file in a text editor to verify that Excel has not added extra symbols or quotes. See https://www.ncbi.nlm.nih.gov/pubmed/27552985.
- Build the enrichment map again and KIRC pathways will map to blue tones.
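The phenotype edit described above can also be done with a few lines of code, which avoids the risk of a spreadsheet editor adding quotes or other symbols. This is a minimal sketch, assuming a tab-separated enrichment file whose header row contains a column named Phenotype. The column names, the function name and the example row are assumptions for illustration, so check them against your own downloaded file.

```python
def set_phenotype(gem_text, value="-1", column="Phenotype"):
    """Return the tab-separated text with every value in `column` replaced."""
    lines = gem_text.splitlines()
    header = lines[0].split("\t")
    idx = header.index(column)  # raises ValueError if the column is missing
    out = [lines[0]]
    for line in lines[1:]:
        if not line.strip():
            continue  # drop empty lines
        fields = line.split("\t")
        fields[idx] = value
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


# Hypothetical one-row example with an assumed header layout.
gem = (
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "SET:0001\thypothetical pathway\t0.001\t0.001\t1\tGENE_A,GENE_B\n"
)
edited = set_phenotype(gem)
print(edited)
```

To apply it to a real file, read the file contents into a string, pass the string to the function, and write the result to a new text file, keeping the original unchanged.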
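To get a feel for the similarity cutoff chosen in the parameter step, the sketch below computes the Jaccard coefficient, the overlap coefficient and a weighted combination of the two for a pair of gene sets. The weighting formula k * overlap + (1 - k) * Jaccard with k = 0.5 is an assumption based on my reading of the Enrichment Map documentation, so verify it against the version of the app you are using. The gene names are placeholders.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined(a, b, k=0.5):
    """Weighted mix of the two coefficients (k = 0.5 is an assumed default)."""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)


set_1 = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
set_2 = {"GENE_C", "GENE_D", "GENE_E"}
score = combined(set_1, set_2)
# An edge would be drawn only if the score reaches the chosen cutoff.
print(round(score, 3), score >= 0.66)
```

In this example the two sets share two genes, which gives a combined score below 0.66, so no edge would connect them at that cutoff. Lowering the cutoff would connect such pairs and produce a denser map.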