Module 5

Lecture

Lab: GeneMANIA (Cytoscape version)

Goal of this practical lab

Create GeneMANIA networks starting from a single gene to predict its function or starting from a gene list. Explore and understand the main output features of GeneMANIA such as the network composition or the enriched functions. This practical consists of 3 exercises.

Before starting the exercises, download the files:

Right-click on the link below and select “Save Link As…”.

Place it in the corresponding module directory of your CBW work directory.

Network layouts are flexible and can be rearranged. What you see when you perform these exercises may not be identical to what you see in the tutorial, or what you have seen other times that you have performed the exercises. Exact layouts and predictions can also be affected by updates to the networks database that GeneMANIA uses. However, the network weights and predicted genes are expected to be similar to those shown here.

EXERCISE 1: Searching GeneMANIA with a single gene

Imagine that you are interested in exploring the function of the human GRN gene: GRN was returned as the strongest hit from your omics experiment, but not much information about this gene is available in functional databases. Use GeneMANIA to identify its predicted function as well as potential interaction partners.

Skills:

  • GeneMANIA Single Gene search
  • Navigating Search Results
  • Exploring available Genes features
  • Rerun a new analysis using a single gene or multiple genes queried from the network.

Steps

  1. Open Cytoscape.

  2. In the Network tab, locate the Network search bar at the top of the Control Panel. Make sure that the selected database is GeneMANIA.

  3. In the search window, ensure that the model organism is set to Homo sapiens.

  4. Enter the following gene in the GeneMANIA search bar: GRN

  5. Click on the search icon and wait for the results.
    gc_1.1.png

  6. When your search results load, examine the network. Genes that are part of the query set are indicated in black, related genes added by GeneMANIA are represented in gray, and colored links represent the interactions that connect the nodes (genes).
    GC2.png

Zoom in and out using the trackpad or by scrolling the mouse wheel up and down.

  7. Locate the Functions summary tab in the Results Panel.
    GC3.png

Questions:
* What are the functions significantly associated with this network?
* GRN is the central node of this network: which function would you predict for GRN?
* How well did GeneMANIA perform? (Hint: use GeneCards (http://www.genecards.org/) or PubMed (http://www.ncbi.nlm.nih.gov/pubmed/).)

ANSWERS

Question: What are the functions significantly associated with this network?
Answer: The functions associated with the network are listed in the screenshot above. The top 2 functions are “vacuolar lumen” and “primary lysosome”; both are significant at an FDR below 0.005.

Question: GRN is the central node of this network: which function would you predict for GRN?
Answer: A function related to the lysosome and vacuole.

Question: How well did GeneMANIA perform? (Hint: use GeneCards (http://www.genecards.org/) or PubMed (http://www.ncbi.nlm.nih.gov/pubmed/).)
Answer: The top functions predicted by GeneMANIA for GRN are related to the lysosome and vacuole. A PubMed search confirms these results: “We experimentally verified that granulin precursor (GRN) gene, whose mutations cause frontotemporal lobar degeneration, is involved in lysosome function.” (Transcriptional gene network inference from a massive dataset elucidates transcriptome organization and gene function. Belcastro et al. Nucleic Acids Res. 2011 Nov 1;39(20):8677-88. PMID:21785136)

  8. Locate the genes with the strongest associations with GRN.

These genes are the largest nodes in the network.

Answer: SLPI and SORT1.

  9. Re-run the analysis after adding SORT1 and SLPI to the search. Type ‘SORT1’ and ‘SLPI’ in the search box that already contains ‘GRN’ (one gene per line). Click on the search button.


gc_1.9.png

Question: Which functions are associated with this new network?

GC9b.png

GC9c.png

Biological interpretation of the results:

The following paper describes the interaction between GRN and SORT1 and demonstrates how finding related genes can be relevant for developing therapies:

Targeted manipulation of the sortilin–progranulin axis rescues progranulin haploinsufficiency. Lee et al. Hum Mol Genet. 2014 March 15; 23(6): 1467–1478. PMCID:PMC3929086
“Progranulin (GRN) mutations causing haploinsufficiency are a major cause of frontotemporal lobar degeneration (FTLD-TDP). Recent discoveries demonstrating sortilin (SORT1) is a neuronal receptor for PGRN endocytosis and a determinant of plasma PGRN levels portend the development of enhancers targeting the SORT1–PGRN axis. We demonstrate the preclinical efficacy of several approaches through which impairing PGRN’s interaction with SORT1 restores extracellular PGRN levels.”

  10. Save the network as an image by clicking on File, Export, Network to Image… and setting the Export File Format to “PDF(*.pdf)”.
    GC10.png



EXERCISE 2: Searching GeneMANIA with a gene list

To start this exercise, you need to download the 30_prostate_cancer_genes.txt file and save it on your computer.

For this exercise, you are working with a list of 30 prostate cancer genes. This list was downloaded from the cBioPortal website (http://www.cbioportal.org/). The cBioPortal for Cancer Genomics stores genomic data from large-scale, integrated cancer genomic data sets. During this exercise, you will explore the types of networks that have been used to create the GeneMANIA network from the prostate cancer gene list and you will see how changing input parameters can affect the results. The last step of the exercise consists of uploading a custom network, which is a list of genes that are positively correlated with CYP11B1 in mRNA expression data from 94 prostate cancer samples (http://www.cbioportal.org/).

Skills:

  • GeneMANIA search using a gene list;
  • Navigating Search Results;
  • Exploring Networks and advanced options;
  • Uploading a custom network.

Steps

  1. Open Cytoscape.

  2. Locate the GeneMANIA search bar on the left side, in the Control Panel.

  3. Copy the genes from the file 30_prostate_cancer_genes.txt and paste them into the search bar.

    • Make sure that the parameter ‘Max resultant genes’ is set to ‘20’ by clicking on the options menu button at the right side of the search box and selecting ‘Customise advanced options’.
  4. Click on the search icon and wait for the results.
    gc_2_4.png

  5. When your search results load, examine the network. Query genes are indicated in black, related genes added by GeneMANIA are represented in gray, and colored links represent the interactions that connect the nodes (genes). Move nodes around by selecting them with a mouse to investigate how they are connected.
    GC2_5.png

  6. Click any link (edge) connecting two nodes to highlight information about it. Information about the interaction is displayed in the Edge Table, located in the Table Panel (at the bottom), in the networks and data type columns.

    • Note: Clicking on an edge between 2 nodes will display information about all interaction networks that connect these 2 nodes.
    • It indicates the reference (publication) for these interactions.
    • The colors indicate the type of interaction (co-expression, shared protein domains, co-localization, physical interactions and predicted).
      gc_2.6.png
  7. Locate and expand the ‘Networks’ summary tab in Results Panel (on the right) and look at what data has been used to create the network and predictions.

    • Note that Co-expression (purple colored lines, weight over 25%) and Shared protein domains (lightgold colored lines, weight over 30%) influence the results the most, but Co-localization (blue colored lines), Physical interactions (salmon colored lines) and Predicted (orange) data are also included.
    • At the top of the Networks summary tab, use the options menu button and try Expand “All”, then “Top-Level” and “None” to get information about the sources of the different networks.
      GC2_7.png

Observing the number of connections makes it easier to understand why co-expression and shared protein domains have the highest percent weight for this network: they help connect more genes than physical interactions and predicted interactions. A higher weight means that a network contributed more to finding related genes.

  8. Highlight all connections corresponding to each network by clicking the name of each network category.
  • Click on “Shared protein domains” and see which genes are connected by shared protein domains.
    GC2_8a.png
  • You can do the same for “Co-localization”, “Co-expression” and “Physical interactions”.
    GC2_8b.png
  9. Locate the Functions summary tab and look at what functions were significantly enriched in this list of prostate genes.
  • The most strongly enriched pathway is “oxidoreductase activity, acting on CH-OH group of donors”, with 28 genes in the prostate cancer list overlapping with this pathway.
  • The FDR is equal to 6.4e-46.
    GC2_9.png

Question:
“Shared protein domains” is an important part of the network. What would the GeneMANIA results be if we didn’t include this source when we ran GeneMANIA search?

  • Go back to the ‘Network’ tab on the left side of the Cytoscape window to find the GeneMANIA search bar.
  • Click on the options menu button located at the right of the search box.
  • Uncheck ‘Shared protein domains’ and click on a point outside the box to close it.
  • Click on the search icon.
  • Explore the results.
    GC2_10a.png

Answer
If “Shared protein domains” is removed, the relationships between the nodes come primarily from the Co-expression, Co-localization, Predicted and Physical interactions networks. The genes added to the network are different from those in the first network, which was created with “Shared protein domains”.
GC2_10b.png

Question:
Locate the Functions summary tab in Results Panel and look at what functions were significantly enriched with these new settings.

Answer
With the new settings, “steroid biosynthetic process” is the new top enriched pathway.
GC2_11.png

  10. Try to modify additional parameters like Max Resultant Genes or Network Weighting and look at how the changes you made influenced the results.


EXERCISE 3: Searching GeneMANIA with a mixed gene list

To start this exercise, you need to download the Mixed_gene_list.txt file and save it on your computer.

For this exercise, you are working on a gene list created by combining 3 user-defined gene lists available from the cBioPortal (http://www.cbioportal.org). It contains genes implicated in the DNA damage response, the PI3K-AKT-mTOR signaling pathway and folate transport. This list is representative of a gene list obtained from transcriptomics data. During this exercise, we will first characterize our gene list based on functions, then add potential drugs and microRNAs targeting genes in the network, and finally save the report.

Skills:

  • GeneMANIA search using a gene list;
  • Navigating Search Results;
  • Exploring Functions;
  • Adding attributes;
  • Create a report.

Steps

  1. Before performing the next GeneMANIA search make sure the GeneMANIA parameters are set back to the default values.

  2. Open Cytoscape and locate the GeneMANIA search bar on the left side, in the Control Panel.

  3. In the search window, ensure that the model organism is set to Homo sapiens.

  4. Copy the genes from the file Mixed_gene_list.txt and paste them into the search bar. Click on the search icon and wait for the results. Explore the network.
    gc_3_2.png

  5. Locate the Functions summary tab in the Results Panel and look at the functions returned by GeneMANIA.
    GC3_4.png

  6. In the Functions summary tab, check some functions to color the genes included in these functions. To follow this tutorial, you can for example color the “DNA recombination” and “response to insulin” functions.
    GC3_4a.png
    GC3_4b.png

  7. Color genes according to their GeneMANIA defined functions:

    • Go to the Control Panel tabs located on the left side of the Cytoscape window and select the Style tab.
    • In the Node panel, expand the Fill Color tab.
    • Set Column to annotation name.
      gc3_5a.png
    • Locate “DNA recombination”.
    • Double click on the white space at the right side of the box and click on the 3 dots. A Colors box appears.
    • Choose a color of your choice and click on OK.
      GC3_5.png
    • Locate “response to insulin”. Double click on the white space at the right side of the box and click on the 3 dots. A Colors box appears.
    • Choose a color of your choice and click on OK.
      GC3_5b.png
  8. Locate our favorite gene PDPK1 on the network.

    • Click on the icon First Neighbor of Selected Nodes. It will highlight this gene and all its connections.
      GC3_6.png
    • Click on the New Network icon (From Selected Nodes, All Edges) to create a subnetwork.
      GC3_6b.png
    • The resulting subnetwork will only contain the nodes selected in the first network.
      GC3_6c.png

Tip: to locate PDPK1, copy “PDPK1” into the search box and press Enter; the node will be highlighted in yellow in the network.


GeneMANIA DEFINITIONS:

What are the different networks? Definitions of the interaction types:

  • Shared domains: Protein domain data. Two gene products are linked if they have the same protein domain. These data are collected from domain databases, such as InterPro, SMART and Pfam.

  • Co-localization: Genes expressed in the same tissue, or proteins found in the same location. Two genes are linked if they are both expressed in the same tissue or if their gene products are both identified in the same cellular location.

  • Co-expression: Gene expression data. Two genes are linked if their expression levels are similar across conditions in a gene expression study. Most of this data is collected from the Gene Expression Omnibus (GEO); we only collect data associated with a publication.

  • Predicted: Predicted functional relationships between genes, often protein interactions. A major source of predicted data is mapping known functional relationships from another organism via orthology.
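As a rough illustration of the co-expression definition above, the sketch below computes the Pearson correlation between the expression profiles of pairs of genes and links two genes when the correlation reaches a cut-off. The gene names (`GENE_A`, `GENE_B`, `GENE_C`), the expression values and the 0.8 threshold are made up for illustration; how GeneMANIA itself builds its co-expression networks is not described in this tutorial and may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    genes = sorted(profiles)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression levels across five conditions (not real data)
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.1, 9.8],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
THRESHOLD = 0.8  # assumed cut-off, chosen only for this example

edges = coexpression_edges(expression, THRESHOLD)
for edge in edges:
    print(edge)
```

With these toy profiles only `GENE_A` and `GENE_B` rise and fall together, so they are the only pair that ends up linked.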

What are evidence sources?

  • Evidence sources are the information contained in the multiple databases that GeneMANIA uses to establish an interaction between two genes.

Network:

  • Node: circle representing a gene

  • Edge: line that links two nodes and represents an interaction between two genes (multiple lines correspond to multiple sources)

  • Node size: mapped to the gene score, i.e. the degree to which GeneMANIA predicts the genes are related

  • Thickness of edge: strength/weight of the interaction

Layout: The layout is different each time it is computed, so you can re-run the layout until you are satisfied with the result.

In the Networks tab:

  • Percent weight (score) : a higher weight means that this network helped more to find related genes.
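To make the idea of percent weight more concrete, here is a minimal sketch of how several networks could be combined into one composite network as a weighted sum of their edges. The gene names, edge values and percent weights are toy numbers, and `combine_networks` is an illustrative helper, not a GeneMANIA function; how GeneMANIA actually chooses the weights is covered in its help pages rather than here.

```python
from collections import defaultdict


def combine_networks(networks, percent_weights):
    """Sum edge values across networks, scaling each network by its percent weight."""
    composite = defaultdict(float)
    for name, edges in networks.items():
        scale = percent_weights[name] / 100.0
        for (gene1, gene2), value in edges.items():
            # Edges are undirected: store each pair under a sorted key
            key = tuple(sorted((gene1, gene2)))
            composite[key] += scale * value
    return dict(composite)


# Toy networks with hypothetical genes A, B and C
networks = {
    "Co-expression": {("A", "B"): 0.5, ("A", "C"): 0.2},
    "Physical interactions": {("B", "A"): 1.0},
}
percent_weights = {"Co-expression": 60.0, "Physical interactions": 40.0}

composite = combine_networks(networks, percent_weights)
for pair, weight in sorted(composite.items()):
    print(pair, round(weight, 3))
```

In this toy case the A-B edge is supported by both networks, so it receives a larger composite weight than the A-C edge, which is supported by co-expression only.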

In the Functions tab:

  • FDR: the false discovery rate (FDR) is greater than or equal to the probability that this is a false positive.

  • Coverage : (number of genes in the network with a given function) / (all genes in the genome with the function)
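The two quantities above can be sketched in a few lines. The `coverage` function follows the ratio given in the definition; the FDR part uses the Benjamini-Hochberg procedure, a common way to turn p-values into FDR-adjusted values, on the assumption that a correction of this kind is applied (check the GeneMANIA help pages for the exact method). All numbers are invented for illustration.

```python
def coverage(in_network_with_function, in_genome_with_function):
    """(genes in the network with a function) / (genes in the genome with it)."""
    return in_network_with_function / in_genome_with_function


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 5 network genes out of 50 annotated genes
print(coverage(5, 50))

# Hypothetical p-values for four functions
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```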

In advanced options:

  • Network weighting: GeneMANIA can use a few different methods to weight networks when combining all networks to form the final composite network that results from a search. The default settings are usually appropriate, but you can choose a weighting method in the advanced options panel (more details at http://pages.genemania.org/help/).

  • Related genes: genes added by GeneMANIA in addition to the genes from the query. They help to expand the network and predict the function of the query gene(s).

  • Attributes: the different sources of evidence that can be used to build the network.

Notes :

  • The prostate cancer gene list is “AKR1C3 AR CYB5A CYP11A1 CYP11B1 CYP11B2 CYP17A1 CYP19A1 CYP21A2 HSD17B1 HSD17B10 HSD17B11 HSD17B12 HSD17B13 HSD17B14 HSD17B2 HSD17B3 HSD17B4 HSD17B6 HSD17B7 HSD17B8 HSD3B1 HSD3B2 HSD3B7 RDH5 SHBG SRD5A1 SRD5A3 STAR”.

  • The mixed gene list is “AKT1 AKT1S1 AKT2 ATM ATR BRCA1 BRCA2 CHEK1 CHEK2 FANCF FOLR1 FOLR2 FOLR3 FOXO1 FOXO3 MDC1 MLH1 MLST8 MSH2 MTOR PARP1 PDPK1 PIK3CA PIK3R1 PIK3R2 PTEN RAD51 RHEB RICTOR RPTOR SLC19A1 TSC1 TSC2”.
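The gene lists above are written on a single line, whereas the exercises ask for one gene per line in the search box. If you prefer not to reformat a list by hand, a small helper like the one below can do it; `to_query` is a hypothetical name introduced for this sketch, and it simply splits on whitespace or commas and drops exact duplicates.

```python
def to_query(text):
    """Turn a whitespace- or comma-separated gene list into one gene per line."""
    seen = set()
    genes = []
    for symbol in text.replace(",", " ").split():
        if symbol not in seen:  # keep the first occurrence only
            seen.add(symbol)
            genes.append(symbol)
    return "\n".join(genes)


# Example with the three genes used in Exercise 1
print(to_query("GRN SORT1, SLPI GRN"))
```

The printed text can be pasted directly into the GeneMANIA search box.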

Look at the GeneMANIA help pages when you run an analysis on your own after the workshop: http://pages.genemania.org/help/.

EXERCISE 4 (OPTIONAL): Discover the stringApp

stringApp imports functional associations or physical interactions between protein-protein and protein-chemical pairs from STRING, Viruses.STRING, STITCH, DISEASES and from PubMed text mining into Cytoscape. Users provide a list of one or more gene, protein, compound, disease, or PubMed queries, the species, the network type, and a confidence score and stringApp queries the database to return the matching network.

Currently, five different queries are supported:

  • STRING: protein query – enter a list of protein names (e.g. gene symbols or UniProt identifiers/accession numbers) to obtain a STRING network for the proteins
  • STRING: PubMed query – enter a PubMed query and utilize text mining to get a STRING network for the top N proteins associated with the query
  • STRING: disease query – enter a disease name to retrieve a STRING network of the top N proteins associated with the specified disease
  • STITCH: protein/compound query – enter a list of protein or compound names to obtain a network for them from STITCH
  • STRING: cross-species query – choose two species to obtain a STRING network between and within the proteins of the interacting species

Data

Let’s use the prostate cancer gene list that we used in exercise 2.

Steps:

  1. Open Cytoscape
  2. Make sure stringApp is installed. Go to menu, Apps, App Store, Show App Store. Install the app if necessary.
  3. In Cytoscape, locate the Network tab and select STRING, STRING: protein query by clicking the down arrow.

[Screenshot]

  4. Copy and paste the genes from 30_prostate_cancer_genes.txt into the blank field and click on the search button.

  5. Observe the network that has been created. The genes from our list are connected by predicted protein-protein interactions.
    [Screenshot]

  6. On the right side of the Cytoscape window, locate and expand the STRING tab.

    • Make sure that the Nodes tab is selected.
    • Play with the parameters in the top fields (Glass ball effect, STRING style labels, etc.) and observe the changes in the network.
      [Screenshot]
  7. Optimize the layout. In Cytoscape, go to the menu bar, Layout, yFiles Organic Layout.
    [Screenshot]

  8. Go back to the STRING Nodes tab on the right side:

    • Select a node and look at the gene details in the Selected nodes tab.
    • Try the Functional enrichment and observe the results in the STRING Enrichment table located below the network.
      [Screenshot]
  9. Select the Edges tab.

    • The score slide bar lets you select the interactions with the strongest prediction scores.
    • The Subscore table traces the source of the predicted interactions using several evidence scores.
      [Screenshot]

Lab (GeneMANIA web version)

Goal of this practical lab

Create GeneMANIA networks starting from a single gene to predict its function or starting from a gene list. Explore and understand the main output features of GeneMANIA such as the network composition or the enriched functions.

This practical consists of 3 exercises. You can choose to do these exercises using the questions as your only guide (section ‘QUESTIONS AND STEPS TO FOLLOW’), or see the following pages for the step-by-step checklist to find the answers (section ‘ANSWERS: DETAILED STEPS AND SCREENSHOTS’).

Before starting the exercises, download the files:

Right-click on the link below and select “Save Link As…”.

Place the file in your CBW work directory in the corresponding module directory.

Network layouts are flexible and can be rearranged. What you see when you perform these exercises may not be identical to what you see in the tutorial, or what you have seen other times that you have performed the exercises. Exact layouts and predictions can also be affected by updates to the networks database that GeneMANIA uses. However, the network weights and predicted genes are expected to be similar to those shown here.

EXERCISE 1: QUESTIONS AND STEPS TO FOLLOW

Imagine that you are interested in exploring the function of the human GRN gene: GRN was returned as the strongest hit from your omics experiment, but not much information about this gene is available in functional databases. Use GeneMANIA to identify its predicted function as well as potential interaction partners.

Skills:

  • GeneMANIA Single Gene search;
  • Navigating Search Results;
  • Exploring available Genes features;
  • Rerun a new analysis using a single gene or multiple genes queried from the network.

STEPS

  1. Go to GeneMANIA’s homepage at http://www.genemania.org/

  2. In the search window, ensure that the model organism is set to Homo sapiens.

  3. Enter the following gene: GRN

  4. Click on the search icon and wait for the results.

  5. When your search results load, examine the network. Query genes are indicated with stripes, related genes added by GeneMANIA are represented in black, and colored links represent the interactions that connect the nodes (genes).

  6. Clicking on a node displays its name and lets you add this gene to the search, remove it from the search (if the gene was not part of the initial search, ‘remove from search’ will be grayed out), or run a search with this gene only.

    • Click on the GRN node and explore the displayed information.
  7. Locate the Functions summary tab (bottom left icon).

  8. Locate the gene with the strongest association with GRN.

The larger the node in this network, the stronger its association with the query. Node size is correlated to its GeneMANIA score.

  9. Re-run the analysis after adding the genes SORT1 and SLPI to the search.
    • Which functions are associated with this new network?
  10. On the left side of the window there are icons that we haven’t explored yet. The first 3 buttons activate different network layouts. Try
    • the circular,
    • the aligned, and
    • the force-directed layouts.
  11. Choose your favorite layout and
    • save the network as an image using the Network image As Shown option from the save menu.
    • The menu can be opened by clicking on the 3 dots icon on the left hand side of the window (not the three dot icon in the search bar).

EXERCISE 1 ANSWERS: DETAILED EXPLANATION AND SCREENSHOTS

EXERCISE 1 - STEPS 1-4

[Screenshot]

EXERCISE 1 - STEP 5

[Screenshot]

EXERCISE 1 - STEP 6

[Screenshot]

Exercise 1 - STEP 7

[Screenshot]

Question: What are the functions significantly associated with this network?
Answer: The functions associated with the network are listed in the screenshot above. “vacuolar lumen” and “primary lysosome” are the top 2 functions.

Question: GRN is the central node of this network: which function would you predict for GRN?
Answer: A function related to the lysosome and vacuole.

Question: How well did GeneMANIA perform? (Hint: use GeneCards (http://www.genecards.org/) or PubMed (http://www.ncbi.nlm.nih.gov/pubmed/).)
Answer: The top functions predicted by GeneMANIA for GRN are related to the lysosome and vacuole. A PubMed search confirms these results: “We experimentally verified that granulin precursor (GRN) gene, whose mutations cause frontotemporal lobar degeneration, is involved in lysosome function.” (Transcriptional gene network inference from a massive dataset elucidates transcriptome organization and gene function. Belcastro et al. Nucleic Acids Res. 2011 Nov 1;39(20):8677-88. PMID:21785136)

Exercise 1 - STEP 8

Question: Locate the genes with the strongest association with GRN (thick edge).
Answer: SORT1 and SLPI.

Exercise 1 - STEP 9

[Screenshot]

[Screenshot]

Exercise 1 - STEP 10 (layouts)

Circular layout

[Screenshot]

Aligned layout

[Screenshot]

Force directed layout

[Screenshot]

Exercise 1 - STEP 11 (save an image)

[Screenshot]

Notes about biological interpretation of the results:

The following paper describes the interaction between GRN and SORT1 and demonstrates how finding related genes can be relevant for developing therapies:

Targeted manipulation of the sortilin–progranulin axis rescues progranulin haploinsufficiency. Lee et al. Hum Mol Genet. 2014 March 15; 23(6): 1467–1478. PMCID:PMC3929086
“Progranulin (GRN) mutations causing haploinsufficiency are a major cause of frontotemporal lobar degeneration (FTLD-TDP). Recent discoveries demonstrating sortilin (SORT1) is a neuronal receptor for PGRN endocytosis and a determinant of plasma PGRN levels portend the development of enhancers targeting the SORT1–PGRN axis. We demonstrate the preclinical efficacy of several approaches through which impairing PGRN’s interaction with SORT1 restores extracellular PGRN levels.”

[Screenshot]


EXERCISE 2: QUESTIONS AND STEPS TO FOLLOW

To start this exercise, you need to download the 30_prostate_cancer_genes.txt file and save it on your computer.

For this exercise, you are working with a list of 30 prostate cancer genes. This list can be downloaded after the workshop from the cBioPortal website (http://www.cbioportal.org/). The cBioPortal for Cancer Genomics stores genomic data from large scale, integrated cancer genomic data sets. During this exercise, you will explore the types of networks that have been used to create the GeneMANIA network from the prostate cancer gene list and you will see how changing input parameters can affect the results.

Skills:

  • GeneMANIA search using a gene list;
  • Navigating Search Results;
  • Exploring Networks and advanced options;
  • Uploading a custom network.

STEPS

  1. Go to GeneMANIA’s homepage at http://www.genemania.org/

  2. In the search window, ensure that the model organism is set to Homo sapiens.

  3. Copy the genes from the file 30_prostate_cancer_genes.txt and paste them into the search box.

    • Make sure that the parameter ‘Max resultant genes’ is set to 20 by clicking on the menu button (the icon with three dots) at the right side of the search box and selecting ‘Customise advanced options’.
    • Set ‘Max resultant attributes’ to 10.
  4. Click on the search icon and wait for the results.

  5. When your search results load, examine the network.

    • Genes you searched with are indicated with stripes,
    • related genes added by GeneMANIA are represented in black,
    • and colored links represent the interactions that connect the nodes (genes).
    • Move nodes around by selecting them with a mouse to investigate how they are connected.
  6. Click any link (edge) connecting two nodes to highlight information about it.

Clicking on an edge between 2 nodes will display information about all interaction networks that connect these 2 nodes. It indicates the reference (publication) for these interactions. The color indicates the type of interaction (co-expression, shared protein domains, co-localization, physical interactions and predicted).

  7. Locate and expand the ‘Networks’ summary tab (on the right) and look at what data has been used to create the network and predictions.

Shared protein domains (lightgold colored lines, weight over 30%) and Co-expression (purple colored lines, weight over 20%) influence the results the most, but Co-localization (blue colored lines), Physical interactions (salmon colored lines) and Predicted (orange) data also contribute.

At the top of the Networks summary tab,

  • click on the down arrow.
  • try Expand “none”, then “top” and “all” to get information about the sources of the different networks.
  8. Highlight all connections corresponding to each network by clicking the name of each network category.
    • Click on “Shared protein domains” and see which genes are connected by shared protein domains.
    • You can do the same for “Co-localization” , “Co-expression” and “Physical interactions”.

Seeing or highlighting the number of connections for each data source makes it easier to understand why co-expression and shared protein domains have the highest percent weight for this network:

  • they connect more genes than physical interactions and predicted interactions;
  • a higher weight means that a network contributes more to finding related genes.

  9. Locate the Functions summary tab (bottom left icon) and look at what functions were significantly enriched in this list of prostate genes.

  10. “Shared protein domains” is an important part of this network. What would happen to the GeneMANIA results if we didn’t include this source when we ran this GeneMANIA search?

    • Click on ‘Show advanced options’, which is located at the right of the search box.
    • Uncheck ‘Shared protein domains’ and
    • click on the search icon.
    • Explore the results.
  11. Locate the Functions summary tab (bottom left icon) and look at what functions were significantly enriched with these new settings.

  12. Upload a custom network to GeneMANIA:

    • Go to the menu option at the right of the search box (the icon with three dots).
    • At the bottom of the network list, locate Uploaded and expand this option using the down arrow.
    • Click on “Upload a network” and browse your computer to locate and select the file CYP11B_pearson_correlation_prostate.txt.
    • Wait about a minute for the network to be uploaded.
    • Click on the search icon to launch the query.
    • Explore the results and locate the genes linked by the custom network.

To highlight them, click on “Uploaded” in the Networks tab on the right-hand side.

  13. Try additional parameters in the ‘Customise advanced options’ tab and look at how the changes you made influence the results. For example, change the ‘Network weighting’ method or ‘Max resultant genes’.
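A custom network such as the one uploaded in this exercise is essentially a list of gene pairs with a weight. The sketch below writes correlation scores to tab-delimited text with three columns (gene, gene, weight), which is the layout we assume GeneMANIA accepts for uploads; check the GeneMANIA help pages for the exact requirements. The partner genes (`GENE_X`, `GENE_Y`, `GENE_Z`) and their scores are placeholders, not the contents of CYP11B_pearson_correlation_prostate.txt, and `write_custom_network` is a hypothetical helper.

```python
import csv
import io


def write_custom_network(pairs, handle, min_score=0.0):
    """Write (gene1, gene2, score) rows as tab-delimited text.

    Only pairs with a score above min_score are kept, so that the file
    contains positively correlated genes only. Returns the number of rows.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    kept = 0
    for gene1, gene2, score in pairs:
        if score > min_score:
            writer.writerow([gene1, gene2, f"{score:.3f}"])
            kept += 1
    return kept


# Placeholder correlations with the query gene (hypothetical values)
pairs = [
    ("CYP11B1", "GENE_X", 0.82),
    ("CYP11B1", "GENE_Y", 0.35),
    ("CYP11B1", "GENE_Z", -0.10),  # negative correlation, dropped
]

buffer = io.StringIO()
rows_written = write_custom_network(pairs, buffer)
print(buffer.getvalue())
```

To produce a file for upload, pass an open file handle (for example `open("my_network.txt", "w", newline="")`, where the file name is your choice) instead of the in-memory buffer.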

EXERCISE 2 ANSWERS: DETAILED STEPS AND SCREENSHOTS

Exercise 2 - STEPS 1 to 4

[Screenshot]

Check that the parameter ‘Max resultant genes’ is set to ‘20’ and ‘Max resultant attributes’ to ‘10’.

[Screenshot]

Exercise 2 - STEP 5

[Screenshot]

Exercise 2 - STEP 6

[Screenshot]

Exercise 2 - STEP 7

[Screenshot]

[Screenshot]

Exercise 2 - STEP 8

[Screenshot]

[Screenshot]

Exercise 2 - STEP 9

The most strongly enriched pathway is “oxidoreductase activity”, with 28 genes in the list overlapping with this pathway. The FDR is equal to 6.39e-46.

[Screenshot]

Exercise 2 - STEP 10

Question: “Shared protein domains” is an important part of the network. What would the GeneMANIA results be if we didn’t include this source when we ran the GeneMANIA search?
Answer: If “Shared protein domains” is removed, the relationships between the nodes come from the Co-expression, Co-localization, Predicted and Physical interactions networks. The genes added to the network are different from those in the first network, which was created with “Shared protein domains”.

[Screenshot]

[Screenshot]

Exercise 2 - STEP 11

Question: What functions were significantly enriched with these new settings?
Answer: With the new settings, “steroid biosynthetic process” is the new top enriched pathway.

[Screenshot]

Exercise 2 - STEP 12

[Screenshot]

[Screenshot]

[Screenshot]

Exercise 2 - STEP 13

[Screenshot]


EXERCISE 3: QUESTIONS AND STEPS TO FOLLOW

To start this exercise, you need to download the Mixed_gene_list.txt file and save it on your computer.

For this exercise, you are working on a gene list created by combining 3 user-defined gene lists available from the cBioPortal (http://www.cbioportal.org). It contains genes implicated in the DNA damage response, the PI3K-AKT-mTOR signaling pathway and folate transport. This list is representative of a gene list obtained from transcriptomics data. During this exercise, we will first characterize our gene list based on functions, then add potential drugs and microRNAs targeting genes in the network, and finally save the report.

Skills:

  • GeneMANIA search using a gene list;
  • Navigating Search Results;
  • Exploring Functions;
  • Adding attributes;
  • Create a report.

STEPS

  1. Go to GeneMANIA’s homepage at http://www.genemania.org/.

  2. In the search window,

    • ensure that the model organism is set to Homo sapiens.
    • ensure that your uploaded network from the previous exercise is not selected. To delete it, click on the red ‘x’ next to it.
  3. Copy the genes from the file Mixed_gene_list.txt and paste them into the search box. Click on the search icon and wait for the results.

  4. Locate the Functions summary tab (bottom left circle) and look at the functions returned by GeneMANIA.

  5. In the Functions summary tab, check some functions to color the genes included in these functions. To follow this tutorial, you can for example color “response to insulin” and “DNA recombination”.

  6. Next, we will add miRs and drug interaction networks.

    • Click on ‘Show advanced options’, which is located at the right of the search box.
    • In the ‘Networks’ tab, expand ‘Attributes’ and check “Drug-interactions-2020” and “miRNA-target-predictions-2020”.
    • Check “Physical interactions” and “Co-expression”.
    • Click on “Customise advanced options”. Set “Max resultant genes” to 20 and “Max resultant attributes” to 40.
    • Click on the search icon and wait for the results. Explore the network.

Drug-interactions and miRNA-target-predictions nodes are displayed in gray. The nodes connected to a drug are genes that are targeted by that drug and nodes connected to a microRNA (miR) are genes predicted to be targeted by that miR.

  7. Locate our favorite gene, PDPK1, in the network.

    • Select it by hovering the mouse cursor over its node for a second (you can also click and hold on the node).
    • This will highlight this gene and all its connections.
  8. Generate and save a report of your results by locating the save menu and selecting “Report”. The PDF report provides a detailed description of your search and results.

  9. Investigate the “history” function by clicking on the related icon located at the bottom of the window. A panel pops up showing the past networks generated by GeneMANIA. Clicking on one panel will relaunch the search for this network.

Exercise 3: MORE DETAILS AND SCREENSHOTS

Exercise 3 - STEPS 1 - 3

[Screenshot]

[Screenshot]

Exercise 3 - STEPS 4 - 5

[Screenshot]

Exercise 3 - STEP 6

[Screenshot]

[Screenshot]

Exercise 3 - STEP 7

[Screenshot]

Exercise 3 - STEP 8

[Screenshot]

[Screenshot]

Exercise 3 - STEP 9

[Screenshot]

SOME DEFINITIONS:

What are the networks? Definitions of the interaction types:

  • Shared domains: Protein domain data. Two gene products are linked if they have the same protein domain. These data are collected from domain databases, such as InterPro, SMART and Pfam.

  • Co-localization: Genes expressed in the same tissue, or proteins found in the same location. Two genes are linked if they are both expressed in the same tissue or if their gene products are both identified in the same cellular location.

  • Co-expression: Gene expression data. Two genes are linked if their expression levels are similar across conditions in a gene expression study. Most of these data are collected from the Gene Expression Omnibus (GEO); we only collect data associated with a publication.

  • Predicted: Predicted functional relationships between genes, often protein interactions. A major source of predicted data is mapping known functional relationships from another organism via orthology.
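To make the co-expression definition above concrete, here is a minimal Python sketch that links two genes when their expression profiles are strongly correlated across conditions. The gene names, expression values and the 0.8 cut-off are all hypothetical, and using a plain Pearson correlation with a fixed threshold is an assumption made for illustration; it is not a description of how GeneMANIA actually builds its co-expression networks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold):
    """Return (gene1, gene2, r) for every pair with correlation >= threshold."""
    genes = sorted(profiles)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across five conditions (not real data).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}

THRESHOLD = 0.8  # illustrative cut-off (assumption)

edges = coexpression_edges(expression, THRESHOLD)
for g1, g2, r in edges:
    print(f"{g1} - {g2}: r = {r:.3f}")
```

In this toy data set only GENE_A and GENE_B rise together across the conditions, so they are the only pair that gets an edge.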

What is meant by evidence sources?

  • Evidence sources are the information contained in the multiple databases that GeneMANIA uses to establish an interaction between two genes.

Network:

  • Node: circle representing a gene

  • Edge: line that links two nodes and represents an interaction between two genes (multiple lines correspond to multiple sources)

  • Node size: mapped to the gene score, i.e. the degree to which GeneMANIA predicts the genes are related

  • Thickness of edge: strength/weight of the interaction

Layout: The layout differs on each run, so you can rerun it until you are satisfied with the result.

In the Networks tab:

  • Percent weight (score): a higher weight means that this network helped more to find related genes.

In the Functions tab:

  • FDR: the false discovery rate (FDR) is greater than or equal to the probability that this is a false positive.

  • Coverage: (number of genes in the network with a given function) / (all genes in the genome with the function)
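The two Functions tab quantities above can be sketched in a few lines of Python. Coverage follows the ratio given in the definition. For the FDR, the sketch assumes a hypergeometric enrichment test followed by a Benjamini-Hochberg correction, which is a common approach to function enrichment; check the GeneMANIA help pages for the exact procedure the tool uses. All counts below are hypothetical.

```python
from math import comb


def coverage(network_genes_with_function, genome_genes_with_function):
    """Fraction of all genes annotated with a function that are in the network."""
    return network_genes_with_function / genome_genes_with_function


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes among n network
    genes, when K of the N background genes carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: a 49-gene network, a function annotated to 100 of
# 20000 background genes, 5 of which are in the network.
print("coverage:", coverage(5, 100))
print("p-value:", hypergeom_pvalue(5, 49, 100, 20000))
print("adjusted:", benjamini_hochberg([0.01, 0.04, 0.03]))
```

The adjusted values are what a column labelled FDR would typically report: each one bounds the expected proportion of false positives among the functions called significant at that level.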

In advanced options:

  • Network weighting: GeneMANIA can use a few different methods to weight networks when combining them into the final composite network that results from a search. The default settings are usually appropriate, but you can choose a weighting method in the advanced options panel (more details at http://pages.genemania.org/help/).

  • Related genes: genes added by GeneMANIA in addition to the genes from the query. They help to grow the network and thereby to predict the function of the query gene(s).

  • The attributes represent the different sources of evidence that can be used to build the network.
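As a rough illustration of what "weighting networks to form a composite network" means, the sketch below combines two toy networks as a weighted sum of their edge strengths and converts the network weights to percentages, as displayed in the Networks tab. The gene names, edge strengths and weights are hypothetical, and the way GeneMANIA chooses the weights (the actual weighting methods) is not modelled here.

```python
def combine_networks(networks, weights):
    """Composite edge weight = sum over networks of (network weight * edge strength)."""
    composite = {}
    for name, edges in networks.items():
        w = weights[name]
        for (a, b), strength in edges.items():
            key = tuple(sorted((a, b)))  # treat edges as undirected
            composite[key] = composite.get(key, 0.0) + w * strength
    return composite


def percent_weights(weights):
    """Express each network weight as a percentage of the total."""
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


# Hypothetical toy networks and weights (not GeneMANIA data).
networks = {
    "Co-expression": {("GENE_A", "GENE_B"): 0.5, ("GENE_B", "GENE_C"): 0.2},
    "Physical interactions": {("GENE_B", "GENE_A"): 1.0},
}
weights = {"Co-expression": 0.6, "Physical interactions": 0.4}

composite = combine_networks(networks, weights)
for (a, b), value in sorted(composite.items()):
    print(f"{a} - {b}: {value:.2f}")
print(percent_weights(weights))
```

An edge supported by several heavily weighted networks ends up with a larger composite weight, which is why it is drawn thicker in the network view.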

Notes :

  • The prostate cancer gene list is “AKR1C3 AR CYB5A CYP11A1 CYP11B1 CYP11B2 CYP17A1 CYP19A1 CYP21A2 HSD17B1 HSD17B10 HSD17B11 HSD17B12 HSD17B13 HSD17B14 HSD17B2 HSD17B3 HSD17B4 HSD17B6 HSD17B7 HSD17B8 HSD3B1 HSD3B2 HSD3B7 RDH5 SHBG SRD5A1 SRD5A3 STAR”.

  • The mixed gene list is “AKT1 AKT1S1 AKT2 ATM ATR BRCA1 BRCA2 CHEK1 CHEK2 FANCF FOLR1 FOLR2 FOLR3 FOXO1 FOXO3 MDC1 MLH1 MLST8 MSH2 MTOR PARP1 PDPK1 PIK3CA PIK3R1 PIK3R2 PTEN RAD51 RHEB RICTOR RPTOR SLC19A1 TSC1 TSC2”.
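If you keep gene lists such as these in your own scripts, a small helper can tidy them before pasting into the search box. The sketch below splits the mixed gene list on whitespace or commas, removes duplicates while keeping the original order, and prints one symbol per line. The function name `parse_gene_list` is hypothetical and the one-symbol-per-line output is just a convenient convention assumed here.

```python
MIXED_GENE_LIST = (
    "AKT1 AKT1S1 AKT2 ATM ATR BRCA1 BRCA2 CHEK1 CHEK2 FANCF FOLR1 FOLR2 "
    "FOLR3 FOXO1 FOXO3 MDC1 MLH1 MLST8 MSH2 MTOR PARP1 PDPK1 PIK3CA PIK3R1 "
    "PIK3R2 PTEN RAD51 RHEB RICTOR RPTOR SLC19A1 TSC1 TSC2"
)


def parse_gene_list(text):
    """Split on whitespace or commas and drop duplicates, keeping input order."""
    seen = set()
    genes = []
    for symbol in text.replace(",", " ").split():
        if symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


genes = parse_gene_list(MIXED_GENE_LIST)
print(f"{len(genes)} unique gene symbols")
print("\n".join(genes))
```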

Refer to the GeneMANIA help pages when you run an analysis on your own after the workshop: http://pages.genemania.org/help/.