Informatics and Statistics for Metabolomics 2020 Module 2 Lab

Module 2 Lab: Metabolite Identification and Annotation

In this lab, you will perform metabolite ID and/or quantification using NMR data and Bayesil; GC-MS data and GC-AutoFit; and spectral processing of untargeted LC-MS data and MetaboAnalyst/R.

NMR

Example Data

The sample data is from a study comparing healthy controls to endometrial cancer cases in females. The sample type is human serum. There are 40 samples total, 20 controls and 20 cases. The study population are female adults over the age of 18 with a mean age of 59.2 +/- 12.7 years for the controls and 59.1 +/- 12.8 years for the cases.

General Instructions

We will be using a special version of Bayesil for analysis of the NMR data. Please use server and login information assigned to you. A small NMR data set is available here. The .zip folder contains 2 spectra. Upload the dateset into Bayesil and run the server. Save the results as an Excel file. Examine the results by eye to find interesting differences. Use HMDB to learn about the interesting metabolites. In additional we included two larger dataset that could be optionally run after the lab. These datasets include a 8-spectra dataset and 40-spectra dataset, which was used to obtain the uploaded NMR result.

Bayesil

Bayesil is a web-based tool for automated NMR spectral profiling that is very accurate (95%) and which uses probabilistic graphical models (PGM). It fits shift and peak intensity similar to the way humans perform fitting and pattern finding. Bayesil requires priop knowledge of probable biofluid composition. It has fully automated phasing, referencing, water removal, baseline correction, identification, and quantification.

img

Bayesil Example Settings

   
Biofluid Serum
Chemical Shift (CS) Reference DSS
CS Reference Concentration 833 um
NMR Frequency 500 MHz
Compressed FIDS CBW_NMR_Set.zip
Speed Standard

Bayesil Batch Results

“In order to collect profiling results, you need to go to each individual spectrum, click on the download drop down menu (please see the snapshot below) and select the option “Quantitatives CSV”. You would need to combine them in a spreadsheet program (e.g. MS Excel) for further statistical analysis.”

Bayesil Spectrum Results

img

GC-MS

Example Data

The sample data is from a study comparing 20 individuals with Eosinophilic Esophagitis (EoE) with 20 healthy controls, ages 3-13. The sample type is urine with an internal standard control of cholesterol. The example file contains one Alkane standard file, one blank file, and files for one sample from each group.

General Instructions

We will be using the special server of GC-AutoFit created for this workshop. Please use the server assigned to you for analysis of our data. The example data can be found here. Upload the files to GC-AutoFit and run the server. Save the results as an Excel file. Examine the results by eye to find interesting differences. Use HMDB to learn about the interesting metabolites.

GC-AutoFit (Automated GC-MS)

GC-AutoFit website requires 3 spectra (sample, blank, and alkane standard). It performs auto-alignment, peak ID, peak integration, and concentration calculation. It accepts NetCDF and mzXML files. It takes 60 seconds per spectrum with 40-115 compounds ID’d at 96% accuracy. It is optimized for blood, urine, saliva, and cerbral spinal fluid. It still requires careful sample preparation and derivatization.

img

Preparing GC-MS Spectral Files

3 types of input files:

  • alkane standard file (required; eg Alkane.mzXML, ALKstd.mzXML)
  • a blank sample file (optional but recommended; eg Blank.mzXML, Blk.mzXML)
  • Sample files (required)

Your files might require format conversion. Files are expected to be CDF or mzXML. “.D” format can be converted to “.CDF” or “.mzXML” format with conversion software such as ChemStation or ProteoWizard.

Uploading GC-MS Spectral Files

A. Individual files

img

B. .zip files

img

Select Sample Type and Library

You can either select an internal library (serum, urine/organic acids, saliva, etc) or a user library.

Set an internal standard for quantification:

img

or

img

Check Alkane Standard and Each Sample Spectrum

img

Profiling and Quantification

img

Final Results

Results are reported as a table (csv format) for each sample and merged concentrations for all samples.

img

LC-MS with MetaboAnalyst(R)

img

Example Data

This dataset was acquired using an UPLC-Q/E-ESI- in negative ionization mode. The 10 samples are a trimmed subset of a larger study from Lloyd-Price et al., and include fecal samples from people with Crohn’s Disease (CD, 4), healthy controls (4), and two quality controls. The metadata contains more sample information.

Example mzXML datasets (mzXML.zip files)

ibd_data_cbw2020_updated.zip

Spectra Processing with MetaboAnalyst (example result files)

Peak Table
Input for MS Peaks to Path

File conversion for vendor data

Use this tutorial to understand how to use msConvert to convert your data to open formats.

NOTE: Do this section only if you are comfortable using R

You will need to have R, MetaboAnalystR package dependencies, and MetaboAnalystR itself installed. You can optionally have ProteoWizard installed if your files are not in mzXML, mzML format (Windows only). All example data files are provided as mzXML converted files.

MetaboAnalystR

If you wish to use R instead of the MetaboAnalyst web-service for processing LC-MS spectra, please follow the MetaboAnalystR tutorial here.

Troubleshooting with Mac OS and MetaboAnalystR

In order to compile R for Mac OS, you need Xcode and GNU Fortran compiler installed (https://mac.r-project.org/tools/). MetaboAnalystR 3.0 also uses the Rccp R package as a dependency. Mac users may run into issues with getting this package installed. We suggest you follow these steps: https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/ to help with your installation.