MetaboLabPy

Basic 1D-NMR Data Processing

Introduction

To demonstrate basic 1D NMR data processing in MetaboLabPy, we use two metabolite standard NMR datasets available from the BMRB NMR database (https://bmrb.io/metabolomics/). The metabolites used in this demonstration are L-Lactate and L-Glutamate.

We load the NMR data into MetaboLabPy by clicking on “Open NMR Spectrum” in the File menu or by using the keyboard shortcut Ctrl + O on Windows or ⌘ + O on macOS. We first load the data for L-Lactate and then repeat the same procedure for L-Glutamate. To plot each spectrum in a different colour, we open the Display Parameters tab and select a colour for each spectrum.

Every basic NMR data processing protocol includes the following steps:


  • Define the apodisation function with associated parameters. For 1D-NMR metabolomics data processing this is usually an exponential line-broadening of 0.3 Hz.
  • Select the post-acquisition water suppression if necessary.
  • Select the amount of zero-filling. In the example figure below we zero-fill to 131072 data points; the experimental data have a length of 32768 data points.
  • Fourier Transform the NMR spectrum.
  • Phase correct the NMR spectrum. These NMR spectra were already phase corrected and have the TopSpin phase values stored inside the raw NMR dataset. MetaboLabPy reads in and uses these phase values, which is why the spectra are already well phased. To practise interactive phase correction, we set the values for ph0 and ph1 to zero before the Fourier transform, as indicated in the figure below.
  • Finally, check whether the spectra are referenced. The samples used to acquire these spectra contained TMSP, which resonates at 0 ppm. If the singlet resonance close to 0 ppm is not sitting at precisely 0 ppm, we enter autoref() into the MetaboLabPy command line to reference the spectrum automatically.
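The processing steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic FID, not MetaboLabPy's own code: the function name process_fid, its parameters, the spectral width and the resonance frequency are all assumptions made for the example, and the sign and direction of the phase ramp may differ from the TopSpin convention.

```python
import numpy as np

def process_fid(fid, sw_hz, lb_hz=0.3, n_zero_fill=131072, ph0_deg=0.0, ph1_deg=0.0):
    """Apodise, zero-fill, Fourier transform and phase a complex FID (illustrative only)."""
    n = fid.size
    t = np.arange(n) / sw_hz                        # time of each acquired point in seconds
    apodised = fid * np.exp(-np.pi * lb_hz * t)     # exponential line broadening of lb_hz
    padded = np.zeros(n_zero_fill, dtype=complex)
    padded[:n] = apodised                           # zero-fill to n_zero_fill points
    spectrum = np.fft.fftshift(np.fft.fft(padded))  # Fourier transform, zero frequency centred
    ramp = np.arange(n_zero_fill) / n_zero_fill     # 0 at one edge, close to 1 at the other
    phase = np.deg2rad(ph0_deg + ph1_deg * ramp)    # zero-order plus first-order phase
    return spectrum * np.exp(1j * phase)

# Synthetic FID: one resonance at +100 Hz, 32768 points, 8 kHz spectral width (made-up values)
sw_hz, n_points = 8000.0, 32768
t = np.arange(n_points) / sw_hz
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.5)

spectrum = process_fid(fid, sw_hz)
freq_hz = np.fft.fftshift(np.fft.fftfreq(spectrum.size, d=1.0 / sw_hz))
peak_hz = freq_hz[np.argmax(spectrum.real)]         # position of the absorption maximum
```

With ph0 and ph1 left at zero, the real part of this synthetic spectrum is already in pure absorption mode, so the maximum sits at the frequency of the simulated resonance.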

Phase Correction

To enter interactive phase correction mode, we click on Phase/Baseline Correction and then on Interactive Phase Correction in the Data menu. Alternatively, we can use the keyboard shortcut Alt + p ( + p). Before entering interactive phase correction mode, however, we select the first NMR spectrum as the phaseReference experiment in the Display Parameters tab of that spectrum.




When interactive phase correction mode is active, the status line at the bottom of the main window changes and indicates the mouse actions used to phase correct the current spectrum. In addition to the spectrum, a red line appears, which marks the position of the pivot, i.e. the position in the NMR spectrum at which first-order phase correction has no effect. Please also note that in order to zoom in and out of the spectrum, we need to click on zoom to switch into zoom mode. Once we have adjusted the view, we need to exit zoom mode to return to phase correction mode.
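The role of the pivot can be illustrated with a short NumPy sketch. The function apply_phase and its arguments are hypothetical and only show the principle: the first-order term is measured relative to the pivot, so it vanishes there. MetaboLabPy and TopSpin may use a different sign or scaling convention.

```python
import numpy as np

def apply_phase(spectrum, ph0_deg, ph1_deg, pivot_index):
    """Apply zero- and first-order phase correction around a pivot point (illustrative only)."""
    n = spectrum.size
    offset = (np.arange(n) - pivot_index) / n      # zero at the pivot, grows linearly away from it
    phase = np.deg2rad(ph0_deg + ph1_deg * offset)
    return spectrum * np.exp(1j * phase)

flat = np.ones(512, dtype=complex)                 # dummy spectrum with constant intensity
pivot = 256
phased = apply_phase(flat, ph0_deg=0.0, ph1_deg=90.0, pivot_index=pivot)

unchanged_at_pivot = np.isclose(phased[pivot], flat[pivot])   # first-order term is zero here
changed_at_edge = not np.isclose(phased[0], flat[0])          # but not at the spectrum edge
```

This is why it is convenient to place the pivot on a large peak, phase that peak with the zero-order term, and then use the first-order term for the rest of the spectrum.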


Once the first NMR spectrum is sufficiently phase corrected, we can change to the next NMR spectrum by clicking the up-arrow in the Exp counter, by entering the number 2 into the text field followed by pressing enter, or by using the keyboard shortcut Alt + cursor key up ( + Arrow Up). The next spectrum will be displayed together with the previously selected phase reference spectrum, although phase correction will only be applied to the current NMR experiment. This allows for consistent phase correction across a series of NMR spectra.

Once the last spectrum has been phase corrected, interactive phase correction mode can be left by clicking on Exit or through the keyboard shortcut Alt + p ( + p).

A video showing another interactive demonstration of basic 1D NMR data processing in MetaboLabPy can be found below:

After the NMR spectra have been phase corrected, automated baseline correction can be used to correct the spectral baseline. MetaboLabPy uses the pybaselines package for this; its documentation can be found here:

https://pybaselines.readthedocs.io

Data Pre-Processing

In order to prepare a set of 1D NMR spectra for statistical data analysis, several steps have to be performed. MetaboLabPy provides GUI elements to facilitate this process, which is also known as data pre-processing. The data pre-processing GUI is made available by clicking on the “Data Pre-Processing” checkbox. Data to follow this tutorial can be downloaded here.

In the following paragraphs we go through the steps of NMR data pre-processing in the order in which they are performed.


Normally, 1D NMR spectra for metabolomics studies are acquired with a few ppm of empty spectrum at each edge. This is done deliberately to achieve a flat baseline in all peak-containing regions of the spectrum. Because those edge regions contain only noise and no signals, they can negatively impact statistical data analysis and are therefore excluded. Because most metabolomics samples are aqueous, the spectrum is usually centred on the water signal. This signal is suppressed, but a residual water signal remains, so this spectral region is excluded from the analysis as well. Finally, we need to exclude the TMSP signal, because we are only interested in differences between endogenous metabolites. Because the TMSP signal is usually the rightmost signal in these NMR spectra, the exclusion area on the right-hand side of the spectrum is usually extended to include the TMSP signal.
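As a rough sketch of what excluding regions means for the data, the snippet below zeroes three ppm ranges of a dummy spectrum. The function exclude_regions and the region limits are illustrative assumptions, not MetaboLabPy defaults, and removing the points instead of zeroing them would be an equally valid choice.

```python
import numpy as np

def exclude_regions(ppm, spectrum, regions):
    """Set intensities inside each (low_ppm, high_ppm) region to zero (illustrative only)."""
    cleaned = spectrum.copy()
    for low, high in regions:
        cleaned[(ppm >= low) & (ppm <= high)] = 0.0
    return cleaned

ppm = np.linspace(12.0, -2.0, 1401)        # chemical shift axis, left to right, 0.01 ppm steps
spectrum = np.ones_like(ppm)               # dummy spectrum with constant intensity

# Hypothetical limits: right edge including TMSP, residual water, left edge
regions = [(-2.0, 0.5), (4.5, 5.0), (10.0, 12.0)]
cleaned = exclude_regions(ppm, spectrum, regions)
```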

Because different samples may have small differences in their pH values, we may find some areas in the NMR spectra where NMR peaks move from sample to sample. To facilitate statistical analysis, we can align those areas independently of the rest of the NMR spectra through segmental alignment.
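One simple way to picture segmental alignment is to shift only the chosen segment by the integer number of points that maximises its overlap with a reference spectrum. The sketch below does this with a brute-force search; it is an assumption-laden illustration (the function align_segment, the search range and the synthetic peaks are made up) and not necessarily the algorithm MetaboLabPy uses.

```python
import numpy as np

def align_segment(reference, spectrum, start, stop, max_shift=20):
    """Shift spectrum[start:stop] by the integer lag that best matches the reference."""
    ref_seg = reference[start:stop]
    seg = spectrum[start:stop]
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        score = np.dot(ref_seg, np.roll(seg, shift))   # overlap of reference and shifted segment
        if score > best_score:
            best_shift, best_score = shift, score
    aligned = spectrum.copy()
    aligned[start:stop] = np.roll(seg, best_shift)     # points outside the segment stay untouched
    return aligned, best_shift

x = np.arange(400)
peak = lambda centre: np.exp(-0.5 * ((x - centre) / 3.0) ** 2)
reference = peak(50) + peak(200)       # reference spectrum with two synthetic peaks
spectrum = peak(50) + peak(207)        # second peak has moved by 7 points, e.g. due to pH

aligned, shift = align_segment(reference, spectrum, start=150, stop=250)
```

Note that np.roll wraps points around the segment edges, so in this sketch the segment limits should lie in signal-free parts of the spectrum.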

Furthermore, any area in the NMR spectra below a certain noise threshold can be removed from the data matrix to minimise noise. NMR spectra are then bucketed so that each individual data point represents a width of 0.005 ppm. After that, especially for urine and blood samples, it is important to scale the different NMR spectra to minimise effects from different sample dilutions. This is usually achieved using probabilistic quotient normalisation. 
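Bucketing and probabilistic quotient normalisation can be sketched as follows. The code is a minimal illustration on synthetic data: the function names bucket and pqn are hypothetical, buckets are formed by summing points (averaging would be another option), and the median spectrum is used as the PQN reference, which is a common choice but an assumption here.

```python
import numpy as np

def bucket(spectra, points_per_bucket):
    """Sum adjacent points of each spectrum (one spectrum per row) into fixed-width buckets."""
    n_spectra, n_points = spectra.shape
    n_buckets = n_points // points_per_bucket
    trimmed = spectra[:, :n_buckets * points_per_bucket]   # drop an incomplete last bucket
    return trimmed.reshape(n_spectra, n_buckets, points_per_bucket).sum(axis=2)

def pqn(spectra):
    """Probabilistic quotient normalisation with the median spectrum as reference."""
    normalised = spectra / spectra.sum(axis=1, keepdims=True)   # total-area normalisation first
    reference = np.median(normalised, axis=0)                   # median spectrum across samples
    valid = reference > 0                                       # avoid dividing by empty buckets
    quotients = normalised[:, valid] / reference[valid]
    factors = np.median(quotients, axis=1, keepdims=True)       # most probable dilution factor
    return normalised / factors

rng = np.random.default_rng(0)
base = rng.uniform(1.0, 2.0, size=200)               # one synthetic spectrum with 200 points
samples = np.vstack([base, 2.0 * base, 0.5 * base])  # same sample at three dilutions
samples[2, 40:44] *= 10.0                            # one signal genuinely differs in sample 3

# points_per_bucket would be 0.005 ppm divided by the ppm spacing of the data; 4 is made up
bucketed = bucket(samples, points_per_bucket=4)
normalised = pqn(bucketed)
```

In this example the three rows agree after normalisation everywhere except in the bucket that contains the genuinely changed signal, which is the behaviour that makes PQN more robust than total-area normalisation alone.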

The final step of data pre-processing, before export to a useful data format, is variance stabilisation, which is usually achieved using Pareto scaling of the data. The exact data format for export depends on which platform will be used for statistical data analysis. The Phenome Centre Birmingham (PCB) uses an Excel spreadsheet format, while MetaboAnalyst, for example, uses a csv-based format. Please have a look at the video to see a practical demonstration of all these steps.
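Pareto scaling itself is a short operation: each variable (bucket) is mean-centred and divided by the square root of its standard deviation. The sketch below assumes a data matrix with samples in rows and buckets in columns; the function name pareto_scale and the example values are made up for illustration.

```python
import numpy as np

def pareto_scale(data):
    """Mean-centre each column and divide it by the square root of its standard deviation."""
    centred = data - data.mean(axis=0)
    std = data.std(axis=0, ddof=1)          # sample standard deviation of each bucket
    std[std == 0] = 1.0                     # leave constant buckets unscaled
    return centred / np.sqrt(std)

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])              # three samples, two buckets (made-up values)
scaled = pareto_scale(data)
scaled_variance = scaled.var(axis=0, ddof=1)
```

After Pareto scaling, the variance of each column equals its original standard deviation, so large signals are down-weighted without giving noise-dominated buckets the same weight as real peaks, as unit-variance scaling would.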