Compiled: Fri Mar 3 14:51:39 2023

1 Introduction

This document describes how the Biocrates p180-based targeted metabolomics data from the CHRIS study (Pattaro et al. 2015; Verri Hernandes et al. 2022) can be loaded and analyzed in R.

The value provided for each metabolite is an absolute concentration in natural scale.

2 Usage

Data in TDF format is best imported into R using the tidyfr R package, which can be installed using the code below.

remotes::install_github("EuracBiomedicalResearch/tidyfr")

To load the CHRIS p180 metabolomics data, access to the folder with the data in TDF format is required. The name of the data folder (and hence the name of the CHRIS module) is metabolomics_p180. Below we list all versions of the data set available for this module. In this example we assume that the metabolomics_p180 folder is present in the folder in which R is run. The path parameter specifies the folder in which CHRIS data modules are stored (we use "." to indicate the current working directory).

library(tidyfr)
## 
## Attaching package: 'tidyfr'
## The following object is masked from 'package:utils':
## 
##     data
list_data_modules(path = ".")
##                name version
## 1 metabolomics_p180 1.0.0.1
## 2 metabolomics_p180 1.0.1.1
##                                                                                                                                                             description
## 1 Targeted metabolomics data based on the Biocrates p180 kit comprising measured concentrations of 175 metabolites and lipids in serum samples from CHRIS participants.
## 2   Targeted metabolomics data based on the Biocrates p180 kit comprising measured concentrations of 175 metabolites and lipids in serum samples of CHRIS participants.

In this particular case, two versions of the module are available. We can next load the module with the data_module function, specifying the name of the module, the version we want to load and the path where the module can be found.

metabo <- data_module("metabolomics_p180", version = "1.0.1.1", path = ".")

The metabo variable is now a reference to this module which provides some general information:

metabo
## Object of class DataModule 
##  o name: metabolomics_p180
##  o version:  1.0.1.1
##  o description:  Targeted metabolomics data based on the Biocrates p180 kit comprising measured concentrations of 175 metabolites and lipids in serum samples of CHRIS participants.
##  o date: 2020-03

Note that we could also use the functions moduleName, moduleVersion, moduleDescription and moduleDate to extract these metadata.

The actual data can be loaded with the data function. This reads the full data set, which includes general information for each measurement along with the metabolite concentrations and a quality information flag for each individual measurement.

metabo_data <- data(metabo)
dim(metabo_data)
## [1] 7251  350

In this data set, columns are variables (such as metabolite concentrations or quality information) and rows are participants. For the present data set, concentrations for 175 metabolites are available along with quality information on each of these measurements (hence there are 350 columns in total). The AIDs (i.e., identifiers for each sample/participant) are provided as the row names of the data.frame. Descriptions and further information on the individual columns can be loaded with the labels function:

metabo_ann <- labels(metabo)

This data.frame contains metadata and additional information on each variable in data:

head(metabo_ann)
##           label unit  type          min        max missing description
## x0pt001 x0pt001   NA float 2.745243e-03  257.15694     -89        ADMA
## x0pt002 x0pt002   NA float 1.605280e+02  686.16781     -89         Ala
## x0pt003 x0pt003   NA float 1.455732e-03   27.97999     -89   alpha-AAA
## x0pt004 x0pt004   NA float 4.195551e+01  197.87741     -89         Arg
## x0pt005 x0pt005   NA float 2.329051e+01   96.62628     -89         Asn
## x0pt006 x0pt006   NA float 1.925211e+00 2291.64673     -89         Asp
##         analyte_name   analyte_class analyte_quant            biochemical_name
## x0pt001         ADMA biogenic amines          semi Asymmetric dimethylarginine
## x0pt002          Ala      aminoacids            ok                     Alanine
## x0pt003    alpha-AAA biogenic amines          semi      alpha-Aminoadipic acid
## x0pt004          Arg      aminoacids            ok                    Arginine
## x0pt005          Asn      aminoacids            ok                  Asparagine
## x0pt006          Asp      aminoacids          semi                   Aspartate
##                        aliases   formula lipid_maps analyte_flag   analyte_note
## x0pt001                   ADMA C8H18N4O2                       0               
## x0pt002                Alanine   C3H7NO2                       0               
## x0pt003 alpha-Aminoadipic acid  C6H11NO4                       1 outlier plates
## x0pt004               Arginine C6H14N4O2                       0               
## x0pt005             Asparagine  C4H8N2O3                       0               
## x0pt006              Aspartate   C4H7NO4                       0               
##             hmdb_id ms_type cv_qc_chris            long_description
## x0pt001 HMDB0001539   LC-MS  0.06627664 Asymmetric dimethylarginine
## x0pt002 HMDB0000161   LC-MS  0.04789639                     Alanine
## x0pt003 HMDB0000510   LC-MS  0.21486936      alpha-Aminoadipic acid
## x0pt004 HMDB0000517   LC-MS  0.03824621                    Arginine
## x0pt005 HMDB0000168   LC-MS  0.03138246                  Asparagine
## x0pt006 HMDB0000191   LC-MS  0.05464532                   Aspartate

Each row provides the annotation for one column in the metabo_data data.frame (rows of metabo_ann and columns of metabo_data are in the same order).

stopifnot(all(rownames(metabo_ann) == colnames(metabo_data)))
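Should the order ever differ (for example after subsetting one of the two tables), the annotation can be realigned to the data columns with match. The snippet below is a minimal sketch on made-up toy tables; toy_data and toy_ann are hypothetical stand-ins and not part of the CHRIS module.

```r
# Toy stand-ins for metabo_data and metabo_ann (hypothetical values),
# deliberately created in a different order.
toy_data <- data.frame(x0pt002 = c(1.5, 2.5), x0pt001 = c(0.3, 0.4))
toy_ann <- data.frame(
    description = c("ADMA", "Ala"),
    row.names = c("x0pt001", "x0pt002"),
    stringsAsFactors = FALSE
)

# Position of each data column within the annotation table.
idx <- match(colnames(toy_data), rownames(toy_ann))

# Every data column must have an annotation row.
stopifnot(!anyNA(idx))

# Reorder the annotation so that row i describes column i of the data.
toy_ann <- toy_ann[idx, , drop = FALSE]
stopifnot(all(rownames(toy_ann) == colnames(toy_data)))
```

Using drop = FALSE keeps toy_ann a data.frame even when it has a single column.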

The column cv_qc_chris represents the coefficient of variation (CV) calculated on the QC samples (the QC CHRIS Pool samples). These values thus represent the technical variability of each metabolite in the present data set.
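As a reminder of what such a value expresses, the CV is the standard deviation divided by the mean. The sketch below computes it for made-up QC measurements and then uses CV values copied from the head(metabo_ann) output above to select metabolites below a cut-off. The qc_values vector, the toy_ann table and the cut-off of 0.2 are illustrative assumptions, not values or recommendations from the CHRIS data.

```r
# Hypothetical concentrations of one metabolite in repeated QC pool samples.
qc_values <- c(98.2, 101.5, 99.7, 103.1, 97.5)

# Coefficient of variation: standard deviation relative to the mean.
cv <- sd(qc_values) / mean(qc_values)
cv

# Toy annotation table; the CVs are those printed above for four metabolites.
toy_ann <- data.frame(
    description = c("ADMA", "Ala", "alpha-AAA", "Arg"),
    cv_qc_chris = c(0.06627664, 0.04789639, 0.21486936, 0.03824621),
    stringsAsFactors = FALSE
)

# Illustrative cut-off (an assumption, choose one suited to the analysis).
cv_cutoff <- 0.2

# Names of metabolites with a technical CV below the cut-off.
keep <- toy_ann$description[toy_ann$cv_qc_chris < cv_cutoff]
keep
```

With the real data the same selection could be made on metabo_ann$cv_qc_chris, restricted to the rows that describe concentrations rather than quality flags.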

While the functions above load the data, we suggest further processing and reformatting it to simplify the analysis. First, we replace the internal CHRIS label identifiers (i.e., those starting with x0pt) with more meaningful column names.

colnames(metabo_data) <- metabo_ann$description
rownames(metabo_ann) <- metabo_ann$description

With that it is much easier to access individual values.

quantile(metabo_data$Gly, na.rm = TRUE)
##        0%       25%       50%       75%      100% 
##  91.36936 202.04114 235.15132 277.15061 693.95894

It is also helpful to discriminate between the columns in metabo_data that contain the actual metabolite concentrations and those that contain the quality information. Below we identify the columns with quality information. For these, the keyword flags is appended to the metabolite name, so their column names end with flags.

flag_cols <- grep("flags$", colnames(metabo_data))

We next list all available quality information (which is encoded as a factor).

levels(unlist(metabo_data[, flag_cols]))
## [1] "OK"                                  "Removed because of technical reason"
## [3] "Below lower level of quantification" "Above upper level of quantification"
## [5] "Below level of detection"
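One possible use of these flags is to count, per metabolite, how many measurements are not "OK" and to set those concentrations to NA before an analysis. The sketch below does this on a small made-up table. The toy data, the flag column names ("Gly flags", "Ala flags") and the decision to mask every non-OK value are assumptions for illustration; the regular expression tolerates a space, underscore or dot before flags because the exact separator is not confirmed here.

```r
# Toy stand-in for metabo_data: two metabolites plus their flag columns.
flag_levels <- c("OK",
                 "Below lower level of quantification",
                 "Above upper level of quantification")
toy <- data.frame(
    Gly = c(235.1, 0.5, 277.2, 910.0),
    Ala = c(400.3, 380.9, 1.2, 420.7),
    check.names = FALSE
)
toy[["Gly flags"]] <- factor(
    c("OK", "Below lower level of quantification", "OK",
      "Above upper level of quantification"),
    levels = flag_levels)
toy[["Ala flags"]] <- factor(
    c("OK", "OK", "Below lower level of quantification", "OK"),
    levels = flag_levels)

# Separate flag columns from concentration columns.
flag_cols <- grep("flags$", colnames(toy))
conc_cols <- setdiff(seq_along(toy), flag_cols)

# Number of measurements per metabolite that are not flagged "OK".
n_flagged <- vapply(toy[flag_cols], function(z) sum(z != "OK"), integer(1))
n_flagged

# Copy of the concentrations with every non-OK measurement set to NA.
masked <- toy[, conc_cols]
for (fc in colnames(toy)[flag_cols]) {
    met <- sub("[ _.]?flags$", "", fc)  # metabolite name without the suffix
    masked[toy[[fc]] != "OK", met] <- NA
}
masked
```

Whether flagged values should be removed, kept or imputed depends on the analysis; masking is only one option.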

At some point we will also need additional annotations for the metabolites. For that we can use the metabo_ann data frame, which holds the annotation for the labels of the data module. Below we select 5 random metabolites and extract their annotation from this data frame.

ids <- sample(colnames(metabo_data)[-flag_cols], 5)
metabo_ann[ids, ]
##               label unit  type         min         max missing description
## Ile         x0pt015   NA float 26.78392210 243.3909540     -89         Ile
## PC aa C26:0 x0pt087   NA float  0.30337354   1.1886800     -89 PC aa C26:0
## Spermidine  x0pt029   NA float  0.02855096   0.7281069     -89  Spermidine
## C10:2       x0pt040   NA float  0.04312934   0.2741845     -89       C10:2
## SDMA        x0pt026   NA float  0.14667452 232.3364149     -89        SDMA
##             analyte_name        analyte_class analyte_quant
## Ile                  Ile           aminoacids            ok
## PC aa C26:0  PC aa C26:0 glycerophospholipids          semi
## Spermidine    Spermidine      biogenic amines          semi
## C10:2              C10:2       acylcarnitines          semi
## SDMA                SDMA      biogenic amines          semi
##                             biochemical_name             aliases    formula
## Ile                               Isoleucine          Isoleucine   C6H13NO2
## PC aa C26:0 Phosphatidylcholine diacyl C26:0             PC 26:0 C34H68NO8P
## Spermidine                        Spermidine          Spermidine    C7H19N3
## C10:2                    Decadienylcarnitine Decadienylcarnitine  C17H29NO4
## SDMA              Symmetric dimethylarginine                SDMA  C8H18N4O2
##                                                                                lipid_maps
## Ile                                                                                      
## PC aa C26:0 LMGP01010388;LMGP01010432;LMGP01010475;LMGP01010456;LMGP01010725;LMGP01011243
## Spermidine                                                                               
## C10:2                                                                                    
## SDMA                                                                                     
##             analyte_flag                                      analyte_note
## Ile                    0                                                  
## PC aa C26:0            0                                                  
## Spermidine             0                                                  
## C10:2                  1 small dynamic range in QC samples; outlier plates
## SDMA                   0                                                  
##                 hmdb_id ms_type cv_qc_chris                 long_description
## Ile         HMDB0000172   LC-MS  0.03691048                       Isoleucine
## PC aa C26:0                 FIA  0.07579529 Phosphatidylcholine diacyl C26:0
## Spermidine  HMDB0001257   LC-MS  0.05213707                       Spermidine
## C10:2       HMDB0013325     FIA  0.09161977              Decadienylcarnitine
## SDMA        HMDB0003334   LC-MS  0.08145940       Symmetric dimethylarginine
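The annotation columns can also be used to group or filter metabolites. The sketch below rebuilds three of the columns printed above for these five metabolites as a toy table, groups the metabolite names by analyte_class and selects those without an analyte flag. The toy_ann object is a hypothetical stand-in for metabo_ann, and reading analyte_flag equal to 0 as "no note attached" is an assumption based on the rows shown.

```r
# Toy annotation table with values copied from the output above.
toy_ann <- data.frame(
    analyte_class = c("aminoacids", "glycerophospholipids",
                      "biogenic amines", "acylcarnitines",
                      "biogenic amines"),
    analyte_quant = c("ok", "semi", "semi", "semi", "semi"),
    analyte_flag = c(0, 0, 0, 1, 0),
    row.names = c("Ile", "PC aa C26:0", "Spermidine", "C10:2", "SDMA"),
    stringsAsFactors = FALSE
)

# Metabolite names grouped by analyte class.
by_class <- split(rownames(toy_ann), toy_ann$analyte_class)
by_class

# Metabolites without an analyte flag (assumed to mean: no analyte_note).
unflagged <- rownames(toy_ann)[toy_ann$analyte_flag == 0]
unflagged
```

The resulting character vectors can be used directly to subset the columns of the concentration table once its columns carry the metabolite names.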

We could also simply calculate the mean abundance of these 5 metabolites across all available CHRIS participants using the code below:

vapply(metabo_data[, ids], mean, numeric(1), na.rm = TRUE)
##         Ile PC aa C26:0  Spermidine       C10:2        SDMA 
## 66.24374575  0.49278034  0.19819053  0.08016315  0.50839056

As the values above show, the concentrations are in natural scale. For data analysis it might therefore be better to transform them using log2 or log10, which typically also brings their distribution closer to a Gaussian. See also Verri Hernandes et al. (2022) for more information on data distribution and quality.
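A minimal sketch of such a transformation is shown below on simulated, right-skewed values. The toy table and its log-normal parameters are made up for illustration and are not derived from the CHRIS data.

```r
set.seed(123)

# Simulated right-skewed concentrations standing in for real measurements.
toy <- data.frame(
    Ile = rlnorm(200, meanlog = log(66), sdlog = 0.3),
    SDMA = rlnorm(200, meanlog = log(0.5), sdlog = 0.4)
)

# log2-transform all (numeric) columns at once; the result is a data.frame.
# Note that a concentration of exactly 0 would become -Inf.
toy_log2 <- log2(toy)

# Means on the log2 scale, back-transformed to the natural scale.
# This yields the geometric mean of each metabolite.
geo_mean <- 2^colMeans(toy_log2)
geo_mean
```

Summary statistics computed on the log scale and then back-transformed describe geometric rather than arithmetic means, which should be kept in mind when reporting results.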

3 Session information

sessionInfo()
## R Under development (unstable) (2023-02-22 r83892)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyfr_0.99.11   BiocStyle_2.27.1 rmarkdown_2.20  
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.0                   knitr_1.42                 
##  [3] rlang_1.0.6                 xfun_0.37                  
##  [5] DelayedArray_0.25.0         jsonlite_1.8.4             
##  [7] SummarizedExperiment_1.29.1 S4Vectors_0.37.3           
##  [9] RCurl_1.98-1.10             htmltools_0.5.4            
## [11] sass_0.4.5                  stats4_4.3.0               
## [13] MatrixGenerics_1.11.0       Biobase_2.59.0             
## [15] grid_4.3.0                  evaluate_0.20              
## [17] jquerylib_0.1.4             bitops_1.0-7               
## [19] fastmap_1.1.1               yaml_2.3.7                 
## [21] IRanges_2.33.0              GenomeInfoDb_1.35.15       
## [23] bookdown_0.32               BiocManager_1.30.20        
## [25] compiler_4.3.0              XVector_0.39.0             
## [27] lattice_0.20-45             digest_0.6.31              
## [29] R6_2.5.1                    GenomeInfoDbData_1.2.9     
## [31] GenomicRanges_1.51.4        Matrix_1.5-3               
## [33] bslib_0.4.2                 tools_4.3.0                
## [35] matrixStats_0.63.0          zlibbioc_1.45.0            
## [37] BiocGenerics_0.45.0         cachem_1.0.7

References

Pattaro, Cristian, Martin Gögele, Deborah Mascalzoni, Roberto Melotti, Christine Schwienbacher, Alessandro De Grandi, Luisa Foco, et al. 2015. “The Cooperative Health Research in South Tyrol (CHRIS) Study: Rationale, Objectives, and Preliminary Results.” Journal of Translational Medicine 13 (1): 348. https://doi.org/10.1186/s12967-015-0704-9.
Verri Hernandes, Vinicius, Nikola Dordevic, Essi Marjatta Hantikainen, Baldur Bragi Sigurdsson, Sigurður Vidir Smárason, Vanessa Garcia-Larsen, Martin Gögele, et al. 2022. “Age, Sex, Body Mass Index, Diet and Menopause Related Metabolites in a Large Homogeneous Alpine Cohort.” Metabolites 12 (3): 205. https://doi.org/10.3390/metabo12030205.