Authors: Johannes Rainer, Vinicius Veri Hernandes and Sigurdur Smarason
Modified: 2020-04-01 11:21:02
Compiled: Wed Apr 1 11:21:06 2020

1 Introduction

In this document we evaluate the quality of the normalized Biocrates-based targeted metabolomics data for the CHRIS7500 data set and compare it to the quality before normalization. The quality assessment is performed separately for the LCMS and FIA data. The results from this document were used to define which of the replicated plates are part of the final data release.

Below we load all required packages and fetch the LCMS and the FIA data from the database.

library(BioCHRIStes)
library(xcms)
library(RColorBrewer)
library(pander)
library(RMariaDB)

#' Use the full argument names of RMariaDB's dbConnect method
bcd <- BioCHRIStes(dbConnect(MariaDB(), dbname = "biochristes_7500",
                             username = Sys.getenv("MYSQL_USER"),
                             password = Sys.getenv("MYSQL_PASS"),
                             host = Sys.getenv("MYSQL_HOST")))

#' Load the raw data
lcms_raw <- fetchData(bcd, data = "lcms", raw = TRUE,
                      filter = ~ data_release == "CHRIS7500 for normalization")
fia_raw <- fetchData(bcd, data = "fia", raw = TRUE,
                     filter = ~ data_release == "CHRIS7500 for normalization")
#' Load the normalized data
lcms <- fetchData(bcd, data = "lcms", raw = FALSE,
                  filter = ~ data_release == "CHRIS7500 for normalization")
fia <- fetchData(bcd, data = "fia", raw = FALSE,
                 filter = ~ data_release == "CHRIS7500 for normalization")

#' Define colors for the different types of samples
cols <- brewer.pal(9, "Set1")[c(2, 3, 1, 5, 9)]
names(cols) <- c("QC NIST STD", "QC CHRIS Pool", "00p180_QC2",
                 "00p180_QC1", "study")

Next we add sample type information to the data. Only the code for the LCMS data is shown below.

lcms_raw$group <- "study"
lcms_raw$group[lcms_raw$sample_name == "QC NIST STD"] <- "QC NIST STD"
lcms_raw$group[lcms_raw$sample_name == "QC CHRIS Pool"] <- "QC CHRIS Pool"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC1"] <- "00p180_QC1"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC2"] <- "00p180_QC2"

#' "well" should be treated as a categorical variable
lcms_raw$well <- factor(lcms_raw$well)

#' Order the data by plate_barcode and injection index
lcms_raw <- lcms_raw[, order(lcms_raw$plate_barcode, lcms_raw$injection_idx)]

Finally, we remove samples that have only missing values in both the LCMS and the FIA data (i.e. samples excluded because of technical problems such as failed injections).

all_na <- function(x) all(is.na(x))

na_lcms <- apply(concentrations(lcms), MARGIN = 2, all_na)
na_fia <- apply(concentrations(fia), MARGIN = 2, all_na)

lcms <- lcms[, !(na_lcms & na_fia)]
lcms_raw <- lcms_raw[, !(na_lcms & na_fia)]
fia <- fia[, !(na_lcms & na_fia)]
fia_raw <- fia_raw[, !(na_lcms & na_fia)]
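
The filtering logic above can be illustrated on toy data. The sketch below uses base R only. The matrices `m_lcms` and `m_fia` are hypothetical stand-ins for `concentrations(lcms)` and `concentrations(fia)` (analytes in rows, samples in columns), and the sketch assumes that both data sets list the samples in the same order.

```r
# Hypothetical toy matrices: analytes in rows, samples in columns.
m_lcms <- matrix(c(1, 2, NA, NA, 3, 4), nrow = 2,
                 dimnames = list(c("a1", "a2"), c("s1", "s2", "s3")))
m_fia <- matrix(c(5, 6, NA, NA, NA, NA), nrow = 2,
                dimnames = list(c("b1", "b2"), c("s1", "s2", "s3")))

all_na <- function(x) all(is.na(x))

# TRUE for samples (columns) that contain only missing values.
na_lcms <- apply(m_lcms, MARGIN = 2, all_na)
na_fia <- apply(m_fia, MARGIN = 2, all_na)

# A sample is dropped only if it is all-NA in both data sets.
keep <- !(na_lcms & na_fia)
m_lcms_kept <- m_lcms[, keep, drop = FALSE]
m_fia_kept <- m_fia[, keep, drop = FALSE]
```

In this toy example sample `s2` is removed, while `s3` is kept because it has valid LCMS measurements even though all of its FIA values are missing.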

1.1 General data overview

Prior to quality assessment we provide a general data summary including the numbers of analytes, samples and plates.

General overview of the available data.
LCMS QC sample count             2440
LCMS study sample count          8140
LCMS plate count                  102
LCMS replicated study samples     880
LCMS analyte count                 42
FIA QC sample count              2440
FIA study sample count           8140
FIA plate count                   102
FIA replicated study samples      880
FIA analyte count                 146

The whole data set thus consists of 102 plates in total, with 11 plates being replicated.

2 Identification of potentially problematic samples

Before we evaluate the quality of the data we try to identify potentially problematic samples. These are samples with either a higher percentage of missing values or consistent differences in analyte concentrations (compared to other samples of the same sample type).

2.1 Samples with a high proportion of missing values

Below we determine the number of missing values per sample and calculate the RLA (relative log abundance) for each analyte across samples of the same sample type (being either the QC sample type or male/female study samples).

lcms_smpls_na <- apply(concentrations(lcms), 2, function(z) sum(is.na(z)))
fia_smpls_na <- apply(concentrations(fia), 2, function(z) sum(is.na(z)))

lcms_smpls_na_grp <- split(lcms_smpls_na, lcms$group)
fia_smpls_na_grp <- split(fia_smpls_na, fia$group)

Figure 1: Distribution of the number of missing values per sample for each sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean plus 3 standard deviations.

Potential outlier samples are identified as those with a number of missing values that is larger than the average number of missing values for that sample group plus 3 times its standard deviation. The outlier candidates are listed below.

Samples with a higher number of missing values compared to the average of the group. Shown are the 80% quantile of the number of missing values in the group (80% NA count) and the number of missing values of the sample (NA count).
sample_name    plate_name    well  group          80% NA count  NA count  data  idx
00p180_QC1     BPLT00000281  14    00p180_QC1     3             42        LCMS  1
QC NIST STD    BPLT00000281  62    QC NIST STD    15            42        LCMS  2
00p180_QC1     BPLT00000281  14    00p180_QC1     3             42        LCMS  3
QC NIST STD    BPLT00000281  62    QC NIST STD    15            42        LCMS  4
QC CHRIS Pool  BPLT00000281  50    QC CHRIS Pool  13            42        LCMS  5
QC CHRIS Pool  BPLT00000086  50    QC CHRIS Pool  13            42        LCMS  6
00p180_QC2     BPLT00000086  26    00p180_QC2     1             42        LCMS  7
00p180_QC2     BPLT00000086  67    00p180_QC2     1             42        LCMS  8
00p180_QC2     BPLT00000086  96    00p180_QC2     1             42        LCMS  9
0010111776     BPLT00000409  5     study          14            17        LCMS  10
0010215416     BPLT00000409  24    study          14            17        LCMS  11
QC NIST STD    BPLT00000233  62    QC NIST STD    49            146       FIA   12

The identified samples are mostly QC samples (especially in the LCMS data) plus a few study samples. Since the number of missing values is not extremely high for the latter, we keep them. Below we define the indices of the outlier samples to be removed; the potential outlier study samples are not included and thus remain in the data.

outl <- lapply(lcms_smpls_na_grp, function(z)
    z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, lcms$group)
lcms_na_out_idx <- unname(which(outl & lcms$group != "study"))

outl <- lapply(fia_smpls_na_grp, function(z)
    z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, fia$group)
fia_na_out_idx <- unname(which(outl & fia$group != "study"))
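
The per-group outlier rule (mean plus 3 standard deviations) and the `split`/`unsplit` round trip can be tried on toy data. This is a minimal sketch in base R; `na_count` and `group` are hypothetical and only mimic the structure of `lcms_smpls_na` and `lcms$group`.

```r
# Hypothetical numbers of missing values for 20 QC and 20 study samples.
na_count <- c(rep(1, 19), 40, rep(2, 19), 3)
group <- rep(c("QC", "study"), each = 20)

# Flag, within each group, samples above mean + 3 * sd of that group.
by_group <- split(na_count, group)
outl <- lapply(by_group, function(z) z > (mean(z) + 3 * sd(z)))

# unsplit restores the original sample order.
outl <- unsplit(outl, group)

# Keep only flagged samples that are not study samples.
out_idx <- unname(which(outl & group != "study"))
```

Both the last QC sample and the last study sample exceed their group threshold, but only the QC sample ends up in `out_idx`. Note that with this rule a single extreme value can only be flagged in groups with more than about 10 samples, since the outlier itself inflates the standard deviation.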

2.2 Samples with an on average different analyte concentration

Next we identify potentially problematic samples based on the measured analyte abundances relative to the average of that analyte across samples of the same group. Here we are specifically looking for samples that have considerably lower concentrations in most or all analytes (indicating potentially failed injections, globally lower sample amounts or similar).

The RLAs calculated below represent the log abundances of measurements relative to the median abundance within the same sample group (sample group being the sample type).

lcms_grp <- lcms$group
## lcms_grp[lcms_grp == "study"] <- as.character(lcms$sex[lcms_grp == "study"])
lcms_rla <- rowRla(concentrations(lcms), lcms_grp)
#' FIA
fia_grp <- fia$group
## fia_grp[fia_grp == "study"] <- as.character(fia$sex[fia_grp == "study"])
fia_rla <- rowRla(concentrations(fia), fia_grp)
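
To make the RLA definition concrete, the sketch below re-implements it in base R under the assumption that the values are log2 transformed and that, for each analyte (row), the median within the sample's group is subtracted. The helper `row_rla_sketch` is hypothetical and is not the function used in the analysis above.

```r
# Hypothetical helper: within-group relative log abundances.
row_rla_sketch <- function(x, group) {
    lx <- log2(x)
    res <- lx
    for (g in unique(group)) {
        idx <- which(group == g)
        # Median log2 abundance of each analyte within group g.
        med <- apply(lx[, idx, drop = FALSE], 1, median, na.rm = TRUE)
        # Subtract the per-analyte median from every sample of the group.
        res[, idx] <- lx[, idx, drop = FALSE] - med
    }
    res
}

# Toy data: 2 analytes (rows), 3 QC and 2 study samples (columns).
x <- rbind(a1 = c(4, 8, 16, 100, 200),
           a2 = c(2, 2, 2, 50, 50))
group <- c("QC", "QC", "QC", "study", "study")
rla <- row_rla_sketch(x, group)
```

For analyte `a1` the QC samples get RLAs of -1, 0 and 1, i.e. a two-fold lower, identical and two-fold higher abundance relative to the QC median. On this scale an RLA of -1 always corresponds to half the median abundance of the group.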

The plot below shows the per-sample distribution of RLA values.

Figure 2: RLA plot for the LCMS (upper) and FIA data (lower plot)
Samples are colored by their sample type. Shown are within-sample-type RLAs. Sample type is either male or female study samples, or QC sample type. Grey coloring indicates the different plates.

The variance of the RLA values is low for the LCMS data. In contrast, for the FIA data a considerable number of samples have a more than 2-fold lower average abundance.

Next we aggregate the RLA values per sample by calculating for each sample the median as well as the 80% and 90% quantiles of the RLA across all analytes.

lcms_rla_qnts <- t(apply(lcms_rla, MARGIN = 2, quantile,
                         probs = c(0.5, 0.8, 0.9), na.rm = TRUE))
fia_rla_qnts <- t(apply(fia_rla, MARGIN = 2, quantile,
                        probs = c(0.5, 0.8, 0.9), na.rm = TRUE))

The distribution of the 80% RLA quantile per sample for all sample types is shown below.

Figure 3: Distribution of the RLAs per sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean minus 3 standard deviations.

There seem to be some samples with an 80% RLA quantile lower than the mean minus 3 standard deviations (vertical lines in the plot above). With the exception of QC NIST STD and study samples, the 80% RLA quantile of these potential outliers is however close to 0, which argues against a strong systematic difference in concentrations. Based on the distributions above we will nevertheless mark QC NIST STD samples with an 80% RLA quantile < -0.4 and study samples with an 80% RLA quantile < -1 as potential outliers. Ideally we should repeat the measurement of the affected study samples, since we cannot rule out that these differences are real.

lcms_rla_out_idx <- which(lcms_rla_qnts[, "80%"] < -1 | (
    lcms_rla_qnts[, "80%"] < -0.4 & lcms$group == "QC NIST STD"))
fia_rla_out_idx <- which(fia_rla_qnts[, "80%"] < -1 | (
    fia_rla_qnts[, "80%"] < -0.4 & fia$group == "QC NIST STD"))
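
The aggregation and thresholding steps can be sketched on a small example. The RLA matrix and the group labels below are hypothetical; the thresholds (-1 for any sample, -0.4 for QC NIST STD samples) are the ones used in the code above.

```r
# Hypothetical RLA matrix: 5 analytes (rows) x 3 samples (columns).
rla <- cbind(s1 = c(-0.1, 0, 0.1, 0.2, -0.2),
             s2 = c(-1.5, -1.4, -1.3, -1.2, -1.1),
             s3 = c(-0.6, -0.5, -0.5, -0.45, -0.7))
group <- c("study", "study", "QC NIST STD")

# One row per sample, one column per quantile ("50%", "80%", "90%").
qnts <- t(apply(rla, MARGIN = 2, quantile,
                probs = c(0.5, 0.8, 0.9), na.rm = TRUE))

# Flag samples below the general or the QC NIST STD specific threshold.
out_idx <- which(qnts[, "80%"] < -1 |
                 (qnts[, "80%"] < -0.4 & group == "QC NIST STD"))
```

Here `s2` is flagged because roughly 80% of its analytes have an RLA below -1, and `s3` is flagged because it is a QC NIST STD sample with an 80% quantile below -0.4. Using a high quantile rather than the median means that a sample is only flagged if most of its analytes are shifted downwards.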

The study samples for which 80% of the analytes have an RLA smaller than -1 and the QC NIST STD samples with an 80% RLA quantile < -0.4 are listed in the table below (for both the LCMS and FIA data).

Potential outlier samples based on RLA values.
  sample_name plate_name well 80% RLA data
14875 QC NIST STD BPLT00000137 62 -0.5535 LCMS
14874 QC NIST STD BPLT00000137 62 -0.5738 LCMS
14833 QC NIST STD BPLT00000137 62 -0.5938 LCMS
14837 QC NIST STD BPLT00000137 62 -0.5891 LCMS
14838 QC NIST STD BPLT00000137 62 -0.5888 LCMS
23160 QC NIST STD BPLT00000209 62 -0.5177 LCMS
23162 QC NIST STD BPLT00000209 62 -0.4825 LCMS
23161 QC NIST STD BPLT00000209 62 -0.4909 LCMS
23163 QC NIST STD BPLT00000209 62 -0.4755 LCMS
23150 QC NIST STD BPLT00000209 62 -0.4987 LCMS
6269 QC NIST STD BPLT00000285 62 -0.6095 LCMS
6272 QC NIST STD BPLT00000285 62 -0.5907 LCMS
6271 QC NIST STD BPLT00000285 62 -0.5865 LCMS
6270 QC NIST STD BPLT00000285 62 -0.6084 LCMS
6273 QC NIST STD BPLT00000285 62 -0.6005 LCMS
15454 0010170742 BPLT00000129 93 -1.181 FIA
15477 0010029271 BPLT00000129 84 -1.288 FIA
20151 0010050687 BPLT00000057 39 -1.057 FIA
6619 0010016608 BPLT00000281 35 -1.481 FIA
22593 0010002492 BPLT00000249 32 -1.413 FIA
8835 0010125401 BPLT00000233 74 -1.73 FIA
17715 0010338094 BPLT00000093 72 -1.231 FIA
17260 0010236351 BPLT00000101 27 -1.388 FIA
16332 0010189496 BPLT00000113 8 -1.203 FIA
15676 0010249634 BPLT00000125 58 -1.227 FIA
13202 0010265431 BPLT00000165 87 -1.476 FIA
13190 0010216869 BPLT00000165 88 -1.471 FIA
13186 0010204360 BPLT00000165 60 -1.358 FIA
13420 0010266285 BPLT00000161 74 -2.561 FIA
15046 QC NIST STD BPLT00000137 62 -0.5338 FIA
15045 QC NIST STD BPLT00000137 62 -0.5467 FIA
15042 QC NIST STD BPLT00000137 62 -0.4836 FIA
15043 QC NIST STD BPLT00000137 62 -0.4865 FIA
15044 QC NIST STD BPLT00000137 62 -0.4515 FIA
9258 0010206447 BPLT00000229 29 -1.05 FIA
9912 0010117224 BPLT00000221 5 -1.399 FIA
23025 0010364678 BPLT00000213 91 -1.702 FIA
19702 0010050995 BPLT00000065 10 -1.15 FIA
23277 QC NIST STD BPLT00000209 62 -0.4742 FIA
23192 0010072017 BPLT00000209 52 -1.237 FIA
23279 QC NIST STD BPLT00000209 62 -0.5357 FIA
23278 QC NIST STD BPLT00000209 62 -0.515 FIA
23280 QC NIST STD BPLT00000209 62 -0.5078 FIA
23276 QC NIST STD BPLT00000209 62 -0.518 FIA
23405 0010035478 BPLT00000201 54 -1.179 FIA
11007 0010054356 BPLT00000193 82 -1.537 FIA
8552 0010194648 BPLT00000237 29 -1.018 FIA
8622 0010145197 BPLT00000237 65 -1.311 FIA
8597 0010084920 BPLT00000237 10 -1.331 FIA
8551 0010294559 BPLT00000237 82 -1.343 FIA
10177 0010013439 BPLT00000217 65 -1.347 FIA
10164 0010126780 BPLT00000217 10 -1.299 FIA
10199 0010230994 BPLT00000217 82 -1.205 FIA
10605 0010010220 BPLT00000205 15 -1.633 FIA
10638 0010261751 BPLT00000205 18 -1.376 FIA
12344 0010008924 BPLT00000177 3 -1.376 FIA
12342 0010066541 BPLT00000177 15 -1.068 FIA
12264 0010188438 BPLT00000177 82 -1.37 FIA
12295 0010253568 BPLT00000177 60 -1.042 FIA
12302 0010319987 BPLT00000177 84 -1.124 FIA
12107 0010137518 BPLT00000181 87 -1.151 FIA
21525 0010342281 BPLT00000041 87 -1.248 FIA
19933 0010278047 BPLT00000061 64 -1.083 FIA
20015 0010286121 BPLT00000061 43 -1.131 FIA
14282 0010068538 BPLT00000145 75 -1.296 FIA
14321 0010085866 BPLT00000145 80 -1.007 FIA
9683 0010094646 BPLT00000225 64 -1.429 FIA
9697 0010289563 BPLT00000225 42 -1.111 FIA
9458 0010094646 BPLT00000226 64 -1.157 FIA
11827 0010127859 BPLT00000182 30 -1.031 FIA
10414 0010010220 BPLT00000206 15 -1.124 FIA
10334 0010261751 BPLT00000206 18 -1.064 FIA
21337 0010342281 BPLT00000042 87 -1.105 FIA
6385 QC NIST STD BPLT00000285 62 -0.538 FIA
6388 QC NIST STD BPLT00000285 62 -0.6041 FIA
6387 QC NIST STD BPLT00000285 62 -0.5857 FIA
6386 QC NIST STD BPLT00000285 62 -0.5803 FIA
6389 QC NIST STD BPLT00000285 62 -0.5677 FIA

Not unexpectedly, many RLA-based outlier samples are QC NIST STD samples. All potentially problematic study samples were identified in the FIA data, the largest number (5) on plate BPLT00000177, followed by 4 on plate BPLT00000237 and 3 each on plates BPLT00000217 and BPLT00000165 (counts taken from the table above). The remaining samples are spread across many different plates, so there is no indication of a systematic problem with any single plate.

3 Identification of potentially problematic analytes

Next we evaluate variances of analyte measurements and numbers of missing values in QC samples to identify potentially problematic/noisy analytes. Missing values are measurements excluded due to technical problems or values outside the detection and quantification range. Note that we calculate these values on the full data set, i.e. without excluding potentially bad samples from the previous section. The final evaluation of analyte qualities and the definition of the analytes to be flagged are performed in the define-data-release-analytes.Rmd vignette.

prop_na <- function(x, MARGIN = 1) {
    apply(x, MARGIN = MARGIN, function(z) sum(is.na(z)) / length(z))
}

anlts_lcms_na <- within_group_fun(concentrations(lcms), lcms$group, prop_na)
anlts_fia_na <- within_group_fun(concentrations(fia), fia$group, prop_na)

anlts_lcms_rsd <- within_group_fun(concentrations(lcms), lcms$group, rowRsd)
anlts_fia_rsd <- within_group_fun(concentrations(fia), fia$group, rowRsd)

Analytes are flagged as having bad quality if at least one of the following conditions is met:

- RSD across 00p180_QC1 samples is > 30% (technically too variable)
- RSD across 00p180_QC2 samples is > 30% (technically too variable)
- Percentage of missing values across study samples is > 99.9% (i.e. a concentration was measured in fewer than 5 individuals)

All analytes are supposed to be present in the 00p180 QC samples, and high variability of measurements in them might indicate unstable estimation of concentrations.
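
The flagging criteria can be sketched in base R as follows. The RSD is assumed here to be the standard deviation divided by the mean, expressed in percent; `rsd_pct` and the toy matrices are hypothetical, and only the 00p180_QC1 criterion is shown (the 00p180_QC2 criterion would be analogous).

```r
# RSD in percent: standard deviation relative to the mean (assumed definition).
rsd_pct <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Hypothetical concentrations: analytes in rows, samples in columns.
qc1 <- rbind(stable = c(10, 11, 9, 10),
             noisy = c(10, 20, 2, 8))
study <- rbind(stable = c(5, 6, NA, 7),
               noisy = c(NA, NA, NA, NA))

# Per-analyte RSD in the QC samples and percentage of NAs in study samples.
rsd_qc1 <- apply(qc1, 1, rsd_pct)
na_study <- apply(study, 1, function(z) 100 * mean(is.na(z)))

# An analyte is flagged if any of the criteria applies.
flagged <- rsd_qc1 > 30 | na_study > 99.9
```

In this toy example only the analyte `noisy` is flagged; it fails both the RSD criterion (about 75%) and the missing value criterion (100%).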

The table below lists all analytes matching any of these criteria.

Analytes classified as having poor quality: RSD > 30% in 00p180_QC1 or 00p180_QC2 samples, or more than 99.9% missing values in study samples. (continued below)
  %NA.00p180_QC1 %NA.00p180_QC2 %NA.study
Ac-Orn 70.78 70.99 99.73
ADMA 2.353 0.3297 2.396
c4-OH-Pro 100 0.3297 100
Carnosine 1.569 0.3297 100
DOPA 0.3922 0.3297 99.93
Dopamine 0.3922 0.3297 99.99
Histamine 12.16 12.2 99.99
Met-SO 100 0.3297 99.99
Nitro-Tyr 2.353 0.3297 99.99
PEA 0.3922 0.3297 99.99
Spermine 1.373 0.6593 97.84
C12:1 95.88 94.73 98.13
C14:1 92.75 92.97 47.71
C14:1-OH 95.49 95.6 95.18
C14:2 93.53 94.07 75.33
C14:2-OH 95.49 95.49 95.39
C16 0 0 99.93
C16-OH 88.04 91.1 90.2
C16:1-OH 97.25 97.36 97.22
C16:2 90.78 89.45 41.33
C16:2-OH 93.92 93.41 81.88
C18 0 0 99.98
C18:1 69.22 73.19 4.877
C3-DC (C4-OH) 90.59 95.49 94.18
C3-OH 100 100 100
C3:1 98.82 99.01 98.99
C4:1 99.8 100 100
C5 0.9804 0.989 99.91
C5-DC (C6-OH) 92.55 91.76 89.55
C5-M-DC 95.88 96.04 96.3
C5-OH (C3-DC-M) 94.51 95.16 94
C5:1 99.8 100 100
C5:1-DC 99.8 100 99.98
C6:1 100 100 100
C7-DC 99.61 99.78 63.42
C9 99.8 100 99.99
lysoPC a C14:0 100 100 100
lysoPC a C26:0 99.41 98.9 98.71
lysoPC a C26:1 100 100 100
lysoPC a C28:0 99.41 99.12 98.57
PC aa C30:2 32.16 32.09 33.91
PC aa C38:1 3.725 3.077 2.948
PC ae C30:1 3.725 3.846 4.582
SM C22:3 13.33 16.37 54.55
SM C26:0 3.725 2.857 1.032
SM C26:1 4.51 4.615 1.609
  RSD.00p180_QC1 RSD.00p180_QC2 data
Ac-Orn 149.7 144.3 LCMS
ADMA 10.62 35.49 LCMS
c4-OH-Pro NA 19.88 LCMS
Carnosine 8.345 12.96 LCMS
DOPA 82.76 80.19 LCMS
Dopamine 23.09 26.25 LCMS
Histamine 67.09 63.94 LCMS
Met-SO NA 28.12 LCMS
Nitro-Tyr 13.24 16.04 LCMS
PEA 7.072 6.136 LCMS
Spermine 50.85 64.24 LCMS
C12:1 23.97 42.55 FIA
C14:1 89.01 79.49 FIA
C14:1-OH 140.4 14.52 FIA
C14:2 54.96 13.17 FIA
C14:2-OH 65.25 11.98 FIA
C16 14.89 16.04 FIA
C16-OH 81.54 11.86 FIA
C16:1-OH 66.56 9.354 FIA
C16:2 91.73 36.87 FIA
C16:2-OH 73.07 29.71 FIA
C18 18.78 18.54 FIA
C18:1 46.63 112.4 FIA
C3-DC (C4-OH) 43.38 25.47 FIA
C3-OH NA NA FIA
C3:1 118.1 8.428 FIA
C4:1 NA NA FIA
C5 11.14 11.06 FIA
C5-DC (C6-OH) 61.44 15.5 FIA
C5-M-DC 114.7 5.037 FIA
C5-OH (C3-DC-M) 67.41 21.11 FIA
C5:1 NA NA FIA
C5:1-DC NA NA FIA
C6:1 NA NA FIA
C7-DC 50.31 1.746 FIA
C9 NA NA FIA
lysoPC a C14:0 NA NA FIA
lysoPC a C26:0 33.66 49.78 FIA
lysoPC a C26:1 NA NA FIA
lysoPC a C28:0 18.8 32.46 FIA
PC aa C30:2 52.2 47.67 FIA
PC aa C38:1 43.77 42.6 FIA
PC ae C30:1 39.61 39.85 FIA
SM C22:3 98.04 98.74 FIA
SM C26:0 35.02 33.63 FIA
SM C26:1 33.22 31.92 FIA

Note that there might be better approaches to identify poor quality signals, e.g. comparing the actual analyte concentrations to the expected concentrations. Also, problematic analytes will be identified in a second Rmd file (define-data-release-analytes.Rmd) after removing poor quality samples and based only on one of the replicated plates.

4 Remove poor quality samples and analytes

Next we reduce the data set by removing samples that were identified above as potential outlier and poor quality samples. Also, we remove analytes considered to yield too noisy data.

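#' Note: lcms_anlts_out_idx and fia_anlts_out_idx are expected to hold the
#' row indices of the analytes flagged above; their definition is not
#' shown in this document.
lcms <- lcms[-lcms_anlts_out_idx,
             -unique(c(lcms_na_out_idx, lcms_rla_out_idx))]
lcms_raw <- lcms_raw[-lcms_anlts_out_idx,
                     -unique(c(lcms_na_out_idx, lcms_rla_out_idx))]
fia <- fia[-fia_anlts_out_idx,
           -unique(c(fia_na_out_idx, fia_rla_out_idx))]
fia_raw <- fia_raw[-fia_anlts_out_idx,
                   -unique(c(fia_na_out_idx, fia_rla_out_idx))]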

All further data processing will be performed on the reduced data set.
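
One property of negative indexing in R is worth keeping in mind for a removal step like this one: if an index vector is empty, `x[, -idx]` selects nothing rather than everything. The sketch below shows the effect on a plain matrix together with a defensive variant; the helper `drop_cols` is hypothetical, and the data objects used in this document may need a different approach.

```r
# Hypothetical helper: drop columns by index, tolerating an empty index.
drop_cols <- function(x, idx) {
    keep <- setdiff(seq_len(ncol(x)), idx)
    x[, keep, drop = FALSE]
}

m <- matrix(1:6, nrow = 2)
none <- integer(0)

ncol(m[, -none, drop = FALSE])  # 0: all columns are lost
ncol(drop_cols(m, none))        # 3: nothing is removed
ncol(drop_cols(m, c(1, 3)))     # 1: columns 1 and 3 are removed
```

Selecting the elements to keep with `setdiff` gives the same result as negative indexing whenever at least one index is present, and leaves the data untouched when no outliers were identified.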

5 General quality assessment

First we evaluate the general quality of the LCMS and FIA data sets. This comprises plots of RLAs before and after normalization, the relative standard deviation (RSD) calculated for each analyte in QC samples (across all plates), and concentration ratios between replicated measurements.

Note that the quality assessment in this document is performed on all samples, also including the replicated ones.

5.1 Relative log abundances

First we calculate the relative log abundances (RLA) before and after normalization, using the sample type as groups within which the relative abundances are calculated.

## Defining the RLA groups
grp <- lcms$group
## grp[grp == "study"] <- as.character(lcms$sex)[grp == "study"]
lcms$rla_group <- grp
lcms_raw$rla_group <- grp

grp <- fia$group
## grp[grp == "study"] <- as.character(fia$sex)[grp == "study"]
fia$rla_group <- grp
fia_raw$rla_group <- grp

rla_lcms <- colMedians(rowRla(concentrations(lcms),
                              lcms$rla_group), na.rm = TRUE)
rla_lcms_raw <- colMedians(rowRla(concentrations(lcms_raw),
                                  lcms_raw$rla_group), na.rm = TRUE)
rla_fia <- colMedians(rowRla(concentrations(fia),
                             fia$rla_group), na.rm = TRUE)
rla_fia_raw <- colMedians(rowRla(concentrations(fia_raw),
                                 fia_raw$rla_group), na.rm = TRUE)

Figure 4: Per sample median RLA before (red) and after normalization (blue)
The grey rectangles indicate plates/batches.

Between-batch normalization reduced systematic differences in abundances, especially in the FIA data set, and centered the per-sample median RLAs around 0.

The RLA plots above were created on blessed signals, i.e. measurements that are within the detection and quantification range defined by Biocrates. Data analyses will however also be performed on signals outside this range. We thus also create RLA plots considering all of the measured intensities.

Figure 5: Per sample median RLA before (red) and after normalization (blue) considering all measured intensities
The grey rectangles indicate plates/batches.

Normalization thus reduced the between-group differences considerably.

5.2 Relative standard deviation

Below we calculate RSD values for each analyte in each sample type. We consider only signal within the detection range for each analyte.

rsd_lcms <- within_group_fun(concentrations(lcms),
                             lcms$group, rowRsd)
rsd_fia <- within_group_fun(concentrations(fia),
                            fia$group, rowRsd)
rsd_lcms_raw <- within_group_fun(concentrations(lcms_raw),
                                 lcms_raw$group, rowRsd)
rsd_fia_raw <- within_group_fun(concentrations(fia_raw),
                                fia_raw$group, rowRsd)

Figure 6: Distribution of RSD values per sample type for the LCMS (left) and FIA (right) data

The table below summarizes the RSD (across analytes) for each sample type before and after normalization.

Median RSD (in %) and percentage of analytes with an RSD > 30% for the LCMS and FIA data sets before and after normalization.
  LCMS raw LCMS FIA raw FIA
median RSD 00p180_QC1 7.651 4.628 22.96 9.141
median RSD 00p180_QC2 7.586 6.373 22.26 8.655
median RSD QC CHRIS Pool 6.937 4.505 23.13 6.723
median RSD QC NIST STD 9.85 9.231 25.78 10.96
median RSD study 22.28 21.95 36.03 28.05
% RSD > 30 00p180_QC1 0 0 14.41 0
% RSD > 30 00p180_QC2 0 0 13.51 0
% RSD > 30 QC CHRIS Pool 0 0 15.32 1.802
% RSD > 30 QC NIST STD 0 0 34.23 2.703
% RSD > 30 study 19.35 19.35 75.68 34.23

After normalization, the median RSD for QC samples is below 10% for the LCMS and below 11% for the FIA data. Between-batch normalization improved the quality of the FIA data considerably, reducing the median RSD in QC samples from over 20% to about 10%. Note that only the RSDs of QC NIST STD and study samples should be considered for the quality assessment, as these samples were not used to estimate the batch effect. Importantly, normalization reduced the RSD of QC samples while having only a small impact on the RSD of study samples.

Since many analyses will also include measurements outside the detection range, we repeat the analysis considering all measurements.

rsd_lcms_all <- within_group_fun(
    concentrations(lcms, blessing = "none"),
    lcms$group, rowRsd)
rsd_fia_all <- within_group_fun(
    concentrations(fia, blessing = "none"),
    fia$group, rowRsd)
rsd_lcms_raw_all <- within_group_fun(
    concentrations(lcms_raw, blessing = "none"),
    lcms_raw$group, rowRsd)
rsd_fia_raw_all <- within_group_fun(
    concentrations(fia_raw, blessing = "none"),
    fia_raw$group, rowRsd)
Median RSD (in %) and percentage of analytes with an RSD > 30% for the LCMS and FIA data sets before and after normalization; all measurements also outside the detection range are used.
  LCMS raw LCMS FIA raw FIA
median RSD 00p180_QC1 7.651 4.628 23.39 9.161
median RSD 00p180_QC2 7.586 6.373 23.26 8.689
median RSD QC CHRIS Pool 7.025 4.505 23.71 7.113
median RSD QC NIST STD 10.2 9.381 26.89 11.01
median RSD study 23.42 23.11 37.08 28.3
% RSD > 30 00p180_QC1 0 0 16.22 0.9009
% RSD > 30 00p180_QC2 0 0 16.22 0.9009
% RSD > 30 QC CHRIS Pool 3.226 0 18.02 1.802
% RSD > 30 QC NIST STD 6.452 6.452 39.64 2.703
% RSD > 30 study 32.26 32.26 82.88 35.14

Even when signals outside the detection range are considered, the median RSDs of QC samples are at most ~11% after normalization. Again, the improvement of the FIA data quality is impressive.

5.3 Maximum ratio of abundances for replicated measurements

As an additional quality criterion we evaluate the difference (ratio) in abundances of replicated measurements. To this end we first identify the replicated samples and subsequently determine the largest ratio of abundances measured for an analyte in the replicates of a sample. The percentage of such MRAs (maximum ratios of abundances) larger than 1.3 is used as a quality measure. This represents the percentage of analyte measurements for which concentrations differ by more than 30%.

#' Identify replicated study samples and subset the data
samp_cnt <- table(lcms$sample_name[lcms$group == "study"])
mult_ids <- names(samp_cnt)[samp_cnt > 1]

#' subsetting the data to replicated measurements only
lcms_repl <- lcms[, lcms$sample_name %in% mult_ids]

#' Define the function to calculate the MRA:
#' Returns the ratio between the max and the min value or
#' NA if less than 2 valid measurements are available.
mra <- function(x) {
    if (is.matrix(x)) {
        apply(x, MARGIN = 1, function(z) {
            z <- z[!is.na(z)]
            if (length(z) > 1) {
                rng <- range(z)
                rng[2] / rng[1]
            } else
                NA_real_
        })
    } else
        rep(NA_real_, length(x))
}

mra_lcms <- do.call(
    cbind, within_group_fun(concentrations(lcms_repl),
                            lcms_repl$sample_name, mra))
#' Remove columns (sample pairs) with only NAs.
mra_lcms <- mra_lcms[, prop_na(mra_lcms, 2) < 1]
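
As a small worked example of the MRA, its log2 transformation and the 1.3 cut-off, consider the hypothetical concentrations below (three analytes measured in two replicates of the same sample). The sketch uses base R only and does not depend on the helper functions above.

```r
# Hypothetical concentrations of three analytes in two replicates
# of the same sample.
repl <- cbind(rep1 = c(10, 20, 5), rep2 = c(12, 20, 10))

# MRA per analyte: largest divided by smallest replicate value.
mra_toy <- apply(repl, 1, function(z) max(z) / min(z))

# log2 scale: 0 means identical values, 1 a two-fold difference.
log2_mra <- log2(mra_toy)

# Percentage of analytes whose replicates differ by more than 30%.
pct_above <- 100 * mean(mra_toy > 1.3)
```

The three analytes have MRAs of 1.2, 1 and 2, so one of three (about 33%) exceeds the 1.3 cut-off. Because the MRA is always at least 1, its log2 is never negative and carries no information about which replicate had the higher value.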

The distribution of MRA values per replicated sample before and after normalization is shown below, first for the LCMS and subsequently for the FIA data. For easier visualization the data is log2 transformed. A log2 MRA of 0 represents identical concentrations, log2 MRA larger than 1 a more than two-fold difference in abundances.

Figure 7: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the LCMS data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.

Normalization reduced the MRAs for all replicates, and had its largest impact on the measurements from the first two replicated plates. Per-sample average log2 MRAs are, with the exception of some replicates, below 0.2 for all samples, which represents less than 15% difference in concentration. Based on the MRAs, observed differences of abundances in the LCMS data which are above ~ 1.5-fold (log2 MRA of 0.6) can be considered to be real while for smaller differences it is not possible to discriminate between technical and biological variance.

Next we plot the MRAs before and after normalization for the FIA data.

Figure 8: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the FIA data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.

Differences between replicated measurements are much larger for the FIA than for the LCMS data. These differences could be reduced by the normalization, specifically for replicated plate pairs 1 (BPLT00000037/BPLT00000038), 4 (BPLT00000229/BPLT00000230) and 11 (BPLT00000049/BPLT00000050). For the FIA data even two-fold differences could be caused exclusively by technical variance. Thus, potentially biologically relevant differences should be more than two-fold.

The table below summarizes the differences between replicated measurements. For each replicated individual the median MRA (across all analytes) was calculated, as well as the 80% and the 95% quantile. The mean of these across all replicated individuals is then reported and used as a quality criterion.

Summary for the maximum ratio of abundances (MRA) analysis. Mean of median MRA per sample (across analytes), 80% and 95% quantile of MRAs as well as number of samples with a median MRA larger than 1.3 and percentage of measurements (all analytes, all samples) with an MRA > 1.3.
  LCMS raw LCMS FIA raw FIA
mean of median MRA 1.064 1.054 1.197 1.104
mean of 80% quantile MRA 1.113 1.092 1.374 1.175
mean of 95% quantile MRA 1.182 1.145 1.599 1.326
no. samples with median MRA > 1.3 0 0 104 19
% of measurements with MRA > 1.3 1.82 0.8768 28.7 8.61

On average, 80% of analytes have a less than 1.1-fold difference in concentration between replicates for LCMS data and 1.2-fold difference for FIA data. The quality of the data is thus extremely good. Normalization could improve data quality, specifically for the FIA data.
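
The way such a summary can be derived from a matrix of MRA values is sketched below. The matrix `mra_mat` is hypothetical (analytes in rows, replicated samples in columns), and the row names of the result are chosen for this example only.

```r
# Hypothetical MRA matrix: analytes in rows, replicated samples in columns.
mra_mat <- cbind(s1 = c(1.0, 1.1, 1.2, 1.4),
                 s2 = c(1.0, 1.0, 1.1, NA),
                 s3 = c(1.5, 1.6, 1.3, 1.9))

# Median, 80% and 95% quantile of the MRA for each sample.
per_sample <- apply(mra_mat, 2, quantile,
                    probs = c(0.5, 0.8, 0.95), na.rm = TRUE)

summary_mra <- c(
    # Mean of each per-sample statistic across all samples.
    rowMeans(per_sample),
    # Number of samples with a median MRA above 1.3.
    n_median_gt_1.3 = sum(per_sample["50%", ] > 1.3),
    # Percentage of all valid measurements with an MRA above 1.3.
    pct_gt_1.3 = 100 * mean(mra_mat > 1.3, na.rm = TRUE))
```

Averaging per-sample quantiles, rather than pooling all values, gives every replicated sample the same weight regardless of how many of its analytes have valid measurements.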

Despite the overall good quality of the data, some replicated measurements show (in some cases extremely) large differences. These are listed below.

Samples with a median MRA larger than 1.3. Shown are the sample name, the plates on which the sample was measured, the median MRA, the quartiles of the abundance ratios (AR), the intercept and slope of the linear model fitted to the replicates, and the R squared. Samples are ordered by the median MRA in descending order. (continued below)
sample_name plate_name median MRA data AR 0%
0010133324 BPLT00000225, BPLT00000226 2.018 FIA 0.3591
0010071152 BPLT00000225, BPLT00000226 1.875 FIA 0.2786
0010222815 BPLT00000205, BPLT00000206 1.859 FIA 0.3554
0010306114 BPLT00000225, BPLT00000226 1.629 FIA 0.4419
0010324169 BPLT00000181, BPLT00000182 1.617 FIA 0.8972
0010148293 BPLT00000225, BPLT00000226 1.575 FIA 0.3779
0010103176 BPLT00000081, BPLT00000082 1.522 FIA 0.1071
0010339901 BPLT00000225, BPLT00000226 1.435 FIA 0.3876
0010063900 BPLT00000105, BPLT00000106 1.433 FIA 0.4846
0010039436 BPLT00000205, BPLT00000206 1.431 FIA 0.5496
0010226390 BPLT00000205, BPLT00000206 1.408 FIA 0.5034
0010176777 BPLT00000041, BPLT00000042 1.377 FIA 0.5573
0010179082 BPLT00000049, BPLT00000050 1.372 FIA 0.8236
0010167974 BPLT00000225, BPLT00000226 1.361 FIA 0.5553
0010291696 BPLT00000241, BPLT00000242 1.343 FIA 0.535
0010281876 BPLT00000225, BPLT00000226 1.332 FIA 0.367
0010271305 BPLT00000225, BPLT00000226 1.319 FIA 0.5374
0010290792 BPLT00000241, BPLT00000242 1.31 FIA 0.505
0010252353 BPLT00000049, BPLT00000050 1.308 FIA 0.8679
AR 25% AR 50% AR 75% AR 100% int slope R2
0.4715 0.4954 0.5607 1.499 14.05 1.025 0.9886
0.4955 0.5334 0.6426 1.213 18 1.024 0.9912
0.4962 0.5381 0.6681 1.061 11.67 1.103 0.9899
0.5638 0.6145 0.6867 2.035 7.977 0.8823 0.9936
1.416 1.617 1.693 2.265 -10.53 1.004 0.9899
0.6015 0.635 0.6939 1.193 8.326 1.03 0.9956
0.5958 0.657 0.7359 1.07 7.748 1.204 0.9958
0.649 0.6967 0.7571 1.136 7.642 1.019 0.9965
0.6725 0.6977 0.7629 1.172 7.783 0.9383 0.9964
0.671 0.6989 0.82 1.149 5.599 1.132 0.9981
0.6745 0.7122 0.8478 2.43 5.061 1.082 0.9983
0.6621 0.7264 0.8964 1.157 5.293 0.9386 0.9961
1.233 1.372 1.572 2.08 -6.964 0.8458 0.9963
0.7154 0.7373 0.8134 1.765 8.407 1.025 0.9942
0.702 0.7447 0.7885 1.325 4.283 1.231 0.9983
0.7171 0.7509 0.8256 1.214 5.779 1.035 0.998
0.7367 0.7586 0.8331 1.332 7.668 0.9963 0.9958
0.7266 0.7634 0.8089 1.056 7.003 1.105 0.9985
1.065 1.308 1.412 2.147 -12.05 0.9723 0.9944

Not a single replicate pair in the LCMS data set had a median MRA larger than 1.3. For the FIA data, 19 samples had a median MRA larger than 1.3, but, with the exception of a single sample, smaller than 2-fold. The largest group of these samples (8 of 19) is from the replicated plates BPLT00000225/BPLT00000226.

The MRA reports only absolute differences and thus cannot distinguish between sample mixups and differences in sample amount (or failed injections). The very large R squared values argue, however, against sample mixups. The differences are thus most likely caused by differences in absolute concentrations.
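
The reasoning above can be illustrated with a small, made-up example. The sketch below assumes that the MRA of an analyte is the abundance ratio between the two replicates folded such that it is always >= 1, which is consistent with the median MRA and AR 50% columns of the table above. The function `summarize_pair` and all values are hypothetical, and the linear model is fitted on the untransformed values, which might differ from the scale used for this report.

```r
#' Hypothetical concentrations of five analytes in the first replicate
rep1 <- c(10, 20, 40, 80, 160)
#' Second replicate with about half of the sample amount (plus small errors)
rep2_half <- 0.5 * rep1 * c(1.02, 0.98, 1.01, 0.99, 1.00)
#' "Second replicate" that is in fact an unrelated sample (mixup)
rep2_mix <- rev(rep1)

#' Assumed definition: MRA = max(AR, 1 / AR) for each analyte
summarize_pair <- function(x, y) {
    ar <- y / x
    mra <- pmax(ar, 1 / ar)
    fit <- lm(y ~ x)
    c(median_MRA = median(mra),
      intercept = coef(fit)[[1]],
      slope = coef(fit)[[2]],
      R2 = summary(fit)$r.squared)
}

res_half <- summarize_pair(rep1, rep2_half)
res_mix <- summarize_pair(rep1, rep2_mix)
round(rbind(half_amount = res_half, mixup = res_mix), 3)
```

In both cases the median MRA is large, but only for the pair differing in sample amount does the R squared stay close to 1, because all analytes are changed by roughly the same factor.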

5.4 Evaluation of sample grouping with a principal component analysis

Finally, we perform a PCA to group samples based on their metabolic profile. We use all measured concentrations, even those outside the detection range of an analyte. Note that Biocrates provides an abundance estimate for every analyte, even if the value is outside the detection or quantification limits. For such values Biocrates does however not guarantee that they are correct.

Analytes with at least one missing value (i.e. a measurement dropped because of a technical problem) are subsequently removed.

conc_lcms <- concentrations(lcms, blessing = "none")
conc_fia <- concentrations(fia, blessing = "none")

conc_lcms <- na.omit(conc_lcms)
conc_fia <- na.omit(conc_fia)
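
As a minimal sketch of what this filtering step does, the toy matrix below (hypothetical values, analytes in rows and samples in columns, which is the layout implied by the later `t()` call) shows that base R's `na.omit` removes every row, i.e. every analyte, with at least one missing value.

```r
#' Hypothetical concentration matrix: analytes in rows, samples in columns
conc <- matrix(c(1.2, 3.4, 2.2,
                 0.8,  NA, 1.1,
                 5.0, 4.1, 4.7),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("Ala", "Arg", "Gly"), c("s1", "s2", "s3")))

#' A single NA is enough to drop the whole analyte (row)
conc_complete <- na.omit(conc)
rownames(conc_complete)

#' The removed rows are recorded in the "na.action" attribute
names(attr(conc_complete, "na.action"))
```

A single failed measurement thus excludes an analyte from the PCA for all samples.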

The PCA is based on the analytes listed in the table below.

Analytes used for the PCA
analyte         analyte_class         biochemical_name
Ala             aminoacids            Alanine
Asn             aminoacids            Asparagine
Cit             aminoacids            Citrulline
Creatinine      biogenic amines       Creatinine
Gln             aminoacids            Glutamine
Glu             aminoacids            Glutamate
Gly             aminoacids            Glycine
His             aminoacids            Histidine
Ile             aminoacids            Isoleucine
Kynurenine      biogenic amines       Kynurenine
Lys             aminoacids            Lysine
Met             aminoacids            Methionine
Phe             aminoacids            Phenylalanine
Pro             aminoacids            Proline
Ser             aminoacids            Serine
t4-OH-Pro       biogenic amines       trans-4-Hydroxyproline
Trp             aminoacids            Tryptophan
Tyr             aminoacids            Tyrosine
Val             aminoacids            Valine
C0              acylcarnitines        Carnitine
C10             acylcarnitines        Decanoylcarnitine
C10:2           acylcarnitines        Decadienylcarnitine
C12             acylcarnitines        Dodecanoylcarnitine
C12-DC          acylcarnitines        Dodecanedioylcarnitine
C14             acylcarnitines        Tetradecanoylcarnitine
C18:1-OH        acylcarnitines        Hydroxyoctadecenoylcarnitine
C18:2           acylcarnitines        Octadecadienylcarnitine
C2              acylcarnitines        Acetylcarnitine
C3              acylcarnitines        Propionylcarnitine
C8              acylcarnitines        Octanoylcarnitine
H1              sugars                Hexose
lysoPC a C16:0  glycerophospholipids  lysoPhosphatidylcholine acyl C16:0
lysoPC a C16:1  glycerophospholipids  lysoPhosphatidylcholine acyl C16:1
lysoPC a C17:0  glycerophospholipids  lysoPhosphatidylcholine acyl C17:0
lysoPC a C18:0  glycerophospholipids  lysoPhosphatidylcholine acyl C18:0
lysoPC a C18:1  glycerophospholipids  lysoPhosphatidylcholine acyl C18:1
lysoPC a C18:2  glycerophospholipids  lysoPhosphatidylcholine acyl C18:2
lysoPC a C20:3  glycerophospholipids  lysoPhosphatidylcholine acyl C20:3
lysoPC a C20:4  glycerophospholipids  lysoPhosphatidylcholine acyl C20:4
lysoPC a C24:0  glycerophospholipids  lysoPhosphatidylcholine acyl C24:0
lysoPC a C28:1  glycerophospholipids  lysoPhosphatidylcholine acyl C28:1
PC aa C24:0     glycerophospholipids  Phosphatidylcholine diacyl C24:0
PC aa C26:0     glycerophospholipids  Phosphatidylcholine diacyl C26:0
PC aa C28:1     glycerophospholipids  Phosphatidylcholine diacyl C28:1
PC aa C30:0     glycerophospholipids  Phosphatidylcholine diacyl C30:0
PC aa C32:0     glycerophospholipids  Phosphatidylcholine diacyl C32:0
PC aa C32:1     glycerophospholipids  Phosphatidylcholine diacyl C32:1
PC aa C32:2     glycerophospholipids  Phosphatidylcholine diacyl C32:2
PC aa C32:3     glycerophospholipids  Phosphatidylcholine diacyl C32:3
PC aa C34:1     glycerophospholipids  Phosphatidylcholine diacyl C34:1
PC aa C34:2     glycerophospholipids  Phosphatidylcholine diacyl C34:2
PC aa C34:3     glycerophospholipids  Phosphatidylcholine diacyl C34:3
PC aa C34:4     glycerophospholipids  Phosphatidylcholine diacyl C34:4
PC aa C36:1     glycerophospholipids  Phosphatidylcholine diacyl C36:1
PC aa C36:2     glycerophospholipids  Phosphatidylcholine diacyl C36:2
PC aa C36:3     glycerophospholipids  Phosphatidylcholine diacyl C36:3
PC aa C36:4     glycerophospholipids  Phosphatidylcholine diacyl C36:4
PC aa C36:5     glycerophospholipids  Phosphatidylcholine diacyl C36:5
PC aa C36:6     glycerophospholipids  Phosphatidylcholine diacyl C36:6
PC aa C38:0     glycerophospholipids  Phosphatidylcholine diacyl C38:0
PC aa C38:3     glycerophospholipids  Phosphatidylcholine diacyl C38:3
PC aa C38:4     glycerophospholipids  Phosphatidylcholine diacyl C38:4
PC aa C38:5     glycerophospholipids  Phosphatidylcholine diacyl C38:5
PC aa C38:6     glycerophospholipids  Phosphatidylcholine diacyl C38:6
PC aa C40:2     glycerophospholipids  Phosphatidylcholine diacyl C40:2
PC aa C40:3     glycerophospholipids  Phosphatidylcholine diacyl C40:3
PC aa C40:4     glycerophospholipids  Phosphatidylcholine diacyl C40:4
PC aa C40:5     glycerophospholipids  Phosphatidylcholine diacyl C40:5
PC aa C40:6     glycerophospholipids  Phosphatidylcholine diacyl C40:6
PC aa C42:0     glycerophospholipids  Phosphatidylcholine diacyl C42:0
PC aa C42:1     glycerophospholipids  Phosphatidylcholine diacyl C42:1
PC aa C42:2     glycerophospholipids  Phosphatidylcholine diacyl C42:2
PC aa C42:4     glycerophospholipids  Phosphatidylcholine diacyl C42:4
PC aa C42:5     glycerophospholipids  Phosphatidylcholine diacyl C42:5
PC aa C42:6     glycerophospholipids  Phosphatidylcholine diacyl C42:6
PC ae C30:0     glycerophospholipids  Phosphatidylcholine acyl-alkyl C30:0
PC ae C30:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C30:2
PC ae C32:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C32:1
PC ae C32:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C32:2
PC ae C34:0     glycerophospholipids  Phosphatidylcholine acyl-alkyl C34:0
PC ae C34:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C34:1
PC ae C34:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C34:2
PC ae C34:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C34:3
PC ae C36:0     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:0
PC ae C36:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:1
PC ae C36:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:2
PC ae C36:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:3
PC ae C36:4     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:4
PC ae C36:5     glycerophospholipids  Phosphatidylcholine acyl-alkyl C36:5
PC ae C38:0     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:0
PC ae C38:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:3
PC ae C38:4     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:4
PC ae C38:5     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:5
PC ae C38:6     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:6
PC ae C40:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:1
PC ae C40:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:2
PC ae C40:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:3
PC ae C40:4     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:4
PC ae C40:5     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:5
PC ae C40:6     glycerophospholipids  Phosphatidylcholine acyl-alkyl C40:6
PC ae C42:0     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:0
PC ae C42:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:1
PC ae C42:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:2
PC ae C42:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:3
PC ae C42:4     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:4
PC ae C42:5     glycerophospholipids  Phosphatidylcholine acyl-alkyl C42:5
PC ae C44:3     glycerophospholipids  Phosphatidylcholine acyl-alkyl C44:3
PC ae C44:4     glycerophospholipids  Phosphatidylcholine acyl-alkyl C44:4
PC ae C44:5     glycerophospholipids  Phosphatidylcholine acyl-alkyl C44:5
PC ae C44:6     glycerophospholipids  Phosphatidylcholine acyl-alkyl C44:6
SM (OH) C14:1   sphingolipids         Hydroxysphingomyeline C14:1
SM (OH) C16:1   sphingolipids         Hydroxysphingomyeline C16:1
SM (OH) C22:1   sphingolipids         Hydroxysphingomyeline C22:1
SM (OH) C22:2   sphingolipids         Hydroxysphingomyeline C22:2
SM (OH) C24:1   sphingolipids         Hydroxysphingomyeline C24:1
SM C16:0        sphingolipids         Sphingomyeline C16:0
SM C16:1        sphingolipids         Sphingomyeline C16:1
SM C18:0        sphingolipids         Sphingomyeline C18:0
SM C18:1        sphingolipids         Sphingomyeline C18:1
SM C24:0        sphingolipids         Sphingomyeline C24:0
SM C24:1        sphingolipids         Sphingomyeline C24:1

The table below lists the analytes that were excluded from the analysis.

Analytes excluded from the PCA
analyte         analyte_class         biochemical_name
alpha-AAA       biogenic amines       alpha-Aminoadipic acid
Arg             aminoacids            Arginine
Asp             aminoacids            Aspartate
Leu             aminoacids            Leucine
Orn             aminoacids            Ornithine
Putrescine      biogenic amines       Putrescine
Sarcosine       biogenic amines       Sarcosine
SDMA            biogenic amines       Symmetric dimethylarginine
Serotonin       biogenic amines       Serotonin
Spermidine      biogenic amines       Spermidine
Taurine         biogenic amines       Taurine
Thr             aminoacids            Threonine
C10:1           acylcarnitines        Decenoylcarnitine
C16:1           acylcarnitines        Hexadecenoylcarnitine
C4              acylcarnitines        Butyrylcarnitine
C6 (C4:1-DC)    acylcarnitines        Hexanoylcarnitine (Fumarylcarnitine)
PC aa C36:0     glycerophospholipids  Phosphatidylcholine diacyl C36:0
PC aa C40:1     glycerophospholipids  Phosphatidylcholine diacyl C40:1
PC ae C38:1     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:1
PC ae C38:2     glycerophospholipids  Phosphatidylcholine acyl-alkyl C38:2
SM C20:2        sphingolipids         Sphingomyeline C20:2

We then perform the PCA on the log2-transformed and (analyte-wise) centered abundances.

#' prcomp expects samples in rows, hence t(); analytes are centered, not scaled
pc_lcms <- prcomp(t(log2(conc_lcms)), scale. = FALSE, center = TRUE)
pc_fia <- prcomp(t(log2(conc_fia)), scale. = FALSE, center = TRUE)
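
The same call can be tried on simulated data to see what such a PCA returns. The sketch below is not the report's data: the matrix `conc`, the grouping into "study" and "QC" samples and the size of the shift between the two groups are all assumptions made for illustration.

```r
set.seed(42)
n_analytes <- 10
group <- rep(c("study", "QC"), times = c(9, 3))

#' Simulated log2 abundances: analytes in rows, samples in columns
log2_conc <- matrix(rnorm(n_analytes * length(group), mean = 5, sd = 0.2),
                    nrow = n_analytes,
                    dimnames = list(paste0("analyte", seq_len(n_analytes)),
                                    paste0("s", seq_along(group))))
#' Shift all analytes of the QC samples by 3 log2 units
log2_conc[, group == "QC"] <- log2_conc[, group == "QC"] + 3
conc <- 2^log2_conc

#' Same call as above: samples in rows, analyte-wise centering, no scaling
pc <- prcomp(t(log2(conc)), scale. = FALSE, center = TRUE)

#' Proportion of variance explained by each principal component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
round(100 * var_explained[1:2], 1)

#' PC1 scores per sample group
split(pc$x[, "PC1"], group)
```

Because the simulated QC samples differ from the study samples in all analytes, PC1 captures almost all of the variance and the two groups get PC1 scores of opposite sign.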

Figure 9: PCA of the normalized LCMS data
PCA was performed on all measured concentrations even if they were outside the detection range.

PC1 clearly separates the 00p180_QC2 samples from all other samples, while on PC2 the QC NIST STD and 00p180_QC1 samples separate from the QC CHRIS Pool and study samples, which, not surprisingly, overlap. Thus, the first principal components simply reflect differences between the artificial QC samples and the samples from the CHRIS study. Below we plot the results for the FIA data.

Figure 10: PCA of the normalized FIA data
PCA was performed on all measured concentrations even if they were outside the detection range.

PC1 separates the Biocrates QC samples 00p180_QC1 and 00p180_QC2 from all other samples. The samples separating from the main cloud on PC4 are from plates BPLT00000337, BPLT00000357, BPLT00000361 and BPLT00000413.
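
One possible way to identify such samples and their plates programmatically is sketched below. This is not necessarily how the plates above were determined: the scores, the plate names and the cut-off of 5 median absolute deviations are hypothetical and only illustrate the idea.

```r
#' Hypothetical scores of eight samples on one principal component
scores <- c(0.1, -0.2, 0.05, 3.1, 2.9, -0.1, 0.0, 3.3)
#' Hypothetical plate on which each sample was measured
plate <- c("P1", "P1", "P2", "P3", "P3", "P2", "P1", "P4")

#' Flag samples that are far from the main cloud, using a robust
#' center (median) and spread (MAD); the factor 5 is an arbitrary choice
center <- median(scores)
spread <- mad(scores)
outlying <- abs(scores - center) > 5 * spread

#' Number of outlying samples per plate
table(plate[outlying])
```

With these values the samples on the hypothetical plates P3 and P4 are flagged, while all samples from P1 and P2 stay within the main cloud.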

6 Session information

Data release information.

metadata(bcd)
##                         name                   value
## 1           db_creation_date     2020-04-01 09:19:31
## 2                  data_date                 2020-03
## 3              chris_release                     3.5
## 4                    version                   2.0.0
## 5            aminoacids_norm anlt_log2_QCs_mean_mean
## 6       biogenic_amines_norm anlt_log2_QCs_mean_mean
## 7        acylcarnitines_norm    anlt_log2_CHRIS_mean
## 8  glycerophospholipids_norm anlt_log2_QCs_mean_mean
## 9         sphingolipids_norm anlt_log2_QCs_mean_mean
## 10               sugars_norm anlt_log2_QCs_mean_mean

R session information.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                                             
##  version  R Under development (unstable) (2020-03-27 r78085)
##  os       Debian GNU/Linux 10 (buster)                      
##  system   x86_64, linux-gnu                                 
##  ui       X11                                               
##  language (EN)                                              
##  collate  en_US.UTF-8                                       
##  ctype    en_US.UTF-8                                       
##  tz       Etc/UTC                                           
##  date     2020-04-01                                        
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  affy                   1.65.1   2019-11-06 [1] Bioconductor  
##  affyio                 1.57.0   2019-10-29 [1] Bioconductor  
##  AnnotationDbi          1.49.1   2020-01-25 [1] Bioconductor  
##  AnnotationFilter     * 1.11.0   2019-10-29 [1] Bioconductor  
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.1.0)
##  backports              1.1.5    2019-10-02 [2] CRAN (R 4.1.0)
##  Biobase              * 2.47.3   2020-03-16 [1] Bioconductor  
##  BiocGenerics         * 0.33.3   2020-03-23 [1] Bioconductor  
##  BioCHRIStes          * 2.0.0    2020-04-01 [1] Bioconductor  
##  BiocManager          * 1.30.10  2019-11-16 [2] CRAN (R 4.1.0)
##  BiocParallel         * 1.21.2   2019-12-21 [1] Bioconductor  
##  BiocStyle            * 2.15.6   2020-02-01 [1] Bioconductor  
##  bit                    1.1-15.2 2020-02-10 [1] CRAN (R 4.0.0)
##  bit64                  0.9-7    2017-05-08 [1] CRAN (R 4.0.0)
##  bitops                 1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
##  blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.0)
##  bookdown               0.18     2020-03-05 [1] CRAN (R 4.0.0)
##  callr                  3.4.2    2020-02-12 [2] CRAN (R 4.1.0)
##  cli                    2.0.2    2020-02-28 [2] CRAN (R 4.1.0)
##  codetools              0.2-16   2018-12-24 [3] CRAN (R 4.1.0)
##  colorspace             1.4-1    2019-03-18 [1] CRAN (R 4.0.0)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.1.0)
##  DBI                  * 1.1.0    2019-12-15 [1] CRAN (R 4.0.0)
##  dbplyr                 1.4.2    2019-06-17 [1] CRAN (R 4.0.0)
##  DelayedArray         * 0.13.8   2020-03-26 [1] Bioconductor  
##  DEoptimR               1.0-8    2016-11-19 [1] CRAN (R 4.0.0)
##  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.1.0)
##  devtools               2.2.2    2020-02-17 [1] CRAN (R 4.0.0)
##  digest                 0.6.25   2020-02-23 [2] CRAN (R 4.1.0)
##  doParallel             1.0.15   2019-08-02 [1] CRAN (R 4.0.0)
##  dplyr                  0.8.5    2020-03-07 [1] CRAN (R 4.0.0)
##  ellipsis               0.3.0    2019-09-20 [2] CRAN (R 4.1.0)
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.1.0)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.1.0)
##  foreach                1.5.0    2020-03-30 [1] CRAN (R 4.1.0)
##  fs                     1.3.2    2020-03-05 [2] CRAN (R 4.1.0)
##  GenomeInfoDb         * 1.23.16  2020-03-27 [1] Bioconductor  
##  GenomeInfoDbData       1.2.2    2020-02-18 [1] Bioconductor  
##  GenomicRanges        * 1.39.3   2020-03-24 [1] Bioconductor  
##  ggplot2                3.3.0    2020-03-05 [1] CRAN (R 4.0.0)
##  glue                   1.3.2    2020-03-12 [2] CRAN (R 4.1.0)
##  gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
##  highr                  0.8      2019-03-20 [1] CRAN (R 4.0.0)
##  hms                    0.5.3    2020-01-08 [1] CRAN (R 4.0.0)
##  htmltools              0.4.0    2019-10-04 [2] CRAN (R 4.1.0)
##  impute                 1.61.0   2019-10-29 [1] Bioconductor  
##  IRanges              * 2.21.8   2020-03-25 [1] Bioconductor  
##  iterators              1.0.12   2019-07-26 [1] CRAN (R 4.0.0)
##  knitr                  1.28     2020-02-06 [1] CRAN (R 4.0.0)
##  lattice                0.20-40  2020-02-19 [3] CRAN (R 4.1.0)
##  lazyeval               0.2.2    2019-03-15 [2] CRAN (R 4.1.0)
##  lifecycle              0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
##  limma                  3.43.5   2020-03-06 [1] Bioconductor  
##  magick                 2.3      2020-01-24 [1] CRAN (R 4.0.0)
##  magrittr               1.5      2014-11-22 [2] CRAN (R 4.1.0)
##  MALDIquant             1.19.3   2019-05-12 [1] CRAN (R 4.0.0)
##  MASS                   7.3-51.5 2019-12-20 [3] CRAN (R 4.1.0)
##  MassSpecWavelet        1.53.0   2019-10-29 [1] Bioconductor  
##  Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.1.0)
##  matrixStats          * 0.56.0   2020-03-13 [1] CRAN (R 4.0.0)
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.1.0)
##  MSnbase              * 2.13.4   2020-03-24 [1] Bioconductor  
##  multtest               2.43.1   2020-03-12 [1] Bioconductor  
##  munsell                0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
##  mzID                   1.25.0   2019-10-29 [1] Bioconductor  
##  mzR                  * 2.21.1   2019-12-14 [1] Bioconductor  
##  ncdf4                  1.17     2019-10-23 [1] CRAN (R 4.0.0)
##  pander               * 0.6.3    2018-11-06 [1] CRAN (R 4.0.0)
##  pcaMethods             1.79.1   2019-11-03 [1] Bioconductor  
##  pillar                 1.4.3    2019-12-20 [1] CRAN (R 4.1.0)
##  pkgbuild               1.0.6    2019-10-09 [2] CRAN (R 4.1.0)
##  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
##  pkgload                1.0.2    2018-10-29 [2] CRAN (R 4.1.0)
##  plyr                   1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
##  preprocessCore         1.49.2   2020-02-01 [1] Bioconductor  
##  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.1.0)
##  processx               3.4.2    2020-02-09 [2] CRAN (R 4.1.0)
##  ProtGenerics         * 1.19.3   2019-12-25 [1] Bioconductor  
##  ps                     1.3.2    2020-02-13 [2] CRAN (R 4.1.0)
##  purrr                  0.3.3    2019-10-18 [2] CRAN (R 4.1.0)
##  R6                     2.4.1    2019-11-12 [2] CRAN (R 4.1.0)
##  RANN                   2.6.1    2019-01-08 [1] CRAN (R 4.0.0)
##  RColorBrewer         * 1.1-2    2014-12-07 [1] CRAN (R 4.0.0)
##  Rcpp                 * 1.0.4    2020-03-17 [2] CRAN (R 4.1.0)
##  RCurl                  1.98-1.1 2020-01-19 [1] CRAN (R 4.0.0)
##  remotes                2.1.1    2020-02-15 [1] CRAN (R 4.0.0)
##  rlang                  0.4.5    2020-03-01 [2] CRAN (R 4.1.0)
##  RMariaDB             * 1.0.8    2019-12-18 [1] CRAN (R 4.0.0)
##  rmarkdown            * 2.1      2020-01-20 [1] CRAN (R 4.0.0)
##  robustbase             0.93-6   2020-03-23 [1] CRAN (R 4.0.0)
##  rprojroot              1.3-2    2018-01-03 [2] CRAN (R 4.1.0)
##  RSQLite                2.2.0    2020-01-07 [1] CRAN (R 4.0.0)
##  S4Vectors            * 0.25.14  2020-03-24 [1] Bioconductor  
##  scales                 1.1.0    2019-11-18 [1] CRAN (R 4.0.0)
##  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.1.0)
##  stringi                1.4.6    2020-02-17 [2] CRAN (R 4.1.0)
##  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.1.0)
##  SummarizedExperiment * 1.17.5   2020-03-27 [1] Bioconductor  
##  survival               3.1-11   2020-03-07 [3] CRAN (R 4.1.0)
##  testthat               2.3.2    2020-03-02 [1] CRAN (R 4.0.0)
##  tibble                 3.0.0    2020-03-30 [1] CRAN (R 4.1.0)
##  tidyselect             1.0.0    2020-01-27 [1] CRAN (R 4.0.0)
##  usethis                1.5.1    2019-07-04 [2] CRAN (R 4.1.0)
##  vctrs                  0.2.4    2020-03-10 [1] CRAN (R 4.0.0)
##  vsn                    3.55.0   2019-10-29 [1] Bioconductor  
##  withr                  2.1.2    2018-03-15 [2] CRAN (R 4.1.0)
##  xcms                 * 3.9.3    2020-03-13 [1] Bioconductor  
##  xfun                   0.12     2020-01-13 [1] CRAN (R 4.0.0)
##  XML                    3.99-0.3 2020-01-20 [1] CRAN (R 4.0.0)
##  XVector                0.27.2   2020-03-24 [1] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
##  zlibbioc               1.33.1   2020-01-24 [1] Bioconductor  
## 
## [1] /usr/local/lib/R/host-site-library
## [2] /usr/local/lib/R/site-library
## [3] /usr/local/lib/R/library