Authors: Johannes Rainer, Vinicius Veri Hernandes and Sigurdur Smarason
Modified: 2020-04-01 11:21:02
Compiled: Wed Apr 1 11:21:06 2020

1 Introduction

In this document we evaluate the quality of the normalized Biocrates-based targeted metabolomics data for the CHRIS7500 data set and compare it to the quality before normalization. The quality assessment is performed separately for the LCMS and FIA data. The results from this document were used to define which of the replicated plates are part of the final data release.

Below we load all required packages and fetch the LCMS and the FIA data from the database.

library(BioCHRIStes)
library(xcms)
library(RColorBrewer)
library(pander)
library(RMariaDB)

bcd <- BioCHRIStes(dbConnect(MariaDB(), dbname = "biochristes_7500",
                             user = Sys.getenv("MYSQL_USER"),
                             pass = Sys.getenv("MYSQL_PASS"),
                             host = Sys.getenv("MYSQL_HOST")))

#' Load the raw data
lcms_raw <- fetchData(bcd, data = "lcms", raw = TRUE,
                      filter = ~ data_release == "CHRIS7500 for normalization")
fia_raw <- fetchData(bcd, data = "fia", raw = TRUE,
                     filter = ~ data_release == "CHRIS7500 for normalization")
#' Load the normalized data
lcms <- fetchData(bcd, data = "lcms", raw = FALSE,
                  filter = ~ data_release == "CHRIS7500 for normalization")
fia <- fetchData(bcd, data = "fia", raw = FALSE,
                 filter = ~ data_release == "CHRIS7500 for normalization")

#' Define colors for the different types of samples
cols <- brewer.pal(9, "Set1")[c(2, 3, 1, 5, 9)]
names(cols) <- c("QC NIST STD", "QC CHRIS Pool", "00p180_QC2",
                 "00p180_QC1", "study")

Next we add sample type information to the data. Only the code for the LCMS data is shown below.

lcms_raw$group <- "study"
lcms_raw$group[lcms_raw$sample_name == "QC NIST STD"] <- "QC NIST STD"
lcms_raw$group[lcms_raw$sample_name == "QC CHRIS Pool"] <- "QC CHRIS Pool"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC1"] <- "00p180_QC1"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC2"] <- "00p180_QC2"

#' "well" should be treated as a categorical variable
lcms_raw$well <- factor(lcms_raw$well)

#' Order the data by plate_barcode and injection index
lcms_raw <- lcms_raw[, order(lcms_raw$plate_barcode, lcms_raw$injection_idx)]

Finally we remove samples that have only missing values in both the LCMS and the FIA data (i.e. samples excluded because of technical problems such as failed injections).

all_na <- function(x) all(is.na(x))

na_lcms <- apply(concentrations(lcms), MARGIN = 2, all_na)
na_fia <- apply(concentrations(fia), MARGIN = 2, all_na)

lcms <- lcms[, !(na_lcms & na_fia)]
lcms_raw <- lcms_raw[, !(na_lcms & na_fia)]
fia <- fia[, !(na_lcms & na_fia)]
fia_raw <- fia_raw[, !(na_lcms & na_fia)]
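The filter above drops a sample only if all of its values are missing in both data sets. A minimal sketch of that logic on toy matrices follows; `lcms_toy`, `fia_toy` and `all_na_col` are hypothetical names and the values are invented for illustration.

```r
# Toy matrices: rows are analytes, columns are samples (hypothetical values)
lcms_toy <- matrix(c(1, 2, NA, NA, NA, NA), nrow = 2,
                   dimnames = list(c("a1", "a2"), c("s1", "s2", "s3")))
fia_toy <- matrix(c(3, 4, 5, 6, NA, NA), nrow = 2,
                  dimnames = list(c("b1", "b2"), c("s1", "s2", "s3")))

# TRUE for columns (samples) in which every value is missing
all_na_col <- function(x) apply(x, MARGIN = 2, function(z) all(is.na(z)))

# A sample is dropped only if it is all-NA in BOTH data sets
drop <- all_na_col(lcms_toy) & all_na_col(fia_toy)
kept <- colnames(lcms_toy)[!drop]
kept
```

In this toy case sample `s2` is kept because it still has FIA measurements, while `s3` is removed from both matrices.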

1.1 General data overview

Prior to quality assessment we provide a general data summary including the numbers of analytes, samples and plates.

General overview of the available data.
LCMS QC sample count	2440
LCMS study sample count	8140
LCMS plate count	102
LCMS replicated study samples	880
LCMS analyte count	42
FIA QC sample count	2440
FIA study sample count	8140
FIA plate count	102
FIA replicated study samples	880
FIA analyte count	146

The whole data set thus consists of 102 plates in total, with 11 plates being replicated.

2 Identification of potentially problematic samples

Before we evaluate the quality of the data we try to identify potentially problematic samples. These are samples with either a higher percentage of missing values or consistent differences in analyte concentrations (compared to other samples of the same sample type).

2.1 Samples with a high proportion of missing values

Below we determine the number of missing values per sample and calculate the RLA (relative log abundance) for each analyte across samples of the same sample type (being either the QC sample type or male/female study samples).

lcms_smpls_na <- apply(concentrations(lcms), 2, function(z) sum(is.na(z)))
fia_smpls_na <- apply(concentrations(fia), 2, function(z) sum(is.na(z)))

lcms_smpls_na_grp <- split(lcms_smpls_na, lcms$group)
fia_smpls_na_grp <- split(fia_smpls_na, fia$group)

Figure 1: Distribution of the number of missing values per sample for each sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean plus 3 standard deviations.

Potential outlier samples are identified as those with a number of missing values that is larger than the average number of missing values for that sample group plus 3 times its standard deviation. The outlier candidates are listed below.

Samples with a higher number of missing values compared to the average of the group. Shown are the 80% quantile of the number of missing values within the group and the number of missing values of the sample. (continued below)
sample_name	plate_name	well	group	80% NA count
00p180_QC1	BPLT00000281	14	00p180_QC1	3
QC NIST STD	BPLT00000281	62	QC NIST STD	15
00p180_QC1	BPLT00000281	14	00p180_QC1	3
QC NIST STD	BPLT00000281	62	QC NIST STD	15
QC CHRIS Pool	BPLT00000281	50	QC CHRIS Pool	13
QC CHRIS Pool	BPLT00000086	50	QC CHRIS Pool	13
00p180_QC2	BPLT00000086	26	00p180_QC2	1
00p180_QC2	BPLT00000086	67	00p180_QC2	1
00p180_QC2	BPLT00000086	96	00p180_QC2	1
0010111776	BPLT00000409	5	study	14
0010215416	BPLT00000409	24	study	14
QC NIST STD	BPLT00000233	62	QC NIST STD	49

NA count	data	idx
42	LCMS	1
42	LCMS	2
42	LCMS	3
42	LCMS	4
42	LCMS	5
42	LCMS	6
42	LCMS	7
42	LCMS	8
42	LCMS	9
17	LCMS	10
17	LCMS	11
146	FIA	12

The identified samples are mostly QC samples (especially in the LCMS data) and some study samples. Since the number of missing values is not extremely high for the study samples, we might still want to keep them. Below we define the index of the outlier samples, keeping the identified potential outlier study samples in the data set.

outl <- lapply(lcms_smpls_na_grp, function(z)
    z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, lcms$group)
lcms_na_out_idx <- unname(which(outl & lcms$group != "study"))

outl <- lapply(fia_smpls_na_grp, function(z)
    z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, fia$group)
fia_na_out_idx <- unname(which(outl & fia$group != "study"))
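The per-group rule used above (mean plus 3 standard deviations) can be illustrated on a small vector. This is only a sketch with invented counts; `na_count` and `group` are hypothetical stand-ins for the per-sample values and sample types.

```r
# Hypothetical per-sample NA counts and sample groups
na_count <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 40, 2, 3, 2, 3)
group <- rep(c("QC", "study"), c(12, 4))

# Flag samples exceeding mean + 3 * sd of their own group
flag <- unsplit(lapply(split(na_count, group),
                       function(z) z > mean(z) + 3 * sd(z)),
                group)
flagged_idx <- which(flag)
flagged_idx
```

Only the sample with 40 missing values is flagged here. Note that a single extreme value also inflates the group's standard deviation, so in small groups this rule can fail to flag anything.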

2.2 Samples with an on average different analyte concentration

Next we identify potentially problematic samples based on the measured analyte abundances relative to the average of that analyte across samples of the same group. Here we are specifically looking for samples that have considerably lower concentrations in most/all analytes (indicating potentially failed injections, globally lower amount of samples or similar).

The RLAs calculated below represent the log abundances of measurements relative to the median abundance within the same sample group (sample group being the sample type).

lcms_grp <- lcms$group
## lcms_grp[lcms_grp == "study"] <- as.character(lcms$sex[lcms_grp == "study"])
lcms_rla <- rowRla(concentrations(lcms), lcms_grp)
#' FIA
fia_grp <- fia$group
## fia_grp[fia_grp == "study"] <- as.character(fia$sex[fia_grp == "study"])
fia_rla <- rowRla(concentrations(fia), fia_grp)
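As an illustration of what a within-group RLA represents, the sketch below subtracts the per-analyte group median from the log abundances. `rla_sketch` is a hypothetical helper written for this example and not the `rowRla` implementation; the use of log2 is an assumption.

```r
# Sketch of a within-group relative log abundance (RLA) calculation.
# `rla_sketch` is hypothetical; log2 is assumed as the log base.
rla_sketch <- function(x, group) {
    lx <- log2(x)
    res <- lx
    for (g in unique(group)) {
        idx <- which(group == g)
        # per-analyte (row) median within the group
        med <- apply(lx[, idx, drop = FALSE], 1, median, na.rm = TRUE)
        res[, idx] <- lx[, idx, drop = FALSE] - med
    }
    res
}

# Toy data: 2 analytes (rows), 3 samples of the same type (columns)
conc <- matrix(c(4, 8, 16, 2, 2, 2), nrow = 2, byrow = TRUE,
               dimnames = list(c("a1", "a2"), c("s1", "s2", "s3")))
rla_toy <- rla_sketch(conc, c("QC", "QC", "QC"))
rla_toy
```

An RLA of -1 on this scale corresponds to half the median abundance of the group, which is why values below -1 are read as a more than two-fold lower abundance.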

The plot below shows the per-sample distribution of RLA values.

Figure 2: RLA plot for the LCMS (upper) and FIA data (lower plot)
Samples are colored by their sample type. Shown are within-sample-type RLAs. Sample type is either male or female study samples, or QC sample type. Grey coloring indicates the different plates.

The variance of RLA values is low for the LCMS data. In contrast, for the FIA data a considerable number of samples have a more than 2-fold lower average abundance.

Next we aggregate the RLA values per sample by calculating for each sample the median and the 80% and 90% quantiles of the RLA across all analytes.

lcms_rla_qnts <- t(apply(lcms_rla, MARGIN = 2, quantile,
                         probs = c(0.5, 0.8, 0.9), na.rm = TRUE))
fia_rla_qnts <- t(apply(fia_rla, MARGIN = 2, quantile,
                        probs = c(0.5, 0.8, 0.9), na.rm = TRUE))

The distribution of the 80% RLA quantile per sample for all sample types is shown below.

Figure 3: Distribution of the RLAs per sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean minus 3 standard deviations.

There seem to be some samples with an 80% RLA quantile lower than the mean minus 3 standard deviations (vertical lines in the plot above). With the exception of QC NIST STD and study samples, the 80% RLA quantile of these potential outliers is however close to 0, which argues against a strong systematic difference in concentrations. Based on the distributions above we will therefore mark QC NIST STD samples with an 80% RLA quantile < -0.4 and study samples with an 80% RLA quantile < -1 as potential outliers. Ideally we should repeat the measurement of the affected study samples, since we cannot rule out that these differences are real.

lcms_rla_out_idx <- which(lcms_rla_qnts[, "80%"] < -1 | (
    lcms_rla_qnts[, "80%"] < -0.4 & lcms$group == "QC NIST STD"))
fia_rla_out_idx <- which(fia_rla_qnts[, "80%"] < -1 | (
    fia_rla_qnts[, "80%"] < -0.4 & fia$group == "QC NIST STD"))

The study samples for which 80% of the analytes have an RLA smaller than -1 and the QC NIST STD samples with an 80% RLA quantile < -0.4 are listed in the table below (for both the LCMS and FIA data).

Potential outlier samples based on RLA values.
	sample_name	plate_name	well	80% RLA	data
14875	QC NIST STD	BPLT00000137	62	-0.5535	LCMS
14874	QC NIST STD	BPLT00000137	62	-0.5738	LCMS
14833	QC NIST STD	BPLT00000137	62	-0.5938	LCMS
14837	QC NIST STD	BPLT00000137	62	-0.5891	LCMS
14838	QC NIST STD	BPLT00000137	62	-0.5888	LCMS
23160	QC NIST STD	BPLT00000209	62	-0.5177	LCMS
23162	QC NIST STD	BPLT00000209	62	-0.4825	LCMS
23161	QC NIST STD	BPLT00000209	62	-0.4909	LCMS
23163	QC NIST STD	BPLT00000209	62	-0.4755	LCMS
23150	QC NIST STD	BPLT00000209	62	-0.4987	LCMS
6269	QC NIST STD	BPLT00000285	62	-0.6095	LCMS
6272	QC NIST STD	BPLT00000285	62	-0.5907	LCMS
6271	QC NIST STD	BPLT00000285	62	-0.5865	LCMS
6270	QC NIST STD	BPLT00000285	62	-0.6084	LCMS
6273	QC NIST STD	BPLT00000285	62	-0.6005	LCMS
15454	0010170742	BPLT00000129	93	-1.181	FIA
15477	0010029271	BPLT00000129	84	-1.288	FIA
20151	0010050687	BPLT00000057	39	-1.057	FIA
6619	0010016608	BPLT00000281	35	-1.481	FIA
22593	0010002492	BPLT00000249	32	-1.413	FIA
8835	0010125401	BPLT00000233	74	-1.73	FIA
17715	0010338094	BPLT00000093	72	-1.231	FIA
17260	0010236351	BPLT00000101	27	-1.388	FIA
16332	0010189496	BPLT00000113	8	-1.203	FIA
15676	0010249634	BPLT00000125	58	-1.227	FIA
13202	0010265431	BPLT00000165	87	-1.476	FIA
13190	0010216869	BPLT00000165	88	-1.471	FIA
13186	0010204360	BPLT00000165	60	-1.358	FIA
13420	0010266285	BPLT00000161	74	-2.561	FIA
15046	QC NIST STD	BPLT00000137	62	-0.5338	FIA
15045	QC NIST STD	BPLT00000137	62	-0.5467	FIA
15042	QC NIST STD	BPLT00000137	62	-0.4836	FIA
15043	QC NIST STD	BPLT00000137	62	-0.4865	FIA
15044	QC NIST STD	BPLT00000137	62	-0.4515	FIA
9258	0010206447	BPLT00000229	29	-1.05	FIA
9912	0010117224	BPLT00000221	5	-1.399	FIA
23025	0010364678	BPLT00000213	91	-1.702	FIA
19702	0010050995	BPLT00000065	10	-1.15	FIA
23277	QC NIST STD	BPLT00000209	62	-0.4742	FIA
23192	0010072017	BPLT00000209	52	-1.237	FIA
23279	QC NIST STD	BPLT00000209	62	-0.5357	FIA
23278	QC NIST STD	BPLT00000209	62	-0.515	FIA
23280	QC NIST STD	BPLT00000209	62	-0.5078	FIA
23276	QC NIST STD	BPLT00000209	62	-0.518	FIA
23405	0010035478	BPLT00000201	54	-1.179	FIA
11007	0010054356	BPLT00000193	82	-1.537	FIA
8552	0010194648	BPLT00000237	29	-1.018	FIA
8622	0010145197	BPLT00000237	65	-1.311	FIA
8597	0010084920	BPLT00000237	10	-1.331	FIA
8551	0010294559	BPLT00000237	82	-1.343	FIA
10177	0010013439	BPLT00000217	65	-1.347	FIA
10164	0010126780	BPLT00000217	10	-1.299	FIA
10199	0010230994	BPLT00000217	82	-1.205	FIA
10605	0010010220	BPLT00000205	15	-1.633	FIA
10638	0010261751	BPLT00000205	18	-1.376	FIA
12344	0010008924	BPLT00000177	3	-1.376	FIA
12342	0010066541	BPLT00000177	15	-1.068	FIA
12264	0010188438	BPLT00000177	82	-1.37	FIA
12295	0010253568	BPLT00000177	60	-1.042	FIA
12302	0010319987	BPLT00000177	84	-1.124	FIA
12107	0010137518	BPLT00000181	87	-1.151	FIA
21525	0010342281	BPLT00000041	87	-1.248	FIA
19933	0010278047	BPLT00000061	64	-1.083	FIA
20015	0010286121	BPLT00000061	43	-1.131	FIA
14282	0010068538	BPLT00000145	75	-1.296	FIA
14321	0010085866	BPLT00000145	80	-1.007	FIA
9683	0010094646	BPLT00000225	64	-1.429	FIA
9697	0010289563	BPLT00000225	42	-1.111	FIA
9458	0010094646	BPLT00000226	64	-1.157	FIA
11827	0010127859	BPLT00000182	30	-1.031	FIA
10414	0010010220	BPLT00000206	15	-1.124	FIA
10334	0010261751	BPLT00000206	18	-1.064	FIA
21337	0010342281	BPLT00000042	87	-1.105	FIA
6385	QC NIST STD	BPLT00000285	62	-0.538	FIA
6388	QC NIST STD	BPLT00000285	62	-0.6041	FIA
6387	QC NIST STD	BPLT00000285	62	-0.5857	FIA
6386	QC NIST STD	BPLT00000285	62	-0.5803	FIA
6389	QC NIST STD	BPLT00000285	62	-0.5677	FIA

Not unexpectedly, many RLA-based outlier samples are QC NIST STD samples. All potentially problematic study samples were identified in the FIA data. According to the table above, plate BPLT00000177 contributes the most (5), followed by plate BPLT00000237 (4) and plates BPLT00000217 and BPLT00000165 (3 each). The remaining samples are spread across different plates, hence there is no indication of a systematic problem with any single plate.

3 Identification of potentially problematic analytes

Next we evaluate variances of analyte measurements and numbers of missing values in QC samples to identify potentially problematic/noisy analytes. Missing values are measurements excluded due to technical problems or values outside the detection and quantification range. We calculate these values on the full data set, i.e. without excluding potentially bad samples from the previous section. The final evaluation of analyte qualities and the definition of the analytes to be flagged is performed in the define-data-release-analytes.Rmd vignette.

prop_na <- function(x, MARGIN = 1) {
    apply(x, MARGIN = MARGIN, function(z) sum(is.na(z)) / length(z))
}

anlts_lcms_na <- within_group_fun(concentrations(lcms), lcms$group, prop_na)
anlts_fia_na <- within_group_fun(concentrations(fia), fia$group, prop_na)

anlts_lcms_rsd <- within_group_fun(concentrations(lcms), lcms$group, rowRsd)
anlts_fia_rsd <- within_group_fun(concentrations(fia), fia$group, rowRsd)

Analytes are flagged as having bad quality if one of the following conditions matches:

- RSD across 00p180_QC1 samples is > 30% (technically too variable)
- RSD across 00p180_QC2 samples is > 30% (technically too variable)
- Percentage of missing values across study samples is > 99.9 (i.e. a concentration was measured in less than 5 individuals).

All analytes are supposed to be present in the 00p180 QC samples, and a high variability of measurements in them might indicate unstable estimation of concentrations.
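The flagging criteria can be sketched on per-analyte summary vectors. The vectors below are invented for illustration and expressed in percent; how the actual index of flagged analytes is derived in the vignette is not shown here, so this is only one possible way to do it.

```r
# Hypothetical per-analyte summaries (in %), not values from this data set
rsd_qc1  <- c(A = 10, B = 45, C = NA, D = 12)    # RSD in 00p180_QC1
rsd_qc2  <- c(A = 12, B = 20, C = 35, D = 8)     # RSD in 00p180_QC2
na_study <- c(A = 1,  B = 5,  C = 2,  D = 99.95) # % NA in study samples

# which() skips NA results, so an undefined RSD on its own
# does not flag an analyte
bad <- which(rsd_qc1 > 30 | rsd_qc2 > 30 | na_study > 99.9)
bad_names <- names(bad)
bad_names
```

Analyte `C` is still flagged through its second RSD value even though the first one is `NA`, because `NA | TRUE` evaluates to `TRUE` in R.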

The table below lists all analytes matching any of these criteria.

Analytes classified to have poor quality: RSD > 30% in 00p180_QC1 or 00p180_QC2 samples, or more than 99.9% missing values in study samples. (continued below)
	%NA.00p180_QC1	%NA.00p180_QC2	%NA.study
Ac-Orn	70.78	70.99	99.73
ADMA	2.353	0.3297	2.396
c4-OH-Pro	100	0.3297	100
Carnosine	1.569	0.3297	100
DOPA	0.3922	0.3297	99.93
Dopamine	0.3922	0.3297	99.99
Histamine	12.16	12.2	99.99
Met-SO	100	0.3297	99.99
Nitro-Tyr	2.353	0.3297	99.99
PEA	0.3922	0.3297	99.99
Spermine	1.373	0.6593	97.84
C12:1	95.88	94.73	98.13
C14:1	92.75	92.97	47.71
C14:1-OH	95.49	95.6	95.18
C14:2	93.53	94.07	75.33
C14:2-OH	95.49	95.49	95.39
C16	0	0	99.93
C16-OH	88.04	91.1	90.2
C16:1-OH	97.25	97.36	97.22
C16:2	90.78	89.45	41.33
C16:2-OH	93.92	93.41	81.88
C18	0	0	99.98
C18:1	69.22	73.19	4.877
C3-DC (C4-OH)	90.59	95.49	94.18
C3-OH	100	100	100
C3:1	98.82	99.01	98.99
C4:1	99.8	100	100
C5	0.9804	0.989	99.91
C5-DC (C6-OH)	92.55	91.76	89.55
C5-M-DC	95.88	96.04	96.3
C5-OH (C3-DC-M)	94.51	95.16	94
C5:1	99.8	100	100
C5:1-DC	99.8	100	99.98
C6:1	100	100	100
C7-DC	99.61	99.78	63.42
C9	99.8	100	99.99
lysoPC a C14:0	100	100	100
lysoPC a C26:0	99.41	98.9	98.71
lysoPC a C26:1	100	100	100
lysoPC a C28:0	99.41	99.12	98.57
PC aa C30:2	32.16	32.09	33.91
PC aa C38:1	3.725	3.077	2.948
PC ae C30:1	3.725	3.846	4.582
SM C22:3	13.33	16.37	54.55
SM C26:0	3.725	2.857	1.032
SM C26:1	4.51	4.615	1.609

	RSD.00p180_QC1	RSD.00p180_QC2	data
Ac-Orn	149.7	144.3	LCMS
ADMA	10.62	35.49	LCMS
c4-OH-Pro	NA	19.88	LCMS
Carnosine	8.345	12.96	LCMS
DOPA	82.76	80.19	LCMS
Dopamine	23.09	26.25	LCMS
Histamine	67.09	63.94	LCMS
Met-SO	NA	28.12	LCMS
Nitro-Tyr	13.24	16.04	LCMS
PEA	7.072	6.136	LCMS
Spermine	50.85	64.24	LCMS
C12:1	23.97	42.55	FIA
C14:1	89.01	79.49	FIA
C14:1-OH	140.4	14.52	FIA
C14:2	54.96	13.17	FIA
C14:2-OH	65.25	11.98	FIA
C16	14.89	16.04	FIA
C16-OH	81.54	11.86	FIA
C16:1-OH	66.56	9.354	FIA
C16:2	91.73	36.87	FIA
C16:2-OH	73.07	29.71	FIA
C18	18.78	18.54	FIA
C18:1	46.63	112.4	FIA
C3-DC (C4-OH)	43.38	25.47	FIA
C3-OH	NA	NA	FIA
C3:1	118.1	8.428	FIA
C4:1	NA	NA	FIA
C5	11.14	11.06	FIA
C5-DC (C6-OH)	61.44	15.5	FIA
C5-M-DC	114.7	5.037	FIA
C5-OH (C3-DC-M)	67.41	21.11	FIA
C5:1	NA	NA	FIA
C5:1-DC	NA	NA	FIA
C6:1	NA	NA	FIA
C7-DC	50.31	1.746	FIA
C9	NA	NA	FIA
lysoPC a C14:0	NA	NA	FIA
lysoPC a C26:0	33.66	49.78	FIA
lysoPC a C26:1	NA	NA	FIA
lysoPC a C28:0	18.8	32.46	FIA
PC aa C30:2	52.2	47.67	FIA
PC aa C38:1	43.77	42.6	FIA
PC ae C30:1	39.61	39.85	FIA
SM C22:3	98.04	98.74	FIA
SM C26:0	35.02	33.63	FIA
SM C26:1	33.22	31.92	FIA

Note that there might be better approaches to identify poor quality signals, e.g. comparing the actual analyte concentrations to the expected concentrations. Also, problematic analytes will be identified in a second Rmd file (define-data-release-analytes.Rmd) after removing poor quality samples and based only on one of the replicated plates.

4 Remove poor quality samples and analytes

Next we reduce the data set by removing samples that were identified above as potential outlier and poor quality samples. Also, we remove analytes considered to yield too noisy data.

lcms <- lcms[-lcms_anlts_out_idx,
             -unique(c(lcms_na_out_idx, lcms_rla_out_idx))]
lcms_raw <- lcms_raw[-lcms_anlts_out_idx,
                     -unique(c(lcms_na_out_idx, lcms_rla_out_idx))]
fia <- fia[-fia_anlts_out_idx,
           -unique(c(fia_na_out_idx, fia_rla_out_idx))]
fia_raw <- fia_raw[-fia_anlts_out_idx,
                   -unique(c(fia_na_out_idx, fia_rla_out_idx))]

All further data processing will be performed on the reduced data set.
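One caveat with negative indexing in R is worth keeping in mind when removing samples or analytes by index: if the index vector is empty, `-idx` is also empty and the subset then selects nothing instead of everything. The sketch below shows this on a plain matrix; whether the objects used in this document behave identically is an assumption, and `setdiff()` is simply one defensive alternative.

```r
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("s1", "s2", "s3")))
out_idx <- integer(0)                  # hypothetical: no outliers found

# Negative indexing with an empty vector drops ALL columns
n_neg <- ncol(m[, -out_idx, drop = FALSE])

# Defensive alternative: build the index of columns to keep
keep <- setdiff(seq_len(ncol(m)), out_idx)
n_keep <- ncol(m[, keep, drop = FALSE])

c(negative_index = n_neg, setdiff_index = n_keep)
```

With the `setdiff()` variant the same code works whether or not any outliers were identified.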

5 General quality assessment

First we evaluate the general quality of the LCMS and FIA data sets. This comprises plots of RLAs before and after normalization, the relative standard deviation (RSD) calculated for each analyte in QC samples (across all plates), and concentration ratios between replicated measurements.

Note that the quality assessment in this document is performed on all samples, also including the replicated ones.

5.1 Relative log abundances

First we calculate the relative log abundances (RLA) before and after normalization, using the sample type as groups within which the relative abundances are calculated.

## Defining the RLA groups
grp <- lcms$group
## grp[grp == "study"] <- as.character(lcms$sex)[grp == "study"]
lcms$rla_group <- grp
lcms_raw$rla_group <- grp

grp <- fia$group
## grp[grp == "study"] <- as.character(fia$sex)[grp == "study"]
fia$rla_group <- grp
fia_raw$rla_group <- grp

rla_lcms <- colMedians(rowRla(concentrations(lcms),
                              lcms$rla_group), na.rm = TRUE)
rla_lcms_raw <- colMedians(rowRla(concentrations(lcms_raw),
                                  lcms_raw$rla_group), na.rm = TRUE)
rla_fia <- colMedians(rowRla(concentrations(fia),
                             fia$rla_group), na.rm = TRUE)
rla_fia_raw <- colMedians(rowRla(concentrations(fia_raw),
                                 fia_raw$rla_group), na.rm = TRUE)

Figure 4: Per sample median RLA before (red) and after normalization (blue)
The grey rectangles indicate plates/batches.

Between-batch normalization reduced systematic differences in abundances, especially in the FIA data set, and centered the per-sample median RLAs around 0.

The RLA plots above were created on blessed signals, i.e. measurements that are within the detection and quantification range defined by Biocrates. Data analyses will however also be performed on signals outside this range. We thus create below RLA plots that consider all of the measured intensities.

Figure 5: Per sample median RLA before (red) and after normalization (blue) considering all measured intensities
The grey rectangles indicate plates/batches.

Normalization thus reduced the between-group differences considerably.

5.2 Relative standard deviation

Below we calculate RSD values for each analyte in each sample type. We consider only signal within the detection range for each analyte.

rsd_lcms <- within_group_fun(concentrations(lcms),
                             lcms$group, rowRsd)
rsd_fia <- within_group_fun(concentrations(fia),
                            fia$group, rowRsd)
rsd_lcms_raw <- within_group_fun(concentrations(lcms_raw),
                                 lcms_raw$group, rowRsd)
rsd_fia_raw <- within_group_fun(concentrations(fia_raw),
                                fia_raw$group, rowRsd)
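For reference, the RSD of an analyte is its standard deviation divided by its mean. The sketch below expresses it in percent on a toy matrix; `rsd_sketch` is a hypothetical helper written for this example and not the `rowRsd` function used above.

```r
# `rsd_sketch` is hypothetical: RSD (in %) per row (analyte)
rsd_sketch <- function(x) {
    100 * apply(x, 1, sd, na.rm = TRUE) / rowMeans(x, na.rm = TRUE)
}

# Toy QC measurements: 2 analytes (rows), 3 QC samples (columns)
qc <- matrix(c(9, 10, 11, 50, 100, 150), nrow = 2, byrow = TRUE,
             dimnames = list(c("a1", "a2"), NULL))
rsd_toy <- rsd_sketch(qc)
rsd_toy
rsd_toy > 30   # analytes exceeding the 30% cut-off
```

The first analyte has an RSD of 10% and the second one of 50%, so only the second would exceed the 30% threshold used in the tables of this section.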

Figure 6: Distribution of RSD values per sample type for the LCMS (left) and FIA (right) data

The table below summarizes the RSD (across analytes) for each sample type before and after normalization.

Median RSD (in %) and percentage of analytes with an RSD > 30% for the LCMS and FIA data sets before and after normalization.
	LCMS raw	LCMS	FIA raw	FIA
median RSD 00p180_QC1	7.651	4.628	22.96	9.141
median RSD 00p180_QC2	7.586	6.373	22.26	8.655
median RSD QC CHRIS Pool	6.937	4.505	23.13	6.723
median RSD QC NIST STD	9.85	9.231	25.78	10.96
median RSD study	22.28	21.95	36.03	28.05
% RSD > 30 00p180_QC1	0	0	14.41	0
% RSD > 30 00p180_QC2	0	0	13.51	0
% RSD > 30 QC CHRIS Pool	0	0	15.32	1.802
% RSD > 30 QC NIST STD	0	0	34.23	2.703
% RSD > 30 study	19.35	19.35	75.68	34.23

The median RSD for QC samples is about 9% or lower for the LCMS and about 11% or lower for the FIA data after normalization. Between-batch normalization improved the quality of the FIA data considerably, reducing the median RSD of QC samples from over 20% to about 10%. Note: only the RSD of QC NIST STD samples and study samples should be considered for the quality assessment as these were not used to estimate the batch effect. Also, importantly, normalization reduced the RSD of QC samples while having only a low impact on the RSD of study samples.

Since many analyses will also include measurements outside the detection range, we repeat the analysis considering all measurements.

rsd_lcms_all <- within_group_fun(
    concentrations(lcms, blessing = "none"),
    lcms$group, rowRsd)
rsd_fia_all <- within_group_fun(
    concentrations(fia, blessing = "none"),
    fia$group, rowRsd)
rsd_lcms_raw_all <- within_group_fun(
    concentrations(lcms_raw, blessing = "none"),
    lcms_raw$group, rowRsd)
rsd_fia_raw_all <- within_group_fun(
    concentrations(fia_raw, blessing = "none"),
    fia_raw$group, rowRsd)

Median RSD (in %) and percentage of analytes with an RSD > 30% for the LCMS and FIA data sets before and after normalization; all measurements also outside the detection range are used.
	LCMS raw	LCMS	FIA raw	FIA
median RSD 00p180_QC1	7.651	4.628	23.39	9.161
median RSD 00p180_QC2	7.586	6.373	23.26	8.689
median RSD QC CHRIS Pool	7.025	4.505	23.71	7.113
median RSD QC NIST STD	10.2	9.381	26.89	11.01
median RSD study	23.42	23.11	37.08	28.3
% RSD > 30 00p180_QC1	0	0	16.22	0.9009
% RSD > 30 00p180_QC2	0	0	16.22	0.9009
% RSD > 30 QC CHRIS Pool	3.226	0	18.02	1.802
% RSD > 30 QC NIST STD	6.452	6.452	39.64	2.703
% RSD > 30 study	32.26	32.26	82.88	35.14

Even when signals outside the detection range are considered, the median RSD of QC samples is about 11% or lower after normalization. Again, the improvement of the FIA data quality is impressive.

5.3 Maximum ratio of abundances for replicated measurements

As an additional quality criterion we evaluate the difference (ratio) in abundances of replicated measurements. To this end we first identify the replicated samples and subsequently determine the largest ratio of abundances measured for an analyte in the replicates of a sample. The percentage of such MRAs (maximum ratio of abundances) larger than 1.3 is used as a quality measure. This represents the percentage of analyte measurements for which concentrations differ by more than 30%.

#' Identify replicated study samples and subset the data
samp_cnt <- table(lcms$sample_name[lcms$group == "study"])
mult_ids <- names(samp_cnt)[samp_cnt > 1]

#' subsetting the data to replicated measurements only
lcms_repl <- lcms[, lcms$sample_name %in% mult_ids]

#' Define the function to calculate the MRA:
#' Returns the ratio between the max and the min value or
#' NA if less than 2 valid measurements are available.
mra <- function(x) {
    if (is.matrix(x)) {
        apply(x, MARGIN = 1, function(z) {
            z <- z[!is.na(z)]
            if (length(z) > 1) {
                rng <- range(z)
                rng[2] / rng[1]
            } else
                NA_real_
        })
    } else
        rep(NA_real_, length(x))
}

mra_lcms <- do.call(
    cbind, within_group_fun(concentrations(lcms_repl),
                            lcms_repl$sample_name, mra))
#' Remove columns (sample pairs) with only NAs.
mra_lcms <- mra_lcms[, prop_na(mra_lcms, 2) < 1]

The distribution of MRA values per replicated sample before and after normalization is shown below, first for the LCMS and subsequently for the FIA data. For easier visualization the data is log2 transformed. A log2 MRA of 0 represents identical concentrations, log2 MRA larger than 1 a more than two-fold difference in abundances.

Figure 7: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the LCMS data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.

Normalization reduced the MRAs for all replicates, and had its largest impact on the measurements from the first two replicated plates. Per-sample average log2 MRAs are, with the exception of some replicates, below 0.2 for all samples, which represents less than 15% difference in concentration. Based on the MRAs, observed differences of abundances in the LCMS data which are above ~ 1.5-fold (log2 MRA of 0.6) can be considered to be real while for smaller differences it is not possible to discriminate between technical and biological variance.
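The conversion between log2 MRA values and fold differences quoted above is simple arithmetic and can be checked directly. The values in `log2_mra` are the ones mentioned in the text plus the two-fold reference line.

```r
# log2 MRA values mentioned in the text: 0.2, 0.6 and the reference line at 1
log2_mra <- c(0.2, 0.6, 1)

# Back-transform to fold differences
fold <- 2^log2_mra
round(fold, 3)

# Percent difference between the larger and the smaller measurement
pct_diff <- (fold - 1) * 100
round(pct_diff, 1)
```

A log2 MRA of 0.2 corresponds to a fold difference of about 1.15 (just under 15%), and a log2 MRA of 0.6 to about 1.52-fold.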

Next we plot the MRAs before and after normalization for the FIA data.

Figure 8: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the FIA data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.

Differences between replicated measurements are much larger for the FIA than for the LCMS data. These differences could be reduced by the normalization, specifically for replicated plate pairs 1 (BPLT00000037/BPLT00000038), 4 (BPLT00000229/BPLT00000230) and 11 (BPLT00000049/BPLT00000050). For the FIA data even two-fold differences could be caused exclusively by technical variances. Thus, potential biologically relevant differences should be more than two-fold at least.

The table below summarizes the differences between replicated measurements. For each replicated individual the median MRA (across all analytes) was calculated, as well as the 80% and the 95% quantile. The mean of these across all replicated individuals is then reported and used as a quality criterion.

Summary of the maximum ratio of abundances (MRA) analysis. Mean of the median MRA per sample (across analytes), mean of the 80% and 95% quantiles of MRAs, number of samples with a median MRA larger than 1.3, and percentage of measurements (all analytes, all samples) with an MRA > 1.3.
	LCMS raw	LCMS	FIA raw	FIA
mean of median MRA	1.064	1.054	1.197	1.104
mean of 80% quantile MRA	1.113	1.092	1.374	1.175
mean of 95% quantile MRA	1.182	1.145	1.599	1.326
no. samples with median MRA > 1.3	0	0	104	19
% of measurements with MRA > 1.3	1.82	0.8768	28.7	8.61

On average, 80% of analytes have a less than 1.1-fold difference in concentration between replicates for the LCMS data and a less than 1.2-fold difference for the FIA data. The quality of the data is thus extremely good. Normalization improved data quality, specifically for the FIA data.

Despite the overall good quality of the data, some replicated measurements show (in some cases extremely) large differences. These are listed below.

Samples with a median MRA larger than 1.3. Shown are the sample name, the plates on which the sample was measured, the median MRA, the quartiles of the abundance ratios (AR), the intercept and slope of the linear model fitted to the replicates and the R squared. Samples are ordered by the median MRA in descending order. (continued below)
sample_name	plate_name	median MRA	data	AR 0%
0010133324	BPLT00000225, BPLT00000226	2.018	FIA	0.3591
0010071152	BPLT00000225, BPLT00000226	1.875	FIA	0.2786
0010222815	BPLT00000205, BPLT00000206	1.859	FIA	0.3554
0010306114	BPLT00000225, BPLT00000226	1.629	FIA	0.4419
0010324169	BPLT00000181, BPLT00000182	1.617	FIA	0.8972
0010148293	BPLT00000225, BPLT00000226	1.575	FIA	0.3779
0010103176	BPLT00000081, BPLT00000082	1.522	FIA	0.1071
0010339901	BPLT00000225, BPLT00000226	1.435	FIA	0.3876
0010063900	BPLT00000105, BPLT00000106	1.433	FIA	0.4846
0010039436	BPLT00000205, BPLT00000206	1.431	FIA	0.5496
0010226390	BPLT00000205, BPLT00000206	1.408	FIA	0.5034
0010176777	BPLT00000041, BPLT00000042	1.377	FIA	0.5573
0010179082	BPLT00000049, BPLT00000050	1.372	FIA	0.8236
0010167974	BPLT00000225, BPLT00000226	1.361	FIA	0.5553
0010291696	BPLT00000241, BPLT00000242	1.343	FIA	0.535
0010281876	BPLT00000225, BPLT00000226	1.332	FIA	0.367
0010271305	BPLT00000225, BPLT00000226	1.319	FIA	0.5374
0010290792	BPLT00000241, BPLT00000242	1.31	FIA	0.505
0010252353	BPLT00000049, BPLT00000050	1.308	FIA	0.8679

AR 25%	AR 50%	AR 75%	AR 100%	int	slope	R2
0.4715	0.4954	0.5607	1.499	14.05	1.025	0.9886
0.4955	0.5334	0.6426	1.213	18	1.024	0.9912
0.4962	0.5381	0.6681	1.061	11.67	1.103	0.9899
0.5638	0.6145	0.6867	2.035	7.977	0.8823	0.9936
1.416	1.617	1.693	2.265	-10.53	1.004	0.9899
0.6015	0.635	0.6939	1.193	8.326	1.03	0.9956
0.5958	0.657	0.7359	1.07	7.748	1.204	0.9958
0.649	0.6967	0.7571	1.136	7.642	1.019	0.9965
0.6725	0.6977	0.7629	1.172	7.783	0.9383	0.9964
0.671	0.6989	0.82	1.149	5.599	1.132	0.9981
0.6745	0.7122	0.8478	2.43	5.061	1.082	0.9983
0.6621	0.7264	0.8964	1.157	5.293	0.9386	0.9961
1.233	1.372	1.572	2.08	-6.964	0.8458	0.9963
0.7154	0.7373	0.8134	1.765	8.407	1.025	0.9942
0.702	0.7447	0.7885	1.325	4.283	1.231	0.9983
0.7171	0.7509	0.8256	1.214	5.779	1.035	0.998
0.7367	0.7586	0.8331	1.332	7.668	0.9963	0.9958
0.7266	0.7634	0.8089	1.056	7.003	1.105	0.9985
1.065	1.308	1.412	2.147	-12.05	0.9723	0.9944

Not a single replicate pair in the LCMS data set had a median MRA larger than 1.3. For the FIA data, 20 replicate pairs had a median MRA larger than 1.3 but, with the exception of a single sample, smaller than 2-fold. The largest group of these samples (8) is from the replicated plates BPLT00000225/BPLT00000226.
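The summary statistics reported in the table above can be computed for a single replicate pair along the following lines. This is a minimal sketch on made-up concentrations. It assumes that the abundance ratio (AR) is the per-analyte ratio between the two replicates, that the MRA is the larger of AR and 1/AR, and that the linear model regresses one replicate on the other on the raw concentration scale (the scale used in this report is not stated in this section). The helper `summarize_pair` is hypothetical and not part of any package.

```r
#' Summarize the agreement between two replicated measurements of a sample.
#' `x` and `y` are the concentrations of the same analytes in replicate 1
#' and replicate 2 (hypothetical helper, for illustration only).
summarize_pair <- function(x, y) {
    ar <- x / y                      # abundance ratio per analyte
    mra <- pmax(ar, 1 / ar)          # fold difference, always >= 1
    fit <- lm(y ~ x)                 # linear model fitted to the replicates
    qs <- quantile(ar, probs = c(0, 0.25, 0.5, 0.75, 1))
    names(qs) <- paste("AR", names(qs))
    c("median MRA" = median(mra), qs,
      int = unname(coef(fit)[1]),
      slope = unname(coef(fit)[2]),
      R2 = summary(fit)$r.squared)
}

#' Made-up example: the second replicate has roughly half the concentration
x <- c(10, 20, 40, 80, 160)
y <- c(5.2, 9.7, 20.5, 39.0, 81.0)
res <- summarize_pair(x, y)
round(res, 3)

#' Flag the pair if its median MRA exceeds the cutoff used in the table
res[["median MRA"]] > 1.3
```

A pair like this one would be flagged because of its roughly 2-fold difference, even though the two replicates are almost perfectly linearly related.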

The MRA reports only absolute differences and thus cannot distinguish between sample mixups and differences in sample amount (or failing injections). The very large R squared values, however, argue against sample mixups. The differences are therefore most likely caused by differences in absolute concentrations.
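This reasoning can be illustrated with simulated data. The sketch below is a toy example under stated assumptions (normally distributed log2 abundances, arbitrary noise levels, hypothetical object names, model fitted on the log2 scale) and not an analysis of the CHRIS data. It compares a replicate measured at half the sample amount with a "replicate" that is in fact an unrelated sample.

```r
set.seed(123)
n <- 100
#' Hypothetical log2 abundances: analytes span a wide concentration range
#' and each sample deviates from the analyte-specific average
mu <- seq(2, 10, length.out = n)
sample_a <- mu + rnorm(n, sd = 1)
sample_b <- mu + rnorm(n, sd = 1)    # an unrelated sample

#' Replicate of sample A measured at half the amount (-1 on the log2 scale)
rep_diluted <- sample_a - 1 + rnorm(n, sd = 0.05)
#' "Replicate" that is actually sample B, i.e. a sample mixup
rep_mixup <- sample_b + rnorm(n, sd = 0.05)

#' Median MRA and R squared for a pair of log2 abundance vectors
compare_pair <- function(x, y) {
    c(median_MRA = median(2^abs(x - y)),
      R2 = summary(lm(y ~ x))$r.squared)
}
res_diluted <- compare_pair(sample_a, rep_diluted)
res_mixup <- compare_pair(sample_a, rep_mixup)
rbind(diluted = res_diluted, mixup = res_mixup)
```

In this toy setting both pairs exceed a median MRA of 1.3, but the R squared stays close to 1 only for the diluted replicate, which is the pattern the argument above relies on.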

5.4 Evaluation of sample grouping with a principal component analysis

Finally, we perform a PCA to group samples based on their metabolic profile. We use all measured concentrations, even if they are outside the detection range of an analyte. Note that Biocrates provides an abundance estimate for all analytes even if they are outside the detection or quantification limit. For values outside of the detection and quantification range Biocrates does however not guarantee that they are correct.

All analytes with at least one missing value (i.e. a measurement dropped because of a technical problem) are subsequently removed.

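#' Extract the concentrations, including values outside the detection range
conc_lcms <- concentrations(lcms, blessing = "none")
conc_fia <- concentrations(fia, blessing = "none")

#' Remove analytes (rows) with at least one missing value
conc_lcms <- na.omit(conc_lcms)
conc_fia <- na.omit(conc_fia)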
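Applied to a matrix, `na.omit` removes every row that contains at least one missing value. The toy matrix below (made-up values, with analytes in rows and samples in columns, which is the layout assumed here) illustrates the effect.

```r
#' Toy concentration matrix: analytes in rows, samples in columns
conc <- matrix(c(1.2, 3.4, 2.2,
                 0.5, NA,  0.7,
                 8.1, 7.9, 8.4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("Ala", "Arg", "Gly"),
                               c("s1", "s2", "s3")))

#' Drop all analytes with at least one missing value
conc_complete <- na.omit(conc)
rownames(conc_complete)

#' The removed rows are recorded in the "na.action" attribute
names(attr(conc_complete, "na.action"))
```

A single missing measurement is thus enough to exclude an analyte from all samples, which keeps the input of the PCA complete.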

The PCA analysis is based on the analytes listed in the table below.

Analytes used for the PCA analysis
analyte	analyte_class	biochemical_name
Ala	aminoacids	Alanine
Asn	aminoacids	Asparagine
Cit	aminoacids	Citrulline
Creatinine	biogenic amines	Creatinine
Gln	aminoacids	Glutamine
Glu	aminoacids	Glutamate
Gly	aminoacids	Glycine
His	aminoacids	Histidine
Ile	aminoacids	Isoleucine
Kynurenine	biogenic amines	Kynurenine
Lys	aminoacids	Lysine
Met	aminoacids	Methionine
Phe	aminoacids	Phenylalanine
Pro	aminoacids	Proline
Ser	aminoacids	Serine
t4-OH-Pro	biogenic amines	trans-4-Hydroxyproline
Trp	aminoacids	Tryptophan
Tyr	aminoacids	Tyrosine
Val	aminoacids	Valine
C0	acylcarnitines	Carnitine
C10	acylcarnitines	Decanoylcarnitine
C10:2	acylcarnitines	Decadienylcarnitine
C12	acylcarnitines	Dodecanoylcarnitine
C12-DC	acylcarnitines	Dodecanedioylcarnitine
C14	acylcarnitines	Tetradecanoylcarnitine
C18:1-OH	acylcarnitines	Hydroxyoctadecenoylcarnitine
C18:2	acylcarnitines	Octadecadienylcarnitine
C2	acylcarnitines	Acetylcarnitine
C3	acylcarnitines	Propionylcarnitine
C8	acylcarnitines	Octanoylcarnitine
H1	sugars	Hexose
lysoPC a C16:0	glycerophospholipids	lysoPhosphatidylcholine acyl C16:0
lysoPC a C16:1	glycerophospholipids	lysoPhosphatidylcholine acyl C16:1
lysoPC a C17:0	glycerophospholipids	lysoPhosphatidylcholine acyl C17:0
lysoPC a C18:0	glycerophospholipids	lysoPhosphatidylcholine acyl C18:0
lysoPC a C18:1	glycerophospholipids	lysoPhosphatidylcholine acyl C18:1
lysoPC a C18:2	glycerophospholipids	lysoPhosphatidylcholine acyl C18:2
lysoPC a C20:3	glycerophospholipids	lysoPhosphatidylcholine acyl C20:3
lysoPC a C20:4	glycerophospholipids	lysoPhosphatidylcholine acyl C20:4
lysoPC a C24:0	glycerophospholipids	lysoPhosphatidylcholine acyl C24:0
lysoPC a C28:1	glycerophospholipids	lysoPhosphatidylcholine acyl C28:1
PC aa C24:0	glycerophospholipids	Phosphatidylcholine diacyl C24:0
PC aa C26:0	glycerophospholipids	Phosphatidylcholine diacyl C26:0
PC aa C28:1	glycerophospholipids	Phosphatidylcholine diacyl C28:1
PC aa C30:0	glycerophospholipids	Phosphatidylcholine diacyl C30:0
PC aa C32:0	glycerophospholipids	Phosphatidylcholine diacyl C32:0
PC aa C32:1	glycerophospholipids	Phosphatidylcholine diacyl C32:1
PC aa C32:2	glycerophospholipids	Phosphatidylcholine diacyl C32:2
PC aa C32:3	glycerophospholipids	Phosphatidylcholine diacyl C32:3
PC aa C34:1	glycerophospholipids	Phosphatidylcholine diacyl C34:1
PC aa C34:2	glycerophospholipids	Phosphatidylcholine diacyl C34:2
PC aa C34:3	glycerophospholipids	Phosphatidylcholine diacyl C34:3
PC aa C34:4	glycerophospholipids	Phosphatidylcholine diacyl C34:4
PC aa C36:1	glycerophospholipids	Phosphatidylcholine diacyl C36:1
PC aa C36:2	glycerophospholipids	Phosphatidylcholine diacyl C36:2
PC aa C36:3	glycerophospholipids	Phosphatidylcholine diacyl C36:3
PC aa C36:4	glycerophospholipids	Phosphatidylcholine diacyl C36:4
PC aa C36:5	glycerophospholipids	Phosphatidylcholine diacyl C36:5
PC aa C36:6	glycerophospholipids	Phosphatidylcholine diacyl C36:6
PC aa C38:0	glycerophospholipids	Phosphatidylcholine diacyl C38:0
PC aa C38:3	glycerophospholipids	Phosphatidylcholine diacyl C38:3
PC aa C38:4	glycerophospholipids	Phosphatidylcholine diacyl C38:4
PC aa C38:5	glycerophospholipids	Phosphatidylcholine diacyl C38:5
PC aa C38:6	glycerophospholipids	Phosphatidylcholine diacyl C38:6
PC aa C40:2	glycerophospholipids	Phosphatidylcholine diacyl C40:2
PC aa C40:3	glycerophospholipids	Phosphatidylcholine diacyl C40:3
PC aa C40:4	glycerophospholipids	Phosphatidylcholine diacyl C40:4
PC aa C40:5	glycerophospholipids	Phosphatidylcholine diacyl C40:5
PC aa C40:6	glycerophospholipids	Phosphatidylcholine diacyl C40:6
PC aa C42:0	glycerophospholipids	Phosphatidylcholine diacyl C42:0
PC aa C42:1	glycerophospholipids	Phosphatidylcholine diacyl C42:1
PC aa C42:2	glycerophospholipids	Phosphatidylcholine diacyl C42:2
PC aa C42:4	glycerophospholipids	Phosphatidylcholine diacyl C42:4
PC aa C42:5	glycerophospholipids	Phosphatidylcholine diacyl C42:5
PC aa C42:6	glycerophospholipids	Phosphatidylcholine diacyl C42:6
PC ae C30:0	glycerophospholipids	Phosphatidylcholine acyl-alkyl C30:0
PC ae C30:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C30:2
PC ae C32:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C32:1
PC ae C32:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C32:2
PC ae C34:0	glycerophospholipids	Phosphatidylcholine acyl-alkyl C34:0
PC ae C34:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C34:1
PC ae C34:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C34:2
PC ae C34:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C34:3
PC ae C36:0	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:0
PC ae C36:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:1
PC ae C36:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:2
PC ae C36:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:3
PC ae C36:4	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:4
PC ae C36:5	glycerophospholipids	Phosphatidylcholine acyl-alkyl C36:5
PC ae C38:0	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:0
PC ae C38:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:3
PC ae C38:4	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:4
PC ae C38:5	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:5
PC ae C38:6	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:6
PC ae C40:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:1
PC ae C40:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:2
PC ae C40:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:3
PC ae C40:4	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:4
PC ae C40:5	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:5
PC ae C40:6	glycerophospholipids	Phosphatidylcholine acyl-alkyl C40:6
PC ae C42:0	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:0
PC ae C42:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:1
PC ae C42:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:2
PC ae C42:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:3
PC ae C42:4	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:4
PC ae C42:5	glycerophospholipids	Phosphatidylcholine acyl-alkyl C42:5
PC ae C44:3	glycerophospholipids	Phosphatidylcholine acyl-alkyl C44:3
PC ae C44:4	glycerophospholipids	Phosphatidylcholine acyl-alkyl C44:4
PC ae C44:5	glycerophospholipids	Phosphatidylcholine acyl-alkyl C44:5
PC ae C44:6	glycerophospholipids	Phosphatidylcholine acyl-alkyl C44:6
SM (OH) C14:1	sphingolipids	Hydroxysphingomyeline C14:1
SM (OH) C16:1	sphingolipids	Hydroxysphingomyeline C16:1
SM (OH) C22:1	sphingolipids	Hydroxysphingomyeline C22:1
SM (OH) C22:2	sphingolipids	Hydroxysphingomyeline C22:2
SM (OH) C24:1	sphingolipids	Hydroxysphingomyeline C24:1
SM C16:0	sphingolipids	Sphingomyeline C16:0
SM C16:1	sphingolipids	Sphingomyeline C16:1
SM C18:0	sphingolipids	Sphingomyeline C18:0
SM C18:1	sphingolipids	Sphingomyeline C18:1
SM C24:0	sphingolipids	Sphingomyeline C24:0
SM C24:1	sphingolipids	Sphingomyeline C24:1

The table below lists the analytes that were excluded from the analysis.

Analytes excluded from the PCA analysis
analyte	analyte_class	biochemical_name
alpha-AAA	biogenic amines	alpha-Aminoadipic acid
Arg	aminoacids	Arginine
Asp	aminoacids	Aspartate
Leu	aminoacids	Leucine
Orn	aminoacids	Ornithine
Putrescine	biogenic amines	Putrescine
Sarcosine	biogenic amines	Sarcosine
SDMA	biogenic amines	Symmetric dimethylarginine
Serotonin	biogenic amines	Serotonin
Spermidine	biogenic amines	Spermidine
Taurine	biogenic amines	Taurine
Thr	aminoacids	Threonine
C10:1	acylcarnitines	Decenoylcarnitine
C16:1	acylcarnitines	Hexadecenoylcarnitine
C4	acylcarnitines	Butyrylcarnitine
C6 (C4:1-DC)	acylcarnitines	Hexanoylcarnitine (Fumarylcarnitine)
PC aa C36:0	glycerophospholipids	Phosphatidylcholine diacyl C36:0
PC aa C40:1	glycerophospholipids	Phosphatidylcholine diacyl C40:1
PC ae C38:1	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:1
PC ae C38:2	glycerophospholipids	Phosphatidylcholine acyl-alkyl C38:2
SM C20:2	sphingolipids	Sphingomyeline C20:2

Finally, we perform the PCA on the log2-transformed and (analyte-wise) centered abundances.

#' Samples have to be in rows for prcomp, hence the t()
pc_lcms <- prcomp(t(log2(conc_lcms)), scale. = FALSE, center = TRUE)
pc_fia <- prcomp(t(log2(conc_fia)), scale. = FALSE, center = TRUE)
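How such a PCA result can be inspected is sketched below on simulated data. The matrix, the group labels and the two colors are made up for illustration (the report itself uses the `cols` vector defined in the introduction), and only base R functions are used. The plotting code is an assumption about how figures of this kind can be drawn, not the code behind the figures shown here.

```r
set.seed(42)
#' Simulated concentrations: 20 analytes (rows) x 12 samples (columns)
conc <- matrix(2^rnorm(20 * 12, mean = 5, sd = 0.3), nrow = 20,
               dimnames = list(paste0("analyte", 1:20), paste0("s", 1:12)))
grp <- rep(c("study", "QC"), c(9, 3))
#' Give the QC samples a distinct profile in the first 10 analytes
conc[1:10, grp == "QC"] <- conc[1:10, grp == "QC"] * 4

#' PCA on log2 abundances, centered per analyte but not scaled
pc <- prcomp(t(log2(conc)), scale. = FALSE, center = TRUE)
#' Percentage of variance explained by each principal component
pc_var <- 100 * pc$sdev^2 / sum(pc$sdev^2)

#' Plot the first two components, colored by sample group
grp_cols <- c(study = "#999999", QC = "#E41A1C")
plot(pc$x[, 1], pc$x[, 2], pch = 16, col = grp_cols[grp],
     xlab = sprintf("PC1: %.1f%%", pc_var[1]),
     ylab = sprintf("PC2: %.1f%%", pc_var[2]))
legend("topright", legend = names(grp_cols), col = grp_cols, pch = 16)
```

Because the simulated QC samples differ strongly from the study samples, they separate on PC1, similar to the separation of QC and study samples described for the real data.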

Figure 9: PCA of the normalized LCMS data
PCA was performed on all measured concentrations even if they were outside the detection range.

PC1 clearly separates the 00p180_QC2 samples from all other samples, while on PC2 the QC NIST STD and 00p180_QC1 samples separate from the QC CHRIS Pool and study samples, which, not surprisingly, overlap. The first principal components thus simply reflect differences between the artificial QC samples and the samples from the CHRIS study. Below we plot the results for the FIA data.

Figure 10: PCA of the normalized FIA data
PCA was performed on all measured concentrations even if they were outside the detection range.

PC1 separates the Biocrates QC samples 00p180_QC1 and 00p180_QC2 from all other samples. The samples separating from the main cloud on PC4 are from plates BPLT00000337, BPLT00000357, BPLT00000361 and BPLT00000413.
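Which plates such outlying samples belong to can be looked up by flagging samples with extreme scores on a component and tabulating their plate annotation. The sketch below does this on simulated scores. The cutoff of 5 median absolute deviations, the object names and the toy plate labels are assumptions for illustration and not the criterion used in this report.

```r
set.seed(1)
#' Simulated PC4 scores for 200 samples measured on 10 hypothetical plates
plate <- rep(paste0("plate_", 1:10), each = 20)
pc4 <- rnorm(200, sd = 0.5)
#' Shift the samples of one plate away from the main cloud
pc4[plate == "plate_3"] <- pc4[plate == "plate_3"] + 8

#' Robust z-score based on the median and the median absolute deviation
z <- (pc4 - median(pc4)) / mad(pc4)
outl <- abs(z) > 5

#' Number of flagged samples per plate
table(plate[outl])
```

Using the median and the MAD instead of mean and standard deviation keeps the cutoff from being inflated by the outlying samples themselves.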

6 Session information

Data release information.

metadata(bcd)

##                         name                   value
## 1           db_creation_date     2020-04-01 09:19:31
## 2                  data_date                 2020-03
## 3              chris_release                     3.5
## 4                    version                   2.0.0
## 5            aminoacids_norm anlt_log2_QCs_mean_mean
## 6       biogenic_amines_norm anlt_log2_QCs_mean_mean
## 7        acylcarnitines_norm    anlt_log2_CHRIS_mean
## 8  glycerophospholipids_norm anlt_log2_QCs_mean_mean
## 9         sphingolipids_norm anlt_log2_QCs_mean_mean
## 10               sugars_norm anlt_log2_QCs_mean_mean

R session information.

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                                             
##  version  R Under development (unstable) (2020-03-27 r78085)
##  os       Debian GNU/Linux 10 (buster)                      
##  system   x86_64, linux-gnu                                 
##  ui       X11                                               
##  language (EN)                                              
##  collate  en_US.UTF-8                                       
##  ctype    en_US.UTF-8                                       
##  tz       Etc/UTC                                           
##  date     2020-04-01                                        
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  affy                   1.65.1   2019-11-06 [1] Bioconductor  
##  affyio                 1.57.0   2019-10-29 [1] Bioconductor  
##  AnnotationDbi          1.49.1   2020-01-25 [1] Bioconductor  
##  AnnotationFilter     * 1.11.0   2019-10-29 [1] Bioconductor  
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.1.0)
##  backports              1.1.5    2019-10-02 [2] CRAN (R 4.1.0)
##  Biobase              * 2.47.3   2020-03-16 [1] Bioconductor  
##  BiocGenerics         * 0.33.3   2020-03-23 [1] Bioconductor  
##  BioCHRIStes          * 2.0.0    2020-04-01 [1] Bioconductor  
##  BiocManager          * 1.30.10  2019-11-16 [2] CRAN (R 4.1.0)
##  BiocParallel         * 1.21.2   2019-12-21 [1] Bioconductor  
##  BiocStyle            * 2.15.6   2020-02-01 [1] Bioconductor  
##  bit                    1.1-15.2 2020-02-10 [1] CRAN (R 4.0.0)
##  bit64                  0.9-7    2017-05-08 [1] CRAN (R 4.0.0)
##  bitops                 1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
##  blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.0)
##  bookdown               0.18     2020-03-05 [1] CRAN (R 4.0.0)
##  callr                  3.4.2    2020-02-12 [2] CRAN (R 4.1.0)
##  cli                    2.0.2    2020-02-28 [2] CRAN (R 4.1.0)
##  codetools              0.2-16   2018-12-24 [3] CRAN (R 4.1.0)
##  colorspace             1.4-1    2019-03-18 [1] CRAN (R 4.0.0)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.1.0)
##  DBI                  * 1.1.0    2019-12-15 [1] CRAN (R 4.0.0)
##  dbplyr                 1.4.2    2019-06-17 [1] CRAN (R 4.0.0)
##  DelayedArray         * 0.13.8   2020-03-26 [1] Bioconductor  
##  DEoptimR               1.0-8    2016-11-19 [1] CRAN (R 4.0.0)
##  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.1.0)
##  devtools               2.2.2    2020-02-17 [1] CRAN (R 4.0.0)
##  digest                 0.6.25   2020-02-23 [2] CRAN (R 4.1.0)
##  doParallel             1.0.15   2019-08-02 [1] CRAN (R 4.0.0)
##  dplyr                  0.8.5    2020-03-07 [1] CRAN (R 4.0.0)
##  ellipsis               0.3.0    2019-09-20 [2] CRAN (R 4.1.0)
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.1.0)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.1.0)
##  foreach                1.5.0    2020-03-30 [1] CRAN (R 4.1.0)
##  fs                     1.3.2    2020-03-05 [2] CRAN (R 4.1.0)
##  GenomeInfoDb         * 1.23.16  2020-03-27 [1] Bioconductor  
##  GenomeInfoDbData       1.2.2    2020-02-18 [1] Bioconductor  
##  GenomicRanges        * 1.39.3   2020-03-24 [1] Bioconductor  
##  ggplot2                3.3.0    2020-03-05 [1] CRAN (R 4.0.0)
##  glue                   1.3.2    2020-03-12 [2] CRAN (R 4.1.0)
##  gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
##  highr                  0.8      2019-03-20 [1] CRAN (R 4.0.0)
##  hms                    0.5.3    2020-01-08 [1] CRAN (R 4.0.0)
##  htmltools              0.4.0    2019-10-04 [2] CRAN (R 4.1.0)
##  impute                 1.61.0   2019-10-29 [1] Bioconductor  
##  IRanges              * 2.21.8   2020-03-25 [1] Bioconductor  
##  iterators              1.0.12   2019-07-26 [1] CRAN (R 4.0.0)
##  knitr                  1.28     2020-02-06 [1] CRAN (R 4.0.0)
##  lattice                0.20-40  2020-02-19 [3] CRAN (R 4.1.0)
##  lazyeval               0.2.2    2019-03-15 [2] CRAN (R 4.1.0)
##  lifecycle              0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
##  limma                  3.43.5   2020-03-06 [1] Bioconductor  
##  magick                 2.3      2020-01-24 [1] CRAN (R 4.0.0)
##  magrittr               1.5      2014-11-22 [2] CRAN (R 4.1.0)
##  MALDIquant             1.19.3   2019-05-12 [1] CRAN (R 4.0.0)
##  MASS                   7.3-51.5 2019-12-20 [3] CRAN (R 4.1.0)
##  MassSpecWavelet        1.53.0   2019-10-29 [1] Bioconductor  
##  Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.1.0)
##  matrixStats          * 0.56.0   2020-03-13 [1] CRAN (R 4.0.0)
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.1.0)
##  MSnbase              * 2.13.4   2020-03-24 [1] Bioconductor  
##  multtest               2.43.1   2020-03-12 [1] Bioconductor  
##  munsell                0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
##  mzID                   1.25.0   2019-10-29 [1] Bioconductor  
##  mzR                  * 2.21.1   2019-12-14 [1] Bioconductor  
##  ncdf4                  1.17     2019-10-23 [1] CRAN (R 4.0.0)
##  pander               * 0.6.3    2018-11-06 [1] CRAN (R 4.0.0)
##  pcaMethods             1.79.1   2019-11-03 [1] Bioconductor  
##  pillar                 1.4.3    2019-12-20 [1] CRAN (R 4.1.0)
##  pkgbuild               1.0.6    2019-10-09 [2] CRAN (R 4.1.0)
##  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
##  pkgload                1.0.2    2018-10-29 [2] CRAN (R 4.1.0)
##  plyr                   1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
##  preprocessCore         1.49.2   2020-02-01 [1] Bioconductor  
##  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.1.0)
##  processx               3.4.2    2020-02-09 [2] CRAN (R 4.1.0)
##  ProtGenerics         * 1.19.3   2019-12-25 [1] Bioconductor  
##  ps                     1.3.2    2020-02-13 [2] CRAN (R 4.1.0)
##  purrr                  0.3.3    2019-10-18 [2] CRAN (R 4.1.0)
##  R6                     2.4.1    2019-11-12 [2] CRAN (R 4.1.0)
##  RANN                   2.6.1    2019-01-08 [1] CRAN (R 4.0.0)
##  RColorBrewer         * 1.1-2    2014-12-07 [1] CRAN (R 4.0.0)
##  Rcpp                 * 1.0.4    2020-03-17 [2] CRAN (R 4.1.0)
##  RCurl                  1.98-1.1 2020-01-19 [1] CRAN (R 4.0.0)
##  remotes                2.1.1    2020-02-15 [1] CRAN (R 4.0.0)
##  rlang                  0.4.5    2020-03-01 [2] CRAN (R 4.1.0)
##  RMariaDB             * 1.0.8    2019-12-18 [1] CRAN (R 4.0.0)
##  rmarkdown            * 2.1      2020-01-20 [1] CRAN (R 4.0.0)
##  robustbase             0.93-6   2020-03-23 [1] CRAN (R 4.0.0)
##  rprojroot              1.3-2    2018-01-03 [2] CRAN (R 4.1.0)
##  RSQLite                2.2.0    2020-01-07 [1] CRAN (R 4.0.0)
##  S4Vectors            * 0.25.14  2020-03-24 [1] Bioconductor  
##  scales                 1.1.0    2019-11-18 [1] CRAN (R 4.0.0)
##  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.1.0)
##  stringi                1.4.6    2020-02-17 [2] CRAN (R 4.1.0)
##  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.1.0)
##  SummarizedExperiment * 1.17.5   2020-03-27 [1] Bioconductor  
##  survival               3.1-11   2020-03-07 [3] CRAN (R 4.1.0)
##  testthat               2.3.2    2020-03-02 [1] CRAN (R 4.0.0)
##  tibble                 3.0.0    2020-03-30 [1] CRAN (R 4.1.0)
##  tidyselect             1.0.0    2020-01-27 [1] CRAN (R 4.0.0)
##  usethis                1.5.1    2019-07-04 [2] CRAN (R 4.1.0)
##  vctrs                  0.2.4    2020-03-10 [1] CRAN (R 4.0.0)
##  vsn                    3.55.0   2019-10-29 [1] Bioconductor  
##  withr                  2.1.2    2018-03-15 [2] CRAN (R 4.1.0)
##  xcms                 * 3.9.3    2020-03-13 [1] Bioconductor  
##  xfun                   0.12     2020-01-13 [1] CRAN (R 4.0.0)
##  XML                    3.99-0.3 2020-01-20 [1] CRAN (R 4.0.0)
##  XVector                0.27.2   2020-03-24 [1] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
##  zlibbioc               1.33.1   2020-01-24 [1] Bioconductor  
## 
## [1] /usr/local/lib/R/host-site-library
## [2] /usr/local/lib/R/site-library
## [3] /usr/local/lib/R/library

Quality assessment of the CHRIS7500 targeted metabolomics data

1 April 2020
