BioCHRIStes 2.0.0
Authors: Johannes Rainer, Vinicius Veri Hernandes and Sigurdur Smarason
Modified: 2020-04-01 11:21:02
Compiled: Wed Apr 1 11:21:06 2020
In this document we evaluate the quality of the normalized Biocrates-based targeted metabolomics data of the CHRIS7500 data set and compare it to the quality before normalization. The quality assessment is performed separately for the LCMS and FIA data. The results from this document were used to define which of the replicated plates are part of the final data release.
Below we load all required packages and fetch the LCMS and the FIA data from the database.
library(BioCHRIStes)
library(xcms)
library(RColorBrewer)
library(pander)
library(RMariaDB)
bcd <- BioCHRIStes(dbConnect(MariaDB(), dbname = "biochristes_7500",
username = Sys.getenv("MYSQL_USER"),
password = Sys.getenv("MYSQL_PASS"),
host = Sys.getenv("MYSQL_HOST")))
#' Load the raw data
lcms_raw <- fetchData(bcd, data = "lcms", raw = TRUE,
filter = ~ data_release == "CHRIS7500 for normalization")
fia_raw <- fetchData(bcd, data = "fia", raw = TRUE,
filter = ~ data_release == "CHRIS7500 for normalization")
#' Load the normalized data
lcms <- fetchData(bcd, data = "lcms", raw = FALSE,
filter = ~ data_release == "CHRIS7500 for normalization")
fia <- fetchData(bcd, data = "fia", raw = FALSE,
filter = ~ data_release == "CHRIS7500 for normalization")
#' Define colors for the different types of samples
cols <- brewer.pal(9, "Set1")[c(2, 3, 1, 5, 9)]
names(cols) <- c("QC NIST STD", "QC CHRIS Pool", "00p180_QC2",
"00p180_QC1", "study")
Next we add sample type information to the data; only the code for the LCMS data is shown below.
lcms_raw$group <- "study"
lcms_raw$group[lcms_raw$sample_name == "QC NIST STD"] <- "QC NIST STD"
lcms_raw$group[lcms_raw$sample_name == "QC CHRIS Pool"] <- "QC CHRIS Pool"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC1"] <- "00p180_QC1"
lcms_raw$group[lcms_raw$sample_name == "00p180_QC2"] <- "00p180_QC2"
#' "well" should be treated as a categorical variable
lcms_raw$well <- factor(lcms_raw$well)
#' Order the data by plate_barcode and injection index
lcms_raw <- lcms_raw[, order(lcms_raw$plate_barcode, lcms_raw$injection_idx)]
Finally, we remove samples for which all values are missing in both the LCMS and the FIA data (i.e. samples excluded because of technical problems such as failed injections).
all_na <- function(x) all(is.na(x))
na_lcms <- apply(concentrations(lcms), MARGIN = 2, all_na)
na_fia <- apply(concentrations(fia), MARGIN = 2, all_na)
lcms <- lcms[, !(na_lcms & na_fia)]
lcms_raw <- lcms_raw[, !(na_lcms & na_fia)]
fia <- fia[, !(na_lcms & na_fia)]
fia_raw <- fia_raw[, !(na_lcms & na_fia)]
Prior to quality assessment we provide a general data summary including the numbers of analytes, samples and plates.
| item | value |
|---|---|
| LCMS QC sample count | 2440 |
| LCMS study sample count | 8140 |
| LCMS plate count | 102 |
| LCMS replicated study samples | 880 |
| LCMS analyte count | 42 |
| FIA QC sample count | 2440 |
| FIA study sample count | 8140 |
| FIA plate count | 102 |
| FIA replicated study samples | 880 |
| FIA analyte count | 146 |
The whole data set thus consists of 102 plates in total, 11 of which were replicated.
Before we evaluate the quality of the data we try to identify potentially problematic samples. These are samples with either a higher percentage of missing values or consistent differences in analyte concentrations (compared to other samples of the same sample type).
Below we determine the number of missing values per sample and calculate the RLA (relative log abundance) for each analyte across samples of the same sample type (either one of the QC sample types or the study samples).
lcms_smpls_na <- apply(concentrations(lcms), 2, function(z) sum(is.na(z)))
fia_smpls_na <- apply(concentrations(fia), 2, function(z) sum(is.na(z)))
lcms_smpls_na_grp <- split(lcms_smpls_na, lcms$group)
fia_smpls_na_grp <- split(fia_smpls_na, fia$group)
Figure 1: Distribution of the number of missing values per sample for each sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean plus 3 standard deviations.
Potential outlier samples are identified as those with a number of missing values that is larger than the average number of missing values for that sample group plus 3 times its standard deviation. The outlier candidates are listed below.
| sample_name | plate_name | well | group | 80% NA count |
|---|---|---|---|---|
| 00p180_QC1 | BPLT00000281 | 14 | 00p180_QC1 | 3 |
| QC NIST STD | BPLT00000281 | 62 | QC NIST STD | 15 |
| 00p180_QC1 | BPLT00000281 | 14 | 00p180_QC1 | 3 |
| QC NIST STD | BPLT00000281 | 62 | QC NIST STD | 15 |
| QC CHRIS Pool | BPLT00000281 | 50 | QC CHRIS Pool | 13 |
| QC CHRIS Pool | BPLT00000086 | 50 | QC CHRIS Pool | 13 |
| 00p180_QC2 | BPLT00000086 | 26 | 00p180_QC2 | 1 |
| 00p180_QC2 | BPLT00000086 | 67 | 00p180_QC2 | 1 |
| 00p180_QC2 | BPLT00000086 | 96 | 00p180_QC2 | 1 |
| 0010111776 | BPLT00000409 | 5 | study | 14 |
| 0010215416 | BPLT00000409 | 24 | study | 14 |
| QC NIST STD | BPLT00000233 | 62 | QC NIST STD | 49 |
| NA count | data | idx |
|---|---|---|
| 42 | LCMS | 1 |
| 42 | LCMS | 2 |
| 42 | LCMS | 3 |
| 42 | LCMS | 4 |
| 42 | LCMS | 5 |
| 42 | LCMS | 6 |
| 42 | LCMS | 7 |
| 42 | LCMS | 8 |
| 42 | LCMS | 9 |
| 17 | LCMS | 10 |
| 17 | LCMS | 11 |
| 146 | FIA | 12 |
The identified samples are mostly QC samples (especially in the LCMS data) and some study samples. Since the number of missing values is not extremely high for the latter, we keep them. Below we define the indices of the outlier samples, excluding the identified potential outlier study samples, which are kept.
outl <- lapply(lcms_smpls_na_grp, function(z)
z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, lcms$group)
lcms_na_out_idx <- unname(which(outl & lcms$group != "study"))
outl <- lapply(fia_smpls_na_grp, function(z)
z > (mean(z) + 3 * sd(z)))
outl <- unsplit(outl, fia$group)
fia_na_out_idx <- unname(which(outl & fia$group != "study"))
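The mean-plus-3-SD rule used above can be tried on toy data. The sketch below uses made-up missing-value counts (21 QC samples, one of them with 12 missing values, and 4 study samples) and the same `split()`/`unsplit()` pattern as the code above. `flag_high()` is a hypothetical helper name.

```r
## Made-up numbers of missing values per sample, for illustration only
na_count <- c(rep(c(0, 1), 10), 12, 2, 3, 2, 3)
group <- c(rep("QC", 21), rep("study", 4))

## TRUE for values above the group mean plus 3 standard deviations
flag_high <- function(z) z > mean(z) + 3 * sd(z)

## Apply the rule within each group and map the result back to sample order
is_out <- unsplit(lapply(split(na_count, group), flag_high), group)

## Indices of outlier samples; study samples are never flagged (they are kept)
out_idx <- which(is_out & group != "study")
out_idx
```

For a group of n samples, no value can exceed the mean by more than (n - 1)/sqrt(n) sample standard deviations, so this rule only has power for reasonably large groups such as the QC sample types here.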
Next we identify potentially problematic samples based on the measured analyte abundances relative to the median abundance of that analyte across samples of the same group. Here we are specifically looking for samples that have considerably lower concentrations in most or all analytes (potentially indicating failed injections, a globally lower sample amount or similar).
The RLAs calculated below represent the log abundances of measurements relative to the median abundance within the same sample group (sample group being the sample type).
lcms_grp <- lcms$group
## lcms_grp[lcms_grp == "study"] <- as.character(lcms$sex[lcms_grp == "study"])
lcms_rla <- rowRla(concentrations(lcms), lcms_grp)
#' FIA
fia_grp <- fia$group
## fia_grp[fia_grp == "study"] <- as.character(fia$sex[fia_grp == "study"])
fia_rla <- rowRla(concentrations(fia), fia_grp)
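To make the RLA definition concrete, below is a minimal base-R sketch of a within-group RLA, assuming a log2 transformation and the group median as reference, as described above. The helper `row_rla_sketch()` is hypothetical; `rowRla()` used above (presumably the xcms function of that name) may differ in details such as NA handling.

```r
## Hypothetical helper illustrating a within-group RLA: log2-transform each
## value and subtract the median of the same analyte (row) within the same
## sample group. Not the package implementation.
row_rla_sketch <- function(x, group) {
    x <- log2(x)
    for (g in unique(group)) {
        idx <- which(group == g)
        ## median per analyte (row) across the samples of group g
        med <- apply(x[, idx, drop = FALSE], 1, median, na.rm = TRUE)
        ## a length-nrow vector is recycled down each column
        x[, idx] <- x[, idx, drop = FALSE] - med
    }
    x
}

## Toy data: 2 analytes (rows) x 4 samples (columns) in two groups
conc_toy <- rbind(c(1, 2, 4, 8),
                  c(2, 2, 2, 2))
rla_toy <- row_rla_sketch(conc_toy, group = c("a", "a", "b", "b"))
rla_toy
```

Under the log2 assumption, an RLA of -1 corresponds to a two-fold lower abundance than the group median.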
The plot below shows the per-sample distribution of RLA values.
Figure 2: RLA plot for the LCMS (upper) and FIA data (lower plot)
Samples are colored by their sample type. Shown are within-sample-type RLAs, the sample type being either one of the QC sample types or the study samples. Grey coloring indicates the different plates.
The variance of RLA values is low for the LCMS data. In contrast, for the FIA data a considerable number of samples have a more than 2-fold lower average abundance.
Next we aggregate the RLA values per sample by calculating for each sample the median, the 80% and the 90% quantile of the RLAs across all analytes.
lcms_rla_qnts <- t(apply(lcms_rla, MARGIN = 2, quantile,
probs = c(0.5, 0.8, 0.9), na.rm = TRUE))
fia_rla_qnts <- t(apply(fia_rla, MARGIN = 2, quantile,
probs = c(0.5, 0.8, 0.9), na.rm = TRUE))
The distribution of the 80% RLA quantile per sample for all sample types is shown below.
Figure 3: Distribution of the RLAs per sample type
Upper panel: LCMS data, lower: FIA. The vertical line represents the mean minus 3 standard deviations.
There seem to be some samples with an 80% RLA quantile lower than the mean minus 3 standard deviations (vertical lines in the plot above). With the exception of QC NIST STD and study samples, the 80% RLA quantile of these potential outliers is however close to 0, which argues against a strong systematic difference in concentrations. Based on the distributions above we nevertheless mark QC NIST STD samples with an 80% RLA quantile < -0.4 and study samples with an 80% RLA quantile < -1 as potential outliers. Ideally, the affected study samples should be re-measured, since we cannot rule out that these differences are real.
lcms_rla_out_idx <- which(lcms_rla_qnts[, "80%"] < -1 | (
lcms_rla_qnts[, "80%"] < -0.4 & lcms$group == "QC NIST STD"))
fia_rla_out_idx <- which(fia_rla_qnts[, "80%"] < -1 | (
fia_rla_qnts[, "80%"] < -0.4 & fia$group == "QC NIST STD"))
The study samples for which at least 80% of the analytes have an RLA smaller than -1 and the QC NIST STD samples with an 80% RLA quantile < -0.4 are listed in the table below (for both the LCMS and FIA data).
| index | sample_name | plate_name | well | 80% RLA | data |
|---|---|---|---|---|---|
| 14875 | QC NIST STD | BPLT00000137 | 62 | -0.5535 | LCMS |
| 14874 | QC NIST STD | BPLT00000137 | 62 | -0.5738 | LCMS |
| 14833 | QC NIST STD | BPLT00000137 | 62 | -0.5938 | LCMS |
| 14837 | QC NIST STD | BPLT00000137 | 62 | -0.5891 | LCMS |
| 14838 | QC NIST STD | BPLT00000137 | 62 | -0.5888 | LCMS |
| 23160 | QC NIST STD | BPLT00000209 | 62 | -0.5177 | LCMS |
| 23162 | QC NIST STD | BPLT00000209 | 62 | -0.4825 | LCMS |
| 23161 | QC NIST STD | BPLT00000209 | 62 | -0.4909 | LCMS |
| 23163 | QC NIST STD | BPLT00000209 | 62 | -0.4755 | LCMS |
| 23150 | QC NIST STD | BPLT00000209 | 62 | -0.4987 | LCMS |
| 6269 | QC NIST STD | BPLT00000285 | 62 | -0.6095 | LCMS |
| 6272 | QC NIST STD | BPLT00000285 | 62 | -0.5907 | LCMS |
| 6271 | QC NIST STD | BPLT00000285 | 62 | -0.5865 | LCMS |
| 6270 | QC NIST STD | BPLT00000285 | 62 | -0.6084 | LCMS |
| 6273 | QC NIST STD | BPLT00000285 | 62 | -0.6005 | LCMS |
| 15454 | 0010170742 | BPLT00000129 | 93 | -1.181 | FIA |
| 15477 | 0010029271 | BPLT00000129 | 84 | -1.288 | FIA |
| 20151 | 0010050687 | BPLT00000057 | 39 | -1.057 | FIA |
| 6619 | 0010016608 | BPLT00000281 | 35 | -1.481 | FIA |
| 22593 | 0010002492 | BPLT00000249 | 32 | -1.413 | FIA |
| 8835 | 0010125401 | BPLT00000233 | 74 | -1.73 | FIA |
| 17715 | 0010338094 | BPLT00000093 | 72 | -1.231 | FIA |
| 17260 | 0010236351 | BPLT00000101 | 27 | -1.388 | FIA |
| 16332 | 0010189496 | BPLT00000113 | 8 | -1.203 | FIA |
| 15676 | 0010249634 | BPLT00000125 | 58 | -1.227 | FIA |
| 13202 | 0010265431 | BPLT00000165 | 87 | -1.476 | FIA |
| 13190 | 0010216869 | BPLT00000165 | 88 | -1.471 | FIA |
| 13186 | 0010204360 | BPLT00000165 | 60 | -1.358 | FIA |
| 13420 | 0010266285 | BPLT00000161 | 74 | -2.561 | FIA |
| 15046 | QC NIST STD | BPLT00000137 | 62 | -0.5338 | FIA |
| 15045 | QC NIST STD | BPLT00000137 | 62 | -0.5467 | FIA |
| 15042 | QC NIST STD | BPLT00000137 | 62 | -0.4836 | FIA |
| 15043 | QC NIST STD | BPLT00000137 | 62 | -0.4865 | FIA |
| 15044 | QC NIST STD | BPLT00000137 | 62 | -0.4515 | FIA |
| 9258 | 0010206447 | BPLT00000229 | 29 | -1.05 | FIA |
| 9912 | 0010117224 | BPLT00000221 | 5 | -1.399 | FIA |
| 23025 | 0010364678 | BPLT00000213 | 91 | -1.702 | FIA |
| 19702 | 0010050995 | BPLT00000065 | 10 | -1.15 | FIA |
| 23277 | QC NIST STD | BPLT00000209 | 62 | -0.4742 | FIA |
| 23192 | 0010072017 | BPLT00000209 | 52 | -1.237 | FIA |
| 23279 | QC NIST STD | BPLT00000209 | 62 | -0.5357 | FIA |
| 23278 | QC NIST STD | BPLT00000209 | 62 | -0.515 | FIA |
| 23280 | QC NIST STD | BPLT00000209 | 62 | -0.5078 | FIA |
| 23276 | QC NIST STD | BPLT00000209 | 62 | -0.518 | FIA |
| 23405 | 0010035478 | BPLT00000201 | 54 | -1.179 | FIA |
| 11007 | 0010054356 | BPLT00000193 | 82 | -1.537 | FIA |
| 8552 | 0010194648 | BPLT00000237 | 29 | -1.018 | FIA |
| 8622 | 0010145197 | BPLT00000237 | 65 | -1.311 | FIA |
| 8597 | 0010084920 | BPLT00000237 | 10 | -1.331 | FIA |
| 8551 | 0010294559 | BPLT00000237 | 82 | -1.343 | FIA |
| 10177 | 0010013439 | BPLT00000217 | 65 | -1.347 | FIA |
| 10164 | 0010126780 | BPLT00000217 | 10 | -1.299 | FIA |
| 10199 | 0010230994 | BPLT00000217 | 82 | -1.205 | FIA |
| 10605 | 0010010220 | BPLT00000205 | 15 | -1.633 | FIA |
| 10638 | 0010261751 | BPLT00000205 | 18 | -1.376 | FIA |
| 12344 | 0010008924 | BPLT00000177 | 3 | -1.376 | FIA |
| 12342 | 0010066541 | BPLT00000177 | 15 | -1.068 | FIA |
| 12264 | 0010188438 | BPLT00000177 | 82 | -1.37 | FIA |
| 12295 | 0010253568 | BPLT00000177 | 60 | -1.042 | FIA |
| 12302 | 0010319987 | BPLT00000177 | 84 | -1.124 | FIA |
| 12107 | 0010137518 | BPLT00000181 | 87 | -1.151 | FIA |
| 21525 | 0010342281 | BPLT00000041 | 87 | -1.248 | FIA |
| 19933 | 0010278047 | BPLT00000061 | 64 | -1.083 | FIA |
| 20015 | 0010286121 | BPLT00000061 | 43 | -1.131 | FIA |
| 14282 | 0010068538 | BPLT00000145 | 75 | -1.296 | FIA |
| 14321 | 0010085866 | BPLT00000145 | 80 | -1.007 | FIA |
| 9683 | 0010094646 | BPLT00000225 | 64 | -1.429 | FIA |
| 9697 | 0010289563 | BPLT00000225 | 42 | -1.111 | FIA |
| 9458 | 0010094646 | BPLT00000226 | 64 | -1.157 | FIA |
| 11827 | 0010127859 | BPLT00000182 | 30 | -1.031 | FIA |
| 10414 | 0010010220 | BPLT00000206 | 15 | -1.124 | FIA |
| 10334 | 0010261751 | BPLT00000206 | 18 | -1.064 | FIA |
| 21337 | 0010342281 | BPLT00000042 | 87 | -1.105 | FIA |
| 6385 | QC NIST STD | BPLT00000285 | 62 | -0.538 | FIA |
| 6388 | QC NIST STD | BPLT00000285 | 62 | -0.6041 | FIA |
| 6387 | QC NIST STD | BPLT00000285 | 62 | -0.5857 | FIA |
| 6386 | QC NIST STD | BPLT00000285 | 62 | -0.5803 | FIA |
| 6389 | QC NIST STD | BPLT00000285 | 62 | -0.5677 | FIA |
Not unexpectedly, many RLA-based outlier samples are QC NIST STD samples. All potentially problematic study samples were identified in the FIA data, most of them on plate BPLT00000177 (5), followed by BPLT00000237 (4) and BPLT00000217 and BPLT00000165 (3 each). The remaining samples are spread across different plates, hence there is no indication of a systematic problem with any of the plates.
Next we evaluate variances of analyte measurements and numbers of missing values in QC samples to identify potentially problematic or noisy analytes. Missing values are measurements excluded due to technical problems or values outside the detection and quantification range. Note that we calculate these values on the full data set, i.e. without excluding the potentially bad samples identified in the previous section. The final evaluation of analyte quality and the definition of the analytes to be flagged is performed in the define-data-release-analytes.Rmd vignette.
prop_na <- function(x, MARGIN = 1) {
apply(x, MARGIN = MARGIN, function(z) sum(is.na(z)) / length(z))
}
anlts_lcms_na <- within_group_fun(concentrations(lcms), lcms$group, prop_na)
anlts_fia_na <- within_group_fun(concentrations(fia), fia$group, prop_na)
anlts_lcms_rsd <- within_group_fun(concentrations(lcms), lcms$group, rowRsd)
anlts_fia_rsd <- within_group_fun(concentrations(fia), fia$group, rowRsd)
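The RSD reported here is presumably the standard deviation divided by the mean, expressed in percent. The sketch below computes it per analyte within each sample type in base R. `row_rsd_sketch()` and `within_group_sketch()` are hypothetical stand-ins for `rowRsd()` and `within_group_fun()`, whose exact behaviour (e.g. NA handling or return format) may differ.

```r
## Hypothetical stand-in for rowRsd(): RSD (in %) of each row (analyte)
row_rsd_sketch <- function(x) {
    apply(x, 1, function(z) 100 * sd(z, na.rm = TRUE) / mean(z, na.rm = TRUE))
}

## Hypothetical stand-in for within_group_fun(): apply FUN to the columns
## (samples) of each group separately; returns a named list
within_group_sketch <- function(x, group, FUN) {
    lapply(split(seq_along(group), group),
           function(idx) FUN(x[, idx, drop = FALSE]))
}

## Toy data: 2 analytes x 6 samples, 3 QC and 3 study samples
conc_rsd_toy <- rbind(c(10, 12,  8, 100, 100, 100),
                      c( 5,  5,  5,  50,  60,  40))
grp_toy <- c("QC", "QC", "QC", "study", "study", "study")
rsd_toy <- within_group_sketch(conc_rsd_toy, grp_toy, row_rsd_sketch)
rsd_toy
```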
Analytes are flagged as having bad quality if one of the following conditions matches:
- RSD across 00p180_QC1 samples is > 30% (technically too variable)
- RSD across 00p180_QC2 samples is > 30% (technically too variable)
- Percentage of missing values across study samples is > 99.9% (i.e. a concentration was measured in less than 5 individuals).
All analytes are supposed to be present in the 00p180 QC samples, and high variability of measurements in them might indicate unstable estimation of concentrations.
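The flagging criteria above can be expressed as a simple vectorized rule. This is a hypothetical sketch: the function name is made up, RSD and missing-value proportions are assumed to be in percent, and an RSD of NA (e.g. when all values are missing) is assumed not to trigger the RSD criteria. The example values for ADMA, PEA and C5 are taken from the tables below; `ok` is a made-up analyte.

```r
## Hypothetical rule: TRUE if an analyte matches any of the criteria above
flag_analytes <- function(rsd_qc1, rsd_qc2, pna_study) {
    high_rsd <- (!is.na(rsd_qc1) & rsd_qc1 > 30) |
        (!is.na(rsd_qc2) & rsd_qc2 > 30)
    high_na <- !is.na(pna_study) & pna_study > 99.9
    high_rsd | high_na
}

## RSD (%) in 00p180_QC1/QC2 and % missing values in study samples
rsd_qc1 <- c(ADMA = 10.62, PEA = 7.072, C5 = 11.14, ok = 5)
rsd_qc2 <- c(ADMA = 35.49, PEA = 6.136, C5 = 11.06, ok = 5)
pna_study <- c(ADMA = 2.396, PEA = 99.99, C5 = 99.91, ok = 10)
flags <- flag_analytes(rsd_qc1, rsd_qc2, pna_study)
flags
```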
The table below lists all analytes matching any of these criteria.
| analyte | %NA.00p180_QC1 | %NA.00p180_QC2 | %NA.study |
|---|---|---|---|
| Ac-Orn | 70.78 | 70.99 | 99.73 |
| ADMA | 2.353 | 0.3297 | 2.396 |
| c4-OH-Pro | 100 | 0.3297 | 100 |
| Carnosine | 1.569 | 0.3297 | 100 |
| DOPA | 0.3922 | 0.3297 | 99.93 |
| Dopamine | 0.3922 | 0.3297 | 99.99 |
| Histamine | 12.16 | 12.2 | 99.99 |
| Met-SO | 100 | 0.3297 | 99.99 |
| Nitro-Tyr | 2.353 | 0.3297 | 99.99 |
| PEA | 0.3922 | 0.3297 | 99.99 |
| Spermine | 1.373 | 0.6593 | 97.84 |
| C12:1 | 95.88 | 94.73 | 98.13 |
| C14:1 | 92.75 | 92.97 | 47.71 |
| C14:1-OH | 95.49 | 95.6 | 95.18 |
| C14:2 | 93.53 | 94.07 | 75.33 |
| C14:2-OH | 95.49 | 95.49 | 95.39 |
| C16 | 0 | 0 | 99.93 |
| C16-OH | 88.04 | 91.1 | 90.2 |
| C16:1-OH | 97.25 | 97.36 | 97.22 |
| C16:2 | 90.78 | 89.45 | 41.33 |
| C16:2-OH | 93.92 | 93.41 | 81.88 |
| C18 | 0 | 0 | 99.98 |
| C18:1 | 69.22 | 73.19 | 4.877 |
| C3-DC (C4-OH) | 90.59 | 95.49 | 94.18 |
| C3-OH | 100 | 100 | 100 |
| C3:1 | 98.82 | 99.01 | 98.99 |
| C4:1 | 99.8 | 100 | 100 |
| C5 | 0.9804 | 0.989 | 99.91 |
| C5-DC (C6-OH) | 92.55 | 91.76 | 89.55 |
| C5-M-DC | 95.88 | 96.04 | 96.3 |
| C5-OH (C3-DC-M) | 94.51 | 95.16 | 94 |
| C5:1 | 99.8 | 100 | 100 |
| C5:1-DC | 99.8 | 100 | 99.98 |
| C6:1 | 100 | 100 | 100 |
| C7-DC | 99.61 | 99.78 | 63.42 |
| C9 | 99.8 | 100 | 99.99 |
| lysoPC a C14:0 | 100 | 100 | 100 |
| lysoPC a C26:0 | 99.41 | 98.9 | 98.71 |
| lysoPC a C26:1 | 100 | 100 | 100 |
| lysoPC a C28:0 | 99.41 | 99.12 | 98.57 |
| PC aa C30:2 | 32.16 | 32.09 | 33.91 |
| PC aa C38:1 | 3.725 | 3.077 | 2.948 |
| PC ae C30:1 | 3.725 | 3.846 | 4.582 |
| SM C22:3 | 13.33 | 16.37 | 54.55 |
| SM C26:0 | 3.725 | 2.857 | 1.032 |
| SM C26:1 | 4.51 | 4.615 | 1.609 |
| analyte | RSD.00p180_QC1 | RSD.00p180_QC2 | data |
|---|---|---|---|
| Ac-Orn | 149.7 | 144.3 | LCMS |
| ADMA | 10.62 | 35.49 | LCMS |
| c4-OH-Pro | NA | 19.88 | LCMS |
| Carnosine | 8.345 | 12.96 | LCMS |
| DOPA | 82.76 | 80.19 | LCMS |
| Dopamine | 23.09 | 26.25 | LCMS |
| Histamine | 67.09 | 63.94 | LCMS |
| Met-SO | NA | 28.12 | LCMS |
| Nitro-Tyr | 13.24 | 16.04 | LCMS |
| PEA | 7.072 | 6.136 | LCMS |
| Spermine | 50.85 | 64.24 | LCMS |
| C12:1 | 23.97 | 42.55 | FIA |
| C14:1 | 89.01 | 79.49 | FIA |
| C14:1-OH | 140.4 | 14.52 | FIA |
| C14:2 | 54.96 | 13.17 | FIA |
| C14:2-OH | 65.25 | 11.98 | FIA |
| C16 | 14.89 | 16.04 | FIA |
| C16-OH | 81.54 | 11.86 | FIA |
| C16:1-OH | 66.56 | 9.354 | FIA |
| C16:2 | 91.73 | 36.87 | FIA |
| C16:2-OH | 73.07 | 29.71 | FIA |
| C18 | 18.78 | 18.54 | FIA |
| C18:1 | 46.63 | 112.4 | FIA |
| C3-DC (C4-OH) | 43.38 | 25.47 | FIA |
| C3-OH | NA | NA | FIA |
| C3:1 | 118.1 | 8.428 | FIA |
| C4:1 | NA | NA | FIA |
| C5 | 11.14 | 11.06 | FIA |
| C5-DC (C6-OH) | 61.44 | 15.5 | FIA |
| C5-M-DC | 114.7 | 5.037 | FIA |
| C5-OH (C3-DC-M) | 67.41 | 21.11 | FIA |
| C5:1 | NA | NA | FIA |
| C5:1-DC | NA | NA | FIA |
| C6:1 | NA | NA | FIA |
| C7-DC | 50.31 | 1.746 | FIA |
| C9 | NA | NA | FIA |
| lysoPC a C14:0 | NA | NA | FIA |
| lysoPC a C26:0 | 33.66 | 49.78 | FIA |
| lysoPC a C26:1 | NA | NA | FIA |
| lysoPC a C28:0 | 18.8 | 32.46 | FIA |
| PC aa C30:2 | 52.2 | 47.67 | FIA |
| PC aa C38:1 | 43.77 | 42.6 | FIA |
| PC ae C30:1 | 39.61 | 39.85 | FIA |
| SM C22:3 | 98.04 | 98.74 | FIA |
| SM C26:0 | 35.02 | 33.63 | FIA |
| SM C26:1 | 33.22 | 31.92 | FIA |
Note that there might be better approaches to identify poor quality signals, e.g. comparing the actual analyte concentrations to the expected concentrations. Also, problematic analytes will be identified in a second Rmd file (define-data-release-analytes.Rmd) after removing poor quality samples and based only on one of the replicated plates.
Next we reduce the data set by removing the samples identified above as potential outliers or of poor quality. We also remove analytes considered to yield too noisy data.
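#' Use logical indices: x[-integer(0)] would select nothing if an index
#' vector happened to be empty
keep_smpls <- !seq_len(ncol(concentrations(lcms))) %in%
c(lcms_na_out_idx, lcms_rla_out_idx)
keep_anlts <- !seq_len(nrow(concentrations(lcms))) %in% lcms_anlts_out_idx
lcms <- lcms[keep_anlts, keep_smpls]
lcms_raw <- lcms_raw[keep_anlts, keep_smpls]
keep_smpls <- !seq_len(ncol(concentrations(fia))) %in%
c(fia_na_out_idx, fia_rla_out_idx)
keep_anlts <- !seq_len(nrow(concentrations(fia))) %in% fia_anlts_out_idx
fia <- fia[keep_anlts, keep_smpls]
fia_raw <- fia_raw[keep_anlts, keep_smpls]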
All further data processing will be performed on the reduced data set.
First we evaluate the general quality of the LCMS and FIA data sets. This comprises plots of RLAs before and after normalization, the relative standard deviation (RSD) calculated for each analyte in QC samples (across all plates) and concentration ratios between replicated measurements.
Note that the quality assessment in this document is performed on all samples, including the replicated ones.
First we calculate the relative log abundances (RLA) before and after normalization, using the sample type as groups within which the relative abundances are calculated.
## Defining the RLA groups
grp <- lcms$group
## grp[grp == "study"] <- as.character(lcms$sex)[grp == "study"]
lcms$rla_group <- grp
lcms_raw$rla_group <- grp
grp <- fia$group
## grp[grp == "study"] <- as.character(fia$sex)[grp == "study"]
fia$rla_group <- grp
fia_raw$rla_group <- grp
rla_lcms <- colMedians(rowRla(concentrations(lcms),
lcms$rla_group), na.rm = TRUE)
rla_lcms_raw <- colMedians(rowRla(concentrations(lcms_raw),
lcms_raw$rla_group), na.rm = TRUE)
rla_fia <- colMedians(rowRla(concentrations(fia),
fia$rla_group), na.rm = TRUE)
rla_fia_raw <- colMedians(rowRla(concentrations(fia_raw),
fia_raw$rla_group), na.rm = TRUE)
Figure 4: Per sample median RLA before (red) and after normalization (blue)
The grey rectangles indicate plates/batches.
Between-batch normalization reduced systematic differences in abundances, especially in the FIA data set, centering the per-sample median RLAs around 0.
The RLA plots above are based on blessed signals, i.e. measurements within the detection and quantification range defined by Biocrates. Since data analyses will also consider signals outside this range, we also create RLA plots below based on all measured intensities.
Figure 5: Per sample median RLA before (red) and after normalization (blue) considering all measured intensities
The grey rectangles indicate plates/batches.
Also when all measured intensities are considered, normalization thus considerably reduced the between-batch differences.
Below we calculate RSD values for each analyte in each sample type. We consider only signal within the detection range for each analyte.
rsd_lcms <- within_group_fun(concentrations(lcms),
lcms$group, rowRsd)
rsd_fia <- within_group_fun(concentrations(fia),
fia$group, rowRsd)
rsd_lcms_raw <- within_group_fun(concentrations(lcms_raw),
lcms_raw$group, rowRsd)
rsd_fia_raw <- within_group_fun(concentrations(fia_raw),
fia_raw$group, rowRsd)
Figure 6: Distribution of RSD values per sample type for the LCMS (left) and FIA (right) data
The table below summarizes the RSD (across analytes) for each sample type before and after normalization.
| metric | LCMS raw | LCMS | FIA raw | FIA |
|---|---|---|---|---|
| median RSD 00p180_QC1 | 7.651 | 4.628 | 22.96 | 9.141 |
| median RSD 00p180_QC2 | 7.586 | 6.373 | 22.26 | 8.655 |
| median RSD QC CHRIS Pool | 6.937 | 4.505 | 23.13 | 6.723 |
| median RSD QC NIST STD | 9.85 | 9.231 | 25.78 | 10.96 |
| median RSD study | 22.28 | 21.95 | 36.03 | 28.05 |
| % RSD > 30 00p180_QC1 | 0 | 0 | 14.41 | 0 |
| % RSD > 30 00p180_QC2 | 0 | 0 | 13.51 | 0 |
| % RSD > 30 QC CHRIS Pool | 0 | 0 | 15.32 | 1.802 |
| % RSD > 30 QC NIST STD | 0 | 0 | 34.23 | 2.703 |
| % RSD > 30 study | 19.35 | 19.35 | 75.68 | 34.23 |
After normalization, the median RSD for QC samples is around 9% or lower for the LCMS data and below ~11% for the FIA data. Between-batch normalization considerably improved the quality of the FIA data, reducing the median RSD of QC samples from over 20% to about 10%. Note that only the RSD of QC NIST STD samples and study samples should be considered for the quality assessment, as these were not used to estimate the batch effect. Importantly, normalization reduced the RSD of QC samples while having a smaller impact on the RSD of study samples, in particular for the LCMS data.
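The summary table above (median RSD and percentage of analytes with an RSD above 30% per sample type) can be derived from per-sample-type RSD vectors as sketched below. `summarize_rsd()` is a hypothetical helper, the toy RSD values are made up, and RSDs are assumed to be in percent.

```r
## Hypothetical helper: one row per sample type with the median RSD and the
## percentage of analytes whose RSD exceeds the threshold
summarize_rsd <- function(rsd_list, threshold = 30) {
    t(vapply(rsd_list, function(z) {
        c(median_rsd = median(z, na.rm = TRUE),
          pct_above = 100 * mean(z > threshold, na.rm = TRUE))
    }, numeric(2)))
}

## Made-up RSD values (%) for 4 analytes in two sample types
rsd_list_toy <- list(QC = c(5, 8, 12, 40), study = c(20, 25, 35, 50))
rsd_summary <- summarize_rsd(rsd_list_toy)
rsd_summary
```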
Since many analyses will also include measurements outside the detection range, we repeat the analysis considering all measurements.
rsd_lcms_all <- within_group_fun(
concentrations(lcms, blessing = "none"),
lcms$group, rowRsd)
rsd_fia_all <- within_group_fun(
concentrations(fia, blessing = "none"),
fia$group, rowRsd)
rsd_lcms_raw_all <- within_group_fun(
concentrations(lcms_raw, blessing = "none"),
lcms_raw$group, rowRsd)
rsd_fia_raw_all <- within_group_fun(
concentrations(fia_raw, blessing = "none"),
fia_raw$group, rowRsd)
| metric | LCMS raw | LCMS | FIA raw | FIA |
|---|---|---|---|---|
| median RSD 00p180_QC1 | 7.651 | 4.628 | 23.39 | 9.161 |
| median RSD 00p180_QC2 | 7.586 | 6.373 | 23.26 | 8.689 |
| median RSD QC CHRIS Pool | 7.025 | 4.505 | 23.71 | 7.113 |
| median RSD QC NIST STD | 10.2 | 9.381 | 26.89 | 11.01 |
| median RSD study | 23.42 | 23.11 | 37.08 | 28.3 |
| % RSD > 30 00p180_QC1 | 0 | 0 | 16.22 | 0.9009 |
| % RSD > 30 00p180_QC2 | 0 | 0 | 16.22 | 0.9009 |
| % RSD > 30 QC CHRIS Pool | 3.226 | 0 | 18.02 | 1.802 |
| % RSD > 30 QC NIST STD | 6.452 | 6.452 | 39.64 | 2.703 |
| % RSD > 30 study | 32.26 | 32.26 | 82.88 | 35.14 |
Even when signals outside the detection range are considered, the median RSD of QC samples is about 11% or lower after normalization. Again, the improvement of the FIA data quality is impressive.
As an additional quality criterion we evaluate the difference (ratio) in abundances of replicated measurements. To this end we first identify the replicated samples and then determine, for each analyte, the largest ratio of the abundances measured in the replicates of a sample. The percentage of such MRAs (maximum ratio of abundances) larger than 1.3 is used as a quality measure. This represents the percentage of analyte measurements for which concentrations differ by more than 30%.
#' Identify replicated study samples and subset the data
samp_cnt <- table(lcms$sample_name[lcms$group == "study"])
mult_ids <- names(samp_cnt)[samp_cnt > 1]
#' subsetting the data to replicated measurements only
lcms_repl <- lcms[, lcms$sample_name %in% mult_ids]
#' Define the function to calculate the MRA:
#' Returns the ratio between the max and the min value or
#' NA if less than 2 valid measurements are available.
mra <- function(x) {
    if (is.matrix(x)) {
        apply(x, MARGIN = 1, function(z) {
            z <- z[!is.na(z)]
            if (length(z) > 1) {
                rng <- range(z)
                rng[2] / rng[1]
            } else
                NA_real_
        })
    } else
        rep(NA_real_, length(x))
}
mra_lcms <- do.call(
cbind, within_group_fun(concentrations(lcms_repl),
lcms_repl$sample_name, mra))
#' Remove columns (sample pairs) with only NAs.
mra_lcms <- mra_lcms[, prop_na(mra_lcms, 2) < 1]
The distribution of MRA values per replicated sample before and after normalization is shown below, first for the LCMS and subsequently for the FIA data. For easier visualization the data are log2 transformed: a log2 MRA of 0 represents identical concentrations and a log2 MRA larger than 1 a more than two-fold difference in abundance.
Figure 7: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the LCMS data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.
Normalization reduced the MRAs for all replicates and had its largest impact on the measurements from the first two replicated plates. With the exception of a few replicates, the per-sample average log2 MRAs are below 0.2, which corresponds to a less than 15% difference in concentration. Based on the MRAs, differences in abundance above ~1.5-fold (log2 MRA of ~0.6) in the LCMS data can be considered real, while for smaller differences it is not possible to discriminate between technical and biological variance.
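The conversions between log2 MRA and fold change used in this interpretation can be checked directly in R:

```r
## Fold change corresponding to a log2 MRA of 0.2 (about 1.149, i.e. < 15%)
fc_02 <- 2^0.2
## log2 MRA corresponding to a 1.5-fold difference (about 0.585)
lmra_15 <- log2(1.5)
round(c(fc_02 = fc_02, lmra_15 = lmra_15), 3)
```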
Next we plot the MRAs before and after normalization for the FIA data.
Figure 8: Distribution of the maximal ratio of abundances (MRA) for replicated measurements for the FIA data before and after normalization
Background shading indicates samples from replicated plates. Blue points represent the mean MRA of a sample. Data points above the grey horizontal line (log2 MRA = 1) indicate measurements with a more than two-fold difference in concentration.
Differences between replicated measurements are much larger for the FIA than for the LCMS data. Normalization reduced these differences, specifically for replicated plate pairs 1 (BPLT00000037/BPLT00000038), 4 (BPLT00000229/BPLT00000230) and 11 (BPLT00000049/BPLT00000050). For the FIA data, even two-fold differences could be caused by technical variance alone. Potentially biologically relevant differences should thus be at least two-fold.
The table below summarizes the differences between replicated measurements. For each replicated individual the median MRA (across all analytes) was calculated, as well as the 80% and the 95% quantile. The mean of these values across all replicated individuals is reported and used as a quality criterion.
| metric | LCMS raw | LCMS | FIA raw | FIA |
|---|---|---|---|---|
| mean of median MRA | 1.064 | 1.054 | 1.197 | 1.104 |
| mean of 80% quantile MRA | 1.113 | 1.092 | 1.374 | 1.175 |
| mean of 95% quantile MRA | 1.182 | 1.145 | 1.599 | 1.326 |
| no. samples with median MRA > 1.3 | 0 | 0 | 104 | 19 |
| % of measurements with MRA > 1.3 | 1.82 | 0.8768 | 28.7 | 8.61 |
On average, 80% of analytes differ by less than 1.1-fold in concentration between replicates for the LCMS data and by less than 1.2-fold for the FIA data. The quality of the data is thus extremely good. Normalization improved data quality, specifically for the FIA data.
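The summary statistics in the table above can be sketched as follows for an MRA matrix with analytes in rows and replicated individuals in columns (the layout of `mra_lcms` above). `summarize_mra()` is a hypothetical helper, the toy MRA values are made up, and the quantile type (R's default, type 7) is an assumption.

```r
## Hypothetical helper summarizing an MRA matrix
## (rows: analytes, columns: replicated individuals)
summarize_mra <- function(mra) {
    ## per individual: median, 80% and 95% quantile of the MRA across analytes
    qnts <- apply(mra, 2, quantile, probs = c(0.5, 0.8, 0.95), na.rm = TRUE)
    c(mean_median = mean(qnts[1, ]),
      mean_q80 = mean(qnts[2, ]),
      mean_q95 = mean(qnts[3, ]),
      n_median_gt_1.3 = sum(qnts[1, ] > 1.3),
      pct_gt_1.3 = 100 * mean(mra > 1.3, na.rm = TRUE))
}

## Made-up MRAs for 3 analytes in 2 replicated individuals
mra_toy <- cbind(ind1 = c(1.0, 1.1, 1.2), ind2 = c(1.4, 1.5, 1.6))
mra_summary <- summarize_mra(mra_toy)
mra_summary
```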
Despite the overall good quality of the data, some replicated measurements show (in some cases extremely) large differences. These are listed below.
| sample_name | plate_name | median MRA | data | AR 0% | AR 25% | AR 50% | AR 75% | AR 100% | int | slope | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0010133324 | BPLT00000225, BPLT00000226 | 2.018 | FIA | 0.3591 | 0.4715 | 0.4954 | 0.5607 | 1.499 | 14.05 | 1.025 | 0.9886 |
| 0010071152 | BPLT00000225, BPLT00000226 | 1.875 | FIA | 0.2786 | 0.4955 | 0.5334 | 0.6426 | 1.213 | 18 | 1.024 | 0.9912 |
| 0010222815 | BPLT00000205, BPLT00000206 | 1.859 | FIA | 0.3554 | 0.4962 | 0.5381 | 0.6681 | 1.061 | 11.67 | 1.103 | 0.9899 |
| 0010306114 | BPLT00000225, BPLT00000226 | 1.629 | FIA | 0.4419 | 0.5638 | 0.6145 | 0.6867 | 2.035 | 7.977 | 0.8823 | 0.9936 |
| 0010324169 | BPLT00000181, BPLT00000182 | 1.617 | FIA | 0.8972 | 1.416 | 1.617 | 1.693 | 2.265 | -10.53 | 1.004 | 0.9899 |
| 0010148293 | BPLT00000225, BPLT00000226 | 1.575 | FIA | 0.3779 | 0.6015 | 0.635 | 0.6939 | 1.193 | 8.326 | 1.03 | 0.9956 |
| 0010103176 | BPLT00000081, BPLT00000082 | 1.522 | FIA | 0.1071 | 0.5958 | 0.657 | 0.7359 | 1.07 | 7.748 | 1.204 | 0.9958 |
| 0010339901 | BPLT00000225, BPLT00000226 | 1.435 | FIA | 0.3876 | 0.649 | 0.6967 | 0.7571 | 1.136 | 7.642 | 1.019 | 0.9965 |
| 0010063900 | BPLT00000105, BPLT00000106 | 1.433 | FIA | 0.4846 | 0.6725 | 0.6977 | 0.7629 | 1.172 | 7.783 | 0.9383 | 0.9964 |
| 0010039436 | BPLT00000205, BPLT00000206 | 1.431 | FIA | 0.5496 | 0.671 | 0.6989 | 0.82 | 1.149 | 5.599 | 1.132 | 0.9981 |
| 0010226390 | BPLT00000205, BPLT00000206 | 1.408 | FIA | 0.5034 | 0.6745 | 0.7122 | 0.8478 | 2.43 | 5.061 | 1.082 | 0.9983 |
| 0010176777 | BPLT00000041, BPLT00000042 | 1.377 | FIA | 0.5573 | 0.6621 | 0.7264 | 0.8964 | 1.157 | 5.293 | 0.9386 | 0.9961 |
| 0010179082 | BPLT00000049, BPLT00000050 | 1.372 | FIA | 0.8236 | 1.233 | 1.372 | 1.572 | 2.08 | -6.964 | 0.8458 | 0.9963 |
| 0010167974 | BPLT00000225, BPLT00000226 | 1.361 | FIA | 0.5553 | 0.7154 | 0.7373 | 0.8134 | 1.765 | 8.407 | 1.025 | 0.9942 |
| 0010291696 | BPLT00000241, BPLT00000242 | 1.343 | FIA | 0.535 | 0.702 | 0.7447 | 0.7885 | 1.325 | 4.283 | 1.231 | 0.9983 |
| 0010281876 | BPLT00000225, BPLT00000226 | 1.332 | FIA | 0.367 | 0.7171 | 0.7509 | 0.8256 | 1.214 | 5.779 | 1.035 | 0.998 |
| 0010271305 | BPLT00000225, BPLT00000226 | 1.319 | FIA | 0.5374 | 0.7367 | 0.7586 | 0.8331 | 1.332 | 7.668 | 0.9963 | 0.9958 |
| 0010290792 | BPLT00000241, BPLT00000242 | 1.31 | FIA | 0.505 | 0.7266 | 0.7634 | 0.8089 | 1.056 | 7.003 | 1.105 | 0.9985 |
| 0010252353 | BPLT00000049, BPLT00000050 | 1.308 | FIA | 0.8679 | 1.065 | 1.308 | 1.412 | 2.147 | -12.05 | 0.9723 | 0.9944 |
Not a single replicate pair in the LC-MS data set had a median MRA larger 1.3. For the FIA data 20 had a median MRA larger than 1.3, but, with the exception of a single sample, smaller than 2-fold. Most of the samples (8) are from replicated plates BPLT00000225/BPLT00000226.
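The MRA reports only absolute differences and can therefore not distinguish between sample mix-ups and differences in sample amount (or failed injections). The very high R² values of the linear fits between replicates, however, argue against sample mix-ups: the two replicate profiles remain almost perfectly linearly related, as expected when the same sample is measured twice. The differences are thus most likely caused by differences in absolute concentrations.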
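The argument can be illustrated with simulated data (not CHRIS values). As a stand-in for the MRA we assume the median absolute log2 difference between two replicates. We also assume that a mix-up behaves like independent per-analyte deviations with a standard deviation of 1 on the log2 scale. The helper names `med_abs_diff` and `r2` are hypothetical.

```r
## Simulated illustration (not CHRIS data): a difference in sample amount
## versus a sample mix-up.
set.seed(123)
n <- 100
## log2 concentrations of 100 hypothetical analytes in one sample
a <- rnorm(n, mean = 5, sd = 2)
## Replicate with ~1.5-fold more material: constant shift plus small noise
b_amount <- a + log2(1.5) + rnorm(n, sd = 0.1)
## Replicate that is actually another individual (assumed: independent
## deviations with sd 1 on the log2 scale)
b_mixup <- a + rnorm(n, sd = 1)

## Median absolute log2 difference between two replicates
med_abs_diff <- function(x, y) median(abs(x - y))
## R squared of a linear fit of one replicate on the other
r2 <- function(x, y) summary(lm(y ~ x))$r.squared

c(amount = med_abs_diff(a, b_amount), mixup = med_abs_diff(a, b_mixup))
c(amount = r2(a, b_amount), mixup = r2(a, b_mixup))
```

Both scenarios yield similar absolute differences, but only the amount difference keeps R² close to 1, which is why the R² values help to rule out mix-ups.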
Finally, we perform a PCA to group samples based on their metabolic profiles. We use all measured concentrations, including those outside an analyte's detection range. Biocrates reports an abundance estimate for every analyte, even if it lies outside the limit of detection or quantification, but it does not guarantee that such values are correct.
Analytes with at least one missing value (i.e. a measurement that was dropped because of a technical problem) are removed.
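#' Extract the concentrations (analytes in rows, samples in columns)
conc_lcms <- concentrations(lcms, blessing = "none")
conc_fia <- concentrations(fia, blessing = "none")
#' Remove analytes (rows) with at least one missing value
conc_lcms <- na.omit(conc_lcms)
conc_fia <- na.omit(conc_fia)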
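Applied to a matrix, `na.omit()` removes every row that contains at least one `NA`. Because the concentration matrices have analytes in rows (they are transposed before the PCA), this drops analytes rather than samples. A toy example with hypothetical values:

```r
## Toy matrix with analytes in rows and samples in columns (hypothetical values)
m <- matrix(c(1, 2, NA, 4,
              5, 6, 7, 8),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("s1", "s2")))
## na.omit() drops row "C" because it contains an NA in sample s1
m_clean <- na.omit(m)
rownames(m_clean)
## Equivalent explicit filter (without the "na.action" attribute)
m_clean2 <- m[rowSums(is.na(m)) == 0, , drop = FALSE]
```

`na.omit()` additionally records the removed rows in its `"na.action"` attribute, which is convenient for listing the excluded analytes.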
The PCA is based on the analytes listed in the table below.
| analyte | analyte_class |
|---|---|
| Ala | aminoacids |
| Asn | aminoacids |
| Cit | aminoacids |
| Creatinine | biogenic amines |
| Gln | aminoacids |
| Glu | aminoacids |
| Gly | aminoacids |
| His | aminoacids |
| Ile | aminoacids |
| Kynurenine | biogenic amines |
| Lys | aminoacids |
| Met | aminoacids |
| Phe | aminoacids |
| Pro | aminoacids |
| Ser | aminoacids |
| t4-OH-Pro | biogenic amines |
| Trp | aminoacids |
| Tyr | aminoacids |
| Val | aminoacids |
| C0 | acylcarnitines |
| C10 | acylcarnitines |
| C10:2 | acylcarnitines |
| C12 | acylcarnitines |
| C12-DC | acylcarnitines |
| C14 | acylcarnitines |
| C18:1-OH | acylcarnitines |
| C18:2 | acylcarnitines |
| C2 | acylcarnitines |
| C3 | acylcarnitines |
| C8 | acylcarnitines |
| H1 | sugars |
| lysoPC a C16:0 | glycerophospholipids |
| lysoPC a C16:1 | glycerophospholipids |
| lysoPC a C17:0 | glycerophospholipids |
| lysoPC a C18:0 | glycerophospholipids |
| lysoPC a C18:1 | glycerophospholipids |
| lysoPC a C18:2 | glycerophospholipids |
| lysoPC a C20:3 | glycerophospholipids |
| lysoPC a C20:4 | glycerophospholipids |
| lysoPC a C24:0 | glycerophospholipids |
| lysoPC a C28:1 | glycerophospholipids |
| PC aa C24:0 | glycerophospholipids |
| PC aa C26:0 | glycerophospholipids |
| PC aa C28:1 | glycerophospholipids |
| PC aa C30:0 | glycerophospholipids |
| PC aa C32:0 | glycerophospholipids |
| PC aa C32:1 | glycerophospholipids |
| PC aa C32:2 | glycerophospholipids |
| PC aa C32:3 | glycerophospholipids |
| PC aa C34:1 | glycerophospholipids |
| PC aa C34:2 | glycerophospholipids |
| PC aa C34:3 | glycerophospholipids |
| PC aa C34:4 | glycerophospholipids |
| PC aa C36:1 | glycerophospholipids |
| PC aa C36:2 | glycerophospholipids |
| PC aa C36:3 | glycerophospholipids |
| PC aa C36:4 | glycerophospholipids |
| PC aa C36:5 | glycerophospholipids |
| PC aa C36:6 | glycerophospholipids |
| PC aa C38:0 | glycerophospholipids |
| PC aa C38:3 | glycerophospholipids |
| PC aa C38:4 | glycerophospholipids |
| PC aa C38:5 | glycerophospholipids |
| PC aa C38:6 | glycerophospholipids |
| PC aa C40:2 | glycerophospholipids |
| PC aa C40:3 | glycerophospholipids |
| PC aa C40:4 | glycerophospholipids |
| PC aa C40:5 | glycerophospholipids |
| PC aa C40:6 | glycerophospholipids |
| PC aa C42:0 | glycerophospholipids |
| PC aa C42:1 | glycerophospholipids |
| PC aa C42:2 | glycerophospholipids |
| PC aa C42:4 | glycerophospholipids |
| PC aa C42:5 | glycerophospholipids |
| PC aa C42:6 | glycerophospholipids |
| PC ae C30:0 | glycerophospholipids |
| PC ae C30:2 | glycerophospholipids |
| PC ae C32:1 | glycerophospholipids |
| PC ae C32:2 | glycerophospholipids |
| PC ae C34:0 | glycerophospholipids |
| PC ae C34:1 | glycerophospholipids |
| PC ae C34:2 | glycerophospholipids |
| PC ae C34:3 | glycerophospholipids |
| PC ae C36:0 | glycerophospholipids |
| PC ae C36:1 | glycerophospholipids |
| PC ae C36:2 | glycerophospholipids |
| PC ae C36:3 | glycerophospholipids |
| PC ae C36:4 | glycerophospholipids |
| PC ae C36:5 | glycerophospholipids |
| PC ae C38:0 | glycerophospholipids |
| PC ae C38:3 | glycerophospholipids |
| PC ae C38:4 | glycerophospholipids |
| PC ae C38:5 | glycerophospholipids |
| PC ae C38:6 | glycerophospholipids |
| PC ae C40:1 | glycerophospholipids |
| PC ae C40:2 | glycerophospholipids |
| PC ae C40:3 | glycerophospholipids |
| PC ae C40:4 | glycerophospholipids |
| PC ae C40:5 | glycerophospholipids |
| PC ae C40:6 | glycerophospholipids |
| PC ae C42:0 | glycerophospholipids |
| PC ae C42:1 | glycerophospholipids |
| PC ae C42:2 | glycerophospholipids |
| PC ae C42:3 | glycerophospholipids |
| PC ae C42:4 | glycerophospholipids |
| PC ae C42:5 | glycerophospholipids |
| PC ae C44:3 | glycerophospholipids |
| PC ae C44:4 | glycerophospholipids |
| PC ae C44:5 | glycerophospholipids |
| PC ae C44:6 | glycerophospholipids |
| SM (OH) C14:1 | sphingolipids |
| SM (OH) C16:1 | sphingolipids |
| SM (OH) C22:1 | sphingolipids |
| SM (OH) C22:2 | sphingolipids |
| SM (OH) C24:1 | sphingolipids |
| SM C16:0 | sphingolipids |
| SM C16:1 | sphingolipids |
| SM C18:0 | sphingolipids |
| SM C18:1 | sphingolipids |
| SM C24:0 | sphingolipids |
| SM C24:1 | sphingolipids |
| analyte | biochemical_name |
|---|---|
| Ala | Alanine |
| Asn | Asparagine |
| Cit | Citrulline |
| Creatinine | Creatinine |
| Gln | Glutamine |
| Glu | Glutamate |
| Gly | Glycine |
| His | Histidine |
| Ile | Isoleucine |
| Kynurenine | Kynurenine |
| Lys | Lysine |
| Met | Methionine |
| Phe | Phenylalanine |
| Pro | Proline |
| Ser | Serine |
| t4-OH-Pro | trans-4-Hydroxyproline |
| Trp | Tryptophan |
| Tyr | Tyrosine |
| Val | Valine |
| C0 | Carnitine |
| C10 | Decanoylcarnitine |
| C10:2 | Decadienylcarnitine |
| C12 | Dodecanoylcarnitine |
| C12-DC | Dodecanedioylcarnitine |
| C14 | Tetradecanoylcarnitine |
| C18:1-OH | Hydroxyoctadecenoylcarnitine |
| C18:2 | Octadecadienylcarnitine |
| C2 | Acetylcarnitine |
| C3 | Propionylcarnitine |
| C8 | Octanoylcarnitine |
| H1 | Hexose |
| lysoPC a C16:0 | lysoPhosphatidylcholine acyl C16:0 |
| lysoPC a C16:1 | lysoPhosphatidylcholine acyl C16:1 |
| lysoPC a C17:0 | lysoPhosphatidylcholine acyl C17:0 |
| lysoPC a C18:0 | lysoPhosphatidylcholine acyl C18:0 |
| lysoPC a C18:1 | lysoPhosphatidylcholine acyl C18:1 |
| lysoPC a C18:2 | lysoPhosphatidylcholine acyl C18:2 |
| lysoPC a C20:3 | lysoPhosphatidylcholine acyl C20:3 |
| lysoPC a C20:4 | lysoPhosphatidylcholine acyl C20:4 |
| lysoPC a C24:0 | lysoPhosphatidylcholine acyl C24:0 |
| lysoPC a C28:1 | lysoPhosphatidylcholine acyl C28:1 |
| PC aa C24:0 | Phosphatidylcholine diacyl C24:0 |
| PC aa C26:0 | Phosphatidylcholine diacyl C26:0 |
| PC aa C28:1 | Phosphatidylcholine diacyl C28:1 |
| PC aa C30:0 | Phosphatidylcholine diacyl C30:0 |
| PC aa C32:0 | Phosphatidylcholine diacyl C32:0 |
| PC aa C32:1 | Phosphatidylcholine diacyl C32:1 |
| PC aa C32:2 | Phosphatidylcholine diacyl C32:2 |
| PC aa C32:3 | Phosphatidylcholine diacyl C32:3 |
| PC aa C34:1 | Phosphatidylcholine diacyl C34:1 |
| PC aa C34:2 | Phosphatidylcholine diacyl C34:2 |
| PC aa C34:3 | Phosphatidylcholine diacyl C34:3 |
| PC aa C34:4 | Phosphatidylcholine diacyl C34:4 |
| PC aa C36:1 | Phosphatidylcholine diacyl C36:1 |
| PC aa C36:2 | Phosphatidylcholine diacyl C36:2 |
| PC aa C36:3 | Phosphatidylcholine diacyl C36:3 |
| PC aa C36:4 | Phosphatidylcholine diacyl C36:4 |
| PC aa C36:5 | Phosphatidylcholine diacyl C36:5 |
| PC aa C36:6 | Phosphatidylcholine diacyl C36:6 |
| PC aa C38:0 | Phosphatidylcholine diacyl C38:0 |
| PC aa C38:3 | Phosphatidylcholine diacyl C38:3 |
| PC aa C38:4 | Phosphatidylcholine diacyl C38:4 |
| PC aa C38:5 | Phosphatidylcholine diacyl C38:5 |
| PC aa C38:6 | Phosphatidylcholine diacyl C38:6 |
| PC aa C40:2 | Phosphatidylcholine diacyl C40:2 |
| PC aa C40:3 | Phosphatidylcholine diacyl C40:3 |
| PC aa C40:4 | Phosphatidylcholine diacyl C40:4 |
| PC aa C40:5 | Phosphatidylcholine diacyl C40:5 |
| PC aa C40:6 | Phosphatidylcholine diacyl C40:6 |
| PC aa C42:0 | Phosphatidylcholine diacyl C42:0 |
| PC aa C42:1 | Phosphatidylcholine diacyl C42:1 |
| PC aa C42:2 | Phosphatidylcholine diacyl C42:2 |
| PC aa C42:4 | Phosphatidylcholine diacyl C42:4 |
| PC aa C42:5 | Phosphatidylcholine diacyl C42:5 |
| PC aa C42:6 | Phosphatidylcholine diacyl C42:6 |
| PC ae C30:0 | Phosphatidylcholine acyl-alkyl C30:0 |
| PC ae C30:2 | Phosphatidylcholine acyl-alkyl C30:2 |
| PC ae C32:1 | Phosphatidylcholine acyl-alkyl C32:1 |
| PC ae C32:2 | Phosphatidylcholine acyl-alkyl C32:2 |
| PC ae C34:0 | Phosphatidylcholine acyl-alkyl C34:0 |
| PC ae C34:1 | Phosphatidylcholine acyl-alkyl C34:1 |
| PC ae C34:2 | Phosphatidylcholine acyl-alkyl C34:2 |
| PC ae C34:3 | Phosphatidylcholine acyl-alkyl C34:3 |
| PC ae C36:0 | Phosphatidylcholine acyl-alkyl C36:0 |
| PC ae C36:1 | Phosphatidylcholine acyl-alkyl C36:1 |
| PC ae C36:2 | Phosphatidylcholine acyl-alkyl C36:2 |
| PC ae C36:3 | Phosphatidylcholine acyl-alkyl C36:3 |
| PC ae C36:4 | Phosphatidylcholine acyl-alkyl C36:4 |
| PC ae C36:5 | Phosphatidylcholine acyl-alkyl C36:5 |
| PC ae C38:0 | Phosphatidylcholine acyl-alkyl C38:0 |
| PC ae C38:3 | Phosphatidylcholine acyl-alkyl C38:3 |
| PC ae C38:4 | Phosphatidylcholine acyl-alkyl C38:4 |
| PC ae C38:5 | Phosphatidylcholine acyl-alkyl C38:5 |
| PC ae C38:6 | Phosphatidylcholine acyl-alkyl C38:6 |
| PC ae C40:1 | Phosphatidylcholine acyl-alkyl C40:1 |
| PC ae C40:2 | Phosphatidylcholine acyl-alkyl C40:2 |
| PC ae C40:3 | Phosphatidylcholine acyl-alkyl C40:3 |
| PC ae C40:4 | Phosphatidylcholine acyl-alkyl C40:4 |
| PC ae C40:5 | Phosphatidylcholine acyl-alkyl C40:5 |
| PC ae C40:6 | Phosphatidylcholine acyl-alkyl C40:6 |
| PC ae C42:0 | Phosphatidylcholine acyl-alkyl C42:0 |
| PC ae C42:1 | Phosphatidylcholine acyl-alkyl C42:1 |
| PC ae C42:2 | Phosphatidylcholine acyl-alkyl C42:2 |
| PC ae C42:3 | Phosphatidylcholine acyl-alkyl C42:3 |
| PC ae C42:4 | Phosphatidylcholine acyl-alkyl C42:4 |
| PC ae C42:5 | Phosphatidylcholine acyl-alkyl C42:5 |
| PC ae C44:3 | Phosphatidylcholine acyl-alkyl C44:3 |
| PC ae C44:4 | Phosphatidylcholine acyl-alkyl C44:4 |
| PC ae C44:5 | Phosphatidylcholine acyl-alkyl C44:5 |
| PC ae C44:6 | Phosphatidylcholine acyl-alkyl C44:6 |
| SM (OH) C14:1 | Hydroxysphingomyeline C14:1 |
| SM (OH) C16:1 | Hydroxysphingomyeline C16:1 |
| SM (OH) C22:1 | Hydroxysphingomyeline C22:1 |
| SM (OH) C22:2 | Hydroxysphingomyeline C22:2 |
| SM (OH) C24:1 | Hydroxysphingomyeline C24:1 |
| SM C16:0 | Sphingomyeline C16:0 |
| SM C16:1 | Sphingomyeline C16:1 |
| SM C18:0 | Sphingomyeline C18:0 |
| SM C18:1 | Sphingomyeline C18:1 |
| SM C24:0 | Sphingomyeline C24:0 |
| SM C24:1 | Sphingomyeline C24:1 |
The table below lists the analytes that were excluded from the analysis because they had at least one missing value.
| analyte | analyte_class |
|---|---|
| alpha-AAA | biogenic amines |
| Arg | aminoacids |
| Asp | aminoacids |
| Leu | aminoacids |
| Orn | aminoacids |
| Putrescine | biogenic amines |
| Sarcosine | biogenic amines |
| SDMA | biogenic amines |
| Serotonin | biogenic amines |
| Spermidine | biogenic amines |
| Taurine | biogenic amines |
| Thr | aminoacids |
| C10:1 | acylcarnitines |
| C16:1 | acylcarnitines |
| C4 | acylcarnitines |
| C6 (C4:1-DC) | acylcarnitines |
| PC aa C36:0 | glycerophospholipids |
| PC aa C40:1 | glycerophospholipids |
| PC ae C38:1 | glycerophospholipids |
| PC ae C38:2 | glycerophospholipids |
| SM C20:2 | sphingolipids |
| analyte | biochemical_name |
|---|---|
| alpha-AAA | alpha-Aminoadipic acid |
| Arg | Arginine |
| Asp | Aspartate |
| Leu | Leucine |
| Orn | Ornithine |
| Putrescine | Putrescine |
| Sarcosine | Sarcosine |
| SDMA | Symmetric dimethylarginine |
| Serotonin | Serotonin |
| Spermidine | Spermidine |
| Taurine | Taurine |
| Thr | Threonine |
| C10:1 | Decenoylcarnitine |
| C16:1 | Hexadecenoylcarnitine |
| C4 | Butyrylcarnitine |
| C6 (C4:1-DC) | Hexanoylcarnitine (Fumarylcarnitine) |
| PC aa C36:0 | Phosphatidylcholine diacyl C36:0 |
| PC aa C40:1 | Phosphatidylcholine diacyl C40:1 |
| PC ae C38:1 | Phosphatidylcholine acyl-alkyl C38:1 |
| PC ae C38:2 | Phosphatidylcholine acyl-alkyl C38:2 |
| SM C20:2 | Sphingomyeline C20:2 |
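We perform the PCA on the log2-transformed and (analyte-wise) centered abundances, without scaling.
#' prcomp() expects samples in rows, hence the transpose; each analyte
#' (column) is centered but not scaled
pc_lcms <- prcomp(t(log2(conc_lcms)), center = TRUE, scale. = FALSE)
pc_fia <- prcomp(t(log2(conc_fia)), center = TRUE, scale. = FALSE)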
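Running the unscaled PCA on log2 values has a practical consequence, sketched below with simulated data. This is an illustration only, not a rationale stated by the authors. On the raw scale, analytes with very high concentrations dominate an unscaled PCA. On the log2 scale, a common multiplicative effect, such as a difference in sample amount, loads evenly on all analytes.

```r
## Hypothetical concentrations of three analytes in 50 samples that differ by
## orders of magnitude but share one common sample-level factor (log2 scale)
set.seed(42)
f <- rnorm(50)  # shared factor, e.g. a difference in sample amount
conc <- rbind(high = 2^(12 + f + rnorm(50, sd = 0.1)),
              mid  = 2^(6 + f + rnorm(50, sd = 0.1)),
              low  = 2^(1 + f + rnorm(50, sd = 0.1)))

## Unscaled, centered PCA on raw and on log2 values (samples in rows)
pc_raw <- prcomp(t(conc), center = TRUE, scale. = FALSE)
pc_log <- prcomp(t(log2(conc)), center = TRUE, scale. = FALSE)

## Absolute loadings of the three analytes on PC1
round(abs(pc_raw$rotation[, "PC1"]), 3)
round(abs(pc_log$rotation[, "PC1"]), 3)
```

On the raw scale PC1 is essentially the "high" analyte alone, whereas on the log2 scale all three analytes contribute roughly equally.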
Figure 9: PCA of the normalized LCMS data
PCA was performed on all measured concentrations, including those outside the detection range.
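A PCA plot like the one in Figure 9 can be sketched as below. The code that produced the figure is not shown, so this uses simulated data with hypothetical group labels. The colors are the Set1 values that the `cols` definition at the top of the document selects. Group centroids on each PC summarize which sample types separate along that component.

```r
## Hypothetical data: 30 samples in three groups, 15 analytes (log2 scale);
## the "QC A" samples differ from the others in the first 5 analytes
set.seed(1)
grp <- rep(c("QC A", "QC B", "study"), each = 10)
X <- matrix(rnorm(30 * 15), nrow = 30)
X[grp == "QC A", 1:5] <- X[grp == "QC A", 1:5] + 5
pc <- prcomp(X, center = TRUE, scale. = FALSE)

## Percentage of variance explained per PC, used in the axis labels
pve <- 100 * pc$sdev^2 / sum(pc$sdev^2)
## Mean PC1/PC2 score per group (rows: groups, columns: PCs)
centroids <- apply(pc$x[, 1:2], 2, function(s) tapply(s, grp, mean))
centroids

## Scatter plot of PC1 versus PC2, colored by group
grp_cols <- c("QC A" = "#377EB8", "QC B" = "#4DAF4A", "study" = "#999999")
plot(pc$x[, 1], pc$x[, 2], col = grp_cols[grp], pch = 16,
     xlab = sprintf("PC1 (%.1f%%)", pve[1]),
     ylab = sprintf("PC2 (%.1f%%)", pve[2]))
legend("topright", legend = names(grp_cols), col = grp_cols, pch = 16)
```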
PC1 clearly separates the 00p180_QC2 samples from all other samples, while PC2 separates the QC NIST STD and 00p180_QC1 samples from the QC CHRIS Pool and study samples, which, not surprisingly, overlap. The first principal components thus simply reflect differences between the artificial QC samples and the samples from the CHRIS study. Below we plot the results for the FIA data.
Figure 10: PCA of the normalized FIA data
PCA was performed on all measured concentrations, including those outside the detection range.
PC1 separates the Biocrates QC samples 00p180_QC1 and 00p180_QC2 from all other samples. The samples separating from the main cloud on PC4 are from plates BPLT00000337, BPLT00000357, BPLT00000361 and BPLT00000413.
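Samples that separate from the main cloud on a given PC can be flagged and tabulated by plate, as in the sketch below. It rests on several assumptions: the data are simulated, the plate IDs are hypothetical, and the helper `flag_pc_outliers()` uses a robust z-score (median and MAD) with an arbitrary cutoff of 3. Here the simulated effect lands on PC1; for the FIA data one would pass the PC4 scores (`pc_fia$x[, "PC4"]`) together with the plate of each sample.

```r
## Hypothetical data: 60 samples on 6 plates, 10 analytes (log2 scale); the
## samples of "plate_05" carry a shift in three analytes
set.seed(7)
plate <- rep(sprintf("plate_%02d", 1:6), each = 10)
X <- matrix(rnorm(60 * 10), nrow = 60)
X[plate == "plate_05", 1:3] <- X[plate == "plate_05", 1:3] + 5
pc <- prcomp(X, center = TRUE, scale. = FALSE)

## Flag samples whose score is far from the bulk, using a robust z-score
## (median and MAD) so that the outliers do not inflate the spread estimate
flag_pc_outliers <- function(scores, cutoff = 3) {
    z <- (scores - median(scores)) / mad(scores)
    abs(z) > cutoff
}
flagged <- flag_pc_outliers(pc$x[, "PC1"])
## Plates of the flagged samples
table(plate[flagged])
```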
Data release information.
metadata(bcd)
## name value
## 1 db_creation_date 2020-04-01 09:19:31
## 2 data_date 2020-03
## 3 chris_release 3.5
## 4 version 2.0.0
## 5 aminoacids_norm anlt_log2_QCs_mean_mean
## 6 biogenic_amines_norm anlt_log2_QCs_mean_mean
## 7 acylcarnitines_norm anlt_log2_CHRIS_mean
## 8 glycerophospholipids_norm anlt_log2_QCs_mean_mean
## 9 sphingolipids_norm anlt_log2_QCs_mean_mean
## 10 sugars_norm anlt_log2_QCs_mean_mean
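The `*_norm` entries above record the normalization approach applied to each analyte class. Judging from the printed output, we assume `metadata()` returns a data.frame with columns `name` and `value`. Under that assumption, the entries could be grouped as sketched below; to keep the sketch self-contained, the values are copied from the output above.

```r
## Values copied from the metadata(bcd) output above
md <- data.frame(
    name = c("db_creation_date", "data_date", "chris_release", "version",
             "aminoacids_norm", "biogenic_amines_norm", "acylcarnitines_norm",
             "glycerophospholipids_norm", "sphingolipids_norm", "sugars_norm"),
    value = c("2020-04-01 09:19:31", "2020-03", "3.5", "2.0.0",
              "anlt_log2_QCs_mean_mean", "anlt_log2_QCs_mean_mean",
              "anlt_log2_CHRIS_mean", "anlt_log2_QCs_mean_mean",
              "anlt_log2_QCs_mean_mean", "anlt_log2_QCs_mean_mean"),
    stringsAsFactors = FALSE)

## Keep only the normalization entries, named by analyte class
is_norm <- grepl("_norm$", md$name)
norm <- setNames(md$value[is_norm], sub("_norm$", "", md$name[is_norm]))
## Analyte classes grouped by normalization method
by_method <- split(names(norm), norm)
by_method
```

The acylcarnitines are the only class normalized with `anlt_log2_CHRIS_mean`; all other classes use `anlt_log2_QCs_mean_mean`.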
R session information.
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R Under development (unstable) (2020-03-27 r78085)
## os Debian GNU/Linux 10 (buster)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2020-04-01
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## affy 1.65.1 2019-11-06 [1] Bioconductor
## affyio 1.57.0 2019-10-29 [1] Bioconductor
## AnnotationDbi 1.49.1 2020-01-25 [1] Bioconductor
## AnnotationFilter * 1.11.0 2019-10-29 [1] Bioconductor
## assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.0)
## backports 1.1.5 2019-10-02 [2] CRAN (R 4.1.0)
## Biobase * 2.47.3 2020-03-16 [1] Bioconductor
## BiocGenerics * 0.33.3 2020-03-23 [1] Bioconductor
## BioCHRIStes * 2.0.0 2020-04-01 [1] Bioconductor
## BiocManager * 1.30.10 2019-11-16 [2] CRAN (R 4.1.0)
## BiocParallel * 1.21.2 2019-12-21 [1] Bioconductor
## BiocStyle * 2.15.6 2020-02-01 [1] Bioconductor
## bit 1.1-15.2 2020-02-10 [1] CRAN (R 4.0.0)
## bit64 0.9-7 2017-05-08 [1] CRAN (R 4.0.0)
## bitops 1.0-6 2013-08-17 [1] CRAN (R 4.0.0)
## blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.0)
## bookdown 0.18 2020-03-05 [1] CRAN (R 4.0.0)
## callr 3.4.2 2020-02-12 [2] CRAN (R 4.1.0)
## cli 2.0.2 2020-02-28 [2] CRAN (R 4.1.0)
## codetools 0.2-16 2018-12-24 [3] CRAN (R 4.1.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
## crayon 1.3.4 2017-09-16 [2] CRAN (R 4.1.0)
## DBI * 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
## dbplyr 1.4.2 2019-06-17 [1] CRAN (R 4.0.0)
## DelayedArray * 0.13.8 2020-03-26 [1] Bioconductor
## DEoptimR 1.0-8 2016-11-19 [1] CRAN (R 4.0.0)
## desc 1.2.0 2018-05-01 [2] CRAN (R 4.1.0)
## devtools 2.2.2 2020-02-17 [1] CRAN (R 4.0.0)
## digest 0.6.25 2020-02-23 [2] CRAN (R 4.1.0)
## doParallel 1.0.15 2019-08-02 [1] CRAN (R 4.0.0)
## dplyr 0.8.5 2020-03-07 [1] CRAN (R 4.0.0)
## ellipsis 0.3.0 2019-09-20 [2] CRAN (R 4.1.0)
## evaluate 0.14 2019-05-28 [2] CRAN (R 4.1.0)
## fansi 0.4.1 2020-01-08 [2] CRAN (R 4.1.0)
## foreach 1.5.0 2020-03-30 [1] CRAN (R 4.1.0)
## fs 1.3.2 2020-03-05 [2] CRAN (R 4.1.0)
## GenomeInfoDb * 1.23.16 2020-03-27 [1] Bioconductor
## GenomeInfoDbData 1.2.2 2020-02-18 [1] Bioconductor
## GenomicRanges * 1.39.3 2020-03-24 [1] Bioconductor
## ggplot2 3.3.0 2020-03-05 [1] CRAN (R 4.0.0)
## glue 1.3.2 2020-03-12 [2] CRAN (R 4.1.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
## highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
## hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0)
## htmltools 0.4.0 2019-10-04 [2] CRAN (R 4.1.0)
## impute 1.61.0 2019-10-29 [1] Bioconductor
## IRanges * 2.21.8 2020-03-25 [1] Bioconductor
## iterators 1.0.12 2019-07-26 [1] CRAN (R 4.0.0)
## knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
## lattice 0.20-40 2020-02-19 [3] CRAN (R 4.1.0)
## lazyeval 0.2.2 2019-03-15 [2] CRAN (R 4.1.0)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
## limma 3.43.5 2020-03-06 [1] Bioconductor
## magick 2.3 2020-01-24 [1] CRAN (R 4.0.0)
## magrittr 1.5 2014-11-22 [2] CRAN (R 4.1.0)
## MALDIquant 1.19.3 2019-05-12 [1] CRAN (R 4.0.0)
## MASS 7.3-51.5 2019-12-20 [3] CRAN (R 4.1.0)
## MassSpecWavelet 1.53.0 2019-10-29 [1] Bioconductor
## Matrix 1.2-18 2019-11-27 [3] CRAN (R 4.1.0)
## matrixStats * 0.56.0 2020-03-13 [1] CRAN (R 4.0.0)
## memoise 1.1.0 2017-04-21 [2] CRAN (R 4.1.0)
## MSnbase * 2.13.4 2020-03-24 [1] Bioconductor
## multtest 2.43.1 2020-03-12 [1] Bioconductor
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
## mzID 1.25.0 2019-10-29 [1] Bioconductor
## mzR * 2.21.1 2019-12-14 [1] Bioconductor
## ncdf4 1.17 2019-10-23 [1] CRAN (R 4.0.0)
## pander * 0.6.3 2018-11-06 [1] CRAN (R 4.0.0)
## pcaMethods 1.79.1 2019-11-03 [1] Bioconductor
## pillar 1.4.3 2019-12-20 [1] CRAN (R 4.1.0)
## pkgbuild 1.0.6 2019-10-09 [2] CRAN (R 4.1.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
## pkgload 1.0.2 2018-10-29 [2] CRAN (R 4.1.0)
## plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.0)
## preprocessCore 1.49.2 2020-02-01 [1] Bioconductor
## prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.1.0)
## processx 3.4.2 2020-02-09 [2] CRAN (R 4.1.0)
## ProtGenerics * 1.19.3 2019-12-25 [1] Bioconductor
## ps 1.3.2 2020-02-13 [2] CRAN (R 4.1.0)
## purrr 0.3.3 2019-10-18 [2] CRAN (R 4.1.0)
## R6 2.4.1 2019-11-12 [2] CRAN (R 4.1.0)
## RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.0)
## RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.0.0)
## Rcpp * 1.0.4 2020-03-17 [2] CRAN (R 4.1.0)
## RCurl 1.98-1.1 2020-01-19 [1] CRAN (R 4.0.0)
## remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
## rlang 0.4.5 2020-03-01 [2] CRAN (R 4.1.0)
## RMariaDB * 1.0.8 2019-12-18 [1] CRAN (R 4.0.0)
## rmarkdown * 2.1 2020-01-20 [1] CRAN (R 4.0.0)
## robustbase 0.93-6 2020-03-23 [1] CRAN (R 4.0.0)
## rprojroot 1.3-2 2018-01-03 [2] CRAN (R 4.1.0)
## RSQLite 2.2.0 2020-01-07 [1] CRAN (R 4.0.0)
## S4Vectors * 0.25.14 2020-03-24 [1] Bioconductor
## scales 1.1.0 2019-11-18 [1] CRAN (R 4.0.0)
## sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.1.0)
## stringi 1.4.6 2020-02-17 [2] CRAN (R 4.1.0)
## stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.0)
## SummarizedExperiment * 1.17.5 2020-03-27 [1] Bioconductor
## survival 3.1-11 2020-03-07 [3] CRAN (R 4.1.0)
## testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
## tibble 3.0.0 2020-03-30 [1] CRAN (R 4.1.0)
## tidyselect 1.0.0 2020-01-27 [1] CRAN (R 4.0.0)
## usethis 1.5.1 2019-07-04 [2] CRAN (R 4.1.0)
## vctrs 0.2.4 2020-03-10 [1] CRAN (R 4.0.0)
## vsn 3.55.0 2019-10-29 [1] Bioconductor
## withr 2.1.2 2018-03-15 [2] CRAN (R 4.1.0)
## xcms * 3.9.3 2020-03-13 [1] Bioconductor
## xfun 0.12 2020-01-13 [1] CRAN (R 4.0.0)
## XML 3.99-0.3 2020-01-20 [1] CRAN (R 4.0.0)
## XVector 0.27.2 2020-03-24 [1] Bioconductor
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
## zlibbioc 1.33.1 2020-01-24 [1] Bioconductor
##
## [1] /usr/local/lib/R/host-site-library
## [2] /usr/local/lib/R/site-library
## [3] /usr/local/lib/R/library