Comparing univariate filtration preceding and succeeding PLS-DA analysis on the differential variables/metabolites identified from untargeted LC-MS metabolomics data

Copyright © 2023 Elsevier B.V. All rights reserved.

BACKGROUND: PLS-DA of high-dimensional metabolomics data is frequently employed to capture the features most pertinent to sample classification. However, the presence of numerous insignificant input features can distort the PLS-DA model and both inflate and scramble the set of selected differential features. Univariate filtration is usually applied afterwards to refine the selected features, but this often gives unstable results. In contrast, when insignificant features are excluded beforehand through univariate prefiltration based on FDR-adjusted p-values, PLS-DA can generate more stable and reliable differential features. We explored and compared these two data analysis procedures to gain insight into the mechanisms underlying the disparate results.
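The prefiltration step described above can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names, the feature names, the p-values, and the 0.05 cutoff.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prefilter(feature_names, pvalues, alpha=0.05):
    """Keep only features whose adjusted p-value is below alpha."""
    adjusted = bh_adjust(pvalues)
    return [name for name, q in zip(feature_names, adjusted) if q < alpha]


# Hypothetical univariate test results (illustrative values only).
names = ["m1", "m2", "m3", "m4", "m5"]
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]

kept = prefilter(names, pvals)
print(kept)
```

In this toy input, m3 and m4 have raw p-values below 0.05 but adjusted values above it, so only the features that survive the adjustment would be passed on to PLS-DA.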

RESULTS: The effect of univariate data filtration preceding and succeeding PLS-DA on the identified discriminative features/metabolites was investigated using LC-MS data acquired on samples of human serum and C. elegans extracts, with and without spiked metabolite standards to simulate treated and control groups of biological samples. Univariate data prefiltration before PLS-DA usually gave fewer but more stable, and likely more reliable and meaningful, differential features, whereas PLS-DA applied directly to the original data could be affected by the presence of insignificant features and orthogonal noise. A large number of insignificant variables and orthogonal noise could distort the generated PLS-DA model and affect the p(corr) values. They could also artificially inflate the calculated VIP values of relevant features, because the total number of input features used for model construction increases, leading to more false positives being selected by the conventional VIP threshold of 1.0.
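The VIP inflation described above can be illustrated with a minimal sketch. It assumes a one-component PLS model on autoscaled data, where the weight vector is proportional to each feature's correlation with the class label and the VIP reduces to sqrt(p) * |w_j| / ||w||. The function name and all correlation values are hypothetical and are not taken from the study.

```python
import math


def vip_single_component(correlations):
    """VIP scores for a one-component PLS model on autoscaled data.

    With one component the weights are proportional to X^T y, i.e. to each
    feature's correlation with the class label, so
    VIP_j = sqrt(p) * |w_j| / ||w||.
    """
    p = len(correlations)
    norm_sq = sum(c * c for c in correlations)
    return [math.sqrt(p * c * c / norm_sq) for c in correlations]


# Hypothetical correlations with the class label (illustrative values only).
relevant = [0.9, 0.8]   # clearly discriminative features
weak = [0.15]           # a feature with only marginal association
noise = [0.05] * 200    # insignificant features

vip_small = vip_single_component(relevant + weak)
vip_large = vip_single_component(relevant + weak + noise)

print("relevant feature:", round(vip_small[0], 2), "->", round(vip_large[0], 2))
print("weak feature:    ", round(vip_small[2], 2), "->", round(vip_large[2], 2))
```

In this toy setting the weak feature sits below the 1.0 threshold in the small feature set and above it once the 200 insignificant features are added, even though its own correlation with the class label is unchanged. Real multi-component models on measured data will behave less cleanly.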

SIGNIFICANCE AND NOVELTY: Univariate data filtration preceding PLS-DA is important for identifying reliable differential features when the conventional VIP threshold of 1.0 is used. The presence of insignificant features can distort the PLS-DA model and inflate VIP values. The appropriate VIP threshold depends on the number of input features and the number of model components. For PLS-DA without univariate prefiltration, a VIP threshold larger than 1.0 is recommended for selecting discriminative features, to reduce false positives.
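The dependence of the VIP threshold on the number of input features and components can be read off the commonly used definition of VIP, given here as a sketch; the abstract does not state the formula, so its use by the authors is an assumption. Here p is the number of input features, A the number of components, w_{ja} the weight of feature j on component a, and SS_a the sum of squares of the response explained by component a.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \,\bigl(w_{ja} / \lVert \mathbf{w}_a \rVert\bigr)^2}{\sum_{a=1}^{A} SS_a}},
\qquad
\frac{1}{p}\sum_{j=1}^{p} \mathrm{VIP}_j^2 = 1
```

Because the mean of the squared VIP values is fixed at 1, adding many low-weight features raises p and pushes the VIP values of the remaining features upward, which is consistent with recommending a threshold above 1.0 when no prefiltration is applied.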

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

To the complete record - volume: 1287

Contained in:

Analytica chimica acta - 1287 (2024), 25 Jan., page 342103

Language:

English

Contributors:

Xu, Suyun [Author]
Bai, Caihong [Author]
Chen, Yanli [Author]
Yu, Lingling [Author]
Wu, Wenjun [Author]
Hu, Kaifeng [Author]

Links:

Full text

Subjects:

Differential features
Journal Article
Metabolomics
Multivariate analysis
PLS-DA
Univariate data prefiltration

Notes:

Date Completed 08.01.2024

Date Revised 08.01.2024

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.aca.2023.342103

funding:

Funding institution / Project title:

PPN (catalog ID):

NLM366729578