Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction
Copyright © 2024 Elsevier Ltd. All rights reserved.
The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict RTs of CECs in NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) with molecular fingerprints and chemical descriptors to establish an optimal model. The SVR model utilizing chemical descriptors resulted in good predictive capacity with R²ext = 0.850 and r² = 0.925. The model was further validated through laboratory NTS compound characterization. When applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (confirmed structure by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). The model predicted RTs for level S2 compounds, leading to the classification of 153 level S2 compounds with high confidence (ΔRT < 2 min). The model served as a robust filtering mechanism within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by utilizing RT prediction models to determine compound confidence levels.
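The filtering step mentioned in the abstract (keeping level S2 compounds whose ΔRT is below 2 min) can be illustrated with a minimal sketch. It assumes ΔRT means the absolute difference between observed and predicted retention time, which the abstract does not state explicitly. The `Candidate` class, the function names, and the sample values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Threshold quoted in the abstract (ΔRT < 2 min).
RT_TOLERANCE_MIN = 2.0


@dataclass
class Candidate:
    """Hypothetical record for one level S2 candidate compound."""
    name: str
    observed_rt: float   # measured retention time, in minutes
    predicted_rt: float  # model-predicted retention time, in minutes


def delta_rt(candidate: Candidate) -> float:
    """Assumed definition: absolute observed-minus-predicted RT, in minutes."""
    return abs(candidate.observed_rt - candidate.predicted_rt)


def high_confidence(candidates, tolerance=RT_TOLERANCE_MIN):
    """Keep candidates whose ΔRT is strictly below the tolerance."""
    return [c for c in candidates if delta_rt(c) < tolerance]


# Made-up values for illustration only.
candidates = [
    Candidate("compound_a", observed_rt=5.0, predicted_rt=6.5),    # ΔRT 1.5
    Candidate("compound_b", observed_rt=10.0, predicted_rt=13.0),  # ΔRT 3.0
    Candidate("compound_c", observed_rt=4.0, predicted_rt=6.0),    # ΔRT 2.0
]

kept = high_confidence(candidates)
kept_names = [c.name for c in kept]
```

With these made-up values only `compound_a` passes, because the comparison is strict and a ΔRT of exactly 2.0 min is excluded.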
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2024 |
| Published | 2024 |
| Contained in | To the parent record - volume:347 |
| Contained in | Environmental pollution (Barking, Essex : 1987) - 347(2024), 15 Apr., page 123763 |
| Language | English |
| Contributors | Song, Dehao [author] |
| Subjects | Chemical descriptors |
| Notes | Date Completed 08.04.2024; Date Revised 08.04.2024; published: Print-Electronic; Citation Status MEDLINE |
| DOI | 10.1016/j.envpol.2024.123763 |
| PPN (catalog ID) | NLM369824334 |

LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM369824334 | ||
003 | DE-627 | ||
005 | 20240408232728.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240317s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.envpol.2024.123763 |2 doi | |
028 | 5 | 2 | |a pubmed24n1369.xml |
035 | |a (DE-627)NLM369824334 | ||
035 | |a (NLM)38492749 | ||
035 | |a (PII)S0269-7491(24)00477-9 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Song, Dehao |e verfasserin |4 aut | |
245 | 1 | 0 | |a Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 08.04.2024 | ||
500 | |a Date Revised 08.04.2024 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Copyright © 2024 Elsevier Ltd. All rights reserved. | ||
520 | |a The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict RTs of CECs in NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) with molecular fingerprints and chemical descriptors to establish an optimal model. The SVR model utilizing chemical descriptors resulted in good predictive capacity with R2ext = 0.850 and r2 = 0.925. The model was further validated through laboratory NTS compound characterization. When applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (confirmed structure by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). The model predicted RTs for level S2 compounds, leading to the classification of 153 level S2 compounds with high confidence (ΔRT <2 min). The model served as a robust filtering mechanism within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by utilizing RT prediction models to determine compound confidence levels | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Chemical descriptors | |
650 | 4 | |a Confidence level | |
650 | 4 | |a Contaminants of emerging concern | |
650 | 4 | |a Practical applications | |
650 | 4 | |a Support vector regression | |
700 | 1 | |a Tang, Ting |e verfasserin |4 aut | |
700 | 1 | |a Wang, Rui |e verfasserin |4 aut | |
700 | 1 | |a Liu, He |e verfasserin |4 aut | |
700 | 1 | |a Xie, Danping |e verfasserin |4 aut | |
700 | 1 | |a Zhao, Bo |e verfasserin |4 aut | |
700 | 1 | |a Dang, Zhi |e verfasserin |4 aut | |
700 | 1 | |a Lu, Guining |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Environmental pollution (Barking, Essex : 1987) |d 1987 |g 347(2024) vom: 15. Apr., Seite 123763 |w (DE-627)NLM087741504 |x 1873-6424 |7 nnns |
773 | 1 | 8 | |g volume:347 |g year:2024 |g day:15 |g month:04 |g pages:123763 |
856 | 4 | 0 | |u http://dx.doi.org/10.1016/j.envpol.2024.123763 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 347 |j 2024 |b 15 |c 04 |h 123763 |