Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data
© 2024. The Author(s).
With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples and a machine-learning algorithm, Procrustes, to overcome batch effects across RNA-seq data obtained using different sample preparation protocols like EC-based or poly-A RNA-seq protocols. Applying Procrustes to samples processed using EC and poly-A RNA-seq protocols showed the expression of 61% of genes (N = 20,062) to correlate across both protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (versus 36% before applying Procrustes; N = 1,438). Benchmarking analyses also showed Procrustes to outperform other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample to a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis for single tumor samples to support gene expression-based treatment decisions.
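The abstract reports agreement between protocols as the share of genes whose concordance correlation coefficient exceeds 0.8. As a rough illustration of that metric only, the sketch below computes Lin's concordance correlation coefficient per gene for two paired expression matrices. It assumes the abstract's coefficient is Lin's formulation. The function names, the samples-by-genes matrix layout, and the toy values are hypothetical. This is not the authors' implementation and does not reproduce Procrustes itself.

```python
import numpy as np


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()  # population variances (ddof=0)
    cov = ((x - mx) * (y - my)).mean()
    denom = vx + vy + (mx - my) ** 2
    # Undefined when both inputs are constant and equal; report NaN.
    return 2.0 * cov / denom if denom > 0 else float("nan")


def fraction_concordant(expr_a, expr_b, threshold=0.8):
    """Fraction of genes (columns) whose CCC across paired samples (rows)
    exceeds the threshold. Both matrices are assumed to be samples x genes
    with matching row and column order (a hypothetical layout)."""
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    cccs = np.array([
        concordance_ccc(expr_a[:, g], expr_b[:, g])
        for g in range(expr_a.shape[1])
    ])
    return float(np.mean(cccs > threshold))


# Toy data: gene 0 agrees exactly, gene 1 is shifted by a constant offset.
protocol_a = [[1, 1], [2, 2], [3, 3]]
protocol_b = [[1, 2], [2, 3], [3, 4]]
share = fraction_concordant(protocol_a, protocol_b)
```

Unlike Pearson correlation, this coefficient penalizes a constant offset between protocols, so the shifted gene in the toy data falls below the threshold even though it is perfectly linearly correlated.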
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2024 |
| Published | 2024 |
| Contained in | Communications biology - 7(2024), 1, 30 March, page 392 |
| Language | English |
| Contributors | Kotlov, Nikita [author] |
| Notes | Date Completed 01.04.2024; Date Revised 02.04.2024; published: Electronic; Citation Status MEDLINE |
| DOI | 10.1038/s42003-024-06020-z |
| PPN (catalog ID) | NLM370449290 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM370449290 | ||
003 | DE-627 | ||
005 | 20240403000741.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240331s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1038/s42003-024-06020-z |2 doi | |
028 | 5 | 2 | |a pubmed24n1361.xml |
035 | |a (DE-627)NLM370449290 | ||
035 | |a (NLM)38555407 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Kotlov, Nikita |e verfasserin |4 aut | |
245 | 1 | 0 | |a Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 01.04.2024 | ||
500 | |a Date Revised 02.04.2024 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2024. The Author(s). | ||
520 | |a With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples and a machine-learning algorithm, Procrustes, to overcome batch effects across RNA-seq data obtained using different sample preparation protocols like EC-based or poly-A RNA-seq protocols. Applying Procrustes to samples processed using EC and poly-A RNA-seq protocols showed the expression of 61% of genes (N = 20,062) to correlate across both protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (versus 36% before applying Procrustes; N = 1,438). Benchmarking analyses also showed Procrustes to outperform other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample to a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis for single tumor samples to support gene expression-based treatment decisions | ||
650 | 4 | |a Journal Article | |
650 | 7 | |a RNA |2 NLM | |
650 | 7 | |a 63231-63-0 |2 NLM | |
700 | 1 | |a Shaposhnikov, Kirill |e verfasserin |4 aut | |
700 | 1 | |a Tazearslan, Cagdas |e verfasserin |4 aut | |
700 | 1 | |a Chasse, Madison |e verfasserin |4 aut | |
700 | 1 | |a Baisangurov, Artur |e verfasserin |4 aut | |
700 | 1 | |a Podsvirova, Svetlana |e verfasserin |4 aut | |
700 | 1 | |a Fernandez, Dawn |e verfasserin |4 aut | |
700 | 1 | |a Abdou, Mary |e verfasserin |4 aut | |
700 | 1 | |a Kaneunyenye, Leznath |e verfasserin |4 aut | |
700 | 1 | |a Morgan, Kelley |e verfasserin |4 aut | |
700 | 1 | |a Cheremushkin, Ilya |e verfasserin |4 aut | |
700 | 1 | |a Zemskiy, Pavel |e verfasserin |4 aut | |
700 | 1 | |a Chelushkin, Maxim |e verfasserin |4 aut | |
700 | 1 | |a Sorokina, Maria |e verfasserin |4 aut | |
700 | 1 | |a Belova, Ekaterina |e verfasserin |4 aut | |
700 | 1 | |a Khorkova, Svetlana |e verfasserin |4 aut | |
700 | 1 | |a Lozinsky, Yaroslav |e verfasserin |4 aut | |
700 | 1 | |a Nuzhdina, Katerina |e verfasserin |4 aut | |
700 | 1 | |a Vasileva, Elena |e verfasserin |4 aut | |
700 | 1 | |a Kravchenko, Dmitry |e verfasserin |4 aut | |
700 | 1 | |a Suryamohan, Kushal |e verfasserin |4 aut | |
700 | 1 | |a Nomie, Krystle |e verfasserin |4 aut | |
700 | 1 | |a Curran, John |e verfasserin |4 aut | |
700 | 1 | |a Fowler, Nathan |e verfasserin |4 aut | |
700 | 1 | |a Bagaev, Alexander |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Communications biology |d 2018 |g 7(2024), 1 vom: 30. März, Seite 392 |w (DE-627)NLM284287245 |x 2399-3642 |7 nnns |
773 | 1 | 8 | |g volume:7 |g year:2024 |g number:1 |g day:30 |g month:03 |g pages:392 |
856 | 4 | 0 | |u http://dx.doi.org/10.1038/s42003-024-06020-z |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 7 |j 2024 |e 1 |b 30 |c 03 |h 392 |