CDSKNNXMBD : a novel clustering framework for large-scale single-cell data based on a stable graph structure
© 2024. The Author(s).
BACKGROUND: Accurate and efficient cell grouping is essential for analyzing single-cell transcriptome sequencing (scRNA-seq) data. However, existing clustering techniques often struggle to provide timely and accurate cell type groupings for datasets that are large in scale or have imbalanced cell type proportions. Improved methods are therefore needed that can handle the increasing size of scRNA-seq datasets while maintaining high accuracy and efficiency.
METHODS: We propose CDSKNNXMBD (Community Detection based on a Stable K-Nearest Neighbor Graph Structure), a novel single-cell clustering framework that integrates a partition clustering algorithm with a community detection algorithm and achieves accurate and fast cell type grouping by finding a stable graph structure.
RESULTS: We evaluated the effectiveness of our approach by analyzing 15 tissues from the human fetal atlas. Compared to existing methods, CDSKNN effectively counteracts the high imbalance in single-cell data, enabling effective clustering. Furthermore, we conducted comparisons across multiple single-cell datasets from different studies and sequencing techniques. CDSKNN shows high applicability and robustness and can handle the complexities of diverse types of data. Most importantly, CDSKNN exhibits higher operational efficiency on datasets at the million-cell scale, requiring an average of only 6.33 min to cluster 1.46 million single cells, which saves 33.3% to 99% of the running time of existing methods.
CONCLUSIONS: CDSKNN is a flexible, resilient, and promising clustering tool that is particularly suitable for clustering imbalanced data and demonstrates high efficiency on large-scale scRNA-seq datasets.
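The METHODS summary above names two ingredients, a k-nearest-neighbor graph and community detection on that graph. The sketch below illustrates only that general idea on toy 2-D points and is not the authors' implementation. The abstract does not say how a "stable" graph is obtained, so keeping only mutual k-nearest-neighbor edges is an assumption made here for illustration. Label propagation is swapped in as a simple community detection step, and the partition clustering stage is omitted. The function names `mutual_knn_graph` and `label_propagation` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_knn_graph(points, k):
    """Build an undirected graph that keeps only mutual k-nearest-neighbor edges.

    Assumption for illustration: mutual edges stand in for a "stable" graph;
    the abstract does not define the actual stability criterion.
    """
    n = len(points)
    nearest = []
    for i in range(n):
        # Rank all other points by Euclidean distance, ties broken by index.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        nearest.append(set(order[:k]))
    # Keep edge i-j only if each point lists the other among its k nearest.
    return {i: {j for j in nearest[i] if i in nearest[j]} for i in range(n)}


def label_propagation(graph, max_iter=100):
    """Deterministic label propagation used here as a simple community detector."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            # Adopt the most frequent neighbor label; smallest label wins ties.
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy, imbalanced example: four points in one group, three in another.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
graph = mutual_knn_graph(points, k=2)
labels = label_propagation(graph)
n_communities = len(set(labels.values()))
```

On this toy input the two spatially separated groups receive two distinct labels. The brute-force neighbor search is quadratic in the number of points, so a dataset at the million-cell scale would need an approximate nearest-neighbor index instead.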
Media type: | E-article |
---|---|
Year of publication: | 2024 |
Published: | 2024 |
Contained in: | To the parent record - volume:22 |
Contained in: | Journal of translational medicine - 22(2024), 1, dated 03 March, page 233 |
Language: | English |
Contributors: | Ren, Jun [author] |
Links: | |
Subjects: | Clustering |
Notes: | Date Completed 05.03.2024 Date Revised 06.03.2024 published: Electronic Citation Status MEDLINE |
doi: | 10.1186/s12967-024-05009-w |
funding: | |
Funding institution / project title: | |
PPN (catalog ID): | NLM369230736 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM369230736 | ||
003 | DE-627 | ||
005 | 20240306233122.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240304s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s12967-024-05009-w |2 doi | |
028 | 5 | 2 | |a pubmed24n1318.xml |
035 | |a (DE-627)NLM369230736 | ||
035 | |a (NLM)38433205 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Ren, Jun |e verfasserin |4 aut | |
245 | 1 | 0 | |a CDSKNNXMBD |b a novel clustering framework for large-scale single-cell data based on a stable graph structure |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 05.03.2024 | ||
500 | |a Date Revised 06.03.2024 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2024. The Author(s). | ||
520 | |a BACKGROUND: Accurate and efficient cell grouping is essential for analyzing single-cell transcriptome sequencing (scRNA-seq) data. However, the existing clustering techniques often struggle to provide timely and accurate cell type groupings when dealing with datasets with large-scale or imbalanced cell types. Therefore, there is a need for improved methods that can handle the increasing size of scRNA-seq datasets while maintaining high accuracy and efficiency | ||
520 | |a METHODS: We propose CDSKNNXMBD (Community Detection based on a Stable K-Nearest Neighbor Graph Structure), a novel single-cell clustering framework integrating partition clustering algorithm and community detection algorithm, which achieves accurate and fast cell type grouping by finding a stable graph structure | ||
520 | |a RESULTS: We evaluated the effectiveness of our approach by analyzing 15 tissues from the human fetal atlas. Compared to existing methods, CDSKNN effectively counteracts the high imbalance in single-cell data, enabling effective clustering. Furthermore, we conducted comparisons across multiple single-cell datasets from different studies and sequencing techniques. CDSKNN is of high applicability and robustness, and capable of balancing the complexities of across diverse types of data. Most importantly, CDSKNN exhibits higher operational efficiency on datasets at the million-cell scale, requiring an average of only 6.33 min for clustering 1.46 million single cells, saving 33.3% to 99% of running time compared to those of existing methods | ||
520 | |a CONCLUSIONS: The CDSKNN is a flexible, resilient, and promising clustering tool that is particularly suitable for clustering imbalanced data and demonstrates high efficiency on large-scale scRNA-seq datasets | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Clustering | |
650 | 4 | |a Imbalance ratio | |
650 | 4 | |a Large-scale | |
650 | 4 | |a scRNA-seq | |
700 | 1 | |a Lyu, Xuejing |e verfasserin |4 aut | |
700 | 1 | |a Guo, Jintao |e verfasserin |4 aut | |
700 | 1 | |a Shi, Xiaodong |e verfasserin |4 aut | |
700 | 1 | |a Zhou, Ying |e verfasserin |4 aut | |
700 | 1 | |a Li, Qiyuan |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Journal of translational medicine |d 2003 |g 22(2024), 1 vom: 03. März, Seite 233 |w (DE-627)NLM142679194 |x 1479-5876 |7 nnns |
773 | 1 | 8 | |g volume:22 |g year:2024 |g number:1 |g day:03 |g month:03 |g pages:233 |
856 | 4 | 0 | |u http://dx.doi.org/10.1186/s12967-024-05009-w |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 22 |j 2024 |e 1 |b 03 |c 03 |h 233 |