CDSKNNXMBD: a novel clustering framework for large-scale single-cell data based on a stable graph structure

© 2024. The Author(s).

BACKGROUND: Accurate and efficient cell grouping is essential for analyzing single-cell transcriptome sequencing (scRNA-seq) data. However, existing clustering techniques often fail to group cells into types quickly and accurately when datasets are large or cell types are imbalanced. Methods are therefore needed that scale to the growing size of scRNA-seq datasets while maintaining high accuracy and efficiency.

METHODS: We propose CDSKNNXMBD (Community Detection based on a Stable K-Nearest Neighbor Graph Structure), a novel single-cell clustering framework that integrates a partition clustering algorithm with a community detection algorithm and achieves accurate and fast cell type grouping by finding a stable graph structure.
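The abstract does not spell out the algorithm. As a rough, hypothetical illustration of the general pattern it names (build a k-nearest-neighbor graph, prune it to a more stable structure, then detect communities), the sketch below keeps only mutual kNN edges and uses connected components as a simple stand-in for a real community detection algorithm such as Louvain or Leiden. It is not the CDSKNN algorithm: it omits the partition clustering step, and all function names are hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbours (brute-force Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def mutual_knn_graph(points, k):
    """Adjacency sets keeping an edge i-j only if i and j are in each other's kNN lists."""
    nbrs = knn_indices(points, k)
    graph = {i: set() for i in range(len(points))}
    for i, ns in enumerate(nbrs):
        for j in ns:
            if i in nbrs[j]:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_communities(graph):
    """Label nodes by connected component (a crude stand-in for Louvain/Leiden)."""
    labels = {node: -1 for node in graph}
    current = 0
    for start in sorted(graph):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in graph[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return [labels[node] for node in sorted(graph)]


def cluster(points, k):
    """Hypothetical pipeline: kNN graph -> mutual-edge pruning -> communities."""
    return connected_communities(mutual_knn_graph(points, k))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(cluster(points, k=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Mutual-kNN pruning is only one way to make a neighbor graph less sensitive to noise; at the million-cell scale discussed here, a practical pipeline would replace this O(n²) brute-force search with approximate nearest-neighbor search.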

RESULTS: We evaluated the effectiveness of our approach by analyzing 15 tissues from the human fetal atlas. Compared with existing methods, CDSKNN effectively counteracts the high imbalance in single-cell data and clusters such data well. We further compared methods across multiple single-cell datasets from different studies and sequencing techniques. CDSKNN shows broad applicability and robustness and can handle the complexities of diverse data types. Most importantly, CDSKNN runs more efficiently on datasets at the million-cell scale: it requires an average of only 6.33 min to cluster 1.46 million single cells, reducing running time by 33.3% to 99% relative to existing methods.
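The abstract stresses imbalanced cell types, and "Imbalance ratio" appears among the subject keywords. One common definition, assumed here because the paper may define it differently, is the size of the largest class divided by the size of the smallest. The cell counts below are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical annotation: 900 cells of one type, 90 of another, 10 of a rare type.
labels = ["T cell"] * 900 + ["B cell"] * 90 + ["pDC"] * 10
print(imbalance_ratio(labels))  # 90.0
```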

CONCLUSIONS: CDSKNN is a flexible, resilient, and promising clustering tool that is particularly suitable for clustering imbalanced data and is highly efficient on large-scale scRNA-seq datasets.

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

Journal of translational medicine - volume 22 (2024), issue 1, dated 03 March, page 233

Language:

English

Contributors:

Ren, Jun [Author]
Lyu, Xuejing [Author]
Guo, Jintao [Author]
Shi, Xiaodong [Author]
Zhou, Ying [Author]
Li, Qiyuan [Author]
Li, Qiyuan [VerfasserIn]


Subjects:

Clustering
Imbalance ratio
Journal Article
Large-scale
ScRNA-seq

Notes:

Date Completed 05.03.2024

Date Revised 06.03.2024

published: Electronic

Citation Status MEDLINE

doi:

10.1186/s12967-024-05009-w

PPN (catalog ID):

NLM369230736