Publication details - A cluster robustness score for identifying cell subpopulations in single cell gene expression datasets from heterogeneous tissues and tumors

A cluster robustness score for identifying cell subpopulations in single cell gene expression datasets from heterogeneous tissues and tumors

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

MOTIVATION: A major aim of single cell biology is to identify important cell types such as stem cells in heterogeneous tissues and tumors. This is typically done by isolating hundreds of individual cells and measuring expression levels of multiple genes simultaneously from each cell. Then, clustering algorithms are used to group together similar single-cell expression profiles into clusters, each representing a distinct cell type. However, many of these clusters result from overfitting, meaning that rather than representing biologically meaningful cell types, they describe the intrinsic 'noise' in gene expression levels due to limitations in experimental precision or the intrinsic randomness of biochemical cellular processes. Consequentially, these non-meaningful clusters are most sensitive to noise: a slight shift in gene expression levels due to a repeated measurement will rearrange the grouping of data points such that these clusters break up.

RESULTS: To identify the biologically meaningful clusters we propose a 'cluster robustness score': We add increasing amounts of noise (zero mean and increasing variance) and check which clusters are most robust in the sense that they do not mix with their neighbors up to high levels of noise. We show that biologically meaningful cell clusters that were manually identified in previously published single cell expression datasets have high robustness scores. These scores are higher than what would be expected in corresponding randomized homogeneous datasets having the same expression level statistics. We believe that this scoring system provides a more automated way to identify cell types in heterogeneous tissues and tumors.
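The noise-titration idea described in the RESULTS paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cluster labels are already given, and it swaps in nearest-centroid reassignment for a full re-clustering at each noise level. The function names, the `keep` threshold, the number of `trials`, and the toy data are all hypothetical choices made for illustration. The comparison against randomized homogeneous datasets mentioned in the abstract is omitted.

```python
import math
import random


def centroids(points, labels):
    """Mean expression vector of each cluster."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest(point, cents):
    """Label of the centroid closest to `point` (Euclidean distance)."""
    return min(cents, key=lambda lab: math.dist(point, cents[lab]))


def robustness_scores(points, labels, sigmas, keep=0.9, trials=20, seed=0):
    """Largest noise level each cluster survives before mixing with neighbors.

    Assumption: a cluster "survives" a noise level if at least `keep` of its
    noisy cells are still closest to its own (noise-free) centroid.
    """
    rng = random.Random(seed)
    cents = centroids(points, labels)
    scores = {lab: 0.0 for lab in cents}
    alive = set(cents)  # clusters that have not yet broken up
    for sigma in sorted(sigmas):
        stayed = {lab: 0 for lab in cents}
        total = {lab: 0 for lab in cents}
        for _ in range(trials):
            for p, lab in zip(points, labels):
                # Zero-mean Gaussian noise with standard deviation sigma.
                noisy = [x + rng.gauss(0.0, sigma) for x in p]
                total[lab] += 1
                stayed[lab] += nearest(noisy, cents) == lab
        for lab in list(alive):
            if stayed[lab] / total[lab] >= keep:
                scores[lab] = sigma
            else:
                alive.discard(lab)
    return scores


def toy_data(seed=1):
    """Hypothetical 2-gene dataset: A is isolated, B and C sit close together."""
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (11.0, 0.0)}
    points, labels = [], []
    for lab, (cx, cy) in centers.items():
        for _ in range(30):
            points.append([cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)])
            labels.append(lab)
    return points, labels


points, labels = toy_data()
scores = robustness_scores(points, labels, sigmas=[0.1, 0.2, 0.5, 1.0, 2.0])
print(scores)
```

In this toy setup the isolated cluster A should receive a higher score than the neighboring clusters B and C, which start to exchange cells at much lower noise levels.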

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Media type:	E-article

Year of publication:	2019
Published:	2019

Contained in:	Link to parent record - volume:35
Contained in:	Bioinformatics (Oxford, England) - 35(2019), no. 6, 15 March, pages 962-971

Language:	English

Contributors:	Kanter, Itamar [author]; Dalerba, Piero [author]; Kalisky, Tomer [author]

Links:	Full text

Subjects:	Journal Article; Research Support, Non-U.S. Gov't

Notes:	Date Completed 31.12.2019; Date Revised 31.12.2019; published: Print; Citation Status MEDLINE

doi:	10.1093/bioinformatics/bty708

PPN (catalog ID):	NLM288033515

Internal format


LEADER	01000naa a22002652 4500
001	NLM288033515
003	DE-627
005	20231225055350.0
007	cr uuu---uuuuu
008	231225s2019 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1093/bioinformatics/bty708 \|2 doi
028	5	2	\|a pubmed24n0960.xml
035			\|a (DE-627)NLM288033515
035			\|a (NLM)30165506
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Kanter, Itamar \|e verfasserin \|4 aut
245	1	2	\|a A cluster robustness score for identifying cell subpopulations in single cell gene expression datasets from heterogeneous tissues and tumors
264		1	\|c 2019
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 31.12.2019
500			\|a Date Revised 31.12.2019
500			\|a published: Print
500			\|a Citation Status MEDLINE
520			\|a © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
520			\|a MOTIVATION: A major aim of single cell biology is to identify important cell types such as stem cells in heterogeneous tissues and tumors. This is typically done by isolating hundreds of individual cells and measuring expression levels of multiple genes simultaneously from each cell. Then, clustering algorithms are used to group together similar single-cell expression profiles into clusters, each representing a distinct cell type. However, many of these clusters result from overfitting, meaning that rather than representing biologically meaningful cell types, they describe the intrinsic 'noise' in gene expression levels due to limitations in experimental precision or the intrinsic randomness of biochemical cellular processes. Consequentially, these non-meaningful clusters are most sensitive to noise: a slight shift in gene expression levels due to a repeated measurement will rearrange the grouping of data points such that these clusters break up
520			\|a RESULTS: To identify the biologically meaningful clusters we propose a 'cluster robustness score': We add increasing amounts of noise (zero mean and increasing variance) and check which clusters are most robust in the sense that they do not mix with their neighbors up to high levels of noise. We show that biologically meaningful cell clusters that were manually identified in previously published single cell expression datasets have high robustness scores. These scores are higher than what would be expected in corresponding randomized homogeneous datasets having the same expression level statistics. We believe that this scoring system provides a more automated way to identify cell types in heterogeneous tissues and tumors
520			\|a SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
700	1		\|a Dalerba, Piero \|e verfasserin \|4 aut
700	1		\|a Kalisky, Tomer \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Bioinformatics (Oxford, England) \|d 1998 \|g 35(2019), 6 vom: 15. März, Seite 962-971 \|w (DE-627)NLM094620342 \|x 1367-4811 \|7 nnns
773	1	8	\|g volume:35 \|g year:2019 \|g number:6 \|g day:15 \|g month:03 \|g pages:962-971
856	4	0	\|u http://dx.doi.org/10.1093/bioinformatics/bty708 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 35 \|j 2019 \|e 6 \|b 15 \|c 03 \|h 962-971
