Journal article
License

Closed access

Statistical significance of cluster membership for unsupervised evaluation of cell identities

Ministerial score
200
Publication date
Abstract (EN)

Motivation: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering of scRNA-seq data. These cell identities are used in downstream analyses. How can we examine if cell identities are accurately inferred? Unlike external measurements or labels for single cells, using clustering-based cell identities results in spurious signals and false discoveries.

Results: We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in that clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and others. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership.

PBN discipline
computer science
Journal
Bioinformatics
Volume
36
Issue
10
Pages
3107-3114
ISSN
1367-4803
Open access license
Closed access