A systematic performance evaluation of clustering methods for single-cell RNA-seq data


Angelo Duo1,2, Mark D Robinson1,2, Charlotte Soneson1,2

1 Institute of Molecular Life Sciences, University of Zurich, Switzerland
2 SIB Swiss Institute of Bioinformatics, Switzerland

Correspondence: charlotte.soneson@uzh.ch or mark.robinson@imls.uzh.ch

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq (scRNA-seq) data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 12 clustering algorithms, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same preprocessing steps were used for all methods, allowing us to focus on the performance of the clustering algorithms themselves.
We evaluated the ability of the methods to recover known subpopulations, as well as their stability and run time. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. We also found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering.
The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub.
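As an illustration of what "recovering known subpopulations" can mean in practice, the sketch below scores a clustering against known labels with the adjusted Rand index (ARI). The choice of ARI is an assumption made for this example, since the text above does not name the metric, and the function name and toy labels are hypothetical. The sketch is written in Python using only the standard library and is not taken from the R scripts mentioned above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative helper (hypothetical name). Returns 1.0 for identical
    partitions (up to relabeling) and values near 0 for random agreement.
    """
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs of cells to compare

    # Count pairs of cells that share a cluster in both labelings, in the
    # true labeling only, and in the predicted labeling only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    # Expected number of co-clustered pairs under random labeling with the
    # same cluster sizes, and the maximum attainable value.
    expected = pairs_truth * pairs_pred / comb(n, 2)
    maximum = (pairs_truth + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (pairs_both - expected) / (maximum - expected)


# Toy example: six cells, known cell types versus a hypothetical clustering.
known = ["B", "B", "B", "T", "T", "T"]
clusters = [1, 1, 2, 2, 2, 2]
print(adjusted_rand_index(known, clusters))
```

Because the index depends only on which cells are grouped together, cluster identifiers do not need to match the names of the known subpopulations.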


The unfiltered and filtered data sets used for the evaluation, as well as the output from all applied clustering algorithms, are available here (4.93 GB).