Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data


Charlotte Soneson1,2, Mark D Robinson1,2

1 Institute of Molecular Life Sciences, University of Zurich, Switzerland
2 SIB Swiss Institute of Bioinformatics, Switzerland

Correspondence: charlotte.soneson@uzh.ch or mark.robinson@imls.uzh.ch

We perform an extensive evaluation of the performance and characteristics of 35 approaches for differential gene expression analysis in single-cell RNA-seq, using both experimental and synthetic data. Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Prefiltering of lowly expressed genes is shown to have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq. We also present conquer, a repository of consistently processed, analysis-ready public single-cell RNA-seq data sets, aimed at simplifying method evaluation and reanalysis of published results. Each data set provides abundance estimates for both genes and transcripts, as well as quality control and exploratory analysis reports.


Most of the data sets used for the comparison were obtained from the conquer repository. The versions used in the published paper can be downloaded as a compressed archive here (10.21 GB), and the differential expression (DE) results can be downloaded here (28.4 GB).

All code used to build conquer is available here.

The code used to perform the differential expression evaluation is available here.

The Shiny app built to explore the main results in more detail can be reached here.