CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets


Malgorzata Nowicka1,2, Carsten Krieg3, Lukas M. Weber1,2, Mark D. Robinson1,2,*

1 Institute for Molecular Life Sciences, University of Zurich, Switzerland
2 SIB Swiss Institute of Bioinformatics, University of Zurich, Switzerland
3 Institute of Experimental Immunology, University of Zurich, Switzerland
* Corresponding author: mark.robinson@imls.uzh.ch

Abstract

High-dimensional (mass and flow) cytometry (HDCyto) experiments have become a method of choice for interrogating and characterizing cell populations at high throughput. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, allowing for optional but reproducible manual merging. Several differential analyses are of interest: differences in cell type abundance that are associated with a phenotype, differences in signaling markers within specific subpopulations, and differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks in which the HDCyto data are the response; thus, we are able to model arbitrary designs, such as those with batch effects or paired samples. In particular, we apply generalized linear mixed models for analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell counts or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we include various visualizations at every step of the analysis, including for quality control (e.g., multi-dimensional scaling plots), for reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and for differential analyses (e.g., plots of aggregated signal).


Data used in the workflow: