CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets


Malgorzata Nowicka1,2, Carsten Krieg3, Lukas M. Weber1,2, Felix J. Hartmann3, Silvia Guglietta4, Burkhard Becher3, Mitchell P. Levesque5, Mark D. Robinson1,2,*

1 Institute for Molecular Life Sciences, University of Zurich, Switzerland
2 SIB Swiss Institute of Bioinformatics, University of Zurich, Switzerland
3 Institute of Experimental Immunology, University of Zurich, Switzerland
4 Department of Experimental Oncology, European Institute of Oncology, Via Adamello 16, I-20139 Milan, Italy
5 Department of Dermatology, University Hospital Zurich, CH-8091 Zurich, Switzerland
* Corresponding author: mark.robinson@imls.uzh.ch

Abstract

High-dimensional (mass and flow) cytometry (HDCyto) experiments have become a method of choice for interrogating and characterizing cell populations at high throughput. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype, changes in signaling markers within specific subpopulations, and differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks in which the HDCyto data are the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects or paired samples. In particular, we apply generalized linear mixed models for analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell counts or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including for quality control (e.g., multi-dimensional scaling plots), for reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and for differential analyses (e.g., plots of aggregated signal).


Data used in the workflow: