Description
Each cell type is compared against all other cells combined. This can bias marker selection when one cell type dominates the composition.
```r
for (i in seq_len(nlevels(cellTypes))) # For each cell type.
    tmp_celltype <- (ifelse(cellTypes == levels(cellTypes)[i], 1, 0)) # One cell type versus the rest combined.
```

I propose instead that every pair of cell types be compared and that the pairwise results then be averaged for each cell type.
> We deliberately use pairwise comparisons rather than comparing each cluster to the average of all other cells. The latter approach is sensitive to the population composition, which introduces an element of unpredictability to the marker sets due to variation in cell type abundances. In the worst case, the presence of one subpopulation containing a majority of the cells will drive the selection of top markers for every other cluster, pushing out useful genes that can distinguish between the smaller subpopulations.

Source: Orchestrating Single Cell Analysis, Chapter 6: Marker Gene Detection
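To make the composition effect concrete, here is a minimal base-R sketch on a hypothetical toy dataset. It assumes a plain difference in mean expression as a stand-in for the statistic doLimma actually computes, and the helpers `make_data`, `one_vs_rest` and `pairwise_avg` are illustrative names, not part of the package. Cell type B is scored under a skewed composition (90 A cells, 5 B, 5 C) and under a balanced one.

```r
# Hypothetical toy data: genes in rows, cells in columns.
# Every cell of a type has identical expression, so group means are exact.
make_data <- function(n_per_type) {
  profiles <- cbind(A = c(g1 = 0, g2 = 2),
                    B = c(g1 = 5, g2 = 4),
                    C = c(g1 = 5, g2 = 0))
  types <- factor(rep(names(n_per_type), times = n_per_type),
                  levels = colnames(profiles))
  list(exprs = profiles[, as.character(types), drop = FALSE], types = types)
}

# One type versus all other cells pooled (mirrors the current loop's design).
# Difference in means is an assumed stand-in for the limma statistic.
one_vs_rest <- function(exprs, types, target) {
  in_target <- types == target
  rowMeans(exprs[, in_target, drop = FALSE]) -
    rowMeans(exprs[, !in_target, drop = FALSE])
}

# Proposed alternative: compare the target with each other type separately,
# then average the pairwise differences.
pairwise_avg <- function(exprs, types, target) {
  others <- setdiff(levels(types), target)
  target_mean <- rowMeans(exprs[, types == target, drop = FALSE])
  diffs <- vapply(others, function(o) {
    target_mean - rowMeans(exprs[, types == o, drop = FALSE])
  }, numeric(nrow(exprs)))
  rowMeans(diffs)  # one averaged score per gene
}

skewed   <- make_data(c(A = 90, B = 5, C = 5))
balanced <- make_data(c(A = 5,  B = 5, C = 5))

round(one_vs_rest(skewed$exprs, skewed$types, "B"), 2)      # g1 4.74, g2 2.11
round(one_vs_rest(balanced$exprs, balanced$types, "B"), 2)  # g1 2.50, g2 3.00
pairwise_avg(skewed$exprs, skewed$types, "B")               # g1 2.5, g2 3
pairwise_avg(balanced$exprs, balanced$types, "B")           # g1 2.5, g2 3
```

With one-versus-rest, the top gene for B flips from g2 to g1 when A dominates, because g1 mainly separates B from A rather than from C. The pairwise average gives the same scores under both compositions, since each comparison uses only the cells of the two types involved.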
The function also considers cell types but not samples: `doLimma <- function(exprsMat, cellTypes, exprs_pct = 0.05)`. This introduces a second source of bias. Suppose that sample A captured 3000 cells and sample B captured 5000 cells. Because every cell is treated as an independent observation, the differences would be driven more by B, which contributes 5/8 of the cells.
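The arithmetic behind this example, as a minimal sketch with hypothetical values: one gene in one cell type has mean expression 1 in sample A and 3 in sample B. Pooling cells weights the samples by their cell counts, whereas summarising each sample first weights them equally.

```r
# Hypothetical per-cell expression of one gene in one cell type,
# from two samples with unequal numbers of captured cells.
sample_id <- factor(rep(c("A", "B"), times = c(3000, 5000)))
expr <- ifelse(sample_id == "A", 1, 3)

# Cells treated as independent replicates: samples weighted by cell count.
pooled_mean <- mean(expr)                                     # 2.25
weights <- as.numeric(table(sample_id)) / length(sample_id)  # 0.375, 0.625

# One summary per sample, then samples averaged with equal weight.
per_sample <- tapply(expr, sample_id, mean)                   # A = 1, B = 3
balanced_mean <- mean(per_sample)                             # 2
```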
scran's pseudoBulkDGE function already accounts for samples by aggregating cells into pseudo-bulk profiles per sample and cell type (Orchestrating Single Cell Analysis Chapter 4: Multi-sample Multi-condition Comparisons). However, it only accepts integer count data and only uses edgeR for hypothesis testing.
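A sample-aware alternative that does not need integer counts could aggregate cells within each sample and cell type before testing. The sketch below assumes that averaging (for example, log-normalised) expression is an acceptable pseudo-bulk summary; `pseudo_bulk_means` is a hypothetical helper built only on base R (`interaction`, `rowsum`), not an existing function in any package.

```r
# Hypothetical helper: average expression (non-integer values allowed)
# over the cells of each sample x cell-type combination.
pseudo_bulk_means <- function(exprs, cellTypes, samples) {
  group <- interaction(samples, cellTypes, drop = TRUE, sep = "_")
  sums <- rowsum(t(exprs), group)                     # groups x genes
  counts <- as.vector(table(group)[rownames(sums)])   # cells per group
  t(sums / counts)                                    # genes x groups (means)
}

# Usage on a tiny hypothetical matrix with non-integer values.
exprs <- matrix(c(1.5, 2, 2.5, 4, 5, 6, 7, 8.5), nrow = 2,
                dimnames = list(c("g1", "g2"), paste0("cell", 1:4)))
samples   <- c("S1", "S1", "S2", "S2")
cellTypes <- c("T", "T", "T", "T")

pb <- pseudo_bulk_means(exprs, cellTypes, samples)
pb  # columns S1_T and S2_T: g1 = 2, 6; g2 = 3, 7.25
```

Each column of the result is one sample-by-cell-type profile, so a limma model fitted to it (for example with `lmFit` followed by `eBayes`) would treat samples, not cells, as replicates. Averaging log-scale values is not equivalent to summing counts, so this is a design choice rather than a drop-in replacement for pseudoBulkDGE.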