diff --git a/DESCRIPTION b/DESCRIPTION index 05ea2d7..d87845d 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: gateR Type: Package Title: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation -Version: 0.1.10 -Date: 2022-02-04 +Version: 0.1.11 +Date: 2022-08-25 Authors@R: c(person(given = "Ian D.", family = "Buller", @@ -32,7 +32,7 @@ Description: Estimates statistically significant marker combination values withi another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the 'gateR' package uses the spatial relative risk - function that is estimated using the 'sparr' package. Details about the 'sparr' package + function estimated using the 'sparr' package. Details about the 'sparr' package methods can be found in the tutorial: Davies et al. (2018) . Details about kernel density estimation can be found in J. F. Bithell (1990) . 
More information about relative risk functions using kernel density estimation can be @@ -41,7 +41,7 @@ License: Apache License (>= 2.0) Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) -RoxygenNote: 7.1.2 +RoxygenNote: 7.2.1 Depends: R (>= 3.5.0) Imports: @@ -55,18 +55,15 @@ Imports: SpatialPack, spatstat.geom, stats, - tibble, - tools, - utils + tibble Suggests: dplyr, - flowWorkspaceData, - ncdfFlow, R.rsp, spelling, testthat, - usethis + usethis, + utils VignetteBuilder: R.rsp Language: en-US -URL: https://github.com/Waller-SUSAN/gateR -BugReports: https://github.com/Waller-SUSAN/gateR/issues +URL: https://github.com/lance-waller-lab/gateR +BugReports: https://github.com/lance-waller-lab/gateR/issues diff --git a/NEWS.md b/NEWS.md index d6282c4..b074093 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,14 @@ # gateR (development version) +# gateR v0.1.11 + * Updated package URL and BugReports to renamed GitHub account "lance-waller-lab" (previously "Waller-SUSAN") + * Replaced `if()` conditions comparing `class()` to string with `inherits()` in functions + * `tools` is no longer Imports + * `utils` is now Suggests because "zzz.R" calls the `packageDescription()` function + * `ncdfFlow`, `flowWorkspaceData` are no longer Suggests (for generating random data set `randCyto`) because "Package suggested but not available for checking" in the some CRAN environments + * Added CITATION file + * Fixed typos in documentation throughout + # gateR v0.1.10 * Updated dependencies `spatstat.core` and `spatstat.linnet` packages based on feedback from the Spatstat Team (Adrian Baddeley and Ege Rubak). All random generators in `spatstat.core` were moved to a new package `spatstat.random` * `spatstat.geom`, `spatstat.core`, `spatstat.linnet`, and `spatstat (>=2.0-0)` are no longer Depends diff --git a/R/gating.R b/R/gating.R index f11081e..4f0b231 100644 --- a/R/gating.R +++ b/R/gating.R @@ -15,10 +15,10 @@ #' @param name_gate Optional, character. 
The filename of the visualization(s). The default is "gate_k" where "k" is the gate number. #' @param path_gate Optional, character. The path of the visualization(s). The default is the current working directory. #' @param rcols Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}. -#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit and the color key will include the minimum value of the log relative risk surface. -#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface. -#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator. -#' @param c2n Optional, character. The name of the level for the numerator of condition B. The default is null and the first level is treated as the numerator. +#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface. +#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface. +#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is NULL, and the first level is treated as the numerator. +#' @param c2n Optional, character. 
The name of the level for the numerator of condition B. The default is NULL, and the first level is treated as the numerator. #' @param win Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data. #' @param ... Arguments passed to \code{\link[sparr]{risk}} to select resolution. #' @param doplot `r lifecycle::badge("deprecated")` \code{doplot} is no longer supported and has been renamed \code{plot_gate}. @@ -26,9 +26,9 @@ #' #' @details This function performs a sequential gating strategy for mass cytometry data comparing two levels with one or two conditions. Gates are typically two-dimensional space comprised of two fluorescent markers. The two-level comparison allows for the estimation of a spatial relative risk function and the computation of p-value based on an assumption of asymptotic normality. Cells within statistically significant areas are extracted and used in the next gate. This function relies heavily upon the \code{\link[sparr]{risk}} function. Basic visualization is available if \code{plot_gate = TRUE}. #' -#' The \code{vars} argument must be a vector with an even-numbered length where the odd-numbered elements are the markers used on the x-axis of a gate and the even-numbered elements are the markers used on the y-axis of a gate. For example, if \code{vars = c("V1", "V2", "V3", and "V4")} then the first gate is "V1" on the x-axis and "V2" on the y-axis and then the second gate is V3" on the x-axis and "V4" on the y-axis. Makers can be repeated in successive gates. +#' The \code{vars} argument must be a vector with an even-numbered length where the odd-numbered elements are the markers used on the x-axis of a gate, and the even-numbered elements are the markers used on the y-axis of a gate. 
For example, if \code{vars = c("V1", "V2", "V3", and "V4")} then the first gate is "V1" on the x-axis and "V2" on the y-axis and then the second gate is V3" on the x-axis and "V4" on the y-axis. Makers can be repeated in successive gates. #' -#' The \code{n_condition} argument specifies if the gating strategy is performed for one condition or two conditions. If \code{n_condition = 1}, then the function performs a one condition gating strategy using the internal \code{rrs} function, which computes the statistically significant areas (clusters) of a relative risk surface at each gate and selects the cells within the clusters specified by the \code{numerator} argument. If \code{n_condition = 2}, then the function performs a two conditions gating strategy using the internal \code{lotrrs} function, which computes the statistically significant areas (clusters) of a ratio of relative risk surfaces at each gate and selects the cells within the clusters specified by the \code{numerator} argument. The condition variable(s) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. See the documentation for the internal \code{rrs} and \code{lotrrs} functions for more details. +#' The \code{n_condition} argument specifies if the gating strategy is performed for one condition or two conditions. If \code{n_condition = 1}, then the function performs a one condition gating strategy using the internal \code{rrs} function, which computes the statistically significant areas (clusters) of a relative risk surface at each gate and selects the cells within the clusters specified by the \code{numerator} argument. 
If \code{n_condition = 2}, then the function performs a two conditions gating strategy using the internal \code{lotrrs} function, which computes the statistically significant areas (clusters) of a ratio of relative risk surfaces at each gate and selects the cells within the clusters specified by the \code{numerator} argument. The condition variable(s) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. See the documentation for the internal \code{rrs} and \code{lotrrs} functions for more details. #' #' The p-value surface of the ratio of relative risk surfaces is estimated assuming asymptotic normality of the ratio value at each gridded knot. The bandwidth is fixed across all layers. #' @@ -138,7 +138,7 @@ gating <- function(dat, } ## win - if (!is.null(win) & class(win) != "owin") { stop("'win' must be class 'owin'") } + if (!is.null(win) & !inherits(win, "owin")) { stop("'win' must be class 'owin'") } # Format data input dat <- dat[!is.na(dat[ , which(colnames(dat) %in% vars[1])]) & diff --git a/R/lotrrs.R b/R/lotrrs.R index 9c9042f..da447d0 100644 --- a/R/lotrrs.R +++ b/R/lotrrs.R @@ -12,16 +12,16 @@ #' @param name_gate Optional, character. The filename of the visualization. The default is "gate". #' @param path_gate Optional, character. The path of the visualization. The default is the current working directory. #' @param rcols Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}. -#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). 
The default is no limit and the color key will include the minimum value of the log relative risk surface. -#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface. -#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator. -#' @param c2n Optional, character. The name of the level for the numerator of condition B. The default is null and the first level is treated as the numerator. +#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface. +#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface. +#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is NULL, and the first level is treated as the numerator. +#' @param c2n Optional, character. The name of the level for the numerator of condition B. The default is NULL, and the first level is treated as the numerator. #' @param win Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data. #' @param ... Arguments passed to \code{\link[sparr]{risk}} to select resolution. #' @param doplot `r lifecycle::badge("deprecated")` \code{doplot} is no longer supported and has been renamed \code{plot_gate}. 
#' @param verbose `r lifecycle::badge("deprecated")` \code{verbose} is no longer supported; this function will not display verbose output from internal \code{\link[sparr]{risk}} function. #' -#' @details This function estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions using three successive \code{\link[sparr]{risk}} functions. A relative risk surface is estimated for Condition A at each level of Condition B and then a ratio of the two relative risk surfaces is computed. +#' @details This function estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions using three successive \code{\link[sparr]{risk}} functions. A relative risk surface is estimated for Condition A at each level of Condition B, and then a ratio of the two relative risk surfaces is computed. #' #' \deqn{RR_{Condition B1} = \frac{Condition A2 of B1}{Condition A1 of B1}} #' \deqn{RR_{Condition B2} = \frac{Condition A2 of B2}{Condition A1 of B2}} @@ -31,7 +31,7 @@ #' #' Provides functionality for a correction for multiple testing. If \code{p_correct = "FDR"}, calculates a False Discovery Rate by Benjamini and Hochberg. If \code{p_correct = "uncorrelated Sidak"}, calculates an independent Sidak correction. If \code{p_correct = "uncorrelated Bonferroni"}, calculates an independent Bonferroni correction. If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, then the corrections take into account the into account the spatial correlation of the surface. (NOTE: If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, it may take a considerable amount of computation resources and time to calculate). If \code{p_correct = "Adler and Hasofer"} or if \code{p_correct = "Friston"}, then calculates a correction based on Random Field Theory. 
If \code{p_correct = "none"} (the default), then the function does not account for multiple testing and uses the uncorrected \code{alpha} level. See the internal \code{pval_correct} function documentation for more details. #' -#' The two condition variables (Condition A and Condition B) within \code{dat} must be of class 'factor' with two levels. The first level in each variable is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. +#' The two condition variables (Condition A and Condition B) within \code{dat} must be of class 'factor' with two levels. The first level in each variable is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. #' #' @return An object of class 'list' where each element is a object of class 'rrs' created by the \code{\link[sparr]{risk}} function with two additional components: #' @@ -108,7 +108,7 @@ lotrrs <- function(dat, } ## win - if (!is.null(win) & class(win) != "owin") { stop("'win' must be class 'owin'") } + if (!is.null(win) & !inherits(win, "owin")) { stop("'win' must be class 'owin'") } if (is.null(win)) { dat <- as.data.frame(dat) dat <- dat[!is.na(dat[ , 4]) & !is.na(dat[ , 5]) , ] diff --git a/R/lrr_plot.R b/R/lrr_plot.R index b10252f..9fc3c8e 100644 --- a/R/lrr_plot.R +++ b/R/lrr_plot.R @@ -33,7 +33,7 @@ lrr_plot <- function(input, digits = 1) { # Inputs - if (class(input) != "im") { + if (!inherits(input, "im")) { stop("The 'input' argument must be an object of class 'im'") } diff --git a/R/package.R b/R/package.R index 5202a2d..9ab81fa 100644 --- a/R/package.R +++ b/R/package.R @@ -2,7 +2,7 @@ #' #' Estimates statistically significant fluorescent marker combination values within which one immunologically distinctive group 
(i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of fluorescent markers to examine features of cells that may be different between groups. #' -#' @details For a two-group comparison, the 'gateR' package uses the spatial relative risk function that is estimated using the {sparr} package. Details about the {sparr} package methods can be found in the tutorial: Davies et al. (2018) \doi{10.1002/sim.7577}. Details about kernel density estimation can be found in J. F. Bithell (1990) \doi{10.1002/sim.4780090616}. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) \doi{10.1002/sim.4780101112}. +#' @details For a two-group comparison, the 'gateR' package uses the spatial relative risk function estimated using the \code{\link{sparr}} package. Details about the \code{\link{sparr}} package methods can be found in the tutorial: Davies et al. (2018) \doi{10.1002/sim.7577}. Details about kernel density estimation can be found in J. F. Bithell (1990) \doi{10.1002/sim.4780090616}. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) \doi{10.1002/sim.4780101112}. #' #' This package provides a function to perform a gating strategy for flow cytometry data. The 'gateR' package also provides basic visualization for each gate. #' @@ -12,9 +12,9 @@ #' #' \code{\link{gating}} Extracts cells within statistically significant combinations of fluorescent markers, successively, for a set of markers. Statistically significant combinations are identified using two-tailed p-values of a relative risk surface assuming asymptotic normality. This function is currently available for two-level comparisons of a single condition (e.g., case/control) or two conditions (e.g., case/control at time 1 and time 2). Provides functionality for basic visualization and multiple testing correction. 
#' -#' \code{\link{rrs}} Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. +#' \code{\link{rrs}} Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. #' -#' \code{\link{lotrrs}} Estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. +#' \code{\link{lotrrs}} Estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. #' #' \bold{Flow Cytometry Data} #' diff --git a/R/pval_correct.R b/R/pval_correct.R index 467ed44..88832c5 100644 --- a/R/pval_correct.R +++ b/R/pval_correct.R @@ -13,7 +13,7 @@ #' \item Computes a False Discovery Rate by Benjamini and Hochberg \doi{10.1111/j.2517-6161.1995.tb02031.x} (\code{p_correct = "FDR"}) by: 1) sorting the p-values (p_i) of each knot in ascending order (p_1 <= p_2 <= ... <= p_m), 2) starting from p_m find the first p_i for which p_i <= (i/m) * alpha. 
#' \item Computes an independent Sidak correction \doi{10.2307/2283989} (\code{p_correct = "uncorrelated Sidak"}) by 1 - (1 - \code{alpha}) ^ (1 / total number of gridded knots across the estimated surface). The default in the \code{\link[sparr]{risk}} function is a resolution of 128 x 128 or n = 16,384 knots and a custom resolution can be specified using the \code{resolution} argument within the \code{\link[sparr]{risk}} function. #' \item Computes an independent Bonferroni correction (\code{p_correct = "uncorrelated Bonferroni"}) by \code{alpha} / total number of gridded knots across the estimated surface. The default in the \code{\link[sparr]{risk}} function is a resolution of 128 x 128 or n = 16,384 knots and a custom resolution can be specified using the \code{resolution} argument within the \code{\link[sparr]{risk}} function. -#' \item Computes a spatially dependent Sidak correction (\code{p_correct = "correlated Sidak"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). The correction use the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. +#' \item Computes a spatially dependent Sidak correction (\code{p_correct = "correlated Sidak"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). 
The correction uses the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. #' \item Computes a spatially dependent Bonferroni correction (\code{p_correct = "correlated Bonferroni"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). The correction uses the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. #' \item Computes a critical p-value based on Random Field Theory and the Adler and Hasofer equation (\code{p_correct = "Euler A&H"}) \doi{10.1214/aop/1176996176} and p.111 of \doi{10.1137/1.9780898718980}. The correction uses the number of knots that are independent based on the bandwidth used in the kernel density estimation of the spatial relative risk function. #' \item Computes a critical p-value based on Random Field Theory and the Friston et al. equation (\code{p_correct = "Euler Friston"}) \doi{10.1038/jcbfm.1991.122} which differs from Adler and Hasofer's equation by a factor of 0.79. The correction uses the number of knots that are independent based on the bandwidth used in the kernel density estimation of the spatial relative risk function. 
diff --git a/R/pval_plot.R b/R/pval_plot.R index c5d5e59..8611f47 100644 --- a/R/pval_plot.R +++ b/R/pval_plot.R @@ -23,7 +23,7 @@ pval_plot <- function(input, alpha) { # Inputs - if (class(input) != "im") { + if (!inherits(input, "im")) { stop("The 'input' argument must be an object of class 'im'") } diff --git a/R/randCyto.R b/R/randCyto.R index 2ddd703..0355c86 100644 --- a/R/randCyto.R +++ b/R/randCyto.R @@ -15,5 +15,5 @@ #' @examples #' head(randCyto) #' -#' @source \url{https://github.com/Waller-SUSAN/gateR/blob/master/README.md} +#' @source \url{https://github.com/lance-waller-lab/gateR/blob/master/README.md} "randCyto" diff --git a/R/rrs.R b/R/rrs.R index 1c1982b..394a81f 100644 --- a/R/rrs.R +++ b/R/rrs.R @@ -1,6 +1,6 @@ #' A single gate for a single condition #' -#' Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. +#' Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. #' #' @param dat Input data frame flow cytometry data with four (4) features (columns): 1) ID, 2) Condition A ID, 3) Marker A as x-coordinate, 4) Marker B as y-coordinate. #' @param bandw Optional, numeric. Fixed bandwidth for the kernel density estimation. Default is based on the internal \code{[sparr]{OS}} function. @@ -12,9 +12,9 @@ #' @param name_gate Optional, character. The filename of the visualization. The default is "gate". #' @param path_gate Optional, character. The path of the visualization. 
The default is the current working directory. #' @param rcols Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}. -#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit and the color key will include the minimum value of the log relative risk surface. -#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface. -#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator. +#' @param lower_lrr Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface. +#' @param upper_lrr Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface. +#' @param c1n Optional, character. The name of the level for the numerator of condition A. The default is NULL, and the first level is treated as the numerator. #' @param win Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data. #' @param ... Arguments passed to \code{\link[sparr]{risk}} to select resolution. 
#' @param doplot `r lifecycle::badge("deprecated")` \code{doplot} is no longer supported and has been renamed \code{plot_gate}. @@ -24,7 +24,7 @@ #' #' Provides functionality for a correction for multiple testing. If \code{p_correct = "FDR"}, calculates a False Discovery Rate by Benjamini and Hochberg. If \code{p_correct = "uncorrelated Sidak"}, calculates an independent Sidak correction. If \code{p_correct = "uncorrelated Bonferroni"}, calculates an independent Bonferroni correction. If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, then the corrections take into account the into account the spatial correlation of the surface. (NOTE: If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, it may take a considerable amount of computation resources and time to calculate). If \code{p_correct = "Adler and Hasofer"} or if \code{p_correct = "Friston"}, then calculates a correction based on Random Field Theory. If \code{p_correct = "none"} (the default), then the function does not account for multiple testing and uses the uncorrected \code{alpha} level. See the internal \code{pval_correct} function documentation for more details. #' -#' The condition variable (Condition A) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The level can also be specified using the \code{c1n} parameter. +#' The condition variable (Condition A) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The level can also be specified using the \code{c1n} parameter. 
#' #' @return An object of class 'list' where each element is a object of class 'rrs' created by the \code{\link[sparr]{risk}} function with two additional components: #' @@ -98,7 +98,7 @@ rrs <- function(dat, } ## win - if (!is.null(win) & class(win) != "owin") { stop("'win' must be class 'owin'") } + if (!is.null(win) & !inherits(win, "owin")) { stop("'win' must be class 'owin'") } if (is.null(win)) { dat <- as.data.frame(dat) dat <- dat[!is.na(dat[ , 4]) & !is.na(dat[ , 5]) , ] diff --git a/README.md b/README.md index db26638..4e2e341 100644 --- a/README.md +++ b/README.md @@ -6,11 +6,11 @@ gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation -**Date repository last updated**: August 08, 2022 +**Date repository last updated**: August 25, 2022

@@ -18,23 +18,23 @@ Overview

-The `gateR` package is a suite of `R` functions to identify significant spatial clustering of flow and mass cytometry data used in immunological investigations. For a two-group comparison we detect clusters using the kernel-based spatial relative risk function that is estimated using the [sparr](https://CRAN.R-project.org/package=sparr) package. The tests are conducted in two-dimensional space comprised of two fluorescent markers. +The `gateR` package is a suite of `R` functions to identify significant spatial clustering of flow and mass cytometry data used in immunological investigations. For a two-group comparison, we detect clusters using the kernel-based spatial relative risk function estimated using the [sparr](https://CRAN.R-project.org/package=sparr) package. The tests are conducted in a two-dimensional space comprised of two fluorescent markers. Examples of a single condition with two groups: -1. Disease case v. healthy control -2. Time 2 v. Time 1 (baseline) +1. Disease case vs. Healthy control +2. Time 2 vs. Time 1 (baseline) -For a two-group comparison of two conditions we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. For example: +For a two-group comparison of two conditions, we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. For example: 1. Estimate a relative risk surface for: - 1. Condition 2B v. Condition 2A - 2. Condition 1B v. Condition 1A -2. Estimate relative risk surface for the ratio: + 1. Condition 2B vs. Condition 2A + 2. Condition 1B vs. Condition 1A +2. Estimate the relative risk surface for the ratio: \frac{ \big(\frac{Condition2B}{Condition2A}\big)}{\big(\frac{Condition1B}{Condition1A}\big)} -Within areas where the relative risk exceeds an asymptotic normal assumption, the `gateR` package has functionality to examine the features of these cells. Basic visualization is also supported. 
+Within areas where the relative risk exceeds an asymptotic normal assumption, the `gateR` package has the functionality to examine the features of these cells. Basic visualization is also supported.

@@ -48,7 +48,7 @@ To install the release version from CRAN:
 To install the development version from GitHub:
- devtools::install_github("Waller-SUSAN/gateR")
+ devtools::install_github("lance-waller-lab/gateR")

@@ -108,7 +108,7 @@ Available sample data sets randCyto -A sample dataset containing information about flow cytometry data with two binary conditions and four markers. The data are a random subset of the 'extdata' data in the flowWorkspaceData package found on Bioconductor and formated for `gateR` input. +A sample dataset containing information about flow cytometry data with two binary conditions and four markers. The data are a random subset of the 'extdata' data in the flowWorkspaceData package found on Bioconductor and formatted for `gateR` input. @@ -121,7 +121,7 @@ Authors * **Ian D. Buller** - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland* - [GitHub](https://github.com/idblr) - [ORCID](https://orcid.org/0000-0001-9477-8582) -See also the list of [contributors](https://github.com/Waller-SUSAN/gateR/graphs/contributors) who participated in this project. Main contributors include: +See also the list of [contributors](https://github.com/lance-waller-lab/gateR/graphs/contributors) who participated in this project. 
Main contributors include: * **Elena Hsieh** - *Immunology & Microbiology and Pediatrics, University of Colorado Anschutz School of Medicine* - [GitHub](https://github.com/elenahsieh1407) - [ORCID](https://orcid.org/0000-0003-3969-6597) * **Debashis Ghosh** - *Biostatistics & Informatics, Colorado School of Public Health, Aurora, Colorado* - [GitHub](https://github.com/ghoshd) - [ORCID](https://orcid.org/0000-0001-5672-7645) @@ -250,7 +250,7 @@ test_lotrrs <- gateR::lotrrs(dat = obs_dat[ , -5:-4]) ### Funding -This package was developed while the author was a doctoral student at in the [Environmental Health Sciences doctoral program](https://www.sph.emory.edu/departments/eh/degree-programs/phd/index.html) at [Emory University](https://www.emory.edu) and a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov/) at the [National Cancer Institute](https://www.cancer.gov/). +This package was developed while the author was a doctoral student at in the [Environmental Health Sciences doctoral program](https://www.sph.emory.edu/departments/eh/degree-programs/phd/index.html) at [Emory University](https://www.emory.edu/home/index.html) and a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov/) at the [National Cancer Institute](https://www.cancer.gov/). ### Acknowledgments @@ -260,4 +260,4 @@ When citing this package for publication, please follow: ### Questions? Feedback? -For questions about the package please contact the maintainer [Dr. Ian D. Buller](mailto:ian.buller@nih.gov) or [submit a new issue](https://github.com/Waller-SUSAN/gateR/issues). +For questions about the package, please contact the maintainer [Dr. Ian D. Buller](mailto:ian.buller@nih.gov) or [submit a new issue](https://github.com/lance-waller-lab/gateR/issues). 
diff --git a/cran-comments.md b/cran-comments.md index 94e535e..151aff8 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,16 +1,37 @@ -## This is the tenth resubmission +## This is the eleventh resubmission + +* First resubmission after CRAN archived `gateR` on 2022-08-08 because a dependency `SpatialPack` and its dependency `fastmatrix` were archived. Since the warning from CRAN 2022-07-20, the `gateR` maintainer worked with the maintainer of both packages. The dependencies have returned to CRAN and this maintainer is now resubmitting `gateR`. + +* Actions taken regarding feedback from CRAN teams' auto-check service: + * Replaced `if()` conditions comparing `class()` to string with `inherits()` * Updates since previous submission: - * Updated dependencies `spatstat.core` and `spatstat.linnet` packages based on feedback from the Spatstat Team (Adrian Baddeley and Ege Rubak). All random generators in `spatstat.core` were moved to a new package `spatstat.random` - * `spatstat.geom`, `spatstat.core`, `spatstat.linnet`, and `spatstat (>=2.0-0)` are no longer Depends - * `spatstat.geom` is now Imports - * Fixed annotation typos in the vignette. 
Removed packages no longer used in the vignette - * `dplyr`, `ncdfFlow`, `flowWorkspaceData`, and `usethis` now Suggests (for generating random data set `randCyto`) + * Updated package URL and BugReports to renamed GitHub account "lance-waller-lab" (previously "Waller-SUSAN") + * `tools` is no longer Imports + * `utils` is now Suggests because "zzz.R" calls the `packageDescription()` function + * `ncdfFlow`, `flowWorkspaceData` are no longer Suggests because "Package suggested but not available for checking" in the following CRAN environments: + * r-devel-linux-x86_64-fedora-clang + * r-devel-linux-x86_64-fedora-gcc + * r-devel-windows-x86_64-new-TK + * r-release-linux-x86_64 + * r-release-macos-x86_64 + * r-oldrel-macos-x86_64 + * Added CITATION file + * Fixed typos in documentation throughout + +* Documentation for DESCRIPTION references the following DOIs that throw a NOTE in win-builder, Fedora Linux, and Ubuntu Linux but are valid URLs: + * + * + * -* Documentation for `pval_correct()` references doi and that throw NOTES in win-builder, Fedora Linux, and Ubuntu Linux but these are valid URLs +* Documentation for `pval_correct()` references the following DOIs that throw a NOTE in win-builder, Fedora Linux, and Ubuntu Linux but are valid URLs: + * + * + * + * ## Test environments -* local OS X install, R 4.1.2 +* local OS X install, R 4.2.1 * win-builder, (devel, release, oldrelease) * Rhub * Fedora Linux, R-devel, clang, gfortran diff --git a/inst/CITATION b/inst/CITATION new file mode 100755 index 0000000..e77b527 --- /dev/null +++ b/inst/CITATION @@ -0,0 +1,19 @@ +citHeader("To cite gateR in publications, please use the following and include the version number and DOI:") + +citEntry(entry = "manual", + title = "gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation", + author = personList(as.person("Ian D. 
Buller")), + publisher = "The Comprehensive R Archive Network", + year = "2022", + number = "0.1.11", + doi = "10.5281/zenodo.5347892", + url = "https://cran.r-project.org/package=gateR", + + textVersion = + paste("Ian D. Buller (2022).", + "gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation.", + "The Comprehensive R Archive Network.", + "v0.1.11.", + "DOI:10.5281/zenodo.5347892", + "Accessed by: https://cran.r-project.org/package=gateR") +) diff --git a/man/gateR-package.Rd b/man/gateR-package.Rd index 2aee5d1..c36420e 100644 --- a/man/gateR-package.Rd +++ b/man/gateR-package.Rd @@ -9,7 +9,7 @@ Estimates statistically significant fluorescent marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of fluorescent markers to examine features of cells that may be different between groups. } \details{ -For a two-group comparison, the 'gateR' package uses the spatial relative risk function that is estimated using the {sparr} package. Details about the {sparr} package methods can be found in the tutorial: Davies et al. (2018) \doi{10.1002/sim.7577}. Details about kernel density estimation can be found in J. F. Bithell (1990) \doi{10.1002/sim.4780090616}. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) \doi{10.1002/sim.4780101112}. +For a two-group comparison, the 'gateR' package uses the spatial relative risk function estimated using the \code{\link{sparr}} package. Details about the \code{\link{sparr}} package methods can be found in the tutorial: Davies et al. (2018) \doi{10.1002/sim.7577}. Details about kernel density estimation can be found in J. F. Bithell (1990) \doi{10.1002/sim.4780090616}. More information about relative risk functions using kernel density estimation can be found in J. F. 
Bithell (1991) \doi{10.1002/sim.4780101112}. This package provides a function to perform a gating strategy for flow cytometry data. The 'gateR' package also provides basic visualization for each gate. @@ -19,9 +19,9 @@ Key content of the 'gateR' package include:\cr \code{\link{gating}} Extracts cells within statistically significant combinations of fluorescent markers, successively, for a set of markers. Statistically significant combinations are identified using two-tailed p-values of a relative risk surface assuming asymptotic normality. This function is currently available for two-level comparisons of a single condition (e.g., case/control) or two conditions (e.g., case/control at time 1 and time 2). Provides functionality for basic visualization and multiple testing correction. -\code{\link{rrs}} Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. +\code{\link{rrs}} Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. -\code{\link{lotrrs}} Estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. 
+\code{\link{lotrrs}} Estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. \bold{Flow Cytometry Data} diff --git a/man/gating.Rd b/man/gating.Rd index ad9533b..912e4f9 100644 --- a/man/gating.Rd +++ b/man/gating.Rd @@ -55,13 +55,13 @@ gating( \item{rcols}{Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}.} -\item{lower_lrr}{Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit and the color key will include the minimum value of the log relative risk surface.} +\item{lower_lrr}{Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface.} -\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface.} +\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface.} -\item{c1n}{Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator.} +\item{c1n}{Optional, character. 
The name of the level for the numerator of condition A. The default is NULL, and the first level is treated as the numerator.} -\item{c2n}{Optional, character. The name of the level for the numerator of condition B. The default is null and the first level is treated as the numerator.} +\item{c2n}{Optional, character. The name of the level for the numerator of condition B. The default is NULL, and the first level is treated as the numerator.} \item{win}{Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data.} @@ -97,9 +97,9 @@ Extracts cells within statistically significant combinations of fluorescent mark \details{ This function performs a sequential gating strategy for mass cytometry data comparing two levels with one or two conditions. Gates are typically two-dimensional space comprised of two fluorescent markers. The two-level comparison allows for the estimation of a spatial relative risk function and the computation of p-value based on an assumption of asymptotic normality. Cells within statistically significant areas are extracted and used in the next gate. This function relies heavily upon the \code{\link[sparr]{risk}} function. Basic visualization is available if \code{plot_gate = TRUE}. -The \code{vars} argument must be a vector with an even-numbered length where the odd-numbered elements are the markers used on the x-axis of a gate and the even-numbered elements are the markers used on the y-axis of a gate. For example, if \code{vars = c("V1", "V2", "V3", and "V4")} then the first gate is "V1" on the x-axis and "V2" on the y-axis and then the second gate is V3" on the x-axis and "V4" on the y-axis. Makers can be repeated in successive gates. 
+The \code{vars} argument must be a vector with an even-numbered length where the odd-numbered elements are the markers used on the x-axis of a gate, and the even-numbered elements are the markers used on the y-axis of a gate. For example, if \code{vars = c("V1", "V2", "V3", and "V4")} then the first gate is "V1" on the x-axis and "V2" on the y-axis and then the second gate is V3" on the x-axis and "V4" on the y-axis. Makers can be repeated in successive gates. -The \code{n_condition} argument specifies if the gating strategy is performed for one condition or two conditions. If \code{n_condition = 1}, then the function performs a one condition gating strategy using the internal \code{rrs} function, which computes the statistically significant areas (clusters) of a relative risk surface at each gate and selects the cells within the clusters specified by the \code{numerator} argument. If \code{n_condition = 2}, then the function performs a two conditions gating strategy using the internal \code{lotrrs} function, which computes the statistically significant areas (clusters) of a ratio of relative risk surfaces at each gate and selects the cells within the clusters specified by the \code{numerator} argument. The condition variable(s) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. See the documentation for the internal \code{rrs} and \code{lotrrs} functions for more details. +The \code{n_condition} argument specifies if the gating strategy is performed for one condition or two conditions. 
If \code{n_condition = 1}, then the function performs a one condition gating strategy using the internal \code{rrs} function, which computes the statistically significant areas (clusters) of a relative risk surface at each gate and selects the cells within the clusters specified by the \code{numerator} argument. If \code{n_condition = 2}, then the function performs a two conditions gating strategy using the internal \code{lotrrs} function, which computes the statistically significant areas (clusters) of a ratio of relative risk surfaces at each gate and selects the cells within the clusters specified by the \code{numerator} argument. The condition variable(s) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. See the documentation for the internal \code{rrs} and \code{lotrrs} functions for more details. The p-value surface of the ratio of relative risk surfaces is estimated assuming asymptotic normality of the ratio value at each gridded knot. The bandwidth is fixed across all layers. diff --git a/man/lotrrs.Rd b/man/lotrrs.Rd index 7ab36e5..4e973c9 100644 --- a/man/lotrrs.Rd +++ b/man/lotrrs.Rd @@ -46,13 +46,13 @@ lotrrs( \item{rcols}{Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}.} -\item{lower_lrr}{Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit and the color key will include the minimum value of the log relative risk surface.} +\item{lower_lrr}{Optional, numeric. 
Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface.} -\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface.} +\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface.} -\item{c1n}{Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator.} +\item{c1n}{Optional, character. The name of the level for the numerator of condition A. The default is NULL, and the first level is treated as the numerator.} -\item{c2n}{Optional, character. The name of the level for the numerator of condition B. The default is null and the first level is treated as the numerator.} +\item{c2n}{Optional, character. The name of the level for the numerator of condition B. The default is NULL, and the first level is treated as the numerator.} \item{win}{Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data.} @@ -78,7 +78,7 @@ An object of class 'list' where each element is a object of class 'rrs' created Estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. 
} \details{ -This function estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions using three successive \code{\link[sparr]{risk}} functions. A relative risk surface is estimated for Condition A at each level of Condition B and then a ratio of the two relative risk surfaces is computed. +This function estimates a ratio of relative risk surfaces and computes the asymptotic p-value surface for a single gate with two conditions using three successive \code{\link[sparr]{risk}} functions. A relative risk surface is estimated for Condition A at each level of Condition B, and then a ratio of the two relative risk surfaces is computed. \deqn{RR_{Condition B1} = \frac{Condition A2 of B1}{Condition A1 of B1}} \deqn{RR_{Condition B2} = \frac{Condition A2 of B2}{Condition A1 of B2}} @@ -88,7 +88,7 @@ The p-value surface of the ratio of relative risk surfaces is estimated assuming Provides functionality for a correction for multiple testing. If \code{p_correct = "FDR"}, calculates a False Discovery Rate by Benjamini and Hochberg. If \code{p_correct = "uncorrelated Sidak"}, calculates an independent Sidak correction. If \code{p_correct = "uncorrelated Bonferroni"}, calculates an independent Bonferroni correction. If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, then the corrections take into account the into account the spatial correlation of the surface. (NOTE: If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, it may take a considerable amount of computation resources and time to calculate). If \code{p_correct = "Adler and Hasofer"} or if \code{p_correct = "Friston"}, then calculates a correction based on Random Field Theory. If \code{p_correct = "none"} (the default), then the function does not account for multiple testing and uses the uncorrected \code{alpha} level. 
See the internal \code{pval_correct} function documentation for more details. -The two condition variables (Condition A and Condition B) within \code{dat} must be of class 'factor' with two levels. The first level in each variable is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. +The two condition variables (Condition A and Condition B) within \code{dat} must be of class 'factor' with two levels. The first level in each variable is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The levels can also be specified using the \code{c1n} and \code{c2n} parameters. } \examples{ test_lotrrs <- lotrrs(dat = randCyto) diff --git a/man/pval_correct.Rd b/man/pval_correct.Rd index faded3e..5010028 100644 --- a/man/pval_correct.Rd +++ b/man/pval_correct.Rd @@ -34,7 +34,7 @@ This function provides functionality for multiple testing correction in five way \item Computes a False Discovery Rate by Benjamini and Hochberg \doi{10.1111/j.2517-6161.1995.tb02031.x} (\code{p_correct = "FDR"}) by: 1) sorting the p-values (p_i) of each knot in ascending order (p_1 <= p_2 <= ... <= p_m), 2) starting from p_m find the first p_i for which p_i <= (i/m) * alpha. \item Computes an independent Sidak correction \doi{10.2307/2283989} (\code{p_correct = "uncorrelated Sidak"}) by 1 - (1 - \code{alpha}) ^ (1 / total number of gridded knots across the estimated surface). The default in the \code{\link[sparr]{risk}} function is a resolution of 128 x 128 or n = 16,384 knots and a custom resolution can be specified using the \code{resolution} argument within the \code{\link[sparr]{risk}} function. \item Computes an independent Bonferroni correction (\code{p_correct = "uncorrelated Bonferroni"}) by \code{alpha} / total number of gridded knots across the estimated surface. 
The default in the \code{\link[sparr]{risk}} function is a resolution of 128 x 128 or n = 16,384 knots and a custom resolution can be specified using the \code{resolution} argument within the \code{\link[sparr]{risk}} function. -\item Computes a spatially dependent Sidak correction (\code{p_correct = "correlated Sidak"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). The correction use the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. +\item Computes a spatially dependent Sidak correction (\code{p_correct = "correlated Sidak"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). The correction uses the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. 
\item Computes a spatially dependent Bonferroni correction (\code{p_correct = "correlated Bonferroni"}) by taking into account the spatial correlation of the relative risk surface values (if using the \code{rrs} function for a single condition gate) or the ratio of relative risk surfaces values (if using the \code{lotrrs} function for a two condition gate). The correction uses the minimum number of knots that are not spatially correlated instead of the total number of knots. The minimum number of knots that are not spatially correlated is computed by counting the knots that are a distance apart that exceeds the minimum distance of non-significant spatial correlation based on a correlogram using the \code{\link[SpatialPack]{modified.ttest}} function. \item Computes a critical p-value based on Random Field Theory and the Adler and Hasofer equation (\code{p_correct = "Euler A&H"}) \doi{10.1214/aop/1176996176} and p.111 of \doi{10.1137/1.9780898718980}. The correction uses the number of knots that are independent based on the bandwidth used in the kernel density estimation of the spatial relative risk function. \item Computes a critical p-value based on Random Field Theory and the Friston et al. equation (\code{p_correct = "Euler Friston"}) \doi{10.1038/jcbfm.1991.122} which differs from Adler and Hasofer's equation by a factor of 0.79. The correction uses the number of knots that are independent based on the bandwidth used in the kernel density estimation of the spatial relative risk function. 
diff --git a/man/randCyto.Rd b/man/randCyto.Rd index a1afdc8..c99690d 100644 --- a/man/randCyto.Rd +++ b/man/randCyto.Rd @@ -17,7 +17,7 @@ A data frame with 11763 rows and 7 variables: } } \source{ -\url{https://github.com/Waller-SUSAN/gateR/blob/master/README.md} +\url{https://github.com/lance-waller-lab/gateR/blob/master/README.md} } \usage{ randCyto diff --git a/man/rrs.Rd b/man/rrs.Rd index ead57c5..231651d 100644 --- a/man/rrs.Rd +++ b/man/rrs.Rd @@ -45,11 +45,11 @@ rrs( \item{rcols}{Character string of length three (3) specifying the colors for: 1) group A (numerator), 2) neither, and 3) group B (denominator) designations. The defaults are \code{c("#FF0000", "#cccccc", "#0000FF")} or \code{c("red", "grey80", "blue")}.} -\item{lower_lrr}{Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit and the color key will include the minimum value of the log relative risk surface.} +\item{lower_lrr}{Optional, numeric. Lower cut-off value for the log relative risk value in the color key (typically a negative value). The default is no limit, and the color key will include the minimum value of the log relative risk surface.} -\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit and the color key will include the maximum value of the log relative risk surface.} +\item{upper_lrr}{Optional, numeric. Upper cut-off value for the log relative risk value in the color key (typically a positive value). The default is no limit, and the color key will include the maximum value of the log relative risk surface.} -\item{c1n}{Optional, character. The name of the level for the numerator of condition A. The default is null and the first level is treated as the numerator.} +\item{c1n}{Optional, character. The name of the level for the numerator of condition A. 
The default is NULL, and the first level is treated as the numerator.} \item{win}{Optional. Object of class \code{owin} for a custom two-dimensional window within which to estimate the surfaces. The default is NULL and calculates a convex hull around the data.} @@ -72,14 +72,14 @@ An object of class 'list' where each element is a object of class 'rrs' created } } \description{ -Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition. Includes features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. +Estimates a relative risk surface and computes the asymptotic p-value surface for a single gate with a single condition, including features for basic visualization. This function is used internally within the \code{\link{gating}} function to extract the points within the significant areas. This function can also be used as a standalone function. } \details{ This function estimates a relative risk surface and computes the asymptotic p-value surface for a single gate and single condition using the \code{\link[sparr]{risk}} function. Bandwidth is fixed across both layers (numerator and denominator spatial densities). Basic visualization is available if \code{plot_gate = TRUE}. Provides functionality for a correction for multiple testing. If \code{p_correct = "FDR"}, calculates a False Discovery Rate by Benjamini and Hochberg. If \code{p_correct = "uncorrelated Sidak"}, calculates an independent Sidak correction. If \code{p_correct = "uncorrelated Bonferroni"}, calculates an independent Bonferroni correction. If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, then the corrections take into account the into account the spatial correlation of the surface. 
(NOTE: If \code{p_correct = "correlated Sidak"} or if \code{p_correct = "correlated Bonferroni"}, it may take a considerable amount of computation resources and time to calculate). If \code{p_correct = "Adler and Hasofer"} or if \code{p_correct = "Friston"}, then calculates a correction based on Random Field Theory. If \code{p_correct = "none"} (the default), then the function does not account for multiple testing and uses the uncorrected \code{alpha} level. See the internal \code{pval_correct} function documentation for more details. -The condition variable (Condition A) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value and the second level is considered the denominator (i.e., "control") value. The level can also be specified using the \code{c1n} parameter. +The condition variable (Condition A) within \code{dat} must be of class 'factor' with two levels. The first level is considered the numerator (i.e., "case") value, and the second level is considered the denominator (i.e., "control") value. The level can also be specified using the \code{c1n} parameter. } \examples{ test_rrs <- rrs(dat = randCyto) diff --git a/vignettes/vignette.Rmd b/vignettes/vignette.Rmd index f259e2c..0ace750 100644 --- a/vignettes/vignette.Rmd +++ b/vignettes/vignette.Rmd @@ -13,25 +13,25 @@ vignette: > knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, cache = FALSE, fig.width = 7, fig.height = 7, fig.show = "hold") ``` -The gateR package is a suite of R functions to identify significant spatial clustering of mass and flow cytometry data used in immunological investigations. The gateR package can be used for a panel of all surface markers, or a mixture of surface markers and functional read outs. 
The gateR package performs a gating technique that estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the gateR package uses the spatial relative risk function that is estimated using the [sparr](https://CRAN.R-project.org/package=sparr) package. The gates are conducted in two-dimensional space comprised of two markers. +The gateR package is a suite of R functions to identify significant spatial clustering of mass and flow cytometry data used in immunological investigations. The gateR package can be used for a panel of all surface markers or a mixture of surface markers and functional readouts. The gateR package performs a gating technique that estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the gateR package uses the spatial relative risk function estimated using the [sparr](https://CRAN.R-project.org/package=sparr) package. The gates are conducted in two-dimensional space comprised of two markers. Examples of a single condition with two groups: -1. Disease case v. healthy control -2. Time 2 v. Time 1 (baseline) +1. Disease case vs. Healthy control +2. Time 2 vs. Time 1 (baseline) -For a two-group comparison of two conditions we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. 
For example: +For a two-group comparison of two conditions, we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. For example: 1. Estimate a relative risk surface for: - 1. Condition 2B v. Condition 2A - 2. Condition 1B v. Condition 1A -2. Estimate relative risk surface for the ratio: + 1. Condition 2B vs. Condition 2A + 2. Condition 1B vs. Condition 1A +2. Estimate the relative risk surface for the ratio: $$\frac{(\frac{Condition2B}{Condition2A})}{(\frac{Condition1B}{Condition1A})}$$ -Within areas where the relative risk exceeds an asymptotic normal assumption, the gateR package has functionality to examine the features of these cells. +Within areas where the relative risk exceeds an asymptotic normal assumption, the gateR package has the functionality to examine the features of these cells. -This vignette provides an implementation of the gateR package using a randomly generated data set. Please see the README.md file within the [gateR GitHub repository](https://github.com/Waller-SUSAN/gateR) for an example using publicly available flow cytometry data from the [flowWorkspaceData](https://bioconductor.org/packages/release/data/experiment/html/flowWorkspaceData.html) package available via [Bioconductor](https://bioconductor.org/). Here, we generate data with two conditions, four markers, and two additional features. +This vignette implements the gateR package using a randomly generated data set. Please see the README.md file within the [gateR GitHub repository](https://github.com/lance-waller-lab/gateR) for an example using publicly available flow cytometry data from the [flowWorkspaceData](https://bioconductor.org/packages/release/data/experiment/html/flowWorkspaceData.html) package available via [Bioconductor](https://bioconductor.org/). Here, we generate data with two conditions, four markers, and two additional features. We start with the necessary packages and seed for the vignette. 
@@ -43,7 +43,7 @@ We start with the necessary packages and seed for the vignette. ### Generate random toy data -Unique function to randomly generate data multivariate normal (MVN) around a central point. Parameters include the centroid coordinates (`centre`), number of observations to generate (`ncell`), and the standard deviation of the normal distribution (`scalar`). +A unique function randomly generates multivariate normal (MVN) data around a central point. Parameters include the centroid coordinates (`centre`), the number of observations to generate (`ncell`), and the standard deviation of the normal distribution (`scalar`). ```{r rand_mvn_function} rand_mvn <- function(centre, ncell, scalar) { @@ -232,7 +232,7 @@ The toy data frame has nine columns (id, groups, markers, and cytokines). total_time <- end_time - start_time # calculate duration of gating() example ``` -The gating process took about `r round(total_time, digits = 1)` *seconds* on a Macbook Pro (4 variables, 2 gates, 2 cytokines, `r format(nrow(df_full), big.mark= ",")` observations). The corrected significance level in the first gate was `r formatC(out_gate$lrr[[1]]$alpha, format = "e", digits = 2)`. The histograms for the two cytokines are the same as above. +The gating process took about `r round(total_time, digits = 1)` seconds on a machine with the features listed at the end of the vignette (4 variables, 2 gates, 2 cytokines, `r format(nrow(df_full), big.mark= ",")` observations). The corrected significance level in the first gate was `r formatC(out_gate$lrr[[1]]$alpha, format = "e", digits = 2)`. The histograms for the two cytokines are the same as above. ```{r 2C_cytokinesA} # Plot of Cytokine 1 @@ -297,7 +297,7 @@ Compare histograms before and after gating. Gating reduced the overall sample si col = "black", lty = 1, main = "Cytokine 2 of cases\npost-gating", - xlim = c(-5 ,5), + xlim = c(-5, 5), ylim = c(0, 0.5)) ``` @@ -326,7 +326,7 @@ Compare histograms before and after gating. 
Gating reduced the overall sample si total_time <- end_time - start_time # calculate duration of gating() example ``` -The gating process took about `r round(total_time, digits = 1)` *seconds* on a Macbook Pro (4 variables, 2 gates, 2 cytokines, `r format(nrow(df_sub), big.mark= ",")` observations). The corrected significance level in the first gate was `r formatC(out_gate$lrr[[1]]$alpha, format = "e", digits = 2)`. The histograms for the two cytokines are the same as above. +The gating process took about `r round(total_time, digits = 1)` seconds on a machine with the features listed at the end of the vignette (4 variables, 2 gates, 2 cytokines, `r format(nrow(df_sub), big.mark= ",")` observations). The corrected significance level in the first gate was `r formatC(out_gate$lrr[[1]]$alpha, format = "e", digits = 2)`. The histograms for the two cytokines are the same as above. ```{r 1C_cytokinesA} # Plot of Cytokine 1 @@ -384,15 +384,19 @@ Compare histograms before and after gating. Gating reduced the overall sample si col = "black", lty = 1, main = "Cytokine 2 of cases\npost-gating", - xlim = c(-5 ,5), + xlim = c(-5, 5), ylim = c(0, 0.5)) ``` ### Current limitations -1. Extracts observations at *all* significant clusters (either case or controls) and there is currently no functionality to select cells within a specific (set of) cluster(s) for the next gate. +1. Extracts observations at *all* significant clusters (either case or controls), and there is currently no functionality to select cells within a specific (set of) cluster(s) for the next gate. 2. Only two dimensions (i.e., markers) per gate because the spatial relative risk function is a two-dimensional spatial statistic. -3. Only two-group comparisons (e.g., case v. control) per gate because the spatial relative risk function is a ratio by nature. -4. Only comparisons of one condition or comparisons of two conditions are possible. +3. Only two-group comparisons (e.g., case vs. 
control) per gate because the spatial relative risk function is a ratio by nature. +4. Only comparisons of one condition or two conditions are possible. 5. Large computational expense (i.e., run-time) to calculate the correlated Bonferroni correction. 6. A large sample size of observations (i.e., cells) may overload the gateR process. We are evaluating this potential limitation and developing a possible solution (e.g., randomly subsetting the data to estimate the clusters at each gate). + +```{r system} +sessionInfo() +``` diff --git a/vignettes/vignette.html b/vignettes/vignette.html index 5a9c0b2..58529c5 100644 --- a/vignettes/vignette.html +++ b/vignettes/vignette.html @@ -12,11 +12,23 @@ - + gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation - + + @@ -139,37 +332,63 @@ -

gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation

+

gateR: Flow/Mass Cytometry Gating via +Spatial Kernel Density Estimation

Ian D. Buller, Ph.D., M.A. (Github: @idblr)

-

2022-02-03

- - - -

The gateR package is a suite of R functions to identify significant spatial clustering of mass and flow cytometry data used in immunological investigations. The gateR package can be used for a panel of all surface markers, or a mixture of surface markers and functional read outs. The gateR package performs a gating technique that estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., “gates”) of markers to examine features of cells that may be different between groups. For a two-group comparison, the gateR package uses the spatial relative risk function that is estimated using the sparr package. The gates are conducted in two-dimensional space comprised of two markers.

+

2022-08-25

+ + + +

The gateR package is a suite of R functions to identify significant +spatial clustering of mass and flow cytometry data used in immunological +investigations. The gateR package can be used for a panel of all surface +markers or a mixture of surface markers and functional readouts. The +gateR package performs a gating technique that estimates statistically +significant marker combination values within which one immunologically +distinctive group (i.e., disease case) is more associated than another +group (i.e., healthy control), successively, using various combinations +(i.e., “gates”) of markers to examine features of cells that may be +different between groups. For a two-group comparison, the gateR package +uses the spatial relative risk function estimated using the sparr package. The +gates are conducted in two-dimensional space comprised of two +markers.

Examples of a single condition with two groups:

    -
  1. Disease case v. healthy control
  2. -
  3. Time 2 v. Time 1 (baseline)
  4. +
  5. Disease case vs. Healthy control
  6. +
  7. Time 2 vs. Time 1 (baseline)
-

For a two-group comparison of two conditions we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. For example:

+

For a two-group comparison of two conditions, we estimate two +relative risk surfaces for one condition and then a ratio of the +relative risks. For example:

  1. Estimate a relative risk surface for:
      -
    1. Condition 2B v. Condition 2A
    2. -
    3. Condition 1B v. Condition 1A
    4. +
    5. Condition 2B vs. Condition 2A
    6. +
    7. Condition 1B vs. Condition 1A
  2. -
  3. Estimate relative risk surface for the ratio:
  4. +
  5. Estimate the relative risk surface for the ratio:

\[\frac{(\frac{Condition2B}{Condition2A})}{(\frac{Condition1B}{Condition1A})}\]
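
The ratio above can be illustrated numerically. The following is a minimal sketch, not the internal code of gateR: the four density values are made-up numbers at a single hypothetical location, and the variable names are chosen for illustration only. It shows that the ratio of the two relative risks becomes a difference of the two log relative risks on the log scale.

```r
# Hypothetical density values at one location (made-up numbers)
f_2B <- 0.30 # Condition 2B
f_2A <- 0.10 # Condition 2A
f_1B <- 0.20 # Condition 1B
f_1A <- 0.10 # Condition 1A

# Relative risk within each condition
rr_c2 <- f_2B / f_2A
rr_c1 <- f_1B / f_1A

# Ratio of the two relative risks, as in the formula above
rr_ratio <- rr_c2 / rr_c1

# On the log scale, the ratio is a difference of log relative risks
log_rr_ratio <- log(rr_c2) - log(rr_c1)

# Both routes give the same value
all.equal(log(rr_ratio), log_rr_ratio)
```

A log ratio above zero at a location means the relative risk at Condition 2 exceeds the relative risk at Condition 1 there; a value below zero means the opposite.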

-

Within areas where the relative risk exceeds an asymptotic normal assumption, the gateR package has functionality to examine the features of these cells.

-

This vignette provides an implementation of the gateR package using a randomly generated data set. Please see the README.md file within the gateR GitHub repository for an example using publicly available flow cytometry data from the flowWorkspaceData package available via Bioconductor. Here, we generate data with two conditions, four markers, and two additional features.

+

Within areas where the relative risk exceeds an asymptotic normal +assumption, the gateR package has the functionality to examine the +features of these cells.

+

This vignette implements the gateR package using a randomly generated +data set. Please see the README.md file within the gateR GitHub +repository for an example using publicly available flow cytometry +data from the flowWorkspaceData +package available via Bioconductor. Here, we generate +data with two conditions, four markers, and two additional features.

We start with the necessary packages and seed for the vignette.

  loadedPackages <- c("gateR", "graphics", "stats", "tibble", "utils")
   invisible(lapply(loadedPackages, library, character.only = TRUE))
   set.seed(1234) # for reproducibility

Generate random toy data

-

Unique function to randomly generate data multivariate normal (MVN) around a central point. Parameters include the centroid coordinates (centre), number of observations to generate (ncell), and the standard deviation of the normal distribution (scalar).

+

A unique function randomly generates multivariate normal (MVN) data +around a central point. Parameters include the centroid coordinates +(centre), the number of observations to generate +(ncell), and the standard deviation of the normal +distribution (scalar).

  rand_mvn <- function(centre, ncell, scalar) {
     x0 <- centre[1]  
     y0 <- centre[2]
@@ -181,7 +400,13 @@ 

Generate random toy data

}

Gate 1: Marker 1 and Marker 2

-

At Condition 1, we generate 100,000 cases and 100,000 controls (ncell = 100000) randomly MVN with a case centroid at (0.55, 0.55) and a control centroid at (0.40, 0.40) within a unit square window (0, 1), and cases have a more focal cluster (scalar = 0.05) than controls (scalar = 0.15).

+

At Condition 1, we generate 100,000 cases and 100,000 controls +(ncell = 100000) randomly MVN with a case centroid at +(0.55, 0.55) and a control centroid at +(0.40, 0.40) within a unit square window +(0, 1), and cases have a more focal cluster +(scalar = 0.05) than controls +(scalar = 0.15).

# Initial parameters
   ncell <- 100000 # number of observations per group per condition
   c1_cas_center <- c(0.55, 0.55)
@@ -199,7 +424,13 @@ 

Gate 1: Marker 1 and Marker 2

ylab = "V2") graphics::points(c1_cas, col = "orangered4")

-

At Condition 2, we generate 100,000 cases and 100,000 controls (ncell = 100000) randomly MVN with a case centroid at (0.45, 0.45) and a control centroid at (0.40, 0.40) within a unit square window (0, 1), and cases have a more focal cluster (scalar = 0.05) than controls (scalar = 0.10).

+

At Condition 2, we generate 100,000 cases and 100,000 controls +(ncell = 100000) randomly MVN with a case centroid at +(0.45, 0.45) and a control centroid at +(0.40, 0.40) within a unit square window +(0, 1), and cases have a more focal cluster +(scalar = 0.05) than controls +(scalar = 0.10).

# Initial parameters
   c2_cas_center <- c(0.45, 0.45)
   c2_con_center <- c(0.40, 0.40)
@@ -229,7 +460,12 @@ 

Gate 1: Marker 1 and Marker 2

Gate 2: Marker 3 and Marker 4

-

At Condition 1, we generate 100,000 cases and 100,000 controls (ncell = 100000) randomly MVN with a case centroid at (0.55, 0.55) and a control centroid at (0.50, 0.50) within a unit square window (0, 05), but both have the same amount of spread (scalar = 0.10).

+

At Condition 1, we generate 100,000 cases and 100,000 controls +(ncell = 100000) randomly MVN with a case centroid at +(0.55, 0.55) and a control centroid at +(0.50, 0.50) within a unit square window +(0, 1), but both have the same amount of spread +(scalar = 0.10).

# Initial parameters
   c1_cas_center <- c(0.55, 0.55)
   c1_con_center <- c(0.50, 0.50)
@@ -246,7 +482,13 @@ 

Gate 2: Marker 3 and Marker 4

ylab = "V4") graphics::points(c1_cas, col = "orangered4")

-

At Condition 2, we generate 100,000 cases and 100,000 controls (ncell = 100000) randomly with a case centroid at (0.65, 0.65) and control a centroid at (0.50, 0.50) within a unit square window (0, 1), and cases have a more focal cluster (scalar = 0.05) than controls (scalar = 0.10).

+

At Condition 2, we generate 100,000 cases and 100,000 controls +(ncell = 100000) randomly MVN with a case centroid at +(0.65, 0.65) and a control centroid at +(0.50, 0.50) within a unit square window +(0, 1), and cases have a more focal cluster +(scalar = 0.05) than controls +(scalar = 0.10).

# Initial parameters
   c2_cas_center <- c(0.65, 0.65)
   c2_con_center <- c(0.50, 0.50)
@@ -268,7 +510,8 @@ 

Gate 2: Marker 3 and Marker 4

df_full$V4 <- c(c2_cas[ , 2], c1_cas[ , 2], c2_con[ , 2], c1_con[ , 2]) rm(c2_cas, c1_cas, c2_con, c1_con) # conserve memory
-

Generate random values for two example cytokines and append to the data frame.

+

Generate random values for two example cytokines and append to the +data frame.

# Two Cytokines
   Z1 <- stats::rchisq(ncell * 4, df = 5) # Random Chi-square distribution
   Z2 <- stats::rnorm(ncell * 4, 0, 1) # Random Gaussian distribution
@@ -302,8 +545,9 @@ 

Gate 2: Marker 3 and Marker 4

graphics::plot(stats::density(df_full$Z2[df_full$group == "control" & df_full$condition == "2"]), main = "Cytokine 2 of Controls at Condition 2")
-

-

The toy data frame has nine columns (id, groups, markers, and cytokines).

+

+

The toy data frame has nine columns (id, groups, markers, and +cytokines).

  utils::head(df_full)
## # A tibble: 6 × 9
 ##      id group condition    V1    V2    V3    V4    Z1      Z2
@@ -338,7 +582,11 @@ 

For two conditions

end_time <- Sys.time() # record end time total_time <- end_time - start_time # calculate duration of gating() example

-

The gating process took about 13.6 seconds on a Macbook Pro (4 variables, 2 gates, 2 cytokines, 400,000 observations). The corrected significance level in the first gate was . The histograms for the two cytokines are the same as above.

+

The gating process took about 14.9 seconds on a machine with the +features listed at the end of the vignette (4 variables, 2 gates, 2 +cytokines, 400,000 observations). The corrected significance level in +the first gate was . The histograms for the two cytokines are the same +as above.

# Plot of Cytokine 1
   graphics::par(mfrow = c(1, 2), pty = "s")
   graphics::plot(stats::density(out_gate$obs$Z1[out_gate$obs$group == "case"
@@ -366,8 +614,11 @@ 

For two conditions

main = "Cytokine 2 of controls\npost-gating", xlim = c(-5, 5), ylim = c(0, 0.5))
-

-

Compare histograms before and after gating. Gating reduced the overall sample size of observations from 400,000 (cases & controls and Condition 1 & Condition 2) to 73,316 observations (cases & controls and Condition 1 & Condition 2).

+

+

Compare histograms before and after gating. Gating reduced the +overall sample size of observations from 400,000 (cases & controls +and Condition 1 & Condition 2) to 73,316 observations (cases & +controls and Condition 1 & Condition 2).
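
As a quick arithmetic check on the counts reported above, the share of observations retained after gating can be computed directly. This is a small sketch using only the two counts from the rendered vignette; the variable names are illustrative, and the counts may differ in another run or with other settings.

```r
# Counts taken from the paragraph above
n_before <- 400000 # observations before gating
n_after <- 73316   # observations after gating

# Proportion and percentage of observations retained by the two gates
prop_retained <- n_after / n_before
pct_retained <- round(100 * prop_retained, digits = 1)
pct_retained
```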

# Plot of Cytokine 1
   graphics::par(mfrow = c(1, 2), pty = "s")
   graphics::plot(stats::density(df_full$Z1[df_full$group == "case"
@@ -398,9 +649,9 @@ 

For two conditions

col = "black", lty = 1, main = "Cytokine 2 of cases\npost-gating", - xlim = c(-5 ,5), + xlim = c(-5, 5), ylim = c(0, 0.5))
-

+

For one condition (using only Condition 1)

@@ -425,7 +676,11 @@

For one condition (using only Condition 1)

end_time <- Sys.time() # record end time total_time <- end_time - start_time # calculate duration of gating() example

-

The gating process took about 16.1 seconds on a Macbook Pro (4 variables, 2 gates, 2 cytokines, 200,000 observations). The corrected significance level in the first gate was . The histograms for the two cytokines are the same as above.

+

The gating process took about 18 seconds on a machine with the +features listed at the end of the vignette (4 variables, 2 gates, 2 +cytokines, 200,000 observations). The corrected significance level in +the first gate was . The histograms for the two cytokines are the same +as above.

# Plot of Cytokine 1
   graphics::par(mfrow = c(1, 2), pty = "s")
   graphics::plot(stats::density(out_gate$obs$Z1[out_gate$obs$group == "case"]),
@@ -450,8 +705,10 @@ 

For one condition (using only Condition 1)

main = "Cytokine 2 of controls\npost-gating", xlim = c(-5, 5), ylim = c(0, 0.5))
-

-

Compare histograms before and after gating. Gating reduced the overall sample size of observations from 200,000 (cases & controls) to 86,167 observations (cases & controls).

+

+

Compare histograms before and after gating. Gating reduced the +overall sample size of observations from 200,000 (cases & controls) +to 86,167 observations (cases & controls).

# Plot of Cytokine 1
   graphics::par(mfrow = c(1, 2), pty = "s")
   graphics::plot(stats::density(df_full$Z1[df_full$group == "case"]),
@@ -478,20 +735,75 @@ 

For one condition (using only Condition 1)

col = "black", lty = 1, main = "Cytokine 2 of cases\npost-gating", - xlim = c(-5 ,5), + xlim = c(-5, 5), ylim = c(0, 0.5))
-

+

Current limitations

    -
  1. Extracts observations at all significant clusters (either case or controls) and there is currently no functionality to select cells within a specific (set of) cluster(s) for the next gate.
  2. -
  3. Only two dimensions (i.e., markers) per gate because the spatial relative risk function is a two-dimensional spatial statistic.
  4. -
  5. Only two-group comparisons (e.g., case v. control) per gate because the spatial relative risk function is a ratio by nature.
  6. -
  7. Only comparisons of one condition or comparisons of two conditions are possible.
  8. -
  9. Large computational expense (i.e., run-time) to calculate the correlated Bonferroni correction.
  10. -
  11. A large sample size of observations (i.e., cells) may overload the gateR process. We are evaluating this potential limitation and developing a possible solution (e.g., randomly subsetting the data to estimate the clusters at each gate).
  12. +
  13. Extracts observations at all significant clusters (either +case or controls), and there is currently no functionality to select +cells within a specific (set of) cluster(s) for the next gate.
  14. +
  15. Only two dimensions (i.e., markers) per gate because the spatial +relative risk function is a two-dimensional spatial statistic.
  16. +
  17. Only two-group comparisons (e.g., case vs. control) per gate because +the spatial relative risk function is a ratio by nature.
  18. +
  19. Only comparisons of one condition or two conditions are +possible.
  20. +
  21. Large computational expense (i.e., run-time) to calculate the +correlated Bonferroni correction.
  22. +
  23. A large sample size of observations (i.e., cells) may overload the +gateR process. We are evaluating this potential limitation and +developing a possible solution (e.g., randomly subsetting the data to +estimate the clusters at each gate).
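
The random subsetting mentioned in the last limitation could look something like the sketch below. This is an assumption about one possible approach, not functionality provided by gateR: the toy data frame, the `n_per_group` value, and all variable names are hypothetical and use only base R.

```r
set.seed(1234) # for reproducibility

# Hypothetical toy data: 1,000 cells in two groups with two markers
toy <- data.frame(id = seq_len(1000),
                  group = factor(rep(c("case", "control"), each = 500)),
                  V1 = stats::rnorm(1000),
                  V2 = stats::rnorm(1000))

n_per_group <- 100 # assumed subsample size per group

# Draw row indices within each group so both groups stay represented
rows_by_group <- split(seq_len(nrow(toy)), toy$group)
keep <- unlist(lapply(rows_by_group, function(i) {
  i[sample.int(length(i), size = min(n_per_group, length(i)))]
}))
toy_sub <- toy[keep, ]

# Number of cells retained per group
table(toy_sub$group)
```

Sampling within each group keeps the case and control counts balanced in the subset, which matters because the relative risk surface compares the two group densities.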
+
sessionInfo()
+
## R version 4.2.1 (2022-06-23)
+## Platform: x86_64-apple-darwin17.0 (64-bit)
+## Running under: macOS Catalina 10.15.7
+## 
+## Matrix products: default
+## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
+## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
+## 
+## locale:
+## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
+## 
+## attached base packages:
+## [1] stats     graphics  grDevices utils     datasets  methods   base     
+## 
+## other attached packages:
+## [1] tibble_3.1.8 gateR_0.1.11
+## 
+## loaded via a namespace (and not attached):
+##  [1] viridis_0.6.2         sass_0.4.2            maps_3.4.0           
+##  [4] jsonlite_1.8.0        viridisLite_0.4.1     splines_4.2.1        
+##  [7] foreach_1.5.2         sparr_2.2-16          dotCall64_1.0-1      
+## [10] bslib_0.4.0           SpatialPack_0.4       highr_0.9            
+## [13] sp_1.5-0              spatstat.geom_2.4-0   yaml_2.3.5           
+## [16] pillar_1.8.1          lattice_0.20-45       glue_1.6.2           
+## [19] digest_0.6.29         polyclip_1.10-0       colorspace_2.0-3     
+## [22] htmltools_0.5.3       Matrix_1.4-1          spatstat.sparse_2.1-1
+## [25] pkgconfig_2.0.3       raster_3.5-21         misc3d_0.9-1         
+## [28] purrr_0.3.4           spatstat.core_2.4-4   scales_1.2.1         
+## [31] tensor_1.5            terra_1.5-21          spatstat.utils_2.3-1 
+## [34] mgcv_1.8-40           generics_0.1.3        ggplot2_3.3.6        
+## [37] spatstat.random_2.2-0 cachem_1.0.6          cli_3.3.0            
+## [40] magrittr_2.0.3        deldir_1.0-6          evaluate_0.15        
+## [43] fansi_1.0.3           doParallel_1.0.17     nlme_3.1-158         
+## [46] tools_4.2.1           lifecycle_1.0.1       stringr_1.4.0        
+## [49] munsell_0.5.0         compiler_4.2.1        jquerylib_0.1.4      
+## [52] rlang_1.0.4           grid_4.2.1            iterators_1.0.14     
+## [55] rstudioapi_0.13       goftest_1.2-3         spam_2.9-1           
+## [58] tcltk_4.2.1           rmarkdown_2.14        spatstat.linnet_2.3-2
+## [61] gtable_0.3.0          codetools_0.2-18      abind_1.4-5          
+## [64] R6_2.5.1              gridExtra_2.3         knitr_1.39           
+## [67] dplyr_1.0.9           fastmap_1.1.0         utf8_1.2.2           
+## [70] fastmatrix_0.4-124    stringi_1.7.8         spatstat.data_2.2-0  
+## [73] parallel_4.2.1        spatstat_2.3-4        Rcpp_1.0.9           
+## [76] fields_14.1           vctrs_0.4.1           rpart_4.1.16         
+## [79] tidyselect_1.1.2      xfun_0.31