Commit 331114a

add combat-seq method
1 parent 4e9ebb6 commit 331114a

3 files changed: +95 -0 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ A major update to the OpenProblems framework, switching from a Python-based fram
 
 * Added scGPT fine-tuned (PR #17).
 
+* Added ComBat-Seq method (PR #55).
+
 
 ## Major changes
 

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+__merge__: ../../api/comp_method.yaml
+name: combat_seq
+label: ComBat-Seq
+summary: Adjusting batch effects in RNA-Seq expression data using empirical Bayes
+  methods
+description: |
+  ComBat-Seq extends the ComBat method for batch correction in RNA-Seq data.
+  While ComBat assumes normally distributed data, ComBat-Seq uses a negative
+  binomial distribution to model the data. While initially developed for
+  RNA-Seq data, ComBat-Seq can be applied to single-cell RNA-Seq data as well.
+
+  The method is implemented in Python as a part of the inmoose package. It is
+  based on the original R implementation, distributed through the sva package.
+
+references:
+  doi:
+    - 10.1093/nargab/lqaa078
+    - 10.1186/s12859-023-05578-5
+
+links:
+  documentation: https://inmoose.readthedocs.io/en/stable/pycombatseq.html
+  repository: https://github.com/epigenelabs/inmoose
+
+# Metadata for your component
+info:
+  # Which normalisation method this component prefers to use (required).
+  preferred_normalization: counts
+
+# Resources required to run the component
+resources:
+  - type: python_script
+    path: script.py
+  - path: /src/utils/read_anndata_partial.py
+
+engines:
+  # Specifications for the Docker image for this component.
+  - type: docker
+    image: openproblems/base_python:1.0.0
+    # Add custom dependencies here (optional). For more information, see
+    # https://viash.io/reference/config/engines/docker/#setup .
+    setup:
+      - type: python
+        pip: inmoose
+
+runners:
+  # This platform allows running the component natively
+  - type: executable
+  # Allows turning the component into a Nextflow module / pipeline.
+  - type: nextflow
+    directives:
+      label: [midtime,midmem,midcpu]
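
The component description contrasts ComBat's normality assumption with the negative binomial model used by ComBat-Seq. As a rough illustration of why that distinction matters for count data, the sketch below assumes the usual mean/dispersion parameterisation, in which the variance is `mean + dispersion * mean**2`. The function names and example values are hypothetical and are not part of inmoose or of this component.

```python
def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count, assuming the mean/dispersion
    parameterisation: var = mean + dispersion * mean**2."""
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean**2


def overdispersion_ratio(mean: float, dispersion: float) -> float:
    """Ratio of the negative binomial variance to the Poisson variance.

    A Poisson count has variance equal to its mean, so a ratio of 1.0 means
    no overdispersion at all.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    return nb_variance(mean, dispersion) / mean


if __name__ == "__main__":
    # Illustrative values only: the same dispersion inflates the variance
    # much more strongly for highly expressed genes.
    for mean in (1.0, 10.0, 100.0):
        ratio = overdispersion_ratio(mean, dispersion=0.5)
        print(f"mean={mean:>6}: variance is {ratio:.1f}x the Poisson variance")
```

With zero dispersion the model reduces to the Poisson case. A variance that grows with the mean like this is not captured by a single Gaussian variance per gene.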

src/methods/combat-seq/script.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+import sys
+
+import anndata as ad
+import numpy as np
+from inmoose.pycombat import pycombat_seq
+from scipy.sparse import csr_matrix
+
+# VIASH START
+# Note: this section is auto-generated by viash at runtime. To edit it, make changes
+# in config.vsh.yaml and then run `viash config inject config.vsh.yaml`.
+par = {"input": "resources_test/.../input.h5ad", "output": "output.h5ad"}
+meta = {"name": "combat-seq"}
+# VIASH END
+
+sys.path.append(meta["resources_dir"])
+from read_anndata_partial import read_anndata
+
+print("Read input", flush=True)
+adata = read_anndata(
+    par["input"], X="layers/normalized", obs="obs", var="var", uns="uns"
+)
+
+print("Run ComBat-Seq", flush=True)
+counts = adata.T.to_df().astype(np.double).values  # genes x cells, as pycombat_seq expects
+corrected_counts = pycombat_seq(counts, adata.obs["batch"])
+
+print("Create output", flush=True)
+output = ad.AnnData(
+    obs=adata.obs[[]],
+    var=adata.var[[]],
+    uns={
+        "dataset_id": adata.uns["dataset_id"],
+        "normalization_id": adata.uns["normalization_id"],
+        "method_id": meta["name"],
+    },
+    layers={
+        "corrected_counts": csr_matrix(corrected_counts.T),
+    },
+)
+
+print("Write output", flush=True)
+output.write_h5ad(par["output"], compression="gzip")
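
The script builds a genes x cells matrix from the cells x genes AnnData object and transposes the corrected result back when storing it. That round trip can be sketched in isolation, with the corrector passed in as a callable so the orientation logic can be checked without inmoose installed. `correct_cells_by_genes` and `identity_corrector` are hypothetical helper names, and the assumption that the corrector takes a genes x samples matrix follows the inmoose documentation linked in the config.

```python
import numpy as np


def correct_cells_by_genes(cells_by_genes, batch, corrector):
    """Apply a genes x samples batch corrector to a cells x genes matrix.

    `corrector` is any callable taking (genes x samples matrix, batch labels)
    and returning a matrix of the same shape. inmoose's pycombat_seq is
    assumed to fit this interface.
    """
    cells_by_genes = np.asarray(cells_by_genes, dtype=np.double)
    batch = list(batch)
    if cells_by_genes.ndim != 2:
        raise ValueError("expected a 2-D matrix")
    if cells_by_genes.shape[0] != len(batch):
        raise ValueError("need exactly one batch label per cell (row)")

    # Transpose to the orientation the corrector expects.
    genes_by_cells = cells_by_genes.T
    corrected = np.asarray(corrector(genes_by_cells, batch))
    if corrected.shape != genes_by_cells.shape:
        raise ValueError("corrector changed the matrix shape")

    # Transpose back so rows are cells again, matching the AnnData layout.
    return corrected.T


def identity_corrector(genes_by_samples, batch):
    """Hypothetical stand-in for a real corrector: returns the input unchanged."""
    return genes_by_samples


if __name__ == "__main__":
    # Toy data: 4 cells x 3 genes, two batches of two cells each.
    toy = np.array([[1, 0, 3], [2, 5, 0], [0, 1, 4], [7, 2, 2]])
    out = correct_cells_by_genes(toy, ["a", "a", "b", "b"], identity_corrector)
    print(out.shape)
```

Passing the matrix in the wrong orientation may not fail loudly when the number of genes happens to equal the number of cells, so the explicit label-count check is a cheap guard rather than a complete one.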
