Mmuphin wrapper #6584

Open
wants to merge 33 commits into base: main
Changes from 27 commits
298cadd
mmuphin wrapper draft with errors
renu-pal Nov 22, 2024
ef26199
Update .shed.yml
renu-pal Nov 22, 2024
801cc88
removing long description from .shed.yml
renu-pal Nov 25, 2024
f0ca96f
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
c19da4d
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
52076ec
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
bab0879
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
7c09803
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
f340c26
Update tools/mmuphin/mmuphin.xml
renu-pal Nov 25, 2024
4ee8d65
reducing CRC_abd file size and adding adjust_batch.R file
renu-pal Nov 25, 2024
e861c0f
adding long description into .shed.yml due to linting issue
renu-pal Nov 25, 2024
f65885e
Update .shed.yml
renu-pal Nov 25, 2024
7fafc59
Update .shed.yml
renu-pal Nov 25, 2024
f79c79f
update
paulzierep Nov 28, 2024
1b335f1
update
paulzierep Nov 28, 2024
d86e487
rm unneeded requs
paulzierep Nov 28, 2024
47d426d
Merge pull request #4 from paulzierep/mmuphin_wrapper
renu-pal Nov 29, 2024
f815e86
changing batch value in test, as first column header is null
renu-pal Dec 3, 2024
f36aa66
removing control_output from test
renu-pal Dec 3, 2024
090b476
reducing file size
renu-pal Dec 3, 2024
c5af559
Update mmuphin.xml
renu-pal Dec 10, 2024
e60c159
fixed tests
paulzierep Dec 13, 2024
9c404da
Merge pull request #5 from paulzierep/mmuphin_wrapper
renu-pal Dec 19, 2024
7c2d2d7
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 12, 2025
c1b167c
getting column names in R directly
renu-pal Jan 13, 2025
550cd60
Apply suggestions from code review
bgruening Jan 13, 2025
8d38c99
removed unnecessary commented code
renu-pal Jan 13, 2025
b6e9c39
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 14, 2025
5109992
improving help section
renu-pal Jan 14, 2025
0c902c3
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 14, 2025
19a33c0
removing additional options
renu-pal Jan 15, 2025
663287a
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 16, 2025
a139a38
adding test with covariate=null and few other updates
renu-pal Jan 16, 2025
14 changes: 14 additions & 0 deletions tools/mmuphin/.shed.yml
@@ -0,0 +1,14 @@
name: mmuphin
owner: iuc
description: "MMUPHin is an R package implementing meta-analysis methods for microbial community profiles"
long_description: |
MMUPHin enables the normalization and combination of multiple microbial community studies. It can then help in identifying microbes, genes, or pathways that are differential with respect to combined phenotypes.
Finally, it can find clusters or gradients of sample types that reproduce consistently among studies.
homepage_url: https://huttenhower.sph.harvard.edu/mmuphin
remote_repository_url: https://github.com/biobakery/MMUPHin
type: unrestricted
categories:
- Metagenomics
auto_tool_repositories:
name_template: "{{ tool_id }}"
description_template: "Wrapper for the mmuphin adjust_batch function: {{ tool_name }}"
24 changes: 24 additions & 0 deletions tools/mmuphin/macros.xml
@@ -0,0 +1,24 @@
<?xml version="1.0"?>
<macros>
<token name="@TOOL_VERSION@">1.16.0</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">23.2</token>

<xml name="xrefs">
<xrefs>
<xref type="bio.tools">mmuphin</xref>
<xref type="bioconductor">mmuphin</xref>

</xrefs>
</xml>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">bioconductor-mmuphin</requirement>
</requirements>
</xml>
<xml name="citations">
<citations>
<citation type="doi">10.18129/B9.bioc.MMUPHin</citation>
</citations>
</xml>
</macros>
135 changes: 135 additions & 0 deletions tools/mmuphin/mmuphin.xml
@@ -0,0 +1,135 @@
<tool id="mmuphin" name="mmuphin" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Performing meta-analyses of microbiome studies</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="xrefs"/>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
Rscript '$rscript' &&
mv 'adjust_batch_diagnostic.pdf' '$diagnostic_plot_output'
]]></command>

<configfiles>
<configfile name="rscript"><![CDATA[

library(MMUPHin)

## input files
print("Read input files")
data <- read.csv("$input_data", sep = "\t", row.names=1, check.names = FALSE)
meta_data <- read.csv("$input_metadata", sep = "\t", row.names=1, check.names = FALSE)

# Define control list
controls <- list("$additional_options.zero_inflation",
"$additional_options.pseudo_count",
"$additional_options.conv",
"$additional_options.maxit",
"$additional_options.verbose",
"$additional_options.diagnostic_plot")

#Perform batch adjustment

batch <- colnames(meta_data[$batch_input-1])
cat("Batch Name:", batch, "\n")
covariates_val <- list($covariates_input)
Contributor

Covariates can be empty (only correct for batch). To handle the Python None in R, this should work:

        # Ensure covariates_input is checked for NULL
        if (!exists("$covariates_input") || is.null($covariates_input)) {
            covariates_val <- list()  # Assign an empty list if input is NULL or does not exist
        } else {
            covariates_val <- list($covariates_input)
        }

        covariates <- c()

        # Process covariates only if they exist
        if (length(covariates_val) > 0) {     
            for (i in covariates_val) {
                covariates <- c(covariates, colnames(meta_data[i - 1]))
            }
            cat("Covariates Names:", covariates, "\n")
        } else {
            cat("No covariates provided.\n")
        }
        

Contributor Author

renu-pal Jan 16, 2025

@paulzierep this code does not seem to work when I set the covariates input to null. I keep getting the error: object 'None' not found.
PS: I removed !exists("$covariates_input") from the if condition, since it made the if branch run even when the covariates were not empty. I believe it only tests whether an object named after the covariates input exists in the current environment, which kept returning FALSE.
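
One possible way around the 'None' error, sketched here under assumptions rather than taken from the PR: an unset optional parameter is rendered by the template as the bare word None, which R then tries to evaluate as an object. If the placeholder is wrapped in quotes, R receives a string it can inspect instead. The helper name `parse_columns` is hypothetical, and the comma-separated form of the rendered value for multiple columns is an assumption.

```r
# Hypothetical helper (not part of the PR): turn the rendered value of an
# optional, multi-select data_column parameter into integer column indices.
# In the configfile it would be called as parse_columns("$covariates_input");
# the quotes make sure R sees a string even when the template renders None.
parse_columns <- function(raw) {
  raw <- trimws(raw)
  # An unset optional parameter is assumed to arrive as "None" or "".
  if (raw == "" || raw == "None") {
    return(integer(0))
  }
  # Assumed format for several selected columns: "4,7"
  as.integer(strsplit(raw, ",", fixed = TRUE)[[1]])
}

no_covariates <- parse_columns("None")
two_covariates <- parse_columns("4,7")
cat("Selected columns:", two_covariates, "\n")
```

With this shape, the downstream check `length(covariates_val) > 0` can stay as it is, because an empty selection becomes a zero-length vector rather than an unresolved symbol.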

Contributor Author

I have pushed the other updates you asked for below, and I am working on a fix for this null issue.
a139a38

# Default to no covariates so adjust_batch still runs when none are selected
covariates <- NULL
if (length(covariates_val) > 0) {
for(i in covariates_val){
covariates <- c(covariates,colnames(meta_data[i-1]))
}
cat("Covariates Names:", covariates, "\n")
}


result <- adjust_batch(feature_abd = data,
batch = batch,
covariates = covariates,
data = meta_data,
control=controls
)

# Save results into output files

write.table(result\$feature_abd_adj,file="$output",quote = FALSE, sep="\t")
]]></configfile>
</configfiles>



Member

Can you please clean those things up? Thanks.

<inputs>
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
<param argument="batch_input" type="data_column" data_ref="input_metadata" use_header_names="true" label="batch" />
Member

Can you please improve all labels and help text? They are not very user-friendly IMHO.

What does a metadata file need to look like? Or the feature file? "batch"? Maybe "the column in which the batch identifier is specified"?
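
To make the expected layout concrete, here is a minimal sketch of how the wrapper's read step treats a metadata file. The table contents, the column names and the `galaxy_index` variable are made up for illustration; only the `read.csv` arguments come from the script in this PR.

```r
# Tiny made-up metadata table: the first column holds sample IDs and every
# other column is a per-sample variable (batch identifier, covariates, ...).
meta_text <- "sample\tstudy\tage\nS1\tstudyA\t30\nS2\tstudyB\t41\n"
meta_data <- read.csv(text = meta_text, sep = "\t", row.names = 1,
                      check.names = FALSE)

# The data_column parameter yields a 1-based column number for the file as
# uploaded. Because row.names = 1 consumes the first column, the matching
# data frame column sits one position lower.
galaxy_index <- 2L
batch <- colnames(meta_data)[galaxy_index - 1L]
cat("Batch column:", batch, "\n")
```

This is why the script subtracts one from the selected column number before looking up its name.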

Contributor Author

@bgruening does this work?
5109992

<param argument="covariates_input" type="data_column" data_ref="input_metadata" use_header_names="true" multiple="true" optional="true" label="covariates" />
<section name="additional_options" title="Additional Options" expanded="true">
<param argument="zero_inflation" type="boolean" truevalue="zero_inflation TRUE" falsevalue="zero_inflation FALSE" checked="true" label="Run zero-inflated model"/>
<param argument="pseudo_count" type="float" optional="true" label="Pseudo count" help="Pseudo count to add to feature_abd before the method's log transformation. Defaults to NULL, in which case it is set to half of the minimal non-zero value in feature_abd"/>
<param argument="conv" type="float" value="0.0001" optional="true" label="Convergence threshold" help="Convergence threshold for the method's iterative algorithm for shrinking batch effect parameters"/>
<param argument="maxit" type="integer" value="1000" optional="true" label="Maximum number of iterations" help="Maximum number of iterations allowed for the method's iterative algorithm. Defaults to 1000"/>
<param argument="verbose" type="boolean" truevalue="verbose TRUE" falsevalue="verbose FALSE" checked="true" label="Print verbose information"/>
Member

we usually don't expose those parameters to the user

Contributor Author

@bgruening, so should I remove them?

Member

Yes, and set a useful default.
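
A rough sketch of what hardcoded defaults could look like in the R script, assuming the exposed parameters are dropped as suggested. The option names are the ones the wrapper already passes to adjust_batch, and the values mirror the defaults in this PR's parameter definitions. Treating `diagnostic_plot` as a file name (rather than a boolean) is an assumption based on the file that the command section moves afterwards. This is not the code that was eventually committed.

```r
# Fixed control list (illustrative). It is built as a named list so that
# each value can be matched to its option by name. pseudo_count is left out
# on purpose, so the package default described in the help text applies.
controls <- list(
  zero_inflation  = TRUE,
  conv            = 1e-4,
  maxit           = 1000L,
  verbose         = TRUE,
  diagnostic_plot = "adjust_batch_diagnostic.pdf"  # assumed to be a file name
)

# Print the settings so they show up in the job log.
str(controls)
```

Keeping the values in one place inside the script means the tool form only has to ask for the data, the batch column and the optional covariates.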

Contributor Author

Hi @bgruening, I have made the required changes. Does this work?
19a33c0

<param argument="diagnostic_plot" type="boolean" truevalue="diagnostic_plot TRUE" falsevalue="diagnostic_plot FALSE" checked="true" label="Generate diagnostic figure file, default: adjust_batch_diagnostic.pdf"/>
</section>
</inputs>


<outputs>
<data name="output" format="tabular" label="Adjusted abundance table"/>
<data name="diagnostic_plot_output" format="pdf" label="diagnostic figure file"/>
</outputs>

<tests>
<test expect_num_outputs="2">
Contributor

Can you add a test without covariates?

<param name="input_data" value="CRC_abd.tsv"/>
<param name="input_metadata" value="CRC_meta.tsv"/>
<param name="batch_input" value="29"/>
<param name="covariates_input" value="4"/>
<section name="additional_options">
<param name="zero_inflation" value="TRUE"/>
<param name="pseudo_count" value="3"/>
<param name="conv" value="0.0001"/>
<param name="maxit" value="1000"/>
<param name="verbose" value="TRUE"/>
<param name="diagnostic_plot" value="TRUE"/>
</section>
<output name="output" file="CRC_abd_corrected.tsv" ftype="tabular"/>
<output name="diagnostic_plot_output" file="diagnostic.pdf" ftype="pdf"/>

</test>
</tests>
<help><![CDATA[
@HELP_HEADER@
MMUPHin
=========
MMUPHin is an R package implementing meta-analysis methods for microbial community profiles. It has interfaces for:

a) Performing batch (study) effect adjustment with adjust_batch :
------------------------------------------------------------------
It aims to correct for technical batch effects in microbial feature abundances. Batch effects refer to variations in data that arise not from the biological or experimental variables of interest but due to differences in technical or procedural factors during data collection or processing. For example:

Different equipment or lab environments.
Different operators handling the experiment.
Variations in sample preparation, sequencing runs, or platforms.

These unwanted variations can obscure true biological signals and introduce bias, making it critical to adjust for batch effects to ensure accurate and comparable results across datasets.

The function adjust_batch in the MMUPHin package is designed to correct batch effects in microbiome data.

Inputs:
=======
A feature-by-sample abundance matrix (e.g., microbial abundances).
A metadata file, which contains information about samples, including batch identifiers and optional covariates.

Output:
=======
A batch-adjusted abundance matrix for downstream analyses.

b) meta-analytic differential abundance testing
c) meta-analytic discovery of discrete (cluster-based) or continuous unsupervised population structure.

Meta-analysis methods are statistical techniques used to combine and synthesize data from multiple independent studies, typically to derive a more precise or generalizable conclusion. This approach is commonly used in fields such as medicine, psychology, and biology to aggregate research findings and increase the statistical power of analyses by pooling data from different experiments or studies.


]]></help>
<expand macro="citations"/>
</tool>
23 changes: 23 additions & 0 deletions tools/mmuphin/test-data/CRC_abd.tsv

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions tools/mmuphin/test-data/CRC_abd_corrected.tsv

Large diffs are not rendered by default.
