feat: Salmon-tximport meta-wrapper pathvars update #4665
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,24 +1,24 @@ | ||
| rule salmon_decoy_sequences: | ||
| input: | ||
| transcriptome="resources/transcriptome.fasta", | ||
| genome="resources/genome.fasta", | ||
| transcriptome="<transcriptome_sequence>", | ||
| genome="<genome_sequence>", | ||
| output: | ||
| gentrome=temp("resources/gentrome.fasta"), | ||
| decoys=temp("resources/decoys.txt"), | ||
| gentrome=temp("<resources>/gentrome.fasta"), | ||
| decoys=temp("<resources>/decoys.txt"), | ||
| threads: 1 | ||
| log: | ||
| "decoys.log", | ||
| "<logs>/decoys.log", | ||
| wrapper: | ||
| "master/bio/salmon/decoys" | ||
|
|
||
|
|
||
| rule salmon_index_gentrome: | ||
| input: | ||
| sequences="resources/gentrome.fasta", | ||
| decoys="resources/decoys.txt", | ||
| sequences="<resources>/gentrome.fasta", | ||
| decoys="<resources>/decoys.txt", | ||
| output: | ||
| multiext( | ||
| "salmon/transcriptome_index/", | ||
| "<resources>/salmon_gentrome_index/", | ||
| "complete_ref_lens.bin", | ||
| "ctable.bin", | ||
| "ctg_offsets.bin", | ||
|
|
@@ -37,7 +37,7 @@ rule salmon_index_gentrome: | |
| ), | ||
| cache: True | ||
| log: | ||
| "logs/salmon/transcriptome_index.log", | ||
| "<logs>/salmon/gentrome_index.log", | ||
| threads: 2 | ||
| params: | ||
| # optional parameters | ||
|
|
@@ -48,9 +48,10 @@ rule salmon_index_gentrome: | |
|
|
||
| rule salmon_quant_reads: | ||
| input: | ||
| r="reads/{sample}.fastq.gz", | ||
| r1="<reads_r1>", | ||
| r2="<reads_r2>", | ||
| index=multiext( | ||
| "salmon/transcriptome_index/", | ||
| "<resources>/salmon_gentrome_index/", | ||
| "complete_ref_lens.bin", | ||
| "ctable.bin", | ||
| "ctg_offsets.bin", | ||
|
|
@@ -67,17 +68,17 @@ rule salmon_quant_reads: | |
| "seq.bin", | ||
| "versionInfo.json", | ||
| ), | ||
| gtf="resources/annotation.gtf", | ||
| gtf="<genome_annotation>", | ||
| output: | ||
| quant=temp("pseudo_mapping/{sample}/quant.sf"), | ||
| quant_gene=temp("pseudo_mapping/{sample}/quant.genes.sf"), | ||
| lib=temp("pseudo_mapping/{sample}/lib_format_counts.json"), | ||
| aux_info=temp(directory("pseudo_mapping/{sample}/aux_info")), | ||
| cmd_info=temp("pseudo_mapping/{sample}/cmd_info.json"), | ||
| libparams=temp(directory("pseudo_mapping/{sample}/libParams")), | ||
| logs=temp(directory("pseudo_mapping/{sample}/logs")), | ||
| quant=temp("<results>/pseudo_mapping/<per>/quant.sf"), | ||
| quant_gene=temp("<results>/pseudo_mapping/<per>/quant.genes.sf"), | ||
| lib=temp("<results>/pseudo_mapping/<per>/lib_format_counts.json"), | ||
| aux_info=temp(directory("<results>/pseudo_mapping/<per>/aux_info")), | ||
| cmd_info=temp("<results>/pseudo_mapping/<per>/cmd_info.json"), | ||
| libparams=temp(directory("<results>/pseudo_mapping/<per>/libParams")), | ||
| logs=temp(directory("<results>/pseudo_mapping/<per>/logs")), | ||
| log: | ||
| "logs/salmon/{sample}.log", | ||
| "<logs>/salmon/<per>.log", | ||
| params: | ||
| # optional parameters | ||
| libtype="A", | ||
|
|
@@ -90,28 +91,35 @@ rule salmon_quant_reads: | |
| rule tximport: | ||
| input: | ||
| quant=expand( | ||
| "pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/quant.sf", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| lib=expand( | ||
| "pseudo_mapping/{sample}/lib_format_counts.json", | ||
| sample=["S1", "S2", "S3", "S4"], | ||
| "<results>/pseudo_mapping/{sample}/lib_format_counts.json", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| aux_info=expand( | ||
| "pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/aux_info", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| cmd_info=expand( | ||
| "pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/cmd_info.json", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| libparams=expand( | ||
| "pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/libParams", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]), | ||
| tx_to_gene="resources/tx2gene.tsv", | ||
| logs=expand( | ||
| "<results>/pseudo_mapping/{sample}/logs", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| tx_to_gene="<tx_to_gene>", | ||
| output: | ||
| txi="tximport/SummarizedExperimentObject.RDS", | ||
| txi="<results>/tximport/SummarizedExperimentObject.RDS", | ||
| params: | ||
| extra="type='salmon'", | ||
| log: | ||
| "logs/tximport.log" | ||
| "<logs>/tximport.log", | ||
| wrapper: | ||
| "master/bio/tximport" | ||
|
Comment on lines 91 to 125
Critical: Hardcoded sample list prevents user configuration.
The `tximport` rule hardcodes the sample list (`["S1", "S2"]`) in every `expand()` call. This defeats the purpose of the pathvars system, as users must edit the wrapper code to use their own samples.
Suggested solution:
custom:
  samples: List of sample identifiers to process (e.g., ["sample1", "sample2"])
# Option 1: Use a placeholder (if Snakemake supports this for lists)
sample=<samples>,
# Option 2: Reference from config (more traditional approach)
sample=config.get("samples", ["S1", "S2"]),
pathvars:
  samples: ["S1", "S2"]
This would allow users to configure samples in their config file without touching the wrapper code. Do you want me to help generate a solution that makes the sample list configurable through pathvars or config? |
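Option 2 above can be sketched in plain Python to show the intended behaviour. This is a minimal illustration, not the wrapper's code: the helper name `expand_samples` is hypothetical, the `samples` config key is the one proposed in the comment (it does not exist in the PR), and the list comprehension only mimics what Snakemake's `expand()` does for a single `{sample}` wildcard.

```python
def expand_samples(pattern, config, default=("S1", "S2")):
    """Build one path per sample, reading the sample list from config.

    `config` stands in for Snakemake's global config dict; the "samples"
    key is the hypothetical setting proposed in the review comment.
    """
    samples = config.get("samples", list(default))
    if not samples:
        raise ValueError("the sample list must not be empty")
    # Fill the {sample} wildcard once per sample; <results> is left
    # untouched because it is a pathvar placeholder, not a wildcard.
    return [pattern.format(sample=sample) for sample in samples]


pattern = "<results>/pseudo_mapping/{sample}/quant.sf"

# No "samples" key: falls back to the list currently hardcoded in the rule.
default_paths = expand_samples(pattern, {})

# A user-supplied list replaces the default without editing the wrapper.
custom_paths = expand_samples(pattern, {"samples": ["ctrl", "treated"]})
```

With a fallback like this, the existing test setup (S1 and S2) would keep working while users gain a way to override the list from their own config.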
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| pathvars: | ||
| transcriptome_sequence: "resources/transcriptome.fasta" | ||
| genome_sequence: "resources/genome.fasta" | ||
| genome_annotation: "resources/annotation.gtf" | ||
| tx_to_gene: "resources/tx2gene.tsv" | ||
| per: "{sample}" | ||
| reads_r1: "reads/{sample}_R1.fq.gz" | ||
| reads_r2: "reads/{sample}_R2.fq.gz" | ||
| resources: "resources" |
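To make the role of this config concrete, the sketch below shows how `<name>` placeholders in the rules could be resolved against these pathvars. It only illustrates the substitution idea and is not Snakemake's actual implementation; the function name `resolve_pathvars` is hypothetical, and the `results` and `logs` values are assumed defaults, since the config shown here does not set them.

```python
import re


def resolve_pathvars(template, pathvars):
    """Replace each <name> placeholder with its configured value."""

    def substitute(match):
        name = match.group(1)
        if name not in pathvars:
            raise KeyError(f"no pathvar named {name!r}")
        return pathvars[name]

    return re.sub(r"<(\w+)>", substitute, template)


pathvars = {
    # Values taken from the test config in this PR.
    "per": "{sample}",
    "reads_r1": "reads/{sample}_R1.fq.gz",
    "resources": "resources",
    # Assumed for illustration only: not set in the config shown here.
    "results": "results",
    "logs": "logs",
}

quant = resolve_pathvars("<results>/pseudo_mapping/<per>/quant.sf", pathvars)
reads = resolve_pathvars("<reads_r1>", pathvars)
log = resolve_pathvars("<logs>/salmon/<per>.log", pathvars)
```

Under these assumptions, `<per>` expands to the `{sample}` wildcard, so the resolved paths still contain a wildcard for Snakemake to match per sample.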
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| chromosome1 927 13 79 80 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| >transcript1 | ||
| CCAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAAT | ||
| CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGG | ||
| >transcript2 | ||
| CATCTCCCTGAGTCGGTTTAAAGATTGTCTTGTATGCGTACTCTTGATAGGTAACCCG | ||
| CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGGATACCGATTGGACTCTTGAGT | ||
| ACCGGCCCTGT |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| transcript1 49 13 49 50 | ||
| transcript2 81 76 70 71 |
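The two lines above look like a FASTA index in the `samtools faidx` layout (name, sequence length, byte offset of the first base, bases per line, bytes per line), which is an inference, since the file names are not shown here. Assuming that layout, Unix line endings, and that the five-line version of the transcriptome in the preceding diff is the new one, the following sketch recomputes the index as a consistency check. The function name `fai_records` is hypothetical.

```python
def fai_records(fasta_text):
    """Return (name, length, offset, linebases, linewidth) per sequence.

    Assumes ASCII text, so character counts equal byte counts.
    """
    records = []
    offset = 0  # position of the start of the current line
    current = None
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if current:
                records.append(tuple(current))
            name = line[1:].split()[0]
            # Sequence data begins right after the header line.
            current = [name, 0, offset + len(line), 0, 0]
        elif current is not None and line.strip():
            bases = len(line.rstrip("\r\n"))
            current[1] += bases
            if current[3] == 0:  # the first sequence line fixes the geometry
                current[3] = bases
                current[4] = len(line)
        offset += len(line)
    if current:
        records.append(tuple(current))
    return records


# The updated transcriptome test file from the diff above.
fasta = (
    ">transcript1\n"
    "CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGG\n"
    ">transcript2\n"
    "CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGGATACCGATTGGACTCTTGAGT\n"
    "ACCGGCCCTGT\n"
)
records = fai_records(fasta)
```

Under those assumptions, `records` reproduces the two index lines added in this PR, so the index and the updated test sequences agree.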
🛠️ Refactor suggestion | 🟠 Major
Pathvars structure is well-defined, but sample list configuration is missing.
The pathvars section properly defines default directories and custom path mappings. However, there's no pathvar or configuration mechanism for users to specify the list of samples to process. Users would need to know to look at the `meta_wrapper.smk` file and modify the hardcoded sample list in the `tximport` rule (currently `["S1", "S2"]`), which is not user-friendly.
Consider adding a pathvar or configuration parameter for the sample list, such as:
samples: List of sample identifiers to process
This would allow users to specify their samples in the config file rather than modifying the wrapper code.
Hardcoded sample list in tximport rule prevents user configuration.
The `tximport` rule in `meta_wrapper.smk` (lines 108-127) hardcodes samples as `["S1", "S2"]` in all `expand()` calls. Users cannot customize the sample list without modifying the wrapper code itself, which defeats the purpose of configurable meta-wrappers.
The test config demonstrates pathvar configuration for file paths and patterns (including `per: "{sample}"`), but provides no mechanism to specify the actual list of samples to process.
Add a configuration parameter for samples (either as a pathvar or config section) to allow users to specify their samples, then update the `tximport` rule to read this from config rather than hardcoding.