Skip to content

Conversation

@jlanga
Copy link
Contributor

@jlanga jlanga commented Dec 3, 2024

This PR shamelessly copies functionalities from bwa-mem2 to sort or not, do it by coordinate or queryname, and choose between samtools and picard to do so

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced configurability for the Bowtie2 alignment tool with new sorting parameters.
    • Added support for additional font packages to improve visual output.
    • Introduced dynamic rule generation for various alignment configurations in the testing framework.
    • Updated dependencies for Bowtie2 and added new packages for improved functionality, including graphics and font libraries.
    • Added a new author to the Bowtie2 package and clarified functionality in the documentation.
    • New entry added to the genome index file for improved sequence tracking.
  • Bug Fixes

    • Improved error handling for missing Bowtie2 index files and refined input validation.
  • Tests

    • Expanded test coverage for Bowtie2 alignment and Sourmash tools to ensure robustness across multiple configurations.
    • Added new test functions to validate command parameters and output paths for Bowtie2 alignments.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2024

📝 Walkthrough

Walkthrough

This pull request introduces significant updates to the Bowtie2 alignment tool's environment configuration and functionality. Key changes include modifications to the Conda environment files (environment.linux-64.pin.txt, environment.yaml), enhancements in the meta.yaml file for parameter configurability, and the addition of dynamic rules in the Snakefile for alignment processing. Furthermore, the wrapper.py file has been updated to improve error handling and command execution. New tests have been added to test_wrappers.py to ensure comprehensive coverage of the new features and configurations.

Changes

File Path Change Summary
bio/bowtie2/align/environment.linux-64.pin.txt - Added new Conda version header.
- Removed several package URLs.
- Added multiple new packages including fonts and libraries.
- Updated existing package versions.
bio/bowtie2/align/environment.yaml - Updated bowtie2 version from 2.5.4 to 2.5.
- Added dependency: picard-slim =3.3.
- Updated snakemake-wrapper-utils from 0.6.2 to 0.6.
bio/bowtie2/align/meta.yaml - Added author: Jorge Langa.
- Introduced parameters: sort_program, sort_extra, sort_order.
- Updated notes section.
bio/bowtie2/align/test/Snakefile - Introduced dynamic rules for alignment based on sorting programs, orders, and file extensions.
bio/bowtie2/align/wrapper.py - Added error handling for missing Bowtie2 index files.
- Refined input sample validation.
- Expanded sorting program options.
- Updated command construction and logging.
test_wrappers.py - Added test function: test_bowtie2_align.
- Added test function: test_sourmash_compute.
- Minor formatting adjustments in existing tests.

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • fgvieira

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3b96cc1 and 9d58e61.

📒 Files selected for processing (6)
  • bio/bowtie2/align/environment.linux-64.pin.txt (1 hunks)
  • bio/bowtie2/align/environment.yaml (1 hunks)
  • bio/bowtie2/align/meta.yaml (1 hunks)
  • bio/bowtie2/align/test/Snakefile (1 hunks)
  • bio/bowtie2/align/wrapper.py (3 hunks)
  • test_wrappers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

test_wrappers.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


103-103: Undefined name sort

(F821)


124-124: Undefined name snakemake

(F821)


125-125: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


128-128: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


168-168: Undefined name snakemake

(F821)

🔇 Additional comments (5)
bio/bowtie2/align/environment.yaml (2)

Line range hint 1-4: LGTM! Well-configured channels and version pinning strategy.

The channel configuration with conda-forge and bioconda, along with nodefaults, ensures reliable and reproducible package resolution. The explicit version pinning for all dependencies is a good practice for reproducibility.


7-7: Verify picard-slim version.

Let's verify that picard-slim=3.3.0 is the latest stable version and check for any known issues.

✅ Verification successful

picard-slim version 3.3.0 is verified as the latest stable version

The verification confirms that:

  • 3.3.0 is currently the latest version available in the bioconda channel
  • Recent issues reported in the Picard repository are minor and don't indicate any critical problems with version 3.3.0
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check latest version and potential issues for picard-slim

# Get latest version from conda-forge
curl -s https://conda.anaconda.org/bioconda/noarch/repodata.json | jq -r '.packages | to_entries[] | select(.value.name=="picard-slim") | .value.version' | sort -V | tail -n 1

# Check for any reported issues in the past 6 months
gh api graphql -f query='
{
  search(query: "repo:broadinstitute/picard is:issue created:>2024-06-01", type: ISSUE, first: 5) {
    nodes {
      ... on Issue {
        title
        url
        labels(first: 5) {
          nodes {
            name
          }
        }
      }
    }
  }
}'

Length of output: 1316

test_wrappers.py (1)

2064-2079: Test function test_bowtie2_align_samtools_coordinate_extension is well-implemented

The new test function correctly iterates over all combinations of sorting programs, orders, and file extensions. The implementation is clean, and the function adheres to best practices for test definitions.

bio/bowtie2/align/environment.linux-64.pin.txt (2)

1-5: LGTM! Good practice adding Conda version.

The addition of the Conda version header improves reproducibility by documenting the exact version used to create this environment.


75-89: Verify version compatibility between tools.

The selected versions are recent and stable:

  • bowtie2 2.5.4
  • samtools 1.21
  • picard-slim 3.3.0

Good choice using picard-slim to reduce environment size. However, let's verify compatibility between these versions.

✅ Verification successful

Tools are compatible and consistently used across the repository

Based on the environment files analysis:

  • The selected versions (bowtie2 2.5.4, samtools 1.21, picard-slim 3.3.0) are consistently used together across multiple workflows in the repository
  • These versions are the current standard in the codebase, showing they are well-tested together
  • The Picard 3.3.0 release notes from earlier show no compatibility issues with samtools and includes htsjdk 4.1.3 which is compatible with samtools 1.21
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for known compatibility issues between the selected versions

# Check release notes and issues for compatibility information
gh api graphql -f query='
{
  repository(owner: "broadinstitute", name: "picard") {
    release(tagName: "3.3.0") {
      description
    }
  }
}
'

# Check if there are any reported issues between these specific versions
gh api graphql -f query='
{
  search(query: "repo:samtools/samtools repo:broadinstitute/picard is:issue samtools 1.21 picard 3.3.0", type: ISSUE) {
    nodes {
      ... on Issue {
        title
        url
        state
      }
    }
  }
}
'

Length of output: 2460


Script:

#!/bin/bash
# Let's check for any compatibility requirements or issues in the wrapper script and workflow

# Check wrapper script for version requirements
fd "bowtie2.*\.py" --exec cat {}

# Check workflow for version requirements
fd "bowtie2.*\.smk|bowtie2.*\.snakefile" -I --exec cat {}

# Check if there are any version constraints in environment files
fd "environment.*\.ya?ml" --exec grep -A5 -B5 "bowtie2\|samtools\|picard" {}

# Look for any version-specific code or comments
rg -A3 -B3 "version|Version" --glob "bowtie2*"

Length of output: 8907

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
bio/bowtie2/align/environment.linux-64.pin.txt (1)

16-55: Consider optimizing system library dependencies.

The environment includes a comprehensive set of system libraries. While these are likely pulled in as dependencies, consider:

  1. Evaluating if all these libraries are necessary for the core functionality
  2. Testing if any can be marked as optional dependencies
bio/bowtie2/align/wrapper.py (2)

42-45: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

171-179: Consider resource management optimization

The wrapper combines alignment and sorting in a single step using pipes. While this is memory-efficient, it could benefit from some resource management considerations:

  1. For large datasets, consider adding a parameter to control the memory usage of sorting tools (e.g., --memory for samtools, MAX_RECORDS_IN_RAM for Picard)
  2. Consider adding progress reporting for long-running operations

Would you like help implementing these resource management optimizations?

bio/bowtie2/align/meta.yaml (1)

Line range hint 29-31: Enhance documentation for new sorting parameters

The notes section should document the new sorting functionality and parameters.

Add documentation for the new parameters:

  * The `extra` param allows for additional arguments for bowtie2.
+ * The `sort_program` param enables output sorting using either 'samtools' or 'picard'. Default is 'none'.
+ * The `sort_extra` param allows passing additional arguments to the selected sorting program.
+ * The `sort_order` param specifies the sorting order: 'coordinate' (default) or 'queryname'.
  * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.
bio/bowtie2/align/test/Snakefile (2)

140-140: Consider dynamic thread allocation for sorting

The current fixed thread count might not be optimal for all sorting operations. Consider adjusting threads based on the sorting program.

-               threads: 8  # Use at least two threads
+               threads: lambda wildcards, input: 16 if wildcards.program != "none" else 8  # Increase threads for sorting operations

138-139: Enhance log file structure

Consider organizing log files in subdirectories based on sorting configuration for better maintainability.

-               log:
-                   f"logs/bowtie2/{{sample}}.{program}_{order}_{extension}.log",
+               log:
+                   f"logs/bowtie2/{program}/{order}/{{sample}}.{extension}.log",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d58e61 and 734b1bb.

📒 Files selected for processing (5)
  • bio/bowtie2/align/environment.linux-64.pin.txt (1 hunks)
  • bio/bowtie2/align/environment.yaml (1 hunks)
  • bio/bowtie2/align/meta.yaml (2 hunks)
  • bio/bowtie2/align/test/Snakefile (1 hunks)
  • bio/bowtie2/align/wrapper.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bio/bowtie2/align/environment.yaml
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


124-124: Undefined name snakemake

(F821)


125-125: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


128-128: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


168-168: Undefined name snakemake

(F821)

🔇 Additional comments (7)
bio/bowtie2/align/environment.linux-64.pin.txt (2)

8-11: Consider optimizing dependencies for CLI usage.

The environment includes several GUI-related packages that may be unnecessary for a command-line bioinformatics tool:

  • Font packages (dejavu, inconsolata, ubuntu, etc.)
  • X11 libraries (xorg-*)
  • Graphics libraries (cairo, pixman)

These dependencies could increase the environment size and installation time without providing value for the core functionality.

Also applies to: 56-78


79-79: Verify compatibility between bioinformatics tools.

The core bioinformatics tools are present with appropriate versions:

  • bowtie2 2.5.4
  • samtools 1.21
  • picard-slim 3.3.0

Please verify that these specific versions are compatible with each other for the sorting functionality, particularly when using Picard for query name sorting.

Consider documenting the version compatibility matrix in the wrapper's documentation to help users understand which versions work well together.

Also applies to: 89-89, 92-92, 93-93

✅ Verification successful

Based on the search results, I can now generate a final response:

The specified versions of bowtie2, samtools, and picard are compatible and commonly used together

The codebase shows extensive integration testing and usage of these tools together:

  • Multiple test files demonstrate bowtie2 2.5.x working with samtools 1.x and picard 3.x
  • No compatibility issues or errors are documented between these versions
  • The tools are frequently used together in various workflows and pipelines
  • The version combinations are actively tested in the CI pipeline
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify compatibility between tool versions

# Check if there are any known compatibility issues between these versions
gh api graphql -f query='
{
  search(query: "repo:broadinstitute/picard is:issue \"bowtie2 2.5.4\" OR \"samtools 1.21\"", type: ISSUE) {
    nodes {
      ... on Issue {
        title
        url
        state
      }
    }
  }
}'

# Check if there are any reported issues with these specific versions
rg -l "compatibility|issue|error" | xargs rg -A 5 "bowtie2.*2\.5\.4|samtools.*1\.21|picard.*3\.3\.0"

Length of output: 636


Script:

#!/bin/bash
# Let's try a different approach to verify compatibility

# 1. Check for any configuration files or workflow files that might show these tools working together
fd -e yaml -e yml -e config -e json | xargs rg -l "bowtie2|samtools|picard"

# 2. Look for any documented version constraints or requirements
fd "README|CHANGELOG|requirements" | xargs rg -i "version|compatibility|require"

# 3. Check for any test files that might use these tools together
fd -e test -e spec | xargs rg -l "bowtie2|samtools|picard"

# 4. Look for workflow definitions that might show tool interactions
fd "workflow|pipeline|snakefile|nextflow" -i | xargs rg -l "bowtie2|samtools|picard"

# 5. Check if there are any documented issues in issue templates or docs
fd "ISSUE_TEMPLATE|docs" -i | xargs rg -i "compatibility|version"

Length of output: 34600

bio/bowtie2/align/wrapper.py (2)

103-103: ⚠️ Potential issue

Fix undefined variable sort in error message

The error message uses undefined variable sort instead of sort_program.

-    raise ValueError(f"Unexpected value for params.sort ({sort})")
+    raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")

86-91: ⚠️ Potential issue

Clarify thread allocation error message for Picard

The error message for Picard thread allocation is unclear. It should explain that Picard requires 2 threads minimum - one for bowtie2 and one for Picard.

     bowtie2_threads = bowtie2_threads - 1
     if bowtie2_threads <= 0:
         raise ValueError(
-            "Not enough threads requested. This wrapper requires exactly one more."
+            "Not enough threads requested. When using Picard, at least 2 threads are required (1 for bowtie2, 1 for Picard)."
         )
bio/bowtie2/align/meta.yaml (2)

8-8: LGTM: Author addition

Addition of Jorge Langa as an author is appropriate given the contributions.


25-27: LGTM: New sorting parameters

The new parameters align well with the PR objectives:

  • sort_program: Enables sorting with samtools or picard
  • sort_extra: Allows customization of sorting behavior
  • sort_order: Provides coordinate/queryname sorting options
bio/bowtie2/align/test/Snakefile (1)

106-109: LGTM: Comprehensive dynamic rule generation

The nested loops effectively generate rules for all possible combinations of sorting configurations.

Verify the generated rule combinations:

✅ Verification successful

LGTM: Dynamic rule generation is correctly implemented

The nested loops generate 18 distinct rules (2 orders × 3 extensions × 3 programs) with proper:

  • Naming convention: bowtie2_align_{program}_{order}_{extension}
  • Required inputs including reference genome for CRAM output
  • Appropriate parameters for sorting configuration
  • Structured log file naming that reflects the configuration

The implementation ensures comprehensive coverage of all sorting configurations while maintaining consistent rule structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all expected rule combinations are generated

# List all rules in the Snakefile
snakemake --list-rules 2>/dev/null | grep "bowtie2_align_" | sort

Length of output: 53


Script:

#!/bin/bash
# Let's examine the Snakefile content directly to verify the rule generation
cat bio/bowtie2/align/test/Snakefile

Length of output: 4074


Script:

#!/bin/bash
# Search for rule definitions in the Snakefile
rg "rule bowtie2_align_" bio/bowtie2/align/test/Snakefile -A 5

Length of output: 64


Script:

#!/bin/bash
# Check the full context around the loops
rg -B 5 -A 10 'for order in \["coordinate", "queryname"\]' bio/bowtie2/align/test/Snakefile

Length of output: 614

Comment on lines 124 to 126
ref="genome.fasta",
output:
f"mapped_idx/{{sample}}.{program}_{order}.{extension}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the analysis of both wrapper.py and the test Snakefile, I can now provide the final response:

CRAM output handling needs improvement

The wrapper doesn't explicitly validate the presence of the reference genome for CRAM output. While the test Snakefile correctly includes ref="genome.fasta" in CRAM-related rules, the wrapper.py should:

  • Validate reference presence when output is CRAM format
  • Properly pass the reference to samtools for CRAM generation

Specific issues:

  • In wrapper.py, line 54-55: The CRAM conversion lacks -T reference parameter in the samtools view command
  • No validation check for reference input when CRAM output is detected
🔗 Analysis chain

Verify CRAM output handling

The reference genome is required for CRAM output. Ensure proper error handling when ref is missing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if wrapper.py handles missing reference for CRAM output

# Search for CRAM-related error handling
rg -A 5 "\.cram" bio/bowtie2/align/wrapper.py

Length of output: 47


Script:

#!/bin/bash
# Let's check the wrapper.py content to understand how output formats are handled
cat bio/bowtie2/align/wrapper.py

Length of output: 5582


Script:

#!/bin/bash
# Let's check if there's any reference handling for CRAM output in the Snakefile
cat bio/bowtie2/align/test/Snakefile

Length of output: 4074

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
bio/bowtie2/align/wrapper.py (1)

41-47: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"
🧰 Tools
🪛 Ruff (0.8.0)

41-41: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 734b1bb and eac0db4.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


67-67: Undefined name snakemake

(F821)


126-126: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


129-129: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


134-134: Undefined name snakemake

(F821)


170-170: Undefined name snakemake

(F821)

🔇 Additional comments (6)
bio/bowtie2/align/wrapper.py (6)

1-12: LGTM: Appropriate imports for the new functionality

The imports are well-organized and include necessary modules for the new sorting capabilities.


13-32: LGTM: Well-structured initialization with proper thread validation

The initialization code properly defines required indices and handles thread allocation with appropriate validation.

🧰 Tools
🪛 Ruff (0.8.0)

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


50-59: LGTM: Comprehensive input validation

The input validation properly checks for valid sample counts and sort order values.

🧰 Tools
🪛 Ruff (0.8.0)

51-51: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


88-93: Clarify Picard thread requirement message

The error message doesn't clearly explain that Picard requires at least 2 threads total, with 1 thread reserved for Picard.

-            "Not enough threads requested. This wrapper requires exactly one more."
+            "Not enough threads requested. Picard sorting requires at least 2 threads (1 for Bowtie2, 1 for Picard)."

104-105: Fix variable name in error message

The error message uses sort instead of sort_program.

-    raise ValueError(f"Unexpected value for params.sort ({sort_program})")
+    raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")

173-181: LGTM: Clean command construction with proper temporary directory handling

The execution logic properly handles temporary directories and constructs the command with all necessary parameters.

- bowtie2 =2.5.4
- picard-slim =3.3.0
- bowtie2 =2.5
- picard-slim =3.3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- picard-slim =3.3
- picard-slim =3.3.0

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

14-23: Add edge case handling to get_extension function

The function should handle edge cases such as empty filenames or files without extensions to prevent potential runtime errors.

Consider this improved implementation:

 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
+    if not filename:
+        raise ValueError("Empty filename")
+    parts = filename.split(".")
+    if len(parts) < 2:
+        raise ValueError(f"No extension found in filename: {filename}")
     if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
+        return parts[-2].lower() if len(parts) > 2 else ""
-    return filename.split(".")[-1].lower()
+    return parts[-1].lower()

45-47: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

155-183: Consider using constants for repeated string literals

The code uses string literals like "none", "samtools", "picard" multiple times. Consider defining these as constants at the module level to improve maintainability.

+# Constants for sort programs
+SORT_PROGRAM_NONE = "none"
+SORT_PROGRAM_SAMTOOLS = "samtools"
+SORT_PROGRAM_PICARD = "picard"
+
 match sort_program:
-    case "none":
+    case SORT_PROGRAM_NONE:
         # ...
-    case "samtools":
+    case SORT_PROGRAM_SAMTOOLS:
         # ...
-    case "picard":
+    case SORT_PROGRAM_PICARD:
         # ...

111-116: Consider extracting CRAM validation to a separate function

The CRAM validation logic could be moved to a separate function to improve code organization and reusability.

+def validate_cram_requirements(ref: str | None, ref_fai: str | None) -> None:
+    """Validate that reference files are provided for CRAM output."""
+    if ref is None or ref_fai is None:
+        raise ValueError(
+            "Reference file and index are required for CRAM output. "
+            "Please specify them as input.ref and input.ref_fai."
+        )
+
 if bam_extension == "cram":
-    if REF is None or REF_FAI is None:
-        raise ValueError(
-            "Reference file and index are required for CRAM output."
-            "Please specify them as input.ref and input.ref_fai."
-        )
+    validate_cram_requirements(REF, REF_FAI)
🧰 Tools
🪛 Ruff (0.8.0)

111-112: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between eac0db4 and 9c9c918.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

26-26: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


63-63: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


79-79: Undefined name snakemake

(F821)


80-80: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


111-112: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)


124-127: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"

(SIM108)


197-197: Undefined name snakemake

(F821)

🔇 Additional comments (2)
bio/bowtie2/align/wrapper.py (2)

1-13: LGTM! Imports are appropriate for the new functionality

The imports are well-organized and include all necessary modules for the enhanced sorting capabilities.


72-74: Ensure proper thread allocation for sorting

The current implementation could result in 0 threads for sorting if snakemake.threads is 1.

🧰 Tools
🪛 Ruff (0.8.0)

72-72: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)

107-144: Consider validating program and order combinations

The current implementation allows all combinations of program and order, but some might not make sense (e.g., program="none" with order="coordinate"/"queryname"). Consider:

  1. Adding documentation about valid combinations
  2. Implementing validation logic to skip invalid combinations

Example improvement:

# Define valid combinations
VALID_COMBINATIONS = {
    "none": [""],  # no sorting
    "samtools": ["coordinate", "queryname"],
    "picard": ["coordinate", "queryname"]
}

for program, valid_orders in VALID_COMBINATIONS.items():
    for order in valid_orders:
        for extension in ["sam", "bam", "cram"]:
            # Rule definition here

129-134: Clean up commented output parameters

The commented output parameters appear to be unused. If they're not needed, consider removing them for better code clarity.

                output:
                    f"mapped_idx/{{sample}}.{program}_{order}.{extension}",
-                    # idx="",
-                    # metrics="",
-                    # unaligned="",
-                    # unpaired="",
-                    # unconcordant="",
-                    # concordant="",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9c9c918 and badae1c.

📒 Files selected for processing (2)
  • bio/bowtie2/align/test/Snakefile (2 hunks)
  • bio/bowtie2/align/test/genome.fasta.fai (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • bio/bowtie2/align/test/genome.fasta.fai
🔇 Additional comments (2)
bio/bowtie2/align/test/Snakefile (2)

89-89: LGTM: Proper CRAM reference handling

The addition of the FASTA index requirement (ref_fai) is correct and necessary for CRAM output handling. This change properly addresses the previous review feedback about CRAM reference validation.


107-109: LGTM: Comprehensive implementation of sorting options

The nested loop structure effectively generates test rules for all combinations of sorting programs (none/samtools/picard), orders (coordinate/queryname), and output formats (sam/bam/cram).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)

15-24: Consider handling additional edge cases in get_extension

While the function works correctly for standard cases, consider handling these edge cases:

  1. Files without extensions
  2. Files with multiple dots in the name
  3. Case-insensitive extension comparison
 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
+    if not filename or "." not in filename:
+        raise ValueError("Invalid filename: must have an extension")
     if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
+        parts = filename.lower().split(".")
+        return parts[-2] if len(parts) > 2 else ""
-    return filename.split(".")[-1].lower()
+    return filename.lower().split(".")[-1]

174-176: Enhance CRAM output validation

When using CRAM output with samtools, consider validating that the reference file exists and is readable.

         if bam_extension == "cram":
+            if not path.isfile(REF):
+                raise ValueError(f"Reference file not found or not readable: {REF}")
             samtools_opts += f" --reference {REF} "

205-210: Consider using a list for command construction

The command construction could be more maintainable by using a list of command parts.

-    shell(
-        "( bowtie2"
-        " --threads {bowtie2_threads}"
-        " {CMD_INPUT} "
-        " -x {index}"
-        " {extra}"
-        " " + PIPE_CMD + ") {LOG}"
-    )
+    cmd_parts = [
+        "(",
+        "bowtie2",
+        f"--threads {bowtie2_threads}",
+        CMD_INPUT,
+        f"-x {index}",
+        extra,
+        PIPE_CMD,
+        f") {LOG}"
+    ]
+    shell(" ".join(cmd_parts))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between badae1c and 36b6145.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


63-63: Undefined name snakemake

(F821)


64-64: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


66-66: Undefined name snakemake

(F821)


67-67: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


77-77: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


84-84: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


89-89: Undefined name snakemake

(F821)


126-129: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"

(SIM108)


201-201: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

1-14: LGTM! Imports and header are well-organized.

The imports are appropriate for the added functionality, including necessary modules for temporary file handling and path operations.


49-50: Fix typo in error message

The word "declarad" should be "declared" in the error message.

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

76-78: Add early thread validation

The thread validation should occur before thread allocation to prevent negative thread counts.

+if snakemake.threads < 2:
+    raise ValueError("This wrapper requires at least 2 threads")
 bowtie2_threads = snakemake.threads
 sort_threads = snakemake.threads - 1
🧰 Tools
🪛 Ruff (0.8.0)

76-76: Undefined name snakemake

(F821)


77-77: Undefined name snakemake

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)

136-136: Consider making threads configurable

The hardcoded thread count of 8 might not be optimal for all environments. Consider making this configurable through the config file or workflow resources.

-                threads: 8  # Use at least two threads
+                threads: config.get("bowtie2_threads", 8)  # Configurable with default

132-132: Document sort_extra parameter usage

The sort_extra parameter is included but empty. Please add a comment documenting its purpose and expected values for different sorting programs (samtools/picard).

-                    sort_extra="",
+                    sort_extra="",  # Additional arguments for sort program (e.g., "-m 4G" for samtools, "MAX_RECORDS_IN_RAM=1000000" for Picard)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 36b6145 and ceba80f.

📒 Files selected for processing (1)
  • bio/bowtie2/align/test/Snakefile (2 hunks)
🔇 Additional comments (1)
bio/bowtie2/align/test/Snakefile (1)

107-109: LGTM! Well-structured dynamic rule generation

The nested loop structure efficiently generates all necessary combinations of sorting orders, output formats, and processing programs. This aligns well with the PR objective of supporting different sorting options.

Comment on lines 125 to 126
ref="genome.fasta",
ref_fai="genome.fasta.fai",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Conditionally include reference inputs

The reference inputs (ref and ref_fai) are currently always required, but they're only needed for CRAM output. Consider making these conditional based on the output format.

# Example approach using a lambda function in input
input:
    sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
    idx=multiext(...),
    **lambda wildcards: {
        "ref": "genome.fasta",
        "ref_fai": "genome.fasta.fai"
    } if extension == "cram" else {}

".rev.2.bt2",
),
ref="genome.fasta",
ref_fai="genome.fasta.fai",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make the reference index path configurable

The hardcoded path "genome.fasta.fai" violates the requirement that input file paths should be selectable arbitrarily. Consider making this path configurable through a config file or rule parameter.

-        ref_fai="genome.fasta.fai",
+        ref_fai=config["ref_fai"],  # or another configurable source

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
bio/bowtie2/align/wrapper.py (1)

46-48: ⚠️ Potential issue

Improve thread allocation logic

The current thread allocation might lead to issues when snakemake.threads is 1.

Add validation before thread allocation:

 THREADS = snakemake.threads
+if THREADS < 2:
+    raise ValueError("This wrapper requires at least 2 threads")
 sort_threads = snakemake.threads - 1
🧰 Tools
🪛 Ruff (0.8.2)

46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

🧹 Nitpick comments (6)
bio/bowtie2/align/meta.yaml (1)

17-21: Consider documenting optional outputs in notes section

The commented optional output paths could be moved to the notes section to maintain cleaner YAML structure while preserving the documentation of available outputs.

output:
  - SAM/BAM/CRAM file. This must be the first output file in the list.
  - idx: Optional path to bam index.
-  # - metrics: Optional path to metrics file.
-  # - unaligned: Optional path to unaligned unpaired reads.
-  # - unpaired: Optional path to unpaired reads that aligned at least once.
-  # - unconcordant: Optional path to pairs that didn't align concordantly.
-  # - concordant: Optional path to pairs that aligned concordantly at least once.
notes: |
  * The `extra` param allows for additional arguments for bowtie2.
  * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.
+ * Optional outputs:
+   - metrics: Path to metrics file
+   - unaligned: Path to unaligned unpaired reads
+   - unpaired: Path to unpaired reads that aligned at least once
+   - unconcordant: Path to pairs that didn't align concordantly
+   - concordant: Path to pairs that aligned concordantly at least once
bio/bowtie2/align/wrapper.py (5)

16-23: Add input validation to get_extension

While the function handles basic cases well, it could benefit from input validation to handle edge cases.

Consider this enhanced implementation:

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be empty")
     if filename.endswith((".gz", ".bz2")):
         return filename.split(".")[-2].lower()
     return filename.split(".")[-1].lower()

65-66: Enhance error message for input validation

The error message could be more descriptive to help users understand the expected format.

Consider this improvement:

 if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]:
-    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
+    raise ValueError(
+        f"Input must have 1 (single-end) or 2 (paired-end) elements, got {len(SAMPLE)} elements"
+    )

106-110: Improve error messages for sort parameters

The error messages for sort parameters could be more user-friendly.

Consider these improvements:

 if SORT_ORDER not in {"coordinate", "queryname"}:
     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Invalid sort_order '{SORT_ORDER}'. Valid values are: 'coordinate' or 'queryname'"
     )

 if SORT_PROGRAM not in {"none", "samtools", "picard"}:
     raise ValueError(
-        f"Unexpected value for sort_program ({SORT_PROGRAM})"
-        "Valid values are 'none', 'samtools' or 'picard'"
+        f"Invalid sort_program '{SORT_PROGRAM}'. Valid values are: 'none', 'samtools' or 'picard'"
     )

Also applies to: 112-116


150-153: Simplify conditional logic using ternary operator

The conditional logic for interleaved input can be simplified.

-        if IS_INTERLEAVED:
-            cmd_input = f"--interleaved {SAMPLE}"
-        else:
-            cmd_input = f"-U {SAMPLE}"
+        cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
🧰 Tools
🪛 Ruff (0.8.2)

150-153: Use ternary operator cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"

(SIM108)


194-194: Remove unnecessary f-string prefixes

Some strings are marked as f-strings but don't contain any placeholders.

-            SAMTOOLS_OPTS += f"--write-index "
+            SAMTOOLS_OPTS += "--write-index "
-            PICARD_OPTS += f"--CREATE_INDEX true "
+            PICARD_OPTS += "--CREATE_INDEX true "

Also applies to: 214-214

🧰 Tools
🪛 Ruff (0.8.2)

194-194: f-string without any placeholders

Remove extraneous f prefix

(F541)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ceba80f and eb499d4.

📒 Files selected for processing (4)
  • bio/bowtie2/align/meta.yaml (2 hunks)
  • bio/bowtie2/align/test/Snakefile (4 hunks)
  • bio/bowtie2/align/wrapper.py (1 hunks)
  • test_wrappers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
test_wrappers.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


57-57: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


70-70: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


150-153: Use ternary operator cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"

(SIM108)


194-194: f-string without any placeholders

Remove extraneous f prefix

(F541)


214-214: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (8)
bio/bowtie2/align/meta.yaml (1)

25-27: LGTM! Well-structured sorting parameters

The new sorting parameters are well-organized and provide good flexibility:

  • sort_program: Clear options for sorting implementation
  • sort_extra: Allows customization of sorting behavior
  • sort_order: Standard options for sort ordering
bio/bowtie2/align/test/Snakefile (2)

Line range hint 1-21: LGTM! Well-structured base rule

The base rule test_bowtie2_sam is well-implemented with:

  • Clear input/output paths
  • Proper thread specification (>= 2 as required)
  • Consistent log file naming

75-90: LGTM! Well-designed sorting test rules

The sorting test rules are properly implemented:

  • Inherit from base rule using 'use rule as'
  • Consistent output path structure
  • Clear parameter specification for sort programs
test_wrappers.py (1)

2045-2070: LGTM! Comprehensive test coverage

The bowtie2 alignment tests provide good coverage:

  • Tests all output formats (SAM/BAM/CRAM)
  • Tests both sorting programs (samtools/picard)
  • Consistent with Snakefile output paths
  • Proper core allocation
bio/bowtie2/align/wrapper.py (4)

1-13: LGTM: Header and imports are appropriate

The imports are well-organized and include all necessary modules for the enhanced functionality.


85-99: LGTM: Output validation is thorough

The validation of output file extensions and index compatibility is comprehensive and well-implemented.


125-139: LGTM: Compatibility checks are comprehensive

The validation of CRAM output requirements and index file compatibility is thorough and well-implemented.


234-242: LGTM: Command execution is well-implemented

The use of a temporary directory and command construction is clean and efficient.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)

34-38: Consider removing commented-out output variables.

These commented-out variables (METRICS, UNALIGNED, etc.) and their related code (lines 179-188) should either be implemented or removed to maintain code cleanliness.

Would you like me to help clean up these commented sections?


197-198: Simplify thread check condition.

The condition if sort_threads >= 1 is redundant since we've already validated that there are at least 2 threads when sorting is enabled.

-        if sort_threads >= 1:
-            SAMTOOLS_OPTS += f"--threads {sort_threads} "
+        SAMTOOLS_OPTS += f"--threads {sort_threads} "

204-205: Make SORT_EXTRA modification more explicit.

The modification of SORT_EXTRA should be more explicit about appending the queryname sort option.

-        if SORT_ORDER == "queryname":
-            SORT_EXTRA += "-n "
+        queryname_opt = "-n " if SORT_ORDER == "queryname" else ""
+        SORT_EXTRA += queryname_opt
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb499d4 and a2e6504.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


165-165: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

16-23: LGTM! Well-documented helper function.

The get_extension function correctly handles both compressed and uncompressed files, with proper type annotations and documentation.


242-249: LGTM! Proper temporary directory handling.

The command execution is well-implemented using Python's tempfile module for secure temporary directory management.


52-54: Verify sorting functionality across different scenarios.

The implementation supports both samtools and picard sorting with different sort orders. Let's verify this functionality:

Also applies to: 194-239

✅ Verification successful

Based on the search results, I can now generate the final response as I have sufficient information to verify the sorting functionality. The results show that:

  1. The sorting implementation in bowtie2/align is consistent with other bioinformatics tools in the codebase (bwa, bwa-mem2, dragmap, etc.)
  2. The sort parameters (sort_program, sort_order, sort_extra) are used consistently across the codebase
  3. There are test cases in bio/bowtie2/align/test/Snakefile that verify both samtools and picard sorting
  4. The sort_order parameter correctly handles both "coordinate" and "queryname" options
  5. The implementation properly handles extra arguments through sort_extra parameter

Sorting functionality implementation is verified and consistent with codebase patterns

  • The implementation follows the established pattern used across multiple bioinformatics tools in the codebase
  • Test cases exist for both samtools and picard sorting implementations
  • The sort parameters are handled consistently with other similar wrappers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify sorting implementation and its usage across the codebase

# Check for other wrappers using similar sorting logic
echo "Checking for similar sorting implementations:"
rg -A 5 'sort_program.*=.*("samtools"|"picard")'

# Check for test cases covering sorting functionality
echo "Checking for sort-related test cases:"
rg -A 10 'sort_program.*=.*(samtools|picard)' test/

# Check for potential issues with sort order handling
echo "Checking sort order handling:"
rg -A 5 'sort_order.*=.*("coordinate"|"queryname")'

Length of output: 25780

🧰 Tools
🪛 Ruff (0.8.2)

52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)

Comment on lines 104 to 108
if THREADS == 1 and SORT_PROGRAM != "none":
raise ValueError(
"Not enough threads requested. This wrapper requires at least two threads: "
"one for bowtie2 and one for samtools/picard."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consolidate duplicate thread validation logic.

There are two identical thread validation blocks. Consider consolidating them into a single validation at the beginning of the validation section.

Here's a suggested consolidation:

- if THREADS == 1 and SORT_PROGRAM != "none":
-     raise ValueError(
-         "Not enough threads requested. This wrapper requires at least two threads: "
-         "one for bowtie2 and one for samtools/picard."
-     )

# ... other validations ...

- if SORT_PROGRAM != "none" and THREADS <= 1:
-     raise ValueError(
-         "Not enough threads requested. This wrapper requires at least two threads: "
-         "one for bowtie2 and one for samtools/picard."
-     )

+ # Move this to the beginning of validation section
+ if SORT_PROGRAM != "none" and THREADS < 2:
+     raise ValueError(
+         "Not enough threads requested. This wrapper requires at least two threads: "
+         "one for bowtie2 and one for samtools/picard."
+     )

Also applies to: 127-131

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

16-23: Add input validation to get_extension function

The function should validate that the input filename is not empty and is a string.

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be empty")
     if filename.endswith((".gz", ".bz2")):
         return filename.split(".")[-2].lower()
     return filename.split(".")[-1].lower()

34-38: Clean up or document commented code

Several output variables are commented out. If these features are planned for future implementation, please add a TODO comment explaining the plan. If they are no longer needed, consider removing them.


108-112: Improve error message formatting

The error message for invalid sort_order is missing proper punctuation.

     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Unexpected value for sort_order ({SORT_ORDER}). "
+        "Valid values are 'coordinate' or 'queryname'."
     )

187-235: Add error handling for invalid sort program combinations

While the validation checks prevent invalid combinations, it would be good practice to add an explicit error case in the pattern matching.

     case _:
+        if SORT_PROGRAM not in {"none"}:
+            raise ValueError(f"Unexpected sort program: {SORT_PROGRAM}")
         if sort_threads >= 1:
             SAMTOOLS_OPTS += f"--threads {sort_threads} "
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2e6504 and 3bdc418.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


158-158: Undefined name snakemake

(F821)

🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)

239-246: LGTM! Clean implementation of command execution

The implementation properly manages temporary directories and handles command execution with appropriate logging.

@jlanga
Copy link
Contributor Author

jlanga commented Dec 19, 2024

@fgvieira I think I am done with this wrapper.

I've left commented the part about outputting extra files from bowtie2 (metrics, unaligned, unpaired, concordant and unconcordant). Some of them introduce a lot of complexity in the code because bowtie can output 1 or 2 fastq files depending on if the input is SE or PE. These files can be easily obtained with samtools view and samtools fastq. Also if you want gzipped files you have to use different flags (--un-conc becomes --un-conc-gz or --un-conc-bz2). It is a headache.

Let me know if you want me to keep that functionality on to uncomment it.

@fgvieira
Copy link
Collaborator

It seems you still have un-pushed commits. Can you push them?

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I don't understand. All my commits are on the bowtie2_sort branch. I have nothing else pending to push in my computer.

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I clicked the Update Branch button. I hope it was that.

@fgvieira
Copy link
Collaborator

On the environment.yaml file, the patch versions are not pinned (see my comments).

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I see that we both removed the patch number, but in a comment later you appended the one for picard-slim:

- picard-slim =3.3.0

Is that correct? That one has to be fixed to 3.3.0?

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

Ok. I see other wrappers with picard-slim =3.3.0. Fixing that and repinning everything.

@fgvieira
Copy link
Collaborator

fgvieira commented Dec 20, 2024

but also bowtie2 and wrapper utils

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

Ok. Pinned everything with the exception of samtools. The pin file stays the same after that.

@fgvieira
Copy link
Collaborator

Why the total re-write of the wrapper? As I see it, you only needed to add the logic for picard, like (for example) in the bwa wrapper:

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts}"
elif sort == "samtools":
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -T {tmpdir}"
elif sort == "fgbio":
if sort_order == "queryname":
sort_extra += " -s Queryname"
pipe_cmd = "fgbio SortBam -i /dev/stdin -o {snakemake.output[0]} {sort_extra}"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = "picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin --TMP_DIR {tmpdir} --SORT_ORDER {sort_order} --OUTPUT {snakemake.output[0]}"
else:
raise ValueError(f"Unexpected value for params.sort ({sort})")

and change the line:

"| samtools view --with-header "

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

My initial plan was to just do either samtools view or sort, but I saw the wrapper from bwa-mem2 and I saw that it could be done for sam, bam and cram, sorted and unsorted, and with samtools and picard.

But bwa-mem2 only aligns and generates the sam file. Bowtie2 can produce extra outputs.

And now you brought to my attention fgbio SortSam.

And there are wrappers specific to a single sort program like bwa-mem2 and samblaster

What is the official view from snakemake wrappers? A monolithic wrapper htat handles multiple tool combinations, or multiple silly wrappers?

If we go through the silly wrappers, we can delete 100+ lines of code just for validation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
bio/bowtie2/align/wrapper.py (5)

16-23: Enhance extension handling robustness

The extension handling could be improved to be more resilient:

  • Case-insensitive extension comparison
  • Handle paths with multiple dots

Consider this improved implementation:

 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
-    if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
-    return filename.split(".")[-1].lower()
+    filename = filename.lower()
+    base, ext = path.splitext(filename)
+    if ext in ('.gz', '.bz2'):
+        return path.splitext(base)[1][1:]  # Remove leading dot
+    return ext[1:]  # Remove leading dot

34-38: Document rationale for commented-out output variables

These commented-out variables suggest planned functionality for metrics and different alignment outputs. Consider adding a TODO comment explaining why they're disabled and when they might be re-enabled.

Add a comment explaining the rationale:

+ # TODO: These outputs are temporarily disabled due to complexity with SE/PE handling
+ # They can be re-enabled once we implement proper SE/PE output handling
 # METRICS = snakemake.output.get("metrics", None)
 # UNALIGNED = snakemake.output.get("unaligned", None)
 # UNPAIRED = snakemake.output.get("unpaired", None)
 # UNCONCORDANT = snakemake.output.get("unconcordant", None)
 # CONCORDANT = snakemake.output.get("concordant", None)

64-68: Enhance error message with actual input details

The error message could be more helpful by including the actual input received.

 if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]:
     raise ValueError(
         "Input must have 1 (single-end) or 2 (paired-end) elements, "
-        f"got {len(SAMPLE)} elements"
+        f"got {len(SAMPLE)} elements: {SAMPLE}"
     )

91-101: Standardize error message formatting

The error messages for output validation should follow a consistent format.

 if bam_extension.lower() not in {"sam", "bam", "cram"}:
     raise ValueError(
-        f"Unrecognized extension for output file: {bam_extension}."
-        "Valid extensions are 'sam', 'bam' or 'cram'"
+        f"Unrecognized extension for output file: {bam_extension}. "
+        "Valid extensions are: 'sam', 'bam' or 'cram'"
     )

 if bai_extension not in {None, "bai", "crai"}:
     raise ValueError(
-        f"Unrecognized extension for index file: {bai_extension}."
-        "Valid extensions are 'bai' or 'crai'"
+        f"Unrecognized extension for index file: {bai_extension}. "
+        "Valid extensions are: 'bai' or 'crai'"
     )

200-206: Use consistent string formatting style

The command string uses inconsistent formatting styles. Consider using f-strings consistently.

         cmd_output = (
-            "| samtools sort "
-            "{SAMTOOLS_OPTS} "
-            "{SORT_EXTRA} "
-            "-T {TMPDIR} "
-            "-o {bam} "
+            f"| samtools sort "
+            f"{SAMTOOLS_OPTS} "
+            f"{SORT_EXTRA} "
+            f"-T {TMPDIR} "
+            f"-o {bam} "
         )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8242918 and a93881a.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


158-158: Undefined name snakemake

(F821)

🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)

247-254: LGTM! Proper temporary directory and shell execution handling

The implementation correctly uses a temporary directory and properly handles shell command execution with logging.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

16-25: Add input validation to get_extension

The function should validate that the input is not None or empty to prevent potential runtime errors.

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be None or empty")
     filename = filename.lower()
     base, ext = path.splitext(filename)

36-42: Consider documenting the plan for re-enabling outputs

The TODO comment indicates that several outputs are temporarily disabled due to SE/PE handling complexity. Consider:

  1. Creating a tracking issue for re-enabling these outputs
  2. Documenting the specific challenges with SE/PE handling
  3. Adding acceptance criteria for when these can be re-enabled

Would you like me to help create a GitHub issue to track this enhancement?


112-116: Improve error message formatting

The error message for invalid sort order is missing a space between the message parts.

     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Unexpected value for sort_order ({SORT_ORDER}). "
+        "Valid values are 'coordinate' or 'queryname'"
     )

193-210: Consider extracting samtools command construction

The samtools command construction logic is complex and could benefit from being extracted into a helper function for better maintainability.

+def build_samtools_sort_command(bam, bai, sort_order, ref, sort_threads):
+    opts = get_samtools_opts(snakemake, parse_threads=False, param_name="sort_extra")
+    opts += f" --threads {sort_threads} "
+    if bai:
+        opts += "--write-index "
+    if sort_order == "queryname":
+        opts += "-n "
+    if get_extension(bam) == "cram":
+        opts += f"--reference {ref} --output-fmt CRAM "
+    return (
+        "| samtools sort "
+        f"{opts} "
+        "{SORT_EXTRA} "
+        "-T {TMPDIR} "
+        f"-o {bam} "
+    )

 match SORT_PROGRAM:
     case "samtools":
-        SAMTOOLS_OPTS += f"--threads {sort_threads} "
-        if BAI:
-            bam = f"{BAM}##idx##{BAI}"
-            SAMTOOLS_OPTS += "--write-index "
-        else:
-            bam = BAM
-        if SORT_ORDER == "queryname":
-            SORT_EXTRA += "-n "
-        if bam_extension == "cram":
-            SAMTOOLS_OPTS += f"--reference {REF} --output-fmt CRAM "
-        cmd_output = (
-            "| samtools sort "
-            "{SAMTOOLS_OPTS} "
-            "{SORT_EXTRA} "
-            "-T {TMPDIR} "
-            "-o {bam} "
-        )
+        output_bam = f"{BAM}##idx##{BAI}" if BAI else BAM
+        cmd_output = build_samtools_sort_command(
+            output_bam, BAI, SORT_ORDER, REF, sort_threads
+        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a93881a and 7ba5b46.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


35-35: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


57-57: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


162-162: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

124-127: Consolidate thread validation logic

This thread validation appears to be duplicated. Consider consolidating all thread-related validations into a single location at the beginning of the validation section.


251-258: LGTM! Good use of temporary directory

The implementation properly handles temporary files and includes appropriate logging.


161-162: Verify thread allocation between bowtie2 and sort programs

The thread allocation between bowtie2 and sort programs needs verification to ensure optimal performance.

✅ Verification successful

Based on the code analysis, I can now provide a final response regarding the thread allocation verification:

Thread allocation between bowtie2 and sort is properly implemented

The wrapper implements a sensible thread allocation strategy:

  • When sorting is enabled, it reserves 1 thread for sorting (sort_threads = snakemake.threads - 1) and assigns the rest to bowtie2 (cmd_threads = THREADS)
  • The wrapper enforces this by checking if SORT_PROGRAM != "none" and THREADS < 2: raise ValueError("Not enough threads requested...")
  • This prevents resource contention between bowtie2 alignment and the sorting process
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other wrappers in the codebase have similar thread allocation patterns
# This helps validate our approach to thread distribution

# Look for thread allocation patterns in other wrappers
rg -A 5 "threads\s*=\s*snakemake\.threads\s*-\s*[0-9]+" "bio/**/wrapper.py"

# Look for bowtie2 thread parameter usage
rg -A 5 "--threads.*bowtie2" "bio/**/wrapper.py"

Length of output: 632


Script:

#!/bin/bash
# Let's try a different approach to find the files and analyze thread allocation

# First, let's find the wrapper.py file
fd "wrapper.py" bio/

# Then look for thread-related patterns in the specific file
fd "wrapper.py" bio/ --exec rg -l "threads"

# Get the content of the specific wrapper file to analyze thread allocation
fd "wrapper.py" bio/bowtie2/align --exec cat {}

# Look for any configuration or documentation files that might specify threading behavior
fd "meta.yaml|README.md" bio/bowtie2/align --exec cat {}

Length of output: 27634

🧰 Tools
🪛 Ruff (0.8.2)

162-162: Undefined name snakemake

(F821)

@fgvieira
Copy link
Collaborator

If it is just different sorting programs, I'd say a single wrapper.
@johanneskoester what do you think?

@johanneskoester
Copy link
Contributor

I would view this from the user perspective. Since it is so common to sort after alignment, it is nice to have a reasonable default for that direcly built into a wrapper. But we don't need to support all combinations. Just the current state of the art (i.e. the fastest sorter) that supports all relevant output files (bam and cram).

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2025

This PR was marked as stale because it has been open for 6 months with no activity.

@github-actions github-actions bot added the Stale label Sep 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants