Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@

### Template

- New pre-commit hooks (large files, merge conflicts) in pipeline template ([#3935](https://github.com/nf-core/tools/pull/3935))
- switch to uv and prek for pipeline linting workflow ([#3942](https://github.com/nf-core/tools/pull/3942))

### Linting
Expand Down
30 changes: 30 additions & 0 deletions nf_core/pipeline-template/.hooks/block_pipeline_outdir.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
# This hook is used to block commits if they include staged files inside a directory
# which also contains a subdirectory called `pipeline_info`. The purpose of this is to
# prevent users from inadvertently committing output from pipeline test runs inside the
# development directory.

set -e

# list staged files
staged_files="$(git diff --cached --name-only)"

status=0

for file in $staged_files; do
file_dir=$(dirname "$file")

# Walk up the directory tree and check if the current directory contains a subdirectory called `pipeline_info`
# or the staged file is itself inside a directory called `pipeline_info`.
while [ "$file_dir" != "." ] && [ "$file_dir" != "/" ]; do
if [ $(basename "$file_dir") == "pipeline_info" ] || [ -d "$file_dir/pipeline_info" ]; then
echo "❌ Commit blocked: Please do not commit output from pipeline test runs to the pipeline code itself."
echo "Use 'git restore --staged <file>...' to remove the output files from the staging area before proceeding."
Comment on lines +21 to +22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also print out the offending path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change the script to print out $file, yes.

However, I reckon that people may typically have added their whole output folder to the staging area and then it is more effective to remove all relevant files at once instead single files. I would, though, not want to make the call what that top-level folder is.

status=1
break
fi
file_dir=$(dirname "$file_dir")
done
done

exit "$status"
17 changes: 17 additions & 0 deletions nf_core/pipeline-template/.pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,20 @@ repos:
subworkflows/nf-core/.*|
.*\.snap$
)$
- id: check-added-large-files
args: [--maxkb=5000]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the limit be defined as a variable somewhere and be used here? This would make it easier to adjust it for each pipeline, if necessary. Ideally it would be nice if it could be in .nf-core.yml, but I don't know if that is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without needing another script, this file would be the place.

exclude: |
(?x)^(
.*ro-crate-metadata.json$|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the RO crate be that big ?
I guess we can accept it in on the basis it's machine generated, so its size can be considered legitimate whatever it is ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my logic. I just checked a semi-random selection of nf-core pipelines for the biggest files with the command given in the PR description and the RO-crate json was usually among the bigger files in a repo. Not that big, but I felt that it is safer to exclude it entirely as it is machine-generated.

With the nf-core downloads test I already once left everyone in a catch-22 situation where they required that test to pass to merge the template update that would correct said test to allow it to succeed. Therefore, I wanted to err on the safe site here.

.*\.snap$|
lib/nfcore_external_java_deps.jar$|
docs/.*\.(svg|pdf)$|
assets/.*$
)$
- id: check-merge-conflict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main motivation for this PR was evidently the check-added-large-files. Phil just recommended to use the other one as well, because he has it on the MultiQC repo.

If you think that I shouldn't bundle them in one PR to allow separate decisions, I am fine with removing it as well.

- repo: local
hooks:
- id: block-pipeline-outdir
name: Prevent committing output from pipeline test runs to the pipeline code itself
entry: ./.hooks/block_pipeline_outdir.sh
language: script
1 change: 1 addition & 0 deletions nf_core/pipelines/create/template_features.yml
Original file line number Diff line number Diff line change
Expand Up @@ -438,6 +438,7 @@ code_quality:
- ".prettierignore"
- ".prettierrc.yml"
- ".github/workflows/fix-linting.yml"
- ".hooks/block_pipeline_outdir.sh"
short_description: "Use code linters"
description: "The pipeline will include code linters and CI tests to lint your code: pre-commit, editor-config and prettier."
help_text: |
Expand Down