Feature parallelization #53
Merged
32 commits (all by ozlemmuslu):
- 34f41a8 Update gitignore
- 78c870f Upgrade Python and the Python packages.
- 2a69e4d Introduce null check before rank sum test
- 055dc23 perf: major performance optimizations reducing runtime ~4-5x
- 547667b perf: cache in power.py to avoid computing the same values repeatedly.
- 4493140 feat: chromosome-level parallelization via ProcessPoolExecutor
- 9d3259e return nan in safe_median to repeat previous results
- a4f2288 breaking: change cli argument include-ambiguous-bases to exclude-ambi…
- 8f62e75 Setup
- 8f8f1c3 Update unit tests
- c2370d2 Update python version in tests
- 9e3d099 more flexible Python version
- 534ebc0 more flexible Python version
- 8ab4514 update test with new method
- 82606ed Fix tests
- f5ea6b2 fix insertion tests
- abefa7b attempt to fix test
- 541542f remove insertion/deletion tests. will add an issue to the repository
- cd8bb35 Refactor the code for better readability
- e198540 Fix VCF INFO field error
- 5e3344d Fix wrong tag
- 6db73d6 bump version
- 12f017a Readd test that was accidentally deleted
- 4b6a272 improve code quality based on Codacy output
- 8b80837 more codacy improvements
- cf31f1e Re-add comments
- 8803b48 black code formatting
- a3e5e20 fix unused variable bam
- 2bcc602 fix Codacy errors
- c0254d0 add seed for random class. fixes #54
- eb17614 change version to 3.0.0
- f239072 add support for CRAM files. closes #45
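Commit 4493140 introduces chromosome-level parallelization via ProcessPoolExecutor, but that code is not part of the diff shown on this page. The sketch below only illustrates the general pattern under stated assumptions: records on different chromosomes can be processed independently, and each worker opens its own input files. annotate_chromosome and annotate_by_chromosome are hypothetical names, and the worker merely tags (chromosome, position) tuples instead of reading BAM/CRAM data.

```python
import concurrent.futures
from collections.abc import Callable

Record = tuple[str, int]  # hypothetical (chromosome, position) record


def annotate_chromosome(chrom: str, records: list[Record]) -> list[tuple[str, int, str]]:
    """Hypothetical per-chromosome worker.

    A real worker would open its own BAM/CRAM and VCF handles here, because
    open file handles cannot be shared across processes. This one only tags records.
    """
    return [(c, pos, f"annotated:{c}:{pos}") for c, pos in records]


def annotate_by_chromosome(
    records: list[Record],
    max_workers: int = 4,
    executor_factory: Callable[..., concurrent.futures.Executor] = (
        concurrent.futures.ProcessPoolExecutor
    ),
) -> list[tuple[str, int, str]]:
    # Group records by chromosome, keeping the order in which chromosomes first appear.
    by_chrom: dict[str, list[Record]] = {}
    for rec in records:
        by_chrom.setdefault(rec[0], []).append(rec)
    chroms = list(by_chrom)

    with executor_factory(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, not completion order,
        # so the merged output follows the chromosome order of the input.
        per_chrom = pool.map(annotate_chromosome, chroms, [by_chrom[c] for c in chroms])
        return [row for chunk in per_chrom for row in chunk]
```

For real workloads, keep the worker a module-level function so it can be pickled, and call annotate_by_chromosome from under an if __name__ == "__main__": guard, which ProcessPoolExecutor needs on platforms that spawn worker processes. The executor_factory parameter is a testing convenience: passing concurrent.futures.ThreadPoolExecutor exercises the grouping and ordering logic without starting processes. If the input is not already grouped by chromosome (as in a sorted VCF), the output comes back grouped rather than in the original record order.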
```diff
@@ -14,4 +14,6 @@ out
 vafator/tests/resources/results
 .cache
 .jupyter
-.local
+.local
+run.sh
+VAFator.egg-info/*
```
```diff
@@ -1,7 +1,11 @@
-pandas~=1.3.3
-pysam~=0.19.1
-cyvcf2~=0.30.14
-logzero~=1.7.0
-pybedtools~=0.9.0
-numpy>=1.20,<2.0
-scipy>=1.0.0,<2.0.0
+pandas>=3.0.1,<4
+# pysam pinned: above 0.21.0 base qualities show up wrong in the presence of soft clipping/insertions/overlapping read pairs or a combination of these factors
+pysam==0.21.0
+cyvcf2>=0.32.1,<0.33
+logzero>=1.7.0,<2
+pybedtools>=0.12.0,<0.13
+numpy>=2.4.3,<3
+scipy>=1.17.1,<2
+setuptools
+pytest
+pytest-cov
```
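The old requirements used PEP 440 compatible-release clauses (~=), while the new ones spell out explicit ranges. A clause ~=X.Y.Z means >=X.Y.Z together with ==X.Y.*, so the two notations can be compared directly. The helper below is a hypothetical illustration that expands a ~= clause into explicit bounds; it assumes plain numeric release versions and does not handle pre-, post- or dev-release suffixes.

```python
def compatible_release_bounds(version: str) -> tuple[str, str]:
    """Expand the PEP 440 clause '~=version' into explicit lower/upper bounds.

    '~=1.3.3' means '>=1.3.3' plus '==1.3.*', i.e. '>=1.3.3,<1.4'.
    Only plain numeric release versions are handled (no rc/post/dev suffixes).
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("'~=' needs a version with at least two components")
    # Drop the last component and bump the one before it to get the exclusive upper bound.
    upper = parts[:-1]
    upper[-1] += 1
    return f">={version}", "<" + ".".join(str(p) for p in upper)
```

By this reading, the new cyvcf2>=0.32.1,<0.33 and pybedtools>=0.12.0,<0.13 lines are equivalent to ~=0.32.1 and ~=0.12.0, whereas numpy>=2.4.3,<3 and scipy>=1.17.1,<2 admit any later minor release within the same major version (~=2.4.3 would stop at <2.5).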
```diff
@@ -1,2 +1,63 @@
 [metadata]
-description-file = README.md
+name = VAFator
+version = 3.0.0
+description = Annotate variants in a VCF file with technical annotations from one or more BAMs
+description-file = README.md
+long_description = file: README.md
+long_description_content_type = text/markdown
+license = MIT
+url = https://github.com/TRON-Bioinformatics/vafator
+author = Pablo Riesgo Ferreiro, Jonas Ibn-Salem, Luis Kress, Özlem Muslu
+classifiers =
+    Development Status :: 4 - Beta
+    Intended Audience :: Healthcare Industry
+    Intended Audience :: Science/Research
+    Topic :: Scientific/Engineering :: Bio-Informatics
+    Programming Language :: Python :: 3.11
+    Programming Language :: Python :: 3.12
+    Programming Language :: Python :: 3.13
+    Programming Language :: Python :: 3 :: Only
+    License :: OSI Approved :: MIT License
+    Operating System :: Unix
+author_email = priesgoferreiro@gmail.com
+
+[options.entry_points]
+console_scripts =
+    vafator=vafator.command_line:annotator
+    multiallelics-filter=vafator.command_line:multiallelics_filter
+    vafator2decifer=vafator.command_line:vafator2decifer
+    hatchet2bed=vafator.command_line:hatchet2bed
+
+[options]
+packages = find:
+include_package_data = True
+zip_safe = False
+
+python_requires = >=3.11, <3.12
+
+install_requires =
+    pandas>=3.0.1,<4
+    pysam==0.21.0 # above this version base qualities show up wrong in the presence of soft clipping/insertions/both (latest release 0.23.3)
+    cyvcf2>=0.32.1,<0.33
+    logzero>=1.7.0,<2
+    pybedtools>=0.12.0,<0.13
+    numpy>=2.4.3,<3
+    scipy>=1.17.1,<2
+    setuptools
+
+[options.packages.find]
+exclude =
+    tests
+    tests.*
+    legacy
+    legacy.*
+
+[options.extras_require]
+dev =
+    pytest
+    ruff
+    mypy
+test =
+    pytest
+    pytest-cov
+    setuptools
```
Review comment (Contributor): Nice, thanks for cleaning this up!
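In the new metadata, the classifiers advertise Python 3.11, 3.12 and 3.13, while python_requires = >=3.11, <3.12 only admits 3.11. A small consistency check can catch this kind of drift. The sketch below is an assumption and not part of the PR: it reads the INI-style configuration with the standard-library configparser and evaluates python_requires with a deliberately simplified parser that only understands >=, <=, >, <, and == on numeric versions. For full PEP 440 semantics, packaging.specifiers.SpecifierSet is the usual tool if the packaging library is available.

```python
import configparser
import operator
import re

# Two-character operators first so that ">=" is not mistaken for ">".
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt, "<": operator.lt}
_CLASSIFIER = re.compile(r"Programming Language :: Python :: (\d+\.\d+)")


def _as_tuple(version: str) -> tuple[int, ...]:
    return tuple(int(p) for p in version.split("."))


def satisfies(version: str, requires: str) -> bool:
    """Simplified check of a version like '3.12' against e.g. '>=3.11, <3.12'."""
    for clause in (c.strip() for c in requires.split(",")):
        op = next((o for o in _OPS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported specifier: {clause!r}")
        if not _OPS[op](_as_tuple(version), _as_tuple(clause[len(op):].strip())):
            return False
    return True


def classifier_versions_outside_requires(cfg_text: str) -> list[str]:
    """Return X.Y Python versions listed as classifiers that python_requires rejects."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    requires = parser.get("options", "python_requires")
    # Multi-line values come back one entry per line; "3 :: Only" does not match X.Y.
    listed = [
        m.group(1)
        for line in parser.get("metadata", "classifiers").splitlines()
        if (m := _CLASSIFIER.fullmatch(line.strip()))
    ]
    return [v for v in listed if not satisfies(v, requires)]
```

Applied to the [metadata] and [options] sections shown above, this check reports 3.12 and 3.13, so either the classifiers or the upper bound of python_requires would need to change for the two to agree.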
```diff
@@ -1,52 +1,11 @@
-from setuptools import find_packages, setup
+from setuptools import setup
 import vafator
 
 
 VERSION = vafator.VERSION
 
 
-# parses requirements from file
-with open("requirements.txt") as f:
-    required = f.read().splitlines()
 
-with open("README.md", "r", encoding="utf-8") as f:
-    long_description = f.read()
 
 # Build the Python package
-setup(
-    name='vafator',
-    version=VERSION,
-    packages=find_packages(exclude=["legacy"]),
-    entry_points={
-        'console_scripts': [
-            'vafator=vafator.command_line:annotator',
-            'multiallelics-filter=vafator.command_line:multiallelics_filter',
-            'vafator2decifer=vafator.command_line:vafator2decifer',
-            'hatchet2bed=vafator.command_line:hatchet2bed'
-        ],
-    },
-    author="TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University Mainz"
-           "- Computational Medicine group",
-    author_email='pablo.riesgoferreiro@tron-mainz.de',
-    description='Annotate a VCF file with AF, AD and DP from tumor and normal BAMs',
-    long_description=long_description,
-    long_description_content_type="text/markdown",
-    url="https://github.com/tron-bioinformatics/vafator",
-    requires=[],
-    install_requires=required,
-    classifiers=[
-        'Development Status :: 4 - Beta',  # Chose either "3 - Alpha", "4 - Beta" or "5 - Production/Stable" as the current state of your package
-        'Intended Audience :: Healthcare Industry',
-        'Intended Audience :: Science/Research',
-        'Topic :: Scientific/Engineering :: Bio-Informatics',
-        'Programming Language :: Python :: 3.7',
-        'Programming Language :: Python :: 3.8',
-        'Programming Language :: Python :: 3.9',
-        'Programming Language :: Python :: 3.10',
-        'Programming Language :: Python :: 3 :: Only',
-        "License :: OSI Approved :: MIT License",
-        "Operating System :: Unix"
-    ],
-    python_requires='>=3.7',
-    license='MIT'
-)
+setup()
```
```diff
@@ -1,4 +1 @@
-VERSION='2.2.2'
-
-
-AMBIGUOUS_BASES = ['N', 'M', 'R', 'W', 'S', 'Y', 'K', 'V', 'H', 'D', 'B']
+VERSION = "3.0.0"
```
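This change removes the AMBIGUOUS_BASES constant (the IUPAC ambiguity codes N, M, R, W, S, Y, K, V, H, D and B), and commit a4f2288 renames the related CLI option. Where the constant now lives and how VAFator applies it is not shown in this diff. As a purely hypothetical illustration, excluding ambiguous base calls when counting the bases observed at one position could look like this; count_bases is an invented helper, not part of VAFator's API.

```python
from collections import Counter

# IUPAC ambiguity codes, copied from the removed AMBIGUOUS_BASES constant.
AMBIGUOUS_BASES = frozenset("NMRWSYKVHDB")


def count_bases(bases: str, exclude_ambiguous: bool = True) -> Counter:
    """Hypothetical helper: count base calls at one position, optionally dropping ambiguous ones."""
    counts = Counter(b.upper() for b in bases)
    if exclude_ambiguous:
        for code in AMBIGUOUS_BASES:
            counts.pop(code, None)
    return counts
```

Whether ambiguous calls are dropped changes the depth that serves as the denominator of an allele frequency: for the calls "AACNTr", the sketch counts a depth of 4 with exclusion and 6 without.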
Review comment: Is pandas used at all? Pandas v3 introduced some major changes that break backward compatibility. If pandas is used, we should check that it works as intended.
Reply: pandas is used in hatchet2bed, ploidies, and vafator2decifer. My test runs would not cover these, and I'm not sure how well they are covered by the unit/integration tests either.