This repository was archived by the owner on Jan 24, 2018. It is now read-only.
Sql repo #1166
Created a new API based on a SQLite DB for the data repository. This is still WIP, and is incomplete.
Changes to download data script:
- Move the script to the project root so that it can access the ga4gh package.
- Changed the NCBI URL used to access sequence data as the existing one seems to have been discontinued.
- Added a --force/-f flag to force removal of any existing directories.
- Changed the download directory to contain a flat list of the files, as the hierarchy wasn't useful any more.
- Added the repo DB.
- Removed the checkpointing functionality. This would have been very complex to maintain now that we are using a DB rather than just putting files in a specific location. The main reason for including it has gone away in any case, as htslib should be much more reliable now.
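As a rough illustration of the --force/-f behaviour described above, the sketch below shows one way such a flag could be wired up with argparse. The function names, the `--dir-name` option and its default are hypothetical and are not taken from the actual download script.

```python
import argparse
import os
import shutil


def parse_args(argv=None):
    # Hypothetical argument parser; only --force/-f is mentioned in the commit.
    parser = argparse.ArgumentParser(
        description="Download example data (illustrative sketch only)")
    parser.add_argument(
        "--force", "-f", action="store_true", default=False,
        help="remove any existing output directory before downloading")
    parser.add_argument(
        "--dir-name", default="downloaded-data",  # hypothetical option
        help="directory to hold the flat list of downloaded files")
    return parser.parse_args(argv)


def prepare_directory(path, force=False):
    # Refuse to touch an existing directory unless the caller opted in.
    if os.path.exists(path):
        if not force:
            raise RuntimeError(
                "'%s' already exists; use --force to remove it" % path)
        shutil.rmtree(path)
    os.makedirs(path)
    return path
```

A caller would then run something like `prepare_directory(args.dir_name, force=args.force)` before fetching any files.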
Conflicts: ga4gh/datamodel/datasets.py
Updated all test data to use the single FASTA for a reference set.
Simplified ontologies to just have a single object representing an ontology, backed by a single file, with the data repository providing all the other functionality.
Partial refactor of VariantAnnotations: The present organisation of the VariantAnnotations code was difficult to reconcile with the DB based repo refactor. This commit gives an outline of how it could work, by changing the relationship between VariantSet and VariantAnnotationSet from "is a" to "has a". Unfortunately I could not complete this work, and have had to move on to other aspects. I have therefore disabled the tests that are failing and moved on.
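To make the "is a" to "has a" change concrete, here is a minimal sketch of the composition pattern. The class names mirror the ones mentioned in the commit, but the constructors and methods are assumptions made for illustration and do not reproduce the ga4gh datamodel.

```python
class VariantSet(object):
    """Illustrative stand-in for a variant set (not the real ga4gh class)."""

    def __init__(self, local_id):
        self._local_id = local_id
        self._annotation_sets = []

    def get_local_id(self):
        return self._local_id

    def add_variant_annotation_set(self, annotation_set):
        self._annotation_sets.append(annotation_set)

    def get_variant_annotation_sets(self):
        return list(self._annotation_sets)


class VariantAnnotationSet(object):
    """Holds a reference to its VariantSet ("has a") rather than
    subclassing it ("is a")."""

    def __init__(self, variant_set, local_id):
        self._variant_set = variant_set
        self._local_id = local_id
        # Register with the owning variant set so it can list its annotations.
        variant_set.add_variant_annotation_set(self)

    def get_local_id(self):
        return self._local_id

    def get_variant_set(self):
        return self._variant_set
```

With this arrangement an annotation set reaches variant data through `get_variant_set()` instead of inheriting it, so one variant set can own several annotation sets.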
- added read/write semantics for opening the repo manager
- removed repo_manager module and disabled tests.
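The read/write semantics mentioned above could look roughly like the following sketch, in which a repository object must be opened in write mode before any mutating call is allowed. The class, its methods, the mode constants and the table layout are all hypothetical and are not the ga4gh API.

```python
import sqlite3

MODE_READ = "r"
MODE_WRITE = "w"


class SketchRepo(object):
    """Hypothetical SQLite-backed repo with explicit open modes."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._mode = None
        self._conn = None

    def open(self, mode=MODE_READ):
        if mode not in (MODE_READ, MODE_WRITE):
            raise ValueError("unknown mode: %r" % (mode,))
        self._mode = mode
        self._conn = sqlite3.connect(self._db_path)

    def close(self):
        if self._conn is None:
            raise RuntimeError("repo is not open")
        self._conn.close()
        self._conn = None
        self._mode = None

    def _assert_writable(self):
        # Mutating operations are only legal after open(MODE_WRITE).
        if self._mode != MODE_WRITE:
            raise RuntimeError("repo is not open for writing")

    def initialise(self):
        self._assert_writable()
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS Dataset "
            "(id TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
        self._conn.commit()

    def insert_dataset(self, id_, name):
        self._assert_writable()
        self._conn.execute(
            "INSERT INTO Dataset (id, name) VALUES (?, ?)", (id_, name))
        self._conn.commit()

    def dataset_names(self):
        if self._conn is None:
            raise RuntimeError("repo is not open")
        rows = self._conn.execute("SELECT name FROM Dataset ORDER BY name")
        return [row[0] for row in rows]
```

Here the mode is enforced in Python only; a stricter variant could also open the SQLite file itself read-only.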
Also created conditional startup code for the server to keep support for the file system repo.
Variant annotation sets were stored at the top level in the dataset, which was awkward and inconsistent. Fixed simulated stack tests.
All VA tests have been re-enabled.
Sql repo initial work
This allows us to be systematic about what we accept into the repo and ensure that we don't have duplicates. It also gives a neat way of checking for errors, and tidies up a lot of code.
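One way a SQL-backed repo can guard against duplicates is to let the database enforce uniqueness and translate the resulting error into a domain-specific exception. The sketch below assumes a hypothetical `ReferenceSet` table and a hypothetical `DuplicateNameError`; neither is taken from the actual schema or code.

```python
import sqlite3


class DuplicateNameError(Exception):
    """Hypothetical error raised when an object is already in the repo."""


# Hypothetical schema: both the id and the name must be unique.
SCHEMA = """
CREATE TABLE ReferenceSet (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
"""


def insert_reference_set(conn, id_, name):
    try:
        # The connection context manager commits on success and
        # rolls back if the INSERT fails.
        with conn:
            conn.execute(
                "INSERT INTO ReferenceSet (id, name) VALUES (?, ?)",
                (id_, name))
    except sqlite3.IntegrityError:
        raise DuplicateNameError(
            "a ReferenceSet with id '%s' or name '%s' already exists"
            % (id_, name))
```

Pushing the check into the schema keeps the insertion code short and means the same rule applies regardless of which code path writes to the database.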
Also implemented remove dataset, referenceSet in the CLI, and reenabled some CLI tests.
Updates on repo manager
Add feature set add / delete to repo manager
The SQL schema and CLI use the term Ontology rather than OntologyTermMap because it seems that the current approach is quite limited and will need to be changed. This seems more forward compatible, since we don't want to affect users by making them change CLI syntax, or make backwards incompatible changes to the schema.
Conflicts: ga4gh/cli.py tests/unit/test_repo_manager.py
Ontologies update
Add version check to data repo
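A version check of this kind could be sketched as follows: the schema version is stored in the database when it is created and compared against the version the code expects when it is opened. The `System` table, the `schemaVersion` key, the `RepoVersionError` class and the compatibility rule (same major version, stored minor no newer than expected) are all assumptions for illustration.

```python
import sqlite3

EXPECTED_VERSION = (1, 0)  # hypothetical (major, minor) understood by the code


class RepoVersionError(Exception):
    """Hypothetical error for a missing or incompatible schema version."""


def write_version(conn, version=EXPECTED_VERSION):
    # Record the schema version in a small key/value table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS System "
        "(key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.execute(
        "INSERT OR REPLACE INTO System (key, value) "
        "VALUES ('schemaVersion', ?)",
        ("%d.%d" % version,))
    conn.commit()


def check_version(conn, expected=EXPECTED_VERSION):
    try:
        row = conn.execute(
            "SELECT value FROM System WHERE key = 'schemaVersion'").fetchone()
    except sqlite3.OperationalError:
        # The System table does not exist at all.
        row = None
    if row is None:
        raise RepoVersionError("no schema version recorded in this repo")
    major, minor = (int(part) for part in row[0].split("."))
    # Assumed rule: majors must match and the file must not be newer.
    if major != expected[0] or minor > expected[1]:
        raise RepoVersionError(
            "repo has schema version %d.%d but %d.%d is expected"
            % (major, minor, expected[0], expected[1]))
    return (major, minor)
```

Failing early with a clear message avoids confusing SQL errors later when an old database is opened with newer code.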
Cache num(Un)AlignedBases for ReadGroupSets
Adds initial support for adding Variants and VariantAnnotations to the SQL repo and the manager CLI.
Conflicts: tests/unit/test_repo_manager.py
Add top-level exception handler to repo manager
Repo manager end to end tests enabled
Issue #1182
Reinstate TestVerify
OK, #1212 has been merged so I think we're good to go. Retracting my previous -1. @dcolligan, what do you think? Ready to push the button?
I am now pushing the button.
@jeromekelleher let the great issue-closing begin
Woohoo! Thanks @dcolligan, time for a closeathon!
This was referenced May 5, 2016
This PR changes master to use the sql repo, and removes support for file system based data repositories.
Issues closed by this PR:
- check for variantsets misleading #1059: No longer relevant, as check has been changed to verify and rewritten. However, dealing with inconsistent VCF files passed as part of a variant set is something we need to tackle.
This is a large change affecting all developers and users, so please review and vote.