This repository has been archived by the owner on Jan 18, 2020. It is now read-only.
MANIFEST Spec
bruth edited this page Oct 15, 2014 · 3 revisions
The Varify Data Warehouse (VDW) provides a loading pipeline that ingests files in the Variant Call Format (VCF). These files are parsed and their data is loaded into the respective tables in the database.
To keep the loading process consistent, the loader requires every VCF file to have an associated MANIFEST file. The MANIFEST contains metadata used to identify and validate the VCF file. A typical MANIFEST file looks like this:
[general]
load = true
[sample]
project = CEU trio
batch = batch1
version = 0
[vcf]
file = locus_1.vcf
md5 = f67a6913ee83345657a6e790c6f5feee
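The MANIFEST uses INI-style [section] headers with key = value pairs. Assuming a standard INI parser is enough to read it (the page does not say which parser the loader uses), the sketch below reads the example above with Python's configparser module. Note that load is stored as text; getboolean() is what turns "true" into a boolean.

```python
import configparser

# The example MANIFEST from above, inlined as a string for illustration.
MANIFEST_TEXT = """\
[general]
load = true
[sample]
project = CEU trio
batch = batch1
version = 0
[vcf]
file = locus_1.vcf
md5 = f67a6913ee83345657a6e790c6f5feee
"""

# interpolation=None treats '%' in values (e.g. in file paths) literally.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(MANIFEST_TEXT)

# getboolean() maps "true"/"yes"/"on"/"1" to True; fallback covers a missing key.
load = parser.getboolean("general", "load", fallback=False)
project = parser.get("sample", "project")
version = parser.getint("sample", "version")
vcf_file = parser.get("vcf", "file")

print(load, project, version, vcf_file)  # prints: True CEU trio 0 locus_1.vcf
```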
The loader finds MANIFEST files by scanning the directories listed in the VDW_SAMPLE_DIRS setting. Every valid MANIFEST file whose data has not already been loaded is queued for loading. The options available in each section of the MANIFEST file are listed below.
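A minimal sketch of the scan, assuming the loader recurses into subdirectories and looks for files named exactly MANIFEST. The function find_manifests and its already_loaded argument are hypothetical: the set of paths stands in for however the real loader records previously loaded data, and the validity checks are left out here.

```python
import os


def find_manifests(sample_dirs, already_loaded):
    """Return sorted paths of MANIFEST files under sample_dirs not yet loaded.

    Hypothetical helper: `already_loaded` is a set of MANIFEST paths standing
    in for the real loader's record of previously loaded data.
    """
    queue = []
    for root_dir in sample_dirs:
        # os.walk recurses into subdirectories and yields nothing for a
        # directory that does not exist (errors are ignored by default).
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if "MANIFEST" in filenames:
                path = os.path.join(dirpath, "MANIFEST")
                # Skip MANIFESTs whose data has already been loaded.
                if path not in already_loaded:
                    queue.append(path)
    return sorted(queue)
```

Sorting the result only makes the queue order deterministic; the page does not say in what order the real loader processes samples.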
[general]
- load - Set to true to allow the loader to queue the sample. Defaults to false, in which case the loader skips the associated VCF file.

Genome
- name - The name of the genome used.
- version - The version of the genome used, e.g. hg19.

[sample]
- project (required) - The project the sample is associated with.
- batch (required) - The batch the sample is associated with.
- version (required) - The version of the sample. This provides a mechanism for updating the view of the sample in Varify: previous versions are not removed or changed, but they are unpublished in favor of the new version.
- sample - The sample names contained in the VCF file. If not specified, they are extracted from the VCF file itself.

[vcf]
- file (required) - The path to the VCF file. This can be an absolute path or a path relative to the directory containing the MANIFEST file.
- md5 - An MD5 hex digest of the VCF file, used to validate the file.
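The rules in the list above can be sketched as a single validation function. This is a hypothetical illustration, not Varify's loader: check_manifest, md5_hexdigest, and the REQUIRED table are names introduced here, and raising ValueError on a bad MANIFEST is an assumed policy. It skips files whose load flag is not true, checks the required options, resolves a relative file path against the MANIFEST's directory, and compares the optional md5 against a digest of the file.

```python
import configparser
import hashlib
import os

# Required options per section, as listed above (hypothetical table).
REQUIRED = {
    "sample": ("project", "batch", "version"),
    "vcf": ("file",),
}


def md5_hexdigest(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path):
    """Validate a MANIFEST and return the resolved VCF path, or None to skip.

    Hypothetical sketch of the rules described above, not Varify's code.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(manifest_path):
        raise ValueError(f"cannot read {manifest_path}")

    # load defaults to false: without an explicit true, the VCF is skipped.
    if not parser.getboolean("general", "load", fallback=False):
        return None

    for section, options in REQUIRED.items():
        for option in options:
            if not parser.has_option(section, option):
                raise ValueError(f"missing required option [{section}] {option}")

    # A relative 'file' is resolved against the MANIFEST's own directory;
    # os.path.join returns the second argument unchanged if it is absolute.
    manifest_dir = os.path.dirname(os.path.abspath(manifest_path))
    vcf_path = os.path.join(manifest_dir, parser.get("vcf", "file"))

    # md5 is optional; when present it must match the file's contents.
    # hexdigest() is lowercase, so the expected value is lowercased too.
    expected = parser.get("vcf", "md5", fallback=None)
    if expected is not None and md5_hexdigest(vcf_path) != expected.lower():
        raise ValueError(f"MD5 mismatch for {vcf_path}")
    return vcf_path
```

Reading the file in chunks keeps memory use flat even for large VCF files.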