v0.1.5: Finalize variant evaluation support with new bcbio-nextgen changes: handle bgzip inputs, support non-GATK callable regions, avoid VariantEval due to java errors. Thanks to Severine Catreux.
chapmanb committed Mar 15, 2014
1 parent 2b6d257 commit fc5bac4
Showing 6 changed files with 30 additions and 15 deletions.
6 changes: 5 additions & 1 deletion HISTORY.md
@@ -1,8 +1,12 @@
## 0.1.5 (In progress)
## 0.1.5 (15 March 2014)

- Move to MIT licensed GATK 3.0 framework.
- Support bgzipped inputs for variant assessment. Thanks to Severine Catreux.
- Support lightweight loading options for gemini integration to avoid large load
times with new gene tables in gemini 0.6.5.
- Avoid running GATK VariantEval which causes intermittent java core dumps.
- Avoid re-runs of callable regions when already prepared using non-GATK chanjo-based
methods.

## 0.1.4 (5 March 2014)

4 changes: 2 additions & 2 deletions README.md
@@ -33,7 +33,7 @@ associated with different variant representations.

### Download

The latest release is 0.1.4 (5 March 2014): [bcbio.variation-0.1.4-standalone.jar][dl].
The latest release is 0.1.5 (15 March 2014): [bcbio.variation-0.1.5-standalone.jar][dl].
Run from the command line:

$ java -jar bcbio.variation-VERSION-standalone.jar [arguments]
@@ -44,7 +44,7 @@ the library for variant comparison, normalization and ensemble calling. Note
that bcbio.variation requires Java 1.7 since the underlying GATK libraries are
not compatible with earlier versions.

[dl]: https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.4/bcbio.variation-0.1.4-standalone.jar
[dl]: https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.5/bcbio.variation-0.1.5-standalone.jar

### As a library

2 changes: 1 addition & 1 deletion project.clj
@@ -1,4 +1,4 @@
(defproject bcbio.variation "0.1.5-SNAPSHOT"
(defproject bcbio.variation "0.1.5"
:description "Toolkit to analyze genomic variation data, built on the GATK with Clojure"
:license {:name "MIT" :url "http://www.opensource.org/licenses/mit-license.html"}
:dependencies [[org.clojure/clojure "1.5.1"]
3 changes: 2 additions & 1 deletion src/bcbio/variation/callable.clj
@@ -31,7 +31,8 @@
(if-not (fs/exists? base-dir)
(fs/mkdirs base-dir))
(broad/index-bam align-bam)
(broad/run-gatk "CallableLoci" args file-info {:out [:out-bed :out-summary]})
(when (itx/needs-run? (:out-bed file-info))
(broad/run-gatk "CallableLoci" args file-info {:out [:out-bed :out-summary]}))
(:out-bed file-info)))

(defn features-in-region [source space start end]
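
Note on the `callable.clj` hunk above: the new `when` guard skips the GATK `CallableLoci` run if the output BED file is already in place, for example when it was prepared by a non-GATK method. A minimal sketch of that skip-if-present pattern follows. The `needs-run?` here is a hypothetical stand-in that only checks file existence (the real `itx/needs-run?` may behave differently), and `run-once!` is an illustrative helper that is not part of bcbio.variation.

```clojure
(require '[clojure.java.io :as io])

(defn needs-run?
  "Hypothetical stand-in for itx/needs-run?: true when out-file does not exist."
  [out-file]
  (not (.exists (io/file out-file))))

(defn run-once!
  "Call (produce! out-file) only when out-file is missing.
   Always returns out-file, mirroring how the patched function
   returns (:out-bed file-info) whether or not GATK ran."
  [out-file produce!]
  (when (needs-run? out-file)
    (produce! out-file))
  out-file)
```

With this shape, a BED file supplied ahead of time is returned untouched, and a second call after a successful run does no work.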
18 changes: 10 additions & 8 deletions src/bcbio/variation/compare.clj
@@ -157,18 +157,20 @@
(let [c-files (select-by-concordance (:sample exp) c1 c2 (:ref exp)
:out-dir (get-in config [:dir :out])
:intervals (:intervals exp))
eval (calc-variant-eval-metrics (:file c1) (:file c2) (:ref exp)
:out-base (first c-files)
:intervals (:intervals exp))
c-eval (calc-variant-eval-metrics (:file c1) (:file c2) (:ref exp)
:out-base (fsp/add-file-part (first c-files) "callable")
:intervals (callable-intervals exp c1 c2))]
;; eval (calc-variant-eval-metrics (:file c1) (:file c2) (:ref exp)
;; :out-base (first c-files)
;; :intervals (:intervals exp))
;; c-eval (calc-variant-eval-metrics (:file c1) (:file c2) (:ref exp)
;; :out-base (fsp/add-file-part (first c-files) "callable")
;; :intervals (callable-intervals exp c1 c2))
]
{:c-files (zipmap-ordered (map keyword
["concordant" (discordant-name c1) (discordant-name c2)])
c-files)
:c1 c1 :c2 c2 :exp exp :dir (config :dir)
:metrics (report/concordance-report-metrics (:sample exp) eval)
:callable-metrics (report/concordance-report-metrics (:sample exp) c-eval)})))
;; :metrics (report/concordance-report-metrics (:sample exp) eval)
;; :callable-metrics (report/concordance-report-metrics (:sample exp) c-eval)
})))

(defn compare-two-vcf
"Compare two VCF files, handling standard and haploid specific comparisons."
12 changes: 10 additions & 2 deletions src/bcbio/variation/normalize.clj
@@ -551,6 +551,14 @@
(neg-qual? xs) []
:else xs)))

(defn zipsafe-reader
"Provide a reader, handling gzipped or plain text inputs."
[f]
(reader
(cond
(.endsWith f ".gz") (-> f input-stream java.util.zip.GZIPInputStream.)
:else f)))

(defn clean-problem-vcf
"Clean VCF file which GATK parsers cannot handle due to illegal characters.
Fixes:
@@ -559,7 +567,7 @@
- Removes spaces in INFO fields."
[in-vcf-file ref-file sample call & {:keys [out-dir]}]
(let [get-ref-base (ref-base-getter ref-file)
out-file (fsp/add-file-part in-vcf-file "preclean" out-dir)]
out-file (string/replace (fsp/add-file-part in-vcf-file "preclean" out-dir) ".vcf.gz" ".vcf")]
(letfn [(remove-gap [n xs]
(assoc xs n
(-> (nth xs n)
@@ -584,7 +592,7 @@
(string/join "\t"))))]
(when (itx/needs-run? out-file)
(itx/with-tx-file [tx-out-file out-file]
(with-open [rdr (reader in-vcf-file)
(with-open [rdr (zipsafe-reader in-vcf-file)
wtr (writer tx-out-file)]
(doall
(map #(.write wtr (str % "\n"))
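
Note on the `normalize.clj` hunks above: `zipsafe-reader` chooses a gzip-decompressing stream when the input name ends in `.gz`, and the cleaned output is renamed from `.vcf.gz` to `.vcf` because it is written as plain text. The sketch below is a self-contained version of the same idea using `clojure.java.io` directly. The `first-line` and `write-demo-files` helpers are hypothetical and exist only for illustration, and it assumes the gzipped input is readable by `java.util.zip.GZIPInputStream`.

```clojure
(require '[clojure.java.io :as io])
(import '[java.util.zip GZIPInputStream GZIPOutputStream])

(defn zipsafe-reader
  "Return a reader for path f, decompressing when the name ends in .gz."
  [^String f]
  (io/reader
   (if (.endsWith f ".gz")
     (GZIPInputStream. (io/input-stream f))
     f)))

(defn first-line
  "Hypothetical helper: read the first line of a plain or gzipped text file."
  [f]
  (with-open [rdr (zipsafe-reader f)]
    (first (line-seq rdr))))

(defn write-demo-files
  "Hypothetical helper: write the same VCF header line to a plain temp file
   and a gzipped temp file, returning both paths."
  []
  (let [plain (java.io.File/createTempFile "demo" ".vcf")
        gz    (java.io.File/createTempFile "demo" ".vcf.gz")]
    (spit plain "##fileformat=VCFv4.1\n")
    (with-open [w (io/writer (GZIPOutputStream. (io/output-stream gz)))]
      (.write w "##fileformat=VCFv4.1\n"))
    [(str plain) (str gz)]))
```

Calling `first-line` on either path from `write-demo-files` goes through the same reader function, so downstream line-based cleaning code does not need to know whether its input was compressed.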
