Releases: broadinstitute/cromwell
50 Release Notes
Changes and Warnings
Metadata Archival Config Change
Note: Unless you have already opted-in to GCS-archival of metadata during its development, this change will not affect you.
Cromwell's metadata archival configuration has changed in a backwards-incompatible way to increase consistency;
please see the updated documentation for details.
49 Release Notes
Changes and Warnings
Job store database refactoring
The primary keys of Cromwell's job store tables have been refactored to use a BIGINT datatype in place of the
previous INT datatype. Cromwell will not be usable while the Liquibase migration for this refactor is running.
In the Google Cloud SQL with SSD environment this migration runs at a rate of approximately 40,000 JOB_STORE_SIMPLETON_ENTRY
rows per second. In deployments with millions or billions of JOB_STORE_SIMPLETON_ENTRY rows the migration may require
a significant amount of downtime, so please plan accordingly. The following SQL can be used to estimate the number of
rows in this table:
SELECT table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'cromwell' AND table_name = 'JOB_STORE_SIMPLETON_ENTRY';
Execution Directory Layout (cache copies)
When an attempt to copy a cache result is made, you'll now see a cacheCopy directory in the call root directory.
This prevents cache-copied files from clashing with the files staged to the same directory for execution attempt 1 if the cache copy fails (see also: Bug Fixes).
The directory layout used to be:
[...]/callRoot/
- script [from the cache copy attempt, or for execution attempt 1 if the cache copy fails]
- stdout [from the cache copy attempt, or for execution attempt 1 if the cache copy fails]
- output.file [from the cache copy attempt, or for execution attempt 1 if the cache copy fails]
- attempt-2/ [if attempt 1 fails]
- script
- stdout
- output.file
but is now:
[...]/callRoot/
- cacheCopy/
- script
- stdout
- output.file
- script [for attempt 1 if the cache copy fails]
- stdout [for attempt 1 if the cache copy fails]
- output.file [for attempt 1 if the cache copy fails]
- attempt-2/ [if attempt 1 fails]
- script
- stdout
- output.file
New Functionality
Disable call-caching for tasks
It is now possible to indicate in a workflow that a task should not be call-cached. See details
here.
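As a minimal sketch of what opting a task out of call caching can look like, assuming the volatile meta flag described in the linked documentation:

```wdl
task generate_random {
  command {
    echo $RANDOM
  }
  meta {
    # Marking the task volatile tells Cromwell never to call-cache its results
    volatile: true
  }
  output {
    String random = read_string(stdout())
  }
}
```

This is useful for tasks whose outputs are intentionally nondeterministic, where returning a cached result would be incorrect.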
Delete Intermediate Outputs on PapiV2
- Experimental: When the new workflow option delete_intermediate_output_files is submitted with the workflow,
intermediate File objects will be deleted when the workflow completes. See the Google Pipelines API Workflow Options
documentation for more information.
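As a sketch, a workflow-options file enabling this experimental behavior could look like the following (the option name is taken from the text above; the boolean value shown is an assumption):

```json
{
  "delete_intermediate_output_files": true
}
```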
Metadata Archival Support
Cromwell 49 now offers the option to archive metadata to GCS and remove the equivalent metadata from relational
database storage. Please see
the documentation for more details.
Adding support for Google Cloud Life Sciences v2beta
Cromwell now supports running workflows using Google Cloud Life Sciences v2beta API in addition to Google Cloud Genomics v2alpha1.
More information about migrating from v2alpha1 to the new API is available here.
- Note: Google Cloud Life Sciences is the new name for newer versions of Google Cloud Genomics.
- Note: Support for Google Cloud Genomics v2alpha1 will be removed in a future version of Cromwell. Advance notice will be provided.
New Docs
Installation methods
Links to the conda package and docker container are now available in
the install documentation.
Bug Fixes
- Fix a bug where zip files with directories could not be imported. For example, a zip with a.wdl and b.wdl
could be imported, but one with sub_workflows/a.wdl and imports/b.wdl could not.
- Fix a bug which sometimes allowed execution scripts copied by a failed cache-copy to be run instead
of the attempt-1 script for a live job execution.
48 Release Notes
Womtool Graph for WDL 1.0
The womtool graph
command now supports WDL 1.0 workflows.
- Note: Generated graphs - including in WDL draft 2 - may look slightly different than they did in version 47.
Documentation
- Documented the use of an HSQLDB file-based database so users can try call-caching without needing a database server.
Please check out the database documentation.
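A file-based HSQLDB configuration might be sketched as follows (the URL and timeout values are illustrative assumptions; see the linked database documentation for the recommended settings):

```hocon
database {
  profile = "slick.jdbc.HsqldbProfile$"
  db {
    driver = "org.hsqldb.jdbcDriver"
    # Stores the database in local files under cromwell-db/
    url = "jdbc:hsqldb:file:cromwell-db/cromwell-db"
    connectionTimeout = 120000
  }
}
```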
47 Release Notes
Retry with more memory on Papiv2 (#5180)
Cromwell now allows user-defined retries. With the memory-retry config you can specify an array of strings; when
Cromwell encounters one of them in a task's stderr file, the task is retried with the memory multiplier specified
in the config. More information here.
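A hypothetical sketch of such a config stanza, matching the description above (the key names are illustrative assumptions, not authoritative; consult the linked documentation for the real ones):

```hocon
# One of the listed strings appearing in a task's stderr triggers a retry
# with the task's memory multiplied by the configured factor.
memory-retry {
  error-keys = ["OutOfMemory", "Killed"]
  multiplier = 1.1
}
```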
GCS Parallel Composite Upload Support
Cromwell 47 now supports GCS parallel composite uploads, which can greatly improve delocalization performance.
This feature is turned off by default; it can be turned on either with a backend-level configuration setting or
on a per-workflow basis with workflow options. More details here.
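One way to enable the feature per workflow is via a workflow-options file; the option name and threshold value below are assumptions based on the feature's documentation:

```json
{
  "parallel_composite_upload_threshold": "150M"
}
```

Files larger than the threshold would be uploaded as parallel composite objects; smaller files are uploaded normally.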
Papi V2 Localization Using GCR (#5200)
The Docker image for the Google Cloud SDK was previously only published on Docker
Hub. Now that the image is publicly hosted in
GCR, Papi V2 jobs will localize inputs and delocalize outputs using
the GCR image.
46.1 Release Notes
Retry with more memory on Papiv2 (#5180)
Cromwell now allows user-defined retries. With the memory-retry config you can specify an array of strings; when
Cromwell encounters one of them in a task's stderr file, the task is retried with the memory multiplier specified
in the config. More information here.
46 Release Notes
Nvidia GPU Driver Update
The default driver for Nvidia GPUs on Google Cloud has been updated from 390 to 418.87.00. A user may override this
option at any time by providing the nvidiaDriverVersion runtime attribute. See the Runtime Attribute description for
GPUs for detailed information.
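For example, a task could pin the driver version via the runtime attribute; the nvidiaDriverVersion key comes from the text above, while the surrounding GPU attributes and values are illustrative assumptions:

```wdl
runtime {
  gpuType: "nvidia-tesla-k80"        # illustrative GPU type
  gpuCount: 1
  nvidiaDriverVersion: "418.87.00"   # overrides the default driver
  docker: "ubuntu:18.04"
}
```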
Enhanced "error code 10" handling in PAPIv2
On Google Pipelines API v2, a worker VM that is preempted may emit a generic error message like
PAPI error code 10. The assigned worker has failed to complete the operation
instead of a preemption-specific message like
PAPI error code 14. Task was preempted for the 2nd time.
Cromwell 44 introduced special handling that detects both preemption indicators and re-runs the job consistent with the preemptible
setting.
Cromwell 46 enhances this handling in response to user reports of possible continued issues.
45.1 Release Notes
45 Release Notes
Improved input and output transfer performance on PAPI v2
Cromwell now requires only a single PAPI "action" each for the entire localization or delocalization process, rather than two per file or directory.
This greatly increases execution speed for jobs with large numbers of input or output files.
In testing, total execution time for a call with 800 inputs improved from more than 70 minutes to less than 20 minutes.
List dependencies flag in Womtool Command Line (#5098)
Womtool now outputs the list of files referenced in import statements when the -l flag is passed to the validate
command. More info here.
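An illustrative invocation (the jar and workflow file names are placeholders):

```shell
# Validate a workflow and list the files its import statements reference
java -jar womtool.jar validate -l myWorkflow.wdl
```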
BCS backend new Features support
New docker registry
Alibaba Cloud Container Registry is now supported for the docker
runtime attribute, and the previous dockerTag
runtime attribute continues to be available for Alibaba Cloud OSS Registry.
Call caching
Cromwell now supports call caching when using the BCS backend.
Workflow output glob
Globs can be used to define outputs for the BCS backend.
NAS mount
Alibaba Cloud NAS is now supported for the mounts
runtime attribute.
44 Release Notes
Improved PAPI v2 Preemptible VM Support
In some cases PAPI v2 will report the preemption of a VM in a way that differs from PAPI v1. This novel means of reporting
preemption was not recognized by Cromwell's PAPI v2 backend and would result in preemptions being miscategorized as call failures.
Cromwell's PAPI v2 backend will now handle this type of preemption.
43 Release Notes
Virtual Private Cloud with Subnetworks
Cromwell now allows PAPIv2 jobs to run on a specific subnetwork inside a private network by adding the subnetwork key
subnetwork-label-key inside the virtual-private-cloud stanza of the backend configuration. More info here.
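A sketch of such a backend stanza follows; the subnetwork-label-key name comes from the text above, while the network-label-key and auth entries and all values are assumptions based on the linked VPC documentation:

```hocon
virtual-private-cloud {
  # Project label keys whose values name the network and subnetwork to use
  network-label-key = "my-private-network"
  subnetwork-label-key = "my-private-subnetwork"
  auth = "service-account"
}
```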
Call caching database refactoring
Cromwell's CALL_CACHING_HASH_ENTRY primary key has been refactored to use a BIGINT datatype in place of the previous
INT datatype. Cromwell will not be usable while the Liquibase migration for this refactor is running.
In the Google Cloud SQL with SSD environment this migration runs at a rate of approximately 100,000 CALL_CACHING_HASH_ENTRY
rows per second. In deployments with millions or billions of CALL_CACHING_HASH_ENTRY rows the migration may require
a significant amount of downtime, so please plan accordingly. The following SQL can be used to estimate the number of
rows in this table:
SELECT MAX(CALL_CACHING_HASH_ENTRY_ID) FROM CALL_CACHING_HASH_ENTRY;
Stackdriver Instrumentation
Cromwell now supports sending metrics to Google's Stackdriver API. Learn how to configure it here.
BigQuery in PAPI
Cromwell now allows a user to specify BigQuery jobs when using the PAPIv2 backend.
Configuration Changes
StatsD Instrumentation
There is a small change in StatsD's configuration path. Originally, the path to the config was
services.Instrumentation.config.statsd; it has now been updated to services.Instrumentation.config. More info on its
configuration can be found here.
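Under the new path, a StatsD instrumentation stanza might look like the sketch below; the config path is taken from the text above, while the class name and individual keys are assumptions to be checked against the linked documentation:

```hocon
services {
  Instrumentation {
    class = "cromwell.services.instrumentation.impl.statsd.StatsDInstrumentationServiceActor"
    # Note: settings now live directly under "config", not "config.statsd"
    config {
      hostname = "localhost"
      port = 8125
      prefix = ""
      flush-rate = 1 second
    }
  }
}
```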
cached-copy
A new experimental feature, the cached-copy localization strategy, is available for the shared filesystem.
More information can be found in the documentation on localization.
Yaml node limits
Yaml parsing now checks for cycles, and limits the maximum number of parsed nodes to a configurable value. It also
limits the nesting depth of sequences and mappings. See the documentation on configuring
YAML for more information.
API Changes
Workflow Metadata
- It is now possible to use includeKey and excludeKey at the same time. If so, a metadata key must match the
includeKey and not match the excludeKey to be included.
- It is now possible to use "calls" as one of your excludeKeys, to request that only workflow metadata gets returned.
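For instance, the second case might look like the following request (host, port, and workflow id are placeholders):

```shell
# Exclude everything under "calls" so only workflow-level metadata is returned
curl "http://localhost:8000/api/workflows/v1/WORKFLOW_ID/metadata?excludeKey=calls"
```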
PostgreSQL support
Cromwell now supports PostgreSQL (version 9.6 or higher, with the Large Object
extension installed) as a database backend.
See here for
instructions for configuring the database connection.
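A PostgreSQL connection could be sketched as below; the profile and driver class names are assumptions based on the linked instructions, and the URL and credentials are placeholders:

```hocon
database {
  profile = "slick.jdbc.PostgresProfile$"
  db {
    driver = "org.postgresql.Driver"
    url = "jdbc:postgresql://localhost:5432/cromwell"
    user = "cromwell"
    password = "CHANGE_ME"
    connectionTimeout = 5000
  }
}
```

Remember that the Large Object extension must be installed in the target database, per the note above.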