Task provenance #3802
Closed
28 commits
47d0168 Add initial task graph and metadata json file (bentsherman)
ae67027 Add task inputs and outputs to conrete DAG (bentsherman)
8f95cd6 Fix failing tests (bentsherman)
9f11e4b Use path-based APIs to get file metadata (bentsherman)
db6aed1 Merge branch 'master' into ben-task-graph (bentsherman)
8456892 Use buffer to compute checksum (bentsherman)
0dd98d6 Merge branch 'master' into ben-task-graph-pull (bentsherman)
0f505d3 Merge branch 'master' into ben-task-graph-pull (bentsherman)
e81e584 Replace synchronized with lock (bentsherman)
35bac94 Refactor task graph to not depend on task directory naming (bentsherman)
8bbe3d7 Replace abstract/concrete with process/task (bentsherman)
7ec397f Add support for AWS SSE env variables (pditommaso)
c0c7492 Merge branch 'master' into ben-task-graph-pull (bentsherman)
49396cf Fix failing tests (bentsherman)
910a2f9 Merge branch 'master' into ben-task-graph-pull (bentsherman)
0c63254 Rename 'process' option to 'workflow' (bentsherman)
9b934e0 Save task inputs and outputs to cache db instead of json file (bentsherman)
08fda25 Decouple task graph from DAG renderer (bentsherman)
40a05e4 Merge branch 'master' into ben-task-graph (bentsherman)
5eb9de1 Improve DAG rendering (bentsherman)
47e595e Merge branch 'master' into ben-task-graph (bentsherman)
c7422df Add subworkflows to task graph (bentsherman)
cc8b6dc Remove task DAG rendering (in favor of nf-prov) (bentsherman)
8ed4c9f Merge branch 'master' into ben-task-graph (bentsherman)
73459b7 Revert unrelated changes (bentsherman)
d46b346 Remove unused code (bentsherman)
d851c2d Use CacheHelper to compute MD5 checksum (bentsherman)
501bce7 Add temporary debug logging (bentsherman)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -43,7 +43,7 @@ import nextflow.script.params.TupleOutParam |
| | | import java.util.concurrent.atomic.AtomicLong |
| | | |
| | | /** |
| | | - * Model a direct acyclic graph of the pipeline execution. |
| | | + * Model the directed acyclic graph of the workflow definition. |
| | |  * |
| | |  * @author Paolo Di Tommaso <[email protected]> |
| | |  */ |
| | | |
142 changes: 142 additions & 0 deletions
modules/nextflow/src/main/groovy/nextflow/dag/TaskDAG.groovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,142 @@ |
| | | /* |
| | |  * Copyright 2013-2023, Seqera Labs |
| | |  * |
| | |  * Licensed under the Apache License, Version 2.0 (the "License"); |
| | |  * you may not use this file except in compliance with the License. |
| | |  * You may obtain a copy of the License at |
| | |  * |
| | |  *     http://www.apache.org/licenses/LICENSE-2.0 |
| | |  * |
| | |  * Unless required by applicable law or agreed to in writing, software |
| | |  * distributed under the License is distributed on an "AS IS" BASIS, |
| | |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| | |  * See the License for the specific language governing permissions and |
| | |  * limitations under the License. |
| | |  */ |
| | | |
| | | package nextflow.dag |
| | | |
| | | import java.nio.file.Files |
| | | import java.nio.file.Path |
| | | import java.util.concurrent.locks.Lock |
| | | import java.util.concurrent.locks.ReentrantLock |
| | | |
| | | import groovy.transform.CompileStatic |
| | | import groovy.transform.TupleConstructor |
| | | import groovy.util.logging.Slf4j |
| | | import nextflow.extension.FilesEx |
| | | import nextflow.processor.TaskRun |
| | | import nextflow.script.params.FileOutParam |
| | | import nextflow.trace.TraceRecord |
| | | /** |
| | |  * Model the directed acyclic graph of the workflow execution. |
| | |  * |
| | |  * @author Ben Sherman <[email protected]> |
| | |  */ |
| | | @Slf4j |
| | | @CompileStatic |
| | | class TaskDAG { |
| | | |
| | |     private Map<TaskRun,Vertex> vertices = new HashMap<>() |
| | | |
| | |     private Map<Path,TaskRun> taskLookup = new HashMap<>() |
| | | |
| | |     private Lock sync = new ReentrantLock() |
| | | |
| | |     Map<TaskRun,Vertex> getVertices() { vertices } |
| | | |
| | |     /** |
| | |      * Add a task to the graph. |
| | |      * |
| | |      * @param task |
| | |      */ |
| | |     void addTask(TaskRun task) { |
| | |         final inputs = task.getInputFilesMap() |
| | | |
| | |         sync.lock() |
| | |         try { |
| | |             // add new task to graph |
| | |             vertices[task] = new Vertex(inputs) |
| | |         } |
| | |         finally { |
| | |             sync.unlock() |
| | |         } |
| | |     } |
| | | |
| | |     /** |
| | |      * Add a task's outputs to the graph. |
| | |      * |
| | |      * @param task |
| | |      */ |
| | |     void addTaskOutputs(TaskRun task) { |
| | |         final outputs = task |
| | |             .getOutputsByType(FileOutParam) |
| | |             .values() |
| | |             .flatten() as Set<Path> |
| | | |
| | |         sync.lock() |
| | |         try { |
| | |             // add task outputs to graph |
| | |             vertices[task].outputs = outputs |
| | | |
| | |             // add new output files to task lookup |
| | |             for( Path path : outputs ) |
| | |                 taskLookup[path] = task |
| | |         } |
| | |         finally { |
| | |             sync.unlock() |
| | |         } |
| | |     } |
| | | |
| | |     /** |
| | |      * Get the task that produced the given file. |
| | |      * |
| | |      * @param path |
| | |      */ |
| | |     TaskRun getProducerTask(Path path) { |
| | |         taskLookup[path] |
| | |     } |
| | | |
| | |     /** |
| | |      * Get the vertex for the task that produced the given file. |
| | |      * |
| | |      * @param path |
| | |      */ |
| | |     Vertex getProducerVertex(Path path) { |
| | |         vertices[taskLookup[path]] |
| | |     } |
| | | |
| | |     /** |
| | |      * Save task input and output metadata to trace record. |
| | |      * |
| | |      * @param task |
| | |      * @param record |
| | |      */ |
| | |     void saveToRecord(TaskRun task, TraceRecord record) { |
| | |         final vertex = vertices[task] |
| | | |
| | |         record.inputs = vertex.inputs.collect { name, path -> |
| | |             final producer = getProducerTask(path) |
| | |             new TraceRecord.Input( |
| | |                 name, |
| | |                 path, |
| | |                 producer ? producer.hash.toString() : null) |
| | |         } |
| | | |
| | |         record.outputs = vertex.outputs.collect { path -> |
| | |             new TraceRecord.Output( |
| | |                 path, |
| | |                 Files.size(path), |
| | |                 FilesEx.getChecksum(path)) |
| | |         } |
| | | |
| | |         log.info "task ${task.name} ; inputs: ${record.inputs} ; outputs: ${record.outputs}" |
| | |     } |
| | | |
| | |     @TupleConstructor(excludes = 'outputs') |
| | |     static class Vertex { |
| | |         Map<String,Path> inputs |
| | |         Set<Path> outputs |
| | |     } |
| | | |
| | | } |
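`saveToRecord` stores a size and a checksum for every output file, but the `FilesEx.getChecksum` helper it calls is not part of the diff shown here. The commit messages mention a buffer and an MD5 checksum, so the following is only a sketch of what such a computation might look like. The `md5Hex` function is a hypothetical name, and plain `java.security.MessageDigest` is used in place of whatever Nextflow's `CacheHelper` does internally.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.security.MessageDigest

// Hypothetical helper, not the PR's FilesEx.getChecksum: streams the file
// through a fixed-size buffer so a large output is never loaded whole.
String md5Hex(Path path) {
    def digest = MessageDigest.getInstance('MD5')
    def buffer = new byte[8192]
    Files.newInputStream(path).withStream { stream ->
        int n
        while( (n = stream.read(buffer)) != -1 )
            digest.update(buffer, 0, n)
    }
    // encodeHex() yields the lowercase hexadecimal form of the digest bytes
    return digest.digest().encodeHex().toString()
}

// Usage: record the same two facts that saveToRecord stores per output file
def file = Files.createTempFile('task-output', '.txt')
Files.write(file, 'hello'.getBytes('UTF-8'))
assert Files.size(file) == 5
assert md5Hex(file) == '5d41402abc4b2a76b9719d911017c592'
Files.delete(file)
```

Reading through a buffer keeps memory use constant regardless of file size, which matters when task outputs are large.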
75 changes: 75 additions & 0 deletions
modules/nextflow/src/test/groovy/nextflow/dag/TaskDAGTest.groovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,75 @@ |
| | | /* |
| | |  * Copyright 2013-2023, Seqera Labs |
| | |  * |
| | |  * Licensed under the Apache License, Version 2.0 (the "License"); |
| | |  * you may not use this file except in compliance with the License. |
| | |  * You may obtain a copy of the License at |
| | |  * |
| | |  *     http://www.apache.org/licenses/LICENSE-2.0 |
| | |  * |
| | |  * Unless required by applicable law or agreed to in writing, software |
| | |  * distributed under the License is distributed on an "AS IS" BASIS, |
| | |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| | |  * See the License for the specific language governing permissions and |
| | |  * limitations under the License. |
| | |  */ |
| | | |
| | | package nextflow.dag |
| | | |
| | | import java.nio.file.Paths |
| | | |
| | | import com.google.common.hash.HashCode |
| | | import nextflow.processor.TaskRun |
| | | import spock.lang.Specification |
| | | /** |
| | |  * |
| | |  * @author Ben Sherman <[email protected]> |
| | |  */ |
| | | class TaskDAGTest extends Specification { |
| | | |
| | |     def 'should add task vertices and outputs' () { |
| | | |
| | |         given: |
| | |         def task1 = Mock(TaskRun) { |
| | |             getInputFilesMap() >> [ |
| | |                 'data.txt': Paths.get('/inputs/data.txt') |
| | |             ] |
| | |             getOutputsByType(_) >> [ |
| | |                 'data.foo': Paths.get('/work/00112233/data.foo') |
| | |             ] |
| | |         } |
| | |         def task2 = Mock(TaskRun) { |
| | |             getInputFilesMap() >> [ |
| | |                 'data.foo': Paths.get('/work/00112233/data.foo') |
| | |             ] |
| | |             getOutputsByType(_) >> [ |
| | |                 'data.bar': Paths.get('/work/aabbccdd/data.bar') |
| | |             ] |
| | |         } |
| | |         def dag = new TaskDAG() |
| | | |
| | |         when: |
| | |         dag.addTask( task1 ) |
| | |         dag.addTask( task2 ) |
| | |         def v1 = dag.vertices[task1] |
| | |         def v2 = dag.vertices[task2] |
| | |         then: |
| | |         v1.inputs.size() == 1 |
| | |         v1.inputs['data.txt'] == Paths.get('/inputs/data.txt') |
| | |         and: |
| | |         v2.inputs.size() == 1 |
| | |         v2.inputs['data.foo'] == Paths.get('/work/00112233/data.foo') |
| | | |
| | |         when: |
| | |         dag.addTaskOutputs( task1 ) |
| | |         dag.addTaskOutputs( task2 ) |
| | |         then: |
| | |         v1.outputs == [ Paths.get('/work/00112233/data.foo') ] as Set |
| | |         and: |
| | |         v2.outputs == [ Paths.get('/work/aabbccdd/data.bar') ] as Set |
| | |         and: |
| | |         dag.getProducerVertex(v1.inputs['data.txt']) == null |
| | |         dag.getProducerVertex(v2.inputs['data.foo']) == v1 |
| | |     } |
| | | |
| | | } |
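The scenario in this test can be reproduced without Spock or Nextflow to show how the producer lookup turns file paths into graph edges. The sketch below is a simplified stand-in and not the PR's `TaskDAG`: `MiniTaskGraph` and `upstreamOf` are hypothetical names, tasks are plain string ids instead of `TaskRun` objects, and the locking is left out.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for TaskDAG: string ids replace TaskRun objects
// and locking is omitted to keep the lookup logic visible.
class MiniTaskGraph {
    // task id -> input files keyed by staged name
    Map<String, Map<String, Path>> inputs = [:]
    // output file -> id of the task that produced it
    Map<Path, String> producers = [:]

    void addTask(String id, Map<String, Path> taskInputs) {
        inputs[id] = taskInputs
    }

    void addTaskOutputs(String id, Collection<Path> outputs) {
        outputs.each { Path path -> producers[path] = id }
    }

    // Ids of the tasks whose outputs the given task consumes. Inputs with
    // no known producer (such as user-supplied files) are skipped.
    Set<String> upstreamOf(String id) {
        inputs[id].values()
            .collect { Path path -> producers[path] }
            .findAll { it != null } as Set<String>
    }
}

// Same data as the test: task2 consumes the file that task1 produces
def graph = new MiniTaskGraph()
graph.addTask('task1', ['data.txt': Paths.get('/inputs/data.txt')])
graph.addTask('task2', ['data.foo': Paths.get('/work/00112233/data.foo')])
graph.addTaskOutputs('task1', [Paths.get('/work/00112233/data.foo')])
graph.addTaskOutputs('task2', [Paths.get('/work/aabbccdd/data.bar')])

assert graph.upstreamOf('task1').isEmpty()
assert graph.upstreamOf('task2') == (['task1'] as Set)
```

As in the test, an input with no registered producer simply has no upstream task, which corresponds to `getProducerVertex` returning `null` for `/inputs/data.txt`.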