Fusion symlink resolution doesn't work with directories #4725

@bentsherman

We recently added logic to handle Fusion symlinks, which occur when an input file is included in a published output. We use the .fusion.symlinks file to detect and resolve any Fusion symlinks at publish time.

However, when publishing a directory, Nextflow does not walk the directory tree and publish each file individually; it publishes the directory as a single unit. Any Fusion symlinks that were staged into that directory are therefore never detected or resolved.

Given a list of input files:

$ ls -1 freshdesk/fd-4463/files/ 
1.txt
2.txt
3.txt
4.txt
5.txt

The following pipeline script demonstrates the issue:

process AGGREGATE {
  container "quay.io/nextflow/bash"
  publishDir "results", mode: "copy"

  input:
  path(samples, stageAs: 'AnalysisFiles/')

  output:
  path("*")

  script:
  """
  for name in AnalysisFiles/*.txt; do
    touch AnalysisFiles/Analysis_on_\$(basename \${name} .txt)
  done
  """
}

workflow {
    AGGREGATE( files("files/*") )
}

One workaround that I tried is to make the outputs more explicit:

  output:
  path("AnalysisFiles/*.txt", includeInputs: true)
  path("AnalysisFiles/Analysis_on_*")

But this doesn't quite work because of a small bug in the symlink resolution. I will submit a patch so that at least this workaround works.

A more permanent solution would be to walk the directory tree and publish each file explicitly instead of only publishing the directory (#3933). This change has other benefits as well; see also #3372.
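To make the per-file idea concrete, here is a rough sketch of it. It is written in Python for brevity and is not Nextflow's actual implementation. It assumes that .fusion.symlinks lists one path per line, relative to the task work directory, and that the original location of each staged input is available as a mapping. The names `read_symlink_list`, `publish_tree`, and `staged_inputs` are hypothetical, and a plain local copy stands in for the real publish step.

```python
import shutil
from pathlib import Path


def read_symlink_list(work_dir: Path) -> set[str]:
    """Return the staged paths recorded as Fusion symlinks.

    ASSUMPTION: .fusion.symlinks holds one work-dir-relative path per line.
    """
    listing = work_dir / ".fusion.symlinks"
    if not listing.exists():
        return set()
    lines = listing.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def publish_tree(
    work_dir: Path,
    output: Path,
    target_dir: Path,
    staged_inputs: dict[str, Path],
) -> list[str]:
    """Publish `output` (a file or directory inside work_dir) file by file.

    `staged_inputs` (hypothetical) maps a staged relative path to the
    original input file it points to.
    """
    symlinks = read_symlink_list(work_dir)

    # Walk the directory instead of treating it as a single unit.
    if output.is_dir():
        files = sorted(p for p in output.rglob("*") if p.is_file())
    else:
        files = [output]

    published = []
    for path in files:
        rel = path.relative_to(work_dir).as_posix()
        # Publish the original input when the staged entry is a known symlink.
        if rel in symlinks and rel in staged_inputs:
            source = staged_inputs[rel]
        else:
            source = path
        dest = target_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        published.append(rel)
    return published
```

Because every file is visited individually, a symlink staged under AnalysisFiles/ is checked against the list even when the declared output is the directory itself (or a glob such as "*" that matches it).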
