Skip to content

Conversation

@bentsherman
Copy link
Member

@bentsherman bentsherman commented Mar 1, 2023

Closes #969 #1236 #2425 #3341

This PR adds an arity option for path inputs and outputs. The arity can be a single number (e.g. 1) or a range (e.g. 1..*). The arity is used (1) to validate the expected number of files and (2) to determine, when a path contains only one file, whether to "wrap" that path in a list.

Currently, a path with a single file is always unwrapped. With this PR, the path will be unwrapped only if there is a single file and the arity is "single", meaning it can have at most 1 element (in other words, 1 and 0..1 are "single").

The default arity is 1, or 1..* if the param has a glob pattern. For optional output params, the default then is 0..1 or 0..* respectively.

This PR introduces a minor breaking change -- a path input/output with a glob pattern will have a default arity of 1..*, so it will be wrapped in a list even if there is only one file. Without this PR, this file would be unwrapped. I think this breaking change is worthwhile since it is exactly the inconsistency that users were complaining about.

@bentsherman bentsherman marked this pull request as ready for review March 3, 2023 00:00
@pditommaso
Copy link
Member

This PR introduces a minor breaking change -- a path input/output with a glob pattern will have a default arity of 1..*, so it will be wrapped in a list even if there is only one file.

This should be avoided. it could create nasty side effects hard depending the nextflow version.

Arity should default to null, which will implicitly stick to the current behaviour, and use the new rule only when explicitly defined.

@bentsherman
Copy link
Member Author

Should be able to make that work. It's too bad though, because the current defaults would probably work 99% of the time without the user having to specify. It would be nice if we could eventually transition to these default settings instead of just null.

@bentsherman
Copy link
Member Author

ArityParam fails on my laptop but passes in CI 🤷

@pditommaso pditommaso force-pushed the master branch 2 times, most recently from 0d59b4c to b93634e Compare March 11, 2023 11:20
@pditommaso
Copy link
Member

Worth adding some tests for the DSL both for input and output, like in this example

def 'should define output path options' () {
given:
def text = '''
process foo {
output:
path x,
maxDepth:2,
hidden: false,
followLinks: false,
type: 'file',
separatorChar: '#',
glob: false,
optional: false,
includeInputs: false
path y,
maxDepth:5,
hidden: true,
followLinks: true,
type: 'dir',
separatorChar: ':',
glob: true,
optional: true,
includeInputs: true
return ''
}
workflow { foo() }
'''
when:
def process = parseAndReturnProcess(text, [:])
FileOutParam out0 = process.config.getOutputs().get(0)
FileOutParam out1 = process.config.getOutputs().get(1)

@bentsherman
Copy link
Member Author

I think that is covered by the e2e test I added: tests/process-arity.nf

@pditommaso
Copy link
Member

Better to add also as unit test

@pditommaso
Copy link
Member

I found a glitch testing this. If I'm not wrong, when declaring an input `arity: '0..1', it could be a null or a path type, however, I've noticed that when an empty list is given the input type is a list.

This was my test case

process foo {
  input:
  path x, arity: '0..1'

  script:
  println "inputs [${x.class.simpleName}]=$x"

  """
  echo $x
  """
}


workflow {
  foo([])
}

@bentsherman
Copy link
Member Author

I think I wasn't sure how to handle that case. You think the list should be coerced into a single null path?

@bentsherman
Copy link
Member Author

Now I'm thinking we shouldn't allow the arity to include 0, and instead only support that through the nullable option #2893

@pditommaso
Copy link
Member

Should be null according this

Currently, a path with a single file is always unwrapped. With this PR, the path will be unwrapped only if there is a single file and the arity is "single", meaning it can have at most 1 element (in other words, 1 and 0..1 are "single").

Instead I believe we should drop nullable in favour of this

@bentsherman
Copy link
Member Author

Instead I believe we should drop nullable in favour of this

That could work, actually I think that would be much simpler

With this PR, the path will be unwrapped only if there is a single file and the arity is "single", meaning it can have at most 1 element (in other words, 1 and 0..1 are "single").

Well, I was only thinking about outputs when I wrote this. If an input has arity = '0..1', then it should receive a single file or a "null" file, but not an empty list. An empty list should cause an error.

@bentsherman
Copy link
Member Author

Also I think I will add the ability to specify arity: true to have the arity inferred from the file pattern. Originally I was doing this by default but this way it can be an opt-in feature.

@bentsherman
Copy link
Member Author

I added the NullPath from the other PR, and made it fail if a nullable input receives an empty list.

The problem with the null path is that it goes through a bunch of code that it probably shouldn't, and produces this warning:

WARN: File system is unable to get file attributes file: nextflow.util.NullPath@7f -- Cause: java.nio.file.ProviderMismatchException

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member Author

I'm thinking about the zero one infinity rule and wondering if we should only allow the following:

  • 0..1
  • 0..*
  • 1
  • 1..*

It seems to be that other cases like 1..2 or 2 should be represented with tuples of (possibly nullable) files, for example:

// instead of arity: '2'
tuple path('1.txt'), path('2.txt')

// instead of arity: '1..2'
tuple path('1.txt'), path('2.txt', arity: '0..1')

@bentsherman
Copy link
Member Author

I don't think such a restriction would simplify this PR much, and there might be some weird cases where something like a..b is needed, so maybe better to only suggest it as a best practice.

@pditommaso
Copy link
Member

I've merged this manually, removing the support for 0..1 that opens the problem of how to handle the optional path. It can be added in the following iteration. See 42504d3.

@pditommaso pditommaso closed this Sep 10, 2023
@bentsherman
Copy link
Member Author

Thanks. I will rebase this branch and make a new PR with just the optional features

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants