Up to v1.0.3, pairtools sort allowed the header line to list the column names chr1 and chr2 (as indicated in the official 4DN specs).
Starting with v1.1.0, pairtools sort expects the header line that declares column names to list chrom1 and chrom2, and breaks if the header line is #columns: readID chr1 pos1 chr2 pos2 strand1 strand2.
It also seems to require pair_type to be listed in the #columns header line, and to be present as a column.
I understand that the chr1/chr2 problem can be circumvented by specifying the -c1 and -c2 fields on the CLI, but now, if a pair_type column is not included, pairtools sort cannot work. Is this intended behavior? Sorry if I missed something or if this issue has already been raised.
Reproducible example
Here is an unsorted pairs file I created by hand, with chr1/chr2 in header:
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'chrom1' is not in list
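The failing expression in the last frame can be reproduced in isolation. The sketch below is not pairtools code: `resolve_column` is a hypothetical wrapper around the one-line lookup shown in the traceback, and splitting the `#columns:` line on whitespace is an assumption about how the column names are obtained.

```python
def resolve_column(col, column_names):
    """Return a 1-based column index for a numeric or named column spec."""
    # Same expression as the line reported in the traceback (sort.py, sort_py).
    return int(col) if col.isnumeric() else column_names.index(col) + 1


# Assumption: column names are the whitespace-separated fields after "#columns:".
header_line = "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2"
column_names = header_line.split()[1:]

# A numeric spec never touches the header names.
print(resolve_column("2", column_names))

# A name that is present in the header resolves normally.
print(resolve_column("chr1", column_names))

# A name that is absent from the header raises ValueError, as in the report.
try:
    resolve_column("chrom1", column_names)
except ValueError as err:
    print("lookup failed:", err)
```

This suggests why a column spec given as a number, or as a name that matches the header, avoids the error, while a default name missing from the header does not.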
Now, changing the chr1/chr2 to chrom1/chrom2 in the header:
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'pair_type' is not in list
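To see in advance which names a given header would trip on, the `#columns:` line can be checked against the names the sort step looks up. This is a rough sketch: the key list `ASSUMED_DEFAULT_KEYS` is an assumption inferred from the two errors in this report (only `chrom1` and `pair_type` are confirmed by the tracebacks), and `missing_sort_keys` is a hypothetical helper, not part of pairtools.

```python
# Assumption: default sort keys; only chrom1 and pair_type are confirmed above.
ASSUMED_DEFAULT_KEYS = ["chrom1", "chrom2", "pos1", "pos2", "pair_type"]


def missing_sort_keys(columns_line, keys=ASSUMED_DEFAULT_KEYS):
    """List the keys that do not appear in a '#columns: ...' header line."""
    names = columns_line.split()[1:]  # drop the leading "#columns:" token
    return [key for key in keys if key not in names]


# Header using the 4DN-style names from the first example.
print(missing_sort_keys("#columns: readID chr1 pos1 chr2 pos2 strand1 strand2"))
# ['chrom1', 'chrom2', 'pair_type']

# Header after renaming chr1/chr2 to chrom1/chrom2.
print(missing_sort_keys("#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2"))
# ['pair_type']
```

Under these assumptions, renaming the chromosome columns removes the first error but still leaves `pair_type` unresolved, which matches the second traceback.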
Hi, Jacques!
Thank you for this detailed report - thanks to you, we identified the source of both issues in the code.
Fixing them will take a bit of time, but is ultimately straightforward:
1. Modify the flags -c1, -c2, ... so that they accept both ints and strs, with the default value expressed as an int.
2. Write a headerops function that converts between int and str columns, plus another function that "canonicalizes" lists of column names (we'd need to decide which form is better, probably strs?).
3. Modify the code in the sort and dedup CLIs to use the functions introduced above.
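One way the plan above might look, as a minimal sketch only. None of these names exist in pairtools: `ALIASES`, `canonicalize_columns`, `column_to_index` and `column_to_name` are hypothetical, the alias table contains just the chr1/chr2 pair discussed in this thread, and the choice of 0-based indices is an assumption.

```python
# Hypothetical alias table; only the chr1/chr2 aliases come from this thread.
ALIASES = {"chr1": "chrom1", "chr2": "chrom2"}


def canonicalize_columns(column_names):
    """Map known aliases to one canonical spelling, leaving other names alone."""
    return [ALIASES.get(name, name) for name in column_names]


def column_to_index(col, column_names):
    """Accept an int, a numeric string or a column name; return a 0-based index."""
    if isinstance(col, int):
        return col
    if col.isnumeric():
        return int(col)
    # Canonicalize both sides so that chr1 and chrom1 resolve to the same column.
    return canonicalize_columns(column_names).index(ALIASES.get(col, col))


def column_to_name(col, column_names):
    """Accept an int, a numeric string or a column name; return the canonical name."""
    return canonicalize_columns(column_names)[column_to_index(col, column_names)]


columns = "readID chr1 pos1 chr2 pos2 strand1 strand2".split()
print(column_to_index("chrom1", columns))  # 1
print(column_to_index("chr2", columns))    # 3
print(column_to_name(3, columns))          # chrom2
```

With helpers along these lines, a CLI could take either form for its column flags and resolve both to the same column, whichever spelling the header uses.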
With chr1/chr2 in header:
This works:
This fails:
pip install pairtools==1.1.1  ## pairtools 1.1.0 errors with `circular import`
pairtools sort tmp.pairs
Changing chr1/chr2 to chrom1/chrom2 in the header:
This works:
This fails:
With pair_type:
This works:
This works: