chrom1, chrom2 and pair_type fields are now required in pairs file header #264

Open
js2264 opened this issue Feb 19, 2025 · 3 comments · May be fixed by #268

js2264 commented Feb 19, 2025

  • Up to v1.0.3, pairtools sort accepted a header line listing the column names chr1 and chr2 (as given in the official 4DN specs).
  • Starting with v1.1.0, pairtools sort expects the #columns header line to list chrom1 and chrom2, and fails if the header line is #columns: readID chr1 pos1 chr2 pos2 strand1 strand2.
  • It also seems to require pair_type to be present in the #columns header line, as well as in a column.

I understand that the chr1/chr2 mismatch can be circumvented by specifying the -c1 and -c2 fields on the command line, but if a pair_type column is not included, pairtools sort no longer works. Is this intended behavior? Sorry if I missed something or if this issue has already been raised.

Reproducible example

  1. Here is an unsorted pairs file I created by hand, with chr1/chr2 in header:
echo -e "## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246     --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --

This fails:

pip install pairtools==1.1.1   ## pairtools 1.1.0 errors with `circular import` 
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'chrom1' is not in list

  2. Now, changing chr1/chr2 to chrom1/chrom2 in the header:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp2.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp2.pairs 
# sorted pairs...

This fails:

pip install pairtools==1.1.1
pairtools sort tmp2.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'pair_type' is not in list

  3. Now, adding pair_type to the #columns line:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp3.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp3.pairs 
# sorted pairs...

This works:

pip install pairtools==1.1.1
pairtools sort tmp3.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246      --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --
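
The two tracebacks above can be reproduced outside pairtools. The sketch below copies the lookup expression shown at sort.py line 199 in the traceback. The list of default column names (`ASSUMED_DEFAULTS`), its order, and the helper names are assumptions inferred from the two error messages, not taken from the pairtools source.

```python
# Assumed defaults looked up by `pairtools sort` 1.1.x. The names and their
# order are inferred from the error messages in this issue (an assumption).
ASSUMED_DEFAULTS = ["chrom1", "chrom2", "pos1", "pos2", "pair_type"]


def parse_columns(header_line):
    """Return the column names listed on a '#columns:' header line."""
    prefix = "#columns:"
    if not header_line.startswith(prefix):
        raise ValueError("not a #columns header line")
    return header_line[len(prefix):].split()


def resolve_column(col, column_names):
    """Same expression as the one shown in the traceback (1-based index)."""
    return int(col) if col.isnumeric() else column_names.index(col) + 1


def first_unresolved(header_line, wanted=ASSUMED_DEFAULTS):
    """Return the first wanted column that cannot be resolved, or None."""
    names = parse_columns(header_line)
    for col in wanted:
        try:
            resolve_column(col, names)
        except ValueError:
            # list.index raises ValueError when the name is absent
            return col
    return None


if __name__ == "__main__":
    headers = [
        "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2",
        "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2",
        "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type",
    ]
    for header in headers:
        print(header, "->", first_unresolved(header))
```

Under these assumptions, the three headers from tmp.pairs, tmp2.pairs and tmp3.pairs stop at chrom1, stop at pair_type, and resolve fully, which matches the behavior reported for v1.1.1.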

golobor (Member) commented Mar 3, 2025

Hi, Jacques!
Thank you for this detailed report - thanks to you, we identified the source of both issues in the code.
Fixing them would require a bit of time, but is ultimately straightforward.

@Phlya , the plan would be:

  1. Modify flags -c1, -c2, ... so that they can accept both ints and strs, with the default value being expressed as an int.
  2. Write a headerops function that converts between int and str columns, plus another function that "canonicalizes" lists of column names (we'd need to decide which representation is better, probably strs?).
  3. Modify the code in the sort and dedup CLIs to use the functions introduced above.
  4. Make pair_type (pt) optional for sorting.
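
A minimal sketch of what the helpers in steps 1, 2 and 4 might look like. All function names here (`canonicalize_columns`, `column_to_index`, `column_to_name`, `optional_index`) are hypothetical and are not part of pairtools. The alias table contains only the chr1/chr2 spellings mentioned in this issue, and indices are 1-based to match the lookup in the traceback.

```python
# Hypothetical alias table; only chr1/chr2 come from this issue.
ALIASES = {"chr1": "chrom1", "chr2": "chrom2"}


def canonicalize_columns(column_names):
    """Map known alternative spellings to one canonical name."""
    return [ALIASES.get(name, name) for name in column_names]


def column_to_index(col, column_names):
    """Resolve an int, a numeric string, or a column name to a 1-based index."""
    if isinstance(col, int):
        index = col
    elif col.isdigit():
        index = int(col)
    else:
        canonical = canonicalize_columns(column_names)
        name = ALIASES.get(col, col)
        if name not in canonical:
            raise KeyError(f"column {col!r} not found in header")
        index = canonical.index(name) + 1
    if not 1 <= index <= len(column_names):
        raise IndexError(f"column index {index} is out of range")
    return index


def column_to_name(col, column_names):
    """Resolve an int, a numeric string, or a name to the canonical name."""
    index = column_to_index(col, column_names)
    return canonicalize_columns(column_names)[index - 1]


def optional_index(col, column_names):
    """Like column_to_index, but return None when a named column is absent."""
    try:
        return column_to_index(col, column_names)
    except KeyError:
        return None


if __name__ == "__main__":
    cols = "readID chr1 pos1 chr2 pos2 strand1 strand2".split()
    print(column_to_index("chrom1", cols))     # name given in canonical form
    print(column_to_name(4, cols))             # integer column id
    print(optional_index("pair_type", cols))   # absent, so optional
```

With helpers along these lines, a header that lists chr1/chr2 and has no pair_type would resolve the chromosome columns by either spelling and would skip the pair_type key instead of raising.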

ShigrafS commented

I'll look into it.

golobor (Member) commented May 12, 2025

After a discussion, we figured that a better solution is to provide two flags:

pairtools sort --by-column-id "2,3,4:n,5:n"

or

pairtools sort --by-column-name "chrom1,chrom2,pos1:n,pos2:n"

This way, we'll remove the flags with field names and extra columns, make sorting by pair_type optional, and enable other types of sorting.
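
To illustrate the proposed syntax, here is a rough sketch of how such a spec string could be parsed. The function names are hypothetical, the only modifier handled is the `:n` (numeric) suffix shown in the two examples, and the translation to Unix sort-style `-k` arguments is an assumption about how the keys would be consumed.

```python
def parse_sort_spec(spec, column_names=None):
    """Parse a spec like 'chrom1,chrom2,pos1:n,pos2:n' or '2,3,4:n,5:n'.

    Returns a list of (1-based column index, is_numeric) tuples. When
    column_names is None, fields are read as column ids; otherwise as names.
    """
    keys = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty sort key in spec")
        field, _, modifier = item.partition(":")
        if modifier not in ("", "n"):
            raise ValueError(f"unknown modifier {modifier!r} in {item!r}")
        if column_names is None:
            index = int(field)
        else:
            index = column_names.index(field) + 1
        keys.append((index, modifier == "n"))
    return keys


def to_unix_sort_keys(keys):
    """Render parsed keys as Unix sort-style -k arguments (an assumption)."""
    return [f"-k{index},{index}{'n' if numeric else ''}" for index, numeric in keys]


if __name__ == "__main__":
    cols = "readID chrom1 pos1 chrom2 pos2 strand1 strand2".split()
    by_name = parse_sort_spec("chrom1,chrom2,pos1:n,pos2:n", cols)
    by_id = parse_sort_spec("2,3,4:n,5:n")
    print(by_name, to_unix_sort_keys(by_name))
    print(by_id, to_unix_sort_keys(by_id))
```

Because only the listed keys are resolved, a file without a pair_type column would sort as long as pair_type is not named in the spec. With the header used in this issue, the name-based spec resolves to columns 2, 4, 3 and 5.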
