optimize intersect #140

Open
jkanche opened this issue Jan 12, 2025 · 1 comment
Labels
good first issue Good for newcomers

Comments

@jkanche
Member

jkanche commented Jan 12, 2025

2597392145 function calls (2597381786 primitive calls) in 462.936 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1778880648/1778880640  109.894    0.000  109.894    0.000 {built-in method builtins.isinstance}
      426  100.660    0.236  472.105    1.108 {built-in method time.sleep}
592950332   94.024    0.000  162.465    0.000 normalize_subscript.py:15(_is_scalar_bool)
     1341   67.611    0.050  184.089    0.137 normalize_subscript.py:66(normalize_subscript)
1675/1005   15.522    0.009  171.167    0.170 subset_sequence.py:5(subset_sequence)
      368   13.168    0.036   13.168    0.036 {method 'astype' of 'numpy.ndarray' objects}
     97/0   11.685    0.120    0.000          {method 'control' of 'select.kqueue' objects}
     97/0    7.023    0.072    0.000          selectors.py:558(select)
       12    6.456    0.538    8.244    0.687 GenomicRanges.py:438(get_seqnames)
       19    4.402    0.232    4.402    0.232 {built-in method numpy.asarray}
149778492    4.361    0.000    4.361    0.000 SeqInfo.py:316(get_seqnames)
 71711356    4.087    0.000    4.088    0.000 subset_sequence.py:28(<genexpr>)
       56    3.649    0.065    7.736    0.138 subset_sequence.py:26(_subset_sequence_list)
      355    2.886    0.008    2.886    0.008 {method 'flatten' of 'numpy.ndarray' objects}
        1    2.435    2.435    8.630    8.630 GenomicRanges.py:1658(reduce)
        3    2.291    0.764    5.784    1.928 GenomicRanges.py:1775(gaps)
      349    2.232    0.006    2.232    0.006 {method 'sort' of 'numpy.ndarray' objects}
      335    1.562    0.005    1.562    0.005 {method 'tolist' of 'numpy.ndarray' objects}
        6    1.021    0.170    2.308    0.385 GenomicRanges.py:1643(_group_indices_by_chrm)
        6    1.019    0.170    1.019    0.170 {method 'argsort' of 'numpy.ndarray' objects}
        6    0.937    0.156    0.937    0.156 {method 'cumsum' of 'numpy.ndarray' objects}
       98    0.775    0.008  944.683    9.640 base_events.py:1915(_run_once)
      355    0.749    0.002    3.551    0.010 _arraysetops_impl.py:339(_unique1d)
      355    0.537    0.002   17.633    0.050 _arraysetops_impl.py:145(unique)
      335    0.478    0.001  159.270    0.475 IRanges.py:490(__getitem__)
      2/1    0.467    0.234   21.839   21.839 GenomicRanges.py:2045(intersect)
  3273800    0.433    0.000    0.433    0.000 {method 'split' of 'str' objects}
        4    0.310    0.077    0.310    0.077 combine_sequences.py:44(_combine_sequences_lists)
      160    0.261    0.002    0.261    0.002 {built-in method iranges.lib_iranges.gaps_ranges}
      353    0.191    0.001    0.474    0.001 GenomicRanges.py:209(_sanitize_seqnames)
        2    0.186    0.093    4.785    2.393 GenomicRanges.py:3168(_fast_combine_GenomicRanges)
      825    0.181    0.000    0.208    0.000 IRanges.py:126(_validate_width)
     1670    0.173    0.000    0.173    0.000 {built-in method numpy.array}
        2    0.169    0.084  123.503   61.751 GenomicRanges.py:1730(range)
      105    0.154    0.001    0.154    0.001 {built-in method iranges.lib_iranges.reduce_ranges}
      667    0.146    0.000    0.146    0.000 {method 'extend' of 'list' objects}
      105    0.105    0.001    0.105    0.001 subset_sequence.py:31(_subset_sequence_range)
       21    0.094    0.004    0.094    0.004 combine_sequences.py:49(_combine_sequences_dense_arrays)
        1    0.075    0.075    9.121    9.121 GenomicRanges.py:1980(union)
     4499    0.069    0.000    0.069    0.000 {method 'reduce' of 'numpy.ufunc' objects}
       72    0.068    0.001    0.068    0.001 IRanges.py:299(get_end)
        1    0.058    0.058   20.912   20.912 GenomicRanges.py:2005(setdiff)
      349    0.058    0.000    6.670    0.019 GenomicRanges.py:123(__init__)
      350    0.054    0.000    0.067    0.000 GenomicRanges.py:31(_validate_seqnames)
   579671    0.046    0.000    0.046    0.000 {method 'append' of 'list' objects}
        6    0.018    0.003    0.018    0.003 {method 'copy' of 'numpy.ndarray' objects}
    10890    0.017    0.000    0.030    0.000 ipkernel.py:775(_clean_thread_parent_frames)
      346    0.012    0.000    0.016    0.000 GenomicRanges.py:66(_validate_optional_attrs)
      835    0.010    0.000    0.402    0.000 IRanges.py:62(__init__)
        1    0.010    0.010    0.365    0.365 GenomicRanges.py:3181(_combine_GenomicRanges)
      335    0.008    0.000  333.182    0.995 GenomicRanges.py:926(get_subset)
        9    0.006    0.001    0.148    0.016 IRanges.py:2450(_combine_IRanges)
     2475    0.006    0.000    0.018    0.000 fromnumeric.py:89(_wrapreduction_any_all)
      670    0.006    0.000  143.119    0.214 BiocFrame.py:684(get_slice)
       70    0.005    0.000    0.092    0.001 IRanges.py:795(range)
        2    0.005    0.002    0.005    0.002 {built-in method numpy.zeros}
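
For anyone reproducing a listing like the one above, here is a minimal sketch using the standard library's `cProfile` and `pstats`, sorted by internal time (`tottime`). The `workload` function is a hypothetical stand-in, not the code that produced this profile; swap in the actual `intersect` call.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for the call being profiled; replace it with
    # the real intersect call on the objects of interest.
    return sum(isinstance(i, bool) for i in range(100_000))


def profile_by_internal_time(func, limit=10):
    """Run func under cProfile and return a text report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # SortKey.TIME corresponds to the "Ordered by: internal time" header.
    stats.sort_stats(pstats.SortKey.TIME).print_stats(limit)
    return buffer.getvalue()


report = profile_by_internal_time(workload)
print(report)
```

Sorting by `tottime` rather than `cumtime` surfaces the leaf functions where time is actually spent, which is how the per-element `isinstance` and `_is_scalar_bool` calls stand out in the listing above.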
jkanche changed the title from "optimize cprofile for intersect" to "optimize intersect" on Jan 12, 2025
@jkanche
Member Author

jkanche commented Jan 21, 2025

A few changes that might improve performance:

  • Recycle parameters
  • Do not convert seqnames to strings for group-by operations
  • Find some sane way of mapping sequences and strands across objects
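
The second and third points could be sketched roughly as follows. This is an illustration only, not the library's implementation: it assumes seqnames are available as integer codes indexing a list of names, and `group_indices_by_code` and `remap_codes` are hypothetical helper names.

```python
import numpy as np


def group_indices_by_code(codes):
    """Group row indices by integer sequence code without building strings.

    Returns a dict mapping each code present to an array of row indices.
    """
    codes = np.asarray(codes)
    if codes.size == 0:
        return {}
    order = np.argsort(codes, kind="stable")  # row indices sorted by code
    sorted_codes = codes[order]
    # Positions where the sorted code changes mark the group boundaries.
    boundaries = np.flatnonzero(np.diff(sorted_codes)) + 1
    chunks = np.split(order, boundaries)
    starts = np.r_[0, boundaries]
    return {int(sorted_codes[s]): chunk for s, chunk in zip(starts, chunks)}


def remap_codes(codes, levels, target_levels):
    """Translate codes that index `levels` into codes that index `target_levels`.

    Strings are touched once per level, not once per range.
    """
    position = {name: i for i, name in enumerate(target_levels)}
    lookup = np.array([position[name] for name in levels], dtype=np.int64)
    return lookup[np.asarray(codes)]


# Usage with made-up data: two objects with different level orderings.
union_levels = ["chr1", "chr2", "chr3"]
codes_a = [2, 0, 2, 1, 0]                      # already indexes union_levels
codes_b = remap_codes([0, 1, 0], ["chr3", "chr1"], union_levels)
groups = group_indices_by_code(codes_a)
```

Once both objects share one set of levels, grouping and comparison can stay on integer arrays, and names only need to be looked up once per group when building the result.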

jkanche added the "good first issue" label on Jan 21, 2025