Skip to content

Conversation

@tlarock
Copy link

@tlarock tlarock commented May 6, 2019

This PR adds the ability to ignore selfloops in path data to the Paths.add_path instance function and Paths.read_file class function. It also removes some redundant code from paths.py by changing read_file to use add_path with the expand_subpaths option.

I implemented this because I have a dataset that includes (apparently) meaningless self-loops that I want to be able to remove/include in my pipeline without preprocessing the data manually. The two main changes are:

  1. Add remove_selfloops option to add_path with functionality that collapses consecutively repeated symbols into a single symbol. For example, the sequence ('a', 'a', 'b', 'b', 'c') will collapse to just ('a', 'b', 'c').
    • Note that since a sanity check in add_path is already looping through the elements of the path to ensure the separator character is safe to use, I only add comparisons and list appends to the loop so the change should not meaningfully impact computational complexity.
  2. Change Paths.read_file to call cls.add_path rather than reproducing the same functionality. Moves expand_subpaths functionality to add_path rather than calling it on the whole object at the end.

In working with this code, I couldn't understand why Paths.read_edges is static and Paths.read_file is class. I think it makes sense for them to be consistent and I think they should both be static, since they both populate and return a new Paths object. I can change the decorator in this PR if desired.

Let me know if you have questions/comments/changes!

Edit: Meant to include a note that this is completely separate from my other open PR (#47).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant