Add functionality to ignore selfloops #49
This PR adds the ability to ignore self-loops in path data to the `Paths.add_path` instance function and the `Paths.read_file` class function. It also removes some redundant code from `paths.py` by changing `read_file` to use `add_path` with the `expand_subpaths` option.

I implemented this because I have a dataset that includes (apparently) meaningless self-loops that I want to be able to remove or include in my pipeline without preprocessing the data manually. The two main changes are:

- Adds a `remove_selfloops` option to `add_path` that collapses consecutively repeated symbols into a single symbol. For example, the sequence ('a', 'a', 'b', 'b', 'c') will collapse to just ('a', 'b', 'c'). Because `add_path` is already looping through the elements of the path to ensure the separator character is safe to use, I only add comparisons and list appends to that loop, so the change should not meaningfully impact computational complexity.
- Changes `Paths.read_file` to call `cls.add_path` rather than reproducing the same functionality. Moves the `expand_subpaths` functionality to `add_path` rather than calling it on the whole object at the end.

In working with this code, I couldn't understand why `Paths.read_edges` is a static method while `Paths.read_file` is a class method. I think it makes sense for them to be consistent, and I think they should both be static, since they both populate and return a new `Paths` object. I can change the decorator in this PR if desired.

Let me know if you have questions/comments/changes!
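For reference, the collapsing behavior described above can be sketched roughly as follows. This is a standalone illustration, not the code in the diff: the function name `collapse_selfloops`, the `separator` default, and the `ValueError` are assumptions made for the example, and the real `add_path` may handle the separator check differently.

```python
def collapse_selfloops(path, separator=","):
    """Collapse consecutively repeated symbols in a single pass.

    Hypothetical helper for illustration only; it is not part of the
    library's API. The separator check stands in for the validation
    that the existing loop is described as performing.
    """
    collapsed = []
    for symbol in path:
        # Stand-in for the existing per-symbol separator safety check.
        if separator in str(symbol):
            raise ValueError(
                "symbol %r contains the separator %r" % (symbol, separator)
            )
        # Only append when the symbol differs from the previous one kept,
        # so runs of repeats collapse to a single occurrence.
        if not collapsed or collapsed[-1] != symbol:
            collapsed.append(symbol)
    return tuple(collapsed)


print(collapse_selfloops(("a", "a", "b", "b", "c")))  # ('a', 'b', 'c')
```

Only consecutive repeats are merged, so a path such as ('a', 'b', 'a') is left unchanged. The extra work per symbol is one comparison and at most one append, which is why the loop stays linear in the path length.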
Edit: Meant to include a note that this is completely separate from my other open PR (#47).