Skip to content

Conversation

@chief-dweeb
Copy link

I have gone through my copious TODO: comments and have dealt with the ones that seemed important to resolve before releasing it to alpha/beta users. I have also updated the function signature documentation for routines in graph.py and tree.py where it seemed most important. Also some cosmetic changes - longer variable names etc.

I have just redone performance tests - comparing NX performance against RX performance. There is one line in side _first_time() in partition.py that will revert partitions to use NX graphs internally. I use this to run the current code with an embedded graph that is either NX or RX, and then compare the results.

I have created an Excel spreadsheet that compares NX vs. RX, and I would be happy to share it with you, but the TLDR is that on a test that does 100,000 iterations in MarkovChain() the RX version runs more than twice as fast with essentially all of the speedup happening in random_spanning_tree() due to the underlying nxtree.minimum_spanning_tree() vs. rx.minimum_spanning_tree() which is an astonishingly 4 times faster than the NX version.

So - let me know what else you would like done before we let real users loose on this codebase...

-Fred

Fred Mueller added 6 commits November 5, 2025 13:29
  * Some changes are cosmetic
  * Some are results of resolving a TODO: item

All tests pass...
No new functionality in GerryChain...
with a subtag that said whether the TODO: was for Performance,
Documentation, Refactoring (basically any stylistic or software
engineering improvement that did not affect functionality), and Code
(which was a catchall for stuff that should be done sooner
rather than later for one reason or another).

This commit is just to allow a reviewer to be able to ignore
these changes as trivial.
updated function signature comments, lots of comments more
generally, updated TODO: comments (actually "did" a lot
of them)
Copy link
Collaborator

@peterrrock2 peterrrock2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Thank you for adding the tags to each of your "TODO" items.
  • There are a lot of missing doc strings (esp. in graph.py) but I think that we can worry about that more after a refactor and when we get closer to a full release

split = {name: 0 for name in all_reg_names}

# frm: TODO: Grok what this tests - not clear to me at this time...
# frm: TODO: Testing: Grok what this tests - not clear to me at this time...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a test; it's just a helper function and is not picked up by pytest. Tests are prefixed with the word "test" in the function name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point about it not being a test, but I still want to "grok" what it does ;-)

Comment on lines +296 to +297
subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch... I am not sure this test should remain in the test suite, though. I will take a look at it when I get around to cleaning up the testing stuff...

Comment on lines +252 to +284
def test_subgraph(four_by_five_graph_rx):

"""
Subgraphs are one of the most dangerous areas of the code.
In NX, subgraphs preserve node_ids - that is, the node_id
in the subgraph is the same as the node_id in the parent.
However, in RX, that is not the case - RX always creates
new node_ids starting at 0 and increasing by one
sequentially, so in general a node in an RX subgraph
will have a different node_id than it has in the parent
graph.
To deal with this, the code creates a map from the
node_id in a subgraph to the node_id in the parent
graph, _node_id_to_parent_node_id_map. This test verifies
that this map is properly created.
In addition, all RX based graphs that came from an NX
based graph record the "original" NX node_ids in
another node_id map, _node_id_to_original_nx_node_id_map
When we create a subgraph, this map needs to be
established for the subgraph. This test verifies
that this map is properly created.
Note that this test is only configured to work on
RX based Graph objects because the only uses of subgraph
in the gerrychain codebase is on RX based Graph objects.
"""

# Create a subgraph for an arbitrary set of nodes:
subgraph_node_ids = [2, 4, 5, 8, 11, 13]
parent_graph_rx = four_by_five_graph_rx # make the code below clearer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a fuzzed version of this where we do a random connected graph on 1000 nodes and then take random subgraphs of random size (say with between 100 and 200 nodes) and then repeat a few times.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - good idea. I will put a TODO: Testing: comment in the code to remind me to do this

Comment on lines +625 to +647
* Aliasing concerns:
It occurs to me that the RX node_data is aliased with the NX node_data.
That is, the data dictionaries in the NX Graph are just retained
when the NX Graph is converted to be an RX Graph - so if you change
the data in the RX Graph, the NX Graph from which we created the RX
graph will also be changed.
I believe that this is also true for subgraphs for both NX and RX,
meaning that the node_data in the subgraph is the exact same
data dictionary in the parent graph and the subgraph.
I am not sure if this is a problem or not, but it is something
to be tested / thought about...
* NX allows node_ids to be almost anything - they can be integers,
strings, even tuples. I think that they just need to be hashable.
I don't know if we need to test that non-integer NX node_ids
don't cause a problem. There are tests elsewhere that have
NX node_ids that are tuples, and that test passes, so I think
we are OK, but there are no tests specifically targeting this
issue that I know of.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we don't really need to worry about the aliasing problem here. Once they get passed to a recom method, we treat graphs as immutable, read-only structures. So long as a user doesn't try to do something silly in an acceptance function or in an updater, things should be fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is more about the data dictionaries (node data and edge data) as these dictionaries are also shared.

I agree that it is probably not a problem, but aliasing creeps me out, and I would like to make sure that we really are OK...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Comment on lines +72 to +74
# frm: TODO: Testing: how to handle geometry?
#
# Originally, the following statement blew up because we do not copy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably leave the geometry in a geodataframe attached as an attribute of the "graph" class rather than trying to add data onto the nodes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this entire comment - I went ahead and added geometry data to the RX-based Graph object, and it all just worked after that...

@peterrrock2 peterrrock2 merged commit 37e2983 into mggg:wip/rustworkx-migration Dec 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants