Frm rustworkx - cleanup... #432

chief-dweeb · 2025-12-08T20:30:42Z

I have gone through my copious TODO: comments and have dealt with the ones that seemed important to resolve before releasing it to alpha/beta users. I have also updated the function signature documentation for routines in graph.py and tree.py where it seemed most important. Also some cosmetic changes - longer variable names etc.

I have just redone performance tests - comparing NX performance against RX performance. There is one line in side _first_time() in partition.py that will revert partitions to use NX graphs internally. I use this to run the current code with an embedded graph that is either NX or RX, and then compare the results.

I have created an Excel spreadsheet that compares NX vs. RX, and I would be happy to share it with you, but the TLDR is that on a test that does 100,000 iterations in MarkovChain() the RX version runs more than twice as fast with essentially all of the speedup happening in random_spanning_tree() due to the underlying nxtree.minimum_spanning_tree() vs. rx.minimum_spanning_tree() which is an astonishingly 4 times faster than the NX version.

So - let me know what else you would like done before we let real users loose on this codebase...

-Fred

* Some changes are cosmetic * Some are results of resolving a TODO: item All tests pass...

No new functionality in GerryChain...

with a subtag that said whether the TODO: was for Performance, Documentation, Refactoring (basically any stylistic or software engineering improvement that did not affect functionality), and Code (which was a catchall for stuff that should be done sooner rather than later for one reason or another). This commit is just to allow a reviewer to be able to ignore these changes as trivial.

updated function signature comments, lots of comments more generally, updated TODO: comments (actually "did" a lot of them)

peterrrock2

Thank you for adding the tags to each of your "TODO" items.
There are a lot of missing doc strings (esp. in graph.py) but I think that we can worry about that more after a refactor and when we get closer to a full release

peterrrock2 · 2025-12-10T17:07:08Z

tests/test_region_aware.py

    split = {name: 0 for name in all_reg_names}

-    # frm: TODO: Grok what this tests - not clear to me at this time...
+    # frm: TODO: Testing: Grok what this tests - not clear to me at this time...


This isn't a test; it's just a helper function and is not picked up by pytest. Tests are prefixed with the word "test" in the function name.

Fair point about it not being a test, but I still want to "grok" what it does ;-)

peterrrock2 · 2025-12-10T18:11:15Z

tests/test_frm_graph.py

+        subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
+        subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]


Suggested change

subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]

subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]

subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]

Good catch... I am not sure this test should remain in the test suite, though. I will take a look at it when I get around to cleaning up the testing stuff...

peterrrock2 · 2025-12-10T18:21:28Z

tests/test_frm_graph.py

+def test_subgraph(four_by_five_graph_rx):
+
+    """
+    Subgraphs are one of the most dangerous areas of the code.
+    In NX, subgraphs preserve node_ids - that is, the node_id
+    in the subgraph is the same as the node_id in the parent.
+    However, in RX, that is not the case - RX always creates
+    new node_ids starting at 0 and increasing by one 
+    sequentially, so in general a node in an RX subgraph 
+    will have a different node_id than it has in the parent
+    graph.
+
+    To deal with this, the code creates a map from the 
+    node_id in a subgraph to the node_id in the parent
+    graph, _node_id_to_parent_node_id_map.  This test verifies
+    that this map is properly created.
+
+    In addition, all RX based graphs that came from an NX
+    based graph record the "original" NX node_ids in 
+    another node_id map, _node_id_to_original_nx_node_id_map
+
+    When we create a subgraph, this map needs to be 
+    established for the subgraph.  This test verifies
+    that this map is properly created.
+
+    Note that this test is only configured to work on
+    RX based Graph objects because the only uses of subgraph 
+    in the gerrychain codebase is on RX based Graph objects.
+    """
+
+    # Create a subgraph for an arbitrary set of nodes:
+    subgraph_node_ids = [2, 4, 5, 8, 11, 13]
+    parent_graph_rx = four_by_five_graph_rx     # make the code below clearer


We should probably add a fuzzed version of this where we do a random connected graph on 1000 nodes and then take random subgraphs of random size (say with between 100 and 200 nodes) and then repeat a few times.

Yep - good idea. I will put a TODO: Testing: comment in the code to remind me to do this

peterrrock2 · 2025-12-10T18:29:07Z

tests/test_frm_graph.py

+  * Aliasing concerns:
+
+    It occurs to me that the RX node_data is aliased with the NX node_data.
+    That is, the data dictionaries in the NX Graph are just retained
+    when the NX Graph is converted to be an RX Graph - so if you change
+    the data in the RX Graph, the NX Graph from which we created the RX
+    graph will also be changed.
+
+    I believe that this is also true for subgraphs for both NX and RX,
+    meaning that the node_data in the subgraph is the exact same
+    data dictionary in the parent graph and the subgraph.
+
+    I am not sure if this is a problem or not, but it is something
+    to be tested / thought about...
+
+  * NX allows node_ids to be almost anything - they can be integers,
+    strings, even tuples.  I think that they just need to be hashable.
+
+    I don't know if we need to test that non-integer NX node_ids 
+    don't cause a problem.  There are tests elsewhere that have
+    NX node_ids that are tuples, and that test passes, so I think
+    we are OK, but there are no tests specifically targeting this
+    issue that I know of.


I think that we don't really need to worry about the aliasing problem here. Once they get passed to a recom method, we treat graphs as immutable, read-only structures. So long as a user doesn't try to do something silly in an acceptance function or in an updater, things should be fine.

The issue is more about the data dictionaries (node data and edge data) as these dictionaries are also shared.

I agree that it is probably not a problem, but aliasing creeps me out, and I would like to make sure that we really are OK...

peterrrock2 · 2025-12-10T18:29:51Z

tests/README.txt

Sounds good!

peterrrock2 · 2025-12-10T18:33:49Z

tests/partition/test_plotting.py

+        # frm: TODO: Testing: how to handle geometry?
+        # 
+        # Originally, the following statement blew up because we do not copy


We'll probably leave the geometry in a geodataframe attached as an attribute of the "graph" class rather than trying to add data onto the nodes.

I deleted this entire comment - I went ahead and added geometry data to the RX-based Graph object, and it all just worked after that...

gerrychain/proposals/tree_proposals.py

Fred Mueller added 6 commits November 5, 2025 13:29

CLEAN-UP work: no new functionality

a81233f

* Some changes are cosmetic * Some are results of resolving a TODO: item All tests pass...

Added a test file and a README.txt file for tests.

da18c77

No new functionality in GerryChain...

Mostly a "cleanup" set of changes - code readability,

10bfc86

updated function signature comments, lots of comments more generally, updated TODO: comments (actually "did" a lot of them)

Added two performance tests - using cProfile

4d1d1f2

Trivial one-line change in a comment...

195720e

peterrrock2 reviewed Dec 10, 2025

View reviewed changes

peterrrock2 merged commit 37e2983 into mggg:wip/rustworkx-migration Dec 10, 2025
1 check passed

		subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]
		subgraph_stored_node_id = subgraph_rx.node_data(subgraph_node_id)["nx_node_id"]

Frm rustworkx - cleanup... #432

Frm rustworkx - cleanup... #432

Uh oh!

Conversation

chief-dweeb commented Dec 8, 2025

Uh oh!

peterrrock2 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants