Attempt to fix 3489 #3533

Draft: wants to merge 16 commits into master

Conversation

heplesser (Contributor) commented:

This PR is an attempt, so far not entirely successful, to solve #3489, i.e., to address the problem that spikes are delivered by the event delivery manager after a connection has been removed. See #3489 for a reproducer (not added here as a test yet). I am posting this now so that further work can build on my attempts.

My approach is to run update_connection_infrastructure() from the main update_() loop, after events have been delivered and structural plasticity (SP) has done its work. In addition, the connection infrastructure must be built once before the first simulation step, so that waveform relaxation (WFR) can already work in the first time slice. This is unproblematic, since no spikes can be available for delivery at time 0.

Most tests pass, but I needed to adapt some: a few worked more by coincidence than robustly, and others needed changes so that comparisons of SynapseCollections no longer depend on the order of synapses in the collection. I also ignore port in almost all comparisons, because it changes with connection infrastructure updates and is an internal implementation detail. See the changes in testsuite for how I modified the tests.
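
For the adapted tests, the comparison boils down to roughly the following sketch (the helper name is mine, not the actual testsuite code): build an unordered set of (source, target, synapse_model) tuples per collection and compare those, deliberately leaving out port.

    # Sketch only: order-insensitive SynapseCollection comparison that ignores 'port'.
    def conns_equal_ignoring_order_and_port(conns_a, conns_b):
        keys = ["source", "target", "synapse_model"]

        def as_set(conns):
            data = conns.get(keys)  # dict of lists (scalars for a single connection)
            cols = [data[k] if isinstance(data[k], (list, tuple)) else [data[k]] for k in keys]
            return set(zip(*cols))

        return as_set(conns_a) == as_set(conns_b)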

I finally ran into trouble that I haven't fully solved yet with test_spike_transmission_after_disconnect(), which does the following:

    import nest

    n = nest.Create("parrot_neuron", 10)
    nest.Connect(n, n)  # default all_to_all rule: 100 connections

    c = nest.GetConnections()
    c[::3].disconnect()  # remove every third connection

    g = nest.Create("spike_generator", params={"spike_times": [1]})
    nest.Connect(g, n)

    nest.Simulate(3)

This fails on the disconnect() step, claiming that the connection to be deleted does not exist. This is somehow related to ConnectionManager::find_connection(). Introducing nest.Simulate(0.1) before GetConnections(), and thus forcing the connection infrastructure to be updated, solves the problem and makes the test pass. Note that this test works in master only if NEST uses compressed spikes; otherwise it fails in the same way as on this branch (see #3532).
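
Spelled out, the workaround described above looks like this (ResetKernel added to make the snippet self-contained):

    import nest

    nest.ResetKernel()

    n = nest.Create("parrot_neuron", 10)
    nest.Connect(n, n)

    nest.Simulate(0.1)  # workaround: forces the connection infrastructure update

    c = nest.GetConnections()
    c[::3].disconnect()

    g = nest.Create("spike_generator", params={"spike_times": [1.0]})
    nest.Connect(g, n)

    nest.Simulate(3)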

The following tests currently fail; I haven't had time to investigate why:

- install.share.nest.testsuite.pytests.test_spike_transmission_after_disconnect.test_spike_transmission_after_disconnect
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_GetConnectionsOnSubset
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_GetConnectionsSynapse
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_basic
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_get
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_getWithPandasOutput
- install.share.nest.testsuite.pytests.test_synapsecollection.TestSynapseCollection.test_string
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_disconnect_all_to_all
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_disconnect_defaults
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_disconnect_static_synapse
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_multiple_synapse_deletion_all_to_all
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_multiple_synapse_deletion_one_to_one
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_multiple_synapse_deletion_one_to_one_no_sp
- install.share.nest.testsuite.pytests.test_sp.test_disconnect_multiple.TestDisconnect.test_single_synapse_deletion_sp
- install.share.nest.testsuite.pytests.test_spatial.test_dumping.DumpingTestCase.test_DumpConns
- install.share.nest.testsuite.pytests.sli2py_mpi.test_issue_1957.test_issue_1957
- install.share.nest.testsuite.pytests.sli2py_mpi.test_self_get_conns_with_empty_ranks.test_get_conns_with_empty_ranks

@heplesser added the labels T: Bug, S: High, and I: Behavior changes on Jul 13, 2025
@heplesser added this to the Kernel project on Jul 13, 2025
@github-project-automation (bot) moved this to In progress in Kernel on Jul 13, 2025
heplesser (Contributor, Author) commented:

I just pushed another update fixing one corner case and added a regression test. Besides the problem with test_spike_transmission_after_disconnect mentioned above, there is also a deadlock issue in some MPI-based tests, which leads to timeouts. This happens if one rank has connections and the other doesn't. One example is test_issue_3099.py, which does the following:

    import nest

    nrn = nest.Create("parrot_neuron")
    nest.Connect(nrn, nrn)
    conns = nest.GetConnections()  # on 2 MPI ranks, only one rank owns the connection
    if conns:  # an empty SynapseCollection is falsy, so only that rank enters the branch
        conns.weight = 2.5

Here, GetConnections() completes fine. Running on two MPI ranks, only one rank will have a connection in conns; for the other, the SynapseCollection is empty. Therefore, only one rank enters the if. Now the following happens:

  1. At the Python level, we call SynapseCollection.set(), which begins with the guard

         if self.__len__() == 0 or GetKernelStatus("network_size") == 0:

  2. The rank without a connection returns here immediately, while the rank with a connection calls GetKernelStatus().
  3. GetKernelStatus() calls update_delay_extrema_(), which initiates MPI communication if and only if connections_have_changed() is true:

         if ( not kernel().connection_manager.get_user_set_delay_extrema()
              and kernel().connection_manager.connections_have_changed()
              and kernel().mpi_manager.get_num_processes() > 1 )

Now, in current master, because GetConnections() triggers an update of the connection infrastructure, connections_have_changed() returns false on both ranks at this point, so neither rank tries to communicate and all is well. But in my PR this is no longer the case, so only the rank with connections tries to communicate here, and then we are deadlocked. However, GetConnections() must not update the connection infrastructure, because that would lead to undeliverable spikes after disconnection.
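
The deadlock mechanism itself is generic and independent of NEST: a collective MPI operation entered on only some ranks blocks those ranks forever. A minimal standalone illustration with mpi4py (not NEST code; run with something like mpirun -np 2 python hang.py):

    # Only rank 0 enters the collective call, so it waits forever for the other
    # ranks, just like the rank that reaches update_delay_extrema_() above.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    have_local_data = comm.Get_rank() == 0  # stands in for "this rank has connections"

    if have_local_data:
        total = comm.allreduce(1, op=MPI.SUM)  # collective: needs all ranks to participate
        print(f"rank {comm.Get_rank()}: total = {total}")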

A "workaround" is to call Simulate(0.1) after creating the connections, because then the delay extrema are fixed and update_delay_extrema_() is skipped.

Clearly, in this case, where we only need the network size, no MPI communication would be necessary at all; we only trigger it because the sole way to get data out of the kernel is GetKernelStatus(), and that always returns everything. Incidentally, SynapseCollection.set() later calls SynapseCollection.get() via

    node_params = self[0].get()

which goes through the same GetKernelStatus() dance all over again:

    if self.__len__() == 0 or GetKernelStatus("network_size") == 0:

By the way, this code is the same in PyNEST-NG for now.
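
For illustration only (not part of this PR; the function name is made up): if such a guard has to touch GetKernelStatus() at all, the call would need to happen on every rank before any rank bails out on its local length, so that whatever MPI communication it triggers stays collective. A sketch of a rank-symmetric variant:

    import nest

    def set_weights_rank_symmetric(conns, weight):
        # Query the kernel first, on every rank, so that any MPI communication
        # triggered inside GetKernelStatus() happens on all ranks; only then
        # bail out based on the purely local length of the SynapseCollection.
        network_is_empty = nest.GetKernelStatus("network_size") == 0
        if network_is_empty or len(conns) == 0:
            return
        conns.weight = weight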

heplesser (Contributor, Author) commented:

I manually canceled the Actions workflow because the remaining cases had gotten stuck and our timeout mechanisms don't seem to work for MPI-based tests right now.
