Skip to content

Conversation

@urmastalimaa
Copy link

Fixes a data-race between an ets read from lookup/2 and the ets write from the 'DOWN' handler of a registered process which causes intermittent failures in supervision strategies.

In practice, the problem can occur quite often in tests which assert the init behaviour of gen_server's. In production, the already_started errors can cause a single failure to cascade and restart the parent supervisor.

A similar approach is taken in Elixir's Registry (also backed by ets)

Fixes #48

]},
{four_nodes_registry, [shuffle], [
four_nodes_concurrency
one_node_strict_mode,
Copy link
Owner

@ostinelli ostinelli Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was moved here, maybe by mistake?

{four_nodes_registry, [shuffle], [
four_nodes_concurrency
one_node_strict_mode,
one_node_repeated_restart
Copy link
Owner

@ostinelli ostinelli Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if inside of this test group, we have a four nodes setup, can you please rename this with the convention of {number_of_nodes}_{test_name}? Unless this really is a one node, because I don't see a setup for 4 nodes in the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urmastalimaa urmastalimaa force-pushed the fix_already_started_when_registered_process_restarted branch 3 times, most recently from 1953cfd to 4e1b194 Compare March 25, 2025 17:17
@urmastalimaa
Copy link
Author

I added a check to avoid a rpc, as the aliveness check is necessary for only local processes:
https://github.com/ostinelli/syn/compare/872f137c5da0fc4978e1e6fe545b33c058ca09f6..4e1b194012e8be6b0253375c2dd9a4b4196cda41

Fixes a data-race between an ets read from lookup/2 and the ets write
from the 'DOWN' handler of a registered process which causes
intermittent failures in supervision strategies.

In practice, the problem can occur quite often in tests which assert the
init behaviour of gen_server's. In production, the `already_started`
errors can cause a single failure to cascade and restart the parent
supervisor.

A similar approach is taken in [Elixir's Registry](https://github.com/elixir-lang/elixir/blob/e35ffc5a903bff3b595e323eb1ac12c4ecd515ad/lib/elixir/lib/registry.ex#L243) (also backed by ets).

Fixes ostinelli#48
@urmastalimaa urmastalimaa force-pushed the fix_already_started_when_registered_process_restarted branch from 4e1b194 to a3f5c66 Compare March 28, 2025 06:58
@urmastalimaa
Copy link
Author

Hi!

When you have a chance, could you please re-review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

syn doesn't seem to deregister a process in some cases

2 participants