Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions when reusing watches #53

Open
balena opened this issue Jun 23, 2022 · 2 comments
Open

Race conditions when reusing watches #53

balena opened this issue Jun 23, 2022 · 2 comments

Comments

@balena
Copy link
Contributor

balena commented Jun 23, 2022

How to reproduce:

Suppose you have a function containing:

FirstWatchRequest0 = eetcd_watch:new(),
FirstWatchRequest1 = eetcd_watch:with_key(FirstWatchRequest0, Key1),
FirstWatchRequest = eetcd_watch:with_prefix(FirstWatchRequest1),
{ok, WatchConn, _} = eetcd_watch:watch(EtcdConn, FirstWatchRequest),

SecondWatchRequest0 = eetcd_watch:new(),
SecondWatchRequest1 = eetcd_watch:with_key(SecondWatchRequest0, Key2),
SecondWatchRequest = eetcd_watch:with_prefix(SecondWatchRequest1),
eetcd_watch:watch(WatchConn, SecondWatchRequest).

Then suppose Key1 is a "busy" prefix at the etcd cluster (i.e. it has many keys underneath and/or receives many updates per second). Then there is a high chance that the second eetcd_watch:watch call that receives the existing watch crashes like:

** (MatchError) no match of right hand side value: {:ok, %{cancel_reason: "", canceled: false, compact_revision: 0, created: false, events: [%{kv: %{create_revision: 38458213, key: "/foo", lease: 4658271320502330774, mod_revision: 48536024, value: "bar", version: 583}, type: :PUT}], fragment: false, header: %{cluster_id: 16182920199522267672, member_id: 6198688855164797093, raft_term: 13, revision: 48536024}, watch_id: 0}, ""}
    (eetcd 0.3.5) /path/to/eetcd/src/eetcd_watch.erl:225: :eetcd_watch.watch_reuse_/3

It seems to happen because at the time the second watch request is sent, the server already sent a change event.

@balena
Copy link
Contributor Author

balena commented Jun 23, 2022

I confirm that the above is indeed a problem with the implementation. The etcd v3 API states that:

A watch stream is bi-directional; the client writes to the stream to establish watches and reads to receive watch events.
Reference: https://etcd.io/docs/v3.5/learning/api/#watch-streams

However the Erlang function sends a request and actively waits for a response for that request.

In order to implement this requirement properly without losing watch events, it will require changes at the interface level. There should exist one function to send the watch request, while the response of it shall come from eetcd_watch:watch_stream.

@belltoy
Copy link
Contributor

belltoy commented Dec 10, 2022

The eetcd_watch module is just a low-level, a wrapper to communicate with the etcd gRPC APIs. The re-use of the watching stream provided by this module is also a low-level wrapper, actually.

IMO the better fix for this module is to return the not matched responses (the changed events) to the caller. Let the caller decide how to handle that changes, like drop or take actions. This should need an API level breaking change, likely.

In addition, maybe this lib can also provide a real high-level watcher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants