Race conditions when reusing watches #53

balena · 2022-06-23T15:36:37Z

How to reproduce:

Suppose you have a function containing:

FirstWatchRequest0 = eetcd_watch:new(),
FirstWatchRequest1 = eetcd_watch:with_key(FirstWatchRequest0, Key1),
FirstWatchRequest = eetcd_watch:with_prefix(FirstWatchRequest1),
{ok, WatchConn, _} = eetcd_watch:watch(EtcdConn, FirstWatchRequest),

SecondWatchRequest0 = eetcd_watch:new(),
SecondWatchRequest1 = eetcd_watch:with_key(SecondWatchRequest0, Key2),
SecondWatchRequest = eetcd_watch:with_prefix(SecondWatchRequest1),
eetcd_watch:watch(WatchConn, SecondWatchRequest).

Then suppose Key1 is a "busy" prefix at the etcd cluster (i.e. it has many keys underneath and/or receives many updates per second). Then there is a high chance that the second eetcd_watch:watch call that receives the existing watch crashes like:

** (MatchError) no match of right hand side value: {:ok, %{cancel_reason: "", canceled: false, compact_revision: 0, created: false, events: [%{kv: %{create_revision: 38458213, key: "/foo", lease: 4658271320502330774, mod_revision: 48536024, value: "bar", version: 583}, type: :PUT}], fragment: false, header: %{cluster_id: 16182920199522267672, member_id: 6198688855164797093, raft_term: 13, revision: 48536024}, watch_id: 0}, ""}
    (eetcd 0.3.5) /path/to/eetcd/src/eetcd_watch.erl:225: :eetcd_watch.watch_reuse_/3

It seems to happen because at the time the second watch request is sent, the server already sent a change event.

The text was updated successfully, but these errors were encountered:

balena · 2022-06-23T18:09:46Z

I confirm that the above is indeed a problem with the implementation. The etcd v3 API states that:

A watch stream is bi-directional; the client writes to the stream to establish watches and reads to receive watch events.
Reference: https://etcd.io/docs/v3.5/learning/api/#watch-streams

However the Erlang function sends a request and actively waits for a response for that request.

In order to implement this requirement properly without losing watch events, it will require changes at the interface level. There should exist one function to send the watch request, while the response of it shall come from eetcd_watch:watch_stream.

belltoy · 2022-12-10T11:57:20Z

The eetcd_watch module is just a low-level, a wrapper to communicate with the etcd gRPC APIs. The re-use of the watching stream provided by this module is also a low-level wrapper, actually.

IMO the better fix for this module is to return the not matched responses (the changed events) to the caller. Let the caller decide how to handle that changes, like drop or take actions. This should need an API level breaking change, likely.

In addition, maybe this lib can also provide a real high-level watcher.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race conditions when reusing watches #53

Race conditions when reusing watches #53

balena commented Jun 23, 2022 •

edited

Loading

balena commented Jun 23, 2022 •

edited

Loading

belltoy commented Dec 10, 2022 •

edited

Loading

Race conditions when reusing watches #53

Race conditions when reusing watches #53

Comments

balena commented Jun 23, 2022 • edited Loading

balena commented Jun 23, 2022 • edited Loading

belltoy commented Dec 10, 2022 • edited Loading

balena commented Jun 23, 2022 •

edited

Loading

balena commented Jun 23, 2022 •

edited

Loading

belltoy commented Dec 10, 2022 •

edited

Loading