Race condition in onSnapshot code crashes the process #2280
Labels: api: firestore, priority: p2, type: bug
Bottom line up front
When the conditions are just right, a gRPC stream (underlying an onSnapshot query) can emit an error event while no error handler is attached, killing the Node process. It's rare, but our server's heavy traffic causes it to happen multiple times per day.
Is this a client library issue or a product issue?
Client library. A transient error isn't properly handled by the library under certain conditions.
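For readers unfamiliar with why a missing handler is fatal: in Node.js, emitting an error event on an EventEmitter (streams included) with no error listener attached throws the error, and if nothing catches it the process exits. The sketch below shows only that general behavior with a plain EventEmitter; it does not touch Firestore, and the helper name emitErrorThrows is made up for illustration.

```typescript
import { EventEmitter } from 'events';

// Hypothetical helper: reports whether emitting 'error' on the emitter throws.
// In a real server the throw would surface as an uncaught exception.
function emitErrorThrows(emitter: EventEmitter): boolean {
  try {
    emitter.emit('error', new Error('transient network failure'));
    return false;
  } catch {
    return true;
  }
}

// No 'error' listener attached: emit() throws the error itself.
const unhandled = new EventEmitter();

// An 'error' listener is attached: emit() just calls it.
const handled = new EventEmitter();
handled.on('error', () => {
  // A real handler would log or retry here.
});

const unhandledThrows = emitErrorThrows(unhandled);
const handledThrows = emitErrorThrows(handled);
```

Here the throw is caught by the helper; in the situation described in this issue there is no surrounding try/catch, so the error reaches the top level.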
Did someone already solve this?
None of my searches have yielded results.
Do you have a support contract?
I don't
Environment details
@google-cloud/firestore version: 7.4.0
Steps to reproduce
While I can't write code that reliably reproduces the problem (it's a low probability race condition), I can point to a code path that is hit and support my hypothesis with logs emitted by our server.
https://github.com/googleapis/nodejs-firestore/blob/main/dev/src/watch.ts#L483-L513
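To make the shape of the problem easier to follow, here is a heavily simplified sketch of the pattern I believe is at play. It is not the code at the link above: adoptStream, lateErrorCrashes, and the guardEarlyExit flag are hypothetical names, a plain EventEmitter stands in for the gRPC stream, and the no-op handler is only one assumed way to close the gap, not a proposed patch.

```typescript
import { EventEmitter } from 'events';

// Hypothetical simplification; NOT the library's implementation.
// Takes ownership of a freshly opened stream.
function adoptStream(
  backendStream: EventEmitter,
  isActive: boolean,
  guardEarlyExit: boolean,
  onError: (err: Error) => void
): void {
  if (!isActive) {
    if (guardEarlyExit) {
      // Assumed mitigation: a no-op handler so a late 'error' is ignored.
      backendStream.on('error', () => {});
    }
    backendStream.emit('end');
    return; // Early exit: the handler below is never attached.
  }
  backendStream.on('error', onError);
}

// Simulates a transient error arriving after the early exit.
// Returns true if the emit throws, i.e. would crash a real process.
function lateErrorCrashes(guardEarlyExit: boolean): boolean {
  const stream = new EventEmitter();
  adoptStream(stream, false, guardEarlyExit, () => {});
  try {
    stream.emit('error', new Error('transient network failure'));
    return false;
  } catch {
    return true;
  }
}

const crashesWithoutGuard = lateErrorCrashes(false);
const crashesWithGuard = lateErrorCrashes(true);
```

The point of the sketch is only that any path which returns before attaching an error handler leaves the stream able to take the process down later.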
.isActive field is flipped back to false. Subsequently, the code enters the if clause (2). Note that, in this path, we exit the function at (3) and backendStream doesn't get an error handler like it does in (4).
backendStream must emit an error (it can be anything, like a transient networking issue). The fact that it receives an end event earlier doesn't preclude it from doing so.
Logs (I have selectively enabled some of Firestore's internal logs to be formatted and logged to the console):