-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve heartbeat #259
Improve heartbeat #259
Conversation
I tested the changes and the described failure happens not longer. I did not check the source code in detail. I tested with the following configuration of the heartbeat:
and got the following Dump in wireshark in my local dev env: The black marked packages are authentication and data initalization of the frontend. The blue marked packages are the Ping/Pong heartbeat messages from the backend, controlled by the python As visible the first ping messages is send by the fontend after 30 seconds (timeout). Any further ping message is sent after an intervall of 20 seconds. I asume this is the intended behaviour? There are no disconnectes longer as described in the linked issue. From a functional side the changes are good to go. Edit: Oops... Looks like some other problems occurs after some minutes of an established connection. |
src/lib/heartbeat.ts
Outdated
const heartbeatInterval = setInterval(() => { | ||
if (lastMessageTime.current + timeout <= Date.now()) { | ||
console.warn(`Heartbeat timed out, closing connection, last message was seen ${Date.now() - lastMessageTime.current}ms ago`); | ||
const timeNow = Date.now(); | ||
|
||
if (lastMessageTime.current + timeout <= timeNow) { | ||
console.warn(`Heartbeat timed out, closing connection, last message was seen ${timeNow - lastMessageTime.current}ms ago`); | ||
ws.close(); | ||
} else { | ||
if (lastMessageTime.current + interval <= Date.now()) { | ||
if (lastMessageTime.current + interval <= timeNow && lastPingSent + interval <= timeNow) { | ||
try { | ||
if (typeof message === 'function') { | ||
ws.send(message()); | ||
} else { | ||
ws.send(message); | ||
} | ||
lastPingSent = timeNow; | ||
} catch (err: unknown) { | ||
console.error(`Heartbeat failed, closing connection`, err instanceof Error ? err.message : err); | ||
ws.close(); | ||
} | ||
|
||
} | ||
} | ||
}, interval); | ||
}, intervalCheck); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm missing the point, but is there a reason to combine these two checks into the same logic? My understanding is that the bug is caused when the interval and timeouts are too close together, but it seems to me that we can avoid some of the confusion by handling their logic separately. For the heartbeat timeout:
//attach-shared-listeners.ts
const bindMessageHandler = (
webSocketInstance: WebSocketLike,
url: string,
heartbeatOptions?: boolean | HeartbeatOptions
) => {
let heartbeatTimeoutMs = typeof heartbeatOptions !== 'boolean' ? heartbeatOptions?.timeout : undefined;
let heartbeatTimeout: NodeJS.Timeout;
if (heartbeatTimeoutMs) {
webSocketInstance.addEventListener("close", () => {
if (heartbeatTimeout) {
clearTimeout(heartbeatTimeout);
}
});
}
webSocketInstance.onmessage = (message: WebSocketEventMap['message']) => {
if (heartbeatTimeoutMs) {
clearTimeout(heartbeatTimeout);
heartbeatTimeout = setTimeout(() => {
webSocketInstance.close();
}, heartbeatTimeoutMs);
}
getSubscribers(url).forEach(subscriber => {
if (subscriber.optionsRef.current.onMessage) {
subscriber.optionsRef.current.onMessage(message);
}
if (typeof subscriber?.lastMessageTime?.current === 'number') {
subscriber.lastMessageTime.current = Date.now();
}
if (
typeof subscriber.optionsRef.current.filter === 'function' &&
subscriber.optionsRef.current.filter(message) !== true
) {
return;
}
if (
heartbeatOptions &&
typeof heartbeatOptions !== "boolean" &&
heartbeatOptions?.returnMessage === message.data
)
return;
subscriber.setLastMessage(message);
});
};
};
And for the interval:
//attach-shared-listeners.ts
const bindOpenHandler = (
webSocketInstance: WebSocketLike,
url: string,
heartbeatOptions?: boolean | HeartbeatOptions
) => {
let heartbeatIntervalMs = typeof heartbeatOptions !== 'boolean' ? heartbeatOptions?.interval : undefined;
let heartbeatInterval: NodeJS.Timeout;
if (heartbeatIntervalMs) {
webSocketInstance.addEventListener("close", () => {
if (heartbeatInterval) {
clearInterval(heartbeatInterval);
}
});
}
webSocketInstance.onopen = (event: WebSocketEventMap['open']) => {
if (heartbeatOptions !== undefined && typeof heartbeatOptions !== 'boolean' && heartbeatIntervalMs && webSocketInstance instanceof WebSocket) {
heartbeatInterval = setInterval(() => {
const {
message = DEFAULT_HEARTBEAT.message
} = heartbeatOptions;
webSocketInstance.send(typeof message === 'function' ? message() : message)
}, heartbeatIntervalMs)
}
getSubscribers(url).forEach(subscriber => {
subscriber.reconnectCount.current = 0;
if (subscriber.optionsRef.current.onOpen) {
subscriber.optionsRef.current.onOpen(event);
}
subscriber.setReadyState(ReadyState.OPEN);
let onMessageCb: () => void;
//would remove the logic from here
if (heartbeatOptions && webSocketInstance instanceof WebSocket) {
subscriber.lastMessageTime.current = Date.now();
heartbeat(webSocketInstance, subscriber.lastMessageTime, typeof heartbeatOptions === 'boolean' ? undefined : heartbeatOptions,);
}
});
};
};
I like having this logic abstracted into its own file, but I kept it in one place here for demonstration purposes.
I also have a question about whether it's better to lift the heartbeat logic outside of the subscriber step (as I indicated by comment in the above example) -- the way I see it, sharing a single connection between multiple "subscribers" is a front-end implementation detail/optimization. But handling the heartbeat at the subscriber level means that we are sending multiple heartbeat messages for a single websocket connection, which an api might reject/not handle predictably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, will refactor this a bit later this week. Thanks for your feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored it a bit, now it does not create separate interval for each subscriber in shared mode, so only one ping message regardless of the number of subscribers. New test validates this behavior.
Corrected interval in one existing test, but just to make sure it gets expected number of ping messages.
Could you please retest it? New implementation is more correct, @robtaussig helped to spot one bit I did not think about. |
I get the timout after a while again: In Wireshark I can see that after this timespan the client stops sending the ping message: 1: Last ping/pong messages from the frontend The two marked timestamps shows, that after 55s the timout and reconnect from the client is happening: So, the issue, that the client stops sending the ping message is still there. |
I added more logging in my last commit. The reason for that is because I can't replicate that problem locally with tests. Looking on your screenshot from wireshark I noticed that some messages are not ping/pong that are sent as a data, those are real low-level ping-pongs, where the I understand that sharing actual network dump is probably not a good idea as it may contain sensitive data. But if you can re-test this again with additional logging that I added in my last commit, then it would provide a bit more data for me. Thanks. |
You are right. The Ping/Pong messages you are marked are "lowlevel" messages coming from the python websocket packages which runs the backend. The browser replies correctly with a pong. So in my app there are two different heartbeats. One controlled from the backend and one controlled from the frontend. From my understanding it is visible, that the frontend, so the I will test your new version soon and provide the results here. Stay tuned. |
Other tab should not cause any issues. I assume that Edge browser may put tab into sleep mode if it is not active. |
Hi, this is @w666 but writing from my work account I applied this fix to our project, I've built a production release and then was running it locally, I could not replicate the problem This is the config we use Also I double checked what happens when tab is not active for some time, EDGE put it in to sleep mode and connection terminates, but immediately reconnects when tab wakes up What browser do you use? To be honest I am not sure what is happening here. Maybe adjust timeout and interval to the same values as in my config? |
I adjusted the timout to your configuration w/o a change in the result. We are using Edge. I think your idea with the sleep mode after some time, espacially if the tab is not active in the browser, is a good theory for the behavior. In the end it is acceptable because the client connects immediately again as soon as the tab is active again. But the question is, is this the desired behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to your opinion regarding the console.log, but I'm inclined to think that in most cases, it is preferable to not clog up the console with information pertaining to just one (of potentially many) libraries in an app. In an ideal world, we can consider adding a debug
option to the websocket hook that would allow developers to opt in to this behavior, in which case I'd be all for any and all logs!
src/lib/heartbeat.ts
Outdated
try { | ||
if (typeof message === 'function') { | ||
ws.send(message()); | ||
} else { | ||
ws.send(message); | ||
} | ||
console.log(`>>> last message received ${timeNow - lastMessageReceivedAt}ms ago, last ping sent ${timeNow - lastPingSentAt}ms ago`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about this log -- if I were a developer working on an app that used this library, I would be pretty annoyed if my console was getting cluttered by this message every X seconds.
console.log(`>>> last message received ${timeNow - lastMessageReceivedAt}ms ago, last ping sent ${timeNow - lastPingSentAt}ms ago`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If seeing the ping history is important (and I understand that it would be difficult to monitor it without digging through the network tab), then I'd consider something like:
const lastTenHeartbeats = [];
//instead of the console.log on line 40:
lastTenHeartbeats.push({ messageReceived: timeNow - lastMessageReceivedAt, pingSent: timeNow - lastPingSentAt });
if (lastTenHeartbeats.length > 10) {
lastTenHeartbeats.shift();
}
Then where you call console.warn
, you can display a message that incorporates the last 10 heartbeats (since I imagine that is the only time a developer would care about this information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @robtaussig, @buhln
I investigated that area, seems like this is known "issue", it is happening from ~2017 in all browsers. Unfortunately, we can't really mitigate the sleeping tab
problem.
I will remove that debug logging, it was just temporary change for debugging.
But this PR in general makes sense, it provides more info when connection is terminated and using just one interval in shared mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR, removed temp debug logging, it is ready for another review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I investigated that area, seems like this is known "issue", it is happening from ~2017 in all browsers. Unfortunately, we can't really mitigate the
sleeping tab
problem.
Thanks for your research. If this is a "feature" of the browsers you will be right.
But at the end your commit fixed the bug reported initially in #255. 👍
364c463
to
79a3c9d
Compare
Check interval between ping messages more often (interval / 10). This is to prevent some configurations that may cause timeouts and connection close. For example, when timeout and interval are set to very close values and there was data message recived right after connection is established. Use one heartbeat interval in shared mode, add test. Close robtaussig#255
79a3c9d
to
bd9f3ac
Compare
Check interval between ping messages more often (interval / 10).
This is to prevent some configurations that may cause timeouts
and connection close. For example, when timeout and interval are
set to very close values and there was data message recived right
after connection is established.
Close #255