Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve heartbeat #259

Merged
merged 1 commit into from
Jan 31, 2025
Merged

Conversation

w666
Copy link
Contributor

@w666 w666 commented Jan 14, 2025

Check interval between ping messages more often (interval / 10).
This is to prevent some configurations that may cause timeouts
and connection close. For example, when timeout and interval are
set to very close values and there was data message recived right
after connection is established.

Close #255

@buhln
Copy link

buhln commented Jan 15, 2025

I tested the changes and the described failure happens not longer. I did not check the source code in detail.

I tested with the following configuration of the heartbeat:

heartbeat: {
        message: 'ping',
        returnMessage: 'pong',
        timeout: 30000,
        interval: 20000,
      }

and got the following Dump in wireshark in my local dev env:

image

The black marked packages are authentication and data initalization of the frontend. The blue marked packages are the Ping/Pong heartbeat messages from the backend, controlled by the python websockets==13.1 package. The red marked packages are the texed based ping/pong heartbeat messages from the frontend.

As visible the first ping messages is send by the fontend after 30 seconds (timeout). Any further ping message is sent after an intervall of 20 seconds. I asume this is the intended behaviour? There are no disconnectes longer as described in the linked issue.

From a functional side the changes are good to go.

Edit:

Oops...

image

Looks like some other problems occurs after some minutes of an established connection.

Comment on lines 19 to 48
const heartbeatInterval = setInterval(() => {
if (lastMessageTime.current + timeout <= Date.now()) {
console.warn(`Heartbeat timed out, closing connection, last message was seen ${Date.now() - lastMessageTime.current}ms ago`);
const timeNow = Date.now();

if (lastMessageTime.current + timeout <= timeNow) {
console.warn(`Heartbeat timed out, closing connection, last message was seen ${timeNow - lastMessageTime.current}ms ago`);
ws.close();
} else {
if (lastMessageTime.current + interval <= Date.now()) {
if (lastMessageTime.current + interval <= timeNow && lastPingSent + interval <= timeNow) {
try {
if (typeof message === 'function') {
ws.send(message());
} else {
ws.send(message);
}
lastPingSent = timeNow;
} catch (err: unknown) {
console.error(`Heartbeat failed, closing connection`, err instanceof Error ? err.message : err);
ws.close();
}

}
}
}, interval);
}, intervalCheck);
Copy link
Owner

@robtaussig robtaussig Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm missing the point, but is there a reason to combine these two checks into the same logic? My understanding is that the bug is caused when the interval and timeouts are too close together, but it seems to me that we can avoid some of the confusion by handling their logic separately. For the heartbeat timeout:

//attach-shared-listeners.ts
const bindMessageHandler = (
  webSocketInstance: WebSocketLike,
  url: string,
  heartbeatOptions?: boolean | HeartbeatOptions
) => {
  let heartbeatTimeoutMs = typeof heartbeatOptions !== 'boolean' ? heartbeatOptions?.timeout : undefined;

  let heartbeatTimeout: NodeJS.Timeout;

  if (heartbeatTimeoutMs) {
    webSocketInstance.addEventListener("close", () => {
      if (heartbeatTimeout) {
        clearTimeout(heartbeatTimeout);
      }
    });
  }

  webSocketInstance.onmessage = (message: WebSocketEventMap['message']) => {
    if (heartbeatTimeoutMs) {
      clearTimeout(heartbeatTimeout);
      heartbeatTimeout = setTimeout(() => {
        webSocketInstance.close();
      }, heartbeatTimeoutMs);
    }

    getSubscribers(url).forEach(subscriber => {
      if (subscriber.optionsRef.current.onMessage) {
        subscriber.optionsRef.current.onMessage(message);
      }

      if (typeof subscriber?.lastMessageTime?.current === 'number') {
        subscriber.lastMessageTime.current = Date.now();
      }

      if (
        typeof subscriber.optionsRef.current.filter === 'function' &&
        subscriber.optionsRef.current.filter(message) !== true
      ) {
        return;
      }

      if (
        heartbeatOptions &&
        typeof heartbeatOptions !== "boolean" &&
        heartbeatOptions?.returnMessage === message.data
      )
        return;

      subscriber.setLastMessage(message);
    });
  };
};

And for the interval:

//attach-shared-listeners.ts
const bindOpenHandler = (
  webSocketInstance: WebSocketLike,
  url: string,
  heartbeatOptions?: boolean | HeartbeatOptions
) => {
  let heartbeatIntervalMs = typeof heartbeatOptions !== 'boolean' ? heartbeatOptions?.interval : undefined;

  let heartbeatInterval: NodeJS.Timeout;

  if (heartbeatIntervalMs) {
    webSocketInstance.addEventListener("close", () => {
      if (heartbeatInterval) {
        clearInterval(heartbeatInterval);
      }
    });
  }

  webSocketInstance.onopen = (event: WebSocketEventMap['open']) => {
    if (heartbeatOptions !== undefined && typeof heartbeatOptions !== 'boolean' && heartbeatIntervalMs && webSocketInstance instanceof WebSocket) {
      heartbeatInterval = setInterval(() => {
        const {
          message = DEFAULT_HEARTBEAT.message
        } = heartbeatOptions;
        webSocketInstance.send(typeof message === 'function' ? message() : message)
      }, heartbeatIntervalMs)
    }
    getSubscribers(url).forEach(subscriber => {
      subscriber.reconnectCount.current = 0;
      if (subscriber.optionsRef.current.onOpen) {
        subscriber.optionsRef.current.onOpen(event);
      }

      subscriber.setReadyState(ReadyState.OPEN);

      let onMessageCb: () => void;

      //would remove the logic from here
      if (heartbeatOptions && webSocketInstance instanceof WebSocket) {
        subscriber.lastMessageTime.current = Date.now();
        heartbeat(webSocketInstance, subscriber.lastMessageTime, typeof heartbeatOptions === 'boolean' ? undefined : heartbeatOptions,);
      }
    });
  };
};

I like having this logic abstracted into its own file, but I kept it in one place here for demonstration purposes.

I also have a question about whether it's better to lift the heartbeat logic outside of the subscriber step (as I indicated by comment in the above example) -- the way I see it, sharing a single connection between multiple "subscribers" is a front-end implementation detail/optimization. But handling the heartbeat at the subscriber level means that we are sending multiple heartbeat messages for a single websocket connection, which an api might reject/not handle predictably.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will refactor this a bit later this week. Thanks for your feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it a bit, now it does not create separate interval for each subscriber in shared mode, so only one ping message regardless of the number of subscribers. New test validates this behavior.

Corrected interval in one existing test, but just to make sure it gets expected number of ping messages.

@w666
Copy link
Contributor Author

w666 commented Jan 19, 2025

@buhln,

Could you please retest it? New implementation is more correct, @robtaussig helped to spot one bit I did not think about.

@buhln
Copy link

buhln commented Jan 20, 2025

I get the timout after a while again:

image

In Wireshark I can see that after this timespan the client stops sending the ping message:

image

1: Last ping/pong messages from the frontend
2: First ping/pong messages after reconnect

The two marked timestamps shows, that after 55s the timout and reconnect from the client is happening:
1373.36s -1318.35s = 55s

So, the issue, that the client stops sending the ping message is still there.

@w666
Copy link
Contributor Author

w666 commented Jan 21, 2025

@buhln

I added more logging in my last commit. The reason for that is because I can't replicate that problem locally with tests.

Looking on your screenshot from wireshark I noticed that some messages are not ping/pong that are sent as a data, those are real low-level ping-pongs, where ping is sent by server and pong is sent by client (browser), you can't see those in browser as browsers do not provide access to low-level messages, only to data messages.

image

the react-use-websocket library sends ping and receives pong messages as text. So, I am not sure about what I see on the screenshot.

I understand that sharing actual network dump is probably not a good idea as it may contain sensitive data. But if you can re-test this again with additional logging that I added in my last commit, then it would provide a bit more data for me.

Thanks.

@buhln
Copy link

buhln commented Jan 21, 2025

You are right. The Ping/Pong messages you are marked are "lowlevel" messages coming from the python websocket packages which runs the backend. The browser replies correctly with a pong.

So in my app there are two different heartbeats. One controlled from the backend and one controlled from the frontend.

From my understanding it is visible, that the frontend, so the react-use-websocket package is reconnecting due to a timeout of the pong (data/text) message from the backend.

I will test your new version soon and provide the results here. Stay tuned.

@buhln
Copy link

buhln commented Jan 22, 2025

image

The first of these disconnects happend after 1222 seconds. Is it possible that this happens if another tab is open in the browser?

@w666
Copy link
Contributor Author

w666 commented Jan 22, 2025

Other tab should not cause any issues.

I assume that Edge browser may put tab into sleep mode if it is not active.
I will try to replicate this tomorrow at work. We use this on a pretty large project.

@vm-orbit
Copy link

Hi, this is @w666 but writing from my work account

I applied this fix to our project, I've built a production release and then was running it locally, I could not replicate the problem

image

This is the config we use

image

Also I double checked what happens when tab is not active for some time, EDGE put it in to sleep mode and connection terminates, but immediately reconnects when tab wakes up

image

What browser do you use?
Could you check in Firefox maybe?

To be honest I am not sure what is happening here. Maybe adjust timeout and interval to the same values as in my config?

@buhln
Copy link

buhln commented Jan 23, 2025

I adjusted the timout to your configuration w/o a change in the result. We are using Edge. I think your idea with the sleep mode after some time, espacially if the tab is not active in the browser, is a good theory for the behavior.

In the end it is acceptable because the client connects immediately again as soon as the tab is active again. But the question is, is this the desired behavior?

Copy link
Owner

@robtaussig robtaussig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to your opinion regarding the console.log, but I'm inclined to think that in most cases, it is preferable to not clog up the console with information pertaining to just one (of potentially many) libraries in an app. In an ideal world, we can consider adding a debug option to the websocket hook that would allow developers to opt in to this behavior, in which case I'd be all for any and all logs!

try {
if (typeof message === 'function') {
ws.send(message());
} else {
ws.send(message);
}
console.log(`>>> last message received ${timeNow - lastMessageReceivedAt}ms ago, last ping sent ${timeNow - lastPingSentAt}ms ago`);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about this log -- if I were a developer working on an app that used this library, I would be pretty annoyed if my console was getting cluttered by this message every X seconds.

Suggested change
console.log(`>>> last message received ${timeNow - lastMessageReceivedAt}ms ago, last ping sent ${timeNow - lastPingSentAt}ms ago`);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If seeing the ping history is important (and I understand that it would be difficult to monitor it without digging through the network tab), then I'd consider something like:

const lastTenHeartbeats = [];

//instead of the console.log on line 40:
lastTenHeartbeats.push({ messageReceived: timeNow - lastMessageReceivedAt, pingSent: timeNow - lastPingSentAt });
if (lastTenHeartbeats.length > 10) {
  lastTenHeartbeats.shift();
}

Then where you call console.warn, you can display a message that incorporates the last 10 heartbeats (since I imagine that is the only time a developer would care about this information)

Copy link

@vm-orbit vm-orbit Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @robtaussig, @buhln

I investigated that area, seems like this is known "issue", it is happening from ~2017 in all browsers. Unfortunately, we can't really mitigate the sleeping tab problem.

I will remove that debug logging, it was just temporary change for debugging.

But this PR in general makes sense, it provides more info when connection is terminated and using just one interval in shared mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR, removed temp debug logging, it is ready for another review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated that area, seems like this is known "issue", it is happening from ~2017 in all browsers. Unfortunately, we can't really mitigate the sleeping tab problem.

Thanks for your research. If this is a "feature" of the browsers you will be right.

But at the end your commit fixed the bug reported initially in #255. 👍

@w666 w666 force-pushed the feature-255-improve-heartbeat branch from 364c463 to 79a3c9d Compare January 24, 2025 19:41
 Check interval between ping messages more often (interval / 10).
 This is to prevent some configurations that may cause timeouts
 and connection close. For example, when timeout and interval are
 set to very close values and there was data message recived right
 after connection is established.

 Use one heartbeat interval in shared mode, add test.

 Close robtaussig#255
@w666 w666 force-pushed the feature-255-improve-heartbeat branch from 79a3c9d to bd9f3ac Compare January 24, 2025 19:43
@robtaussig robtaussig merged commit 328cfaf into robtaussig:master Jan 31, 2025
1 check passed
@robtaussig
Copy link
Owner

@w666 Thank you so much for your time and help with this! Released under 4.12.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Heartbeat timeout after update 4.8.1 -> 4.11.1
4 participants