Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug Report — Logging broken in hibernatable WebSocket event handlers #2089

Open
codefrau opened this issue May 5, 2024 · 9 comments

Comments

@codefrau
Copy link

codefrau commented May 5, 2024

We have one DurableObject that uses the hibernation API. It implements webSocketMessage() in which we use console.log() to log stuff.

On miniflare (wrangler dev) that works perfectly fine, the logs appear in the terminal, as they should.

When deployed to Cloudflare though, nothing is logged from within the webSocketMessage() handler. The handler method with the log statement is executed and works fine.

This happens both with wrangler tail, as well as our log push job to an R2 bucket. All other log calls work fine.

I added a workaround where for that DO we additionally send every log message to another DO and log it there. That works.

@a-kuz
Copy link

a-kuz commented Jun 7, 2024

I observe identical behavior

@codefrau
Copy link
Author

codefrau commented Jun 7, 2024

@MellowYarker @jasnell @kentonv Sorry to ping you this way but could someone please triage this?

It's not a blocker for us since we have the work-around, and I can't look at it again for a few weeks anyways, but now that someone else ran into it maybe it's worth taking a look?

@MellowYarker
Copy link
Contributor

Huh, not sure of anything that would have caused this to break. Was it always broken or did you notice a regression? Will take a look next week.

@codefrau
Copy link
Author

codefrau commented Jun 7, 2024

We only started using hibernatable sockets a few months ago, so I can't tell if it was broken from the beginning.

@MellowYarker
Copy link
Contributor

Haven't forgotten about this, just been a bit hectic this week! Still on my todo list.

@MellowYarker
Copy link
Contributor

@codefrau sorry for the delay. I just tested this and I seems to work for me.

My steps were:

  1. npm create cloudflare@latest
  2. Pasted the hibernation example
  3. Added a console.log() to the webSocketMessage() handler, so now it looks like:
  async webSocketMessage(ws, message) {
    console.log(`RECEIVED NEW MESSAGE: ${message}`)
    // Upon receiving a message from the client, reply with the same message,
    // but will prefix the message with "[Durable Object]: " and return the
    // total number of connections.
    ws.send(`[Durable Object] message: ${message}, connections: ${this.ctx.getWebSockets().length}`);
  }
  1. npx wrangler deploy after setting up the binding in wrangler.toml
  2. Ran npx wrangler tail to get logs
  3. Created a client connection from my browser that did ws.send("Hello from the browser");, and then saw the following in my console:
Successfully created tail, expires at 2024-06-21T19:08:10Z
Connected to logging, waiting for logs...
Unknown Event - Ok @ 6/21/2024, 9:10:02 AM
  (log) RECEIVED NEW MESSAGE: Hello from the browser

Is there any way you could help narrow the behavior down?

@aranlunzer
Copy link

Hi @MellowYarker - thank you for getting back to this. I'm keeping the engines running here while @codefrau is out of the office.

Using your basic example, I initially saw the same response as you. After a while I went back to our application and re-deployed to test out one theory - which proved unhelpful... but it did reveal that the basic logging was initially working, but stopped working after a few minutes.

To test this, I added

console.log(`REQUEST: ${request.url}`);

to the fetch method of your sample DO, and set up the following code in a web page accessing it:

const sock = new WebSocket("wss://ael-expt2.synq.workers.dev/websocket");
sock.onmessage = (event) => console.log(event.data);

setInterval(() => {
    fetch("https://ael-expt2.synq.workers.dev/", { mode: "no-cors" }); // should trigger a GET log

    sock.send(new Date().toISOString()); // should trigger an Unknown Event log
}, 20000);

Below is a screenshot of the npx wrangler tail next to the console of the browser. As you can see, for two minutes the logs show both the fetch and the socket message, but after that the socket-message-triggered logging disappears (and seems never to resume).

Screenshot 2024-06-21 at 18 39 01

@MellowYarker
Copy link
Contributor

@aranlunzer thanks for looking into this some more! I ran the same experiment as you and saw logs from fetch and the websocket for an hour, after which wrangler tail seems to have silently quit.

Which browser are you using? Also, would you mind adding a log to the webSocketClose() handler as well?

@aranlunzer
Copy link

aranlunzer commented Jun 27, 2024

@MellowYarker After some further hours of experiments, my main finding is that the results are wildly inconsistent, with no readily discernable pattern.

To maximise your chances of reproduction, here is what I am using.

wrangler.toml

#:schema node_modules/wrangler/config-schema.json
name = "ael-expt2"
main = "src/index.ts"
logpush = true
compatibility_date = "2024-06-20"
compatibility_flags = ["nodejs_compat"]

[[durable_objects.bindings]]
name = "WEBSOCKET_HIBERNATION_SERVER"
class_name = "WebSocketHibernationServer"

[[migrations]]
tag = "v1"
new_classes = ["WebSocketHibernationServer"]

index.ts (note removal of async on the websocket functions, which appears to have made no difference)

import { DurableObject } from "cloudflare:workers";

export interface Env {
	WEBSOCKET_HIBERNATION_SERVER: DurableObjectNamespace<WebSocketHibernationServer>;
}

// Worker
export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
		console.log(`REQUEST: ${request.url}`);
		if (request.url.endsWith("/websocket")) {
			const upgradeHeader = request.headers.get('Upgrade');
			if (!upgradeHeader || upgradeHeader !== 'websocket') {
				return new Response('Durable Object expected Upgrade: websocket', { status: 426 });
			}

			let id = env.WEBSOCKET_HIBERNATION_SERVER.idFromName("foo");
			let stub = env.WEBSOCKET_HIBERNATION_SERVER.get(id);

			return stub.fetch(request);
		}

		return new Response("hello");
	}
};

export class WebSocketHibernationServer extends DurableObject {
	constructor(state: DurableObjectState, env: Env) {
		super(state, env);
		console.log("woke up");
	}

	async fetch(request: Request): Promise<Response> {
		const webSocketPair = new WebSocketPair();
		const [client, server] = Object.values(webSocketPair);

		this.ctx.acceptWebSocket(server);

		return new Response(null, {
			status: 101,
			webSocket: client,
		});
	}

	// async
	webSocketMessage(ws: WebSocket, message: ArrayBuffer | string) {
		console.log(`RECEIVED NEW MESSAGE: ${message}`);
		ws.send(`[Durable Object] message: ${message}, connections: ${this.ctx.getWebSockets().length}`);
	}

	// async
	webSocketClose(ws: WebSocket, code: number, reason: string, wasClean: boolean) {
		console.log(`CLOSE: ${code} ${reason} wasClean: ${wasClean}`);
		ws.close(code, "Durable Object is closing WebSocket");
	}
}

index.html

<html>
    <body>
        <script type="text/javascript">
            try {
                const sock = new WebSocket("wss://ael-expt2.synq.workers.dev/websocket")
                sock.onmessage = event => console.log(event.data);
                setInterval(() => {
                    fetch("https://ael-expt2.synq.workers.dev/", { mode: "no-cors" }); // should trigger a GET log
                    sock.send(new Date().toISOString()); // should trigger an Unknown Event log
                }, 10000);
            } catch(e) {
                console.log(e)
            }
        </script>
    </body>
</html>

This evening I have re-deployed around 50 times. Sequence:

  • npx wrangler deploy
  • on the Cloudflare page for this worker, select Logs tab and ask to begin streaming (accepting the need to restart DOs)
  • start a new tab with index.html, accessing the worker
  • record how long we continue to see the expected two log entries on each request (every 10s) before it drops to a single log each time, because the websocket logging is now failing

Once the change is observed (or after around 5 minutes it it's still going)

  • close the browser tab
  • suspend log stream
  • switch away from Logs tab (so streaming will have to be begun from scratch next time)
  • re-deploy and repeat

The time taken for logging from the websocket request to start failing is extremely variable. Sometimes after just 20s; most often somewhere in the range 100-200s; occasionally it's still going even after 300s (so I reset and try again).

[edit] Forgot to mention: I'm using Chrome to load the web page, and make sure to keep the tab visible so the timers continue to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants