Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnecting the inspector causes a segmentation fault #2564

Closed
penalosa opened this issue Aug 20, 2024 · 2 comments
Closed

Reconnecting the inspector causes a segmentation fault #2564

penalosa opened this issue Aug 20, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@penalosa
Copy link
Contributor

This is likely due to 19c8ed5, but I haven't been able to fully pin it down.

Essentially, in the latest version of workerd, connecting, disconnecting, and then reconnecting to the inspector causes a segmentation fault:

workerd/util/symbolizer.c++:96: warning: Not symbolizing stack traces because $LLVM_SYMBOLIZER is not set. To symbolize stack traces, set $LLVM_SYMBOLIZER to the location of the llvm-symbolizer binary. When running tests under bazel, use `--test_env=LLVM_SYMBOLIZER=<path>`.
*** Received signal #11: Segmentation fault: 11
stack: 

It's pretty easy to reproduce. Run:

npx workerd@latest serve samples/helloworld/config.capnp --inspector-addr=localhost:8989

and open up https://devtools.devprod.cloudflare.dev/js_app?ws=localhost:8989/main. Reloading that webpage twice will result in a segmentation fault.

the same problem doesn't present with the previous version of workerd:

npx [email protected] serve samples/helloworld/config.capnp --inspector-addr=localhost:8989
@jasnell
Copy link
Member

jasnell commented Aug 20, 2024

Same issue but slightly more useful error:

workerd/server/server.c++:1227: info: Inspector client attaching [main]
workerd/util/symbolizer.c++:96: warning: Not symbolizing stack traces because $LLVM_SYMBOLIZER is not set. To symbolize stack traces, set $LLVM_SYMBOLIZER to the location of the llvm-symbolizer binary. When running tests under bazel, use `--test_env=LLVM_SYMBOLIZER=<path>`.
workerd/server/server.c++:1194: error: Uncaught exception: kj/_virtual_includes/kj/kj/memory.h:433: failed: expected ptr != nullptr; null Own<> dereference
stack: /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@b9f9874 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@462b7c1 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@b94c3c0 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@46284bc /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@554a5b2 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@557dda0 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@473bbb4 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@473bde2 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@a5f901d /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@a5fe901 /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@a60320d /home/jsnell/.cache/bazel/_bazel_jsnell/53c6c64186779d616175192445f45b10/execroot/workerd/bazel-out/k8-fastbuild/bin/src/workerd/server/workerd@a552540

Symbolizing the stack yields...

kj::_::inlineRequireFailure(char const*, int, char const*, char const*, char const*) at ??:0:0

kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>::operator->() at workerd.c++:0:0

kj::_::CoroutineBase::AwaiterBase::awaitSuspendImpl(kj::_::CoroutineBase&) at ??:0:0

bool kj::_::Coroutine<void>::Awaiter<void>::await_suspend<kj::_::Coroutine<void>>(std::__1::coroutine_handle<kj::_::Coroutine<void>>) at workerd.c++:0:0

workerd::XThreadNotifier::awaitNotification() at worker.c++:0:0

workerd::Worker::Isolate::InspectorChannelImpl::WebSocketIoHandler::dispatchLoop() (.resume) at worker.c++:0:0

workerd::server::Server::InspectorService::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&) (.resume) at server.c++:0:0

workerd::server::Server::InspectorService::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&) (.resume) at server.c++:0:0

kj::HttpServer::Connection::onRequest(kj::HttpHeaders::Request&) (.resume) at http.c++:0:0

kj::HttpServer::Connection::onHeaders(kj::OneOf<kj::HttpHeaders::Request, kj::HttpHeaders::ConnectRequest, kj::HttpHeaders::ProtocolError>&&) (.resume) at http.c++:0:0

kj::HttpServer::Connection::loop() (.resume) at http.c++:0:0

kj::HttpServer::Connection::startLoopImpl()::'lambda'(kj::Exception&&)::operator()(kj::Exception&&) const at http.c++:0:0

harrishancock added a commit that referenced this issue Aug 20, 2024
…erence

This is an attempt to reproduce #2564.

The test doesn't fail as I expect, but it's still useful, so if it passes on all CI, I'm going to leave it uncommented.
harrishancock added a commit that referenced this issue Aug 20, 2024
We need to revert the fix until we've diagnosed #2564.
@jasnell jasnell added the bug Something isn't working label Aug 20, 2024
npaun pushed a commit that referenced this issue Aug 20, 2024
…erence

This is an attempt to reproduce #2564.

The test doesn't fail as I expect, but it's still useful, so if it passes on all CI, I'm going to leave it uncommented.
npaun pushed a commit that referenced this issue Aug 20, 2024
We need to revert the fix until we've diagnosed #2564.
@jasnell
Copy link
Member

jasnell commented Aug 20, 2024

The revert has landed

harrishancock added a commit that referenced this issue Aug 21, 2024
My initial attempt at a fix made a subtle behavior change which I did not consider fully: instead of a new `incomingQueueNotifier` XThreadNotifier being constructed for every inspector connection, my fix caused all inspector connections to re-use one long-lived XThreadNotifier. This subsequently ran afoul of the fact that XThreadNotifier::awaitNotification() is not cancel-safe: if it is cancelled while awaiting its stored promise, calling awaitNotification() a second time tries to await a moved-from promise.

This commit restores the previous behavior of creating a new `incomingQueueNotifier` XThreadNotifier for every inspector connection. However, instead of constructing it in the WebSocketIoHandler constructor as before, I moved the construction to the `messageLoop()` function implementation, which also spawns the dispatch loop. This narrows the scope of access to the notifier to only those functions which actually need it, keeps our usage of the dispatch kj::Executor localized in one place, and avoids synchronously blocking the inspector thread, as we would have had to do if we constructed it in the WebSocketIoHandler constructor.

Fixes #2564.
harrishancock added a commit that referenced this issue Aug 21, 2024
The previous attempt at a test didn't fail because it didn't actually exercise the inspector protocol. If it had, it would have encounterd a WebSocket error and failed.

This commit removes the previous test, uncomments out the profiling test, factors it out into a separate function, then uses that function to implement two separate test cases, one to actually test profiling, and one to test repeated inspector connections.
harrishancock added a commit that referenced this issue Aug 21, 2024
The previous attempt at a test didn't fail because it didn't actually exercise the inspector protocol. If it had, it would have encounterd a WebSocket error and failed.

This commit removes the previous test, uncomments out the profiling test, factors it out into a separate function, then uses that function to implement two separate test cases, one to actually test profiling, and one to test repeated inspector connections.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants