WebRTCTransport.dial AbortError #2702
Comments
This might be the same issue I reported in #2462. I'll take a closer look by replacing my However, I don't think that this is the same, as I'm building the application into a docker container with the I'll report back on my findings.
I carefully deleted my The main target that I'm testing is a libp2p node setup as a Circuit Relay server.
TURN works for regular internet connections across countries without this error, but it doesn't function properly with restrictive VPNs. This error indicates that WebRTC has failed to establish a connection with the peer.
I wouldn't mind if WebRTC fails to connect, but this error causes the application to crash and exit, and there doesn't seem to be any way to wrap it in try/catch to handle the exception.
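As a general illustration of why a try/catch can miss this kind of failure (not a diagnosis of this particular bug), the sketch below shows a promise that rejects without ever being returned to the caller. The function names `startBackgroundWork` and `tryToGuard` are hypothetical and are not part of libp2p.

```javascript
// Hypothetical stand-in for a library call that starts background work.
// It creates a rejected promise that is never returned to the caller.
function startBackgroundWork () {
  const err = new Error('The operation was aborted')
  err.name = 'AbortError'

  // Nobody awaits this promise or attaches a .catch() to it.
  Promise.reject(err)

  return 'started'
}

// The try/catch only sees errors thrown synchronously, or rejections of
// promises that are awaited inside the try block.
function tryToGuard () {
  try {
    return startBackgroundWork()
  } catch (err) {
    return 'caught'
  }
}
```

Calling `tryToGuard()` returns `'started'` and the catch branch never runs. The rejection surfaces later through the process-level `unhandledRejection` event, which by default terminates recent versions of Node.js.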
@christroutner while this is not a 'solution' (more of a temporary workaround), you might consider an application level handler and consider not allowing the application to crash if that type of exception goes unhandled...
I appreciate the tip @cristianmadularu. I ended up just disabling WebRTC in my application until this issue can be resolved. It would be great to have, but it's not a core requirement.
@christroutner if I remove WebRTC from the transports on Node.js, can browsers still use circuit-relay, autonat and dcutr to connect peer-to-peer (via WebRTC) to each other when both of them come in via wss or WebTransport?
My understanding is that if you remove WebRTC, then circuit-relay is not possible. I don't know much about the other protocols mentioned in your question.
Browsers can listen on circuit relay addresses where the relayed connection is established over WebSockets/WebTransport, but any incoming connections will be time/data limited so it's only useful under certain conditions. For two browsers to upgrade to an unlimited direct connection you need WebRTC.
While investigating libp2p/js-libp2p#2702 I've had this running for almost 12 hours without a crash. The only changes I've made are to upgrade the libp2p/Helia deps and to enable the WebRTC/WebRTC Direct transports and add a WebRTC Direct listener. This PR just upgrades the Helia/libp2p deps.
@christroutner I've been running the The deps were quite out of date so that might have something to do with it. I've [opened a PR](Permissionless-Software-Foundation/ipfs-service-provider#168) that updates them. I will open a followup with my changes that re-add WebRTC support.
Here is the followup that re-enables WebRTC - Permissionless-Software-Foundation/ipfs-service-provider#169
This is just the prod I needed. Thanks @achingbrain. I've been intending to update this thread the last few days.
I updated ipfs-service-provider to use helia v5.2.0, libp2p v2.6.2, and @libp2p/webrtc v5.1.0. The WebRTC and Circuit Relay stuff is working much better. However, I'm still seeing the random AbortError. Sometimes it happens right after startup, sometimes it doesn't happen for hours. It seems completely random (which makes me think the root cause is a race condition).
I have however managed to catch it by adding this code snippet to the first JS file to get executed:
process.on('unhandledRejection', (reason, promise) => {
  console.log(`Handling ${reason.code} error`)
})
That at least prevents it from crashing the entire app. There do not appear to be any negative side effects to handling the error as above.
I still can't seem to find the root cause, but it seems to be the same issue. I'll update the code to print out the error and I'll try to add it to this thread, to see if the error stack has changed at all. In the meantime, I'll review your PR and compare it to the changes I've already made.
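A catch-all handler like the one quoted above also silences every other unhandled rejection in the process. A narrower variant is sketched below: it swallows only errors that look like an abort and lets anything else stay fatal. The helper names `isAbortError` and `handleRejection` are hypothetical, and matching on `name === 'AbortError'` or `code === 'ABORT_ERR'` is an assumption about how the error is shaped, not something confirmed in this thread.

```javascript
// Assumption: the error in question carries name 'AbortError' or
// code 'ABORT_ERR'. Adjust the check to whatever the logged error shows.
function isAbortError (reason) {
  return reason != null &&
    (reason.name === 'AbortError' || reason.code === 'ABORT_ERR')
}

// Returns true when the rejection was recognised and logged,
// false when the caller should treat it as a real failure.
function handleRejection (reason, log = console.error) {
  if (isAbortError(reason)) {
    // Log the full stack so the origin can still be investigated.
    log(`Ignoring unhandled AbortError: ${reason.stack ?? reason.message}`)
    return true
  }
  return false
}

process.on('unhandledRejection', (reason) => {
  if (!handleRejection(reason)) {
    // Re-throwing turns it into an uncaught exception, so unrelated
    // bugs still crash the process instead of being hidden.
    throw reason
  }
})
```

Logging `reason.stack` rather than only `reason.code` keeps the information needed to compare stacks between library versions.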
After updating all npm dependencies, I'm still seeing the AbortError randomly. Here is the stack from the latest error:
Do you know what the multiaddr is that your node is trying to dial? That might help me narrow it down a bit.
No, not at the time the error occurs.
At a high level, when a new node is trying to connect to the network, it first connects to a handful of bootstrap nodes. It listens on an 'announcement' pubsub channel. When a new node announces itself that it hasn't seen, the announcement object contains multiaddrs. The node will go down the list of multiaddrs and try to connect to each multiaddr until it's successful or reaches the end of the list. Also, a timer will kick off every few minutes to try and connect to nodes it knows about and hasn't been able to connect to.
So the stage is set for a race condition. Everything is jumbled in production. If the error was thrown within the code path, it would be caught. I would know exactly where in the code path the error happened and exactly which node and which transport it was using. But because this is manifesting as an AbortError that I have to catch in a general way, I can't isolate exactly what is causing the error. And there is no info in the stack to help me isolate the code path within my own app.
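The connect-until-success loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the application's actual code: `dial` is a hypothetical async function supplied by the caller (for example a thin wrapper around the node's own dial call), `connectToFirstReachable` is a made-up name, and the 10 second timeout is arbitrary. It records which address failed and why, which covers errors thrown on the awaited path; a rejection that escapes that path, as in this issue, would still not show up here.

```javascript
// Try each multiaddr in order until one dial succeeds.
// `dial(addr, { signal })` is a hypothetical function provided by the caller.
async function connectToFirstReachable (dial, multiaddrs, timeoutMs = 10000) {
  const failures = []

  for (const addr of multiaddrs) {
    try {
      // Each attempt gets its own timeout signal.
      const connection = await dial(addr, {
        signal: AbortSignal.timeout(timeoutMs)
      })
      return { connection, addr, failures }
    } catch (err) {
      // Keep the address next to the error so failures can be traced.
      failures.push({ addr, name: err.name, message: err.message })
    }
  }

  // Reached the end of the list without a successful connection.
  return { connection: null, addr: null, failures }
}
```

The periodic retry timer mentioned above could call the same function with the list of peers that are still unconnected, so both code paths log failures in one format.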
Ok, I think I've figured out what's happening.
This should be fixed by achingbrain/race-signal#64 released in
Version:
libp2p v1.9.1
Platform:
Linux hp-elitedesk01 5.15.0-91-generic #101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem:
WebRTC
Severity:
Description:
I filed this previous issue about problems I was having with the @libp2p/webrtc package. That was resolved; the current package versions can be seen here and the code for initializing libp2p can be found here.
I'm now encountering what appears to be a race condition inside the WebRTC libraries. The node will run for a while and then will randomly crash with the following error message:
Steps to reproduce the error:
The error does not occur right away. It will appear at some point within 30 minutes while the node is running. It forces the app to crash and the process manager will restart it. But then the crash will happen again within 30 minutes.