Gossmap crash, issues #8053
Open
rustyrussell wants to merge 8 commits into ElementsProject:master from rustyrussell:guilt/gossmap-crash2
We have reports of crashes on reading gossip_store, including from gossipd itself!

```
lightning_gossipd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
...
lightning_gossipd: FATAL SIGNAL (version v24.11)
0x6260c41d682a send_backtrace common/daemon.c:33
0x6260c41e098b status_failed common/status.c:221
0x6260c41e0b41 status_backtrace_exit common/subdaemon.c:18
0x6260c41d68b8 crashdump common/daemon.c:78
0x70508ea6913f ??? ???:0
0x70508e8a0d51 ??? ???:0
0x70508e88a536 ??? ???:0
0x70508e88a40e ??? ???:0
0x70508e8996d1 ??? ???:0
0x6260c41d8b69 map_copy common/gossmap.c:121
0x6260c41d8bab map_be16 common/gossmap.c:142
0x6260c41daa45 map_catchup common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight gossipd/gossipd.c:473
0x6260c41cc363 recv_req gossipd/gossipd.c:584
0x6260c41d6b1d handle_read common/daemon_conn.c:35
0x6260c43175b5 next_plan ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready ccan/ccan/io/io.c:439
0x6260c4319446 io_loop ccan/ccan/io/poll.c:455
0x6260c41cccf4 main gossipd/gossipd.c:665
```

This implies that we have a message shorter than 2 bytes, which should never happen. An audit didn't shed any light, but let's make sure we don't ever write such a thing.

Signed-off-by: Rusty Russell <[email protected]>
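In the backtrace, map_be16 (a 2-byte read) reaches the `offset + len <= map->map_size` assertion in map_copy from map_catchup. The following is a minimal sketch, not gossmap's actual code, of a bounds-checked big-endian 16-bit read that reports failure instead of aborting. The `store_view` struct and `view_be16` helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only view of a memory-mapped gossip_store. */
struct store_view {
	const uint8_t *data;
	size_t size;
};

/* Read a big-endian u16 at `off`. Returns false (rather than asserting)
 * when fewer than 2 bytes remain, e.g. if the file appears truncated. */
static bool view_be16(const struct store_view *v, size_t off, uint16_t *out)
{
	assert(v != NULL && out != NULL);
	/* Compare via subtraction so that off + 2 can never overflow. */
	if (off > v->size || v->size - off < 2)
		return false;
	*out = (uint16_t)((v->data[off] << 8) | v->data[off + 1]);
	return true;
}
```

A caller that gets `false` back can log the problem and retry later, which is the "mayfail" style of handling rather than a fatal assertion.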
rustyrussell force-pushed the guilt/gossmap-crash2 branch 3 times, most recently from ebf78bc to d32b9e5 (February 5, 2025 23:33)
We have a report of this happening under ZFS. We cannot do much if this really is a problem where we can't read back what we write, but this avoids the immediate crash.

Fixes: ElementsProject#7971
Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
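One way to avoid the immediate crash is for the catch-up loop to stop at a record that is not yet fully visible and resume from that offset on the next refresh. The sketch below assumes a simplified, hypothetical record layout of a 2-byte big-endian length followed by that many payload bytes; the real gossip_store header is different, and `catchup` is an illustrative name, not the gossmap function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk records laid out as [be16 len][len bytes of payload], starting at
 * `off`. Returns the offset just past the last complete record. A trailing
 * partial record (header or payload) is left for a later call instead of
 * being read past the end of the mapping. */
static size_t catchup(const uint8_t *data, size_t size, size_t off)
{
	assert(off <= size);
	while (size - off >= 2) {
		size_t len = ((size_t)data[off] << 8) | data[off + 1];
		if (size - off - 2 < len)
			break; /* payload not fully visible yet: wait, don't crash */
		off += 2 + len;
	}
	return off;
}
```

The returned offset is where the next catch-up should start once more of the file is readable.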
We only use it in one place, and that was simply to share an fd between gossipd writing and gossipd reading, which may be causing our ZFS problem anyway. In fact, this change fixes a race if we don't have HAVE_PWRITEV.

Signed-off-by: Rusty Russell <[email protected]>
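The commit stops sharing one fd between gossipd's writer and reader. As a minimal POSIX sketch (the `roundtrip` helper is hypothetical, not lightningd code), giving the writer and the reader their own descriptors, and reading with `pread()` at an explicit offset, means neither side depends on a file offset the other can move. The commit does not spell out the race on the non-HAVE_PWRITEV path; a seek on a shared descriptor is only an assumed mechanism here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a record through one fd and read it back through another.
 * The writer appends; the reader uses pread(), which takes an explicit
 * offset and does not change any file offset. */
static bool roundtrip(const char *path)
{
	const char msg[] = "gossip";
	char buf[sizeof(msg)];
	bool ok = false;

	assert(path != NULL);
	/* Writer first, so O_CREAT has run before the reader opens the file. */
	int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
	int rfd = open(path, O_RDONLY);

	if (wfd < 0 || rfd < 0)
		goto out;
	if (write(wfd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		goto out;
	ok = pread(rfd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf)
	     && memcmp(buf, msg, sizeof(msg)) == 0;
out:
	if (wfd >= 0)
		close(wfd);
	if (rfd >= 0)
		close(rfd);
	return ok;
}
```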
Default goes to stderr for LOG_UNUSUAL and higher.

Signed-off-by: Rusty Russell <[email protected]>
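A pluggable logger with a stderr default might look like the sketch below. The enum values, `log_fn` signature, `struct reader`, and `reader_init` are all hypothetical stand-ins chosen for illustration; they are not the gossmap API. The default only prints LOG_UNUSUAL and above, while a caller such as gossipd can pass its own callback.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical levels, ordered from least to most severe. */
enum log_level { LOG_DBG, LOG_INFORM, LOG_UNUSUAL, LOG_BROKEN };

typedef void (*log_fn)(void *arg, enum log_level level, const char *msg);

/* Default sink: only LOG_UNUSUAL and higher reach stderr. */
static void default_log(void *arg, enum log_level level, const char *msg)
{
	(void)arg;
	if (level >= LOG_UNUSUAL)
		fprintf(stderr, "%s\n", msg);
}

/* Example caller-supplied sink: counts messages via an int counter. */
static void count_log(void *arg, enum log_level level, const char *msg)
{
	(void)level;
	(void)msg;
	(*(int *)arg)++;
}

struct reader {
	log_fn log;
	void *log_arg;
};

/* Passing NULL selects the stderr default. */
static void reader_init(struct reader *r, log_fn fn, void *arg)
{
	assert(r != NULL);
	r->log = fn ? fn : default_log;
	r->log_arg = arg;
}
```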
We're about to test them in gossmap.

Signed-off-by: Rusty Russell <[email protected]>
We assume that if the checksum is incorrect, we simply need to wait. If this assumption proves wrong, we will see a stream of BROKEN log messages.

To measure the performance impact, I timed tests/test_askrene.py::test_real_biases on my laptop:

Before: 194.52s
After: 202.81s

So the cost is marginal (roughly 4%).

Signed-off-by: Rusty Russell <[email protected]>
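The "wait on a bad checksum" policy can be sketched as follows, assuming the record checksum is CRC32C over the payload seeded with 0. The real gossip_store may feed other fields into its CRC, and `check_record` and `enum rec_status` are hypothetical names. A mismatch is reported as "not ready yet" rather than as corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

enum rec_status { REC_OK, REC_WAIT };

/* If the stored CRC doesn't match, assume the writer hasn't finished
 * and tell the caller to retry on a later refresh. */
static enum rec_status check_record(const uint8_t *payload, size_t len,
				    uint32_t stored_crc)
{
	assert(payload != NULL || len == 0);
	return crc32c(0, payload, len) == stored_crc ? REC_OK : REC_WAIT;
}
```

A bitwise loop keeps the sketch short; a table-driven or hardware-accelerated CRC32C would be the usual choice where throughput matters.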
Compute the checksum in place instead of making a copy.

To measure the performance impact, I timed tests/test_askrene.py::test_real_biases on my laptop:

No checksum check: 194.52s
Copying for checksum check: 202.81s
Zero-copy checksum check: 194.40s

These numbers proved noisy, but the change doesn't hurt.

Signed-off-by: Rusty Russell <[email protected]>
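The difference between the two approaches is whether the record bytes are duplicated before checksumming. In this sketch `sum_bytes` is a trivial stand-in for the real checksum, and `checksum_copy` / `checksum_inplace` are hypothetical helpers: both bounds-check against the mapping, but only the first allocates and copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in checksum (byte sum) to keep the example short. */
static uint32_t sum_bytes(const uint8_t *p, size_t len)
{
	uint32_t s = 0;
	while (len--)
		s += *p++;
	return s;
}

/* Copying approach: duplicate the record, then checksum the copy. */
static int checksum_copy(const uint8_t *map, size_t map_size,
			 size_t off, size_t len, uint32_t *out)
{
	assert(map != NULL && out != NULL);
	if (off > map_size || map_size - off < len)
		return -1;
	uint8_t *buf = malloc(len ? len : 1);
	if (!buf)
		return -1;
	memcpy(buf, map + off, len);
	*out = sum_bytes(buf, len);
	free(buf);
	return 0;
}

/* Zero-copy approach: bounds-check once, then checksum the mapped bytes. */
static int checksum_inplace(const uint8_t *map, size_t map_size,
			    size_t off, size_t len, uint32_t *out)
{
	assert(map != NULL && out != NULL);
	if (off > map_size || map_size - off < len)
		return -1;
	*out = sum_bytes(map + off, len);
	return 0;
}
```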
If they go to stderr, you can't associate them with the record they're talking about.

Signed-off-by: Rusty Russell <[email protected]>
rustyrussell force-pushed the guilt/gossmap-crash2 branch from d32b9e5 to 15f5e4c (February 6, 2025 02:44)
Fixes #7971