
Giant 11GB gossip stores and node crashes in v24.08.1 #7763

Open
@m-schmoock

Description


Issue and Steps to Reproduce

A v24.08.1 mainnet node I have access to was creating 11GB gossip_store files. I didn't notice this until the node crashed while closing a channel with:

2024-10-23T19:38:53.032Z **BROKEN** gossipd: gossip_store: get delete entry offset 1399/10934507092 (version v24.08.1-modded)
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: common/daemon.c:38 (send_backtrace) 0x5575ea2847
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: common/status.c:221 (status_failed) 0x5575ead743
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:466 (gossip_store_get_with_hdr) 0x5575e990fb
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:592 (gossip_store_set_timestamp) 0x5575e9975b
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:777 (process_channel_update) 0x5575e9aeeb
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:901 (gossmap_manage_channel_update) 0x5575e9b8ab
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:215 (handle_recv_gossip) 0x5575e97f17
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:307 (connectd_req) 0x5575e98017
2024-10-23T19:38:53.033Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x5575ea2b6b
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x5575f34397
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x5575f3496f
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x5575f34a4b
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x5575f36a0b
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:672 (main) 0x5575e9831f
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7f8ce2fdd7
2024-10-23T19:38:53.034Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x5575e94167
2024-10-23T19:38:53.034Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: gossip_store: get delete entry offset 1399/10934507092
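For context, the "get delete entry offset 1399/10934507092" message pairs the record offset with the total store length, which suggests a record-level bounds/consistency check failing while marking an entry deleted. The snippet below is only a minimal, self-contained sketch of that kind of check, not the actual gossip_store code; the struct layout, field names, and the record_fits helper are assumptions for illustration.

```c
/* Sketch only: NOT Core Lightning's gossip_store format. Illustrates the
 * kind of bounds check implied by "get delete entry offset <off>/<len>":
 * read a record header at an offset and verify the record fits in the file. */
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk record header: big-endian length, checksum, timestamp. */
struct record_hdr {
	uint32_t be_len;
	uint32_t be_crc;
	uint32_t be_timestamp;
};

/* Return true if the record starting at `off` fits inside a store of
 * `store_len` bytes; false is the failing case reported above. */
static bool record_fits(const uint8_t *store, uint64_t store_len, uint64_t off)
{
	struct record_hdr hdr;

	if (off + sizeof(hdr) > store_len)
		return false;
	memcpy(&hdr, store + off, sizeof(hdr));
	return off + sizeof(hdr) + ntohl(hdr.be_len) <= store_len;
}

int main(void)
{
	uint8_t store[64] = {0};

	/* A record claiming 232 payload bytes cannot fit in a 64-byte store. */
	store[3] = 0xe8;
	printf("record fits: %s\n",
	       record_fits(store, sizeof(store), 0) ? "yes" : "no");
	return 0;
}
```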

After that I found out it had been creating these huge gossip files. This was not the first time the node produced such a large gossip store, as a gossip_store.corrupt of the same size already existed on the node.
The node also had ridiculously long startup times, which I now believe were caused by processing these jumbo gossip stores.

If required, I can upload the 11GB store to my server so you can use it for debugging...
