Skip to content

fuzz-tests: improve fuzz-initial_channel #8373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Chand-ra
Copy link

Add a couple of improvements to the fuzz test for common/initial_channel.{c, h}- tests/fuzz/fuzz-initial_channel.

Checklist

Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:

  • The changelog has been updated in the relevant commit(s) according to the guidelines.
  • Tests have been added or modified to reflect the changes.
  • Documentation has been reviewed and updated as needed.
  • Related issues have been listed and linked, including any that this PR closes.

Chandra Pratap added 3 commits June 25, 2025 07:06
Changelog-None: The current test can leak memory due to improper
cleanup in the case of an early return. Fix it.
Currently, `fuzz-initial_channel` doesn't verify the following
functions in its target file, `common/initial_channel.h`:
`channel_update_funding()` and `initial_channel_tx()`.

Add a test for them.
@Chand-ra
Copy link
Author

Hey @morehouse,

This target crashes for what is seemingly a bug but I'm not sure. I've added the crashing input to the corpus for you to take a look. Here is all I was able to uncover from my investigation:

The fuzzer fails with the following assertion:

fuzz-initial_channel: bitcoin/tx.c:548: void bitcoin_tx_finalize(struct bitcoin_tx *): Assertion `bitcoin_tx_check(tx)' failed.

which traces back to the newly added initial_channel_tx() check in the fuzzer. Here is bitcoin_tx_check():

bool bitcoin_tx_check(const struct bitcoin_tx *tx)
{
	u8 *newtx;
	size_t written;
	int flags = WALLY_TX_FLAG_USE_WITNESS;

	if (wally_tx_get_length(tx->wtx, flags, &written) != WALLY_OK)
		return false;

	newtx = tal_arr(tmpctx, u8, written);
	if (wally_tx_to_bytes(tx->wtx, flags, newtx, written, &written) !=          <-- This condition causes the error
	    WALLY_OK)
		return false;

	if (written != tal_bytelen(newtx))
		return false;

	tal_free(newtx);
	return true;
}

which in turn fails due to a condition inside external/libwally-core/src/transaction.c::tx_to_bytes():

if (!(flags & WALLY_TX_FLAG_ALLOW_PARTIAL)) {
    /* 0-input/output txs can be only be written with this flag */
    if (!tx->num_inputs || !tx->num_outputs)
        return WALLY_EINVAL;
}

triggered by the fact that our transaction has wtx->num_outputs = 0:

(gdb) n
548		assert(bitcoin_tx_check(tx));
(gdb) p *tx
$1 = {wtx = 0x508000000348, chainparams = 0x555555d193e0 <networks>, psbt = 0x512000000368}
(gdb) p *tx->wtx
$2 = {version = 2, locktime = 538831143, inputs = 0x5110000006a8, num_inputs = 1, inputs_allocation_len = 1, outputs = 0x515000000828, num_outputs = 0, outputs_allocation_len = 4}

I looked around for callers of initial_channel_tx() and found some callers where the function is called in a similar way to what we do in our fuzzing setup. Here's openingd/openingd.c::funder_finalize_channel_setup() for example:

static bool funder_finalize_channel_setup(struct state *state,
					  struct amount_msat local_msat,
					  struct bitcoin_signature *sig,
					  struct bitcoin_tx **tx,
					  struct penalty_base **pbase)
{
	u8 *msg;
	struct channel_id id_in;
	const u8 *wscript;
	struct channel_id cid;
	char *err_reason;
	struct wally_tx_output *direct_outputs[NUM_SIDES];

	/*~ Channel is ready; Report the channel parameters to the signer. */
	msg = towire_hsmd_setup_channel(NULL,
				       true,	/* is_outbound */
				       state->funding_sats,
				       state->push_msat,
				       &state->funding.txid,
				       state->funding.n,
				       state->localconf.to_self_delay,
				       state->upfront_shutdown_script[LOCAL],
				       state->local_upfront_shutdown_wallet_index,
				       &state->their_points,
				       &state->their_funding_pubkey,
				       state->remoteconf.to_self_delay,
				       state->upfront_shutdown_script[REMOTE],
				       state->channel_type);
	wire_sync_write(HSM_FD, take(msg));
	msg = wire_sync_read(tmpctx, HSM_FD);
	if (!fromwire_hsmd_setup_channel_reply(msg))
		status_failed(STATUS_FAIL_HSM_IO, "Bad setup_channel_reply %s",
			      tal_hex(tmpctx, msg));


	derive_channel_id(&cid, &state->funding);

	state->channel = new_initial_channel(state,
					     &cid,
					     &state->funding,
					     state->minimum_depth,
					     NULL, 0, /* No channel lease */
					     state->funding_sats,
					     local_msat,
					     take(new_fee_states(NULL, LOCAL,
								 &state->feerate_per_kw)),
					     &state->localconf,
					     &state->remoteconf,
					     &state->our_points,
					     &state->their_points,
					     &state->our_funding_pubkey,
					     &state->their_funding_pubkey,
					     state->channel_type,
					     feature_offered(state->their_features,
							     OPT_LARGE_CHANNELS),
					     /* Opener is local */
					     LOCAL);
	/* We were supposed to do enough checks above, but just in case,
	 * new_initial_channel will fail to create absurd channels */
	if (!state->channel)
		peer_failed_err(state->pps,
				&state->channel_id,
				"could not create channel with given config");

	*tx = initial_channel_tx(state, &wscript, state->channel,
				&state->first_per_commitment_point[REMOTE],
				REMOTE, direct_outputs, &err_reason);
...
...
...

This makes believe that it is probably some setup that we're missing or that this is an actual bug, but I'm unsure on how to proceed with investigating either of these. Any ideas?

@morehouse
Copy link
Contributor

morehouse commented Jun 27, 2025

The commitment transaction has no outputs, which means both sides of the channel have balances below their designated dust_limit. Normally this is prevented by the various checks upon receipt of open_channel or accept_channel, which ensure that dust limits and channel reserves are compatible. See fundee_channel and funder_channel_start.

Probably since this fuzz target skips all those checks, we can get all kinds of invalid channel parameters and cause problems. If we can trigger this assertion while satisfying those checks, we probably have a serious DoS vector. A straightforward way to determine if that's possible would be with state machine fuzzing of fundee_channel, as we discussed a while ago.

@Chand-ra
Copy link
Author

Hey @morehouse,

Sorry for the late reply to this, I was working on creating a fuzz test for fundee_channel() that could potentially trigger this breakage, and it turned out to be more challenging than I had anticipated. That being said, I was eventually able to get it done, and the resulting target successfully reproduces the issue.

I've pushed the test as #8411 with the breaking input as the corpus; let me know what you think.

@Chand-ra
Copy link
Author

Hey @morehouse , I added the fix in #8411 to this test and while it does solve the issue it hand, it breaks with another assertion failure:

fuzz-initial_channel: bitcoin/script.c:411: u8 *bitcoin_wscript_to_remote_anchored(const tal_t *, const struct pubkey *, u32): Assertion `is_to_remote_anchored_witness_script(script, tal_bytelen(script))' failed.
==8983== ERROR: libFuzzer: deadly signal
    #0 0x572ab9288e45 in __sanitizer_print_stack_trace (/home/chandra/lightning/tests/fuzz/fuzz-initial_channel+0x28de45) (BuildId: 74f51ae6856ec8a16a3afb73b50d4ab159a62ba6)
    #1 0x572ab91e295c in fuzzer::PrintStackTrace() (/home/chandra/lightning/tests/fuzz/fuzz-initial_channel+0x1e795c) (BuildId: 74f51ae6856ec8a16a3afb73b50d4ab159a62ba6)
    #2 0x572ab91c89e7 in fuzzer::Fuzzer::CrashCallback() (/home/chandra/lightning/tests/fuzz/fuzz-initial_channel+0x1cd9e7) (BuildId: 74f51ae6856ec8a16a3afb73b50d4ab159a62ba6)
    #3 0x74b7b7c4532f  (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4532f) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
    #4 0x74b7b7c9eb2b in __pthread_kill_implementation nptl/pthread_kill.c:43:17
    #5 0x74b7b7c9eb2b in __pthread_kill_internal nptl/pthread_kill.c:78:10
    #6 0x74b7b7c9eb2b in pthread_kill nptl/pthread_kill.c:89:10
    #7 0x74b7b7c4527d in raise signal/../sysdeps/posix/raise.c:26:13
    #8 0x74b7b7c288fe in abort stdlib/abort.c:79:7
    #9 0x74b7b7c2881a in __assert_fail_base assert/assert.c:96:3
    #10 0x74b7b7c3b516 in __assert_fail assert/assert.c:105:3
    #11 0x572ab945517f in bitcoin_wscript_to_remote_anchored /home/chandra/lightning/bitcoin/script.c:411:2
    #12 0x572ab9341c86 in initial_commit_tx /home/chandra/lightning/common/initial_commit_tx.c:248:13
    #13 0x572ab933bbc4 in initial_channel_tx /home/chandra/lightning/common/initial_channel.c:121:12
    #14 0x572ab948a83e in run /home/chandra/lightning/tests/fuzz/fuzz-initial_channel.c:123:8
    #15 0x572ab92bc7e8 in LLVMFuzzerTestOneInput /home/chandra/lightning/tests/fuzz/libfuzz.c:25:2
...
...

I took a quick look and here is what I found:

  • In common/initial_channel.c, initial_channel_tx() calculates a csv_lock value using

    csv_lock = channel->lease_expiry - get_blockheight(...)

    where channel->lease_expiry is something we set using the fuzzer.

  • The csv_lock is passed to initial_commit_tx which creates a to_remote output when the option_anchor_outputs or option_anchors_zero_fee_htlc_tx feature is enabled.

  • To do this, it calls bitcoin_wscript_to_remote_anchored, passing it the csv_lock value.

  • Inside bitcoin/script.c, the bitcoin_wscript_to_remote_anchored function constructs a script. A key part of this script is the csv_lock number, which is added using the add_number helper function.

  • The add_number function can encode a number using a variable number of bytes (from 1 to 6 bytes). For values greater than 32,767 (0x7FFF), it requires at least 4 bytes.

  • Immediately after creating the script, an assertion checks its validity using is_to_remote_anchored_witness_script.

  • The validation function is_to_remote_anchored_witness_script has a strict length check:

    if (script_len < len || script_len > len + 2)

    where len is 37. This allows for a maximum script length of 39 bytes.

  • The fixed parts of the script sum to 36 bytes. This leaves a maximum of 3 bytes for the encoded csv_lock number.

  • In the crashing input, the csv_lock value is 65023 which results in 4 bytes being used to place the value in the script, resulting in the crash.

Is this an actual bug or a false positive caused by an impossible value of channel->lease_expiry?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants