Add support for native async KVStore persist to ChainMonitor #4063
Conversation
👋 Thanks for assigning @joostjager as a reviewer!
Codecov Report

@@            Coverage Diff             @@
##             main    #4063      +/-  ##
==========================================
+ Coverage   88.71%   88.72%    +0.01%
==========================================
  Files         176      177        +1
  Lines      132778   133252      +474
==========================================
+ Hits       117790   118234      +444
- Misses      12304    12320       +16
- Partials     2684     2698       +14
Great to see how you wrestle the wrappers, polling and generics, especially after having fought with that myself for quite some time.
lightning/src/util/persist.rs (outdated)
);
(start, end)
})
let latest_update_id = monitor.get_latest_update_id();
Is there test coverage to make this change safely? Migration code often isn't covered as well. It seems that persister_with_real_monitors provides some, but not sure what exactly.
Also wondering if this change is necessary for this PR. Otherwise it might be better to split it off.
It's sadly required. We have to make the persist_new_channel call as a non-async call and then block its future later. If we kept the old code, which read the old monitor, we'd need to await it before calling persist_new_channel, and suddenly we can have write-order inversions. I'll add a comment describing why it's important.
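To make the hazard concrete, the difference looks roughly like this (a simplified sketch with hypothetical variable names, not the PR's exact code):

// BAD: awaiting a read before initiating the write yields to the runtime,
// letting another task enqueue its own write for the same monitor first.
let old_monitor = kv_store.read(primary_ns, secondary_ns, key).await?;
kv_store.write(primary_ns, secondary_ns, key, new_bytes).await?;

// GOOD: create the write future synchronously so its position in the write
// order is fixed before anything can yield; reads/cleanup happen afterwards.
let write_fut = kv_store.write(primary_ns, secondary_ns, key, new_bytes);
write_fut.await?;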
But this would break upgrades for any monitors written pre-0.1, right (i.e., even if users just have them still lying around)? Should we document that in the pending_changelog?
It shouldn't? The change is still functionally-equivalent, we just do the read and cleanup after the first write.
Initial question about test coverage is still open?
Right, we should add test coverage but the previous logic also had no coverage, and the new logic is clearly safer as it relies on listing and existing code rather than code specifically for this case. #4104
/// [`MonitorUpdatingPersisterAsync`] and thus allows persistence to be completed async.
///
/// Note that async monitor updating is considered beta, and bugs may be triggered by its use.
pub fn new_async_beta(
I think there are now three ways to do persistence: sync, the previous async way via implementing a different Persist, and this new_async_beta?
Is there any form of consolidation possible between the two async setups?
Yea, I mention it in the last commit, but I think the eventual consolidation should be that we merge MonitorUpdatingPersister into ChainMonitor, and then the Persist interface is just the interface between ChannelManager and ChainMonitor. A user will always just instantiate a ChainMonitor with either a KVStore or a KVStoreSync and we'll deal with the rest.
That makes sense to me. I just wondered if we should already steer towards MonitorUpdatingPersister with an async KV store as the only way to do async. I don't think it is more "beta" than the current callback-based async?
Thoughts on this?
Ah, I'd missed this. I don't really see a strong reason to change the current API and remove the manual-async approach immediately. It's not additional code to maintain (given the new logic uses it under the hood anyway) and we do have folks using it. That said, it does probably make sense to deprecate it, which I'll go ahead and do here.
Oh, actually never mind, we should have a discussion about whether we want to support async outside of Rust, which would need the old API (or a way to make async KVStore work outside of Rust, which I think we can do eventually as well).
I didn't consider the bindings. Not great to remain stuck with multiple ways to do it, but not sure what we can do either.
I think we can map the async stuff to bindings eventually, so it's not like we're stuck, just a question of priorities.
/// - [`MonitorUpdatingPersister`] will potentially have more listing to do if you need to run
///   [`MonitorUpdatingPersister::cleanup_stale_updates`].
///
/// Note that you can disable the update-writing entirely by setting `maximum_pending_updates`
Can the Persist impl for KvStore (non-updating) be removed now?
Yea, I think the next step in cleaning things up would be to consolidate ChainMonitor and MonitorUpdatingPersister, and then we'd remove that blanket impl.
I think we could already remove it now and let users use MonitorUpdatingPersister with zero updates? Or do you want to keep it to avoid a two-step API change?
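For context, "zero updates" here just means passing 0 as maximum_pending_updates when constructing the persister, roughly like this (a sketch; the surrounding setup is elided and parameter order may not match the real signature exactly):

let persister = MonitorUpdatingPersister::new(
	kv_store,
	logger,
	0, // maximum_pending_updates: 0 disables update-writing entirely
	entropy_source,
	signer_provider,
	broadcaster,
	fee_estimator,
);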
Force-pushed c5c70bc → 53cb9f6.
Did a first high-level pass, looks good so far!
I think this would be a great candidate to follow the recently-discussed policy of testing on LDK Node. Mind also opening a draft PR against the develop branch we now have, to test this out? (Doesn't need to be production-ready; happy to take it over eventually.)
lightning/src/util/persist.rs (outdated)
}

/// A generic trait which is able to spawn futures in the background.
pub trait FutureSpawner: Send + Sync + 'static {
Should this live in lightning-types or a new shared lightning-util crate, so that we can DRY up the one in lightning-block-sync?
Maybe? Not entirely sure what to do with it honestly. It doesn't really belong in lightning-types, and adding a crate just for this seems like overkill... We could probably move it to lightning::util in a separate module and just use that one in lightning-block-sync?
Actually, I went ahead and did this.
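For illustration, a runtime-backed implementation of the trait only needs a few lines (a sketch; TokioSpawner is a hypothetical name here, and the Send bound shown is what MaybeSend resolves to in std builds):

struct TokioSpawner;
impl FutureSpawner for TokioSpawner {
	fn spawn<T: Future<Output = ()> + Send + 'static>(&self, future: T) {
		// Hand the future off to the tokio runtime to drive in the background.
		tokio::spawn(future);
	}
}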
struct PanicingSpawner;
impl FutureSpawner for PanicingSpawner {
	fn spawn<T: Future<Output = ()> + MaybeSend + 'static>(&self, _: T) {
		unreachable!();
Can we add a message here explaining what happened if it was ever hit?
I thought that was clear from the PanicingSpawner name :) I could write out why it's used below, but it seems weird to put that here rather than where PanicingSpawner is used?
Right, but if it panics we won't see the type name, no? Not that important, but some message could provide some initial context on what happened/why we crashed?
It'll still give a line number 🤷♂️
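For reference, the suggested message would be a one-liner (wording hypothetical):

unreachable!("PanicingSpawner should never be asked to spawn a future; sync wrappers must never produce async work");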
if let Some(a) = res_a {
	a.await?;
}
if let Some(b) = res_b {
Can we make these else if branches to express that we'd only ever deal with one res here?
Seems more correct not to, given we're not returning the Ok from them, just ?-ing them?
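For context, the two shapes under discussion, simplified:

// As written: each optional future is awaited independently.
if let Some(a) = res_a {
	a.await?;
}
if let Some(b) = res_b {
	b.await?;
}

// Suggested: else-if would encode that at most one is ever Some.
if let Some(a) = res_a {
	a.await?;
} else if let Some(b) = res_b {
	b.await?;
}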
///
/// Note that async monitor updating is considered beta, and bugs may be triggered by its use.
pub fn new_async_beta(
	chain_source: Option<C>, broadcaster: T, logger: L, feeest: F,
nit: Why not simply fee_estimator?
Cause I copied the one from new 🤷♂️
Indeed, though I don't actually think we want to use this right away - because async persist is still considered "beta" I don't think there's anything actionable in LDK Node until probably at least 0.3? I'm happy to do a test integration, but does LDK Node already have support for async KVStore use? I'll at least go update
Force-pushed ac1c9d5 → dfaa102.
For the former, it will after lightningdevkit/ldk-node#633 lands, which I intend to finish ~early next week (will also need some changes over here in LDK). For the latter we already had a PR open/closed; that was however blocked on the LDK upgrade (lightningdevkit/ldk-node#456). I think I'll see to picking that up in the coming week(s), too.
Actually: #4069
Force-pushed def7770 → 403385f.
Updated
Force-pushed 6616a70 → c2079d3.
Ok(()) => inner.async_completed_updates.lock().unwrap().push(completion),
Err(e) => {
	log_error!(
		inner.logger,
		"Failed to persist new ChannelMonitor {channel_id}: {e}. The node will now likely stall as this channel will not be able to make progress. You should restart as soon as possible.",
	);
},
The error handling in spawn_async_persist_new_channel adds the completion to async_completed_updates regardless of whether there's an error, while the similar error handling in spawn_async_update_persisted_channel at line 908 only adds the completion when there's no error. These behaviors should be consistent - either both should add completions only on success, or both should add completions regardless of errors.
Spotted by Diamond
Force-pushed c4748f9 → 5b058df.
Force-pushed ed66038 → a501bc7.
Changes LGTM, I think.
I do worry that in three weeks we'll start forgetting about a lot of the context/details regarding the KVStore ordering requirements and will trip up eventually. Makes sense to move forward now, but I do wonder if we could do a better job enforcing some of our assumptions via types eventually.
	CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
	CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
).await?;
let primary = CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE;
Personal preference, and we discussed this many times already, but IMO these 'cleanups' make the code harder to read, as to me it's much clearer to read the fully-capitalized constants in the write call than going through the indirection of 2-3 less-descriptive variables.
I'd take some more verticality and some more indentation/structure over additional variables any day. FWIW, in any of these cases we even maintain the number of lines, so they don't even reduce verticality.
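For reference, the two styles being compared look roughly like this (argument lists abbreviated):

// Style A: fully-capitalized constants inline at the call site.
kv_store.write(
	CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
	CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
	&key, data,
).await?;

// Style B: short local bindings first, a shorter call after.
let primary = CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE;
let secondary = CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE;
kv_store.write(primary, secondary, &key, data).await?;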
In some of the cases it could probably be walked back, as I was hoping to reuse the namespace variables but it turned out they were different. Given I failed to recognize they were different, we really need to change the constants so that they're much shorter and can be eyeballed more easily.
> In some of the cases it could probably be walked back, as I was hoping to reuse the namespace variables but it turned out they were different. Given I failed to recognize they were different, we really need to change the constants so that they're much shorter and can be eyeballed more easily.
I disagree. They were intentionally kept expressive and consistent.
We can discuss on #4105, but I wrote to the wrong namespace at least 3 times at various points in this PR, both from copying around code and from bogus attempts at cleaning up code. That indicates to me pretty clearly that we need to rework those constants, cause they're gonna cause issues.
	CHANNEL_MONITOR_UPDATE_PERSISTENCE_PRIMARY_NAMESPACE,
	monitor_key.as_str(),
let primary = CHANNEL_MONITOR_UPDATE_PERSISTENCE_PRIMARY_NAMESPACE;
// Note that this is NOT an async function, but rather calls the *sync* KVStore
This really makes me wonder if we should still somehow reflect the ordering requirements in the KVStore interface itself (could split write into register_write/complete_write calls, for example). It seems very likely to me that we'll otherwise eventually mess up?
Yea, I agree there's room for a better API here, though it's not at all clear to me what that is. A register/write split seems like it locks the implementation into something like our queuing stuff in FS Store, but that doesn't need to be required globally. We'd probably end up with some kind of crazy intermediate-state return that gets passed back to complete, which also seems pretty annoying.
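To make the trade-off concrete, the floated split might look something like this (a purely hypothetical trait, not anything that exists in LDK):

pub trait OrderedKVStore {
	// Opaque state tying a registered write to its eventual completion.
	type PendingWrite;
	// Synchronously fixes this write's position in the global write order.
	fn register_write(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: Vec<u8>,
	) -> Self::PendingWrite;
	// Performs the actual async I/O for a previously-registered write.
	fn complete_write(
		&self, write: Self::PendingWrite,
	) -> impl Future<Output = Result<(), io::Error>> + Send;
}

The PendingWrite associated type is exactly the "intermediate state return" described above, which every implementation would have to thread through.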
Force-pushed a501bc7 → 0abc47b.
CI failure appears to just be a slow-IO issue :/
Feel free to squash.
Force-pushed 0abc47b → 287b9c8.
Squashed, only additionally addressing some comments from @tnull I'd missed:

$ git diff-tree -U1 0abc47b6c 462a6479d9
diff --git a/lightning/src/util/persist.rs b/lightning/src/util/persist.rs
index 5d3575e1ec..9036a27f49 100644
--- a/lightning/src/util/persist.rs
+++ b/lightning/src/util/persist.rs
@@ -1479,5 +1479,3 @@ mod tests {
// Exercise the `MonitorUpdatingPersister` with real channels and payments.
- fn do_persister_with_real_monitors(persisters_max_pending_updates: (u64, u64)) {
- let persister_0_max_pending_updates = persisters_max_pending_updates.0;
- let persister_1_max_pending_updates = persisters_max_pending_updates.1;
+ fn do_persister_with_real_monitors(max_pending_updates_0: u64, max_pending_updates_1: u64) {
let chanmon_cfgs = create_chanmon_cfgs(4);
@@ -1487,3 +1485,3 @@ mod tests {
&chanmon_cfgs[0].logger,
- persister_0_max_pending_updates,
+ max_pending_updates_0,
&chanmon_cfgs[0].keys_manager,
@@ -1497,3 +1495,3 @@ mod tests {
&chanmon_cfgs[1].logger,
- persister_1_max_pending_updates,
+ max_pending_updates_1,
&chanmon_cfgs[1].keys_manager,
@@ -1546,6 +1544,6 @@ mod tests {
let monitor_name = mon.persistence_key();
- let expected_updates = if persister_0_max_pending_updates == 0 {
+ let expected_updates = if max_pending_updates_0 == 0 {
0
} else {
- mon.get_latest_update_id() % persister_0_max_pending_updates
+ mon.get_latest_update_id() % max_pending_updates_0
};
@@ -1564,6 +1562,6 @@ mod tests {
let monitor_name = mon.persistence_key();
- let expected_updates = if persister_1_max_pending_updates == 0 {
+ let expected_updates = if max_pending_updates_1 == 0 {
0
} else {
- mon.get_latest_update_id() % persister_1_max_pending_updates
+ mon.get_latest_update_id() % max_pending_updates_1
};
@@ -1592,3 +1590,3 @@ mod tests {
let mut sender = 0;
- for i in 3..=persister_0_max_pending_updates * 2 {
+ for i in 3..=max_pending_updates_0 * 2 {
let receiver;
@@ -1635,6 +1633,5 @@ mod tests {
// Make sure everything is persisted as expected after close.
- // We always send at least two payments, and loop up to persister_0_max_pending_updates *
- // 2.
+ // We always send at least two payments, and loop up to max_pending_updates_0 * 2.
check_persisted_data!(
- cmp::max(2, persister_0_max_pending_updates * 2) * EXPECTED_UPDATES_PER_PAYMENT + 1
+ cmp::max(2, max_pending_updates_0 * 2) * EXPECTED_UPDATES_PER_PAYMENT + 1
);
@@ -1644,5 +1641,5 @@ mod tests {
fn persister_with_real_monitors() {
- do_persister_with_real_monitors((7, 3));
- do_persister_with_real_monitors((0, 1));
- do_persister_with_real_monitors((4, 2));
+ do_persister_with_real_monitors(7, 3);
+ do_persister_with_real_monitors(0, 1);
+ do_persister_with_real_monitors(4, 2);
}
Though users maybe shouldn't use `MonitorUpdatingPersister` if they don't actually want to persist `ChannelMonitorUpdate`s, we also shouldn't panic if `maximum_pending_updates` is set to zero.
In the coming commits `MonitorUpdatingPersister`'s internal state will be reworked. To avoid spurious test diff, we instead use the public API of `MonitorUpdatingPersister` rather than internal bits in tests.
In the coming commits, we'll use the `MonitorUpdatingPersister` as *the* way to do async monitor updating in the `ChainMonitor`. However, to support folks who don't actually want a `MonitorUpdatingPersister` in that case, we explicitly support them setting `maximum_pending_updates` to 0, disabling all of the update-writing behavior.
As we've done with several other structs, this adds an async variant of `MonitorUpdatingPersister` and adds an async-sync wrapper for those using `KVStoreSync`. Unlike with other structs, we leave `MonitorUpdatingPersister` as the sync variant and make the new async logic a `MonitorUpdatingPersisterAsync` as the async monitor updating flow is still considered beta. This does not yet expose the async monitor updating logic anywhere, as doing a standard `Persist` async variant would not work for ensuring the `ChannelManager` and `ChainMonitor` don't block on async writes or suddenly require a runtime.
Pre-0.1, after a channel was closed we generated `ChannelMonitorUpdate`s with a static `update_id` of `u64::MAX`. In this case, when using `MonitorUpdatingPersister`, we had to read the persisted `ChannelMonitor` to figure out what range of monitor updates to remove from disk. However, now that we have a `list` method there's no reason to do this anymore, we can just use that. Simplifying code that we anticipate never hitting anymore is always a win.
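A rough sketch of the list-based cleanup this enables (hypothetical variable names, not the PR's exact code):

// Rather than reading the old monitor to learn the update range, list the
// persisted update keys and remove any at-or-below the latest update_id.
for key in kv_store.list(update_primary_ns, &monitor_key).await? {
	if let Ok(update_id) = key.parse::<u64>() {
		if update_id <= latest_update_id {
			// `true` here is the KVStore remove call's lazy flag.
			kv_store.remove(update_primary_ns, &monitor_key, &key, true).await?;
		}
	}
}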
In the next commit we'll use this to spawn async persistence operations in the background, but for now we just move the `lightning-block-sync` `FutureSpawner` into `lightning`.
In the next commit we'll add the ability to use an async `KVStore` as the backing for a `ChainMonitor`. Here we tee this up by adding an async API to `MonitorUpdatingPersisterAsync`. It's not intended for public use and is thus only `pub(crate)`, but allows spawning all operations via a generic `FutureSpawner` trait, initiating the write via the `KVStore` before any `await`s (or async functions). Because we aren't going to make the `ChannelManager` (or `ChainMonitor`) fully async, we need a way to alert the `ChainMonitor` when a persistence completes, but we leave that for the next commit.
This finally adds support for full native Rust `async` persistence to `ChainMonitor`. Way back when, before we had any other persistence, we added the `Persist` trait to persist `ChannelMonitor`s. It eventually grew homegrown async persistence support via a simple immediate return and callback upon completion. We later added a persistence trait in `lightning-background-processor` to persist the few fields that it needed to drive writes for. Over time, we found more places where persistence was useful, and we eventually added a generic `KVStore` trait. In dc75436 we removed the `lightning-background-processor` `Persister` in favor of simply using the native `KVStore` directly. Here we continue that trend, building native `async` `ChannelMonitor` persistence on top of our native `KVStore` rather than hacking support for it into the `chain::Persist` trait. Because `MonitorUpdatingPersister` already exists as a common way to wrap a `KVStore` into a `ChannelMonitor` persister, we build exclusively on that (though note that the "monitor updating" part is now optional), utilizing its new async option as our native async driver. Thus, we end up with a `ChainMonitor::new_async_beta` which takes a `MonitorUpdatingPersisterAsync` rather than a classic `chain::Persist` and then operates the same as a normal `ChainMonitor`. While the requirement that users now use a `MonitorUpdatingPersister` to wrap their `KVStore` before providing it to `ChainMonitor` is somewhat awkward, as we move towards a `KVStore`-only world it seems like `MonitorUpdatingPersister` should eventually merge into `ChainMonitor`.
`TestStore` recently got the ability to make async writes, but it wasn't a very useful test as all writes actually completed immediately. Instead, here, we make the writes actually-async, forcing the test to mark writes complete as required.
We only require `Send + Sync` on the `Future`s which are spawned in std builds, so it's weird to require them on the trait itself in all builds. Instead, make them consistent.
We already have pretty good coverage of the `MonitorUpdatingPersister` itself as well as `ChainMonitor`'s update-completion handling, so here we just focus on an end-to-end test to make sure `ChainMonitor` behaves at all correctly in the new async mode.
Force-pushed 287b9c8 → 462a647.
Curious to hear what folks think. No tests yet, which I still need to do.