-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for actor upgrades #1866
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1866 +/- ##
===========================================
- Coverage 75.73% 56.05% -19.68%
===========================================
Files 152 153 +1
Lines 15073 15259 +186
===========================================
- Hits 11415 8553 -2862
- Misses 3658 6706 +3048
|
6c352f5
to
bb5107e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial feedback
fvm/src/call_manager/default.rs
Outdated
// Make a store. | ||
let mut store = engine.new_store(kernel); | ||
|
||
// TODO: a hack until I find a better/simpler way to do this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, honestly, we can probably treat these errors as "fatal" and check earlier. I.e., the final version of upgrade (not for this PR, but later) would:
- Lookup the target code CID in some "deployed actors" table in the init actor.
- Check some metadata generated on deploy to see if the actor supports the upgrade entrypoint.
By the time we get to this code, everything will have been checked and any errors would be considered "fatal".
For now, what we'd likely do is check if:
- The caller is one of builtin actor types X/Y/Z (in this case, it would be restricted to an eth account).
- The target code CID is an EVM actor (again, using the builtin actors manifest).
Because that's the immediate use-case (upgrading eth accounts into EVM actors).
dc4e730
to
22242b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more thoughts. Let's talk tomorrow.
fvm/src/call_manager/default.rs
Outdated
|
||
fn maybe_put_registry(&mut self, br: &mut BlockRegistry) -> Result<()> { | ||
match self.entrypoint { | ||
Entrypoint::Invoke(_) => Ok(()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we should treat invoke params the same way (if possible).
2de14cd
to
22318f9
Compare
@Stebalien ready for another review. There are still some build errors in CI which are not occurring on my machine which I am looking into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only major thing is recursion handling. Otherwise, quibbles but LGTM. I'll take one more look tomorrow.
fvm/src/kernel/default.rs
Outdated
if code != code_after_upgrade { | ||
return Err(syscall_error!(Forbidden; "re-entrant upgrade detected").into()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check should actually be in send
, not upgrade
. I.e.:
- If an actor calls
A (upgrade -> upgrade -> upgrade)
, that's fine. Because, on success, we'll "unwind" past all the upgrade calls all at once. - If an actor has a call stack like
A -> B -> A (upgrade)
, that's not fine and we need to abort when we return to the top-level call intoA
. Otherwise, we'd end up running old code.
It's that last point that's problematic: We never want to re-enter some actor code after we've upgraded away from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, in that second case, we should treat it as a "revert" of B
(I think?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after talking this through with @maciejwitowski, there's an alternative here that I kind of like. Instead of catching this when we return to the top-level invocation of A
, we can catch this on the upgrade syscall itself. To do this, we'd need to:
- Keep a map of
ActorID -> reentrancy count
, incrementing the count each time we re-enter an actor already on the call-stack. - In the upgrade syscall, if this count is non-zero (for the current actor), return a
Forbidden
syscall error (or something like that).
One reason I like this approach is that, independent of this particular change, I'd like to find a way to expose whether or not a call is reentrant to actors. I.e., some kind of flag in their environment where they can detect that they're within a reentrant call. I expect most actors will, at that point, simply abort with an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien I played around with this approach in 109c5e9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I mean that we need to keep track of calls in progress, not upgrades. I.e., if some actor A
is already on the call stack, no "deeper" instance of A
should be able to call the upgrade
syscall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh lets talk sync today, Oh I see, updated the PR with that in mind and added new tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this now, right?
109c5e9
to
1913868
Compare
16d03b2
to
dfc4d61
Compare
dfc4d61
to
0f65803
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing: Let's make sure we cover cases like:
- Selfdestruct -> upgrade
- Re-entrent call -> upgrade
- upgrade -> upgrade
- failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed).
- upgrade to some code CID that doesn't exist (maybe?)
- upgrade to some code CID that doesn't implement the
upgrade
entrypoint.
You're probably already covering some of these cases.
sdk/src/actor.rs
Outdated
@@ -107,6 +107,21 @@ pub fn create_actor( | |||
} | |||
} | |||
|
|||
/// Upgrades an actor using the given block which includes the old code cid and the upgrade params | |||
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave it like this for now, but we should have a better API.
- We generally take
Cid
by reference (it's large so we avoid copying till we need to). - There is no "success" case here.
It should probably look something like:
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> { | |
pub enum UpgradeError { | |
CodeNotInstalled, // code not available, could be worded better. | |
InvalidUpgradeTarget, // code doesn't implement the upgrade entrypoint. | |
ReentrentUpgrade, // actor already on the call stack. | |
UpgradeFailed(Response), | |
} | |
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> Result<std::convert::Infallible, UpgradeError> { |
I wish we could use !
(the correct never type) but that's not stable yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to do this is that we want to treat every possible return here as an error. If the user blindly writes upgrade_actor(...)?
or upgrade_actor(...).expect("upgrade failed")
, that should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. ReentrentUpgrade
may not quite work (see my other comments on the conflict with selfdestruct). Honestly, I'd just change the variant in this enum to IllegalOperation
and document it as an "either or" case (where, 99% of the time, it'll be because the actor doesn't exist).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to take CID by reference, will take a look at infallable/UpgradeError next
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation below, InvalidUpgradeTarget
likely isn't something we need to include in this enum (that's covered by an exit code, not an error number, and we should probably leave the exit codes alone).
sdk/src/sys/actor.rs
Outdated
/// | [`InvalidHandle`] | parameters block not found. | | ||
/// | [`LimitExceeded`] | recursion limit reached. | | ||
/// | [`IllegalArgument`] | invalid code cid buffer. | | ||
/// | [`Forbidden`] | target actor doesn't have an upgrade endpoint. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we're done with everything else, let's make sure to go back and revisit this.
- Rename send_* to call_actor_* in call manager - Check if upgrade is allowed locally inside kernel upgrade_actor - Other minor refactorings
From this list, we are at least covering:
From the remaining cases:
|
It's probably fine to leave that out, in that case.
I mean, I want to make sure that the following works:
At this point, the actor's code CID should be |
Are you sure it's testing the right thing? I.e., See #1866 (comment) |
- Now checking correctly that the code cid changed - Refactored tests into test cases which makes them more readable - Added new upgrade receiver actor so we can test upgrades with different actors - Added test case for testing failure in a recursive upgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
This is a minimal version of #1906 that still compiles the relevent code, just disables it at runtime.
See: #717
This PR adds support for actor upgrades through a new
sdk::actor::upgrade_actor
syscall. This allows actors to upgrade to a new version while still keeping the same address, balance, etc.The implementation follows the proposal discussed in filecoin-project/FIPs#396 almost exactly except with the two minor changes:
sself::become_actor
but we renamed it tosdk::actor::upgrade_actor
to better reflect its meaningUpgradeInfo
parameter to pass in metadata IPLD block (see FVM: Actor Upgrades FIPs#396 (comment))