Add support for actor upgrades #1866

Merged
merged 40 commits into from
Oct 26, 2023


@fridrik01 (Contributor) commented Aug 30, 2023

See: #717

This PR adds support for actor upgrades through a new sdk::actor::upgrade_actor syscall. This allows actors to upgrade to a new version while still keeping the same address, balance, etc.

The implementation follows the proposal discussed in filecoin-project/FIPs#396 almost exactly, with two minor changes:

  • The proposal called the new syscall sself::become_actor; we renamed it to sdk::actor::upgrade_actor to better reflect its meaning.
  • We decided not to create a separate syscall for getting the code CID of the actor initiating the upgrade. Instead, the new syscall takes a second UpgradeInfo parameter that passes in a metadata IPLD block (see FVM: Actor Upgrades FIPs#396 (comment)).
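To illustrate why the UpgradeInfo parameter removes the need for a separate syscall, here is a minimal sketch of the receiving side. It is not the FVM SDK API: `CodeCid`, the `old_code_cid` field name, `upgrade_entrypoint`, and the code names are hypothetical stand-ins (a real actor would use the `cid` crate and decode UpgradeInfo from an IPLD block).

```rust
/// Hypothetical stand-in for `Cid`; a real actor would use the `cid` crate.
pub type CodeCid = &'static str;

/// Metadata handed to the new code's upgrade entrypoint. Only the old code
/// CID is modelled here; the field name is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UpgradeInfo {
    pub old_code_cid: CodeCid,
}

/// Hypothetical upgrade entrypoint of the *new* code. Because the old code
/// CID arrives in `UpgradeInfo`, the new code can pick a migration path
/// without an extra syscall to ask where the upgrade is coming from.
pub fn upgrade_entrypoint(info: &UpgradeInfo) -> Result<&'static str, String> {
    match info.old_code_cid {
        "code-v1" => Ok("migrating state from the v1 layout"),
        "code-v2" => Ok("state already compatible"),
        // Unknown predecessors are rejected, which would abort the upgrade.
        other => Err(format!("upgrade from unknown code {other} rejected")),
    }
}
```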

@codecov-commenter commented Aug 30, 2023

Codecov Report

Merging #1866 (84a47d1) into master (e9044cb) will decrease coverage by 19.68%.
Report is 2 commits behind head on master.
The diff coverage is 0.00%.

Additional details and impacted files


@@             Coverage Diff             @@
##           master    #1866       +/-   ##
===========================================
- Coverage   75.73%   56.05%   -19.68%     
===========================================
  Files         152      153        +1     
  Lines       15073    15259      +186     
===========================================
- Hits        11415     8553     -2862     
- Misses       3658     6706     +3048     
Files Coverage Δ
fvm/src/kernel/error.rs 60.00% <ø> (-12.10%) ⬇️
fvm/src/syscalls/send.rs 0.00% <ø> (-100.00%) ⬇️
fvm/src/trace/mod.rs 0.00% <ø> (-100.00%) ⬇️
shared/src/error/mod.rs 0.00% <ø> (-48.58%) ⬇️
shared/src/lib.rs 20.00% <ø> (ø)
fvm/src/call_manager/backtrace.rs 0.00% <0.00%> (-84.06%) ⬇️
fvm/src/syscalls/error.rs 0.00% <0.00%> (-48.49%) ⬇️
shared/src/upgrade/mod.rs 0.00% <0.00%> (ø)
sdk/src/send.rs 0.00% <0.00%> (ø)
fvm/src/syscalls/mod.rs 0.00% <0.00%> (-97.82%) ⬇️
... and 9 more

... and 45 files with indirect coverage changes

@fridrik01 force-pushed the actor-upgrades branch 6 times, most recently from 6c352f5 to bb5107e on September 1, 2023 16:12
@fridrik01 requested a review from Stebalien on September 5, 2023 14:34
@Stebalien (Member) left a comment

initial feedback

fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/call_manager/default.rs (outdated, resolved)
// Make a store.
let mut store = engine.new_store(kernel);

// TODO: a hack until I find a better/simpler way to do this
Member

So, honestly, we can probably treat these errors as "fatal" and check earlier. I.e., the final version of upgrade (not for this PR, but later) would:

  1. Lookup the target code CID in some "deployed actors" table in the init actor.
  2. Check some metadata generated on deploy to see if the actor supports the upgrade entrypoint.

By the time we get to this code, everything will have been checked and any errors would be considered "fatal".


For now, we'd likely check whether:

  1. The caller is one of builtin actor types X/Y/Z (in this case, it would be restricted to an eth account).
  2. The target code CID is an EVM actor (again, using the builtin actors manifest).

That's the immediate use case (upgrading eth accounts into EVM actors).
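The interim check described above (caller must be an eth account, target must be an EVM actor, both resolved through the builtin-actors manifest) could look roughly like this. `Manifest`, `BuiltinKind`, `UpgradeCheck`, `check_upgrade`, and the CID strings are hypothetical names for illustration, not FVM or builtin-actors types.

```rust
use std::collections::HashMap;

/// Hypothetical subset of builtin actor kinds known to the manifest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuiltinKind {
    Account,
    EthAccount,
    Evm,
    Multisig,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UpgradeCheck {
    Allowed,
    CallerNotAllowed,
    TargetNotAllowed,
}

/// Hypothetical manifest: code CID (as a string) -> builtin actor kind.
pub struct Manifest {
    kinds: HashMap<&'static str, BuiltinKind>,
}

impl Manifest {
    pub fn new(entries: &[(&'static str, BuiltinKind)]) -> Self {
        Manifest { kinds: entries.iter().copied().collect() }
    }

    /// `None` means the code CID is not a builtin actor.
    pub fn kind_of(&self, code_cid: &str) -> Option<BuiltinKind> {
        self.kinds.get(code_cid).copied()
    }
}

/// Interim policy: only eth accounts may upgrade, and only into EVM actors.
pub fn check_upgrade(manifest: &Manifest, caller_code: &str, target_code: &str) -> UpgradeCheck {
    if manifest.kind_of(caller_code) != Some(BuiltinKind::EthAccount) {
        return UpgradeCheck::CallerNotAllowed;
    }
    if manifest.kind_of(target_code) != Some(BuiltinKind::Evm) {
        return UpgradeCheck::TargetNotAllowed;
    }
    UpgradeCheck::Allowed
}
```

Rejecting unknown (non-builtin) target CIDs here is one way to get the "check earlier, treat later errors as fatal" behaviour described above.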

fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/call_manager/default.rs (outdated, resolved)
@fridrik01 force-pushed the actor-upgrades branch 3 times, most recently from dc4e730 to 22242b1 on September 8, 2023 15:21
@Stebalien (Member) left a comment

Some more thoughts. Let's talk tomorrow.

fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/call_manager/default.rs (outdated, resolved)

fn maybe_put_registry(&mut self, br: &mut BlockRegistry) -> Result<()> {
match self.entrypoint {
Entrypoint::Invoke(_) => Ok(()),
Member

IMO, we should treat invoke params the same way (if possible).

fvm/src/call_manager/default.rs (outdated, resolved)
@fridrik01 force-pushed the actor-upgrades branch 5 times, most recently from 2de14cd to 22318f9 on October 3, 2023 15:00
@fridrik01 changed the title from "WIP: Actor Upgrades" to "Add support for actor upgrades" on Oct 3, 2023
@fridrik01 (Contributor Author)

@Stebalien ready for another review. There are still some build errors in CI that don't occur on my machine; I'm looking into them.

@fridrik01 marked this pull request as ready for review on October 3, 2023 15:23
@Stebalien (Member) left a comment

The only major thing is recursion handling. Otherwise just quibbles, but LGTM. I'll take one more look tomorrow.

fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/kernel/default.rs (outdated, resolved)
Comment on lines 940 to 942
if code != code_after_upgrade {
return Err(syscall_error!(Forbidden; "re-entrant upgrade detected").into());
}
Member

I think this check should actually be in send, not upgrade. I.e.:

  1. If an actor calls A (upgrade -> upgrade -> upgrade), that's fine, because on success we'll "unwind" past all the upgrade calls at once.
  2. If an actor has a call stack like A -> B -> A (upgrade), that's not fine and we need to abort when we return to the top-level call into A. Otherwise, we'd end up running old code.

It's that last point that's problematic: We never want to re-enter some actor code after we've upgraded away from it.
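A minimal model of this send-side check: each frame remembers the code it started with, and when a nested send returns control to the frame, it aborts if its actor's code has changed underneath it (the A -> B -> A (upgrade) case). An upgrade -> upgrade chain never reaches this check, because a successful upgrade does not return to its caller. `StateTree`, `Frame`, `ReturnCheck`, and `on_send_return` are hypothetical names, not FVM types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an actor ID.
pub type ActorId = u64;

/// Hypothetical minimal state tree: actor ID -> current code CID.
pub struct StateTree {
    pub code: HashMap<ActorId, &'static str>,
}

/// A call frame remembers which code it is executing.
pub struct Frame {
    pub actor: ActorId,
    pub running_code: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReturnCheck {
    /// The actor still runs the code this frame was started with.
    Continue,
    /// The code changed (or the actor is gone): resuming would run stale code.
    AbortStaleCode,
}

/// Run when a nested `send` returns control to `frame`.
pub fn on_send_return(state: &StateTree, frame: &Frame) -> ReturnCheck {
    if state.code.get(&frame.actor) == Some(&frame.running_code) {
        ReturnCheck::Continue
    } else {
        ReturnCheck::AbortStaleCode
    }
}
```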

Member

And, in that second case, we should treat it as a "revert" of B (I think?).

Member

Actually, after talking this through with @maciejwitowski, there's an alternative here that I kind of like. Instead of catching this when we return to the top-level invocation of A, we can catch this on the upgrade syscall itself. To do this, we'd need to:

  1. Keep a map of ActorID -> reentrancy count, incrementing the count each time we re-enter an actor already on the call-stack.
  2. In the upgrade syscall, if this count is non-zero (for the current actor), return a Forbidden syscall error (or something like that).

One reason I like this approach is that, independent of this particular change, I'd like to find a way to expose whether or not a call is reentrant to actors. I.e., some kind of flag in their environment where they can detect that they're within a reentrant call. I expect most actors will, at that point, simply abort with an error.
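A sketch of the reentrancy-count approach: count, per actor, the invoke frames currently on the call stack, and have the upgrade syscall return Forbidden when the calling actor has more than one. Frames entered through the upgrade entrypoint are not counted, so upgrade -> upgrade chains (point 1 above) stay legal. `CallStack`, `Entrypoint`, and `SyscallError` are hypothetical types for illustration, not the FVM call manager.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an actor ID.
pub type ActorId = u64;

/// How a frame was entered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Entrypoint {
    Invoke,
    Upgrade,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SyscallError {
    Forbidden,
}

/// Tracks the call stack plus, per actor, how many invoke frames are active.
#[derive(Default)]
pub struct CallStack {
    frames: Vec<(ActorId, Entrypoint)>,
    invocations: HashMap<ActorId, u32>,
}

impl CallStack {
    pub fn enter(&mut self, actor: ActorId, entrypoint: Entrypoint) {
        self.frames.push((actor, entrypoint));
        if entrypoint == Entrypoint::Invoke {
            *self.invocations.entry(actor).or_insert(0) += 1;
        }
    }

    pub fn exit(&mut self) {
        // Only invoke frames were counted, so only they are uncounted.
        if let Some((actor, Entrypoint::Invoke)) = self.frames.pop() {
            let remaining = {
                let count = self.invocations.get_mut(&actor).expect("counted on enter");
                *count -= 1;
                *count
            };
            if remaining == 0 {
                self.invocations.remove(&actor);
            }
        }
    }

    /// Number of extra invoke frames of `actor` beyond the outermost one.
    pub fn reentrancy(&self, actor: ActorId) -> u32 {
        self.invocations.get(&actor).map_or(0, |n| n.saturating_sub(1))
    }

    /// Guard for an upgrade syscall issued by the actor in the top frame.
    pub fn check_upgrade(&self) -> Result<(), SyscallError> {
        let (actor, _) = *self.frames.last().expect("upgrade called outside any frame");
        if self.reentrancy(actor) > 0 {
            Err(SyscallError::Forbidden)
        } else {
            Ok(())
        }
    }
}
```

The same `reentrancy` value is what could later be exposed to actors as the "am I in a reentrant call?" flag mentioned above.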

Contributor Author

@Stebalien I played around with this approach in 109c5e9

Member

Ah, I mean that we need to keep track of calls in progress, not upgrades. I.e., if some actor A is already on the call stack, no "deeper" instance of A should be able to call the upgrade syscall.

@fridrik01 (Contributor Author), Oct 18, 2023

Let's talk sync today. Oh, I see: I updated the PR with that in mind and added new tests.

Member

We can remove this now, right?

fvm/src/kernel/error.rs (outdated, resolved)
fvm/src/call_manager/mod.rs (outdated, resolved)
fvm/src/call_manager/mod.rs (outdated, resolved)
@Stebalien (Member) left a comment

Testing: Let's make sure we cover cases like:

  • Selfdestruct -> upgrade
  • Re-entrant call -> upgrade
  • upgrade -> upgrade
  • failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed).
  • upgrade to some code CID that doesn't exist (maybe?)
  • upgrade to some code CID that doesn't implement the upgrade entrypoint.

You're probably already covering some of these cases.

fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/call_manager/default.rs (outdated, resolved)
fvm/src/call_manager/mod.rs (outdated, resolved)
shared/src/lib.rs (outdated, resolved)
testing/integration/tests/readonly_test.rs (resolved)
sdk/src/actor.rs (outdated)
@@ -107,6 +107,21 @@ pub fn create_actor(
}
}

/// Upgrades an actor using the given block which includes the old code cid and the upgrade params
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> {
Member

We can leave it like this for now, but we should have a better API.

  • We generally take Cid by reference (it's large so we avoid copying till we need to).
  • There is no "success" case here.

It should probably look something like:

Suggested change
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> {
pub enum UpgradeError {
CodeNotInstalled, // code not available, could be worded better.
InvalidUpgradeTarget, // code doesn't implement the upgrade entrypoint.
ReentrentUpgrade, // actor already on the call stack.
UpgradeFailed(Response),
}
pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> Result<std::convert::Infallible, UpgradeError> {

I wish we could use ! (the correct never type) but that's not stable yet.

Member

The reason to do this is that we want to treat every possible return here as an error. If the user blindly writes upgrade_actor(...)? or upgrade_actor(...).expect("upgrade failed"), that should work.
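A self-contained sketch of why `Result<Infallible, UpgradeError>` gives that ergonomics: the `Ok` arm is uninhabited, so `?` propagates every real outcome and the caller needs no success branch. The variant names follow the suggestion above; `Response`, `INSTALLED`, `try_upgrade`, and the mock `upgrade_actor` body are hypothetical, and the mock only models failures, since a successful upgrade never returns to the caller.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for the SDK's response type.
#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub exit_code: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum UpgradeError {
    CodeNotInstalled,
    InvalidUpgradeTarget,
    ReentrentUpgrade,
    UpgradeFailed(Response),
}

/// Hypothetical set of installed code CIDs, used only by the mock.
const INSTALLED: &[&str] = &["bafy-evm"];

/// Mock of the suggested signature. Every value it can return is an error.
pub fn upgrade_actor(new_code_cid: &str) -> Result<Infallible, UpgradeError> {
    if !INSTALLED.contains(&new_code_cid) {
        return Err(UpgradeError::CodeNotInstalled);
    }
    // A real successful upgrade would not come back here; the mock models
    // the case where the new code's upgrade entrypoint fails (exit code arbitrary).
    Err(UpgradeError::UpgradeFailed(Response { exit_code: 1 }))
}

/// Caller code can use `?` blindly: the `Ok` value has type `Infallible`,
/// so an empty `match` on it type-checks and needs no success branch.
pub fn try_upgrade(new_code_cid: &str) -> Result<(), UpgradeError> {
    let never: Infallible = upgrade_actor(new_code_cid)?;
    match never {}
}
```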

Member

Hm. ReentrentUpgrade may not quite work (see my other comments on the conflict with selfdestruct). Honestly, I'd just change the variant in this enum to IllegalOperation and document it as an "either or" case (where, 99% of the time, it'll be because the actor doesn't exist).

Contributor Author

I changed it to take the CID by reference; I'll take a look at Infallible/UpgradeError next.

Member

Per our conversation below, InvalidUpgradeTarget likely isn't something we need to include in this enum (that's covered by an exit code, not an error number, and we should probably leave the exit codes alone).

fvm/src/kernel/default.rs (outdated, resolved)
/// | [`InvalidHandle`] | parameters block not found. |
/// | [`LimitExceeded`] | recursion limit reached. |
/// | [`IllegalArgument`] | invalid code cid buffer. |
/// | [`Forbidden`] | target actor doesn't have an upgrade endpoint. |
Member

Once we're done with everything else, let's make sure to go back and revisit this.

testing/integration/tests/main.rs (outdated, resolved)
- Rename send_* to call_actor_* in call manager
- Check if upgrade is allowed locally inside kernel upgrade_actor
- Other minor refactorings
@fridrik01 (Contributor Author)

Testing: Let's make sure we cover cases like:

  • Selfdestruct -> upgrade
  • Re-entrant call -> upgrade
  • upgrade -> upgrade
  • failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed).
  • upgrade to some code CID that doesn't exist (maybe?)
  • upgrade to some code CID that doesn't implement the upgrade entrypoint.

You're probably already covering some of these cases.

From this list, we are at least covering:

  • Selfdestruct -> upgrade: covered with method 5 in fil-upgrade-actor
  • Re-entrant call -> upgrade: covered with method 4 in fil-upgrade-actor
  • upgrade -> upgrade: covered with method 3 in fil-upgrade-actor
  • upgrade to some code CID that doesn't implement the upgrade entrypoint: covered in fil-syscall-actor.

From the remaining cases:

  • upgrade to some code CID that doesn't exist (maybe?): Right now we blindly accept code_cids in the kernel (as long as they can be parsed) and add them to the actor state. What would be the way to check whether that code actually exists (maybe store.get_cbor)?
  • failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed): I'm not sure I understand this. So it's a case of "upgrade -> upgrade"? How would the second upgrade's CID stick around as long as the syscall succeeded?

@Stebalien (Member)

upgrade to some code CID that doesn't exist (maybe?): Right now we blindly accept code_cids in the kernel (as long as they can be parsed) and add them to the actor state. What would be the way to check whether that code actually exists (maybe store.get_cbor)?

It's probably fine to leave that out, in that case.

failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed): I'm not sure I understand this. So it's a case of "upgrade -> upgrade"? How would the second upgrade's CID stick around as long as the syscall succeeded?

I mean, I want to make sure that the following works:

  1. Upgrade from code A to code B.
  2. Inside code B's upgrade function, try to recursively upgrade to code C. Fail.
  3. Return success from code B's upgrade function.

At this point, the actor's code CID should be B, not A or C.
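The scenario above can be modelled with a toy VM that snapshots the code CID before each upgrade and restores it only when that upgrade's own entrypoint fails. `Vm`, `install`, `UpgradeFn`, and the B/C upgrade functions are hypothetical; a real implementation would rely on the VM's own state-revert mechanism rather than a saved field, and there a successful upgrade would not return to its caller as it does here.

```rust
use std::collections::HashMap;

/// Hypothetical signature of an upgrade function exported by some code.
pub type UpgradeFn = fn(&mut Vm) -> Result<(), String>;

/// Toy VM: one actor's current code CID plus the installed upgrade functions.
pub struct Vm {
    pub code: &'static str,
    upgrade_fns: HashMap<&'static str, UpgradeFn>,
}

impl Vm {
    pub fn new(initial_code: &'static str) -> Self {
        Vm { code: initial_code, upgrade_fns: HashMap::new() }
    }

    pub fn install(&mut self, code: &'static str, f: UpgradeFn) {
        self.upgrade_fns.insert(code, f);
    }

    /// Switch to `new_code` and run its upgrade function. On failure, roll
    /// back to the code that was current before *this* call, so a failed
    /// nested upgrade only undoes itself.
    pub fn upgrade(&mut self, new_code: &'static str) -> Result<(), String> {
        let f = *self
            .upgrade_fns
            .get(new_code)
            .ok_or_else(|| format!("{new_code} has no upgrade entrypoint"))?;
        let previous = self.code;
        self.code = new_code;
        match f(self) {
            Ok(()) => Ok(()),
            Err(e) => {
                self.code = previous; // revert only this level
                Err(e)
            }
        }
    }
}

/// Step 2 and 3: B tries to upgrade to C, ignores the failure, and succeeds.
fn b_upgrade(vm: &mut Vm) -> Result<(), String> {
    let _ = vm.upgrade("C");
    Ok(())
}

/// C's upgrade function always fails.
fn c_upgrade(_vm: &mut Vm) -> Result<(), String> {
    Err("C refuses the upgrade".to_string())
}
```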

@Stebalien (Member)

upgrade -> upgrade: covered with method 3 in fil-upgrade-actor

Are you sure it's testing the right thing? I.e., upgrade (to code A) -> upgrade (to code B) should result in code B.

See #1866 (comment)

fridrik01 and others added 3 commits October 26, 2023 12:59
- Now checking correctly that the code cid changed
- Refactored tests into test cases which makes them more readable
- Added new upgrade receiver actor so we can test upgrades with different actors
- Added test case for testing failure in a recursive upgrade
@Stebalien (Member) left a comment

🎉

This is a minimal version of #1906 that still compiles the relevant
code and just disables it at runtime.
@Stebalien enabled auto-merge (squash) October 26, 2023 19:17
@Stebalien merged commit c412b3a into master Oct 26, 2023
14 checks passed
@Stebalien deleted the actor-upgrades branch October 26, 2023 19:31
3 participants