
Support forking from mainnet (or any target network) #625

Open
janewang opened this issue Aug 7, 2024 · 12 comments

@janewang

janewang commented Aug 7, 2024

What problem does your feature solve?

To be able to replicate state and issues seen on another network, or to use that state for testing.

What would you like to see?

Be able to recreate state from mainnet. The node could be forked from the target network at a specific block, or continuously sync with the target network.

What alternatives are there?

@leighmcculloch
Member

leighmcculloch commented Aug 16, 2024

Internal document we should deliver on the action items for:

@leighmcculloch
Member

leighmcculloch commented Aug 21, 2024

Proposed requirements:

  • Start quickstart; core catches up to a specific ledger (any ledger, not just a checkpoint), then disconnects from the network and its quorum and switches to an unsafe quorum consisting only of itself.
  • Maintains its connection with local RPC, local Horizon, etc. Alternatively, RPC and Horizon are started after the fork.
  • G account impersonation:
    • Be able to submit txs for existing mainnet G accounts without holding the signers.
    • Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

Ideal requirements:

  • The forked network has a different network passphrase to the original network.
  • That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.
  • C account impersonation:
    • Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

I think most of the work for this is adding capabilities to stellar-core, with some small work to expose those capabilities to quickstart. I don't think we could realistically implement all of this in quickstart alone, because there's no way to stop stellar-core at a specific ledger, and starting and stopping core while swapping out config files is likely to be brittle.

cc @anupsdf @dmkozh @janewang @tomerweller

@dmkozh
Contributor

dmkozh commented Aug 21, 2024

G account impersonation:
Be able to submit txs for existing mainnet G accounts without holding the signers.
Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

I'm not sure if real 'impersonation' is feasible; that seems too cumbersome and risky to maintain in Core. I think we could just disable signature verification if a certain Core config flag is set. This still seems risky, but at least is much easier to control. One can also use this mode to fund an arbitrary number of test accounts and then switch back into 'enforcement' mode (e.g. when they want to set up some sort of integration test).

The forked network has a different network passphrase to the original network.

I don't think that's a good idea; the network ID defines the contract ID namespace, so if we change the passphrase, the network would allow instantiating 2 SAC instances per asset, and that's generally not an operation mode we'd want to support in any capacity.
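To make the namespace point concrete, here is a minimal Python sketch. Stellar does derive the network ID as the SHA-256 hash of the passphrase, but the contract ID computation below is a simplified stand-in for the real XDR `HashIDPreimage` encoding, and the fork passphrase and asset preimage are hypothetical placeholders.

```python
import hashlib

# Real public network passphrase; the fork passphrase is hypothetical.
PUBNET_PASSPHRASE = "Public Global Stellar Network ; September 2015"
FORK_PASSPHRASE = "Hypothetical Local Fork ; 2024"


def network_id(passphrase: str) -> bytes:
    # Stellar's network ID is the SHA-256 hash of the passphrase.
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def contract_id(passphrase: str, preimage: bytes) -> str:
    # Simplified stand-in for the XDR-encoded HashIDPreimage: the only point
    # is that the network ID is hashed together with the contract preimage.
    return hashlib.sha256(network_id(passphrase) + b"contract-id" + preimage).hexdigest()


# Hypothetical preimage standing in for "the SAC for asset X".
usdc_sac = b"asset:USDC:ISSUER"

pubnet_id = contract_id(PUBNET_PASSPHRASE, usdc_sac)
fork_id = contract_id(FORK_PASSPHRASE, usdc_sac)
print(pubnet_id != fork_id)  # True: same asset, two different SAC addresses
```

Under this model, every contract ID (and every stored reference to one) changes with the passphrase, which is why a new passphrase also breaks proxies and SAC balances.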

That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.

Isn't the bulk of the functionality either already in Core or something that has to be implemented in Core anyway (besides downstream service deps, that is)? I don't think we need to go beyond that; there needs to be some external orchestration, and I don't think it belongs in Core.

C account impersonation:
Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

Similarly to G-accounts, we could just switch host to recording auth. I wouldn't try to go for more granular control than that.
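A toy Python model of the difference between enforcing and recording auth may help here. It is a sketch only: the class and method names are hypothetical and are not the soroban-env-host API, and the address string is a placeholder.

```python
from typing import Callable, List


class AuthModel:
    """Toy model of two host auth modes; not the soroban-env-host API."""

    def __init__(self, recording: bool):
        self.recording = recording
        self.recorded: List[str] = []

    def require_auth(self, address: str, check: Callable[[str], bool]) -> None:
        if self.recording:
            # Recording mode: note that `address` must authorize, but do not
            # verify signatures or run the account's __check_auth.
            self.recorded.append(address)
            return
        # Enforcing mode: the account's own auth logic must approve.
        if not check(address):
            raise PermissionError(f"auth failed for {address}")


def never_authorizes(_address: str) -> bool:
    # Stands in for a mainnet account whose signers we do not hold.
    return False


fork_host = AuthModel(recording=True)
fork_host.require_auth("CEXAMPLEWALLET", never_authorizes)
print(fork_host.recorded)  # ['CEXAMPLEWALLET']
```

The coarse switch is the design choice being suggested: one global mode flag instead of per-contract or per-address overrides.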

@MonsieurNicolas
Contributor

wrt requirements above: those seem to be solutions more than actual requirements.

Adding arbitrary overrides/hooks to core seems very brittle (as it's not "the real thing") and will slow down feature work (because the DevX and core teams would need to coordinate on future changes). I also don't see why DevX (or others) should have to write different code depending on whether they're testing against a "real core" or against some arbitrary state (be it the local filesystem for the CLI or the client for browser-based solutions).

For background, we actually investigated some of those things as part of stellar/stellar-core#2695 -- this was before Soroban.

Here are a few things to think about:

  • changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (including SAC). So things like proxy contracts and SAC balances will break. It also breaks classic constructs like AMMs, CBs, etc.
  • auth breaking is the "canary": signature verification does not happen only during auth. If people want to test interactions in the context of layer 2/bridge development, they will run into similar issues elsewhere (in some cases what matters more is controlling which role a specific address has, typically stored in some data entry or wrapped in some access token).
  • as you have multiple core nodes running, they all need to behave exactly the same.

I would actually try to flip this work on its head by exposing a much narrower set of functionality in core and let people outside iterate on functionality.

For example, we could add a special native contract (only enabled when a special flag is set) that allows creating/updating/deleting arbitrary ledger entries (in a first version we could limit this to Soroban code/data, but other methods could be added in the future to change classic entries, network settings, or even TTL entries).

Note that we would still need to do something to let people use this contract, so maybe the special flag that enables this functionality would also reset the "network admin account" somehow (so that people can submit transactions with it). For example, GAAZI4TCR3TY5OJHCTJC2A4QSY6CJWJH5IAJTGKIN2ER7LBNVKOCCWN7 on the current public network is "locked" right now and does not hold much XLM (see its state in the Lab).

With this functionality you can:

  • replace any contract code with anything -- so if I don't like a policy contained in a contract, I can just change it, for example change the admin check code to "return true", or add new methods to popular contracts. The replacement could also just be a wrapper of sorts that performs some pre/post-processing around invoking the "real" wasm.
  • replace ledger entries based on educated guesses (like if you know where a balance is stored) or by using the output of a simulation run (that bypasses all auth)

I could see the same logic built on top of this kind of functionality usable either on top of a "quickstart image" like this, or in a pure client side (browser based or cli where the "host" is not core).

@leighmcculloch
Member

👍🏻 Thanks, this is really helpful feedback.

If we went for the narrower set of functionality in core focused on supporting quickstart coordinating the forking and supporting ledger entry substitution, could we make these two changes in core?

  • Add a --stop-at-ledger option to stellar run (stellar-core#4427) so that we can start core, catch up to a specific ledger, exit, change the quorum cfg, and restart core. Without it, can we in theory only do this at checkpoints, using the catchup command?
  • A new http endpoint that accepts a ledger entry and overwrites that entry before the next ledger.
    • This can be used to reset the network root account.
    • This can be used to do anything that the contract you suggested, @MonsieurNicolas, could do, but without people needing to build valid txs or contract invocations that require RPC to simulate costs and footprints.
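The semantics of the proposed endpoint ("overwrite the entry before the next ledger") can be sketched as a pending-override queue applied at ledger close. This is a hypothetical Python model, not stellar-core code; the entry keys and fields such as `master_weight` and `native_balance` are illustrative placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Entries = Dict[str, dict]


@dataclass
class Ledger:
    seq: int
    entries: Entries = field(default_factory=dict)


class PendingOverrides:
    """Hypothetical model of the proposed endpoint: submitted entries are
    written into state just before the next ledger's transactions apply."""

    def __init__(self) -> None:
        self.pending: Entries = {}

    def submit(self, key: str, entry: dict) -> None:
        # A later submission for the same key replaces an earlier one.
        self.pending[key] = entry

    def close_ledger(self, prev: Ledger, txs: List[Callable[[Entries], None]]) -> Ledger:
        entries = dict(prev.entries)   # never mutate the previous ledger
        entries.update(self.pending)   # overrides land first...
        self.pending.clear()
        for apply_tx in txs:           # ...so this ledger's txs see them
            apply_tx(entries)
        return Ledger(prev.seq + 1, entries)


U32_MAX = 2**32 - 1
overrides = PendingOverrides()
ledger = Ledger(1000, {"root": {"master_weight": 0, "native_balance": 10}})
overrides.submit("root", {"master_weight": 1, "native_balance": U32_MAX})
ledger = overrides.close_ledger(ledger, [])
print(ledger.seq, ledger.entries["root"]["native_balance"])  # 1001 4294967295
```

Applying overrides before the ledger's transactions means a tx submitted right after the override already sees the new state.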

With those two changes quickstart in fork mode would:

  • catch up core to ledger # (core shuts itself down after catching up)
  • change the cfg of core+rpc+horizon to a quorum containing only the local core instance
  • start core, rpc, horizon
  • call core's new http endpoint to:
    • reset thresholds of root account so the root's master key works again
    • set the root account's native balance to u32::MAX
  • start friendbot (it uses the root account to issue test accounts)
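The fork-mode sequence above can be sketched as a small orchestration function. Everything here is hypothetical: `RecordingServices` and its methods are placeholders for whatever control quickstart would actually expose, the override fields are illustrative, and the ledger number in the usage line is arbitrary.

```python
from typing import List, Tuple

U32_MAX = 2**32 - 1


class RecordingServices:
    """Hypothetical stand-in for quickstart's service control; it only
    records the calls so the ordering of the fork steps is visible."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def catchup_and_stop(self, ledger: int) -> None:
        self.calls.append(("catchup_and_stop", ledger))

    def use_local_only_quorum(self) -> None:
        self.calls.append(("use_local_only_quorum",))

    def start(self, service: str) -> None:
        self.calls.append(("start", service))

    def override_entry(self, key: str, entry: dict) -> None:
        self.calls.append(("override_entry", key, entry))


def start_fork(svc: RecordingServices, ledger: int, root: str) -> None:
    svc.catchup_and_stop(ledger)        # core exits by itself after catchup
    svc.use_local_only_quorum()         # core+rpc+horizon cfg -> local-only quorum
    for name in ("core", "rpc", "horizon"):
        svc.start(name)
    # Via the proposed endpoint: make the master key usable, then fund root.
    svc.override_entry(root, {"master_weight": 1, "native_balance": U32_MAX})
    svc.start("friendbot")              # friendbot funds test accounts from root


svc = RecordingServices()
start_fork(svc, ledger=52_000_000, root="ROOT")
print([c[0] for c in svc.calls])
```

Keeping the steps in one ordered function makes the one hard dependency explicit: the root-account override must happen after core is running but before friendbot starts.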

Then folks can use the fork like any test network, or they can use the new http endpoint to sub any other data.

Technically it wouldn't allow you to do everything you might want to do. You might want to disable auth on a contract without subbing the entire contract and subbing ledger entries wouldn't let you do that. But I think the above would get us 80% there, and then we can add other features as needed such as recording auth like what @dmkozh suggested.


  • changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (include SAC)

I understand the difficulty with contract IDs. It's unfortunate that we tied the IDs to the network passphrase, because it hasn't turned out to be a benefit. Could we separate the network passphrase/ID concept so that a network could change its ID for future signatures (txs, auths) while keeping its "original ID" for contract IDs and other uses?

The risk of a tx accidentally being submitted to pubnet exists. Even though txs won't naturally circulate to pubnet, there's a footgun: someone might copy a test tx they're developing with, paste it into something like the Lab, and accidentally submit it to pubnet; or they might run the forked setup in a public CI environment where their private key stays secret but a signed tx leaks and someone submits it to pubnet.

@tomerweller

The risk of a tx accidentally being submitted to pubnet exists.

Just want to emphasize that this is a very real footgun if we keep the same passphrase. Developers often jump between networks and accidentally submit transactions to the wrong network (happens to me all the time). If we promote a flow in which local debug transactions are valid on mainnet, someone will accidentally submit them on mainnet.

@janewang janewang added this to DevX Aug 27, 2024
@github-project-automation github-project-automation bot moved this to Backlog (Not Ready) in DevX Aug 27, 2024
@janewang janewang moved this from Backlog (Not Ready) to Backlog (Ready for Design) in DevX Aug 27, 2024
@MonsieurNicolas
Contributor

yeah the passphrase issue is quite annoying -- changing it "partially" would require adopting this partial switch all over SDKs etc (ie: SDKs today compute the SAC address for example, so they would need to know about this split and only use the new ID when signing payloads).

Going back to what we're trying to do here: do we really need to fork an entire network's state?
What about the original requirement of "continuously syncing to the network"?

Could this work instead be rescoped to just "import and transform" (which can be extended as much as needed with contract-specific transforms)? I imagine that the list of entries to import is actually small (and simple to generate), and transforms (like computing different hashes) are also fairly simple to do.

With this paradigm:

  • "forking" a network is a matter of seconds, even on top of pubnet (which would normally require downloading GBs of data). So even "rebasing" some changes on top of the latest network state should be doable, and as the overhead is very low, a fork can run anywhere (laptop, browser, etc.)
  • separate passphrase -> no risk of signing something valid on existing networks
  • no need to deal with archived state (something nobody mentioned so far)

to make this work, I think the only core/platform change needed would be to support taking as an argument a file that contains the genesis ledger + its ledger header.
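A minimal sketch of "import and transform" in Python, under loudly stated assumptions: entries are plain dicts keyed by hypothetical IDs, the genesis "header" is an illustrative dict rather than a real `LedgerHeader`, and `remap_ids` is a hypothetical contract-specific transform (e.g. mapping pubnet contract IDs to IDs recomputed for the fork's passphrase).

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, Tuple

Entry = Dict[str, object]
Transform = Callable[[str, Entry], Tuple[str, Entry]]


def build_genesis(source: Dict[str, Entry], keys: Iterable[str],
                  transforms: Iterable[Transform], passphrase: str) -> dict:
    """Hypothetical 'import and transform': copy a few entries from a source
    network, rewrite them, and emit a genesis ledger plus a header."""
    transforms = list(transforms)
    entries: Dict[str, Entry] = {}
    for key in keys:
        new_key, entry = key, dict(source[key])
        for transform in transforms:
            new_key, entry = transform(new_key, entry)
        entries[new_key] = entry
    state_bytes = json.dumps(entries, sort_keys=True).encode("utf-8")
    header = {
        "ledger_seq": 1,
        # A fresh passphrase gives the fork its own network ID.
        "network_id": hashlib.sha256(passphrase.encode("utf-8")).hexdigest(),
        "state_hash": hashlib.sha256(state_bytes).hexdigest(),
    }
    return {"header": header, "entries": entries}


def remap_ids(mapping: Dict[str, str]) -> Transform:
    # Rename keys found in `mapping`; leave all other keys untouched.
    return lambda key, entry: (mapping.get(key, key), entry)


pubnet = {"C_OLD_TOKEN": {"admin": "G_ADMIN"}, "C_UNUSED": {}}
genesis = build_genesis(pubnet, ["C_OLD_TOKEN"],
                        [remap_ids({"C_OLD_TOKEN": "C_NEW_TOKEN"})],
                        "Hypothetical Local Fork ; 2024")
print(sorted(genesis["entries"]))  # ['C_NEW_TOKEN']
```

Only the listed keys are imported, which is what keeps this approach fast: the cost scales with the handful of entries a test needs, not with the size of pubnet state.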

@github-actions github-actions bot added the stale label Oct 30, 2024
@janewang janewang removed the stale label Oct 30, 2024
@janewang janewang assigned janewang and unassigned janewang Oct 30, 2024
@stellar stellar deleted a comment from github-actions bot Nov 25, 2024
@sagpatil sagpatil moved this from Todo (Ready for Dev) to Backlog (Ready for Design) in DevX Dec 3, 2024
@leighmcculloch
Member

There's another use case as well where changing the passphrase would restrict functionality: it's reasonable I think to fork a network, and then to also be able to apply some legit transactions from that network to the fork. While that approach isn't guaranteed to work because state may diverge, I think it's reasonable enough to support it.

We could go a step further: what if enabling forking doesn't actually fork immediately? The node keeps running and ingesting data from the network, but a command to modify a signer (or something like that) causes it to fork, and that fork is tolerated rather than halted on.
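The "tolerate a fork when it occurs" behavior amounts to a two-state machine. This Python sketch is a hypothetical model (not stellar-core's state handling), with placeholder keys and values:

```python
from enum import Enum
from typing import Dict


class Mode(Enum):
    FOLLOWING = "following"
    FORKED = "forked"


class DevNode:
    """Hypothetical dev-mode node that follows the network until the first
    local modification, then keeps running on its own branch."""

    def __init__(self) -> None:
        self.mode = Mode.FOLLOWING
        self.seq = 0
        self.state: Dict[str, object] = {}

    def on_network_ledger(self, seq: int, changes: Dict[str, object]) -> None:
        if self.mode is Mode.FORKED:
            return  # divergence is tolerated: upstream ledgers are ignored, not fatal
        self.seq = seq
        self.state.update(changes)

    def local_modify(self, key: str, value: object) -> None:
        # The first local change (e.g. replacing a signer) is what forks the node.
        self.mode = Mode.FORKED
        self.seq += 1
        self.state[key] = value


node = DevNode()
node.on_network_ledger(100, {"G_USER": "signer=A"})
node.local_modify("G_USER", "signer=DEV")
node.on_network_ledger(101, {"G_USER": "signer=B"})
print(node.mode.value, node.seq, node.state["G_USER"])  # forked 101 signer=DEV
```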

Supporting ideas like this would require the network ID to stay the same.

Ideally we also find a way for transaction hashes to stay consistent, which is trickier than the network ID problem, because transaction hashes derived from the TransactionSignaturePayload are what gets signed.

So I think what we need is a way to signal that a transaction should only be accepted by a development node or a development fork of a network. That signal needs to live in the transaction envelope, outside the transaction, and must not be part of or affect the TransactionSignaturePayload. So the signal could simply be the absence of signatures, i.e. a network with forking enabled can execute txns with no signatures, rather than with a special signature.

Transactions without signatures naturally can't be delivered to a real network. There's actually no need for any special new signal, it's just no signatures are needed.
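Here is a minimal Python sketch of why this works, assuming a simplified hash in place of the XDR-encoded `TransactionSignaturePayload` (which, like the stand-in, covers the network ID and the transaction but not the envelope's signatures). `Envelope`, `accepts`, and the `dev_mode` flag are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List

Verify = Callable[[bytes, bytes], bool]


@dataclass
class Envelope:
    tx: bytes
    signatures: List[bytes] = field(default_factory=list)


def signature_payload_hash(network_id: bytes, tx: bytes) -> bytes:
    # Simplified stand-in for TransactionSignaturePayload: it covers the
    # network ID and the transaction, never the envelope's signatures.
    return hashlib.sha256(network_id + b"tx" + tx).digest()


def accepts(env: Envelope, network_id: bytes, verify: Verify, dev_mode: bool) -> bool:
    if not env.signatures:
        # No signatures: executable only on a dev/fork node. The thread's
        # premise is that a real network rejects such an envelope.
        return dev_mode
    payload = signature_payload_hash(network_id, env.tx)
    return all(verify(payload, sig) for sig in env.signatures)


NETWORK_ID = hashlib.sha256(b"Public Global Stellar Network ; September 2015").digest()
unsigned = Envelope(tx=b"example tx body")
print(accepts(unsigned, NETWORK_ID, lambda payload, sig: False, dev_mode=True))   # True
print(accepts(unsigned, NETWORK_ID, lambda payload, sig: False, dev_mode=False))  # False
```

Because the payload hash ignores signatures, the same transaction keeps the same hash whether or not it is signed, which also fits the goal of keeping transaction hashes consistent between the fork and the real network.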

Signatures are still needed for SorobanAuthorizationEntry's, but we can address "cheatcode" like capabilities there using the pattern @MonsieurNicolas suggested where there's a way to deploy a contract with a transaction that then lets you set any ledger entry and reconfigure contracts.

@leighmcculloch
Member

As a bonus, no signatures is extremely easy to integrate into SDKs. SDKs and wallets don't need a special "way to sign" to work with the fork; devs can just build txns and submit them without signing.

@leighmcculloch
Member

leighmcculloch commented Dec 16, 2024

  • "forking" a network is a matter of seconds

If we take the approach above where we replace "forking" with "allowing forking and enable some cheatcodes", then we don't actually need to focus on fast forking, we can instead focus on fast network joining which would benefit all users, not only forking devs.

If a node can join the network in a matter of seconds, then someone who wants to fork can join the network in a matter of seconds. Their node will continue to follow the network until such time as the fork occurs.

For devs who wish to test actions at a specific ledger, we'd still need some sort of 'start and fork immediately'.

So at a high level I think this looks like:

  1. Project to support ultra-fast network joining (goal: < 5-10 seconds with good bandwidth) (this work is completely independent of forking)

  2. Project to support a dev mode start and fork immediately.

  3. Project to support a dev mode start and tolerate a fork when it occurs.

  4. Project to support a dev mode that allows signature-less txs

  5. Project to support a dev mode admin interface for writing / deleting any ledger entry, either via a command, or via a transaction invoking an admin contract not normally available

  6. Project to support snapshotting ledger state in a (2) situation, and restoring, so that a node could be brought up at that specific ledger with zero outside world communication for preserving a test scenario for run in CI.

@MonsieurNicolas Thoughts?

@MonsieurNicolas
Contributor

I am not sure I am totally following this thread. Can you rework it so that we have a clear list of actual problems worth solving (decoupled from solutions) and hard requirements?

Like: if we're actually implementing any "forking" while still producing signatures valid on the public network, it seems we still don't have closure on that topic?

The "signature less" idea is interesting -- if we go in this direction, I am not sure how it differs from simulation though (and/or why we can't just solve this whole thing as an extension on top of simulation semantics).

@leighmcculloch
Member

leighmcculloch commented Dec 20, 2024

I'll rework it.

To answer the question on signatureless - simulation doesn't provide a full running network that persists state across simulation, simulation gets fees and costings wrong, and simulation doesn't work with classic operations. But fair point, we could address those gaps. It would make simulation more useful. There is also something comforting about forking because you are running the full system for real.

Projects
Status: Backlog (Ready for Design)

5 participants