Research I/O gas measurement #1160

Closed
5 tasks done
freesig opened this issue Apr 25, 2023 · 7 comments

@freesig
Contributor

freesig commented Apr 25, 2023

Investigate State-of-the-Art Solutions for Costing State Storage and Bandwidth on Blockchains

This issue aims to research and analyze the current state of the art solutions for costing state storage and bandwidth on blockchains. Consider the trade-offs for each point, focusing on security, scalability, and implementation complexity. The analysis will cover topics such as state rent, the Arweave model, data pruning, and witness data charging.

Look for any undesired outcomes like people arbitraging bogus state to profit off state refunds.

1. State of the art solutions for costing state storage and bandwidth on blockchains

  • Investigate various approaches to state storage and bandwidth costing.
  • Analyze the trade-offs between different solutions.

2. Latest research in state rent

  • Review recent publications and proposals related to state rent.
  • Evaluate the benefits and drawbacks of implementing state rent in our system.

3. Arweave model for "one price forever"

  • Study the Arweave model and its "one price forever" policy for data storage.
  • Assess the feasibility of adopting this model and the potential trade-offs involved.

4. Data pruning

  • Determine the types of data that can be pruned without compromising system integrity.
    • For example, we probably can't prune UTXO ids from state.

5. Charging for witness data

  • Investigate whether charging for witness data is needed for our system.
  • Examine the pros and cons of implementing this policy.

Goal:

The goal of this issue is to gain a comprehensive understanding of the current state of the art solutions for costing state storage and bandwidth on blockchains, and to evaluate the trade-offs for each point, focusing on implementation complexity.

Completion Criteria:

This issue is considered complete when the following criteria are met:

  • A concise analysis of the current state of the art solutions for costing state storage and bandwidth on blockchains is provided, focusing on trade-offs and implementation complexity.
  • The latest research in state rent is reviewed and assessed, with emphasis on trade-offs and implementation complexity.
  • The Arweave model for "one price forever" is examined and evaluated, considering trade-offs and implementation complexity.
  • Data pruning considerations are discussed, and types of data that can be pruned are identified, with a focus on trade-offs and implementation complexity.
  • The feasibility of charging for witness data is determined.

Once these criteria are met, we should have a clear understanding of the best course of action for our system in terms of state storage and bandwidth costing.

freesig changed the title from "Design I/O gas costing" to "Research I/O gas costing" on May 1, 2023
@freesig
Contributor Author

freesig commented May 2, 2023

I suggest the starting point for this issue should be to review the description and ask for clarification on any point that is not clear. Also, if you have other paths to investigate, please add them in the comments and we can update the description.

@Voxelot
Member

Voxelot commented May 2, 2023

Another idea I had was to charge less for overwriting existing storage slots than inserting into fresh ones. This could incentivize more state reuse and penalize state growth.

@Voxelot
Member

Voxelot commented May 2, 2023

For refunds on state deletion, one idea I had for mitigating congestion was to scale the amount of any potential refund by a running average of recent I/O usage. For example, if the average number of IOPS for the last 50 blocks is maxed out (based on EBS provisioning etc.), then removing state would cost the user gas. If average recent IOPS is low, then a user might be eligible for a refund by reclaiming storage space while the network isn't busy. However, refunds are complex if we allow users to pay fees in other tokens like ETH or USDC.
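
To make the idea concrete, here is a minimal sketch of a refund scaled by a running average of recent IOPS. The linear interpolation between "full refund" and "pay to delete", the class name, and all parameters are assumptions for illustration only; nothing here is an agreed design.

```python
from collections import deque


class DeletionRefundScaler:
    """Scales state-deletion refunds by a running average of recent IOPS."""

    def __init__(self, max_iops: int, window: int = 50) -> None:
        self.max_iops = max_iops             # hypothetical provisioned IOPS ceiling
        self.samples = deque(maxlen=window)  # keeps only the last `window` blocks

    def record_block(self, iops: int) -> None:
        self.samples.append(iops)

    def utilization(self) -> float:
        """Average recent IOPS as a fraction of the ceiling, clamped to [0, 1]."""
        if not self.samples:
            return 0.0
        average = sum(self.samples) / len(self.samples)
        return min(1.0, average / self.max_iops)

    def deletion_gas(self, full_refund: int, busy_cost: int) -> int:
        """Signed gas for one deletion: negative is a refund, positive is a charge."""
        u = self.utilization()
        # Assumed linear blend: idle network -> full refund, saturated -> busy_cost.
        return round(u * busy_cost - (1.0 - u) * full_refund)
```

With an idle window the user gets the full refund, with a saturated window deletion costs `busy_cost`, and anything in between is interpolated.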

@freesig
Contributor Author

freesig commented May 2, 2023

Yeah, I think it's hard to overcome gaming issues with things like running averages. Refunds are my least favorite idea due to their potential for unintended incentives.

xgreenx changed the title from "Research I/O gas costing" to "Research I/O gas measurement" on May 29, 2023
@Voxelot
Member

Voxelot commented Aug 22, 2023

1. State of the art solutions for costing state storage and bandwidth on blockchains

  • Arbitrum 2d fees: https://medium.com/offchainlabs/understanding-arbitrum-2-dimensional-fees-fd1d582596c9
    • Arbitrum is bound by the Ethereum transaction format, which can only charge fees in one dimension. However, layer 2s have multiple dimensions to consider: the cost of L2 execution plus L1 data availability. The formula they use to fold these into a single gas cost is the following:
      • G = L2 gas used + ( L1 calldata price * L1 calldata size) / (L2 gas price)
  • Solana state rent: https://docs.solana.com/implemented-proposals/rent
  • Multi-dimensional pricing with EIP-1559:
    • https://baincapitalcrypto.com/multi-dimensional-on-chain-resource-pricing/
    • https://ethresear.ch/t/multidimensional-eip-1559/11651
      • These papers and discussions outline an approach to improve the accuracy of gas charging by charging for different types of resources separately. By utilizing the dynamic basefee adjustment of EIP 1559, the price of each type of resource changes independently based on demand vs target level for that resource. For example, if most transactions are light on storage but execution heavy, the price of execution will increase without affecting other transactions that are storage heavy and light on execution. This allows for more throughput and utilization of nodes without exceeding their storage or execution budget and potentially allows the network to recover from underpriced opcodes which could otherwise lead to a DOS.
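
As a worked example of the Arbitrum formula quoted above, the sketch below computes the single gas figure G. The function name and the sample numbers are made up for illustration; only the formula itself comes from the linked article.

```python
def arbitrum_total_gas(
    l2_gas_used: float,
    l1_calldata_price: float,  # price per byte of L1 calldata (e.g. wei per byte)
    l1_calldata_size: float,   # calldata size in bytes
    l2_gas_price: float,       # price per unit of L2 gas (same currency unit)
) -> float:
    """G = L2 gas used + (L1 calldata price * L1 calldata size) / (L2 gas price)."""
    # The L1 data cost is converted into "virtual" L2 gas so that one
    # gas limit and one gas price can cover both dimensions.
    l1_cost = l1_calldata_price * l1_calldata_size
    return l2_gas_used + l1_cost / l2_gas_price


# Hypothetical numbers: 100k gas of execution, 500 bytes of calldata.
total = arbitrum_total_gas(
    l2_gas_used=100_000,
    l1_calldata_price=2_000,
    l1_calldata_size=500,
    l2_gas_price=100,
)
print(total)
```

Note that the L1 component grows when the L2 gas price falls, since the same L1 cost has to be expressed in more units of cheaper L2 gas.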

2. Latest research in state rent

Not researching this in depth. Ultimately, the UX overhead of users paying rent seems more complex and risky than we want to navigate. It’s also more difficult to automate if users hold their funds in UTXOs, as rent fees can’t be automatically deducted from account balances.

3. Arweave model for "one price forever"

Arweave bases its "one price forever" on an estimate of the cost to store a gigabyte of data for one hour (Pgbh).

[image: formula for Pgbh, the estimated cost of storing one gigabyte for one hour]

Given the assumption that, over time, the ratio HDDprice / HDDsz follows a predictable decreasing curve, they then approximate the forever cost as the sum of all Pgbh * data stored over an infinite time horizon.

[image: formula for the forever cost as an infinite sum of Pgbh * data stored]
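
To show why an infinite sum can still give a finite price, here is a minimal sketch. It assumes Pgbh shrinks by a constant fraction every hour, which turns the sum into a geometric series. That constant-decay shape is an assumption made here for illustration, not the curve used in the Arweave yellow paper, and the function name and parameters are hypothetical.

```python
def perpetual_storage_cost(data_gb: float, p_gbh: float, hourly_decay: float) -> float:
    """Sum of data_gb * Pgbh over all future hours, under assumed geometric decay.

    Assumption: Pgbh in hour i equals p_gbh * (1 - hourly_decay) ** i.
    The series sum_{i>=0} r**i with r = 1 - hourly_decay equals 1 / hourly_decay.
    """
    if not 0.0 < hourly_decay < 1.0:
        raise ValueError("hourly_decay must be strictly between 0 and 1")
    return data_gb * p_gbh / hourly_decay


def truncated_cost(data_gb: float, p_gbh: float, hourly_decay: float, hours: int) -> float:
    """Same sum cut off after `hours` terms, for comparison with the closed form."""
    ratio = 1.0 - hourly_decay
    return sum(data_gb * p_gbh * ratio**i for i in range(hours))
```

The closed form makes the sensitivity visible: the forever price is inversely proportional to the assumed decay rate, so a slower decline in storage costs than expected raises the true cost sharply.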

The storage price is then converted into a token amount using the estimated market value of their token at the current time.

There's also a fallback mechanism for incentivizing nodes to continue holding that data through a “Storage Endowment” if the cost of storing the data isn’t being adequately covered by new fees (i.e. forever storage price). The endowment is an inflationary reward based on the current amount of data stored on the network vs the profit the nodes are already making in other areas.

This seems to be a reasonable approach if the only goal is to adequately cover storage costs, regardless of how much state growth occurs. However, it’s likely an inadequate deterrent against undesirable state bloat in a smart contract platform such as Fuel. If we have an ideal hardware target in mind (such as not exceeding 1/2 TB of growth per year on a full node at max TPS), we likely need a model that keeps prices in line with a target state growth rate instead.

Another downside is that it depends heavily on strong assumptions about external factors such as the Arweave token value and the price of storage, both in USD terms. This could destabilize the Arweave economy if there are unexpected periods of USD inflation, or a contraction in storage hardware supply due to geopolitical events or natural disasters (covid, tsunamis, etc.) affecting the production of HDDs or the NAND chips used in SSDs.

https://www.arweave.org/yellow-paper.pdf

4. Data pruning

  • Once fraud proofs are enabled, non-historical nodes are able to prune a significant amount of data older than the challenge window:
    • Block headers
    • Transactions
    • Receipts
  • All nodes are free to prune spent UTXOs, i.e. coins and most of the data from messages (excluding the id of a spent message)

5. Charging for witness data

The original rationale for excluding witness data from fees was the following:

  • Third parties are allowed to modify witness data, which means witness data fees could be a vector for abusing users by making them pay more gas than expected. Third-party witness malleability is needed to enable native meta-transactions and other advanced features.
  • Witness data can be pruned from Fuel after the fraud-proving window has elapsed, so the data is only transient.
  • DA layers like Celestia will be able to horizontally scale, essentially making witness data a negligible overhead.

However, there are some negative consequences to this approach, namely that people could DOS our block space for free! Some counterpoints to the above, in the same order:

  • This is easily solved via adding another field to transactions which allows users to limit the total amount of witness data includable into a transaction. This enables users to ensure their transaction doesn’t get bloated by spurious witness data and cause unexpected fees.
  • Contract bytecode is uploaded to the chain through witness fields, and isn’t prune-able. In addition, witness data isn’t special here, as all block data past the fraud proving window could be pruned. Since all block data is prune-able, why charge for some bytes and not for others?
  • This is speculation and remains to be seen. Even if storage is free there will inevitably be some form of scarcity encountered on DA layers such as shared upload bandwidth which will need to be priced in. We can always adapt to this development later instead of exposing ourselves to undue risk now in the hope that DA will one day be basically free.

Suggested course of action:

In the near term, choose a set of fixed parameters based on a 1D gas price and assuming no pruning initially. In terms of storage costs I suggest the following as a starting point which we can upgrade from later:

  • Calculate the state growth rate on Ethereum as a function of the cost of SSTORE
    • R = aP, where R is (new bytes) / year, and P is the current cost of SSTORE
  • Come up with a desired max state growth rate for Fuel (e.g. 1/2 TB per year)
  • Solve for P given the target state growth rate (P = R/a)
  • Use P as the extra cost per byte for blockspace and state on disk (i.e. storage slots, coins, etc).
  • Only charge reads based on the benchmarked execution time (with the exception of CCP and LDC which may need some extra gas to cover the full amount of bytes touched on disk regardless of the subslice pulled into vm memory)
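
The first three steps above can be sketched as a small calculation. It takes the linear form R = aP from the list at face value; if measured growth instead falls as P rises, the fitted relation would need a different form and the solve step would change accordingly. The function names and every number below are placeholders, not measurements.

```python
def fit_coefficient(observed_rate: float, observed_price: float) -> float:
    """Estimate a in R = aP from one observed (R, P) pair.

    observed_rate:  new state bytes per year on the reference chain
    observed_price: per-byte storage cost on the reference chain
    """
    if observed_price <= 0:
        raise ValueError("observed_price must be positive")
    return observed_rate / observed_price


def storage_price(target_rate: float, a: float) -> float:
    """Solve P = R / a for the target state growth rate."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return target_rate / a


# Placeholder inputs only: 100 GB/year observed at 625 gas per byte,
# and a target of 1/2 TB (500 GB) per year.
a = fit_coefficient(observed_rate=100e9, observed_price=625)
p = storage_price(target_rate=500e9, a=a)
print(p)
```

In practice `a` would come from a fit over many observations rather than a single pair, and the resulting P would then be used as the extra per-byte cost described in the fourth step.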

This is far from an ideal solution, but as a start it should get us within a general cost region to mitigate DOS. This can be extended with Arbitrum style 2d fees once the DA layer is online. I also think we should add a witness_data_limit field to transactions and start charging for witness data.

In the future, we should aim to adopt a multi-dimensional EIP1559 style approach as outlined in section 1. This will allow the gas prices for storage to be largely self-correcting based on our target state growth rate and DA costs. We could even have different prices for prunable datatypes (i.e. blockspace consumed) vs persistent state (coins / storage) vs DA capacity which are autoregulated. The reason I don't suggest we dive straight into this approach is that it will take some time to revamp the mempool, sdks, UI/UX and so on to support a more advanced fee model and this can be improved on later.
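
For reference, here is a minimal sketch of how per-resource base fees could self-correct, applying the EIP-1559 update rule independently to each dimension. The resource names, the floor value, and the choice of 8 as the change denominator (the value used by EIP-1559 on Ethereum) are assumptions for illustration.

```python
from typing import Dict

MIN_BASE_FEE = 1.0  # assumed floor so a price can always recover from zero demand


def update_base_fees(
    base_fees: Dict[str, float],
    used: Dict[str, float],
    targets: Dict[str, float],
    max_change_denominator: float = 8.0,
) -> Dict[str, float]:
    """Adjust each resource's base fee from its own usage versus its own target."""
    updated = {}
    for resource, fee in base_fees.items():
        # Relative deviation from target: +1.0 at twice the target, -1.0 when unused.
        deviation = (used[resource] - targets[resource]) / targets[resource]
        new_fee = fee + fee * deviation / max_change_denominator
        updated[resource] = max(new_fee, MIN_BASE_FEE)
    return updated


# Hypothetical block: execution-heavy, storage at target, DA unused.
fees = update_base_fees(
    base_fees={"execution": 100.0, "storage": 100.0, "da": 100.0},
    used={"execution": 20.0, "storage": 10.0, "da": 0.0},
    targets={"execution": 10.0, "storage": 10.0, "da": 10.0},
)
print(fees)
```

Only the execution price rises here, so storage-heavy transactions in the next block are not penalized for congestion they did not cause.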

@Voxelot
Member

Voxelot commented Aug 24, 2023

After discussing with @xgreenx we came up with the following plan:

  1. Add policies to transactions with all limits managed by policies (in preparation for multi-dimensional pricing in the future)
    1. move gas limit to be a policy
    2. add witness data limit as a policy
    3. more policies to come later with multidimensional pricing
  2. Determine the appropriate price of storage in Gas (Gs) based on the above solution
    1. Gs should include the overhead of Merkle trees for contract state updates. It will overcharge the user for the witness data, but we will not undercharge for the storage update. Benchmarking should give us an estimate of how much overhead there is to insert new keys as the tree grows large (1m+ keys)
  3. The min fee should also include price(WL * Gs), where WL is the witness limit. WL should also affect the validity rules of the transaction.
  4. The SDK needs to set a witness data limit based on the expected amount of witness data before signing (if the user doesn’t specify WL, it should be calculated automatically).
    1. In cases without any script to execute, don’t use GL policy.
  5. Add Gas per byte fee (Gs) to any operations in the VM that increase storage.
    1. Storage writes (any sstore-style opcodes: SWW, SWWQ), plus TR, CALL, and MINT for any new asset ids added to the balances tree
    2. receipts (ret, retd, log, logd)
  6. (later) Add support of WL to be set by users on the wallet along with GL when we have a feature that actually makes use of third-party witness additions (i.e. meta-tx support)
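
A rough sketch of how steps 1, 3, and 4 of this plan might fit together is shown below. It is not the transaction format from the specs: the `Policies` type, the field names, and the fee arithmetic are all hypothetical and only illustrate the relationship between WL, Gs, and the min fee.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policies:
    witness_limit: int               # WL: max total witness bytes allowed
    gas_limit: Optional[int] = None  # GL: omitted when there is no script to execute


def default_witness_limit(witnesses: Sequence[bytes]) -> int:
    """What an SDK might set before signing when the user gives no WL."""
    return sum(len(w) for w in witnesses)


def min_fee(policies: Policies, gas_per_byte: int, gas_price: int, base_gas: int = 0) -> int:
    """Min fee including price(WL * Gs); base_gas stands in for all other charges."""
    witness_gas = policies.witness_limit * gas_per_byte
    return (base_gas + witness_gas) * gas_price


def witness_within_limit(policies: Policies, witnesses: Sequence[bytes]) -> bool:
    """Validity rule: total witness bytes must not exceed WL."""
    return sum(len(w) for w in witnesses) <= policies.witness_limit
```

Because the fee is charged on WL rather than on the witness bytes actually present, a third party who appends witness data cannot raise the fee; the transaction just becomes invalid once WL is exceeded.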

@Voxelot
Member

Voxelot commented Aug 25, 2023

Started a PR here to describe how policies could be implemented for the transaction format: FuelLabs/fuel-specs#514
