
[Code Health] Improve Claim settlement processing scalability #1013

Open
5 tasks
red-0ne opened this issue Dec 17, 2024 · 8 comments
Assignees: red-0ne
Labels: code health (Cleans up some code), on-chain (On-chain business logic), proof (Claim & Proof life cycle), scalability

Comments

@red-0ne
Contributor

red-0ne commented Dec 17, 2024

Objective

Ensure that Claim settlement processing scales well as the number of claims to process increases.

Origin Document

(origin document screenshots omitted)

Goals

  • Make sure the protocol is able to support and process a large number of sessions when they are settled at the same block height.
  • Investigate and address any potential bottlenecks resulting from processing a large number of sessions.

Deliverables

A PR that:

  • Queries module parameters outside of the claim settlement loop, such as:
```diff
func (k Keeper) SettlePendingClaims(ctx cosmostypes.Context) {
   ...
+  targetNumRelays := k.serviceKeeper.GetParams(ctx).TargetNumRelays
   ...
   for _, claim := range expiringClaims {
      ...
-     targetNumRelays := k.serviceKeeper.GetParams(ctx).TargetNumRelays
```
  • Gathers settled services data outside of the claim settlement loop:
```diff
func (k Keeper) SettlePendingClaims(ctx cosmostypes.Context) {
   ...
+  // Query mining difficulty only once for each service
+  serviceIdToRelayMiningDifficultyMap := getServicesRelayMiningDifficulties(claims)
   ...
   for _, claim := range expiringClaims {
      ...
-     relayMiningDifficulty, found := k.serviceKeeper.GetRelayMiningDifficulty(ctx, serviceId)
```
  • Performs proof validation at proof submission. Currently, validation is done at settlement time and involves signature verification.
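The hoisting pattern behind the two diffs above can be sketched as a self-contained Go example. The types and the `getDifficulty` lookup below are stand-ins for illustration only; the real poktroll keeper types and the `GetRelayMiningDifficulty` call differ:

```go
package main

import "fmt"

// Stand-in types; the real poktroll claim and difficulty types differ.
type Claim struct{ ServiceID string }
type Difficulty struct{ TargetHash []byte }

// Hypothetical lookup, standing in for k.serviceKeeper.GetRelayMiningDifficulty.
func getDifficulty(serviceID string) Difficulty {
	return Difficulty{TargetHash: []byte(serviceID)}
}

// getServicesRelayMiningDifficulties queries mining difficulty once per
// distinct service instead of once per claim, so the settlement loop
// does map lookups rather than repeated store/keeper queries.
func getServicesRelayMiningDifficulties(claims []Claim) map[string]Difficulty {
	difficulties := make(map[string]Difficulty)
	for _, claim := range claims {
		if _, ok := difficulties[claim.ServiceID]; !ok {
			difficulties[claim.ServiceID] = getDifficulty(claim.ServiceID)
		}
	}
	return difficulties
}

func main() {
	claims := []Claim{{"svc1"}, {"svc2"}, {"svc1"}}
	// One lookup per distinct service (2), not one per claim (3).
	difficulties := getServicesRelayMiningDifficulties(claims)
	fmt.Println(len(difficulties)) // prints 2
}
```

The same idea applies to module parameters: anything constant for the duration of the block is fetched once before the loop.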

Non-goals / Non-deliverables

  • Rewrite or rethink how the protocol works
  • Implement language specific or micro-optimizations

General deliverables

  • Comments: Add/update TODOs and comments alongside the source code so it is easier to follow.
  • Testing: Add new tests (unit and/or E2E) to the test suite.

Creator: @red-0ne
Co-Owners: @okdas @bryanchriswhite

@red-0ne red-0ne added on-chain On-chain business logic code health Cleans up some code scalability proof Claim & Proof life cycle labels Dec 17, 2024
@red-0ne red-0ne added this to the Shannon Beta TestNet Support milestone Dec 17, 2024
@red-0ne red-0ne self-assigned this Dec 17, 2024
@red-0ne red-0ne added this to Shannon Dec 17, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Shannon Dec 17, 2024
@red-0ne red-0ne moved this from 📋 Backlog to 🔖 Ready in Shannon Dec 17, 2024
@Olshansk
Member

@red-0ne A few followup questions/comments:

  1. Querying module parameters outside of the claims loop.

Where/when would we be querying them? I don't fully understand the optimization here.

  2. Gather settled services data outside of the claim settlement loop.

Same as above.

If I were to pick up and implement this myself, I don't understand the direction and rationale.

  3. Perform proof validation at proof submission. Currently the validation is done at settlement time and involves signature verification.

We DO NOT want to do this for two reasons:

  1. Protocol safety:
    • An attacker would be able to hammer / rainbow attack if this was possible.
    • Independently from the point above, you can't validate if a proof is correct until after the proof submission window closes, and you can only submit it BEFORE it closes. I would say this doesn't make sense.
  2. Idiomatic software engineering:
    • API calls have to be quick & lightweight.
    • This can make a simple API query require heavy processing, and it simply moves that processing from validators (which should be ready for heavy workloads) to full nodes (which are not set up for this).
    • I would argue we have to #PUC next to the proof submission endpoint TO NOT do this.

@red-0ne
Contributor Author

red-0ne commented Dec 18, 2024

@Olshansk,

Independently from the point above, you can't validate if a proof is correct until after the proof submission window closes, and you can only submit it BEFORE it closes. I would say this doesn't make sense.

TL;DR: I think there's confusion between Claim submission (which binds the supplier to the SMT root) and Proof submission (which selects which branch to submit).

It is possible to validate proof correctness at submission time, since closest-path generation and validation use the hash at EarliestSupplierProofCommitHeight as a seed to select which branch to prove.

See: x/proof/keeper/proof_validation.go#L235-L250

This height comes before ProofWindowCloseHeight and is known by the supplier at proof submission time.

Note that this seed is unknown at Claim submission time, which is what binds the supplier to a claim root.

More broadly, the supplier knows whether the proof it's submitting is valid, and there is no secret (nor should there be any) revealed at (or after) the proof submission window closing.
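The seed-based branch selection described above can be illustrated with a minimal, self-contained Go sketch. This is a simplification for intuition only; the real logic lives in x/proof/keeper/proof_validation.go and operates on the SMT library's types. The function names and hashing scheme here are assumptions, not the actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// proofPathSeed mixes the block hash at EarliestSupplierProofCommitHeight
// with the claimed SMT root. Neither the claim root alone nor the supplier's
// choices determine the seed: the block hash is unknown at claim submission.
func proofPathSeed(earliestProofCommitBlockHash, claimRootHash []byte) []byte {
	hasher := sha256.New()
	hasher.Write(earliestProofCommitBlockHash)
	hasher.Write(claimRootHash)
	return hasher.Sum(nil)
}

// selectBranch maps the seed deterministically onto one of numLeaves
// branches, so the supplier and every validator derive the same path.
func selectBranch(seed []byte, numLeaves uint64) uint64 {
	return binary.BigEndian.Uint64(seed[:8]) % numLeaves
}

func main() {
	seed := proofPathSeed([]byte("block-hash-at-commit-height"), []byte("claim-smt-root"))
	fmt.Println(selectBranch(seed, 16))
}
```

Because both inputs are on-chain and fixed before the proof window opens, a full node can re-derive the expected branch at submission time, which is why validation does not need to wait for settlement.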

An attacker would be able to hammer / rainbow attack if this was possible.

This is why we have to minimize the time a supplier has to submit a proof after the seed hash is known.

API calls have to be quick & lightweight.

Message submission responses are fast and do not wait for the tx to process (only basic verification is performed). The outcome of the transaction is only known if the tx gets included in a block that is committed.

cc @bryanchriswhite

@Olshansk
Member

This is why we have to minimize the time a supplier has to submit a proof after the seed hash is known.

ACK.

However, I don't want this to be a consideration / decision because whoever will manage the governance params 10 years from now will not have this thought process.

We need the submission windows to function independently, safely and securely without these limitations.

@Olshansk
Member

TL;DR: I think there's confusion between Claim submission (which binds the supplier to the SMT root) and Proof submission (which selects which branch to submit).

Noted, and I agree with your approach.

I concur that it is safe but am still worried about a potential grinding attack.

Will keep thinking...

@Olshansk
Member

Olshansk commented Dec 20, 2024

Message submission responses are fast and do not wait for the tx to process (only basic verification is performed). The outcome of the transaction is only known if the tx gets included in a block that is committed.

For the purposes of this message, let's remove the grinding attack risk vector because I agree with your statement. I'm disregarding my two messages above.

Can you confirm that this is what you expect the flow to be?

  • If no: Can you provide a better explanation?
  • If yes: the greedy validation endpoint could potentially need a really long timeout, which is my concern for RPC nodes.
```mermaid
flowchart TD
    A[Client submits proof] --> B[RPC Node receives proof submission request]
    B --> C{Is 'greedy_validation' boolean field set?}
    C -->|greedy_validation=false| D[Store proof onchain w/ field 'validated=false']
    D --> E[[Return success]]
    C -->|greedy_validation=true| F[Validate proof]
    F --> G{Is proof valid?}
    G -->|No| H[Don't store on chain]
    H --> I[[Return error]]
    G -->|Yes| J[Store proof onchain w/ field 'validated=true']
    J --> K[[Return success]]
```

@red-0ne
Contributor Author

red-0ne commented Jan 14, 2025

@Olshansk, to describe the workflow more accurately, we should split it into two phases (mempool inclusion and block inclusion).

As per the Cosmos SDK Transaction Lifecycle docs, the following is a simplification of the block production workflow:

Proof tx submission (mempool inclusion)

```mermaid
flowchart TD
    A[Client submits proof tx] --> B[RPC Node receives proof submission tx]
    B --> C{tx passes basic validation}
    C -->|validated=true| D[tx added to mempool]
    D --> E[[Return success]]
    C -->|validated=false| F[tx rejected]
    F --> K[[Return failure]]
```

Proof storing (block inclusion)

```mermaid
flowchart TD
    A[Proposer picks proof from mempool] --> B["Include proof into the proposed block<br>(produce block=>exec msg)"]
    B --> C["Block is exchanged with other validators<br>(validate block=>exec msg)"]
    C --> D[["Proposed block is broadcasted to full nodes<br>(apply block=>exec msg)"]]
```

Implications

  • Full nodes execute all messages and begin/end blockers; we need to account for that when recommending full-node hardware requirements.
  • Proof validity is not known until the proof submission handler is executed.
  • The client receives a response right after the proof tx basic validation is performed.

@red-0ne
Contributor Author

red-0ne commented Jan 14, 2025

@Olshansk , after rethinking proof submission scalability, I believe there's an optimal middle ground between "proof validation at submission" and "proof validation at settlement":

Proof validation at the submission block's end blocker.


TL;DR:

  1. Proof submission stores the proof.
  2. EndBlocker validates the proof.
  3. Claim settlement settles the claim or slashes the supplier.

Proposed Flow:

```mermaid
flowchart TD
subgraph MsgSubmitProof
  B["Validate proof.SessionHeader"]
  B --> C["Store proof"]
end
subgraph EndBlocker
  C --> D["Execute EndBlocker"]
  D --> E{"Validate proof.ClosestMerkleProof"}
  E --> |is_valid_proof=true| F["Emit proof + validity events"]
  F --> G["Delete/nullify proof.ClosestMerkleProof<br>(save block space)"]
  E --> |is_valid_proof=false| H["Emit proof + non-validity events"]
  H --> I["Delete the entire proof"]
end
subgraph ClaimSettlement
  G --> J["Wait for claim settlement"]
  I --> J
  J --> K{"Check proof requirement"}
  K --> |"is_proof_required=false"| L["Settle claim"]
  K --> |"is_proof_required=true"| M{"Check proof presence"}
  M --> |"valid_proof_exists=true"| L
  M --> |"valid_proof_exists=false"| N["Slash supplier"]
end
```

Advantages of Validating Proofs in the Submission Block’s EndBlocker:

  1. Distributed Validation Workload:
    Proof validation is spread across the entire ProofSubmissionWindow, distributing the workload more effectively.

  2. Simplified Proof Submission Process:

    • Proof submission only involves storing the proof, without running validation checks during submission.
    • RPC nodes simulating proof submissions do not perform additional validations, reducing overhead.
    • Both valid and invalid proofs are accepted for submission, mitigating the risk of proof grinding.
  3. Optimized Block Space Usage:

    • Only valid proofs are retained after EndBlocker execution.
    • Valid proofs no longer include ClosestMerkleProof, avoiding the need to store the relay payload (which is the biggest block space consumer).
    • The whole proof (including ClosestMerkleProof) is captured by the emitted events anyway.

This approach balances scalability and efficiency while ensuring robustness in proof handling.
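The proposed EndBlocker step can be sketched in self-contained Go. The `Proof` type, `isValidClosestMerkleProof`, and the string-based events below are simplified stand-ins for the real poktroll types, closest-path verification, and typed events:

```go
package main

import "fmt"

// Stand-in proof record; the real poktroll proof type differs.
type Proof struct {
	SessionID          string
	ClosestMerkleProof []byte
}

// Hypothetical validity check, standing in for closest-path verification.
func isValidClosestMerkleProof(p Proof) bool {
	return len(p.ClosestMerkleProof) > 0
}

// validateSubmittedProofs sketches the proposed EndBlocker step: validate
// each proof submitted at this height, retain valid proofs with their
// ClosestMerkleProof nullified (saving block space), and drop invalid ones.
// Events are modeled as plain strings here.
func validateSubmittedProofs(proofs []Proof) (retained []Proof, events []string) {
	for _, p := range proofs {
		if isValidClosestMerkleProof(p) {
			events = append(events, "proof_validated:"+p.SessionID)
			// The full proof is captured by the emitted event, so the
			// on-chain record no longer needs the ClosestMerkleProof.
			p.ClosestMerkleProof = nil
			retained = append(retained, p)
		} else {
			events = append(events, "proof_invalid:"+p.SessionID)
			// Invalid proofs are deleted entirely; settlement later
			// slashes the supplier if a proof was required.
		}
	}
	return retained, events
}

func main() {
	proofs := []Proof{
		{SessionID: "s1", ClosestMerkleProof: []byte{0x01}},
		{SessionID: "s2"}, // invalid: empty ClosestMerkleProof
	}
	retained, events := validateSubmittedProofs(proofs)
	fmt.Println(len(retained), len(events)) // prints 1 2
}
```

At settlement, the claim handler then only checks for the presence of a retained (already-validated) proof, so no signature or Merkle verification remains in the settlement loop.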

cc @bryanchriswhite

@Olshansk Olshansk moved this from 🔖 Ready to 🏗 In progress in Shannon Jan 14, 2025
@Olshansk
Member

@red-0ne In response to your last message: love the direction, plan and rationale!

Proof validation is spread across the entire ProofSubmissionWindow, distributing the workload more effectively.

💯 agreed

RPC nodes simulating proof submissions do not perform additional validations, reducing overhead.
Both valid and invalid proofs are accepted for submission, mitigating the risk of proof grinding.

💯 agreed

I'd add / modify:

  • Keep RPC endpoints simple & lightweight, offloading business logic outside of the CRUD lifecycle
  • Mitigate grinding attacks by collecting fees for heavy messages

(Note: this is setting us up for verifiable-compute/zero-knowledge in 2026+)

Only valid proofs are retained after EndBlocker execution.

Really great idea! 💡

Here's an image I generated with Claude, with some edits to make it easier on the eye:

(screenshot of the rendered flowchart)

Syntax for context (it gets cut off a bit in GitHub, which is why I added the image):


```mermaid
flowchart TD
    classDef process fill:#e1f5fe,stroke:#01579b,color:#000
    classDef decision fill:#fff3e0,stroke:#e65100,color:#000
    classDef event fill:#f3e5f5,stroke:#4a148c,color:#000
    classDef terminal fill:#e8f5e9,stroke:#1b5e20,color:#000

    subgraph MsgSubmitProof["MsgSubmitProof (Height=X)"]
        B[["Validate proof.SessionHeader"]]
        B --> C[["Store proof"]]
    end

    subgraph EndBlocker["EndBlocker (Height=X)"]
        C --> D[["Execute EndBlocker"]]
        D --> E{{"Validate proof.ClosestMerkleProof"}}
        E --> |"is_valid_proof=true"| F[["Emit proof + validity events"]]
        F --> G[["Delete/nullify proof.ClosestMerkleProof\n(save block space)"]]
        E --> |"is_valid_proof=false"| H[["Emit proof + non-validity events"]]
        H --> I[["Delete the entire proof"]]
    end

    subgraph ClaimSettlement["ClaimSettlement (Height=SessionEnd(X))"]
        G --> J[["Wait for claim settlement"]]
        I --> J
        J --> K{{"Check proof requirement"}}
        K --> |"is_proof_required=false"| L[("Settle claim")]
        K --> |"is_proof_required=true"| M{{"Check proof presence"}}
        M --> |"valid_proof_exists=true"| L
        M --> |"valid_proof_exists=false"| N[("Slash supplier")]
    end

    class B,C,D,F,G,H,I,J process
    class E,K,M decision
    class L,N terminal
```
