
[Code Health] Improve Claim settlement processing scalability #1013

Open
5 tasks
red-0ne opened this issue Dec 17, 2024 · 8 comments
Assignees: red-0ne
Labels: code health (Cleans up some code), on-chain (On-chain business logic), proof (Claim & Proof life cycle), scalability

Comments

@red-0ne
Contributor

red-0ne commented Dec 17, 2024

Objective

Ensure that Claim settlement processing scales well as the number of claims to process increases.

Origin Document

(origin document screenshots omitted)

Goals

  • Make sure the protocol is able to support and process a large number of sessions when they are settled at the same block height.
  • Investigate and address any potential bottlenecks resulting from processing a large number of sessions.

Deliverables

A PR that:

  • Queries module parameters outside of the claim settlement loop, such as:
```diff
func (k Keeper) SettlePendingClaims(ctx cosmostypes.Context) {
   ...
+  targetNumRelays := k.serviceKeeper.GetParams(ctx).TargetNumRelays
   ...
   for _, claim := range expiringClaims {
      ...
-     targetNumRelays := k.serviceKeeper.GetParams(ctx).TargetNumRelays
```
  • Gathers settled services data outside of the claim settlement loop:
```diff
func (k Keeper) SettlePendingClaims(ctx cosmostypes.Context) {
   ...
+  // Query mining difficulty only once for each service
+  serviceIdToRelayMiningDifficultyMap := getServicesRelayMiningDifficulties(claims)
   ...
   for _, claim := range expiringClaims {
      ...
-     relayMiningDifficulty, found := k.serviceKeeper.GetRelayMiningDifficulty(ctx, serviceId)
```
  • Performs proof validation at proof submission. Currently, validation is done at settlement time and involves signature verification.
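The hoisting pattern behind the two diffs above can be sketched as a self-contained Go example. The types and the `getDifficulty` lookup below are stand-ins for illustration only; the real poktroll keeper types and the `GetRelayMiningDifficulty` call differ:

```go
package main

import "fmt"

// Stand-in types; the real poktroll claim and difficulty types differ.
type Claim struct{ ServiceID string }
type Difficulty struct{ TargetHash []byte }

// Hypothetical lookup, standing in for k.serviceKeeper.GetRelayMiningDifficulty.
func getDifficulty(serviceID string) Difficulty {
	return Difficulty{TargetHash: []byte(serviceID)}
}

// getServicesRelayMiningDifficulties queries mining difficulty once per
// distinct service instead of once per claim, so the settlement loop
// does map lookups rather than repeated store/keeper queries.
func getServicesRelayMiningDifficulties(claims []Claim) map[string]Difficulty {
	difficulties := make(map[string]Difficulty)
	for _, claim := range claims {
		if _, ok := difficulties[claim.ServiceID]; !ok {
			difficulties[claim.ServiceID] = getDifficulty(claim.ServiceID)
		}
	}
	return difficulties
}

func main() {
	claims := []Claim{{"svc1"}, {"svc2"}, {"svc1"}}
	// One lookup per distinct service (2), not one per claim (3).
	difficulties := getServicesRelayMiningDifficulties(claims)
	fmt.Println(len(difficulties)) // prints 2
}
```

The same idea applies to module parameters: anything constant for the duration of the block is fetched once before the loop.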

Non-goals / Non-deliverables

  • Rewrite or rethink how the protocol works
  • Implement language specific or micro-optimizations

General deliverables

  • Comments: Add/update TODOs and comments alongside the source code so it is easier to follow.
  • Testing: Add new tests (unit and/or E2E) to the test suite.

Creator: @red-0ne
Co-Owners: @okdas @bryanchriswhite

@red-0ne red-0ne added on-chain On-chain business logic code health Cleans up some code scalability proof Claim & Proof life cycle labels Dec 17, 2024
@red-0ne red-0ne added this to the Shannon Beta TestNet Support milestone Dec 17, 2024
@red-0ne red-0ne self-assigned this Dec 17, 2024
@red-0ne red-0ne added this to Shannon Dec 17, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Shannon Dec 17, 2024
@red-0ne red-0ne moved this from 📋 Backlog to 🔖 Ready in Shannon Dec 17, 2024
@Olshansk
Member

@red-0ne A few followup questions/comments:

  1. Querying module parameters outside of the claims loop.

Where/when would we be querying them? I don't fully understand the optimization here.

  2. Gather settled services data outside of the claim settlement loop.

Same as above.

If I were to pick up and implement this myself, I don't understand the direction and rationale.

  3. Perform proof validation at proof submission. Currently the validation is done at settlement time and involves signature verification.

We DO NOT want to do this for two reasons:

  1. Protocol safety:
    • An attacker would be able to hammer / rainbow attack if this was possible.
    • Independently from the point above, you can't validate if a proof is correct until after the proof submission window closes, and you can only submit it BEFORE it closes. I would say this doesn't make sense.
  2. Idiomatic software engineering:
    • API calls have to be quick & lightweight.
    • This can make a simple API query require heavy processing, and it simply moves that processing from validators (which should be ready for heavy workloads) to full nodes (which are not set up for this).
    • I would argue we have to #PUC next to the proof submission endpoint TO NOT do this.

@red-0ne
Contributor Author

red-0ne commented Dec 18, 2024

@Olshansk,

Independently from the point above, you can't validate if a proof is correct until after the proof submission window closes, and you can only submit it BEFORE it closes. I would say this doesn't make sense.

TL;DR: I think there's confusion between Claim submission (which binds the supplier to the SMT root) and Proof submission (which selects which branch to submit).

It is possible to validate proof correctness at submission time, since closest-path generation and validation use the hash at EarliestSupplierProofCommitHeight as a seed to select which branch to prove.

See: x/proof/keeper/proof_validation.go#L235-L250

This height comes before ProofWindowCloseHeight and is known by the supplier at proof submission time.

Note that this seed is unknown at Claim submission time, which is what binds the supplier to a claim root.

More broadly, the supplier knows whether the proof it's submitting is valid, and there is no secret (nor should there be any) revealed at (or after) the proof submission window closing.
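The seed-based branch selection described above can be illustrated with a minimal, self-contained Go sketch. This is a simplification for intuition only; the real logic lives in x/proof/keeper/proof_validation.go and operates on the SMT library's types. The function names and hashing scheme here are assumptions, not the actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// proofPathSeed mixes the block hash at EarliestSupplierProofCommitHeight
// with the claimed SMT root. Neither the claim root alone nor the supplier's
// choices determine the seed: the block hash is unknown at claim submission.
func proofPathSeed(earliestProofCommitBlockHash, claimRootHash []byte) []byte {
	hasher := sha256.New()
	hasher.Write(earliestProofCommitBlockHash)
	hasher.Write(claimRootHash)
	return hasher.Sum(nil)
}

// selectBranch maps the seed deterministically onto one of numLeaves
// branches, so the supplier and every validator derive the same path.
func selectBranch(seed []byte, numLeaves uint64) uint64 {
	return binary.BigEndian.Uint64(seed[:8]) % numLeaves
}

func main() {
	seed := proofPathSeed([]byte("block-hash-at-commit-height"), []byte("claim-smt-root"))
	fmt.Println(selectBranch(seed, 16))
}
```

Because both inputs are on-chain and fixed before the proof window opens, a full node can re-derive the expected branch at submission time, which is why validation does not need to wait for settlement.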

An attacker would be able to hammer / rainbow attack if this was possible.

This is why we have to minimize the time a supplier has to submit a proof after the seed hash is known.

API calls have to be quick & lightweight.

Message submission responses are fast and do not wait for the tx to process (only basic verification is performed). The outcome of the transaction is only known if the tx gets included in a block that is committed.

cc @bryanchriswhite

@Olshansk
Member

This is why we have to minimize the time a supplier has to submit a proof after the seed hash is known.

ACK.

However, I don't want this to be a consideration / decision because whoever will manage the governance params 10 years from now will not have this thought process.

We need the submission windows to function independently, safely and securely without these limitations.

@Olshansk
Member

TL;DR: I think there's confusion between Claim submission (which binds the supplier to the SMT root) and Proof submission (which selects which branch to submit).

Noted, and I agree with your approach.

I concur that it is safe but am still worried about a potential grinding attack.

Will keep thinking...

@Olshansk
Member

Olshansk commented Dec 20, 2024

Message submission responses are fast and do not wait for the tx to process (only basic verification is performed). The outcome of the transaction is only known if the tx gets included in a block that is committed.

For the purposes of this message, let's remove the grinding attack risk vector because I agree with your statement. I'm disregarding my two messages above.

Can you confirm that this is what you expect the flow to be?

  • If no: Can you provide a better explanation?
  • If yes: the greedy validation endpoint could potentially need a really long timeout, which is my concern for RPC nodes.
```mermaid
flowchart TD
    A[Client submits proof] --> B[RPC Node receives proof submission request]
    B --> C{Is 'greedy_validation' boolean field set?}
    C -->|greedy_validation=false| D[Store proof onchain w/ field 'validated=false']
    D --> E[[Return success]]
    C -->|greedy_validation=true| F[Validate proof]
    F --> G{Is proof valid?}
    G -->|No| H[Don't store on chain]
    H --> I[[Return error]]
    G -->|Yes| J[Store proof onchain w/ field 'validated=true']
    J --> K[[Return success]]
```

@red-0ne
Contributor Author

red-0ne commented Jan 14, 2025

@Olshansk, to describe the workflow more accurately, we should split it into two phases (mempool inclusion and block inclusion).

As per the Cosmos SDK Transaction Lifecycle docs, the following is a simplification of the block production workflow:

Proof tx submission (mempool inclusion)

```mermaid
flowchart TD
    A[Client submits proof tx] --> B[RPC Node receives proof submission tx]
    B --> C{tx passes basic validation}
    C -->|validated=true| D[tx added to mempool]
    D --> E[[Return success]]
    C -->|validated=false| F[tx rejected]
    F --> K[[Return failure]]
```

Proof storing (block inclusion)

```mermaid
flowchart TD
    A[Proposer picks proof from mempool] --> B["Include proof into the proposed block<br>(produce block=>exec msg)"]
    B --> C["Block is exchanged with other validators<br>(validate block=>exec msg)"]
    C --> D[["Proposed block is broadcasted to full nodes<br>(apply block=>exec msg)"]]
```

Implications

  • Full nodes execute all messages and begin/end blockers; we need to account for that when recommending full-node hardware requirements.
  • Proof validity is not known until the proof submission handler is executed.
  • The client receives a response right after the proof tx basic validation is performed.

@red-0ne
Contributor Author

red-0ne commented Jan 14, 2025

@Olshansk , after rethinking proof submission scalability, I believe there's an optimal middle ground between "proof validation at submission" and "proof validation at settlement":

Proof validation at the submission block's end blocker.


TL;DR:

  1. Proof submission stores the proof.
  2. EndBlocker validates the proof.
  3. Claim settlement settles the claim or slashes the supplier.

Proposed Flow:

```mermaid
flowchart TD
subgraph MsgSubmitProof
  B["Validate proof.SessionHeader"]
  B --> C["Store proof"]
end
subgraph EndBlocker
  C --> D["Execute EndBlocker"]
  D --> E{"Validate proof.ClosestMerkleProof"}
  E --> |is_valid_proof=true| F["Emit proof + validity events"]
  F --> G["Delete/nullify proof.ClosestMerkleProof<br>(save block space)"]
  E --> |is_valid_proof=false| H["Emit proof + non-validity events"]
  H --> I["Delete the entire proof"]
end
subgraph ClaimSettlement
  G --> J["Wait for claim settlement"]
  I --> J
  J --> K{"Check proof requirement"}
  K --> |"is_proof_required=false"| L["Settle claim"]
  K --> |"is_proof_required=true"| M{"Check proof presence"}
  M --> |"valid_proof_exists=true"| L
  M --> |"valid_proof_exists=false"| N["Slash supplier"]
end
```

Advantages of Validating Proofs in the Submission Block’s EndBlocker:

  1. Distributed Validation Workload:
    Proof validation is spread across the entire ProofSubmissionWindow, distributing the workload more effectively.

  2. Simplified Proof Submission Process:

    • Proof submission only involves storing the proof, without running validation checks during submission.
    • RPC nodes simulating proof submissions do not perform additional validations, reducing overhead.
    • Both valid and invalid proofs are accepted for submission, mitigating the risk of proof grinding.
  3. Optimized Block Space Usage:

    • Only valid proofs are retained after EndBlocker execution.
    • Valid proofs no longer include ClosestMerkleProof, avoiding the need to store the relay payload (which is the biggest block space consumer).
    • The whole proof (including ClosestMerkleProof) is captured by the emitted events anyway.

This approach balances scalability and efficiency while ensuring robustness in proof handling.
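The proposed EndBlocker step can be sketched in self-contained Go. The `Proof` type, `isValidClosestMerkleProof`, and the string-based events below are simplified stand-ins for the real poktroll types, closest-path verification, and typed events:

```go
package main

import "fmt"

// Stand-in proof record; the real poktroll proof type differs.
type Proof struct {
	SessionID          string
	ClosestMerkleProof []byte
}

// Hypothetical validity check, standing in for closest-path verification.
func isValidClosestMerkleProof(p Proof) bool {
	return len(p.ClosestMerkleProof) > 0
}

// validateSubmittedProofs sketches the proposed EndBlocker step: validate
// each proof submitted at this height, retain valid proofs with their
// ClosestMerkleProof nullified (saving block space), and drop invalid ones.
// Events are modeled as plain strings here.
func validateSubmittedProofs(proofs []Proof) (retained []Proof, events []string) {
	for _, p := range proofs {
		if isValidClosestMerkleProof(p) {
			events = append(events, "proof_validated:"+p.SessionID)
			// The full proof is captured by the emitted event, so the
			// on-chain record no longer needs the ClosestMerkleProof.
			p.ClosestMerkleProof = nil
			retained = append(retained, p)
		} else {
			events = append(events, "proof_invalid:"+p.SessionID)
			// Invalid proofs are deleted entirely; settlement later
			// slashes the supplier if a proof was required.
		}
	}
	return retained, events
}

func main() {
	proofs := []Proof{
		{SessionID: "s1", ClosestMerkleProof: []byte{0x01}},
		{SessionID: "s2"}, // invalid: empty ClosestMerkleProof
	}
	retained, events := validateSubmittedProofs(proofs)
	fmt.Println(len(retained), len(events)) // prints 1 2
}
```

At settlement, the claim handler then only checks for the presence of a retained (already-validated) proof, so no signature or Merkle verification remains in the settlement loop.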

cc @bryanchriswhite

@Olshansk Olshansk moved this from 🔖 Ready to 🏗 In progress in Shannon Jan 14, 2025
@Olshansk
Member

@red-0ne In response to your last message: love the direction, plan and rationale!

Proof validation is spread across the entire ProofSubmissionWindow, distributing the workload more effectively.

💯 agreed

RPC nodes simulating proof submissions do not perform additional validations, reducing overhead.
Both valid and invalid proofs are accepted for submission, mitigating the risk of proof grinding.

💯 agreed

I'd add / modify:

  • Keep RPC endpoints simple & lightweight, offloading business logic outside of the CRUD lifecycle
  • Mitigate grinding attacks by collecting fees for heavy messages

(Note: this is setting us up for verifiable-compute/zero-knowledge in 2026+)

Only valid proofs are retained after EndBlocker execution.

Really great idea! 💡

Here's an image I generated with Claude, with some edits to make it easier on the eye:

(screenshot of the rendered flowchart)

Syntax for context (it gets cut off a bit in GitHub, which is why I added the image):


```mermaid
flowchart TD
    classDef process fill:#e1f5fe,stroke:#01579b,color:#000
    classDef decision fill:#fff3e0,stroke:#e65100,color:#000
    classDef event fill:#f3e5f5,stroke:#4a148c,color:#000
    classDef terminal fill:#e8f5e9,stroke:#1b5e20,color:#000

    subgraph MsgSubmitProof["MsgSubmitProof (Height=X)"]
        B[["Validate proof.SessionHeader"]]
        B --> C[["Store proof"]]
    end

    subgraph EndBlocker["EndBlocker (Height=X)"]
        C --> D[["Execute EndBlocker"]]
        D --> E{{"Validate proof.ClosestMerkleProof"}}
        E --> |"is_valid_proof=true"| F[["Emit proof + validity events"]]
        F --> G[["Delete/nullify proof.ClosestMerkleProof\n(save block space)"]]
        E --> |"is_valid_proof=false"| H[["Emit proof + non-validity events"]]
        H --> I[["Delete the entire proof"]]
    end

    subgraph ClaimSettlement["ClaimSettlement (Height=SessionEnd(X))"]
        G --> J[["Wait for claim settlement"]]
        I --> J
        J --> K{{"Check proof requirement"}}
        K --> |"is_proof_required=false"| L[("Settle claim")]
        K --> |"is_proof_required=true"| M{{"Check proof presence"}}
        M --> |"valid_proof_exists=true"| L
        M --> |"valid_proof_exists=false"| N[("Slash supplier")]
    end

    class B,C,D,F,G,H,I,J process
    class E,K,M decision
    class L,N terminal
```
