I have a strong hunch that there is a way to simplify & improve Ponder's raw blockchain data caching design, and that, if it pans out, it would pay dividends down the road.
Background
Required reading: Ethers docs on topic sets.
Log filters
In `ponder.config.ts`, the user can specify contracts that they’d like to index. Internally, Ponder converts each contract into a log filter with a (simplified) type along the lines sketched below.

Consider a simple case where the user wants to fetch and index all events from a single contract, and has specified the start block as the contract deployment block number. Here's the resulting log filter:
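Roughly (the field names, address, and block numbers here are assumptions for illustration, not Ponder's actual definitions):

```ts
// Simplified log filter type, assuming fields that mirror the eth_getLogs
// parameters plus the chain id and the block range to sync.
type LogFilter = {
  chainId: number;
  address?: string | string[]; // same as the eth_getLogs "address" parameter
  topics?: (string | string[] | null)[]; // same as the eth_getLogs "topics" parameter
  startBlock: number;
  endBlock?: number; // undefined -> sync to latest
};

// "Index ALL events from one contract, starting at its deployment block."
// The address is hypothetical and matches the cache key examples used later
// (e.g. "1-0xabc-null").
const simpleLogFilter: LogFilter = {
  chainId: 1,
  address: "0xabc",
  // no `topics` -> no topic filter, so every log from this contract matches
  startBlock: 15_000_000,
};
```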
Should be easy to follow. (Note: the log filter `address` and `topics` fields are exactly the same as the `eth_getLogs` parameters with the same names).

"Cached ranges"
Ponder’s sync service (the component that fetches and caches raw blockchain data) is largely organized around log filters. The database keeps track of which block ranges have been fetched and inserted into the store for every log filter. This is what powers the caching functionality: if you finish the historical sync locally, then restart the Ponder app, all the raw blockchain data is served from the cache.
The cache scheme is simply a database table with the schema:
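Something along these lines (the row shape and column names below are assumptions, not Ponder's actual schema):

```ts
// Hypothetical shape of a "cached ranges" row: one row per contiguous block
// range that has been fully fetched and stored for a given log filter key.
type LogFilterCachedRange = {
  filterKey: string; // e.g. `${chainId}-${address}-${topics}`
  startBlock: number; // first block of the cached range (inclusive)
  endBlock: number; // last block of the cached range (inclusive)
};
```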
During the historical sync, after we've fetched and cached a range of blocks for a given log filter, we insert a record in the database like this:
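For example (values assumed, matching the hypothetical filter above):

```ts
// Assumed example: the historical sync just finished blocks
// 15,000,000 through 15,100,000 for the simple (no-topics) filter.
const insertedRange: LogFilterCachedRange = {
  filterKey: "1-0xabc-null", // chainId 1, address 0xabc, no topic filter
  startBlock: 15_000_000,
  endBlock: 15_100_000,
};
```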
After inserting a row, we then merge any rows with the same key that have any overlap. This actually works pretty well today!
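A sketch of the merge rule (an in-memory illustration of the behavior, not the actual query):

```ts
// Rows with the same key whose block ranges overlap (or are adjacent)
// collapse into a single row.
function mergeCachedRanges(rows: LogFilterCachedRange[]): LogFilterCachedRange[] {
  const sorted = [...rows].sort((a, b) =>
    a.filterKey === b.filterKey
      ? a.startBlock - b.startBlock
      : a.filterKey.localeCompare(b.filterKey),
  );

  const merged: LogFilterCachedRange[] = [];
  for (const row of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.filterKey === row.filterKey && row.startBlock <= last.endBlock + 1) {
      // Overlapping or adjacent range with the same key: extend the last row.
      last.endBlock = Math.max(last.endBlock, row.endBlock);
    } else {
      merged.push({ ...row });
    }
  }
  return merged;
}
```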
The problem: Custom log filters & overlaps
To complicate things, Ponder also supports custom log filters where the user can specify the log filter directly. Consider now that the user from above has synced their initial simple 1-contract app. Then, they add a new custom log filter like this:
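For example (again with assumed values):

```ts
// Assumed custom log filter: same contract, but only logs whose first topic
// (topic_0) equals "0x1". ("0x1" stands in for a real 32-byte event signature
// hash, kept short to match the cache key examples below.)
const customLogFilter: LogFilter = {
  chainId: 1,
  address: "0xabc",
  topics: ["0x1"],
  startBlock: 15_000_000,
};
```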
This log filter is actually a strict subset of the simple log filter. The first one matched ALL logs produced by the contract; this one only matches those where `topic_0 == "0x1"`. So, the local raw blockchain store technically already has every log required to serve this. Unfortunately, Ponder is not currently able to take advantage of this, and instead must refetch all the logs for the new log filter.

This is because the new log filter key looks like `"1-0xabc-[0x1]"`, which is different from `"1-0xabc-null"`.

Solutions
There might be a very simple solution here that I'm missing. It's basically a cache where the key is a high-dimensional set, and each component of the key has slightly different rules (topic set logic, block range merging logic). Bit masks??
Ideally, the event store would have an API that looks like this:
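Something like this (the method names and argument shapes are a sketch, not the real interface):

```ts
// Hypothetical event store API for cached log-filter ranges.
interface EventStore {
  // Record that [startBlock, endBlock] has been fetched for `filter`, merging
  // the new range into existing ranges using the address/topic set logic and
  // the overlapping block range logic.
  insertLogFilterCachedRange(args: {
    filter: LogFilter;
    startBlock: number;
    endBlock: number;
  }): Promise<void>;

  // Return the block ranges of `filter` that the cache already fully covers,
  // including coverage provided by broader filters that are supersets of it.
  getLogFilterCachedRanges(args: {
    filter: LogFilter;
  }): Promise<{ startBlock: number; endBlock: number }[]>;
}
```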
The insert method would magically merge the newly inserted range with the existing ranges following the address/topic set logic and overlapping block range logic.
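With the hypothetical API above, the desired behavior would look like this:

```ts
// Usage sketch: once the broad filter has been synced, the narrower custom
// filter should be reported as fully cached over the same range, no refetch.
declare const store: EventStore; // assume some implementation exists

await store.insertLogFilterCachedRange({
  filter: simpleLogFilter, // matches ALL logs of 0xabc
  startBlock: 15_000_000,
  endBlock: 15_100_000,
});

const cached = await store.getLogFilterCachedRanges({ filter: customLogFilter });
// Desired result: [{ startBlock: 15_000_000, endBlock: 15_100_000 }], because
// customLogFilter (topic_0 == "0x1") is a strict subset of simpleLogFilter.
```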