feat: add storage slot lifecycle model #235

Open
weiihann wants to merge 10 commits into ethpandaops:master from weiihann:feat/storage-slot-lifecycle

Conversation

@weiihann
Contributor

Add storage slot lifecycle models that track birth (0→non-zero effective bytes) and death (non-zero→0) transitions
per slot, with reincarnation tracking via lifecycle_number.

  • Migration 076: Two new ReplicatedReplacingMergeTree tables — int_storage_slot_lifecycle_boundary (birth/death
    detection) and int_storage_slot_lifecycle (per-lifecycle touch statistics and interval metrics). Includes
    table-level and field-level COMMENTs matching migration 037 conventions.
  • Boundary model: Detects births and deaths from int_storage_slot_diff_by_address_slot, assigns lifecycle
    numbers via cumulative birth count using window functions, self-references for cross-batch continuity.
  • Lifecycle model: Aggregates touch counts, effective bytes peak, and touch-to-touch interval statistics
    (count/sum/max) per lifecycle.
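
For readers unfamiliar with the boundary logic, here is a minimal Python sketch of the idea for a single slot: classify each diff as a birth or a death, then number lifecycles with a running count of births. It is an illustration only, not the SQL in this PR. The `SlotDiff` type, the `effective_bytes_from` field, and the `prior_births` parameter (a stand-in for the self-reference that carries the count across batches) are assumptions made for the example.

```python
from itertools import accumulate
from typing import NamedTuple


class SlotDiff(NamedTuple):
    # Hypothetical row shape; the real table's columns may differ.
    block_number: int
    effective_bytes_from: int
    effective_bytes_to: int


def detect_boundaries(diffs, prior_births=0):
    """Return (block_number, kind, lifecycle_number) for each birth or death.

    prior_births is the number of births already recorded for this slot in
    earlier batches (assumed to come from a lookup on the boundary table).
    """
    ordered = sorted(diffs, key=lambda d: d.block_number)
    # Birth: 0 -> non-zero effective bytes. Death: non-zero -> 0.
    is_birth = [d.effective_bytes_from == 0 and d.effective_bytes_to > 0 for d in ordered]
    is_death = [d.effective_bytes_from > 0 and d.effective_bytes_to == 0 for d in ordered]
    # Running birth count, analogous to a cumulative sum window function.
    births_so_far = accumulate(int(b) for b in is_birth)

    boundaries = []
    for diff, birth, death, count in zip(ordered, is_birth, is_death, births_so_far):
        if birth:
            boundaries.append((diff.block_number, "birth", prior_births + count))
        elif death:
            boundaries.append((diff.block_number, "death", prior_births + count))
    return boundaries


example = [
    SlotDiff(10, 0, 32),   # birth of lifecycle 1
    SlotDiff(12, 32, 20),  # ordinary write, not a boundary
    SlotDiff(15, 20, 0),   # death of lifecycle 1
    SlotDiff(20, 0, 5),    # reincarnation: birth of lifecycle 2
]
print(detect_boundaries(example))
```

A death inherits the number of the most recent birth, so each birth/death pair shares one `lifecycle_number`.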

Add int_storage_slot_lifecycle incremental model that tracks per-slot,
per-lifecycle metrics: birth/death blocks, touch count, effective bytes,
and touch-to-touch interval statistics (count, sum, max).

Uses an arrayFold state machine to process events from
int_storage_slot_diff_by_address_slot (births/deaths) and
int_storage_slot_next_touch (all touches). A helper table caches
current lifecycle state per slot for efficient cross-batch lookups.
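
The fold-style state machine described in this commit can be pictured with the Python sketch below, where `functools.reduce` plays the role of `arrayFold`. This is an illustration under assumptions, not the transformation SQL: the `LifecycleState` fields, the event tuples, the choice to count both the birth and the death as touches, and the rule that touches on a dead slot are ignored are all hypothetical. It also assumes at most one event per block for a slot.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class LifecycleState:
    # Hypothetical cached state for one slot (the role of the helper table).
    lifecycle_number: int = 0
    birth_block: Optional[int] = None
    death_block: Optional[int] = None
    touch_count: int = 0
    last_touch_block: Optional[int] = None
    interval_count: int = 0
    interval_sum: int = 0
    interval_max: int = 0


def step(state, event):
    """Advance the state by one (block_number, kind) event."""
    block, kind = event
    if kind == "birth":
        # Start a new lifecycle; the birth counts as its first touch.
        return LifecycleState(
            lifecycle_number=state.lifecycle_number + 1,
            birth_block=block,
            touch_count=1,
            last_touch_block=block,
        )
    alive = state.birth_block is not None and state.death_block is None
    if not alive:
        return state  # touches outside a live lifecycle are ignored here
    gap = block - state.last_touch_block
    return replace(
        state,
        death_block=block if kind == "death" else None,
        touch_count=state.touch_count + 1,
        last_touch_block=block,
        interval_count=state.interval_count + 1,
        interval_sum=state.interval_sum + gap,
        interval_max=max(state.interval_max, gap),
    )


def fold_events(events, initial=LifecycleState()):
    """Fold events in block order, starting from the cached state."""
    return reduce(step, sorted(events, key=lambda e: e[0]), initial)


batch_1 = [(100, "birth"), (105, "touch"), (120, "touch"), (121, "death")]
state = fold_events(batch_1)
# A later batch resumes from the cached state instead of replaying history.
batch_2 = [(130, "touch"), (200, "birth"), (210, "touch")]
state_2 = fold_events(batch_2, initial=state)
```

Passing the previous result as `initial` is what makes the cross-batch lookup cheap: only the current state per slot needs to be kept.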

- Migration 076: int_storage_slot_lifecycle + helper tables
- Transformation: dual-INSERT SQL with arrayFold state machine
- Tests: 10 structural invariant assertions for pectra spec
- Proto: reference schema for API generation (run make proto)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: replace helper tables with boundary table in migration 076

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add lifecycle boundary detection model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: replace arrayFold with window functions in lifecycle model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add boundary table test assertions and proto schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add death bytes and uniqueness assertions for boundary table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: replace FINAL with argMax dedup in lifecycle models

FINAL forces sort-merge deduplication at query time on every batch.
Replace with GROUP BY + argMax(col, updated_date_time) pattern already
used elsewhere in the codebase (e.g. int_storage_slot_reactivation_6m).

- boundary: subquery argMax with post-dedup birth/death filter
- lifecycle batch_touches: GROUP BY on key columns only
- lifecycle batch_diffs: flat argMax for effective_bytes_to
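
As a rough picture of what the GROUP BY + argMax pattern computes, the Python sketch below keeps, for each key, the row with the greatest `updated_date_time`, and only then applies a filter. It illustrates the semantics only and is not the query in this commit. The key columns `address`, `slot_key`, and `block_number`, the integer timestamps, and the sample rows are assumptions.

```python
def dedup_latest(rows, key_cols, version_col="updated_date_time"):
    """Keep one row per key: the one with the largest version_col.

    Taking argMax(col, updated_date_time) for every column of a group
    selects the values of that group's newest row, which is what this
    whole-row selection mimics (ignoring ties on the version column).
    """
    latest = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in latest or row[version_col] > latest[key][version_col]:
            latest[key] = row
    return list(latest.values())


rows = [
    # Stale version of a row that was later rewritten.
    {"address": "0xa", "slot_key": "0x1", "block_number": 10,
     "effective_bytes_to": 32, "updated_date_time": 1},
    {"address": "0xa", "slot_key": "0x1", "block_number": 10,
     "effective_bytes_to": 0, "updated_date_time": 2},
    {"address": "0xa", "slot_key": "0x2", "block_number": 10,
     "effective_bytes_to": 7, "updated_date_time": 1},
]

deduped = dedup_latest(rows, key_cols=("address", "slot_key", "block_number"))
# Filter after dedup, so a stale version cannot leak through the filter.
non_zero = [r for r in deduped if r["effective_bytes_to"] > 0]
```

Filtering before deduplication would let the stale `effective_bytes_to = 32` row survive, which is why the boundary model applies its birth/death filter after the dedup step.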

Validated against 1M blocks (22M-23M): 15/16 assertions pass,
1 pre-existing boundary edge case at block 23M (15/120M rows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

done

Update CLAUDE.md: require field-level COMMENTs on all table DDLs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add field-level COMMENTs to lifecycle migration tables

Migration 076 had table-level COMMENTs but was missing inline
field COMMENTs, unlike migration 037 which has them on every
column. Add COMMENTs to all fields in both lifecycle_boundary
and lifecycle tables. Add Migration Conventions section to
CLAUDE.md documenting the requirement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

revert

remove
@weiihann weiihann requested a review from Savid as a code owner February 27, 2026 06:25
mattevans and others added 7 commits March 2, 2026 07:08
* master: (34 commits)
  fix(detect_impacted_models.sh): change output from "all" to "none" for non-existing tests directory to clarify behavior
  fix(workflow): reduce concurrency from 15 to 10 in mainnet tests to enhance stability and performance
  fix(workflows): increase test concurrency from 5 to 15 for better performance during mainnet tests
  chore(models): add .gitkeep file to models directory for maintaining version control
  refactor: removed dupe sepolia tests
  refactor(database): enhance isCBTTemplateReady function to improve robustness by checking for database existence and migration status refactor(engine): simplify the buildTestOverrides function by auto-generating sensible defaults from model cache and applying YAML overrides fix(overrides): update the testing overrides file to reflect auto-generated defaults and remove unnecessary hardcoded entries test(models): modify assertions in test YAML files to check for duplicate values instead of null conditions
  feat(database): enhance ValidateExternalData function to check for optional tables during parquet loading and improve validation logic
  fix(tests): enhance SQL tests with better validation checks and improved naming for readability and acceptance criteria refactor(tests): standardize assertions and SQL queries across multiple models for consistency and maintainability
  refactor: ensure boundaries on tests are gooch
  feat(tests): add dynamic resolution for column references in typed checks to enhance flexibility and accuracy in assertions fix(tests): reduce CBTConcurrency to limit contention under concurrent load for better performance during testing chore(engine): improve default overrides for models not in overrides file to optimize testing environment and execution fix(engine): reset pending timers for retried models to avoid premature timeout during retries, enhancing reliability in transformation processing chore(tests): remove deprecated YAML test model fct_block_blob_first_seen_by_node to clean up test suite
  feat: optimize model skill (ethpandaops#234)
  refactor: drop spec from tests
  chore(sql): remove unnecessary test comments from int_block_receipt_size.sql and int_transaction_receipt_size.sql to clean up the code
  chore(detect_impacted_models.sh): exclude CI and documentation files from impacting model detection logic to improve accuracy of the impacted models detection
  feat(workflows): add test-fusaka-mainnet workflow for detecting and testing impacted models in mainnet environment
  chore(int_transaction_receipt_size.sql): add test comment for clarity and future reference
  chore(detect_impacted_models.sh): update shebang to use env for better portability fix(detect_impacted_models.sh): adjust REPO_ROOT path to correctly reference the repository root directory
  chore(int_block_receipt_size.sql): add comment for testing purposes to clarify the intent of the code
  refactor(workflows): restructure GitHub Actions workflows to separate model detection and testing steps for clarity and efficiency
  chore(ci): update GitHub Actions workflows to use self-hosted runners for improved performance and customization chore(ci): specify paths for golangci-lint workflow to limit triggered events only to relevant files
  ...
…ifecycle tables and their structure

test(tests): enhance lifecycle SQL tests to verify data integrity and correctness for storage slot lifecycle models
* master:
  refactor(tests): rename extractModelNames to extractCloneTableNames for clarity and include helper tables in extraction logic
…mprove reliability and resource management during execution.
…or managing lifecycle metrics

feat(proto): implement List and Get requests for int_storage_slot_lifecycle with pagination and filters
chore(proto): improve comments for clarity and update field descriptions in int_storage_slot_lifecycle.proto
docs(proto): provide more detailed comments for lifecycle transitions and metrics in proto definition files

feat(proto): add IntStorageSlotLifecycleBoundary messages to manage lifecycle boundaries with filtering options
docs(proto): improve documentation for proto messages with clearer descriptions on fields and request responses
* master:
  chore(detect_impacted_models.sh): update ignore patterns to include migrations, proto, dependency, and build tooling files for improved impact detection
…fecycle models

Move birth_block filter from HAVING to WHERE in boundaries CTE for
better predicate pushdown, and add birth_block bounds filtering to
prev_stats and prev_state self-queries to ensure lookups are properly
bounded.
@Savid Savid force-pushed the feat/storage-slot-lifecycle branch from d5cf58e to 12133e5 Compare March 2, 2026 04:34
Savid and others added 2 commits March 2, 2026 14:36
* master:
  fix(detect_impacted_models.sh): update condition to check CHANGED_MODELS_COUNT for clarity and reliability in detecting impacted models

3 participants