Skip to content

Conversation

@ericallam
Copy link
Member

@ericallam ericallam commented Dec 1, 2025

This is an alternate solution to the fix made in #2719 because task_events_v1 can cause issues even if hardly any events are being sent there.

Core problem with task_events_v1

  1. Insert arrives with start_time from 3 days ago
  2. New tiny part created in partition 20251028 (e.g., 10 rows, 3KB)
  3. That partition already has a massive merged part (e.g., 30GB)
  4. To merge the tiny part, ClickHouse must rewrite the entire 30GB + 10 rows
  5. That merge takes hours, during which more tiny parts arrive
  6. Repeat forever

The old partitions will never stabilise because we keep inserting late-arriving data into them. Additionally, large merges like this can overwhelm a ch replica and prevent merges on other tables, causing a cascading failure.

This change basically attacks the root problem: insert task_events_v1 records with old start times. You can now set the EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS env var (defaults to 5 minutes) to clamp the start_time to close to now. Along with this we've updated the application logic that merged many events into a single span to always use the earliest span start_time, instead of just the first.

@changeset-bot
Copy link

changeset-bot bot commented Dec 1, 2025

⚠️ No Changeset found

Latest commit: 461afb9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

This change adds a configurable clamping mechanism for ClickHouse event start times: a new environment variable EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS (default 5 minutes) and a startTimeMaxAgeMs option on the ClickHouse repository config. The ClickHouse repository implementation gains private clamp helpers and replaces direct start_time formatting with clamped formatting across event creation, batch flush, completion paths, and trace-summary/span construction, propagating earliest start times where applicable. The repository instance file passes the new config from ENV to the v1 ClickHouse repo. EVENT_REPOSITORY_DEFAULT_STORE was reformatted in the env file.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts — dense logic changes: new clamp helpers, many call sites updated, nanosecond↔millisecond conversions, and earliest-start-time propagation require careful verification.
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts — ensure config wiring passes startTimeMaxAgeMs correctly and no initialization regressions.
  • apps/webapp/app/env.server.ts — verify new env var schema, default, and formatting change to EVENT_REPOSITORY_DEFAULT_STORE.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description provides detailed context and explanation of the problem and solution, but lacks the structured template sections (Closes #issue, Checklist, Testing, Changelog, Screenshots). Follow the repository's PR description template by including the Closes # reference, completing the checklist, documenting testing steps, adding a changelog entry, and noting any relevant screenshots.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: clamping start_time to prevent old partition merge issues in ClickHouse.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ea-branch-104

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts (1)

37-121: Critical: Unresolved merge conflict and missing startTimeMaxAgeMs in v2 initializer.

This file has two issues:

  1. Unresolved merge conflict markers that will prevent compilation
  2. Missing startTimeMaxAgeMs in v2 initializer (lines 123-136) - the clamping feature won't work for v2 repositories

Resolve the conflict and add startTimeMaxAgeMs to both initializers:

 function initializeClickhouseRepository() {
   // ... existing code ...
   const repository = new ClickhouseEventRepository({
     clickhouse: clickhouse,
     batchSize: env.EVENTS_CLICKHOUSE_BATCH_SIZE,
     flushInterval: env.EVENTS_CLICKHOUSE_FLUSH_INTERVAL_MS,
     maximumTraceSummaryViewCount: env.EVENTS_CLICKHOUSE_MAX_TRACE_SUMMARY_VIEW_COUNT,
     maximumTraceDetailedSummaryViewCount:
       env.EVENTS_CLICKHOUSE_MAX_TRACE_DETAILED_SUMMARY_VIEW_COUNT,
     maximumLiveReloadingSetting: env.EVENTS_CLICKHOUSE_MAX_LIVE_RELOADING_SETTING,
     insertStrategy: env.EVENTS_CLICKHOUSE_INSERT_STRATEGY,
     waitForAsyncInsert: env.EVENTS_CLICKHOUSE_WAIT_FOR_ASYNC_INSERT === "1",
     asyncInsertMaxDataSize: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_MAX_DATA_SIZE,
     asyncInsertBusyTimeoutMs: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_BUSY_TIMEOUT_MS,
+    startTimeMaxAgeMs: env.EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS,
     version: "v1",
   });
   return repository;
 }

 function initializeClickhouseRepositoryV2() {
   // ... existing code ...
   const repository = new ClickhouseEventRepository({
     clickhouse: clickhouse,
     batchSize: env.EVENTS_CLICKHOUSE_BATCH_SIZE,
     flushInterval: env.EVENTS_CLICKHOUSE_FLUSH_INTERVAL_MS,
     maximumTraceSummaryViewCount: env.EVENTS_CLICKHOUSE_MAX_TRACE_SUMMARY_VIEW_COUNT,
     maximumTraceDetailedSummaryViewCount:
       env.EVENTS_CLICKHOUSE_MAX_TRACE_DETAILED_SUMMARY_VIEW_COUNT,
     maximumLiveReloadingSetting: env.EVENTS_CLICKHOUSE_MAX_LIVE_RELOADING_SETTING,
     insertStrategy: env.EVENTS_CLICKHOUSE_INSERT_STRATEGY,
     waitForAsyncInsert: env.EVENTS_CLICKHOUSE_WAIT_FOR_ASYNC_INSERT === "1",
     asyncInsertMaxDataSize: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_MAX_DATA_SIZE,
     asyncInsertBusyTimeoutMs: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_BUSY_TIMEOUT_MS,
+    startTimeMaxAgeMs: env.EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS,
     version: "v2",
   });
   return repository;
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ae1317 and f674fe4.

⛔ Files ignored due to path filters (1)
  • references/hello-world/src/trigger/example.ts is excluded by !references/**
📒 Files selected for processing (3)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (30 hunks)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
🧠 Learnings (5)
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2025-06-14T08:07:46.625Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2175
File: apps/webapp/app/services/environmentMetricsRepository.server.ts:202-207
Timestamp: 2025-06-14T08:07:46.625Z
Learning: In apps/webapp/app/services/environmentMetricsRepository.server.ts, the ClickHouse methods (getTaskActivity, getCurrentRunningStats, getAverageDurations) intentionally do not filter by the `tasks` parameter at the ClickHouse level, even though the tasks parameter is accepted by the public methods. This is done on purpose as there is not much benefit from adding that filtering at the ClickHouse layer.

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Configure OpenTelemetry instrumentations and exporters in trigger.config.ts for enhanced logging

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Limit task duration using the `maxDuration` property (in seconds)

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
🧬 Code graph analysis (1)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (3)
apps/webapp/app/v3/eventRepository/common.server.ts (2)
  • getNowInNanoseconds (23-25)
  • convertDateToNanoseconds (53-55)
apps/webapp/app/v3/otlpExporter.server.ts (1)
  • events (125-131)
internal-packages/clickhouse/src/taskEvents.ts (4)
  • TaskEventV1Input (5-22)
  • TaskEventV1Input (24-24)
  • TaskEventV2Input (139-158)
  • TaskEventV2Input (160-160)
🪛 Biome (2.1.2)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts

[error] 82-90: Expected a statement but instead found '||||||| ancestor

/**

  • The version of the ClickHouse task_events table to use.
    • "v1": Uses task_events_v1 (partitioned by start_time)
    • "v2": Uses task_events_v2 (partitioned by inserted_at to avoid "too many parts" errors)
      */
      version?: "v1" | "v2"'.

Expected a statement here.

(parse)


[error] 185-187: Expected a statement but instead found '||||||| ancestor
async #flushBatch(flushId: string, events: TaskEventV1Input[])'.

Expected a statement here.

(parse)


[error] 187-189: Expected a statement but instead found '=======
async #flushBatch(flushId: string, events: (TaskEventV1Input | TaskEventV2Input)[])'.

Expected a statement here.

(parse)

apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts

[error] 121-123: Expected a statement but instead found '>>>>>>> theirs'.

Expected a statement here.

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
apps/webapp/app/env.server.ts (1)

1152-1159: LGTM!

The new environment variable follows existing patterns and uses zod for validation as per coding guidelines. The 5-minute default is reasonable for preventing old partition merge issues.

apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (4)

141-183: Clamping implementation is correct.

The three helper methods correctly implement the time clamping logic:

  • Proper early return when startTimeMaxAgeMs is not configured
  • Correct nanosecond conversion using * 1_000_000n
  • Clamps to current time when the start time exceeds the maximum age threshold

264-264: Consistent application of clamping across event creation paths.

The clamping is properly applied to all event creation methods including:

  • createEventToTaskEventV1Input
  • Span event conversions (exception, cancellation, attempt failed, other)
  • recordEvent and traceEvent
  • All complete*RunEvent methods

This comprehensive coverage ensures no old start times can cause partition merge issues.


1231-1243: Earliest start time tracking logic is sound.

The merge methods correctly:

  • Track the earliest start time across all non-override/non-event records
  • Exclude ANCESTOR_OVERRIDE and SPAN_EVENT from the calculation (as these are metadata, not the primary span)
  • Apply the earliest time to the final span summary

This ensures consistent startTime representation when multiple records exist for a span.


1053-1055: Reasonable buffer for endCreatedAt.

The 1-minute buffer on endCreatedAt provides tolerance for clock skew and events that may have shifted timestamps due to clamping. This aligns well with the clamping feature.

@ericallam ericallam merged commit 3c326a4 into main Dec 1, 2025
27 of 28 checks passed
@ericallam ericallam deleted the ea-branch-104 branch December 1, 2025 15:43
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (2)

76-81: startTimeMaxAgeMs option wiring looks good; 0ms currently behaves like “no clamping”

The config shape and JSDoc match the intended behavior. Note that because the helpers test if (!this._config.startTimeMaxAgeMs), an explicit 0 value is treated the same as undefined (i.e., clamping disabled). If you ever want 0 to mean “clamp everything to now”, you’d need a nullish check (== null) instead of a falsy check; otherwise this is fine and just worth being aware of.


1043-1045: End‑window buffering in getTraceSummary is reasonable; consider tying it to max‑age

Adding a 1s buffer before startCreatedAt and a 60s buffer after endCreatedAt should help avoid missing slightly late or clamped events at the tail of the window.

If EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS is tuned significantly above 60 000ms, you might consider deriving this buffer from startTimeMaxAgeMs (or at least documenting the relationship) so the query window is guaranteed to encompass any clamped insertions for that trace.

Also applies to: 1058-1061

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4cb3d7 and 461afb9.

⛔ Files ignored due to path filters (1)
  • references/hello-world/src/trigger/example.ts is excluded by !references/**
📒 Files selected for processing (3)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (30 hunks)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
  • apps/webapp/app/env.server.ts
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
🧠 Learnings (1)
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Limit task duration using the `maxDuration` property (in seconds)

Applied to files:

  • apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
🧬 Code graph analysis (1)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (1)
apps/webapp/app/v3/eventRepository/common.server.ts (2)
  • getNowInNanoseconds (23-25)
  • convertDateToNanoseconds (53-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts (7)

132-178: Clamping helpers correctly implement “old → now” behavior in both ns and Date domains

The three helpers are consistent with each other and the doc comment:

  • They base “now” on epoch nanoseconds/Date via the existing utilities.
  • They only adjust timestamps that are older than startTimeMaxAgeMs, clamping them to the current time as documented.
  • They short‑circuit cleanly when startTimeMaxAgeMs is unset/falsy.

No correctness issues spotted here.


254-255: All CreateEventInput/span event write paths now clamp start_time to avoid very old partitions

Routing event.startTime and each spanEvent.time through #clampAndFormatStartTime ensures no v1 span/log/SPAN_EVENT records can be written with arbitrarily old start_time values, which matches the PR’s ClickHouse goal.

Given this relies on CreateEventInput.startTime and convertDateToNanoseconds(spanEvent.time) being “epoch nanoseconds”, it’s worth double‑checking those types/callers remain consistent (i.e., no monotonic hrtime values slipping in).

Also applies to: 323-325, 359-361, 389-391, 423-425


591-592: recordEvent/traceEvent clamping is consistent with the new policy

recordEvent and both the main span and error SPAN_EVENT in traceEvent now clamp their start_time via #clampAndFormatStartTime, so ad‑hoc trace/log writes also respect the max‑age constraint.

This will cap the apparent duration of any spans whose provided options.startTime is much older than “now”. That’s aligned with the partition‑stability goal, but it does trade precision in long‑lived/late‑reported spans, so it’s a good place to have tests that assert the new expected behavior for extremely old startTime inputs.

Also applies to: 692-693, 726-727


764-766: Run completion helpers now clamp createdAt/spanCreatedAt/endTime; semantics look intentional

For all run completion flows you now:

  • Clamp run.createdAt or spanCreatedAt (or endTime for the instantaneous attempt‑failed override) via #clampStartTimeDate.
  • Derive the stored start_time from the clamped value while keeping expiry based on the original run.createdAt.

That achieves “no start_time older than X ms” without changing retention. The special case in createAttemptFailedRunEvent (using endTime ?? new Date() as the effective “start”) matches the idea of that override being an instantaneous event at failure time rather than the run’s creation, but it’s worth confirming that’s the intended timeline in the UI/analytics.

Also applies to: 815-817, 858-860, 907-909, 956-958, 1001-1003


1221-1222: SpanDetail now uses earliest non‑event startTime across records, avoiding overrides skewing start

The #mergeRecordsIntoSpanDetail changes:

  • Track earliestStartTime across all records except ANCESTOR_OVERRIDE and SPAN_EVENT.
  • Initialize the span from the first record but then overwrite span.startTime with the earliest non‑event timestamp.
  • Keep individual event times (SPAN_EVENT etc.) based on their own record timestamps.

This aligns with “use the earliest real span/log record as the canonical start” while preventing override/event records from dragging the span earlier or later. Looks correct and matches the PR description.

Also applies to: 1224-1233, 1244-1245, 1271-1272, 1339-1342


1485-1486: SpanSummary aggregation also correctly normalizes to earliest non‑event startTime

The summary‑level merge now mirrors the detail logic:

  • earliestStartTime is computed from all non‑ANCESTOR_OVERRIDE / non‑SPAN_EVENT records.
  • span.data.startTime is reset to that earliest value after the loop.
  • Ancestor overrides and span events still contribute per‑event timestamps via recordStartTime.

This should make the span list view consistent with the detailed view and avoid cases where a later non‑SPAN record defines the start purely because of ordering.

Also applies to: 1488-1497, 1513-1514, 1540-1541, 1571-1574


1741-1742: Detailed summary view now shares the same earliest‑startTime semantics as other aggregations

#mergeRecordsIntoSpanDetailedSummary follows the same pattern: track the earliest non‑override/non‑event start, initialize from the first record, then overwrite data.startTime with the earliest seen value and keep per‑event times.

That keeps all three representations (detail, summary, detailedSummary) aligned in how they define the canonical span start, which should simplify reasoning in the UI and override logic.

Also applies to: 1744-1753, 1768-1769, 1796-1797, 1823-1826

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants