
react-inspector: standardized Effect Schema annotations for data lineage (source-of-truth, derivation, provenance) #687

@schickling-assistant

Description


Problem

Effect Schema already lets us annotate fields with semantic metadata (identifier, title, description, pretty, examples, default, jsonSchema, documentation). What's missing is a standardized vocabulary for data lineage — the epistemic status of a field rather than its shape.

Concretely, when reading a value in the inspector (or generating docs/UI from a schema) the following questions recur:

  • Is this field the source of truth, or a projection / cache / mirror of something else?
  • If it's derived, from what and how? Pure function of other fields? Computed by a service? Materialized from an event log?
  • Who owns / writes this field — and is it writable here or read-only in this context?
  • Is the value authoritative now, or is it a snapshot with a known staleness window?
  • Are there cross-entity references (foreign keys) we should follow?

Today this lives in ad-hoc description prose, in code comments, or only in the author's head. We want a structured, machine-readable, IDE/inspector-renderable layer for it.

Why now / why in react-inspector

The react-inspector already specializes its rendering on schema annotations. Lineage is exactly the kind of signal a debug inspector wants to surface — "this number you're staring at is computed from these other two fields, it's not stored" is the #1 cause of confused debugging sessions. But the annotations themselves are a general-purpose addition to the Effect Schema ecosystem; the inspector is just the first consumer.

Research / prior art

A few existing models we should borrow vocabulary from rather than reinvent:

  • W3C PROV (https://www.w3.org/TR/prov-overview/) — Entity / Activity / Agent, wasDerivedFrom, wasGeneratedBy, wasAttributedTo. Heavy for our needs but the vocabulary is well-considered.
  • OpenLineage (https://openlineage.io/) — dataset-level lineage; less applicable at field granularity but informs the inputs / outputs / job framing.
  • dbt column-level lineage — sources, models, tests, description. Field-level granularity, very pragmatic.
  • GraphQL directives: @deprecated (core GraphQL) and @external / @requires / @provides (Apollo Federation). The Federation directives are the closest analogue to what we want: per-field metadata describing where a field comes from in a federated graph.
  • JSON Schema readOnly / writeOnly — minimal stake in the ground for authority.
  • EventModeling / CQRS vocabulary — command / event / read-model / projection. Resonates with the Effect/event-sourcing patterns in this org.

Proposal: a lineage annotation namespace

Add a new annotation symbol under the Effect Schema annotation conventions:

const LineageAnnotationId = Symbol.for('effect/annotation/Lineage')

The value is a tagged union following our codebase conventions (every variant carries a _tag discriminator rather than being a bare string literal). Strawman:

import type { Duration } from 'effect/Duration'

type Lineage =
  | { _tag: 'SourceOfTruth'; owner?: string; system?: string }
  | { _tag: 'Derived'; from: ReadonlyArray<LineageRef>; how: DerivationKind; pure?: boolean }
  | { _tag: 'Projection'; of: LineageRef; staleness?: Duration }
  | { _tag: 'Cache'; of: LineageRef; ttl?: Duration }
  | { _tag: 'Mirror'; of: LineageRef; system?: string }
  | { _tag: 'External'; system: string; ref?: string }   // foreign-system reference
  | { _tag: 'Computed'; fn?: string; description?: string } // ephemeral, not persisted

type LineageRef =
  | { _tag: 'Field'; path: string }            // path relative to root schema
  | { _tag: 'Schema'; identifier: string }     // by schema identifier
  | { _tag: 'External'; system: string; ref: string }

type DerivationKind =
  | { _tag: 'Pure' }                            // pure function of `from`
  | { _tag: 'Aggregation'; op: 'sum' | 'count' | 'min' | 'max' | 'avg' | 'custom' }
  | { _tag: 'Reduction'; description: string }  // event-fold / projection
  | { _tag: 'External'; service: string }       // computed by another service
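To show what the tagged union buys a consumer, here is a minimal sketch of an exhaustive formatter that produces tooltip text like the one described for react-inspector below. The types are restated so the snippet stands alone, with one assumption: Duration is simplified to a millisecond count. describeLineage and refLabel are hypothetical names, not an existing API. The strawman encodes purity twice (the pure flag and how: Pure), so the sketch accepts either.

```typescript
// Restated from the strawman; assumption: Duration simplified to milliseconds.
type Millis = number

type LineageRef =
  | { readonly _tag: 'Field'; readonly path: string }
  | { readonly _tag: 'Schema'; readonly identifier: string }
  | { readonly _tag: 'External'; readonly system: string; readonly ref: string }

type DerivationKind =
  | { readonly _tag: 'Pure' }
  | { readonly _tag: 'Aggregation'; readonly op: 'sum' | 'count' | 'min' | 'max' | 'avg' | 'custom' }
  | { readonly _tag: 'Reduction'; readonly description: string }
  | { readonly _tag: 'External'; readonly service: string }

type Lineage =
  | { readonly _tag: 'SourceOfTruth'; readonly owner?: string; readonly system?: string }
  | { readonly _tag: 'Derived'; readonly from: ReadonlyArray<LineageRef>; readonly how: DerivationKind; readonly pure?: boolean }
  | { readonly _tag: 'Projection'; readonly of: LineageRef; readonly staleness?: Millis }
  | { readonly _tag: 'Cache'; readonly of: LineageRef; readonly ttl?: Millis }
  | { readonly _tag: 'Mirror'; readonly of: LineageRef; readonly system?: string }
  | { readonly _tag: 'External'; readonly system: string; readonly ref?: string }
  | { readonly _tag: 'Computed'; readonly fn?: string; readonly description?: string }

// Short human-readable label for a reference.
const refLabel = (ref: LineageRef): string => {
  switch (ref._tag) {
    case 'Field':
      return ref.path
    case 'Schema':
      return ref.identifier
    case 'External':
      return `${ref.system}:${ref.ref}`
  }
}

// Exhaustive over Lineage: the never-typed default turns a missing case into a compile error.
const describeLineage = (lineage: Lineage): string => {
  switch (lineage._tag) {
    case 'SourceOfTruth':
      return lineage.owner ? `Source of truth (owner: ${lineage.owner})` : 'Source of truth'
    case 'Derived': {
      // Purity may be declared via the flag or via how: Pure.
      const pure = lineage.pure === true || lineage.how._tag === 'Pure'
      return `Derived from ${lineage.from.map(refLabel).join(', ')}${pure ? ' (pure)' : ''}`
    }
    case 'Projection':
      return `Projection of ${refLabel(lineage.of)}`
    case 'Cache':
      return `Cache of ${refLabel(lineage.of)}`
    case 'Mirror':
      return `Mirror of ${refLabel(lineage.of)}`
    case 'External':
      return `External (${lineage.system})`
    case 'Computed':
      return `Computed${lineage.fn ? ` by ${lineage.fn}` : ''}, not persisted`
    default: {
      const unreachable: never = lineage
      return unreachable
    }
  }
}
```

Adding a new Lineage variant without a matching case fails to compile, which is one practical argument for the single tagged union of option (A).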

Companion annotations (separate namespaces, composable with lineage):

const AuthorityAnnotationId = Symbol.for('effect/annotation/Authority')
type Authority = { writers: ReadonlyArray<string>; readers?: ReadonlyArray<string> }

const FreshnessAnnotationId = Symbol.for('effect/annotation/Freshness')
type Freshness = { capturedAt?: 'now' | 'event-time' | 'snapshot'; maxAgeMs?: number }

const ReferenceAnnotationId = Symbol.for('effect/annotation/Reference')
type Reference = { _tag: 'ForeignKey'; targetSchema: string; targetField?: string }
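A minimal sketch of how the companion annotations compose with Lineage on a single field, assuming annotations live in a record keyed by annotation id. Effect's SchemaAST.getAnnotation reads from such a record and returns an Option; this sketch returns undefined instead to stay dependency-free. Because the ids are created with Symbol.for, any module asking for the same key string gets the same symbol from the global registry. getAnnotation, isStale, and the 'ledger-service' writer are hypothetical.

```typescript
// Symbol.for reads from the global symbol registry, so any module asking for
// the same key string (even a duplicated copy of the package) gets the same id.
const LineageAnnotationId = Symbol.for('effect/annotation/Lineage')
const AuthorityAnnotationId = Symbol.for('effect/annotation/Authority')
const FreshnessAnnotationId = Symbol.for('effect/annotation/Freshness')

type Authority = { writers: ReadonlyArray<string>; readers?: ReadonlyArray<string> }
type Freshness = { capturedAt?: 'now' | 'event-time' | 'snapshot'; maxAgeMs?: number }

// Stand-in for a schema's annotations record, keyed by annotation id.
type AnnotationRecord = { readonly [key: symbol]: unknown }

// Hypothetical getter: undefined when absent; the caller chooses the value type A.
const getAnnotation =
  <A>(id: symbol) =>
  (annotations: AnnotationRecord): A | undefined =>
    annotations[id] as A | undefined

const getAuthority = getAnnotation<Authority>(AuthorityAnnotationId)
const getFreshness = getAnnotation<Freshness>(FreshnessAnnotationId)

// Stale only if a maxAgeMs window is declared and the value is older than it.
const isStale = (freshness: Freshness | undefined, ageMs: number): boolean => {
  const maxAgeMs = freshness?.maxAgeMs
  return maxAgeMs !== undefined && ageMs > maxAgeMs
}

// A SourceOfTruth field that also declares its writer and a freshness window
// ('ledger-service' is a made-up owner for illustration).
const balanceAnnotations: AnnotationRecord = {
  [LineageAnnotationId]: { _tag: 'SourceOfTruth', owner: 'ledger-service' },
  [AuthorityAnnotationId]: { writers: ['ledger-service'] } satisfies Authority,
  [FreshnessAnnotationId]: { capturedAt: 'snapshot', maxAgeMs: 60_000 } satisfies Freshness,
}
```

This is the combination open question 3 leans toward: the field stays a SourceOfTruth while Freshness and Authority sit alongside it as separate annotations.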

Design options to debate

(A) One fat lineage annotation (tagged union) — proposal above. Pro: one well-known place to look; encourages thinking holistically. Con: large surface; tempts overuse.

(B) Many small annotations (sourceOfTruth, derivedFrom, owner, …) — each a focused boolean/struct. Pro: progressive disclosure, easy to add incrementally. Con: combinations are implicit; risk of conflicts.

(C) PROV-shaped graph annotation — model fields as nodes with explicit edges (wasDerivedFrom, wasGeneratedBy). Pro: standards-compatible, supports tooling export. Con: heavyweight at the field level; most fields are simpler than this.

(D) Free-form meta bag with semantic conventions (like OTel attributes) — meta: { 'lineage.kind': 'derived', 'lineage.from': [...] }. Pro: extensible without changing the type. Con: untyped, defeats the point of using Effect Schema.

Recommendation: start with (A) for the lineage core + small companion annotations (B) for orthogonal concerns (authority, freshness, references). This is the smallest principled surface that covers the common cases without becoming a graph database.

How react-inspector would consume it

  • Field label gets a small badge / glyph based on _tag (e.g. ƒ for Derived, with distinct glyphs for SourceOfTruth, Projection, and External).
  • Tooltip / hover surfaces the structured detail: "Derived from subtotal, tax, shipping (pure)".
  • For Derived.from references that point to sibling fields in the same root schema, the inspector can highlight those fields on hover — jump-to-source for data.
  • A new toggleable "Lineage" pane in the inspector renders the lineage graph for the current value.
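A sketch of the two per-field pieces the inspector needs, assuming the Lineage shape from the proposal (trimmed to what this snippet uses). Only ƒ for Derived comes from the proposal; the other badge strings are placeholders. fieldsToHighlight follows the hover rule above: only Derived.from entries that are Field refs point at siblings in the same root schema.

```typescript
// Trimmed restatement of LineageRef / Lineage (only the fields used here).
type LineageRef =
  | { readonly _tag: 'Field'; readonly path: string }
  | { readonly _tag: 'Schema'; readonly identifier: string }
  | { readonly _tag: 'External'; readonly system: string; readonly ref: string }

type Lineage =
  | { readonly _tag: 'SourceOfTruth' }
  | { readonly _tag: 'Derived'; readonly from: ReadonlyArray<LineageRef> }
  | { readonly _tag: 'Projection'; readonly of: LineageRef }
  | { readonly _tag: 'Cache'; readonly of: LineageRef }
  | { readonly _tag: 'Mirror'; readonly of: LineageRef }
  | { readonly _tag: 'External'; readonly system: string }
  | { readonly _tag: 'Computed' }

// Placeholder badge strings; only ƒ for Derived comes from the proposal.
const badge: Record<Lineage['_tag'], string> = {
  SourceOfTruth: 'SoT',
  Derived: 'ƒ',
  Projection: 'proj',
  Cache: 'cache',
  Mirror: 'mirror',
  External: 'ext',
  Computed: 'comp',
}

// Sibling field paths to highlight on hover: only Field refs inside Derived.from
// point at other fields of the same root schema.
const fieldsToHighlight = (lineage: Lineage): ReadonlyArray<string> =>
  lineage._tag === 'Derived'
    ? lineage.from.flatMap((ref) => (ref._tag === 'Field' ? [ref.path] : []))
    : []
```

Typing the badge table as Record<Lineage['_tag'], string> means a new variant fails to compile until it has a badge.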

How it fits the rest of the ecosystem

  • @overeng/notion-react already pairs Effect schemas with UI; lineage annotations would give it a principled way to mark which fields are computed vs editable in the database view.
  • Storybook stories of complex domain objects (Order etc.) can demonstrate lineage end-to-end.
  • Future: an effect-schema-to-mermaid exporter for lineage graphs; OpenLineage adapter for analytics pipelines.
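For the notion-react case, the computed-vs-editable decision could be read off the annotations instead of being configured separately. A minimal sketch under a hypothetical policy that is not part of the proposal: only SourceOfTruth fields are editable, unannotated fields default to editable, and an Authority annotation, when present, restricts edits to its listed writers.

```typescript
type LineageTag =
  | 'SourceOfTruth' | 'Derived' | 'Projection' | 'Cache' | 'Mirror' | 'External' | 'Computed'
type Authority = { writers: ReadonlyArray<string>; readers?: ReadonlyArray<string> }

// Hypothetical policy, not part of the proposal:
// - any lineage other than SourceOfTruth is read-only in the UI;
// - a field with no Lineage annotation is treated as editable;
// - if an Authority annotation exists, only its listed writers may edit.
const isEditable = (
  lineageTag: LineageTag | undefined,
  authority: Authority | undefined,
  actor: string,
): boolean => {
  if (lineageTag !== undefined && lineageTag !== 'SourceOfTruth') return false
  if (authority === undefined) return true
  return authority.writers.includes(actor)
}
```

Whether unannotated fields should default to editable or read-only is itself a design decision; the sketch picks editable only so that adding annotations never makes a previously editable view more permissive.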

Acceptance

  • Decision documented: which of (A)/(B)/(C)/(D) — or hybrid — we adopt, with rationale
  • Lineage (and any companion) annotation types defined as Effect Schemas themselves (self-describing)
  • Helper API: Schema.X.pipe(Lineage.derivedFrom(['a','b'])) style ergonomic constructors
  • react-inspector reads the annotations and renders at minimum: a per-field badge + tooltip
  • Documented annotation vocabulary in the package README (or a dedicated LINEAGE.md)
  • Storybook story demonstrating each variant on a realistic domain model
  • Round-trip test: annotations survive Schema.encodedSchema / Schema.typeSchema operations as expected
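A sketch of the pipe-style helper from the acceptance list, assuming the helper only needs a schema's annotations() method (Effect schemas expose .annotations() and .pipe(), and annotating returns a new schema rather than mutating). The Annotatable interface is a simplified structural stand-in: real Effect schemas have their own return types for annotations(), so a production version would need to follow Effect's typing. This Lineage.derivedFrom always records a Pure derivation over sibling Field refs, which is an assumption.

```typescript
const LineageAnnotationId = Symbol.for('effect/annotation/Lineage')

type FieldRef = { readonly _tag: 'Field'; readonly path: string }
type DerivedLineage = {
  readonly _tag: 'Derived'
  readonly from: ReadonlyArray<FieldRef>
  readonly how: { readonly _tag: 'Pure' }
}

type AnnotationRecord = { readonly [key: symbol]: unknown }

// Structural stand-in for an annotatable schema: annotations() returns a new
// value of the same type with the given annotations applied.
interface Annotatable<Self> {
  annotations(annotations: AnnotationRecord): Self
}

// Hypothetical constructor in the Lineage.derivedFrom(['a', 'b']) style:
// returns a function of the schema, so it can be passed to .pipe().
// Assumption: every listed path is a sibling Field and the derivation is Pure.
const derivedFrom =
  (paths: ReadonlyArray<string>) =>
  <S extends Annotatable<S>>(self: S): S =>
    self.annotations({
      [LineageAnnotationId]: {
        _tag: 'Derived',
        from: paths.map((path): FieldRef => ({ _tag: 'Field', path })),
        how: { _tag: 'Pure' },
      } satisfies DerivedLineage,
    })

const Lineage = { derivedFrom }
```

Intended usage, in the acceptance criterion's style: Schema.Number.pipe(Lineage.derivedFrom(['subtotal', 'tax', 'shipping'])). The round-trip test in the last criterion would then check that this annotation is still readable after Schema.encodedSchema / Schema.typeSchema.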

Out of scope (for the first cut)

  • Cross-process / cross-service lineage tracking (this is schema-author-declared lineage, not runtime-observed)
  • OpenLineage / PROV serialization adapters — leave as future work behind a stable annotation shape
  • Persisting lineage decisions in a separate registry — schema annotations are the source of truth

Open questions

  1. Should LineageRef.Field.path use the same path syntax we already use for schema context resolution in SchemaContext.tsx? (Strong vote: yes.)
  2. Do we want lineage to compose with Schema.transform automatically (auto-mark transformed fields as Derived)? Risks being magical; counter-argument is it captures the most common case for free.
  3. Authority and freshness — separate annotations or sub-fields of Lineage? Leaning separate so a SourceOfTruth field can still have a Freshness annotation.
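On open question 1: the path syntax used in SchemaContext.tsx is not shown here, so this sketch assumes dot-separated segments relative to the root value, with numeric segments indexing into arrays. It resolves a LineageRef.Field.path against a concrete value, which is what the inspector needs to display the inputs of a Derived field. resolveFieldPath is a hypothetical name.

```typescript
// Assumption: dot-separated path relative to the root value; numeric segments
// work for arrays because array indices are ordinary property keys.
// The real syntax should match SchemaContext.tsx (open question 1).
const resolveFieldPath = (root: unknown, path: string): unknown => {
  if (path === '') return root
  let current: unknown = root
  for (const segment of path.split('.')) {
    // Stop at the first non-object: the path does not exist in this value.
    if (current === null || typeof current !== 'object') return undefined
    current = (current as Record<string, unknown>)[segment]
  }
  return current
}
```

If the SchemaContext.tsx syntax differs (for example, bracketed indices), only the segment-splitting step would change.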
Posted on behalf of @schickling

agent_name: 🕯️ cl2-flame
agent_session_id: e8faef70-586e-46dd-ad04-bc9f4748b80c
agent_tool: Claude Code
agent_tool_version: 2.1.139
agent_runtime: Claude Code 2.1.139
agent_model: claude-opus-4-7
worktree: effect-utils/schickling/2026-05-25-react-inspector
machine: mbp2025
tooling_profile: dotfiles@4e6515b

Metadata


Labels

  • origin:agent (Filed or primarily produced by an AI agent)
  • state:triage (Needs classification or owner decision)
  • type:epic (Large tracking issue with child tasks)
  • type:feature (New user-visible or system capability)
