Skip to content

feat/iii-observability#1678

Open
guibeira wants to merge 41 commits into
mainfrom
feat/new-libs
Open

feat/iii-observability#1678
guibeira wants to merge 41 commits into
mainfrom
feat/new-libs

Conversation

@guibeira
Copy link
Copy Markdown
Contributor

@guibeira guibeira commented May 22, 2026

What

Why

Notes

Summary by CodeRabbit

Release Notes

  • New Features

    • Published observability packages (@iii-dev/observability for Node, iii-observability for Python and Rust) containing shared telemetry, logging, and tracing utilities.
    • Added HTTP request instrumentation with automatic OpenTelemetry span creation and W3C trace context propagation.
    • Improved connection failure handling with graceful export queue draining.
  • Refactor

    • Consolidated observability functionality into dedicated packages for better modularity and cross-SDK consistency.

Review Change Stack

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
iii-website Ready Ready Preview, Comment May 23, 2026 2:43pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

This PR extracts observability, telemetry, and logging functionality into three shared language-specific packages (@iii-dev/observability for Node, iii-observability for Python and Rust) and refactors all SDK packages to depend on and re-export from these new packages. The change consolidates duplicate telemetry infrastructure across SDKs into reusable, independently-publishable modules, with CI/CD publishing updated to include the new packages.

Changes

Observability Package Extraction & Multi-Language SDK Refactoring

Layer / File(s) Summary
Workspace and package configuration setup
pnpm-workspace.yaml, Cargo.toml, sdk/packages/*/observability/{README,package.json,pyproject.toml,Cargo.toml,tsconfig,vitest.config,tsdown.config}
Workspace files register new observability package paths and dependencies; build/test configurations are added for each observability package.
Node observability HTTP instrumentation and types
sdk/packages/node/observability/src/{http-instrumentation,types,utils,logger,index}.ts, sdk/packages/node/observability/src/**/*.test.ts
executeTracedRequest wraps fetch calls in OTel CLIENT spans, records HTTP request/response attributes and errors, and injects W3C trace context; OtelLogEvent type, safeStringify utility, and public re-exports are defined.
Node observability connection and exporter failure handling
sdk/packages/node/observability/src/telemetry-system/{connection,*-exporter,index}.ts, sdk/packages/node/observability/src/telemetry-system/*.test.ts
SharedEngineConnection.onFailed event allows exporters to drain queued exports when the connection permanently fails; log/metric/span exporters implement failure handlers and queue management; flushOtel() performs forceFlush without shutdown.
Python observability HTTP and telemetry helpers
sdk/packages/python/observability/src/iii_observability/{http_instrumentation,reconnection,telemetry,__init__}.py, sdk/packages/python/observability/tests/**
execute_traced_request wraps httpx requests in OTel CLIENT spans with request/response header and status recording; ReconnectionConfig dataclass for WebSocket backoff; flush_otel() async helper; public API re-exports.
Rust observability crate and tests
sdk/packages/rust/observability/{Cargo.toml,README.md,src/lib.rs,src/logger.rs,src/telemetry/**,tests/**}
New iii-observability crate defines VERSION constant, re-exports telemetry/logger/context helpers and utilities from internal modules; documentation examples and test coverage for configuration defaults and span context outside active spans.
Node SDK refactoring to use observability package
sdk/packages/node/iii/{package.json,src/{iii,index,types,utils,tsdown.config},vitest.config,tests/**}
Node iii SDK dependency list restructured to pull OpenTelemetry runtime via @iii-dev/observability workspace package; imports in source and test files redirected; test fixtures and mocks updated to use @iii-dev/observability entrypoints; Vitest alias configured for local observability resolution.
Python SDK refactoring to use observability package
sdk/packages/python/iii/{pyproject.toml,src/iii/{__init__,iii,iii_constants}.py,tests/**}, sdk/packages/python/iii-example/**
Python iii SDK and example dependencies updated to pull iii-observability; imports across __init__, iii.py, and iii_constants.py redirected; test files and monkeypatching updated to use iii_observability module paths; example state and stream client methods switched to async trigger calls.
Rust SDK and dependent crates refactoring
sdk/packages/rust/iii/{Cargo.toml,src/lib.rs,src/iii.rs,README,tests/**}, sdk/packages/rust/iii-example/**, console/packages/console-rust/**, crates/iii-worker/**, engine/**
Rust iii SDK dependency on OpenTelemetry replaced with iii-observability workspace crate; internal telemetry and logger modules removed; InitOptions.otel field type changed to iii_observability::OtelConfig; public re-exports consolidated from iii_observability; test and example imports updated across crates.
CI/CD release workflow and build pipeline updates
.github/workflows/{ci,release-iii}.yml
Release workflow adds three new publishing jobs for observability packages (npm, Python, Rust); SDK npm/py/rust jobs updated to depend on corresponding observability jobs; notify-complete job includes observability results in dependency list and status notification; CI adds precheck build step for @iii-dev/observability.

Sequence Diagram(s)

No sequence diagrams generated; the changes are primarily refactoring and re-export consolidation without introducing new multi-step workflows at the SDK-user level.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • iii-hq/iii#1228: Updates Node SDK worker observability/gauges tests to import from @iii-dev/observability, directly overlapping test infrastructure changes in this PR.
  • iii-hq/iii#1223: Rust/Python init refactoring that updates InitOptions and OTEL configuration handling, affecting the same types being refactored here.
  • iii-hq/iii#1676: Refactors Node SDK telemetry export surface (exports.test.ts) by removing/reshaping OpenTelemetry helper re-exports, paralleling the public API consolidation in this PR.

Suggested reviewers

  • ytallo

Poem

🐰 Whiskers twitching with joy,
Three observability hops—one for each tongue,
Node, Python, and Rust now share the same burrow,
No telemetry rabbit holes to tunnel alone! 🔍✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/new-libs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/packages/rust/observability/src/telemetry/mod.rs (1)

402-404: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent OTEL_REF_COUNT underflow in shutdown_otel when init_otel is a disabled no-op.

  • OTEL_REF_COUNT starts at AtomicUsize::new(0), and init_otel returns false for enabled: Some(false) before any fetch_add.
  • shutdown_otel unconditionally does OTEL_REF_COUNT.fetch_sub(1, ...), so calling it after a disabled init underflows and corrupts later ref-counting (the test init_otel_with_disabled_config_is_noop covers this).
Suggested fix
 pub async fn shutdown_otel() {
-    let prev = OTEL_REF_COUNT.fetch_sub(1, Ordering::SeqCst);
+    let prev = match OTEL_REF_COUNT.fetch_update(
+        Ordering::SeqCst,
+        Ordering::SeqCst,
+        |count| count.checked_sub(1),
+    ) {
+        Ok(prev) => prev,
+        Err(_) => {
+            tracing::debug!("OpenTelemetry not initialized, skipping shutdown");
+            return;
+        }
+    };
     if prev > 1 {
         tracing::debug!(
             remaining = prev - 1,
             "OpenTelemetry still in use, skipping shutdown"
         );
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/rust/observability/src/telemetry/mod.rs` around lines 402 - 404,
shutdown_otel currently unconditionally calls OTEL_REF_COUNT.fetch_sub(1, ...)
which underflows if init_otel was a disabled no-op; change shutdown_otel to only
decrement the atomic when its current value is > 0 (use
AtomicUsize::fetch_update or a load + compare_exchange loop with SeqCst to avoid
races) so OTEL_REF_COUNT never goes below zero; update the code in shutdown_otel
(referencing OTEL_REF_COUNT and the shutdown_otel function) to perform a
conditional decrement and only run the shutdown logic when the decrement
succeeds.
🧹 Nitpick comments (2)
sdk/packages/node/observability/package.json (1)

8-8: 💤 Low value

Consider adding a repository field for better discoverability.

Adding a repository field helps with package discoverability and tooling integration. Consider adding:

"repository": {
  "type": "git",
  "url": "https://github.com/iii-hq/iii.git",
  "directory": "sdk/packages/node/observability"
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/node/observability/package.json` at line 8, The package.json for
the observability package is missing a repository field which reduces
discoverability; add a "repository" entry to package.json with type "git", the
repo URL "https://github.com/iii-hq/iii.git" and the directory
"sdk/packages/node/observability" so tooling and users can locate the source for
this package.
sdk/packages/python/observability/src/iii_observability/reconnection.py (1)

20-24: ⚡ Quick win

Validate reconnection parameter bounds at construction.

ReconnectionConfig accepts invalid runtime values today (negative delays/retries other than -1, multiplier < 1, jitter outside [0,1]), which can produce pathological retry behavior.

Suggested fix
 `@dataclass`
 class ReconnectionConfig:
@@
     initial_delay_ms: int = 1000
     max_delay_ms: int = 30000
     backoff_multiplier: float = 2.0
     jitter_factor: float = 0.3
     max_retries: int = -1
+
+    def __post_init__(self) -> None:
+        if self.initial_delay_ms < 0:
+            raise ValueError("initial_delay_ms must be >= 0")
+        if self.max_delay_ms < self.initial_delay_ms:
+            raise ValueError("max_delay_ms must be >= initial_delay_ms")
+        if self.backoff_multiplier < 1.0:
+            raise ValueError("backoff_multiplier must be >= 1.0")
+        if not 0.0 <= self.jitter_factor <= 1.0:
+            raise ValueError("jitter_factor must be between 0 and 1")
+        if self.max_retries < -1:
+            raise ValueError("max_retries must be -1 or >= 0")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/python/observability/src/iii_observability/reconnection.py`
around lines 20 - 24, ReconnectionConfig currently allows invalid values; add
validation in the ReconnectionConfig constructor or __post_init__ that checks
the fields and raises ValueError on bad input: ensure initial_delay_ms and
max_delay_ms are integers >= 0 (and optionally require max_delay_ms >=
initial_delay_ms), backoff_multiplier is a float >= 1.0, jitter_factor is a
float in [0.0, 1.0], and max_retries is an int >= -1 (only -1 allowed as
sentinel for unlimited); reference the ReconnectionConfig class and its fields
initial_delay_ms, max_delay_ms, backoff_multiplier, jitter_factor, and
max_retries when adding these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sdk/packages/node/iii/tests/logger-runtime.test.ts`:
- Around line 51-57: The test registers and looks up a function using the dotted
ID "demo.handler"; update both uses to the engine-standard separator by
replacing "demo.handler" with "demo::handler" in the sdk.registerFunction(...)
call and the sdk.functions.get(...) lookup so the test uses the `::` function ID
format (referencing sdk.registerFunction and sdk.functions.get).

In `@sdk/packages/node/observability/src/http-instrumentation.ts`:
- Around line 22-25: The URL constructor is called on rawUrl without a base
which throws for relative request URLs (e.g. "/orders"); update the parsing
logic around rawUrl/URL so relative URLs are handled safely: detect if rawUrl is
a relative path (no scheme/host or starts with "/") and pass a safe base (for
example a harmless origin like "http://localhost") or wrap new URL(rawUrl) in a
try/catch and fall back to new URL(rawUrl, 'http://localhost'), then compute
method and name from that safe URL; adjust the same guard where URL is used
later (lines 31-40) so executeTracedRequest won’t throw on relative inputs.

In `@sdk/packages/node/observability/src/telemetry-system/log-exporter.ts`:
- Around line 55-57: The pendingExports queue in log-exporter.ts (where it
pushes in the block containing if (state !== 'connected') {
this.pendingExports.push({ logs, callback: resultCallback }) }) must be bounded
to prevent unbounded memory growth; add a MAX_PENDING_EXPORTS constant and
enforce it before push (e.g., if queue.length >= MAX_PENDING_EXPORTS then remove
oldest entry or reject the incoming export), and when dropping an export invoke
the associated callback with an error/Failed result and emit a warning via the
exporter logger so callers are notified; ensure this logic is applied around the
pendingExports push in the same method and that unit tests cover both enqueue
and drop behavior.

In `@sdk/packages/node/observability/src/types.ts`:
- Around line 4-7: The two fields timestamp_unix_nano and
observed_timestamp_unix_nano must not be plain JS numbers; change their type in
sdk/packages/node/observability/src/types.ts to a lossless representation
(prefer a string-based UNIX-ns field, e.g. string) and update
producers/serialization and consumers to treat them as lossless values
(producers emit the nanoseconds as strings, and consumers/console parse to
BigInt before math). Specifically, replace the number type for
timestamp_unix_nano and observed_timestamp_unix_nano, update any serialization
code that writes these fields to emit stringified nanoseconds, and update
consumers (places doing arithmetic like "... / 1_000_000" or
"a.timestamp_unix_nano - b.timestamp_unix_nano") to parse the value to BigInt
and perform integer arithmetic (use BigInt division by 1_000_000n to get
milliseconds) so ordering and timestamps remain exact.

In `@sdk/packages/node/observability/src/utils.ts`:
- Around line 5-10: The safeStringify function can return undefined when
JSON.stringify serializes a top-level undefined/function/symbol, violating its
string return type; update safeStringify (in both implementations) to capture
the result of JSON.stringify(...) and always return a string by coalescing
undefined to a canonical fallback (e.g. '"[unserializable]"' or
'[unserializable]')—use the existing replacer/WeakSet logic as-is, but change
the final return to something like: const out = JSON.stringify(...); return out
?? '"[unserializable]"'.

In `@sdk/packages/python/observability/src/iii_observability/telemetry.py`:
- Around line 330-345: flush_otel currently calls blocking
provider.force_flush() methods directly inside an async function; offload these
synchronous calls to a background thread so the event loop isn't blocked. Change
flush_otel to await asynchronous thread execution for each provider call (e.g.,
using asyncio.to_thread or loop.run_in_executor) when invoking
tracer_provider.force_flush(), _meter_provider.force_flush(), and
_log_provider.force_flush(); preserve the existing hasattr checks
(tracer_provider in trace.get_tracer_provider(), _meter_provider, _log_provider)
and await the threaded calls in sequence (or concurrently with asyncio.gather)
and surface any exceptions as needed.

---

Outside diff comments:
In `@sdk/packages/rust/observability/src/telemetry/mod.rs`:
- Around line 402-404: shutdown_otel currently unconditionally calls
OTEL_REF_COUNT.fetch_sub(1, ...) which underflows if init_otel was a disabled
no-op; change shutdown_otel to only decrement the atomic when its current value
is > 0 (use AtomicUsize::fetch_update or a load + compare_exchange loop with
SeqCst to avoid races) so OTEL_REF_COUNT never goes below zero; update the code
in shutdown_otel (referencing OTEL_REF_COUNT and the shutdown_otel function) to
perform a conditional decrement and only run the shutdown logic when the
decrement succeeds.

---

Nitpick comments:
In `@sdk/packages/node/observability/package.json`:
- Line 8: The package.json for the observability package is missing a repository
field which reduces discoverability; add a "repository" entry to package.json
with type "git", the repo URL "https://github.com/iii-hq/iii.git" and the
directory "sdk/packages/node/observability" so tooling and users can locate the
source for this package.

In `@sdk/packages/python/observability/src/iii_observability/reconnection.py`:
- Around line 20-24: ReconnectionConfig currently allows invalid values; add
validation in the ReconnectionConfig constructor or __post_init__ that checks
the fields and raises ValueError on bad input: ensure initial_delay_ms and
max_delay_ms are integers >= 0 (and optionally require max_delay_ms >=
initial_delay_ms), backoff_multiplier is a float >= 1.0, jitter_factor is a
float in [0.0, 1.0], and max_retries is an int >= -1 (only -1 allowed as
sentinel for unlimited); reference the ReconnectionConfig class and its fields
initial_delay_ms, max_delay_ms, backoff_multiplier, jitter_factor, and
max_retries when adding these checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 56c6ae23-90e0-4c35-82ed-1b5e2392f6a6

📥 Commits

Reviewing files that changed from the base of the PR and between 5893e1a and 148f013.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • sdk/packages/python/observability/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (101)
  • .github/workflows/release-iii.yml
  • Cargo.toml
  • pnpm-workspace.yaml
  • sdk/packages/node/iii/package.json
  • sdk/packages/node/iii/src/iii.ts
  • sdk/packages/node/iii/src/index.ts
  • sdk/packages/node/iii/src/telemetry.ts
  • sdk/packages/node/iii/src/types.ts
  • sdk/packages/node/iii/src/utils.ts
  • sdk/packages/node/iii/tests/baggage-span-processor.test.ts
  • sdk/packages/node/iii/tests/bridge-utils.ts
  • sdk/packages/node/iii/tests/exports.test.ts
  • sdk/packages/node/iii/tests/fetch-instrumentation.test.ts
  • sdk/packages/node/iii/tests/logger-runtime.test.ts
  • sdk/packages/node/iii/tests/logger.test.ts
  • sdk/packages/node/iii/tests/otel-defaults.test.ts
  • sdk/packages/node/iii/tests/otel-worker-gauges.test.ts
  • sdk/packages/node/iii/tests/payload.test.ts
  • sdk/packages/node/iii/tests/span-ops.test.ts
  • sdk/packages/node/iii/tests/worker-metrics.test.ts
  • sdk/packages/node/iii/vitest.config.ts
  • sdk/packages/node/observability/README.md
  • sdk/packages/node/observability/package.json
  • sdk/packages/node/observability/src/http-instrumentation.test.ts
  • sdk/packages/node/observability/src/http-instrumentation.ts
  • sdk/packages/node/observability/src/index.ts
  • sdk/packages/node/observability/src/logger.test.ts
  • sdk/packages/node/observability/src/logger.ts
  • sdk/packages/node/observability/src/otel-worker-gauges.ts
  • sdk/packages/node/observability/src/telemetry-system/baggage-span-processor.ts
  • sdk/packages/node/observability/src/telemetry-system/connection.ts
  • sdk/packages/node/observability/src/telemetry-system/context.test.ts
  • sdk/packages/node/observability/src/telemetry-system/context.ts
  • sdk/packages/node/observability/src/telemetry-system/exporters.ts
  • sdk/packages/node/observability/src/telemetry-system/fetch-instrumentation.ts
  • sdk/packages/node/observability/src/telemetry-system/index.ts
  • sdk/packages/node/observability/src/telemetry-system/log-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/metrics-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/payload.ts
  • sdk/packages/node/observability/src/telemetry-system/span-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/span-ops.ts
  • sdk/packages/node/observability/src/telemetry-system/types.ts
  • sdk/packages/node/observability/src/telemetry-system/utils.ts
  • sdk/packages/node/observability/src/types.ts
  • sdk/packages/node/observability/src/utils.ts
  • sdk/packages/node/observability/src/worker-metrics.ts
  • sdk/packages/node/observability/tsconfig.json
  • sdk/packages/node/observability/tsdown.config.ts
  • sdk/packages/node/observability/vitest.config.ts
  • sdk/packages/python/iii/pyproject.toml
  • sdk/packages/python/iii/src/iii/__init__.py
  • sdk/packages/python/iii/src/iii/iii.py
  • sdk/packages/python/iii/src/iii/iii_constants.py
  • sdk/packages/python/iii/tests/test_baggage_span_processor.py
  • sdk/packages/python/iii/tests/test_logger_otel.py
  • sdk/packages/python/iii/tests/test_payload.py
  • sdk/packages/python/iii/tests/test_span_ops.py
  • sdk/packages/python/iii/tests/test_telemetry.py
  • sdk/packages/python/iii/tests/test_telemetry_types.py
  • sdk/packages/python/observability/README.md
  • sdk/packages/python/observability/pyproject.toml
  • sdk/packages/python/observability/src/iii_observability/__init__.py
  • sdk/packages/python/observability/src/iii_observability/baggage_span_processor.py
  • sdk/packages/python/observability/src/iii_observability/http_instrumentation.py
  • sdk/packages/python/observability/src/iii_observability/logger.py
  • sdk/packages/python/observability/src/iii_observability/payload.py
  • sdk/packages/python/observability/src/iii_observability/reconnection.py
  • sdk/packages/python/observability/src/iii_observability/span_ops.py
  • sdk/packages/python/observability/src/iii_observability/telemetry.py
  • sdk/packages/python/observability/src/iii_observability/telemetry_exporters.py
  • sdk/packages/python/observability/src/iii_observability/telemetry_types.py
  • sdk/packages/python/observability/tests/__init__.py
  • sdk/packages/python/observability/tests/test_http_instrumentation.py
  • sdk/packages/python/observability/tests/test_telemetry.py
  • sdk/packages/rust/iii/Cargo.toml
  • sdk/packages/rust/iii/src/iii.rs
  • sdk/packages/rust/iii/src/lib.rs
  • sdk/packages/rust/iii/tests/payload_capture.rs
  • sdk/packages/rust/observability/Cargo.toml
  • sdk/packages/rust/observability/README.md
  • sdk/packages/rust/observability/src/lib.rs
  • sdk/packages/rust/observability/src/logger.rs
  • sdk/packages/rust/observability/src/telemetry/baggage_span_processor.rs
  • sdk/packages/rust/observability/src/telemetry/connection.rs
  • sdk/packages/rust/observability/src/telemetry/context.rs
  • sdk/packages/rust/observability/src/telemetry/http_instrumentation.rs
  • sdk/packages/rust/observability/src/telemetry/json_serializer.rs
  • sdk/packages/rust/observability/src/telemetry/log_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/metrics_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/mod.rs
  • sdk/packages/rust/observability/src/telemetry/otel_worker_gauges.rs
  • sdk/packages/rust/observability/src/telemetry/payload.rs
  • sdk/packages/rust/observability/src/telemetry/span_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/span_ops.rs
  • sdk/packages/rust/observability/src/telemetry/types.rs
  • sdk/packages/rust/observability/src/telemetry/worker_metrics.rs
  • sdk/packages/rust/observability/tests/context_test.rs
  • sdk/packages/rust/observability/tests/init_test.rs
  • sdk/packages/rust/observability/tests/logger_test.rs
  • sdk/packages/rust/observability/tests/payload_test.rs
  • sdk/packages/rust/observability/tests/types_test.rs
💤 Files with no reviewable changes (2)
  • sdk/packages/node/iii/src/types.ts
  • sdk/packages/node/iii/src/utils.ts

Comment thread sdk/packages/node/iii/tests/logger-runtime.test.ts Outdated
Comment thread sdk/packages/node/observability/src/http-instrumentation.ts Outdated
Comment thread sdk/packages/node/observability/src/types.ts
Comment thread sdk/packages/node/observability/src/utils.ts
Comment thread sdk/packages/python/observability/src/iii_observability/telemetry.py Outdated
guibeira added 19 commits May 22, 2026 19:26
Copies payload.rs, span_ops.rs, and baggage_span_processor.rs verbatim
from iii-sdk into the iii-observability crate and re-exports all public
symbols at the crate root. Adds integration test payload_test.rs (TDD).
…d_request

Move all OTel lifecycle functions (init_otel, shutdown_otel, flush_otel,
with_span, run_in_span) and their required internal modules (connection,
json_serializer, log_exporter, metrics_exporter, span_exporter,
otel_worker_gauges, worker_metrics, http_instrumentation) from iii-sdk
into iii-observability. Add public re-exports in lib.rs and add
futures-util, tokio-tungstenite, and uuid as direct dependencies.
…compat

Delete iii-sdk's own telemetry/ and logger.rs; add iii-observability as
a workspace dep instead. All OTel symbols are re-exported from the new
crate at the iii-sdk crate root for back-compat. iii.rs internal usages
of opentelemetry:: are routed through iii_observability::opentelemetry.
Test file payload_capture.rs updated to use the flat re-export path.
…rker-gauges

Copy telemetry-system folder, worker-metrics.ts, and otel-worker-gauges.ts
from iii-sdk into @iii-dev/observability. Add OtelLogEvent to local types.ts,
safeStringify to local utils.ts, and ws dependency. All imports self-contained.
…back-compat

- Replace all direct OTel and local telemetry sources in iii-sdk with a
  dependency on @iii-dev/observability (workspace:*)
- Delete logger.ts, otel-worker-gauges.ts, worker-metrics.ts, telemetry-system/
- Rewrite telemetry.ts as a shim: export * from '@iii-dev/observability'
- Update index.ts to re-export Logger from @iii-dev/observability
- Fix @iii-dev/observability to expose getMeter, getTracer, SpanKind, and
  resolve Logger naming conflict with @opentelemetry/api-logs Logger type
  (rename internal OtelApiLogger to avoid tsdown DTS bundling collision)
…er source move

Update all test files that referenced moved telemetry-system source paths
to import from @iii-dev/observability instead. Add @opentelemetry/sdk-trace-base
and @opentelemetry/instrumentation as devDeps for tests that need them directly.
Also expose getMeter/getTracer/SpanKind on @iii-dev/observability public surface
since iii.ts needs them, and fix OtelApiLogger rename to resolve tsdown DTS
naming collision between the Logger class and @opentelemetry/api-logs Logger type.
…test isolation

- Add resolve.alias in vitest.config.ts to load observability from
  TypeScript source instead of compiled dist, fixing OTel context
  propagation (CJS/ESM dual-instantiation of @opentelemetry/api)
- Update otel-defaults.test.ts and fetch-instrumentation.test.ts:
  update internal mock paths to point to observability source,
  add BatchSpanProcessor to sdk-trace-base mock, add forceFlush to
  all provider/processor mocks so shutdownOtel completes cleanly,
  add BatchLogRecordProcessor to sdk-logs mock
- Add AsyncLocalStorageContextManager to span-ops.test.ts so
  context.with() propagates the active span for in-process assertions
- Add @opentelemetry/context-async-hooks as devDependency
guibeira added 20 commits May 22, 2026 19:26
…ed state

Exporters were queueing spans/metrics/logs when connection state was not
'connected' and waited indefinitely for an onConnected callback. Once
SharedEngineConnection entered the terminal 'failed' state (max retries
exhausted), queued exports were never drained and never invoked their
resultCallback. BatchSpanProcessor.forceFlush() then timed out after
30s during shutdown, causing test hookTimeout failures.

Add onFailed hook on SharedEngineConnection and wire all three exporters
to drain their pending queues with ExportResultCode.FAILED when the
connection gives up. Also fast-fail in doExport() when state is already
'failed' at export time.

Add otel.reconnectionConfig (maxRetries: 3) to bridgeIII in test utils so
the bridge OTel socket actually reaches the 'failed' terminal state
instead of retrying forever. Bump hookTimeout to 60000 defensively.
…python package

Copies telemetry.py, telemetry_types.py, telemetry_exporters.py, baggage_span_processor.py,
span_ops.py, payload.py, and logger.py verbatim from iii SDK into iii_observability.
Extracts ReconnectionConfig into a new reconnection.py module and wires up the
public surface in __init__.py with tests confirming the exports work.
…k-compat

Switch iii-sdk to depend on iii-observability instead of bundling its own
copies of telemetry/logger/payload/span_ops/baggage modules. Delete the
moved files, update all internal imports, and re-export all symbols from
__init__.py so the public API surface is unchanged.
The __init__.py was missing exports for the following telemetry functions that are
re-exported from iii_observability:
  - current_span_id
  - current_trace_id
  - execute_traced_request
  - extract_baggage / extract_traceparent
  - flush_otel / init_otel / shutdown_otel
  - inject_baggage / inject_traceparent
  - with_span

Add these to __all__ so they are properly exported as part of the public API.
tsc resolves @iii-dev/observability via its dist/index.d.ts; the workspace
package must be built before the iii-sdk type check or tsc errors with
TS2307 'Cannot find module'.
iii-sdk pinned iii-observability==0.13.0.dev1, which is not yet on PyPI,
so uv sync failed in CI with 'package not found in registry'. Adds
[tool.uv.sources] mapping iii-observability to ../observability so uv
resolves it from the workspace path, and regenerates uv.lock.

Also adds a py.typed marker to iii-observability so mypy recognizes its
type information from iii-sdk.
… move

Tests still imported from iii.telemetry, iii.telemetry_types, iii.logger,
and iii.telemetry_exporters, which were deleted when the observability
surface moved to iii_observability. Updates the imports so pytest can
collect the test modules.
Tests patched iii.telemetry.init_otel and iii.telemetry.attach_event_loop
via monkeypatch.setattr() string targets. Production code now imports
those from iii_observability.telemetry, so the patches landed on the
wrong module and the real init_otel ran against a fake websocket,
producing ImportError or wrong behavior. Updates the string targets to
iii_observability.telemetry.* so patches actually take effect.
- executeTracedRequest: guard new URL() against relative inputs so the
  helper does not throw before reaching fetch.
- EngineLogExporter: cap pendingExports at 100 to prevent unbounded
  memory growth during long outages; dropped entries fail with a clear
  error and a warning is logged.
- safeStringify: coalesce JSON.stringify's possible undefined return
  (top-level undefined/function/symbol) so the documented string
  contract is honored.
- flush_otel: offload provider force_flush() to asyncio.to_thread and
  await via gather so the synchronous OTel calls do not block the event
  loop.
- logger-runtime test: switch function id to the engine-standard ::
  separator (demo::handler).

Skipped: OtelLogEvent.timestamp_unix_nano number-precision finding.
The fields are part of a cross-SDK protocol contract that ships
verbatim from the prior surface; changing them needs aligned producer
and consumer updates outside the scope of this PR.
… to iii-observability

Removes the shim re-exports that kept iii-sdk's old public surface
pointing at the moved observability symbols, and retargets every
in-tree caller to import directly from the new package.

Node:
- Delete sdk/packages/node/iii/src/telemetry.ts (back-compat shim).
- Remove the './telemetry' subpath export from package.json and the
  matching entry from tsdown.config.ts.
- Drop the two telemetry-surface tests from tests/exports.test.ts;
  they targeted the deleted shim entry.

Rust:
- Remove the 'pub use iii_observability::{...}' back-compat block
  from sdk/packages/rust/iii/src/lib.rs.
- Add iii-observability as a direct dependency on engine,
  console-rust, iii-worker, and iii-example, then switch their
  imports from iii_sdk::{Logger, OtelConfig, BaggageSpanProcessor,
  execute_traced_request, ...} to iii_observability::{...}.
- Update sdk/packages/rust/iii/tests/{init_api,payload_capture,
  span_ops_api}.rs to import from iii_observability.
- Update the Rust README and how-to doc snippet to use
  iii_observability::Logger.

Python:
- Drop __version__ from iii_observability/__init__.py; the release
  pipeline reads the version from pyproject.toml, so keeping a hand-
  maintained constant in __init__.py only invites drift.
- Hoist 'from opentelemetry import trace' to the top of
  observability/src/iii_observability/telemetry.py and remove the
  redundant in-function import inside flush_otel.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

skill-check — docs

5 verified, 283 skipped.

Layer Result
structure
vale
ai

Three for three. Nicely done.

…ger from iii_observability

State and StreamClient called the sync iii.trigger() from inside async
handlers. trigger() routes through _run_on_loop which refuses calls
made from the event loop thread, so every async handler that read or
wrote streams/state raised:

  RuntimeError: Cannot call sync SDK methods from the event loop
  thread. Use async handler methods instead.

Switch the nine .trigger() call sites in state.py and stream.py to
their await-able trigger_async() counterpart.

Also depend on iii-observability directly (path source via
[tool.uv.sources]) and import Logger from it in hooks.py instead of
re-exporting through iii. Aligns the example with the new package
boundary.
@guibeira guibeira marked this pull request as ready for review May 23, 2026 14:54
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
.github/workflows/release-iii.yml (1)

396-403: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

sdk-npm should pre-build workspace observability dependency inside _npm.yml.

needs: observability-npm only sequences jobs; it does not build local workspace deps in the sdk-npm job. Since iii-sdk already needed this prerequisite in CI (Line 548 in .github/workflows/ci.yml), release publish should pass pre_build_filter too to keep the build deterministic.

💡 Suggested workflow fix
   sdk-npm:
     name: SDK Node Publish
     needs: [setup, create-iii-release, observability-npm]
     if: ${{ !failure() && !cancelled() }}
     uses: ./.github/workflows/_npm.yml
     with:
       package_path: sdk/packages/node/iii
       npm_tag: ${{ needs.setup.outputs.npm_tag }}
+      pre_build_filter: "`@iii-dev/observability`"
       build_filter: iii-sdk
       dry_run: ${{ needs.setup.outputs.dry_run == 'true' }}
       slack_thread_ts: ${{ needs.setup.outputs.slack_ts }}
       slack_label: SDK Node (npm)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-iii.yml around lines 396 - 403, The sdk-npm
release job currently sequences observability-npm but does not pre-build that
workspace dependency; update the job that uses ./.github/workflows/_npm.yml (the
block with package_path: sdk/packages/node/iii and build_filter: iii-sdk) to
pass the pre_build_filter input (e.g., pre_build_filter: observability-npm) so
the _npm.yml template will pre-build the observability workspace dependency
before publishing.
docs/0-11-0/how-to/observability-and-logs.mdx (1)

63-71: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix remaining Rust Logger import in docs/0-11-0/how-to/observability-and-logs.mdx

Line 168 still imports Logger from iii_sdk, leaving conflicting guidance on the same page (the earlier snippet uses iii_observability::Logger).

📌 Proposed doc fix
-use iii_sdk::{register_worker, InitOptions, Logger, RegisterFunction};
+use iii_observability::Logger;
+use iii_sdk::{register_worker, InitOptions, RegisterFunction};

(There are additional stale iii_sdk Logger imports in other docs files as well, but this change is required for this page’s consistency.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/0-11-0/how-to/observability-and-logs.mdx` around lines 63 - 71, Update
the Rust import so the docs consistently reference iii_observability::Logger
rather than the stale iii_sdk import: find the import statement that brings
Logger from iii_sdk (the conflicting example on the page) and change it to use
iii_observability::Logger, and scan the same docs for other occurrences of
iii_sdk::Logger to make the same replacement so all Rust snippets reference the
iii_observability Logger type consistently.
engine/src/workers/observability/otel.rs (1)

782-810: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Both exporter path is missing BaggageSpanProcessor, causing mode-specific behavior drift.

ExporterType::Both success path builds the provider without .with_span_processor(...), unlike the other branches. This drops baggage span processing when OTLP is healthy.

Suggested fix
                 Ok(otlp_exporter) => {
                     init_sdk_span_forwarder(&config.endpoint);

                     // Create tee exporter that sends to both
                     let tee_exporter = TeeSpanExporter::new(
                         otlp_exporter,
                         memory_storage,
                         config.service_name.clone(),
                     );

                     SdkTracerProvider::builder()
+                        .with_span_processor(iii_observability::BaggageSpanProcessor::default())
                         .with_batch_exporter(tee_exporter)
                         .with_sampler(sampler)
                         .with_id_generator(RandomIdGenerator::default())
                         .with_resource(resource)
                         .build()
                 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/src/workers/observability/otel.rs` around lines 782 - 810, In the
ExporterType::Both branch you build the SdkTracerProvider without attaching the
BaggageSpanProcessor, causing baggage processing to be skipped when OTLP is
available; update the ExporterType::Both success path (the block creating
TeeSpanExporter and calling SdkTracerProvider::builder()) to also register the
BaggageSpanProcessor (the same span processor used in the other branches) via
.with_span_processor(...) so baggage is processed consistently (keep
init_sdk_span_forwarder, TeeSpanExporter, sampler, id generator, and resource
unchanged).
🧹 Nitpick comments (2)
sdk/packages/node/observability/src/http-instrumentation.test.ts (1)

5-15: ⚡ Quick win

Add a regression test for Request inputs with existing headers.

Current coverage only checks string URL input; it won’t catch header loss when executeTracedRequest receives a Request.

✅ Suggested additional test
 describe('executeTracedRequest', () => {
   it('forwards to fetch and returns the response', async () => {
@@
   })
+
+  it('preserves existing Request headers when injecting trace headers', async () => {
+    const originalFetch = globalThis.fetch
+    const fetchMock = vi.fn().mockResolvedValue(new Response('ok', { status: 200 }))
+    globalThis.fetch = fetchMock as typeof fetch
+    try {
+      const req = new Request('https://example.com/api', {
+        headers: { authorization: 'Bearer token' },
+      })
+      await executeTracedRequest(req)
+      const [, init] = fetchMock.mock.calls[0]
+      const headers = new Headers(init?.headers)
+      expect(headers.get('authorization')).toBe('Bearer token')
+    } finally {
+      globalThis.fetch = originalFetch
+    }
+  })
 })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/node/observability/src/http-instrumentation.test.ts` around
lines 5 - 15, Add a regression test in http-instrumentation.test.ts that calls
executeTracedRequest with a Request instance that already contains headers
(e.g., an Authorization or custom header) to ensure headers are preserved; mock
globalThis.fetch (like existing tests) to capture the incoming Request argument
and assert the header is present on the request passed to fetch, and that the
response from fetch is forwarded unchanged (status/body). Reference
executeTracedRequest and the existing test pattern for restoring
globalThis.fetch when implementing the new test.
sdk/packages/python/observability/tests/test_http_instrumentation.py (1)

6-22: ⚡ Quick win

Tighten assertions to validate instrumentation behavior (not only HTTP responses).

Line 17’s test name implies span error-status behavior, but it only checks res.status_code. Please either assert instrumentation side-effects (e.g., traceparent header presence via mocked requests) or rename tests to reflect response-only validation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/python/observability/tests/test_http_instrumentation.py` around
lines 6 - 22, The test test_execute_traced_request_records_error_status
currently only asserts HTTP response code; update it to validate instrumentation
side-effects from execute_traced_request by asserting that the outgoing request
carried tracing headers (e.g., presence of "traceparent" or "tracestate" on the
built request or in httpx_mock.request_history) or by inspecting recorded spans
from the test tracer/exporter to ensure the span has error status for 500
responses; modify the test to either (a) check
httpx_mock.request_history[0].headers contains "traceparent" and that a
span/emitted event reflects error status, or (b) rename the test to indicate it
only validates HTTP response if you prefer not to assert instrumentation. Ensure
you reference execute_traced_request and
test_execute_traced_request_records_error_status when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sdk/packages/node/observability/src/telemetry-system/connection.ts`:
- Around line 117-120: The loop that invokes each registered failure handler
(the this.onFailedCallbacks iteration) must isolate exceptions so one bad
callback doesn't stop the rest; wrap each cb() call in a try/catch and on error
log the exception (use an existing logger if available, e.g. this.logger.error,
or console.error) and continue to the next callback. Apply the same change to
the other place where you iterate this.onFailedCallbacks (the second draining
loop) so both notification sites catch and log individual callback errors and
keep draining exporters.

---

Outside diff comments:
In @.github/workflows/release-iii.yml:
- Around line 396-403: The sdk-npm release job currently sequences
observability-npm but does not pre-build that workspace dependency; update the
job that uses ./.github/workflows/_npm.yml (the block with package_path:
sdk/packages/node/iii and build_filter: iii-sdk) to pass the pre_build_filter
input (e.g., pre_build_filter: observability-npm) so the _npm.yml template will
pre-build the observability workspace dependency before publishing.

In `@docs/0-11-0/how-to/observability-and-logs.mdx`:
- Around line 63-71: Update the Rust import so the docs consistently reference
iii_observability::Logger rather than the stale iii_sdk import: find the import
statement that brings Logger from iii_sdk (the conflicting example on the page)
and change it to use iii_observability::Logger, and scan the same docs for other
occurrences of iii_sdk::Logger to make the same replacement so all Rust snippets
reference the iii_observability Logger type consistently.

In `@engine/src/workers/observability/otel.rs`:
- Around line 782-810: In the ExporterType::Both branch you build the
SdkTracerProvider without attaching the BaggageSpanProcessor, causing baggage
processing to be skipped when OTLP is available; update the ExporterType::Both
success path (the block creating TeeSpanExporter and calling
SdkTracerProvider::builder()) to also register the BaggageSpanProcessor (the
same span processor used in the other branches) via .with_span_processor(...) so
baggage is processed consistently (keep init_sdk_span_forwarder,
TeeSpanExporter, sampler, id generator, and resource unchanged).

---

Nitpick comments:
In `@sdk/packages/node/observability/src/http-instrumentation.test.ts`:
- Around line 5-15: Add a regression test in http-instrumentation.test.ts that
calls executeTracedRequest with a Request instance that already contains headers
(e.g., an Authorization or custom header) to ensure headers are preserved; mock
globalThis.fetch (like existing tests) to capture the incoming Request argument
and assert the header is present on the request passed to fetch, and that the
response from fetch is forwarded unchanged (status/body). Reference
executeTracedRequest and the existing test pattern for restoring
globalThis.fetch when implementing the new test.

In `@sdk/packages/python/observability/tests/test_http_instrumentation.py`:
- Around line 6-22: The test test_execute_traced_request_records_error_status
currently only asserts HTTP response code; update it to validate instrumentation
side-effects from execute_traced_request by asserting that the outgoing request
carried tracing headers (e.g., presence of "traceparent" or "tracestate" on the
built request or in httpx_mock.request_history) or by inspecting recorded spans
from the test tracer/exporter to ensure the span has error status for 500
responses; modify the test to either (a) check
httpx_mock.request_history[0].headers contains "traceparent" and that a
span/emitted event reflects error status, or (b) rename the test to indicate it
only validates HTTP response if you prefer not to assert instrumentation. Ensure
you reference execute_traced_request and
test_execute_traced_request_records_error_status when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 62d3dca0-3951-4a06-bd44-d02127c4e78a

📥 Commits

Reviewing files that changed from the base of the PR and between 148f013 and 3a44d9b.

⛔ Files ignored due to path filters (5)
  • Cargo.lock is excluded by !**/*.lock
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • sdk/packages/python/iii-example/uv.lock is excluded by !**/*.lock
  • sdk/packages/python/iii/uv.lock is excluded by !**/*.lock
  • sdk/packages/python/observability/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (132)
  • .github/workflows/ci.yml
  • .github/workflows/release-iii.yml
  • Cargo.toml
  • console/packages/console-rust/Cargo.toml
  • console/packages/console-rust/src/main.rs
  • crates/iii-worker/Cargo.toml
  • crates/iii-worker/src/cli/worker_manager_daemon.rs
  • crates/iii-worker/src/sandbox_daemon/mod.rs
  • docs/0-11-0/how-to/observability-and-logs.mdx
  • engine/Cargo.toml
  • engine/src/workers/observability/otel.rs
  • pnpm-workspace.yaml
  • sdk/packages/node/iii/package.json
  • sdk/packages/node/iii/src/iii.ts
  • sdk/packages/node/iii/src/index.ts
  • sdk/packages/node/iii/src/types.ts
  • sdk/packages/node/iii/src/utils.ts
  • sdk/packages/node/iii/tests/baggage-span-processor.test.ts
  • sdk/packages/node/iii/tests/bridge-utils.ts
  • sdk/packages/node/iii/tests/exports.test.ts
  • sdk/packages/node/iii/tests/fetch-instrumentation.test.ts
  • sdk/packages/node/iii/tests/logger-runtime.test.ts
  • sdk/packages/node/iii/tests/logger.test.ts
  • sdk/packages/node/iii/tests/otel-defaults.test.ts
  • sdk/packages/node/iii/tests/otel-worker-gauges.test.ts
  • sdk/packages/node/iii/tests/payload.test.ts
  • sdk/packages/node/iii/tests/span-ops.test.ts
  • sdk/packages/node/iii/tests/worker-metrics.test.ts
  • sdk/packages/node/iii/tsdown.config.ts
  • sdk/packages/node/iii/vitest.config.ts
  • sdk/packages/node/observability/README.md
  • sdk/packages/node/observability/package.json
  • sdk/packages/node/observability/src/http-instrumentation.test.ts
  • sdk/packages/node/observability/src/http-instrumentation.ts
  • sdk/packages/node/observability/src/index.ts
  • sdk/packages/node/observability/src/logger.test.ts
  • sdk/packages/node/observability/src/logger.ts
  • sdk/packages/node/observability/src/otel-worker-gauges.ts
  • sdk/packages/node/observability/src/telemetry-system/baggage-span-processor.ts
  • sdk/packages/node/observability/src/telemetry-system/connection.ts
  • sdk/packages/node/observability/src/telemetry-system/context.test.ts
  • sdk/packages/node/observability/src/telemetry-system/context.ts
  • sdk/packages/node/observability/src/telemetry-system/exporters.ts
  • sdk/packages/node/observability/src/telemetry-system/fetch-instrumentation.ts
  • sdk/packages/node/observability/src/telemetry-system/index.ts
  • sdk/packages/node/observability/src/telemetry-system/log-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/metrics-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/payload.ts
  • sdk/packages/node/observability/src/telemetry-system/span-exporter.ts
  • sdk/packages/node/observability/src/telemetry-system/span-ops.ts
  • sdk/packages/node/observability/src/telemetry-system/types.ts
  • sdk/packages/node/observability/src/telemetry-system/utils.ts
  • sdk/packages/node/observability/src/types.ts
  • sdk/packages/node/observability/src/utils.ts
  • sdk/packages/node/observability/src/worker-metrics.ts
  • sdk/packages/node/observability/tsconfig.json
  • sdk/packages/node/observability/tsdown.config.ts
  • sdk/packages/node/observability/vitest.config.ts
  • sdk/packages/python/iii-example/pyproject.toml
  • sdk/packages/python/iii-example/src/hooks.py
  • sdk/packages/python/iii-example/src/state.py
  • sdk/packages/python/iii-example/src/stream.py
  • sdk/packages/python/iii/pyproject.toml
  • sdk/packages/python/iii/src/iii/__init__.py
  • sdk/packages/python/iii/src/iii/iii.py
  • sdk/packages/python/iii/src/iii/iii_constants.py
  • sdk/packages/python/iii/tests/test_baggage_span_processor.py
  • sdk/packages/python/iii/tests/test_context_propagation.py
  • sdk/packages/python/iii/tests/test_hold_process.py
  • sdk/packages/python/iii/tests/test_http_external_functions_integration.py
  • sdk/packages/python/iii/tests/test_iii_registration_dedup.py
  • sdk/packages/python/iii/tests/test_init_api.py
  • sdk/packages/python/iii/tests/test_logger_function_ids.py
  • sdk/packages/python/iii/tests/test_logger_otel.py
  • sdk/packages/python/iii/tests/test_payload.py
  • sdk/packages/python/iii/tests/test_register_function_args.py
  • sdk/packages/python/iii/tests/test_span_ops.py
  • sdk/packages/python/iii/tests/test_sync_api.py
  • sdk/packages/python/iii/tests/test_telemetry.py
  • sdk/packages/python/iii/tests/test_telemetry_exporters.py
  • sdk/packages/python/iii/tests/test_telemetry_types.py
  • sdk/packages/python/iii/tests/test_trace_helpers.py
  • sdk/packages/python/observability/README.md
  • sdk/packages/python/observability/pyproject.toml
  • sdk/packages/python/observability/src/iii_observability/__init__.py
  • sdk/packages/python/observability/src/iii_observability/baggage_span_processor.py
  • sdk/packages/python/observability/src/iii_observability/http_instrumentation.py
  • sdk/packages/python/observability/src/iii_observability/logger.py
  • sdk/packages/python/observability/src/iii_observability/payload.py
  • sdk/packages/python/observability/src/iii_observability/py.typed
  • sdk/packages/python/observability/src/iii_observability/reconnection.py
  • sdk/packages/python/observability/src/iii_observability/span_ops.py
  • sdk/packages/python/observability/src/iii_observability/telemetry.py
  • sdk/packages/python/observability/src/iii_observability/telemetry_exporters.py
  • sdk/packages/python/observability/src/iii_observability/telemetry_types.py
  • sdk/packages/python/observability/tests/__init__.py
  • sdk/packages/python/observability/tests/test_http_instrumentation.py
  • sdk/packages/python/observability/tests/test_telemetry.py
  • sdk/packages/rust/iii-example/Cargo.toml
  • sdk/packages/rust/iii-example/src/http_example.rs
  • sdk/packages/rust/iii-example/src/logger_example.rs
  • sdk/packages/rust/iii-example/src/main.rs
  • sdk/packages/rust/iii/Cargo.toml
  • sdk/packages/rust/iii/README.md
  • sdk/packages/rust/iii/src/iii.rs
  • sdk/packages/rust/iii/src/lib.rs
  • sdk/packages/rust/iii/tests/init_api.rs
  • sdk/packages/rust/iii/tests/payload_capture.rs
  • sdk/packages/rust/iii/tests/span_ops_api.rs
  • sdk/packages/rust/observability/Cargo.toml
  • sdk/packages/rust/observability/README.md
  • sdk/packages/rust/observability/src/lib.rs
  • sdk/packages/rust/observability/src/logger.rs
  • sdk/packages/rust/observability/src/telemetry/baggage_span_processor.rs
  • sdk/packages/rust/observability/src/telemetry/connection.rs
  • sdk/packages/rust/observability/src/telemetry/context.rs
  • sdk/packages/rust/observability/src/telemetry/http_instrumentation.rs
  • sdk/packages/rust/observability/src/telemetry/json_serializer.rs
  • sdk/packages/rust/observability/src/telemetry/log_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/metrics_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/mod.rs
  • sdk/packages/rust/observability/src/telemetry/otel_worker_gauges.rs
  • sdk/packages/rust/observability/src/telemetry/payload.rs
  • sdk/packages/rust/observability/src/telemetry/span_exporter.rs
  • sdk/packages/rust/observability/src/telemetry/span_ops.rs
  • sdk/packages/rust/observability/src/telemetry/types.rs
  • sdk/packages/rust/observability/src/telemetry/worker_metrics.rs
  • sdk/packages/rust/observability/tests/context_test.rs
  • sdk/packages/rust/observability/tests/init_test.rs
  • sdk/packages/rust/observability/tests/logger_test.rs
  • sdk/packages/rust/observability/tests/payload_test.rs
  • sdk/packages/rust/observability/tests/types_test.rs
💤 Files with no reviewable changes (3)
  • sdk/packages/node/iii/src/utils.ts
  • sdk/packages/node/iii/src/types.ts
  • sdk/packages/node/iii/tests/exports.test.ts
✅ Files skipped from review due to trivial changes (28)
  • sdk/packages/rust/observability/README.md
  • crates/iii-worker/Cargo.toml
  • sdk/packages/python/iii/tests/test_trace_helpers.py
  • sdk/packages/rust/iii-example/src/logger_example.rs
  • console/packages/console-rust/Cargo.toml
  • engine/Cargo.toml
  • sdk/packages/node/iii/tests/worker-metrics.test.ts
  • sdk/packages/rust/observability/src/telemetry/http_instrumentation.rs
  • sdk/packages/rust/iii/tests/init_api.rs
  • sdk/packages/rust/iii-example/Cargo.toml
  • sdk/packages/node/observability/vitest.config.ts
  • sdk/packages/node/observability/package.json
  • sdk/packages/python/iii/tests/test_logger_function_ids.py
  • sdk/packages/rust/observability/src/telemetry/mod.rs
  • sdk/packages/rust/iii/README.md
  • sdk/packages/python/observability/README.md
  • sdk/packages/python/iii-example/src/hooks.py
  • sdk/packages/rust/observability/Cargo.toml
  • sdk/packages/node/iii/tests/payload.test.ts
  • sdk/packages/python/iii/tests/test_telemetry_exporters.py
  • sdk/packages/python/iii/tests/test_baggage_span_processor.py
  • sdk/packages/python/iii/tests/test_init_api.py
  • sdk/packages/python/observability/pyproject.toml
  • sdk/packages/python/iii/tests/test_payload.py
  • sdk/packages/node/observability/README.md
  • sdk/packages/rust/observability/src/logger.rs
  • sdk/packages/python/iii/src/iii/iii.py
  • sdk/packages/python/iii/tests/test_telemetry_types.py

Comment on lines +117 to +120
// Notify failed callbacks so exporters can drain their own queues
for (const cb of this.onFailedCallbacks) {
cb()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate onFailed callback exceptions so draining stays robust.

If one onFailed callback throws, remaining callbacks may never run, which can leave exporter queues undrained.

💡 Suggested fix
       // Notify failed callbacks so exporters can drain their own queues
       for (const cb of this.onFailedCallbacks) {
-        cb()
+        try {
+          cb()
+        } catch (err) {
+          console.error('[OTel] onFailed callback threw:', err)
+        }
       }
@@
   onFailed(callback: () => void): void {
     this.onFailedCallbacks.push(callback)
     if (this.state === 'failed') {
-      callback()
+      try {
+        callback()
+      } catch (err) {
+        console.error('[OTel] onFailed callback threw:', err)
+      }
     }
   }

Also applies to: 180-184

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/node/observability/src/telemetry-system/connection.ts` around
lines 117 - 120, The loop that invokes each registered failure handler (the
this.onFailedCallbacks iteration) must isolate exceptions so one bad callback
doesn't stop the rest; wrap each cb() call in a try/catch and on error log the
exception (use an existing logger if available, e.g. this.logger.error, or
console.error) and continue to the next callback. Apply the same change to the
other place where you iterate this.onFailedCallbacks (the second draining loop)
so both notification sites catch and log individual callback errors and keep
draining exporters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant