feat/iii-observability#1678
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThis PR extracts observability, telemetry, and logging functionality into three shared language-specific packages ( ChangesObservability Package Extraction & Multi-Language SDK Refactoring
Sequence Diagram(s)No sequence diagrams generated; the changes are primarily refactoring and re-export consolidation without introducing new multi-step workflows at the SDK-user level. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/packages/rust/observability/src/telemetry/mod.rs (1)
402-404:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent
OTEL_REF_COUNTunderflow inshutdown_otelwheninit_otelis a disabled no-op.
OTEL_REF_COUNTstarts atAtomicUsize::new(0), andinit_otelreturnsfalseforenabled: Some(false)before anyfetch_add.shutdown_otelunconditionally doesOTEL_REF_COUNT.fetch_sub(1, ...), so calling it after a disabled init underflows and corrupts later ref-counting (the testinit_otel_with_disabled_config_is_noopcovers this).Suggested fix
pub async fn shutdown_otel() { - let prev = OTEL_REF_COUNT.fetch_sub(1, Ordering::SeqCst); + let prev = match OTEL_REF_COUNT.fetch_update( + Ordering::SeqCst, + Ordering::SeqCst, + |count| count.checked_sub(1), + ) { + Ok(prev) => prev, + Err(_) => { + tracing::debug!("OpenTelemetry not initialized, skipping shutdown"); + return; + } + }; if prev > 1 { tracing::debug!( remaining = prev - 1, "OpenTelemetry still in use, skipping shutdown" ); return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/rust/observability/src/telemetry/mod.rs` around lines 402 - 404, shutdown_otel currently unconditionally calls OTEL_REF_COUNT.fetch_sub(1, ...) which underflows if init_otel was a disabled no-op; change shutdown_otel to only decrement the atomic when its current value is > 0 (use AtomicUsize::fetch_update or a load + compare_exchange loop with SeqCst to avoid races) so OTEL_REF_COUNT never goes below zero; update the code in shutdown_otel (referencing OTEL_REF_COUNT and the shutdown_otel function) to perform a conditional decrement and only run the shutdown logic when the decrement succeeds.
🧹 Nitpick comments (2)
sdk/packages/node/observability/package.json (1)
8-8: 💤 Low valueConsider adding a
repositoryfield for better discoverability.Adding a repository field helps with package discoverability and tooling integration. Consider adding:
"repository": { "type": "git", "url": "https://github.com/iii-hq/iii.git", "directory": "sdk/packages/node/observability" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/node/observability/package.json` at line 8, The package.json for the observability package is missing a repository field which reduces discoverability; add a "repository" entry to package.json with type "git", the repo URL "https://github.com/iii-hq/iii.git" and the directory "sdk/packages/node/observability" so tooling and users can locate the source for this package.sdk/packages/python/observability/src/iii_observability/reconnection.py (1)
20-24: ⚡ Quick winValidate reconnection parameter bounds at construction.
ReconnectionConfigaccepts invalid runtime values today (negative delays/retries other than-1, multiplier< 1, jitter outside[0,1]), which can produce pathological retry behavior.Suggested fix
`@dataclass` class ReconnectionConfig: @@ initial_delay_ms: int = 1000 max_delay_ms: int = 30000 backoff_multiplier: float = 2.0 jitter_factor: float = 0.3 max_retries: int = -1 + + def __post_init__(self) -> None: + if self.initial_delay_ms < 0: + raise ValueError("initial_delay_ms must be >= 0") + if self.max_delay_ms < self.initial_delay_ms: + raise ValueError("max_delay_ms must be >= initial_delay_ms") + if self.backoff_multiplier < 1.0: + raise ValueError("backoff_multiplier must be >= 1.0") + if not 0.0 <= self.jitter_factor <= 1.0: + raise ValueError("jitter_factor must be between 0 and 1") + if self.max_retries < -1: + raise ValueError("max_retries must be -1 or >= 0")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/python/observability/src/iii_observability/reconnection.py` around lines 20 - 24, ReconnectionConfig currently allows invalid values; add validation in the ReconnectionConfig constructor or __post_init__ that checks the fields and raises ValueError on bad input: ensure initial_delay_ms and max_delay_ms are integers >= 0 (and optionally require max_delay_ms >= initial_delay_ms), backoff_multiplier is a float >= 1.0, jitter_factor is a float in [0.0, 1.0], and max_retries is an int >= -1 (only -1 allowed as sentinel for unlimited); reference the ReconnectionConfig class and its fields initial_delay_ms, max_delay_ms, backoff_multiplier, jitter_factor, and max_retries when adding these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@sdk/packages/node/iii/tests/logger-runtime.test.ts`:
- Around line 51-57: The test registers and looks up a function using the dotted
ID "demo.handler"; update both uses to the engine-standard separator by
replacing "demo.handler" with "demo::handler" in the sdk.registerFunction(...)
call and the sdk.functions.get(...) lookup so the test uses the `::` function ID
format (referencing sdk.registerFunction and sdk.functions.get).
In `@sdk/packages/node/observability/src/http-instrumentation.ts`:
- Around line 22-25: The URL constructor is called on rawUrl without a base
which throws for relative request URLs (e.g. "/orders"); update the parsing
logic around rawUrl/URL so relative URLs are handled safely: detect if rawUrl is
a relative path (no scheme/host or starts with "/") and pass a safe base (for
example a harmless origin like "http://localhost") or wrap new URL(rawUrl) in a
try/catch and fall back to new URL(rawUrl, 'http://localhost'), then compute
method and name from that safe URL; adjust the same guard where URL is used
later (lines 31-40) so executeTracedRequest won’t throw on relative inputs.
In `@sdk/packages/node/observability/src/telemetry-system/log-exporter.ts`:
- Around line 55-57: The pendingExports queue in log-exporter.ts (where it
pushes in the block containing if (state !== 'connected') {
this.pendingExports.push({ logs, callback: resultCallback }) }) must be bounded
to prevent unbounded memory growth; add a MAX_PENDING_EXPORTS constant and
enforce it before push (e.g., if queue.length >= MAX_PENDING_EXPORTS then remove
oldest entry or reject the incoming export), and when dropping an export invoke
the associated callback with an error/Failed result and emit a warning via the
exporter logger so callers are notified; ensure this logic is applied around the
pendingExports push in the same method and that unit tests cover both enqueue
and drop behavior.
In `@sdk/packages/node/observability/src/types.ts`:
- Around line 4-7: The two fields timestamp_unix_nano and
observed_timestamp_unix_nano must not be plain JS numbers; change their type in
sdk/packages/node/observability/src/types.ts to a lossless representation
(prefer a string-based UNIX-ns field, e.g. string) and update
producers/serialization and consumers to treat them as lossless values
(producers emit the nanoseconds as strings, and consumers/console parse to
BigInt before math). Specifically, replace the number type for
timestamp_unix_nano and observed_timestamp_unix_nano, update any serialization
code that writes these fields to emit stringified nanoseconds, and update
consumers (places doing arithmetic like "... / 1_000_000" or
"a.timestamp_unix_nano - b.timestamp_unix_nano") to parse the value to BigInt
and perform integer arithmetic (use BigInt division by 1_000_000n to get
milliseconds) so ordering and timestamps remain exact.
In `@sdk/packages/node/observability/src/utils.ts`:
- Around line 5-10: The safeStringify function can return undefined when
JSON.stringify serializes a top-level undefined/function/symbol, violating its
string return type; update safeStringify (in both implementations) to capture
the result of JSON.stringify(...) and always return a string by coalescing
undefined to a canonical fallback (e.g. '"[unserializable]"' or
'[unserializable]')—use the existing replacer/WeakSet logic as-is, but change
the final return to something like: const out = JSON.stringify(...); return out
?? '"[unserializable]"'.
In `@sdk/packages/python/observability/src/iii_observability/telemetry.py`:
- Around line 330-345: flush_otel currently calls blocking
provider.force_flush() methods directly inside an async function; offload these
synchronous calls to a background thread so the event loop isn't blocked. Change
flush_otel to await asynchronous thread execution for each provider call (e.g.,
using asyncio.to_thread or loop.run_in_executor) when invoking
tracer_provider.force_flush(), _meter_provider.force_flush(), and
_log_provider.force_flush(); preserve the existing hasattr checks
(tracer_provider in trace.get_tracer_provider(), _meter_provider, _log_provider)
and await the threaded calls in sequence (or concurrently with asyncio.gather)
and surface any exceptions as needed.
---
Outside diff comments:
In `@sdk/packages/rust/observability/src/telemetry/mod.rs`:
- Around line 402-404: shutdown_otel currently unconditionally calls
OTEL_REF_COUNT.fetch_sub(1, ...) which underflows if init_otel was a disabled
no-op; change shutdown_otel to only decrement the atomic when its current value
is > 0 (use AtomicUsize::fetch_update or a load + compare_exchange loop with
SeqCst to avoid races) so OTEL_REF_COUNT never goes below zero; update the code
in shutdown_otel (referencing OTEL_REF_COUNT and the shutdown_otel function) to
perform a conditional decrement and only run the shutdown logic when the
decrement succeeds.
---
Nitpick comments:
In `@sdk/packages/node/observability/package.json`:
- Line 8: The package.json for the observability package is missing a repository
field which reduces discoverability; add a "repository" entry to package.json
with type "git", the repo URL "https://github.com/iii-hq/iii.git" and the
directory "sdk/packages/node/observability" so tooling and users can locate the
source for this package.
In `@sdk/packages/python/observability/src/iii_observability/reconnection.py`:
- Around line 20-24: ReconnectionConfig currently allows invalid values; add
validation in the ReconnectionConfig constructor or __post_init__ that checks
the fields and raises ValueError on bad input: ensure initial_delay_ms and
max_delay_ms are integers >= 0 (and optionally require max_delay_ms >=
initial_delay_ms), backoff_multiplier is a float >= 1.0, jitter_factor is a
float in [0.0, 1.0], and max_retries is an int >= -1 (only -1 allowed as
sentinel for unlimited); reference the ReconnectionConfig class and its fields
initial_delay_ms, max_delay_ms, backoff_multiplier, jitter_factor, and
max_retries when adding these checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 56c6ae23-90e0-4c35-82ed-1b5e2392f6a6
⛔ Files ignored due to path filters (3)
Cargo.lockis excluded by!**/*.lockpnpm-lock.yamlis excluded by!**/pnpm-lock.yamlsdk/packages/python/observability/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (101)
.github/workflows/release-iii.ymlCargo.tomlpnpm-workspace.yamlsdk/packages/node/iii/package.jsonsdk/packages/node/iii/src/iii.tssdk/packages/node/iii/src/index.tssdk/packages/node/iii/src/telemetry.tssdk/packages/node/iii/src/types.tssdk/packages/node/iii/src/utils.tssdk/packages/node/iii/tests/baggage-span-processor.test.tssdk/packages/node/iii/tests/bridge-utils.tssdk/packages/node/iii/tests/exports.test.tssdk/packages/node/iii/tests/fetch-instrumentation.test.tssdk/packages/node/iii/tests/logger-runtime.test.tssdk/packages/node/iii/tests/logger.test.tssdk/packages/node/iii/tests/otel-defaults.test.tssdk/packages/node/iii/tests/otel-worker-gauges.test.tssdk/packages/node/iii/tests/payload.test.tssdk/packages/node/iii/tests/span-ops.test.tssdk/packages/node/iii/tests/worker-metrics.test.tssdk/packages/node/iii/vitest.config.tssdk/packages/node/observability/README.mdsdk/packages/node/observability/package.jsonsdk/packages/node/observability/src/http-instrumentation.test.tssdk/packages/node/observability/src/http-instrumentation.tssdk/packages/node/observability/src/index.tssdk/packages/node/observability/src/logger.test.tssdk/packages/node/observability/src/logger.tssdk/packages/node/observability/src/otel-worker-gauges.tssdk/packages/node/observability/src/telemetry-system/baggage-span-processor.tssdk/packages/node/observability/src/telemetry-system/connection.tssdk/packages/node/observability/src/telemetry-system/context.test.tssdk/packages/node/observability/src/telemetry-system/context.tssdk/packages/node/observability/src/telemetry-system/exporters.tssdk/packages/node/observability/src/telemetry-system/fetch-instrumentation.tssdk/packages/node/observability/src/telemetry-system/index.tssdk/packages/node/observability/src/telemetry-system/log-exporter.tssdk/packages/node/observability/src/telemetry-system/metrics-exporter.tssdk/packages/node/observability/src/telemetry-system/payload.tssdk/packages/node/observability/src/telemetry-system/span-exporter.tssdk/packages/node/observability/src/telemetry-system/span-ops.tssdk/packages/node/observability/src/telemetry-system/types.tssdk/packages/node/observability/src/telemetry-system/utils.tssdk/packages/node/observability/src/types.tssdk/packages/node/observability/src/utils.tssdk/packages/node/observability/src/worker-metrics.tssdk/packages/node/observability/tsconfig.jsonsdk/packages/node/observability/tsdown.config.tssdk/packages/node/observability/vitest.config.tssdk/packages/python/iii/pyproject.tomlsdk/packages/python/iii/src/iii/__init__.pysdk/packages/python/iii/src/iii/iii.pysdk/packages/python/iii/src/iii/iii_constants.pysdk/packages/python/iii/tests/test_baggage_span_processor.pysdk/packages/python/iii/tests/test_logger_otel.pysdk/packages/python/iii/tests/test_payload.pysdk/packages/python/iii/tests/test_span_ops.pysdk/packages/python/iii/tests/test_telemetry.pysdk/packages/python/iii/tests/test_telemetry_types.pysdk/packages/python/observability/README.mdsdk/packages/python/observability/pyproject.tomlsdk/packages/python/observability/src/iii_observability/__init__.pysdk/packages/python/observability/src/iii_observability/baggage_span_processor.pysdk/packages/python/observability/src/iii_observability/http_instrumentation.pysdk/packages/python/observability/src/iii_observability/logger.pysdk/packages/python/observability/src/iii_observability/payload.pysdk/packages/python/observability/src/iii_observability/reconnection.pysdk/packages/python/observability/src/iii_observability/span_ops.pysdk/packages/python/observability/src/iii_observability/telemetry.pysdk/packages/python/observability/src/iii_observability/telemetry_exporters.pysdk/packages/python/observability/src/iii_observability/telemetry_types.pysdk/packages/python/observability/tests/__init__.pysdk/packages/python/observability/tests/test_http_instrumentation.pysdk/packages/python/observability/tests/test_telemetry.pysdk/packages/rust/iii/Cargo.tomlsdk/packages/rust/iii/src/iii.rssdk/packages/rust/iii/src/lib.rssdk/packages/rust/iii/tests/payload_capture.rssdk/packages/rust/observability/Cargo.tomlsdk/packages/rust/observability/README.mdsdk/packages/rust/observability/src/lib.rssdk/packages/rust/observability/src/logger.rssdk/packages/rust/observability/src/telemetry/baggage_span_processor.rssdk/packages/rust/observability/src/telemetry/connection.rssdk/packages/rust/observability/src/telemetry/context.rssdk/packages/rust/observability/src/telemetry/http_instrumentation.rssdk/packages/rust/observability/src/telemetry/json_serializer.rssdk/packages/rust/observability/src/telemetry/log_exporter.rssdk/packages/rust/observability/src/telemetry/metrics_exporter.rssdk/packages/rust/observability/src/telemetry/mod.rssdk/packages/rust/observability/src/telemetry/otel_worker_gauges.rssdk/packages/rust/observability/src/telemetry/payload.rssdk/packages/rust/observability/src/telemetry/span_exporter.rssdk/packages/rust/observability/src/telemetry/span_ops.rssdk/packages/rust/observability/src/telemetry/types.rssdk/packages/rust/observability/src/telemetry/worker_metrics.rssdk/packages/rust/observability/tests/context_test.rssdk/packages/rust/observability/tests/init_test.rssdk/packages/rust/observability/tests/logger_test.rssdk/packages/rust/observability/tests/payload_test.rssdk/packages/rust/observability/tests/types_test.rs
💤 Files with no reviewable changes (2)
- sdk/packages/node/iii/src/types.ts
- sdk/packages/node/iii/src/utils.ts
Copies payload.rs, span_ops.rs, and baggage_span_processor.rs verbatim from iii-sdk into the iii-observability crate and re-exports all public symbols at the crate root. Adds integration test payload_test.rs (TDD).
…d_request Move all OTel lifecycle functions (init_otel, shutdown_otel, flush_otel, with_span, run_in_span) and their required internal modules (connection, json_serializer, log_exporter, metrics_exporter, span_exporter, otel_worker_gauges, worker_metrics, http_instrumentation) from iii-sdk into iii-observability. Add public re-exports in lib.rs and add futures-util, tokio-tungstenite, and uuid as direct dependencies.
…compat Delete iii-sdk's own telemetry/ and logger.rs; add iii-observability as a workspace dep instead. All OTel symbols are re-exported from the new crate at the iii-sdk crate root for back-compat. iii.rs internal usages of opentelemetry:: are routed through iii_observability::opentelemetry. Test file payload_capture.rs updated to use the flat re-export path.
…rker-gauges Copy telemetry-system folder, worker-metrics.ts, and otel-worker-gauges.ts from iii-sdk into @iii-dev/observability. Add OtelLogEvent to local types.ts, safeStringify to local utils.ts, and ws dependency. All imports self-contained.
…cy with other exporters
…back-compat - Replace all direct OTel and local telemetry sources in iii-sdk with a dependency on @iii-dev/observability (workspace:*) - Delete logger.ts, otel-worker-gauges.ts, worker-metrics.ts, telemetry-system/ - Rewrite telemetry.ts as a shim: export * from '@iii-dev/observability' - Update index.ts to re-export Logger from @iii-dev/observability - Fix @iii-dev/observability to expose getMeter, getTracer, SpanKind, and resolve Logger naming conflict with @opentelemetry/api-logs Logger type (rename internal OtelApiLogger to avoid tsdown DTS bundling collision)
…er source move Update all test files that referenced moved telemetry-system source paths to import from @iii-dev/observability instead. Add @opentelemetry/sdk-trace-base and @opentelemetry/instrumentation as devDeps for tests that need them directly. Also expose getMeter/getTracer/SpanKind on @iii-dev/observability public surface since iii.ts needs them, and fix OtelApiLogger rename to resolve tsdown DTS naming collision between the Logger class and @opentelemetry/api-logs Logger type.
…test isolation - Add resolve.alias in vitest.config.ts to load observability from TypeScript source instead of compiled dist, fixing OTel context propagation (CJS/ESM dual-instantiation of @opentelemetry/api) - Update otel-defaults.test.ts and fetch-instrumentation.test.ts: update internal mock paths to point to observability source, add BatchSpanProcessor to sdk-trace-base mock, add forceFlush to all provider/processor mocks so shutdownOtel completes cleanly, add BatchLogRecordProcessor to sdk-logs mock - Add AsyncLocalStorageContextManager to span-ops.test.ts so context.with() propagates the active span for in-process assertions - Add @opentelemetry/context-async-hooks as devDependency
…ed state Exporters were queueing spans/metrics/logs when connection state was not 'connected' and waited indefinitely for an onConnected callback. Once SharedEngineConnection entered the terminal 'failed' state (max retries exhausted), queued exports were never drained and never invoked their resultCallback. BatchSpanProcessor.forceFlush() then timed out after 30s during shutdown, causing test hookTimeout failures. Add onFailed hook on SharedEngineConnection and wire all three exporters to drain their pending queues with ExportResultCode.FAILED when the connection gives up. Also fast-fail in doExport() when state is already 'failed' at export time. Add otel.reconnectionConfig (maxRetries: 3) to bridgeIII in test utils so the bridge OTel socket actually reaches the 'failed' terminal state instead of retrying forever. Bump hookTimeout to 60000 defensively.
…python package Copies telemetry.py, telemetry_types.py, telemetry_exporters.py, baggage_span_processor.py, span_ops.py, payload.py, and logger.py verbatim from iii SDK into iii_observability. Extracts ReconnectionConfig into a new reconnection.py module and wires up the public surface in __init__.py with tests confirming the exports work.
…k-compat Switch iii-sdk to depend on iii-observability instead of bundling its own copies of telemetry/logger/payload/span_ops/baggage modules. Delete the moved files, update all internal imports, and re-export all symbols from __init__.py so the public API surface is unchanged.
The __init__.py was missing exports for the following telemetry functions that are re-exported from iii_observability: - current_span_id - current_trace_id - execute_traced_request - extract_baggage / extract_traceparent - flush_otel / init_otel / shutdown_otel - inject_baggage / inject_traceparent - with_span Add these to __all__ so they are properly exported as part of the public API.
…m iii_observability
tsc resolves @iii-dev/observability via its dist/index.d.ts; the workspace package must be built before the iii-sdk type check or tsc errors with TS2307 'Cannot find module'.
iii-sdk pinned iii-observability==0.13.0.dev1, which is not yet on PyPI, so uv sync failed in CI with 'package not found in registry'. Adds [tool.uv.sources] mapping iii-observability to ../observability so uv resolves it from the workspace path, and regenerates uv.lock. Also adds a py.typed marker to iii-observability so mypy recognizes its type information from iii-sdk.
… move Tests still imported from iii.telemetry, iii.telemetry_types, iii.logger, and iii.telemetry_exporters, which were deleted when the observability surface moved to iii_observability. Updates the imports so pytest can collect the test modules.
Tests patched iii.telemetry.init_otel and iii.telemetry.attach_event_loop via monkeypatch.setattr() string targets. Production code now imports those from iii_observability.telemetry, so the patches landed on the wrong module and the real init_otel ran against a fake websocket, producing ImportError or wrong behavior. Updates the string targets to iii_observability.telemetry.* so patches actually take effect.
- executeTracedRequest: guard new URL() against relative inputs so the helper does not throw before reaching fetch. - EngineLogExporter: cap pendingExports at 100 to prevent unbounded memory growth during long outages; dropped entries fail with a clear error and a warning is logged. - safeStringify: coalesce JSON.stringify's possible undefined return (top-level undefined/function/symbol) so the documented string contract is honored. - flush_otel: offload provider force_flush() to asyncio.to_thread and await via gather so the synchronous OTel calls do not block the event loop. - logger-runtime test: switch function id to the engine-standard :: separator (demo::handler). Skipped: OtelLogEvent.timestamp_unix_nano number-precision finding. The fields are part of a cross-SDK protocol contract that ships verbatim from the prior surface; changing them needs aligned producer and consumer updates outside the scope of this PR.
… to iii-observability
Removes the shim re-exports that kept iii-sdk's old public surface
pointing at the moved observability symbols, and retargets every
in-tree caller to import directly from the new package.
Node:
- Delete sdk/packages/node/iii/src/telemetry.ts (back-compat shim).
- Remove the './telemetry' subpath export from package.json and the
matching entry from tsdown.config.ts.
- Drop the two telemetry-surface tests from tests/exports.test.ts;
they targeted the deleted shim entry.
Rust:
- Remove the 'pub use iii_observability::{...}' back-compat block
from sdk/packages/rust/iii/src/lib.rs.
- Add iii-observability as a direct dependency on engine,
console-rust, iii-worker, and iii-example, then switch their
imports from iii_sdk::{Logger, OtelConfig, BaggageSpanProcessor,
execute_traced_request, ...} to iii_observability::{...}.
- Update sdk/packages/rust/iii/tests/{init_api,payload_capture,
span_ops_api}.rs to import from iii_observability.
- Update the Rust README and how-to doc snippet to use
iii_observability::Logger.
Python:
- Drop __version__ from iii_observability/__init__.py; the release
pipeline reads the version from pyproject.toml, so keeping a hand-
maintained constant in __init__.py only invites drift.
- Hoist 'from opentelemetry import trace' to the top of
observability/src/iii_observability/telemetry.py and remove the
redundant in-function import inside flush_otel.
skill-check — docs5 verified, 283 skipped.
Three for three. Nicely done. |
…ger from iii_observability State and StreamClient called the sync iii.trigger() from inside async handlers. trigger() routes through _run_on_loop which refuses calls made from the event loop thread, so every async handler that read or wrote streams/state raised: RuntimeError: Cannot call sync SDK methods from the event loop thread. Use async handler methods instead. Switch the nine .trigger() call sites in state.py and stream.py to their await-able trigger_async() counterpart. Also depend on iii-observability directly (path source via [tool.uv.sources]) and import Logger from it in hooks.py instead of re-exporting through iii. Aligns the example with the new package boundary.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.github/workflows/release-iii.yml (1)
396-403:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
sdk-npmshould pre-build workspace observability dependency inside_npm.yml.
needs: observability-npmonly sequences jobs; it does not build local workspace deps in thesdk-npmjob. Sinceiii-sdkalready needed this prerequisite in CI (Line 548 in.github/workflows/ci.yml), release publish should passpre_build_filtertoo to keep the build deterministic.💡 Suggested workflow fix
sdk-npm: name: SDK Node Publish needs: [setup, create-iii-release, observability-npm] if: ${{ !failure() && !cancelled() }} uses: ./.github/workflows/_npm.yml with: package_path: sdk/packages/node/iii npm_tag: ${{ needs.setup.outputs.npm_tag }} + pre_build_filter: "`@iii-dev/observability`" build_filter: iii-sdk dry_run: ${{ needs.setup.outputs.dry_run == 'true' }} slack_thread_ts: ${{ needs.setup.outputs.slack_ts }} slack_label: SDK Node (npm)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release-iii.yml around lines 396 - 403, The sdk-npm release job currently sequences observability-npm but does not pre-build that workspace dependency; update the job that uses ./.github/workflows/_npm.yml (the block with package_path: sdk/packages/node/iii and build_filter: iii-sdk) to pass the pre_build_filter input (e.g., pre_build_filter: observability-npm) so the _npm.yml template will pre-build the observability workspace dependency before publishing.docs/0-11-0/how-to/observability-and-logs.mdx (1)
63-71:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix remaining Rust
Loggerimport indocs/0-11-0/how-to/observability-and-logs.mdxLine 168 still imports
Loggerfromiii_sdk, leaving conflicting guidance on the same page (the earlier snippet usesiii_observability::Logger).📌 Proposed doc fix
-use iii_sdk::{register_worker, InitOptions, Logger, RegisterFunction}; +use iii_observability::Logger; +use iii_sdk::{register_worker, InitOptions, RegisterFunction};(There are additional stale
iii_sdkLoggerimports in other docs files as well, but this change is required for this page’s consistency.)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/0-11-0/how-to/observability-and-logs.mdx` around lines 63 - 71, Update the Rust import so the docs consistently reference iii_observability::Logger rather than the stale iii_sdk import: find the import statement that brings Logger from iii_sdk (the conflicting example on the page) and change it to use iii_observability::Logger, and scan the same docs for other occurrences of iii_sdk::Logger to make the same replacement so all Rust snippets reference the iii_observability Logger type consistently.engine/src/workers/observability/otel.rs (1)
782-810:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Bothexporter path is missingBaggageSpanProcessor, causing mode-specific behavior drift.
ExporterType::Bothsuccess path builds the provider without.with_span_processor(...), unlike the other branches. This drops baggage span processing when OTLP is healthy.Suggested fix
Ok(otlp_exporter) => { init_sdk_span_forwarder(&config.endpoint); // Create tee exporter that sends to both let tee_exporter = TeeSpanExporter::new( otlp_exporter, memory_storage, config.service_name.clone(), ); SdkTracerProvider::builder() + .with_span_processor(iii_observability::BaggageSpanProcessor::default()) .with_batch_exporter(tee_exporter) .with_sampler(sampler) .with_id_generator(RandomIdGenerator::default()) .with_resource(resource) .build() }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@engine/src/workers/observability/otel.rs` around lines 782 - 810, In the ExporterType::Both branch you build the SdkTracerProvider without attaching the BaggageSpanProcessor, causing baggage processing to be skipped when OTLP is available; update the ExporterType::Both success path (the block creating TeeSpanExporter and calling SdkTracerProvider::builder()) to also register the BaggageSpanProcessor (the same span processor used in the other branches) via .with_span_processor(...) so baggage is processed consistently (keep init_sdk_span_forwarder, TeeSpanExporter, sampler, id generator, and resource unchanged).
🧹 Nitpick comments (2)
sdk/packages/node/observability/src/http-instrumentation.test.ts (1)
5-15: ⚡ Quick winAdd a regression test for
Requestinputs with existing headers.Current coverage only checks string URL input; it won’t catch header loss when
executeTracedRequestreceives aRequest.✅ Suggested additional test
describe('executeTracedRequest', () => { it('forwards to fetch and returns the response', async () => { @@ }) + + it('preserves existing Request headers when injecting trace headers', async () => { + const originalFetch = globalThis.fetch + const fetchMock = vi.fn().mockResolvedValue(new Response('ok', { status: 200 })) + globalThis.fetch = fetchMock as typeof fetch + try { + const req = new Request('https://example.com/api', { + headers: { authorization: 'Bearer token' }, + }) + await executeTracedRequest(req) + const [, init] = fetchMock.mock.calls[0] + const headers = new Headers(init?.headers) + expect(headers.get('authorization')).toBe('Bearer token') + } finally { + globalThis.fetch = originalFetch + } + }) })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/node/observability/src/http-instrumentation.test.ts` around lines 5 - 15, Add a regression test in http-instrumentation.test.ts that calls executeTracedRequest with a Request instance that already contains headers (e.g., an Authorization or custom header) to ensure headers are preserved; mock globalThis.fetch (like existing tests) to capture the incoming Request argument and assert the header is present on the request passed to fetch, and that the response from fetch is forwarded unchanged (status/body). Reference executeTracedRequest and the existing test pattern for restoring globalThis.fetch when implementing the new test.sdk/packages/python/observability/tests/test_http_instrumentation.py (1)
6-22: ⚡ Quick winTighten assertions to validate instrumentation behavior (not only HTTP responses).
Line 17’s test name implies span error-status behavior, but it only checks
res.status_code. Please either assert instrumentation side-effects (e.g.,traceparentheader presence via mocked requests) or rename tests to reflect response-only validation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/python/observability/tests/test_http_instrumentation.py` around lines 6 - 22, The test test_execute_traced_request_records_error_status currently only asserts HTTP response code; update it to validate instrumentation side-effects from execute_traced_request by asserting that the outgoing request carried tracing headers (e.g., presence of "traceparent" or "tracestate" on the built request or in httpx_mock.request_history) or by inspecting recorded spans from the test tracer/exporter to ensure the span has error status for 500 responses; modify the test to either (a) check httpx_mock.request_history[0].headers contains "traceparent" and that a span/emitted event reflects error status, or (b) rename the test to indicate it only validates HTTP response if you prefer not to assert instrumentation. Ensure you reference execute_traced_request and test_execute_traced_request_records_error_status when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@sdk/packages/node/observability/src/telemetry-system/connection.ts`:
- Around line 117-120: The loop that invokes each registered failure handler
(the this.onFailedCallbacks iteration) must isolate exceptions so one bad
callback doesn't stop the rest; wrap each cb() call in a try/catch and on error
log the exception (use an existing logger if available, e.g. this.logger.error,
or console.error) and continue to the next callback. Apply the same change to
the other place where you iterate this.onFailedCallbacks (the second draining
loop) so both notification sites catch and log individual callback errors and
keep draining exporters.
---
Outside diff comments:
In @.github/workflows/release-iii.yml:
- Around line 396-403: The sdk-npm release job currently sequences
observability-npm but does not pre-build that workspace dependency; update the
job that uses ./.github/workflows/_npm.yml (the block with package_path:
sdk/packages/node/iii and build_filter: iii-sdk) to pass the pre_build_filter
input (e.g., pre_build_filter: observability-npm) so the _npm.yml template will
pre-build the observability workspace dependency before publishing.
In `@docs/0-11-0/how-to/observability-and-logs.mdx`:
- Around line 63-71: Update the Rust import so the docs consistently reference
iii_observability::Logger rather than the stale iii_sdk import: find the import
statement that brings Logger from iii_sdk (the conflicting example on the page)
and change it to use iii_observability::Logger, and scan the same docs for other
occurrences of iii_sdk::Logger to make the same replacement so all Rust snippets
reference the iii_observability Logger type consistently.
In `@engine/src/workers/observability/otel.rs`:
- Around line 782-810: In the ExporterType::Both branch you build the
SdkTracerProvider without attaching the BaggageSpanProcessor, causing baggage
processing to be skipped when OTLP is available; update the ExporterType::Both
success path (the block creating TeeSpanExporter and calling
SdkTracerProvider::builder()) to also register the BaggageSpanProcessor (the
same span processor used in the other branches) via .with_span_processor(...) so
baggage is processed consistently (keep init_sdk_span_forwarder,
TeeSpanExporter, sampler, id generator, and resource unchanged).
---
Nitpick comments:
In `@sdk/packages/node/observability/src/http-instrumentation.test.ts`:
- Around line 5-15: Add a regression test in http-instrumentation.test.ts that
calls executeTracedRequest with a Request instance that already contains headers
(e.g., an Authorization or custom header) to ensure headers are preserved; mock
globalThis.fetch (like existing tests) to capture the incoming Request argument
and assert the header is present on the request passed to fetch, and that the
response from fetch is forwarded unchanged (status/body). Reference
executeTracedRequest and the existing test pattern for restoring
globalThis.fetch when implementing the new test.
In `@sdk/packages/python/observability/tests/test_http_instrumentation.py`:
- Around line 6-22: The test test_execute_traced_request_records_error_status
currently only asserts HTTP response code; update it to validate instrumentation
side-effects from execute_traced_request by asserting that the outgoing request
carried tracing headers (e.g., presence of "traceparent" or "tracestate" on the
built request or in httpx_mock.request_history) or by inspecting recorded spans
from the test tracer/exporter to ensure the span has error status for 500
responses; modify the test to either (a) check
httpx_mock.request_history[0].headers contains "traceparent" and that a
span/emitted event reflects error status, or (b) rename the test to indicate it
only validates HTTP response if you prefer not to assert instrumentation. Ensure
you reference execute_traced_request and
test_execute_traced_request_records_error_status when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 62d3dca0-3951-4a06-bd44-d02127c4e78a
⛔ Files ignored due to path filters (5)
Cargo.lockis excluded by!**/*.lockpnpm-lock.yamlis excluded by!**/pnpm-lock.yamlsdk/packages/python/iii-example/uv.lockis excluded by!**/*.locksdk/packages/python/iii/uv.lockis excluded by!**/*.locksdk/packages/python/observability/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (132)
.github/workflows/ci.yml.github/workflows/release-iii.ymlCargo.tomlconsole/packages/console-rust/Cargo.tomlconsole/packages/console-rust/src/main.rscrates/iii-worker/Cargo.tomlcrates/iii-worker/src/cli/worker_manager_daemon.rscrates/iii-worker/src/sandbox_daemon/mod.rsdocs/0-11-0/how-to/observability-and-logs.mdxengine/Cargo.tomlengine/src/workers/observability/otel.rspnpm-workspace.yamlsdk/packages/node/iii/package.jsonsdk/packages/node/iii/src/iii.tssdk/packages/node/iii/src/index.tssdk/packages/node/iii/src/types.tssdk/packages/node/iii/src/utils.tssdk/packages/node/iii/tests/baggage-span-processor.test.tssdk/packages/node/iii/tests/bridge-utils.tssdk/packages/node/iii/tests/exports.test.tssdk/packages/node/iii/tests/fetch-instrumentation.test.tssdk/packages/node/iii/tests/logger-runtime.test.tssdk/packages/node/iii/tests/logger.test.tssdk/packages/node/iii/tests/otel-defaults.test.tssdk/packages/node/iii/tests/otel-worker-gauges.test.tssdk/packages/node/iii/tests/payload.test.tssdk/packages/node/iii/tests/span-ops.test.tssdk/packages/node/iii/tests/worker-metrics.test.tssdk/packages/node/iii/tsdown.config.tssdk/packages/node/iii/vitest.config.tssdk/packages/node/observability/README.mdsdk/packages/node/observability/package.jsonsdk/packages/node/observability/src/http-instrumentation.test.tssdk/packages/node/observability/src/http-instrumentation.tssdk/packages/node/observability/src/index.tssdk/packages/node/observability/src/logger.test.tssdk/packages/node/observability/src/logger.tssdk/packages/node/observability/src/otel-worker-gauges.tssdk/packages/node/observability/src/telemetry-system/baggage-span-processor.tssdk/packages/node/observability/src/telemetry-system/connection.tssdk/packages/node/observability/src/telemetry-system/context.test.tssdk/packages/node/observability/src/telemetry-system/context.tssdk/packages/node/observability/src/telemetry-system/exporters.tssdk/packages/node/observability/src/telemetry-system/fetch-instrumentation.tssdk/packages/node/observability/src/telemetry-system/index.tssdk/packages/node/observability/src/telemetry-system/log-exporter.tssdk/packages/node/observability/src/telemetry-system/metrics-exporter.tssdk/packages/node/observability/src/telemetry-system/payload.tssdk/packages/node/observability/src/telemetry-system/span-exporter.tssdk/packages/node/observability/src/telemetry-system/span-ops.tssdk/packages/node/observability/src/telemetry-system/types.tssdk/packages/node/observability/src/telemetry-system/utils.tssdk/packages/node/observability/src/types.tssdk/packages/node/observability/src/utils.tssdk/packages/node/observability/src/worker-metrics.tssdk/packages/node/observability/tsconfig.jsonsdk/packages/node/observability/tsdown.config.tssdk/packages/node/observability/vitest.config.tssdk/packages/python/iii-example/pyproject.tomlsdk/packages/python/iii-example/src/hooks.pysdk/packages/python/iii-example/src/state.pysdk/packages/python/iii-example/src/stream.pysdk/packages/python/iii/pyproject.tomlsdk/packages/python/iii/src/iii/__init__.pysdk/packages/python/iii/src/iii/iii.pysdk/packages/python/iii/src/iii/iii_constants.pysdk/packages/python/iii/tests/test_baggage_span_processor.pysdk/packages/python/iii/tests/test_context_propagation.pysdk/packages/python/iii/tests/test_hold_process.pysdk/packages/python/iii/tests/test_http_external_functions_integration.pysdk/packages/python/iii/tests/test_iii_registration_dedup.pysdk/packages/python/iii/tests/test_init_api.pysdk/packages/python/iii/tests/test_logger_function_ids.pysdk/packages/python/iii/tests/test_logger_otel.pysdk/packages/python/iii/tests/test_payload.pysdk/packages/python/iii/tests/test_register_function_args.pysdk/packages/python/iii/tests/test_span_ops.pysdk/packages/python/iii/tests/test_sync_api.pysdk/packages/python/iii/tests/test_telemetry.pysdk/packages/python/iii/tests/test_telemetry_exporters.pysdk/packages/python/iii/tests/test_telemetry_types.pysdk/packages/python/iii/tests/test_trace_helpers.pysdk/packages/python/observability/README.mdsdk/packages/python/observability/pyproject.tomlsdk/packages/python/observability/src/iii_observability/__init__.pysdk/packages/python/observability/src/iii_observability/baggage_span_processor.pysdk/packages/python/observability/src/iii_observability/http_instrumentation.pysdk/packages/python/observability/src/iii_observability/logger.pysdk/packages/python/observability/src/iii_observability/payload.pysdk/packages/python/observability/src/iii_observability/py.typedsdk/packages/python/observability/src/iii_observability/reconnection.pysdk/packages/python/observability/src/iii_observability/span_ops.pysdk/packages/python/observability/src/iii_observability/telemetry.pysdk/packages/python/observability/src/iii_observability/telemetry_exporters.pysdk/packages/python/observability/src/iii_observability/telemetry_types.pysdk/packages/python/observability/tests/__init__.pysdk/packages/python/observability/tests/test_http_instrumentation.pysdk/packages/python/observability/tests/test_telemetry.pysdk/packages/rust/iii-example/Cargo.tomlsdk/packages/rust/iii-example/src/http_example.rssdk/packages/rust/iii-example/src/logger_example.rssdk/packages/rust/iii-example/src/main.rssdk/packages/rust/iii/Cargo.tomlsdk/packages/rust/iii/README.mdsdk/packages/rust/iii/src/iii.rssdk/packages/rust/iii/src/lib.rssdk/packages/rust/iii/tests/init_api.rssdk/packages/rust/iii/tests/payload_capture.rssdk/packages/rust/iii/tests/span_ops_api.rssdk/packages/rust/observability/Cargo.tomlsdk/packages/rust/observability/README.mdsdk/packages/rust/observability/src/lib.rssdk/packages/rust/observability/src/logger.rssdk/packages/rust/observability/src/telemetry/baggage_span_processor.rssdk/packages/rust/observability/src/telemetry/connection.rssdk/packages/rust/observability/src/telemetry/context.rssdk/packages/rust/observability/src/telemetry/http_instrumentation.rssdk/packages/rust/observability/src/telemetry/json_serializer.rssdk/packages/rust/observability/src/telemetry/log_exporter.rssdk/packages/rust/observability/src/telemetry/metrics_exporter.rssdk/packages/rust/observability/src/telemetry/mod.rssdk/packages/rust/observability/src/telemetry/otel_worker_gauges.rssdk/packages/rust/observability/src/telemetry/payload.rssdk/packages/rust/observability/src/telemetry/span_exporter.rssdk/packages/rust/observability/src/telemetry/span_ops.rssdk/packages/rust/observability/src/telemetry/types.rssdk/packages/rust/observability/src/telemetry/worker_metrics.rssdk/packages/rust/observability/tests/context_test.rssdk/packages/rust/observability/tests/init_test.rssdk/packages/rust/observability/tests/logger_test.rssdk/packages/rust/observability/tests/payload_test.rssdk/packages/rust/observability/tests/types_test.rs
💤 Files with no reviewable changes (3)
- sdk/packages/node/iii/src/utils.ts
- sdk/packages/node/iii/src/types.ts
- sdk/packages/node/iii/tests/exports.test.ts
✅ Files skipped from review due to trivial changes (28)
- sdk/packages/rust/observability/README.md
- crates/iii-worker/Cargo.toml
- sdk/packages/python/iii/tests/test_trace_helpers.py
- sdk/packages/rust/iii-example/src/logger_example.rs
- console/packages/console-rust/Cargo.toml
- engine/Cargo.toml
- sdk/packages/node/iii/tests/worker-metrics.test.ts
- sdk/packages/rust/observability/src/telemetry/http_instrumentation.rs
- sdk/packages/rust/iii/tests/init_api.rs
- sdk/packages/rust/iii-example/Cargo.toml
- sdk/packages/node/observability/vitest.config.ts
- sdk/packages/node/observability/package.json
- sdk/packages/python/iii/tests/test_logger_function_ids.py
- sdk/packages/rust/observability/src/telemetry/mod.rs
- sdk/packages/rust/iii/README.md
- sdk/packages/python/observability/README.md
- sdk/packages/python/iii-example/src/hooks.py
- sdk/packages/rust/observability/Cargo.toml
- sdk/packages/node/iii/tests/payload.test.ts
- sdk/packages/python/iii/tests/test_telemetry_exporters.py
- sdk/packages/python/iii/tests/test_baggage_span_processor.py
- sdk/packages/python/iii/tests/test_init_api.py
- sdk/packages/python/observability/pyproject.toml
- sdk/packages/python/iii/tests/test_payload.py
- sdk/packages/node/observability/README.md
- sdk/packages/rust/observability/src/logger.rs
- sdk/packages/python/iii/src/iii/iii.py
- sdk/packages/python/iii/tests/test_telemetry_types.py
| // Notify failed callbacks so exporters can drain their own queues | ||
| for (const cb of this.onFailedCallbacks) { | ||
| cb() | ||
| } |
There was a problem hiding this comment.
Isolate onFailed callback exceptions so draining stays robust.
If one onFailed callback throws, remaining callbacks may never run, which can leave exporter queues undrained.
💡 Suggested fix
// Notify failed callbacks so exporters can drain their own queues
for (const cb of this.onFailedCallbacks) {
- cb()
+ try {
+ cb()
+ } catch (err) {
+ console.error('[OTel] onFailed callback threw:', err)
+ }
}
@@
onFailed(callback: () => void): void {
this.onFailedCallbacks.push(callback)
if (this.state === 'failed') {
- callback()
+ try {
+ callback()
+ } catch (err) {
+ console.error('[OTel] onFailed callback threw:', err)
+ }
}
}Also applies to: 180-184
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sdk/packages/node/observability/src/telemetry-system/connection.ts` around
lines 117 - 120, The loop that invokes each registered failure handler (the
this.onFailedCallbacks iteration) must isolate exceptions so one bad callback
doesn't stop the rest; wrap each cb() call in a try/catch and on error log the
exception (use an existing logger if available, e.g. this.logger.error, or
console.error) and continue to the next callback. Apply the same change to the
other place where you iterate this.onFailedCallbacks (the second draining loop)
so both notification sites catch and log individual callback errors and keep
draining exporters.
What
Why
Notes
Summary by CodeRabbit
Release Notes
New Features
@iii-dev/observabilityfor Node,iii-observabilityfor Python and Rust) containing shared telemetry, logging, and tracing utilities.Refactor