fix(instrumentation): updated GenAI token usage attributes #3138


Open · wants to merge 5 commits into base: main

Conversation

@martimfasantos commented Jul 15, 2025

This pull request updates the token usage terminology across multiple packages and test files, replacing prompt_tokens with input_tokens and completion_tokens with output_tokens to align with the updated naming conventions in the documentation. All related test assertions have been updated accordingly.
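For illustration, a minimal before/after sketch of the rename (the key strings follow this description and the review discussion below; the helper function and token counts are purely illustrative):

def check_usage(attributes: dict) -> None:
    # before: the counters were read from "gen_ai.usage.prompt_tokens" and
    # "gen_ai.usage.completion_tokens"
    assert attributes["gen_ai.usage.input_tokens"] > 0
    assert attributes["gen_ai.usage.output_tokens"] > 0

check_usage({"gen_ai.usage.input_tokens": 10, "gen_ai.usage.output_tokens": 5})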

Screenshots

(screenshot attached)
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Renamed prompt_tokens to input_tokens and completion_tokens to output_tokens across various test files and packages to align with updated naming conventions.

  • Terminology Update:
    • Renamed prompt_tokens to input_tokens and completion_tokens to output_tokens in SpanAttributes in __init__.py.
    • Updated assertions in test files like test_completion.py, test_messages.py, and test_prompt_caching.py to use input_tokens and output_tokens.
  • Test Files:
    • Modified token usage assertions in test_completion.py for AlephAlpha, Anthropic, OpenAI, and Together packages.
    • Updated test_prompt_caching.py for Anthropic and OpenAI to reflect new token terminology.
    • Adjusted test_chat.py and test_completion.py for Together package to use updated token names.
  • Misc:
    • Ensured all related test assertions and logs reflect the new token terminology.

This description was created by Ellipsis for 3dad6f9. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • Refactor
    • Updated all telemetry and tracing attribute keys from the legacy LLM naming convention to the new GEN_AI convention across all supported AI integrations and tests, aligning with updated semantic standards for generative AI.
  • Chores
    • Added a dependency on the OpenTelemetry SDK to support the new semantic conventions.
  • Tests
    • Revised test assertions to reflect the updated GEN_AI attribute keys, ensuring consistency with the refactored instrumentation code.

@CLAassistant commented Jul 15, 2025

CLA assistant check
All committers have signed the CLA.

@ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 3dad6f9 in 1 minute and 35 seconds.
  • Reviewed 591 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 13 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:50
  • Draft comment:
    Updated constants: 'LLM_USAGE_COMPLETION_TOKENS' now uses 'gen_ai.usage.output_tokens' and 'LLM_USAGE_PROMPT_TOKENS' uses 'gen_ai.usage.input_tokens'. This aligns with the new naming conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, explaining the changes made to constants and their alignment with new naming conventions. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-together/tests/test_chat.py:37
  • Draft comment:
    Test assertions now use the updated attribute names (input_tokens/output_tokens) for token usage. Verify consistency with new semantic conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify consistency with new semantic conventions, which falls under the rule of not asking the author to double-check things. It doesn't provide a specific suggestion or point out a specific issue with the code.
3. packages/opentelemetry-instrumentation-together/tests/test_completion.py:37
  • Draft comment:
    Ensure that the test for Together completions correctly sums 'gen_ai.usage.input_tokens' and 'gen_ai.usage.output_tokens' to match total tokens.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:70
  • Draft comment:
    The OpenAI prompt caching tests have been updated to use new input/output token attribute names. Confirm that these changes are in sync with semantic conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that changes are in sync with semantic conventions, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written. Therefore, it should be removed.
5. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:54
  • Draft comment:
    Consider updating the accompanying documentation or changelog to mention that the legacy token-related attribute names have been replaced with 'input_tokens' and 'output_tokens'.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment suggests updating documentation or changelog, which is not allowed as per the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
6. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:109
  • Draft comment:
    Typographical error: "cassete" should be "cassette" in the comment.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:195
  • Draft comment:
    Typo/grammatical note: The comment currently says "but not wrote to it," which could be updated to "but not write to it," for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:604
  • Draft comment:
    Typographical suggestion: The comment reads "but not wrote to it," which could be clearer. Consider revising it to something like "but did not write to it," to improve clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
9. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:798
  • Draft comment:
    Typo in comment: "but not wrote to it" is ungrammatical. Consider using "but not writing to it" or "but not writing into it" for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
10. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:931
  • Draft comment:
    Typo in comment: Consider updating "but not wrote to it" to "but did not write to it" for clearer grammar.
  • Reason this comment was not posted:
    Comment was on unchanged code.
11. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1025
  • Draft comment:
    Typo in comment: Consider changing 'wrote to it' to 'written to it' for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
12. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1029
  • Draft comment:
    Typographical error: 'cassete' appears to be misspelled. Consider updating it to 'cassette'.
  • Reason this comment was not posted:
    Comment was on unchanged code.
13. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1362
  • Draft comment:
    Typo: 'cassete' should be spelled as 'cassette' in the comment.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_oahtGC07iVEgOTYG

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@nirga (Member) left a comment


Thanks @martimfasantos! In this case I'd remove them from the semantic conventions and just use them directly from the otel package - that way they will always be up to date (when we originally built this there weren't any semconvs for it).
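A rough sketch of that suggestion, using the gen_ai attribute constants straight from the OpenTelemetry semantic-conventions package (the incubating module path is an assumption and may differ by version):

from opentelemetry import trace
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes

tracer = trace.get_tracer(__name__)

def record_usage(input_tokens: int, output_tokens: int) -> None:
    # Set the token-usage attributes from the upstream constants, so the keys track
    # the spec without a locally maintained copy in semconv_ai.
    with tracer.start_as_current_span("gen_ai.example") as span:
        span.set_attribute(gen_ai_attributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
        span.set_attribute(gen_ai_attributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)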

@coderabbitai bot commented Jul 19, 2025

Walkthrough

The changes systematically update all span attribute keys and related constants from the legacy LLM_* (Large Language Model) naming convention to the newer GEN_AI_* (Generative AI) convention across all code, tests, and semantic convention definitions. This affects instrumentation, span utilities, metrics, and tests in every supported AI provider integration, aligning the codebase with the latest OpenTelemetry GenAI semantic conventions. No logic, control flow, or public API signatures were altered.

Changes

File(s) / Path(s) and change summary:

  • .../opentelemetry/semconv_ai/__init__.py: Replaced hardcoded LLM_* string constants with GEN_AI_* constants referencing official OpenTelemetry attributes.
  • .../opentelemetry/semconv_ai/pyproject.toml: Added opentelemetry-sdk as a dependency.
  • .../instrumentation-*/opentelemetry/instrumentation/*/__init__.py: Replaced all LLM_* span attribute keys with GEN_AI_* equivalents in instrumentation wrappers and span creation.
  • .../instrumentation-*/opentelemetry/instrumentation/*/span_utils.py: Systematically updated all span attribute keys from LLM_* to GEN_AI_* in utility functions for all providers.
  • .../instrumentation-*/opentelemetry/instrumentation/*/utils.py, streaming.py, guardrail.py, etc.: Updated metric and span attribute keys from LLM_* to GEN_AI_* in helper and streaming/metrics logic.
  • .../instrumentation-*/tests/: Updated all test assertions and expected span/metric attribute keys from LLM_* to GEN_AI_*.
  • .../instrumentation-*/tests/metrics/, .../tests/utils.py: Changed metric attribute keys and test utilities from LLM_* to GEN_AI_*.
  • .../instrumentation-openai/opentelemetry/instrumentation/openai/shared/, v1/, event_handler_*.py: Updated all span and metric attribute keys from LLM_* to GEN_AI_* in OpenAI shared and v1 wrappers.
  • .../traceloop-sdk/traceloop/sdk/tracing/manual.py, .../traceloop-sdk/tests/*: Changed manual tracing and corresponding tests to use GEN_AI_* span attribute keys.
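As a rough illustration of the first row above, the constants might now be defined along these lines (a sketch only, assuming the incubating gen_ai attributes module from the OpenTelemetry semantic-conventions distribution; the actual diff may differ):

from opentelemetry.semconv._incubating.attributes import gen_ai_attributes

class SpanAttributes:
    # Re-export the official attribute values instead of hardcoding the strings.
    GEN_AI_USAGE_INPUT_TOKENS = gen_ai_attributes.GEN_AI_USAGE_INPUT_TOKENS    # "gen_ai.usage.input_tokens"
    GEN_AI_USAGE_OUTPUT_TOKENS = gen_ai_attributes.GEN_AI_USAGE_OUTPUT_TOKENS  # "gen_ai.usage.output_tokens"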

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant Instrumentation
    participant Span
    participant Metrics

    App->>Instrumentation: Make AI model call
    Instrumentation->>Span: Start span with GEN_AI_* attributes
    Instrumentation->>Span: Set GEN_AI_* prompt/completion/model/system
    Instrumentation->>Metrics: Record metrics with GEN_AI_* keys
    Instrumentation-->>App: Return AI model response
    App->>Instrumentation: Retrieve span/metrics for validation (tests)
    Instrumentation->>Span: Assert GEN_AI_* attributes present

Poem

🐇
Goodbye to LLM, hello GenAI,
Attribute keys have learned to fly!
From prompts to completions, the names are new,
Spans and metrics, all freshened too.
With every trace and every test,
Our GenAI naming is now the best!
—A happy tracing rabbit


@coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (12)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)

210-225: Replace LLM token attributes with GEN_AI equivalents

The semantic-conventions package exports GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, so we should stop using the old LLM_USAGE_PROMPT_TOKENS and LLM_USAGE_COMPLETION_TOKENS. The total-tokens attribute remains LLM_USAGE_TOTAL_TOKENS (no GEN_AI_USAGE_TOTAL_TOKENS is defined).

Locations to update:

  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~210–225
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~324–336

Suggested diff:

-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
-        stream_response.get("input_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+        stream_response.get("input_token_count"),
+    )

-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
-        stream_response.get("generated_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+        stream_response.get("generated_token_count"),
+    )

     total_token = stream_response.get("input_token_count") + stream_response.get(
         "generated_token_count"
     )
-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
-        total_token,
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+        total_token,
+    )

Please apply the same replacements in the second block (lines 324–336) and update any related tests to expect the new GEN_AI_USAGE_* constants.

packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)

49-65: Align usage attributes with Gen AI semantic conventions

The Gen AI spec defines gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and the cache‐related usage attributes, but does not yet provide gen_ai.usage.total_tokens or a gen_ai.request.type. To stay consistent:

• Replace LLM prompt/completion and cache‐input tokens with the GEN_AI equivalents
• Keep LLM_USAGE_TOTAL_TOKENS and LLM_REQUEST_TYPE as they have no Gen AI counterparts

Locations to update (packages/traceloop-sdk/traceloop/sdk/tracing/manual.py, lines 49–65):

-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
+        )
         self._span.set_attribute(
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
         )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS,
-            usage.cache_creation_input_tokens,
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-            usage.cache_read_input_tokens,
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+            usage.cache_creation_input_tokens,
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+            usage.cache_read_input_tokens,
+        )

No change needed for LLM_REQUEST_TYPE—the semantic conventions only define "llm.request.type" at this time.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

479-497: None > 0 guard needed before metric recording

prompt_tokens or completion_tokens can be None when the provider omits usage stats.
Comparing None > 0 raises TypeError.

-            if prompt_tokens > 0:
+            if isinstance(prompt_tokens, (int, float)) and prompt_tokens > 0:
@@
-            if completion_tokens > 0:
+            if isinstance(completion_tokens, (int, float)) and completion_tokens > 0:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)

226-233: Inconsistent token usage attribute naming.

The token usage attributes still use the legacy LLM_USAGE_* naming instead of the newer GEN_AI_USAGE_* conventions. Based on the PR objectives to update token terminology, these should be updated to align with the new naming conventions.

Apply this diff to align with the new token usage attribute naming:

         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
             usage_dict.get("completion_tokens"),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
             usage_dict.get("prompt_tokens"),
         )
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

402-416: Inconsistent token usage attribute naming.

The token usage span attributes still use the legacy LLM_USAGE_* naming instead of the newer GEN_AI_USAGE_* conventions, which is inconsistent with the overall migration to GenAI semantic conventions.

Consider updating these attributes to align with the new naming conventions:

             set_span_attribute(
                 span,
-                SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+                SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
                 input_tokens,
             )
             set_span_attribute(
                 span,
-                SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+                SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
                 output_tokens,
             )
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (1)

80-89: Inconsistent attribute naming - some LLM_* attributes remain.

While the response model was updated, the token usage attributes and finish reason attribute still use the legacy LLM_* naming instead of the newer GEN_AI_* conventions.

Consider updating these remaining attributes for consistency:

         span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
         )
-        span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+        span.set_attribute(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)

And for the finish reason attribute:

         span.set_attribute(
-            SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
+            SpanAttributes.GEN_AI_RESPONSE_FINISH_REASONS, choices[0].finish_reason
         )
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (1)

88-109: Incomplete semantic convention migration for token usage attributes

While GEN_AI_RESPONSE_MODEL has been correctly updated, the token usage attributes on lines 97, 102, and 107 still use the legacy LLM_USAGE_* naming convention. Based on the PR objectives to update token usage terminology, these should be updated to:

  • LLM_USAGE_PROMPT_TOKENSGEN_AI_USAGE_INPUT_TOKENS
  • LLM_USAGE_COMPLETION_TOKENSGEN_AI_USAGE_OUTPUT_TOKENS
  • LLM_USAGE_TOTAL_TOKENSGEN_AI_USAGE_TOTAL_TOKENS
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
        input_tokens + output_tokens,
    )
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
        output_tokens,
    )
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
        input_tokens,
    )
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)

902-911: Update token usage attribute names in streaming tests

The new semantic conventions define GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, so the legacy LLM_USAGE_* constants should be replaced accordingly. Since there is no GEN_AI_USAGE_TOTAL_TOKENS constant, compute the total by summing input and output tokens.

• File: packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (around lines 902–911)

  • Replace SpanAttributes.LLM_USAGE_PROMPT_TOKENSSpanAttributes.GEN_AI_USAGE_INPUT_TOKENS
  • Replace SpanAttributes.LLM_USAGE_COMPLETION_TOKENSSpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS
  • Update total‐tokens assertion to sum the two new attributes instead of using LLM_USAGE_TOTAL_TOKENS

Suggested diff:

- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS]
-     == inputTokens
- )
- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS]
-     == outputTokens
- )
- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
-     == inputTokens + outputTokens
- )
+ assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == inputTokens
+ assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == outputTokens
+ assert (
+     bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+     + bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+     == inputTokens + outputTokens
+ )
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (2)

222-230: Token usage attributes not updated to new semantic conventions

Similar to other files in this PR, the token usage attributes still use legacy LLM_USAGE_* naming:

  • LLM_USAGE_PROMPT_TOKENS (line 222)
  • LLM_USAGE_COMPLETION_TOKENS (line 224)
  • LLM_USAGE_TOTAL_TOKENS (line 228)

For consistency with the PR objectives, these should be updated to use GEN_AI_USAGE_* equivalents with the new "input/output" terminology instead of "prompt/completion".


76-90: Update semantic convention prefixes for penalty attributes and confirm streaming attribute

The semantic conventions for frequency and presence penalties have GEN_AI_REQUEST_* equivalents and should replace the legacy LLM_* prefixes. The streaming attribute currently has no GEN_AI_* equivalent in the semantic conventions—continue using LLM_IS_STREAMING or propose adding GEN_AI_REQUEST_STREAMING.

• In packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py

  • Line 85: replace SpanAttributes.LLM_FREQUENCY_PENALTY with SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
  • Line 88: replace SpanAttributes.LLM_PRESENCE_PENALTY with SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
  • Line 90: retain SpanAttributes.LLM_IS_STREAMING (no GEN_AI_REQUEST_STREAMING found in opentelemetry-semantic-conventions-ai)

Suggested diff:

- set_span_attribute(
-     span, SpanAttributes.LLM_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
+ )

- set_span_attribute(
-     span, SpanAttributes.LLM_PRESENCE_PENALTY, kwargs.get("presence_penalty")
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY, kwargs.get("presence_penalty")
+ )
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (1)

86-140: Migrate legacy LLM_USAGE_* attributes to the new naming convention

The old token-usage constants (LLM_USAGE_COMPLETION_TOKENS, LLM_USAGE_PROMPT_TOKENS, LLM_USAGE_TOTAL_TOKENS) are still in use across multiple instrumentation packages (e.g., groq/span_utils.py, together/span_utils.py, vertexai/span_utils.py, etc.). All of these must be updated to the new attribute names alongside the already-migrated GEN_AI_TOKEN_TYPE and GEN_AI_RESPONSE_MODEL.

Please update every occurrence in your span utilities—including tests—to use the correct, non-legacy constants. You can locate all remaining uses with:

rg -l 'LLM_USAGE_.*TOKEN' packages/opentelemetry-instrumentation-*/

• Replace LLM_USAGE_PROMPT_TOKENS
• Replace LLM_USAGE_COMPLETION_TOKENS
• Replace LLM_USAGE_TOTAL_TOKENS

with their new equivalents in each span_utils.py (and related test files).

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

287-300: _get_vendor_from_url returns capital-cased values that violate the GenAI enum

GenAISystemValues in the incubating semconv are lower-case ("openai", "aws", "azure", …).
Here we return "AWS", "Azure", etc.; this will produce spans that fail semantic-convention
validation and break downstream queries that expect the canonical values.

-    if "openai.azure.com" in base_url:
-        return "Azure"
-    elif "amazonaws.com" in base_url or "bedrock" in base_url:
-        return "AWS"
-    elif "googleapis.com" in base_url or "vertex" in base_url:
-        return "Google"
-    elif "openrouter.ai" in base_url:
-        return "OpenRouter"
+    if "openai.azure.com" in base_url:
+        return "azure"
+    elif "amazonaws.com" in base_url or "bedrock" in base_url:
+        return "aws"
+    elif "googleapis.com" in base_url or "vertex" in base_url:
+        return "google"
+    elif "openrouter.ai" in base_url:
+        return "openrouter"

A one-line return vendor.lower() after detection would also work.

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

269-276: Repeat comment: use enum for vendor

Same concern as above—replace the literal "AWS" with the enum value to stay future-proof.

🧹 Nitpick comments (13)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (1)

177-181: Simplify nested if statements.

The static analysis tool correctly identified nested if statements that can be combined for better readability.

Apply this diff to combine the nested if statements:

-    if should_send_prompts():
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
             _set_span_attribute(
                 span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
             )
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

51-52: Prefer the official enum to avoid string drift

"AWS" is hard-coded while elsewhere the enum GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value is used. Using the enum keeps tests aligned if the canonical value ever changes.

-assert bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM] == "AWS"
+assert (
+    bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM]
+    == GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value
+)
packages/opentelemetry-instrumentation-cohere/tests/test_completion.py (1)

53-56: Minor consistency nit

Consider using GenAIAttributes.GenAiSystemValues.COHERE.value instead of the literal "Cohere" for the vendor check.

packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (2)

145-153: Duplication across tests

The same literal prompt string & model ID are asserted in ~20 places. Parametrize via pytest.mark.parametrize or a fixture to reduce duplication and improve maintainability.
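A possible shape for that parametrisation; the attribute keys, values, and the fake span below are hypothetical stand-ins for the fixtures already used in the test file:

import pytest

class _FakeSpan:
    # Stand-in for the ReadableSpan objects the exporter fixture returns in the real tests.
    attributes = {
        "gen_ai.request.model": "text-embedding-ada-002",
        "gen_ai.prompt.0.content": "Tell me a joke about OpenTelemetry",
    }

@pytest.mark.parametrize(
    "key,expected",
    [
        ("gen_ai.request.model", "text-embedding-ada-002"),
        ("gen_ai.prompt.0.content", "Tell me a joke about OpenTelemetry"),
    ],
)
def test_embedding_span_attributes(key, expected):
    span = _FakeSpan()
    assert span.attributes[key] == expected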


321-327: Enum vs literal vendor value

Throughout the file vendor strings ("openai", etc.) are hard-coded. Prefer the enum values from GenAIAttributes for future resiliency.

packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (2)

27-41: Repeated literal "MistralAI" vendor

Same comment as other files – consider centralising vendor constant via enum to avoid divergence.

Also applies to: 73-89, 126-143, 179-197


176-197: Large duplication – consider parametrised fixtures

The chat & streaming variants share extensive duplicated assertions differing only by the fixture used (instrument_*). A helper asserting common span attributes would shrink the test file and make future convention changes easier.

packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (2)

25-29: Minor naming mismatch in prompt attribute key

Other instrumentations store the user prompt under …prompt.0.content, but here it is stored under …prompt.0.user.
If this was accidental, switching to content keeps cross-provider parity.

-            f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
+            f"{SpanAttributes.GEN_AI_PROMPT}.0.content",

57-66: Unused parameter (llm_model) in set_response_attributes

llm_model is accepted but never referenced. Either remove the argument or use it (e.g., set GEN_AI_RESPONSE_MODEL).

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

273-276: Mixed attribute families in the same span

GEN_AI_SYSTEM replaces LLM_SYSTEM, but the next line still sets LLM_REQUEST_TYPE. Consider moving that one to a GEN_AI_REQUEST_TYPE constant (if defined) for consistency.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

391-402: Hard-coded strings for usage metrics – prefer the SpanAttributes constants

metric_shared_attributes builds the dict with string literals
"gen_ai.operation.name" etc., but uses SpanAttributes.GEN_AI_SYSTEM
and GEN_AI_RESPONSE_MODEL. For the two usage keys introduced in this
PR we now have SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS /
GEN_AI_USAGE_OUTPUT_TOKENS. Using the constants avoids typos and keeps
refactors mechanical.
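A sketch of the substitution being suggested; the function shape and variable names below are assumptions for illustration, not the actual metric_shared_attributes implementation:

from opentelemetry.semconv_ai import SpanAttributes

def build_shared_metric_attributes(system: str, response_model: str) -> dict:
    # Every key comes from SpanAttributes rather than a raw string literal such as
    # "gen_ai.usage.input_tokens", so a misspelled constant raises an AttributeError
    # instead of silently emitting a wrong key.
    return {
        SpanAttributes.GEN_AI_SYSTEM: system,
        SpanAttributes.GEN_AI_RESPONSE_MODEL: response_model,
    }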

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (2)

1291-1299: Raw "gen_ai.usage.*" literals – use the constants to avoid drift

These assertions use bare strings for the input / output token counters.
Now that SpanAttributes exposes
GEN_AI_USAGE_INPUT_TOKENS / GEN_AI_USAGE_OUTPUT_TOKENS, switch to:

-assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+assert anthropic_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514

Besides eliminating magic strings, it guarantees the tests fail when the
spec keys change again.

Also applies to: 1468-1474, 1604-1610


433-468: Verbose prompt/completion assertions – extractor helper would shrink test noise

The repeated pattern

assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.role"] == ...
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.content"] == ...

appears dozens of times. A small helper such as
assert_prompt(span, idx, role, content) would make the intentions
clearer and the diff noise smaller when the convention inevitably
evolves again.
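One possible shape for such a helper, assuming the SpanAttributes constants from the semconv_ai package and whatever span objects the test exporter returns:

from opentelemetry.semconv_ai import SpanAttributes

def assert_prompt(span, idx: int, role: str, content: str) -> None:
    # Check the indexed prompt attributes written onto the span by the instrumentation.
    assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{idx}.role"] == role
    assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{idx}.content"] == content

def assert_completion(span, idx: int, role: str, content: str) -> None:
    assert span.attributes[f"{SpanAttributes.GEN_AI_COMPLETION}.{idx}.role"] == role
    assert span.attributes[f"{SpanAttributes.GEN_AI_COMPLETION}.{idx}.content"] == content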

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01b6fc1 and f8914d6.

📒 Files selected for processing (47)
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (7 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (22 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (23 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (15 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py (12 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (17 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (14 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (6 hunks)
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py (3 hunks)
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (9 hunks)
  • packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py (9 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (9 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (16 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (6 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (1 hunks)
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (7 hunks)
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12 hunks)
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py (8 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_chat.py (16 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (6 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (19 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (9 hunks)
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py (6 hunks)
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3 hunks)
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (13 hunks)
  • packages/traceloop-sdk/tests/test_manual.py (1 hunks)
  • packages/traceloop-sdk/tests/test_privacy_no_prompts.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (2 hunks)
✅ Files skipped from review due to trivial changes (5)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
🚧 Files skipped from review as they are similar to previous changes (41)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py
  • packages/traceloop-sdk/tests/test_privacy_no_prompts.py
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py
  • packages/traceloop-sdk/tests/test_manual.py
  • packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
  • packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_chat.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (37-249)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (2)

99-109: Token attribute migration consistently applied.

The migration pattern is consistently applied here, matching the changes in the first test function. The same consideration about LLM_USAGE_TOTAL_TOKENS applies here as well.


158-168: Token attribute migration consistently applied across all test functions.

The migration is consistently implemented across all three test functions, maintaining the same logical assertions while updating to the new semantic convention attribute names.

@martimfasantos requested a review from nirga on July 19, 2025 at 16:51
3 participants