fix(instrumentation): updated GenAI token usage attributes #3138
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 3dad6f9 in 1 minute and 35 seconds.

- Reviewed 591 lines of code in 7 files
- Skipped 0 files when reviewing
- Skipped posting 13 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:50
   - Draft comment: Updated constants: `LLM_USAGE_COMPLETION_TOKENS` now uses `gen_ai.usage.output_tokens` and `LLM_USAGE_PROMPT_TOKENS` uses `gen_ai.usage.input_tokens`. This aligns with the new naming conventions.
   - Reason this comment was not posted: Comment did not seem useful. Confidence that it is useful: 0% (threshold: 50%). This comment is purely informative, explaining the changes made to constants and their alignment with new naming conventions. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-together/tests/test_chat.py:37
   - Draft comment: Test assertions now use the updated attribute names (input_tokens/output_tokens) for token usage. Verify consistency with new semantic conventions.
   - Reason this comment was not posted: Comment did not seem useful. Confidence that it is useful: 0% (threshold: 50%). The comment is asking the PR author to verify consistency with new semantic conventions, which falls under the rule of not asking the author to double-check things. It doesn't provide a specific suggestion or point out a specific issue with the code.
3. packages/opentelemetry-instrumentation-together/tests/test_completion.py:37
   - Draft comment: Ensure that the test for Together completions correctly sums `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` to match total tokens.
   - Reason this comment was not posted: Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:70
   - Draft comment: The OpenAI prompt caching tests have been updated to use new input/output token attribute names. Confirm that these changes are in sync with semantic conventions.
   - Reason this comment was not posted: Comment did not seem useful. Confidence that it is useful: 0% (threshold: 50%). The comment is asking the PR author to confirm that changes are in sync with semantic conventions, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written. Therefore, it should be removed.
5. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:54
   - Draft comment: Consider updating the accompanying documentation or changelog to mention that the legacy token-related attribute names have been replaced with `input_tokens` and `output_tokens`.
   - Reason this comment was not posted: Comment did not seem useful. Confidence that it is useful: 0% (threshold: 50%). The comment suggests updating documentation or changelog, which is not allowed as per the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
6. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:109
   - Draft comment: Typographical error: "cassete" should be "cassette" in the comment.
   - Reason this comment was not posted: Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:195
   - Draft comment: Typo/grammatical note: The comment currently says "but not wrote to it," which could be updated to "but not write to it," for clarity.
   - Reason this comment was not posted: Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:604
   - Draft comment: Typographical suggestion: The comment reads "but not wrote to it," which could be clearer. Consider revising it to something like "but did not write to it," to improve clarity.
   - Reason this comment was not posted: Comment was on unchanged code.
9. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:798
   - Draft comment: Typo in comment: "but not wrote to it" is ungrammatical. Consider using "but not writing to it" or "but not writing into it" for clarity.
   - Reason this comment was not posted: Comment was on unchanged code.
10. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:931
    - Draft comment: Typo in comment: Consider updating "but not wrote to it" to "but did not write to it" for clearer grammar.
    - Reason this comment was not posted: Comment was on unchanged code.
11. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1025
    - Draft comment: Typo in comment: Consider changing 'wrote to it' to 'written to it' for clarity.
    - Reason this comment was not posted: Comment was on unchanged code.
12. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1029
    - Draft comment: Typographical error: 'cassete' appears to be misspelled. Consider updating it to 'cassette'.
    - Reason this comment was not posted: Comment was on unchanged code.
13. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1362
    - Draft comment: Typo: 'cassete' should be spelled as 'cassette' in the comment.
    - Reason this comment was not posted: Comment was on unchanged code.
Workflow ID: wflow_oahtGC07iVEgOTYG
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Thanks @martimfasantos! In this case I'd remove them from the semantic conventions and just use them directly from the otel package - that way they will always be up to date (when we originally built it there weren't any semconvs for this).
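A minimal sketch of what that could look like, assuming the incubating gen_ai attributes shipped with recent releases of the `opentelemetry-semantic-conventions` package (the module path and attribute names may vary by version, so treat this as illustrative rather than the project's actual implementation):

```python
from opentelemetry import trace
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes

tracer = trace.get_tracer(__name__)

# Attribute keys come straight from the otel package, so spans stay in
# lockstep with the spec instead of a locally maintained constant list.
with tracer.start_as_current_span("chat") as span:
    span.set_attribute(gen_ai_attributes.GEN_AI_USAGE_INPUT_TOKENS, 514)
    span.set_attribute(gen_ai_attributes.GEN_AI_USAGE_OUTPUT_TOKENS, 82)
    span.set_attribute(
        gen_ai_attributes.GEN_AI_SYSTEM,
        gen_ai_attributes.GenAiSystemValues.OPENAI.value,
    )
```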
Walkthrough

The changes systematically update all span attribute keys and related constants from the legacy `LLM_*` names to the new `GEN_AI_*` semantic conventions.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App
    participant Instrumentation
    participant Span
    participant Metrics
    App->>Instrumentation: Make AI model call
    Instrumentation->>Span: Start span with GEN_AI_* attributes
    Instrumentation->>Span: Set GEN_AI_* prompt/completion/model/system
    Instrumentation->>Metrics: Record metrics with GEN_AI_* keys
    Instrumentation-->>App: Return AI model response
    App->>Instrumentation: Retrieve span/metrics for validation (tests)
    Instrumentation->>Span: Assert GEN_AI_* attributes present
```
Actionable comments posted: 3
🔭 Outside diff range comments (12)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)

210-225: Replace LLM token attributes with GEN_AI equivalents

The semantic-conventions package exports `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`, so we should stop using the old `LLM_USAGE_PROMPT_TOKENS` and `LLM_USAGE_COMPLETION_TOKENS`. The total-tokens attribute remains `LLM_USAGE_TOTAL_TOKENS` (no `GEN_AI_USAGE_TOTAL_TOKENS` is defined).

Locations to update:
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~210-225
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~324-336

Suggested diff:

```diff
-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
-        stream_response.get("input_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+        stream_response.get("input_token_count"),
+    )
-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
-        stream_response.get("generated_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+        stream_response.get("generated_token_count"),
+    )
     total_token = stream_response.get("input_token_count") + stream_response.get(
         "generated_token_count"
     )
     _set_span_attribute(
         span,
         SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
         total_token,
     )
```

Please apply the same replacements in the second block (lines 324-336) and update any related tests to expect the new `GEN_AI_USAGE_*` constants.

packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)

49-65: Align usage attributes with Gen AI semantic conventions

The Gen AI spec defines `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and the cache-related usage attributes, but does not yet provide `gen_ai.usage.total_tokens` or a `gen_ai.request.type`. To stay consistent:

- Replace LLM prompt/completion and cache-input tokens with the GEN_AI equivalents
- Keep `LLM_USAGE_TOTAL_TOKENS` and `LLM_REQUEST_TYPE` as they have no Gen AI counterparts

Locations to update (packages/traceloop-sdk/traceloop/sdk/tracing/manual.py, lines 49-65):

```diff
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
+        )
         self._span.set_attribute(
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
         )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS,
-            usage.cache_creation_input_tokens,
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-            usage.cache_read_input_tokens,
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+            usage.cache_creation_input_tokens,
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+            usage.cache_read_input_tokens,
+        )
```

No change needed for `LLM_REQUEST_TYPE` - the semantic conventions only define `"llm.request.type"` at this time.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

479-497: `None > 0` guard needed before metric recording

`prompt_tokens` or `completion_tokens` can be `None` when the provider omits usage stats. Comparing `None > 0` raises `TypeError`.

```diff
-        if prompt_tokens > 0:
+        if isinstance(prompt_tokens, (int, float)) and prompt_tokens > 0:
@@
-        if completion_tokens > 0:
+        if isinstance(completion_tokens, (int, float)) and completion_tokens > 0:
```
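A minimal sketch of the suggested guard in isolation; the helper name, histogram object, and attribute dict below are illustrative stand-ins, not the instrumentation's real objects:

```python
# Providers may omit usage stats entirely, leaving the value as None;
# `None > 0` raises TypeError, so check the type before comparing.
def record_token_usage(token_histogram, token_count, attributes):
    if isinstance(token_count, (int, float)) and token_count > 0:
        token_histogram.record(token_count, attributes=attributes)
```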
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)

226-233: Inconsistent token usage attribute naming.

The token usage attributes still use the legacy `LLM_USAGE_*` naming instead of the newer `GEN_AI_USAGE_*` conventions. Based on the PR objectives to update token terminology, these should be updated to align with the new naming conventions.

Apply this diff to align with the new token usage attribute naming:

```diff
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
         usage_dict.get("completion_tokens"),
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
         usage_dict.get("prompt_tokens"),
     )
```
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

402-416: Inconsistent token usage attribute naming.

The token usage span attributes still use the legacy `LLM_USAGE_*` naming instead of the newer `GEN_AI_USAGE_*` conventions, which is inconsistent with the overall migration to GenAI semantic conventions.

Consider updating these attributes to align with the new naming conventions:

```diff
     set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
         input_tokens,
     )
     set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
         output_tokens,
     )
```
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (1)

80-89: Inconsistent attribute naming - some `LLM_*` attributes remain.

While the response model was updated, the token usage attributes and finish reason attribute still use the legacy `LLM_*` naming instead of the newer `GEN_AI_*` conventions.

Consider updating these remaining attributes for consistency:

```diff
     span.set_attribute(
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
     )
-    span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+    span.set_attribute(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)
```

And for the finish reason attribute:

```diff
     span.set_attribute(
-        SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
+        SpanAttributes.GEN_AI_RESPONSE_FINISH_REASONS, choices[0].finish_reason
     )
```
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (1)

88-109: Incomplete semantic convention migration for token usage attributes

While `GEN_AI_RESPONSE_MODEL` has been correctly updated, the token usage attributes on lines 97, 102, and 107 still use the legacy `LLM_USAGE_*` naming convention. Based on the PR objectives to update token usage terminology, these should be updated to:

- `LLM_USAGE_PROMPT_TOKENS` → `GEN_AI_USAGE_INPUT_TOKENS`
- `LLM_USAGE_COMPLETION_TOKENS` → `GEN_AI_USAGE_OUTPUT_TOKENS`
- `LLM_USAGE_TOTAL_TOKENS` → `GEN_AI_USAGE_TOTAL_TOKENS`

```diff
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
         input_tokens + output_tokens,
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
         output_tokens,
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
         input_tokens,
     )
```
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)

902-911: Update token usage attribute names in streaming tests

The new semantic conventions define `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`, so the legacy `LLM_USAGE_*` constants should be replaced accordingly. Since there is no `GEN_AI_USAGE_TOTAL_TOKENS` constant, compute the total by summing input and output tokens.

- File: packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (around lines 902-911)
- Replace `SpanAttributes.LLM_USAGE_PROMPT_TOKENS` → `SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS`
- Replace `SpanAttributes.LLM_USAGE_COMPLETION_TOKENS` → `SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`
- Update the total-tokens assertion to sum the two new attributes instead of using `LLM_USAGE_TOTAL_TOKENS`

Suggested diff:

```diff
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS]
-        == inputTokens
-    )
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS]
-        == outputTokens
-    )
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
-        == inputTokens + outputTokens
-    )
+    assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == inputTokens
+    assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == outputTokens
+    assert (
+        bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        + bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        == inputTokens + outputTokens
+    )
```
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (2)

222-230: Token usage attributes not updated to new semantic conventions

Similar to other files in this PR, the token usage attributes still use legacy `LLM_USAGE_*` naming:

- `LLM_USAGE_PROMPT_TOKENS` (line 222)
- `LLM_USAGE_COMPLETION_TOKENS` (line 224)
- `LLM_USAGE_TOTAL_TOKENS` (line 228)

For consistency with the PR objectives, these should be updated to use `GEN_AI_USAGE_*` equivalents with the new "input/output" terminology instead of "prompt/completion".

76-90: Update semantic convention prefixes for penalty attributes and confirm streaming attribute

The semantic conventions for frequency and presence penalties have `GEN_AI_REQUEST_*` equivalents and should replace the legacy `LLM_*` prefixes. The streaming attribute currently has no `GEN_AI_*` equivalent in the semantic conventions - continue using `LLM_IS_STREAMING` or propose adding `GEN_AI_REQUEST_STREAMING`.

In packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:

- Line 85: replace `SpanAttributes.LLM_FREQUENCY_PENALTY` with `SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY`
- Line 88: replace `SpanAttributes.LLM_PRESENCE_PENALTY` with `SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY`
- Line 90: retain `SpanAttributes.LLM_IS_STREAMING` (no `GEN_AI_REQUEST_STREAMING` found in opentelemetry-semantic-conventions-ai)

Suggested diff:

```diff
-    set_span_attribute(
-        span, SpanAttributes.LLM_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
+    )
-    set_span_attribute(
-        span, SpanAttributes.LLM_PRESENCE_PENALTY, kwargs.get("presence_penalty")
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY, kwargs.get("presence_penalty")
+    )
```
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (1)

86-140: Migrate legacy `LLM_USAGE_*` attributes to the new naming convention

The old token-usage constants (`LLM_USAGE_COMPLETION_TOKENS`, `LLM_USAGE_PROMPT_TOKENS`, `LLM_USAGE_TOTAL_TOKENS`) are still in use across multiple instrumentation packages (e.g., groq/span_utils.py, together/span_utils.py, vertexai/span_utils.py, etc.). All of these must be updated to the new attribute names alongside the already-migrated `GEN_AI_TOKEN_TYPE` and `GEN_AI_RESPONSE_MODEL`.

Please update every occurrence in your span utilities - including tests - to use the correct, non-legacy constants. You can locate all remaining uses with:

```shell
rg -l 'LLM_USAGE_.*TOKEN' packages/opentelemetry-instrumentation-*/
```

- Replace `LLM_USAGE_PROMPT_TOKENS`
- Replace `LLM_USAGE_COMPLETION_TOKENS`
- Replace `LLM_USAGE_TOTAL_TOKENS`

with their new equivalents in each span_utils.py (and related test files).
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

287-300: `_get_vendor_from_url` returns capital-cased values that violate the GenAI enum

The `GenAISystemValues` in the incubating semconv are lower-case (`"openai"`, `"aws"`, `"azure"`, ...). Here we return `"AWS"`, `"Azure"`, etc.; this will produce spans that fail semantic-convention validation and break downstream queries that expect the canonical values.

```diff
-    if "openai.azure.com" in base_url:
-        return "Azure"
-    elif "amazonaws.com" in base_url or "bedrock" in base_url:
-        return "AWS"
-    elif "googleapis.com" in base_url or "vertex" in base_url:
-        return "Google"
-    elif "openrouter.ai" in base_url:
-        return "OpenRouter"
+    if "openai.azure.com" in base_url:
+        return "azure"
+    elif "amazonaws.com" in base_url or "bedrock" in base_url:
+        return "aws"
+    elif "googleapis.com" in base_url or "vertex" in base_url:
+        return "google"
+    elif "openrouter.ai" in base_url:
+        return "openrouter"
```

A one-line `return vendor.lower()` after detection would also work.
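A sketch of that alternative, lower-casing variant; since the full function isn't shown in the review, the fallback default here is an assumption:

```python
def _get_vendor_from_url(base_url: str) -> str:
    # Detect the vendor as before, then normalize once at the end so every
    # return value matches the lower-case canonical gen_ai.system values.
    if "openai.azure.com" in base_url:
        vendor = "Azure"
    elif "amazonaws.com" in base_url or "bedrock" in base_url:
        vendor = "AWS"
    elif "googleapis.com" in base_url or "vertex" in base_url:
        vendor = "Google"
    elif "openrouter.ai" in base_url:
        vendor = "OpenRouter"
    else:
        vendor = "openai"  # assumed fallback, not shown in the review excerpt
    return vendor.lower()
```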
♻️ Duplicate comments (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

269-276: Repeat comment: use enum for vendor

Same concern as above - replace the literal `"AWS"` with the enum value to stay future-proof.
🧹 Nitpick comments (13)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (1)

177-181: Simplify nested if statements.

The static analysis tool correctly identified nested if statements that can be combined for better readability.

Apply this diff to combine the nested if statements:

```diff
-    if should_send_prompts():
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
         _set_span_attribute(
             span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
         )
```

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

51-52: Prefer the official enum to avoid string drift

`"AWS"` is hard-coded while elsewhere the enum `GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value` is used. Using the enum keeps tests aligned if the canonical value ever changes.

```diff
-assert bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM] == "AWS"
+assert (
+    bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM]
+    == GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value
+)
```

packages/opentelemetry-instrumentation-cohere/tests/test_completion.py (1)

53-56: Minor consistency nit

Consider using `GenAIAttributes.GenAiSystemValues.COHERE.value` instead of the literal `"Cohere"` for the vendor check.

packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (2)

145-153: Duplication across tests

The same literal prompt string & model ID are asserted in ~20 places. Parametrize via `pytest.mark.parametrize` or a fixture to reduce duplication and improve maintainability.

321-327: Enum vs literal vendor value

Throughout the file vendor strings (`"openai"`, etc.) are hard-coded. Prefer the enum values from `GenAIAttributes` for future resiliency.

packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12)

27-41: Repeated literal `"MistralAI"` vendor

Same comment as other files - consider centralising the vendor constant via the enum to avoid divergence.

Also applies to: 73-89, 126-143, 179-197

176-197: Large duplication - consider parametrised fixtures

The chat & streaming variants share extensive duplicated assertions differing only by the fixture used (`instrument_*`). A helper asserting common span attributes would shrink the test file and make future convention changes easier.

packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (2)

25-29: Minor naming mismatch in prompt attribute key

Other instrumentations store the user prompt under `…prompt.0.content`, but here it is stored under `…prompt.0.user`. If this was accidental, switching to `content` keeps cross-provider parity.

```diff
-        f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
+        f"{SpanAttributes.GEN_AI_PROMPT}.0.content",
```

57-66: Unused parameter (`llm_model`) in `set_response_attributes`

`llm_model` is accepted but never referenced. Either remove the argument or use it (e.g., set `GEN_AI_RESPONSE_MODEL`).

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

273-276: Mixed attribute families in the same span

`GEN_AI_SYSTEM` replaces `LLM_SYSTEM`, but the next line still sets `LLM_REQUEST_TYPE`. Consider moving that one to a `GEN_AI_REQUEST_TYPE` constant (if defined) for consistency.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

391-402: Hard-coded strings for usage metrics - prefer the SpanAttributes constants

`metric_shared_attributes` builds the dict with string literals (`"gen_ai.operation.name"` etc.) but uses `SpanAttributes.GEN_AI_SYSTEM` and `GEN_AI_RESPONSE_MODEL`. For the two usage keys introduced in this PR we now have `SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS` / `GEN_AI_USAGE_OUTPUT_TOKENS`. Using the constants avoids typos and keeps refactors mechanical.

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (2)

1291-1299: Raw `"gen_ai.usage.*"` literals - use the constants to avoid drift

These assertions use bare strings for the input / output token counters. Now that `SpanAttributes` exposes `GEN_AI_USAGE_INPUT_TOKENS` / `GEN_AI_USAGE_OUTPUT_TOKENS`, switch to:

```diff
-assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+assert anthropic_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
```

Besides eliminating magic strings, it guarantees the tests fail when the spec keys change again.

Also applies to: 1468-1474, 1604-1610

433-468: Verbose prompt/completion assertions - extractor helper would shrink test noise

The repeated pattern

```python
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.role"] == ...
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.content"] == ...
```

appears dozens of times. A small helper such as `assert_prompt(span, idx, role, content)` would make the intentions clearer and the diff noise smaller when the convention inevitably evolves again.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (7 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (22 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (23 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (3 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (15 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py (12 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (17 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (14 hunks)
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (6 hunks)
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py (3 hunks)
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (9 hunks)
- packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (16 hunks)
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (6 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (1 hunks)
- packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (7 hunks)
- packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12 hunks)
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py (8 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_chat.py (16 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (5 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (19 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (9 hunks)
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py (6 hunks)
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3 hunks)
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (13 hunks)
- packages/traceloop-sdk/tests/test_manual.py (1 hunks)
- packages/traceloop-sdk/tests/test_privacy_no_prompts.py (1 hunks)
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (2 hunks)
✅ Files skipped from review due to trivial changes (5)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
- packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
🚧 Files skipped from review as they are similar to previous changes (41)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py
- packages/traceloop-sdk/tests/test_privacy_no_prompts.py
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py
- packages/traceloop-sdk/tests/test_manual.py
- packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-ollama/tests/test_chat.py
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): `SpanAttributes` (37-249)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (2)
99-109: Token attribute migration consistently applied.

The migration pattern is consistently applied here, matching the changes in the first test function. The same consideration about `LLM_USAGE_TOTAL_TOKENS` applies here as well.

158-168: Token attribute migration consistently applied across all test functions.

The migration is consistently implemented across all three test functions, maintaining the same logical assertions while updating to the new semantic convention attribute names.
This pull request updates the token usage terminology across multiple packages and test files. The changes involve replacing `prompt_tokens` with `input_tokens` and `completion_tokens` with `output_tokens` to align with the updated naming conventions present in the documentation. Additionally, all related assertions in the tests have been modified to reflect these updates.

The PR title follows the `feat(instrumentation): ...` or `fix(instrumentation): ...` format.
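Concretely, per the Ellipsis draft comment above, the rename maps the old constant names onto the new attribute keys. A sketch of the relevant lines (surrounding class content omitted; only the two keys discussed in this PR are shown):

```python
# Sketch of the renamed constants in
# opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py.
class SpanAttributes:
    LLM_USAGE_PROMPT_TOKENS = "gen_ai.usage.input_tokens"
    LLM_USAGE_COMPLETION_TOKENS = "gen_ai.usage.output_tokens"
```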
Important

Renamed `prompt_tokens` to `input_tokens` and `completion_tokens` to `output_tokens` across various test files and packages to align with updated naming conventions.

- Renamed `prompt_tokens` to `input_tokens` and `completion_tokens` to `output_tokens` in `SpanAttributes` in `__init__.py`.
- Updated `test_completion.py`, `test_messages.py`, and `test_prompt_caching.py` to use `input_tokens` and `output_tokens`.
- Updated `test_completion.py` for the AlephAlpha, Anthropic, OpenAI, and Together packages.
- Updated `test_prompt_caching.py` for Anthropic and OpenAI to reflect the new token terminology.
- Updated `test_chat.py` and `test_completion.py` for the Together package to use the updated token names.

This description was created by Ellipsis for 3dad6f9. You can customize this summary. It will automatically update as commits are pushed.