
[Teams Chatbot] Reduce hosted agent log noise and use AAD-only Cosmos auth#16230

Merged
JiaqiZhang-Dev merged 1 commit into main from fix/agent-log-noise-and-cosmos-aad-auth on Jul 2, 2026

JiaqiZhang-Dev commented Jul 1, 2026

Summary

Resolve the issues found by the SRE agent:

  • The "insufficient_scope" 403 errors: These are thrown by microsoft.opentelemetry.a365.core.exporters.agent365_exporter (original log below). They occur because the agent lacks the Agent 365 observability permissions (doc). We do not currently need to export telemetry to Microsoft Agent 365, and granting the permission requires the Global Administrator or Application Administrator role in Microsoft Entra ID, so the error can be suppressed for now.
HTTP 403 non-retryable error. Correlation ID: 349a4d54-fc12-457c-be9b-c33acec1e63f. Response: . WWW-Authenticate: Bearer error="insufficient_scope", error_description="Required app role: Agent365.Observability.OtelWrite", scope="Agent365.Observability.OtelWrite". Response headers: {'Date': 'Wed, 01 Jul 2026 06:45:28 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'WWW-Authenticate': 'Bearer error="insufficient_scope", error_description="Required app role: Agent365.Observability.OtelWrite", scope="Agent365.Observability.OtelWrite"', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'x-ms-islandgateway': '_prdcm001csegb0_5, _prdil105eusgb0_3', 'x-ms-igw-tracking-id': 'fd972ffe-ec50-457a-9d85-b77f8a091ff320260701064527_prdcm001csegb0_5, 79c5e829-2cfa-44c8-8eb9-b997c07953a820260701064528_prdil105eusgb0_3', 'mise-correlation-id': '4302815f-d35f-405b-ae01-b5a218b42085', 'x-ms-ppapigateway': '_prdcm001csegb0_4, _prdil105eusgb0_8', 'x-ms-gateway-clusters': 'prdcm001cse, prdil105eus', 'x-servicefabric': 'NoRetry', 'x-ms-service-request-id': 'fd972ffe-ec50-457a-9d85-b77f8a091ff3', 'x-ms-correlation-id': '349a4d54-fc12-457c-be9b-c33acec1e63f', 'x-ms-activity-vector': '00.00.00', 'Server-Timing': 'x-ms-igw-upstream-headers;dur=302.2,x-ms-igw-req-overhead;dur=0.3', 'X-Content-Type-Options': 'nosniff', 'x-azure-ref': '20260701T064527Z-175fc7bccd497z65hC1GVXrbm80000000n70000000000kh2', 'X-Cache': 'CONFIG_NOCACHE'}
  • Chunk 1 of 1 failed for tenant...: Also emitted by the Microsoft Agent 365 exporter, with the same cause as the first issue, so it can be suppressed for now as well.
Chunk 1 of 1 failed for tenant xxxxx, agent xxxxx
logger_name: microsoft.opentelemetry.a365.core.exporters.agent365_exporter
  • Key Vault secret 'AZURE-COSMOSDB-KEY' not found: A known warning. Key-based auth for Cosmos DB has already been dropped, so the lookup logic can now be removed.
  • ModuleNotFoundError (A365 instrumentation): Same situation as the first issue; suppress it for now.
  • IMDS connection refused: Fixed by setting ENV OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel.
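The PR fixes the IMDS probe with a Dockerfile ENV line. For runs that do not start from that image (for example a local debug session), the same setting could be applied from Python, as in the minimal sketch below. restrict_resource_detectors and DETECTORS_VAR are hypothetical names, not part of this PR. The sketch assumes the telemetry distro reads OTEL_EXPERIMENTAL_RESOURCE_DETECTORS when it builds its resource, so it must run at the very top of the entry point, before any telemetry package is imported or configured. In the OpenTelemetry Python SDK, the "otel" detector derives resource attributes from environment variables such as OTEL_RESOURCE_ATTRIBUTES instead of probing the network.

```python
import os
from collections.abc import MutableMapping

# Hypothetical constant name; the variable itself is the one set in the Dockerfile.
DETECTORS_VAR = "OTEL_EXPERIMENTAL_RESOURCE_DETECTORS"


def restrict_resource_detectors(env: MutableMapping[str, str] = os.environ) -> str:
    """Limit resource detection to the env-var-based 'otel' detector.

    setdefault keeps any value that is already present (for example one set by
    the Dockerfile ENV), so an explicit configuration is never overridden.
    Returns the effective value.
    """
    return env.setdefault(DETECTORS_VAR, "otel")


# Assumption: this must run before any OpenTelemetry package is imported or configured.
restrict_resource_detectors()
```

Using setdefault rather than plain assignment is a design choice: the container's ENV (or an operator override) stays authoritative, and the code only fills in the value when nothing else did.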

Changes

Log noise reduction (agents/chat_agent/)

  • init.py — convert the noisy-logger suppression loop to (logger_name, level) tuples and add two entries:
    • microsoft.opentelemetry.a365.core.exporters.agent365_exporter → CRITICAL: drops the repeated insufficient_scope 403 errors (severity 3) emitted until the agent identity is granted Agent365.Observability.OtelWrite.
    • microsoft.opentelemetry._distro → ERROR: drops the benign "No module named 'agents'" warning (the openai-agents SDK is not installed / not used by this agent).
  • Dockerfile — set ENV OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel so the Azure VM resource detector no longer probes IMDS (169.254.169.254), which always fails with Connection refused in a Foundry hosted container (not a VM). Removes the failed HTTP call and its startup latency.
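The (logger_name, level) suppression described above can be sketched as follows. The actual contents of init.py are not shown in this PR: NOISY_LOGGERS and suppress_noisy_loggers are hypothetical names, and only the two entries this PR adds are listed (any pre-existing entries are omitted).

```python
import logging

# Hypothetical table mirroring the two entries this PR adds; the real list in
# init.py may contain other loggers as well.
NOISY_LOGGERS: list[tuple[str, int]] = [
    # Repeated insufficient_scope 403s until the agent identity is granted
    # Agent365.Observability.OtelWrite: keep only CRITICAL records.
    ("microsoft.opentelemetry.a365.core.exporters.agent365_exporter", logging.CRITICAL),
    # Benign "No module named 'agents'" warning: keep ERROR and above.
    ("microsoft.opentelemetry._distro", logging.ERROR),
]


def suppress_noisy_loggers(entries: list[tuple[str, int]] = NOISY_LOGGERS) -> None:
    """Raise each named logger's threshold so records below `level` are dropped."""
    for logger_name, level in entries:
        logging.getLogger(logger_name).setLevel(level)


# Call once at startup, before the telemetry distro starts emitting records.
suppress_noisy_loggers()
```

Raising a logger's level also silences descendant loggers that keep the default NOTSET level, because they inherit the ancestor's effective level. Records at or above the threshold still pass, so a genuinely critical exporter failure would remain visible.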

Cosmos DB auth (utils/azure_cosmosdb.py)

  • Remove the key-based auth path (_get_cosmos_credential() + Key Vault AZURE-COSMOSDB-KEY lookup) and the now-unused get_secret import.
  • Authenticate exclusively via get_credential() (AAD / managed identity).

Container startup logs were flooded with benign, non-actionable messages
and Cosmos DB still supported key-based auth. This cleans both up.

Log noise (agents/chat_agent):
- init.py: raise A365 exporter logger to CRITICAL to drop repeated
  insufficient_scope 403 errors (severity 3) emitted until the agent
  identity is granted Agent365.Observability.OtelWrite.
- init.py: raise microsoft.opentelemetry._distro logger to ERROR to drop
  the benign "No module named 'agents'" warning (openai-agents SDK is
  not installed / not used).
- Dockerfile: set OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel so the Azure
  VM resource detector no longer probes IMDS (169.254.169.254), which
  always fails with Connection refused in a Foundry hosted container.

Cosmos DB auth (utils/azure_cosmosdb.py):
- Remove key-based auth path (_get_cosmos_credential + Key Vault lookup);
  authenticate exclusively via get_credential() (AAD / managed identity).

Note: the A365 exporter suppression is a workaround. The root cause is a
missing Agent365.Observability.OtelWrite app-role assignment on the agent
identities, which requires a directory admin to grant.
JiaqiZhang-Dev changed the title from "Reduce hosted agent log noise and use AAD-only Cosmos auth" to "[Teams Chatbot] Reduce hosted agent log noise and use AAD-only Cosmos auth" Jul 1, 2026
JiaqiZhang-Dev marked this pull request as ready for review July 1, 2026 07:17
Copilot AI review requested due to automatic review settings July 1, 2026 07:17

Copilot AI left a comment

Pull request overview

This PR reduces operational log noise in the Teams chatbot hosted container and simplifies Cosmos DB authentication by removing the Key Vault key-based fallback in favor of AAD/managed identity only.

Changes:

  • Suppresses specific high-volume OpenTelemetry/A365-related loggers by setting per-logger severity thresholds.
  • Removes Cosmos DB key-based auth retrieval (Key Vault AZURE-COSMOSDB-KEY) and uses get_credential() exclusively.
  • Sets OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel in the chat agent container to avoid IMDS probing in hosted environments.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • tools/sdk-ai-bots/azure-sdk-qa-bot-agent/utils/azure_cosmosdb.py: Removes the Key Vault key retrieval path; the Cosmos client now uses the AAD credential only.
  • tools/sdk-ai-bots/azure-sdk-qa-bot-agent/agents/chat_agent/init.py: Refactors noisy-logger suppression to per-logger configurable levels; adds A365-related suppressions.
  • tools/sdk-ai-bots/azure-sdk-qa-bot-agent/agents/chat_agent/Dockerfile: Sets the OTEL resource detector env var to prevent IMDS/VM detector startup noise and latency.

JiaqiZhang-Dev (author) commented:

/check-enforcer override

JiaqiZhang-Dev merged commit 305f65d into main Jul 2, 2026
19 of 23 checks passed
JiaqiZhang-Dev deleted the fix/agent-log-noise-and-cosmos-aad-auth branch July 2, 2026 02:24