feat: implement structured JSON logging across services #161
Suhaskumard wants to merge 3 commits into
@Suhaskumard is attempting to deploy a commit to the s3dfx-cyber's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 Walkthrough
This PR replaces ad-hoc
Changes
Structured JSON Logging Across Services
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Hi @S3DFX-CYBER, |
Actionable comments posted: 7
🧹 Nitpick comments (2)
tests/unit/test_logging.py (2)
60-90: ⚡ Quick win: Consider adding test coverage for exception logging.
This test validates the JSON structure for normal log messages, but the `JSONFormatter` also handles exceptions via `record.exc_info` and `formatException()`. Consider adding a test case that logs with `exc_info=True` to verify the `"exception"` field is properly formatted.
💡 Suggested test addition
def test_json_formatter_with_exception(self):
    """Test that exceptions are properly formatted in JSON logs."""
    import json
    from io import StringIO
    from services.utils.logging_config import JSONFormatter

    log_stream = StringIO()
    logger = logging.getLogger("test_json_exception")
    logger.setLevel(logging.ERROR)
    logger.handlers.clear()
    handler = logging.StreamHandler(log_stream)
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)

    try:
        raise ValueError("Test exception")
    except ValueError:
        logger.error("An error occurred", exc_info=True)

    log_output = log_stream.getvalue().strip()
    log_data = json.loads(log_output)
    assert log_data.get("message") == "An error occurred"
    assert "exception" in log_data
    assert "ValueError: Test exception" in log_data["exception"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_logging.py` around lines 60 - 90, Add a new unit test to cover exception formatting by JSONFormatter: create a test (e.g., test_json_formatter_with_exception) that configures a StringIO StreamHandler with services.utils.logging_config.JSONFormatter, emits a logger.error("An error occurred", exc_info=True) inside an except ValueError block, reads the JSON output and asserts that "message" == "An error occurred" and that an "exception" key exists whose value contains "ValueError: Test exception"; this verifies JSONFormatter.formatException/record.exc_info handling.
92-115: ⚡ Quick win: Consider explicitly verifying correlation_id key absence when unset.
The test correctly validates behavior when `correlation_id_var` is unset (or set to a falsy value like `None`). However, the assertion `assert log_data.get("correlation_id") is None` passes both when the key is absent and when it's present with a `None` value.
Since the `JSONFormatter` omits the key entirely when `correlation_id` is falsy (lines 27-28 in logging_config.py check `if correlation_id:`), consider adding an explicit assertion to document this behavior.
💡 Suggested assertion addition
  log_output = log_stream.getvalue().strip()
  log_data = json.loads(log_output)
  assert log_data.get("message") == "No correlation ID here"
- assert log_data.get("correlation_id") is None
+ # Verify the correlation_id key is absent, not present with None value
+ assert "correlation_id" not in log_data
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_logging.py` around lines 92 - 115, The test_correlation_id_empty currently uses assert log_data.get("correlation_id") is None which does not distinguish between a missing key and a key set to None; update the test to explicitly assert the key is absent (e.g., assert "correlation_id" not in log_data) to match JSONFormatter's behavior for falsy correlation_id_var; reference the test function test_correlation_id_empty and the correlation_id_var/JSONFormatter behavior to locate where to change the assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/tenet_agent/tenet_solve.py:
- Around line 124-127: Replace the direct raw LLM payload logs in tenet_solve.py
(the logger.warning calls around parse_file_changes that log llm_output[:2000],
and the similar block at lines ~212-214) with metadata-only logs: log length and
a short hash/fingerprint of llm_output instead of its content, and only emit the
full payload when a secure debug flag is explicitly enabled (e.g., an env var or
module-level SECURE_DEBUG flag checked before logging the raw llm_output).
Update the code paths that reference llm_output and the logger (the
parse_file_changes logging block and the second logging block) to perform
hashing and length computation and gate full-content logging behind the secure
debug flag.
- Around line 208-217: The result of call_llm(model, code_prompt) (variable
code_output) can be None and is currently passed directly into
parse_file_changes(code_output); add a guard after the logging block that checks
if not code_output (None or empty), log an error with context via logger.error/
warning, and invoke the existing fallback flow (e.g., create/post an issue
comment or return an empty file_changes list) instead of calling
parse_file_changes; update callers that expect file_changes so they handle the
fallback path. Ensure you reference the variables and functions involved:
call_llm, code_output, parse_file_changes, and logger.
In @.github/tenet_agent/utils.py:
- Around line 244-247: The current logger.error call is printing full git
stdout/stderr which may leak secrets; change it to log the git return code
(result.returncode) and only bounded, sanitized snippets of stdout/stderr rather
than the full texts: trim to a small length (e.g. first/last ~200 chars),
collapse newlines to spaces, and apply simple redaction (mask
token/password-like patterns via regex) before logging; update the call around
the logger.error that references cmd and result so the message contains the
return code, the sanitized/truncated stdout and stderr snippets, and avoids
dumping the entire output.
In `@CONTRIBUTING.md`:
- Line 168: Update the Correlation IDs doc to state that automatic propagation
applies only when the HTTP correlation ID middleware is installed; for non-HTTP
contexts (scripts, background workers, cron jobs) contributors must manually set
and propagate the context var by calling correlation_id_var.set(<id>) at the
entry point of the task/process and ensure that any spawned threads/tasks
capture that context (e.g., set before creating threads/futures or use helper
wrappers that propagate context); mention explicitly that correlation_id_var is
the symbol to use and add a short note pointing contributors to the HTTP
middleware as the automatic case.
- Line 175: Update CONTRIBUTING.md to clarify that the logging system does not
redact secrets by default: explicitly state that JSONFormatter.format() in
services/utils/logging_config.py uses record.getMessage() and
self.formatException(record.exc_info) and returns json.dumps(payload) with no
automatic redaction, so any passwords, tokens, API keys or raw secrets passed to
logger.* will be emitted in plaintext; instruct contributors to never log
secrets, to use provided redaction utilities or a sanitizer before logging, and
to review logger calls (e.g., JSONFormatter.format, record.getMessage,
formatException) for accidental secret exposure.
In `@services/ingest/app.py`:
- Around line 136-142: The middleware correlation_id_middleware sets the context
var correlation_id_var but never resets it, risking ID leakage between requests;
update correlation_id_middleware to capture the token returned by
correlation_id_var.set(corr_id), invoke call_next(request) in a try block, and
always call correlation_id_var.reset(token) in a finally block (ensuring
response header setting remains correct and exceptions still propagate).
In `@services/utils/logging_config.py`:
- Around line 17-33: In the format method of logging_config.py (class
JSONFormatter -> def format), replace the naive timestamp construction using
datetime.utcnow() with a timezone-aware one: use
datetime.now(timezone.utc).isoformat() (ensure timezone is imported from
datetime). Make the same change in scripts/train_model.py for the trained_at and
generated_at assignments and in services/ingest/app.py for any timestamp fields
so all UTC timestamps are timezone-aware.
---
Nitpick comments:
In `@tests/unit/test_logging.py`:
- Around line 60-90: Add a new unit test to cover exception formatting by
JSONFormatter: create a test (e.g., test_json_formatter_with_exception) that
configures a StringIO StreamHandler with
services.utils.logging_config.JSONFormatter, emits a logger.error("An error
occurred", exc_info=True) inside an except ValueError block, reads the JSON
output and asserts that "message" == "An error occurred" and that an "exception"
key exists whose value contains "ValueError: Test exception"; this verifies
JSONFormatter.formatException/record.exc_info handling.
- Around line 92-115: The test_correlation_id_empty currently uses assert
log_data.get("correlation_id") is None which does not distinguish between a
missing key and a key set to None; update the test to explicitly assert the key
is absent (e.g., assert "correlation_id" not in log_data) to match
JSONFormatter's behavior for falsy correlation_id_var; reference the test
function test_correlation_id_empty and the correlation_id_var/JSONFormatter
behavior to locate where to change the assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a97c4c58-48d1-4514-9796-4a3e95f2f9ec
📒 Files selected for processing (11)
- .github/tenet_agent/tenet_review.py
- .github/tenet_agent/tenet_solve.py
- .github/tenet_agent/utils.py
- CONTRIBUTING.md
- examples/llm_plugin_demo.py
- scripts/train_model.py
- scripts/verify_model_artifacts.py
- services/analyzer/model/phishing_model.py
- services/ingest/app.py
- services/utils/logging_config.py
- tests/unit/test_logging.py
logger.warning("⚠️ parse_file_changes: no FILE blocks matched any pattern.")
logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───")
logger.warning(llm_output[:2000])
logger.warning("──────────────────────────────────────────")
Avoid logging raw LLM output payloads directly.
At Line 124–127 and Line 212–214, dumping raw model output to logs can expose sensitive code/content in CI artifacts. Log metadata (length/hash) instead, and gate full payload behind an explicit secure debug flag.
Suggested fix
-    logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───")
-    logger.warning(llm_output[:2000])
-    logger.warning("──────────────────────────────────────────")
+    logger.warning(
+        "parse_file_changes: no FILE blocks matched; output_len=%d",
+        len(llm_output or ""),
+    )
@@
-    logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────")
-    logger.debug((code_output or "")[:500])
-    logger.debug("─────────────────────────────────────────")
+    if os.getenv("TENET_DEBUG_LLM_OUTPUT") == "1":
+        logger.debug("LLM output preview (first 500 chars): %r", (code_output or "")[:500])
+    else:
+        logger.debug("LLM output received; length=%d", len(code_output or ""))
Also applies to: 212-214
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/tenet_agent/tenet_solve.py around lines 124 - 127, Replace the
direct raw LLM payload logs in tenet_solve.py (the logger.warning calls around
parse_file_changes that log llm_output[:2000], and the similar block at lines
~212-214) with metadata-only logs: log length and a short hash/fingerprint of
llm_output instead of its content, and only emit the full payload when a secure
debug flag is explicitly enabled (e.g., an env var or module-level SECURE_DEBUG
flag checked before logging the raw llm_output). Update the code paths that
reference llm_output and the logger (the parse_file_changes logging block and
the second logging block) to perform hashing and length computation and gate
full-content logging behind the secure debug flag.
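The prompt above asks for a length plus a short hash, which the suggested diff does not show. A minimal sketch of that idea follows, assuming SHA-256 as the fingerprint and reusing the `TENET_DEBUG_LLM_OUTPUT` variable name from the suggested fix; the helper names `describe_llm_output` and `log_llm_output` are hypothetical and not part of the PR.

```python
import hashlib
import logging
import os

logger = logging.getLogger("tenet_agent.sketch")

# Flag name taken from the suggested fix; treating "1" as "enabled" is an assumption.
DEBUG_FLAG = "TENET_DEBUG_LLM_OUTPUT"


def describe_llm_output(llm_output):
    """Return the length and a short SHA-256 fingerprint, never the content."""
    text = llm_output or ""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"length": len(text), "sha256_prefix": digest[:12]}


def log_llm_output(llm_output):
    """Always log metadata; log a bounded preview only when the flag is set."""
    meta = describe_llm_output(llm_output)
    logger.warning(
        "LLM output received; length=%d sha256_prefix=%s",
        meta["length"],
        meta["sha256_prefix"],
    )
    if os.getenv(DEBUG_FLAG) == "1":
        logger.debug("LLM output preview: %r", (llm_output or "")[:500])
    return meta
```

The fingerprint lets two CI runs be compared for identical model output without storing the payload itself in log artifacts.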
     code_output = call_llm(model, code_prompt)
-    print("✍️ Code generation complete.")
+    logger.info("✍️ Code generation complete.")

     # ── Debug: log raw output header to aid future parse failures ─────────────
-    print("─── RAW LLM OUTPUT (first 500 chars) ────")
-    print((code_output or "")[:500])
-    print("─────────────────────────────────────────")
+    logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────")
+    logger.debug((code_output or "")[:500])
+    logger.debug("─────────────────────────────────────────")

     # ── Parse file changes ─────────────────────────────────────────────────────
     file_changes = parse_file_changes(code_output)
Guard code_output before parsing to prevent crash on LLM failures.
call_llm() can return None; at Line 217, parse_file_changes(code_output) then dereferences non-string input and can fail before posting a fallback issue comment.
Suggested fix
     code_output = call_llm(model, code_prompt)
     logger.info("✍️ Code generation complete.")
+    if not code_output:
+        post_issue_comment(
+            repo,
+            issue_number,
+            f"## 🤖 TENET Agent - Fix Generation Failed\n\n"
+            f"TENET Agent could not generate a code response for issue #{issue_number}.\n\n"
+            f"Please retry or review workflow logs.\n\n---\n*TENET Agent 🛡️*",
+        )
+        logger.error("❌ Code generation returned empty/failed response.")
+        sys.exit(1)
     # ── Debug: log raw output header to aid future parse failures ─────────────
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/tenet_agent/tenet_solve.py around lines 208 - 217, The result of
call_llm(model, code_prompt) (variable code_output) can be None and is currently
passed directly into parse_file_changes(code_output); add a guard after the
logging block that checks if not code_output (None or empty), log an error with
context via logger.error/ warning, and invoke the existing fallback flow (e.g.,
create/post an issue comment or return an empty file_changes list) instead of
calling parse_file_changes; update callers that expect file_changes so they
handle the fallback path. Ensure you reference the variables and functions
involved: call_llm, code_output, parse_file_changes, and logger.
logger.error(
    f"Git command failed: {' '.join(cmd)}\n"
    f"Stdout: {result.stdout}\nStderr: {result.stderr}"
)
Redact or limit git stdout/stderr in error logs.
At Line 244–247, logging full git stdout/stderr risks leaking credentials or sensitive repo content in CI logs. Log return code + bounded/sanitized snippets instead.
Suggested fix
     result = subprocess.run(cmd, capture_output=True, text=True)
     if result.returncode != 0:
-        logger.error(
-            f"Git command failed: {' '.join(cmd)}\n"
-            f"Stdout: {result.stdout}\nStderr: {result.stderr}"
-        )
+        safe_stdout = (result.stdout or "")[:500]
+        safe_stderr = (result.stderr or "")[:500]
+        logger.error(
+            "Git command failed: %s (code=%s) stdout=%r stderr=%r",
+            " ".join(cmd[:2]),  # avoid logging full args
+            result.returncode,
+            safe_stdout,
+            safe_stderr,
+        )
         raise RuntimeError(f"git command failed: {result.stderr}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/tenet_agent/utils.py around lines 244 - 247, The current
logger.error call is printing full git stdout/stderr which may leak secrets;
change it to log the git return code (result.returncode) and only bounded,
sanitized snippets of stdout/stderr rather than the full texts: trim to a small
length (e.g. first/last ~200 chars), collapse newlines to spaces, and apply
simple redaction (mask token/password-like patterns via regex) before logging;
update the call around the logger.error that references cmd and result so the
message contains the return code, the sanitized/truncated stdout and stderr
snippets, and avoids dumping the entire output.
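The suggested diff only truncates, while the prompt also mentions collapsing newlines and masking token-like patterns. One way this could look is sketched below; `sanitize_output` is a hypothetical helper, and the two regular expressions are illustrative assumptions that will not catch every secret format.

```python
import re

# Illustrative patterns only (an assumption of this sketch); real secrets take more shapes.
_REDACTIONS = [
    # key=value or key: value pairs such as "token=abc123"
    (re.compile(r"(?i)\b(token|password|passwd|secret|api[_-]?key)\b(\s*[=:]\s*)\S+"), r"\1\2***"),
    # credentials embedded in URLs, e.g. https://user:pass@host
    (re.compile(r"(https?://)[^/\s:@]+:[^/\s@]+@"), r"\1***@"),
]


def sanitize_output(text, limit=200):
    """Collapse whitespace, mask secret-like substrings, then truncate."""
    flat = " ".join((text or "").split())
    for pattern, replacement in _REDACTIONS:
        flat = pattern.sub(replacement, flat)
    if len(flat) > limit:
        flat = flat[:limit] + "...[truncated]"
    return flat
```

Masking runs before truncation so that a secret straddling the cut-off point is not left partially visible. Note that the `raise RuntimeError(...)` line in the suggested fix still embeds the full `result.stderr`, so the same helper would probably need to be applied there as well.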
| from services.utils.logging_config import setup_logging | ||
| logger = setup_logging(__name__) | ||
| ``` | ||
| - **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers. |
Clarify correlation ID propagation for non-HTTP contexts.
The documentation states correlation IDs are "automatically handled," but this is only true for HTTP services with middleware. From the context, only services/ingest/app.py has correlation ID middleware. Scripts, background workers, and other non-HTTP services would need to manually set correlation_id_var if they want correlation IDs in their logs. Contributors implementing logging in these contexts may be confused by the "automatic" claim.
📝 Suggested clarification
-- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.
+- **Correlation IDs**: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CONTRIBUTING.md` at line 168, Update the Correlation IDs doc to state that
automatic propagation applies only when the HTTP correlation ID middleware is
installed; for non-HTTP contexts (scripts, background workers, cron jobs)
contributors must manually set and propagate the context var by calling
correlation_id_var.set(<id>) at the entry point of the task/process and ensure
that any spawned threads/tasks capture that context (e.g., set before creating
threads/futures or use helper wrappers that propagate context); mention
explicitly that correlation_id_var is the symbol to use and add a short note
pointing contributors to the HTTP middleware as the automatic case.
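For the non-HTTP case described above, a rough sketch of setting the ID at a job's entry point and carrying it into a worker thread is shown here. It assumes `correlation_id_var` is a `contextvars.ContextVar` (the review refers to it as a context var with `.set()` and `.reset()`); the local variable below is a stand-in for the real one in `services.utils.logging_config`, and `run_job` and `do_work` are hypothetical names.

```python
import contextvars
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for services.utils.logging_config.correlation_id_var (assumed ContextVar).
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


def do_work():
    """Pretend unit of work; returns the ID a log formatter would see."""
    return correlation_id_var.get()


def run_job():
    """Set the ID at the entry point and hand the context to a worker thread."""
    token = correlation_id_var.set(str(uuid.uuid4()))
    try:
        # Snapshot the current context so the worker sees the same ID.
        ctx = contextvars.copy_context()
        with ThreadPoolExecutor(max_workers=1) as pool:
            propagated = pool.submit(ctx.run, do_work).result()
        return correlation_id_var.get(), propagated
    finally:
        # Restore the previous value so later jobs start clean.
        correlation_id_var.reset(token)
```

Taking the snapshot explicitly with `copy_context()` avoids relying on whether a new thread inherits the caller's context, which is not something the sketch assumes.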
- `WARNING`: Recoverable or unexpected situations (e.g., rate limit exceeded).
- `ERROR`: Failed operations (e.g., API call failed).
- `CRITICAL`: Severe failures affecting application availability (e.g., database connection lost).
- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Confirm JSONFormatter emits getMessage() and formatException() without redaction.
# Search for redaction, masking, or filtering logic in JSONFormatter
rg -n -A5 -B5 'class JSONFormatter' services/utils/logging_config.py
# Search for any sanitize/redact functions in logging_config.py
rg -n 'def.*(redact|sanitize|mask|filter)' services/utils/logging_config.py
Repository: TENET-DEV-AI/TENET-AI
Length of output: 529
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="services/utils/logging_config.py"
echo "== File: $file =="
wc -l "$file"
echo "== JSONFormatter definition =="
rg -n "class JSONFormatter" "$file" -n
echo "== JSONFormatter.format() body (expanded context) =="
line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1)
# Print from a little before class through some lines after format()
start=$((line-10))
end=$((line+120))
sed -n "${start},${end}p" "$file" | nl -ba
echo "== Any redaction/sanitization/masking helpers in this file =="
rg -n "(redact|sanitize|mask|filter)" "$file" || true
echo "== Any use of record.getMessage()/exception formatting in JSONFormatter.format() =="
rg -n "getMessage\\(|formatException\\(" "$file" || true
Repository: TENET-DEV-AI/TENET-AI
Length of output: 335
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="services/utils/logging_config.py"
echo "== File: $file =="
wc -l "$file"
echo "== JSONFormatter definition =="
rg -n "class JSONFormatter" "$file"
line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1)
echo "== JSONFormatter.format() body (expanded) =="
start=$((line-10))
end=$((line+180))
# Use cat -n (available) to preserve line numbers
sed -n "${start},${end}p" "$file" | cat -n
echo "== Redaction/sanitization/masking helpers in this file =="
rg -n "(redact|sanitize|mask|filter)" "$file" || true
echo "== Uses of record.getMessage() / formatException() in this file =="
rg -n "getMessage\\(|formatException\\(" "$file" || true
echo "== Any logging framework-wide redaction mention (repo) =="
rg -n "(redact|sanitize|mask|filter)" --glob='*.py' . || true
Repository: TENET-DEV-AI/TENET-AI
Length of output: 3748
Clarify that logging does not automatically redact secrets (CONTRIBUTING.md)
services/utils/logging_config.py’s JSONFormatter.format() sets "message": record.getMessage() and "exception": self.formatException(record.exc_info) and returns json.dumps(payload) with no redaction/masking, so secrets passed to logger.* will be written in plaintext. Update the guidance accordingly.
🛡️ Suggested documentation enhancement
-- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.
+- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CONTRIBUTING.md` at line 175, Update CONTRIBUTING.md to clarify that the
logging system does not redact secrets by default: explicitly state that
JSONFormatter.format() in services/utils/logging_config.py uses
record.getMessage() and self.formatException(record.exc_info) and returns
json.dumps(payload) with no automatic redaction, so any passwords, tokens, API
keys or raw secrets passed to logger.* will be emitted in plaintext; instruct
contributors to never log secrets, to use provided redaction utilities or a
sanitizer before logging, and to review logger calls (e.g.,
JSONFormatter.format, record.getMessage, formatException) for accidental secret
exposure.
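The prompt mentions a sanitizer, but the analysis above found no redaction helper in `logging_config.py`. If one were wanted, a standard-library `logging.Filter` is one possible shape. The sketch below is an assumption, not something in the PR: `RedactingFilter` and its single regex are hypothetical, and it only touches the rendered message, not exception tracebacks.

```python
import io
import logging
import re

# Illustrative pattern (assumption): masks "password=..." style pairs only.
_SECRET_RE = re.compile(r"(?i)\b(password|token|api[_-]?key|secret)\b(\s*[=:]\s*)\S+")


class RedactingFilter(logging.Filter):
    """Hypothetical filter that masks key=value secrets in the rendered message."""

    def filter(self, record):
        record.msg = _SECRET_RE.sub(r"\1\2***", record.getMessage())
        record.args = None  # the message is already rendered
        return True


def demo():
    """Emit one record through the filter and return what the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactingFilter())
    log = logging.getLogger("redaction_sketch")
    log.handlers.clear()
    log.propagate = False
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.info("login failed for %s password=%s", "alice", "hunter2")
    return stream.getvalue().strip()
```

A filter like this is a safety net at best; the documented rule of never passing secrets to `logger.*` remains the primary control.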
@app.middleware("http")
async def correlation_id_middleware(request: Request, call_next):
    corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    correlation_id_var.set(corr_id)
    response = await call_next(request)
    response.headers["X-Correlation-ID"] = corr_id
    return response
Context variable cleanup is missing—correlation IDs may leak between requests.
The middleware sets correlation_id_var but never resets it. In async contexts with connection pooling or when exceptions occur, this can cause correlation IDs to leak into subsequent unrelated requests, breaking request tracing.
The proper pattern is to capture the token from .set() and reset it in a finally block.
🔒 Proposed fix with proper cleanup
 @app.middleware("http")
 async def correlation_id_middleware(request: Request, call_next):
     corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
-    correlation_id_var.set(corr_id)
-    response = await call_next(request)
-    response.headers["X-Correlation-ID"] = corr_id
-    return response
+    token = correlation_id_var.set(corr_id)
+    try:
+        response = await call_next(request)
+        response.headers["X-Correlation-ID"] = corr_id
+        return response
+    finally:
+        correlation_id_var.reset(token)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/ingest/app.py` around lines 136 - 142, The middleware
correlation_id_middleware sets the context var correlation_id_var but never
resets it, risking ID leakage between requests; update correlation_id_middleware
to capture the token returned by correlation_id_var.set(corr_id), invoke
call_next(request) in a try block, and always call
correlation_id_var.reset(token) in a finally block (ensuring response header
setting remains correct and exceptions still propagate).
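The set/reset pattern can be exercised without a web framework. The sketch below uses a local stand-in for `correlation_id_var` (assumed to be a `ContextVar`) and a hypothetical `handle` coroutine in place of the middleware, to show that the value is restored even when the handler raises.

```python
import asyncio
import contextvars

# Stand-in for the real correlation_id_var (assumed ContextVar with a falsy default).
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


async def handle(corr_id, fail=False):
    """Mimics the middleware body: set, do work, always reset."""
    token = correlation_id_var.set(corr_id)
    try:
        await asyncio.sleep(0)  # placeholder for call_next(request)
        if fail:
            raise RuntimeError("handler failed")
        return correlation_id_var.get()
    finally:
        correlation_id_var.reset(token)


async def main():
    """Run one successful and one failing handler in the same task."""
    seen = await handle("req-1")
    try:
        await handle("req-2", fail=True)
    except RuntimeError:
        pass
    # Both calls ran in this task's context; the reset leaves no ID behind.
    return seen, correlation_id_var.get()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because both `handle` calls are awaited directly, they share one context, which is the situation where a missing reset would leave `"req-2"` visible afterwards.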
def format(self, record: logging.LogRecord) -> str:
    payload: Dict[str, Any] = {
        "timestamp": f"{datetime.utcnow().isoformat()}Z",
        "level": record.levelname,
        "logger": record.name,
        "message": record.getMessage(),
    }

    # Inject correlation ID if available
    correlation_id = correlation_id_var.get()
    if correlation_id:
        payload["correlation_id"] = correlation_id

    if record.exc_info:
        payload["exception"] = self.formatException(record.exc_info)

    return json.dumps(payload)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check Python version and find deprecated datetime.utcnow() usage
echo "=== Python version from project config ==="
fd -t f 'pyproject.toml|.python-version|.tool-versions|Dockerfile' --exec cat {}
echo -e "\n=== All datetime.utcnow() usages ==="
rg -n 'datetime\.utcnow\(\)' --type=py
Repository: TENET-DEV-AI/TENET-AI
Length of output: 1480
Replace datetime.utcnow() with datetime.now(timezone.utc) for UTC timestamps.
datetime.utcnow() returns a naive datetime (no tzinfo); switch to datetime.now(timezone.utc).isoformat() for timezone-aware UTC. Apply the same change in the other matches as well:
- scripts/train_model.py (trained_at, generated_at)
- services/ingest/app.py (timestamp fields)
- services/utils/logging_config.py (timestamp)
🧰 Tools
🪛 ast-grep (0.43.0)
[info] 32-32: use jsonify instead of json.dumps for JSON output
Context: json.dumps(payload)
Note: Security best practice.
(use-jsonify)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/utils/logging_config.py` around lines 17 - 33, In the format method
of logging_config.py (class JSONFormatter -> def format), replace the naive
timestamp construction using datetime.utcnow() with a timezone-aware one: use
datetime.now(timezone.utc).isoformat() (ensure timezone is imported from
datetime). Make the same change in scripts/train_model.py for the trained_at and
generated_at assignments and in services/ingest/app.py for any timestamp fields
so all UTC timestamps are timezone-aware.
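One detail worth checking when applying this change: the formatter currently appends a literal `Z` inside an f-string, while `isoformat()` on a timezone-aware UTC datetime already ends in `+00:00`, so a direct swap would produce a `+00:00Z` suffix. A minimal sketch of one way to keep the `Z` style is below; `utc_timestamp` is a hypothetical helper name, and its `now` parameter is assumed to be timezone-aware when supplied.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Return an ISO 8601 UTC timestamp ending in 'Z'.

    `now` is expected to be timezone-aware; it defaults to the current time.
    """
    moment = now or datetime.now(timezone.utc)
    # isoformat() on an aware UTC datetime ends in "+00:00"; swap it for "Z".
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
```

For example, `utc_timestamp(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))` returns `"2024-01-02T03:04:05Z"`. Keeping the `+00:00` suffix instead is equally valid ISO 8601, as long as the f-string's trailing `Z` is dropped.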
Summary
Implemented project-wide structured JSON logging by replacing ad-hoc print() statements with the centralized logging framework. Added unit tests to validate JSON log formatting and correlation ID handling, improving observability, consistency, and reliability across TENET-AI services.
Key Changes
- print() statements with structured logger calls
Related Issue
Fixes #99
Type of Change
Changes Made
Logging Infrastructure
- print() statements with structured logger events
GitHub Agent Utilities
- tenet_solve.py
- utils.py
Machine Learning Services
- scripts/train_model.py
- scripts/verify_model_artifacts.py
- services/analyzer/model/phishing_model.py
Example Applications
- examples/llm_plugin_demo.py
Testing
Added new unit tests in tests/unit/test_logging.py:
- test_json_formatter_structure
  - Validates generated logs are valid JSON
  - Verifies required fields:
- test_correlation_id_empty
  - null is emitted instead of causing failures
Documentation
- CONTRIBUTING.md
Files Modified
- utils/logging_config.py
- tenet_solve.py
- utils.py
- scripts/train_model.py
- scripts/verify_model_artifacts.py
- services/analyzer/model/phishing_model.py
- examples/llm_plugin_demo.py
- tests/unit/test_logging.py
- CONTRIBUTING.md
Screenshots / Logs
1. Structured JSON Log Output
Structured JSON logs generated through the centralized logging framework.
2. Logging Configuration
Centralized JSON logging configuration used across TENET-AI services.
3. Updated Service Logging
Migration from print statements to structured logger calls.
4. Unit Tests Passing
Successful execution of pytest tests/unit/test_logging.py verifying JSON formatting and correlation ID handling.
How Has This Been Tested?
Validation Performed
- print() statements were replaced
- Confirmed all logging tests pass successfully
- Validated JSON formatter structure
- Validated correlation ID fallback behavior
- Verified logging functionality in:
Checklist
Summary by cubic
Implemented project-wide structured JSON logging with correlation IDs, replacing ad-hoc prints to improve observability across services. Expanded middleware, formatter, tests, and docs to standardize behavior end-to-end.
New Features
- services.utils.logging_config with JSONFormatter and correlation_id_var.
- X-Correlation-ID on requests and responses.
- .github/tenet_agent/*) now use setup_logging.
- CONTRIBUTING.md.
Refactors
- print() with logger calls in TENET Agent tools, example app, training/verify scripts, phishing model, and ingest app.
- setup_logging.
Written for commit 60fd3ed. Summary will update on new commits.
Summary by CodeRabbit
Documentation
Tests
Chores