Skip to content

feat: implement structured JSON logging across services#161

Open
Suhaskumard wants to merge 3 commits into
TENET-DEV-AI:mainfrom
Suhaskumard:feature/structured-json-logging
Open

feat: implement structured JSON logging across services#161
Suhaskumard wants to merge 3 commits into
TENET-DEV-AI:mainfrom
Suhaskumard:feature/structured-json-logging

Conversation

@Suhaskumard

@Suhaskumard Suhaskumard commented Jun 11, 2026

Copy link
Copy Markdown

🤖 TENET Agent will automatically review this PR for security issues and code quality.
Maintainers: to have TENET solve an issue autonomously, comment /tenet fix on the issue.


Summary

Implemented project-wide structured JSON logging by replacing ad-hoc print() statements with the centralized logging framework. Added unit tests to validate JSON log formatting and correlation ID handling, improving observability, consistency, and reliability across TENET-AI services.

Key Changes

  • Replaced legacy print() statements with structured logger calls
  • Standardized JSON logging across services, scripts, utilities, and examples
  • Added unit tests for JSON formatter validation
  • Added unit tests for correlation ID handling
  • Updated contributor documentation with logging standards

Related Issue

Fixes #99


Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • CI/CD Improvement
  • Added Tests

Changes Made

Logging Infrastructure

  • Enhanced centralized JSON logging configuration
  • Replaced direct print() statements with structured logger events
  • Standardized logging behavior across project modules
  • Improved compatibility with log aggregation and monitoring systems

GitHub Agent Utilities

  • Refactored tenet_solve.py
  • Refactored utils.py
  • Integrated centralized logger usage

Machine Learning Services

  • Updated scripts/train_model.py
  • Updated scripts/verify_model_artifacts.py
  • Updated services/analyzer/model/phishing_model.py
  • Converted training and verification output to structured logs

Example Applications

  • Updated examples/llm_plugin_demo.py
  • Replaced console output with logger calls

Testing

Added new unit tests in tests/unit/test_logging.py:

  • test_json_formatter_structure

    • Validates generated logs are valid JSON

    • Verifies required fields:

      • logger
      • level
      • message
      • timestamp
      • correlation_id
  • test_correlation_id_empty

    • Verifies missing correlation IDs are safely handled
    • Confirms JSON output remains valid
    • Ensures null is emitted instead of causing failures

Documentation

  • Added Logging Standards section to CONTRIBUTING.md
  • Documented approved logging practices
  • Added guidance for future contributors

Files Modified

File Purpose
utils/logging_config.py Centralized JSON logging configuration
tenet_solve.py Replaced print statements
utils.py Integrated structured logging
scripts/train_model.py Added JSON logging
scripts/verify_model_artifacts.py Added JSON logging
services/analyzer/model/phishing_model.py Updated logging implementation
examples/llm_plugin_demo.py Replaced print outputs
tests/unit/test_logging.py Added JSON logging unit tests
CONTRIBUTING.md Added logging standards

Screenshots / Logs

1. Structured JSON Log Output

image image

Structured JSON logs generated through the centralized logging framework.

2. Logging Configuration

image image

Centralized JSON logging configuration used across TENET-AI services.

3. Updated Service Logging

image image

Migration from print statements to structured logger calls.

4. Unit Tests Passing

image image

Successful execution of pytest tests/unit/test_logging.py verifying JSON formatting and correlation ID handling.


How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

Validation Performed

  • Verified all identified print() statements were replaced
  • Confirmed logger initialization across updated modules
  • Verified structured JSON output formatting
  • Executed:
pytest tests/unit/test_logging.py
  • Confirmed all logging tests pass successfully

  • Validated JSON formatter structure

  • Validated correlation ID fallback behavior

  • Verified logging functionality in:

    • GitHub agent utilities
    • ML training scripts
    • Model verification scripts
    • Phishing detection model
    • Example applications

Checklist

  • My code follows the project's code style
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes


Summary by cubic

Implemented project-wide structured JSON logging with correlation IDs, replacing ad-hoc prints to improve observability across services. Expanded middleware, formatter, tests, and docs to standardize behavior end-to-end.

  • New Features

    • Centralized JSON logger via services.utils.logging_config with JSONFormatter and correlation_id_var.
    • Ingest middleware sets and propagates X-Correlation-ID on requests and responses.
    • Agents and utilities (.github/tenet_agent/*) now use setup_logging.
    • Unit tests expanded for JSON structure and correlation ID handling; updated logging config tests.
    • Added logging standards to CONTRIBUTING.md.
  • Refactors

    • Replaced print() with logger calls in TENET Agent tools, example app, training/verify scripts, phishing model, and ingest app.
    • Standardized log levels; removed custom ingest JSON formatter in favor of setup_logging.

Written for commit 60fd3ed. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • Documentation

    • Added Logging Standards section to contribution guidelines, requiring structured JSON logging and forbidding ad-hoc console output.
  • Tests

    • Expanded test coverage for JSON logging structure and correlation ID validation.
  • Chores

    • Implemented structured JSON logging across workflows and utility modules for improved request traceability and debugging capabilities.

@vercel

vercel Bot commented Jun 11, 2026

Copy link
Copy Markdown

@Suhaskumard is attempting to deploy a commit to the s3dfx-cyber's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@Suhaskumard, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 39 minutes and 27 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: afa08c7b-e321-43e1-be48-112937ce8df2

📥 Commits

Reviewing files that changed from the base of the PR and between 08b2bf1 and 60fd3ed.

📒 Files selected for processing (1)
  • tests/unit/test_logging_config.py
📝 Walkthrough

Walkthrough

This PR replaces ad-hoc print() statements throughout the codebase with structured JSON logging. A new JSONFormatter emits correlated logs with timestamps, severity levels, and context; HTTP middleware propagates correlation IDs across requests; and all scripts, agents, and examples adopt centralized logging via setup_logging(). Logging standards are now documented in the contributor guide.

Changes

Structured JSON Logging Across Services

Layer / File(s) Summary
JSON logging infrastructure and correlation IDs
services/utils/logging_config.py
Core JSONFormatter emits structured JSON with timestamp, level, logger name, message, correlation ID, and exception details. Module-level correlation_id_var holds per-request/async-context correlation IDs. setup_logging() now applies JSONFormatter to all handlers.
Correlation ID middleware and JSON logging tests
services/ingest/app.py, tests/unit/test_logging.py
HTTP middleware reads X-Correlation-ID header (or generates UUID), stores in correlation_id_var, forwards request, and echoes correlation ID in response. Test suite validates JSON formatter structure, presence of required fields, and correlation ID handling when set or unset.
TENET agent workflow scripts refactoring
.github/tenet_agent/tenet_review.py, .github/tenet_agent/tenet_solve.py, .github/tenet_agent/utils.py
Review workflow logs environment setup, PR diff fetching, LLM analysis start/completion, and security label success/failure. Solve workflow logs issue metadata, repo scanning, code-generation phases with debug output preview, and PR creation status. Utilities log environment errors, LLM failures, GitHub comment posts, and git command errors with appropriate severity.
Supporting scripts and examples refactoring
scripts/train_model.py, scripts/verify_model_artifacts.py, services/analyzer/model/phishing_model.py, examples/llm_plugin_demo.py
Model training, artifact verification, phishing detector, and LLM plugin demo convert all console output and error reporting from print() to structured logging. Validation logic and test/demo flow remain unchanged; only output mechanism delegates to centralized logger.
Logging standards documentation
CONTRIBUTING.md
New "Logging Standards" subsection specifies structured JSON logging requirements, forbids ad-hoc print() usage, directs contributors to use setup_logging(), documents automatic correlation ID handling, defines severity levels, and prohibits logging sensitive data.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • TENET-DEV-AI/TENET-AI#90: Extends unit test coverage for setup_logging behavior, handler initialization, and logger defaults alongside the new JSON formatter and correlation ID tests in this PR.
  • TENET-DEV-AI/TENET-AI#45: Introduced the TENET agent reviewer/solver scripts (.github/tenet_agent/*) that this PR refactors to use centralized structured logging instead of print().
  • TENET-DEV-AI/TENET-AI#35: Introduced the initial services/utils/logging_config.py module and corresponding unit tests that this PR extends with JSONFormatter, correlation_id_var, and correlation ID middleware integration.

Suggested labels

SSoC26, hard

Suggested reviewers

  • S3DFX-CYBER

Poem

🐰 Hop along the structured way,
JSON logs now save the day,
Correlation IDs trace each flow,
Debugging's easier, don't you know?
Print() is gone—the logger's here,
Production secrets now so clear! 🚀

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: implementing structured JSON logging across the entire codebase, which aligns with the file modifications and objectives.
Linked Issues check ✅ Passed All coding objectives from issue #99 are met: print() statements replaced with logging calls, structured JSON format implemented, correlation IDs added via context variables, severity levels implemented, centralized logging established, and contributor documentation updated.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #99 objectives. No out-of-scope modifications detected; changes are limited to logging infrastructure updates and print-to-logger migrations across specified modules.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description is comprehensive and well-structured, following the required template with all major sections completed including Summary, Key Changes, Related Issue, Type of Change, detailed Changes Made breakdown, Files Modified table, Screenshots/Logs, Testing methodology, and Checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Suhaskumard

Copy link
Copy Markdown
Author

Hi @S3DFX-CYBER,
I have solved the issue if any changes kindly let me know

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Re-trigger cubic

@S3DFX-CYBER S3DFX-CYBER requested a review from Preetham404 June 11, 2026 11:29

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (2)
tests/unit/test_logging.py (2)

60-90: ⚡ Quick win

Consider adding test coverage for exception logging.

This test validates the JSON structure for normal log messages, but the JSONFormatter also handles exceptions via record.exc_info and formatException(). Consider adding a test case that logs with exc_info=True to verify the "exception" field is properly formatted.

💡 Suggested test addition
def test_json_formatter_with_exception(self):
    """Test that exceptions are properly formatted in JSON logs."""
    import json
    from io import StringIO
    from services.utils.logging_config import JSONFormatter
    
    log_stream = StringIO()
    logger = logging.getLogger("test_json_exception")
    logger.setLevel(logging.ERROR)
    logger.handlers.clear()
    
    handler = logging.StreamHandler(log_stream)
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)
    
    try:
        raise ValueError("Test exception")
    except ValueError:
        logger.error("An error occurred", exc_info=True)
    
    log_output = log_stream.getvalue().strip()
    log_data = json.loads(log_output)
    
    assert log_data.get("message") == "An error occurred"
    assert "exception" in log_data
    assert "ValueError: Test exception" in log_data["exception"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_logging.py` around lines 60 - 90, Add a new unit test to
cover exception formatting by JSONFormatter: create a test (e.g.,
test_json_formatter_with_exception) that configures a StringIO StreamHandler
with services.utils.logging_config.JSONFormatter, emits a logger.error("An error
occurred", exc_info=True) inside an except ValueError block, reads the JSON
output and asserts that "message" == "An error occurred" and that an "exception"
key exists whose value contains "ValueError: Test exception"; this verifies
JSONFormatter.formatException/record.exc_info handling.

92-115: ⚡ Quick win

Consider explicitly verifying correlation_id key absence when unset.

The test correctly validates behavior when correlation_id_var is unset (or set to a falsy value like None). However, the assertion assert log_data.get("correlation_id") is None passes both when the key is absent and when it's present with a None value.

Since the JSONFormatter omits the key entirely when correlation_id is falsy (line 27-28 in logging_config.py check if correlation_id:), consider adding an explicit assertion to document this behavior.

💡 Suggested assertion addition
     log_output = log_stream.getvalue().strip()
     log_data = json.loads(log_output)
     
     assert log_data.get("message") == "No correlation ID here"
-    assert log_data.get("correlation_id") is None
+    # Verify the correlation_id key is absent, not present with None value
+    assert "correlation_id" not in log_data
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_logging.py` around lines 92 - 115, The
test_correlation_id_empty currently uses assert log_data.get("correlation_id")
is None which does not distinguish between a missing key and a key set to None;
update the test to explicitly assert the key is absent (e.g., assert
"correlation_id" not in log_data) to match JSONFormatter's behavior for falsy
correlation_id_var; reference the test function test_correlation_id_empty and
the correlation_id_var/JSONFormatter behavior to locate where to change the
assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/tenet_agent/tenet_solve.py:
- Around line 124-127: Replace the direct raw LLM payload logs in tenet_solve.py
(the logger.warning calls around parse_file_changes that log llm_output[:2000],
and the similar block at lines ~212-214) with metadata-only logs: log length and
a short hash/fingerprint of llm_output instead of its content, and only emit the
full payload when a secure debug flag is explicitly enabled (e.g., an env var or
module-level SECURE_DEBUG flag checked before logging the raw llm_output).
Update the code paths that reference llm_output and the logger (the
parse_file_changes logging block and the second logging block) to perform
hashing and length computation and gate full-content logging behind the secure
debug flag.
- Around line 208-217: The result of call_llm(model, code_prompt) (variable
code_output) can be None and is currently passed directly into
parse_file_changes(code_output); add a guard after the logging block that checks
if not code_output (None or empty), log an error with context via logger.error/
warning, and invoke the existing fallback flow (e.g., create/post an issue
comment or return an empty file_changes list) instead of calling
parse_file_changes; update callers that expect file_changes so they handle the
fallback path. Ensure you reference the variables and functions involved:
call_llm, code_output, parse_file_changes, and logger.

In @.github/tenet_agent/utils.py:
- Around line 244-247: The current logger.error call is printing full git
stdout/stderr which may leak secrets; change it to log the git return code
(result.returncode) and only bounded, sanitized snippets of stdout/stderr rather
than the full texts: trim to a small length (e.g. first/last ~200 chars),
collapse newlines to spaces, and apply simple redaction (mask
token/password-like patterns via regex) before logging; update the call around
the logger.error that references cmd and result so the message contains the
return code, the sanitized/truncated stdout and stderr snippets, and avoids
dumping the entire output.

In `@CONTRIBUTING.md`:
- Line 168: Update the Correlation IDs doc to state that automatic propagation
applies only when the HTTP correlation ID middleware is installed; for non-HTTP
contexts (scripts, background workers, cron jobs) contributors must manually set
and propagate the context var by calling correlation_id_var.set(<id>) at the
entry point of the task/process and ensure that any spawned threads/tasks
capture that context (e.g., set before creating threads/futures or use helper
wrappers that propagate context); mention explicitly that correlation_id_var is
the symbol to use and add a short note pointing contributors to the HTTP
middleware as the automatic case.
- Line 175: Update CONTRIBUTING.md to clarify that the logging system does not
redact secrets by default: explicitly state that JSONFormatter.format() in
services/utils/logging_config.py uses record.getMessage() and
self.formatException(record.exc_info) and returns json.dumps(payload) with no
automatic redaction, so any passwords, tokens, API keys or raw secrets passed to
logger.* will be emitted in plaintext; instruct contributors to never log
secrets, to use provided redaction utilities or a sanitizer before logging, and
to review logger calls (e.g., JSONFormatter.format, record.getMessage,
formatException) for accidental secret exposure.

In `@services/ingest/app.py`:
- Around line 136-142: The middleware correlation_id_middleware sets the context
var correlation_id_var but never resets it, risking ID leakage between requests;
update correlation_id_middleware to capture the token returned by
correlation_id_var.set(corr_id), invoke call_next(request) in a try block, and
always call correlation_id_var.reset(token) in a finally block (ensuring
response header setting remains correct and exceptions still propagate).

In `@services/utils/logging_config.py`:
- Around line 17-33: In the format method of logging_config.py (class
CustomFormatter -> def format), replace the naive timestamp construction using
datetime.utcnow() with a timezone-aware one: use
datetime.now(timezone.utc).isoformat() (ensure timezone is imported from
datetime). Make the same change in scripts/train_model.py for the trained_at and
generated_at assignments and in services/ingest/app.py for any timestamp fields
so all UTC timestamps are timezone-aware.

---

Nitpick comments:
In `@tests/unit/test_logging.py`:
- Around line 60-90: Add a new unit test to cover exception formatting by
JSONFormatter: create a test (e.g., test_json_formatter_with_exception) that
configures a StringIO StreamHandler with
services.utils.logging_config.JSONFormatter, emits a logger.error("An error
occurred", exc_info=True) inside an except ValueError block, reads the JSON
output and asserts that "message" == "An error occurred" and that an "exception"
key exists whose value contains "ValueError: Test exception"; this verifies
JSONFormatter.formatException/record.exc_info handling.
- Around line 92-115: The test_correlation_id_empty currently uses assert
log_data.get("correlation_id") is None which does not distinguish between a
missing key and a key set to None; update the test to explicitly assert the key
is absent (e.g., assert "correlation_id" not in log_data) to match
JSONFormatter's behavior for falsy correlation_id_var; reference the test
function test_correlation_id_empty and the correlation_id_var/JSONFormatter
behavior to locate where to change the assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a97c4c58-48d1-4514-9796-4a3e95f2f9ec

📥 Commits

Reviewing files that changed from the base of the PR and between 721686f and 08b2bf1.

📒 Files selected for processing (11)
  • .github/tenet_agent/tenet_review.py
  • .github/tenet_agent/tenet_solve.py
  • .github/tenet_agent/utils.py
  • CONTRIBUTING.md
  • examples/llm_plugin_demo.py
  • scripts/train_model.py
  • scripts/verify_model_artifacts.py
  • services/analyzer/model/phishing_model.py
  • services/ingest/app.py
  • services/utils/logging_config.py
  • tests/unit/test_logging.py

Comment on lines +124 to +127
logger.warning("⚠️ parse_file_changes: no FILE blocks matched any pattern.")
logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───")
logger.warning(llm_output[:2000])
logger.warning("──────────────────────────────────────────")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging raw LLM output payloads directly.

At Line 124–127 and Line 212–214, dumping raw model output to logs can expose sensitive code/content in CI artifacts. Log metadata (length/hash) instead, and gate full payload behind an explicit secure debug flag.

Suggested fix
-    logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───")
-    logger.warning(llm_output[:2000])
-    logger.warning("──────────────────────────────────────────")
+    logger.warning(
+        "parse_file_changes: no FILE blocks matched; output_len=%d",
+        len(llm_output or ""),
+    )

@@
-    logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────")
-    logger.debug((code_output or "")[:500])
-    logger.debug("─────────────────────────────────────────")
+    if os.getenv("TENET_DEBUG_LLM_OUTPUT") == "1":
+        logger.debug("LLM output preview (first 500 chars): %r", (code_output or "")[:500])
+    else:
+        logger.debug("LLM output received; length=%d", len(code_output or ""))

Also applies to: 212-214

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/tenet_agent/tenet_solve.py around lines 124 - 127, Replace the
direct raw LLM payload logs in tenet_solve.py (the logger.warning calls around
parse_file_changes that log llm_output[:2000], and the similar block at lines
~212-214) with metadata-only logs: log length and a short hash/fingerprint of
llm_output instead of its content, and only emit the full payload when a secure
debug flag is explicitly enabled (e.g., an env var or module-level SECURE_DEBUG
flag checked before logging the raw llm_output). Update the code paths that
reference llm_output and the logger (the parse_file_changes logging block and
the second logging block) to perform hashing and length computation and gate
full-content logging behind the secure debug flag.

Comment on lines 208 to 217
code_output = call_llm(model, code_prompt)
print("✍️ Code generation complete.")
logger.info("✍️ Code generation complete.")

# ── Debug: log raw output header to aid future parse failures ─────────────
print("─── RAW LLM OUTPUT (first 500 chars) ────")
print((code_output or "")[:500])
print("─────────────────────────────────────────")
logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────")
logger.debug((code_output or "")[:500])
logger.debug("─────────────────────────────────────────")

# ── Parse file changes ─────────────────────────────────────────────────────
file_changes = parse_file_changes(code_output)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard code_output before parsing to prevent crash on LLM failures.

call_llm() can return None; at Line 217, parse_file_changes(code_output) then dereferences non-string input and can fail before posting a fallback issue comment.

Suggested fix
     code_output = call_llm(model, code_prompt)
     logger.info("✍️  Code generation complete.")
+    if not code_output:
+        post_issue_comment(
+            repo,
+            issue_number,
+            f"## 🤖 TENET Agent - Fix Generation Failed\n\n"
+            f"TENET Agent could not generate a code response for issue #{issue_number}.\n\n"
+            f"Please retry or review workflow logs.\n\n---\n*TENET Agent 🛡️*",
+        )
+        logger.error("❌ Code generation returned empty/failed response.")
+        sys.exit(1)

     # ── Debug: log raw output header to aid future parse failures ─────────────
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/tenet_agent/tenet_solve.py around lines 208 - 217, The result of
call_llm(model, code_prompt) (variable code_output) can be None and is currently
passed directly into parse_file_changes(code_output); add a guard after the
logging block that checks if not code_output (None or empty), log an error with
context via logger.error/ warning, and invoke the existing fallback flow (e.g.,
create/post an issue comment or return an empty file_changes list) instead of
calling parse_file_changes; update callers that expect file_changes so they
handle the fallback path. Ensure you reference the variables and functions
involved: call_llm, code_output, parse_file_changes, and logger.

Comment on lines +244 to 247
logger.error(
f"Git command failed: {' '.join(cmd)}\n"
f"Stdout: {result.stdout}\nStderr: {result.stderr}"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact or limit git stdout/stderr in error logs.

At Line 244–247, logging full git stdout/stderr risks leaking credentials or sensitive repo content in CI logs. Log return code + bounded/sanitized snippets instead.

Suggested fix
         result = subprocess.run(cmd, capture_output=True, text=True)
         if result.returncode != 0:
-            logger.error(
-                f"Git command failed: {' '.join(cmd)}\n"
-                f"Stdout: {result.stdout}\nStderr: {result.stderr}"
-            )
+            safe_stdout = (result.stdout or "")[:500]
+            safe_stderr = (result.stderr or "")[:500]
+            logger.error(
+                "Git command failed: %s (code=%s) stdout=%r stderr=%r",
+                " ".join(cmd[:2]),  # avoid logging full args
+                result.returncode,
+                safe_stdout,
+                safe_stderr,
+            )
             raise RuntimeError(f"git command failed: {result.stderr}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/tenet_agent/utils.py around lines 244 - 247, The current
logger.error call is printing full git stdout/stderr which may leak secrets;
change it to log the git return code (result.returncode) and only bounded,
sanitized snippets of stdout/stderr rather than the full texts: trim to a small
length (e.g. first/last ~200 chars), collapse newlines to spaces, and apply
simple redaction (mask token/password-like patterns via regex) before logging;
update the call around the logger.error that references cmd and result so the
message contains the return code, the sanitized/truncated stdout and stderr
snippets, and avoids dumping the entire output.

Comment thread CONTRIBUTING.md
from services.utils.logging_config import setup_logging
logger = setup_logging(__name__)
```
- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify correlation ID propagation for non-HTTP contexts.

The documentation states correlation IDs are "automatically handled," but this is only true for HTTP services with middleware. From the context, only services/ingest/app.py has correlation ID middleware. Scripts, background workers, and other non-HTTP services would need to manually set correlation_id_var if they want correlation IDs in their logs. Contributors implementing logging in these contexts may be confused by the "automatic" claim.

📝 Suggested clarification
-- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.
+- **Correlation IDs**: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.
- **Correlation IDs**: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CONTRIBUTING.md` at line 168, Update the Correlation IDs doc to state that
automatic propagation applies only when the HTTP correlation ID middleware is
installed; for non-HTTP contexts (scripts, background workers, cron jobs)
contributors must manually set and propagate the context var by calling
correlation_id_var.set(<id>) at the entry point of the task/process and ensure
that any spawned threads/tasks capture that context (e.g., set before creating
threads/futures or use helper wrappers that propagate context); mention
explicitly that correlation_id_var is the symbol to use and add a short note
pointing contributors to the HTTP middleware as the automatic case.

Comment thread CONTRIBUTING.md
- `WARNING`: Recoverable or unexpected situations (e.g., rate limit exceeded).
- `ERROR`: Failed operations (e.g., API call failed).
- `CRITICAL`: Severe failures affecting application availability (e.g., database connection lost).
- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Confirm JSONFormatter emits getMessage() and formatException() without redaction.

# Search for redaction, masking, or filtering logic in JSONFormatter
rg -n -A5 -B5 'class JSONFormatter' services/utils/logging_config.py

# Search for any sanitize/redact functions in logging_config.py
rg -n 'def.*(redact|sanitize|mask|filter)' services/utils/logging_config.py

Repository: TENET-DEV-AI/TENET-AI

Length of output: 529


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="services/utils/logging_config.py"
echo "== File: $file =="
wc -l "$file"

echo "== JSONFormatter definition =="
rg -n "class JSONFormatter" "$file" -n

echo "== JSONFormatter.format() body (expanded context) =="
line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1)
# Print from a little before class through some lines after format()
start=$((line-10))
end=$((line+120))
sed -n "${start},${end}p" "$file" | nl -ba

echo "== Any redaction/sanitization/masking helpers in this file =="
rg -n "(redact|sanitize|mask|filter)" "$file" || true

echo "== Any use of record.getMessage()/exception formatting in JSONFormatter.format() =="
rg -n "getMessage\\(|formatException\\(" "$file" || true

Repository: TENET-DEV-AI/TENET-AI

Length of output: 335


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="services/utils/logging_config.py"
echo "== File: $file =="
wc -l "$file"

echo "== JSONFormatter definition =="
rg -n "class JSONFormatter" "$file"

line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1)

echo "== JSONFormatter.format() body (expanded) =="
start=$((line-10))
end=$((line+180))
# Use cat -n (available) to preserve line numbers
sed -n "${start},${end}p" "$file" | cat -n

echo "== Redaction/sanitization/masking helpers in this file =="
rg -n "(redact|sanitize|mask|filter)" "$file" || true

echo "== Uses of record.getMessage() / formatException() in this file =="
rg -n "getMessage\\(|formatException\\(" "$file" || true

echo "== Any logging framework-wide redaction mention (repo) =="
rg -n "(redact|sanitize|mask|filter)" --glob='*.py' . || true

Repository: TENET-DEV-AI/TENET-AI

Length of output: 3748


Clarify that logging does not automatically redact secrets (CONTRIBUTING.md)
services/utils/logging_config.py’s JSONFormatter.format() sets "message": record.getMessage() and "exception": self.formatException(record.exc_info) and returns json.dumps(payload) with no redaction/masking, so secrets passed to logger.* will be written in plaintext. Update the guidance accordingly.

🛡️ Suggested documentation enhancement
-- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.
+- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.
- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CONTRIBUTING.md` at line 175, Update CONTRIBUTING.md to clarify that the
logging system does not redact secrets by default: explicitly state that
JSONFormatter.format() in services/utils/logging_config.py uses
record.getMessage() and self.formatException(record.exc_info) and returns
json.dumps(payload) with no automatic redaction, so any passwords, tokens, API
keys or raw secrets passed to logger.* will be emitted in plaintext; instruct
contributors to never log secrets, to use provided redaction utilities or a
sanitizer before logging, and to review logger calls (e.g.,
JSONFormatter.format, record.getMessage, formatException) for accidental secret
exposure.

Comment thread services/ingest/app.py
Comment on lines +136 to +142
@app.middleware("http")
async def correlation_id_middleware(request: Request, call_next):
corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
correlation_id_var.set(corr_id)
response = await call_next(request)
response.headers["X-Correlation-ID"] = corr_id
return response

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Context variable cleanup is missing—correlation IDs may leak between requests.

The middleware sets correlation_id_var but never resets it. In async contexts with connection pooling or when exceptions occur, this can cause correlation IDs to leak into subsequent unrelated requests, breaking request tracing.

The proper pattern is to capture the token from .set() and reset it in a finally block.

🔒 Proposed fix with proper cleanup
 `@app.middleware`("http")
 async def correlation_id_middleware(request: Request, call_next):
     corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
-    correlation_id_var.set(corr_id)
-    response = await call_next(request)
-    response.headers["X-Correlation-ID"] = corr_id
-    return response
+    token = correlation_id_var.set(corr_id)
+    try:
+        response = await call_next(request)
+        response.headers["X-Correlation-ID"] = corr_id
+        return response
+    finally:
+        correlation_id_var.reset(token)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/ingest/app.py` around lines 136 - 142, The middleware
correlation_id_middleware sets the context var correlation_id_var but never
resets it, risking ID leakage between requests; update correlation_id_middleware
to capture the token returned by correlation_id_var.set(corr_id), invoke
call_next(request) in a try block, and always call
correlation_id_var.reset(token) in a finally block (ensuring response header
setting remains correct and exceptions still propagate).

Comment on lines +17 to +33
def format(self, record: logging.LogRecord) -> str:
payload: Dict[str, Any] = {
"timestamp": f"{datetime.utcnow().isoformat()}Z",
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
}

# Inject correlation ID if available
correlation_id = correlation_id_var.get()
if correlation_id:
payload["correlation_id"] = correlation_id

if record.exc_info:
payload["exception"] = self.formatException(record.exc_info)

return json.dumps(payload)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version and find deprecated datetime.utcnow() usage

echo "=== Python version from project config ==="
fd -t f 'pyproject.toml|.python-version|.tool-versions|Dockerfile' --exec cat {}

echo -e "\n=== All datetime.utcnow() usages ==="
rg -n 'datetime\.utcnow\(\)' --type=py

Repository: TENET-DEV-AI/TENET-AI

Length of output: 1480


Replace datetime.utcnow() with datetime.now(timezone.utc) for UTC timestamps.

datetime.utcnow() returns a naive datetime (no tzinfo); switch to datetime.now(timezone.utc).isoformat() for timezone-aware UTC. Apply the same change in the other matches as well:

  • scripts/train_model.py (trained_at, generated_at)
  • services/ingest/app.py (timestamp fields)
  • services/utils/logging_config.py (timestamp)
🧰 Tools
🪛 ast-grep (0.43.0)

[info] 32-32: use jsonify instead of json.dumps for JSON output
Context: json.dumps(payload)
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/utils/logging_config.py` around lines 17 - 33, In the format method
of logging_config.py (class CustomFormatter -> def format), replace the naive
timestamp construction using datetime.utcnow() with a timezone-aware one: use
datetime.now(timezone.utc).isoformat() (ensure timezone is imported from
datetime). Make the same change in scripts/train_model.py for the trained_at and
generated_at assignments and in services/ingest/app.py for any timestamp fields
so all UTC timestamps are timezone-aware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Structured JSON Logging Across Services

1 participant