feat: implement structured JSON logging across services by Suhaskumard · Pull Request #161 · TENET-DEV-AI/TENET-AI

Suhaskumard · 2026-06-11T11:19:19Z

🤖 TENET Agent will automatically review this PR for security issues and code quality.
Maintainers: to have TENET solve an issue autonomously, comment /tenet fix on the issue.

Summary

Implemented project-wide structured JSON logging by replacing ad-hoc print() statements with the centralized logging framework. Added unit tests to validate JSON log formatting and correlation ID handling, improving observability, consistency, and reliability across TENET-AI services.

Key Changes

Replaced legacy print() statements with structured logger calls
Standardized JSON logging across services, scripts, utilities, and examples
Added unit tests for JSON formatter validation
Added unit tests for correlation ID handling
Updated contributor documentation with logging standards

Related Issue

Fixes #99

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
CI/CD Improvement
Added Tests

Changes Made

Logging Infrastructure

Enhanced centralized JSON logging configuration
Replaced direct print() statements with structured logger events
Standardized logging behavior across project modules
Improved compatibility with log aggregation and monitoring systems

GitHub Agent Utilities

Refactored tenet_solve.py
Refactored utils.py
Integrated centralized logger usage

Machine Learning Services

Updated scripts/train_model.py
Updated scripts/verify_model_artifacts.py
Updated services/analyzer/model/phishing_model.py
Converted training and verification output to structured logs

Example Applications

Updated examples/llm_plugin_demo.py
Replaced console output with logger calls

Testing

Added new unit tests in tests/unit/test_logging.py:

test_json_formatter_structure
- Validates generated logs are valid JSON
- Verifies required fields:
  - logger
  - level
  - message
  - timestamp
  - correlation_id
test_correlation_id_empty
- Verifies missing correlation IDs are safely handled
- Confirms JSON output remains valid
- Ensures null is emitted instead of causing failures

Documentation

Added Logging Standards section to CONTRIBUTING.md
Documented approved logging practices
Added guidance for future contributors

Files Modified

File	Purpose
`utils/logging_config.py`	Centralized JSON logging configuration
`tenet_solve.py`	Replaced print statements
`utils.py`	Integrated structured logging
`scripts/train_model.py`	Added JSON logging
`scripts/verify_model_artifacts.py`	Added JSON logging
`services/analyzer/model/phishing_model.py`	Updated logging implementation
`examples/llm_plugin_demo.py`	Replaced print outputs
`tests/unit/test_logging.py`	Added JSON logging unit tests
`CONTRIBUTING.md`	Added logging standards

Screenshots / Logs

1. Structured JSON Log Output

Structured JSON logs generated through the centralized logging framework.

2. Logging Configuration

Centralized JSON logging configuration used across TENET-AI services.

3. Updated Service Logging

Migration from print statements to structured logger calls.

4. Unit Tests Passing

Successful execution of pytest tests/unit/test_logging.py verifying JSON formatting and correlation ID handling.

How Has This Been Tested?

Unit tests
Integration tests
Manual testing

Validation Performed

Verified all identified print() statements were replaced
Confirmed logger initialization across updated modules
Verified structured JSON output formatting
Executed:

pytest tests/unit/test_logging.py

Confirmed all logging tests pass successfully
Validated JSON formatter structure
Validated correlation ID fallback behavior
Verified logging functionality in:
- GitHub agent utilities
- ML training scripts
- Model verification scripts
- Phishing detection model
- Example applications

Checklist

My code follows the project's code style
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective
New and existing unit tests pass locally with my changes

Summary by cubic

Implemented project-wide structured JSON logging with correlation IDs, replacing ad-hoc prints to improve observability across services. Expanded middleware, formatter, tests, and docs to standardize behavior end-to-end.

New Features
- Centralized JSON logger via services.utils.logging_config with JSONFormatter and correlation_id_var.
- Ingest middleware sets and propagates X-Correlation-ID on requests and responses.
- Agents and utilities (.github/tenet_agent/*) now use setup_logging.
- Unit tests expanded for JSON structure and correlation ID handling; updated logging config tests.
- Added logging standards to CONTRIBUTING.md.
Refactors
- Replaced print() with logger calls in TENET Agent tools, example app, training/verify scripts, phishing model, and ingest app.
- Standardized log levels; removed custom ingest JSON formatter in favor of setup_logging.

^{Written for commit 60fd3ed. Summary will update on new commits.}

Summary by CodeRabbit

Documentation
- Added Logging Standards section to contribution guidelines, requiring structured JSON logging and forbidding ad-hoc console output.
Tests
- Expanded test coverage for JSON logging structure and correlation ID validation.
Chores
- Implemented structured JSON logging across workflows and utility modules for improved request traceability and debugging capabilities.

vercel · 2026-06-11T11:19:23Z

@Suhaskumard is attempting to deploy a commit to the s3dfx-cyber's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-06-11T11:19:43Z

Warning

Review limit reached

@Suhaskumard, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 39 minutes and 27 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: afa08c7b-e321-43e1-be48-112937ce8df2

📥 Commits

Reviewing files that changed from the base of the PR and between 08b2bf1 and 60fd3ed.

📒 Files selected for processing (1)

tests/unit/test_logging_config.py

📝 Walkthrough

Walkthrough

This PR replaces ad-hoc print() statements throughout the codebase with structured JSON logging. A new JSONFormatter emits correlated logs with timestamps, severity levels, and context; HTTP middleware propagates correlation IDs across requests; and all scripts, agents, and examples adopt centralized logging via setup_logging(). Logging standards are now documented in the contributor guide.

Changes

Structured JSON Logging Across Services

Layer / File(s)	Summary
JSON logging infrastructure and correlation IDs `services/utils/logging_config.py`	Core `JSONFormatter` emits structured JSON with timestamp, level, logger name, message, correlation ID, and exception details. Module-level `correlation_id_var` holds per-request/async-context correlation IDs. `setup_logging()` now applies `JSONFormatter` to all handlers.
Correlation ID middleware and JSON logging tests `services/ingest/app.py`, `tests/unit/test_logging.py`	HTTP middleware reads `X-Correlation-ID` header (or generates UUID), stores in `correlation_id_var`, forwards request, and echoes correlation ID in response. Test suite validates JSON formatter structure, presence of required fields, and correlation ID handling when set or unset.
TENET agent workflow scripts refactoring `.github/tenet_agent/tenet_review.py`, `.github/tenet_agent/tenet_solve.py`, `.github/tenet_agent/utils.py`	Review workflow logs environment setup, PR diff fetching, LLM analysis start/completion, and security label success/failure. Solve workflow logs issue metadata, repo scanning, code-generation phases with debug output preview, and PR creation status. Utilities log environment errors, LLM failures, GitHub comment posts, and git command errors with appropriate severity.
Supporting scripts and examples refactoring `scripts/train_model.py`, `scripts/verify_model_artifacts.py`, `services/analyzer/model/phishing_model.py`, `examples/llm_plugin_demo.py`	Model training, artifact verification, phishing detector, and LLM plugin demo convert all console output and error reporting from `print()` to structured logging. Validation logic and test/demo flow remain unchanged; only output mechanism delegates to centralized logger.
Logging standards documentation `CONTRIBUTING.md`	New "Logging Standards" subsection specifies structured JSON logging requirements, forbids ad-hoc `print()` usage, directs contributors to use `setup_logging()`, documents automatic correlation ID handling, defines severity levels, and prohibits logging sensitive data.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

TENET-DEV-AI/TENET-AI#90: Extends unit test coverage for setup_logging behavior, handler initialization, and logger defaults alongside the new JSON formatter and correlation ID tests in this PR.
TENET-DEV-AI/TENET-AI#45: Introduced the TENET agent reviewer/solver scripts (.github/tenet_agent/*) that this PR refactors to use centralized structured logging instead of print().
TENET-DEV-AI/TENET-AI#35: Introduced the initial services/utils/logging_config.py module and corresponding unit tests that this PR extends with JSONFormatter, correlation_id_var, and correlation ID middleware integration.

Suggested labels

SSoC26, hard

Suggested reviewers

S3DFX-CYBER

Poem

🐰 Hop along the structured way,
JSON logs now save the day,
Correlation IDs trace each flow,
Debugging's easier, don't you know?
Print() is gone—the logger's here,
Production secrets now so clear! 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: implementing structured JSON logging across the entire codebase, which aligns with the file modifications and objectives.
Linked Issues check	✅ Passed	All coding objectives from issue `#99` are met: print() statements replaced with logging calls, structured JSON format implemented, correlation IDs added via context variables, severity levels implemented, centralized logging established, and contributor documentation updated.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#99` objectives. No out-of-scope modifications detected; changes are limited to logging infrastructure updates and print-to-logger migrations across specified modules.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description check	✅ Passed	The PR description is comprehensive and well-structured, following the required template with all major sections completed including Summary, Key Changes, Related Issue, Type of Change, detailed Changes Made breakdown, Files Modified table, Screenshots/Logs, Testing methodology, and Checklist.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Suhaskumard · 2026-06-11T11:24:20Z

Hi @S3DFX-CYBER,
I have solved the issue if any changes kindly let me know

cubic-dev-ai

No issues found across 11 files

_{Re-trigger cubic}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

tests/unit/test_logging.py (2)

60-90: ⚡ Quick win

Consider adding test coverage for exception logging.

This test validates the JSON structure for normal log messages, but the JSONFormatter also handles exceptions via record.exc_info and formatException(). Consider adding a test case that logs with exc_info=True to verify the "exception" field is properly formatted.

💡 Suggested test addition

def test_json_formatter_with_exception(self):
    """Test that exceptions are properly formatted in JSON logs."""
    import json
    from io import StringIO
    from services.utils.logging_config import JSONFormatter
    
    log_stream = StringIO()
    logger = logging.getLogger("test_json_exception")
    logger.setLevel(logging.ERROR)
    logger.handlers.clear()
    
    handler = logging.StreamHandler(log_stream)
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)
    
    try:
        raise ValueError("Test exception")
    except ValueError:
        logger.error("An error occurred", exc_info=True)
    
    log_output = log_stream.getvalue().strip()
    log_data = json.loads(log_output)
    
    assert log_data.get("message") == "An error occurred"
    assert "exception" in log_data
    assert "ValueError: Test exception" in log_data["exception"]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_logging.py` around lines 60 - 90, Add a new unit test to
cover exception formatting by JSONFormatter: create a test (e.g.,
test_json_formatter_with_exception) that configures a StringIO StreamHandler
with services.utils.logging_config.JSONFormatter, emits a logger.error("An error
occurred", exc_info=True) inside an except ValueError block, reads the JSON
output and asserts that "message" == "An error occurred" and that an "exception"
key exists whose value contains "ValueError: Test exception"; this verifies
JSONFormatter.formatException/record.exc_info handling.

92-115: ⚡ Quick win

Consider explicitly verifying correlation_id key absence when unset.

The test correctly validates behavior when correlation_id_var is unset (or set to a falsy value like None). However, the assertion assert log_data.get("correlation_id") is None passes both when the key is absent and when it's present with a None value.

Since the JSONFormatter omits the key entirely when correlation_id is falsy (line 27-28 in logging_config.py check if correlation_id:), consider adding an explicit assertion to document this behavior.

💡 Suggested assertion addition

     log_output = log_stream.getvalue().strip()
     log_data = json.loads(log_output)
     
     assert log_data.get("message") == "No correlation ID here"
-    assert log_data.get("correlation_id") is None
+    # Verify the correlation_id key is absent, not present with None value
+    assert "correlation_id" not in log_data

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_logging.py` around lines 92 - 115, The
test_correlation_id_empty currently uses assert log_data.get("correlation_id")
is None which does not distinguish between a missing key and a key set to None;
update the test to explicitly assert the key is absent (e.g., assert
"correlation_id" not in log_data) to match JSONFormatter's behavior for falsy
correlation_id_var; reference the test function test_correlation_id_empty and
the correlation_id_var/JSONFormatter behavior to locate where to change the
assertion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/tenet_agent/tenet_solve.py:
- Around line 124-127: Replace the direct raw LLM payload logs in tenet_solve.py
(the logger.warning calls around parse_file_changes that log llm_output[:2000],
and the similar block at lines ~212-214) with metadata-only logs: log length and
a short hash/fingerprint of llm_output instead of its content, and only emit the
full payload when a secure debug flag is explicitly enabled (e.g., an env var or
module-level SECURE_DEBUG flag checked before logging the raw llm_output).
Update the code paths that reference llm_output and the logger (the
parse_file_changes logging block and the second logging block) to perform
hashing and length computation and gate full-content logging behind the secure
debug flag.
- Around line 208-217: The result of call_llm(model, code_prompt) (variable
code_output) can be None and is currently passed directly into
parse_file_changes(code_output); add a guard after the logging block that checks
if not code_output (None or empty), log an error with context via logger.error/
warning, and invoke the existing fallback flow (e.g., create/post an issue
comment or return an empty file_changes list) instead of calling
parse_file_changes; update callers that expect file_changes so they handle the
fallback path. Ensure you reference the variables and functions involved:
call_llm, code_output, parse_file_changes, and logger.

In @.github/tenet_agent/utils.py:
- Around line 244-247: The current logger.error call is printing full git
stdout/stderr which may leak secrets; change it to log the git return code
(result.returncode) and only bounded, sanitized snippets of stdout/stderr rather
than the full texts: trim to a small length (e.g. first/last ~200 chars),
collapse newlines to spaces, and apply simple redaction (mask
token/password-like patterns via regex) before logging; update the call around
the logger.error that references cmd and result so the message contains the
return code, the sanitized/truncated stdout and stderr snippets, and avoids
dumping the entire output.

In `@CONTRIBUTING.md`:
- Line 168: Update the Correlation IDs doc to state that automatic propagation
applies only when the HTTP correlation ID middleware is installed; for non-HTTP
contexts (scripts, background workers, cron jobs) contributors must manually set
and propagate the context var by calling correlation_id_var.set(<id>) at the
entry point of the task/process and ensure that any spawned threads/tasks
capture that context (e.g., set before creating threads/futures or use helper
wrappers that propagate context); mention explicitly that correlation_id_var is
the symbol to use and add a short note pointing contributors to the HTTP
middleware as the automatic case.
- Line 175: Update CONTRIBUTING.md to clarify that the logging system does not
redact secrets by default: explicitly state that JSONFormatter.format() in
services/utils/logging_config.py uses record.getMessage() and
self.formatException(record.exc_info) and returns json.dumps(payload) with no
automatic redaction, so any passwords, tokens, API keys or raw secrets passed to
logger.* will be emitted in plaintext; instruct contributors to never log
secrets, to use provided redaction utilities or a sanitizer before logging, and
to review logger calls (e.g., JSONFormatter.format, record.getMessage,
formatException) for accidental secret exposure.

In `@services/ingest/app.py`:
- Around line 136-142: The middleware correlation_id_middleware sets the context
var correlation_id_var but never resets it, risking ID leakage between requests;
update correlation_id_middleware to capture the token returned by
correlation_id_var.set(corr_id), invoke call_next(request) in a try block, and
always call correlation_id_var.reset(token) in a finally block (ensuring
response header setting remains correct and exceptions still propagate).

In `@services/utils/logging_config.py`:
- Around line 17-33: In the format method of logging_config.py (class
CustomFormatter -> def format), replace the naive timestamp construction using
datetime.utcnow() with a timezone-aware one: use
datetime.now(timezone.utc).isoformat() (ensure timezone is imported from
datetime). Make the same change in scripts/train_model.py for the trained_at and
generated_at assignments and in services/ingest/app.py for any timestamp fields
so all UTC timestamps are timezone-aware.

---

Nitpick comments:
In `@tests/unit/test_logging.py`:
- Around line 60-90: Add a new unit test to cover exception formatting by
JSONFormatter: create a test (e.g., test_json_formatter_with_exception) that
configures a StringIO StreamHandler with
services.utils.logging_config.JSONFormatter, emits a logger.error("An error
occurred", exc_info=True) inside an except ValueError block, reads the JSON
output and asserts that "message" == "An error occurred" and that an "exception"
key exists whose value contains "ValueError: Test exception"; this verifies
JSONFormatter.formatException/record.exc_info handling.
- Around line 92-115: The test_correlation_id_empty currently uses assert
log_data.get("correlation_id") is None which does not distinguish between a
missing key and a key set to None; update the test to explicitly assert the key
is absent (e.g., assert "correlation_id" not in log_data) to match
JSONFormatter's behavior for falsy correlation_id_var; reference the test
function test_correlation_id_empty and the correlation_id_var/JSONFormatter
behavior to locate where to change the assertion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a97c4c58-48d1-4514-9796-4a3e95f2f9ec

📥 Commits

Reviewing files that changed from the base of the PR and between 721686f and 08b2bf1.

📒 Files selected for processing (11)

.github/tenet_agent/tenet_review.py
.github/tenet_agent/tenet_solve.py
.github/tenet_agent/utils.py
CONTRIBUTING.md
examples/llm_plugin_demo.py
scripts/train_model.py
scripts/verify_model_artifacts.py
services/analyzer/model/phishing_model.py
services/ingest/app.py
services/utils/logging_config.py
tests/unit/test_logging.py

coderabbitai · 2026-06-11T11:30:05Z

+        logger.warning("⚠️  parse_file_changes: no FILE blocks matched any pattern.")
+        logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───")
+        logger.warning(llm_output[:2000])
+        logger.warning("──────────────────────────────────────────")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging raw LLM output payloads directly.

At Line 124–127 and Line 212–214, dumping raw model output to logs can expose sensitive code/content in CI artifacts. Log metadata (length/hash) instead, and gate full payload behind an explicit secure debug flag.

Suggested fix

- logger.warning("─── RAW LLM OUTPUT (first 2000 chars) ───") - logger.warning(llm_output[:2000]) - logger.warning("──────────────────────────────────────────") + logger.warning( + "parse_file_changes: no FILE blocks matched; output_len=%d", + len(llm_output or ""), + ) @@ - logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────") - logger.debug((code_output or "")[:500]) - logger.debug("─────────────────────────────────────────") + if os.getenv("TENET_DEBUG_LLM_OUTPUT") == "1": + logger.debug("LLM output preview (first 500 chars): %r", (code_output or "")[:500]) + else: + logger.debug("LLM output received; length=%d", len(code_output or ""))

Also applies to: 212-214

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/tenet_agent/tenet_solve.py around lines 124 - 127, Replace the direct raw LLM payload logs in tenet_solve.py (the logger.warning calls around parse_file_changes that log llm_output[:2000], and the similar block at lines ~212-214) with metadata-only logs: log length and a short hash/fingerprint of llm_output instead of its content, and only emit the full payload when a secure debug flag is explicitly enabled (e.g., an env var or module-level SECURE_DEBUG flag checked before logging the raw llm_output). Update the code paths that reference llm_output and the logger (the parse_file_changes logging block and the second logging block) to perform hashing and length computation and gate full-content logging behind the secure debug flag.

coderabbitai · 2026-06-11T11:30:05Z

    code_output = call_llm(model, code_prompt)
-    print("✍️  Code generation complete.")
+    logger.info("✍️  Code generation complete.")

    # ── Debug: log raw output header to aid future parse failures ─────────────
-    print("─── RAW LLM OUTPUT (first 500 chars) ────")
-    print((code_output or "")[:500])
-    print("─────────────────────────────────────────")
+    logger.debug("─── RAW LLM OUTPUT (first 500 chars) ────")
+    logger.debug((code_output or "")[:500])
+    logger.debug("─────────────────────────────────────────")

    # ── Parse file changes ─────────────────────────────────────────────────────
    file_changes = parse_file_changes(code_output)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard code_output before parsing to prevent crash on LLM failures.

call_llm() can return None; at Line 217, parse_file_changes(code_output) then dereferences non-string input and can fail before posting a fallback issue comment.

Suggested fix

code_output = call_llm(model, code_prompt) logger.info("✍️ Code generation complete.") + if not code_output: + post_issue_comment( + repo, + issue_number, + f"## 🤖 TENET Agent - Fix Generation Failed\n\n" + f"TENET Agent could not generate a code response for issue #{issue_number}.\n\n" + f"Please retry or review workflow logs.\n\n---\n*TENET Agent 🛡️*", + ) + logger.error("❌ Code generation returned empty/failed response.") + sys.exit(1) # ── Debug: log raw output header to aid future parse failures ─────────────

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/tenet_agent/tenet_solve.py around lines 208 - 217, The result of call_llm(model, code_prompt) (variable code_output) can be None and is currently passed directly into parse_file_changes(code_output); add a guard after the logging block that checks if not code_output (None or empty), log an error with context via logger.error/ warning, and invoke the existing fallback flow (e.g., create/post an issue comment or return an empty file_changes list) instead of calling parse_file_changes; update callers that expect file_changes so they handle the fallback path. Ensure you reference the variables and functions involved: call_llm, code_output, parse_file_changes, and logger.

coderabbitai · 2026-06-11T11:30:06Z

+            logger.error(
                f"Git command failed: {' '.join(cmd)}\n"
                f"Stdout: {result.stdout}\nStderr: {result.stderr}"
            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact or limit git stdout/stderr in error logs.

At Line 244–247, logging full git stdout/stderr risks leaking credentials or sensitive repo content in CI logs. Log return code + bounded/sanitized snippets instead.

Suggested fix

result = subprocess.run(cmd, capture_output=True, text=True) if result.returncode != 0: - logger.error( - f"Git command failed: {' '.join(cmd)}\n" - f"Stdout: {result.stdout}\nStderr: {result.stderr}" - ) + safe_stdout = (result.stdout or "")[:500] + safe_stderr = (result.stderr or "")[:500] + logger.error( + "Git command failed: %s (code=%s) stdout=%r stderr=%r", + " ".join(cmd[:2]), # avoid logging full args + result.returncode, + safe_stdout, + safe_stderr, + ) raise RuntimeError(f"git command failed: {result.stderr}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/tenet_agent/utils.py around lines 244 - 247, The current logger.error call is printing full git stdout/stderr which may leak secrets; change it to log the git return code (result.returncode) and only bounded, sanitized snippets of stdout/stderr rather than the full texts: trim to a small length (e.g. first/last ~200 chars), collapse newlines to spaces, and apply simple redaction (mask token/password-like patterns via regex) before logging; update the call around the logger.error that references cmd and result so the message contains the return code, the sanitized/truncated stdout and stderr snippets, and avoids dumping the entire output.

coderabbitai · 2026-06-11T11:30:06Z

+  from services.utils.logging_config import setup_logging
+  logger = setup_logging(__name__)
+  ```
+- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify correlation ID propagation for non-HTTP contexts.

The documentation states correlation IDs are "automatically handled," but this is only true for HTTP services with middleware. From the context, only services/ingest/app.py has correlation ID middleware. Scripts, background workers, and other non-HTTP services would need to manually set correlation_id_var if they want correlation IDs in their logs. Contributors implementing logging in these contexts may be confused by the "automatic" claim.

📝 Suggested clarification

-- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers. +- **Correlation IDs**: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- **Correlation IDs**: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.

- **Correlation IDs**: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CONTRIBUTING.md` at line 168, Update the Correlation IDs doc to state that automatic propagation applies only when the HTTP correlation ID middleware is installed; for non-HTTP contexts (scripts, background workers, cron jobs) contributors must manually set and propagate the context var by calling correlation_id_var.set(<id>) at the entry point of the task/process and ensure that any spawned threads/tasks capture that context (e.g., set before creating threads/futures or use helper wrappers that propagate context); mention explicitly that correlation_id_var is the symbol to use and add a short note pointing contributors to the HTTP middleware as the automatic case.

coderabbitai · 2026-06-11T11:30:06Z

+  - `WARNING`: Recoverable or unexpected situations (e.g., rate limit exceeded).
+  - `ERROR`: Failed operations (e.g., API call failed).
+  - `CRITICAL`: Severe failures affecting application availability (e.g., database connection lost).
+- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Confirm JSONFormatter emits getMessage() and formatException() without redaction. # Search for redaction, masking, or filtering logic in JSONFormatter rg -n -A5 -B5 'class JSONFormatter' services/utils/logging_config.py # Search for any sanitize/redact functions in logging_config.py rg -n 'def.*(redact|sanitize|mask|filter)' services/utils/logging_config.py

Repository: TENET-DEV-AI/TENET-AI

Length of output: 529

🏁 Script executed:

#!/bin/bash set -euo pipefail file="services/utils/logging_config.py" echo "== File: $file ==" wc -l "$file" echo "== JSONFormatter definition ==" rg -n "class JSONFormatter" "$file" -n echo "== JSONFormatter.format() body (expanded context) ==" line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1) # Print from a little before class through some lines after format() start=$((line-10)) end=$((line+120)) sed -n "${start},${end}p" "$file" | nl -ba echo "== Any redaction/sanitization/masking helpers in this file ==" rg -n "(redact|sanitize|mask|filter)" "$file" || true echo "== Any use of record.getMessage()/exception formatting in JSONFormatter.format() ==" rg -n "getMessage\\(|formatException\\(" "$file" || true

Repository: TENET-DEV-AI/TENET-AI

Length of output: 335

🏁 Script executed:

#!/bin/bash set -euo pipefail file="services/utils/logging_config.py" echo "== File: $file ==" wc -l "$file" echo "== JSONFormatter definition ==" rg -n "class JSONFormatter" "$file" line=$(rg -n "class JSONFormatter" "$file" | head -n1 | cut -d: -f1) echo "== JSONFormatter.format() body (expanded) ==" start=$((line-10)) end=$((line+180)) # Use cat -n (available) to preserve line numbers sed -n "${start},${end}p" "$file" | cat -n echo "== Redaction/sanitization/masking helpers in this file ==" rg -n "(redact|sanitize|mask|filter)" "$file" || true echo "== Uses of record.getMessage() / formatException() in this file ==" rg -n "getMessage\\(|formatException\\(" "$file" || true echo "== Any logging framework-wide redaction mention (repo) ==" rg -n "(redact|sanitize|mask|filter)" --glob='*.py' . || true

Repository: TENET-DEV-AI/TENET-AI

Length of output: 3748

Clarify that logging does not automatically redact secrets (CONTRIBUTING.md)
services/utils/logging_config.py’s JSONFormatter.format() sets "message": record.getMessage() and "exception": self.formatException(record.exc_info) and returns json.dumps(payload) with no redaction/masking, so secrets passed to logger.* will be written in plaintext. Update the guidance accordingly.

🛡️ Suggested documentation enhancement

-- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. +- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets.

- **Sensitive Data**: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CONTRIBUTING.md` at line 175, Update CONTRIBUTING.md to clarify that the logging system does not redact secrets by default: explicitly state that JSONFormatter.format() in services/utils/logging_config.py uses record.getMessage() and self.formatException(record.exc_info) and returns json.dumps(payload) with no automatic redaction, so any passwords, tokens, API keys or raw secrets passed to logger.* will be emitted in plaintext; instruct contributors to never log secrets, to use provided redaction utilities or a sanitizer before logging, and to review logger calls (e.g., JSONFormatter.format, record.getMessage, formatException) for accidental secret exposure.

coderabbitai · 2026-06-11T11:30:06Z

+@app.middleware("http")
+async def correlation_id_middleware(request: Request, call_next):
+    corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
+    correlation_id_var.set(corr_id)
+    response = await call_next(request)
+    response.headers["X-Correlation-ID"] = corr_id
+    return response


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Context variable cleanup is missing—correlation IDs may leak between requests.

The middleware sets correlation_id_var but never resets it. In async contexts with connection pooling or when exceptions occur, this can cause correlation IDs to leak into subsequent unrelated requests, breaking request tracing.

The proper pattern is to capture the token from .set() and reset it in a finally block.

🔒 Proposed fix with proper cleanup

`@app.middleware`("http") async def correlation_id_middleware(request: Request, call_next): corr_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4()) - correlation_id_var.set(corr_id) - response = await call_next(request) - response.headers["X-Correlation-ID"] = corr_id - return response + token = correlation_id_var.set(corr_id) + try: + response = await call_next(request) + response.headers["X-Correlation-ID"] = corr_id + return response + finally: + correlation_id_var.reset(token)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/ingest/app.py` around lines 136 - 142, The middleware correlation_id_middleware sets the context var correlation_id_var but never resets it, risking ID leakage between requests; update correlation_id_middleware to capture the token returned by correlation_id_var.set(corr_id), invoke call_next(request) in a try block, and always call correlation_id_var.reset(token) in a finally block (ensuring response header setting remains correct and exceptions still propagate).

coderabbitai · 2026-06-11T11:30:06Z

+    def format(self, record: logging.LogRecord) -> str:
+        payload: Dict[str, Any] = {
+            "timestamp": f"{datetime.utcnow().isoformat()}Z",
+            "level": record.levelname,
+            "logger": record.name,
+            "message": record.getMessage(),
+        }
+
+        # Inject correlation ID if available
+        correlation_id = correlation_id_var.get()
+        if correlation_id:
+            payload["correlation_id"] = correlation_id
+
+        if record.exc_info:
+            payload["exception"] = self.formatException(record.exc_info)
+
+        return json.dumps(payload)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check Python version and find deprecated datetime.utcnow() usage echo "=== Python version from project config ===" fd -t f 'pyproject.toml|.python-version|.tool-versions|Dockerfile' --exec cat {} echo -e "\n=== All datetime.utcnow() usages ===" rg -n 'datetime\.utcnow' --type=py

Repository: TENET-DEV-AI/TENET-AI

Length of output: 1480

Replace datetime.utcnow() with datetime.now(timezone.utc) for UTC timestamps.

datetime.utcnow() returns a naive datetime (no tzinfo); switch to datetime.now(timezone.utc).isoformat() for timezone-aware UTC. Apply the same change in the other matches as well:

scripts/train_model.py (trained_at, generated_at)

services/ingest/app.py (timestamp fields)

services/utils/logging_config.py (timestamp)

🧰 Tools

🪛 ast-grep (0.43.0)

[info] 32-32: use jsonify instead of json.dumps for JSON output
Context: json.dumps(payload)
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/utils/logging_config.py` around lines 17 - 33, In the format method of logging_config.py (class CustomFormatter -> def format), replace the naive timestamp construction using datetime.utcnow() with a timezone-aware one: use datetime.now(timezone.utc).isoformat() (ensure timezone is imported from datetime). Make the same change in scripts/train_model.py for the trained_at and generated_at assignments and in services/ingest/app.py for any timestamp fields so all UTC timestamps are timezone-aware.

Suhaskumard added 2 commits June 11, 2026 16:29

feat: implement structured JSON logging across services

e81ada7

test: update unit tests for structured JSON logging

08b2bf1

cubic-dev-ai Bot reviewed Jun 11, 2026

View reviewed changes

S3DFX-CYBER requested a review from Preetham404 June 11, 2026 11:29

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

test: update logging config tests for JSON logging

60fd3ed

	- Correlation IDs: The logging framework automatically handles request correlation IDs across services. Do not manually construct log messages with request identifiers.
	- Correlation IDs: HTTP services automatically propagate request correlation IDs via middleware (see `services/ingest/app.py`). For scripts and non-HTTP services, correlation IDs can be set manually using `correlation_id_var.set()` from `services.utils.logging_config`. Do not manually construct log messages with request identifiers.

	- Sensitive Data: Never log passwords, tokens, API keys, or raw secrets.
	- Sensitive Data: Never log passwords, tokens, API keys, or raw secrets. The logging framework does not automatically redact sensitive values; developers are responsible for ensuring secrets are not passed to logger calls.

Conversation

Suhaskumard commented Jun 11, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Related Issue

Type of Change

Changes Made

Logging Infrastructure

GitHub Agent Utilities

Machine Learning Services

Example Applications

Testing

Documentation

Files Modified

Screenshots / Logs

1. Structured JSON Log Output

2. Logging Configuration

3. Updated Service Logging

4. Unit Tests Passing

How Has This Been Tested?

Validation Performed

Checklist

Summary by cubic

Summary by CodeRabbit

Uh oh!

vercel Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

Suhaskumard commented Jun 11, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Suhaskumard commented Jun 11, 2026 •

edited by cubic-dev-ai Bot

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading