
LogMiner-QA


"Turn Production Wisdom into Test Coverage" — Privacy-first log analysis and automated test generation for banking and enterprise workloads.

Overview

LogMiner-QA ingests raw banking logs, scrubs sensitive customer data, and generates actionable test cases. The tool is composed of:

  • Intelligent Log Sanitizer: Detects PII via pattern matching and optional spaCy NER, redacts sensitive data with stable tokens, and hashes identifiers for correlation without exposure.
  • Differential Privacy Layer: Adds calibrated noise to aggregate metrics to prevent reconstruction of individual behaviour.
  • Analysis & Test Generation: Reconstructs journeys, prioritises high-risk flows, and emits Gherkin scenarios for CI/CD pipelines.
  • On-Prem Deployment: Designed for containerised, air-gapped environments where logs never leave the bank’s infrastructure.

Features

  • PII detection for emails, account numbers, phone numbers, IBANs, and more.
  • Configurable hashing algorithm and token format with referential integrity.
  • Laplace-mechanism aggregator for counts, histograms, and ratios.
  • Multi-source ingestion via local JSON/CSV, Elasticsearch, and Datadog connectors.
  • NLP enrichment with regex + spaCy, transformer embeddings, clustering, and Isolation Forest anomaly scoring.
  • LSTM-based journey analysis surfacing anomalous customer flows.
  • Compliance (PCI, GDPR, audit trail) and fraud (velocity, high-value, failed-login) test generation modules.
  • CI/CD friendly summary generation and FastAPI microservice for on-prem orchestration.
  • CLI workflow to process logs into sanitized outputs, privacy-preserving reports, and templated Gherkin tests.
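
As a sketch of the stable-token idea behind the sanitizer: hashing each detected value with a secret yields a token that stays consistent across records (preserving referential integrity) without exposing the original. The patterns and the `redact` helper below are illustrative assumptions, not the library's actual API:

```python
import hashlib
import re

# Illustrative patterns only; the real sanitizer supports many more PII types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def stable_token(kind: str, value: str, secret: str = "dev-secret") -> str:
    # Hash the value with a secret so the same input always maps to the
    # same token, allowing correlation across records without exposure.
    digest = hashlib.sha256(f"{secret}:{value}".encode()).hexdigest()[:10]
    return f"<{kind}:{digest}>"

def redact(text: str) -> str:
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: stable_token(k, m.group()), text)
    return text

print(redact("Payment from alice@example.com failed"))
```

The same email always redacts to the same token, so journeys can still be reconstructed from sanitized logs.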

Quick Start

Get started in 5 minutes! See the Quick Start Guide for detailed instructions.

```shell
# 1. Install dependencies
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# 2. Install the package in editable mode (required for the CLI)
pip install -e .

# 3. Set the hashing secret (optional but recommended)
export LOGMINER_HASH_SECRET=$(openssl rand -hex 32)  # Linux/macOS
# Windows PowerShell: see docs/QUICK_START.md

# 4. Run the analysis
python -m logminer_qa.cli \
  --input data/sample_logs.jsonl \
  --output sanitized.jsonl \
  --report report.json \
  --tests tests.feature
```

Windows (PowerShell): after activating the venv (`.\.venv\Scripts\Activate.ps1`), make sure you have run `pip install -e .`. Then invoke `.\.venv\Scripts\python.exe -m logminer_qa.cli ...` directly, or use the one-step script `.\run_sample.ps1`, which runs the installation test, ensures the package is installed, and then runs the sample pipeline.

CI mode

Emit a compact JSON summary for pipeline gates:

```shell
python -m logminer_qa.cli --input data/sample_logs.jsonl --ci-summary build/logminer-summary.json
```

Inspect the `high_severity_findings` and `anomalies_detected` fields and fail the build when they exceed your thresholds.
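
A gate script might look like the following sketch; the threshold values are placeholders, and only the two field names come from the summary format above:

```python
import sys

# Placeholder thresholds; tune per pipeline. The field names match the keys
# emitted by --ci-summary.
THRESHOLDS = {"high_severity_findings": 0, "anomalies_detected": 5}

def gate(summary: dict) -> int:
    """Return a nonzero exit code when any metric exceeds its threshold."""
    failures = [
        f"{key}={summary.get(key, 0)} exceeds {limit}"
        for key, limit in THRESHOLDS.items()
        if summary.get(key, 0) > limit
    ]
    for message in failures:
        print(f"GATE FAIL: {message}", file=sys.stderr)
    return 1 if failures else 0

# In CI, load build/logminer-summary.json and call sys.exit(gate(summary)).
print(gate({"high_severity_findings": 0, "anomalies_detected": 2}))
```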

API Server

Run the analysis service in-process:

```shell
uvicorn logminer_qa.server:create_app --factory --host 0.0.0.0 --port 8080
```

POST `{"records": [...]}` to `/analyze` to receive sanitized previews, risk summaries, and generated tests.
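
For example, a minimal client sketch using only the standard library (the record fields shown are assumptions — use whatever your log format mapping expects):

```python
import json
from urllib import request

# Two hypothetical log records; adapt the fields to your own schema.
payload = {
    "records": [
        {"timestamp": "2024-01-15T10:32:00Z", "message": "login failed for user 4411"},
        {"timestamp": "2024-01-15T10:33:10Z", "message": "transfer of 9500 EUR initiated"},
    ]
}

req = request.Request(
    "http://127.0.0.1:8080/analyze",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```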

Dashboard (React UI)

The dashboard is a separate frontend app that talks to the API server (backend) via /analyze.

  1. Start the API backend

Windows PowerShell (offline-ready):

```powershell
# Adjust the paths below to match your local checkout.
$env:PYTHONPATH="C:\Users\abirm\LogMiner-QA\src"
$env:TRANSFORMERS_OFFLINE="1"
$env:HF_HUB_OFFLINE="1"

cd C:\Users\abirm\LogMiner-QA
.\.venv\Scripts\python.exe -m uvicorn logminer_qa.server:create_app --factory --host 127.0.0.1 --port 8081
```
  2. Start the dashboard frontend

First time:

```powershell
cd C:\Users\abirm\LogMiner-QA\dashboard
npm install
```

Run:

```powershell
cd C:\Users\abirm\LogMiner-QA\dashboard
npx vite
```

Open the URL printed by Vite (for example http://localhost:5177/).

Notes:

  • If you use Upload File with a CSV that has timestamp and event columns, the tool will treat event as the message-like field for validation/test generation.
  • If the sentence-transformers model is not available in the local cache, offline mode may skip embedding-based steps (clustering/anomaly/journey), but sanitization + compliance/fraud/test generation can still run depending on your data.

Project Structure

  • src/logminer_qa/sanitizer.py — PII detection, tokenisation, and hashing.
  • src/logminer_qa/privacy.py — Differential privacy utilities.
  • src/logminer_qa/pipeline.py — Orchestrates sanitization, aggregation, and test generation.
  • src/logminer_qa/cli.py — End-user entry point.
  • Dockerfile / helm/ (optional) — Templates for on-prem deployment.

Configuration

Refer to src/logminer_qa/config.py for tunable parameters:

  • SanitizerConfig toggles NER, hashing algorithm, token store path, and entity types.
  • PrivacyConfig defines epsilon/delta budgets and toggles DP.
  • Log format: Records must have at least a timestamp-like and message-like field. Built-in aliases support time/timestamp, msg/message, etc.; custom mapping is available via CLI (--timestamp-field, --message-field, --severity-field) or Settings.log_format. See Log format and field mapping (includes data cleaning expectations: encoding, size limits, PII handling).
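
The alias idea can be sketched as follows; the alias table below is hypothetical, and the real mapping lives in `Settings.log_format`:

```python
# Hypothetical alias table; the actual aliases are defined by the library.
ALIASES = {
    "timestamp": ("timestamp", "time", "ts", "@timestamp"),
    "message": ("message", "msg", "log", "event"),
    "severity": ("severity", "level", "loglevel"),
}

def normalise(record: dict) -> dict:
    """Map whatever field names a source uses onto canonical keys."""
    out = {}
    for canonical, candidates in ALIASES.items():
        for name in candidates:
            if name in record:
                out[canonical] = record[name]
                break
    return out

print(normalise({"time": "2024-01-15T10:32:00Z", "event": "login failed"}))
```

This is also why a CSV with `timestamp` and `event` columns works out of the box: `event` resolves to the message-like field.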

Environment variables:

  • `LOGMINER_HASH_SECRET` — secret key for deterministic hashing (if unset, a default fallback is used and a warning is emitted).

Differential Privacy Guarantees

The Laplace mechanism ensures ε-differential privacy for count-based metrics. Configure ε per compliance needs; smaller ε yields stronger privacy at the cost of accuracy.
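
For count queries the sensitivity is 1, so the mechanism adds Laplace noise with scale 1/ε. A minimal sketch (not the library's implementation):

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling: u ~ Uniform(-0.5, 0.5) maps to Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Count queries have sensitivity 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
print(noisy_count(1000, epsilon=0.5, rng=rng))
```

Halving ε doubles the noise scale: stronger privacy, less accurate counts.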

Extensibility

  • Plug in custom PII detection patterns or new NER models.
  • Extend LogMinerPipeline._classify_record to reflect domain-specific risk classification.
  • Replace _generate_tests with connectors to Cucumber, Pytest-BDD, or internal tooling.
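
For instance, a hypothetical `_generate_tests` replacement could render findings straight into Gherkin text (the `finding` fields here are made up for illustration):

```python
# Hypothetical finding-to-Gherkin renderer; field names are illustrative.
def to_gherkin(finding: dict) -> str:
    return "\n".join([
        f"Scenario: {finding['title']}",
        f"  Given a customer session from the {finding['source']} logs",
        f"  When the flow \"{finding['flow']}\" is replayed",
        f"  Then the system should {finding['expectation']}",
    ])

scenario = to_gherkin({
    "title": "Velocity check on rapid transfers",
    "source": "payments",
    "flow": "transfer>transfer>transfer",
    "expectation": "flag the account for fraud review",
})
print(scenario)
```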

Deployment Notes

  • Package the tool as a container; run sanitizer and analysis components in the same secure cluster.
  • Disable outbound networking, mount model/token stores to persistent volumes, and integrate with the bank’s secrets manager.
  • Export Prometheus metrics from the pipeline while respecting privacy budgets.

Early Adopters

We're looking for early adopters to help shape LogMiner-QA!

Documentation

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see LICENSE file for details.

Future Enhancements

  • Expand connector catalog (Splunk, CloudWatch, Datadog streaming)
  • Integrate model registry + fine-tuned fraud detection models
  • Surface dashboard visualisations (Streamlit/React) and CI/CD automation hooks
  • Enhanced test generation with prioritization and risk scoring
