NITISH-R-G · May 1, 2026
diff --git a/.env.example b/.env.example
@@ -2,3 +2,4 @@
 OPENAI_API_KEY=
 # OPENAI_MODEL=gpt-4o-mini
 # ORCHESTRATE_SEED=42
+# ORCHESTRATE_MAX_FIELD_CHARS=200000
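The new `ORCHESTRATE_MAX_FIELD_CHARS` entry backs the `--max-field-chars` flag documented in `code/README.md` (default 200000; a bad value exits with code 2). Below is a minimal sketch of how such a cap might be resolved and applied. It assumes the CLI value takes precedence over the environment and that non-integer or non-positive values are rejected. `resolve_max_field_chars` and `cap_field` are hypothetical names, not functions from this PR.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

DEFAULT_MAX_FIELD_CHARS = 200_000  # default stated in code/README.md


def resolve_max_field_chars(cli_value: Optional[int], env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical precedence: CLI flag > ORCHESTRATE_MAX_FIELD_CHARS > built-in default."""
    if cli_value is not None:
        value = cli_value
    else:
        raw = env.get("ORCHESTRATE_MAX_FIELD_CHARS", "").strip()
        try:
            value = int(raw) if raw else DEFAULT_MAX_FIELD_CHARS
        except ValueError:
            raise ValueError(f"ORCHESTRATE_MAX_FIELD_CHARS must be an integer, got {raw!r}") from None
    if value <= 0:
        # Assumption: a zero or negative cap is a user error (the README maps these to exit 2).
        raise ValueError(f"max field chars must be positive, got {value}")
    return value


def cap_field(text: str, limit: int) -> str:
    """Truncate an Issue/Subject value to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]
```

With the default environment, `cap_field(issue, resolve_max_field_chars(None))` leaves ordinary tickets untouched and only clips pathological inputs.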
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,35 @@
+name: CI
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+
+jobs:
+  test-and-sample-eval:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install dependencies
+        run: pip install -r code/requirements.txt
+
+      # Do not use `python -m code` here: on Linux it often resolves to the *stdlib* `code`
+      # module instead of this repo's `code/` package.
+      - name: CLI help smoke test
+        working-directory: code
+        run: python main.py --help
+
+      - name: Pytest
+        working-directory: code
+        run: python -m pytest tests -q
+
+      - name: Sample regression (offline)
+        working-directory: code
+        env:
+          ORCHESTRATE_DISABLE_LLM: "1"
+        run: python run_eval.py --offline
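You can check the CI comment about `python -m code` locally. This sketch uses the standard-library `importlib.util.find_spec` to report which file a top-level module name resolves to. Run it via `python -c` or an import from the repository root, where the current directory is on `sys.path`. The `module_origin` helper is illustrative. If `module_origin("code")` points at the interpreter's own `code.py` rather than `code/__init__.py`, the stdlib module is taking precedence.

```python
from __future__ import annotations

import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file (or directory) a top-level module would load from, or None if not found.

    find_spec returns the spec of an already-imported module from sys.modules,
    so a module imported earlier in the process reports its cached origin.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; report their first search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    print(module_origin("code"))
```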
diff --git a/README.md b/README.md
@@ -4,7 +4,24 @@ Starter repository for the **HackerRank Orchestrate** 24-hour hackathon (May 1
 
 Build a terminal-based AI agent that triages real support tickets across three product ecosystems; **HackerRank**, **Claude**, and **Visa** — using only the support corpus shipped in this repo.
 
-Read [`problem_statement.md`](./problem_statement.md) for the full task spec, input/output schema, and allowed values, and [`evalutation_criteria.md`](./evalutation_criteria.md) for how submissions are scored.
+Read [`problem_statement.md`](./problem_statement.md) for the full task spec, input/output schema, and allowed values, and [`evaluation_criteria.md`](./evaluation_criteria.md) for how submissions are scored.
+
+### Start here (run the bundled agent)
+
+From the **repository root** (after `pip install -r code/requirements.txt`):
+
+| Shell | Command |
+|-------|---------|
+| **Any (recommended)** | `python code/main.py` (avoids a name clash with the stdlib `code` module) |
+| **Any** | `cd code` then `python main.py` |
+| **bash / zsh** | `./scripts/run_agent.sh` or `bash scripts/run_agent.sh` |
+| **PowerShell** | `pwsh -File scripts/run_agent.ps1` |
+
+**Note:** `python -m code` can invoke the **standard library** `code` module on Linux instead of this repo’s package—prefer `python code/main.py` or the scripts above.
+
+For a fully offline run (no LLM API calls), set `ORCHESTRATE_DISABLE_LLM=1`, then run one of the commands above. Full CLI flags are in [`code/README.md`](./code/README.md).
+
+**Interview / demo:** [`docs/interview.md`](./docs/interview.md), [`docs/demo-script.md`](./docs/demo-script.md). **Manual answer quality:** [`docs/DEV_EVAL.md`](./docs/DEV_EVAL.md). **Scope:** [`docs/scope_and_limits.md`](./docs/scope_and_limits.md).
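The offline switch works together with the API key. Per `code/README.md`, the agent takes its offline synthesis path when `OPENAI_API_KEY` is unset or `ORCHESTRATE_DISABLE_LLM=1`. Here is a minimal sketch of that check. It assumes the flag also accepts spellings such as `true` or `yes`, although only `1` is documented. `llm_enabled` is a hypothetical name, not the agent's function.

```python
from __future__ import annotations

import os
from typing import Mapping

# Assumption: accepted truthy spellings for ORCHESTRATE_DISABLE_LLM (only "1" is documented).
_TRUTHY = {"1", "true", "yes", "on"}


def llm_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when an API key is present and the offline flag is not set."""
    if env.get("ORCHESTRATE_DISABLE_LLM", "").strip().lower() in _TRUTHY:
        return False
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

The flag is checked first, so setting it guarantees an offline run even when a key is present in `.env`.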
 
 ---
 
@@ -28,16 +45,21 @@ Read [`problem_statement.md`](./problem_statement.md) for the full task spec, in
 ├── AGENTS.md                       # Rules for AI coding tools + transcript logging
 ├── problem_statement.md            # Full task description and I/O schema
 ├── README.md                       # You are here
-├── code/                           # ← Build your agent here
-│   └── main.py                     #   Entry point (rename/extend as you like)
-├── data/                           # Local-only support corpus (no network needed)
-│   ├── hackerrank/                 #   HackerRank help center
-│   ├── claude/                     #   Claude Help Center export
-│   └── visa/                       #   Visa consumer + small-business support
+├── docs/                           # decisions.md, interview prep, demo script, dev rubric
+├── scripts/                        # run_agent.sh / run_agent.ps1 (repo-root invocation)
+├── code/                           # Participant agent (see code/README.md)
+│   ├── main.py                     # CLI entry: reads CSV, writes predictions
+│   ├── retrieve.py                 # Hybrid retrieval + reranking
+│   ├── eval_sample.py              # Metrics vs sample_support_tickets.csv
+│   └── tests/                      # pytest regression checks
+├── data/                           # Offline support corpus (required to run locally)
+│   ├── hackerrank/
+│   ├── claude/
+│   └── visa/
 └── support_tickets/
-    ├── sample_support_tickets.csv  # Inputs + expected outputs (for development)
-    ├── support_tickets.csv         # Inputs only (run your agent on these)
-    └── output.csv                  # Write your agent's predictions here
+    ├── sample_support_tickets.csv  # Labeled examples for development
+    ├── support_tickets.csv         # Inputs for final predictions
+    └── output.csv                  # Generated predictions (created by running the agent)
 ```
 
 ---
@@ -67,27 +89,35 @@ Beyond that you are free to bring your own approach — RAG, vector DBs, tool us
 
 ## Where your code goes
 
-All of your work belongs in [`code/`](./code/). The repo ships with an empty `code/main.py` you can grow into your full agent — add more modules (`agent.py`, `retriever.py`, `classifier.py`, etc.) next to it as needed.
+Implement the agent under [`code/`](./code/). This checkout includes a **Python** reference implementation: hybrid retrieval over `data/`, risk-based escalation, optional OpenAI JSON generation with offline fallback, and CSV I/O. Full setup, environment variables, and regression commands are documented in **[`code/README.md`](./code/README.md)**.
 
 Conventions:
 
-- Put a **README inside `code/`** describing how to install dependencies and run your agent.
-- Read secrets **from environment variables only** (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, …). Copy `.env.example` → `.env` (already gitignored) if you keep one. **Never hardcode keys.**
-- Be **deterministic** where possible. Seed any random sampling.
-- Write responses to `support_tickets/output.csv`.
+- Read secrets **only from environment variables** (see `.env.example`). **Never hardcode keys.**
+- Be **deterministic** where possible (seeded retrieval / sampling).
+- Write predictions to `support_tickets/output.csv`.
 
 ---
 
 ## Quickstart
 
-Clone this repository:
+Clone the repository and keep the **`data/`** folder next to `code/` — the agent does not fetch live help-center content; it reads the bundled corpus.
 
 ```bash
-git clone git@github.com:interviewstreet/hackerrank-orchestrate-may26.git
 cd hackerrank-orchestrate-may26
+python -m venv .venv
+# .venv\Scripts\activate        # Windows
+source .venv/bin/activate       # macOS / Linux
+pip install -r code/requirements.txt
+# Optional: fully offline run (no LLM API); uncomment the line for your shell
+# set ORCHESTRATE_DISABLE_LLM=1        # Windows cmd
+# export ORCHESTRATE_DISABLE_LLM=1     # macOS / Linux
+python code/main.py
 ```
 
-You are free to use any language or runtime. We recommend **Python**, **JavaScript**, or **TypeScript**.
+This writes `support_tickets/output.csv`. **Regression:** `cd code` then `python run_eval.py --offline` (compares to `sample_support_tickets.csv`).
+
+Submission expects a **zip of `code/` only** (no `data/` in the zip); evaluators use their own corpus copy. Your **`output.csv`** is uploaded separately.
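The standard-library `zipfile` module can handle packaging `code/` for submission. Below is a minimal sketch. It assumes you want to leave out local artifacts such as the `.cache/` retrieval index, `__pycache__/`, and any `.env` file. The exclusion sets and the `zip_code_dir` name are assumptions, not submission rules.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumption: local-only artifacts that should not ship in the submission zip.
EXCLUDED_DIRS = {".cache", "__pycache__", ".pytest_cache", ".venv"}
EXCLUDED_FILES = {".env"}


def zip_code_dir(src: Path, dest: Path) -> list[str]:
    """Archive `src` into `dest` under a top-level `src.name/` prefix; return archived names.

    `dest` must live outside `src`, otherwise the archive would be listed and added to itself.
    """
    names: list[str] = []
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src)
            if any(part in EXCLUDED_DIRS for part in rel.parts):
                continue
            if not path.is_file() or path.name in EXCLUDED_FILES:
                continue
            arcname = (Path(src.name) / rel).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

From the repository root, `zip_code_dir(Path("code"), Path("code_submission.zip"))` writes entries such as `code/main.py`. Sorting the paths keeps the archive order stable between runs.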
 
 ---
 
@@ -131,4 +161,4 @@ Results will be announced on May 15, 2026
 
 Submissions are scored across four dimensions: agent design (your `code/`), the AI Judge interview, output accuracy on `support_tickets/output.csv`, and AI fluency from your chat transcript.
 
-See [`evalutation_criteria.md`](./evalutation_criteria.md) for the full rubric.
+See [`evaluation_criteria.md`](./evaluation_criteria.md) for the full rubric. Design notes for this repo’s agent: [`docs/decisions.md`](./docs/decisions.md).
diff --git a/code/README.md b/code/README.md
@@ -1,6 +1,8 @@
 # Support triage agent (Orchestrate)
 
-Terminal agent that reads `support_tickets/support_tickets.csv`, retrieves grounded snippets from the offline `data/` corpus (BM25 + overlap rerank), applies risk-based escalation rules, and writes predictions to `support_tickets/output.csv`.
+Terminal agent that reads `support_tickets/support_tickets.csv`, retrieves grounded snippets from the offline `data/` corpus (**BM25 + TF‑IDF fusion + lexical rerank**), applies risk-based escalation rules + taxonomy mapping, and writes predictions to `support_tickets/output.csv`.
+
+**Design rationale & decision flowchart:** [`../docs/decisions.md`](../docs/decisions.md). **Interview / demo / rubric:** [`../docs/interview.md`](../docs/interview.md), [`../docs/demo-script.md`](../docs/demo-script.md), [`../docs/DEV_EVAL.md`](../docs/DEV_EVAL.md).
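The retrieval step is described as BM25 + TF-IDF fusion, and the documented settings include `BM25_WEIGHT=0.55`, `TFIDF_WEIGHT=0.45`, and `TOP_K=6`. This diff does not show how `retrieve.py` combines the two score lists. One common approach, assumed in this sketch, is to min-max normalize each list and take a weighted sum. `min_max` and `fuse` are illustrative names.

```python
from __future__ import annotations

from typing import Mapping


def min_max(scores: Mapping[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1]; if every score is equal, all candidates tie at 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / span for doc, s in scores.items()}


def fuse(
    bm25: Mapping[str, float],
    tfidf: Mapping[str, float],
    *,
    bm25_weight: float = 0.55,
    tfidf_weight: float = 0.45,
    top_k: int = 6,
) -> list[tuple[str, float]]:
    """Weighted-sum fusion of two normalized score maps; a missing score counts as 0."""
    nb, nt = min_max(bm25), min_max(tfidf)
    fused = {
        doc: bm25_weight * nb.get(doc, 0.0) + tfidf_weight * nt.get(doc, 0.0)
        for doc in set(nb) | set(nt)
    }
    # Highest score first; ties broken by doc id so results are deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Normalizing first matters because raw BM25 scores are unbounded while TF-IDF cosine scores sit in [0, 1]. Without it, the weights would not mean what their names suggest.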
 
 ## Setup
 
@@ -21,36 +23,96 @@ OPENAI_API_KEY=sk-...
 # ORCHESTRATE_SEED=42
 # TOP_K=6
 # LOW_BM25_THRESHOLD=7.0
+# ORCHESTRATE_DISABLE_LLM=1
+# ORCHESTRATE_INDEX_VERSION=2
+# HYBRID_CANDIDATES=160
+# BM25_WEIGHT=0.55
+# TFIDF_WEIGHT=0.45
+#
+# Grounding (post-generation checks vs retrieved text):
+# ORCHESTRATE_GROUNDING_MIN_OVERLAP=0.12
+# ORCHESTRATE_GROUNDING_FAIL_MODE=resynthesize   # or: escalate
+#
+# Lexical rerank bonuses (query term + brand alignment):
+# ORCHESTRATE_RERANK_BONUS_TEAM=5
+# ORCHESTRATE_RERANK_BONUS_WORKSPACE=5
+# ORCHESTRATE_RERANK_BONUS_BRAND=3
 ```
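The grounding settings gate answers after generation. This diff does not show the exact overlap measure `grounding.py` uses. The sketch below assumes it is the fraction of the answer's distinct content words that also appear in the retrieved chunks. That fraction is compared against `ORCHESTRATE_GROUNDING_MIN_OVERLAP`, and `ORCHESTRATE_GROUNDING_FAIL_MODE` selects the fallback. The function names and the stopword list are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumption: a tiny stopword list so overlap is measured on content words only.
_STOP = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
         "is", "are", "your", "you", "it", "this", "that", "with"}


def _content_tokens(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in _STOP}


def grounding_overlap(answer: str, passages: Iterable[str]) -> float:
    """Fraction of the answer's distinct content tokens found in any retrieved passage."""
    ans = _content_tokens(answer)
    if not ans:
        return 0.0
    support: set[str] = set()
    for passage in passages:
        support |= _content_tokens(passage)
    return len(ans & support) / len(ans)


def grounding_action(answer: str, passages: Iterable[str], *,
                     min_overlap: float = 0.12, fail_mode: str = "resynthesize") -> str:
    """Return "ok", or the configured fail mode when overlap falls below the threshold."""
    if fail_mode not in {"resynthesize", "escalate"}:
        raise ValueError(f"unknown fail mode: {fail_mode!r}")
    return "ok" if grounding_overlap(answer, passages) >= min_overlap else fail_mode
```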
 
-If `OPENAI_API_KEY` is unset, the agent uses an extractive fallback (weaker, but runs offline except for the corpus).
+If `OPENAI_API_KEY` is unset (or `ORCHESTRATE_DISABLE_LLM=1`), the agent uses an **offline synthesis** path (structured steps from retrieved articles; weaker than an LLM with available API quota, but fully corpus-grounded).
 
 ## Run
 
-From the `code/` directory (so imports resolve):
+From the **repository root** (recommended; avoids a name clash with Python’s stdlib `code` module on Linux):
+
+```bash
+python code/main.py
+```
+
+Or from the `code/` directory:
 
 ```bash
 python main.py
 ```
 
+(`python -m code` is unreliable because it may load the **stdlib** `code` module instead of this folder.)
+
 Options:
 
 ```text
 --input   path to input CSV (default: ../support_tickets/support_tickets.csv)
 --output  path to output CSV (default: ../support_tickets/output.csv)
---limit N process only the first N rows (debug)
+--limit N process only the first N rows (default 0 = all rows; must be >= 0)
+--fail-fast        exit on first row exception (exit 2); default is write escalated placeholder rows
+--progress         tqdm progress bar (requires `tqdm` installed)
+--max-field-chars N cap Issue/Subject length per row (default: env ORCHESTRATE_MAX_FIELD_CHARS or 200000)
+```
+
+Exit codes: **0** success; **2** user error (missing input, bad CSV schema, bad `--limit` / `--max-field-chars`, unusable `--output`, missing `data/`, index lock timeout, or a row error under **`--fail-fast`**); **1** any other unexpected crash. Without `--fail-fast`, an exception raised while processing a row is caught and that row is written as an escalated placeholder.
+
+The first run builds a retrieval index under `code/.cache/bm25_index.pkl`. Delete it if you change chunking/fusion logic or bump `ORCHESTRATE_INDEX_VERSION`.
+
+### Quick regression (sample file)
+
+```bash
+python run_eval.py --offline
+```
+
+Optional diagnostics on the generated `sample_pred.csv`:
+
+```bash
+python run_eval.py --offline --report-quality
+```
+
+`eval_sample.py` reports exact match on routing columns and **fuzzy stats on `response`** (normalized exact, token F1, character overlap). Use them to catch regressions on free-text answers; the official holdout is still scored by the platform.
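Token F1 is the least obvious of the fuzzy `response` stats. This sketch follows the usual SQuAD-style definition: precision and recall over the multiset of shared tokens, combined as their harmonic mean. The normalization (lowercasing and splitting on non-alphanumeric ASCII characters) is an assumption and may differ from `eval_metrics.py`.

```python
from __future__ import annotations

import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters/digits (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style token F1 between a predicted and a gold response."""
    p, g = _tokens(pred), _tokens(gold)
    if not p and not g:
        return 1.0  # both empty: treat as a perfect match
    if not p or not g:
        return 0.0
    overlap = sum((Counter(p) & Counter(g)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("reset password now", "reset password")` has precision 2/3 and recall 1, giving an F1 of 0.8.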
+
+Compare any gold CSV to predictions (e.g. internal dev labels with `Justification`):
+
+```bash
+python compare_outputs.py --gold ../path/to/gold.csv --pred ../support_tickets/output.csv
+```
+
+### Tests
+
+```bash
+cd code
+python -m pytest tests -q
 ```
 
-The first run builds a BM25 index under `code/.cache/bm25_index.pkl`; delete that file if you change chunking logic.
+CI (GitHub Actions) runs the same **pytest** + **`run_eval.py --offline`** on each push/PR.
 
 ## Architecture
 
 | Module          | Role                                               |
 | --------------- | -------------------------------------------------- |
 | `corpus.py`     | Loads markdown, strips noise, chunks articles       |
-| `retrieve.py`   | BM25 index, brand filtering, lexical rerank        |
+| `retrieve.py`   | Hybrid retrieval index + brand filtering + rerank |
+| `taxonomy.py`   | Canonical `product_area` mapping                  |
+| `grounding.py` / `postprocess.py` | Cheap grounding checks + finalize |
 | `risk.py`       | Regex escalation triggers (e.g. grading disputes) |
-| `openai_agent.py` | JSON-grounded LLM decisioning + fallback        |
+| `openai_agent.py` | JSON-grounded LLM decisioning + offline synthesis |
+| `eval_metrics.py` / `eval_sample.py` | Sample regression metrics |
+| `ticket_hints.py` | Optional multi-topic heuristics (tests / docs) |
 | `main.py`       | CSV orchestration                                  |
 
 Secrets are read **only** from the environment (never commit `.env`).
diff --git a/code/__init__.py b/code/__init__.py
@@ -0,0 +1 @@
+"""Orchestrate support triage agent package (run from repo root: ``python code/main.py``)."""
diff --git a/code/__main__.py b/code/__main__.py
@@ -0,0 +1,21 @@
+"""CLI entry when invoked as ``python -m code`` from the repository root (Windows-friendly;
+on Linux ``python -m code`` may load the stdlib ``code`` module — use ``python code/main.py``).
+
+
+``main.py`` uses absolute imports (``from config import …``) assuming ``code/`` is on
+``sys.path``. Running ``python -m code`` puts the current working directory on ``sys.path``, not ``code/``,
+so we prepend this package directory before importing ``main``.
+"""
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+_pkg_dir = Path(__file__).resolve().parent
+if str(_pkg_dir) not in sys.path:
+    sys.path.insert(0, str(_pkg_dir))
+
+from main import main  # noqa: E402
+
+if __name__ == "__main__":
+    main()
diff --git a/code/answer_synthesis.py b/code/answer_synthesis.py
@@ -0,0 +1,110 @@
+"""Offline answer shaping: convert retrieved markdown-ish text into short actionable guidance."""
+from __future__ import annotations
+
+import hashlib
+import re
+
+from retrieve import Retrieved
+
+
+_BULLET_LINE = re.compile(r"(?m)^\s*(?:[-*•]|\d+\.)\s+(.*)$")
+
+# Lines that usually aren't helpful as user-facing steps.
+_NOISE_PREFIXES = (
+    "last updated:",
+    "_last updated",
+    "note:",
+    "important:",
+    "warning:",
+)
+
+
+def _clean_line(line: str) -> str:
+    s = line.strip()
+    s = re.sub(r"\s+", " ", s)
+    return s
+
+
+def _strip_heading_noise(text: str) -> str:
+    # Drop markdown heading lines entirely; the chunk title is rendered separately by the caller.
+    text = re.sub(r"(?m)^#+\s+.*$", "", text)
+    return text
+
+
+def extract_steps(text: str, *, max_steps: int = 8, max_chars_per_step: int = 260) -> list[str]:
+    """Pull readable steps from support article bodies."""
+    text = _strip_heading_noise(text)
+    lines = [ln.rstrip() for ln in text.splitlines()]
+    steps: list[str] = []
+
+    # Prefer explicit bullets/numbered lists.
+    for ln in lines:
+        m = _BULLET_LINE.match(ln.strip())
+        if not m:
+            continue
+        step = _clean_line(m.group(1))
+        if not step:
+            continue
+        low = step.lower()
+        if any(low.startswith(p) for p in _NOISE_PREFIXES):
+            continue
+        if len(step) > max_chars_per_step:
+            step = step[: max_chars_per_step - 1] + "…"
+        steps.append(step)
+        if len(steps) >= max_steps:
+            break
+
+    # Fallback: split long paragraphs into sentences if no bullets exist.
+    if not steps:
+        blob = _clean_line(re.sub(r"\s+", " ", text))
+        # naive sentence split (good enough for hackathon corpus text)
+        parts = re.split(r"(?<=[.!?])\s+", blob)
+        for p in parts:
+            p = _clean_line(p)
+            if len(p) < 40:
+                continue
+            low = p.lower()
+            if any(low.startswith(pref) for pref in _NOISE_PREFIXES):
+                continue
+            if len(p) > max_chars_per_step:
+                p = p[: max_chars_per_step - 1] + "…"
+            steps.append(p)
+            if len(steps) >= max_steps:
+                break
+
+    return steps[:max_steps]
+
+
+def synthesize_reply_from_hits(hits: list[Retrieved], *, max_sources: int = 2) -> tuple[str, list[str]]:
+    """Return (user_response, source_paths_used)."""
+    if not hits:
+        return "", []
+
+    sources: list[str] = []
+    blocks: list[str] = []
+
+    for h in hits[:max_sources]:
+        c = h.chunk
+        sources.append(c.path)
+        title = c.title.strip()
+        steps = extract_steps(c.text)
+        if not steps:
+            # last resort small excerpt
+            excerpt = re.sub(r"\s+", " ", c.text).strip()
+            excerpt = excerpt[:700] + ("…" if len(excerpt) > 700 else "")
+            blocks.append(f"From {title}:\n{excerpt}")
+            continue
+
+        rendered = "\n".join(f"{i+1}. {s}" for i, s in enumerate(steps))
+        blocks.append(f"From {title}:\n{rendered}")
+
+    body = "\n\n".join(blocks).strip()
+    # Short, varied closings (deterministic per content hash — less repetitive than one fixed paragraph).
+    _closings = (
+        "If this doesn’t match what you see, contact official support for your product.",
+        "If you still need help, use your product’s official support channel.",
+        "For anything still unclear, reach out through the official support path for your product.",
+    )
+    digest = int(hashlib.sha256(body.encode("utf-8")).hexdigest(), 16)
+    body += "\n\n" + _closings[digest % len(_closings)]
+    return body, sources
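The closing-line choice in `synthesize_reply_from_hits` hashes the reply body with SHA-256 rather than Python's built-in `hash()`. For `str`, `hash()` is randomized per interpreter process unless `PYTHONHASHSEED` is fixed, so it would pick different closings on different runs. The same selection rule can be isolated as a self-contained sketch, here with a hypothetical `pick_closing` helper:

```python
from __future__ import annotations

import hashlib
from typing import Sequence


def pick_closing(body: str, closings: Sequence[str]) -> str:
    """Choose a closing line deterministically from the SHA-256 digest of `body`."""
    if not closings:
        raise ValueError("closings must not be empty")
    digest = int(hashlib.sha256(body.encode("utf-8")).hexdigest(), 16)
    return closings[digest % len(closings)]
```

Identical bodies always get the same closing, which keeps `output.csv` reproducible. Different bodies still spread across the available closings.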