NITISH-R-G · May 1, 2026
diff --git a/.env.example b/.env.example
@@ -3,3 +3,5 @@ OPENAI_API_KEY=
 # OPENAI_MODEL=gpt-4o-mini
 # ORCHESTRATE_SEED=42
 # ORCHESTRATE_MAX_FIELD_CHARS=200000
+# Set to 1 to disable escalation when one ticket references multiple product ecosystems (e.g. HackerRank + Claude)
+# ORCHESTRATE_DISABLE_CROSS_ECOSYSTEM_ESCALATE=1
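The new flag follows the same opt-out pattern as `ORCHESTRATE_DISABLE_LLM`. Below is a minimal sketch of a shared reader for such switches. It assumes every flag should accept the truthy spellings that `cross_ecosystem.py` checks (`1`, `true`, `yes`, `y`). `env_flag` is a hypothetical helper and is not part of this diff.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Truthy spellings mirrored from cross_ecosystem.py; any other value means "off".
_TRUTHY = frozenset({"1", "true", "yes", "y"})


def env_flag(name: str, environ: Mapping[str, str] | None = None) -> bool:
    """Return True when `name` is set to a truthy value (case- and whitespace-insensitive).

    Hypothetical helper: `environ` defaults to os.environ; pass a dict in tests.
    """
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in _TRUTHY
```

Passing an explicit mapping keeps tests independent of the process environment.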
diff --git a/README.md b/README.md
@@ -6,6 +6,92 @@ Build a terminal-based AI agent that triages real support tickets across three p
 
 Read [`problem_statement.md`](./problem_statement.md) for the full task spec, input/output schema, and allowed values, and [`evaluation_criteria.md`](./evaluation_criteria.md) for how submissions are scored.
 
+---
+
+## Setup (evaluators — primary instructions)
+
+**Environment:** Python **3.11+**. Work from the **repository root** (the folder that contains this `README.md`).
+
+| Step | Action |
+|------|--------|
+| **1. Dependencies** | `pip install -r code/requirements.txt` |
+| **2. Secrets** | Copy `.env.example` → `.env` at the repo root. Set `OPENAI_API_KEY` if you use the LLM path, **or** set `ORCHESTRATE_DISABLE_LLM=1` for a fully offline run (no API calls). Never commit `.env`. |
+| **3. Run the agent** | `python code/main.py` — reads `support_tickets/support_tickets.csv`, writes `support_tickets/output.csv`. Use `python code/main.py --help` for `--input`, `--output`, `--limit`. |
+| **4. Regression check (optional)** | From `code/`: `ORCHESTRATE_DISABLE_LLM=1 python run_eval.py --offline` (POSIX shell syntax; in PowerShell, set `$env:ORCHESTRATE_DISABLE_LLM="1"` first). Full smoke test from the repo root: `pwsh -File scripts/verify_local.ps1` or `bash scripts/verify_local.sh`. |
+
+**Cross-platform:** Prefer `python code/main.py` from the repo root on **Windows, Linux, and macOS**. Avoid `python -m code` on Unix-like systems (stdlib name clash). Details: [`code/README.md`](./code/README.md).
+
+---
+
+## Approach overview
+
+This submission implements **offline retrieval-augmented triage** over the bundled markdown corpus in **`data/`** (no live web search for answer facts):
+
+1. **Retrieval:** Hybrid **BM25 + TF-IDF** fusion with lexical reranking to fetch relevant support chunks (`code/retrieve.py`).
+2. **Routing & safety:** Regex **risk** escalation and **cross-ecosystem** detection (mixed vendors in one ticket) before answer generation (`risk.py`, `cross_ecosystem.py`).
+3. **Taxonomy:** Stable **`product_area`** labels aligned to corpus structure (`taxonomy.py`).
+4. **Answer generation:** Optional **OpenAI** chat with **JSON over retrieved context only** (`openai_agent.py`); if the API is missing or `ORCHESTRATE_DISABLE_LLM=1`, **offline synthesis** builds replies from retrieved text (`answer_synthesis.py`).
+5. **Grounding:** Post-generation lexical overlap and numeric guards (`grounding.py`, `postprocess.py`).
+
+Deeper design decisions and trade-offs: [`docs/decisions.md`](./docs/decisions.md). Interview / limits: [`docs/interview.md`](./docs/interview.md), [`docs/scope_and_limits.md`](./docs/scope_and_limits.md).
+
+---
+
+## Packaging a ZIP (full project + this README)
+
+The challenge may ask for your **complete working project** and a **README** with setup and approach — that is this **root `README.md`**, not only `code/README.md`.
+
+**Recommended (clean, no secrets, no `.git` folder):** from the repo root, archive **tracked** files only:
+
+```bash
+git archive --format=zip -o ../hackerrank-orchestrate-submission.zip HEAD
+```
+
+Or run **`scripts/make_submission_zip.sh`** / **`scripts/make_submission_zip.ps1`** (same approach; the zip is written next to the repo folder).
+
+That typically includes `README.md`, `AGENTS.md`, `problem_statement.md`, `evaluation_criteria.md`, `code/`, `data/`, `support_tickets/`, `docs/`, `scripts/`, `.github/`, and anything else **committed** at `HEAD`. Untracked or ignored files (e.g. `.venv`, `code/.cache`) and uncommitted edits are left out, so commit before archiving.
+
+**Do not** put API keys in the zip (never commit `.env`).
+
+**If the platform requires a `code/`-only zip** instead, zip the `code/` directory — and **add a copy of the sections [Setup](#setup-evaluators--primary-instructions) and [Approach](#approach-overview) into `code/README.md`** so reviewers still see setup + approach in one place.
+
+**Predictions** (`output.csv`) are often uploaded **separately** on HackerRank—follow the live submission page.
+
+---
+
+### Evaluation criteria (`evaluation_criteria.md`) — what this repo covers vs what you must bring
+
+| Dimension | What the repo already supports | What you still own |
+|-----------|----------------------------------|-------------------|
+| **1. Agent Design** | Clear pipeline (`retrieve.py`, `openai_agent.py`, `postprocess.py`, `risk.py`, `taxonomy.py`), pinned `requirements.txt`, tests, CI, [`docs/decisions.md`](./docs/decisions.md) | Explaining trade-offs and alternatives in the **AI Judge interview** |
+| **2. AI Judge Interview** | Prep in [`docs/interview.md`](./docs/interview.md), [`docs/demo-script.md`](./docs/demo-script.md) | Showing up, demonstrating depth, honesty about AI assistance |
+| **3. Output CSV** | `main.py` → `support_tickets/output.csv`; run [`scripts/verify_local.ps1`](./scripts/verify_local.ps1) / [`scripts/verify_local.sh`](./scripts/verify_local.sh) before upload | Regenerating predictions on the final `support_tickets.csv`; hidden-set accuracy is scored by the platform |
+| **4. AI Fluency (transcript)** | [`AGENTS.md`](./AGENTS.md) instructs tools to log turns to `%USERPROFILE%\hackerrank_orchestrate\log.txt` (Windows) / `$HOME/hackerrank_orchestrate/log.txt` (Unix) | **You** must collaborate visibly with intent—scoped prompts, critique, architectural steering—not blind acceptance |
+
+**If many teams “meet” the bar, how is one winner chosen?** The public docs **do not publish exact weights or tie-break rules**. A reasonable working assumption is that per-dimension scores are **combined** into a final score, that **Output CSV** quality on **held-out rows** moves the leaderboard most, and that the **Interview** and **transcript** separate teams whose numeric scores are close. Exact ties across *all* dimensions seem unlikely, since small CSV differences already rank-order teams. For anything not specified here, treat **official platform / organizer communications** as the source of truth.
+
+**Baseline snapshot (Phase 0):** after meaningful routing/retrieval changes, run `python scripts/capture_baseline.py` from the repo root to refresh [`docs/superpowers/BASELINE.md`](./docs/superpowers/BASELINE.md) (git SHA, pytest count, sample routing %). **Per-row routing diff:** from `code/` after `ORCHESTRATE_DISABLE_LLM=1 python run_eval.py --offline`, run `python eval_sample.py --pred ../support_tickets/sample_pred.csv --routing-detail`.
+
+**Verify before submit (matches CI + full offline batch):** from the repo root, run `bash scripts/verify_local.sh` or `pwsh -File scripts/verify_local.ps1`. The script installs `code/requirements.txt`, runs `main.py --help`, `pytest`, and `run_eval.py --offline`, then runs the full batch with `main.py --limit 0` and the LLM off. Set `VERIFY_SKIP_FULL_BATCH=1` to stop after the sample regression (faster). On macOS/Linux, run `chmod +x scripts/*.sh` if needed. A passing run shows the pipeline is healthy; it does **not** prove hidden-test accuracy.
+
+**Problem statement alignment (what this repo implements):**
+
+| Requirement (`problem_statement.md`) | How it is addressed |
+|--------------------------------------|---------------------|
+| Terminal-based agent | `code/main.py` CLI; run via `python code/main.py` or `scripts/run_agent.*` |
+| HackerRank / Claude / Visa | Corpus under `data/` per brand; retrieval uses brand mask + `infer_brand` when `Company` is `None` |
+| Only provided corpus for answers | Retrieval from `data/` only; LLM (if enabled) is given **retrieved** chunks as context, not live web search |
+| Request type, product area, reply vs escalate, justification | Output columns + `taxonomy.py`, `postprocess.py`, `risk.py`, `cross_ecosystem.py` |
+| Retrieve relevant docs | Hybrid BM25 + TF-IDF fusion + rerank (`retrieve.py`) |
+| Safe / grounded responses | Grounding overlap + numeric guard (`grounding.py`, `postprocess.py`); offline synthesis when LLM off |
+| Escalate high-risk / sensitive | Regex risk routes before generation; low-retrieval flag; cross-ecosystem escalation |
+| Handle noise, multi-topic, and potentially malicious text | Heuristics that mark small talk as `invalid`; risk patterns; multi-topic note (`ticket_hints.py`) |
+| CSV input → CSV output | `csv_io.py`; writes `response`, `product_area`, `status`, `request_type`, `justification` |
+
+**Cross-platform:** CI runs on **Ubuntu** (`python main.py` from `code/`). Use **`python code/main.py`** from the repo root on all OSes—avoid **`python -m code`** on Linux/macOS (stdlib `code` module name clash). **Windows:** use PowerShell scripts or `python code\main.py`. Same Python **3.11+** and `pip install -r code/requirements.txt` everywhere; keep `data/` next to `code/` as in the repo layout.
+
+**Offline routing check:** with `ORCHESTRATE_DISABLE_LLM=1`, `cd code && python run_eval.py --offline` should show **100%** exact match on `status`, `request_type`, and `product_area` for the bundled sample (response text differs when the LLM is off).
+
 ### Start here (run the bundled agent)
 
 From the **repository root** (after `pip install -r code/requirements.txt`):
@@ -27,14 +113,15 @@ Optional offline-only: set `ORCHESTRATE_DISABLE_LLM=1`, then run one of the abov
 
 ## Contents
 
-1. [Repository layout](#repository-layout)
-2. [What you need to build](#what-you-need-to-build)
-3. [Where your code goes](#where-your-code-goes)
-4. [Quickstart](#quickstart)
-5. [Chat transcript logging](#chat-transcript-logging)
-6. [Submission](#submission)
-7. [Judge interview](#judge-interview)
-8. [Evaluation criteria](#evaluation-criteria)
+1. [Setup](#setup-evaluators--primary-instructions) · [Approach](#approach-overview) · [Packaging a ZIP](#packaging-a-zip-full-project--this-readme)
+2. [Repository layout](#repository-layout)
+3. [What you need to build](#what-you-need-to-build)
+4. [Where your code goes](#where-your-code-goes)
+5. [Quickstart](#quickstart)
+6. [Chat transcript logging](#chat-transcript-logging)
+7. [Submission](#submission)
+8. [Judge interview](#judge-interview)
+9. [Evaluation criteria](#evaluation-criteria)
 
 ---
 
@@ -46,7 +133,7 @@ Optional offline-only: set `ORCHESTRATE_DISABLE_LLM=1`, then run one of the abov
 ├── problem_statement.md            # Full task description and I/O schema
 ├── README.md                       # You are here
 ├── docs/                           # decisions.md, interview prep, demo script, dev rubric
-├── scripts/                        # run_agent.sh / run_agent.ps1 (repo-root invocation)
+├── scripts/                        # run_agent.*, verify_local.*, make_submission_zip.*
 ├── code/                           # Participant agent (see code/README.md)
 │   ├── main.py                     # CLI entry: reads CSV, writes predictions
 │   ├── retrieve.py                 # Hybrid retrieval + reranking
@@ -117,7 +204,7 @@ python code/main.py
 
 This writes `support_tickets/output.csv`. **Regression:** `cd code` then `python run_eval.py --offline` (compares to `sample_support_tickets.csv`).
 
-Submission expects a **zip of `code/` only** (no `data/` in the zip); evaluators use their own corpus copy. Your **`output.csv`** is uploaded separately.
+For **ZIP packaging**, see [Packaging a ZIP](#packaging-a-zip-full-project--this-readme). Earlier versions of this README said to zip only `code/`; **follow the current submission UI**, which may expect the **full starter-repo layout** (including `data/` and this README) as the “complete project”.
 
 ---
 
@@ -139,11 +226,13 @@ You don't need to do anything to enable it — just use your AI tool normally. Y
 Submit on the HackerRank Community Platform:
 <https://www.hackerrank.com/contests/hackerrank-orchestrate-may26/challenges/support-agent/submission>
 
-You will upload **three** files:
+You will typically upload **three** artifacts:
+
+1. **Code / project zip** — Often the **full repository** (this README + `code/` + `data/` + …); see [Packaging a ZIP](#packaging-a-zip-full-project--this-readme). Exclude secrets and local venvs (`git archive` helps).
+2. **Predictions CSV** — agent output for `support_tickets/support_tickets.csv` (usually `output.csv`), **if** the platform asks for it separately.
+3. **Chat transcript** — `log.txt` from [Chat transcript logging](#chat-transcript-logging), **if** required.
 
-1. **Code zip** — zip your `code/` directory and upload it. Exclude virtualenvs, `node_modules`, build artifacts, the `data/` corpus, and the `support_tickets/` CSVs.
-2. **Predictions CSV** — your agent's output for `support_tickets/support_tickets.csv` (i.e. the populated `output.csv`).
-3. **Chat transcript** — the `log.txt` from the path in [Chat transcript logging](#chat-transcript-logging).
+Always confirm fields on the **live submission page**—wording can change between rounds.
 
 ---
 

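The approach overview mentions post-generation lexical overlap and numeric guards but does not give the rule. Below is a minimal sketch of that kind of check. It assumes overlap is the share of response tokens found in the retrieved context, and that every number in a reply must appear verbatim in that context. `lexical_overlap`, `numbers_grounded`, `is_grounded`, and the `0.5` threshold are illustrative; this is not the contents of `grounding.py`.

```python
import re

_TOKEN = re.compile(r"[a-z0-9]+")
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; punctuation is ignored.
    return set(_TOKEN.findall(text.lower()))


def lexical_overlap(response: str, context: str) -> float:
    """Fraction of distinct response tokens that also occur in the retrieved context."""
    resp = _tokens(response)
    if not resp:
        return 0.0
    return len(resp & _tokens(context)) / len(resp)


def numbers_grounded(response: str, context: str) -> bool:
    """True if every number in the response also occurs verbatim in the context."""
    context_numbers = set(_NUMBER.findall(context))
    return all(n in context_numbers for n in _NUMBER.findall(response))


def is_grounded(response: str, context: str, min_overlap: float = 0.5) -> bool:
    # Hypothetical threshold; the real guard's cutoff is not shown in this diff.
    return lexical_overlap(response, context) >= min_overlap and numbers_grounded(response, context)
```

A reply that fails either check would be a candidate for escalation or for falling back to offline synthesis. How `postprocess.py` actually reacts is not shown in this diff.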
diff --git a/code/README.md b/code/README.md
@@ -1,5 +1,7 @@
 # Support triage agent (Orchestrate)
 
+> **Evaluators:** primary **setup + approach overview** for the whole submission is the **repository root** [`../README.md`](../README.md). This file focuses on `code/` module details and flags.
+
 Terminal agent that reads `support_tickets/support_tickets.csv`, retrieves grounded snippets from the offline `data/` corpus (**BM25 + TF‑IDF fusion + lexical rerank**), applies risk-based escalation rules + taxonomy mapping, and writes predictions to `support_tickets/output.csv`.
 
 **Design rationale & decision flowchart:** [`../docs/decisions.md`](../docs/decisions.md). **Interview / demo / rubric:** [`../docs/interview.md`](../docs/interview.md), [`../docs/demo-script.md`](../docs/demo-script.md), [`../docs/DEV_EVAL.md`](../docs/DEV_EVAL.md).

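`code/README.md` describes retrieval as BM25 + TF-IDF fusion plus a lexical rerank, but the fusion rule is not part of this diff. Below is a dependency-free sketch that assumes each score list is min-max normalized and then combined with a weighted sum (`alpha`). `retrieve.py` may fuse the scores differently; reciprocal-rank fusion is a common alternative. The rerank step is omitted, and all function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized doc for the tokenized query (docs must be non-empty)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty docs
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_cosine(query: list[str], docs: list[list[str]]) -> list[float]:
    """Cosine similarity between TF-IDF vectors of the query and each doc."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF

    def weigh(tokens: list[str]) -> dict[str, float]:
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    q = weigh(query)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for d in docs:
        dv = weigh(d)
        d_norm = math.sqrt(sum(v * v for v in dv.values()))
        dot = sum(w * dv.get(t, 0.0) for t, w in q.items())
        out.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return out


def min_max(xs: list[float]) -> list[float]:
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_search(query: str, docs: list[str], top_k: int = 3, alpha: float = 0.5) -> list[tuple[int, float]]:
    """Rank docs by alpha * BM25 + (1 - alpha) * TF-IDF cosine, both min-max normalized."""
    q = tokenize(query)
    toks = [tokenize(d) for d in docs]
    bm25 = min_max(bm25_scores(q, toks))
    tfidf = min_max(tfidf_cosine(q, toks))
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(bm25, tfidf)]
    ranked = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in ranked[:top_k]]
```

Normalizing before mixing matters because raw BM25 scores are unbounded while this cosine similarity lies in [0, 1]. Without normalization, whichever scorer has the larger range dominates the sum.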
diff --git a/code/conftest.py b/code/conftest.py
@@ -0,0 +1,14 @@
+"""Shared pytest fixtures (session-scoped retrieval index)."""
+from __future__ import annotations
+
+import pytest
+
+from config import CACHE_PATH, DATA_DIR
+from retrieve import BM25Index
+
+
+@pytest.fixture(scope="session")
+def bm25_index_session() -> BM25Index:
+    if not DATA_DIR.is_dir():
+        pytest.skip(f"Corpus missing: {DATA_DIR}")
+    return BM25Index.load(CACHE_PATH, DATA_DIR)
diff --git a/code/cross_ecosystem.py b/code/cross_ecosystem.py
@@ -0,0 +1,52 @@
+"""Detect tickets that span multiple distinct product ecosystems — safer to escalate than guess one answer."""
+from __future__ import annotations
+
+import os
+import re
+
+
+def cross_ecosystem_escalation_reason(issue: str, subject: str) -> str | None:
+    """Return human-readable escalate reason, or None.
+
+    Pairwise checks; Visa counts only when a card/payment/travel term also appears in the ticket,
+    which avoids false positives such as "HackerRank visa sponsorship" (immigration wording).
+    Disable entirely with ``ORCHESTRATE_DISABLE_CROSS_ECOSYSTEM_ESCALATE=1``.
+    """
+    if os.environ.get("ORCHESTRATE_DISABLE_CROSS_ECOSYSTEM_ESCALATE", "").strip().lower() in {
+        "1",
+        "true",
+        "yes",
+        "y",
+    }:
+        return None
+
+    blob = f"{subject}\n{issue}".strip()
+    low = blob.lower()
+
+    has_hr = bool(re.search(r"\bhackerrank\b", low))
+    has_claude = bool(re.search(r"\bclaude\b|\banthropic\b", low))
+    # Visa Inc. product context (cards/travel/payment), not generic immigration "visa".
+    has_visa_financial = bool(
+        re.search(r"\bvisa\b", low)
+        and re.search(
+            r"\b(card|cards|credit|debit|cheque|cheques|gcas|lost|stolen|"
+            r"traveller|traveler|payment|pin|atm|fraud|chargeback)\b",
+            low,
+        )
+    )
+
+    tags: list[str] = []
+    if has_hr and has_claude:
+        tags.append("HackerRank + Claude/Anthropic")
+    if has_hr and has_visa_financial:
+        tags.append("HackerRank + Visa payment/travel")
+    if has_claude and has_visa_financial:
+        tags.append("Claude + Visa payment/travel")
+
+    if not tags:
+        return None
+    return (
+        "Multiple distinct product ecosystems in one ticket ("
+        + "; ".join(tags)
+        + "); escalating for human routing."
+    )
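To see which tickets the new rule escalates, here is a condensed, self-contained restatement of the three predicates in `cross_ecosystem.py`. It uses the same regexes, and the pairwise tags reduce to "two or more ecosystems present". `ecosystems` and `would_escalate` are illustrative names.

```python
import re

# Same patterns as cross_ecosystem.py, condensed for experimentation.
_HR = re.compile(r"\bhackerrank\b")
_CLAUDE = re.compile(r"\bclaude\b|\banthropic\b")
_VISA = re.compile(r"\bvisa\b")
_VISA_CONTEXT = re.compile(
    r"\b(card|cards|credit|debit|cheque|cheques|gcas|lost|stolen|"
    r"traveller|traveler|payment|pin|atm|fraud|chargeback)\b"
)


def ecosystems(subject: str, issue: str) -> set[str]:
    """Return which product ecosystems a ticket mentions."""
    low = f"{subject}\n{issue}".lower()
    found: set[str] = set()
    if _HR.search(low):
        found.add("hackerrank")
    if _CLAUDE.search(low):
        found.add("claude")
    # Visa counts only alongside a card/payment/travel term anywhere in the ticket.
    if _VISA.search(low) and _VISA_CONTEXT.search(low):
        found.add("visa")
    return found


def would_escalate(subject: str, issue: str) -> bool:
    # Any pair of ecosystems yields at least one tag in the original, i.e. two or more present.
    return len(ecosystems(subject, issue)) >= 2
```

As written, the card/payment/travel terms may appear anywhere in the ticket, not only next to "visa". As a result, `would_escalate("", "HackerRank payment failed; I also need a work visa letter.")` returns `True` even though the Visa mention is about immigration.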
diff --git a/code/eval_sample.py b/code/eval_sample.py
@@ -28,6 +28,11 @@ def main() -> None:
     ap.add_argument("--sample", type=str, default=str(Path("..") / "support_tickets" / "sample_support_tickets.csv"))
     ap.add_argument("--pred", type=str, default=str(Path("..") / "support_tickets" / "output.csv"))
     ap.add_argument("--report", type=str, default=str(Path("..") / "support_tickets" / "sample_eval_report.csv"))
+    ap.add_argument(
+        "--routing-detail",
+        action="store_true",
+        help="Print per-row gold vs predicted routing (status, request_type, product_area).",
+    )
     args = ap.parse_args()
 
     try:
@@ -90,6 +95,27 @@ def exact_acc(gold: str, pred_col: str) -> float:
     mism = merged[merged["Status"] != merged["Pred_Status"]][key_cols + ["Status", "Pred_Status"]]
     print(f"\nStatus mismatches: {len(mism)}")
 
+    if args.routing_detail:
+        print("\n=== Per-row routing (gold vs pred) ===")
+        for i, r in merged.iterrows():
+            subj = str(r.get("Subject", ""))
+            g_st = str(r.get("Status", "")).strip()
+            p_st = str(r.get("Pred_Status", "")).strip()
+            g_rt = str(r.get("Request Type", "")).strip()
+            p_rt = str(r.get("Pred_Request Type", "")).strip()
+            g_pa = str(r.get("Product Area", "")).strip()
+            p_pa = str(r.get("Pred_Product Area", "")).strip()
+            ok = (
+                _norm_status(g_st) == _norm_status(p_st)
+                and g_rt.lower() == p_rt.lower()
+                and g_pa.lower() == p_pa.lower()
+            )
+            mark = "OK" if ok else "MISMATCH"
+            print(f"[{mark}] row={i} subject={subj[:60]!r}{'…' if len(subj) > 60 else ''}")
+            print(f"  status:       gold={g_st!r} pred={p_st!r}")
+            print(f"  request_type: gold={g_rt!r} pred={p_rt!r}")
+            print(f"  product_area: gold={g_pa!r} pred={p_pa!r}")
+
     report_cols = key_cols + [
         "Status",
         "Pred_Status",

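The `--routing-detail` flag prints gold vs predicted labels row by row. Below is the same comparison as a stdlib-only sketch. It assumes gold and prediction rows are already aligned by position and use the same column names; the real script merges a pandas frame and prefixes predictions with `Pred_`. `_norm` is a hypothetical stand-in for `_norm_status`, whose exact rules are not shown in this diff.

```python
ROUTING_FIELDS = ("Status", "Request Type", "Product Area")


def _norm(value: object) -> str:
    # Hypothetical normalization: trim and lowercase (eval_sample.py's _norm_status may do more).
    return str(value or "").strip().lower()


def routing_report(gold: list[dict[str, str]], pred: list[dict[str, str]]) -> dict[str, float]:
    """Per-field exact-match accuracy, assuming gold and pred rows are aligned by position."""
    if len(gold) != len(pred):
        raise ValueError(f"row count mismatch: {len(gold)} gold vs {len(pred)} pred")
    if not gold:
        return {field: 0.0 for field in ROUTING_FIELDS}
    acc = {}
    for field in ROUTING_FIELDS:
        hits = sum(_norm(g.get(field)) == _norm(p.get(field)) for g, p in zip(gold, pred))
        acc[field] = hits / len(gold)
    return acc


def mismatched_rows(gold: list[dict[str, str]], pred: list[dict[str, str]]) -> list[int]:
    """Indices of rows where any routing field differs (what the MISMATCH marker flags)."""
    return [
        i
        for i, (g, p) in enumerate(zip(gold, pred))
        if any(_norm(g.get(f)) != _norm(p.get(f)) for f in ROUTING_FIELDS)
    ]
```

With `ORCHESTRATE_DISABLE_LLM=1`, the README expects all three fields to reach 1.0 on the bundled sample.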
diff --git a/code/main.py b/code/main.py
@@ -11,6 +11,7 @@
 import pandas as pd
 
 from config import DATA_DIR, INPUT_CSV, MAX_FIELD_CHARS, OUTPUT_CSV, SEED, TOP_K
+from cross_ecosystem import cross_ecosystem_escalation_reason
 from csv_io import TicketCsvError, canonicalize_ticket_columns, read_tickets_csv
 from openai_agent import decide_with_openai, fallback_from_hits
 from postprocess import finalize_decision
@@ -127,6 +128,10 @@ def process_row(row: pd.Series, index: BM25Index) -> dict[str, Any]:
             fb["request_type"] = hit.force_request_type
         return _validate_row(fb)
 
+    eco = cross_ecosystem_escalation_reason(issue, subject)
+    if eco:
+        return _validate_row(fallback_from_hits([], escalated=True, esc_reason=eco, low_retrieval=False))
+
     hits, raw_top_score = index.search(f"{subject}\n{issue}", brand, TOP_K)
     hits = rerank_hits(f"{subject}\n{issue}", hits)
     low = should_escalate_low_retrieval(raw_top_score)

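The new check sits between the regex risk rules and retrieval, so a mixed-vendor ticket is escalated before any retrieval or LLM call happens. Below is a sketch of that ordering with the checks injected as callables. `route`, `Decision`, and `min_score` are hypothetical. The low-retrieval branch assumes a weak top score leads to escalation; the README calls this a "low-retrieval flag", but its exact effect is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    status: str  # "replied" or "escalated"
    reason: str


def route(
    ticket: str,
    risk_reason: Callable[[str], Optional[str]],
    eco_reason: Callable[[str], Optional[str]],
    top_score: Callable[[str], float],
    min_score: float = 1.0,
) -> Decision:
    # 1. Regex risk rules run first, before any retrieval or LLM call.
    if (reason := risk_reason(ticket)):
        return Decision("escalated", reason)
    # 2. Mixed-vendor tickets short-circuit too, so no single-brand answer is generated.
    if (reason := eco_reason(ticket)):
        return Decision("escalated", reason)
    # 3. Only now is retrieval run; weak evidence is assumed to escalate as well.
    if top_score(ticket) < min_score:
        return Decision("escalated", "low retrieval score")
    return Decision("replied", "answer grounded in retrieved context")
```

Injecting the checks keeps the ordering testable without loading the corpus.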
diff --git a/code/openai_agent.py b/code/openai_agent.py
@@ -70,6 +70,7 @@ def fallback_from_hits(
 {"status":"replied"|"escalated","product_area":"string","response":"string","justification":"string","request_type":"product_issue"|"feature_request"|"bug"|"invalid"}
 Rules:
 - status=escalated for fraud, legal threats, account takeover, grading disputes, bug bounty reports needing security team, or when CONTEXT lacks needed facts.
+- If the ticket mixes unrelated products (e.g. HackerRank assessment workflow AND Visa card dispute in one message), status=escalated — humans must split routing.
 - product_area: short snake_case like sample outputs (e.g. screen, community, privacy, travel_support). Prefer last breadcrumb or doc topic from CONTEXT.
 - request_type: bug if outage/errors; feature_request for new capability; invalid for spam/thanks/off-topic; else product_issue.
 - response: concise, user-facing, only facts supported by CONTEXT. If status=replied, no fabricated steps.

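The prompt pins `status` and `request_type` to small vocabularies, so the model's reply is worth validating before use. Below is a minimal sketch using only `json`. It assumes the five keys listed in the prompt are all required strings. `parse_decision` is hypothetical, and the validation that `openai_agent.py` actually performs is not shown in this diff.

```python
import json

ALLOWED_STATUS = {"replied", "escalated"}
ALLOWED_REQUEST_TYPES = {"product_issue", "feature_request", "bug", "invalid"}
REQUIRED_KEYS = ("status", "product_area", "response", "justification", "request_type")


def parse_decision(raw: str) -> dict[str, str]:
    """Parse and validate the model's JSON reply against the schema in the prompt.

    Raises ValueError on malformed JSON, missing keys, or out-of-vocabulary labels.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_KEYS if not isinstance(data.get(k), str)]
    if missing:
        raise ValueError(f"missing or non-string keys: {missing}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"bad status: {data['status']!r}")
    if data["request_type"] not in ALLOWED_REQUEST_TYPES:
        raise ValueError(f"bad request_type: {data['request_type']!r}")
    # Drop any extra keys the model may have added.
    return {k: data[k] for k in REQUIRED_KEYS}
```

Raising `ValueError` on any violation gives the caller a single place to fall back to offline synthesis.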
diff --git a/code/pytest.ini b/code/pytest.ini
@@ -0,0 +1,4 @@
+[pytest]
+# On some Windows/Python builds the pdb hook imports stdlib `code`; this repo's top-level
+# package folder is also named `code`, which can shadow the stdlib module and break pytest startup.
+addopts = -p no:debugging
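Whether the shadowing bites depends on which directory comes first on `sys.path`. Below is a small stdlib-only diagnostic that reports where `import code` would resolve, without importing it. The helper names are illustrative.

```python
from __future__ import annotations

import importlib.util
import sysconfig
from pathlib import Path


def code_module_origin() -> str | None:
    """Return the file `import code` would load, or None (e.g. for a namespace package)."""
    spec = importlib.util.find_spec("code")
    return spec.origin if spec is not None else None


def code_is_stdlib() -> bool:
    """True when `code` resolves to a file directly inside the interpreter's stdlib directory."""
    origin = code_module_origin()
    if origin is None:
        return False
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"]).resolve()
    return Path(origin).resolve().parent == stdlib_dir


if __name__ == "__main__":
    print(code_module_origin(), code_is_stdlib())
```

Run it from the directory you launch pytest from. If `code_is_stdlib()` returns `False`, something other than the stdlib module (such as the repo's `code/` folder) wins the import, and `-p no:debugging` is needed.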