Merged
1 change: 1 addition & 0 deletions .env.example
@@ -2,3 +2,4 @@
OPENAI_API_KEY=
# OPENAI_MODEL=gpt-4o-mini
# ORCHESTRATE_SEED=42
# ORCHESTRATE_MAX_FIELD_CHARS=200000
35 changes: 35 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,35 @@
name: CI

on:
  push:
    branches: ["**"]
  pull_request:

jobs:
  test-and-sample-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r code/requirements.txt

      # Do not use `python -m code` here: on Linux it often resolves to the *stdlib* `code`
      # module instead of this repo's `code/` package.
      - name: CLI help smoke test
        working-directory: code
        run: python main.py --help

      - name: Pytest
        working-directory: code
        run: python -m pytest tests -q

      - name: Sample regression (offline)
        working-directory: code
        env:
          ORCHESTRATE_DISABLE_LLM: "1"
        run: python run_eval.py --offline
68 changes: 49 additions & 19 deletions README.md
@@ -4,7 +4,24 @@ Starter repository for the **HackerRank Orchestrate** 24-hour hackathon (May 1

Build a terminal-based AI agent that triages real support tickets across three product ecosystems (**HackerRank**, **Claude**, and **Visa**) using only the support corpus shipped in this repo.

Read [`problem_statement.md`](./problem_statement.md) for the full task spec, input/output schema, and allowed values, and [`evalutation_criteria.md`](./evalutation_criteria.md) for how submissions are scored.
Read [`problem_statement.md`](./problem_statement.md) for the full task spec, input/output schema, and allowed values, and [`evaluation_criteria.md`](./evaluation_criteria.md) for how submissions are scored.

### Start here (run the bundled agent)

From the **repository root** (after `pip install -r code/requirements.txt`):

| Shell | Command |
|-------|---------|
| **Any (recommended)** | `python code/main.py` (avoids a name clash with the stdlib `code` module) |
| **Any** | `cd code` then `python main.py` |
| **bash / zsh** | `./scripts/run_agent.sh` or `bash scripts/run_agent.sh` |
| **PowerShell** | `pwsh -File scripts/run_agent.ps1` |

**Note:** on Linux, `python -m code` can invoke the **standard library** `code` module instead of this repo’s package. Prefer `python code/main.py` or the scripts above.
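To check which module actually answers to the name `code` on a given machine, you can query the import system directly. This is a minimal sketch using the standard `importlib.util.find_spec`; the helper name `which_code_module` is hypothetical and not part of this repo:

```python
from __future__ import annotations

import importlib.util


def which_code_module() -> str:
    """Report where the top-level name ``code`` would be imported from."""
    spec = importlib.util.find_spec("code")
    if spec is None:
        return "<not found>"
    # Expect .../code/__init__.py for this repo's package, or the stdlib's code.py otherwise.
    return spec.origin or "<no origin>"


if __name__ == "__main__":
    print(which_code_module())
```

Run the body through `python -c` from the repository root: like `-m`, `-c` puts the current directory first on `sys.path`, and `find_spec` also reports a module that is already imported, so the printed path shows which `code` wins in that environment.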

For a fully offline run (no LLM API calls), set `ORCHESTRATE_DISABLE_LLM=1` before running any of the commands above. Full CLI flags are in [`code/README.md`](./code/README.md).

**Interview / demo:** [`docs/interview.md`](./docs/interview.md), [`docs/demo-script.md`](./docs/demo-script.md). **Manual answer quality:** [`docs/DEV_EVAL.md`](./docs/DEV_EVAL.md). **Scope:** [`docs/scope_and_limits.md`](./docs/scope_and_limits.md).

---

@@ -28,16 +45,21 @@ Read [`problem_statement.md`](./problem_statement.md) for the full task spec, in
├── AGENTS.md # Rules for AI coding tools + transcript logging
├── problem_statement.md # Full task description and I/O schema
├── README.md # You are here
├── code/ # ← Build your agent here
│ └── main.py # Entry point (rename/extend as you like)
├── data/ # Local-only support corpus (no network needed)
│ ├── hackerrank/ # HackerRank help center
│ ├── claude/ # Claude Help Center export
│ └── visa/ # Visa consumer + small-business support
├── docs/ # decisions.md, interview prep, demo script, dev rubric
├── scripts/ # run_agent.sh / run_agent.ps1 (repo-root invocation)
├── code/ # Participant agent (see code/README.md)
│ ├── main.py # CLI entry: reads CSV, writes predictions
│ ├── retrieve.py # Hybrid retrieval + reranking
│ ├── eval_sample.py # Metrics vs sample_support_tickets.csv
│ └── tests/ # pytest regression checks
├── data/ # Offline support corpus (required to run locally)
│ ├── hackerrank/
│ ├── claude/
│ └── visa/
└── support_tickets/
├── sample_support_tickets.csv # Inputs + expected outputs (for development)
├── support_tickets.csv # Inputs only (run your agent on these)
└── output.csv # Write your agent's predictions here
├── sample_support_tickets.csv # Labeled examples for development
├── support_tickets.csv # Inputs for final predictions
└── output.csv # Generated predictions (create by running the agent)
```

---
@@ -67,27 +89,35 @@ Beyond that you are free to bring your own approach — RAG, vector DBs, tool us

## Where your code goes

All of your work belongs in [`code/`](./code/). The repo ships with an empty `code/main.py` you can grow into your full agent — add more modules (`agent.py`, `retriever.py`, `classifier.py`, etc.) next to it as needed.
Implement the agent under [`code/`](./code/). This checkout includes a **Python** reference implementation: hybrid retrieval over `data/`, risk-based escalation, optional OpenAI JSON generation with offline fallback, and CSV I/O. Full setup, environment variables, and regression commands are documented in **[`code/README.md`](./code/README.md)**.

Conventions:

- Put a **README inside `code/`** describing how to install dependencies and run your agent.
- Read secrets **from environment variables only** (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, …). Copy `.env.example` → `.env` (already gitignored) if you keep one. **Never hardcode keys.**
- Be **deterministic** where possible. Seed any random sampling.
- Write responses to `support_tickets/output.csv`.
- Read secrets **only from environment variables** (see `.env.example`). **Never hardcode keys.**
- Be **deterministic** where possible (seeded retrieval / sampling).
- Write predictions to `support_tickets/output.csv`.
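As a hedged illustration of the first two conventions, settings can be read from the environment at startup, and any sampling can go through a private, seeded `random.Random` instead of the global RNG. `load_settings` and `sample_ids` are hypothetical names, and the seed default of 42 is an assumption taken from the commented-out `ORCHESTRATE_SEED=42` in `.env.example`:

```python
from __future__ import annotations

import os
import random
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] | None = None) -> dict[str, object]:
    """Read secrets and tuning knobs from the environment only, never from source."""
    env = os.environ if env is None else env
    return {
        # None when unset, so callers can fall back to an offline path.
        "openai_api_key": env.get("OPENAI_API_KEY") or None,
        # Assumed default of 42, mirroring the commented-out value in .env.example.
        "seed": int(env.get("ORCHESTRATE_SEED", "42")),
    }


def sample_ids(ids: list[str], k: int, seed: int) -> list[str]:
    """Deterministic sampling: a private, seeded RNG picks the same items on every run."""
    rng = random.Random(seed)
    return rng.sample(ids, k)
```

Using a dedicated `random.Random(seed)` rather than `random.seed()` keeps other libraries that touch the global RNG from shifting your results.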

---

## Quickstart

Clone this repository:
Clone the repository and keep the **`data/`** folder next to `code/` — the agent does not fetch live help-center content; it reads the bundled corpus.

```bash
git clone git@github.com:interviewstreet/hackerrank-orchestrate-may26.git
cd hackerrank-orchestrate-may26
python -m venv .venv
source .venv/bin/activate                # macOS / Linux
# .venv\Scripts\activate                 # Windows cmd
# .venv\Scripts\Activate.ps1             # Windows PowerShell
pip install -r code/requirements.txt
# Optional: fully offline run (no LLM API)
export ORCHESTRATE_DISABLE_LLM=1         # macOS / Linux
# set ORCHESTRATE_DISABLE_LLM=1          # Windows cmd
# $env:ORCHESTRATE_DISABLE_LLM = "1"     # Windows PowerShell
python code/main.py
```

You are free to use any language or runtime. We recommend **Python**, **JavaScript**, or **TypeScript**.
This writes `support_tickets/output.csv`. **Regression:** `cd code` then `python run_eval.py --offline` (compares to `sample_support_tickets.csv`).

Submission expects a **zip of `code/` only** (no `data/` in the zip); evaluators use their own corpus copy. Your **`output.csv`** is uploaded separately.
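If you build the archive by hand, one way to zip only `code/` is sketched below with the standard `zipfile` module. The skipped directory names (`.cache`, `__pycache__`, `.pytest_cache`), the `.env` exclusion, and keeping the `code/` prefix inside the archive are assumptions; check the submission instructions for the exact layout expected. `zip_code_dir` is a hypothetical helper:

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumed exclusions: local caches and secrets that should not ship with a submission.
_SKIP_DIRS = {".cache", "__pycache__", ".pytest_cache"}
_SKIP_FILES = {".env"}


def zip_code_dir(repo_root: Path, dest: Path) -> list[str]:
    """Write dest as a zip of repo_root/code only and return the archived names."""
    code_dir = repo_root / "code"
    names: list[str] = []
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(code_dir.rglob("*")):
            rel = path.relative_to(repo_root)
            # Directories are implied by file paths; skip cache dirs and secret files.
            if path.is_dir() or path.name in _SKIP_FILES:
                continue
            if any(part in _SKIP_DIRS for part in rel.parts):
                continue
            zf.write(path, arcname=rel.as_posix())
            names.append(rel.as_posix())
    return names
```

For example, call `zip_code_dir(Path("."), Path("../submission.zip"))` from the repository root. Writing the zip outside `code/` keeps the archive from trying to include itself, and `data/` is never visited because only `code/` is walked.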

---

@@ -131,4 +161,4 @@ Results will be announced on May 15, 2026

Submissions are scored across four dimensions: agent design (your `code/`), the AI Judge interview, output accuracy on `support_tickets/output.csv`, and AI fluency from your chat transcript.

See [`evalutation_criteria.md`](./evalutation_criteria.md) for the full rubric.
See [`evaluation_criteria.md`](./evaluation_criteria.md) for the full rubric. Design notes for this repo’s agent: [`docs/decisions.md`](./docs/decisions.md).
76 changes: 69 additions & 7 deletions code/README.md
@@ -1,6 +1,8 @@
# Support triage agent (Orchestrate)

Terminal agent that reads `support_tickets/support_tickets.csv`, retrieves grounded snippets from the offline `data/` corpus (BM25 + overlap rerank), applies risk-based escalation rules, and writes predictions to `support_tickets/output.csv`.
Terminal agent that reads `support_tickets/support_tickets.csv`, retrieves grounded snippets from the offline `data/` corpus (**BM25 + TF‑IDF fusion + lexical rerank**), applies risk-based escalation rules + taxonomy mapping, and writes predictions to `support_tickets/output.csv`.
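The precise fusion rule is in `retrieve.py`, which this diff does not show. As an illustration only, one common way to combine two lexical scorers is to min-max normalize each score list per query and then take a weighted sum, using weights like the `BM25_WEIGHT=0.55` / `TFIDF_WEIGHT=0.45` values listed under Setup. The normalization choice and the `fuse_scores` name are assumptions:

```python
from __future__ import annotations


def _min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def fuse_scores(
    bm25: dict[str, float],
    tfidf: dict[str, float],
    bm25_weight: float = 0.55,
    tfidf_weight: float = 0.45,
    top_k: int = 6,
) -> list[tuple[str, float]]:
    """Weighted sum of min-max-normalized BM25 and TF-IDF scores, best first."""
    nb, nt = _min_max(bm25), _min_max(tfidf)
    docs = set(nb) | set(nt)
    # A chunk missing from one candidate list contributes 0 from that scorer.
    fused = {d: bm25_weight * nb.get(d, 0.0) + tfidf_weight * nt.get(d, 0.0) for d in docs}
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Normalizing first matters because raw BM25 scores have no fixed upper bound; if the TF-IDF side is a cosine similarity in [0, 1], the BM25 term would otherwise dominate regardless of the weights.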

**Design rationale & decision flowchart:** [`../docs/decisions.md`](../docs/decisions.md). **Interview / demo / rubric:** [`../docs/interview.md`](../docs/interview.md), [`../docs/demo-script.md`](../docs/demo-script.md), [`../docs/DEV_EVAL.md`](../docs/DEV_EVAL.md).

## Setup

@@ -21,36 +23,96 @@ OPENAI_API_KEY=sk-...
# ORCHESTRATE_SEED=42
# TOP_K=6
# LOW_BM25_THRESHOLD=7.0
# ORCHESTRATE_DISABLE_LLM=1
# ORCHESTRATE_INDEX_VERSION=2
# HYBRID_CANDIDATES=160
# BM25_WEIGHT=0.55
# TFIDF_WEIGHT=0.45
#
# Grounding (post-generation checks vs retrieved text):
# ORCHESTRATE_GROUNDING_MIN_OVERLAP=0.12
# ORCHESTRATE_GROUNDING_FAIL_MODE=resynthesize # or: escalate
#
# Lexical rerank bonuses (query term + brand alignment):
# ORCHESTRATE_RERANK_BONUS_TEAM=5
# ORCHESTRATE_RERANK_BONUS_WORKSPACE=5
# ORCHESTRATE_RERANK_BONUS_BRAND=3
```
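The grounding variables above point at a cheap post-generation check. Here is a minimal sketch of such a check, assuming it is a token-overlap ratio (the real `grounding.py` is not part of this diff). It computes the share of the response's content words that also appear in the retrieved text and returns the configured fail mode when that share falls below the threshold. The tokenizer, the small stopword list, and the function names are illustrative:

```python
from __future__ import annotations

import os
import re

_TOKEN = re.compile(r"[a-z0-9]+")
# Deliberately tiny stopword list for illustration.
_STOP = {"the", "and", "for", "you", "your", "with", "this", "that", "are"}


def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than two characters, minus stopwords."""
    return {t for t in _TOKEN.findall(text.lower()) if len(t) > 2 and t not in _STOP}


def grounding_overlap(response: str, retrieved: str) -> float:
    """Share of the response's content tokens that also occur in the retrieved text."""
    resp = content_tokens(response)
    if not resp:
        return 0.0
    return len(resp & content_tokens(retrieved)) / len(resp)


def grounding_action(response: str, retrieved: str, *, min_overlap: float = 0.12,
                     fail_mode: str = "resynthesize") -> str:
    """Return "ok" when grounded enough, else the fail mode ("resynthesize" or "escalate")."""
    return "ok" if grounding_overlap(response, retrieved) >= min_overlap else fail_mode


def grounding_action_from_env(response: str, retrieved: str) -> str:
    """Same check, with settings read from the ORCHESTRATE_GROUNDING_* environment variables."""
    return grounding_action(
        response,
        retrieved,
        min_overlap=float(os.environ.get("ORCHESTRATE_GROUNDING_MIN_OVERLAP", "0.12")),
        fail_mode=os.environ.get("ORCHESTRATE_GROUNDING_FAIL_MODE", "resynthesize"),
    )
```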

If `OPENAI_API_KEY` is unset, the agent uses an extractive fallback (weaker, but runs offline except for the corpus).
If `OPENAI_API_KEY` is unset (or `ORCHESTRATE_DISABLE_LLM=1`), the agent uses an **offline synthesis** path (structured steps extracted from retrieved articles; weaker than an LLM with available API quota, but fully corpus-grounded).
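The switch described above reduces to a small predicate. Here is a hedged sketch (`use_llm` is a hypothetical name), assuming only the literal value `1` disables the LLM and that a blank key counts as unset:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def use_llm(env: Mapping[str, str] | None = None) -> bool:
    """True only when an API key is set and the offline switch is off."""
    env = os.environ if env is None else env
    if env.get("ORCHESTRATE_DISABLE_LLM", "").strip() == "1":
        return False  # explicit offline mode wins even if a key is present
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

Passing an explicit mapping keeps the decision testable without mutating `os.environ`.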

## Run

From the `code/` directory (so imports resolve):
From the **repository root** (recommended; on Linux this avoids a name clash with Python’s stdlib `code` module):

```bash
python code/main.py
```

Or from the `code/` directory:

```bash
python main.py
```

(`python -m code` is unreliable because it may load the **stdlib** `code` module instead of this folder.)

Options:

```text
--input path to input CSV (default: ../support_tickets/support_tickets.csv)
--output path to output CSV (default: ../support_tickets/output.csv)
--limit N process only the first N rows (debug)
--limit N process only the first N rows (default 0 = all rows; must be >= 0)
--fail-fast       exit on first row exception (exit 2); by default, failed rows are written as escalated placeholders
--progress tqdm progress bar (requires `tqdm` installed)
--max-field-chars N cap Issue/Subject length per row (default: env ORCHESTRATE_MAX_FIELD_CHARS or 200000)
```
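If you are wiring up a similar CLI, the flag rules above map naturally onto `argparse`. This sketch reproduces the documented defaults but is not the repo's `main.py`; requiring `--max-field-chars` to be at least 1 is an assumption. A convenient side effect is that `argparse` reports invalid values by exiting with status 2, which matches the user-error exit code documented below.

```python
from __future__ import annotations

import argparse
import os


def _int_at_least(minimum: int):
    """Build an argparse ``type=`` converter that rejects integers below ``minimum``."""
    def convert(text: str) -> int:
        value = int(text)  # a ValueError here is reported by argparse as an invalid value
        if value < minimum:
            raise argparse.ArgumentTypeError(f"must be >= {minimum}, got {value}")
        return value
    return convert


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Support triage agent (illustrative flags only)")
    p.add_argument("--input", default="../support_tickets/support_tickets.csv")
    p.add_argument("--output", default="../support_tickets/output.csv")
    p.add_argument("--limit", type=_int_at_least(0), default=0, help="0 = all rows")
    p.add_argument("--fail-fast", action="store_true")
    p.add_argument("--progress", action="store_true")
    # String defaults are passed through `type`, so a bad env value is rejected too.
    p.add_argument(
        "--max-field-chars",
        type=_int_at_least(1),
        default=os.environ.get("ORCHESTRATE_MAX_FIELD_CHARS", "200000"),
    )
    return p
```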

Exit codes: **0** on success; **2** on user error (missing input, bad CSV schema, bad `--limit` / `--max-field-chars`, unusable `--output`, missing `data/`, index lock timeout, or a row error under **`--fail-fast`**); **1** for any other unexpected crash. Without `--fail-fast`, exceptions raised while processing a row are caught and that row is written as an escalated placeholder.
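The default-versus-`--fail-fast` behavior can be sketched as a row loop that either substitutes a placeholder or aborts. The placeholder's column names and values (`status`, `response`, `escalated`), `RowFailure`, and both function names are hypothetical; the real output schema is defined in `problem_statement.md`:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

Row = dict[str, str]

# Hypothetical placeholder; the real output columns come from problem_statement.md.
PLACEHOLDER: Row = {"status": "escalated", "response": "Escalated to a human agent."}


class RowFailure(Exception):
    """Raised under --fail-fast so the caller can exit with status 2."""


def process_rows(rows: Iterable[Row], handle: Callable[[Row], Row], *,
                 fail_fast: bool = False) -> list[Row]:
    out: list[Row] = []
    for i, row in enumerate(rows):
        try:
            out.append(handle(row))
        except Exception as exc:  # deliberately broad: one bad ticket should not sink the batch
            if fail_fast:
                raise RowFailure(f"row {i}: {exc}") from exc
            out.append(dict(PLACEHOLDER))  # copy so callers cannot mutate the shared template
    return out


def exit_code_for(rows: Iterable[Row], handle: Callable[[Row], Row], *,
                  fail_fast: bool = False) -> int:
    try:
        process_rows(rows, handle, fail_fast=fail_fast)
    except RowFailure:
        return 2
    return 0
```

Any exception that escapes entirely (outside the row loop) ends the Python process with status 1, which lines up with the documented code for unexpected crashes.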

The first run builds a retrieval index under `code/.cache/bm25_index.pkl`. Delete it if you change chunking/fusion logic or bump `ORCHESTRATE_INDEX_VERSION`.
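A version setting like `ORCHESTRATE_INDEX_VERSION` can also make a pickled cache self-invalidating: store the version next to the index and rebuild on mismatch. The sketch shows that pattern with a hypothetical function name and an assumed default of `"2"`. This diff does not show whether `retrieve.py` checks the version this way, so deleting the file as described above remains the safe route:

```python
from __future__ import annotations

import os
import pickle
from collections.abc import Callable
from pathlib import Path
from typing import Any


def load_or_build_index(cache_path: Path, build: Callable[[], Any],
                        version: str | None = None) -> Any:
    """Return a cached index if it matches the current version; otherwise rebuild and re-cache."""
    if version is None:
        version = os.environ.get("ORCHESTRATE_INDEX_VERSION", "2")  # assumed default
    if cache_path.exists():
        # Only unpickle files this program wrote itself: unpickling can run arbitrary code.
        with cache_path.open("rb") as fh:
            cached = pickle.load(fh)
        if isinstance(cached, dict) and cached.get("version") == version:
            return cached["index"]
    index = build()  # missing or stale cache: rebuild from the corpus
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump({"version": version, "index": index}, fh)
    return index
```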

### Quick regression (sample file)

```bash
python run_eval.py --offline
```

Optional diagnostics on the generated `sample_pred.csv`:

```bash
python run_eval.py --offline --report-quality
```

`eval_sample.py` reports exact match on routing columns and **fuzzy stats on `response`** (normalized exact, token F1, character overlap). Use them to catch regressions on free-text answers; the official holdout is still scored by the platform.
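`eval_sample.py` itself is not shown in this diff. As a hedged sketch, here are common definitions for the three kinds of response stats named above: exact match after normalization, SQuAD-style bag-of-tokens F1, and a character-level similarity for which `difflib.SequenceMatcher.ratio` stands in. Function names and normalization rules are assumptions:

```python
from __future__ import annotations

import re
from collections import Counter
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def normalized_exact(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1: harmonic mean of token precision and recall."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)  # both empty counts as a match
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def char_overlap(pred: str, gold: str) -> float:
    """Similarity ratio in [0, 1] from difflib on the normalized strings."""
    return SequenceMatcher(None, normalize(pred), normalize(gold)).ratio()
```

Token F1 rewards partially correct answers that exact match scores as 0, which is why it is useful for spotting regressions in free-text responses.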

Compare any gold CSV to predictions (e.g. internal dev labels with `Justification`):

```bash
python compare_outputs.py --gold ../path/to/gold.csv --pred ../support_tickets/output.csv
```

### Tests

```bash
cd code
python -m pytest tests -q
```

The first run builds a BM25 index under `code/.cache/bm25_index.pkl`; delete that file if you change chunking logic.
CI (GitHub Actions) runs the same **pytest** + **`run_eval.py --offline`** on each push/PR.

## Architecture

| Module | Role |
| --------------- | -------------------------------------------------- |
| `corpus.py` | Loads markdown, strips noise, chunks articles |
| `retrieve.py` | BM25 index, brand filtering, lexical rerank |
| `retrieve.py` | Hybrid retrieval index + brand filtering + rerank |
| `taxonomy.py` | Canonical `product_area` mapping |
| `grounding.py` / `postprocess.py` | Cheap grounding checks + finalize |
| `risk.py` | Regex escalation triggers (e.g. grading disputes) |
| `openai_agent.py` | JSON-grounded LLM decisioning + fallback |
| `openai_agent.py` | JSON-grounded LLM decisioning + offline synthesis |
| `eval_metrics.py` / `eval_sample.py` | Sample regression metrics |
| `ticket_hints.py` | Optional multi-topic heuristics (tests / docs) |
| `main.py` | CSV orchestration |
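`risk.py` is described only as regex escalation triggers. Here is a minimal sketch of that idea; the trigger names and patterns are illustrative guesses, not the repo's actual rules:

```python
from __future__ import annotations

import re

# Hypothetical trigger set; the real patterns live in risk.py.
_ESCALATION_PATTERNS: dict[str, re.Pattern[str]] = {
    "grading_dispute": re.compile(
        r"\b(?:re-?grad\w*|(?:score|grade) (?:is )?(?:wrong|unfair)|dispute (?:my|the) (?:score|grade))",
        re.IGNORECASE,
    ),
    "fraud": re.compile(
        r"\b(?:fraud\w*|unauthori[sz]ed (?:charge|transaction)s?|stolen card)", re.IGNORECASE
    ),
    "account_takeover": re.compile(r"\b(?:hacked|account (?:was )?compromised)", re.IGNORECASE),
}


def escalation_reasons(subject: str, issue: str) -> list[str]:
    """Return the sorted names of all risk triggers that match the ticket text."""
    text = f"{subject}\n{issue}"
    return sorted(name for name, pat in _ESCALATION_PATTERNS.items() if pat.search(text))
```

Returning every matching trigger name, rather than a single boolean, makes it easy to log why a ticket was escalated.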

Secrets are read **only** from the environment (never commit `.env`).
1 change: 1 addition & 0 deletions code/__init__.py
@@ -0,0 +1 @@
"""Orchestrate support triage agent package (run from repo root: ``python code/main.py``)."""
21 changes: 21 additions & 0 deletions code/__main__.py
@@ -0,0 +1,21 @@
"""CLI entry when invoked as ``python -m code`` from the repository root (Windows-friendly;
on Linux ``python -m code`` may load the stdlib ``code`` module — use ``python code/main.py``).


``main.py`` uses absolute imports (``from config import …``) assuming ``code/`` is on
``sys.path``. Running ``python -m code`` sets the cwd on ``sys.path``, not ``code/``,
so we prepend this package directory before importing ``main``.
"""
from __future__ import annotations

import sys
from pathlib import Path

_pkg_dir = Path(__file__).resolve().parent
if str(_pkg_dir) not in sys.path:
    sys.path.insert(0, str(_pkg_dir))

from main import main  # noqa: E402

if __name__ == "__main__":
    main()
110 changes: 110 additions & 0 deletions code/answer_synthesis.py
@@ -0,0 +1,110 @@
"""Offline answer shaping: convert retrieved markdown-ish text into short actionable guidance."""
from __future__ import annotations

import hashlib
import re

from retrieve import Retrieved


_BULLET_LINE = re.compile(r"(?m)^\s*(?:[-*•]|\d+\.)\s+(.*)$")

# Lines that usually aren't helpful as user-facing steps.
_NOISE_PREFIXES = (
    "last updated:",
    "_last updated",
    "note:",
    "important:",
    "warning:",
)


def _clean_line(line: str) -> str:
    s = line.strip()
    s = re.sub(r"\s+", " ", s)
    return s


def _strip_heading_noise(text: str) -> str:
    # Drop markdown heading lines entirely; callers render the article title separately.
    text = re.sub(r"(?m)^#+\s+.*$", "", text)
    return text


def extract_steps(text: str, *, max_steps: int = 8, max_chars_per_step: int = 260) -> list[str]:
    """Pull readable steps from support article bodies."""
    text = _strip_heading_noise(text)
    lines = [ln.rstrip() for ln in text.splitlines()]
    steps: list[str] = []

    # Prefer explicit bullets/numbered lists.
    for ln in lines:
        m = _BULLET_LINE.match(ln.strip())
        if not m:
            continue
        step = _clean_line(m.group(1))
        if not step:
            continue
        low = step.lower()
        if any(low.startswith(p) for p in _NOISE_PREFIXES):
            continue
        if len(step) > max_chars_per_step:
            step = step[: max_chars_per_step - 1] + "…"
        steps.append(step)
        if len(steps) >= max_steps:
            break

    # Fallback: split long paragraphs into sentences if no bullets exist.
    if not steps:
        blob = _clean_line(re.sub(r"\s+", " ", text))
        # naive sentence split (good enough for hackathon corpus text)
        parts = re.split(r"(?<=[.!?])\s+", blob)
        for p in parts:
            p = _clean_line(p)
            if len(p) < 40:
                continue
            low = p.lower()
            if any(low.startswith(pref) for pref in _NOISE_PREFIXES):
                continue
            if len(p) > max_chars_per_step:
                p = p[: max_chars_per_step - 1] + "…"
            steps.append(p)
            if len(steps) >= max_steps:
                break

    return steps[:max_steps]


def synthesize_reply_from_hits(hits: list[Retrieved], *, max_sources: int = 2) -> tuple[str, list[str]]:
    """Return (user_response, source_paths_used)."""
    if not hits:
        return "", []

    sources: list[str] = []
    blocks: list[str] = []

    for h in hits[:max_sources]:
        c = h.chunk
        sources.append(c.path)
        title = c.title.strip()
        steps = extract_steps(c.text)
        if not steps:
            # last resort small excerpt
            excerpt = re.sub(r"\s+", " ", c.text).strip()
            excerpt = excerpt[:700] + ("…" if len(excerpt) > 700 else "")
            blocks.append(f"From {title}:\n{excerpt}")
            continue

        rendered = "\n".join(f"{i+1}. {s}" for i, s in enumerate(steps))
        blocks.append(f"From {title}:\n{rendered}")

    body = "\n\n".join(blocks).strip()
    # Short, varied closings (deterministic per content hash; less repetitive than one fixed paragraph).
    _closings = (
        "If this doesn’t match what you see, contact official support for your product.",
        "If you still need help, use your product’s official support channel.",
        "For anything still unclear, reach out through the official support path for your product.",
    )
    h = int(hashlib.sha256(body.encode("utf-8")).hexdigest(), 16)
    body += "\n\n" + _closings[h % len(_closings)]
    return body, sources