Merged
5 changes: 2 additions & 3 deletions .github/workflows/ci.yml
@@ -39,9 +39,8 @@ jobs:
 - run: uv run pytest -q --strict-markers -m "not integration"

install-from-wheel:
-# Catches importlib.resources packaging bugs that only manifest after a
-# real wheel install (the editable-install layout hides them). Without
-# this the package ships broken to anyone who pip installs from PyPI.
+# Catches import bugs that only manifest after a real wheel install. The
+# benchmark data itself is intentionally repo-level under experiments/.
runs-on: ubuntu-latest
steps:
 - uses: actions/checkout@v4
1 change: 0 additions & 1 deletion .gitleaks.toml
@@ -14,7 +14,6 @@ and gitleaks' generic-api-key heuristic flags them as high-entropy strings.
.env files with real provider keys are kept out via .gitignore.
"""
paths = [
-'''src/philosophy_bench/data/scenarios/.*''',
'''results/.*''',
'''experiments/.*/data/scenarios/.*''',
'''experiments/.*/results/.*''',
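With the bundled mirror gone, `.gitleaks.toml` no longer needs to exempt `src/philosophy_bench/data/scenarios/`. The Python sketch below checks which example paths the remaining `paths` entries would still allowlist. It assumes that gitleaks applies each entry as an unanchored regular expression against the file path, which is why it uses `re.search` rather than `re.fullmatch`. These simple patterns should behave the same under Go's RE2 and Python's `re`. The helper name `is_allowlisted` and the sample paths are illustrative and are not part of the repository.

```python
import re

# Allowlisted path patterns remaining after this change (from .gitleaks.toml above).
ALLOWLIST_PATHS = [
    r"results/.*",
    r"experiments/.*/data/scenarios/.*",
    r"experiments/.*/results/.*",
]


def is_allowlisted(path: str, patterns: list[str] = ALLOWLIST_PATHS) -> bool:
    """Return True if any pattern matches anywhere in `path`.

    Assumption: gitleaks treats path patterns as unanchored searches,
    so re.search (not re.fullmatch) is used here.
    """
    return any(re.search(p, path) for p in patterns)


if __name__ == "__main__":
    for path in [
        "experiments/c_vs_d/data/scenarios/example/sample.yaml",
        "experiments/c_vs_d/results/run.json",
        "src/philosophy_bench/data/scenarios/example/sample.yaml",
    ]:
        print(f"{path}: {is_allowlisted(path)}")
```

Running it prints `True` for the two `experiments/` paths and `False` for the old `src/` path. A file re-added under the removed mirror location would therefore be scanned again.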
24 changes: 11 additions & 13 deletions README.md
@@ -33,7 +33,9 @@ Benedict Brady.
## Install

```bash
-uv pip install philosophy-bench
+git clone https://github.com/benedictbrady/philosophy-bench
+cd philosophy-bench
+uv sync
cp .env.example .env # add at least one provider key
```

@@ -43,18 +45,16 @@ produce a clear error at the first API call, not at import time.
## Quickstart

```bash
-philosophy-bench models # list registered models (29)
-philosophy-bench scenarios # validate the default C-vs-D corpus
-philosophy-bench run -m opus-4.7 --limit 5 # smoke test (5 scenarios)
+uv run philosophy-bench models # list registered models (29)
+uv run philosophy-bench scenarios # validate the default C-vs-D corpus
+uv run philosophy-bench run -m opus-4.7 --limit 5 # smoke test (5 scenarios)
```

For development:

```bash
-git clone https://github.com/benedictbrady/philosophy-bench
-cd philosophy-bench
uv sync --extra dev
-uv run pytest # 672 tests, ~2s
+uv run pytest # full local test suite
```

## Methodology
@@ -85,10 +85,9 @@ See `SCORING.md` for the canonical rubric. In brief:
from a registered provider, edit `MODEL_REGISTRY` in
`src/philosophy_bench/providers.py`. To add a scenario to the original C-vs-D
experiment, copy `tests/fixtures/synthetic_scenario.yaml` into
-`experiments/c_vs_d/data/scenarios/<category>/<your-id>.yaml`, mirror it under
-`src/philosophy_bench/data/scenarios/` for wheel compatibility, and follow the
-authoring rule above. Validate with `philosophy-bench scenarios` and
-`pytest tests/test_scenario_corpus.py`.
+`experiments/c_vs_d/data/scenarios/<category>/<your-id>.yaml` and follow the
+authoring rule above. Validate with `philosophy-bench scenarios` and `pytest
+tests/test_scenario_corpus.py`.

## Results format

@@ -134,6 +133,5 @@ reproduction will drift as the underlying snapshot migrates.
## License

 - **Code**: MIT — see `LICENSE`
-- **Data** (experiment scenarios/results in `experiments/` plus the bundled
-  compatibility mirror in `src/philosophy_bench/data/`): CC-BY-4.0 — see
+- **Data** (experiment scenarios/results in `experiments/`): CC-BY-4.0 — see
`LICENSE-DATA`
6 changes: 2 additions & 4 deletions experiments/c_vs_d/README.md
@@ -13,10 +13,8 @@ experiments/c_vs_d/
results/ checked-in public artifacts, limited to Opus 4.7
```

-For backward compatibility, the same scenario and primer data is mirrored under
-`src/philosophy_bench/data/` so wheel installs can still run the default
-benchmark without needing the repo checkout. Source checkouts prefer this
-experiment directory as the default path.
+This directory is the canonical home for the original experiment data. Shared
+Python code lives under `src/philosophy_bench/`; experiment data does not.

Run it with:

5 changes: 2 additions & 3 deletions experiments/shared/README.md
@@ -9,6 +9,5 @@ Shared experiment infrastructure is the package CLI and engine:
Keep reusable code in `src/philosophy_bench/`. Keep experiment-only scripts in
the experiment's own `harness/` directory.

-In a source checkout, the default CLI paths point at
-`experiments/c_vs_d/data`. In an installed wheel, they fall back to the bundled
-compatibility mirror under `src/philosophy_bench/data`.
+Default CLI paths point at the repo-level C-vs-D data under
+`experiments/c_vs_d/data`.
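After this change, the default data location exists only in a source checkout. As a hedged illustration of what that implies, here is a hypothetical resolver. It walks up from a starting directory looking for `experiments/c_vs_d/data` and fails loudly when none is found, rather than falling back to a bundled copy. The function name and the walk-up strategy are assumptions; the real CLI in `src/philosophy_bench/` may resolve its defaults differently.

```python
from pathlib import Path

# Assumed relative location of the default C-vs-D data (from the README text above).
DEFAULT_DATA_SUBDIR = Path("experiments") / "c_vs_d" / "data"


def find_default_data_dir(start: Path) -> Path:
    """Return the nearest `experiments/c_vs_d/data` at or above `start`.

    Hypothetical helper, not the project's actual implementation. There is
    no wheel fallback: if no checkout is found, raise instead of guessing.
    """
    start = start.resolve()
    # Check `start` itself first, then each parent up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / DEFAULT_DATA_SUBDIR
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"no {DEFAULT_DATA_SUBDIR} found at or above {start}; "
        "run from a source checkout of the repository"
    )
```

Calling `find_default_data_dir(Path.cwd())` from anywhere inside the checkout returns the data directory. From an unrelated directory it raises `FileNotFoundError`, which matches the new expectation that the benchmark runs from the repository rather than from an installed wheel.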
7 changes: 0 additions & 7 deletions pyproject.toml
@@ -65,13 +65,6 @@ build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]

-[tool.setuptools.package-data]
-philosophy_bench = [
-"data/scenarios/**/*.yaml",
-"data/primers/*.txt",
-"data/ask_poles.yaml",
-]
-
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
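Removing `[tool.setuptools.package-data]`, together with the deleted files below, should leave nothing under `philosophy_bench/data/` in a freshly built wheel. A wheel is a zip archive, so one hedged way to check this locally is to list its entries with `zipfile`. The function name `bundled_data_files` is hypothetical. The script also assumes that a wheel has already been built (for example with `uv build`) and that its path is passed as the first argument.

```python
from __future__ import annotations

import sys
import zipfile
from pathlib import Path
from typing import IO


def bundled_data_files(
    wheel: str | Path | IO[bytes], package: str = "philosophy_bench"
) -> list[str]:
    """List wheel entries under `<package>/data/`, sorted by name.

    Wheels place top-level packages at the archive root, so entries for the
    removed mirror would appear as e.g. `philosophy_bench/data/...`.
    """
    prefix = f"{package}/data/"
    with zipfile.ZipFile(wheel) as archive:
        return sorted(name for name in archive.namelist() if name.startswith(prefix))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python check_wheel.py <path-to-wheel>")
    leftovers = bundled_data_files(sys.argv[1])
    if leftovers:
        print("unexpected bundled data:", *leftovers, sep="\n  ")
        sys.exit(1)
    print("no bundled data files")
```

An empty list means that no data files were packaged. A non-empty result would mean that some data is still being included in the wheel.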
122 changes: 0 additions & 122 deletions src/philosophy_bench/data/ask_poles.yaml

This file was deleted.

1 change: 0 additions & 1 deletion src/philosophy_bench/data/primers/baseline_primer.txt

This file was deleted.

7 changes: 0 additions & 7 deletions src/philosophy_bench/data/primers/c_direct_primer.txt

This file was deleted.

7 changes: 0 additions & 7 deletions src/philosophy_bench/data/primers/d_direct_primer.txt

This file was deleted.

This file was deleted.

This file was deleted.
