Pivot/agent skill #17
Merged
…eware
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete skills/confidential_data_procurement/ (preserved in backup branch)
- Delete skills/dataset_audit/ stub
- Delete skills/hackathon_novelty/init.py (dead after /init removed)
- Remove POST /init, /upload, /respond, /download/{token} endpoints
- Remove init_handler / upload_handler / respond_handler from SkillCard
- Remove InitRequest / InitResponse from core.models
- Drop procurement and live e2e tests; rewrite test_e2e to seed instances
directly until typed POST /instances lands in phase 4
Phase 1 of pivot/agent-skill. All 52 remaining tests pass.
- Add storage/ package with module-level functions for instances, submissions, results, tokens, registrations
- Schema includes evaluation_runs and attestations tables stubbed for phases 5 and 8
- routes.py: drop _instances/_submissions/_results/_tokens/_registrations module-level dicts, route through storage.* instead
- main.py: call storage.init_db() at startup
- tests: fixture now calls storage.reset_all(); tests use :memory: DB
- _resolve_token now stashes the raw token in the returned dict so /submit can call add_submission_to_token()
DB path: env CONCLAVE_DB_PATH (default ./data/conclave.db). Tests use in-memory.
Phase 2 of pivot/agent-skill. All 52 tests pass.
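A minimal sketch of module-level storage functions over SQLite, assuming Python's stdlib sqlite3. The table layout, column names, and the get_instance/list_submissions helpers are hypothetical; only init_db, create_instance, upsert_submission, reset_all, CONCLAVE_DB_PATH, and the :memory: test path come from the notes. The JSON data column mirrors the later `**fields` change to create_instance.

```python
import json
import os
import sqlite3
from typing import Any, Optional

# Hypothetical module-level connection shared by all storage functions.
_conn: Optional[sqlite3.Connection] = None

_SCHEMA = """
CREATE TABLE IF NOT EXISTS instances (
    instance_id TEXT PRIMARY KEY,
    data        TEXT NOT NULL  -- JSON blob of arbitrary instance fields
);
CREATE TABLE IF NOT EXISTS submissions (
    instance_id TEXT NOT NULL,
    token       TEXT NOT NULL,
    idea_text   TEXT NOT NULL,
    PRIMARY KEY (instance_id, token)
);
"""


def init_db(path: Optional[str] = None) -> None:
    """Open the DB (explicit path, else CONCLAVE_DB_PATH, else ./data/conclave.db)."""
    global _conn
    path = path or os.environ.get("CONCLAVE_DB_PATH", "./data/conclave.db")
    if path != ":memory:":
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    _conn = sqlite3.connect(path)
    _conn.executescript(_SCHEMA)


def _db() -> sqlite3.Connection:
    if _conn is None:
        raise RuntimeError("storage.init_db() has not been called")
    return _conn


def create_instance(instance_id: str, **fields: Any) -> None:
    conn = _db()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO instances (instance_id, data) VALUES (?, ?)",
            (instance_id, json.dumps(fields)),
        )


def get_instance(instance_id: str) -> Optional[dict]:
    row = _db().execute(
        "SELECT data FROM instances WHERE instance_id = ?", (instance_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def upsert_submission(instance_id: str, token: str, idea_text: str) -> None:
    conn = _db()
    with conn:  # a resubmission under the same token replaces the old row
        conn.execute(
            "INSERT OR REPLACE INTO submissions (instance_id, token, idea_text) VALUES (?, ?, ?)",
            (instance_id, token, idea_text),
        )


def list_submissions(instance_id: str) -> list:
    return _db().execute(
        "SELECT token, idea_text FROM submissions WHERE instance_id = ? ORDER BY token",
        (instance_id,),
    ).fetchall()


def reset_all() -> None:
    """Test helper: wipe every table without reopening the connection."""
    conn = _db()
    with conn:
        conn.execute("DELETE FROM instances")
        conn.execute("DELETE FROM submissions")
```

INSERT OR REPLACE keyed on (instance_id, token) is one way to get overwrite-on-resubmit behaviour; the real schema may use a different key or a table-level conflict clause.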
- _resolve_token now accepts either Authorization: Bearer <token> or the
legacy X-Instance-Token header. Bearer is the canonical path used by
the agent skill.
- New endpoint POST /generate-token returns {token, expires_at} matching
Colosseum Copilot's PAT shape. /register kept for the existing web UI.
- expires_at is null in v1 (no token expiry yet).
- URL-as-access-control: anyone with the enclave URL can mint a token.
Sybil prevention is deferred per plan.
Phase 3 of pivot/agent-skill. 58 tests pass.
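The Bearer-first, legacy-fallback lookup can be sketched as a pure function over request headers. This is an illustration rather than the project's _resolve_token: extract_token and generate_token_response are hypothetical names, and secrets.token_urlsafe is an assumed token format.

```python
import secrets
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Prefer 'Authorization: Bearer <token>', fall back to legacy X-Instance-Token."""
    # Header names are case-insensitive, so normalise keys before looking them up.
    lowered = {k.lower(): v for k, v in headers.items()}
    scheme, _, value = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    legacy = lowered.get("x-instance-token", "").strip()
    return legacy or None


def generate_token_response() -> dict:
    """Shape of POST /generate-token: {token, expires_at}; expires_at is null in v1."""
    return {"token": secrets.token_urlsafe(32), "expires_at": None}
```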
- New endpoint POST /instances accepts {name, end_date, evaluation_frequency,
tracks[]} and returns {instance_id, admin_token, enclave_url}.
- Add TrackConfig, CreateInstanceRequest, CreateInstanceResponse models.
- Duration parser supports w/d/h/m/s units (e.g. "1w", "30m").
- Validates that end_date is in the future and that at least one track is given.
- enclave_url comes from CONCLAVE_PUBLIC_URL env var or request.base_url.
- Threshold set to 999_999 on creation; phase 5 scheduler will drive
evaluation instead of count-based auto-trigger.
- Storage create_instance now takes **fields kwargs that flow into the
JSON data column. New fields (name, end_date, evaluation_frequency_seconds,
tracks) are stored alongside config/threshold.
Phase 4 of pivot/agent-skill. 62 tests pass.
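One way to implement the w/d/h/m/s duration parser and the two creation-time checks, assuming single-unit strings such as "1w" or "30m" (compound forms like "1h30m" are rejected in this sketch). parse_duration and validate_create_request are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Sequence

_UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"^\s*(\d+)\s*([wdhms])\s*$")


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as "1w" or "30m" into seconds."""
    match = _DURATION_RE.match(text.lower())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    seconds = int(amount) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return seconds


def validate_create_request(
    end_date: datetime, tracks: Sequence[object], now: Optional[datetime] = None
) -> None:
    """Reject an end_date that is not in the future and an empty track list."""
    now = now or datetime.now(timezone.utc)
    if end_date <= now:
        raise ValueError("end_date must be in the future")
    if not tracks:
        raise ValueError("at least one track is required")
```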
- New module infra/scheduler.py with one asyncio task per instance. Loop sleeps for evaluation_frequency_seconds, runs the pipeline (skipped if cohort empty), repeats until end_date. Final tick fires on the way out.
- main.py uses lifespan to call scheduler.start_all() on startup and stop_all() on shutdown. Synchronous setup (storage.init_db, register_skills) runs at import so tests don't need lifespan.
- POST /instances spins up the scheduler loop for the new instance.
- POST /submit no longer auto-triggers the pipeline. Status response is "received" (was "received_pending" / "received_analysis_complete"). Pipeline runs only via the scheduler or admin POST /trigger.
- Tests: e2e tests call /trigger explicitly. New test_scheduler.py covers empty-cohort skip, normal tick, end_date stop, env-var disable.
- CONCLAVE_DISABLE_SCHEDULER=1 disables scheduler globally; tests use it.
Phase 5 of pivot/agent-skill. 66 tests pass.
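The per-instance loop can be sketched with asyncio, assuming the pipeline and the cohort-size lookup are passed in as callables. instance_loop, cohort_size, run_pipeline, and scheduler_disabled are hypothetical names; sleep and now are injectable so a test can drive a fake clock instead of waiting in real time.

```python
import asyncio
import os
from datetime import datetime, timezone
from typing import Awaitable, Callable


def scheduler_disabled() -> bool:
    """Global kill switch used by tests."""
    return os.environ.get("CONCLAVE_DISABLE_SCHEDULER") == "1"


async def instance_loop(
    frequency_seconds: float,
    end_date: datetime,
    cohort_size: Callable[[], int],
    run_pipeline: Callable[[], Awaitable[None]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> int:
    """Evaluate one instance every frequency_seconds until end_date; return pipeline runs."""
    runs = 0

    async def tick() -> None:
        nonlocal runs
        if cohort_size() == 0:  # skip empty cohorts
            return
        await run_pipeline()
        runs += 1

    while True:
        remaining = (end_date - now()).total_seconds()
        if remaining <= 0:
            break
        # Never sleep past end_date, so the final tick is not delayed.
        await sleep(min(frequency_seconds, remaining))
        if now() >= end_date:
            break
        await tick()

    await tick()  # final tick on the way out
    return runs
```

In an app, each instance would get its own asyncio.create_task(instance_loop(...)), with start_all()/stop_all() creating and cancelling those tasks. Clamping the sleep to the remaining time keeps the final tick close to end_date.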
NoveltyResult gains:
- track_alignments: dict[str, float] (cosine similarity vs each track)
- best_fit_track: str | None (argmax of track alignments)
- cluster_label / cluster_size (surfaced from the deterministic layer)
- confidence: 'low' | 'high' ('low' when cohort < 5)
- name_collisions: list[NameCollision] (fuzzy-matched duplicate project names)
The deterministic layer now computes track alignments via cosine similarity between submission embeddings and operator-supplied track descriptions, plus name collisions via difflib.SequenceMatcher (no new deps). OperatorConfig gains a tracks list, which POST /instances populates from the typed body.
ALLOWED_OUTPUT_KEYS expanded; USER_OUTPUT_KEYS rewritten to expose the new participant-facing fields and to drop aligned/criteria_scores from the participant view (those are admin-only signals now).
run_skill no longer short-circuits on small cohorts; instead it tags every result confidence='low' so the agent skill can warn early submitters that scores will firm up. test_run_skill_insufficient_submissions renamed and rewritten to assert this.
The submission update flow already worked via storage.upsert_submission's ON CONFLICT REPLACE; the pipeline naturally re-evaluates from the latest stored row on each scheduler tick.
Phase 6 of pivot/agent-skill. 66 tests pass.
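The track-alignment fields can be sketched in plain Python, assuming embeddings are already available as float vectors (in the project they come from a sentence-transformers model). cosine, track_alignments, best_fit_track, and confidence are hypothetical helpers; only the cohort-size cutoff of 5 comes from the notes.

```python
import math
from typing import Dict, Optional, Sequence

LOW_CONFIDENCE_COHORT = 5  # results are tagged 'low' below this cohort size


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors align with nothing


def track_alignments(sub_vec: Sequence[float], track_vecs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Cosine similarity between one submission and every track."""
    return {name: cosine(sub_vec, vec) for name, vec in track_vecs.items()}


def best_fit_track(alignments: Dict[str, float]) -> Optional[str]:
    """Argmax of the alignments, or None when there are no tracks."""
    return max(alignments, key=alignments.get) if alignments else None


def confidence(cohort_size: int) -> str:
    return "low" if cohort_size < LOW_CONFIDENCE_COHORT else "high"
```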
- New endpoint GET /cohort/aggregates: cohort size, last-evaluation timestamp, cluster distribution, track distribution, name-collision pair count.
- New endpoint GET /cohort/timeline: history of every pipeline tick for this instance with per-tick aggregate snapshot (top clusters, top tracks, collision count).
- Each pipeline run now records to evaluation_runs storage table with a compact snapshot.
- GET /submissions admin response gains idea_title_or_summary (first line, truncated to 80 chars) so the operator can see broadly what's being submitted without raw idea text.
- Storage gains record_evaluation_run + list_evaluation_runs.
Phase 7 of pivot/agent-skill. 69 tests pass.
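A sketch of how the aggregate view and the admin-facing idea_title_or_summary could be derived from stored results. The result-dict keys, the "unassigned" bucket for best_fit_track=None, and representing name_collisions as a list of other submission IDs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def idea_title_or_summary(idea_text: str, limit: int = 80) -> str:
    """First non-empty line of the idea, cut to `limit` characters."""
    for line in idea_text.splitlines():
        if line.strip():
            return line.strip()[:limit]
    return ""


def cohort_aggregates(results: Iterable[dict], last_evaluated_at: Optional[str]) -> dict:
    results = list(results)
    clusters = Counter(r["cluster_label"] for r in results if r.get("cluster_label"))
    tracks = Counter(r.get("best_fit_track") or "unassigned" for r in results)
    # Count each colliding pair once, regardless of which side reported it.
    pairs = {
        frozenset((r["submission_id"], other))
        for r in results
        for other in r.get("name_collisions", [])
    }
    return {
        "cohort_size": len(results),
        "last_evaluated_at": last_evaluated_at,
        "cluster_distribution": dict(clusters),
        "track_distribution": dict(tracks),
        "name_collision_pairs": len(pairs),
    }
```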
- New module infra/solana.py: publishes a SHA-256 of the cohort report via the SPL Memo program. No custom on-chain code needed: the signed transaction with a deterministic memo IS the attestation. Anyone can look up the txn, read the memo, verify the signer pubkey.
- hash_report() is order-independent: sorts results by submission_id before hashing, so two ticks over the same data produce the same hash.
- Graceful degradation: if CONCLAVE_SOLANA_KEYPAIR is unset, returns a 'local_only' record without hitting the network. Tests rely on this.
- Configuration via env: CONCLAVE_SOLANA_KEYPAIR (base58 / JSON-array / base64), CONCLAVE_SOLANA_RPC_URL (default devnet), CONCLAVE_SOLANA_NETWORK.
- Scheduler hooks attestation publish into the final end_date tick.
- New endpoints: GET /attestations (any valid token) and admin-only POST /attestations/publish for the demo path that doesn't want to wait for end_date.
- Storage gains attestations table data column + record/list helpers.
- requirements.txt adds solders + solana.
Phase 8 of pivot/agent-skill. 74 tests pass.
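The order-independent hashing and the graceful-degradation path can be sketched as follows. Canonicalising with json.dumps(sort_keys=True) is an assumption about how the report is serialised, and publish_memo is a hypothetical callable standing in for the actual SPL Memo transaction; the published / local_only / failed status values follow the dashboard's attestation pills.

```python
import hashlib
import json
import os
from typing import Callable, Iterable


def hash_report(results: Iterable[dict]) -> str:
    """SHA-256 over results sorted by submission_id, so input order never changes the hash."""
    ordered = sorted(results, key=lambda r: r["submission_id"])
    # sort_keys plus fixed separators give one canonical byte string per logical report.
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attest(report_hash: str, publish_memo: Callable[[str], str]) -> dict:
    """Publish the hash if a keypair is configured, else return a local-only record."""
    if not os.environ.get("CONCLAVE_SOLANA_KEYPAIR"):
        return {"status": "local_only", "report_hash": report_hash, "tx_signature": None}
    try:
        # Hypothetical: sends the Memo transaction and returns its signature.
        signature = publish_memo(report_hash)
    except Exception as exc:  # RPC/network failure: record it instead of crashing the tick
        return {"status": "failed", "report_hash": report_hash, "tx_signature": None, "error": str(exc)}
    return {"status": "published", "report_hash": report_hash, "tx_signature": signature}
```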
…nents
- Delete app/templates/, app/access/, app/i/[id]/ (replaced by the agent-skill flow; participants never touch the web UI in the pivot).
- Delete component shells used only by procurement / template gallery: template-card, procurement-scorecard, negotiation-panel, dataset-upload-card, procurement-policy-preview, release-token-card, hard-constraints-card, milestone-breakdown, chat-message.
- Stub app/page.tsx, app/setup/page.tsx, app/dashboard/[id]/page.tsx so the codebase compiles between phases. Real implementations land in frontend phases 3, 4, 5.
lib/api.ts and lib/types.ts still reference procurement types; those get removed in frontend phase 2.
Frontend phase 1 of pivot/agent-skill.
… surface
types.ts:
- Drop procurement types (BuyerPolicy, SupplierSubmission, ReleaseToken, NegotiationStatus, SettlementStatus, ProcurementResult, etc.)
- Drop legacy InitRequest/InitResponse (no more conversational setup)
- Extend NoveltyResult with track_alignments, best_fit_track, cluster_label, cluster_size, confidence, name_collisions
- Add CreateInstanceRequest/Response, TrackConfig, GenerateTokenResponse
- Add CohortAggregates, CohortTimelineEntry, Attestation, MeResponse, InstanceMeta, StoredInstance
- SubmitResponse status simplified to "received"
- SubmissionMeta gains idea_title_or_summary
api.ts:
- Rewrite as a minimal Bearer-auth client: ~14 endpoints covering operator setup, dashboard reads, and admin actions
- Drop all mocks, procurement adapters, supabase OTP shims
- Single request() helper handles auth header + JSON body + ApiError
- Reads NEXT_PUBLIC_TEE_URL (default localhost:8000)
Frontend phase 2 of pivot/agent-skill.
- Drop multi-skill template gallery framing entirely. Single-product hero: "The novelty checker that never sees your idea."
- Two pathway cards: Participants (copyable npx skills add command) and Organizers (CTA to /setup).
- "How it works" section reframed for the hackathon novelty story: organizer creates instance → participants submit via agent → periodic evaluation inside enclave → on-chain attestation at end_date.
- Reuse AttestationWidget for the verify-enclave section.
- Footer points at the GitHub repo + key actions.
Frontend phase 3 of pivot/agent-skill.
Single-page operator setup, no conversational LLM flow. Fields:
- Hackathon display name
- End date (datetime-local)
- Evaluation cadence (preset durations: 30m, 1h, 6h, 1d, 3d, 1w, 2w)
- Tracks (repeater: name + markdown description, min 1)
Flow: AttestationWidget gates the form; once the enclave is verified, the form
unlocks. Validation runs client-side, then the form calls POST /instances. The
success view shows the participant share message, enclave URL, instance ID,
admin token (masked, with a reveal toggle), and a CTA into the dashboard.
Persists {instance_id, admin_token, enclave_url, name, created_at} in
localStorage under "conclave.instances" so the dashboard can pick up the
admin token without manual paste.
Frontend phase 4 of pivot/agent-skill.
…ions
Three tabs:
- Cohort: cohort size, last-eval timestamp, name-collision pair count, cluster + track distribution bars, evaluation timeline (newest-first).
- Submissions: anonymized table (short submission_id, idea_title_or_summary, novelty pill, best-fit track, cluster label + size, submitted-relative time). No raw idea text or repo content visible to the operator.
- Attestations: per-attestation card with status pill (published / local_only / failed), report hash, tx signature, enclave pubkey, Solana Explorer link.
Header actions:
- Refresh: reloads all five endpoints in parallel.
- Trigger evaluation: POST /trigger.
- Publish attestation: POST /attestations/publish.
Auto-loads admin_token from localStorage (saved by the setup flow). Falls back to a paste-token prompt if not found.
Removes the old Traces tab (broken), procurement panels, and threshold-style auto-trigger UI, all incompatible with the agent-skill pivot.
Frontend phase 5 of pivot/agent-skill.
… evals)
The participant-facing surface for the agent-skill pivot. SKILL.md drives Claude Code / Codex through enclave verification, token minting, local repo summarization, submission, and result fetching. references/ holds the trust boundary statement, the REST API reference, and a troubleshooting matrix. evals/evals.json seeds positive + negative trigger prompts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
data/mock_ideas/ holds 20 deliberately diversified READMEs for end-to-end
testing of the hackathon_novelty pipeline against a populated cohort:
- 5 Solana-payments variants (cluster A; idea_02 FlowPay + idea_03
FlowPayments are an embedded name-collision pair)
- 4 DePIN sensor networks (cluster B)
- 3 consumer wallet UX (cluster C)
- 4 distinct on-track singletons
- 2 off-track ideas (TaxBuddy, Recipely)
- 2 high-quality novel outliers (zkAttest, ConclaveSelf)
tests/bulk_submit.py mints a fresh participant token per idea and POSTs
each README as idea_text, logging (idea_dir, submission_id, token) to
data/mock_ideas/submissions.jsonl. Used to populate a local enclave
instance before running the agent-skill demo against a 21st submission.
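A sketch of the bulk_submit loop, assuming each mock idea lives at data/mock_ideas/<idea_dir>/README.md and that /generate-token and /submit accept the simplified JSON bodies shown (the real endpoints may also need an instance ID). The post callable, make_http_post, and the submission_id response field are hypothetical; injecting post lets the loop run against a fake server.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# post(path, json_body, bearer_token) -> parsed JSON response (hypothetical signature).
PostFn = Callable[[str, dict, Optional[str]], dict]


def make_http_post(base_url: str) -> PostFn:
    """Build a PostFn that talks to a running enclave over HTTP."""
    def post(path: str, body: dict, token: Optional[str]) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return post


def bulk_submit(ideas_dir: Path, out_path: Path, post: PostFn) -> int:
    """Submit every <idea_dir>/README.md under a fresh token; log one JSON line each."""
    count = 0
    with out_path.open("a", encoding="utf-8") as log:
        for readme in sorted(ideas_dir.glob("*/README.md")):
            token = post("/generate-token", {}, None)["token"]  # new participant per idea
            resp = post("/submit", {"idea_text": readme.read_text(encoding="utf-8")}, token)
            record = {
                "idea_dir": readme.parent.name,
                "submission_id": resp.get("submission_id"),
                "token": token,
            }
            log.write(json.dumps(record) + "\n")
            count += 1
    return count
```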
Adds a "Dev: prefill" button on /setup (gated by NODE_ENV !== production) that fills in the full Frontier 2026 instance config in one click: name, end_date, 30m eval frequency, and 6 thematic tracks (DeFi, Infra, Consumer, DePIN, AI Agents, Public Goods) with their long markdown descriptions. The same content lives in hackathon_instance_setup.md as a copy/paste reference for non-dev environments and as documentation of why we layered thematic tracks on top of Frontier's open-format prizes.
Three correctness issues surfaced when running the pipeline against the
20-idea mock cohort:
1. Cluster labels were generic Cluster_0..N. cluster_submissions now
accepts the submissions list and labels each cluster by the title of
the submission closest to that cluster's centroid (e.g. "FlowPay",
"WeatherMesh", "DecayVote").
2. Track scoring collapsed 13/20 submissions to "Public Goods & Open
Source" because cosine similarity on full-markdown track descriptions
shares too many generic terms ("open-source", "Solana", "tooling").
Switched to embedding only the track *name*, which is short and
discriminative. Added TRACK_MIN_SIMILARITY=0.18 floor so off-track
submissions get best_fit_track=None instead of a forced assignment.
3. FlowPay vs FlowPayments returned 0 name-collision pairs — the title
extractor kept the markdown "#" prefix and SequenceMatcher.ratio()
penalized the length asymmetry to 0.737 (below the 0.85 threshold).
_project_name now strips header markers; lowered threshold to 0.75;
added a substring-containment check (min len 4) so contained names
like "FlowPay" inside "FlowPayments" are flagged at proportional
overlap (7/12 = 0.583).
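The three fixes can be sketched together in plain Python. The thresholds (TRACK_MIN_SIMILARITY=0.18, a 0.75 ratio, minimum contained length 4) come from the notes above; the function names, the Euclidean notion of "closest to the centroid", and toy vectors in place of real embeddings are assumptions.

```python
import math
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence

TRACK_MIN_SIMILARITY = 0.18  # below this, best_fit_track is None
NAME_RATIO_THRESHOLD = 0.75  # lowered from 0.85
MIN_CONTAINED_LEN = 4        # shortest name eligible for the containment check

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_cluster(member_vecs: List[Vec], member_titles: List[str]) -> str:
    """Fix 1: name a cluster after the member nearest its centroid."""
    centroid = [sum(col) / len(member_vecs) for col in zip(*member_vecs)]
    nearest = min(range(len(member_vecs)), key=lambda i: math.dist(member_vecs[i], centroid))
    return member_titles[nearest]


def best_fit_track(sub_vec: Vec, track_name_vecs: Dict[str, Vec]) -> Optional[str]:
    """Fix 2: argmax over track-name embeddings, or None if nothing clears the floor."""
    if not track_name_vecs:
        return None
    scores = {name: cosine(sub_vec, vec) for name, vec in track_name_vecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= TRACK_MIN_SIMILARITY else None


def _clean(name: str) -> str:
    return re.sub(r"^\s*#+\s*", "", name).strip().lower()  # drop markdown header markers


def name_collision(a: str, b: str) -> Optional[float]:
    """Fix 3: similarity score if two project names collide, else None."""
    a, b = _clean(a), _clean(b)
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio >= NAME_RATIO_THRESHOLD:
        return ratio
    short, long_ = sorted((a, b), key=len)
    if len(short) >= MIN_CONTAINED_LEN and short in long_:
        return len(short) / len(long_)  # proportional overlap
    return None
```

With this sketch, "# FlowPay" vs "# FlowPayments" has a ratio of 14/19 ≈ 0.737 after header stripping, misses the 0.75 threshold, and is then flagged by containment at 7/12 ≈ 0.583.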
Also adds a console.log instrumentation line in the dashboard load() to help
diagnose any future stuck-spinner issue (the original was a Turbopack HMR
wedge fixed by a hard refresh, not a code bug).
Also includes an earlier change to read embedding_model from settings instead
of the hardcoded "all-mpnet-base-v2". The Dockerfile bakes "all-MiniLM-L6-v2",
and the mismatch caused sentence-transformers to fall back to an untrained
mean-pooling wrapper locally.
The ingest layer rewrites HackathonSubmission.idea_text via an LLM
normalization step (skills/hackathon_novelty/ingest.py:49-52) before the
deterministic layer runs. The LLM joins the markdown title to its
description with ": " — turning "# MerchantStream\n\nShopify plugin..."
into "MerchantStream: Shopify plugin...". The previous _project_name
returned the entire first line capped at 80 chars, so cluster labels
ended up as "MerchantStream: Shopify plugin that adds Solana checkout to
any e-com" — visually noisy and worse for downstream display.
_project_name now also splits on a ": " separator when it appears in the
first 40 chars, returning just the title prefix. Handles both raw
markdown ("# FlowPay\n\n...") and the LLM-flattened
("FlowPay: streaming...") shapes.
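A sketch of the two title shapes handled at this point: raw markdown headers and the LLM-flattened "Title: description" form. project_name is a hypothetical stand-in for _project_name; the 80-char cap and the 40-char window for the ": " separator come from the notes above.

```python
import re

_HEADER_RE = re.compile(r"^\s*#+\s*")


def project_name(idea_text: str, max_len: int = 80, sep_window: int = 40) -> str:
    """Title from raw markdown ("# FlowPay\n\n...") or flattened ("FlowPay: ...") text."""
    lines = idea_text.strip().splitlines()
    first = _HEADER_RE.sub("", lines[0]).strip() if lines else ""
    # Only treat ": " as a title separator when it appears early in the line.
    idx = first.find(": ")
    if 0 < idx < sep_window:
        return first[:idx].strip()
    return first[:max_len]
```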
When the conclave-novelty skill sends a submission, it summarizes the user's repo locally and POSTs idea_text in sentence form like "FlowPayHQ is a Solana program for recurring B2B payments..." — no markdown header, no ": " separator. _project_name fell back to the first 80 chars of the sentence, which then collapsed name-collision similarity calculations: substring containment of "flowpay" inside the 80-char haystack returned 7/80 = 0.087, causing the agent to dismiss a real collision as "probably not a real conflict." _project_name now adds a third shape: walk leading words, stop at the first lowercase-starting token. Title-case prefixes survive, common "X is/lets/provides Y" patterns get collapsed to just "X". FlowPayHQ vs FlowPay now correctly reports similarity 0.875.
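The third shape, a plain sentence, can be sketched as a walk over leading words that stops at the first lowercase-starting token. leading_title and project_name are hypothetical helpers, and falling back to the first 80 characters when no title-case prefix exists is an assumption carried over from the earlier behaviour.

```python
from difflib import SequenceMatcher
from typing import Optional


def leading_title(sentence: str) -> Optional[str]:
    """Collect leading words until the first one that starts with a non-uppercase character."""
    words = []
    for word in sentence.split():
        if not word[0].isupper():  # first lowercase-starting token ends the title
            break
        words.append(word)
    return " ".join(words) or None


def project_name(idea_text: str, max_len: int = 80) -> str:
    text = idea_text.strip()
    return leading_title(text) or text[:max_len]
```

For "FlowPayHQ is a Solana program...", the title is "FlowPayHQ", and SequenceMatcher(None, "flowpayhq", "flowpay").ratio() is 14/16 = 0.875.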
… Mono
Replaces the violet Apple-inspired theme with a travertine cream + porphyry purple + arena ochre palette. Three voices: Cinzel for display, EB Garamond for body, IBM Plex Mono for hashes/IDs.
- Design tokens (globals.css): travertine, porphyry, ochre, blood, basalt
- AttestationWidget reframed as "The Imperial Seal" with SPQR seal animation
- Landing: arena image hero, basalt-slab "Four acts, one seal"
- /setup: "Enter the Lists", form sections renamed to Hackathon/Disciplines
- /dashboard: "The Arena", three tabs (Lists/Submissions/Seals)
- /style: design system checkpoint route
- New shared SVG ornaments in components/seal-marks.tsx (Laurel, SpqrSeal, ArchDivider)
- arena.jpg hero asset in public/
No description provided.