Pivot/agent skill by prakhar728 · Pull Request #17 · prakhar728/conclave

prakhar728 · 2026-05-05T23:29:00Z

No description provided.

…eware Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Delete skills/confidential_data_procurement/ (preserved in backup branch)
- Delete skills/dataset_audit/ stub
- Delete skills/hackathon_novelty/init.py (dead after /init removed)
- Remove POST /init, /upload, /respond, /download/{token} endpoints
- Remove init_handler / upload_handler / respond_handler from SkillCard
- Remove InitRequest / InitResponse from core.models
- Drop procurement and live e2e tests; rewrite test_e2e to seed instances directly until typed POST /instances lands in phase 4

Phase 1 of pivot/agent-skill. All 52 remaining tests pass.

- Add storage/ package with module-level functions for instances, submissions, results, tokens, registrations
- Schema includes evaluation_runs and attestations tables stubbed for phases 5 and 8
- routes.py: drop _instances/_submissions/_results/_tokens/_registrations module-level dicts, route through storage.* instead
- main.py: call storage.init_db() at startup
- tests: fixture now calls storage.reset_all(); tests use :memory: DB
- _resolve_token now stashes the raw token in the returned dict so /submit can call add_submission_to_token()

DB path: env CONCLAVE_DB_PATH (default ./data/conclave.db). Tests use in-memory.

Phase 2 of pivot/agent-skill. All 52 tests pass.
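The storage move described above can be illustrated with a minimal sketch. The table layout, column names, and the helpers `connect`, `upsert_submission`, and `get_submission` are assumptions for illustration, not the PR's actual schema; only the CONCLAVE_DB_PATH variable, its default, and the :memory: test path come from the commit message.

```python
import json
import os
import sqlite3


def connect(path=None):
    """Open the database and ensure a (hypothetical) submissions table exists."""
    path = path or os.environ.get("CONCLAVE_DB_PATH", "./data/conclave.db")
    if path != ":memory:":
        # Create the parent directory for file-backed databases.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS submissions (
            instance_id   TEXT NOT NULL,
            submission_id TEXT NOT NULL,
            data          TEXT NOT NULL,
            PRIMARY KEY (instance_id, submission_id) ON CONFLICT REPLACE
        )
        """
    )
    return conn


def upsert_submission(conn, instance_id, submission_id, data):
    """Insert a submission; a row with the same key is replaced."""
    conn.execute(
        "INSERT INTO submissions (instance_id, submission_id, data) VALUES (?, ?, ?)",
        (instance_id, submission_id, json.dumps(data)),
    )
    conn.commit()


def get_submission(conn, instance_id, submission_id):
    """Return the stored JSON payload as a dict, or None if absent."""
    row = conn.execute(
        "SELECT data FROM submissions WHERE instance_id = ? AND submission_id = ?",
        (instance_id, submission_id),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Passing ":memory:" to `connect` mirrors how the tests avoid touching the on-disk database.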

- _resolve_token now accepts either Authorization: Bearer <token> or the legacy X-Instance-Token header. Bearer is the canonical path used by the agent skill.
- New endpoint POST /generate-token returns {token, expires_at} matching Colosseum Copilot's PAT shape. /register kept for the existing web UI.
- expires_at is null in v1 (no token expiry yet).
- URL-as-access-control: anyone with the enclave URL can mint a token. Sybil prevention is deferred per plan.

Phase 3 of pivot/agent-skill. 58 tests pass.
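A minimal sketch of the dual-header lookup, assuming headers arrive as a plain mapping. The function name `extract_token` is hypothetical, and preferring Bearer when both headers are present is an assumption; the commit message only says Bearer is the canonical path.

```python
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from Authorization: Bearer, else X-Instance-Token."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {key.lower(): value for key, value in headers.items()}

    scheme, _, value = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()

    # Legacy path kept for the existing web UI.
    legacy = normalized.get("x-instance-token", "").strip()
    return legacy or None
```

A caller would typically reject the request when the function returns None.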

- New endpoint POST /instances accepts {name, end_date, evaluation_frequency, tracks[]} and returns {instance_id, admin_token, enclave_url}.
- Add TrackConfig, CreateInstanceRequest, CreateInstanceResponse models.
- Duration parser supports w/d/h/m/s units (e.g. "1w", "30m").
- Validates that end_date is in the future and that at least one track is supplied.
- enclave_url comes from CONCLAVE_PUBLIC_URL env var or request.base_url.
- Threshold set to 999_999 on creation; phase 5 scheduler will drive evaluation instead of count-based auto-trigger.
- Storage create_instance now takes **fields kwargs that flow into the JSON data column. New fields (name, end_date, evaluation_frequency_seconds, tracks) are stored alongside config/threshold.

Phase 4 of pivot/agent-skill. 62 tests pass.
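The duration format above ("1w", "30m") can be parsed roughly as follows. This is a sketch, not the PR's parser: the name `parse_duration` is hypothetical, and it assumes a single integer followed by a single unit, since the commit message does not say whether compound values such as "1d12h" are accepted.

```python
import re

# Seconds per supported unit: weeks, days, hours, minutes, seconds.
_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([wdhms])")


def parse_duration(text: str) -> int:
    """Convert a string such as '1w' or '30m' to a number of seconds."""
    match = _DURATION_RE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    seconds = int(amount) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return seconds
```

The result would correspond to the stored evaluation_frequency_seconds field.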

- New module infra/scheduler.py with one asyncio task per instance. Loop sleeps for evaluation_frequency_seconds, runs the pipeline (skipped if cohort empty), repeats until end_date. Final tick fires on the way out.
- main.py uses lifespan to call scheduler.start_all() on startup and stop_all() on shutdown. Synchronous setup (storage.init_db, register_skills) runs at import so tests don't need lifespan.
- POST /instances spins up the scheduler loop for the new instance.
- POST /submit no longer auto-triggers the pipeline. Status response is "received" (was "received_pending" / "received_analysis_complete"). Pipeline runs only via the scheduler or admin POST /trigger.
- Tests: e2e tests call /trigger explicitly. New test_scheduler.py covers empty-cohort skip, normal tick, end_date stop, env-var disable.
- CONCLAVE_DISABLE_SCHEDULER=1 disables scheduler globally; tests use it.

Phase 5 of pivot/agent-skill. 66 tests pass.
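The per-instance loop described above could look roughly like this. It is a sketch under stated assumptions: `run_instance_loop`, its parameters, and the `demo` helper are hypothetical names, and skipping the final tick for an empty cohort is an assumption the commit message does not confirm.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Awaitable, Callable, List


async def run_instance_loop(
    frequency_seconds: float,
    end_date: datetime,
    cohort_size: Callable[[], int],
    run_pipeline: Callable[[bool], Awaitable[None]],
) -> None:
    """Sleep, run the pipeline, repeat until end_date, then fire a final tick."""
    while True:
        remaining = (end_date - datetime.now(timezone.utc)).total_seconds()
        if remaining <= 0:
            break
        # Never sleep past end_date.
        await asyncio.sleep(min(frequency_seconds, remaining))
        if datetime.now(timezone.utc) >= end_date:
            break
        if cohort_size() > 0:  # empty cohorts are skipped
            await run_pipeline(False)
    if cohort_size() > 0:
        await run_pipeline(True)  # final tick on the way out


async def demo() -> List[bool]:
    """Run a 50 ms instance with a 10 ms cadence and record each tick."""
    ticks: List[bool] = []

    async def pipeline(final: bool) -> None:
        ticks.append(final)

    end = datetime.now(timezone.utc) + timedelta(milliseconds=50)
    await run_instance_loop(0.01, end, lambda: 1, pipeline)
    return ticks
```

Running `asyncio.run(demo())` returns a list whose last entry marks the final tick; in a server, each instance would get its own task via `asyncio.create_task`.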

NoveltyResult gains:
- track_alignments: dict[str, float] (cosine similarity vs each track)
- best_fit_track: str | None (argmax of track alignments)
- cluster_label / cluster_size (surfaced from deterministic layer)
- confidence: 'low' | 'high' ('low' when cohort < 5)
- name_collisions: list[NameCollision] (fuzzy-matched project name dupes)

Deterministic layer now computes track alignments via cosine similarity between submission embeddings and operator-supplied track descriptions, plus name collisions via difflib.SequenceMatcher (no new deps).

OperatorConfig gains a tracks list. POST /instances populates it from the typed body.

ALLOWED_OUTPUT_KEYS expanded; USER_OUTPUT_KEYS rewritten to expose the new participant-facing fields and drop aligned/criteria_scores from the participant view (those are admin-only signals now).

run_skill no longer short-circuits on small cohorts; instead it tags every result confidence='low' so the agent skill can warn early submitters that scores will firm up. test_run_skill_insufficient_submissions renamed and rewritten to assert this.

Submission update flow already worked via storage.upsert_submission's ON CONFLICT REPLACE; pipeline naturally re-evaluates from the latest stored row on each scheduler tick.

Phase 6 of pivot/agent-skill. 66 tests pass.
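To make the track-alignment fields concrete, here is a small sketch that works on plain lists of floats rather than real embeddings. The helper names (`cosine`, `align_tracks`, `confidence_for`) and the `min_similarity` parameter are illustrative assumptions; the parameter stands in for the TRACK_MIN_SIMILARITY floor that a later commit in this PR introduces.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_tracks(
    submission_vec: Sequence[float],
    track_vecs: Dict[str, Sequence[float]],
    min_similarity: float = 0.0,
) -> Tuple[Dict[str, float], Optional[str]]:
    """Return (track_alignments, best_fit_track) for one submission."""
    alignments = {name: cosine(submission_vec, vec) for name, vec in track_vecs.items()}
    if not alignments:
        return {}, None
    best = max(alignments, key=lambda name: alignments[name])
    # Below the floor, report no best fit instead of forcing an assignment.
    if alignments[best] < min_similarity:
        return alignments, None
    return alignments, best


def confidence_for(cohort_size: int) -> str:
    """'low' when the cohort has fewer than 5 submissions, else 'high'."""
    return "low" if cohort_size < 5 else "high"
```

In the real pipeline the vectors would come from the configured sentence-embedding model; the toy two-dimensional vectors used in testing only exercise the arithmetic.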

- New endpoint GET /cohort/aggregates: cohort size, last-evaluation timestamp, cluster distribution, track distribution, name-collision pair count.
- New endpoint GET /cohort/timeline: history of every pipeline tick for this instance with per-tick aggregate snapshot (top clusters, top tracks, collision count).
- Each pipeline run now records to evaluation_runs storage table with a compact snapshot.
- GET /submissions admin response gains idea_title_or_summary (first line, truncated to 80 chars) so the operator can see broadly what's being submitted without raw idea text.
- Storage gains record_evaluation_run + list_evaluation_runs.

Phase 7 of pivot/agent-skill. 69 tests pass.
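The aggregate view above might be computed along these lines. The result-dict shape is an assumption: in particular, `name_collisions` is simplified here to a list of other submission IDs, whereas the PR uses a NameCollision model whose fields are not shown. Function names are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List


def cohort_aggregates(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a cohort without exposing any raw idea text."""
    clusters = Counter(r["cluster_label"] for r in results if r.get("cluster_label"))
    tracks = Counter(r["best_fit_track"] for r in results if r.get("best_fit_track"))

    # Count each colliding pair once, whichever side reported it.
    pairs = set()
    for result in results:
        for other_id in result.get("name_collisions", []):
            pairs.add(frozenset((result["submission_id"], other_id)))

    return {
        "cohort_size": len(results),
        "cluster_distribution": dict(clusters),
        "track_distribution": dict(tracks),
        "name_collision_pairs": len(pairs),
    }


def idea_title_or_summary(idea_text: str, limit: int = 80) -> str:
    """First line of the idea text, truncated to `limit` characters."""
    stripped = idea_text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line[:limit]
```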

- New module infra/solana.py: publishes a SHA-256 of the cohort report via the SPL Memo program. No custom on-chain code needed: the signed transaction with a deterministic memo IS the attestation. Anyone can look up the txn, read the memo, verify the signer pubkey.
- hash_report() is order-independent: sorts results by submission_id before hashing, so two ticks over the same data produce the same hash.
- Graceful degradation: if CONCLAVE_SOLANA_KEYPAIR is unset, returns a 'local_only' record without hitting the network. Tests rely on this.
- Configuration via env: CONCLAVE_SOLANA_KEYPAIR (base58 / JSON-array / base64), CONCLAVE_SOLANA_RPC_URL (default devnet), CONCLAVE_SOLANA_NETWORK.
- Scheduler hooks attestation publish into the final end_date tick.
- New endpoints: GET /attestations (any valid token) and admin-only POST /attestations/publish for the demo path that doesn't want to wait for end_date.
- Storage gains attestations table data column + record/list helpers.
- requirements.txt adds solders + solana.

Phase 8 of pivot/agent-skill. 74 tests pass.
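The order-independent hashing described above can be sketched as follows. Sorting by submission_id and SHA-256 come from the commit message; the canonical JSON serialization (sorted keys, compact separators) is an assumption about how the report is turned into bytes.

```python
import hashlib
import json
from typing import Any, Dict, List


def hash_report(results: List[Dict[str, Any]]) -> str:
    """SHA-256 hex digest of a cohort report, independent of result order."""
    # Sort by submission_id so the same data always serializes identically.
    ordered = sorted(results, key=lambda r: r["submission_id"])
    # Assumed canonical form: sorted keys and no insignificant whitespace.
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The resulting 64-character hex string is what would go into the memo, and a verifier can recompute it from the same report.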

…nents

- Delete app/templates/, app/access/, app/i/[id]/ (replaced by the agent-skill flow; participants never touch the web UI in the pivot).
- Delete component shells used only by procurement / template gallery: template-card, procurement-scorecard, negotiation-panel, dataset-upload-card, procurement-policy-preview, release-token-card, hard-constraints-card, milestone-breakdown, chat-message.
- Stub app/page.tsx, app/setup/page.tsx, app/dashboard/[id]/page.tsx so the codebase compiles between phases. Real implementations land in frontend phases 3, 4, 5.

lib/api.ts and lib/types.ts still reference procurement types; those get removed in frontend phase 2.

Frontend phase 1 of pivot/agent-skill.

… surface

types.ts:
- Drop procurement types (BuyerPolicy, SupplierSubmission, ReleaseToken, NegotiationStatus, SettlementStatus, ProcurementResult, etc.)
- Drop legacy InitRequest/InitResponse (no more conversational setup)
- Extend NoveltyResult with track_alignments, best_fit_track, cluster_label, cluster_size, confidence, name_collisions
- Add CreateInstanceRequest/Response, TrackConfig, GenerateTokenResponse
- Add CohortAggregates, CohortTimelineEntry, Attestation, MeResponse, InstanceMeta, StoredInstance
- SubmitResponse status simplified to "received"
- SubmissionMeta gains idea_title_or_summary

api.ts:
- Rewrite as a minimal Bearer-auth client: ~14 endpoints covering operator setup, dashboard reads, and admin actions
- Drop all mocks, procurement adapters, supabase OTP shims
- Single request() helper handles auth header + JSON body + ApiError
- Reads NEXT_PUBLIC_TEE_URL (default localhost:8000)

Frontend phase 2 of pivot/agent-skill.

- Drop multi-skill template gallery framing entirely. Single-product hero: "The novelty checker that never sees your idea."
- Two pathway cards: Participants (copyable npx skills add command) and Organizers (CTA to /setup).
- "How it works" section reframed for the hackathon novelty story: organizer creates instance → participants submit via agent → periodic evaluation inside enclave → on-chain attestation at end_date.
- Reuse AttestationWidget for the verify-enclave section.
- Footer points at the GitHub repo + key actions.

Frontend phase 3 of pivot/agent-skill.

Single-page operator setup, no conversational LLM flow.

Fields:
- Hackathon display name
- End date (datetime-local)
- Evaluation cadence (preset durations: 30m, 1h, 6h, 1d, 3d, 1w, 2w)
- Tracks (repeater: name + markdown description, min 1)

Flow: AttestationWidget gates the form. On verified, form unlocks. Validation runs client-side, then POST /instances. Success view shows participant share message, enclave URL, instance ID, admin token (masked toggle), and a CTA into the dashboard.

Persists {instance_id, admin_token, enclave_url, name, created_at} in localStorage under "conclave.instances" so the dashboard can pick up the admin token without manual paste.

Frontend phase 4 of pivot/agent-skill.

…ions

Three tabs:
- Cohort: cohort size, last-eval timestamp, name-collision pair count, cluster + track distribution bars, evaluation timeline (newest-first).
- Submissions: anonymized table with short submission_id, idea_title_or_summary, novelty pill, best-fit track, cluster label + size, submitted-relative time. No raw idea text or repo content visible to the operator.
- Attestations: per-attestation card with status pill (published / local_only / failed), report hash, tx signature, enclave pubkey, Solana Explorer link.

Header actions:
- Refresh: reloads all five endpoints in parallel.
- Trigger evaluation: POST /trigger.
- Publish attestation: POST /attestations/publish.

Auto-loads admin_token from localStorage (saved by the setup flow). Falls back to a paste-token prompt if not found.

Removes the old Traces tab (broken), procurement panels, and threshold-style auto-trigger UI, all incompatible with the agent-skill pivot.

Frontend phase 5 of pivot/agent-skill.

vercel · 2026-05-05T23:29:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
conclave	Ready	Preview, Comment	May 9, 2026 4:17am

… evals) The participant-facing surface for the agent-skill pivot. SKILL.md drives Claude Code / Codex through enclave verification, token minting, local repo summarization, submission, and result fetching. references/ holds the trust boundary statement, the REST API reference, and a troubleshooting matrix. evals/evals.json seeds positive + negative trigger prompts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

data/mock_ideas/ holds 20 deliberately diversified READMEs for end-to-end testing of the hackathon_novelty pipeline against a populated cohort:
- 5 Solana-payments variants (cluster A; idea_02 FlowPay + idea_03 FlowPayments are an embedded name-collision pair)
- 4 DePIN sensor networks (cluster B)
- 3 consumer wallet UX (cluster C)
- 4 distinct on-track singletons
- 2 off-track ideas (TaxBuddy, Recipely)
- 2 high-quality novel outliers (zkAttest, ConclaveSelf)

tests/bulk_submit.py mints a fresh participant token per idea and POSTs each README as idea_text, logging (idea_dir, submission_id, token) to data/mock_ideas/submissions.jsonl. Used to populate a local enclave instance before running the agent-skill demo against a 21st submission.

Adds a "Dev: prefill" button on /setup (gated by NODE_ENV !== production) that one-clicks the full Frontier 2026 instance config — name, end_date, 30m eval frequency, and 6 thematic tracks (DeFi, Infra, Consumer, DePIN, AI Agents, Public Goods) with their long markdown descriptions. Same content lives in hackathon_instance_setup.md as a copy/paste reference for non-dev environments and as documentation of why we layered thematic tracks on top of Frontier's open-format prizes.

Three correctness issues surfaced when running the pipeline against the 20-idea mock cohort:

1. Cluster labels were generic Cluster_0..N. cluster_submissions now accepts the submissions list and labels each cluster by the title of the submission closest to that cluster's centroid (e.g. "FlowPay", "WeatherMesh", "DecayVote").
2. Track scoring collapsed 13/20 submissions to "Public Goods & Open Source" because cosine similarity on full-markdown track descriptions shares too many generic terms ("open-source", "Solana", "tooling"). Switched to embedding only the track *name*, which is short and discriminative. Added TRACK_MIN_SIMILARITY=0.18 floor so off-track submissions get best_fit_track=None instead of a forced assignment.
3. FlowPay vs FlowPayments returned 0 name-collision pairs: the title extractor kept the markdown "#" prefix and SequenceMatcher.ratio() penalized the length asymmetry to 0.737 (below the 0.85 threshold). _project_name now strips header markers; lowered threshold to 0.75; added a substring-containment check (min len 4) so contained names like "FlowPay" inside "FlowPayments" are flagged at proportional overlap (7/12 = 0.583).

Also adds a console.log instrumentation line in the dashboard load() to help diagnose any future spinner-stuck issue (the original was a Turbopack HMR wedge fixed by hard refresh, not a code bug).

Includes an earlier change to read embedding_model from settings instead of the hardcoded "all-mpnet-base-v2"; the Dockerfile bakes "all-MiniLM-L6-v2", which caused sentence-transformers to fall back to an untrained mean-pooling wrapper locally.
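The name-collision rule from item 3 can be sketched with the standard library alone. The 0.75 threshold, the minimum containment length of 4, and the proportional-overlap score are taken from the commit message; the function name `collision_score`, the case-folding, and the order of the two checks are assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional

NAME_COLLISION_THRESHOLD = 0.75  # lowered from 0.85 per the commit message
MIN_CONTAINMENT_LEN = 4          # shortest name eligible for the substring check


def collision_score(name_a: str, name_b: str) -> Optional[float]:
    """Return a similarity score if the two names collide, else None."""
    a, b = name_a.strip().lower(), name_b.strip().lower()
    if not a or not b:
        return None

    # Fuzzy match first: report the ratio when it clears the threshold.
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio >= NAME_COLLISION_THRESHOLD:
        return ratio

    # Otherwise flag containment, scored by proportional overlap.
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= MIN_CONTAINMENT_LEN and shorter in longer:
        return len(shorter) / len(longer)
    return None
```

Under these assumptions, "FlowPay" vs "FlowPayments" misses the ratio threshold (14/19 ≈ 0.737) and is flagged by containment at 7/12 ≈ 0.583, which matches the figures quoted above.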

The ingest layer rewrites HackathonSubmission.idea_text via an LLM normalization step (skills/hackathon_novelty/ingest.py:49-52) before the deterministic layer runs. The LLM joins the markdown title to its description with ": " — turning "# MerchantStream\n\nShopify plugin..." into "MerchantStream: Shopify plugin...". The previous _project_name returned the entire first line capped at 80 chars, so cluster labels ended up as "MerchantStream: Shopify plugin that adds Solana checkout to any e-com" — visually noisy and worse for downstream display. _project_name now also splits on a ": " separator when it appears in the first 40 chars, returning just the title prefix. Handles both raw markdown ("# FlowPay\n\n...") and the LLM-flattened ("FlowPay: streaming...") shapes.

When the conclave-novelty skill sends a submission, it summarizes the user's repo locally and POSTs idea_text in sentence form like "FlowPayHQ is a Solana program for recurring B2B payments..." — no markdown header, no ": " separator. _project_name fell back to the first 80 chars of the sentence, which then collapsed name-collision similarity calculations: substring containment of "flowpay" inside the 80-char haystack returned 7/80 = 0.087, causing the agent to dismiss a real collision as "probably not a real conflict." _project_name now adds a third shape: walk leading words, stop at the first lowercase-starting token. Title-case prefixes survive, common "X is/lets/provides Y" patterns get collapsed to just "X". FlowPayHQ vs FlowPay now correctly reports similarity 0.875.
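The three input shapes handled by the title extractor (raw markdown header, LLM-flattened "Title: description", and plain sentence form) can be illustrated with a sketch. The name `project_name` is a stand-in for the PR's _project_name, and the exact order of checks and the fallback behavior are assumptions.

```python
import re


def project_name(idea_text: str) -> str:
    """Best-effort project title from the first line of idea_text."""
    lines = idea_text.strip().splitlines()
    first_line = lines[0] if lines else ""

    # Shape 1: raw markdown, e.g. "# FlowPay" -> strip header markers.
    line = re.sub(r"^#+\s*", "", first_line).strip()

    # Shape 2: LLM-flattened "Title: description", with the separator
    # appearing within the first 40 characters.
    sep = line.find(": ")
    if 0 < sep < 40:
        return line[:sep].strip()

    # Shape 3: sentence form, e.g. "FlowPayHQ is a ...". Keep leading
    # words until the first token that starts with a lowercase letter.
    kept = []
    for word in line.split():
        if word[0].islower():
            break
        kept.append(word)
    if kept:
        return " ".join(kept)

    # Fallback: first 80 characters of the line.
    return line[:80]
```

Feeding the extracted titles into a fuzzy matcher such as difflib.SequenceMatcher keeps the comparison on short names rather than on whole sentences.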

… Mono

Replaces the violet Apple-inspired theme with a travertine cream + porphyry purple + arena ochre palette. Three voices: Cinzel for display, EB Garamond for body, IBM Plex Mono for hashes/IDs.

- Design tokens (globals.css): travertine, porphyry, ochre, blood, basalt
- AttestationWidget reframed as "The Imperial Seal" with SPQR seal animation
- Landing: arena image hero, basalt-slab "Four acts, one seal"
- /setup: "Enter the Lists", form sections renamed to Hackathon/Disciplines
- /dashboard: "The Arena", three tabs (Lists/Submissions/Seals)
- /style: design system checkpoint route
- New shared SVG ornaments in components/seal-marks.tsx (Laurel, SpqrSeal, ArchDivider)
- arena.jpg hero asset in public/

prakhar728 and others added 14 commits April 10, 2026 22:24

Debug: log procurement skill registration failure + custom CORS middl…

0d56095

…eware Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

prakhar728 and others added 7 commits May 6, 2026 05:03

vercel Bot deployed to Preview May 9, 2026 03:58 View deployment

CI: add pytest-asyncio so test_scheduler async tests run

1ee3daa

vercel Bot deployed to Preview May 9, 2026 04:17 View deployment

prakhar728 merged commit 7ceae83 into main May 9, 2026
2 of 3 checks passed

Pivot/agent skill #17
prakhar728 merged 22 commits into main from pivot/agent-skill
