This repository was archived by the owner on Feb 24, 2026. It is now read-only.
Conversation
- extract ScoringEngine for eligibility checks and weight computation
- extract ChainMonitor for blockchain commitment monitoring
- extract JobScheduler for evaluation job creation and broadcasting
- extract WebSocketHub for validator connection management
- create ProviderRegistry with a protocol-based environment provider interface
- add MetaworldProvider and SwarmProvider implementations
- refactor BackendService to use a composition pattern with injected components
- add unit tests for ScoringEngine, WebSocketHub, and ProviderRegistry
- preserve the original service as service_old.py for reference
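The composition pattern described above can be sketched as follows. This is a hypothetical minimal shape, not the real interfaces: the actual `ScoringEngine`, `JobScheduler`, and `BackendService` are far richer, and the normalization logic here is a placeholder.

```python
from dataclasses import dataclass

class ScoringEngine:
    def compute_weights(self, scores: dict[str, float]) -> dict[str, float]:
        # Placeholder: normalize raw scores into weights that sum to 1.
        total = sum(scores.values()) or 1.0
        return {uid: s / total for uid, s in scores.items()}

class JobScheduler:
    def publish_jobs(self) -> list[str]:
        return ["job-1"]  # illustrative stub

@dataclass
class BackendService:
    # Components are injected rather than constructed internally,
    # so each one can be unit-tested in isolation.
    scoring_engine: ScoringEngine
    job_scheduler: JobScheduler

    def compute_weights(self, scores: dict[str, float]) -> dict[str, float]:
        return self.scoring_engine.compute_weights(scores)

service = BackendService(ScoringEngine(), JobScheduler())
weights = service.compute_weights({"miner-a": 1.0, "miner-b": 3.0})
```

The payoff is testability: each extracted component can be replaced with a fake in unit tests without standing up the whole service.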
- Create core task interfaces (TaskType, TaskSpec, TaskResult, TaskExecutor)
- Implement ExecutorRegistry for task type dispatch
- Add RLRolloutExecutor wrapping the existing rollout infrastructure
- Create OrchestratorV2 using the executor pattern
- Add a task_type field to EvalJobMessage and the Competition model
- Add unit tests for the executor registry (23 tests)
- Add documentation for creating new executors
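A minimal sketch of the registry-dispatch idea, assuming a dict-based `execute` signature for brevity (the real `TaskSpec`/`TaskResult` are typed models):

```python
from typing import Protocol

class TaskExecutor(Protocol):
    """Interface every task executor implements (assumed shape)."""
    def execute(self, spec: dict) -> dict: ...

class ExecutorRegistry:
    """Dispatches a job to the executor registered for its task_type."""
    def __init__(self) -> None:
        self._executors: dict[str, TaskExecutor] = {}

    def register(self, task_type: str, executor: TaskExecutor) -> None:
        self._executors[task_type] = executor

    def get(self, task_type: str) -> TaskExecutor:
        if task_type not in self._executors:
            raise KeyError(f"no executor registered for task_type={task_type!r}")
        return self._executors[task_type]

class RLRolloutExecutor:
    """Stand-in for the executor wrapping the rollout infrastructure."""
    def execute(self, spec: dict) -> dict:
        return {"task_type": "rl_rollout", "episodes": spec.get("episodes", 0)}

registry = ExecutorRegistry()
registry.register("rl_rollout", RLRolloutExecutor())
result = registry.get("rl_rollout").execute({"episodes": 5})
```

New task types then only need a new executor class plus one `register` call; the orchestrator never branches on `task_type` itself.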
- Create a ScoringStrategy protocol for task-type-specific scoring logic
- Implement RLRolloutScoringStrategy with eligibility, scoring, and comparison
- Add ScoringStrategyRegistry for task type dispatch
- Update ScoringEngine to use strategies via the registry
- Rename scoring.py to scoring_engine.py, re-export from the scoring package
- Add 32 unit tests for the scoring strategies and registry
- Maintain backward compatibility for existing imports
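The protocol approach could look roughly like this. The method names and the mean-reward metric are assumptions for illustration; only the class names come from the commit message.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ScoringStrategy(Protocol):
    """Task-type-specific scoring logic (assumed method names)."""
    def is_eligible(self, submission: dict) -> bool: ...
    def score(self, submission: dict) -> float: ...

class RLRolloutScoringStrategy:
    def is_eligible(self, submission: dict) -> bool:
        # A submission with no completed episodes cannot be scored.
        return submission.get("episodes", 0) > 0

    def score(self, submission: dict) -> float:
        # Illustrative metric: mean reward per episode.
        return submission["total_reward"] / submission["episodes"]

strategy = RLRolloutScoringStrategy()
# Structural typing: the class satisfies the protocol without inheriting it.
assert isinstance(strategy, ScoringStrategy)
```

A `ScoringStrategyRegistry` would then map `task_type` strings to strategy instances, mirroring the executor registry on the orchestrator side.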
Architecture simplification: enable evaluators to connect directly to the backend without going through the validator relay.

Backend Changes:
- Add EvaluatorHub for managing direct evaluator WebSocket connections
- Add an EvaluatorConnection model for tracking evaluator state and statistics
- Add a /ws/evaluator endpoint for evaluator WebSocket registration
- Add EvaluatorInfoResponse and /evaluators REST endpoints
- Add an EVALUATOR role to the UserRole enum and API key constraints

Core Changes:
- Add EvaluatorRegisterMessage, EvaluatorRegistrationAckMessage, JobAckMessage
- Add EVALUATOR_REGISTER, EVALUATOR_REGISTRATION_ACK, JOB_ACK message types

Evaluator Changes:
- Add BackendClient for WebSocket communication with the backend
- Add backend connection config options (backend_ws_url, evaluator_id, etc.)
- Support direct and pgqueuer connection modes

This enables:
- Direct Backend <-> Evaluator communication (no validator relay)
- Multiple evaluators connecting simultaneously
- Capacity-based job routing via EvaluatorHub
- Evaluator heartbeat and statistics tracking

Next steps:
- Update OrchestratorV2 to use BackendClient
- Create database migrations for the EvaluatorConnection table
- Update JobScheduler to route jobs via EvaluatorHub
Remove pgqueuer relay mode and use a direct backend WebSocket connection:

- Remove connection_mode branching and pgqueuer imports
- Simplify setup_job/process_job to work with EvalJobMessage directly
- Remove _enqueue_job_for_processing (pgqueuer re-queue)
- Remove _start_pgqueuer_mode/_start_direct_mode (single path)
- Inline BackendClient initialization into start()
- Remove the conditional WebSocket/pgqueuer paths in result publishing

This eliminates ~80 lines of code and the validator relay middleman, enabling evaluators to connect directly to the backend.
Replace WebSocketHub (validator relay) with EvaluatorHub for direct job broadcasting:

- Change JobScheduler to accept EvaluatorHub instead of WebSocketHub
- Extract a _build_job_message() helper for building EvalJobMessage
- Update publish_jobs() to broadcast to all connected evaluators
- Update BackendService to pass evaluator_hub to JobScheduler

Jobs are now broadcast to all connected evaluators via WebSocket, allowing any available evaluator to pick up and process a job.
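Broadcasting a job to every connected evaluator could be sketched like this. `FakeSocket` stands in for a real websocket connection, and the error handling shown (skipping dropped connections) is an assumption, not the confirmed behavior.

```python
import asyncio
import json

class FakeSocket:
    """Stand-in for a websocket connection; records sent payloads."""
    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, payload: str) -> None:
        self.sent.append(payload)

class EvaluatorHub:
    def __init__(self) -> None:
        self._sockets: dict[str, FakeSocket] = {}

    def register(self, evaluator_id: str, socket: FakeSocket) -> None:
        self._sockets[evaluator_id] = socket

    async def broadcast(self, message: dict) -> int:
        """Send one job message to every connected evaluator; return the count."""
        payload = json.dumps(message)
        delivered = 0
        for socket in list(self._sockets.values()):
            try:
                await socket.send(payload)
                delivered += 1
            except ConnectionError:
                continue  # a dropped evaluator should not block the others
        return delivered

hub = EvaluatorHub()
hub.register("eval-1", FakeSocket())
hub.register("eval-2", FakeSocket())
count = asyncio.run(hub.broadcast({"job_id": "j-1", "task_type": "rl_rollout"}))
```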
- Add migration 020: evaluator_connections table for direct evaluator communication
  - Tracks evaluator connection state, capabilities, and job counts
  - Includes check constraints for non-negative counters
  - Indexes on is_connected and last_heartbeat for efficient queries
- Add migration 021: extends the api_keys role constraint with an 'evaluator' role
  - Enables evaluators to authenticate with dedicated API keys
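An SQLite approximation of what migration 020 sets up. The real migrations are Alembic revisions against the production database, and the exact column names here are illustrative; only the table name, the non-negative check constraints, and the two indexes come from the commit message.

```python
import sqlite3

DDL = """
CREATE TABLE evaluator_connections (
    evaluator_id   TEXT PRIMARY KEY,
    is_connected   INTEGER NOT NULL DEFAULT 0,
    last_heartbeat TEXT,
    active_jobs    INTEGER NOT NULL DEFAULT 0 CHECK (active_jobs >= 0),
    completed_jobs INTEGER NOT NULL DEFAULT 0 CHECK (completed_jobs >= 0)
);
CREATE INDEX ix_evaluator_connections_is_connected
    ON evaluator_connections (is_connected);
CREATE INDEX ix_evaluator_connections_last_heartbeat
    ON evaluator_connections (last_heartbeat);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# The check constraint rejects negative job counters:
rejected = False
try:
    conn.execute(
        "INSERT INTO evaluator_connections (evaluator_id, active_jobs)"
        " VALUES ('e1', -1)"
    )
except sqlite3.IntegrityError:
    rejected = True
```

Enforcing the non-negative invariant in the schema means a buggy decrement in application code fails loudly instead of silently corrupting the counters.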
- Delete the legacy orchestrator.py (1784 lines, pgqueuer-based)
- Rename orchestrator_v2.py to orchestrator.py
- Rename the OrchestratorV2 class to Orchestrator
- Update docstrings and log messages

The new Orchestrator uses a direct WebSocket connection to the backend and the TaskExecutor pattern for pluggable task types.
Replace the WebSocket-based validator with a simple polling approach:

- WeightSettingValidator polls the GET /weights endpoint periodically
- Sets weights on chain when they change (hash comparison)
- No WebSocket connection complexity needed
- Default poll interval: 5 minutes (configurable via weight_poll_interval)

Also removes the pgqueuer queue methods from DatabaseManager, since evaluators now communicate directly with the backend. Reduced from 641 lines to ~210 lines.
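The hash-comparison polling loop could look roughly like this. `fetch_weights` and `set_on_chain` are injected stand-ins for the HTTP call and the chain transaction; the SHA-256-over-canonical-JSON scheme is an assumption, since the commit only says "hash comparison".

```python
import hashlib
import json

class WeightSettingValidator:
    """Polls a weights endpoint; only sets weights on chain when they change."""

    def __init__(self, fetch_weights, set_on_chain, poll_interval: float = 300.0):
        self._fetch = fetch_weights            # e.g. GET /weights
        self._set = set_on_chain               # e.g. subtensor set_weights call
        self._last_hash: str | None = None
        self.poll_interval = poll_interval     # seconds; 5 minutes by default

    def poll_once(self) -> bool:
        """Returns True if weights changed and were set on chain."""
        weights = self._fetch()
        # sort_keys gives a canonical payload so the hash is order-independent.
        digest = hashlib.sha256(
            json.dumps(weights, sort_keys=True).encode()
        ).hexdigest()
        if digest == self._last_hash:
            return False  # unchanged: skip the chain transaction
        self._set(weights)
        self._last_hash = digest
        return True

calls: list[dict] = []
validator = WeightSettingValidator(lambda: {"uid-1": 0.6, "uid-2": 0.4}, calls.append)
first, second = validator.poll_once(), validator.poll_once()
```

The second poll returns the same payload, so no second chain transaction is issued.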
Architecture simplification:

- Remove the /ws/validator endpoint from endpoints.py (~350 lines)
- Remove WebSocketHub and its tests (websocket_hub.py, test_websocket_hub.py)
- Remove validator heartbeat monitoring from BackendService
- Update _broadcast_and_set_weights to just update the snapshot (validators poll)
- Simplify _monitor_stale_jobs to not reference ws_hub

Validators now poll GET /weights periodically instead of maintaining a WebSocket connection, which simplifies the architecture significantly.

Update CLAUDE.md to reflect the new architecture:
- Evaluators connect directly to the backend via WebSocket
- Validators poll the /weights endpoint
- Removed pgqueuer relay references
- Update introduction.md with new diagrams showing the direct evaluator connection
- Rewrite orchestrator.md to describe WebSocket-based job dispatch
- Update evaluator.md to remove pgqueuer references
- Simplify validator.md to focus on polling-based weight setting
- Update the weight distribution section of incentive.md
- Fix references in overview.mdx and docker.md

Reflects the architecture changes where:
- Evaluators connect directly to the backend via WebSocket
- Validators poll GET /weights instead of using a WebSocket
- There is no more pgqueuer intermediate queue
- Re-export the exception classes from job_scheduler in service.py
- Add migration 022 for the task_type column on competitions
- Delete the unused service_old.py (3160 lines)
- Code formatting cleanup
- Move db/, migrations/, and alembic.ini from the validator to the evaluator
- Refactor episode_logger.py to use BackendClient instead of pgqueuer
- Rename the validator to LiteValidator (polling-based weight setter)
- Remove the validator database infrastructure (no longer needed)
- Add a migrate_evaluator_db.sh script
- Update the README with a new architecture overview
- Fix ProviderRegistry.has() -> has_provider() in rl_rollout.py

The evaluator now streams telemetry directly to the backend via WebSocket instead of queuing through pgqueuer. The validator is simplified to just polling /weights and setting weights on chain.
- Fix enum serialization: use model_dump(mode='json') to properly serialize EvaluationStatus enums to their string values
- Fix Ray Queue truthiness: use 'is None' checks instead of 'not', since a Ray Queue returns False from bool() when empty
- Fix blocking pod creation: run container creation in a thread pool executor to prevent WebSocket keepalive timeouts
- Initialize ProviderRegistry at startup to register the swarm/metaworld providers
- Remove the database_url parameter from LoggingConfig and RolloutWorker, since pgqueuer is no longer used for episode logging
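The Ray Queue truthiness bug above is worth spelling out, since it is an easy trap with any object that defines `__bool__`. `QueueHandle` here is a stand-in I wrote to mimic the relevant behavior, not Ray's actual class:

```python
class QueueHandle:
    """Stand-in for a Ray Queue: bool(queue) is False when the queue is empty,
    so `not queue` cannot distinguish 'empty queue' from 'no queue at all'."""

    def __init__(self, items=None) -> None:
        self._items = list(items or [])

    def __bool__(self) -> bool:
        return len(self._items) > 0

queue = QueueHandle()  # a valid, connected, but empty queue

buggy_missing = not queue         # True: wrongly treats the empty queue as absent
actually_missing = queue is None  # False: identity check gives the right answer
```

The same reasoning applies anywhere a handle object has container-like truthiness: test for absence with `is None`, never with `not`.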
Remove real-time episode/step streaming and S3 observation uploads from episode_logger.py. These features are deferred to Phase 4, as they require solving the Ray actor -> WebSocket connection challenge. The logger now only tracks episode progress for final result aggregation. This aligns with the simplified architecture, where only final evaluation results are sent back to the backend after completion.

Changes:
- Remove the BackendClient dependency from EpisodeLogger
- Remove the ThreadPoolExecutor for background S3 uploads
- Remove the _send_episode_data and _send_step_data methods
- Simplify step logging to track only basic metrics
- Disable S3 upload by default in LoggingConfig
- Clean up unused imports and helper methods