This repository was archived by the owner on Feb 24, 2026. It is now read-only.

Feat/v0.2#106

Draft
Shr1ftyy wants to merge 15 commits into main from feat/v0.2

Conversation

@Shr1ftyy
Contributor

No description provided.

- extract ScoringEngine for eligibility checks and weight computation
- extract ChainMonitor for blockchain commitment monitoring
- extract JobScheduler for evaluation job creation and broadcasting
- extract WebSocketHub for validator connection management
- create ProviderRegistry with protocol-based environment provider interface
- add MetaworldProvider and SwarmProvider implementations
- refactor BackendService to use composition pattern with injected components
- add unit tests for ScoringEngine, WebSocketHub, and ProviderRegistry
- preserve original service as service_old.py for reference
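As a rough illustration of the protocol-based provider interface described above, here is a minimal sketch. Only `ProviderRegistry.has_provider()` is attested elsewhere in this PR; `EnvironmentProvider`, `make_env`, and the other method names are illustrative guesses, not the actual API:

```python
from typing import Protocol

class EnvironmentProvider(Protocol):
    """Assumed shape of the provider protocol; the real interface may differ."""
    def make_env(self, env_id: str): ...

class MetaworldProvider:
    """Toy stand-in for the real MetaworldProvider."""
    def make_env(self, env_id: str):
        return {"provider": "metaworld", "env_id": env_id}

class ProviderRegistry:
    """Registers environment providers by name and looks them up at runtime."""
    def __init__(self):
        self._providers = {}

    def register(self, name: str, provider: EnvironmentProvider) -> None:
        self._providers[name] = provider

    def has_provider(self, name: str) -> bool:
        return name in self._providers

    def get(self, name: str) -> EnvironmentProvider:
        return self._providers[name]

registry = ProviderRegistry()
registry.register("metaworld", MetaworldProvider())
env = registry.get("metaworld").make_env("reach-v2")
```

Because the interface is a `typing.Protocol`, any object with a matching `make_env` method satisfies it structurally; providers need not inherit from a common base class.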
- Create core task interfaces (TaskType, TaskSpec, TaskResult, TaskExecutor)
- Implement ExecutorRegistry for task type dispatch
- Add RLRolloutExecutor wrapping existing rollout infrastructure
- Create OrchestratorV2 using executor pattern
- Add task_type field to EvalJobMessage and Competition model
- Add unit tests for executor registry (23 tests)
- Add documentation for creating new executors
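The executor-dispatch pattern described above might look like the following sketch. `TaskSpec`, `TaskResult`, and `ExecutorRegistry` are simplified stand-ins for the real interfaces, and `EchoExecutor` is a toy substitute for `RLRolloutExecutor`:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    task_type: str
    payload: dict = field(default_factory=dict)

@dataclass
class TaskResult:
    task_type: str
    success: bool
    metrics: dict = field(default_factory=dict)

class ExecutorRegistry:
    """Maps a task_type string to the executor that handles it."""
    def __init__(self):
        self._executors = {}

    def register(self, task_type: str, executor) -> None:
        self._executors[task_type] = executor

    def dispatch(self, spec: TaskSpec) -> TaskResult:
        if spec.task_type not in self._executors:
            raise ValueError(f"no executor registered for {spec.task_type!r}")
        return self._executors[spec.task_type].execute(spec)

class EchoExecutor:
    """Toy executor standing in for a real rollout executor."""
    def execute(self, spec: TaskSpec) -> TaskResult:
        return TaskResult(task_type=spec.task_type, success=True,
                          metrics=dict(spec.payload))

registry = ExecutorRegistry()
registry.register("rl_rollout", EchoExecutor())
result = registry.dispatch(TaskSpec(task_type="rl_rollout", payload={"steps": 10}))
```

With this shape, adding a new task type means registering a new executor; the orchestrator's dispatch path stays unchanged.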
- Create ScoringStrategy protocol for task-type-specific scoring logic
- Implement RLRolloutScoringStrategy with eligibility, scoring, and comparison
- Add ScoringStrategyRegistry for task type dispatch
- Update ScoringEngine to use strategies via registry
- Rename scoring.py to scoring_engine.py, re-export from scoring package
- Add 32 unit tests for scoring strategies and registry
- Maintain backward compatibility for existing imports
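The strategy protocol above covers eligibility, scoring, and comparison; a hedged sketch follows. The method names (`is_eligible`, `score`, `compare`) and the mean-reward scoring rule are assumptions for illustration, not the actual implementation:

```python
from typing import Protocol

class ScoringStrategy(Protocol):
    def is_eligible(self, submission: dict) -> bool: ...
    def score(self, submission: dict) -> float: ...
    def compare(self, a: float, b: float) -> int: ...

class RLRolloutScoringStrategy:
    """Toy strategy: score a rollout by mean episode reward, higher is better."""
    def is_eligible(self, submission: dict) -> bool:
        return bool(submission.get("episode_rewards"))

    def score(self, submission: dict) -> float:
        rewards = submission["episode_rewards"]
        return sum(rewards) / len(rewards)

    def compare(self, a: float, b: float) -> int:
        # Three-way comparison: 1 if a wins, -1 if b wins, 0 on tie.
        return (a > b) - (a < b)

class ScoringStrategyRegistry:
    """Dispatches to the strategy registered for a given task type."""
    def __init__(self):
        self._strategies = {}

    def register(self, task_type: str, strategy: ScoringStrategy) -> None:
        self._strategies[task_type] = strategy

    def for_task(self, task_type: str) -> ScoringStrategy:
        return self._strategies[task_type]

registry = ScoringStrategyRegistry()
registry.register("rl_rollout", RLRolloutScoringStrategy())
strategy = registry.for_task("rl_rollout")
```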
Architecture simplification - enable evaluators to connect directly to the backend
without going through the validator relay:

Backend Changes:
- Add EvaluatorHub for managing direct evaluator WebSocket connections
- Add EvaluatorConnection model for tracking evaluator state and statistics
- Add /ws/evaluator endpoint for evaluator WebSocket registration
- Add EvaluatorInfoResponse and /evaluators REST endpoints
- Add EVALUATOR role to UserRole enum and API key constraints

Core Changes:
- Add EvaluatorRegisterMessage, EvaluatorRegistrationAckMessage, JobAckMessage
- Add EVALUATOR_REGISTER, EVALUATOR_REGISTRATION_ACK, JOB_ACK message types

Evaluator Changes:
- Add BackendClient for WebSocket communication with backend
- Add backend connection config options (backend_ws_url, evaluator_id, etc.)
- Support direct and pgqueuer connection modes

This enables:
- Direct Backend <-> Evaluator communication (no validator relay)
- Multiple evaluators connecting simultaneously
- Capacity-based job routing via EvaluatorHub
- Evaluator heartbeat and statistics tracking

Next steps:
- Update OrchestratorV2 to use BackendClient
- Create database migrations for EvaluatorConnection table
- Update JobScheduler to route jobs via EvaluatorHub
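The capacity-based routing mentioned above could be modeled along these lines. The `EvaluatorConnection` fields and the `route_job` helper are assumptions based on the bullets, not the actual schema or hub logic:

```python
from dataclasses import dataclass

@dataclass
class EvaluatorConnection:
    """Toy model of per-evaluator state tracked by the hub."""
    evaluator_id: str
    max_concurrent_jobs: int
    active_jobs: int = 0

    @property
    def free_slots(self) -> int:
        return self.max_concurrent_jobs - self.active_jobs

def route_job(connections):
    """Pick the evaluator with the most spare capacity, or None if all are full."""
    candidates = [c for c in connections if c.free_slots > 0]
    return max(candidates, key=lambda c: c.free_slots, default=None)

pool = [
    EvaluatorConnection("eval-a", max_concurrent_jobs=2, active_jobs=2),
    EvaluatorConnection("eval-b", max_concurrent_jobs=4, active_jobs=1),
]
chosen = route_job(pool)
```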
…only

Remove pgqueuer relay mode and use direct backend WebSocket connection:

- Remove connection_mode branching and pgqueuer imports
- Simplify setup_job/process_job to work with EvalJobMessage directly
- Remove _enqueue_job_for_processing (pgqueuer re-queue)
- Remove _start_pgqueuer_mode/_start_direct_mode (single path)
- Inline BackendClient initialization into start()
- Remove conditional WebSocket/pgqueuer paths in result publishing

This eliminates ~80 lines of code and the validator relay middleman,
enabling evaluators to connect directly to the backend.

Replace WebSocketHub (validator relay) with EvaluatorHub for direct job broadcasting:

- Change JobScheduler to accept EvaluatorHub instead of WebSocketHub
- Extract _build_job_message() helper for building EvalJobMessage
- Update publish_jobs() to broadcast to all connected evaluators
- Update BackendService to pass evaluator_hub to JobScheduler

Jobs are now broadcast to all connected evaluators via WebSocket,
allowing any available evaluator to pick up and process the job.
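The broadcast path above can be sketched roughly as follows. The real `EvaluatorHub` manages actual WebSocket connections and `EvaluatorConnection` state; here fakes stand in, and the `attach`/`broadcast` method names are illustrative:

```python
import asyncio

class EvaluatorHub:
    """Toy hub: fans a serialized job message out to every attached socket."""
    def __init__(self):
        self._sockets = {}  # evaluator_id -> object with an async send(text)

    def attach(self, evaluator_id, socket):
        self._sockets[evaluator_id] = socket

    async def broadcast(self, message: str) -> int:
        # return_exceptions=True so one dead connection doesn't abort the rest.
        results = await asyncio.gather(
            *(sock.send(message) for sock in self._sockets.values()),
            return_exceptions=True,
        )
        return sum(1 for r in results if not isinstance(r, Exception))

class FakeSocket:
    """Records sent messages instead of writing to a real WebSocket."""
    def __init__(self):
        self.sent = []
    async def send(self, message):
        self.sent.append(message)

hub = EvaluatorHub()
a, b = FakeSocket(), FakeSocket()
hub.attach("eval-a", a)
hub.attach("eval-b", b)
delivered = asyncio.run(hub.broadcast('{"job_id": "job-1"}'))
```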

- Add migration 020: evaluator_connections table for direct evaluator communication
  - Tracks evaluator connection state, capabilities, job counts
  - Includes check constraints for non-negative counters
  - Indexes on is_connected and last_heartbeat for efficient queries

- Add migration 021: extends api_keys role constraint with 'evaluator' role
  - Enables evaluators to authenticate with dedicated API keys
…et connection

- Delete legacy orchestrator.py (1784 lines, pgqueuer-based)
- Rename orchestrator_v2.py to orchestrator.py
- Rename OrchestratorV2 class to Orchestrator
- Update docstrings and log messages

The new Orchestrator uses direct WebSocket connection to backend
and the TaskExecutor pattern for pluggable task types.

Replace WebSocket-based validator with a simple polling approach:
- WeightSettingValidator polls GET /weights endpoint periodically
- Sets weights on chain when they change (hash comparison)
- No WebSocket connection complexity needed
- Default poll interval: 5 minutes (configurable via weight_poll_interval)

Also removes pgqueuer queue methods from DatabaseManager since
evaluators now communicate directly with backend.

Reduced from 641 lines to ~210 lines.
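The hash-comparison cycle described above might look like this sketch. In the validator it would run against GET /weights every `weight_poll_interval` seconds (default 300); here `fetch_weights` and `set_weights_on_chain` are stand-ins driven with fakes:

```python
import hashlib
import json

def weights_hash(weights: dict) -> str:
    """Stable digest of the weights payload so unchanged weights can be skipped."""
    return hashlib.sha256(json.dumps(weights, sort_keys=True).encode()).hexdigest()

def poll_once(fetch_weights, set_weights_on_chain, last_hash):
    """One poll cycle: set weights on chain only when the payload has changed."""
    weights = fetch_weights()
    current = weights_hash(weights)
    if current != last_hash:
        set_weights_on_chain(weights)
    return current

# Driven with fakes; a real validator would loop and sleep between cycles.
calls = []
weights_v1 = {"uid-1": 0.7, "uid-2": 0.3}
h1 = poll_once(lambda: weights_v1, calls.append, last_hash=None)
h2 = poll_once(lambda: weights_v1, calls.append, last_hash=h1)  # unchanged
```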

Architecture simplification:
- Remove /ws/validator endpoint from endpoints.py (~350 lines)
- Remove WebSocketHub and its tests (websocket_hub.py, test_websocket_hub.py)
- Remove validator heartbeat monitoring from BackendService
- Update _broadcast_and_set_weights to just update snapshot (validators poll)
- Simplify _monitor_stale_jobs to not reference ws_hub

Validators now poll GET /weights periodically instead of maintaining
a WebSocket connection. This simplifies the architecture significantly.

Update CLAUDE.md to reflect the new architecture:
- Evaluators connect directly to backend via WebSocket
- Validators poll /weights endpoint
- Removed pgqueuer relay references
- Update introduction.md with new diagrams showing direct evaluator connection
- Rewrite orchestrator.md to describe WebSocket-based job dispatch
- Update evaluator.md to remove pgqueuer references
- Simplify validator.md to focus on polling-based weight setting
- Update incentive.md weight distribution section
- Fix overview.mdx and docker.md references

Reflects the architecture changes where:
- Evaluators connect directly to backend via WebSocket
- Validators poll GET /weights instead of WebSocket
- No more pgqueuer intermediate queue

- Re-export exception classes from job_scheduler in service.py
- Add migration 022 for task_type column on competitions
- Delete unused service_old.py (3160 lines)
- Code formatting cleanup

- Move db/, migrations/, alembic.ini from validator to evaluator
- Refactor episode_logger.py to use BackendClient instead of pgqueuer
- Rename validator to LiteValidator (polling-based weight setter)
- Remove validator database infrastructure (no longer needed)
- Add migrate_evaluator_db.sh script
- Update README with new architecture overview
- Fix ProviderRegistry.has() -> has_provider() in rl_rollout.py

The evaluator now streams telemetry directly to the backend via
WebSocket instead of queuing through pgqueuer. The validator is
simplified to just poll /weights and set on chain.

- Fix enum serialization: use model_dump(mode='json') to properly
  serialize EvaluationStatus enums to string values
- Fix Ray Queue truthiness: use 'is None' checks instead of 'not'
  since Ray Queue returns False for bool() when empty
- Fix blocking pod creation: run container creation in thread pool
  executor to prevent WebSocket keepalive timeouts
- Initialize ProviderRegistry at startup to register swarm/metaworld
- Remove database_url parameter from LoggingConfig and RolloutWorker
  since we no longer use pgqueuer for episode logging
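The truthiness pitfall fixed above can be reproduced with a stand-in class (the commit notes that Ray's Queue evaluates falsy when empty; `EmptyFalsyQueue` below merely mimics that behavior):

```python
class EmptyFalsyQueue:
    """Stand-in for a queue whose truth value tracks emptiness via __len__."""
    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        # bool(obj) falls back to __len__ when __bool__ is not defined,
        # so an empty queue is falsy even though the object exists.
        return len(self._items)

queue = EmptyFalsyQueue()  # present but empty

buggy_treats_as_missing = not queue        # True: empty mistaken for absent
correct_treats_as_missing = queue is None  # False: only None means absent
```

This is why the fix replaces `if not queue:` checks with explicit `if queue is None:` checks.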

Remove real-time episode/step streaming and S3 observation uploads from
episode_logger.py. These features are deferred to Phase 4 as they require
solving the Ray actor -> WebSocket connection challenge.

The logger now only tracks episode progress for final result aggregation.
This aligns with the simplified architecture where only final evaluation
results are sent back to the backend after completion.

Changes:
- Remove BackendClient dependency from EpisodeLogger
- Remove ThreadPoolExecutor for background S3 uploads
- Remove _send_episode_data and _send_step_data methods
- Simplify step logging to just track basic metrics
- Disable S3 upload by default in LoggingConfig
- Clean up unused imports and helper methods
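The trimmed-down logger could be as small as the following sketch, which only accumulates per-episode metrics for a final summary. The `log_episode`/`aggregate` names and the summary fields are assumptions, not the real `EpisodeLogger` API:

```python
class EpisodeLogger:
    """Tracks episode progress only for final result aggregation."""
    def __init__(self):
        self._episodes = []

    def log_episode(self, reward: float, steps: int) -> None:
        self._episodes.append({"reward": reward, "steps": steps})

    def aggregate(self) -> dict:
        """Summary sent back to the backend once evaluation completes."""
        n = len(self._episodes)
        if n == 0:
            return {"episodes": 0, "mean_reward": 0.0, "total_steps": 0}
        return {
            "episodes": n,
            "mean_reward": sum(e["reward"] for e in self._episodes) / n,
            "total_steps": sum(e["steps"] for e in self._episodes),
        }

logger = EpisodeLogger()
logger.log_episode(reward=1.0, steps=10)
logger.log_episode(reward=3.0, steps=30)
summary = logger.aggregate()
```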