Entity miners Initial Infra #633

jbonilla-tao · 2025-12-04T07:31:27Z

Taoshi Pull Request

Description

This PR introduces an "Entity Miners" feature allowing a single entity hotkey to manage multiple subaccounts (synthetic hotkeys) for trading. The implementation follows existing RPC patterns, adds comprehensive entity management logic, and integrates with the validator/metagraph systems.

Related Issues (JIRA)

[Reference any related issues or tasks that this pull request addresses or closes.]

Checklist

I have tested my changes on testnet.
I have updated any necessary documentation.
I have added unit tests for my changes (if applicable).
If there are breaking changes for validators, I have (or will) notify the community in Discord of the release.

Reviewer Instructions

[Provide any specific instructions or areas you would like the reviewer to focus on.]

Definition of Done

Code has been reviewed.
All checks and tests pass.
Documentation is up to date.
Approved by at least one reviewer.

Checklist (for the reviewer)

Code follows project conventions.
Code is well-documented.
Changes are necessary and align with the project's goals.
No breaking changes introduced.

Optional: Deploy Notes

[Any instructions or notes related to deployment, if applicable.]

/cc @mention_reviewer

github-actions · 2025-12-04T07:33:14Z

🤖 Claude AI Code Review

Last reviewed on: 14:23:47

Summary

This PR introduces an "Entity Miners" feature that allows a single entity hotkey to manage up to 500 subaccounts (synthetic hotkeys) for trading. The implementation adds comprehensive entity management infrastructure following existing RPC patterns, with debt ledger aggregation, challenge period integration, and validator broadcast capabilities.

✅ Strengths

Excellent architectural consistency - Follows existing RPC patterns (ChallengePeriodManager/Server/Client) consistently throughout
Comprehensive locking strategy - Per-entity locks enable better concurrency than a single global lock
Thorough documentation - Implementation steps document is exceptional, API documentation is detailed
Idempotent design - Subaccount registration handles duplicates gracefully
Security conscious - Uses verify_broadcast_sender() for validator broadcasts
Graceful degradation - Dashboard data aggregation handles missing services elegantly
Lazy RPC connections - connect_immediately=False prevents blocking during initialization

⚠️ Concerns

CRITICAL: Security & Data Integrity

No authentication on entity registration (entity_manager.py:233)

```
def register_entity(self, entity_hotkey: str, ...):
    # TODO: Add collateral verification here
    # No signature verification to prove ownership of entity_hotkey
```

Issue: Anyone can register any hotkey as an entity without proving ownership. This could lead to:

Impersonation attacks
Unauthorized subaccount creation
Griefing by exhausting entity slots

Recommendation: Add signature verification before entity registration:

```
def register_entity(self, entity_hotkey: str, signature: str, ...):
    if not self._verify_hotkey_signature(entity_hotkey, signature):
        return False, "Invalid signature - cannot prove hotkey ownership"
```
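A minimal sketch of how such a gate could be wired, assuming a pluggable verifier. `EntityRegistry`, `SignatureVerifier`, and the signed message format are hypothetical names that do not come from the PR; a real verifier would wrap the chain's keypair verification rather than a stub.

```python
from typing import Callable, Dict, Tuple

# Hypothetical verifier type: (hotkey, message, signature) -> bool.
SignatureVerifier = Callable[[str, bytes, bytes], bool]


class EntityRegistry:
    """Illustrative registry that refuses registrations without a valid signature."""

    def __init__(self, verify: SignatureVerifier) -> None:
        self._verify = verify
        self._entities: Dict[str, dict] = {}

    def register_entity(self, entity_hotkey: str, signature: bytes) -> Tuple[bool, str]:
        # The signed message binds the signature to this action and this hotkey.
        message = f"register_entity:{entity_hotkey}".encode()
        if not self._verify(entity_hotkey, message, signature):
            return False, "Invalid signature - cannot prove hotkey ownership"
        # Idempotent: re-registering an existing entity is not an error.
        if entity_hotkey in self._entities:
            return True, "Entity already registered"
        self._entities[entity_hotkey] = {"subaccounts": {}}
        return True, "Entity registered"
```

The message in this sketch carries no nonce or timestamp, so a captured signature could be replayed. A production version would likely need one of the two.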

Race condition in monotonic ID generation (entity_manager.py:276)
```
subaccount_id = entity_data.next_subaccount_id
entity_data.next_subaccount_id += 1
```
Issue: While per-entity locks protect against races within the same validator, concurrent broadcasts from different validators could create ID conflicts.

Recommendation: Add UUID-based conflict resolution or use timestamp-based IDs.

Typo in directory name: entitiy_management should be entity_management

This affects all imports and file paths. Should be fixed before merge to avoid future confusion.

Major: Performance & Scalability

Disk writes on every operation (entity_manager.py:308, 328, 350)
Every entity/subaccount mutation calls _write_entities_from_memory_to_disk() synchronously. With 500 subaccounts per entity and multiple entities:

Creates I/O bottleneck
Slows down order processing
Increases disk wear

Recommendation: Implement write batching or async persistence:

```
def _schedule_disk_write(self):
    """Debounced disk write (write after 5 seconds of inactivity)"""
    with self._write_timer_lock:
        if self._write_timer:
            self._write_timer.cancel()
        self._write_timer = threading.Timer(5.0, self._write_entities_from_memory_to_disk)
        self._write_timer.start()
```
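A self-contained variant of the same debounce idea, as a sketch only. `DebouncedWriter` and its `flush()` method are hypothetical and not part of the PR. The explicit flush is there so that a pending write is not lost on shutdown.

```python
import threading
from typing import Callable, Optional


class DebouncedWriter:
    """Coalesces bursts of schedule() calls into a single write."""

    def __init__(self, write_fn: Callable[[], None], delay_s: float = 5.0) -> None:
        self._write_fn = write_fn
        self._delay_s = delay_s
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def schedule(self) -> None:
        # Restart the countdown on every mutation.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            self._timer = None
        self._write_fn()

    def flush(self) -> None:
        """Write immediately if a write is pending (e.g. on shutdown)."""
        with self._lock:
            pending = self._timer
            self._timer = None
        if pending is not None:
            pending.cancel()
            # Under a race with _fire this may write twice, never zero times.
            self._write_fn()
```

The trade-off is durability: a crash can lose up to `delay_s` seconds of mutations, so the delay would need to be weighed against how much in-memory state can be rebuilt from validator broadcasts.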

Missing index for synthetic hotkey lookups (entity_manager.py:360)
```
def get_subaccount_status(self, synthetic_hotkey: str):
    entity_hotkey, subaccount_id = parse_synthetic_hotkey(synthetic_hotkey)
    entity_data = self.entities.get(entity_hotkey)
```
This is O(1) only because the parse function extracts the entity hotkey. However, validation checks may become frequent (per order submission). Consider caching validation results.

Dashboard aggregation queries 5+ services synchronously (entity_manager.py:418)

Could time out or be slow if any service is unresponsive. Consider:
- Parallel fetching with asyncio
- Circuit breaker pattern
- Caching with TTL
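The parallel-fetching option could look roughly like this, assuming the service clients are blocking calls. `fetch_all` and the fetcher names in the usage are hypothetical. Each fetcher gets its own timeout, and a slow or failing service degrades to `None` instead of failing the whole dashboard.

```python
import asyncio
from typing import Any, Callable, Dict, Optional


async def fetch_all(
    fetchers: Dict[str, Callable[[], Any]],
    timeout_s: float = 2.0,
) -> Dict[str, Optional[Any]]:
    """Run blocking fetchers concurrently; a slow or failing one yields None."""

    async def fetch_one(fn: Callable[[], Any]) -> Optional[Any]:
        try:
            # to_thread keeps a blocking RPC call off the event loop.
            # Note: a timed-out thread is abandoned, not interrupted.
            return await asyncio.wait_for(asyncio.to_thread(fn), timeout_s)
        except Exception:
            return None

    names = list(fetchers)
    results = await asyncio.gather(*(fetch_one(fetchers[n]) for n in names))
    return dict(zip(names, results))
```

Usage would be along the lines of `asyncio.run(fetch_all({"positions": get_positions, "ledger": get_ledger}))`, where both callables are placeholders. `asyncio.to_thread` requires Python 3.9 or later.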

Major: Missing Features

No rate limiting on entity/subaccount creation (entity_manager.py:233, 253)
An attacker could spam entity registrations or subaccount creations.

Recommendation: Add rate limiting per IP or per validator.

No audit logging

Critical operations (entity registration, subaccount creation, elimination) should have immutable audit logs for compliance and debugging.

Missing migration path for existing miners

Documentation mentions this but no implementation provided. How do current miners transition?
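For the rate-limiting recommendation in this section, a sliding-window limiter keyed by caller is one possible shape. This is a sketch: `SlidingWindowLimiter` is a hypothetical name, the key could be an IP or a validator hotkey, and the class is not thread-safe as written.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allows at most max_calls per key within any window_s-second window."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_calls = max_calls
        self._window_s = window_s
        self._clock = clock  # injectable so tests do not need to sleep
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        calls = self._calls[key]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window_s:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

A registration handler would call `allow(caller_key)` before doing any work and reject the request when it returns `False`.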

Moderate: Code Quality

Placeholder TODOs in production code (entity_manager.py:293, 298)
```
# TODO: Transfer collateral from entity to subaccount
# TODO: Set account size for the subaccount using ContractClient
```
Placeholders are fine for initial infra, but should have:
- Tracking issues/tickets
- Clear interfaces defined
- Integration tests with mocks

Error handling inconsistencies

- Some methods return Tuple[bool, str], others return Tuple[bool, Optional[dict], str]
- No standardized error codes
- Exception messages could leak internal state

Recommendation: Use consistent error response pattern:
```
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class OperationResult:
    success: bool
    data: Optional[Any] = None
    error_code: Optional[str] = None
    error_message: str = ""
```

Lock leak potential (entity_manager.py:210)

```
def _get_entity_lock(self, entity_hotkey: str) -> threading.RLock:
    with self._entities_lock:
        if entity_hotkey not in self._entity_locks:
            self._entity_locks[entity_hotkey] = threading.RLock()
        return self._entity_locks[entity_hotkey]
```

Entity locks are never removed even when entities are eliminated. Over time, this could accumulate many locks in memory.

Recommendation: Add lock cleanup when entities are permanently removed.
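One way the cleanup could be structured, as a sketch. `EntityLockRegistry` and `discard()` are hypothetical names; the PR keeps the locks in a plain dict on the manager.

```python
import threading
from typing import Dict


class EntityLockRegistry:
    """Per-entity RLocks that can be dropped when an entity is removed."""

    def __init__(self) -> None:
        self._registry_lock = threading.Lock()
        self._entity_locks: Dict[str, threading.RLock] = {}

    def get(self, entity_hotkey: str):
        # Create the lock lazily, under the registry lock.
        with self._registry_lock:
            lock = self._entity_locks.get(entity_hotkey)
            if lock is None:
                lock = self._entity_locks[entity_hotkey] = threading.RLock()
            return lock

    def discard(self, entity_hotkey: str) -> bool:
        """Drop the lock for a permanently removed entity."""
        with self._registry_lock:
            return self._entity_locks.pop(entity_hotkey, None) is not None

    def __len__(self) -> int:
        with self._registry_lock:
            return len(self._entity_locks)
```

`discard()` should only run after the entity itself has been removed. Otherwise one thread could still hold the old lock while another obtains a fresh one for the same hotkey.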

Missing input validation (entity_client.py:92, entity_manager.py:233)
- No validation of entity_hotkey format (could be empty, too long, invalid characters)
- No validation of collateral_amount (could be negative)
- No validation of max_subaccounts (could exceed limits)
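A sketch of the three checks listed above. The function name, the hotkey length bounds, and the choice to treat 500 as the upper limit for `max_subaccounts` are assumptions; the regex only checks for base58 characters and does not verify an address checksum.

```python
import math
import re
from typing import List

# Base58 alphabet (no 0, O, I, l). The length bounds are an assumption.
_HOTKEY_RE = re.compile(r"[1-9A-HJ-NP-Za-km-z]{40,60}")
MAX_SUBACCOUNTS_LIMIT = 500  # figure quoted in the review summary


def validate_registration(entity_hotkey: str, collateral_amount: float,
                          max_subaccounts: int) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors: List[str] = []
    if not isinstance(entity_hotkey, str) or not _HOTKEY_RE.fullmatch(entity_hotkey):
        errors.append("entity_hotkey has an invalid format")
    # bool is a subclass of int, so it is rejected explicitly.
    if (isinstance(collateral_amount, bool)
            or not isinstance(collateral_amount, (int, float))
            or not math.isfinite(collateral_amount)
            or collateral_amount < 0):
        errors.append("collateral_amount must be a finite, non-negative number")
    if (isinstance(max_subaccounts, bool)
            or not isinstance(max_subaccounts, int)
            or not 1 <= max_subaccounts <= MAX_SUBACCOUNTS_LIMIT):
        errors.append(f"max_subaccounts must be between 1 and {MAX_SUBACCOUNTS_LIMIT}")
    return errors
```

Returning every error at once, rather than failing on the first, lets the client fix a bad request in one round trip.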

💡 Suggestions

Architecture & Design

Consider event sourcing for entity operations
Instead of mutable state, store immutable events (EntityRegistered, SubaccountCreated, SubaccountEliminated). This provides:
- Natural audit log
- Easier debugging
- Replay capability for recovery
- Time-travel queries

Separate read and write models (CQRS)

Current design mixes read-heavy operations (validation, dashboard queries) with write operations (registration, elimination). Consider:
- Read-optimized cache for hotkey validation (hot path)
- Write-optimized persistence for entity mutations
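The event-sourcing suggestion above could be sketched as follows. The event names come from the review; their fields and the `replay` function are assumptions. Current state is derived by folding over the log, and replaying a prefix of the log gives the state at an earlier point.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Union


@dataclass(frozen=True)
class EntityRegistered:
    entity_hotkey: str


@dataclass(frozen=True)
class SubaccountCreated:
    entity_hotkey: str
    subaccount_id: int


@dataclass(frozen=True)
class SubaccountEliminated:
    entity_hotkey: str
    subaccount_id: int


Event = Union[EntityRegistered, SubaccountCreated, SubaccountEliminated]


def replay(events: Iterable[Event]) -> Dict[str, Set[int]]:
    """Rebuild the active-subaccount view by folding over the event log."""
    active: Dict[str, Set[int]] = {}
    for ev in events:
        if isinstance(ev, EntityRegistered):
            active.setdefault(ev.entity_hotkey, set())
        elif isinstance(ev, SubaccountCreated):
            active[ev.entity_hotkey].add(ev.subaccount_id)
        elif isinstance(ev, SubaccountEliminated):
            active[ev.entity_hotkey].discard(ev.subaccount_id)
    return active
```

The same log doubles as the audit trail, and a read-optimized view such as the one `replay` returns is the kind of cache the CQRS suggestion refers to.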

Add health metrics (entity_server.py:371)

```
def health_check_rpc(self) -> dict:
    return {
        "status": "healthy",
        "total_entities": len(self._manager.entities),
        "total_active_subaccounts": sum(len(e.get_active_subaccounts()) for e in self._manager.entities.values()),
        "daemon_running": self._manager.is_daemon_running(),
        "last_persistence_ms": self._manager._last_disk_write_ms  # Add this field
    }
```

Add comprehensive metrics/monitoring
- Subaccount creation rate
- Entity registration rate
- Elimination rate
- Lock contention metrics
- RPC call latency

Testing

Missing critical test cases (based on README_steps.txt checklist)
- Challenge period pass/fail scenarios ✗
- Concurrent subaccount creation (race conditions) ✗
- Validator sync with network partition ✗
- Disk persistence recovery after crash ✗
- Load testing with 500 subaccounts per entity ✗
- Per-entity lock contention testing ✗

Add property-based tests

Use hypothesis to test invariants:
- Monotonic IDs never decrease
- Active count never exceeds max_subaccounts
- Eliminated subaccounts never become active again
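A dependency-free sketch of checking these three invariants over random operation sequences. It uses the standard library's `random` as a stand-in for hypothesis so that it runs anywhere; with hypothesis, the operation sequence would come from a strategy instead. `EntityModel` is a hypothetical toy model, not the PR's `EntityData`, and it assumes the cap applies to active subaccounts.

```python
import random
from typing import Dict


class EntityModel:
    """Toy stand-in for the entity state described in the review."""

    def __init__(self, max_subaccounts: int) -> None:
        self.max_subaccounts = max_subaccounts
        self.next_subaccount_id = 0
        self.status: Dict[int, str] = {}  # id -> "active" | "eliminated"

    def active_count(self) -> int:
        return sum(1 for s in self.status.values() if s == "active")

    def create(self) -> bool:
        if self.active_count() >= self.max_subaccounts:
            return False
        self.status[self.next_subaccount_id] = "active"
        self.next_subaccount_id += 1
        return True

    def eliminate(self, subaccount_id: int) -> bool:
        if self.status.get(subaccount_id) != "active":
            return False
        self.status[subaccount_id] = "eliminated"
        return True


def check_invariants(seed: int, steps: int = 200) -> None:
    rng = random.Random(seed)
    model = EntityModel(max_subaccounts=5)
    eliminated = set()
    last_next_id = 0
    for _ in range(steps):
        if rng.random() < 0.6:
            model.create()
        elif model.status:
            sid = rng.choice(list(model.status))
            if model.eliminate(sid):
                eliminated.add(sid)
        # Invariant 1: the ID counter never decreases.
        assert model.next_subaccount_id >= last_next_id
        last_next_id = model.next_subaccount_id
        # Invariant 2: the active count never exceeds the cap.
        assert model.active_count() <= model.max_subaccounts
        # Invariant 3: eliminated subaccounts stay eliminated.
        assert all(model.status[s] == "eliminated" for s in eliminated)
```

Running `check_invariants(seed)` for a range of seeds gives reproducible failures, since each seed fixes the operation sequence.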

Add integration test for full order flow

```
def test_entity_order_flow_integration():
    # Register entity
    # Create subaccount
    # Submit order via REST API
    # Verify position tracking
    # Verify debt ledger aggregation
    # Eliminate subaccount
    # Verify order rejection
    raise NotImplementedError("skeleton: implement each step above")
```

Documentation

Add sequence diagrams for complex flows:
- Subaccount registration + validator broadcast
- Dashboard data aggregation
- Elimination assessment flow

Add troubleshooting guide to running_signals_server.md:
- What to do if subaccount is not found
- How to check if entity is registered
- How to verify synthetic hotkey is active

Document backwards compatibility

How does this change affect existing miners? Are there breaking changes?

🔒 Security Notes

High Priority

Authentication missing on REST endpoints (docs/running_signals_server.md:40)
```
POST /api/receive-signal
```
The API documentation shows api_key authentication, but entity endpoints don't mention authentication:
- POST /register_subaccount - Who can register?
- GET /subaccount_status/{id} - Should this be public?
- GET /entity_data/{hotkey} - Privacy concerns?

Potential for hotkey enumeration

GET /entity_data/{entity_hotkey} could allow attackers to enumerate all registered entities by brute force.

Recommendation: Require authentication or rate limit this endpoint.

Broadcast message validation (entity_manager.py:758)
```
if not self.verify_broadcast_sender(sender_hotkey, "SubaccountRegistration"):
    return False
```
Good! But ensure verify_broadcast_sender is cryptographically secure.

Subaccount UUID generation (entity_manager.py:287)
```
subaccount_uuid = str(uuid.uuid4())
```
Using UUID v4 is fine, but ensure this UUID cannot be predicted or used to infer information about other subaccounts.

Medium Priority

Error messages could leak information (entity_manager.py:272)
```
return False, None, f"Entity {entity_hotkey} has reached maximum subaccounts ({entity_data.max_subaccounts})"
```
Consider generic error messages for external APIs to avoid information disclosure.

No input sanitization

Entity hotkeys and synthetic hotkeys are stored and used without validation. Could contain injection payloads if used in logs or queries.

Disk persistence is unencrypted (entity_manager.py:540)

Entity data stored in plaintext JSON. Consider encryption at rest for sensitive data.

📋 Checklist for Authors

Before merging, please ensure:

🎯 Verdict

Status: ⚠️ APPROVE WITH CONDITIONS

This is a well-architected feature with excellent documentation and clear implementation patterns. However, the critical security issues (missing authentication, no signature verification) and performance concerns (synchronous disk writes) must be addressed before production deployment.

Recommendation:

Merge to a feature branch or staging environment
Address critical security issues immediately
Add comprehensive integration tests
Performance test with realistic load (500 subaccounts)
Security audit by another team member
Deploy to testnet for extended validation

The foundation is solid, but production-readiness requires addressing the security and performance gaps identified above.

Fixes bug where get_perf_ledgers_path() incorrectly returned .pkl path, causing migrate_perf_ledgers_to_compressed() to migrate perf_ledgers.json to perf_ledgers.pkl instead of perf_ledgers.json.gz. This created orphaned .pkl files with misleading extensions (containing gzip JSON, not pickle).

- Move validator_broadcast_base.py from shared_objects/ to vali_objects/
- Better location for validator-specific functionality
- Update all imports across ValidatorContractManager, AssetSelectionManager, EntityManager

SIMPLIFICATION:
- Remove dynamic secrets loading from verify_broadcast_sender()
- MOTHERSHIP_HOTKEY is now configured directly in ValiConfig
- Clearer error messages when configuration is missing
- Remove unnecessary ValiUtils import
- Simplify verification logic from ~22 lines to ~18 lines

CLARITY:
- Rename 'wallet' to 'vault_wallet' throughout broadcast system
- Less ambiguous than generic 'wallet' name
- Matches existing vault_wallet naming in ValidatorContractManager
- Update parameter, instance variable, property, and all usages
- Update documentation and code examples

BUG FIX:
- Fix AssetSelectionManager.receive_asset_selection_update()
- Correct asset_selection_data.get("") → .get("asset_selection")
- Bug would have caused asset selection broadcasts to fail silently

Files Changed:
- vali_objects/validator_broadcast_base.py (moved + simplified + renamed)
- vali_objects/contract/validator_contract_manager.py (import + parameter)
- vali_objects/utils/asset_selection/asset_selection_manager.py (import + parameter + bug fix)
- entitiy_management/entity_manager.py (import + parameter)
- shared_objects/BROADCAST_REFACTORING.md (documentation updates)

Benefits:
- Better code organization (validator code in vali_objects/)
- Simpler verification logic (single source of truth for MOTHERSHIP_HOTKEY)
- Clearer naming (vault_wallet vs ambiguous wallet)
- Asset selection broadcasts now work correctly
- All files compile successfully

…ter filtering

Replace hard-coded parameter list with dynamic introspection to filter SubtensorOpsServer initialization parameters. This makes the code more maintainable and automatically adapts to signature changes.

Changes:
- Add inspect-based parameter filtering in ServerOrchestrator
- Dynamically discover accepted parameters using inspect.signature()
- Remove hard-coded parameter list ['config', 'wallet', 'is_miner', ...]
- Add debug logging for filtered parameters
- Enhance TESTING mode support in ServerOrchestrator
- Create mock config/wallet for SubtensorOpsServer in TESTING mode
- Add minimal test config for entity, contract, asset_selection, weight_calculator
- Provide test hotkey and is_mainnet=False for weight_calculator
- Fix test compatibility in test_metagraph_updater.py
- Rename position_inspector → position_manager (7 occurrences)
- Update helper method _create_mock_position_manager()
- All 12 tests now passing
- Update neurons/miner.py and neurons/validator.py
- Integrate with ServerOrchestrator neuron startup pattern
- Refactor SubtensorOpsServer initialization
- Improve parameter handling for different modes (MINER/VALIDATOR/TESTING)
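The inspect-based filtering described in this commit message could look roughly like the following. This is an illustration of the technique, not the code in ServerOrchestrator; `filter_kwargs` and `DemoServer` are hypothetical names.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(target: Callable[..., Any], candidates: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that target's signature accepts."""
    params = inspect.signature(target).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    accepted = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in candidates.items() if k in accepted}


class DemoServer:  # hypothetical stand-in for a server class
    def __init__(self, config: dict, is_miner: bool = False) -> None:
        self.config = config
        self.is_miner = is_miner
```

For example, `DemoServer(**filter_kwargs(DemoServer, all_params))` drops any key in `all_params` that the constructor does not declare, so adding or removing a constructor parameter needs no change at the call site.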

…ture

Centralize test mock creation and refactor weight calculator to follow manager-server pattern, improving separation of concerns and testability.

Test Infrastructure:
- Add TestMockFactory utility (shared_objects/rpc/test_mock_factory.py) for centralized creation of mock configs, wallets, and hotkeys
- Move mock creation from ServerOrchestrator to individual servers (ContractServer, AssetSelectionServer, EntityServer, WeightCalculatorServer, SubtensorOpsServer now self-manage mocks when running_unit_tests=True)
- Add test support to Miner class with running_unit_tests parameter
- Add test mode to PositionInspector (skips network calls)
- Add mock validator responses to PropNetOrderPlacer for testing

Weight Calculator Refactoring:
- Rename SubtensorWeightSetter → WeightCalculatorManager (manager pattern)
- Refactor WeightCalculatorServer to delegate business logic to manager
- Simplify server implementation (-275/+91 lines)
- Manager creates own RPC clients internally (forward compatibility)
- Update imports in tests and runnable scripts

Benefits:
- Better separation of concerns (servers manage own dependencies)
- DRY principle (shared TestMockFactory)
- Consistent manager-server pattern across codebase
- Improved testability with minimal production code impact
- Easier maintenance when requirements change

jbonilla-tao requested review from derekawender, sli-tao, taoshidev1 and trdougherty as code owners December 4, 2025 07:31

jbonilla-tao force-pushed the feat/vanta_entity branch 5 times, most recently from 6021496 to d8ebcda Compare December 6, 2025 07:03

jbonilla-tao changed the title ~~[WIP] Entity miners V1~~ Entity miners Initial Infra Dec 8, 2025

sli-tao approved these changes Dec 9, 2025

sli-tao changed the base branch from refactor/price-fetcher to main December 10, 2025 01:41

sli-tao changed the base branch from main to refactor/price-fetcher December 10, 2025 06:38

sli-tao changed the base branch from refactor/price-fetcher to main December 10, 2025 07:44

sli-tao force-pushed the feat/vanta_entity branch from a69b26a to 1274464 Compare December 10, 2025 07:44

jbonilla-tao and others added 14 commits December 11, 2025 19:02

[WIP] Entity miners V1

d71664e

refactor, remove dead code.

395088d

new tests

8d92b25

tests

e220fd4

endpoints

c5cf1ec

fix

32bb39b

cp

fb9dfb2

checkpoint

bbec744

lock down entity endpoints

452f309

update entity endpoint and readme

0d15d62

sli-tao and others added 4 commits December 11, 2025 19:02

full example response

84a8348

remove net_pnl from debt ledger

c780379

add .gz to gitignore

c008e1f

Fix after rebase

50d7d9c

ward-taoshi force-pushed the feat/vanta_entity branch from a7e484c to 50d7d9c Compare December 12, 2025 00:12

add subaccount order processing

2a420cb

sli-tao force-pushed the feat/vanta_entity branch from 617e117 to 2a420cb Compare December 12, 2025 10:04
