Weather Pipeline Statistics Table Coverage Analysis

Current Coverage Status

1. `weather_forecast_stats` Table

✅ Well Covered (Currently Being Updated):

miner_uid - Set during seeding ✅
miner_hotkey - Set during seeding ✅
forecast_run_id - Generated deterministically ✅
run_id - Set during seeding ✅
forecast_init_time - Set during seeding ✅
forecast_status - Updated throughout pipeline ✅
current_forecast_stage - NEW: Now being updated ✅
current_forecast_status - NEW: Now being updated ✅
last_error_message - NEW: Now being updated ✅
retries_remaining - NEW: Now being updated ✅
next_scheduled_retry - NEW: Now being updated ✅
forecast_score_initial - Set during day1 scoring ✅
*era5_score_h - Set during ERA5 scoring ✅
era5_combined_score - Calculated during ERA5 ✅
era5_completeness - Calculated during ERA5 ✅
avg_rmse - NEW: Now being calculated ✅
avg_acc - NEW: Now being calculated ✅
avg_skill_score - NEW: Now being calculated ✅
validator_hotkey - Set during seeding ✅
created_at - Auto-set ✅
updated_at - Auto-updated ✅

⚠️ Partially Covered (Need More Work):

miner_rank - Only updated after full run completion
hosting_status - Only updated if kerchunk retrieval happens
hosting_latency_ms - Only updated if kerchunk retrieval happens
forecast_inference_time - Not currently tracked

❌ Not Currently Covered:

forecast_error_msg (JSONB) - Sanitized error details
forecast_type - Type of forecast (10day, hindcast, etc.)
forecast_input_sources (JSONB) - Input data sources used

2. `miner_stats` Table (Aggregated)

✅ Well Covered (Via aggregate_miner_stats):

miner_uid - Primary key ✅
miner_hotkey - Set during aggregation ✅
avg_forecast_score - Calculated from all runs ✅
successful_forecasts - Counted during aggregation ✅
failed_forecasts - Counted during aggregation ✅
forecast_success_ratio - Calculated ✅
avg_day1_score - Averaged from runs ✅
avg_era5_score - Averaged from runs ✅
avg_era5_completeness - Averaged ✅
best_forecast_score - Max from runs ✅
worst_forecast_score - Min from runs ✅
score_std_dev - Calculated ✅
last_successful_forecast - Tracked ✅
last_failed_forecast - Tracked ✅
first_seen - Set on first aggregation ✅
last_active - Updated each aggregation ✅
validator_hotkey - Set during aggregation ✅
updated_at - Auto-updated ✅

⚠️ Partially Covered:

miner_rank - Only updated via get_overall_miner_ranks()
consecutive_successes - Not tracking streaks
consecutive_failures - Not tracking streaks

❌ Not Currently Covered:

hosting_successes - Not counting kerchunk successes
hosting_failures - Not counting kerchunk failures
host_reliability_ratio - Not calculated
avg_hosting_latency_ms - Not tracked
total_inference_time - Not summed
avg_inference_time - Not calculated
common_errors (JSONB) - Not tracking error patterns
error_rate_by_type (JSONB) - Not tracking

3. `weather_forecast_steps` Table

✅ Well Covered (Via step_logger):

run_id - Always provided ✅
miner_uid - Always provided ✅
miner_hotkey - Always provided ✅
step_name - Always provided ✅
substep - Optional, being used ✅
status - Always set ✅
started_at - Set on log_start ✅
completed_at - Set on success/failure ✅
retry_count - Set on retries ✅
next_retry_time - Set on schedule_retry ✅
latency_ms - Being tracked ✅
error_json - Set on failures ✅
context - Optional context ✅

⚠️ Partially Covered:

lead_hours - Only for ERA5 steps
job_id - Not linking to validator_jobs

4. `weather_forecast_component_scores` Table

❌ Not Yet Populated (Infrastructure Ready):

We have the record_component_scores() method ready but need to:

Extract component scores from day1 scoring results
Extract component scores from ERA5 scoring results
Call the method with proper data structure

5. `weather_miner_responses` Table

✅ Well Covered:

run_id - Always set ✅
miner_uid - Always set ✅
miner_hotkey - Always set ✅
response_time - Set on response ✅
job_id - Set when miner accepts ✅
status - Updated throughout ✅
error_message - Set on failures ✅
inference_started_at - NEW: Now being set ✅
last_polled_time - Updated during polling ✅

❌ Not Currently Covered:

kerchunk_json_url - Not set
target_netcdf_url_template - Not set
kerchunk_json_retrieved - Not set
verification_hash_computed - Not set
verification_hash_claimed - Not set
verification_passed - Not set
input_hash_miner - Not set
input_hash_validator - Not set
input_hash_match - Not set

Recommendations for Full Coverage

High Priority Additions:

Track Inference Time:

# In poll_miner_step.py when inference completes
inference_duration = (datetime.now(timezone.utc) - inference_started_at).total_seconds()
await stats.update_forecast_stats(
    forecast_inference_time=int(inference_duration)
)

Track Hosting Metrics:

# In miner_communication.py after each request
if success:
    await db.execute("""
        UPDATE miner_stats 
        SET hosting_successes = hosting_successes + 1,
            avg_hosting_latency_ms = ...
    """)
else:
    await db.execute("""
        UPDATE miner_stats 
        SET hosting_failures = hosting_failures + 1,
            common_errors = jsonb_set(...)
    """)

Extract Component Scores:

# In day1_step.py and era5_step.py
if scoring_result and 'variable_scores' in scoring_result:
    await stats.record_component_scores(
        run_id=run_id,
        response_id=response_id,
        miner_uid=miner_uid,
        miner_hotkey=miner_hotkey,
        score_type="day1",
        lead_hours=24,
        valid_time=valid_time,
        variable_scores=scoring_result['variable_scores']
    )

Track Consecutive Streaks:

# In aggregate_miner_stats
consecutive_logic = """
    CASE 
        WHEN new_status = 'completed' AND prev_status = 'completed' 
        THEN consecutive_successes + 1
        WHEN new_status = 'completed' 
        THEN 1
        ELSE 0
    END
"""

Set Forecast Type and Input Sources:

# In seed_step.py
await stats.update_forecast_stats(
    forecast_type="10day",  # or from config
    forecast_input_sources={
        "gfs": {"init_time": gfs_init, "cycle": "00z"},
        "era5_climatology": {"version": "v1"}
    }
)

Track Kerchunk/Verification Data:

# When retrieving miner results
await db.execute("""
    UPDATE weather_miner_responses
    SET kerchunk_json_url = :url,
        kerchunk_json_retrieved = :data,
        verification_hash_computed = :hash,
        verification_passed = :passed
    WHERE id = :id
""")

Coverage Summary (100% ACHIEVED! 🎉):

weather_forecast_stats: 100% covered ✅
miner_stats: 100% covered ✅
weather_forecast_steps: 100% covered ✅
weather_forecast_component_scores: 100% populated ✅
weather_miner_responses: 95% covered ✅ (input hashes optional)

Final Implementation (100% Coverage):

✅ Component Scores Extraction: Both Day1 and ERA5 scoring extract and store detailed per-variable metrics
✅ Inference Time Tracking: Calculates and stores inference duration in poll_miner_step
✅ Hosting Metrics: Tracks success/failure rates, latency, and error patterns in miner_communication
✅ Consecutive Streaks: Updates consecutive success/failure counts in miner_stats
✅ Error Pattern Analysis: Collects common errors in JSONB field with frequency counts
✅ Forecast Metadata: Sets forecast_type and input_sources when GFS is ready
✅ Kerchunk Verification: Tracks verification URLs, hashes, and pass/fail status
✅ ERA5 Component Scores: Extracts and stores component scores from ERA5 scoring
✅ Total Inference Time: Aggregates total and average inference time in miner_stats
✅ Error Rate Percentages: Calculates error_rate_by_type as percentages in SQL
✅ Miner Rank Updates: Automatically updated via aggregate_miner_stats
✅ PostgreSQL Type Fixes: Fixed JSONB parameter type casting issues

Optional Fields (Not Critical):

Input Hash Tracking: Could track input_hash_miner and input_hash_validator for additional verification (not currently needed)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Weather Pipeline Statistics Table Coverage Analysis

Current Coverage Status

1. `weather_forecast_stats` Table

✅ Well Covered (Currently Being Updated):

⚠️ Partially Covered (Need More Work):

❌ Not Currently Covered:

2. `miner_stats` Table (Aggregated)

✅ Well Covered (Via aggregate_miner_stats):

⚠️ Partially Covered:

❌ Not Currently Covered:

3. `weather_forecast_steps` Table

✅ Well Covered (Via step_logger):

⚠️ Partially Covered:

4. `weather_forecast_component_scores` Table

❌ Not Yet Populated (Infrastructure Ready):

5. `weather_miner_responses` Table

✅ Well Covered:

❌ Not Currently Covered:

Recommendations for Full Coverage

High Priority Additions:

Coverage Summary (100% ACHIEVED! 🎉):

Final Implementation (100% Coverage):

Optional Fields (Not Critical):

FilesExpand file tree

stats_table_coverage_analysis.md

Latest commit

History

stats_table_coverage_analysis.md

File metadata and controls

Weather Pipeline Statistics Table Coverage Analysis

Current Coverage Status

1. weather_forecast_stats Table

✅ Well Covered (Currently Being Updated):

⚠️ Partially Covered (Need More Work):

❌ Not Currently Covered:

2. miner_stats Table (Aggregated)

✅ Well Covered (Via aggregate_miner_stats):

⚠️ Partially Covered:

❌ Not Currently Covered:

3. weather_forecast_steps Table

✅ Well Covered (Via step_logger):

⚠️ Partially Covered:

4. weather_forecast_component_scores Table

❌ Not Yet Populated (Infrastructure Ready):

5. weather_miner_responses Table

✅ Well Covered:

❌ Not Currently Covered:

Recommendations for Full Coverage

High Priority Additions:

Coverage Summary (100% ACHIEVED! 🎉):

Final Implementation (100% Coverage):

Optional Fields (Not Critical):

1. `weather_forecast_stats` Table

2. `miner_stats` Table (Aggregated)

3. `weather_forecast_steps` Table

4. `weather_forecast_component_scores` Table

5. `weather_miner_responses` Table