Feat: Zero-dependency Real-Time Experiment Dashboard #114

Open
aniruddhaadak80 wants to merge 1 commit into karpathy:master from aniruddhaadak80:feat/realtime-dashboard

aniruddhaadak80 commented Mar 10, 2026

Running an autonomous agent overnight means having to repeatedly tail logs or run Jupyter to see how things are going.

This PR adds a 100% dependency-free real-time dashboard powered by Python's built-in http.server. Because it only reads results.tsv and run.log live, it cannot create merge conflicts or break train.py.

Features

  1. Live Validation Graph (via Chart.js CDN) showing BPB over time.
  2. Success/Crash Metrics Counter.
  3. Live run.log tailing block to see if the current experiment is OOMing right from your browser.

How to jump in

Run python dashboard.py in parallel with the LLM script, then open http://localhost:8080.

Copilot AI review requested due to automatic review settings March 10, 2026 06:50

Copilot AI left a comment

Pull request overview

Adds a lightweight real-time experiment dashboard intended to run alongside training/agent runs, displaying results.tsv metrics and tailing run.log over a local HTTP server.

Changes:

  • Introduces dashboard.py, an http.server-based UI with a /data JSON endpoint and an HTML dashboard using Chart.js via CDN.
  • Updates README.md to document running the new dashboard.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File            Description
dashboard.py    New HTTP dashboard server that serves an HTML page and exposes experiment/log data via /data.
README.md       Mentions the new dashboard and how to run it.

@@ -0,0 +1,165 @@
import os

Copilot AI Mar 10, 2026

File starts with a UTF-8 BOM / non-printing character before import os. While Python can tolerate this, it often causes issues with linters, diffs, and some tooling; consider removing the BOM so the first character is i in import.
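
A minimal sketch of how the BOM could be detected and removed, assuming the file is otherwise valid UTF-8. The helper name strip_utf8_bom is hypothetical and not part of the PR.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper for illustration; not part of the PR.)
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Rewrite the file without the three BOM bytes (EF BB BF).
        file.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling strip_utf8_bom("dashboard.py") once before committing would leave the file starting with the i in import; a second call is a no-op and returns False.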

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +53
<thead><tr><th>Commit</th><th>BPB</th><th>Memory (MB)</th><th>Status</th><th>Description</th></tr></thead>
<tbody id="tableBody"></tbody>

Copilot AI Mar 10, 2026

The table is labeled "Memory (MB)", but results.tsv is documented as memory_gb in program.md. This will mislead readers and makes comparisons error-prone; either update the UI label to GB or convert the stored value to MB consistently.

Comment on lines +72 to +76
<td><code>${r.commit.substring(0,7)}</code></td>
<td>${r.bpb}</td>
<td>${r.memory}</td>
<td><span class="badge ${r.status}">${r.status}</span></td>
<td>${r.description}</td>

Copilot AI Mar 10, 2026

Values from results.tsv (commit, status, description, etc.) are inserted via innerHTML without escaping. If the TSV contains < or & (accidentally or maliciously), this becomes an XSS vector in the dashboard; prefer building DOM nodes with textContent (or escape the fields) and avoid interpolating raw strings into HTML.
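
One way to act on the "escape the fields" option is to escape on the server before the rows are serialized to JSON. The sketch below uses the standard library's html.escape and assumes each row is a dict of strings; the function name escape_row and the sample values are hypothetical. Building DOM nodes with textContent on the client is likely the more robust fix, so treat this as a complement rather than a replacement.

```python
import html


def escape_row(row: dict) -> dict:
    """Return a copy of `row` with every value HTML-escaped.

    Hypothetical helper: assumes values are strings read from results.tsv.
    quote=True also escapes " and ', which matters inside HTML attributes.
    """
    return {key: html.escape(str(value), quote=True) for key, value in row.items()}


# Illustrative row; the values are made up for this example.
row = {"commit": "abc1234", "description": "<img src=x onerror=alert(1)>"}
print(escape_row(row)["description"])
# prints: &lt;img src=x onerror=alert(1)&gt;
```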

Comment on lines +144 to +145
except Exception:
    pass

Copilot AI Mar 10, 2026

The except Exception: pass here silently drops parse errors and can mask malformed results.tsv rows or encoding issues. Prefer catching the specific exceptions you expect (e.g., FileNotFoundError, UnicodeDecodeError) and logging errors via self.log_error(...) so failures are diagnosable.
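
A sketch of what narrower exception handling might look like as a standalone function. It assumes the first line of results.tsv is a header and that each data row has at least five tab-separated columns; both are assumptions, as is the name read_results. It uses the logging module so it can run outside the handler; inside the request handler, self.log_error(...) could be used instead.

```python
import logging

logger = logging.getLogger("dashboard")


def read_results(path: str = "results.tsv") -> list:
    """Read tab-separated rows, reporting problems instead of hiding them.

    Sketch only: skips the first line as a header (assumption) and keeps
    the first five columns of every well-formed row.
    """
    rows = []
    try:
        with open(path, "r", encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        # Expected before the first experiment finishes; not an error.
        logger.info("%s not found yet; returning no rows", path)
        return rows
    except UnicodeDecodeError as exc:
        logger.error("could not decode %s: %s", path, exc)
        return rows

    for lineno, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) < 5:
            logger.warning("%s:%d: expected 5 columns, got %d", path, lineno, len(parts))
            continue
        rows.append(parts[:5])
    return rows
```

Malformed rows are skipped with a warning that names the file and line, so a bad write to results.tsv shows up in the server output rather than as a silently empty table.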

Comment on lines +152 to +153
except:
    pass

Copilot AI Mar 10, 2026

Bare except: will also swallow KeyboardInterrupt/SystemExit and makes debugging harder. Catch specific exceptions for log reading and log the error (or at least include it in the /data response) so the dashboard doesn't fail silently.
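
A sketch of the "include it in the /data response" option, assuming the response is a dict that gets serialized to JSON. The function name read_log_payload and the keys log and log_error are hypothetical. Only OSError and its subclasses are caught, so KeyboardInterrupt and SystemExit still propagate; the full-file read from the PR is kept here for brevity.

```python
def read_log_payload(path: str = "run.log", max_lines: int = 40) -> dict:
    """Return the log tail plus an error message the client can display.

    Hypothetical helper: the "log" and "log_error" keys are illustrative.
    """
    payload = {"log": "", "log_error": None}
    try:
        # errors="replace" avoids UnicodeDecodeError on partially written logs.
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            payload["log"] = "".join(f.readlines()[-max_lines:])
    except FileNotFoundError:
        payload["log_error"] = f"{path} not found"
    except OSError as exc:
        payload["log_error"] = f"could not read {path}: {exc}"
    return payload
```

The client can then show log_error in place of the log block when it is not null, which makes a missing or unreadable run.log visible in the browser.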

Comment on lines +139 to +142
"bpb": parts[1],
"memory": parts[2],
"status": parts[3].upper(),
"description": parts[4]

Copilot AI Mar 10, 2026

memory is passed through from results.tsv as a string and (per program.md) is in GB, but the UI implies MB and there is no numeric normalization. Consider parsing to a float, validating it, and either renaming the key to memory_gb or converting units before returning JSON to keep the client logic consistent.
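
A sketch of the numeric normalization suggested here, assuming the memory column holds a plain decimal number of gigabytes. The helper names are hypothetical, and the 1 GB = 1024 MB factor is an assumption, since the thread does not say which convention the training script uses.

```python
import math
from typing import Optional


def parse_memory_gb(raw: str) -> Optional[float]:
    """Parse the memory column as a finite, non-negative float in GB.

    Returns None when the value is missing or malformed so the caller can
    decide how to render it (hypothetical helper, not part of the PR).
    """
    try:
        value = float(raw)
    except ValueError:
        return None
    if not math.isfinite(value) or value < 0:
        return None
    return value


def gb_to_mb(memory_gb: float) -> float:
    """Convert GB to MB, assuming 1 GB = 1024 MB."""
    return memory_gb * 1024.0


print(parse_memory_gb("44.0"))  # 44.0
print(gb_to_mb(1.5))            # 1536.0
print(parse_memory_gb("n/a"))   # None
```

Returning the parsed value under a key such as memory_gb would let the table header and the JSON agree on the unit, whichever unit is chosen.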

Comment on lines +149 to +151
if os.path.exists("run.log"):
    with open("run.log", "r", encoding="utf-8") as f:
        log_tail = "".join(f.readlines()[-40:])

Copilot AI Mar 10, 2026

f.readlines() reads the entire run.log into memory every 2 seconds just to tail the last 40 lines. For large logs this can become a significant CPU/memory hit and stall responses; consider a true tail implementation (seek from end in binary mode) or cap read size.
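
A sketch of the seek-from-end approach: open the file in binary mode and read fixed-size blocks backwards until enough newlines are buffered. The function name tail_lines and the 4096-byte block size are assumptions chosen for illustration.

```python
import os


def tail_lines(path: str, max_lines: int = 40, block_size: int = 4096) -> str:
    """Return up to the last `max_lines` lines of `path` without reading the whole file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b""
        # Read blocks backwards until more than max_lines newlines are buffered
        # (so the oldest kept line is complete) or the start of the file is reached.
        while position > 0 and data.count(b"\n") <= max_lines:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            data = f.read(read_size) + data
    lines = data.splitlines(keepends=True)[-max_lines:]
    # Decode after slicing so a multi-byte character split across blocks is rejoined first.
    return b"".join(lines).decode("utf-8", errors="replace")
```

With this, the cost per request depends on the length of the last 40 lines rather than on the size of run.log. Unlike text-mode readlines(), it does not translate \r\n to \n, which may or may not matter for how the log is displayed.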

Comment on lines +163 to +164
with socketserver.TCPServer(("", PORT), DashboardHandler) as httpd:
    print(f"📊 Dashboard gracefully running at http://localhost:{PORT}")

Copilot AI Mar 10, 2026

TCPServer(("", PORT), ...) binds to all interfaces (0.0.0.0) even though the message advertises localhost. This can unintentionally expose run.log / results.tsv to the local network; bind explicitly to 127.0.0.1 by default (and make host configurable if needed).
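
A sketch of binding to the loopback interface by default with the host left configurable. The make_server helper and the HOST constant are hypothetical, and SimpleHTTPRequestHandler stands in for the PR's DashboardHandler so the block is self-contained.

```python
import http.server
import socketserver

HOST = "127.0.0.1"  # loopback only; pass "0.0.0.0" explicitly to expose on the network
PORT = 8080


def make_server(host: str = HOST, port: int = PORT) -> socketserver.TCPServer:
    """Create the dashboard server bound to `host` only (loopback by default)."""
    # SimpleHTTPRequestHandler is a stand-in for the PR's DashboardHandler.
    return socketserver.TCPServer((host, port), http.server.SimpleHTTPRequestHandler)
```

The main block would then become something like with make_server() as httpd: httpd.serve_forever(), and the printed URL matches the interface that is actually bound.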

Comment on lines +159 to +160
else:
    super().do_GET()

Copilot AI Mar 10, 2026

Falling back to super().do_GET() means the server will serve arbitrary files from the working directory (and potentially directory listings), which is risky given this is intended as a log/results viewer. Consider returning 404 for unknown paths, or restrict static serving to an explicit allowlist so only / and /data are reachable.
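
A sketch of the allowlist approach, assuming / and /data are the only routes the dashboard needs. The names ALLOWED_PATHS, is_allowed, and AllowlistHandler are hypothetical, and the response bodies are placeholders. Deriving from BaseHTTPRequestHandler instead of SimpleHTTPRequestHandler means there is no file-serving fallback at all.

```python
import http.server
from urllib.parse import urlsplit

ALLOWED_PATHS = {"/", "/data"}


def is_allowed(raw_path: str) -> bool:
    """Return True only for the routes the dashboard is meant to serve."""
    # urlsplit drops the query string, so "/data?t=1" still matches "/data".
    return urlsplit(raw_path).path in ALLOWED_PATHS


class AllowlistHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers / and /data, returns 404 for everything else."""

    def do_GET(self):
        if not is_allowed(self.path):
            self.send_error(404, "Not Found")
            return
        if urlsplit(self.path).path == "/data":
            body, content_type = b"{}", "application/json"  # placeholder payload
        else:
            body, content_type = b"<h1>dashboard</h1>", "text/html; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Under this sketch a request for /run.log or /train.py gets a 404 instead of the file contents, and directory listings are never generated.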

Comment on lines 12 to +16

- **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
- **`program.md`** — baseline instructions for one agent. Point your agent here and let it go. **This file is edited and iterated on by the human**.
- **`dashboard.py`** — a dependency-free, zero-config real-time dashboard. Run `python dashboard.py` to stream live experiment data & logs via HTTP.

Copilot AI Mar 10, 2026

README now lists dashboard.py, but the preceding text says "three files that matter" and the dashboard also relies on loading Chart.js from a CDN (so it isn't fully dependency-free/offline). Consider updating this section to reflect the additional file and clarify that the UI fetches Chart.js over the network unless vendored locally.

2 participants