Feat: Zero-dependency Real-Time Experiment Dashboard #114

aniruddhaadak80 wants to merge 1 commit into karpathy:master from
Conversation
Pull request overview
Adds a lightweight real-time experiment dashboard intended to run alongside training/agent runs, displaying results.tsv metrics and tailing run.log over a local HTTP server.
Changes:
- Introduces `dashboard.py`, an `http.server`-based UI with a `/data` JSON endpoint and an HTML dashboard using Chart.js via CDN.
- Updates `README.md` to document running the new dashboard.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| dashboard.py | New HTTP dashboard server that serves an HTML page and exposes experiment/log data via /data. |
| README.md | Mentions the new dashboard and how to run it. |
```
@@ -0,0 +1,165 @@
import os
```
File starts with a UTF-8 BOM / non-printing character before `import os`. While Python tolerates this, it often causes issues with linters, diffs, and some tooling; consider removing the BOM so the first character is the `i` in `import`.
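A one-time cleanup for this could look like the sketch below: reading with the `utf-8-sig` codec transparently drops a leading BOM, and writing back with plain `utf-8` emits none. The `strip_bom` helper name is hypothetical, not part of the PR.

```python
def strip_bom(path):
    """Rewrite a file in place without a leading UTF-8 BOM.

    "utf-8-sig" strips the BOM on read if one is present; writing
    back with plain "utf-8" produces a BOM-free file.
    """
    with open(path, "r", encoding="utf-8-sig") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Running this once on `dashboard.py` (and committing the result) would resolve the comment.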
```html
<thead><tr><th>Commit</th><th>BPB</th><th>Memory (MB)</th><th>Status</th><th>Description</th></tr></thead>
<tbody id="tableBody"></tbody>
```
The table is labeled "Memory (MB)", but `results.tsv` is documented as `memory_gb` in `program.md`. This will mislead readers and make comparisons error-prone; either update the UI label to GB or convert the stored value to MB consistently.
```html
<td><code>${r.commit.substring(0,7)}</code></td>
<td>${r.bpb}</td>
<td>${r.memory}</td>
<td><span class="badge ${r.status}">${r.status}</span></td>
<td>${r.description}</td>
```
Values from `results.tsv` (commit, status, description, etc.) are inserted via `innerHTML` without escaping. If the TSV contains `<`, `>`, or `&` (accidentally or maliciously), this becomes an XSS vector in the dashboard; prefer building DOM nodes with `textContent` (or escape the fields) and avoid interpolating raw strings into HTML.
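The `textContent` approach on the client is the more robust fix; as a minimal server-side alternative, the fields could be HTML-escaped before being serialized into the `/data` response. A sketch (the `sanitize_row` helper is an assumption, not code from the PR):

```python
import html

def sanitize_row(row):
    """Escape HTML metacharacters in every string field of a /data row,
    so values later interpolated into innerHTML cannot inject markup."""
    return {k: html.escape(v) if isinstance(v, str) else v
            for k, v in row.items()}
```

Each parsed row would pass through `sanitize_row` before `json.dumps`, turning a hostile `description` like `<script>…</script>` into inert text.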
```python
except Exception:
    pass
```
The `except Exception: pass` here silently drops parse errors and can mask malformed `results.tsv` rows or encoding issues. Prefer catching the specific exceptions you expect (e.g., `FileNotFoundError`, `UnicodeDecodeError`) and logging errors via `self.log_error(...)` so failures are diagnosable.
```python
except:
    pass
```
Bare `except:` will also swallow `KeyboardInterrupt`/`SystemExit` and makes debugging harder. Catch specific exceptions for log reading and log the error (or at least include it in the `/data` response) so the dashboard doesn't fail silently.
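The narrowing suggested by the two comments above might look like this sketch. A standalone `tail_log` helper is assumed here for illustration; inside the real handler, `log_error` would be `self.log_error` (a method `http.server.BaseHTTPRequestHandler` does provide).

```python
def tail_log(path, n=40, log_error=print):
    """Return the last n lines of the log, surfacing errors
    instead of swallowing them with a bare except."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return "".join(f.readlines()[-n:])
    except FileNotFoundError:
        return ""  # no log yet is a normal state, not an error
    except (OSError, UnicodeDecodeError) as e:
        # Narrow catch: real I/O or decoding failures get logged,
        # while KeyboardInterrupt/SystemExit propagate as they should.
        log_error("failed to read %s: %s" % (path, e))
        return ""
```

This keeps the "missing file" case silent (it is expected before the first run) while making genuine failures visible in the server log.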
```python
"bpb": parts[1],
"memory": parts[2],
"status": parts[3].upper(),
"description": parts[4]
```
`memory` is passed through from `results.tsv` as a string and (per `program.md`) is in GB, but the UI implies MB and there is no numeric normalization. Consider parsing it to a float, validating it, and either renaming the key to `memory_gb` or converting units before returning JSON to keep the client logic consistent.
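A minimal sketch of that normalization, under the assumption that the column really is GB as `program.md` documents (`parse_memory_gb` is a hypothetical helper, not code from the PR):

```python
def parse_memory_gb(raw):
    """Parse the memory column from results.tsv as a float in GB.

    Returns None for missing or malformed values so the client can
    render a placeholder instead of choking on a bad row.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```

The `/data` payload could then carry the unit in the key, e.g. `row["memory_gb"] = parse_memory_gb(parts[2])`, and the UI label would match whichever unit the key names.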
```python
if os.path.exists("run.log"):
    with open("run.log", "r", encoding="utf-8") as f:
        log_tail = "".join(f.readlines()[-40:])
```
`f.readlines()` reads the entire `run.log` into memory every 2 seconds just to tail the last 40 lines. For large logs this can become a significant CPU/memory hit and stall responses; consider a true tail implementation (seek from the end in binary mode) or cap the read size.
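A true tail might look like the following sketch: seek to the end of the file and read fixed-size chunks backwards until enough newlines have been seen, so memory use is bounded by the size of the last `n` lines rather than the whole log.

```python
import os

def tail(path, n=40, chunk=8192):
    """Return the last n lines of a file without reading it all."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        data = b""
        # Walk backwards in fixed-size chunks until we have seen
        # more than n newlines (or hit the start of the file).
        while end > 0 and data.count(b"\n") <= n:
            start = max(0, end - chunk)
            f.seek(start)
            data = f.read(end - start) + data
            end = start
        lines = data.splitlines(keepends=True)[-n:]
        return b"".join(lines).decode("utf-8", errors="replace")
```

Opening in binary and decoding with `errors="replace"` also keeps the endpoint alive if a partially written log line contains a broken multi-byte sequence.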
```python
with socketserver.TCPServer(("", PORT), DashboardHandler) as httpd:
    print(f"📊 Dashboard gracefully running at http://localhost:{PORT}")
```
`TCPServer(("", PORT), ...)` binds to all interfaces (`0.0.0.0`) even though the message advertises `localhost`. This can unintentionally expose `run.log` / `results.tsv` to the local network; bind explicitly to `127.0.0.1` by default (and make the host configurable if needed).
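One way to do that is to default the host to loopback and let an environment variable opt into a wider binding. `DASHBOARD_HOST` / `DASHBOARD_PORT` are hypothetical names for this sketch, not flags the PR defines:

```python
import os
import socketserver
from http.server import SimpleHTTPRequestHandler

def make_server(handler=SimpleHTTPRequestHandler):
    """Bind the dashboard to loopback by default so run.log /
    results.tsv are not exposed to the local network.

    DASHBOARD_HOST / DASHBOARD_PORT are assumed env vars for
    opting into a different binding.
    """
    host = os.environ.get("DASHBOARD_HOST", "127.0.0.1")
    port = int(os.environ.get("DASHBOARD_PORT", "8080"))
    return socketserver.TCPServer((host, port), handler)
```

The startup code then becomes `with make_server(DashboardHandler) as httpd: httpd.serve_forever()`, and the printed URL matches the actual binding.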
```python
else:
    super().do_GET()
```
Falling back to `super().do_GET()` means the server will serve arbitrary files from the working directory (and potentially directory listings), which is risky given that this is intended as a log/results viewer. Consider returning 404 for unknown paths, or restrict static serving to an explicit allowlist so only `/` and `/data` are reachable.
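The allowlist variant might be sketched like this. The `/` and `/data` bodies here are placeholders; in the PR's handler they would render the HTML page and the JSON built from `results.tsv` respectively. The key change is the final branch: unknown paths get a 404 instead of falling through to static file serving.

```python
import json
from http.server import SimpleHTTPRequestHandler

class DashboardHandler(SimpleHTTPRequestHandler):
    """Allowlist routing: only / and /data are reachable; every other
    path 404s instead of exposing the working directory."""

    def do_GET(self):
        if self.path == "/":
            # Placeholder body; the real handler serves the dashboard HTML.
            self.respond(200, "text/html", b"<h1>dashboard</h1>")
        elif self.path == "/data":
            # Placeholder body; the real handler serves results.tsv + log tail.
            self.respond(200, "application/json",
                         json.dumps({"results": []}).encode())
        else:
            self.send_error(404, "Not found")

    def respond(self, code, ctype, body):
        self.send_response(code)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

With this shape, a request for, say, `/train.py` is refused even though the file sits in the server's working directory.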
```markdown
- **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
- **`program.md`** — baseline instructions for one agent. Point your agent here and let it go. **This file is edited and iterated on by the human**.
- **`dashboard.py`** — a dependency-free, zero-config real-time dashboard. Run `python dashboard.py` to stream live experiment data & logs via HTTP.
```
README now lists `dashboard.py`, but the preceding text says "three files that matter", and the dashboard also relies on loading Chart.js from a CDN (so it isn't fully dependency-free/offline). Consider updating this section to reflect the additional file and clarifying that the UI fetches Chart.js over the network unless it is vendored locally.
Running an autonomous agent overnight means having to repeatedly `tail` logs or run Jupyter to see how things are going. This PR adds a 100% dependency-free real-time dashboard powered by the built-in Python `http.server`. It reads `results.tsv` and `run.log` live, meaning it cannot possibly create merge conflicts or break `train.py`.

Features

- A `run.log` tailing block to see if the current experiment is OOMing, right from your browser.

How to jump in

Simply run `python dashboard.py` entirely in parallel with the LLM script, and open `http://localhost:8080`.