Feat: Zero-dependency Real-Time Experiment Dashboard #114

aniruddhaadak80 wants to merge 1 commit into karpathy:master from
Conversation
Pull request overview
Adds a lightweight real-time experiment dashboard intended to run alongside training/agent runs, displaying results.tsv metrics and tailing run.log over a local HTTP server.
Changes:
- Introduces `dashboard.py`, an `http.server`-based UI with a `/data` JSON endpoint and an HTML dashboard using Chart.js via CDN.
- Updates `README.md` to document running the new dashboard.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| dashboard.py | New HTTP dashboard server that serves an HTML page and exposes experiment/log data via /data. |
| README.md | Mentions the new dashboard and how to run it. |
```
@@ -0,0 +1,165 @@
import os
```
File starts with a UTF-8 BOM / non-printing character before `import os`. While Python tolerates this, it often causes issues with linters, diffs, and some tooling; consider removing the BOM so the first character is the `i` in `import`.
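A one-time cleanup for this could look like the sketch below: reading with the `utf-8-sig` codec transparently drops a leading BOM, and writing back with plain `utf-8` emits none. The `strip_bom` helper name is hypothetical, not part of the PR.

```python
def strip_bom(path):
    """Rewrite a file in place without a leading UTF-8 BOM.

    "utf-8-sig" strips the BOM on read if one is present; writing
    back with plain "utf-8" produces a BOM-free file.
    """
    with open(path, "r", encoding="utf-8-sig") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Running this once on `dashboard.py` (and committing the result) would resolve the comment.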
```html
<thead><tr><th>Commit</th><th>BPB</th><th>Memory (MB)</th><th>Status</th><th>Description</th></tr></thead>
<tbody id="tableBody"></tbody>
```
The table is labeled "Memory (MB)", but `results.tsv` is documented as `memory_gb` in `program.md`. This will mislead readers and make comparisons error-prone; either update the UI label to GB or convert the stored value to MB consistently.
```html
<td><code>${r.commit.substring(0,7)}</code></td>
<td>${r.bpb}</td>
<td>${r.memory}</td>
<td><span class="badge ${r.status}">${r.status}</span></td>
<td>${r.description}</td>
```
Values from `results.tsv` (commit, status, description, etc.) are inserted via `innerHTML` without escaping. If the TSV contains `<`, `>`, or `&` (accidentally or maliciously), this becomes an XSS vector in the dashboard; prefer building DOM nodes with `textContent` (or escape the fields) and avoid interpolating raw strings into HTML.
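The `textContent` approach on the client is the more robust fix; as a minimal server-side alternative, the fields could be HTML-escaped before being serialized into the `/data` response. A sketch (the `sanitize_row` helper is an assumption, not code from the PR):

```python
import html

def sanitize_row(row):
    """Escape HTML metacharacters in every string field of a /data row,
    so values later interpolated into innerHTML cannot inject markup."""
    return {k: html.escape(v) if isinstance(v, str) else v
            for k, v in row.items()}
```

Each parsed row would pass through `sanitize_row` before `json.dumps`, turning a hostile `description` like `<script>…</script>` into inert text.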
```python
except Exception:
    pass
```
The `except Exception: pass` here silently drops parse errors and can mask malformed `results.tsv` rows or encoding issues. Prefer catching the specific exceptions you expect (e.g., `FileNotFoundError`, `UnicodeDecodeError`) and logging errors via `self.log_error(...)` so failures are diagnosable.
```python
except:
    pass
```
Bare `except:` will also swallow `KeyboardInterrupt`/`SystemExit` and makes debugging harder. Catch specific exceptions for log reading and log the error (or at least include it in the `/data` response) so the dashboard doesn't fail silently.
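The narrowing suggested by the two comments above might look like this sketch. A standalone `tail_log` helper is assumed here for illustration; inside the real handler, `log_error` would be `self.log_error` (a method `http.server.BaseHTTPRequestHandler` does provide).

```python
def tail_log(path, n=40, log_error=print):
    """Return the last n lines of the log, surfacing errors
    instead of swallowing them with a bare except."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return "".join(f.readlines()[-n:])
    except FileNotFoundError:
        return ""  # no log yet is a normal state, not an error
    except (OSError, UnicodeDecodeError) as e:
        # Narrow catch: real I/O or decoding failures get logged,
        # while KeyboardInterrupt/SystemExit propagate as they should.
        log_error("failed to read %s: %s" % (path, e))
        return ""
```

This keeps the "missing file" case silent (it is expected before the first run) while making genuine failures visible in the server log.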
```python
"bpb": parts[1],
"memory": parts[2],
"status": parts[3].upper(),
"description": parts[4]
```
`memory` is passed through from `results.tsv` as a string and (per `program.md`) is in GB, but the UI implies MB and there is no numeric normalization. Consider parsing it to a float, validating it, and either renaming the key to `memory_gb` or converting units before returning JSON to keep the client logic consistent.
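A minimal sketch of that normalization, under the assumption that the column really is GB as `program.md` documents (`parse_memory_gb` is a hypothetical helper, not code from the PR):

```python
def parse_memory_gb(raw):
    """Parse the memory column from results.tsv as a float in GB.

    Returns None for missing or malformed values so the client can
    render a placeholder instead of choking on a bad row.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```

The `/data` payload could then carry the unit in the key, e.g. `row["memory_gb"] = parse_memory_gb(parts[2])`, and the UI label would match whichever unit the key names.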
```python
if os.path.exists("run.log"):
    with open("run.log", "r", encoding="utf-8") as f:
        log_tail = "".join(f.readlines()[-40:])
```
`f.readlines()` reads the entire `run.log` into memory every 2 seconds just to tail the last 40 lines. For large logs this can become a significant CPU/memory hit and stall responses; consider a true tail implementation (seek from the end in binary mode) or cap the read size.
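A true tail might look like the following sketch: seek to the end of the file and read fixed-size chunks backwards until enough newlines have been seen, so memory use is bounded by the size of the last `n` lines rather than the whole log.

```python
import os

def tail(path, n=40, chunk=8192):
    """Return the last n lines of a file without reading it all."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        data = b""
        # Walk backwards in fixed-size chunks until we have seen
        # more than n newlines (or hit the start of the file).
        while end > 0 and data.count(b"\n") <= n:
            start = max(0, end - chunk)
            f.seek(start)
            data = f.read(end - start) + data
            end = start
        lines = data.splitlines(keepends=True)[-n:]
        return b"".join(lines).decode("utf-8", errors="replace")
```

Opening in binary and decoding with `errors="replace"` also keeps the endpoint alive if a partially written log line contains a broken multi-byte sequence.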
```python
with socketserver.TCPServer(("", PORT), DashboardHandler) as httpd:
    print(f"📊 Dashboard gracefully running at http://localhost:{PORT}")
```
`TCPServer(("", PORT), ...)` binds to all interfaces (`0.0.0.0`) even though the message advertises `localhost`. This can unintentionally expose `run.log` / `results.tsv` to the local network; bind explicitly to `127.0.0.1` by default (and make the host configurable if needed).
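One way to do that is to default the host to loopback and let an environment variable opt into a wider binding. `DASHBOARD_HOST` / `DASHBOARD_PORT` are hypothetical names for this sketch, not flags the PR defines:

```python
import os
import socketserver
from http.server import SimpleHTTPRequestHandler

def make_server(handler=SimpleHTTPRequestHandler):
    """Bind the dashboard to loopback by default so run.log /
    results.tsv are not exposed to the local network.

    DASHBOARD_HOST / DASHBOARD_PORT are assumed env vars for
    opting into a different binding.
    """
    host = os.environ.get("DASHBOARD_HOST", "127.0.0.1")
    port = int(os.environ.get("DASHBOARD_PORT", "8080"))
    return socketserver.TCPServer((host, port), handler)
```

The startup code then becomes `with make_server(DashboardHandler) as httpd: httpd.serve_forever()`, and the printed URL matches the actual binding.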
```python
else:
    super().do_GET()
```
Falling back to `super().do_GET()` means the server will serve arbitrary files from the working directory (and potentially directory listings), which is risky given that this is intended as a log/results viewer. Consider returning 404 for unknown paths, or restrict static serving to an explicit allowlist so only `/` and `/data` are reachable.
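The allowlist variant might be sketched like this. The `/` and `/data` bodies here are placeholders; in the PR's handler they would render the HTML page and the JSON built from `results.tsv` respectively. The key change is the final branch: unknown paths get a 404 instead of falling through to static file serving.

```python
import json
from http.server import SimpleHTTPRequestHandler

class DashboardHandler(SimpleHTTPRequestHandler):
    """Allowlist routing: only / and /data are reachable; every other
    path 404s instead of exposing the working directory."""

    def do_GET(self):
        if self.path == "/":
            # Placeholder body; the real handler serves the dashboard HTML.
            self.respond(200, "text/html", b"<h1>dashboard</h1>")
        elif self.path == "/data":
            # Placeholder body; the real handler serves results.tsv + log tail.
            self.respond(200, "application/json",
                         json.dumps({"results": []}).encode())
        else:
            self.send_error(404, "Not found")

    def respond(self, code, ctype, body):
        self.send_response(code)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

With this shape, a request for, say, `/train.py` is refused even though the file sits in the server's working directory.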
```markdown
- **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
- **`program.md`** — baseline instructions for one agent. Point your agent here and let it go. **This file is edited and iterated on by the human**.
- **`dashboard.py`** — a dependency-free, zero-config real-time dashboard. Run `python dashboard.py` to stream live experiment data & logs via HTTP.
```
README now lists `dashboard.py`, but the preceding text says "three files that matter", and the dashboard also relies on loading Chart.js from a CDN (so it isn't fully dependency-free/offline). Consider updating this section to reflect the additional file and clarifying that the UI fetches Chart.js over the network unless it is vendored locally.
Running an autonomous agent overnight means having to repeatedly `tail` logs or run Jupyter to see how things are going. This PR adds a 100% dependency-free real-time dashboard powered by the built-in Python `http.server`. It reads `results.tsv` and `run.log` live, meaning it cannot possibly create merge conflicts or break `train.py`.

Features

- A `run.log` tailing block to see if the current experiment is OOMing, right from your browser.

How to jump in

Simply run `python dashboard.py` entirely in parallel with the LLM script, and open `http://localhost:8080`.