README.md (1 change: 1 addition, 0 deletions)
@@ -13,6 +13,7 @@ The repo is deliberately kept small and only really has a three files that matte
- **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
- **`program.md`** — baseline instructions for one agent. Point your agent here and let it go. **This file is edited and iterated on by the human**.
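- **`dashboard.py`** — a zero-config real-time dashboard that needs only the Python standard library (the page itself loads Chart.js from a CDN). Run `python dashboard.py` and open http://localhost:8080 to view results from `results.tsv` and the tail of `run.log`, refreshed over HTTP every 2 seconds.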

By design, training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is **val_bpb** (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared.
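A minimal sketch of the val_bpb arithmetic, assuming per-token cross-entropy is summed in nats (natural log, as PyTorch's `cross_entropy` reports it) and that the byte count is the UTF-8 length of the evaluated text. `bits_per_byte` is a hypothetical helper for illustration, not the evaluation code in `prepare.py`:

```python
import math


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert cross-entropy summed over a text (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # (not the token count) is what removes the dependence on vocab size.
    return total_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: 1,000 tokens averaging 2.0 nats each, spanning 4,200 bytes.
print(bits_per_byte(1000 * 2.0, 4200))
```

Because the denominator counts bytes rather than tokens, a tokenizer that packs more characters into each token cannot lower the score simply by making fewer predictions.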

dashboard.py (165 changes: 165 additions, 0 deletions)
@@ -0,0 +1,165 @@
import os
Copilot AI Mar 10, 2026

File starts with a UTF-8 BOM / non-printing character before import os. While Python can tolerate this, it often causes issues with linters, diffs, and some tooling; consider removing the BOM so the file begins directly with import os.
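A minimal sketch of the suggested cleanup, assuming the file is small enough to read into memory; `strip_utf8_bom` is a hypothetical helper, not part of this PR:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM (bytes EF BB BF) in place; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Rewrite the file without the 3-byte marker; the rest is untouched.
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Running `strip_utf8_bom("dashboard.py")` once before committing would remove the marker. Separately, files that may or may not carry a BOM can be read with `encoding="utf-8-sig"`, which drops a leading BOM if present.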

import json
import http.server
import socketserver

PORT = 8080

class DashboardHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/':
            self.send_response(200)
            self.send_header('Content-type', 'text/html; charset=utf-8')
            self.end_headers()
            html = """<!DOCTYPE html>
<html>
<head>
<title>Autoresearch Dashboard</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
body { background: #1e1e1e; color: #fff; font-family: sans-serif; margin: 0; padding: 20px; }
.container { display: flex; flex-direction: column; gap: 20px; max-width: 1200px; margin: 0 auto; }
.panels { display: flex; gap: 20px; }
.panel { background: #2d2d2d; padding: 20px; border-radius: 8px; flex: 1; box-shadow: 0 4px 6px rgba(0,0,0,0.3); }
h2 { margin-top: 0; }
pre { background: #000; padding: 10px; border-radius: 4px; overflow-y: auto; max-height: 400px; color: #0f0; }
table { width: 100%; border-collapse: collapse; }
th, td { text-align: left; padding: 8px; border-bottom: 1px solid #444; }
th { background: #3d3d3d; }
.badge { padding: 4px 8px; border-radius: 4px; font-size: 0.8em; font-weight: bold; }
.KEEP { background: #28a745; }
.DISCARD { background: #dc3545; }
.CRASH { background: #ffc107; color: #000; }
</style>
</head>
<body>
<div class="container">
<h1>🚀 Autoresearch Real-Time Dashboard</h1>
<div class="panels">
<div class="panel" style="flex: 2">
<h2>Validation BPB (Bits Per Byte) History</h2>
<canvas id="bpbChart"></canvas>
</div>
<div class="panel" style="flex: 1">
<h2>Experiments Summary</h2>
<div id="stats">Loading...</div>
</div>
</div>
<div class="panels">
<div class="panel" style="flex: 2">
<h2>Recent Experiments</h2>
<table>
<thead><tr><th>Commit</th><th>BPB</th><th>Memory (MB)</th><th>Status</th><th>Description</th></tr></thead>
<tbody id="tableBody"></tbody>
Comment on lines +52 to +53
Copilot AI Mar 10, 2026

The table is labeled "Memory (MB)", but results.tsv is documented as memory_gb in program.md. This will mislead readers and makes comparisons error-prone; either update the UI label to GB or convert the stored value to MB consistently.
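If the MB label is kept, the conversion could happen once on the server before the JSON is sent. A minimal sketch, assuming (as program.md documents) that results.tsv stores GB and that a factor of 1024 was used to produce it; both the factor and the helper name `memory_gb_to_mb` are assumptions:

```python
import math
from typing import Optional


def memory_gb_to_mb(value: str, factor: int = 1024) -> Optional[float]:
    """Convert a memory_gb string from results.tsv to MB, or None if unusable.

    Assumption: the GB figure was derived with a factor of 1024; pass
    factor=1000 if decimal units were used when the row was written.
    """
    try:
        gb = float(value)
    except (TypeError, ValueError):
        return None  # blank or non-numeric cell
    if not math.isfinite(gb):
        return None  # reject nan/inf
    return round(gb * factor, 1)
```

In the /data handler this would be used as `"memory": memory_gb_to_mb(parts[2])`. Relabeling the header to "Memory (GB)" is the simpler option and needs no code.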

</table>
</div>
<div class="panel" style="flex: 1">
<h2>Live Log (Tail)</h2>
<pre id="logOutput">Waiting for logs...</pre>
</div>
</div>
</div>
<script>
let bpbChart;
async function fetchData() {
try {
const res = await fetch('/data');
const data = await res.json();

const tbody = document.getElementById('tableBody');
tbody.innerHTML = data.results.slice(-10).reverse().map(r => `
<tr>
<td><code>${r.commit.substring(0,7)}</code></td>
<td>${r.bpb}</td>
<td>${r.memory}</td>
<td><span class="badge ${r.status}">${r.status}</span></td>
<td>${r.description}</td>
Comment on lines +72 to +76
Copilot AI Mar 10, 2026

Values from results.tsv (commit, status, description, etc.) are inserted via innerHTML without escaping. If the TSV contains < or & (accidentally or maliciously), this becomes an XSS vector in the dashboard; prefer building DOM nodes with textContent (or escaping the fields) and avoid interpolating raw strings into HTML.
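The comment's preferred fix is on the client (create cells with `textContent`). A different technique, shown here only as a server-side alternative, is to HTML-escape every string field before serialization so the existing `innerHTML` template receives inert text. A minimal sketch using the standard library's `html.escape`; `escape_row` is a hypothetical helper:

```python
import html


def escape_row(row: dict) -> dict:
    """Return a copy of a results row with every string value HTML-escaped.

    html.escape(..., quote=True) converts &, <, > and both quote characters,
    so values are also safe inside attributes such as class="badge ...".
    Non-string values (e.g. parsed floats) are passed through unchanged.
    """
    return {k: html.escape(v, quote=True) if isinstance(v, str) else v for k, v in row.items()}
```

Escaping on both sides would double-encode, so one approach should be chosen: with server-side escaping, any client code that switches to `textContent` would display the entities literally.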

</tr>
`).join('');

const keeps = data.results.filter(r => r.status === 'KEEP').length;
const discards = data.results.filter(r => r.status === 'DISCARD').length;
const crashes = data.results.filter(r => r.status === 'CRASH').length;
document.getElementById('stats').innerHTML = `
<p>Total Runs: <b>${data.results.length}</b></p>
<p>🟢 Successful Keeps: <b>${keeps}</b></p>
<p>🔴 Discarded ideas: <b>${discards}</b></p>
<p>⚠️ Crashes/OOMs: <b>${crashes}</b></p>
`;

const validRuns = data.results.filter(r => r.bpb && !isNaN(r.bpb));
const labels = validRuns.map((r, i) => i + 1);
const bpbValues = validRuns.map(r => parseFloat(r.bpb));

if (!bpbChart) {
const ctx = document.getElementById('bpbChart').getContext('2d');
bpbChart = new Chart(ctx, {
type: 'line',
data: {
labels: labels,
datasets: [{
label: 'Validation BPB',
data: bpbValues,
borderColor: '#007bff',
tension: 0.1,
fill: false
}]
},
options: { animation: false }
});
} else {
bpbChart.data.labels = labels;
bpbChart.data.datasets[0].data = bpbValues;
bpbChart.update();
}

document.getElementById('logOutput').innerText = data.log;

} catch (err) {
console.error(err);
}
}
setInterval(fetchData, 2000);
fetchData();
</script>
</body>
</html>"""
            self.wfile.write(html.encode("utf-8"))
        elif self.path == '/data':
            results = []
            try:
                if os.path.exists("results.tsv"):
                    with open("results.tsv", "r", encoding="utf-8") as f:
                        lines = f.readlines()
                        for line in lines[1:]:
                            parts = line.strip().split("\t")
                            if len(parts) >= 5:
                                results.append({
                                    "commit": parts[0],
                                    "bpb": parts[1],
                                    "memory": parts[2],
                                    "status": parts[3].upper(),
                                    "description": parts[4]
Comment on lines +139 to +142
Copilot AI Mar 10, 2026

memory is passed through from results.tsv as a string and (per program.md) is in GB, but the UI implies MB and there is no numeric normalization. Consider parsing to a float, validating it, and either renaming the key to memory_gb or converting units before returning JSON to keep the client logic consistent.
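A minimal sketch of typed parsing for one results.tsv row, assuming the column order the /data handler already uses (commit, bpb, memory in GB, status, description). `parse_result_row`, `_to_float`, and the `memory_gb` key are hypothetical; unparseable numbers become None (JSON null) instead of dropping the row, so crashed runs still count in the summary:

```python
import math
from typing import Optional


def _to_float(text: str) -> Optional[float]:
    """Parse a finite float, or return None for blanks, garbage, nan and inf."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def parse_result_row(line: str) -> Optional[dict]:
    """Turn one tab-separated results.tsv line into typed fields, or None if too short."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 5:
        return None
    memory_gb = _to_float(parts[2])
    if memory_gb is not None and memory_gb < 0:
        memory_gb = None  # negative memory is treated as invalid
    return {
        "commit": parts[0],
        "bpb": _to_float(parts[1]),  # None for crashed or unparseable runs
        "memory_gb": memory_gb,  # key name makes the unit explicit
        "status": parts[3].strip().upper(),
        "description": parts[4],
    }
```

The client template would then read `r.memory_gb` instead of `r.memory`.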

                                })
            except Exception:
                pass
Comment on lines +144 to +145
Copilot AI Mar 10, 2026

The except Exception: pass here silently drops parse errors and can mask malformed results.tsv rows or encoding issues. Prefer catching the specific exceptions you expect (e.g., FileNotFoundError, UnicodeDecodeError) and logging errors via self.log_error(...) so failures are diagnosable.
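A minimal sketch of the narrower error handling, assuming the handler passes in its own `self.log_error` (which takes a printf-style format string and arguments). `load_results` is a hypothetical helper returning raw column lists; a missing file is treated as "no runs yet" rather than as an error:

```python
from typing import Callable, List


def load_results(path: str, log_error: Callable[..., None]) -> List[list]:
    """Read results.tsv rows (header skipped), reporting malformed rows and I/O errors."""
    rows: List[list] = []
    try:
        with open(path, "r", encoding="utf-8") as f:
            next(f, None)  # skip the header row
            for lineno, line in enumerate(f, start=2):
                if not line.strip():
                    continue  # ignore blank lines
                parts = line.rstrip("\r\n").split("\t")
                if len(parts) < 5:
                    log_error("%s:%d: expected 5 columns, got %d", path, lineno, len(parts))
                    continue
                rows.append(parts)
    except FileNotFoundError:
        pass  # no experiments recorded yet; not an error
    except (OSError, UnicodeDecodeError) as exc:
        log_error("could not read %s: %r", path, exc)
    return rows
```

Inside `do_GET` this would be called as `load_results("results.tsv", self.log_error)`, so problems appear on the server's stderr instead of disappearing.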


            log_tail = ""
            try:
                if os.path.exists("run.log"):
                    with open("run.log", "r", encoding="utf-8") as f:
                        log_tail = "".join(f.readlines()[-40:])
Comment on lines +149 to +151
Copilot AI Mar 10, 2026

f.readlines() reads the entire run.log into memory every 2 seconds just to tail the last 40 lines. For large logs this can become a significant CPU/memory hit and stall responses; consider a true tail implementation (seek from end in binary mode) or cap read size.
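A minimal sketch of a true tail that seeks backwards from the end in binary mode, so each poll reads roughly the last few kilobytes rather than the whole log. `tail_lines` and its `block_size` parameter are hypothetical names:

```python
import os


def tail_lines(path: str, n: int = 40, block_size: int = 8192) -> str:
    """Return the last n lines of a file without reading the whole file."""
    if n <= 0:
        return ""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        chunks = []
        newlines = 0
        # Read fixed-size blocks backwards until we have seen more than n
        # newlines (enough to contain n complete lines) or hit the start.
        while pos > 0 and newlines <= n:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            chunk = f.read(read_size)
            chunks.append(chunk)
            newlines += chunk.count(b"\n")
    data = b"".join(reversed(chunks))
    # The first element may be a partial line; taking the last n drops it.
    lines = data.splitlines(keepends=True)
    return b"".join(lines[-n:]).decode("utf-8", errors="replace")
```

The handler would then use `log_tail = tail_lines("run.log", 40)`. Decoding with `errors="replace"` means stray invalid bytes in the log cannot make the /data endpoint fail.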

            except:
                pass
Comment on lines +152 to +153
Copilot AI Mar 10, 2026

Bare except: will also swallow KeyboardInterrupt/SystemExit and makes debugging harder. Catch specific exceptions for log reading and log the error (or at least include it in the /data response) so the dashboard doesn't fail silently.
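A minimal sketch that catches only the expected exceptions and returns the error text so it can be sent in the /data response instead of being swallowed. `read_log_tail` and the `log_error` JSON key mentioned below are hypothetical:

```python
from collections import deque
from typing import Optional, Tuple


def read_log_tail(path: str, n: int = 40) -> Tuple[str, Optional[str]]:
    """Return (tail_text, error_message); error_message is None on success.

    Only OSError (FileNotFoundError is a subclass) and UnicodeDecodeError are
    caught, so KeyboardInterrupt and SystemExit still propagate.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            # deque with maxlen keeps only the last n lines in memory.
            return "".join(deque(f, maxlen=n)), None
    except FileNotFoundError:
        return "", None  # no run has started yet
    except (OSError, UnicodeDecodeError) as exc:
        return "", f"could not read {path}: {exc}"
```

The handler could then send `{"results": results, "log": log_tail, "log_error": log_error}`. `deque(f, maxlen=n)` bounds memory but still scans the file; a seek-from-end tail, as suggested in the earlier comment, avoids the scan.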


            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"results": results, "log": log_tail}).encode("utf-8"))
        else:
            super().do_GET()
Comment on lines +159 to +160
Copilot AI Mar 10, 2026

Falling back to super().do_GET() means the server will serve arbitrary files from the working directory (and potentially directory listings), which is risky given this is intended as a log/results viewer. Consider returning 404 for unknown paths, or restrict static serving to an explicit allowlist so only / and /data are reachable.
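A minimal sketch of the allowlist approach: subclassing `http.server.BaseHTTPRequestHandler` instead of `SimpleHTTPRequestHandler` removes the file-serving fallback entirely. `AllowlistHandler` is hypothetical, and the page body and JSON payload are placeholders standing in for the real HTML string and /data logic:

```python
import http.server
import json


class AllowlistHandler(http.server.BaseHTTPRequestHandler):
    """Serve only / and /data; every other path gets a 404."""

    def _send(self, body: bytes, content_type: str) -> None:
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        path = self.path.split("?", 1)[0]  # ignore any query string
        if path == "/":
            self._send(b"<!DOCTYPE html><p>dashboard page here</p>", "text/html; charset=utf-8")
        elif path == "/data":
            payload = {"results": [], "log": ""}  # placeholder payload
            self._send(json.dumps(payload).encode("utf-8"), "application/json")
        else:
            # No fallback: files such as results.tsv or run.log are never served directly.
            self.send_error(404, "Not Found")
```

Because the base class has no static-file logic, adding a route later is an explicit decision rather than a side effect of the working directory's contents.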


if __name__ == '__main__':
    with socketserver.TCPServer(("", PORT), DashboardHandler) as httpd:
        print(f"📊 Dashboard running at http://localhost:{PORT}")
Comment on lines +163 to +164
Copilot AI Mar 10, 2026

TCPServer(("", PORT), ...) binds to all interfaces (0.0.0.0) even though the message advertises localhost. This can unintentionally expose run.log / results.tsv to the local network; bind explicitly to 127.0.0.1 by default (and make host configurable if needed).
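A minimal sketch of a loopback-by-default bind with an explicit opt-in. The `DASHBOARD_HOST` / `DASHBOARD_PORT` environment variables, `ReusableTCPServer`, and `make_server` are hypothetical names, not part of this PR; setting `allow_reuse_address` is an optional extra that avoids "Address already in use" on quick restarts:

```python
import http.server
import os
import socketserver

# Hypothetical environment variables; neither exists in the PR.
HOST = os.environ.get("DASHBOARD_HOST", "127.0.0.1")  # loopback only by default
PORT = int(os.environ.get("DASHBOARD_PORT", "8080"))


class ReusableTCPServer(socketserver.TCPServer):
    allow_reuse_address = True  # sets SO_REUSEADDR before binding


def make_server(handler_cls, host: str = HOST, port: int = PORT) -> socketserver.TCPServer:
    """Create a TCP server bound to host:port (127.0.0.1 unless overridden)."""
    return ReusableTCPServer((host, port), handler_cls)
```

In the `__main__` block this would become `with make_server(DashboardHandler) as httpd:`, with the startup message printing `HOST` instead of a hard-coded localhost. Exposing the dashboard on the network then requires setting `DASHBOARD_HOST=0.0.0.0` deliberately.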

        httpd.serve_forever()