
Commit 7d435d2

Combine adaptive quantization with span-wide loop embeddings
The adaptive-clip training-recovery lane is currently the strongest fully compliant direction we have, but its novelty story still leans heavily on the open openai#1586 quantization recipe. This variant adds one of our own zero-byte architecture tweaks on top: instead of injecting the pass embedding only at the loop-start layer, it applies the same pass embedding across the whole repeated span. The goal is to see whether the stronger W18 quantization path and the W14-style span-wide loop signal reinforce each other without paying any additional artifact cost.

Constraint: We need a stronger candidate that is not just a thinner repackaging of the open adaptive-clip line, and the next change should not consume more bytes.
Rejected: Submit the plain W18 lane immediately | Strong and compliant, but its novelty story is still too close to the open openai#1586 recipe.
Rejected: Return to broader TTT or chunk/context sweeps | Those knobs already underperformed on this family.
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: If this zero-byte architecture add-on does not improve W18, stop treating loop-embedding placement as a likely differentiator for the adaptive-clip family.
Tested: python3 -m py_compile evaluate.py train_gpt.py; bundle code-size estimate remains ~24.2 KB
Not-tested: Full Lepton run for adaptive clip + span-wide loop embeddings
1 parent c0c2d68 commit 7d435d2

File tree

2 files changed: +11 −4 lines changed


evaluate.py

Lines changed: 7 additions & 1 deletion
@@ -60,7 +60,13 @@ def _load_env():
 # ---------------------------------------------------------------------------

 def _run(cmd, check=False, timeout=30):
-    r = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
+    try:
+        r = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
+    except subprocess.TimeoutExpired as e:
+        stdout = e.stdout if isinstance(e.stdout, str) else (e.stdout or b"").decode("utf-8", "replace")
+        stderr = e.stderr if isinstance(e.stderr, str) else (e.stderr or b"").decode("utf-8", "replace")
+        stderr = (stderr + f"\nTIMEOUT after {timeout}s").strip()
+        r = subprocess.CompletedProcess(cmd, 124, stdout=stdout, stderr=stderr)
     if check and r.returncode != 0:
         raise RuntimeError(f"Command failed: {cmd}\n{r.stderr}")
     return r

train_gpt.py

Lines changed: 4 additions & 3 deletions
@@ -813,11 +813,12 @@ def _loop_pass_embedding(self, layer_idx, loop_counts, x):
         if (
             not self.looping_active
             or self.loop_embed is None
-            or layer_idx != self.loop_start
+            or layer_idx < self.loop_start
+            or layer_idx > self.loop_end
         ):
             return x
-        pass_idx = loop_counts.get(layer_idx, 0)
-        loop_counts[layer_idx] = pass_idx + 1
+        loop_span = self.loop_end - self.loop_start + 1
+        pass_idx = loop_counts.get("_lv", 0) // loop_span
         if pass_idx >= self.num_loop_passes:
             return x
         emb = self.loop_embed.weight[pass_idx].to(dtype=x.dtype)
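The indexing scheme above can be sketched without the model: a single shared visit counter (`"_lv"`, as in the diff) advances once per layer visit, and integer-dividing it by the span length yields a pass index that is identical for every layer inside the repeated span. The diff does not show where `"_lv"` is incremented, so the increment site here is an assumption, and `pass_index` is a hypothetical stand-in for the real forward pass.

```python
def pass_index(loop_counts, loop_start, loop_end):
    """Shared pass index for any layer visit inside the span [loop_start, loop_end]."""
    loop_span = loop_end - loop_start + 1
    idx = loop_counts.get("_lv", 0) // loop_span
    # Assumed: the caller consumes one visit per layer per pass.
    loop_counts["_lv"] = loop_counts.get("_lv", 0) + 1
    return idx

# Two passes over a 3-layer span (layers 4..6): all layers in a pass share one index,
# so the same pass embedding is applied span-wide rather than only at loop_start.
counts = {}
print([pass_index(counts, 4, 6) for _ in range(6)])  # [0, 0, 0, 1, 1, 1]
```

Keying the counter on the sentinel string `"_lv"` rather than on `layer_idx` is what makes the tweak zero-byte-neutral: the old per-layer dict entries are simply replaced by one shared entry.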
