Merged
12 changes: 12 additions & 0 deletions .gitignore
@@ -41,6 +41,18 @@ lightning_logs/
/notebooks/logs
/artifacts

# Claude Code workspace
.claude/scheduled_tasks.lock
.claude/worktrees/

# Local evaluation outputs — the eval *.py scripts under logs/ are tracked
# for reproducibility, but their generated artefacts (raw logs, JSON, PNGs)
# are not.
logs/*.log
logs/*.json
logs/*.png
logs/__pycache__/

# IDE
.idea/
.vscode/
8 changes: 8 additions & 0 deletions ARCHITECTURE.md
@@ -259,6 +259,14 @@ User-facing knobs (all on `[0, 1]` where applicable):
- `allowed_elements` — hard whitelist over element symbols.
- `element_step_scale` — per-element gradient scaling; `0` hard-locks an element to its seed value.
- `class_target_weight` — weight on the classification objective vs. the regression targets.
- `max_elements` — cardinality cap "at most K elements per recipe", enforced by a
differentiable Plötz–Roth iterative soft top-K mask + final hard projection.
- `annealing_scale` ∈ `[0, 1]` (default 0.5) — single-knob softness of the K-hot annealing
schedule; maps to `τ_start = 25**scale`. Advanced override via the `annealing_schedule` dict.
- `fixed_amounts` — pin specific elements at user-given absolute amounts (e.g.
`{"Au": 0.65}`); reuses the lock-paste path, no `initial_weights` required.
- `min_nonzero_weight` — drop unlocked positions below this floor (per-row safe fallback
keeps the simplex valid).
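
As a rough illustration of the `annealing_scale` mapping above, the following sketch computes a geometric temperature schedule that starts at `τ_start = 25**scale`. The function name `tau_schedule`, the `tau_end = 0.01` default (taken from the algorithm notes elsewhere in this PR), and the exact interpolation formula are assumptions for illustration, not the repository's implementation.

```python
def tau_schedule(step, steps, annealing_scale=0.5, tau_end=0.01):
    """Hypothetical geometric temperature schedule for the K-hot mask.

    step            -- current optimisation step, 0-based
    steps           -- total number of optimisation steps
    annealing_scale -- softness knob in [0, 1]; maps to tau_start = 25**scale
    tau_end         -- final temperature (assumed 0.01)
    """
    if not 0.0 <= annealing_scale <= 1.0:
        raise ValueError("annealing_scale must lie in [0, 1]")
    tau_start = 25.0 ** annealing_scale
    if steps <= 1:
        return tau_end
    frac = step / (steps - 1)  # 0.0 at the first step, 1.0 at the last
    # Geometric interpolation: multiply by a constant ratio at every step.
    return tau_start * (tau_end / tau_start) ** frac


first = tau_schedule(0, 300)    # start of the schedule
last = tau_schedule(299, 300)   # end of the schedule
```

With the default `annealing_scale = 0.5` this schedule starts at τ = 5; a scale of `0` starts at τ = 1 and a scale of `1` starts at τ = 25.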

```mermaid
graph TD
```
14 changes: 14 additions & 0 deletions README.md
@@ -329,6 +329,20 @@ entry points on the model:
| `optimize_latent(optimize_space="latent")` | the latent $h$ | no — needs AE decode | `ae_align_scale ∈ [0, 1]` (default 0.5; pulls $h$ onto the AE manifold) |
| `optimize_composition` | element-weight logits $\theta$, with $w = \text{softmax}(\theta)$ | yes — $w$ is the recipe | `diversity_scale ∈ [0, 1]` (default 1.0; per-output entropy penalty) |

`optimize_composition` further accepts an orthogonal constraint surface (full docstrings on
the method; design notes in
[docs/inverse_design_extension_notes.md](docs/inverse_design_extension_notes.md)):

- `max_elements: int` — cardinality cap (at most K non-zero elements per recipe), enforced
through a differentiable iterative-softmax K-hot mask with a single `annealing_scale ∈ [0, 1]`
softness knob (default 0.5 = the calibrated safe choice).
- `fixed_amounts: {symbol: float}` — pin specific elements at user-given absolute amounts
(e.g. `{"Au": 0.65, "Ga": 0.20}`); the optimiser distributes the remaining mass freely.
- `min_nonzero_weight: float` — reject trace-amount appearances (e.g. drop anything below
10 %), with a safe fallback so the simplex invariant is always preserved.

All three compose orthogonally with each other and with `allowed_elements` / `element_step_scale`.
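
To make the `fixed_amounts` behaviour concrete, here is a minimal sketch of the lock-paste idea: pinned elements keep their user-given amounts and the remaining mass is shared among the free elements in proportion to their current weights. The helper `apply_fixed_amounts` and its dict-based interface are hypothetical and stand in for whatever tensor-level code the model actually uses.

```python
def apply_fixed_amounts(weights, fixed_amounts):
    """Hypothetical lock-paste step on a {symbol: weight} recipe.

    Pinned symbols receive exactly their requested amounts; all other
    symbols are rescaled so that the whole recipe sums to 1.
    """
    pinned_mass = sum(fixed_amounts.values())
    if not 0.0 <= pinned_mass < 1.0:
        raise ValueError("fixed amounts must sum to less than 1")
    free = {s: w for s, w in weights.items() if s not in fixed_amounts}
    free_total = sum(free.values())
    if free_total <= 0.0:
        raise ValueError("no free mass to distribute")
    free_mass = 1.0 - pinned_mass
    recipe = dict(fixed_amounts)  # paste the pinned values first
    for symbol, w in free.items():
        recipe[symbol] = free_mass * w / free_total  # share the remainder
    return recipe


recipe = apply_fixed_amounts(
    {"Au": 0.3, "Ga": 0.3, "Gd": 0.3, "Tb": 0.1},
    {"Au": 0.65, "Ga": 0.20},
)
```

In this example the pinned pair takes 0.85 of the mass, so `Gd` and `Tb` split the remaining 0.15 in their original 3:1 ratio.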

Both methods share the same regression-MSE + classification-cross-entropy backbone; only the
third loss term and the optimisation variable differ. **Reference:**
[docs/inverse_design_algorithms.md](docs/inverse_design_algorithms.md).
10 changes: 9 additions & 1 deletion docs/inverse_design_algorithms.md
@@ -88,6 +88,9 @@ with
| **`allowed_elements` whitelist** | Masks the logits of disallowed elements to $-\infty$ before every softmax step. | Restrict the search to physically realisable elements (e.g. `ALLOY_PALETTE`, 41 symbols), suppressing model biases toward Pu / F / Cs / etc. |
| **`element_step_scale` soft-freeze / hard-lock** | Soft: multiply the element's logit gradient by the scale before each Adam step. Hard (value = 0): rewrite the softmax output to paste seed values back at locked positions and renormalise unlocked positions over the remaining mass. | Let the user pin certain elements to their seed values ("keep the Au-Ga-RE skeleton; you may only change the rare-earth ratios"). |
| **`seed_blend` mixture** | $w_0 \leftarrow \text{seed\_blend} \cdot \text{seed} + (1 - \text{seed\_blend}) \cdot \text{uniform}_{\text{allowed}}$ | Don't start from a 100 % seed (5 % uniform mass lifts every allowed element's logit from $\log(10^{-12}) \approx -27.6$ to $\log(0.05 / \lvert\text{allowed}\rvert) \approx -7.6$, so Adam can introduce new elements within a few hundred steps — this is the **element-discovery** mechanism). |
| **`max_elements` cardinality cap** | Plötz–Roth iterative-softmax K-hot mask $m \in [0, 1]^n$ with $\sum_i m_i = K$, multiplied with `softmax(θ)` and renormalised; temperature $\tau$ annealed from $\tau_\text{start} = 25^{\text{annealing\_scale}}$ down to $\tau_\text{end} = 0.01$ (geometric by default). A hard top-K projection at the end guarantees exactly K-hot (subject to floor below). | Restrict recipes to **at most K non-zero elements** (e.g. "I want a 3-element alloy"). The annealing doubles as a continuation method — the soft τ early on lets the optimiser explore different K-subsets before committing. |
| **`fixed_amounts` user-pin** | Build $\text{locked\_w}_0$ with user-specified values at the named positions, zero elsewhere; reuse the existing lock-paste machinery (no `initial_weights` required since values are given directly). | Pin specific elements at user-given absolute amounts (e.g. `{"Au": 0.65, "Ga": 0.20}` — the optimiser distributes the remaining 0.15 mass across other allowed elements). |
| **`min_nonzero_weight` floor** | After lock-paste, zero unlocked positions with $0 < w < \text{floor}$ and renormalise the unlocked portion to fit the free mass; safe-fallback when dropping would empty a row (leave that row unfloored). | Reject trace-amount appearances (e.g. `Pt = 0.5 %`) that are not synthesisable — "if you use it, use ≥ 10 %". |
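
The `max_elements` row can be illustrated with a small pure-Python sketch of an iterative-softmax soft top-K mask. It assumes the common form of the relaxation, in which each of the K rounds takes a temperature-scaled softmax and then adds `log(1 - w)` to the logits to suppress entries that were already selected. The function names, the `eps` clamp, and the list-based interface are assumptions; the final hard top-K projection is not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_topk_mask(logits, k, tau, eps=1e-12):
    """Sum of k successive softmaxes; the entries always sum to k."""
    alpha = list(logits)
    mask = [0.0] * len(logits)
    for _ in range(k):
        w = softmax([a / tau for a in alpha])
        mask = [m + wi for m, wi in zip(mask, w)]
        # Suppress what was just picked; clamp to avoid log(0).
        alpha = [a + math.log(max(1.0 - wi, eps)) for a, wi in zip(alpha, w)]
    return mask


def capped_weights(theta, k, tau):
    """Multiply softmax(theta) by the soft mask and renormalise."""
    w = softmax(theta)
    m = soft_topk_mask(theta, k, tau)
    masked = [wi * mi for wi, mi in zip(w, m)]
    total = sum(masked)
    return [x / total for x in masked]


theta = [2.0, 1.0, 0.0, -1.0]
soft = soft_topk_mask(theta, k=2, tau=5.0)    # early, exploratory mask
sharp = soft_topk_mask(theta, k=2, tau=0.01)  # late, nearly 2-hot mask
w = capped_weights(theta, k=2, tau=0.01)
```

At high temperature the mask spreads its mass of K over many elements, which is what lets the optimiser try different subsets. At low temperature it concentrates on the K largest logits, so the renormalised weights are effectively supported on those K elements.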

### What each loss term is for

@@ -106,6 +109,11 @@ with
| `seed_blend` | $[0, 1]$ | 0.95 | Fraction of seed kept at the start (the rest is uniform, so new elements can enter). |
| `allowed_elements` | symbol list or `"all"` | `"all"` | Element whitelist (hard constraint). |
| `element_step_scale` | float or `{symbol: float}` | 1.0 | Per-element step scaling; `0` = hard-lock to the seed value. |
| `max_elements` | `int` ∈ $[1, n]$ or `None` | `None` | Cardinality cap — at most K non-zero elements (differentiable soft top-K + final hard projection). |
| `annealing_scale` | $[0, 1]$ | 0.5 | Single-knob softness for the K-hot schedule; maps to $\tau_\text{start} = 25^{\text{scale}}$. |
| `annealing_schedule` | dict or `None` | `None` | Advanced piecewise override of the annealing schedule. |
| `fixed_amounts` | `{symbol: float}` or `None` | `None` | Pin elements at user-specified amounts (e.g. `{"Au": 0.65}`); the pinned amounts must sum to less than 1. |
| `min_nonzero_weight` | $[0, 1]$ | 0.0 | Drop unlocked positions below this floor (and re-distribute mass). |
| `steps`, `lr` | — | 300, 0.05 | Adam optimisation budget over the logits. |
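
The `min_nonzero_weight` floor and its safe fallback can be sketched as follows. The helper `apply_floor` and its dict-based interface are hypothetical; the sketch only mirrors the behaviour described in the table (zero small unlocked weights, renormalise the unlocked part to the same free mass, and leave the row untouched if nothing would survive).

```python
def apply_floor(weights, locked, floor):
    """Hypothetical floor step on a {symbol: weight} recipe.

    weights -- recipe that already sums to 1
    locked  -- set of symbols whose weights must not change
    floor   -- minimum allowed non-zero weight for unlocked symbols
    """
    unlocked = {s: w for s, w in weights.items() if s not in locked}
    free_mass = sum(unlocked.values())
    kept = {s: w for s, w in unlocked.items() if w >= floor}
    if not kept:
        # Safe fallback: dropping would empty the row, so leave it unfloored.
        return dict(weights)
    kept_total = sum(kept.values())
    result = {}
    for symbol, w in weights.items():
        if symbol in locked:
            result[symbol] = w  # locked positions are untouched
        elif symbol in kept:
            result[symbol] = free_mass * w / kept_total  # absorb dropped mass
        else:
            result[symbol] = 0.0  # below the floor: dropped
    return result


floored = apply_floor(
    {"Au": 0.65, "Gd": 0.2, "Tb": 0.145, "Pt": 0.005},
    locked={"Au"},
    floor=0.10,
)
```

Because the surviving weights only grow when they absorb the dropped mass, every kept element stays at or above the floor after renormalisation.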

---
@@ -118,7 +126,7 @@ with
| **Where the reported recipe comes from** | $w_{\text{report}}$ inferred from $D(h)$ (an extra AE-decode step) | $w$ itself is the report |
| **Method-specific loss term** | $\alpha \cdot \lVert h - \tanh(E(D(h))) \rVert^2$ (keeps $h$ on the AE manifold) | $(1 - d) \cdot H(w)$ (controls per-solution peakiness) |
| **Failure mode** | $\alpha = 0$: $h$ drifts off the manifold, decoded recipe unphysical (QC 0.97 → 0.35). | `seed_blend = 1.0`: the seed's support set is frozen — no new elements can ever appear. |
| **Method-specific knobs** | `ae_align_scale` | `diversity_scale`, `seed_blend`, `allowed_elements`, `element_step_scale` |
| **Method-specific knobs** | `ae_align_scale` | `diversity_scale`, `seed_blend`, `allowed_elements`, `element_step_scale`, `max_elements` + `annealing_scale` / `annealing_schedule`, `fixed_amounts`, `min_nonzero_weight` |

The shared backbone — (1) regression MSE + (2) classification cross-entropy — is **identical**
between the two methods. They differ *only* in the third loss term and in which variable is