Transition metal oxidation state proposed fixes#35
Draft
Zhaoli2042 wants to merge 17 commits into
Draft
Conversation
The classy-yarp refactor set el_valence, el_n_deficient, el_n_expand_octet, el_expand_octet entries for Y, Zr, Nb, Mo, Tc, Ag, Cd, La, Hf, Ta, W, Re, Os, Pt, Hg to None, and dropped el_pol entries for the 6th-period block (cs, ba, la, hf, ta, w, re, os, ir, pt, au, hg, tl, pb, bi, po, at, rn). These cause KeyErrors / TypeErrors when running yarpecule on TM-containing species. This restores the old-YARP values so the patched-new-YARP can process the GoldDIGR transition-metal subset (181k reactions). Local commit; not pushed upstream.
Adds 4 patches missing from the new YARP master vs the old-YARP patch (GH commit fed9385): A. sys.setrecursionlimit(5000) -> 100000 (lewis_structure.py:197) Deep Lewis searches on TM-containing species hit the 5000 ceiling. B. 2nd gen_all_lstructs call: min_opt=True -> min_opt=False, min_win=0.5 (lewis_structure.py:295) The greedy second-pass descent gets stuck on high-OS local minima for TM-containing species. min_win=0.5 lets the search jump up to 0.5 above current best and find lower-score (chemically sensible) BEMs. C. Remove outer 'for ind in range(0, len(bond_mats))' loop in gen_all_lstructs (find_lewis.py:289). The body of 'for j' now runs once per call against bond_mats[ind] (the function parameter). Avoids re-walking over already-visited starting BEMs at each level. D. Remove move 4-bis (radical-radical bond formation) from valid_moves (find_lewis.py:446-450). This move was eliminated in the old-YARP patch as it generated spurious neutral-coupled bonds for species with multiple unpaired electrons (relevant to TM clusters). Motivation: a 30-archive sanity comparison of new-patched-YARP against the published slim transition_metal_oxidation_states.csv (from old patched YARP) showed 11/30 (37%) of archives differed, with new-YARP returning chemically implausible values (W+6, Co+7, Mn+5/+7 etc.) for TM species the old patched YARP handled correctly. Diagnosis traced the divergence to these four missing changes.
NEW YARP master flipped the sign of the radical-environment weight in bmat_score: w_rad went from +0.1 (penalize radicals on polarizable atoms) to -0.01 (reward them). Bisection over the 144-archive stratified TM sample shows new YARP biases toward HIGHER OS than old YARP in 9/10 metals (Fe, Mn, Re, Pt, Rh, Ir all 100% upward). High-OS BEMs leave unpaired electrons (radicals) on ligand atoms that are now rewarded under the new -0.01 weight, plausibly explaining the upward bias. This commit reverts w_rad default to +0.1 in both bmat_score() and lewis_struct._gen_bond_el_mat() (used in pass 3, the final ranking after adjust_metals). Passes 1 and 2 already set w_rad=0 explicitly so the search loop is unaffected; only the final ranking changes.
Generalizes patch E. The seed re-pool step before adjust_metals is:
- RESTORED when no transition metals are present in the molecule
(preserves NEW YARP master behavior for organic-only species)
- DISABLED when transition metals are present
(prevents non-aromatic pass-1 seeds from sneaking through
adjust_metals as high-OS Z-bond ligand configurations)
This is the long-term-correct form of patch E. Empirically validated:
- Patch E broke 3 organic pytest cases (test_diazomethane_xyz,
test_benzothiazole_smi, test_ester_xyz) where the re-pool was
needed to anchor the mats_thresh trim
- Patch F should pass all organic tests while keeping the 38/55
TM oxidation-state fixes from patch E
el_metals is imported from yarp.util.properties (already maintained
in the patch-baseline restoration of 5d/4d TM property values).
Bisection on the 144-archive TM stratified sample revealed patch B has a net-NEGATIVE effect on TM oxidation states: 1 archive matched slim after B alone, 3 new archives diverged. ABCD (full 4-patch bundle) had 1 more regression than only-C alone, attributable to B. Bisection on YARP 3.0's pytest also showed patch B breaks 3 organic test cases: test_diazomethane_xyz, test_ester_xyz, test_benzothiazole_smi. Master / only-A / only-C / only-D all pass; only patch B fails. GH commit fed9385 silently bundled patch B with the deliberate TM-OS fixes (adjust_metals 3 GUARDs + 5d el_metals). It was a carry-along algorithmic change from local development that was never validated. Removing it improves both organic and TM behavior. Final patch stack for new YARP: master baseline: - 3 GUARDs in adjust_metals + 5d el_metals (already in new master) - N_score=100 (already in new master) - restore-old-values properties.py (5d/4d TM dicts + el_pol) zhao-final-20260619: - A: recursion limit 5000 -> 100000 (safety net for huge molecules) - C: remove outer for-ind loop in gen_all_lstructs (10x search speedup) - D: remove move 4-bis (radical-radical, dead code with unused k loop) - w_rad: +0.1 vs new master's -0.01 (no-op for chemistry; speed-only) - F: conditional seed-BEM re-pool (organic restored, TM disabled)
Two markdown docs explaining the investigation, the patches, and the
recommended upstream changes:
zhao-patches-doc/YARP-3.0-OS-divergence-investigation.md
Full technical writeup. ~7 pages. Covers the bisection
methodology, every patch with bug/fix/evidence, the conceptual
explanation of why new YARP biases TM oxidation states upward,
the recommended upstream PRs, and the residual 13% breakdown.
zhao-patches-doc/YARP-3.0-OS-divergence-summary.md
One-page executive summary for non-YARP audiences (PI / collaborators).
Three plain-English bugs, impact on GoldDIGR, action items.
Lives on the zhao-final-20260619 branch alongside the patches it
documents; master stays a clean mirror of upstream classy-yarp.
Full 181,450-archive mass-gen completed 2026-06-20 across 4 SLURM phases.
Headline numbers:
- 10 YARP errors / 181,450 (0.01%)
- 80.25% full-archive agreement vs published slim CSV
- Per-metal up/down splits roughly symmetric at corpus scale
(Pd 4651/4575, Mo 1355/1373, Cu 1745/2282)
- 0 chemically impossible OS values (no Pt(VII), Cr(VI), Ir(V), Co(VII))
The full-corpus number (80%) is lower than the stratified-sample number
(89%) because the stratified sample was deliberately weighted toward
hard rare metals where our patches helped most. At corpus scale the
dominant late-TM organometallic chemistry (Pd/Rh/Fe/Ni/Cu) is more like
two algorithms making different but defensible BEM choices than a
systematic bug pattern.
Investigation doc: section 7 rewritten with stratified + corpus tables
and the symmetric-split observation.
Summary doc: numbers table extended with full-corpus row, the
'remaining ~13%' bullet replaced with corpus-wide
framing.
Of 181,450 archives, 10 raised LewisStructureError or ValueError under
the FINAL stack. Cluster-bonding diagnosis:
- 8 of 10 are metallaboranes (M-Bn, n=4-19) where Wade/Mingos
polyhedral cluster bonding requires 3c-2e formalism that YARP's
2c-2e Lewis search cannot express
- 1 is a Pd3 sulfide cluster (3c-2e bridging chalcogenides)
- 1 is a Ru-Pb11 main-group cluster
4 of the 10 errors come from one paper (c9nj06335h1) studying
analogous M-B16 series for M = Cr/Mn/Co/Ni. This is academic
exploration of one anomalous bonding class, not 4 separate bugs.
Often only one of (reactant, product) fails -- the pipeline emits
'ERR:' on the failing side and OS values on the working side, so
downstream consumers can detect and skip them.
At 10/181450 = 0.0055%, this is well below the aggregate noise floor
for figures derived from the OS distribution.
Recommendation to YARP team: distinguish 'cluster molecule' from
'malformed input' as a return value so downstream tools can branch
cleanly. No algorithmic fix realistic without a separate
cluster-aware YARP mode.
Headline finding from corpus-wide spot-check of the 35,843 disagreement
archives (59,769 per-atom OS diffs):
- 94% of per-atom diffs are |delta| <= 2 (Lewis-choice noise; both
values chemically defensible)
- mean signed delta = +0.155 OS units (slight upward drift, not
systematic bug)
- mean |delta| = 1.52 OS units
The single most important corpus-level finding:
Atom-level OS values exceeding group maximum (Pd > 4, Fe > 6, Cu > 3,
etc.) DROPPED from 53,980 in the old slim CSV to 11,515 in the new
FINAL CSV -- a 78% reduction in chemically impossible OS reports.
Per-metal:
Ag -94% Au -97% Rh -87% Pd -76% Cu -75% Fe -74% ...
True archive-level regressions (1,162) almost exactly balance new
fixes (1,124). The patched new YARP is strictly more chemically
conservative than the old YARP that produced the published dial-plot.
Implication: regenerating Figure 2 from the new FINAL CSV would
IMPROVE the published plot's over-group-max tail bins, not degrade
them. Updated section 7.5 'Bottom line' bullets and added a new
section 9 'Spot-check' with magnitude distribution and examples.
Summary doc also updated to reflect the corpus-wide headline.
Earlier draft compared the full slim CSV (462k rows, all charge/mult variants) against the FINAL CSV (181k deduped picks). That inflated the OLD over-max counts by including dup rows the published dial-plot never used. Apples-to-apples comparison on the same deduped 181k archive set: OLD slim over-max: 11,477 atom-level cases NEW FINAL over-max: 11,515 atom-level cases net delta: +38 (+0.3%) -- essentially flat Real wins are concentrated on heavy noble metals: Au: 161 -> 91 (-43%) Ag: 300 -> 212 (-29%) Modest losses on: Rh: 985 -> 1115 (+130) Co: 1133 -> 1198 (+65) Ni: 1139 -> 1188 (+49) What the patches DO eliminate are the catastrophic single-archive failures shown on the stratified sample (Cp-Cr -> Cr(VI) clusters, Pt(0) -> Pt(VII), Cp-Ir -> Ir(V), Co(VII)). Those don't show up in the corpus-wide over-max count because they're rare absolute contributors to a ~11k baseline of corpus-wide over-max events. Both summary and investigation docs updated to reflect this. Section 9.1's table now shows the apples-to-apples comparison; the bottom line bullets are toned down accordingly.
Three figures showing OLD slim CSV vs NEW FINAL CSV at corpus scale,
restricted to the apples-to-apples deduplicated 181,450-archive set:
tm_os_dials_OLD.png reproduction of published dial-plot Figure 2
tm_os_dials_NEW.png same plot from FINAL CSV
tm_os_compare_OLD_vs_NEW.png per-metal grouped bar chart with
chemically-impossible region shaded
The dial plots look essentially identical to the eye -- consistent with
the +0.3% over-max delta. The bar-chart comparison is more informative:
Cu / Ag / Au show visibly reduced over-max bins under the new YARP;
most other metals are indistinguishable from old.
Investigation doc section 9.4.5 added with figure descriptions.
PNG renderer: Scripts/v2/os_test_final/plot_os_compare.py
Matrix builder: Scripts/v2/build_compare_matrices.py
Dial-plot script: Scripts/draw_tm_os_radial_dials_fullcircle.py
Comment-only changes; no behavior change. Audit found three annotations
that were either stale (made false claims) or missed a load-bearing
rationale that a future reader would need.
bem_score.py w_rad default revert
lewis_structure w_rad default revert
OLD comment claimed the sign flip 'biases TM dial-plot toward higher
OS because high-OS BEMs leave radicals on ligand atoms now rewarded
under -0.01'. This was the ORIGINAL hypothesis when patch was made.
Later bisection (only-w_rad branch) showed the revert has ZERO
chemistry effect: new bem_score computes rad_env = +pol/(100+pol)
internally, old find_lewis passed rad_env = -0.1*pol/(100+pol), so
w_rad*rad_env is algebraically identical between the two. The
revert is COSMETIC (matches +0.1 downstream convention). New comment
states this explicitly.
find_lewis.py Patch C (outer for-ind loop removal)
OLD comment said 'match old-YARP behavior' but didn't explain WHY
the loop existed or what removing it costs/saves. Added the
performance angle: old loop re-walked all BEMs at every recursion
depth (exponential blow-up); removal gives ~10x speedup on TM
bench (32972s -> 1718s in isolation). Future reader: do NOT
restore.
find_lewis.py Patch D (move 4-bis removal)
OLD comment preserved the removed code but didn't explain it was
dead. Added: the inner loop iterates but is never used
in the yielded move, so the block emits the same (i,j) coupling
move once per qualifying k — pure duplicate yields that bloat the
search. Move 4 below covers the legitimate radical-radical case.
Looks like unfinished experimental code.
All 55 lewis pytest cases still pass.
In-branch supplementary material (1.7 MB total) so reviewers can verify
the patch claims without leaving GitHub:
scripts/ build/bench/plot tooling + PATHS_NOTE.md
bench_stratified_144/ 9 per-condition CSVs + the input list
for the bisection summarized in the
investigation MD
tm_os_matrix_OLD.csv per-metal OS bin matrix from old YARP
tm_os_matrix_NEW.csv per-metal OS bin matrix from this branch
README.md folder index + reading order
NOT included (intentionally) to keep the branch small:
transition_metal_oxidation_states_FINAL.csv 24 MB master output
dedup_tm_picks.txt 21 MB input zip list
These two are available in a separate 46 MB bundle; the URL goes in
the PR description so anyone who wants to re-run the corpus comparison
can fetch them. The data already in the branch is enough to evaluate
the patches and the docs.
erm42
requested changes
Jun 25, 2026
erm42
left a comment
Member
There was a problem hiding this comment.
This is looking nice! One thing is that I do not want the zhao-patches-doc contents on the git repo, so please remove that, and then we should be fine to merge into master
In the future, I'd like to establish some integration tests for YARP, so perhaps we can circle back around to including some of your testing pipeline into that. But I don't have a home figured out for it yet.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Five small patches that fix YARP 3.0's behavior on transition-metal
oxidation-state extraction, validated on a 181,450-archive corpus of
DFT-optimized organometallic reactions. All commits are on
zhao-final-20260619and pass the existingtest/yarpecule/lewis/suite (55 / 55).
What's in the bundle
yarp/util/properties.py— restore 5d/4d TM dict entries thatwere set to
Nonein master (el_valence,el_n_deficient,el_n_expand_octet,el_expand_octet). Also restore the 6th-periodel_polentries (Cs..Rn were missing entirely, causingKeyError('cs')). Drops a duplicate"rh": 66key in the same dictliteral. Trivial bug fix.
Patch C — remove the outer
for ind in range(0, len(bond_mats)):loop in
gen_all_lstructs(find_lewis.py:289). Every recursivecall already passes
ind = len(bond_mats)-1, so the outer loopre-walks every previously discovered BEM at every recursion depth.
Removing it yields ~10× speedup on TM benchmarks (32,972 s →
1,718 s in isolation on a 144-archive stratified sample).
Performance + correctness, zero organic test regression.
Patch D — remove the radical-radical "move 4-bis" yield block in
valid_moves(find_lewis.py:447). The innerfor kloop iteratesbut
kis never used in the yielded move — block emits the same(i, j)coupling move once per qualifyingk, all duplicates.Dead-code cleanup.
Patch F — conditional seed-BEM re-pool in
lewis_structure._gen_bond_el_mat. The re-pool step (after pass-2,before
adjust_metals) keeps the new behavior for organic systems(where the existing pytest cases depend on it) but disables it when
any transition metal is present. On TM species the pass-1 seed BEMs
are optimized without aromaticity (
w_aro = 0) and so often encodenon-aromatic ligand configurations that
adjust_metalsthendative-izes via Z-bonds — draining the metal diagonal to zero and
producing impossible high OS values (Cp-Cr → Cr(VI), Cp-Ir → Ir(V),
Pt(0) → Pt(VII)). Eliminating those single-archive failures was the
most impactful chemistry change on our bench. Could plausibly be
done a different way upstream (see "open question" below).
w_raddefault revert (bem_score.py,lewis_structure.py) —from
-0.01to+0.1. COSMETIC, not a behavior change. Thesign flip in
w_radis algebraically compensated by a sign+scaleflip in how
rad_envis computed insidebmat_scorevs how oldfind_lewispassed it in. We isolated this on its own branch andconfirmed zero chemistry effect; reverted only to keep
w_rad > 0for downstream tooling that calls
bmat_scoredirectly.What's NOT in the bundle
We deliberately exclude one change that was in the historical
fed9385commit: switching the 2ndgen_all_lstructscall fromgreedy descent (
min_opt=True) to exploratory (min_opt=False, min_win=0.5). Bisection on the stratified sample showed it'snet-negative on TMs (1 archive fixed, 3 new regressions) and
breaks 3 organic pytest cases (
test_diazomethane_xyz,test_ester_xyz,test_benzothiazole_smi).Validation
pytest test/yarpecule/lewis/— 55 / 55 pass (same as master).weighted toward W, Re, Os, Pt, Cr, Mn, Co, Au): match rate vs the
earlier patched-old-YARP that produced our published OS values
improves from 65 % → 89 %, wall-time 10× faster.
metallaborane / cluster molecules outside YARP's 2c-2e Lewis scope);
80.25 % full-archive agreement with the OS values produced by the
earlier patched-old-YARP. Of the disagreements, 94 % are |ΔOS| ≤ 2
(Lewis-choice noise; both values chemically defensible). Atom-level
over-group-max OS counts are flat at +0.3 % on an apples-to-apples
comparison (with real wins on Au −43 %, Ag −29 %).
Documentation and data
The branch ships with
zhao-patches-doc/(1.7 MB total) containing:YARP-3.0-OS-divergence-investigation.md— full ~10-page writeupwith bisection methodology, every patch's bug / fix / evidence, and
corpus-wide per-metal diff tables.
YARP-3.0-OS-divergence-summary.md— 1-page exec summary.tm_os_compare_OLD_vs_NEW.png— per-metal bar chart showing theeffect on the OS distribution.
tm_os_dials_OLD.png/tm_os_dials_NEW.png— full radial dialplots from each version.
tm_os_matrix_{OLD,NEW}.csv— per-metal OS bin matrices used torender the dial plots.
bench_stratified_144/— 9 per-condition CSVs from the bisectionsummarized in the investigation MD, plus the 144-archive input list.
scripts/— the build / bench / plot tooling that producedeverything above, with a
PATHS_NOTE.mdflagging hard-coded pathsto edit before running on another machine.
These are optional for the merge — happy to drop them or move them
to a wiki / issue if you'd prefer the working tree clean.
Not in the branch (kept out to keep the diff small):
transition_metal_oxidation_states_FINAL.csv— 24 MB master outputof the 181,450-archive run, one row per archive.
dedup_tm_picks.txt— 21 MB list of zip paths corresponding tothose archives.
Both are bundled in the 3.7 MB attachment to this PR
(
PR-classy-yarp-zhao-final-20260619.zip) — drag-and-drop it toGitHub and unpack to reproduce the full apples-to-apples comparison.
Open question for the maintainers
The "right" fix for Patch F (seed-BEM re-pool) is probably not the
TM-conditional we wrote here. Two alternatives we'd welcome a thought
on:
(rather than after). This would let the aromaticity term down-rank
non-aromatic Cp seeds before they ever reach
adjust_metals,preserving the safety-net behavior for organics without the TM
regression.
mats_threshto keep theorganic pytest cases passing. More invasive but cleaner conceptually.
Happy to split this PR into separate per-patch PRs if that would be
easier to review/merge. The branch's commit history already cleanly
separates each patch.
How to test locally
Attached files (too big to put directly into the commits, so attached to PR):
transition_metal_oxidation_states_FINAL.csv
dedup_tm_picks.txt