Transition metal oxidation state proposed fixes by Zhaoli2042 · Pull Request #35 · Savoie-Research-Group/yarp

Zhaoli2042 · 2026-06-23T22:13:14Z

Summary

Five small patches that fix YARP 3.0's behavior on transition-metal
oxidation-state extraction, validated on a 181,450-archive corpus of
DFT-optimized organometallic reactions. All commits are on
zhao-final-20260619 and pass the existing test/yarpecule/lewis/
suite (55 / 55).

What's in the bundle

yarp/util/properties.py — restore 5d/4d TM dict entries that
were set to None in master (el_valence, el_n_deficient,
el_n_expand_octet, el_expand_octet). Also restore the 6th-period
el_pol entries (Cs..Rn were missing entirely, causing
KeyError('cs')). Drops a duplicate "rh": 66 key in the same dict
literal. Trivial bug fix.
Patch C — remove the outer for ind in range(0, len(bond_mats)):
loop in gen_all_lstructs (find_lewis.py:289). Every recursive
call already passes ind = len(bond_mats)-1, so the outer loop
re-walks every previously discovered BEM at every recursion depth.
Removing it yields ~10× speedup on TM benchmarks (32,972 s →
1,718 s in isolation on a 144-archive stratified sample).
Performance + correctness, zero organic test regression.
Patch D — remove the radical-radical "move 4-bis" yield block in
valid_moves (find_lewis.py:447). The inner for k loop iterates
but k is never used in the yielded move — block emits the same
(i, j) coupling move once per qualifying k, all duplicates.
Dead-code cleanup.
Patch F — conditional seed-BEM re-pool in
lewis_structure._gen_bond_el_mat. The re-pool step (after pass-2,
before adjust_metals) keeps the new behavior for organic systems
(where the existing pytest cases depend on it) but disables it when
any transition metal is present. On TM species the pass-1 seed BEMs
are optimized without aromaticity (w_aro = 0) and so often encode
non-aromatic ligand configurations that adjust_metals then
dative-izes via Z-bonds — draining the metal diagonal to zero and
producing impossible high OS values (Cp-Cr → Cr(VI), Cp-Ir → Ir(V),
Pt(0) → Pt(VII)). Eliminating those single-archive failures was the
most impactful chemistry change on our bench. Could plausibly be
done a different way upstream (see "open question" below).
w_rad default revert (bem_score.py, lewis_structure.py) —
from -0.01 to +0.1. COSMETIC, not a behavior change. The
sign flip in w_rad is algebraically compensated by a sign+scale
flip in how rad_env is computed inside bmat_score vs how old
find_lewis passed it in. We isolated this on its own branch and
confirmed zero chemistry effect; reverted only to keep w_rad > 0
for downstream tooling that calls bmat_score directly.

What's NOT in the bundle

We deliberately exclude one change that was in the historical
fed9385 commit: switching the 2nd gen_all_lstructs call from
greedy descent (min_opt=True) to exploratory (min_opt=False, min_win=0.5). Bisection on the stratified sample showed it's
net-negative on TMs (1 archive fixed, 3 new regressions) and
breaks 3 organic pytest cases (test_diazomethane_xyz,
test_ester_xyz, test_benzothiazole_smi).

Validation

pytest test/yarpecule/lewis/ — 55 / 55 pass (same as master).
144-archive stratified sample (8 / metal × 18 transition metals,
weighted toward W, Re, Os, Pt, Cr, Mn, Co, Au): match rate vs the
earlier patched-old-YARP that produced our published OS values
improves from 65 % → 89 %, wall-time 10× faster.
Full 181,450-archive corpus run: 10 errors (0.01 %, all
metallaborane / cluster molecules outside YARP's 2c-2e Lewis scope);
80.25 % full-archive agreement with the OS values produced by the
earlier patched-old-YARP. Of the disagreements, 94 % are |ΔOS| ≤ 2
(Lewis-choice noise; both values chemically defensible). Atom-level
over-group-max OS counts are flat at +0.3 % on an apples-to-apples
comparison (with real wins on Au −43 %, Ag −29 %).

Documentation and data

The branch ships with zhao-patches-doc/ (1.7 MB total) containing:

YARP-3.0-OS-divergence-investigation.md — full ~10-page writeup
with bisection methodology, every patch's bug / fix / evidence, and
corpus-wide per-metal diff tables.
YARP-3.0-OS-divergence-summary.md — 1-page exec summary.
tm_os_compare_OLD_vs_NEW.png — per-metal bar chart showing the
effect on the OS distribution.
tm_os_dials_OLD.png / tm_os_dials_NEW.png — full radial dial
plots from each version.
tm_os_matrix_{OLD,NEW}.csv — per-metal OS bin matrices used to
render the dial plots.
bench_stratified_144/ — 9 per-condition CSVs from the bisection
summarized in the investigation MD, plus the 144-archive input list.
scripts/ — the build / bench / plot tooling that produced
everything above, with a PATHS_NOTE.md flagging hard-coded paths
to edit before running on another machine.

These are optional for the merge — happy to drop them or move them
to a wiki / issue if you'd prefer the working tree clean.

Not in the branch (kept out to keep the diff small):

transition_metal_oxidation_states_FINAL.csv — 24 MB master output
of the 181,450-archive run, one row per archive.
dedup_tm_picks.txt — 21 MB list of zip paths corresponding to
those archives.

Both are bundled in the 3.7 MB attachment to this PR
(PR-classy-yarp-zhao-final-20260619.zip) — drag-and-drop it to
GitHub and unpack to reproduce the full apples-to-apples comparison.

Open question for the maintainers

The "right" fix for Patch F (seed-BEM re-pool) is probably not the
TM-conditional we wrote here. Two alternatives we'd welcome a thought
on:

Re-pool the seeds before re-scoring with aromaticity weights
(rather than after). This would let the aromaticity term down-rank
non-aromatic Cp seeds before they ever reach adjust_metals,
preserving the safety-net behavior for organics without the TM
regression.
Always disable the re-pool, then tune mats_thresh to keep the
organic pytest cases passing. More invasive but cleaner conceptually.

Happy to split this PR into separate per-patch PRs if that would be
easier to review/merge. The branch's commit history already cleanly
separates each patch.

How to test locally

git fetch origin zhao-final-20260619       # or fetch from Zhaoli2042 fork
git checkout zhao-final-20260619
pytest test/yarpecule/lewis/                # should pass 55 / 55

Attached files (too big to put directly into the commits, so attached to PR):
transition_metal_oxidation_states_FINAL.csv
dedup_tm_picks.txt

The classy-yarp refactor set el_valence, el_n_deficient, el_n_expand_octet, el_expand_octet entries for Y, Zr, Nb, Mo, Tc, Ag, Cd, La, Hf, Ta, W, Re, Os, Pt, Hg to None, and dropped el_pol entries for the 6th-period block (cs, ba, la, hf, ta, w, re, os, ir, pt, au, hg, tl, pb, bi, po, at, rn). These cause KeyErrors / TypeErrors when running yarpecule on TM-containing species. This restores the old-YARP values so the patched-new-YARP can process the GoldDIGR transition-metal subset (181k reactions). Local commit; not pushed upstream.

Adds 4 patches missing from the new YARP master vs the old-YARP patch (GH commit fed9385): A. sys.setrecursionlimit(5000) -> 100000 (lewis_structure.py:197) Deep Lewis searches on TM-containing species hit the 5000 ceiling. B. 2nd gen_all_lstructs call: min_opt=True -> min_opt=False, min_win=0.5 (lewis_structure.py:295) The greedy second-pass descent gets stuck on high-OS local minima for TM-containing species. min_win=0.5 lets the search jump up to 0.5 above current best and find lower-score (chemically sensible) BEMs. C. Remove outer 'for ind in range(0, len(bond_mats))' loop in gen_all_lstructs (find_lewis.py:289). The body of 'for j' now runs once per call against bond_mats[ind] (the function parameter). Avoids re-walking over already-visited starting BEMs at each level. D. Remove move 4-bis (radical-radical bond formation) from valid_moves (find_lewis.py:446-450). This move was eliminated in the old-YARP patch as it generated spurious neutral-coupled bonds for species with multiple unpaired electrons (relevant to TM clusters). Motivation: a 30-archive sanity comparison of new-patched-YARP against the published slim transition_metal_oxidation_states.csv (from old patched YARP) showed 11/30 (37%) of archives differed, with new-YARP returning chemically implausible values (W+6, Co+7, Mn+5/+7 etc.) for TM species the old patched YARP handled correctly. Diagnosis traced the divergence to these four missing changes.

NEW YARP master flipped the sign of the radical-environment weight in bmat_score: w_rad went from +0.1 (penalize radicals on polarizable atoms) to -0.01 (reward them). Bisection over the 144-archive stratified TM sample shows new YARP biases toward HIGHER OS than old YARP in 9/10 metals (Fe, Mn, Re, Pt, Rh, Ir all 100% upward). High-OS BEMs leave unpaired electrons (radicals) on ligand atoms that are now rewarded under the new -0.01 weight, plausibly explaining the upward bias. This commit reverts w_rad default to +0.1 in both bmat_score() and lewis_struct._gen_bond_el_mat() (used in pass 3, the final ranking after adjust_metals). Passes 1 and 2 already set w_rad=0 explicitly so the search loop is unaffected; only the final ranking changes.

Generalizes patch E. The seed re-pool step before adjust_metals is: - RESTORED when no transition metals are present in the molecule (preserves NEW YARP master behavior for organic-only species) - DISABLED when transition metals are present (prevents non-aromatic pass-1 seeds from sneaking through adjust_metals as high-OS Z-bond ligand configurations) This is the long-term-correct form of patch E. Empirically validated: - Patch E broke 3 organic pytest cases (test_diazomethane_xyz, test_benzothiazole_smi, test_ester_xyz) where the re-pool was needed to anchor the mats_thresh trim - Patch F should pass all organic tests while keeping the 38/55 TM oxidation-state fixes from patch E el_metals is imported from yarp.util.properties (already maintained in the patch-baseline restoration of 5d/4d TM property values).

Bisection on the 144-archive TM stratified sample revealed patch B has a net-NEGATIVE effect on TM oxidation states: 1 archive matched slim after B alone, 3 new archives diverged. ABCD (full 4-patch bundle) had 1 more regression than only-C alone, attributable to B. Bisection on YARP 3.0's pytest also showed patch B breaks 3 organic test cases: test_diazomethane_xyz, test_ester_xyz, test_benzothiazole_smi. Master / only-A / only-C / only-D all pass; only patch B fails. GH commit fed9385 silently bundled patch B with the deliberate TM-OS fixes (adjust_metals 3 GUARDs + 5d el_metals). It was a carry-along algorithmic change from local development that was never validated. Removing it improves both organic and TM behavior. Final patch stack for new YARP: master baseline: - 3 GUARDs in adjust_metals + 5d el_metals (already in new master) - N_score=100 (already in new master) - restore-old-values properties.py (5d/4d TM dicts + el_pol) zhao-final-20260619: - A: recursion limit 5000 -> 100000 (safety net for huge molecules) - C: remove outer for-ind loop in gen_all_lstructs (10x search speedup) - D: remove move 4-bis (radical-radical, dead code with unused k loop) - w_rad: +0.1 vs new master's -0.01 (no-op for chemistry; speed-only) - F: conditional seed-BEM re-pool (organic restored, TM disabled)

Two markdown docs explaining the investigation, the patches, and the recommended upstream changes: zhao-patches-doc/YARP-3.0-OS-divergence-investigation.md Full technical writeup. ~7 pages. Covers the bisection methodology, every patch with bug/fix/evidence, the conceptual explanation of why new YARP biases TM oxidation states upward, the recommended upstream PRs, and the residual 13% breakdown. zhao-patches-doc/YARP-3.0-OS-divergence-summary.md One-page executive summary for non-YARP audiences (PI / collaborators). Three plain-English bugs, impact on GoldDIGR, action items. Lives on the zhao-final-20260619 branch alongside the patches it documents; master stays a clean mirror of upstream classy-yarp.

Full 181,450-archive mass-gen completed 2026-06-20 across 4 SLURM phases. Headline numbers: - 10 YARP errors / 181,450 (0.01%) - 80.25% full-archive agreement vs published slim CSV - Per-metal up/down splits roughly symmetric at corpus scale (Pd 4651/4575, Mo 1355/1373, Cu 1745/2282) - 0 chemically impossible OS values (no Pt(VII), Cr(VI), Ir(V), Co(VII)) The full-corpus number (80%) is lower than the stratified-sample number (89%) because the stratified sample was deliberately weighted toward hard rare metals where our patches helped most. At corpus scale the dominant late-TM organometallic chemistry (Pd/Rh/Fe/Ni/Cu) is more like two algorithms making different but defensible BEM choices than a systematic bug pattern. Investigation doc: section 7 rewritten with stratified + corpus tables and the symmetric-split observation. Summary doc: numbers table extended with full-corpus row, the 'remaining ~13%' bullet replaced with corpus-wide framing.

Of 181,450 archives, 10 raised LewisStructureError or ValueError under the FINAL stack. Cluster-bonding diagnosis: - 8 of 10 are metallaboranes (M-Bn, n=4-19) where Wade/Mingos polyhedral cluster bonding requires 3c-2e formalism that YARP's 2c-2e Lewis search cannot express - 1 is a Pd3 sulfide cluster (3c-2e bridging chalcogenides) - 1 is a Ru-Pb11 main-group cluster 4 of the 10 errors come from one paper (c9nj06335h1) studying analogous M-B16 series for M = Cr/Mn/Co/Ni. This is academic exploration of one anomalous bonding class, not 4 separate bugs. Often only one of (reactant, product) fails -- the pipeline emits 'ERR:' on the failing side and OS values on the working side, so downstream consumers can detect and skip them. At 10/181450 = 0.0055%, this is well below the aggregate noise floor for figures derived from the OS distribution. Recommendation to YARP team: distinguish 'cluster molecule' from 'malformed input' as a return value so downstream tools can branch cleanly. No algorithmic fix realistic without a separate cluster-aware YARP mode.

Headline finding from corpus-wide spot-check of the 35,843 disagreement archives (59,769 per-atom OS diffs): - 94% of per-atom diffs are |delta| <= 2 (Lewis-choice noise; both values chemically defensible) - mean signed delta = +0.155 OS units (slight upward drift, not systematic bug) - mean |delta| = 1.52 OS units The single most important corpus-level finding: Atom-level OS values exceeding group maximum (Pd > 4, Fe > 6, Cu > 3, etc.) DROPPED from 53,980 in the old slim CSV to 11,515 in the new FINAL CSV -- a 78% reduction in chemically impossible OS reports. Per-metal: Ag -94% Au -97% Rh -87% Pd -76% Cu -75% Fe -74% ... True archive-level regressions (1,162) almost exactly balance new fixes (1,124). The patched new YARP is strictly more chemically conservative than the old YARP that produced the published dial-plot. Implication: regenerating Figure 2 from the new FINAL CSV would IMPROVE the published plot's over-group-max tail bins, not degrade them. Updated section 7.5 'Bottom line' bullets and added a new section 9 'Spot-check' with magnitude distribution and examples. Summary doc also updated to reflect the corpus-wide headline.

Earlier draft compared the full slim CSV (462k rows, all charge/mult variants) against the FINAL CSV (181k deduped picks). That inflated the OLD over-max counts by including dup rows the published dial-plot never used. Apples-to-apples comparison on the same deduped 181k archive set: OLD slim over-max: 11,477 atom-level cases NEW FINAL over-max: 11,515 atom-level cases net delta: +38 (+0.3%) -- essentially flat Real wins are concentrated on heavy noble metals: Au: 161 -> 91 (-43%) Ag: 300 -> 212 (-29%) Modest losses on: Rh: 985 -> 1115 (+130) Co: 1133 -> 1198 (+65) Ni: 1139 -> 1188 (+49) What the patches DO eliminate are the catastrophic single-archive failures shown on the stratified sample (Cp-Cr -> Cr(VI) clusters, Pt(0) -> Pt(VII), Cp-Ir -> Ir(V), Co(VII)). Those don't show up in the corpus-wide over-max count because they're rare absolute contributors to a ~11k baseline of corpus-wide over-max events. Both summary and investigation docs updated to reflect this. Section 9.1's table now shows the apples-to-apples comparison; the bottom line bullets are toned down accordingly.

Three figures showing OLD slim CSV vs NEW FINAL CSV at corpus scale, restricted to the apples-to-apples deduplicated 181,450-archive set: tm_os_dials_OLD.png reproduction of published dial-plot Figure 2 tm_os_dials_NEW.png same plot from FINAL CSV tm_os_compare_OLD_vs_NEW.png per-metal grouped bar chart with chemically-impossible region shaded The dial plots look essentially identical to the eye -- consistent with the +0.3% over-max delta. The bar-chart comparison is more informative: Cu / Ag / Au show visibly reduced over-max bins under the new YARP; most other metals are indistinguishable from old. Investigation doc section 9.4.5 added with figure descriptions. PNG renderer: Scripts/v2/os_test_final/plot_os_compare.py Matrix builder: Scripts/v2/build_compare_matrices.py Dial-plot script: Scripts/draw_tm_os_radial_dials_fullcircle.py

Comment-only changes; no behavior change. Audit found three annotations that were either stale (made false claims) or missed a load-bearing rationale that a future reader would need. bem_score.py w_rad default revert lewis_structure w_rad default revert OLD comment claimed the sign flip 'biases TM dial-plot toward higher OS because high-OS BEMs leave radicals on ligand atoms now rewarded under -0.01'. This was the ORIGINAL hypothesis when patch was made. Later bisection (only-w_rad branch) showed the revert has ZERO chemistry effect: new bem_score computes rad_env = +pol/(100+pol) internally, old find_lewis passed rad_env = -0.1*pol/(100+pol), so w_rad*rad_env is algebraically identical between the two. The revert is COSMETIC (matches +0.1 downstream convention). New comment states this explicitly. find_lewis.py Patch C (outer for-ind loop removal) OLD comment said 'match old-YARP behavior' but didn't explain WHY the loop existed or what removing it costs/saves. Added the performance angle: old loop re-walked all BEMs at every recursion depth (exponential blow-up); removal gives ~10x speedup on TM bench (32972s -> 1718s in isolation). Future reader: do NOT restore. find_lewis.py Patch D (move 4-bis removal) OLD comment preserved the removed code but didn't explain it was dead. Added: the inner loop iterates but is never used in the yielded move, so the block emits the same (i,j) coupling move once per qualifying k — pure duplicate yields that bloat the search. Move 4 below covers the legitimate radical-radical case. Looks like unfinished experimental code. All 55 lewis pytest cases still pass.

In-branch supplementary material (1.7 MB total) so reviewers can verify the patch claims without leaving GitHub: scripts/ build/bench/plot tooling + PATHS_NOTE.md bench_stratified_144/ 9 per-condition CSVs + the input list for the bisection summarized in the investigation MD tm_os_matrix_OLD.csv per-metal OS bin matrix from old YARP tm_os_matrix_NEW.csv per-metal OS bin matrix from this branch README.md folder index + reading order NOT included (intentionally) to keep the branch small: transition_metal_oxidation_states_FINAL.csv 24 MB master output dedup_tm_picks.txt 21 MB input zip list These two are available in a separate 46 MB bundle; the URL goes in the PR description so anyone who wants to re-run the corpus comparison can fetch them. The data already in the branch is enough to evaluate the patches and the docs.

erm42

This is looking nice! One thing is that I do not want the zhao-patches-doc contents on the git repo, so please remove that, and then we should be fine to merge into master

In the future, I'd like to establish some integration tests for YARP, so perhaps we can circle back around to including some of your testing pipeline into that. But I don't have a home figured out for it yet.

Zhaoli2042 added 16 commits June 12, 2026 13:35

initial patch for oxidation state

66ddc86

remove backup files

d2dcbc3

remove backup files for properties.py

9b72eca

Zhaoli2042 requested a review from erm42 June 23, 2026 22:14

Zhaoli2042 changed the title ~~Transition metal oxidation S~~ Transition metal oxidation state proposed fixes Jun 23, 2026

Merge branch 'master' into zhao-final-20260619

0672596

erm42 requested changes Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Transition metal oxidation state proposed fixes#35

Transition metal oxidation state proposed fixes#35
Zhaoli2042 wants to merge 17 commits into
Savoie-Research-Group:masterfrom
Zhaoli2042:zhao-final-20260619

Zhaoli2042 commented Jun 23, 2026

Uh oh!

erm42 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Zhaoli2042 commented Jun 23, 2026

Summary

What's in the bundle

What's NOT in the bundle

Validation

Documentation and data

Open question for the maintainers

How to test locally

Uh oh!

erm42 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants