Skip to content

Transition metal oxidation state proposed fixes#35

Draft
Zhaoli2042 wants to merge 17 commits into
Savoie-Research-Group:masterfrom
Zhaoli2042:zhao-final-20260619
Draft

Transition metal oxidation state proposed fixes#35
Zhaoli2042 wants to merge 17 commits into
Savoie-Research-Group:masterfrom
Zhaoli2042:zhao-final-20260619

Conversation

@Zhaoli2042

Copy link
Copy Markdown
Collaborator

Summary

Five small patches that fix YARP 3.0's behavior on transition-metal
oxidation-state extraction, validated on a 181,450-archive corpus of
DFT-optimized organometallic reactions. All commits are on
zhao-final-20260619 and pass the existing test/yarpecule/lewis/
suite (55 / 55).

What's in the bundle

  1. yarp/util/properties.py — restore 5d/4d TM dict entries that
    were set to None in master (el_valence, el_n_deficient,
    el_n_expand_octet, el_expand_octet). Also restore the 6th-period
    el_pol entries (Cs..Rn were missing entirely, causing
    KeyError('cs')). Drops a duplicate "rh": 66 key in the same dict
    literal. Trivial bug fix.

  2. Patch C — remove the outer for ind in range(0, len(bond_mats)):
    loop in gen_all_lstructs
    (find_lewis.py:289). Every recursive
    call already passes ind = len(bond_mats)-1, so the outer loop
    re-walks every previously discovered BEM at every recursion depth.
    Removing it yields ~10× speedup on TM benchmarks (32,972 s →
    1,718 s in isolation on a 144-archive stratified sample).
    Performance + correctness, zero organic test regression.

  3. Patch D — remove the radical-radical "move 4-bis" yield block in
    valid_moves (find_lewis.py:447). The inner for k loop iterates
    but k is never used in the yielded move — block emits the same
    (i, j) coupling move once per qualifying k, all duplicates.
    Dead-code cleanup.

  4. Patch F — conditional seed-BEM re-pool in
    lewis_structure._gen_bond_el_mat. The re-pool step (after pass-2,
    before adjust_metals) keeps the new behavior for organic systems
    (where the existing pytest cases depend on it) but disables it when
    any transition metal is present. On TM species the pass-1 seed BEMs
    are optimized without aromaticity (w_aro = 0) and so often encode
    non-aromatic ligand configurations that adjust_metals then
    dative-izes via Z-bonds — draining the metal diagonal to zero and
    producing impossible high OS values (Cp-Cr → Cr(VI), Cp-Ir → Ir(V),
    Pt(0) → Pt(VII)). Eliminating those single-archive failures was the
    most impactful chemistry change on our bench. Could plausibly be
    done a different way upstream
    (see "open question" below).

  5. w_rad default revert (bem_score.py, lewis_structure.py) —
    from -0.01 to +0.1. COSMETIC, not a behavior change. The
    sign flip in w_rad is algebraically compensated by a sign+scale
    flip in how rad_env is computed inside bmat_score vs how old
    find_lewis passed it in. We isolated this on its own branch and
    confirmed zero chemistry effect; reverted only to keep w_rad > 0
    for downstream tooling that calls bmat_score directly.

What's NOT in the bundle

We deliberately exclude one change that was in the historical
fed9385 commit: switching the 2nd gen_all_lstructs call from
greedy descent (min_opt=True) to exploratory (min_opt=False, min_win=0.5). Bisection on the stratified sample showed it's
net-negative on TMs (1 archive fixed, 3 new regressions) and
breaks 3 organic pytest cases
(test_diazomethane_xyz,
test_ester_xyz, test_benzothiazole_smi).

Validation

  • pytest test/yarpecule/lewis/ — 55 / 55 pass (same as master).
  • 144-archive stratified sample (8 / metal × 18 transition metals,
    weighted toward W, Re, Os, Pt, Cr, Mn, Co, Au): match rate vs the
    earlier patched-old-YARP that produced our published OS values
    improves from 65 % → 89 %, wall-time 10× faster.
  • Full 181,450-archive corpus run: 10 errors (0.01 %, all
    metallaborane / cluster molecules outside YARP's 2c-2e Lewis scope);
    80.25 % full-archive agreement with the OS values produced by the
    earlier patched-old-YARP. Of the disagreements, 94 % are |ΔOS| ≤ 2
    (Lewis-choice noise; both values chemically defensible). Atom-level
    over-group-max OS counts are flat at +0.3 % on an apples-to-apples
    comparison (with real wins on Au −43 %, Ag −29 %).

Documentation and data

The branch ships with zhao-patches-doc/ (1.7 MB total) containing:

  • YARP-3.0-OS-divergence-investigation.md — full ~10-page writeup
    with bisection methodology, every patch's bug / fix / evidence, and
    corpus-wide per-metal diff tables.
  • YARP-3.0-OS-divergence-summary.md — 1-page exec summary.
  • tm_os_compare_OLD_vs_NEW.png — per-metal bar chart showing the
    effect on the OS distribution.
  • tm_os_dials_OLD.png / tm_os_dials_NEW.png — full radial dial
    plots from each version.
  • tm_os_matrix_{OLD,NEW}.csv — per-metal OS bin matrices used to
    render the dial plots.
  • bench_stratified_144/ — 9 per-condition CSVs from the bisection
    summarized in the investigation MD, plus the 144-archive input list.
  • scripts/ — the build / bench / plot tooling that produced
    everything above, with a PATHS_NOTE.md flagging hard-coded paths
    to edit before running on another machine.

These are optional for the merge — happy to drop them or move them
to a wiki / issue if you'd prefer the working tree clean.

Not in the branch (kept out to keep the diff small):

  • transition_metal_oxidation_states_FINAL.csv — 24 MB master output
    of the 181,450-archive run, one row per archive.
  • dedup_tm_picks.txt — 21 MB list of zip paths corresponding to
    those archives.

Both are bundled in the 3.7 MB attachment to this PR
(PR-classy-yarp-zhao-final-20260619.zip) — drag-and-drop it to
GitHub and unpack to reproduce the full apples-to-apples comparison.

Open question for the maintainers

The "right" fix for Patch F (seed-BEM re-pool) is probably not the
TM-conditional we wrote here. Two alternatives we'd welcome a thought
on:

  • Re-pool the seeds before re-scoring with aromaticity weights
    (rather than after). This would let the aromaticity term down-rank
    non-aromatic Cp seeds before they ever reach adjust_metals,
    preserving the safety-net behavior for organics without the TM
    regression.
  • Always disable the re-pool, then tune mats_thresh to keep the
    organic pytest cases passing. More invasive but cleaner conceptually.

Happy to split this PR into separate per-patch PRs if that would be
easier to review/merge. The branch's commit history already cleanly
separates each patch.

How to test locally

git fetch origin zhao-final-20260619       # or fetch from Zhaoli2042 fork
git checkout zhao-final-20260619
pytest test/yarpecule/lewis/                # should pass 55 / 55

Attached files (too big to put directly into the commits, so attached to PR):
transition_metal_oxidation_states_FINAL.csv
dedup_tm_picks.txt

The classy-yarp refactor set el_valence, el_n_deficient, el_n_expand_octet,
el_expand_octet entries for Y, Zr, Nb, Mo, Tc, Ag, Cd, La, Hf, Ta, W, Re,
Os, Pt, Hg to None, and dropped el_pol entries for the 6th-period block
(cs, ba, la, hf, ta, w, re, os, ir, pt, au, hg, tl, pb, bi, po, at, rn).
These cause KeyErrors / TypeErrors when running yarpecule on TM-containing
species. This restores the old-YARP values so the patched-new-YARP can
process the GoldDIGR transition-metal subset (181k reactions).

Local commit; not pushed upstream.
Adds 4 patches missing from the new YARP master vs the old-YARP patch
(GH commit fed9385):

  A. sys.setrecursionlimit(5000) -> 100000 (lewis_structure.py:197)
     Deep Lewis searches on TM-containing species hit the 5000 ceiling.

  B. 2nd gen_all_lstructs call: min_opt=True -> min_opt=False, min_win=0.5
     (lewis_structure.py:295)
     The greedy second-pass descent gets stuck on high-OS local minima for
     TM-containing species. min_win=0.5 lets the search jump up to 0.5
     above current best and find lower-score (chemically sensible) BEMs.

  C. Remove outer 'for ind in range(0, len(bond_mats))' loop in
     gen_all_lstructs (find_lewis.py:289). The body of 'for j' now runs
     once per call against bond_mats[ind] (the function parameter).
     Avoids re-walking over already-visited starting BEMs at each level.

  D. Remove move 4-bis (radical-radical bond formation) from valid_moves
     (find_lewis.py:446-450). This move was eliminated in the old-YARP
     patch as it generated spurious neutral-coupled bonds for species
     with multiple unpaired electrons (relevant to TM clusters).

Motivation: a 30-archive sanity comparison of new-patched-YARP against
the published slim transition_metal_oxidation_states.csv (from old
patched YARP) showed 11/30 (37%) of archives differed, with new-YARP
returning chemically implausible values (W+6, Co+7, Mn+5/+7 etc.) for
TM species the old patched YARP handled correctly. Diagnosis traced the
divergence to these four missing changes.
NEW YARP master flipped the sign of the radical-environment weight in
bmat_score: w_rad went from +0.1 (penalize radicals on polarizable atoms)
to -0.01 (reward them). Bisection over the 144-archive stratified TM
sample shows new YARP biases toward HIGHER OS than old YARP in 9/10
metals (Fe, Mn, Re, Pt, Rh, Ir all 100% upward). High-OS BEMs leave
unpaired electrons (radicals) on ligand atoms that are now rewarded
under the new -0.01 weight, plausibly explaining the upward bias.

This commit reverts w_rad default to +0.1 in both bmat_score() and
lewis_struct._gen_bond_el_mat() (used in pass 3, the final ranking
after adjust_metals). Passes 1 and 2 already set w_rad=0 explicitly
so the search loop is unaffected; only the final ranking changes.
Generalizes patch E. The seed re-pool step before adjust_metals is:
  - RESTORED when no transition metals are present in the molecule
    (preserves NEW YARP master behavior for organic-only species)
  - DISABLED when transition metals are present
    (prevents non-aromatic pass-1 seeds from sneaking through
    adjust_metals as high-OS Z-bond ligand configurations)

This is the long-term-correct form of patch E. Empirically validated:
  - Patch E broke 3 organic pytest cases (test_diazomethane_xyz,
    test_benzothiazole_smi, test_ester_xyz) where the re-pool was
    needed to anchor the mats_thresh trim
  - Patch F should pass all organic tests while keeping the 38/55
    TM oxidation-state fixes from patch E

el_metals is imported from yarp.util.properties (already maintained
in the patch-baseline restoration of 5d/4d TM property values).
Bisection on the 144-archive TM stratified sample revealed patch B has
a net-NEGATIVE effect on TM oxidation states: 1 archive matched slim
after B alone, 3 new archives diverged. ABCD (full 4-patch bundle) had
1 more regression than only-C alone, attributable to B.

Bisection on YARP 3.0's pytest also showed patch B breaks 3 organic
test cases: test_diazomethane_xyz, test_ester_xyz, test_benzothiazole_smi.
Master / only-A / only-C / only-D all pass; only patch B fails.

GH commit fed9385 silently bundled patch B with the deliberate TM-OS
fixes (adjust_metals 3 GUARDs + 5d el_metals). It was a carry-along
algorithmic change from local development that was never validated.
Removing it improves both organic and TM behavior.

Final patch stack for new YARP:
  master baseline:
    - 3 GUARDs in adjust_metals + 5d el_metals (already in new master)
    - N_score=100 (already in new master)
    - restore-old-values properties.py (5d/4d TM dicts + el_pol)
  zhao-final-20260619:
    - A: recursion limit 5000 -> 100000 (safety net for huge molecules)
    - C: remove outer for-ind loop in gen_all_lstructs (10x search speedup)
    - D: remove move 4-bis (radical-radical, dead code with unused k loop)
    - w_rad: +0.1 vs new master's -0.01 (no-op for chemistry; speed-only)
    - F: conditional seed-BEM re-pool (organic restored, TM disabled)
Two markdown docs explaining the investigation, the patches, and the
recommended upstream changes:

  zhao-patches-doc/YARP-3.0-OS-divergence-investigation.md
    Full technical writeup. ~7 pages. Covers the bisection
    methodology, every patch with bug/fix/evidence, the conceptual
    explanation of why new YARP biases TM oxidation states upward,
    the recommended upstream PRs, and the residual 13% breakdown.

  zhao-patches-doc/YARP-3.0-OS-divergence-summary.md
    One-page executive summary for non-YARP audiences (PI / collaborators).
    Three plain-English bugs, impact on GoldDIGR, action items.

Lives on the zhao-final-20260619 branch alongside the patches it
documents; master stays a clean mirror of upstream classy-yarp.
Full 181,450-archive mass-gen completed 2026-06-20 across 4 SLURM phases.
Headline numbers:
  - 10 YARP errors / 181,450 (0.01%)
  - 80.25% full-archive agreement vs published slim CSV
  - Per-metal up/down splits roughly symmetric at corpus scale
    (Pd 4651/4575, Mo 1355/1373, Cu 1745/2282)
  - 0 chemically impossible OS values (no Pt(VII), Cr(VI), Ir(V), Co(VII))

The full-corpus number (80%) is lower than the stratified-sample number
(89%) because the stratified sample was deliberately weighted toward
hard rare metals where our patches helped most. At corpus scale the
dominant late-TM organometallic chemistry (Pd/Rh/Fe/Ni/Cu) is more like
two algorithms making different but defensible BEM choices than a
systematic bug pattern.

Investigation doc:   section 7 rewritten with stratified + corpus tables
                     and the symmetric-split observation.
Summary doc:         numbers table extended with full-corpus row, the
                     'remaining ~13%' bullet replaced with corpus-wide
                     framing.
Of 181,450 archives, 10 raised LewisStructureError or ValueError under
the FINAL stack. Cluster-bonding diagnosis:

  - 8 of 10 are metallaboranes (M-Bn, n=4-19) where Wade/Mingos
    polyhedral cluster bonding requires 3c-2e formalism that YARP's
    2c-2e Lewis search cannot express
  - 1 is a Pd3 sulfide cluster (3c-2e bridging chalcogenides)
  - 1 is a Ru-Pb11 main-group cluster

4 of the 10 errors come from one paper (c9nj06335h1) studying
analogous M-B16 series for M = Cr/Mn/Co/Ni. This is academic
exploration of one anomalous bonding class, not 4 separate bugs.

Often only one of (reactant, product) fails -- the pipeline emits
'ERR:' on the failing side and OS values on the working side, so
downstream consumers can detect and skip them.

At 10/181450 = 0.0055%, this is well below the aggregate noise floor
for figures derived from the OS distribution.

Recommendation to YARP team: distinguish 'cluster molecule' from
'malformed input' as a return value so downstream tools can branch
cleanly. No algorithmic fix realistic without a separate
cluster-aware YARP mode.
Headline finding from corpus-wide spot-check of the 35,843 disagreement
archives (59,769 per-atom OS diffs):

  - 94% of per-atom diffs are |delta| <= 2 (Lewis-choice noise; both
    values chemically defensible)
  - mean signed delta = +0.155 OS units (slight upward drift, not
    systematic bug)
  - mean |delta| = 1.52 OS units

The single most important corpus-level finding:

  Atom-level OS values exceeding group maximum (Pd > 4, Fe > 6, Cu > 3,
  etc.) DROPPED from 53,980 in the old slim CSV to 11,515 in the new
  FINAL CSV -- a 78% reduction in chemically impossible OS reports.

  Per-metal:
    Ag -94%  Au -97%  Rh -87%  Pd -76%  Cu -75%  Fe -74%  ...

  True archive-level regressions (1,162) almost exactly balance new
  fixes (1,124). The patched new YARP is strictly more chemically
  conservative than the old YARP that produced the published dial-plot.

Implication: regenerating Figure 2 from the new FINAL CSV would
IMPROVE the published plot's over-group-max tail bins, not degrade
them. Updated section 7.5 'Bottom line' bullets and added a new
section 9 'Spot-check' with magnitude distribution and examples.

Summary doc also updated to reflect the corpus-wide headline.
Earlier draft compared the full slim CSV (462k rows, all charge/mult
variants) against the FINAL CSV (181k deduped picks). That inflated
the OLD over-max counts by including dup rows the published dial-plot
never used. Apples-to-apples comparison on the same deduped 181k
archive set:

  OLD slim over-max:  11,477 atom-level cases
  NEW FINAL over-max: 11,515 atom-level cases
  net delta:          +38 (+0.3%)  -- essentially flat

Real wins are concentrated on heavy noble metals:
  Au: 161 -> 91   (-43%)
  Ag: 300 -> 212  (-29%)

Modest losses on:
  Rh: 985 -> 1115  (+130)
  Co: 1133 -> 1198 (+65)
  Ni: 1139 -> 1188 (+49)

What the patches DO eliminate are the catastrophic single-archive
failures shown on the stratified sample (Cp-Cr -> Cr(VI) clusters,
Pt(0) -> Pt(VII), Cp-Ir -> Ir(V), Co(VII)). Those don't show up in
the corpus-wide over-max count because they're rare absolute
contributors to a ~11k baseline of corpus-wide over-max events.

Both summary and investigation docs updated to reflect this. Section
9.1's table now shows the apples-to-apples comparison; the bottom
line bullets are toned down accordingly.
Three figures showing OLD slim CSV vs NEW FINAL CSV at corpus scale,
restricted to the apples-to-apples deduplicated 181,450-archive set:

  tm_os_dials_OLD.png       reproduction of published dial-plot Figure 2
  tm_os_dials_NEW.png       same plot from FINAL CSV
  tm_os_compare_OLD_vs_NEW.png   per-metal grouped bar chart with
                                 chemically-impossible region shaded

The dial plots look essentially identical to the eye -- consistent with
the +0.3% over-max delta. The bar-chart comparison is more informative:
Cu / Ag / Au show visibly reduced over-max bins under the new YARP;
most other metals are indistinguishable from old.

Investigation doc section 9.4.5 added with figure descriptions.

PNG renderer: Scripts/v2/os_test_final/plot_os_compare.py
Matrix builder: Scripts/v2/build_compare_matrices.py
Dial-plot script: Scripts/draw_tm_os_radial_dials_fullcircle.py
Comment-only changes; no behavior change. Audit found three annotations
that were either stale (made false claims) or missed a load-bearing
rationale that a future reader would need.

  bem_score.py     w_rad default revert
  lewis_structure  w_rad default revert
    OLD comment claimed the sign flip 'biases TM dial-plot toward higher
    OS because high-OS BEMs leave radicals on ligand atoms now rewarded
    under -0.01'. This was the ORIGINAL hypothesis when patch was made.
    Later bisection (only-w_rad branch) showed the revert has ZERO
    chemistry effect: new bem_score computes rad_env = +pol/(100+pol)
    internally, old find_lewis passed rad_env = -0.1*pol/(100+pol), so
    w_rad*rad_env is algebraically identical between the two. The
    revert is COSMETIC (matches +0.1 downstream convention). New comment
    states this explicitly.

  find_lewis.py    Patch C (outer for-ind loop removal)
    OLD comment said 'match old-YARP behavior' but didn't explain WHY
    the loop existed or what removing it costs/saves. Added the
    performance angle: old loop re-walked all BEMs at every recursion
    depth (exponential blow-up); removal gives ~10x speedup on TM
    bench (32972s -> 1718s in isolation). Future reader: do NOT
    restore.

  find_lewis.py    Patch D (move 4-bis removal)
    OLD comment preserved the removed code but didn't explain it was
    dead. Added: the inner  loop iterates but  is never used
    in the yielded move, so the block emits the same (i,j) coupling
    move once per qualifying k — pure duplicate yields that bloat the
    search. Move 4 below covers the legitimate radical-radical case.
    Looks like unfinished experimental code.

All 55 lewis pytest cases still pass.
In-branch supplementary material (1.7 MB total) so reviewers can verify
the patch claims without leaving GitHub:

  scripts/                       build/bench/plot tooling + PATHS_NOTE.md
  bench_stratified_144/          9 per-condition CSVs + the input list
                                 for the bisection summarized in the
                                 investigation MD
  tm_os_matrix_OLD.csv           per-metal OS bin matrix from old YARP
  tm_os_matrix_NEW.csv           per-metal OS bin matrix from this branch
  README.md                      folder index + reading order

NOT included (intentionally) to keep the branch small:
  transition_metal_oxidation_states_FINAL.csv   24 MB master output
  dedup_tm_picks.txt                            21 MB input zip list

These two are available in a separate 46 MB bundle; the URL goes in
the PR description so anyone who wants to re-run the corpus comparison
can fetch them. The data already in the branch is enough to evaluate
the patches and the docs.
@Zhaoli2042 Zhaoli2042 requested a review from erm42 June 23, 2026 22:14
@Zhaoli2042 Zhaoli2042 changed the title Transition metal oxidation S Transition metal oxidation state proposed fixes Jun 23, 2026

@erm42 erm42 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking nice! One thing is that I do not want the zhao-patches-doc contents on the git repo, so please remove that, and then we should be fine to merge into master

In the future, I'd like to establish some integration tests for YARP, so perhaps we can circle back around to including some of your testing pipeline into that. But I don't have a home figured out for it yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants