Skip to content

Switch to OpenBabel 3 #2088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 22, 2021
Merged

Switch to OpenBabel 3 #2088

merged 10 commits into from
Apr 22, 2021

Conversation

rwest
Copy link
Member

@rwest rwest commented Mar 15, 2021

Motivation or Problem

The release notes say OpenBabel 3 a major improvement
https://github.com/openbabel/openbabel/releases/tag/openbabel-3-0-0
but it does break some things in the API
https://open-babel.readthedocs.io/en/latest/UseTheLibrary/migration.html#migrating-to-3-0
so subsequent commits on this pull request should fix RMG to be compatible.

This pull request is currently a stub. Please feel free to contribute commits!

Description of Changes

  • update the environment.yaml to use the official updated cclib and openbabel conda packages.
  • change import statements
  • when reading radicals from OpenBabel molecules, calculate spin multiplicity based on undervalence of atoms, because OpenBabel no longer does this for you
  • Mark a couple of broken unit tests as work_in_progress, because they seem to have malformed InChIs that we can no longer read, and I'm not sure what the correct molecule is supposed to be.

Testing

  • Tests run locally on my Mac.
  • Continuous Integration tests will be run.

Reviewer Tips

  • if someone can figure out the broken InChIs in the unit tests, that'd be great.

Further detailed commentary is in the individual commit messages.

@rwest
Copy link
Member Author

rwest commented Mar 17, 2021

Running tests locally I get only two problems now

======================================================================
ERROR: test_c6h6o4 (rmgpy.molecule.pathfinderTest.FindButadieneEndWithChargeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rwest/Code/RMG-Py/rmgpy/molecule/pathfinderTest.py", line 316, in test_c6h6o4
    mol = Molecule().from_inchi(inchi)
  File "rmgpy/molecule/molecule.py", line 1667, in rmgpy.molecule.molecule.Molecule.from_inchi
    def from_inchi(self, inchistr, backend='try-all', raise_atomtype_exception=True):
  File "rmgpy/molecule/molecule.py", line 1671, in rmgpy.molecule.molecule.Molecule.from_inchi
    translator.from_inchi(self, inchistr, backend, raise_atomtype_exception=raise_atomtype_exception)
  File "rmgpy/molecule/translator.py", line 254, in rmgpy.molecule.translator.from_inchi
    def from_inchi(mol, inchistr, backend='try-all', raise_atomtype_exception=True):
  File "rmgpy/molecule/translator.py", line 261, in rmgpy.molecule.translator.from_inchi
    return _read(mol, inchistr, 'inchi', backend, raise_atomtype_exception=raise_atomtype_exception)
  File "rmgpy/molecule/translator.py", line 503, in rmgpy.molecule.translator._read
    raise ValueError("Unable to correctly parse {0} with backend {1}.".format(identifier, backend))
ValueError: Unable to correctly parse InChI=1S/C6H6O4/c1-2-4-9-6(7)3-5-10-8/h2-3H,1,5H2 with backend try-all.

This is the same InChI that I had to mark as a work-in-progress because OpenBabel cannot parse it into a valid Molecule.
InChI=1S/C6H6O4/c1-2-4-9-6(7)3-5-10-8/h2-3H,1,5H2
Pasting it into https://rmg.mit.edu/molecule_search generates an adjacency list, but not a valid one (try clicking "Draw Molecule") so I'm not sure what it's meant to be.

======================================================================
FAIL: test_nitrogenated_birad (rmgpy.molecule.pathfinderTest.FindAllylDelocalizationPathsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rwest/Code/RMG-Py/rmgpy/molecule/pathfinderTest.py", line 405, in test_nitrogenated_birad
    self.assertTrue(paths)
AssertionError: [] is not true

the test is

    def test_nitrogenated_birad(self):
        smiles = '[CH]=C[N]'
        mol = Molecule().from_smiles(smiles)
        paths = find_allyl_delocalization_paths(mol.atoms[3])
        self.assertTrue(paths)

after reading from_smiles the molecule is

multiplicity 4
1 N u2 p1 c0 {2,S}
2 C u0 p0 c0 {1,S} {3,D} {4,S}
3 C u1 p0 c0 {2,D} {5,S}
4 H u0 p0 c0 {2,S}
5 H u0 p0 c0 {3,S}

compared to the RMG website which gives

multiplicity 4
1 C u1 p0 c0 {2,S} {3,D}
2 H u0 p0 c0 {1,S}
3 C u0 p0 c0 {1,D} {4,S} {5,S}
4 N u2 p1 c0 {3,S}
5 H u0 p0 c0 {3,S}

so I think it's just an atom ordering issue. I'll see if there's an alternative SMILES that passes both before and after.

@rwest
Copy link
Member Author

rwest commented Mar 17, 2021

The unit tests now pass for me locally, on Mac OS X, where according to conda list I am using
cclib 1.7 pyhd3deb0d_0 conda-forge
but on Travis it is preferring
cclib 1.6.1.rmg py37h39e3cac_0 rmg
which uses the old openbabel imports.
The patch for the imports is at cclib/cclib@3f73c94 if someone wants to apply it to the "RMG" version of cclib and push new binaries to conda, but there may be other patches needed too, and it'd be vastly preferable to get back on track and use the official cclib.
Is there a reason we need our own? (and if so, why not for Mac OS X? or am I not testing Arkane or ARC locally or something?)

@rwest
Copy link
Member Author

rwest commented Mar 17, 2021

See #756 for the conversion to the official CCLib.
Possible issues in parsing mopac files, mm4 files, and some tweaks to the gaussian parser (rotational symmetry numbers, rotational constants, HOMOs?)

@rwest
Copy link
Member Author

rwest commented Mar 17, 2021

Current test failures in the travis log all to do with parsing mopac files with cclib:

======================================================================
8347ERROR: Test that Mocpac get_thermo_data() works correctly.
8348----------------------------------------------------------------------
8349Traceback (most recent call last):
8350  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mainTest.py", line 244, in test_get_thermo_data_mopac
8351    thermo1 = self.mop1.get_thermo_data(mol)
8352  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/main.py", line 216, in get_thermo_data
8353    thermo0 = qm_molecule_calculator.generate_thermo_data()
8354  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8355    self.qm_data = self.generate_qm_data()
8356  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 309, in generate_qm_data
8357    success = self.run()
8358  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 149, in run
8359    return self.verify_output_file()
8360  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8361    qm_data = self.parse()
8362  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8363    parser = self.get_parser(self.output_file_path)
8364  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8365    return cclib.parser.Mopac(output_file)
8366AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8367
8368======================================================================
8369ERROR: Test that run_jobs() works properly.
8370----------------------------------------------------------------------
8371Traceback (most recent call last):
8372  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mainTest.py", line 304, in test_run_jobs
8373    qm.run_jobs(spc_list, procnum=1)
8374  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/main.py", line 253, in run_jobs
8375    _write_qm_files_star(qm_arg)
8376  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/main.py", line 264, in _write_qm_files_star
8377    return _write_qm_files(*args)
8378  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/main.py", line 271, in _write_qm_files
8379    quantum_mechanics.get_thermo_data(mol)
8380  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/main.py", line 216, in get_thermo_data
8381    thermo0 = qm_molecule_calculator.generate_thermo_data()
8382  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8383    self.qm_data = self.generate_qm_data()
8384  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 309, in generate_qm_data
8385    success = self.run()
8386  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 149, in run
8387    return self.verify_output_file()
8388  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8389    qm_data = self.parse()
8390  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8391    parser = self.get_parser(self.output_file_path)
8392  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8393    return cclib.parser.Mopac(output_file)
8394AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8395
8396======================================================================
8397ERROR: Test that generate_thermo_data() works correctly for MOPAC PM3
8398----------------------------------------------------------------------
8399Traceback (most recent call last):
8400  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 88, in test_generate_thermo_data
8401    self.qmmol1.generate_thermo_data()
8402  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8403    self.qm_data = self.generate_qm_data()
8404  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 309, in generate_qm_data
8405    success = self.run()
8406  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 149, in run
8407    return self.verify_output_file()
8408  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8409    qm_data = self.parse()
8410  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8411    parser = self.get_parser(self.output_file_path)
8412  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8413    return cclib.parser.Mopac(output_file)
8414AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8415
8416======================================================================
8417ERROR: Test that generate_thermo_data() can load thermo from the previous MOPAC PM3 run.
8418----------------------------------------------------------------------
8419Traceback (most recent call last):
8420  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 107, in test_load_thermo_data
8421    self.qmmol1.generate_thermo_data()
8422  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8423    self.qm_data = self.generate_qm_data()
8424  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 298, in generate_qm_data
8425    if self.verify_output_file():
8426  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8427    qm_data = self.parse()
8428  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8429    parser = self.get_parser(self.output_file_path)
8430  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8431    return cclib.parser.Mopac(output_file)
8432AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8433
8434======================================================================
8435ERROR: Test that generate_thermo_data() works correctly for MOPAC PM6
8436----------------------------------------------------------------------
8437Traceback (most recent call last):
8438  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 152, in test_generate_thermo_data
8439    self.qmmol1.generate_thermo_data()
8440  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8441    self.qm_data = self.generate_qm_data()
8442  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 309, in generate_qm_data
8443    success = self.run()
8444  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 149, in run
8445    return self.verify_output_file()
8446  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8447    qm_data = self.parse()
8448  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8449    parser = self.get_parser(self.output_file_path)
8450  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8451    return cclib.parser.Mopac(output_file)
8452AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8453
8454======================================================================
8455ERROR: Test that generate_thermo_data() can load thermo from the previous MOPAC PM6 run.
8456----------------------------------------------------------------------
8457Traceback (most recent call last):
8458  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 172, in test_load_thermo_data
8459    self.qmmol1.generate_thermo_data()
8460  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8461    self.qm_data = self.generate_qm_data()
8462  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 298, in generate_qm_data
8463    if self.verify_output_file():
8464  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8465    qm_data = self.parse()
8466  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8467    parser = self.get_parser(self.output_file_path)
8468  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8469    return cclib.parser.Mopac(output_file)
8470AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8471
8472======================================================================
8473ERROR: Test that generate_thermo_data() works correctly for MOPAC PM7
8474----------------------------------------------------------------------
8475Traceback (most recent call last):
8476  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 218, in test_generate_thermo_data
8477    self.qmmol1.generate_thermo_data()
8478  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8479    self.qm_data = self.generate_qm_data()
8480  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 309, in generate_qm_data
8481    success = self.run()
8482  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 149, in run
8483    return self.verify_output_file()
8484  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8485    qm_data = self.parse()
8486  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8487    parser = self.get_parser(self.output_file_path)
8488  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8489    return cclib.parser.Mopac(output_file)
8490AttributeError: module 'cclib.parser' has no attribute 'Mopac'
8491
8492======================================================================
8493ERROR: Test that generate_thermo_data() can load thermo from the previous MOPAC PM7 run.
8494----------------------------------------------------------------------
8495Traceback (most recent call last):
8496  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopacTest.py", line 238, in test_load_thermo_data
8497    self.qmmol1.generate_thermo_data()
8498  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 359, in generate_thermo_data
8499    self.qm_data = self.generate_qm_data()
8500  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 298, in generate_qm_data
8501    if self.verify_output_file():
8502  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 215, in verify_output_file
8503    qm_data = self.parse()
8504  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/molecule.py", line 332, in parse
8505    parser = self.get_parser(self.output_file_path)
8506  File "/home/travis/build/ReactionMechanismGenerator/RMG-Py/rmgpy/qm/mopac.py", line 230, in get_parser
8507    return cclib.parser.Mopac(output_file)
8508AttributeError: module 'cclib.parser' has no attribute 'Mopac'

@rwest rwest mentioned this pull request Mar 18, 2021
@rwest rwest force-pushed the openbabel3 branch 2 times, most recently from 9147fe2 to 459de4e Compare March 24, 2021 01:42
@rwest rwest requested a review from alongd April 16, 2021 04:00
@codecov
Copy link

codecov bot commented Apr 16, 2021

Codecov Report

Merging #2088 (7eaa733) into master (b67a8b3) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2088      +/-   ##
==========================================
+ Coverage   47.58%   47.61%   +0.03%     
==========================================
  Files          89       89              
  Lines       23575    23575              
  Branches     6133     6133              
==========================================
+ Hits        11217    11226       +9     
+ Misses      11174    11166       -8     
+ Partials     1184     1183       -1     
Impacted Files Coverage Δ
arkane/encorr/data.py 93.30% <100.00%> (ø)
arkane/kinetics.py 12.24% <0.00%> (ø)
rmgpy/molecule/draw.py 53.52% <0.00%> (+0.77%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b67a8b3...7eaa733. Read the comment docs.

@rwest rwest marked this pull request as ready for review April 16, 2021 10:46
Copy link
Contributor

@mjohnson541 mjohnson541 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just the changes to pathfinderTest test_c6h6o4 should probably be squashed.

@alongd
Copy link
Member

alongd commented Apr 18, 2021

Looks good to me as well. I guess rmg-tests fail due to known reasons?
I'll create a twin ob3 PR in ARC.

@alongd
Copy link
Member

alongd commented Apr 18, 2021

I get a wrong Molecule object for phenanthrene (c1=ccc2c3cc=ccc3ccc2c1) on this branch:
image

@rwest
Copy link
Member Author

rwest commented Apr 18, 2021

Oh :-(

http://openbabel.org/docs/current/Features/Radicals.html

SMILES extensions for radicals

Finishes with:

This extension is not as robust or as carefully considered as standard SMILES and should be used with restraint. A structure that uses c as a radical centre close to aromatic carbons can be confusing to read, and Open Babel’s SMILES parser can also be confused. For example, it recognizes c1ccccc1c as the benzyl radical, but it doesn’t like c1cc(c)ccc1. Radical centres should not be involved in ring closure: for cyclohexyl radical C1cCCCC1 is ok, but c1CCCCC1 is not.

The URL says it's "current" documentation but the HTML says 2.3.1

Guess one could post to the openbabel user group mailing list

Update: found the documentation for newer version

https://open-babel.readthedocs.io/en/latest/Features/Radicals.html#smiles-extensions-for-radicals

@rwest
Copy link
Member Author

rwest commented Apr 18, 2021

It might be our conversion from OBMol to Molecule. Or it might be OpenBabel's parsing of the SMILES.

@alongd
Copy link
Member

alongd commented Apr 18, 2021

I get the correct Molecule, though, when I use the full SMILES C1=CCC2=C3CC=CCC3=CC=C2C1 w/o the lowercase c's

@alongd
Copy link
Member

alongd commented Apr 18, 2021

I requested "posting permission" for the open babel forum.

I got the same problem with c1cc1O. Other ccc.. SMILES I tried were OK. Meanwhile, we could direct users to avoid using this syntax, or wait for the bug to be resolved? Both aren't great options

@alongd
Copy link
Member

alongd commented Apr 18, 2021

Citing from @rwest's link:

This extension is not as robust or as carefully considered as standard SMILES and should be used with restraint. A structure that uses c as a radical centre close to aromatic carbons can be confusing to read, and Open Babel’s SMILES parser can also be confused

So I guess it's OK if it's "designed/known", although the previous version supported these cases better. We might want to document this though?

@mjohnson541
Copy link
Contributor

Do we want to wait and not include this in the release?

@rwest
Copy link
Member Author

rwest commented Apr 20, 2021

Does c1ccc2c(c1)ccc3ccccc23 work?

Not sure what the solution is.

Issue a warning when using lower case letters in a SMILES that things may not be what you expect? Add a note in documentation?

@rwest
Copy link
Member Author

rwest commented Apr 20, 2021

What were you expecting for c1cc1O ? The current rmg.mit.edu website gives cyclopropanol, as does OC1CC1, and https://cactus.nci.nih.gov/chemical/structure/c1cc1O/names/xml

@alongd
Copy link
Member

alongd commented Apr 21, 2021

For c1cc1O I do expect to get OC1CC1, but on this branch I get:

multiplicity 2
1 O u0 p2 c0 {2,S} {7,S}
2 C u0 p0 c0 {1,S} {3,S} {4,D}
3 C u1 p0 c0 {2,S} {4,S} {6,S}
4 C u0 p0 c0 {2,D} {3,S} {5,S}
5 H u0 p0 c0 {4,S}
6 H u0 p0 c0 {3,S}
7 H u0 p0 c0 {1,S}

image

for c1ccc2c(c1)ccc3ccccc23 I do get a reasonable solution on this branch:
image

I think this is not an ordinary "bug", but rather perhaps a "design choice" by OB's devs. I recommend documenting it. We could make RMG crush if any c1cc.... sort of SMILES shortcut ends up as a radical. You can certainty aim to have radicals with c1ccc...'s but it is likely that you got the SMILES correctly if you end up with a multiplicity 1, but I haven't tested thoroughly. This is a bit of a headache and not hermetic. Perhaps don't allow lowercase SMILES will be a better option?

@rwest
Copy link
Member Author

rwest commented Apr 21, 2021

Are you sure c1=ccc2c3cc=ccc3ccc2c1 represents phenanthrene?
Because if I remove the = from it and get c1ccc2c3ccccc3ccc2c1 then that is phenanthrene. According to wikipedia, "In the latter case, bonds between two aromatic atoms are assumed (if not explicitly shown) to be aromatic bonds". If you then explicitly state that a couple of the bonds are actually double bonds, I'm not sure what the algorithm ought to do (remove an H from each end of the bond? just put a double bond there when attempting to Kekulize? ignore it? crash?), but I'm not surprised it doesn't leave it exactly the same as without specifying the bonds, even if some implementations do. I would say that c1=ccc2c3cc=ccc3ccc2c1 is a bad (or at least ambiguous) input.

One way of thinking of the lower case letters is they essentially remove an H atom, so if C would have been CH2 (eg. in C1CCCCC1 cyclohexane) then c becomes CH (eg. in c1ccccc1 benzene). This way of thinking means C is CH4 and c is CH3., CCC is CH3CH2CH3 or C3H8 or propane and Ccc is CH3CHCH2 or C3H6 or propene CC=C. You could argue that "c actually represents an aromatic atom, and thus c and Ccc are chemically invalid and should just crash" but if you have to interpret it as something, then this "rule of thumb" seems as good as any.

So if C1CC1O is cyclopropanol C3H5OH, then c1cc1O should indeed be C3H2OH, which I would probably draw with a double bond and a radical if forced to do a Lewis structure, which is what (the new) OpenBabel does.

So I think, after digging through these, the new OpenBabel branch is behaving as I would expect it to, and you were testing it with some weird malformed SMILES. Is that unfair?

Should we add code to spit out warnings in case others do the same? Either way, I think we go ahead with the change to OpenBabel3

Screen Shot 2021-04-21 at 9 08 59 AM

Screen Shot 2021-04-21 at 9 09 10 AM

NewOpenBabel.ipynb.zip

Note: this notebook currently only tests SMILES string to OpenBabel, not all the way through to RMG Molecule objects

@alongd
Copy link
Member

alongd commented Apr 21, 2021

You are absolutely correct. Thanks for getting to the bottom of this, I agree with the interpretation that NOW OB behaves "as expected". I'm still uncomfortable with the interpretation of the (weird) c1=ccc2c3cc=ccc3ccc2c1 structure. But I see you point re "c equals CH", and that an explicit double bond is indeed ambiguous. Please go ahead with getting this PR in, as far as I can see it looks good. Thanks!

@mjohnson541
Copy link
Contributor

Alright lets get this in. Can you rebase?

rwest and others added 10 commits April 21, 2021 19:28
The release notes say it's an improvement
https://github.com/openbabel/openbabel/releases/tag/openbabel-3-0-0
but it does break some things in the API
https://open-babel.readthedocs.io/en/latest/UseTheLibrary/migration.html#migrating-to-3-0
so subsequent commits should fix RMG to be compatible.
The initial pull request in openbabel openbabel/openbabel#1576 sounds 
like it just broke radicals, to be fixed later.
A follow-up openbabel/openbabel#1626 somewhat fixed things but 
"Now reading undervalent atoms from SMILES does not set the spin (it did previously)", 
with a comment "I didn't add an 'op' to set the spin state of undervalent atoms. 
It could be done of course, but the syntax was sufficiently involved (how to specify which 
atoms?, what default atoms?) that I'd rather wait until someone actually has a use case". 

The code introduced there for how to *draw* radicals gives some clues as to how you 
could handle it though, using the expected valence.

That is the approach I take here. The SpinMultiplicity on the OBAtom object is left incorrect
or undefined, and we fix it when converting from_ob_mol to an RMG Molecule.
This test used to pass with OpenBabel < 3.0, but I think the inchi is invalid?
or at least not standard.
OpenBabel reports:
    Problems/mismatches: Mobile-H( Hydrogens: Locations or number, Number; Charge(s): Do not match)
and cactus.nci.nih.gov converts it to InChI=1S/C6H7O4/c1-2-4-9-6(7)3-5-10-8/h2-3,8H,1,5H2/q+1
which at least doesn't make OpenBabel complain. However, both have a net charge
and cause RMG to crash. I'm not sure what the molecule was ever supposed to represent.
The default is 0 so I won't call it for efficiency's sake,
but this will make sure we don't forget that it's now 
determined this way by OpenBabel 3
…passes

I think this test is equivalent, but writing the SMILES this way both
old OpenBabel and new OpenBabel agree that the N is mol.atoms[0] and
so the test passes.
This test had bene using an invalid InChI, which broke when
we upgraded OpenBabel. Replaced it with a valid molecule.
@rwest
Copy link
Member Author

rwest commented Apr 22, 2021

Rebased, and added a small note to the documentation.

Out of curiosity, I extended the notebook to compare the RDKit back end. In many cases it does similar things, it ends up with similar isomers for the C14H6 case but with different electronic structure. And it indeed crashes for some of the non-aromatic cases.

NewOpenBabel.pdf
NewOpenBabel.ipynb.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants