Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions src/graphmol/rw_mol.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,11 +48,17 @@ impl RWMol {
ROMol { ptr }
}

pub fn from_smarts(smarts: &str) -> Result<Self, Box<dyn std::error::Error>> {
pub fn from_smarts(smarts: &str) -> Result<Self, &str> {
if smarts.is_empty(){ return Err("empty SMARTS"); }
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I do not study chemistry myself, but informatics, I am not sure, if this is the wanted behavior or if you'd prefer a wildcard here or if maybe an empty string is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably don't want to allow an empty smarts, so this works for me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for contributing this PR! It's close but I think a little guidance would help. First off, it's good to reiterate the structure of this git repository. It contains two rust crates: rdkit and rdkit-sys. The sys crate is where pointers are built and where rust meets the C++ code of rdkit. You want all exception handling to happen in the sys crate. The rdkit crate is for high-level convenience. Every pointer type from rdkit-sys is wrapped in a struct -- with hidden pointers, as you've seen, we wan't to create a "nanny state" where our users can't do unsafe things. I believe you want to take a look at the sys crate as part of your bug fix.

Now for my feedback on this code snippet:

Using the Result type is absolutely the right way to go -- we need to discriminate between good values and error values. My only objection is the type on the error path, &str is not very Rust-y. We want to avoid "stringly-typed" data and instead use more enums.

See

pub fn from_smiles(smiles: &str) -> Result<Self, ROMolError> {
let_cxx_string!(smiles_cxx_string = smiles);
let ptr = ro_mol_ffi::smiles_to_mol(&smiles_cxx_string);
match ptr {
Ok(ptr) => {
if ptr.is_null() {
Err(ROMolError::UnknownConversionError)
} else {
Ok(ROMol { ptr })
}
}
Err(e) => Err(ROMolError::ConversionException(e.to_string())),
}
}
for an example of how we build enum errors when working with SMILES. I believe you want to make a RWMolError enum and instantiate it in your error path.

My next concern is performing condition checking in rw_mol.rs itself -- in general I want rdkit (c++) to drive what happens. It's very possible in the future our friends at the RDKit project could change how empty strings behave (although unlikely) and I don't want the Rust logic to be at odds with whatever they decide. Instead, check if the returned pointer is null and if so, build the appropriate enum.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the detailed feedback! You are totally right about the enums and I am going to adapt it to the handling of errors in ro_mol.rs.
Concerning the empty SMARTS, I checked what happens in Python and they return a valid Mol object that doesn't have any contents (so not a nullpointer, but empty). So maybe we should go for that, too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to model what the C++ library does -- it does mean that python folks are going to have to rethink how they use rdkit. The rdkit-rs implementation puts errors upfront and in your face. Clear error propagation is important for our users of the the cheminee search engine, for example. We don't want too much "magic".

So think of this project as C++ flavored RDKit 😄. Hopefully you are familiar with the autogenerated rdkit docs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a deeper look at the rdkit-Documentation, though I must say I have never dealt with C++. I still found an interesting comment in the actual C++ code stating: // empty strings produce empty molecules. So maybe having an error in Rust at this part would not really fit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, please address my last comment and I'm happy to merge this and push it up to crates.io. Thanks for your perseverance and patience!

let_cxx_string!(smarts = smarts);

let ptr = rdkit_sys::rw_mol_ffi::smarts_to_mol(&smarts)?;
Ok(RWMol { ptr })
let ptr = rdkit_sys::rw_mol_ffi::smarts_to_mol(&smarts).unwrap();
if ptr.is_null() {
Err("invalid SMARTS")
} else {
Ok(RWMol { ptr })

}
}
}

Expand Down
16 changes: 16 additions & 0 deletions tests/test_graphmol.rs
Original file line number Diff line number Diff line change
Expand Up @@ -294,3 +294,19 @@ fn test_building_rwmol_from_smarts() {
let result = substruct_match(&ro_mol, &query_mol, &SubstructMatchParameters::default());
assert_eq!(result.len(), 0);
}

#[test]
fn test_building_rwmol_from_invalid_smarts() {
let smarts = "string";
let e =RWMol::from_smarts(smarts).unwrap_err();
assert_eq!(e, "invalid SMARTS");

}

#[test]
fn test_building_rwmol_from_empty_smarts() {
let smarts = "";
let e =RWMol::from_smarts(smarts).unwrap_err();
assert_eq!(e, "empty SMARTS");

}