Skip to content

Conversation

@AnIrishDuck
Copy link
Collaborator

@AnIrishDuck AnIrishDuck commented Feb 26, 2025

This was pretty much entirely written by Claude via aider. The code quality isn't amazing in my opinion, but it's working on a pretty rough base so that's fair.

Here's what I did:

  • Generate a panics.md via Claude. Prompt it to find any unsafe methods in any of the array methods (which had to be fed in piecemeal as there are tons) and suggest safe replacements.
  • Prompt claude to use said panics.md file to remove any unsafe method calls in deserialize.rs. This generated a lot of errors, I just used aider to feed those errors into claude until they were fixed. Fortunately it didn't seem to get stuck in a loop.
  • Then prompt claude to remove any unwrap, expect, and todo! that it possibly could. This required some precise prompting, it kept wanting to convert a "todo" comment block for unsupported array types into busted code.

I will leave my own comments in here where I have found some non-optimal things.

I will say this is one case where just having Claude hammer on the problem and error messages was very nice. Threading Result values through various method calls is very grunty work that's always unpleasant.

hasher.write(&integer.to_le_bytes());
(hasher.finish(), value)
integer.and_then(|integer_result| {
integer_result.ok().map(|integer| {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not amazing, as bad float values will be converted to NULL. But it's how things currently are for other issues in this parsing code, and it's an improvement over panic-ing.

) -> Result<(), Error> {
let iter = rows.iter().map(|row| match row.borrow() {
Value::Number(number) => Some(deserialize_int_single(*number)),
Value::Number(number) => deserialize_int_single(*number).ok(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

DataType::FixedSizeList(_, _) => Box::new(FixedSizeListArray::new_empty(data_type)),
DataType::Decimal(_, _) => Box::new(PrimitiveArray::<i128>::new_empty(data_type)),
*/
DataType::FixedSizeBinary(_) => Ok(Box::new(FixedSizeBinaryArray::new_empty(data_type))),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the block it kept wanting to flesh out.

DataType::Utf8 => Box::new(MutableUtf8Array::<i32>::new()),
DataType::LargeUtf8 => Box::new(MutableUtf8Array::<i64>::new()),
DataType::FixedSizeList(inner, size) => Box::new(MutableFixedSizeListArray::<_>::new_from(
DataType::FixedSizeList(inner, size) => Box::new(MutableFixedSizeListArray::<_>::try_new_from(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of stuff like this of somewhat-questionable value as the data type coming in should be correct. But it's mostly just being overly paranoid, so I don't think it hurts and I don't want to analyze it further.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because you are paranoid doesn't mean people aren't trying to stuff ints in floats...

@pnehrer
Copy link

pnehrer commented Feb 26, 2025

This is cool. @AnIrishDuck can you share panics.md somewhere? 😁

Comment on lines -46 to +62
Value::Bool(number) => Some(i64::from(*number)),
Value::Bool(number) => Some(Ok(i64::from(*number))),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes type inside Option, so it either didn't work correctly earlier, or won't work now.
Or there is something else that knows how to coerce Result<T> into T?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was what I was referencing in the comment below. If I'm reading this right, it creates a Result that then immediately gets downcast into an Option. Not ideal, but this was already the state of affairs with the larger parsing code. It'll be more annoying to fix that.

let integer = fraction.split(|x| *x == b'.').next().unwrap();
let mut integer: T = lexical_core::parse(integer).unwrap();
let integer = fraction.split(|x| *x == b'.').next().ok_or_else(|| {
Error::ExternalFormat("Failed to parse float: no bytes".to_string())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, one can see it was robot - human would create a nice constructor for Error::ExternalFormat

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was also kinda forced to not do that via context (it was only given this file and the above panics.md), FWIW.

I wonder if different prompting would get it to do the more elegant thing (not enough to actually bother trying this time around haha)

@imp
Copy link

imp commented Feb 27, 2025

Does it make sense to first catchup with the upstream? Diverging further may make the syncing effort progressively larger

@AnIrishDuck
Copy link
Collaborator Author

@imp upstream is no longer maintained. We'd need to switch to something else entirely, which is going to be a big and annoying lift.

We've started getting everyone on board with that, but buy in will take time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants