-
Notifications
You must be signed in to change notification settings - Fork 976
[Variant] Adding code to store metadata and value references in VariantArray #7945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @abacef. I just worked on this too. I left a few suggestions that might help improve this PR. Feel free to take what works for you! 😊
@@ -88,7 +90,10 @@ impl VariantArray { | |||
)); | |||
}; | |||
// Ensure the StructArray has a metadata field of BinaryView | |||
let Some(metadata_field) = inner.fields().iter().find(|f| f.name() == "metadata") else { | |||
|
|||
let cloned = inner.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do the cloning at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know how it is possible since the metadata_field
and value_field
need to reference the cloned StructArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is believed that the metadata_field and value_field refer to the StructArray rather than the cloned StructArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When doing a clone, is it just incrementing the reference count to the function parameter? If so, maybe finding the metadata reference and the value reference on either the passed in StructArray or the clone will come up with the same value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. The final result is the same. However, what I want to say is that we can perform the clone operation after the verification. These verification and value retrieval operations may fail. If we place the clone operation at the end, we can avoid unnecessary operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to clone the argument here even if it will be wasteful on the error case, as the error case is not commonly expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -169,8 +184,13 @@ impl Array for VariantArray { | |||
} | |||
|
|||
fn slice(&self, offset: usize, length: usize) -> ArrayRef { | |||
let slice = self.inner.slice(offset, length); | |||
let met = VariantArray::find_metadata_field(&slice).unwrap(); | |||
let val = VariantArray::find_value_field(&slice).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about whether I can directly use the index without searching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is correct to slice the child arrays
so something like
let val = self.value_ref.slice(offset, length);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @abacef -- I think this looks great.
@codephage2020 has some good ideas and I left some other comments that I think are worth considering, but I don't think any of them are required to merge this PR
@@ -88,7 +90,10 @@ impl VariantArray { | |||
)); | |||
}; | |||
// Ensure the StructArray has a metadata field of BinaryView | |||
let Some(metadata_field) = inner.fields().iter().find(|f| f.name() == "metadata") else { | |||
|
|||
let cloned = inner.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to clone the argument here even if it will be wasteful on the error case, as the error case is not commonly expected
@@ -169,8 +184,13 @@ impl Array for VariantArray { | |||
} | |||
|
|||
fn slice(&self, offset: usize, length: usize) -> ArrayRef { | |||
let slice = self.inner.slice(offset, length); | |||
let met = VariantArray::find_metadata_field(&slice).unwrap(); | |||
let val = VariantArray::find_value_field(&slice).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is correct to slice the child arrays
so something like
let val = self.value_ref.slice(offset, length);
@@ -59,6 +59,8 @@ pub struct VariantArray { | |||
/// Dictionary-Encoded, preferably (but not required) with an index type of | |||
/// int8. | |||
inner: StructArray, | |||
metadata_ref: ArrayRef, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add some comment that explain this is a cached copy of the metadata / value child arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…cing existing columns instead of finding the columns again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM . 🚀
Thanks again @abacef and @codephage2020 |
Which issue does this PR close?
VariantArray
performance by storing the index of the metadata and value arrays #7920.Are these changes tested?
Tests were already implemented
Are there any user-facing changes?
None