Skip to content

[Variant] Adding code to store metadata and value references in VariantArray #7945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 18, 2025

Conversation

abacef
Copy link
Contributor

@abacef abacef commented Jul 16, 2025

Which issue does this PR close?

Are these changes tested?

Tests were already implemented

Are there any user-facing changes?

None

Copy link
Contributor

@codephage2020 codephage2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @abacef. I just worked on this too. I left a few suggestions that might help improve this PR. Feel free to take what works for you! 😊

@@ -88,7 +90,10 @@ impl VariantArray {
));
};
// Ensure the StructArray has a metadata field of BinaryView
let Some(metadata_field) = inner.fields().iter().find(|f| f.name() == "metadata") else {

let cloned = inner.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do the cloning at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know how it is possible since the metadata_field and value_field need to reference the cloned StructArray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is believed that the metadata_field and value_field refer to the StructArray rather than the cloned StructArray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When doing a clone, is it just incrementing the reference count to the function parameter? If so, maybe finding the metadata reference and the value reference on either the passed in StructArray or the clone will come up with the same value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. The final result is the same. However, what I want to say is that we can perform the clone operation after the verification. These verification and value retrieval operations may fail. If we place the clone operation at the end, we can avoid unnecessary operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok to clone the argument here even if it will be wasteful on the error case, as the error case is not commonly expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -169,8 +184,13 @@ impl Array for VariantArray {
}

fn slice(&self, offset: usize, length: usize) -> ArrayRef {
let slice = self.inner.slice(offset, length);
let met = VariantArray::find_metadata_field(&slice).unwrap();
let val = VariantArray::find_value_field(&slice).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about whether I can directly use the index without searching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is correct to slice the child arrays

so something like

        let val = self.value_ref.slice(offset, length);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, makes sense.

@alamb alamb changed the title Adding code to store metadata and value references in VariantArray [Variant] Adding code to store metadata and value references in VariantArray Jul 17, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @abacef -- I think this looks great.

@codephage2020 has some good ideas and I left some other comments that I think are worth considering, but I don't think any of them are required to merge this PR

@@ -88,7 +90,10 @@ impl VariantArray {
));
};
// Ensure the StructArray has a metadata field of BinaryView
let Some(metadata_field) = inner.fields().iter().find(|f| f.name() == "metadata") else {

let cloned = inner.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok to clone the argument here even if it will be wasteful on the error case, as the error case is not commonly expected

@@ -169,8 +184,13 @@ impl Array for VariantArray {
}

fn slice(&self, offset: usize, length: usize) -> ArrayRef {
let slice = self.inner.slice(offset, length);
let met = VariantArray::find_metadata_field(&slice).unwrap();
let val = VariantArray::find_value_field(&slice).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is correct to slice the child arrays

so something like

        let val = self.value_ref.slice(offset, length);

@@ -59,6 +59,8 @@ pub struct VariantArray {
/// Dictionary-Encoded, preferably (but not required) with an index type of
/// int8.
inner: StructArray,
metadata_ref: ArrayRef,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add some comment that explain this is a cached copy of the metadata / value child arrays

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@codephage2020 codephage2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM . 🚀

@alamb alamb merged commit a984ca7 into apache:main Jul 18, 2025
9 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 18, 2025

Thanks again @abacef and @codephage2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Variant] Improve VariantArray performance by storing the index of the metadata and value arrays
3 participants