Redesign dictionary id use, add columnFromValues. #16

Merged
merged 1 commit into from
Sep 12, 2024

Conversation

jheer
Member

@jheer jheer commented Sep 12, 2024

This PR redesigns how dictionary ids are used.

  • Types: The dictionary() type constructor will set the id to a default value of -1 if not specified.
  • Decoding: Dictionary ids are decoded and written to dictionary types by tableFromIPC.
  • Encoding: The schema is analyzed and (potentially new) consecutive ids are assigned to each dictionary by tableToIPC. This enables effective serialization of tables that may have been built with conflicting dictionary ids.
  • Building: Builders will treat a non-negative dictionary id as a signal for potential reuse, allowing the same backing dictionary values to be used across multiple columns. In this case, clients must take care not to provide overlapping ids for semantically separate dictionaries.
  • Drop dictionaryTypes map from schema objects, as it is no longer needed. This also simplifies some of the decoding code.
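
To illustrate the encoding step described above, here is a minimal sketch of how consecutive ids might be assigned at serialization time. It is not the code in this PR: the `assignDictionaryIds` helper, the `kind` and `dictionary` properties, and the choice to identify a dictionary by the object identity of its backing values are all assumptions made for illustration, and nested types are ignored.

```javascript
// Hypothetical sketch, not the library's implementation.
// Assumption: each field is { name, type }, and a dictionary type carries
// a user-provided id plus a reference to its backing dictionary values.
function assignDictionaryIds(fields) {
  const idByDictionary = new Map(); // backing dictionary object -> new id
  const idByField = new Map();      // field name -> new id
  for (const { name, type } of fields) {
    if (type.kind !== 'dictionary') continue; // skip non-dictionary fields
    if (!idByDictionary.has(type.dictionary)) {
      // First time this backing dictionary is seen: take the next id.
      idByDictionary.set(type.dictionary, idByDictionary.size);
    }
    idByField.set(name, idByDictionary.get(type.dictionary));
  }
  return idByField;
}

const colors = ['red', 'green'];
const sizes = ['S', 'M', 'L'];
const fields = [
  // 'a' and 'c' were built with the same id but separate dictionaries.
  { name: 'a', type: { kind: 'dictionary', id: 7, dictionary: colors } },
  { name: 'b', type: { kind: 'int32' } },
  { name: 'c', type: { kind: 'dictionary', id: 7, dictionary: sizes } },
  // 'd' has the default id but shares its backing dictionary with 'a'.
  { name: 'd', type: { kind: 'dictionary', id: -1, dictionary: colors } },
];
const ids = assignDictionaryIds(fields);
```

In this sketch the ids the types were built with are ignored entirely, so two separate dictionaries that both claim id 7 still receive distinct consecutive ids, while columns that share one backing dictionary share one id.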

In addition, this PR adds a new builder method that allows values to be provided by a visitor function:

  • Add columnFromValues method. Instead of providing a materialized array, callers can provide a total length and a visit (or "scan") function that invokes a callback with each successive value.
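
The visitor pattern described above can be sketched without the library. The `collectFromVisitor` helper below is hypothetical and is not the columnFromValues signature from this PR; it only shows how a known length plus a scan function can stand in for a materialized array, assuming numeric values for simplicity.

```javascript
// Hypothetical helper illustrating the visitor ("scan") pattern.
// length: total number of values the visitor will produce.
// visit:  function that calls the provided callback once per value, in order.
function collectFromVisitor(length, visit) {
  const out = new Float64Array(length); // preallocated from the known length
  let i = 0;
  visit((value) => {
    if (i >= length) {
      throw new RangeError('visitor produced more values than the given length');
    }
    out[i++] = value;
  });
  return out;
}

// Values stored in a linked list: no intermediate array is needed.
const list = { value: 1, next: { value: 2, next: { value: 3, next: null } } };
const scan = (emit) => {
  for (let node = list; node; node = node.next) emit(node.value);
};
const column = collectFromVisitor(3, scan);
```

Because the total length is known up front, the output buffer can be allocated once, and the source data structure is traversed a single time.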

@jheer jheer merged commit 2350b68 into main Sep 12, 2024
2 checks passed
@jheer jheer deleted the jh/dict-ids branch September 12, 2024 18:31