Description
In #18449 we made InList set handling generic for all types. However, I think we lost some type-specific specializations in the process, which may have slowed things down.
The idea is to improve InList performance by using a specialized HashSet for each data type, thus avoiding dynamic dispatch on the data type.
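Below is a minimal sketch of the difference, using only the Rust standard library. It assumes plain `i32` slices stand in for Arrow arrays, and `ErasedSet`, `ErasedInt32Set`, `eval_erased` and `eval_specialized` are hypothetical names, not DataFusion's actual InList code. The erased path pays a virtual call plus a downcast for every element. The specialized path knows the concrete `HashSet<i32>` at compile time, so each probe is a direct hash lookup.

```rust
use std::any::Any;
use std::collections::HashSet;

/// Hypothetical type-erased membership test: the element type is hidden
/// behind `dyn Any`, so every probe is a virtual call plus a downcast.
trait ErasedSet {
    fn contains_erased(&self, value: &dyn Any) -> bool;
}

struct ErasedInt32Set(HashSet<i32>);

impl ErasedSet for ErasedInt32Set {
    fn contains_erased(&self, value: &dyn Any) -> bool {
        // A value of the wrong type is simply reported as "not found".
        value
            .downcast_ref::<i32>()
            .map_or(false, |v| self.0.contains(v))
    }
}

/// Generic path: one dynamic dispatch per element.
fn eval_erased(set: &dyn ErasedSet, column: &[i32]) -> Vec<bool> {
    column.iter().map(|v| set.contains_erased(v)).collect()
}

/// Specialized path: the concrete `HashSet<i32>` is known at compile time,
/// so the loop body is a direct lookup the compiler can inline.
fn eval_specialized(set: &HashSet<i32>, column: &[i32]) -> Vec<bool> {
    column.iter().map(|v| set.contains(v)).collect()
}
```

Both functions return the same result. Only the amount of indirection inside the per-element loop differs.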
In #18449 we implemented such a specialization for Int32, but we should probably do the same for all the types that previously had a specialization:
- All primitive types (Int8, Int32, etc.)
- Boolean
- Utf8/LargeUtf8/Utf8View
- Binary/LargeBinary/BinaryView
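One way to cover every type in the list above without hand-writing each set is a single generic struct that the compiler monomorphizes per element type. The sketch below is hypothetical (`InListSet` is an illustrative name), and it assumes `&str` and `&[u8]` slices can stand in for the Utf8/LargeUtf8/Utf8View and Binary/LargeBinary/BinaryView layouts. It follows standard SQL three-valued `IN` semantics for NULLs.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical per-type InList set. It is generic over `T`, but the compiler
/// emits a separate, fully specialized copy for each concrete `T` used.
struct InListSet<T: Hash + Eq> {
    values: HashSet<T>,
    has_null: bool,
}

impl<T: Hash + Eq> InListSet<T> {
    /// Builds the set from the IN-list literals; `None` represents NULL.
    fn new(list: impl IntoIterator<Item = Option<T>>) -> Self {
        let mut values = HashSet::new();
        let mut has_null = false;
        for item in list {
            match item {
                Some(v) => {
                    values.insert(v);
                }
                None => has_null = true,
            }
        }
        Self { values, has_null }
    }

    /// SQL semantics: NULL input -> NULL; match -> true;
    /// no match -> NULL if the list contains NULL, otherwise false.
    fn eval(&self, column: &[Option<T>]) -> Vec<Option<bool>> {
        column
            .iter()
            .map(|v| match v {
                None => None,
                Some(v) if self.values.contains(v) => Some(true),
                Some(_) if self.has_null => None,
                Some(_) => Some(false),
            })
            .collect()
    }
}
```

For example, `InListSet<i8>`, `InListSet<bool>`, `InListSet<&str>` and `InListSet<&[u8]>` each get their own specialized lookup loop with no per-element dispatch.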
As @adriangb says:
> I'm surprised that doing dynamic dispatch once per batch we evaluate as opposed to twice per batch we evaluate makes that much of a difference. What would make sense that makes a difference to me is doing it once per element vs. once per batch. But I guess that's what benchmarks say!
>
> That does leave me with a question... could we squeeze out even more performance if we specialize for ~ all scalar types? It wouldn't be that hard to write a macro and have AI do the copy pasta of implementing it for all of the types... I'll open a follow up ticket.
Originally posted by @adriangb in #18449 (comment)
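The macro approach suggested above could look roughly like the following sketch. It assumes a hypothetical `Column` enum as a stand-in for an Arrow array, and `InListFilter` and `specialized_set!` are illustrative names, not existing DataFusion or arrow-rs APIs. The trait-object call and the type check on the column happen once per batch. The inner loop over elements is monomorphic.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an Arrow array: one variant per physical type.
enum Column {
    Int8(Vec<i8>),
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Evaluated once per batch through a trait object.
trait InListFilter {
    /// Returns `None` if the column type does not match the set type.
    fn eval_batch(&self, column: &Column) -> Option<Vec<bool>>;
}

/// Generates one specialized set type per (variant, element type) pair.
macro_rules! specialized_set {
    ($name:ident, $variant:ident, $ty:ty) => {
        struct $name(HashSet<$ty>);

        impl InListFilter for $name {
            fn eval_batch(&self, column: &Column) -> Option<Vec<bool>> {
                match column {
                    // Single type check per batch; the loop below has no
                    // virtual calls and works on the concrete element type.
                    Column::$variant(values) => {
                        Some(values.iter().map(|v| self.0.contains(v)).collect())
                    }
                    _ => None,
                }
            }
        }
    };
}

specialized_set!(Int8Set, Int8, i8);
specialized_set!(Int32Set, Int32, i32);
specialized_set!(Int64Set, Int64, i64);
specialized_set!(Utf8Set, Utf8, String);
```

Adding another type is then one more `specialized_set!` line (plus a `Column` variant in this toy model), which is the "copy pasta" the macro would absorb.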