Create a function that combines their statistical properties into a new distribution. The most appropriate approach is to create a GenericDistribution that approximates the mixture of the two input distributions.
This will require an accurate distribution (not just an approximation). Using ProgressiveEval requires, for correctness, that we know the ranges do not overlap.
Will require an accurate distribution (not just an approximation)
Yes, it depends on whether each input distribution is accurate. If both are, the merged distribution should be accurate too; otherwise we should merge them conservatively.
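The rule in the reply above can be sketched as follows. This is only an illustration under stated assumptions: the `Estimate` enum and `merge_row_counts` are hypothetical names introduced here, not the DataFusion API, and a row count stands in for any statistic that merges by addition.

```rust
/// Hypothetical stand-in for a statistic that records whether it is exact.
/// (Illustrative only; not the actual DataFusion type.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Estimate {
    Exact(u64),
    Inexact(u64),
    Absent,
}

/// Merge two row-count estimates by summing them.
/// The result is exact only when both inputs are exact; if either input
/// is merely an estimate, the sum is an estimate; if either is missing,
/// the sum is unknown.
fn merge_row_counts(a: Estimate, b: Estimate) -> Estimate {
    use Estimate::*;
    match (a, b) {
        (Exact(x), Exact(y)) => Exact(x.saturating_add(y)),
        (Exact(x) | Inexact(x), Exact(y) | Inexact(y)) => Inexact(x.saturating_add(y)),
        _ => Absent,
    }
}
```

Returning `Absent` when one side is missing is the most conservative choice; a real implementation might instead keep the known side as a lower bound.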
Is your feature request related to a problem or challenge?
I'm working on ticket #10316.
We are going to replace all Precision with Distribution (synnada-ai#63), so while I design #10316 I will presumably use Distribution in statistics.
There is a spot where I need to merge statistics, and that merge will carry over to Distribution.
The specific case is that I need to compute partition-level statistics: files are grouped into file groups, each file group is treated as a partition, and different partitions are processed in parallel. The partition-level statistics therefore come from merging the statistics of the files in a file group.
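A minimal sketch of that per-file-group merge, assuming each file carries a row count and an optional min/max for a single column. `FileStats`, `merge_two`, and `partition_stats` are hypothetical names used only for illustration; they are not existing DataFusion items.

```rust
/// Hypothetical per-file statistics for one column.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: u64,
    min: Option<i64>,
    max: Option<i64>,
}

/// Merge the statistics of two files.
/// Row counts add up; min/max are known only if known for both files.
fn merge_two(a: FileStats, b: FileStats) -> FileStats {
    FileStats {
        num_rows: a.num_rows + b.num_rows,
        min: a.min.zip(b.min).map(|(x, y)| x.min(y)),
        max: a.max.zip(b.max).map(|(x, y)| x.max(y)),
    }
}

/// Produce one merged statistics value per file group (partition).
/// An empty file group yields `None`.
fn partition_stats(file_groups: Vec<Vec<FileStats>>) -> Vec<Option<FileStats>> {
    file_groups
        .into_iter()
        .map(|group| group.into_iter().reduce(merge_two))
        .collect()
}
```

Because each group is reduced independently, the groups could be merged in parallel, matching how the partitions themselves are processed.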
Describe the solution you'd like
Create a function that takes two distributions and combines their statistical properties into a new distribution. The most appropriate approach is to create a GenericDistribution that approximates the mixture of the two input distributions.
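To make the mixture idea concrete, here is a rough sketch of how the mean, variance, and range of a two-component mixture could be computed. It assumes each input is weighted by its row count, which the issue does not specify. The `Summary` struct and `merge_as_mixture` function are hypothetical and are not DataFusion's GenericDistribution.

```rust
/// Hypothetical summary of a distribution (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Summary {
    mean: f64,
    variance: f64,
    lower: f64,
    upper: f64,
}

/// Summarize the mixture of `a` and `b`, weighting each by its row count
/// (the weighting scheme is an assumption of this sketch).
/// Returns `None` when both inputs are empty.
fn merge_as_mixture(a: Summary, rows_a: u64, b: Summary, rows_b: u64) -> Option<Summary> {
    let total = rows_a + rows_b;
    if total == 0 {
        return None;
    }
    let wa = rows_a as f64 / total as f64;
    let wb = 1.0 - wa;
    let mean = wa * a.mean + wb * b.mean;
    // E[X^2] of a mixture is the weighted sum of each component's E[X^2],
    // where E[X^2] = variance + mean^2.
    let second_moment =
        wa * (a.variance + a.mean * a.mean) + wb * (b.variance + b.mean * b.mean);
    Some(Summary {
        mean,
        // Clamp at zero to guard against tiny negative rounding errors.
        variance: (second_moment - mean * mean).max(0.0),
        // The merged range is the smallest interval covering both inputs.
        lower: a.lower.min(b.lower),
        upper: a.upper.max(b.upper),
    })
}
```

The mean and variance of the mixture follow exactly from the inputs' moments and weights, but quantities such as the median cannot be derived from these moments alone, which is one reason the merged result is only an approximation.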
I'll open a PR and we can do more discussions based on the PR.
Describe alternatives you've considered
No
Additional context
No