StatisticsV2: initial statistics framework redesign #14699

Merged: 100 commits merged on Feb 24, 2025
Commits
b4e8668
StatisticsV2: initial definition and validation method implementation
Jan 8, 2025
992b3c0
Implement mean, median and standard deviation extraction for StatsV2
Fly-Style Jan 9, 2025
e1d9395
Move stats_v2 to `physical-expr` package
Fly-Style Jan 10, 2025
1e27828
Introduce `ExprStatisticGraph` and `ExprStatisticGraphNode`
Fly-Style Jan 11, 2025
674926f
Split the StatisticsV2 and statistics graph locations, prepare the in…
Fly-Style Jan 12, 2025
943b5f1
Merge branch 'apache_main' into feat/statistics_v2
Fly-Style Jan 14, 2025
45f151e
Calculate variance instead of std_dev
Fly-Style Jan 15, 2025
7329c03
Create a skeleton for statistics bottom-up evaluation
Fly-Style Jan 14, 2025
bf6d01c
Introduce high-level test for 'evaluate_statistics()'
Fly-Style Jan 16, 2025
add838c
Refactor result distribution computation during the statistics evalua…
Fly-Style Jan 16, 2025
d52af46
Always produce Unknown distribution in non-mentioned combination case…
Fly-Style Jan 16, 2025
81b756d
Introduce Bernoulli distribution to be used as result of comparisons …
Fly-Style Jan 17, 2025
a35440b
Implement initial statistics propagation of Uniform and Unknown distr…
Fly-Style Jan 20, 2025
518d56c
Implement evaluate_statistics for logical not and unary negation oper…
Fly-Style Jan 21, 2025
a1bbfce
Fix and add tests; make fmt happy
Fly-Style Jan 25, 2025
d9c2d83
Add integration test, implement conversion into Bernoulli distributio…
Fly-Style Jan 26, 2025
c3df2a6
Finish test, small cleanup
Fly-Style Jan 27, 2025
a00e382
minor improvements
berkaysynnada Jan 29, 2025
6245218
Update stats.rs
berkaysynnada Jan 29, 2025
b8068b8
Addressing review comments
Fly-Style Jan 29, 2025
140fb5e
Implement median computation for Gaussian-Gaussian pair
Fly-Style Jan 31, 2025
5630ff9
Update stats_v2.rs
berkaysynnada Feb 1, 2025
c3b96ed
minor improvements
berkaysynnada Feb 2, 2025
f4dd402
Addressing second review comments, part 1
Fly-Style Feb 2, 2025
eac21f1
Return true in other cases
Fly-Style Feb 2, 2025
10e187f
Finish addressing review requests, part 2
Fly-Style Feb 4, 2025
ee4f7ab
final clean-up
berkaysynnada Feb 4, 2025
4f190b8
bug fix
berkaysynnada Feb 4, 2025
9d8baa5
final clean-up
berkaysynnada Feb 4, 2025
1523ffc
apply reverse logic in stats framework as well
berkaysynnada Feb 4, 2025
d78b19d
Merge branch 'feat/statistics_v2' of https://github.com/Fly-Style/dat…
berkaysynnada Feb 4, 2025
f78bb22
Update cp_solver.rs
berkaysynnada Feb 4, 2025
29db57f
revert data.parquet
berkaysynnada Feb 4, 2025
be61480
Apply suggestions from code review
ozankabak Feb 5, 2025
2d108e3
Update datafusion/physical-expr-common/src/stats_v2.rs
ozankabak Feb 5, 2025
e6e35e9
Update datafusion/physical-expr-common/src/stats_v2.rs
ozankabak Feb 5, 2025
19903c3
Apply suggestions from code review
ozankabak Feb 5, 2025
ea94aae
Merge branch 'apache_main' into feat/statistics_v2
Fly-Style Feb 5, 2025
3c6f44b
Fix compilation issue
Fly-Style Feb 5, 2025
6305ebf
Fix mean/median formula for exponential distribution
Fly-Style Feb 5, 2025
ef87360
casting + exp dir + remove opt's + is_valid refactor
berkaysynnada Feb 6, 2025
7baa86f
Update stats_v2_graph.rs
berkaysynnada Feb 6, 2025
8b82289
remove inner mod
berkaysynnada Feb 6, 2025
889376e
last todo: bernoulli propagation
berkaysynnada Feb 6, 2025
243fad8
Apply suggestions from code review
ozankabak Feb 7, 2025
cf965a4
Apply suggestions from code review
ozankabak Feb 7, 2025
48f2b5e
prop_stats in binary
berkaysynnada Feb 9, 2025
8fddaba
Update binary.rs
berkaysynnada Feb 9, 2025
a18b0dd
rename intervals
berkaysynnada Feb 9, 2025
bc4f7d4
block explicit construction
berkaysynnada Feb 10, 2025
97f0408
test updates
berkaysynnada Feb 10, 2025
22ea336
Update binary.rs
berkaysynnada Feb 10, 2025
77b8f20
revert renaming
berkaysynnada Feb 10, 2025
7dca5ee
impl range methods as well
berkaysynnada Feb 10, 2025
bcf2752
Apply suggestions from code review
ozankabak Feb 10, 2025
d390227
Apply suggestions from code review
ozankabak Feb 10, 2025
9f54e94
Update datafusion/physical-expr-common/src/stats_v2.rs
ozankabak Feb 10, 2025
b769daf
Update stats_v2.rs
berkaysynnada Feb 10, 2025
aebe139
fmt
berkaysynnada Feb 10, 2025
7d7dd92
fix bernoulli or eval
berkaysynnada Feb 10, 2025
6b6ce31
fmt
berkaysynnada Feb 10, 2025
42804c9
Review
ozankabak Feb 10, 2025
65cdc05
Review Part 2
ozankabak Feb 10, 2025
9b2a115
not propagate
berkaysynnada Feb 10, 2025
a8e0274
clean-up
berkaysynnada Feb 10, 2025
ba8126b
Review Part 3
ozankabak Feb 10, 2025
4f95d27
Review Part 4
ozankabak Feb 11, 2025
c75c782
Review Part 5
ozankabak Feb 11, 2025
7ebf43e
Review Part 6
ozankabak Feb 11, 2025
bab97d8
Review Part 7
ozankabak Feb 11, 2025
818eba4
Review Part 8
ozankabak Feb 11, 2025
da2ce64
Review Part 9
ozankabak Feb 11, 2025
0cc8a2d
Review Part 10
ozankabak Feb 11, 2025
eee36de
Review Part 11
ozankabak Feb 11, 2025
07a070f
Review Part 12
ozankabak Feb 11, 2025
320aeb6
Review Part 13
ozankabak Feb 11, 2025
9d2a6f3
Review Part 14
ozankabak Feb 11, 2025
712d936
Review Part 15 | Fix equality comparisons between uniform distributions
ozankabak Feb 12, 2025
2163c76
Review Part 16 | Remove unnecessary temporary file
ozankabak Feb 12, 2025
c23974e
Review Part 17 | Leave TODOs for real-valued summary statistics
ozankabak Feb 12, 2025
fee3023
Review Part 18
ozankabak Feb 12, 2025
2566c91
Review Part 19 | Fix variance calculations
ozankabak Feb 12, 2025
195cd50
Review Part 20 | Fix range calculations
ozankabak Feb 12, 2025
d753eeb
Review Part 21
ozankabak Feb 12, 2025
b081ea4
Review Part 22
ozankabak Feb 12, 2025
f8cade9
Review Part 23
ozankabak Feb 13, 2025
44b6b64
Review Part 24 | Add default implementations for evaluate_statistics …
ozankabak Feb 13, 2025
54030b8
Review Part 25 | Improve docs, refactor statistics graph code
ozankabak Feb 14, 2025
940adbc
Review Part 26
ozankabak Feb 14, 2025
ad2c518
Review Part 27
ozankabak Feb 14, 2025
081e3f3
Review Part 28 | Remove get_zero/get_one, simplify propagation in sta…
ozankabak Feb 14, 2025
6d0e9e3
Review Part 29
ozankabak Feb 14, 2025
6854c68
Review Part 30 | Move statistics-combining functions to core module, …
ozankabak Feb 14, 2025
3192eef
Review Part 31
ozankabak Feb 14, 2025
47dc8f7
Review Part 32 | Module reorganization
ozankabak Feb 14, 2025
d5741cd
Review Part 33
ozankabak Feb 14, 2025
cc99a9d
Add tests for bernoulli and gaussians combination
Fly-Style Feb 16, 2025
3c0afec
Incorporate community feedback
ozankabak Feb 23, 2025
aa4b02e
Merge branch 'main' into feat/statistics_v2
ozankabak Feb 23, 2025
17d7a7a
Fix merge issue
ozankabak Feb 23, 2025
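Several commits above describe the ideas this PR builds on: distribution kinds (Uniform, Gaussian, Exponential, Bernoulli, Unknown), extraction of mean, median and variance, Bernoulli results for comparisons, and propagation through logical NOT and unary negation. The following is a minimal, self-contained sketch of those ideas under our own assumptions; `Dist`, `gt_const`, `logical_not`, `negate` and `add` are hypothetical names and do not reproduce DataFusion's actual statistics API. It assumes scalar `f64` parameters, an Exponential parameterized by its rate alone, and independent operands.

```rust
/// Hypothetical, simplified distribution type; NOT DataFusion's actual API.
/// Assumes scalar f64 parameters and independent operands.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dist {
    /// Continuous uniform on [lo, hi].
    Uniform { lo: f64, hi: f64 },
    /// Normal with the given mean and variance.
    Gaussian { mean: f64, var: f64 },
    /// Exponential on [0, inf) with rate > 0 (assumed parameterization).
    Exponential { rate: f64 },
    /// Boolean that is true with probability p.
    Bernoulli { p: f64 },
    /// No information available.
    Unknown,
}

impl Dist {
    pub fn mean(&self) -> Option<f64> {
        match *self {
            Dist::Uniform { lo, hi } => Some((lo + hi) / 2.0),
            Dist::Gaussian { mean, .. } => Some(mean),
            Dist::Exponential { rate } => Some(1.0 / rate),
            Dist::Bernoulli { p } => Some(p),
            Dist::Unknown => None,
        }
    }

    pub fn median(&self) -> Option<f64> {
        match *self {
            Dist::Uniform { lo, hi } => Some((lo + hi) / 2.0),
            Dist::Gaussian { mean, .. } => Some(mean),
            // Solving 1 - exp(-rate * m) = 1/2 gives m = ln(2) / rate.
            Dist::Exponential { rate } => Some(std::f64::consts::LN_2 / rate),
            Dist::Bernoulli { p } if p < 0.5 => Some(0.0),
            Dist::Bernoulli { p } if p > 0.5 => Some(1.0),
            // At p == 0.5 every value in [0, 1] is a median, so report none.
            Dist::Bernoulli { .. } | Dist::Unknown => None,
        }
    }

    pub fn variance(&self) -> Option<f64> {
        match *self {
            Dist::Uniform { lo, hi } => Some((hi - lo).powi(2) / 12.0),
            Dist::Gaussian { var, .. } => Some(var),
            Dist::Exponential { rate } => Some(1.0 / (rate * rate)),
            Dist::Bernoulli { p } => Some(p * (1.0 - p)),
            Dist::Unknown => None,
        }
    }
}

/// `X > c` for X ~ Uniform[lo, hi] yields Bernoulli(P(X > c)).
pub fn gt_const(x: Dist, c: f64) -> Dist {
    match x {
        Dist::Uniform { lo, hi } if hi > lo => Dist::Bernoulli {
            p: ((hi - c) / (hi - lo)).clamp(0.0, 1.0),
        },
        // Unhandled combinations fall back to Unknown.
        _ => Dist::Unknown,
    }
}

/// Logical NOT flips the success probability of a Bernoulli.
pub fn logical_not(x: Dist) -> Dist {
    match x {
        Dist::Bernoulli { p } => Dist::Bernoulli { p: 1.0 - p },
        _ => Dist::Unknown,
    }
}

/// Unary negation mirrors the support; the variance is unchanged.
pub fn negate(x: Dist) -> Dist {
    match x {
        Dist::Uniform { lo, hi } => Dist::Uniform { lo: -hi, hi: -lo },
        Dist::Gaussian { mean, var } => Dist::Gaussian { mean: -mean, var },
        _ => Dist::Unknown,
    }
}

/// Sum of two independent Gaussians: means and variances add.
pub fn add(a: Dist, b: Dist) -> Dist {
    match (a, b) {
        (Dist::Gaussian { mean: m1, var: v1 }, Dist::Gaussian { mean: m2, var: v2 }) => {
            Dist::Gaussian { mean: m1 + m2, var: v1 + v2 }
        }
        _ => Dist::Unknown,
    }
}
```

Because comparisons produce a `Bernoulli`, their results can feed `logical_not` directly. Any combination the sketch does not model degrades to `Unknown`, in the spirit of the commit "Always produce Unknown distribution in non-mentioned combination case…"; the PR's own rules may cover more cases.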
datafusion/common/src/spans.rs (2 changes: 1 addition, 1 deletion)
@@ -140,7 +140,7 @@ impl Span {
 /// the column a that comes from SELECT 1 AS a UNION ALL SELECT 2 AS a you'll
 /// need two spans.
 #[derive(Debug, Clone)]
-// Store teh first [`Span`] on the stack because that is by far the most common
+// Store the first [`Span`] on the stack because that is by far the most common
 // case. More will spill onto the heap.
 pub struct Spans(pub Vec<Span>);

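The comment in this hunk describes a layout in which the first `Span` is kept inline and only additional spans spill onto the heap, while the struct shown wraps a plain `Vec<Span>`. A minimal sketch of the described layout follows, assuming a push-only container; `InlineFirst` and its methods are hypothetical names, and it is generic over the element type so it does not depend on DataFusion's `Span`.

```rust
/// Hypothetical container illustrating the layout the comment describes:
/// the first element is stored inline, later elements go into a Vec.
#[derive(Debug, Clone)]
pub struct InlineFirst<T> {
    first: Option<T>,
    // `Vec::new()` does not allocate, so the heap is only touched
    // once a second element is pushed.
    rest: Vec<T>,
}

impl<T> InlineFirst<T> {
    pub fn new() -> Self {
        Self { first: None, rest: Vec::new() }
    }

    pub fn push(&mut self, item: T) {
        if self.first.is_none() {
            self.first = Some(item);
        } else {
            self.rest.push(item);
        }
    }

    pub fn len(&self) -> usize {
        usize::from(self.first.is_some()) + self.rest.len()
    }

    /// `push` always fills `first` before `rest`, so this suffices.
    pub fn is_empty(&self) -> bool {
        self.first.is_none()
    }

    /// True once any element has spilled into the heap-backed `rest`.
    pub fn spilled(&self) -> bool {
        !self.rest.is_empty()
    }

    /// Iterates in insertion order: the inline element, then the spilled ones.
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.first.iter().chain(self.rest.iter())
    }
}
```

This keeps the single-element case allocation-free, which is the motivation the comment gives; the `smallvec` crate offers a general version of the same idea with a configurable inline capacity.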