Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot Count(Expr:Wildcard) with DataFrame API #5518

Closed
wants to merge 1 commit into from

Conversation

jiangzhx
Copy link
Contributor

@jiangzhx jiangzhx commented Mar 9, 2023

Which issue does this PR close?

Closes #5473

@jiangzhx jiangzhx marked this pull request as draft March 9, 2023 04:45
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Mar 9, 2023
@mingmwang
Copy link
Contributor

I think such logic should be moved to the Analyzer.

@jiangzhx
Copy link
Contributor Author

jiangzhx commented Mar 13, 2023

thanks @mingmwang , as @Jefffrey mention
SQL substitute function is here:

https://github.com/apache/arrow-datafusion/blob/e9852074bacd8c891d84eba38b3417aa16a2d18c/datafusion/sql/src/expr/function.rs#L193-L218

but use dafaframe will got error

    ctx.table("alltypes_plain")
        .await?
        .aggregate(vec![], vec![count(Expr::Wildcard)])?
        .explain(false, false)?
        .show()
        .await?;

I'm trying another way following @Jefffrey's method

@github-actions github-actions bot removed logical-expr Logical plan and expressions optimizer Optimizer rules labels Mar 14, 2023
@jiangzhx jiangzhx force-pushed the count-Wildcard branch 2 times, most recently from fff20ed to 0809980 Compare March 14, 2023 03:46
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Mar 14, 2023
@jiangzhx
Copy link
Contributor Author

jiangzhx commented Mar 14, 2023

As everyone's suggestions, I made some modifications.
Can @mingmwang and @Jefffrey take a look and help with code review when you have time.
thanks a lot.

@jiangzhx jiangzhx force-pushed the count-Wildcard branch 2 times, most recently from 3d424b2 to f5197b1 Compare March 14, 2023 06:46
@jiangzhx jiangzhx marked this pull request as ready for review March 14, 2023 07:05
Comment on lines 997 to 1071
//handle Count(Expr:Wildcard) with DataFrame API
pub fn handle_wildcard(exprs: Vec<Expr>) -> Result<Vec<Expr>> {
let exprs: Vec<Expr> = exprs
.iter()
.map(|expr| {
if let Expr::AggregateFunction(AggregateFunction {
fun,
args,
distinct,
filter,
}) = expr
{
if let aggregate_function::AggregateFunction::Count = fun {
if args.len() == 1 {
let arg = args.get(0).unwrap().clone();
match arg {
Expr::Wildcard => {
Expr::AggregateFunction(AggregateFunction {
fun: fun.clone(),
args: vec![lit(ScalarValue::UInt8(Some(1)))],
distinct: *distinct,
filter: filter.clone(),
})
}
_ => expr.clone(),
}
} else {
expr.clone()
}
} else {
expr.clone()
}
} else {
expr.clone()
}
})
.collect();
Ok(exprs)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extreme nesting is not ideal, i think can simplify to this:

Suggested change
//handle Count(Expr:Wildcard) with DataFrame API
pub fn handle_wildcard(exprs: Vec<Expr>) -> Result<Vec<Expr>> {
let exprs: Vec<Expr> = exprs
.iter()
.map(|expr| {
if let Expr::AggregateFunction(AggregateFunction {
fun,
args,
distinct,
filter,
}) = expr
{
if let aggregate_function::AggregateFunction::Count = fun {
if args.len() == 1 {
let arg = args.get(0).unwrap().clone();
match arg {
Expr::Wildcard => {
Expr::AggregateFunction(AggregateFunction {
fun: fun.clone(),
args: vec![lit(ScalarValue::UInt8(Some(1)))],
distinct: *distinct,
filter: filter.clone(),
})
}
_ => expr.clone(),
}
} else {
expr.clone()
}
} else {
expr.clone()
}
} else {
expr.clone()
}
})
.collect();
Ok(exprs)
}
//handle Count(Expr:Wildcard) with DataFrame API
pub fn handle_wildcard(exprs: Vec<Expr>) -> Result<Vec<Expr>> {
let exprs: Vec<Expr> = exprs
.iter()
.map(|expr| match expr {
Expr::AggregateFunction(AggregateFunction {
fun: aggregate_function::AggregateFunction::Count,
args,
distinct,
filter,
}) if args.len() == 1 => match args[0] {
Expr::Wildcard => Expr::AggregateFunction(AggregateFunction {
fun: aggregate_function::AggregateFunction::Count,
// TODO: replace with the constant
args: vec![lit(ScalarValue::UInt8(Some(1)))],
distinct: *distinct,
filter: filter.clone(),
}),
_ => expr.clone(),
},
_ => expr.clone(),
})
.collect();
Ok(exprs)
}

unsure if can simplify more, feel free to explore that

also check my TODO comment on the constant (i think i mention it in the original issue)

@jiangzhx jiangzhx force-pushed the count-Wildcard branch 2 times, most recently from 2f91005 to 566ed24 Compare March 14, 2023 08:04
@jiangzhx jiangzhx requested a review from Jefffrey March 15, 2023 01:53
@jiangzhx
Copy link
Contributor Author

jiangzhx commented Mar 15, 2023

I think such logic should be moved to the Analyzer.

I saw @mingmwang's PR #5570, maybe I can migrate the current logic to @mingmwang's solution.
I don't know if my idea is correct.

@jiangzhx
Copy link
Contributor Author

jiangzhx commented Mar 17, 2023

try adding AnalyzerRule #5627 to do the samething

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does look good, though worth checking other opinions about where the logic should reside (whether like this or in analyzer as mentioned)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- I also think #5627 looks good to but defer to @mingmwang

.await?;

//make sure sql plan same with df plan
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Mar 18, 2023

Closing in favor of #5518 (so it is clear which we are trying to merge)

Thank you @jiangzhx and @Jefffrey

@alamb alamb closed this Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot Count(Expr:Wildcard) with DataFrame API
4 participants