-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Missing handling for Iterator[IntoExpr]
#1897
Comments
Opened (narwhals-dev/narwhals#1897) Marking (#3631 (comment)) as resolved
yup, thanks for the report! |
Been looking at this some more and have some notes
|
def flatten(args: Any) -> list[Any]: | |
if not args: | |
return [] | |
if len(args) == 1 and _is_iterable(args[0]): | |
return args[0] # type: ignore[no-any-return] | |
return args # type: ignore[no-any-return] |
Based on the usage, the current signature should be:
utils.flatten
overloads
from __future__ import annotations
from typing import Iterable
from typing import TypeVar
from typing import overload
T = TypeVar("T")
@overload
def flatten(args: tuple[()]) -> list[Any]: ...
@overload
def flatten(args: tuple[Iterable[T]]) -> Iterable[T]: ...
@overload
def flatten(args: tuple[T | Iterable[T], ...]) -> tuple[T | Iterable[T], ...]: ...
def flatten(
args: tuple[T | Iterable[T], ...] | tuple[()],
) -> list[Any] | T | Iterable[T] | tuple[T | Iterable[T], ...] | tuple[T | Iterable[T]]:
if not args:
return []
if len(args) == 1 and _is_iterable(args[0]):
return args[0]
return args
Looking at that vs Callable[[Any], list[Any]]
seems to me like a lot of complexity is being masked at the moment.
Reuse of Iterable
(s)
It looks like the issue shows up for functions that reuse an Iterable
without checking it is not an Iterator
.
Like here where flat_exprs
is unpacked 4 times, but is exhausted by the time you get to is_order_dependent
:
narwhals/narwhals/functions.py
Lines 2463 to 2474 in 83c1b2a
flat_exprs = flatten(exprs) | |
return Expr( | |
lambda plx: plx.any_horizontal( | |
*( | |
extract_compliant(plx, v, parse_column_name_as_expr=True) | |
for v in flat_exprs | |
) | |
), | |
is_order_dependent=operation_is_order_dependent(*flat_exprs), | |
changes_length=operation_changes_length(*flat_exprs), | |
aggregates=operation_aggregates(*flat_exprs), | |
) |
Right now, utils.flatten
and _expression_parsing.extract_compliant
pass-through Iterator
(s) unchanged.
This is fine if they are used at most once, but is unsafe to reuse for collecting metadata.
Alternatives
Maybe something like these approaches and calling tuple(...)
on the result when reusing
more_itertools.recipes.flatten
- Really just assigning the same name to
itertools.chain.from_iterable
- Really just assigning the same name to
more_itertools.more.collapse
- Usually what I think of when I see the name flatten
- Fails on `main`, but getting a different error than when I raised (narwhals-dev#1897) - Original issue was the expression could only be used once, hence the repeated statements
Describe the bug
Discovered this in (vega/altair#3631 (comment)) and managed to narrow it down to a more minimal repro.
I think this may be an issue for any function that accepts
Iterable[IntoExpr]
but assumesSequence[IntoExpr]
.I discovered a similar bug in
polars
a while back, but that was more limited in scopeSo far I've found this impacts these two:
nw.all_horizontal
nw.any_horizontal
I suspect maybe adding a step in
flatten
to collect anIterator
might help?narwhals/narwhals/utils.py
Lines 344 to 349 in f644931
Steps or code to reproduce the bug
Expected results
No error is thrown.
Using the same example in
polars
, the same expression can be used multiple times regardless of the kind ofIterable
used.Output
Actual results
No results, see Relevant log output
Please run narwhals.show_version() and enter the output below.
Relevant log output
The text was updated successfully, but these errors were encountered: