The groupby idxmax/idxmin implementation returns wrong results as soon as the max or min value is not in the first partition.
This has been broken for years, so I am only adding it for API compatibility for now, but we should fix it.
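import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import assert_eq


def test_df_groupby_idxmax():
    # All three columns need five entries; the maximum of group 1 (value 40)
    # sits at idx 4, which lands in the last partition.
    pdf = pd.DataFrame(
        {"idx": list(range(5)), "group": [1, 1, 2, 2, 1], "value": [10, 20, 20, 10, 40]}
    ).set_index("idx")
    ddf = dd.from_pandas(pdf, npartitions=3)
    expected = pd.DataFrame({"group": [1, 2], "value": [4, 2]}).set_index("group")
    result_pd = pdf.groupby("group").idxmax()
    result_dd = ddf.groupby("group").idxmax()
    assert_eq(result_pd, result_dd)
    assert_eq(expected, result_dd)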
This is a very simple reproducer for the underlying issue.
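One possible direction for a fix, sketched in plain pandas rather than dask: each partition reports both the index label of its per-group maximum and the maximum itself, and the combine step picks the label belonging to the largest value per group. The manual `partitions` split and the helper names `chunk` and `combine` are hypothetical and only for illustration; this is not a description of how dask implements or should implement the reduction.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"idx": list(range(5)), "group": [1, 1, 2, 2, 1], "value": [10, 20, 20, 10, 40]}
).set_index("idx")

# Hypothetical manual split into three partitions (an assumption for this
# sketch; dask may choose different partition boundaries).
partitions = [pdf.iloc[0:2], pdf.iloc[2:4], pdf.iloc[4:5]]


def chunk(part):
    # Per group in this partition, keep the index label of the maximum
    # together with the maximum itself, so partitions can be compared later.
    grouped = part.groupby("group")["value"]
    return pd.DataFrame({"idx": grouped.idxmax(), "value": grouped.max()})


def combine(chunks):
    # Stack the per-partition candidates and, per group, select the candidate
    # row with the largest value; its "idx" entry is the global idxmax.
    candidates = pd.concat(chunks).reset_index()
    winners = candidates.loc[candidates.groupby("group")["value"].idxmax()]
    return winners.set_index("group")["idx"].rename("value").to_frame()


result = combine([chunk(p) for p in partitions])
expected = pdf.groupby("group").idxmax()
pd.testing.assert_frame_equal(result, expected)
```

Carrying the value alongside the index label is what lets the maximum of group 1 in the last partition win over the candidate from the first partition.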