Support for user defined functions (UDFs)

@machow machow released this 08 Feb 04:22

New Feature: support user defined functions (#146)

  • Support for user defined functions (UDFs). Note that these require annotating the return type. For more on the theory behind these, see ADR-003.
from siuba.siu import symbolic_dispatch
from pandas.core.groupby import SeriesGroupBy, GroupBy
from pandas import Series

@symbolic_dispatch(cls = Series)
def cummean(x):
    """Return a same-length array, containing the cumulative mean."""
    return x.expanding().mean()


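@cummean.register(SeriesGroupBy)
def _cummean_grouped(x) -> SeriesGroupBy:
    # the return type annotation is required for UDFs (see the note above)
    grouper = x.grouper
    # running count of non-missing entries within each group
    n_entries = x.obj.notna().groupby(grouper).cumsum()

    # grouped running sum divided by the running count is the running mean
    res = x.cumsum() / n_entries

    # regroup so the result matches the annotated SeriesGroupBy return type
    return res.groupby(grouper)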

from siuba import _, mutate
from siuba.data import mtcars

# a pandas DataFrameGroupBy object
g_cyl = mtcars.groupby("cyl")

# add the cumulative mean of mpg within each cyl group
mutate(g_cyl, cumul_mean = cummean(_.mpg))
  • Support for many methods in vector.py, using UDFs (#158)
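
The registration pattern above (one function name, one implementation per input type, each with an annotated return type) can be illustrated without siuba or pandas. The following is a minimal sketch using the standard library's functools.singledispatch as an analogy only; it is not siuba's implementation. Column, GroupedColumn, and declared_result_type are hypothetical names invented for this example.

```python
from functools import singledispatch
from typing import get_type_hints


# Hypothetical stand-ins for pandas Series and SeriesGroupBy, so the
# sketch runs with the standard library alone.
class Column(list):
    """A plain sequence of numbers."""


class GroupedColumn:
    """Values paired with a group key, kept in row order."""

    def __init__(self, values, keys):
        self.values = list(values)
        self.keys = list(keys)


@singledispatch
def cummean(x):
    # Fallback for types with no registered implementation.
    raise TypeError(f"cummean is not implemented for {type(x).__name__}")


@cummean.register(Column)
def _cummean_column(x) -> Column:
    # Running sum divided by the running count.
    total = 0.0
    out = Column()
    for n, value in enumerate(x, start=1):
        total += value
        out.append(total / n)
    return out


@cummean.register(GroupedColumn)
def _cummean_grouped(x) -> GroupedColumn:
    # Same calculation, but sums and counts are tracked per group key.
    totals, counts, out = {}, {}, []
    for value, key in zip(x.values, x.keys):
        totals[key] = totals.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
        out.append(totals[key] / counts[key])
    return GroupedColumn(out, x.keys)


def declared_result_type(func, cls):
    """Find the implementation registered for cls and read its return annotation."""
    impl = func.dispatch(cls)
    return get_type_hints(impl).get("return")
```

Calling cummean on a Column or a GroupedColumn selects the matching implementation, and declared_result_type(cummean, GroupedColumn) reads the annotation back. A caller can use that annotation to learn what kind of result an implementation produces without running it, which is one plausible reason for requiring it.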

Bug Fixes

  • Fix regression where .str wasn't being removed when processing siu expressions for SQL (#159)
  • Grouped filter now preserves order
  • Verbs now tested to preserve original index (d938ab3)

Tests

  • Add many more versions of Python and pandas to the Travis CI test matrix (#161)