- Added support for `stringr::str_replace_na()` (#153).
- Better checks for unknown and unsupported arguments in `compute()`, `collect()`, `*_join()`, `pivot_*()`, `sink_*()`, `slice_sample()` and `uncount()` (#158, thanks @fkohrt for the report). Now, when those functions receive:
  - an argument that exists in the `tidyverse` implementation but is not supported by `tidypolars`, they warn the user. This default behaviour can be changed to error instead with `options(tidypolars_unknown_args = "error")`.
  - an argument that doesn't exist at all, they error.
- Add support for argument `explicit` in `tidyr::complete()`.
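The `tidypolars_unknown_args` option mentioned above is an ordinary R option, so base `options()` can scope it to a single piece of code instead of the whole session. A minimal sketch in base R only (nothing here calls `tidypolars`); the helper `with_strict_args()` is hypothetical and not part of the package:

```r
# Hypothetical helper, not part of tidypolars: evaluate `code` with
# tidypolars_unknown_args set to "error", then restore the previous value.
with_strict_args <- function(code) {
  old <- options(tidypolars_unknown_args = "error")
  on.exit(options(old), add = TRUE)
  force(code)
}

# Inside the helper the option reads "error"; afterwards the previous
# value (NULL if it was never set) is restored.
inside <- with_strict_args(getOption("tidypolars_unknown_args"))
outside <- getOption("tidypolars_unknown_args")
```

`withr::with_options()` provides the same scoping if `withr` is already a dependency.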
- Fix edge cases in the `tidypolars` implementation of `stringr::str_sub()` and `substr()` compared to their original implementation (#159).

`tidypolars` requires `polars` >= 0.21.0.

- `summarize()` now drops the last group of the output by default (for consistency with `dplyr`). Previously it kept the same groups as in the input data (#149).
- Add support for argument `.groups` in `summarize()`. Value `"rowwise"` is not supported for now (#149).
- Added support for `dplyr::lead()`. In `dplyr::lead()` and `dplyr::lag()`, the arguments `default` and `order_by` are now supported (#151).

`tidypolars` requires `polars` >= 0.20.0.

- `arrange()` now errors with unknown variable names (like `dplyr::arrange()`). Previously, unknown variables were silently ignored. Using expressions (like `a + b`) is now accepted (#144).
- The parameter `inherit_optimization` is removed from all `sink_*()` functions.
- The power operators `^` and `**` now work.
- New function `sink_ndjson()` to write the results of a lazy query to an NDJSON file without collecting it in memory.
- `inner_join()` now accepts inequality joins in the `by` argument, including the following helpers: `between()`, `overlaps()`, `within()` (#148).
- Using an external object in `case_when()`, `ifelse()` and `if_else()` now works.
- `str_sub()` doesn't error anymore when `start` is positive and `end` is negative.
- `read_*_polars()` functions used to return a standard `data.frame` by mistake. They now return a Polars DataFrame.
- Using `[` for subsetting in expressions now works. Thanks @ginolhac for the report (#141).
- `bind_cols_polars()` and `bind_rows_polars()` now error, as originally intended, if elements are a mix of Polars DataFrames and LazyFrames.
- Do not error when handling columns with datatype `Null`. Note that converting those columns to R with `as.data.frame()`, `as_tibble()`, or `collect()` is still an issue as of `polars` 0.19.1.

`tidypolars` requires `polars` >= 0.19.1.

- `describe()` is deprecated as of tidypolars 0.10.0 and will be removed in a future update. Use `summary()` with the same arguments instead (#127).
- `describe_plan()` and `describe_optimized_plan()` are deprecated as of tidypolars 0.10.0 and will be removed in a future update. Use `explain()` with `optimized = TRUE/FALSE` instead (#128).
- In `sink_parquet()` and `sink_csv()`, all arguments except for `.data` and `path` must be named (#136).
- Add support for more functions:
  - from package `base`: `substr()`.
- Better error message when a function can come from several packages but only one version is translated (#130).
- `row_number()` now works without arguments (#131).
- New functions to import data as Polars DataFrames and LazyFrames (#136):
  - `read_<format>_polars()` to import data as a Polars DataFrame;
  - `scan_<format>_polars()` to import data as a Polars LazyFrame;
  - `<format>` can be "csv", "ipc", "json", "parquet".

  Those can replace functions from `polars`. For example, `polars::pl$read_parquet(...)` can be replaced by `read_parquet_polars(...)`.
- New functions to write Polars DataFrames to external files: `write_<format>_polars()` where `<format>` can be "csv", "ipc", "json", "ndjson", "parquet" (#136).
- New function `sink_ipc()` that is similar to `sink_parquet()` and `sink_csv()` but for IPC files (#136).
- `across()` now throws a better error message when the user passes an external list to `.fns`. This works with `dplyr` but cannot work with `tidypolars` (#135).
- Added support for argument `.add` in `group_by()`.
- `stringr::str_sub()` now works when both `start` and `end` are negative.
- Fixed a bug in `str_sub()` when `start` was greater than 1.
- `stringr::str_starts()` and `stringr::str_ends()` now work with a regex.
- `fill()` doesn't error anymore when `...` is empty. Instead, it returns the input data.
- `unite()` now provides a proper error message when `col` is missing.
- `unite()` doesn't error anymore when `...` is empty. Instead, it uses all variables in the dataset.
- `filter()`, `mutate()` and `summarize()` now work when using a column from another data.frame, e.g. `my_polars_df |> filter(x %in% some_data_frame$y)`.
- `replace_na()` no longer converts the column to the datatype of the replacement, e.g. `data |> replace_na("a")` will error if the input data is numeric.
- `n_distinct()` now correctly applies the `na.rm` argument when several columns are passed as input (#137).

`tidypolars` requires `polars` >= 0.18.0.

- Add support for several functions:
  - from package `base`: `%%` and `%/%`.
  - from package `dplyr`: `dense_rank()`, `row_number()`.
  - from package `lubridate`: `wday()`.
- Better handling of missing values to match `R` behavior. In the following functions, if there is at least one missing value and `na.rm = FALSE` (the default), then the output will be `NA`: `max()`, `mean()`, `median()`, `min()`, `sd()`, `sum()`, `var()` (#120).
- New argument `cluster_with_columns` in `collect()`, `compute()`, and `fetch()`.
- Add a global option `tidypolars_unknown_args` to control what happens when `tidypolars` doesn't know how to handle an argument in a function. The default is to warn and the only other accepted value is `"error"`.
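Since the missing-value change (#120) is meant to match `R` behavior, it may help to recall what base R itself returns. The sketch below runs base R only and does not call `tidypolars`; the vector `x` is arbitrary illustration data:

```r
# Base R reference behavior for a numeric vector with one missing value.
x <- c(3, NA, 1)

# Default na.rm = FALSE: a single NA makes the result NA.
res_default <- c(max = max(x), mean = mean(x), median = median(x), sum = sum(x))

# na.rm = TRUE: missing values are dropped before computing.
res_narm <- c(
  max = max(x, na.rm = TRUE),       # 3
  mean = mean(x, na.rm = TRUE),     # 2
  median = median(x, na.rm = TRUE), # 2
  sum = sum(x, na.rm = TRUE)        # 4
)
print(res_default)
print(res_narm)
```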
- `count()` and `add_count()` no longer overwrite a variable named `n` if the argument `name` is unspecified.

`tidypolars` requires `polars` >= 0.17.0.

- As announced in `tidypolars` 0.7.0, the behavior of `collect()` has changed. It now returns a standard R `data.frame` and not a Polars `DataFrame` anymore. Replace `collect()` with `compute()` (with the same arguments) to keep the old behavior.
- In `bind_rows_polars()`, if `.id` is passed, the resulting column is now of type character instead of integer.
- Add support for several functions:
  - from package `base`: `all()`, `any()`, `diff()`, `ISOdatetime()`, `length()`, `rev()`, `unique()`.
  - from package `dplyr`: `consecutive_id()`, `min_rank()`, `na_if()`, `n_distinct()`, `nth()`.
  - from package `lubridate`: `make_datetime()`.
  - from package `stringr`: `str_dup()`, `str_split()`, `str_split_i()`, `str_trunc()`.
  - from package `tidyr`: `replace_na()` (the data.frame method was already translated but not the vector one that can be used in `mutate()` for example).
- It is now possible to use explicit namespaces (such as `dplyr::first()` instead of `first()`) in `mutate()`, `summarize()` and `filter()` (#114).
- In `bind_rows_polars()`, if all elements are named and `.id` is specified, the `.id` column will use the names of the elements (#116).
- It is now possible to rename variables in `select()` (#117).
- Add support for argument `na_matches` in all join functions (except `cross_join()`, which doesn't need it) (#109).
- Local variables in custom functions could not be used in tidypolars functions (reported in a blog post by Art Steinmetz). This is now fixed.
- `across()` now works when `.cols` contains only one variable and `.fns` contains only one function.
- In `across()`, the `.cols` argument now takes into account variables created in the same `mutate()` or `summarize()` call before `across()`.

  ```r
  as_polars_df(mtcars) |>
    head(n = 3) |>
    mutate(
      foo = 1,
      across(.cols = contains("oo"), \(x) x - 1)
    )
  #> shape: (3, 12)
  #> ┌──────┬─────┬───────┬───────┬───┬─────┬──────┬──────┬─────┐
  #> │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ am  ┆ gear ┆ carb ┆ foo │
  #> │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ ---  ┆ ---  ┆ --- │
  #> │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64  ┆ f64  ┆ f64 │
  #> ╞══════╪═════╪═══════╪═══════╪═══╪═════╪══════╪══════╪═════╡
  #> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
  #> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
  #> │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 4.0  ┆ 1.0  ┆ 0.0 │
  #> └──────┴─────┴───────┴───────┴───┴─────┴──────┴──────┴─────┘
  ```

  Note that the `where()` function is not supported here. For example:

  ```r
  as_polars_df(mtcars) |>
    mutate(
      foo = 1,
      across(.cols = where(is.numeric), \(x) x - 1)
    )
  ```

  will not return 0 for the variable `foo`. A warning is emitted about this behavior.
- Better handling of negative values in `c()` when called in `mutate()` and `summarize()`.

`tidypolars` requires `polars` >= 0.16.0.

- `as_polars()` is now removed. It was deprecated in 0.6.0. Use `as_polars_df()` or `as_polars_lf()` instead.
- `to_r()` is now removed. It was deprecated in 0.6.0. Use `as.data.frame()` or `as_tibble()` instead.
- For consistency with `dplyr`, the behavior of `collect()` will change in 0.8.0 as it will perform the lazy query and convert the result to a standard `data.frame`. For now, `collect()` only throws a warning about this future change. It is recommended to use `compute()` to only perform the query and get a Polars DataFrame as output (#101).
- Several improvements and changes for `pivot_wider()` (#95):
  - `names_from` can now take several variables;
  - add support for `id_cols` and `names_glue`;
  - the default value of `names_sep` is now `_`, for consistency with `tidyr`;
  - fix the documentation, as `pivot_wider()` doesn't work on LazyFrames.
- Add support for `stringr::regex()`. Note that only the argument `ignore_case` is supported for now (#97).
- Add support for several `lubridate` functions: `dweeks()`, `ddays()`, `dhours()`, `dminutes()`, `dseconds()`, `dmilliseconds()`, `make_date()` (#107).
- When a `polars` function called internally fails, the original error message is now displayed.
- Add support for `group_split()` (for `DataFrame` only).
- Add support for argument `relationship` in `left_join()`, `right_join()`, `full_join()` and `inner_join()` (#106).

`tidypolars` requires `polars` >= 0.15.0.

- `as_polars()` is deprecated and will be removed in 0.7.0. Use `as_polars_lf()` or `as_polars_df()` instead.
- `as_polars()` doesn't have an argument `with_string_cache` anymore. When set to `TRUE`, this enabled the string cache globally, which could lead to undesirable side effects.
- `to_r()` is deprecated and will be removed in 0.7.0. Use `as.data.frame()` or `as_tibble()` instead. This used to silently return a `LazyFrame` if the input was a `LazyFrame`. It now automatically collects the `LazyFrame` (#88).
- `pull()` now automatically collects an input `LazyFrame` (#89).
- Add support for argument `.keep` in `mutate()` (#80).
- Add support for `group_vars()` and `group_keys()` (#81).
- Experimental support for `rowwise()`. For now, this is limited to a few functions: `mean()`, `median()`, `min()`, `max()`, `sum()`, `all()`, `any()`. `rowwise()` and `group_by()` cannot be used at the same time (#40).
- All functions that return a polars `Data/LazyFrame` now add the class `"tidypolars"` to the output (#86).
- Support `which.min()`, `which.max()`, `dplyr::n()`.
- Support `.data[[` and `.env[[` in addition to `.data$` and `.env$`. Better error messages when the objects specified in `.data` or `.env` don't exist.
- `pull()` now errors when `var` is of length > 1.

`tidypolars` requires `polars` >= 0.12.0.

- `across()` now errors if the argument `.cols` is not provided (either named or unnamed). This behavior was deprecated in `dplyr` 1.1.0.
- It is no longer possible to use `!` in `arrange()` to sort in decreasing order, for compatibility with `dplyr::arrange()`. Use `-` or `desc()` instead.
- `summarize()` now works on ungrouped data and returns a 1-row output.
- It is now possible to use `desc(x1)` in `arrange()` to sort in decreasing order of `x1` (this is equivalent to `-x1`).
- Add support for argument `names_prefix` in `pivot_longer()`.
- Add support for arguments `names_prefix` and `names_sep` in `pivot_wider()`.
- Add support for `tidyr::uncount()`.
- All `*_join()` functions now work when `by` is a specification created by `dplyr::join_by()`. Note that this is limited to equality joins for now.
- You can now use the "embrace" operator `{{ }}` to pass unquoted column names (among other things) as arguments of custom functions. See the "Programming with dplyr" vignette for some examples.
- `bind_cols_polars()` now works with two `LazyFrame`s, but not more.
- Add support for argument `.name_repair` in `bind_cols_polars()` (#74).
- Support for `.env$` and `.data$` pronouns in expressions of `filter()`, `mutate()` and `summarize()`.
- Support a named vector in the argument `pattern` of `str_replace_all()`, where names are patterns and values are replacements.
- Using `%in%` for factor variables doesn't require enabling the string cache anymore.
- `summarize()` no longer errors when `across(everything(), ...)` is used with `.by`.
- All `*_join()` functions no longer error when a named vector is provided in the argument `by`.
- Expressions with values only are not named "literal" anymore.
- Simplify the procedure to support new functions.

`tidypolars` requires `polars` >= 0.11.0.

- It is no longer possible to pass a list in `rename()`.
- The argument `with_string_cache` in `as_polars()` now enables the string cache globally if set to `TRUE` (#54).
- Better error message in `filter()` when comparing factors to strings while the string cache is disabled.
- Basic support for `strptime()`. It is possible to use `strptime(*, strict = FALSE)` to avoid an error when the parsing of some characters fails.
- New argument `.by` in `filter()`, `mutate()`, and `summarize()`, and new argument `by` in the `slice_*()` functions. This makes it possible to do operations on groups without using `group_by()` and `ungroup()`. See the `dplyr` vignette for more information (#59).
- `rename()` now accepts unquoted names for both old and new names.
- Support fixed regexes in `str_detect()` (using `fixed()`) and in `grepl()` (using `fixed = TRUE`).
- Improve robustness of sequential expressions in `mutate()` and `summarize()` (i.e., expressions that should be run one after the other because they depend on variables created in the same call) (#58).
- `relocate()` now works correctly when `.after = last_col()`.
- All functions that work on grouped data now correctly restore the group structure (#62).
- Error messages coming from `mutate()`, `summarize()`, and `filter()` now give the right function call.
- Faster tidy selection (#61).

`tidypolars` requires `polars` >= 0.10.0.

- All functions starting with `pl_` have been removed in favor of the S3 methods. For example, `pl_distinct()` doesn't exist anymore, so the only way to use it is to load `dplyr` and to use `distinct()` on a Polars DataFrame or LazyFrame. This is to avoid confusion about compatibility with `dplyr` and `tidyr`. See #49 for a more detailed explanation.
- `pl_bind_rows()` and `pl_bind_cols()` are renamed `bind_rows_polars()` and `bind_cols_polars()` respectively. This is because `bind_rows()` and `bind_cols()` are not S3 methods (this might change in future versions of `dplyr`).
- New function `duplicated_rows()` that is the opposite of `distinct()` (#50).
- New argument `.id` in `bind_rows_polars()`.
- `bind_rows_polars()` can now bind Data/LazyFrames that don't have the same schema. Columns will be upcast to common types if necessary. Unknown columns will be filled with `NA`.
- `complete()` now works correctly on grouped data.
- `relig_income` and `fish_encounters` are not reexported anymore since `tidyr` is now imported.

`tidypolars` requires `polars` >= 0.9.0.

- Rename `pl_fetch()` to `fetch()`.
- New functions supported: `describe()`, `sink_csv()`, `slice_sample()`.
- New argument `fill` in `pl_complete()`.
- Support `stringr::str_to_title()` and `tools::toTitleCase()`.
- Support `stringr::fixed()` to use literal strings.
- Support replacements with captured groups like `\\1` in `stringr::str_replace()` and `stringr::str_replace_all()`.
- `sink_parquet()` didn't use the user inputs (apart from the `path`).
- Clearer error message when an expression contains `<pkg>::`. This is not supported for now but could potentially be implemented later.
- `pl_colnames()` is no longer exported.
- Support `as.numeric()`, `as.character()`, `as.logical()`, `grepl()`, and `paste()` in expressions in `pl_filter()`, `pl_mutate()` and `pl_summarize()`.
- Support `sink_parquet()` (#38).
- Support `fetch()` (#42).
- Support for additional `stringr` functions: `str_detect()`, `str_extract_all()`, `str_pad()`, `str_squish()`, `str_trim()`, `word()` (some arguments or corner cases are not supported yet).
- Add all optimization parameters in `collect()`.
- Fix `pl_mutate()` and `pl_summarize()` when expressions use some variables previously created or modified (#10, #37).
- Fix bug in `pl_filter()` when passing a vector in the RHS of `%in%`.
- Improve the backend to translate R expressions into Polars expressions. This also led to a complete rewriting of the vignette "R and Polars expressions" (#27).
- Error messages should now report the correct function call.
- Improve CI coverage (#35).
- First GitHub release.