Skip to content
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
34a744e
add fix
ben-schwen Apr 6, 2022
80b0b91
add test
ben-schwen Apr 6, 2022
1f6c41e
add more tests
ben-schwen Apr 6, 2022
4ab82c1
add NEWS
ben-schwen Apr 6, 2022
5a3e3c7
typo
MichaelChirico Apr 7, 2022
8d70e6d
sentence structure
MichaelChirico Apr 7, 2022
b1f4892
state bug more precisely
ben-schwen Apr 7, 2022
2183cbb
unwield lengthy if with ws
ben-schwen Apr 7, 2022
f42d498
more tests
ben-schwen Apr 7, 2022
96437fb
extend NEWS for mirror casse
ben-schwen Apr 7, 2022
f45564a
update NEWS
ben-schwen Jun 26, 2022
9ce6737
vapply on single columns instead of whole subset
ben-schwen Sep 9, 2024
b7bbb5a
Merge branch 'master' into merge_factor_char_key
ben-schwen Sep 9, 2024
608599c
use .shallow
ben-schwen Sep 9, 2024
fedde2d
Merge branch 'master' into merge_factor_char_key
MichaelChirico Jun 30, 2025
0d5f5a5
suggested NEWS wording
MichaelChirico Jun 30, 2025
c204850
add OPs original example more exactly to the regression test
MichaelChirico Jun 30, 2025
6750e27
add some tests of multiple join columns
MichaelChirico Jun 30, 2025
49a8f1a
avoid using column x in table x
MichaelChirico Jun 30, 2025
c9a33af
attempt to refactor into huge helper (🤞)
MichaelChirico Jun 30, 2025
07e3acc
trailing ws
MichaelChirico Jun 30, 2025
fa3be64
fix (?) tests
MichaelChirico Jun 30, 2025
dc95644
need to pass 'ans' too
MichaelChirico Jun 30, 2025
5ad17ad
don't reuse overloaded name 'let'
MichaelChirico Jun 30, 2025
7c8d2f7
typo
MichaelChirico Jun 30, 2025
088f876
remove apparently vestigial check
MichaelChirico Jun 30, 2025
dda8722
Merge remote-tracking branch 'origin/master' into merge_factor_char_key
MichaelChirico Jun 30, 2025
a1cbe53
Merge branch 'master' into merge_factor_char_key
ben-schwen Jul 2, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,8 @@ rowwiseDT(

14. Passing functions programmatically with `env=` doesn't produce an opaque error, e.g. `DT[, f(b), env = list(f=sum)]`, [#6026](https://github.com/Rdatatable/data.table/issues/6026). Note that it's much better to pass functions like `f="sum"` instead. Thanks to @MichaelChirico for the bug report and fix.

15. If the first column of a key is a factor column containing unsorted levels (`setkey()` does not order the levels) and it is joined to a character column, then an invalidly keyed result would be produced. This could lead to a subsequent error/warning or silent incorrect results. Thanks to @gbrunick for the report and Benjamin Schwendinger for the fix.

## NOTES

1. `transform()` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
Expand Down
18 changes: 16 additions & 2 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1337,8 +1337,22 @@ replace_dot_alias = function(e) {
## check key on i as well!
ichk = is.data.table(i) && haskey(i) &&
identical(head(key(i), length(leftcols)), names_i[leftcols]) # i has the correct key, #3061
if (keylen && (ichk || is.logical(i) || (.Call(CisOrderedSubset, irows, nrow(x)) && ((roll == FALSE) || length(irows) == 1L)))) # see #1010. don't set key when i has no key, but irows is ordered and roll != FALSE
setattr(ans,"sorted",head(key(x),keylen))
dt_types = function(x, cols = seq_along(x)) vapply_1c(cols, function(j) typeof(x[[j]]))
if (
keylen
&& (
ichk
|| is.logical(i)
|| (
.Call(CisOrderedSubset, irows, nrow(x))
&& (!roll || length(irows) == 1L) # see #1010. don't set key when i has no key, but irows is ordered and roll != FALSE
&& ( #5361 merging on keyed factor with character, check if resulting character is really sorted
identical(dt_types(i, leftcols), dt_types(x, rightcols)) # can only be not identical
|| is.sorted(ans, by=head(key(x), keylen))
)
)
)
) setattr(ans,"sorted",head(key(x),keylen))
}
setattr(ans, "class", class(x)) # retain class that inherits from data.table, #64
setattr(ans, "row.names", .set_row_names(length(ans[[1L]])))
Expand Down
24 changes: 24 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -7070,6 +7070,30 @@ test(1483.3, merge(x,y,by="country",all=TRUE), data.table(country=factor(c("US",
setkey(y)
test(1483.4, y[x], data.table(country="US", key="country"))

# 5361 merge on character and factor should only have key(x) if result is really sorted
let = c("a", "b", "c")
rlet = c("c", "b", "a")
x = data.table(x=rlet)
y = data.table(x=factor(let, levels=rlet), key="x")
test(1483.51, x[y, on="x"], x)
test(1483.52, y[x, on="x"], x)
test(1483.53, merge(x, y, by="x"), data.table(x=let, key="x"))
test(1483.54, merge(y, x, by="x"), data.table(x=let, key="x"))

x = data.table(x=rlet, key="x")
y = data.table(x=factor(let, levels=rlet))
test(1483.61, x[y, on="x"], x)
test(1483.62, y[x, on="x"], data.table(x=let))
test(1483.63, merge(x, y, by="x"), data.table(x=let, key="x"))
test(1483.64, merge(y, x, by="x"), data.table(x=let, key="x"))

x = data.table(x=rlet, a=rlet)
y = data.table(x=factor(let, levels=rlet), b=let, key="x")
test(1483.71, x[y, on="x"], data.table(x=rlet, a=rlet, b=rlet))
test(1483.72, y[x, on="x"], data.table(x=rlet, b=rlet, a=rlet))
test(1483.73, merge(x, y, by="x"), data.table(x=let, a=let, b=let, key="x"))
test(1483.74, merge(y, x, by="x"), data.table(x=let, b=let, a=let, key="x"))

# NULL items should be removed when making data.table from list, #842
# Original fix for #842 added a branch in as.data.table.list() using point()
# Then PR#3471 moved logic from data.table() into as.data.table.list() and now removes NULL items up front, so longer need for the branch
Expand Down