
[Feature Request / Question] Tibble-esqe DT Printing #5425


Open
rsangole opened this issue Jul 29, 2022 · 12 comments

Comments

@rsangole

Hello,

As a long-time user of {data.table}, I first want to thank you and your team for the effort you have put into building and maintaining this package. Your work is truly an exemplar of the best of what R has to offer, and I routinely look to it as the touchstone against which I judge my own packages.


As a former {tibble} user, I now use {data.table} every day for my work because of its speed and efficiency in handling large datasets. The only feature I truly miss from tibble is its print function; data.table's own print method (imho) leaves a bit to be desired. Especially for datasets with many columns, long strings, NA values, and numerics, the default print method for tibbles gives a better user experience. Here's a comparison from a recent dataset on my 18" laptop:

image

(UX features: the left-vs-right align for char-vs-numerics, red-colored NA values, long col names trimmed)

By comparison, the data.table print method produces a very large printout, and it is difficult to follow the items in a row since I have to scroll up and down:

image

image

--

  1. Is this by design? Is there a performance reasoning behind keeping the print method simple?
  2. Are there any options & adjustments I could make to have the data.table print like a tibble?

If this isn't possible, would you consider this a feature request for an updated print method?

Cheers and thanks for your hard work on this amazing package!

@ben-schwen
Member

ben-schwen commented Jul 29, 2022

To answer the 2nd question: you could override data.table's print method.

There is a nice gist on this topic from @krlmlr https://gist.github.com/krlmlr/35f56d625ea56ff098f965d7c6d5a382

library(data.table)
library(tibble)

print_data_table <- function(x, ...) {
  # Adapted from data.table:::as.data.frame.data.table()
  ans <- x
  attr(ans, "row.names") <- .set_row_names(nrow(x))
  attr(ans, "class") <- c("tbl", "data.frame")
  attr(ans, "sorted") <- NULL
  attr(ans, ".internal.selfref") <- NULL
  print(ans)
  invisible(x)
}

assignInNamespace("print.data.table", print_data_table, asNamespace("data.table"))

x = as.data.table(mtcars)
x
#> # A data frame: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

@andrewrech

andrewrech commented Jul 29, 2022

You can control the max col length using

options(datatable.prettyprint.char = 30L)

which is (for me) the main issue.

#1091
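
As a rough illustration of that option (a minimal sketch; the table `dt` and its contents are made up for the example), strings longer than the limit are cut off and suffixed with `...` when printed:

```r
library(data.table)

# Made-up example data: one long string, one short string
dt <- data.table(id = 1:2, txt = c(strrep("a", 50), "short"))

# Temporarily limit printed character values to 10 characters
old <- options(datatable.prettyprint.char = 10L)
print(dt)      # the 50-character string is shown truncated
options(old)   # restore the previous setting
```

The underlying data is not modified; only the printed representation is shortened.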

@rsangole
Author

Fantastic, @ben-schwen and @andrewrech. This is exactly what I need.

It would be great if we could incorporate that print method into the package, at least as an option via options(...). That would be a feature request... if not, feel free to close out this issue.

Thanks y'all!

@grantmcdermott
Contributor

Talking about options, just to add that you can get the column types (chr, int, etc.) under the headers with:

options(datatable.print.class = TRUE, datatable.print.keys = TRUE)

You can obviously add the above to your .Rprofile if you'd like the behaviour to persist over sessions. But note that the next release of data.table will enable these options by default. So another way of getting this to work is just to grab the latest dev version from GitHub (or r-universe if you're on Windows/Mac and just want binaries).
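
For a one-off print, the same behaviour can also be requested per call, since `print.data.table` takes `class` and `print.keys` arguments. A minimal sketch with made-up data (the table `dt` is only for illustration):

```r
library(data.table)

# Made-up keyed table
dt <- data.table(id = 1:3, grp = c("a", "b", "c"), key = "id")

# Show column types under the headers and the key above the table
print(dt, class = TRUE, print.keys = TRUE)
```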

@MichaelChirico
Member

We have a long-standing issue, #1523, which tracks enhancements to printing. Some things you mention might be added as new options, but I do prefer data.table's defaults here.

In addition to the notes above, there's a print argument trunc.cols (its default is controlled by the option datatable.print.trunc.cols) that helps reduce the total width of the printed output; see ?print.data.table.
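
A minimal sketch of `trunc.cols` on a made-up wide table (the table `wide` and the console width of 60 are assumptions chosen for the example): columns that do not fit the console width are dropped from the printout and listed in a trailing note instead of wrapping onto further blocks.

```r
library(data.table)

# Made-up wide table: 5 rows, 40 integer columns named V1..V40
wide <- as.data.table(matrix(seq_len(200), nrow = 5, ncol = 40))

old <- options(width = 60)       # pretend the console is narrow
print(wide, trunc.cols = TRUE)   # only the columns that fit are printed
options(old)                     # restore the previous width
```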

> Is there a performance reasoning behind keeping the print method simple?

Not really. The main performance gain comes from subsetting the table to rbind(head(x), tail(x)) -- once the "inner" rows of the table are ignored, most operations on the remaining table will be all but instantaneous. The real issue here is dependencies -- tibble printing uses pillar, which has a non-trivial dependency load:

https://github.com/r-lib/pillar/blob/3f849a5e95eac06075985a4365cd5ab5bdcd18f5/DESCRIPTION#L20-L49

The main innovation here is fansi to color terminal output. We might consider doing something similar with fansi as a Suggested dependency...
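
The head/tail idea described above can be sketched as follows. This is an illustration only, not data.table's actual print implementation; `preview_rows` is a hypothetical helper and the table `big` is made up.

```r
library(data.table)

# Hypothetical helper: keep only the first and last n rows of a large table
preview_rows <- function(x, n = 5L) {
  if (nrow(x) <= 2L * n) return(x)   # small tables are kept whole
  rbind(head(x, n), tail(x, n))
}

# Made-up table with one million rows
big <- data.table(id = seq_len(1e6), value = sqrt(seq_len(1e6)))

small <- preview_rows(big)
nrow(small)   # any formatting work now touches 10 rows rather than 1e6
```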

@eddelbuettel
Contributor

BTW, and as it hasn't been mentioned: you can get colored output in an informal (== non-CRAN) way via the rather nice colorout package, which I quite like and use. Because it gets into R internals it can never be on CRAN, but it is good, lightweight, and has zero (other) dependencies. Works well with data.table.

@eutwt

eutwt commented Jul 31, 2022

Maybe stating the obvious here, but if you want to print a data.table exactly like a tibble and don't mind loading {tibble}, note that as_tibble(dt) won't copy the columns of dt, so just overwriting print.data.table to convert your data to a tibble first isn't all that expensive.

Personally the default settings in dev data.table do everything I'd want, including showing the keys (which tibble obviously doesn't support), but thought I'd mention this in case it's useful and non-obvious to someone else.

library(tibble)
#> Warning: package 'tibble' was built under R version 4.1.2
library(data.table)

dt <- as.data.table(head(mtcars))
tracemem(dt$mpg)
#> [1] "<0x7f99ed356ba8>"

print_data_table <- function(x, ...) print(as_tibble(x), ...)
assignInNamespace("print.data.table", print_data_table, asNamespace("data.table"))

dt # note no messages from tracemem
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1

Created on 2022-07-31 by the reprex package (v2.0.1)

@davidbudzynski
Contributor

davidbudzynski commented Aug 3, 2022

One thing that caught me out when I switched to data.table is that strings with leading whitespace, e.g. " foo", look very much like "foo" when printed by data.table. Tibbles surround strings with leading or trailing whitespace with quotes, which is really useful in my opinion.

data.table::data.table(a = c("foo", "bar"), b= c(" foo", "bar "))
#>         a      b
#>    <char> <char>
#> 1:    foo    foo
#> 2:    bar   bar
tibble::tibble(a = c("foo", "bar"), b= c(" foo", "bar "))
#> # A tibble: 2 × 2
#>   a     b     
#>   <chr> <chr> 
#> 1 foo   " foo"
#> 2 bar   "bar "

It would be really nice to have something like this in data.table as well.

@MichaelChirico
Member

There's quote=TRUE, which will surround all strings with quotes, but auto-quoting when leading/trailing whitespace is detected shouldn't be too hard to support.

@davidbudzynski
Contributor

quote = TRUE will surround everything with quotes:

library(data.table)
dt = data.table(a = c("foo", "bar"), b = c(" foo", "bar "), c = 1:2) 
print(dt, quote = TRUE)
#>         "a"      "b"     "c"
#>    "<char>" "<char>" "<int>"
#> 1:    "foo"   " foo"     "1"
#> 2:    "bar"   "bar "     "2"

Good to hear that tibble's way of doing it wouldn't be difficult to support!
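
A base R sketch of what such selective quoting could look like. `quote_ws` is a hypothetical helper written for this illustration; it is not part of data.table or tibble, and tibble's actual quoting rules may differ.

```r
# Hypothetical helper: quote only strings with leading or trailing whitespace
quote_ws <- function(x) {
  stopifnot(is.character(x))
  needs_quote <- !is.na(x) & grepl("^\\s|\\s$", x)
  x[needs_quote] <- paste0('"', x[needs_quote], '"')
  x
}

quote_ws(c("foo", " foo", "bar ", NA))
```

Until something like this is built in, a helper of this kind could be applied to the character columns of a copy of the table before printing, leaving the original data untouched.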

@vpetzel

vpetzel commented Aug 4, 2022

Actually it is quite easy to make use of tibble printing for data tables, like this:

# Override format.data.table to format with pillar functions. Not exactly sure why we need this, as data.table does not provide format.
format.data.table <- pillar:::format.tbl
# Override print.data.table to use pillar:::print_tbl. We need to copy the code so that format.data.table can be scoped
print.data.table <- function (x, width = NULL, ..., n_extra = NULL, n = NULL, max_extra_cols = NULL, max_footer_lines = NULL) 
{
    if (!is.null(n_extra)) {
        # deprecate_soft() and caller_env() live in lifecycle and rlang, so qualify them
        lifecycle::deprecate_soft("1.6.2", "pillar::print(n_extra = )", 
            "pillar::print(max_extra_cols = )", user_env = rlang::caller_env(2))
        if (is.null(max_extra_cols)) {
            max_extra_cols <- n_extra
        }
    }
    writeLines(format(x, width = width, ..., n = n, max_extra_cols = max_extra_cols, 
        max_footer_lines = max_footer_lines))
    invisible(x)
}

dt <- data.table::as.data.table(mtcars)
# Add tbl to class list (keep data.table first!)
class(dt) <- c("data.table", "tbl", "data.frame")

The only real challenge here is getting the methods to accept a data.table.

@rsangole
Author

Thanks for all the replies so far. Very insightful.

@eutwt, I was hoping to put these lines in the .Rprofile at my project level, but that doesn't seem to work. Any idea why?

print_data_table <- function(x, ...) print(as_tibble(x), ...)
assignInNamespace("print.data.table", print_data_table, asNamespace("data.table"))
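
One possible explanation, not confirmed anywhere in this thread: according to ?Startup, only the base package is loaded while .Rprofile is sourced, so `assignInNamespace` (from utils) and `as_tibble` (from tibble) would not be found unless they are namespace-qualified. A sketch of a project-level .Rprofile under that assumption (the name `print_as_tibble` is made up):

```r
# Hypothetical project-level .Rprofile; a sketch, not a confirmed fix.
# Every non-base function is called with an explicit namespace prefix.
local({
  if (requireNamespace("data.table", quietly = TRUE) &&
      requireNamespace("tibble", quietly = TRUE)) {
    print_as_tibble <- function(x, ...) {
      print(tibble::as_tibble(x), ...)
      invisible(x)
    }
    utils::assignInNamespace("print.data.table", print_as_tibble,
                             ns = asNamespace("data.table"))
  }
})
```

If the override still does not take effect, the cause is likely something else in the startup sequence, for example a different .Rprofile being picked up.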


10 participants