Feature: Pre-serialise s3 generic for object transform #16

anthonynorth · 2023-10-09T02:49:29Z

An S3 generic that transforms objects before JSON serialisation would give package authors full control over how their objects are serialised. This is the same concept as the JavaScript toJSON() method on objects.

Some use cases:

Adding / removing elements from lists
Converting element names to camelCase
Formatting dates, rounding numbers, etc
Applying AsIs to elements that must be scalars
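
A minimal sketch of a method covering these use cases. The "event" class and its field names are hypothetical and exist only for illustration. The code shows the transform on its own and assumes nothing about how yyjsonr would invoke it.

```r
to_json <- function(object, ...) UseMethod("to_json")
to_json.default <- function(object, ...) object

# Hypothetical class "event", used only to illustrate the use cases above
to_json.event <- function(object, ...) {
  out <- unclass(object)
  out$internal_id <- NULL                    # remove an element
  out$when <- format(out$when, "%Y-%m-%d")   # format a date
  out$score <- round(out$score, 2)           # round a number
  # convert snake_case names to camelCase
  names(out) <- gsub("_([a-z])", "\\U\\1", names(out), perl = TRUE)
  out$eventName <- I(out$eventName)          # mark as AsIs (scalar)
  out
}

ev <- structure(
  list(
    event_name = "launch",
    when = as.Date("2023-10-09"),
    score = 3.14159,
    internal_id = 42L
  ),
  class = "event"
)
res <- to_json(ev)
```

The result is a plain list, so whatever serialiser receives it does not need to know about the original class.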

A naïve example implementation:

yyjson_mut_val* any_serialise_function(SEXP object) {
  // should be global (looked up once rather than on every call)
  SEXP to_json = PROTECT(
    Rf_findFun(
      Rf_install("to_json"),
      Rf_findVarInFrame(R_NamespaceRegistry, Rf_install("yyjsonr"))
    )
  );

  // protect the call so it cannot be collected while it is evaluated
  SEXP call = PROTECT(Rf_lang2(to_json, object));
  SEXP trans_object = PROTECT(Rf_eval(call, R_GlobalEnv));

  // serialise trans_object

  UNPROTECT(3); // 2 if to_json is global
  // return the yyjson_mut_val*
}

#' @export
to_json <- function(object, ...) UseMethod("to_json")

# A real use-case would be to add AsIs to scalars, drop items that aren't required, camelCase property names etc.
# but complete object replacement is possible
#' @export
to_json.foobar <- function(object, ...) list(foo = "bar")

foobar <- structure(list(whatever = "doesn't matter"), class = "foobar")
yyjsonr::write_json_str(foobar)
#> [1] "{\"foo\":[\"bar\"]}"

Created on 2023-10-09 with reprex v2.0.2

Overhead? Yes there is

Executing an R method for each object in the tree has some overhead. In prototyping, I observed around 1.5 seconds of overhead per 1 000 000 to_json() dispatches. If we assume that most objects won't have a to_json() method, we can avoid much of this overhead by skipping the to_json.default() call.

One approach is to cache the classes implementing to_json() and only dispatch if our input object inherits any of these classes.

E.g.

// classes: cached names of classes with a to_json() method; n_classes: their count
for (int i = 0; i < n_classes; i++) {
  if (!Rf_inherits(object, classes[i])) continue;
  // invoke to_json()
  break;
}
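
The same check can be sketched at the R level. This assumes a hand-maintained character vector of classes, here named to_json_classes, and a hypothetical helper maybe_to_json(). Neither is part of yyjsonr.

```r
to_json <- function(object, ...) UseMethod("to_json")
to_json.default <- function(object, ...) object
to_json.foobar <- function(object, ...) list(foo = "bar")

# Hypothetical cache of classes known to implement to_json()
to_json_classes <- c("foobar")

# Dispatch only when the object inherits from a cached class;
# everything else is returned untouched without calling to_json.default()
maybe_to_json <- function(object) {
  if (inherits(object, to_json_classes)) to_json(object) else object
}

foobar <- structure(list(whatever = "doesn't matter"), class = "foobar")
not_foobar <- structure(list(whatever = "doesn't matter"), class = "not_foobar")

transformed <- maybe_to_json(foobar)
untouched <- maybe_to_json(not_foobar)
```

inherits() accepts a vector of class names and returns TRUE if any of them match, so one call replaces the loop.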

In prototyping, with only one S3 method to check, I found this reduced the overhead of serialising types without a to_json() method to approximately 20 ms per 1 000 000 objects.

Without to_json()

my_list <- rep(
  list(structure(list(whatever = "doesn't matter"), class = "not_foobar")),
  1000000
)

bench::mark(
  yyjsonr::write_json_str(my_list),
  iterations = 10,
  check = FALSE
)
#> # A tibble: 1 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 yyjsonr::write_json_str(my_list)    268ms   270ms      3.70    30.5MB        0

Created on 2023-10-09 with reprex v2.0.2

With to_json()

my_list <- rep(
  list(structure(list(whatever = "doesn't matter"), class = "not_foobar")),
  1000000
)

bench::mark(
  yyjsonr::write_json_str(my_list),
  iterations = 10,
  check = FALSE
)
#> # A tibble: 1 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 yyjsonr::write_json_str(my_list)    285ms   286ms      3.49    30.5MB        0

Created on 2023-10-09 with reprex v2.0.2


coolbutuseless · 2023-10-12T00:05:17Z

Thanks for the great write up!

I will sit down and have a closer look through this, but it feels like a worthwhile feature to add.

coolbutuseless · 2024-01-13T00:05:48Z

I'm feeling like this might over-complicate what yyjsonr is trying to do.

Perhaps this would be more suited to a separate package? Or just using other list-specific tools to transform data before outputting to JSON?

Leaving this issue open, and hoping that others will chime in to nominate their interest in such a feature.

anthonynorth · 2024-01-18T04:47:13Z

I disagree that it's complex, but I made it sound complex by suggesting a custom S3 dispatch to avoid overhead.

In its simplest form, we'd be calling an S3 method on each object (e.g. a list, data frame, vector) prior to serialising it. This would be fast for most use cases, but slower with large lists of lists.

An eager, recursive transform of objects is definitely doable, but it's not as general as a hook from within yyjsonr itself. The hook allows for defining how types are serialised, which is useful for package authors controlling how their objects are serialised.
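
For comparison, a rough sketch of the eager approach, assuming a user-defined to_json() generic and a hypothetical helper transform_tree() that walks the object before it is handed to the serialiser:

```r
to_json <- function(object, ...) UseMethod("to_json")
to_json.default <- function(object, ...) object
to_json.foobar <- function(object, ...) list(foo = "bar")

# Hypothetical helper: apply to_json() to an object, then to each of its children
transform_tree <- function(x) {
  x <- to_json(x)
  if (is.list(x)) {
    # x[] <- keeps names and other attributes of the transformed list
    x[] <- lapply(x, transform_tree)
  }
  x
}

tree <- list(
  a = structure(list(whatever = 1), class = "foobar"),
  b = list(
    c = structure(list(), class = "foobar"),
    d = 1:3
  )
)
out <- transform_tree(tree)
```

This walks the whole tree in R before serialisation starts, and the caller has to remember to run it, which is the sense in which it is less general than a hook inside the serialiser.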

coolbutuseless added the enhancement label · Jan 20, 2024
