Feature: Pre-serialise s3 generic for object transform #16

Open
anthonynorth opened this issue Oct 9, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@anthonynorth

An S3 generic that transforms objects before they are serialised to JSON would give package authors full control over how their objects are serialised. This is the same concept as the JavaScript toJSON() method on objects.

Some use cases:

  • Adding / removing elements from lists
  • Convert element names to camelCase
  • Formatting dates, rounding numbers, etc
  • Applying AsIs to elements that must be scalars

A naïve example implementation:

yyjson_mut_val* any_serialise_function(SEXP object) {
  // should be global
  SEXP to_json = PROTECT(
    Rf_findFun(
      Rf_install("to_json"),
      Rf_findVarInFrame(R_NamespaceRegistry, Rf_install("yyjsonr"))
    )
  );

  SEXP trans_object = PROTECT(
    Rf_eval(Rf_lang2(to_json, object), R_GlobalEnv)
  );

  // serialise trans_object

  UNPROTECT(2); // 1 if to_json is global
  // return the yyjson_mut_val*
}

#' @export
to_json <- function(object, ...) UseMethod("to_json")

# A real use-case would be to add AsIs to scalars, drop items that aren't required, camelCase property names etc.
# but complete object replacement is possible
#' @export
to_json.foobar <- function(object, ...) list(foo = "bar")
foobar <- structure(list(whatever = "doesn't matter"), class = "foobar")
yyjsonr::write_json_str(foobar)
#> [1] "{\"foo\":[\"bar\"]}"

Created on 2023-10-09 with reprex v2.0.2

Overhead? Yes there is

Executing an R method for each object in the tree has some overhead. In prototyping, I observed around 1.5 s of overhead per 1,000,000 to_json() dispatches. If we assume that most objects won't have a to_json() method, we can avoid much of this overhead by skipping the to_json.default() call.

One approach is to cache the classes implementing to_json() and only dispatch if our input object inherits any of these classes.

E.g.

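// classes[] and n_classes are hypothetical: the cached names of classes with a to_json() method
for (int i = 0; i < n_classes; i++) {
  if (!Rf_inherits(object, classes[i])) continue;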
  // invoke to_json()
  break;
}
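The same idea can be sketched at the R level. This is only an illustration of the check, assuming a hard-coded `to_json_classes` vector and a hypothetical `maybe_to_json()` wrapper; a real implementation would do this in C and would need some way to discover the registered methods.

```r
# Proposed generic (an assumption of this issue, not part of yyjsonr's API).
to_json <- function(object, ...) UseMethod("to_json")
to_json.foobar <- function(object, ...) list(foo = "bar")

# Hypothetical cache of classes known to implement to_json().
to_json_classes <- c("foobar")

# Dispatch only when the object inherits from a cached class;
# everything else is passed through without calling any R method.
maybe_to_json <- function(object) {
  if (inherits(object, to_json_classes)) to_json(object) else object
}

hit <- maybe_to_json(structure(list(whatever = "x"), class = "foobar"))
other <- structure(list(whatever = "x"), class = "not_foobar")
miss <- maybe_to_json(other)
```

No to_json.default() is defined here on purpose: objects without a method never reach the generic.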

In prototyping, with only one S3 method to check, I found this reduced the overhead of serialising types without a to_json() method to approximately 20 ms per 1,000,000 objects.

Without to_json()

my_list <- rep(
  list(structure(list(whatever = "doesn't matter"), class = "not_foobar")),
  1000000
)

bench::mark(
  yyjsonr::write_json_str(my_list),
  iterations = 10,
  check = FALSE
)
#> # A tibble: 1 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 yyjsonr::write_json_str(my_list)    268ms   270ms      3.70    30.5MB        0

Created on 2023-10-09 with reprex v2.0.2

With to_json()

my_list <- rep(
  list(structure(list(whatever = "doesn't matter"), class = "not_foobar")),
  1000000
)

bench::mark(
  yyjsonr::write_json_str(my_list),
  iterations = 10,
  check = FALSE
)
#> # A tibble: 1 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 yyjsonr::write_json_str(my_list)    285ms   286ms      3.49    30.5MB        0

Created on 2023-10-09 with reprex v2.0.2

@coolbutuseless
Owner

Thanks for the great write up!

I will sit down and have a closer look through this, but it feels like a worthwhile feature to add.

@coolbutuseless
Owner

I'm feeling like this might over-complicate what yyjsonr is trying to do.

Perhaps this would be more suited to a separate package? Or just using other list-specific tools to transform data before outputting to JSON?

Leaving this issue open, and hoping that others will chime in to nominate their interest in such a feature.

@anthonynorth
Author

I disagree that it's complex, but I made it sound complex by suggesting a custom S3 dispatch to avoid overhead.

In its simplest form, we'd be calling an S3 method on each object (e.g. a list, data frame, vector) prior to serialising it. This would be fast for most use cases, but slower with large lists of lists.

An eager, recursive transform of objects is definitely doable, but it's not as general as a hook from within yyjsonr itself. The hook allows for defining how types are serialised, which is useful for package authors controlling how their objects are serialised.
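For comparison, a minimal sketch of the eager, recursive alternative, run by the caller before serialising. The `to_json()` generic is the one proposed in this issue and `pre_serialise()` is a hypothetical helper name; neither exists in yyjsonr.

```r
# Proposed generic (an assumption of this issue, not part of yyjsonr's API).
to_json <- function(object, ...) UseMethod("to_json")
to_json.default <- function(object, ...) object
to_json.foobar <- function(object, ...) list(foo = "bar")

# Hypothetical helper: transform an object, then walk into its children.
pre_serialise <- function(object) {
  object <- to_json(object)
  if (is.list(object)) {
    attrs <- attributes(object)               # keep names, class, etc.
    object <- lapply(object, pre_serialise)
    attributes(object) <- attrs
  }
  object
}

x <- list(
  a = structure(list(whatever = 1), class = "foobar"),
  b = list(c = structure(list(), class = "foobar"))
)
res <- pre_serialise(x)
# The result could then be passed on, e.g. yyjsonr::write_json_str(res)
```

This builds a transformed copy of the whole tree up front and relies on every caller remembering to run it, whereas a hook inside the serialiser would apply the same methods automatically during the write.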

@coolbutuseless coolbutuseless added the enhancement New feature or request label Jan 20, 2024