Experiment with compiling (ordinary) variants to objects. #3801

cristianoc · 2019-09-02T15:31:29Z

Experiment with the runtime representation of (non-polymorphic) variants, to see if they can be produced/consumed from JS without the need for conversion.

Example

type v =
  | A1
  | A2
  | B(int)
  | C(int, int)
  | D((int, int));

let a1 = A1;
let a2 = A2;
let b = B(34);
let c = C(4, 2);
let d = D((4, 2));

Is represented as:

var a1 = "A1";

var a2 = "A2";

var b = /* constructor */{
  tag: "B",
  Arg0: 34
};

var c = /* constructor */{
  tag: "C",
  Arg0: 4,
  Arg1: 2
};

var d = /* constructor */{
  tag: "D",
  Arg0: /* tuple */[
    4,
    2
  ]
};

Ocaml objects, exceptions, polymorphic variants, and lazy values for now keep the existing representation.

Changes

Requires operations to:

Access the tag
Access the length with e.g. Caml_obj_extern.size_of_t and Caml_obj_extern.length.
Copy/traverse the object as in Caml_hash.caml_hash or Caml_obj.caml_obj_dup.

The pattern-matching compiler had to be modified so it does not apply a range-discovering algorithm to the numeric tags. But only uses equality on the string constants.

TODO

1 Disabled tests that use jsConverter which is not compatible:

ast_js_mapper_poly_test.ml
ast_abstract_test.ml
One specific feature might need revisiting: | A1 [@bs.as 3].

2 Removed tests involving generated parsers:
Supporting generated parsers requires passing some information differently.
Currently, the internal representation of variants is implicitly assumed in the generated code.
E.g. the table yytransl_const is an array, assuming that the index is the numeric tag of terminals.
Since these are only used for bigger internal tests, it's possible to adapt ocamllex to generate different code.

bobzhang · 2019-09-18T02:08:29Z

we discussed such before

var c = /* constructor */{
  tag: "C",
  "0": 4,
  "1": 2
};

such encoding is slow (with numeric val as key)

cristianoc · 2019-09-18T02:15:59Z

such encoding is slow (with numeric val as key)

How do we get some numbers? We need something a bit more precise to evaluate the trade-offs.
There are many possibilities including:

Use numbers by default, and let people opt into strings.
Use strings by default, and let people opt into numbers.
Use genType to generate singleton types for the numeric constants, to act as a form of documentation. So it's still usable from JS.

bobzhang · 2019-09-18T05:38:13Z

see discussions in #24, it should be fine to switch tag between number and string.
It is that {"tag" : 0 , "0":a,"1":b} is quite different from [a,b].tag = 0

cristianoc · 2019-09-18T06:58:25Z

@bobzhang ooh so I've tested the difference is 10X. Looks like the situation is exactly the same as 3 years ago.
That makes tradeoffs really easy: no way we're going to use this.

However, it still makes sense to explore representing tags as strings, and measure if there's a perf difference. This would also mean that for zero-ary variants, the interop is still perfect.
Also, we can reuse a lot of what's done here to do the same for polymorphic variants.

bobzhang · 2019-09-18T11:29:57Z

However, it still makes sense to explore representing tags as strings, and measure if there's a perf difference.

It makes sense to me, as it also makes debug easier

This would also mean that for zero-ary variants, the interop is still perfect.

For zero-arity variants, polyvar makes more sense to be compiled into string, as it is structual typing and it has more flexibity with names (name can start with lowercase letter). We also need be careful about a speical case, take list for example type list = | Cons of int * list | Nil, in the compiler we take advantage of the fact that Nil is false, if we make it string, such invariant no longer holds

bloodyowl · 2019-09-19T09:11:44Z

would something like {tag: "Name", contents: [1, 2]} have the same negative performance impact? the ability to serialise BucklSscript's data structures would be a huge help for server-side rendering as we generally do something like : window.initialState = JSON.stringify(data)

cristianoc · 2019-09-19T11:25:18Z

For reference, here's the bernchmark: 37cd85b#diff-7074723a3e2e5fe591d29f327ab490e7R1.

cristianoc · 2019-09-19T11:27:25Z

@bloodyowl serialization is related but also can be orthogonal. E.g. see here: https://github.com/cristianoc/REInfer/blob/master/src/Serialize.re

(It might require updating, but that's the gist of what kind of thing is required -- just a bit of knowledge of what the runtime representations in general look like)

cristianoc · 2019-09-19T11:36:39Z

And here's the change to use block representation yet keep tags as strings: 45924dc#diff-28556b6d23fb9668ee5320bf37ed6cc2R189-R192.

cristianoc · 2019-09-20T03:57:37Z

@chenglou has identified that the slowdown was due to having integer keys in objects.
The new representation uses Arg0, Arg1 etc as keys. And the benchmark jscomp/test/record_bench.js is now faster than with arrays.

Add length property. For now, an an explicit field. Checking in little test. size_of_t Object length. Remove "length" property from representation. Adapt caml_update_dummy to also support objects. Remove all the magic for length from Caml_obj_extern. Now, the representation of objects is not hardcoded anymore. They could equally be objects originating from JS that happen to have the same fields and contents. Don't print `caml_update_dummy0` as it disturbs tests. Some tests towards strings as tags. Tweak. rebase on master First attempt at emitting tags as strings. More variant examples. Add support for Pisstring in if-then-else generated for pattern matching. comment Sync pattern matching changes from ocaml. Pull in ocaml change for booleans, and fix some tests. NOTE: the comparison needs at least revisiting. As now lists don't compare in the intuitive way. The base case "[]" is a string, and is the largest element in the current comparison. Tweak printexc Support compilation of recursive modules. The shape is a variant constructed in an untyped way from OCaml, needs to generate the new representation. Disable test with jsConverter that does not fit in the new runtime representation. This does not make much sense: ``` | A1 [@bs.as 3] ``` Compile () to 0, not "()". Disable one more test with deriving jsConverter. Tweak caml_parser to use the new representation. Remove the tests involving generated parsers. Supporting generated parsers requires passing some information differently. Currently, the internal representation of variants is implicitly assumed in the generated code. E.g. the table `yytransl_const` is an array, assuming that the index is the numeric tag of terminals. Update hash test. Delete arith_lexer.mll Restore some lexer files. Add record runtime representation benchmark. Use blocks for payloads but string for tags. Use new representaiton where args are strings not numbers. Instead of using 0, 1, 2 for variant arguments, use "Arg0", "Arg1", "Arg2". This avoids the perf issues with `record_bench.js`, in fact the test becomes faster.

jacobp100 · 2019-10-02T12:01:49Z

would something like {tag: "Name", contents: [1, 2]} have the same negative performance impact? the ability to serialise BucklSscript's data structures would be a huge help for server-side rendering as we generally do something like : window.initialState = JSON.stringify(data)

You could also use polymorphic variants, which can (currently) be safely run through JSON stringify and parse

Niveous · 2019-11-17T11:59:51Z

How about using shorter keys since object keys can't be shortened by minifiers? Elm uses $ for the tag and a, b, etc for the fields.

{$: 'C', a: 4, b: 2}

or, name the keys like Scala tuples:

{$: 'C', _0: 4, _1: 2}

chenglou · 2019-11-17T13:07:29Z

@Niveous I've explained this elsewhere, though I can't find that issue currently. But the short version is that we want to expose these underlying representations as public, for trivial, cost-free (code-free!) interop with JS.

Niveous · 2019-11-17T13:29:48Z

@chenglou tag Arg0 etc are not exposed right now right? Could still rename them to something shorter before commiting to a public API?

chenglou · 2019-11-17T13:30:54Z

They're not. Temporary names. We'll name them into something more readable before public release.

Niveous · 2019-11-17T13:38:20Z

Sounds good then! Maybe they should be modifiable using attributes similar to https://serde.rs/enum-representations.html for even better zero cost interop.

sikanhe · 2019-11-21T22:10:01Z

Would this work for Inline Records?

chenglou · 2019-11-26T01:57:00Z

Not considering it until variants are a public representation too

bobzhang · 2020-04-15T03:45:49Z

I am going to move this forward .
For the transition, I am not going to changing the pattern match compiler, this is my proposed representation:

{ _0 : field0, _1 : field1, ..., tag  }

In debug mode, we can have one more

{ _0 : field0, _1 : field1, ..., tag  , name}

Once landed, I think performance wise, we have reached optimal for data representation then we can stabilize the ABI

chenglou · 2020-04-15T03:47:30Z

Sure. Then let's make it clear that folks should not leverage the new representation still

cristianoc · 2020-04-15T05:15:40Z

Splitting into phases is good, even letting people test etc. But I would try to reach a final form before creating an actual release. So any tooling needs to sync up only once.

jacobp100 · 2020-04-15T07:12:04Z

Btw, if you put tag first then _0, _1 etc. in the object it should perform better. Then every key will always be at the same offset so you can benefit from the inline cache optimisations

jfrolich · 2020-04-15T13:26:57Z

Just out of interest, what will happen to variants with inline records? These would open up an amazing opportunity for interop potentially.

bobzhang · 2020-04-16T01:43:23Z

@jfrolich For inline records, I am not sure, we will see how difficult it goes to remapping keys.
@cristianoc I tend to keep tag point to number instead of string, we can add new fields for debugging. The reason is that change tag from number to string is not strictly better, it increases the bundle size, slow dow a little bit, but it requires much more work to do. We need more benefit to justify the complexity

Risto-Stevcev · 2020-04-16T14:22:36Z

Could we get this for poly variants as well? #4295
They tend to be more useful for most of my use cases because you can merge sets (ie: type encoding = [ input_encoding | output_encoding ]) and you don't need to bring the constructors into scope

yawaramin · 2020-04-17T03:55:59Z

One caveat: TypeScript and Flow's discriminated union support requires all cases of the DU type to have the same tag field. E.g., to adapt @cristianoc 's original example:

type v =
  | A1
  | A2
  | B(int)
  | C(int, int)
  | D((int, int));

To interop seamlessly with Flow/TS, it would need to output something like:

{ tag: 0 } // A1
{ tag: 1 } // A2
{ tag: 2, _1: 1 } // B(1)
{ tag: 3, _1: 1, _2: 2 } // C(1, 2)
{ tag: 4, _1: [1, 2] } // D((1, 2))

Of course, if a variant type has no payloads in any cases, it could just be compiled to 0, 1, 2, etc. like currently.

jacobp100 · 2020-04-17T08:00:53Z

@yawaramin Play around with this https://www.typescriptlang.org/play/index.html?ssl=1&ssc=1&pln=21&pc=2#code/C4TwDgpgBAKlC8AoKKoB8oAZmowRhxQwCZD0oBvKYAQwHMAuKAZgBooB9PJvdj4psSgBfMhiq1GUACx9uUANq8oxALojEiADYRg1CAGc98KAApgTGAEoEAPkpkAlgDMzoSAHtXx+CYDkAHYArgC2AEYQAE5+NhRkqFAGAO6OwADGABZusfEJqGk0BtCYTJG6QZEBWLl5UAVFUPJlwBVVBLW19dACUM2tKjUaCcJQEFoNcR2JKelZ5gB0kjlTCV0speWVLIOrhdDSGy1b0oOiw4jCQA - you could be able to remove some of the cases and see errors

Risto-Stevcev · 2020-04-17T08:30:19Z

What the advantage of using tag and _1, _2,... in this representation?
I read #24 but I'm not sure I understand what the performance issue is here. You normally wouldn't have that many variants in ordinary code so it seems like any kind of optimization would be negligible in terms of perf but would dramatically hurt ergonomics.

@yawaramin It seems completely reasonable afaict to use this kind of rep, which would be way more expressive, and ts/flow could still wrangle it just fine:

type v =
  | A1
  | A2
  | B(int)
  | C(int, int)
  | D((int, int));

"A1" // A1
"A2" // A2
{ B: 1 } // B(1)
{ C: [1, 2] } // C(1, 2)
{ D: [[1, 2]] } // D((1, 2))

That way you could write converter helpers to/from objects/records that would let you write really expressive code for information processing, and bucklescript would be much more capable of writing libraries that target vanilla js like typescript is, which is one of the goals right?

I'm in favor of an ergonomics-first approach like it was with the records. This seems like a premature optimization to me if it does have an perf benefits, and a separate type like Js.t could be provided for those cases where you need to perf gains

jacobp100 · 2020-04-17T08:49:35Z

@Risto-Stevcev the tags would be a TS enum (or equivalent), so you wouldn't use the numbers directly

Risto-Stevcev · 2020-04-17T09:32:14Z

That's assuming that you would be using typescript on the other side though and not plain javascript. It would also mean you can't really write libraries that target both bucklescript and plain js because the js representation wouldn't be ergonomic. Typescript gives you the tools to write variants that are like what I'm proposing if you want to also target plain js:

// if you want to write a typescript-only lib -- enums aren't really ergonomic with plain js
enum Res {
    No = 0,
    Yes = 1,
}
// if you want to write a lib that also targets plain js
// (bucklescript would have to choose one of them since it can't just add new features to ocaml)
type Res2 = "Yes" | "No"

const yes: Res2 = "Yes"
console.log(Res.Yes, yes);

// the value proposition (what `result` would compile to) described using typescript:
type Result<T, E> = { Ok: T } | { Error: E }
const foo: Result<number, Error> = { Ok: 123 }

The last two examples show what you would get if you kept the approach I'm proposing -- you could write libraries in bucklescript that also target plain javascript in a very straightforward way with a clean API.

Also note that typescript can handle the proposed syntax just fine because it has support for union types but enums would be much less ergonomic for a library targeting plain js.

jfrolich · 2020-04-17T09:38:53Z

Perhaps it would be possible to customize the representation with annotations? For instance it would be quite nice to have variants map to Redux-like actions easily:

type action = 
| Increment
| Decrement
| Set({number: int})

to

{type: "Increment"}
{type:  "Decrement"}
{type: "Set", number: 1}

I can imagine the tag is the default tag, but perhaps customizable with an annotation to type?

yawaramin · 2020-04-17T13:43:48Z

@jacobp100 @Risto-Stevcev people using TS/Flow discriminated unions feature don't write types and pattern matches like that. They will want to write a single switch statement. See https://www.typescriptlang.org/docs/handbook/advanced-types.html#discriminated-unions and https://flow.org/en/docs/types/unions/#toc-disjoint-unions . The common element to both is that each case has to have a 'discriminant' i.e. the tag field that is being discussed here. Without it, if you want TS/Flow to do exhaustiveness checking, you have to resort to workarounds.

Anyway this may all be moot, because the proposed tags and field identifiers here are essentially numeric, so not really what TS/Flow people would normally do. But what I'm suggesting will at least be simple to model on that side.

Risto-Stevcev · 2020-04-17T14:05:44Z

That's a fair point, you could model pattern matching like that in js as well. But why the _1, _2, etc? why not just do something like:

{ tag: "C", value: /*tuple*/[1,2] }

Ocaml already does type erasure anyway (tuples reuse array rep on the js side)

bloodyowl · 2020-04-17T14:08:00Z

@Risto-Stevcev I believe that'd be because the VM would be able to optimise maps better

Risto-Stevcev · 2020-04-17T14:14:09Z

@bloodyowl I saw that the comment that it would benefit from inline cache optimizations, but it brings me back to my skepticism about the old record representation -- what are the benchmarks on this and are they realistic in terms of how variants are used in ocaml and js? for example, is it only noticeable about 1000 variations? which would literally never happen in real code since it's really rare that you'd ever go over 10 variations for any data structure. It seems like a premature optimization. Typescript isn't doing this and it doesn't seem like anyone's complaining about perf

jacobp100 · 2020-04-17T14:31:31Z

To clarify, my comment was only that putting tag first before the _0 fields would benefit inline caches.

But as a side note, I think the _0 stuff goes away if you use inline records

type t =
  | Test { x: 0, y: 0 }
  | Other { a: 0, b: 0 }

Would have the TS type { tag: 0, x: number, y: number } | { tag: 1, a: number, b: number }, which is pretty friendly!

(Although that might take future work)

TheSpyder · 2020-04-18T00:04:51Z

@Risto-Stevcev arrays in JS (particularly V8) are optimised for homogeneous contents. I don't think that has anything to do with cache lines. As soon as you start mixing data types in an array v8 switches to a less optimal representation of the array. There's plenty of posts about this, but the first search result I came up with seems good:
https://stackoverflow.com/questions/50939690/v8-heterogeneous-array-literals

I don't think TypeScript developers are storing 90% of their data values in heterogeneous arrays, as BuckleScript generated code used to before records-as-objects. They use unions and other things to simulate what we do with variants and records.

Risto-Stevcev · 2020-04-18T11:23:16Z

@TheSpyder Yeah, I think I get all the arguments for the rep now. But is it worth the perf gains to use _1, _2,... considering how few variants you'd work with? I'm just asking, I don't know what the benchmarks are. In any case, @jacobp100 mentioned that inline records seem would solve the issue with the rep

cristianoc force-pushed the variant_to_object branch from 39b926f to 8582024 Compare September 11, 2019 09:20

cristianoc force-pushed the variant_to_object branch from 85f64ea to 45924dc Compare September 20, 2019 02:29

cristianoc force-pushed the variant_to_object branch from 0501045 to 7f6d70e Compare September 20, 2019 12:39

cristianoc force-pushed the variant_to_object branch from 7f6d70e to c4921e5 Compare September 20, 2019 12:49

cristianoc closed this Nov 6, 2019

cristianoc reopened this Nov 6, 2019

rickyvetter mentioned this pull request Mar 6, 2020

Use open poly variant to cover the FUTURE case zth/rescript-relay#19

Closed

TheSpyder mentioned this pull request Apr 16, 2020

variant to record #4307

Closed

4 tasks

yawaramin mentioned this pull request Apr 17, 2020

Proposal to provide cleaner representation for poly variants #4295

Closed

bobzhang closed this Oct 10, 2020

cristianoc deleted the variant_to_object branch June 18, 2022 02:21

ryyppy added the experiment label Dec 20, 2022

Experiment with compiling (ordinary) variants to objects. #3801

Experiment with compiling (ordinary) variants to objects. #3801

Uh oh!

Conversation

cristianoc commented Sep 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bobzhang commented Sep 18, 2019

Uh oh!

cristianoc commented Sep 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bobzhang commented Sep 18, 2019

Uh oh!

cristianoc commented Sep 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bobzhang commented Sep 18, 2019

Uh oh!

bloodyowl commented Sep 19, 2019

Uh oh!

cristianoc commented Sep 19, 2019

Uh oh!

cristianoc commented Sep 19, 2019

Uh oh!

cristianoc commented Sep 19, 2019

Uh oh!

cristianoc commented Sep 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacobp100 commented Oct 2, 2019

Uh oh!

Niveous commented Nov 17, 2019

Uh oh!

chenglou commented Nov 17, 2019

Uh oh!

Niveous commented Nov 17, 2019

Uh oh!

chenglou commented Nov 17, 2019

Uh oh!

Niveous commented Nov 17, 2019

Uh oh!

sikanhe commented Nov 21, 2019

Uh oh!

chenglou commented Nov 26, 2019

Uh oh!

bobzhang commented Apr 15, 2020

Uh oh!

chenglou commented Apr 15, 2020

Uh oh!

cristianoc commented Apr 15, 2020

Uh oh!

jacobp100 commented Apr 15, 2020

Uh oh!

jfrolich commented Apr 15, 2020

Uh oh!

bobzhang commented Apr 16, 2020

Uh oh!

Risto-Stevcev commented Apr 16, 2020

Uh oh!

yawaramin commented Apr 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacobp100 commented Apr 17, 2020

Uh oh!

Risto-Stevcev commented Apr 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacobp100 commented Apr 17, 2020

Uh oh!

Risto-Stevcev commented Apr 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jfrolich commented Apr 17, 2020

Uh oh!

yawaramin commented Apr 17, 2020

Uh oh!

Risto-Stevcev commented Apr 17, 2020

Uh oh!

bloodyowl commented Apr 17, 2020

cristianoc commented Sep 2, 2019 •

edited

Loading

cristianoc commented Sep 18, 2019 •

edited

Loading

cristianoc commented Sep 18, 2019 •

edited

Loading

cristianoc commented Sep 20, 2019 •

edited

Loading

yawaramin commented Apr 17, 2020 •

edited

Loading

Risto-Stevcev commented Apr 17, 2020 •

edited

Loading

Risto-Stevcev commented Apr 17, 2020 •

edited

Loading

Risto-Stevcev commented Apr 17, 2020 •

edited

Loading

jacobp100 commented Apr 17, 2020 •

edited

Loading