Experiment with compiling (ordinary) variants to objects. #3801
Force-pushed from 39b926f to 8582024.
We discussed this before: such an encoding is slow (with numeric values as keys). |
How do we get some numbers? We need something a bit more precise to evaluate the trade-offs.
|
See the discussion in #24; it should be fine to switch the tag between number and string. |
@bobzhang I've tested it, and the difference is 10x. It looks like the situation is exactly the same as 3 years ago. However, it still makes sense to explore representing tags as strings, and to measure whether there's a perf difference. This would also mean that for zero-ary variants, the interop is still perfect. |
It makes sense to me, as it also makes debugging easier.
For zero-arity variants, it makes more sense to compile polyvars into strings, as they are structurally typed and have more flexibility with names (a name can start with a lowercase letter). We also need to be careful about a special case; take list for example |
would something like |
For reference, here's the benchmark: 37cd85b#diff-7074723a3e2e5fe591d29f327ab490e7R1. |
@bloodyowl serialization is related but also can be orthogonal. E.g. see here: https://github.com/cristianoc/REInfer/blob/master/src/Serialize.re (It might require updating, but that's the gist of what kind of thing is required -- just a bit of knowledge of what the runtime representations in general look like) |
And here's the change to use block representation yet keep tags as strings: 45924dc#diff-28556b6d23fb9668ee5320bf37ed6cc2R189-R192. |
Force-pushed from 85f64ea to 45924dc.
@chenglou has identified that the slowdown was due to having integer keys in objects. |
Force-pushed from 0501045 to 7f6d70e.
Add length property. For now, as an explicit field.
Checking in little test.
size_of_t
Object length.
Remove "length" property from representation.
Adapt caml_update_dummy to also support objects.
Remove all the magic for length from Caml_obj_extern. Now, the representation of objects is not hardcoded anymore. They could equally be objects originating from JS that happen to have the same fields and contents.
Don't print `caml_update_dummy0` as it disturbs tests.
Some tests towards strings as tags.
Tweak.
rebase on master
First attempt at emitting tags as strings.
More variant examples.
Add support for Pisstring in if-then-else generated for pattern matching.
comment
Sync pattern matching changes from ocaml.
Pull in ocaml change for booleans, and fix some tests.
NOTE: the comparison needs at least revisiting, as lists now don't compare in the intuitive way. The base case "[]" is a string, and is the largest element in the current comparison.
Tweak printexc
Support compilation of recursive modules. The shape is a variant constructed in an untyped way from OCaml, so it needs to generate the new representation.
Disable test with jsConverter that does not fit in the new runtime representation. This does not make much sense:
```
| A1 [@bs.as 3]
```
Compile () to 0, not "()".
Disable one more test with deriving jsConverter.
Tweak caml_parser to use the new representation.
Remove the tests involving generated parsers. Supporting generated parsers requires passing some information differently. Currently, the internal representation of variants is implicitly assumed in the generated code. E.g. the table `yytransl_const` is an array, assuming that the index is the numeric tag of terminals.
Update hash test.
Delete arith_lexer.mll
Restore some lexer files.
Add record runtime representation benchmark.
Use blocks for payloads but string for tags.
Use new representation where args are strings not numbers. Instead of using 0, 1, 2 for variant arguments, use "Arg0", "Arg1", "Arg2". This avoids the perf issues with `record_bench.js`; in fact, the test becomes faster.
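To make the last commit message concrete, here is a minimal TypeScript sketch of the two encodings being compared for a value like C(1, 2): numeric payload keys versus the temporary "Arg0", "Arg1" keys. The type names, the helper functions, and the `tag` field name are assumptions for illustration; the code the compiler actually emits may differ.

```typescript
// Hypothetical encodings of the variant value C(1, 2); all names are illustrative.

// Payload stored under numeric keys (the encoding reported as slow).
type NumericKeyed = { tag: string; 0: number; 1: number };

// Payload stored under string keys "Arg0", "Arg1" (temporary names from the commit message).
type StringKeyed = { tag: string; Arg0: number; Arg1: number };

function makeNumericKeyed(a: number, b: number): NumericKeyed {
  return { tag: "C", 0: a, 1: b };
}

function makeStringKeyed(a: number, b: number): StringKeyed {
  return { tag: "C", Arg0: a, Arg1: b };
}

// Both encodings carry the same information; only the key names differ.
function sumNumericKeyed(v: NumericKeyed): number {
  return v[0] + v[1];
}

function sumStringKeyed(v: StringKeyed): number {
  return v.Arg0 + v.Arg1;
}

const numericSum = sumNumericKeyed(makeNumericKeyed(1, 2));
const stringSum = sumStringKeyed(makeStringKeyed(1, 2));
```

One observable difference is that integer-like keys are enumerated before string keys regardless of insertion order, so `Object.keys` yields `0, 1, tag` for the first encoding and `tag, Arg0, Arg1` for the second. This sketch does not measure performance; it only shows the shapes involved.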
Force-pushed from 7f6d70e to c4921e5.
You could also use polymorphic variants, which can (currently) be safely run through JSON stringify and parse |
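As a rough illustration of why serialization comes up here, the sketch below compares a JSON round trip for a plain-object encoding with one for an array that carries an extra `tag` property. It assumes the older block encoding is such an array, and the `_0`/`_1` field names are placeholders, not the final names.

```typescript
// Assumed older encoding: an array of payloads with an extra `tag` property attached.
const oldStyle: number[] & { tag: number } = Object.assign([1, 2], { tag: 3 });

// Hypothetical object encoding with a string tag and named payload fields.
const objectStyle = { tag: "C", _0: 1, _1: 2 };

// JSON.stringify serializes only the indexed elements of an array,
// so the extra `tag` property is dropped.
const oldJson = JSON.stringify(oldStyle);
const oldRoundTrip = JSON.parse(oldJson);

// A plain object keeps all of its own enumerable properties.
const objectRoundTrip = JSON.parse(JSON.stringify(objectStyle));
```

Under these assumptions, the object encoding survives `JSON.stringify` followed by `JSON.parse` unchanged, while the array encoding loses its tag.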
How about using shorter keys since object keys can't be shortened by minifiers? Elm uses
or, name the keys like Scala tuples:
|
@Niveous I've explained this elsewhere, though I can't find that issue currently. But the short version is that we want to expose these underlying representations as public, for trivial, cost-free (code-free!) interop with JS. |
@chenglou |
They're not. These are temporary names. We'll rename them to something more readable before public release. |
Sounds good then! Maybe they should be modifiable using attributes similar to https://serde.rs/enum-representations.html for even better zero cost interop. |
Would this work for Inline Records? |
Not considering it until variants are a public representation too |
I am going to move this forward.
In debug mode, we can have one more
Once this lands, I think that performance-wise we will have reached the optimum for data representation, and then we can stabilize the ABI. |
Sure. Then let's make it clear that folks should still not rely on the new representation. |
Splitting into phases is good, even letting people test etc. But I would try to reach a final form before creating an actual release. So any tooling needs to sync up only once. |
Btw, if you put tag first then _0, _1 etc. in the object it should perform better. Then every key will always be at the same offset so you can benefit from the inline cache optimisations |
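A small sketch of the ordering suggestion above: every value is built with `tag` first, followed by `_0`, `_1`, and so on, so that a given key sits at the same position in every object. The `makeVariant` helper and the `Variant` type are hypothetical; a compiler would more likely emit object literals directly.

```typescript
// Hypothetical shape: a string tag plus positional payload fields.
type Variant = { tag: string; [field: string]: unknown };

// Always writes `tag` first, then _0, _1, ... in order.
function makeVariant(tag: string, ...args: unknown[]): Variant {
  const v: Variant = { tag };
  args.forEach((arg, i) => {
    v[`_${i}`] = arg;
  });
  return v;
}

const variantB = makeVariant("B", 1);
const variantC = makeVariant("C", 1, 2);

// Property enumeration follows insertion order for non-integer keys.
const keysOfB = Object.keys(variantB);
const keysOfC = Object.keys(variantC);
```

Here `keysOfB` is a prefix of `keysOfC`, which is the property the comment relies on. Whether an engine's inline caches actually benefit would need to be measured.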
Just out of interest, what will happen to variants with inline records? These would open up an amazing opportunity for interop potentially. |
@jfrolich For inline records, I am not sure; we will see how difficult remapping the keys turns out to be. |
Could we get this for poly variants as well? #4295 |
One caveat: TypeScript and Flow's discriminated union support requires all cases of the DU type to have the same
type v =
  | A1
  | A2
  | B(int)
  | C(int, int)
  | D((int, int));
To interop seamlessly with Flow/TS, it would need to output something like:
{ tag: 0 } // A1
{ tag: 1 } // A2
{ tag: 2, _1: 1 } // B(1)
{ tag: 3, _1: 1, _2: 2 } // C(1, 2)
{ tag: 4, _1: [1, 2] } // D((1, 2))
Of course, if a variant type has no payloads in any cases, it could just be compiled to 0, 1, 2, etc. like currently. |
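On the TypeScript side, the output suggested in this comment could be typed as a discriminated union on `tag`. The sketch below is one possible modelling, not something the compiler generates; the type name `V` and the function `showV` are made up for illustration.

```typescript
// One union member per constructor, discriminated by the numeric `tag` field.
type V =
  | { tag: 0 }                           // A1
  | { tag: 1 }                           // A2
  | { tag: 2; _1: number }               // B(int)
  | { tag: 3; _1: number; _2: number }   // C(int, int)
  | { tag: 4; _1: [number, number] };    // D((int, int))

// Switching on `tag` narrows `v` to the matching member in each branch.
function showV(v: V): string {
  switch (v.tag) {
    case 0:
      return "A1";
    case 1:
      return "A2";
    case 2:
      return `B(${v._1})`;
    case 3:
      return `C(${v._1}, ${v._2})`;
    case 4:
      return `D((${v._1[0]}, ${v._1[1]}))`;
  }
}

const shownC = showV({ tag: 3, _1: 1, _2: 2 });
const shownD = showV({ tag: 4, _1: [1, 2] });
```

Because every member shares the `tag` property, the type checker can verify that the switch is exhaustive.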
What's the advantage of using
@yawaramin It seems completely reasonable afaict to use this kind of rep, which would be way more expressive, and ts/flow could still wrangle it just fine:
type v =
  | A1
  | A2
  | B(int)
  | C(int, int)
  | D((int, int));
"A1" // A1
"A2" // A2
{ B: 1 } // B(1)
{ C: [1, 2] } // C(1, 2)
{ D: [[1, 2]] } // D((1, 2))
That way you could write converter helpers to/from objects/records that would let you write really expressive code for information processing, and bucklescript would be much more capable of writing libraries that target vanilla js like typescript is, which is one of the goals, right? I'm in favor of an ergonomics-first approach like it was with the records. This seems like a premature optimization to me if it does have any perf benefits, and a separate type like |
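For comparison, the alternative representation proposed in this comment can also be typed and consumed from TypeScript without a shared tag field, by narrowing with `typeof` and `in`. This is only a sketch of how a consumer might handle it; `V2` and `showV2` are hypothetical names.

```typescript
// Constant constructors are strings; constructors with payloads are single-key objects.
type V2 =
  | "A1"
  | "A2"
  | { B: number }
  | { C: [number, number] }
  | { D: [[number, number]] };

function showV2(v: V2): string {
  // Constant constructors: the string is the constructor name.
  if (typeof v === "string") return v;
  // Payload constructors: the single key names the constructor.
  if ("B" in v) return `B(${v.B})`;
  if ("C" in v) return `C(${v.C[0]}, ${v.C[1]})`;
  return `D((${v.D[0][0]}, ${v.D[0][1]}))`;
}

const shownA1 = showV2("A1");
const shownC2 = showV2({ C: [1, 2] });
```

The trade-off discussed in the thread still applies: each constructor produces an object with a different key, rather than one common field to switch on.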
@Risto-Stevcev the tags would be a TS enum (or equivalent), so you wouldn't use the numbers directly |
That's assuming that you would be using typescript on the other side though and not plain javascript. It would also mean you can't really write libraries that target both bucklescript and plain js because the js representation wouldn't be ergonomic. Typescript gives you the tools to write variants that are like what I'm proposing if you want to also target plain js:
// if you want to write a typescript-only lib -- enums aren't really ergonomic with plain js
enum Res {
  No = 0,
  Yes = 1,
}
// if you want to write a lib that also targets plain js
// (bucklescript would have to choose one of them since it can't just add new features to ocaml)
type Res2 = "Yes" | "No"
const yes: Res2 = "Yes"
console.log(Res.Yes, yes);
// the value proposition (what `result` would compile to) described using typescript:
type Result<T, E> = { Ok: T } | { Error: E }
const foo: Result<number, Error> = { Ok: 123 }
The last two examples show what you would get if you kept the approach I'm proposing -- you could write libraries in bucklescript that also target plain javascript in a very straightforward way with a clean API. Also note that typescript can handle the proposed syntax just fine because it has support for union types but enums would be much less ergonomic for a library targeting plain js. |
Perhaps it would be possible to customize the representation with annotations? For instance it would be quite nice to have variants map to Redux-like actions easily:
type action =
  | Increment
  | Decrement
  | Set({number: int})
to
I can imagine the |
@jacobp100 @Risto-Stevcev people using TS/Flow discriminated unions feature don't write types and pattern matches like that. They will want to write a single Anyway this may all be moot, because the proposed tags and field identifiers here are essentially numeric, so not really what TS/Flow people would normally do. But what I'm suggesting will at least be simple to model on that side. |
That's a fair point, you could model pattern matching like that in js as well. But why the { tag: "C", value: /*tuple*/[1,2] } Ocaml already does type erasure anyway (tuples reuse array rep on the js side) |
@Risto-Stevcev I believe that'd be because the VM would be able to optimise maps better |
@bloodyowl I saw the comment that it would benefit from inline cache optimizations, but it brings me back to my skepticism about the old record representation: what are the benchmarks on this, and are they realistic in terms of how variants are used in OCaml and JS? For example, is it only noticeable at around 1000 variations? That would literally never happen in real code, since it's really rare that you'd ever go over 10 variations for any data structure. It seems like a premature optimization. TypeScript isn't doing this and it doesn't seem like anyone's complaining about perf. |
To clarify, my comment was only that putting But as a side note, I think the
Would have the TS type (Although that might take future work) |
@Risto-Stevcev arrays in JS (particularly V8) are optimised for homogeneous contents. I don't think that has anything to do with cache lines. As soon as you start mixing data types in an array v8 switches to a less optimal representation of the array. There's plenty of posts about this, but the first search result I came up with seems good: I don't think TypeScript developers are storing 90% of their data values in heterogeneous arrays, as BuckleScript generated code used to before records-as-objects. They use unions and other things to simulate what we do with variants and records. |
@TheSpyder Yeah, I think I get all the arguments for the rep now. But is it worth the perf gains to use |
Experiment with the runtime representation of (non-polymorphic) variants, to see if they can be produced/consumed from JS without the need for conversion.
Example
Is represented as:
OCaml objects, exceptions, polymorphic variants, and lazy values keep the existing representation for now.
Changes
Requires operations to:
tag
length
with e.g. Caml_obj_extern.size_of_t and Caml_obj_extern.length.
Caml_hash.caml_hash or Caml_obj.caml_obj_dup.
The pattern-matching compiler had to be modified so it does not apply a range-discovering algorithm to the numeric tags, but only uses equality on the string constants.
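The pattern-matching change can be pictured with a small sketch. With numeric tags, a match on several adjacent constructors can be compiled to a range test; with string tags there is no meaningful range, so each constructor is tested by equality. The functions below are hypothetical and are not actual compiler output.

```typescript
// Numeric tags: constructors B (tag 2) and C (tag 3) can be covered by one range test.
function isBOrCNumeric(tag: number): boolean {
  return tag >= 2 && tag <= 3;
}

// String tags: ranges do not apply, so each constructor is compared by equality.
function isBOrCString(tag: string): boolean {
  return tag === "B" || tag === "C";
}

const numericHit = isBOrCNumeric(3);
const stringHit = isBOrCString("C");
const stringMiss = isBOrCString("A1");
```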
TODO
1 Disabled tests that use jsConverter, which is not compatible: ast_js_mapper_poly_test.ml, ast_abstract_test.ml
One specific feature might need revisiting: | A1 [@bs.as 3].
2 Removed tests involving generated parsers:
Supporting generated parsers requires passing some information differently.
Currently, the internal representation of variants is implicitly assumed in the generated code.
E.g. the table yytransl_const is an array, assuming that the index is the numeric tag of terminals.
Since these are only used for bigger internal tests, it's possible to adapt ocamllex to generate different code.
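To illustrate the generated-parser issue, the sketch below contrasts a table indexed by a numeric tag with a lookup keyed by a string tag. The token names and codes are invented for the example, and this is not the code that the parser generator produces.

```typescript
// Hypothetical token codes; real tables are produced by the parser generator.

// Array form: the position in the array is the numeric tag of the terminal.
const translByIndex: number[] = [257, 258, 259];

// Keyed form: what a string-tag representation would need instead.
const translByName: Record<string, number> = { PLUS: 257, MINUS: 258, EOF: 259 };

function lookupByNumericTag(tag: number): number | undefined {
  return translByIndex[tag];
}

function lookupByStringTag(tag: string): number | undefined {
  return translByName[tag];
}

const codeFromIndex = lookupByNumericTag(1);
const codeFromName = lookupByStringTag("MINUS");
```

With string tags the array lookup no longer works, which is why the generated code would have to pass this information differently.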