Rethink about the runtime encoding of OCaml values in JavaScript #24
Comments
Note that this encoding can actually be efficient in most cases, like …
As far as I know, ocamlscript currently encodes an OCaml string (a pure byte array) as a JS string (a UTF-16 string?). I think this may go wrong in certain situations. Sorry if this is unrelated to this issue.

@m2ym indeed, there is a corner case with range patterns: `function '\000' .. '\255' -> 1` in OCaml will be compiled into `function _ -> 1`. Do you have anything else in mind? Thanks! (ocamlscript will require -safe-string.)

We can work around this issue by …

On second thought, it might not be a problem: since strings are immutable and string literals are lexed by ocamllex, there is no way to build a string with out-of-range contents; it can only happen through the FFI. @alainfrisch?
Yes, you could get strings from JS through the FFI.
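A hedged sketch of the corner case being discussed, assuming a JS string flows in through the FFI (the function name and values below are illustrative, not compiler output):

```js
// An OCaml byte-range pattern such as
//   function '\000' .. '\255' -> 1
// already covers every possible char, so the compiler may emit just a wildcard:
function classify(_c) { return 1 }

// But a string handed over from JS via the FFI can carry code units above 255,
// which the OCaml side assumed impossible for its byte strings.
var fromJs = "\u20AC" // euro sign, code unit 0x20AC
console.log(classify(fromJs.charCodeAt(0))) // 1, even though 0x20AC is outside the byte range
```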
Another benefit of such an encoding is that, for the FFI, tuples are quite easy to map; even if we go to the typedtree we can still guarantee the mapping of tuples, since the field of …

Float arrays and records could be represented using …

@copy I have similar ideas, but …

I don't know why they don't use it; maybe they haven't considered it yet. I agree regarding engine support. It could be enabled conditionally with a flag or shimmed, though.

@hhugo do you have any suggestions?
For the record, encoding variants as JS objects would help debugging a lot, for example:

```js
function $Cons(x, y) { this._0 = x; this._1 = y; this.t = "Cons" }
$Cons.prototype.toString = function () { return `${this.t} ( ${this._0}, ${this._1})` }
```

The benefit is that we can make use of the prototype chain in JS and get pretty output when debugging; the challenge is that we need to know where constructors are called (every call site). For records we can do something similar:

```ocaml
type t = { x : int ; y : int }
```

```js
function $t(x, y) { this._0 = x; this._1 = y; this.labels = ['x', 'y'] }
$t.prototype.toString = function () { /* ... */ }
```

We can still preserve OCaml's structural comparison semantics, with the same challenge as for encoding variants. If we go this way, it is less work than compiling from the typedtree, but we still get reasonable output.
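As a small usage sketch of the `$Cons` constructor above (the string "Nil" merely stands in for however the empty list would be encoded):

```js
var list = new $Cons(1, new $Cons(2, "Nil"))
console.log(String(list)) // Cons ( 1, Cons ( 2, Nil))
```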
Note that the reason that …

So currently: records, tuples, arrays, and polymorphic variants are all compiled into arrays (offset 0); normal variants are compiled into an array-like object

```js
{ 0: field_0, 1: field_1, ..., n - 1: field_n_1, length: n, tag: obj.tag }
```

and the option and list types are handled specially.
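A minimal sketch of what this encoding looks like for concrete values (the values and tags below are illustrative, not actual compiler output):

```js
// (1, 2, 3): a tuple compiles to a plain array.
var tuple = [1, 2, 3]

// A two-argument variant constructor with tag 1 compiles to the array-like object form.
var variant = { 0: "x", 1: "y", length: 2, tag: 1 }

// Field access is index-based either way.
console.log(tuple[0], variant[0], variant.tag)
```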
As discussed in email previously, I'd strongly suggest using object literals with 0-based indexing and the tag, as you described. This might be slower in some cases than the array-literal version, but I'm sure we (and other VMs) can fix the performance differences when you run into tricky performance problems. I think the object-literal version has the most potential for usability (i.e. FFI), readability, and consistent performance.

@bmeurer that's very kind of you; we will move forward with this encoding for variants.

For the record, we can use …
Actually we talked about this again and came to the conclusion that using an Array literal plus defineProperty for the "tag" might currently be better from a performance perspective.
@bmeurer thanks for the info; it's easy for us to switch. You mean like this below?

```js
Object.defineProperty([1, 2, 3], 'tag', { 'value': tag_value })
```

What are the best property settings? Another question: shall I write a wrapper function

```js
let f = (arr, tag) => Object.defineProperty(arr, 'tag', { 'value': tag })
f([1, 2, 3], tag_value)
```

or always do the inlining? I guess I should do the inlining for the peephole, right?
Yes, I'd also go for that, i.e. a non-writable, non-configurable 'tag' property. Not sure if the helper function makes sense. It might save you some space, and it shouldn't really hurt performance, as it should not become polymorphic as long as tag is always a small integer.
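A sketch of such a helper with the property attributes spelled out; `mk_block` is just an illustrative name, and the explicit flags only restate the defaults (attributes omitted from a descriptor default to false):

```js
// Attach a constant, non-enumerable 'tag' to an array block.
function mk_block(arr, tag) {
  return Object.defineProperty(arr, 'tag', {
    value: tag,
    writable: false,
    enumerable: false,
    configurable: false
  })
}

var b = mk_block([1, 2, 3], 1)
console.log(b.tag, b.length) // 1 3
```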
@bmeurer thanks, we will go this way; it will not save too much space after being gzipped.
What about creating "prototypical class" constructors for each tag type? When creating objects, you just create "new" instances of those. Every object already inherently has a constructor with its prototype, so this doesn't seem like it would add any more memory allocations. Checking the tag could be as simple as checking reference identity on that constructor/prototype of the object/record etc.
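A hedged sketch of that idea, with one constructor function per variant constructor (the `Leaf`/`Node` names are made up for illustration):

```js
function Leaf() {}
function Node(v, l, r) { this._0 = v; this._1 = l; this._2 = r }

var t = new Node(1, new Leaf(), new Leaf())

// The "tag check" becomes a reference-identity check on the constructor
// (or equivalently on its prototype), with no numeric tag stored per object.
if (t.constructor === Node) {
  console.log("Node with value", t._0)
}
```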
@jordwalke can you elaborate (with an example)? The thing is, we don't have the type definition at this point. But we can add a name property for the debugging experience when it is available, like below:

```js
{ tag: 0, 0: field_0, 1: field_1, name: constructor_name_when_available_just_for_debugging }
```
For the record, labels are actually available when constructing a record, so when compiling a record (in debug mode) we can generate, for

```ocaml
type t = { x : float; y : float }
let make_t x y = { x ; y }
type u = A of int
let make_a x = A x
```

something like

```js
function make_t(x, y) {
  return Object.defineProperty([x, y], 'labels', { 'value': ["x", "y"] })
}
function make_a(x) {
  return Object.defineProperties([x], { 'name': { 'value': 'A' }, 'tag': { 'value': 0 } })
}
```

With this, I think we can debug the OCaml-generated JS in a JS debugger without too much pain (with or without source maps).
Old reply: Regarding the FFI API, I really don't care too much because I would imagine a toolchain that compiles interop based on Flow type definitions. (Flow is likely a better match than TypeScript because it has non-nullability by default and even polymorphic variants! TypeScript is what you get when C# developers build static typing. Flow is what you get when OCaml developers build static typing - but I digress.) Why is it that you have access to the field labels at debug time, but not for other compilation modes? And yes, just having those debuggable values fulfills the purpose - the actual representation doesn't matter as long as it performs very well.

If you benchmark, don't forget to test on JavaScriptCore (JIT and non-JIT). It is one of the best, lightest-weight, and most versatile JS engines for mobile, where perf matters most. I'm always happy to help people test on JSC (it's already installed on every Mac by default!)
@jordwalke at debug time, field access still goes through the array. For example …

It's OK for variants; the thing is that some compiler internals (mostly in translclass/translobj) assume a block is an array. Unless upstream lends a hand, it will be a headache to maintain those patches.

It would be fine to have them continue to be delivered as an array, as long as every type chose a distinct index range. So, a … We don't need to know the actual original "types"; we just need to guarantee that the index ranges (for creation and access) are consistent for two data …

Arrays with holes will waste space and are in general slower than packed ones. @bobzhang Object.defineProperty in a hot function will always be way slower than adding a property via a store to a previously defined literal. slow3.js usually translates to highly efficient code that just allocates and initializes the literal with slack space for the tag property, and then transitions the object to a new hidden class and stores the tag value. So that's just bump-pointer allocation plus a bunch of machine-level stores, while slow4.js always goes to the runtime (switching to C++) for the Object.defineProperty call.
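A rough sketch of the two patterns being contrasted here (the actual slow3.js/slow4.js benchmark files are not reproduced; the function names are made up):

```js
// Pattern A: build the literal, then add 'tag' with a plain property store.
function makeWithStore(a, b, c, tag) {
  var x = [a, b, c]
  x.tag = tag
  return x
}

// Pattern B: add 'tag' with Object.defineProperty in the hot path; as described
// above, every call goes through the runtime (C++).
function makeWithDefineProperty(a, b, c, tag) {
  return Object.defineProperty([a, b, c], 'tag', { value: tag })
}
```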
@bobzhang Ok, I figured out what's causing the slowdown in slow.js vs fast.js. The access to the objects is properly optimized (in V8), but the allocation of the literal is currently not. So what happens is that we generate fast code for `[a, ..., z]`, but need to fall back to the generic allocation path (which is super slow compared to the inline path) for `{0: a, ..., n-1: z}`. I'm not exactly sure why we do this, though, as there doesn't seem to be an obvious reason why we can't support both in the inline path. Maybe it's just that this didn't turn out to be relevant (and somehow, in the back of my head, I was pretty sure we had already fixed this some time ago).

I was not suggesting that there be actual holes in the arrays that are allocated in JavaScript. I was only suggesting that holes be placed in the index ranges in the compiler's intermediate representation. Those holes are only there to ensure that intermediate representations maintain a distinct "meaning" for various offsets. We don't need to know everything about the type at this later stage of the compiler, only its memory layout and some hole starting index that uniquely classifies which other structures it is compatible with. I would then suggest taking those hole-ridden ranges and converting them into plain Objects as follows: …

This has all the benefits of the third test case that I created, called "String Object Keys" above, but without the issue that JIT optimizers may have their hidden classes confused by every structure having its fields located at the same keys. The actual native OCaml compiler would want to disregard those holes. I'm merely suggesting a way that, via index ranges, everything we need to know about the distinct type can be conveyed without actually having to track the type through the various intermediate representations. For every possible engine, including legacy engines deployed to node/browsers, it seems this would be optimal, correct?
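A hypothetical sketch of the index-range idea (the offsets are arbitrary; the point is only that unrelated structures never share a key set, and therefore never share a hidden class):

```js
// Two unrelated structures get disjoint index ranges from the compiler's IR.
var point = { 0: 1.0, 1: 2.0 }     // e.g. a record whose range starts at 0
var interval = { 100: 3, 101: 7 }  // e.g. another type whose range starts at 100

// Creation and access both respect the chosen ranges.
console.log(point[0] + point[1])           // 3
console.log(interval[101] - interval[100]) // 4
```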
@jordwalke Indeed, that's a good suggestion, and I suppose it will be optimizable on all engines.

@bmeurer, thanks for looking. Is there any downside with respect to …

@jordwalke the general policy for patches is that they should be sound, which means that even if a patch is missing somewhere, the output should still be correct (maybe less efficient or uglier). I think we can discuss this more in the future.

I do not believe I proposed anything unsound.

It's almost the same performance on access, yes.
@bmeurer cool, it seems slow3.js is the best encoding at this time. I used to think that patching an object with a property could cause de-optimization, but that is not true in this case, right?
@bobzhang In V8 this won't cause de-opts, but the assignment to x.tag will not be inlined into the function (it uses a store IC instead), because the array literal doesn't reserve space for in-object properties, so we need to allocate an out-of-object property backing store for tag first.
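To make that concrete, a small illustrative comparison of the two shapes discussed in this thread (not actual compiler output):

```js
// Array literal plus a late 'tag' store: as noted above, the literal reserves no
// slack for in-object properties, so the store goes through a store IC and needs
// an out-of-object backing store.
var a = [1, 2, 3]
a.tag = 1

// Object literal that mentions 'tag' up front: the property is part of the
// object's initial shape instead.
var o = { 0: 1, 1: 2, 2: 3, length: 3, tag: 1 }
```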
@bmeurer @jordwalke so we will go with …
I want to address the issues with the object literals in V8. I'll try to reserve some time for that during the year.

@bmeurer cool, thank you in advance!

Feel free to re-open it if anyone has better ideas.
Is there still an issue with the performance of object literals? I'm hitting an issue with the current array representation: I'm sending the representation of union types around the network, but given that most serialization formats either ignore or strip properties in arrays, there's no guarantee that the representation will be the same after being encoded and decoded again.

@ergl There's apparently some work on "efficient deriving" that should fix this. In the meantime, you could use this for serialization: https://github.com/BuckleTypes/transit-bsc

@glennsl thanks for the link, I ended up implementing a converter for array encoding <-> object literal encoding, as I'm not in control of the transport format right now.
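For what it's worth, a hedged sketch of what such a converter could look like; this is not the implementation mentioned above, it assumes the `{0: ..., n-1: ..., length, tag}` layout described earlier, and it ignores special cases such as strings, floats, and custom blocks:

```js
// Array block -> plain object, so serializers that drop array properties keep everything.
function blockToObject(v) {
  if (!Array.isArray(v)) return v                   // immediates pass through
  var o = { length: v.length, tag: v.tag | 0 }
  for (var i = 0; i < v.length; i++) o[i] = blockToObject(v[i])
  return o
}

// Plain object -> array block, restoring the extra properties.
function objectToBlock(v) {
  if (v === null || typeof v !== 'object') return v
  var a = []
  for (var i = 0; i < v.length; i++) a[i] = objectToBlock(v[i])
  a.tag = v.tag   // the real encoding may attach this differently (e.g. via Object.defineProperty)
  return a
}
```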
We are going to provide something like the below:

```ocaml
type t = ... [@@bs.deriving{json}]
```

Currently it is recommended to roll your own.
Hope this isn't too off topic. I was just looking at a common use-case: mapping over a list. I made a test demo using an implementation like the one below, and tested in Chrome and Safari using esbench. For the immutable case, the object wins out slightly in Chrome, and by quite a lot in Safari. For the mutable case, the object wins out by quite a lot in both browsers (both were faster than any immutable updates, too). My results in Safari were …
```js
let listArray = null
let listObject = null
for (let i = 100; i >= 0; i -= 1) {
  listArray = [i, listArray]
  listObject = { current: i, next: listObject }
}

const fn = x => x * 2 + 1

const mapImmutableArray = (fn, x) => x != null ? [fn(x[0]), mapImmutableArray(fn, x[1])] : null
const mapImmutableObject = (fn, x) => x != null ? { current: fn(x.current), next: mapImmutableObject(fn, x.next) } : null

const mapMutableArray = (fn, x) => {
  if (x == null) return null
  const out = [fn(x[0]), null]
  let writeTo = out
  let readFrom = x[1]
  while (readFrom != null) {
    const next = [fn(readFrom[0]), null]
    writeTo[1] = next
    writeTo = next
    readFrom = readFrom[1]
  }
  return out
}

const mapMutableObject = (fn, x) => {
  if (x == null) return null
  const out = { current: fn(x.current), next: null }
  let writeTo = out
  let readFrom = x.next
  while (readFrom != null) {
    const next = { current: fn(readFrom.current), next: null }
    writeTo.next = next
    writeTo = next
    readFrom = readFrom.next
  }
  return out
}
```
Goal:

Some documentation about the current encoding is here: https://github.com/bloomberg/ocamlscript/blob/master/docs%2Fffi.md. There is a problem with this encoding: `Pfield i` is no longer the same as `Parrayref`, while the internal OCaml compiler thinks it is the same, for example in stdlib/camlinternalOO.ml, bytecomp/translobj.ml, and bytecomp/translclass.ml; there might be some other files I am missing. (Recent changes in the trunk of stdlib/camlinternalOO require us to sync up the change.)

I am thinking that `Obj.tag` is not used much in the compiler itself (except by the GC, which is not relevant in the JS backend). So I am proposing that blocks with tag zero be encoded as arrays, and blocks with a non-zero tag (mostly normal variants) be encoded as an array plus an extra property via `Object.defineProperty`, so that they are polymorphic and `Pfield` and `Parrayref` behave the same.

For example, `A (1,2,3)` will be encoded as

```js
Object.defineProperty([1, 2, 3], 't', { 'value': 1 })
```

and `B (1,2)` will be encoded as

```js
[1, 2]
```
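Putting the two examples together, a minimal sketch of how field and tag access stay uniform under this proposal (variable names are illustrative, not actual compiler output):

```js
// A tag-zero block like B (1,2) is just an array:
var b = [1, 2]

// A non-zero-tag block like A (1,2,3) carries a hidden 't' property:
var a = Object.defineProperty([1, 2, 3], 't', { 'value': 1 })

// Pfield and Parrayref then lower to the same indexing expression:
console.log(a[0], b[1]) // 1 2
console.log(a.t)        // 1 (reading the tag where present)
```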