How to piecemeal stream write/encode a single json object? #23

Closed
veqryn opened this issue Mar 9, 2024 · 11 comments

@veqryn
Contributor

veqryn commented Mar 9, 2024

I would like to piecemeal construct a single json object or array.
Ideally, I would like to be able to clone the Encoder at any point, so that I can have multiple versions of the partially finished json.
The main use case right now is a slog.Handler that can write the JSON incrementally as attributes are added, so that it doesn't need to fully marshal all of the attributes on every log call (similar to the built-in slog handlers, but using this json v2 library so I can take advantage of new encoder options like SpaceAfterComma).

Example attempt:

package main

import (
	"bytes"
	"fmt"

	"github.com/go-json-experiment/json"
	"github.com/go-json-experiment/json/jsontext"
)

type Property struct {
	Name  string
	Value any
}

func main() {
	properties := []Property{
		{"foo", "bar"},
		{"num", 12},
		{"hi", Hello{Foo: "fooo", Bar: 34.56, Earth: World{Baz: "bazz", Nuu: 78}}},
	}

	opts := []jsontext.Options{
		json.Deterministic(true),
		jsontext.SpaceAfterComma(true),
	}

	buf := &bytes.Buffer{}
	encoder := jsontext.NewEncoder(buf, opts...)

	wToken(encoder, jsontext.ObjectStart)

	for _, p := range properties {
		wValue(encoder, marshal(p.Name))
		wValue(encoder, marshal(p.Value))
	}

	wToken(encoder, jsontext.ObjectEnd)

	fmt.Println(buf.String())
}

func marshal(in any, opts ...json.Options) jsontext.Value {
	b, err := json.Marshal(in, opts...)
	if err != nil {
		panic(err)
	}
	return jsontext.Value(b)
}

func wToken(encoder *jsontext.Encoder, token jsontext.Token) {
	if err := encoder.WriteToken(token); err != nil {
		panic(err)
	}
}

func wValue(encoder *jsontext.Encoder, value jsontext.Value) {
	if err := encoder.WriteValue(value); err != nil {
		panic(err)
	}
}

type Hello struct {
	Foo   string
	Bar   float64
	Earth World
}

type World struct {
	Baz string
	Nuu int
}

Right now, I am encountering a few problems:

  1. The values are being parsed twice.
    In the example above, I am using the regular json.Marshal(...) to turn an any into a jsontext.Value, then writing that value to the encoder.
    When writing to the encoder, it automatically re-parses the []byte value to confirm it is valid json. This is unneeded and a performance penalty.

  2. I don't see a way to clone the Encoder.
    There isn't a method to create a new encoder with an existing buffer or any of the encoder's state set.
    If I choose to replace the encoder with a simple buffer, I would lose out on all the encoder's guarantees and the jsontext.Options available, or have to re-implement them myself.

Is there an existing better way to do this?

If not, could we discuss what api additions or changes would be needed to allow this use case?

@dsnet
Collaborator

dsnet commented Apr 11, 2024

I don't see a way to clone the Encoder.

If the Encoder backs a bytes.Buffer, then clone might have obvious semantics (as we can clone the underlying bytes.Buffer as well). But if it backs an arbitrary io.Writer, what does that mean?

@veqryn
Contributor Author

veqryn commented Apr 12, 2024

Perhaps instead of being able to clone the Encoder, how about being able to create a new Encoder if given a []byte or bytes.Buffer?

Or perhaps there is a better way to accomplish this?

A great example of the problem:
stdlib log/slog's JSONHandler, in particular how it piecemeal adds log attributes to a buffer to manually construct the JSON log line

Here is the library I have written, based off of log/slog JSONHandler, but converted to use your JSON v2 library. It also faces the same issue:
https://github.com/veqryn/slog-json

@seankhliao

I was attempting something similar and also found the jsontext API quite awkward to use. There's no efficient way to encode parts that are reused later, so you store a slice of jsontext.Token. But if you then get a json.MarshalerV1, a json.MarshalerV2, or some other non-basic type, the best you can get is a jsontext.Value, and there's no way to go from Token to Value, so you have to store a union of both.

@dsnet
Collaborator

dsnet commented Nov 12, 2024

@seankhliao, could you provide a concrete example of what you're trying to do? Thanks!

@seankhliao

I started with trying to convert my slog json handler to use jsontext. The current code is https://github.com/seankhliao/mono/blob/92d6c1a99aa152ab5312372b996128e2446bd3d2/jsonlog/jsonlog.go which works on appending to []byte buffers, cloning that when the state forks and joining prerendered state with the current log line in Handle.

I started with a bottom-up translation, with my first snag being this block. For the other cases I had replaced []byte with []jsontext.Token, but here I needed something different to accommodate an arbitrary JSON value. https://github.com/seankhliao/mono/blob/92d6c1a99aa152ab5312372b996128e2446bd3d2/jsonlog/jsonlog.go#L238-L255

When I set that problem aside to think about the rest of the code, I realized I couldn't hand the underlying writer to jsontext.NewEncoder either, unless I wanted to hold a lock for the entire duration of Handle.
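
The fork-and-join pattern described here (append to a byte buffer, clone it when the state forks, join the pre-rendered part with the current log line) can be sketched with plain byte slices. This is only an illustration using the standard library, not the linked jsonlog code: the `state` type, its `with` and `line` methods, and `appendString` are hypothetical names, and only string values are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// state holds the pre-rendered members of a JSON object, without the
// surrounding braces. Hypothetical type, for illustration only.
type state struct {
	prefix []byte
}

// with returns a fork of s with one more string member appended.
// The receiver is left untouched, so several forks can share one parent.
func (s state) with(key, val string) state {
	buf := slices.Clone(s.prefix) // copy so that forks never alias each other
	if len(buf) > 0 {
		buf = append(buf, ',')
	}
	buf = appendString(buf, key)
	buf = append(buf, ':')
	buf = appendString(buf, val)
	return state{prefix: buf}
}

// line joins the pre-rendered members with a per-record message.
func (s state) line(msg string) []byte {
	out := append([]byte{'{'}, s.prefix...)
	if len(s.prefix) > 0 {
		out = append(out, ',')
	}
	out = appendString(out, "msg")
	out = append(out, ':')
	out = appendString(out, msg)
	return append(out, '}')
}

// appendString appends s as a JSON string. encoding/json stands in for a
// dedicated quoting helper here; marshaling a string cannot fail.
func appendString(dst []byte, s string) []byte {
	b, _ := json.Marshal(s)
	return append(dst, b...)
}

func main() {
	base := state{}.with("service", "api")
	a := base.with("user", "alice")
	b := base.with("user", "bob")
	fmt.Println(string(a.line("login")))
	fmt.Println(string(b.line("logout")))
}
```

Because each fork owns its own copy of the prefix, a handler built this way only needs to lock around the final write of the finished line, not for the whole time the line is being assembled.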

@dsnet
Collaborator

dsnet commented Nov 12, 2024

At this level of optimization, I think manually crafting JSON with append and jsontext.AppendQuote for the cases where you know that the string is untrusted is the right thing to do. I also don't think the code you've got is all that unreasonable.

It's possible that we could add API to make jsontext.Encoder preserve (or pre-initialize) state between usages, but I fear that it will still always be slower, since the semantics of the Encoder are that it always checks method calls against the JSON state machine (e.g., to know whether to emit a comma) and that the output properly complies with the grammar (for the sake of correctness). Those checks will always cost you in performance.

An API that would probably help is a json.MarshalAppend function that you could use here:
https://github.com/seankhliao/mono/blob/92d6c1a99aa152ab5312372b996128e2446bd3d2/jsonlog/jsonlog.go#L253-L254

(as an aside, I suspect you probably also want an option that does a best-effort marshal where it emits valid JSON even if there is an error)
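
To make the shape of that suggestion concrete, here is a rough sketch of an append-style marshal helper combined with the best-effort behavior from the aside. No such function is confirmed to exist in the library: `marshalAppend` is a hypothetical name, it is built on the standard library's encoding/json rather than json v2, and the "!ERROR: " fallback text is an arbitrary choice for this sketch. It shows the call shape only; it still allocates internally.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalAppend is a hypothetical helper: it appends the JSON encoding of v
// to dst. If marshaling fails, it appends the error text as a JSON string
// instead, so that the surrounding document stays valid JSON.
func marshalAppend(dst []byte, v any) []byte {
	b, err := json.Marshal(v)
	if err != nil {
		// Best-effort fallback; marshaling a plain string cannot fail.
		b, _ = json.Marshal("!ERROR: " + err.Error())
	}
	return append(dst, b...)
}

func main() {
	buf := []byte(`{"ok":`)
	buf = marshalAppend(buf, map[string]int{"n": 1})
	buf = append(buf, `,"bad":`...)
	buf = marshalAppend(buf, make(chan int)) // channels cannot be marshaled
	buf = append(buf, '}')
	fmt.Println(string(buf))
	fmt.Println(json.Valid(buf)) // the line is still well-formed JSON
}
```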

@dsnet
Collaborator

dsnet commented Nov 12, 2024

BTW, is this a bug?
https://github.com/seankhliao/mono/blob/92d6c1a99aa152ab5312372b996128e2446bd3d2/jsonlog/jsonlog.go#L241-L242

Did you instead perhaps intend to do:

case json.MarshalerV1:
	b, _ := v.MarshalJSON()
	h.buf = append(h.buf, b...)

since b is already going to be JSON?
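
The distinction can be shown in isolation. The sketch below is not the linked jsonlog code: it uses the standard library's json.Marshaler (same method set as the V1 marshaler interface), and `point` and `appendValue` are hypothetical names. Output from MarshalJSON is appended verbatim because it is already JSON, while a plain Go string has to be quoted and escaped first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point implements json.Marshaler and encodes itself as a two-element array.
type point struct{ X, Y int }

func (p point) MarshalJSON() ([]byte, error) {
	return fmt.Appendf(nil, "[%d,%d]", p.X, p.Y), nil
}

// appendValue is a hypothetical helper that appends v to buf as JSON.
func appendValue(buf []byte, v any) []byte {
	switch v := v.(type) {
	case json.Marshaler:
		b, err := v.MarshalJSON()
		if err != nil {
			return append(buf, "null"...) // fallback chosen for this sketch
		}
		return append(buf, b...) // already JSON: append as-is, do not quote
	case string:
		b, _ := json.Marshal(v) // plain text must be quoted and escaped
		return append(buf, b...)
	default:
		return append(buf, "null"...) // other types omitted for brevity
	}
}

func main() {
	fmt.Println(string(appendValue(nil, point{1, 2}))) // a JSON array
	fmt.Println(string(appendValue(nil, "[1,2]")))     // a JSON string
}
```

Quoting the marshaler's output would turn the array into a string containing the text of an array, which is the bug being asked about.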

@seankhliao

I see, json.MarshalAppend would be nice. I guess when I first heard of jsontext without having seen the api, I was thinking it'd be more append based given your previous proposals on adding append based APIs.

And yes, that's a bug.

@dsnet
Collaborator

dsnet commented Nov 13, 2024

BTW, have you seen https://pkg.go.dev/github.com/orsinium-labs/jsony? It uses a structured representation that might be perfect for your use case. The nature of the structured representation for JSON arrays and objects means that it doesn't have to waste effort maintaining a push-down automaton to validate the JSON grammar.

@seankhliao

That's new to me, I'll take it for a spin

@dsnet
Collaborator

dsnet commented Jan 15, 2025

I'm going to close this as "won't fix".

For extreme performance, nothing will be faster than manually crafting JSON with a series of append calls, and we provide jsontext.AppendQuote to assist with encoding a JSON string specifically. The JSON grammar is sufficiently simple that manually crafting JSON isn't that hard.

For a more structured representation that still marshals quickly, something like the jsony project is always going to be faster than the token-based approach that jsontext takes. Functionally, jsony is an AST of the JSON value using structured Go values (albeit you are somewhat restricted in exactly what Go values can be in the AST). In contrast, a token-based approach is more flexible, but always has to take the performance hit of validating the JSON grammar according to a state machine.

There's a fundamental tradeoff between flexibility and performance. The jsontext package aims for flexibility (with decent performance), while jsony aims for the best performance at the cost of some flexibility.
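
As a rough illustration of why a structured representation can skip grammar validation, here is a tiny AST-style encoder. It is not jsony's actual API: `Node`, `Str`, `Int`, `Field`, and `Obj` are hypothetical types invented for this sketch, and string quoting is delegated to the standard library's encoding/json. Comma placement falls out of the loop over an object's fields, so no state machine is needed to decide when to emit one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Node is any value that can append its own JSON encoding to a buffer.
type Node interface {
	AppendJSON(dst []byte) []byte
}

type Str string
type Int int64

// Field is one object member; Obj keeps members in insertion order.
type Field struct {
	Key string
	Val Node
}
type Obj []Field

func (s Str) AppendJSON(dst []byte) []byte {
	b, _ := json.Marshal(string(s)) // marshaling a string cannot fail
	return append(dst, b...)
}

func (i Int) AppendJSON(dst []byte) []byte {
	return strconv.AppendInt(dst, int64(i), 10)
}

func (o Obj) AppendJSON(dst []byte) []byte {
	dst = append(dst, '{')
	for i, f := range o {
		if i > 0 {
			dst = append(dst, ',') // structure alone tells us where commas go
		}
		dst = Str(f.Key).AppendJSON(dst)
		dst = append(dst, ':')
		dst = f.Val.AppendJSON(dst)
	}
	return append(dst, '}')
}

func main() {
	doc := Obj{
		{"foo", Str("bar")},
		{"num", Int(12)},
		{"hi", Obj{{"Baz", Str("bazz")}}},
	}
	fmt.Println(string(doc.AppendJSON(nil)))
}
```

The restriction mentioned above shows up here too: only types that implement `Node` can appear in the tree, whereas a token-based encoder accepts any sequence of calls and has to check each one.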

@dsnet dsnet closed this as completed Jan 15, 2025