
Conversation


@moberegger moberegger commented May 30, 2025

Makes a few optimizations to the Jbuilder::KeyFormatter integration:

  • Moves the computation of the cache key to the Hash itself, which performs slightly better than using ||=. I believe this is because Ruby can compute the key in a single call.
  • Uses *args and **kwargs to handle the provided format symbols. This saves on having to compute a list of formatters from the provided options, and saves some memory allocations for the empty [] arrays that represented "no parameters". While this is definitely a micro-optimization (ie. it would only be a hot code path if a template itself uses json.key_format!), it does make things easier to read.
  • Uses .each over .inject when computing cache keys, which is slightly faster.
  • Saves some memory allocation when initializing either Jbuilder or JbuilderTemplate with no options, which is what the template handler does (json ||= JbuilderTemplate.new(self)). No need to allocate an empty Hash for this.
  • Optimizes Jbuilder initialization by using the [] operator instead of fetch when grabbing values from options
  • Saves on a memory allocation for both key_format! and self.key_format by simply passing args through to KeyFormatter.new. This is a micro-optimization, but can be a hotter code path if a template uses json.key_format!.
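The cache-on-the-Hash idea from the first bullet can be sketched like this (the formatter here is a hypothetical stand-in for whatever is configured, not the actual Jbuilder internals):

```ruby
# Hypothetical formatter standing in for the configured format chain
# (e.g. camelize(:lower)).
def format_key(key)
  parts = key.to_s.split("_")
  parts.first + parts.drop(1).map(&:capitalize).join
end

# The cache is the Hash itself: a miss runs the default proc once and
# stores the result; every later lookup is a plain Hash read, with no
# explicit ||= in the hot path.
CACHE = Hash.new { |hash, key| hash[key] = format_key(key) }

CACHE[:first_name] # computed and stored
CACHE[:first_name] # served straight from the Hash
```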

Now, the big change here is that the key formatter cache is no longer clobbered between template renders. What was originally happening was that when you configured a global key formatter with something like Jbuilder.key_format = camelize: :lower, it would be .cloned whenever a new Jbuilder was initialized, which happens before each template render. When it was cloned, the cache would be wiped when KeyFormatter#initialize_copy ran. This meant that each template render - and thus each API request - would start with a fresh cache, so you would have to re-pay the cost of formatting the keys all over again.

I'm not sure why it was done this way. When configuring a global key formatter for Jbuilder, I would expect it to be used across requests so that the cost to generate the keys could amortize as the service runs. This behaviour was originally added twelve years ago, specifically in this commit, but it's not clear to me what the intent or motivation was.
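The cache-wiping behaviour described above boils down to something like this (a simplified sketch of the old KeyFormatter, not the exact source):

```ruby
class KeyFormatter
  attr_reader :cache

  def initialize
    @cache = {}
  end

  # Ruby runs this on clone/dup. Since every new Jbuilder cloned the
  # global formatter, every render began with an empty cache.
  def initialize_copy(original)
    super
    @cache = {}
  end
end

global = KeyFormatter.new
global.cache[:first_name] = "firstName" # warmed up by a previous render

per_render = global.clone
per_render.cache # empty again; the warm-up is thrown away
```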

Currently we don't use Jbuilder's key formatting abilities... so you may be wondering why I'm doing this. Consider the following...

Without the key formatter, we would have to write templates like the following to get camelized keys

json.set! :firstName, person[:first_name]
json.set! :lastName, person[:last_name]
json.set! :age, person[:age]
json.set! :city, person[:city]

With a key formatter configured like Jbuilder.key_format = camelize: :lower, we could instead do

json.extract! person, :first_name, :last_name, :age, :city

and get the same result. Comparing the two...

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    53.232k i/100ms
            extract!    43.572k i/100ms
Calculating -------------------------------------
                set!    532.799k (± 1.5%) i/s    (1.88 μs/i) -      2.715M in   5.096552s
            extract!    598.457k (± 1.4%) i/s    (1.67 μs/i) -      3.006M in   5.024653s

Comparison:
            extract!:   598457.4 i/s
                set!:   532799.1 i/s - 1.12x  slower
Calculating -------------------------------------
                set!   320.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        320 allocated - 4.00x more

So this change will allow us to leverage extract! more to save on latency and memory. This is because extract! does a lot less work under the hood, whereas set! has a wide range of abilities, and that has CPU and memory overhead to manage the various options that can be provided to it.

With better caching for the key formatter we now have a tenable way to mitigate much of the overhead within the library.

@moberegger moberegger marked this pull request as ready for review May 30, 2025 15:02
@moberegger moberegger requested review from Insomniak47 and mscrivo May 30, 2025 15:03
Comment on lines 787 to 793
test 'do not use default key formatter directly' do
  Jbuilder.key_format
  jbuild { |json| json.key 'value' }
  formatter = Jbuilder.send(:class_variable_get, '@@key_formatter')
  cache = formatter.instance_variable_get('@cache')
  assert_empty cache
end
Maybe repurpose this to validate that it is used?

Suppose I should just repurpose this test. With the Mutex I'm much more confident with the approach.

@moberegger moberegger commented May 30, 2025

The benchmark in the description was run against a raw Jbuilder. The differences are even more pronounced when using JbuilderTemplate, which is what is used in the Action View integration and is thus more representative of what would happen in prod.

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    41.713k i/100ms
            extract!    60.275k i/100ms
Calculating -------------------------------------
                set!    423.084k (± 6.9%) i/s    (2.36 μs/i) -      2.127M in   5.072925s
            extract!    596.795k (± 3.7%) i/s    (1.68 μs/i) -      3.014M in   5.058895s

Comparison:
            extract!:   596794.9 i/s
                set!:   423083.6 i/s - 1.41x  slower
Calculating -------------------------------------
                set!   480.000  memsize (     0.000  retained)
                        12.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        480 allocated - 6.00x more

This new approach would circumvent many of the cpu and memory hotspots we're seeing.

@moberegger

Filed upstream: rails#597

lib/jbuilder.rb Outdated
@key_formatter = options.fetch(:key_formatter){ @@key_formatter ? @@key_formatter.clone : nil}
@ignore_nil = options.fetch(:ignore_nil, @@ignore_nil)
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
@key_formatter = options&.[](:key_formatter) || @@key_formatter


Only concern I have generally here is that this is not thread safe unless the key formatter is (assuming the clone operation is thread safe). As well, can we obviate the need for the inline nil checks on the options? Do you see any perf improvement if you branch instead in the nil case?

if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ... 
else 
  # ... The existing stuff without the safe index operation
end

@moberegger moberegger Jun 2, 2025


The previous iteration of KeyFormatter was thread safe because you would get a fresh cache when it was cloned. But this means that the cache only exists for the duration of the request which limits its usefulness IMO:

  • Depending on the template, you pay the cost to cache a key without ever yielding a cache hit (ex: first_name is only formatted a single time)
  • The cost to format doesn't amortize across requests (ie. you have to re-format first_name at least once per request), increasing the number of times those costly operations run.

Given a KeyFormatter that runs .camelize(:lower) on each key (for example), an input of first_name will always yield a result of firstName. There is no need to re-compute this. While what I am proposing is not technically thread safe, I'd argue it doesn't need to be because there isn't a race condition that would result in an undesirable outcome. Really, this is closer to memoizing the result than a cache per se.

We have Puma configured to run three threads - which I believe is also the default for Rails - so there is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times, but the result will be the same.

We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.
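For reference, the mutex variant under discussion would look roughly like this (a sketch; `expensive_format` is a hypothetical stand-in for the configured formatter, and the real cache shape may differ):

```ruby
class KeyFormatter
  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  # Every lookup takes the lock; the debate is whether this overhead
  # outweighs the win of keeping the cache warm across requests.
  def format(key)
    @mutex.synchronize do
      @cache[key] ||= expensive_format(key)
    end
  end

  private

  # Hypothetical stand-in for e.g. camelize(:lower).
  def expensive_format(key)
    parts = key.to_s.split("_")
    parts.first + parts.drop(1).map(&:capitalize).join
  end
end

formatter = KeyFormatter.new
threads = 3.times.map { Thread.new { formatter.format(:first_name) } }
threads.each(&:join) # all three get the same result; the hash is never mutated concurrently
```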


@Insomniak47 Insomniak47 Jun 2, 2025


I'd argue it doesn't need to be because there isn't a race condition that would result in any undesirable outcome. Really, this is closer to memoizing the result than a cache per se.

Two requests that caused reallocation of the underlying memory at the same time would result in a memory error, would it not? The underlying memory in the hash map is not protected by a critical section, and concurrent writes could result in stray allocations and leaks.


We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.

I don't think this is true


As well, I don't think gating it with an RW lock (with upgrading) would necessarily add much overhead given the read-heavy nature; it's usually just a single atomic increment for these sorts of maps. Unless there's a ton of read/write conflicts, I don't believe an RW lock would be an issue.


@moberegger moberegger Jun 2, 2025


there is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times

Thinking about it more, I'm not sure this scenario is even possible with the GVL, because I moved the formatting to the default proc, which I believed made it a single call, meaning the thread scheduler wouldn't switch contexts while the formatter runs. Grain of salt, though. Did some more reading: the default proc doesn't make it atomic.

The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.
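For reference, `cache[key] ||= value` is not a single operation; it expands to roughly the steps below, and a scheduler may switch threads between any two of them (certainly on interpreters without a GVL):

```ruby
cache = {}
key   = :first_name
value = "firstName"

# cache[key] ||= value desugars to approximately:
existing = cache[key]     # 1. read
unless existing           # 2. nil/false check
  cache[key] = value      # 3. conditional write; another thread can interleave before this
end

cache[key] # the worst case here is duplicated work, since every thread computes the same value
```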


I don't believe this is possible with the GVL; only one thread can execute Ruby code at a time.

If the execution of whatever operation is not guaranteed to be a single call by contract we can't assume this will continue to be true

The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.

I don't believe there's a guarantee made by the language that does that, so we shouldn't be relying on implementation details of any particular interpreter; a quick googlin' shows it's an issue in different interpreters (JRuby, etc.). We should not be writing un-threadsafe code that runs in a concurrent context, or code that is incidentally threadsafe but not assuredly so, because there are no guarantees that the code will stay in its current form forever, and when it changes it should be able to be changed safely.


@moberegger moberegger Jun 2, 2025


My perf concerns were unfounded. Did a quick implementation with the mutex and re-ran the benchmark to compare

# No key formatter; mimics what we're doing now
json.set! :firstName, person[:first_name]
json.set! :lastName, person[:last_name]
json.set! :age, person[:age]
json.set! :city, person[:city]

to

# With key formatter for `camelize: :lower`, cache hits after first render
json.extract! person, :first_name, :last_name, :age, :city

and it's in line with what was seen in my previous comment

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    97.310k i/100ms
            extract!   133.702k i/100ms
Calculating -------------------------------------
                set!    948.250k (± 9.5%) i/s    (1.05 μs/i) -      4.671M in   5.002671s
            extract!      1.369M (± 8.8%) i/s  (730.21 ns/i) -      6.819M in   5.044080s

Comparison:
            extract!:  1369470.1 i/s
                set!:   948249.9 i/s - 1.44x  slower

Calculating -------------------------------------
                set!   480.000  memsize (     0.000  retained)
                        12.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        480 allocated - 6.00x more

The latter being faster and less memory intensive than the former is really all I care about for this iteration, and this still holds true with the mutex. Last time I tried something similar to this I observed a performance hit. That would have been on Ruby 3.3, and I also probably did it without YJIT, so perhaps 3.4 and/or YJIT does some heavy lifting here for us.


@moberegger moberegger Jun 2, 2025


Regarding

if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ... 
else 
  # ... The existing stuff without the safe index operation
end

it made no difference. Oops! I borked it. Correction: it does make a difference when options are provided.


Yeah, makes sense with the caching especially, assuming it's a one time cost now?


@Insomniak47 Insomniak47 left a comment


Just blocking on the thread safety question

@cache = {}
def initialize(*formats, **formats_with_options)
  @cache =
    Hash.new do |hash, key|

@Insomniak47 Insomniak47 Jun 2, 2025


This is not thread safe and my understanding from the above is you're sharing instances now?

Generally speaking, it's very uncommon to have thread-safe hashmaps without mutexes, and they're usually slower (there are a few lock-free ones in the wild).


def initialize_copy(original)
@cache = {}
hash[key] = value


I'm not sure if it's the docs you saw, but:

 *  You can define a per-key default for a hash;
 *  that is, a Proc that will return a value based on the key itself.
 *
 *  You can set the default proc when the hash is created with Hash.new and a block,
 *  or later with method #default_proc=.
 *
 *  Note that the proc can modify +self+,
 *  but modifying +self+ in this way is not thread-safe;
 *  multiple threads can concurrently call into the default proc
 *  for the same key.

This is specifically called out as unsafe in the docs, and I'm pretty sure that without a critical section there's no way this could be made atomic without the GVL blocking for an arbitrary amount of execution.

On top of that, I don't even think indexed assignment is guaranteed to be threadsafe in these contexts by the language; an older blog post shows similar failures. These may or may not be possible in YJIT (I'd assume that at least some types of concurrency-related failures are), but we should be writing to the guarantees we have, not incidental facts, especially when we're writing libraries.

I'm not sure about Ruby mutex performance, but I do know that locking an uncontested RW lock in most other langs is ~20ns and your bench is running in the µs range - if you want to share the map, you might wanna just see if you can get an RW lock in there and see the cost.


I'm poking around with Mutex now. Studying up on some other concurrent map implementations for inspiration and to see if there is anything else I should be thinking about.

lib/jbuilder.rb Outdated
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
if options
@key_formatter = options[:key_formatter]
@ignore_nil = options[:ignore_nil]


My Ruby is bad now - will this be nil if it's unset? Which I know is falsy, but...


options can't be unset. It is a required parameter that defaults to nil.

I can change this to

    if options.nil?
      @key_formatter = @@key_formatter
      @ignore_nil = @@ignore_nil
      @deep_format_keys = @@deep_format_keys
    else
      @key_formatter = options[:key_formatter]
      @ignore_nil = options[:ignore_nil]
      @deep_format_keys = options[:deep_format_keys]
    end

to make it clearer (and perhaps safer).


Actually... I wonder if it's better to just use kwargs here... hold on, I think I got something better.



I mean, the value in :ignore_nil is nil if it's not there, right?


@moberegger moberegger Jun 3, 2025


Yeah, Hashes will return nil if the key doesn't exist. Should have been false. Likewise with :deep_format_keys. I was too careless hammering this out.

Moot point. Moved it over to use key args, which actually speeds things up a bit more.
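The keyword-argument version presumably looks something like this (a sketch based on the discussion, not necessarily the exact final code; the defaults fall back to the class-level settings, which sidesteps the nil-vs-false bug above):

```ruby
class Jbuilder
  @@key_formatter    = nil
  @@ignore_nil       = false
  @@deep_format_keys = false

  # Keyword defaults are evaluated per call, so they pick up the current
  # class-level settings; callers pass only what they want to override,
  # and no empty options Hash needs to be allocated.
  def initialize(key_formatter: @@key_formatter,
                 ignore_nil: @@ignore_nil,
                 deep_format_keys: @@deep_format_keys)
    @key_formatter    = key_formatter
    @ignore_nil       = ignore_nil
    @deep_format_keys = deep_format_keys
  end
end

json = Jbuilder.new # @ignore_nil is false here, not nil
```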

@moberegger

ActionView::Template::Error: no implicit conversion of nil into Hash

Ah right, **nil only works from 3.3 on... one moment please.
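For anyone else who hits this: from Ruby 3.3, `**nil` is treated like passing no keyword arguments, while earlier versions raise the `no implicit conversion of nil into Hash` error above. A quick check (`options_of` is just an illustrative helper):

```ruby
def options_of(**opts)
  opts
end

maybe_options = nil

begin
  options_of(**maybe_options) # {} on Ruby >= 3.3
rescue TypeError => e
  e.message # "no implicit conversion of nil into Hash" on Ruby <= 3.2
end
```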

@moberegger moberegger force-pushed the moberegger/optimize_key_formatter branch from 434c7aa to 2f3f902 Compare June 3, 2025 15:02
@moberegger moberegger force-pushed the moberegger/optimize_key_formatter branch from 2f3f902 to be609e7 Compare June 3, 2025 15:04
@moberegger moberegger requested a review from Insomniak47 June 3, 2025 16:59
func.call result, *args
else
result.send func, *args
@mutex.synchronize do


It's very annoying that Ruby doesn't come native with an RW lock (esp an upgradable one), because this would be a perfect case for one given the expected use patterns. There might be some check-lock-check patterns that would be a bit faster under high contention, but not worth exploring atm.


It doesn't, but the concurrent-ruby gem provides one, and we use it already in the app: https://ruby-concurrency.github.io/concurrent-ruby/1.1.5/Concurrent/ReadWriteLock.html

Copy link

@Insomniak47 Insomniak47 Jun 4, 2025


I don't think we can fork and then add deps and still achieve the goal of compat.


yeah I know


@Insomniak47 Insomniak47 left a comment


LGTM

@moberegger moberegger merged commit 2e0349b into main Jun 4, 2025
30 checks passed
@moberegger moberegger deleted the moberegger/optimize_key_formatter branch June 13, 2025 18:54
@moberegger moberegger restored the moberegger/optimize_key_formatter branch June 17, 2025 02:05