
Conversation


@moberegger moberegger commented May 30, 2025

Makes a few optimizations to the Jbuilder::KeyFormatter integration:

  • Moves the computation of the cache key to the Hash itself, which performs slightly better than using ||=. I believe this is because Ruby can compute the key in a single call.
  • Uses *args and **kwargs to handle the provided format symbols. This saves on having to compute a list of formatters from the provided options, and saves some memory allocations for the empty [] arrays that represented "no parameters". While this is definitely a micro-optimization (ie. it would only be a hot code path if a template itself uses json.key_format!), it does make things easier to read.
  • Uses .each over .inject when computing cache keys, which is slightly faster.
  • Saves some memory allocation when initializing either Jbuilder or JbuilderTemplate with no options, which is what the template handler does (json ||= JbuilderTemplate.new(self)). No need to allocate an empty Hash for this.
  • Optimizes Jbuilder initialization by using the [] operator instead of fetch when grabbing values from options
  • Saves on a memory allocation for both key_format! and self.key_format by simply passing args through to KeyFormatter.new. This is a micro-optimization, but can be a hotter code path if a template uses json.key_format!.
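The cache-on-the-Hash idea from the first bullet can be sketched like this (the formatter here is a hypothetical stand-in for whatever is configured, not the actual Jbuilder internals):

```ruby
# Hypothetical formatter standing in for the configured format chain
# (e.g. camelize(:lower)).
def format_key(key)
  parts = key.to_s.split("_")
  parts.first + parts.drop(1).map(&:capitalize).join
end

# The cache is the Hash itself: a miss runs the default proc once and
# stores the result; every later lookup is a plain Hash read, with no
# explicit ||= in the hot path.
CACHE = Hash.new { |hash, key| hash[key] = format_key(key) }

CACHE[:first_name] # computed and stored
CACHE[:first_name] # served straight from the Hash
```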

Now, the big change here is that the key formatter cache is no longer clobbered between template renders. What was originally happening was that when you configured a global key formatter with something like Jbuilder.key_format = camelize: :lower, it would be .cloned whenever a new Jbuilder was initialized, which happens before each template render. When it was cloned, the cache would be wiped when KeyFormatter#initialize_copy ran. This meant that each template render - and thus each API request - would start with a fresh cache, so you would have to re-pay the cost of formatting the keys all over again.

I'm not sure why it was done this way. When configuring a global key formatter for Jbuilder, I would expect it to be used across requests so that the cost to generate the keys could amortize as the service runs. This behaviour was originally added twelve years ago, specifically in this commit, but it's not clear to me what the intent or motivation was.
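The cache-wiping behaviour described above boils down to something like this (a simplified sketch of the old KeyFormatter, not the exact source):

```ruby
class KeyFormatter
  attr_reader :cache

  def initialize
    @cache = {}
  end

  # Ruby runs this on clone/dup. Since every new Jbuilder cloned the
  # global formatter, every render began with an empty cache.
  def initialize_copy(original)
    super
    @cache = {}
  end
end

global = KeyFormatter.new
global.cache[:first_name] = "firstName" # warmed up by a previous render

per_render = global.clone
per_render.cache # empty again; the warm-up is thrown away
```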

Currently we don't use Jbuilder's key formatting abilities... so you may be wondering why I'm doing this. Consider the following...

Without the key formatter, we would have to write templates like the following to get camelized keys

json.set! :firstName, person[:first_name]
json.set! :lastName, person[:last_name]
json.set! :age, person[:age]
json.set! :city, person[:city]

With a key formatter configured like Jbuilder.key_format = camelize: :lower, we could instead do

json.extract! person, :first_name, :last_name, :age, :city

and get the same result. Comparing the two...

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    53.232k i/100ms
            extract!    43.572k i/100ms
Calculating -------------------------------------
                set!    532.799k (± 1.5%) i/s    (1.88 μs/i) -      2.715M in   5.096552s
            extract!    598.457k (± 1.4%) i/s    (1.67 μs/i) -      3.006M in   5.024653s

Comparison:
            extract!:   598457.4 i/s
                set!:   532799.1 i/s - 1.12x  slower
Calculating -------------------------------------
                set!   320.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        320 allocated - 4.00x more

So this change will allow us to leverage extract! more to save on latency and memory. This is because extract! does a lot less work under the hood, whereas set! has a wide range of abilities, and that has CPU and memory overhead to manage the various options that can be provided to it.

With better caching for the key formatter we now have a tenable way to mitigate much of the overhead within the library.

@moberegger moberegger marked this pull request as ready for review May 30, 2025 15:02
@moberegger moberegger requested review from Insomniak47 and mscrivo May 30, 2025 15:03
Comment on lines 787 to 793
test 'do not use default key formatter directly' do
  Jbuilder.key_format
  jbuild { |json| json.key 'value' }
  formatter = Jbuilder.send(:class_variable_get, '@@key_formatter')
  cache = formatter.instance_variable_get('@cache')
  assert_empty cache
end
Maybe repurpose this to validate that it is used?

Suppose I should just repurpose this test. With the Mutex I'm much more confident with the approach.

@moberegger moberegger commented May 30, 2025

The benchmark in the description was run against a raw Jbuilder. The differences are even more pronounced when using JbuilderTemplate, which is what is used in the Action View integration and is thus more representative of what would happen in prod.

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    41.713k i/100ms
            extract!    60.275k i/100ms
Calculating -------------------------------------
                set!    423.084k (± 6.9%) i/s    (2.36 μs/i) -      2.127M in   5.072925s
            extract!    596.795k (± 3.7%) i/s    (1.68 μs/i) -      3.014M in   5.058895s

Comparison:
            extract!:   596794.9 i/s
                set!:   423083.6 i/s - 1.41x  slower
Calculating -------------------------------------
                set!   480.000  memsize (     0.000  retained)
                        12.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        480 allocated - 6.00x more

This new approach would circumvent many of the cpu and memory hotspots we're seeing.

@moberegger

Filed upstream: rails#597

lib/jbuilder.rb Outdated
@key_formatter = options.fetch(:key_formatter){ @@key_formatter ? @@key_formatter.clone : nil}
@ignore_nil = options.fetch(:ignore_nil, @@ignore_nil)
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
@key_formatter = options&.[](:key_formatter) || @@key_formatter


Only concern I have generally here is that this is not thread safe unless the key formatter is (assuming the clone operation is thread safe). As well, can we obviate the need for the inline nil checks on the options? Do you see any perf improvement if you branch instead in the nil case?

if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ... 
else 
  # ... The existing stuff without the safe index operation
end

@moberegger moberegger Jun 2, 2025


The previous iteration of KeyFormatter was thread safe because you would get a fresh cache when it was cloned. But this means that the cache only exists for the duration of the request which limits its usefulness IMO:

  • Depending on the template, you pay the cost to cache a key without ever yielding a cache hit (ex: first_name is only formatted a single time)
  • The cost to format doesn't amortize across requests (ie. you have to re-format first_name at least once per request), increasing the number of times those costly operations run.

Given a KeyFormatter that runs .camelize(:lower) on each key (for example), an input of first_name will always yield a result of firstName. There is no need to re-compute this. While what I am proposing is not technically thread safe, I'd argue it doesn't need to be because there isn't a race condition that would result in an undesirable outcome. Really, this is closer to memoizing the result than a cache per se.

We have Puma configured to run three threads - which I believe is also the default for Rails - so there is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times, but the result will be the same.

We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.
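For reference, the mutex variant under discussion would look roughly like this (a sketch; `expensive_format` is a hypothetical stand-in for the configured formatter, and the real cache shape may differ):

```ruby
class KeyFormatter
  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  # Every lookup takes the lock; the debate is whether this overhead
  # outweighs the win of keeping the cache warm across requests.
  def format(key)
    @mutex.synchronize do
      @cache[key] ||= expensive_format(key)
    end
  end

  private

  # Hypothetical stand-in for e.g. camelize(:lower).
  def expensive_format(key)
    parts = key.to_s.split("_")
    parts.first + parts.drop(1).map(&:capitalize).join
  end
end

formatter = KeyFormatter.new
threads = 3.times.map { Thread.new { formatter.format(:first_name) } }
threads.each(&:join) # all three get the same result; the hash is never mutated concurrently
```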


@Insomniak47 Insomniak47 Jun 2, 2025


I'd argue it doesn't need to be because there isn't a race condition that would result in any undesirable outcome. Really, this is closer to memoizing the result than a cache per se.

Two requests that caused reallocation of the underlying memory at the same time would result in a memory error, would it not? The underlying memory in the hash map is not protected by a critical section, and concurrent writes could result in stray allocations and leaks.


We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.

I don't think this is true


As well, I don't think gating it with an RW lock (with upgrading) would necessarily add much overhead given the read-heavy nature; it's usually just a single atomic increment for these sorts of maps. Unless there's a ton of read/write conflicts, I don't believe an RW lock would be an issue.


@moberegger moberegger Jun 2, 2025


there is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times

Thinking about it more, I'm not sure this scenario is even possible with the GVL, because I moved the formatting to the default proc, which I believed made it a single call, meaning the thread scheduler wouldn't switch contexts while the formatter runs. Grain of salt, though. Did some more reading: the default proc doesn't make it atomic.

The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.
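For reference, `cache[key] ||= value` is not a single operation; it expands to roughly the steps below, and a scheduler may switch threads between any two of them (certainly on interpreters without a GVL):

```ruby
cache = {}
key   = :first_name
value = "firstName"

# cache[key] ||= value desugars to approximately:
existing = cache[key]     # 1. read
unless existing           # 2. nil/false check
  cache[key] = value      # 3. conditional write; another thread can interleave before this
end

cache[key] # the worst case here is duplicated work, since every thread computes the same value
```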


I don't believe this is possible with the GVL; only one thread can execute Ruby code at a time.

If the execution of whatever operation is not guaranteed to be a single call by contract we can't assume this will continue to be true

The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.

I don't believe there's a guarantee made by the language that does that, so we shouldn't be relying on implementation details of any particular interpreter; a quick googlin' shows it's an issue in different interpreters (JRuby, etc.). We should not be writing un-threadsafe code that runs in a concurrent context, or code that is incidentally threadsafe but not assuredly so, because there are no guarantees that the code will stay in its current form forever, and when it changes it should be able to be changed safely.


@moberegger moberegger Jun 2, 2025


My perf concerns were unfounded. Did a quick implementation with the mutex and re-ran the benchmark to compare

# No key formatter; mimics what we're doing now
json.set! :firstName, person[:first_name]
json.set! :lastName, person[:last_name]
json.set! :age, person[:age]
json.set! :city, person[:city]

to

# With key formatter for `camelize: :lower`, cache hits after first render
json.extract! person, :first_name, :last_name, :age, :city

and it's in line with what was seen in my previous comment

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                set!    97.310k i/100ms
            extract!   133.702k i/100ms
Calculating -------------------------------------
                set!    948.250k (± 9.5%) i/s    (1.05 μs/i) -      4.671M in   5.002671s
            extract!      1.369M (± 8.8%) i/s  (730.21 ns/i) -      6.819M in   5.044080s

Comparison:
            extract!:  1369470.1 i/s
                set!:   948249.9 i/s - 1.44x  slower

Calculating -------------------------------------
                set!   480.000  memsize (     0.000  retained)
                        12.000  objects (     0.000  retained)
                         4.000  strings (     0.000  retained)
            extract!    80.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
            extract!:         80 allocated
                set!:        480 allocated - 6.00x more

The latter being faster and less memory intensive than the former is really all I care about for this iteration, and this still holds true with the mutex. Last time I tried something similar to this I observed a performance hit. That would have been on Ruby 3.3, and I also probably did it without YJIT, so perhaps 3.4 and/or YJIT does some heavy lifting here for us.


@moberegger moberegger Jun 2, 2025


Regarding

if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ... 
else 
  # ... The existing stuff without the safe index operation
end

it made no difference. Oops! I borked it. Correction: it does make a difference when options are provided.


Yeah, makes sense with the caching especially, assuming it's a one time cost now?


@Insomniak47 Insomniak47 left a comment


Just blocking on the thread safety question

@cache = {}
def initialize(*formats, **formats_with_options)
  @cache =
    Hash.new do |hash, key|

@Insomniak47 Insomniak47 Jun 2, 2025


This is not thread safe and my understanding from the above is you're sharing instances now?

Generally speaking, it's very uncommon to have thread-safe hashmaps without mutexes, and they're usually slower (there are a few lock-free ones in the wild).


def initialize_copy(original)
@cache = {}
hash[key] = value


I'm not sure if it's the docs you saw, but:

 *  You can define a per-key default for a hash;
 *  that is, a Proc that will return a value based on the key itself.
 *
 *  You can set the default proc when the hash is created with Hash.new and a block,
 *  or later with method #default_proc=.
 *
 *  Note that the proc can modify +self+,
 *  but modifying +self+ in this way is not thread-safe;
 *  multiple threads can concurrently call into the default proc
 *  for the same key.

This is specifically called out as unsafe in the docs, and I'm pretty sure that without a critical section there's no way this could be made atomic without the GVL blocking for an arbitrary amount of execution.

On top of that, I don't even think indexed assignment is guaranteed to be threadsafe in these contexts by the language; an older blog post shows similar failures. These may or may not be possible in YJIT (I'd assume that at least some types of concurrency-related failures are), but we should be writing to the guarantees we have, not incidental facts, especially when we're writing libraries.

I'm not sure about Ruby mutex performance, but I do know that locking an uncontested RW lock in most other langs is ~20ns and your bench is running in the µs range - if you want to share the map, you might wanna just see if you can get an RW lock in there and see the cost.


I'm poking around with Mutex now. Studying up on some other concurrent map implementations for inspiration and to see if there is anything else I should be thinking about.

lib/jbuilder.rb Outdated
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
if options
@key_formatter = options[:key_formatter]
@ignore_nil = options[:ignore_nil]


My Ruby is bad now - will this be nil if it's unset? Which I know is falsy, but...


options can't be unset. It is a required parameter that defaults to nil.

I can change this to

    if options.nil?
      @key_formatter = @@key_formatter
      @ignore_nil = @@ignore_nil
      @deep_format_keys = @@deep_format_keys
    else
      @key_formatter = options[:key_formatter]
      @ignore_nil = options[:ignore_nil]
      @deep_format_keys = options[:deep_format_keys]
    end

to make it clearer (and perhaps safer).


Actually... I wonder if it's better to just use kwargs here... hold on, I think I got something better.



I mean, the value in :ignore_nil is nil if it's not there, right?


@moberegger moberegger Jun 3, 2025


Yeah, Hashes will return nil if the key doesn't exist. Should have been false. Likewise with :deep_format_keys. I was too careless hammering this out.

Moot point. Moved it over to use key args, which actually speeds things up a bit more.
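The keyword-argument version presumably looks something like this (a sketch based on the discussion, not necessarily the exact final code; the defaults fall back to the class-level settings, which sidesteps the nil-vs-false bug above):

```ruby
class Jbuilder
  @@key_formatter    = nil
  @@ignore_nil       = false
  @@deep_format_keys = false

  # Keyword defaults are evaluated per call, so they pick up the current
  # class-level settings; callers pass only what they want to override,
  # and no empty options Hash needs to be allocated.
  def initialize(key_formatter: @@key_formatter,
                 ignore_nil: @@ignore_nil,
                 deep_format_keys: @@deep_format_keys)
    @key_formatter    = key_formatter
    @ignore_nil       = ignore_nil
    @deep_format_keys = deep_format_keys
  end
end

json = Jbuilder.new # @ignore_nil is false here, not nil
```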

@moberegger

ActionView::Template::Error: no implicit conversion of nil into Hash

Ah right, **nil only works from 3.3 on... one moment please.
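For anyone else who hits this: from Ruby 3.3, `**nil` is treated like passing no keyword arguments, while earlier versions raise the `no implicit conversion of nil into Hash` error above. A quick check (`options_of` is just an illustrative helper):

```ruby
def options_of(**opts)
  opts
end

maybe_options = nil

begin
  options_of(**maybe_options) # {} on Ruby >= 3.3
rescue TypeError => e
  e.message # "no implicit conversion of nil into Hash" on Ruby <= 3.2
end
```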

@moberegger moberegger force-pushed the moberegger/optimize_key_formatter branch from 434c7aa to 2f3f902 Compare June 3, 2025 15:02
@moberegger moberegger force-pushed the moberegger/optimize_key_formatter branch from 2f3f902 to be609e7 Compare June 3, 2025 15:04
@moberegger moberegger requested a review from Insomniak47 June 3, 2025 16:59
func.call result, *args
else
result.send func, *args
@mutex.synchronize do


It's very annoying that Ruby doesn't come native with an RW lock (esp an upgradable one), because this would be a perfect case for one given the expected use patterns. There might be some check-lock-check patterns that would be a bit faster under high contention, but not worth exploring atm.


It doesn't, but the concurrent-ruby gem provides one, and we use it already in the app: https://ruby-concurrency.github.io/concurrent-ruby/1.1.5/Concurrent/ReadWriteLock.html

Copy link

@Insomniak47 Insomniak47 Jun 4, 2025


I don't think we can fork and then add deps and still achieve the goal of compat.


yeah I know


@Insomniak47 Insomniak47 left a comment


LGTM

@moberegger moberegger merged commit 2e0349b into main Jun 4, 2025
30 checks passed
@moberegger moberegger deleted the moberegger/optimize_key_formatter branch June 13, 2025 18:54
@moberegger moberegger restored the moberegger/optimize_key_formatter branch June 17, 2025 02:05