Optimize key formatter #6
Conversation
test 'do not use default key formatter directly' do
  Jbuilder.key_format
  jbuild{ |json| json.key 'value' }
  formatter = Jbuilder.send(:class_variable_get, '@@key_formatter')
  cache = formatter.instance_variable_get('@cache')
  assert_empty cache
end
Maybe repurpose this to validate that it is used?
Suppose I should just repurpose this test. With the Mutex I'm much more confident with the approach.
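For illustration, a repurposed version could assert that the shared formatter's cache actually gets populated. This is only a sketch: the `jbuild` helper and variable names come from the original test above, and the exact assertion may differ from the final diff.

test 'uses the default key formatter directly' do
  Jbuilder.key_format camelize: :lower
  jbuild{ |json| json.first_name 'value' }
  formatter = Jbuilder.send(:class_variable_get, '@@key_formatter')
  cache = formatter.instance_variable_get('@cache')
  refute_empty cache  # the shared cache now retains the formatted key
end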
The benchmark in the description was run against a raw … This new approach would circumvent many of the CPU and memory hotspots we're seeing.
Filed upstream: rails#597
lib/jbuilder.rb
Outdated
@key_formatter = options.fetch(:key_formatter){ @@key_formatter ? @@key_formatter.clone : nil}
@ignore_nil = options.fetch(:ignore_nil, @@ignore_nil)
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
@key_formatter = options&.[](:key_formatter) || @@key_formatter
Only concern I have generally here is that this is not thread safe unless the key formatter is (assuming the clone operation is thread safe). Also, can we obviate the need for the inline nil checks on the options? Do you see any perf improvements if you branch instead in the nil case?
if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ...
else
  # ... The existing stuff without the safe index operation
end
The previous iteration of KeyFormatter was thread safe because you would get a fresh cache when it was cloned. But this means that the cache only exists for the duration of the request, which limits its usefulness IMO:
- Depending on the template, you pay the cost to cache a key without ever yielding a cache hit (ex: `first_name` is only formatted a single time).
- The cost to format doesn't amortize across requests (ie. you have to re-format `first_name` at least once per request), increasing the number of times those costly operations run.

Given a KeyFormatter that runs `.camelize(:lower)` on each key (for example), an input of `first_name` will always yield a result of `firstName`. There is no need to re-compute this. While what I am proposing is not technically thread safe, I'd argue it doesn't need to be because there isn't a race condition that would result in an undesirable outcome. Really, this is closer to memoizing the result than a cache per se.
We have Puma configured to run three threads - which I believe is also the default for Rails - so there is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times, but the result will be the same.
We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.
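For concreteness, a minimal sketch of what a mutex-guarded shared cache could look like, assuming ActiveSupport's inflections and the `(*formats, **formats_with_options)` signature that appears in the key_formatter.rb diff later in this thread; this is illustrative, not the exact implementation:

require 'active_support/core_ext/string/inflections'

class KeyFormatter
  def initialize(*formats, **formats_with_options)
    @formats = formats
    @formats_with_options = formats_with_options
    @cache = {}
    @mutex = Mutex.new
  end

  def format(key)
    # Serialize access so concurrent renders can't interleave the
    # lookup / compute / write steps on the shared Hash.
    @mutex.synchronize do
      @cache[key.to_s] ||= begin
        result = key.to_s
        @formats.each { |fmt| result = result.send(fmt) }
        @formats_with_options.each { |fmt, args| result = result.send(fmt, *args) }
        result
      end
    end
  end
end

# KeyFormatter.new(camelize: :lower).format('first_name') # => "firstName"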
> I'd argue it doesn't need to be because there isn't a race condition that would result in any undesirable outcome. Really, this is closer to memoizing the result than a cache per se.
Two requests that caused reallocation of the underlying memory at the same time would result in a memory error, would it not? The underlying memory in the hash map is not protected by a critical section and could result in allocations and leaks.
> We could add a mutex here, but there is runtime overhead that would negate any performance gains we'd get from having the cache persist across requests, and in practice it wouldn't protect us from anything.
I don't think this is true
Also, I don't think gating it with an RW lock with upgrading would necessarily add much overhead given the read-heavy nature; it's usually just a single atomic increment for these sorts of maps. Unless there's a ton of read/write conflicts, I don't believe an RW lock would be an issue.
> There is a possibility that three requests attempt to render the same key at the same time resulting in that key being computed three times
Thinking about it more, I'm not sure this scenario is even possible with the GVL because I moved the formatting to the default proc, which I believe makes it a single syscall, which means the thread scheduler won't switch contexts while the formatter runs. Grain of salt, though. Did some more reading: the default proc doesn't make it atomic.
The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.
> I don't believe this is possible with the GVL; only one thread can execute Ruby code at a time.
If the execution of whatever operation is not guaranteed to be a single call by contract, we can't assume this will continue to be true.
> The usage of ||= in the original implementation could context switch between the lookup, nil check, and assignment. But even then, I think the GVL would prevent the issue you're talking about.
I don't believe there's a guarantee made by the language that does that, so we shouldn't be relying on implementation details of any particular interpreter; a quick googlin' shows it's an issue in different interpreters (JRuby, etc). We should not be writing un-threadsafe code that runs in a concurrent context, or code that is incidentally threadsafe but not assuredly so, because there are no guarantees that the code will stay in its current form forever, and when it changes it should be able to be changed safely.
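To make that concern concrete, `@cache[key] ||= format(key)` is not a single operation; it expands to roughly the following (with `format` standing in for whatever the formatter does), and the language makes no promise that another thread can't run between the steps:

value = @cache[key]                    # 1. read
if value.nil? || value == false
  value = @cache[key] = format(key)    # 2. compute and write on a miss
end
value
# Two threads can both observe a miss at step 1 and both perform step 2.
# On MRI the GVL keeps each individual step from corrupting the Hash, but the
# sequence as a whole is not atomic, and JRuby/TruffleRuby give no such
# guarantee for unsynchronized concurrent Hash mutation.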
My perf concerns were unfounded. Did a quick implementation with the mutex and re-ran the benchmark to compare
# No key formatter; mimics what we're doing now
json.set! :firstName, person[:first_name]
json.set! :lastName, person[:last_name]
json.set! :age, person[:age]
json.set! :city, person[:city]

to

# With key formatter for `camelize: :lower`, cache hits after first render
json.extract! person, :first_name, :last_name, :age, :city

and it's in line with what was seen in my previous comment:
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
set! 97.310k i/100ms
extract! 133.702k i/100ms
Calculating -------------------------------------
set! 948.250k (± 9.5%) i/s (1.05 μs/i) - 4.671M in 5.002671s
extract! 1.369M (± 8.8%) i/s (730.21 ns/i) - 6.819M in 5.044080s
Comparison:
extract!: 1369470.1 i/s
set!: 948249.9 i/s - 1.44x slower
Calculating -------------------------------------
set! 480.000 memsize ( 0.000 retained)
12.000 objects ( 0.000 retained)
4.000 strings ( 0.000 retained)
extract! 80.000 memsize ( 0.000 retained)
1.000 objects ( 0.000 retained)
0.000 strings ( 0.000 retained)
Comparison:
extract!: 80 allocated
set!: 480 allocated - 6.00x more
The latter being faster and less memory intensive than the former is really all I care about for this iteration, and this still holds true with the mutex. Last time I tried something similar to this I observed a performance hit. That would have been on Ruby 3.3 and I also probably did it without YJIT, so perhaps 3.4 and/or YJIT does some heavy lifting here for us.
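For reference, output in that shape typically comes from benchmark-ips plus the benchmark-memory gem. A hedged sketch of the kind of harness that could produce it (the `person` hash and template bodies are assumptions based on the snippets above, not the actual script):

require 'benchmark/ips'
require 'benchmark/memory'
require 'jbuilder'

Jbuilder.key_format camelize: :lower
person = { first_name: 'Ada', last_name: 'Lovelace', age: 36, city: 'London' }

render_set = -> do
  Jbuilder.new do |json|
    json.set! :firstName, person[:first_name]
    json.set! :lastName,  person[:last_name]
    json.set! :age,       person[:age]
    json.set! :city,      person[:city]
  end.target!
end

render_extract = -> do
  Jbuilder.new { |json| json.extract! person, :first_name, :last_name, :age, :city }.target!
end

Benchmark.ips do |x|
  x.report('set!')     { render_set.call }
  x.report('extract!') { render_extract.call }
  x.compare!
end

Benchmark.memory do |x|
  x.report('set!')     { render_set.call }
  x.report('extract!') { render_extract.call }
  x.compare!
end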
Regarding

if options == nil
  @key_formatter = @@key_formatter.clone
  @ignore_nil = @@ignore_nil
  # ...
else
  # ... The existing stuff without the safe index operation
end

it made no difference. Oops! I borked it. Correction: it does make a difference when options are provided.
Yeah, makes sense with the caching especially, assuming it's a one time cost now?
Insomniak47 left a comment
Just blocking on the thread safety question
lib/jbuilder/key_formatter.rb
Outdated
@cache = {}
def initialize(*formats, **formats_with_options)
  @cache =
    Hash.new do |hash, key|
This is not thread safe and my understanding from the above is you're sharing instances now?
Generally speaking it's very uncommon to have thread safe hashmaps without mutexes and they're usually slower (there are a few lock free ones in the wild)
lib/jbuilder/key_formatter.rb
Outdated
def initialize_copy(original)
  @cache = {}
hash[key] = value
I'm not sure if it's the docs you saw, but:
> You can define a per-key default for a hash; that is, a Proc that will return a value based on the key itself.
> You can set the default proc when the hash is created with Hash.new and a block, or later with method #default_proc=.
> Note that the proc can modify +self+, but modifying +self+ in this way is not thread-safe; multiple threads can concurrently call into the default proc for the same key.
This is specifically called out as unsafe in the docs for YJIT, and I'm pretty sure that without a critical section there's no way this could be made atomic without the GVL blocking for an arbitrary amount of execution.
On top of that, I don't even think indexed assignment is guaranteed to be threadsafe in these contexts by the language (an older blogpost shows similar failures). These may or may not be possible with YJIT (I'd assume that at least some types of concurrency-related failures are), but we should be writing to the guarantees we have, not incidental facts, especially when we're writing libraries.
I'm not sure about Ruby mutex performance, but I do know that locking an uncontested RW lock in most other langs is ~20ns and your bench is running in the µs range. If you want to share the map, you might wanna just see if you can get an RW lock in there and see the cost.
I'm poking around with Mutex now. Studying up on some other concurrent map implementations for inspiration and to see if there is anything else I should be thinking about.
lib/jbuilder.rb
Outdated
@deep_format_keys = options.fetch(:deep_format_keys, @@deep_format_keys)
if options
  @key_formatter = options[:key_formatter]
  @ignore_nil = options[:ignore_nil]
My ruby is bad now - will this be nil if it's unset? Which I know is falsy, but...
`options` can't be unset. It is a required parameter that defaults to `nil`.
I can change this to

if options.nil?
  @key_formatter = @@key_formatter
  @ignore_nil = @@ignore_nil
  @deep_format_keys = @@deep_format_keys
else
  @key_formatter = options[:key_formatter]
  @ignore_nil = options[:ignore_nil]
  @deep_format_keys = options[:deep_format_keys]
end

to make it clearer (and perhaps safer).
Actually... I wonder if it's better to just use kwargs here... hold on, I think I got something better.
> `options` can't be unset. It is a required parameter that defaults to `nil`. I can change this to [the `options.nil?` branch above] to make it clearer (and perhaps safer).
I mean the value in `:ignore_nil` is nil if it's not there, right?
Yeah, Hashes will return nil if the key doesn't exist. Should have been false. Likewise with :deep_format_keys. I was too careless hammering this out.
Moot point. Moved it over to use key args, which actually speeds things up a bit more.
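A keyword-argument constructor along these lines is presumably what's meant; this is a sketch (defaults are pulled from the class variables at call time), not the exact diff:

def initialize(key_formatter: @@key_formatter, ignore_nil: @@ignore_nil, deep_format_keys: @@deep_format_keys)
  @attributes = {}
  @key_formatter    = key_formatter
  @ignore_nil       = ignore_nil
  @deep_format_keys = deep_format_keys
  yield self if block_given?
end

This shape also avoids allocating an options Hash entirely for the common template-handler path where nothing is passed in.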
Ah right.
434c7aa to 2f3f902
2f3f902 to be609e7
func.call result, *args
else
  result.send func, *args
@mutex.synchronize do
It's very annoying that Ruby doesn't come native with an RW lock (esp an upgradable one) cause this would be a perfect case for one given the expected use patterns. There might be some check-lock-check patterns that would be a bit faster under high contention, but not worth exploring atm.
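For the record, a check-lock-check shape in Ruby would look roughly like this (a sketch only; the optimistic unsynchronized read is exactly the part whose safety is debated above, and `compute_format` is a hypothetical helper standing in for the formatting work):

def format(key)
  cached = @cache[key]          # optimistic read outside the lock
  return cached if cached

  @mutex.synchronize do
    # Re-check inside the critical section in case another thread won the race.
    @cache[key] ||= compute_format(key)
  end
end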
It doesn't, but the concurrent gem provides one, and we use it already in the app: https://ruby-concurrency.github.io/concurrent-ruby/1.1.5/Concurrent/ReadWriteLock.html
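If the dependency were on the table, usage would be along these lines (a sketch; `compute_format` is again a stand-in for the actual formatting work):

require 'concurrent'

class KeyFormatter
  def initialize(*formats, **formats_with_options)
    @formats = formats
    @formats_with_options = formats_with_options
    @cache = {}
    @lock = Concurrent::ReadWriteLock.new
  end

  def format(key)
    # Shared read lock for the common cache-hit path...
    cached = @lock.with_read_lock { @cache[key] }
    return cached if cached

    # ...exclusive write lock only on a miss.
    @lock.with_write_lock { @cache[key] ||= compute_format(key) }
  end
end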
We can't fork and then add deps, I don't think, and still achieve the goal of compat.
yeah I know
Insomniak47 left a comment
LGTM
Makes a few optimizations to the `Jbuilder::KeyFormatter` integration:

- Caches formatted keys in the `Hash` itself, which performs slightly better than using `||=`. I believe this is because Ruby can compute the key in a single call. (A sketch appears at the end of this description.)
- Uses `*args` and `**kwargs` to handle the provided format symbols. This saves on having to compute a list of formatters from the provided `options`, and saves on some memory allocations for the empty `[]` arrays that represented "no parameters". While this is definitely a micro-optimization (ie. it would only be a hot code path if a template itself uses `json.key_format!`) it does make things easier to read.
- Uses `.each` over `.inject` when computing cache keys, which is slightly faster.
- Allows initializing `Jbuilder` or `JbuilderTemplate` with no `options`, which is what the template handler does (`json ||= JbuilderTemplate.new(self)`). No need to allocate an empty `Hash` for this.
- Speeds up `Jbuilder` initialization by using the `[]` operator instead of `fetch` when grabbing values from `options`.
- Speeds up `key_format!` and `self.key_format` by simply passing args through to `KeyFormatter.new`. This is a micro-optimization, but can be a hotter code path if a template uses `json.key_format!`.

Now the big change here is that the key formatter cache is no longer clobbered between template renders. What was originally happening was that when you configured a global key formatter with something like `Jbuilder.key_format camelize: :lower`, it would be `.clone`d whenever a new `Jbuilder` was initialized, which happens before each template render. When it was cloned, the cache would be wiped when `KeyFormatter#initialize_copy` ran. This meant that each template render - and thus each API request - would start with a fresh cache, and you would have to re-pay the cost to format the keys all over again.

I'm not sure why it was done this way. When configuring a global key formatter for Jbuilder, I would expect that to be used across requests so that the cost to generate the keys could amortize as the service runs. This behaviour was originally added twelve years ago, specifically in this commit, but it's not clear to me what the intent or motivation was to do that.
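As referenced in the first bullet above, a condensed sketch of the default-proc caching (assuming ActiveSupport inflections are loaded; the shipped version also guards the cache with the Mutex discussed in the review thread):

class KeyFormatter
  def initialize(*formats, **formats_with_options)
    @formats = formats
    @formats_with_options = formats_with_options
    # The block only runs on a cache miss, so a hit is a plain Hash lookup.
    @cache = Hash.new do |hash, key|
      result = key.to_s
      @formats.each { |fmt| result = result.send(fmt) }
      @formats_with_options.each { |fmt, args| result = result.send(fmt, *args) }
      hash[key] = result
    end
  end

  def format(key)
    @cache[key]
  end
end

# KeyFormatter.new(camelize: :lower).format('first_name') # => "firstName"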
Currently we don't use jbuilder's key formatting abilities... so you may be wondering why I'm doing this. Consider the following.

Without the key formatter, we would have to write templates like the following to get camelized keys:
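Roughly, mirroring the benchmark snippet earlier in the thread (the `person` local and the field names are illustrative):

# Hand-camelize every key at the call site
json.set! :firstName, person[:first_name]
json.set! :lastName,  person[:last_name]
json.set! :age,       person[:age]
json.set! :city,      person[:city]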
With a key formatter configured like `Jbuilder.key_format camelize: :lower`, we could instead write something like the template below and get the same result. Comparing the two...
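Again mirroring the earlier benchmark snippet, the formatter-backed version collapses to a single call:

# Keys are camelized by the configured key formatter, with cache hits after the first render
json.extract! person, :first_name, :last_name, :age, :city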
So this change will allow us to leverage `extract!` more to save on latency and memory. This is because `extract!` does a lot less work under the hood, whereas `set!` has a wide range of abilities, and managing the various options that can be provided to it carries CPU and memory overhead.

With better caching for the key formatter we now have a tenable way to mitigate much of that overhead within the library.