-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small performance improvement for cold boot BEAM #79
Small performance improvement for cold boot BEAM #79
Conversation
I'm running my tests on AMD 24 cores/6.6.51-1-lts kernel. ``` Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:128:24] [ds:128:24:10] [async-threads:1] [jit:ns] Interactive Elixir (1.17.1) - press Ctrl+C to exit (type h() ENTER for help) ``` My colleagues on macs could not ever reproduce the cold boot slowdown, one person with Intel linux could. This change consists of two minor improvements. 1. Use `MapSet.new/1` rather than `Enum.into/2` Apparently, on some architectures at least, this makes a significant difference on cold VM boot: (clean `iex` required each time) ``` iex(1)> :timer.tc(fn -> MapSet.new([%{}]) end) {5, MapSet.new([%{}])} ``` ``` iex(1)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end) {7931, MapSet.new([%{}])} iex(2)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end) {12, MapSet.new([%{}])} ``` This change is functionally equivalent, but in code utilizing ConCache locks, it does cause occasional timeouts with acquire limited to 500ms, while running tests - especially if the seed happens to cause lock congestion very early on after task startup. I can also mitigate the issue locally, by warming up ConCache in `test_helper.exs` by doing a loop of enough isolated calls to cover all the partitions, but upstream change feels much better. Even if the _real_ issue is still elsewhere, the pure gain is still worth it I hope. 2. Ensure that when running tests where `ConCache` as a dependency, `ConCache.Lock.Resource` is loaded before the lock pool is started. In some cases, dynamic module resolution caused `Resource.new` to even add up 300ms of latency 🫣. I'm not sure if there's a cleaner way of making sure that module is loaded when the application starts? Thanks for looking.
lib/con_cache/lock_supervisor.ex
Outdated
@@ -10,6 +10,8 @@ defmodule ConCache.LockSupervisor do | |||
end | |||
|
|||
def start_link(n_partitions) do | |||
true = Code.ensure_loaded?(ConCache.Lock.Resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is you're only experiencing slowdown in development (mode=interactive), in production releases (mode=embedded), all the modules would be preloaded on boot so this line would not be necessary. Btw, I vaguely remember that function might be going through the code server which we should rather avoid at runtime, though booting supervisors and similar is about the only place where this is sensible enough.
Interestingly enough, I believe this:
true = Code.ensure_loaded?(ConCache.Lock.Resource)
def start_link(n_partitions) do
would solve your issue without impacting the runtime characteristics. That being said, ideally we fix the root cause, maybe it's code server and/or VM issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Wojtek, pretty sure I’ve tried that before to no avail (as well as explicit require) but will double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to reproduce the slowdown with just con_cache? Ideally something like:
$ time elixir -e 'Mix.install([:con_cache]) ; {:ok, _} = ConCache.start_link(ttl_check_interval: false)'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of:
iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))
{:ok, #PID<0.215.0>}
iex(2)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{5044, :ok}
iex(3)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{50, :ok}
and on this branch:
iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))
{:ok, #PID<0.242.0>}
iex(2)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{2138, :ok}
iex(3)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{49, :ok}
Now with more processes:
iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))
{:ok, #PID<0.215.0>}
iex(2)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
ok: {6024, :ok},
ok: {6083, :ok},
ok: {6042, :ok},
ok: {4637, :ok},
ok: {4575, :ok},
ok: {4529, :ok},
ok: {4516, :ok},
ok: {4481, :ok},
ok: {4451, :ok},
ok: {4434, :ok},
ok: {4426, :ok},
ok: {4380, :ok},
ok: {4366, :ok},
ok: {4355, :ok},
ok: {4367, :ok},
ok: {4363, :ok},
ok: {4352, :ok},
ok: {4347, :ok},
ok: {4337, :ok},
ok: {4334, :ok},
ok: {4320, :ok},
ok: {4311, :ok},
ok: {4304, :ok},
ok: {4273, :ok}
]
and on this branch:
iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))
{:ok, #PID<0.215.0>}
iex(2)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
ok: {2935, :ok},
ok: {2984, :ok},
ok: {2969, :ok},
ok: {2532, :ok},
ok: {2577, :ok},
ok: {2566, :ok},
ok: {2542, :ok},
ok: {2514, :ok},
ok: {2525, :ok},
ok: {2544, :ok},
ok: {2533, :ok},
ok: {2509, :ok},
ok: {2485, :ok},
ok: {2421, :ok},
ok: {2386, :ok},
ok: {2372, :ok},
ok: {2351, :ok},
ok: {2316, :ok},
ok: {2292, :ok},
ok: {2264, :ok},
ok: {2250, :ok},
ok: {2224, :ok},
ok: {2192, :ok},
ok: {2189, :ok}
]
Subsequent call:
iex(3)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
ok: {71, :ok},
ok: {42, :ok},
ok: {16, :ok},
ok: {47, :ok},
ok: {75, :ok},
ok: {57, :ok},
ok: {66, :ok},
ok: {49, :ok},
ok: {47, :ok},
ok: {38, :ok},
ok: {45, :ok},
ok: {97, :ok},
ok: {94, :ok},
ok: {131, :ok},
ok: {162, :ok},
ok: {164, :ok},
ok: {140, :ok},
ok: {134, :ok},
ok: {142, :ok},
ok: {85, :ok},
ok: {108, :ok},
ok: {98, :ok},
ok: {58, :ok},
ok: {55, :ok}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing there's more to do if we care about avoiding the code server in interactive mode. I've tried running https://github.com/DockYard/flame_on and it was indeed the culprit (hence the patch) - apart from the awkward MapSet
instantiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this happens because of concurrent attempts to load the module.
Looking at the code of ConCache.Resource
, the module is short, and there's no compile-time generation of functions and clauses, so this all seems very strange.
This needs further investigation, and I'm currently not inclined to add this workaround to ConCache. We should instead try to check if the similar problem exists in a simplified small project, where we try to do concurrent implicit lazy load of some small module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! MapSet.new
is an improvement, and I suggest we merge that.
I'm currently not in favor of ensure_loaded
, at least not until further investigation is done. I think we should try to reproduce this independently of ConCache.
lib/con_cache/lock_supervisor.ex
Outdated
@@ -10,6 +10,8 @@ defmodule ConCache.LockSupervisor do | |||
end | |||
|
|||
def start_link(n_partitions) do | |||
true = Code.ensure_loaded?(ConCache.Lock.Resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this happens because of concurrent attempts to load the module.
Looking at the code of ConCache.Resource
, the module is short, and there's no compile-time generation of functions and clauses, so this all seems very strange.
This needs further investigation, and I'm currently not inclined to add this workaround to ConCache. We should instead try to check if the similar problem exists in a simplified small project, where we try to do concurrent implicit lazy load of some small module.
Thank you! |
@sasa1977 would be great if you could bump hexpm release by the way. thank you. |
Released 1.1.1 to hex |
I'm running my tests on AMD 24 cores/6.6.51-1-lts kernel.
My colleagues on macs could not ever reproduce the cold boot slowdown, one person with Intel linux could.
This change consists of two minor improvements.
MapSet.new/1
rather thanEnum.into/2
Apparently, on some architectures at least, this makes a significant difference on cold VM boot:
(clean
iex
required each time)This change is functionally equivalent, but in code utilizing ConCache locks, it does cause occasional timeouts with acquire limited to 500ms, while running tests - especially if the seed happens to cause lock congestion very early on after task startup.
I can also mitigate the issue locally, by warming up ConCache in
test_helper.exs
by doing a loop of enough isolated calls to cover all the partitions, but upstream change feels much better.Even if the real issue is still elsewhere, the pure gain is still worth it I hope.
ConCache
as a dependency,ConCache.Lock.Resource
is loaded before the lock pool is started. In some cases, dynamic module resolution causedResource.new
to even add up 300ms of latency 🫣. I'm not sure if there's a cleaner way of making sure that module is loaded when the application starts?Thanks for looking.