Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small performance improvement for cold boot BEAM #79

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

aerosol
Copy link
Contributor

@aerosol aerosol commented Sep 17, 2024

I'm running my tests on AMD 24 cores/6.6.51-1-lts kernel.

Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:128:24] [ds:128:24:10] [async-threads:1] [jit:ns]

Interactive Elixir (1.17.1) - press Ctrl+C to exit (type h() ENTER for help)

My colleagues on macs could not ever reproduce the cold boot slowdown, one person with Intel linux could.

This change consists of two minor improvements.

  1. Use MapSet.new/1 rather than Enum.into/2

Apparently, on some architectures at least, this makes a significant difference on cold VM boot:

(clean iex required each time)

iex(1)> :timer.tc(fn -> MapSet.new([%{}]) end)
{5, MapSet.new([%{}])}
iex(1)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end)
{7931, MapSet.new([%{}])}

iex(2)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end)
{12, MapSet.new([%{}])}

This change is functionally equivalent, but in code utilizing ConCache locks, it does cause occasional timeouts with acquire limited to 500ms, while running tests - especially if the seed happens to cause lock congestion very early on after task startup.

I can also mitigate the issue locally, by warming up ConCache in test_helper.exs by doing a loop of enough isolated calls to cover all the partitions, but upstream change feels much better.

Even if the real issue is still elsewhere, the pure gain is still worth it I hope.

  1. Ensure that when running tests where ConCache as a dependency, ConCache.Lock.Resource is loaded before the lock pool is started. In some cases, dynamic module resolution caused Resource.new to even add up 300ms of latency 🫣. I'm not sure if there's a cleaner way of making sure that module is loaded when the application starts?

Thanks for looking.

I'm running my tests on AMD 24 cores/6.6.51-1-lts kernel.

```
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:128:24] [ds:128:24:10] [async-threads:1] [jit:ns]

Interactive Elixir (1.17.1) - press Ctrl+C to exit (type h() ENTER for help)
```

My colleagues on macs could not ever reproduce the cold boot
slowdown, one person with Intel linux could.

This change consists of two minor improvements.

1. Use `MapSet.new/1` rather than `Enum.into/2`

Apparently, on some architectures at least, this makes
a significant difference on cold VM boot:

(clean `iex` required each time)

```
iex(1)> :timer.tc(fn -> MapSet.new([%{}]) end)
{5, MapSet.new([%{}])}
```

```
iex(1)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end)
{7931, MapSet.new([%{}])}

iex(2)> :timer.tc(fn -> Enum.into([%{}], MapSet.new()) end)
{12, MapSet.new([%{}])}
```

This change is functionally equivalent, but in code
utilizing ConCache locks, it does cause occasional timeouts
with acquire limited to 500ms, while running tests -
especially if the seed happens to cause lock congestion
very early on after task startup.

I can also mitigate the issue locally, by warming up ConCache in
`test_helper.exs` by doing a loop of enough isolated calls
to cover all the partitions, but upstream change feels much better.

Even if the _real_ issue is still elsewhere, the pure gain
is still worth it I hope.

2. Ensure that when running tests where `ConCache` as a dependency,
`ConCache.Lock.Resource` is loaded before the lock pool is started.
In some cases, dynamic module resolution caused `Resource.new`
to even add up 300ms of latency 🫣. I'm not sure if there's a cleaner
way of making sure that module is loaded when the application starts?

Thanks for looking.
@aerosol aerosol marked this pull request as ready for review September 17, 2024 09:00
aerosol added a commit to plausible/analytics that referenced this pull request Sep 17, 2024
aerosol added a commit to plausible/analytics that referenced this pull request Sep 17, 2024
github-merge-queue bot pushed a commit to plausible/analytics that referenced this pull request Sep 17, 2024
@@ -10,6 +10,8 @@ defmodule ConCache.LockSupervisor do
end

def start_link(n_partitions) do
true = Code.ensure_loaded?(ConCache.Lock.Resource)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is you're only experiencing slowdown in development (mode=interactive), in production releases (mode=embedded), all the modules would be preloaded on boot so this line would not be necessary. Btw, I vaguely remember that function might be going through the code server which we should rather avoid at runtime, though booting supervisors and similar is about the only place where this is sensible enough.

Interestingly enough, I believe this:

  true = Code.ensure_loaded?(ConCache.Lock.Resource)

  def start_link(n_partitions) do

would solve your issue without impacting the runtime characteristics. That being said, ideally we fix the root cause, maybe it's code server and/or VM issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Wojtek, pretty sure I’ve tried that before to no avail (as well as explicit require) but will double check

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to reproduce the slowdown with just con_cache? Ideally something like:

$ time elixir -e 'Mix.install([:con_cache]) ; {:ok, _} = ConCache.start_link(ttl_check_interval: false)'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of:

iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))          
{:ok, #PID<0.215.0>}
iex(2)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)                                  
{5044, :ok}
iex(3)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{50, :ok}

and on this branch:

iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))
{:ok, #PID<0.242.0>}
iex(2)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)                        
{2138, :ok}
iex(3)> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end)
{49, :ok}

Now with more processes:

iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))                                 
{:ok, #PID<0.215.0>}
iex(2)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
  ok: {6024, :ok},
  ok: {6083, :ok},
  ok: {6042, :ok},
  ok: {4637, :ok},
  ok: {4575, :ok},
  ok: {4529, :ok},
  ok: {4516, :ok},
  ok: {4481, :ok},
  ok: {4451, :ok},
  ok: {4434, :ok},
  ok: {4426, :ok},
  ok: {4380, :ok},
  ok: {4366, :ok},
  ok: {4355, :ok},
  ok: {4367, :ok},
  ok: {4363, :ok},
  ok: {4352, :ok},
  ok: {4347, :ok},
  ok: {4337, :ok},
  ok: {4334, :ok},
  ok: {4320, :ok},
  ok: {4311, :ok},
  ok: {4304, :ok},
  ok: {4273, :ok}
]

and on this branch:

iex(1)> ConCache.start_link(name: :sessions, ttl_check_interval: :timer.seconds(1), global_ttl: :timer.minutes(30))                                                         
{:ok, #PID<0.215.0>}
iex(2)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
  ok: {2935, :ok},
  ok: {2984, :ok},
  ok: {2969, :ok},
  ok: {2532, :ok},
  ok: {2577, :ok},
  ok: {2566, :ok},
  ok: {2542, :ok},
  ok: {2514, :ok},
  ok: {2525, :ok},
  ok: {2544, :ok},
  ok: {2533, :ok},
  ok: {2509, :ok},
  ok: {2485, :ok},
  ok: {2421, :ok},
  ok: {2386, :ok},
  ok: {2372, :ok},
  ok: {2351, :ok},
  ok: {2316, :ok},
  ok: {2292, :ok},
  ok: {2264, :ok},
  ok: {2250, :ok},
  ok: {2224, :ok},
  ok: {2192, :ok},
  ok: {2189, :ok}
]

Subsequent call:

iex(3)> 1..System.schedulers_online() |> Task.async_stream(fn _ -> :timer.tc(fn -> ConCache.isolated(:sessions, {:foo, :bar}, 500, fn -> :ok end) end) end) |> Enum.take(24)
[
  ok: {71, :ok},
  ok: {42, :ok},
  ok: {16, :ok},
  ok: {47, :ok},
  ok: {75, :ok},
  ok: {57, :ok},
  ok: {66, :ok},
  ok: {49, :ok},
  ok: {47, :ok},
  ok: {38, :ok},
  ok: {45, :ok},
  ok: {97, :ok},
  ok: {94, :ok},
  ok: {131, :ok},
  ok: {162, :ok},
  ok: {164, :ok},
  ok: {140, :ok},
  ok: {134, :ok},
  ok: {142, :ok},
  ok: {85, :ok},
  ok: {108, :ok},
  ok: {98, :ok},
  ok: {58, :ok},
  ok: {55, :ok}
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing there's more to do if we care about avoiding the code server in interactive mode. I've tried running https://github.com/DockYard/flame_on and it was indeed the culprit (hence the patch) - apart from the awkward MapSet instantiation.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this happens because of concurrent attempts to load the module.

Looking at the code of ConCache.Resource, the module is short, and there's no compile-time generation of functions and clauses, so this all seems very strange.

This needs further investigation, and I'm currently not inclined to add this workaround to ConCache. We should instead try to check if the similar problem exists in a simplified small project, where we try to do concurrent implicit lazy load of some small module.

Copy link
Owner

@sasa1977 sasa1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! MapSet.new is an improvement, and I suggest we merge that.

I'm currently not in favor of ensure_loaded, at least not until further investigation is done. I think we should try to reproduce this independently of ConCache.

@@ -10,6 +10,8 @@ defmodule ConCache.LockSupervisor do
end

def start_link(n_partitions) do
true = Code.ensure_loaded?(ConCache.Lock.Resource)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this happens because of concurrent attempts to load the module.

Looking at the code of ConCache.Resource, the module is short, and there's no compile-time generation of functions and clauses, so this all seems very strange.

This needs further investigation, and I'm currently not inclined to add this workaround to ConCache. We should instead try to check if the similar problem exists in a simplified small project, where we try to do concurrent implicit lazy load of some small module.

@sasa1977 sasa1977 merged commit 57fb986 into sasa1977:master Sep 19, 2024
2 checks passed
@sasa1977
Copy link
Owner

Thank you!

@aerosol
Copy link
Contributor Author

aerosol commented Sep 19, 2024

@sasa1977 would be great if you could bump hexpm release by the way. thank you.

@sasa1977
Copy link
Owner

@sasa1977 would be great if you could bump hexpm release by the way. thank you.

Released 1.1.1 to hex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants