Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimize allocations when unpacking TimeZones from cache (updated) #451

Merged
merged 11 commits into from
May 23, 2024
Merged

Minimize allocations when unpacking TimeZones from cache (updated) #451

merged 11 commits into from
May 23, 2024

Conversation

tpgillam
Copy link
Contributor

Closes #422

This is a rebase/reimplementation of #423 onto master .
(FYI @nickrobinson251, I'm doing this after seeing your comment!)

I started by trying to rebase your PR properly, but in the end it was easier to cherry-pick your commits that still applied (just the tests), then re-implement the rest. I'm sorry that this results in this new PR, but wasn't sure how else to proceed.

One note -- I find that it now takes two allocations to retrieve a VariableTimeZone from the cache, rather than 1. I spent some time with the alllocation tracker trying to understand this with no luck, so I've updated the test expectation accordingly. This isn't any worse than the current behaviour on master, and in any case the main thing is that retrieving a FixedTimeZone is now allocation free :)

@tpgillam
Copy link
Contributor Author

Also, I've checked this against julia 1.10-rc1, and still see 2 allocs for constructing a VariableTimeZone unfortunately.

@tpgillam
Copy link
Contributor Author

tpgillam commented Nov 30, 2023

Also, I've checked this against julia 1.10-rc1, and still see 2 allocs for constructing a VariableTimeZone unfortunately.

I tried an alternative refactor, where I had a separate cache for all the classes (taking us to a total of 3) -- see on this branch. Whilst this reduces the number of allocations (down to 1 again for the VariableTimeZone), overall the timings are significantly worse, presumably due to the additional hash-table lookup.

Microbenchmarks on my machine with Julia 1.10-rc1 for

@btime TimeZone("UTC")  # FTZ case
@btime TimeZone("America/Winnipeg")  # VTZ case

this PR

FTZ:  18.727 ns (0 allocations: 0 bytes)
VTZ:  37.680 ns (2 allocations: 96 bytes)

separate class cache

FTZ:  31.096 ns (0 allocations: 0 bytes)
VTZ:  46.151 ns (1 allocation: 48 bytes)

current master

FTZ:  48.142 ns (2 allocations: 64 bytes)
VTZ:  48.238 ns (2 allocations: 64 bytes)

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I want to sit with this for a little bit

test/types/timezone.jl Outdated Show resolved Hide resolved
test/types/timezone.jl Outdated Show resolved Hide resolved
test/helpers.jl Outdated Show resolved Hide resolved
test/helpers.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Show resolved Hide resolved
Comment on lines 43 to 47
return get(_FTZ_CACHE, name) do
get(_VTZ_CACHE, name) do
return f_default()
end
end
Copy link
Member

@omus omus Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a possible faster alternative we where lookup the name to get the Class which then lets us lookup the TimeZones from its type specific Dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds a little like what I mentioned here: #451 (comment)

tidies up the allocations a bit, but the extra hash table lookup slowed things down in my timings

Copy link
Contributor Author

@tpgillam tpgillam Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just amended that comment to link to branch with the class-cache.)

@tpgillam
Copy link
Contributor Author

LGTM. I want to sit with this for a little bit

Thanks - let me know if there’s anything else from my end that would be helpful.

@tpgillam
Copy link
Contributor Author

tpgillam commented Nov 30, 2023

PkgBenchmark ``` > TZDATA_VERSION=2016j julia --project=benchmark/ -e 'using PkgBenchmark, TimeZones; export_markdown(stdout, judge(TimeZones, "origin/HEAD", verbose=false))' PkgBenchmark: Running benchmarks... PkgBenchmark: using benchmark tuning data in /home/tom/.julia/dev/TimeZones/benchmark/tune.json Benchmarking 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:45 PkgBenchmark: Running benchmarks... PkgBenchmark: using benchmark tuning data in /home/tom/.julia/dev/TimeZones/benchmark/tune.json Benchmarking 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:47 ```

Benchmark Report for /home/tom/.julia/dev/TimeZones

Job Properties

  • Time of benchmarks:
    • Target: 30 Nov 2023 - 21:39
    • Baseline: 30 Nov 2023 - 21:40
  • Package commits:
    • Target: 2b0b76
    • Baseline: f9d0dc
  • Julia commits:
    • Target: 8e5136
    • Baseline: 8e5136
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID time ratio memory ratio
["ZonedDateTime", "local", "ambiguous"] 1.08 (5%) ❌ 1.00 (1%)
["ZonedDateTime", "local", "standard"] 0.92 (5%) ✅ 1.00 (1%)
["ZonedDateTime", "range", "VariableTimeZone/DatePeriod"] 1.09 (5%) ❌ 1.00 (1%)
["interpret", "local", "ambiguous"] 0.92 (5%) ✅ 1.00 (1%)
["interpret", "local", "non-existent"] 0.91 (5%) ✅ 1.00 (1%)
["interpret", "local", "standard"] 0.94 (5%) ✅ 1.00 (1%)
["interpret", "utc"] 0.90 (5%) ✅ 1.00 (1%)
["parse", "ISOZonedDateTimeFormat"] 1.00 (5%) 0.94 (1%) ✅
["parse", "issue#25"] 1.01 (5%) 1.05 (1%) ❌
["transition_range", "utc"] 1.88 (5%) ❌ 1.00 (1%)
["tryparsenext_tz", "America/Argentina/ComodRivadavia"] 0.95 (5%) ✅ 1.00 (1%)
["tryparsenext_tz", "UTC"] 0.94 (5%) ✅ 1.00 (1%)
["tz_data", "parse_components"] 0.98 (5%) 1.07 (1%) ❌

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["ZonedDateTime"]
  • ["ZonedDateTime", "local"]
  • ["ZonedDateTime", "range"]
  • ["arithmetic"]
  • ["arithmetic", "broadcast"]
  • ["interpret", "local"]
  • ["interpret"]
  • ["parse"]
  • ["transition_range", "local"]
  • ["transition_range"]
  • ["tryparsenext_fixedtz"]
  • ["tryparsenext_tz"]
  • ["tz_data"]

Julia versioninfo

Target

Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      "Fedora release 39 (Thirty Nine)"
  uname: Linux 6.6.2-201.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 22 21:31:42 UTC 2023 x86_64 unknown
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz:
                 speed         user         nice          sys         idle          irq
       #1-12  4298 MHz     290496 s       1686 s      90879 s     443666 s      20652 s
  Memory: 31.26819610595703 GB (19133.734375 MB free)
  Uptime: 46609.8 sec
  Load Avg:  1.26  0.93  0.72
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

Baseline

Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      "Fedora release 39 (Thirty Nine)"
  uname: Linux 6.6.2-201.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 22 21:31:42 UTC 2023 x86_64 unknown
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz:
                 speed         user         nice          sys         idle          irq
       #1-12  4497 MHz     291253 s       1686 s      90977 s     450010 s      20669 s
  Memory: 31.26819610595703 GB (19001.28125 MB free)
  Uptime: 46670.1 sec
  Load Avg:  1.58  1.06  0.77
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

For good measure I did a second run, and stuff moved around quite a bit. So I'm not reading too much into any of these numbers:

ID time ratio memory ratio
["ZonedDateTime", "local", "ambiguous"] 0.91 (5%) ✅ 1.00 (1%)
["interpret", "local", "ambiguous"] 1.29 (5%) ❌ 1.00 (1%)
["interpret", "local", "non-existent"] 1.37 (5%) ❌ 1.00 (1%)
["interpret", "local", "standard"] 1.21 (5%) ❌ 1.00 (1%)
["parse", "ISOZonedDateTimeFormat"] 1.01 (5%) 0.94 (1%) ✅
["parse", "issue#25"] 0.97 (5%) 1.05 (1%) ❌
["transition_range", "utc"] 0.75 (5%) ✅ 1.00 (1%)
["tryparsenext_fixedtz", "UTC"] 1.09 (5%) ❌ 1.00 (1%)
["tz_data", "parse_components"] 0.99 (5%) 1.07 (1%) ❌

Copy link
Contributor

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tpgillam! LGTM

I'd love us one day to get the VTZ case down to 1 or ideally 0 allocs (i was never sure what the last 1 alloc was tbh, but i think the alloc profiler now has info for all types whereas it didn't when i last tried). Anyway, this is already a big win for the most common case, and much appreciated!

(I'll leave it to @omus to decide when to merge and release, though i ofc hope it's soon!)

@tpgillam
Copy link
Contributor Author

tpgillam commented Dec 9, 2023

Thanks both for the discussions so far! @omus are you happy to move forward with this, or are there are other thoughts/concerns we should discuss?

@nickrobinson251
Copy link
Contributor

Bump! @omus please could you merge this and make a new release?

@omus
Copy link
Member

omus commented May 23, 2024

Baseline timings for my machine. I'll need to resolve the conflict yet.

Julia 1.10.0

Current master (ac06a91)

julia> @btime TimeZone("America/Winnipeg");
  41.246 ns (2 allocations: 64 bytes)

julia> @btime TimeZone("UTC");
  40.868 ns (2 allocations: 64 bytes)

This PR (2b0b76e)

julia> @btime TimeZone("UTC");
  17.994 ns (0 allocations: 0 bytes)

julia> @btime TimeZone("America/Winnipeg");
  28.475 ns (2 allocations: 96 bytes)

@codecov-commenter
Copy link

codecov-commenter commented May 23, 2024

Codecov Report

Attention: Patch coverage is 97.22222% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 92.73%. Comparing base (2bc8f50) to head (1237173).
Report is 7 commits behind head on master.

Files Patch % Lines
src/types/timezone.jl 97.22% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #451      +/-   ##
==========================================
- Coverage   92.79%   92.73%   -0.07%     
==========================================
  Files          39       39              
  Lines        1818     1844      +26     
==========================================
+ Hits         1687     1710      +23     
- Misses        131      134       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@omus
Copy link
Member

omus commented May 23, 2024

This PR 511d2cc

julia> @time_imports using TimeZones
      0.4 ms  Scratch
     10.2 ms  Preferences
      0.4 ms  PrecompileTools
               ┌ 0.0 ms Parsers.__init__()
     45.3 ms  Parsers 40.31% compilation time
      4.2 ms  InlineStrings
      0.4 ms  TZJData
      0.5 ms  Compat
      0.2 ms  Compat  CompatLinearAlgebraExt
      0.4 ms  ExprTools
      0.6 ms  Mocking
               ┌ 1.1 ms TimeZones.TZData.__init__()
               ├ 0.0 ms TimeZones.__init__()
     22.6 ms  TimeZones

julia> using BenchmarkTools

julia> @btime TimeZone("UTC");
  18.203 ns (0 allocations: 0 bytes)

julia> @btime TimeZone("America/Winnipeg");
  29.104 ns (2 allocations: 96 bytes)

julia> @btime istimezone("Europe/Warsaw");
  76.418 ns (1 allocation: 48 bytes)

I slight increase in allocated bytes with istimezone from this: #457 (comment)

@omus
Copy link
Member

omus commented May 23, 2024

The cache code is definitely becoming unwieldy. I've created a TimeZoneCache struct which doesn't affect the performance:

This PR 67f2c3f

julia> @time_imports using TimeZones
      0.4 ms  Scratch
      9.1 ms  Preferences
      0.4 ms  PrecompileTools
               ┌ 0.0 ms Parsers.__init__()
     46.7 ms  Parsers 38.23% compilation time
      4.1 ms  InlineStrings
      0.3 ms  TZJData
      0.4 ms  Compat
      0.2 ms  Compat  CompatLinearAlgebraExt
      0.4 ms  ExprTools
      0.6 ms  Mocking
               ┌ 1.0 ms TimeZones.TZData.__init__()
               ├ 0.0 ms TimeZones.__init__()
     22.9 ms  TimeZones

julia> using BenchmarkTools

julia> @btime TimeZone("UTC");
  18.597 ns (0 allocations: 0 bytes)

julia> @btime TimeZone("America/Winnipeg");
  28.894 ns (2 allocations: 96 bytes)

julia> @btime istimezone("Europe/Warsaw");
  74.802 ns (1 allocation: 48 bytes)

@omus
Copy link
Member

omus commented May 23, 2024

These changes are also a step in the right direction for reducing the initial cache loading time

This PR 67f2c3f

julia> @btime TimeZones._reload_tz_cache(TimeZones._COMPILED_DIR[]);
  20.795 ms (313574 allocations: 12.52 MiB)

Current master ac06a91

julia> @btime TimeZones._reload_tz_cache(TimeZones._COMPILED_DIR[]);
  22.808 ms (313621 allocations: 12.52 MiB)

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tpgillam and @nickrobinson251 for this!

@omus omus merged commit 5c8f5c4 into JuliaTime:master May 23, 2024
16 checks passed
@omus
Copy link
Member

omus commented May 23, 2024

I'm planning on rolling some other changes in before making a release here. I expect TimeZones.jl 1.17 should be out by 2024-05-27 (Monday).

@tpgillam tpgillam deleted the tg/timezone-allocations branch May 28, 2024 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Creating TimeZone from string allocates unnecessarily
4 participants