Skip to content

Commit 5bffed5

Browse files
committed
Reduce memory & redundant work for concurrent TimeZones construction
This commit improves the thread-local caching scheme introduced in JuliaTime#344, by sharing TimeZones across _all_ thread-local caches (so there's only ever one TimeZone object instance created per process), and reduces redundant work caused by multiple concurrent Tasks starting to construct the same TimeZone at the same time. Before this commit, multiple Tasks that try to construct the same TimeZone on different threads would have constructed the same object multiple times. Whereas now, they would share it, via the "Future". ------------------------------------------------------------ It's difficult to show this effect, but one way to show it is by logging when a TimeZone is constructed, via this diff: ```julia diff --git a/src/types/timezone.jl b/src/types/timezone.jl index 25d36c3..1cea69e 100644 --- a/src/types/timezone.jl +++ b/src/types/timezone.jl @@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT)) # Note: If the class `mask` does not match the time zone we'll still load the # information into the cache to ensure the result is consistent. tz, class = get!(_tz_cache(), str) do + Core.println("CONSTRUCTING $str") tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...) if isfile(tz_path) ``` Before this commit, you can see that every thread constructs the object twice - once before the clear and once after (total of 8 times): ```julia julia> Threads.nthreads() 4 julia> TimeZones.TZData.compile(); julia> foo() = begin @sync for i in 1:20 if (i == 10) @info "---------clear-------" TimeZones.TZData.compile() end Threads.@Spawn begin TimeZone("US/Eastern", TimeZones.Class(:LEGACY)) end end @info "done" end foo (generic function with 1 method) julia> @time foo() [ Info: ---------clear------- CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern [ Info: done 0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time) ``` After this commit, it's constructed only twice - once before the clear and once after: ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time) ``` ------------------------------------------------------------------ Finally, the other problem this avoids is if we ever accidentally introduce a Task yield inside the constructor, which could happen if we used `@info` instead of `Core.println()`, then without this PR, the old code could potentially do _redundant work_ to construct the TimeZone multiple times - even on the same thread - since each Task's constructor would see that there's no TZ in the cache, start the work, and then yield to the next Task, which would do the same, until finally they all report their work into the cache, overwriting each other. This is what happens if we use `@info` in the above diff, instead: Before this commit - the results are nondeterministic, but in this run, you can see it redundantly constructed the value all 20 times!: ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time) ``` After this commit, just the two we expect. 😊 : ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time) ```
1 parent ae1e583 commit 5bffed5

File tree

4 files changed

+58
-24
lines changed

4 files changed

+58
-24
lines changed

src/TimeZones.jl

+2-1
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
module TimeZones
22

3+
using Base: @lock
34
using Dates
45
using Printf
56
using Serialization
@@ -39,7 +40,7 @@ abstract type Local <: TimeZone end
3940

4041
function __init__()
4142
# Initialize the thread-local TimeZone cache (issue #342)
42-
_reset_tz_cache()
43+
_tz_cache_init()
4344

4445
# Base extension needs to happen everytime the module is loaded (issue #24)
4546
Dates.CONVERSION_SPECIFIERS['z'] = TimeZone

src/types/timezone.jl

+53-15
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44
# to the cache, while still being thread-safe.
55
const THREAD_TZ_CACHES = Vector{Dict{String,Tuple{TimeZone,Class}}}()
66

7+
# Holding a lock during construction of a specific TimeZone prevents multiple Tasks (on the
8+
# same or different threads) from attempting to construct the same TimeZone object, and
9+
# allows them all to share the result.
10+
const tz_cache_mutex = ReentrantLock()
11+
const TZ_CACHE_FUTURES = Dict{String,Channel{Tuple{TimeZone,Class}}}() # Guarded by: tz_cache_mutex
12+
713
# Based upon the thread-safe Global RNG implementation in the Random stdlib:
814
# https://github.com/JuliaLang/julia/blob/e4fcdf5b04fd9751ce48b0afc700330475b42443/stdlib/Random/src/RNGs.jl#L369-L385
915
@inline _tz_cache() = _tz_cache(Threads.threadid())
@@ -19,10 +25,22 @@ const THREAD_TZ_CACHES = Vector{Dict{String,Tuple{TimeZone,Class}}}()
1925
end
2026
@noinline _tz_cache_length_assert() = @assert false "0 < tid <= length(THREAD_TZ_CACHES)"
2127

22-
function _reset_tz_cache()
23-
# ensures that we didn't save a bad object
28+
function _tz_cache_init()
2429
resize!(empty!(THREAD_TZ_CACHES), Threads.nthreads())
2530
end
31+
# ensures that we didn't save a bad object
32+
function _reset_tz_cache()
33+
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that
34+
# thread's local cache.
35+
Threads.@threads for i in 1:Threads.nthreads()
36+
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task."
37+
empty!(_tz_cache())
38+
end
39+
@lock tz_cache_mutex begin
40+
empty!(TZ_CACHE_FUTURES)
41+
end
42+
return nothing
43+
end
2644

2745
"""
2846
TimeZone(str::AbstractString) -> TimeZone
@@ -68,20 +86,40 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
6886
# Note: If the class `mask` does not match the time zone we'll still load the
6987
# information into the cache to ensure the result is consistent.
7088
tz, class = get!(_tz_cache(), str) do
71-
tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)
72-
73-
if isfile(tz_path)
74-
open(deserialize, tz_path, "r")
75-
elseif occursin(FIXED_TIME_ZONE_REGEX, str)
76-
FixedTimeZone(str), Class(:FIXED)
77-
elseif !isdir(TZData.COMPILED_DIR) || isempty(readdir(TZData.COMPILED_DIR))
78-
# Note: Julia 1.0 supresses the build logs which can hide issues in time zone
79-
# compliation which result in no tzdata time zones being available.
80-
throw(ArgumentError(
81-
"Unable to find time zone \"$str\". Try running `TimeZones.build()`."
82-
))
89+
# Even though we're using Thread-local caches, we still need to lock during
90+
# construction to prevent multiple tasks redundantly constructing the same object,
91+
# and potential thread safety violations due to Tasks migrating threads.
92+
# NOTE that we only grab the lock if the TZ doesn't exist, so the mutex contention
93+
# is not on the critical path for most constructors. :)
94+
constructing = false
95+
# We lock the mutex, but for only a short, *constant time* duration, to grab the
96+
# future for this TimeZone, or create the future if it doesn't exist.
97+
future = @lock tz_cache_mutex begin
98+
get!(TZ_CACHE_FUTURES, str) do
99+
constructing = true
100+
Channel{Tuple{TimeZone,Class}}(1)
101+
end
102+
end
103+
if constructing
104+
tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)
105+
106+
tz = if isfile(tz_path)
107+
open(deserialize, tz_path, "r")
108+
elseif occursin(FIXED_TIME_ZONE_REGEX, str)
109+
FixedTimeZone(str), Class(:FIXED)
110+
elseif !isdir(TZData.COMPILED_DIR) || isempty(readdir(TZData.COMPILED_DIR))
111+
# Note: Julia 1.0 supresses the build logs which can hide issues in time zone
112+
# compliation which result in no tzdata time zones being available.
113+
throw(ArgumentError(
114+
"Unable to find time zone \"$str\". Try running `TimeZones.build()`."
115+
))
116+
else
117+
throw(ArgumentError("Unknown time zone \"$str\""))
118+
end
119+
120+
put!(future, tz)
83121
else
84-
throw(ArgumentError("Unknown time zone \"$str\""))
122+
fetch(future)
85123
end
86124
end
87125

src/tzdata/TZData.jl

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
module TZData
22

33
using Printf
4-
using ...TimeZones: DEPS_DIR
4+
using ...TimeZones: DEPS_DIR, _reset_tz_cache
55

66
import Pkg
77

src/tzdata/compile.jl

+2-7
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ using Dates
22
using Serialization
33
using Dates: parse_components
44

5-
using ...TimeZones: _tz_cache
5+
using ...TimeZones: _reset_tz_cache
66
using ...TimeZones: TimeZones, TimeZone, FixedTimeZone, VariableTimeZone, Transition, Class
77
using ...TimeZones: rename
88
using ..TZData: TimeOffset, ZERO, MIN_GMT_OFFSET, MAX_GMT_OFFSET, MIN_SAVE, MAX_SAVE,
@@ -696,12 +696,7 @@ function compile(tz_source::TZSource, dest_dir::AbstractString; kwargs...)
696696
isdir(dest_dir) || error("Destination directory doesn't exist")
697697
# When we recompile the TimeZones from a new source, we clear all the existing cached
698698
# TimeZone objects, so that newly constructed objects pick up the newly compiled rules.
699-
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that
700-
# thread's local cache.
701-
Threads.@threads for i in 1:Threads.nthreads()
702-
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task."
703-
empty!(_tz_cache())
704-
end
699+
_reset_tz_cache()
705700

706701
for (tz, class) in results
707702
parts = split(TimeZones.name(tz), '/')

0 commit comments

Comments
 (0)