ORC tz split 4/6: CUDA civil calendar helpers and DST math#4548
ORC tz split 4/6: CUDA civil calendar helpers and DST math#4548res-life wants to merge 1 commit into
Conversation
Part of the split of #4432. Adds the GPU-side machinery needed for runtime DST handling. Old get_transition_index / convert_timestamp_between_timezones / kernel are left in place; they are replaced by the new versions in orc-tz-5-cuda-dst-gpu. Changes in timezones.cu: - Add #include <cudf/utilities/bit.hpp> and <cuda/std/array>. - MS_PER_SECOND/MINUTE/HOUR/DAY constants. - Branchless civil calendar primitives: DAYS_BEFORE_MONTH lookup, is_leap_year, days_in_month, days_before_month, days_from_epoch_to_year, millis_to_year. - DST rule math: compute_rule_day (SimpleTimeZone-style start-mode evaluation), compute_transition_utc_ms. - compute_dst_offset: GPU-side evaluation of the DST rule. - New get_transition_index that consults the DST rule and historical initial offset on out-of-range lookups. - is_fixed_offset_tz helper. Signed-off-by: Chong Gao <chongg@nvidia.com>
d0baa11 to
944737e
Compare
4766088 to
91b7501
Compare
Greptile SummaryThis PR adds GPU-side DST math for ORC timezone handling: civil-calendar helpers (
Confidence Score: 3/5Do not merge: the translation unit will not compile because convert_timestamp_between_timezones still calls the removed 6-argument get_transition_index. The DST math itself (calendar helpers, compute_rule_day, compute_dst_offset) is well-structured and logically matches the Java SimpleTimeZone contract. However, the PR silently removes the 6-argument get_transition_index that convert_timestamp_between_timezones depends on — three call sites pass an int32_t const* where int32_t is now expected and omit the required dst_rule argument, so the file will not compile. The other observations (year approximation using raw_offset, absent bounds guard in compute_rule_day) are non-blocking in practice but worth addressing before wider use. src/main/cpp/src/timezones.cu — convert_timestamp_between_timezones and its convert_timezones_functor wrapper must either be updated to the new get_transition_index signature or the old overload must be preserved alongside the new one. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["get_transition_index(begin, end, time_ms,\noffset_begin, initial_offset, raw_offset, rule)"] --> B{begin == end?}
B -- Yes --> C["compute_dst_offset(time_ms, raw_offset, rule)"]
B -- No --> D["upper_bound(time_ms)"]
D --> E{iter == end?}
E -- Yes --> C
E -- No --> F{index == 0?}
F -- Yes --> G["return initial_offset"]
F -- No --> H["return offset_begin[index-1]"]
C --> I{rule.has_dst?}
I -- No --> J["return raw_offset"]
I -- Yes --> K["millis_to_year(utc_ms + raw_offset_ms)"]
K --> L["compute_transition_utc_ms (start)"]
K --> M["compute_transition_utc_ms (end)"]
L --> N{dst_start < dst_end?}
M --> N
N -- Yes --> O["Northern: in_dst = utc_ms in [start, end)"]
N -- No --> P["Southern: in_dst = utc_ms >= start OR utc_ms < end"]
O --> Q["return raw_offset + dst_savings or raw_offset"]
P --> Q
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A["get_transition_index(begin, end, time_ms,\noffset_begin, initial_offset, raw_offset, rule)"] --> B{begin == end?}
B -- Yes --> C["compute_dst_offset(time_ms, raw_offset, rule)"]
B -- No --> D["upper_bound(time_ms)"]
D --> E{iter == end?}
E -- Yes --> C
E -- No --> F{index == 0?}
F -- Yes --> G["return initial_offset"]
F -- No --> H["return offset_begin[index-1]"]
C --> I{rule.has_dst?}
I -- No --> J["return raw_offset"]
I -- Yes --> K["millis_to_year(utc_ms + raw_offset_ms)"]
K --> L["compute_transition_utc_ms (start)"]
K --> M["compute_transition_utc_ms (end)"]
L --> N{dst_start < dst_end?}
M --> N
N -- Yes --> O["Northern: in_dst = utc_ms in [start, end)"]
N -- No --> P["Southern: in_dst = utc_ms >= start OR utc_ms < end"]
O --> Q["return raw_offset + dst_savings or raw_offset"]
P --> Q
|
| /** | ||
| * @brief Get the transition index for the given time `time_ms` using binary search. | ||
| * Find the first transition that is greater or equal to `time_ms`, | ||
| * and return the corresponding offset. | ||
| * @brief Compute the total UTC offset (raw + DST) for a UTC timestamp using DST rules. | ||
| * | ||
| * @param begin the beginning of the transition array. | ||
| * @param end the end of the transition array. | ||
| * @param time_ms the input time in milliseconds to find the transition index for. | ||
| * @param offset_begin the beginning of the offset array. | ||
| * @param offset_end the end of the offset array. | ||
| * This is the GPU equivalent of java.util.SimpleTimeZone.getOffset(long). | ||
| * It computes the DST start and end transitions for the year containing the |
There was a problem hiding this comment.
millis_to_year uses only raw_offset_ms; may derive wrong year near DST-active year boundaries
millis_to_year(utc_ms + raw_offset_ms) approximates the local year using the standard (non-DST) offset. When DST is active the true local time is utc_ms + raw_offset_ms + dst_savings. For a Southern-Hemisphere timezone (e.g., UTC+11, +1h DST) a UTC timestamp at Dec 31 ~22:xx could map to Jan 1 via the raw offset but only to Dec 31 if the extra DST hour is missing. The Northern-Hemisphere path is fine because DST periods never straddle Jan 1, but for Southern-Hemisphere transitions that span the year boundary the OR-based check (utc_ms >= dst_start || utc_ms < dst_end) can give a correct result even with the wrong year, so in practice this may not surface — but it is an undocumented fragility that should be either explained in a comment or resolved by iterating DST windows for year and year ± 1.
| } | ||
| } | ||
|
|
||
| case DOW_GE_DOM_MODE: { | ||
| // First rule_dow on or after rule_day | ||
| int64_t target_epoch = first_of_month_epoch_days + (rule_day - 1); | ||
| int64_t target_raw = ((target_epoch + 4) % 7); | ||
| if (target_raw < 0) target_raw += 7; | ||
| int32_t target_dow = static_cast<int32_t>(target_raw) + 1; | ||
| int32_t diff = rule_dow - target_dow; | ||
| if (diff < 0) diff += 7; | ||
| return rule_day + diff; | ||
| } | ||
|
|
||
| case DOW_LE_DOM_MODE: { | ||
| // Last rule_dow on or before rule_day | ||
| int64_t target_epoch = first_of_month_epoch_days + (rule_day - 1); | ||
| int64_t target_raw = ((target_epoch + 4) % 7); | ||
| if (target_raw < 0) target_raw += 7; | ||
| int32_t target_dow = static_cast<int32_t>(target_raw) + 1; | ||
| int32_t diff = target_dow - rule_dow; | ||
| if (diff < 0) diff += 7; | ||
| return rule_day - diff; | ||
| } | ||
|
|
||
| default: return rule_day; | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * @brief Compute the UTC millis of a DST transition for a given year. | ||
| * | ||
| * @param year The calendar year. | ||
| * @param rule_month 0-based month. | ||
| * @param rule_day Day parameter of the rule. | ||
| * @param rule_dow Day-of-week parameter. | ||
| * @param rule_time Time-of-day in ms. | ||
| * @param rule_time_mode WALL_TIME / STANDARD_TIME / UTC_TIME. | ||
| * @param rule_mode DOM / DOW_IN_MONTH / DOW_GE_DOM / DOW_LE_DOM. | ||
| * @param raw_offset_ms The timezone raw offset in ms. | ||
| * @param dst_savings_ms The DST savings in ms (needed for WALL_TIME conversion). | ||
| * @param is_start_rule True for DST start (to determine WALL_TIME adjustment). | ||
| */ | ||
| __device__ static int64_t compute_transition_utc_ms(int32_t year, | ||
| int32_t rule_month, | ||
| int32_t rule_day, | ||
| int32_t rule_dow, | ||
| int32_t rule_time, | ||
| int32_t rule_time_mode, | ||
| int32_t rule_mode, | ||
| int32_t raw_offset_ms, | ||
| int32_t dst_savings_ms, | ||
| bool is_start_rule) | ||
| { | ||
| int32_t actual_day = compute_rule_day(rule_mode, rule_day, rule_dow, year, rule_month); |
There was a problem hiding this comment.
compute_rule_day can return a day outside [1, month_len]
DOW_GE_DOM_MODE returns rule_day + diff (0–6 added), so a rule_day of 27 in a 28-day month with diff = 3 yields 30 — already out of range. Similarly, DOW_IN_MONTH_MODE with a large positive occurrence can overshoot. Java's SimpleTimeZone validates its own rules before storing them, so inputs from the serialized DST rule table should be safe, but compute_transition_utc_ms ultimately uses the result as a day-of-month offset into the month without any clamp, so a corrupted/unexpected rule would silently produce a wrong UTC timestamp (rather than a detectable error).
|
useless now. |
Part of the split of #4432.
Companion: NVIDIA/cudf-spark#14544
Previous: #4547
Adds the GPU-side machinery needed for runtime DST handling. Old
get_transition_index/convert_timestamp_between_timezones/ kernel are left in place; they are replaced by the new versions in #orc-tz-5.Changes in
timezones.cu:#include <cudf/utilities/bit.hpp>and<cuda/std/array>.compute_dst_offset: GPU-side evaluation of the DST rule.get_transition_indexoverload that consults the DST rule and historical initial offset on out-of-range lookups.is_fixed_offset_tzhelper.Signed-off-by: Chong Gao chongg@nvidia.com