Skip to content

ORC tz split 4/6: CUDA civil calendar helpers and DST math#4548

Closed
res-life wants to merge 1 commit into
orc-tz-3-orctzinfo-runtime-buildfrom
orc-tz-4-cuda-civil-calendar
Closed

ORC tz split 4/6: CUDA civil calendar helpers and DST math#4548
res-life wants to merge 1 commit into
orc-tz-3-orctzinfo-runtime-buildfrom
orc-tz-4-cuda-civil-calendar

Conversation

@res-life

Copy link
Copy Markdown
Collaborator

Part of the split of #4432.
Companion: NVIDIA/cudf-spark#14544
Previous: #4547

Adds the GPU-side machinery needed for runtime DST handling. Old get_transition_index / convert_timestamp_between_timezones / kernel are left in place; they are replaced by the new versions in #orc-tz-5.

Changes in timezones.cu:

  • Add #include <cudf/utilities/bit.hpp> and <cuda/std/array>.
  • MS_PER_SECOND/MINUTE/HOUR/DAY constants.
  • Branchless civil calendar primitives: DAYS_BEFORE_MONTH lookup, is_leap_year, days_in_month, days_before_month, days_from_epoch_to_year, millis_to_year.
  • DST rule math: compute_rule_day (SimpleTimeZone-style start-mode evaluation), compute_transition_utc_ms.
  • compute_dst_offset: GPU-side evaluation of the DST rule.
  • New get_transition_index overload that consults the DST rule and historical initial offset on out-of-range lookups.
  • is_fixed_offset_tz helper.

Signed-off-by: Chong Gao chongg@nvidia.com

Part of the split of #4432.

Adds the GPU-side machinery needed for runtime DST handling. Old
get_transition_index / convert_timestamp_between_timezones / kernel are
left in place; they are replaced by the new versions in
orc-tz-5-cuda-dst-gpu.

Changes in timezones.cu:

- Add #include <cudf/utilities/bit.hpp> and <cuda/std/array>.
- MS_PER_SECOND/MINUTE/HOUR/DAY constants.
- Branchless civil calendar primitives: DAYS_BEFORE_MONTH lookup,
  is_leap_year, days_in_month, days_before_month, days_from_epoch_to_year,
  millis_to_year.
- DST rule math: compute_rule_day (SimpleTimeZone-style start-mode
  evaluation), compute_transition_utc_ms.
- compute_dst_offset: GPU-side evaluation of the DST rule.
- New get_transition_index that consults the DST rule and historical
  initial offset on out-of-range lookups.
- is_fixed_offset_tz helper.

Signed-off-by: Chong Gao <chongg@nvidia.com>
@res-life res-life force-pushed the orc-tz-3-orctzinfo-runtime-build branch from d0baa11 to 944737e Compare May 19, 2026 08:33
@res-life res-life force-pushed the orc-tz-4-cuda-civil-calendar branch from 4766088 to 91b7501 Compare May 19, 2026 08:33
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds GPU-side DST math for ORC timezone handling: civil-calendar helpers (is_leap_year, days_in_month, millis_to_year), compute_rule_day (porting Java SimpleTimeZone rule decoding), compute_transition_utc_ms, and compute_dst_offset. It also introduces a new get_transition_index overload that falls back to DST-rule computation for out-of-range timestamps.

  • The old 6-argument get_transition_index is entirely replaced by a 7-argument version, but convert_timestamp_between_timezones and its three call sites still use the old signature, causing a build failure.
  • millis_to_year approximates the local year with only the raw (non-DST) offset, which can produce the wrong year for timestamps near Jan 1 in DST-active Southern-Hemisphere zones; the OR-boundary check masks most cases but the fragility is undocumented.
  • compute_rule_day has no guard against returning a day outside [1, month_len] for edge-case rule inputs.

Confidence Score: 3/5

Do not merge: the translation unit will not compile because convert_timestamp_between_timezones still calls the removed 6-argument get_transition_index.

The DST math itself (calendar helpers, compute_rule_day, compute_dst_offset) is well-structured and logically matches the Java SimpleTimeZone contract. However, the PR silently removes the 6-argument get_transition_index that convert_timestamp_between_timezones depends on — three call sites pass an int32_t const* where int32_t is now expected and omit the required dst_rule argument, so the file will not compile. The other observations (year approximation using raw_offset, absent bounds guard in compute_rule_day) are non-blocking in practice but worth addressing before wider use.

src/main/cpp/src/timezones.cu — convert_timestamp_between_timezones and its convert_timezones_functor wrapper must either be updated to the new get_transition_index signature or the old overload must be preserved alongside the new one.

Important Files Changed

Filename Overview
src/main/cpp/src/timezones.cu Adds GPU-side DST math (civil-calendar helpers, compute_rule_day, compute_dst_offset, new get_transition_index overload), but removes the 6-argument get_transition_index that convert_timestamp_between_timezones still calls, causing a build failure.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["get_transition_index(begin, end, time_ms,\noffset_begin, initial_offset, raw_offset, rule)"] --> B{begin == end?}
    B -- Yes --> C["compute_dst_offset(time_ms, raw_offset, rule)"]
    B -- No --> D["upper_bound(time_ms)"]
    D --> E{iter == end?}
    E -- Yes --> C
    E -- No --> F{index == 0?}
    F -- Yes --> G["return initial_offset"]
    F -- No --> H["return offset_begin[index-1]"]
    C --> I{rule.has_dst?}
    I -- No --> J["return raw_offset"]
    I -- Yes --> K["millis_to_year(utc_ms + raw_offset_ms)"]
    K --> L["compute_transition_utc_ms (start)"]
    K --> M["compute_transition_utc_ms (end)"]
    L --> N{dst_start < dst_end?}
    M --> N
    N -- Yes --> O["Northern: in_dst = utc_ms in [start, end)"]
    N -- No --> P["Southern: in_dst = utc_ms >= start OR utc_ms < end"]
    O --> Q["return raw_offset + dst_savings or raw_offset"]
    P --> Q
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["get_transition_index(begin, end, time_ms,\noffset_begin, initial_offset, raw_offset, rule)"] --> B{begin == end?}
    B -- Yes --> C["compute_dst_offset(time_ms, raw_offset, rule)"]
    B -- No --> D["upper_bound(time_ms)"]
    D --> E{iter == end?}
    E -- Yes --> C
    E -- No --> F{index == 0?}
    F -- Yes --> G["return initial_offset"]
    F -- No --> H["return offset_begin[index-1]"]
    C --> I{rule.has_dst?}
    I -- No --> J["return raw_offset"]
    I -- Yes --> K["millis_to_year(utc_ms + raw_offset_ms)"]
    K --> L["compute_transition_utc_ms (start)"]
    K --> M["compute_transition_utc_ms (end)"]
    L --> N{dst_start < dst_end?}
    M --> N
    N -- Yes --> O["Northern: in_dst = utc_ms in [start, end)"]
    N -- No --> P["Southern: in_dst = utc_ms >= start OR utc_ms < end"]
    O --> Q["return raw_offset + dst_savings or raw_offset"]
    P --> Q
Loading

Comments Outside Diff (1)

  1. src/main/cpp/src/timezones.cu, line 531-570 (link)

    P1 Build failure: old 6-argument get_transition_index call is unresolvable

    This PR replaces the 6-argument get_transition_index(begin, end, time_ms, offset_begin, offset_end, raw_offset) with a 7-argument version that takes initial_offset, raw_offset, and dst_rule. However, convert_timestamp_between_timezones still calls the old 6-argument form (at lines 549, 556, and 564), passing writer_offsets_end/reader_offsets_end (int32_t const*) where initial_offset (int32_t) is now expected, and omitting the required rule argument entirely. There is no longer any matching overload, so this translation unit will not compile.

Reviews (1): Last reviewed commit: "ORC tz split 4/6: CUDA civil calendar he..." | Re-trigger Greptile

Comment on lines 428 to +432
/**
* @brief Get the transition index for the given time `time_ms` using binary search.
* Find the first transition that is greater or equal to `time_ms`,
* and return the corresponding offset.
* @brief Compute the total UTC offset (raw + DST) for a UTC timestamp using DST rules.
*
* @param begin the beginning of the transition array.
* @param end the end of the transition array.
* @param time_ms the input time in milliseconds to find the transition index for.
* @param offset_begin the beginning of the offset array.
* @param offset_end the end of the offset array.
* This is the GPU equivalent of java.util.SimpleTimeZone.getOffset(long).
* It computes the DST start and end transitions for the year containing the

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 millis_to_year uses only raw_offset_ms; may derive wrong year near DST-active year boundaries

millis_to_year(utc_ms + raw_offset_ms) approximates the local year using the standard (non-DST) offset. When DST is active the true local time is utc_ms + raw_offset_ms + dst_savings. For a Southern-Hemisphere timezone (e.g., UTC+11, +1h DST) a UTC timestamp at Dec 31 ~22:xx could map to Jan 1 via the raw offset but only to Dec 31 if the extra DST hour is missing. The Northern-Hemisphere path is fine because DST periods never straddle Jan 1, but for Southern-Hemisphere transitions that span the year boundary the OR-based check (utc_ms >= dst_start || utc_ms < dst_end) can give a correct result even with the wrong year, so in practice this may not surface — but it is an undocumented fragility that should be either explained in a comment or resolved by iterating DST windows for year and year ± 1.

Comment on lines +336 to +390
}
}

case DOW_GE_DOM_MODE: {
// First rule_dow on or after rule_day
int64_t target_epoch = first_of_month_epoch_days + (rule_day - 1);
int64_t target_raw = ((target_epoch + 4) % 7);
if (target_raw < 0) target_raw += 7;
int32_t target_dow = static_cast<int32_t>(target_raw) + 1;
int32_t diff = rule_dow - target_dow;
if (diff < 0) diff += 7;
return rule_day + diff;
}

case DOW_LE_DOM_MODE: {
// Last rule_dow on or before rule_day
int64_t target_epoch = first_of_month_epoch_days + (rule_day - 1);
int64_t target_raw = ((target_epoch + 4) % 7);
if (target_raw < 0) target_raw += 7;
int32_t target_dow = static_cast<int32_t>(target_raw) + 1;
int32_t diff = target_dow - rule_dow;
if (diff < 0) diff += 7;
return rule_day - diff;
}

default: return rule_day;
}
}

/**
* @brief Compute the UTC millis of a DST transition for a given year.
*
* @param year The calendar year.
* @param rule_month 0-based month.
* @param rule_day Day parameter of the rule.
* @param rule_dow Day-of-week parameter.
* @param rule_time Time-of-day in ms.
* @param rule_time_mode WALL_TIME / STANDARD_TIME / UTC_TIME.
* @param rule_mode DOM / DOW_IN_MONTH / DOW_GE_DOM / DOW_LE_DOM.
* @param raw_offset_ms The timezone raw offset in ms.
* @param dst_savings_ms The DST savings in ms (needed for WALL_TIME conversion).
* @param is_start_rule True for DST start (to determine WALL_TIME adjustment).
*/
__device__ static int64_t compute_transition_utc_ms(int32_t year,
int32_t rule_month,
int32_t rule_day,
int32_t rule_dow,
int32_t rule_time,
int32_t rule_time_mode,
int32_t rule_mode,
int32_t raw_offset_ms,
int32_t dst_savings_ms,
bool is_start_rule)
{
int32_t actual_day = compute_rule_day(rule_mode, rule_day, rule_dow, year, rule_month);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 compute_rule_day can return a day outside [1, month_len]

DOW_GE_DOM_MODE returns rule_day + diff (0–6 added), so a rule_day of 27 in a 28-day month with diff = 3 yields 30 — already out of range. Similarly, DOW_IN_MONTH_MODE with a large positive occurrence can overshoot. Java's SimpleTimeZone validates its own rules before storing them, so inputs from the serialized DST rule table should be safe, but compute_transition_utc_ms ultimately uses the result as a day-of-month offset into the month without any clamp, so a corrupted/unexpected rule would silently produce a wrong UTC timestamp (rather than a detectable error).

@res-life

Copy link
Copy Markdown
Collaborator Author

useless now.

@res-life res-life closed this Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants