Skip to content

[Part 3] [ORC-timezone]: GPU kernel for DST-aware timezone conversion#4733

Open
res-life wants to merge 4 commits into
NVIDIA:mainfrom
res-life:orc-tz-pr3-cuda-dst-kernel
Open

[Part 3] [ORC-timezone]: GPU kernel for DST-aware timezone conversion#4733
res-life wants to merge 4 commits into
NVIDIA:mainfrom
res-life:orc-tz-pr3-cuda-dst-kernel

Conversation

@res-life

@res-life res-life commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Context

Part 3 of the split of #4432 (full ORC DST GPU support). Part 1 (#4539, runtime build of OrcTimezoneInfo) and Part 2 (#4635, Java-side DST rule extraction) are merged. This PR adds the CUDA implementation that converts ORC writer/reader timestamps with Daylight Saving Time support. It does not wire DST into the production dispatch path — production behavior is unchanged.

What this adds

In timezones.hpp:

  • struct dst_rule — a SimpleTimeZone-shaped DST descriptor (raw/dst offsets, start/end month/day/dayOfWeek/time/mode) consumed by the GPU kernel. Field layout mirrors the Java OrcDstRuleExtractor.DstRule (0-based month; dayOfWeek 1=Sun..7=Sat; mode 0=DOM/1=DOW_IN_MONTH/2=DOW_GE_DOM/3=DOW_LE_DOM; timeMode 0=WALL/1=STANDARD/2=UTC).
  • A DST-capable convert_orc_writer_reader_timezones overload (adds base_offset_us, per-side initial_offset and dst_rule).

In timezones.cu:

  • Device civil-calendar / DST-offset helpers: is_leap_year, days_in_month, days_before_month, days_from_epoch_to_year, compute_rule_day, compute_transition_utc_ms, millis_to_year, compute_dst_offset, get_transition_index, is_fixed_offset_tz. These implement SimpleTimeZone.getOffset() semantics for dates beyond the historical transition table.
  • A shared-memory conversion kernel and its host launcher.
  • A backward-compatible 5-arg overload of convert_orc_writer_reader_timezones that forwards to the new path with has_dst=false and initial_offset=raw_offset. This reproduces the legacy non-DST behavior exactly (the only old/new divergence is the before-first-transition branch, which the mapping pins), so existing callers — the JNI binding and C++ tests — are unaffected.
  • NVTX instrumentation and CUDA launch error checking on the shared launcher path.

In tests/timezones.cpp:

  • ConvertOrcTimezonesReaderDstBeyondTable — fixed-UTC writer, US-style DST reader with no transition table; a winter instant is unchanged, a summer instant shifts back one hour.
  • ConvertOrcTimezonesSameDstZoneIsIdentity — converting a DST zone to itself is the identity for both standard and daylight instants.
  • ConvertOrcTimezonesAppliesBaseOffset — covers non-zero base_offset_us for both fixed-offset and transition-table paths.
  • ConvertOrcTimezonesDstRuleModes — covers all SimpleTimeZone rule modes (DOM, positive/negative DOW_IN_MONTH, DOW_GE_DOM, DOW_LE_DOM) plus normalized adjacent-month boundary behavior.
  • The existing floor-division regression tests are preserved.

Review follow-up

  • Added the missing SRJ_FUNC_RANGE() coverage through the shared convert_timezones launcher and included nvtx_ranges.hpp.
  • Added CUDF_CHECK_CUDA(stream.value()) after the raw CUDA kernel launch.
  • Added direct includes required by the new public header/test signatures.
  • Documented that the DST rule-mode math is a manual port of java.util.SimpleTimeZone behavior and kept normalized-calendar semantics covered by tests.

What this does not change

  • GpuTimeZoneDB.convertOrcTimezones — DST gate still throws UnsupportedOperationException for DST zones.
  • No Java or JNI changes; the DST-capable API is not reached from production. The DST capability is staged behind the gate.
  • The pre-existing non-DST ORC conversion path is behavior-preserving (rerouted through the new kernel via the overload).

Verification

  • git diff --check
  • Build/test were not run locally for the latest pushed revision; CI will run them.

Diff size

+857 / -191 across 3 files:

  • timezones.cu +682 / -178
  • timezones.hpp +102 / -13
  • tests/timezones.cpp +264

Follow-up parts

  • Part 4: Java↔CUDA DstRule serialization + GpuTimeZoneDB host-side assembly of transition tables / dst_rule and the JNI binding (still gated — no behavior change).
  • Part 5: Remove the DST gate in convertOrcTimezones, dispatch DST zones through the kernel, add end-to-end Java tests. This is the only follow-up part that changes runtime behavior.

Part 3 of NVIDIA#4432. Adds the CUDA implementation that computes ORC
writer/reader timezone conversions with Daylight Saving Time support.
No production behavior change: the Java DST gate in
GpuTimeZoneDB.convertOrcTimezones still throws and no Java/JNI changes
are included, so the DST capability is unreachable from production.
The pre-existing non-DST ORC path is behavior-preserving -- it now
routes through the new kernel via a backward-compatible overload.

- timezones.hpp: add `struct dst_rule` (SimpleTimeZone-shaped DST
  descriptor consumed by the kernel) and a DST-capable
  `convert_orc_writer_reader_timezones` overload.
- timezones.cu: civil-calendar / DST-offset device helpers
  (is_leap_year, days_*, compute_rule_day, compute_transition_utc_ms,
  millis_to_year, compute_dst_offset, get_transition_index,
  is_fixed_offset_tz), a shared-memory conversion kernel, and the
  DST-capable host launcher. A backward-compatible 5-arg overload of
  convert_orc_writer_reader_timezones forwards to the new path with
  has_dst=false and initial_offset=raw_offset, which reproduces the
  legacy non-DST behavior exactly (the only old/new divergence is the
  before-first-transition branch), so existing callers (the JNI
  binding and C++ tests) are unaffected.
- tests/timezones.cpp: add ConvertOrcTimezonesReaderDstBeyondTable and
  ConvertOrcTimezonesSameDstZoneIsIdentity covering the DST math, and
  preserve the existing floor-division regression tests.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life self-assigned this Jun 23, 2026
@res-life

Copy link
Copy Markdown
Collaborator Author

build

1 similar comment
@res-life

Copy link
Copy Markdown
Collaborator Author

build

@res-life res-life marked this pull request as ready for review June 23, 2026 05:02
@res-life res-life marked this pull request as draft June 23, 2026 06:20
Signed-off-by: Chong Gao <res_life@163.com>
@res-life

Copy link
Copy Markdown
Collaborator Author

build

@res-life res-life marked this pull request as ready for review June 23, 2026 08:51
@res-life res-life requested review from mythrocks and ttnghia June 23, 2026 08:53
Signed-off-by: Chong Gao <res_life@163.com>
@res-life

Copy link
Copy Markdown
Collaborator Author

build

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the CUDA-side implementation needed for DST-aware ORC writer↔reader timestamp timezone conversion (SimpleTimeZone semantics), while keeping the existing non-DST behavior available via a backward-compatible overload and extending C++ test coverage for the new DST logic.

Changes:

  • Introduces dst_rule and a DST-capable convert_orc_writer_reader_timezones overload (including base_offset_us and per-side initial_offset).
  • Implements GPU calendar/DST helpers plus a shared-memory conversion kernel + launcher, with NVTX ranges and CUDA launch error checking.
  • Adds new C++ tests covering DST behavior beyond transition tables, identity conversion, base-offset fusion, and all SimpleTimeZone rule modes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/main/cpp/src/timezones.hpp Adds dst_rule and new DST-capable ORC conversion API + backward-compatible overload; minor doc fix and includes.
src/main/cpp/src/timezones.cu Implements GPU DST/calendar logic, shared-memory conversion kernel, launcher instrumentation, and overload forwarding.
src/main/cpp/tests/timezones.cpp Adds targeted tests for DST rule evaluation, rule modes, identity behavior, and base offset handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/cpp/src/timezones.hpp
Comment thread src/main/cpp/src/timezones.cu
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds the CUDA implementation for DST-aware ORC timezone conversion, porting java.util.SimpleTimeZone.getOffset() semantics to the GPU. The new kernel handles transition-table lookups, DST rule fallback for future dates, and a fixed base-offset adjustment in a single pass; a backward-compatible 5-arg overload keeps existing callers unchanged and the DST gate in GpuTimeZoneDB continues to throw for production.

  • New GPU DST kernel (timezones.cu): calendar helpers, rule-mode dispatch (compute_rule_day), UTC-transition computation (compute_transition_utc_ms), hemisphere-aware offset selection (compute_dst_offset), and a shared-memory conversion kernel replacing the prior thrust::transform functor.
  • dst_rule struct (timezones.hpp): 48-byte JNI-ready layout with explicit 3-byte padding after has_dst; new 11-arg overload plus retained 6-arg backward-compatible overload.
  • Tests (tests/timezones.cpp): four new test cases covering the no-table DST path, same-zone identity, base-offset fusion, and all four SimpleTimeZone rule modes including month-boundary normalization.

Confidence Score: 5/5

Safe to merge — the DST gate in GpuTimeZoneDB still throws UnsupportedOperationException, so no production path reaches the new kernel.

The new code is a careful port of Java's SimpleTimeZone.getOffset() logic with comprehensive test coverage across all rule modes, hemisphere variants, and boundary normalization. The floor_div fix and explicit struct padding from earlier review rounds are incorporated. All findings are minor defensive-programming observations that do not affect correctness with valid Java inputs.

The days_before_month unchecked array access in timezones.cu is worth a second look before Part 4 wires the Java-side DstRule serialization.

Important Files Changed

Filename Overview
src/main/cpp/src/timezones.cu Adds ~480 lines of GPU DST support: calendar helpers, compute_rule_day/compute_transition_utc_ms/compute_dst_offset implementing SimpleTimeZone.getOffset on-GPU, a shared-memory conversion kernel replacing the thrust::transform functor, and a backward-compatible 5-arg overload. Minor issues around unchecked array bounds in days_before_month and ptr asymmetry noted.
src/main/cpp/src/timezones.hpp Adds dst_rule struct with explicit ABI padding, new 11-arg convert_orc_writer_reader_timezones overload, and retained 6-arg backward-compatible overload. Adds #pragma once and missing direct includes.
src/main/cpp/tests/timezones.cpp Adds 264 lines of test coverage across all four SimpleTimeZone rule modes, hemisphere variants, same-zone identity, and base-offset fusion.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[convert_orc_writer_reader_timezones] --> B[convert_timezones]
    B --> C{input.size == 0?}
    C -- yes --> D[return empty result]
    C -- no --> E[compute smem_bytes]
    E --> F[convert_timezones_kernel]
    F --> G[Cooperative load into shared mem]
    G --> H{null element?}
    H -- yes --> I[skip]
    H -- no --> J[convert_timestamp_between_timezones]
    J --> K{writer fixed-offset?}
    K -- yes --> L[writer_raw_offset]
    K -- no --> M[get_transition_index writer]
    J --> N{reader fixed-offset?}
    N -- yes --> O[reader_raw_offset]
    N -- no --> P[get_transition_index reader]
    M --> Q{time_ms vs table?}
    Q -- before first --> R[initial_offset]
    Q -- within --> S[binary search]
    Q -- beyond --> T[compute_dst_offset]
    T --> U{has_dst?}
    U -- no --> V[raw_offset]
    U -- yes --> W[millis_to_year]
    W --> X[compute_transition_utc_ms x2]
    X --> Y[compute_rule_day]
    Y --> Z{Northern or Southern?}
    Z -- Northern --> AA[start <= t < end]
    Z -- Southern --> AB[t >= start OR t < end]
    AA --> AC[raw_offset +/- dst_savings]
    AB --> AC
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[convert_orc_writer_reader_timezones] --> B[convert_timezones]
    B --> C{input.size == 0?}
    C -- yes --> D[return empty result]
    C -- no --> E[compute smem_bytes]
    E --> F[convert_timezones_kernel]
    F --> G[Cooperative load into shared mem]
    G --> H{null element?}
    H -- yes --> I[skip]
    H -- no --> J[convert_timestamp_between_timezones]
    J --> K{writer fixed-offset?}
    K -- yes --> L[writer_raw_offset]
    K -- no --> M[get_transition_index writer]
    J --> N{reader fixed-offset?}
    N -- yes --> O[reader_raw_offset]
    N -- no --> P[get_transition_index reader]
    M --> Q{time_ms vs table?}
    Q -- before first --> R[initial_offset]
    Q -- within --> S[binary search]
    Q -- beyond --> T[compute_dst_offset]
    T --> U{has_dst?}
    U -- no --> V[raw_offset]
    U -- yes --> W[millis_to_year]
    W --> X[compute_transition_utc_ms x2]
    X --> Y[compute_rule_day]
    Y --> Z{Northern or Southern?}
    Z -- Northern --> AA[start <= t < end]
    Z -- Southern --> AB[t >= start OR t < end]
    AA --> AC[raw_offset +/- dst_savings]
    AB --> AC
Loading

Reviews (2): Last reviewed commit: "Address timezone review feedback" | Re-trigger Greptile

Comment thread src/main/cpp/src/timezones.cu Outdated
Comment thread src/main/cpp/src/timezones.cu
Signed-off-by: Chong Gao <res_life@163.com>
@res-life

Copy link
Copy Markdown
Collaborator Author

Addressed the remaining review feedback in 554c7502:

  • Changed writer/reader initial and raw timezone offsets from cudf::size_type to int32_t in the public ORC timezone API and internal launcher.
  • Restored UB-safe epoch-millis calculation via integer_utils::floor_div.
  • Made the dst_rule ABI padding explicit with has_dst_padding[3] for future host/JNI serialization.
  • Added comments clarifying raw-offset-only DST year selection and the fixed-offset/null-transition-table case.

The remaining inline review threads have been replied to and resolved.

@res-life

Copy link
Copy Markdown
Collaborator Author

build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants