Support DST timezones conversion for ORC#4432
Conversation
|
build |
Signed-off-by: Chong Gao <res_life@163.com>
Greptile SummaryThis PR adds DST timezone support to the ORC timestamp conversion path in
Confidence Score: 4/5The PR is safe to merge. The core DST math is well-implemented with consistent Java/CUDA mirror implementations, verified across multiple reference years, and covered by new tests at historical transition boundaries and future-year DST fallback paths. The main new code paths are internally consistent and cross-validated. Findings are limited to style and a minor inconsistency in how dstSavings is sourced between the two extraction paths, both caught by the existing verification. No correctness gaps were found in the shared-memory layout arithmetic, the southern-hemisphere year-boundary logic, or the JNI marshaling of DST rule parameters. src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java warrants a second look around the two DST extraction paths and their differing dstSavings sources; the verification safety net is robust but the inconsistency is non-obvious. Important Files Changed
Reviews (1): Last reviewed commit: "Add back convertOrcTimezones" | Re-trigger Greptile |
|
build |
GetIntArrayElements returns nullptr and sets a pending OutOfMemoryError when it cannot pin/copy the array (e.g. under memory pressure). Dereferencing that pointer would crash the JVM. Return early with has_dst=false so the pending exception surfaces on JNI exit instead of triggering a SIGSEGV. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
parse_dst_rule can leave a pending Java exception (bad array length or OOM from GetIntArrayElements), but control returns to the caller which would otherwise launch convert_orc_writer_reader_timezones with partially initialized rules and waste GPU work. Check for pending exceptions after each parse_dst_rule call so the outer JNI_CATCH propagates the failure immediately. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
Close() could be called more than once (e.g. explicit close followed by try-with-resources unwind), which would double-close the underlying writer/reader tables and trigger undefined behavior in the native layer. Guard with a "closed" flag so subsequent calls are no-ops. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
get_transition_index only iterates by index off offset_begin; the matching end pointer is never dereferenced. Dropping it also lets convert_timestamp_between_timezones and convert_timezones_kernel stop computing/forwarding those pointers, reducing register pressure in the hot device code path. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
thrust::upper_bound returns the first element strictly greater than the needle, so the subsequent `*iter == time_ms` check could never be true. Remove the dead branch and add a comment explaining the upper_bound semantics so the index-1 math is obvious. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
ZoneOffsetTransitionRule can express both "day-of-week on or after day" (positive indicator, DOW_GE_DOM_MODE=2) and "day-of-week on or before day" (negative indicator, DOW_LE_DOM_MODE=3). We reject the latter in the precondition, so mode=2 is always safe. Document this invariant so future readers don't wonder why the mode isn't derived from the transition rule. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
DST_RULE_VALIDATION_YEARS previously only checked 2400 and 9997, both well past any IANA explicit transition entry. That meant a zone whose CPU-side TimeZone.getOffset() still followed explicit transitions into the 2040s-2050s could drift from our derived recurring rule without the verification catching it. Add 2060 so we notice mismatches within a typical application lifetime; keep 2400/9997 to continue exercising the recurring-rule fallback. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
MIN_SUPPORTED_ORC_UTC_MILLIS is computed via java.time.LocalDate (proleptic Gregorian), whereas TimeZone.getOffset(long) uses a hybrid Julian/Gregorian calendar internally. The distinction is academic here since the offset lookup is instant-based and no zone has DST in year 0001, but the subtlety is non-obvious; add a comment so future readers don't have to re-derive why this is safe. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
buildRuntimeOrcTimezoneInfo already resolves ZoneId and ZoneRules for the incoming timezone id, but extractDstRuleFromZoneRules re-calls GpuTimeZoneDB.getZoneId(timezoneId).getRules() for the same zone. Pass the ZoneRules through extractDstRule so each buildRuntimeOrcTimezoneInfo call resolves the zone at most once. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
Timezone metadata is now generated at runtime from java.util.TimeZone and java.time.zone.ZoneRules, so results depend on whichever IANA tzdata ships with the running JVM. Previously we shipped a frozen OpenJDK-8 snapshot, so results were identical across environments. Add a Javadoc note so users who see cross-environment discrepancies know to check their JVM's tzdata version first. Signed-off-by: Chong Gao <chongg@nvidia.com> Made-with: Cursor
|
build |
Made-with: Cursor
Made-with: Cursor
|
build |
|
Can you address the comments from greptile first ? |
Per review: callers should know that the historical scan + DST-rule probing path is costly, and that they should always go through the cached get(...) wrapper. Signed-off-by: Chong Gao <res_life@163.com>
~4 ms typical / ~12 ms worst-case (ART) on an Intel i7-10700K with OpenJDK 17, measured across all 626 JVM-known zones. Signed-off-by: Chong Gao <res_life@163.com>
Per review: cuda::std::array carries length and bounds-friendly access better than a C-style array. __device__ storage is still needed because days_in_month/days_before_month index this with non-compile-time values. Signed-off-by: Chong Gao <res_life@163.com>
|
build |
|
One nit as the above, others LGTM (Only JVM part) |
Signed-off-by: Chong Gao <res_life@163.com>
|
build |
| } | ||
| closed = true; | ||
| if (writerTzInfoTable != null) { | ||
| writerTzInfoTable.close(); |
There was a problem hiding this comment.
Just confirming here that OrcTimezoneContext is not meant for use from multiple threads, right? Each thread would have its own instance of OrcTimezoneContext? It probably would.
In case it doesn't, I would be afraid of a double-close at this point.
There was a problem hiding this comment.
It's not thread safe, it's loaded/closed in Spark-Rapids plugin init/close phase, not multi-threaded.
Anyway, added not thread safe comment in the code.
There was a problem hiding this comment.
Fixed in the split sub PRs.
| int time = getTransitionRuleTimeMillis(transitionRule); | ||
| int timeMode = getTransitionRuleTimeMode(transitionRule); | ||
| // SimpleTimeZone mode constant: DOW_GE_DOM_MODE. Guaranteed by the precondition above. | ||
| int mode = 2; |
There was a problem hiding this comment.
I'm sorry, this is not readable.
My previous comment was to introduce an enum for dst_rule_mode. We did that in timezones.cu. (Thank you.)
But the code here is using magic numbers. We should have a Java-side enum with the same indexes and names.
There was a problem hiding this comment.
Fixed in the split sub PRs.
| */ | ||
| class OrcTimezoneInfo { | ||
| public OrcTimezoneInfo(int rawOffset, long[] transitions, int[] offsets) { | ||
| public OrcTimezoneInfo( |
There was a problem hiding this comment.
Nit: This should probably not be public. The class itself is package private.
There was a problem hiding this comment.
Fixed in the split sub PRs.
| bool const writer_fixed = is_fixed_offset_tz(writer_trans_begin, writer_trans_end, writer_dst); | ||
| bool const reader_fixed = is_fixed_offset_tz(reader_trans_begin, reader_trans_end, reader_dst); |
There was a problem hiding this comment.
Silly question: Does this computation depend on the ts timestamp instance? My understanding is that it doesn't. We are recomputing the same result, per row.
Wouldn't it make sense to move these out of this function? In fact, this could be moved out of the kernel itself. We can pre-compute this, and pass it in as a function parameter. I suspect it will speed things up.
There was a problem hiding this comment.
fixed in sub-PR 5
There was a problem hiding this comment.
What changed
Before (per-row, called inside convert_timestamp_between_timezones — which runs once per timestamp):
bool const writer_fixed = is_fixed_offset_tz(writer_trans_begin, writer_trans_end, writer_dst);
bool const reader_fixed = is_fixed_offset_tz(reader_trans_begin, reader_trans_end, reader_dst);
After (host-side, once per kernel launch, in convert_timezones):
// Fixed-offset (no transitions and no DST) is a per-call property, so resolve
// it on the host and pass the bools through; the per-row path can then skip
// the redundant check.
bool const writer_fixed = (writer_trans_count == 0) && !writer_dst.has_dst;
bool const reader_fixed = (reader_trans_count == 0) && !reader_dst.has_dst;
convert_timezones_kernel<<<...>>>(
..., writer_fixed, ..., reader_fixed, ...);
Then the two bools are threaded through:
- convert_timezones_kernel signature gains bool writer_fixed and bool reader_fixed parameters.
- They're passed straight down into convert_timestamp_between_timezones(..., writer_fixed, ..., reader_fixed, ...).
- The per-row is_fixed_offset_tz() calls are gone.
mythrocks
left a comment
There was a problem hiding this comment.
A couple of readability nitpicks. One possible performance improvement.
This is a long PR, and not easy to review. :]
|
I'll split this huge PR. |
|
NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release. |
Replace `orc_timezone_info.data` (544KB, OpenJDK-8 snapshot) with a runtime build path that derives historical transitions from `ZoneRules.getTransitions()` and `TimeZone.getOffset()`. Per-id results are cached in `RUNTIME_TIMEZONE_INFOS`. `OrcTimezoneInfo` keeps its `(rawOffset, transitions, offsets)` shape; the CUDA kernel and the `hasDaylightSavingTime` DST guard are unchanged. Switch `isSupportedTimeZone` to catch `DateTimeException` so IDs like `+05:30` are handled the same as named zones. Split from NVIDIA#4432. Signed-off-by: Chong Gao <chongg@nvidia.com> Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final part of the split of #4432. Adds GpuTimeZoneDBTest coverage for: - DST rule extraction across DST and non-DST zones, validating startMonth / startDay / startDayOfWeek / endMonth / time-mode / rule-mode values. - Probe-based extraction fallback for fixed-offset zones. - Historical transition handling for America/Los_Angeles, America/New_York, Europe/London, Asia/Tokyo, etc. - buildHistoricalTransitions edge cases. - End-to-end convertOrcTimezones across cross-tz and same-tz cases using OrcTimezoneContext lifecycle. After this PR the working tree matches PR #4432 head; the cumulative diff against main is identical to #4432. Signed-off-by: Chong Gao <chongg@nvidia.com>
|
Need to split into small PRs, convert to draft first. |
…pregenerated binary (#4539) First in a stack splitting #4432 into reviewable chunks. ### Description Previously `OrcTimezoneInfo` loaded its transition tables from a pregenerated 544KB binary (`src/main/resources/orc_timezone_info.data`, derived from `sun.util.calendar.ZoneInfo` on OpenJDK 8). This froze ORC timezone behavior to that snapshot and required regenerating and committing the binary whenever the data needed to change. This PR replaces the binary with a runtime build path: - `buildHistoricalTransitions` walks `ZoneRules.getTransitions()` from year 0001 (the lower bound ORC supports) and probes `TimeZone.getOffset()` to capture the offset before each known transition. - `collectTimeZoneTransitionsByScanning` covers any gap between two known transitions by stepping forward at 1-day granularity and binary-searching the exact transition millisecond when an offset change is detected. - Per-id results are cached in `RUNTIME_TIMEZONE_INFOS`. - `getAllTimezoneIds()` now returns the live `TimeZone.getAvailableIDs()` instead of a hard-coded array. `OrcTimezoneInfo` keeps its `(rawOffset, transitions, offsets)` shape so the CUDA kernel is unchanged. DST timezones remain blocked by the existing `hasDaylightSavingTime` guard in `convertOrcTimezones` — follow-up PRs lift this guard: 1. **(this PR)** Build OrcTimezoneInfo at runtime 2. DST rule extraction (Java side, CPU verified) 3. GPU DST kernel + new `convertOrcTimezones` signature 4. `OrcTimezoneContext` + remaining tests `isSupportedTimeZone` is broadened from `catch (ZoneRulesException)` to `catch (DateTimeException)` so IDs like `+05:30` are rejected/accepted consistently with the runtime build path (`ZoneId.of` may throw either, depending on whether the id has a malformed offset vs. an unknown name). ### Checklists - [ ] Documentation: no user-facing change. - [x] Tests: behavior preserved for non-DST timezones; covered by existing tests in this repo and the companion spark-rapids PR. - [x] Performance: no expected change for the existing non-DST path. The first call per timezone pays a one-time runtime-build cost (measured ~4 ms typical, ~12 ms worst on the previous full-DST iteration); subsequent calls hit the cache. Signed-off-by: Chong Gao <chongg@nvidia.com> --------- Signed-off-by: Chong Gao <chongg@nvidia.com> Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… APIs (#4635) ## Context Part 2 of the split of #4432 (full ORC DST GPU support). Part 1 (#4539, runtime build of `OrcTimezoneInfo`) is merged. This PR adds the Java-side machinery to **extract** a recurring DST rule for a zone, but does **not** yet wire it into the GPU dispatch path — production behavior is unchanged. ## What this adds New file `OrcDstRuleExtractor.java` (package-private `final class`), which hosts: - `static final class DstRule` — SimpleTimeZone-shaped DST descriptor (raw offset, dst savings, start/end month / day / dayOfWeek / time / mode), 0-based month and Calendar-style 1=Sun..7=Sat to match the GPU consumer. - `extractDstRule(timezoneId, tz, rules)` — entry point. Returns `null` for non-DST zones (`rules.isFixedOffset()` or `!tz.useDaylightTime()`). Fail-fast sanity check that `tz` and `rules` describe the same zone (guards against the silent-GMT trap where `TimeZone.getTimeZone(id)` returns GMT for offset-style ids). - **Path B (tried first):** `extractDstRuleByProbing` — probe `tz.getOffset()` hourly across a reference year, find the two transitions, encode them in SimpleTimeZone form. Captures what `java.util.TimeZone.getOffset` actually returns, which is the source of truth the GPU side must match. - **Path A (fallback):** `extractDstRuleFromZoneRules` — derive from `ZoneRules.getTransitionRules()`. Used for zones whose recurring rule cannot be recovered by probing alone. - Cross-year verification: the extracted rule is re-evaluated against `tz.getOffset` across anchor years **2060, 2400, 9997** to catch divergence well into the recurring-rule fallback regime. In `OrcTimezoneInfo.java`: - Widen `utcMillisForDate` and `binarySearchTransition` from `private` to package-private so `OrcDstRuleExtractor` can reuse the same UTC anchor and bracketed binary search. No behavioral change. ## What this does **not** change - `GpuTimeZoneDB.convertOrcTimezones` — DST gate still throws `UnsupportedOperationException` for DST zones. - `buildRuntimeOrcTimezoneInfo` — does not yet populate any `DstRule`. - `OrcTimezoneInfo`'s public constructor / fields (the two widened methods are package-private, not public). - JNI, CUDA. `extractDstRule` is package-private and exercised only by `OrcTimezoneInfoTest`. ## Verification Extracted rules cross-checked against tzdata; the test suite is green on **both JDK 8 (the build's `maven.compiler.target`) and JDK 17 Zulu**: | Zone | start | end | dstSavings | |------|-------|-----|------------| | America/New_York | 2nd Sun of March, 02:00 standard | 1st Sun of November, 01:00 standard | +1h | | Europe/London | last Sun of March, 01:00 standard | last Sun of October, 01:00 standard | +1h | | Australia/Sydney | 1st Sun of October, 02:00 standard | 1st Sun of April, 02:00 standard | +1h | | Asia/Shanghai, UTC, +05:30 | `null` | `null` | — | Real-zone tests assert the exact (`month`, `day`, `dayOfWeek`, `time`, `mode`) shape for each zone. Synthetic-zone tests additionally cover both extraction paths (probing success, ZoneRules fallback), every `IllegalStateException` branch (unsupported rule count / shape, zero-/both-positive-/both-negative-delta, mismatched savings, negative day indicator, cross-year verification failure, tz/rules zone mismatch), and edge cases (`isMidnightEndOfDay`, `TimeDefinition.WALL`, February day-of-month clamp, multiple-DST-on probing guard). 26 tests total. ## Diff size `+1314 / -2` across 3 files: - `OrcDstRuleExtractor.java` `+529` (new file) - `OrcTimezoneInfo.java` `+6 / -2` (visibility widening) - `OrcTimezoneInfoTest.java` `+779` No changes outside this slice. ## Follow-up parts - **Part 3**: CUDA civil-calendar utilities + Java↔CUDA `DstRule` serialization scaffolding (still all dead-code). - **Part 4**: CUDA DST kernel implementation + kernel-level C++ tests. - **Part 5**: Remove the DST gate in `convertOrcTimezones`, dispatch DST zones through the new kernel, add end-to-end Java tests. This is the only follow-up part that changes runtime behavior. --------- Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com>
Contributes to NVIDIA/cudf-spark#13437.
Description
Corresponding Spark-Rapids PR
Previously
convertOrcTimezonesonly handled non-DST (fixed-offset) timezones by looking up a pre-built transition table shipped inorc_timezone_info.data. DST timezones such asAmerica/Los_AngelesandAmerica/New_Yorkwere not fully covered.This PR adds DST timezone support and optimizes the ORC timezone conversion path:
DST timezone support (main feature)
orc_timezone_info.datafile with runtime-generatedOrcTimezoneInfobuilt from JVMTimeZone/ZoneRulesAPIs, covering all JVM-known timezones.DstRuleextraction by probingTimeZone.getOffset()to compute DST transition parameters (start/end month, day, day-of-week, time, mode) compatible withjava.util.SimpleTimeZonesemantics.compute_dst_offset) that evaluates DST rules per-row using the extracted rule parameters, eliminating the need for a pre-built transition table for current-era timestamps.CUDA kernel optimizations
millis_to_fieldswith a branchless civil calendar algorithm (era-based) for O(1) date decomposition.DAYS_BEFORE_MONTH[]lookup table for O(1) month-day calculation, replacing the loop over months.int64_talignment padding between writer and reader sections.JNI API improvements
OrcTimezoneContextto pre-build writer/reader transition tables and DST rules once per table, avoiding redundant GPU allocations when converting multiple timestamp columns.ColumnViewinstead ofColumnVectorinconvertOrcTimezonessince onlygetNativeView()is used.OrcTimezoneInfo optimizations
getOffset()calls).TimeZoneobjects instead of redundantTimeZone.getTimeZone()calls.java.util.Calendarwithjava.time.Instant/LocalDateTimefor lighter date decomposition.Tests
GpuTimeZoneDBTest.javacovering DST rule extraction, historical transitions, fixed-offset timezones, and cross-timezone ORC conversion.Performance
1B rows, 4.1 GB ORC data (random timestamps 2000-2033), single GPU:
The custom timezone kernel contributes <600ms overhead for 1B rows of cross-timezone conversion.
How to run perf test: see companion PR NVIDIA/cudf-spark#14544.
Correctness test
Refer to the suite
OrcTimezoneSuitein NVIDIA/cudf-spark#14544Checklists
GpuTimeZoneDBTest.javacovers DST rule extraction, historical transitions, fixed-offset timezones, and cross-timezone ORC conversion.Signed-off-by: Chong Gao chongg@nvidia.com