ORC tz split 3/6: OrcTimezoneInfo runtime build and registry #4547
res-life wants to merge 1 commit into
Part of the split of #4432.

Completes OrcTimezoneInfo by adding the runtime build pipeline and the remaining DST math helpers:

- verifyDstRule, verifyDstRuleAcrossReferenceYears: lightweight verification (~200 getOffset() calls instead of ~52K).
- computeDstOffset, computeTransitionUtcMillis, computeRuleDay, utcMillisForDate: SimpleTimeZone-compatible offset math.
- toString.
- RUNTIME_TIMEZONE_INFOS registry, get(timezoneId), buildRuntimeOrcTimezoneInfo, getAllTimezoneIds.
- getInitialOffset, buildHistoricalTransitions, collectTimeZoneTransitionsByScanning, toLongArray / toIntArray, and the HistoricalTransitions value class.

After this PR, OrcTimezoneInfo.java matches the version in #4432. Callers introduced in orc-tz-1-jni-plumbing now resolve.

Signed-off-by: Chong Gao <chongg@nvidia.com>
    public static List<String> getAllTimezoneIds() {

      String[] ids = TimeZone.getAvailableIDs();
      Arrays.sort(ids);
      return Arrays.asList(ids);
getAllTimezoneIds() returns IDs that get() cannot handle
TimeZone.getAvailableIDs() includes POSIX/legacy aliases such as "EST5EDT", "PST8PDT", "SystemV/CST6CDT", etc. that are not recognized by ZoneId.of(id, ZoneId.SHORT_IDS) — the mechanism used internally by GpuTimeZoneDB.getZoneId(). When any caller (including GpuTimeZoneDB.getOrcSupportedTimezones()) iterates the result and calls OrcTimezoneInfo.get() for each entry, those legacy IDs will throw IllegalArgumentException. Replacing TimeZone.getAvailableIDs() with ZoneId.getAvailableZoneIds() produces exactly the set of IDs that ZoneId.of() accepts and is therefore consistent with get().
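The claim above can be checked empirically before changing the method. The following is a minimal sketch, not the PR's code: the class name `TimezoneIdCheck` and its helpers are hypothetical, and it assumes that `GpuTimeZoneDB.getZoneId()` ultimately calls `ZoneId.of(id, ZoneId.SHORT_IDS)`, as the comment states. It lists the IDs from `TimeZone.getAvailableIDs()` that this lookup rejects and shows the `ZoneId`-based alternative. The rejected set depends on the JDK and its tzdata version, so no expected output is given.

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;
import java.util.TimeZone;
import java.util.TreeSet;

// Hypothetical helper class for illustration; not part of the PR.
final class TimezoneIdCheck {

  /** Returns true if ZoneId.of(id, ZoneId.SHORT_IDS) accepts the ID. */
  static boolean resolves(String id) {
    try {
      ZoneId.of(id, ZoneId.SHORT_IDS);
      return true;
    } catch (DateTimeException e) {
      // ZoneRulesException (unknown region) is a subclass of DateTimeException.
      return false;
    }
  }

  /** IDs reported by the legacy TimeZone API that the ZoneId lookup rejects. */
  static List<String> unresolvedLegacyIds() {
    List<String> rejected = new ArrayList<>();
    for (String id : TimeZone.getAvailableIDs()) {
      if (!resolves(id)) {
        rejected.add(id);
      }
    }
    return rejected;
  }

  /** Alternative suggested in the review: sorted IDs taken from ZoneId itself. */
  static List<String> sortedZoneIds() {
    return new ArrayList<>(new TreeSet<>(ZoneId.getAvailableZoneIds()));
  }

  public static void main(String[] args) {
    System.out.println("Rejected legacy IDs: " + unresolvedLegacyIds());
    System.out.println("ZoneId-based ID count: " + sortedZoneIds().size());
  }
}
```

If `unresolvedLegacyIds()` comes back empty on the target JDK, the two sources agree for this lookup and the change is a consistency improvement rather than a bug fix.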
    public static List<String> getAllTimezoneIds() {

      String[] ids = TimeZone.getAvailableIDs();
      Arrays.sort(ids);
      return Arrays.asList(ids);
    }
Extra blank line after the opening brace — minor style nit.
Suggested change:

    public static List<String> getAllTimezoneIds() {
      String[] ids = TimeZone.getAvailableIDs();
      Arrays.sort(ids);
      return Arrays.asList(ids);
    }
    switch (ruleMode) {
      case 1: {
        if (ruleDay > 0) {
          int diff = ruleDayOfWeek - firstDayOfWeek;
          if (diff < 0) {
            diff += 7;
          }
          return 1 + diff + (ruleDay - 1) * 7;
        } else {
          int lastDayOfWeek = toCalendarDayOfWeek(
              LocalDate.of(year, month + 1, monthLength).getDayOfWeek().getValue());
          int diff = lastDayOfWeek - ruleDayOfWeek;
          if (diff < 0) {
            diff += 7;
          }
          return monthLength - diff + (ruleDay + 1) * 7;
        }
      }
      case 2: {
        int targetDayOfWeek = toCalendarDayOfWeek(
            LocalDate.of(year, month + 1, ruleDay).getDayOfWeek().getValue());
        int diff = ruleDayOfWeek - targetDayOfWeek;
        if (diff < 0) {
          diff += 7;
        }
        return ruleDay + diff;
      }
      case 3: {
        int targetDayOfWeek = toCalendarDayOfWeek(
            LocalDate.of(year, month + 1, ruleDay).getDayOfWeek().getValue());
        int diff = targetDayOfWeek - ruleDayOfWeek;
        if (diff < 0) {
          diff += 7;
        }
        return ruleDay - diff;
      }
      default:
        return ruleDay;
    }
computeRuleDay cases 1 and 3 are dead code with potential out-of-bounds results
All DST rules in this class are encoded exclusively as DOW_GE_DOM_MODE (value 2) — both decodeTransition and fillDstRuleFromTransitionRule hard-code DstRuleMode.DOW_GE_DOM_MODE.value — so cases 1 and 3 in computeRuleDay are never reached today. However, if case 1 (DOW_IN_MONTH) were ever invoked with ruleDay = 5 on a short month, 1 + diff + (ruleDay - 1) * 7 can exceed the month length and LocalDate.of(year, month+1, day) downstream would throw DateTimeException. Similarly, case 3 with a small ruleDay (e.g. 1) can yield a non-positive result. A comment noting the dead-code status or an explicit guard on the returned day would prevent silent misbehavior if the switch is extended later.
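To make the out-of-bounds concern concrete, here is a minimal sketch of the kind of guard the comment asks for. The class `RuleDayGuard` and both of its methods are hypothetical names introduced for illustration; only the arithmetic in `nthDayOfWeekInMonth` is copied from the `ruleDay > 0` branch of case 1 above. Day-of-week values are assumed to follow `java.util.Calendar` numbering (Sunday = 1), and `zeroBasedMonth` mirrors the `month + 1` conversion in the reviewed code.

```java
import java.time.DateTimeException;
import java.time.YearMonth;

// Hypothetical helper class for illustration; not part of the PR.
final class RuleDayGuard {

  /** Arithmetic from case 1 (ruleDay > 0) of the reviewed switch, extracted unchanged. */
  static int nthDayOfWeekInMonth(int ruleDay, int ruleDayOfWeek, int firstDayOfWeek) {
    int diff = ruleDayOfWeek - firstDayOfWeek;
    if (diff < 0) {
      diff += 7;
    }
    return 1 + diff + (ruleDay - 1) * 7;
  }

  /** Fails fast when a computed rule day falls outside the given month. */
  static int checkedDayOfMonth(int year, int zeroBasedMonth, int day) {
    int monthLength = YearMonth.of(year, zeroBasedMonth + 1).lengthOfMonth();
    if (day < 1 || day > monthLength) {
      throw new DateTimeException("Computed rule day " + day + " is outside 1.."
          + monthLength + " for " + year + "-" + (zeroBasedMonth + 1));
    }
    return day;
  }

  public static void main(String[] args) {
    // February 2023 starts on a Wednesday (Calendar value 4); the rule asks for
    // the fifth Sunday (Calendar value 1), which that month does not have.
    int day = nthDayOfWeekInMonth(5, 1, 4);
    System.out.println(day); // prints 33
    checkedDayOfMonth(2023, 1, day); // throws DateTimeException
  }
}
```

Wrapping the return values of cases 1 and 3 in a check like this would turn an out-of-range day into an immediate, descriptive failure at the point of computation rather than a later exception from `LocalDate.of`. Whether to throw or clamp is a design choice left to the author.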
useless now.
Part of the split of #4432.
Companion: NVIDIA/cudf-spark#14544
Previous: #4546