Skip to content

ORC tz split 2/6: OrcTimezoneInfo DstRule and DST rule extraction#4546

Closed
res-life wants to merge 1 commit into
orc-tz-1-jni-plumbingfrom
orc-tz-2-orctzinfo-dst-extract
Closed

ORC tz split 2/6: OrcTimezoneInfo DstRule and DST rule extraction#4546
res-life wants to merge 1 commit into
orc-tz-1-jni-plumbingfrom
orc-tz-2-orctzinfo-dst-extract

Conversation

@res-life

Copy link
Copy Markdown
Collaborator

Part of the split of #4432.
Companion: NVIDIA/cudf-spark#14544
Previous: #4545

Replaces the snapshot-driven OrcTimezoneInfo with a runtime-built version that derives metadata from java.time.zone.ZoneRules and java.util.TimeZone. This PR adds:

  • License header, java.time / java.util.concurrent imports.
  • Updated OrcTimezoneInfo constructor, initialOffset/rawOffset fields, and the DstRule inner class.
  • DST_RULE_VALIDATION_YEARS / MIN_SUPPORTED_ORC_UTC_MILLIS constants and the historical scan step.
  • extractDstRule entry point, extractDstRuleFromZoneRules, fillDstRuleFromTransitionRule, getTransitionRuleTimeMillis, getTransitionRuleTimeMode, toCalendarDayOfWeek.
  • extractDstRuleByProbing (probing fallback) and the binarySearchTransition / decodeTransition helpers it relies on.

The rest of the rewrite (verifyDstRule, computeDstOffset, transition math, toString, runtime registry, HistoricalTransitions) lands in #orc-tz-3. This intermediate state intentionally truncates the class after decodeTransition.

Signed-off-by: Chong Gao chongg@nvidia.com

Part of the split of #4432.

Replaces the snapshot-driven OrcTimezoneInfo with a runtime-built
version that derives metadata from java.time.zone.ZoneRules and
java.util.TimeZone. This PR adds:

- License header, java.time / java.util.concurrent imports.
- Updated OrcTimezoneInfo constructor, initialOffset/rawOffset fields,
  and the DstRule inner class.
- DST_RULE_VALIDATION_YEARS / MIN_SUPPORTED_ORC_UTC_MILLIS constants and
  the historical scan step.
- extractDstRule entry point, extractDstRuleFromZoneRules,
  fillDstRuleFromTransitionRule, getTransitionRuleTimeMillis,
  getTransitionRuleTimeMode, toCalendarDayOfWeek.
- extractDstRuleByProbing (probing fallback) and the
  binarySearchTransition / decodeTransition helpers it relies on.

The rest of the rewrite (verifyDstRule, computeDstOffset, transition
math, toString, runtime registry, HistoricalTransitions) lands in
orc-tz-3-orctzinfo-runtime-build. This intermediate state intentionally
truncates the class after decodeTransition.

Signed-off-by: Chong Gao <chongg@nvidia.com>
@res-life res-life force-pushed the orc-tz-1-jni-plumbing branch from 91450ec to b38def7 Compare May 19, 2026 08:33
@res-life res-life force-pushed the orc-tz-2-orctzinfo-dst-extract branch from c11c7cd to 0d0a2c4 Compare May 19, 2026 08:33
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR (2/6 of a larger split) rewrites OrcTimezoneInfo from a static, file-backed implementation to one that derives metadata at runtime from java.time.zone.ZoneRules and java.util.TimeZone. It introduces the DstRule inner class, the DstRuleMode/DstTimeMode enums, and the full DST-extraction pipeline (extractDstRule, probing via hourly scanning + binary search, and a ZoneRules fallback).

  • The DstRule struct and the two extraction paths (extractDstRuleByProbing and extractDstRuleFromZoneRules) are logically complete within this file; the probing path correctly converts UTC transition instants to standard local time and encodes them as DOW_GE_DOM_MODE rules.
  • The file intentionally references two methods that are not yet defined — verifyDstRule and utcMillisForDate — which will be added in #orc-tz-3; this means the class does not compile in its current state.
  • Several imports (DateTimeException, ArrayList, ConcurrentHashMap, ConcurrentMap, etc.) are pre-declared for upcoming PRs but are unused here, and HISTORICAL_TRANSITION_SCAN_STEP_MILLIS is defined but never referenced.

Confidence Score: 3/5

The file does not compile as written — two methods called from within this file are missing — so this PR cannot stand alone and should only be merged as part of the coordinated series that adds those methods in #orc-tz-3.

The DST extraction and probing logic that IS present looks algorithmically sound (binary search narrows UTC transition to exact millisecond, standard-local-time conversion is correct, DOW_GE_DOM encoding handles both nth-weekday and last-weekday cases). However, the file explicitly calls verifyDstRule and utcMillisForDate which are undefined — any build against this branch will fail to compile, making the change unverifiable in isolation.

src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java — the only changed file; needs verifyDstRule and utcMillisForDate from #orc-tz-3 before it will compile.

Important Files Changed

Filename Overview
src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java Replaces frozen snapshot-based ORC timezone implementation with a runtime-built version using java.time/java.util APIs; adds DstRule inner class, DstRuleMode/DstTimeMode enums, and DST extraction/probing logic — but intentionally references two undefined methods (verifyDstRule, utcMillisForDate) that prevent compilation until #orc-tz-3 lands.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["extractDstRule(timezoneId, tz, rules)"]
    A --> B{tz.useDaylightTime?}
    B -- No --> C[return null]
    B -- Yes --> D["extractDstRuleByProbing(tz)"]
    D --> E{probe rule != null?}

    subgraph Probing ["extractDstRuleByProbing loop over DST_RULE_VALIDATION_YEARS"]
        F["extractDstRuleByProbing(tz, refYear)"] --> G["Scan hourly: janFirst to nextJanFirst"]
        G --> H{offset changed?}
        H -- Yes --> I["binarySearchTransition: exact UTC ms"]
        I --> J{curOffset > prevOffset?}
        J -- Yes --> K[dstOnTransition = exactMs]
        J -- No --> L[dstOffTransition = exactMs]
        H -- No --> G
        K --> M{both found?}
        L --> M
        M -- Yes --> N["decodeTransition(dstOnTransition, rawOffset)"]
        N --> O["decodeTransition(dstOffTransition, rawOffset)"]
        O --> P["verifyDstRuleAcrossReferenceYears (calls undefined verifyDstRule)"]
        P -- pass --> Q[return DstRule]
        P -- fail --> R[try next refYear]
        M -- No --> R
    end

    D --> F
    E -- Yes --> S[return dstRule]
    E -- No --> T["extractDstRuleFromZoneRules(timezoneId, tz, rules)"]

    subgraph ZoneRules ["extractDstRuleFromZoneRules"]
        U["rules.getTransitionRules()"] --> V{exactly 2 rules?}
        V -- No --> W[null or throw]
        V -- Yes --> X[classify start/end by deltaMillis]
        X --> Y["fillDstRuleFromTransitionRule start"]
        Y --> Z["fillDstRuleFromTransitionRule end"]
        Z --> AA["verifyDstRuleAcrossReferenceYears (calls undefined verifyDstRule)"]
        AA -- pass --> AB[return DstRule]
        AA -- fail --> AC[throw IllegalStateException]
    end

    T --> U
    T --> AD{rule != null?}
    AD -- Yes --> AE[return dstRule]
    AD -- No --> AF[throw IllegalStateException]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["extractDstRule(timezoneId, tz, rules)"]
    A --> B{tz.useDaylightTime?}
    B -- No --> C[return null]
    B -- Yes --> D["extractDstRuleByProbing(tz)"]
    D --> E{probe rule != null?}

    subgraph Probing ["extractDstRuleByProbing loop over DST_RULE_VALIDATION_YEARS"]
        F["extractDstRuleByProbing(tz, refYear)"] --> G["Scan hourly: janFirst to nextJanFirst"]
        G --> H{offset changed?}
        H -- Yes --> I["binarySearchTransition: exact UTC ms"]
        I --> J{curOffset > prevOffset?}
        J -- Yes --> K[dstOnTransition = exactMs]
        J -- No --> L[dstOffTransition = exactMs]
        H -- No --> G
        K --> M{both found?}
        L --> M
        M -- Yes --> N["decodeTransition(dstOnTransition, rawOffset)"]
        N --> O["decodeTransition(dstOffTransition, rawOffset)"]
        O --> P["verifyDstRuleAcrossReferenceYears (calls undefined verifyDstRule)"]
        P -- pass --> Q[return DstRule]
        P -- fail --> R[try next refYear]
        M -- No --> R
    end

    D --> F
    E -- Yes --> S[return dstRule]
    E -- No --> T["extractDstRuleFromZoneRules(timezoneId, tz, rules)"]

    subgraph ZoneRules ["extractDstRuleFromZoneRules"]
        U["rules.getTransitionRules()"] --> V{exactly 2 rules?}
        V -- No --> W[null or throw]
        V -- Yes --> X[classify start/end by deltaMillis]
        X --> Y["fillDstRuleFromTransitionRule start"]
        Y --> Z["fillDstRuleFromTransitionRule end"]
        Z --> AA["verifyDstRuleAcrossReferenceYears (calls undefined verifyDstRule)"]
        AA -- pass --> AB[return DstRule]
        AA -- fail --> AC[throw IllegalStateException]
    end

    T --> U
    T --> AD{rule != null?}
    AD -- Yes --> AE[return dstRule]
    AD -- No --> AF[throw IllegalStateException]
Loading

Reviews (1): Last reviewed commit: "ORC tz split 2/6: OrcTimezoneInfo DstRul..." | Re-trigger Greptile

Comment on lines +353 to 360
private static boolean verifyDstRuleAcrossReferenceYears(TimeZone tz, DstRule rule) {
for (int refYear : DST_RULE_VALIDATION_YEARS) {
if (!verifyDstRule(tz, rule, refYear)) {
return false;
}
}
return true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing methods prevent compilation

verifyDstRuleAcrossReferenceYears calls verifyDstRule(tz, rule, refYear) (line 355), and the static field initializer at line 147 calls utcMillisForDate(1, 0, 1), but neither method is defined anywhere in this file or in any other file in the package. The class will fail to compile as-is, meaning any CI job targeting this branch will fail. The PR description explicitly notes this is intentional ("this intermediate state intentionally truncates the class after decodeTransition"), but it's worth confirming that CI is not expected to pass until #orc-tz-3 lands.

Comment on lines +19 to +31
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneOffsetTransitionRule;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TimeZone;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Numerous forward-declared unused imports

Nine of the eleven new imports are unused in this file: DateTimeException, Instant, LocalDate, ZoneId, ZoneOffsetTransition, ArrayList, Arrays, ConcurrentHashMap, ConcurrentMap. The PR description calls this out ("java.time / java.util.concurrent imports"), so these are intentional pre-declarations for #orc-tz-3. Just worth noting that javac -Xlint:all or a strict checkstyle configuration will flag these, which may also break CI depending on build settings.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +389 to +394
private static int[] decodeTransition(long utcMs, int rawOffsetMs) {
// Convert UTC ms to standard local time
long localMs = utcMs + rawOffsetMs;
java.time.Instant instant = java.time.Instant.ofEpochMilli(localMs);
java.time.LocalDateTime ldt = java.time.LocalDateTime.ofInstant(
instant, java.time.ZoneOffset.UTC);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fully-qualified java.time names inside decodeTransition

java.time.Instant, java.time.LocalDateTime, and java.time.ZoneOffset are all written with their full package prefix here, while the rest of the file uses short imported names. Instant is already imported (albeit unused elsewhere right now), and LocalDateTime/ZoneOffset should be added to the import block alongside the other java.time.* types. Using fully-qualified names in a method body is unusual Java style and makes the code harder to read at a glance.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

// calendars agree on the offset at this instant. Kept as a single anchor so
// the GPU side matches whatever TimeZone.getOffset returns here.
private static final long MIN_SUPPORTED_ORC_UTC_MILLIS = utcMillisForDate(1, 0, 1);
private static final long HISTORICAL_TRANSITION_SCAN_STEP_MILLIS = 24L * 3600_000L;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 HISTORICAL_TRANSITION_SCAN_STEP_MILLIS declared but never used

This constant is defined here but never referenced in any method of this file. If it belongs to the HistoricalTransitions logic described as landing in #orc-tz-3, it might be cleaner to define it closer to where it is used rather than leaving it as dead code in this intermediate state.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@res-life

Copy link
Copy Markdown
Collaborator Author

useless now.

@res-life res-life closed this Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants