Skip to content

[Part 2] [ORC-timezone]: Extract recurring DST rule from JVM timezone APIs#4635

Merged
res-life merged 35 commits into
NVIDIA:mainfrom
res-life:orc-tz-pr2-dst-extract
Jun 5, 2026
Merged

[Part 2] [ORC-timezone]: Extract recurring DST rule from JVM timezone APIs#4635
res-life merged 35 commits into
NVIDIA:mainfrom
res-life:orc-tz-pr2-dst-extract

Conversation

@res-life

@res-life res-life commented May 27, 2026

Copy link
Copy Markdown
Collaborator

Context

Part 2 of the split of #4432 (full ORC DST GPU support). Part 1 (#4539, runtime build of OrcTimezoneInfo) is merged. This PR adds the Java-side machinery to extract a recurring DST rule for a zone, but does not yet wire it into the GPU dispatch path — production behavior is unchanged.

What this adds

New file OrcDstRuleExtractor.java (package-private final class), which hosts:

  • static final class DstRule — SimpleTimeZone-shaped DST descriptor (raw offset, dst savings, start/end month / day / dayOfWeek / time / mode), 0-based month and Calendar-style 1=Sun..7=Sat to match the GPU consumer.
  • extractDstRule(timezoneId, tz, rules) — entry point. Returns null for non-DST zones (rules.isFixedOffset() or !tz.useDaylightTime()). Fail-fast sanity check that tz and rules describe the same zone (guards against the silent-GMT trap where TimeZone.getTimeZone(id) returns GMT for offset-style ids).
  • Path B (tried first): extractDstRuleByProbing — probe tz.getOffset() hourly across a reference year, find the two transitions, encode them in SimpleTimeZone form. Captures what java.util.TimeZone.getOffset actually returns, which is the source of truth the GPU side must match.
  • Path A (fallback): extractDstRuleFromZoneRules — derive from ZoneRules.getTransitionRules(). Used for zones whose recurring rule cannot be recovered by probing alone.
  • Cross-year verification: the extracted rule is re-evaluated against tz.getOffset across anchor years 2060, 2400, 9997 to catch divergence well into the recurring-rule fallback regime.

In OrcTimezoneInfo.java:

  • Widen utcMillisForDate and binarySearchTransition from private to package-private so OrcDstRuleExtractor can reuse the same UTC anchor and bracketed binary search. No behavioral change.

What this does not change

  • GpuTimeZoneDB.convertOrcTimezones — DST gate still throws UnsupportedOperationException for DST zones.
  • buildRuntimeOrcTimezoneInfo — does not yet populate any DstRule.
  • OrcTimezoneInfo's public constructor / fields (the two widened methods are package-private, not public).
  • JNI, CUDA.

extractDstRule is package-private and exercised only by OrcTimezoneInfoTest.

Verification

Extracted rules cross-checked against tzdata; the test suite is green on both JDK 8 (the build's maven.compiler.target) and JDK 17 Zulu:

Zone start end dstSavings
America/New_York 2nd Sun of March, 02:00 standard 1st Sun of November, 01:00 standard +1h
Europe/London last Sun of March, 01:00 standard last Sun of October, 01:00 standard +1h
Australia/Sydney 1st Sun of October, 02:00 standard 1st Sun of April, 02:00 standard +1h
Asia/Shanghai, UTC, +05:30 null null

Real-zone tests assert the exact (month, day, dayOfWeek, time, mode) shape for each zone. Synthetic-zone tests additionally cover both extraction paths (probing success, ZoneRules fallback), every IllegalStateException branch (unsupported rule count / shape, zero-/both-positive-/both-negative-delta, mismatched savings, negative day indicator, cross-year verification failure, tz/rules zone mismatch), and edge cases (isMidnightEndOfDay, TimeDefinition.WALL, February day-of-month clamp, multiple-DST-on probing guard). 26 tests total.

Diff size

+1314 / -2 across 3 files:

  • OrcDstRuleExtractor.java +529 (new file)
  • OrcTimezoneInfo.java +6 / -2 (visibility widening)
  • OrcTimezoneInfoTest.java +779

No changes outside this slice.

Follow-up parts

  • Part 3: CUDA civil-calendar utilities + Java↔CUDA DstRule serialization scaffolding (still all dead-code).
  • Part 4: CUDA DST kernel implementation + kernel-level C++ tests.
  • Part 5: Remove the DST gate in convertOrcTimezones, dispatch DST zones through the new kernel, add end-to-end Java tests. This is the only follow-up part that changes runtime behavior.

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds the Java-side DST rule extractor (OrcDstRuleExtractor) and corresponding tests as dead-code scaffolding for upcoming ORC DST GPU support. No production behavior changes — the DST gate in convertOrcTimezones continues to throw, and buildRuntimeOrcTimezoneInfo does not yet populate DstRule.

  • Two extraction paths (extractDstRuleByProbing via TimeZone.getOffset hourly probes, extractDstRuleFromZoneRules via ZoneRules.getTransitionRules as fallback) are implemented with cross-year verification against anchor years 2060, 2400, 9997.
  • utcMillisForDate and binarySearchTransition in OrcTimezoneInfo are widened from private to package-private so OrcDstRuleExtractor can share them without duplication.
  • Tests cover real zones (America/New_York, Europe/London, Australia/Sydney, Asia/Shanghai), fixed-offset controls (UTC, +05:30), and ~15 synthetic error-path fixtures for every exception branch.

Confidence Score: 5/5

Safe to merge — all changes are dead code and production behavior is unchanged.

All new code sits behind the unmodified DST gate in convertOrcTimezones and cannot affect runtime behavior. The extractor logic is thoroughly verified by cross-year probing, and the test suite covers every exception branch.

No files require special attention for this merge.

Important Files Changed

Filename Overview
src/main/java/com/nvidia/spark/rapids/jni/OrcDstRuleExtractor.java New 529-line class implementing dual-path DST rule extraction with cross-year verification. Logic is thorough; one latent edge case in computeRuleDay's MODE_DOM default branch (no bounds clamping unlike the three sibling branches) is flagged as a forward-compatibility concern.
src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java Minimal, correct visibility widening of utcMillisForDate and binarySearchTransition from private to package-private; no behavioral change.
src/test/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfoTest.java Adds ~14 focused tests covering real zones, fixed-offset controls, and every exception branch. testExtractDstRuleFebruaryClampDoesNotThrow now mirrors production's computeRuleDay logic exactly in the synthetic TZ.

Reviews (13): Last reviewed commit: "Fix synthetic ZoneRules fixtures failing..." | Re-trigger Greptile

Comment thread src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java Outdated
Comment thread src/test/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfoTest.java Outdated
Comment thread src/main/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfo.java Outdated
Chong Gao added 6 commits June 3, 2026 10:23
Adds a static DST extraction path to OrcTimezoneInfo that produces a
SimpleTimeZone-shaped DstRule (month / day / dayOfWeek / time / mode,
plus DST savings) for any IANA zone with active DST. The extraction
tries two paths and verifies the result against tz.getOffset() across
the anchor years 2060, 2400, 9997:

- Path A: derive from ZoneRules.getTransitionRules() (recurring
  rule list returned for modern IANA zones)
- Path B: probe tz.getOffset() hourly across a reference year, then
  decode the two transitions into the SimpleTimeZone shape

The probing path runs first because it captures what
java.util.TimeZone.getOffset returns, which is the source of truth
the GPU side must match for ORC byte-compatibility. ZoneRules is
the fallback for zones whose recurring rule cannot be recovered
from probes alone.

Verified against tzdata on JDK 17:
- America/New_York: 2nd Sun of March -> 1st Sun of November
- Europe/London:    last Sun of March -> last Sun of October
- Australia/Sydney: 1st Sun of October -> 1st Sun of April
- Asia/Shanghai, UTC, +05:30: null (no DST)

The DST rule is NOT wired into buildRuntimeOrcTimezoneInfo or the
GPU dispatch path yet -- the DST gate in
GpuTimeZoneDB.convertOrcTimezones still throws
UnsupportedOperationException for DST zones, so production behavior
is unchanged. extractDstRule is package-private and exercised only
by OrcTimezoneInfoTest. The follow-up PR will hand DstRule to the
new CUDA kernel.

Signed-off-by: Chong Gao <res_life@163.com>
- Document IllegalStateException on extractDstRule with the specific
  failure modes (unsupported rule count, non-DOW_GE_DOM shape,
  mismatched savings, zero-delta, verification mismatch).
- Guard extractDstRule with rules.isFixedOffset() so the contract is
  robust against the silent +05:30 -> GMT trap in
  TimeZone.getTimeZone; mirrors the production guard already used in
  buildRuntimeOrcTimezoneInfo.
- Clamp computeRuleDay mode 2 (DOW_GE_DOM) anchor and result against
  monthLength. Defends against ZoneOffsetTransitionRule indicators
  that refer to a day past month end (e.g. Feb 29 in non-leap years
  per the JDK contract), so an unexpected rule shape returns a clean
  in-month day instead of leaking DateTimeException out of
  verifyDstRule. Mirrors SimpleTimeZone's documented within-month
  clamp for DOW_GE_DOM. No behaviour change for current IANA zones.
- Lock startTime / endTime / startTimeMode / endTimeMode for
  America/New_York in testExtractDstRuleNorthernHemisphere so a
  regression that flips timeMode (e.g. STANDARD -> WALL) is caught.
- Mirror the production fixed-offset guard in the test helper
  extractDstRuleFor and document the +05:30 -> GMT trap.

Signed-off-by: Chong Gao <res_life@163.com>
Replace the magic numbers in DstRule mode/timeMode assignments and
the consuming switch/if-else chains with named static-final ints on
DstRule itself:

  MODE_DOM, MODE_DOW_IN_MONTH, MODE_DOW_GE_DOM, MODE_DOW_LE_DOM
  TIME_MODE_WALL, TIME_MODE_STANDARD, TIME_MODE_UTC

Mirrors SimpleTimeZone's internal encoding, so the GPU side
consuming these values directly in a follow-up part can re-use the
same numeric meaning. Pure rename / constant substitution -- no
behaviour change.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRuleByProbing(tz, refYear) walked the entire reference
year at hourly step (~8760 tz.getOffset calls per try) even after
both DST transitions had been recorded. Capture the year-start
standard offset, and once both transitions are observed and the
running offset returns to it, exit the loop. The remainder of the
year is a no-op tail for any SimpleTimeZone-style two-transition
zone -- a later offset change would have correctly triggered the
existing "more than one DST-on/off" early-return.

Cuts ~700 hourly probes per probing attempt for typical
northern-hemisphere zones (DST ends in October/November); larger
savings for zones whose DST ends earlier.

Signed-off-by: Chong Gao <res_life@163.com>
computeDstOffset used utcMs + rawOffsetMs to pick the year, which
returns the standard-time local epoch day. For a southern-hemisphere
zone with DST active in late Dec / early Jan (e.g. Australia/Sydney
DST starts in October and ends in April), an instant whose actual
wall-clock year is Y can land in year Y-1 when only rawOffsetMs is
added. computeTransitionUtcMillis would then resolve DST start/end
against Y-1's transitions, and the cross-year branch below would
compare against the wrong window.

Adding dstSavings to the local guess lands utcMs inside DST when
DST is active, which is the regime where the year-boundary mis-
classification matters. Non-DST instants stay in the same year too,
since DST savings are hours not days.

Existing tests (testExtractDstRuleSouthernHemisphere) continue to
pass; verifyDstRule's anchor-year probes were already exercising
Dec/Jan boundary cases that now resolve correctly.

Signed-off-by: Chong Gao <res_life@163.com>
OrcTimezoneInfo had grown to ~700 lines and mixed two unrelated
responsibilities: building the historical-transition table consumed
by ORC rebasing, and recovering a recurring DST rule for the GPU
DST path. The DST code shared no read/write state with the
historical-transition code, only a couple of small helpers.

Lift DstRule, extractDstRule, and all DST helpers
(extractDstRuleByProbing, extractDstRuleFromZoneRules,
fillDstRuleFromTransitionRule, getTransitionRuleTime*,
toCalendarDayOfWeek, decodeTransition, verifyDstRule*,
computeDstOffset, computeTransitionUtcMillis, computeRuleDay) plus
the DST_RULE_VALIDATION_YEARS constant into a sibling
package-private OrcDstRuleExtractor class.

Shared helpers utcMillisForDate and binarySearchTransition stay in
OrcTimezoneInfo but are promoted from private to package-private so
OrcDstRuleExtractor can reuse them.

OrcTimezoneInfoTest is updated to reference OrcDstRuleExtractor
instead of OrcTimezoneInfo for DstRule and extractDstRule. The
runtime behaviour for all tested zones is unchanged
(America/New_York, Europe/London, Australia/Sydney extract the same
rule; Asia/Shanghai, UTC, +05:30 still return null).

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life force-pushed the orc-tz-pr2-dst-extract branch from e3a0b10 to 90c8863 Compare June 3, 2026 02:24
Chong Gao added 21 commits June 3, 2026 10:25
computeRuleDay's MODE_DOW_GE_DOM branch already clamps both the
LocalDate.of anchor and the returned day into [1, monthLength].
The MODE_DOW_IN_MONTH and MODE_DOW_LE_DOM branches did not, even
though they consume the same untrusted day-of-month indicator
from ZoneOffsetTransitionRule:

- MODE_DOW_IN_MONTH positive: "1 + diff + (ruleDay - 1) * 7" can
  reach 29-35 for a 5th occurrence in a 28-day February.
- MODE_DOW_IN_MONTH negative: "monthLength - diff + (ruleDay + 1) * 7"
  drops below 1 for sufficiently negative ruleDay.
- MODE_DOW_LE_DOM: LocalDate.of(... ruleDay) throws when ruleDay
  exceeds monthLength, and "ruleDay - diff" drops below 1 when
  ruleDay < diff.

The bad values would then escape as DateTimeException from
utcMillisForDate. No real IANA recurring rule triggers these
paths today (both extraction paths emit only DOW_GE_DOM), but the
constants are package-visible and Part 3+ may exercise them.

Mirror the DOW_GE_DOM treatment in the two sibling branches:
clamp anchor input and clamp the returned day so the result is
always a valid day in the requested month.

Signed-off-by: Chong Gao <res_life@163.com>
The extractDstRuleFor helper's Javadoc still pointed
{@link OrcTimezoneInfo#extractDstRule} after extractDstRule moved
to OrcDstRuleExtractor. Repoint the link to the actual method
and include the parameter signature so the reference resolves
unambiguously.

Signed-off-by: Chong Gao <res_life@163.com>
Per spark-rapids-jni Doxygen convention, package-private API
methods that already document @throws should also document
@param for each parameter and @return for the return value.
extractDstRule had only the @throws clause -- add the missing
tags so the contract is complete at the API boundary.

Signed-off-by: Chong Gao <res_life@163.com>
testExtractDstRuleEuropeLondon and testExtractDstRuleSouthernHemisphere
asserted only a subset of the rule fields (dstSavings, months, day,
dayOfWeek). Mode, timeMode, and time-of-day were not pinned, so a
regression that, for example, encoded a DOW_GE_DOM rule as
DOW_IN_MONTH or flipped STANDARD to WALL would pass these tests
silently while producing wrong UTC transitions in the GPU dispatch.

Add the missing assertions for startMode/endMode (DOW_GE_DOM = 2),
startTimeMode/endTimeMode (STANDARD = 1), and startTime/endTime
(London: 01:00; Sydney: 02:00). Values verified empirically against
the probing-path output on JDK 17 Zulu.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRule documents five distinct IllegalStateException
conditions but no test exercised any of them; only the null-return
paths (non-DST, fixed-offset) were covered. A regression that turned
one of the documented throws into a null return would have passed
silently.

Add a test that hand-crafts a TimeZone whose getOffset is constant
(so the probing path observes no transitions and falls through) and
a ZoneRules with exactly one transition rule. Production code
rejects any rule count outside {0, 2}, so extractDstRuleFromZoneRules
throws the "Unsupported ORC DST rule count" IllegalStateException.

Verified empirically on JDK 17 Zulu: the test triggers the expected
branch with the expected message.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRule's terminal IllegalStateException at line 128
("Failed to extract ORC DST rule") -- the survivor path when
useDaylightTime()==true but both probing and ZoneRules extraction
return null -- had no test. The neighboring "Unsupported ORC DST
rule count" branch is covered; this commit adds its companion.

Synthesize a TimeZone whose getOffset is constant (probing
observes no transitions) and a ZoneRules whose lastRules list is
empty but whose transitionList contains one historical entry (so
isFixedOffset() returns false and extractDstRule does not
short-circuit at the early guard). Path B returns null because of
the constant offset; Path A returns null because the recurring
rules list is empty. extractDstRule falls through to the terminal
throw. Verified empirically.

Signed-off-by: Chong Gao <res_life@163.com>
verifyDstRule's inner sample loop called computeDstOffset on every
probe (~62 calls per refYear × 3 anchor refYears × up to 3 probing
attempts = ~558 calls per cold zone). Each call re-derived the
wall-clock year from the timestamp and re-ran two
computeTransitionUtcMillis calls -- both of which allocate a
LocalDate and query lengthOfMonth / getDayOfWeek. The surrounding
loop already computed the year's dstStart and dstEnd, so the
re-derivation was redundant.

Split computeDstOffset into a thin entry point (still derives the
year for callers that don't know it) and a bounds-aware variant
computeDstOffsetWithBounds that takes dstStart and dstEnd as
parameters. verifyDstRule now calls the bounds-aware variant in
the hot loop, eliminating ~558 LocalDate allocations per cold zone
on the verification path. Public API surface and observable
behaviour unchanged; all current zone tests still extract the same
DstRule.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRuleFromZoneRules throws IllegalStateException on five
distinct shape violations. Only the "Unsupported rule count" and
final "Failed to extract" branches were covered. Add tests for the
four remaining branches:

- "Unsupported zero-delta ORC DST rule" -- second rule whose
  offsetBefore equals offsetAfter
- "Failed to identify ORC DST start/end rules" -- two rules with
  same-sign delta (both positive) leaving endTransitionRule null
- "Mismatched ORC DST savings" -- start rule +1h, end rule -2h
- "Unsupported ORC DST transition rule shape" -- first rule with
  null dayOfWeek (DOM-style fixed-day, rejected by Path A)

Each test uses the same synthetic constant-offset TimeZone pattern
(extracted into a small helper to avoid five copies of the
anonymous-class boilerplate) and a hand-crafted two-rule ZoneRules
that targets the specific branch. All four verified empirically to
trigger the expected message.

Signed-off-by: Chong Gao <res_life@163.com>
The general-purpose computeDstOffset(utcMs, rawOffsetMs, rule)
introduced when hoisting bounds out of verifyDstRule had zero
callers. Its body was already covered by
computeDstOffsetWithBounds plus a five-line year-derivation
prologue; keeping the wrapper added 20 lines of dead code (which
the project convention treats the same as commented-out code).

Drop the wrapper. The bounds-aware variant remains the sole DST
classification helper. A general-purpose entry point can be
reintroduced in a follow-up part when an actual non-verification
caller appears.

Signed-off-by: Chong Gao <res_life@163.com>
- Import LocalDateTime and Arrays directly instead of using fully
  qualified names (java.time.LocalDateTime.of, java.util.Arrays.asList).
  Consistent with the rest of the file's java.time / java.util
  imports.
- Add testExtractDstRuleThrowsOnNegativeDayIndicator covering the
  getDayOfMonthIndicator() <= 0 sub-case of fillDstRuleFromTransitionRule's
  shape guard. The existing testExtractDstRuleThrowsOnUnsupportedRuleShape
  covers the null-dayOfWeek sub-case; this one covers the negative-indicator
  sub-case (a DOW_LE_DOM rule shape), so a future split of the || guard
  cannot silently drop coverage on either half.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRuleFromZoneRules's success path was untested -- all
real-zone tests (America/New_York, Europe/London, Australia/Sydney)
satisfy the probing path before Path A is ever reached.
Consequently, fillDstRuleFromTransitionRule and the
TIME_MODE_UTC / TIME_MODE_WALL branches of getTransitionRuleTimeMode
were dead code from the test suite's perspective.

testExtractDstRuleViaZoneRulesFallback forces Path B to fail by
constructing a TimeZone that observes a real +2h DST window but
reports getDSTSavings() == +1h. Probing extracts a rule with
dstSavings=+1h, whose verifyDstRule prediction disagrees with the
observed +2h offset -- verify fails, probing returns null, and
extractDstRuleFromZoneRules runs with the synthetic ZoneRules.
That path re-derives dstSavings from the rule's offset deltas
(+2h, correct), verify passes, and the returned DstRule is
asserted to carry dstSavings=+2h (the +1h would have indicated a
Path B win).

The end rule uses TimeDefinition.UTC, exercising
getTransitionRuleTimeMode's TIME_MODE_UTC branch. The extracted
DstRule's endTimeMode is asserted to be 2 (UTC).

Signed-off-by: Chong Gao <res_life@163.com>
A multi-parameter helper that carries a Javadoc block should
spell its parameters out rather than fitting the contract into a
single 118-char line without tags. Expand the comment to use the
project's standard Doxygen-style tags.

Signed-off-by: Chong Gao <res_life@163.com>
OrcDstRuleExtractor mixed two groupings -- "3600_000" (grouped
from the right) and "3_600_000" (grouped from the left, the form
used everywhere in the test file and elsewhere in the project).
Per the project convention ("4+ digit decimal literals: 1'234'567"
translated to Java's underscore separator), thousands grouping is
left-to-right. Normalise the six occurrences in this file.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRuleFromZoneRules's "ZoneRules ORC DST rule
verification failed" throw was the last uncovered exception
branch. The test feeds a constant-offset TimeZone to a valid
2-rule synthetic ZoneRules. Probing finds no transitions, Path A
parses the rules into a DstRule with dstSavings=+1h, but the
constant tz disagrees with the rule's predictions and the verify
step rejects -- triggering the targeted throw. Verified
empirically.

Signed-off-by: Chong Gao <res_life@163.com>
Three of the four switch arms (MODE_DOW_IN_MONTH,
MODE_DOW_LE_DOM, MODE_DOM) are kept for forward compatibility with
a future caller that constructs a DstRule directly (e.g. from a
serialised form). The current Path A / Path B extraction code
always emits MODE_DOW_GE_DOM, so the other three branches have no
test path until that future caller appears. Add a short comment
documenting this intent so the gap is visible at the switch.

Signed-off-by: Chong Gao <res_life@163.com>
getTransitionRuleTimeMode's TIME_MODE_WALL else-branch and the
matching WALL arm of computeTransitionUtcMillis were untested --
all prior synthetic tests used STANDARD or UTC TimeDefinition.

testExtractDstRuleTimeDefinitionWallMode mirrors the existing
Path-A-success test but encodes the start rule with
TimeDefinition.WALL. The custom TimeZone returns +2h inside the
DST window but reports getDSTSavings()==0, so Path B's verify
fails (predicted 0 vs observed +2h) and Path A runs. The
extracted rule's startTimeMode is asserted to be 0 (TIME_MODE_WALL)
and endTimeMode is 1 (TIME_MODE_STANDARD), pinning both branches.

Verified empirically on JDK 17 Zulu.

Signed-off-by: Chong Gao <res_life@163.com>
OrcTimezoneInfoTest used LocalDate.of and LocalDate.ofEpochDay in
nthDayOfWeekUtcMs and the synthetic TimeZone subclasses introduced
by recent commits, but the import was missing -- the file did not
compile. Add the import.

Signed-off-by: Chong Gao <res_life@163.com>
getTransitionRuleTimeMillis selects between 24 * 3600 (when
isMidnightEndOfDay()==true) and getLocalTime().toSecondOfDay() based
on the ZoneOffsetTransitionRule flag. No existing test set the flag
to true, so a mutation swapping the two branches would have shipped
silently.

testExtractDstRuleMidnightEndOfDayRule passes
midnightEndOfDay=true + LocalTime.MIDNIGHT to ZoneOffsetTransitionRule.of
and asserts that the extracted rule's startTime equals 24h (not 0).
The custom TimeZone fires DST at March 2nd-Sunday + 24h and reports
getDSTSavings()==0, forcing Path B verify to fail so Path A runs.

Signed-off-by: Chong Gao <res_life@163.com>
decodeTransition recovers the standard wall clock by computing
utcMs + rawOffsetMs. The arithmetic only works because
OrcTimezoneInfo.binarySearchTransition returns the first ms at
which the new offset applies -- so on a fall-back, the input UTC
ms is already on the post-transition (standard) side and adding
rawOffsetMs gives the right answer.

That contract is not currently spelled out anywhere near
decodeTransition; a future change that shifts the binary-search
return convention by one ms would silently break this. Add the
precondition to the Javadoc.

Signed-off-by: Chong Gao <res_life@163.com>
The file uses underscore-separated literals (3_600_000) everywhere
else seconds-to-milliseconds constants appear. The bare 3600 at
line 221 is inconsistent; add the digit separator.

Signed-off-by: Chong Gao <res_life@163.com>
testExtractDstRuleThrowsOnBothPositiveDeltaRules covers
endTransitionRule == null in the "Failed to identify ORC DST
start/end rules" guard. The symmetric startTransitionRule == null
case (two negative-delta rules) was untested -- a mutation from
the || at line 157 to && would slip through silently.

Add testExtractDstRuleThrowsOnBothNegativeDeltaRules with the
same shape as the positive variant.

Signed-off-by: Chong Gao <res_life@163.com>
Chong Gao added 2 commits June 3, 2026 14:28
Math.min(ruleDay, monthLength) and Math.min(anchorDay + diff,
monthLength) in computeRuleDay's MODE_DOW_GE_DOM branch guard
against utcMillisForDate throwing DateTimeException when a
February rule comes in with dayOfMonthIndicator=29 on a non-leap
year. The code comment cites this exact scenario, but no existing
test supplies ruleDay > monthLength, so the clamps were dead from
the test suite's perspective.

testExtractDstRuleFebruaryClampDoesNotThrow constructs a synthetic
ZoneOffsetTransitionRule for FEBRUARY with indicator=29, forces
Path B verify to fail (custom getDSTSavings()==0), and asserts the
extracted startDay lands in [1, 28] without throwing.

Signed-off-by: Chong Gao <res_life@163.com>
extractDstRule's two-argument signature requires that tz and rules
describe the same zone, but only the Javadoc said so -- a caller
that passed mismatched objects would silently get a verification
mismatch much later, with no signal pointing at the API misuse.

Add an explicit fail-fast guard: compare tz.getRawOffset() against
rules.getStandardOffset(ref) at a recent reference instant. The
epoch is unsafe because some zones (Europe/London) had a different
standard offset in 1970.

Note this sits after the existing isFixedOffset/!useDaylightTime
short-circuit, so the original silent-GMT trap (e.g. tz built from
"+05:30" which returns GMT with useDaylightTime()==false) is still
handled by the early return; the new guard catches the broader
"DST-having tz mismatches DST-having rules" case.

Signed-off-by: Chong Gao <res_life@163.com>
Comment thread src/test/java/com/nvidia/spark/rapids/jni/OrcTimezoneInfoTest.java Outdated
Chong Gao added 2 commits June 3, 2026 15:08
The synthetic TimeZone reported DST starting on the 4th Sunday of
February, but Path A's encoding (dayOfMonthIndicator=29 + SUNDAY +
MODE_DOW_GE_DOM) collapses through computeRuleDay's clamp to the
last day of February (29 leap / 28 non-leap, dropping the day-of-
week constraint when ruleDay >= monthLength). For most years the
two days differ and verifyDstRuleAcrossReferenceYears throws
"verification failed" before the assertion ever runs. The second
assertion was also wrong: Path A stores getDayOfMonthIndicator()
unchanged so rule.startDay is 29, not in [1, 28].

Align the synthetic tz with computeRuleDay's clamped semantics
(report DST starting on the last day of February) and assert
rule.startDay == 29 to confirm Path A's verbatim storage. The
test still proves the clamp prevents DateTimeException because
verify exercises computeRuleDay for the non-leap reference year
9997 with ruleDay=29.

Signed-off-by: Chong Gao <res_life@163.com>
The new sanity-check at extractDstRule lines 128-133 had no test
coverage. Every other IllegalStateException branch in
extractDstRule has a dedicated synthetic-zone test
(testExtractDstRuleThrowsOnUnsupportedRuleCount,
testExtractDstRuleThrowsOnZeroDeltaRule, ...) -- this guard was
the gap.

Pair a constant-zero-offset TimeZone with ZoneRules for
America/New_York so tz.rawOffset (0) differs from
rules.getStandardOffset(ref) (-18_000_000 ms) and the guard fires
before either Path A or Path B runs.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life marked this pull request as draft June 3, 2026 07:24
Chong Gao added 3 commits June 3, 2026 15:36
The previous "last day of February" shortcut was functionally
equivalent for indicator=29 (because computeRuleDay's outer
Math.min(_, monthLength) clamp collapses the day-of-week
constraint when the anchor sits at the month boundary), but it
required a reader to derive that collapse independently to see
the test agrees with production.

Inline the same MODE_DOW_GE_DOM algorithm in the synthetic tz:
anchor = min(29, monthLength), step forward to the first SUNDAY,
clamp to monthLength. This way the synthetic getOffset and the
production computeRuleDay read identically, leaving no room for
misreading the equivalence.

Signed-off-by: Chong Gao <res_life@163.com>
The helper substitutes TimeZone.getTimeZone("UTC") for fixed-offset
zones to dodge the silent-GMT trap. That UTC placeholder has
rawOffset=0, which does *not* match the rules' actual standard
offset and would now fail the recently-added tz-vs-rules sanity
check inside extractDstRule. It is safe only because the
isFixedOffset() short-circuit at the top of extractDstRule fires
before the sanity check.

Spell out that ordering dependency in a comment so a future
reordering of guards in extractDstRule does not silently turn a
benign helper into a source of "describe different zones"
exceptions.

Signed-off-by: Chong Gao <res_life@163.com>
The early-return at the second DST-on transition in
extractDstRuleByProbing was uncovered. Every other Path-A-success
test reaches Path A by failing Path B's cross-year verify, leaving
the "second DST-on transition seen" guard untouched.

Add a synthetic TimeZone that spring-forwards twice per year
(March 2nd Sunday and June 1st Sunday) plus a fall-back at
November 1st Sunday. ZoneRules supplies one historical transition
to escape rules.isFixedOffset() short-circuit but no recurring
rules, so Path A also returns null on the empty getTransitionRules
early-out -- the terminal "Failed to extract" throw is the
expected outcome and confirms probing returned null specifically
due to the multiple-DST-on guard.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life closed this Jun 4, 2026
@res-life res-life reopened this Jun 4, 2026
@res-life

res-life commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

build

12 OrcTimezoneInfoTest cases failed under the maven build (Java 1.8
target) while passing under a JDK 17 compile check. Root cause:
ZoneRules.of(base, base, emptyList, emptyList, recurringRules) --
recurring DST rules but no concrete historical transitions --
reports isFixedOffset() == true on JDK 8, so extractDstRule's
rules.isFixedOffset() guard short-circuits to null before either
extraction path runs. The throws-tests then saw no exception and the
Path-A-success tests saw a null rule. JDK 17 reports false for the
same construction, which is why the regression was invisible to
standalone javac checks during development.

Two fixtures (BothPathsFail, MultipleDstOnTransitions) already
included a historical transition for exactly this reason; the other
12 did not. Add a shared SYNTHETIC_HISTORICAL_TRANSITION constant (an
inert 1900 wall-offset transition that leaves getStandardOffset and
the post-2060 probing windows untouched) and route every synthetic
rules-only ZoneRules.of through it.

Real IANA zones always carry historical transitions, so production is
unaffected. Verified: OrcTimezoneInfoTest 26/26 green on both JDK 8
and JDK 17.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life

res-life commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

build

@res-life res-life marked this pull request as ready for review June 4, 2026 05:20
@res-life res-life requested a review from a team June 4, 2026 05:20

@firestarman firestarman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nits, but good to merge it as-is.

if (diff < 0) diff += 7;
return Math.max(anchorDay - diff, 1);
}
case DstRule.MODE_DOM:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Empty case, can be removed ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it → documents intent — it makes explicit that MODE_DOM is a recognized, handled mode (mapped to identity), rather than something silently swept into default.

// to the variant.
long[] boundaries = {dstStart, dstEnd};
for (long boundary : boundaries) {
long from = boundary - 12 * 3_600_000L;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: better name these literal values.

int month = ldt.getMonthValue() - 1; // 0-based for Calendar compat
int dayOfMonth = ldt.getDayOfMonth();
int dayOfWeek = toCalendarDayOfWeek(ldt.getDayOfWeek().getValue());
int timeInDay = ldt.getHour() * 3_600_000

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Better name these literal values.

@res-life res-life merged commit db5872b into NVIDIA:main Jun 5, 2026
5 checks passed
@res-life res-life deleted the orc-tz-pr2-dst-extract branch June 5, 2026 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants