Skip to content

ORC tz split 6/6: GpuTimeZoneDB tests for DST and ORC conversion#4550

Closed
res-life wants to merge 1 commit into
orc-tz-5-cuda-dst-gpufrom
orc-tz-6-tests
Closed

ORC tz split 6/6: GpuTimeZoneDB tests for DST and ORC conversion#4550
res-life wants to merge 1 commit into
orc-tz-5-cuda-dst-gpufrom
orc-tz-6-tests

Conversation

@res-life

Copy link
Copy Markdown
Collaborator

Final part of the split of #4432.
Companion: NVIDIA/cudf-spark#14544
Previous: #4549

Adds GpuTimeZoneDBTest coverage for:

  • DST rule extraction across DST and non-DST zones, validating startMonth / startDay / startDayOfWeek / endMonth / time-mode / rule-mode values.
  • Probe-based extraction fallback for fixed-offset zones.
  • Historical transition handling for America/Los_Angeles, America/New_York, Europe/London, Asia/Tokyo, etc.
  • buildHistoricalTransitions edge cases.
  • End-to-end convertOrcTimezones across cross-tz and same-tz cases using the OrcTimezoneContext lifecycle.

After this PR the working tree matches #4432 head; the cumulative diff against main is identical to #4432 (7 files, +1755/-416).

Signed-off-by: Chong Gao chongg@nvidia.com

Final part of the split of #4432.

Adds GpuTimeZoneDBTest coverage for:

- DST rule extraction across DST and non-DST zones, validating
  startMonth / startDay / startDayOfWeek / endMonth / time-mode /
  rule-mode values.
- Probe-based extraction fallback for fixed-offset zones.
- Historical transition handling for America/Los_Angeles, America/New_York,
  Europe/London, Asia/Tokyo, etc.
- buildHistoricalTransitions edge cases.
- End-to-end convertOrcTimezones across cross-tz and same-tz cases
  using OrcTimezoneContext lifecycle.

After this PR the working tree matches PR #4432 head; the cumulative
diff against main is identical to #4432.

Signed-off-by: Chong Gao <chongg@nvidia.com>
@res-life res-life force-pushed the orc-tz-5-cuda-dst-gpu branch from bb754db to 2579702 Compare May 19, 2026 08:33
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds comprehensive test coverage to GpuTimeZoneDBTest for DST-aware ORC timezone conversion, completing the final part of the ORC timezone split series. It also fixes the CPU oracle to correctly handle fixed-offset zone IDs (e.g. +05:30) that java.util.TimeZone silently maps to GMT, by routing them through ZoneRules via a new OffsetLookup interface.

  • DST unlock: removes the isDST guards that previously skipped DST zones in testConvertOrcTimezones, and adds targeted tests for historical transitions, far-future DST fallback (year 9999, Asia/Gaza, America/Santiago), and fixed-offset "runtime synthesis" zones not present in the pre-built database.
  • New oracle design: OffsetLookup.of() correctly resolves fixed-offset IDs through GpuTimeZoneDB.getZoneId before falling back to TimeZone::getOffset, preventing the oracle from sharing the same blind spot as the bug it validates.
  • Opt-in exhaustive test: validates every 30-minute mark from year 0 to 9999 across four zones but requires -Dspark.rapids.tests.orc.exhaustiveHalfHour=true to run.

Confidence Score: 4/5

Safe to merge — the change is test-only and the new oracle correctly avoids sharing the fixed-offset blind spot it was designed to catch.

The oracle design is sound: fixed-offset IDs go through ZoneRules rather than TimeZone to avoid the bug being validated, while named DST zones correctly delegate to TimeZone::getOffset. The only concerns are non-blocking: assertFalse used in an inverted polarity (should be assertTrue), and the exhaustive test starts from ISO year 0 while the documented GPU-supported range opens at year 1.

No files require special attention. The single changed file is a test; review focus should be on the OffsetLookup fallback logic and the year-0 start in the exhaustive test.

Important Files Changed

Filename Overview
src/test/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDBTest.java Adds comprehensive DST and ORC timezone conversion tests; two minor style issues (inverted assertFalse usage, exhaustive-test start year mismatch with documented range).

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Test
    participant OffsetLookup
    participant GpuTimeZoneDB
    participant ZoneRules
    participant TimeZone
    participant GPU as GPU (convertOrcTimezones)

    Test->>GpuTimeZoneDB: cacheDatabase()
    GpuTimeZoneDB-->>Test: OK

    Test->>OffsetLookup: of(tzId)
    OffsetLookup->>GpuTimeZoneDB: getZoneId(tzId)
    GpuTimeZoneDB-->>OffsetLookup: ZoneId
    OffsetLookup->>ZoneRules: isFixedOffset()?
    alt Fixed offset zone (e.g. +05:30)
        ZoneRules-->>OffsetLookup: true → constant offsetMs
        OffsetLookup-->>Test: "lambda ms -> offsetMs"
    else Named DST zone (e.g. America/Los_Angeles)
        ZoneRules-->>OffsetLookup: false
        OffsetLookup->>TimeZone: getTimeZone(tzId)
        TimeZone-->>OffsetLookup: TimeZone object
        OffsetLookup-->>Test: tz::getOffset (DST-aware)
    end

    Test->>Test: convertOrcTimezonesOnCPU(micros, writer, reader)
    Note over Test: CPU oracle computes expected result

    Test->>GPU: convertOrcTimezones(input, 0L, writerTz, readerTz)
    GPU-->>Test: ColumnVector actual

    Test->>Test: assertColumnsAreEqual(expected, actual)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Test
    participant OffsetLookup
    participant GpuTimeZoneDB
    participant ZoneRules
    participant TimeZone
    participant GPU as GPU (convertOrcTimezones)

    Test->>GpuTimeZoneDB: cacheDatabase()
    GpuTimeZoneDB-->>Test: OK

    Test->>OffsetLookup: of(tzId)
    OffsetLookup->>GpuTimeZoneDB: getZoneId(tzId)
    GpuTimeZoneDB-->>OffsetLookup: ZoneId
    OffsetLookup->>ZoneRules: isFixedOffset()?
    alt Fixed offset zone (e.g. +05:30)
        ZoneRules-->>OffsetLookup: true → constant offsetMs
        OffsetLookup-->>Test: "lambda ms -> offsetMs"
    else Named DST zone (e.g. America/Los_Angeles)
        ZoneRules-->>OffsetLookup: false
        OffsetLookup->>TimeZone: getTimeZone(tzId)
        TimeZone-->>OffsetLookup: TimeZone object
        OffsetLookup-->>Test: tz::getOffset (DST-aware)
    end

    Test->>Test: convertOrcTimezonesOnCPU(micros, writer, reader)
    Note over Test: CPU oracle computes expected result

    Test->>GPU: convertOrcTimezones(input, 0L, writerTz, readerTz)
    GPU-->>Test: ColumnVector actual

    Test->>Test: assertColumnsAreEqual(expected, actual)
Loading

Reviews (1): Last reviewed commit: "ORC tz split 6/6: GpuTimeZoneDB tests fo..." | Re-trigger Greptile

Comment on lines +368 to +370
assertFalse(gazaSamples.length == 0, "precondition: expected future transitions for Asia/Gaza");
assertFalse(santiagoSamples.length == 0,
"precondition: expected future transitions for America/Santiago");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inverted assertion polarity — confusing failure messages

assertFalse(x == 0, msg) reads as "assert that 'x equals zero' is false", which is harder to parse than the positive form and produces a less informative failure message. The same pattern appears at line 462 (assertFalse(batchSize <= 0, ...)). Prefer assertTrue(gazaSamples.length > 0, ...) and assertTrue(batchSize > 0, ...) so both the source and the failure output are immediately clear.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

};

long stepMicros = TimeUnit.MINUTES.toMicros(30);
long startMicros = microsUtc(LocalDateTime.of(0, 1, 1, 0, 0));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Exhaustive test starts from year 0 (1 BCE) while the documented GPU range begins at year 1 CE

LocalDateTime.of(0, 1, 1, 0, 0) is ISO year 0, which is 1 BCE in the proleptic Gregorian calendar. The Javadoc for this test says "year 0000 through 9999", but the main testConvertOrcTimezones comment explicitly bounds the supported range to (0001-01-01, 9999-12-31). Timestamps before year 1 may be outside the GPU implementation's tested boundary; if the GPU path does not handle them, only opt-in runs will surface the discrepancy. Consider aligning this start with LocalDateTime.of(1, 1, 1, 0, 0) or adding a comment documenting why year 0 is intentional.

@res-life

Copy link
Copy Markdown
Collaborator Author

useless now.

@res-life res-life closed this Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants