Move column vector and host memory helpers to the columnar module#15031

Open
gerashegalov wants to merge 2 commits into codex/unshim-stack-02e-shims-35-40 from codex/unshim-stack-02f-columnar-vectors

Conversation

@gerashegalov gerashegalov commented Jun 10, 2026

Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It moves column-vector Java classes and host-memory stream helpers into the columnar helper module. Later caller updates can then depend on the shared columnar module instead of mixed shim-local copies.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in #15025 (Add default common unshim packaging flow) and by the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov changed the title from "codex/unshim stack 02f columnar vectors" to "Move column vector and host memory helpers to the columnar module" on Jun 10, 2026
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02e-shims-35-40 branch from 0c5a1fb to ac60c19 on June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02f-columnar-vectors branch 3 times, most recently from 204a929 to 6fbb0ad on June 10, 2026 21:36
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02e-shims-35-40 branch from ac60c19 to 8288b83 on June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02f-columnar-vectors branch from 6fbb0ad to 03678ac on June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49

greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR moves column-vector Java classes and host-memory stream helpers from sql-plugin into the sql-plugin-columnar shared module as part of the unshim stack refactoring. Several Scala classes are also translated to Java in the process.

  • Fourteen Java files are relocated (or created as Java translations of Scala originals) into sql-plugin-columnar, including GpuCompressedColumnVector, HostMemoryInputStream/OutputStream, iterator helpers, and all host-column-vector classes.
  • The from(CompressedTable) convenience overload on GpuCompressedColumnVector is removed; all five affected call sites (verified via codebase search) are updated to use the explicit from(buffer, meta) two-arg form.
  • HostMemoryInputStream is now a standalone Java class and no longer extends the Scala HostMemoryInputStreamMixIn trait; downstream callers will be updated in subsequent stack layers as described in the PR.
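The call-site change described in the second bullet can be sketched as follows. This is a minimal illustration, not the plugin's code: `MetaStub`, `CompressedTableStub`, and `CompressedVectorStub` are hypothetical stand-ins for the real metadata, `CompressedTable`, and `GpuCompressedColumnVector` types, and a heap `ByteBuffer` stands in for the device buffer. Only the shape of the change is shown: the caller unpacks the buffer and metadata itself and passes both to the two-argument factory.

```java
import java.nio.ByteBuffer;

class CallSiteMigration {
    // Hypothetical stand-in for the table metadata type.
    static final class MetaStub {
        final int numRows;
        MetaStub(int numRows) { this.numRows = numRows; }
    }

    // Hypothetical stand-in for CompressedTable: a buffer plus its metadata.
    static final class CompressedTableStub {
        final ByteBuffer buffer;
        final MetaStub meta;
        CompressedTableStub(ByteBuffer buffer, MetaStub meta) {
            this.buffer = buffer;
            this.meta = meta;
        }
    }

    // Hypothetical stand-in for the vector class after the change:
    // only the explicit two-argument factory remains.
    static final class CompressedVectorStub {
        final ByteBuffer buffer;
        final MetaStub meta;
        private CompressedVectorStub(ByteBuffer buffer, MetaStub meta) {
            this.buffer = buffer;
            this.meta = meta;
        }
        static CompressedVectorStub from(ByteBuffer buffer, MetaStub meta) {
            return new CompressedVectorStub(buffer, meta);
        }
    }

    static CompressedVectorStub wrap(CompressedTableStub ct) {
        // Before (overload removed): CompressedVectorStub.from(ct)
        // After: the caller passes both parts explicitly.
        return CompressedVectorStub.from(ct.buffer, ct.meta);
    }

    public static void main(String[] args) {
        CompressedTableStub ct =
            new CompressedTableStub(ByteBuffer.allocate(16), new MetaStub(4));
        CompressedVectorStub v = wrap(ct);
        System.out.println(v.meta.numRows + " rows, " + v.buffer.capacity() + " bytes");
    }
}
```

With this shape the vector class no longer needs to know about the wrapper type at all; each caller does the unpacking.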

Confidence Score: 4/5

Safe to merge; this is a pure structural relocation with faithful Scala-to-Java translations and complete call-site updates for the removed convenience overload.

All Scala-to-Java translations match their originals exactly, including the pre-existing mark <= 0 boundary in HostMemoryInputStreamMixIn. The only new finding is a redundant limit() override in HostByteBufferIterator. The removal of from(CompressedTable) is complete — the two callers not touched by this diff (SpillFramework.scala, ShuffleBufferCatalog.scala) already used the 2-arg form. HostMemoryInputStream intentionally drops the HostMemoryInputStreamMixIn relationship as part of the stated migration plan.

No files require special attention beyond the minor redundant override in HostByteBufferIterator.java.
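To illustrate what a redundant `limit()` override looks like, here is a minimal sketch of a chunked buffer iterator. It assumes a base class whose `limit()` defaults to `Integer.MAX_VALUE`; every class and method name below is hypothetical, and a heap array stands in for host memory. The actual classes in the PR may be structured differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ChunkDemo {
    // Hypothetical base class: walks totalLength() bytes in chunks of at most limit().
    abstract static class ChunkedBufferIterator implements Iterator<ByteBuffer> {
        private long nextStart = 0L;

        abstract long totalLength();
        abstract ByteBuffer slice(long offset, long length);

        // Default chunk cap: a ByteBuffer cannot hold more than Integer.MAX_VALUE bytes.
        long limit() { return Integer.MAX_VALUE; }

        @Override public boolean hasNext() { return nextStart < totalLength(); }

        @Override public ByteBuffer next() {
            if (!hasNext()) throw new NoSuchElementException();
            long offset = nextStart;
            long length = Math.min(totalLength() - nextStart, limit());
            nextStart += length;
            return slice(offset, length);
        }
    }

    // Heap-array-backed iterator, for illustration only.
    static class ArrayChunkIterator extends ChunkedBufferIterator {
        private final byte[] data;
        ArrayChunkIterator(byte[] data) { this.data = data; }
        @Override long totalLength() { return data.length; }
        @Override ByteBuffer slice(long offset, long length) {
            return ByteBuffer.wrap(data, (int) offset, (int) length).slice();
        }
        // Redundant: returns the same value as the inherited default,
        // so deleting this override would not change behavior.
        @Override long limit() { return Integer.MAX_VALUE; }
    }

    // An override that does change behavior: chunks are capped at 4 bytes.
    static class SmallChunkIterator extends ArrayChunkIterator {
        SmallChunkIterator(byte[] data) { super(data); }
        @Override long limit() { return 4; }
    }

    static List<Integer> chunkSizes(Iterator<ByteBuffer> it) {
        List<Integer> sizes = new ArrayList<>();
        while (it.hasNext()) sizes.add(it.next().remaining());
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(new ArrayChunkIterator(new byte[10])));  // [10]
        System.out.println(chunkSizes(new SmallChunkIterator(new byte[10])));  // [4, 4, 2]
    }
}
```

An override of this kind is harmless at runtime, which is consistent with the review treating it as a minor finding.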

Important Files Changed

| Filename | Overview |
|---|---|
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/AbstractHostByteBufferIterator.java | New Java class, faithful translation of the Scala abstract class; logic is identical. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/HostMemoryInputStream.java | New Java class; logic faithfully matches HostMemoryInputStreamMixIn; does not extend the mixin (intentional per migration plan). |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/HostMemoryOutputStream.java | New Java class, faithful translation of Scala HostMemoryOutputStream; behavior is identical. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/NullHostMemoryOutputStream.java | New Java class; faithfully translates Scala NullHostMemoryOutputStream including pre-existing NPE risk for write(ByteBuffer). |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/GpuCompressedColumnVector.java | Moved from sql-plugin; removed from(CompressedTable) convenience overload; all callers updated to use the 2-arg form. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/HostByteBufferIterator.java | New Java class; contains a redundant limit() override that duplicates the parent default. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/MemoryBufferToHostByteBufferIterator.java | New Java class, faithful translation of Scala MemoryBufferToHostByteBufferIterator; logic is identical. |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostMemoryStreams.scala | Stripped of HostMemoryOutputStream, NullHostMemoryOutputStream, and HostMemoryInputStream classes; HostMemoryInputStreamMixIn trait retained. |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala | Updated two call sites from from(compressedTable) to from(ct.buffer, ct.meta); correct and consistent with remaining callers. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ColumnViewUtils.java | Translated from Scala to Java; logic is equivalent, uses try-with-resources correctly for Scalar lifecycle. |
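The try-with-resources pattern mentioned for ColumnViewUtils.java can be sketched generically as below. `FakeScalar` is a hypothetical `AutoCloseable` stand-in for a natively backed scalar, and `fill` is an invented helper; neither reflects what ColumnViewUtils actually does. The sketch only shows how the resource's lifetime is tied to the block that uses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScopedResourceDemo {
    // Records open/close events so the lifecycle is observable.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a scalar holding native memory that must be released.
    static final class FakeScalar implements AutoCloseable {
        final int value;
        FakeScalar(int value) {
            this.value = value;
            LOG.add("open " + value);
        }
        @Override public void close() {
            LOG.add("close " + value);
        }
    }

    // Builds an array filled with the scalar's value; the scalar is closed
    // when the try block exits, whether by return or by exception.
    static int[] fill(int length, int value) {
        try (FakeScalar s = new FakeScalar(value)) {
            int[] out = new int[length];
            Arrays.fill(out, s.value);
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(3, 7)));  // [7, 7, 7]
        System.out.println(LOG);                          // [open 7, close 7]
    }
}
```

In a Scala-to-Java translation this construct typically replaces a loan-pattern helper, with the same guarantee that the resource is released exactly once.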

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph OLD["sql-plugin (before)"]
        A1[GpuCompressedColumnVector.java]
        A2[HostMemoryInputStream/OutputStream.scala]
        A3[AbstractHostByteBufferIterator.scala]
        A4[RapidsHostColumnVector classes]
    end

    subgraph NEW["sql-plugin-columnar (after)"]
        B1[GpuCompressedColumnVector.java]
        B2[HostMemoryInputStream.java]
        B3[HostMemoryOutputStream.java]
        B4[NullHostMemoryOutputStream.java]
        B5[AbstractHostByteBufferIterator.java]
        B6[HostByteBufferIterator.java]
        B7[MemoryBufferToHostByteBufferIterator.java]
        B8[RapidsHostColumnVector classes]
    end

    subgraph KEPT["sql-plugin (retained)"]
        C1[HostMemoryInputStreamMixIn trait]
        C2[GpuPartitioning.scala updated callers]
    end

    OLD -->|moved/translated| NEW
    C2 -->|from buffer meta| B1

Reviews (1): Last reviewed commit: "Move host memory stream helpers to colum..."
