[AutoSparkUT] Fix ORC coalescing ignoreMissingFiles #15103
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| /* | ||
| - * Copyright (c) 2019-2025, NVIDIA CORPORATION. | ||
| + * Copyright (c) 2019-2026, NVIDIA CORPORATION. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| @@ -652,19 +652,24 @@ case class GpuOrcMultiFilePartitionReaderFactory( | ||
| metrics.getOrElse(FILTER_TIME, NoopMetric).ns { | ||
| metrics.getOrElse(SCAN_TIME, NoopMetric).ns { | ||
| - files.map { file => | ||
| - val orcPartitionReaderContext = filterHandler.filterStripes(file, dataSchema, | ||
| - readDataSchema, partitionSchema) | ||
| - compressionAndStripes.getOrElseUpdate(orcPartitionReaderContext.compressionKind, | ||
| - new ArrayBuffer[OrcSingleStripeMeta]) ++= | ||
| - orcPartitionReaderContext.blockIterator.map(block => | ||
| - OrcSingleStripeMeta( | ||
| - orcPartitionReaderContext.filePath, | ||
| - OrcDataStripe(OrcStripeWithMeta(block, orcPartitionReaderContext)), | ||
| - file.partitionValues, | ||
| - OrcSchemaWrapper(orcPartitionReaderContext.updatedReadSchema), | ||
| - readDataSchema, | ||
| - OrcExtraInfo(orcPartitionReaderContext.requestedMapping))) | ||
| + files.foreach { file => | ||
| + try { | ||
| + val orcPartitionReaderContext = filterHandler.filterStripes(file, dataSchema, | ||
| + readDataSchema, partitionSchema) | ||
| + compressionAndStripes.getOrElseUpdate(orcPartitionReaderContext.compressionKind, | ||
| + new ArrayBuffer[OrcSingleStripeMeta]) ++= | ||
| + orcPartitionReaderContext.blockIterator.map(block => | ||
| + OrcSingleStripeMeta( | ||
| + orcPartitionReaderContext.filePath, | ||
| + OrcDataStripe(OrcStripeWithMeta(block, orcPartitionReaderContext)), | ||
| + file.partitionValues, | ||
| + OrcSchemaWrapper(orcPartitionReaderContext.updatedReadSchema), | ||
| + readDataSchema, | ||
| + OrcExtraInfo(orcPartitionReaderContext.requestedMapping))) | ||
| + } catch { | ||
| + case e: FileNotFoundException if ignoreMissingFiles => | ||
|
Collaborator
NIT:
Collaborator
Author
Good catch, added a test that deletes a planned ORC file with |
||
| + logWarning(s"Skipped missing file: ${file.filePath}", e) | ||
| + } | ||
| } | ||
| } | ||
| } | ||
Could you guard the `filterStripes` result before dereferencing it? `filterStripes` can return `null` for an empty ORC file, and this line currently reads `orcPartitionReaderContext.compressionKind` before any null check, so that case would throw `NullPointerException`.
Please wrap all uses of `orcPartitionReaderContext` in the non-null branch, matching the existing ORC reader paths that handle a null context by producing/skipping empty input.
For example:
Good catch, guarded it. An empty ORC file makes `filterStripes` return `null`, so that file is now skipped instead of dereferencing the context, matching the single-file path that uses `EmptyPartitionReader`.
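For readers following the thread, here is a minimal, self-contained sketch of the two guards discussed above: skipping a file that has gone missing when `ignoreMissingFiles` is set, and skipping a file whose filter step returns `null`. It is not the code from this PR. `PlannedFile`, `ReaderContext`, `SkipSketch`, and `collectStripes` are hypothetical stand-ins for the plugin's types, and the `filter` argument stands in for `filterHandler.filterStripes`.

```scala
import java.io.FileNotFoundException
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; not the spark-rapids types.
final case class PlannedFile(path: String)
final case class ReaderContext(compressionKind: String, stripes: Seq[String])

object SkipSketch {
  /**
   * Groups stripes by compression kind. A file is skipped when the filter
   * returns null (the "empty file" case in this thread) or when it throws
   * FileNotFoundException while ignoreMissingFiles is true.
   */
  def collectStripes(
      files: Seq[PlannedFile],
      ignoreMissingFiles: Boolean)(
      filter: PlannedFile => ReaderContext): Map[String, List[String]] = {
    val byCompression =
      mutable.LinkedHashMap.empty[String, mutable.ArrayBuffer[String]]
    files.foreach { file =>
      try {
        // Option(...) maps a null result to None, so the context is only
        // dereferenced inside the non-null branch.
        Option(filter(file)).foreach { ctx =>
          byCompression.getOrElseUpdate(ctx.compressionKind,
            mutable.ArrayBuffer.empty[String]) ++= ctx.stripes
        }
      } catch {
        // With the flag off, the guard fails and the exception propagates.
        case e: FileNotFoundException if ignoreMissingFiles =>
          // Stand-in for logWarning in the real reader.
          Console.err.println(s"Skipped missing file: ${file.path} (${e.getMessage})")
      }
    }
    byCompression.map { case (kind, stripes) => kind -> stripes.toList }.toMap
  }
}
```

Wrapping the result in `Option` is one way to keep every use of the context inside the non-null branch; an explicit `if (ctx != null)` would behave the same way.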