[SPARK-53890][SDP] Test (and fix) read/readstream options are respected for pipelines #53073
What changes were proposed in this pull request?
Today, read options attached to any UnresolvedRelation that is analyzed by the pipelines flow analyzer are dropped. This PR fixes that bug, and in doing so also makes the following micro refactors:
- StreamingReadOptions/BatchReadOptions. Previously no field of either class was ever populated, and the classes were instead used to determine whether a streaming read or a batch read was being executed.
- Table class hierarchy. Table is a GraphElement but it is not an Input. Because it previously inherited from Input it had a load override, but that was dead code; logically, a Table could never be passed into the polymorphic call sites of Input.load.
- AnalysisWarning, whose exceptions were also dead code.
Why are the changes needed?
Prior to these changes, any options specified in UnresolvedRelation.options would be dropped when analyzed via FlowAnalysis.analyze. To my knowledge, in a vanilla installation of Spark (e.g. without Delta io) there are today no options that could be dropped that would otherwise have been respected by the creation of an UnresolvedRelation (e.g. via spark.read.table), but at the very least this future-proofs against a definite bug.
How was this patch tested?
org.apache.spark.sql.pipelines.analysis.ReadOptionsPropagationOnAnalysisSuite
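
For illustration only, the property described above (options on a relation surviving analysis) can be sketched with a toy model. This is not Spark code and does not reflect the actual contents of the suite: the classes, both analyze functions, and the option name exampleOption are hypothetical stand-ins, written in plain Python so the sketch runs without a Spark installation.

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class UnresolvedRelation:
    """Hypothetical stand-in: a table reference plus its read options."""
    name: str
    options: Mapping[str, str] = field(default_factory=dict)
    is_streaming: bool = False


@dataclass(frozen=True)
class ResolvedRead:
    """Hypothetical stand-in for the read an analyzer would perform."""
    name: str
    options: Mapping[str, str]
    is_streaming: bool


def analyze_dropping_options(rel: UnresolvedRelation) -> ResolvedRead:
    # Models the bug: the relation's options are silently discarded.
    return ResolvedRead(rel.name, {}, rel.is_streaming)


def analyze_propagating_options(rel: UnresolvedRelation) -> ResolvedRead:
    # Models the fix: the options are copied through to the resolved read.
    return ResolvedRead(rel.name, dict(rel.options), rel.is_streaming)


# "exampleOption" is a made-up key, not a real Spark option.
rel = UnresolvedRelation("events", {"exampleOption": "1"}, is_streaming=True)
buggy = analyze_dropping_options(rel)
fixed = analyze_propagating_options(rel)
```

In this model, buggy.options is empty while fixed.options equals the options set on the relation, which is the kind of assertion a propagation test would make for both batch and streaming reads.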