materialize-iceberg: serialize pyspark command inputs to a file
The maximum argument length for an EMR job is quite limited, around 10k
characters, so if the command input for a job gets very long, the job will
fail. This can happen if a significant number of bindings are associated with
a transaction, or even a single binding with a large number of fields, since
all of the fields and their types must be provided to the script in serialized
form, in addition to the query to execute.
The fix here is to write the input to a temporary cloud storage file and read
that file in the PySpark script. Rather than receiving the input itself as an
argument, the script now receives a URI pointing to the input file.
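As a rough illustration only (the `load_input` helper, the JSON structure, and the use of boto3 are assumptions, not taken from the actual script), the PySpark entrypoint after this change might look something like:

```python
# Hypothetical sketch: argv[1] is now a short URI to the serialized input
# file, safely under EMR's argument length limit, instead of the input itself.
import json
import sys
from urllib.parse import urlparse

import boto3
from pyspark.sql import SparkSession


def load_input(uri: str) -> dict:
    """Fetch and deserialize the job input previously passed as an argument."""
    parsed = urlparse(uri)  # e.g. s3://bucket/path/to/input.json
    body = (
        boto3.client("s3")
        .get_object(Bucket=parsed.netloc, Key=parsed.path.lstrip("/"))["Body"]
        .read()
    )
    return json.loads(body)


if __name__ == "__main__":
    input_ = load_input(sys.argv[1])
    spark = SparkSession.builder.getOrCreate()
    # The fields, their types, and the query arrive in the deserialized
    # input exactly as they previously did via the command argument.
    spark.sql(input_["query"]).show()
```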