Commit
stream-encode: CSV writer differentiates 0-length vs. absent string
A string field that is present in a document with a 0-length value, i.e. the string "", is different from a string field that is absent. The Go stdlib CSV writer cannot express this distinction: every field must be a string, and there is no way to indicate an "absent" string.

This commit implements a custom CSV writer that can tell an absent string from a 0-length one. An absent string is "skipped", with nothing at all placed between the commas of a row. A 0-length string is quoted and written as "". Other strings are quoted if they contain special characters that require quoting; otherwise they are written as-is.

The new CSV writer is modeled on the Go stdlib CSV writer, with the additions required for absent vs. empty strings. Some configuration that we generally don't need has been stripped out, such as specifying a special NULL string and using a custom "comma" value. Nobody has ever used these options in our filesink materializations, and it seems reasonable not to support them unless somebody requests them down the line.

The custom CSV writer also does not use an additional internal buffer, since that would be redundant with the buffer(s) inevitably used by its outputs.
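The field-level rules described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `writeRow`, `needsQuotes`, `encodeRow`, and `strPtr` are hypothetical, representing an absent field as a nil `*string` is an assumption, and the quoting check is a simplified version of what the stdlib writer does.

```go
package main

import (
	"io"
	"strings"
)

// needsQuotes reports whether a non-empty field must be quoted. This is a
// simplified, assumed rule: quote when the field contains the comma
// delimiter, a double quote, or a line break, or starts with a space.
func needsQuotes(field string) bool {
	return strings.ContainsAny(field, ",\"\r\n") || strings.HasPrefix(field, " ")
}

// writeRow writes one CSV row directly to w without an internal buffer.
// A nil pointer stands for an absent field (assumption for this sketch);
// a pointer to "" stands for a present, 0-length string.
func writeRow(w io.Writer, fields []*string) error {
	for i, f := range fields {
		if i > 0 {
			if _, err := io.WriteString(w, ","); err != nil {
				return err
			}
		}

		var out string
		switch {
		case f == nil:
			// Absent: nothing is written between the commas.
			out = ""
		case *f == "":
			// Present but 0-length: always written as a quoted empty string.
			out = `""`
		case needsQuotes(*f):
			// Quote the field and double any embedded quote characters.
			out = `"` + strings.ReplaceAll(*f, `"`, `""`) + `"`
		default:
			out = *f
		}

		if _, err := io.WriteString(w, out); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "\n")
	return err
}

// encodeRow is a convenience wrapper that renders a single row to a string.
func encodeRow(fields []*string) (string, error) {
	var sb strings.Builder
	err := writeRow(&sb, fields)
	return sb.String(), err
}

// strPtr returns a pointer to s, for building rows with present fields.
func strPtr(s string) *string { return &s }
```

With this sketch, a row holding the present string a, an absent field, a 0-length string, and the string x,y would be written as `a,,"","x,y"` followed by a newline. Writing straight to the `io.Writer` mirrors the commit's point about not adding a second buffer; callers would be expected to pass an already-buffered output.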
1 parent 8bce00f, commit aacc2a5
Showing 8 changed files with 146 additions and 135 deletions.
materialize-boilerplate/stream-encode/.snapshots/TestCsvEncoder-with_custom_delimiter
12 changes: 0 additions & 12 deletions. This file was deleted.

materialize-boilerplate/stream-encode/.snapshots/TestCsvEncoder-with_custom_null
12 changes: 0 additions & 12 deletions. This file was deleted.

materialize-boilerplate/stream-encode/.snapshots/TestCsvEncoder-with_default_null
12 changes: 0 additions & 12 deletions. This file was deleted.