Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -39,11 +39,14 @@

## Parquet

[Apache Parquet](https://parquet.apache.org/docs/overview/) is a columnar storage format built for Hadoop. It is most efficient without the use of a codec. Positive results, however, can be achieved when it's configured with [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).
[Apache Parquet](https://parquet.apache.org/docs/overview/) is a columnar storage format built for Hadoop. Pipeline authors can use the parquet codec to read Parquet data directly from the S3 object. This will retrieve all data from Parquet. An alternative is to use [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source) instead of the codec. In this case, [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source) will parse the Parquet file directly. This can be more efficient if you are filtering or loading a subset of data.

Additional S3 charges apply when using [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).
{: .note}

## Avro

[Apache Avro] helps streamline streaming data pipelines. It is most efficient when used with the [`avro` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/s3#avro-codec) inside an `s3` sink.
[Apache Avro](https://avro.apache.org/docs) is a columnar storage format built for Hadoop. It is most efficient without the use of a codec, however, great results can be achieved when it is configured with [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).

Check failure on line 49 in _data-prepper/common-use-cases/codec-processor-combinations.md

View workflow job for this annotation

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: Avro. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Raw Output: {"message": "[OpenSearch.Spelling] Error: Avro. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_data-prepper/common-use-cases/codec-processor-combinations.md", "range": {"start": {"line": 49, "column": 9}}}, "severity": "ERROR"}

## `event_json`

Expand Down
Loading