Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,9 @@ target
dependency-reduced-pom.xml
.idea/*
target/
examples/data/**
.cache
*~
mvn_install.log
.vscode/*
.DS_Store

7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,13 @@ Parquet-Java has supported Java Vector API to speed up reading, to enable this f
* Edit spark class#VectorizedRleValuesReader, function#readNextGroup refer to parquet class#ParquetReadRouter, function#readBatchUsing512Vector
* Build spark with maven and replace spark-sql_2.12-{VERSION}.jar on the spark jars folder

## Documentation

For comprehensive usage documentation, examples, and tutorials, see:

- **[Examples](examples/)** - Practical examples demonstrating basic and advanced usage
- **[API Reference](examples/README.md)** - Detailed API documentation

## Map/Reduce integration

[Input](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java) and [Output](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java) formats.
Expand Down
76 changes: 76 additions & 0 deletions examples/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

# Parquet Java Examples

This directory contains real examples demonstrating how to use the Apache Parquet Java library.

## Examples Overview

### 1. BasicReadWriteExample.java
Demonstrates basic reading and writing of Parquet files using the example API.

- Schema definition
- Writing data with compression
- Reading data and calculating statistics
- Basic configuration options

**Run with:**
```bash
mvn exec:java -Dexec.mainClass="org.apache.parquet.examples.BasicReadWriteExample"
```

### 2. AvroIntegrationExample.java
Shows how to integrate Avro with Parquet format.

- Avro schema definition
- Writing Avro records to Parquet
- Reading Parquet files as Avro records
- Schema projection for performance

**Run with:**
```bash
mvn exec:java -Dexec.mainClass="org.apache.parquet.examples.AvroIntegrationExample"
```

### 3. AdvancedFeaturesExample.java
Demonstrates advanced Parquet features.

- Predicate pushdown filtering
- Performance optimization with projections
- Complex filter conditions

**Run with:**
```bash
mvn exec:java -Dexec.mainClass="org.apache.parquet.examples.AdvancedFeaturesExample"
```

## Prerequisites

- Java 8 or higher
- Maven 3.6+

## Contributing

Feel free to contribute additional examples by:

1. Creating new example classes
2. Improving existing examples
3. Adding more comprehensive test cases
4. Updating documentation
158 changes: 158 additions & 0 deletions examples/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.apache.parquet</groupId>
<artifactId>parquet-examples</artifactId>
<version>1.15.1</version>
<packaging>jar</packaging>

<name>Parquet Java Examples</name>
<description>Examples demonstrating how to use the Apache Parquet Java library</description>

<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<parquet.version>1.15.1</parquet.version>
<hadoop.version>3.3.4</hadoop.version>
<avro.version>1.11.1</avro.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-common</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-encoding</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
<version>${parquet.version}</version>
</dependency>

<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-avro</artifactId>
<version>${parquet.version}</version>
</dependency>

<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-thrift</artifactId>
<version>${parquet.version}</version>
</dependency>

<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-protobuf</artifactId>
<version>${parquet.version}</version>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>

<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>${avro.version}</version>
</dependency>

<dependency>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
<version>1.1.9.1</version>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.36</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.12</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.11.0</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.apache.parquet.examples.BasicReadWriteExample</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.1.2</version>
</plugin>
</plugins>
</build>
</project>
Loading