Releases: mjakubowski84/parquet4s
v1.3.1
A nasty bug sneaked into the 1.3.0 release that made it impossible to use the new viaParquet flow. Sorry for that! Please use this bugfix release.
v1.3.0
This release crowns months of work on partitioning! And not only that.
Most of the effort was focused on making the Akka library more functional.
Changes in the core module:
- Upgrade of the Parquet library to version 1.11. Huge parts of its API got deprecated, including parts that are used in Parquet4S. Not all code has been migrated to the new API, in order to allow a gradual migration. You can expect that to change in the next major release.
- The core module received many new classes that support partitioning, but no new behaviour or functionality was added to this module.
Changes in the akka module:
- ParquetStreams.fromParquet can now read partitioned directories. It also applies filters to partitions, which makes reading Parquet files even faster! If a partition value does not satisfy the filter predicate, the whole directory is not read.
- Sink ParquetStreams.toParquetIndefinite got deprecated in favour of a new builder of a passthrough flow: ParquetStreams.viaParquet. Check the updated example to see what an indefinite stream that writes Parquet files can look like! ParquetStreams.viaParquet allows you to write partitioned Parquet files. Just specify which fields should be used for partitioning; their values are then saved not in the Parquet file but in the name of the parent directory.
- The Akka library is upgraded to version 2.5.31.
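The partitioning described above can be pictured with a small, self-contained sketch. It uses a hypothetical PartitionSketch object and models records as plain Map[String, String] values. It also assumes Hive-style directory names of the form field=value. That naming is an assumption, because the notes only say that the values go into the parent directory name. This is not Parquet4S's internal code. The sketch shows both sides of the feature. On write, partition fields are moved out of the record and into a path. On read, directories whose partition values fail a predicate are skipped.

```scala
// Hypothetical sketch of Hive-style partitioning (field=value directories);
// not Parquet4S's internal code.
object PartitionSketch {
  // Records are modelled as plain field -> value maps for simplicity.
  type Record = Map[String, String]

  // Write side: moves the partition fields out of the record and into a
  // relative directory path such as "country=PL/year=2020".
  def partition(record: Record, partitionBy: Seq[String]): (String, Record) = {
    val dir = partitionBy.map(field => s"$field=${record(field)}").mkString("/")
    (dir, record -- partitionBy)
  }

  // Read side: recovers partition values from a directory path.
  def parsePartitions(dir: String): Map[String, String] =
    dir.split("/").toList
      .filter(_.contains("="))
      .map { segment =>
        val idx = segment.indexOf('=')
        segment.substring(0, idx) -> segment.substring(idx + 1)
      }
      .toMap

  // Keeps only directories whose partition values satisfy the predicate;
  // files in the other directories never need to be opened.
  def prune(dirs: Seq[String], predicate: Map[String, String] => Boolean): Seq[String] =
    dirs.filter(dir => predicate(parsePartitions(dir)))
}
```

Pruning only inspects directory names. A partition that does not match therefore costs a string comparison rather than opening its files.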
v1.2.1
This is not a real hotfix for 1.2.0 but a general fix.
@mac01021 noticed that Parquet4s read legacy forms of lists incorrectly. While the "new" forms were handled perfectly, elements of the "old" forms were read as nulls. Parquet4s now supports several forms of repeated elements. Take note that some legacy structures may still not work.
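To make "legacy forms of lists" concrete, here is a paraphrased sketch of the backward-compatibility rules that the Parquet format specification gives for LIST-annotated groups. The schema model (Field, Primitive, Group) and the LegacyListSketch object are hypothetical simplifications. They illustrate the rules, not how Parquet4s implements them. Each rule answers one question: is the repeated field inside the list itself the element, as in older two-level layouts, or is it a wrapper around the element, as in the standard three-level layout?

```scala
// Sketch of the Parquet spec's backward-compatibility rules for lists;
// not Parquet4s's implementation. Schemas are reduced to names and nesting.
object LegacyListSketch {
  sealed trait Field { def name: String }
  final case class Primitive(name: String) extends Field
  final case class Group(name: String, fields: List[Field]) extends Field

  // Given the name of a LIST-annotated group and its repeated child, returns
  // true if that repeated child is itself the element (legacy forms) and
  // false if it is the standard middle layer wrapping the element.
  //
  //   standard: optional group tags (LIST) { repeated group list { optional binary element; } }
  //   legacy:   optional group tags (LIST) { repeated binary array; }
  def repeatedIsElement(listName: String, repeated: Field): Boolean = repeated match {
    case Primitive(_)                        => true // repeated primitive is the element
    case Group(_, fields) if fields.size > 1 => true // multi-field group is the element
    case Group(name, _) if name == "array" || name == listName + "_tuple" => true
    case Group(_, _)                         => false // its single child is the element
  }
}
```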
v1.2.0
More improvements in the library internals that make it even faster and more useful!
- Improved collection serialisation and deserialisation. It is now done with fewer traversals. CollectionTransformers are no longer in use. Transformations are now done using Scala 2.13's Factory. Lower versions of Scala use https://github.com/scala/scala-library-compat.
- Thanks to the above, the number of supported Scala collections grows a lot! Parquet4S now reads and writes any immutable or mutable Scala collection of single-type elements that has a related Scala 2.13 Factory. Scala 2.11 and 2.12 derive the factory from CanBuildFrom. Array[Byte] is finally saved as binary (and vice versa).
- Last but not least, we encourage you to check the internal API of the generic RowParquetRecord, ListParquetRecord and MapParquetRecord. You can use it for reading and writing Parquet without defining a schema by means of a case class.
v1.1.0
This release mostly focuses on performance, as we made some tweaks to the internals of the library. But that's not all!
- Thanks to @mac01021, ParquetRecord and its implementations obtained a new API that makes records mutable collections. It allows you to perform operations on Parquet files in a generic way.
- While improving the functionality of the ParquetRecord API, we also improved its performance. Reading and writing Parquet files is now faster!
- Scala version upgrades: 2.12.10 -> 2.12.11, 2.13.1 -> 2.13.2
- Several fixes in examples, as some of them didn't work in Scala 2.11 (due to a bug in Scala 2.11 itself)
v1.0.0
New year - new release! And a big one!
Notable changes
Features
- Both the core and akka modules are now also available for Scala 2.13!
Breaking changes
- IncrementalParquetWriter is gone. Its API is merged into the regular ParquetWriter. Now you can create a writer, use it and then close it. Or you can use the ParquetWriter.writeAndClose function to do it all at once. Check the improved examples for more details.
- Cleanup of deprecated functions in the akka module.
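The create, use, close lifecycle and its one-shot alternative follow a common pattern. The sketch below shows it with a hypothetical in-memory RecordWriter. None of these names belong to Parquet4S, nothing is written to disk, and the project's examples remain the reference for the real ParquetWriter API. The main point is that the one-shot helper closes the writer in a finally block, so the writer is released even if writing throws.

```scala
// Hypothetical writer used only to illustrate the lifecycle;
// it is NOT Parquet4S's ParquetWriter and does not write Parquet files.
object WriterLifecycleSketch {
  trait RecordWriter[T] extends AutoCloseable {
    def write(records: Iterable[T]): Unit
  }

  // In-memory stand-in that refuses writes after close().
  final class InMemoryWriter[T] extends RecordWriter[T] {
    private val buffer = scala.collection.mutable.ListBuffer.empty[T]
    private var closed = false

    def write(records: Iterable[T]): Unit = {
      require(!closed, "writer is already closed")
      buffer ++= records
    }
    def close(): Unit = closed = true
    def isClosed: Boolean = closed
    def written: List[T] = buffer.toList
  }

  // One-shot variant: write everything, then close even if writing fails.
  def writeAndClose[T](writer: RecordWriter[T], records: Iterable[T]): Unit =
    try writer.write(records) finally writer.close()
}
```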
Improvements
- More examples!
- Documented list of supported type mappings.
- Upgrade of sbt, akka and more.
v0.11.0
v0.10.0
This is what many of you have probably been waiting for! Filter pushdown, or, as it is also called, before-read filtering: one of the best features of Parquet. Now you can efficiently read and filter data using simple predicates. Check the Readme, ScalaDoc and code for more details!
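Parquet files keep min/max statistics for their column chunks, and that is what makes before-read filtering cheap. Row groups whose value range cannot satisfy the predicate are skipped without being decoded. The sketch below models this idea with a hypothetical PushdownSketch object that holds one integer column per row group. It is a conceptual illustration, not the code path that Parquet4S or the underlying Parquet library actually uses.

```scala
// Conceptual sketch of statistics-based filter pushdown; not Parquet4S code.
object PushdownSketch {
  // One row group of a single Int column, with the min/max kept in its metadata.
  final case class RowGroup(min: Int, max: Int, rows: Vector[Int])

  sealed trait Predicate
  final case class Eq(value: Int) extends Predicate
  final case class Gt(value: Int) extends Predicate

  // Uses only the statistics: false means no row in the group can match.
  def mightMatch(group: RowGroup, p: Predicate): Boolean = p match {
    case Eq(v) => group.min <= v && v <= group.max
    case Gt(v) => group.max > v
  }

  // Exact per-row check, applied only to groups that survived pruning.
  def matches(value: Int, p: Predicate): Boolean = p match {
    case Eq(v) => value == v
    case Gt(v) => value > v
  }

  def read(groups: Seq[RowGroup], p: Predicate): Seq[Int] =
    groups
      .filter(group => mightMatch(group, p))
      .flatMap(group => group.rows.filter(value => matches(value, p)))
}
```

Take a file with row groups covering 1..10 and 20..30. With Eq(25), only the second group is ever scanned.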
Another minor change: Scala 2.12.x updated to 2.12.10.
v0.9.1
Fixes a bug in IndefiniteStreamParquetSink: default writer options were used internally instead of the ones passed through the akka API.
v0.9.0
This release may seem small, but a few things have changed under the hood:
- Scala 2.12 upgraded to 2.12.9
- Simplified the Parquet value system so that it better matches the original Parquet library. The changes made the library a little bit faster!
- Improved handling of BigDecimals. Reading and writing decimal values is now safer and faster.
- Last but not least, thanks to @Timvd, a bug in the akka module is fixed. A custom Hadoop configuration was not properly handled when validating paths. Now it is better!
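Some context for the BigDecimal item above. Parquet's DECIMAL logical type stores the unscaled integer value, while the scale is fixed in the schema. For binary-backed decimals, the unscaled value is a big-endian two's-complement integer. The hypothetical DecimalSketch below shows that round trip using java.math types. Two details are assumptions made for illustration, not Parquet4S's documented behaviour: the scale passed in, and HALF_UP rounding for values with extra fractional digits.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger, RoundingMode}

// Hypothetical sketch of Parquet's DECIMAL encoding: unscaled value + schema scale.
object DecimalSketch {
  // Rescales the value (assumption: HALF_UP rounding) and returns the unscaled
  // integer as big-endian two's-complement bytes.
  def encode(value: BigDecimal, scale: Int): Array[Byte] =
    value.bigDecimal.setScale(scale, RoundingMode.HALF_UP).unscaledValue().toByteArray()

  // The bytes alone carry no scale; it must come from the schema.
  def decode(bytes: Array[Byte], scale: Int): BigDecimal =
    BigDecimal(new JBigDecimal(new BigInteger(bytes), scale))
}
```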