Releases: mjakubowski84/parquet4s

v1.3.1

03 Jul 18:53

A nasty bug sneaked into the 1.3.0 release that made it impossible to use the new viaParquet flow. Sorry for that! Please use this bugfix release.

v1.3.0

02 Jul 19:00

This release crowns months of work on partitioning! And not only that.
Most of the effort went into making the Akka library more functional.

Changes in core module:

  • Upgrade of the Parquet library to version 1.11. Huge parts of its API were deprecated, including parts that are used in Parquet4S. Not all code has been updated to the new API, in order to allow gradual migration. You can expect that to change in the next major release.
  • The core module received many new classes that support partitioning, but no new behaviour or functionality was added to this module.

Changes in akka module:

  • ParquetStreams.fromParquet can now read partitioned directories. It also applies filters to partitions, which allows reading Parquet files even faster! If a partition value does not satisfy the filter predicate, the whole directory is skipped.
  • The sink ParquetStreams.toParquetIndefinite is deprecated in favour of a new builder of a passthrough flow: ParquetStreams.viaParquet. Check the updated example to see what an indefinite stream that writes Parquet files can look like!
  • ParquetStreams.viaParquet allows you to write partitioned Parquet files. Just specify which fields should be used for partitioning; their values are then not saved in the Parquet file but in the name of the parent directory.
  • The Akka library is upgraded to version 2.5.31.
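
The partitioning behaviour described above can be illustrated with a small, library-independent sketch. It is not Parquet4S code: PartitionSketch, Record and all function names are hypothetical, and the field=value directory naming is an assumption based on the common Hive-style layout. The sketch shows how partition values move from the record into the directory name on write, and how a filter on a partition value can skip whole directories on read.

```scala
object PartitionSketch {
  // Hypothetical stand-in for a row: field name -> value.
  type Record = Map[String, String]

  // Write side: partition values go into the directory name and are
  // removed from the record that would be stored in the file.
  def partition(record: Record, partitionBy: List[String]): (String, Record) = {
    val dir = partitionBy.map(field => s"$field=${record(field)}").mkString("/")
    (dir, record -- partitionBy)
  }

  // Decodes partition values from a path such as "year=2020/month=7".
  // Segments without '=' are ignored.
  def partitionValues(dir: String): Map[String, String] =
    dir.split('/')
      .map(_.split("=", 2))
      .collect { case Array(name, value) => name -> value }
      .toMap

  // Read side: keeps only directories whose partition value satisfies the
  // predicate. A directory that lacks the field cannot be ruled out, so it is kept.
  def prune(dirs: List[String], field: String, accept: String => Boolean): List[String] =
    dirs.filter(dir => partitionValues(dir).get(field).forall(accept))
}
```

For example, PartitionSketch.partition(Map("year" -> "2020", "value" -> "a"), List("year")) yields the directory "year=2020" and a record that no longer contains year, while prune drops every directory whose year does not pass the predicate before any file in it is opened.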

v1.2.1

18 Jun 07:31

This is not really a hotfix for 1.2.0 but a general fix.
@mac01021 noticed that Parquet4s incorrectly reads legacy forms of lists. While "new" forms were handled perfectly, elements of "old" forms were read as nulls. Now Parquet4s supports several forms of repeated elements. Note that some legacy structures may still not work.

v1.2.0

11 Jun 11:24

More improvements in library internals that make it even faster and more useful!

  • Improved collection serialisation and deserialisation. It is now done with fewer traversals.
  • CollectionTransformers are no longer in use. Transformations are now done using Scala 2.13's Factory. Lower versions of Scala use https://github.com/scala/scala-library-compat.
  • Thanks to the above, the number of supported Scala collections has grown a lot! Parquet4S now reads and writes any immutable or mutable Scala collection of single-type elements that has a related Scala 2.13 Factory. Scala 2.11 and 2.12 derive the factory from CanBuildFrom.
  • Array[Byte] is finally saved as binary (and vice versa).
  • Last but not least, we encourage you to check the internal API of the generic RowParquetRecord, ListParquetRecord and MapParquetRecord. You can use it for reading and writing Parquet without defining a schema by means of a case class.
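
To illustrate why relying on Factory widens the set of supported collections, here is a minimal sketch that uses only the Scala 2.13 standard library. FactorySketch and decode are hypothetical names and do not reflect the actual Parquet4S internals. A single generic function builds any target collection for which the compiler can find an implicit scala.collection.Factory, and it does so in one pass over the elements.

```scala
import scala.collection.Factory

object FactorySketch {
  // Builds a collection of type C from decoded elements in a single traversal.
  // Requires Scala 2.13 or newer; the Factory instance comes from the
  // companion object of the target collection (or from Factory itself
  // for String and Array).
  def decode[A, C](elements: Iterator[A])(implicit factory: Factory[A, C]): C =
    factory.fromSpecific(elements)
}
```

The same function can then produce, for instance, a Vector[Int], an immutable Set[Int] or a mutable ArrayBuffer[Int], simply by choosing the result type: FactorySketch.decode[Int, Vector[Int]](Iterator(1, 2, 3)).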

v1.1.0

29 May 07:33

This release mostly focuses on performance, as we tweaked some internals of the library. But that's not all!

  • Thanks to @mac01021, ParquetRecord and its implementations gained a new API that makes records mutable collections. It allows performing operations on Parquet files in a generic way.
  • While improving the functionality of the ParquetRecord API we also improved its performance. Reading and writing Parquet files is now faster!
  • Scala version upgrades: 2.12.10 -> 2.12.11, 2.13.1 -> 2.13.2
  • Several fixes in examples, as some of them didn't work in Scala 2.11 (due to a bug in Scala 2.11 itself).

v1.0.0

06 Jan 16:13

New year - new release! And a big one!

Notable changes

Features

  • Both the core and akka modules are now also available for Scala 2.13!

Breaking changes

  • IncrementalParquetWriter is gone. Its API is merged into the regular ParquetWriter. Now you can create a writer, use it and then close it. Or you can do it all at once with the ParquetWriter.writeAndClose function. Check the improved examples for more details.
  • Cleanup of deprecated functions in akka module.

Improvements

  • More examples!
  • Documented list of supported type mappings.
  • Upgrade of sbt, akka and more.

v0.11.0

18 Oct 07:56

@SeanU did a great job and this release includes his work on a new filtering operator. Given a list of values, you can use an SQL-like in to filter Parquet files while reading.

v0.10.0

04 Oct 16:27

This is probably what many have waited for! Filter pushdown, also known as before-read filtering, is one of the best features of Parquet. Now you can efficiently read and filter data using simple predicates. Check the Readme, ScalaDoc and code for more details!
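
As a rough illustration of why filtering before reading is efficient, the following library-independent sketch models how a reader can use the min/max statistics that Parquet keeps per row group to skip data that cannot match. It is a simplified model, not the Parquet4S or Parquet API: PushdownSketch, RowGroup, Eq and Gt are hypothetical names, and only a single Int column is considered.

```scala
object PushdownSketch {
  // Hypothetical stand-in for one row group of an Int column,
  // together with its min/max statistics.
  final case class RowGroup(min: Int, max: Int, values: Vector[Int])

  sealed trait Predicate
  final case class Eq(value: Int) extends Predicate
  final case class Gt(value: Int) extends Predicate

  // True when the statistics alone cannot rule out a match.
  def mightMatch(p: Predicate, group: RowGroup): Boolean = p match {
    case Eq(v) => group.min <= v && v <= group.max
    case Gt(v) => group.max > v
  }

  // Exact check applied to individual values of the remaining row groups.
  def matches(p: Predicate, x: Int): Boolean = p match {
    case Eq(v) => x == v
    case Gt(v) => x > v
  }

  // Returns the matching values and the number of row groups actually scanned.
  def read(groups: List[RowGroup], p: Predicate): (Vector[Int], Int) = {
    val candidates = groups.filter(mightMatch(p, _))
    val result = candidates.flatMap(_.values.filter(matches(p, _))).toVector
    (result, candidates.size)
  }
}
```

With two row groups covering 1 to 3 and 10 to 12, the predicate Gt(10) scans only the second group, and Eq(5) scans none at all.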

Another minor change: Scala 2.12.x updated to 2.12.10.

v0.9.1

13 Sep 15:11
Compare
Choose a tag to compare

Fixes a bug in IndefiniteStreamParquetSink: default writer options were used internally instead of the ones passed via the akka API.

v0.9.0

12 Sep 19:46

This release seems small but has a few things changed under the hood:

  • Scala 2.12 upgraded to 2.12.9
  • Simplified the Parquet Value system so that it better matches the original Parquet library. The changes made the lib a little bit faster!
  • Improved handling of BigDecimals. Reading and writing decimal values is now safer and faster.
  • Last but not least, thanks to @Timvd, a bug in the akka module is fixed. Custom Hadoop configuration was not properly handled when validating paths. Now it is better!