v0.4.1
Signed-off-by: Xuzhou Qin <[email protected]>
Xuzhou Qin committed Feb 13, 2020
1 parent 40325ac commit 37f6fc5
Showing 2 changed files with 44 additions and 34 deletions.
74 changes: 42 additions & 32 deletions CHANGELOG.md
@@ -1,9 +1,34 @@
## 0.4.1 (2020-01-15)
## 0.4.1 (2020-02-13)
Changes:
- Changed benchmark unit of time to *seconds* (#88)

Fixes:
- The master URL of SparkSession can now be overwritten in local environment (#74)
- `FileConnector` now lists path correctly for nested directories (#97)

New features:
- Added [Mermaid](https://mermaidjs.github.io/#/) diagram generation to **Pipeline** (#51)
- Added `showDiagram()` method to **Pipeline** that prints the Mermaid code and generates the
live editor URL 🎩🐰✨ (#52)
- Added `showDiagram()` method to **Pipeline** that prints the Mermaid code and generates the live editor URL 🎩🐰✨ (#52)
- Added **Codecov** report and **Scala API doc**
- Added `delete` method in `JDBCConnector` (#82)
- Added `drop` method in `DBConnector` (#83)
- Added support for both of the following two Spark configuration styles in SETL builder (#86)
```hocon
setl.config {
  spark {
    spark.app.name = "my_app"
    spark.sql.shuffle.partitions = "1000"
  }
}
setl.config_2 {
  spark.app.name = "my_app"
  spark.sql.shuffle.partitions = "1000"
}
```
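
As a rough illustration of why the two styles above can be treated as equivalent, the sketch below parses the same HOCON with the Typesafe Config library and reduces both blocks to the same Spark key/value map. The `sparkSettings` helper and the `SparkConfigStyles` object are hypothetical names introduced for this example. This is not how the SETL builder is implemented, only one way the optional `spark { ... }` wrapper could be handled.

```scala
import com.typesafe.config.{Config, ConfigFactory}
import scala.collection.JavaConverters._

object SparkConfigStyles {

  // Same two blocks as in the changelog entry above.
  private val hocon: String =
    """
      |setl.config {
      |  spark {
      |    spark.app.name = "my_app"
      |    spark.sql.shuffle.partitions = "1000"
      |  }
      |}
      |setl.config_2 {
      |  spark.app.name = "my_app"
      |  spark.sql.shuffle.partitions = "1000"
      |}
      |""".stripMargin

  // Hypothetical helper (an assumption, not part of SETL): if the block wraps its
  // settings in an extra `spark { ... }` object, step into it; otherwise read the
  // block directly. Only keys starting with "spark." are kept.
  def sparkSettings(conf: Config): Map[String, String] = {
    val scoped = if (conf.hasPath("spark.spark")) conf.getConfig("spark") else conf
    scoped.entrySet().asScala
      .map(entry => entry.getKey -> entry.getValue.unwrapped().toString)
      .filter { case (key, _) => key.startsWith("spark.") }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val root = ConfigFactory.parseString(hocon)
    val nested = sparkSettings(root.getConfig("setl.config"))
    val flat = sparkSettings(root.getConfig("setl.config_2"))

    // Both styles should yield the same Spark properties.
    assert(nested == flat)
    nested.toSeq.sorted.foreach { case (key, value) => println(s"$key=$value") }
  }
}
```

Note that in HOCON the first style places the values under the path `setl.config.spark.spark.app.name`, while the second places them under `setl.config_2.spark.app.name`, which is why the helper checks for a `spark.spark` path before deciding where to read.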

Others:
- Improved test coverage

## 0.4.0 (2020-01-09)
Changes:
@@ -26,46 +51,37 @@ Others:
- Optimized **PipelineInspector** (#33)

## 0.3.5 (2019-12-16)
- BREAKING CHANGE: replace the Spark compatible version by the Scala compatible version in the artifact ID.
The old artifact id **dc-spark-sdk_2.4** was changed to **dc-spark-sdk_2.11** (or **dc-spark-sdk_2.12**)
- BREAKING CHANGE: replace the Spark compatible version by the Scala compatible version in the artifact ID. The old artifact id **dc-spark-sdk_2.4** was changed to **dc-spark-sdk_2.11** (or **dc-spark-sdk_2.12**)
- Upgraded dependencies
- Added Scala 2.12 support
- Removed **SparkSession** from Connector and SparkRepository constructor (old constructors are kept but now deprecated)
- Added **Column** type support in FindBy method of **SparkRepository** and **Condition**
- Added method **setConnector** and **setRepository** in **Setl** that accept
object of type Connector/SparkRepository
- Added method **setConnector** and **setRepository** in **Setl** that accept object of type Connector/SparkRepository

## 0.3.4 (2019-12-06)
- Added read cache into spark repository to avoid consecutive disk IO.
- Added option **autoLoad** in the Delivery annotation so that *DeliverableDispatcher* can still handle the dependency
injection in the case where the delivery is missing but a corresponding
repository is present.
- Added option **autoLoad** in the Delivery annotation so that *DeliverableDispatcher* can still handle the dependency injection in the case where the delivery is missing but a corresponding repository is present.
- Added option **condition** in the Delivery annotation to pre-filter loaded data when **autoLoad** is set to true.
- Added option **id** in the Delivery annotation. DeliveryDispatcher will match deliveries by the id in addition to
the payload type. By default the id is an empty string ("").
- Added **setConnector** method in DCContext. Each connector should be delivered with an ID. By default the ID will be its
config path.
- Added option **id** in the Delivery annotation. DeliveryDispatcher will match deliveries by the id in addition to the payload type. By default the id is an empty string ("").
- Added **setConnector** method in DCContext. Each connector should be delivered with an ID. By default the ID will be its config path.
- Added support of wildcard path for SparkRepository and Connector
- Added JDBCConnector

## 0.3.3 (2019-10-22)
- Added **SnappyCompressor**.
- Added method **persist(persistence: Boolean)** into **Stage** and **Factory** to
activate/deactivate output persistence. By default the output persistence is set to *true*.
- Added method **persist(persistence: Boolean)** into **Stage** and **Factory** to activate/deactivate output persistence. By default the output persistence is set to *true*.
- Added implicit method `filter(cond: Set[Condition])` for Dataset and DataFrame.
- Added `setUserDefinedSuffixKey` and `getUserDefinedSuffixKey` to **SparkRepository**.

## 0.3.2 (2019-10-14)
- Added **@Compress** annotation. **SparkRepository** will compress all columns having this annotation by
using a **Compressor** (the default compressor is **XZCompressor**)
- Added **@Compress** annotation. **SparkRepository** will compress all columns having this annotation by using a **Compressor** (the default compressor is **XZCompressor**)
```scala
case class CompressionDemo(@Compress col1: Seq[Int],
@Compress(compressor = classOf[GZIPCompressor]) col2: Seq[String])
```

- Added interface **Compressor** and implemented **XZCompressor** and **GZIPCompressor**
- Added **SparkRepositoryAdapter[A, B]**. It will allow a **SparkRepository[A]** to write/read a data store of type
**B** by using an implicit **DatasetConverter[A, B]**
- Added **SparkRepositoryAdapter[A, B]**. It will allow a **SparkRepository[A]** to write/read a data store of type **B** by using an implicit **DatasetConverter[A, B]**
- Added trait **Converter[A, B]** that handles the conversion between an object of type A and an object of type **B**
- Added abstract class **DatasetConverter[A, B]** that extends a **Converter[Dataset[A], Dataset[B]]**
- Added auto-correction for `SparkRepository.findBy(conditions)` method when we filter by case class field name instead of column name
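
To make the **Converter[A, B]** entry above more concrete, here is a minimal sketch of what a two-way converter of that shape could look like in plain Scala. The method names `convertTo` and `convertFrom`, and the `IntsCsvConverter` example, are assumptions made for illustration and are not taken from the SETL source. A **DatasetConverter[A, B]** would apply the same idea to `Dataset[A]` and `Dataset[B]`.

```scala
// Hypothetical shape of a two-way converter; method names are illustrative only.
trait Converter[A, B] {
  def convertTo(a: A): B
  def convertFrom(b: B): A
}

// Example instance: store a sequence of integers as a comma-separated string
// and read it back.
object IntsCsvConverter extends Converter[Seq[Int], String] {
  override def convertTo(a: Seq[Int]): String = a.mkString(",")

  override def convertFrom(b: String): Seq[Int] =
    if (b.isEmpty) Seq.empty else b.split(",").map(_.toInt).toSeq
}

object ConverterDemo {
  def main(args: Array[String]): Unit = {
    val original = Seq(1, 2, 3)
    val stored = IntsCsvConverter.convertTo(original)

    // A round trip through the converter should give back the original value.
    assert(stored == "1,2,3")
    assert(IntsCsvConverter.convertFrom(stored) == original)
  }
}
```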
@@ -77,8 +93,7 @@ case class CompressionDemo(@Compress col1: Seq[Int],
- Added sequential mode in class `Stage`. Users can turn it on by setting `parallel` to *false*.
- Added external data flow description in pipeline description
- Added method `beforeAll` into `ConfigLoader`
- Added new method `addStage` and `addFactory` that take a class object as input. The instantiation will be handled
by the stage.
- Added new method `addStage` and `addFactory` that take a class object as input. The instantiation will be handled by the stage.
- Removed implicit argument encoder from all methods of Repository trait
- Added new get method to **Pipeline**: `get[A](cls: Class[_ <: Factory[_]]): A`.

@@ -97,8 +112,7 @@ case class CompressionDemo(@Compress col1: Seq[Int],
```
- Added an optional argument `suffix` in `FileConnector` and `SparkRepository`
- Added method `partitionBy` in `FileConnector` and `SparkRepository`
- Added possibility to filter by name pattern when a FileConnector is trying to read a directory.
To do this, add `filenamePattern` into the configuration file
- Added possibility to filter by name pattern when a FileConnector is trying to read a directory. To do this, add `filenamePattern` into the configuration file
- Added possibility to create a `Conf` object from Map.
```scala
Conf(Map("a" -> "A"))
@@ -122,15 +136,12 @@ case class CompressionDemo(@Compress col1: Seq[Int],
- Added a second argument to CompoundKey to handle primary and sort keys

## 0.2.7 (2019-06-21)
- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods
of `SparkRepositoryBuilder` to use the conf object
- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods of `SparkRepositoryBuilder` to use the conf object
- Changed package name `com.jcdecaux.setl.annotations` to `com.jcdecaux.setl.annotation`

## 0.2.6 (2019-06-18)
- Added annotation `ColumnName`, which could be used to replace the current column name
with an alias in the data storage.
- Added annotation `CompoundKey`. It could be used to define a compound key for databases
that only allow one partition key
- Added annotation `ColumnName`, which could be used to replace the current column name with an alias in the data storage.
- Added annotation `CompoundKey`. It could be used to define a compound key for databases that only allow one partition key
- Added sheet name into arguments of ExcelConnector

## 0.2.5 (2019-06-12)
@@ -155,8 +166,7 @@ that only allow one partition key

## 0.2.0 (2019-05-21)
- Changed spark version to 2.4.3
- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a
dedicated `Repository` class
- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a dedicated `Repository` class
- Added Excel support for `SparkRepository` by creating `ExcelConnector`
- Added `Logging` trait

4 changes: 2 additions & 2 deletions README.md
@@ -25,7 +25,7 @@ You can start working by cloning [this template project](https://github.com/qxzz
<dependency>
<groupId>com.jcdecaux.setl</groupId>
<artifactId>setl_2.11</artifactId>
<version>0.4.0</version>
<version>0.4.1</version>
</dependency>
```
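
For sbt-based projects, the Maven coordinates above would presumably translate to the line below. This is a sketch that assumes the artifact is published for the Scala binary version your build uses. The `%%` operator makes sbt append that version to the artifact name, so the dependency resolves to `setl_2.11` or `setl_2.12`.

```scala
// build.sbt (sketch): same groupId, artifact base name and version as the Maven snippet.
libraryDependencies += "com.jcdecaux.setl" %% "setl" % "0.4.1"
```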

@@ -42,7 +42,7 @@ To use the SNAPSHOT version, add Sonatype snapshot repository to your `pom.xml`
<dependency>
<groupId>com.jcdecaux.setl</groupId>
<artifactId>setl_2.11</artifactId>
<version>0.4.1-SNAPSHOT</version>
<version>0.4.2-SNAPSHOT</version>
</dependency>
</dependencies>
```