Data.SyncMaster 0.3.0 adds support for Iceberg, Spark-on-K8s and Spark-on-Yarn.
Note
Currently Spark-on-K8s and Spark-on-Yarn do not support FTP, FTPS, SFTP, Samba and WebDAV.
Breaking Changes
- Worker container command should be changed from `--queues 123-myqueue` to `worker --queues 123-myqueue` (#295).
- Application should be configured via the `config.yml` file (#289). It is still possible to use environment variables instead, but this is not recommended for security reasons, as Docker/K8s environment variables can be read by other users.

  Other notable changes:
  - Environment variable `SYNCMASTER__ENTRYPOINT__SUPERUSERS` is renamed to `SYNCMASTER__SUPERUSERS`.
  - Logging format is configured explicitly via `config.yml` instead of relying on a few predefined configuration files.
- Moved `server.session` middleware settings to the `auth` block (#304). Also renamed some fields in the `auth.keycloak` settings block.

  Before vs after:
  Before:

      auth:
        provider: ...
        keycloak:
          server_url: ...
          redirect_url: ...
      server:
        session:
          enabled: true
          secret_key: ...

  Now:

      auth:
        provider:
        keycloak:
          api_url: ...
          ui_callback_url: ...
        cookie:
          secret_key: ...
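
The layout change above can be sketched as a small migration helper for a config that has already been loaded into a Python dict. This is only an illustration, not part of SyncMaster: the function name `migrate_config` is hypothetical, and the rename mapping (`server_url` to `api_url`, `redirect_url` to `ui_callback_url`) is an assumption inferred from the order of the fields in the before/after example.

```python
from copy import deepcopy

# Assumed old -> new field names inside auth.keycloak. Inferred from the
# before/after example; the release notes do not state the mapping explicitly.
KEYCLOAK_RENAMES = {"server_url": "api_url", "redirect_url": "ui_callback_url"}


def migrate_config(old: dict) -> dict:
    """Return a copy of an old-layout config dict rearranged into the new layout."""
    new = deepcopy(old)  # never mutate the caller's dict

    auth = new.setdefault("auth", {})
    keycloak = auth.get("keycloak")
    if isinstance(keycloak, dict):
        for old_key, new_key in KEYCLOAK_RENAMES.items():
            if old_key in keycloak:
                keycloak[new_key] = keycloak.pop(old_key)

    # server.session.secret_key -> auth.cookie.secret_key
    # The old `enabled` flag is dropped here, because the new example
    # does not show an equivalent field.
    server = new.get("server", {})
    session = server.pop("session", None) if isinstance(server, dict) else None
    if isinstance(session, dict) and "secret_key" in session:
        auth.setdefault("cookie", {})["secret_key"] = session["secret_key"]

    # Remove the `server` block if nothing else is left in it.
    if server == {}:
        new.pop("server", None)
    return new


old_config = {
    "auth": {
        "provider": "...",
        "keycloak": {"server_url": "...", "redirect_url": "..."},
    },
    "server": {"session": {"enabled": True, "secret_key": "..."}},
}
new_config = migrate_config(old_config)
```

Any other keys under `server` or `auth` are left untouched, so the helper only affects the fields named in this section.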
Features
- Added Iceberg support (#282, #284, #294, #297).

  Iceberg connection currently supports only Iceberg REST Catalog with S3 warehouse.
- Allow using SyncMaster worker image as `spark.kubernetes.container.image` (#295).
- Allow passing default Spark session config via worker settings (#291):
  Example:

      worker:
        spark_session_default_config:
          spark.master: local
          spark.driver.host: 127.0.0.1
          spark.driver.bindAddress: 0.0.0.0
          spark.sql.pyspark.jvmStacktrace.enabled: true
          spark.ui.enabled: false
- Added OAuth2GatewayProvider (#283).

  This allows using Data.SyncMaster under OAuth2 Gateway. Implementation is similar to DummyAuthProvider.
- Allow disabling `SessionMiddleware`, as it is only required by `KeycloakAuthProvider`.
- Add hooks support to worker classes (TransferController, Handler) (#279).
- Pass transfer name and group name to Handlers (#308).
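
To illustrate what a default Spark session config amounts to, the sketch below applies a mapping of options to a builder one key at a time. It does not show how the SyncMaster worker does this internally. The helper `apply_spark_defaults` and the `RecordingBuilder` stand-in are hypothetical; the only real API assumed is that pyspark's `SparkSession.builder` exposes `.config(key, value)` and returns a builder.

```python
from typing import Any, Mapping


def apply_spark_defaults(builder: Any, defaults: Mapping[str, Any]) -> Any:
    """Apply each default option to a SparkSession-style builder.

    `builder` can be any object with a `.config(key, value)` method that
    returns a builder, as pyspark's `SparkSession.builder` does.
    """
    for key, value in defaults.items():
        builder = builder.config(key, value)
    return builder


class RecordingBuilder:
    """Stand-in for SparkSession.builder so the sketch runs without pyspark."""

    def __init__(self) -> None:
        self.options: dict[str, Any] = {}

    def config(self, key: str, value: Any) -> "RecordingBuilder":
        self.options[key] = value
        return self


# Two of the options from the spark_session_default_config example.
defaults = {
    "spark.master": "local",
    "spark.ui.enabled": False,
}
builder = apply_spark_defaults(RecordingBuilder(), defaults)
```

With pyspark installed, passing `SparkSession.builder` in place of `RecordingBuilder()` and then calling `.getOrCreate()` on the result would be the equivalent usage.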
Improvements
- Make S3 connection `region` a mandatory option, to prevent possible errors.
- Hide `database_name` from Clickhouse and MySQL connection pages.
- Frontend: add placeholders to connection params, like host, port and so on.
- Sync frontend and backend checks for some field patterns, e.g. table name should be in format `schema.table`.
- Improve OpenAPI schema fields description.
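
As an illustration of the `schema.table` check mentioned above, here is a minimal validator sketch. The exact pattern used by SyncMaster is not given in these notes, so the regular expression below is an assumption: two non-empty parts without whitespace, separated by exactly one dot. The name `is_valid_table_name` is hypothetical.

```python
import re

# Assumed pattern: "<schema>.<table>", each part non-empty and containing
# neither dots nor whitespace. The real SyncMaster pattern may differ.
TABLE_NAME_RE = re.compile(r"[^\s.]+\.[^\s.]+")


def is_valid_table_name(name: str) -> bool:
    """Return True if `name` looks like `schema.table`."""
    return TABLE_NAME_RE.fullmatch(name) is not None
```

Under this assumption `public.users` is accepted, while `users`, `public.` and `a.b.c` are rejected.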
Bug Fixes
Fixed some file format options being ignored by the SyncMaster worker:
- XML: `root_tag`, `row_tag`
- Excel: `start_cell`, `include_header`
- CSV: `include_header`, `line_sep`
- JSON, JSONLine: `line_sep`