Releases · dlt-hub/dlt

17 Dec 18:49

rudolfix

1.5.0

e8c5e9b

1.5.0 Latest


Core Library

After several weeks of experimenting, we are releasing the dataset API. You can now read data in your destination through a neat, unified interface that works the same way for warehouses, relational databases, SQLAlchemy dialects, local and remote files, and Iceberg and Delta tables.
You can use simple dot notation to access tables, execute SQL, or use data frame expressions (compiled to SQL with ibis). We materialize your data as pandas data frames, Arrow tables, or DB-API compatible records (also in batches). Here's the main intro:
https://dlthub.com/docs/general-usage/dataset-access/dataset
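
The "records in batches" idea mentioned above can be illustrated with Python's standard DB-API. This sketch is not the dlt dataset API (see the linked intro for that). It uses the standard-library sqlite3 module as a stand-in destination, and the table name items and the helper iter_batches are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


# Stand-in "destination": an in-memory SQLite database with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item_{i}") for i in range(5)],
)

cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
batches = list(iter_batches(cursor, batch_size=2))
conn.close()

print([len(batch) for batch in batches])
```

Reading in batches keeps memory use bounded by the batch size instead of the table size, which is the reason a batched accessor is useful next to the "give me everything" one.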

Together with this, we are releasing our backend-less, catalog-less (well, an ad hoc technical catalog is created) Iceberg implementation. You can use the append and replace write dispositions, create partitions, and write to a bucket. Be aware of the limitations; we are just starting!
https://dlthub.com/docs/dlt-ecosystem/destinations/delta-iceberg

bump semver to minimum version 3.0.0 by @sh-rp in #2132
leverage ibis expression for getting readablerelations by @sh-rp in #2046
iceberg table format support for filesystem destination by @jorritsandbrink in #2067
fixes dlt init fails in Colab (userdata problem) by @rudolfix in #2117
Add open/closed range arguments for incremental by @steinitzu in #1991
Fix validation error for custom auth classes by @burnash in #2129
add databricks oauth authentication by @donotpush in #2138
make duckdb handle Iceberg table with nested types by @jorritsandbrink in #2141
refresh standalone resources (old columns were recreated) by @rudolfix in #2140
fix ibis az problems on linux by @sh-rp in #2135
does not raise if data type was changed manually in schema by @rudolfix in #2150
allows to --eject source code of the core sources (ie. sql_database) to allow hacking-in customizations by @rudolfix in #2150
convert add_limit to pipe step based limiting by @sh-rp in #2131
Enable datetime format for negative timezone by @hairrrrr in #2155
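
The open/closed range arguments for incremental (#1991, listed above) decide whether rows that sit exactly on a cursor boundary are included. The following is a plain-Python sketch of that boundary logic under stated assumptions, not dlt's implementation. The function in_range and its parameters are hypothetical names.

```python
from typing import List, Optional


def in_range(
    value: int,
    start: Optional[int],
    end: Optional[int],
    start_closed: bool,
    end_closed: bool,
) -> bool:
    """Return True if value lies within [start, end], honoring open/closed ends."""
    if start is not None:
        if value < start or (value == start and not start_closed):
            return False
    if end is not None:
        if value > end or (value == end and not end_closed):
            return False
    return True


rows: List[int] = [1, 2, 3, 4, 5]

# Closed start: the boundary row 2 is included. Open end: row 4 is excluded.
closed_start = [r for r in rows if in_range(r, 2, 4, start_closed=True, end_closed=False)]

# Open start: the boundary row 2 is excluded as well.
open_start = [r for r in rows if in_range(r, 2, 4, start_closed=False, end_closed=False)]

# Both ends closed: both boundary rows are included.
both_closed = [r for r in rows if in_range(r, 2, 4, start_closed=True, end_closed=True)]

print(closed_start, open_start, both_closed)
```

In this sketch an open start never returns the row equal to the previous boundary value, so that row does not need to be deduplicated against an earlier run.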

ℹ️ Note on add_limit: you can now use it to chunk large resources and load them in pieces. We support chunks created based on a maximum number of rows or after a specified time. Please read the docs: your resource should return ordered rows or be able to resume from a checkpoint. Also note that we now apply add_limit after all processing steps (e.g. incremental); previously we limited the generator directly. This change was necessary to implement chunking and is backward compatible with regard to the data produced, but your resource may be queried many times to get a "new" item, i.e. one that is not filtered out by incremental.
https://dlthub.com/docs/examples/backfill_in_chunks
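
Why the resource needs ordered rows or a checkpoint can be seen in a small sketch of row-count chunking. This is plain Python, not dlt code. The names SOURCE_ROWS, read_rows and load_in_chunks are hypothetical, and the linked example shows the actual add_limit usage.

```python
from itertools import islice
from typing import Iterator, List, Optional

# Hypothetical ordered source: row ids 1..10.
SOURCE_ROWS = list(range(1, 11))


def read_rows(after_id: Optional[int]) -> Iterator[int]:
    """Yield rows in order, resuming after the checkpoint if one is given."""
    for row_id in SOURCE_ROWS:
        if after_id is None or row_id > after_id:
            yield row_id


def load_in_chunks(max_rows: int) -> List[List[int]]:
    """Run repeatedly, taking at most max_rows per run, until nothing is left."""
    chunks: List[List[int]] = []
    checkpoint: Optional[int] = None
    while True:
        chunk = list(islice(read_rows(checkpoint), max_rows))
        if not chunk:
            break
        chunks.append(chunk)
        # The last row of the chunk becomes the checkpoint for the next run.
        checkpoint = chunk[-1]
    return chunks


chunks = load_in_chunks(max_rows=4)
print(chunks)
```

If the rows were not ordered, the checkpoint taken from the last row of a chunk would not reliably separate loaded rows from pending ones, and rows could be skipped or loaded twice.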

Docs

prepare dataset release & docs updates by @sh-rp in #2126
Add missing mention of the required endpoint_url config in GCS by @trymzet in #2120
example how to use add_limit to do large backfills in steps by @sh-rp in #2131
Update auth info in databricks docs by @VioletM in #2153
improve how dlt works page by @sh-rp in #2152
explicitly adding docs for destination item size control by @HulmaNaseer in #2118
Docs: rest_api tutorial: update primary key in merge example by @burnash in #2147

Verified Sources

The code was updated to dlt 1.x.x and the tests work again. We are accepting contributions again.
ℹ️ 0.5 sources are on the 0.5 tag. If you are still on dlt 0.5.x, access this tag via dlt init sql_database duckdb --branch 0.5

New Contributors

@HulmaNaseer made their first contribution in #2118
@hairrrrr made their first contribution in #2155

Full Changelog: 1.4.1...1.5.0

Contributors

burnash, steinitzu, and 8 other contributors


02 Dec 21:54

rudolfix

1.4.1

f069071

1.4.1

Bugfixes

We are releasing an important bugfix and defining the identifier normalization behavior for compound identifiers. In practice:

identifiers that contain double underscores will be allowed
all existing schemas (ie. stored at destination) will be set to work in backward-compatible mode

#2087 allows double underscores in identifiers by @rudolfix in #2098
Fixes the usage of escaped JSONPath in incremental cursors in sql_database by @burnash in #2077
Fix/2089 support sets for pyarrow backend by @karakanb in #2090
allow to increase total count on most progress bars, fixes incorrect output in load stage by @sh-rp in #2100

Core Library

Support custom Ollama Host by @Pipboyguy in #2044
feat(rest_api): custom client for specific resources by @joscha in #2082
Support Spatial Types for PostGIS by @Pipboyguy in #1927
Incremental table hints and incremental in resource decorator by @steinitzu in #2033
supports custom account host for azure (#2073) and fixes various edge cases for abfss by @rudolfix

Core Sources

adds engine adapter and passes incremental and engine to query adapter by @rudolfix in #2070

adds engine adapter callback to modify engine settings before connection is opened (hopefully fixes #1920)
allows to return a subquery for table adapter, adds example that fixes #2076
passes Incremental and Engine instances to query adapter callback
allows to return text query from engine adapter #1997
arrow backend now infers not reflected columns from the data

(Still) experimental interfaces

allow to select schema from pipeline dataset factory by @sh-rp in #2075
ibis support - hand over credentials to ibis backend for a number of destinations by @sh-rp in #2004

Docs

Updated sql_database documentation for resource usage by @dat-a-man in #2072
Docs: improve links visibility in light mode by @burnash in #2078
edit snippet to make more runnable (#2066) by @AstrakhantsevaAA in #2079
Docs: update deprecated paginator type in examples by @burnash in #2093
Move "dlt in notebooks" by @AstrakhantsevaAA in #2096
docs: document that path can also be a URL by @joscha in #2099
Fix minor typo :3 by @jdbohrman in #2103
🐛 Fix parquet layout example in the docs by @trymzet in #2105
docs(rest_client): note about data_selector by @joscha in #2101

New Contributors

@joscha made their first contribution in #2099
@karakanb made their first contribution in #2090
@jdbohrman made their first contribution in #2103
@trymzet made their first contribution in #2105

Full Changelog: 1.4.0...1.4.1

Contributors

joscha, burnash, and 9 other contributors


14 Nov 21:15

rudolfix

1.4.0

0fce1c8

1.4.0

Core Library

feat: add incremental lag (attribution window) for datetime, int, and float cursors by @donotpush in #1957
LanceDB - (1) support merge key to merge chunked documents correctly - removes orphaned chunks (2) huge performance upgrade by loading data via arrow by @Pipboyguy in #1620
Move exclude_keys() to dlt.common.utils by @burnash in #1966
Fix BigQueryLoadJob hiding root cause exception by @xneg in #1992
loads secrets from colab userdata and streamlit + bugfixes by @rudolfix in #1994
Fix pagination issue in JSONResponseCursorPaginator with empty string cursor value by @kang8 in #2016
fix: if name of distribution is None by @senickel in #2024
allows to pass default values when writing specs by @rudolfix in #2018
enable delta partitioning on arrow normalizer load id by @jorritsandbrink in #2022
add session token to duckdb s3 secret by @jorritsandbrink in #2007
Add user agent for Databricks by @VioletM in #1987
Fix an incorrect missing dependency error by @burnash in #2001
fix resource level max_table_nesting and normalizer performance tuning by @sh-rp in #2026
move default pipelines of core sources into source folders by @sh-rp in #1888
duckdb filesystem custom secrets by @sh-rp in #2017
allows for empty dataset clickhouse by @rudolfix in #2045
add GCP default credential handling for delta table format by @jorritsandbrink in #2048
enables merges for bigquery autodetect schema by @sh-rp in #2035
logs warning if deduplication state is large by @willi-mueller in #1877
Add core sources extras to requirements in dlt init by @burnash in #2028
Fix merge write disposition for pyarrow and ClickHouse by @burnash in #2042
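
The incremental lag (attribution window) from #1957, listed above, re-reads a window of data before the last stored cursor value so that late-arriving updates are picked up. Below is a minimal sketch of that idea for a datetime cursor, in plain Python rather than dlt's API. The helper effective_start is hypothetical, and the lag is assumed to be given in seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def effective_start(last_value: datetime, lag_seconds: float) -> datetime:
    """Move the start of the next load back by the lag window."""
    return last_value - timedelta(seconds=lag_seconds)


# Cursor value stored after the previous run.
last_seen = datetime(2024, 11, 14, 12, 0, tzinfo=timezone.utc)

# With a one-hour lag, the next run starts reading from 11:00 instead of 12:00.
start = effective_start(last_seen, lag_seconds=3600)

events: List[datetime] = [
    datetime(2024, 11, 14, 10, 30, tzinfo=timezone.utc),  # before the window
    datetime(2024, 11, 14, 11, 15, tzinfo=timezone.utc),  # inside the lag window
    datetime(2024, 11, 14, 12, 30, tzinfo=timezone.utc),  # new data
]
to_load = [event for event in events if event >= start]

print(start.isoformat(), len(to_load))
```

Rows inside the lag window are read again on each run, so a setup like this is usually combined with a merge or deduplication step to avoid duplicates.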

Experimental interfaces

dlt dataset public interface and docs coming next week.

1990 - dataset columns select and limit by @sh-rp in #2000

Docs

Updated databricks destination documentation by @dat-a-man in #1984
Docs: fix capitalization of some terms, fix typos by @burnash in #1988
fix typo by @mariarice15 in #1995
Fix Zendesk example: make test resilient to data changes by @burnash in #1999
fix s3 credentials environment variable names by @seunggs in #2010
remove ga add tm by @alexanderfifefd in #2008
Super fast snippet linting & type checking by @sh-rp in #2019
Fix the deprecation warning in .common.configuration.container by @burnash in #2025
Added deploy with modal. by @dat-a-man in #1805
Updated google cloud function documentation by @dat-a-man in #2034
add warning for large delta memory footprint on filesystem docs page by @sh-rp in #2036
simplify advanced section by @kning in #2037
Added docs on how to deploy a pipeline using Google Cloud run by @dat-a-man in #2038
Format Delta table section in the filesystem destination by @burnash in #2057
Docs: add table formats to the sidebar by @burnash in #2060

New Contributors

@xneg made their first contribution in #1992
@seunggs made their first contribution in #2010
@alexanderfifefd made their first contribution in #2008
@kang8 made their first contribution in #2016
@senickel made their first contribution in #2024
@kning made their first contribution in #2037

Full Changelog: 1.3.0...1.4.0

Contributors

willi-mueller, burnash, and 14 other contributors


22 Oct 08:53

rudolfix

1.3.0

1893860

1.3.0

Core Library

Fix try/except in from_reference shadowing MissingDependencyException by @burnash in #1939
prefers uv over pip if found (when creating virtual envs) by @rudolfix in #1940
allows to plug new or updated dlt cli commands by @sh-rp in #1938
Feat/557 rest api add oauth2clientcredentials to built in auth methods by @willi-mueller in #1871
uses path normalize for columns in arrow tables by @rudolfix in #1947
Added extended jsonpath_ng parser (rest_api) by @francescomucio in #1941
Fix/1897 support https endpoints clickhouse by @sh-rp in #1931
Fix for multiple ignores is not working (rest_api) by @burnash in #1956
SQL Database: Support including/excluding NULL cursor values by @steinitzu in #1946
Add references table hint and reflect them in sql_database by @steinitzu in #1925
only truncate or delete from existing tables in refresh modes by @sh-rp in #1926
adds bigquery partition expiration and motherduck connection string by @rudolfix in #1968

Experimental interfaces

Below we expose the new pipeline._dataset and dlt._dataset interfaces that provide unified access to data loaded into a destination. We also implement a duckdb-based SQL client on the filesystem destination to access data in data lakes. We'll add documentation once we stabilize the dataset interface. However, you can already benefit from the new cursor implementation of sql_client, which lets you fetch data frames and Arrow tables, also in batches:

dataset factory by @sh-rp in #1945
expose readable datasets as dataframes and arrow tables by @sh-rp in #1507

The PRs below add pluggy and the first few plugin hooks. The idea is to make a lot of functionality in dlt pluggable. Currently you can plug in a new CLI command (or upgrade an existing one), and you can also plug in your own runtime environment (how dlt looks for data, secrets, etc.).

adds registries and plugins by @rudolfix in #1894
unifies run configuration and run context by @rudolfix in #1944

Docs

Update url in deploy-with-airflow-composer.md by @FriedrichtenHagen in #1942
Added info about backend kwargs in pyarrow by @dat-a-man in #1903
Docs: sync styles with dlthub by @burnash in #1936
Docs: styles: remove underline for cards in dark mode by @burnash in #1967

New Contributors

@FriedrichtenHagen made their first contribution in #1942

Full Changelog: 1.2.0...1.3.0

Contributors

willi-mueller, burnash, and 6 other contributors


07 Oct 21:10

rudolfix

1.2.0

8798c17

1.2.0

Core Library

Sqlalchemy merge support by @steinitzu in #1842
Fix config sections for synching destinations and accessing destination clients by @sh-rp in #1887
incremental scd2 with merge_key by @jorritsandbrink in #1818
fix: UUIDs are not an unknown data type (logging) by @neuromantik33 in #1914
fix: PageNumberPaginator not reset when iterating through multiple pa… by @paul-godhouse in #1924
Feat/1922 rest api source add multiple path parameters by @TheOneTrueAnt in #1923
enables gcs staging for databricks by @rudolfix in #1933

Docs

Update weaviate reference by @emmanuel-ferdman in #1896
Docs: Add sftp option for filesystem source by @VioletM in #1845
Update installation.md by @erikjamesmason in #1899
Added troubleshooting section to filesystem docs by @dat-a-man in #1900
Docs: make naming consistent in the cloud storage & file system source by @burnash in #1835
Docs: add section on resolving multiple path parameters by @burnash in #1929

New Contributors

@emmanuel-ferdman made their first contribution in #1896
@erikjamesmason made their first contribution in #1899
@neuromantik33 made their first contribution in #1914
@paul-godhouse made their first contribution in #1924

Full Changelog: 1.1.0...1.2.0

Contributors

burnash, TheOneTrueAnt, and 10 other contributors


26 Sep 13:33

rudolfix

1.1.0

d2b6d05

1.1.0

What's Changed

fix intermittent delta panic issue by @jorritsandbrink in #1832
Sqlalchemy staging dataset support and docs by @steinitzu in #1841
rest_api: allow specifying custom session (feat/1843) by @willi-mueller in #1844
Allows any duckdb version, fixes databricks az credentials by @rudolfix in #1854
Fix/1849 Do Not Parse Ignored Empty Responses by @TheOneTrueAnt in #1851
feat: filesystem delete old pipeline state files by @donotpush in #1838
supports adding DltResource in RESTAPIConfig dict by @willi-mueller in #1865
Fix/1858 make all connection string credentials optional by @rudolfix in #1867

Docs

sqlalchemy destination docs by @steinitzu in #1841
Docs: move REST API helpers to the REST API category by @burnash in #1852
Docs: rest_api: document processing_steps by @burnash in #1872
Fix the paginator's doc heading by @burnash in #1869

Verified Sources

Custom filter clauses supported, pyarrow/arrowmongo requirement optional for Mongo by @Pipboyguy

New Contributors

@TheOneTrueAnt made their first contribution in #1851

Full Changelog: 1.0.0...1.1.0

Contributors

willi-mueller, burnash, and 6 other contributors


16 Sep 15:07

rudolfix

1.0.0

e48f641

1.0.0

This is a major dlt release. Please check the list of breaking changes and deprecations: #1778

Core Library

move rest_api, sql_database and filesystem sources to dlt core by @willi-mueller in #1728
drops foreign_key, adds nested references (row_key - parent_key) by @rudolfix in #1774
deprecates complex data type, changes to json by @rudolfix in #1792
Feat/1749 abort load package and raise exception on terminal errors in jobs by @willi-mueller in #1781
Feat/1492 extend timestamp config to handle naive timestamps (without timezone) by @donotpush in #1669
Fix/1571 Incremental: Optionally load or ignore/exclude/include records with cursor_path missing or None value by @willi-mueller in #1576
creates a single source in extract for all resource instances passed as list by @rudolfix in #1535
Enable BigQuery schema auto-detection with partitioning and clustering hints by @Pipboyguy in #1806
Sqlalchemy destination (merge support and docs still in progress) by @steinitzu in #1734
Feat/1730 extend filesystem sftp by @donotpush in #1769
Stops dumping secrets to dlt traces. by @willi-mueller in #1797
Don't use Custom Embedding Functions on LanceDB by @Pipboyguy in #1771
sets default concurrency for blob upload for adlfs to 1 to avoid massive memory usage on large files by @rudolfix in #1779
Fix/1790 support incremental load with arrow when cursor column is not nullable by @willi-mueller in #1791
controls row group size and empty tables in memory buffer when writing parquet by @rudolfix in #1782
fix installation command by @novica in #1741
skips tables without jobs when merging delta tables by @rudolfix in #1803

Docs

display past versions of the documentation (0.5.x / 1.0.0 / devel) by @sh-rp in #1770
Refactor filesystem doc by @VioletM in #1745
Update REST API docs by @akelad in #1795
Add filesystem tutorial by @VioletM in #1775
adding the sql_database tutorial by @rahuljo in #1796
structural and content changes to the sql_database doc by @rahuljo in #1623
Docs: update the introduction, add the rest_api tutorial by @burnash in #1729
Docs/update deploy dagster by @mariarice15 in #1761
Correct wrong code example for apply_hints( incremental(xx) ) by @w0ut0 in #1785
Moves sources and destinations to the top level in docs navigation by @VioletM in #1750
Fix typo "frequenly" by @ruudwelten in #1800
Reorder sidebar by @mariarice15 in #1787

New Contributors

@novica made their first contribution in #1741
@mariarice15 made their first contribution in #1761
@w0ut0 made their first contribution in #1785
@ruudwelten made their first contribution in #1800

Full Changelog: 0.5.4...1.0.0

Contributors

novica, willi-mueller, and 12 other contributors


28 Aug 20:02

rudolfix

0.5.4

9857029

0.5.4

Core Library

BigQuery project_id may be different from credentials project_id by @VioletM in #1680
Enable schema evolution for merge write disposition with delta table format by @jorritsandbrink in #1742
Add storage_options to DeltaTable.create by @jorritsandbrink in #1686
Fix delta table dangling Parquet file bug by @jorritsandbrink in #1695
Add delta table partitioning support by @jorritsandbrink in #1696
fixes load job counter displayed in progress by @rudolfix in #1702
RESTClient: stops pagination after empty page (Feat/1637) by @willi-mueller in #1677
Enable scd2 record reinsert by @jorritsandbrink in #1707
scd2 custom "valid from" / "valid to" value feature by @jorritsandbrink in #1709
feat/1681 collects load job metrics and adds remote url to traces by @rudolfix in #1708
locks trace format with a contract by @rudolfix in #1708
Feat/1711 create with not exists for dlt tables to reduce race conditions by @rudolfix in #1740
provides detail exception messages when cursor stored value cannot be coerced to data by @rudolfix in #1748
Allows to configure if staging destination is truncated or left intact to config by @VioletM in #1717
enables external location and named credential in databricks, allows abfss://container@account Azure urls by @rudolfix in #1755
fixes #1703 and #1754 by @rudolfix in #1755

Docs

rest_api: documents pluggable custom auth by @willi-mueller in #1690
Update Snowflake docs by @akelad in #1747
Docs/issue 1661 add tip to source docs and update weaviate docs by @dat-a-man in #1662
Add custom parent-child relationships example by @dat-a-man in #1678
Correct the library name for mem stats to psutil by @deepyaman in #1733
Replaced "full_refresh" with "dev_mode" by @dat-a-man in #1735

New Contributors

@deepyaman made their first contribution in #1733

Full Changelog: 0.5.3...0.5.4

Contributors

willi-mueller, VioletM, and 5 other contributors


13 Aug 00:20

rudolfix

0.5.3

19c41ea

0.5.3

Core Library

Add support for continuously starting load jobs as slots free up in the loader. This will significantly speed up loading packages with many files. by @sh-rp in #1494
Add get_delta_tables helper function to optimize and vacuum tables by @jorritsandbrink in #1664
Raise/warn on incomplete columns in normalize by @steinitzu in #1504
Add enable_dataset_name_normalization option by @VioletM in #1676
updates duckdb/motherduck load job to match parquet by column names by @rudolfix in #1674
updates duckdb/motherduck load job to fully allow jsonl file format by @rudolfix in #1674
removes internal locks when loading parquet from multiple threads (duckdb got fixed) #1674
enables multi transactions statements for Motherduck #1674
fixes dbt logs line endings

Docs

Updated config and credentials docs by @VioletM in #1508
Update salesforce.md by @makies in #1665

Verified Sources

Column selector added to sql_database @steinitzu

New Contributors

@makies made their first contribution in #1665

Full Changelog: 0.5.2...0.5.3

Contributors

makies, steinitzu, and 4 other contributors


02 Aug 19:18

rudolfix

0.5.2

e00baa0

0.5.2

Core Library

Add upsert merge strategy for Postgres and Snowflake by @jorritsandbrink in #1466
Add basic upsert support for delta table format in filesystem destination by @jorritsandbrink in #1600
query tagging for snowflake by @rudolfix in #1582
Support Open Source ClickHouse Deployments (MergeTree engine and more) by @Pipboyguy in #1496
allows nested types in BigQuery via native autodetect_schema by @rudolfix in #1591
Enable upsert merge strategy for more SQL destinations (Athena, BigQuery, Databricks, mssql) by @jorritsandbrink in #1628
Fix/1512 fixes current.pipeline() access by @rudolfix in #1581
feat: add config dataset_name_prefix to set custom staging dataset name by @donotpush in #1563
fix: add airflow db reset for all tests by @donotpush in #1559
Enable S3 compatible storage for delta table format by @jorritsandbrink in #1586
feat/1495 rest_client: renames JSONResponsePaginator to JSONLinkPaginator by @willi-mueller in #1558
Feat/1596 adds custom config providers + example of yaml config provider supporting profiles and jinja placeholders by @rudolfix in #1642
Feat/1583 rest client session timeout configuration by @willi-mueller in #1590
Add clarification for add_limit by @VioletM in #1594
Fix/1606 fixes validator incremental step order to keep it always last in the pipe by @rudolfix in #1641
Feat/1593 rest_client: allow setting of request kwargs by @willi-mueller in #1609
prevent accidental wrapping of sources in resources when using adapters by @sh-rp in #1645
Add empty source handling for delta table format on filesystem destination by @jorritsandbrink in #1617
Surface original err msg from pydantic as extended_info on DataValidationError by @codingcyclist in #1569
fix(dockerfile): remove extra spaces around equals sign in LABEL inst… by @thisisdope in #1573
Qdrant uncommitted state restore and test by @steinitzu in #1545
fix: suppress alembic logs for tests by @donotpush in #1578

Docs

Document sql source reflection level and type adapter by @steinitzu in #1467
Add to docs configuring file format options by @VioletM in #1543
Added how dlt uses arrow by jorrit by @dat-a-man in #1577
docs/514 rest_api: docs on pluggable paginators by @willi-mueller in #1557
docs: documents new convert parameter in rest_api source incremental config by @willi-mueller in #1649
Docs/1571 docs on handling NULL values at incremental cursor path by @willi-mueller in #1650
Add note that pg_replication doesn't support scd2 by @akelad in #1608
docs/505 updates documentation on custom hooks in response_actions by @willi-mueller in #1524

New Contributors

@donotpush made their first contribution in #1559
@thisisdope made their first contribution in #1573
@akelad made their first contribution in #1608

Full Changelog: 0.5.1...0.5.2

Contributors

willi-mueller, steinitzu, and 10 other contributors

