diff --git a/docs/website/docs/dlt-ecosystem/destinations/postgres.md b/docs/website/docs/dlt-ecosystem/destinations/postgres.md
index 922b187a7e..4e20e9abfe 100644
--- a/docs/website/docs/dlt-ecosystem/destinations/postgres.md
+++ b/docs/website/docs/dlt-ecosystem/destinations/postgres.md
@@ -70,7 +70,7 @@ To pass credentials directly, use the [explicit instance of the destination](../
 pipeline = dlt.pipeline(
     pipeline_name='chess',
     destination=dlt.destinations.postgres("postgresql://loader:@localhost/dlt_data"),
-    dataset_name='chess_data'
+    dataset_name='chess_data'  # your destination schema name
 )
 ```
 
diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md
index b783f64c0a..273aa80a17 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md
@@ -18,6 +18,10 @@ import Header from '../_source-info-header.md';
 
 Read more about sources and resources here: [General usage: source](../../../general-usage/source.md) and [General usage: resource](../../../general-usage/resource.md).
 
+:::note
+For the complete list of source arguments for `sql_database`, [refer to this section](#arguments-for-sql_database-source).
+:::
+
 ### Example usage:
 
 :::tip
@@ -344,3 +348,55 @@ print(info)
 ```
 
 With the dataset above and a local PostgreSQL instance, the `ConnectorX` backend is 2x faster than the `PyArrow` backend.
+
+### Arguments for `sql_database` source
+
+The following arguments can be used with the `sql_database` source:
+
+- `credentials` (Union[ConnectionStringCredentials, Engine, str]): Database credentials or an `sqlalchemy.Engine` instance.
+- `schema` (Optional[str]): Name of the database schema to load (if different from the default).
+- `metadata` (Optional[MetaData]): Optional `sqlalchemy.MetaData` instance. The `schema` argument is ignored when this is used.
+- `table_names` (Optional[List[str]]): A list of table names to load. By default, all tables in the schema are loaded.
+- `chunk_size` (int): Number of rows yielded in one batch. SQLAlchemy will create an additional internal row buffer twice the chunk size.
+- `backend` (TableBackend): Type of backend used to generate table data. One of "sqlalchemy", "pyarrow", "pandas", or "connectorx".
+  - "sqlalchemy" yields batches as lists of Python dictionaries, "pyarrow" and "connectorx" yield batches as Arrow tables, and "pandas" yields pandas DataFrames.
+  - "sqlalchemy" is the default and does not require additional dependencies.
+  - "pyarrow" creates stable destination schemas with correct data types.
+  - "connectorx" is typically the fastest, but it ignores `chunk_size`, so you must handle large tables yourself.
+- `detect_precision_hints` (bool): Deprecated; use `reflection_level` instead. Sets column precision and scale hints for supported data types in the target schema based on the columns in the source tables. Disabled by default.
+- `reflection_level` (ReflectionLevel): Specifies how much information should be reflected from the source database schema.
+  - "minimal": Only table names, nullability, and primary keys are reflected. Data types are inferred from the data. This is the default option.
+  - "full": Data types are reflected on top of "minimal". `dlt` will coerce the data into the reflected types if necessary.
+  - "full_with_precision": Sets precision and scale on supported data types (i.e., decimal, text, binary). Creates big and regular integer types.
+- `defer_table_reflect` (bool): Connects and reflects the table schema only when yielding data. Requires `table_names` to be explicitly passed. Enable this option when running on Airflow. Available on dlt 0.4.4 and later.
+- `table_adapter_callback` (Callable): Receives each reflected table. May be used to modify the list of columns that will be selected.
+- `backend_kwargs` (**kwargs): kwargs passed to the table backend, e.g., "conn" is used to pass a specialized connection string to ConnectorX.
+- `include_views` (bool): Reflects views as well as tables. Note that view names included in `table_names` are always included regardless of this setting. Set to `False` by default.
+- `type_adapter_callback` (Optional[Callable]): Callable to override type inference when reflecting columns. The argument is a single sqlalchemy data type (a `TypeEngine` instance), and it should return another sqlalchemy data type or `None` (the type will then be inferred from the data).
+- `query_adapter_callback` (Optional[Callable]): Callable to override the SELECT query used to fetch data from the table. The callback receives the sqlalchemy `Select` and the corresponding `Table`, `Incremental`, and `Engine` objects, and should return the modified `Select` or `Text`.
+- `resolve_foreign_keys` (bool): Translates foreign keys in the same schema to `references` table hints. May incur additional database calls, as all referenced tables are reflected.
+- `engine_adapter_callback` (Callable[[Engine], Engine]): Callback to configure or modify the `Engine` instance that will be used to open a connection, e.g., to set the transaction isolation level.
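+
+For example, a minimal pipeline combining several of these arguments might look like the sketch below. It assumes the publicly accessible Rfam MySQL demo database, the `pymysql` driver and `pyarrow` installed, and a local DuckDB destination; the table, pipeline, and dataset names are illustrative, so adjust them for your setup.
+
+```py
+import dlt
+# Import path for the built-in source in recent dlt versions; the standalone
+# verified source is imported from your local `sql_database` module instead.
+from dlt.sources.sql_database import sql_database
+
+# Load two tables, let pyarrow build precise column types,
+# and defer schema reflection until data is actually yielded.
+source = sql_database(
+    credentials="mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam",
+    table_names=["family", "clan"],
+    backend="pyarrow",
+    reflection_level="full_with_precision",
+    chunk_size=10000,
+    defer_table_reflect=True,
+)
+
+pipeline = dlt.pipeline(
+    pipeline_name="rfam",
+    destination="duckdb",
+    dataset_name="rfam_data",
+)
+print(pipeline.run(source))
+```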
diff --git a/docs/website/docs/walkthroughs/adjust-a-schema.md b/docs/website/docs/walkthroughs/adjust-a-schema.md
index d2a2bfa8c8..ec3bcdf6ff 100644
--- a/docs/website/docs/walkthroughs/adjust-a-schema.md
+++ b/docs/website/docs/walkthroughs/adjust-a-schema.md
@@ -36,8 +36,8 @@ schemas
 |---export/
 ```
 
-Rather than providing the paths in the `dlt.pipeline` function, you can also set them
-in the `config.toml` file:
+Rather than providing the paths in the `dlt.pipeline` function, you can also set them at
+the beginning of the `config.toml` file:
 
 ```toml
 export_schema_path="schemas/export"
@@ -74,10 +74,11 @@ You should keep the import schema as simple as possible and let `dlt` do the res
 In the next steps, we'll experiment a lot; you will be warned to set `dev_mode=True` until we are done experimenting.
 
 :::caution
-`dlt` will **not modify** tables after they are created.
-So if you have a YAML file, and you change it (e.g., change a data type or add a hint),
-then you need to **delete the dataset**
-or set `dev_mode=True`:
+`dlt` does **not modify** existing columns in a table after creation. While new columns can be added, changes to existing
+columns (such as altering data types or adding hints) will not take effect automatically.
+
+If you modify a YAML schema file, you must either delete the dataset, enable `dev_mode=True`, or use one of the pipeline
+[refresh options](../general-usage/pipeline#refresh-pipeline-data-and-state) to apply the changes.
 ```py
 dlt.pipeline(
     import_schema_path="schemas/import",