-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/2372 extend dataset querying API for incremental load #2386
Open
zilto
wants to merge
13
commits into
devel
Choose a base branch
from
feat/2372-incremental-dataset
base: devel
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+436
−47
Open
Changes from 8 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
918635d
extend dataset querying
c9623ba
fix linting
db01d41
return SupportsReadableRelation
854a921
handle None for status kwarg
9276980
refactor; enable root_key filtering
c5afb26
added ibis filters; load_table attr
2626450
added filtering funcs to raw SQL
72c2dcc
use dataset method to access table
2e5aed5
removing uncessary methods
f724f7c
modified ibis proxy access pattern to meet Ibis API
070200a
moved from subqueries filter to root_key joins
eee0ffb
clarified tests; linting
e28c402
fixing superclass signature; remove unused func/methods
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,16 @@ | ||
from typing import Any, Union, TYPE_CHECKING, List | ||
|
||
from dlt.common.json import json | ||
|
||
from dlt.common.exceptions import MissingDependencyException | ||
|
||
from dlt.common.destination.reference import TDestinationReferenceArg, Destination | ||
from dlt.common.destination.client import JobClientBase, WithStateSync | ||
from dlt.common.destination.dataset import SupportsReadableRelation, SupportsReadableDataset | ||
from dlt.common.destination.reference import TDestinationReferenceArg, Destination | ||
from dlt.common.destination.typing import TDatasetType | ||
|
||
from dlt.destinations.sql_client import SqlClientBase, WithSqlClient | ||
from dlt.common.exceptions import MissingDependencyException | ||
from dlt.common.json import json | ||
from dlt.common.schema import Schema | ||
from dlt.common.schema.utils import get_root_table | ||
from dlt.destinations.dataset.relation import ReadableDBAPIRelation | ||
from dlt.destinations.dataset.utils import get_destination_clients | ||
from dlt.destinations.sql_client import SqlClientBase, WithSqlClient | ||
|
||
if TYPE_CHECKING: | ||
try: | ||
|
@@ -121,6 +119,49 @@ def _ensure_client_and_schema(self) -> None: | |
def __call__(self, query: Any) -> ReadableDBAPIRelation: | ||
return ReadableDBAPIRelation(readable_dataset=self, provided_query=query) # type: ignore[abstract] | ||
|
||
def _filter_using_root_table( | ||
zilto marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self, | ||
table_name: str, | ||
key: str, | ||
values_to_include: Union[list[Any], list[ReadableDBAPIRelation], ReadableDBAPIRelation], | ||
) -> ReadableDBAPIRelation: | ||
"""Filter the root table and propagate to current table. | ||
|
||
Case 1: current table is root, then filter an return | ||
Case 2: current table is not root, then join on root key | ||
|
||
NOTE. The Ibis backend supports a 3rd case, where parent-child relationships are traversed | ||
Such recursive query is difficult to express in raw SQL. | ||
""" | ||
# Case 1: if current table is root table, you can filter directly and exit early | ||
dlt_root_table = get_root_table(self.schema.tables, table_name) | ||
if dlt_root_table["name"] == self.table_name: | ||
# TODO use secure SQL interpolation | ||
# TODO use `sql_client` utils to use fully qualified names that respect destination | ||
query = f"SELECT * FROM {table_name} WHERE {key} in {values_to_include}" | ||
return self(query) | ||
|
||
# Case 2: There is a root_key to join root with current table | ||
# (e.g., root_key=True on Source, write_disposition="merge") | ||
root_key = None | ||
for col_name, col in self.columns_schema.items(): | ||
if col.get("root_key") is True: | ||
root_key = col_name | ||
|
||
if root_key is not None: | ||
# TODO improve error handling | ||
raise ValueError(f"Table `{table_name}` has no `root_key` to join with the load table.") | ||
|
||
root_row_key = next( | ||
col_name for col_name, col in dlt_root_table["columns"].items() if col.get("row_key") is True | ||
) | ||
query = ( | ||
"SELECT * " | ||
f"FROM {table_name}" | ||
f"WHERE current.{root_key} IN SELECT {root_row_key} FROM {dlt_root_table['name']})" | ||
) | ||
return self(query) | ||
|
||
def table(self, table_name: str) -> SupportsReadableRelation: | ||
# we can create an ibis powered relation if ibis is available | ||
if table_name in self.schema.tables and self._dataset_type in ("auto", "ibis"): | ||
|
@@ -133,6 +174,7 @@ def table(self, table_name: str) -> SupportsReadableRelation: | |
readable_dataset=self, | ||
ibis_object=unbound_table, | ||
columns_schema=self.schema.tables[table_name]["columns"], | ||
table_name=table_name, | ||
) | ||
except MissingDependencyException: | ||
# if ibis is explicitly requested, reraise | ||
|
@@ -171,6 +213,78 @@ def row_counts( | |
# Execute query and build result dict | ||
return self(query) | ||
|
||
def list_load_ids( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see above |
||
self, status: Union[int, list[int], None] = 0, limit: Union[int, None] = 10 | ||
) -> SupportsReadableRelation: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this should be implemented as ibis expression and raise if ibis is not installed |
||
"""Return the list most recent `load_id`s in descending order. | ||
|
||
If no `load_id` is found, return empty list. | ||
""" | ||
# TODO protect from SQL injection | ||
query = "SELECT load_id FROM {self.schema.loads_table_name}" | ||
|
||
if status is not None: | ||
status_list = [status] if isinstance(status, int) else status | ||
query += f"WHERE status IN {status_list}" | ||
|
||
query += "ORDER BY load_id DESC" | ||
|
||
if limit is not None: | ||
query += f"LIMIT {limit}" | ||
|
||
return self(query) | ||
|
||
def latest_load_id(self, status: Union[int, list[int], None] = 0) -> SupportsReadableRelation: | ||
"""Return the latest `load_id`.""" | ||
query = "SELECT max(load_id) FROM {self.schema.loads_table_name}" | ||
if status is not None: | ||
status_list = [status] if isinstance(status, int) else status | ||
query += f"WHERE status IN {status_list}" | ||
|
||
return self(query) | ||
|
||
def filter_by_load_ids(self, table_name: str, load_ids: Union[str, list[str]]) -> ReadableDBAPIRelation: | ||
"""Filter on matching `load_ids`.""" | ||
load_ids = [load_ids] if isinstance(load_ids, str) else load_ids | ||
return self._filter_using_root_table( | ||
table_name=table_name, key="_dlt_load_id", values_to_include=load_ids | ||
) | ||
|
||
def filter_by_load_status( | ||
self, table_name: str, status: Union[int, list[int], None] = 0 | ||
) -> ReadableDBAPIRelation: | ||
"""Filter to rows with a specific load status.""" | ||
if status is None: | ||
return self | ||
|
||
load_ids = [row[0] for row in self.list_load_ids(status=status).fetchall()] | ||
return self._filter_using_root_table( | ||
table_name=table_name, key="_dlt_load_id", values_to_include=load_ids | ||
) | ||
|
||
def filter_by_latest_load_id( | ||
self, table_name: str, status: Union[int, list[int], None] = 0 | ||
) -> ReadableDBAPIRelation: | ||
"""Filter on the most recent `load_id` with a specific load status. | ||
|
||
If `status` is None, don't filter by status. | ||
""" | ||
latest_load_id = self.latest_load_id(status=status).fetchall()[0][0] | ||
return self._filter_using_root_table( | ||
table_name=table_name, key="_dlt_load_id", values_to_include=[latest_load_id] | ||
) | ||
|
||
def filter_by_load_id_gt( | ||
self, table_name: str, load_id: str, status: Union[int, list[int], None] = 0 | ||
) -> ReadableDBAPIRelation: | ||
load_ids = [ | ||
row[0] for row in self.list_load_ids(status=status, limit=None).fetchall() | ||
if row[0] > load_id | ||
] | ||
return self._filter_using_root_table( | ||
table_name=table_name, key="_dlt_load_id", values_to_include=load_ids | ||
) | ||
|
||
def __getitem__(self, table_name: str) -> SupportsReadableRelation: | ||
"""access of table via dict notation""" | ||
return self.table(table_name) | ||
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need this anymore, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it for now, but we might want it if we want to support these joins when
root_key=True
is missing