We need a couple of extensions to our readable datasets to interact with the data in the context of dlt loads. We want to be able to retrieve all load ids that the dataset knows about, sort them by loading timestamp, filter them by success status, and filter the rows of a table by selected load ids. In short, we need the following functions:
On ReadableDBAPIDataset:
def list_load_ids(status: Union[int, List[int]] = 0, limit: int = 10) - should list all load ids of the dataset with the given status (may be a list), ordered by load_id descending and limited by limit. Be aware that the right schema needs to be selected.
def last_load_id(status: Union[int, List[int]] = 0) - same as above, but limited to 1.
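A minimal sketch of the query these two functions could issue, assuming the loads table is named `_dlt_loads` and has `load_id`, `schema_name` and `status` columns. It uses an in-memory SQLite database as a stand-in for a real destination; the `conn` and `schema_name` parameters and the sample data are illustrative assumptions, not the proposed method signatures.

```python
import sqlite3
from typing import List, Optional, Union


def list_load_ids(
    conn: sqlite3.Connection,
    schema_name: str,
    status: Union[int, List[int]] = 0,
    limit: int = 10,
) -> List[str]:
    """Return load ids of one schema with the given status(es), newest first."""
    statuses = [status] if isinstance(status, int) else list(status)
    if not statuses:
        return []
    placeholders = ", ".join("?" for _ in statuses)
    query = (
        "SELECT load_id FROM _dlt_loads "
        f"WHERE schema_name = ? AND status IN ({placeholders}) "
        "ORDER BY load_id DESC LIMIT ?"
    )
    rows = conn.execute(query, [schema_name, *statuses, limit]).fetchall()
    return [row[0] for row in rows]


def last_load_id(
    conn: sqlite3.Connection,
    schema_name: str,
    status: Union[int, List[int]] = 0,
) -> Optional[str]:
    """Same as list_load_ids, but limited to the single most recent load id."""
    load_ids = list_load_ids(conn, schema_name, status, limit=1)
    return load_ids[0] if load_ids else None


# Hypothetical sample data: three loads for schema "shop", one for "other".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _dlt_loads (load_id TEXT, schema_name TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO _dlt_loads VALUES (?, ?, ?)",
    [
        ("1700000001.0", "shop", 0),
        ("1700000002.0", "shop", 1),
        ("1700000003.0", "shop", 0),
        ("1700000004.0", "other", 0),
    ],
)

print(list_load_ids(conn, "shop"))       # loads with status 0 for "shop"
print(last_load_id(conn, "shop", [0, 1]))  # most recent load with status 0 or 1
```

Filtering on `schema_name` is how this sketch addresses the "right schema" remark: loads that belong to other schemas in the same dataset are excluded.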
On ReadableIbisRelation:
filter_by_load_ids(load_ids: List[str]) - Should modify the ibis query so the result only contains rows of the given load ids. In the case of child tables, you'll need to join to the root table and filter by load id there.
filter_by_last_load_id(status: Union[int, List[int]] = 0) - Filter by most recent load id that matches a given status.
filter_by_load_status(status: Union[int, List[int]] = 0) - Filter by rows that are associated with a load with a given status. If a row cannot be matched with an entry in the loads table, assign it a status of -1 (meaning failed) for the purpose of filtering.
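The join logic behind these filters can be sketched in plain SQL. This is only an illustration under assumptions: it runs against in-memory SQLite rather than building an ibis expression, it assumes the root table carries `_dlt_load_id` and `_dlt_id` and that a direct child table references its parent through `_dlt_parent_id`, and the tables `orders` and `orders__items` as well as both function names are hypothetical.

```python
import sqlite3
from typing import List, Union

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE _dlt_loads (load_id TEXT, schema_name TEXT, status INTEGER);
    CREATE TABLE orders (id INTEGER, _dlt_load_id TEXT, _dlt_id TEXT);
    CREATE TABLE orders__items (sku TEXT, _dlt_parent_id TEXT, _dlt_id TEXT);
    INSERT INTO _dlt_loads VALUES
        ('1700000001.0', 'shop', 0),
        ('1700000002.0', 'shop', 1);
    -- order 3 belongs to a load that never made it into _dlt_loads
    INSERT INTO orders VALUES
        (1, '1700000001.0', 'a'),
        (2, '1700000002.0', 'b'),
        (3, '1700000003.0', 'c');
    INSERT INTO orders__items VALUES
        ('x', 'a', 'i1'),
        ('y', 'b', 'i2'),
        ('z', 'c', 'i3');
    """
)


def child_rows_by_load_ids(conn: sqlite3.Connection, load_ids: List[str]) -> List[str]:
    """Filter a child table by joining to the root table, which holds _dlt_load_id."""
    if not load_ids:
        return []
    placeholders = ", ".join("?" for _ in load_ids)
    query = (
        "SELECT c.sku FROM orders__items AS c "
        "JOIN orders AS r ON c._dlt_parent_id = r._dlt_id "
        f"WHERE r._dlt_load_id IN ({placeholders}) ORDER BY c.sku"
    )
    return [row[0] for row in conn.execute(query, load_ids)]


def root_rows_by_load_status(
    conn: sqlite3.Connection, status: Union[int, List[int]] = 0
) -> List[int]:
    """Filter root rows by load status; rows without a loads entry count as -1."""
    statuses = [status] if isinstance(status, int) else list(status)
    if not statuses:
        return []
    placeholders = ", ".join("?" for _ in statuses)
    query = (
        "SELECT r.id FROM orders AS r "
        "LEFT JOIN _dlt_loads AS l ON r._dlt_load_id = l.load_id "
        f"WHERE COALESCE(l.status, -1) IN ({placeholders}) ORDER BY r.id"
    )
    return [row[0] for row in conn.execute(query, statuses)]


print(child_rows_by_load_ids(conn, ["1700000001.0"]))  # ['x']
print(root_rows_by_load_status(conn, -1))              # [3]
```

The `LEFT JOIN` combined with `COALESCE(l.status, -1)` is what turns an unmatched row into a failed one. Deeper child tables would need either a chain of parent joins or a direct reference to the root row, which this sketch does not cover.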
I think it should be enough to support the above only if ibis datasets are available. Ensure that useful exceptions are raised if this is not the case; I think they are probably good already, but I am not sure. The same goes for tables that do not have the dlt load id column, which can happen in some configurations. But here, too, I think ibis will tell you that the columns are not there.
We need to test this on all destinations, same as all the other dataset read access tests.