Extensions to ReadableDBAPIDataset and ReadableIbisRelation #2372

Open
sh-rp opened this issue Mar 4, 2025 · 0 comments · May be fixed by #2386
sh-rp commented Mar 4, 2025

Goal:

We need a few extensions to our readable datasets so that users can work with the data in the context of dlt loads. We want to be able to retrieve all load ids that the dataset knows about, sort by loading timestamp, filter by success status, and filter the rows of a table by selected load ids. In short, we need the following functions:

On ReadableDBAPIDataset:

  • def list_load_ids(status: Union[int, List[int]] = 0, limit: int = 10) - Should list all load ids of the dataset with the given status (may be a list), ordered by load_id descending and limited by limit. Be aware that the right schema needs to be selected.
  • def last_load_id(status: Union[int, List[int]] = 0) - Same as above, but limited to 1.
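
To make the intended semantics concrete, here is a minimal sketch of the two dataset functions written against plain sqlite3 rather than the real ReadableDBAPIDataset. It assumes the standard _dlt_loads table with load_id, schema_name and status columns, and it reads "the right schema" as a filter on schema_name, which is only one possible interpretation. The conn and schema_name parameters and the make_demo_connection helper are hypothetical and exist only for this illustration.

```python
import sqlite3
from typing import List, Optional, Union


def list_load_ids(
    conn: sqlite3.Connection,
    schema_name: str,
    status: Union[int, List[int]] = 0,
    limit: int = 10,
) -> List[str]:
    """Return load ids with the given status(es), newest first."""
    statuses = [status] if isinstance(status, int) else list(status)
    if not statuses:
        return []
    marks = ", ".join("?" for _ in statuses)
    query = (
        "SELECT load_id FROM _dlt_loads "
        f"WHERE schema_name = ? AND status IN ({marks}) "
        "ORDER BY load_id DESC LIMIT ?"
    )
    rows = conn.execute(query, [schema_name, *statuses, limit]).fetchall()
    return [row[0] for row in rows]


def last_load_id(
    conn: sqlite3.Connection,
    schema_name: str,
    status: Union[int, List[int]] = 0,
) -> Optional[str]:
    """Same as list_load_ids, but limited to the most recent load id."""
    ids = list_load_ids(conn, schema_name, status, limit=1)
    return ids[0] if ids else None


def make_demo_connection() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for a destination with a loads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE _dlt_loads (load_id TEXT, schema_name TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO _dlt_loads VALUES (?, ?, ?)",
        [
            ("1741000001.0", "demo", 0),
            ("1741000002.0", "demo", 1),
            ("1741000003.0", "demo", 0),
            ("1741000004.0", "other", 0),  # belongs to a different schema
        ],
    )
    return conn
```

Ordering by load_id works here because the demo ids are equal-length timestamp strings; a real implementation may prefer to order on the load timestamp column instead.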

On ReadableIbisRelation:

  • filter_by_load_ids(load_ids: List[str]) - Should modify the ibis query so that the result only contains rows of the given load_ids. In the case of child tables, you'll need to join to the root table and filter by load_id there.
  • filter_by_last_load_id(status: Union[int, List[int]] = 0) - Filter by the most recent load id that matches a given status.
  • filter_by_load_status(status: Union[int, List[int]] = 0) - Filter to rows that are associated with a load with a given status. If a row cannot be matched with an entry in the loads table, assign it a status of -1 (meaning failed) for the purpose of filtering.
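
The join logic behind these filters can be sketched in plain SQL via sqlite3; this is not the ibis implementation the issue asks for, only an illustration of the expected results. It assumes a root table with _dlt_id and _dlt_load_id columns and a direct child table linked through _dlt_parent_id. The table names items and items__tags, the function names, and the make_demo_connection helper are hypothetical.

```python
import sqlite3
from typing import List, Union


def child_rows_for_load_ids(conn: sqlite3.Connection, load_ids: List[str]) -> List[str]:
    """Filter a child table by joining to the root table's _dlt_load_id."""
    if not load_ids:
        return []
    marks = ", ".join("?" for _ in load_ids)
    query = (
        "SELECT c.value FROM items__tags AS c "
        "JOIN items AS r ON c._dlt_parent_id = r._dlt_id "
        f"WHERE r._dlt_load_id IN ({marks}) ORDER BY c.value"
    )
    return [row[0] for row in conn.execute(query, list(load_ids))]


def root_rows_for_status(
    conn: sqlite3.Connection, status: Union[int, List[int]] = 0
) -> List[str]:
    """Filter root rows by load status; unmatched rows count as status -1."""
    statuses = [status] if isinstance(status, int) else list(status)
    if not statuses:
        return []
    marks = ", ".join("?" for _ in statuses)
    query = (
        "SELECT r.name FROM items AS r "
        "LEFT JOIN _dlt_loads AS l ON r._dlt_load_id = l.load_id "
        f"WHERE COALESCE(l.status, -1) IN ({marks}) ORDER BY r.name"
    )
    return [row[0] for row in conn.execute(query, statuses)]


def make_demo_connection() -> sqlite3.Connection:
    """Hypothetical in-memory data: three root rows, one without a loads entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _dlt_loads (load_id TEXT, status INTEGER)")
    conn.execute("CREATE TABLE items (name TEXT, _dlt_id TEXT, _dlt_load_id TEXT)")
    conn.execute("CREATE TABLE items__tags (value TEXT, _dlt_parent_id TEXT)")
    conn.executemany(
        "INSERT INTO _dlt_loads VALUES (?, ?)",
        [("1741000001.0", 0), ("1741000002.0", 1)],
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("a", "r1", "1741000001.0"),
            ("b", "r2", "1741000002.0"),
            ("c", "r3", "1741000003.0"),  # load id missing from _dlt_loads
        ],
    )
    conn.executemany(
        "INSERT INTO items__tags VALUES (?, ?)",
        [("x", "r1"), ("y", "r2"), ("z", "r3")],
    )
    return conn
```

The LEFT JOIN combined with COALESCE is what gives unmatched rows the -1 status; an inner join would silently drop them instead. Deeper nesting would need additional joins up the parent chain, which this sketch leaves out.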

I think it should be enough to support the above only if ibis datasets are available. Ensure that useful exceptions are raised if this is not the case; I think they are probably good already, but I am not sure. The same goes for tables that do not have the dlt load id column, which can happen in some configurations. Here too, I think ibis will tell you that the columns are not there.

We need to test this on all destinations, in the same way as all the other dataset read access tests.

Further notes:

@sh-rp sh-rp moved this to Planned in dlt core library Mar 4, 2025
@rudolfix rudolfix moved this from Planned to In Progress in dlt core library Mar 10, 2025
@sh-rp sh-rp linked a pull request Mar 10, 2025 that will close this issue