T&D Should not dedupe raw table (and maintain current CDC delete behavior) #30710
Comments
Un-assigning myself. For whoever picks this up next, feel free to keep working on #30742 if you like (or not). Note this comment from @edgao: #30742 (comment)
Grooming:
The goal is to research whether we can skip deduping the raw table and still maintain our current CDC behavior (which is really deleting rows in the final table). If so, we should do this work now. If not, this work is blocked on changing what CDC deletes do (e.g. a tombstone column).
maybe figured out how to do this as part of #30764. will unassign myself if that doesn't pan out.
pretty sure I have a line on this. I think the problem I ran into last week was that we were deduping the final table in a way that didn't interact well with duplicate _airbyte_raw_ids (I forget the specifics). Realized that we can just dedup new raw records before upserting them into the final table, which means we can safely not dedup the raw table. (yes, that's somewhat incoherent. no, I don't remember the exact issue I was facing last week. 🤷)
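For anyone picking this up, here is a rough sketch of the idea in the comment above: dedup only the batch of new raw records, then upsert into the final table, leaving the raw table untouched. This is an in-memory Python illustration, not the actual T&D SQL. The helper names (`dedup_new_records`, `upsert_into_final`), the `id` primary key, the `updated_at` cursor, and the use of `_ab_cdc_deleted_at` as the delete marker are all assumptions made for the example.

```python
from typing import Any

Record = dict[str, Any]


def dedup_new_records(new_raw: list[Record], pk: str, cursor: str) -> list[Record]:
    """Keep only the latest record per primary key from the new batch.

    The raw records themselves are not modified or removed.
    """
    latest: dict[Any, Record] = {}
    for rec in new_raw:
        key = rec["data"][pk]
        # On a cursor tie, the record that appears later in the batch wins.
        if key not in latest or rec["data"][cursor] >= latest[key]["data"][cursor]:
            latest[key] = rec
    return list(latest.values())


def upsert_into_final(
    final: dict[Any, Record],
    new_raw: list[Record],
    pk: str = "id",                          # hypothetical primary key
    cursor: str = "updated_at",              # hypothetical cursor field
    deleted_at: str = "_ab_cdc_deleted_at",  # assumed CDC delete marker
) -> dict[Any, Record]:
    """Upsert deduped new records into the final table (a dict keyed by pk)."""
    for rec in dedup_new_records(new_raw, pk, cursor):
        data = rec["data"]
        key = data[pk]
        existing = final.get(key)
        if existing is not None and existing[cursor] > data[cursor]:
            continue  # the final table already holds a newer version
        if data.get(deleted_at) is not None:
            # Current CDC behavior: a delete removes the row from the final table.
            final.pop(key, None)
        else:
            final[key] = {**data, "_airbyte_raw_id": rec["_airbyte_raw_id"]}
    return final


raw_table = [
    {"_airbyte_raw_id": "a", "data": {"id": 1, "updated_at": 1, "name": "old"}},
    {"_airbyte_raw_id": "b", "data": {"id": 1, "updated_at": 2, "name": "new"}},
    {"_airbyte_raw_id": "c", "data": {"id": 2, "updated_at": 1, "name": "x"}},
    {"_airbyte_raw_id": "d", "data": {"id": 2, "updated_at": 3, "name": "x",
                                      "_ab_cdc_deleted_at": "2023-01-01"}},
]
final_table = upsert_into_final({}, raw_table)
```

In this sketch the raw table keeps all four records, while the final table ends up with a single row for `id` 1 and no row for `id` 2, which matches the "delete rows in the final table" behavior described in the grooming note.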
One of the options we have to speed up T&D is to skip deduplication on the raw table, e.g (code):
Working on this story includes: