feat: Implement ReadN #276
raulb wants to merge 16 commits into `main` from `read-n`
+459 −154

Commits (16)
All commits by raulb:

- `7d3574b` add readN
- `bb52ffa` better implementation
- `00ecdc6` more consistent
- `6247721` add test
- `8f4c034` add NextN test on cdciterator
- `3331840` fix lint
- `adc0b29` remove Read
- `5aec436` fix typo
- `bcde1ed` add combined iterator test
- `fae1a17` remove Next()
- `bb682c6` fetch until there's 0 records left
- `89f2f0c` cleaner
- `ec0dd58` better return an empty slice
- `d2dc205` better return a slice
- `808ecba` more
- `0e08e6d` and more
I tried out a couple of things with channels in this gist that might help here too. The test sends a number of objects from one goroutine to another through a channel. The sending goroutine sends 20M objects in batches of different sizes (1 to 100k), and the channel shared by the sender and receiver is either buffered or unbuffered.

On my machine, sending those 20M objects in batches of 1 record over an unbuffered channel takes around 5.5 seconds. With the batch size still at 1 but a buffered channel of size 50, that drops to 1.8 seconds (about 3x faster). The biggest improvement, though, came from setting the batch size to 1k or 10k (whether the channel was buffered made little difference): the result was around 80 ms.

Given that, I think it's worth trying to send the data from the CDC and snapshot iterators in batches.
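A self-contained sketch of the kind of comparison the gist runs. The `record` type, the record counts, and the function name here are all illustrative, not the gist's actual code; the point is only that one channel send per batch amortizes the per-send synchronization cost across many records.

```go
package main

import (
	"fmt"
	"time"
)

// record stands in for an opencdc.Record; its shape doesn't matter
// for measuring channel overhead.
type record struct{ pos int }

// sendInBatches pushes total records through a channel in slices of
// batchSize (with the given channel buffer) and returns how many
// records the receiving side saw.
func sendInBatches(total, batchSize, buffer int) int {
	ch := make(chan []record, buffer)

	go func() {
		batch := make([]record, 0, batchSize)
		for i := 0; i < total; i++ {
			batch = append(batch, record{pos: i})
			if len(batch) == batchSize {
				ch <- batch
				batch = make([]record, 0, batchSize)
			}
		}
		if len(batch) > 0 {
			ch <- batch // flush the final partial batch
		}
		close(ch)
	}()

	received := 0
	for batch := range ch {
		received += len(batch)
	}
	return received
}

func main() {
	// Compare per-record sends against batched sends; larger batches
	// mean far fewer channel operations for the same record count.
	for _, batchSize := range []int{1, 1000} {
		start := time.Now()
		n := sendInBatches(1_000_000, batchSize, 50)
		fmt.Printf("batch=%d received=%d in %s\n", batchSize, n, time.Since(start))
	}
}
```

With batch size 1 every record pays the full channel-synchronization cost, which is consistent with the timings quoted above.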
---
This is what I meant by iterators sending data in batches: https://github.com/ConduitIO/conduit-connector-postgres/compare/read-n...haris/read-n-batches?expand=1
---
Good point. We should start collecting records as early as possible in the pipeline, to cut down on the amount of code that processes records one by one.
---
This may work without a timer. "Collecting" needs two parameters, a batch size and a time interval, and there is a side effect we can use for the latter: Postgres sends keepalive messages so frequently that an explicit timer may not be required, and we can track elapsed time based on the keepalive messages instead. Something like: