Improving dataset loader and preprocess script #102
Comments
Starting with the preprocess script, the current time taken is as follows (note that I've already done some optimization here for loading the file). This is from the
I tried tweaking this a bit, but I think the current timing is acceptable, so I'm moving on.
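For anyone who wants to reproduce per-stage timings like these, here is a minimal sketch of how the stages of a preprocess script could be timed. The stage functions `load_file` and `build_updates` are hypothetical placeholders, not the project's actual functions; only the timing helper is the point.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per stage, in seconds.
timings = {}


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins for the real preprocessing stages (assumptions).
def load_file():
    return [(i, i + 1) for i in range(1000)]


def build_updates(edges):
    return {"add": edges[:10], "delete": edges[-10:]}


with timed("load_file"):
    edges = load_file()

with timed("build_updates"):
    updates = build_updates(edges)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.6f} s")
```

Wrapping each stage separately makes it easier to see which step dominates before deciding whether further optimization is worth it.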
Okay, so I have identified a few repeated operations in the data loading pipeline. Right now the flow is
I think we can drop the last two conversions and just pipe in the original edge list and add/delete list. @nithinmanoj10 any opinions against this approach? I'll try making the changes to see if there is some dependency I missed.
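To illustrate the idea of consuming the edge list and add/delete list directly, here is a rough sketch. It assumes the base graph is a set of `(src, dst)` tuples and that each timestamp carries an `"add"` and a `"delete"` list; those shapes and the name `apply_updates` are assumptions for illustration, not the project's actual data format or API.

```python
def apply_updates(edges, additions, deletions):
    """Return a new edge set with deletions removed, then additions applied.

    Deletions run first, so an edge listed in both ends up present.
    """
    updated = set(edges)
    updated.difference_update(deletions)
    updated.update(additions)
    return updated


# Assumed input shapes: a base edge set plus one add/delete entry per timestamp.
base_edges = {(0, 1), (1, 2), (2, 3)}
updates = [
    {"add": [(3, 4)], "delete": [(0, 1)]},
    {"add": [(0, 2)], "delete": [(2, 3)]},
]

# Build each snapshot from the previous one, with no intermediate conversions.
snapshots = [base_edges]
for step in updates:
    snapshots.append(apply_updates(snapshots[-1], step["add"], step["delete"]))

for t, snapshot in enumerate(snapshots):
    print(t, sorted(snapshot))
```

Whether this is enough depends on what the downstream loader expects, so it would still need checking against any dependency on the dropped conversions.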
In which file did you benchmark the preprocessing steps? @JoelMathewC
I'm running tests on
There are some severe speed issues with the preprocess and data loader scripts, and this often makes benchmarking a rather tedious process. I'll work on clearing up the technical debt here (mostly mine 😅).