Optimize /api/internal/add_nodes/ #1088
Comments
@lyw07 @aronasorman @ralphiee22 Here are some notes about possible optimizations for the /api/internal/add_nodes/ endpoint. Here is a time diagram from sushibar for a Khan Academy chef run: [time diagram image]

Not optimizable

There isn't much we can do about the download stage: the first time the chef runs, we have to download and compress the videos, so it's normal if that takes a long time. Subsequent runs will not have to go through this whole process.

Optimizable

The real bottleneck is the upload stage.
The reason for this level-by-level upload is that a child node can only be created once its parent's ID is known. This suggests the following optimization: have ricecooker generate node IDs ahead of time. If ricecooker knows each node's ID before upload, it doesn't have to wait for the server to save one level before sending the next. I wanted to flag this as the most viable long-term optimization, because each call to the endpoint currently carries this level-by-level overhead.
I agree in general with this, but I'm skeptical of having the client generate the canonical ID to be saved in the DB. It might be better to have ricecooker generate ricecooker-local IDs, which are only used by Studio to refer to the relationships between nodes -- Studio can then generate the UUIDs on its side, rather than depend on ricecooker.
Isn't that the whole point of UUIDs? (reducing the need for centralized id-generation)
Yeah, I thought of that too, but that would require building some sort of mapping from local IDs to studio_ids, which is probably not worth it. Also, this approach would not fix the level-by-level problem: we'd still need to save nodes at level X before we can create nodes at level X+1, which closes the door on bulk-create style operations.
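The bulk-create idea under discussion can be sketched as follows. This is a minimal illustration, not Studio's actual schema: `flatten_tree` and the row shape are hypothetical. The point is that once the client assigns every node's ID up front, the whole tree flattens into rows that each already know their parent, so no level-by-level round trips are needed.

```python
import uuid
from typing import Optional

def flatten_tree(node: dict, parent_id: Optional[str] = None) -> list:
    """Assign a client-side UUID to every node and flatten the tree
    into one list of rows suitable for a single bulk insert."""
    node_id = uuid.uuid4().hex
    rows = [{"id": node_id, "parent_id": parent_id, "title": node["title"]}]
    for child in node.get("children", []):
        rows.extend(flatten_tree(child, parent_id=node_id))
    return rows

tree = {"title": "root",
        "children": [{"title": "a"},
                     {"title": "b", "children": [{"title": "c"}]}]}
rows = flatten_tree(tree)
# Every row already carries its parent_id, so all four rows could be
# inserted in one bulk operation instead of one round trip per level.
```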
I can't think of any issues that could be introduced by generating UUIDs from ricecooker. We already create db objects with pre-generated or hardcoded ids, such as the garbage collection node root id.
I see this as a possible security bug. The main issue is overwriting already-existing nodes on Studio. We could first check all IDs to make sure they're not overwriting nodes on other channels. But then we'd also need to check (in KA's case) that the IDs we're uploading don't already exist on Studio. That smells like a performance issue to me. Happy to discuss; maybe I'm just paranoid about having ricecooker clients generate IDs.
If a UUID function is generating an ID that already exists, then by definition it's not generating a Universally Unique ID. :( With UUIDs, the database is not tracking what UUIDs have already been used, because it doesn't have to - the same UUID will never be generated twice. |
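The no-central-registry property can be demonstrated with a tiny sketch (the `make_node_id` helper is hypothetical, not a ricecooker function): version-4 UUIDs draw 122 random bits, so duplicates are vanishingly unlikely and no lookup of already-used IDs is needed before inserting.

```python
import uuid

def make_node_id() -> str:
    # uuid4 draws 122 random bits per ID; the chance of two clients
    # ever producing the same value is negligible, so no central
    # registry of used IDs is required.
    return uuid.uuid4().hex
```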
Superseded by #3041 |
This one currently slows down @ralphiee22 when he uploads KA.
add_nodes is one of the slowest endpoints in the sushi chef upload process.
Let's make it faster!
Step 1 is to write a benchmark script inside deploy/chaos/ that pings this
endpoint continuously.
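A baseline measurement could look roughly like the sketch below. The `benchmark` helper and its callable-based interface are illustrative assumptions, not the actual deploy/chaos/ tooling: the idea is just to time repeated requests and report simple latency statistics.

```python
import statistics
import time

def benchmark(call, runs: int = 20) -> dict:
    """Time repeated invocations of `call` and summarize latencies (seconds)."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }

# Example usage against a local instance (URL and payload are
# placeholders, not the real API contract):
# benchmark(lambda: requests.post(
#     "http://localhost:8000/api/internal/add_nodes/",
#     json=sample_payload), runs=50)
```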
Once we've established a baseline, then we can commence optimization. @jayoshih
recommends parallelizing the convert_data_to_nodes loop.
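That suggestion could be sketched with a thread pool. `convert_data_to_node` below is a placeholder for the real per-node work, not the actual Studio function; since that work is largely I/O-bound against the database, threads are a reasonable fit.

```python
from concurrent.futures import ThreadPoolExecutor

def convert_data_to_node(data: dict) -> dict:
    # Placeholder for the real per-node conversion work.
    return {"id": data["id"], "converted": True}

def convert_all(node_data: list, max_workers: int = 8) -> list:
    # pool.map preserves input order, so results line up with node_data.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_data_to_node, node_data))
```

Note that in a Django view the real change is more involved than this sketch: each worker thread gets its own database connection, and transaction handling would need care.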
Category
ENHANCEMENT