Ricecooker Studio upload optimizations #231

Closed
ralphiee22 opened this issue Nov 14, 2019 · 5 comments
@ralphiee22
Contributor

  • ricecooker version: v1.0.0?

Description

This issue serves as a reminder and a place to discuss possible avenues for ricecooker optimizations.
Some things suggested already were:

  • compressing/gzipping the tree metadata before uploading it to Studio (an alternative to chunking requests)
  • generating content databases on the ricecooker side and uploading them to Studio, which Studio can then import into its main database
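The first suggestion can be sketched with the standard library alone. The function name `compress_tree_metadata` is illustrative, not part of ricecooker's API:

```python
import gzip
import json

def compress_tree_metadata(tree):
    """Serialize a channel tree (a JSON-serializable dict of node
    metadata) and gzip it, so one compressed blob can be uploaded
    instead of many chunked requests."""
    raw = json.dumps(tree, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)

payload = compress_tree_metadata({"title": "Sample Channel", "children": []})
```

JSON metadata is highly repetitive, so gzip tends to compress it by roughly an order of magnitude, which is why a single compressed upload looks attractive.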
@ivanistheone
Contributor

+1 for this. There is no reason for it to take so long...

I know @lyw07 had previously thought about performance improvements for ricecooker along these lines.

The current implementation makes repeated calls to /api/internal/add_nodes, which takes hours for large channels. The node metadata is intentionally uploaded in small chunks to avoid network timeouts.
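For context, the existing chunked approach amounts to splitting the flat node list into small batches and POSTing each one. A minimal sketch (the batch size here is illustrative):

```python
def chunks(nodes, size=100):
    """Yield successive batches of node metadata; in the current scheme,
    each batch becomes one POST to /api/internal/add_nodes."""
    for i in range(0, len(nodes), size):
        yield nodes[i:i + size]

batches = list(chunks(list(range(250)), size=100))  # batches of 100, 100, 50
```

Each batch is one sequential round trip, which is where the hours go on large channels.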

Here is one possible way we could implement a "bulk upload" path:

  1. The ricecooker run proceeds as usual until it reaches this line, which gets replaced with a conditional:

         if num_nodes < 2000:
             self.add_nodes(root, self.channel)
         else:
             self.bulk_add_nodes(root, self.channel)

  2. In bulk_add_nodes, ricecooker creates a JSON tree of the entire channel (e.g. a 200MB channel.json), compresses it with gzip (~20MB channel.json.gz), and uploads it to a new endpoint, /api/internal/bulk_add_nodes. The endpoint responds immediately with a task_id of some sort.

  3. Studio then unzips the channel JSON and creates the Studio channel tree (which can take ~10 minutes). Meanwhile, ricecooker is still blocked in the bulk_add_nodes function and polls a task status endpoint to check on the task's progress.

  4. Once the bulk_add_nodes task completes, ricecooker continues on to finish_channel as usual.

I'm not sure how the Studio tasks API works, so maybe @kollivier can comment on feasibility/suitability for this purpose. The main thing is that processing the channel tree after the POST to /api/internal/bulk_add_nodes will take longer than one minute, so it needs to happen outside the request-response cycle.
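The upload-then-poll flow could look roughly like this. The endpoint path follows the proposal, but the response shapes (`task_id`, `status` fields) are assumptions, and the HTTP calls are injected as plain functions to keep the sketch self-contained:

```python
import gzip
import json
import time

def bulk_add_nodes(channel_tree, post, get_status, poll_interval=5):
    """Compress the whole channel tree, POST it once, then poll until
    the server-side task finishes.  `post(path, payload)` and
    `get_status(task_id)` stand in for real HTTP helpers."""
    payload = gzip.compress(json.dumps(channel_tree).encode("utf-8"))
    task_id = post("/api/internal/bulk_add_nodes", payload)["task_id"]
    while True:
        state = get_status(task_id)["status"]
        if state in ("SUCCESS", "FAILURE"):
            return state
        time.sleep(poll_interval)
```

Because the server hands back a task id immediately, the long tree-processing step never has to fit inside a single request-response cycle.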

@jayoshih
Contributor

Alternatively, we could write to a sqlite DB and read that (which might fit in nicely if we ever wanted to integrate the tools with Kolibri directly).
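A sketch of that direction, with an illustrative table layout (not Kolibri's actual schema):

```python
import json
import sqlite3

def write_nodes_to_sqlite(nodes, db_path):
    """Write flat channel node metadata into a sqlite file that Studio
    (or Kolibri) could ingest wholesale instead of in per-request
    chunks.  The schema here is illustrative only."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contentnode ("
        " node_id TEXT PRIMARY KEY,"
        " parent_id TEXT,"
        " title TEXT,"
        " extra_fields TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO contentnode VALUES (?, ?, ?, ?)",
        [
            (n["node_id"], n.get("parent_id"), n["title"],
             json.dumps(n.get("extra_fields", {})))
            for n in nodes
        ],
    )
    conn.commit()
    conn.close()
```

The appeal is that a single sqlite file is both the transfer format and something the server can attach and bulk-import from directly.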

@ivanistheone
Contributor

Here is proof of concept for how a bulk_add_nodes could work: ivanistheone@8f66b0a

and session log:

Creating tree on Kolibri Studio...
   Creating channel Sample Ricecooker Channel
	Preparing fields...

1. Saved studio json tree to chefdata/trees/studio_json_tree.json
2. Compressing .... chefdata/trees/studio_json_tree.json
3. Bulk uploading chefdata/trees/studio_json_tree.json.gz to Studio /api/internal/bulk_add_nodes
4.     checking task status...
       checking task status...
       checking task status...
5. Done. (continuing as ricecooker process as usual)

@ralphiee22
Contributor Author

@kollivier kollivier changed the title Discuss Ricecooker optimizations Ricecooker Studio upload optimizations Dec 26, 2019
@kollivier kollivier added this to the 0.7 milestone Dec 26, 2019
@rtibbles
Member

Closing in favour of #321
