[inference] speedup by 84% - async, layer caching #103
zazer0 wants to merge 18 commits into hijohnnylin:main
@zazer0 is attempting to deploy a commit to the Neuronpedia Team on Vercel. A member of the Team first needs to authorize it.
This looks promising! I know this is still a draft, but a quick question on parallel steer generation: if we specify a seed, does it produce the same results as running the generations separately? For example, scenario A (current):
scenario B (parallel):
I haven't tested your code to see whether this is the case, just flagging it as something to consider.
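To make the seed question concrete, here is a minimal sketch of one way that separate and parallel runs could diverge. It is an assumption for illustration only, not a diagnosis of this PR: it uses Python's standard `random` module rather than the real inference code, and `sample_tokens` and `SEED` are hypothetical names. The stand-in ignores steering itself and only shows how each run consumes random draws.

```python
import random


def sample_tokens(rng: random.Random, n: int = 5) -> list[int]:
    """Hypothetical stand-in for token sampling: draw n pseudo-random token ids."""
    return [rng.randrange(1000) for _ in range(n)]


SEED = 16  # hypothetical seed value

# Scenario A: each type runs in its own request and is seeded on its own.
default_alone = sample_tokens(random.Random(SEED))
steered_alone = sample_tokens(random.Random(SEED))

# Scenario B, shared stream: both types draw from ONE seeded generator,
# so whichever type samples second sees a later part of the stream.
shared = random.Random(SEED)
default_shared = sample_tokens(shared)
steered_shared = sample_tokens(shared)

# Scenario B, per-type generators: each type gets its own generator,
# seeded identically, so the draws match scenario A again.
default_own = sample_tokens(random.Random(SEED))
steered_own = sample_tokens(random.Random(SEED))

print(default_shared == default_alone)  # first consumer of the shared stream
print(steered_shared == steered_alone)  # second consumer of the shared stream
print(steered_own == steered_alone)     # independent generator per type
```

If something like this were the cause, giving each generation type its own seeded generator would be one way to keep scenario B consistent with scenario A.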
Good question! Not sure, will look into it and clarify 👍
@zazer0 I intended to merge this today (it looks great!) and deployed it to some test servers to test outputs, but realized that the
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #103     +/-   ##
========================================
+ Coverage   7.98%   8.79%   +0.81%
========================================
  Files        121     122       +1
  Lines      17040   17354     +314
  Branches     362     422      +60
========================================
+ Hits        1360    1527     +167
- Misses     15669   15804     +135
- Partials      11      23      +12
Flags with carried forward coverage won't be shown.
Thanks for the updates to include
Summary
It seems that running DEFAULT and STEERED separately produces different results than running DEFAULT and STEERED in the same request, using the same settings and seed. I am not sure what the reason is, but this matters to us because we're a research platform, so we need consistent results between the two.
Repro - a Spanish feature
Setup
Test Command (change the "types" array to ["STEERED"], ["DEFAULT"], or ["STEERED", "DEFAULT"])
Test Outputs (clipped for readability):
STEERED only
DEFAULT only
both STEERED and DEFAULT
Recap
Next Steps
Moving this back to draft until author has time to look at it.
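The consistency requirement from the repro above (a type's output should be the same whether it is requested alone or together with another type, given the same settings and seed) could be captured as a regression check along these lines. This is only a sketch under stated assumptions: `find_mismatches`, the `Generate` signature, and `toy_generate` are hypothetical and do not reflect the project's actual API; `toy_generate` is a deliberately order-dependent stand-in so the check has something to catch.

```python
import random
from typing import Callable, Dict, List

# Hypothetical signature: (types, seed) -> {type: generated text}
Generate = Callable[[List[str], int], Dict[str, str]]


def find_mismatches(generate: Generate, types: List[str], seed: int) -> List[str]:
    """Return the types whose solo output differs from their output in a combined request."""
    combined = generate(types, seed)
    return [t for t in types if generate([t], seed)[t] != combined[t]]


def toy_generate(types: List[str], seed: int) -> Dict[str, str]:
    """Toy stand-in (NOT the real server): all requested types share one seeded stream."""
    rng = random.Random(seed)
    return {t: " ".join(str(rng.randrange(1000)) for _ in range(4)) for t in types}


# Types listed here depend on what else was requested alongside them.
print(find_mismatches(toy_generate, ["STEERED", "DEFAULT"], seed=16))
```

With the toy generator, the type that samples second in the combined request is reported as a mismatch. Pointing the same check at the real request path would need a wrapper with the hypothetical `Generate` signature.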
Problem
Fix / Feature
Testing
🔥 Added + committed benchmarking script + results showing 84% speedup 🎉
✅ Added unit + integration tests for all aspects