Going faster on the JVM #51

Open
bsless opened this issue Jun 5, 2021 · 5 comments

Comments


bsless commented Jun 5, 2021

I looked into Sieppari's performance on the JVM and ran some profiling to get a sense of what exactly needs to be optimized to get on par with middleware.
The measurements and results can be found here.
Generally, CPU utilization breaks down as:

  • enter: ~66%
    • assoc-ing to the context: 25%
    • peeking/popping the queue: 20%
  • leave: ~30%
    • iterating over the stack: ~50%
    • invoking keywords on interceptors: ~40%

With a native queue and stack, and by eliminating all keyword access (very ugly code), I was able to get a 2-3x speedup. The remaining CPU went to the queue and stack methods, MethodImplCache lookups, Context instantiation (an immutable record), and -try. Still very far from just composing 10 functions.
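For reference, the "just composing 10 functions" baseline has none of that bookkeeping — it is plain function composition. A minimal sketch (the handler bodies are hypothetical stand-ins):

```clojure
;; Ten stand-in handler functions; each just threads a plain map through.
(def handlers (repeat 10 (fn [ctx] (update ctx :calls (fnil inc 0)))))

;; `comp` fuses them into a single function call chain: no queue, no
;; stack, no context record allocation per step.
(def composed (apply comp handlers))

(composed {})
;; => {:calls 10}
```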

To get that performance we need a "compiled" model which, among other things, gets rid of the queue and stack, and perhaps creates a mutable context for each execution.

If I understand the execution model correctly (sans the reified queue) the execution flow for three interceptors would look something like:

[flow diagram attachment: flow.dot]

This can be implemented in two different ways, besides the queue/stack implementation:

  • flow graph. Execution is handled by the interceptors. Each interceptor knows which stage comes next and "routes" to it based on success or failure
  • composition. Can be a bit involved, but by doing things in the correct order we can compose all the functions we need backwards, starting from the error handlers, then leave, and finally enter. A very rough sketch of this idea, putting async aside for a moment:
;; Keywords like :errf1 below stand in for the interceptors' actual fns.
;; Wraps f so any exception is recorded on the ctx and handed to g.
(defn -comp-error
  [g f]
  (fn [ctx]
    (g
     (try
       (f ctx)
       (catch Exception e
         (assoc ctx :error e))))))

(def fs [:errf1 :errf2 :errf3])
(def escapes (into [] (reductions -comp-error fs)))

;; Pairs each leave fn with its escape chain; an exception in leave
;; diverts to the corresponding composed error handler.
(defn -comp-leave
  [[g-l g-e] [f-l f-e]]
  [(fn leave [ctx]
     (try
       (g-l (f-l ctx))
       (catch Exception e
         (g-e (assoc ctx :error e))))) f-e])

(def fs [:leave1 :leave2 :leave3])
(def leaves (into [] (map first) (reductions -comp-leave (map vector fs escapes))))

;; Threads ctx through the enters composed so far (tot), then f; on
;; failure, falls back to that stage's composed error handler.
(defn -comp-enter
  [tot [f err]]
  (fn [ctx]
    (try
      (f (tot ctx))
      (catch Exception e
        (err (assoc ctx :error e))))))

(def es [:enter1 :enter2 :enter3])
(def doors (map vector es escapes))
(def enters (reduce -comp-enter identity doors))

;; fin runs all the enters, then the fully composed leave chain.
(let [farewell (last leaves)]
  (def fin
    (fn [ctx]
      (farewell (enters ctx)))))
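For contrast, the flow-graph option from the first bullet could be sketched like this, with each stage closing over its successor and routing on success or failure (all names here are hypothetical):

```clojure
;; Each stage routes to next-fn on success or error-fn on failure.
;; Stages are wired back-to-front so every closure captures its successor.
(defn stage [f next-fn error-fn]
  (fn [ctx]
    (let [ctx' (try (f ctx)
                    (catch Exception e (assoc ctx :error e)))]
      (if (:error ctx')
        (error-fn ctx')
        (next-fn ctx')))))

;; Example wiring: two enter stages and a terminal error handler
;; (identity, for the sketch).
(def handle-error identity)
(def s2 (stage (fn [ctx] (assoc ctx :b 2)) identity handle-error))
(def s1 (stage (fn [ctx] (assoc ctx :a 1)) s2 handle-error))

(s1 {})
;; => {:a 1, :b 2}
```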

Regarding async, #9 suggests it might be handled incorrectly at the moment. One option to consider is choosing a unified execution model and doing everything in it. Or should that be determined by the executor?

We should also figure out which parts of the context are the mutable execution environment and which are data. If we forgo the option of exposing the runtime environment (queue and stack), we can go much faster.
There could be a hierarchy of Runner -> ExecutionContext -> Context
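One possible shape for that split, as a sketch only: the runtime state lives behind a mutable field on a deftype that only the runner touches, while interceptor fns still see and return plain immutable maps (protocol and names are hypothetical):

```clojure
;; Runner-private mutable holder for the context data.
(defprotocol IMutableCtx
  (get-data [this])
  (set-data! [this m]))

(deftype ExecutionContext [^:unsynchronized-mutable data]
  IMutableCtx
  (get-data [_] data)
  (set-data! [this m] (set! data m) this))

;; The runner mutates the holder in place instead of re-creating a
;; persistent context record on every step.
(defn run-enters [ectx enters]
  (doseq [f enters]
    (set-data! ectx (f (get-data ectx))))
  (get-data ectx))

(run-enters (->ExecutionContext {})
            [#(assoc % :a 1) #(assoc % :b 2)])
;; => {:a 1, :b 2}
```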

There are a lot of considerations and degrees of freedom which can lead to widely different solutions and I'd like to open them for discussion and hopefully experiment with different solutions.

Would love to hear your thoughts and to take a swing at it myself.

@zerg000000

The queue/stack is needed, since an interceptor can modify the execution queue to provide dynamic behaviour; e.g. a routing interceptor is based on the ability to modify the execution queue.
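A minimal sketch of that dynamic behaviour, assuming a Sieppari-style context that keeps its pending interceptors under :queue (the route table and all names here are hypothetical):

```clojure
;; Hypothetical route table: each path maps to its own interceptor chain.
(def routes
  {"/users" [{:enter (fn [ctx] (assoc ctx :handler :users))}]
   "/items" [{:enter (fn [ctx] (assoc ctx :handler :items))}]})

;; The routing interceptor replaces the rest of the execution queue with
;; the chain matched from the request, so the full chain is only known
;; at run time.
(def router
  {:enter (fn [ctx]
            (let [chain (get routes (-> ctx :request :uri) [])]
              (assoc ctx :queue
                     (into clojure.lang.PersistentQueue/EMPTY chain))))})
```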


bsless commented Jun 9, 2021

@zerg000000 since the queue/interceptors keys are known at compile time, couldn't the interceptor chains be precompiled just as the router is precompiled? The only case where you'd need a queue is if you injected interceptors into it programmatically at run time. How common is that pattern?


bsless commented Jun 9, 2021

Yes, I read that implementation before posting my reply.
To reiterate the idea: the chain of interceptors could be derived for each path in advance. This case does not show why it needs to be modified dynamically. If an endpoint has interceptors associated with it, just "fork" the chain and run them when you match (or after?)
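One way that per-path precompilation could look, as a sketch with hypothetical names (async and error handling ignored): each route's full chain is fused into a single function when the router is built, so matching returns a ready-to-call handler.

```clojure
;; Compose a route's :enter fns and its handler into one function at
;; router-build time; no queue is consulted during execution.
(defn compile-chain [interceptors handler]
  (reduce (fn [next-fn enter]
            (fn [ctx] (next-fn (enter ctx))))
          handler
          (reverse (keep :enter interceptors))))

(def global-interceptors [{:enter #(assoc % :auth :ok)}])
(def user-interceptors   [{:enter #(assoc % :user 42)}])

;; "Fork" the global chain per endpoint when building the router.
(def compiled-routes
  {"/users" (compile-chain (into global-interceptors user-interceptors)
                           (fn [ctx] (:user ctx)))})

((get compiled-routes "/users") {})
;; => 42
```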

@tvaisanen (Contributor)

Doing the composition sounds like a good idea for improving performance. Maybe it could be decoupled from the asynchronous implementation by providing an :async true flag, etc.

As for the async aspect, Reactive Streams compliance could be worth looking into.
