Going faster on the JVM #51

Open
bsless opened this issue Jun 5, 2021 · 5 comments

Comments


bsless commented Jun 5, 2021

I looked into Sieppari's performance on the JVM and ran some profiling to get a sense of what exactly needs to be optimized to get on par with middleware.
The measurements and results can be found here.
Generally, CPU utilization breaks down as:

  • enter: ~66%
    • assoc-ing to the context: 25%
    • peeking/popping the queue: 20%
  • leave: ~30%
    • iterating over the stack: ~50%
    • invoking keywords on interceptors: ~40%

With a native queue and stack, and by eliminating all keyword access (very ugly code), I was able to get a 2-3x speedup. The remaining CPU went to the queue and stack methods, MethodImplCache lookups, Context instantiation (an immutable record), and -try. Still very far from just composing 10 functions.
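For reference, the "just composing 10 functions" baseline has none of that bookkeeping — it is plain function composition. A minimal sketch (the handler bodies are hypothetical stand-ins):

```clojure
;; Ten stand-in handler functions; each just threads a plain map through.
(def handlers (repeat 10 (fn [ctx] (update ctx :calls (fnil inc 0)))))

;; `comp` fuses them into a single function call chain: no queue, no
;; stack, no context record allocation per step.
(def composed (apply comp handlers))

(composed {})
;; => {:calls 10}
```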

To get that performance we need a "compiled" model which, among other things, gets rid of the queue and stack, and perhaps creates a mutable context for each execution.

If I understand the execution model correctly (sans the reified queue) the execution flow for three interceptors would look something like:

[flow diagram attachment: flow.dot]

This can be implemented in two different ways, besides the queue/stack implementation:

  • flow graph. Execution is handled by the interceptors. Each interceptor knows which stage comes next and "routes" to it based on success or failure
  • composition. Can be a bit involved, but by doing things in the correct order we can compose all the functions we need backwards, starting from the error handlers, then leave, and finally enter. A very rough sketch of this idea, putting async aside for a moment:
;; Keywords like :errf1 below stand in for the interceptors' actual fns.
;; Wraps f so any exception is recorded on the ctx and handed to g.
(defn -comp-error
  [g f]
  (fn [ctx]
    (g
     (try
       (f ctx)
       (catch Exception e
         (assoc ctx :error e))))))

(def fs [:errf1 :errf2 :errf3])
(def escapes (into [] (reductions -comp-error fs)))

;; Pairs each leave fn with its escape chain; an exception in leave
;; diverts to the corresponding composed error handler.
(defn -comp-leave
  [[g-l g-e] [f-l f-e]]
  [(fn leave [ctx]
     (try
       (g-l (f-l ctx))
       (catch Exception e
         (g-e (assoc ctx :error e))))) f-e])

(def fs [:leave1 :leave2 :leave3])
(def leaves (into [] (map first) (reductions -comp-leave (map vector fs escapes))))

;; Threads ctx through the enters composed so far (tot), then f; on
;; failure, falls back to that stage's composed error handler.
(defn -comp-enter
  [tot [f err]]
  (fn [ctx]
    (try
      (f (tot ctx))
      (catch Exception e
        (err (assoc ctx :error e))))))

(def es [:enter1 :enter2 :enter3])
(def doors (map vector es escapes))
(def enters (reduce -comp-enter identity doors))

;; fin runs all the enters, then the fully composed leave chain.
(let [farewell (last leaves)]
  (def fin
    (fn [ctx]
      (farewell (enters ctx)))))
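For contrast, the flow-graph option from the first bullet could be sketched like this, with each stage closing over its successor and routing on success or failure (all names here are hypothetical):

```clojure
;; Each stage routes to next-fn on success or error-fn on failure.
;; Stages are wired back-to-front so every closure captures its successor.
(defn stage [f next-fn error-fn]
  (fn [ctx]
    (let [ctx' (try (f ctx)
                    (catch Exception e (assoc ctx :error e)))]
      (if (:error ctx')
        (error-fn ctx')
        (next-fn ctx')))))

;; Example wiring: two enter stages and a terminal error handler
;; (identity, for the sketch).
(def handle-error identity)
(def s2 (stage (fn [ctx] (assoc ctx :b 2)) identity handle-error))
(def s1 (stage (fn [ctx] (assoc ctx :a 1)) s2 handle-error))

(s1 {})
;; => {:a 1, :b 2}
```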

Regarding async, #9 suggests it might be handled incorrectly at the moment. One option to consider is choosing a unified execution model and doing everything in it. Or should that be determined by the executor?

We should also figure out which parts of the context are the mutable execution environment and which are data. If we forgo the option of exposing the runtime environment (queue and stack), we can go much faster.
There could be a hierarchy of Runner -> ExecutionContext -> Context
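One possible shape for that split, as a sketch only: the runtime state lives behind a mutable field on a deftype that only the runner touches, while interceptor fns still see and return plain immutable maps (protocol and names are hypothetical):

```clojure
;; Runner-private mutable holder for the context data.
(defprotocol IMutableCtx
  (get-data [this])
  (set-data! [this m]))

(deftype ExecutionContext [^:unsynchronized-mutable data]
  IMutableCtx
  (get-data [_] data)
  (set-data! [this m] (set! data m) this))

;; The runner mutates the holder in place instead of re-creating a
;; persistent context record on every step.
(defn run-enters [ectx enters]
  (doseq [f enters]
    (set-data! ectx (f (get-data ectx))))
  (get-data ectx))

(run-enters (->ExecutionContext {})
            [#(assoc % :a 1) #(assoc % :b 2)])
;; => {:a 1, :b 2}
```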

There are a lot of considerations and degrees of freedom which can lead to widely different solutions and I'd like to open them for discussion and hopefully experiment with different solutions.

Would love to hear your thoughts and to take a swing at it myself.

@zerg000000

The queue/stack is needed, since an interceptor can modify the execution queue to provide dynamic behaviour; e.g. a routing interceptor is based on the ability to modify the execution queue.
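A minimal sketch of that dynamic behaviour, assuming a Sieppari-style context that keeps its pending interceptors under :queue (the route table and all names here are hypothetical):

```clojure
;; Hypothetical route table: each path maps to its own interceptor chain.
(def routes
  {"/users" [{:enter (fn [ctx] (assoc ctx :handler :users))}]
   "/items" [{:enter (fn [ctx] (assoc ctx :handler :items))}]})

;; The routing interceptor replaces the rest of the execution queue with
;; the chain matched from the request, so the full chain is only known
;; at run time.
(def router
  {:enter (fn [ctx]
            (let [chain (get routes (-> ctx :request :uri) [])]
              (assoc ctx :queue
                     (into clojure.lang.PersistentQueue/EMPTY chain))))})
```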


bsless commented Jun 9, 2021

@zerg000000 since the queue/interceptors keys are known at compile time, couldn't the interceptor chains be precompiled just as the router is precompiled? The only case where you'd need a queue is if you injected interceptors into it programmatically at run time. How common is that pattern?


bsless commented Jun 9, 2021

Yes, I read that implementation before posting my reply.
To reiterate the idea: the chain of interceptors could be derived for each path in advance. This case does not show why it needs to be modified dynamically. If an endpoint has interceptors associated with it, just "fork" the chain and run them when you match (or after?)
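One way that per-path precompilation could look, as a sketch with hypothetical names (async and error handling ignored): each route's full chain is fused into a single function when the router is built, so matching returns a ready-to-call handler.

```clojure
;; Compose a route's :enter fns and its handler into one function at
;; router-build time; no queue is consulted during execution.
(defn compile-chain [interceptors handler]
  (reduce (fn [next-fn enter]
            (fn [ctx] (next-fn (enter ctx))))
          handler
          (reverse (keep :enter interceptors))))

(def global-interceptors [{:enter #(assoc % :auth :ok)}])
(def user-interceptors   [{:enter #(assoc % :user 42)}])

;; "Fork" the global chain per endpoint when building the router.
(def compiled-routes
  {"/users" (compile-chain (into global-interceptors user-interceptors)
                           (fn [ctx] (:user ctx)))})

((get compiled-routes "/users") {})
;; => 42
```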

@tvaisanen (Contributor)

Doing the composition sounds like a good idea for improving performance. Maybe it could be decoupled from the asynchronous implementation by providing an :async true flag, etc.

As for the async aspect, Reactive Streams compliance could be worth looking into.
