|
| 1 | +EULER |
| 2 | +===== |
| 3 | + |
| 4 | +Intro |
| 5 | +----- |
| 6 | + |
| 7 | +This repo hosts my attempts at using cython to create an library that defines |
| 8 | +an forward-Euler integration routine. This is a proxy for prototyping the |
| 9 | +structure that I will use in SODE. The situation is that the library should |
| 10 | +define an algorithm - in this case forward-Euler - and a user of the library |
| 11 | +should define the function that the algorithm will operate on. My objective is |
| 12 | +that the user should be able to: |
| 13 | + |
| 14 | +1. Use the library with a pure python function for learning or debugging. In |
| 15 | + this case performance is not a concern. |
| 16 | +2. Use the library with a function defined in a cython extension module for |
| 17 | + better performance. This case should perform as fast as possible. |
| 18 | + |
| 19 | +Ideally the code used for the python/cython versions should be as similar as |
| 20 | +possible so that converting a pure python version to its cython equivalent |
| 21 | +should be trivial. |
| 22 | + |
| 23 | +The basic problem is how to achieve maximum efficiency in the pure cython case |
| 24 | +while still allowing users the flexibility to supply a function from pure |
| 25 | +python. |
| 26 | + |
| 27 | +Outline |
| 28 | +------- |
| 29 | + |
| 30 | +The first attempt is in pure python and looks like:: |
| 31 | + :language: python |
| 32 | + |
| 33 | + # euler_01.py |
| 34 | + import numpy as np |
| 35 | + |
| 36 | + def euler(f, x0, t): |
| 37 | + X = np.zeros((len(t), len(x0)), float) |
| 38 | + dxdt = np.zeros(len(x0), float) |
| 39 | + |
| 40 | + X[0, :] = x = np.array(x0) |
| 41 | + tlast = t[0] |
| 42 | + |
| 43 | + for n, tcur in enumerate(t[1:], 1): |
| 44 | + f(x, tlast, dxdt) |
| 45 | + X[n, :] = x = x + dxdt * (tcur - tlast) |
| 46 | + tlast = tcur |
| 47 | + |
| 48 | + return X |
| 49 | + |
| 50 | + def func(x, t, dxdt): |
| 51 | + dxdt[0] = x[1] |
| 52 | + dxdt[1] = - x[0] |
| 53 | + |
| 54 | +The idea is that the function `euler` is defined in the library and the |
| 55 | +function `func` is defined in a script written by a user of the library. The |
| 56 | +function `func` is chosen to describe the Ordinary Differential Equations of |
| 57 | +Simple Harmonic Motion. The function `euler` will perform forward Euler |
| 58 | +integration of the function to produce an array `X` representing the state of |
| 59 | +the system at times corresponding to the elements of `t`. We can run this with |
| 60 | +a script like:: |
| 61 | + :language: python |
| 62 | + |
| 63 | + # script.py |
| 64 | + |
| 65 | + from pylab import * |
| 66 | + import euler_01 |
| 67 | + |
| 68 | + x0 = array([0., 1.]) |
| 69 | + t = arange(0, 10, 0.01) |
| 70 | + |
| 71 | + X = euler_01.euler(euler_01.func, x0, t) |
| 72 | + |
| 73 | + plot(t, X) |
| 74 | + show() |
| 75 | + |
| 76 | +We want to improve on the performance of the above routine by implementing |
| 77 | +`euler` in cython code and allowing a user to implement `func` in cython code |
| 78 | +and have the result achieve performance close to what would be avilable in |
| 79 | +pure-c. |
| 80 | + |
| 81 | +Files |
| 82 | +----- |
| 83 | + |
| 84 | +The files `euler_N.{py|pyx}` in this repo represent different attempts at |
| 85 | +achieving this. To build all of the cython extension modules run:: |
| 86 | + :language: terminal |
| 87 | + |
| 88 | + $ python setup.py build_ext --inplace |
| 89 | + |
| 90 | +To test the `euler_08` implementation do:: |
| 91 | + :language: terminal |
| 92 | + |
| 93 | + $ python run.py 08 |
| 94 | + |
| 95 | +which will plot the results using matplotlib. |
| 96 | + |
| 97 | +To test the performance of all of the implementations, do:: |
| 98 | + :language: terminal |
| 99 | + |
| 100 | + $ ./run.py |
| 101 | + stmt: euler_01.euler(euler_01.func, x0, t) t: 14574 usecs |
| 102 | + stmt: euler_02.euler(euler_02.func, x0, t) t: 16826 usecs |
| 103 | + stmt: euler_03.euler(euler_03.func, x0, t) t: 3118 usecs |
| 104 | + stmt: euler_04.euler(x0, t) t: 1693 usecs |
| 105 | + stmt: euler_05.euler(x0, t) t: 1512 usecs |
| 106 | + stmt: euler_06.euler(x0, t) t: 1491 usecs |
| 107 | + stmt: euler_07.euler(x0, t) t: 29 usecs |
| 108 | + stmt: euler_08.euler(x0, t) t: 31 usecs |
| 109 | + stmt: euler_09.euler(x0, t) t: 41 usecs |
| 110 | + stmt: euler_10.euler(x0, t) t: 44 usecs |
| 111 | + stmt: euler_11.euler(x0, t) t: 1494 usecs |
| 112 | + stmt: euler_12.euler(x0, t) t: 40 usecs |
| 113 | + stmt: euler_13.euler(x0, t) t: 4531 usecs |
| 114 | + |
| 115 | +My interpretation of the above performance differences is as follows. |
| 116 | + |
| 117 | +1. `euler_01` is a pure python implementation and takes 15 millisseconds to |
| 118 | +run the test. |
| 119 | +2. `euler_02` reimplements `euler_01` in cython using `cpdef` functions with |
| 120 | +and static typing. The function `func` is passed as an argument to the |
| 121 | +`euler_02.euler` function. This means that, although it is implemented in c, |
| 122 | +`func` is called through its python interface. The overhead of calling into a |
| 123 | +`cpdef` function through its python interface actually increases the time |
| 124 | +taken to around 17 milliseconds. |
| 125 | +3. `euler_03` improves on `euler_02` by eliminating the creation of temparoray |
| 126 | +arrays and performing all array assignments with `cdef`'d integers. This |
| 127 | +brings the total running time down to about 3 milliseconds which is a factor |
| 128 | +of 5 improvement over the original pure python implementation. |
| 129 | +4. `euler_04` sacrifices the flexibility of being able to pass in any function |
| 130 | +you like by explicitly calling `func` from the `euler` routine. This ensures |
| 131 | +that the `cpdef` function is always called via its c interface and cuts the |
| 132 | +running time by a further 50% (factor of 10 improvement over pure python). |
| 133 | +5. `euler_05` attempts to improve performance by using disabling `wraparound` |
| 134 | +and `boundscheck` in the generated cython code. Unfortunately this only gives |
| 135 | +a small improvement. |
| 136 | +6. `euler_06` attempts to improve on the performance of `euler_05` by |
| 137 | +extracting the data pointer from the numpy array in `func` before assigning to |
| 138 | +it. This results in only a very small improvement. |
| 139 | +7. `euler_07` uses `cdef` functions and `double` pointers everywhere and |
| 140 | +the `cdef`'d `euler` routine explicitly calls the `cdef`'d `func` routine. |
| 141 | +This results in a massive performance boost. The time taken is now 30 |
| 142 | +microseconds, which is 50 times faster than `euler_08` and 500 times faster |
| 143 | +than pure python. This is probably close to the performance that would be |
| 144 | +available in pure c. This does, however, make it impossible for a user to |
| 145 | +supply their own `func` to the library. |
| 146 | +8. `euler_08` attempts to go even further by making `func` an inline function. |
| 147 | +This actually incurs a small performance penalty. |
| 148 | +9. `euler_09` defines an extension type `ODES` with methods `euler` and |
| 149 | +`_func`. This enables `_func` to be customised by subclassing `ODES` in |
| 150 | +another cython module. This incurs a 33% increase in running time relative to |
| 151 | +the super-fast `euler_07`. |
| 152 | +10. `euler_10` is the same as `euler_09` but shows the performance when |
| 153 | +running with a subclass of `ODES` as a library user would. This has a roughly |
| 154 | +50% overhead compared to `euler_07`. |
| 155 | +11. `euler_11` attempts to make the more efficient `euler_07-10` |
| 156 | +implementations more flexible, by adding a `cpdef` function `func` that can be |
| 157 | +overridden by subclassing in pure python. The default implementation of `func` |
| 158 | +calls into a `cdef` function `_func` that can only be overridden by |
| 159 | +subclassing in cython code. This makes it possible to subclass in python or |
| 160 | +cython and override `func` or `_func` respectively. Unfortunately, the |
| 161 | +overhead of calling into the `cpdef`'d function `func` reduces performance |
| 162 | +massively. |
| 163 | +12. `euler_12` achieves the same flexibility as `euler_11` without the |
| 164 | +performance cost by creating two extension types. A user who wants to write |
| 165 | +something in pure python must subclass `pyODES` instead of `ODES` and override |
| 166 | +`func` instead of `_func`. The performance of this variant is about 33% worse |
| 167 | +than the fastest version `euler_07` while keeping the intended flexibility |
| 168 | +that a user can override the methods in either python or cython. It is, |
| 169 | +however, unfortunate to have to subclass a different type and override a |
| 170 | +different method. Also if there would be subclasses of `ODES`, then each would |
| 171 | +need a corresponding `py` variant to be usable from pure python. |
| 172 | +13. `euler_13` demonstrates subclassing `pyODES` from `euler_12`. The |
| 173 | +performance is better than the pure python `euler_01` by a factor of about 3 |
| 174 | +Performance is not really a concern if the user is operating in pure python |
| 175 | +but it's good to know that we haven't incurred a penalty for the pure python |
| 176 | +mode by introducing all of the cython infrastructure. |
| 177 | + |
| 178 | + |
0 commit comments