Skip to content

Commit 67a47a6

Browse files
committed
Added README.rst to explain what the files do.
1 parent d538921 commit 67a47a6

File tree

5 files changed

+189
-19
lines changed

5 files changed

+189
-19
lines changed

README.rst

Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
1+
EULER
2+
=====
3+
4+
Intro
5+
-----
6+
7+
This repo hosts my attempts at using cython to create an library that defines
8+
an forward-Euler integration routine. This is a proxy for prototyping the
9+
structure that I will use in SODE. The situation is that the library should
10+
define an algorithm - in this case forward-Euler - and a user of the library
11+
should define the function that the algorithm will operate on. My objective is
12+
that the user should be able to:
13+
14+
1. Use the library with a pure python function for learning or debugging. In
15+
this case performance is not a concern.
16+
2. Use the library with a function defined in a cython extension module for
17+
better performance. This case should perform as fast as possible.
18+
19+
Ideally the code used for the python/cython versions should be as similar as
20+
possible so that converting a pure python version to its cython equivalent
21+
should be trivial.
22+
23+
The basic problem is how to achieve maximum efficiency in the pure cython case
24+
while still allowing users the flexibility to supply a function from pure
25+
python.
26+
27+
Outline
28+
-------
29+
30+
The first attempt is in pure python and looks like::
31+
:language: python
32+
33+
# euler_01.py
34+
import numpy as np
35+
36+
def euler(f, x0, t):
37+
X = np.zeros((len(t), len(x0)), float)
38+
dxdt = np.zeros(len(x0), float)
39+
40+
X[0, :] = x = np.array(x0)
41+
tlast = t[0]
42+
43+
for n, tcur in enumerate(t[1:], 1):
44+
f(x, tlast, dxdt)
45+
X[n, :] = x = x + dxdt * (tcur - tlast)
46+
tlast = tcur
47+
48+
return X
49+
50+
def func(x, t, dxdt):
51+
dxdt[0] = x[1]
52+
dxdt[1] = - x[0]
53+
54+
The idea is that the function `euler` is defined in the library and the
55+
function `func` is defined in a script written by a user of the library. The
56+
function `func` is chosen to describe the Ordinary Differential Equations of
57+
Simple Harmonic Motion. The function `euler` will perform forward Euler
58+
integration of the function to produce an array `X` representing the state of
59+
the system at times corresponding to the elements of `t`. We can run this with
60+
a script like::
61+
:language: python
62+
63+
# script.py
64+
65+
from pylab import *
66+
import euler_01
67+
68+
x0 = array([0., 1.])
69+
t = arange(0, 10, 0.01)
70+
71+
X = euler_01.euler(euler_01.func, x0, t)
72+
73+
plot(t, X)
74+
show()
75+
76+
We want to improve on the performance of the above routine by implementing
77+
`euler` in cython code and allowing a user to implement `func` in cython code
78+
and have the result achieve performance close to what would be avilable in
79+
pure-c.
80+
81+
Files
82+
-----
83+
84+
The files `euler_N.{py|pyx}` in this repo represent different attempts at
85+
achieving this. To build all of the cython extension modules run::
86+
:language: terminal
87+
88+
$ python setup.py build_ext --inplace
89+
90+
To test the `euler_08` implementation do::
91+
:language: terminal
92+
93+
$ python run.py 08
94+
95+
which will plot the results using matplotlib.
96+
97+
To test the performance of all of the implementations, do::
98+
:language: terminal
99+
100+
$ ./run.py
101+
stmt: euler_01.euler(euler_01.func, x0, t) t: 14574 usecs
102+
stmt: euler_02.euler(euler_02.func, x0, t) t: 16826 usecs
103+
stmt: euler_03.euler(euler_03.func, x0, t) t: 3118 usecs
104+
stmt: euler_04.euler(x0, t) t: 1693 usecs
105+
stmt: euler_05.euler(x0, t) t: 1512 usecs
106+
stmt: euler_06.euler(x0, t) t: 1491 usecs
107+
stmt: euler_07.euler(x0, t) t: 29 usecs
108+
stmt: euler_08.euler(x0, t) t: 31 usecs
109+
stmt: euler_09.euler(x0, t) t: 41 usecs
110+
stmt: euler_10.euler(x0, t) t: 44 usecs
111+
stmt: euler_11.euler(x0, t) t: 1494 usecs
112+
stmt: euler_12.euler(x0, t) t: 40 usecs
113+
stmt: euler_13.euler(x0, t) t: 4531 usecs
114+
115+
My interpretation of the above performance differences is as follows.
116+
117+
1. `euler_01` is a pure python implementation and takes 15 millisseconds to
118+
run the test.
119+
2. `euler_02` reimplements `euler_01` in cython using `cpdef` functions with
120+
and static typing. The function `func` is passed as an argument to the
121+
`euler_02.euler` function. This means that, although it is implemented in c,
122+
`func` is called through its python interface. The overhead of calling into a
123+
`cpdef` function through its python interface actually increases the time
124+
taken to around 17 milliseconds.
125+
3. `euler_03` improves on `euler_02` by eliminating the creation of temparoray
126+
arrays and performing all array assignments with `cdef`'d integers. This
127+
brings the total running time down to about 3 milliseconds which is a factor
128+
of 5 improvement over the original pure python implementation.
129+
4. `euler_04` sacrifices the flexibility of being able to pass in any function
130+
you like by explicitly calling `func` from the `euler` routine. This ensures
131+
that the `cpdef` function is always called via its c interface and cuts the
132+
running time by a further 50% (factor of 10 improvement over pure python).
133+
5. `euler_05` attempts to improve performance by using disabling `wraparound`
134+
and `boundscheck` in the generated cython code. Unfortunately this only gives
135+
a small improvement.
136+
6. `euler_06` attempts to improve on the performance of `euler_05` by
137+
extracting the data pointer from the numpy array in `func` before assigning to
138+
it. This results in only a very small improvement.
139+
7. `euler_07` uses `cdef` functions and `double` pointers everywhere and
140+
the `cdef`'d `euler` routine explicitly calls the `cdef`'d `func` routine.
141+
This results in a massive performance boost. The time taken is now 30
142+
microseconds, which is 50 times faster than `euler_08` and 500 times faster
143+
than pure python. This is probably close to the performance that would be
144+
available in pure c. This does, however, make it impossible for a user to
145+
supply their own `func` to the library.
146+
8. `euler_08` attempts to go even further by making `func` an inline function.
147+
This actually incurs a small performance penalty.
148+
9. `euler_09` defines an extension type `ODES` with methods `euler` and
149+
`_func`. This enables `_func` to be customised by subclassing `ODES` in
150+
another cython module. This incurs a 33% increase in running time relative to
151+
the super-fast `euler_07`.
152+
10. `euler_10` is the same as `euler_09` but shows the performance when
153+
running with a subclass of `ODES` as a library user would. This has a roughly
154+
50% overhead compared to `euler_07`.
155+
11. `euler_11` attempts to make the more efficient `euler_07-10`
156+
implementations more flexible, by adding a `cpdef` function `func` that can be
157+
overridden by subclassing in pure python. The default implementation of `func`
158+
calls into a `cdef` function `_func` that can only be overridden by
159+
subclassing in cython code. This makes it possible to subclass in python or
160+
cython and override `func` or `_func` respectively. Unfortunately, the
161+
overhead of calling into the `cpdef`'d function `func` reduces performance
162+
massively.
163+
12. `euler_12` achieves the same flexibility as `euler_11` without the
164+
performance cost by creating two extension types. A user who wants to write
165+
something in pure python must subclass `pyODES` instead of `ODES` and override
166+
`func` instead of `_func`. The performance of this variant is about 33% worse
167+
than the fastest version `euler_07` while keeping the intended flexibility
168+
that a user can override the methods in either python or cython. It is,
169+
however, unfortunate to have to subclass a different type and override a
170+
different method. Also if there would be subclasses of `ODES`, then each would
171+
need a corresponding `py` variant to be usable from pure python.
172+
13. `euler_13` demonstrates subclassing `pyODES` from `euler_12`. The
173+
performance is better than the pure python `euler_01` by a factor of about 3
174+
Performance is not really a concern if the user is operating in pure python
175+
but it's good to know that we haven't incurred a penalty for the pure python
176+
mode by introducing all of the cython infrastructure.
177+
178+

euler_07.pyx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ cpdef euler(np.ndarray[DTYPE_t, ndim=1] x0, np.ndarray[DTYPE_t, ndim=1] t):
2626
# Pre-loop setup
2727
for n in range(N):
2828
pX[0 + n] = px[n] = <double>x0[n]
29-
tlast = t[0]
29+
tlast = pt[0]
3030

3131
# Main loop
3232
for m in range(1, M):

euler_08.pyx

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -3,11 +3,6 @@ cimport numpy as np
33

44
ctypedef np.float64_t DTYPE_t
55

6-
# This one is used by the euler function defined above
7-
cdef inline void func(double* x, double t, double* dxdt):
8-
dxdt[0] = x[1]
9-
dxdt[1] = - x[0]
10-
116
cpdef euler(np.ndarray[DTYPE_t, ndim=1] x0, np.ndarray[DTYPE_t, ndim=1] t):
127

138
cdef int n, m, N, M
@@ -30,7 +25,7 @@ cpdef euler(np.ndarray[DTYPE_t, ndim=1] x0, np.ndarray[DTYPE_t, ndim=1] t):
3025

3126
# Pre-loop setup
3227
for n in range(N):
33-
pX[0 + n] = px[n] = x0[n]
28+
pX[0 + n] = px[n] = <double>x0[n]
3429
tlast = pt[0]
3530

3631
# Main loop
@@ -42,4 +37,11 @@ cpdef euler(np.ndarray[DTYPE_t, ndim=1] x0, np.ndarray[DTYPE_t, ndim=1] t):
4237
px[n] += pdxdt[n] * dt
4338
pX[m*N + n] = px[n]
4439
tlast = tcur
40+
41+
# Final result
4542
return X
43+
44+
# This one is used by the euler function defined above
45+
cdef inline void func(double* x, double t, double* dxdt):
46+
dxdt[0] = x[1]
47+
dxdt[1] = - x[0]

euler_09.pyx

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,11 +3,6 @@ cimport numpy as np
33

44
ctypedef np.float64_t DTYPE_t
55

6-
# This one is used by the euler function defined above
7-
cdef inline func(double* x, double t, double* dxdt):
8-
dxdt[0] = x[1]
9-
dxdt[1] = - x[0]
10-
116
cdef class ODES:
127

138
def func(self, np.ndarray[DTYPE_t, ndim=1] x, double t,

euler_11.pyx

Lines changed: 2 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,15 +3,10 @@ cimport numpy as np
33

44
ctypedef np.float64_t DTYPE_t
55

6-
# This one is used by the euler function defined above
7-
cdef inline func(double* x, double t, double* dxdt):
8-
dxdt[0] = x[1]
9-
dxdt[1] = - x[0]
10-
116
cdef class ODES:
127

13-
cdef func(self, np.ndarray[DTYPE_t, ndim=1] x, double t,
14-
np.ndarray[DTYPE_t, ndim=1] dxdt):
8+
cpdef func(self, np.ndarray[DTYPE_t, ndim=1] x, double t,
9+
np.ndarray[DTYPE_t, ndim=1] dxdt):
1510
self._func(<double*>x.data, t, <double*>dxdt.data)
1611
return dxdt
1712

0 commit comments

Comments
 (0)