Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial migration to nanobind #2

Open
PMeira opened this issue Feb 12, 2018 · 11 comments
Open

Partial migration to nanobind #2

PMeira opened this issue Feb 12, 2018 · 11 comments

Comments

@PMeira
Copy link
Member

PMeira commented Feb 12, 2018

Try to write a implementation based on ctypes and/or Cython to compare performance of the bindings on CPython and PyPy.

Investigate whether Numba's CFFI support can be useful to speed up anything.

Document pain points for each approach.

@PMeira PMeira added this to the 1.0 milestone Feb 12, 2018
@PMeira PMeira self-assigned this Feb 12, 2018
@PMeira
Copy link
Member Author

PMeira commented Aug 13, 2018

A Cython implementation will be investigated, especially since it's better integrated with NumPy.

After studying the different usages of CFFI, it becomes apparent that a ctypes implementation wouldn't help much. One of CFFI's modes actually use ctypes directly, so we could just use that to test the speed if really necessary.

@PMeira
Copy link
Member Author

PMeira commented Sep 6, 2018

An initial Cython implementation was added in the cython branch.

Initial observations from the somewhat naive implementation: Cython generates a huge C file (200k lines) compared to CFFI, and it takes a while to compile it (the CFFI file compiles instantly).

A quick set of microbenchmarks (reading most of the implemented properties) shows it is usually faster than CFFI. When both the benchmarks and this implementation are more complete, we can decide if it is worth to more to Cython. Ideally we should use a set of real-world scripts to benchmark the implementation as whole.

I also ran the benchmark with PyPy on Windows. Using lists instead of NumPy arrays (didn't want to spend time building NumPy on Windows) resulted in a 4-20x speedup compared to CPython.

@PMeira PMeira modified the milestones: 1.0, 0.11 Sep 6, 2018
@PMeira PMeira changed the title Provide implementation using ctypes and/or Cython Provide implementation using Cython and Pybind11 Sep 6, 2018
@PMeira
Copy link
Member Author

PMeira commented Sep 6, 2018

Added Pybind11 as another target of this comparison. Some libraries have moved from Cython and CFFI to it and usually people are happy. Since KLUSolve already needs a C++ compiler and we're trying to provide binary packages for most platforms, the C++ dependency wouldn't be an issue.

@PMeira PMeira changed the title Provide implementation using Cython and Pybind11 Provide implementations using Cython and Pybind11 Sep 6, 2018
@PMeira
Copy link
Member Author

PMeira commented Sep 10, 2018

It seems the call overhead of Pybind11 is higher than both Cython and CFFI.

@PMeira
Copy link
Member Author

PMeira commented Aug 9, 2020

Almost 2 years later, the most likely future is to move to Cython after updating and benchmarking the current versions.
The main advantage of Cython is that it creates the NumPy arrays at the C side, removing some overhead.

Since this is not urgent, we can wait for the initial results of the HPy project to decide. CFFI is very convenient and I'd like to keep it if possible. If HPy develops enough by the end of the year, fully migrating performance-dependent code to PyPy would be the recommendation.

@PMeira
Copy link
Member Author

PMeira commented Jan 5, 2021

If Cython 3.0 is released before the final 0.11 release, I'll retest it and probably use Cython as the main supporting code.

Looking at the size of the CFFI binaries, though, I think it is important to keep them too. One reason is backwards compatibility, another is that users would need a compiler to achieve some things that already can be done with CFFI. As long as the same underlying DSS C-API binary is used, things should work smoothly, and hopefully faster for many use-cases.

@PMeira PMeira modified the milestones: future, 0.11 Apr 11, 2021
@PMeira
Copy link
Member Author

PMeira commented Apr 11, 2021

Moving back to 0.11.

@PMeira
Copy link
Member Author

PMeira commented Apr 13, 2021

DragonFFI is close to how CFFI works, but uses LLVM instead to process the headers/code: https://github.com/aguinet/dragonffi

It uses Pybind11. Probably worth checking on CPython.

@PMeira PMeira modified the milestones: 0.12, 1.0 Jul 12, 2022
@PMeira
Copy link
Member Author

PMeira commented Jul 13, 2022

If we generate the code ourselves or use one of existing tools, due to the high number of variants required here -- (ARM32, ARM64, x86, x86-64) x (Windows, Linux, macOS) -- Py_LIMITED_API (#24) should be a requirement.

Since Cython is a bit slow to progress, the other options will be re-evaluated, as well as generating a full Python module in C/C++ for the CPython API or HPy. Most of the API-binding code for DSS Python is trivial and the main Cython advantage is the creation of NumPy arrays in C. So, provided we can do that without too many issues, there is no way a faster API can be developed for CPython.

The current CFFI implementation can be kept here for PyPy and others, and that can be used for benchmarks and comparisons. We should also setup a comparison across different CPython versions.

Related: dss-extensions/dss_matlab#13

@PMeira
Copy link
Member Author

PMeira commented Mar 29, 2023

Cython reached 3.0 beta stage, yet Py_LIMITED_API is not supported. It's a good point to retest the performance though.

pybind11 seems to support Py_LIMITED_API. I also found https://github.com/google/pywrapcc with some interesting comments.

Nowadays our codebase is mature enough to consider keeping the C extension code. Our Python C-API usage is minimal. Creating NumPy arrays directly would be good; if there is something like .NET's ArrayPool, if certain would be beneficial. It could be fun to test HPy too.

@PMeira PMeira changed the title Provide implementations using Cython and Pybind11 Partial migration to nanobind Oct 21, 2023
@PMeira
Copy link
Member Author

PMeira commented Oct 21, 2023

Considering we're moving to C++ in the engine, nanobind seems to fit well and has integration with some important packages already: https://github.com/wjakob/nanobind

This would move some code of the Python code to C++, but actually simplify some features, like the error checks.

I also tested a simple function to convert the errors directly to exceptions using Python's C-API:

void AltDSSPy_HandleError(int32_t errorNumber, const char* errorMessage)
{
    PyGILState_STATE gstate = PyGILState_Ensure();
    PyObject *util_mod = PyImport_AddModule("dss._cffi_api_util");
    PyObject *dss_exc = PyObject_GetAttrString(util_mod, "DSSException");
    PyObject *exc_args = PyTuple_New(2);
    PyTuple_SetItem(exc_args, 0, PyLong_FromLong(errorNumber));
    PyTuple_SetItem(exc_args, 1, PyUnicode_FromString(errorMessage));
    PyErr_SetObject(dss_exc, exc_args);
    Py_DECREF(dss_exc);
    PyGILState_Release(gstate);
}

This works fine, but we need to adapt the Python-facing function to return NULL, otherwise we get a SystemError exception too. At this point, it loses its utility.

This will be done after we fully migrate to C++ to avoid introducing even more changes in the current implementation.

(title of the issue updated)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant