the Capsule Corporation

This post is not about the famous Hoi-Poi Capsule but about a Python feature I recently discovered: PyCapsule. From the doc:

This subtype of PyObject represents an opaque value, useful for C extension modules who need to pass an opaque value (as a void* pointer) through Python code to other C code.

It turns out it's used in at least one situation relevant to Pythran: as a parameter of SciPy's LowLevelCallable. Thanks to this mechanism, some SciPy functions written as C extensions can call functions written in another extension without any Python conversion in between.
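
To make the idea concrete, here is a minimal sketch that builds a capsule from pure Python by calling the CPython C API through ctypes.pythonapi. It assumes a CPython interpreter; the capsule name "demo.value" and the variable names are made up for illustration, and real capsules are normally created from C code instead.

```python
import ctypes

# Bind two functions of the CPython C API (assumes CPython).
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p)

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = (ctypes.py_object, ctypes.c_char_p)

# The native value we want to smuggle through Python code.
value = ctypes.c_double(42.0)

# The capsule stores the name pointer without copying it, so `name`
# (like `value`) must stay alive as long as the capsule is in use.
name = b"demo.value"  # hypothetical name, for illustration only
capsule = PyCapsule_New(ctypes.addressof(value), name, None)  # None: no destructor

# On the consuming side, the address is handed back only if the name matches.
address = PyCapsule_GetPointer(capsule, name)
recovered = ctypes.cast(address, ctypes.POINTER(ctypes.c_double)).contents.value
print(type(capsule).__name__, recovered)
```

From the Python side the capsule is an opaque object: it has no attributes to inspect, and PyCapsule_GetPointer checks the name before handing the address back.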

As an illustration, I reproduce an example from an official SciPy tutorial. The following code is compiled as a shared library with $ gcc -shared -fPIC -o testlib.so testlib.c -O2

/* testlib.c */
double f(int n, double *x, void *user_data) {
    double c = *(double *)user_data;
    return c + x[0] - x[1] * x[2]; /* corresponds to c + x - y * z */
}

It is then loaded through ctypes and wrapped in a LowLevelCallable, ready to be passed to scipy.integrate.nquad:

import os, ctypes
from scipy import integrate, LowLevelCallable

lib = ctypes.CDLL(os.path.abspath('testlib.so'))
lib.f.restype = ctypes.c_double
lib.f.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_double), ctypes.c_void_p)

c = ctypes.c_double(1.0)
user_data = ctypes.cast(ctypes.pointer(c), ctypes.c_void_p)

func = LowLevelCallable(lib.f, user_data)
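
The calling convention that the C function has to honour can be exercised without a compiler. The sketch below is an illustration only: it implements the same signature as a ctypes callback backed by Python code, so it has none of the speed of the compiled version. The names CALLBACK, py_f and point are hypothetical.

```python
import ctypes

# Same prototype as the C function: double f(int, double *, void *)
CALLBACK = ctypes.CFUNCTYPE(
    ctypes.c_double,
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_void_p,
)

@CALLBACK
def py_f(n, x, user_data):
    # user_data arrives as a raw address; reinterpret it as a double*.
    c = ctypes.cast(user_data, ctypes.POINTER(ctypes.c_double))[0]
    return c + x[0] - x[1] * x[2]

# Pack the extra parameter exactly as in the SciPy example.
c = ctypes.c_double(1.0)
user_data = ctypes.cast(ctypes.pointer(c), ctypes.c_void_p)

# Evaluate at the point (x, y, z) = (2, 3, 4): 1 + 2 - 3 * 4
point = (ctypes.c_double * 3)(2.0, 3.0, 4.0)
result = py_f(3, point, user_data)
print(result)
```

The void* parameter carries no type information, so both sides have to agree that it points to a double.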

A quick'n dirty benchmark gives a hint about the raw performance of the process:

>>> dat =  [[0, 10], [-10, 0], [-1, 1]]
>>> %timeit integrate.nquad(func, dat)
1000 loops, best of 3: 1.78 ms per loop

Using Pythran to generate a capsule

The whole purpose of Pythran is to avoid writing any C code at all. An equivalent of testlib.so can be derived from the following Python code annotated with a pythran export, using $ pythran testlib.py -O2 to produce a shared library testlib.so.

# testlib.py
#pythran export f(int, float64 [], float64 [])
def f(n, x, cp):
    c = cp[0]
    return c + x[0] - x[1] * x[2]

Unfortunately, the generated function still converts Python data to native data before running the native code, so it's not a good candidate for loading through ctypes at all.

Something I like to say about Pythran is that it converts Python programs into C++ meta-programs that are instantiated for the types given in the pythran export lines. And that's definitely a useful thing[0], as it makes it dead easy to change the interface and generate Python-free functions. With a bit of syntactic sugar, it gives the following:

# testlib.py
#pythran export capsule f(int32, float64*, float64* )
def f(n, x, cp):
    c = cp[0]
    return c + x[0] - x[1] * x[2]

Only the Pythran comment changes; the Python code is unchanged. The resulting f is not even a function, it's actually a capsule:

>>> from testlib import f
>>> f
<capsule object "f(int, float64*, float64*)" at 0x7f554d69f840>
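
The string shown in that repr is the capsule's name, and it can also be read programmatically. As a sketch that runs without Pythran, the snippet below reads the name of a capsule that ships with CPython's standard library, datetime.datetime_CAPI; applying the same call to a Pythran-generated capsule is an assumption, not something tested here.

```python
import ctypes
import datetime

# Bind PyCapsule_GetName from the CPython C API (assumes CPython).
PyCapsule_GetName = ctypes.pythonapi.PyCapsule_GetName
PyCapsule_GetName.restype = ctypes.c_char_p
PyCapsule_GetName.argtypes = (ctypes.py_object,)

# A capsule that exists in every CPython installation.
capsule = datetime.datetime_CAPI
capsule_name = PyCapsule_GetName(capsule).decode()
print(capsule_name)
```

The name Pythran gives its capsule is written with Pythran type names rather than C ones, which is presumably why the C signature has to be spelled out when the capsule is handed to SciPy.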

SciPy's LowLevelCallable also supports capsules as a way to access function pointers:

>>> c = ctypes.c_double(1.0)
>>> user_data = ctypes.cast(ctypes.pointer(c), ctypes.c_void_p)
>>> func = LowLevelCallable(f, user_data, signature="double (int, double *, void *)")

Then we can run the same benchmark as above:

>>> dat =  [[0, 10], [-10, 0], [-1, 1]]
>>> %timeit integrate.nquad(func, dat)
1000 loops, best of 3: 1.75 ms per loop

Cool, the same performance, while keeping Python-compatible code \o/.

Capsule and NumPy

There is another interesting usage example in the SciPy documentation. In that example, the capsule creation is purely done in C, using the Python C API. Let's see how we can achieve the same result with Pythran. The original C routine is the following:

static int
_transform(npy_intp *output_coordinates, double *input_coordinates, int output_rank, int input_rank, void *user_data)
{
    npy_intp i;
    double shift = *(double *)user_data;

    for (i = 0; i < input_rank; i++) {
        input_coordinates[i] = output_coordinates[i] - shift;
    }
    return 1;
}

Using Pythran and NumPy, it is possible to write a portable version like this:

from numpy.ctypeslib import as_array
def transform(output_coordinates, input_coordinates, output_rank, input_rank, user_data):
    shift = user_data[0]
    input_data = as_array(input_coordinates, input_rank)
    output_data = as_array(output_coordinates, output_rank)
    input_data[:] = output_data - shift
    return 1

def transform_basic(output_coordinates, input_coordinates, output_rank, input_rank, user_data):
    shift = user_data[0]
    for i in range(input_rank):
        input_coordinates[i] = output_coordinates[i] - shift
    return 1

Note that thanks to numpy.ctypeslib that's still 100% pure Python code, using official APIs.
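
As a quick check of that claim for the loop-based variant, the sketch below runs transform_basic in the regular interpreter, with ctypes arrays standing in for the native pointers. The buffer names and the sample coordinates are made up for illustration.

```python
import ctypes

def transform_basic(output_coordinates, input_coordinates,
                    output_rank, input_rank, user_data):
    shift = user_data[0]
    for i in range(input_rank):
        input_coordinates[i] = output_coordinates[i] - shift
    return 1

# ctypes buffers play the role of npy_intp*, double* and void* (hypothetical data).
out_coords = (ctypes.c_int64 * 2)(3, 5)
in_coords = (ctypes.c_double * 2)()
shift = ctypes.c_double(0.5)

status = transform_basic(out_coords, in_coords, 2, 2, ctypes.pointer(shift))
print(status, list(in_coords))
```

Indexing works the same way on a ctypes pointer or array as on the native pointer seen by the compiled capsule, so the function body does not need to change.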

The export line to create a capsule is:

#pythran export capsule transform(int64*, float64*, int32, int32, float64*)
#pythran export capsule transform_basic(int64*, float64*, int32, int32, float64*)

Once compiled with Pythran, we get a native library that can be imported and used in a Python script:

import ctypes
import numpy as np
from scipy import ndimage, LowLevelCallable

from example import transform

shift = 0.5

user_data = ctypes.c_double(shift)
ptr = ctypes.cast(ctypes.pointer(user_data), ctypes.c_void_p)
callback = LowLevelCallable(transform, ptr, "int (npy_intp *, double *, int, int, void *)")
im = np.arange(12).reshape(4, 3).astype(np.float64)
print(ndimage.geometric_transform(im, callback))

Performance-wise, the version based on NumPy arrays still lags slightly behind because of the extra array creation (it initializes memory-management machinery that is useless here), while the loop-based version is equivalent to the one written in C.

Pitfalls and Booby Traps

Using a PyCapsule requires some care, as the user (you) is responsible for correctly mapping the native arguments:

The signature passed to LowLevelCallable needs to be exactly the one required by SciPy. Not a single extra white space is allowed!
Changing the Pythran annotation to #pythran export capsule f(int32, float64*, float32*) does not yield any error (no type checking can be done when matching this to the LowLevelCallable signature) but the actual result is incorrect. Indeed, aliasing a float32* to a float64* is incorrect!
The pointer types in the Pythran annotation are only meaningful within a capsule. There is currently no way to use them in regular Pythran functions.
There is no way to put an overloaded function into a capsule (a capsule wraps a function pointer, which is incompatible with overloads).
Wrapping a pointer into an ndarray using numpy.ctypeslib.as_array currently implies a slight overhead :/.
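
To see why the pointer aliasing pitfall silently produces wrong numbers, here is a small sketch using only the struct module. It reinterprets the eight bytes of a double as two single-precision floats, which is what happens when native code reads a double buffer through a float32 pointer; little-endian layout is spelled out explicitly so the result does not depend on the machine.

```python
import struct

# The eight bytes that represent 1.0 as a little-endian float64.
raw = struct.pack("<d", 1.0)

# Reading the same bytes as two float32 values, as a float32* would.
as_float32 = struct.unpack("<2f", raw)
print(as_float32)
```

Neither of the two values is 1.0, and nothing in the call chain can notice the mismatch, since only raw addresses cross the capsule boundary.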

Apart from that, I'm glad this new feature landed, thanks a lot to @maartenbreddels for opening the related issue!

[0]	It comes at a price though: all Pythran optimizations are type agnostic, which puts a heavy burden on the compiler developer's shoulders.