Crossing the native code frontier

Serge « sans paille » Guelton <serge.guelton@telecom-bretagne.eu>

PyParis 2018 -- France / 14 -- 15 November 2018

a talk from Namek, the ultimate frontier

Foreword

Whenever I speak of Python, I am referring to:

  • the Python 3 language
  • its reference interpreter, CPython

Python, a glue language

All these modules are written in C:

And these ones rely on C extensions:

At the core: PyObject

Every object in Python is represented as a pointer to a reference-counted, heap-allocated structure, the PyObject*.

It wraps (among other things) an actual value

    -------------
    | PyObject* |
    -------------
    |  typeinfo |
    |  refcount |
    |    ...    |
    |   value   |
    -------------

RARE VIEW OF A PYOBJECT*
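
The typeinfo and refcount fields can be observed from Python itself. This is a minimal sketch, assuming a standard CPython build: sys.getrefcount is CPython-specific and counts the temporary reference held by its own argument, so only the difference between two readings is meaningful. The names value and alias are illustrative.

```python
import sys

value = object()                  # a fresh heap-allocated PyObject
before = sys.getrefcount(value)   # includes the temporary reference of the call
alias = value                     # a second name: one more reference, no copy
after = sys.getrefcount(value)

print(type(value))                # the typeinfo field
print(after - before)             # the refcount field moved by one
print(sys.getsizeof(value))       # size of the bare header, platform dependent
```

Binding a second name copies the pointer and bumps the counter; the object itself is never duplicated.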

Example: PyList

             -----------
             | PyList* |
             -----------
             |   ...   |
             |   ptr   |
             -----------
                 |
                 |
-------------------------------------
| PyObject* | PyObject* | PyObject* |
-------------------------------------

     GUTS OF [1., "TWO", 3]
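
The layout above can be probed from Python. This is a sketch under the assumption of a CPython interpreter, where a list slot is expected to cost about one pointer; the variable names are illustrative.

```python
import ctypes
import sys

items = [1., "two", 3]
second = items[1]                 # copies the pointer, not the string
same_object = second is items[1]

pointer_size = ctypes.sizeof(ctypes.c_void_p)
empty = sys.getsizeof([None] * 0)
full = sys.getsizeof([None] * 1000)
bytes_per_slot = (full - empty) / 1000   # storage cost of one list slot

print(same_object, pointer_size, bytes_per_slot)
```

The list only pays for the slots; the floats, strings and ints it points to live elsewhere on the heap.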

Consequences

  1. Great flexibility (dynamic typing, memory management, etc.)
  2. Great costs
    • boxing/unboxing
    • managed values (sizeof(PyObject) == 16 on my laptop)
    • memory fragmentation
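
The memory cost of boxing can be estimated with the standard library alone. This is a rough sketch, not a benchmark: array.array stands in for an unboxed native buffer, and the exact byte counts depend on the platform and the CPython version.

```python
import sys
from array import array

n = 1000
boxed = [float(i) for i in range(n)]   # n PyObject* slots plus n float objects
unboxed = array("d", boxed)            # one buffer of n raw C doubles

boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
unboxed_bytes = sys.getsizeof(unboxed)

print(boxed_bytes, unboxed_bytes)
```

The boxed version also scatters its n float objects across the heap, which is the fragmentation mentioned in the list above.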

Crossing the Frontier

Native code typically operates on unmanaged, unboxed values.
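
What crossing the frontier means for a single value can be sketched with the struct module, which here plays the role of the conversion layer. The "d" format is assumed to map to an 8-byte C double, which holds on mainstream platforms.

```python
import struct

boxed = 1.5                            # a Python float, i.e. a PyObject*
raw = struct.pack("d", boxed)          # unbox: the bare C double, as bytes
reboxed = struct.unpack("d", raw)[0]   # box: a new Python float

print(len(raw), reboxed)
```

Every call into native code pays for one such unboxing per argument, and one boxing per result.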

Being less Tedious

Being Careful

Updates to an unboxed value may not modify the boxed value!

def up(l):
    l[0] += 1

!=

void up(PyObject* obj) {
    PyObject* first = PyList_GetItem(obj, 0);  // borrowed reference to the boxed item
    long val = PyLong_AsLong(first);           // unboxed copy
    val += 1;                                  // the list item is left untouched
}
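
The same pitfall can be reproduced in pure Python. In this sketch, up_unboxed is a hypothetical stand-in for the native version: int(l[0]) plays the role of PyLong_AsLong and yields a local copy.

```python
def up_boxed(l):
    l[0] += 1            # rebinds slot 0 of the list to a new int object

def up_unboxed(l):
    val = int(l[0])      # stands in for PyLong_AsLong: a local copy
    val += 1             # only the local copy changes

a = [1, 2, 3]
b = [1, 2, 3]
up_boxed(a)
up_unboxed(b)
print(a, b)              # [2, 2, 3] [1, 2, 3]
```

On the native side, the result would have to be boxed again (PyLong_FromLong) and stored back (PyList_SetItem) for the caller to see it.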

Being less Costly

...

Being less Costly

Unchecked conversion

long unbox(PyObject* obj) {
    // comment out this test for more fun
    if (PyLong_Check(obj))
        return PyLong_AsLong(obj);
    else
        throw std::runtime_error("...");
}
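
A checked conversion of this kind can be sketched from Python through ctypes.pythonapi, which exposes the C API function PyLong_AsLong. The helper name _as_long and the TypeError are choices of this sketch, not part of the slide; the isinstance test plays the role of PyLong_Check.

```python
import ctypes

# PyLong_AsLong is part of the CPython C API; ctypes.pythonapi exposes it.
_as_long = ctypes.pythonapi.PyLong_AsLong
_as_long.argtypes = (ctypes.py_object,)
_as_long.restype = ctypes.c_long

def unbox(obj):
    # The explicit check mirrors PyLong_Check in the C++ version.
    if not isinstance(obj, int):
        raise TypeError("expected an int, got " + type(obj).__name__)
    return _as_long(obj)

print(unbox(41) + 1)
```

Skipping the check saves one test per conversion, at the price of trusting every caller to pass the right type.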

Being less Costly

Mixed mode

Convert scarcely

Being less Costly

Start unboxed

This is a trade-off!

Why?

Managed, fine-grained operations on NumPy arrays are costly

Why?

Because that's just crossing the frontier, the other way around!
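
The reverse crossing can be observed with any unboxed container. This sketch uses the standard library's array.array as a stand-in for a NumPy array, so that it runs without extra dependencies; the assumption is that the same reasoning applies to element-wise access on NumPy arrays.

```python
from array import array

unboxed = array("d", [1.0, 2.0, 3.0])   # raw C doubles, no PyObject per element

first = unboxed[0]        # crossing back: a Python float is created here
second = unboxed[0]       # and another one here, for the same raw value

total = 0.0
for x in unboxed:         # one boxing per iteration
    total += x

print(type(first).__name__, first == second, total)
```

Each element access from interpreted code builds a fresh boxed value, which is why whole-array operations are preferred over Python-level loops.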

Avoiding Conversion

Use a PyCapsule!

PyObject* PyCapsule_New(void *pointer, const char *name, PyCapsule_Destructor destructor);

And live in the native world forever

You indeed need a capsule to go to Namek
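
A capsule round trip can be sketched with ctypes.pythonapi alone. PyCapsule_New and PyCapsule_GetPointer are C API functions; the capsule name "demo.double" and the c_double payload are made up for this example. Both the payload and the name must outlive the capsule, since the capsule stores bare pointers to them.

```python
import ctypes

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p)
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = (ctypes.py_object, ctypes.c_char_p)

value = ctypes.c_double(27.0)   # native storage, kept alive by this name
NAME = b"demo.double"           # hypothetical capsule name, kept alive too

# Wrap the raw address: no conversion of the payload takes place.
capsule = api.PyCapsule_New(ctypes.addressof(value), NAME, None)

# Unwrap it: the very same address comes back.
address = api.PyCapsule_GetPointer(capsule, NAME)
roundtrip = ctypes.c_double.from_address(address).value

print(type(capsule).__name__, roundtrip)
```

The interpreter only ever handles the opaque capsule object; the double behind it is never boxed until native code, or ctypes here, decides to read it.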

Illustration x2

import ctypes
# capsule factory
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = (ctypes.c_void_p,
                          ctypes.c_char_p,
                          ctypes.c_void_p)
# load libm
libm = ctypes.CDLL('/lib/.../libm.so.6')
# extract the proper symbol
cbrt = libm.cbrt
# wrap it (under Python 3, c_char_p expects bytes for the capsule name)
capsule = PyCapsule_New(cbrt,
                        b'double(double)',
                        None)

Quiz '0

How many conversions happen?

#pythran export quizz0(int)
def quizz0(obj):
    return obj + 1

Quiz '1

How many conversions happen?

#pythran export quizz1(int [])
def quizz1(obj):
    return obj + 1

Quiz '2

How many conversions happen?

#pythran export quizz2(int list)
def quizz2(obj):
    return [x + 1 for x in obj]

Quiz '3

Any issue there?

#pythran export quizz3(int set)
def quizz3(obj):
    obj.pop(1)

Quiz '4

And there?

#pythran export quizz4(float64[])
def quizz4(obj):
    obj[0] = 1.

Quiz '5

How many conversions happen here?

#pythran export quizz5(float64(int32), int32)
def quizz5(capsule, value):
    if value > 0:
        return capsule(value)
    else:
        return 0.

Ending words

Now you should be able to understand

And if you don't, it's time for questions :-)
