Pythran stories

Pythran 0.8.3 is out!

This blogpost originally was a Jupyter Notebook. You can download it if you want. The conversion was done using nbconvert and a custom template to match the style of the other part of the blog.

Pythran minor version bump

So Pythran just got a version bump. The primary goal of this version is to match the upgrade of the networkx dependency. Pythran now requires networkx>=2.0.

>>> import pythran
>>> pythran.__version__
'0.8.3'

A few extra stuff are bundled in this new version, let's explore them using this notebook.

Fix Jupyter magic

Pythran comes with a Jupyter magic cell extension, very similar to Cython's, that makes it possible to compile Python code with Pythran within a cell.

>>> %load_ext pythran.magic
>>> %%pythran
>>> #pythran export fma(float[:], float[:], float[:]))
>>> def fma(a, b, c):
...     return a + b * c
>>> import numpy as np
>>> n = 100000
>>> x = np.random.random(n)
>>> y = np.random.random(n)
>>> z = np.random.random(n)
>>> %timeit fma(x, y, z)
89.7 µs ± 3.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

There was a bug when a cell was recompiled with the same textual content, but a new set of options. This is fixed in this version, as showcased by the following cell that compiles the same code with more aggressive optimization flags, which results in faster (vectorized) code.

>>> %%pythran -march=native -Ofast -DUSE_BOOST_SIMD
>>> #pythran export fma(float[:], float[:], float[:]))
>>> def fma(a, b, c):
...     return a + b * c
>>> %timeit fma(x, y, z)
74 µs ± 288 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Support out param for np.sum

Many numpy operation accept an out parameter, as an alternative output array to place the result of the computation. Pythran now supports this parameter for the numpy.sum and numpy.prod operations. It makes it possible to avoid an extra copy.

>>> n = 1000
>>> x = np.random.random((n,n))
>>> y = np.empty(n)
>>> %%pythran
>>> #pythran export isum(float[:, :], float[:])
>>> def isum(x, y):
...     import numpy as np
...     y[:] = np.sum(x, axis=0)
...     return y
>>> %timeit isum(x, y)
757 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%pythran
>>> #pythran export isum(float[:, :], float[:])
>>> def isum(x, y):
...     import numpy as np
...     np.sum(x, axis=0, out=y)
...     return y
>>> %timeit isum(x, y)
314 ns ± 0.574 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Partial Constant Folding

Pythran supports (interprocedural) constant folding since its very start, and it now also supports a degraded version that folds some operation on list and tuples when only part of the arguments is constant.

>>> import gast as ast
>>> from pythran import passmanager, backend
>>> from pythran.optimizations import PartialConstantFolding
... 
>>> code = '''
>>> def replicate(n):
...     return [n] * 8
>>> '''
>>> node = ast.parse(code)
>>> pm = passmanager.PassManager("test")
>>> status, node = pm.apply(PartialConstantFolding, node)
>>> print(pm.dump(backend.Python, node))
def replicate(n):
    return [n, n, n, n, n, n, n, n]

Better Cython integration

Since version 0.8.1, and thanks to the work of Adrien Guinet funded by OpenDreamKit, Cython has a Pythran mode to delegate Numpy expression computation to the Pythran engine. It's much more stable now, even if it does not mean stable :-).

>>> %load_ext Cython

Note that the following cell calls Pythran through Cython though Jupyter magic. What a wonderful world!

>>> %%cython
>>> # cython: language=c++
>>> # cython: np_pythran=True
... 
>>> import numpy as np
>>> cimport numpy as cnp
... 
>>> def tax(cnp.ndarray[double, ndim=1] d):
...     tax_seg1 = d[(d > 256303)] * 0.45 - 16164.53
...     tax_seg2 = d[(d > 54057) & (d <= 256303)] * 0.42 - 8475.44
...     seg3 = d[(d > 13769) & (d <= 54057)] - 13769
...     seg4 = d[(d > 8820) & (d <= 13769)] - 8820
...     prog_seg3 = seg3 * 0.0000022376 + 0.2397
...     prog_seg4 = seg4 * 0.0000100727 + 0.14
...     return (
...         np.sum(tax_seg1) +
...         np.sum(tax_seg2) +
...         np.sum(seg3 * prog_seg3 + 939.57) +
...         np.sum(seg4 * prog_seg4)
...     ) / np.sum(d)
>>> import numpy as np
>>> d = np.random.random(100)
>>> %timeit tax(d)
22.9 µs ± 39.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> def tax_numpy(d):
...     tax_seg1 = d[(d > 256303)] * 0.45 - 16164.53
...     tax_seg2 = d[(d > 54057) & (d <= 256303)] * 0.42 - 8475.44
...     seg3 = d[(d > 13769) & (d <= 54057)] - 13769
...     seg4 = d[(d > 8820) & (d <= 13769)] - 8820
...     prog_seg3 = seg3 * 0.0000022376 + 0.2397
...     prog_seg4 = seg4 * 0.0000100727 + 0.14
...     return (
...         np.sum(tax_seg1) +
...         np.sum(tax_seg2) +
...         np.sum(seg3 * prog_seg3 + 939.57) +
...         np.sum(seg4 * prog_seg4)
...     ) / np.sum(d)
>>> %timeit tax_numpy(d)
28.9 µs ± 90 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> abs(tax(d) - tax_numpy(d))
0.0

That's all folks

The release is available on Github, on PyPI and Conda.

Thanks to Finistere for his help, to paugier and ashwinvis for their bug reports.

And if you too feel like contributing, the bug tracker is a lively place ;-)