What is the air speed velocity of Pythran?

Like many projects, Pythran has its own test suite that runs on travis-ci and AppVeyor. It's a nice test suite: it tests different compilers (namely Clang, GCC and MSVC) with different sets of flags (with and without OpenMP and vectorization) and for Python versions 2.7 and 3.5. Nice. Brest.

A while ago, I started to collect various high-level Numpy kernels in the numpy-benchmarks repo. The goal was to check how well Pythran behaves on these kernels, and to compare it with other compilers. Over time I've been adding new kernels that showcase different optimization challenges, which brings us to more than 35 different kernels. I was occasionally using that set of kernels to check for performance regressions, but that was only semi-automated and rather tedious.

Then came airspeed velocity (asv), a tool that provides exactly the missing pieces: automated benchmarking across software revisions, regression tracking and nice plots. So it was just a matter of stirring everything together to give birth to https://github.com/serge-sans-paille/pythran-asv. In order not to break the link with numpy-benchmarks, the original repo is imported as a git subtree, and the tests are automatically generated based on the kernel annotations :-)

Configuration

I don't have a dedicated machine to run the performance tracking session, so I just ran the following:

$ asv run 0.8.6..master

Basically, this command runs timeit on all kernels from numpy-benchmarks, following the kernel annotations, but only once these kernels have been compiled by the Pythran compiler extracted from each revision between the latest release, 0.8.6, and the current master. That's a good way to check whether recent commits brought any performance improvements or regressions.
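
To make "timeit according to the kernel annotations" a bit more concrete, here is a rough sketch of the idea. It assumes each kernel file starts with `#setup:` and `#run:` comment lines describing how to build the inputs and how to call the kernel; the helper names `parse_annotations` and `time_kernel` are made up for this illustration, and the kernel is run as plain Python rather than compiled. It is not the actual pythran-asv generator.

```python
import re
import timeit

# A toy kernel file. The #setup:/#run: header lines are the assumed
# annotation convention; the kernel itself is only an illustration.
KERNEL = '''\
#setup: import numpy as np ; N = 1000 ; x = np.arange(N, dtype=float)
#run: total(x)

#pythran export total(float64[])
import numpy as np
def total(x):
    return np.sum(x * x)
'''


def parse_annotations(source):
    """Return the (setup, run) statements found in the header comments."""
    setup = re.search(r"^#setup:(.*)$", source, re.MULTILINE).group(1).strip()
    run = re.search(r"^#run:(.*)$", source, re.MULTILINE).group(1).strip()
    return setup, run


def time_kernel(source, number=10):
    """Time the annotated call; returns the best time per call, in seconds."""
    setup, run = parse_annotations(source)
    namespace = {}
    exec(source, namespace)  # defines the kernel function (uncompiled here)
    exec(setup, namespace)   # builds the inputs named by the annotation
    timer = timeit.Timer(run, globals=namespace)
    return min(timer.repeat(repeat=3, number=number)) / number
```

Calling `time_kernel(KERNEL)` gives a per-call timing for the toy kernel. In the real setup, the function being timed would come from the module compiled by the Pythran revision under test.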

In order to reduce system jitter, the following one-liner helps a lot:

$ python -m perf system tune

These benchmarks were run during the night, with all apps closed, using a blank ~/.pythranrc, GCC 7.3 and good ol' Python 2.7. For reference, I have an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, but as the goal is merely to track regressions, the details are not as relevant as they would be for a traditional benchmark setup.

Subtleties

Although asv works like a charm, some aspects require care.

Pythran is an ahead-of-time compiler, so it does not make sense to track compilation time as part of the actual benchmark. asv does provide a setup mechanism, but it proved to be too costly (the compilation is triggered multiple times, which takes too long when ranging over hundreds of commits). Fortunately, there is a setup_cache mechanism that fulfills our need perfectly!
numpy-benchmarks tries hard to use randomized input, in order to avoid over-specializing a benchmark for a given input set. In some cases this led to unstable behavior, so we occasionally forced the seed. Changing the kernel body is not an option, since one of the points of using numpy-benchmarks is that the kernels come from third-party sources and are not tailored for a given compiler.
On very rare occasions, the Pythran annotation language evolves, and kernels that use the new syntax cannot be compiled by older versions of Pythran. This has actually happened only once, when it became possible to specify the value of a given dimension. As a workaround, if a kernel fails to compile, a rewrite rule automatically changes the annotation back to the traditional, less specialized one.
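
For the curious, here is a minimal sketch of how the first two points can fit together in an asv benchmark class. It relies on asv running `setup_cache` once per commit and environment in a temporary working directory, and on annotated kernels remaining valid Python. The class name, the file name, the `compile_kernel` helper and the Rosenbrock-style kernel are all made up for this example; the real pythran-asv benchmarks are generated and may look quite different.

```python
import importlib
import os
import sys

import numpy as np

# Illustrative kernel source, not copied from numpy-benchmarks.
KERNEL_SOURCE = '''\
#pythran export rosen(float64[])
import numpy as np
def rosen(x):
    t0 = 100 * (x[1:] - x[:-1] ** 2) ** 2
    t1 = (1 - x[:-1]) ** 2
    return np.sum(t0 + t1)
'''


def compile_kernel(path):
    """Hypothetical helper: build `path` with Pythran when it is installed."""
    try:
        import pythran
    except ImportError:
        # Annotated kernels are still valid Python, so the sketch keeps
        # working (uncompiled) when Pythran is not available.
        return path
    return pythran.compile_pythranfile(path)


class TimeRosen:
    """asv times the `time_*` methods of classes like this one."""

    def setup_cache(self):
        # Run once per commit and environment, not before every
        # measurement: the costly compilation goes here. Files written to
        # the current directory stay visible to the benchmark processes.
        with open("rosen_kernel.py", "w") as handle:
            handle.write(KERNEL_SOURCE)
        compile_kernel("rosen_kernel.py")

    def setup(self):
        # Cheap preparation, excluded from the timing.
        if os.getcwd() not in sys.path:
            sys.path.insert(0, os.getcwd())
        importlib.invalidate_caches()
        self.kernel = importlib.import_module("rosen_kernel").rosen
        # Forced seed: random-looking input, identical from run to run.
        self.x = np.random.RandomState(42).rand(10000)

    def time_rosen(self):
        self.kernel(self.x)
```

The important part is the split: compilation lives in `setup_cache`, input generation in `setup`, and only the kernel call ends up inside the timed method.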

Results

As of this post, the results of the regression tests have been snapshotted at https://serge-sans-paille.github.io/pythran-asv-snapshot/. They provide a good illustration of what the recent commits brought :-)

laplacien greatly benefits from the partially static shape specialization introduced in 2e9dc6d69. This commit also has a positive impact on grayscott, but a slightly negative impact on wdist.
make_decision exhibits a pattern captured by a52ee300: the square of the norm of a complex number.
reverse_cumsum returns a complex Numpy view, something Pythran can do efficiently since 8e871136.
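
To illustrate the make_decision pattern: squaring the norm of a complex number computes a square root only to undo it right away. The sketch below just shows the mathematical identity with NumPy; the function names are invented for the example, and it makes no claim about what a52ee300 actually generates.

```python
import numpy as np


def squared_norm_naive(z):
    # The way it is naturally written in a kernel: modulus, then square.
    # The modulus involves a square root that is cancelled immediately.
    return np.abs(z) ** 2


def squared_norm_direct(z):
    # Same quantity, computed without any square root.
    return z.real ** 2 + z.imag ** 2


z = np.array([3 + 4j, 1 - 1j, 0.5j])
print(squared_norm_naive(z))
print(squared_norm_direct(z))
```

Both functions agree up to floating-point rounding, which is why a compiler that recognizes the first form can substitute the cheaper second one.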

Thanks and Future Work

The original idea of using asv to track Pythran performance comes from a discussion with the QuantStack guys. They really are amazing folks, and all the interactions I have with them really keep the motivation up. They even funded all this regression tracking stuff for Pythran. Did I mention they're cool?

The next step is probably to run the regression suite on a larger commit range, which may spot some more regressions or give hints for further improvements. And I'm pretty confident @wolfv will provide an adaptation of pythran-asv for xtensor, as he already did for numpy-benchmarks.

And thanks a lot to ashwinvis for his review!