C++ for snakes

a story of Pythran, Python and C++

Proudly made in Namek by serge-sans-paille

`/me`

Serge « sans paille » Guelton

$ whoami
sguelton

R&D engineer at QuarksLab on compilation for security
Associate researcher at Télécom Bretagne
Core dev of the Open Source Pythran compiler for Numerical Python

Python vs. C++

Foes?

Interpreted vs. Compiled
Duck Typing vs. Static Typing
Reflective vs. ∅
Automatic vs. Manual memory management

Python `<3` C++

for scientific computing

Calling C[++] code from Python is easy and cheap
Python scientific code looks like (modern) Fortran
No dynamic typing or reflective constructs

Translate a Python scientific kernel into efficient C++ code?

Kernel Comparison

In Python

def incr(x):
    return x + 1

In C++

template<class X>
auto incr(X&& x) -> decltype(x + 1)
{
    return x + 1;
}

... with functions

Kernel Comparison(0)

In Python

import numpy as np
def l2norm(x):
    return np.sqrt(np.sum(np.abs(x)**2, axis=1))

In C++

#include <numpy/abs.hpp>
#include <numpy/sqr.hpp>
#include <numpy/sqrt.hpp>
#include <numpy/sum.hpp>
namespace np = numpy;

template<class X>
auto l2norm(X&& x)
-> decltype(np::sqrt(np::sum(np::sqr(np::abs(std::forward<X>(x))), 1)))
{
    return np::sqrt(np::sum(np::sqr(np::abs(std::forward<X>(x))), 1));
}

... with slices

In Python

import numpy as np
def create_grid(x):
    N = x.shape[0]
    z = np.zeros((N, N, 3))
    z[:,:,0], z[:,:,1] = x.reshape(-1,1), x
    return z.reshape(N*N, 3)

In C++

#include <numpy/zeros.hpp>
namespace np = numpy;
template<class X>
auto create_grid(X&& x)
-> decltype(np::zeros(std::forward<X>(x).shape[0], std::forward<X>(x).shape[0], 3L))
{
  auto const N = std::forward<X>(x).shape[0];
  auto z = np::zeros(std::array<long, 3>{{N, N, 3}});
  std::tie(z(np::slice(), np::slice(), 0), z(np::slice(), np::slice(), 1)) =
      std::make_tuple(std::forward<X>(x).reshape(-1, 1), std::forward<X>(x));
  return z.reshape(N*N, 3);
}

Motivations for a numeric Python to C++ translator

Similar constructs (but C++ is more verbose)
Polymorphism seems possible (thanks to templates)
Variable arguments seem possible (thanks to variadic templates)
Type inference seems possible (thanks to auto and decltype)
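
A minimal sketch of the three features listed above, written for illustration and not taken from Pythran's sources. `incr` is the kernel shown earlier; `sum_all` is a hypothetical helper introduced here.

```cpp
#include <type_traits>

// Polymorphism: one template serves every type that supports `x + 1`.
// Type inference: the return type is deduced from the expression itself.
template <class X>
constexpr auto incr(X&& x) -> decltype(x + 1)
{
    return x + 1;
}

// Variable arguments: the one-argument overload ends the recursion.
template <class T>
constexpr T sum_all(T t)
{
    return t;
}

// Variadic case: peel off the first argument, recurse on the rest.
// std::common_type picks a result type wide enough for every argument.
template <class T, class U, class... Ts>
constexpr typename std::common_type<T, U, Ts...>::type
sum_all(T t, U u, Ts... ts)
{
    return t + sum_all(u, ts...);
}
```

With these definitions, `incr(1)` is an `int` and `incr(1.5)` a `double`, both from the same source text, and `sum_all(1, 2.5)` is a `double`.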

... and C++11 is fun!

Pythran

A numeric Python to C++ translator

http://pythonhosted.org/pythran
Available through GitHub and PyPI, and my personal Debian repository
Works on Linux and Mac OS X
Packaged for Arch Linux too (thanks garrik!)

2 core developers, ~50 downloads per day on PyPI, scarce but valuable user reports!

Pythran's Compilation Flow

Turn .py into an AST (import ast...)
Simplify the AST for simpler analyses
Optimize the AST for performance
Generate parametric C++ code
Instantiate the C++ code for the given types
Wrap everything with Boost.Python
Use nt2 + pythonic headers
Generate a native Python module (.so)

Boost.Python

Provides a framework for easy encapsulation of C(++) functions and classes, and conversion between Python and C++ objects

Conversion of basic types (extendible)
Conversion of C++ exceptions to Python exceptions (extendible)
Conversion of functions with different signatures (explicit)

/!\ Conversions are based on a linear search for a good candidate

Boost.Python example

BOOST_PYTHON_MODULE(foo)
{
  import_array();
  boost::python::register_exception_translator<pythonic::types::TypeError>(
    &pythonic::translate_TypeError);
  pythonic::python_to_pythran<pythonic::types::ndarray<double,1>>();
  pythonic::python_to_pythran<pythonic::types::ndarray<double,2>>();
  pythonic::pythran_to_python<typename foo::foo::type<pythonic::types::ndarray<double,2>>::result_type>();
  boost::python::def("foo", &foo::foo);
  foo::__init__()();
}

Function Model

We want to be able to represent

a = map; a(foo, [1]); a(bar, [1], [2])

Use a generic functor!

struct map {
    template<class F, class... Args>
    auto operator()(F&& f, Args&&... args) const
    -> python::list<decltype(std::forward<F>(f)(std::forward<Args>(args)...))>;
};

Needs a specialization for F = None...
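
A minimal sketch of such a functor, under stated assumptions: `std::vector` stands in for `python::list`, the names `map_functor` and `all_equal` are hypothetical, all sequences are assumed to have the same length, and the `F = None` case is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when every size in the pack equals n (hypothetical helper).
inline bool all_equal(std::size_t) { return true; }

template <class... Ns>
bool all_equal(std::size_t n, std::size_t m, Ns... rest)
{
    return n == m && all_equal(n, rest...);
}

// A functor rather than a function template: it is an ordinary object,
// so it can be stored in a variable and later called with any arity.
struct map_functor {
    template <class F, class T, class... Ts>
    auto operator()(F&& f, std::vector<T> const& first,
                    std::vector<Ts> const&... rest) const
        -> std::vector<decltype(f(first[0], rest[0]...))>
    {
        // Simplification: unequal lengths are rejected, not padded.
        assert(all_equal(first.size(), rest.size()...));
        std::vector<decltype(f(first[0], rest[0]...))> out;
        out.reserve(first.size());
        for (std::size_t i = 0; i < first.size(); ++i)
            out.push_back(f(first[i], rest[i]...));
        return out;
    }
};
```

A single object `map_functor a;` then accepts both `a(foo, xs)` and `a(bar, xs, ys)`, which mirrors `a = map` on the Python side.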

Automatic Vectorization

Boost.SIMD

Provides transparent manipulation of abstract vector registers, for all mathematical functions

`ndarray` class

Combines expression templates with Boost.SIMD for aggregated vector operations

Enhanced Expression Templates

Pythran compiler turns

a = b + c; d = a * 3

into

d = (b + c) * 3

but does not forward-substitute

a = b + c; c[0] = 1; d = a * 3

nor

a = b + c; d = a * 3; e = a ** 2
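
Why the fused form `d = (b + c) * 3` pays off can be sketched with a toy expression template. This is an illustration under assumptions, not Pythran's `ndarray`: the types `Array`, `AddExpr`, `ScaleExpr` and the function `eval` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1-D array of doubles.
struct Array {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Node for `lhs + rhs`: stores references, computes nothing yet.
template <class L, class R>
struct AddExpr {
    L const& lhs;
    R const& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Node for `expr * factor`: holds the (small) operand node by value.
template <class E>
struct ScaleExpr {
    E expr;
    double factor;
    double operator[](std::size_t i) const { return expr[i] * factor; }
    std::size_t size() const { return expr.size(); }
};

inline AddExpr<Array, Array> operator+(Array const& a, Array const& b)
{
    assert(a.size() == b.size());
    return {a, b};
}

template <class L, class R>
ScaleExpr<AddExpr<L, R>> operator*(AddExpr<L, R> const& e, double k)
{
    return {e, k};
}

// The only loop: each element of (b + c) * k is computed in one pass,
// without materializing the intermediate array `a = b + c`.
template <class E>
Array eval(E const& e)
{
    Array out;
    out.data.resize(e.size());
    for (std::size_t i = 0; i < e.size(); ++i)
        out.data[i] = e[i];
    return out;
}
```

Because the nodes hold references to `b` and `c`, the substitution is only safe when neither operand is modified before evaluation, which is the situation the `c[0] = 1` counter-example rules out.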

Tricks!

Neat C++ stuff we used in Pythran

Generic Array slicing and indexing

template<class Arg, class... S>
struct numpy_gexpr;

Represents any slicing or multiple indexing of any expression
S can be a scalar, a contiguous_slice or a slice
All operations imply recursive metaprogramming!

Generic Array Dimension

Dimension of a `numpy_gexpr`?

static constexpr size_t value = std::remove_reference<Arg>::type::value
                                - count_long<S...>::value;

where


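// primary template; the single-type base cases count_long<T> are defined elsewhere
template<class... Types>
struct count_long;
template<class T, class... Types>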
struct count_long<T, Types...> {
    static constexpr size_t value = count_long<T>::value + count_long<Types...>::value;
};
template<>
struct count_long<> {
    static constexpr size_t value = 0;
};
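
For reference, a self-contained sketch of the counting scheme that compiles on its own. The empty `slice` and `contiguous_slice` structs, `array3d` and `gexpr_dim` are hypothetical stand-ins, and the single-type base cases are an assumption about what the recursion needs in order to terminate.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-ins for the slice types (assumed empty here).
struct slice {};
struct contiguous_slice {};

// Primary template: declared only, every use hits a specialization.
template <class... Types>
struct count_long;

// Base cases: a scalar index (long) removes one dimension, a slice keeps it.
template <> struct count_long<long> { static constexpr std::size_t value = 1; };
template <> struct count_long<slice> { static constexpr std::size_t value = 0; };
template <> struct count_long<contiguous_slice> { static constexpr std::size_t value = 0; };
template <> struct count_long<> { static constexpr std::size_t value = 0; };

// Recursive case: count the head, then the tail.
template <class T, class... Types>
struct count_long<T, Types...> {
    static constexpr std::size_t value =
        count_long<T>::value + count_long<Types...>::value;
};

// Stand-in for a 3-D array type exposing its dimension as `value`.
struct array3d { static constexpr std::size_t value = 3; };

// Dimension of an indexed expression: source dimension minus scalar indices.
template <class Arg, class... S>
struct gexpr_dim {
    static constexpr std::size_t value =
        std::remove_reference<Arg>::type::value - count_long<S...>::value;
};
```

Under these assumptions, an expression such as `z[:, :, 0]` on a 3-D array maps to `gexpr_dim<array3d, slice, slice, long>`, whose `value` is 2.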

Fast Array Copy

std::copy is specialized to use ::memmove when working on flat pointers

Specialize array expressions to return a pointer when legit

Use a type trait to encode the stride information, based on the slice type

static const bool is_strided =
  std::remove_reference<Arg>::type::is_strided or
  (((sizeof...(S) - count_long<S...>::value) == value)
    and not std::is_same<contiguous_slice,
            typename std::tuple_element<sizeof...(S) - 1, std::tuple<S...>>::type>::value);
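
How such a trait can select the copy strategy at compile time, as a sketch under assumptions: `contiguous_view`, `strided_view` and `copy_to` are hypothetical names, and the views only hold `double` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Contiguous view: elements sit next to each other in memory.
struct contiguous_view {
    double const* data;
    std::size_t size;
    static constexpr bool is_strided = false;
};

// Strided view: element i lives at data[i * step].
struct strided_view {
    double const* data;
    std::size_t size;
    std::size_t step;
    static constexpr bool is_strided = true;
};

// Not strided: the whole range is one flat block, so one memmove does it.
template <class View>
typename std::enable_if<!View::is_strided>::type
copy_to(View const& src, double* dst)
{
    std::memmove(dst, src.data, src.size * sizeof(double));
}

// Strided: fall back to an element-wise loop.
template <class View>
typename std::enable_if<View::is_strided>::type
copy_to(View const& src, double* dst)
{
    assert(src.step > 0);
    for (std::size_t i = 0; i < src.size; ++i)
        dst[i] = src.data[i * src.step];
}
```

The choice between the two overloads is made by the type alone, so the contiguous path carries no runtime test.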

Faster `shared_ptr`

In Pythran, memory is always allocated, then put in a shared_ptr

Behavior similar to shared_ptr and make_shared
Manages complex interaction with CPython reference counting mechanism
Optional use of an atomic counter
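
A rough sketch of the idea, assuming a design where the value and its counter share one allocation (as `make_shared` does) and the counter type is a template parameter. The name `shared_ref` and its interface are hypothetical, and the interaction with CPython reference counting is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <class T, class Counter = long>
class shared_ref {
    // Value and counter live in the same heap block: one allocation.
    struct memory {
        T value;
        Counter count;
        template <class... Args>
        memory(Args&&... args)
            : value(std::forward<Args>(args)...), count(1) {}
    };
    memory* mem;

    explicit shared_ref(memory* m) : mem(m) {}

public:
    // Allocate and wrap in one step, so no raw pointer escapes.
    template <class... Args>
    static shared_ref make(Args&&... args)
    {
        return shared_ref(new memory(std::forward<Args>(args)...));
    }

    shared_ref(shared_ref const& other) : mem(other.mem) { ++mem->count; }

    shared_ref& operator=(shared_ref other)
    {
        std::swap(mem, other.mem);
        return *this;
    }

    ~shared_ref()
    {
        assert(mem->count > 0);
        if (--mem->count == 0)
            delete mem;
    }

    T& operator*() const { return mem->value; }
    long use_count() const { return mem->count; }
};
```

`shared_ref<int>` uses a plain `long` counter, while `shared_ref<int, std::atomic<long>>` opts into atomic increments for code shared between threads.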

R-value qualifier member functions

For fine grain control over memory ownership

// in ndarray class
auto fast(long i) const & // no copy there
-> decltype(type_helper<ndarray const &>::get(*this, i))
{
  return type_helper<ndarray const &>::get(*this, i);
}
auto fast(long i) && // implies a copy
-> decltype(type_helper<ndarray>::get(std::move(*this), i))
{
  return type_helper<ndarray>::get(std::move(*this), i);
}
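
The same overloading rule on a toy class, as a sketch: `matrix` and `row` are hypothetical names, not Pythran's.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

class matrix {
    std::vector<std::vector<double>> rows_;

public:
    explicit matrix(std::vector<std::vector<double>> rows)
        : rows_(std::move(rows)) {}

    // Called on an lvalue: the matrix outlives the call,
    // so handing out a reference is safe and copies nothing.
    std::vector<double> const& row(std::size_t i) const &
    {
        return rows_[i];
    }

    // Called on a temporary: the storage is about to be destroyed,
    // so the row is moved out and returned by value.
    std::vector<double> row(std::size_t i) &&
    {
        return std::move(rows_[i]);
    }
};
```

The compiler picks the overload from the value category of the object, so `m.row(0)` on a named `m` returns a reference while `make_matrix().row(0)` on a temporary returns an owning value.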

Concluding Remarks

Be open-minded!

Learning Python makes you a better C++ dev! It favors:

doctesting, unittesting
High-level programming

...and learning C++ makes you a better Python dev! It favors:

Performance|memory model understanding
Structured, hierarchical programming
