C++ for snakes

a story of Pythran, Python and C++

Proudly made in Namek by serge-sans-paille

/me

Serge « sans paille » Guelton

$ whoami
sguelton
  • R&D engineer at QuarksLab on compilation for security
  • Associate researcher at Télécom Bretagne
  • Core dev of the Open Source Pythran compiler for Numerical Python

Python vs. C++

Foes?

  • Interpreted vs. Compiled
  • Duck Typing vs. Static Typing
  • Reflective vs. ∅
  • Automatic vs. Manual memory management

Python <3 C++

for scientific computing

  • Calling C[++] code from Python is easy and cheap
  • Python scientific code looks like (modern) Fortran
  • Scientific kernels make little use of dynamic typing or reflective constructs

Translate a Python scientific kernel into efficient C++ code?

Kernel Comparison

In Python

def incr(x):
    return x + 1

In C++

template<class X>
auto incr(X&& x) -> decltype(x + 1)
{
    return x + 1;
}

... with functions

Kernel Comparison

In Python

import numpy as np
def l2norm(x):
    return np.sqrt(np.sum(np.abs(x)**2, axis=1))

In C++

#include <numpy/abs.hpp>
#include <numpy/sqr.hpp>
#include <numpy/sqrt.hpp>
#include <numpy/sum.hpp>
#include <utility> // std::forward
namespace np = numpy;

template<class X>
auto l2norm(X&& x)
-> decltype(np::sqrt(np::sum(np::sqr(np::abs(std::forward<X>(x))), 1)))
{
    return np::sqrt(np::sum(np::sqr(np::abs(std::forward<X>(x))), 1));
}

... with slices

In Python

import numpy as np
def create_grid(x):
    N = x.shape[0]
    z = np.zeros((N, N, 3))
    z[:,:,0], z[:,:,1] = x.reshape(-1,1), x
    return z.reshape(N*N, 3)

In C++

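#include <array>   // std::array
#include <tuple>   // std::tie, std::make_tuple
#include <utility> // std::forward
#include <numpy/zeros.hpp>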
namespace np = numpy;
template<class X>
auto create_grid(X&& x)
-> decltype(np::zeros(std::forward<X>(x).shape[0], std::forward<X>(x).shape[0], 3L))
{
  auto const N = std::forward<X>(x).shape[0];
  auto z = np::zeros(std::array<long, 3>{{N, N, 3}});
  std::tie(z(np::slice(), np::slice(), 0), z(np::slice(), np::slice(), 1)) =
      std::make_tuple(std::forward<X>(x).reshape(-1, 1), std::forward<X>(x));
  return z.reshape(N*N, 3);
}

Motivations for a numeric Python to C++ translator

  • Similar constructs (but C++ is more verbose)
  • Polymorphism seems possible (thanks to templates)
  • Variable arguments seem possible (thanks to variadic templates)
  • Type inference seems possible (thanks to auto and decltype)
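The three C++11 features listed above can be seen side by side in a minimal sketch. `incr` is the kernel from the earlier slide; `sum_all` is a hypothetical helper introduced only for illustration and is not part of Pythran.

```cpp
#include <cassert>
#include <type_traits>

// Polymorphism: one template accepts any type for which `x + 1` is valid.
// Type inference: the return type is deduced with decltype.
template <class X>
auto incr(X&& x) -> decltype(x + 1)
{
    return x + 1;
}

// Variable arguments: a variadic template plays the role of Python's *args.
// sum_all is a hypothetical helper, not a Pythran function.
template <class T>
T sum_all(T t)
{
    return t;
}

template <class T, class... Ts>
typename std::common_type<T, Ts...>::type sum_all(T t, Ts... ts)
{
    // Peel off one argument and recurse on the rest of the pack.
    return t + sum_all(ts...);
}
```

Calling `incr(1)` yields an `int` and `incr(1.5)` a `double`, with no signature written by hand; `sum_all(1, 2.5, 3L)` mixes three argument types in one call.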

... and C++11 is fun!

Pythran

A numeric Python to C++ translator

  • http://pythonhosted.org/pythran
  • Available through GitHub and PyPI, and my personal Debian repository
  • Works on Linux and MacOSX
  • Packaged for Arch Linux too (thanks garrik!)

2 core developers, ~50 downloads per day on PyPI, scarce but valuable user reports!

Pythran's Compilation Flow

  1. Turn .py into an AST (import ast...)
  2. Simplify the AST for simpler analyses
  3. Optimize the AST for performance
  4. Generate parametric C++ code
  5. Instantiate the C++ code for the given types
  6. Wrap everything with Boost.Python
  7. Use nt2 + pythonic headers
  8. Generate a native Python module (.so)
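Steps 4 and 5 can be pictured with a small sketch. It assumes that a kernel is emitted as a functor that is generic in its argument types and exposes its return type through a nested `type<...>::result_type` trait, as the Boost.Python example suggests. The namespace `demo` and the function `incr_double` are hypothetical names, and the real generated code is certainly more involved.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace demo {
// Step 4 (sketch): the kernel becomes a functor that is generic in its
// argument type. The nested `type` trait exposes the deduced return type.
struct incr {
    template <class X>
    struct type {
        typedef decltype(std::declval<X>() + 1) result_type;
    };

    template <class X>
    typename type<X>::result_type operator()(X const& x) const
    {
        return x + 1;
    }
};
} // namespace demo

// Step 5 (sketch): one concrete entry point per requested signature.
// This is the function a wrapper layer could export to Python.
inline demo::incr::type<double>::result_type incr_double(double x)
{
    return demo::incr()(x);
}
```

The generic functor stays usable for any argument type, while only the concrete instantiation needs to be visible to the wrapping layer.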

Boost.Python

Provides a framework for easy encapsulation of C(++) functions and classes, and conversion between Python and C++ objects

  • Conversion of basic types (extendible)
  • Conversion of C++ exceptions to Python exceptions (extendible)
  • Conversion of functions with different signatures (explicit)

/!\ Conversions are based on a linear search for a good candidate

Boost.Python example

BOOST_PYTHON_MODULE(foo)
{
  import_array();
  boost::python::register_exception_translator<pythonic::types::TypeError>(
    &pythonic::translate_TypeError);
  pythonic::python_to_pythran<pythonic::types::ndarray<double,1>>();
  pythonic::python_to_pythran<pythonic::types::ndarray<double,2>>();
  pythonic::pythran_to_python<typename foo::foo::type<pythonic::types::ndarray<double,2>>::result_type>();
  boost::python::def("foo", &foo::foo);
  foo::__init__()();
}
                

Function Model

We want to be able to represent

a = map; a(foo, [1]); a(bar, [1], [2])

Use a generic functor!

struct map {
    template<class F, class... Args>
    auto operator()(F&& f, Args&&... args) const
    -> python::list<decltype(std::forward<F>(f)(std::forward<Args>(args)...))>;
};

Needs a specialization for F = None...
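A self-contained sketch of the generic functor idea, assuming `std::vector` in place of `python::list` and input sequences of equal length. `map_functor`, `foo` and `bar` are hypothetical names, and the `F = None` case is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Because map is an object with a templated call operator, it can be
// stored in a variable and later called with any callable and any
// number of sequences.
struct map_functor {
    template <class F, class Seq, class... Seqs>
    auto operator()(F&& f, Seq const& s, Seqs const&... ss) const
        -> std::vector<decltype(f(s[0], ss[0]...))>
    {
        std::vector<decltype(f(s[0], ss[0]...))> out;
        out.reserve(s.size());
        // Walk all sequences in lockstep (assumes equal lengths).
        for (std::size_t i = 0; i < s.size(); ++i)
            out.push_back(f(s[i], ss[i]...));
        return out;
    }
};

// Two callables with different arities, standing in for foo and bar.
struct foo {
    long operator()(long x) const { return x + 1; }
};
struct bar {
    long operator()(long x, long y) const { return x + y; }
};
```

With `map_functor a;`, both `a(foo(), one)` and `a(bar(), one, two)` compile against the same object, which mirrors `a = map; a(foo, [1]); a(bar, [1], [2])`.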

Automatic Vectorization

Boost.SIMD

Provides transparent manipulation of abstract vector registers, for all mathematical functions

ndarray class

Combines expression templates with Boost.SIMD for aggregated vector operations

Enhanced Expression Templates

The Pythran compiler turns

a = b + c; d = a * 3
into
d = (b + c) * 3
but does not forward-substitute
a = b + c; c[0] = 1; d = a * 3
nor
a = b + c; d = a * 3; e = a ** 2
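A minimal expression-template sketch shows why the first rewrite pays off and why the second case is refused. It assumes a toy 1-D array of `double`; `darray`, `add_expr` and `scale_expr` are hypothetical names, and neither Boost.SIMD nor Pythran's actual ndarray is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base so that the operators below only match array-like expressions.
template <class E>
struct expr {
    E const& self() const { return static_cast<E const&>(*this); }
};

// Lazy node for lhs + rhs: nothing is computed until operator[] is called.
template <class L, class R>
struct add_expr : expr<add_expr<L, R>> {
    L const& lhs;
    R const& rhs;
    add_expr(L const& l, R const& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Lazy node for lhs * scalar.
template <class L>
struct scale_expr : expr<scale_expr<L>> {
    L const& lhs;
    double factor;
    scale_expr(L const& l, double k) : lhs(l), factor(k) {}
    double operator[](std::size_t i) const { return lhs[i] * factor; }
    std::size_t size() const { return lhs.size(); }
};

struct darray : expr<darray> {
    std::vector<double> data;
    darray(std::size_t n, double v) : data(n, v) {}

    // Evaluating an expression tree: a single loop, no temporary array.
    template <class E>
    darray(expr<E> const& e) : data(e.self().size())
    {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }
};

template <class L, class R>
add_expr<L, R> operator+(expr<L> const& l, expr<R> const& r)
{
    return add_expr<L, R>(l.self(), r.self());
}

template <class L>
scale_expr<L> operator*(expr<L> const& l, double k)
{
    return scale_expr<L>(l.self(), k);
}
```

`darray d((b + c) * 3.0);` runs one fused loop. If the lazy node `b + c` is kept in a variable and `c[0]` is modified before evaluation, the result reflects the modified `c`, unlike the eager Python semantics, so substituting `a` in that case would change the program.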

Tricks!

Neat C++ stuff we used in Pythran

Generic Array slicing and indexing

template<class Arg, class... S>
struct numpy_gexpr;

  • Represents any slicing or multiple indexing of any expression
  • S can be a scalar, a contiguous_slice or a slice
  • All operations imply recursive meta-programming!

Generic Array Dimension

Dimension of a numpy_gexpr?

static constexpr size_t value = std::remove_reference<Arg>::type::value
                                - count_long<S...>::value;

where


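// Primary template. The single-type cases (1 for long, 0 for slice types)
// are not shown on the slide.
template<class... Types>
struct count_long;

template<class T, class... Types>
struct count_long<T, Types...> {
    static constexpr size_t value = count_long<T>::value + count_long<Types...>::value;
};
template<>
struct count_long<> {
    static constexpr size_t value = 0;
};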
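A compilable sketch of the whole computation, assuming that the single-type cases count 1 for `long` and 0 for the two slice types. The empty `slice` and `contiguous_slice` structs and the `gexpr_dim` helper are hypothetical stand-ins used only to exercise the recursion.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder slice types (assumption: the real ones carry bounds and step).
struct slice {};
struct contiguous_slice {};

template <class... Types>
struct count_long;

// Recursive case: count the head, then the rest of the pack.
template <class T, class... Types>
struct count_long<T, Types...> {
    static constexpr std::size_t value =
        count_long<T>::value + count_long<Types...>::value;
};

// Base cases: a scalar index removes one dimension, a slice keeps it.
template <>
struct count_long<> {
    static constexpr std::size_t value = 0;
};
template <>
struct count_long<long> {
    static constexpr std::size_t value = 1;
};
template <>
struct count_long<slice> {
    static constexpr std::size_t value = 0;
};
template <>
struct count_long<contiguous_slice> {
    static constexpr std::size_t value = 0;
};

// Dimension of the indexed expression: source dimension minus scalar indices.
// gexpr_dim is a hypothetical helper mirroring the formula on the slide.
template <std::size_t ArgDim, class... S>
struct gexpr_dim {
    static constexpr std::size_t value = ArgDim - count_long<S...>::value;
};
```

For a 3-D array, indexing with one slice and one scalar (as in `z[:, 0]`) gives `gexpr_dim<3, slice, long>::value == 2`, computed entirely at compile time.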

Fast Array Copy

std::copy is specialized to use ::memmove when working on flat pointers

Specialize array expressions to return a pointer when legit

Use a type trait to encode the stride information, based on the slice type

static const bool is_strided =
  std::remove_reference<Arg>::type::is_strided or
  (((sizeof...(S) - count_long<S...>::value) == value)
    and not std::is_same<contiguous_slice,
            typename std::tuple_element<sizeof...(S) - 1, std::tuple<S...>>::type>::value);
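The dispatch on a stride trait can be sketched as follows. This is an illustration under simplifying assumptions: two hypothetical view types over `double` carry a compile-time `is_strided` flag, and `copy_into` is a hypothetical helper that calls `std::memmove` directly instead of going through `std::copy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical views: the stride information lives in the type.
struct contiguous_view {
    double const* data;
    std::size_t size;
    static const bool is_strided = false;
};
struct strided_view {
    double const* data;
    std::size_t size;
    std::size_t step;
    static const bool is_strided = true;
};

// Contiguous case: the elements form one flat block, copy it in one call.
template <class V>
void copy_into(V const& src, double* dst, std::false_type)
{
    std::memmove(dst, src.data, src.size * sizeof(double));
}

// Strided case: fall back to an element-wise loop.
template <class V>
void copy_into(V const& src, double* dst, std::true_type)
{
    for (std::size_t i = 0; i < src.size; ++i)
        dst[i] = src.data[i * src.step];
}

// Entry point: the trait selects the implementation at compile time.
template <class V>
void copy_into(V const& src, double* dst)
{
    copy_into(src, dst, std::integral_constant<bool, V::is_strided>());
}
```

The choice costs nothing at run time, since the tag type is resolved during overload resolution.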

Faster shared_ptr

In Pythran, memory is always allocated then put in a shared_ptr

  • Behavior similar to shared_ptr and make_shared
  • Manages complex interaction with CPython reference counting mechanism
  • Optional use of an atomic counter
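A rough sketch of what such a pointer could look like, assuming a single allocation that holds both the value and its counter (as `make_shared` does) and a counter type chosen by a template parameter. `shared_ref` and its members are hypothetical names for this illustration, and the interaction with CPython reference counting is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Counter defaults to a plain long; pass std::atomic<long> for threads.
template <class T, class Counter = long>
class shared_ref {
    // Value and counter live in the same heap block: one allocation.
    struct block {
        T value;
        Counter count;
        template <class... A>
        explicit block(A&&... a) : value(std::forward<A>(a)...), count(1) {}
    };
    block* mem_;

    explicit shared_ref(block* m) : mem_(m) {}

    void release()
    {
        if (mem_ && --mem_->count == 0)
            delete mem_;
        mem_ = nullptr;
    }

public:
    // Allocation always goes through make, so there is no raw-pointer path.
    template <class... Args>
    static shared_ref make(Args&&... args)
    {
        return shared_ref(new block(std::forward<Args>(args)...));
    }

    shared_ref(shared_ref const& o) : mem_(o.mem_)
    {
        if (mem_)
            ++mem_->count;
    }
    shared_ref(shared_ref&& o) noexcept : mem_(o.mem_) { o.mem_ = nullptr; }
    shared_ref& operator=(shared_ref o)
    {
        std::swap(mem_, o.mem_);
        return *this;
    }
    ~shared_ref() { release(); }

    T& operator*() const { return mem_->value; }
    T* operator->() const { return &mem_->value; }
    long use_count() const { return mem_ ? static_cast<long>(mem_->count) : 0; }
};
```

With the default counter, copies only increment a plain integer; `shared_ref<int, std::atomic<long>>` trades some speed for thread-safe counting.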

R-value qualifier member functions

For fine-grained control over memory ownership
// in ndarray class
auto fast(long i) const & // no copy there
-> decltype(type_helper<ndarray const &>::get(*this, i))
{
  return type_helper<ndarray const &>::get(*this, i);
}
auto fast(long i) && // implies a copy
-> decltype(type_helper<ndarray>::get(std::move(*this), i))
{
  return type_helper<ndarray>::get(std::move(*this), i);
}
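The same overloading pattern in a self-contained form, assuming a hypothetical `holder` type that owns a `std::vector<double>`. Here the rvalue overload moves the buffer out rather than copying an element, but the ownership reasoning is the same: a temporary cannot safely hand out a reference into itself.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// `holder` is a hypothetical stand-in for an array type owning a buffer.
struct holder {
    std::vector<double> data;

    // Called on an lvalue: the object outlives the call, so returning a
    // reference is safe and no copy is made.
    std::vector<double> const& values() const & { return data; }

    // Called on a temporary: a reference would dangle, so the result is
    // returned by value and takes ownership of the buffer.
    std::vector<double> values() && { return std::move(data); }
};
```

Overload resolution picks the `&&` version for `holder{...}.values()` and the `const &` version for a named object, so callers never choose by hand.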

Concluding Remarks

Be open-minded!

Learning Python makes you a better C++ dev! It favors:

  • doctesting, unittesting
  • High-level programming

...and learning C++ makes you a better Python dev! It favors:

  • Understanding of performance and the memory model
  • Structured, hierarchical programming