Pythran stories

Shrinking Pythran-Generated Binaries

Testbed

So the question was: do Pythran-generated binaries use more disk space than Cython ones?

I first picked a few Cython files from the SciPy code base. These are files that can easily be converted back to Python so that Pythran can process them (remember, Pythran only processes pure Python code).

With the notable exception of spectral.py, which uses the high-level numpy.sum(x, axis=1) construct, the Pythran code is generally a rewrite of the Cython code with types and annotations pruned, plus a bit of syntactic sugar from Python such as the for: ... else: ... statement used in hausdorff.py.
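To make the for: ... else: idiom concrete, here is a minimal sketch of a directed Hausdorff distance in the pure-Python style that Pythran consumes. It is not the SciPy source: the function name directed_hausdorff, the early-break logic and the use of plain indexable sequences are assumptions made for illustration.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`.

    `a` and `b` are sequences of points (sequences of coordinates).
    """
    cmax = 0.0  # squared distance of the worst nearest-neighbour so far
    for i in range(len(a)):
        cmin = math.inf
        for j in range(len(b)):
            d = 0.0
            for k in range(len(a[i])):
                d += (a[i][k] - b[j][k]) ** 2
            if d < cmax:
                # a[i] already has a neighbour closer than the current
                # maximum, so it cannot raise it: skip the rest of b
                break
            if d < cmin:
                cmin = d
        else:
            # reached only when the inner loop ended without `break`
            if cmin > cmax:
                cmax = cmin
    return math.sqrt(cmax)
```

The else branch runs only when the inner loop finishes without hitting break, which avoids the extra flag variable a C-style loop would need.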

For reference, I compiled the Cython code using the rather old-school cython src.pyx && gcc src.c -shared -fPIC `python-config --cflags --libs` -o src.so -O2 command line, then stripped the resulting binary using strip src.so. The latter command is there to be fair to Pythran, which automatically adds the -Wl,-strip-all flag during compilation (both have the same effect of stripping debug information and useless ELF sections from the binary).

Commit History

The following figures illustrate the evolution of binary size throughout the recent Pythran commit history, using the Cython binary as a baseline.

[Figures: evolution of binary size for 'spectral.so', 'hausdorff.so', 'max_len_seq_inner.so' and 'solve_toeplitz.so']

So a lot of things actually happened :-) Let me explain that.

Controlling Symbols

The first commit, around HEAD~12, was the most relevant one. Digging through the output of nm -C on Pythran-generated binaries, I noticed quite a lot of symbols that were of no use to the generated binaries. They were present because Pythran uses the header-only pythonic library: whenever it includes a header, the symbols defined there end up in the binary. They are actually marked as hidden because of the -fvisibility=hidden flag, but they are still there (this flag mostly affects the linker). I ended up adding an (optional) anonymous namespace right below the pythonic namespace, which effectively gives all these symbols internal linkage, so the compiler can remove the unused ones relatively early in the compilation process.

There's also a small shrink around HEAD~9. It comes from removing some symbols that were just hanging around in the global namespace and happened to be useless :-)

Avoiding Copies

The stats for spectral.py were not so good, even after the initial reduction. While digging through the generated assembly code, I noticed a lot of register spilling, which showed up as a lot of mov instructions. It turns out my expression template code was making a bunch of copies of its arguments, which is sometimes necessary (when the expression template owns its argument) but sometimes not. Not a big deal, as pythonic objects use a shared reference counter, but still, avoiding those copies would certainly shrink the generated binaries. That turned out to be a correct guess. And it also speeds up the execution of the code: less spilling is generally a good thing :-)

About Specialization

It may look strange that all Pythran binaries are thinner than Cython's, except spectral.so. This is explained by the fact that Pythran generates code to handle broadcasting, actually generating two versions of each complex expression: one with broadcasting and one without. Twice the code, twice the fat :-)

That gives me an optimization hint: being able to symbolically compute expression sizes may turn dynamic broadcasting into static broadcasting. I need to dig into that idea.
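The cost of dynamic broadcasting can be pictured with a toy model in plain Python. This is only a sketch of the idea, not Pythran's generated code: the helpers add_same_shape, add_broadcast and add are hypothetical names, and 1-D lists stand in for arrays.

```python
def add_same_shape(a, b):
    """Fast path: both operands have the same length, no broadcasting."""
    return [x + y for x, y in zip(a, b)]


def add_broadcast(a, b):
    """General path: a length-1 operand is stretched to match the other."""
    n = max(len(a), len(b))
    if len(a) not in (1, n) or len(b) not in (1, n):
        raise ValueError("operands could not be broadcast together")
    # A stride of 0 keeps re-reading the single element of a stretched operand.
    stride_a = 0 if len(a) == 1 else 1
    stride_b = 0 if len(b) == 1 else 1
    return [a[i * stride_a] + b[i * stride_b] for i in range(n)]


def add(a, b):
    """Runtime dispatch: the lengths are only known at run time, so both
    paths have to be present in the compiled binary."""
    if len(a) == len(b):
        return add_same_shape(a, b)
    return add_broadcast(a, b)
```

If a compiler could prove symbolically that len(a) == len(b) always holds at a given call site, it could emit only the equivalent of add_same_shape there and drop the other path.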

Going Further

Let's have a look at the two versions of the hausdorff binary, hausdorff.so built by Pythran and _hausdorff.so built by Cython:

$ readelf -SW hausdorff.so _hausdorff.so

File: hausdorff.so
There are 28 section headers, starting at offset 0x17490:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.gnu.build-id NOTE           00000000000001c8 0001c8 000024 00   A  0   0  4
  [ 2] .gnu.hash         GNU_HASH        00000000000001f0 0001f0 00006c 00   A  3   0  8
  [ 3] .dynsym           DYNSYM          0000000000000260 000260 000618 18   A  4   1  8
  [ 4] .dynstr           STRTAB          0000000000000878 000878 000818 00   A  0   0  1
  [ 5] .gnu.version      VERSYM          0000000000001090 001090 000082 02   A  3   0  2
  [ 6] .gnu.version_r    VERNEED         0000000000001118 001118 0000a0 00   A  4   3  8
  [ 7] .rela.dyn         RELA            00000000000011b8 0011b8 000378 18   A  3   0  8
  [ 8] .rela.plt         RELA            0000000000001530 001530 000378 18  AI  3  23  8
  [ 9] .init             PROGBITS        00000000000018a8 0018a8 000017 00  AX  0   0  4
  [10] .plt              PROGBITS        00000000000018c0 0018c0 000260 10  AX  0   0 16
  [11] .plt.got          PROGBITS        0000000000001b20 001b20 000008 08  AX  0   0  8
  [12] .text             PROGBITS        0000000000001b30 001b30 004c78 00  AX  0   0 16
  [13] .fini             PROGBITS        00000000000067a8 0067a8 000009 00  AX  0   0  4
  [14] .rodata           PROGBITS        00000000000067c0 0067c0 000900 00   A  0   0 32
  [15] .eh_frame_hdr     PROGBITS        00000000000070c0 0070c0 0000e4 00   A  0   0  4
  [16] .eh_frame         PROGBITS        00000000000071a8 0071a8 000650 00   A  0   0  8
  [17] .gcc_except_table PROGBITS        00000000000077f8 0077f8 0001d1 00   A  0   0  4
  [18] .init_array       INIT_ARRAY      0000000000207cf0 007cf0 000010 08  WA  0   0  8
  [19] .fini_array       FINI_ARRAY      0000000000207d00 007d00 000008 08  WA  0   0  8
  [20] .data.rel.ro      PROGBITS        0000000000207d08 007d08 000060 00  WA  0   0  8
  [21] .dynamic          DYNAMIC         0000000000207d68 007d68 000230 10  WA  4   0  8
  [22] .got              PROGBITS        0000000000207f98 007f98 000068 08  WA  0   0  8
  [23] .got.plt          PROGBITS        0000000000208000 008000 000140 08  WA  0   0  8
  [24] .data             PROGBITS        0000000000208140 008140 000088 00  WA  0   0 32
  [25] .bss              NOBITS          00000000002081e0 0081c8 002758 00  WA  0   0 32
  [26] .comment          PROGBITS        0000000000000000 0081c8 00001d 01  MS  0   0  1
  [27] .shstrtab         STRTAB          0000000000000000 0081e5 000100 00      0   0  1

(...)

File: _hausdorff.so
There are 26 section headers, starting at offset 0x2d6e8:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.gnu.build-id NOTE           00000000000001c8 0001c8 000024 00   A  0   0  4
  [ 2] .gnu.hash         GNU_HASH        00000000000001f0 0001f0 000040 00   A  3   0  8
  [ 3] .dynsym           DYNSYM          0000000000000230 000230 000f00 18   A  4   1  8
  [ 4] .dynstr           STRTAB          0000000000001130 001130 000aec 00   A  0   0  1
  [ 5] .gnu.version      VERSYM          0000000000001c1c 001c1c 000140 02   A  3   0  2
  [ 6] .gnu.version_r    VERNEED         0000000000001d60 001d60 000070 00   A  4   2  8
  [ 7] .rela.dyn         RELA            0000000000001dd0 001dd0 0026b8 18   A  3   0  8
  [ 8] .rela.plt         RELA            0000000000004488 004488 000a98 18  AI  3  21  8
  [ 9] .init             PROGBITS        0000000000004f20 004f20 000017 00  AX  0   0  4
  [10] .plt              PROGBITS        0000000000004f40 004f40 000720 10  AX  0   0 16
  [11] .plt.got          PROGBITS        0000000000005660 005660 000008 08  AX  0   0  8
  [12] .text             PROGBITS        0000000000005670 005670 01f753 00  AX  0   0 16
  [13] .fini             PROGBITS        0000000000024dc4 024dc4 000009 00  AX  0   0  4
  [14] .rodata           PROGBITS        0000000000024de0 024de0 002e48 00   A  0   0 32
  [15] .eh_frame_hdr     PROGBITS        0000000000027c28 027c28 000444 00   A  0   0  4
  [16] .eh_frame         PROGBITS        0000000000028070 028070 002508 00   A  0   0  8
  [17] .init_array       INIT_ARRAY      000000000022aca0 02aca0 000008 08  WA  0   0  8
  [18] .fini_array       FINI_ARRAY      000000000022aca8 02aca8 000008 08  WA  0   0  8
  [19] .dynamic          DYNAMIC         000000000022acb0 02acb0 000210 10  WA  4   0  8
  [20] .got              PROGBITS        000000000022aec0 02aec0 000140 08  WA  0   0  8
  [21] .got.plt          PROGBITS        000000000022b000 02b000 0003a0 08  WA  0   0  8
  [22] .data             PROGBITS        000000000022b3a0 02b3a0 002248 00  WA  0   0 32
  [23] .bss              NOBITS          000000000022d600 02d5e8 000768 00  WA  0   0 32
  [24] .comment          PROGBITS        0000000000000000 02d5e8 00001d 01  MS  0   0  1
  [25] .shstrtab         STRTAB          0000000000000000 02d605 0000e1 00      0   0  1

Special glasses help to read through these numbers, but basically:

  • The .text section, i.e. where the code lies, is larger in the Cython-generated binary, by a factor of ~6.5 on that binary (0x1f753 vs 0x4c78 bytes).
  • The .plt and .got.plt sections, together with the .rela.plt relocation entries, are also larger. This is because Cython uses a lot of symbols from libpython while Pythran only uses some Python <> native converters. This is confirmed by the number of dynamic symbols reported by nm -D on each binary (e.g. nm -D _hausdorff.so | wc -l): 159 for the Cython-generated binary and 64 for the Pythran version.
  • The .rodata section also contains more information in the Cython case. A quick look at its content with objdump -s -j.rodata _hausdorff.so shows a lot of documentation, error messages, etc. Looks like Cython takes more care with error messages than Pythran :-)
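These comparisons can be recomputed directly from the readelf -SW listings. The sketch below uses hypothetical helpers (parse_sections and dynamic_symbol_count are not part of any tool) and only embeds the .dynsym and .text rows copied from the two listings above.

```python
import re

# One row of `readelf -SW`: [Nr] Name Type Address Off Size ES ...
ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(\S+)\s+([A-Z_]+)"
    r"\s+[0-9a-f]+\s+[0-9a-f]+\s+([0-9a-f]+)\s+([0-9a-f]+)"
)


def parse_sections(readelf_output):
    """Map section name -> (size in bytes, entry size in bytes)."""
    sections = {}
    for line in readelf_output.splitlines():
        m = ROW.match(line)
        if m:  # the header row and the nameless NULL row do not match
            name, _type, size, entsize = m.groups()
            sections[name] = (int(size, 16), int(entsize, 16))
    return sections


def dynamic_symbol_count(sections):
    """Number of .dynsym entries, minus the reserved null entry at index 0."""
    size, entsize = sections[".dynsym"]
    return size // entsize - 1


# Rows copied from the listings: hausdorff.so (Pythran), _hausdorff.so (Cython)
PYTHRAN = """\
  [ 3] .dynsym           DYNSYM          0000000000000260 000260 000618 18   A  4   1  8
  [12] .text             PROGBITS        0000000000001b30 001b30 004c78 00  AX  0   0 16
"""
CYTHON = """\
  [ 3] .dynsym           DYNSYM          0000000000000230 000230 000f00 18   A  4   1  8
  [12] .text             PROGBITS        0000000000005670 005670 01f753 00  AX  0   0 16
"""

pythran = parse_sections(PYTHRAN)
cython = parse_sections(CYTHON)

# Matches the nm -D counts quoted in the text: 64 (Pythran) and 159 (Cython)
print(dynamic_symbol_count(pythran), dynamic_symbol_count(cython))
# Ratio of the .text sizes, Cython over Pythran
print(cython[".text"][0] / pythran[".text"][0])
```

Dividing the .dynsym size by its entry size (the ES column) is a quick way to cross-check the nm -D counts without running nm at all.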

Note that some sections could be removed using strip -R: I suspect .note.gnu.build-id and .comment are not critical.

Conclusion

Pythran generates code that does not make any call to the Python C API. Cython does. Even when users do their best to remove such calls from the computation-critical parts, it's just not the same guarantee. This has an impact on code size.

But Cython is also more mature, so it's likely that some of the checks that make its code larger will find their way into Pythran-generated code too.

Oh, and thanks to the reduction in the number of copies, the expression template engine of Pythran got better. That's an unexpected but pleasant side effect \o/