Always two there are, no more, no less:
an apprentice Nicolas Szlifierski [Quarkslab, Telecom Bretagne]
and a master Serge Guelton [Quarkslab, Telecom Bretagne]
/me
$ whoami
sguelton
$ echo "print('hello world')" > hello.py
$ python -m py_compile hello.py
$ pycdc hello.pyc
# Source Generated with Decompyle++
# File: hello.pyc (Python 2.7)
print 'hello world'
$ printf "a = 1\nif a: print(a + 2)" > dce.py
$ python -O -m py_compile dce.py
$ pycdc dce.pyo
# Source Generated with Decompyle++
# File: dce.pyo (Python 2.7)
a = 1
if a:
    print a + 2
CPython performs close to zero optimization...
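The same observation can be reproduced without a decompiler by compiling the snippet in-process and looking for the conditional jump. This is a minimal sketch that assumes Python 3 (the slides use 2.7) and relies only on `compile` and `dis`; opcode names differ between versions, so it matches on a name prefix rather than an exact opcode.

```python
import dis

SOURCE = "a = 1\nif a: print(a + 2)\n"

# optimize=1 is the in-process equivalent of running `python -O`.
code = compile(SOURCE, "dce.py", "exec", optimize=1)

# Collect the opcode names of the compiled module body.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# The test on `a` survives: nothing propagated the constant to drop the branch.
has_branch = any(name.startswith("POP_JUMP") for name in opnames)
print(has_branch)
```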
Eenie, meenie, minie, moe... oh, why not all of them?
Python semantics makes it hard to perform source-to-source transformation because of lazy binding and polymorphism:
for i in range(10):
    s += hex(i)
Nothing is as it seems...
range = lambda *args: args
Monkey Island... er, monkey patching, anyone?
__builtin__.hex, __builtin__.oct = __builtin__.oct, __builtin__.hex
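To see why late binding defeats static reasoning about that loop, here is a small sketch of the builtin swap. It assumes Python 3, where the module is called `builtins` instead of `__builtin__`; the function name `digits` is only illustrative.

```python
import builtins

def digits():
    s = ""
    for i in range(3):
        s += hex(i)  # `hex` is looked up at call time, not at compile time
    return s

before = digits()

# Swap the two builtins, as on the slide.
builtins.hex, builtins.oct = builtins.oct, builtins.hex
try:
    after = digits()
finally:
    # Swap back so the rest of the process is not affected.
    builtins.hex, builtins.oct = builtins.oct, builtins.hex

print(before, after)
```

The function body never changed, yet its result did, which is exactly what a source-to-source transformer cannot rule out.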
No variable lookup for control flow statements! That's an obfuscation opportunity:
(lambda g, c, d: (lambda _: (_.__setitem__('$', ''.join([(_['chr'] if ('chr'
in _) else chr)((_['_'] if ('_' in _) else _)) for _['_'] in (_['s'] if ('s'
in _) else s)[::(-1)]])), _)[-1])( (lambda _: (lambda f, _: f(f, _))((lambda
__,_: ((lambda _: __(__, _))((lambda _: (_.__setitem__('i', ((_['i'] if ('i'
in _) else i) + 1)),_)[(-1)])((lambda _: (_.__setitem__('s',((_['s'] if ('s'
in _) else s) + [((_['l'] if ('l' in _) else l)[(_['i'] if ('i' in _) else i
)] ^ (_['c'] if ('c' in _) else c))])), _)[-1])(_))) if (((_['g'] if ('g' in
_) else g) % 4) and ((_['i'] if ('i' in _) else i)< (_['len'] if ('len' in _
) else len)((_['l'] if ('l' in _) else l)))) else _)), _) ) ( (lambda _: (_.
__setitem__('!', []), _.__setitem__('s', _['!']), _)[(-1)] ) ((lambda _: (_.
__setitem__('!', ((_['d'] if ('d' in _) else d) ^ (_['d'] if ('d' in _) else
d))), _.__setitem__('i', _['!']), _)[(-1)])((lambda _: (_.__setitem__('!', [
(_['j'] if ('j' in _) else j) for _[ 'i'] in (_['zip'] if ('zip' in _) else
zip)((_['l0'] if ('l0' in _) else l0), (_['l1'] if ('l1' in _) else l1)) for
_['j'] in (_['i'] if ('i' in _) else i)]), _.__setitem__('l', _['!']), _)[-1
])((lambda _: (_.__setitem__('!', [1373, 1281, 1288, 1373, 1290, 1294, 1375,
1371,1289, 1281, 1280, 1293, 1289, 1280, 1373, 1294, 1289, 1280, 1372, 1288,
1375,1375, 1289, 1373, 1290, 1281, 1294, 1302, 1372, 1355, 1366, 1372, 1302,
1360, 1368, 1354, 1364, 1370, 1371, 1365, 1362, 1368, 1352, 1374, 1365, 1302
]), _.__setitem__('l1',_['!']), _)[-1])((lambda _: (_.__setitem__('!',[1375,
1368, 1294, 1293, 1373, 1295, 1290, 1373, 1290, 1293, 1280, 1368, 1368,1294,
1293, 1368, 1372, 1292, 1290, 1291, 1371, 1375, 1280, 1372, 1281, 1293,1373,
1371, 1354, 1370, 1356, 1354, 1355, 1370, 1357, 1357, 1302, 1366, 1303,1368,
1354, 1355, 1356, 1303, 1366, 1371]), _.__setitem__('l0', _['!']), _)[(-1)])
({ 'g': g, 'c': c, 'd': d, '$': None})))))))['$'])
Interested? Give it a try at http://blog.quarkslab.com
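The blob above is built from two recurring tricks: a dict named `_` stands in for the local scope, and a `(side_effect, _)[-1]` tuple sequences what used to be statements. A much smaller sketch of the same idea, assuming Python 3 and using illustrative names (`plain`, `obfuscated`), could look like this:

```python
def plain(l, c):
    # Readable version: xor every element of l with c.
    s = []
    for x in l:
        s = s + [x ^ c]
    return s

# Same computation as one expression. `_` is the fake local scope, and the
# tuple evaluates the assignment first, then yields `_` as its last item.
obfuscated = lambda l, c: (
    lambda _: (_.__setitem__('s', [x ^ _['c'] for x in _['l']]), _)[-1]
)({'l': l, 'c': c})['s']

print(plain([1, 2, 3], 5), obfuscated([1, 2, 3], 5))
```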
Many opportunities there!
Modify the interpreter so that:
>>> import dis
>>> print dis.opmap['BINARY_ADD']
23
Turns into
>>> import dis
>>> print dis.opmap['BINARY_ADD']
62
and so on for bytecode generation etc
python/Include/opcode.h
shuffle opcodes per group for custom interpreter generation!
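One way to picture the shuffle is as a permutation of opcode numbers that stays inside each group. The sketch below is not the generator used for the custom interpreter; it assumes Python 3 and splits opcodes into two groups with `dis.HAVE_ARGUMENT`, since CPython 2.7 tells argument-carrying opcodes apart by comparing against that threshold. The function name and the seed are illustrative.

```python
import dis
import random

def shuffle_opcodes(seed=0):
    """Return {opname: new_number}, permuting numbers within each group."""
    rng = random.Random(seed)
    names = sorted(dis.opmap)
    without_arg = [n for n in names if dis.opmap[n] < dis.HAVE_ARGUMENT]
    with_arg = [n for n in names if dis.opmap[n] >= dis.HAVE_ARGUMENT]
    mapping = {}
    for group in (without_arg, with_arg):
        numbers = [dis.opmap[name] for name in group]
        rng.shuffle(numbers)  # new numbering, same set of values
        mapping.update(zip(group, numbers))
    return mapping

mapping = shuffle_opcodes(seed=42)
print(len(mapping), "opcodes renumbered")
```

Keeping each opcode on its side of the threshold means the argument-decoding logic of the interpreter does not need to change.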
An opcode is stored in a char
but only ~112 are used!
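The amount of free opcode space can be checked on whatever interpreter is at hand. This sketch assumes Python 3, so the count will differ from the ~112 quoted for 2.7.

```python
import dis

# Opcode numbers that fit in one byte. Newer CPython versions also list
# pseudo-instructions above 255 in dis.opmap; they are filtered out here.
used = sorted(v for v in dis.opmap.values() if v < 256)
used_set = set(used)
free = [n for n in range(256) if n not in used_set]

print(len(used), "used,", len(free), "free")
```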
.pyc and build the histogram, using marshal.loads and inspect.iscode
.pyc → .pyc
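A rough sketch of the histogram step, assuming Python 3: code objects are found recursively through `co_consts` with `inspect.iscode`, and the marshalled payload is built in memory instead of being read from a real .pyc, whose header size depends on the Python version. The helper names `iter_code` and `opcode_histogram` are illustrative.

```python
import dis
import inspect
import marshal
from collections import Counter

def iter_code(code):
    """Yield a code object and, recursively, every code object nested in it."""
    yield code
    for const in code.co_consts:
        if inspect.iscode(const):
            yield from iter_code(const)

def opcode_histogram(blob):
    """Count opcode names in a marshalled code object (the body of a .pyc)."""
    hist = Counter()
    for code in iter_code(marshal.loads(blob)):
        hist.update(ins.opname for ins in dis.get_instructions(code))
    return hist

# Stand-in for the payload of a .pyc file.
blob = marshal.dumps(compile("def f(x):\n    return x + 1\n", "demo.py", "exec"))
hist = opcode_histogram(blob)
print(hist.most_common(3))
```

Counting adjacent pairs instead of single opcodes would be the natural next step when looking for candidates to merge.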
For instance:
LOAD_FAST 0
LOAD_CONST n
Turns into:
LOAD_FAST_LOAD_CONST 0
ANY_OPCODE_WITH_ARG n
dis reaaaaly dislikes this one :-)
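The rewrite can be sketched on a symbolic instruction list, without touching real bytecode. This is only an illustration: instructions are `(opname, arg)` tuples, jump targets are ignored, and `fuse_pairs` is a made-up name.

```python
def fuse_pairs(instructions, first="LOAD_FAST", second="LOAD_CONST"):
    """Fuse adjacent (first, second) pairs in a list of (opname, arg) tuples."""
    fused_name = first + "_" + second
    out = []
    i = 0
    while i < len(instructions):
        op, arg = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if op == first and nxt is not None and nxt[0] == second:
            out.append((fused_name, arg))
            # The second slot only carries the argument; its opcode is a decoy.
            out.append(("ANY_OPCODE_WITH_ARG", nxt[1]))
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out

program = [("LOAD_FAST", 0), ("LOAD_CONST", 3), ("BINARY_ADD", None)]
print(fuse_pairs(program))
```

Because the decoy slot can hold any opcode that takes an argument, a linear disassembler decodes an instruction that is never executed as such.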
Decompilers make assumptions about bytecode sequences (some think decompiling ~= pattern matching)
LOAD_FAST 0
LOAD_FAST 1
BUILD_MAP 0
ROT_THREE
BINARY_ADD
ROT_TWO
POP_TOP
Is equivalent to
LOAD_FAST 0
LOAD_FAST 1
BINARY_ADD
This makes uncompyle crash! But not pycdc
...
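The equivalence of the two sequences can be checked with a toy stack machine. The sketch below models only the six opcodes involved, following their documented Python 2.7 stack effects; `run` and the two program lists are illustrative names.

```python
def run(program, fast_locals):
    """Evaluate a list of (opname, arg) pairs on a toy value stack."""
    stack = []
    for op, arg in program:
        if op == "LOAD_FAST":
            stack.append(fast_locals[arg])
        elif op == "BUILD_MAP":
            stack.append({})
        elif op == "ROT_TWO":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "ROT_THREE":
            # Move the top item down to third position.
            stack.insert(-2, stack.pop())
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP_TOP":
            stack.pop()
        else:
            raise ValueError("unsupported opcode: " + op)
    return stack

OBFUSCATED = [("LOAD_FAST", 0), ("LOAD_FAST", 1), ("BUILD_MAP", 0),
              ("ROT_THREE", None), ("BINARY_ADD", None),
              ("ROT_TWO", None), ("POP_TOP", None)]
PLAIN = [("LOAD_FAST", 0), ("LOAD_FAST", 1), ("BINARY_ADD", None)]

print(run(OBFUSCATED, [2, 3]), run(PLAIN, [2, 3]))
```

The empty dict is pushed, rotated under the operands, and popped after the addition, so it never influences the result.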
>>> def foo(): return "hack.lu"
>>> import dis
>>> dis.dis(foo)
  1           0 LOAD_CONST               1 ('hack.lu')
              3 RETURN_VALUE
Strings are loaded using LOAD_CONST, so...
modify LOAD_CONST to perform on-the-fly decryption
proof of concept... rot13... shame
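The compile-time half of the string ciphering idea above can be sketched in pure Python: rewrite the string constants of a code object with rot13, so that a stock interpreter only ever sees the ciphered text. This assumes Python 3.8+ for `CodeType.replace`; `cipher_consts` is an illustrative name, nested code objects are left untouched, and the decrypting `LOAD_CONST` itself lives in the interpreter and is not shown.

```python
import codecs
import types

def cipher_consts(code):
    """Return a copy of `code` whose string constants are rot13-encoded."""
    new_consts = tuple(
        codecs.encode(c, "rot_13") if isinstance(c, str) else c
        for c in code.co_consts
    )
    return code.replace(co_consts=new_consts)

def foo():
    return "hack.lu"

# Run the ciphered code on an unmodified interpreter: no decryption happens.
ciphered = types.FunctionType(cipher_consts(foo.__code__), {})
print(ciphered())
```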
Wanna write self modifying code?
Each function embeds its bytecode as a string :-)
But strings are immutable in Python :-(
Unless you modify them in a native module ;-)
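#include <Python.h>
#include <frameobject.h>
#include <opcode.h>

static PyObject* this_function_modifies_its_caller() {
    PyThreadState *tstate = PyThreadState_GET();
    if (NULL != tstate && NULL != tstate->frame) {
        /* frame of the Python code that called this native function */
        PyFrameObject *frame = tstate->frame;
        /* offset of the last instruction executed in that frame */
        int instr = frame->f_lasti;
        /* write access to the caller's bytecode, bypassing string immutability */
        unsigned char* bytes = (void*)PyString_AS_STRING(frame->f_code->co_code);
        /* patch the opcode 10 bytes further on; the offset depends on the call site */
        bytes[instr + 10] = INPLACE_MODULO;
    }
    Py_INCREF(Py_None);
    return Py_None;
}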
Call this before a binary operation to turn it into a modulo
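For contrast, here is what happens when the same patch is attempted from pure Python. This sketch assumes Python 3, where `co_code` is a `bytes` object rather than a `str`; the point is only that the interpreter refuses in-place writes, which is why the native module is needed.

```python
def foo(a, b):
    return a + b

raw = foo.__code__.co_code  # the function's bytecode

try:
    raw[0] = 0  # in-place patching is refused
    patched_in_place = True
except TypeError:
    patched_in_place = False

print(type(raw).__name__, patched_in_place)
```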
code object dump[s] from the marshal module
(pyinstaller) to bundle your Python application and the modified Python interpreter in a single binary
(numba, shedskin, pythran) to turn some functions/modules into native code
$ ../configure --help | grep enable
[...]
--disable-marshal hide marshal functions
--disable-codeobject hide codeobject functions
--disable-recompilation disable recompilation of .pyc file when .py file is
--enable-cipher-str enable string litteral ciphering
--enable-shuffle-opcode enable opcodes shuffling
--enable-gen-opcode enable generation of new opcodes
Don't expect good engineering there though :-$
obfuscated/2.7