i’m quite new to exploit writing. as part of my n00bery, i learned a rather time-consuming and painful lesson: always make sure ESP is word-aligned!
i had a reasonably-sized region of memory which i could abuse, so i filled it with nops and stuck my 2nd stage shellcode in there at an arbitrary/convenient place.
then, in my 1st stage shellcode, i adjusted ESP to point there and JMPed to it.
unfortunately for me, my 2nd stage shellcode did not start on a word boundary, and so neither did ESP after my adjustment.
even more unfortunately for me, everything – up to a point – behaved completely normally. my second stage shellcode decoded itself with complete integrity and invoked third stage shellcode, which also decoded intact. all of this was using a misaligned stack quite happily
but the final payload (be it bind or reverse shell, calc.exe… i tried a few!) never worked properly, despite being entirely correct.
i debugged the decoded 3rd stage shellcode (reading C++ structs off the stack, what fun!) and everything looked normal… but the windows api would just give me a -1 result code and a generic error like ENOTSOCK. ESPFUBAR would have saved me some time :-p
after a few hours of metaphorically tearing my hair out and getting nowhere, i decided to rewrite the exploit again from scratch, and got a shell. using the exact same binary stages as i used before!
ok, so i had a working exploit now, but i NEEDED to know why it didn’t work first time. i debugged both programs and compared the registers, stack content and memory regions at various points… everything was identical apart from my stack adjustment before the JMP… could it be…?
and yes, it was. i tried adjusting ESP by varying amounts, and the win32 api calls in the 3rd stage only worked when ESP was word-aligned in memory!
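for what it's worth, the check (and the fix) is just a bit of arithmetic. here's a minimal python sketch of it, assuming a "word" means 4 bytes on 32-bit x86. the names `WORD_SIZE`, `is_aligned` and `align_down` and the example ESP value are made up for illustration, they're not from my exploit.

```python
WORD_SIZE = 4  # assumption: 32-bit x86, so a "word" here is 4 bytes


def is_aligned(addr, size=WORD_SIZE):
    """Return True if addr sits on a size-byte boundary."""
    return addr % size == 0


def align_down(addr, size=WORD_SIZE):
    """Round addr down to the nearest size-byte boundary.

    size must be a power of two. For size 4 this is the same masking
    that an instruction like `and esp, 0xfffffffc` performs on ESP.
    """
    return addr & ~(size - 1)


esp = 0x0012F9A3  # hypothetical misaligned stack pointer
print(is_aligned(esp))               # is the raw value on a boundary?
print(hex(align_down(esp)))          # the value after masking
print(is_aligned(align_down(esp)))   # the masked value is on a boundary
```

rounding down (rather than up) seems the safer choice for a stack pointer here, since the stack grows towards lower addresses and the shellcode sits above it.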
so why does the rest of my shellcode, for example the decoder stub which also used the stack, work, but the win32 api barfs? i’ve no idea 🙂 i just know that it does
i read somewhere that a non-aligned stack is inefficient (because 2 words have to be read instead of 1 and then spliced together), but it turns out from my own observation that it actually messes stuff up too!
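to picture the "2 words instead of 1" bit, here's a toy python model of it. it's only a sketch of the idea, assuming 4-byte little-endian words, and not a claim about how the cpu really does it. `read_aligned_word` and `read_dword` are names i made up for this.

```python
import struct

WORD_SIZE = 4  # assumption: 4-byte words, little-endian, as on 32-bit x86


def read_aligned_word(mem, addr):
    """Fetch one 4-byte word; only aligned addresses are allowed."""
    assert addr % WORD_SIZE == 0
    return struct.unpack_from("<I", mem, addr)[0]


def read_dword(mem, addr):
    """Read 4 bytes at any address using only aligned fetches.

    Returns (value, number_of_aligned_fetches).
    """
    offset = addr % WORD_SIZE
    base = addr - offset
    low = read_aligned_word(mem, base)
    if offset == 0:
        return low, 1
    # misaligned: the value straddles two words, so fetch both and splice
    high = read_aligned_word(mem, base + WORD_SIZE)
    shift = 8 * offset
    value = ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF
    return value, 2


mem = bytes(range(16))  # toy memory: 00 01 02 ... 0f
print(read_dword(mem, 4))  # aligned read: one fetch
print(read_dword(mem, 5))  # misaligned read: two fetches, spliced
```

the value comes out the same either way, it just costs an extra fetch. that matches the "inefficient" story, and it doesn't explain why the win32 api actually broke.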
an expensive lesson learned 🙂