
yes, but there's real value in exploration. I haven't touched C or assembly for a long time. here's a cold read.

        push    rbp
this is going to take the contents of rbp and push it onto the top of the stack - this will probably also change the stack pointer

        mov     rbp, rsp
move goes left <-, like a = 5, not 5 = a. so, copy the updated stack pointer into rbp

        mov     DWORD PTR [rbp-4], edi
now, I'm not 100% sure, but I believe this guy puts edi just under the value we pushed to the top of the stack

        mov     eax, DWORD PTR [rbp-4]
Take that value and put it into eax. I'm not 100% sure why it's not just mov eax, edi.

        imul    eax, eax
integer multiply, eax times itself - this is the part that does the squaring.

        pop     rbp
restore rbp (which we messed with)

        ret
and we're done.
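For reference, the listing above is consistent with a one-line C function like the sketch below. The thread never shows the source, so the name `square` and the parameter `num` are assumptions, not a quote from the compiler page.

```c
/* Assumed source for the listing above; the thread does not show it. */
int square(int num) {
    /* Without optimization, the compiler stores num at [rbp-4], reloads it
       into eax, and multiplies eax by itself. eax holds the return value. */
    return num * num;
}
```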

there are at least three holes in my understanding - but those three are not _that_ hard to track down.

1, does the stack pointer actually auto increment? (I think it does)

2, imul overflow and setting sign flags and such. - that shouldn't be hard to run down.

3, what is the c calling convention? it looks like the argument is top of stack, but also in edi - is that shuffling really needed? I think there's a bucket of implicit behavior there that's kinda scary.

I would _hope_ that, unless linking to a library, whatever called this just did the imul eax, eax.

My understanding may be deeply flawed, but explaining my assumptions and my understanding does two things.

1, it helps me learn.

2, it helps others re-evaluate their assumptions and possibly see from a different viewpoint.

I'm not saying spam compiler lists. But a clear and well thought out question can certainly advance discussion. It forces people to formalize their assumptions.
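On the imul overflow question above: the two-operand imul sets the carry and overflow flags when the product does not fit in the destination, and leaves the sign flag undefined. At the C level, signed int overflow is undefined behavior, so the compiler does not have to look at those flags at all. A minimal sketch of an overflow-aware version, assuming GCC or Clang (the builtin is compiler-specific, and `checked_square` is a hypothetical name):

```c
#include <stdbool.h>

/* Hypothetical helper: squares num and reports whether the result fits
   in an int, instead of relying on undefined signed overflow. */
bool checked_square(int num, int *out) {
    /* GCC/Clang builtin: returns true when the product overflowed.
       The (possibly wrapped) result is still written to *out. */
    return !__builtin_mul_overflow(num, num, out);
}
```

On x86-64 these compilers typically implement the check with imul followed by a test of the overflow flag, but that is worth confirming in the generated assembly rather than taking on faith.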



The default godbolt page runs the compiler with no flags, which means without any optimizations. This explains why the code unnecessarily shuffles stuff to the stack and back. Unoptimized clang/llvm output spills everything to the stack, and register allocation is an optimization.

With -O3, the code is:

    imul edi, edi
    mov eax, edi
    ret

Yep, the calling convention for x86-64 on Linux and macOS (the System V AMD64 ABI) passes the first six integer arguments in rdi, rsi, rdx, rcx, r8, and r9; any further arguments are passed on the stack.
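A small way to see the register/stack split is to compile a function with seven integer arguments and read the output. This is only a sketch; `sum7` is a made-up name, and the register assignment in the comment is what the System V convention specifies, not something the C code itself controls.

```c
/* Hypothetical function with seven integer arguments. Under the
   System V AMD64 convention, a..f arrive in rdi, rsi, rdx, rcx, r8, r9
   and g is the first argument passed on the stack. */
long sum7(long a, long b, long c, long d, long e, long f, long g) {
    return a + b + c + d + e + f + g;
}
```

Compiled with optimizations on Linux, the first six additions should use registers only, and g should show up as a load from an address relative to rsp.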


And Win64 in rcx, rdx, r8, and r9.

Having originally learned the basics of assembly on the chronically register-deprived x86, it took me a while to get used to the fact that standard CCs now pass things in registers (and rsi and rdi in particular, retaining their ancient names while being completely general-purpose these days).


FWIW, the stack grows downwards and push decrements the stack pointer.
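A toy model of that in C, in case it helps. This is not real hardware: a byte array stands in for memory, `rsp` is just an offset into it, there are no bounds checks, and all the names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack. */
typedef struct {
    uint8_t  mem[256];
    uint64_t rsp;      /* offset of the current top of stack */
} ToyStack;

void toy_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = sizeof s->mem;   /* an empty stack starts at the high end */
}

/* push: decrement rsp by 8 first, then store the value at the new rsp. */
void toy_push(ToyStack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* pop: load the value at rsp, then increment rsp by 8. */
uint64_t toy_pop(ToyStack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += 8;
    return value;
}
```

In this model, push rbp followed later by pop rbp leaves rsp exactly where it started, which is why the function in the listing can ret safely.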


And user netch on Stack Overflow wrote this, which explains more:

  notice also there is a 128-byte space ("red zone") before %rsp that keeps its contents between function calls but preserved by OS during interrupts. So, very temporary values (between function calls) can be used with negative offsets to %rsp. Not all compilers utilize this.

About this code (note opposite order of register movement - there are two main styles of displaying x86 assembly code):

  pushq   %rbp
  movq    %rsp, %rbp
  subq    $16, %rsp

the comment was:

  Compiler allocates some space for local values on function enter. That's why it subtracts value from %rsp on enter. This doesn't depend on whether %rbp is used as frame pointer. After that, this place is used with positive offsets upon %rsp. Also, if this function calls another one, %rsp shall be aligned on 16-byte boundary for each call, so, in that case compiler shall subtract 8 from %rsp on each enter.
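The alignment part of that comment is just arithmetic, sketched below. The System V convention has rsp 16-byte aligned at the point of a call; the call then pushes an 8-byte return address, so the callee starts 8 bytes off. The function names and the example address in the usage note are made up, and this only models the numbers, not what any particular compiler emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the given stack pointer value is a multiple of 16. */
bool is_aligned16(uint64_t rsp) {
    return (rsp & 15u) == 0;
}

/* rsp as the callee first sees it: the caller's aligned rsp minus the
   8-byte return address pushed by call. */
uint64_t rsp_at_entry(uint64_t aligned_rsp_before_call) {
    return aligned_rsp_before_call - 8;
}

/* rsp after "pushq %rbp" (8 bytes) and "subq $locals, %rsp". */
uint64_t rsp_after_prologue(uint64_t rsp_on_entry, uint64_t locals) {
    return rsp_on_entry - 8 - locals;
}
```

Starting from an aligned 0x7000, entry is at 0x6ff8 (misaligned), the push of rbp brings it to 0x6ff0, and subtracting 16 keeps it aligned at 0x6fe0. A function that skips the push of rbp would need to subtract an odd multiple of 8 instead, which is the "subtract 8" in the quoted comment.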


Brownie points for using Intel's syntax. :)



