Wow - we're on Hacker News again! Hello all, and thanks for the link and discussion! While we're here, we recently made some stats public: https://stats.compiler-explorer.com/ (spot the Hacker News spike!)
I'm pretty sure you already know of sharplab.io. A lot of us C# devs prefer it over Compiler Explorer for a bunch of reasons, but the main one is really the speed.
My guess is that Sharplab gets its speed because it hooks directly into the JIT compiler itself, so there's no overhead from touching disk or invoking the linker.
Hi Genbox, thanks for the kind words. I've never heard of sharplab.io - the speed of C# compilation is not very high on the priorities of CE right now (though we have some issues filed, and some (stale) PRs to help). We welcome help improving it!
If you're talking about the speed of languages other than C#, then most of the time is spent in the compilers themselves, not the fairly limited pre- and post-processing we do. We have a pretty sophisticated setup (at least, I think it is...) for handling the 1,500 compiler/language combinations, hundreds of libraries etc (around 2TB of data), but there are always tradeoffs between fast start-up time, manageability and simplicity of adding new compilers, and the run-time performance of running the compilers.
A small tip when visiting godbolt: you can use the name of the language you're interested in as a subdomain, to get a page immediately set up for that language, rather than starting with the default C++. For example https://erlang.godbolt.org or https://rust.godbolt.org
It only works for languages, not compilers. _Almost_ everything at *.godbolt.org resolves to Compiler Explorer (I have some other projects on this domain), but *.compiler-explorer.com and *.godbo.lt always point at CE.
I'm not sure how many languages have alternative compilers available, but I suppose you could make that a feature request: a direct way to choose the compiler with a subdomain too. https://clang.c.godbolt.org Or maybe it would make more sense in the path: https://c.godbolt.org/clang
When I was still programming in C++, that tool was the go-to way to discuss compiler internals and language semantics with colleagues: just set up a minimal example and share it. Impressive and sad at the same time. Impressive for obvious reasons. Sad, because the language is so confabulated that there is no easy, concise way to talk about its semantics.
I also use it for this purpose, however I have come to hate the way other people use it for this. I have a colleague who has really no idea what he's talking about with respect to machine performance, and who did not have the requisite knowledge of how to peep at the assembly code of a given function with the standard tools like objdump, who now loves to send everyone godbolt links in slack, along with his suppositions about which function will be faster, based entirely on vibes (mostly, instruction count). This drives me up the wall. I wish there was some minimum height needed to ride godbolt.
I go to some pain in my talks to say "instruction count is not a good proxy for performance", but unfortunately folks do still use it. It's handy to say "hey, there's no loop in this output" or "this loop does 3 multiplies; the alternative does two and an add" or similar. It's a mixed blessing to have brought the compiler output to the masses, I can only hope it starts a useful learning process!
Nowadays any load from the L3 cache or from main memory takes much more time than any other instruction, leaving aside instructions that generate exceptions (which include many memory accesses, slowing them down further) and deprecated instructions kept for backwards compatibility that are executed as long microcode sequences.
Assuming that "caring about performance" = "you are in a tight loop": Here's a tool that simulates/visualizes instruction flow and data dependencies over multiple loop iterations.
Paste in assembly code, check "Trace Table" and run, then "Open Trace". Not sure if it will help with your annoying colleague, but it gives a much more concrete idea about how a processor will execute any given code.
Or, if you want to channel their energy into something slightly more direct, there's also https://quick-bench.com/ which allows easy micro-benchmarking. Still not guaranteed to be relevant to any real-world scenario, but more data-driven than "vibes".
Using Compiler Explorer to see how different compilers interpret the same code, or to understand the generated ABI, or to check whether various pragmas are working, etc., is a very good use of it. I suspect most compiler developers more or less have a Compiler Explorer tab permanently open at this point.
> I have a colleague who has really no idea what he's talking about with respect to machine performance, and who did not have the requisite knowledge of how to peep at the assembly code of a given function with the standard tools like objdump, who now loves to send everyone godbolt links in slack, along with his suppositions about which function will be faster, based entirely on vibes (mostly, instruction count).
It's hard to measure performance in realistic situations; it's easy to measure code size. I recently found myself wasting time doing micro-optimizations, encouraged by the feedback loop of measuring and reducing code size (in my case, not with Compiler Explorer, but with "cargo bloat", since I was working on a Rust project.)
I know you know this, but instructions are basically free compared to memory access, and random memory access is the worst. Linear scans over contiguous memory (per thread) generally optimize performance.
Instruction counts are only useful if everything is guaranteed to be in registers.
It can be tough even without the impact of the memory hierarchy. I've seen code where adding an extra instruction to the calculation made it faster. The extra instruction implicitly eliminated denormals, thus resulting in faster execution with some workloads on systems where operations on denormal values were slower than operations on other values.
It was a completely unnecessary instruction from a correctness perspective, because it had no effect on the answer. However, it was important for performance; removing the instruction made the calculation slower.
How $lang maps to assembly is half the picture: how assembly maps to CPU is the other half. We shouldn't blame ignorance of the latter on a tool for exploring the former.
I do totally get how some people learn just enough to be annoying. Generally I still think that's not a good reason to gatekeep them.
objdump sucks; source annotations via coloring make it at least 5x faster to read assembly, I don’t care how smart you are or how fluent you are in assembly. If your colleague is wrong for other reasons, that’s orthogonal.
Yes, but there's real value in exploration. I haven't touched C or assembly for a long time. Here's a cold read.
push rbp
this is going to take the contents of rbp and push it onto the top of the stack - this will probably also change the stack pointer
mov rbp, rsp
move goes left <-, like a = 5, not 5 = a.
so, copy the updated stack pointer into rbp
mov DWORD PTR [rbp-4], edi
now, I'm not 100% sure, but I believe this guy puts edi just under the value we pushed to the top of the stack
mov eax, DWORD PTR [rbp-4]
Take that value and put it into eax. I'm not 100% sure why it's not just mov eax, edi.
imul eax, eax
integer multiply, this is the part that does the actual work (eax = eax * eax).
pop rbp
restore rbp (which we messed with)
ret
and we're done.
there are at least three holes in my understanding - but those three are not _that_ hard to track down.
1, does the stack pointer actually auto increment? (I think it does)
2, imul overflow and setting sign flags and such. - that shouldn't be hard to run down.
3, what is the c calling convention? it looks like the argument is top of stack, but also in edi - is that shuffling really needed? I think there's a bucket of implicit behavior there that's kinda scary.
I would _hope_ that, unless linking to a library, whatever called this just did the imul eax, eax itself.
My understanding may be deeply flawed, but explaining my assumptions and my understanding does two things.
1, it helps me learn.
2, it helps others re-evaluate their assumptions and possibly see from a different viewpoint.
I'm not saying spam compiler lists. But a clear and well thought out question can certainly advance discussion. It forces people to formalize their assumptions.
The default godbolt page runs the compiler with no flags, which means without any optimizations. This explains why the code unnecessarily shuffles values to the stack and back: unoptimized clang/llvm output spills everything to the stack, since register allocation is an optimization.
With -O3, the code is:
imul edi, edi
mov eax, edi
ret
Yep, the calling convention for x86-64 on Linux and macOS (System V) passes the first six integer arguments in rdi, rsi, rdx, rcx, r8, and r9, with further arguments passed on the stack.
Having originally learned the basics of assembly on the chronically register-deprived x86, it took me a while to get used to the fact that standard CCs now pass things in registers (and rsi and rdi in particular, retaining their ancient names while being completely general-purpose these days).
And user netch on Stack Overflow wrote this, which explains more:
notice also there is a 128-byte space (the "red zone") below %rsp that is not guaranteed to survive function calls, but is preserved by the OS during interrupts and signal handlers. So, very temporary values (between function calls) can be kept at negative offsets from %rsp. Not all compilers utilize this.
About this code (note the opposite operand order: there are two main styles of displaying x86 assembly, and this is AT&T syntax, where the source operand comes first):
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
the comment was:
Compiler allocates some space for local values on function entry. That's why it subtracts a value from %rsp on entry. This doesn't depend on whether %rbp is used as a frame pointer. After that, this space is used with positive offsets from %rsp. Also, if this function calls another one, %rsp must be aligned on a 16-byte boundary at each call, so in that case the compiler shall subtract 8 from %rsp on entry.
1. You can figure out things about the assembly even without understanding assembly (e.g. lines of source translating into 0 lines of output vs many lines of output).
2. You have labels.
3. You can figure out some of the assembly on your own. Say: `mov %r1, %r2` - it probably moves what's in entity %r1 into a similar entity %r2, or vice versa. etc.
4. You can see what the executable outputs
5. and most importantly: You can read compiler warnings and errors...
Most compiled languages have long offered switches to generate assembly.
Even on the JVM and .NET there are ways to dump it. On the various JVM implementations it requires a plugin if you're not using a debug build; on the .NET side, you can use "show assembly" in Visual Studio, or use WinDbg with the SOS plugin.
I think it has a few other languages supported now too. And/or there's equivalents for other languages.
I think most of the confabulation of C++ is necessary to get the semantics needed for it to work right. Especially with all of the optimizations that compilers are expected to make. I found the reasoning behind switching from just rvalues/lvalues to the 5+ types they have now to be fascinating, for example.
I've used Compiler Explorer for many years as a C++ developer. When I started working in HIP, I really missed having Compiler Explorer in my toolbox. I've been on leave for the past couple of months and took the opportunity to make some contributions outside my normal work. Consequently, full support for compiling HIP to AMDGPU assembly was merged last week.
I hadn't realized it until I started working with him back in the day and saw his name on Slack.
Wrote him a message and told him how much we appreciated his work at my previous workplace and how we used it all the time to settle debates about C++ code.
Wonderful guy, back then he was very pleasantly surprised that people actually used his website.
It's great that you can now get RISC-V output. IMO RISC-V is the most pleasant way to learn assembly-level programming. For anyone interested here's a nice resource:
A bit unrelated, I've been learning ARM64 assembly out of the convenience of having an ARM machine.
It's also been pleasant, I've been planning on learning RISC-V next. The only device I have access to though is the ESP32C3, so I don't know how far I'll go with it.
Outside of decompiling some code on Godbolt, peeking into the assembly on VS Code, I've also been practising with Rust's inline assembly, quite pleasant.
Do you have a good tutorial or intro for ARM64 or Apple Silicon assembly? I'd like to learn but the books I have are all for MIPS, and the online resources are hit or miss.
I'm delighted to say we get a decent income from Patreon supporters and GitHub sponsors, as well as a little from commercial sponsors. It covers our running costs, and leaves some left over to save for contingency, and reward some contributors (myself included!).
I don't know what an Optiver salary would be (though I can guess: I work in finance too), but no, it's absolutely not a living wage :), lucky though I am to have anything for an Open Source project.
I do it as it's something I feel strongly about; it's got my name (and reputation!) staked on it (not as I'd planned it); and it's given me enormous opportunities too!
There's a CPPCon talk - which I'm sure somebody else can remember and link - where Matt explains how this started. IIRC he was initially wondering whether C++ iterators actually cost the same as the C-style for loops you'd once have used instead, with an index counting up or a pointer increasing. If iterators were slower then in Matt's industry they'd be useless, but if they're the same speed then the improved clarity of what you mean is valuable.
Obviously the results will be identical but because the iterators look fancy (and are easier to think about) maybe there's some object getting constructed, it might be a lot heavier - right? Nope, same machine code. Matt initially did this much more manually, the Godbolt web site is just that same idea getting further enriched over time.
This is even more striking for something like Rust's iterators, which don't look at all like a C-style for loop: there's a call to make the iterator, IntoIterator::into_iter(), and then repeated calls to its next() method. Sounds expensive - but nope, once again the optimiser can see what's going on and emit the same machine code. Having a tool like Godbolt to confirm (or sometimes refute) the belief that these things are actually the same is really useful. Even after confirming that optimisation is needed, if the proposed "optimisation" doesn't change the machine code, it wasn't an optimisation, just a way of making the program needlessly harder to understand.
> This is even more striking for something like Rust's iterators that don't look at all like it's just the C-style for loop, there's a call to make the iterator IntoIterator::into_iter() and then repeated calls to its next() method, sounds expensive - but nope, once again the optimiser can see what's going on here and emit the same machine code.
> Having a tool like Godbolt to confirm (or sometimes refute) the belief that these things are actually the same is really useful, as even after confirming that optimisation is needed if the proposed "optimisation" doesn't change the machine code it wasn't an optimisation, just making the program needlessly harder to understand.
I'm not sure I understand this. Wouldn't you expect higher-order code to be easier to optimise, since it comes closer to telling the compiler what you want to do, so that the compiler can figure it out, rather than forcing the compiler to divine the big-picture intention of a bunch of low-level instructions?
And, if high-level code generates exactly the same machine code as low-level code, isn't that an argument in favour of high-level code—it lets you code declaratively, saying what you mean—rather than against high-level code? An optimisation might be optimising for intelligibility, not just run-time … and, while an experienced low-level C programmer might find the low-level code more readable, surely the non-expert maintenance programmer who comes afterwards will be grateful not to have to recognise the low-level patterns but rather have them spelled out in high-level code?
> I'm not sure I understand this. Wouldn't you expect higher-order code to be easier to optimise, since it comes closer to telling the compiler what you want to do, so that the compiler can figure it out, rather than forcing the compiler to divine the big-picture intention of a bunch of low-level instructions?
This assumes a "sufficiently smart compiler", whereas if you write the exact low level basic for loop (or unroll it manually or whatever) you want you can be confident that even a pretty dumb compiler is still going to output something close to optimal.
It just seems like most compilers are "sufficiently smart" for this sort of thing nowadays (and have been for quite some time). But it's not always the case, and the code you think would obviously be optimized by the compiler isn't always. So it pays to check these things if you're writing code for something where potentially eking out extra performance really matters.
> Wouldn't you expect higher-order code to be easier to optimise, since it comes closer to telling the compiler what you want to do, so that the compiler can figure it out, rather than forcing the compiler to divine the big-picture intention of a bunch of low-level instructions?
Optimization is an NP-hard problem. What compiler backends mostly do these days is pattern-match known optimizable code blocks. Some of the other optimizations are approximations of the actual solution. The order in which the optimization passes run also affects the result.
So in a perfect world where we could solve NP-hard problems, higher-level code (with more constraints put on it -- as in Rust traits, not C++ templates) would be easier to optimize. But since we don't live in that utopia, nope.
> > Wouldn't you expect higher-order code to be easier to optimise, since it comes closer to telling the compiler what you want to do, so that the compiler can figure it out, rather than forcing the compiler to divine the big-picture intention of a bunch of low-level instructions?
> Optimization is an NP-hard problem. What compiler backends do these days is mostly to pattern match known optimizable code blocks. Some of the other optimizations are an approximation of the actual solution. The order of optimization type being made also affects the result.
Right, and that's what I meant—although I certainly see why it sounded like I was referring to some infallible and perfect optimisation process.
To be precise—and sticking with the theme of iterators from my parent, though there's nothing particularly special except that it's a familiar pattern—if there's one high-level iterator construct, isn't it more likely that the average programmer will write each invocation of an iterator in the way that the compiler expects; whereas, if each user has to roll their own iterator, then different average programmers will roll different iterators, and it's more likely that a programmer will write something so baroque that the compiler doesn't realise it can apply a known optimisation?
What a nested rust iterator is doing, semantically, is creating a struct containing a struct containing a struct... that calls a method that calls a method that calls a method. The compiler has to do a lot of work to make that into anything approaching a C for loop. Just try running unoptimized rust code and you'll see.
I recently learned that it's not too difficult to run Compiler Explorer locally on your machine: clone it from GitHub and run the appropriate makefile and npm commands. Recently I've been using Compiler Explorer a lot like this. It's easier to use your own header files, and it's nice not to have the extra latency of compiling remotely, or to worry that you're using so many of their CPU cycles. The main downside is that locally you can only test the versions of gcc and clang you have installed, while on the website you can test other versions and also other CPU architectures.
Awesome tool. I do mostly c++ all day and I use this almost daily, and certainly weekly. I think it’s really improved my feel for what the compiler will do with different constructs.
I'd like to share a command-line tool to interact with Compiler Explorer that I made: https://github.com/xfgusta/cexpl. It's written in Python and it's available on PyPI.
Seriously asking, how is it even possible to submit this to hn at this point? When one submits a previously submitted link, doesn't it just alias to the previous submission?
Compiler Explorer is such a wonderful tool. It made examining and comparing compiler outputs so much easier and now pretty much everyone interested in optimizations is using it.
Because it's working with arrays of double, which are 8 bytes wide, and the optimized loop doesn't use any scaled indexing; basically the loop index is pre-scaled to the element size. Perhaps those MMX instructions don't support addressing modes with index scaling (wild guess).
The unoptimized code increments by 1 to 65535, but the memory accesses use scaling. Well, not exactly. We see this:
This LEA here, though it stands for "load effective address", is not actually doing an effective-address calculation. The base address is zero, so this is just LEA being exploited to multiply RAX by 8 and get the result into RDX. RAX is then clobbered with the base address of an array, to which the scaled displacement is added before finally being used to make an access.
Is there a trick to make the typescript compiler run? All the other examples seem fine out of the box. No matter what I write, the ts compilation fails.
So! The typescript compiler doesn't leave us (currently) with any asm. We can only execute what it produces: https://godbolt.org/z/YvKe8ojvT for example.
If you click "show binary" that's pretty much what we do. Here's a link doing so, and also running `elfdump -a` on the linked output: https://godbolt.org/z/9EcxhedK4
It's better than the others, but also very unprofessional, like the other web stuff. It DOES link, but it's absolutely impossible to view the generated map. People are just hyping codegen-related micro-optimizations but don't care about layout at all.
Very typical of today's world, where a javascript coder counts as a software engineer.
General stuff: we're always looking for help; everything's open source on GH: https://github.com/compiler-explorer/ (the base project, our cloud setup, all our build scripts, etc). The most valuable way to help us is with issues and PRs, or hang out on our Discord (https://discord.gg/zNNgyRKh). Then spread the word, and last we welcome sponsors on GH (https://github.com/sponsors/mattgodbolt) or Patreon (https://www.patreon.com/mattgodbolt).