How Does Compiler Explorer (Godbolt) Work?
Compiler Explorer (Godbolt) is an interactive online tool that shows the assembly code produced by compilers for high-level languages. This article explores how it works and builds a local command-line version.
What is Assembly Language
I recommend reading 汇编语言入门教程 (Assembly Language Tutorial for Beginners) on Ruan Yifeng's blog[1] to learn the basics of assembly language.
Example Code
1 |
|
Result from Compiler Explorer
First, we compile the code above with Compiler Explorer. The result is shown below, and it is the output we want to reproduce:
Compile to Assembly by GCC
Use the following command to compile C++ code to assembly[2][3]:
1 |
|
1 |
|
File Cleaning and Arguments Optimization
This article mainly documents the exploration process. The flags in this part are not used in the final solution, so feel free to skip it.
.seh_* Commands
First we need to remove the .seh_* directives, which are the gas implementation of MASM's frame handling pseudo-ops[4]. Add the argument -fno-asynchronous-unwind-tables.
1 |
|
Specify Output File
Add the argument -o to specify the output file.
1 |
|
Intel Style
The assembly code on Compiler Explorer does not contain %, while the result of direct compilation has a lot of %. After some searching, I found that this is the difference between AT&T syntax and Intel syntax.
Update the command:
1 |
|
Remove .ident
Searching for -fno-asynchronous-unwind-tables led me to the GCC documentation, where I found that the -fno-ident option removes the .ident directive[5].
1 |
|
Current Result
Now the result is basically the same as the output on Compiler Explorer:
1 |
|
Deep into Compiler Explorer
Is GCC Really Filtering?
I noticed that there is a filter option on Compiler Explorer, and there is a huge difference in the results when the filter is not selected.
I suspected that this filter is not a feature of the GCC compiler. The binary produced by a complete compilation has to be executable, so it must contain library code and so on. My guess was that Compiler Explorer does the filtering after compilation to achieve this effect.
For example, take the simple code from the previous section and use the following command to view the preprocessed file:
1 |
|
1 |
|
After adding #include <stdio.h>, the preprocessed file grows to more than a thousand lines because the contents of stdio.h are included as well. The compiler therefore processes far more than the few lines we wrote, which suggests that its raw output can contain more than our own code.
Search for Compiler Explorer Principles
Compiler Explorer was originally named godbolt, so I searched for "how does godbolt filter assembly code":
Then I found this talk by Matt Godbolt, the author of Compiler Explorer: CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler’s Lid”
Matt Godbolt’s Talk
This talk explains in detail the differences between registers, how compatibility was maintained from 8-bit to 64-bit processors, Intel syntax versus AT&T syntax, the principles of Compiler Explorer, and more.
I strongly recommend this talk to undergraduate students in computer science. It is very helpful for understanding how a C compiler works, the basics of assembly language, registers, compiler optimization strategies, and so on. It is definitely worth spending 2-3 hours to study and understand it.
Each detail in it can branch out into a lot of material, such as the history of computers from 8-bit to 64-bit, assembly language basics, compiler optimization (he demonstrated several cases and showed how the compiler optimized them), Linux commands, Docker, cloud server practice, and virtual machines.
Learning and fully understanding the content of this talk benefits students aiming for either industry or academia.
Filter Lines Starting with a Dot and C++ Symbols
In the talk, I noticed the following command:
1 |
|
The parameters of this command are as follows:
- -O2: optimization level
- -c: compile and assemble only, do not link (although we do not need object files)
- -S: compile and get assembly code
- -o -: output to the command line (stdout)
- -masm=intel: use Intel syntax
- c++filt: demangle C++ symbols
- grep -vE '\s+\.': filter out lines starting with a dot
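As a rough illustration of the last step in that pipeline, the following TypeScript sketch mirrors what `grep -vE '\s+\.'` does: it drops every line in which whitespace is followed by a dot, which covers indented directives such as `.cfi_startproc`. The function name `dropDirectiveLines` and the sample input are made up for this example; this is not code from Compiler Explorer.

```typescript
// Mirror `grep -vE '\s+\.'`: remove every line that contains
// whitespace followed by a dot (for example "\t.cfi_startproc").
function dropDirectiveLines(asm: string): string {
  const directive = /\s+\./;
  return asm
    .split("\n")
    .filter((line) => !directive.test(line))
    .join("\n");
}

// Hypothetical compiler output used only for demonstration.
const sampleAsm = [
  "main:",
  "\t.cfi_startproc",
  "\tpush\trbp",
  "\t.seh_pushreg\trbp",
  "\tret",
].join("\n");

console.log(dropDirectiveLines(sampleAsm));
```

Note that the pattern is not anchored, so a label such as `.LC0:` at the start of a line is kept, while any instruction that happens to contain whitespace followed by a dot would also be dropped. That is a limitation of this quick filter.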
c++filt demangles C++ symbols. Because C++ supports function overloading, the compiler encodes the parameter types into each function name (name mangling), and we want to get human-readable function names back, for example:
1 |
|
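To make the idea of name mangling more concrete, here is a small sketch of the encoding that c++filt reverses. It covers only a tiny subset of the Itanium C++ ABI scheme used by GCC and Clang on Linux: free functions at global scope whose parameters are a few builtin types. The helper `mangleFreeFunction` and the type table are written for this illustration only.

```typescript
// Single-letter codes for a few builtin types in the Itanium C++ ABI.
const TYPE_CODES: Record<string, string> = {
  void: "v",
  bool: "b",
  char: "c",
  int: "i",
  long: "l",
  float: "f",
  double: "d",
};

// Mangle a global free function: "_Z" + name length + name + parameter codes.
// An empty parameter list is encoded as "v".
function mangleFreeFunction(name: string, paramTypes: string[]): string {
  const params =
    paramTypes.length === 0
      ? "v"
      : paramTypes
          .map((t) => {
            const code = TYPE_CODES[t];
            if (code === undefined) {
              throw new Error(`unsupported type in this sketch: ${t}`);
            }
            return code;
          })
          .join("");
  return `_Z${name.length}${name}${params}`;
}

console.log(mangleFreeFunction("add", ["int", "int"]));       // _Z3addii
console.log(mangleFreeFunction("add", ["double", "double"])); // _Z3adddd
```

The two overloads of `add` get different symbols, which is why the raw assembly shows names like `_Z3addii`, and why piping the output through c++filt turns them back into `add(int, int)`.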
From this command, we can update our compilation command as follows:
1 |
|
The result of this compilation:
1 |
|
Problem of Command Parameters
Now we can see that -fno-asynchronous-unwind-tables and -fno-ident are actually not needed, and they may filter out content we want to keep, so we modify the command further:
1 |
|
Matt’s Simple Solution
In the talk, Matt mentioned his earlier simple solution: use the watch command to run the compilation command periodically, and use tmux to show vim and watch side by side, which gives a simple Compiler Explorer. I reproduced his solution:
Discussion of Platform Differences
Here is the current result. There are still some differences from the code on Compiler Explorer (green is my result, red is Compiler Explorer):
1 |
|
I wondered whether this was caused by platform differences, so I compiled with the same GCC 13 on Ubuntu on an AMD machine. The results are as follows:
We can see that the operations on rsp and the call to __main are gone, but there are still differences when pointer variables are used.
Indirect Operands (Pointer Based)
The indirect operand on Compiler Explorer is [rbp-4], while ours is -4[rbp].
Generally, -4(%rbp) is AT&T syntax and [rbp-4] is Intel syntax, but in our compilation result, Intel syntax produces -4[rbp].
Adding the -fno-pie flag produces the expected output with gcc on Linux. Let's ignore these platform differences for now and continue to explore the principles of Compiler Explorer.
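The two Intel-style spellings denote the same memory operand, which can be shown with a small rewrite rule. The sketch below is only an illustration of that equivalence and is not something Compiler Explorer is known to do; the function name `normalizeOperand` is made up, and it handles only a numeric displacement in front of a single register.

```typescript
// Rewrite "disp[reg]" (for example "-4[rbp]") as "[reg-4]" / "[reg+8]".
// Only a plain integer displacement directly before "[reg]" is handled.
function normalizeOperand(line: string): string {
  return line.replace(
    /(^|[\s,])(-?\d+)\[(\w+)\]/g,
    (_match: string, prefix: string, disp: string, reg: string) => {
      const value = parseInt(disp, 10);
      if (value === 0) {
        return `${prefix}[${reg}]`;
      }
      const sign = value < 0 ? "-" : "+";
      return `${prefix}[${reg}${sign}${Math.abs(value)}]`;
    },
  );
}

console.log(normalizeOperand("mov\tDWORD PTR -4[rbp], 1"));
// mov	DWORD PTR [rbp-4], 1
```

Symbolic displacements such as a label in front of [rip] are left untouched, because the pattern requires digits before the bracket.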
Compiler Explorer Principles
Deep into Source Code
Now our code is basically the same as the code on Compiler Explorer. Only two filters are still missing: Library functions and Unused labels.
The talk did not mention how to filter, so I searched for the source code of Compiler Explorer: https://github.com/compiler-explorer/compiler-explorer
First, we search for filters in the source code and find all available filters in the API documentation:
Then, we search for libraryCode and find the following code:
objdump is a tool that disassembles object files. We haven't seen ASMPARSER before; it seems to be an assembly parser. I then searched for externalparser and found the name of the software:
Update Test Code
To test the library function filter, we update the test code to:
1 |
|
asm-parser
I found the author's asm-parser software, which is written in C++:
https://github.com/compiler-explorer/asm-parser
This software only compiles on the Linux platform. I tried to compile it on Windows but without success. I executed the following command on Linux:
1 |
|
The result is as follows:
1 |
|
The result on Compiler Explorer:
1 |
|
The -library_functions flag of this software removes all library functions, not only the unused ones. I found that in Compiler Explorer you can choose the binary option, which compiles to a binary object file and then disassembles it back to assembly, or you can compile the source file directly to assembly. This software may be a bit outdated and seems to handle only binary files.
Back to Source Code
Back in the source code of Compiler Explorer, I found a file named asm-parser.ts, which contains a lot of regular expressions. This is probably the filter that the platform currently uses. It further confirms my idea that filtering is not a gcc option but a step performed after compilation.
I downloaded the source code and ran it locally to see if it would output the command used in the console, but it didn’t.
Extract Source Code to Local Program
Extract Filtering Code
Next, I planned to extract the filtering JS code and repackage it so that it can be run locally.
In the asm-parser.ts file mentioned above, the last two functions appear to process binary and non-binary assembly respectively.
1 |
|
The next step is simple: set a breakpoint and inspect the method parameters:
asm is the text of the assembly code, and filter is an object containing the filtering options.
From the asm text here, we know that the actual compilation parameters are as follows:
1 |
|
Compared with our command, this adds the -g parameter, which emits debug information into the assembly code. My guess at this point was that c++filt would use it later (as the following steps show, it is the library filtering and the source line mapping that rely on it).
I extracted asm-parser.ts and the other files I needed, and wrote a main file.
According to the documentation of the C++ asm-parser, library functions are filtered by file path. The file name must be example.cpp to be retained; otherwise the main file is filtered out as a library file. The regular expression in the source code shows this:
1 |
|
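The path-based idea can be sketched in a few lines. The following is a heavily simplified illustration, not the logic of asm-parser.ts: it assumes GCC-style `.file N "path"` and `.loc N line ...` directives (which -g emits), treats any path ending in `example.cpp` as user code, and drops all other directives and local labels for brevity. The function name `keepUserCode` is made up.

```typescript
// Keep only instructions whose most recent .loc points at the user's file.
function keepUserCode(asm: string, userFile: string = "example.cpp"): string[] {
  const fileDirective = /^\s*\.file\s+(\d+)\s+"([^"]+)"/;
  const locDirective = /^\s*\.loc\s+(\d+)\s+\d+/;
  const files = new Map<string, string>(); // file number -> path
  const kept: string[] = [];
  let pendingLabel: string | null = null;
  let inUserCode = false;

  for (const line of asm.split("\n")) {
    const fileMatch = fileDirective.exec(line);
    if (fileMatch) {
      files.set(fileMatch[1], fileMatch[2]);
      continue;
    }
    const locMatch = locDirective.exec(line);
    if (locMatch) {
      inUserCode = (files.get(locMatch[1]) ?? "").endsWith(userFile);
      continue;
    }
    if (/^[A-Za-z_$][\w$]*:/.test(line)) {
      // Function label: hold it back until we know who owns its body.
      pendingLabel = line;
      inUserCode = false;
      continue;
    }
    if (/^\s*\./.test(line) || line.trim() === "") {
      continue; // other directives, local labels, blank lines
    }
    if (inUserCode) {
      if (pendingLabel !== null) {
        kept.push(pendingLabel);
        pendingLabel = null;
      }
      kept.push(line);
    }
  }
  return kept;
}
```

This also shows why the file name matters: if the source file were saved under another name, every `.loc` would point at a path that does not match, and the user's own functions would be discarded together with the library code.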
JSON to Text
The current output is still in JSON format, and I wanted to convert it to plain text, so I wrote a function to output plain text.
1 |
|
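A conversion of this kind can be very small. The sketch below assumes a result object with an `asm` array whose entries carry the line in a `text` field and optional source information in a `source` field; these field names are an assumption modeled on the JSON described in this article, and `toPlainText` is a hypothetical name rather than the function from the project.

```typescript
// Assumed shape of one parsed assembly line.
interface ParsedAsmLine {
  text: string;
  source?: { file: string | null; line: number } | null;
}

// Assumed shape of the parser result.
interface ParsedResult {
  asm: ParsedAsmLine[];
}

// Join the text of every line and ignore all other metadata.
function toPlainText(result: ParsedResult): string {
  return result.asm.map((line) => line.text).join("\n");
}

const parsed: ParsedResult = {
  asm: [
    { text: "main:", source: null },
    { text: "        push    rbp", source: { file: null, line: 1 } },
    { text: "        ret", source: { file: null, line: 3 } },
  ],
};

console.log(toPlainText(parsed));
```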
c++filt
The next step is to demangle C++ symbols with c++filt, after which we have almost the same result as Compiler Explorer.
Description after Symbol
The comparison above shows that the output of Compiler Explorer adds a description in brackets after the function call. After reading the source code again, I extracted the CppDemangler class.
At this point, the output is exactly the same as the output on godbolt.
asm-parser in TypeScript
Optimize Command Line Interface
Finally, I made some small modifications to the code and improved the command-line interface so that it reads the assembly code from stdin and writes the result to stdout.
1 |
|
1 |
|
Testing again, the output is exactly the same as the output on Compiler Explorer.
Then we use https://github.com/vercel/pkg to compile the TypeScript code into binary files. This tool can build for three platforms at the same time.
1 |
|
I published this project on GitHub, and welcome everyone to use and give suggestions: https://github.com/AnzhiZhang/asm-parser
Now we have the full compile command:
1 |
|
Source Code Correspondence and Coloring
The last small problem is that Compiler Explorer colors the assembly code to mark its correspondence with the source code, but our output has no colors. Looking at the data returned by its API, we can see that the front end receives JSON in which each line carries the line number of the corresponding source code. The coloring itself is implemented by the front end.
We only need to drop the --outputtext parameter when running asm-parser to get the data with the correspondence between assembly lines and source line numbers. The command is as follows:
1 |
|
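Given such data, coloring comes down to giving every assembly line that maps to the same source line the same color. The sketch below shows one possible way to do that; the input shape (`text` plus a `source.line` number or `null`), the palette, and the name `colorBySourceLine` are assumptions for illustration, not the front-end code of Compiler Explorer.

```typescript
// Assumed shape: one assembly line plus the source line it came from.
interface AsmLineWithSource {
  text: string;
  source: { line: number } | null;
}

// Hypothetical palette; colors are reused cyclically if it runs out.
const PALETTE: string[] = ["red", "green", "yellow", "blue", "magenta", "cyan"];

// Assign the same color to all assembly lines sharing a source line.
function colorBySourceLine(
  lines: AsmLineWithSource[],
): Array<{ text: string; color: string | null }> {
  const colorOf = new Map<number, string>();
  return lines.map(({ text, source }) => {
    if (source === null) {
      return { text, color: null }; // labels and other unmapped lines
    }
    let color = colorOf.get(source.line);
    if (color === undefined) {
      color = PALETTE[colorOf.size % PALETTE.length];
      colorOf.set(source.line, color);
    }
    return { text, color };
  });
}
```

A terminal front end could then translate each color name into an ANSI escape sequence, while a web front end would use it as a background color.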
Other Platforms
We may also want to get filtered assembly code on the armv8 and riscv platforms. The following briefly introduces the implementation method.
To facilitate demonstration, I only use the following code:
1 |
|
Also, to make cross-compilation easier, we use clang and add -target <triple> to specify the target platform.[8]
The triple has the general format <arch><sub>-<vendor>-<sys>-<env>, where:
- arch = x86_64, i386, arm, thumb, mips, etc.
- sub = for ex. on ARM: v5, v6m, v7a, v7m, etc.
- vendor = pc, apple, nvidia, ibm, etc.
- sys = none, linux, win32, darwin, cuda, etc.
- env = eabi, gnu, android, macho, elf, etc.
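To make the format concrete, here is a small sketch that splits a fully specified triple into its fields. It is a simplification: it requires all four dash-separated components, and it only recognizes the architecture names listed above, whereas clang itself accepts shorter triples and many more architectures. The names `parseTriple` and `KNOWN_ARCHS` are made up for this example.

```typescript
interface TargetTriple {
  arch: string;
  sub: string;
  vendor: string;
  sys: string;
  env: string;
}

// Architecture prefixes taken from the list above; not exhaustive.
const KNOWN_ARCHS: string[] = ["x86_64", "i386", "thumb", "mips", "arm"];

// Split "<arch><sub>-<vendor>-<sys>-<env>" into its five fields.
function parseTriple(triple: string): TargetTriple {
  const parts = triple.split("-");
  if (parts.length !== 4) {
    throw new Error(`expected 4 dash-separated components, got ${parts.length}`);
  }
  const [archSub, vendor, sys, env] = parts;
  const arch = KNOWN_ARCHS.find((known) => archSub.startsWith(known));
  if (arch === undefined) {
    // Unknown architecture: keep the whole component and leave sub empty.
    return { arch: archSub, sub: "", vendor, sys, env };
  }
  return { arch, sub: archSub.slice(arch.length), vendor, sys, env };
}

console.log(parseTriple("armv7a-pc-linux-gnu"));
```

For `armv7a-pc-linux-gnu`, this yields arch `arm`, sub `v7a`, vendor `pc`, sys `linux`, and env `gnu`.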
armv8
1 |
|
Compile output:
1 |
|
riscv
1 |
|
Compile output:
1 |
|
References
[1] Ruan Yifeng (阮一峰). 汇编语言入门教程 - 阮一峰的网络日志. 2018-01-21. Archived on 2023-12-11. Retrieved 2023-12-11.
[2] Antoine Pelisse. Does C++ compile to assembly?. 2011-01-24. Archived on 2023-12-11. Retrieved 2023-12-11.
[3] Andrew Edgecombe. How do you get assembler output from C/C++ source in GCC?. 2008-09-26. Archived on 2023-12-11. Retrieved 2023-12-11.
[4] David Wohlferd. What are .seh_* assembly commands that gcc outputs?. 2016-07-04. Archived on 2023-12-11. Retrieved 2023-12-11.
[5] Code Gen Options (Using the GNU Compiler Collection (GCC)). Archived on 2023-12-11. Retrieved 2023-12-11.
[6] Peter Cordes. Step into standard library call with godbolt. 2019-05-21. Archived on 2023-12-11. Retrieved 2023-12-11.
[7] Peter Cordes. How to remove “noise” from GCC/clang assembly output?. 2016-01-24. Archived on 2023-12-11. Retrieved 2023-12-11.
[8] Cross-compilation using Clang — Clang 18.0.0git documentation. Archived on 2023-12-11. Retrieved 2023-12-11.