The C Compilation Process


Compilation is the process of translating human-readable source code, written in a language such as C or C++, into machine code that your processor can execute.

A typical compilation process for C code (the steps for C++ are similar) involves four phases, one of which is, awkwardly enough, also called compilation. The phases are:

  • preprocessing

  • compilation

  • assembly

  • linking

In practice, modern compilers often merge some or all of these phases.

The Preprocessing Phase

The compilation process starts with a number of source files that you want to compile.

C source files contain macros (denoted by #define) and #include directives. You can use #include directives to include header files (with the extension .h) on which the source file depends. The preprocessing phase expands any #define and #include directives in the source file so that all that's left is pure C code ready to be compiled.

Suppose you want to compile a C source file using gcc, as shown here:

#include <stdio.h>
#define FORMAT_STRING "%s"
#define MESSAGE "Hello, world!\n"
int
main(int argc, char *argv[]) {
    printf(FORMAT_STRING, MESSAGE);
    return 0;
}

By default, gcc automatically executes all compilation phases, so you have to explicitly tell it to stop after preprocessing and show you the intermediate output. For gcc, this can be done with the command gcc -E -P, where -E tells gcc to stop after preprocessing and -P omits linemarkers (debugging information) so that the output is a bit cleaner.

$ gcc -E -P compilation_example.c

typedef long unsigned int size_t;
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;
/* ... */
extern int sys_nerr;
extern const char *const sys_errlist[];
extern int fileno (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;
extern int fileno_unlocked (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;
extern FILE *popen (const char *__command, const char *__modes) ;
extern int pclose (FILE *__stream);
extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__));
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));

The Compilation Phase

The compilation phase takes the preprocessed code and translates it into assembly language. The output of this phase is assembly in a reasonably human-readable form, with symbolic information intact. As mentioned, gcc normally runs all compilation phases automatically, so to see the assembly emitted by the compilation phase, you have to tell gcc to stop after this phase and store the assembly file on disk. You can do this using the -S flag. You can also pass the option -masm=intel so that gcc emits assembly in Intel syntax rather than the default AT&T syntax.

$ gcc -S -masm=intel compilation_example.c
$ cat compilation_example.s

.file "compilation_example.c"
.intel_syntax noprefix
.section .rodata
➊ .LC0:
.string "Hello, world!"
.text
.globl main
.type main, @function
➋ main:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 16
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
mov edi, ➌OFFSET FLAT:.LC0
call puts
mov eax, 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits

Constants and variables have symbolic names rather than just addresses, such as .LC0 ➊ for the nameless “Hello, world!” string, and there’s an explicit label for the main function ➋, the only function in this case.

The Assembly Phase

The input of the assembly phase is the set of assembly language files generated in the compilation phase, and the output is a set of object files, sometimes referred to as modules.

Object files contain machine instructions that are in principle executable by the processor.

To generate an object file, you pass the -c flag to gcc:

$ gcc -c compilation_example.c
$ file compilation_example.o

compilation_example.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

What exactly does this mean? The first part of the file output shows that the file conforms to the ELF specification for binary executables. More specifically, it's a 64-bit ELF file (since you're compiling for x86-64), and it is LSB, meaning that numbers are ordered in memory with their least significant byte first. You can also see that the file is relocatable, which means it doesn't rely on being placed at any particular address in memory; rather, it can be moved around at will without breaking any assumptions in the code.

The Linking Phase

The linking phase is the final step in compilation. During this phase, all object files are combined into a single binary executable. Because object files are compiled independently of each other, they contain relocation symbols that mark references to functions and variables whose addresses are not yet known. The linker merges the object files, resolves these symbolic references, and produces the final executable. References to static libraries (with the extension .a) are fully resolved at link time because the library code is merged into the executable, while references to dynamic (shared) libraries are left symbolic and are resolved only when the program is loaded and run.