In computer science, computer architecture is a description of the structure of a computer system made from parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the description may include the instruction set architecture design, microarchitecture design, logic design, and implementation.
The basic components of a computer include a Central Processing Unit (CPU), primary storage or Random Access Memory (RAM), secondary storage, input/output devices (e.g., screen, keyboard, mouse), and an interconnection referred to as the bus.
Block diagram of a basic computer with uniprocessor CPU. Black lines indicate data flow, whereas red lines indicate control flow. Arrows indicate the direction of flow.
This architecture is typically referred to as the von Neumann architecture, or Princeton architecture, and was described in 1945 by the mathematician and physicist John von Neumann.
Programs and data are typically stored on secondary storage (e.g., disk drive or solid-state drive). When a program is executed, it must be copied from the secondary storage into the primary storage or main memory (RAM). The CPU now executes the program from the RAM.
Primary storage or main memory is also referred to as volatile memory since when power is removed, the information is not retained and thus lost. Secondary storage is referred to as non-volatile memory since the information is retained when powered off.
Data Storage Sizes
The x86-64 architecture supports a specific set of data storage size elements, all based on the powers of two:
Storage | Size (bits) | Size (bytes) |
Byte | 8-bits | 1 byte |
Word | 16-bits | 2 bytes |
Double-word | 32-bits | 4 bytes |
Quadword | 64-bits | 8 bytes |
Double quadword | 128-bits | 16 bytes |
Central Processing Unit (CPU)
The CPU is typically referred to as the "brains" of the computer since that is where the actual calculations are performed.
The CPU is housed in a single chip called a processor, chip, or die. It looks like this:
The CPU consists of 4 parts, which are:
Control Unit - Retrieves and decodes instructions from memory and directs the storing and retrieving of data to and from memory.
Execution Unit - Where the fetched and decoded instructions are actually executed.
Registers - Internal CPU memory locations used as temporary data storage.
Flags - Indicate events that occur during execution.
It should be noted that the internal design of a modern processor is quite complex. This series provides a very simplified, high-level view of some key functional units within a CPU.
CPU Registers
A CPU register, or just register, is a temporary storage or working location built into the CPU itself (separate from memory).
Computations are typically performed by the CPU using registers.
General Purpose Registers (GPRs)
There are sixteen 64-bit General Purpose Registers (GPRs). The GPRs are described in the following table.
64-bit register | Lowest 32-bits | Lowest 16-bits | Lowest 8-bits |
rax | eax | ax | al |
rbx | ebx | bx | bl |
rcx | ecx | cx | cl |
rdx | edx | dx | dl |
rsi | esi | si | sil |
rdi | edi | di | dil |
rbp | ebp | bp | bpl |
rsp | esp | sp | spl |
r8 | r8d | r8w | r8b |
r9 | r9d | r9w | r9b |
r10 | r10d | r10w | r10b |
r11 | r11d | r11w | r11b |
r12 | r12d | r12w | r12b |
r13 | r13d | r13w | r13b |
r14 | r14d | r14w | r14b |
r15 | r15d | r15w | r15b |
The general-purpose registers are used to temporarily store data as it is processed on the processor. The registers have evolved dramatically over time and continue to do so. We will focus on the 32-bit x86 architecture for our purposes. Each new generation of general-purpose registers is designed to be backward compatible with previous processors. This means that code using the 8-bit and 16-bit registers of the original 8086 chips can still run on today's 64-bit processors.
Let's review the 8 general-purpose registers in a 32-bit architecture:
EAX
: The main register used in arithmetic calculations. Also known as the accumulator, as it holds the results of arithmetic operations and function return values.
EBX
: The base register. Pointer to data in the DS segment. Used to store the base address of the program.
ECX
: The counter register. Often used to hold a value representing the number of times a process is to be repeated. Used for loop and string operations.
EDX
: A general-purpose register. Additionally used for I/O operations. It also extends EAX to 64 bits (as the pair EDX:EAX).
ESI
: Source Index register. Pointer to data in the segment pointed to by the DS register. Used as an offset address in string and array operations. It holds the address from which to read data.
EDI
: Destination Index register. Pointer to data (or destination) in the segment pointed to by the ES register. Used as an offset address in string and array operations.
EBP
: Base Pointer. Pointer to data on the stack (in the SS segment). It points to the current stack frame and is used to reference local variables.
ESP
: Stack Pointer. Points to the top of the stack (in the SS segment).
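Keep in mind that each of the above registers is 32 bits (4 bytes) in length. The lower 2 bytes of the EAX, EBX, ECX, and EDX registers can be referenced as AX, BX, CX, and DX, and these are further subdivided into AH, BH, CH, and DH for the high bytes and AL, BL, CL, and DL for the low bytes, which are 1 byte each. In addition, the lower 16 bits of ESI, EDI, EBP, and ESP can be referenced by their 16-bit equivalents, which are SI, DI, BP, and SP. This can be a bit confusing to someone who has not studied computer engineering, so let me illustrate in the table below:
32-bit register | Lowest 16-bits | High byte (bits 15-8) | Low byte (bits 7-0) |
EAX | AX | AH | AL |
EBX | BX | BH | BL |
ECX | CX | CH | CL |
EDX | DX | DH | DL |
ESI | SI | n/a | n/a |
EDI | DI | n/a | n/a |
EBP | BP | n/a | n/a |
ESP | SP | n/a | n/a |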
Flag Register (rFlags)
The flag register, rFlags, is used for status and CPU control information. The rFlags register is updated by the CPU after each instruction and is not directly accessible by programs.
Segment Registers
The segment registers are used specifically for referencing memory locations. There are six segment registers, which are as follows:
CS: Code segment register. Stores the base location of the code section (.text section), which is used for instruction access.
DS: Data segment register. Stores the default location for variables (.data section), which is used for data access.
ES: Extra segment register, which is used during string operations.
SS: Stack segment register. Stores the base location of the stack segment and is used when implicitly using the stack pointer or when explicitly using the base pointer.
FS and GS: Two additional extra segment registers.
XMM Registers
There is a set of dedicated registers, the XMM registers, used to support 64-bit and 32-bit floating-point operations.
Instruction Pointer Register
The instruction pointer register, called the EIP register, is simply the most important register you will deal with in any reverse engineering. EIP holds the address of the next instruction to execute. If you were to alter that pointer to jump to another area in the code, you would have complete control over the program.
Let's jump ahead and dive into some code. Here is an example of a simple hello world application in C that we will go into in more detail much later in our series.
For our purposes today, we will see the raw POWER of assembly language and particularly that of the EIP register and what we can do to completely hack program control.
// Thank you @0xinfection

#include <stdio.h>
#include <stdlib.h>

void unreachableFunction(void) {
    printf("I'm hacked! Iam a hidden function!\n");
    exit(0);
}

int main(void) {
    printf("Hello World!\n");

    return 0;
}
Don't worry if you do not understand what it does or how it works. What to note here is that we have a function called unreachableFunction that is never called by the main function. As you will see, if we can control the EIP register, we can hack this program to execute that code!
We have simply compiled the code for the IA-32 instruction set and run it. There is no call to unreachableFunction of any kind, so it is unreachable under normal conditions, and only 'Hello World!' is printed when the program is executed.
pwndbg> set disassembly-flavor intel
pwndbg> b main
Breakpoint 1 at 0x11e4: file hello.c, line 12.
pwndbg> r
Starting program: /home/xi/asm64/hello
Downloading separate debug info for /lib/ld-linux.so.2
Downloading separate debug info for system-supplied DSO at 0xf7fc6000
Downloading separate debug info for /usr/lib32/libc.so.6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, main () at hello.c:12
12 printf("Hello World!\n");
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
────────────────────────────────[ REGISTERS / show-flags off / show-compact-regs off ]────────────────────────────────
*EAX 0x56558ff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x3ef0
*EBX 0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
*ECX 0xffffc830 ◂— 0x1
*EDX 0xffffc850 —▸ 0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
*EDI 0xf7ffcb80 (_rtld_global_ro) ◂— 0x0
*ESI 0xffffc8ec —▸ 0xffffcb58 ◂— 'SHELL=/usr/bin/bash'
*EBP 0xffffc818 ◂— 0x0
*ESP 0xffffc810 —▸ 0xffffc830 ◂— 0x1
*EIP 0x565561e4 (main+25) ◂— sub esp, 0xc
──────────────────────────────────────────[ DISASM / i386 / set emulate on ]──────────────────────────────────────────
► 0x565561e4 <main+25> sub esp, 0xc
0x565561e7 <main+28> lea edx, [eax - 0x1fc9]
0x565561ed <main+34> push edx
0x565561ee <main+35> mov ebx, eax
0x565561f0 <main+37> call puts@plt <puts@plt>
0x565561f5 <main+42> add esp, 0x10
0x565561f8 <main+45> mov eax, 0
0x565561fd <main+50> lea esp, [ebp - 8]
0x56556200 <main+53> pop ecx
0x56556201 <main+54> pop ebx
0x56556202 <main+55> pop ebp
──────────────────────────────────────────────────[ SOURCE (CODE) ]───────────────────────────────────────────────────
In file: /home/xi/asm64/hello.c
7 printf("I'm hacked! Iam a hidden function!\n");
8 exit(0);
9 }
10
11 int main(void) {
► 12 printf("Hello World!\n");
13
14 return 0;
15 }
──────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────
00:0000│ esp 0xffffc810 —▸ 0xffffc830 ◂— 0x1
01:0004│ 0xffffc814 —▸ 0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
02:0008│ ebp 0xffffc818 ◂— 0x0
03:000c│ 0xffffc81c —▸ 0xf7c23b09 (__libc_start_call_main+121) ◂— add esp, 0x10
04:0010│ 0xffffc820 ◂— 0x0
05:0014│ 0xffffc824 ◂— 0x0
06:0018│ 0xffffc828 —▸ 0x56555300 ◂— '__libc_start_main'
07:001c│ 0xffffc82c —▸ 0xf7c23b09 (__libc_start_call_main+121) ◂— add esp, 0x10
────────────────────────────────────────────────────[ BACKTRACE ]─────────────────────────────────────────────────────
► f 0 0x565561e4 main+25
f 1 0xf7c23b09 __libc_start_call_main+121
f 2 0xf7c23bcd __libc_start_main+141
f 3 0x5655609b _start+43
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg>
OK, relax. Everything is all right.
We have loaded the program into the GDB debugger, set a breakpoint on the main function, and run the program. The arrow marker in the DISASM pane shows where EIP is pointing, that is, the instruction that will execute when we step. If we follow normal program flow, 'Hello World!' will print to the console and the program will exit.
Breakpoint 1, main () at hello.c:12
12 printf("Hello World!\n");
If we run the program again and do an examination of where EIP is pointing to we will see:
pwndbg> x/1xw $eip
0x565561e4 <main+25>: 0x8d0cec83
pwndbg>
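We can see EIP is pointing to main+25, at address 0x565561e4. The value 0x8d0cec83 is not an address; it is the 32-bit word stored at that address, in other words the first four bytes of the machine code that is about to execute.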
Let's examine the unreachableFunction and see where it starts in memory and write down that address.
pwndbg> disas unreachableFunction
Dump of assembler code for function unreachableFunction:
0x5655619d <+0>: push ebp
0x5655619e <+1>: mov ebp,esp
0x565561a0 <+3>: push ebx
0x565561a1 <+4>: sub esp,0x4
0x565561a4 <+7>: call 0x565560a0 <__x86.get_pc_thunk.bx>
0x565561a9 <+12>: add ebx,0x2e4b
0x565561af <+18>: sub esp,0xc
0x565561b2 <+21>: lea eax,[ebx-0x1fec]
0x565561b8 <+27>: push eax
0x565561b9 <+28>: call 0x56556050 <puts@plt>
0x565561be <+33>: add esp,0x10
0x565561c1 <+36>: sub esp,0xc
0x565561c4 <+39>: push 0x0
0x565561c6 <+41>: call 0x56556060 <exit@plt>
End of assembler dump.
pwndbg>
The next step is to set EIP to address 0x5655619d so that we hijack program flow to run the unreachableFunction.
pwndbg> set $eip = 0x5655619d
pwndbg> x/1w $eip
0x5655619d <unreachableFunction>: 0x53e58955
pwndbg>
Now that we have hacked control of EIP, let's continue and watch how we have hijacked the operation of a running program to our advantage.
pwndbg> c
Continuing.
I'm hacked! Iam a hidden function!
[Inferior 1 (process 195369) exited normally]
pwndbg>
Beep Boop! We have hacked the program!
So the question in your mind is: why show me this when I have no idea what any of it is? In a lengthy tutorial such as this, it helps to look ahead sometimes and see why we are taking so many steps to learn the basics before we dive in. If you stay with the tutorial, your hard work will pay off: we will learn how to hijack a running program to make it do whatever we want, and how to proactively break down a malicious program so that we can not only disable it but also trace it back to a potential IP address where the hack originated.
In our next part, we will introduce memory addressing and the stack. Stay tuned.
Credits:
x86-64 Assembly Language Programming with Ubuntu by Ed Jorgensen, Ph.D. May 2022
0xinfection's reversing for everyone: https://0xinfection.github.io/reversing/