Architecture Overview [part 2]

Sep 20, 2023

10 min read

In computer science, computer architecture is a description of the structure of a computer system made from parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the description may include the instruction set architecture design, microarchitecture design, logic design, and implementation.

Computer architecture. (2023, August 3). In Wikipedia. https://en.wikipedia.org/wiki/Computer_architecture

The basic components of a computer include a Central Processing Unit (CPU), Primary Storage or Random Access Memory (RAM), Secondary Storage, Input/Output devices (e.g., screen, keyboard, mouse), and an interconnection referred to as the Bus.

Block diagram of a basic computer with uniprocessor CPU. Black lines indicate data flow, whereas red lines indicate control flow. Arrows indicate the direction of flow.

This architecture is typically referred to as the von Neumann architecture, or Princeton architecture, and was described in 1945 by the mathematician and physicist John von Neumann.

Programs and data are typically stored on secondary storage (e.g., disk drive or solid-state drive). When a program is executed, it must be copied from the secondary storage into the primary storage or main memory (RAM). The CPU now executes the program from the RAM.

Primary storage or main memory is also referred to as volatile memory since when power is removed, the information is not retained and thus lost. Secondary storage is referred to as non-volatile memory since the information is retained when powered off.

Data Storage Sizes

The x86-64 architecture supports a specific set of data storage size elements, all based on the powers of two:

Storage            Size (bits)    Size (bytes)
Byte               8 bits         1 byte
Word               16 bits        2 bytes
Double-word        32 bits        4 bytes
Quadword           64 bits        8 bytes
Double quadword    128 bits       16 bytes
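
To make these sizes concrete, here is a minimal C sketch that maps each storage size to a fixed-width type. The type names (byte_t, word_t, dword_t, qword_t, dqword_t) and the helper bits_of are made up for this illustration and are not part of any x86 toolchain. Standard C has no 128-bit integer type, so the double quadword is modeled here as two quadwords.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: fixed-width C types matching the storage sizes. */
typedef uint8_t  byte_t;   /* byte:         8 bits */
typedef uint16_t word_t;   /* word:        16 bits */
typedef uint32_t dword_t;  /* double-word: 32 bits */
typedef uint64_t qword_t;  /* quadword:    64 bits */

/* Assumption: a double quadword is modeled as a pair of quadwords. */
typedef struct {
    qword_t lo;  /* lower 64 bits */
    qword_t hi;  /* upper 64 bits */
} dqword_t;

/* Convert a size in bytes (as reported by sizeof) into a size in bits. */
size_t bits_of(size_t bytes) {
    return bytes * CHAR_BIT;
}
```

Calling bits_of(sizeof(word_t)), for example, gives the 16 bits listed for a word.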

Central Processing Unit (CPU)

The CPU is typically referred to as the "brains" of the computer since that is where the actual calculations are performed.

The CPU is housed in a single chip called a processor, chip, or die. It looks like this:

view with care, that thing is expensive as hell

The CPU consists of four parts:

  1. Control Unit - Retrieves and decodes instructions, and directs the storing and retrieving of data to and from memory.

  2. Execution Unit - Where the execution of instructions occurs.

  3. Registers - Internal CPU memory locations used as temporary data storage.

  4. Flags - Indicate events that occur during execution.

It should be noted that the internal design of a modern processor is quite complex. This series provides a very simplified, high-level view of some key functional units within a CPU.

CPU Registers

A CPU register, or just register, is a temporary storage or working location built into the CPU itself (separate from memory).

Computations are typically performed by the CPU using registers.

General Purpose Registers (GPRs)

There are sixteen 64-bit General Purpose Registers (GPRs). The GPRs are described in the following table.

64-bit register    Lowest 32 bits    Lowest 16 bits    Lowest 8 bits
rax                eax               ax                al
rbx                ebx               bx                bl
rcx                ecx               cx                cl
rdx                edx               dx                dl
rsi                esi               si                sil
rdi                edi               di                dil
rbp                ebp               bp                bpl
rsp                esp               sp                spl
r8                 r8d               r8w               r8b
r9                 r9d               r9w               r9b
r10                r10d              r10w              r10b
r11                r11d              r11w              r11b
r12                r12d              r12w              r12b
r13                r13d              r13w              r13b
r14                r14d              r14w              r14b
r15                r15d              r15w              r15b
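
The narrower names in the table are views of the same 64-bit register, not separate registers. The sketch below models that in C with masks. The function names are invented for this example, and the two write helpers reflect how x86-64 behaves: writing the 8-bit name leaves the upper bits alone, while writing the 32-bit name zero-extends into the full 64-bit register.

```c
#include <stdint.h>

/* Read views of a 64-bit register value, mirroring rax / eax / ax / al. */
uint32_t low32(uint64_t r) { return (uint32_t)(r & 0xFFFFFFFFu); }
uint16_t low16(uint64_t r) { return (uint16_t)(r & 0xFFFFu); }
uint8_t  low8(uint64_t r)  { return (uint8_t)(r & 0xFFu); }

/* Writing the 8-bit view replaces only the lowest byte (like `mov al, v`). */
uint64_t write_low8(uint64_t r, uint8_t v) {
    return (r & ~(uint64_t)0xFF) | v;
}

/* Writing the 32-bit view clears the upper 32 bits (like `mov eax, v`). */
uint64_t write_low32(uint64_t r, uint32_t v) {
    (void)r;  /* the old register value does not survive */
    return (uint64_t)v;
}
```

With r = 0x1122334455667788, low32(r) is 0x55667788, low16(r) is 0x7788, and low8(r) is 0x88.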

The general-purpose registers are used to temporarily store data as it is processed by the processor. The registers have evolved dramatically over time and continue to do so. We will focus on the 32-bit x86 architecture for our purposes. Each new generation of general-purpose registers is designed to be backward compatible with previous processors. This means that code utilizing the 8-bit and 16-bit registers of the 8086 will still function on today's 64-bit chips.

Let's review the 8 general-purpose registers in a 32-bit architecture:

EAX: The main register used in arithmetic calculations. Also known as the accumulator, as it holds the results of arithmetic operations and function return values.

EBX: The base register. Pointer to data in the DS segment. Used to store the base address of the program.

ECX: The counter register. Often used to hold a value representing the number of times a process is to be repeated. Used for loop and string operations.

EDX: A general-purpose register. Additionally used for I/O operations. It also pairs with EAX (written EDX:EAX) to hold 64-bit values.

ESI: Source Index register. Pointer to data in the segment pointed to by the DS register. Used as an offset address in string and array operations. It holds the address from where to read data.

EDI: Destination Index register. Pointer to data (or destination) in the segment pointed to by the ES register. Used as an offset address in string and array operations.

EBP: Base Pointer. Pointer to data on the stack (in the SS segment). It points to the current stack frame and is used to reference local variables.

ESP: Stack Pointer. Points to the top of the stack (in the SS segment).
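
As an illustration of how EDX extends EAX to 64 bits, the sketch below mimics what the x86 `mul` instruction does with two 32-bit operands: the upper half of the product lands in EDX and the lower half in EAX. The struct and function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical holder for the EDX:EAX pair after a widening multiply. */
typedef struct {
    uint32_t edx;  /* upper 32 bits of the product */
    uint32_t eax;  /* lower 32 bits of the product */
} edx_eax_t;

/* Multiply two 32-bit values and split the 64-bit product into halves. */
edx_eax_t mul32(uint32_t a, uint32_t b) {
    uint64_t product = (uint64_t)a * b;  /* widen first so nothing is lost */
    edx_eax_t r;
    r.edx = (uint32_t)(product >> 32);
    r.eax = (uint32_t)product;
    return r;
}
```

For example, mul32(0xFFFFFFFF, 2) produces 0x1FFFFFFFE, which does not fit in 32 bits: edx holds 0x1 and eax holds 0xFFFFFFFE.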

Keep in mind that each of the above registers is 32 bits, or 4 bytes, in length. The lower 2 bytes of the EAX, EBX, ECX, and EDX registers can be referenced as AX, BX, CX, and DX, and each of those is subdivided into a high byte (AH, BH, CH, and DH) and a low byte (AL, BL, CL, and DL), which are 1 byte each. In addition, ESI, EDI, EBP, and ESP can be referenced by their 16-bit equivalents, which are SI, DI, BP, and SP. This can be a bit confusing to someone who has not studied computer engineering, so take EAX as an example: AX is the lower 16 bits of EAX, AH is the upper byte of AX, and AL is the lower byte of AX.
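
The same nesting can be written as shifts and masks. This is only a sketch with made-up function names, using EAX as the example register; the other three registers split the same way.

```c
#include <stdint.h>

/* AX: the lower 16 bits of EAX. */
uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH: the high byte of AX (bits 8 to 15 of EAX). */
uint8_t ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL: the low byte of AX (bits 0 to 7 of EAX). */
uint8_t al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

If EAX holds 0x12345678, then AX is 0x5678, AH is 0x56, and AL is 0x78. The upper 2 bytes (0x1234) have no name of their own.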

Flag Register (rFlags)

The flag register, rFlags, is used for status and CPU control information. The CPU updates rFlags as instructions execute, and programs do not access it directly the way they access a general-purpose register.
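
To give a feel for what "status information" means, here is a rough C model of three status flags that an x86 `add` updates: the carry flag (CF), the zero flag (ZF), and the sign flag (SF). These three flags are not named in the text above, and the real register holds more than this. The struct and function names are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of three status flags; the real rFlags has more. */
typedef struct {
    bool cf;  /* carry: the unsigned result did not fit in 32 bits */
    bool zf;  /* zero: the result is 0 */
    bool sf;  /* sign: the top bit of the result is set */
} flags_t;

/* Add two 32-bit values, store the sum, and report the modeled flags. */
flags_t add_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t sum = a + b;  /* unsigned arithmetic wraps modulo 2^32 */
    flags_t f;
    f.cf = sum < a;        /* wrap-around means a carry out of bit 31 */
    f.zf = (sum == 0);
    f.sf = (sum >> 31) != 0;
    *result = sum;
    return f;
}
```

Adding 0xFFFFFFFF and 1 wraps around to 0, so both the carry and zero flags are set in this model. A conditional jump placed after the instruction is what would normally read such flags.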

Segment Registers

The segment registers are used specifically for referencing memory locations. There are six segment registers, which are as follows:

CS: Code segment register. Stores the base location of the code section (.text section), which is used for instruction access.

DS: Data segment register. Stores the default location for variables (.data section), which is used for data access.

ES: Extra segment register, which is used during string operations.

SS: Stack segment register. Stores the base location of the stack segment and is used when implicitly using the stack pointer or when explicitly using the base pointer.

FS and GS: Additional segment registers with no dedicated hardware role. Operating systems commonly use them to reach thread-specific data.

XMM Registers

There is a set of dedicated registers, the XMM registers, used to support 32-bit and 64-bit floating-point operations.

Instruction Pointer Register

The instruction pointer register, called EIP on 32-bit x86, is simply the most important register you will deal with in any reverse engineering. EIP keeps track of the next instruction to execute. If you were to alter that pointer to jump to another area in the code, you would have complete control over the program.
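
To see why the instruction pointer matters so much, consider a toy machine written in C. Everything in it is invented for this sketch (the three opcodes, the run function, the name ip), and it is far simpler than a real CPU, but the loop has the same shape: fetch the instruction at ip, advance ip, execute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy machine; these are not x86 opcodes. */
enum {
    OP_HALT = 0,  /* stop running */
    OP_INC  = 1,  /* add 1 to the accumulator */
    OP_JMP  = 2   /* the next byte is the index to jump to */
};

/* Run `code` starting at index `ip`; return the accumulator at halt. */
int run(const uint8_t *code, size_t len, size_t ip) {
    int acc = 0;
    while (ip < len) {
        uint8_t op = code[ip++];  /* fetch, then point at the next instruction */
        if (op == OP_HALT) {
            break;
        } else if (op == OP_INC) {
            acc++;
        } else if (op == OP_JMP && ip < len) {
            ip = code[ip];        /* overwrite the instruction pointer */
        } else {
            break;                /* unknown opcode or truncated jump */
        }
    }
    return acc;
}
```

Take the program { OP_INC, OP_HALT, OP_INC, OP_INC, OP_INC, OP_HALT }. Started at index 0, it halts with the accumulator at 1, and the three increments after the first halt never run. Started at index 2, the same bytes yield 3. Whoever chooses the starting value of ip decides which code executes.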

Let's jump ahead and dive into some code. Here is an example of a simple hello world application in C that we will go into in more detail much later in our series.

For our purposes today, we will see the raw POWER of assembly language and particularly that of the EIP register and what we can do to completely hack program control.

// Thank you @0xinfection

#include <stdio.h>
#include <stdlib.h>

void unreachableFunction(void) {
    printf("I'm hacked! Iam a hidden function!\n");
    exit(0);
}

int main(void) {
    printf("Hello World!\n");

    return 0;
}

Don't worry if you do not understand what it does or how it works. What to take note of here is that we have a function called unreachableFunction that is never called by the main function. As you will see, if we can control the EIP register, we can hack this program into executing that code!

We have simply compiled the code for the IA-32 instruction set and run it. There is no call to unreachableFunction of any kind, so it is unreachable under normal conditions, and only 'Hello World!' is printed when the program is executed.

pwndbg> set disassembly-flavor intel
pwndbg> b main
Breakpoint 1 at 0x11e4: file hello.c, line 12.
pwndbg> r
Starting program: /home/xi/asm64/hello 
Downloading separate debug info for /lib/ld-linux.so.2
Downloading separate debug info for system-supplied DSO at 0xf7fc6000                                                 
Downloading separate debug info for /usr/lib32/libc.so.6                                                              
[Thread debugging using libthread_db enabled]                                                                         
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Breakpoint 1, main () at hello.c:12
12        printf("Hello World!\n");
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
────────────────────────────────[ REGISTERS / show-flags off / show-compact-regs off ]────────────────────────────────
*EAX  0x56558ff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x3ef0
*EBX  0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
*ECX  0xffffc830 ◂— 0x1
*EDX  0xffffc850 —▸ 0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
*EDI  0xf7ffcb80 (_rtld_global_ro) ◂— 0x0
*ESI  0xffffc8ec —▸ 0xffffcb58 ◂— 'SHELL=/usr/bin/bash'
*EBP  0xffffc818 ◂— 0x0
*ESP  0xffffc810 —▸ 0xffffc830 ◂— 0x1
*EIP  0x565561e4 (main+25) ◂— sub esp, 0xc
──────────────────────────────────────────[ DISASM / i386 / set emulate on ]──────────────────────────────────────────
 ► 0x565561e4 <main+25>    sub    esp, 0xc
   0x565561e7 <main+28>    lea    edx, [eax - 0x1fc9]
   0x565561ed <main+34>    push   edx
   0x565561ee <main+35>    mov    ebx, eax
   0x565561f0 <main+37>    call   puts@plt                    <puts@plt>

   0x565561f5 <main+42>    add    esp, 0x10
   0x565561f8 <main+45>    mov    eax, 0
   0x565561fd <main+50>    lea    esp, [ebp - 8]
   0x56556200 <main+53>    pop    ecx
   0x56556201 <main+54>    pop    ebx
   0x56556202 <main+55>    pop    ebp
──────────────────────────────────────────────────[ SOURCE (CODE) ]───────────────────────────────────────────────────
In file: /home/xi/asm64/hello.c
    7     printf("I'm hacked! Iam a hidden function!\n");
    8     exit(0);
    9 }
   10 
   11 int main(void) {
 ► 12     printf("Hello World!\n");
   13 
   14     return 0;
   15 }
──────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────
00:0000│ esp 0xffffc810 —▸ 0xffffc830 ◂— 0x1
01:0004│     0xffffc814 —▸ 0xf7e43e34 (_GLOBAL_OFFSET_TABLE_) ◂— 0x243d4c /* 'L=$' */
02:0008│ ebp 0xffffc818 ◂— 0x0
03:000c│     0xffffc81c —▸ 0xf7c23b09 (__libc_start_call_main+121) ◂— add esp, 0x10
04:0010│     0xffffc820 ◂— 0x0
05:0014│     0xffffc824 ◂— 0x0
06:0018│     0xffffc828 —▸ 0x56555300 ◂— '__libc_start_main'
07:001c│     0xffffc82c —▸ 0xf7c23b09 (__libc_start_call_main+121) ◂— add esp, 0x10
────────────────────────────────────────────────────[ BACKTRACE ]─────────────────────────────────────────────────────
 ► f 0 0x565561e4 main+25
   f 1 0xf7c23b09 __libc_start_call_main+121
   f 2 0xf7c23bcd __libc_start_main+141
   f 3 0x5655609b _start+43
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg>

OK... relax, everything is all right.

We have loaded the program into the GDB debugger, set a breakpoint on the main function, and run the program. The marker at the start of the first line in the DISASM pane shows where EIP is pointing, which is the instruction that will execute when we step. If we follow normal program flow, 'Hello World!' will print to the console and the program will exit.

Breakpoint 1, main () at hello.c:12
12        printf("Hello World!\n");

If we run the program again and do an examination of where EIP is pointing to we will see:

pwndbg> x/1xw $eip
0x565561e4 <main+25>:    0x8d0cec83
pwndbg>

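We can see that EIP is pointing to main+25, at address 0x565561e4. The value 0x8d0cec83 is not an address: it is the 4-byte word stored at that address, which holds the first bytes of the machine code that is about to execute.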
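
The word printed by x/1xw is easier to relate to the disassembly once it is split into bytes in memory order. x86 is little-endian, so the least significant byte of the word sits at the lowest address. The helper below is a small sketch with an invented name, not something GDB provides.

```c
#include <stdint.h>

/* Split a 32-bit word into the 4 bytes as they sit in little-endian
   memory: out[0] is the byte at the lowest address. */
void word_to_memory_bytes(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(word >> (8 * i));
    }
}
```

For 0x8d0cec83 the bytes in memory order are 83 ec 0c 8d. The first three, 83 ec 0c, encode the sub esp, 0xc instruction that EIP points to, and 8d is the opcode byte that begins the lea instruction after it.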

Let's examine the unreachableFunction and see where it starts in memory and write down that address.

pwndbg> disas unreachableFunction
Dump of assembler code for function unreachableFunction:
   0x5655619d <+0>:    push   ebp
   0x5655619e <+1>:    mov    ebp,esp
   0x565561a0 <+3>:    push   ebx
   0x565561a1 <+4>:    sub    esp,0x4
   0x565561a4 <+7>:    call   0x565560a0 <__x86.get_pc_thunk.bx>
   0x565561a9 <+12>:    add    ebx,0x2e4b
   0x565561af <+18>:    sub    esp,0xc
   0x565561b2 <+21>:    lea    eax,[ebx-0x1fec]
   0x565561b8 <+27>:    push   eax
   0x565561b9 <+28>:    call   0x56556050 <puts@plt>
   0x565561be <+33>:    add    esp,0x10
   0x565561c1 <+36>:    sub    esp,0xc
   0x565561c4 <+39>:    push   0x0
   0x565561c6 <+41>:    call   0x56556060 <exit@plt>
End of assembler dump.
pwndbg>

The next step is to set EIP to address 0x5655619d so that we hijack program flow to run the unreachableFunction.

pwndbg> set $eip = 0x5655619d
pwndbg> x/1w $eip
0x5655619d <unreachableFunction>:    0x53e58955
pwndbg>

Now that we have hacked control of EIP, let's continue and watch how we have hijacked the operation of a running program to our advantage.

pwndbg> c
Continuing.
I'm hacked! Iam a hidden function!
[Inferior 1 (process 195369) exited normally]
pwndbg>

Beep Boop! We have hacked the program!

So the question in your mind is probably: why did you show me this when I have no idea what any of it means? In a lengthy tutorial such as this one, it helps to look ahead now and then to see why we are taking so many steps to learn the basics before we dive in. If you stay with the tutorial, your hard work will pay off: we will learn how to hijack a running program to make it do whatever we want, and how to proactively break down a malicious program so that we can not only disable it but also trace it back to a potential IP address where the hack originated.

In our next part we will introduce Memory Addressing and The Stack. Stay tuned!
