Walkthrough

How did the airlock door open?

The easiest way to start figuring this out is to first look at airlock_ctrl.log.

By examining the log, you can start to understand the sequence of prints for the different sorts of events. Specifically, <- indicates a received command and -> indicates a sent command.
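
As a rough illustration of that convention, the sketch below classifies a log line by the arrow that follows the DEBUG: tag. The enum and the log_direction helper are hypothetical names made up for this example; they are not part of airlock_ctrl.c.

```c
#include <assert.h>
#include <string.h>

/* Direction of an SDN log entry, inferred from the arrow marker. */
typedef enum {
    SDN_DIR_NONE,     /* no arrow: an INFO line or other output */
    SDN_DIR_RECEIVED, /* "<-": command received by airlock_ctrl */
    SDN_DIR_SENT      /* "->": command sent by airlock_ctrl */
} SdnLogDirection;

/* Hypothetical helper: looks for the arrow right after "DEBUG: ". */
static SdnLogDirection log_direction(const char *line)
{
    assert(line != NULL);
    const char *tag = strstr(line, "DEBUG: ");
    if (tag == NULL)
        return SDN_DIR_NONE;
    tag += strlen("DEBUG: ");
    if (strncmp(tag, "<- ", 3) == 0)
        return SDN_DIR_RECEIVED;
    if (strncmp(tag, "-> ", 3) == 0)
        return SDN_DIR_SENT;
    return SDN_DIR_NONE;
}
```

Run over the log, a helper like this makes it easy to pair each received command with the commands sent in response.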

The log ends with:

2216-05-25_08:04:04 DEBUG: <- SDN_MSG_TYPE_SET_SUIT_OCCUPANT           Len:  20, Dev: 0xae215e17, User: 0x488504f4, Preferences Len: 0
2216-05-25_08:04:04 INFO: Sending ControlDoor command to 0xae215e13, open=1
2216-05-25_08:04:04 DEBUG: -> SDN_MSG_TYPE_SET_OPEN                    Len:  17, Dev: 0xae215d67, Open: 1
Illegal instruction (core dumped)

We can compare that to the previous SDN_MSG_TYPE_SET_SUIT_OCCUPANT message:

2216-05-11_15:00:10 DEBUG: <- SDN_MSG_TYPE_SET_SUIT_OCCUPANT           Len: 288, Dev: 0xae215e17, User: 0xd481aa99, Preferences Len: 268
2216-05-11_15:00:10 INFO: Handling SET_SUIT_OCCUPANT message.
2216-05-11_15:00:10 DEBUG: -> SDN_MSG_TYPE_SET_SUIT_OCCUPANT           Len: 288, Dev: 0xae215d67, User: 0xd481aa99, Preferences Len: 268
2216-05-11_15:00:10 DEBUG: -> SDN_MSG_TYPE_CMD_RESPONSE                Len:  17, Dev: 0xae215d67, Response Code: 0

Searching for these log lines in airlock_ctrl.c, we can see that only "Handling SET_SUIT_OCCUPANT message." and "Sending ControlDoor command" are present. The other messages must come from the unavailable libraries.

Now if we look at the backtrace of where the process crashed in GDB:

(gdb) backtrace
#0  0x00007fffffffe9ac in ?? ()
#1  0x00007fffffffe790 in ?? ()
#2  0x0000555555555936 in ProcessMessageData (handlers=0x55555555c7c0, num_handlers=6, msg_buffer=0x55555555c6b0, buffer_size_bytes=2048, state=0x7fffffffe7f0) at src/airlock_ctrl.c:328
#3  0x000055555555702a in main () at src/airlock_ctrl.c:847

We see the process crashed inside the call made at src/airlock_ctrl.c:328:

/**
 * @brief Reads and dispatches incoming SDN messages to registered handlers.
 *
 * This function enters a loop to read messages from the SDN network. For each
 * valid message received, it iterates through the list of registered handlers
 * and invokes the callback for the matching message type.
 *
 * @param handlers Array of message handlers.
 * @param num_handlers Number of handlers in the array.
 * @param msg_buffer Buffer to store incoming messages.
 * @param buffer_size_bytes Size of the message buffer.
 * @param state A pointer to the application state, passed to callbacks.
 * @return Returns a negative value on a read error. The loop continues otherwise.
 */
static int ProcessMessageData(SDNHandler *handlers, size_t num_handlers,
                              void *msg_buffer, size_t buffer_size_bytes, AirlockState *state)
{
    assert(handlers != NULL);
    assert(state != NULL);
    assert(buffer_size_bytes >= sizeof(SDNMsgHeader));
    while (true)
    {
        int ret = ReadNextMessage(msg_buffer, buffer_size_bytes);
        if (ret < 0)
        {
            return ret;
        }
        else if (ret == 0)
        {
            return 0;
        }
        else
        {
            SDNMsgHeader *msg_header = (SDNMsgHeader *)msg_buffer;
            for (SDNHandler *handler = handlers; handler < handlers + num_handlers;
                 handler++)
            {
                if (msg_header->msg_type == handler->type)
                {
                    // CRASH HAPPENED IN THIS FUNCTION CALL.
                    handler->callback(msg_buffer, ret, state);
                }
            }
        }
    }
}

Looking at the registered message handlers, we can see that this should have called HandleSetSuitOccupant, which would have logged "Handling SET_SUIT_OCCUPANT message." That line is missing from the end of the log, so it seems the callback was corrupted somehow and the process jumped to memory that is not part of the program's code.

We can see where the call was with:

(gdb) frame 2
#2  0x0000555555555936 in ProcessMessageData (handlers=0x55555555c7c0, num_handlers=6, msg_buffer=0x55555555c6b0, buffer_size_bytes=2048, state=0x7fffffffe7f0) at src/airlock_ctrl.c:328
328                         handler->callback(msg_buffer, ret, state);
(gdb) print handler->callback
$1 = (sdn_msg_callback_t) 0x7fffffffe96c

Since we know the crash was at 0x7fffffffe9ac (frame 0's address), we can see the instructions that were executed with:

(gdb) disassemble 0x7fffffffe96c,0x7fffffffe9ac+8
Dump of assembler code from 0x7fffffffe96c to 0x7fffffffe9b4:
   0x00007fffffffe96c:  push   %rbp
   0x00007fffffffe96d:  movabs $0x555555556816,%r15
   0x00007fffffffe977:  mov    %rdi,%r12
   0x00007fffffffe97a:  add    $0x10,%r12
   0x00007fffffffe97e:  cmpl   $0x488504f4,(%r12)
   0x00007fffffffe986:  jne    0x7fffffffe9c2
   0x00007fffffffe988:  mov    %rdi,%r12
   0x00007fffffffe98b:  mov    %rsi,%r13
   0x00007fffffffe98e:  mov    %rdx,%r14
   0x00007fffffffe991:  movabs $0x55555555539c,%rax
   0x00007fffffffe99b:  mov    $0xae215d67,%edi
   0x00007fffffffe9a0:  mov    $0xae215e13,%esi
   0x00007fffffffe9a5:  mov    $0x1,%edx
   0x00007fffffffe9aa:  call   *%rax
=> 0x00007fffffffe9ac:  (bad)
   0x00007fffffffe9ae:  jae    0x7fffffffea23
   0x00007fffffffe9b0:  jbe    0x7fffffffea21
   0x00007fffffffe9b2:  insb   (%dx),%es:(%rdi)
   0x00007fffffffe9b3:  insl   (%dx),%es:(%rdi)

Here we need to understand a little x86-64 assembly. call *%rax is an indirect function call to the address held in %rax. The earlier movabs $0x55555555539c,%rax shows which address that was. We can run:

(gdb) info symbol 0x55555555539c
ControlDoor in section .text of /tmp/test/airlock_ctrl

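to see that this code does indeed call ControlDoor. The values loaded into %edi, %esi and %edx just before the call (0xae215d67, 0xae215e13 and 1) match the device IDs and open=1 in the final log lines.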

So what is 0x7fffffffe96c? We can look at the stack pointer register with print $rsp, look at the addresses of the variables in frame 2 with info locals and print &ret, or run info frame to see that 0x7fffffffeXXX is part of the stack memory. pwndbg makes this much easier than plain GDB, since it color-codes stack memory and has the vmmap command, which lists the stack's address range.

How did the process end up executing code on the stack?

For the first question, we found that the process was executing on the stack because the handler->callback value was a pointer to stack memory. To understand how this happened, it's easiest to start from the source code.

In airlock_ctrl.c, we can see how the handlers are allocated:

sdn_log(SDN_INFO, "Allocating RX message buffer of size %u.", state.config.rx_message_buffer_size);
rx_message_buffer = malloc(state.config.rx_message_buffer_size);
if (rx_message_buffer == NULL)
{
    exit_with_error(EXIT_CODE_MEMORY_ALLOC_FAILED);
}

sdn_log(SDN_INFO, "Registering message handlers.");
size_t num_message_handlers = MIN_NUM_MESSAGE_HANDLERS;
if (state.config.remote_fault_clear)
{
    num_message_handlers++;
}
#if APP_DEBUG_BUILD
num_message_handlers++;
#endif
message_handlers = malloc(sizeof(SDNHandler) * num_message_handlers);

We can see that the program uses malloc to allocate the memory for the handlers on the heap. The only other place malloc is used in this file is immediately before, to allocate rx_message_buffer.

In GDB we can confirm that rx_message_buffer is in the memory right before message_handlers.

(gdb) frame 3
#3  0x000055555555702a in main () at src/airlock_ctrl.c:847
847             if (ProcessMessageData(message_handlers, num_message_handlers, rx_message_buffer, state.config.rx_message_buffer_size, &state) < 0)
(gdb) print rx_message_buffer
$1 = (void *) 0x55555555c6b0
(gdb) print message_handlers
$2 = (SDNHandler *) 0x55555555c7c0
(gdb) print (void*)message_handlers - (void*)rx_message_buffer
$3 = 272

From this, it's a safe assumption that message_handlers was corrupted by a message larger than 272 bytes overflowing rx_message_buffer.

We can confirm this by looking at airlock_ctrl.log, where there was a SetSuitOccupant message of 288 bytes, enough to run 16 bytes into message_handlers.
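
The effect of such an overflow can be modelled without touching a real heap. The sketch below is only an illustration under stated assumptions: Handler is a hypothetical stand-in for SDNHandler (its real layout is not shown in the walkthrough), and HeapModel places the buffer and the handler table in one struct so the oversized copy stays within defined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*callback_t)(void);

/* Hypothetical stand-in for SDNHandler; the real layout is not shown. */
typedef struct {
    uint32_t type;
    callback_t callback;
} Handler;

/* Models the two heap allocations as one object: 272 bytes separate the
 * start of the buffer from the handler table, as measured in GDB. */
typedef struct {
    unsigned char rx_message_buffer[272];
    Handler handlers[1];
} HeapModel;

static int legitimate_handler(void) { return 0; }
static int attacker_code(void) { return 1; }

/* Simulates receiving one oversized message, then dispatching.
 * Returns the result of whichever callback ends up being called. */
static int simulate_overflow(void)
{
    HeapModel heap;
    memset(&heap, 0, sizeof heap);
    heap.handlers[0].type = 7; /* arbitrary illustrative message type */
    heap.handlers[0].callback = legitimate_handler;

    /* Attacker message: filler up to the table, then a forged entry. */
    unsigned char message[sizeof(HeapModel)];
    Handler forged = { 7, attacker_code };
    size_t msg_len = offsetof(HeapModel, handlers) + sizeof forged;
    assert(msg_len <= sizeof message);
    memset(message, 'A', sizeof message);
    memcpy(message + offsetof(HeapModel, handlers), &forged, sizeof forged);

    /* The size check uses a limit larger than the real buffer, so the
     * copy is allowed and runs on into the handler table. */
    size_t stale_limit = 2048;
    if (msg_len <= stale_limit)
        memcpy(&heap, message, msg_len);

    return heap.handlers[0].callback();
}
```

On a typical 64-bit build the forged message in this model is 288 bytes long, and the dispatch ends up in attacker_code instead of legitimate_handler.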

Where did the code that opened the airlock come from?

To figure this out, we need to find a stack variable whose value is set by external data.

If we look at HandleSetSuitOccupant in airlock_ctrl.c we can see:

SDNSetSuitOccupantMessage *send_ptr = (SDNSetSuitOccupantMessage *)state->message_serialization_buffer;
memcpy(send_ptr, message_data, msg_len);

We can then use GDB to check if that's where the attack code was stored.

(gdb) frame 2
#2  0x0000555555555936 in ProcessMessageData (handlers=0x55555555c7c0, num_handlers=6, msg_buffer=0x55555555c6b0, buffer_size_bytes=2048, state=0x7fffffffe7f0)
    at src/airlock_ctrl.c:328
328                         handler->callback(msg_buffer, ret, state);
(gdb) print (void*)state.message_serialization_buffer
$1 = (void *) 0x7fffffffe8c8
(gdb) print sizeof(state.message_serialization_buffer)
$2 = 1044
(gdb) print handler->callback
$3 = (sdn_msg_callback_t) 0x7fffffffe96c
(gdb) print (void*)handler->callback - (void*)state.message_serialization_buffer
$4 = 164

We can see that the value in the callback is indeed in the message_serialization_buffer.

Since the callback address is 164 bytes into message_serialization_buffer, only SDNSetSuitOccupantMessage messages longer than 164 bytes would overwrite the start of the attack code.
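
The same arithmetic can be written out as a small sketch. Both helper names below are hypothetical; the addresses and sizes passed to them in the usage note are the ones GDB printed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of addr inside [base, base + size), or -1 if the
 * address lies outside the buffer. */
static int64_t offset_in_buffer(uint64_t base, uint64_t size, uint64_t addr)
{
    assert(size <= (uint64_t)INT64_MAX);
    if (addr < base || addr - base >= size)
        return -1;
    return (int64_t)(addr - base);
}

/* A message copied to the start of the buffer writes bytes
 * 0 .. msg_len - 1, so it reaches the code only if it is longer
 * than the code's offset. */
static bool overwrites_code(size_t code_offset, size_t msg_len)
{
    return msg_len > code_offset;
}
```

For example, offset_in_buffer(0x7fffffffe8c8, 1044, 0x7fffffffe96c) gives 164, the 288-byte message reaches that offset, and the final 20-byte message does not.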

Why didn't the bounds check on rx_message_buffer work?

Looking at airlock_ctrl.c, we can see that the state.config.rx_message_buffer_size is used to check the size of incoming messages. In GDB we can see:

(gdb) frame 3
#3  0x000055555555702a in main () at src/airlock_ctrl.c:847
847             if (ProcessMessageData(message_handlers, num_message_handlers, rx_message_buffer, state.config.rx_message_buffer_size, &state) < 0)
(gdb) print state.config.rx_message_buffer_size
$1 = 2048

so rx_message_buffer_size is 2048 bytes. In the second question we already saw that message_handlers - rx_message_buffer == 272, so why are these inconsistent?

Looking at access_log.sql, we can see that the rx_message_buffer_size configuration value was 256. A SDNDebugWriteConfigInt message later updated this value to 2048. In airlock_ctrl.c, that message is handled by:

static void HandleDebugWriteConfigInt(const void *message_data, size_t msg_len, AirlockState *state)
{
    sdn_log(SDN_INFO, "Handling DEBUG_WRITE_CONFIG_INT message.");
    if (msg_len >= sizeof(SDNDebugWriteConfigInt))
    {
        SDNResponseStatus cmd_response = SDN_RESPONSE_CMD_ERROR_3; // Default to error
        SDNDebugWriteConfigInt *cf = (SDNDebugWriteConfigInt *)message_data;
        cf->key[sizeof(cf->key) - 1] = 0;
        state->fault_bits |= FAULT_DEBUGGER;
        sdn_log(SDN_DEBUG, "Attempting to write config key '%s' with value %d", cf->key, cf->value);
        if (WriteConfigU32(cf->key, (uint32_t)cf->value) || WriteConfigBool(cf->key, (bool)cf->value))
        {
            cmd_response = SDN_RESPONSE_GOOD;
        }

        if (state->config.apply_config_change && cmd_response == SDN_RESPONSE_GOOD && !LoadConfig(&state->config))
        {
            exit_with_error(EXIT_CODE_CONFIG_LOAD_FAILED);
        }
        SendCmdResponse(cmd_response);
    }
    else
    {
        sdn_log(SDN_WARN, "Received DEBUG_WRITE_CONFIG_INT with invalid length %d", (int)msg_len);
        SendCmdResponse(SDN_RESPONSE_INVALID_MSG_LEN);
    }
}

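Since apply_config_change is true, LoadConfig runs and rx_message_buffer_size is updated right away. rx_message_buffer, however, is only allocated at startup, when the configured size was still 256 (the 272-byte spacing seen earlier is consistent with a 256-byte allocation plus allocator overhead). This explains the mismatch: the size check accepts messages of up to 2048 bytes for a 256-byte allocation.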
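
The mismatch can be summarized with a minimal sketch. RxState and both functions are hypothetical simplifications, not code from airlock_ctrl.c; they only contrast a limit taken from the live configuration with a limit taken from the size that was actually allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the receiver's state. */
typedef struct {
    size_t allocated_size;  /* size passed to malloc at startup */
    size_t configured_size; /* current rx_message_buffer_size */
} RxState;

/* Limit taken from the live configuration value, which can change
 * after the buffer has been allocated. */
static bool fits_by_config(const RxState *rx, size_t msg_len)
{
    assert(rx != NULL);
    return msg_len <= rx->configured_size;
}

/* Limit taken from the size that was actually allocated. */
static bool fits_by_allocation(const RxState *rx, size_t msg_len)
{
    assert(rx != NULL);
    return msg_len <= rx->allocated_size;
}
```

With allocated_size = 256 and configured_size = 2048, a 288-byte message passes the first check and fails the second, which is exactly the gap the overflow went through.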

Who engineered the shellcode injection into the airlock?

For reference: https://en.wikipedia.org/wiki/Shellcode

This attack required the following steps:

  1. The airlock_ctrl needed to be deployed in debug mode to allow stack execution and ad-hoc config changes
  2. state.config.rx_message_buffer_size needed to be made small enough to overflow with a SDNSetSuitOccupantMessage message
  3. A SDNDebugWriteConfigInt message increases rx_message_buffer_size to allow the buffer overflow
  4. A SDNSetSuitOccupantMessage message loads the shellcode into message_serialization_buffer
  5. A SDNSetSuitOccupantMessage message overflows rx_message_buffer pointing the message handler to the shellcode
  6. No other SDNSetSuitOccupantMessage messages are large enough to overwrite the shellcode until Alex Mercer tried to use it

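Logic in the shellcode checks whether Alex Mercer's user ID is in the message and only triggers then: the cmpl instruction compares the 32-bit value at offset 0x10 of the message with 0x488504f4, the User value in the final log entry.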
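
Read as C, the disassembled portion of the shellcode behaves roughly like the sketch below. This is an interpretation, not recovered source: control_door_stub is a hypothetical stand-in (the real ControlDoor signature is not shown), the length check is added only to keep the sketch safe, the code on the jne path is not part of the dump, and a little-endian host is assumed, as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TARGET_USER_ID 0x488504f4u /* immediate in the cmpl instruction */
#define USER_ID_OFFSET 0x10        /* add $0x10,%r12 */

/* Records the last call so the behavior can be observed in a test. */
static struct { bool called; uint32_t a, b, open; } last_call;

/* Hypothetical stub: the real ControlDoor signature is not shown. */
static void control_door_stub(uint32_t a, uint32_t b, uint32_t open)
{
    last_call.called = true;
    last_call.a = a;
    last_call.b = b;
    last_call.open = open;
}

/* Returns true if the door-open path was taken. */
static bool shellcode_equivalent(const void *msg, size_t msg_len)
{
    uint32_t user_id;
    assert(msg != NULL);
    if (msg_len < USER_ID_OFFSET + sizeof user_id)
        return false; /* safety check added here; not in the shellcode */
    memcpy(&user_id, (const unsigned char *)msg + USER_ID_OFFSET,
           sizeof user_id);
    if (user_id != TARGET_USER_ID)
        return false; /* jne path: target not shown in the disassembly */
    /* Same values as the mov instructions before call *%rax. */
    control_door_stub(0xae215d67u, 0xae215e13u, 1);
    return true;
}
```

A message carrying any other user ID, such as 0xd481aa99 from the earlier log entry, takes the other branch and never reaches the door call.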

No one person directly took all these actions.

Figuring out the perpetrator requires separating which of these actions were a result of people just doing their jobs, and which were intentional. Of these steps, the one that is the least justifiable is Helena Efrem decreasing the buffer size. In addition, there's the email where Rolf Ivo mentions that Helena helped him modify his user_preferences.

These together make Helena the prime suspect.