Name	Name	Last commit message	Last commit date
parent directory ..
examples	examples
README.md	README.md
assembler.py	assembler.py
grammar.lark	grammar.lark
microcode-enginetest.c	microcode-enginetest.c
microengine.cpp	microengine.cpp
microengine.hpp	microengine.hpp
myqueue.hpp	myqueue.hpp
names.c	names.c
phy.cpp	phy.cpp
phy.hpp	phy.hpp
types.hpp	types.hpp

ooows-broadcooom DEF CON 2021 Finals Challenge

This challenge is an implementation of the IXP1200 network processor.

Resources

I heavily used Douglas E. Comer excellent book Network Systems Design Using Network Processors: Intel IXP Version, of which I am now the proud owner.

I also used source code examples on the book's companion website.

Unfortuantely I was unable to acquire a real Intel assembler for the IXP1200, so I could not make the microengine bit-compatible with the IXP1200. The assembly language is quite similar though, and most of the functionality is the same (many aspects were simplified).

Components

ooows-net.cpp and ooows-net.hpp act as the interface between the driver and the device, using virtio as the communication mechanism.
microengine.cpp and microengine.hpp act as the "Core" processor of the IXP1200 and also execute the microengine instructions.
engine.uc is the microengine code that runs.
assembler.py assembles the microcode to binary which the microengine runs.

Bugs

There are two intended bugs in the microengine code, both revolving around the indirect memory references functionality of the IXP1200 (where the size of the memory transfer is taken from the result of the last ALU operation).

TX IP Checksum Offloading IHL overflow

When the Transmit IP Checksum Offloading functionality of the device is enabled, then the microengine code will first check that the IP version in the IP header is 4, then it will incorrectly read in as many 4 byte values as specified in the IHL field.

The vulnerable instruction ram[read, read_11, reg_12, reg_8, 0] ctx_swap reads as many register-size values (4 bytes) from memory and writes them into registers starting at thread-local register read_11.

The number of registers that are written to depends on the IHL field. The tx_thread running on the microengine is thread context 3, so read_11, according to how the microengine maps out registers in Microengine::absolute_register it is absolute register 251 in Microengine::m_registers.

Because the IHL field is a nibble (half a byte), the highest value that can be used to read is 0xf or 16, for a total of 64 bytes.

This allows us to overwrite memory (due to the fact that the memory transfer functions to not check bounds when reading/writing from memory starting at Microengine::m_registers[251].

Overwriting Microengine::m_registers means that we will start overwriting Microengine::m_code with data that we control.

Due to the size, this means that we will be able to control the first ~8 (if my memory/math is correct) instructions in m_code.

Looking at the start of the microengine code, we can see that the first 4 microengine instructions are only executed as part of the initialization of the microengine (essentially getting each thread to execute it's main loop).

However, we can control the first 4 instructions starting at the start of the rx_thread_loop.

For now, let's assume that we can control these 4 instructions, how can we get the rx_thread to execute our instructions? The rx_thread is waiting toward the end of its loop for a new packet to come in from the PHY interface. To trigger rx_thread to execute the new code, we need to upload another ooows kernel and have it send a packet (I believe any packet will work and cause rx_thread to loop back to code we control.

Now we have the microengine executing four instructions that we control! Is four instructions enough?

Exploitation

Once we achieve arbitrary microcode execution, we need to be able to achieve arbitrary code execution to get the flag. This is more complicated than one might expect: the microengine does not have access to any syscalls. Functionally, the microcode has only 7 types of instructions: ALU operations, load immediate, branch, load from registers, rx from PHY device, tx to core processor, access CSR registers, and memory reads/writes.

However, as we know from this bug, memory reads/writes do not have a bounds check!

We can read/write any memory location that is a positive offset of scratch memory which is at Microengine::m_scratch, and any memory location that is a positive offset of where Microengine::m_ram points to, which is 8MB acquired through malloc.

I believe some teams were able to achieve code execution only with these primitives, however my heap-foo was not so l33t. So I combined these primitives to achieve an arbitrary read/write.

The idea is that because Microengine::m_ram is a positive offset from Microengine::m_scratch, we can overwrite the Microengine::m_ram pointer to point to any part of memory, then memory reads/writes to ram will use the new pointer value that we control.

The one quirk here that make our job more difficult is that the microengine's ram processing thread first stores the pointer to mem on the stack, then asks for a memory job from the queue. This means that we will need to read/write from ram twice, and only the second time will use the correct value of ram that we controlled/overwrote.

I put this all together in example shellcode that overall overwrites saved RIP of Microengine::interpreter_loop and finally triggers the break condition in interpreter_loop.

Now that we have microengine shellcode, this is significantly longer than the 4 instructions that we control for this bug. So what can we do? We use the classic technique of using a first stage shellcode, and overwrites the rest of the microengine code of rx_thread with our target shellcode.

Where does the rest of our shellcode come from? We can either have it be in the packet that we sent to trigger this bug, or we can have it be in the packet that we received to trigger rx_thread to execute our code. Either way, the packets are stored at fixed locations in ram ring buffers.

TX Ethernet Checksum Offloading overflow

The second bug in the challenge was in the transmission ethernet checksum offloading.

This was a subtle bug which was the result of signed comparison and an off-by-one.

The signed comparison problem was that the ethernet checksum offloading would try to load in maximum of 8 bytes worth of data from the packet to calculate the checksum. The intention is the equivalent of min(#bytesleft, 2*4).

However, the signed comparison bug is that all branch instructions use signed comparison use so the check in the microengine will return #bytesleft if it has the highest bit set it will be considered negative, passed the check, and used as an indirect memory reference after being shifted right by 2.

The problem is that the maximum size of data that we can transmit to the device is 1518 (which should be consistent with the Ethernet spec).

So the goal now is that the #bytesleft in the packet should be either negative or very large.

The off-by-one bug is at the end of the CRC calculation loop. The microengine code checks if the #bytesleft is >=7, and if it is then it will subtract 8 from #bytesleft.

This means that if we send a packet a size of mod 7, at the last loop of the CRC calculation loop, it will set #bytesleft = #bytesleft - 8, which means that #bytesleft will be -1, which is represented as 0xFFFFFFFF in hex.

On this last loop through, the value to read in the indirect memory reference will be very large. Luckily, it's not so large, because the microengine limits the count to be one byte, so "only" 255 register types (4 bytes each) will be read in, for a total of 1,020 bytes.

Again, this will allow us to control a significant amount of m_code, and follow the exploitation technique of the prior bug (we'll need to be careful about overwriting code in other threads, but we can handle that by overwriting that code with what's already there).

However, we've hit a bit of a snag, on the overwrite, we're on the last loop of calculating the packet CRC, which means we're actually at the end of our packet, and the data that we're overwriting with is written by memory outside our packet. How can we control this memory if it's outside our packet?

Ring buffers to the rescue. The core processor asks the microengine to send packets by putting them on the tx ring buffer, the packet content is stored in RAM, and the size of the packet RAM ring buffer is 256.

Now our idea to exploit this bug is:

Send 255 packets with our shellcode at the correct offset.
Enable TX Ethernet Checksum Offloading (you want to enable this now b/c it is very slow and enabling it for the other packets is very very slow).
Send a packet of size 7 which will cause the overwrite.
RX a packet (by sending it from another ooows kernel) to trigger rx_thread to start its loop over (I'm sure there are other ways to go here).
Microengine is now executing your shellcode!

Finally, we can achieve code execution through the previously described techniques.

Fin

Anyway, this was my last DEF CON challenge as part of OOO, hope that now some time has passed you've "enjoyed" these trips down computing history.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

broadcooom

broadcooom

README.md

ooows-broadcooom DEF CON 2021 Finals Challenge

Resources

Components

Bugs

TX IP Checksum Offloading IHL overflow

Exploitation

TX Ethernet Checksum Offloading overflow

Fin

Files

broadcooom

Directory actions

More options

Directory actions

More options

Latest commit

History

broadcooom

Folders and files

parent directory

README.md

ooows-broadcooom DEF CON 2021 Finals Challenge

Resources

Components

Bugs

TX IP Checksum Offloading IHL overflow

Exploitation

TX Ethernet Checksum Offloading overflow

Fin