jeras/gdb_server_stub_sv

GDB server stub

This is a GDB server stub written in SystemVerilog (and a bit of DPI-C). It is primarily designed for RISC-V standard processors, but it can be modified to connect to other processor families.

Advantages:

  • can be used on RTL CPU without explicit debug functionality,

  • real time debugging, no simulation time is consumed by the debugger (useful for bitbanging and real time interrupts),

  • read/write access to registers and memories using direct references,

  • hardware breakpoints/watchpoints.

Disadvantages/limitations:

  • non-synthesizable,

  • writing to registers/memory can have unpredictable effects (seriously),

  • RTL modifications are still needed to be able to change the SW execution flow (jump, …​),

  • lack of system bus access, since it would consume simulation time,

  • access by reference results in multiple drivers on the memory/register file arrays, so these arrays must be implemented with always instead of always_ff; this prevents some simulator optimizations and results in slower simulations.

Terminology

Table 1. Terminology

acronym  definition
GPR      General Purpose Registers (register file)
PC       Program Counter
FPR      Floating-Point Registers (register file)
CSR      Control and Status Registers
HDL      Hardware Description Language
RTL      Register-transfer level
ROM      Read-only memory
RAM      Random-access memory

Integration

The GDB server stub is connected to SoC signals:

  • following the system clock and driving the reset,

  • monitoring CPU IFU/LSU interfaces to trigger breakpoints/watchpoints (can also be used to detect illegal instruction execution),

  • register files (GPR/FPR) and individual registers (PC/CSR) inside the CPU should be connected using bidirectional hierarchical path names,

  • memory arrays inside memory modules should be connected using bidirectional hierarchical path names.

The connection to a GDB/LLDB client uses a Unix or TCP server socket.

Figure 1. Integration with a CPU block diagram

Ports

Implementation details

The implementation is a mix of DPI-C and pure SystemVerilog code. The DPI-C code implements minimal socket functionality, while SystemVerilog implements packet parsing, responses and integration with the CPU.

DPI-C

The DPI-C implements socket functionality missing from the SystemVerilog language.

  • opening/closing of Unix server sockets, (TODO: TCP sockets)

  • blocking/non-blocking send/receive access to sockets.

Note
While it would be partially possible to use a Linux character device model and existing SystemVerilog file IO functionality to implement communication with GDB, SystemVerilog only provides blocking fread/fgetc functions. Non-blocking read functionality is essential for implementing the GDB continue command, where the HDL simulation is running and consuming simulation time, while periodically (every clock cycle) checking for GDB commands.
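The polling behaviour described in the note can be illustrated with a minimal Python sketch. It is not the project's DPI-C code: the function name poll_packet and the use of a socket pair in place of a real GDB connection are assumptions made for the illustration. The point is only that a non-blocking receive returns immediately when nothing is pending, so a simulation loop can call it once per clock cycle.

```python
import socket
from typing import Optional


def poll_packet(conn: socket.socket, max_bytes: int = 4096) -> Optional[bytes]:
    """Return pending bytes, or None if the client has sent nothing yet.

    An empty bytes object means the peer closed the connection.
    """
    try:
        return conn.recv(max_bytes)
    except BlockingIOError:
        # Nothing to read: the caller can advance the simulation instead.
        return None


# A connected socket pair stands in for the GDB client and the stub.
gdb_side, stub_side = socket.socketpair()
stub_side.setblocking(False)

idle = poll_packet(stub_side)      # no data yet, returns at once
gdb_side.sendall(b"\x03")          # 0x03 is the RSP interrupt byte (Ctrl-C)
pending = poll_packet(stub_side)   # the interrupt byte is now available

gdb_side.close()
stub_side.close()
```

With a blocking read, the first call would stall the simulation until GDB sent something, which is exactly what the continue command cannot tolerate.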

SystemVerilog packet parsing
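
As an illustration of what packet parsing has to handle, the GDB Remote Serial Protocol frames each packet as `$`, the payload, `#`, and a two-digit hexadecimal checksum (the sum of the payload bytes modulo 256). The sketch below is in Python rather than SystemVerilog, uses hypothetical function names, and ignores escaping and run-length encoding.

```python
def checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) % 256


def encode_packet(payload: bytes) -> bytes:
    """Wrap a payload into a '$payload#xx' frame."""
    return b"$" + payload + b"#" + f"{checksum(payload):02x}".encode("ascii")


def decode_packet(frame: bytes) -> bytes:
    """Check the framing and checksum of one frame and return its payload."""
    if len(frame) < 4 or frame[:1] != b"$" or frame[-3:-2] != b"#":
        raise ValueError("malformed frame")
    payload = frame[1:-3]
    if int(frame[-2:].decode("ascii"), 16) != checksum(payload):
        raise ValueError("checksum mismatch")
    return payload


ok_frame = encode_packet(b"OK")          # reply to a successful command
read_regs = decode_packet(b"$g#67")      # 'g' asks for the general registers
```

In acknowledge mode the receiver answers each frame with `+` (checksum correct) or `-` (request retransmission); that handshake is left out here.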

Development

TODO (simple):

  • add support for more packets (q* queries, v*, …​),

  • ~query for capabilities~, (DONE)

  • ~advanced mode~, (DONE)

  • ~mode without acknowledge~, (DONE, tested)

  • expand support for additional formats of partially supported packets,

  • write regression tests (simple but a lot of work),

  • integration with GUI tools,

  • ~support for TCP sockets~, (DONE, tested)

  • support for handling illegal instructions,

  • backtracing, (WIP)

  • support for binary memory access packets (x/X), but this is not really necessary,

  • modify DUT memory access to handle byte array instead of single bytes

  • workarounds for Verilator issues,

  • LLDB support, (WIP)

  • ~support for pausing the simulation ($stop()), to be able to see waveforms~, (DONE using the D packet, GDB detach command) ~also enabling/disabling waveform dumping,~ (DONE using a monitor packet, not tested yet)

  • …​

TODO (difficult):

  • understand packet patterns used by GDB,

  • what state the system should start in (reset/running/breakpoint/…​)?,

  • inserting/removing breakpoints/watchpoints and relation to step/continue,

  • software breakpoints inserting/removing is done with z/Z packets or m/M (memory access),

  • I/C (32/16 bit EBREAK instruction) breakpoints.

  • ~check whether there are race conditions to fix~ (WIP, improved code separation),

  • the code currently only runs in Questa, try to port to other simulators,

  • ~generalize access to more than one memory~ (DONE), and additional registers (CSR) ~(full generalization requires the SystemVerilog simulator to support the alias keyword),~ (not necessary, used a different approach with shadow memory)

  • add file storage for the trace, something like a doubly linked list

  • …​
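
For reference on the z/Z items in the list above, a breakpoint/watchpoint packet has the form `Z<type>,<addr>,<kind>` to insert and `z<type>,<addr>,<kind>` to remove, with addr and kind in hexadecimal. The Python sketch below parses such a payload; the names PointRequest and parse_point_packet are hypothetical, and the reading of kind as the instruction length in bytes (2 or 4 for the 16/32 bit EBREAK) is an assumption about the RISC-V target.

```python
from typing import NamedTuple

# Point types defined by the GDB Remote Serial Protocol.
POINT_TYPES = {
    0: "software breakpoint",
    1: "hardware breakpoint",
    2: "write watchpoint",
    3: "read watchpoint",
    4: "access watchpoint",
}


class PointRequest(NamedTuple):
    insert: bool   # True for 'Z' (insert), False for 'z' (remove)
    type: int      # key into POINT_TYPES
    addr: int
    kind: int      # target specific; assumed to be a length in bytes here


def parse_point_packet(payload: str) -> PointRequest:
    """Parse a 'Z'/'z' payload; optional ';cond_list' suffixes are ignored."""
    if payload[:1] not in ("Z", "z"):
        raise ValueError("not a breakpoint/watchpoint packet")
    fields = payload[1:].split(";")[0].split(",")
    if len(fields) != 3:
        raise ValueError("expected type,addr,kind")
    ptype, addr, kind = (int(f, 16) for f in fields)
    if ptype not in POINT_TYPES:
        raise ValueError("unknown point type")
    return PointRequest(payload[0] == "Z", ptype, addr, kind)


insert_bp = parse_point_packet("Z0,80000010,4")
remove_wp = parse_point_packet("z2,1000,4")
```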

Leaky abstraction

A common debugger abstraction upon hitting a breakpoint is to execute all instructions up to the breakpoint address, but not the instruction on that address itself. While I was unable to find a strong statement defining this abstraction, it is evident from how software breakpoints work. The instruction at the breakpoint address is replaced by a BREAK instruction. It is similar for watchpoints, but those do not have a comparable software implementation.

While this abstraction makes sense with a common hardware debug interface, it is not the obvious approach for an entirely passive monitor of cycle accurate CPU execution. While a hardware debugger can enforce a state where all instructions up to the breakpoint have been executed and no instructions from the breakpoint on have started execution, a passive monitor is unable to modify the CPU state, so instructions before, at, and after the breakpoint can be in various stages of the execution pipeline. Additionally, pipelined, super-scalar and OoO CPU architectures perform some sort of speculative execution. In the simplest form, a pipelined CPU starts speculatively executing a branch, but flushes the pipeline if the branch was mispredicted.

Therefore, the only reliable way to implement breakpoints/watchpoints, or stepping in general, within a passive monitor is to detect when instructions are retired. A passive monitor pauses the simulation at the point where the breakpoint instruction is retired, but presents to the debugger the state before the changes applied by the breakpoint instruction.
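
The retire-based approach can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the project's implementation: the Retired record, the Monitor class and its on_retire method are hypothetical names, and only one destination register per instruction is modelled. The monitor pauses when the retired PC matches a breakpoint and holds back that instruction's register write until execution resumes, so the debugger sees the state before the instruction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Retired:
    pc: int
    rd: int       # destination register index, 0 means no register write
    value: int    # value written to rd


@dataclass
class Monitor:
    breakpoints: Set[int]
    pc: int = 0
    gpr: List[int] = field(default_factory=lambda: [0] * 32)
    pending: Optional[Retired] = None   # instruction we are paused on

    def _commit(self, ret: Retired) -> None:
        if ret.rd != 0:
            self.gpr[ret.rd] = ret.value

    def on_retire(self, ret: Retired) -> bool:
        """Process one retired instruction; return True to pause."""
        if self.pending is not None:
            # Resuming: apply the effect of the instruction we paused on.
            self._commit(self.pending)
            self.pending = None
        self.pc = ret.pc
        if ret.pc in self.breakpoints:
            self.pending = ret          # state shown excludes this write
            return True
        self._commit(ret)
        return False


mon = Monitor(breakpoints={0x104})
paused_1 = mon.on_retire(Retired(pc=0x100, rd=1, value=5))
paused_2 = mon.on_retire(Retired(pc=0x104, rd=2, value=7))
seen_at_break = (mon.pc, mon.gpr[1], mon.gpr[2])
paused_3 = mon.on_retire(Retired(pc=0x108, rd=3, value=9))
```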

While this may sound complicated, it is far easier than the alternatives. My initial approach was to detect when the breakpoint instruction enters the instruction fetch stage, stop the simulation, and model the state changes caused by all the instructions further along in the pipeline but not yet retired. Such a model would basically reimplement the pipeline, which only makes sense for single-cycle/multi-cycle implementations without a pipeline.

The shadow copy state and execution trace are designed with the following principles:

  1. A CPU simulator can exactly reproduce the execution sequence by using the shadow state updated by the trace. When the DUT and shadow states differ affecting the execution of an instruction, this difference must be recorded in the trace log.

  2. When stepping back in time by reverting traced instructions, it must be possible to achieve the exact same shadow state as during the forward execution.

Stepping forward/backward

Forward execution of the first 2 instructions:

  1. cnt=0 (RESET)

    • INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.

    • A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.

    • FORWARD (as part of the RESET and breakpoint at RESET sequence):

      1. shadow_apply()

      2. dut_step(ret) (collects first retired instruction)

      3. push(ret) (trace trc[0])

      4. shadow_update(0)

      5. shadow_remember(0)

      6. match breakpoint/watchpoint

      7. cnt++

    • BACKWARD

  2. cnt=1

    • FORWARD:

      1. shadow_apply(0)

      2. dut_step(ret) (collects second retired instruction)

      3. push(ret) (trace trc[1])

      4. shadow_update(1)

      5. shadow_remember(1)

      6. match breakpoint/watchpoint

      7. cnt++

    • BACKWARD:

      • responds with an error, since there is nowhere to go

  3. cnt=2

    • FORWARD:

      1. shadow_apply(1)

      2. dut_step(ret) (collects third retired instruction)

      3. push(ret) (trace trc[2])

      4. shadow_update(2)

      5. shadow_remember(2)

      6. match breakpoint/watchpoint

      7. cnt++

    • BACKWARD:

      1. shadow_remember(1)

      2. cnt--

      3. ret = trc[0]

      4. match breakpoint/watchpoint

There are 3 possible step operations:

  1. record (during first execution)

  2. replay (replaying the recorded trace)

  3. revert

For the execution trace the following naming scheme makes sense:

  1. cur - current value

  2. nxt - next value

operation      store                load                 AMO
monitor_step   wdata → trc[].nxt    rdata → trc[].cur    wdata → trc[].nxt : rdata → trc[].cur
shadow_record  shadow → trc[].cur   trc[].cur → shadow   shadow → trc[].cur : trc[].cur → shadow
shadow_update  -                    trc[].cur → shadow   trc[].cur → shadow
shadow_replay  trc[].nxt → shadow   -                    trc[].nxt → shadow
shadow_revert  trc[].cur → shadow   -                    trc[].cur → shadow

Replay applies the write data of the previous instruction; update applies the read data of the current instruction.
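
The cur/nxt naming can be made concrete with a small Python sketch restricted to store instructions. It is a simplification under stated assumptions: StoreEntry and the byte-wide shadow memory are hypothetical, loads and AMOs are left out, and the write data is applied immediately rather than when the next instruction is stepped. It shows that saving the old shadow value in cur before applying nxt is enough to return to the exact same shadow state when stepping back.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int
    nxt: int        # write data observed on the DUT (wdata -> trc[].nxt)
    cur: int = 0    # shadow value before the store (shadow -> trc[].cur)


def shadow_record(shadow: bytearray, entry: StoreEntry) -> None:
    entry.cur = shadow[entry.addr]


def shadow_replay(shadow: bytearray, entry: StoreEntry) -> None:
    shadow[entry.addr] = entry.nxt


def shadow_revert(shadow: bytearray, entry: StoreEntry) -> None:
    shadow[entry.addr] = entry.cur


shadow = bytearray(16)
shadow[4] = 0xAA
reset_state = bytes(shadow)

trace = [
    StoreEntry(addr=4, nxt=0xBB),
    StoreEntry(addr=4, nxt=0xCC),
    StoreEntry(addr=8, nxt=0x11),
]

# Forward execution: remember the old value, then apply the new one.
forward_states = []
for entry in trace:
    shadow_record(shadow, entry)
    shadow_replay(shadow, entry)
    forward_states.append(bytes(shadow))

# Backward execution: snapshot the state, then undo the newest entry.
backward_states = []
for entry in reversed(trace):
    backward_states.append(bytes(shadow))
    shadow_revert(shadow, entry)
```

Reading backward_states in reverse gives the same sequence as forward_states, and the shadow ends at its reset contents.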

  1. RESET (empty trace queue)

    • INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.

    • A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.

    • LOAD: load program into memory.

    • INSPECT: the debugger will see the reset state of the SoC either before or after the program is loaded.

    • FORWARD (as part of the RESET and breakpoint at RESET sequence):

      1. dut_step(ret) (collects first retired instruction)

      2. push(ret) (trace trc[cnt])

      3. shadow_replay(cnt-1)

      4. shadow_record(cnt)

      5. match(cnt) breakpoint/watchpoint

    • BACKWARD

  2. cnt=0 (queue contains one element)

    • INSPECT: the debugger will see the first instruction

    • FORWARD:

      1. shadow_replay(cnt) (0)

      2. cnt++ (0 → 1)

      3. dut_step(ret) (collects second retired instruction)

      4. shadow_record(cnt)

      5. push(ret) (trace trc[cnt])

      6. match(cnt) breakpoint/watchpoint

    • BACKWARD:

      • responds with an error, since there is nowhere to go

  3. cnt=1

    • FORWARD (record):

      1. shadow_replay(cnt) (1)

      2. cnt++ (1 → 2)

      3. dut_step(ret) (collects third retired instruction)

      4. push(ret) (trace trc[cnt])

      5. shadow_record(cnt)

      6. match(ret) breakpoint/watchpoint

    • FORWARD (replay):

      1. shadow_replay(cnt)

      2. cnt++

      3. shadow_update(cnt)

      4. match(cnt) breakpoint/watchpoint

    • BACKWARD:

      1. cnt--

      2. shadow_revert(1)

      3. match(cnt) breakpoint/watchpoint
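
The walkthrough above distinguishes three outcomes of a step request: record, replay, and revert, plus an error when stepping back from the first instruction. The Python sketch below models only the trace index handling, with all shadow operations omitted. The TraceCursor class is hypothetical, and an iterator of retired PCs stands in for dut_step().

```python
class TraceCursor:
    """Tracks the position in the trace; cnt indexes the current entry."""

    def __init__(self, dut_retired):
        self.dut = iter(dut_retired)   # hypothetical stand-in for dut_step()
        self.trc = []
        self.cnt = -1                  # RESET: empty trace queue

    def forward(self) -> str:
        if self.cnt + 1 < len(self.trc):
            # The next instruction is already in the trace: replay it.
            self.cnt += 1
            return "replay"
        # At the end of the trace: advance the DUT and record.
        self.trc.append(next(self.dut))
        self.cnt += 1
        return "record"

    def backward(self) -> str:
        if self.cnt <= 0:
            return "error"             # nowhere to go
        self.cnt -= 1
        return "revert"


cursor = TraceCursor([0x80000000, 0x80000004, 0x80000008])
log = [
    cursor.forward(),    # step performed as part of RESET
    cursor.backward(),   # only one element in the queue
    cursor.forward(),
    cursor.backward(),
    cursor.forward(),    # second instruction is already traced
    cursor.forward(),
]
```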

operation write read AMO

shadow_update

trc[].rdt → shadow

shadow_remember

shadow → trc[].rdt

shadow_apply

trc[].wdt → shadow

shadow_revert

shadow → trc[].rdt

VSCode integration

VSCode and time travel debugging

First, VSCode with the vscode-cpptools extension does support backwards step/continue. A simple proof would be this example (check the debugger buttons). However, this is not a common feature and there are no complete examples documenting the steps necessary to enable this feature.

Microsoft provides a document describing how a debugger extension implementing a debugger adapter (DA) connects VSCode and a debugger.

VSCode --DAP--> DA --GDB/MI--> GDB/LLVM --RSP--> QEMU/HDL

VSCode uses the debug adapter protocol (DAP) to communicate with the debug adapter (DA) (see VS Code Debug Protocol and Debug Adapter), which in turn uses the GDB machine interface (GDB/MI) protocol to communicate with GDB/LLVM. GDB/LLVM then communicates over RSP (GDB Remote Serial Protocol) with a stub in a simulator such as QEMU, or with the one in this project.

During Launch Sequencing the debug adapter should ask the debugger GDB/LLVM about capabilities/features.

The DAP protocol provides capabilities as an InitializeResponse to an InitializeRequest.

The DAP requests StepBack and ReverseContinue are available if the supportsStepBack capability is true.
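
As a rough illustration of where this capability appears on the wire, DAP messages are JSON bodies preceded by a Content-Length header. The Python sketch below builds an initialize response that advertises supportsStepBack. The helper name dap_frame and the sequence numbers are made up for the example, and a real debug adapter reports many more capabilities.

```python
import json


def dap_frame(message: dict) -> bytes:
    """Serialize one DAP message with its Content-Length header."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


initialize_response = {
    "seq": 2,
    "type": "response",
    "request_seq": 1,
    "success": True,
    "command": "initialize",
    "body": {"supportsStepBack": True},
}

frame = dap_frame(initialize_response)
header, _, payload = frame.partition(b"\r\n\r\n")
```

A client that sees supportsStepBack set to true may then send stepBack and reverseContinue requests.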

The DA connects to GDB using GDB/MI (machine interface). The DA should ask GDB the -list-target-features question and get reverse in the response.

There are a few issues in the DA repository related to record/replay, reverse execution and GDB/MI -list-target-features command.

GDB would further communicate with a stub using GDB packets. The stub should respond to the qSupported packet with ReverseStep+;ReverseContinue+;.
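
A minimal Python sketch of the stub side of this negotiation follows. It works on packet payloads only (framing and checksum are left out), the names STUB_FEATURES, qsupported_reply and handle are hypothetical, and the fixed S05 stop reply is an assumption standing in for real step handling. The bs and bc packets are the RSP reverse single step and reverse continue requests.

```python
# Features this illustrative stub advertises in its qSupported reply.
STUB_FEATURES = {"ReverseStep": True, "ReverseContinue": True}

# RSP packets GDB may send once reverse execution is advertised.
REVERSE_PACKETS = {"bs": "reverse single step", "bc": "reverse continue"}


def qsupported_reply(features: dict) -> str:
    """Format features as 'Name+;Name-;...' entries joined by ';'."""
    return ";".join(
        f"{name}{'+' if enabled else '-'}" for name, enabled in features.items()
    )


def handle(payload: str) -> str:
    """Return the reply payload for one request payload."""
    if payload.startswith("qSupported"):
        return qsupported_reply(STUB_FEATURES)
    if payload in REVERSE_PACKETS:
        return "S05"    # stop reply with SIGTRAP, assumed for this sketch
    return ""           # an empty reply marks the packet as unsupported


features_reply = handle("qSupported:multiprocess+;swbreak+")
step_back_reply = handle("bs")
unknown_reply = handle("vMustReplyEmpty")
```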

QEMU

In a separate project I attempted to enable record/replay in QEMU (for ARM) and run a Zephyr application to see whether I could get the full VSCode integration working. I could not, but I at least checked the QEMU stub response to the qSupported RSP packet. QEMU provides the expected response.

I can use the same setup to check the GDB/MI protocol using a Python implementation of GDB/MI.

QEMU ARM record/replay demo with VSCode

Since QEMU record/replay functionality is not fully supported for the RISC-V ISA, this example will use the ARM ISA, to showcase QEMU record/replay and time travel debugging within VSCode.

Install ARM/RISC-V cross compiler:

sudo apt install gcc-riscv64-unknown-elf gdb-riscv64-unknown-elf
sudo apt install gcc-arm-none-eabi gdb-arm-none-eabi

Install ARM/RISC-V QEMU system emulator:

sudo apt install qemu-system-riscv32 qemu-system-riscv64
sudo apt install qemu-system-arm

Press Ctrl-A, x to exit emulation.

# build the uart01 example
make -C uart01/ ARMGNU=arm-none-eabi
# run without record/replay
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin
# record the execution into replay.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=record,rrfile=replay.bin
# replay the recorded execution from replay.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=replay,rrfile=replay.bin
# start halted (-S) with a GDB server on TCP port 1234 (-s)
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -s -S

References

Additional maintenance commands can be found here (useful for RESET functionality?):

Issue reports

Microsoft C/C++ DAP adapter:

GDB:

  • Questions:

  • What signal should the stub send to GDB, when continue reaches the beginning of execution?

  • What should be the step response if there is no breakpoint/watchpoint?

QEMU:

Verilator:

Various stub implementations

More notes on record/replay functionality in GDB:

Major RISC-V simulators

List of major public RISC-V simulators with comments about interfacing with GDB and reverse execution.

Record/replay and reverse execution

RISC-V and deterministic record/replay tools (2018)

How to correctly use QEMU’s record/replay functionality? (2025)

GDB/LLVM integration with VSCode

I can see the step/continue backwards buttons.

BUILD, DEBUG, TEST

Alternative way to start the simulator (debugServerPath/debugServerArgs/serverStarted), instead of doing it as a task:

LLDB

RISC-V verification interfaces

These are interfaces exposing retired instructions, and are useful as an abstraction layer between the CPU and the GDB stub.
