This is a GDB server stub written in SystemVerilog (and a bit of DPI-C). It is primarily designed for RISC-V standard processors, but it can be modified to connect to other processor families.
Advantages:
- can be used on RTL CPU without explicit debug functionality,
- real time debugging, no simulation time is consumed by the debugger (useful for bitbanging and real time interrupts),
- read/write access to registers and memories using direct references,
- hardware breakpoints/watchpoints.
Disadvantages/limitations:
- non synthesizable,
- writing to registers/memory can have unpredictable effects (seriously),
- RTL modifications are still needed to be able to change the SW execution flow (jump, …),
- lack of system bus access, since it would consume simulation time,
- access by reference results in multiple drivers to memory/register file arrays, necessitating the use of always instead of always_ff to implement them, this results in missing simulator optimizations and subsequently slower simulations.
acronym | definition |
---|---|
GPR | General Purpose Registers (register file) |
PC | Program Counter |
FPR | Floating-Point Registers (register file) |
CSR | Control and Status Registers |
HDL | |
RTL | |
ROM | |
RAM | |
The GDB server stub is connected to SoC signals:
- following the system clock and driving the reset,
- monitoring CPU IFU/LSU interfaces to trigger breakpoints/watchpoints (can also be used to detect illegal instruction execution),
- register files (GPR/FPR) and individual registers (PC/CSR) inside the CPU should be connected using bidirectional hierarchical path names,
- memory arrays inside memory modules should be connected using bidirectional hierarchical path names.
The connection to a GDB/LLDB client uses a Unix or TCP server socket.
The implementation is a mix of DPI-C and pure SystemVerilog code. The DPI-C code implements minimal socket functionality, while SystemVerilog implements packet parsing, responses and integration with the CPU.
The DPI-C implements socket functionality missing from the SystemVerilog language:
- opening/closing of Unix server sockets, (TODO: TCP sockets)
- blocking/non-blocking send/receive access to sockets.
Note: While it would be partially possible to use a Linux character device model and existing SystemVerilog file IO functionality to implement communication with GDB, SystemVerilog only provides blocking fread/fgetc functions. Non-blocking read functionality is essential for implementing the GDB continue command, where the HDL simulation is running and consuming simulation time, while periodically (every clock cycle) checking for GDB commands.
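The non-blocking read described in the note can be sketched in C roughly as follows. This is a minimal sketch, not the project's actual DPI-C code: the function name `socket_recv_nonblocking` and its return convention are assumptions made for illustration, and it relies on the `MSG_DONTWAIT` flag available on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (name and return convention are assumptions):
 * try to read up to `len` bytes from socket `fd` without blocking.
 * Returns the number of bytes read, 0 if no data is pending,
 * or -1 on error or when the peer closed the connection. */
int socket_recv_nonblocking(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
    if (n > 0)
        return (int)n;          /* received data from the client */
    if (n == 0)
        return -1;              /* orderly shutdown by the client */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* nothing pending, keep simulating */
    return -1;                  /* any other socket error */
}
```

While the simulation is running, a function like this could be polled once per clock cycle, so that a pending interrupt byte (0x03) from GDB is noticed without ever stalling simulation time.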
TODO (simple):
- add support for more packets (q* queries, v*, …),
- ~query for capabilities~, (DONE)
- ~advanced mode~, (DONE)
- ~mode without acknowledge~, (DONE, tested)
- expand support for additional formats of partially supported packets,
- write regression tests (simple but a lot of work),
- integration with GUI tools,
- ~support for TCP sockets~, (DONE, tested)
- support for handling illegal instructions,
- backtracing, (WIP)
- support for binary memory access packets (x/X), but this is not really necessary,
- modify DUT memory access to handle byte array instead of single bytes,
- workarounds for Verilator issues,
- LLDB support, (WIP)
- ~support for pausing the simulation ($stop()), to be able to see waveforms~, (DONE using the D packet, GDB detach command) ~also enabling/disabling waveform dumping,~ (DONE using a monitor packet, not tested yet)
- …
TODO (difficult):
- understand packet patterns used by GDB,
- what state the system should start in (reset/running/breakpoint/…)?,
- inserting/removing breakpoints/watchpoints and relation to step/continue,
- software breakpoints inserting/removing is done with z/Z packets or m/M (memory access),
- I/C (32/16 bit EBREAK instruction) breakpoints.
- ~check whether there are race conditions to fix~ (WIP, improved code separation),
- the code currently only runs in Questa, try to port to other simulators,
- ~generalize access to more than one memory~ (DONE), and additional registers (CSR) ~(full generalization requires the SystemVerilog simulator to support the alias keyword),~ (not necessary, used a different approach with shadow memory)
- add file storage for the trace, something like a doubly linked list
- …
A common debugger abstraction upon hitting a breakpoint is to execute all instructions up to the breakpoint address, but not the instruction on that address itself. While I was unable to find a strong statement defining this abstraction, it is evident from how software breakpoints work. The instruction at the breakpoint address is replaced by a BREAK instruction. It is similar for watchpoints, but those do not have a comparable software implementation.
While this abstraction makes sense with a common hardware debug interface, it is not the obvious approach for an entirely passive monitor of cycle accurate CPU execution. A hardware debugger can enforce a state where all instructions up to the breakpoint have been executed and no instructions from the breakpoint on have started execution. A passive monitor is unable to modify the CPU state, so instructions before, at and after the breakpoint can be in various stages of the execution pipeline. Additionally, pipelined, super-scalar and out-of-order (OoO) CPU architectures perform some sort of speculative execution. In the simplest form, a pipelined CPU starts speculatively executing a branch, but flushes the pipeline if the branch was mispredicted.
Therefore the only reliable way to implement breakpoints/watchpoints, or stepping in general, within a passive monitor is to detect when instructions are retired. So a passive monitor pauses the simulation at the point where the breakpoint instruction is retired, but presents to the debugger the state before the changes applied by the breakpoint instruction.
While this may sound complicated, it is far easier than the alternatives. My initial approach was to detect when the breakpoint instruction enters the instruction fetch stage, stop the simulation and model the state change caused by all the instructions further in the pipeline, but not yet retired. Such a model would basically reimplement the pipeline, which only makes sense for singlecycle/multicycle implementations without a pipeline.
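The retirement based breakpoint abstraction can be illustrated with a small C model. This is only a sketch under stated assumptions: the `retired_t` record and the functions `breakpoint_hit` and `debugger_view` are hypothetical names invented for this example, and the model handles a single destination register rather than the full shadow state used by the stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one retired instruction (an assumption,
 * not the trace format used by the project). */
typedef struct {
    uint32_t pc;      /* address of the retired instruction */
    unsigned rd;      /* destination GPR index, 0 if none */
    uint32_t rd_old;  /* GPR[rd] before the instruction retired */
    uint32_t rd_new;  /* GPR[rd] after the instruction retired */
} retired_t;

/* A breakpoint is matched against the PC of the retired instruction. */
bool breakpoint_hit(const retired_t *ret, const uint32_t *bpt, size_t n)
{
    assert(ret != NULL);
    for (size_t i = 0; i < n; i++)
        if (bpt[i] == ret->pc)
            return true;
    return false;
}

/* Present the state from before the retired instruction took effect:
 * the PC points at the breakpoint instruction and its register write
 * is hidden by restoring the old value in the debugger's copy. */
void debugger_view(const retired_t *ret, uint32_t gpr[32], uint32_t *pc)
{
    assert(ret != NULL && pc != NULL);
    *pc = ret->pc;
    if (ret->rd != 0)
        gpr[ret->rd] = ret->rd_old;
}
```

The design point is that the simulation has already advanced past the breakpoint instruction, while the register copy handed to the debugger still looks as if that instruction had not executed.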
The shadow copy state and execution trace are designed with the following principles:
- A CPU simulator can exactly reproduce the execution sequence by using the shadow state updated by the trace. When the DUT and shadow states differ affecting the execution of an instruction, this difference must be recorded in the trace log.
- When stepping back in time by reverting traced instructions, it must be possible to achieve the exact same shadow state as during the forward execution.
Forward execution of the first 2 instructions:
- cnt=0 (RESET)
  - INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.
  - A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.
  - FORWARD (as part of the RESET and breakpoint at RESET sequence):
    - shadow_apply()
    - dut_step(ret) (collects first retired instruction)
    - push(ret) (trace trc[0])
    - shadow_update(0)
    - shadow_remember(0)
    - match breakpoint/watchpoint
    - cnt++
  - BACKWARD
- cnt=1
  - FORWARD:
    - shadow_apply(0)
    - dut_step(ret) (collects second retired instruction)
    - push(ret) (trace trc[1])
    - shadow_update(1)
    - shadow_remember(1)
    - match breakpoint/watchpoint
    - cnt++
  - BACKWARD:
    - responds with an error, since there is nowhere to go
- cnt=2
  - FORWARD:
    - shadow_apply(1)
    - dut_step(ret) (collects third retired instruction)
    - push(ret) (trace trc[2])
    - shadow_update(2)
    - shadow_remember(2)
    - match breakpoint/watchpoint
    - cnt++
  - BACKWARD:
    - shadow_remember(1)
    - cnt--
    - ret = trc[0]
    - match breakpoint/watchpoint
There are 3 possible step operations:
1. record (during first execution),
2. replay (replaying),
3. revert.
For the execution trace the following naming scheme makes sense:
- cur - current value
- nxt - next value
operation | store | load | AMO |
---|---|---|---|
 | wdata → | rdata → | wdata → |
 | shadow → | | shadow → |
replay is for previous instruction write data, update is for current instruction read data
- RESET (empty trace queue)
  - INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.
  - A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.
  - LOAD: load program into memory.
  - INSPECT: the debugger will see the reset state of the SoC either before or after the program is loaded.
  - FORWARD (as part of the RESET and breakpoint at RESET sequence):
    - dut_step(ret) (collects first retired instruction)
    - push(ret) (trace trc[cnt])
    - shadow_replay(cnt-1)
    - shadow_record(cnt)
    - match(cnt) breakpoint/watchpoint
  - BACKWARD
- cnt=0 (queue contains one element)
  - INSPECT: the debugger will see the first instruction
  - FORWARD:
    - shadow_replay(cnt) (0)
    - cnt++ (0 → 1)
    - dut_step(ret) (collects second retired instruction)
    - shadow_record(cnt)
    - push(ret) (trace trc[cnt])
    - match(cnt) breakpoint/watchpoint
  - BACKWARD:
    - responds with an error, since there is nowhere to go
- cnt=1
  - FORWARD (record):
    - shadow_replay(cnt) (1)
    - cnt++ (1 → 2)
    - dut_step(ret) (collects third retired instruction)
    - push(ret) (trace trc[cnt])
    - shadow_record(cnt)
    - match(ret) breakpoint/watchpoint
  - FORWARD (replay):
    - shadow_replay(cnt)
    - cnt++
    - shadow_update(cnt)
    - match(cnt) breakpoint/watchpoint
  - BACKWARD:
    - cnt--
    - shadow_revert(1)
    - match(cnt) breakpoint/watchpoint
operation | write | read | AMO |
---|---|---|---|
|
trc[].rdt → shadow |
||
|
shadow → trc[].rdt |
|
|
trc[].wdt → shadow |
|
shadow → trc[].rdt |
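
The data movements between the trace and the shadow copy can be sketched in C for the simplest case, a single byte store. This is a minimal sketch, not the project's implementation: the `trace_t` layout, the function signatures and the mapping of the three movements onto record, replay and revert are assumptions, since the row labels of the table above did not survive.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 256

/* Hypothetical trace entry for one retired store (layout is an assumption). */
typedef struct {
    uint32_t adr;  /* store address */
    uint8_t  wdt;  /* data written by the store */
    uint8_t  rdt;  /* shadow value before the store, kept for revert */
} trace_t;

static uint8_t shadow[MEM_SIZE];  /* shadow copy of a small memory */

/* record: shadow → trc[].rdt (remember the value about to be overwritten) */
void shadow_record(trace_t *t, uint32_t adr, uint8_t wdata)
{
    assert(t != 0 && adr < MEM_SIZE);
    t->adr = adr;
    t->wdt = wdata;
    t->rdt = shadow[adr];
}

/* replay: trc[].wdt → shadow (apply the store to the shadow copy) */
void shadow_replay(const trace_t *t)
{
    shadow[t->adr] = t->wdt;
}

/* revert: trc[].rdt → shadow (undo the store when stepping backward) */
void shadow_revert(const trace_t *t)
{
    shadow[t->adr] = t->rdt;
}
```

Recording does not modify the shadow copy, which matches the idea that the debugger sees the state before the most recently retired instruction. A replay followed by a revert leaves the shadow copy exactly as it was, which is the second design principle listed earlier.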
- useExtendedRemote microsoft/vscode-cpptools#9505
First, VSCode with the vscode-cpptools extension does support backwards step/continue. A simple proof would be this example (check the debugger buttons). However this is not a common feature and there are no complete examples documenting the steps necessary to enable this feature.
Microsoft provides a document describing how a debugger extension implementing a debugger adapter (DA) connects VSCode and a debugger.
VSCode ← DAP → DA ← GDB/MI → GDB/LLDB ← RSP → QEMU/HDL
VSCode uses the debug adapter protocol (DAP) to communicate with the debug adapter (DA) (see VS Code Debug Protocol and Debug Adapter), which uses the GDB machine interface (GDB/MI) protocol to communicate with GDB/LLDB. GDB/LLDB in turn communicates with a stub in a simulator, such as QEMU or the one in this project, using RSP (GDB Remote Serial Protocol).
During Launch Sequencing the debug adapter should ask the debugger (GDB/LLDB) about capabilities/features.
The DAP protocol provides capabilities as an InitializeResponse to an InitializeRequest.
DAP protocol requests StepBack and ReverseContinueRequest are available if the supportsStepBack capability is true.
The DA connects to GDB using GDB/MI (machine interface).
The DA should ask GDB the -list-target-features question and get reverse in the response.
There are a few issues in the DA repository related to record/replay, reverse execution and GDB/MI -list-target-features command.
GDB would further communicate with a stub using GDB packets.
The stub should respond to the qSupported packet with ReverseStep+;ReverseContinue+;.
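
For reference, an RSP reply such as the qSupported response is framed as `$`, the payload, `#` and a two digit hexadecimal checksum (the modulo 256 sum of the payload bytes). The C sketch below shows only this framing. The function name `rsp_frame` is hypothetical, the project implements packet handling in SystemVerilog, and the sketch does not escape special characters, so the payload must not contain `$`, `#`, `}` or `*`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: wrap `payload` into an RSP packet "$payload#xx".
 * Returns the packet length, or -1 if `out` is too small.
 * No escaping or run-length encoding is performed in this sketch. */
int rsp_frame(const char *payload, char *out, size_t out_len)
{
    assert(payload != NULL && out != NULL);
    size_t len = strlen(payload);
    if (out_len < len + 5)          /* '$' + payload + '#' + 2 hex + NUL */
        return -1;
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)payload[i];   /* checksum over payload only */
    return snprintf(out, out_len, "$%s#%02x", payload, sum & 0xffu);
}
```

For example, the payload `OK` is framed as `$OK#9a`, and the same function could frame a payload containing `ReverseStep+;ReverseContinue+`.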
In a separate project I attempted to enable record/replay in QEMU (for ARM) and run a Zephyr APP to see if I could get the full VSCode integration working. I could not, but I at least checked the QEMU stub response to the qSupported RSP packet. QEMU provides the expected response.
I can use the same setup to check the GDB/MI protocol using a Python implementation of GDB/MI.
- supportsSteppingGranularity since requests Next, StepIn, StepOut and StepBack have as part of request arguments the SteppingGranularity.
- supportsInstructionBreakpoints is required for the availability of request SetInstructionBreakpoints.
- supportsDisassembleRequest is required for availability of request Disassemble.
- In my tests only inserting write watchpoints was an option from a VSCode dropdown menu, I would like to see read and access options in the same menu.
Since QEMU record/replay functionality is not fully supported for the RISC-V ISA, this example will use the ARM ISA, to showcase QEMU record/replay and time travel debugging within VSCode.
Install ARM/RISC-V cross compiler:
sudo apt install gcc-riscv64-unknown-elf gdb-riscv64-unknown-elf
sudo apt install gcc-arm-none-eabi gdb-arm-none-eabi
Install ARM/RISC-V QEMU system emulator:
sudo apt install qemu-system-riscv32 qemu-system-riscv64
sudo apt install qemu-system-arm
Press Ctrl-A, x to exit emulation.
make -C uart01/ ARMGNU=arm-none-eabi
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=record,rrfile=replay.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=replay,rrfile=replay.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -s -S
Additional maintenance commands can be found here (useful for RESET functionality?):
Microsoft C/C++ DAP adapter:
GDB:
- Questions:
  - What signal should the stub send to GDB, when continue reaches the beginning of execution?
  - What should be the step response if there is no breakpoint/watchpoint?
QEMU:
Verilator:
- Zephyr RTOS gdbstub
- OpenOCD gdb_server
- RISC-V based Virtual Prototype (VP) gdb-mc
- GDB connection flow https://www.embecosm.com/appnotes/ean4/html/ch03s03s01.html
More notes on record/replay functionality in GDB:
List of major public RISC-V simulators with comments about interfacing with GDB and reverse execution.
- spike communicates with GDB through JTAG and OpenOCD (GDB example), does not seem to support reverse execution,
- Intel® Simics® Simulator has RISC-V support and can be connected with GDB, although reverse execution can only be done from the native console,
- QEMU (GDB usage, Record/replay) the Wiki page Features/record-replay indicates record/replay is not tested for RISC-V,
RISC-V and deterministic record/replay tools (2018)
How to correctly use QEMU’s record/replay functionality? (2025)
I can see the step/continue backwards buttons.
BUILD, DEBUG, TEST
Alternative way to start the simulator (debugServerPath/debugServerArgs/serverStarted), instead of doing it as a task:
VSCode extension and source code, DAP protocol.
These are interfaces exposing retired instructions, and are useful as an abstraction layer between the CPU and the GDB stub.
SV socket DPI: https://github.com/witchard/sock.sv https://github.com/xver/Shunt
These links are CPU intensive:
Connecting to Python:
Talk about adding socket support to SystemVerilog