GDB server stub

This is a GDB server stub written in SystemVerilog (and a bit of DPI-C). It is primarily designed for standard RISC-V processors, but it can be modified to connect to other processor families.

Advantages:

can be used on an RTL CPU without explicit debug functionality,
real-time debugging: no simulation time is consumed by the debugger (useful for bit-banging and real-time interrupts),
read/write access to registers and memories using direct references,
hardware breakpoints/watchpoints.

Disadvantages/limitations:

non-synthesizable,
writing to registers/memory can have unpredictable effects (seriously),
RTL modifications are still needed to change the SW execution flow (jump, …),
no system bus access, since it would consume simulation time,
access by reference results in multiple drivers to memory/register file arrays, which therefore have to be implemented with always instead of always_ff; this prevents some simulator optimizations and results in slower simulations.

Terminology

Table 1. Terminology

acronym	definition
GPR	General Purpose Registers (register file)
PC	Program Counter
FPR	Floating-Point Registers (register file)
CSR	Control and Status Registers
HDL	Hardware Description Language
RTL	Register-transfer level
ROM	Read-only memory
RAM	Random-access memory

Integration

The GDB server stub is connected to SoC signals:

following the system clock and driving the reset,
monitoring CPU IFU/LSU interfaces to trigger breakpoints/watchpoints (can also be used to detect illegal instruction execution),
register files (GPR/FPR) and individual registers (PC/CSR) inside the CPU should be connected using bidirectional hierarchical path names,
memory arrays inside memory modules should be connected using bidirectional hierarchical path names.

The connection to a GDB/LLDB client uses a Unix or TCP server socket.

Figure 1. Integration with a CPU block diagram

Parameters

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Memory-Map-Format.html

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Target-Descriptions.html#Target-Descriptions

https://sourceware.org/gdb/current/onlinedocs/gdb.html/General-Query-Packets.html#General-Query-Packets

Ports

Implementation details

The implementation is a mix of DPI-C and pure SystemVerilog code. The DPI-C code implements minimal socket functionality, while SystemVerilog implements packet parsing, responses and integration with the CPU.
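For reference, the framing the SystemVerilog parser has to handle is plain GDB Remote Serial Protocol: each packet is `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The following is a minimal C sketch of that framing (C to match the DPI-C side of the project, not a translation of the actual SystemVerilog parser). The function names are hypothetical, and escaping (`}`) and run-length encoding are deliberately not handled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* RSP checksum: sum of all payload bytes modulo 256. */
static unsigned rsp_checksum(const char *data, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (sum + (unsigned char)data[i]) & 0xffu;
    return sum;
}

/* Frame a payload as "$<payload>#<2 hex digits>".
 * Returns the framed length, or -1 if the output buffer is too small. */
static int rsp_frame(const char *payload, char *out, size_t out_size) {
    size_t len = strlen(payload);
    int n = snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload, len));
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Parse a received packet. On success the payload is copied into 'payload'
 * (NUL terminated) and its length is returned; -1 on framing/checksum error. */
static int rsp_parse(const char *pkt, char *payload, size_t payload_size) {
    const char *start = strchr(pkt, '$');
    if (start == NULL) return -1;
    const char *hash = strchr(start + 1, '#');
    if (hash == NULL) return -1;
    size_t len = (size_t)(hash - start - 1);
    if (len + 1 > payload_size) return -1;
    unsigned expected;
    if (sscanf(hash + 1, "%2x", &expected) != 1) return -1;
    if (rsp_checksum(start + 1, len) != expected) return -1;
    memcpy(payload, start + 1, len);
    payload[len] = '\0';
    return (int)len;
}
```

For example, framing the payload `OK` produces `$OK#9a`, the usual positive reply, and a packet with a wrong checksum is rejected so the stub can answer with `-` (unless no-acknowledge mode is active).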

`DPI-C`

The DPI-C implements socket functionality missing from the SystemVerilog language.

opening/closing of Unix server sockets (TODO: TCP sockets),
blocking/non-blocking send/receive access to sockets.
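A minimal sketch of such DPI-C socket functions, assuming a Unix domain stream socket and POSIX sockets. The function names are hypothetical and do not necessarily match the project's DPI-C code; on the SystemVerilog side they could be imported with something like `import "DPI-C" function int stub_socket_open(input string path);`.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Open a listening Unix domain stream socket at 'path'.
 * Returns the listening file descriptor, or -1 on error. */
int stub_socket_open(const char *path) {
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path) return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path); /* remove a stale socket file left by a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until a debugger (GDB/LLDB) connects; returns the connection fd. */
int stub_socket_accept(int listen_fd) {
    return accept(listen_fd, NULL, NULL);
}

/* Send a buffer; returns the number of bytes sent, or -1. */
int stub_socket_send(int fd, const char *buf, size_t len) {
    return (int)send(fd, buf, len, 0);
}

/* Receive up to 'len' bytes. With blocking == 0 the call does not wait:
 * it returns -1 with errno EAGAIN/EWOULDBLOCK when no data is pending. */
int stub_socket_recv(int fd, char *buf, size_t len, int blocking) {
    return (int)recv(fd, buf, len, blocking ? 0 : MSG_DONTWAIT);
}

/* Close a socket and, if 'path' is not NULL, remove its file-system entry. */
void stub_socket_close(int fd, const char *path) {
    close(fd);
    if (path != NULL) unlink(path);
}
```

The only difference between the blocking and the non-blocking receive is the `MSG_DONTWAIT` flag, so a single DPI-C function with a mode argument is enough for both uses.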

Note

While it would be partially possible to use a Linux character device model and the existing SystemVerilog file I/O functionality to implement communication with GDB, SystemVerilog only provides the blocking $fread/$fgetc functions. Non-blocking read functionality is essential for implementing the GDB continue command, where the HDL simulation is running and consuming simulation time while periodically (every clock cycle) checking for GDB commands.
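A sketch of such a per-clock-cycle check, assuming a connected socket descriptor and a hypothetical function `stub_poll_cycle()` that a SystemVerilog always block would call through DPI-C while the simulation runs after a continue packet. It relies on the RSP convention that GDB interrupts a running target by sending a single 0x03 byte outside of any packet; any other byte is only peeked (`MSG_PEEK`), so it stays in the socket for the packet parser.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes of one poll. */
enum { STUB_ERROR = -1, STUB_IDLE = 0, STUB_DATA = 1, STUB_INTERRUPT = 2, STUB_CLOSED = 3 };

/* Non-blocking check for input from GDB, intended to be called every clock cycle. */
int stub_poll_cycle(int fd) {
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0) return STUB_CLOSED; /* GDB closed the connection */
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? STUB_IDLE : STUB_ERROR;
    if (c == 0x03) {
        (void)recv(fd, &c, 1, 0); /* consume the interrupt byte */
        return STUB_INTERRUPT;    /* stop the simulation and report to GDB */
    }
    return STUB_DATA;             /* left in the socket for the packet parser */
}
```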

SystemVerilog packet parsing

Development

TODO (simple):

add support for more packets (q* queries, v*, …),
~query for capabilities~, (DONE)
~advanced mode~, (DONE)
~mode without acknowledge~, (DONE, tested)
expand support for additional formats of partially supported packets,
write regression tests (simple but a lot of work),
integration with GUI tools,
~support for TCP sockets~, (DONE, tested)
support for handling illegal instructions,
backtracing, (WIP)
support for binary memory access packets (x/X), but this is not really necessary,
modify DUT memory access to handle byte array instead of single bytes
workarounds for Verilator issues,
LLDB support, (WIP)
~support for pausing the simulation ($stop()), to be able to see waveforms~, (DONE using the D packet, GDB detach command) ~also enabling/disabling waveform dumping,~ (DONE using a monitor packet, not tested yet)
…

TODO (difficult):

understand packet patterns used by GDB,
what state the system should start in (reset/running/breakpoint/…)?,
inserting/removing breakpoints/watchpoints and relation to step/continue,
software breakpoints inserting/removing is done with z/Z packets or m/M (memory access),
I/C (32/16 bit EBREAK instruction) breakpoints.
~check whether there are race conditions to fix~ (WIP, improved code separation),
the code currently only runs in Questa, try to port to other simulators,
~generalize access to more than one memory~ (DONE), and additional registers (CSR) ~(full generalization requires the SystemVerilog simulator to support the alias keyword),~ (not necessary, used a different approach with shadow memory)
add file storage for the trace, something like a doubly linked list
…

Leaky abstraction

A common debugger abstraction upon hitting a breakpoint is to execute all instructions up to the breakpoint address, but not the instruction at that address itself. While I was unable to find a strong statement defining this abstraction, it is evident from how software breakpoints work: the instruction at the breakpoint address is replaced by a BREAK instruction (EBREAK on RISC-V), so the original instruction has not executed when the debugger takes control. It is similar for watchpoints, but those do not have a comparable software implementation.
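To make the replacement concrete, here is a sketch for a byte-addressed, little-endian memory model: the original instruction is saved and replaced by an EBREAK of the same length (32-bit `EBREAK` = 0x00100073 or 16-bit `C.EBREAK` = 0x9002), and restored on removal. The structure and function names are hypothetical, and RISC-V instructions longer than 32 bits are ignored.

```c
#include <stdint.h>

#define RV_EBREAK   0x00100073u /* 32-bit EBREAK encoding */
#define RV_C_EBREAK 0x9002u     /* 16-bit C.EBREAK encoding */

/* Hypothetical software breakpoint record. */
typedef struct {
    uint32_t addr;
    uint32_t saved; /* original instruction */
    int      len;   /* instruction length in bytes: 2 or 4 */
} sw_breakpoint;

/* 32-bit instruction if the two LSBs are 0b11, otherwise compressed (16-bit). */
static int rv_insn_len(uint16_t low_half) {
    return ((low_half & 0x3u) == 0x3u) ? 4 : 2;
}

/* Little-endian access helpers for the memory model. */
static uint32_t load_le(const uint8_t *mem, uint32_t addr, int len) {
    uint32_t v = 0;
    for (int i = len - 1; i >= 0; i--) v = (v << 8) | mem[addr + i];
    return v;
}
static void store_le(uint8_t *mem, uint32_t addr, uint32_t v, int len) {
    for (int i = 0; i < len; i++) mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* Insert: save the instruction at addr and overwrite it with an EBREAK of the same length. */
static sw_breakpoint sw_bp_insert(uint8_t *mem, uint32_t addr) {
    sw_breakpoint bp;
    bp.addr  = addr;
    bp.len   = rv_insn_len((uint16_t)load_le(mem, addr, 2));
    bp.saved = load_le(mem, addr, bp.len);
    store_le(mem, addr, bp.len == 4 ? RV_EBREAK : RV_C_EBREAK, bp.len);
    return bp;
}

/* Remove: restore the original instruction. */
static void sw_bp_remove(uint8_t *mem, const sw_breakpoint *bp) {
    store_le(mem, bp->addr, bp->saved, bp->len);
}
```

Matching the EBREAK length to the replaced instruction keeps the following instruction intact when compressed instructions are mixed with 32-bit ones.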

While this abstraction makes sense with a common hardware debug interface, it is not the obvious approach for an entirely passive monitor of cycle-accurate CPU execution. A hardware debugger can enforce a state where all instructions up to the breakpoint have been executed and no instructions from the breakpoint on have started execution. A passive monitor, however, is unable to modify the CPU state, so instructions before, at and after the breakpoint can be in various stages of the execution pipeline. Additionally, pipelined, superscalar and OoO CPU architectures perform some form of speculative execution. In the simplest form, a pipelined CPU starts speculatively executing a branch, but flushes the pipeline if the branch was mispredicted.

Therefore the only reliable way to implement breakpoints/watchpoints, or stepping in general, within a passive monitor is to detect when instructions are retired. A passive monitor pauses the simulation at the point where the breakpoint instruction is retired, but presents to the debugger the state before the changes applied by the breakpoint instruction.

While this may sound complicated, it is far easier than the alternatives. My initial approach was to detect when the breakpoint instruction enters the instruction fetch stage, stop the simulation, and model the state changes caused by all the instructions further down the pipeline that were not yet retired. Such a model would basically reimplement the pipeline, which only makes sense for single-cycle/multi-cycle implementations without a pipeline.
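The retirement-based approach can be sketched as follows. The `retired_t` record (PC, next PC, destination GPR and its write data) is a hypothetical stand-in for whatever the CPU or a RISC-V verification interface reports per retired instruction; memory and CSR side effects are omitted. When the retired instruction is at the breakpoint address, its effects are held back, so the shadow state shown to the debugger is the state before that instruction, with the PC pointing at the breakpoint address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instruction retirement record. */
typedef struct {
    uint32_t pc;       /* address of the retired instruction */
    uint32_t next_pc;  /* PC after the instruction */
    int      rd;       /* destination GPR index, 0 if none */
    uint32_t rd_wdata; /* value written to rd */
} retired_t;

/* Shadow copy of the state presented to the debugger. */
typedef struct {
    uint32_t pc;
    uint32_t gpr[32];
} shadow_t;

typedef struct {
    shadow_t  shadow;
    retired_t pending;     /* retired by the DUT, not yet applied to the shadow */
    bool      has_pending;
    uint32_t  bp_addr;     /* single hardware breakpoint address */
} monitor_t;

static void shadow_commit(shadow_t *s, const retired_t *r) {
    if (r->rd != 0) s->gpr[r->rd] = r->rd_wdata; /* x0 is hardwired to zero */
    s->pc = r->next_pc;
}

/* Called for every retired instruction; returns true if the simulation should pause. */
static bool monitor_retire(monitor_t *m, const retired_t *r) {
    if (r->pc == m->bp_addr) {
        m->pending = *r;       /* hold back the breakpoint instruction's effects */
        m->has_pending = true;
        return true;           /* the shadow still shows the state before it */
    }
    shadow_commit(&m->shadow, r);
    return false;
}

/* Called when the debugger resumes: commit the held-back instruction first. */
static void monitor_resume(monitor_t *m) {
    if (m->has_pending) {
        shadow_commit(&m->shadow, &m->pending);
        m->has_pending = false;
    }
}
```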

The shadow copy state and execution trace are designed with the following principles:

A CPU simulator can exactly reproduce the execution sequence by using the shadow state updated by the trace. When the DUT and shadow states differ in a way that affects the execution of an instruction, the difference must be recorded in the trace log.
When stepping back in time by reverting traced instructions, it must be possible to achieve the exact same shadow state as during the forward execution.

Stepping forward/backward

Forward/backward execution of the first 3 instructions:

cnt=0 (RESET)
- INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.
- A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.
- FORWARD (as part of the RESET and breakpoint at RESET sequence):
_{shadow_apply()}
dut_step(ret) (collects first retired instruction)
push(ret) (trace trc[0])
shadow_update(0)
shadow_remember(0)
match breakpoint/watchpoint
cnt++
- _BACKWARD
cnt=1
- FORWARD:
shadow_apply(0)
dut_step(ret) (collects second retired instruction)
push(ret) (trace trc[1])
shadow_update(1)
shadow_remember(1)
match breakpoint/watchpoint
cnt++
- BACKWARD:
- responds with an error, since there is nowhere to go
cnt=2
- FORWARD:
shadow_apply(1)
dut_step(ret) (collects third retired instruction)
push(ret) (trace trc[2])
shadow_update(2)
shadow_remember(2)
match breakpoint/watchpoint
cnt++
- BACKWARD:
shadow_remember(1)
cnt--
ret = trc[0]
match breakpoint/watchpoint

There are 3 possible step operations: 1. record (during the first execution), 2. replay (re-executing instructions already in the trace), 3. revert (stepping backward).

For the execution trace the following naming scheme makes sense:

cur - current value
nxt - next value

operation	store	load	AMO
`monitor_step`	wdata → `trc[].nxt`	rdata → `trc[].cur`	wdata → `trc[].nxt` : rdata → `trc[].cur`
`shadow_record`	shadow → `trc[].cur`	`trc[].cur` → shadow	shadow → `trc[].cur` : `trc[].cur` → shadow
`shadow_update`		`trc[].cur` → shadow	`trc[].cur` → shadow
`shadow_replay`	`trc[].nxt` → shadow		`trc[].nxt` → shadow
`shadow_revert`	`trc[].cur` → shadow		`trc[].cur` → shadow

Replay applies the write data of the previous instruction, while update applies the read data of the current instruction.
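A minimal sketch of the store and load columns of the table above, assuming a word-addressed shadow memory and a single memory access per trace entry. `trc_t` and its field layout are hypothetical, and AMOs (which combine both columns) are left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_STORE, OP_LOAD } mem_op;

/* Hypothetical trace entry for the memory access of one retired instruction. */
typedef struct {
    mem_op   op;
    size_t   addr; /* word index into the shadow memory */
    uint32_t cur;  /* value before the access (load: DUT read data) */
    uint32_t nxt;  /* value after the access (store: DUT write data) */
} trc_t;

/* monitor_step: capture DUT bus data (wdata -> nxt, rdata -> cur). */
static void monitor_step(trc_t *t, uint32_t rdata, uint32_t wdata) {
    if (t->op == OP_STORE) t->nxt = wdata;
    else                   t->cur = rdata;
}

/* shadow_record: first execution of the instruction. */
static void shadow_record(uint32_t *shadow, trc_t *t) {
    if (t->op == OP_STORE) t->cur = shadow[t->addr]; /* remember the old value */
    else                   shadow[t->addr] = t->cur; /* DUT read data overrides the shadow */
}

/* shadow_update: replay, read data of the current instruction. */
static void shadow_update(uint32_t *shadow, const trc_t *t) {
    if (t->op == OP_LOAD) shadow[t->addr] = t->cur;
}

/* shadow_replay: write data of the previous instruction. */
static void shadow_replay(uint32_t *shadow, const trc_t *t) {
    if (t->op == OP_STORE) shadow[t->addr] = t->nxt;
}

/* shadow_revert: undo a store by restoring the value from before it. */
static void shadow_revert(uint32_t *shadow, const trc_t *t) {
    if (t->op == OP_STORE) shadow[t->addr] = t->cur;
}
```

Reverting a store restores `cur` and replaying it applies `nxt` again, so stepping backward and then forward reaches the same shadow state as the first execution, as required by the second design principle.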

RESET (empty trace queue)
- INIT: initialize the shadow copy (memories, PC, CSR) based on DUT reset values.
- A breakpoint or a hardcoded condition is placed at the RESET address. In accordance with the breakpoint abstraction, the simulation must retire the first instruction, therefore a step FORWARD is always performed after RESET.
- LOAD: load program into memory.
- INSPECT: the debugger will see the reset state of the SoC either before or after the program is loaded.
- FORWARD (as part of the RESET and breakpoint at RESET sequence):
dut_step(ret) (collects first retired instruction)
push(ret) (trace trc[cnt])
_{shadow_replay(cnt-1)}
shadow_record(cnt)
match(cnt) breakpoint/watchpoint
- _BACKWARD
cnt=0 (queue contains one element)
- INSPECT: the debugger will see the first instruction
- FORWARD:
shadow_replay(cnt) (0)
cnt++ (0 → 1)
dut_step(ret) (collects second retired instruction)
push(ret) (trace trc[cnt])
shadow_record(cnt)
match(cnt) breakpoint/watchpoint
- BACKWARD:
- responds with an error, since there is nowhere to go
cnt=1
- FORWARD (record):
shadow_replay(cnt) (1)
cnt++ (1 → 2)
dut_step(ret) (collects third retired instruction)
push(ret) (trace trc[cnt])
shadow_record(cnt)
match(cnt) breakpoint/watchpoint
- FORWARD (replay):
shadow_replay(cnt)
cnt++
shadow_update(cnt)
match(cnt) breakpoint/watchpoint
- BACKWARD:
cnt--
shadow_revert(1)
match(cnt) breakpoint/watchpoint
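The control flow of the walk-through above reduces to a small decision on the trace index. A sketch, assuming `cnt` is the index of the instruction the debugger currently sees and `len` is the number of trace entries (one after RESET); the names are hypothetical, and the DUT step and shadow operations are only named in comments and represented by the returned step kind.

```c
#include <stddef.h>

typedef enum { STEP_RECORD, STEP_REPLAY, STEP_REVERT, STEP_ERROR } step_kind;

/* FORWARD: replay from the trace if possible, otherwise run the DUT and record. */
static step_kind step_forward(size_t *cnt, size_t *len) {
    if (*cnt + 1 < *len) {
        (*cnt)++;   /* shadow_replay(cnt), cnt++, shadow_update(cnt) */
        return STEP_REPLAY;
    }
    (*cnt)++;       /* shadow_replay(cnt), cnt++ */
    (*len)++;       /* dut_step(ret), push(ret), shadow_record(cnt) */
    return STEP_RECORD;
}

/* BACKWARD: revert the current entry, or report an error at the start of the trace. */
static step_kind step_backward(size_t *cnt) {
    if (*cnt == 0) return STEP_ERROR; /* nowhere to go */
    (*cnt)--;       /* cnt--, shadow_revert() */
    return STEP_REVERT;
}
```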

operation	write	read	AMO
`shadow_update`		trc[].rdt → shadow
`shadow_remember`	shadow → trc[].rdt
`shadow_apply`	trc[].wdt → shadow
`shadow_revert`	shadow → trc[].rdt

VSCode integration

useExtendedRemote microsoft/vscode-cpptools#9505

VSCode and time travel debugging

First, VSCode with the vscode-cpptools extension does support stepping/continuing backward. A simple proof is this example (check the debugger buttons). However, this is not a commonly used feature, and there are no complete examples documenting the steps necessary to enable it.

Microsoft provides a document describing how a debugger extension implementing a debug adapter (DA) connects VSCode and a debugger.

VSCode ←DAP→ DA ←GDB/MI→ GDB/LLDB ←RSP→ QEMU/HDL

Using the debug adapter protocol (DAP), VSCode communicates with the debug adapter (DA) (see VS Code Debug Protocol and Debug Adapter), which uses the GDB machine interface (GDB/MI) protocol to communicate with GDB/LLDB. GDB/LLDB in turn communicates with the stub in a simulator, like QEMU or the one in this project, using RSP (GDB Remote Serial Protocol).

During Launch Sequencing, the debug adapter should ask the debugger (GDB/LLDB) about its capabilities/features.

The DAP protocol provides capabilities as an InitializeResponse to an InitializeRequest.

The DAP requests StepBack and ReverseContinue are only available if the supportsStepBack capability is true.

The DA connects to GDB using GDB/MI (machine interface). The DA should send GDB the -list-target-features command and find reverse in the response.

There are a few issues in the DA repository related to record/replay, reverse execution and GDB/MI -list-target-features command.

GDB would further communicate with a stub using GDB packets. The stub should respond to the qSupported packet with ReverseStep+;ReverseContinue+;.
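A sketch of the stub side of this negotiation: GDB sends `qSupported:` followed by a `;`-separated list of its own features, and the stub replies with its features, here including `ReverseStep+` and `ReverseContinue+`. `PacketSize` is written in hex, as RSP expects. The function names and the chosen packet size are assumptions, and the returned payload still has to be framed as `$...#checksum`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the payload of the reply to GDB's qSupported query.
 * Returns the payload length, or -1 if the packet is not qSupported
 * or the buffer is too small. */
static int reply_qsupported(const char *packet, char *out, size_t out_size, unsigned packet_size) {
    if (strncmp(packet, "qSupported", 10) != 0) return -1;
    int n = snprintf(out, out_size, "PacketSize=%x;ReverseStep+;ReverseContinue+", packet_size);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Check whether GDB listed 'feature' (e.g. "swbreak+") in its query,
 * i.e. in the ';'-separated list after "qSupported:". */
static int gdb_has_feature(const char *packet, const char *feature) {
    const char *p = strchr(packet, ':');
    size_t flen = strlen(feature);
    while (p != NULL) {
        p++; /* skip ':' or ';' */
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == flen && strncmp(p, feature, flen) == 0) return 1;
        p = end;
    }
    return 0;
}
```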

QEMU

In a separate project I attempted to enable record/replay in QEMU (for ARM) and run a Zephyr app to see whether I could get the full VSCode integration working. I could not, but I at least checked the QEMU stub response to the qSupported RSP packet, and QEMU provides the expected response.

I can use the same setup to check the GDB/MI protocol using a Python implementation of GDB/MI.

supportsSteppingGranularity is required, since the Next, StepIn, StepOut and StepBack requests take a SteppingGranularity argument.
supportsInstructionBreakpoints is required for the availability of the SetInstructionBreakpoints request.
supportsDisassembleRequest is required for the availability of the Disassemble request.
In my tests, only write watchpoints could be inserted from the VSCode dropdown menu; I would like to see read and access options in the same menu.

QEMU ARM record/replay demo with VSCode

Since QEMU record/replay functionality is not fully supported for the RISC-V ISA, this example uses the ARM ISA to showcase QEMU record/replay and time travel debugging within VSCode.

Install ARM/RISC-V cross compiler:

sudo apt install gcc-riscv64-unknown-elf gdb-riscv64-unknown-elf
sudo apt install gcc-arm-none-eabi gdb-arm-none-eabi

Install ARM/RISC-V QEMU system emulator:

sudo apt install qemu-system-riscv32 qemu-system-riscv64
sudo apt install qemu-system-arm

`qemu_arm_samples`

Press Ctrl-A, x to exit emulation.

make -C uart01/ ARMGNU=arm-none-eabi
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=record,rrfile=replay.bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -icount shift=auto,rr=replay,rrfile=replay.bin

qemu-system-arm -M versatilepb -m 128M -nographic -kernel uart01/notmain.bin -s -S

Debugging Zephyr on ARM QEMU

https://dojofive.com/blog/using-the-qemu-emulator-with-zephyr-builds-and-vscode/

References

Additional maintenance commands can be found here (useful for RESET functionality?):

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Maintenance-Commands.html

https://medium.com/@tatsuo.nomura/implement-gdb-remote-debug-protocol-stub-from-scratch-2-5e3025f0e987

Issue reports

Microsoft C/C++ DAP adapter:

GDB:

Questions:
What signal should the stub send to GDB when continue reaches the beginning of execution?
What should be the step response if there is no breakpoint/watchpoint?

QEMU:

Time travel debugging and integration with VSCode DAP adapter

Verilator:

Various stub implementations

GDBWave
Zephyr RTOS gdbstub
OpenOCD gdb_server
RISC-V based Virtual Prototype (VP) gdb-mc
Qemu gdbstub
gdbstub
GDB connection flow https://www.embecosm.com/appnotes/ean4/html/ch03s03s01.html

More notes on record/replay functionality in GDB:

GDB: Recording Inferior’s Execution and Replaying It,
GDB: Running programs backward,
Guinevere Larsen: Using GDB to time travel,
Guinevere Larsen: Advanced time manipulation with GDB,
FOSDEM 2024: Guinevere Larsen: Manipulating time with GDB (slides, video)

Major RISC-V simulators

List of major public RISC-V simulators with comments about interfacing with GDB and reverse execution.

spike communicates with GDB through JTAG and OpenOCD (GDB example), does not seem to support reverse execution,
Intel® Simics® Simulator has RISC-V support and can be connected to GDB, although reverse execution can only be done from the native console,
QEMU (GDB usage, Record/replay): the wiki page Features/record-replay indicates that record/replay is not tested for RISC-V,
the last public version of riscvOVPsim

Record/replay and reverse execution

RISC-V and deterministic record/replay tools (2018)

https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/JrJa01hihCQ/m/55rbSlpoAgAJ

How to correctly use QEMU’s record/replay functionality? (2025)

https://stackoverflow.com/questions/79670297/how-to-correctly-use-qemus-record-replay-functionality

GDB/LLDB integration with VSCode

I can see the step/continue backwards buttons.

https://www.justinmklam.com/posts/2017/10/vscode-debugger-setup/

BUILD, DEBUG, TEST

https://code.visualstudio.com/docs/debugtest/tasks

Alternative way to start the simulator (debugServerPath/debugServerArgs/serverStarted), instead of doing it as a task:

https://stackoverflow.com/questions/58048139/enable-semi-hosting-automatically-in-gdb-after-connecting-to-a-remote-target

LLDB

VSCode extension and source code, DAP protocol.

Application examples

RISC-V verification interfaces

These are interfaces exposing retired instructions, and they are useful as an abstraction layer between the CPU and the GDB stub.

DPI:

https://verificationacademy.com/forums/t/how-to-pass-time-in-systemverilog-while-waiting-for-data-on-a-socket-in-dpi/37817/2

Questa GCC issue: https://www.reddit.com/r/FPGA/comments/nfkuq6/modelsim_fatal_vsim3828_could_not_link_vsim_auto/

Socket

Linux socket send and recv.

Non-blocking: https://stackoverflow.com/questions/20588002/nonblocking-get-character

https://www.consulting.amiq.com/2020/08/14/non-blocking-socket-communication-in-systemverilog-using-dpi-c/

SV socket DPI: https://github.com/witchard/sock.sv https://github.com/xver/Shunt

The approaches in these links are CPU intensive:

Connecting to Python:

Talk about adding socket support to SystemVerilog

https://www.accellera.org/images/eda/sv-ec/0074.html

jeras/gdb_server_stub_sv

Folders and files

Latest commit

History

Repository files navigation

GDB server stub

Terminology

Integration

Parameters

Ports

Implementation details

DPI-C

SystemVerilog packet parsing

Development

Leaky abstraction

Stepping forward/backward

VSCode integration

VSCode and time travel debugging

QEMU

Related capabilities

QEMU ARM record/replay demo with VSCode

qemu_arm_samples

Debugging Zephyr on ARM QEMU

References

Issue reports

Various stub implementations

Major RISC-V simulators

Record/replay and reverse execution

GDB/LLVM integration with VSCode

LLDB

Application examples

RISC-V verification interfaces

DPI:

Socket

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`DPI-C`

`qemu_arm_samples`

Packages