Description
Bug Report
I am currently running a Debian system on the sysoul-x3300 platform (based on RK3588). During memory stress testing with memtester, I observed a critical stability issue.
When the tested memory size exceeds 4 GiB (total memory is 16 GiB, 15 GiB available; testing 1 GiB or 2 GiB works fine), an unhandled MMIO fault in zone 0 is frequently triggered, even though the accessed memory address is correctly configured as belonging to zone 0 in board.rs.
Logs
root@linaro-alip:/root# free -h
total used free shared buff/cache available
Mem: 15Gi 342Mi 14Gi 21Mi 342Mi 15Gi
Swap: 0B 0B 0B
root@linaro-alip:/root# memtester 12G 1
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 12288MB (12884901888 bytes)
got 12288MB (12884901888 bytes), trying mlock ...[WARN 1] (hvisor::memory::mmio:109) Zone 0 unhandled mmio fault MMIOAccess {
address: 0xb600000,
size: 0x1,
is_write: true,
value: 0xffffff800b600000,
}
[ERROR 1] (hvisor::arch::aarch64::trap:251) mmio_handle_access: [src/memory/mmio.rs:110:13] Invalid argument
[ERROR 1] (hvisor::panic:24) panic occurred: PanicInfo {
payload: Any { .. },
message: Some(
root zone has some error,
),
location: Location {
file: "src/zone.rs",
line: 303,
col: 9,
},
can_unwind: true,
force_no_backtrace: false,
}
[WARN 0] (hvisor::memory::mmio:109) Zone 0 unhandled mmio fault MMIOAccess {
address: 0x3ec045a08,
size: 0x1,
is_write: false,
value: 0x0,
}
[ERROR 0] (hvisor::arch::aarch64::trap:251) mmio_handle_access: [src/memory/mmio.rs:110:13] Invalid argument
[ERROR 0] (hvisor::panic:24) panic occurred: PanicInfo {
payload: Any { .. },
message: Some(
root zone has some error,
),
location: Location {
file: "src/zone.rs",
line: 303,
col: 9,
},
can_unwind: true,
force_no_backtrace: false,
}
Configuration (board.rs)
/// The physical memory layout of the board.
/// Each address should align to 2M (0x20_0000).
/// Addresses must be in ascending order.
#[rustfmt::skip]
pub const BOARD_PHYSMEM_LIST: &[(u64, u64, MemoryType)] = &[
// ( start, end, type)
( 0x0000_0000, 0x0020_0000, MemoryType::Device), // Includes low-address SRAM, marked as Device
( 0x0020_0000, 0x0840_0000, MemoryType::Normal),
( 0x0940_0000, 0xf000_0000, MemoryType::Normal),
( 0xf000_0000, 0x1_0000_0000, MemoryType::Device), // Dense device region, marked as Device.
(0x1_0000_0000, 0x3_fc00_0000, MemoryType::Normal),
// (0x3_fc50_0000, 0x3_fff0_0000, MemoryType::Normal),
(0x3_fc40_0000, 0x4_0000_0000, MemoryType::Normal), // aligned to 2 MiB
(0x4_f000_0000, 0x5_0000_0000, MemoryType::Normal),
];

pub const ROOT_ZONE_MEMORY_REGIONS: &[HvConfigMemoryRegion] = &[
// /proc/iomem System RAM
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x0020_0000,
virtual_start: 0x0020_0000,
size: 0x0820_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x0940_0000,
virtual_start: 0x0940_0000,
size: 0xe6c0_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x1_0000_0000,
virtual_start: 0x1_0000_0000,
size: 0x2_fc00_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x3_fc50_0000,
virtual_start: 0x3_fc50_0000,
size: 0x03a0_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x4_f000_0000,
virtual_start: 0x4_f000_0000,
size: 0x1000_0000,
},
// Ramoops
HvConfigMemoryRegion {
mem_type: MEM_TYPE_RAM,
physical_start: 0x0011_0000,
virtual_start: 0x0011_0000,
size: 0x000f_0000,
},
// /proc/iomem Devices I/O
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0xfb00_0000,
virtual_start: 0xfb00_0000,
size: 0x0020_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0xfc00_0000,
virtual_start: 0xfc00_0000,
size: 0x0200_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0xfe00_0000,
virtual_start: 0xfe00_0000,
size: 0x0060_0000,
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0xfea0_0000,
virtual_start: 0xfea0_0000,
size: 0x0050_0000,
},
// SRAM and Other Devices
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0x0010_f000,
virtual_start: 0x0010_f000,
// size: 0x0100, // 10f000.sram
size: 0x1000, // aligned with page size
},
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0xff00_1000,
virtual_start: 0xff00_1000,
size: 0x000e_e000, //ff001000.sram
},
// Unknown Region, maybe we should ask vendor for help
HvConfigMemoryRegion {
mem_type: MEM_TYPE_IO,
physical_start: 0x0010_0000,
virtual_start: 0x0010_0000,
size: 0xf000,
},
];

Root Cause Analysis
Upon investigation, the root cause is that the reserved-memory area for the hypervisor in the device tree is too small, which allows the root-linux kernel to corrupt hvisor's runtime memory.
According to src/consts.rs, the memory layout of hvisor consists of:
- Static binary code (.text, .data, etc.)
- Per-CPU local storage (stack, etc.)
- Frame Allocator Memory Pool
Source Code Reference (src/consts.rs):
pub use crate::memory::PAGE_SIZE;
use crate::{memory::addr::VirtAddr, platform::BOARD_NCPUS};
/// Size of the hypervisor heap.
pub const HV_HEAP_SIZE: usize = 1024 * 1024; // 1 MiB
pub const HV_MEM_POOL_SIZE: usize = 64 * 1024 * 1024; // 64 MiB
/// Size of the per-CPU data (stack and other CPU-local data).
pub const PER_CPU_SIZE: usize = 512 * 1024; // 512 KiB
/// ... (omitted)
pub fn mem_pool_start() -> VirtAddr {
core_end() + MAX_CPU_NUM * PER_CPU_SIZE
}
pub fn hv_end() -> VirtAddr {
mem_pool_start() + HV_MEM_POOL_SIZE
}

Memory Layout Calculation (sysoul-x3300, 8 CPUs):
- Start Address: 0x0050_0000
- core_end (binary end): 0x006e_6000
- mem_pool_start: 0x00ae_6000
  - Calculation: core_end + (512 KiB * 8 CPUs) ≈ 0x006e_6000 + 4 MiB
- hv_end: 0x04ae_6000
  - Calculation: mem_pool_start + 64 MiB (Frame Allocator)
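For reference, these values can be recomputed from the constants above. The following is a minimal standalone sketch (not hvisor code), assuming the binary end of 0x006e_6000 observed on this build; it will differ for other images:

// Minimal standalone sketch (not hvisor code): recompute the layout for
// sysoul-x3300 with 8 CPUs. CORE_END is the observed binary end on this
// build and will differ for other images.
const HV_START: usize = 0x0050_0000; // hypervisor load address
const CORE_END: usize = 0x006e_6000; // end of the static binary (observed)
const MAX_CPU_NUM: usize = 8;
const PER_CPU_SIZE: usize = 512 * 1024; // 512 KiB per CPU
const HV_MEM_POOL_SIZE: usize = 64 * 1024 * 1024; // 64 MiB frame allocator pool

fn main() {
    let mem_pool_start = CORE_END + MAX_CPU_NUM * PER_CPU_SIZE; // 0x00ae_6000
    let hv_end = mem_pool_start + HV_MEM_POOL_SIZE;             // 0x04ae_6000
    let required = hv_end - HV_START;                           // 0x045e_6000 ≈ 70 MiB
    println!("mem_pool_start = {mem_pool_start:#x}");
    println!("hv_end         = {hv_end:#x}");
    println!("required reservation = {required:#x} bytes (~70 MiB)");
}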
The Discrepancy:
The actual required memory range extends up to 0x04ae_6000 (approx. 70 MiB total). However, most existing device tree configurations only reserve 4 MiB for hvisor.
%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'arial', 'fontSize': '14px'}}}%%
flowchart LR
classDef memBlock fill:#e3f2fd,stroke:#1565c0,stroke-width:1px;
classDef boundaryNode fill:none,stroke:none,color:#555,font-size:12px;
classDef dangerBlock fill:#ffcdd2,stroke:#b71c1c,stroke-width:2px;
subgraph Reserved ["✅ Reserved Memory (Safe: 4 MiB)<br/>Range: 0x0050_0000 ~ 0x0090_0000"]
direction LR
StartAddr["0x0050_0000"]:::boundaryNode
Bin["Static Bin<br/>(~1.9 MiB)<br/>End: 0x006E_6000"]:::memBlock
C0["CPU 0<br/>512 KiB"]:::memBlock
C1["CPU 1<br/>512 KiB"]:::memBlock
C2["CPU 2<br/>512 KiB"]:::memBlock
C3["CPU 3<br/>512 KiB<br/>End: 0x008E_6000"]:::memBlock
StartAddr --- Bin --- C0 --- C1 --- C2 --- C3
end
subgraph Unreserved ["❌ Unreserved Region (Unsafe / MMIO Fault Risk)<br/>Range: 0x0090_0000 ~ 0x04AE_6000"]
direction LR
C4["CPU 4<br/>(Cross Boundary)<br/>Start: 0x008E_6000"]:::dangerBlock
C5["CPU 5<br/>512 KiB"]:::dangerBlock
C6["CPU 6<br/>512 KiB"]:::dangerBlock
C7["CPU 7<br/>512 KiB"]:::dangerBlock
PoolStartAddr["0x00AE_6000"]:::boundaryNode
FrameAlloc["Frame Allocator Pool<br/>Size: 64 MiB<br/>(Target of Corruption)"]:::dangerBlock
EndAddr["0x04AE_6000"]:::boundaryNode
C4 --- C5 --- C6 --- C7 --- PoolStartAddr --- FrameAlloc --- EndAddr
end
C3 --- C4
style Reserved fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5
style Unreserved fill:#ffebee,stroke:#c62828,stroke-width:2px,stroke-dasharray: 5 5
Failure Mechanism
- The reserved 4 MiB covers the static binary and potentially the per-CPU data for the first few cores, but completely fails to cover the 64 MiB Frame Allocator.
- hvisor uses this Frame Allocator to manage memory regions via a BTree structure.
- When running memtester with large memory blocks, the root-linux kernel allocates pages that physically overlap with hvisor's unreserved Frame Allocator region.
- Linux overwrites the Frame Allocator data, corrupting the BTree metadata used for zone memory region tracking.
- Consequently, hvisor loses track of valid memory regions, resulting in false MMIO faults when those addresses are accessed (a toy sketch of this effect follows below).
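To make the last two points concrete, here is a deliberately simplified toy in plain Rust; it does not reflect hvisor's actual data structures or code paths, only the effect: region lookups are served from a BTree, and once the memory backing that BTree is overwritten, addresses that were perfectly valid start to be reported as unhandled MMIO accesses.

use std::collections::BTreeMap;

/// Toy lookup: find the region (start, size) that contains `addr`, if any.
fn find_region(regions: &BTreeMap<u64, u64>, addr: u64) -> Option<(u64, u64)> {
    // Take the last region starting at or below `addr` and check that it covers it.
    let (&start, &size) = regions.range(..=addr).next_back()?;
    if addr < start + size {
        Some((start, size))
    } else {
        None
    }
}

fn main() {
    // One root-zone RAM region from board.rs: 0x0940_0000 .. 0xf000_0000.
    let mut regions = BTreeMap::new();
    regions.insert(0x0940_0000u64, 0xe6c0_0000u64);

    // The faulting address from the log lies inside that region, so the
    // lookup succeeds while the metadata is intact.
    assert!(find_region(&regions, 0x0b60_0000).is_some());

    // Stand-in for the guest overwriting the allocator-backed metadata.
    regions.clear();

    // The very same address now looks like an unhandled MMIO access.
    assert!(find_region(&regions, 0x0b60_0000).is_none());
}

In the real failure the corruption hits the frame-allocator pool that backs these structures rather than a clean clear(), but the observable effect is the same: spurious "unhandled mmio fault" reports for addresses that are configured correctly.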
Why it seemed to work before:
- Luck: The specific physical pages used by the Frame Allocator were not allocated/overwritten by Linux during lighter loads.
- Partial Coverage: The 4 MiB reservation covers the binary and initial CPU stacks. Since root-linux often utilizes fewer cores (e.g., 2 cores) during boot or idle, the per-CPU data for the active cores remained safe within the reserved area.
Action Items
To resolve this issue and prevent future occurrences, the following actions are required:
- Configuration Fix: Update all existing board configurations and Device Trees (DTS) to reserve sufficient memory, covering the full 64 MiB pool plus the per-CPU areas (an illustrative reserved-memory sketch follows this list).
- CI/CD Enhancement: Integrate memtester into the CI system test workflow. The root-linux should perform memory stress tests immediately after boot to ensure memory integrity before proceeding with other tests. This also explains the high failure rate in past CI runs.
- Documentation: Update the hvisor-book to explicitly document the static and runtime memory layout, and add a guide on how to correctly calculate and configure reserved-memory in the device tree.
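For the Configuration Fix item, a device-tree reservation along the following lines should cover the full range computed above. This is only a sketch: the node name is arbitrary, and the start address and size must be matched to each board's actual hvisor load address and image (here 0x0050_0000 plus 70 MiB, i.e. rounded up past hv_end to a 2 MiB boundary):

/* Sketch only: reserve the full hvisor range instead of 4 MiB.
 * 0x0050_0000 + 0x0460_0000 (70 MiB) = 0x04b0_0000 >= hv_end (0x04ae_6000). */
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    hvisor_reserved: hvisor@500000 {
        reg = <0x0 0x00500000 0x0 0x04600000>;
        no-map;
    };
};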