Deadlocks in hvisor
How I found the deadlock
Hi, I am working on implementing a virtio-rng device probed over PCIe. A PCIe device has to register an MMIO region in order to initialize its BAR registers.
In hvisor, we use `mmio_region_register` to assign a region at a physical memory address given by the guest OS. To run this function, you have to obtain the write lock of `zone` through `this_zone`. When I used `this_zone` to call `mmio_region_register`, I found that the non-root Linux got stuck as soon as it reached the code that takes the write lock of `zone`.
So why does obtaining the write lock of `zone` cause a deadlock? I traced the call chain and found that a function called before my configuration function had already taken the write lock of `zone`:
`mmio_vpci_handler`->`handle_config_space_access`->`vpci_dev_write_cfg`->`write_cfg`
`mmio_vpci_handler` takes the write lock of `zone`, and `write_cfg` also needs to take it while the first guard is still held, which inevitably leads to a deadlock. Can't we just drop the write lock in `mmio_vpci_handler` before it calls `handle_config_space_access`? The answer is no: `mmio_vpci_handler` has mutably borrowed some members of `zone` to pass into `handle_config_space_access`, so the lock cannot be dropped before that call.
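The nested acquisition can be reproduced outside hvisor. The following is a minimal sketch, not hvisor code: `std::sync::RwLock` stands in for whatever lock type hvisor uses, the `Zone` struct and its field are hypothetical, and the inner function uses `try_write` so that the conflict is reported instead of hanging.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for hvisor's zone; the field is illustrative only.
struct Zone {
    mmio_regions: Vec<(usize, usize)>, // (start, size)
}

/// Stand-in for `write_cfg`: it needs the zone write lock to register a region.
/// `try_write` is used instead of `write` so the sketch never blocks forever.
fn write_cfg(zone: &RwLock<Zone>) -> bool {
    match zone.try_write() {
        Ok(mut z) => {
            z.mmio_regions.push((0x1000_0000, 0x1000));
            true
        }
        // The lock is already held; a blocking `write()` would never return here.
        Err(_) => false,
    }
}

/// Stand-in for `mmio_vpci_handler`: it holds the write guard, and a mutable
/// borrow derived from it, across the call that ends in `write_cfg`.
fn mmio_vpci_handler(zone: &RwLock<Zone>) -> (bool, usize) {
    let mut guard = zone.write().unwrap();
    let regions = &mut guard.mmio_regions; // this borrow keeps `guard` alive
    let registered = write_cfg(zone); // second write-lock attempt on the same lock
    (registered, regions.len())
}
```

Calling `mmio_vpci_handler` on a fresh lock yields `false` for the registration, while calling `write_cfg` directly, with no outer guard held, succeeds.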
How to solve this problem
The most convenient fix is to pass the already-held lock down the call chain, all the way to `write_cfg`. However, this is ugly.
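As a rough illustration of what passing the lock down would look like, here is a sketch under stated assumptions: the function names mirror the call chain above, but their signatures, the `Zone` struct, and the `bar_addr` parameter are hypothetical, and `std::sync::RwLock` again stands in for hvisor's lock type. In the real handler, the existing mutable borrow of `zone` members would also have to be reworked to go through the same guard.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for hvisor's zone.
struct Zone {
    mmio_regions: Vec<(usize, usize)>, // (start, size)
}

/// Takes the already-locked zone, so it never touches the lock itself.
fn mmio_region_register(zone: &mut Zone, start: usize, size: usize) {
    zone.mmio_regions.push((start, size));
}

// Every function in the chain gains an extra `&mut Zone` parameter.
fn write_cfg(zone: &mut Zone, bar_addr: usize) {
    mmio_region_register(zone, bar_addr, 0x1000);
}

fn vpci_dev_write_cfg(zone: &mut Zone, bar_addr: usize) {
    write_cfg(zone, bar_addr);
}

fn handle_config_space_access(zone: &mut Zone, bar_addr: usize) {
    vpci_dev_write_cfg(zone, bar_addr);
}

/// The write lock is taken exactly once, at the top of the chain.
fn mmio_vpci_handler(zone_lock: &RwLock<Zone>, bar_addr: usize) -> usize {
    let mut guard = zone_lock.write().unwrap();
    // `&mut guard` coerces to `&mut Zone` through `DerefMut`.
    handle_config_space_access(&mut guard, bar_addr);
    guard.mmio_regions.len()
}
```

The deadlock disappears because only one acquisition happens, but every intermediate signature now carries the zone, which is the ugliness mentioned above.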
The root cause of this problem is the coarse-grained lock on `zone`. Many functions depend on `zone`, and even if a function needs only a single data member, it has to take the lock on the whole `zone`.
This was fine when `zone` was first implemented with a coarse lock. But as `zone` is used more and more frequently, the locking needs to become more fine-grained.
The solution
- Make it more fine-grained. I think it's reasonable to use `Arc<RwLock<...>>` to get more fine-grained locks on `zone`.
- For data members that are read-only, we don't need to put a lock on them.
- It's worth discussing how fine-grained the locks need to be. On the one hand, fine-grained locks do solve the deadlock. On the other hand, they introduce some overhead, and having multiple locks is prone to causing new deadlocks.
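To make the proposal concrete, here is a minimal sketch of a zone with per-member locks. The field names, `VpciState`, and the function signatures are hypothetical and only illustrate the idea. `std::sync::RwLock` is used so the example runs on its own, and hvisor's actual lock type may differ.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical vPCI state guarded by its own lock.
struct VpciState {
    cfg_writes: usize,
}

/// Hypothetical fine-grained zone: each mutable member has its own lock.
struct Zone {
    id: usize, // read-only after creation, so no lock is needed
    vpci: Arc<RwLock<VpciState>>,
    mmio_regions: Arc<RwLock<Vec<(usize, usize)>>>, // (start, size)
}

impl Zone {
    fn new(id: usize) -> Self {
        Zone {
            id,
            vpci: Arc::new(RwLock::new(VpciState { cfg_writes: 0 })),
            mmio_regions: Arc::new(RwLock::new(Vec::new())),
        }
    }
}

/// Locks only the MMIO region list.
fn write_cfg(zone: &Zone, bar_addr: usize) {
    zone.mmio_regions.write().unwrap().push((bar_addr, 0x1000));
}

/// Locks only the vPCI state, then calls down the chain.
/// The two locks are distinct, so the nested call no longer blocks.
fn mmio_vpci_handler(zone: &Zone, bar_addr: usize) {
    let mut vpci = zone.vpci.write().unwrap();
    vpci.cfg_writes += 1;
    write_cfg(zone, bar_addr); // takes `mmio_regions` while `vpci` is held
}
```

Note that this sketch always takes `vpci` before `mmio_regions`. If another path took them in the opposite order, two CPUs could deadlock on each other, which is the new class of problem the last bullet warns about. A fixed lock order is one common way to guard against it.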