Deadlocks in hvisor #228

@ZZJJWarth

How I found the deadlock

Hi, I am working on implementing virtio-rng as a device probed over PCIe. A PCIe device has to register MMIO regions in order to initialize its BAR registers.

In hvisor, we use `mmio_region_register` to assign an MMIO region at a physical memory address given by the guest OS. To call this function, you have to obtain the write lock of the zone via `this_zone`. When I used `this_zone` to call `mmio_region_register`, I found that the non-root Linux got stuck at the point where it tried to obtain the zone's write lock.

So why does obtaining the write lock of the zone cause a deadlock? I traced the call chain and found that a function called before my configuration function had already obtained the zone's write lock:

`mmio_vpci_handler`->`handle_config_space_access`->`vpci_dev_write_cfg`->`write_cfg`

`mmio_vpci_handler` obtains the write lock of the zone, and `write_cfg`, further down the same call chain, also needs to obtain it. Because the first guard is still held when the second acquisition is attempted, the second one can never succeed, which leads to a deadlock. Can't we just drop the write lock in `mmio_vpci_handler` before it calls `handle_config_space_access`? The answer is no. `mmio_vpci_handler` has mutably borrowed some members of the zone to pass into `handle_config_space_access`, and those borrows are tied to the lock guard, so the lock cannot be dropped before that call.
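The nested acquisition described above can be reproduced in a few lines. This is only a minimal sketch, not hvisor code: the `Zone` struct and its `mmio_regions` field are hypothetical stand-ins, the functions borrow the names from the call chain but not their real signatures, and `std::sync::RwLock` with `try_write` is used so that the example reports the conflict instead of hanging.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the zone; the field is illustrative only.
struct Zone {
    mmio_regions: Vec<(usize, usize)>, // (start, size)
}

// Stand-in for `write_cfg`: it takes the zone write lock on its own.
// `try_write` replaces a blocking `write` so the sketch returns `false`
// where the real code would wait forever.
fn write_cfg(zone: &Arc<RwLock<Zone>>, start: usize, size: usize) -> bool {
    match zone.try_write() {
        Ok(mut z) => {
            z.mmio_regions.push((start, size));
            true
        }
        Err(_) => false, // lock already held further up the call chain
    }
}

// Stand-in for `handle_config_space_access`: it receives a mutable borrow
// of a zone member, which keeps the caller's guard alive during the call.
fn handle_config_space_access(
    zone: &Arc<RwLock<Zone>>,
    regions: &mut Vec<(usize, usize)>,
) -> bool {
    let _ = regions.len(); // the borrowed member is in use here
    write_cfg(zone, 0x1000, 0x100)
}

// Stand-in for `mmio_vpci_handler`: holds the write guard across the call.
fn mmio_vpci_handler(zone: &Arc<RwLock<Zone>>) -> bool {
    let mut guard = zone.write().unwrap();
    handle_config_space_access(zone, &mut guard.mmio_regions)
}
```

Calling `mmio_vpci_handler` returns `false` because the inner `try_write` fails while the outer guard is alive, whereas calling `write_cfg` directly succeeds. With a blocking `write` in place of `try_write`, the inner call would never return.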

How to solve this problem

The most convenient way to solve this problem is to pass the already-held lock all the way down to `write_cfg`. However, this is ugly.

The root cause of this problem is the coarse-grained lock on the zone. Many functions depend on the zone, but even if you only need a single data member of it, you have to lock the whole zone.

This was fine when the zone was first implemented with a coarse lock. But as the zone is used more and more frequently, its locking needs to become more fine-grained.
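For reference, "passing the lock down" could look roughly like the sketch below. It assumes a simplified, hypothetical `Zone` and a simplified `mmio_region_register` that takes only a start address and a size; the real hvisor signatures are not shown in this issue and are probably different. The point is only that the callee receives `&mut Zone` from the caller's guard instead of locking again.

```rust
use std::sync::RwLock;

// Hypothetical stand-in for the zone.
struct Zone {
    mmio_regions: Vec<(usize, usize)>, // (start, size)
}

// Simplified stand-in: operates on a zone the caller has already locked.
fn mmio_region_register(zone: &mut Zone, start: usize, size: usize) {
    zone.mmio_regions.push((start, size));
}

// Takes the locked zone as a parameter instead of calling `this_zone`
// and acquiring the write lock a second time.
fn write_cfg(zone: &mut Zone, bar_addr: usize) {
    mmio_region_register(zone, bar_addr, 0x1000);
}

// The only place where the write lock is taken.
fn mmio_vpci_handler(zone_lock: &RwLock<Zone>, bar_addr: usize) {
    let mut guard = zone_lock.write().unwrap();
    // `&mut guard` coerces to `&mut Zone` through `DerefMut`.
    write_cfg(&mut guard, bar_addr);
}
```

This avoids the second acquisition, but every function between the handler and `write_cfg` has to carry the extra parameter, which is why it feels ugly.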

The solution

  1. Make the locking more fine-grained. I think it's reasonable to use `Arc<RwLock<...>>` to put separate locks on individual members of the zone.
  2. Data members that are read-only don't need a lock at all.
  3. It's worth discussing how fine-grained the locks need to be. On the one hand, fine-grained locks do solve this deadlock. On the other hand, they introduce some overhead, and having multiple locks makes new deadlocks possible, for example when two code paths take the same locks in different orders.
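A rough sketch of what points 1 and 2 might look like is given below. The field names (`id`, `mmio_regions`, `vpci`) and the `VpciState` type are assumptions made for illustration and do not reflect the actual layout of the zone in hvisor. `std::sync::RwLock` is used here only to keep the example self-contained.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical per-zone vPCI state.
struct VpciState {
    bar_values: Vec<usize>,
}

// Hypothetical zone with one lock per mutable member.
struct Zone {
    id: usize,                                 // read-only after creation: no lock
    mmio_regions: RwLock<Vec<(usize, usize)>>, // (start, size)
    vpci: RwLock<VpciState>,
}

// Only needs the MMIO region list, so it only locks that member.
fn write_cfg(zone: &Zone, bar_addr: usize) {
    zone.mmio_regions.write().unwrap().push((bar_addr, 0x1000));
}

// Holds the vPCI lock while calling down; `write_cfg` takes a different
// lock, so the nested acquisition no longer conflicts.
fn mmio_vpci_handler(zone: &Arc<Zone>, bar_addr: usize) -> usize {
    let mut vpci = zone.vpci.write().unwrap();
    vpci.bar_values.push(bar_addr);
    write_cfg(zone, bar_addr);
    zone.id
}
```

In this sketch the locks are always taken in the order `vpci` then `mmio_regions`. Keeping one fixed order across all code paths is one way to limit the new deadlock risk mentioned in point 3.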
