172 changes: 15 additions & 157 deletions README.md
@@ -3,7 +3,7 @@ libvfio-user

vfio-user is a framework that allows implementing PCI devices in userspace.
Clients (such as [qemu](https://qemu.org)) talk the [vfio-user
protocol](https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02458.html)
protocol](https://www.qemu.org/docs/master/interop/vfio-user.html)
over a UNIX socket to a server. This library, `libvfio-user`, provides an API
for implementing such servers.
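
At its core, a server creates a context bound to a UNIX socket, describes the
device, and then services client requests on that context. The following is a
minimal, hypothetical sketch only — it assumes the lifecycle calls declared in
[include/libvfio-user.h](./include/libvfio-user.h) (`vfu_create_ctx`,
`vfu_pci_init`, `vfu_realize_ctx`, `vfu_attach_ctx`, `vfu_run_ctx`,
`vfu_destroy_ctx`) and omits all device-specific setup; see the
[samples](./samples/) for complete servers.

```
#include <err.h>
#include <stdlib.h>

#include "libvfio-user.h"

int
main(void)
{
    /* One context per emulated device, bound to a UNIX socket path. */
    vfu_ctx_t *ctx = vfu_create_ctx(VFU_TRANS_SOCK, "/tmp/vfio-user.sock", 0,
                                    NULL, VFU_DEV_TYPE_PCI);
    if (ctx == NULL) {
        err(EXIT_FAILURE, "vfu_create_ctx");
    }

    /* Describe a conventional PCI endpoint (header type 0). */
    if (vfu_pci_init(ctx, VFU_PCI_TYPE_CONVENTIONAL,
                     0 /* PCI_HEADER_TYPE_NORMAL */, 0) < 0) {
        err(EXIT_FAILURE, "vfu_pci_init");
    }

    /* BAR, IRQ and capability setup would go here; then finalize. */
    if (vfu_realize_ctx(ctx) < 0) {
        err(EXIT_FAILURE, "vfu_realize_ctx");
    }

    /* Wait for a client (e.g. qemu) to connect, then service its requests. */
    if (vfu_attach_ctx(ctx) < 0) {
        err(EXIT_FAILURE, "vfu_attach_ctx");
    }
    while (vfu_run_ctx(ctx) >= 0) {
        ;
    }

    vfu_destroy_ctx(ctx);
    return 0;
}
```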

@@ -68,32 +68,27 @@ ninja -C build

Finally build your program and link with `libvfio-user.so`.

Supported features
==================
Using the library
=================

With the client support found in
[cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) or the
in-development [qemu](https://gitlab.com/qemu-project/qemu) support, most guest
VM use cases will work. See below for some details on how to try this out.
qemu
----

However, guests with an IOMMU (vIOMMU) will not currently work: the number of
DMA regions is strictly limited, and there are also issues with some server
implementations such as SPDK's virtual NVMe controller.
Step-by-step instructions for using `libvfio-user` with `qemu` can be [found
here](docs/qemu.md).

Currently, `libvfio-user` has explicit support for PCI devices only. In
addition, only PCI endpoints are supported (no bridges etc.).
See also [libvirt](docs/libvirt.md).

API
===
SPDK
----

The API is currently documented via the [libvfio-user header file](./include/libvfio-user.h),
along with some additional [documentation](docs/).
SPDK uses `libvfio-user` to implement a virtual NVMe controller: see
[SPDK and libvfio-user](docs/spdk.md) for more details.

The library (and the protocol) are actively under development, and should not
yet be considered a stable API or interface.
Developing with the library
===========================

The API is not thread safe, but individual `vfu_ctx_t` handles can be
used separately by each thread: that is, there is no global library state.
See [Developing with libvfio-user](./docs/develop.md).

Mailing List & Chat
===================
@@ -122,143 +117,6 @@ merging, a Coverity scan is also done.

See [Testing](docs/testing.md) for details on how the library is tested.

Examples
========

The [samples directory](./samples/) contains various libvfio-user examples.

lspci
-----

[lspci](./samples/lspci.c) implements an example of how to dump the PCI header
of a libvfio-user device and examine it with lspci(8):

```
# lspci -vv -F <(build/samples/lspci)
00:00.0 Non-VGA unclassified device: Device 0000:0000
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: I/O ports at <unassigned> [disabled]
Region 1: I/O ports at <unassigned> [disabled]
Region 2: I/O ports at <unassigned> [disabled]
Region 3: I/O ports at <unassigned> [disabled]
Region 4: I/O ports at <unassigned> [disabled]
Region 5: I/O ports at <unassigned> [disabled]
Capabilities: [40] Power Management version 0
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
```

The above sample implements a very simple PCI device that supports the Power
Management PCI capability. The sample can be trivially modified to change the
PCI configuration space header and add more PCI capabilities.


Client/Server Implementation
----------------------------

[Client](./samples/client.c)/[server](./samples/server.c) implement a basic
client/server model that exercises the core vfio-user operations.

The server implements a device that can be programmed to trigger interrupts
(INTx) to the client. This is done by writing the desired time in seconds since
the Epoch to BAR0. The server then triggers an eventfd-based IRQ followed by a
message-based one (to demonstrate how it's done when passing file descriptors
isn't possible or desirable). The device also works as memory storage: BAR1 can
be freely written to and read from by the host.

Since this is a completely made-up device, there's no kernel driver (yet).
[Client](./samples/client.c) implements a client that knows how to drive this
particular device (a role that would normally be played by QEMU + guest VM +
kernel driver).

The client exercises all commands in the vfio-user protocol, and then proceeds
to perform live migration. The client spawns the destination server (this would
normally be done by libvirt) and then migrates the device state, before
switching entirely to the destination server. We re-use the source client
instead of spawning a destination one, since spawning the destination client is
something libvirt/QEMU would normally do.

To spice things up, the client programs the source server to trigger an
interrupt and then migrates to the destination server; the programmed interrupt
is delivered by the destination server. Also, while the device is being live
migrated, the client spawns a thread that constantly writes to BAR1 in a tight
loop. This thread emulates the guest VM accessing the device while the main
thread (what would normally be QEMU) is driving the migration.

Start the source server as follows (pick whatever you like for
`/tmp/vfio-user.sock`):

```
rm -f /tmp/vfio-user.sock* ; build/samples/server -v /tmp/vfio-user.sock
```

And then the client:

```
build/samples/client /tmp/vfio-user.sock
```

After a couple of seconds the client will start live migration. The source
server will exit and the destination server will start; watch the client
terminal for destination server messages.

shadow_ioeventfd_server
-----------------------

`shadow_ioeventfd_server.c` and `shadow_ioeventfd_speed_test.c` demonstrate the
benefits of a shadow ioeventfd; see [ioregionfd](./docs/ioregionfd.md) for more
information.


Other usage notes
=================

qemu
----

Step-by-step instructions for using `libvfio-user` with `qemu` can be [found
here](docs/qemu.md).

SPDK
----

SPDK uses `libvfio-user` to implement a virtual NVMe controller: see
[docs/spdk.md](docs/spdk.md) for more details.

libvirt
-------

You can configure `vfio-user` devices in a `libvirt` domain configuration:

1. Add `xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'` to the `domain`
element.

2. Enable sharing of the guest's RAM:

```xml
<memoryBacking>
<source type='file'/>
<access mode='shared'/>
</memoryBacking>
```

3. Pass the vfio-user device:

```xml
<qemu:commandline>
<qemu:arg value='-device'/>
<qemu:arg value='vfio-user-pci,socket=/var/run/vfio-user.sock,x-enable-migration=on'/>
</qemu:commandline>
```

Live migration
--------------

The `master` branch of `libvfio-user` implements live migration with a protocol
based on vfio's v2 protocol. Currently, there is no support for this in any qemu
client. For current use cases that support live migration, such as SPDK, you
should refer to the [migration-v1 branch](https://github.com/nutanix/libvfio-user/tree/migration-v1).

History
=======

41 changes: 41 additions & 0 deletions docs/develop.md
@@ -0,0 +1,41 @@
Developing with libvfio-user
============================

The API is currently documented via the [libvfio-user header file](../include/libvfio-user.h),
along with some additional [documentation](./).

The library is actively under development, and should not yet be considered a
stable API/ABI.

The protocol itself can be considered stable and will not break backwards
compatibility. See the QEMU repository for the [canonical protocol
definition](https://www.qemu.org/docs/master/interop/vfio-user.html).

The API is not thread safe, but individual `vfu_ctx_t` handles can be
used separately by each thread: that is, there is no global library state.
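
For example, a multi-device server can give each device its own context and
service it from a dedicated thread. The sketch below is illustrative only — it
assumes the lifecycle calls declared in
[libvfio-user.h](../include/libvfio-user.h), uses made-up socket paths, and
elides real device setup (regions, IRQs, capabilities).

```
#include <err.h>
#include <pthread.h>
#include <stdlib.h>

#include "libvfio-user.h"

/* Each thread owns exactly one vfu_ctx_t; no state is shared across threads. */
static void *
serve_device(void *arg)
{
    const char *sock = arg;

    vfu_ctx_t *ctx = vfu_create_ctx(VFU_TRANS_SOCK, sock, 0, NULL,
                                    VFU_DEV_TYPE_PCI);
    if (ctx == NULL) {
        err(EXIT_FAILURE, "vfu_create_ctx(%s)", sock);
    }

    if (vfu_pci_init(ctx, VFU_PCI_TYPE_CONVENTIONAL,
                     0 /* PCI_HEADER_TYPE_NORMAL */, 0) < 0) {
        err(EXIT_FAILURE, "vfu_pci_init(%s)", sock);
    }

    /* Further per-device setup (regions, IRQs, capabilities) elided. */

    if (vfu_realize_ctx(ctx) < 0 || vfu_attach_ctx(ctx) < 0) {
        err(EXIT_FAILURE, "realize/attach %s", sock);
    }

    /* Service vfio-user requests for this device only. */
    while (vfu_run_ctx(ctx) >= 0) {
        ;
    }

    vfu_destroy_ctx(ctx);
    return NULL;
}

int
main(void)
{
    /* Illustrative socket paths, one per emulated device. */
    const char *socks[] = { "/tmp/dev0.sock", "/tmp/dev1.sock" };
    pthread_t tid[2];

    for (int i = 0; i < 2; i++) {
        pthread_create(&tid[i], NULL, serve_device, (void *)socks[i]);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(tid[i], NULL);
    }
    return 0;
}
```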

See [Accessing memory with libvfio-user](memory-mapping.md) for more details on
how to manage memory.

See [Examples](examples.md) for some simple examples of using the library.

Supported features
------------------

With the client support found in
[cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) or
[qemu](https://gitlab.com/qemu-project/qemu), most guest VM use cases will work.

However, guests with an IOMMU (vIOMMU) will not currently work: the number of
DMA regions is strictly limited, and there are also issues with some server
implementations such as SPDK's virtual NVMe controller.

Currently, `libvfio-user` has explicit support for PCI devices only. In
addition, only PCI endpoints are supported (no bridges etc.).

Live migration
--------------

The `master` branch of `libvfio-user` implements live migration with a protocol
based on vfio's v2 protocol. Currently, there is no support for this in any qemu
client. Contributions are welcome!
88 changes: 88 additions & 0 deletions docs/examples.md
@@ -0,0 +1,88 @@
Examples
========

The [samples directory](../samples/) contains various libvfio-user examples.

lspci
-----

[lspci](../samples/lspci.c) implements an example of how to dump the PCI header
of a libvfio-user device and examine it with lspci(8):

```
# lspci -vv -F <(build/samples/lspci)
00:00.0 Non-VGA unclassified device: Device 0000:0000
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: I/O ports at <unassigned> [disabled]
Region 1: I/O ports at <unassigned> [disabled]
Region 2: I/O ports at <unassigned> [disabled]
Region 3: I/O ports at <unassigned> [disabled]
Region 4: I/O ports at <unassigned> [disabled]
Region 5: I/O ports at <unassigned> [disabled]
Capabilities: [40] Power Management version 0
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
```

The above sample implements a very simple PCI device that supports the Power
Management PCI capability. The sample can be trivially modified to change the
PCI configuration space header and add more PCI capabilities.
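
For instance, a hypothetical tweak along those lines might set custom IDs and
append a vendor-specific capability. This is a hedged sketch assuming
`vfu_pci_set_id()` and `vfu_pci_add_capability()` as declared in
[libvfio-user.h](../include/libvfio-user.h), with made-up IDs and payload; see
[lspci.c](../samples/lspci.c) for the real setup code.

```
#include <err.h>
#include <stdint.h>
#include <stdlib.h>

#include "libvfio-user.h"

/* Call after vfu_pci_init() and before vfu_realize_ctx(). */
static void
customize_config_space(vfu_ctx_t *ctx)
{
    /* Made-up vendor/device/subsystem IDs, purely for illustration. */
    vfu_pci_set_id(ctx, 0x4242, 0x0001, 0x4242, 0x0001);

    /*
     * A vendor-specific capability (ID 0x09): cap ID, next pointer (filled in
     * by the library), total length, then payload bytes. Passing pos == 0
     * asks the library to pick where in config space to place it.
     */
    uint8_t vsc[] = { 0x09, 0x00, 0x06, 0xde, 0xad, 0xbe };

    if (vfu_pci_add_capability(ctx, 0, 0, vsc) < 0) {
        err(EXIT_FAILURE, "vfu_pci_add_capability");
    }
}
```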


Client/Server Implementation
----------------------------

[Client](../samples/client.c)/[server](../samples/server.c) implement a basic
client/server model that exercises the core vfio-user operations.

The server implements a device that can be programmed to trigger interrupts
(INTx) to the client. This is done by writing the desired time in seconds since
the Epoch to BAR0. The server then triggers an eventfd-based IRQ followed by a
message-based one (to demonstrate how it's done when passing file descriptors
isn't possible or desirable). The device also works as memory storage: BAR1 can
be freely written to and read from by the host.
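
As a rough sketch of how such a BAR0 handler could be wired up with the
library — assuming `vfu_setup_region()`, `vfu_setup_device_nr_irqs()` and
`vfu_irq_trigger()` as declared in
[libvfio-user.h](../include/libvfio-user.h); this is not the actual
[server.c](../samples/server.c) code, and the BAR size is arbitrary:

```
#include <err.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#include "libvfio-user.h"

static time_t irq_due;  /* seconds-since-the-Epoch value last written to BAR0 */

/* BAR0 access callback: a write records when the interrupt should fire. */
static ssize_t
bar0_access(vfu_ctx_t *ctx, char *buf, size_t count, loff_t offset,
            bool is_write)
{
    if (is_write && offset == 0 && count == sizeof(irq_due)) {
        memcpy(&irq_due, buf, sizeof(irq_due));
    }
    return count;
}

static void
setup_bar0_and_intx(vfu_ctx_t *ctx)
{
    /* A small, fully-emulated BAR0 plus a single INTx interrupt. */
    if (vfu_setup_region(ctx, VFU_PCI_DEV_BAR0_REGION_IDX, 0x1000,
                         &bar0_access, VFU_REGION_FLAG_RW, NULL, 0, -1, 0) < 0) {
        err(EXIT_FAILURE, "vfu_setup_region");
    }
    if (vfu_setup_device_nr_irqs(ctx, VFU_DEV_INTX_IRQ, 1) < 0) {
        err(EXIT_FAILURE, "vfu_setup_device_nr_irqs");
    }
}

/* Elsewhere, once time(NULL) >= irq_due, the server would call:
 *     vfu_irq_trigger(ctx, 0);
 */
```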

Since this is a completely made-up device, there's no kernel driver (yet).
[Client](../samples/client.c) implements a client that knows how to drive this
particular device (a role that would normally be played by QEMU + guest VM +
kernel driver).

The client exercises all commands in the vfio-user protocol, and then proceeds
to perform live migration. The client spawns the destination server (this would
normally be done by libvirt) and then migrates the device state, before
switching entirely to the destination server. We re-use the source client
instead of spawning a destination one, since spawning the destination client is
something libvirt/QEMU would normally do.

To spice things up, the client programs the source server to trigger an
interrupt and then migrates to the destination server; the programmed interrupt
is delivered by the destination server. Also, while the device is being live
migrated, the client spawns a thread that constantly writes to BAR1 in a tight
loop. This thread emulates the guest VM accessing the device while the main
thread (what would normally be QEMU) is driving the migration.

Start the source server as follows (pick whatever you like for
`/tmp/vfio-user.sock`):

```
rm -f /tmp/vfio-user.sock* ; build/samples/server -v /tmp/vfio-user.sock
```

And then the client:

```
build/samples/client /tmp/vfio-user.sock
```

After a couple of seconds the client will start live migration. The source
server will exit and the destination server will start; watch the client
terminal for destination server messages.

shadow_ioeventfd_server
-----------------------

`shadow_ioeventfd_server.c` and `shadow_ioeventfd_speed_test.c` demonstrate the
benefits of a shadow ioeventfd; see [ioregionfd](./ioregionfd.md) for more
information.


25 changes: 25 additions & 0 deletions docs/libvirt.md
@@ -0,0 +1,25 @@
libvirt
=======

You can configure `vfio-user` devices in a `libvirt` domain configuration:

1. Add `xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'` to the `domain`
element.

2. Enable sharing of the guest's RAM:

```xml
<memoryBacking>
<source type='file'/>
<access mode='shared'/>
</memoryBacking>
```

3. Pass the vfio-user device:

```xml
<qemu:commandline>
<qemu:arg value='-device'/>
<qemu:arg value='{"driver":"vfio-user-pci","socket":{"path": "/tmp/vfio-user.sock", "type": "unix"}'/>
</qemu:commandline>
```
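
Putting the pieces together, the relevant parts of the domain definition end up
looking roughly like this (an illustrative skeleton only — the domain name,
memory size and socket path are placeholders):

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>vfio-user-demo</name>
  <memory unit='GiB'>2</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>
  <!-- ...the usual os/devices configuration goes here... -->
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='{"driver":"vfio-user-pci","socket":{"path": "/tmp/vfio-user.sock", "type": "unix"}}'/>
  </qemu:commandline>
</domain>
```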
9 changes: 0 additions & 9 deletions docs/meson.build

This file was deleted.

11 changes: 6 additions & 5 deletions docs/qemu.md
@@ -8,16 +8,17 @@ device.
Building qemu
-------------

You will need QEMU 10.1 plus a small fix. Let's build it:
You will need QEMU 10.1.1 or later. Let's build it:

```
cd ~/src/qemu
git clone https://github.com/jlevon/qemu.git -b fix-class-code .
cd ~/src/
curl -L https://download.qemu.org/qemu-10.1.1.tar.xz | tar xJf -
cd ~/src/qemu-10.1.1

./configure --enable-kvm --enable-vnc --target-list=x86_64-softmmu --enable-trace-backends=log --enable-debug
make -j
```


Starting the server
-------------------

@@ -51,7 +52,7 @@ Now use the qemu you've built to start the VM as follows:
-kernel ./bzImage \
-hda ./rootfs.ext2 \
-append "console=ttyS0 root=/dev/sda" \
-device vfio-user-pci,socket=/tmp/vfio-user.sock
-device '{"driver":"vfio-user-pci","socket":{"path": "/tmp/vfio-user.sock", "type": "unix"}'
```

Log in to this VM as root (no password). We should be able to interact with the