Commit Graph

1335 Commits

Author SHA1 Message Date
Sebastien Boeuf
50a4c16d34 pci: Cleanup the crate from kvm_iotcls and kvm_bindings dependencies
Now that KVM specific interrupts are handled through InterruptManager
trait implementation, the pci crate does not need to rely on kvm_ioctls
and kvm_bindings crates.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
4bb12a2d8d interrupt: Reorganize all interrupt management with InterruptManager
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.

By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.

Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.

Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
92082ad439 vmm: Fully implement interrupt traits
After the skeleton of InterruptManager and InterruptSourceGroup traits
have been implemented, this new commit takes care of fully implementing
the content of KvmInterruptManager (InterruptManager trait) and
MsiInterruptGroup (InterruptSourceGroup).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
0f727127d5 vmm: Implement InterruptSourceGroup and InterruptManager skeleton
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
11d4d57c06 vm-device: Introduce InterruptManager and InterruptSourceGroup traits
These new traits are meant to abstract the knowledge about the
hypervisor and the type of interrupt being used from the perspective
of the devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
be421dccea vm-virtio: Optimize vhost-user interrupt notification
Thanks to the recently introduced function notifier() in the
VirtioInterrupt trait, all vhost-user devices can now bypass
listening onto an intermediate event fd as they can provide the
actual fd responsible for triggering the interrupt directly to
the vhost-user backend.

In case the notifier does not provide the event fd, the code falls
back onto the creation of an intermediate event fd it needs to listen
to, so that it can trigger the interrupt on behalf of the backend.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
1f029dd2dc vm-virtio: Add notifier to VirtioInterrupt trait
The point is to be able to retrieve directly the event fd related to
the interrupt, as this might optimize the way VirtioDevice devices are
implemented.

For instance, this can be used by vhost-user devices to provide
vhost-user backends directly with the event fd triggering the
interrupt related to a virtqueue.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
c396baca46 vm-virtio: Modify VirtioInterrupt callback into a trait
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.

Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.

For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
ef7d889a79 vfio: Remove unused GSI routing functions
At this point, both MSI and MSI-X handle the KVM GSI routing update,
which means the vfio crate does not have to deal with it anymore.
Therefore, several functions can be removed from the vfio-pci code, as
they are not needed anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
1a4b5ecc75 msi: Set KVM routes from MsiConfig instead of VFIO
Now that MsiConfig has access to both KVM VmFd and the list of GSI
routes, the update of the KVM GSI routes can be directly done from
MsiConfig instead of specifically from the vfio-pci implementation.

By moving the KVM GSI routes update at the MsiConfig level, any PCI
device such as vfio-pci, virtio-pci, or any other emulated PCI device
can benefit from it, without having to implement it on their own.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f3c3870159 msi: Create MsiConfig to embed MsiCap
The same way we have MsixConfig in charge of managing whatever relates
to MSI-X vectors, we need a MsiConfig structure to manage MSI vectors.
The MsiCap structure is still needed as a low level API, but it is now
part of the MsiConfig which oversees anything related to MSI.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
1e5e02801f msix: Perform interrupt enabling/disabling
In order to factorize one step further, we let MsixConfig perform the
interrupt enabling/disabling. This is done by registering/unregistering
the KVM irq_fds of all GSI routes related to this device.

And now that MsixConfig is in charge of the irq_fds, vfio-pci must rely
on it to retrieve them and provide them to the vfio driver.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
19aeac40c9 msix: Remove the need for interrupt callback
Now that MsixConfig has access to the irq_fd descriptors associated with
each vector, it can directly write to it anytime it needs to trigger an
interrupt.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
3fe362e3bd msix: Set KVM routes from MsixConfig instead of VFIO
Now that MsixConfig has access to both KVM VmFd and the list of GSI
routes, the update of the KVM GSI routes can be directly done from
MsixConfig instead of specifically from the vfio-pci implementation.

By moving the KVM GSI routes update at the MsixConfig level, both
vfio-pci and virtio-pci (or any other emulated PCI device) can benefit
from it, without having to implement it on their own.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
2381f32ae0 msix: Add gsi_msi_routes to MsixConfig
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
9b60fcdc39 msix: Add VmFd to MsixConfig
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
86c760a0d9 msix: Add SystemAllocator to MsixConfig
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.

Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f77d2c2d16 pci: Add some KVM and interrupt utilities to the crate
In order to anticipate the need for both msi.rs and msix.rs to rely on
some KVM utils and InterruptRoute structure to handle the update of the
KVM GSI routes, this commit adds these utilities directly to the pci
crate. So far, these were exclusively used by the vfio crate, which is
why there were located there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f5704d32b3 vmm: Move gsi_msi_routes creation to be shared across all PCI devices
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sergio Lopez
ed5459f268 ci: Add integration test for vhost_user_blk with 'direct'
This test is a variant of test_boot_vhost_user_blk(), named
test_boot_vhost_user_blk_direct(), that instances the vhost-user-blk
daemon with 'direct=true', to test this recently introduced feature
for opening files with O_DIRECT.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
e0a8da2f46 vhost_user_blk: Add missing WCE property support
Add missing WCE (write-cache enable) property support. This not only
an enhancement, but also a fix for a bug.

Right now, when vhost_user_blk uses a qcow2 image, it doesn't write
the QCOW2 metadata until the guest explicitly requests a flush. In
practice, this is equivalent to the write back semantic.

Without WCE, the guest assumes write through for the virtio_blk
device, and doesn't send those flush requests. Adding support for WCE,
and enabling it by default, we ensure the guest does send said
requests.

Supporting "WCE = false" would require updating our qcow2
implementation to ensure that, when required, it honors the write
through semantics by not deferring the updates to QCOW2 metadata.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
c7e9056c1e vhost_user_blk: implement support for direct (O_DIRECT) mode
Add support for opening the disk images with O_DIRECT. This allows
bypassing the host's file system cache, which is useful to avoid
polluting its cache and for better data integrity.

This mode of operation can be enabled by adding the "direct=<bool>"
parameter to the "backend" argument:

./target/debug/vhost_user_blk --backend image=test.raw,sock=/tmp/vhostblk,direct=true

The "direct" parameter defaults to "false", to preserve the original
behavior.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
a14aee9213 qcow: Use RawFile as backend instead of File
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
c5a656c9dc vm-virtio: block: Add support for alignment restrictions
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:

 - Address of the source/destination memory buffer.
 - File offset.
 - Length of the data to be read/written.

The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".

To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.

We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Cathy Zhang
e483cde1bb docs: Update networking.md with multiple queue support
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
973eb16ae9 src: Add multiple queue checking in vhost-user-net integration test
Update queue number with 4 to verify if vhost-user-net device
and backend could work well with multiple queue.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
4885dc0ed4 src: Update test_valid_vm_config_net with new option for virtio-net
There are two new options num_queues and queue_size defined for
virtio-net, add them in test_valid_vm_config_net which is used
to validate that both the CLI and the OpenAPI will generate the
same configuration.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
652e7b9b8a vm-virtio: Implement multiple queue support for net devices
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.

Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
404316eea1 vmm: Add multiple queue option and update config for virtio-net device
Add num_queues and queue_size for virtio-net device to make them configurable,
while add the associated options in command line.

Update cloud-hypervisor.yaml with the new options for NetConfig.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
4ab88a8173 net_util: Add multiple queue support for tap
Add support to allow VMMs to open the same tap device many times, it will
create multiple file descriptors meanwhile.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
cf7e0cdf95 vm-virtio: Add multiple queue handling with control queue
Current guest kernel will check the oneline cpu count, in principle,
if the online cpu count is not smaller than the number of queue pairs
VMM reported, the net packets could be put/get to all the virtqueues,
otherwise, only the number of queue pairs that match the oneline cpu
count will have packets work with. guest kernel will send command
through control queue to tell VMMs the actual queue pair numbers which
it could currently play with. Add mq process in control queue handling
to get the queue pair number, VMM will verify if it is in a valid range,
nothing else but this.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
709f7fe607 vm-virtio: Implement control queue support for net devices
While feature VIRTIO_NET_F_CTRL_VQ is negotiated, control queue
will exits besides the Tx/Rx virtqueues, an epoll handler should
be started to monitor and handle the control queue event.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
d38787c578 vm-virtio: Add control queue support in net_util.rs
As virtio spec 1.1 said, the driver uses the control queue
to send commands to manipulate various features of the devices,
such as VIRTIO_NET_F_MQ which is required by multiple queue
support. Here add the control queue handling process.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
1ae7deb393 vm-virtio: Implement refactor for net devices and backend
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
6ae2597d19 vm-virtio: Create new module to abstract common parts for net devices
There are some common logic shared among virtio-net device, vhost-user-net
device and vhost-user-net backend, abstract those parts into net_util.rs
to improve code maintainability and readability.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
3485e89080 vm-virtio: Stop delivering interrupt while NO_VECTOR
According to virtio spec, for used buffer notifications, if
MSI-X capability is enabled, and queue msix vector is
VIRTIO_MSI_NO_VECTOR 0xffff, the device must not deliver an
interrupt for that virtqueue.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
dependabot-preview[bot]
1324aa451f build(deps): bump proc-macro2 from 1.0.7 to 1.0.8
Bumps [proc-macro2](https://github.com/alexcrichton/proc-macro2) from 1.0.7 to 1.0.8.
- [Release notes](https://github.com/alexcrichton/proc-macro2/releases)
- [Commits](https://github.com/alexcrichton/proc-macro2/compare/1.0.7...1.0.8)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-17 06:58:27 +00:00
dependabot-preview[bot]
dcb6d02b98 build(deps): bump micro_http from db75e88 to 6327290
Bumps [micro_http](https://github.com/firecracker-microvm/firecracker) from `db75e88` to `6327290`.
- [Release notes](https://github.com/firecracker-microvm/firecracker/releases)
- [Commits](db75e88e9e...6327290424)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-17 06:58:10 +00:00
dependabot-preview[bot]
cde2c4449b build(deps): bump backtrace from 0.3.41 to 0.3.42
Bumps [backtrace](https://github.com/rust-lang/backtrace-rs) from 0.3.41 to 0.3.42.
- [Release notes](https://github.com/rust-lang/backtrace-rs/releases)
- [Commits](https://github.com/rust-lang/backtrace-rs/compare/0.3.41...0.3.42)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-16 10:59:19 +00:00
dependabot-preview[bot]
d8adf6a6d7 build(deps): bump micro_http from 52e21d0 to db75e88
Bumps [micro_http](https://github.com/firecracker-microvm/firecracker) from `52e21d0` to `db75e88`.
- [Release notes](https://github.com/firecracker-microvm/firecracker/releases)
- [Commits](52e21d0f9e...db75e88e9e)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-16 06:38:18 +00:00
Rob Bradford
14041e97e7 docs: Add memory resizing documentation
Add documentation for memory resizing.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
32506dadfc docs: Document CPU unplug
This newly added feature wasn't yet documented.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
8b500d7873 deps: Bump vm-memory and linux-loader version
The function GuestMemory::end_addr() has been renamed to last_addr()

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
7310ab6fa7 devices, vmm: Use a bit field for ACPI GED interrupt type
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
d2d1248342 tests: Add test combining memory and vCPU hotplug
Test resizing with both optional parameters set in the VmResize API
works.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
2073007214 tests: Add integration test for RAM hotplug
Reuse the existing infrastructure for CPU hotplug and add a new test for
memory hotplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
351058ab0f resources: Add memory hotplug support to the kernel configuration
Add the config option for enabling memory hotplug to our recommended
kenrel configuration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
28c6652e57 vmm: Upon VmResize attempt to hotplug the memory
If a new amount of RAM is requested in the VmResize command try and
hotplug if it an increase (MemoryManager::Resize() silently ignores
decreases.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
4e414f0d84 vmm: device_manager: Scan memory devices upon GED interrupt
If there is a GED interrupt and the field indicates that the memory
device has changed triggers a scan of the memory devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
284d5e011a vmm: Add memory hotplug ACPI entries to DSDT
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00