768 Commits

Author SHA1 Message Date
Samuel Ortiz
447af8e702 vmm: vm: Factorize the device and cpu managers creation routine
Into a new_from_memory_manager() routine.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
c73c9b112c vmm: vm: Open kernel and initramfs once all managers are created
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
0646a90626 vmm: cpu: Pass CpusConfig to simplify the new() prototype
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
b584ec3fb3 vmm: memory_manager: Own the system allocator
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
ef2b11ee6c vmm: memory_manager: Pass MemoryConfig to simplify the new() prototype
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
622f3f8fb6 vmm: vm: Avoid ioapic variable creation
For a more readable VM creation routine.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
164e810069 vmm: cpu: Move CPUID patching to CpuManager
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
1a2c1f9751 vmm: vm: Factorize the KVM setup code
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
7a50646c02 vmm: device_manager: Convert migratable_devices to a map
We must be able to map a migratable component id to its device.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-03 18:05:18 +01:00
Samuel Ortiz
8f300bed83 vmm: api: Add a /api/v1/vm.restore endpoint
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-02 13:24:25 +01:00
Samuel Ortiz
92c73c3b78 vmm: Add a VmRestore command
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-02 13:24:25 +01:00
Samuel Ortiz
39d4f817f0 vmm: http: Add a /api/v1/vm.snapshot endpoint
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-02 13:24:25 +01:00
Samuel Ortiz
cf8f8ce93a vmm: api: Add a Snapshot command
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2020-04-02 13:24:25 +01:00
Sebastien Boeuf
452475c280 vmm: Add migration helpers
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 13:24:25 +01:00
Samuel Ortiz
1b1a2175ca vm-migration: Define the Snapshottable and Transportable traits
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.

A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As a component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can itself be made of multiple
snapshots.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.

Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2020-04-02 13:24:25 +01:00
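
The snapshot layout described in the commit above can be sketched as follows. This is a minimal illustration only: the field names, the `Device` and `Manager` types and the simplified method signatures are assumptions made for the example, not the definitions from the vm-migration crate.

```rust
use std::collections::HashMap;

/// One piece of serialized component state (field names are hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct MigrationSection {
    pub name: String,
    pub data: Vec<u8>,
}

/// Map of component IDs to their lists of migration sections.
pub type MigrationSnapshot = HashMap<String, Vec<MigrationSection>>;

/// A component that can snapshot itself (simplified signature).
pub trait Snapshottable {
    fn snapshot(&self) -> MigrationSnapshot;
}

/// A component whose snapshot can be transported (simplified signatures).
pub trait Transportable: Snapshottable {
    fn send(&self, snapshot: &MigrationSnapshot, destination: &str);
    fn recv(&self, source: &str) -> MigrationSnapshot;
}

/// Hypothetical leaf component with a single byte of state.
pub struct Device {
    pub id: String,
    pub state: u8,
}

impl Snapshottable for Device {
    fn snapshot(&self) -> MigrationSnapshot {
        let mut snapshot = MigrationSnapshot::new();
        let section = MigrationSection {
            name: "state".to_string(),
            data: vec![self.state],
        };
        snapshot.insert(self.id.clone(), vec![section]);
        snapshot
    }
}

/// Hypothetical parent component made of several sub-components.
pub struct Manager {
    pub devices: Vec<Device>,
}

impl Snapshottable for Manager {
    fn snapshot(&self) -> MigrationSnapshot {
        let mut snapshot = MigrationSnapshot::new();
        let section = MigrationSection {
            name: "device-count".to_string(),
            data: vec![self.devices.len() as u8],
        };
        snapshot.insert("manager".to_string(), vec![section]);
        // Merge every sub-component snapshot into the parent snapshot.
        for device in &self.devices {
            snapshot.extend(device.snapshot());
        }
        snapshot
    }
}
```

Keeping a list of sections per component ID means a later version could append a new section without changing how the existing ones are read, which is the backward-compatibility point the commit message makes.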
Sebastien Boeuf
2d17f4384a vmm: seccomp: Add missing open() syscall
On some systems, the open() system call is used by Cloud-Hypervisor,
that's why it should be part of the seccomp filters whitelist.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 09:56:48 +02:00
Sebastien Boeuf
e4ea8b0bef vmm: Add missing syscalls to the seccomp filters
Both clock_gettime and gettimeofday syscalls were missing when running
Cloud-Hypervisor on a Linux host without vDSO enabled. On a system with
vDSO enabled, the syscalls performed by vDSO were not filtered, that's
why we didn't have to whitelist them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-27 16:50:52 +00:00
Sebastien Boeuf
9e18177654 vmm: Add memory hotplug support to VFIO PCI devices
Extend the update_memory() method from DeviceManager so that VFIO PCI
devices can update their DMA mappings to the physical IOMMU, after a
memory hotplug has been performed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-27 09:35:39 +01:00
Sebastien Boeuf
cc67131ecc vmm: Retrieve new memory region when memory is extended
Whenever the memory is resized, it's important to retrieve the new
region to pass it down to the device manager, this way it can decide
what to do with it.

Also, there's no need to use a boolean as we can instead use an Option
to carry the information about the region. In case of virtio-mem, there
will be no region since the whole memory has been reserved up front by
the VMM at boot. This means only the ACPI hotplug will return a region
and is the only method that requires the memory to be updated from the
device manager.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-27 09:35:39 +01:00
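
The boolean-to-Option change described above might look roughly like this. It is a sketch under stated assumptions: `Region`, `MemoryManager`, its fields and the `resize` signature are hypothetical names chosen for the example, not the actual vmm code.

```rust
/// Hotplug methods named in the commit log.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// A newly added guest memory region (hypothetical representation).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Region {
    pub start: u64,
    pub size: u64,
}

/// Hypothetical, heavily simplified memory manager.
pub struct MemoryManager {
    pub method: HotplugMethod,
    pub current_size: u64,
    pub next_addr: u64,
}

impl MemoryManager {
    /// Returns `Some(region)` only when a new region was actually created,
    /// so the caller knows whether the device manager must be updated.
    pub fn resize(&mut self, desired: u64) -> Option<Region> {
        if desired <= self.current_size {
            return None;
        }
        let added = desired - self.current_size;
        self.current_size = desired;
        match self.method {
            HotplugMethod::Acpi => {
                let region = Region { start: self.next_addr, size: added };
                self.next_addr += added;
                Some(region)
            }
            // With virtio-mem the whole range was reserved at boot,
            // so there is no new region to hand to the devices.
            HotplugMethod::VirtioMem => None,
        }
    }
}
```

The caller can then write `if let Some(region) = mm.resize(size) { ... }` instead of checking a flag and fetching the region separately.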
Samuel Ortiz
8fc7bf2953 vmm: Move to the latest linux-loader
Commit 2adddce2 reorganized the crate for a cleaner multi architecture
(x86_64 and aarch64) support.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-03-27 08:48:20 +01:00
Sebastien Boeuf
785812d976 vmm: Fall back to legacy boot if PVH is enabled along with initramfs
For now, the codebase does not support booting from initramfs with the PVH
boot protocol, therefore we need to fall back to the legacy boot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-26 11:59:03 +01:00
Damjan Georgievski
6cce7b9560 arch: load initramfs and populate zero page
* load the initramfs File into the guest memory, aligned to page size
* finally setup the initramfs address and its size into the boot params
  (in configure_64bit_boot)

Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
2020-03-26 11:59:03 +01:00
Damjan Georgievski
1f9bc68c54 openapi: Add initramfs support
added InitramfsConfig property to the REST API spec

Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
2020-03-26 11:59:03 +01:00
Damjan Georgievski
4db252b418 main, vmm: add --initramfs cli option
currently unused, the initramfs argument is added to the cli,
and stored in vmm::config::VmConfig as an Option(InitramfsConfig(PathBuf))

Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
2020-03-26 11:59:03 +01:00
Rob Bradford
6244beb9d5 openapi: Add "vm.add-net" entry point
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
57c3fa4b1e vmm: Add "add-net" to the API
Add the HTTP and internal API entry points for adding a network device
at runtime.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
f664cddec9 vmm: Add support for adding network devices to the VM
The network device will be hotplugged via DeviceManager and saved in
the config for later use.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
8f323e61d8 vmm: Add support to DeviceManager for hotplugging network devices
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
42a9896fe4 vmm: device_manager: Refactor make_virtio_net_devices
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
9df601a1df bin, vmm: Centralise the net syntax
This will allow the syntax to be reused with cloud-hypervisor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Samuel Ortiz
41d7b3a387 vmm: memory_manager: Only send the GED notification for the ACPI method
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-03-25 15:54:16 +01:00
Hui Zhu
15d9ec0149 openapi: Add hotplug_method to MemoryConfig
Add hotplug_method to MemoryConfig in cloud-hypervisor.yaml.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Hui Zhu
e63f98182a vmm: device: Add make_virtio_mem_devices
Add make_virtio_mem_devices to add virtio-mem to vmm.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Hui Zhu
e6b934a56a vmm: Add support for virtio-mem
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Rob Bradford
75878dd90a openapi: Add "vm.add-pmem" entry point
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
f6f4c68fb4 vmm: Add "add-pmem" to the API
Add the HTTP and internal API entry points for adding persistent memory
at runtime.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
15de30f141 vmm: Add support for adding pmem devices to the VM
The persistent memory will be hotplugged via DeviceManager and saved in
the config for later use.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
f7def621dd vmm: Add support to DeviceManager for hotplugging pmem devices
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
8c3ea8cd76 vmm: device_manager: Refactor make_virtio_pmem_devices
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
a7296bbb52 bin, vmm: Centralise the pmem syntax
This will allow the syntax to be reused with cloud-hypervisor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
4c9d15d44c vmm: Fix copy and paste error message
vm_remove_device was copied from vm_add_device but the error message
wasn't correctly updated.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
82cad99c0b openapi: Add "vm.add-disk" entry point
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
f2151b2734 vmm: Add "add-disk" to the API
Add the HTTP and internal API entry points for adding disks at runtime.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
164ec2b8e6 vmm: Add support for adding disks to the VM
The disk will be hotplugged via DeviceManager and saved in the config
for later use.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
b3082c1984 vmm: Add support to DeviceManager for hotplugging disks
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
2be703ca92 vmm: device_manager: Refactor make_virtio_block_devices
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
66da29d8dd bin, vmm: Centralise the disk syntax
This will allow the syntax to be reused with cloud-hypervisor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Sebastien Boeuf
e54f8ec8a5 vmm: Update memory through DeviceManager
Whenever the VM memory is resized, DeviceManager needs to be notified
so that it can subsequently notify each virtio device about it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 19:01:15 +00:00
Sebastien Boeuf
feb8d7ae90 vmm: Separate seccomp filters between VMM and API threads
This separates the filters used between the VMM and API threads, so that
we can apply different rules for each thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
f1a23d712f vmm: api: Add seccomp to the HTTP API thread
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
db62cb3f4d vmm: Add seccomp filter to the VMM thread
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
cb98d90097 vmm: Create new seccomp_filter module
Based on the seccomp crate, we create a new vmm module responsible for
creating a seccomp filter that will be applied to the VMM main thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
708f02dc26 vmm: Pull seccomp crate from Firecracker
The seccomp crate from Firecracker is nicely implemented, documented and
tested, which is a good reason for relying on it to create and apply
seccomp filters.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Rob Bradford
8acc15a63c build: Bump vm-memory and linux-loader dependencies
linux-loader depends on vm-memory so must be updated at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-23 14:27:41 +00:00
Rob Bradford
f7197e8415 vmm: Add a "discard_writes=" to --pmem
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.

This is a functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that appears readonly in the guest
requires significant specification and kernel changes.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-20 14:46:34 +01:00
Rob Bradford
d11a67b0fe vmm: Use more generic MmapRegion constructor
Switch to MmapRegion::build() and fill in the fields appropriately.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-20 14:46:34 +01:00
Rob Bradford
7257e890ef vmm: Add "readonly" parameter MemoryManager::create_userspace_mapping
Use this boolean to turn on the KVM_MEM_READONLY flag to indicate that
this memory mapping should not be writable by the VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-20 14:46:34 +01:00
Qiu Wenbo
c503118d16 vmm: fix a corrupted stack caused by get_win_size
According to `asm-generic/termios.h`, the `struct winsize` should be:

struct winsize {
        unsigned short ws_row;
        unsigned short ws_col;
        unsigned short ws_xpixel;
        unsigned short ws_ypixel;
};

The ioctl of TIOCGWINSZ will trigger a segfault on aarch64.

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-03-20 07:30:06 +01:00
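
For reference, a Rust mirror of the C structure quoted above could be declared as below. The commit message does not show the faulty declaration, so this only illustrates the corrected layout: the buffer handed to the TIOCGWINSZ ioctl must be at least as large as the kernel's `struct winsize` (four 16-bit fields), otherwise the kernel writes past it. The `WinSize` name and the helper function are assumptions for the example.

```rust
use std::mem::size_of;

/// Rust mirror of `struct winsize` from `asm-generic/termios.h`.
/// `#[repr(C)]` keeps the field order and layout identical to the C struct.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct WinSize {
    pub ws_row: u16,
    pub ws_col: u16,
    pub ws_xpixel: u16,
    pub ws_ypixel: u16,
}

/// Size in bytes of the buffer that would be passed to the ioctl.
pub fn winsize_len() -> usize {
    size_of::<WinSize>()
}
```

A compile-time or test-time check on `winsize_len()` is a cheap guard against a field being dropped again.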
Rob Bradford
0788600702 build: Remove "pvh_boot" feature flag
This feature is stable and there is no need for this to be behind a
flag. This will also reduce the time needed to run the integration test
as we will not be running them all again under the flag.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-19 13:05:44 +00:00
Rob Bradford
477bc17f18 bin: Share VFIO device syntax between cloud-hypervisor and ch-remote
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-18 23:38:55 +00:00
Jose Carlos Venegas Munoz
a31ffef085 openapi: Add hotplug_size for memory hotplug
Add hotplug_size, needed to be defined when hotplug is used.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-03-18 19:06:07 +00:00
Rob Bradford
87990f9e67 vmm: Add virtio-pci device to B/D/F hash table
This table currently contains only the VFIO devices, but it should
really contain all the PCI devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-18 19:05:58 +00:00
Rob Bradford
fb185fa839 vmm: Always return PCI B/D/F from add_virtio_pci_device
Previously this was only returned if the device had an IOMMU mapping and
whether the device should be added to the virtio-iommu. This was already
captured earlier as part of creating the device so use that information
instead.

Always returning the B/D/F is helpful as it facilitates virtio PCI
device hotplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-18 19:05:58 +00:00
Samuel Ortiz
63eeed29cc vm: Comment on the VM config update from memory hotplug
I spent a few minutes trying to understand why we were unconditionally
updating the VM config memory size, even if the guest memory resizing
did not happen.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-03-18 12:48:40 +01:00
dependabot-preview[bot]
51f51ea17d build(deps): bump libc from 0.2.67 to 0.2.68
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.67 to 0.2.68.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.67...0.2.68)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-03-17 21:36:38 +00:00
Rob Bradford
28a5f9dc19 vmm: acpi: Remove unused IORT related structures
The IORT table for virtio-iommu use was removed and replaced with a
purely virtio based solution. Although the table construction was
removed these structures were left behind.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-17 12:46:26 +00:00
Alejandro Jimenez
9e247c4e06 pvh: Introduce "pvh_boot" feature
Use a new feature called "pvh_boot" to enable using the PVH boot
protocol if the guest kernel supports it. The feature can be enabled
by building with:

cargo build [--release] --features "pvh_boot"

Once performance has been evaluated, this can be made part of the
default set of features so that any guest that supports it boots
using PVH as the preferred option as is the case in QEMU.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
2020-03-13 18:29:44 +01:00
Alejandro Jimenez
a22bc3559f pvh: Write start_info structure to guest memory
Fill the hvm_start_info and related memory map structures as
specified in the PVH boot protocol. Write the data structures
to guest memory at the GPA that will be stored in %rbx when
the guest starts.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
2020-03-13 18:29:44 +01:00
Alejandro Jimenez
840a9a97ff pvh: Initialize vCPU regs/sregs for PVH boot
Set the initial values of the KVM vCPU registers as specified in
the PVH boot ABI:

https://xenbits.xen.org/docs/unstable/misc/pvh.html

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
2020-03-13 18:29:44 +01:00
Alejandro Jimenez
24f0e42e6a pvh: Introduce EntryPoint struct
In order to properly initialize the kvm regs/sregs structs for
the guest, the load_kernel() return type must specify which
boot protocol to use with the entry point address it returns.

Make load_kernel() return an EntryPoint struct containing the
required information. This structure will later be used
in the vCPU configuration methods to set up the appropriate
initial conditions for the guest.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
2020-03-13 18:29:44 +01:00
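
The idea of returning the boot protocol together with the entry address can be sketched as follows. All type, field and function names here are simplified assumptions for illustration. The register choices follow the public boot ABIs: PVH expects the hvm_start_info address in %rbx, and the 64-bit Linux boot protocol expects the boot_params (zero page) address in %rsi.

```rust
/// Boot protocol selected by the kernel loader (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BootProtocol {
    LinuxBoot,
    PvhBoot,
}

/// Entry address plus the protocol that must be used to reach it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EntryPoint {
    pub entry_addr: u64,
    pub protocol: BootProtocol,
}

/// Small subset of the vCPU registers, for illustration only.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct InitialRegs {
    pub rip: u64,
    pub rbx: u64,
    pub rsi: u64,
}

/// Picks the initial register values according to the boot protocol.
pub fn initial_regs(entry: &EntryPoint, start_info_addr: u64, zero_page_addr: u64) -> InitialRegs {
    let mut regs = InitialRegs { rip: entry.entry_addr, ..Default::default() };
    match entry.protocol {
        // PVH: %rbx points at the hvm_start_info structure.
        BootProtocol::PvhBoot => regs.rbx = start_info_addr,
        // Linux 64-bit boot protocol: %rsi points at the zero page.
        BootProtocol::LinuxBoot => regs.rsi = zero_page_addr,
    }
    regs
}
```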
Rob Bradford
4579afa091 vmm: For --disk error if socket and path is specified
This is an error as the path should be specified by the unmanaged
backend.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:41:52 +00:00
Rob Bradford
7e599b4450 vmm: Make disk path optional
When using "--disk" with a vhost socket and not using self spawning then
it is not necessary or helpful to specify the path.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:41:52 +00:00
Sebastien Boeuf
8d785bbd5f pci: Fix the PciBus using HashMap instead of Vec
By using a Vec to hold the list of devices on the PciBus, there's a
problem when we use unplug. Indeed, the vector of devices gets reduced
and if the unplugged device was not the last one from the list, every
other device after this one is shifted on the bus.

To solve this problem, a HashMap is used. This makes it possible to keep track of
the exact place where each device stands on the bus.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-13 10:54:34 +01:00
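
The shifting problem described above is easy to reproduce in isolation. The sketch below uses plain device names instead of real PCI device objects, and both helper functions are hypothetical; it only shows why an index into a Vec is not a stable slot number while a HashMap key is.

```rust
use std::collections::HashMap;

/// Removes the device at index `unplug` and returns the index at which
/// `name` is found afterwards. With a Vec, later devices shift down.
pub fn slot_after_unplug_vec(devices: &mut Vec<&'static str>, unplug: usize, name: &str) -> Option<usize> {
    devices.remove(unplug);
    devices.iter().position(|d| *d == name)
}

/// Same operation with a map keyed by slot: the remaining keys are untouched.
pub fn slot_after_unplug_map(devices: &mut HashMap<u32, &'static str>, unplug: u32, name: &str) -> Option<u32> {
    devices.remove(&unplug);
    devices
        .iter()
        .find(|(_, d)| **d == name)
        .map(|(slot, _)| *slot)
}
```

Unplugging the device in slot 1 from `["net", "disk", "vfio"]` moves "vfio" from index 2 to index 1 in the Vec version, whereas the map version still reports slot 2.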
Jose Carlos Venegas Munoz
40b38a4222 openapi: Make desired_ram int64 format
The desired_ram option is expressed in bytes; use the int64 format so that
larger amounts of memory can be added.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-03-12 23:17:56 +01:00
Sebastien Boeuf
efba48dddb vmm: Don't put a VFIO device behind the vIOMMU by default
With some of the factorization that happened to be able to support VFIO
hotplug, one mistake was made. In case a vIOMMU is created through a
virtio-iommu device, and no matter the "iommu" option value from the
VFIO device parameter, the VFIO device was always placed behind the
virtual IOMMU.

This commit fixes this wrong behavior by making sure the device
configuration is taken into account to decide if it should be attached
or not to the virtual IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 19:50:31 +01:00
Sebastien Boeuf
34412c9b41 vmm: Add id option to VFIO hotplug
Add a new id option to the VFIO hotplug command so that it matches the
VFIO coldplug semantic.

This is done by refactoring the existing code for VFIO hotplug, where
VmAddDeviceData structure is replaced by DeviceConfig. This structure is
the one used whenever a VFIO device is coldplugged, which is why it
makes sense to reuse it for the hotplug codepath.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 19:50:31 +01:00
Samuel Ortiz
18dc916380 vmm: Switch to the micro-http package
It's been extracted from the Firecracker code base.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-03-11 17:38:01 +01:00
Sebastien Boeuf
9023444ad3 vmm: Add id field to --device through CLI
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.

This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.

If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.

Fixes #881

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:57 +00:00
Sebastien Boeuf
f4a956a60a vmm: Remove 32 bits MMIO range from correct address space
The 32 bits MMIO address space is handled separately from the 64 bits
one. For this reason, we need to invoke the appropriate freeing function
to remove a range from this address space.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:30 +00:00
Sebastien Boeuf
432eb5b70a vmm: Free PCI BARs when unplugging PCI device
Now that PciDevice trait has a dedicated function to remove the bars,
the DeviceManager can invoke this function whenever a PCI device is
unplugged from the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:30 +00:00
Sebastien Boeuf
b50cbe5064 pci: Give PCI device ID back when removing a device
Upon removal of a PCI device, make sure we don't hold onto the device ID
as it could be reused for another device later.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
df71aaee3f pci: Make the device ID allocation smarter
In order to handle the case where devices are very often plugged and
unplugged from a VM, we need to handle the PCI device ID allocation
better.

Any PCI device could be removed, which means we cannot simply rely on
the vector size to give the next available PCI device ID.

That's why this patch stores in memory the information about the 32
slots availability. Based on this information, whenever a new slot is
needed, the code can correctly provide an available ID, or simply return
an error because all slots are taken.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
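
A slot-availability table like the one described in the two commits above could be sketched as follows. The `DeviceIdAllocator` type, its method names and the error strings are assumptions for the example; only the count of 32 slots comes from the commit message.

```rust
/// Number of device slots on the bus, as stated in the commit message.
pub const NUM_DEVICE_IDS: usize = 32;

/// Tracks which PCI device IDs are in use (hypothetical type).
pub struct DeviceIdAllocator {
    used: [bool; NUM_DEVICE_IDS],
}

impl DeviceIdAllocator {
    pub fn new() -> Self {
        DeviceIdAllocator { used: [false; NUM_DEVICE_IDS] }
    }

    /// Returns the lowest free ID, or an error when all slots are taken.
    pub fn allocate(&mut self) -> Result<u32, &'static str> {
        for (id, used) in self.used.iter_mut().enumerate() {
            if !*used {
                *used = true;
                return Ok(id as u32);
            }
        }
        Err("no PCI device ID available")
    }

    /// Gives an ID back so that it can be reused by a later device.
    pub fn release(&mut self, id: u32) -> Result<(), &'static str> {
        let slot = self.used.get_mut(id as usize).ok_or("PCI device ID out of range")?;
        if !*slot {
            return Err("PCI device ID was not allocated");
        }
        *slot = false;
        Ok(())
    }
}
```

Because allocation scans for the first free entry instead of using the length of a vector, an ID released by an unplugged device is handed out again on the next hotplug.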
Sebastien Boeuf
e514b124ed vmm: Update VmConfig when removing VFIO device
This commit ensures that when a VFIO device is hot-unplugged from the
VM, it is also removed from the VmConfig. This prevents a potential
reboot from creating the device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
81173bf4ab vmm: Add id field to DeviceConfig structure
Add a new field to the DeviceConfig, allowing the VMM to allocate a name
to the VFIO devices.

By identifying a VFIO device with a unique name, we can make sure a user
can properly unplug it at any time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
6cbdb9aa47 vmm: api: Introduce new "remove-device" HTTP endpoint
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
991f3bb5da vmm: Remove VFIO device from everywhere it is referenced
This commit implements the eject function so that a VFIO device will be
removed from any bus it might sit on, and from any list it might be
stored in.

The idea is to reach a point where there is no reference of the device
anywhere in the code, so that the Drop implementation will be invoked
and so that the device will be fully removed from the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
6adebbc6a0 vmm: Detect when guest notifies about ejecting PCI device
When the guest OS is done removing a PCI device, it will invoke the _EJ0
method from ACPI, associated with the device. This will trigger a port
IO write to a region known by the VMM. Upon this writing, the VMM will
trap the VM exit and retrieve the written value.

Based on the value, the VMM will invoke its eject_device() method to
finalize the removal of the device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
08604ac6a8 vmm: Store PCI devices as Any devices from DeviceManager
As we try to keep track of every PCI device related to the VM, we don't
want to have separate lists depending on the concrete type associated
with the PciDevice trait. Also, we want to be able to cast the actual
type into any trait or concrete type.

The most efficient way to solve all these issues is to store every
device as an Arc<dyn Any + Send + Sync>. This gives the ability to
downcast into the appropriate concrete type, and then to cast back into
any trait that we might need.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
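
The downcast-then-recast pattern mentioned above can be illustrated with the standard library alone. The `PciDevice` trait below is reduced to a single method, and `VfioPciDevice`, `VirtioPciDevice` and `as_pci_device` are stand-ins invented for the example rather than the actual vmm types.

```rust
use std::any::Any;
use std::sync::Arc;

/// Reduced stand-in for the PciDevice trait.
pub trait PciDevice: Send + Sync {
    fn name(&self) -> String;
}

pub struct VfioPciDevice {
    pub id: String,
}

impl PciDevice for VfioPciDevice {
    fn name(&self) -> String {
        format!("vfio:{}", self.id)
    }
}

pub struct VirtioPciDevice {
    pub id: String,
}

impl PciDevice for VirtioPciDevice {
    fn name(&self) -> String {
        format!("virtio:{}", self.id)
    }
}

/// Recovers a trait object from a type-erased device by trying each
/// known concrete type in turn.
pub fn as_pci_device(dev: Arc<dyn Any + Send + Sync>) -> Option<Arc<dyn PciDevice>> {
    let dev = match dev.downcast::<VfioPciDevice>() {
        Ok(vfio) => {
            let pci: Arc<dyn PciDevice> = vfio;
            return Some(pci);
        }
        // On failure the original Arc is handed back untouched.
        Err(dev) => dev,
    };
    match dev.downcast::<VirtioPciDevice>() {
        Ok(virtio) => {
            let pci: Arc<dyn PciDevice> = virtio;
            Some(pci)
        }
        Err(_) => None,
    }
}
```

A single list of `Arc<dyn Any + Send + Sync>` can therefore hold every device, at the cost of enumerating the concrete types wherever a specific trait is needed.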
Sebastien Boeuf
0f99d3f7cc vmm: Store VFIO device's name and its PCI b/d/f
Add a new list storing the device names across the entire codebase. VFIO
devices are added to the list whenever a new one is created. By default,
each VFIO device is given a name "vfioX" where X is the first available
integer.

Along with this new list of names, another list is created, grouping PCI
device's name with its associated b/d/f. This will be useful to keep
track of the created devices so that we can implement unplug
functionality.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
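
The "vfioX where X is the first available integer" rule could be implemented along these lines. The function name and the use of a HashSet for the list of names are assumptions made for this sketch.

```rust
use std::collections::HashSet;

/// Returns `<prefix><n>` for the smallest `n` not already present in `names`.
pub fn next_device_name(names: &HashSet<String>, prefix: &str) -> String {
    let mut index = 0usize;
    loop {
        let candidate = format!("{}{}", prefix, index);
        if !names.contains(&candidate) {
            return candidate;
        }
        index += 1;
    }
}
```

With "vfio0" and "vfio2" already taken, the next generated name fills the gap and is "vfio1".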
Rob Bradford
f0a3e7c4a1 build: Bump linux-loader and vm-memory dependencies
linux-loader now uses the released vm-memory so we must move to that
version at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-05 11:01:30 +01:00
Sebastien Boeuf
09829c44b2 vmm: Remove IO bus strong reference from Vm
The Vm structure was used to store a strong reference to the IO bus.
This is not needed anymore since the AddressManager is logically the
one holding this strong reference. This has been made possible by the
introduction of Weak references on the Bus structure itself.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
2dbb376175 vmm: Remove all Weak references from DeviceManager
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
DeviceManager anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
9e915a0284 vmm: Remove all Weak references from CpuManager
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
CpuManager anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
49268bff3b pci: Remove all Weak references from PciBus
Now that the BusDevice devices are stored as Weak references by the IO
and MMIO buses, there's no need to use Weak references from the PciBus
anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
7773812f58 vmm: Store the list of BusDevice devices from DeviceManager
The point is to make sure the DeviceManager holds a strong reference of
each BusDevice inserted on the IO and MMIO buses. This will allow these
buses to hold Weak references onto the BusDevice devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
d0820cc026 vmm: Make add_vfio_device mutable
The method add_vfio_device() from the DeviceManager needs to be mutable
if we want later to be able to update some internal fields from the
DeviceManager from this same function.

This commit simply makes the changes needed for this function to take
a mutable reference to the DeviceManager.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
948f808da6 vm: Rename DeviceManager field in Vm structure
It's more logical to name the field referring to the DeviceManager as
"device_manager" instead of "devices".

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
d47f733e51 vmm: Break the cyclic dependency between DeviceManager and IO bus
By inserting the DeviceManager on the IO bus, we introduced some cyclic
dependency:

  DeviceManager ---> AddressManager ---> Bus ---> BusDevice
        ^                                             |
        |                                             |
        +---------------------------------------------+

This cycle needs to be broken by inserting a Weak reference instead of
an Arc (considered as a strong reference).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
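
The cycle-breaking approach described above, with the bus holding Weak references while another owner keeps the strong ones, can be sketched as follows. `Bus`, `BusDevice` and `DummyDevice` are simplified stand-ins written for this example and do not reproduce the real vmm types.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Reduced stand-in for the BusDevice trait.
pub trait BusDevice: Send {
    fn read(&mut self, offset: u64) -> u8;
}

/// A bus that only holds weak references to its devices, so it never
/// keeps a device (or the owner of the bus) alive on its own.
pub struct Bus {
    devices: Vec<Weak<Mutex<dyn BusDevice>>>,
}

impl Bus {
    pub fn new() -> Self {
        Bus { devices: Vec::new() }
    }

    /// Registers a device without taking ownership of it.
    pub fn insert(&mut self, device: &Arc<Mutex<dyn BusDevice>>) {
        self.devices.push(Arc::downgrade(device));
    }

    /// Returns `None` if the slot is empty or the device was dropped.
    pub fn read(&self, index: usize, offset: u64) -> Option<u8> {
        let device = self.devices.get(index)?.upgrade()?;
        let value = device.lock().unwrap().read(offset);
        Some(value)
    }
}

/// Hypothetical device that echoes the low byte of the offset.
pub struct DummyDevice;

impl BusDevice for DummyDevice {
    fn read(&mut self, offset: u64) -> u8 {
        offset as u8
    }
}
```

Once the last strong reference is dropped by its owner, `upgrade()` fails and the bus simply reports that the device is gone, so no reference cycle can keep the objects alive.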
Sebastien Boeuf
c1af13efeb vmm: Update VmConfig when adding new device
Ensures the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the new device that had been previously hotplugged.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
a86f4369a7 vmm: Add VFIO PCI device hotplug support
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.

One thing to notice, this does not support vIOMMU yet. This means we can
hotplug VFIO PCI devices, but we cannot attach them to an existing or a
new virtio-iommu device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
320fea0eaf vmm: Factorize VFIO PCI device creation
This factorization is very important as it will allow both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to perform the addition of a new VFIO PCI device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
00716f90a0 vmm: Store virtio-iommu device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5902dfa403 vmm: Store VFIO KVM device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d9c1b4396e vmm: Store MSI InterruptManager from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
02adc4061a vmm: Store PciBus from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d0218e94a3 vmm: Trigger hotplug notification to the guest
Whenever the user wants to hotplug a new VFIO PCI device, the VMM will
have to trigger a hotplug notification through the GED device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0e58741a09 vmm: api: Introduce new "add-device" HTTP endpoint
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0f1396acef vmm: Insert PCI device hotplug operation region on IO bus
Through the BusDevice implementation from the DeviceManager, and by
inserting the DeviceManager on the IO bus for a specific IO port range,
the VMM now has the ability to handle PCI device hotplug.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
65774e8a78 vmm: Implement BusDevice for DeviceManager
In anticipation of inserting the DeviceManager on the IO/MMIO buses,
the DeviceManager must implement the BusDevice trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
8dbc84318c vmm: acpi: Add PCNT method to invoke DVNT
Create a small method that first performs the hotplug of all the devices
identified by the PCIU bitmap, and then performs the hotunplug of all the
devices identified by the PCID bitmap.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c62db97a81 vmm: acpi: Add _EJ0 to each PCI device slot
The _EJ0 method provides the guest OS a way to notify the VMM that the
device has been properly ejected from the guest OS. Only after this
point, the VMM can fully remove the device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
4dc2a39f3a vmm: acpi: Create PHPR container
This new PHPR device in the DSDT table introduces some specific
operation regions and the associated fields.

PCIU stands for "PCI up", which identifies PCI devices that must be
added.
PCID stands for "PCI down", which identifies PCI devices that must be
removed.
B0EJ stands for "Bus 0 eject", which identifies which device on the bus
has been ejected by the guest OS.

Thanks to these fields, the VMM and the guest OS can communicate while
performing hotplug/hotunplug operations.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
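
As a rough illustration of the PCIU/PCID fields from 4dc2a39f3a, the sketch
below models them as two 32-bit bitmaps, one bit per device slot on bus 0.
The struct, the method names and the clear-on-read behaviour of take_up()
are assumptions made for this example, not the actual DeviceManager code.

```rust
/// Number of device slots on PCI bus 0, one bit per slot.
const NUM_SLOTS: u32 = 32;

/// Hypothetical VMM-side view of the PCIU ("PCI up") and PCID ("PCI down") fields.
#[derive(Default)]
struct HotplugBitmaps {
    pci_up: u32,
    pci_down: u32,
}

impl HotplugBitmaps {
    /// Flag a slot as holding a device the guest must add.
    fn mark_added(&mut self, slot: u32) {
        assert!(slot < NUM_SLOTS);
        self.pci_up |= 1u32 << slot;
    }

    /// Flag a slot as holding a device the guest must remove.
    fn mark_removed(&mut self, slot: u32) {
        assert!(slot < NUM_SLOTS);
        self.pci_down |= 1u32 << slot;
    }

    /// Return the PCIU value and clear it (assumed clear-on-read semantics).
    fn take_up(&mut self) -> u32 {
        std::mem::take(&mut self.pci_up)
    }
}

/// List the slot numbers whose bit is set in a bitmap.
fn slots_in(bitmap: u32) -> Vec<u32> {
    (0..NUM_SLOTS)
        .filter(|&slot| (bitmap & (1u32 << slot)) != 0)
        .collect()
}
```

A guest-side scan corresponds to walking slots_in() over each bitmap and
notifying the matching slot.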
Sebastien Boeuf
c3a0685e2d vmm: acpi: Add notification method for PCI device slots
Adds the DVNT method to the PCI0 device in the DSDT table. This new
method is responsible for checking each slot and notifying the guest OS if
one of the slots is supposed to be added or removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5a68d5b6a7 vmm: acpi: Create PCI device slots
This commit introduces the ACPI support for describing the 32 device
slots attached to the main PCI host bridge.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Bin Liu
d6e6901957 vmm/api: Fix vm.info response definition
Update cloud-hypervisor.yaml with latest code.

Fixes: #841

Signed-off-by: liubin <liubin0329@gmail.com>
2020-03-03 09:34:25 +01:00
Sebastien Boeuf
8142c823ed vmm: Move DeviceManager into an Arc<Mutex<>>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is, we need the DeviceManager to implement the
BusDevice trait and then provide it to the IO bus, so that IO accesses
related to device hotplug can be handled correctly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-27 11:12:31 +01:00
Qiu Wenbo
9de3ace8c7 devices: implement Aml trait for GED device
Fixes: #657

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-02-25 08:32:16 +00:00
Sebastien Boeuf
b77fdeba2d msi/msi-x: Prevent from losing masked interrupts
We want to avoid losing interrupts while they are masked. They can be
lost because of the way they are connected through KVM: an eventfd is
registered to a specific GSI, and then a route is associated with this
same GSI.

The current code adds/removes a route whenever a mask/unmask action
happens. The problem with this approach is that KVM will consume the
eventfd but won't be able to find an associated route, so it won't
be able to deliver the interrupt.

That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there are pending events, KVM will trigger them,
based on the route associated with the GSI.

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-25 08:31:14 +00:00
Rob Bradford
bba5ef3a59 vmm: Remove deprecated CPU syntax
Remove the old way of specifying the number of vCPUs to use.

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
374ac77c63 main, vmm: Remove deprecated --vhost-user-net
This has been superseded by using --net with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
ffd816ebfa main, vmm: Remove deprecated --vhost-user-blk
This has been superseded by using --disk with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
dependabot-preview[bot]
f190cb05b5 build(deps): bump libc from 0.2.66 to 0.2.67
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.66 to 0.2.67.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.66...0.2.67)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-02-21 08:03:30 +00:00
Sergio Lopez
d2f1749edb vmm: config: Add poll_queue property to DiskConfig
Recently, vhost_user_block gained the ability of actively polling the
queue, a feature that can be disabled with the poll_queue property.

This change adds this property to DiskConfig, so it can be used
through the "disk" argument.

For the moment, it can only be used when vhost_user=true, but this
will change once virtio-block gets the poll_queue feature too.

Fixes: #787

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
378dd81204 vmm: openapi: Add missing "direct" knob to DiskConfig
Add missing "direct" knob that should be exposed through the REST API.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
056f5481ac vmm: openapi: Fix "readonly" and "wce" defaults in DiskConfig
Fix "readonly" and "wce" defaults in cloud-hypervisor.yaml to match
their respective defaults in config.rs:DiskConfig.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Samuel Ortiz
c49e31a6d9 vmm: api: Return a resize error when resize fails
And not a VmCreate one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
ebc6391bea vmm: api: Fix resize command typos
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
9de755334d vmm: openapi: Update DiskConfig
It's missing a few knobs (readonly, vhost, wce) that should be exposed
through the REST API.

Fixes: #790

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:17:50 +01:00
Rob Bradford
ed1e7817cc vmm: Workaround double reboot triggered by the kernel
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)

In order to trigger a reset, the devices write to an EventFd (called
reset_evt.) This is used by the VMM to identify if a reset is requested
and make the VM reboot. As the reset_evt is part of the VMM and reused
for both the old and new VM, it is possible for the newly booted VM to
immediately get reset as there is an old event sitting in the EventFd.

The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there are no spurious events in the EventFd.

Fixes: #783

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-19 18:51:14 +01:00
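
The "drain" step from ed1e7817cc can be modelled without touching a real
eventfd. The EventCounter type below is an in-memory stand-in written for
this example; it only mimics the counter semantics of a non-blocking
eventfd (writes accumulate, a read returns the total and resets it) and is
not the EventFd type the VMM uses.

```rust
/// In-memory model of a non-blocking eventfd counter (hypothetical type).
#[derive(Default)]
struct EventCounter {
    value: u64,
}

impl EventCounter {
    /// Add to the pending count, as a device does when requesting a reset.
    fn write(&mut self, v: u64) {
        self.value += v;
    }

    /// Return the pending count and reset it, or None when nothing is pending.
    fn try_read(&mut self) -> Option<u64> {
        if self.value == 0 {
            None
        } else {
            Some(std::mem::take(&mut self.value))
        }
    }
}

/// Discard any stale events; returns how many were pending.
fn drain(evt: &mut EventCounter) -> u64 {
    evt.try_read().unwrap_or(0)
}
```

Calling drain() before booting the new VM ensures that a reset request
left over from the previous VM is not seen by the new one.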
Sebastien Boeuf
793d4e7b8d vmm: Move codebase to GuestMemoryAtomic from vm-memory
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap>> with
GuestMemoryAtomic<GuestMemoryMmap>.

The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.

Fixes #735

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 13:48:19 +00:00
Rob Bradford
1f6cbad01a vmm: Add support for spawning vhost-user-block backend
If no socket is supplied when enabling "vhost_user=true" on "--disk",
follow the "exe" path in the /proc entry for this process and launch the
block backend (via the vmm_path field.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-18 08:43:47 +00:00
Sebastien Boeuf
3edc2bd6ab vmm: Prevent memory overcommitment through virtio-fs shared regions
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.

It's only when the guest performs the mount of the virtiofs filesystem
that we can expect the VMM, on behalf of the backend, to perform some
new mappings in the reserved shared window, using PROT_READ and/or
PROT_WRITE.

Fixes #763

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-17 15:03:47 +01:00
Rob Bradford
bc75c1b4e1 vmm: Add support for spawning vhost-user-net backend
If no socket is supplied when enabling "vhost_user=true" on "--net",
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)

Currently this only supports creating a new tap interface as the network
backend also only supports that.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
b04eb4770b vmm: Follow the "exe" symlink from the PID directory in /proc
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2)).

The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
7c9e8b103f vmm: device_manager: Shutdown all virtio devices
When the DeviceManager is dropped, explicitly shutdown() all virtio
devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Sebastien Boeuf
3447e226d9 dependencies: bump vm-memory from 4237db3 to f3d1c27
This commit updates Cloud-Hypervisor to rely on the latest version of
the vm-memory crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-06 11:40:45 +01:00
Sebastien Boeuf
62ccccc303 vmm: Make sure to retry creating the VM on EINTR
If the ioctl syscall KVM_CREATE_VM gets interrupted while creating the
VM, it is expected that we should retry since EINTR should not be
considered a standard error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-05 12:06:21 +01:00
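
The retry behaviour described in 62ccccc303 follows a common pattern that
can be sketched with the standard library alone. The helper name
retry_on_eintr is made up for this example, and the closure stands in for
the KVM_CREATE_VM call; EINTR surfaces in Rust as
std::io::ErrorKind::Interrupted.

```rust
use std::io;

/// Run `op` again for as long as it fails with EINTR (ErrorKind::Interrupted).
/// Any other result, success or error, is returned to the caller unchanged.
fn retry_on_eintr<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match op() {
            // An interrupted syscall is not a real failure: try again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}
```

A caller would wrap the VM creation call in the closure and handle only
the errors that remain after the retries.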
Samuel Ortiz
da2b3c92d3 vm-device: interrupt: Remove InterruptType dependencies and definitions
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kinds of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kinds of interrupts.

By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 19:32:45 +01:00
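
The design described in da2b3c92d3 can be sketched as a trait with an
associated configuration type. Everything below is a simplified
illustration: the config structs, the two managers and the trigger()
return value are invented for this example and do not reproduce the
vm-device definitions.

```rust
/// A group of interrupt sources created from one configuration.
trait InterruptSourceGroup {
    /// Returns the interrupt number that would be raised (illustrative only).
    fn trigger(&self, index: u32) -> Result<u32, String>;
}

/// Each manager supports exactly one kind of interrupt, named by GroupConfig.
trait InterruptManager {
    type GroupConfig;
    fn create_group(
        &self,
        config: Self::GroupConfig,
    ) -> Result<Box<dyn InterruptSourceGroup>, String>;
}

// Hypothetical configuration types, one per interrupt kind.
struct LegacyIrqGroupConfig { irq: u32 }
struct MsiIrqGroupConfig { base: u32, count: u32 }

struct LegacyGroup { irq: u32 }
impl InterruptSourceGroup for LegacyGroup {
    fn trigger(&self, index: u32) -> Result<u32, String> {
        // A legacy group has a single source at index 0.
        if index == 0 { Ok(self.irq) } else { Err(format!("bad index {}", index)) }
    }
}

struct MsiGroup { base: u32, count: u32 }
impl InterruptSourceGroup for MsiGroup {
    fn trigger(&self, index: u32) -> Result<u32, String> {
        if index < self.count { Ok(self.base + index) } else { Err(format!("bad index {}", index)) }
    }
}

struct LegacyManager;
impl InterruptManager for LegacyManager {
    type GroupConfig = LegacyIrqGroupConfig;
    fn create_group(&self, c: Self::GroupConfig) -> Result<Box<dyn InterruptSourceGroup>, String> {
        Ok(Box::new(LegacyGroup { irq: c.irq }))
    }
}

struct MsiManager;
impl InterruptManager for MsiManager {
    type GroupConfig = MsiIrqGroupConfig;
    fn create_group(&self, c: Self::GroupConfig) -> Result<Box<dyn InterruptSourceGroup>, String> {
        Ok(Box::new(MsiGroup { base: c.base, count: c.count }))
    }
}
```

Because the configuration type is fixed per manager, passing an MSI
configuration to the legacy manager is a compile-time error rather than a
runtime branch on an interrupt type.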
Samuel Ortiz
84fc807bc6 interrupt: Interrupt manager split
We create 2 different interrupt managers for separately handling
creation of legacy and MSI interrupt groups.
Doing so allows us to have a cleaner interrupt manager and IOAPIC
initialization path. It also prepares for an InterruptManager trait
design improvement where we remove the interrupt source type dependency
by associating an interrupt configuration type to the trait.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 19:32:45 +01:00
Rob Bradford
880a57c920 vmm: Remove VmInfo struct
After refactoring, the VmInfo struct is no longer needed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
07bc292fa5 vmm: device_manager: Get VmFd from AddressManager
A reference to the VmFd is stored on the AddressManager so it is not
necessary to pass in the VmInfo into all methods that need it as it can
be obtained from the AddressManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
6411c3ae42 vmm: device_manager: Use MemoryManager to get guest memory
The DeviceManager has a reference to the MemoryManager so use that to
get the GuestMemoryMmap rather than the version stored in the VmInfo
struct.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
066fc6c0d1 vmm: device_manager: Get VM config from the struct member
Remove the use of vm_info in methods to get the config and instead use
the config stored on the DeviceManager itself.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
77ae3de4f3 vmm: device_manager: Make legacy device addition a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
599275b610 vmm: device_manager: Make ACPI device creation a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
b8c1b2e174 vmm: device_manager: Make console creation a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
b5440e2d0a vmm: device_manager: Make virtio device creation functions methods
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. This prepares the way to more easily store state on
the DeviceManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
e90c6f3c44 vmm: device_manager: Make make_virtio_devices a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. A follow-up commit will change the callee functions
that create the devices themselves.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
dbc09ad0ef vmm: device_manager: Make add_vfio_devices a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
d9e1c2cd22 vmm: device_manager: Make add_virtio_pci_device a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
aaa5e2e9ea vmm: device_manager: Make add_virtio_mmio_device a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
2987476e0a vmm: device_manager: Make add_pci_devices and add_mmio_devices methods
Modify these functions to take an &mut self and become methods on
DeviceManager. This allows the removal of some in/out parameters and
leads the way to further refactoring and simplification.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
3dbae423bb vmm: device_manager: Only add MemoryManager to I/O bus on ACPI builds
The MemoryManager should only be included on the I/O bus when doing ACPI
builds as that is the only time it will be interrogated.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
68fa97eb0e vmm: device_manager: Always embed MemoryManager in the struct
Currently the MemoryManager is only used on the ACPI code paths after
the DeviceManager has been created. This will change in a future commit
as part of the refactoring, so for now always include it but name it with
an underscore prefix to indicate it might not always be used.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Sebastien Boeuf
ac01ceddbb vmm: Cleanup list of PCI IDs related to virtual IOMMU
Now that devices attached to the virtual IOMMU are described through
virtio configuration, there is no need for the DeviceManager to store
the list of IDs for all these devices. Instead, things are handled
locally when PCI devices are being added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 10:37:40 +01:00
Sebastien Boeuf
097cff2d85 vmm: Use virtio topology for virtio-iommu
Instead of relying on the ACPI tables to describe the devices attached
to the virtual IOMMU, let's use the virtio topology, as the ACPI support
is getting deprecated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 10:37:40 +01:00
dependabot-preview[bot]
1651cc3953 build(deps): bump kvm-ioctls from 0.4.0 to 0.5.0
Bumps [kvm-ioctls](https://github.com/rust-vmm/kvm-ioctls) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/rust-vmm/kvm-ioctls/releases)
- [Changelog](https://github.com/rust-vmm/kvm-ioctls/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-vmm/kvm-ioctls/compare/v0.4.0...v0.5.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-29 10:22:51 +00:00
Rob Bradford
75e6762897 vmm: Give deprecation warning for "--vhost-user-blk" syntax
This will be removed in a future release.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-29 08:06:37 +00:00
Rob Bradford
969b5ee4e8 vmm: config: Add warning about specifying "wce" without "vhost-user"
Currently configuring WCE is only supported when using vhost-user.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-29 08:06:37 +00:00
Rob Bradford
aeeae661fc vmm: Support vhost-user-block via "--disks"
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-block and
vhost-user-block.  For now it is necessary to specify both vhost_user
and socket parameters as auto activation is not yet implemented. The wce
parameter for supporting "Write Cache Enabling" is also added to the
disk configuration.

The original command line parameter is still supported for now and will
be removed in a future release.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-29 08:06:37 +00:00
Rob Bradford
2c6f528c23 vmm: Give deprecation warning for "--vhost-user-net" syntax
This will be removed in a future release.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-28 12:39:26 +00:00
Rob Bradford
a831aa214c vmm: Support vhost-user-net via "--net"
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-net and vhost-user-net.
For now it is necessary to specify both vhost_user and socket parameters
as auto activation is not yet implemented. The original command line
parameter is still supported for now.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-28 12:39:26 +00:00
Sebastien Boeuf
f5b53ae4be vm-virtio: Implement multiqueue/multithread support for virtio-blk
This commit improves the existing virtio-blk implementation, allowing
for better I/O performance. The cost for the end user is to accept
allocating more vCPUs to the virtual machine, so that multiple I/O
threads can run in parallel.

Note that the number of vCPUs must be equal to or greater than the
number of queues dedicated to the virtio-blk device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-28 09:26:53 +01:00
Sebastien Boeuf
08e47ebd4b vmm: Add num_queues and queue_size parameters to virtio-blk
The number of queues and the size of each queue were not configurable.
In anticipation of adding multiqueue support, this commit introduces
some new parameters to let the user decide about the number of queues
and the queue size.

Note that the default values for each of these parameters are identical
to the default values used for vhost-user-blk, that is 1 for the number
of queues and 128 for the queue size.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-28 09:26:53 +01:00
dependabot-preview[bot]
16af54e583 build(deps): bump signal-hook from 0.1.12 to 0.1.13
Bumps [signal-hook](https://github.com/vorner/signal-hook) from 0.1.12 to 0.1.13.
- [Release notes](https://github.com/vorner/signal-hook/releases)
- [Changelog](https://github.com/vorner/signal-hook/blob/master/CHANGELOG.md)
- [Commits](https://github.com/vorner/signal-hook/compare/v0.1.12...v0.1.13)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-01-26 13:29:47 +00:00
Sebastien Boeuf
0fa1e2c241 vmm: Handle mapping from devices regions through vm-memory
Devices like virtio-pmem and virtio-fs require some dedicated memory
region to be mapped. The memory mapping from the DeviceManager is being
replaced by the usage of MmapRegion from the vm-memory crate.

The unmap will happen automatically when the MmapRegion is dropped,
which should happen when the DeviceManager gets dropped.

Fixes #240

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-24 17:56:49 +01:00
Sebastien Boeuf
148a9ed5ce vmm: Fix map_err losing the inner error
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-24 12:42:09 +01:00
Sebastien Boeuf
06396593c9 net_util: Fix map_err losing the inner error
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-24 12:42:09 +01:00
Rob Bradford
a34893a402 Revert "vmm: Move MemoryManager from I/O ports to MMIO region"
This reverts commit 03108fb88b.
2020-01-24 12:08:31 +01:00
Rob Bradford
57ed006992 Revert "devices, vmm: Move GED device to MMIO region"
This reverts commit 5e3c62dc6a.
2020-01-24 12:08:31 +01:00
Rob Bradford
6120d0fb1b Revert "vmm: Move CpuManager device to MMIO region"
This reverts commit 980e03fa0a.
2020-01-24 12:08:31 +01:00
Rob Bradford
980e03fa0a vmm: Move CpuManager device to MMIO region
Move the CpuManager device from the I/O bus to living in an MMIO region.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-23 16:04:58 +00:00
Rob Bradford
5e3c62dc6a devices, vmm: Move GED device to MMIO region
Move GED device reporting of required device type to scan into an MMIO
region rather than an I/O port.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-23 16:04:58 +00:00
Rob Bradford
03108fb88b vmm: Move MemoryManager from I/O ports to MMIO region
Rather than have the MemoryManager device sit on the I/O bus, allocate
space for MMIO and add it to the MMIO bus.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-23 16:04:58 +00:00
Sebastien Boeuf
0042f1de75 ioapic: Rely fully on the InterruptSourceGroup to manage interrupts
This commit relies on the interrupt manager and the resulting interrupt
source group to abstract the knowledge about KVM and how interrupts are
updated and delivered.

This allows the entire "devices" crate to be freed from kvm_ioctls and
kvm_bindings dependencies.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-23 11:20:08 +00:00
Sebastien Boeuf
2dca959084 ioapic: Create the InterruptSourceGroup from InterruptManager
The interrupt manager is passed to the IOAPIC creation, and the IOAPIC
now creates an InterruptSourceGroup for MSI interrupts based on it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-23 11:20:08 +00:00
Sebastien Boeuf
52800a871a vmm: Create an InterruptManager dedicated to IOAPIC
By introducing a new InterruptManager dedicated to the IOAPIC, we don't
have to solve the chicken-and-egg problem of whether the
InterruptManager or the Ioapic should be created first. It's also
totally fine to have two interrupt manager instances as they both share
the same list of GSI routes and the same allocator.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-23 11:20:08 +00:00
Qiu Wenbo
2034fc2d84 vmm: Fix LENGTH_OFFSET_HIGH of MemoryManager
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-01-22 12:33:38 +00:00
Sergio Lopez
925c862f98 vmm: device_manager: Add 'direct' support for virtio-blk
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'direct' property, honor
it while opening the file backing the disk image, and pass it to
vm_virtio::RawFile.

Fixes #631

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-21 13:39:45 +00:00
Sergio Lopez
fb79e75afc vmm: device_manager: Add read-only support for virtio-blk
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'readonly' property, and pass
it to vm_virtio::Block.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-21 13:39:45 +00:00
Sebastien Boeuf
9ac06bf613 ci: Run clippy for each specific feature
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.

Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 11:44:40 +01:00
Sebastien Boeuf
99f39291fd pci: Simplify PciDevice trait
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
a20b383be8 vmm: Always use a reference for InterruptManager
Since the InterruptManager is never stored into any structure, it should
be passed as a reference instead of being cloned.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
bb8cd9eb24 vmm: Use LegacyUserspaceInterruptGroup for acpi device
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.

Additionally, since it removes the last bit relying on the Interrupt
trait, the trait and its implementation can be removed from the
codebase.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
75e22ff34e vmm: Use LegacyUserspaceInterruptGroup for serial device
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
8d7c4ea334 vmm: Use LegacyUserspaceInterruptGroup for mmio devices
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
12657ef59f vmm: Fully implement LegacyUserspaceInterruptGroup
Relying on the previous commits, the legacy interrupt implementation can
be completed. The IOAPIC handler is used to deliver the interrupt that
will be triggered through the trigger() method.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
f70c9937fb vmm: Add ioapic to KvmInterruptManager
By having a reference to the IOAPIC, the KvmInterruptManager is going
to be able to initialize properly the legacy interrupt source group.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
c9ea235a0e vmm: Add LegacyUserspaceInterruptGroup skeleton for legacy interrupts
In order to be able to use the InterruptManager abstraction with
virtio-mmio devices, this commit introduces InterruptSourceGroup's
skeleton for legacy interrupts.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
2aabf58bf5 vmm: Move irq_routes creation to specific MSI use case
When KvmInterruptManager initializes a new InterruptSourceGroup, it's
only for PCI_MSI_IRQ case that it needs to allocate the GSI and create a
new InterruptRoute. That's why this commit moves the general code into
the specific use case.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
d34f31fe7b vmm: Fix KvmInterruptManager when base is different from 0
When the base InterruptIndex is different from 0, the loop allocating
GSI and HashMap entries won't work as expected. The for loop needs to
start from base, but the limit must be base+count so that we allocate
a number of "count" entries.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
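
The off-by-base issue fixed in d34f31fe7b comes down to the range used by
the allocation loop. The sketch below uses a made-up GsiAllocator and a
plain HashMap from interrupt index to GSI; it is an illustration of the
range arithmetic, not the KvmInterruptManager code.

```rust
use std::collections::HashMap;

/// Hypothetical allocator handing out consecutive GSI numbers.
struct GsiAllocator {
    next: u32,
}

impl GsiAllocator {
    fn allocate(&mut self) -> u32 {
        let gsi = self.next;
        self.next += 1;
        gsi
    }
}

/// Allocate one GSI per interrupt index in [base, base + count).
fn allocate_routes(alloc: &mut GsiAllocator, base: u32, count: u32) -> HashMap<u32, u32> {
    let mut routes = HashMap::new();
    // A range ending at `count` instead of `base + count` would yield
    // fewer than `count` entries (possibly none) whenever base != 0.
    for index in base..base + count {
        routes.insert(index, alloc.allocate());
    }
    routes
}
```

With base = 4 and count = 3, the map holds entries for indexes 4, 5 and 6,
which is the "count entries starting at base" behaviour the commit asks for.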
Sebastien Boeuf
e73cb1ff80 vmm: Initialize InterruptManager sooner
In order to let the InterruptManager be shared across both PCI and MMIO
devices, this commit moves the initialization earlier in the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Rob Bradford
3901a1dd7d vmm: Log an error if VM resize fails
As well as returning an error to the API caller.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-17 23:44:21 +01:00
Rob Bradford
76d9bf2792 vmm: Start memory slots at zero
After refactoring, a common function is used to set up these slots and
that function takes care of allocating a new slot, so it is not necessary
to reserve the initial region slots.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-17 23:44:21 +01:00
Rob Bradford
0ab22fea2c vmm: Only generate GED event when new DIMM added
Avoid the ACPI scan in the guest OS when no new DIMM is hotplugged.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-17 23:44:21 +01:00
Rob Bradford
211786ab42 vmm: Only generate GED interrupt when the number of vCPUs has changed
Avoid activity in the guest OS if the number of vCPUs has not
changed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-17 23:44:21 +01:00
Sebastien Boeuf
4bb12a2d8d interrupt: Reorganize all interrupt management with InterruptManager
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.

By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.

Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupts, no matter the type of
interrupt and no matter which hypervisor might be in use.

Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
92082ad439 vmm: Fully implement interrupt traits
After the skeleton of the InterruptManager and InterruptSourceGroup traits
has been implemented, this new commit takes care of fully implementing
the content of KvmInterruptManager (InterruptManager trait) and
MsiInterruptGroup (InterruptSourceGroup).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
0f727127d5 vmm: Implement InterruptSourceGroup and InterruptManager skeleton
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
c396baca46 vm-virtio: Modify VirtioInterrupt callback into a trait
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementations of the same
interface.

Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.

For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
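The callback-versus-trait argument made in c396baca46 can be sketched as follows. Every name here (`InterruptKind`, `TriggerCallback`, `Interrupt`, `CountingInterrupt`, `notify_queue`) is hypothetical and chosen for illustration; this is not the actual VirtioInterrupt definition.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical interrupt kinds, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum InterruptKind {
    Config,
    Queue(u16),
}

/// Callback style: each operation needs its own boxed closure type.
type TriggerCallback = Box<dyn Fn(InterruptKind) -> Result<(), String> + Send + Sync>;

/// Trait style: several operations are grouped behind one interface.
trait Interrupt: Send + Sync {
    fn trigger(&self, kind: InterruptKind) -> Result<(), String>;

    /// A second operation is just another method, not another callback.
    fn name(&self) -> &'static str {
        "unnamed"
    }
}

/// One possible implementation: counts how often it was triggered.
struct CountingInterrupt {
    fired: AtomicUsize,
}

impl Interrupt for CountingInterrupt {
    fn trigger(&self, _kind: InterruptKind) -> Result<(), String> {
        self.fired.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    fn name(&self) -> &'static str {
        "counting"
    }
}

/// A device only sees the trait object, whatever the implementation is.
fn notify_queue(irq: &Arc<dyn Interrupt>, queue: u16) -> Result<(), String> {
    irq.trigger(InterruptKind::Queue(queue))
}
```

A device holding an `Arc<dyn Interrupt>` can be handed a different implementation without any change to its own code.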
Sebastien Boeuf
2381f32ae0 msix: Add gsi_msi_routes to MsixConfig
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
9b60fcdc39 msix: Add VmFd to MsixConfig
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
86c760a0d9 msix: Add SystemAllocator to MsixConfig
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.

Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f5704d32b3 vmm: Move gsi_msi_routes creation to be shared across all PCI devices
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sergio Lopez
a14aee9213 qcow: Use RawFile as backend instead of File
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
c5a656c9dc vm-virtio: block: Add support for alignment restrictions
Doing I/O on an image opened with O_DIRECT requires adhering to
certain restrictions, with the following elements being aligned:

 - Address of the source/destination memory buffer.
 - File offset.
 - Length of the data to be read/written.

The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".

To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.

We also extend RawFile so it can be used as a backend for QcowFile,
so the latter can be easily adapted to support O_DIRECT too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
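The alignment discovery described in c5a656c9dc can be sketched without touching a real file. The `try_read` closure below stands in for the pread() attempt; `probe_alignment` and `is_aligned` are hypothetical helpers for illustration, not the RawFile API.

```rust
use std::io;

/// Returns the first candidate alignment for which `try_read` succeeds.
/// `try_read` is expected to attempt one read using that alignment.
fn probe_alignment<F>(candidates: &[usize], mut try_read: F) -> Option<usize>
where
    F: FnMut(usize) -> io::Result<()>,
{
    candidates
        .iter()
        .copied()
        .find(|&align| try_read(align).is_ok())
}

/// Checks the three constraints from the list above for a single request:
/// buffer address, file offset and length must all be multiples of `align`.
/// `align` must be non-zero.
fn is_aligned(buf_addr: usize, offset: u64, len: usize, align: usize) -> bool {
    buf_addr % align == 0 && offset % align as u64 == 0 && len % align == 0
}
```

Called with the candidates `[512, 4096]`, the probe returns the smallest alignment the read attempt accepts, or `None` if both fail.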
Cathy Zhang
652e7b9b8a vm-virtio: Implement multiple queue support for net devices
Update the common part in net_util.rs under vm-virtio to add mq
support, and enable mq for the virtio-net device, the vhost-user-net
device and the vhost-user-net backend. Multiple threads are created,
each one handling a single queue pair. For best performance, the
number of vCPUs should match the number of queue pairs defined for
the net device, because of CPU affinity.

Multiple thread support is not yet added for the vhost-user-net
backend; it will be added in the future.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
404316eea1 vmm: Add multiple queue option and update config for virtio-net device
Add num_queues and queue_size for the virtio-net device to make them
configurable, and add the associated options to the command line.

Update cloud-hypervisor.yaml with the new options for NetConfig.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
4ab88a8173 net_util: Add multiple queue support for tap
Allow VMMs to open the same tap device multiple times, which creates
one file descriptor per open.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
1ae7deb393 vm-virtio: Implement refactor for net devices and backend
Since the common parts now live in net_util.rs under vm-virtio,
refactor the virtio-net device, the vhost-user-net device and the
backend to shrink the code size and improve readability.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Rob Bradford
8b500d7873 deps: Bump vm-memory and linux-loader version
The function GuestMemory::end_addr() has been renamed to last_addr()

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
7310ab6fa7 devices, vmm: Use a bit field for ACPI GED interrupt type
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensures that hotplugging
both CPU and memory at the same time succeeds.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
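The independent-bits idea from 7310ab6fa7 can be sketched like this. The bit positions and the `GedNotification` type are assumptions made for illustration; the log does not give the real values.

```rust
/// Hypothetical bit assignments; the real values are not given in the log.
const GED_CPU_CHANGED: u8 = 1 << 0;
const GED_MEMORY_CHANGED: u8 = 1 << 1;

/// Accumulates pending notification types as independent bits.
#[derive(Default)]
struct GedNotification {
    pending: u8,
}

impl GedNotification {
    /// OR-ing the bit in means a later notification cannot overwrite
    /// an earlier one of a different type.
    fn notify(&mut self, kind: u8) {
        self.pending |= kind;
    }

    /// Returns the pending bits and resets them, so a second read sees 0.
    fn take(&mut self) -> u8 {
        std::mem::take(&mut self.pending)
    }
}
```

After a CPU notification followed by a memory notification, a single `take()` reports both bits, which is what lets a simultaneous CPU and memory hotplug succeed.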
Rob Bradford
28c6652e57 vmm: Upon VmResize attempt to hotplug the memory
If a new amount of RAM is requested in the VmResize command, try to
hotplug it if it is an increase (MemoryManager::Resize() silently ignores
decreases).

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
4e414f0d84 vmm: device_manager: Scan memory devices upon GED interrupt
If there is a GED interrupt and the field indicates that the memory
device has changed, trigger a scan of the memory devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
284d5e011a vmm: Add memory hotplug ACPI entries to DSDT
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
8ecf736982 vmm: device_manager: Add the MemoryManager to the I/O bus
Now that the MemoryManager has I/O port functionality it needs to be
exposed on the I/O bus.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
1218765df2 vmm: memory_manager: Expose the slots details via an I/O port
Expose the details of hotplug RAM slots via an I/O port. This will be
consumed by the ACPI DSDT tables to report the hotplug memory details to
the guest.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
9880a2aba9 vmm: memory_manager: Add support for adding new memory to the VM
Add a "resize()" method on MemoryManager which will create a new memory
allocation based on the difference between the desired RAM amount and
the amount already in use. After allocating the added RAM using the same
backing method as the boot RAM, store the details in a vector, update
the KVM map, create a new GuestMemoryMmap and replace it for all the users.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
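The size arithmetic behind the "resize()" method in 9880a2aba9 can be sketched as below. `MemoryState` is a hypothetical model that only tracks sizes; the real method also allocates backing memory, updates the KVM map and rebuilds the GuestMemoryMmap, none of which is shown here.

```rust
/// Hypothetical size-only model of the memory manager's bookkeeping.
struct MemoryState {
    boot_ram: u64,
    /// Sizes of the regions added after boot, in bytes.
    hotplugged: Vec<u64>,
}

impl MemoryState {
    /// RAM currently in use: boot RAM plus every hotplugged region.
    fn current(&self) -> u64 {
        self.boot_ram + self.hotplugged.iter().sum::<u64>()
    }

    /// Records a new region covering the difference between `desired` and
    /// the amount already in use. Returns `None` (and changes nothing) when
    /// the request is not an increase.
    fn resize(&mut self, desired: u64) -> Option<u64> {
        let delta = desired
            .checked_sub(self.current())
            .filter(|&d| d > 0)?;
        self.hotplugged.push(delta);
        Some(delta)
    }
}
```

Growing from 512 MiB to 1 GiB records one 512 MiB region, while a later request for 256 MiB is ignored.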
Rob Bradford
82fce5a4e2 vmm: Add support for resizing the memory used by the VM
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
78dcb1862c vmm: device_manager: Store the type of notification in a local value
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered, the notification value is reset,
preventing a second read from exposing the value. If we need to support
multiple types of GED notification (such as memory hotplug) then we
should avoid reading the value multiple times.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
f5137e84bb vmm, main: Add optional "hotplug_size" to --mem
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by moving the start of the device
area by the desired amount.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
f1b6657833 vmm: Make desired vCPUs optional in resize command
In order to be able to support resizing either vCPUs or memory or both,
make the fields in the resize command optional.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
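Making the resize fields optional, as f1b6657833 describes, might look like the sketch below. `ResizeRequest`, `VmState` and `apply_resize` are hypothetical names, and the field types are assumptions.

```rust
/// Hypothetical resize command: a `None` field means "leave unchanged".
#[derive(Debug, Default, PartialEq)]
struct ResizeRequest {
    desired_vcpus: Option<u8>,
    desired_ram: Option<u64>,
}

/// Hypothetical view of the VM's current sizing.
#[derive(Debug, Default, PartialEq)]
struct VmState {
    vcpus: u8,
    ram: u64,
}

/// Applies only the fields that were actually provided.
fn apply_resize(state: &mut VmState, request: &ResizeRequest) {
    if let Some(vcpus) = request.desired_vcpus {
        state.vcpus = vcpus;
    }
    if let Some(ram) = request.desired_ram {
        state.ram = ram;
    }
}
```

A caller can then resize vCPUs only, memory only, or both with the same command.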
Rob Bradford
72b9e920a3 vmm: memory_manager: Further refactor memory region allocation
This allows the memory regions to be allocated later which is necessary
for hotplug memory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
1af11a7c92 vmm: memory_manager: Refactor GuestMemoryMmap construction
Make the GuestMemoryMmap from a Vec<Arc<GuestRegionMmap>>. By using this
method we can persist a set of regions in the MemoryManager and then
extend this set with a newly created region. Ultimately that will allow
the hotplug of memory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Samuel Ortiz
5788d36583 vmm: Do not create virtio devices when missing a transport
If neither PCI nor MMIO is built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-01-14 07:42:09 +01:00
Sebastien Boeuf
ae6f27277b acpi: Introduce VIOT to support latest virtio-iommu implementation
Because virtio-iommu is still evolving (as it's only partly upstream),
some pieces like the ACPI declaration of the different nodes and devices
attached to the virtual IOMMU are changing.

This patch introduces a new ACPI table called VIOT, standing as the high
level table overseeing the IORT table and associated subtables.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-08 09:27:07 +01:00
Rob Bradford
b2589d4f3f vm-virtio, vmm, vfio: Store GuestMemoryMmap in an Arc<ArcSwap<T>>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from one map to
another whilst still giving good read performance. It is inside an Arc so that we
can use a single ArcSwap for all users.

Not covered by this change is replacing the GuestMemoryMmap itself.

This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-02 13:20:11 +00:00
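The sharing pattern from b2589d4f3f can be approximated with the standard library only. ArcSwap comes from an external crate, so this sketch swaps in `RwLock<Arc<T>>` as a stand-in: it shows the same "one shared slot, replaceable map" shape but not ArcSwap's lock-free reads. `MemoryMap` and `SharedMap` are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a guest memory map: (start, size) pairs.
#[derive(Debug, PartialEq)]
struct MemoryMap {
    regions: Vec<(u64, u64)>,
}

/// Shared, swappable handle. Cloning it shares the same slot, which mirrors
/// keeping a single swap cell inside an Arc for all users.
#[derive(Clone)]
struct SharedMap(Arc<RwLock<Arc<MemoryMap>>>);

impl SharedMap {
    fn new(map: MemoryMap) -> Self {
        SharedMap(Arc::new(RwLock::new(Arc::new(map))))
    }

    /// Readers take a cheap snapshot of the current map.
    fn load(&self) -> Arc<MemoryMap> {
        self.0.read().unwrap().clone()
    }

    /// Replaces the map for every holder of this handle.
    fn store(&self, map: MemoryMap) {
        *self.0.write().unwrap() = Arc::new(map);
    }
}
```

A snapshot taken before `store()` keeps pointing at the old map, while any `load()` afterwards sees the new one.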
Rob Bradford
a551398135 vmm: device_manager: Use MemoryManager to create KVM mapping
Use the newly exported functionality to reduce the amount of duplicated
code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
962dec2913 vmm: memory_manager: Refactor KVM userspace mapping creation
This function will be useful for other parts of the VMM that also
establish their own mappings.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
7df88793a0 vmm: device_manager: Get device range from MemoryManager
This removes the duplication of these values.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
61cfe3e72d vmm: Obtain sequential KVM memory slot numbers from MemoryManager
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
260cebb8cf vmm: Introduce MemoryManager
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.

In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.

Follow on commits will further reduce some of the duplicated code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
d5682cd306 vmm: device_manager: Rewrite if chain using match
To reflect updated clippy rules:

error: `if` chain can be rewritten with `match`
    --> vmm/src/device_manager.rs:1508:25
     |
1508 | /                         if ret > 0 {
1509 | |                             debug!("MSI message successfully delivered");
1510 | |                         } else if ret == 0 {
1511 | |                             warn!("failed to deliver MSI message, blocked by guest");
1512 | |                         }
     | |_________________________^
     |
     = note: `-D clippy::comparison-chain` implied by `-D warnings`
     = help: Consider rewriting the `if` chain to use `cmp` and `match`.
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-20 00:52:03 +01:00
Rob Bradford
21b88c3ea0 vmm: cpu: Rewrite if chain using match
Address updated clippy error:

error: `if` chain can be rewritten with `match`
   --> vmm/src/cpu.rs:668:9
    |
668 | /         if desired_vcpus > self.present_vcpus() {
669 | |             self.activate_vcpus(desired_vcpus, None)?;
670 | |         } else if desired_vcpus < self.present_vcpus() {
671 | |             self.mark_vcpus_for_removal(desired_vcpus)?;
672 | |         }
    | |_________^
    |
    = note: `-D clippy::comparison-chain` implied by `-D warnings`
    = help: Consider rewriting the `if` chain to use `cmp` and `match`.
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-20 00:52:03 +01:00
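The rewrite that clippy's comparison_chain lint asks for in 21b88c3ea0 can be sketched with `cmp` and `match`. `ResizeAction` and `plan_resize` are hypothetical; they only model the decision, not the real CpuManager calls.

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of comparing desired and present vCPU counts.
#[derive(Debug, PartialEq)]
enum ResizeAction {
    Activate(u8),
    MarkForRemoval(u8),
    Nothing,
}

/// Same decision as the `if` / `else if` chain, expressed as one `match`
/// so that all three orderings are handled explicitly.
fn plan_resize(desired_vcpus: u8, present_vcpus: u8) -> ResizeAction {
    match desired_vcpus.cmp(&present_vcpus) {
        Ordering::Greater => ResizeAction::Activate(desired_vcpus),
        Ordering::Less => ResizeAction::MarkForRemoval(desired_vcpus),
        Ordering::Equal => ResizeAction::Nothing,
    }
}
```

The `Equal` arm makes the "nothing changed" case visible, which the original chain left implicit.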
Rob Bradford
e25a47b32c vmm: device_manager: Remove redundant clones
Address updated clippy errors:

error: redundant clone
   --> vmm/src/device_manager.rs:699:32
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                                ^^^^^^^^ help: remove this
    |
    = note: `-D clippy::redundant-clone` implied by `-D warnings`
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:699:21
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                     ^^^^^^^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

error: redundant clone
   --> vmm/src/device_manager.rs:737:26
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                          ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:737:21
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                     ^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

error: redundant clone
   --> vmm/src/device_manager.rs:754:29
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                             ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:754:25
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                         ^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-20 00:52:03 +01:00
Rob Bradford
a6878accd5 vmm: cpu: Implement CPU removal
When the running OS has been told that a CPU should be removed, it will
shut down the CPU and then signal to the hypervisor, via the "_EJ0" method
on the device that ultimately writes into an I/O port, that the vCPU
should be shut down. Upon notification the hypervisor signals to the
individual thread that it should shut down and waits for that thread to
end.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-18 08:23:53 +00:00
Rob Bradford
7b3fc72aea vmm: cpu: Notify guest OS that it should offline vCPUs
Allow the resizing of the number of vCPUs to less than the current
active vCPUs. This does not currently remove them from the system but
the kernel will take them offline.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-18 08:23:53 +00:00
Rob Bradford
7e81b0ded7 vmm: cpu: Create vCPU state for all possible vCPUs
This will make it more straightforward when we attempt to remove vCPUs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-18 08:23:53 +00:00
Rob Bradford
156ea392a2 vmm: cpu: Only do ACPI notify on newly added vCPUs
When we add a vCPU set an "inserting" boolean that is exposed as an ACPI
field that will be checked for and reset when the ACPI GED notification
for CPU devices happens.

This change is a precursor for CPU unplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-16 23:57:14 +01:00
Rob Bradford
e8313e3e69 vmm: acpi: Refactor ACPI CPU notification
Continue to notify on all vCPUs but instead separate the notification
functionality into two methods, CSCN that walks through all the CPUs
and CTFY which notifies based on the numerical CPU id. This is an
interim step towards only notifying on changed CPUs and ultimately CPU
removal.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-16 23:57:14 +01:00
Sebastien Boeuf
d1390906c8 vmm: config: Derive Debug and PartialEq for configuration structures
In anticipation of writing unit tests comparing two VmConfig
structures, this commit derives the PartialEq trait for VmConfig and
all embedded structures.

This patch also derives the Debug trait for the same set of structures
so that we can print them to facilitate debugging.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-16 16:48:59 +01:00
Sebastien Boeuf
93f5f6ed45 vmm: config: Provide a default empty command line through OpenAPI
The OpenAPI should not have to provide a command line since the CLI
considers the command line as an empty string if nothing is provided.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-16 16:48:59 +01:00
Sebastien Boeuf
43bd0e53c4 main: Move VmParams creation into a dedicated function
This brings more modularity to the code, which will be helpful when we
later test that the CLI and OpenAPI generate the same VmConfig output.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-16 16:48:59 +01:00
Samuel Ortiz
f0b7412495 vmm: device_manager: Add all virtio devices to the migratable list
We want to track all migratable devices through the DeviceManager.

Fixes: #341

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
37557c8b35 vmm: vm: Implement the Pausable trait
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
9756fc2dd0 vmm: cpu_manager: Implement the Pausable trait
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
35dd1523c9 vmm: device_manager: Implement the Pausable trait
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.

All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.

Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
35d7721683 vmm: Convert virtio devices to Arc<Mutex<T>>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.

Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
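Tracking devices as `Arc<Mutex<dyn Migratable>>`, as 35d7721683 describes, can be sketched as follows. The trait bodies, `FakeRng` and `Manager` are assumptions for illustration; the real traits carry more methods than shown.

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-ins for the traits named in the log.
trait Pausable {
    fn pause(&mut self);
    fn resume(&mut self);
}

trait Migratable: Pausable + Send {}

/// Hypothetical device used only for this sketch.
#[derive(Default)]
struct FakeRng {
    paused: bool,
}

impl Pausable for FakeRng {
    fn pause(&mut self) {
        self.paused = true;
    }
    fn resume(&mut self) {
        self.paused = false;
    }
}

impl Migratable for FakeRng {}

/// Hypothetical manager holding every device as a shared trait object.
#[derive(Default)]
struct Manager {
    migratable_devices: Vec<Arc<Mutex<dyn Migratable>>>,
}

impl Manager {
    fn add(&mut self, device: Arc<Mutex<dyn Migratable>>) {
        self.migratable_devices.push(device);
    }
}

/// Pausing the manager pauses every tracked device in turn.
impl Pausable for Manager {
    fn pause(&mut self) {
        for device in &self.migratable_devices {
            device.lock().unwrap().pause();
        }
    }
    fn resume(&mut self) {
        for device in &self.migratable_devices {
            device.lock().unwrap().resume();
        }
    }
}
```

Because the same `Arc` can also be registered elsewhere (for example on a bus), the device is shared rather than owned by the manager.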
Sebastien Boeuf
64c5e3d8cb vmm: api: Adjust FsConfig for OpenAPI
The FsConfig structure has been recently adjusted so that the default
value matches between OpenAPI and CLI. Unfortunately, with the current
description, there is no way from the OpenAPI to describe a cache_size
value "None", so that DAX does not get enabled. Usually, using a Rust
"Option" works because the default value is None. But in this case, the
default value is Some(8G), which means we cannot describe a None.

This commit tackles the problem, introducing an explicit parameter
"dax", and leaving "cache_size" as a simple u64 integer.

This way, the default value is dax=true and cache_size=8G, but it leaves
the opportunity to disable DAX entirely with dax=false, which will
simply ignore the cache_size value.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-11 15:50:24 +00:00
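The explicit "dax" parameter from 64c5e3d8cb can be sketched as a plain struct with defaults. `FsConfigSketch` and `effective_cache_size` are hypothetical and show only the two fields discussed in the commit message; 8G is taken to mean 8 GiB here, which is an assumption.

```rust
/// Hypothetical reduction of the filesystem config to the two fields
/// discussed in the commit message.
#[derive(Clone, Debug, PartialEq)]
struct FsConfigSketch {
    dax: bool,
    cache_size: u64,
}

impl Default for FsConfigSketch {
    /// Defaults stated in the commit message: dax=true, cache_size=8G.
    fn default() -> Self {
        FsConfigSketch {
            dax: true,
            cache_size: 8 << 30,
        }
    }
}

impl FsConfigSketch {
    /// The cache size only applies when DAX is enabled; with dax=false the
    /// stored value is ignored.
    fn effective_cache_size(&self) -> Option<u64> {
        if self.dax {
            Some(self.cache_size)
        } else {
            None
        }
    }
}
```

An API client can now express "no DAX" by sending `dax: false`, without needing a way to encode a missing cache size.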
Sebastien Boeuf
4bfd51cc42 vmm: api: Match VhostUserBlkConfig defaults between CLI and HTTP API
In order to let the CLI and the HTTP API behave the same regarding the
VhostUserBlkConfig structure, this patch defines some default values
for num_queues, queue_size and wce.

num_queues is 1, queue_size is 128 and wce is true.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-11 15:50:24 +00:00