Commit Graph

1434 Commits

Author SHA1 Message Date
Michael Zhao
c9374d87ac vmm: Update devid in kvm_irq_routing_entry
After introducing multiple PCI segments, the `devid` value in
`kvm_irq_routing_entry` exceeds the maximum supported range on AArch64.

This commit restricts the `devid` to the allowed range.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-12-01 09:24:01 +08:00
Rob Bradford
82d06c0efa vmm: Add support for booting raw binary (e.g. firmware) on x86-64
If the provided binary isn't an ELF binary, assume that it is firmware
to be loaded directly. In this case we shouldn't program any of the
registers as KVM starts in that state.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-30 13:39:36 +01:00
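
The ELF-versus-raw decision described in this commit could be sketched as a check on the ELF magic number. This is only an illustration under stated assumptions, not the code from the commit: `BinaryKind` and `classify_binary` are hypothetical names, and the real loader may detect a non-ELF image in a different way (for example, from a loader error).

```rust
/// Hypothetical classification of a boot binary (illustrative only).
#[derive(Debug, PartialEq)]
enum BinaryKind {
    Elf,
    RawFirmware,
}

/// The four-byte magic number that starts every ELF file.
const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];

/// Treat anything that does not start with the ELF magic as raw firmware.
fn classify_binary(image: &[u8]) -> BinaryKind {
    if image.starts_with(&ELF_MAGIC) {
        BinaryKind::Elf
    } else {
        BinaryKind::RawFirmware
    }
}
```

A caller would load a `RawFirmware` image directly and skip the register setup that is done for ELF kernels.
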
Ziye Yang
61ce4b8f31 vmm: Update comments related with enum Error struct in config.rs
Make the comments style consistent

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2021-11-26 10:22:57 +01:00
Ziye Yang
896a651b5c vmm: Update some comments and error message info in config.rs
Update some comments and error message info related to TDX.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2021-11-24 10:02:00 +01:00
Ziye Yang
51cfffd24f vmm: Make the comments consistent in 'DeviceManager'
Change "Failed xxing" to "Failed to xx" so that
we use only one style.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2021-11-19 08:43:23 +00:00
Bo Chen
2a312cd4fe vmm: Fix a comment typo from 'DeviceManager'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-11-18 12:00:39 -08:00
Wei Liu
ff0e92ab88 vmm: add a safety comment for EpollContext
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-11-17 23:12:11 +00:00
Wei Liu
9b3cab8c72 device_manager: check return value of dup(2)
That function call can return -1 when it fails. Wrapping -1 into File
causes the code to panic when the File is dropped.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-11-17 23:12:11 +00:00
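
The kind of check this commit describes can be sketched as a small helper that turns the raw return value of a dup(2)-style call into a `Result`. The helper name `checked_fd` is hypothetical and the actual call to dup(2) is left out, so this shows the pattern rather than the code in `device_manager`.

```rust
use std::io;

/// Convert the raw return value of a dup(2)-style call into a Result.
/// A value of -1 means the call failed and errno holds the reason.
fn checked_fd(ret: i32) -> io::Result<i32> {
    if ret == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}
```

Only a descriptor that passes this check should be wrapped in a `File`, so that nothing ever tries to close -1 on drop.
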
Wei Liu
84630aa0b5 device_manager: provide a few safety comments
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-11-17 23:12:11 +00:00
Alyssa Ross
ad8ed80eb1 vmm: use the tty raw mode implementation from libc
I encountered some trouble trying to use a virtio-console hooked up to
a PTY.  Reading from the PTY would produce stuff like this
"\n\nsh-5.1# \n\nsh-5.1# " (where I'm just pressing enter at a shell
prompt), and a terminal would render that like this:

----------------------------------------------------------------

sh-5.1#

       sh-5.1#
----------------------------------------------------------------

This was because we weren't disabling the ICRNL termios iflag, which
turns carriage returns (\r) into line feeds (\n).  Other raw mode
implementations (like QEMU's) clear this flag, and don't have this
problem.

Instead of fixing our raw mode implementation to just disable ICRNL,
or copy the flags from QEMU's, though, here I've changed it to use the
raw mode implementation in libc.  It seems to work correctly in my
testing, and means we don't have to worry about what exactly raw mode
looks like under the hood any more.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2021-11-17 14:41:00 +00:00
Rob Bradford
419870ae45 vmm: Add epoll_ctl() syscall to vCPU seccomp filter
Fix seccomp violation when trying to add the out FD to the epoll loop
when the serial buffer needs to be flushed.

0x00007ffff7dc093e in epoll_ctl () at ../sysdeps/unix/syscall-template.S:120
0x0000555555db9b6d in epoll::ctl (epfd=56, op=epoll::ControlOptions::EPOLL_CTL_MOD, fd=55, event=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/epoll-4.3.1/src/lib.rs:155
0x00005555556f5127 in vmm::serial_buffer::SerialBuffer::add_out_poll (self=0x7fffe800b5d0) at vmm/src/serial_buffer.rs:101
0x00005555556f583d in vmm::serial_buffer::{impl#1}::write (self=0x7fffe800b5d0, buf=...) at vmm/src/serial_buffer.rs:139
0x0000555555a30b10 in std::io::Write::write_all<vmm::serial_buffer::SerialBuffer> (self=0x7fffe800b5d0, buf=...)
    at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/io/mod.rs:1527
0x0000555555ab82fb in devices::legacy::serial::Serial::handle_write (self=0x7fffe800b520, offset=0, v=13) at devices/src/legacy/serial.rs:217
0x0000555555ab897f in devices::legacy::serial::{impl#2}::write (self=0x7fffe800b520, _base=1016, offset=0, data=...) at devices/src/legacy/serial.rs:295
0x0000555555f30e95 in vm_device::bus::Bus::write (self=0x7fffe8006ce0, addr=1016, data=...) at vm-device/src/bus.rs:235
0x00005555559406d4 in vmm::vm::{impl#4}::pio_write (self=0x7fffe8009640, port=1016, data=...) at vmm/src/vm.rs:459

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-16 07:27:46 -08:00
Rob Bradford
66a2045148 vmm: Fix panic in SIGWINCH listener thread when no seccomp filter set
When running with `--serial pty --console pty --seccomp=false` the
SIGWINCH listener thread would panic as the seccomp filter was empty.
Adopt the mechanism used in the rest of the code and check for non-empty
filter before trying to apply it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-16 14:28:02 +00:00
Sebastien Boeuf
a1f1dfddeb vmm: Fix CpusConfig validation error message
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-15 17:27:23 +01:00
Rob Bradford
3480e69ff5 vmm: Cache whether io_uring is supported in DeviceManager
Probing for whether io_uring is supported is time consuming, so cache
the result once it is known to reduce the cost for any additional block
devices that are added.

Before:

cloud-hypervisor: 3.988896ms: <vmm> INFO:vmm/src/device_manager.rs:1901 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.129591ms: <vmm> INFO:vmm/src/device_manager.rs:1983 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 14.159853ms: <vmm> INFO:vmm/src/device_manager.rs:1901 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 22.110281ms: <vmm> INFO:vmm/src/device_manager.rs:1983 -- Using asynchronous RAW disk file (io_uring)

After:

cloud-hypervisor: 4.880411ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.105123ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 14.134837ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.221869ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-12 18:09:55 +00:00
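
In the logs above, creating the second disk drops from roughly 8 ms to under 0.1 ms once the probe result is reused. The caching idea can be sketched as follows. `IoUringCache` is a hypothetical stand-in for the relevant state in `DeviceManager`, and the probe is passed in as a closure because the real probing code is not shown in this commit message.

```rust
/// Hypothetical stand-in for the cached probe state (illustrative only).
struct IoUringCache {
    supported: Option<bool>,
}

impl IoUringCache {
    fn new() -> Self {
        Self { supported: None }
    }

    /// Run `probe` only on the first call and reuse its result afterwards.
    fn is_supported(&mut self, probe: impl FnOnce() -> bool) -> bool {
        *self.supported.get_or_insert_with(probe)
    }
}
```

Every block device after the first then pays only for reading the cached boolean.
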
Sebastien Boeuf
932c8c9713 vmm: Add CPU affinity support
With the introduction of a new option `affinity` to the `cpus`
parameter, Cloud Hypervisor now lets the user choose the set
of host CPUs on which each vCPU runs.

This is useful when trying to achieve CPU pinning, as well as making
sure the VM runs on a specific NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-12 09:40:37 +00:00
Sebastien Boeuf
a4f5ad6076 option_parser: Fix inner bracket support with list of integers
Give the option parser the ability to handle tuples with inner brackets
containing a list of integers. The following example can now be handled
correctly: "option=[key@[v1-v2,v3,v4]]", which means the option is
assigned a tuple with a key associated with a list of integers covering
the range v1 to v2, as well as v3 and v4.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-12 09:40:37 +00:00
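
To make the "key@[v1-v2,v3,v4]" form concrete, here is a minimal sketch of how one such tuple could be parsed and its ranges expanded. It is not the `option_parser` implementation: `parse_tuple` is a hypothetical helper, the outer brackets of the option value are assumed to have been stripped already, and ranges are assumed to be inclusive.

```rust
/// Parse one tuple such as "0@[1-3,5,7]" into a key and an expanded list.
/// Returns None on any malformed input.
fn parse_tuple(s: &str) -> Option<(u64, Vec<u64>)> {
    let (key, list) = s.split_once('@')?;
    let key: u64 = key.trim().parse().ok()?;
    let inner = list.trim().strip_prefix('[')?.strip_suffix(']')?;

    let mut values: Vec<u64> = Vec::new();
    for item in inner.split(',') {
        match item.split_once('-') {
            // "a-b" expands to every integer in the inclusive range.
            Some((start, end)) => {
                let start: u64 = start.trim().parse().ok()?;
                let end: u64 = end.trim().parse().ok()?;
                if start > end {
                    return None;
                }
                values.extend(start..=end);
            }
            // A bare integer is taken as-is.
            None => values.push(item.trim().parse::<u64>().ok()?),
        }
    }
    Some((key, values))
}
```

With this reading, a tuple like "0@[1-3,5,7]" associates key 0 with the integers 1, 2, 3, 5 and 7.
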
Sebastien Boeuf
c8e3c1eed6 clippy: Make sure to initialize data
Always properly initialize vectors so that we don't run into undefined
behavior when the vector gets dropped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-10 10:23:43 +01:00
Sebastien Boeuf
ad521fd4e4 option_parser: Create generic type Tuple
Creates a new generic type Tuple so that the same implementation of
FromStr trait can be reused for both parsing a list of two integers and
parsing a list of one integer associated with a list of integers.

This anticipates the need for retrieving sublists, which will be needed
when trying to describe the host CPU affinity for every vCPU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-09 08:59:59 +01:00
Sebastien Boeuf
b81d758c41 option_parser: Expect commas instead of colons for lists
The elements of a list should be using commas as the correct delimiter
now that it is supported. Deprecate use of colons as delimiter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-09 08:59:59 +01:00
Rob Bradford
751e76db08 vmm: acpi: Use Aml::append_aml_bytes() to generate DSDT
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
d96d98d88e vmm: Port DeviceManager to Aml::append_aml_bytes()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
185f0c1bf3 vmm: Port MemoryManager to Aml::append_aml_bytes()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
e04cbb2ad4 vmm: Port PciSegment to Aml::append_aml_bytes()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
986e43f899 vmm: cpu: Port CpuManager to Aml::append_aml_bytes()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
d0c3342c97 vmm: acpi: Report time to generate ACPI tables
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
a2e02a8fff vmm: Add SGX section creation logging
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
def98faf37 vmm, vm-allocator: Introduce an allocator for platform devices
This allocator allocates 64-bit MMIO addresses for use with platform
devices e.g. ACPI control devices and ensures there is no overlap with
PCI address space ranges which can cause issues with PCI device
remapping.

Use this allocator for the ACPI platform devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
9d1a7e43a7 vmm: Refactor MCFG table creation to take just the PCI segments
This matches the lock taking behaviour of other functions in this file.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
afe95e5a2a vmm: Use an allocator specifically for RAM regions
Rather than use the system MMIO allocator for RAM use an allocator that
covers the full RAM range.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b8fee11822 vmm: Place SGX EPC region between RAM and device area
Increase the start of the device area to accommodate the SGX EPC area.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
e20be3e147 vmm: Check hotplug memory against end of RAM not start of device area
This is because the SGX region will be placed between the end of RAM and
the start of the device area.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
ec81f377b6 vmm: Refactor SGX setup to inside MemoryManager::new()
This makes it possible to manually allocate the SGX region after the end
of RAM region.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
438be0dad5 vmm: api: Add pci_segment entries to OpenAPI file
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
1a5a89508b vmm: Remove segment_id from DeviceNode
With the segment id now encoded in the bdf it is not necessary to have
the separate field for it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
ae83e3b383 vmm: Use PciBdf throughout in order to remove manual bit manipulation
In particular use the accessor for getting the device id from the bdf.
As a side effect the VIOT table is now segment aware.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
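
The benefit of accessors over manual bit manipulation can be illustrated with a small packed type. This sketch is not the `PciBdf` type from the `pci` crate: the name `Bdf` is hypothetical, and the layout (segment in the upper 16 bits, then 8 bits of bus, 5 bits of device and 3 bits of function) is an assumption chosen to match the conventional PCI encoding.

```rust
/// Illustrative packed segment/bus/device/function value (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bdf(u32);

impl Bdf {
    fn new(segment: u16, bus: u8, device: u8, function: u8) -> Self {
        Bdf(((segment as u32) << 16)
            | ((bus as u32) << 8)
            | (((device & 0x1f) as u32) << 3)
            | ((function & 0x7) as u32))
    }

    /// Bits 31..16: PCI segment.
    fn segment(self) -> u16 {
        (self.0 >> 16) as u16
    }

    /// Bits 15..8: bus number.
    fn bus(self) -> u8 {
        (self.0 >> 8) as u8
    }

    /// Bits 7..3: device (slot) number.
    fn device(self) -> u8 {
        ((self.0 >> 3) & 0x1f) as u8
    }

    /// Bits 2..0: function number.
    fn function(self) -> u8 {
        (self.0 & 0x7) as u8
    }
}
```

Callers then ask for `bdf.device()` or `bdf.segment()` instead of repeating shifts and masks at each use site.
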
Rob Bradford
a26ce353d3 vmm: Use the PCI segment allocator for pmem and fs cache allocations
Use the MMIO address space allocator associated with the segment that
the devices are on.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
cd9d1cf8fc pci, virtio-devices, vmm: Allocate PCI 64-bit bars per segment
Since each segment must have a non-overlapping memory range associated
with it the device memory must be equally divided amongst all segments.
A new allocator is used for each segment to ensure that BARs are
allocated from the correct address ranges. This requires changes to
PciDevice::allocate/free_bars to take that allocator and when
reallocating BARs the correct allocator must be identified from the
ranges.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
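
The equal division of device memory and the lookup of the owning segment from an address can be sketched like this. Both function names are hypothetical, ranges are represented as plain `(base, end)` pairs with an exclusive end, and the real code works with allocator objects rather than tuples.

```rust
/// Split [start, start + size) into `segments` equal, contiguous ranges.
/// Each range is returned as (base, end) with an exclusive end.
fn split_device_area(start: u64, size: u64, segments: u64) -> Vec<(u64, u64)> {
    if segments == 0 {
        return Vec::new();
    }
    let per_segment = size / segments;
    (0..segments)
        .map(|i| {
            let base = start + i * per_segment;
            (base, base + per_segment)
        })
        .collect()
}

/// Find the index of the segment whose range contains `addr`, if any.
fn segment_for_address(ranges: &[(u64, u64)], addr: u64) -> Option<usize> {
    ranges
        .iter()
        .position(|&(base, end)| addr >= base && addr < end)
}
```

When a BAR is reallocated, a lookup of this kind on the old address identifies which segment's allocator owns it.
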
Rob Bradford
7cfeefde57 vmm: Add validation logic to check user specified pci_segment is valid
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
f71f6da907 vmm: Add pci_segment option to UserDeviceConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
d4f7f42800 vmm: Add pci_segment option to DeviceConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
ca955a47ff vmm: Implement pci_segment options for hotpluggable virtio devices
For all the devices that support being hotplugged (disk, net, pmem, fs
and vsock) add "pci_segment" option and propagate that through to the
addition onto the PCI busses.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
88378d17a2 vmm: Take PCI segment ID into BAR size allocation
Move the decision on whether to use a 64-bit bar up to the DeviceManager
so that it can use both the device type (e.g. block) and the PCI segment
ID to decide what size bar should be used.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
cf1c2bf0e8 vmm: Use the same set of reserved PCI IRQ routes for all segments
Generate a set of 8 IRQs and round-robin distribute those over all the
slots for a bus. This same set of IRQs is then used for all PCI
segments.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
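
The round-robin distribution of 8 IRQs over the slots of a bus can be sketched as below. The function name is hypothetical, the IRQ numbers are supplied by the caller, and 32 slots per bus is assumed from the 5-bit PCI device number.

```rust
const NUM_IRQS: usize = 8;
/// Assumption: 32 device slots per PCI bus (5-bit device number).
const SLOTS_PER_BUS: usize = 32;

/// Assign one of the reserved IRQs to each slot in round-robin order.
fn assign_slot_irqs(irqs: [u32; NUM_IRQS]) -> [u32; SLOTS_PER_BUS] {
    let mut slots = [0u32; SLOTS_PER_BUS];
    for (slot, irq) in slots.iter_mut().enumerate() {
        *irq = irqs[slot % NUM_IRQS];
    }
    slots
}
```

Because the table depends only on the slot number, the same table can be reused for every PCI segment.
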
Rob Bradford
e3d6e222a1 vmm: Add the required number of PCI segments
The platform config may specify a number of PCI segments to use. If this
is greater than 1, we add supplemental PCI segments as well as the
default segment.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
f8d9c073f0 vmm: Add "--platform"
This currently contains only the number of PCI segments to create.
This is limited to 16 at the moment which should allow 496 user specified
PCI devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
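
One way to read the 496 figure is 16 segments times 31 usable slots each. The arithmetic is sketched below under an assumption that the commit message does not state: each segment has 32 device slots and one of them is reserved (for example, for a host bridge). The constant and function names are hypothetical.

```rust
/// Assumption: 32 device slots per segment (5-bit PCI device number).
const SLOTS_PER_SEGMENT: u32 = 32;
/// Assumption: one slot per segment is reserved and not user-assignable.
const RESERVED_SLOTS_PER_SEGMENT: u32 = 1;

/// Number of user-specified PCI devices that fit in `segments` segments.
fn max_user_devices(segments: u32) -> u32 {
    segments * (SLOTS_PER_SEGMENT - RESERVED_SLOTS_PER_SEGMENT)
}
```

With the current limit of 16 segments this gives 16 * 31 = 496.
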
Rob Bradford
e3c35a3579 vmm: Allow specifying the PCI segment ID when adding virtio PCI device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
7a4606f800 vmm: Implement ACPI hotplug/unplug handling for PCI segments
For the bus scanning the GED AML code now calls into a PSCN method that
scans all buses. This approach was chosen since it handles the case
correctly where one GED interrupt is serviced for two hotplugs on
distinct segments.

The PCIU and PCID field values are now determined by the PSEG field that
is used to select which segment those values should be used for.
Similarly _EJ0 will notify based on the value of _SEG.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
49f19e061b vmm: Use device's segment when removing a device
The segment ID has been stored in the DeviceTree.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
d33d254921 vmm: Remove hardcoded zero PCI segment id
Replace the hardcoded zero PCI segment id when adding devices to the bus
and extend the DeviceTree to hold the PCI segment id.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b8b0dab1ae vmm: Add segment_id parameter to DeviceManager::add_pci_device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00