As part of the cleanup of the VM, shut down all the vCPU threads. This is
achieved by toggling a shared atomic boolean variable which is checked
in the vCPU loop. To trigger the vCPU code to look at this boolean it is
necessary to send a signal to the vCPU which will interrupt the running
KVM_RUN ioctl.
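A minimal sketch of the flag-based shutdown described above, using only
the standard library. The names spawn_vcpu and shutdown_vcpus are
hypothetical, thread::park() stands in for the blocking KVM_RUN ioctl,
and unpark() stands in for the signal sent to the vCPU thread:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a vCPU thread: loops until the shared flag is set.
/// Returns how many times the simulated "KVM_RUN" call returned.
pub fn spawn_vcpu(id: usize, kill: Arc<AtomicBool>) -> JoinHandle<usize> {
    thread::Builder::new()
        .name(format!("vcpu{}", id))
        .spawn(move || {
            let mut iterations = 0;
            while !kill.load(Ordering::SeqCst) {
                // Stand-in for the blocking KVM_RUN ioctl: sleeps until woken.
                thread::park();
                iterations += 1;
            }
            iterations
        })
        .expect("failed to spawn vCPU thread")
}

/// Set the flag first, then wake every thread so it re-checks the flag.
/// Returns the number of threads that were joined.
pub fn shutdown_vcpus(kill: &AtomicBool, handles: Vec<JoinHandle<usize>>) -> usize {
    kill.store(true, Ordering::SeqCst);
    let mut joined = 0;
    for handle in handles {
        // Stand-in for the signal that interrupts KVM_RUN.
        handle.thread().unpark();
        handle.join().expect("vCPU thread panicked");
        joined += 1;
    }
    joined
}
```

Setting the flag before waking the thread matters: a thread that is
woken first could re-enter the blocking call before the flag is visible.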
Fixes: #229
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Being able to reboot requires us to identify all the resources we are
leaking and clean those up before we can enable reboot. For now, if
the user requests a reboot, shut down instead.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Sadly only the first few characters of the thread name are preserved so
use a shorter name for the vCPU thread for now. Also give the signal
handling thread a name.
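For illustration, a sketch of naming threads with names short enough to
survive on Linux, where a thread name is limited to 15 bytes plus the
terminating NUL. The "vcpu{}" format and the helper names are
assumptions, not necessarily what this commit uses:

```rust
use std::thread;

/// Linux keeps at most 15 bytes of a thread name (16 with the NUL terminator).
pub const LINUX_THREAD_NAME_MAX: usize = 15;

/// Hypothetical short name for a vCPU thread, e.g. "vcpu0".
pub fn vcpu_thread_name(cpu_id: u8) -> String {
    let name = format!("vcpu{}", cpu_id);
    debug_assert!(name.len() <= LINUX_THREAD_NAME_MAX);
    name
}

/// Spawn a thread with the given name through std::thread::Builder.
pub fn spawn_named<F, T>(name: String, f: F) -> std::io::Result<thread::JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new().name(name).spawn(f)
}
```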
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that we have ACPI shutdown support "reboot" will actually reboot the
VM rather than trigger the VMM to exit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an I/O port "device" to handle requests from the kernel to shut down
or trigger a reboot, borrowing an I/O port used for ACPI on the Q35
platform. The details of this I/O port are included in the FADT
(SLEEP_STATUS_REG/SLEEP_CONTROL_REG/RESET_REG), with the value to write
given in the FADT for reset (RESET_VALUE) and in the DSDT for shutdown
(S5 -> 0x05).
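A rough sketch of how a write to such a port could be decoded. The enum
and function names are hypothetical, and RESET_VALUE = 1 is an
assumption (the real value is whatever the FADT advertises). The sleep
control layout (SLP_TYP in bits 2-4, SLP_EN in bit 5) follows the ACPI
specification:

```rust
/// Action requested by the guest through the ACPI I/O port.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum AcpiRequest {
    Reset,
    Shutdown,
}

/// Assumed for this sketch; must match RESET_VALUE in the FADT.
pub const RESET_VALUE: u8 = 1;
/// Sleep type for S5 (soft off), as declared in the DSDT.
pub const S5_SLEEP_TYPE: u8 = 0x05;
/// SLP_TYP occupies bits 2-4 of the sleep control register.
pub const SLP_TYP_SHIFT: u8 = 2;
/// SLP_EN is bit 5 of the sleep control register.
pub const SLP_EN_BIT: u8 = 5;
/// Value the guest writes to enter S5.
pub const S5_SHUTDOWN_VALUE: u8 = (S5_SLEEP_TYPE << SLP_TYP_SHIFT) | (1 << SLP_EN_BIT);

/// Decode the first byte written to the port; unknown values are ignored.
pub fn decode_acpi_write(data: &[u8]) -> Option<AcpiRequest> {
    let value = *data.first()?;
    if value == RESET_VALUE {
        Some(AcpiRequest::Reset)
    } else if value == S5_SHUTDOWN_VALUE {
        Some(AcpiRequest::Shutdown)
    } else {
        None
    }
}
```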
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a 2nd EventFd to the VM to control resetting (rebooting) the VM. This
supplements the EventFd used for managing shutdown of the VM.
The default behaviour on i8042 or triple-fault based reset is currently
unchanged i.e. it will trigger a shutdown.
In order to support restarting the VM it was necessary to make the
start() function take a reference to the config.
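To illustrate why borrowing the config helps, here is a sketch of a
restart loop. VmConfig, ExitBehaviour and run_until_shutdown are
hypothetical names, and the closure stands in for start() waiting on the
two EventFds:

```rust
/// How a VM run ended: the shutdown event or the reset event fired.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ExitBehaviour {
    Shutdown,
    Reset,
}

/// Hypothetical VM configuration.
pub struct VmConfig {
    pub cpus: u8,
}

/// Start the VM repeatedly until it asks for a shutdown.
/// Because `start` only borrows the config, the same config can be
/// reused for every boot. Returns the number of boots performed.
pub fn run_until_shutdown<F>(config: &VmConfig, mut start: F) -> u32
where
    F: FnMut(&VmConfig) -> ExitBehaviour,
{
    let mut boots = 0;
    loop {
        boots += 1;
        match start(config) {
            ExitBehaviour::Reset => continue,
            ExitBehaviour::Shutdown => return boots,
        }
    }
}
```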
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The DSDT must declare the interrupt used by the serial device. This
helps the guest kernel match the right interrupt to the 8250 serial
device. This is mandatory when the IRQ routing is handled by ACPI, as
we must let ACPI know what to do with pin based interrupts.
Note that if acpi=noirq were passed on the kernel command line, ACPI
would not be in charge of the IRQ routing, and the COM1 device
declaration would not be needed.
One additional requirement is to provide the appropriate interrupt
source override for the legacy ISA interrupts (0-15), which will give
the right information to the guest kernel about how to allocate the
associated IRQs.
Because we want to keep the MADT as simple as possible, and given that
our only device requiring a pin based interrupt is the serial device, we
choose to define only pin 4.
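As an illustration, a sketch that serializes one MADT Interrupt Source
Override entry following the layout in the ACPI specification (type 2,
length 10). The function name is hypothetical, and mapping ISA IRQ 4 to
GSI 4 with flags of 0 (conforming to the bus defaults) is an assumption:

```rust
/// Build a 10-byte MADT Interrupt Source Override structure.
pub fn interrupt_source_override(source: u8, gsi: u32) -> [u8; 10] {
    let mut entry = [0u8; 10];
    entry[0] = 2; // Type: Interrupt Source Override
    entry[1] = 10; // Length of this structure in bytes
    entry[2] = 0; // Bus: 0 means ISA
    entry[3] = source; // Bus-relative interrupt source (IRQ)
    entry[4..8].copy_from_slice(&gsi.to_le_bytes()); // Global System Interrupt
    // entry[8..10] are the flags, left at 0: conforms to the bus specification.
    entry
}
```

For the serial device this would be called as
interrupt_source_override(4, 4).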
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Only add the ACPI PNP device for the COM1 serial port if it is not
turned off with "--serial off".
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently this has a hardcoded range from 32GiB to 64GiB for the 64-bit PCI
range. It should range from the top of RAM to 64GiB.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MCFG table contains some PCI configuration details, in particular
where the enhanced configuration space is located.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently when the vCPU thread exits on an error the VMM continues to
run with no way of shutting down the main thread.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This provides important APIC configuration details for the CPU. Even
though it duplicates some of the information already included in the
mptable it is necessary when booting with ACPI as the mptable is not
used.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a revision 2 RSDP table only supporting an XSDT along with support
for creating generic SDT based tables.
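A sketch of what building such a table could look like, following the
36-byte revision 2 RSDP layout from the ACPI specification. The function
names are hypothetical and the OEM ID and XSDT address are supplied by
the caller; the RSDT address is left at 0 since only an XSDT is
supported:

```rust
/// Compute the byte that makes the wrapping sum of `bytes` plus itself zero.
pub fn checksum(bytes: &[u8]) -> u8 {
    let sum = bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    0u8.wrapping_sub(sum)
}

/// Build a revision 2 RSDP pointing only at an XSDT.
pub fn build_rsdp(oem_id: [u8; 6], xsdt_addr: u64) -> [u8; 36] {
    let mut table = [0u8; 36];
    table[0..8].copy_from_slice(b"RSD PTR "); // Signature
    table[9..15].copy_from_slice(&oem_id); // OEM ID
    table[15] = 2; // Revision
    // table[16..20]: RSDT address, left at 0 (XSDT only).
    table[20..24].copy_from_slice(&36u32.to_le_bytes()); // Length
    table[24..32].copy_from_slice(&xsdt_addr.to_le_bytes()); // XSDT address
    // table[33..36]: reserved.

    // The first checksum covers the 20-byte revision 1 part.
    let legacy = checksum(&table[0..20]);
    table[8] = legacy;
    // The extended checksum covers the whole table.
    let extended = checksum(&table);
    table[32] = extended;
    table
}
```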
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The previous definitions do not cover config space read/write
and only cover the general message, as below:
A vhost-user message consists of 3 header fields and a payload.
+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+
but for config space, the payload includes:
Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+
:offset: a 32-bit offset of virtio device's configuration space
:size: a 32-bit configuration space access size in bytes
:flags: a 32-bit value:
- 0: Vhost master messages used for writeable fields
- 1: Vhost master messages used for live migration
:payload: Size bytes array holding the contents of the virtio
device's configuration space
This patch adds specific functions for the config message, which can
get/set the config space from/to the backend.
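To make the layout above concrete, a small sketch that packs and parses
the config space payload. ConfigMsg and its methods are hypothetical and
are not the vhost crate's API; fields use native byte order, as the
vhost-user protocol does:

```rust
/// Size of the offset/size/flags fields preceding the config bytes.
pub const CONFIG_HDR_LEN: usize = 12;

/// Hypothetical in-memory form of the config space payload.
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct ConfigMsg {
    pub offset: u32,
    pub size: u32,
    pub flags: u32,
    pub payload: Vec<u8>,
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&buf[at..at + 4]);
    u32::from_ne_bytes(bytes)
}

impl ConfigMsg {
    /// The size field is derived from the payload length.
    pub fn new(offset: u32, flags: u32, payload: &[u8]) -> Self {
        ConfigMsg { offset, size: payload.len() as u32, flags, payload: payload.to_vec() }
    }

    /// Serialize as offset | size | flags | payload.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(CONFIG_HDR_LEN + self.payload.len());
        out.extend_from_slice(&self.offset.to_ne_bytes());
        out.extend_from_slice(&self.size.to_ne_bytes());
        out.extend_from_slice(&self.flags.to_ne_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    /// Parse a buffer; returns None if it is shorter than the header
    /// or than the size announced in the header.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < CONFIG_HDR_LEN {
            return None;
        }
        let size = read_u32(buf, 4);
        let end = CONFIG_HDR_LEN.checked_add(size as usize)?;
        let payload = buf.get(CONFIG_HDR_LEN..end)?.to_vec();
        Some(ConfigMsg { offset: read_u32(buf, 0), size, flags: read_u32(buf, 8), payload })
    }
}
```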
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
There is one definition in the message.rs file, as below:
pub const VHOST_USER_CONFIG_OFFSET: u32 = 0x100
This definition only applies to the virtio-mmio config space,
so we will add this offset on the virtio-mmio side and
not on the vhost-user protocol side.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Use acked_protocol_features to replace acked_virtio_features in
get_config()/set_config() for protocol features like CONFIG.
This patch also fixes the wrong GET_CONFIG setting in set_config().
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
The latest vhost-user spec only defines two members in
VhostSetConfigType, master and live migration. These
changes make rust-vmm compatible with vhost-user backends.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Bump from 829d605 to fd4dcd1.
PR #225 failed because we were still using the vmm-sys-util logging
macros and the crate's syslog module got removed.
This one relies on the previous commit switching to using the
log crate macros instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
At this point in the code, the acked features have been provided by the
guest and they can be set back to the backend. There's no need to
retrieve the backend features one more time for this purpose.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As mentioned in the vhost-user specification, each ring is initialized
in a stopped state. This means each ring should be enabled only after
it has been correctly initialized.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The available features are masked with the backend features, therefore
the available features should be the ones used when calling the
set_features() API.
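A sketch of the idea with a mock backend. MockBackend and
setup_features are hypothetical and only mimic the shape of a
get_features()/set_features() pair; the mock rejects bits it never
offered, which is an assumption about backend behaviour:

```rust
/// Hypothetical backend that rejects feature bits it did not offer.
pub struct MockBackend {
    supported: u64,
    pub acked: Option<u64>,
}

impl MockBackend {
    pub fn new(supported: u64) -> Self {
        MockBackend { supported, acked: None }
    }

    pub fn get_features(&self) -> u64 {
        self.supported
    }

    pub fn set_features(&mut self, features: u64) -> Result<(), String> {
        if features & !self.supported != 0 {
            return Err(format!("unsupported bits: {:#x}", features & !self.supported));
        }
        self.acked = Some(features);
        Ok(())
    }
}

/// Mask the features the device wants with what the backend offers,
/// and pass that masked value (not the unmasked one) to set_features().
pub fn setup_features(backend: &mut MockBackend, wanted: u64) -> Result<u64, String> {
    let avail_features = wanted & backend.get_features();
    backend.set_features(avail_features)?;
    Ok(avail_features)
}
```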
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to factorize the code between vhost-user-net and virtio-fs one
step further, this patch extends the vhost-user handler implementation
to support slave requests.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch factorizes the existing virtio-fs code by relying on the
common code part of the vhost_user module in the vm-virtio crate.
In detail, it factorizes the vhost-user setup, and reuses the error
types defined by the module instead of defining its own types.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhost-user-net introduced a new module vhost_user inside the vm-virtio
crate. Because virtio-fs is actually vhost-user-fs, it belongs to this
new module and needs to be moved there.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vhost-user framework can provide good performance in data intensive
scenarios thanks to its memory sharing mechanism. Implement a
vhost-user-net device so Rust-based VMMs get this benefit for networking.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
The current directory handling, which opens a tempfile through
OpenOptions with custom_flags(O_TMPFILE), works for tmpfs but not
for hugetlbfs. Add a new directory handling process which works
for both tmpfs and hugetlbfs.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
The recvmsg syscall can split a request in multiple packets unless we
use the flag MSG_WAITALL to make sure the request will wait for the
whole data to be transferred before returning.
This flag is needed to prevent the vhost crate from returning the error
PartialMessage, which occurred sporadically when using virtio-fs, and
which was detected as part of our continuous integration testing.
Fixes: #182
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By making the registration functions immutable, this patch prevents
self-borrowing issues with the RwLock on self.mem.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following the refactoring of the code allowing multiple threads to
access the same instance of the guest memory, this patch goes one step
further by adding RwLock to it. This anticipates the future need for
being able to modify the content of the guest memory at runtime.
The reasons for adding regions to an existing guest memory could be:
- Add virtio-pmem and virtio-fs regions after the guest memory was
created.
- Support future hotplug of devices, memory, or anything that would
require more memory at runtime.
Because the lock will be taken read-only most of the time, using
RwLock instead of Mutex is the right approach.
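The sharing pattern can be sketched as follows, with a toy GuestMemory
type standing in for the real guest memory implementation. All names
here are hypothetical:

```rust
use std::sync::{Arc, RwLock};

/// Toy memory region: a start address and a size in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub start: u64,
    pub size: u64,
}

/// Toy stand-in for the guest memory: just a list of regions.
#[derive(Default)]
pub struct GuestMemory {
    regions: Vec<Region>,
}

impl GuestMemory {
    pub fn add_region(&mut self, region: Region) {
        self.regions.push(region);
    }

    pub fn contains(&self, addr: u64) -> bool {
        self.regions
            .iter()
            .any(|r| addr >= r.start && addr - r.start < r.size)
    }
}

/// One instance shared by every thread; many readers, rare writers.
pub type SharedMemory = Arc<RwLock<GuestMemory>>;

/// Common path: take the lock for reading only.
pub fn lookup(mem: &SharedMemory, addr: u64) -> bool {
    mem.read().unwrap().contains(addr)
}

/// Rare path: take the lock for writing to add a region at runtime.
pub fn hotplug(mem: &SharedMemory, region: Region) {
    mem.write().unwrap().add_region(region);
}
```

Readers do not block each other, so the frequent lookups from device
threads stay cheap while a region can still be added later.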
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VMM guest memory was cloned (copied) everywhere the code needed to
have ownership of it. In order to clean the code, and in anticipation
of future support for modifying this guest memory instance at runtime,
it is important that every part of the code share the same instance.
Because VirtioDevice implementations need to have access to it from
different threads, Arc must be used in this case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When there are no available descriptors in the queue (observed when the
network interface hasn't been brought up by the kernel) stop waiting for
notifications that the TAP fd should be read from.
This avoids a situation where the TAP device has data available and wakes
up the virtio-net thread only for the virtio-net thread to not read that
data as it has nowhere to put it.
When there are descriptors available in the queue then we resume waiting
for the epoll event on the TAP fd.
This bug showed up as 100% CPU usage for the cloud-hypervisor
binary prior to the guest network interface being brought up. The
solution was inspired by the Firecracker virtio-net code.
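The logic can be sketched as a small state machine. NetRx and its
fields are hypothetical, and the tap_listening flag stands in for
registering or unregistering the TAP fd with epoll:

```rust
/// Hypothetical receive-side state of a virtio-net device.
pub struct NetRx {
    /// Receive descriptors the guest has made available.
    pub avail_descriptors: usize,
    /// Whether the TAP fd is currently registered for read events.
    pub tap_listening: bool,
    /// Frames handed to the guest so far.
    pub delivered: usize,
}

impl NetRx {
    pub fn new() -> Self {
        NetRx { avail_descriptors: 0, tap_listening: true, delivered: 0 }
    }

    /// Called when the TAP fd is reported readable.
    pub fn on_tap_readable(&mut self) {
        if self.avail_descriptors == 0 {
            // Nowhere to put the data: stop listening, otherwise the
            // still-readable fd would wake this thread in a busy loop.
            self.tap_listening = false;
            return;
        }
        self.avail_descriptors -= 1;
        self.delivered += 1;
    }

    /// Called when the guest adds receive descriptors to the queue.
    pub fn on_queue_available(&mut self, added: usize) {
        self.avail_descriptors += added;
        if self.avail_descriptors > 0 {
            // Buffers exist again: resume waiting on the TAP fd.
            self.tap_listening = true;
        }
    }
}
```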
Fixes: #208
Signed-off-by: Rob Bradford <robert.bradford@intel.com>