When forwarding an epoll event from the unix muxer to the
targeted connection event handler, the eventset the connection
registered is forwarded instead of the actual epoll
operation (IN/OUT).
For example, if the connection was registered for EPOLLIN,
and receives an EPOLLOUT, the connection will actually handle
an EPOLLIN.
This is the root cause of a previous bug, which led to the
introduction of some workarounds (e.g. handling EWOULDBLOCK
when reading after receiving EPOLLIN, which should never happen).
When matching the connection, we retrieve and use the evset of
the connection instead of the one passed as a parameter.
The compiler does not warn about an unused variable because
the parameter is first logged in a debug! statement.
This is an unfortunate naming mistake that caused a lot of problems.
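A minimal sketch of how this kind of shadowing can happen. The names
`Listener`, `Connection`, `dispatch_buggy` and `dispatch_fixed` are
hypothetical and only illustrate the pattern; they are not the muxer's
actual types. The `let _logged` line stands in for the debug! statement.

```rust
// Linux values of the two epoll event bits used in the example.
const EPOLLIN: u32 = 0x001;
const EPOLLOUT: u32 = 0x004;

/// Hypothetical listener entry: stores the event set the connection
/// registered for.
enum Listener {
    Connection { evset: u32 },
}

/// Hypothetical connection that records every event set it handles.
#[derive(Default)]
struct Connection {
    handled: Vec<u32>,
}

impl Connection {
    fn notify(&mut self, evset: u32) {
        self.handled.push(evset);
    }
}

/// Buggy dispatch: the pattern binding `evset` shadows the parameter,
/// so the registered event set is forwarded.
fn dispatch_buggy(listener: &Listener, conn: &mut Connection, evset: u32) {
    // Stand-in for the debug! statement: this use of the parameter is
    // why the compiler reports no unused variable.
    let _logged = evset;
    match listener {
        Listener::Connection { evset } => conn.notify(*evset),
    }
}

/// Fixed dispatch: the registered event set is not bound, so `evset`
/// still refers to the event reported by epoll.
fn dispatch_fixed(listener: &Listener, conn: &mut Connection, evset: u32) {
    match listener {
        Listener::Connection { .. } => conn.notify(evset),
    }
}
```

With a listener registered for EPOLLIN and an incoming EPOLLOUT, the
buggy variant forwards EPOLLIN while the fixed variant forwards EPOLLOUT.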
Fixes: #3497
Signed-off-by: Eduard Kyvenko <eduard.kyvenko@gmail.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
(cherry picked from commit c47ab55e99)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the underlying kernel is too old, PTY resize is disabled, which is
represented by None in the provided Option<File>. In the virtio-console
PTY path, don't blindly unwrap() the value that will be preserved
across a reboot.
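A minimal sketch of handling the optional file without unwrap(). The
function name `clone_resize_pipe` is hypothetical; only `Option<File>`
comes from the description above.

```rust
use std::fs::File;
use std::io;

/// Duplicate the optional resize file so it can be preserved.
/// `None` (resize disabled) is passed through instead of panicking.
fn clone_resize_pipe(resize_pipe: &Option<File>) -> io::Result<Option<File>> {
    resize_pipe.as_ref().map(File::try_clone).transpose()
}
```

When resize is disabled the function returns `Ok(None)`, so the caller
can keep going without the resize feature.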
Fixes: #3496
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit a749063c8a)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These were used for the self-spawning vhost-user device feature that
has been removed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit bde81405a8)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Setting up the SIGWINCH handler requires at least Linux 5.7. However
this functionality is not required for basic PTY operation.
Fixes: #3456
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit afe386bc13)
This reverts commit 58d25b3ccc.
This change introduced a regression when running iperf with the guest
running as the server:
marvin:~/src/cloud-hypervisor ((58d25b3c...))$ iperf -c 192.168.249.2
------------------------------------------------------------
Client connecting to 192.168.249.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.249.1 port 47078 connected with 192.168.249.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.40 sec 14.0 MBytes 11.3 Mbits/sec
marvin:~/src/cloud-hypervisor ((58d25b3c...))$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.249.1 port 5001 connected with 192.168.249.2 port 42866
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.01 sec 51.2 GBytes 44.0 Gbits/sec
Fixes: #3450
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 4959434219)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The sendmsg() syscall is limited in the number of fds it can handle.
The number chosen here matches the one used by the vfio-user library
and is conservative (we've seen it work with 64 fds).
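One generic way to stay under such a limit is to split the fds into
batches before sending. This is only a sketch and not necessarily what
this change does: `batch_fds` is a hypothetical helper, and the value
16 is an assumed placeholder because the actual number is not stated
here.

```rust
use std::os::unix::io::RawFd;

/// Assumed placeholder limit; the real value is not given in the text.
const MAX_FDS_PER_MESSAGE: usize = 16;

/// Split `fds` into batches that each fit into a single message.
fn batch_fds(fds: &[RawFd]) -> Vec<&[RawFd]> {
    fds.chunks(MAX_FDS_PER_MESSAGE).collect()
}
```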
Fixes: #3401
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 7444c3a0c5)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Advertise the PCI MMIO config spaces here so that the MMIO config space
is correctly recognised.
Tested with --platform num_pci_segments=1 or 16: hotplug of an NVMe
vfio-user device works correctly with hypervisor-fw, OVMF and direct
kernel boot.
Fixes: #3432
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 50f5f43ae3)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The constant `PCI_MMIO_CONFIG_SIZE` defined in `vmm/pci_segment.rs`
describes the MMIO configuration size for each PCI segment. However,
this name conflicts with the `PCI_MMCONFIG_SIZE` defined in `layout.rs`
in the `arch` crate, which describes the memory size of the PCI MMIO
configuration region.
Therefore, this commit renames `PCI_MMIO_CONFIG_SIZE` to
`PCI_MMIO_CONFIG_SIZE_PER_SEGMENT` and moves the constant from the
`vmm` crate to the `arch` crate.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 2f8540da70)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When restoring, replace the internal value of the device tree rather
than replacing the Arc<Mutex<DeviceTree>> itself. This fixes an issue
where the AddressManager holds a clone of the original
Arc<Mutex<DeviceTree>> from when the DeviceManager was created, while
the original restore path only replaced the DeviceManager's version of
the Arc<Mutex<DeviceTree>>. Instead, replace the contents of the
Arc<Mutex<DeviceTree>> so all users see the updated version.
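The difference between the two restore strategies can be sketched as
follows. The struct layouts and method names are simplified,
hypothetical stand-ins for the real DeviceManager and AddressManager.

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the real device tree.
#[derive(Debug, Default, Clone, PartialEq)]
struct DeviceTree {
    nodes: Vec<String>,
}

struct AddressManager {
    device_tree: Arc<Mutex<DeviceTree>>,
}

impl AddressManager {
    fn node_count(&self) -> usize {
        self.device_tree.lock().unwrap().nodes.len()
    }
}

struct DeviceManager {
    device_tree: Arc<Mutex<DeviceTree>>,
    address_manager: AddressManager,
}

impl DeviceManager {
    fn new() -> Self {
        let device_tree = Arc::new(Mutex::new(DeviceTree::default()));
        // The AddressManager keeps its own clone of the Arc.
        let address_manager = AddressManager {
            device_tree: Arc::clone(&device_tree),
        };
        DeviceManager { device_tree, address_manager }
    }

    /// Old behaviour: only this struct's Arc is swapped, so the
    /// AddressManager keeps pointing at the stale tree.
    fn restore_by_replacing_arc(&mut self, restored: DeviceTree) {
        self.device_tree = Arc::new(Mutex::new(restored));
    }

    /// New behaviour: the value behind the shared Arc is replaced, so
    /// every holder of the Arc sees the restored tree.
    fn restore_in_place(&mut self, restored: DeviceTree) {
        *self.device_tree.lock().unwrap() = restored;
    }
}
```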
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit e1c09b66ba)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To prevent the identity map region from conflicting with firmware that
may be placed in the last 4MiB of the 4GiB range, we must set its
address to a chosen location. It makes the most sense to place this
region right after the TSS region.
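The placement can be sketched as simple address arithmetic. The TSS
base address below is a hypothetical value chosen for illustration; the
real constant is not given here. The 3-page TSS size and the one-page
identity map size follow the KVM API documentation.

```rust
const PAGE_SIZE: u64 = 4096;
const FOUR_GIB: u64 = 4 << 30;

/// Last 4MiB below 4GiB, where firmware may be loaded.
const FIRMWARE_WINDOW_SIZE: u64 = 4 << 20;
const FIRMWARE_WINDOW_START: u64 = FOUR_GIB - FIRMWARE_WINDOW_SIZE;

/// Hypothetical TSS base address, for illustration only.
const TSS_START: u64 = 0xf000_0000;
/// KVM uses a three-page region for the TSS.
const TSS_SIZE: u64 = 3 * PAGE_SIZE;

/// The identity map is placed right after the TSS region.
const IDENTITY_MAP_START: u64 = TSS_START + TSS_SIZE;
/// KVM uses a one-page region for the identity map.
const IDENTITY_MAP_SIZE: u64 = PAGE_SIZE;

/// True if the half-open ranges [a_start, a_start + a_size) and
/// [b_start, b_start + b_size) intersect.
fn overlaps(a_start: u64, a_size: u64, b_start: u64, b_size: u64) -> bool {
    a_start < b_start + b_size && b_start < a_start + a_size
}
```

With these values neither region overlaps the firmware window, and the
identity map starts exactly where the TSS ends.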
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
(cherry picked from commit 03a606c7ec)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extend the Vm trait with set_identity_map_address() in order to
expose this ioctl to the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
(cherry picked from commit c452471c4e)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This adds support for the KVM_SET_IDENTITY_MAP_ADDR ioctl.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
(cherry picked from commit 882cdda995)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Place the 3-page TSS at an explicit location in the 32-bit address space
to avoid conflicting with the loaded raw firmware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 348def9dfb)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Reduce the size of the reserved 32-bit address space to the range used
by both the PCI MMIO config data and the 32-bit PCI device space.
This avoids issues when using firmware that is loaded into the very top
of the 32-bit address space as the RAM conflicts with the reserved
memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The current `Getting Started` section only contains steps for the
x86_64 platform. As we already have documentation covering the same
steps for AArch64, we can point users to it.
Also, this commit modifies `docs/arm64.md` to fit the documentation
style used within the project.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This test is flaky (#3400), and we are also hitting a bug when using the
latest SPDK/NVMe backend as a VFIO user device (#3401). Let's disable
this test until the above two issues are fixed.
Signed-off-by: Bo Chen <chen.bo@intel.com>
For now we only enable the vfio-user test on the x86_64 platform, as we
have a known hang to resolve on the aarch64 platform.
Fixes: #3098
Signed-off-by: Bo Chen <chen.bo@intel.com>
Enabling these configs avoids systemd errors related to Device Mapper
multipath while the guest is booting. In particular, the guest can hang
when used with an NVMe backend without these configs (#3352).
Signed-off-by: Bo Chen <chen.bo@intel.com>
This kernel config is needed to fix the observed guest hang caused by a
systemd crash while booting.
Fixes: #3352
Signed-off-by: Bo Chen <chen.bo@intel.com>
Added fields:
- `Memory address size limit`: the absence of this field triggered
warnings in the guest kernel
- `Node ID`
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
After the introduction of multiple PCI segments, the `devid` value in
`kvm_irq_routing_entry` exceeded the maximum supported range on AArch64.
This commit restricts the `devid` to the allowed range.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Register D has only one bit that is not reserved; its purpose is to
report whether the RTC/CMOS device is powered.
The OVMF firmware was failing to boot because register D reported that
the device was powered off.
The simple way to fix this issue is to always return bit 7 of register
D as 1, indicating the device is always powered.
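A minimal sketch of such a read path. The function `read_cmos_register`
and its `stored` argument are hypothetical; register index 0x0D and the
bit 7 mask follow the standard RTC/CMOS register layout.

```rust
/// Index of register D in the RTC/CMOS register file.
const RTC_REG_D: u8 = 0x0D;
/// Bit 7 of register D: the only non-reserved bit, reporting power.
const RTC_REG_D_POWERED: u8 = 1 << 7;

/// Return the value for a register read. Register D always reports the
/// device as powered; other registers return their stored value.
fn read_cmos_register(index: u8, stored: u8) -> u8 {
    if index == RTC_REG_D {
        // All other bits of register D are reserved, so only bit 7 is set.
        RTC_REG_D_POWERED
    } else {
        stored
    }
}
```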
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the provided binary isn't an ELF binary, assume that it is firmware
to be loaded directly. In this case we shouldn't program any of the
registers, as KVM starts in that state.
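A simplified sketch of the decision, assuming the check is done on the
ELF magic bytes. The actual detection may instead rely on the loader
reporting an error; `Payload`, `classify` and
`should_program_registers` are hypothetical names.

```rust
/// The four magic bytes at the start of every ELF file.
const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];

#[derive(Debug, PartialEq)]
enum Payload {
    Elf,
    RawFirmware,
}

/// Anything that does not start with the ELF magic is assumed to be
/// firmware that is loaded directly.
fn classify(image: &[u8]) -> Payload {
    if image.starts_with(&ELF_MAGIC) {
        Payload::Elf
    } else {
        Payload::RawFirmware
    }
}

/// Registers are only programmed for ELF payloads; for raw firmware the
/// initial state is left untouched.
fn should_program_registers(payload: &Payload) -> bool {
    matches!(payload, Payload::Elf)
}
```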
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The alloc_zeroed function can fail. Check its return value in the read
and write functions. Its return value in is_valid_alignment is not
checked because handling the error in that case does not give us much
benefit. Instead, an assertion is added.
Add safety comments to all `unsafe` blocks.
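A minimal sketch of checking the alloc_zeroed return value and
documenting each `unsafe` block. The `AlignedBuf` wrapper is a
hypothetical type for illustration and is not the code touched by this
change.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::io;

/// Hypothetical owner of a zeroed, aligned heap buffer.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(size: usize, align: usize) -> io::Result<Self> {
        if size == 0 {
            // alloc_zeroed must not be called with a zero-sized layout.
            return Err(io::Error::from(io::ErrorKind::InvalidInput));
        }
        let layout = Layout::from_size_align(size, align)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
        // SAFETY: `layout` has a non-zero size, as checked above.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            // Allocation failure is reported to the caller, not ignored.
            return Err(io::Error::from(io::ErrorKind::OutOfMemory));
        }
        Ok(AlignedBuf { ptr, layout })
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is non-null and points to `layout.size()` zeroed
        // bytes that stay allocated for as long as `self` is alive.
        unsafe { std::slice::from_raw_parts(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was returned by alloc_zeroed with this `layout`.
        unsafe { dealloc(self.ptr, self.layout) };
    }
}
```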
Signed-off-by: Wei Liu <liuwe@microsoft.com>