363 Commits

Author SHA1 Message Date
Samuel Ortiz
fd45e94510 vm-virtio: Add the ability to serialize a Queue
This commit relies on serde to serialize and deserialize the content of
a Queue structure. This will be useful information to store when
implementing snapshot/restore feature for virtio devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2020-04-17 19:29:41 +02:00
Rob Bradford
9bd5ec8967 pci, vfio, vm-virtio: Specify a PCI revision ID of 1 for virtio-pci
Add support for specifying the PCI revision in the PCI configuration and
populate this with the value of 1 for virtio-pci devices.

The virtio-pci specification is slightly ambiguous only saying that
transitional (i.e. devices that support legacy and virtio 1.0) should
set this to 0. In practice it seems that software expects the revision
to be set to 1 for modern only devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-17 13:46:48 +02:00
Rob Bradford
2fa652aa4c vm-virtio: pci: Add virtio_device() accessor
Add an accessor to return the underlying VirtioDevice. This is useful
for managing the removal of the device from internal datastructures when
handling virtio-pci device unplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-16 17:03:25 +02:00
Rob Bradford
8ff3633782 vm-virtio: pci: Update the BARs used by the VirtioPciDevice
In order to support freeing the memory that is allocated we need to make
sure that we update the internal representation so that free_bars() can
correctly free the memory if the device has its BARs moved.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-15 12:02:19 +02:00
Rob Bradford
a216c2ebd3 vm-virtio: pci: Implement free_bars() for VirtioPciDevice
Implement the free_bars() method from the PciDevice trait which is used
as part of the device removal process. Although there is only one BAR
allocated by VirtioPciDevice simplify the code by using a vector.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-15 12:02:19 +02:00
Rob Bradford
70ecd6bab4 vmm, virtio: fs: Move freeing of mappped region into device
Move the release of the managed memory region from the DeviceManager to
the vhost-user-fs device. This ensures that the memory will be freed when
the device is unplugged which will lead to it being Drop()ed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-14 17:46:11 +01:00
Rob Bradford
0c6706a510 vmm, virtio: pmem: Move freeing of mappped region into device
Move the release of the managed memory region from the DeviceManager to
the virtio-pmem device. This ensures that the memory will be freed when
the device is unplugged which will lead to it being Drop()ed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-14 17:46:11 +01:00
dependabot-preview[bot]
886c0f9093 build(deps): bump libc from 0.2.68 to 0.2.69
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.68 to 0.2.69.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.68...0.2.69)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-04-14 09:27:04 +01:00
Yang Zhong
183529d024 vmm: Cleanup warning from build
Remove unnecessary parentheses from code and this will cleanup
the warning from cargo build.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-04-07 09:45:31 +02:00
Samuel Ortiz
1b1a2175ca vm-migration: Define the Snapshottable and Transportable traits
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.

A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.

Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
2020-04-02 13:24:25 +01:00
Eryu Guan
33be24bd5a vhost-user-fs: return EINVAL if req is out of range in fs_slave_mmap/unmap/sync
Return libc::EINVAL instead of custom "Wrong offset" error, as mmap(2)
returns EINVAL when offset/len is invalid.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2020-03-27 11:27:56 +01:00
Eryu Guan
78b5cbc63a vhost-user-fs: validate fs_slave_map/unmap/sync request
In fs_slave_map/unmap/sync, we only made sure offset < cache_size, but
didn't validate (offset + len). We should ensure [offset, offset+len]
is within cache range as well.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2020-03-27 11:27:56 +01:00
Hui Zhu
51d102c708 vm-virtio: Add virtio-mem device
The basic idea of virtio-mem is to provide a flexible, cross-architecture
memory hot plug and hot unplug solution that avoids many limitations
imposed by existing technologies, architectures, and interfaces. More
details can be found in https://lkml.org/lkml/2019/12/12/681.

This commit add virtio-mem device.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Eryu Guan
61e34331c2 virtio-fs: validate request len in fs_slave_io()
We made sure gpa is in cache range, but not the end addr of request,
which is (gpa + len). If the end addr of request is beyond dax cache
window, vmm would corrupt guest memory or crash.

Fix it by making sure end addr of request is within cache range as well.
And while we're at it, return EFAULT if the request is out of range, as
write(2)/read(2) returns EFAULT when buffer is outside accessible
address space.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2020-03-25 13:12:26 +01:00
Sebastien Boeuf
d75e7456fc vm-virtio: vhost-user: Send memory update to the backend
In order to keep vhost-user backend to work across guest memory resizing
happening when memory is hot-plugged or hot-unplugged, both blk, net and
fs devices are implementing the notifier to let the backend know.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 19:01:15 +00:00
Sebastien Boeuf
7ff82af4b2 vm-virtio: vhost-user: Factorize SET_MEM_TABLE setup
By factorizing the setup of the memory table for vhost-user, we
anticipate the fact that vhost-user devices are going to reuse this
function when the guest memory will be updated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 19:01:15 +00:00
Sebastien Boeuf
bc874a9b6f vm-virtio: Add update_memory() to VirtioDevice trait
The virtio devices backed by a vhost-user backend must send an update to
the associated backend with the new file descriptors corresponding to
the memory regions.

This patch allows such devices to be notified when such update needs to
happen.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 19:01:15 +00:00
Eryu Guan
18fbd303ab vhost-user-fs: return correct result of fs_slave_io()
Virtio-fs daemon expects fs_slave_io() returns the number of bytes
read/written on success, but we always return 0 and make userspace think
nothing has been read/written.

Fix it by returning the actual bytes read/written. Note that This
depends on the corresponding fix in vhost crate.

Fixes: #949
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2020-03-24 14:55:56 +01:00
Rob Bradford
8acc15a63c build: Bump vm-memory and linux-loader dependencies
linux-loader depends on vm-memory so must be updated at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-23 14:27:41 +00:00
Sergio Lopez
6329219749 vm-virtio: queue: Use a SeqCst fence on get_used_event
On x86_64, a hint to the compiler is not enough, we need to issue a
MFENCE instruction. Replace the Acquire fence with a SeqCst one.

Without this, it's still possible to miss an used_event update,
leading to the omission of a notification, possibly stalling the
vring.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-18 13:36:17 +00:00
dependabot-preview[bot]
51f51ea17d build(deps): bump libc from 0.2.67 to 0.2.68
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.67 to 0.2.68.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.67...0.2.68)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-03-17 21:36:38 +00:00
Liu Bo
5c1207c198 vhost-user-fs: handle FS_IO request
Virtiofs's dax window can be used as read/write's source (e.g. mmap a file
on virtiofs), but the dax window area is not shared with vhost-user
backend, i.e. virtiofs daemon.

To make those IO work, addresses of this kind of IO source are routed to
VMM via FS_IO requests to perform a read/write from an fd directly to the
given GPA.

This adds the support of FS_IO request to clh's vhost-user-fs master part.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
2020-03-17 08:23:38 +01:00
Sergio Lopez
90309b5106 vm-virtio: queue: Add methods to switch a descriptor context
"DescriptorChain"s are tied to the lifetime of the referenced
GuestMemoryMmap object (for good reasons), but sometimes (i.e., when
processing descriptors from different contexts) we may need to switch
them to point a different GuestMemoryMmap.

Here we introduce the structure DescriptorHead, which holds the data
needed to rebuild a DescriptorChain, the method "get_head" which
returns the DescriptorHead for a DescriptorChain, and the method
"new_from_head", which allows to create a new DescriptorChain with a
DescriptorHead and a new reference to a GuestMemoryMmap.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-13 15:20:34 +00:00
Sergio Lopez
536323d9fb vm-virtio: queue: hint that get_used_event should be inlined
get_used_event is used from vhost_user_backend:needs_notification to
check whether an interrupt must be sent to the guest to notify there
are new items in the queue. Shorten the update window by asking the
the compiler to inline this method, so a write won't slip between the
read of the memory contents and the actual check.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-12 14:34:21 +00:00
Sergio Lopez
401e1d2489 vm-virtio: queue: fix a barrier comment at update_avail_event
The barrier had a comment coming from other context. Adjust it to be
relevant to its own context.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-12 14:34:21 +00:00
Sergio Lopez
e0bdfe826e vm-virtio: queue: add a missing memory barrier in get_used_event
Add a missing memory barrier in get_used_event to make sure we see the
last value written by the guest.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-12 14:34:21 +00:00
Rob Bradford
30b69549e1 vm-virtio: Consume pause events to prevent infinite epoll_wait calls
When a virtio device is paused an event is written to the appropriate
"pause" EventFd for the device. This will be noticed by the the device's
epoll_wait(), an atomic bool checked an if true then the thread is
parked(). When resuming the bool is reset and the thread is unpark()ed.
However the event triggering the pause is still in the EventFd so the
epoll_wait() will continue to return but because the boolean is not set
the thread will not be park()ed but instead we will busy loop around an
event that is not being consumed.

The solution is to drain the "pause" EventFd when the event is first
received and thus the epoll_wait() will only return for the pause event
once. This resolves the infinite epoll_wait() wake-ups.

Fixes: #869

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-09 19:01:38 +01:00
Jose Carlos Venegas Munoz
df794993f8 net: Do not check multiqueue for new interface
If tun_flags file does not exist, check should not be fatal.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-03-05 19:21:06 +00:00
Rob Bradford
f0a3e7c4a1 build: Bump linux-loader and vm-memory dependencies
linux-loader now uses the released vm-memory so we must move to that
version at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-05 11:01:30 +01:00
Rob Bradford
642b890b0f vm-virtio: mmio: Enable reporting of SHM regions via config fields
The details of the SHM regions or the lack of, which is used by
virtio-fs DAX, is communicated through configuration fields on the
virtio-mmio memory region. Implement the necessary fields to return the
SHM entries and in particular return a length of (u64)-1 which is used
by the kernel to indicate there are no SHM regions.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-04 09:40:25 +01:00
Eryu Guan
5200bf3c59 Cargo: switch vhost_rs to external crate
As cloud-hypervisor/vhost crate (dragonball branch) is ready to be used,
switch vhost_rs from internal crate to the external one.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2020-03-03 13:14:45 +00:00
Arron Wang
65a38e6f70 vm-virtio: vhost_user: Fix blk device configuration space offset value
Current device configuration space offset value is 0, we need to
update that value to VHOST_USER_CONFIG_OFFSET(0x100) to follow the spec

Fixes #844

Signed-off-by: Arron Wang <arron.wang@intel.com>
2020-03-03 12:27:55 +00:00
Sergio Lopez
42937c9754 vm-virtio: Add support for indirect descriptors
Indirect descriptors is a virtio feature that allows the driver to
store a table of descriptors anywhere in memory, pointing to it from a
virtqueue ring's descriptor with a particular flag.

We can't seamlessly transition from an iterator over a conventional
descriptor chain to an indirect chain, so Queue users need to
explicitly support this feature by calling Queue::is_indirect() and
Queue::new_from_indirect().

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
dependabot-preview[bot]
f190cb05b5 build(deps): bump libc from 0.2.66 to 0.2.67
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.66 to 0.2.67.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.66...0.2.67)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-02-21 08:03:30 +00:00
Sergio Lopez
d17fa784bc vm-virtio: Implement support for EVENT_IDX
VIRTIO_RING_F_EVENT_IDX is a virtio feature that allows to avoid
device <-> driver notifications under some circunstances, most
notably when actively polling the queue.

This commit implements support for in in the vm-virtio
crate. Consumers of this crate will also need to add support for it by
exposing the feature and calling using update_avail_event() and
get_used_event() accordingly.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sebastien Boeuf
793d4e7b8d vmm: Move codebase to GuestMemoryAtomic from vm-memory
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.

The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.

Fixes #735

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 13:48:19 +00:00
Rob Bradford
4d60ef59bc vm-virtio: vhost_user: block: On shutdown() drop the socket
This causes the vhost-user-block backend to shutdown.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-18 08:43:47 +00:00
Rob Bradford
503887843f vm-virtio: vhost_user: net: On shutdown() drop the socket
This causes the vhost-user-net backend to shutdown.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
545ea9ea33 vm-virtio: Add shutdown method to VirtioDevice trait
This allows the VMM to explicitly shutdown devices as part of the VM
shutdown ahead of what Drop::drop() would do.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Sebastien Boeuf
4dd16c2686 vm-virtio: Detect if a tap interface supports multiqueue
By detecting if an existing tap interface supports multiqueue, we now
have the information to determine if the command line parameters
regarding the number of queues is correct.

Fixes #738

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-12 18:05:42 +00:00
dependabot-preview[bot]
d46c61c5d4 build(deps): bump byteorder from 1.3.2 to 1.3.4
Bumps [byteorder](https://github.com/BurntSushi/byteorder) from 1.3.2 to 1.3.4.
- [Release notes](https://github.com/BurntSushi/byteorder/releases)
- [Changelog](https://github.com/BurntSushi/byteorder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/BurntSushi/byteorder/compare/1.3.2...1.3.4)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-02-07 14:18:07 +00:00
Cathy Zhang
14eddf72b4 vm-virtio: Simplify virtio feature handling
Remove duplicated code across the different devices by handling
the virtio feature pages in VirtioDevice itself rather than
in the backends. This works as no virtio devices use feature
bits beyond 64-bits.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-02-07 08:32:21 +00:00
Sebastien Boeuf
3447e226d9 dependencies: bump vm-memory from 4237db3 to f3d1c27
This commit updates Cloud-Hypervisor to rely on the latest version of
the vm-memory crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-06 11:40:45 +01:00
Samuel Ortiz
da2b3c92d3 vm-device: interrupt: Remove InterruptType dependencies and definitions
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.

By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 19:32:45 +01:00
Sebastien Boeuf
56d7c04226 vm-virtio: vsock: Don't return error when epoll_wait is interrupted
The existing code taking care of the epoll loop was too restrictive as
it was considering all errors the same. But in case the error is EINTR,
this means the syscall has been interrupted while waiting, and it should
be resumed to wait again.

This patch enforces the parsing of the returned error and prevent the
code from assuming EINTR should be handled as all other errors.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-04 18:16:37 +01:00
Yang Zhong
1038a07dd6 vhost-user-blk: Device support multiple queues
The previous code only support one queue, and we need
to support MQ in vhost user block device. This patch
can work with SPDK with MQ setting.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-02-03 09:49:27 +01:00
Sebastien Boeuf
bac0d1e689 iommu: Implement virtio topology configuration
Based on the new structures previously introduced, the new topology
feature is being fully implemented through this commit. This allows
the description of the devices attached to the virtual IOMMU, which
is why a new function attach_devices() has been introduced. It gives
the virtual IOMMU device the full list of devices which must be attached
to it, letting the device share this information through its virtio
configuration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 10:37:40 +01:00
Sebastien Boeuf
0c73ff8129 iommu: Add topology structures
The virtio-iommu device defines a new virtio feature allowing the
topology to be discovered fully through virtio configuration.

By topology, it means describing the devices attached to the virtual
IOMMU. This is currently managed through ACPI with IORT and VIOT table,
but this is another way of describing it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 10:37:40 +01:00
Sebastien Boeuf
db42caef42 vm-virtio: Handle special virtio-pci capability CAP_PCI_CFG
The virtio capability VIRTIO_PCI_CAP_PCI_CFG is exposed through the
device's PCI config space the same way other virtio-pci capabilities
are exposed.

The main and important difference is that this specific capability is
designed as a way for the guest to access virtio capabilities without
mapping the PCI BAR. This is very rarely used, but it can be useful when
it is too early for the guest to be able to map the BARs.

One thing to note, this special feature MUST be implemented, based on
the virtio specification.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 09:25:52 +01:00
Sebastien Boeuf
db9f9b7820 pci: Make self mutable when reading from PCI config space
In order to anticipate the need to support more features related to the
access of a device's PCI config space, this commits changes the self
reference in the function read_config_register() to be mutable.

This also brings some more flexibility for any implementation of the
PciDevice trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 09:25:52 +01:00