Buggy condition meant that vcpu0 would not be iterated in the checks.
Since it's not hotpluggable anyways we would not be able to break the
configuration of a live VM.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1437013
A mediated device will be identified by a UUID (with 'model' now being
a mandatory <hostdev> attribute to represent the mediated device API) of
the user pre-created mediated device. We also need to make sure that if
user explicitly provides a guest address for a mdev device, the address
type will be matching the device API supported on that specific mediated
device and error out with an incorrect XML message.
The resulting device XML:
<devices>
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
<source>
<address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'>
</source>
</hostdev>
</devices>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Add an asyncJob argument for add/delete TLS Objects. A future patch will
add/delete TLS objects from a migration which may have a job to join.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Some users might want to pass a blockdev or a chardev as a
backend for NVDIMM. In fact, this is expected to be the mostly
used configuration. Therefore libvirt should allow the device in
devices CGroup then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have APIs for relabel memdevs on hotplug, fill in the
missing implementation in qemu hotplug code.
The qemuSecurity wrappers might look like overkill for now,
because qemu namespace code does not deal with the nvdimms yet.
Nor does our cgroup code. But hey, there's cgroup_device_acl
variable in qemu.conf. If users add their /dev/pmem* device in
there, the device is allowed in cgroups and created in the
namespace so they can successfully passthrough it to the domain.
It doesn't look like overkill after all, does it?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Frankly, this function is one big mess. A lot of arguments,
complicated behaviour. It's really surprising that arguments were
in random order (input and output arguments were mixed together),
the documentation was outdated, the description of return values
was bogus.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If the delivery of the DEVICE_DELETED event for the vCPU being deleted
would time out, the code would not call 'qemuDomainResetDeviceRemoval'.
Since the waiting thread did not unregister itself prior to stopping the
waiting the monitor code would try to wake it up instead of dispatching
it to the event worker. As a result the unplug process would not be
completed and the definition would not be updated.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1428893https://bugzilla.redhat.com/show_bug.cgi?id=1427801
Split apart and rename qemuDomainGetChardevTLSObjects in order to make a
more generic API that can create the TLS JSON prop objects (secret and
tls-creds-x509) to be used to create the objects
Signed-off-by: John Ferlan <jferlan@redhat.com>
Create a qemuDomainAddChardevTLSObjects which will encapsulate the
qemuDomainGetChardevTLSObjects and qemuDomainAddTLSObjects so that
the callers don't need to worry about the props.
Move the dev->type and haveTLS checks in to the Add function to avoid
an unnecessary call to qemuDomainAddTLSObjects
Signed-off-by: John Ferlan <jferlan@redhat.com>
Refactor the TLS object adding code to make two separate API's that will
handle the add/remove of the "secret" and "tls-creds-x509" objects including
the Enter/Exit monitor commands.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since qemuDomainObjExitMonitor can also generate error messages,
let's move it inside any error message saving code on error paths
for various hotplug add activities.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Now that we have some qemuSecurity wrappers over
virSecurityManager APIs, lets make sure everybody sticks with
them. We have them for a reason and calling virSecurityManager
API directly instead of wrapper may lead into accidentally
labelling a file on the host instead of namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
These functions do not need to see the whole virDomainDiskDef.
Moreover, they are going to be called from places where we don't
have access to the full disk definition. Sticking with
virStorageSource is more than enough.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Again, one missed bit. This time without this commit there is no
/dev entry in the namespace of the qemu process when attaching
vhost SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since we have qemuSecurity wrappers over
virSecurityManagerSetHostdevLabel and
virSecurityManagerRestoreHostdevLabel we ought to use them
instead of calling secdriver APIs directly. Without those
wrappers the labelling won't be done in the correct namespace
and thus won't apply to the nodes seen by qemu itself.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
libvirt was able to set the host_mtu option when an MTU was explicitly
given in the interface config (with <mtu size='n'/>), set the MTU of a
libvirt network in the network config (with the same named
subelement), and would automatically set the MTU of any tap device to
the MTU of the network.
This patch ties that all together (for networks based on tap devices
and either Linux host bridges or OVS bridges) by learning the MTU of
the network (i.e. the bridge) during qemuInterfaceBridgeConnect(), and
returning that value so that it can then be passed to
qemuBuildNicDevStr(); qemuBuildNicDevStr() then sets host_mtu in the
interface's commandline options.
The result is that a higher MTU for all guests connecting to a
particular network will be plumbed top to bottom by simply changing
the MTU of the network (in libvirt's config for libvirt-managed
networks, or directly on the bridge device for simple host bridges or
OVS bridges managed outside of libvirt).
One question I have about this - it occurred to me that in the case of
migrating a guest from a host with an older libvirt to one with a
newer libvirt, the guest may have *not* had the host_mtu option on the
older machine, but *will* have it on the newer machine. I'm curious if
this could lead to incompatibilities between source and destination (I
guess it all depends on whether or not the setting of host_mtu has a
practical effect on a guest that is already running - Maxime?)
Likewise, we could run into problems when migrating from a newer
libvirt to older libvirt - The guest would have been told of the
higher MTU on the newer libvirt, then migrated to a host that didn't
understand <mtu size='blah'/>. (If this really is a problem, it would
be a problem with or without the current patch).
The current ordering is as follows:
1) set label
2) create the device in namespace
3) allow device in the cgroup
While this might work for now, it will definitely not work if the
security driver would use transactions as in that case there
would be no device to relabel in the domain namespace as the
device is created in the second step.
Swap steps 1) and 2) to allow security driver to use more
transactions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The event needs to be emitted after the last monitor call, so that it's
not possible to find the device in the XML accidentally while the vm
object is unlocked.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1414393
https://bugzilla.redhat.com/show_bug.cgi?id=1405269
If a secret was not provided for what was determined to be a LUKS
encrypted disk (during virStorageFileGetMetadata processing when
called from qemuDomainDetermineDiskChain as a result of hotplug
attach qemuDomainAttachDeviceDiskLive), then do not attempt to
look it up (avoiding a libvirtd crash) and do not alter the format
to "luks" when adding the disk; otherwise, the device_add would
fail with a message such as:
"unable to execute QEMU command 'device_add': Property 'scsi-hd.drive'
can't find value 'drive-scsi0-0-0-0'"
because of assumptions that when the format=luks that libvirt would have
provided the secret to decrypt the volume.
Access to unlock the volume will thus be left to the application.
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Two reasons:
1.in none hotplug, we will pass it. We can see from libvirt function
qemuBuildVhostuserCommandLine
2.qemu will use this vetcor num to init msix table. If we don't pass, qemu
will use default value, this will cause VM can only use default value
interrupts at most.
Signed-off-by: gaohaifeng <gaohaifeng.gao@huawei.com>
Consider the following XML snippets:
$ cat scsicontroller.xml
<controller type='scsi' model='virtio-scsi' index='0'/>
$ cat scsihostdev.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='8' unit='1074151456'/>
</source>
</hostdev>
If we create a guest that includes the contents of scsihostdev.xml,
but forget the virtio-scsi controller described in scsicontroller.xml,
one is silently created for us. The same holds true when attaching
a hostdev before the matching virtio-scsi controller.
(See qemuDomainFindOrCreateSCSIDiskController for context.)
Detaching the hostdev, followed by the controller, works well and the
guest behaves appropriately.
If we detach the virtio-scsi controller device first, any associated
hostdevs are detached for us by the underlying virtio-scsi code (this
is fine, since the connection is broken). But all is not well, as the
guest is unable to receive new virtio-scsi devices (the attach commands
succeed, but devices never appear within the guest), nor even be
shutdown, after this point.
While this is not libvirt's problem, we can prevent falling into this
scenario by checking if a controller is being used by any hostdev
devices. The same is already done for disk elements today.
Applying this patch and then using the XML snippets from earlier:
$ virsh detach-device guest_01 scsicontroller.xml
error: Failed to detach device from scsicontroller.xml
error: operation failed: device cannot be detached: device is busy
$ virsh detach-device guest_01 scsihostdev.xml
Device detached successfully
$ virsh detach-device guest_01 scsicontroller.xml
Device detached successfully
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.
In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
Adjust the device string that is built for vhost-scsi devices so that it
can be invoked from hotplug.
From the QEMU command line, the file descriptors are expect to be numeric only.
However, for hotplug, the file descriptors are expected to begin with at least
one alphabetic character else this error occurs:
# virsh attach-device guest_0001 ~/vhost.xml
error: Failed to attach device from /root/vhost.xml
error: internal error: unable to execute QEMU command 'getfd':
Parameter 'fdname' expects a name not starting with a digit
We also close the file descriptor in this case, so that shutting down the
guest cleans up the host cgroup entries and allows future guests to use
vhost-scsi devices. (Otherwise the guest will silently end.)
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
We already have a "scsi" hostdev subsys type, which refers to a single
LUN that is passed through to a guest. But what of things where
multiple LUNs are passed through via a single SCSI HBA, such as with
the vhost-scsi target? Create a new hostdev subsys type that will
carry this.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Just like in the previous commit, we are not updating CGroups on
chardev hot(un-)plug and thus leaving qemu unable to access any
non-default device users are trying to hotplug.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If users try to hotplug RNG device with a backend different to
/dev/random or /dev/urandom the whole operation fails as qemu is
unable to access the device. The problem is we don't update
device CGroups during the operation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Before now, all the qemu hotplug functions assumed that all devices to
be hotplugged were legacy PCI endpoint devices
(VIR_PCI_CONNECT_TYPE_PCI_DEVICE). This worked out "okay", because all
devices *are* legacy PCI endpoint devices on x86/440fx machinetypes,
and hotplug didn't work properly on machinetypes using PCIe anyway
(hotplugging onto a legacy PCI slot doesn't work, and until commit
b87703cf any attempt to manually specify a PCIe address for a
hotplugged device would be erroneously rejected).
This patch makes all qemu hotplug operations honor the pciConnectFlags
set by the single all-knowing function
qemuDomainDeviceCalculatePCIConnectFlags(). This is done in 3 steps,
but in a single commit since we would have to touch the other points
at each step anyway:
1) add a flags argument to the hypervisor-agnostic
virDomainPCIAddressEnsureAddr() (previously it hardcoded
..._PCI_DEVICE)
2) add a new qemu-specific function qemuDomainEnsurePCIAddress() which
gets the correct pciConnectFlags for the device from
qemuDomainDeviceConnectFlags(), then calls
virDomainPCIAddressEnsureAddr().
3) in qemu_hotplug.c replace all calls to
virDomainPCIAddressEnsureAddr() with calls to
qemuDomainEnsurePCIAddress()
So in effect, we're putting a "shim" on top of all calls to
virDomainPCIAddressEnsureAddr() that sets the right pciConnectFlags.
Coverity identified that this variable might be leaked. And it's
right. If an error occurred and we have to roll back the control
jumps to try_remove label where we save the current error (see
0e82fa4c34 for more info). However, inside the code a jump onto
other label is possible thus leaking the error object.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The memory device alias needs to be treated as machine ABI as qemu is
using it in the migration stream for section labels. To simplify this
generate the alias from the slot number unless an existing broken
configuration is detected.
With this patch the aliases are predictable and even certain
configurations which would not be migratable previously are fixed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1359135
As with other devices assign the slot number right away when adding the
device. This will make the slot numbers static as we do with other
addressing elements and it will ultimately simplify allocation of the
alias in a static way which does not break with qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=1386976
We have everything ready. Actually the only limitation was our
check that denied hotplug of vhost-user.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If there is an error hotpluging a net device (for whatever
reason) a rollback operation is performed. However, whilst doing
so various helper functions that are called report errors on
their own. This results in the original error to be overwritten
and thus misleading the user.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>