Export the required helpers and add backend code to hotplug RNG devices.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
This way the device is in vmdef only if ret = 0 and the caller
(qemuDomainAttachDeviceFlags) does not free it.
Otherwise it might get double freed by qemuProcessStop
and qemuDomainAttachDeviceFlags if the domain crashed
in monitor after we've added it to vm->def.
Do the allocation first, then add the actual device.
The second part should never fail. This is good
for live hotplug where we don't want to remove the device
on OOM after the monitor command succeeded.
The only change in behavior is that on failure, the
vmdef->consoles array is freed, not just the first console.
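The split described above (allocate first, then insert without any chance of failure) can be sketched roughly as follows. This is only an illustration, not libvirt's code: the dev_list type and both function names are hypothetical, and devices are reduced to plain integers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a device list in a domain definition. */
typedef struct {
    int *items;      /* device identifiers */
    size_t count;    /* slots in use */
    size_t capacity; /* slots allocated */
} dev_list;

/* Phase 1: make room for one more item. This is the only step that can fail. */
int dev_list_pre_insert(dev_list *list)
{
    if (list->count < list->capacity)
        return 0;

    size_t new_cap = list->capacity ? list->capacity * 2 : 4;
    int *tmp = realloc(list->items, new_cap * sizeof(*tmp));
    if (!tmp)
        return -1;

    list->items = tmp;
    list->capacity = new_cap;
    return 0;
}

/* Phase 2: store the item in the reserved slot. Cannot fail after phase 1. */
void dev_list_insert_prealloc(dev_list *list, int dev)
{
    list->items[list->count++] = dev;
}
```

A caller would run phase 1 before talking to the monitor and phase 2 only after the monitor command succeeded, so an out-of-memory condition can no longer appear once the device already exists.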
Depending on the context, either error out if the domain
has disappeared in the meantime, or just ignore the value
to allow marking the function as ATTRIBUTE_RETURN_CHECK.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
If the domain crashed while we were in monitor,
we cannot rely on the REALLOC done on live definition,
since vm->def now points to the persistent definition.
Skip adding the attached devices to domain definition
if the domain crashed.
In AttachChrDevice, the chardev was already added to the
live definition and freed by qemuProcessStop in the case
of a crash. Skip the device removal in that case.
Also skip audit if the domain crashed in the meantime.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
In the device type-specific functions, exit early
if the domain has disappeared, because the cleanup
should have been done by qemuProcessStop.
Check the return value in processDeviceDeletedEvent
and qemuProcessUpdateDevices.
Skip audit and removing the device from live def because
it has already been cleaned up.
https://bugzilla.redhat.com/show_bug.cgi?id=1165993
So, there are still plenty of vNIC types that we don't know how to set
bandwidth on. Let's warn explicitly in case user has requested it
instead of pretending everything was set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
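As a reminder of how a prefix maps to a netmask, here is a small sketch for the IPv4 case. The function name is made up for illustration and the commit does not say how libvirt itself performs the conversion; this is just the standard arithmetic.

```c
#include <stdint.h>

/* Convert an IPv4 prefix length (0-32) to a netmask in host byte order.
 * Returns 0 on success, -1 if the prefix is out of range. */
int ipv4_prefix_to_netmask(unsigned int prefix, uint32_t *netmask)
{
    if (prefix > 32)
        return -1;

    /* Shifting a 32-bit value by 32 is undefined, so handle 0 separately. */
    *netmask = (prefix == 0) ? 0 : (UINT32_MAX << (32 - prefix));
    return 0;
}
```

For example, a prefix of 24 yields 0xFFFFFF00, i.e. 255.255.255.0.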
We can change the VNC password by using the virDomainUpdateDeviceFlags API
with the live flag, but it can't be changed with the config flag. The error
is reported as below.
error: Operation not supported: persistent update of device 'graphics' is not supported
This patch adds support for changing the graphics arguments with the config flag.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Changing some graphics arguments with '--live' is not supported.
Replace the error codes VIR_ERR_INTERNAL_ERROR and VIR_ERR_INVALID_ARG
with VIR_ERR_OPERATION_UNSUPPORTED.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
We now have a qemuInterfaceStartDevices() which does the final
activation needed for the host-side tap/macvtap devices that are used
for qemu network connections. It will soon make sense to have the
converse qemuInterfaceStopDevices() which will undo whatever was done
during qemuInterfaceStartDevices().
A function to "stop" a single device has also been added, and is
called from the appropriate place in qemuDomainDetachNetDevice(),
although this is currently unnecessary - the device is going to
immediately be deleted anyway, so any extra "deactivation" will be for
naught. The call is included for completeness, though, in anticipation
that in the future there may be some required action that *isn't*
nullified by deleting the device.
This patch is a part of a more complete fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
Currently, MAC registration occurs during device creation, which is
early enough that, during live migration, you end up with duplicate
MAC addresses on still-running source and target devices, even though
the target device isn't actually being used yet.
This patch proposes to defer MAC registration until right before
the guest can actually use the device -- In other words, right
before starting guest CPUs.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
qemuNetworkIfaceConnect() used to have a special case for
actualType='network' (a network with forward mode of route, nat, or
isolated) to call the libvirt public API to retrieve the bridge being
used by a network. That is no longer necessary - since all network
types that use a bridge and tap device now get the bridge name stored
in the ActualNetDef, we can just always use
virDomainNetGetActualBridgeName() instead.
(an audit of the two callers to qemuNetworkIfaceConnect() confirms
that it is never called for any other type of network, so the dead
code in the else statement (logging an internal error if it is called
for any other type of network) is eliminated in the process.)
Since virNetworkFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Coverity complained that because the cfg->macFilter call checked
net->ifname != NULL before calling ebtablesRemoveForwardAllowIn, then
the virNetDevOpenvswitchRemovePort call should have the same check.
However, if I move the ebtables call prior to the check for TYPE_DIRECT
(where there is a VIR_FREE(net->ifname)), then it seems Coverity is
happy. Since firewall info is tacked on last during setup, removing
it in the opposite order of initialization seems to be natural anyway
Ethernet interfaces in libvirt currently do not support bandwidth setting.
For example, the following XML for an interface will not apply these
settings to the corresponding qdiscs.
<interface type="ethernet">
<mac address="02:36:1d:18:2a:e4"/>
<model type="virtio"/>
<script path=""/>
<target dev="tap361d182a-e4"/>
<bandwidth>
<inbound average="984" peak="1024" burst="64"/>
<outbound average="2000" peak="2048" burst="128"/>
</bandwidth>
</interface>
Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Hotplugging and hotunplugging char devices is only supported through
'-device' and the check for device capability should be done independently.
Coverity also complains that 'tmpChr->info.alias' could be NULL when we
dereference it. Only in this case does it somehow fail to recognize
that the value is set by 'qemuAssignDeviceChrAlias', so it's clearly a
false positive. Add sa_assert to make Coverity happy.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In qemuDomainDetachControllerDevice, if the info.alias already exists,
a call to qemuAssignDeviceControllerAlias would overwrite the existing
alias, so avoid this possibility.
The prior patch removed the need for the virConnectPtr in the unplug
detach host path, which has a ripple effect allowing its removal from
multiple callers. The previous patch just left things as ATTRIBUTE_UNUSED;
this patch removes the variable.
https://bugzilla.redhat.com/show_bug.cgi?id=1141732
The logic to detach a scsi_host device (SCSI or iSCSI), introduced by
commit id '8f76ad99', fails when attempting to remove the 'drive'
because, as I found in my investigation, the DelDevice takes care of
that for us.
The investigation turned up commits to adjust the logic for the
qemuMonitorDelDevice and qemuMonitorDriveDel processing for interfaces
(commit id '81f76598'), disk bus=VIRTIO,SCSI,USB (commit id '0635785b'),
and chr devices (commit id '55b21f9b'), but nothing with the host devices.
This commit uses the model for the previous set of changes and applies
it to the hostdev path. The call to qemuDomainDetachHostSCSIDevice will
return to qemuDomainDetachThisHostDevice handling either the audit of
the failure or the wait for the removal and then call into
qemuDomainRemoveHostDevice for the event, removal from the domain hostdev
list, and audit of the removal similar to other paths.
NOTE: For now the 'conn' param to qemuDomainDetachHostSCSIDevice is left
as ATTRIBUTE_UNUSED. Removing requires a cascade of other changes to be
left for a future patch.
This patch adds parsing/formatting code as well as documentation for
shared memory devices. This will currently be only accessible in QEMU
using its ivshmem device, but is designed to be as generic as possible to
allow future expansion for other hypervisors.
In the devices section in the domain XML users may specify:
- For shmem device using a server:
<shmem name='shmem0'>
<server path='/tmp/socket-ivshmem0'/>
<size unit='M'>32</size>
<msi vectors='32' ioeventfd='on'/>
</shmem>
- For ivshmem device not using an ivshmem server:
<shmem name='shmem1'>
<size unit='M'>32</size>
</shmem>
Most of the configuration is made optional so it also allows
specifications like:
<shmem name='shmem1'/>
<shmem name='shmem2'>
<server/>
</shmem>
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Request erroring out from the backing chain traveller and drop qemu's
internal backing chain integrity tester.
The backing chain traveller reports errors by itself with possibly more
detail than qemuDiskChainCheckBroken ever could.
We also need to make sure that we reconnect to existing qemu instances
even at the cost of losing the backing chain info (this really should be
stored in the XML rather than reloaded from disk, but that needs some
work).
https://bugzilla.redhat.com/show_bug.cgi?id=1095636
When starting up the domain the domain's NICs are allocated. As of
1f24f682 (v1.0.6) we are able to use multiqueue feature on virtio
NICs. It breaks network processing into multiple queues which can be
processed in parallel by different host CPUs. The queues are, however,
created by opening /dev/net/tun several times. Unfortunately, only the
first FD in the row is labelled so when turning the multiqueue feature
on in the guest, qemu will get AVC denial. Make sure we label all the
FDs needed.
Moreover, the default label of /dev/net/tun doesn't allow
attaching a queue:
type=AVC msg=audit(1399622478.790:893): avc: denied { attach_queue }
for pid=7585 comm="qemu-kvm"
scontext=system_u:system_r:svirt_t:s0:c638,c877
tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023
tclass=tun_socket
And as suggested by SELinux maintainers, the tun FD should be labeled
as svirt_t. Therefore, we don't need to adjust any range (as done
previously by Guannan in ae368ebf) rather set the seclabel of the
domain directly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Pass the source of the changed media instead of a complete disk
definition.
Note that the @disk argument now contains what @olddisk would contain.
The new source is passed as a virStorageSource struct.
When we are changing media (or doing other hotplug operations) we need
to setup cgroups, locking and seclabels on the new disk. This is a
multi-step process where every piece can fail. To simplify dealing with
this introduce qemuDomainPrepareDisk that similarly to
qemuDomainPrepareDiskChainElement initializes/tears down a whole new
disk to be used with the domain.
Additionally the function supports passing a different source struct for
media changes of cdroms that will be refactored later.
Currently, qemu driver uses qemuTranslateDiskSourcePool()
to translate disk volume information. This function is
general enough and could be used for other drivers as well,
so move it to storage/storage_driver.c along with its helpers.
- qemuTranslateDiskSourcePool: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePool,
- qemuAddISCSIPoolSourceHost: move to storage/storage_driver.c
and rename to virStorageAddISCSIPoolSourceHost,
- qemuTranslateDiskSourcePoolAuth: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePoolAuth,
- Update users of qemuTranslateDiskSourcePool to use the
new name.
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:
"entering monitor without asking for a nested job is dangerous"
The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.
This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.
https://bugzilla.redhat.com/show_bug.cgi?id=1130089
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Otherwise this beautiful error would be overwritten when
the function is called with a really high rate number:
2014-07-28 12:51:47.920+0000: 2304: error : virCommandWait:2399 :
internal error: Child process (/sbin/tc class add dev vnet0 parent 1:
classid 1:1 htb rate 4294968kbps) unexpected exit status 1: Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
mpu minimum packet size used in rate computations
overhead per-packet size overhead used in rate computations
linklay adapting to a linklayer e.g. atm
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lowe
https://bugzilla.redhat.com/show_bug.cgi?id=1043735
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
As with the other enum cleanups, this patch cleans up the "src/qemu/"
directory. All the enums that were defined in the header files in this
directory were converted to typedefs. This patch includes all the
adjustments needed to remove the conflicts that this kind of change
causes. The enum-to-typedef conversions were made in
"src/qemu/qemu_{capabilities, domain, migration, hotplug}.h".
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
A future patch will add two-phase block commit jobs; as the
mechanism for managing them is similar to managing a block copy
job, existing errors should be made generic enough to occur
for either job type.
* src/conf/domain_conf.c (virDomainHasDiskMirror): Update
comment.
* src/qemu/qemu_driver.c (qemuDomainDefineXML)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemuDomainBlockCopy): Update error
message.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Some of the APIs already return int since they can produce errors that
need to be propagated. For consistency reasons, this patch changes the
rest of the APIs to also return int even though they do not fail or
report any errors.
In general, we should only remove a backend after seeing DEVICE_DELETED
event for a corresponding frontend.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In general, we should only remove a backend after seeing DEVICE_DELETED
event for a corresponding frontend. This doesn't make any difference for
disks attached using -drive or drive_add since QEMU automatically
removes their backends but it's still better to make our code
consistent. And it may start making difference in case we switch to
attaching disks using -blockdev.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
[1] reported that we are removing network's backend too early. I didn't
really get the reproducer but libvirt behaves strangely when a guest
does not confirm the removal, e.g., it does not support PCI hotplug. In
such case, detaching a network device leaves its frontend in place but
removes the backend, which makes the device unusable for the guest.
Moreover attaching the same device again succeeds and both the guest and
libvirt will see two network interfaces attached but only one of them is
actually working.
I checked with Paolo Bonzini and he confirmed we should only remove a
backend after seeing DEVICE_DELETED event for a corresponding frontend.
[1] https://www.redhat.com/archives/libvir-list/2014-March/msg01740.html
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In "src/conf/domain_conf.h" there are many enum declarations. The
cleanup in this header file was started, but it wasn't enough, and
there are many other files that have enum variables declared, so the
commit was starting to get big. This commit finishes the cleanup in this
header file and in other files that have enum variables, parameters,
or functions declared.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
In "src/conf/domain_conf.h" there are many enumeration (enum)
declarations to be converted to typedefs too. As mentioned before,
it's better to use a typedef for variable types, function types and
other usages. I think this file has most of those enum declarations
in "src/conf/". So, Eric Blake and I plan to continue the cleanups all
over the source code. This time, most of the files changed in this
commit are related to part of one file: "src/conf/domain_conf.h".
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
If QEMU supports the DEVICE_DELETED event, we always call
qemuDomainRemoveDevice from the event handler. However, since we will
need to push this call away from the main event loop and begin a job for
it (see the following commit), we need to make sure the device is fully
removed by the original thread (and within its existing job) in case the
DEVICE_DELETED event arrives before qemuDomainWaitForDeviceRemoval times
out.
Without this patch, device removals would be guaranteed to never finish
before the timeout because the code would be blocked by the original
job being still active.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The commit 84c59ffa improved the way we change ejectable media.
If for any reason the first "eject" didn't open the tray we
should return with an error.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Move sharable PCI handling functions to domain_addr.[ch], and
change their prefix from 'qemu' to 'vir':
- virDomainPCIAddressAsString;
- virDomainPCIAddressBusSetModel;
- virDomainPCIAddressEnsureAddr;
- virDomainPCIAddressFlagsCompatible;
- virDomainPCIAddressGetNextSlot;
- virDomainPCIAddressReleaseSlot;
- virDomainPCIAddressReserveAddr;
- virDomainPCIAddressReserveNextSlot;
- virDomainPCIAddressReserveSlot;
- virDomainPCIAddressSetFree;
- virDomainPCIAddressSetGrow;
- virDomainPCIAddressSlotInUse;
- virDomainPCIAddressValidate;
The only change here is function names, the implementation itself
stays untouched.
Extract common allocation code from DomainPCIAddressSetCreate
into virDomainPCIAddressSetAlloc.
This uses the new QEMU_CAPS_HOST_PCI_MULTIDOMAIN capability when
present, for -device pci-assign, -device vfio-pci, and -pcidevice.
While creating tests for this new functionality, I noticed that the
xmls for two existing tests had erroneously specified an
until-now-ignored domain="0x0002", so I corrected those two tests, and
also added two failure tests to be sure that we alert users who
attempt to use a non-zero domain with a qemu that doesn't support it.
If a domain network interface that contains a <filterref> is modified
"live" using "virsh update-device --live", libvirtd would crash. This
was because the code supporting live update of an interface's
filterref was assuming that a filterref might be added or modified,
but didn't account for removing the filterref, resulting in a null
dereference of the filter name.
Introduced with commit 258fb278, which was first in libvirt v1.0.1.
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1093301
The check for a network being active during interface attach was being
done individually in several places (by both the lxc driver and the
qemu driver), but those places were too specific, leading to it *not*
being checked when allocating a connection/device from a macvtap or
hostdev network.
This patch puts a single check in networkAllocateActualDevice(), which
is always called before any network interface is attached to any
type of domain. It also removes all the other now-redundant checks
from the lxc and qemu drivers.
NB: the following patches are prerequisites for this patch, in the
case that it is backported to any branch:
440beeb network: fix virNetworkObjAssignDef and persistence
8aaa5b6 network: create statedir during driver initialization
b9e9549 network: change location of network state xml files
411c548 network: set macvtap/hostdev networks active if their state
file exists
This fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=880483
Since it is an abbreviation, PCI should always be fully
capitalized or full lower case, never Pci.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Every caller checked the return value and logged an error
- one if no device with the specified MAC was found,
other if there were multiple devices matching the MAC address
(except for qemuDomainUpdateDeviceConfig which logged the same
message in both cases).
Move the error reporting into virDomainNetFindIdx, since in both cases,
we couldn't find one single match - it's just the error messages that
differ.
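The idea of letting the lookup function report both failure cases itself can be sketched as below. This is a simplified, hypothetical stand-in (net_find_idx and its plain MAC array are not libvirt's real signature), and fprintf stands in for the real error reporting.

```c
#include <stdio.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical lookup: returns the index of the single interface whose MAC
 * matches, or -1 after reporting the appropriate error itself. */
int net_find_idx(const unsigned char (*macs)[MAC_LEN], size_t nmacs,
                 const unsigned char *wanted)
{
    int match = -1;

    for (size_t i = 0; i < nmacs; i++) {
        if (memcmp(macs[i], wanted, MAC_LEN) != 0)
            continue;
        if (match >= 0) {
            fprintf(stderr, "multiple devices matched the MAC address\n");
            return -1;
        }
        match = (int)i;
    }

    if (match < 0)
        fprintf(stderr, "no device found with the specified MAC address\n");
    return match;
}
```

With this shape, callers only need to check for a negative return value; the message describing which of the two failures happened has already been reported.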
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change any method names with Usb, Pci or Scsi to use
USB, PCI and SCSI since they are abbreviations.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To prepare for extracting the hostdev code from qemu_hostdev.c into a common
library, change the qemu-specific cfg->relaxedACS handling to be a flag, and
pass it to the hostdev functions.
The same logic for preparing/reattaching hostdevs can be used in the
attach/detach hotplug paths, so reuse the hostdev interfaces to avoid
duplication and to ease the later extraction of general code into a common
library.
The qemu_bridge_filter.c file had some helpers for calling
the ebtablesXXX functions to do bridge filtering. The only
thing these helpers did was to overwrite the original error
message from the ebtables code. For added fun, the callers
of these helpers overwrote the errors yet again. For even
more fun, one of the helpers called another helper and
overwrote its errors too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Avoid the freeing of an array of zero file descriptors in case
of error. Initialize the array to -1 using memset.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
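The memset initialization mentioned above can be illustrated with a short sketch. The helper names are hypothetical; the point is that slots holding -1 are skipped during cleanup, whereas a zero-filled array would make the cleanup loop treat fd 0 as something to close.

```c
#include <string.h>
#include <unistd.h>

/* Mark every slot as "not open". memset writes the byte 0xFF into each byte,
 * which reads back as -1 for an int on two's-complement platforms. */
void fd_array_init(int *fds, size_t nfds)
{
    memset(fds, -1, nfds * sizeof(*fds));
}

/* Close only the descriptors that were actually opened; returns how many. */
size_t fd_array_close(int *fds, size_t nfds)
{
    size_t closed = 0;

    for (size_t i = 0; i < nfds; i++) {
        if (fds[i] < 0)
            continue;
        close(fds[i]);
        fds[i] = -1;
        closed++;
    }
    return closed;
}
```

Note that memset only works for the value -1 (and 0) because every byte of the int is set to the same value; any other sentinel would need an explicit loop.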
There might be some use cases where the user wants to prepare the host or
its environment prior to starting a network and do some cleanup after
the network has been shut down. Consider all the functionality that
libvirt doesn't currently have as an example of what a hook script can
possibly do.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code took into account only the global permissions. The domains now
support per-vm DAC labels and per-image DAC labels. Use the most
specific label available.
commit f094aaac changed qemuPrepareHostdevPCIDevices() such that it
may modify the "backend" (vfio vs. legacy kvm) setting in the
virHostdevDef. However, qemuDomainAttachHostPciDevice() (used by
hotplug) copies the backend setting into a local *before* calling
qemuPrepareHostdevPCIDevices(), and then later makes a decision based
on that pre-change value.
The result is that, if the backend had been set to "default" (i.e. not
specified in the config) and was later updated to "VFIO" by
qemuPrepareHostdevPCIDevices(), the qemu process' MaxMemLock is not
increased (as is required for VFIO device assignment).
This patch delays making the local copy of backend until after its
potential modification.
This eliminates the misleading error message that was being logged
when a vfio hostdev hotplug failed:
error: unable to set user and group to '107:107' on '/dev/vfio/22':
No such file or directory
as documented in:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
Commit ee414b5d (pushed as a fix for Bug 1016511 and part of Bug
1025108) replaced the single call to
virSecurityManagerSetHostdevLabel() in qemuDomainAttachHostDevice()
with individual calls to that same function in each
device-type-specific attach function (for PCI, USB, and SCSI). It also
added a corresponding call to virSecurityManagerRestoreHostdevLabel()
in the error handling of the device-type-specific functions, but
forgot to remove the common call to that from
qemuDomainAttachHostDevice() - this resulted in a duplicate call to
virSecurityManagerRestoreHostdevLabel(), with the second occurrence
being after (e.g.) a PCI device has already been re-attached to the
host driver, thus destroying some of the device nodes / links that we
then attempted to re-label (e.g. /dev/vfio/22) and generating an error
log that obscured the original error.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
virProcessSetMaxMemLock() (which is a wrapper over prlimit(3)) expects
the memory size in bytes, but libvirt's domain definition (which was
being used by qemuDomainAttachHostPciDevice()) stores all memory
tuning parameters in KiB. This was being accounted for when setting
MaxMemLock at domain startup time (so cold-plugged devices would
work), but not for hotplug.
This patch simplifies the few lines that call
virProcessSetMaxMemLock(), and multiplies the amount by 1024 so that
we're locking the correct amount of memory.
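The unit conversion itself is trivial, but it is easy to forget. A minimal sketch, assuming a hypothetical helper name and adding an overflow check that the commit does not mention:

```c
#include <limits.h>

/* Convert a size in KiB to bytes; returns 0 on success, -1 on overflow. */
int kib_to_bytes(unsigned long long kib, unsigned long long *bytes)
{
    if (kib > ULLONG_MAX / 1024)
        return -1;

    *bytes = kib * 1024;
    return 0;
}
```

The resulting byte count is what a limit-setting call that expects bytes should receive, rather than the KiB value stored in the domain definition.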
What remains a mystery to me is why hot-plug of a managed='no' device
would succeed (at least on my system) while managed='yes' would
fail. I guess in one case the memory was coincidentally already
resident and in the other it wasn't.
We were unconditionally removing the device from the host list, when it
should only be done on error.
This fixes USB collision detection when hotplugging the same device to
two guests.
If we hit a collision, we free the USB device while it is still part
of our temporary USBDeviceList. When the list is unref'd, the device
is free'd again.
Make the initial device freeing dependent on whether it is present
in the temporary list or not.
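The ownership rule behind this fix can be sketched as follows. All types and function names here are hypothetical simplifications (the real list is reference counted); the sketch only shows that a device is freed directly when, and only when, the temporary list does not already own it.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int bus, addr; } usb_dev;   /* hypothetical device */

typedef struct {
    usb_dev **devs;
    size_t count;
} usb_list;                                   /* hypothetical owning list */

bool usb_list_contains(const usb_list *list, const usb_dev *dev)
{
    for (size_t i = 0; i < list->count; i++)
        if (list->devs[i] == dev)
            return true;
    return false;
}

/* Free the list together with every device it owns; returns devices freed. */
size_t usb_list_free(usb_list *list)
{
    size_t n = list->count;

    for (size_t i = 0; i < n; i++)
        free(list->devs[i]);
    free(list->devs);
    list->devs = NULL;
    list->count = 0;
    return n;
}

/* Error-path cleanup: free dev directly only when the list does not own it.
 * Returns the total number of devices freed. */
size_t cleanup_after_collision(usb_list *tmp_list, usb_dev *dev)
{
    size_t freed = 0;

    if (dev && !usb_list_contains(tmp_list, dev)) {
        free(dev);
        freed++;
    }
    return freed + usb_list_free(tmp_list);
}
```

In both situations (device already in the list, or not yet added) the device ends up freed exactly once.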
Similar to what Jiri did for cgroup setup/teardown in 05e149f94, push
it all into the device handler functions so we can do the necessary prep
work before claiming the device.
This also fixes hotplugging USB devices by product/vendor (virt-manager's
default behavior):
https://bugzilla.redhat.com/show_bug.cgi?id=1016511
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1029732
The BZ asked for the capability to change the number of queues used by
a virtio-net device while the device is in use. Because the number of
queues can only be set at the time the device is created, that isn't
possible. However, libvirt also shouldn't be silently reporting
success when someone tries to change the number of queues. So this
patch flags that as an error (just as attempts to change any of the
other virtio-specific parameters already do).
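A check of this kind can be sketched as below. The net_def struct and the function name are hypothetical and contain only the one field discussed here; the real code compares many more properties.

```c
#include <stdio.h>

/* Hypothetical subset of a virtio-net interface definition. */
typedef struct {
    unsigned int queues;
} net_def;

/* Returns 0 if the live update is acceptable, -1 if it tries to change a
 * property that can only be set when the device is created. */
int net_check_live_update(const net_def *olddev, const net_def *newdev)
{
    if (olddev->queues != newdev->queues) {
        fprintf(stderr, "cannot modify the number of queues of a live device\n");
        return -1;
    }
    return 0;
}
```

Failing loudly here avoids silently reporting success for a change that never took effect.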
If a SCSI hostdev is included in an initial domain XML, without a
corresponding controller statement, one is created silently when the
guest is booted.
When hotplugging a SCSI hostdev, a presumption is that the controller
is already present in the domain either from the original XML, or via
an earlier hotplug.
[root@xxxxxxxx ~]# cat disk.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='3' unit='1088438288'/>
</source>
</hostdev>
[root@xxxxxxxx ~]# virsh attach-device guest01 disk.xml
error: Failed to attach device from disk.xml
error: internal error: unable to execute QEMU command 'device_add': Bus 'scsi0.0' not found
Since the infrastructure is in place, we can also create a controller
silently for use by the hotplugged hostdev device.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For systems without a PCI bus, attaching a SCSI controller fails:
[root@xxxxxxxx ~]# cat controller.xml
<controller type='scsi' model='virtio-scsi' index='0' />
[root@xxxxxxxx ~]# virsh attach-device guest01 controller.xml
error: Failed to attach device from controller.xml
error: XML error: No PCI buses available
A similar problem occurs with the detach of a controller:
[root@xxxxxxxx ~]# virsh detach-device guest01 controller.xml
error: Failed to detach device from controller.xml
error: operation failed: controller scsi:0 not found
The qemuDomainXXtachPciControllerDevice routines assumed that any
caller had a PCI bus. These routines now selectively call
PCI functions where necessary, and assign the device information
type to one appropriate for the bus in use.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
For attach/detach of controller devices, we rename the functions to
remove 'PCI' from their title. The actual separation of PCI-specific
operations will be handled in the next patch.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1025108
So far qemuSetupHostdevCGroup was called very early during hotplug, even
before we knew the device we were about to hotplug was actually
available. By calling the function later, we make sure QEMU won't be
allowed to access devices used by other domains.
Another important effect of this change is that hotplugging USB devices
specified by vendor and product (but not by their USB address) works
again. This was broken since v1.0.5-171-g7d763ac, when the call to
qemuFindHostdevUSBDevice was moved after the call to
qemuSetupHostdevCGroup, which then used an uninitialized USB address.
This patch moves some code in the qemuDomainAttachSCSIDisk
function. The check for the existence of a PCI address assigned
to the SCSI controller was moved in order to be executed only
when needed. The PCI address of a controller is not necessary
if QEMU_CAPS_DEVICE is supported.
This fixes issues with the hotplug of SCSI disks on pseries guests.
This patch (and the two patches that precede it) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1005682
When libvirt was changed to delay the final cleanup of device removal
until the qemu process had signaled it with a DEVICE_DELETED event for
that device, the hostdev removal function
(qemuDomainRemoveHostDevice()) was written to properly handle the
removal of a hostdev that was actually an SRIOV virtual function
(defined with <interface type='hostdev'>). However, the function used
to search for a device matching the alias name provided in the
DEVICE_DELETED message (virDomainDefFindDevice()) would search through
the list of netdevs before hostdevs, so qemuDomainRemoveHostDevice()
was never called; instead the netdev function,
qemuDomainRemoveNetDevice() (which *doesn't* properly cleanup after
removal of <interface type='hostdev'>), was called.
(As a reminder - each <interface type='hostdev'> results in a
virDomainNetDef which contains a virDomainHostdevDef having a parent
type of VIR_DOMAIN_DEVICE_NET, and parent.data.net pointing back to
the virDomainNetDef; both Defs point to the same device info object
(and the info contains the device's "alias", which is used by qemu to
identify the device). The virDomainHostdevDef is added to the domain's
hostdevs list *and* the virDomainNetDef is added to the domain's nets
list, so searching either list for a particular alias will yield a
positive result.)
This patch modifies qemuDomainRemoveNetDevice() to short
circuit itself and call qemuDomainRemoveHostDevice() instead when the
actual device is a VIR_DOMAIN_NET_TYPE_HOSTDEV (similar logic to what
is done in the higher level qemuDomainDetachNetDevice()).
Note that even if virDomainDefFindDevice() changes in the future so
that it finds the hostdev entry first, the current code will continue
to work properly.
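As a rough illustration of the short circuit described above, the net
removal path can hand the device over to the hostdev path when the
actual type is hostdev. The types and function names below are
simplified stand-ins invented for this sketch, not the libvirt
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libvirt types. */
enum net_type { NET_TYPE_BRIDGE, NET_TYPE_HOSTDEV };
enum removal_path { REMOVED_AS_NET, REMOVED_AS_HOSTDEV };

struct net_dev {
    enum net_type actual_type;
};

/* Placeholder for the hostdev-specific cleanup. */
enum removal_path remove_host_device(struct net_dev *net)
{
    (void)net;
    return REMOVED_AS_HOSTDEV;
}

/* The net removal path defers to the hostdev path for hostdev-backed
 * NICs, so the outcome is the same whichever list the alias lookup
 * happens to hit first. */
enum removal_path remove_net_device(struct net_dev *net)
{
    assert(net != NULL);

    if (net->actual_type == NET_TYPE_HOSTDEV)
        return remove_host_device(net);

    return REMOVED_AS_NET;
}
```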
This function was called in three places, and in each the call was
qualified by a slightly different conditional. In reality, this
function should only be called for a hostdev if all of the following
are true:
1) mode='subsystem'
2) type='pci'
3) there is a parent device definition which is an <interface>
(VIR_DOMAIN_DEVICE_NET)
We can simplify the callers and make them more consistent by checking
these conditions at the top of qemuDomainHostdevNetConfigRestore and
returning 0 if one of them isn't satisfied.
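A minimal sketch of that early-return guard, assuming simplified
stand-in enums and a counter in place of the real restore work (none of
these names come from libvirt):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libvirt definitions. */
enum hostdev_mode { MODE_SUBSYSTEM, MODE_CAPABILITIES };
enum subsys_type { SUBSYS_USB, SUBSYS_PCI, SUBSYS_SCSI };
enum parent_type { PARENT_NONE, PARENT_NET };

struct hostdev {
    enum hostdev_mode mode;
    enum subsys_type type;
    enum parent_type parent;
};

int restore_calls;  /* counts how often the real work would have run */

/* Returns 0 without doing anything unless all three conditions hold,
 * so callers no longer need their own (slightly different) checks. */
int hostdev_net_config_restore(const struct hostdev *dev)
{
    assert(dev != NULL);

    if (dev->mode != MODE_SUBSYSTEM ||
        dev->type != SUBSYS_PCI ||
        dev->parent != PARENT_NET)
        return 0;

    restore_calls++;  /* placeholder for restoring MAC address and vlan tag */
    return 0;
}
```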
The location of the call to qemuDomainHostdevNetConfigRestore() has
also been changed in the hot-plug case - it is moved into the caller
of its previous location (i.e. from qemuDomainRemovePCIHostDevice() to
qemuDomainRemoveHostDevice()). This was done to be more consistent
about which functions pay attention to whether or not this is one of
the special <interface> hostdevs or just a normal hostdev -
qemuDomainRemoveHostDevice() already contained a call to
networkReleaseActualDevice() and virDomainNetDefFree(), so it makes
sense for it to also handle the resetting of the device's MAC address
and vlan tag (which is what's done by
qemuDomainHostdevNetConfigRestore()).
Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not, legacy
KVM passthrough is checked and an error is reported if it's not available.
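The selection order can be sketched roughly as follows. The enum and the
two availability flags are hypothetical, and passing an explicitly
configured driver straight through is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend names for this sketch. */
enum pci_backend { BACKEND_DEFAULT, BACKEND_VFIO, BACKEND_KVM };

/* Returns 0 and stores the backend to use, or -1 if the device has no
 * driver configured and neither VFIO nor legacy KVM is available. */
int pick_pci_backend(enum pci_backend configured,
                     bool have_vfio, bool have_kvm,
                     enum pci_backend *chosen)
{
    assert(chosen != NULL);

    if (configured != BACKEND_DEFAULT) {
        *chosen = configured;   /* assumption: explicit choice is kept */
        return 0;
    }
    if (have_vfio) {
        *chosen = BACKEND_VFIO; /* preferred when available */
        return 0;
    }
    if (have_kvm) {
        *chosen = BACKEND_KVM;  /* legacy fallback */
        return 0;
    }
    return -1;                  /* caller reports the error */
}
```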
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
A return value >= 0 from virDomainControllerFind means that
the specific controller was found.
But some functions invoke it and treat 0 as not found.
This patch fixes these incorrect invocations.
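To illustrate the class of bug, here is a small sketch with a
hypothetical lookup that, like the function above, returns a position
on success and a negative value on failure. Position 0 is a valid hit,
so the only correct "not found" test is a check for a negative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: returns the array position of the controller
 * with the given index, or -1 if there is none. */
int controller_find(const int *ctrl_idx, size_t n, int wanted)
{
    assert(ctrl_idx != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ctrl_idx[i] == wanted)
            return (int)i;
    }
    return -1;
}

/* Correct caller: only a negative value means "not found". Testing the
 * result for truthiness would wrongly reject a hit at position 0. */
bool controller_exists(const int *ctrl_idx, size_t n, int wanted)
{
    return controller_find(ctrl_idx, n, wanted) >= 0;
}
```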
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
alias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
memory it depends on to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
If there's no hard_limit set and domain uses VFIO we still must lock
the guest memory (prerequisite from qemu). Hence, we should compute
the amount to be locked from max_balloon.
If the user requested multiqueue networking, besides opening /dev/tap and
/dev/vhost-net multiple times we must also pass mq=on onto the -device
virtio-net-pci command line, which we forgot. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers an algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Hotplugging a single SCSI device works, but adding additional ones
results in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Change the index in the hotplug case
to a negative one so the next available index is used.
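A minimal sketch of the "negative index means next available" rule,
assuming aliases of the form "hostdevN" and a hypothetical helper that
scans the aliases already in use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: a non-negative idx is used as is; a negative idx
 * selects one past the highest "hostdevN" index already in use. */
int hostdev_alias_index(const char *const *aliases, size_t n, int idx)
{
    int next = 0;

    assert(aliases != NULL || n == 0);

    if (idx >= 0)
        return idx;

    for (size_t i = 0; i < n; i++) {
        int cur;

        if (aliases[i] != NULL &&
            sscanf(aliases[i], "hostdev%d", &cur) == 1 &&
            cur >= next)
            next = cur + 1;
    }
    return next;
}
```

With "hostdev0" already attached, passing 0 reproduces the duplicate ID
from the error above, while passing -1 yields index 1.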
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
We had been setting the device alias in the deviceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generating device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
alias of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelihood of
regression.
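The idea in points 1) and 2) can be sketched as below. The function
names are hypothetical, and the "bus=...,addr=0x..." fragment is only an
approximation of what ends up on the command line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Alias of a PCI controller: "pcie.N" for pcie-root, "pci.N" otherwise. */
int pci_controller_alias(char *buf, size_t len, bool is_pcie_root,
                         unsigned int idx)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%s.%u", is_pcie_root ? "pcie" : "pci", idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Address fragment for a device on that bus, built from the
 * controller's alias instead of a hardcoded "pci.%u". */
int pci_device_address_str(char *buf, size_t len, const char *bus_alias,
                           unsigned int slot)
{
    int n;

    assert(buf != NULL && len > 0);
    if (bus_alias == NULL || strlen(bus_alias) == 0)
        return -1;

    n = snprintf(buf, len, "bus=%s,addr=0x%x", bus_alias, slot);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```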
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
Signed-off-by: Eric Blake <eblake@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
If an error occurs during qemuDomainAttachNetDevice after the macvtap
was created in qemuPhysIfaceConnect, the macvtap device gets left behind.
This patch adds code to the cleanup routine to delete the macvtap.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
I recently patched the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate). At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
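A rough sketch of such a check, working on the target of the "driver"
symlink as a string. The helper names are hypothetical, and treating a
device with no bound driver as needing a reset is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Takes the target of the device's sysfs "driver" symlink, for example
 * "../../../bus/pci/drivers/vfio-pci", and compares its last component. */
bool driver_link_is_vfio(const char *link_target)
{
    const char *name;

    assert(link_target != NULL);
    name = strrchr(link_target, '/');
    name = name ? name + 1 : link_target;
    return strcmp(name, "vfio-pci") == 0;
}

/* Decision made inside the reset function itself: skip the reset when
 * the device is currently bound to vfio-pci. */
bool pci_reset_needed(const char *link_target)
{
    if (link_target != NULL && driver_link_is_vfio(link_target))
        return false;   /* VFIO resets the device itself */
    return true;
}
```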
Convert the type of loop iterators named 'i', 'j', 'k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also sanitizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset on devices
that will be/were assigned using VFIO.
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
its type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case its type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compares them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
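One way to picture a type-aware comparison is the sketch below. The
struct and enum are simplified stand-ins, and comparing the network name
for the NETWORK type is an assumption of this sketch rather than a
description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the listen definition. */
enum listen_type { LISTEN_ADDRESS, LISTEN_NETWORK };

struct listen_def {
    enum listen_type type;
    const char *address;  /* may be filled in even for LISTEN_NETWORK */
    const char *network;
};

static bool streq_nullable(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* The address is only compared when both sides are of type ADDRESS. */
bool listen_unchanged(const struct listen_def *olddef,
                      const struct listen_def *newdef)
{
    assert(olddef != NULL && newdef != NULL);

    if (olddef->type != newdef->type)
        return false;

    switch (olddef->type) {
    case LISTEN_ADDRESS:
        return streq_nullable(olddef->address, newdef->address);
    case LISTEN_NETWORK:
        return streq_nullable(olddef->network, newdef->network);
    }
    return false;
}
```

In this sketch a NETWORK listen whose old definition carries a resolved
address still matches the user's XML, where the address is NULL.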
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always reports an error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
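The point about the retries decrement can be sketched as follows. The
callback stands in for the monitor query and is assumed to include
whatever delay is wanted between attempts; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Polls until the tray is open or the retries are used up. The
 * decrement happens only after a failed poll, so a success on the
 * last allowed attempt still leaves retries > 0. */
int wait_for_tray_open(bool (*tray_is_open)(void *), void *opaque,
                       int retries)
{
    assert(tray_is_open != NULL);

    while (retries > 0) {
        if (tray_is_open(opaque))
            break;
        retries--;
    }
    return retries > 0 ? 0 : -1;
}

/* Test double for the monitor query: reports "open" once the counter
 * behind opaque has reached zero. */
bool open_after_n_polls(void *opaque)
{
    int *left = opaque;

    if (*left == 0)
        return true;
    (*left)--;
    return false;
}
```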
cc: mprivozn@redhat.com
In order to teach libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
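Step 3) can be sketched like this. The helper is hypothetical, and
keeping the singular 'fd=X' form when only one FD is passed is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "name=X" for one FD and "names=X:Y:...:Z" for several,
 * e.g. name "fd" or "vhostfd". Returns 0 on success, -1 on error. */
int format_fd_arg(char *buf, size_t len, const char *name,
                  const int *fds, size_t nfds)
{
    size_t used;
    int n;

    assert(buf != NULL && len > 0 && fds != NULL);
    if (name == NULL || strlen(name) == 0 || nfds == 0)
        return -1;

    n = snprintf(buf, len, "%s%s=", name, nfds > 1 ? "s" : "");
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < nfds; i++) {
        n = snprintf(buf + used, len - used, "%s%d", i ? ":" : "", fds[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```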
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
Since 0d70656afd, building the qemu command line accesses sysfs files
(via virSCSIDeviceGetSgName, which finds the scsi generic device name
from adapter:bus:target:unit). There is no way to work around this,
as qemu wants to see the scsi generic device like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, so that tests
can register their own callbacks to access sysfs test input files.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
This adds both attachment and detachment support for scsi host
device.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat>
It's better to put the usb related code into qemuDomainAttachHostUsbDevice
instead of qemuDomainAttachHostDevice.
And in the old qemuDomainAttachHostDevice, just stealing the "usb" from
driver->activeUsbHostdevs leaks the memory.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
The USB-specific cgroup setup had been inserted inline in
qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a
common cgroup setup function called for all hostdevs, so it makes sense
to put the usb-specific setup there and just rely on that function
being called.
The one thing I'm uncertain of here (and a reason for not pushing
until after release) is that previously hostdev->missing was checked
only when starting a domain (and cgroup setup for the device skipped
if missing was true), but with this consolidation, it is now checked
in the case of hotplug as well. I don't know if this will have any
practical effect (does it make sense to hotplug a "missing" usb
device?)
PCI device assignment using VFIO requires read/write access by the
qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the
VFIO group number that the assigned device belongs to (and can be
found with the function virPCIDeviceGetVFIOGroupDev)
/dev/vfio/vfio can be accessible to any guest without danger
(according to vfio developers), so it is added to the static ACL.
The group device must be dynamically added to the cgroup ACL for each
vfio hostdev in two places:
1) for any devices in the persistent config when the domain is started
(done during qemuSetupCgroup())
2) at device attach time for any hotplug devices (done in
qemuDomainAttachHostDevice)
The group device must be removed from the ACL when a device is
"hot-unplugged" (in qemuDomainDetachHostDevice())
Note that USB devices are already doing their own cgroup setup and
teardown in the hostdev-usb specific function. I chose to make the new
functions generic and call them in a common location though. We can
then move the USB-specific code (which is duplicated in two locations)
to this single location. I'll be posting a followup patch to do that.
This isn't strictly speaking a bugfix, but I realized I'd gotten a bit
too verbose when I chose the names for
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_*. This shortens them all a bit.
<source type='bridge'> uses a helper application to do the necessary
TUN/TAP setup to use an existing network bridge, thus letting
unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid
programs at all, which would include this helper program. From
a security POV, any setuid helper needs to be run by libvirtd itself,
not QEMU.
This is what this patch does. libvirt now invokes the setuid helper,
gets the TAP fd and then passes it to QEMU in the normal manner.
The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the
XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VFIO requires all of the guest's memory and IO space to be lockable in
RAM. The domain's max_balloon is the maximum amount of memory the
domain can have (in KiB). We add a generous 1GiB to that for IO space
(still much better than KVM device assignment, where the KVM module
actually *ignores* the process limits and locks everything anyway),
and convert from KiB to bytes.
In the case of hotplug, we are changing the limit for the already
existing qemu process (prlimit() is used under the hood), and for
regular commandline additions of vfio devices, we schedule a call to
setrlimit() that will happen after the qemu process is forked.
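The arithmetic described above is small enough to write out. The
function and macro names are made up for this sketch; only the formula
(max_balloon in KiB plus 1GiB, converted to bytes) comes from the text.

```c
#include <assert.h>
#include <limits.h>

#define IO_SPACE_KIB (1024ULL * 1024ULL)  /* 1 GiB expressed in KiB */

/* Memory lock limit in bytes for a domain using VFIO:
 * (max_balloon + 1 GiB of IO space) converted from KiB to bytes. */
unsigned long long vfio_memlock_bytes(unsigned long long max_balloon_kib)
{
    /* guard against overflow of the multiplication below */
    assert(max_balloon_kib <= ULLONG_MAX / 1024ULL - IO_SPACE_KIB);

    return (max_balloon_kib + IO_SPACE_KIB) * 1024ULL;
}
```

For example, a domain with max_balloon of 4194304 KiB (4 GiB) ends up
with a limit of 5368709120 bytes (5 GiB).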
The device option for vfio-pci is nearly identical to that for
pci-assign - only the configfd parameter isn't supported (or needed).
Checking for presence of the bootindex parameter is done separately
from constructing the commandline, similar to how it is done for
pci-assign.
This patch contains tests to check for proper commandline
construction. It also includes tests for parser-formatter-parser
roundtrips (xml2xml), because those tests use the same data files, and
would have failed had they been included before now.
qemu: xml/args tests for VFIO hostdev and <interface type='hostdev'/>
These should be squashed in with the patch that adds commandline
handling of vfio (they would fail at any earlier time).
There will soon be other items related to pci hostdevs that need to be
in the same part of the hostdevsubsys union as the pci address (which
is currently a single member called "pci"). This patch replaces the
single member named pci with a struct named pci that contains a single
member named "addr".
Instead of calling virCgroupForDomain every time we need
the virCgroupPtr instance, just do it once at VM startup
and cache a reference to the object in qemuDomainObjPrivatePtr
until shutdown of the VM. Removing the virCgroupPtr from
the QEMU driver state also means we don't have stale mount
info, if someone mounts the cgroups filesystem after libvirtd
has been started
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
f946462e14 changed behavior by setting
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI upfront. If we do so before invoking
qemuDomainPCIAddressEnsureAddr we merely try to set the PCI slot via
qemuDomainPCIAddressReserveSlot instead of reserving a new address via
qemuDomainPCIAddressSetNextAddr, which fails with
$ ~/run-tck-test domain/200-disk-hotplug.t
./scripts/domain/200-disk-hotplug.t .. # Creating a new transient domain
./scripts/domain/200-disk-hotplug.t .. 1/5 # Attaching the new disk /var/lib/jenkins/jobs/libvirt-tck-build/workspace/scratchdir/200-disk-hotplug/extra.img
# Failed test 'disk has been attached'
# at ./scripts/domain/200-disk-hotplug.t line 67.
# died: Sys::Virt::Error (libvirt error code: 1, message: internal error unable to reserve PCI address 0:0:0.0
# )
The VIR_ERR_NO_SUPPORT error code is reserved for cases where an
API is not implemented in a driver. It definitely should not be
used when an API execution fails due to unsupported operation.
We didn't yet expose the virtio device attach and detach functionality
for s390 domains as the device hotplug was very limited with the old
virtio-s390 bus. With the CCW bus there's full hotplug support for
virtio devices in QEMU, so we are adding this to libvirt too.
Since the virtio hotplug isn't limited to PCI anymore, we change the
function names from xxxPCIyyy to xxxVirtioyyy, where we handle all
three virtio bus types.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
For both AttachDevice and UpdateDevice APIs, if the disk device
is 'cdrom' or 'floppy', the operations could be ejecting, updating,
and inserting. For either ejecting or updating, the shared disk
entry of the original disk src has to be removed, because it's
not useful anymore.
And since the original disk def will be changed, new disk def passed
as argument will be free'ed in qemuDomainChangeEjectableMedia, so
we need to copy the original disk def before
qemuDomainChangeEjectableMedia, to use it for qemuRemoveSharedDisk.
Some functions were using virDomainDeviceInfo where virDevicePCIAddress
would suffice. Some were only using integers for slots and functions,
assuming the bus numbers are always 0.
Switch from virDomainDeviceInfoPtr to virDevicePCIAddressPtr:
qemuPCIAddressAsString
qemuDomainPCIAddressCheckSlot
qemuDomainPCIAddressReserveAddr
qemuDomainPCIAddressReleaseAddr
Switch from int slot to virDevicePCIAddressPtr:
qemuDomainPCIAddressReserveSlot
qemuDomainPCIAddressReleaseSlot
qemuDomainPCIAddressGetNextSlot
Deleted functions (they would take the same parameters
as ReserveAddr/ReleaseAddr do now.)
qemuDomainPCIAddressReserveFunction
qemuDomainPCIAddressReleaseFunction
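To show why passing a full address matters, here is a small sketch in
which the bus is part of the value being formatted and compared, rather
than being assumed to be 0. The struct, the helpers and the
"domain:bus:slot.function" notation are assumptions of this sketch, not
the libvirt definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a full PCI address. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

/* Formats the address as "dddd:bb:ss.f" in hexadecimal. */
int pci_addr_as_string(char *buf, size_t len, const struct pci_addr *addr)
{
    int n;

    assert(buf != NULL && len > 0 && addr != NULL);
    n = snprintf(buf, len, "%.4x:%.2x:%.2x.%.1x",
                 addr->domain, addr->bus, addr->slot, addr->function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Two devices in the same slot on different buses do not collide. */
bool pci_addr_matches(const struct pci_addr *addr, const char *str)
{
    char buf[32];

    return pci_addr_as_string(buf, sizeof(buf), addr) == 0 &&
           strcmp(buf, str) == 0;
}
```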
With the majority of fields in the virQEMUDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the QEMU driver lock. Only a handful
of helper APIs in qemu_conf.c now need it
From qemu's point of view these are still just tap devices, so there's
no reason they shouldn't work with vhost-net; as a matter of fact,
Raja Sivaramakrishnan <srajag00@yahoo.com> verified on libvir-list
that at least the qemu_command.c part of this patch works:
https://www.redhat.com/archives/libvir-list/2012-December/msg01314.html
(the hotplug case is extrapolation on my part).
To avoid confusion between 'virCapsPtr' and 'qemuCapsPtr'
do some renaming of various functions/variables. All
instances of 'qemuCapsPtr' are renamed to 'qemuCaps'. To
avoid that clashing with the 'qemuCaps' typedef though,
rename the latter to virQEMUCaps.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the activePciHostdevs, inactivePciHostdevs and
activeUsbHostdevs lists are all implicitly protected by the
QEMU driver lock. Now that the lists all inherit from the
virObjectLockable, we can make the locking explicit, removing
the dependency on the QEMU driver lock for correctness.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the virQEMUDriverPtr struct contains a wide variety
of data with varying access needs. Move all the static config
data into a dedicated virQEMUDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virQEMUDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen a maximum of
10 retries, 500ms apart. This should give enough time for the guest
to eject the media and open the tray.
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recovery from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
This fixes a problem that showed up during testing of:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
Due to a logic error in the function that gets the name of the bridge
an interface connects to, any time a bridge was specified directly
(type='bridge') rather than indirectly (type='network'), an error
would be logged (although the operation would then complete
successfully):
Network type 6 is not supported
The final virReportError() in the function
qemuDomainNetGetBridgeName() was apparently avoided in the past with a
"goto cleanup" at the end of each case, but the case of bridge somehow
no longer has that final goto cleanup.
The proper solution is anyway to not rely on goto's, but put the error
log inside an else {} clause, so that it's executed only if the type
is neither bridge nor network (in reality, this function should only
ever be called for those two types, that's why this is an internal
error).
While making this change, the error message was also tuned to be more
correct (since it's not really the type of the network, but the type
of the interface, and it *is* otherwise supported, it's just that the
interface type in question doesn't *have* a bridge device associated
with it, or at least we don't know how to get it).
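The control flow change can be sketched as below, with the error set
only in the final else branch instead of being skipped by a goto in each
case. The struct, enum and message text are stand-ins invented for this
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the interface definition. */
enum iface_type { IFACE_BRIDGE, IFACE_NETWORK, IFACE_DIRECT };

struct iface {
    enum iface_type type;
    const char *bridge;          /* set for type='bridge' */
    const char *network_bridge;  /* bridge of the network for type='network' */
};

/* Returns the bridge name, or NULL with *err set when the interface
 * type has no bridge device associated with it. */
const char *iface_get_bridge_name(const struct iface *net, const char **err)
{
    assert(net != NULL && err != NULL);
    *err = NULL;

    if (net->type == IFACE_BRIDGE) {
        return net->bridge;
    } else if (net->type == IFACE_NETWORK) {
        return net->network_bridge;
    } else {
        /* reached only for types that are neither bridge nor network */
        *err = "interface type has no bridge device";
        return NULL;
    }
}
```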
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
Commit 1b2ebf95 overlooked that there were two spots affected.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice):
Transfer backing chain before deletion.
* src/qemu/qemu_driver.c (qemuDomainDetachDeviceDiskLive): Fix
spacing (partly to ensure a different-looking patch).
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
The actual regression is due to a latent bug. The hot unplug code
was computing the set of files needing cgroup ACL revocation based
on the XML passed in by the user, rather than based on the domain's
details on which disk was being deleted. As long as the revoke
path was always recomputing the backing chain, this didn't really
matter; but now that we want to compute the chain exactly once and
remember that computation, we need to hang on to the backing chain
until after the revoke has happened.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice):
Transfer backing chain before deletion.
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>