https://bugzilla.redhat.com/show_bug.cgi?id=981139
If a domain is paused before migration starts, we need to tell that to
the destination libvirtd to prevent it from resuming the domain at the
end of migration. This regression was introduced by commit 5379bb0.
Since commit 23e8b5d8, the code is refactored in a way that supports
domains with multiple graphics elements and commit 37b415200 allows
starting such domains. However none of those commits take migration
into account. Even though qemu doesn't support relocation for
anything else than SPICE and for no more than one graphics, there is no
reason to hardcode one graphics into this part of the code as well.
Add monitor callback API domainGuestPanic, that implements
'destroy', 'restart' and 'preserve' events of the 'on_crash'
in the XML when domain crashed.
After abf75aea24 the compiler screams:
qemu/qemu_driver.c: In function 'qemuNodeDeviceDetachFlags':
qemu/qemu_driver.c:10693:9: error: 'domain' may be used uninitialized in this function [-Werror=maybe-uninitialized]
pci = virPCIDeviceNew(domain, bus, slot, function);
^
qemu/qemu_driver.c:10693:9: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
qemu/qemu_driver.c:10693:9: error: 'slot' may be used uninitialized in this function [-Werror=maybe-uninitialized]
qemu/qemu_driver.c:10693:9: error: 'function' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Since the other functions qemuNodeDeviceReAttach and qemuNodeDeviceReset
looks exactly the same, I've initialized the variables there as well.
However, I am still wondering why those functions don't matter to gcc
while the first one does.
Implement check whether (maximum) vCPUs doesn't exceed machine
type's cpu-max settings.
On older versions of QEMU the check is disabled.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
A loop in qemuPrepareHostdevPCIDevices() intended to cycle through all
the objects on the list pcidevs was doing "while (listcount > 0)", but
nothing in the body of the loop was reducing the size of the list - it
was instead removing items from a *different* list. It has now been
safely changed to a for() loop.
(This isn't as bad as it sounds - it's only a problem in case of an
OOM error.)
qemuGetActivePciHostDeviceList() had been creating a list that
contained pointers to objects that were also on the activePciHostdevs
list. In case of an OOM error, this newly created list would be
virObjectUnref'ed, which would cause everything on the list to be
freed. But all of those objects would still be on the
activePciHostdevs list, which could have very bad consequences if that
list was ever again accessed.
The solution used here is to populate the new list with *copies* of
the objects from the original list. It turns out that on return from
qemuGetActivePciHostDeviceList(), the caller would almost immediately
go through all the device objects and "steal" them (i.e. remove the
pointer from the list but not delete it) all from either one list or
the other; we now instead just *delete* (remove from the list and
free) each device from one list or the other, so in the end we have
the same state.
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it.
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.
virPCIDeviceDetach would previously sometimes consume the input device
object (to put it on the inactive list) and sometimes not. Avoiding
memory leaks required checking beforehand to see if the device was
already on the list, and freeing the device object in the caller only
if there wasn't already an identical object on the inactive list.
This patch makes it consistent - virPCIDeviceDetach will *never*
consume the input virPCIDevice object; if it needs to put one on the
inactive list, it will create a copy and put *that* on the list. This
way the caller knows that it is always their responsibility to free
the device object they created.
Previously stubDriver was always set from a string literal, so it was
okay to use a const char * that wasn't freed when the virPCIDevice was
freed. This will not be the case in the near future, so it is now a
char* that is allocated in virPCIDeviceSetStubDriver() and freed
during virPCIDeviceFree().
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
<features>
<hyperv>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
</features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
As a consequence of the cgroup layout changes from commit '632f78ca', the
qemuDomainGetSchedulerParameters[Flags]()' and qemuGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler : posix
cpu_shares : 0
vcpu_period : 0
vcpu_quota : 0
emulator_period: 0
emulator_quota : 0
This patch will restore the capability to return configuration only data
for a non running domain regardless of whether cgroups are available.
This flag is meant for errors happening on the source of the migration
and isn't used on the destination. To allow better migration
compatibility, don't propagate it to the destination.
Paolo Bonzini pointed out that it's actually possible to migrate a qemu
instance that was paused due to I/O error and it will be able to work on
the destination if the storage is accessible.
This patch introduces flag VIR_MIGRATE_ABORT_ON_ERROR that cancels the
migration in case an I/O error happens while it's being performed and
allows migration without this flag. This flag can be possibly used for
other error reasons that may be introduced in the future.
Currently, we wait for SPICE to migrate in the very same loop where we
wait for qemu to migrate. This has a disadvantage of slowing seamless
migration down. One one hand, we should not kill the domain until all
SPICE data has been migrated. On the other hand, there is no need to
wait in the very same loop and hence slowing down 'cont' on the
destination. For instance, if users are watching a movie, they can
experience the movie to be stopped for a couple of seconds, as
processors are not running nor on src nor on dst as libvirt waits for
SPICE to migrate. We should move the waiting phase to migration CONFIRM
phase.
When qemu >= 1.20, it is safe to use -device for primary video
device as described in 4c993d8ab.
So, we are missing the cap flag in QMP capabilities detection, this
flag can be initialized safely in virQEMUCapsInitQMPBasic.
Convert input XML to migratable before using it in
qemuDomainSaveImageOpen.
XML in the save image is migratable, i.e. doesn't contain implicit
controllers. If these controllers were in a non-default order in the
input XML, the ABI check would fail. Removing and re-adding these
controllers fixes it.
https://bugzilla.redhat.com/show_bug.cgi?id=834196
During a live migration the guest may receive a disk access I/O error.
In this state the guest is unable to continue running on a remote host
after migration as some state may be present in the kernel and not
migrated.
With this patch, the migration is canceled in such case so it can either
continue on the source if the I/O issues are recovered or has to be
destroyed anyways.
https://bugzilla.redhat.com/show_bug.cgi?id=971485
As of d7f9d82753 we copy the listen
address from the qemu.conf config file in case none has been provided
via XML. But later, when migrating, we should not include such listen
address in the migratable XML as it is something autogenerated, not
requested by user. Moreover, the binding to the listen address will
likely fail, unless the address is '0.0.0.0' or its IPv6 equivalent.
This patch introduces a new boolean attribute to virDomainGraphicsListenDef
to distinguish autofilled listen addresses. However, we must keep the
attribute over libvirtd restarts, so it must be kept within status XML.
This patch fixes changes done in commit 29c1e913e4
that was pushed without implementing review feedback.
The flag introduced by the patch is changed to VIR_DOMAIN_VCPU_GUEST and
documentation makes the difference between regular hotplug and this new
functionality more explicit.
The virsh options that enable the use of the new flag are changed to
"--guest" and the documentation is fixed too.
Currently, there's a path to use the ncpuinfo variable uninitialized,
which leads to a compiler warning:
qemu/qemu_driver.c: In function 'qemuDomainGetVcpusFlags':
qemu/qemu_driver.c:4573:9: error: 'ncpuinfo' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
for (i = 0; i < ncpuinfo; i++) {
^
This patch implements support for the "cpu-add" QMP command that plugs
CPUs into a live guest. The "cpu-add" command was introduced in QEMU
1.5. For the hotplug to work machine type "pc-i440fx-1.5" is required.
The qemu guest agent allows to online and offline CPUs from the
perspective of the guest. This patch adds helpers that call
'guest-get-vcpus' and 'guest-set-vcpus' guest agent functions and
convert the data for internal libvirt usage.
https://bugzilla.redhat.com/show_bug.cgi?id=964177
Though both libvirt and QEMU's document say RTC_CHANGE returns
the offset from the host UTC, qemu actually returns the offset
from the specified date instead when specific date is provided
(-rtc base=$date).
It's not safe for qemu to fix it in code, it worked like that
for 3 years, changing it now may break other QEMU use cases.
What qemu tries to do is to fix the document:
http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg04782.html
And in libvirt side, instead of replying on the value from qemu,
this converts the offset returned from qemu to the offset from
host UTC, by:
/*
* a: the offset from qemu RTC_CHANGE event
* b: The specified date (-rtc base=$date)
* c: the host date when libvirt gets the RTC_CHANGE event
* offset: What libvirt will report
*/
offset = a + (b - c);
The specified date (-rtc base=$date) is recorded in clock's def as
an internal only member (may be useful to exposed outside?).
Internal only XML tag "basetime" is introduced to not lose the
guest's basetime after libvirt restarting/reloading:
<clock offset='variable' adjustment='304' basis='utc' basetime='1370423588'/>
Currently, a listen address for a SPICE server can be specified. Later,
when the domain is migrated, we need to relocate the graphics which
involves telling new destination to the SPICE server. However, we can't
just assume the listen address is the new location, because the listen
address can be ANYCAST (0.0.0.0 for IPv4, :: for IPv6). In which case,
we want to pass the remote hostname. But there are some troubles with
ANYCAST. In both IPv4 and IPv6 it has many ways for specifying such
address. For instance, in IPv4: 0, 0.0, 0.0.0, 0.0.0.0. The number of
variations gets bigger in IPv6 world. Hence, in order to check for
ANYCAST address sanely, we should take the provided listen address,
parse it and format back in it's full form. Which is exactly what this
patch does.
The work was done at the time of snapshot xmlstring parsing
if (offline && def->memory &&
def->memory != VIR_DOMAIN_SNAPSHOT_LOCATION_NONE) {
virReportError(...);
}
This should resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=959191
The problem was that qemuUpdateActivePciHostdevs was returning 0
(success) when no hostdevs were present, but would otherwise return -1
(failure) even when it completed successfully. It is only called from
qemuProcessReconnect(), and when qemuProcessReconnect got back an
error, it would not only stop reconnecting, but would terminate the
guest qemu process "to remove danger of it ending up running twice if
user tries to start it again later".
(This bug was introduced in commit 011cf7ad, which was pushed between
v1.0.2 and v1.0.3, so all maintenance branches from v1.0.3 up to 1.0.5
will need this one line patch applied.)
A literal IPv6 must be escaped, otherwise migration fails with:
unable to execute QEMU command 'drive-mirror': address resolution failed
for f0::0d:5901: Servname not supported for ai_socktype
since QEMU treats everything after the first ':' as the port.
If snapshot creation failed for example due to invalid use of the
"REUSE_EXTERNAL" flag, libvirt killed access to the original image file
instead of the new image file. On machines with selinux this kills the
whole VM as the selinux context is enforced immediately.
* qemu_driver.c:qemuDomainSnapshotUndoSingleDiskActive():
- Kill access to the new image file instead of the old one.
Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=906639
A bug in Cygwin [1] and poor error messages from gcc [2] lead
to this confusing compilation error:
qemu/qemu_monitor.c:418:9: error: passing argument 2 of 'sendmsg' from incmpatible pointer type
/usr/include/sys/socket.h:42:11: note: expected 'const struct msghdr *' but argument is of type 'struct msghdr *'
[1] http://cygwin.com/ml/cygwin/2013-05/msg00451.html
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57475
* src/qemu/qemu_monitor.c (includes): Include <sys/socket.h>
before <sys/un.h>.
Signed-off-by: Eric Blake <eblake@redhat.com>
This is a recurring problem for cygwin :)
For example, see commit 23a4df88.
qemu/qemu_driver.c: In function 'qemuStateInitialize':
qemu/qemu_driver.c:691:13: error: format '%d' expects type 'int', but argument 8 has type 'uid_t' [-Wformat]
* src/qemu/qemu_driver.c (qemuStateInitialize): Add casts.
* daemon/remote.c (remoteDispatchAuthList): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
A cygwin build of the qemu driver fails with:
qemu/qemu_process.c: In function 'qemuPrepareCpumap':
qemu/qemu_process.c:1803:31: error: 'CPU_SETSIZE' undeclared (first use in this function)
CPU_SETSIZE is a Linux extension in <sched.h>; a bit more portable
is using sysconf if _SC_NPROCESSORS_CONF is defined (several platforms
have it, including Cygwin). Ultimately, I would have preferred to
use gnulib's 'nproc' module, but it is currently under an incompatible
license.
* src/qemu/qemu_conf.h (QEMUD_CPUMASK_LEN): Provide definition on
cygwin.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, if there's an error opening /dev/vhost-net (e.g. because
it doesn't exist) but it's not required we proceed with vhostfd array
filled with -1 and vhostfdSize unchanged. Later, when constructing
the qemu command line only non-negative items within vhostfd array
are taken into account. This means, vhostfdSize may be greater than
the actual count of non-negative items in vhostfd array. This results
in improper command line arguments being generated, e.g.:
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=(null)
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
cc: mprivozn@redhat.com
Currently qemuDomainReboot() does reboot in two phases:
qemuMonitorSystemPowerdown() and qemuProcessFakeReboot().
qemuMonitorSystemPowerdown() shutdowns the domain and saves domain
state/reason as VIR_DOMAIN_SHUTDOWN_UNKNOWN.
qemuProcessFakeReboot() sets domain state/reason to
VIR_DOMAIN_RESUMED_UNPAUSED but does not save domain state changes.
Subsequent restart of libvirtd leads to restoring domain state/reason to
saved that is VIR_DOMAIN_SHUTDOWN_UNKNOWN and to automatic shutdown of
the domain. This commit adds virDomainSaveStatus() into
qemuProcessFakeReboot() to avoid unexpected shutdowns.
With previous patch, we accept negative value as length of string to
duplicate. So there is no need to pass strlen(src) in case we want to do
duplicate the whole string.
Function qemuDomainSetBlockIoTune() was checking QEMU capabilities
even when !(flags & VIR_DOMAIN_AFFECT_LIVE) and the domain was
shutoff, resulting in the following problem:
virsh # domstate asdf; blkdeviotune asdf vda --write-bytes-sec 100
shut off
error: Unable to change block I/O throttle
error: unsupported configuration: block I/O throttling not supported with this QEMU binary
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=965016
Since f03dcc5 we use [::] as the listening address both on qemu
command line in -incoming and in nbd-server-start QMP command.
However the latter requires just :: without the braces.
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
Since 0d70656afd, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
Resolves:https://bugzilla.redhat.com/show_bug.cgi?id=927620
#kill -STOP `pidof qemu-kvm`
#virsh destroy $guest --graceful
error: Failed to destroy domain testVM
error: An error occurred, but the cause is unknown
With --graceful, SIGTERM always is emitted to kill driver
process, but it won't success till burning out waiting time
in case of process being stopped.
But domain destroy without --graceful can work, SIGKILL will
be emitted to the stopped process after 10 secs which always
kills a process even one that is currently stopped.
So report an error after burning out waiting time in this case.
Change bbe97ae968 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>