During qemuTranslateDiskSourcePool() execution, if the srcpool has been
defined with authentication information, then for iSCSI pools copy the
authentication and host information to virDomainDiskDef.
Due to a goto statement missed when refactoring in 2771f8b74c
when acquiring of a domain job failed the error path was not taken. This
resulted into a crash afterwards as an extra reference was removed from a
domain object leading to it being freed. An attempt to list the domains
leaded to a crash of the daemon afterwards.
https://bugzilla.redhat.com/show_bug.cgi?id=928672
The translation must be done before both of cgroup and security
setting, otherwise since the disk source is not translated yet,
it might be skipped on cgroup and security setting.
The difference with already supported pool types (dir, fs, block)
is: there are two modes for iscsi pool (or network pools in future),
one can specify it either to use the volume target path (the path
showed up on host) with mode='host', or to use the remote URI qemu
supports (e.g. file=iscsi://example.org:6000/iqn.1992-01.com.example/1)
with mode='direct'.
For 'host' mode, it copies the volume target path into disk->src. For
'direct' mode, the corresponding info in the *one* pool source host def
is copied to disk->hosts[0].
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The alias for hostdevs of type SCSI can be too long for QEMU if
larger LUNs are encountered. Here's a real life example:
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1088634913'/>
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
this results in a too long drive id, resulting in QEMU yelling
Property 'scsi-generic.drive' can't find value 'drive-hostdev-scsi_host0-0-19-1088634913'
This commit changes the alias back to the default hostdev$(index)
scheme.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
In case libvirtd is asked to unplug a device but the device is actually
unplugged later when libvirtd is not running, we need to detect that and
remove such device when libvirtd starts again and reconnects to running
domains.
A future patch wants the DAC security manager to be able to safely
get the supplemental group list for a given uid, but at the time
of a fork rather than during initialization so as to pick up on
live changes to the system's group database. This patch adds the
framework, including the possibility of a pre-fork callback
failing.
For now, any driver that implements a prefork callback must be
robust against the possibility of being part of a security stack
where a later element in the chain fails prefork. This means
that drivers cannot do any action that requires a call to postfork
for proper cleanup (no grabbing a mutex, for example). If this
is too prohibitive in the future, we would have to switch to a
transactioning sequence, where each driver has (up to) 3 callbacks:
PreForkPrepare, PreForkCommit, and PreForkAbort, to either clean
up or commit changes made during prepare.
* src/security/security_driver.h (virSecurityDriverPreFork): New
callback.
* src/security/security_manager.h (virSecurityManagerPreFork):
Change signature.
* src/security/security_manager.c (virSecurityManagerPreFork):
Optionally call into driver, and allow returning failure.
* src/security/security_stack.c (virSecurityDriverStack):
Wrap the handler for the stack driver.
* src/qemu/qemu_process.c (qemuProcessStart): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=974943
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
vmwareMakePath seems to be unused.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
Signed-off-by: Eric Blake <eblake@redhat.com>
A part of the returned monitor response was freed twice and caused
crashes of the daemon when using guest agent cpu count retrieval.
# virsh vcpucount dom --guest
Introduced in v1.0.6-48-gc6afcb0
Implement the new API that will handle setting the balloon driver statistics
collection period in order to enable or disable the collection dynamically.
This patch will add the qemuMonitorJSONGetMemoryStats() to execute a
"guest-stats" on the balloonpath using "get-qom" replacing the former
mechanism which looked through the "query-ballon" returned data for
the fields. The "query-balloon" code only returns 'actual' memory.
Rather than duplicating the existing code, have the JSON API use the
GetBalloonInfo API.
A check in the qemuMonitorGetMemoryStats() will be made to ensure the
balloon driver path has been set. Since the underlying JSON code can
return data not associated with the balloon driver, we don't fail on
a failure to get the balloonpath. Of course since we've made the check,
we can then set the ballooninit flag. Getting the path here is primarily
due to the process reconnect path which doesn't attempt to set the
collection period.
At vm startup and attach attempt to set the balloon driver statistics
collection period based on the value found in the domain xml file. This
is not done at reconnect since it's possible that a collection period
was set on the live guest and making the set period call would reset to
whatever value is stored in the config file.
Setting the stats collection period has a side effect of searching through
the qom-list output for the virtio balloon driver and making sure that it
has the right properties in order to allow setting of a collection period
and eventually fetching of statistics.
The walk through the qom-list is expensive and thus the balloonpath will
be saved in the monitor private structure as well as a flag indicating
that the initialization has already been attempted (in the event that a
path is not found, no sense to keep checking).
This processing model conforms to the qom object model model which
requires setting object properties after device startup. That is, it's
not possible to pass the period along via the startup code as it won't
be recognized.
If users haven't configured guest agent then qemuAgentCommand() will
dereference a NULL 'mon' pointer, which causes crash of libvirtd when
using agent based cpu (un)plug.
With the patch, when the qemu-ga service isn't running in the guest,
a expected error "error: Guest agent is not responding: Guest agent
not available for now" will be raised, and the error "error: argument
unsupported: QEMU guest agent is not configured" is raised when the
guest hasn't configured guest agent.
GDB backtrace:
(gdb) bt
#0 virNetServerFatalSignal (sig=11, siginfo=<value optimized out>, context=<value optimized out>) at rpc/virnetserver.c:326
#1 <signal handler called>
#2 qemuAgentCommand (mon=0x0, cmd=0x7f39300017b0, reply=0x7f394b090910, seconds=-2) at qemu/qemu_agent.c:975
#3 0x00007f39429507f6 in qemuAgentGetVCPUs (mon=0x0, info=0x7f394b0909b8) at qemu/qemu_agent.c:1475
#4 0x00007f39429d9857 in qemuDomainGetVcpusFlags (dom=<value optimized out>, flags=9) at qemu/qemu_driver.c:4849
#5 0x00007f3957dffd8d in virDomainGetVcpusFlags (domain=0x7f39300009c0, flags=8) at libvirt.c:9843
How to reproduce?
# To start a guest without guest agent configuration
# then run the following cmdline
# virsh vcpucount foobar --guest
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=984821
Signed-off-by: Alex Jia <ajia@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
Add a new qemuMonitorJSONSetObjectProperty() method to support invocation
of the 'qom-set' JSON monitor command with a provided path, property, and
expected data type to set.
NOTE: The set API was added only for the purpose of the qemumonitorjsontest
The test code uses the same "/machine/i440fx" property as the get test and
attempts to set the "realized" property to "true" (which it should be set
at anyway).
Add a new qemuMonitorJSONGetObjectProperty() method to support invocation
of the 'qom-get' JSON monitor command with a provided path, property, and
expected data type return. The qemuMonitorJSONObjectProperty is similar to
virTypedParameter; however, a future patch will extend it a bit to include
a void pointer to balloon driver statistic data.
NOTE: The ObjectProperty structures and API are added only for the
purpose of the qemumonitorjsontest
The provided test will execute a qom-get on "/machine/i440fx" which will
return a property "realized".
Add a new qemuMonitorJSONGetObjectListPaths() method to support invocation
of the 'qom-list' JSON monitor command with a provided path.
NOTE: The ListPath structures and API's are added only for the
purpose of the qemumonitorjsontest
The returned list of paired data fields of "name" and "type" that can
be used to peruse QOM configuration data and eventually utilize for the
balloon statistics.
The test does a "{"execute":"qom-list", "arguments": { "path": "/"}}" which
returns "{"return": [{"name": "machine", "type": "child<container>"},
{"name": "type", "type": "string"}]}" resulting in a return of an array
of 2 elements with [0].name="machine", [0].type="child<container>". The [1]
entry appears to be a header that could be used some day via a command such
as "virsh qemuobject --list" to format output.
If an error occurs during qemuDomainAttachNetDevice after the macvtap
was created in qemuPhysIfaceConnect, the macvtap device gets left behind.
This patch adds code to the cleanup routine to delete the macvtap.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
I recently patches the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate. At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
The function being introduced is responsible for creating command
line argument for '-device' for given character device. Based on
the chardev type, it calls appropriate qemuBuild.*ChrDeviceStr(),
e.g. qemuBuildSerialChrDeviceStr() for serial chardev and so on.
The chardev alias assignment is going to be needed in a separate
places, so it should be moved into a separate function rather
than copying code randomly around.
The function being introduced is responsible for preparing and
executing 'chardev-add' qemu monitor command. Moreover, in case
of PTY chardev, the corresponding pty path is updated.
Currently, we are building InetSocketAddress qemu json type
within the qemuMonitorJSONNBDServerStart function. However, other
future functions may profit from the code as well. So it should
be moved into a static function.
Recent changes uncovered a possibility that 'last_processed_hostdev_vf'
was set to -1 in 'qemuPrepareHostdevPCIDevices' and would cause problems
in for loop end condition in the 'resetvfnetconfig' label if the
variable was never set to 'i' due to 'qemuDomainHostdevNetConfigReplace'
failure.
With current code, error reporting for unsupported devices for hot plug,
unplug and update is total mess. The VIR_ERR_CONFIG_UNSUPPORTED error
code is reported instead of VIR_ERR_OPERATION_UNSUPPORTED. Moreover, the
error messages are not helping to find the root cause (lack of
implementation).
For low-memory domains (roughly under 400MB) our automatic memory limit
computation comes up with a limit that's too low. This is because the
0.5 multiplication does not add enough for such small values. Let's
increase the constant part of the computation to fix this.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Whenever virPortAllocatorRelease is called with port == 0, it complains
that the port is not in an allowed range, which is expectable as the
port was never allocated. Let's make virPortAllocatorRelease ignore 0
ports in a similar way free() ignores NULL pointers.
https://bugzilla.redhat.com/show_bug.cgi?id=981139
If a domain is paused before migration starts, we need to tell that to
the destination libvirtd to prevent it from resuming the domain at the
end of migration. This regression was introduced by commit 5379bb0.
Since commit 23e8b5d8, the code is refactored in a way that supports
domains with multiple graphics elements and commit 37b415200 allows
starting such domains. However none of those commits take migration
into account. Even though qemu doesn't support relocation for
anything else than SPICE and for no more than one graphics, there is no
reason to hardcode one graphics into this part of the code as well.
Add monitor callback API domainGuestPanic, that implements
'destroy', 'restart' and 'preserve' events of the 'on_crash'
in the XML when domain crashed.
After abf75aea24 the compiler screams:
qemu/qemu_driver.c: In function 'qemuNodeDeviceDetachFlags':
qemu/qemu_driver.c:10693:9: error: 'domain' may be used uninitialized in this function [-Werror=maybe-uninitialized]
pci = virPCIDeviceNew(domain, bus, slot, function);
^
qemu/qemu_driver.c:10693:9: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
qemu/qemu_driver.c:10693:9: error: 'slot' may be used uninitialized in this function [-Werror=maybe-uninitialized]
qemu/qemu_driver.c:10693:9: error: 'function' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Since the other functions qemuNodeDeviceReAttach and qemuNodeDeviceReset
looks exactly the same, I've initialized the variables there as well.
However, I am still wondering why those functions don't matter to gcc
while the first one does.
Implement check whether (maximum) vCPUs doesn't exceed machine
type's cpu-max settings.
On older versions of QEMU the check is disabled.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
A loop in qemuPrepareHostdevPCIDevices() intended to cycle through all
the objects on the list pcidevs was doing "while (listcount > 0)", but
nothing in the body of the loop was reducing the size of the list - it
was instead removing items from a *different* list. It has now been
safely changed to a for() loop.
(This isn't as bad as it sounds - it's only a problem in case of an
OOM error.)
qemuGetActivePciHostDeviceList() had been creating a list that
contained pointers to objects that were also on the activePciHostdevs
list. In case of an OOM error, this newly created list would be
virObjectUnref'ed, which would cause everything on the list to be
freed. But all of those objects would still be on the
activePciHostdevs list, which could have very bad consequences if that
list was ever again accessed.
The solution used here is to populate the new list with *copies* of
the objects from the original list. It turns out that on return from
qemuGetActivePciHostDeviceList(), the caller would almost immediately
go through all the device objects and "steal" them (i.e. remove the
pointer from the list but not delete it) all from either one list or
the other; we now instead just *delete* (remove from the list and
free) each device from one list or the other, so in the end we have
the same state.
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it.
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.
virPCIDeviceDetach would previously sometimes consume the input device
object (to put it on the inactive list) and sometimes not. Avoiding
memory leaks required checking beforehand to see if the device was
already on the list, and freeing the device object in the caller only
if there wasn't already an identical object on the inactive list.
This patch makes it consistent - virPCIDeviceDetach will *never*
consume the input virPCIDevice object; if it needs to put one on the
inactive list, it will create a copy and put *that* on the list. This
way the caller knows that it is always their responsibility to free
the device object they created.
Previously stubDriver was always set from a string literal, so it was
okay to use a const char * that wasn't freed when the virPCIDevice was
freed. This will not be the case in the near future, so it is now a
char* that is allocated in virPCIDeviceSetStubDriver() and freed
during virPCIDeviceFree().
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
<features>
<hyperv>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
</features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
As a consequence of the cgroup layout changes from commit '632f78ca', the
qemuDomainGetSchedulerParameters[Flags]()' and qemuGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler : posix
cpu_shares : 0
vcpu_period : 0
vcpu_quota : 0
emulator_period: 0
emulator_quota : 0
This patch will restore the capability to return configuration only data
for a non running domain regardless of whether cgroups are available.
This flag is meant for errors happening on the source of the migration
and isn't used on the destination. To allow better migration
compatibility, don't propagate it to the destination.
Paolo Bonzini pointed out that it's actually possible to migrate a qemu
instance that was paused due to I/O error and it will be able to work on
the destination if the storage is accessible.
This patch introduces flag VIR_MIGRATE_ABORT_ON_ERROR that cancels the
migration in case an I/O error happens while it's being performed and
allows migration without this flag. This flag can be possibly used for
other error reasons that may be introduced in the future.
Currently, we wait for SPICE to migrate in the very same loop where we
wait for qemu to migrate. This has a disadvantage of slowing seamless
migration down. One one hand, we should not kill the domain until all
SPICE data has been migrated. On the other hand, there is no need to
wait in the very same loop and hence slowing down 'cont' on the
destination. For instance, if users are watching a movie, they can
experience the movie to be stopped for a couple of seconds, as
processors are not running nor on src nor on dst as libvirt waits for
SPICE to migrate. We should move the waiting phase to migration CONFIRM
phase.
When qemu >= 1.20, it is safe to use -device for primary video
device as described in 4c993d8ab.
So, we are missing the cap flag in QMP capabilities detection, this
flag can be initialized safely in virQEMUCapsInitQMPBasic.
Convert input XML to migratable before using it in
qemuDomainSaveImageOpen.
XML in the save image is migratable, i.e. doesn't contain implicit
controllers. If these controllers were in a non-default order in the
input XML, the ABI check would fail. Removing and re-adding these
controllers fixes it.
https://bugzilla.redhat.com/show_bug.cgi?id=834196
During a live migration the guest may receive a disk access I/O error.
In this state the guest is unable to continue running on a remote host
after migration as some state may be present in the kernel and not
migrated.
With this patch, the migration is canceled in such case so it can either
continue on the source if the I/O issues are recovered or has to be
destroyed anyways.
https://bugzilla.redhat.com/show_bug.cgi?id=971485
As of d7f9d82753 we copy the listen
address from the qemu.conf config file in case none has been provided
via XML. But later, when migrating, we should not include such listen
address in the migratable XML as it is something autogenerated, not
requested by user. Moreover, the binding to the listen address will
likely fail, unless the address is '0.0.0.0' or its IPv6 equivalent.
This patch introduces a new boolean attribute to virDomainGraphicsListenDef
to distinguish autofilled listen addresses. However, we must keep the
attribute over libvirtd restarts, so it must be kept within status XML.
This patch fixes changes done in commit 29c1e913e4
that was pushed without implementing review feedback.
The flag introduced by the patch is changed to VIR_DOMAIN_VCPU_GUEST and
documentation makes the difference between regular hotplug and this new
functionality more explicit.
The virsh options that enable the use of the new flag are changed to
"--guest" and the documentation is fixed too.
Currently, there's a path to use the ncpuinfo variable uninitialized,
which leads to a compiler warning:
qemu/qemu_driver.c: In function 'qemuDomainGetVcpusFlags':
qemu/qemu_driver.c:4573:9: error: 'ncpuinfo' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
for (i = 0; i < ncpuinfo; i++) {
^
This patch implements support for the "cpu-add" QMP command that plugs
CPUs into a live guest. The "cpu-add" command was introduced in QEMU
1.5. For the hotplug to work machine type "pc-i440fx-1.5" is required.
The qemu guest agent allows to online and offline CPUs from the
perspective of the guest. This patch adds helpers that call
'guest-get-vcpus' and 'guest-set-vcpus' guest agent functions and
convert the data for internal libvirt usage.
https://bugzilla.redhat.com/show_bug.cgi?id=964177
Though both libvirt and QEMU's document say RTC_CHANGE returns
the offset from the host UTC, qemu actually returns the offset
from the specified date instead when specific date is provided
(-rtc base=$date).
It's not safe for qemu to fix it in code, it worked like that
for 3 years, changing it now may break other QEMU use cases.
What qemu tries to do is to fix the document:
http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg04782.html
And in libvirt side, instead of replying on the value from qemu,
this converts the offset returned from qemu to the offset from
host UTC, by:
/*
* a: the offset from qemu RTC_CHANGE event
* b: The specified date (-rtc base=$date)
* c: the host date when libvirt gets the RTC_CHANGE event
* offset: What libvirt will report
*/
offset = a + (b - c);
The specified date (-rtc base=$date) is recorded in clock's def as
an internal only member (may be useful to exposed outside?).
Internal only XML tag "basetime" is introduced to not lose the
guest's basetime after libvirt restarting/reloading:
<clock offset='variable' adjustment='304' basis='utc' basetime='1370423588'/>
Currently, a listen address for a SPICE server can be specified. Later,
when the domain is migrated, we need to relocate the graphics which
involves telling new destination to the SPICE server. However, we can't
just assume the listen address is the new location, because the listen
address can be ANYCAST (0.0.0.0 for IPv4, :: for IPv6). In which case,
we want to pass the remote hostname. But there are some troubles with
ANYCAST. In both IPv4 and IPv6 it has many ways for specifying such
address. For instance, in IPv4: 0, 0.0, 0.0.0, 0.0.0.0. The number of
variations gets bigger in IPv6 world. Hence, in order to check for
ANYCAST address sanely, we should take the provided listen address,
parse it and format back in it's full form. Which is exactly what this
patch does.
The work was done at the time of snapshot xmlstring parsing
if (offline && def->memory &&
def->memory != VIR_DOMAIN_SNAPSHOT_LOCATION_NONE) {
virReportError(...);
}
This should resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=959191
The problem was that qemuUpdateActivePciHostdevs was returning 0
(success) when no hostdevs were present, but would otherwise return -1
(failure) even when it completed successfully. It is only called from
qemuProcessReconnect(), and when qemuProcessReconnect got back an
error, it would not only stop reconnecting, but would terminate the
guest qemu process "to remove danger of it ending up running twice if
user tries to start it again later".
(This bug was introduced in commit 011cf7ad, which was pushed between
v1.0.2 and v1.0.3, so all maintenance branches from v1.0.3 up to 1.0.5
will need this one line patch applied.)
A literal IPv6 must be escaped, otherwise migration fails with:
unable to execute QEMU command 'drive-mirror': address resolution failed
for f0::0d:5901: Servname not supported for ai_socktype
since QEMU treats everything after the first ':' as the port.
If snapshot creation failed for example due to invalid use of the
"REUSE_EXTERNAL" flag, libvirt killed access to the original image file
instead of the new image file. On machines with selinux this kills the
whole VM as the selinux context is enforced immediately.
* qemu_driver.c:qemuDomainSnapshotUndoSingleDiskActive():
- Kill access to the new image file instead of the old one.
Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=906639
A bug in Cygwin [1] and poor error messages from gcc [2] lead
to this confusing compilation error:
qemu/qemu_monitor.c:418:9: error: passing argument 2 of 'sendmsg' from incmpatible pointer type
/usr/include/sys/socket.h:42:11: note: expected 'const struct msghdr *' but argument is of type 'struct msghdr *'
[1] http://cygwin.com/ml/cygwin/2013-05/msg00451.html
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57475
* src/qemu/qemu_monitor.c (includes): Include <sys/socket.h>
before <sys/un.h>.
Signed-off-by: Eric Blake <eblake@redhat.com>
This is a recurring problem for cygwin :)
For example, see commit 23a4df88.
qemu/qemu_driver.c: In function 'qemuStateInitialize':
qemu/qemu_driver.c:691:13: error: format '%d' expects type 'int', but argument 8 has type 'uid_t' [-Wformat]
* src/qemu/qemu_driver.c (qemuStateInitialize): Add casts.
* daemon/remote.c (remoteDispatchAuthList): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
A cygwin build of the qemu driver fails with:
qemu/qemu_process.c: In function 'qemuPrepareCpumap':
qemu/qemu_process.c:1803:31: error: 'CPU_SETSIZE' undeclared (first use in this function)
CPU_SETSIZE is a Linux extension in <sched.h>; a bit more portable
is using sysconf if _SC_NPROCESSORS_CONF is defined (several platforms
have it, including Cygwin). Ultimately, I would have preferred to
use gnulib's 'nproc' module, but it is currently under an incompatible
license.
* src/qemu/qemu_conf.h (QEMUD_CPUMASK_LEN): Provide definition on
cygwin.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, if there's an error opening /dev/vhost-net (e.g. because
it doesn't exist) but it's not required we proceed with vhostfd array
filled with -1 and vhostfdSize unchanged. Later, when constructing
the qemu command line only non-negative items within vhostfd array
are taken into account. This means, vhostfdSize may be greater than
the actual count of non-negative items in vhostfd array. This results
in improper command line arguments being generated, e.g.:
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=(null)
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
cc: mprivozn@redhat.com
Currently qemuDomainReboot() does reboot in two phases:
qemuMonitorSystemPowerdown() and qemuProcessFakeReboot().
qemuMonitorSystemPowerdown() shutdowns the domain and saves domain
state/reason as VIR_DOMAIN_SHUTDOWN_UNKNOWN.
qemuProcessFakeReboot() sets domain state/reason to
VIR_DOMAIN_RESUMED_UNPAUSED but does not save domain state changes.
Subsequent restart of libvirtd leads to restoring domain state/reason to
saved that is VIR_DOMAIN_SHUTDOWN_UNKNOWN and to automatic shutdown of
the domain. This commit adds virDomainSaveStatus() into
qemuProcessFakeReboot() to avoid unexpected shutdowns.
With previous patch, we accept negative value as length of string to
duplicate. So there is no need to pass strlen(src) in case we want to do
duplicate the whole string.
Function qemuDomainSetBlockIoTune() was checking QEMU capabilities
even when !(flags & VIR_DOMAIN_AFFECT_LIVE) and the domain was
shutoff, resulting in the following problem:
virsh # domstate asdf; blkdeviotune asdf vda --write-bytes-sec 100
shut off
error: Unable to change block I/O throttle
error: unsupported configuration: block I/O throttling not supported with this QEMU binary
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=965016
Since f03dcc5 we use [::] as the listening address both on qemu
command line in -incoming and in nbd-server-start QMP command.
However the latter requires just :: without the braces.
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
Since 0d70656afd, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
Resolves:https://bugzilla.redhat.com/show_bug.cgi?id=927620
#kill -STOP `pidof qemu-kvm`
#virsh destroy $guest --graceful
error: Failed to destroy domain testVM
error: An error occurred, but the cause is unknown
With --graceful, SIGTERM always is emitted to kill driver
process, but it won't success till burning out waiting time
in case of process being stopped.
But domain destroy without --graceful can work, SIGKILL will
be emitted to the stopped process after 10 secs which always
kills a process even one that is currently stopped.
So report an error after burning out waiting time in this case.
Change bbe97ae968 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 632f78c introduced a regression which causes schedinfo being
unable to set some parameters. When migrating to priv->cgroup there
was missing variable left out and due to passed NULL to underlying
function, the setting failed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=963592
This adds the shared device entry when starting domain (more
exactly, when preparing host devices), and remove the entry
when destroying domain (when reattaching host devices).
This changes the helpers qemu{Add,Remove}SharedDisk into
qemu{Add,Remove}SharedDevice, as most of the code in the helpers
can be reused for scsi host device.
To track the shared scsi host device, first it finds out the
device path (e.g. /dev/s[dr]*) which is mapped to the sg device,
and use device ID of the found device path (/dev/s[dr]*) as the
hash key. This is because of the device ID is not unique between
between /dev/s[dr]* and /dev/sg*, e.g.
% sg_map
/dev/sg0 /dev/sda
/dev/sg1 /dev/sr0
% ls -l /dev/sda
brw-rw----. 1 root disk 8, 0 May 2 19:26 /dev/sda
%ls -l /dev/sg0
crw-rw----. 1 root disk 21, 0 May 2 19:26 /dev/sg0
"Shared disk" is not only the thing we should care about after "scsi
hostdev" is introduced. A same scsi device can be used as "disk" for
one domain, and as "scsi hostdev" for another domain at the same time.
That's why this patch renames qemu_driver->sharedDisks. Related functions
and structs are also renamed.
Commit 7f15ebc7a2 introduced a bug
happening when guests without a <graphics> element are migrated.
The initialization of listenAddress happens unconditionally
from the cookie even if the cookie->graphics pointer was NULL.
Moved the initialization to where it is safe.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
During building of the qemu command line determine whether to add/use the
'-no-reboot' option only if each of the 'on' events want to to destroy
the domain; otherwise, use the '-no-shutdown' option.
Prior to this change both could be on the command line, which while allowed
could be construed as a conflict.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
* src/qemu/qemu_capabilities.h: New capability bit.
* src/qemu/qemu_capabilities.c (virQEMUCapsProbeQMPCommandLine): New
function, based on qemuMonitorGetCommandLineOptionParameters, which was
introduced by commit bd56d0d813; use it to set new capability bit.
(virQEMUCapsInitQMP): Use new function.
The QEMU command line syntax for RBD disks is
file=rbd:pool/image:opt1=val1:opt2=val2...
There is no way to escape the ':' if it appears in the
pool or image name. Thus it must be explicitly forbidden
if it occurs in the libvirt XML. People are known to
be abusing the lack of escaping in current libvirt to
pass arbitrary args to QEMU.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit bd56d0d8 could lead to freeing an uninitialized pointer:
qemu/qemu_monitor_json.c: In function 'qemuMonitorJSONGetCommandLineOptionParameters':
qemu/qemu_monitor_json.c:4284: warning: 'cmd' may be used uninitialized in this function
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetCommandLineOptionParameters): Initialize variable.
Signed-off-by: Eric Blake <eblake@redhat.com>
Ever since the conversion to using only QMP for probing features
of qemu 1.2 and newer, we have been unable to detect features
that are added only by additional command line options. For
example, we'd like to know if '-machine mem-merge=on' (added
in qemu 1.5) is present. To do this, we will take advantage
of qemu 1.5's query-command-line-parameters QMP call [1].
This patch wires up the framework for probing the command results;
if the QMP command is missing, or if a particular command line
option does not output any parameters (for example, -net uses
a polymorphic parser, which showed up as no parameters as of qemu
1.5), we silently treat that command as having no results.
[1] https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg05180.html
* src/qemu/qemu_monitor.h (qemuMonitorGetOptions)
(qemuMonitorSetOptions)
(qemuMonitorGetCommandLineOptionParameters): New functions.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetCommandLineOptionParameters): Likewise.
* src/qemu/qemu_monitor.c (_qemuMonitor): Add cache field.
(qemuMonitorDispose): Clean it.
(qemuMonitorGetCommandLineOptionParameters): Implement new function.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetCommandLineOptionParameters): Likewise.
(testQemuMonitorJSONGetCommandLineParameters): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
No need to open code a string list cleanup, if we are nice
to the caller by guaranteeing a NULL-terminated result.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetCPUDefinitions)
(qemuMonitorJSONGetCommands, qemuMonitorJSONGetEvents)
(qemuMonitorJSONGetObjectTypes, qemuMonitorJSONGetObjectProps):
Use simpler cleanup.
Signed-off-by: Eric Blake <eblake@redhat.com>
This adds both attachment and detachment support for scsi host
device.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat>
Found that I was unable to start existing domains after updating
to a kernel with no cgroups support
# zgrep CGROUP /proc/config.gz
# CONFIG_CGROUPS is not set
# virsh start test
error: Failed to start domain test
error: Unable to initialize /machine cgroup: Cannot allocate memory
virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
attempting to open /proc/cgroups on such a system, but it was being
dropped in virCgroupSetPartitionSuffix().
Change virCgroupSetPartitionSuffix() to propagate errors returned by
its callees. Also check for ENOENT in qemuInitCgroup() when determining
if cgroups support is available.
It's better to put the usb related codes into qemuDomainAttachHostUsbDevice
instead of qemuDomainAttachHostDevice.
And in the old qemuDomainAttachHostDevice, just stealing the "usb" from
driver->activeUsbHostdevs leaks the memory.
Although virtio-scsi supports SCSI PR (Persistent Reservations),
the device on host may do not support it. To avoid losing data,
Just like PCI and USB pass through devices, only one live guest
is allowed per SCSI host pass through device."
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
The <filesystem> element can now accept a <driver type='nbd'/>
as an alternative to 'loop'. The benefit of NBD is support
for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the <driver> element in filesystem devices to
allow a storage format to be set. The new attribute
uses 'format' to reflect the storage format. This is
different from the <driver> element in disk devices
which use 'type' to reflect the storage format. This
is because the 'type' attribute on filesystem devices
is already used for the driver backend, for which the
disk devices use the 'name' attribute. Arggggh.
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds the scsi-generic device into the device controller's
whitelist, so that it's allowed to used by the qemu process.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
static const struct SCSIBusInfo megasas_scsi_info = {
.tcq = true,
.max_target = MFI_MAX_LD,
.max_lun = 255,
.transfer_data = megasas_xfer_complete,
.get_sg_list = megasas_get_sg_list,
.complete = megasas_command_complete,
.cancel = megasas_command_cancel,
};
Example of the qemu command line (lsilogic controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,scsi-id=8,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Example of the qemu command line (virtio-scsi controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,channel=0,scsi-id=128,lun=128,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
Adding two cap flags for scsi-generic:
QEMU_CAPS_SCSI_GENERIC
QEMU_CAPS_SCSI_GENERIC_BOOTINDEX
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
It is possible to build a kernel without swap cgroup controls
present. This causes a fatal error when querying memory
parameters. Treat missing swap controls as meaning "unlimited".
The fatal error remains if the user tries to actually change
the limit.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=851411https://bugzilla.redhat.com/show_bug.cgi?id=955500
The first problem was that virFileOpenAs was returning fd (-1) in one
of the error cases rather than ret (-errno), so the caller thought
that the error was EPERM rather than ENOENT.
The second problem was that some log messages in the general purpose
qemuOpenFile() function would always say "Failed to create" even if
the caller hadn't included O_CREAT (i.e. they were trying to open an
existing file).
This fixes virFileOpenAs to jump down to the error return (which
returns ret instead of fd) in the previously mentioned incorrect
failure case of virFileOpenAs(), removes all error logging from
virFileOpenAs() (since the callers report it), and modifies
qemuOpenFile to appropriately use "open" or "create" in its log
messages.
NB: I seriously considered removing logging from all callers of
virFileOpenAs(), but there is at least one case where the caller
doesn't want virFileOpenAs() to log any errors, because it's just
going to try again (qemuOpenFile()). We can't simply make a silent
variation of virFileOpenAs() though, because qemuOpenFile() can't make
the decision about whether or not it wants to retry until after
virFileOpenAs() has already returned an error code.
Likewise, I also considered changing virFileOpenAs() to return -1 with
errno set on return, and may still do that, but only as a separate
patch, as it obscures the intent of this patch too much.