Add a new parameter that will allow to return the XML stored in the save
image for further manipulation and adjust the callers. This option will
be used in later patches.
There is no need to acquire the driver-wide lock in
libxlDomainDefineXML. When switching to jobs in the libxl
driver, most driver-wide locks were removed. The locking here
was preserved since I mistakenly thought virDomainObjListAdd
needed protection. This is not the case, so remove the
unnecessary locking.
Commit id '9a2f36ec' added a build conditional of CAP_SYS_RAWIO
in order to determine whether or not a disk definition using rawio
should be allowed on platforms without CAP_SYS_RAWIO. If one was
found, virReportError was used but the code didn't goto cleanup.
This patch adds the goto.
We are not detecting the presence of FIPS from QEMU, but from procfs and
that means it's not QEMU capability. It was decided that we will pass
this flag to QEMU even if it's not supported by old QEMU binaries.
This patch also reverts changes done by commit a21cfb0f to
qemucapabilitestest and implements a new test case in qemuxml2argvtest.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1135431
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1141879
A long time ago I've implemented support for so called multiqueue
net. The idea was to let guest network traffic be processed by
multiple host CPUs and thus increasing performance. However, this
behavior is enabled by QEMU via special ioctl() iterated over the
all tap FDs passed in by libvirt. Unfortunately, SELinux comes in
and disallows the ioctl() call because the /dev/net/tun has label
system_u:object_r:tun_tap_device_t:s0 and 'attach_queue' ioctl()
is not allowed on tun_tap_device_t type. So after discussion with
a SELinux developer we've decided that the FDs passed to the QEMU
should be labelled with svirt_t type and SELinux policy will
allow the ioctl(). Therefore I've made a patch
(cf976d9dcf) that does exactly this. The patch
was fixed then by a443193139 and
b635b7a1af. However, things are not
that easy - even though the API to label FD is called
(fsetfilecon_raw) the underlying file is labelled too! So
effectively we are mangling /dev/net/tun label. Yes, that broke
dozen of other application from openvpn, or boxes, to qemu
running other domains.
The best solution would be if SELinux provides a way to label an
FD only, which could be then labeled when passed to the qemu.
However that's a long path to go and we should fix this
regression AQAP. So I went to talk to the SELinux developer again
and we agreed on temporary solution that:
1) All the three patches are reverted
2) SELinux temporarily allows 'attach_queue' on the
tun_tap_device_t
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
- Provide an implementation for buildPool and deletePool operations
for the ZFS storage backend.
- Add VIR_STORAGE_POOL_SOURCE_DEVICE flag to ZFS pool poolOptions
as now we can specify devices to build pool from
- storagepool.rng: add an optional 'sourceinfodev' to 'sourcezfs' and
add an optional 'target' to 'poolzfs' entity
- Add a couple of tests to storagepoolxml2xmltest
If the qemu being used doesn't support JSON, then querying for IOThread
data would fail. In that case, ensure the *iothreads is NULL and return 0
as the count of iothreads available.
Currently, build with clang fails with:
CC qemu/libvirt_driver_qemu_impl_la-qemu_command.lo
qemu/qemu_command.c:6580:58: error: implicit conversion from enumeration type
'virMemAccess' to different enumeration type 'virTristateSwitch'
[-Werror,-Wenum-conversion]
virTristateSwitch memAccess = def->cpu->cells[i].memAccess;
~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
1 error generated.
Fix that by using virMemAccess instead of virTristateSwitch.
Commit f05b6a91 added virQEMUDriverConfigPtr argument to the
virQEMUCapsFillDomainCaps function and it uses forward declaration
of virQEMUDriverConfig and virQEMUDriverConfigPtr that casues clang
build to fail:
gmake[3]: Entering directory `/usr/home/novel/code/libvirt/src'
CC qemu/libvirt_driver_qemu_impl_la-qemu_capabilities.lo
In file included from qemu/qemu_capabilities.c:43:
In file included from qemu/qemu_hostdev.h:27:
qemu/qemu_conf.h:63:37: error: redefinition of typedef 'virQEMUDriverConfig'
is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _virQEMUDriverConfig virQEMUDriverConfig;
^
qemu/qemu_capabilities.h:328:37: note: previous definition is here
typedef struct _virQEMUDriverConfig virQEMUDriverConfig;
^
Fix that by passing loader and nloader config attributes directly
instead of passing complete config.
Commit f36a94f introduced a double free on all success paths
in qemuSharedDeviceEntryInsert.
Only call qemuSharedDeviceEntryFree on the error path and
set entry to NULL before jumping there if the entry already
is in the hash table.
https://bugzilla.redhat.com/show_bug.cgi?id=1142722
Clean up all _virDomainMemoryStat.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Clean up all _virDomainBlockStats.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Clean up all _virDomainInterfaceStats.
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Live definition was used to look up the disk index while persistent one
was indexed leading to a crash in qemuDomainGetBlockIoTune. Use the
correct def and report a nice error.
Unfortunately it's accessible via read-only connection, though it can
only crash libvirtd in the cases where the guest is hot-plugging disks
without reflecting those changes to the persistent definition. So
avoiding hotplug, or doing hotplug where persistent is always modified
alongside live definition, will avoid the out-of-bounds access.
Introduced in: eca96694a7f992be633d48d5ca03cedc9bbc3c9aa (v0.9.8)
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1140724
Reported-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1135396
There are two ways how to tell qemu to use huge pages. The first one
is suitable for domains with NUMA nodes: the path to hugetlbfs mount
is appended to NUMA node definition on the command line. The second
one is suitable for UMA domains: here there's this global '-mem-path'
argument that accepts path to the hugetlbfs mount point. However, the
latter case was not used for all the cases that it should be. For
instance:
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
didn't trigger the '-mem-path' so the huge pages - despite being
configured - were not used at all.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
As of 136ad4974 it is possible to specify different huge pages per
guest NUMA node. However, there's no check if nodeset specified in
./hugepages/page contains only those guest NUMA nodes that exist.
In other words with current code it is possible to define meaningless
combination:
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0,2-3'/>
<page size='2048' unit='KiB' nodeset='1,4'/>
</hugepages>
</memoryBacking>
<vcpu placement='static'>4</vcpu>
<cpu>
<numa>
<cell id='0' cpus='0' memory='1048576'/>
<cell id='1' cpus='1' memory='1048576'/>
<cell id='2' cpus='2' memory='1048576'/>
<cell id='3' cpus='3' memory='1048576'/>
</numa>
</cpu>
Notice the node 4 in <hugepages/>?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch implements the VIR_DOMAIN_STATS_BLOCK group of statistics.
To do so, a helper function to get the block stats of all the disks of
a domain is added.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This patch implements the VIR_DOMAIN_STATS_INTERFACE group of
statistics.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This patch implements the VIR_DOMAIN_STATS_VCPU group of statistics. To
do so, this patch also extracts a helper to gather the vCPU information.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This patch implements the VIR_DOMAIN_STATS_CPU_TOTAL group of
statistics.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Future patches which will implement more bulk stats groups for QEMU will
need to access the connection object.
To accommodate that, a few changes are needed:
* enrich internal prototype to pass qemu driver object
* add per-group flag to mark if one collector needs monitor access or not
* If at least one collector of the requested stats needs monitor access
we must start a query job for each domain. The specific collectors
will run nested monitor jobs inside that.
* If the job can't be acquired we pass flags to the collector so
specific collectors that need monitor access can be skipped in order
to gather as much data as is possible.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Check to see if the UEFI binary mentioned in qemu.conf actually
exists, and if so expose it in domcapabilities like
<loader ...>
<value>/path/to/ovmf</value>
</loader>
We introduce some generic domcaps infrastructure for handling
a dynamic list of string values, it may be of use for future bits.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Up till now the virQEMUCapsFillDomainCaps() was type of void as
there was no way for it to fail. This is, however, going to
change in the next commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemu for IBM Power processor architecture is adding functionality for
supporting multiple 'pseries' machine type versions, each with different
capabilities. This patch is for supporting the same
Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
As of 542899168c we learned libvirt to use UEFI for domains.
However, management applications may firstly query if libvirt
supports it. And this is where virConnectGetDomainCapabilities()
API comes handy.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virStorageSourceInitChainElement initializes a new storage chain element
for use as a new disk source. If the new element doesn't contain the
driver name, copy it from the old source.
This fixes issue where a disk would forget the driver after a snapshot.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1140984
Commit b606bbb4 broke reporting of errors when setting of guest time
fails via the guest agent as the return value is not checked and later
overwritten by the return value qemuMonitorRTCResetReinjection();
Fix this by checking the return value before resetting the RTC
reinjection.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1142294
For tuning the network, alternative devices
for creating tap and vhost devices can be specified via:
<backend tap='/dev/net/tun' vhost='/dev/net-vhost'/>
Instead of checking upfront if the <driver> element will be needed
in a big condition, just format all the attributes into a string
and output the <driver> element if the string is not empty.
We already are checking for negative value, reporting an error, but
using wrong function and the check only succeeds when a value that
cannot be converted to number successfully is encountered. This patch
provides just a minor change in call of the right version
of function virStrToLong.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1138539
The first one occurs in openvzDomainMigratePrepare3Params() where in
case no remote uri is given, the distant hostname is used. The name is
obtained via virGetHostname() which require callers to free the
returned value.
The second leak lies in openvzDomainMigratePerform3Params(). There's a
virCommand used later. However, at the beginning of the function
virCheckFlags() is called which returns. So the command created was
leaked.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, the setns() wrapper is supported only for x86_64 and i686
which leaves us failing to build on other platforms like arm, aarch64
and so on. This means, that the wrapper needs to be extended to those
platforms and make to fail on runtime not compile time.
The syscall numbers for other platforms was fetched using this
command:
kernel.git $ git grep "define.*__NR_setns" | grep -e arm -e powerpc -e s390
arch/arm/include/uapi/asm/unistd.h:#define __NR_setns (__NR_SYSCALL_BASE+375)
arch/arm64/include/asm/unistd32.h:#define __NR_setns 375
arch/powerpc/include/uapi/asm/unistd.h:#define __NR_setns 350
arch/s390/include/uapi/asm/unistd.h:#define __NR_setns 339
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The backing store string location offset 0 determines that the file
isn't present. The string size shouldn't be then checked:
from qemu.git/docs/specs/qcow2.txt
== Header ==
The first cluster of a qcow2 image contains the file header:
Byte 0 - 3: magic
QCOW magic string ("QFI\xfb")
4 - 7: version
Version number (valid values are 2 and 3)
8 - 15: backing_file_offset
Offset into the image file at which the backing file name
is stored (NB: The string is not null terminated). 0 if the
image doesn't have a backing file.
16 - 19: backing_file_size
Length of the backing file name in bytes. Must not be
longer than 1023 bytes. Undefined if the image doesn't have
a backing file. ^^^^^^^^^
This patch intentionally leaves the backing file string size check in
place in case a malformatted file would be presented to libvirt. Also
according to the docs the string size is maximum 1023 bytes, thus this
patch adds a check to verify that.
I was also able to verify that the check was done the same way in the
legacy qcow fromat (in qemu's code).
If there are no iothreads, then return from qemuProcessDetectIOThreadPIDs
without error; otherwise, the following occurs:
error: Failed to start domain $dom
error: An error occurred, but the cause is unknown
https://bugzilla.redhat.com/show_bug.cgi?id=1101574
Add an option 'iothreadpin' to the <cpuset> to allow for setting the
CPU affinity for each IOThread.
The iothreadspin will mimic the vcpupin with respect to being able to
assign each iothread to a specific CPU, although iothreads ids start
at 1 while vcpu ids start at 0. This matches the iothread naming scheme.
Modify qemuProcessStart() in order to allowing setting affinity to
specific CPU's for IOThreads. The process followed is similar to
that for the vCPU's.
This involves adding a function to fetch the IOThread id's via
qemuMonitorGetIOThreads() and adding them to iothreadpids[] list.
Then making sure all the cgroup data has been properly set up and
finally assigning affinity.
In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin
and qemuSetupCgroupForIOThreads to mimic the existing Vcpu API's.
These will support having an 'iotrhreadpin' element in the 'cpuset' in
order to pin named IOThreads to specific CPU's. The IOThread pin names
will follow the IOThread naming scheme starting at 1 (eg "iothread1")
up through an including the def->iothreads value.
When spanning tree protocol is allowed in bridge settings, forward delay
value is set as well (default is 0 if omitted). Until now, there was no
check for delay value validity. Delay makes sense only as a positive
numerical value.
Note: However, even if you provide positive numerical value, brctl
utility only uses values from range <2,30>, so the number provided can
be modified (kernel most likely) to fall within this range.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1125764
Coverity complains about the calculation of the buf & len within
the PROBE macro. So to quiet things down, do the calculation prior
to usage in either write() or qemuMonitorIOWriteWithFD() calls and
then have the PROBE use the calculated values - which works.
Seems when commit id 'ea130e3b' added the checks to ensure each of
the hard_limit, soft_limit, and swap_hard_limit wasn't set at
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED - a copy/paste error of using
the 'hard_limit' for each comparison was done. Adjust the code.
Coverity complains that because of how 'offset' is initialized to
0 (zero), the resulting math and comparison on rem is pointless.
According to the origin commit id '3ec128989', the code is a
replacement for gmtime(), but without the localtime() or GMT
calculations - so just remove this code and add a comment
indicating the removal
Since 98b9acf5aa
This was a false positive where Coverity was complaining that the
remoteDeserializeTypedParameters() could allocate 'params', but
none of the callers could return the allocated memory back to their
caller since on input the param was passed by value. Additionally,
the flow of the code was that if params was NULL on entry, then each
function would return 'nparams' as the number of params entries the
caller would need to allocate in order to call the function again
with 'nparams' and 'params' being set. By the time the deserialize
routine was called params would have something. For other callers
where the 'params' was passed by reference as NULL since it's expected
that the deserialize allocates the memory and then have that passed
back to the original caller to dispose there was no Coverity issue.
As it turns out Coverity didn't quite seem to understand the
relationship between 'nparams' and 'params'; however, if the
!userAllocated path of the deserialize code compared against
limit in any manner, then the Coverity error went away which
was quite strange, but useful.
As it turns out one code path remoteDomainGetJobStats had a
comparison against 'limit' while another remoteConnectGetAllDomainStats
did not assuming that limit would be checked. So I refactored the
code a bit to cause the limit check to occur in deserialize for
both conditions and then only made the check of current returned
size against the incoming *nparams fail the non allocation case.
This means the job code doesn't need to check the limit any more,
while the stats code now does check the limit.
Additionally, to help perhaps decipher which of the various
callers to the deserialize code caused the failure - I used
a #define to pass the __FUNCNAME__ of the caller along so that
error messages could have something like:
error: remoteConnectGetAllDomainStats: too many parameters '2' for nparams '0'
error: Reconnected to the hypervisor
(it's a contrived error just to show the funcname in the error)
The manufacurer and product from USB device itself are usually not particularly
useful -- they tend to be missing, or ugly (all-uppercase, padded with spaces,
etc.). Prefer what's in the usb id database and fall back to descriptors only
if the device is too new to be in database.
https://bugzilla.redhat.com/show_bug.cgi?id=1138887
This patch adds initial migration support to the OpenVZ driver,
using the VIR_DRV_FEATURE_MIGRATION_PARAMS family of migration
functions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We stupidly modeled block job bandwidth after migration
bandwidth, which in turn was an 'unsigned long' and therefore
subject to 32-bit vs. 64-bit interpretations. To work around
the fact that 10-gigabit interfaces are possible but don't fit
within 32 bits, the original interface took the number scaled
as MiB/sec. But this scaling is rather coarse, and it might
be nice to tune bandwidth finer than in megabyte chunks.
Several of the block job calls that can set speed are fed
through a common interface, so it was easier to adjust them all
at once. Note that there is intentionally no flag for the new
virDomainBlockCopy; there, since the API already uses a 64-bit
type always, instead of a possible 32-bit type, and is brand
new, it was easier to just avoid scaling issues. As with the
previous patch that adjusted the query side (commit db33cc24),
omitting the new flag preserves old behavior, and the
documentation now mentions limits of what happens when a 32-bit
machine is on either client or server side.
* include/libvirt/libvirt.h.in (virDomainBlockJobSetSpeedFlags)
(virDomainBlockPullFlags)
(VIR_DOMAIN_BLOCK_REBASE_BANDWIDTH_BYTES)
(VIR_DOMAIN_BLOCK_COMMIT_BANDWIDTH_BYTES): New enums.
* src/libvirt.c (virDomainBlockJobSetSpeed, virDomainBlockPull)
(virDomainBlockRebase, virDomainBlockCommit): Document them.
* src/qemu/qemu_driver.c (qemuDomainBlockJobSetSpeed)
(qemuDomainBlockPull, qemuDomainBlockRebase)
(qemuDomainBlockCommit, qemuDomainBlockJobImpl): Support new flag.
Signed-off-by: Eric Blake <eblake@redhat.com>
Upstream qemu 1.4 added some drive-mirror tunables not present
when it was first introduced in 1.3. Management apps may want
to set these in some cases (for example, without tuning
granularity down to sector size, a copy may end up occupying
more bytes than the original because an entire cluster is
copied even when only a sector within the cluster is dirty,
although tuning it down results in more CPU time to do the
copy). I haven't personally needed to use the parameters, but
since they exist, and since the new API supports virTypedParams,
we might as well expose them.
Since the tuning parameters aren't often used, and omitted from
the QMP command when unspecified, I think it is safe to rely on
qemu 1.3 to issue an error about them being unsupported, rather
than trying to create a new capability bit in libvirt.
Meanwhile, all versions of qemu from 1.4 to 2.1 have a bug where
a bad granularity (such as non-power-of-2) gives a poor message:
error: internal error: unable to execute QEMU command 'drive-mirror': Invalid parameter 'drive-virtio-disk0'
because of abuse of QERR_INVALID_PARAMETER (which is supposed to
name the parameter that was given a bad value, rather than the
value passed to some other parameter). I don't see that a
capability check will help, so we'll just live with it (and it
has since been improved in upstream qemu).
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Add
parameters.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror):
Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockCopyCommon): Likewise.
(qemuDomainBlockRebase, qemuDomainBlockCopy): Adjust callers.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror): Likewise.
* tests/qemumonitorjsontest.c (qemuMonitorJSONDriveMirror): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The hard part of managing the disk copy is already coded; all
this had to do was convert the XML and virTypedParameters into
the internal representation.
With this patch, all blockcopy operations that used the old
API should also work via the new API. Additional extensions,
such as supporting the granularity tunable or a network rather
than file destination, will be added as later patches.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order to implement the new virDomainBlockCopy, the existing
block copy internal implementation needs to be adjusted. The
new function will parse XML into a storage source, and parse
typed parameters into integers, then call into the same common
backend. For now, it's easier to keep the same implementation
limits that only local file destinations are suported, but now
the check needs to be explicit. Similar to qemuDomainBlockJobImpl
consuming 'vm', this code also consumes the caller's 'mirror'
description of the destination.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Rename...
(qemuDomainBlockCopyCommon): ...and adjust parameters.
(qemuDomainBlockRebase): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
When a domain is undefined, there are options to remove it's
managed save state or snapshots. However, there's another file
that libvirt creates per domain: the NVRAM variable store file.
Make sure that the file is not left behind if the domain is
undefined.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If we end up at the cleanup lable before we've VIR_EXPAND_N the list,
then calling virQEMUCapsFreeStringList() with a NULL proplist could
theoretically deref proplist if nproplist was set. Coverity doesn't
seem to acknowledge the relationship between proplist and nproplist
assuming in virQEMUCapsFreeStringList that nproplist could be at
least 1 and thus have a null deref. It only seems to follow the
NULL proplist.
Signed-off-by: John Ferlan <jferlan@redhat.com>
With the virGetGroupList() change in place - Coverity further complains
that if we fail to virFork(), the groups will be leaked - which aha seems
to be the case. Adjust the logic to save off the -errno, free the groups,
and then return the value we saved
Signed-off-by: John Ferlan <jferlan@redhat.com>
This ends up being a very bizarre false positive. With an assist from
eblake, the claim is that mgetgroups() could return a -1 value, but yet
still have a groups buffer allocated, yet the example shown doesn't
seem to prove that.
Rather than fret about it, by adding a well placed sa_assert() on the
returned *list value we can "assure" ourselves that the mgetgroups()
failure path won't signal this condition.
Signed-off-by: John Ferlan <jferlan@redhat.com>
If a (floppy) drive isn't selected for snapshot explicitly and is empty
don't try to snapshot it. For external snapshots this would fail as we
can't generate a name for the snapshot from an empty drive.
Reported-by: Pavel Hrdina <phrdina@redhat.com>
To express empty drive we historically use storage source with empty
path. Unfortunately NBD disks may be declared without a path.
Add a helper to wrap this logic.
The libxl driver was blindly assigning libvirt's
virDomainLifecycleAction to libxl's libxl_action_on_shutdown, when
in fact the various actions take on different values in these enums.
Introduce helpers to properly map the enum values.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Test suites using the port allocator don't want to have different
behaviour depending on whether a port is in use on the host. Add
a VIR_PORT_ALLOCATOR_SKIP_BIND_CHECK which test suites can use
to skip the bind() test. The port allocator will thus only track
ports in use by the test suite process itself. This is fine when
using the port allocator to generate guest configs which won't
actually be launched
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
I've noticed two problem with the automatically created NVRAM varstore
file. The first, even though I run qemu as root:root for some reason I
get Permission denied when trying to open the _VARS.fd file. The
problem is, the upper directory misses execute permissions, which in
combination with us dropping some capabilities result in EPERM.
The next thing is, that if I switch SELinux to enforcing mode, I get
another EPERM because the vars file is not labeled correctly. It is
passed to qemu as disk and hence should be labelled as disk. QEMU may
write to it eventually, so this is different to kernel or initrd.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
With all the changes in my previous foray into this code, I forgot to
remove the libxlDomainEventQueue(driver, event); call inside the
dom == NULL condition.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Coverity notes that if the virConnectListAllDomains returns a negative
value then the loop at the cleanup label that ends on numDomains will
have issues.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Coverity notes that if qemuMonitorGetMachines() returns a negative
nmachines value, then the code at the cleanup label will have issues.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Coverity notes that if the call to virBitmapParse() returns a negative
value, then when we jump to the error label, the call to
virCapabilitiesClearHostNUMACellCPUTopology() will have issues
with the negative nb_cpus
Signed-off-by: John Ferlan <jferlan@redhat.com>
If the virNumaGetNodeCPUs() call fails with -1, then jumping to cleanup
with 'cpus == NULL' and calling virCapabilitiesClearHostNUMACellCPUTopology
will cause issues.
Signed-off-by: John Ferlan <jferlan@redhat.com>
In qemuProcessInitPCIAddresses() if qemuMonitorGetAllPCIAddresses()
returns a negative (or zero) value, then no need to call the
qemuProcessDetectPCIAddresses().
Signed-off-by: John Ferlan <jferlan@redhat.com>
The code compares def->forwarders when deciding to return 0 at a
couple of points, then uses "def->nfwds" as a way to index into
the def->forwarders array. That reference results in Coverity
complaining that def->forwarders being NULL was checked as part
of an arithmetic OR operation where failure could be any one 5
conditions, but that is not checked when entering the loop to
dereference the array. Changing the comparisons to use nfwds
will clear the warnings
Signed-off-by: John Ferlan <jferlan@redhat.com>
If the qemuMigrationEatCookie() fails to set mig, we jump to cleanup:
which will call qemuMigrationCancelDriveMirror() without first checking
if mig == NULL
Signed-off-by: John Ferlan <jferlan@redhat.com>
Perhaps a false positive, but since Coverity doesn't understand the
relationship between the 'count' and the 'strings', rather than leave
the chance the on input 'strings' is NULL and causes a deref - just
check for it and return
Signed-off-by: John Ferlan <jferlan@redhat.com>
If the VIR_STRDUP(exptime,...) fails, then we will jump to cleanup,
no need to check if exptime is set which causes Coverity to issue
a complaint in the virStrToLong_ll call because there wasn't a check
for a NULL value while there was one for the reference right after
the VIR_STRDUP().
Signed-off-by: John Ferlan <jferlan@redhat.com>
If we jump to cleanup before allocating the 'result', then the call
to virBlkioDeviceArrayClear will deref result causing a problem.
Signed-off-by: John Ferlan <jferlan@redhat.com>
If we jump to cleanup before allocating 'result', then the call to
virBlkioDeviceArrayClear() could dereference result
Signed-off-by: John Ferlan <jferlan@redhat.com>
If the virJSONValueNewObject() fails, then rather than going to error
and getting a Coverity false positive since it doesn't seem to understand
the relationship between nkeywords, keywords, and values and seems to
believe calling qemuFreeKeywords will cause a NULL deref - just return NULL
Signed-off-by: John Ferlan <jferlan@redhat.com>
Adjust the parentheses in/for the waitpid loops; otherwise, Coverity
points out:
(1) Event assignment: Assigning: "waitret" = "waitpid(pid, &status, 0) == -1"
(2) Event between: At condition "waitret == -1", the value of "waitret"
must be between 0 and 1.
(3) Event dead_error_condition: The condition "waitret == -1" cannot
be true.
(4) Event dead_error_begin: Execution cannot reach this statement:
"ret = -*__errno_location();".
Signed-off-by: John Ferlan <jferlan@redhat.com>
Coverity complains that when multiplying to 32 bit values that eventually
will be stored in a 64 bit value that it's possible the math could
overflow unless one of the values being multiplied is type cast to
the proper size.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Coverity complains that checking for !domlist after setting doms = domlist
and making a deref of doms just above
It seems the call in question was intended to me made in the case that
'doms' was passed in and not when the virDomainObjListExport() call
allocated domlist and already called virConnectGetAllDomainStatsCheckACL().
Thus rather than check for !domlist - check that "doms != domlist" in
order to avoid the Coverity message.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Handle a few places where Coverity complains about the value being
unused. For two of them (Close cases) - the comments above the close
indicate there is no harm to ignore the error - so added an ignore_value.
For the other condition, added an rc check like other callers.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since cd4d547576
Coverity notes that setting 'ret = -3' prior to the unconditional
setting of 'ret = 0' will cause the value to be UNUSED.
Since the comment indicates that it is expect to allow the code
to continue, just remove the ret = -3 setting.
Signed-off-by: John Ferlan <jferlan@redhat.com>
In qemuDomainSetBlkioParameters(), Coverity points out that the calls
to qemuDomainParseBlkioDeviceStr() are slightly different and points
out there may be a cut-n-paste error.
In the first call (AFFECT_LIVE), the second parameter is "param->field";
however, for the second call (AFFECT_CONFIG), the second parameter is
"params->field". It seems the "param->field" is correct especially since
each path as a setting of "param" to "¶ms[i]". Furthermore, there
were a few more instances of using "params[i]" instead of "param->"
which I cleaned up.
Signed-off-by: John Ferlan <jferlan@redhat.com>
After a4431931 the TAP FDs ale labeled with image label instead
of the process label. On the other hand, the commit was
incomplete as a few lines above, there's still old check for the
process label presence while it should be check for the image
label instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
From time to time weird bugreports occur on the list, e.g [1].
Even though the kernel supports setns syscall, there's an older
glibc in the system that misses a wrapper over the syscall.
Hence, after the configure phase we think there's no setns
support in the system, which is obviously wrong. On the other
hand, we can't rely on linux distributions to provide newer glibc
soon. Therefore we need to introduce the wrapper on or own.
1: https://www.redhat.com/archives/libvir-list/2014-September/msg00492.html
Signed-off-by: Stephan Sachse <ste.sachse@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virProcessTranslateStatus is used on error paths that should not spoil
the returned error. As the errors are ignored, use the quiet versions of
virAsprintf to create the message.
When using split UEFI image, it may come handy if libvirt manages per
domain _VARS file automatically. While the _CODE file is RO and can be
shared among multiple domains, you certainly don't want to do that on
the _VARS file. This latter one needs to be per domain. So at the
domain startup process, if it's determined that domain needs _VARS
file it's copied from this master _VARS file. The location of the
master file is configurable in qemu.conf.
Temporary, on per domain basis the location of master NVRAM file can
be overridden by this @template attribute I'm inventing to the
<nvram/> element. All it does is holding path to the master NVRAM file
from which local copy is created. If that's the case, the map in
qemu.conf is not consulted.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
QEMU now supports UEFI with the following command line:
-drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
where the first line reflects <loader> and the second one <nvram>.
Moreover, these two lines obsolete the -bios argument.
Note that UEFI is unusable without ACPI. This is handled properly now.
Among with this extension, the variable file is expected to be
writable and hence we need security drivers to label it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Up to now, users can configure BIOS via the <loader/> element. With
the upcoming implementation of UEFI this is not enough as BIOS and
UEFI are conceptually different. For instance, while BIOS is ROM, UEFI
is programmable flash (although all writes to code section are
denied). Therefore we need new attribute @type which will
differentiate the two. Then, new attribute @readonly is introduced to
reflect the fact that some images are RO.
Moreover, the OVMF (which is going to be used mostly), works in two
modes:
1) Code and UEFI variable store is mixed in one file.
2) Code and UEFI variable store is separated in two files
The latter has advantage of updating the UEFI code without losing the
configuration. However, in order to represent the latter case we need
yet another XML element: <nvram/>. Currently, it has no additional
attributes, it's just a bare element containing path to the variable
store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
After the previous commit, migration statistics on the source and
destination hosts are not equal because the destination updated time
statistics. Let's send the result back so that the same data can be
queried on both sides of the migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Total time of a migration and total downtime transfered from a source to
a destination host do not count with the transfer time to the
destination host and with the time elapsed before guest CPUs are
resumed. Thus, source libvirtd remembers when migration started and when
guest CPUs were paused. Both timestamps are transferred to destination
libvirtd which uses them to compute total migration time and total
downtime. Obviously, this requires the time to be synchronized between
the two hosts. The reported times are useless otherwise but they would
be equally useless if we didn't do this recomputation so don't lose
anything by doing it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When migrating a transient domain or with VIR_MIGRATE_UNDEFINE_SOURCE
flag, the domain may disappear from source host. And so will migration
statistics associated with the domain. We need to transfer the
statistics at the end of a migration so that they can be queried at the
destination host.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
virDomainGetJobStats gains new VIR_DOMAIN_JOB_STATS_COMPLETED flag that
can be used to fetch statistics of a completed job rather than a
currently running job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Job statistics data were tracked in several structures and variables.
Let's make a new qemuDomainJobInfo structure which can be used as a
single source of statistics data as a preparation for storing data about
completed a job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The new blockcopy API wants to reuse only a subset of the disk
hotplug parser - namely, we only care about the embedded
virStorageSourcePtr inside a <disk> XML. Strange as it may
seem, it was easier to just parse an entire disk definition,
then throw away everything but the embedded source, than it
was to disentangle the source parsing code from the rest of
the overall disk parsing function. All that I needed was a
couple of tweaks and a new internal flag that determines
whether the normally-mandatory target element can be
gracefully skipped, since everything else was already optional.
* src/conf/domain_conf.h (virDomainDiskSourceParse): New
prototype.
* src/conf/domain_conf.c (VIR_DOMAIN_XML_INTERNAL_DISK_SOURCE):
New flag.
(virDomainDiskDefParseXML): Honor flag to make target optional.
(virDomainDiskSourceParse): New function.
Signed-off-by: Eric Blake <eblake@redhat.com>
qemu now checks for invalid address type for a panic device, which is
currently implemented only to use ISA address type, thus rejecting
any other options, except for leaving XML attributes blank, in that case,
defaults are used (this behaviour remains the same from earlier verions).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1138125
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When QEMU fails during incoming migration after we successfully started
it (i.e., during Perform or Finish phase), we report a rather unhelpful
message
Unable to read from monitor: Connection reset by peer
We already have a code that takes error messages from QEMU's error
output but we disable it once QEMU successfully starts. This patch
postpones this until the end of Finish phase during incoming migration
so that we can report a much better error message:
internal error: early end of file from monitor: possible problem:
Unknown savevm section or instance '0000:00:05.0/virtio-balloon' 0
load of migration failed
https://bugzilla.redhat.com/show_bug.cgi?id=1090093
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Return failure right away when the domain object can't be looked up
instead of jumping to cleanup. This allows to remove the condition
before unlocking the domain object.
The code would lookup the snapshot object before acquiring the job. This
could lead to a crash as one thread could delete the snapshot object,
while a second thread already had the reference.
Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Creating snapshots modifies the domain state. Currently we wouldn't
enter the job for certain operations although they would modify the
state. Refactor job handling so that everything is covered by an async
job.
For security type='none' libvirt according to the docs should not
generate seclabel be it for selinux or any model. So, skip the
reservation of labels when type is none.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Fairly straightforward - I got lucky that the generated functions
worked out of the box :)
* src/remote/remote_protocol.x (remote_domain_block_copy_args):
New struct.
(REMOTE_PROC_DOMAIN_BLOCK_COPY): New RPC.
* src/remote/remote_driver.c (remote_driver): Wire it up.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
The usual portability fixes; and this includes a fix that adds
a new syntax check for double semicolons (commit 28de556 fixed
some, but gnulib found a better check).
* .gnulib: Update to latest.
* src/xenconfig/xen_common.c (xenFormatConfigCommon): Fix offender.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since 9f781da69d
Resolve a libvirtd crash in virStoragePoolSourceFindDuplicate()
when there is an existing SCSI pool defined with adapter type as
'scsi_host' and defining a new SCSI pool with adapter type as
'fc_host' and parent attribute missing or vice versa.
For example, if there is an existing SCSI pool with adapter type
as 'scsi_host' defined using the following XML
<pool type='scsi'>
<name>TEST_SCSI_POOL</name>
<source>
<adapter type='scsi_host' name='scsi_host1'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
When defining another SCSI pool with adapter type as 'fc_host' using the
following XML will crash libvirtd
<pool type='scsi'>
<name>TEST_SCSI_FC_POOL</name>
<source>
<adapter type='fc_host' wwnn='1234567890abcdef' wwpn='abcdef1234567890'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
Same is true for the reverse case as well where there exists a SCSI pool
with adapter type as 'fc_host' and another SCSI pool is defined with
adapter type as 'scsi_host'.
This happens because for fc_host 'name' is optional attribute whereas for
scsi_host its mandatory. However the check in libvirt for finding duplicate
storage pools didn't take that into account while comparing
Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
To date, anyone performing a block copy and pivot ends up with
the destination being treated as <disk type='file'>. While this
works for data access for a block device, it has at least one
noticeable shortcoming: virDomainGetBlockInfo() reports allocation
differently for block devices visited as files (the size of the
device) than for block devices visited as <disk type='block'>
(the maximum sector used, as reported by qemu); and this difference
is significant when trying to manage qcow2 format on block devices
that can be grown as needed.
Of course, the more powerful virDomainBlockCopy() API can already
express the ability to set the <disk> type. But a new API can't
be backported, while a new flag to an existing API can; and it is
also rather inconvenient to have to resort to the full power of
generating XML when just adding a flag to the older call will do
the trick. So this patch enhances blockcopy to let the user flag
when the resulting XML after the copy must list the device as
type='block'.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_REBASE_COPY_DEV):
New flag.
* src/libvirt.c (virDomainBlockRebase): Document it.
* tools/virsh-domain.c (opts_block_copy, blockJobImpl): Add
--blockdev option.
* tools/virsh.pod (blockcopy): Document it.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow new flag.
(qemuDomainBlockCopy): Remember the flag, and make sure it is only
used on actual block devices.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
While reviewing the new virDomainBlockCopy API, Peter Krempa
pointed out that our existing design of using MiB/s for block
job bandwidth is rather coarse, especially since qemu tracks
it in bytes/s; so virDomainBlockCopy only accepts bytes/s.
But once the new API is implemented for qemu, we will be in
the situation where it is possible to set a value that cannot
be accurately reflected back to the user, because the existing
virDomainGetBlockJobInfo defaults to the coarser units.
Fortunately, we have an escape hatch; and one that has already
served us well in the past: we can use the flags argument to
specify which scale to use (see virDomainBlockResize for prior
art). This patch fixes the query side of the API; made easier
by previous patches that split the query side out from the
modification code. Later patches will address the virsh
interface, as well retrofitting all other blockjob APIs to
also accept a flag for toggling bandwidth units.
* include/libvirt/libvirt.h.in (_virDomainBlockJobInfo)
(VIR_DOMAIN_BLOCK_COPY_BANDWIDTH): Document sizing issues.
(virDomainBlockJobInfoFlags): New enum.
* src/libvirt.c (virDomainGetBlockJobInfo): Document new flag.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJobInfo): Add parameter.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJobInfo): Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJobInfo):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJobInfo)
(qemuMonitorJSONGetBlockJobInfoOne): Likewise. Don't scale here.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror): Update
callers.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
(qemuDomainGetBlockJobInfo): Likewise, and support new flag.
Signed-off-by: Eric Blake <eblake@redhat.com>
The previous patch hoisted some bounds checks to the callers;
but someone that is not aware of the hoisted check could now
try passing an integer between LLONG_MAX and ULLONG_MAX. As a
safety measure, add new json conversion modes that let libvirt
error out early instead of pass bad numbers to qemu, if the
caller ever makes a mistake due to later refactoring.
Convert the various blockjob QMP calls to use the new modes,
and switch some of them to be optional (QMP has always supported
an omitted "speed" the same as "speed":0, for everything except
block-job-set-speed).
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommandRaw):
Add 'j'/'y' and 'J'/'Y' to error out on negative input.
(qemuMonitorJSONDriveMirror, qemuMonitorJSONBlockCommit)
(qemuMonitorJSONBlockJob): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
qemu treats blockjob bandwidth as a 64-bit number, in the units
of bytes/second. But we stupidly modeled block job bandwidth
after migration bandwidth, which in turn was an 'unsigned long'
and therefore subject to 32-bit vs. 64-bit interpretations, and
with a scale of MiB/s. Our code already has to convert between
the two scales, and report overflow as appropriate; although
this conversion currently lives in the monitor code. In fact,
our conversion code limited things to 63 bits, because we
checked against LLONG_MAX and reject what would be negative
bandwidth if treated as signed.
On the bright side, our use of MiB/s means that even with a
32-bit unsigned long, we still have no problem representing a
bandwidth of 2GiB/s, which is starting to be more feasible as
10-gigabit or even faster interfaces are used. And once you
get past the physical speeds of existing interfaces, any larger
bandwidth number behaves the same - effectively unlimited.
But on the low side, the granularity of 1MiB/s tuning is rather
coarse. So the new virDomainBlockJob API decided to go with
a direct 64-bit bytes/sec number instead of the scaled number
that prior blockjob APIs had used. But there is no point in
rounding this number to MiB/s just to scale it back to bytes/s
for handing to qemu.
In order to make future code sharing possible between the old
virDomainBlockRebase and the new virDomainBlockCopy, this patch
moves the scaling and overflow detection into the driver code.
Several of the block job calls that can set speed are fed
through a common interface, so it was easier to adjust all block
jobs at once, for consistency. This patch is just code motion;
there should be no user-visible change in behavior.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob)
(qemuMonitorBlockCommit, qemuMonitorDriveMirror): Change
parameter type and scale.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob)
(qemuMonitorBlockCommit, qemuMonitorDriveMirror): Move scaling
and overflow detection...
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl)
(qemuDomainBlockRebase, qemuDomainBlockCommit): ...here.
(qemuDomainBlockCopy): Use bytes/sec.
Signed-off-by: Eric Blake <eblake@redhat.com>
Another layer of overly-multiplexed code that deserves to be
split into obviously separate paths for query vs. modify.
This continues the cleanup started in commit cefe0ba.
In the process, make some tweaks to simplify the logic when
parsing the JSON reply. There should be no user-visible
semantic changes.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Drop parameter.
(qemuMonitorBlockJobInfo): New prototype.
(BLOCK_JOB_INFO): Drop enum.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob)
(qemuMonitorJSONBlockJobInfo): Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Split...
(qemuMonitorBlockJobInfo): ...into second function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Move
block info portions...
(qemuMonitorJSONGetBlockJobInfo): ...here, and rename...
(qemuMonitorJSONBlockJobInfo): ...and export.
(qemuMonitorJSONGetBlockJobInfoOne): Alter return semantics.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl, qemuDomainGetBlockJobInfo): Adjust
callers.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror)
(qemuMigrationCancelDriveMirror): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit fba6bc4 introduced support for the 'invtsc' feature,
which blocks migration. We should not include it in the
host-model CPU by default, because it's intended to be used
with migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1138221
https://bugzilla.redhat.com/show_bug.cgi?id=1027096#c8
There are two ways in which security model can make it way into
<seclabel/>. One is as the @model attribute, the second one is
via security_driver knob in qemu.conf. Then, while parsing
<seclabel/> several checks and fix ups of old, stale combinations
are performed. However, iff @model is specified. They are not
done in the latter case. So it's still possible to feed libvirt
with senseless combinations (if qemu.conf is adjusted correctly).
One example of a seclabel that needs some adjustment (in case
security_driver=none in qemu.conf) is:
<seclabel type='dynamic' relabel='yes'/>
The fixup code is copied from virSecurityLabelDefParseXML
(covering the former case) into virSecurityLabelDefsParseXML
(which handles the latter case).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The qemu implementation for virDomainGetBlockJobInfo() has a
minor bug: it grabs the qemu job with intent to QEMU_JOB_MODIFY,
which means it cannot be run in parallel with any other
domain-modifying command. Among others, virDomainBlockJobAbort()
is such a modifying command, and it defaults to being
synchronous, and can wait as long as several seconds to ensure
that the job has actually finished. Due to the job rules, this
means a user cannot obtain status about the job during that
timeframe, even though we know that some client management code
exists which is using a polling loop on status to see when a job
finishes.
This bug has been present ever since blockpull support was first
introduced (commit b976165, v0.9.4 in Jul 2011), all because we
stupidly tried to cram too much multiplexing through a single
helper routine, but was made worse in 97c59b9 (v1.2.7) when
BlockJobAbort was fixed to wait longer. It's time to disentangle
some of the mess in qemuDomainBlockJobImpl, and in the process
relax block job query to use QEMU_JOB_QUERY, since it can safely
be used in parallel with any long running modify command.
Technically, there is one case where getting block job info can
modify domain XML - we do snooping to see if a 2-phase job has
transitioned into the second phase, for an optimization in the
case of old qemu that lacked an event for the transition. I
claim this optimization is safe (the jobs are all about modifying
qemu state, not necessarily xml state); but if it proves to be
a problem, we could use the difference between the capabilities
QEMU_CAPS_BLOCKJOB_{ASYNC,SYNC} to determine whether we even
need snooping, and only request a modifying job in the case of
older qemu.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Move info
handling...
(qemuDomainGetBlockJobInfo): ...here, and relax job type.
(qemuDomainBlockJobAbort, qemuDomainBlockJobSetSpeed)
(qemuDomainBlockRebase, qemuDomainBlockPull): Adjust callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
The existing virDomainBlockRebase code rejected the combination of
_RELATIVE and _COPY flags, but only by accident. It makes sense
to add support for the combination someday, at least for the case
of _SHALLOW and not _REUSE_EXT; but to implement it, libvirt would
have to pre-create the file with a relative backing name, and I'm
not ready to code that in yet.
Meanwhile, the code to forward on to the block copy code is getting
longer, and reorganizing the function to have the block pull done
early makes it easier to add even more block copy prep code.
This patch should have no semantic difference other than the quality
of the error message on the unsupported flag combination. Pre-patch:
error: unsupported flags (0x10) in function qemuDomainBlockCopy
Post-patch:
error: argument unsupported: Relative backing during copy not supported yet
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Reorder code,
and improve error message of relative copy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Our style overwhelmingly uses hanging braces (the open brace
hangs at the end of the compound condition, rather than on
its own line), with the primary exception of the top level function
body. Fix the few remaining outliers, before adding a syntax
check in a later patch.
* src/interface/interface_backend_netcf.c (netcfStateReload)
(netcfInterfaceClose, netcf_to_vir_err): Correct use of { in
compound statement.
* src/conf/domain_conf.c (virDomainHostdevDefFormatSubsys)
(virDomainHostdevDefFormatCaps): Likewise.
* src/network/bridge_driver.c (networkAllocateActualDevice):
Likewise.
* src/util/virfile.c (virBuildPathInternal): Likewise.
* src/util/virnetdev.c (virNetDevGetVirtualFunctions): Likewise.
* src/util/virnetdevmacvlan.c
(virNetDevMacVLanVPortProfileCallback): Likewise.
* src/util/virtypedparam.c (virTypedParameterAssign): Likewise.
* src/util/virutil.c (virGetWin32DirectoryRoot)
(virFileWaitForDevices): Likewise.
* src/vbox/vbox_common.c (vboxDumpNetwork): Likewise.
* tests/seclabeltest.c (main): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
I'm about to add a syntax check that enforces our documented
HACKING style of always using matching {} on if-else statements.
This patch focuses on drivers that had several issues.
* src/lxc/lxc_fuse.c (lxcProcGetattr, lxcProcReadMeminfo): Correct
use of {}.
* src/lxc/lxc_driver.c (lxcDomainMergeBlkioDevice): Likewise.
* src/phyp/phyp_driver.c (phypConnectNumOfDomainsGeneric)
(phypUUIDTable_Init, openSSHSession, phypStoragePoolListVolumes)
(phypConnectListStoragePools, phypDomainSetVcpusFlags)
(phypStorageVolGetXMLDesc, phypStoragePoolGetXMLDesc)
(phypConnectListDefinedDomains): Likewise.
* src/vbox/vbox_common.c (vboxAttachSound, vboxDumpDisplay)
(vboxDomainRevertToSnapshot, vboxDomainSnapshotDelete): Likewise.
* src/vbox/vbox_tmpl.c (vboxStorageVolGetXMLDesc): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
We lacked of HOME environment variable,
set 'HOME=/' as default.
The kernel sets up $HOME for the init process.
Therefore any init can assume that $HOME is set.
libvirt currently violates that implicit rule.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When FIPS mode is on, gnutls_dh_params_generate2 will fail if 1024 is
specified as the prime's number of bits, a bigger value works in both
cases.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Memory is allocated for 'mnt_src' by VIR_STRDUP in the loop. Next
loop it will be allocated again. So we need to free 'mnt_src'
before continue the loop.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Commit 0e1a1a8c introduced umask for virCommand, but the variables
used emit a warning on older compilers about shadowed global
declaration.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add umask to _virCommand, allow user to set umask to command.
Set umask(002) to qemu process to overwrite the default umask
of 022 set by many distros, so that unix sockets created for
virtio-serial has expected permissions.
Fix problem reported here:
https://sourceware.org/bugzilla/show_bug.cgi?id=13078#c11https://bugzilla.novell.com/show_bug.cgi?id=888166
To use virtio-serial device, unix socket created for chardev with
default umask(022) has insufficient permissions.
e.g.:
-device virtio-serial \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-device virtserialport,chardev=foo,name=org.fedoraproject.port.0
srwxr-xr-x 1 qemu qemu 0 21. Jul 14:19 /tmp/somefile.sock
Other users in the same group (like real user, test engines, etc)
cannot write to this socket.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, there is one flag passed in during macvtap creation
(withTap) -- Let's convert this field to an unsigned int flag
field for future expansion.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The cleanup in commit cf976d9d used secdef->label to label the tap
FDs, but that is not possible since it's process-only label (svirt_t)
and not a object label (e.g. svirt_image_t). Starting a domain failed
with EPERM, but simply using secdef->imagelabel instead of
secdef->label fixes it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since 1b807f92, connecting with virsh to an already running session
libvirtd fails with:
$ virsh list --all
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to
'/run/user/1000/libvirt/libvirt-sock': Transport endpoint is already
connected
This is caused by a logic error in virNetSocketNewConnectUnix: even if
the connection to the daemon socket succeeded, we still try to spawn the
daemon and then connect to it.
This commit changes the logic to not try to spawn libvirtd if we
successfully connected to its socket.
Most of this commit is whitespace changes, use of -w is recommended to
look at it.
Currently, after calling commands to create a new volumes,
virStorageBackendZFSCreateVol calls virStorageBackendZFSFindVols that
calls virStorageBackendZFSParseVol.
virStorageBackendZFSParseVol checks if a volume already exists by
trying to get it using virStorageVolDefFindByName.
For a just created volume it returns NULL, so volume is reported as
new and appended to pool->volumes. This causes a volume to be listed
twice as storageVolCreateXML appends this new volume to the list as
well.
Fix that by passing a new volume definition to
virStorageBackendZFSParseVol so it could determine if it needs to add
this volume to the list.
In qemuDomainSnapshotCreateDiskActive() if we jumped to cleanup from a
failed actions = virJSONValueNewArray(), then 'cfg' would be NULL.
So just return -1, which in turn removes the need for cleanup:
Coverity complained about the following:
(3) Event ptr_arith:
Performing pointer arithmetic on "cur_fd" in expression "cur_fd++".
130 return virNetServerServiceNewFD(*cur_fd++,
The complaint is that pointer arithmetic taking place instead of the
expected auto increment of the variable... Adding some well placed
parentheses ensures our order of operation.
For virtio-blk-pci disks with the disk iothread attribute that are
running the correct emulator, add the "iothread=iothread#" to the
-device command line in order to enable iothreads for the disk as
long as the command is available, the disk iothread value provided is
valid, and is supported for the disk device being added
Add a new disk "driver" attribute "iothread" to be parsed as the thread
number for the disk to use. In order to more easily facilitate the usage
and configuration of the iothread, a "zero" for the attribute indicates
iothreads are not supported for the device and a positive value indicates
the specific thread to try and use.
Add a new capability to ensure the iothreads feature exists for the qemu
emulator being run - requires the "query-iothreads" QMP command. Using the
domain XML add correspoding command argument in order to generate the
threads. The iothreads will use a name space "iothread#" where, the
future patch to add support for using an iothread to a disk definition to
merely define which of the available threads to use.
Add tests to ensure the xml/argv processing is correct. Note that no
change was made to qemuargv2xmltest.c as processing the -object element
would require knowing more than just iothreads.
Introduce XML to allowing adding iothreads to the domain. These can be
used by virtio-blk-pci devices in order to assign a specific thread to
handle the workload for the device. The iothreads are the official
implementation of the virtio-blk Data Plane that's been in tech preview
for QEMU.
Coverity noted that all callers to libxlDomainEventQueue() could ensure
the second parameter (event) was true before calling except this case.
As I look at the code and how events are used - it seems that prior to
generating an event for the dom == NULL condition, the resume/suspend
event should be queue'd after the virDomainSaveStatus() call which will
goto cleanup and queue the saved event anyway.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Implement the API function for virDomainListGetStats and
virConnectGetAllDomainStats in a modular way and implement the
VIR_DOMAIN_STATS_STATE group of statistics.
Although it may look like the function looks universal I'd rather not
expose it to other drivers as the coming stats groups are likely to do
qemu specific stuff to obtain the stats.
One useless warning, but the other one rather pertinent. On entry
the 'trans' variable is initialized to VIR_DOMAIN_DISK_TRANS_DEFAULT.
When the "trans" was found in the parsing loop it def->geometry.trans
was assigned to the return from virDomainDiskGeometryTransTypeFromString
and then 'trans' was used to do the comparison to see if it was valid.
So remove 'trans' and use def->geometry.trans properly
In libxlDomainMigrationPrepare() if the uri_in is false, then
'hostname' is allocated and used "generically" in the routine,
but not freed. Conversely, if uri_in is true, then a uri is
allocated and hostname is set to the uri->hostname value and
likewise generically used.
At function exit, hostname wasn't free'd in the !uri_in path,
so that was added. To just make it clearer on usage the else
path became the call to virURIFree() although I suppose technically
it didn't have to since it would be a call using (NULL)
Coverity determined that on error path that 'mach' wouldn't be free'd
Since virCapabilitiesFreeGuestMachine() isn't globally available, we'll
insert first and then if the VIR_STRDUP's fail they it will eventually
cause the 'mach' to be freed in the error path
Coverity found that on error paths, the 'arg' value wasn't be cleaned
up. Followed the example in qemuAgentSetVCPUs() where upon successful call
to qemuAgentCommand() the 'cpus' is set to NULL; otherwise, when cleanup
occurs the free the memory for 'arg'
In function virQEMUCapsParseMachineTypesStr, VIR_STRNDUP allocates
memory for 'name' in {do,while} loop. If 'name' isn't freed before
'continue', its memory will be allocated again in the next loop.
In this case the memory allocated for 'name' in privious loop is
useless and not freed. Free it before continue this loop to fix that.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Coverity complains that checking for domain->def being non NULL in the
if (live) path of virDomainObjAssignDef() would be unnecessary or a
NULL deref since the call to virDomainObjIsActive() would already
dereference domain->def when checking if the def->id field was != -1.
Checked all callers to virDomainObjAssignDef() and each at some point
dereferences (vm)->def->{field} prior to calling when live is true.
In qemuNetworkIfaceConnect() a call to virNetDevBandwidthSet() is
made where the function prototype requires the first parameter
(net->ifname) to be non NULL. Coverity complains that the subsequent
non NULL check for net->ifname prior to the next call gets flagged as
an unnecessary check. Resolve by removing the extra check
In virDomainActualNetDefFormat() a call to virDomainNetGetActualType(def)
was made before a check for (!def) a few lines later. This triggered
Coverity to note the possible NULL deref. Just moving the initialization
to after the !def checks resolves the issue
There were two occurrances of attempting to initialize actualType by
calling virStorageSourceGetActualType(src) prior to a check if (!src)
resulting in Coverity complaining about the possible NULL dereference
in virStorageSourceGetActualType() of src.
Resolve by moving the actualType setting until after checking !src
If virDomainDiskDefFree(disk) is called in 'skipdisk:', then it's possible
to either return to skipdisk without reallocating a new disk (via the if
condition just prior) or to end the loop having deleted the disk. Since
virDomainDiskDefFree() does not pass by reference, disk isn't changed in
this context, thus the possible issue.
There were two warnings in this module
If the VIR_ALLOC_N(def->serials, 1) fails, then a virDomainChrDefFree(chr)
is called and we jump to cleanup which makes the same call. Just remove
the one after VIR_ALLOC_N()
In the label "skipnic:" a virDomainNetDefFree(net) is made; however, if
in going back to the top of the loop we jump back down to skipnic for any
reason, the call will attempt to free an already freed structure since
"net" was not passed by reference to virDomainNetDefFree(). Just set
net = NULL in skipnic: to resolve the issue.
Coverity complains that calling virNetworkDefFree(def), then jumping
to the cleanup: label which calls virNetworkDefFree(def) could result
in a double_free. Just remove the call from the if statement.
Since times when vbox moved to the daemon (due to some licensing
issue) the subdrivers that vbox implements were registered, but not
opened since our generic subdrivers took priority. I've tried to fix
this in 65b7d553f3 but it was not correct. Apparently moving
vbox driver registration upfront changes the default connection URI
which makes some users sad. So, this commit breaks vbox into pieces
and register vbox's network and storage drivers first, and vbox driver
then at the end. This way, the vbox driver is registered in the order
it always was, but its subdrivers are registered prior the generic
ones.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There's this unwritten rule in libvirt that vir_function is translated
into virFunction when needed (e.g. in remote protocol definition,
python, ...). Up till now we ignored such translation in driver module
loading and did fine. Well, we didn't have any module with an
underscore in its name. But this will change in next commit. The
problem is, once an a module is dlopen()-ed, we derive register
function name from its name. So instead of "driver_subdriverRegister"
do some magic to turn that into "driverSubdriverRegister".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While working on virDomainBlockCopy, I noticed we had a verify()
concerning internal XML flags that was incomplete after several
recent flag additions; move that up higher in the code to make it
harder to forget to modify on the next flag addition. Adjust
some formatting while at it.
* src/conf/domain_conf.c (verify): Move closer to internal flag
definitions. Cover missing flags ALLOW_ROM and ALLOW_BOOT.
Signed-off-by: Eric Blake <eblake@redhat.com>
In qemuDomainRevertToSnapshot(), it will check snap->def->state.
But when the state is PMSUSPENDED/NOSTATE/BLOCKED, it forgets to
call qemuDomainObjEndJob.
https://bugzilla.redhat.com/show_bug.cgi?id=1134154
Bug introduced in commit 1e833899.
Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Let's fix this before we bake in a painful API. Since we know
that we have exactly one non-negative fd on success, we might
as well return the fd directly instead of forcing the user to
pass in a pointer. Furthermore, I found some memory and fd
leaks while reviewing the code - the idea is that on success,
libvirtd will have handed two fds in two different directions:
one to qemu, and one to the RPC client.
* include/libvirt/libvirt.h.in (virDomainOpenGraphicsFD): Drop
unneeded parameter.
* src/driver.h (virDrvDomainOpenGraphicsFD): Likewise.
* src/libvirt.c (virDomainOpenGraphicsFD): Adjust interface to
return fd directly.
* daemon/remote.c (remoteDispatchDomainOpenGraphicsFd): Adjust
semantics.
* src/qemu/qemu_driver.c (qemuDomainOpenGraphicsFD): Likewise,
and plug fd leak.
* src/remote/remote_driver.c (remoteDomainOpenGraphicsFD):
Likewise, and plug memory and fd leak.
Signed-off-by: Eric Blake <eblake@redhat.com>
This commit (finally) adds the virDomainBlockCopy API, with the
intent that it will provide more power to the existing 'virsh
blockcopy' command.
'virsh blockcopy' was first added in Apr 2012 (v0.9.12), which
corresponds to the upstream qemu 1.2 timeframe. It was done as
a hack on top of the existing virDomainBlockRebase() API call,
for two reasons: 1) it was targetting a feature that landed first
in downstream RHEL qemu, but had not stabilized in upstream qemu
at the time (and indeed, 'drive-mirror' only landed upstream in
qemu 1.3 with slight differences to the first RHEL attempt,
and later gained further parameters like granularity and buf-size
that are also worth exposing), and 2) extending an existing API
allowed it to be backported without worrying about bumping .so
versions. A virDomainBlockCopy() API was proposed at that time
[1], but we decided not to accept it into libvirt until after
upstream qemu stabilized, and it ended up getting scrapped.
Whether or not RHEL should have attempted adding a new feature
without getting it upstream first is a debate that can be held
another day; but enough time has now elapsed that we are ready to
do the interface cleanly.
[1] https://www.redhat.com/archives/libvir-list/2012-April/msg00768.html
Delaying the creation of a clean API until now has also had a
benefit: we've only recently learned of a few shortcomings in the
original design: 1) it is unable to target a network destination
(such as a gluster volume) because it hard-coded the assumption
that the destination is a local file name. Because of all the
refactoring we've done to add virStorageSourcePtr, we are in a
better position to declare an API that parses XML describing a
host storage source as the copy destination, which was not
possible had we implemented virDomainBlockCopy as it had been
originally envisioned (although a network target will have to wait
until a later libvirt release compared to the API addition to
actually be implemented). 2) the design of using MiB/sec as the
bandwidth throttle is rather coarse; qemu is actually tuned to
bytes/second, and libvirt is preventing access to that level of
detail. A later patch will add flags to existing block job API
that can request bytes/second instead of back-compat MiB/s, but as
this is a new API, we can get it right to begin with.
At least I had the foresight to create 'virsh blockcopy' as a
separate command at the UI level (commit 1f06c00) rather than
leaking the underlying API overload of virDomainBlockRebase onto
shell users.
A further note on the bandwidth option: virTypedParameters
intentionally lacks unsigned long (since variable-width
interaction between mixed 32- vs. 64-bit client/server setups is
nasty), but we have to deal with the fact that we are interacting
with existing older code that mistakenly chose unsigned long
bandwidth at a point before we decided to prohibit it in all new
API. The typed parameter is therefore unsigned long long, but
the implementation (in a later patch) will have to do overflow
detection on 32-bit platforms, as well as capping the value to
match the LLONG_MAX>>20 cap of the existing MiB/s interfaces.
* include/libvirt/libvirt.h.in (virDomainBlockCopy): New API.
(virDomainBlockJobType, virConnectDomainEventBlockJobStatus):
Update related documentation.
* src/libvirt.c (virDomainBlockCopy): Implement it.
* src/libvirt_public.syms (LIBVIRT_1.2.8): Export it.
* src/driver.h (_virDriver): New driver callback.
Signed-off-by: Eric Blake <eblake@redhat.com>
The motivation for this API is that management layers that use libvirt
usually poll for statistics using various split up APIs we currently
provide. To get all the necessary stuff, the app needs to issue a lot of
calls and aggregate the results.
The APIs I'm introducing here:
1) Returns data in a format that we can expand in the future and is
(pseudo) hierarchical. The data is returned as typed parameters where
the fields are constructed as dot-separated strings containing names and
other stuff in a list of typed params.
2) Stats for multiple (all) domains can be queried at once and are
returned in one call. This will decrease the overhead necessary to issue
multiple calls per domain multiplied by the count of domains.
3) Selectable (bit mask) fields in the returned format. This will allow
to retrieve only specific stats according to the app's need.
The stats groups will be enabled using a bit field @stats passed as the
function argument. A few sample stats groups that this API will support:
VIR_DOMAIN_STATS_STATE
VIR_DOMAIN_STATS_CPU
VIR_DOMAIN_STATS_BLOCK
VIR_DOMAIN_STATS_INTERFACE
(Note that this is only an example, the initial implementation supports
only VIR_DOMAIN_STATS_STATE while others will be added later.)
the returned typed params will use the following scheme
state.state = VIR_DOMAIN_RUNNING
state.reason = VIR_DOMAIN_RUNNING_BOOTED (the actual values according to
the enum)
cpu.count = 8
cpu.0.state = running
cpu.0.time = 1234
According to docs/schemas/domaincommon.rng and _virDomainBlockIoTuneInfo
all the iotune values are interpreted as unsigned long long, however
according to qemu_monitor_json.c, qemu silently truncates numbers
larger than LLONG_MAX. There's really not much of a usage for such
large numbers anyway yet. This patch provides the same overflow
check during a domain start as it does during setting
a blkdeviotune element in qemu_driver.c and thus reports an error when
a larger number than LLONG_MAX is detected.
https://bugzilla.redhat.com/show_bug.cgi?id=1131876
QEMU 2.1 added support for the kvm=off option to the -cpu command,
allowing the KVM hypervisor signature to be hidden from the guest.
This enables disabling of some paravirualization features in the
guest as well as allowing certain drivers which test for the
hypervisor to load. Domain XML syntax is as follows:
<domain type='kvm>
...
<features>
...
<kvm>
<hidden state='on'/>
</kvm>
</features>
...
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit b55cc5f4e did a shallow copy of libxl_{sdl,vnc}_info from the
domain config to the build info, which resulted in double-freeing
strings contained in the structures during cleanup, which later
resulted in a libvirtd crash. Fix by performing a deep copy of the
structure, VIR_STRDUP'ing embedded strings instead of simply copying
their pointers.
Fixes the following issue reported on the libvirt dev list
https://www.redhat.com/archives/libvir-list/2014-August/msg01112.html
I noticed a line 'int nparams = 0;;' in remote_dispatch.h, and
tracked down where it was generated. While at it, I found a
couple of other double semicolons. Additionally, I noticed that
commit df0b57a95 left a stale reference to the file name
remote_dispatch_bodies.h.
* src/conf/numatune_conf.c (virDomainNumatuneNodeParseXML): Drop
empty statement.
* tests/virdbustest.c (testMessageStruct, testMessageSimple):
Likewise.
* src/rpc/gendispatch.pl (remote_dispatch_bodies.h): Likewise, and
update stale comments.
Signed-off-by: Eric Blake <eblake@redhat.com>
When trying to set an invalid value into iotune element, standard
behavior was to not report any error, rather to reset all affected
subelements of the iotune element back to 0 which results in ignoring
those particular subelements by XML generator. Patch further
examines the return code of the virXPathULongLong function
and in case of an invalid non-integer value raises an error.
Fixed to preserve consistency with invalid value checking
of other elements.
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=1131811
The commit "f5b4c141" introduced new "force" parameter
for "virFDStreamOpenFileInternal" but forget to update
one call of that function.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
virStorageBackendVolDownloadLocal and virStorageBackendVolUploadLocal
use virFDStreamOpenFile function to work with the volume fd.
virFDStreamOpenFile calls virFDStreamOpenFileInternal that implements
handling of the non-blocking I/O. If a file is not a character device and
not a fifo, it uses libvirt_iohelper.
On FreeBSD, it doesn't work as expected because disk devices (including
ZFS volumes) are exposed as character devices, and ZFS volumes do not
support open(2) with O_NONBLOCK.
To overcome this, introduce a forceIOHelper flag to
virFDStreamOpenFileInternal that forces using libvirt_iohelper. And
introduce virFDStreamOpenBlockDevice that calls
virFDStreamOpenFileInternal with the forceIOHelper set to true.
virFDStreamOpenInternal terminates if virSetNonBlock fails. As
virSetNonBlock uses gnulib's set_nonblocking_flag that sets errno,
call virReportSystemError() to let user know the reason of fail.
Commit b606bbb41 reminded me that any time we drop locks to run
back-to-back guest interaction commands, we have to check that
the guest didn't disappear in between the two commands. A quick
audit found a couple of spots that were missing this check.
* src/qemu/qemu_driver.c (qemuDomainShutdownFlags)
(qemuDomainSetVcpusFlags): Check that domain is still up.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since '337a13628' - Coverity complains that 'net' is VIR_ALLOC()'d, but
on various 'cleanup' exit paths from the code there is no corresponding
cleanup.
Since '1b807f92d' - Coverity complains that in the error paths of
both virFork() and virProcessWait() that the 'passfd' will not be closed.
Added the VIR_FORCE_CLOSE(passfd) and initialized it to -1.
Also noted that variable 'buf' was never really used - so I removed it
When trying to set numatune mode directly using virsh numatune command,
correct error is raised, however numatune structure was not deallocated,
thus resulting in creating an empty numatune element in the guest XML,
if none was present before. Running the same command aftewards results
in a successful change with broken XML structure. Patch fixes the
deallocation problem as well as checking for invalid attribute
combination VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO + a nonempty nodeset.
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=1129998
The 'min_guarantee' is used by VMware ESX and OpenVZ drivers,
with qemu however, libvirt should report error when starting a domain,
because this element is not used.
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=1122455
On some places in the libvirt code we have:
f(a,z)
instead of
f(a, z)
This trivial patch fixes couple of such occurrences.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Libvirt measures vram in Kbytes, not in bytes, so calculation
of Mbytes was incorrect. PCS server can take vram argument
with units, so I added K postfix to make params a little bit clearer.
That sets a new flag, but that flag does mean the child will get
LISTEN_FDS and LISTEN_PID environment variables properly set and
passed FDs reordered so that it corresponds with LISTEN_FDS (they must
start right after STDERR_FILENO).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It's just a wrapper around NewFD and NewUNIX that selects the right
option and increments the number of used FDs.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since not only systemd can do this (we'll be doing it as well few
patches later), change 'systemd' to 'caller' and fix LISTEN_FDS to
LISTEN_PID where applicable.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When formatting the forward mode addresses or interfaces the switch was
done based on the type of the network rather than of the type of the
individual <interface>/<address> element. In case a user would specify
an incorrect network type ("passhtrough") with <address> elements,
libvirtd would crash as it would attempt to format an <interface>.
Use the type of the individual element to format the XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132347
https://bugzilla.redhat.com/show_bug.cgi?id=1078126
Using 'virsh attach-device --config' (or --persistent) to attach a
file backed lun device will succeed; however, subsequent domain restarts
will result in failure because the configuration of a file backed lun
is not supported.
Although allowing 'illegal configurations' is something that can be
allowed, it may not be practical in this case. Generally, when attaching
a device to a domain means the domain must be running. A way around
this is using the --config (or --persistent) option. When an attach
is done to a running domain, a temporary configuration is modified
first followed by the live update. The live update will make a number
of disk validity checks when building the qemu command to attach the
disk. If any fail, then change is rejected.
Rather than allow a potentially illegal combination, adjust the code
in the configuration path to make the same checks as the running path
will make with respect to disk validity checks. This way we avoid
having the potential for some subsequent start/reboot to fail because
an illegal combination was allowed.
NB: The live path still checks the configuration since it is possible
to just do --live guest modification...
Since vbox driver rewrite the virDriver structure init moved from
vbox_tmpl.c into vbox_common.c. However, our hvsupport.pl script
doesn't count with that. It still parses vbox_tmp.c and looks for
virDriver structure which is not found there anymore. As a result,
at hvsupport page is seems like vbox driver doesn't support
anything.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In case the host has 2 or more NUMA nodes, we fetch CPU map for each
node. However, we need to free the CPU map in between loops:
==29513== 96 (72 direct, 24 indirect) bytes in 3 blocks are definitely lost in loss record 951 of 1,264
==29513== at 0x4C2A700: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==29513== by 0x52AD24B: virAlloc (viralloc.c:144)
==29513== by 0x52AF0E6: virBitmapNew (virbitmap.c:78)
==29513== by 0x52FB720: virNumaGetNodeCPUs (virnuma.c:294)
==29513== by 0x53C700B: nodeCapsInitNUMA (nodeinfo.c:1886)
==29513== by 0x11759708: vboxCapsInit (vbox_common.c:398)
==29513== by 0x11759CC4: vboxConnectOpen (vbox_common.c:514)
==29513== by 0x53C965F: do_open (libvirt.c:1147)
==29513== by 0x53C9EBC: virConnectOpen (libvirt.c:1317)
==29513== by 0x142905: remoteDispatchConnectOpen (remote.c:1215)
==29513== by 0x126ADF: remoteDispatchConnectOpenHelper (remote_dispatch.h:2346)
==29513== by 0x5453D21: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1103245
An advice appeared there on the qemu-devel list [1]. When a domain is
suspended and then resumed guest kernel is not aware of this. So we've
introduced virDomainSetTime API that resets the time within guest
using qemu-ga. On the other hand, qemu itself is trying to make RTC
beat faster to catch the difference. But if we don't tell qemu that
guest's time was reset via the other method, both mechanisms are
applied resulting in again wrong guest time. In order to avoid summing
both corrections we need to tell qemu that it should not use the RTC
injection if the guest time is set via guest agent.
1: http://www.mail-archive.com/qemu-devel@nongnu.org/msg236435.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When a user would try changing the persistent IO tuning settings for a
disk that was hotplugged to a vm in a transient way, the
qemuDomainSetBlockIoTune API would use the same index for both the
live and config disk array. The disk was missing from the config array
though causing a crash of libvirtd.
To fix the issue, determine the indexes separately.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1131819
https://bugzilla.redhat.com/show_bug.cgi?id=1095636
When starting up the domain the domain's NICs are allocated. As of
1f24f682 (v1.0.6) we are able to use multiqueue feature on virtio
NICs. It breaks network processing into multiple queues which can be
processed in parallel by different host CPUs. The queues are, however,
created by opening /dev/net/tun several times. Unfortunately, only the
first FD in the row is labelled so when turning the multiqueue feature
on in the guest, qemu will get AVC denial. Make sure we label all the
FDs needed.
Moreover, the default label of /dev/net/tun doesn't allow
attaching a queue:
type=AVC msg=audit(1399622478.790:893): avc: denied { attach_queue }
for pid=7585 comm="qemu-kvm"
scontext=system_u:system_r:svirt_t:s0:c638,c877
tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023
tclass=tun_socket
And as suggested by SELinux maintainers, the tun FD should be labeled
as svirt_t. Therefore, we don't need to adjust any range (as done
previously by Guannan in ae368ebf) rather set the seclabel of the
domain directly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Removing a shared device needs special steps for disks and hostdevs.
Instead of having one function dealing this split the code into two
separate functions that can be used with better granularity.
Adding a shared device needs special steps for disks and hostdevs.
Instead of having one function dealing this split the code into two
separate functions that can be used with better granularity.
The qemuCheckSharedDevice function is operating only on disk devices.
Rename it and change the arguments to reflect that and refactor some
logic for more readability.
Split it out into a separate function and simplify the code. There's no
need to copy the entry to update it as the hash returns pointer to the
existing item.
Also remove the now unused qemuSharedDeviceEntryCopy function.
To allow reuse split the code into a separate function and refactor it.
To update an existing entry there's no need to copy it first, just
update it inplace.
Pass the source of the changed media instead of a complete disk
definition.
Note that the @disk argument now contains what @olddisk would contain.
The new source is passed as a virStorageSource struct.
When we are changing media (or doing other hotplug operations) we need
to setup cgroups, locking and seclabels on the new disk. This is a
multi-step process where every piece can fail. To simplify dealing with
this introduce qemuDomainPrepareDisk that similarly to
qemuDomainPrepareDiskChainElement initializes/tears down a whole new
disk to be used with the domain.
Additionally the function supports passing a different source struct for
media changes of cdroms that will be refactored later.
Update bhyveBuildDiskArgStr to support volumes:
- Make virBhyveProcessBuildBhyveCmd and
virBhyveProcessBuildLoadCmd take virConnectPtr as the
first argument instead of bhyveConnPtr as virConnectPtr is
needed for virStorageTranslateDiskSourcePool,
- Add virStorageTranslateDiskSourcePool call to
virBhyveProcessBuildBhyveCmd and
virBhyveProcessBuildLoadCmd,
- Allow disks of type VIR_STORAGE_TYPE_VOLUME
Currently, qemu driver uses qemuTranslateDiskSourcePool()
to translate disk volume information. This function is
general enough and could be used for other drivers as well,
so move it to conf/domain_conf.c along with its helpers.
- qemuTranslateDiskSourcePool: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePool,
- qemuAddISCSIPoolSourceHost: move to storage/storage_driver.c
and rename to virStorageAddISCSIPoolSourceHost,
- qemuTranslateDiskSourcePoolAuth: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePoolAuth,
- Update users of qemuTranslateDiskSourcePool to use a
new name.
In commit 45ad1adb I added a nicer message for tunings that need
cgroups when unavailable (unprivileged), but I added this check for
I/O tuning of block devices, which doesn't need cgroups, because it is
done by QEMU, so let's fix that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
XM and XL config are very similar. Disks are specified differently
in XL, but the old XM disk config is still supported by XL. XL also
supports new config like spice that was never supported by XM.
This patch moves all the common parsing and formatting functions to
the new file xen_common.c and adapts the XM parser/formatter accordingly.
This restructuring paves way for introducing an XL parser/formatter in
the future.
While moving the code, fixup whitespace, comments, and style issues.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Wrap formatting code common to xm and xl in xenFormatConfigCommon
and export it.
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Wrap parsing code common to xm and xl in xenParseConfigCommon
and export it.
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
src/xenxs contains parsing/formating functions for the various xen
config formats, and is better named src/xenconfig.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Pin existing vcpus rather than existing vcpu pinning infos. This
increases the complexity of the lookup, but avoids pinning cpus that are
not enabled actually.
Remove the pinning info when removing to CPU, otherwise when the VM will
be started our code will try to pin non-existing vcpus as the definition
wasn't updated.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1129372
Tidy up control flow, change boolean argument to use 'bool', improve
error message in case the function is used to parse emulator pinning
info and avoid a few temp variables that made no sense.
Also when the function is called to parse emulator pinning info, there's
no need to check the processor ID in that case.
The check doesn't make much sense as right below it the entries are
either checked for duplicity or ignored in some cases. Having this check
doesn't actually forbid passing invalid values.
When editing guest's XML (on QEMU), it was possible to add multiple
listen elements into graphics parent element. However QEMU does not
support listening on multiple addresses. Configuration is tested for
multiple 'listen address' and if positive, an error is raised.
https://bugzilla.redhat.com/show_bug.cgi?id=1119212
Four functions are rewrite in this patch, that is:
vboxNodeGetInfo
vboxNodeGetCellsFreeMemory
vboxNodeGetFreeMemory
vboxNodeGetFreePages
Since these functions has nothing to do with vbox,
it can be directly moved to vbox_common.c. So, I
merged these things into one patch.
The vboxDomainSnapshotCreateXML integrated the snapshot redefine
with this patch:
http://www.redhat.com/archives/libvir-list/2014-May/msg00589.html
This patch introduced vboxSnapshotRedefine in vboxUniformedAPI to
enable the features.
This patch replace all version specified APIs to the uniformed api,
then, moving the whole implementation to vbox_common.c. As there
is only API level changes, the behavior of the function doesn't
change.
Some old version's defects has brought to the new one. The already
known things are:
*goto cleanup in a loop without releasing the pointers in the
loop.
*When function failed after machine unregister, no roll back
to recovery it and the virtual machine would disappear.
All vbox objects are child objects from the nsISupports in vbox's
C++ API version. Since the CAPI is generated from the C++ API, I
kept their relationship here, by the definitations below:
typedef struct nsISupports nsISupports;
typedef nsISupports IVirtualBox;
typedef nsISupports ISession;
and so on...
So, when calling the API from nsISupports, we don't need to do
typecasting, and things work still work well.
Introduce vbox_uniformed_api to deal with version conflicts. Use
vbox_install_api to register the currect vboxUniformedAPI with
vbox version.
vboxConnectOpen has been rewritten.
Martin Kletzander pointed out in email that my commit 2a193f64
introduced a crash in networkCreateInterfacePool() during startup of
any network that doesn't have a <pf> subelement of its <forward>
element. He also supplied a patch.
http://www.redhat.com/archives/libvir-list/2014-August/msg00655.html
I expanded on that patch by cleaning up now-extraneous checks in the
callers of networkCreateInterfacePool().
Fortunately the offending patch hasn't been in any release, and hasn't
been (to my knowledge) backported to any other branch.
introduce functions
xenFormatXMCPUAllocation(virConfPtr conf, ......);
xenFormatXMCPUFeatures(virConfPtr conf, ......);
which formats CPU allocation and features config
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
introduce function
xenFormatXMTimeOffset(virConfPtr conf,........);
which formats time config instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
introduce function
xenFormatXMGeneralMeta(virConfPtr conf,......);
which parses uuid and name instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
The gid value passed to devpts has to be translated by hand as
virLXCControllerSetupDevPTS() is called before setting up the user
and group mappings.
Otherwise devpts will use an unmapped gid and openpty()
will fail within containers.
Linux kernel commit 23adbe12
("fs,userns: Change inode_capable to capable_wrt_inode_uidgid")
uncovered that issue.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:
"entering monitor without asking for a nested job is dangerous"
The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.
This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.
https://bugzilla.redhat.com/show_bug.cgi?id=1130089
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
commit d9504941 introduces two new attributes "cmd_per_lun" and
"max_sectors" same with the names QEMU uses for virtio-scsi.
But the case of parsing them is not exact. Change to parse
them if controller has "driver" element.
Signed-off-by: Mo yuxiang <moyuxiang@huawei.com>
This patch changes the setmaxmem function to support the '--live',
'--config', and '--current' flags by revectoring the code through
the setmem function using the VIR_DOMAIN_MEM_MAXIMUM flag. The
setmem code is refactored to handle both cases depending on the flag.
The changed maxmem code for the MEM_MAXIMUM path will not allow
modification to the memory values of an active guest unless the --config
switch is used.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
At the beginning of the qemu config file parsing function there
are 3 helper macros defined: GET_VALUE_BOOL, GET_VALUE_LONG and
GET_VALUE_STR. Later, when they are no longer needed they are
undefined in order to keep the namespace clean. However, the
GET_VALUE_STRING is undefined instead of GET_VALUE_STR.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Implement ZFS storage backend driver. Currently supported
only on FreeBSD because of ZFS limitations on Linux.
Features supported:
- pool-start, pool-stop
- pool-info
- vol-list
- vol-create / vol-delete
Pool definition looks like that:
<pool type='zfs'>
<name>myzfspool</name>
<source>
<name>actualpoolname</name>
</source>
</pool>
The 'actualpoolname' value is a name of the pool on the system,
such as shown by 'zpool list' command. Target makes no sense
here because volumes path is always /dev/zvol/$poolname/$volname.
User has to create a pool on his own, this driver doesn't
support pool creation currently.
A volume could be used with Qemu by adding an entry like this:
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='myzfspool' volume='vol5'/>
<target dev='hdc' bus='ide'/>
</disk>
In qemuMigrationToFile we enter the monitor multiple times and don't
check if the VM is still alive after returning form the monitor. Add the
checks to skip pieces of code in case the VM crashes while saving it's
state.
Saving a shutoff VM doesn't make sense and libvirtd crashes while
attempting to do that. Check that the domain is alive after entering
the save async job.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1129207
https://bugzilla.redhat.com/show_bug.cgi?id=1128751
There's this <driver/> element under <interface/> which can have
several attributes. However, the driver element is currently formated
only if the driver's name or txmode has been specified. This makes
only a little sense as we parse even partial <driver/>, for instance:
<interface type='user'>
<mac address='52:54:00:e5:48:58'/>
<model type='virtio'/>
<driver ioeventfd='on' event_idx='on' queues='5'/>
</interface>
But such XML would never get formatted back.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When a network is defined with "<pf dev='xyz'/>", libvirt will query
sysfs to learn the list of all virtual functions (VF) associated with
that Physical Function (PF) then populate the network's interface pool
accordingly. This action was previously done only when the first guest
actually requested an interface from the network. This patch changes
it to populate the pool immediately when the network is started. This
way any problems with the PF or its VFs will become apparent sooner.
Note that we can't remove the old calls to networkCreateInterfacePool
that happen whenever a guest requests an interface - doing so would be
asking for failures on hosts that had libvirt upgraded with a network
that had been started but not yet used.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1047818
networkCreateInterfacePool was a bit loose in its error cleanup, which
could result in a network definition with interfaces in the pool that
were NULL. This would in turn lead to a libvirtd crash when a guest
tried to attach an interface using the network with that pool.
In particular this would happen when creating a pool to be used for
macvtap connections. macvtap needs the netdev name of the virtual
function in order to use it, and each VF only has a netdev name if it
is currently bound to a network driver. If one of the VFs of a PF
happened to be bound to the pci-stub or vfio-pci driver (indicating
it's already in use for PCI passthrough), or no driver at all, it
would have no name. In this case networkCreateInterfacePool would
return an error, but would leave the netdef->forward.nifs set to the
total number of VFs in the PF. The interface attach that triggered
calling of networkCreateInterfacePool (it uses a "lazy fill" strategy)
would simply fail, but the very next attempt to attach an interface
using the same network pool would result in a crash.
This patch refactors networkCreateInterfacePool to bring it more in
line with current coding practices (label name, use of a switch with
no default case) as well as providing the following two changes to
behavior:
1) If a VF with no netdev name is encountered, just log a warning and
continue; only fail if exactly 0 devices are found to put in the pool.
2) If the function fails, clean up any partial interface pool and set
netdef->forward.nifs to 0.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1111455
Otherwise we fail like
libvirt version: 1.2.7, package: 6 (root 2014-08-08-16:09:22 bogon)
virAuditOpen:62 : Unable to initialize audit layer: Protocol not supported
virFileGetDefaultHugepageSize:2958 : internal error: Unable to parse /proc/meminfo
virStateInitialize:749 : Initialization of QEMU state driver failed: internal error: Unable to parse /proc/meminfo
daemonRunStateInit:922 : Driver state initialization failed
if the data can't be determined.
Reference: http://bugs.debian.org/757609
This fixes compilation on kFreeBSD which otherwise fails like
CC util/libvirt_util_la-virprocess.lo
In file included from /usr/include/sys/cpuset.h:35:0,
from util/virprocess.c:43:
/usr/include/sys/_cpuset.h:49:43: error: 'NBBY' undeclared here (not in
a function)
long __bits[howmany(CPU_SETSIZE, _NCPUBITS)];
^
In file included from util/virprocess.c:43:0:
/usr/include/sys/cpuset.h:215:12: error: unknown type name 'cpusetid_t'
int cpuset(cpusetid_t *);
^
/usr/include/sys/cpuset.h:216:30: error: expected ')' before 'id_t'
int cpuset_setid(cpuwhich_t, id_t, cpusetid_t);
^
/usr/include/sys/cpuset.h:217:42: error: expected ')' before 'id_t'
int cpuset_getid(cpulevel_t, cpuwhich_t, id_t, cpusetid_t *);
^
/usr/include/sys/cpuset.h:218:48: error: expected ')' before 'id_t'
int cpuset_getaffinity(cpulevel_t, cpuwhich_t, id_t, size_t, cpuset_t
*);
^
/usr/include/sys/cpuset.h:219:48: error: expected ')' before 'id_t'
int cpuset_setaffinity(cpulevel_t, cpuwhich_t, id_t, size_t, const
cpuset_t *);
And it's the correct usage as documented in
http://www.freebsd.org/cgi/man.cgi?query=cpuset_setid
Also change the #ifdef HAVE_BSH_CPU_AFFINITY to #if for consistency.
A command to freeze a part of mounted file systems is implemented in
upstream QEMU-guest-agent with a name of 'guest-fsfreeze-freeze-list'.
This fixes the name of the command used to partial fsfreeze in qemu driver
when 'mountpoints' option is specified to virDomainFSFreeze API.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
The virDomainSetInterfaceParameters implementation in qemu over
VIR_DOMAIN_AFFECT_CONFIG doesn't work as expected. When trying to
clear out the bandwidth settings for an interface, it has no
actual effect:
virsh # domiftune --config $domain $interface
inbound.average: 100
inbound.peak : 0
inbound.burst : 0
outbound.average: 10
outbound.peak : 0
outbound.burst : 0
virsh domiftune --config $domain $interface 0 0
virsh # domiftune --config $domain $interface
inbound.average: 100
inbound.peak : 0
inbound.burst : 0
outbound.average: 10
outbound.peak : 0
outbound.burst : 0
But according to virsh man page:
To clear inbound or outbound settings, use --inbound or
--outbound respectfully with average value of zero.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
introduce function
xenParseXMOS(virConfPtr conf,...........);
which parses the OS config instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMGeneralMeta(virConfPtr conf, .......);
which parses general metadata instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMDisk(virConfPtr conf, ........);
which parses xm disk config instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMCPUFeatures(virConfPtr conf,.........);
which parses CPU features instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMEventActions(virConfPtr conf,........)
which parses events leading to certain actions
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMTimeOffset(virConfPtr conf,.......);
which parses time offset config instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
During review of the iSCSI hostdev series, eblake noted that the
prototypes shouldn't have the extranenous space between the "*" and
the function name:
http://www.redhat.com/archives/libvir-list/2014-July/msg01227.html
Since it was more invasive than 1 or 2 lines - I said I'd send a
patch covering this once committed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Introduce a new structure to handle an iSCSI host device based on the
existing virDomainHostdevSubsysSCSI by adding a "protocol='iscsi'" to
the <source/> element. The existing scsi_host subsystem RNG was modified
to read an optional "protocol='adapter'", although it won't be written
out nor is it documented as an option (by choice).
The new hostdev structure mimics the existing <disk/> element for an
iSCSI device (network) device. New XML is:
<hostdev mode='subsystem' type='scsi' managed='yes'>
<source protocol='iscsi' name='iqn.1992-01.com.example'>
<host name='example.org' port='3260'/>
<auth username='myname'>
<secret type='iscsi' usage='mycluster_myname'/>
</auth>
</source>
<address type='drive' controller='0' bus='0' target='2' unit='5'/>
</hostdev>
The controller element will mimic the existing scsi_host code insomuch
as when 'lsi' and 'virtio-scsi' are used.
In preparation for hostdev support for iSCSI and a virStorageNetHostDefPtr,
split out the network disk storage parsing of the 'host' element into a
separate routine.
Commit febf84c2 tried to delay in-memory modification of the actual
domain disk structure until after the qemu event was received.
However, I missed that the code for block pivot had been temporarily
setting disk->src = disk->mirror prior to the qemu command, in order
to label the backing chain of a reused external blockcopy disk;
and calls into qemu while still in that state before finally undoing
things at the cleanup label. Since the qemu event handler then does:
virStorageSourceFree(disk->src);
disk->src = disk->mirror;
we have the sad race that a fast enough qemu event can cause a leak of
the original disk->src, as well as a use-after-free of the disk->mirror
contents, bad enough to crash libvirtd in some of my test runs, even
though the common case of the qemu event being much later won't trip
the race.
I'll go wear the brown paper bag of shame, for introducing a crasher
in between rc1 and rc2 of the freeze for 1.2.7 :( My only
consolation is that virDomainBlockJobAbort requires the domain:write
ACL, so it is not a CVE.
The valgrind report when the race occurs looks like:
==25612== Invalid read of size 4
==25612== at 0x50E7C90: virStorageSourceGetActualType (virstoragefile.c:1948)
==25612== by 0x209C0B18: qemuDomainDetermineDiskChain (qemu_domain.c:2473)
==25612== by 0x209D7F6A: qemuProcessHandleBlockJob (qemu_process.c:1087)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
...
==25612== Address 0xe4b5610 is 0 bytes inside a block of size 200 free'd
==25612== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==25612== by 0x50839E9: virFree (viralloc.c:582)
==25612== by 0x50E7E51: virStorageSourceFree (virstoragefile.c:2015)
==25612== by 0x209D7EFF: qemuProcessHandleBlockJob (qemu_process.c:1073)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Don't corrupt
disk->src, and only label chain for blockcopy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Valgrind caught a memory leak:
==2018== 9 bytes in 1 blocks are definitely lost in loss record 143 of 927
==2018== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2018== by 0x8C42369: strdup (strdup.c:42)
==2018== by 0x50EACC9: virStrdup (virstring.c:676)
==2018== by 0x50E79E5: virStorageSourceCopy (virstoragefile.c:1845)
==2018== by 0x20A3FAA7: qemuDomainBlockCommit (qemu_driver.c:15620)
==2018== by 0x51DC6B2: virDomainBlockCommit (libvirt.c:20092)
I traced it to the fact that blockcopy and blockcommit end up
reparsing a backing chain on pivot, but the chain parsing code
doesn't gracefully handle the case where the backing file is
already known.
I'm not exactly sure when this was introduced, but suspect that the
refactoring in commit 9944b71 and friends that moved towards probing
in-place rather than into a temporary structure are part of the cause.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal):
Don't leak any prior value.
Signed-off-by: Eric Blake <eblake@redhat.com>
Fix a comment in virDomainAuditNetDevice.
Fix a typo in comment of qemuPhysIfaceConnect which is
the caller of virDomainAuditNetDevice.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
RNG schema as well as the qemu driver requires absolute paths for memory
and disk snapshot image files but the XML parser was not enforcing it.
Add checks to avoid problems in qemu where the configuration it creates
is invalid.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1126329
Since commit be0782e1 we are parsing /proc/meminfo to find out the
default huge page size. However, if the host we are running at does
not support any huge pages (e.g. CONFIG_HUGETLB_PAGE is turned off),
we will not successfully parse the meminfo file and hence the whole
qemu driver init process fails. Moreover, the default huge page size
is needed if and only if there's at least one hugetlbfs mount point.
So the fix consists of moving the virFileGetDefaultHugepageSize
function call after the first hugetlbfs mount point is found.
With this fix, we fail to start with one or more hugetlbfs mounts and
malformed meminfo file, but that's expected (how can one mount
hugetlbfs without kernel supporting huge pages?). Workaround in that
case is to umount all the hugetlbfs mounts.
Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In a system with Fiber Channel Host Adapters, a query to list all Fibre Channel
HBAs OR Vports currently returns empty list:
$ virsh nodedev-list --cap fc_host
$
Libvirt correctly discovers properties for all HBAs. However, the reporting
fails because of incorrect flag comparison while filtering these types.
This is fixed by removing references to 'VIR_CONNECT_LIST_NODE_DEVICES_CAP_*'
for comparison and replacing those with 'VIR_NODE_DEV_CAP_*'
Introduced by original commit id '652a2ec6'
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Commit 232a31b munged job info to report 'active commit' instead of
'commit' when generating events, but forgot to also munge the polling
variant of the command.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Adjust type as
needed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Otherwise this beautiful error would be overwritten when
the function is called with a really high rate number:
2014-07-28 12:51:47.920+0000: 2304: error : virCommandWait:2399 :
internal error: Child process (/sbin/tc class add dev vnet0 parent 1:
classid 1:1 htb rate 4294968kbps) unexpected exit status 1: Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
mpu minimum packet size used in rate computations
overhead per-packet size overhead used in rate computations
linklay adapting to a linklayer e.g. atm
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lowe
https://bugzilla.redhat.com/show_bug.cgi?id=1043735
https://bugzilla.redhat.com/show_bug.cgi?id=1072653
Upon successful upload of a volume, the target volume and storage pool
were not updated to reflect any changes as a result of the upload. Make
use of the existing stream close callback mechanism to force a backend
pool refresh to occur in a separate thread once the stream closes. The
separate thread should avoid potential deadlocks if the refresh needed
to wait on some event from the event loop which is used to perform
the stream callback.
There are multiple mount points after commit 725a211f, but one comment
wasn't changed to use plurals.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Cygwin has getifaddrs(), but not AF_LINK, leading to:
util/virstats.c: In function 'virNetInterfaceStats':
util/virstats.c:138:41: error: 'AF_LINK' undeclared (first use in this function)
if (ifa->ifa_addr->sa_family != AF_LINK)
...
* src/util/virstats.c (virNetInterfaceStats): Only use getifaddrs
if AF_LINK is present.
Signed-off-by: Eric Blake <eblake@redhat.com>
libvirt previously only touched an interface's disable_ipv6 setting in
sysfs if it needed to be set to 1, assuming that 0 is the
default. Apparently that isn't always the case though (kernel 3.15.7-1
in Arch Linux reportedly defaults a new interface's disable_ipv6
setting to 1) so this patch explicitly sets it to 0 or 1 as
appropriate.
With this in place, I can (finally!) now do:
virsh blockcommit $dom vda --shallow --verbose --pivot
and watch qemu shorten the backing chain by one, followed by
libvirt automatically updating the dumpxml output, effectively
undoing the work of virsh snapshot-commit --no-metadata --disk-only.
Commit is SOOOO much faster than blockpull, when I'm still fairly
close in time to when the temporary qcow2 wrapper file was created
via a snapshot operation!
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live
commit.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch is going to wire up qemu active block commit jobs;
but as they have similar events and are canceled/pivoted in the
same way as block copy jobs, it is easiest to track all bookkeeping
for the commit job by reusing the <mirror> element. This patch
adds domain XML to track which job was responsible for creating a
mirroring situation, and adds a job='copy' attribute to all
existing uses of <mirror>. Along the way, it also massages the
qemu monitor backend to read the new field in order to generate
the correct type of libvirt job (even though it requires a
future patch to actually cause a qemu event that can be reported
as an active commit). It also prepares to update persistent XML
to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema.
* docs/formatdomain.html.in: Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): Add a field.
* src/conf/domain_conf.c (virDomainBlockJobType): String conversion.
(virDomainDiskDefParseXML): Parse job type.
(virDomainDiskDefFormat): Output job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish
active from regular commit.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type.
(qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type
on completion.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New
file.
* tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
If all features are set to default (including the capabilities policy),
but some capabilities are toggled, we need to output the <features>
element when formatting the config.
We were not directly saving the domain XML to file after starting
or finishing a blockcopy. Without the startup write, a libvirtd
restart in the middle of a copy job would forget that the job was
underway. Then at pivot, we were indirectly writing new XML in
reaction to events that occur as we stop and restart the guest CPUs.
But there was a race: since pivot is an async action, it is possible
that libvirtd is restarted before the pivot completes, so if XML
changes during the event, that change was not written. The original
blockcopy code cleared out the <mirror> element prior to restarting
the CPUs, but this is also a race, observed if a user does an async
pivot and a dumpxml before the event occurs. Furthermore, this race
will interfere with active commit in a future patch, because that
code will rely on the <mirror> element at the time of the qemu event
to determine whether to inform the user of a normal commit or an
active commit.
Fix things by saving state any time we modify live XML, while
delaying XML disk modifications until after the event completes. We
still need a to teach libvirtd restarts to examine all existing
<mirror> elements to see if the job completed in the meantime (that
is, if libvirtd misses the event, the updated state still needs to be
updated in live XML), but that will be a later patch, in part because
we also need to to start taking advantage of newer qemu's ability to
keep the job around after completion rather than the current usage
where the job disappears both on error and on success.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
on disk.
(qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML
rewrites...
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
Signed-off-by: Eric Blake <eblake@redhat.com>
Doing a blockcopy operation across a libvirtd restart is not very
robust at the moment. In particular, we are clearing the <mirror>
element prior to telling qemu to finish the job. Also, thanks to the
ability to request async completion, the user can easily regain
control prior to qemu actually finishing the effort, and they should
be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done
before modifying the XML to reflect the job completion. But since
qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether
the job was cancelled (kept the original disk) or completed (pivoted
to the new disk), we have to track which of the two operations were
used to end the job. Furthermore, we'd like to avoid attempts to
end a job where we are already waiting on an earlier request to qemu
to end the job. Likewise, if we miss the qemu event (perhaps because
it arrived during a libvirtd restart), we still need enough state
recorded to be able to determine how to modify the domain XML once
we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a
preliminary step that makes it possible to track whether a job
has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirrorState): New enum.
(_virDomainDiskDef): Convert bool mirroring to new enum.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Handle new values.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust
client.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
* docs/schemas/domaincommon.rng (diskMirror): Expose new values.
* docs/formatdomain.html.in (elementsDisks): Document it.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
If PCI passthrough type is not supported, we should error out rather than
continue building the command line.
When starting a domain, the type has been already checked by
qemuPrepareHostdevPCICheckSupport() before building qemu command line,
so the problem doesn't emerge.
But when coverting a domain xml without specifying passthrough type explictly
to qemu arg, we will get a malformed command line.
the xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0001' bus='0x03' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
the converted command line:
-device ,host=0001:03:00.0,id=hostdev0,bus=pci.0,addr=0x5
After this patch, virsh gives an error message:
virsh domxml-to-native qemu-argv /tmp/tmp.xml
error: internal error: invalid PCI passthrough type 'default'
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Use better detection of hugetlbfs mount points. Yes, there can be
multiple mount points each serving different huge page size.
Since we already have ability to override the mount point in the
qemu.conf file, this crazy backward compatibility code is brought in.
Now we allow multiple mount points, so the "hugetlbfs_mount" option
must take an list of strings (mount points). But previously, it was
just a string, so we must accept both types now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This should iterate over mount tab and search for hugetlbfs among with
looking for the default value of huge pages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Use correct mode when pre-creating files (for snapshots). The refactor
changing to storage driver usage caused a regression as some systems
created the file with 000 permissions forbidding qemu to write the file.
Pass mode to the creating functions to avoid the problem.
Regression since 185e07a5f8.
Leak introduced in commit 16ebf10f (v1.2.6), detected by valgrind:
==9816== 216 (96 direct, 120 indirect) bytes in 6 blocks are definitely lost in loss record 665 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x1DBDBE27: udevProcessPCI (node_device_udev.c:546)
==9816== by 0x1DBDD79D: udevGetDeviceDetails (node_device_udev.c:1293)
* src/util/virpci.h (virPCIEDeviceInfoFree): New prototype.
* src/util/virpci.c (virPCIEDeviceInfoFree): New function.
* src/conf/node_device_conf.c (virNodeDevCapsDefFree): Clear
pci_express under pci case.
(virNodeDevCapPCIDevParseXML): Avoid leak.
* src/node_device/node_device_udev.c (udevProcessPCI): Likewise.
* src/libvirt_private.syms (virpci.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Finding virPCIE* code is more intuitive if located in virpci.h
instead of node_device_conf.h.
* src/conf/node_device_conf.h (virPCIELinkSpeed, virPCIELink)
(virPCIEDeviceInfo): Move...
* src/util/virpci.h: ...here.
* src/conf/node_device_conf.c (virPCIELinkSpeed): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The compiler can alert us to places where we need to expand switch
statements because we add a new enum value, but only if we don't
have a default case.
* src/conf/node_device_conf.c (virNodeDeviceDefFormat)
(virNodeDevCapsDefParseXML, virNodeDevCapsDefFree): Drop default
case.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit e5f36698e3 introduces a
false-positive build failure in the sound card model handling switch.
Initialize the model to NULL although the value should never be used.
Libvirt documents that the default entropy source for the 'random'
backend of a RNG device is /dev/random. Instead of storing and
propagating NULL across our code and checking it in multiple places fill
the default in the post parse callback and use that in the other places.
Since 24e5cafba6 (thankfully unreleased)
when a VM with an empty disk drive would be started the code would call
stat() on NULL path as a check was missing from the callback rendering
machines unstartable.
Report success when the path is empty (denoting an empty drive).
virTimeFieldsThenRaw will never return negative result, so I clean up
the related meaningless judgements to make it better.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The "random" backend for virtio-rng can be started with no path
specified which equals to /dev/random. The cgroup code didn't consider
this and called few of the functions with NULL resulting into:
$ virsh start rng-vm
error: Failed to start domain rng-vm
error: Path '(null)' is not accessible: Bad address
Problem introduced by commit c6320d3463
If user hasn't provided any @emulatorbin, the qemuCaps are
searched by @arch provided (which in fact can be guessed from the
host). However, there's no guarantee that the qemu binary for
@arch will exist. Therefore qemu capabilities may be nonexistent
too. If that's the case, we should throw an error message prior
jumping onto 'cleanup' label as the helper lookup function
remains silent on no search result.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add support for CDROM devices for bhyve driver using
bhyve(8)'s 'ahci-cd' device type.
As bhyve currently does not support media insertion at runtime,
disallow to start a domain with an empty source path for cdrom
devices.
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
To integrate the security driver with the storage driver we need to
pass a callback for a function that will chown storage volumes.
Introduce and document the callback prototype.
When restoring security labels in the dac driver the code would resolve
the file path and use the resolved one to be chown-ed. The setting code
doesn't do that. Remove the unnecessary code.
With my intended use of storage driver assist to chown files on remote
storage we will need a witness that will tell us whether the given
storage volume supports operations needed by the storage driver.
Gluster storage works on a similar principle to NFS where it takes the
uid and gid of the actual process and uses it to access the storage
volume on the remote server. This introduces a need to chown storage
files on gluster via native API.
Up to now, users have to pass two arguments at least: domain virt type
('qemu' vs 'kvm') and one of emulatorbin or architecture. This is not
much user friendly. Nowadays users mostly use KVM and share the host
architecture with the guest. So now, the API (and subsequently virsh
command) can be called with all NULLs (without any arguments).
Before this patch:
# virsh domcapabilities
error: failed to get emulator capabilities
error: virttype_str in qemuConnectGetDomainCapabilities must not be NULL
# virsh domcapabilities kvm
error: failed to get emulator capabilities
error: invalid argument: at least one of emulatorbin or architecture fields must be present
After:
# virsh domcapabilities
<domainCapabilities>
<path>/usr/bin/qemu-system-x86_64</path>
<domain>kvm</domain>
<machine>pc-i440fx-2.1</machine>
<arch>x86_64</arch>
<vcpu max='255'/>
</domainCapabilities>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds back the virDomainDef typedef into domain_conf and
makes all the numatune_conf functions independent of any virDomainDef
definitions.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Our seclabel parsing was repeatedly assigning malloc'd data into a
temporary variable, without first freeing the previous use. Among
other leaks flagged by valgrind:
==9312== 8 bytes in 1 blocks are definitely lost in loss record 88 of 821
==9312== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9312== by 0x8C40369: strdup (strdup.c:42)
==9312== by 0x50EA799: virStrdup (virstring.c:676)
==9312== by 0x50FAEB9: virXPathString (virxml.c:90)
==9312== by 0x50FAF1E: virXPathStringLimit (virxml.c:112)
==9312== by 0x510F516: virSecurityLabelDefParseXML (domain_conf.c:4571)
==9312== by 0x510FB20: virSecurityLabelDefsParseXML (domain_conf.c:4720)
While it was multiple problems, it looks like commit da78351 (thankfully
unreleased) was to blame for all of them.
* src/conf/domain_conf.c (virSecurityLabelDefParseXML): Plug leaks
detected by valgrind.
Signed-off-by: Eric Blake <eblake@redhat.com>
Introduced in commit 70571ccc (v1.2.4). Caught by valgrind:
==9816== 170 (32 direct, 138 indirect) bytes in 1 blocks are definitely lost in loss record 646 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x50AEC2B: virFirewallNew (virfirewall.c:204)
==9816== by 0x1E2308ED: ebiptablesDriverProbeStateMatch (nwfilter_ebiptables_driver.c:3715)
==9816== by 0x1E2309AD: ebiptablesDriverInit (nwfilter_ebiptables_driver.c:3742)
* src/nwfilter/nwfilter_ebiptables_driver.c
(ebiptablesDriverProbeStateMatch): Properly clean up.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1122205
Although the edits were changing in-memory XML, it was not flushed
to disk; so unless some other action changes XML, a libvirtd restart
would lose the changed information.
* src/conf/domain_conf.c (virDomainObjSetMetadata): Add parameter,
to save live status across restarts.
(virDomainSaveXML): Allow for test driver.
* src/conf/domain_conf.h (virDomainObjSetMetadata): Adjust
signature.
* src/bhyve/bhyve_driver.c (bhyveDomainSetMetadata): Adjust caller.
* src/lxc/lxc_driver.c (lxcDomainSetMetadata): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetMetadata): Likewise.
* src/test/test_driver.c (testDomainSetMetadata): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The patch described above introduced two problems caught by the compiler
and thus breaking the build.
One of the problems was comparison of unsigned with < 0 and the second
one jumped a variable init.
Before:
virsh # dominfo chx3
State: shut off
Max memory: 92160 KiB
Used memory: 92160 KiB
After:
virsh # dominfo container1
State: shut off
Max memory: 92160 KiB
Used memory: 0 KiB
Similar to qemu cases.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Added <capabilities> in the <features> section of LXC domains
configuration. This section can contain elements named after the
capabilities like:
<mknod state="on"/>, keep CAP_MKNOD capability
<sys_chroot state="off"/> drop CAP_SYS_CHROOT capability
Users can restrict or give more capabilities than the default using
this mechanism.
kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e
forbid us doing a fresh mount for sysfs
when enable userns but disable netns.
This patch will create a bind mount in this senario.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Under ./configure --without-numactl but with numactl-devel installed,
the build fails with:
../../src/util/virnuma.c: In function 'virNumaNodeIsAvailable':
../../src/util/virnuma.c:407:5: error: implicit declaration of function 'numa_bitmask_isbitset' [-Werror=implicit-function-declaration]
return numa_bitmask_isbitset(numa_nodes_ptr, node);
^
and other failures, all because the configure results for particular
functions were used without regard to whether libnuma was even being
linked in.
* src/util/virnuma.c (virNumaGetPages): Fix message typo.
(virNumaNodeIsAvailable): Correct build when not using numactl.
Signed-off-by: Eric Blake <eblake@redhat.com>
virStorageBackendLogicalCreateVol contains a piece like:
if (vol->target.path != NULL) {
/* A target path passed to CreateVol has no meaning */
VIR_FREE(vol->target.path);
}
The 'if' is useless here, but 'syntax-check' doesn't catch that
because of the comment, so drop the 'if'.
Commit ef48a1b introduced virFindSCSIHostByPCI for Linux and
a stub for other platforms that returns -1 while the function
should return 'char *', so use 'return NULL' instead.
Commit fbd91d4 introduced virReadSCSIUniqueId with the third
argument 'int *result', however the stub for non-Linux patform
uses 'unsigned int *result', so change it to 'int *result'.
Pushed under the build breaker rule.
If a parentaddr was provided in the XML, have getAdapterName lookup
the stable address. This allows virStorageBackendSCSICheckPool() and
virStorageBackendSCSIRefreshPool() to automagically find the scsi_host
by its PCI address and unique_id
Introduce a new function to parse the provided scsi_host parent address
and unique_id value in order to find the /sys/class/scsi_host directory
which will allow a stable SCSI host address
Add a test to scsihosttest to lookup the host# name by using the PCI address
and unique_id value
Add an optional unique_id parameter to nodedev. Allows for easier lookup
and display of the unique_id value in order to document for use with
scsi_host code.
Introduce a new function to read the current scsi_host entry and return
the value found in the 'unique_id' file.
Add a 'scsihosttest' test (similar to the fchosttest, but incorporating some
of the concepts of the mocked pci test library) in order to read the
unique_id file like would be found in the /sys/class/scsi_host tree.
Between reboots and kernel reloads, the SCSI host number used for SCSI
storage pools may change requiring modification to the storage pool XML
in order to use a specific SCSI host adapter.
This patch introduces the "parentaddr" element and "unique_id" attribute
for the SCSI host adapter in order to uniquely identify the adapter
between reboots and kernel reloads. For now the goal is to only parse
and format the XML. Both will be required to be provided in order to
uniquely identify the desired SCSI host.
The new XML is expected to be as follows:
<adapter type='scsi_host'>
<parentaddr unique_id='3'>
<address domain='0x0000' bus='0x00' slot='0x1f' func='0x2'/>
</parentaddr>
</adapter>
where "parentaddr" is the parent device of the SCSI host using the PCI
address on which the device resides and the value from the unique_id file
for the device. Both the PCI address and unique_id values will be used
to traverse the /sys/class/scsi_host/ directories looking at each link
to match the PCI address reformatted to the directory link format where
"domain🚌slot:function" is found. Then for each matching directory
the unique_id file for the scsi_host will be used to match the unique_id
value in the xml.
For a PCI address listed above, this will be formatted to "0000:00:1f.2"
and the links in /sys/class/scsi_host will be used to find the host#
to be used for the 'scsi_host' device. Each entry is a link to the
/sys/bus/pci/devices directories, e.g.:
% ls -al /sys/class/scsi_host/host2
lrwxrwxrwx. 1 root root 0 Jun 1 00:22 /sys/class/scsi_host/host2 -> ../../devices/pci0000:00/0000:00:1f.2/ata3/host2/scsi_host/host2
% cat /sys/class/scsi_host/host2/unique_id
3
The "parentaddr" and "name" attributes are mutually exclusive to identify
the SCSI host number. Use of the "parentaddr" element will be the preferred
mechanism.
This patch only supports to parse and format the XMLs. Later patches will
add code to find out the scsi host number.
Rather than assume that NOT FC_HOST is SCSI_HOST, let's call them out
specifically. Makes it easier to find SCSI_HOST code/structs and ensures
something isn't missed in the future
We do so in the vast majority of places, so there's no problem of adding
the attribute to enforce it by the complier and fix a few leftover
places.
This was originally pointed out by Coverity as a recent change triggered
it's warning that our code checked the vast majority of returns from
virStrToLong_ui.
Convert the target snapshot state selector to a switch statement
enumerating all possible values. This points out a few mistakes in the
original selector.
The logic of the code is preserved until later patches.
Try to reconnect to the running domains after libvirtd restart. To
achieve that, do:
* Save domain state
- Modify virBhyveProcessStart() to save domain state to the state
dir
- Modify virBhyveProcessStop() to cleanup the pidfile and the state
* Detect if the state information loaded from the driver's state
dir matches the actual state. Consider domain active if:
- PID it points to exist
- Process title of this PID matches the expected one with the
domain name
Otherwise, mark the domain as shut off.
Note: earlier development bhyve versions before FreeBSD 10.0-RELEASE
didn't set proctitle we expect, so the current code will not detect
it. I don't plan adding support for this unless somebody requests
this.
As with the local SCSI passthrough devicesm qemu can't support snapshots
on those as the block ops are handled by the device. This is also true
for iSCSI backing of the disk. Remove the check for the local block
device and just forbid snapshot when the disk is of type 'lun'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1073368
There's no need to use it since we have this shiny functions
that even checks for conversion and overflow errors.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
LXC network devices can now be assigned a custom NIC device name on the
container side. For example, this is configured with:
<interface type='network'>
<source network='default'/>
<guest dev="eth1"/>
</interface>
In this example the network card will appear as eth1 in the guest.
https://bugzilla.redhat.com/show_bug.cgi?id=1091866
Add a new boolean 'sparse'. This will be used by the logical backend
storage driver to determine whether the target volume is sparse or not
(also known by a snapshot or thin logical volume). Although setting sparse
to true at creation could be seen as duplicitous to setting during
virStorageBackendLogicalMakeVol() in case there are ever other code paths
between Create and FindLVs that need to know about the volume be sparse.
Use the 'sparse' in a new virStorageBackendLogicalVolWipe() to decide whether
to attempt to wipe the logical volume or not. For now, I have found no
means to wipe the volume without writing to it. Writing to the sparse
volume causes it to be filled. A sparse logical volume is not completely
writeable as there exists metadata which if overwritten will cause the
sparse lv to go INACTIVE which means pool-refresh will not find it.
Access to whatever lvm uses to manage data blocks is not provided by
any API I could find.
Commit 93e82727 introduced numatune_conf.h file that contains
typedefs already defined in domain_conf.h, such as:
- virDomainNumatune
- virDomainNumatunePtr
- virDomainDef
- virDomainDefPtr
As numatune_conf.h is included by domain_conf.h, clang
complains about redefinition of typedef and the build fails.
In order to fix it, drop typedefs already defined by numatume_conf.h
from domain_conf.h.
Coverity complains about the return value of ioctl not being checked.
Even though we carry on when this fails (just like qemu-img does),
we can log an error.
For non-local storage drivers we can't expect to use the "scrub" tool to
wipe the volume. Split the code into a separate backend function so that
we can add protocol specific code later.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1118710
The next patch will move the storage volume wiping code into the
individual backends. This patch splits out the common code to wipe a
local volume into a separate backend helper so that the next patch is
simpler.
When domain is started with numatune memory mode strict and the
nodeset does not include host NUMA node with DMA and DMA32 zones, KVM
initialization fails. This is because cgroup restrict even kernel
allocations. We are already doing numa_set_membind() which does the
same thing, only it does not restrict kernel allocations.
This patch leaves the userspace numa_set_membind() in place and moves
the cpuset.mems setting after the point where monitor comes up, but
before vcpu and emulator sub-groups are created.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Currently, we only bind the whole QEMU domain to memory nodes
specified in nodemask altogether. That, however, doesn't make much
sense when one wants to control from where the memory for particular
guest nodes should be allocated. QEMU allows us to do that by
specifying 'host-nodes' parameter for the 'memory-backend-ram' object,
so let's use that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When qemu switched to using OptsVisitor for -numa parameter, it did
two things in the same patch. One of them is that the numa parameter
is now visible in "query-command-line-options", the second one is that
it enabled using disjoint cpu ranges for -numa specification. This
will be used in later patch.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The numa patch series in qemu adds "memory-backend-ram" object type by
which we can tell whether we can use such objects.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
That can be lately achieved with by having .param == NULL in the
virQEMUCapsCommandLineProps struct.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There were numerous places where numatune configuration (and thus
domain config as well) was changed in different ways. On some
places this even resulted in persistent domain definition not to be
stable (it would change with daemon's restart).
In order to uniformly change how numatune config is dealt with, all
the internals are now accessible directly only in numatune_conf.c and
outside this file accessors must be used.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since there was already public virDomainNumatune*, I changed the
private virNumaTune to match the same, so all the uses are unified and
public API is kept:
s/vir\(Domain\)\?Numa[tT]une/virDomainNumatune/g
then shrunk long lines, and mainly functions, that were created after
that:
sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g'
And to cope with the enum name, I haad to change the constants as
well:
s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g
Last thing I did was at least a little shortening of already long
name:
s/virDomainNumatuneDef/virDomainNumatune/g
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There are many places with numatune-related code that should be put
into special numatune_conf and this patch creates a basis for that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In XML format, by definition, order of fields should not matter, so
order of parsing the elements doesn't affect the end result. When
specifying guest NUMA cells, we depend only on the order of the 'cell'
elements. With this patch all older domain XMLs are parsed as before,
but with the 'id' attribute they are parsed and formatted according to
that field. This will be useful when we have tuning settings for
particular guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Excerpt from the virCommandAddArgBuffer() description: "Correctly
transfers memory errors or contents from buf to cmd."
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch adds support for the QEMU vhost-user feature to libvirt.
vhost-user enables the communication between a QEMU virtual machine
and other userspace process using the Virtio transport protocol.
It uses a char dev (e.g. Unix socket) for the control plane,
while the data plane based on shared memory.
The XML looks like:
<interface type='vhostuser'>
<mac address='52:54:00:3b:83:1a'/>
<source type='unix' path='/tmp/vhost.sock' mode='server'/>
<model type='virtio'/>
</interface>
Signed-off-by: Michele Paolino <m.paolino@virtualopensystems.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1119173 documents that
commit eaba79d was flawed in the implementation of the
VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag when it comes to completing
a blockcopy. Basically, the qemu pivot action is async (the QMP
command returns immediately, but the user must wait for the
BLOCK_JOB_COMPLETE event to know that all I/O related to the job
has finally been flushed), but the libvirt command was documented
as synchronous by default. As active block commit will also be
using this code, it is worth fixing now.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Don't skip wait
loop after pivot.
Signed-off-by: Eric Blake <eblake@redhat.com>
Now that we've finally fixed all the violators, it's time to
enforce that any pointer to a const object is never freed (it
is aliasing some other memory, where the non-const original
should be freed instead). Alas, the code still needs a normal
vs. Coverity version, but at least we are still guaranteeing
that the macro call evaluates its argument exactly once.
I verified that we still get the following compiler warnings,
which in turn halts the build thanks to -Werror on gcc (hmm,
gcc 4.8.3's placement of the ^ for ?: type mismatch is a bit
off, but that's not our problem):
int oops1 = 0;
VIR_FREE(oops1);
const char *oops2 = NULL;
VIR_FREE(oops2);
struct blah { int dummy; } oops3;
VIR_FREE(oops3);
util/virauthconfig.c:159:35: error: pointer/integer type mismatch in conditional expression [-Werror]
VIR_FREE(oops1);
^
util/virauthconfig.c:161:5: error: passing argument 1 of 'virFree' discards 'const' qualifier from pointer target type [-Werror]
VIR_FREE(oops2);
^
In file included from util/virauthconfig.c:28:0:
util/viralloc.h:79:6: note: expected 'void *' but argument is of type 'const void *'
void virFree(void *ptrptr) ATTRIBUTE_NONNULL(1);
^
util/virauthconfig.c:163:35: error: type mismatch in conditional expression
VIR_FREE(oops3);
^
* src/util/viralloc.h (VIR_FREE): No longer cast away const.
* src/xenapi/xenapi_utils.c (xenSessionFree): Work around bogus
header.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add 'nocow' to storage volume xml so that user can have an option
to set NOCOW flag to the newly created volume. It's useful on btrfs
file system to enhance performance.
Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this
bad performance is to turn off COW attributes on VM files. Generally, there
are two ways to turn off COW on btrfs: a) by mounting fs with nodatacow,
then all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.
This patch tries the second way, according to 'nocow' option, it could set
NOCOW flag per file:
for raw file images, handle 'nocow' in libvirt code; for non-raw file images,
pass 'nocow=on' option to qemu-img, and let qemu-img to handle that (requires
qemu-img version >= 2.1).
Signed-off-by: Chunyan Liu <cyliu@suse.com>
In many places we define a variable as a 'const char *' when in fact
we modify it just a few lines below. Or even free it. We should not do
that.
There's one exception though, in xenSessionFree() xenapi_utils.c. We
are freeing the xen_session structure which is defined in
xen/api/xen_common.h public header. The structure contains session_id
which is type of 'const char *' when in fact it should have been just
'char *'. So I'm leaving this unmodified, just noticing the fact in
comment.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When the backing store of a volume wasn't accessible while updating the
volume definition the call would fail altogether. In cases where we
currently (incorrectly) treat remote backing stores as local one this
might lead to strange errors.
Ignore the opening errors until we figure out how to track proper volume
metadata.
Use the backing store parser to properly create the information about a
volume's backing store. Unfortunately as the storage driver isn't
prepared to allow volumes backed by networked filesystems add a
workaround that will avoid changing the XML output.
Rework the apparmor lxc profile abstraction to mimic ubuntu's container-default.
This profile allows quite a lot, but strives to restrict access to
dangerous resources.
Removing the explicit authorizations to bash, systemd and cron files,
forces them to keep the lxc profile for all applications inside the
container. PUx permissions where leading to running systemd (and others
tasks) unconfined.
Put the generic files, network and capabilities restrictions directly
in the TEMPLATE.lxc: this way, users can restrict them on a per
container basis.
Rename linuxDomainInterfaceStats to virNetInterfaceStats in order
to allow adding platform specific implementations without
making consumer worrying about specific implementation to be used.
Also, rename util/virstatslinux.c to util/virstats.c so placing
other platform specific implementations into this file don't
look unexpected from the file name.
Code logic in libxlDomainAttachDeviceFlags and libxlDomainDetachDeviceFlags
is wrong with return value in error cases.
'ret' was being set to 0 if 'flags & VIR_DOMAIN_DEVICE_MODIFY_CONFIG' was
false. Then if something like virDomainDeviceDefParse() failed in the
VIR_DOMAIN_DEVICE_MODIFY_LIVE logic, the error would be reported but the
function would return success.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
4cc1f1a01f introduced a crash when doing a
block copy as virStorageSourceInitChainElement was called on
"disk->mirror" that is still NULL at that point instead of "mirror"
which temporarily holds the mirror source struct until it's fully
initialized. This resulted into a crash as a NULL was dereferenced.
Reported by: Shanzi Yu <shyu@redhat.com>
Commit id '3ea661de' refactored the code to use the 'disk->src->path'
instead of getting the path from virDomainDiskGetSource(). The one
call to qemuOpenFile() didn't use the disk source path, rather it used
the path as passed from the caller (in this case 'vda') - this caused
a failure with the virt-test/tp-libvirt as follows:
$ virsh domblkinfo virt-tests-vm1 vda
error: cannot stat file '/home/virt-test/shared/data/images/jeos-20-64.qcow2': Bad file descriptor
$
If the openvswitch service is stopped, and is followed by destroying a
VM, the openvswitch bridge translates into a state where it doesn't
recover the port configuration. While it successfully fetches data
from the internal DB, since the corresponding virtual interface does
not exists anymore the whole recovery process fails leaving restarted
VM with inability to connect to the bridge. The following set of
commands will trigger the problem:
virsh start vm
service openvswitch-switch stop
virsh destroy vm
service openvswitch-switch start
virsh start vm
Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of allocating the virSecurityLabelDef structure ourselves, we
can utilize virSecurityLabelDefNew which even sets the default values
for us.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1113860
We've always done that. Well, until 990e46c45. Point is, if we don't
format model, we may lose a domain on libvirtd restart. If the
seclabel is implicit however, we should skip it's formatting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Snapshots and block-copy have a flag that forces qemu to re-use existing
file. Our docs weren't exactly clear on what the existing file should
contain for this to actually work.
Re-word the docs a bit to state that the file needs to be pre-created in
the desired format and the backing chain metadata needs to be set prior
to handing it over to qemu.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1084360
Commit dae1568c6c converted the perms
member of the virStorageVolTarget struct into a pointer to make it
optional. But virStorageVolTargetDefFormat did not check perms for
NULL before dereferencing it.
Don't fail when there is nothing to do, as a tweak to the previous
patch regarding output of libvirt-UUID.files for LXC apparmor profiles
Signed-off-by: Eric Blake <eblake@redhat.com>
This was converted to a typedef in 5a2bd4c917 "conf: more enum
cleanups in "src/conf/domain_conf.h"" causing:
libxl/libxl_conf.c: In function 'libxlDiskSetDiscard':
libxl/libxl_conf.c:724:19: error: conversion to incomplete type
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
In lxc, we could not use setmem command
with --config options.
This patch will add support for this.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1066894
With current code it's possible to have for instance:
virsh dumpxml mydomain | grep seclabel
<seclabel type='dynamic' model='selinux' relabel='yes'/>
<seclabel type='dynamic' model='selinux' relabel='yes'/>
<seclabel type='dynamic' model='selinux' relabel='yes'/>
<seclabel type='dynamic' model='selinux' relabel='yes'/>
<seclabel type='dynamic' model='selinux' relabel='yes'/>
what doesn't make any sense. We should reject the XML in the config
parsing phase.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This negation in names of boolean variables is driving me insane. The
code is much more readable if we drop the 'no-' prefix. Well, at least
for me.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For non-local storage drivers we can't expect to use the FDStream
backend for up/downloading volumes. Split the code into a separate
backend function so that we can add protocol specific code later.
This saves a few lines of code and catches the error when:
<spice autoport ='yes' defaultMode='any' ..>
<channel name='main' mode='secure'/>
</spice>
is specified with spice_tls = 0 in qemu.conf.
Instead of this error in qemuBuildGraphicsSPICECommandLine:
error: unsupported configuration: spice secure channels set in XML
configuration, but TLS port is not provided
an error is reported in qemuProcessSPICEAllocatePorts:
error: unsupported configuration: Auto allocation of spice TLS port
requested but spice TLS is disabled in qemu.conf
Inspired by:
https://www.redhat.com/archives/libvir-list/2014-June/msg01408.html
When creating cgroups for vcpu and emulator threads whilst starting a
domain, we explicitly skip creating those cgroups in case priv->cgroup
is NULL (cgroups not supported) because SetAffinity() serves the same
purpose. If the host supports only some cgroups (the ones we need are
either unmounted or disabled in qemu.conf), we error out with weird
message even though we could continue starting the domain.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1097028
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The commit referenced above changed function arguments of
virStorageFileGetMetadataFromBuf() but didn't tweak the
ATTRIBUTE_NONNULL tied to them. This was caught by coverity as it
actually obeys them. We disabled them for GCC and thus it didn't show
up.
Additionally in commit 3ea661deea I passed
NULL to the backingFormat argument which was also marked as nonnull. Use
a dummy int's address when the argument isn't supplied so that the code
doesn't need to change much.
Split out checking of invalid metadata type from the switch statement so
that we can use the typecasted enum value to allow tracking addition of
new items by the compliler.
Also avoids two dead-code break statements.
The default graphics channel mode is 'any', so as to defaultMode attribute.
If defaultMode and channel mode are all the default value 'any',
qemuConnectDomainXMLToNative will set TLSPort.
But in qemuBuildGraphicsSPICECommandLine, if spice_tls is not enabled, libvirtd
will report an error to tell the user that spice TLS is disabled in qemu.conf.
So qemuConnectDomainXMLToNative should check spice_tls is enabled,
then decide to allocate an tlsPort number to this graphics.
If user specified defaultMode is 'secure', qemuConnectDomainXMLToNative
could allocate tlsPort, and then let qemuBuildGraphicsSPICECommandLine reports
the spice_tls disabled error.
The related bug is:
https://bugzilla.redhat.com/show_bug.cgi?id=1113868
Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Now that cgroups/security driver/locking driver support labelling of
individual images and tolerate network storage we don't have to refrain
from passing all image files to it. This allows removing the checking
code as we already make sure that the snapshot function won't be called
with unsupported options.
Now that security, cgroup and locking APIs support working on individual
images and we track the backing chain security info on a per-image basis
we can finally kill swapping the disk source in virDomainDiskDef and use
the virStorageSource directly.
Until now we were changing information about the disk source via
multiple steps of copying data. Now that we changed to a pointer to
store the disk source we might use it to change the approach to track
the data.
Additionally this will allow proper tracking of the backing chain.
When pivoting to a new disk source after a block commit (and possibly
after a soon-to-be-added active block commit) we changed just a few
fields to the new target. In case we'd copy a network disk to a local
file we'd not change the type properly.
To avoid such problems, switch to tracking of the source via changing of
the complete source struct to the one tracking the mirroring info.
Use the source struct and the corresponding function so that we can
avoid using the path separately. Now that
qemuDomainPrepareDiskChainElementPath isn't use anywhere, we can safely
remove it.
Additionally, the removal fixes a misaligned comment as the removed
function was added under a comment for a different function.
Add security driver functions to label separate storage images using the
virStorageSource definition. This will help to avoid the need to do ugly
changes to the disk struct and use the source directly.
Add functions that will allow to set all the required cgroup stuff on
individual images taking a virStorageSourcePtr. Also convert functions
designed to setup whole backing chain to take advantage of the change.
When dispatching events from the event loop, the array of registered
handles is searched to see what handles happened an event on. However,
the array is searched in weird way: the check for the array boundaries
is at the end, so we may touch the elements after the end of the
array:
==10434== Invalid read of size 4
==10434== at 0x52D06B6: virEventPollDispatchHandles (vireventpoll.c:486)
==10434== by 0x52D10E4: virEventPollRunOnce (vireventpoll.c:660)
==10434== by 0x52CF207: virEventRunDefaultImpl (virevent.c:308)
==10434== by 0x1639D1: virNetServerRun (virnetserver.c:1139)
==10434== by 0x1220DC: main (libvirtd.c:1507)
==10434== Address 0xc11ff04 is 4 bytes after a block of size 960 alloc'd
==10434== at 0x4C2CA5E: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10434== by 0x52AD378: virReallocN (viralloc.c:245)
==10434== by 0x52AD46E: virExpandN (viralloc.c:294)
==10434== by 0x52AD5B1: virResizeN (viralloc.c:352)
==10434== by 0x52CF2EC: virEventPollAddHandle (vireventpoll.c:116)
==10434== by 0x52CEF5B: virEventAddHandle (virevent.c:78)
==10434== by 0x11F69A90: nodeStateInitialize (node_device_udev.c:1797)
==10434== by 0x53C3C89: virStateInitialize (libvirt.c:743)
==10434== by 0x120563: daemonRunStateInit (libvirtd.c:919)
==10434== by 0x5317719: virThreadHelper (virthread.c:197)
==10434== by 0x8376F39: start_thread (in /lib64/libpthread-2.17.so)
==10434== by 0x8A7F9FC: clone (in /lib64/libc-2.17.so)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit a48f445100 introduced a helper
function to convert cgroup device mode to string. The function was only
conditionally compiled on platforms that support cgroup. This broke the
build when attempting to export the symbol:
CCLD libvirt.la
Cannot export virCgroupGetDevicePermsString: symbol not defined
Move the function out of the ifdef, as it doesn't really depend on the
cgroup code being present.
In libxlDomainMigrationConfirm(), a transient domain is removed
from the domain list after successful migration. Later in cleanup,
the domain object is unlocked, resulting in a crash
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb4208ed700 (LWP 12044)]
0x00007fb4267251e6 in virClassIsDerivedFrom (klass=0xdeadbeef,
parent=0x7fb42830d0c0) at util/virobject.c:169
169 if (klass->magic == parent->magic)
(gdb) bt
0 0x00007fb4267251e6 in virClassIsDerivedFrom (klass=0xdeadbeef,
parent=0x7fb42830d0c0) at util/virobject.c:169
1 0x00007fb42672591b in virObjectIsClass (anyobj=0x7fb4100082b0,
klass=0x7fb42830d0c0) at util/virobject.c:365
2 0x00007fb42672583c in virObjectUnlock (anyobj=0x7fb4100082b0)
at util/virobject.c:338
3 0x00007fb41a8c7d7a in libxlDomainMigrationConfirm (driver=0x7fb4100404c0,
vm=0x7fb4100082b0, flags=1, cancelled=0) at libxl/libxl_migration.c:583
Fix by setting the virDomainObjPtr to NULL after removing it from
the domain list.
During migration, the libxl driver starts a modify job in the
begin phase, ending the job in the confirm phase. This is
essentially VIR_MIGRATE_CHANGE_PROTECTION semantics, but the
driver does not support that flag. Without CHANGE_PROTECTION
support, the job would never be terminated in error conditions
where migrate confirm phase is not executed. Further attempts
to modify the domain would result in failure to acquire a job
after LIBXL_JOB_WAIT_TIME.
Similar to the qemu driver, end the job in the begin phase.
Protecting the domain object across all phases of migration can
be done in a future patch adding CHANGE_PROTECTION support.
In libxlDomainMigrationPrepare(), a new virDomainObj is created
from the incoming domain def and added to the driver's domain
list, but never removed if there are subsequent failures during
the prepare phase.
targethost# virsh list --all
sourcehost# virsh migrate --live dom xen+ssh://targethost/system
error: operation failed: Fail to create socket for incoming migration.
targethost# virsh list --all
error: Failed to list domains
error: name in virGetDomain must not be NULL
After adding code to remove the domain on prepare failure, noticed
that libvirtd crashed due to double free of the virDomainDef. Similar
to the qemu driver, pass a pointer to virDomainDefPtr so it can be set
to NULL once a virDomainObj is created from it.
Qemu will fallback to aio=threads when the cache mode doesn't use
O_DIRECT, even if aio=native was explictly set.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1086704
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Cgroups code uses VIR_CGROUP_DEVICE_* flags to specify the mode but in
the end it needs to be converted to a string. Add a helper to do it and
use it in the cgroup code before introducing it into the rest of the
code.
When discovering a disk backing chain the parent disk's metadata need to
be populated into the guest images so that each piece of the backing
chain contains a copy of those. This will allow us to refactor the
security driver so that it will not need to carry around the original
disk definition.
We are going to modify storage source chains in place. Add a helper that
will copy relevant information such as security labels to the new
element if that doesn't contain it.
In the future we might need to track state of individual images. Move
the readonly and shared flags to the virStorageSource struct so that we
can keep them in a per-image basis.
Some of the further changes will propagate seclabels from a disk source
element into the backing store elements. This would change the XML
output of the backing store as the seclabels would be formatted for each
backing store element. Skip the seclabels formatting until we decide
that it's necessary.
Now that we are able to select images from the backing chain via indexed
access we should also convert possible network sources to
qemu-compatible strings before passing them to qemu.
Now that we are able to select images from the backing chain via indexed
access we should also convert possible network sources to
qemu-compatible strings before passing them to qemu.
Introduce flag for the block rebase API to allow the rebase operation to
leave the chain relatively addressed. Also adds a virsh switch to enable
this behavior.
Introduce flag for the block commit API to allow the commit operation to
leave the chain relatively addressed. Also adds a virsh switch to enable
this behavior.
The qemu block info function relied on working with local storage. Break
this assumption by adding support for remote volumes. Unfortunately we
still need to take a hybrid approach as some of the operations require a
filedescriptor.
Previously you'd get:
$ virsh domblkinfo gl vda
error: cannot stat file '/img10': Bad file descriptor
Now you get some stats:
$ virsh domblkinfo gl vda
Capacity: 10485760
Allocation: 197120
Physical: 197120
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1110198
To allow reusing this function in the qemu driver we need to allow
specifying the storage format. Also separate return of the backing store
path now isn't necessary.
For inactive domains, set both current and maximum memory
to the specified 'maximum memory' value.
This matches the behavior of QEMU driver's SetMaxMemory.
https://bugzilla.redhat.com/show_bug.cgi?id=1091132
For the regular dump operation we migrate the VM to a file. This won't
work when the VM has passthrough devices assigned. Rather than reporting
a cryptic error from qemu run our check whether it can be migrated.
This does not influence the memory-only dump that is allowed with
passthrough devices.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=874418
We invoked virCgroupHasController twice for checking
VIR_CGROUP_CONTROLLER_DEVICES
in lxcDomainAttachDeviceDiskLive.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
To allow changing the name that is recorded in the top of the current
image chain used in a block pull/rebase operation, we need to specify
the backing name to qemu. This is done via the "backing-file" attribute
to the block-stream commad.
To allow changing the name that is recorded in the overlay of the TOP
image used in a block commit operation, we need to specify the backing
name to qemu. This is done via the "backing-file" attribute to the
block-commit command.
This command allows to change the backing file name recorded in the
metadata of a qcow (or other) image. The capability also notifies that
the "block-stream" and "block-commit" commands understand the
"backing-file" attribute.
Extract common operations done when creating an audit message to a
separate generic function that can be reused and convert RNG, disk, FS
and net audit to use it.
There's a lot of places where we skip doing actions based on the
locality of given storage type. The usual pattern is to skip it if:
virStorageSourceGetActualType(src) == VIR_STORAGE_TYPE_NETWORK
Add a simple helper to simplify the pattern to
virStorageSourceIsLocalStorage(src)
Replace the authType, chap, and cephx unions in virStoragePoolSource
with a single pointer to a virStorageAuthDefPtr. Adjust all users of
the previous chap/cephx and secret unions with the source->auth data.
Replace the inline "auth" struct in virStorageSource with a pointer
to a virStorageAuthDefPtr and utilize between the domain_conf, qemu_conf,
and qemu_command sources for finding the auth data for a domain disk
Introduce virStorageAuthDef and friends. Future patches will merge/utilize
their view of storage source/pool auth/secret definitions.
New API's include:
virStorageAuthDefParse: Parse the "<auth/>" XML data for either the
domain disk or storage pool returning a
virStorageAuthDefPtr
virStorageAuthDefCopy: Copy a virStorageAuthDefPtr - to be used by
the qemuTranslateDiskSourcePoolAuth when it
copies storage pool auth data into domain
disk auth data
virStorageAuthDefFormat: Common output of the "<auth" in the domain
disk or storage pool XML
virStorageAuthDefFree: Free memory associated with virStorageAuthDef
Subsequent patches will utilize the new functions for the domain disk and
storage pools.
Future work in the hostdev pass through can then make use of common data
structures and code.
Use the probing functionality added in the last patch to turn on
a capability bit when active commit is present, and gate active
commit on that capability.
For my own reference: the difference between BLOCKJOB_SYNC and
BLOCKJOB_ASYNC is whether qemu generated an event at the
conclusion of blockpull; basically, RHEL 6.2 was the only release
of qemu that has the sync semantics and lacks the event. RHEL
6.3 added blockcopy, but also picked up on the upstream style
of qemu generating events. As no one is likely to backport
active commit to RHEL 6.2, it's safe for blockcommit to always
require async blockjob support.
Modifying qemucapabilitiestest is painful; the .replies files would
be so much easier if they had comments correlating which command
generated the given reply. Maybe I'll fix that up later...
* src/qemu/qemu_capabilities.h (QEMU_CAPS_ACTIVE_COMMIT): New
capability.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Use the new bit
* src/qemu/qemu_capabilities.c (virQEMUCaps): Name the new bit.
(virQEMUCapsProbeQMPCommands): Set it.
* tests/qemucapabilitiesdata/caps_1.3.1-1.replies: Update.
* tests/qemucapabilitiesdata/caps_1.4.2-1.replies: Likewise.
* tests/qemucapabilitiesdata/caps_1.5.3-1.replies: Likewise.
* tests/qemucapabilitiesdata/caps_1.6.0-1.replies: Likewise.
* tests/qemucapabilitiesdata/caps_1.6.50-1.replies: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
We are about to turn on support for active block commit. Although
qemu 2.0 was the first version to mostly support it, that version
mis-handles 0-length files, and doesn't have anything available for
easy probing. But qemu 2.1 fixed bugs, and made life simpler by
letting the 'top' argument be optional. Unless someone begs for
active commit with qemu 2.0, for now we are just going to enable
it only by probing for qemu 2.1 behavior (anyone backporting active
commit can also backport the optional argument behavior). This
requires qemu.git commit 7676e2c597000eff3a7233b40cca768b358f9bc9.
Although all our actual uses of block-commit supply arguments for
both base and top, we can omit both arguments and use a bogus
device string to trigger an interesting behavior in qemu. All QMP
commands first do argument validation, failing with GenericError
if a mandatory argument is missing. Once that passes, the code
in the specific command gets to do further checking, and the qemu
developers made sure that if device is the only supplied argument,
then the block-commit code will look up the device first, with a
failure of DeviceNotFound, before attempting any further argument
validation (most other validations fail with GenericError). Thus,
the category of error class can reliably be used to decipher
whether the top argument was optional, which in turn implies a
working active commit. Since we expect our bogus device string to
trigger an error either way, the code is written to return a
distinct return value without spamming the logs.
* src/qemu/qemu_monitor.h (qemuMonitorSupportsActiveCommit): New
prototype.
* src/qemu/qemu_monitor.c (qemuMonitorSupportsActiveCommit):
Implement it.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Allow NULL for top and base, for probing purposes.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise, implementing the probe.
* tests/qemumonitorjsontest.c (mymain): Enable...
(testQemuMonitorJSONqemuMonitorSupportsActiveCommit): ...a new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
So far only information on disks and host devices are exposed in the
capabilities XML. Well, at least something. Even a new test is
introduced. The qemu capabilities are stolen from already existing
qemucapabilities test. There's one tricky point though. Functions that
checks host's KVM and VFIO capabilities, are impossible to mock
currently. So in the test, we are setting the capabilities by hand.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Sometimes it may be useful to get a default machine for given qemu
binary. Fortunately, the default machine is stored always on the first
position in the supported machines array.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This internal API is meant to answer the question 'Is this machine
type supported by given qemu?'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The API may come handy if somebody has an architecture and wants to
look through available qemus if the architecture is supported or not.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This new module holds and formats capabilities for emulator. If you
are about to create a new domain, you may want to know what is the
host or hypervisor capable of. To make sure we don't regress on the
XML, the formatting is not something left for each driver to
implement, rather there's general format function.
The domain capabilities is a lockable object (even though the locking
is not necessary yet) which uses reference counter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the lastest rework (9e7ecabf) a cleanup label was left over which
results in compilation error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Replace:
if (virBufferError(&buf)) {
virBufferFreeAndReset(&buf);
virReportOOMError();
...
}
with:
if (virBufferCheckError(&buf) < 0)
...
This should not be a functional change (unless some callers
misused the virBuffer APIs - a different error would be reported
then)
So far, we only report an error if formatting the siblings bitmap
in NUMA topology fails.
Be consistent and always report error in virCapabilitiesFormatXML.
Check if the buffer is in error state and report an error if it is.
This replaces the pattern:
if (virBufferError(buf)) {
virReportOOMError();
goto cleanup;
}
with:
if (virBufferCheckError(buf) < 0)
goto cleanup;
Document typical buffer usage to favor this.
Also remove the redundant FreeAndReset - if an error has
been set via virBufferSetError, the content is already freed.
https://bugzilla.redhat.com/show_bug.cgi?id=1086121
We now support startupPolicy='optional' for disks, but this
should work only for cold boot, not for restore or migrate.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The comments for lxcDomainCreateXMLWithFiles are out of date. So update them.
And add comments for lxcDomainCreateXML
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Yue wenyuan <yuewenyuan@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>