The virDomainDefParse* and virDomainDefFormat* methods both
accept the VIR_DOMAIN_XML_* flags defined in the public API,
along with a set of other VIR_DOMAIN_XML_INTERNAL_* flags
defined in domain_conf.c.
This is seriously confusing and error-prone for a number of
reasons:
- VIR_DOMAIN_XML_SECURE, VIR_DOMAIN_XML_MIGRATABLE and
VIR_DOMAIN_XML_UPDATE_CPU are only relevant for the
formatting operation
- Some of the VIR_DOMAIN_XML_INTERNAL_* flags only apply
to parse or to format, but not both.
This patch cleanly separates out the flags. There are two
distinct VIR_DOMAIN_DEF_PARSE_* and VIR_DOMAIN_DEF_FORMAT_*
flags that are used by the corresponding methods. The
VIR_DOMAIN_XML_* flags received via public API calls must
be converted to the VIR_DOMAIN_DEF_FORMAT_* flags where
needed.
The various calls to virDomainDefParse which hardcoded the
use of the VIR_DOMAIN_XML_INACTIVE flag are changed to use the
VIR_DOMAIN_DEF_PARSE_INACTIVE flag.
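In sketch form, the separation looks roughly like this (the flag
names follow the patch; the values and the conversion helper are
illustrative only):

  typedef enum {
      VIR_DOMAIN_DEF_PARSE_INACTIVE = 1 << 0,
      /* ... further parse-only flags ... */
  } virDomainDefParseFlags;

  typedef enum {
      VIR_DOMAIN_DEF_FORMAT_INACTIVE   = 1 << 0,
      VIR_DOMAIN_DEF_FORMAT_SECURE     = 1 << 1,
      VIR_DOMAIN_DEF_FORMAT_MIGRATABLE = 1 << 2,
      /* ... further format-only flags ... */
  } virDomainDefFormatFlags;

  /* Illustrative conversion at the public API boundary. */
  static unsigned int
  virDomainDefFormatConvertXMLFlags(unsigned int xmlflags)
  {
      unsigned int formatFlags = 0;

      if (xmlflags & VIR_DOMAIN_XML_SECURE)
          formatFlags |= VIR_DOMAIN_DEF_FORMAT_SECURE;
      if (xmlflags & VIR_DOMAIN_XML_MIGRATABLE)
          formatFlags |= VIR_DOMAIN_DEF_FORMAT_MIGRATABLE;
      if (xmlflags & VIR_DOMAIN_XML_INACTIVE)
          formatFlags |= VIR_DOMAIN_DEF_FORMAT_INACTIVE;

      return formatFlags;
  }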
https://bugzilla.redhat.com/show_bug.cgi?id=1135339 documents some
confusing behavior when a user tries to start an inactive block
commit in a second connection while there is already an on-going
active commit from a first connection. Eventually, qemu will
support multiple simultaneous block jobs, but as of now, it does
not; furthermore, libvirt also needs an overhaul before we can
support simultaneous jobs. So, the best way to avoid confusing
ourselves is to quit relying on qemu to tell us about the situation
(where we risk getting in weird states) and instead forbid a
duplicate block commit ourselves.
Note that we are still relying on qemu to diagnose attempts to
interrupt an inactive commit (since we only track XML of an active
commit), but as inactive commit is less confusing for libvirt to
manage, there is less that can go wrong by leaving that detection
up to qemu.
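A hedged sketch of the hoisted check (the exact condition used in
qemu_driver.c may differ):

  /* Fail early, before asking qemu, if libvirt already tracks a
   * block job on this disk. */
  if (disk->mirror) {
      virReportError(VIR_ERR_BLOCK_COPY_ACTIVE,
                     _("disk '%s' already in active block job"),
                     disk->dst);
      goto endjob;
  }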
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Hoist check for
active commit to occur earlier outside of conditions.
Signed-off-by: Eric Blake <eblake@redhat.com>
QEMU supports specifying CPU features together with -cpu host, but we
simply skip using that capability. Since QEMU developers themselves
would like to use this feature, this patch modifies the code to make
it work.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178850
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The default value should be 16 MiB instead of 8 MiB. Only really old
versions of upstream QEMU used 8 MiB as the default VGA framebuffer
size.
Without this change, if you update to a libvirt that introduces the
"vgamem" attribute for the QXL video device, the value will be set to
8 MiB, whereas previously your guest had 16 MiB because we didn't pass
any value on the QEMU command line, which means QEMU used its own
16 MiB default.
This affects all users whose guests have a display resolution higher
than 1920x1080.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In one of my previous commits (311b4a67) I tried to allow passing
regular system pages to <hugepages>. However, there was a little bug
that wasn't caught. If a domain has a guest NUMA topology defined, the
qemuBuildNumaArgStr() function takes care of generating the
corresponding command line, and the hugepages backing for guest NUMA
nodes is handled there too. And here comes the bug: the hugepages
setting from the XML is stored internally in KiB, while the system
page size was queried and stored in bytes. So the check whether the
two are equal failed even when it shouldn't have.
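A minimal sketch of the mismatch and the fix (field names
approximate):

  bool useSystemPages = false;
  unsigned long long hugepage_kib = def->mem.hugepages[i].size; /* KiB   */
  unsigned long long system_page  = sysconf(_SC_PAGESIZE);      /* bytes */

  /* Broken: compares KiB against bytes, so 4 != 4096. */
  if (hugepage_kib == system_page)
      useSystemPages = true;

  /* Fixed: normalize both sides to KiB before comparing. */
  if (hugepage_kib == system_page / 1024)
      useSystemPages = true;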
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In commit 540c339a25 the whole domain
reference counting was refactored in the qemu driver. Domain jobs no
longer need to reference the domain object, as they now expect the
reference to be held by the calling function.
However, the patch forgot to remove the unref call on the path where
we exit the monitor after having acquired a nested job. This caused
the daemon to crash on a subsequent access to the domain object once
we'd done an operation requiring a nested job for monitor access.
An easy reproducer case:
1) Start a vm with qcow disks
2) virsh snapshot-create-as DOMNAME
3) virsh dumpxml DOMNAME
4) daemon crashes in a semi-random spot while accessing a now-removed VM
object.
Fortunately, the commit wasn't released yet, so there are no security
implications.
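The shape of the fix, in sketch form (function signature approximate,
not the literal diff):

  /* Before: the error path dropped a reference it did not own. */
  if (qemuDomainObjBeginNestedJob(driver, obj, asyncJob) < 0) {
      virObjectUnref(obj);   /* wrong: the caller holds this reference */
      return -1;
  }

  /* After: simply propagate the failure; the caller unrefs. */
  if (qemuDomainObjBeginNestedJob(driver, obj, asyncJob) < 0)
      return -1;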
Reported-by: Shanzi Yu <shyu@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1177723
When setting new bandwidth limits via
virDomainSetInterfaceParameters, the old ones are cleared first.
However, if setting the new ones fails, the old ones are already gone
and the interface is left in an inconsistent state. Therefore, right
before failing we ought to try to restore the old bandwidth.
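In sketch form (error handling trimmed, helper signature
approximate):

  /* Remember the old limits so they can be put back if applying the
   * new ones fails. */
  virNetDevBandwidthPtr oldBandwidth = net->bandwidth;

  if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) {
      ignore_value(virNetDevBandwidthSet(net->ifname, oldBandwidth, false));
      goto cleanup;
  }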
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1178652
We get a warning when we have a guest in paused
status (caused by a kernel panic) and restart libvirtd; the
warning message looks like this:
Qemu reported unknown VM status: 'guest-panicked'
This is because we set a wrong status name in
qemu_monitor.c; from qemu's qapi-schema.json file
we know this status should be named 'guest-panicked'.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
There is one problem that causes various errors in the daemon. When a
domain is waiting for a job, it is unlocked while waiting on the
condition. However, if that domain is for example transient and being
removed in another API (e.g. cancelling incoming migration), it gets
unref'd. If the first call, the one that was waiting, fails to get the
job, it unrefs the domain object, and because that was the last
reference, it causes clearing of the whole domain object. However,
when finishing the call, the domain must be unlocked, but there is no
way for the API to know whether it was cleaned up or not (unless there
is some ugly temporary variable, but let's scratch that).
The root cause is that our APIs don't ref the objects they are using
and all use the implicit reference that the object has while it is in
the domain list. That reference can be removed while the API is
waiting for a job. And because the APIs don't do their own ref'ing, it
results in the ugly checking of the return value of virObjectUnref()
that we have everywhere.
This patch changes qemuDomObjFromDomain() to ref the domain (using
virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
should be the only function in which the return value of
virObjectUnref() is checked. This makes all reference counting
deterministic and makes the code a bit clearer.
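The resulting calling convention, in sketch form:

  virDomainObjPtr vm = NULL;
  int ret = -1;

  /* Returns the object locked *and* referenced. */
  if (!(vm = qemuDomObjFromDomain(dom)))
      return -1;

  /* ... waiting for a job can no longer free vm under our feet ... */

  qemuDomObjEndAPI(&vm);   /* unlock + unref in one place; vm set to NULL */
  return ret;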
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Although QMP returns info about vCPU threads in TCG mode, the
data it returns is mostly lies. Only the first vCPU has a valid
thread_id returned. The thread_id given for the other vCPUs is
in fact the main emulator thread. All vCPUs actually run under
the same thread in TCG mode.
Our vCPU pinning code is not at all able to cope with this,
so if you try to set CPU affinity per-vCPU you end up with
weird errors
error: Failed to start domain instance-00000007
error: cannot set CPU affinity on process 24365: Invalid argument
Since few people will care about the performance of TCG with
strict CPU pinning, let's just disable that for now, so we get
a clear error message
error: Failed to start domain instance-00000007
error: Requested operation is not valid: cpu affinity is not supported
The code assumes that def->vcpus == nvcpupids, so when we set up
fake CPU pids for old QEMU with nvcpupids == 1, we cause the
later code to read off the end of the array. This has fun results
like sched_setaffinity(0, ...), which changes libvirtd's own CPU
affinity, or even better sched_setaffinity($RANDOM, ...), which
changes the affinity of a random OS process.
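The guard, in sketch form (names approximate):

  /* Only attempt per-vCPU pinning when we really have one pid per
   * vCPU; otherwise nvcpupids == 1 and vcpupids[i] reads past the
   * end of the array. */
  if (priv->nvcpupids != def->vcpus) {
      virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                     _("cpu affinity is not supported"));
      return -1;
  }

  for (i = 0; i < def->vcpus; i++) {
      if (virProcessSetAffinity(priv->vcpupids[i], cpumask) < 0)
          return -1;
  }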
Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397
QEMU BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1170093
In qemu there are two interesting arguments:
1) -numa to create a guest NUMA node
2) -object memory-backend-{ram,file} to tell qemu which memory
region on which host's NUMA node it should allocate the guest
memory from.
Combining these two together we can instruct qemu to create a
guest NUMA node that is tied to a host NUMA node. And it works
just fine. However, depending on the machine type used, there might
be some issues during migration when OVMF is enabled (see the QEMU
BZ). While this truly is a QEMU bug, we can help avoid it. The
problem lies somewhere within the memory backend objects. Having
said that, the fix on our side consists of putting those objects on
the command line if and only if needed. For instance, while
previously we would construct this (in all ways correct) command
line:
-object memory-backend-ram,size=256M,id=ram-node0 \
-numa node,nodeid=0,cpus=0,memdev=ram-node0
now we create just:
-numa node,nodeid=0,cpus=0,mem=256
because the backend object is obviously not tied to any specific
host NUMA node.
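In sketch form, the per-node decision (the condition is illustrative;
the real test in qemuBuildNumaArgStr() inspects host-node bindings,
memory access mode and hugepage backing):

  if (needBackend) {
      virCommandAddArg(cmd, "-object");
      virCommandAddArgFormat(cmd,
                             "memory-backend-ram,size=%lluM,id=ram-node%zu",
                             memsize, i);
      virCommandAddArg(cmd, "-numa");
      virCommandAddArgFormat(cmd, "node,nodeid=%zu,cpus=%s,memdev=ram-node%zu",
                             i, cpumask, i);
  } else {
      virCommandAddArg(cmd, "-numa");
      virCommandAddArgFormat(cmd, "node,nodeid=%zu,cpus=%s,mem=%llu",
                             i, cpumask, memsize);
  }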
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Coverity flagged commit 0282ca45 as introducing a memory leak;
in all my refactoring to make capacity probing conditional on
whether the image is non-raw, I missed deleting the unconditional
probe.
* src/qemu/qemu_driver.c (qemuStorageLimitsRefresh): Drop
redundant assignment.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174569
There's nothing we need to do for shared iSCSI devices in
qemuAddSharedHostdev and qemuRemoveSharedHostdev. The iSCSI layer
takes care of that for us.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Wire up backing chain recursion. For the first time, it is now
possible to get libvirt to expose that qemu tracks read statistics
on backing files, as well as report maximum extent written on a
backing file during a block-commit operation.
For a running domain, where one of the two images has a backing
file, I see the traditional output:
$ virsh domstats --block testvm2
Domain: 'testvm2'
block.count=2
block.0.name=vda
block.0.path=/tmp/wrapper.qcow2
block.0.rd.reqs=1
block.0.rd.bytes=512
block.0.rd.times=28858
block.0.wr.reqs=0
block.0.wr.bytes=0
block.0.wr.times=0
block.0.fl.reqs=0
block.0.fl.times=0
block.0.allocation=0
block.0.capacity=1310720000
block.0.physical=200704
block.1.name=vdb
block.1.path=/dev/sda7
block.1.rd.reqs=0
block.1.rd.bytes=0
block.1.rd.times=0
block.1.wr.reqs=0
block.1.wr.bytes=0
block.1.wr.times=0
block.1.fl.reqs=0
block.1.fl.times=0
block.1.allocation=0
block.1.capacity=1310720000
vs. the new output:
$ virsh domstats --block --backing testvm2
Domain: 'testvm2'
block.count=3
block.0.name=vda
block.0.path=/tmp/wrapper.qcow2
block.0.rd.reqs=1
block.0.rd.bytes=512
block.0.rd.times=28858
block.0.wr.reqs=0
block.0.wr.bytes=0
block.0.wr.times=0
block.0.fl.reqs=0
block.0.fl.times=0
block.0.allocation=0
block.0.capacity=1310720000
block.0.physical=200704
block.1.name=vda
block.1.path=/dev/sda6
block.1.backingIndex=1
block.1.rd.reqs=0
block.1.rd.bytes=0
block.1.rd.times=0
block.1.wr.reqs=0
block.1.wr.bytes=0
block.1.wr.times=0
block.1.fl.reqs=0
block.1.fl.times=0
block.1.allocation=327680
block.1.capacity=786432000
block.2.name=vdb
block.2.path=/dev/sda7
block.2.rd.reqs=0
block.2.rd.bytes=0
block.2.rd.times=0
block.2.wr.reqs=0
block.2.wr.bytes=0
block.2.wr.times=0
block.2.fl.reqs=0
block.2.fl.times=0
block.2.allocation=0
block.2.capacity=1310720000
I may later do a patch that trims the output to avoid 0 stats,
particularly for backing files (which are more likely to have
0 stats, at least for write statistics when no block-commit
is performed). Also, I still plan to expose physical size
information (qemu doesn't expose it yet, so it requires a stat,
and for block devices, a further open/seek operation). But
this patch is good enough without worrying about that yet.
* src/qemu/qemu_driver.c (QEMU_DOMAIN_STATS_BACKING): New internal
enum bit.
(qemuConnectGetAllDomainStats): Recognize new user flag, and pass
details to...
(qemuDomainGetStatsBlock): ...here, where we can do longer recursion.
(qemuDomainGetStatsOneBlock): Output new field.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order to report stats on backing chains, we need to separate
the output of stats for one block from how we traverse blocks.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Split...
(qemuDomainGetStatsOneBlock): ...into new helper.
Signed-off-by: Eric Blake <eblake@redhat.com>
A coming patch will make it optionally possible to list backing
chain block stats; in this mode of operation, block.count is no
longer the number of <disks> in the domain, but the number of
blocks in the array being reported. We still want block.count
listed first, but rather than iterate the tree twice (once to
count, and once to list stats), it's easier to just touch things
up after the fact.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
after the fact.
Signed-off-by: Eric Blake <eblake@redhat.com>
The prior refactoring can now be put to use. With the same domain
as the earlier commit 7b49926 (one qcow2 disk and an empty
cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.0.allocation=1309614080
block.0.capacity=42949672960
block.0.physical=1309671424
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
qemuStorageLimitsRefresh to report offline statistics.
Signed-off-by: Eric Blake <eblake@redhat.com>
Create a helper function that can be reused for gathering block
info from virDomainListGetStats.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Split guts...
(qemuStorageLimitsRefresh): ...into new helper function.
Signed-off-by: Eric Blake <eblake@redhat.com>
The documentation for virDomainBlockInfo was confusing: it stated
that 'physical' was the size of the container, then gave an example
of it being the amount of storage used by a sparse file (that is,
for a sparse raw image on a regular file, the wording implied
capacity==physical, while allocation was smaller; but the example
instead claimed physical==allocation). Since we use 'physical' for
the last offset of a block device, we should do likewise for
regular files.
Furthermore, the example claimed that for a qcow2 regular file,
allocation==physical. At the time the code was first written,
this was true (qcow2 files were allocated sequentially, and were
never sparse, so the last sector written happened to also match
the disk space occupied); but modern qemu does much better and
can punch holes for a qcow2 with allocation < physical.
Basically, after this patch, the three fields are now reliably
mapped as:
'capacity' - how much storage the guest can see (equal to
physical for raw images, determined by image metadata otherwise)
'allocation' - how much storage the image occupies (similar to
what 'du' would report)
'physical' - the last offset of the image (similar to what 'ls'
would report)
'capacity' can be larger than 'physical' (such as for a qcow2
image that does not vary much from a backing file) or smaller
(such as for a qcow2 file with lots of internal snapshots).
Likewise, 'allocation' can be (slightly) larger than 'physical'
(such as counting the tail of cluster allocations required to
round a file size up to filesystem granularity) or smaller
(for a sparse file). A block-resize operation changes capacity
(which, for raw images, also changes physical); many non-raw
images automatically grow physical and allocation as necessary
when starting with an allocation smaller than capacity; and even
when capacity and physical stay unchanged, allocation can change
when converting sectors from holes to data or back.
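For a local regular file, a hedged sketch of how the three values can
be derived (block devices and non-raw metadata probing differ; the
metadata helper is hypothetical):

  struct stat sb;

  if (stat(path, &sb) < 0)
      return -1;

  info->physical   = sb.st_size;             /* what 'ls' reports */
  info->allocation = sb.st_blocks * 512ULL;  /* what 'du' reports */
  info->capacity   = format_is_raw
      ? sb.st_size                           /* raw: guest sees the whole file */
      : capacityFromImageMetadata(path);     /* hypothetical qcow2/etc. probe */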
Note that this does not change semantics for qcow2 images stored
on block devices; there, we still rely on qemu to report the
highest written extent for allocation. So using this API to
track when to extend a block device because a qcow2 image is
about to exceed a threshold will not see any changes.
Also, note that virStorageVolInfo is unfortunately limited to
just 'capacity' and 'allocation' (we can't expand it to add
'physical', although we can expand the XML to add it there);
historically, that struct's 'allocation' value has reported
file size for qcow2 files (what this patch terms 'physical'
for a domain block device), but disk usage for raw files (what
this patch terms 'allocation'). So follow-up patches will be
needed to make storage volumes report the same allocation
values and get at physical values, where those differ.
* include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
documentation to match saner definition.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
files, physical size is capacity, not allocation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Ultimately, we want to avoid read()ing a file while qemu is running.
We still have to open() block devices to determine their physical
size, but that is safer. This patch rearranges code to group
together all code that reads the image, to make it easier for later
patches to skip the metadata collection when possible.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
disk up front. Place metadata reading next to use.
Signed-off-by: Eric Blake <eblake@redhat.com>
When requested in a later patch, the QMP command results are now
examined recursively. As qemu_driver will eventually have to
read items out of the hash table as stored by this patch, the
computation of backing alias string is done in a shared location.
* src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
* src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch will allow recursion into backing chains when
collecting block stats. This patch should not change behavior,
but merely moves out the common code that will be reused once
recursion is enabled, and adds the parameter that will turn on
recursion.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
although it is ignored for now.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
split...
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
(qemuMonitorJSONGetBlockStatsInfo): Update caller.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
* src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, grabbing blockinfo always calls stat on the disk, then
opens the image to determine the capacity, using a throw-away
virStorageSourcePtr. This has a couple of drawbacks:
1. We are calling stat and opening a file on every invocation of
the API. However, there are cases where the stats should NOT be
changing between successive calls (if a domain is running, no
one should be changing the physical size of a block device or raw
image behind our backs; capacity of read-only files should not
be changing; and we are the gateway to the block-resize command
to know when the capacity of read-write files should be changing).
True, we still have to use stat in some cases (a sparse raw file
changes allocation if it is read-write and the amount of holes is
changing, and a read-write qcow2 image stored in a file changes
physical size if it was not fully pre-allocated). But for
read-only images, even this should be something we can remember
from the previous time, rather than repeating every call.
2. We want to enhance the power of virDomainListGetStats, by
sharing code. But we already have a virStorageSourcePtr for
each disk, and it would be easier to reuse the common structure
than to have to worry about the one-off virDomainBlockInfoPtr.
While this patch does not optimize reuse of information in point
1, it does get us closer to being able to do so, by updating a
structure that survives between consecutive calls.
* src/util/virstoragefile.h (_virStorageSource): Add physical, to
mirror virDomainBlockInfo; rearrange fields to match public struct.
(virStorageSourceCopy): Copy the new field.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
storage source, then copy to block info.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order for a future patch to virDomainListGetStats to reuse
some code for determining disk usage of offline domains, we
need to make it easier to pull out part of the guts of grabbing
blockinfo. The current implementation grabs a job fairly late
in the game, while getstats will already own a job; reordering
things so that the job is always grabbed up front in both
functions will make it easier to pull out the common code.
This patch results in grabbing a job in cases where one was not
previously needed, but as it is a query job, it should not be
noticeably slower.
This patch touches the same code as the fix for CVE-2014-6458
(commit b799259); in that patch, we avoided hotplug changing
a disk reference during the time of obtaining a monitor lock
by copying all data we needed and no longer referencing disk;
this patch goes the other way and ensures that by holding the
job, the disk cannot be changed so we no longer need to worry
about the disk being invalidated across the monitor lock.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
control to be outside of disk information.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
not supposed to reset the error. The usual procedure was followed,
saving the error to a temporary variable, but the error was never
freed, merely leaked.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit af2a1f05 tried clearly separating each condition in
qemuRestoreCgroupState() for the sake of readability, however somehow
one condition body went missing. That meant the body of the next
condition got executed only if both conditions were true, which is
impossible, resulting in dead code and a logic error.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When hot-plugging a VCPU into the guest, kvm needs to allocate some data
from the DMA zone, which might be in a memory node that's not allowed in
cpuset.mems. Basically the same problem as there was with starting the
domain and due to which commit 7e72ac7878
exists. This patch just extends it to hotplugging as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1161540
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the cgroup alone and rather set the child data
every time there is new cgroup created. We don't leave any task in the
parent group anyway. This will ease both current and future code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174154
When we use attach-device to add a hostdev or chr device which has an
iSCSI address or similar (e.g. a guest agent channel or a subsys iSCSI
disk), we find there is no basic controller for our newly attached
device. Sometimes this means the guest cannot start after we add them
(although it can start on the second attempt).
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When libvirt is managing a bridge's forwarding database (FDB)
(macTableManager='libvirt'), if we add FDB entries for a new guest
interface even before the qemu process is created, then in the case of
a migration any other guest attached to the "destination" bridge will
have its traffic immediately sent to the destination of the migration
even while the source domain is still running (and the destination, of
course, isn't). To make sure that traffic from other guests on the new
host continues flowing to the old guest until the new one is ready, we
have to wait until the new guest CPUs are started to add the FDB
entries.
Conversely, we need to remove the FDB entries from the bridge any time
the guest CPUs are stopped; among other things, this will assure
proper operation during a post-copy migration (which is just the
opposite of the problem described in the previous paragraph).
We can change the VNC password by using the virDomainUpdateDeviceFlags
API with the live flag, but it can't be changed with the config flag.
The error is reported as below.
error: Operation not supported: persistent update of device 'graphics' is not supported
This patch adds support for changing the graphics arguments with the
config flag.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Changing some graphics arguments with '--live' is not supported.
Replace the error codes VIR_ERR_INTERNAL_ERROR and VIR_ERR_INVALID_ARG
with VIR_ERR_OPERATION_UNSUPPORTED in those paths.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1173507
It occurred to me that OpenStack uses the following XML when not using
regular huge pages:
<memoryBacking>
<hugepages>
<page size='4' unit='KiB'/>
</hugepages>
</memoryBacking>
However, since we are expecting to see huge pages only, we fail to
start the domain with the following error:
libvirtError: internal error: Unable to find any usable hugetlbfs
mount for 4 KiB
While regular system pages are technically not huge pages, our code is
prepared for them, and if it helps OpenStack (or other management
applications) we should cope with them.
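In sketch form (helper names approximate):

  /* Treat a <page> size equal to the system page size as ordinary
   * memory instead of demanding a hugetlbfs mount for it. */
  if (hugepage->size == virGetSystemPageSizeKB()) {
      /* plain RAM allocation; no hugetlbfs lookup needed */
  } else if (!(mountPoint = qemuGetHugepagePath(hugepage->size))) {
      goto error;   /* a real huge page: a mount is required */
  }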
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160995
In our config files users are expected to pass several integer values
for different configuration knobs. However, the majority of them expect
a nonnegative number, and only a few of them accept a negative number
too (notably keepalive_interval in libvirtd.conf).
Therefore, a new config value type is introduced: VIR_CONF_ULONG,
which is set whenever an integer is positive or zero. With this
approach, knobs accepting VIR_CONF_LONG should accept VIR_CONF_ULONG
too.
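In sketch form (field names approximate):

  /* A knob that must be non-negative accepts only the new type... */
  if (p->type != VIR_CONF_ULONG) {
      virReportError(VIR_ERR_INTERNAL_ERROR,
                     _("%s: expected a non-negative number"), filename);
      return -1;
  }

  /* ...while a signed knob accepts either representation. */
  if (p->type != VIR_CONF_LONG && p->type != VIR_CONF_ULONG)
      return -1;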
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We now have a qemuInterfaceStartDevices() which does the final
activation needed for the host-side tap/macvtap devices that are used
for qemu network connections. It will soon make sense to have the
converse qemuInterfaceStopDevices() which will undo whatever was done
during qemuInterfaceStartDevices().
A function to "stop" a single device has also been added, and is
called from the appropriate place in qemuDomainDetachNetDevice(),
although this is currently unnecessary - the device is going to
immediately be deleted anyway, so any extra "deactivation" will be for
naught. The call is included for completeness, though, in anticipation
that in the future there may be some required action that *isn't*
nullified by deleting the device.
This patch is a part of a more complete fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
The patch that added qemuInterfaceStartDevices() (upstream commit
82977058f5) had an extra conditional to
prevent calling it if the reason for starting the CPUs was
VIR_DOMAIN_RUNNING_UNPAUSED or VIR_DOMAIN_RUNNING_SAVE_CANCELED. This
was put in by the author as the result of a reviewer asking if it was
necessary to ifup the interfaces in *all* occasions (because these
were the two cases where the CPU would have already been started (and
stopped) once, so the interface would already be ifup'ed).
It turns out that, as long as there is no corresponding
qemuInterfaceStopDevices() to ifdown the interfaces anytime the CPUs
are stopped, neglecting to ifup when reason is RUNNING_UNPAUSED or
RUNNING_SAVE_CANCELED doesn't cause any problems (because it just
happens that the interface will have already been ifup'ed by a prior
call when the CPU was previously started for some other reason).
However, it also doesn't *help*, and there will soon be a
qemuInterfaceStopDevices() function which *will* ifdown these
interfaces when the guest CPUs are stopped, and once that is done, the
interfaces will be left down in some cases when they should be up (for
example, if a domain is paused and then unpaused).
So, this patch is removing the condition in favor of always calling
qemuInterfaceStartDevices() when the guest CPUs are started.
This patch (and the aforementioned patch) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
A logic bug in qemuConnectGetAllDomainStats makes the code mark the
monitor as available when qemuDomainObjBeginJob fails, instead of when
it succeeds, as the correct flow requires.
This patch fixes the check and updates the code documentation
accordingly.
Broken by commit 57023c0a3a.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Currently, MAC registration occurs during device creation, which is
early enough that, during live migration, you end up with duplicate
MAC addresses on still-running source and target devices, even though
the target device isn't actually being used yet.
This patch proposes to defer MAC registration until right before
the guest can actually use the device -- In other words, right
before starting guest CPUs.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
When a user doesn't have read access on one of the domains he
requested, the for loop could exit abruptly or continue and overwrite
a pointer which pointed to a locked object.
This patch fixes two issues at once. One is that domflags might have
had QEMU_DOMAIN_STATS_HAVE_JOB set even when there was no job started
(this is fixed by doing domflags |= QEMU_DOMAIN_STATS_HAVE_JOB only
when the job was acquired and by clearing domflags on every start of
the loop). The second one is that the domain was kept locked when
virConnectGetAllDomainStatsCheckACL() failed and the loop continued.
Adding a simple virObjectUnlock() and clearing the pointer ought to
do.
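The corrected loop body, in sketch form:

  for (i = 0; i < nvms; i++) {
      virDomainObjPtr vm = vms[i];
      int domflags = 0;                /* reset on every iteration */

      virObjectLock(vm);

      if (!virConnectGetAllDomainStatsCheckACL(conn, vm->def)) {
          virObjectUnlock(vm);         /* previously left locked */
          continue;
      }

      if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) == 0)
          domflags |= QEMU_DOMAIN_STATS_HAVE_JOB;   /* only on success */

      /* ... gather stats, end the job if held, unlock ... */
  }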
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Avoid leaving the domain locked on a failed ACL check in
qemuDomainMigratePerform() and qemuDomainMigrateFinish2().
Introduced in commit abf75aea24 (Add ACL checks into the QEMU driver).
qemuNetworkIfaceConnect() used to have a special case for
actualType='network' (a network with forward mode of route, nat, or
isolated) to call the libvirt public API to retrieve the bridge being
used by a network. That is no longer necessary - since all network
types that use a bridge and tap device now get the bridge name stored
in the ActualNetDef, we can just always use
virDomainNetGetActualBridgeName() instead.
(an audit of the two callers to qemuNetworkIfaceConnect() confirms
that it is never called for any other type of network, so the dead
code in the else statement (logging an internal error if it is called
for any other type of network) is eliminated in the process.)
When libvirt is managing the MAC table of a Linux host bridge, it must
turn off learning and unicast_flood for each tap device attached to
that bridge, then add a Forwarding Database (fdb) entry for the tap
device using the MAC address from the domain interface config.
Once we have disabled learning and flooding, any packet that has a
destination MAC address not present in the fdb will be dropped by the
bridge. This, along with the opportunistic disabling of promiscuous
mode[*], can result in enhanced network performance and a potential
slight security improvement.
[*] If there is only one device on the bridge with learning/unicast_flood
enabled, then that device will automatically have promiscuous mode
disabled. If there are *no* devices with learning/unicast_flood
enabled (e.g. for a libvirt "route", "nat", or isolated network that
has no physical device attached), then all non-tap devices will have
promiscuous mode disabled (tap devices always have promiscuous mode
enabled, which may be a bug in the kernel, but in practice has 0
effect).
None of this has any effect for kernels prior to 3.15 (upstream kernel
commit 2796d0c648c940b4796f84384fbcfb0a2399db84 "bridge: Automatically
manage port promiscuous mode"). Even after that, until kernel 3.17
(upstream commit 5be5a2df40f005ea7fb7e280e87bbbcfcf1c2fc0 "bridge: Add
filtering support for default_pvid") traffic will not be properly
forwarded without manually adding vlan table entries. Unfortunately,
although the presence of the first patch is signalled by existence of
the "learning" and "unicast_flood" options in sysfs, there is no
reliable way to query whether or not the system's kernel has the
second of those patches installed, the only thing that can be done is
to try the setting and see if traffic continues to pass.
I'm about to make block stats optionally more complex to cover
backing chains, where block.count will no longer equal the number
of <disks> for a domain. For these reasons, it is nicer if the
statistics output includes the source path (for local files).
This patch doesn't add anything for network disks, although we
may decide to add that later.
With this patch, I now see the following for the same domain as
in the previous patch (one qcow2 file, and an empty cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.1.name=hdc
* src/libvirt-domain.c (virConnectGetAllDomainStats): Document
new field.
* tools/virsh.pod (domstats): Document new field.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Return the new
stat for local files/block devices.
(QEMU_ADD_NAME_PARAM): Add parameter.
(qemuDomainGetStatsInterface): Update caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
I noticed that for an offline domain, 'virsh domstats --block $dom'
was producing just the domain name, with no stats. But the older
'virsh domblkinfo' works just fine on offline domains. This patch
starts to get us closer, by at least reporting the disk names for
an offline domain.
With this patch, I now see the following for an offline domain
with one qcow2 disk and an empty cdrom drive:
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Don't short-circuit
output of block name.
Signed-off-by: Eric Blake <eblake@redhat.com>
qemuDomainGetStatsBlock() could leak a stats hash table if it
encountered OOM while populating the virTypedParameters.
Oddly, the fix doesn't even touch qemuDomainGetStatsBlock :)
* src/qemu/qemu_driver.c (QEMU_ADD_COUNT_PARAM)
(QEMU_ADD_NAME_PARAM): Don't return early.
(qemuDomainGetStatsInterface): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
If probing capabilities via QMP fails, we now have a check
that prevents us falling back to -help parsing. Unfortunately
the error message
"Failed to probe capabilities for /usr/bin/qemu-kvm:
unsupported configuration: QEMU 2.1.2 is too new for help parsing"
is proving rather unhelpful to the user. We need to be telling
them why QMP failed (the root cause), rather than that they can't
use -help (the side effect).
To do this we should capture stderr during QMP probing, and
if -help parsing then sees a new QEMU version, we know that
QMP should have worked, and so we can show the messages from
stderr. The message thus becomes
"Failed to probe capabilities for /usr/bin/qemu-kvm:
internal error: QEMU / QMP failed: Could not access
KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory"
When attempting to create internal system checkpoint with a passthrough
device qemu will report the following error:
error: operation failed: Error -22 while writing VM
This patch calls the function to check if migration is possible with
given VM and thus improves the error to:
error: Requested operation is not valid: domain has assigned non-USB host devices
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=874418#c19
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Move entering the job into the thread to simplify the program flow. Also
as the code holds a separate reference to the domain object some
conditions can be simplified.
After this patch qemuDomainObjTransferJob is no longer needed so this
patch removes it.
If someone removes the blockcopy storage file while still in the
mirroring phase and then requests a blockjob abort using pivot, the
virsh command freezes. This is not an issue with older qemu versions
which did not support asynchronous jobs (which we prefer by default).
As we have reached the mirroring phase successfully, polling the
monitor for blockjob info always returns 1 and the loop never ends.
This fix introduces a check of the qemuDomainBlockPivot return code,
possibly skipping the asynchronous waiting completely if an error
occurred while asynchronous waiting was the preferred method.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1139567
Reconnecting to the VM is a possibly long-running job spawned in a
separate thread. We should reload the snapshot defs and managedsave
state prior to spawning the thread, to avoid blocking daemon startup,
which would otherwise serialize on the VM lock.
Also, the reloading code would violate the domain job held while
reconnecting, as the loader functions don't create jobs.
Based on the previous commit, we can now precreate missing volumes.
While digging the functionality out of the storage driver would be
nicer, if you've seen the code you know it's nearly impossible. So I'm
going from the other end:
1) For given disk target, disk path is looked up.
2) For the disk path, storage pool is looked up, a volume XML is
constructed and then passed to virStorageVolCreateXML() which has all
the knowledge how to create raw images, (encrypted) qcow(2) images,
etc.
One of the advantages of this approach is, we don't have to care about
image conversion - qemu does that for us. So for instance, users can
transform qcow2 into raw on migration (if the correct XML is passed to
the migration API).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Up 'til now, users need to precreate non-shared storage on migration
themselves. This is not a very friendly requirement and we should do
something about it. In this patch, the migration cookie is extended,
so that <nbd/> section does not only contain NBD port, but info on
disks being migrated. This patch sends a list of pairs of:
<disk target; disk size>
to the destination. The actual storage allocation is left for next
commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The function queries the block devices visible to qemu
('query-block') and parses qemu's output. The info is
returned in a hash table which is expected to be pre-filled by
qemuMonitorJSONGetAllBlockStatsInfo(). However, in the next patch
we are not going to call the latter function at all, so we should
make the former function add devices into the hash table if not
found there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since virDomainSnapshotFree will call virObjectUnref anyway, let's just use
that directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Since virNetworkFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Since virDomainFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
There is a race condition between the fopen and fscanf calls
in qemuGetProcessInfo. If fopen succeeds, there is a small
possibility that the file no longer exists by the time we read
from it. Now, if either the fopen or the fscanf call fails, the
function behaves just as if only fopen had failed.
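In sketch form (the real fscanf in qemuGetProcessInfo parses many
more /proc/<pid>/stat fields):

  FILE *pidinfo;
  unsigned long long usertime = 0, systime = 0;

  if (!(pidinfo = fopen(proc, "r")) ||
      fscanf(pidinfo, "%llu %llu", &usertime, &systime) < 2) {
      /* Either the file never existed, or it vanished between fopen
       * and fscanf; report zeroed values in both cases. */
      usertime = systime = 0;
  }

  if (pidinfo)
      VIR_FORCE_FCLOSE(pidinfo);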
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1169055
Signed-off-by: Eric Blake <eblake@redhat.com>
Coverity complained that because the cfg->macFilter call checked
net->ifname != NULL before calling ebtablesRemoveForwardAllowIn, then
the virNetDevOpenvswitchRemovePort call should have the same check.
However, if I move the ebtables call prior to the check for TYPE_DIRECT
(where there is a VIR_FREE(net->ifname)), then it seems Coverity is
happy. Since firewall info is tacked on last during setup, removing
it in the opposite order of initialization seems natural anyway.
There are some small issues in qemuProcessAttach:
1. Fix virSecurityManagerGetProcessLabel always getting pid = 0 by
moving 'vm->pid = pid' before the call to
virSecurityManagerGetProcessLabel.
2. Use virSecurityManagerGenLabel to get the image label.
3. Fix always setting an SELinux label even when another security
driver's label is wanted.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When a block{commit,copy} job was aborted on a domain, the block job
handler did not process it correctly, leaving a phantom job in the
background. Any further call to any blockjob caused a "block <jobtype>
still active" error. This patch fixes the blockjob handler so that it
checks not only for the VIR_DOMAIN_BLOCK_JOB_FAILED status, but for the
VIR_DOMAIN_BLOCK_JOB_CANCELED status as well, followed by our existing
cleanup routine.
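In sketch form (the cleanup helper name is hypothetical):

  switch ((virConnectDomainEventBlockJobStatus) status) {
  case VIR_DOMAIN_BLOCK_JOB_FAILED:
  case VIR_DOMAIN_BLOCK_JOB_CANCELED:   /* now handled the same way */
      /* run the existing cleanup so no phantom job is left behind */
      qemuBlockJobCleanup(driver, vm, disk);
      break;
  default:
      break;
  }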
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1135169
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
If the job fails in qemuMigrationRun, we expect the jobinfo type to be
FAILED. But the jobinfo type won't be updated until entering
qemuMigrationWaitForCompletion. We should make sure it is updated in
all conditions. Moreover, we can't use qemuMigrationUpdateJobStatus
here because the job may fail inside libvirt, in which case we can't
query the job status from QEMU.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
The migration job status is tracked in qemuMigrationUpdateJobStatus,
which is called in qemuMigrationRun. But if migration is cancelled
before that point, such as in qemuMigrationDriveMirror, the jobinfo
type won't be updated to CANCELLED. After this patch, we get the
jobinfo type CANCELLED if migration is cancelled during the drive
mirror. Moreover, we can't use qemuMigrationUpdateJobStatus here
because from qemu's point of view it's just the drive mirror being
cancelled; the migration hasn't even started yet.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160084
As of b6d4dad11b (1.2.5) we are trying to keep track of the FSFreeze
status in the guest. Even though I've tried to fix a couple of corner
cases (6ea54769ba), it occurred to me just recently that the approach
is broken by design. Firstly, there are many other ways to talk to
qemu-ga (even through libvirt, e.g. qemu-agent-command) by which
filesystems can be thawed without libvirt noticing. Moreover, there
are plenty of ways to thaw filesystems without even qemu-ga noticing
(yes, qemu-ga keeps internal track of the FSFreeze status). So,
instead of keeping track ourselves, or asking qemu-ga for stale state,
it's best to let qemu-ga deal with it (and possibly let the guest
kernel propagate an error).
Moreover, there was one bug in the previous approach: if the fsfreeze
command failed, we executed fsthaw subsequently. So issuing
domfsfreeze in virsh gave the following result:
virsh # domfsfreeze gentoo
Froze 1 filesystem(s)
virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
virsh # domfsfreeze gentoo
Froze 1 filesystem(s)
virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virReportSystemError is reserved for reporting system errors, calling it
with VIR_ERR_* error codes produces error messages that do not make any
sense, such as
internal error: guest failed to start: Kernel doesn't support user
namespace: Link has been severed
We should prohibit wrong usage with a syntax-check rule.
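For example:

  /* Wrong: pairs a VIR_ERR_* code with errno-style formatting,
   * producing the nonsense strerror() suffix shown above. */
  virReportSystemError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("Kernel doesn't support user namespace"));

  /* Right: virReportSystemError takes an errno value... */
  virReportSystemError(errno, "%s",
                       _("Kernel doesn't support user namespace"));

  /* ...while VIR_ERR_* codes go through virReportError. */
  virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                 _("Kernel doesn't support user namespace"));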
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Commit 6fcddfcd refactored job statistics but missed updating the
jobinfo type in qemuDomainGetJobInfo. After this patch, we can use
virDomainGetJobInfo to get the jobinfo type again.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Add an attribute to set the vgamem_mb parameter of the QXL device for
QEMU. This value sets the size of the VGA framebuffer for the QXL
device. The default value in QEMU is 8 MB, so reuse it in libvirt as
well, to not break things.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
So far we didn't have any option to set the video memory size for qemu
video devices. There was only the vram (ram for QXL) attribute, but it
was valid only for the QXL video device.
To provide this feature to users, QEMU has a dedicated device attribute
called 'vgamem_mb' to set the video memory size. We will use the 'vram'
attribute for setting the video memory size of other QEMU video
devices.
For the cirrus device we will ignore the vram value because its video
memory size is hardcoded in QEMU.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
QEMU has two different types of QXL display device. The first,
"qxl-vga", is for the primary video device; the second, "qxl", is for
secondary video devices.
There are also two different ways to specify those devices on the qemu
command line: the first and obsolete one uses the "-vga" option, and
the current one uses the "-device" option. "-vga" can be used only to
set up the primary video device, so "-vga qxl" is equal to
"-device qxl-vga". Unfortunately "-vga qxl" doesn't support setting
additional parameters for the device, and the "-global" option must be
used for this purpose. It's mandatory to use "-global qxl-vga...." to
set the parameters of a primary video device previously defined with
"-vga qxl".
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The vram attribute was introduced to set the video memory, but it is
usable only for a few hypervisors, excluding QEMU/KVM and the old Xen
driver. Only in the case of QEMU was vram used, and only for QXL.
This patch updates the documentation to reflect the current code in
libvirt and also changes the cases in which we set the default vram
attribute.
It also fixes the existing strange default value for VGA devices,
9 MB, raising it to 16 MB, because the video RAM should be rounded to
a power of two. The change of default value could affect migrations,
but I found out that QEMU always rounds the video RAM to a power of
two internally, so it's safe to change the default value to the next
closest power of two and also silently correct every domain XML
definition. It's also safe because we don't pass the value to QEMU.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Get the mounted filesystems list, which contains hardware info for the
disks and their controllers, from QEMU guest agent 2.2+. Then, convert
the hardware info to the corresponding device aliases for the disks.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Improve the monitor function to also retrieve the guest state of
character device (if provided) so that we can refresh the state of
virtio-serial channels and perhaps react to changes in the state in
future patches.
This patch changes the returned data from qemuMonitorGetChardevInfo to
return a structure containing the pty path and the state for all the
character devices.
The change to the testsuite makes sure that the data is parsed
correctly.
To be able to express some use cases of the RBD backing with libvirt, we
need to be able to specify a config file for the RBD client to qemu as
that is one of the commonly used options.
Some storage systems have internal support for snapshots. Libvirt should
be able to select a correct snapshot when starting a VM.
This patch adds a XML element to select a storage source snapshot for
the RBD protocol which supports this feature.
To allow reusing this non-trivial parser code in the backing store
parser, this part of the command line parser needs to be split out
into a separate function.
Instead of splitting out various fields, pass the complete structure
and let the function pick various things out of it.
As one of the callers isn't using virStorageSourcePtr to store the
data, this patch adds glue code that fills the data into a dummy
virStorageSourcePtr before calling the function.
This change will help when adding new fields that need output processing
in the future.
Newer qemu added an event that is emitted when a virtio serial channel
is opened in the guest OS. This allows us to update the state of the
port in the output-only XML element.
This patch implements the monitor callbacks and necessary handlers to
update the state in the definition.
To unify future additions that require information from "query-chardev"
rename qemuMonitorGetPtyPaths and friends to qemuMonitorGetChardevInfo
and move the allocation of the returned hash into the top level
function.
When creating a disk image snapshot, the libvirt code would blindly
copy the parent's label to the newly created image. This runs into problems
when you start a VM from an image hosted on NFS (or other storage system
that doesn't support selinux labels) and the snapshot destination is on
a storage system that does support selinux labels. Libvirt's code in
that case generates a different security label for the image hosted on
NFS. This label is valid only for NFS images and doesn't allow access in
case of a locally stored image.
To fix this issue libvirt needs to refrain from copying security
information in cases where the default domain seclabel is a better
choice.
This patch repurposes the now unused @force argument of
virStorageSourceInitChainElement to denote whether a copy of the
security labelling information should be attempted or not. This allows
fine-grained control of the copy operation for cases where we need to
keep the label of the old disk vs. the cases where we need to keep the
label unset to use the default domain imagelabel.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1151718
Oops, I forgot to squash one more instance of the same check in the
previous commit (v1.2.10-144-g52691f9).
https://bugzilla.redhat.com/show_bug.cgi?id=1147331
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Any attempt to start a tunnelled migration with libvirtd that supports
RDMA migration (specifically commit v1.2.8-226-ged22a47) crashes
libvirtd on the destination host.
The crash is inevitable because qemuMigrationPrepareAny is always called
with NULL protocol in case of tunnelled migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1147331
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>