When using 'dimm' memory devices with qemu, some of the information
like the slot number and base address need to be reloaded from qemu
after process start so that it reflects the actual state. The state then
allows to use memory devices across migrations.
In qemu 2.3, the migration status will include 'cancelling' in the
window between when an asynchronous cancel has been requested and
when the migration is actually halted. Previously, qemu hid this
state and reported 'active'. Libvirt manages the sequence okay
even when the string is unrecognized (that is, it will report an
unknown state:
Migration: [ 69 %]^Cerror: internal error: unexpected migration status in cancelling.
but the migration is still cancelled), but recognizing the string
makes for a smoother user experience.
* src/qemu/qemu_monitor.h
(QEMU_MONITOR_MIGRATION_STATUS_CANCELLING): Add enum.
* src/qemu/qemu_monitor.c (qemuMonitorMigrationStatus): Map it.
* src/qemu/qemu_migration.c (qemuMigrationUpdateJobStatus): Adjust
clients.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetMigrationStatusReply): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
after a series of disk snapshots into existing destination images,
followed by active commits of the top image, it is possible for
qemu 2.2 and earlier to end up tracking a different name for the
image than what it would have had when opening the chain afresh.
That is, when starting with the chain 'a <- b <- c', the name
associated with 'b' is how it was spelled in the metadata of 'c',
but when starting with 'a', taking two snapshots into 'a <- b <- c',
then committing 'c' back into 'b', the name associated with 'b' is
now the name used when taking the first snapshot.
Sadly, older qemu doesn't know how to treat different spellings of
the same filename as identical files (it uses strcmp() instead of
checking for the same inode), which means libvirt's attempt to
commit an image using solely the names learned from qcow2 metadata
fails with a cryptic:
error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found
even though the file exists. Trying to teach libvirt the rules on
which name qemu will expect is not worth the effort (besides, we'd
have to remember it across libvirtd restarts, and track whether a
file was opened via metadata or via snapshot creation for a given
qemu process); it is easier to just always directly ask qemu what
string it expects to see in the first place.
As a safety valve, we validate that any name returned by qemu
still maps to the same local file as we have tracked it, so that
a compromised qemu cannot accidentally cause us to act on an
incorrect file.
* src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
(qemuMonitorJSONDiskNameLookupOne): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit)
(qemuDomainBlockJobImpl): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Our virDomainBlockStatsFlags API uses the old approach where, when it's
called without the typed parameter array, returns the count of parameters
supported by qemu.
The supported parameter count is obtained via separate monitor calls
which is a waste since we can calculate it when gathering the data.
This patch adds code to the qemuMonitorGetAllBlockStatsInfo workers that
allows to track the count of supported fields reported by qemu and will
allow to remove the old duplicate code.
The function that is extracting block stats data from the QMP monitor
reply contains a lot of repeated code. Since I'd be changing each of the
copies in the next patch, lets convert it to a macro right away.
Allocate the hash table in the monitor wrapper function instead of the
worker itself so that the text monitor impl that will be added in the
next patch doesn't have to duplicate it.
QEMU internally updates the size of video memory if the domain XML had
provided too low memory size or there are some dependencies for a QXL
devices 'vgamem' and 'ram' size. We need to know about the changes and
store them into the status XML to not break migration or managedsave
through different libvirt versions.
The values would be loaded only if the "vgamem_mb" property exists for
the device. The presence of the "vgamem_mb" also tells that the
"ram_size" and "vram_size" exists for QXL devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When requested in a later patch, the QMP command results are now
examined recursively. As qemu_driver will eventually have to
read items out of the hash table as stored by this patch, the
computation of backing alias string is done in a shared location.
* src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
* src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch will allow recursion into backing chains when
collecting block stats. This patch should not change behavior,
but merely moves out the common code that will be reused once
recursion is enabled, and adds the parameter that will turn on
recursion.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
although it is ignored for now.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
split...
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
(qemuMonitorJSONGetBlockStatsInfo): Update caller.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
* src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The function queries the block devices visible to qemu
('query-block') and parses the qemu's output. The info is
returned in a hash table which is expected to be pre-filled by
qemuMonitorJSONGetAllBlockStatsInfo(). However, in the next patch
we are not going to call the latter function at all, so we should
make the former function add devices into the hash table if not
found there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Improve the monitor function to also retrieve the guest state of
character device (if provided) so that we can refresh the state of
virtio-serial channels and perhaps react to changes in the state in
future patches.
This patch changes the returned data from qemuMonitorGetChardevInfo to
return a structure containing the pty path and the state for all the
character devices.
The change to the testsuite makes sure that the data is parsed
correctly.
New qemu added a new event that is emitted when a virtio serial channel
is opened in the guest OS. This allows us to update the state of the
port in the output-only XML element.
This patch implements the monitor callbacks and necessary handlers to
update the state in the definition.
To unify future additions that require information from "query-chardev"
rename qemuMonitorGetPtyPaths and friends to qemuMonitorGetChardevInfo
and move the allocation of the returned hash into the top level
function.
We used to set migration capabilities only when a user asked for them in
flags. This is fine when migration succeeds since the QEMU process is
killed in the end but in case migration fails or if it's cancelled, some
capabilities may remain turned on with no way to turn them off. To fix
that, migration capabilities have to be turned on if requested but
explicitly turned off in case they were not requested but QEMU supports
them.
https://bugzilla.redhat.com/show_bug.cgi?id=1163953
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Use the device type name if we know it instead of its number,
even if we can't hotplug it:
qemuMonitorJSONAttachCharDevCommand:6094 : operation failed: Unsupported
char device type '10'
Detect if the the qemu binary currently in use support the bps_max option,
If yes add it to the command, if not, just ignore the option.
We don't print error here, because the check for invalide arguments
has alerady been made in qemu_driver.c
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Add support for bps_max and friends in the driver part.
In the part checking if a qemu is running, check if the running binary
support bps_max, if not print an error message, if yes add it to
"info" variable
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
To allow live modification of device backends in qemu libvirt needs to
be able to hot-add/remove "objects". Add monitor backend functions to
allow this.
This function will be used for hot-add/remove of RNG backends,
IOThreads, memory backing objects, etc.
Our qemu monitor code has a converter from key-value pairs to a json
value object. I want to re-use the code later and having it part of the
monitor command generator is inflexible. Split it out into a separate
helper.
NIC_RX_FILTER_CHANGED is sent by qemu any time a NIC driver in the
guest modified the NIC's RX Filter (for example, if the MAC address of
the NIC is changed by the guest).
This patch doesn't do anything useful with that event; it just sets up
all the plumbing to get news of the event into a worker thread with
all proper locking/reference counting, and provide an easy place to
add in desired functionality.
See src/qemu/EVENTHANDLERS.txt for information/instructions on adding
a libvirt-internal handler for a qemu event (using
NIC_RX_FILTER_CHANGED as an example).
This function can be called at any time to get the current status of a
guest's network device rx-filter. In particular it is useful to call
after libvirt recieves a NIC_RX_FILTER_CHANGED event - this event only
tells you that something has changed in the rx-filter, the details are
retrieved with the query-rx-filter monitor command (only available in
the json monitor). The command sent to the qemu monitor looks like this:
{"execute":"query-rx-filter", "arguments": {"name":"net2"} }'
and the results will look something like this:
{
"return": [
{
"promiscuous": false,
"name": "net2",
"main-mac": "52:54:00:98:2d:e3",
"unicast": "normal",
"vlan": "normal",
"vlan-table": [
42,
0
],
"unicast-table": [
],
"multicast": "normal",
"multicast-overflow": false,
"unicast-overflow": false,
"multicast-table": [
"33:33:ff:98:2d:e3",
"01:80:c2:00:00:21",
"01:00:5e:00:00:fb",
"33:33:ff:98:2d:e2",
"01:00:5e:00:00:01",
"33:33:00:00:00:01"
],
"broadcast-allowed": false
}
],
"id": "libvirt-14"
}
This is all parsed from JSON into a virNetDevRxFilter object for
easier consumption. (unicast-table is usually empty, but is also an
array of mac addresses similar to multicast-table).
(NB: LIBNL_CFLAGS was added to tests/Makefile.am because virnetdev.h
now includes util/virnetlink.h, which includes netlink/msg.h when
appropriate. Without LIBNL_CFLAGS, gcc can't find that file (if
libnl/netlink isn't available, LIBNL_CFLAGS will be empty and
virnetlink.h won't try to include netlink/msg.h anyway).)
Aeons ago (commit 34dcbbb4, v0.8.2), we added a new libvirt event
(VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON) in order to tell the user WHY
the guest halted. This is because at least VDSM wants to react
differently to ENOSPC events (resize the lvm partition to be larger,
and resume the guest as if nothing had happened) from all other events
(I/O is hosed, throw up our hands and flag things as broken). At the
time this was done, downstream RHEL qemu added a vendor extension
'__com.redhat_reason', which would be exactly one of these strings:
"enospc", "eperm", "eio", and "eother". In our stupidity, we exposed
those exact strings to clients, rather than an enum, and we also
return "" if we did not have access to a reason (which was the case
for upstream qemu).
Fast forward to now: upstream qemu commit c7c2ff0c (will be qemu 2.2)
FINALLY adds a 'nospace' boolean, after discussion with multiple
projects determined that VDSM really doesn't care about distinction
between any other error types. So this patch converts 'nospace' into
the string "enospc" for compatibility with RHEL clients that were
already used to the downstream extension, while leaving the reason
blank for all other cases (no change from the status quo).
See also https://bugzilla.redhat.com/show_bug.cgi?id=1119784
* src/qemu/qemu_monitor_json.c (qewmuMonitorJSONHandleIOError):
Parse reason field from modern qemu.
* include/libvirt/libvirt.h.in
(virConnectDomainEventIOErrorReasonCallback): Document it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Management software wants to be able to allocate disk space on demand.
To support this they need keep track of the space occupation of the
block device. This information is reported by qemu as part of block
stats.
This patch extend the block information in the bulk stats with the
allocation information.
To keep the same behaviour a helper is extracted from
qemuMonitorJSONGetBlockExtent in order to get per-device allocation
information.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
While our code gathers block stats via "query-blockstats" some
information need to be gathered via "query-block". Add a helper function
that will update the blockstats structure if requested.
The current block stats code matched up the disk name with the actual
stats by the order in the data returned from qemu. This unfortunately
isn't right as qemu may return the disks in any order. Fix this by
returning a hash of stats and index them by the disk alias.
RDMA migration uses the 'setup' state in QEMU to optionally lock
all memory before the migration starts. The total time spent in
this state is exposed as VIR_DOMAIN_JOB_SETUP_TIME.
Additionally, QEMU also exports migration throughput (mbps) for both
memory and disk, so let's add them too: VIR_DOMAIN_JOB_MEMORY_BPS,
VIR_DOMAIN_JOB_DISK_BPS.
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This patch implements the VIR_DOMAIN_STATS_BLOCK group of statistics.
To do so, a helper function to get the block stats of all the disks of
a domain is added.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Upstream qemu 1.4 added some drive-mirror tunables not present
when it was first introduced in 1.3. Management apps may want
to set these in some cases (for example, without tuning
granularity down to sector size, a copy may end up occupying
more bytes than the original because an entire cluster is
copied even when only a sector within the cluster is dirty,
although tuning it down results in more CPU time to do the
copy). I haven't personally needed to use the parameters, but
since they exist, and since the new API supports virTypedParams,
we might as well expose them.
Since the tuning parameters aren't often used, and omitted from
the QMP command when unspecified, I think it is safe to rely on
qemu 1.3 to issue an error about them being unsupported, rather
than trying to create a new capability bit in libvirt.
Meanwhile, all versions of qemu from 1.4 to 2.1 have a bug where
a bad granularity (such as non-power-of-2) gives a poor message:
error: internal error: unable to execute QEMU command 'drive-mirror': Invalid parameter 'drive-virtio-disk0'
because of abuse of QERR_INVALID_PARAMETER (which is supposed to
name the parameter that was given a bad value, rather than the
value passed to some other parameter). I don't see that a
capability check will help, so we'll just live with it (and it
has since been improved in upstream qemu).
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Add
parameters.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror):
Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockCopyCommon): Likewise.
(qemuDomainBlockRebase, qemuDomainBlockCopy): Adjust callers.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror): Likewise.
* tests/qemumonitorjsontest.c (qemuMonitorJSONDriveMirror): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
If the virJSONValueNewObject() fails, then rather than going to error
and getting a Coverity false positive since it doesn't seem to understand
the relationship between nkeywords, keywords, and values and seems to
believe calling qemuFreeKeywords will cause a NULL deref - just return NULL
Signed-off-by: John Ferlan <jferlan@redhat.com>
virDomainGetJobStats gains new VIR_DOMAIN_JOB_STATS_COMPLETED flag that
can be used to fetch statistics of a completed job rather than a
currently running job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
While reviewing the new virDomainBlockCopy API, Peter Krempa
pointed out that our existing design of using MiB/s for block
job bandwidth is rather coarse, especially since qemu tracks
it in bytes/s; so virDomainBlockCopy only accepts bytes/s.
But once the new API is implemented for qemu, we will be in
the situation where it is possible to set a value that cannot
be accurately reflected back to the user, because the existing
virDomainGetBlockJobInfo defaults to the coarser units.
Fortunately, we have an escape hatch; and one that has already
served us well in the past: we can use the flags argument to
specify which scale to use (see virDomainBlockResize for prior
art). This patch fixes the query side of the API; made easier
by previous patches that split the query side out from the
modification code. Later patches will address the virsh
interface, as well retrofitting all other blockjob APIs to
also accept a flag for toggling bandwidth units.
* include/libvirt/libvirt.h.in (_virDomainBlockJobInfo)
(VIR_DOMAIN_BLOCK_COPY_BANDWIDTH): Document sizing issues.
(virDomainBlockJobInfoFlags): New enum.
* src/libvirt.c (virDomainGetBlockJobInfo): Document new flag.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJobInfo): Add parameter.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJobInfo): Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJobInfo):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJobInfo)
(qemuMonitorJSONGetBlockJobInfoOne): Likewise. Don't scale here.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror): Update
callers.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
(qemuDomainGetBlockJobInfo): Likewise, and support new flag.
Signed-off-by: Eric Blake <eblake@redhat.com>
The previous patch hoisted some bounds checks to the callers;
but someone that is not aware of the hoisted check could now
try passing an integer between LLONG_MAX and ULLONG_MAX. As a
safety measure, add new json conversion modes that let libvirt
error out early instead of pass bad numbers to qemu, if the
caller ever makes a mistake due to later refactoring.
Convert the various blockjob QMP calls to use the new modes,
and switch some of them to be optional (QMP has always supported
an omitted "speed" the same as "speed":0, for everything except
block-job-set-speed).
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommandRaw):
Add 'j'/'y' and 'J'/'Y' to error out on negative input.
(qemuMonitorJSONDriveMirror, qemuMonitorJSONBlockCommit)
(qemuMonitorJSONBlockJob): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Another layer of overly-multiplexed code that deserves to be
split into obviously separate paths for query vs. modify.
This continues the cleanup started in commit cefe0ba.
In the process, make some tweaks to simplify the logic when
parsing the JSON reply. There should be no user-visible
semantic changes.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Drop parameter.
(qemuMonitorBlockJobInfo): New prototype.
(BLOCK_JOB_INFO): Drop enum.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob)
(qemuMonitorJSONBlockJobInfo): Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Split...
(qemuMonitorBlockJobInfo): ...into second function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Move
block info portions...
(qemuMonitorJSONGetBlockJobInfo): ...here, and rename...
(qemuMonitorJSONBlockJobInfo): ...and export.
(qemuMonitorJSONGetBlockJobInfoOne): Alter return semantics.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl, qemuDomainGetBlockJobInfo): Adjust
callers.
* src/qemu/qemu_migration.c (qemuMigrationDriveMirror)
(qemuMigrationCancelDriveMirror): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1103245
An advice appeared there on the qemu-devel list [1]. When a domain is
suspended and then resumed guest kernel is not aware of this. So we've
introduced virDomainSetTime API that resets the time within guest
using qemu-ga. On the other hand, qemu itself is trying to make RTC
beat faster to catch the difference. But if we don't tell qemu that
guest's time was reset via the other method, both mechanisms are
applied resulting in again wrong guest time. In order to avoid summing
both corrections we need to tell qemu that it should not use the RTC
injection if the guest time is set via guest agent.
1: http://www.mail-archive.com/qemu-devel@nongnu.org/msg236435.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
That can be lately achieved with by having .param == NULL in the
virQEMUCapsCommandLineProps struct.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
To allow changing the name that is recorded in the top of the current
image chain used in a block pull/rebase operation, we need to specify
the backing name to qemu. This is done via the "backing-file" attribute
to the block-stream commad.
To allow changing the name that is recorded in the overlay of the TOP
image used in a block commit operation, we need to specify the backing
name to qemu. This is done via the "backing-file" attribute to the
block-commit command.
We are about to turn on support for active block commit. Although
qemu 2.0 was the first version to mostly support it, that version
mis-handles 0-length files, and doesn't have anything available for
easy probing. But qemu 2.1 fixed bugs, and made life simpler by
letting the 'top' argument be optional. Unless someone begs for
active commit with qemu 2.0, for now we are just going to enable
it only by probing for qemu 2.1 behavior (anyone backporting active
commit can also backport the optional argument behavior). This
requires qemu.git commit 7676e2c597000eff3a7233b40cca768b358f9bc9.
Although all our actual uses of block-commit supply arguments for
both base and top, we can omit both arguments and use a bogus
device string to trigger an interesting behavior in qemu. All QMP
commands first do argument validation, failing with GenericError
if a mandatory argument is missing. Once that passes, the code
in the specific command gets to do further checking, and the qemu
developers made sure that if device is the only supplied argument,
then the block-commit code will look up the device first, with a
failure of DeviceNotFound, before attempting any further argument
validation (most other validations fail with GenericError). Thus,
the category of error class can reliably be used to decipher
whether the top argument was optional, which in turn implies a
working active commit. Since we expect our bogus device string to
trigger an error either way, the code is written to return a
distinct return value without spamming the logs.
* src/qemu/qemu_monitor.h (qemuMonitorSupportsActiveCommit): New
prototype.
* src/qemu/qemu_monitor.c (qemuMonitorSupportsActiveCommit):
Implement it.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Allow NULL for top and base, for probing purposes.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise, implementing the probe.
* tests/qemumonitorjsontest.c (mymain): Enable...
(testQemuMonitorJSONqemuMonitorSupportsActiveCommit): ...a new test.
Signed-off-by: Eric Blake <eblake@redhat.com>