Extract the memory backend device code into a separate function so that
it can be later easily refactored and reused.
Few small changes for future reusability, namely:
- new (currently unused) parameter for user specified page size
- size of the memory is specified in kibibytes, divided up in the
function
- new (currently unused) parameter for user specifed source nodeset
- option to enforce capability check
Unlike -device, qemu uses a JSON object to add backend "objects" via the
monitor rather than the string that would be passed on the commandline.
To be able to reuse code parts that configure backends for various
devices, this patch adds a helper that will allow generating the command
line representations from the JSON property object.
This patch enables synchronization of the host macvtap
device options with the guest device's in response to the
NIC_RX_FILTER_CHANGED event.
The following device options will be synchronized:
* PROMISC
* MULTICAST
* ALLMULTI
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1158034
If we're expecting to create a file somewhere and that fails for some
reason during qemuOpenFileAs, then we unlink the path we're attempting
to create leaving no way to determine what the "existing" privileges,
protections, or labels are that caused the failure (open, change owner
and group, change mode, etc.).
Furthermore, if we fall into the path where we'll be opening / creating
the file using VIR_FILE_OPEN_FORK, we need to first unlink/delete the file
we created in the first path; otherwise, the attempt by the child process
to open as some specific user:group may fail because the file was already
created using nfsnobody:nfsnobody. Again, if we didn't create the file we
don't want to blindly delete what already exists. Thus, a second reason for
the original check to set need_unlink to false when we find the file with
CREAT set, but already existing.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit id '540c339a' to fix issues with reference counting and transient
domains moved the qemuDomainObjEndAsyncJob call prior to the attempt to
restart the guest CPU's resulting in an error:
error: Failed to save domain rhel70 to /tmp/pl/rhel70.save
error: internal error: unexpected async job 3
when (ret != 0) - eg, the error path from qemuDomainSaveMemory.
This patch will adjust the logic to call the EndAsyncJob only after
we've tried to restart the guest CPUs. It also needs to adjust the
test for qemuDomainRemoveInactive to add the ret == 0 condition.
Additionally, if we get to endjob: because of some error earlier, then
we need to save that error in the event the CPU restart logic fails.
We don't want to return the error from CPU restart failure, rather we
want to return the error from the failed save that caused us to fall
into the retry to start the CPU logic.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Well, even though users can pass the list of UEFI:NVRAM pairs at the
configure time, we may maintain the list of widely available UEFI
ourselves too. And as arm64 begin to rises, OVMF was ported there too.
With a slight name change - it's called AAVMF, with AAVMF_CODE.fd
being the UEFI firmware and AAVMF_VARS.fd being the NVRAM store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Up until now there are just two ways how to specify UEFI paths to
libvirt. The first one is editing qemu.conf, the other is editing
qemu_conf.c and recompile which is not that fancy. So, new
configure option is introduced: --with-loader-nvram which takes a
list of pairs of UEFI firmware and NVRAM store. This way, the
compiled in defaults can be passed during compile time without
need to change the code itself.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1183890
When we try to update a xml to a image file, we will clear the
graphics passwd settings, because we do not pass VIR_DOMAIN_XML_SECURE
to qemuDomainDefCopy, qemuDomainDefFormatBuf won't format the passwd.
Add VIR_DOMAIN_XML_SECURE flag when we call qemuDomainDefCopy
in qemuDomainSaveImageUpdateDef.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
This way the device is in vmdef only if ret = 0 and the caller
(qemuDomainAttachDeviceFlags) does not free it.
Otherwise it might get double freed by qemuProcessStop
and qemuDomainAttachDeviceFlags if the domain crashed
in monitor after we've added it to vm->def.
Do the allocation first, then add the actual device.
The second part should never fail. This is good
for live hotplug where we don't want to remove the device
on OOM after the monitor command succeeded.
The only change in behavior is that on failure, the
vmdef->consoles array is freed, not just the first console.
For stateless, client side drivers, it is never correct to
probe for secondary drivers. It is only ever appropriate to
use the secondary driver that is associated with the
hypervisor in question. As a result the ESX & HyperV drivers
have both been forced to do hacks where they register no-op
drivers for the ones they don't implement.
For stateful, server side drivers, we always just want to
use the same built-in shared driver. The exception is
virtualbox which is really a stateless driver and so wants
to use its own server side secondary drivers. To deal with
this virtualbox has to be built as 3 separate loadable
modules to allow registration to work in the right order.
This can all be simplified by introducing a new struct
recording the precise set of secondary drivers each
hypervisor driver wants
struct _virConnectDriver {
virHypervisorDriverPtr hypervisorDriver;
virInterfaceDriverPtr interfaceDriver;
virNetworkDriverPtr networkDriver;
virNodeDeviceDriverPtr nodeDeviceDriver;
virNWFilterDriverPtr nwfilterDriver;
virSecretDriverPtr secretDriver;
virStorageDriverPtr storageDriver;
};
Instead of registering the hypervisor driver, we now
just register a virConnectDriver instead. This allows
us to remove all probing of secondary drivers. Once we
have chosen the primary driver, we immediately know the
correct secondary drivers to use.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
This patch also fixes a few very long lines of code around the touched
parts.
For distros that want to add versioned machine types, they will add
(downstream) machine types like "virt-foo-1.2.3". Detect these as
MMIO too.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Previous patch of this series fixed the issue with adding a new PCI bridge
when all the slots were reserved by devices with user specified addresses.
In case there are still some PCI devices waiting to get a slot reserved
by qemuAssignDevicePCISlots, this means a new bus needs to be
created along with a corresponding bridge controller. By adding an
additional check, this scenario now results in a reasonable error
instead of generating wrong qemu command line.
Commit 93c8ca tried to fix the issue with auto-adding of a PCI bridge
controller, but didn't work properly in all scenarios.
This patch provides a better fix of the issue when all slots on a PCI bus
are reserved by devices with user specified addresses and no additional
bridges need to be created.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132900
In order to be able to test for fully reserved PCI buses, assignment of
PCI slots for integrated devices needs to be moved to a separate function.
This also might be a good preparation if we decide to add support for
other chipsets as well.
Move qemuDomainAssignPCIAddresses after the definition
of the static function qemuDomainValidateDevicePCISlotsQ35.
This lets us define a new static function using
qemuDomainValidateDevicePCISlots* and use it in
qemuDomainAssignPCIAddresses without a forward declaration.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
As it turned out, fix of dead code 419a22 changed the affected condition
from "never true" to "always true", so better fix would be to change the
return code of virDomainMaybeAddController from 0 to 1 if
a new bridge has been added, thus distinguishing case when we didn't need to
add any controller and case we successfully added one.
The return code is changed in the next commit
The ACL check didn't check the VIR_DOMAIN_XML_SECURE flag and the
appropriate permission for it. Found via code inspection while fixing
permissions for save images.
https://bugzilla.redhat.com/show_bug.cgi?id=1164627
When using 'virsh attach-device' to hotplug an unsupported console type
into a qemu guest the attachment would succeed as the command line
formatter didn't report error in such case.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1130390
The listen address is not mandatory for <interface type='server'>
but when it's not specified, we've been formatting it as:
-netdev socket,listen=(null):5558,id=hostnet0
which failed with:
Device 'socket' could not be initialized
Omit the address completely and only format the port in the listen
attribute.
Also fix the schema to allow specifying a model.
Using the same driver multiple times is pointless and
it can result in confusing errors:
$ virsh start test
error: Failed to start domain test
error: internal error: security label already defined for VM
https://bugzilla.redhat.com/show_bug.cgi?id=1153891
Depending on the context, either error out if the domain
has disappeared in the meantime, or just ignore the value
to allow marking the function as ATTRIBUTE_RETURN_CHECK.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
If the domain crashed while we were in monitor,
we cannot rely on the REALLOC done on live definition,
since vm->def now points to the persistent definition.
Skip adding the attached devices to domain definition
if the domain crashed.
In AttachChrDevice, the chardev was already added to the
live definition and freed by qemuProcessStop in the case
of a crash. Skip the device removal in that case.
Also skip audit if the domain crashed in the meantime.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
In the device type-specific functions, exit early
if the domain has disappeared, because the cleanup
should have been done by qemuProcessStop.
Check the return value in processDeviceDeletedEvent
and qemuProcessUpdateDevices.
Skip audit and removing the device from live def because
it has already been cleaned up.
Ploop is a pseudo device which makeit possible to access
to an image in a file as a block device. Like loop devices,
but with additional features, like snapshots, write tracker
and without double-caching.
It used in PCS for containers and in OpenVZ. You can manage
ploop devices and images with ploop utility
(http://git.openvz.org/?p=ploop).
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
We tested for positive return value from virDomainMaybeAddController,
but it returns 0 or -1 only resulting in a dead code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In case we find out, there are more PCI devices to be connected
than there are available slots on the default PCI bus, we automatically add a
new bus and a related PCI bridge controller as well. As there are no free slots
left on the default PCI bus, PCI bridge controller gets a free slot on a
newly created PCI bus which causes qemu to refuse to start the guest.
This fix introduces a new function qemuDomainPCIBusFullyReserved which
is checked right before we possibly try to reserve a slot for PCI bridge
controller.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132900
The virDomainDefineXMLFlags and virDomainCreateXML APIs both
gain new flags allowing them to be told to validate XML.
This updates all the drivers to turn on validation in the
XML parser when the flags are set
systemd-machined introduced a new method CreateMachineWithNetwork
that obsoletes CreateMachine. It expects to be given a list of
VETH/TAP device indexes for the host side device(s) associated
with a container/machine.
This falls back to the old CreateMachine method when the new
one is not supported.
https://bugzilla.redhat.com/show_bug.cgi?id=1181182
When we meet error in qemuMigrationPrepareAny and goto
cleanup with rc < 0, we forget clear the priv->origname and this
will make this vm migrate fail next time because leave a wrong
origname in priv, and will Generate a wrong cookie when do
migrate next time.
This patch will make priv->origname is NULL when migrate fail
in target host.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Make local copy of the disk alias in qemuProcessInitPasswords,
instead of referencing the one in domain definition, which
might get freed if the domain crashes while we're in monitor.
Also copy the memballoon period value.
Make a local copy of the disk alias instead of pointing
to the domain definition, which might get freed if
the domain dies while we're in monitor.
Also exit early if that happens.
Exit the monitor right after we've done with it to get
the virDomainObjPtr lock back, otherwise we might be accessing
vm->def while it's being cleaned up by qemuProcessStop.
If the domain crashed while we were in the monitor, exit
early instead of changing vm->def which is now the persistent
definition.
The domain might disappear during the time in monitor when
the virDomainObjPtr is unlocked, so the caller needs to check
if it's still alive.
Since most of the callers are going to need it, put the
check inside qemuDomainObjExitMonitor and return -1 if
the domain died in the meantime.
QEMU internally updates the size of video memory if the domain XML had
provided too low memory size or there are some dependencies for a QXL
devices 'vgamem' and 'ram' size. We need to know about the changes and
store them into the status XML to not break migration or managedsave
through different libvirt versions.
The values would be loaded only if the "vgamem_mb" property exists for
the device. The presence of the "vgamem_mb" also tells that the
"ram_size" and "vram_size" exists for QXL devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The search is done recursively only through QOM object that has a type
prefixed with "child<" as this indicate that the QOM is a parent for
other QOM objects.
The usage is that you give known device name with starting path where to
search.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Commit e3435caf fixed hot-plugging of vcpus with strict memory pinning
on NUMA hosts, but unfortunately it also broke updating number of vcpus
for offline guests using our API.
The issue is that we try to create a cpu cgroup for non-running guest
which fails as there are no cgroups for that domain. We should create
cgroups and update cpuset.mems only if we are hot-plugging.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1165993
So, there are still plenty of vNIC types that we don't know how to set
bandwidth on. Let's warn explicitly in case user has requested it
instead of pretending everything was set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When create inactive external snapshot, after update disk definitions,
virDomainSaveConfig is needed, if not after restart libvirtd the new snapshot
file definitions in xml will be lost.
Reproduce steps:
1. prepare a shut off guest
$ virsh domstate rhel7 && virsh domblklist rhel7
shut off
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/rhel7.img
2. create external disk snapshot
$ virsh snapshot-create rhel7 --disk-only && virsh domblklist rhel7
Domain snapshot 1417882967 created
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/rhel7.1417882967
3. restart libvirtd then check guest source file
$ service libvirtd restart && virsh domblklist rhel7
Redirecting to /bin/systemctl restart libvirtd.service
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/rhel7.img
This was first reported by Eric Blake
http://www.redhat.com/archives/libvir-list/2014-December/msg00369.html
Signed-off-by: Shanzhi Yu <shyu@redhat.com>
The virDomainDefParse* and virDomainDefFormat* methods both
accept the VIR_DOMAIN_XML_* flags defined in the public API,
along with a set of other VIR_DOMAIN_XML_INTERNAL_* flags
defined in domain_conf.c.
This is seriously confusing & error prone for a number of
reasons:
- VIR_DOMAIN_XML_SECURE, VIR_DOMAIN_XML_MIGRATABLE and
VIR_DOMAIN_XML_UPDATE_CPU are only relevant for the
formatting operation
- Some of the VIR_DOMAIN_XML_INTERNAL_* flags only apply
to parse or to format, but not both.
This patch cleanly separates out the flags. There are two
distint VIR_DOMAIN_DEF_PARSE_* and VIR_DOMAIN_DEF_FORMAT_*
flags that are used by the corresponding methods. The
VIR_DOMAIN_XML_* flags received via public API calls must
be converted to the VIR_DOMAIN_DEF_FORMAT_* flags where
needed.
The various calls to virDomainDefParse which hardcoded the
use of the VIR_DOMAIN_XML_INACTIVE flag change to use the
VIR_DOMAIN_DEF_PARSE_INACTIVE flag.
https://bugzilla.redhat.com/show_bug.cgi?id=1135339 documents some
confusing behavior when a user tries to start an inactive block
commit in a second connection while there is already an on-going
active commit from a first connection. Eventually, qemu will
support multiple simultaneous block jobs, but as of now, it does
not; furthermore, libvirt also needs an overhaul before we can
support simultaneous jobs. So, the best way to avoid confusing
ourselves is to quit relying on qemu to tell us about the situation
(where we risk getting in weird states) and instead forbid a
duplicate block commit ourselves.
Note that we are still relying on qemu to diagnose attempts to
interrupt an inactive commit (since we only track XML of an active
commit), but as inactive commit is less confusing for libvirt to
manage, there is less that can go wrong by leaving that detection
up to qemu.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Hoist check for
active commit to occur earlier outside of conditions.
Signed-off-by: Eric Blake <eblake@redhat.com>
QEMU supports feature specification with -cpu host and we just skip
using that. Since QEMU developers themselves would like to use this
feature, this patch modifies the code to work.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178850
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The default value should be 16 MiB instead of 8 MiB. Only really old
version of upstream QEMU used the 8 MiB as default for vga framebuffer.
Without this change if you update your libvirt where we introduced the
"vgamem" attribute for QXL video device the value will be set to 8 MiB,
but previously your guest had 16 MiB because we didn't pass any value to
QEMU command line which means QEMU used its own 16 MiB as default.
This will affect all users with guest's display resolution higher than
1920x1080.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In one of my previous commits (311b4a67) I've tried to allow to
pass regular system pages to <hugepages>. However, there was a
little bug that wasn't caught. If domain has guest NUMA topology
defined, qemuBuildNumaArgStr() function takes care of generating
corresponding command line. The hugepages backing for guest NUMA
nodes is handled there too. And here comes the bug: the hugepages
setting from XML is stored in KiB internally, however, the system
pages size was queried and stored in Bytes. So the check whether
these two are equal was failing even if it shouldn't.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In commit 540c339a25 the whole domain
reference counting was refactored in the qemu driver. Domain jobs now
don't need to reference the domain object as they now expect the
reference from the calling function.
However, the patch forgot to remove the unref call in case we exit the
monitor when we were acquiring a nested job. This caused the daemon to
crash on a subsequent access to the domain object once we've done an
operation requiring a nested job for a monitor access.
An easy reproducer case:
1) Start a vm with qcow disks
2) virsh snapshot-create-as DOMNAME
3) virsh dumpxml DOMNAME
4) daemon crashes in a semi-random spot while accessing a now-removed VM
object.
Fortunately, the commit wasn't released yet, so there are no security
implications.
Reported-by: Shanzi Yu <shyu@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1177723
When setting new bandwidth limits via
virDomainSetInterfaceParameters, the old ones are cleared first.
However, if setting the new ones fails, the old are already gone
and interface is left in inconsistent state. Therefore, right
before failing we ought to try to restore the old bandwidth.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1178652
We will get a warning when we have a guest in paused
status (caused by kernel panic) and restart libvirtd,
warning message like this:
Qemu reported unknown VM status: 'guest-panicked'
and this seems because we set a wrong status name in
qemu_monitor.c, and from qemu qapi-schema.json file
we know this status should named 'guest-panicked'.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
There is one problem that causes various errors in the daemon. When
domain is waiting for a job, it is unlocked while waiting on the
condition. However, if that domain is for example transient and being
removed in another API (e.g. cancelling incoming migration), it get's
unref'd. If the first call, that was waiting, fails to get the job, it
unref's the domain object, and because it was the last reference, it
causes clearing of the whole domain object. However, when finishing the
call, the domain must be unlocked, but there is no way for the API to
know whether it was cleaned or not (unless there is some ugly temporary
variable, but let's scratch that).
The root cause is that our APIs don't ref the objects they are using and
all use the implicit reference that the object has when it is in the
domain list. That reference can be removed when the API is waiting for
a job. And because each domain doesn't do its ref'ing, it results in
the ugly checking of the return value of virObjectUnref() that we have
everywhere.
This patch changes qemuDomObjFromDomain() to ref the domain (using
virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
should be the only function in which the return value of
virObjectUnref() is checked. This makes all reference counting
deterministic and makes the code a bit clearer.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Although QMP returns info about vCPU threads in TCG mode, the
data it returns is mostly lies. Only the first vCPU has a valid
thread_id returned. The thread_id given for the other vCPUs is
in fact the main emulator thread. All vCPUs actually run under
the same thread in TCG mode.
Our vCPU pinning code is not at all able to cope with this
so if you try to set CPU affinity per-vCPU you end up with
wierd errors
error: Failed to start domain instance-00000007
error: cannot set CPU affinity on process 24365: Invalid argument
Since few people will care about the performance of TCG with
strict CPU pinning, lets just disable that for now, so we get
a clear error message
error: Failed to start domain instance-00000007
error: Requested operation is not valid: cpu affinity is not supported
The code assumes that def->vcpus == nvcpupids, so when we setup
fake CPU pids for old QEMU with nvcpupids == 1, we cause the
later code to read off the end of the array. This has fun results
like sche_setaffinity(0, ...) which changes libvirtd's own CPU
affinity, or even better sched_setaffinity($RANDOM, ...) which
changes the affinity of a random OS process.
Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397
QEMU BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1170093
In qemu there are two interesting arguments:
1) -numa to create a guest NUMA node
2) -object memory-backend-{ram,file} to tell qemu which memory
region on which host's NUMA node it should allocate the guest
memory from.
Combining these two together we can instruct qemu to create a
guest NUMA node that is tied to a host NUMA node. And it works
just fine. However, depending on machine type used, there might
be some issued during migration when OVMF is enabled (see QEMU
BZ). While this truly is a QEMU bug, we can help avoiding it. The
problem lies within the memory backend objects somewhere. Having
said that, fix on our side consists on putting those objects on
the command line if and only if needed. For instance, while
previously we would construct this (in all ways correct) command
line:
-object memory-backend-ram,size=256M,id=ram-node0 \
-numa node,nodeid=0,cpus=0,memdev=ram-node0
now we create just:
-numa node,nodeid=0,cpus=0,mem=256
because the backend object is obviously not tied to any specific
host NUMA node.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Coverity flagged commit 0282ca45 as introducing a memory leak;
in all my refactoring to make capacity probing conditional on
whether the image is non-raw, I missed deleting the unconditional
probe.
* src/qemu/qemu_driver.c (qemuStorageLimitsRefresh): Drop
redundant assignment.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174569
There's nothing we need to do for shared iSCSI devices in
qemuAddSharedHostdev and qemuRemoveSharedHostdev. The iSCSI layer
takes care about that for us.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Wire up backing chain recursion. For the first time, it is now
possible to get libvirt to expose that qemu tracks read statistics
on backing files, as well as report maximum extent written on a
backing file during a block-commit operation.
For a running domain, where one of the two images has a backing
file, I see the traditional output:
$ virsh domstats --block testvm2
Domain: 'testvm2'
block.count=2
block.0.name=vda
block.0.path=/tmp/wrapper.qcow2
block.0.rd.reqs=1
block.0.rd.bytes=512
block.0.rd.times=28858
block.0.wr.reqs=0
block.0.wr.bytes=0
block.0.wr.times=0
block.0.fl.reqs=0
block.0.fl.times=0
block.0.allocation=0
block.0.capacity=1310720000
block.0.physical=200704
block.1.name=vdb
block.1.path=/dev/sda7
block.1.rd.reqs=0
block.1.rd.bytes=0
block.1.rd.times=0
block.1.wr.reqs=0
block.1.wr.bytes=0
block.1.wr.times=0
block.1.fl.reqs=0
block.1.fl.times=0
block.1.allocation=0
block.1.capacity=1310720000
vs. the new output:
$ virsh domstats --block --backing testvm2
Domain: 'testvm2'
block.count=3
block.0.name=vda
block.0.path=/tmp/wrapper.qcow2
block.0.rd.reqs=1
block.0.rd.bytes=512
block.0.rd.times=28858
block.0.wr.reqs=0
block.0.wr.bytes=0
block.0.wr.times=0
block.0.fl.reqs=0
block.0.fl.times=0
block.0.allocation=0
block.0.capacity=1310720000
block.0.physical=200704
block.1.name=vda
block.1.path=/dev/sda6
block.1.backingIndex=1
block.1.rd.reqs=0
block.1.rd.bytes=0
block.1.rd.times=0
block.1.wr.reqs=0
block.1.wr.bytes=0
block.1.wr.times=0
block.1.fl.reqs=0
block.1.fl.times=0
block.1.allocation=327680
block.1.capacity=786432000
block.2.name=vdb
block.2.path=/dev/sda7
block.2.rd.reqs=0
block.2.rd.bytes=0
block.2.rd.times=0
block.2.wr.reqs=0
block.2.wr.bytes=0
block.2.wr.times=0
block.2.fl.reqs=0
block.2.fl.times=0
block.2.allocation=0
block.2.capacity=1310720000
I may later do a patch that trims the output to avoid 0 stats,
particularly for backing files (which are more likely to have
0 stats, at least for write statistics when no block-commit
is performed). Also, I still plan to expose physical size
information (qemu doesn't expose it yet, so it requires a stat,
and for block devices, a further open/seek operation). But
this patch is good enough without worrying about that yet.
* src/qemu/qemu_driver.c (QEMU_DOMAIN_STATS_BACKING): New internal
enum bit.
(qemuConnectGetAllDomainStats): Recognize new user flag, and pass
details to...
(qemuDomainGetStatsBlock): ...here, where we can do longer recursion.
(qemuDomainGetStatsOneBlock): Output new field.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order to report stats on backing chains, we need to separate
the output of stats for one block from how we traverse blocks.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Split...
(qemuDomainGetStatsOneBlock): ...into new helper.
Signed-off-by: Eric Blake <eblake@redhat.com>
A coming patch will make it optionally possible to list backing
chain block stats; in this mode of operation, block.counts is no
longer the number of <disks> in the domain, but the number of
blocks in the array being reported. We still want block.count
listed first, but rather than iterate the tree twice (once to
count, and once to list stats), it's easier to just touch things
up after the fact.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
after the fact.
Signed-off-by: Eric Blake <eblake@redhat.com>
The prior refactoring can now be put to use. With the same domain
as the earlier commit 7b49926 (one qcow2 disk and an empty
cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.0.allocation=1309614080
block.0.capacity=42949672960
block.0.physical=1309671424
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
qemuStorageLimitsRefresh to report offline statistics.
Signed-off-by: Eric Blake <eblake@redhat.com>
Create a helper function that can be reused for gathering block
info from virDomainListGetStats.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Split guts...
(qemuStorageLimitsRefresh): ...into new helper function.
Signed-off-by: Eric Blake <eblake@redhat.com>
The documentation for virDomainBlockInfo was confusing: it stated
that 'physical' was the size of the container, then gave an example
of it being the amount of storage used by a sparse file (that is,
for a sparse raw image on a regular file, the wording implied
capacity==physical, while allocation was smaller; but the example
instead claimed physical==allocation). Since we use 'physical' for
the last offset of a block device, we should do likewise for
regular files.
Furthermore, the example claimed that for a qcow2 regular file,
allocation==physical. At the time the code was first written,
this was true (qcow2 files were allocated sequentially, and were
never sparse, so the last sector written happened to also match
the disk space occupied); but modern qemu does much better and
can punch holes for a qcow2 with allocation < physical.
Basically, after this patch, the three fields are now reliably
mapped as:
'capacity' - how much storage the guest can see (equal to
physical for raw images, determined by image metadata otherwise)
'allocation' - how much storage the image occupies (similar to
what 'du' would report)
'physical' - the last offset of the image (similar to what 'ls'
would report)
'capacity' can be larger than 'physical' (such as for a qcow2
image that does not vary much from a backing file) or smaller
(such as for a qcow2 file with lots of internal snapshots).
Likewise, 'allocation' can be (slightly) larger than 'physical'
(such as counting the tail of cluster allocations required to
round a file size up to filesystem granularity) or smaller
(for a sparse file). A block-resize operation changes capacity
(which, for raw images, also changes physical); many non-raw
images automatically grow physical and allocation as necessary
when starting with an allocation smaller than capacity; and even
when capacity and physical stay unchanged, allocation can change
when converting sectors from holes to data or back.
Note that this does not change semantics for qcow2 images stored
on block devices; there, we still rely on qemu to report the
highest written extent for allocation. So using this API to
track when to extend a block device because a qcow2 image is
about to exceed a threshold will not see any changes.
Also, note that virStorageVolInfo is unfortunately limited to
just 'capacity' and 'allocation' (we can't expand it to add
'physical', although we can expand the XML to add it there);
historically, that struct's 'allocation' value has reported
file size for qcow2 files (what this patch terms 'physical'
for a domain block device), but disk usage for raw files (what
this patch terms 'allocation'). So follow-up patches will be
needed to make storage volumes report the same allocation
values and get at physical values, where those differ.
* include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
documentation to match saner definition.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
files, physical size is capacity, not allocation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Ultimately, we want to avoid read()ing a file while qemu is running.
We still have to open() block devices to determine their physical
size, but that is safer. This patch rearranges code to group
together all code that reads the image, to make it easier for later
patches to skip the metadata collection when possible.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
disk up front. Place metadata reading next to use.
Signed-off-by: Eric Blake <eblake@redhat.com>
When requested in a later patch, the QMP command results are now
examined recursively. As qemu_driver will eventually have to
read items out of the hash table as stored by this patch, the
computation of backing alias string is done in a shared location.
* src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
* src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch will allow recursion into backing chains when
collecting block stats. This patch should not change behavior,
but merely moves out the common code that will be reused once
recursion is enabled, and adds the parameter that will turn on
recursion.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
although it is ignored for now.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
split...
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
(qemuMonitorJSONGetBlockStatsInfo): Update caller.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
* src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, grabbing blockinfo always calls stat on the disk, then
opens the image to determine the capacity, using a throw-away
virStorageSourcePtr. This has a couple of drawbacks:
1. We are calling stat and opening a file on every invocation of
the API. However, there are cases where the stats should NOT be
changing between successive calls (if a domain is running, no
one should be changing the physical size of a block device or raw
image behind our backs; capacity of read-only files should not
be changing; and we are the gateway to the block-resize command
to know when the capacity of read-write files should be changing).
True, we still have to use stat in some cases (a sparse raw file
changes allocation if it is read-write and the amount of holes is
changing, and a read-write qcow2 image stored in a file changes
physical size if it was not fully pre-allocated). But for
read-only images, even this should be something we can remember
from the previous time, rather than repeating every call.
2. We want to enhance the power of virDomainListGetStats, by
sharing code. But we already have a virStorageSourcePtr for
each disk, and it would be easier to reuse the common structure
than to have to worry about the one-off virDomainBlockInfoPtr.
While this patch does not optimize reuse of information in point
1, it does get us closer to being able to do so; by updating a
structure that survives between consecutive calls.
* src/util/virstoragefile.h (_virStorageSource): Add physical, to
mirror virDomainBlockInfo; rearrange fields to match public struct.
(virStorageSourceCopy): Copy the new field.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
storage source, then copy to block info.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order for a future patch to virDomainListGetStats to reuse
some code for determining disk usage of offline domains, we
need to make it easier to pull out part of the guts of grabbing
blockinfo. The current implementation grabs a job fairly late
in the game, while getstats will already own a job; reordering
things so that the job is always grabbed up front in both
functions will make it easier to pull out the common code.
This patch results in grabbing a job in cases where one was not
previously needed, but as it is a query job, it should not be
noticeably slower.
This patch touches the same code as the fix for CVE-2014-6458
(commit b799259); in that patch, we avoided hotplug changing
a disk reference during the time of obtaining a monitor lock
by copying all data we needed and no longer referencing disk;
this patch goes the other way and ensures that by holding the
job, the disk cannot be changed so we no longer need to worry
about the disk being invalidated across the monitor lock.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
control to be outside of disk information.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
not supposed to reset the error. Usual procedure was done, saving the
error to temporary variable, but it was never free'd, but rather leaked.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit af2a1f05 tried clearly separating each condition in
qemuRestoreCgroupState() for the sake of readability, however somehow
one condition body was missing. That means that the body of the next
condition got executed only if both of there were true, which is
impossible, thus resulting in a dead code and a logic error.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When hot-plugging a VCPU into the guest, kvm needs to allocate some data
from the DMA zone, which might be in a memory node that's not allowed in
cpuset.mems. Basically the same problem as there was with starting the
domain and due to which commit 7e72ac7878
exists. This patch just extends it to hotplugging as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1161540
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the cgroup alone and rather set the child data
every time there is new cgroup created. We don't leave any task in the
parent group anyway. This will ease both current and future code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174154
When we use attach-device add a hostdev or chr device which have a
iscsi address or others (just like guest agent, subsys iscsi disk...),
we will find there is no basic controller for our new attached device.
Somtimes this will make guest cannot start after we add them (although
they can start at the second time).
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When libvirt is managing a bridge's forwarding database (FDB)
(macTableManager='libvirt'), if we add FDB entries for a new guest
interface even before the qemu process is created, then in the case of
a migration any other guest attached to the "destination" bridge will
have its traffic immediately sent to the destination of the migration
even while the source domain is still running (and the destination, of
course, isn't). To make sure that traffic from other guests on the new
host continues flowing to the old guest until the new one is ready, we
have to wait until the new guest CPUs are started to add the FDB
entries.
Conversely, we need to remove the FDB entries from the bridge any time
the guest CPUs are stopped; among other things, this will assure
proper operation during a post-copy migration (which is just the
opposite of the problem described in the previous paragraph).
We can change vnc password by using virDomainUpdateDeviceFlags API with
live flag. But it can't be changed with config flag. Error is reported as
below.
error: Operation not supported: persistent update of device 'graphics' is not supported
This patch supports the graphics arguments changed with config flag.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
It's not supported to change some graphics arguments with '--live'.
Replace some error code VIR_ERR_INTERNAL_ERROR and VIR_ERR_INVALID_ARG
with VIR_ERR_OPERATION_UNSUPPORTED.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1173507
It occurred to me that OpenStack uses the following XML when not using
regular huge pages:
<memoryBacking>
<hugepages>
<page size='4' unit='KiB'/>
</hugepages>
</memoryBacking>
However, since we are expecting to see huge pages only, we fail to
startup the domain with following error:
libvirtError: internal error: Unable to find any usable hugetlbfs
mount for 4 KiB
While regular system pages are not huge pages technically, our code is
prepared for that and if it helps OpenStack (or other management
applications) we should cope with that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160995
In our config files users are expected to pass several integer values
for different configuration knobs. However, majority of them expect a
nonnegative number and only a few of them accept a negative number too
(notably keepalive_interval in libvirtd.conf).
Therefore, a new type to config value is introduced: VIR_CONF_ULONG
that is set whenever an integer is positive or zero. With this
approach knobs accepting VIR_CONF_LONG should accept VIR_CONF_ULONG
too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We now have a qemuInterfaceStartDevices() which does the final
activation needed for the host-side tap/macvtap devices that are used
for qemu network connections. It will soon make sense to have the
converse qemuInterfaceStopDevices() which will undo whatever was done
during qemuInterfaceStartDevices().
A function to "stop" a single device has also been added, and is
called from the appropriate place in qemuDomainDetachNetDevice(),
although this is currently unnecessary - the device is going to
immediately be deleted anyway, so any extra "deactivation" will be for
naught. The call is included for completeness, though, in anticipation
that in the future there may be some required action that *isn't*
nullified by deleting the device.
This patch is a part of a more complete fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461