Currently it is possible to start a domain which have disks
in same iotune group and at the same time having different iotune
params. Both params set are passed to qemu in command line and the one
that is passed later down command line is get actually set.
Let's prohibit such configurations.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently upon successfull call to qemu's implementation of
virDomainSetBlockIoTune iotune settings are changed only for the
disk given in API if the disk is in iotune group while we need
to change the settings for all disks in the group.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
And introduce virDomainBlockIoTuneInfoHasAny.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
All the callees return either 0 or -1 so there is no need
for propagating the value. And we bail on the first error.
Remove the variable to make the function simpler.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
We have a helper variable to make the code more concise,
use it consistently.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the cleanup section is empty, eliminate the cleanup
label as well as the 'ret' variable.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Use the g_auto macros wherever possible to eliminate the cleanup
section.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The template still references libvirt-qemu-shim, which was at one
point the name used to refer to what we now know as virt-qemu-run.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
A recent commit added an error check for too-nested backing chains
followed by a return, even though errors above jump to cleanup.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: b168fa88b8
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Remove bogus G_GNUC_UNUSED attribute and add a missing space.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: d600667278
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The algorithm is used in two places to find the parent checkpoint object
which contains given disk and then uses data from the disk. Additionally
the code is written in a very non-obvious way. Factor out the lookup of
the disk into a function which also simplifies the callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The function has no users now and there's no need for it as the common
pattern is to look up the whole disk object anyways.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a disk is unplugged and then the user tries to delete a checkpoint
the code would try to use NULL node name as it was not checked.
Fix this by fetching the whole disk definition object and verifying it
was found.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Lookup the whole disk definition rather than just the node name.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Upcoming patches will also use the domain disk definition. Rename disk
to chkdisk for clarity.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Upcoming patches will also use the domain disk definition. Rename disk
to chkdisk for clarity.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qemuCheckpointDiscard is a massive function that can be separated into
smaller bits. Extract the part that actually modifies the disk from the
metadata handling.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Test code will need to know whether the virQEMUCaps object contains any
machine types already. Add a helper and expose it via 'qemu_capspriv.h'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The previous approac of just purging the alias combined with the fact
that we filled in fake machine types in the test data meant that if a
test case used an alias machine type such as 'pc' or 'q35' it would not
properly resolve to the actual data returned by qemu.
This started to be a problem since the CPU driver now looks at the
default CPU reported with the machine type.
This patch replaces the original approach of just removing the alias by
replacing it with a copy of the machine type data which the type would
alias to. This means that we are using the real data while we don't
modify the test output after every qemu upgrade.
Additionally this change will allow us to drop adding the fake machine
types later.
The test fallout is from actually excercising the CPU driver with
actual data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Separate out the internals as they will become more complex soon.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Every supported qemu is able to return the list of machine types it
supports so we can start validating it against that list. The advantage
is a better error message, and the change will also prevent having stale
test data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Debian/Ubuntu linkers are more strict that other distros requiring glib
to be linked explicitly.
macOS needs -export-dynamic instead of -Wl,--export-dynamic
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Similarly to 510d154a0b we need to prevent
doing too deeply nested backing chains and reject them with a sane error
message.
Add a loop to go through the snapshots prior to attempting actually
creating them to prevent some possible inconsistent scenarios.
We don't need to do it when reusing backing chains as we'll be
re-detecting the backing chain in that case anyways.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Don't adopt the backing store data when reusing images provided by the
user. This will force a backing chain re-probe as users might have
passed in something unexpected in the overlay where our view of the
backing chain would not correspond.
This is done only for inactive snapshots as there we have way less
verification.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The previous "QEMU shim" proof of concept was taking an approach of only
caring about initial spawning of the QEMU process. It was then
registered with the libvirtd daemon who took over management of it. The
intent was that later libvirtd would be refactored so that the shim
retained control over the QEMU monitor and libvirt just forwarded APIs
to each shim as needed. This forwarding of APIs would require quite alot
of significant refactoring of libvirtd to achieve.
This impl thus takes a quite different approach, explicitly deciding to
keep the VMs completely separate from those seen & managed by libvirtd.
Instead it uses the new "qemu:///embed" URI scheme to embed the entire
QEMU driver in the shim, running with a custom root directory.
Once the driver is initialization, the shim starts a VM and then waits
to shutdown automatically when QEMU shuts down, or should kill QEMU if
it is terminated itself. This ought to use the AUTO_DESTROY feature but
that is not yet available in embedded mode, so we rely on installing a
few signal handlers to gracefully kill QEMU. This isn't reliable if
we crash of course, but you can restart with the same root dir.
Note this program does not expose any way to manage the QEMU process,
since there's no RPC interface enabled. It merely starts the VM and
cleans up when the guest shuts down at the end. This program is
installed to /usr/bin/virt-qemu-run enabling direct use by end users.
Most use cases will probably want to integrate the concept directly
into their respective application codebases. This standalone binary
serves as a nice demo though, and also provides a way to measure
performance of the startup process quite simply.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This enables support for running QEMU embedded to the calling
application process using a URI:
qemu:///embed?root=/some/path
Note that it is important to keep the path reasonably short to
avoid risk of hitting the limit on UNIX socket path names
which is 108 characters.
When using the embedded mode with a root=/var/tmp/embed, the
driver will use the following paths:
logDir: /var/tmp/embed/log/qemu
swtpmLogDir: /var/tmp/embed/log/swtpm
configBaseDir: /var/tmp/embed/etc/qemu
stateDir: /var/tmp/embed/run/qemu
swtpmStateDir: /var/tmp/embed/run/swtpm
cacheDir: /var/tmp/embed/cache/qemu
libDir: /var/tmp/embed/lib/qemu
swtpmStorageDir: /var/tmp/embed/lib/swtpm
defaultTLSx509certdir: /var/tmp/embed/etc/pki/qemu
These are identical whether the embedded driver is privileged
or unprivileged.
This compares with the system instance which uses
logDir: /var/log/libvirt/qemu
swtpmLogDir: /var/log/swtpm/libvirt/qemu
configBaseDir: /etc/libvirt/qemu
stateDir: /run/libvirt/qemu
swtpmStateDir: /run/libvirt/qemu/swtpm
cacheDir: /var/cache/libvirt/qemu
libDir: /var/lib/libvirt/qemu
swtpmStorageDir: /var/lib/libvirt/swtpm
defaultTLSx509certdir: /etc/pki/qemu
At this time all features present in the QEMU driver are available when
running in embedded mode, availability matching whether the embedded
driver is privileged or unprivileged.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The intent here is to allow the virt drivers to be run directly embedded
in an arbitrary process without interfering with libvirtd. To achieve
this they need to store all their configuration & state in a separate
directory tree from the main system or session libvirtd instances.
This can be useful for doing testing of the virt drivers in "make check"
without interfering with the user's own libvirtd instances.
It can also be used for applications using KVM/QEMU as a piece of
infrastructure to build an service, rather than for general purpose
OS hosting. A long standing example is libguestfs, which would prefer
if its temporary VMs did show up in the main libvirtd VM list, because
this confuses apps such as OpenStack Nova. A more recent example would
be Kata which is using KVM as a technology to build containers.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
If a domain is configured to have an egl-headless display and a virtio
video device, virgl will be enabled automatically within the guest, even
if the video device is configured with accel3d='no'.
In this case we should explicitly pass 'virgl=off' to qemu.
See https://bugzilla.redhat.com/show_bug.cgi?id=1791236 for more
information.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since v4.2-rc0, QEMU introduced a builtin rng backend that uses
getrandom() syscall to generate random. Add it to libvirt with the
backend model 'builtin'.
https://bugzilla.redhat.com/show_bug.cgi?id=1785091
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The 'builtin' rng backend model can be used as following:
<rng model='virtio'>
<backend model='builtin'/>
</rng>
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is used to check if qemu is capable of rng-builtin object.
This object is added since qemu-4.2.0-rc0, commit 6c4e9d48.
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since v5.6.0-48-g270583ed98 we try to cache domain capabilities,
i.e. store filled virDomainCaps in a hash table in virQEMUCaps
for future use. However, there's a race condition in the way it's
implemented. We use virQEMUCapsGetDomainCapsCache() to obtain the
pointer to the hash table, then we search the hash table for
cached data and if none is found the domcaps is constructed and
put into the table. Problem is that this is all done without any
locking, so if there are two threads trying to do the same, one
will succeed and the other will fail inserting the data into the
table.
Also, the API looks a bit fishy - obtaining pointer to the hash
table is dangerous.
The solution is to use a mutex that guards the whole operation
with the hash table. Then, the API can be changes to return
virDomainCapsPtr directly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1791790
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When fixing [1] I've ran attached reproducer and had it spawn
1024 threads and query capabilities XML in each one of them. This
lead libvirtd to hit the RLIMIT_NOFILE limit which was kind of
expected. What wasn't expected was a subsequent segfault. It
happened because virCPUProbeHost failed and returned NULL. We've
taken the NULL and passed it to virCapabilitiesHostNUMARef()
which dereferenced it. Code inspection showed the same flas in
virQEMUDriverGetHostNUMACaps(), so I'm fixing both places.
1: https://bugzilla.redhat.com/show_bug.cgi?id=1791790
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Don't use ERANGE as it doesn't make much sense in the error message.
Also point out that the reply from qemu was too large which is not
obvious from the original error:
error: No complete monitor response found in 10485760 bytes: Numerical result out of range
The new message will read:
error: internal error: QEMU monitor reply exceeds buffer size (10485760 bytes)
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
libvirt treats 'luks' images as raw+encryption. The logic in
qemuBlockStorageSourceCreateFormat skipped the creation if the requested
image was raw but didn't take into account the encryption.
This manifested itself e.g. when attempting to do a virsh blockcopy with
the following XML:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/tmp/enccpy'>
<encryption format='luks'>
<secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
</encryption>
</source>
</disk>
Where qemu would report the following error:
unable to execute QEMU command 'blockdev-add': Volume is not in LUKS format
rather than actually formatting the image first.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Use the user-configured name of the bitmap when merging the appropriate
bitmaps for an incremental backup so that the user can see it as
configured. Additionally expose the default bitmap name if nothing is
configured.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Pass the exportname as configured when exporting the image via NBD and
fill it with the default if it's not configured.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When using blockdev configurations the 'device' argument of
'blockdev-commit' must correspond to the topmost node in the block node
graph. Libvirt didn't do this properly in case when 'copy_on_read'
option was enabled on the disk.
Use qemuDomainDiskGetTopNodename to fix it when calling block-commit.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When using blockdev configurations the 'device' argument of
'blockdev-mirror' must correspond to the topmost node in the block node
graph. Libvirt didn't do this properly in case when 'copy_on_read'
option was enabled on the disk.
Use qemuDomainDiskGetTopNodename to fix it for the blockdev-mirror calls
in qemuDomainBlockCopy and the non-shared-storage migration.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
There are more places which require getting the topmost nodename to be
passed to qemu. Separate it out into a new function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a mirror job fails to start in -blockdev mode we'd not unplug the
backing files we added first because the code on the error path checked
the wrong value. 'rc' is used as status of the code which added the
images, but the state of the 'block(dev)-mirror' call is stored in 'ret'
at that point.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The virConnectGetDomainCapabilities API accepts either a binary path
to the emulator, or desired guest arch. If guest arch is not given,
then the host arch is assumed.
In the case where the binary is not given, the code tried to find the
emulator binary in the existing list of cached emulator capabilities.
This is not valid since we switched to lazy population of the cache in:
commit 3dd91af01f
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Mon Dec 2 13:04:26 2019 +0000
qemu: stop creating capabilities at driver startup
As a result of this change, if there are no persistent guests defined
using the requested guest architecture, virConnectGetDomainCapabilities
will fail to find an emulator binary.
The solution is to stop relying on the cached capabilities to find the
binary and instead use the same logic we use to pick default a binary
per arch when populating capabilities.
Tested-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The "ps2" bus is only available on certain machines like x86. On
machines like s390x, we should refuse to add a device to this bus
instead of silently ignoring it.
Looking at the QEMU sources, PS/2 is only available if the QEMU binary
has the "i8042" device, so let's check for that and only allow "ps2"
devices if this QEMU device is available, or if we're on x86 anyway
(so we don't have to fake the QEMU_CAPS_DEVICE_I8042 capability in
all the tests that use <input ... bus='ps2'/> in their xml data).
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1763191
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
QEMU driver has two functions: qemuGetDHCPInterfaces() and
qemuARPGetInterfaces() that are being used inside only one single
function. They can be turned into generic functions that other drivers
can use. This commit move both from QEMU driver tree to domain conf
tree.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function virSecretGetSecretString calls into secret driver and is
used from other hypervisors drivers and as such makes more sense in
util.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
gmtime_r/localtime_r are mostly used in combination with
strftime to format timestamps in libvirt. This can all
be replaced with GDateTime resulting in simpler code
that is also more portable.
There is some boundary condition problem in parsing POSIX
timezone offsets in GLib which tickles our test suite.
The test suite is hacked to avoid the problem. The upsteam
GLib bug report is
https://gitlab.gnome.org/GNOME/glib/issues/1999
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
G_STATIC_ASSERT() is a drop-in functional equivalent of
the GNULIB verify() macro.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt's original atomic ops impls were largely copied
from GLib's code at the time. The only API difference
was that libvirt's virAtomicIntInc() would return a
value, but g_atomic_int_inc was void. We thus use
g_atomic_int_add(v, 1) instead, though this means
virAtomicIntInc() now returns the original value,
instead of the new value.
This rewrites libvirt's impl in terms of g_atomic_int*
as a short term conversion. The key motivation was to
quickly eliminate use of GNULIB's verify_expr() macro
which is not a direct match for G_STATIC_ASSERT_EXPR.
Long term all the callers should be updated to use
g_atomic_int* directly.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Switch from old VIR_ allocation APIs to glib equivalents.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This function potentially grabs both a monitor job and an agent job at
the same time. This is problematic because it means that a malicious (or
just buggy) guest agent can cause a denial of service on the host. The
presence of this function makes it easy to do the wrong thing and hold
both jobs at the same time. All existing uses have already been removed
by previous commits.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In order to avoid holding an agent job and a normal job at the same
time, we want to avoid accessing the domain's definition while holding
the agent job. To achieve this, qemuAgentGetFSInfo() only returns the
raw information from the agent query to the caller. The caller can then
release the agent job and then proceed to look up the disk alias from
the vm definition. This necessitates moving a few helper functions to
qemu_driver.c and exposing the agent data structure (qemuAgentFSInfo) in
the header.
In addition, because the agent function no longer returns the looked-up
disk alias, we can't test the alias within qemuagenttest. Instead we
simply test that we parse and return the raw agent data correctly.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The qemuAgentDiskInfo structure is filled with information received from
the agent command response, except for the 'alias' field, which is
retrieved from the vm definition. Limit this structure only to data that
was received from the agent message.
This is another intermediate step in moving the responsibility for
searching the vmdef from qemu_agent.c to qemu_driver.c so that we can
avoid holding an agent job and a normal job at the same time.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In an effort to avoid holding both an agent and normal job at the same
time, we shouldn't access the vm definition from within qemu_agent.c
(i.e. while the agent job is being held). In preparation, we need to
store the full filesystem disk information in qemuAgentDiskInfo. In a
following commit, we can pass this information back to the caller and
the caller can search the vm definition to match the filsystem disk to
an alias.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function name doesn't give a good idea of what the function does.
Rename to qemuAgentGetFSInfoFillDisks() to make it more obvious than it
is filling in the disk information in the fsinfo struct.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
QEMU since 4.1.0 supports the "dies" parameter for -smp
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Recently CPU hardware vendors have started to support a new structure
inside the CPU package topology known as a "die". Thus the hierarchy
is now:
sockets > dies > cores > threads
This adds support for "dies" in the XML parser, with the value
defaulting to 1 if not specified for backwards compatibility.
For example a system with 64 logical CPUs might report
<topology sockets="4" dies="2" cores="4" threads="2"/>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When pause-before-switchover QEMU capability is enabled, we get STOP
event before MIGRATION event with postcopy-active state. To properly
handle post-copy migration and emit correct events commit
v4.10.0-rc1-4-geca9d21e6c added a hack to
qemuProcessHandleMigrationStatus which translates the paused state
reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
to post-copy.
However, the code was effective on both sides of migration resulting in
a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
host, where entering post-copy mode is already properly advertised by
VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.
https://bugzilla.redhat.com/show_bug.cgi?id=1791458
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When resuming a domain from a save file, we read the domain XML
from the file, add it onto our internal list of domains, start
the qemu process, let it load the incoming migration stream and
resume its vCPUs afterwards. If anything goes wrong, the domain
object is removed from the list of domains and error is returned
to the caller. However, the qemu process might be left behind -
if resuming vCPUs fails (e.g. because qemu is unable to acquire
write lock on a disk) then due to a bug the qemu process is not
killed but the domain object is removed from the list.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1718707
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We have to keep the default - querying the agent if no flag is
set.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
libvirt currently always reports that USB is available as a bus subsystem
type when running "virsh domcapabilities". However, this is not always
true, for example the qemu-system-s390x binary normally never has support
for USB. Thus we should only report that USB is available if there is
also a USB host controller available where we can attach USB devices.
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1759849
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Historically there are two places where we format authentication and
encryption for a disk. The logich which formats it for backing files was
flawed though and didn't format it at all. This worked if the image
became a backing file through the means of a snapshot but not directly.
Force formatting of the source and encryption for any non-disk case to
fix the issue.
This caused problems in many places as we use the formatter to copy the
definition. Effectively any copy lost the secret definition.
https://bugzilla.redhat.com/show_bug.cgi?id=1789310https://bugzilla.redhat.com/show_bug.cgi?id=1788898
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Internal snapshots of a non-running domain do not carry any memory state
and restoring such a snapshot will not replace existing saved memory
state. This allows a scenario, where a user first suspends a domain into
managedsave, restores a non-running snapshot and then resumes the domain
from managedsave. After that, the guest system will run with its
previous memory state atop a different disk state. The most obvious
possible fallout from this is extensive file system corruption. Swap
content and RAID bitmaps might also be off.
This has been discussed[1] and fixed[2] from the end-user perspective for
virt-manager.
This patch marks the restore operation as risky at the libvirt level,
requiring the user to remove the saved memory state first or force the
operation.
[1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
[2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit v5.10.0-290-g3a4787a301 refactored qemuDomainGetHostdevPath to
return a single path rather than an array of paths. When the function is
called on a missing device, it will now return NULL in @path rather than
a NULL array with zero items and the callers need to be adapted
properly.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
src/qemu/qemu_domain_address.c:680:13: error: this statement may fall through [-Werror=implicit-fallthrough=]
switch ((virDomainFSModel) dev->data.fs->model) {
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: f363af7e35
The point of this function is to translate virDomainOsDefFirmware
enum to qemuFirmwareOSInterface enum. However, with my commit
v5.10.0-507-g8e1804f9f6 we are passing a variable type of
virDomainLoader enum. Make the function accept both enums and
make the enum members correspond to each other.
This fixes clang build.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Split the formatting by fsdriver type to allow adding a new type.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Split the switch by fsdriver type to allow adding a new one.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Wire up the allocation and disposal of private data.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Remove the 'gluster' part and decouple the return from
the gluster_debug_level parsing to allow adding more options
to this section.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Inactive VM doesn't have qemuCaps set thus we'd never properly report
that VM backups are supported only for running VMs.
Move the capability check after the active check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
While we discourage people to use the old style of specifying
UEFI for their domains (the old style is putting path to the FW
image under /domain/os/loader/ whilst the new one is using
/domain/os/@firmware), some applications might have not adapted
yet. They still rely on libvirt autofilling NVRAM path and
figuring out NVRAM template when using the old way (notably
virt-install does this). We must preserve backcompat for this
previously supported config approach. However, since we really
want distro maintainers to leave --with-loader-nvram configure
option and rely on JSON descriptors, we need to implement
autofilling of NVRAM template for the old way too.
Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1782778
RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1776949
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
These functions are meant to replace verbose check for the old
style of specifying UEFI with a simple function call.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This simplifies condition when matching FW interface by having a
single line condition instead of multiline one. Also, it prepares
the code for future expansion.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This function needs domain definition really, we don't need to
pass the whole domain object. This saves couple of dereferences
and characters esp. in more checks to come.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We insert the checkpoint metadata into the list of checkpoints prior to
actually creating the on-disk bits. If the 'transaction' or any other
steps done between inserting the checkpoint and creating the on-disk
data fail we'd end up with an unusable checkpoint that would vanish
after libvirtd restart.
Prevent this by rolling back the metadata if we didn't actually take and
record the checkpoint.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
If we are certain that the checkpoint creation failed we remove the
metadata from the list. To allow reusing this in the backup code add a
new helper and export it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The underlying resctrl monitoring is actually using 64 bit counters,
not the 32bit one. Correct this by using 64bit data type for reading
hardware value.
To keep the interface consistent, the result of CPU last level cache
that occupied by vcpu processors of specific restrl monitor group is
still reported with a truncated 32bit data type. because, in silicon
world, CPU cache size will never exceed 4GB.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Commit d75f865fb9 caused a job-deadlock if
a VM is running the backup job and being destroyed as it removed the
cleanup of the async job type and there was nothing to clean up the
backup job.
Add an explicit cleanup of the backup job when destroying a VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When cancelling the blockjobs as part of failed backup job startup
recover we didn't pass in the correct async job type. Luckily the block
job handler and cancellation code paths use no block job at all
currently so those were correct.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that we delete the images elsewhere it's not required. Additionally
it's safe to do as we never released an upstream version which required
this being in place.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While qemu is running both locations are identical in semantics, but the
move will allow us to fix the scenario when the VM is destroyed or
crashes where we'd leak the images.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In contrast to snapshots the backup job does not complain when the
backup job's store file has backing pre-configured. It's actually
required so that the NBD server exposes all the data properly.
Remove our fake termination and use the existing disk source as backing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
qemuDomainObjPrivateDataClear clears state which become invalid after VM
stopped running and the node name allocator belongs there.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The waiting loop used QEMU_ASYNC_JOB_NONE rather than 'asyncJob' passed
from the caller.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
A few places were importing dirname.h without actually using it.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
g_get_real_time() returns the time since epoch in microseconds.
It uses gettimeofday() internally while libvirt used clock_gettime
because it is declared async signal safe. In practice gettimeofday
is also async signal safe *provided* the timezone parameter is
NULL. This is indeed the case in g_get_real_time().
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The g_pattern_match function_simple is an acceptably close
approximation of fnmatch for libvirt's needs.
In contrast to fnmatch(), the '/' character can be matched
by the wildcards, there are no '[...]' character ranges and
'*' and '?' can not be escaped to include them literally in
a pattern.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The GLib g_lstat() function provides a portable impl for
Win32.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
If we use fake reboot then domain goes thru running->shutdown->running
state changes with shutdown state only for short period of time. At
least this is implementation details leaking into API. And also there is
one real case when this is not convinient. I'm doing a backup with the
help of temporary block snapshot (with the help of qemu's API which is
used in the newly created libvirt's backup API). If guest is shutdowned
I want to continue to backup so I don't kill the process and domain is
in shutdown state. Later when backup is finished I want to destroy qemu
process. So I check if it is in shutdowned state and destroy it if it
is. Now if instead of shutdown domain got fake reboot then I can destroy
process in the middle of fake reboot process.
After shutdown event we also get stop event and now as domain state is
running it will be transitioned to paused state and back to running
later. Though this is not critical for the described case I guess it is
better not to leak these details to user too. So let's leave domain in
running state on stop event if fake reboot is in process.
Reconnection code handles this patch without modification. It detects
that qemu is not running due to shutdown and then calls qemuProcessShutdownOrReboot
which reboots as fake reboot flag is set.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
We don't need this for any functional purpose, but when debugging hosts
it is useful to know what binary a given capabilities XML document is
associated with.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Simplify repeated code patterns by providing a new constructor taking
the QEMU binary name.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently if the binary path is NULL in the qemu capabilities object,
cache invalidation is skipped. A future patch will ensure that the
binary path is always non-NULL, so a way to explicitly skip invalidation
is required.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The 'cleanup' flag is doing no cleaup in this function. We can
remove it and return NULL on error or qemuBuildCommandLine().
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The g_auto*() changes made by the previous patches made a lot
of 'cleanup' labels obsolete. Let's remove them.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Change all feasible pointers to use g_autoptr().
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This will allow us to g_autoptr qemuDomainLogContext pointers
in the following patch.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Change all feasible strings and scalar pointers to use g_autofree.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virGetUserRuntimeDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virGetUserConfigDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virGetUserCacheDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virGetUserDirectory() *never* *ever* returns NULL, making the checks for
it completely unnecessary.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Previous patch made it possible for the QEMU driver to check if
a given PCI hostdev is unassigned, by checking if dev->info->type is
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_UNASSIGNED, meaning that this device
shouldn't be part of the actual guest launch.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch introduces a new PCI hostdev address type called
'unassigned'. This new type gives users the option to add
PCI hostdevs to the domain XML in an 'unassigned' state, meaning
that the device exists in the domain, is managed by Libvirt
like any regular PCI hostdev, but the guest does not have
access to it.
This adds extra options for managing PCI device binding
inside Libvirt, for example, making all the managed PCI hostdevs
declared in the domain XML to be detached from the host and bind
to the chosen driver and, at the same time, allowing just a
subset of these devices to be usable by the guest.
Next patch will use this new address type in the QEMU driver to
avoid adding unassigned devices to the QEMU launch command line.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the validation of vmcoreinfo from qemuBuildVMCoreInfoCommandLine()
to qemuDomainDefValidateFeatures(), allowing for validation
at domain define time.
qemuxml2xmltest.c was changed to account for this caps being
now validated at this earlier stage.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move smartcard validation being done by qemuBuildSmartcardCommandLine()
to the existing qemuDomainSmartcardDefValidate() function. This
function is called by qemuDomainDeviceDefValidate(), allowing smartcard
validation in domain define time.
Tests were adapted to consider the new caps being needed in
this earlier stage.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move EGL Headless validation from qemuBuildGraphicsEGLHeadlessCommandLine()
to qemuDomainDeviceDefValidateGraphics(). This function is called by
qemuDomainDefValidate(), validating the graphics parameters in domain
define time.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the NVDIMM validation from qemuBuildMachineCommandLine()
to a new function in qemu_domain.c, qemuDomainDeviceDefValidateMemory(),
which is called by qemuDomainDeviceDefValidate(). This allows
NVDIMM validation to occur in domain define time.
It also increments memory hotplug validation, which can be seen
by the failures in the hotplug tests in qemuxml2xmltest.c that
needed to be adjusted after the move.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The current 'for' loop with 5 consecutive 'ifs' inside
qemuBuildHostdevCommandLine can be a bit smarter:
- all 5 'ifs' fails if hostdev->mode is not equal to
VIR_DOMAIN_HOSTDEV_MODE_SUBSYS. This check can be moved to the
start of the loop, failing to the next element immediately
in case it fails;
- all 5 'ifs' checks for a specific subsys->type to build the proper
command line argument (virHostdevIsSCSIDevice and virHostdevIsMdevDevice
do that but within a helper). Problem is that the code will keep
checking for matches even if one was already found, and there is
no way a hostdev will fit more than one 'if' (i.e. a hostdev can't
have 2+ different types). This means that a SUBSYS_TYPE_USB will
create its command line argument in the first 'if', then all other
conditionals will surely fail but will end up being checked anyway.
All of this can be avoided by moving the hostdev->mode comparing
to the start of the loop and using a switch statement with
subsys->type to execute the proper code for a given hostdev
type.
Suggested-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When freeing qemu driver struct members, we forgot to free
@hostcpu and @hostnuma members.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This function is supposed to clean up virQEMUDriver structure and
free individual members. However, it's doing that in random order
which makes it hard to track which members are being freed and
which are not. Do the free in reverse order than the structure
definition - assuming that the most important members (like
mutex) are declared first and freed last.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Prior to commit 55ce656463 (first in libvirt 4.6.0), the XML sent to
virDomainAttachDeviceFlags() was parsed only once, and the results of
that parse were inserted into both the live object of the running
domain and into the persistent config. Thus, if MAC address was
omitted from in XML for a network device (<interface>), both the live
and config object would have the same MAC address.
Commit 55ce656463 changed the code to parse the incoming XML twice -
once for live and once for config. This does eliminate the problem of
PCI (/scsi/sata) address conflicts caused by allocating an address
based on existing devices in live object, but then inserting the
result into the config (which may already have a device using that
address), BUT it also means that when the MAC address of a network
device hasn't been specified in the XML, each copy will get a
different auto-generated MAC address.
This results in the MAC address of the device changing the next time
the domain is shutdown and restarted, which creates havoc with the
guest OS's network config.
There have been several discussions about this in the last > 1 year,
attempting to find the ideal solution to this problem that makes MAC
addresses consistent and accounts for all sorts of corner cases with
PCI/scsi/sata addresses. All of these discussions fizzled out because
every proposal was either too difficult to implement or failed to fix
some esoteric case someone thought up.
So, in the interest of solving the MAC address problem while not
making the "other address" situation any worse than before, this patch
simply adds a qemuDomainAttachDeviceLiveAndConfigHomogenize() function
that (for now) copies the MAC address from the config object to the
live object (if the original xml had <mac address='blah'/> then this
will be an effective NOP (as the macs already match)).
Any downstream libvirt containing upstream commit
55ce656463 should have this patch as well.
https://bugzilla.redhat.com/1783411
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Some variables are not used outside of the for() loop. Move their
declaration to clean up the code a bit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
When using the monolithic daemon, then dom->conn has all driver
tables filled in properly and thus it's safe to call an API other
than virDomain*(). However, when using split daemons then
dom->conn has only hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have obtain the cached connection object accessible via
virGetConnectNetwork().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If we place qemuDomainInterfaceAddresses() a few lines below the
two functions its using then we can drop forward declarations of
those functions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
While at bugfixing, convert the whole function to the new-style memory
allocation handling.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
With NVMe disks, one can start a blockjob with a NVMe disk
that is not visible in domain XML (at least right away). Usually,
it's fairly easy to override this limitation of
qemuDomainGetMemLockLimitBytes() - for instance for hostdevs we
temporarily add the device to domain def, let the function
calculate the limit and then remove the device. But it's not so
easy with virStorageSourcePtr - in some cases they don't
necessarily are attached to a disk. And even if they are it's
done later in the process and frankly, I find it too complicated
to be able to use the simple trick we use with hostdevs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
At the very beginning of the attach function the
qemuDomainStorageSourceChainAccessAllow() is called which
modifies CGroups, locks and seclabels for new disk and its
backing chain. This must be followed by a counterpart which
reverts back all the changes if something goes wrong. This boils
down to calling qemuDomainStorageSourceChainAccessRevoke() which
is done under 'error' label. But not all failure branches jump
there. They just jump onto 'cleanup' label where no revoke is
done. Such mistake is easy to do because 'cleanup' label does
exist. Therefore, dissolve 'error' block in 'cleanup' and have
everything jump onto 'cleanup' label.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Because this is a HMP we're dealing with, there is nothing like
class of reply message, so we have to do some string comparison
to guess if the command fails. Well, with NVMe disks whole new
class of errors comes to play because qemu needs to initialize
IOMMU and VFIO for them. You can see all the messages it may
produce in qemu_vfio_init_pci().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now, that we have everything prepared, we can generate command
line for NVMe disks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This capability tracks if qemu is capable of:
-drive file.driver=nvme
The feature was added in QEMU's commit of v2.12.0-rc0~104^2~2.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If a domain has an NVMe disk configured, then we need to allow it
on devices CGroup so that qemu can access it. There is one caveat
though - if an NVMe disk is read only we need CGroup to allow
write too. This is because when opening the device, qemu does
couple of ioctl()-s which are considered as write.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There are couple of places where a domain with a VFIO device gets
special treatment: in CGroups when enabling/disabling access to
/dev/vfio/vfio, and when creating/removing nodes in domain mount
namespace. Well, a NVMe disk is a VFIO device too. Fortunately,
we have this qemuDomainNeedsVFIO() function which is the only
place that needs adjustment.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If a domain has an NVMe disk configured, then we need to create
/dev/vfio/* paths in domain's namespace so that qemu can open
them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
We have this beautiful function that does crystal ball
divination. The function is named
qemuDomainGetMemLockLimitBytes() and it calculates the upper
limit of how much locked memory is given guest going to need. The
function bases its guess on devices defined for a domain. For
instance, if there is a VFIO hostdev defined then it adds 1GiB to
the guessed maximum. Since NVMe disks are pretty much VFIO
hostdevs (but not quite), we have to do the same sorcery.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The qemu driver has its own wrappers around virHostdev module (so
that some arguments are filled in automatically). Extend these to
include NVMe devices too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
To simplify implementation, some restrictions are added. For
instance, an NVMe disk can't go to any bus but virtio and has to
be type of 'disk' and can't have startupPolicy set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There are going to be more disk types that are considered unsafe
with respect to migration. Therefore, move the error reporting
call outside of if() body and rework if-else combo to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Sometimes, we have a PCI address and not fully allocated
virPCIDevice and yet we still want to know its /dev/vfio/N path.
Introduce virPCIDeviceAddressGetIOMMUGroupDev() function exactly
for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Previous patches rendered some of 'cleanup' labels needless.
Drop them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that all callers of qemuDomainGetHostdevPath() handle
/dev/vfio/vfio on their own, we can safely drop handling in this
function. In near future the decision whether domain needs VFIO
file is going to include more device types than just
virDomainHostdev.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There are several variables which could be automatically freed
upon return from the function. I'm not changing @tmpPaths (which
is a string list) because it is going to be removed in next
commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
In near future, the decision what to do with /dev/vfio/vfio with
respect to domain namespace and CGroup is going to be moved out
of qemuDomainGetHostdevPath() because there will be some other
types of devices than hostdevs that need access to VFIO.
All functions that I'm changing (except qemuSetupHostdevCgroup())
assume that hostdev we are adding/removing to VM is not in the
definition yet (because of how qemuDomainNeedsVFIO() is written).
Fortunately, this assumption is true.
For qemuSetupHostdevCgroup(), the worst thing that may happen is
that we allow /dev/vfio/vfio which was already allowed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
qemuBuildSoundCodecStr() validates if a given QEMU binary
supports the sound codec. This validation can be moved to
qemu_domain.c to be executed in domain define time.
The codec validation was moved to the existing
qemuDomainDeviceDefValidateSound() function.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>