Commit Graph

9417 Commits

Author SHA1 Message Date
Peter Krempa
bc8b159cb1 qemu: backup: Properly propagate async job type when cancelling the job
When cancelling the blockjobs as part of failed backup job startup
recover we didn't pass in the correct async job type. Luckily the block
job handler and cancellation code paths use no block job at all
currently so those were correct.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:36 +01:00
Peter Krempa
3a98fe9db3 qemu: blockjob: Remove infrastructure for remembering to delete image
Now that we delete the images elsewhere it's not required. Additionally
it's safe to do as we never released an upstream version which required
this being in place.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:36 +01:00
Peter Krempa
40485059ab qemu: backup: Move deletion of backup images to job termination
While qemu is running both locations are identical in semantics, but the
move will allow us to fix the scenario when the VM is destroyed or
crashes where we'd leak the images.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:35 +01:00
Peter Krempa
d6b994bafd qemu: backup: Configure backup store image with backing file
In contrast to snapshots the backup job does not complain when the
backup job's store file has backing pre-configured. It's actually
required so that the NBD server exposes all the data properly.

Remove our fake termination and use the existing disk source as backing.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:35 +01:00
Peter Krempa
728b993c8a qemu: Reset the node-name allocator in qemuDomainObjPrivateDataClear
qemuDomainObjPrivateDataClear clears state which become invalid after VM
stopped running and the node name allocator belongs there.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:35 +01:00
Peter Krempa
bae81b8e76 qemu: block: Use proper asyncJob when waiting for completion of blockdev-create
The waiting loop used QEMU_ASYNC_JOB_NONE rather than 'asyncJob' passed
from the caller.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:35 +01:00
Daniel P. Berrangé
8812163124 src: remove unused imports of dirname.h
A few places were importing dirname.h without actually using it.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Daniel P. Berrangé
bf7d2a26a3 src: replace mdir_name() with g_path_get_dirname()
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Daniel P. Berrangé
472cc3941b util: replace IS_ABSOLUTE_FILE_NAME with g_path_is_absolute
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Daniel P. Berrangé
f5e9bdb87f src: replace clock_gettime()/gettimeofday() with g_get_real_time()
g_get_real_time() returns the time since epoch in microseconds.
It uses gettimeofday() internally while libvirt used clock_gettime
because it is declared async signal safe. In practice gettimeofday
is also async signal safe *provided* the timezone parameter is
NULL. This is indeed the case in g_get_real_time().

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Daniel P. Berrangé
f7df985684 src: switch from fnmatch to g_pattern_match_simple
The g_pattern_match function_simple is an acceptably close
approximation of fnmatch for libvirt's needs.

In contrast to fnmatch(), the '/' character can be matched
by the wildcards, there are no '[...]' character ranges and
'*' and '?' can not be escaped to include them literally in
a pattern.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Daniel P. Berrangé
d0312c584f src: use g_lstat() instead of lstat()
The GLib g_lstat() function provides a portable impl for
Win32.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-03 15:42:13 +00:00
Nikolay Shirokovskiy
6c6d93bc62 qemu: hide details of fake reboot
If we use fake reboot then domain goes thru running->shutdown->running
state changes with shutdown state only for short period of time.  At
least this is implementation details leaking into API. And also there is
one real case when this is not convinient. I'm doing a backup with the
help of temporary block snapshot (with the help of qemu's API which is
used in the newly created libvirt's backup API). If guest is shutdowned
I want to continue to backup so I don't kill the process and domain is
in shutdown state. Later when backup is finished I want to destroy qemu
process. So I check if it is in shutdowned state and destroy it if it
is. Now if instead of shutdown domain got fake reboot then I can destroy
process in the middle of fake reboot process.

After shutdown event we also get stop event and now as domain state is
running it will be transitioned to paused state and back to running
later. Though this is not critical for the described case I guess it is
better not to leak these details to user too. So let's leave domain in
running state on stop event if fake reboot is in process.

Reconnection code handles this patch without modification. It detects
that qemu is not running due to shutdown and then calls qemuProcessShutdownOrReboot
which reboots as fake reboot flag is set.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-24 09:22:40 +03:00
Daniel P. Berrangé
42b3e5b9e4 qemu: store the emulator name in the capabilities XML
We don't need this for any functional purpose, but when debugging hosts
it is useful to know what binary a given capabilities XML document is
associated with.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-23 16:39:38 +00:00
Daniel P. Berrangé
0fcc78d51b qemu: add qemu caps constructor which takes binary name
Simplify repeated code patterns by providing a new constructor taking
the QEMU binary name.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-23 16:39:36 +00:00
Daniel P. Berrangé
25db737471 qemu: add explicit flag to skip qemu caps invalidation
Currently if the binary path is NULL in the qemu capabilities object,
cache invalidation is skipped. A future patch will ensure that the
binary path is always non-NULL, so a way to explicitly skip invalidation
is required.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-23 16:39:20 +00:00
Daniel Henrique Barboza
7a7d36055c qemu_process.c: remove 'cleanup' label from qemuProcessCreatePretendCmd()
The 'cleanup' flag is doing no cleaup in this function. We can
remove it and return NULL on error or qemuBuildCommandLine().

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
d8eb3ab9e1 qemu_process.c: remove cleanup labels after g_auto*() changes
The g_auto*() changes made by the previous patches made a lot
of 'cleanup' labels obsolete. Let's remove them.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
d234efc59a qemu_process.c: use g_autoptr()
Change all feasible pointers to use g_autoptr().

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
906d653297 qemu_domain.h: add G_DEFINE_AUTOPTR_CLEANUP_FUNC for qemuDomainLogContext
This will allow us to g_autoptr qemuDomainLogContext pointers
in the following patch.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
982ea95142 qemu_process.c: use g_autofree
Change all feasible strings and scalar pointers to use g_autofree.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Fabiano Fidêncio
2c38781792 qemu: Don't check the output of virGetUserRuntimeDirectory()
virGetUserRuntimeDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-12-20 09:38:43 +01:00
Fabiano Fidêncio
c1a1c75952 qemu: Don't check the output of virGetUserConfigDirectory()
virGetUserConfigDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-12-20 09:38:43 +01:00
Fabiano Fidêncio
2db0583c73 qemu: Don't check the output of virGetUserCacheDirectory()
virGetUserCacheDirectory() *never* *ever* returns NULL, making the
checks for it completely unnecessary.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-12-20 09:38:43 +01:00
Fabiano Fidêncio
d0e1c6a6ae qemu: Don't check the output of virGetUserDirectory()
virGetUserDirectory() *never* *ever* returns NULL, making the checks for
it completely unnecessary.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-12-20 09:38:43 +01:00
Daniel Henrique Barboza
ae2edb39b9 qemu: handle unassigned PCI hostdevs in command line
Previous patch made it possible for the QEMU driver to check if
a given PCI hostdev is unassigned, by checking if dev->info->type is
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_UNASSIGNED, meaning that this device
shouldn't be part of the actual guest launch.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 13:08:28 -05:00
Daniel Henrique Barboza
96999404cb Introducing new address type='unassigned' for PCI hostdevs
This patch introduces a new PCI hostdev address type called
'unassigned'. This new type gives users the option to add
PCI hostdevs to the domain XML in an 'unassigned' state, meaning
that the device exists in the domain, is managed by Libvirt
like any regular PCI hostdev, but the guest does not have
access to it.

This adds extra options for managing PCI device binding
inside Libvirt, for example, making all the managed PCI hostdevs
declared in the domain XML to be detached from the host and bind
to the chosen driver and, at the same time, allowing just a
subset of these devices to be usable by the guest.

Next patch will use this new address type in the QEMU driver to
avoid adding unassigned devices to the QEMU launch command line.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 13:08:27 -05:00
Daniel Henrique Barboza
94f6e2f9fc qemu: command: move validation of vmcoreinfo to qemu_domain.c
Move the validation of vmcoreinfo from qemuBuildVMCoreInfoCommandLine()
to qemuDomainDefValidateFeatures(), allowing for validation
at domain define time.

qemuxml2xmltest.c was changed to account for this caps being
now validated at this earlier stage.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 13:01:36 -05:00
Daniel Henrique Barboza
a15de75dc5 qemu: command: move qemuBuildSmartcardCommandLine validation to qemu_domain.c
Move smartcard validation being done by qemuBuildSmartcardCommandLine()
to the existing qemuDomainSmartcardDefValidate() function. This
function is called by qemuDomainDeviceDefValidate(), allowing smartcard
validation in domain define time.

Tests were adapted to consider the new caps being needed in
this earlier stage.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 13:01:30 -05:00
Daniel Henrique Barboza
379e955eb8 qemu: command: move qemuBuildGraphicsEGLHeadlessCommandLine validation to qemu_domain.c
Move EGL Headless validation from qemuBuildGraphicsEGLHeadlessCommandLine()
to qemuDomainDeviceDefValidateGraphics(). This function is called by
qemuDomainDefValidate(), validating the graphics parameters in domain
define time.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 12:54:56 -05:00
Daniel Henrique Barboza
2acbbd821b qemu: command: move NVDIMM validation to qemu_domain.c
Move the NVDIMM validation from qemuBuildMachineCommandLine()
to a new function in qemu_domain.c, qemuDomainDeviceDefValidateMemory(),
which is called by qemuDomainDeviceDefValidate(). This allows
NVDIMM validation to occur in domain define time.

It also increments memory hotplug validation, which can be seen
by the failures in the hotplug tests in qemuxml2xmltest.c that
needed to be adjusted after the move.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 12:54:56 -05:00
Daniel Henrique Barboza
aed9bcd11b qemu_command: tidy up qemuBuildHostdevCommandLine loop
The current 'for' loop with 5 consecutive 'ifs' inside
qemuBuildHostdevCommandLine can be a bit smarter:

- all 5 'ifs' fails if hostdev->mode is not equal to
VIR_DOMAIN_HOSTDEV_MODE_SUBSYS. This check can be moved to the
start of the loop, failing to the next element immediately
in case it fails;

- all 5 'ifs' checks for a specific subsys->type to build the proper
command line argument (virHostdevIsSCSIDevice and virHostdevIsMdevDevice
do that but within a helper). Problem is that the code will keep
checking for matches even if one was already found, and there is
no way a hostdev will fit more than one 'if' (i.e. a hostdev can't
have 2+ different types). This means that a SUBSYS_TYPE_USB will
create its command line argument in the first 'if', then all other
conditionals will surely fail but will end up being checked anyway.

All of this can be avoided by moving the hostdev->mode comparing
to the start of the loop and using a switch statement with
subsys->type to execute the proper code for a given hostdev
type.

Suggested-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 16:02:08 +01:00
Michal Privoznik
39a7dff726 qemu: Don't leak hostcpu or hostnuma on driver cleanup
When freeing qemu driver struct members, we forgot to free
@hostcpu and @hostnuma members.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Michal Privoznik
7cf76d4e3a qemu: Reorder cleanup in qemuStateCleanup()
This function is supposed to clean up virQEMUDriver structure and
free individual members. However, it's doing that in random order
which makes it hard to track which members are being freed and
which are not. Do the free in reverse order than the structure
definition - assuming that the most important members (like
mutex) are declared first and freed last.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Laine Stump
6c17606b7c qemu: homogenize MAC address in live & config when hotplugging a netdev
Prior to commit 55ce656463 (first in libvirt 4.6.0), the XML sent to
virDomainAttachDeviceFlags() was parsed only once, and the results of
that parse were inserted into both the live object of the running
domain and into the persistent config. Thus, if MAC address was
omitted from in XML for a network device (<interface>), both the live
and config object would have the same MAC address.

Commit 55ce656463 changed the code to parse the incoming XML twice -
once for live and once for config. This does eliminate the problem of
PCI (/scsi/sata) address conflicts caused by allocating an address
based on existing devices in live object, but then inserting the
result into the config (which may already have a device using that
address), BUT it also means that when the MAC address of a network
device hasn't been specified in the XML, each copy will get a
different auto-generated MAC address.

This results in the MAC address of the device changing the next time
the domain is shutdown and restarted, which creates havoc with the
guest OS's network config.

There have been several discussions about this in the last > 1 year,
attempting to find the ideal solution to this problem that makes MAC
addresses consistent and accounts for all sorts of corner cases with
PCI/scsi/sata addresses. All of these discussions fizzled out because
every proposal was either too difficult to implement or failed to fix
some esoteric case someone thought up.

So, in the interest of solving the MAC address problem while not
making the "other address" situation any worse than before, this patch
simply adds a qemuDomainAttachDeviceLiveAndConfigHomogenize() function
that (for now) copies the MAC address from the config object to the
live object (if the original xml had <mac address='blah'/> then this
will be an effective NOP (as the macs already match)).

Any downstream libvirt containing upstream commit
55ce656463 should have this patch as well.

https://bugzilla.redhat.com/1783411

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-17 21:21:09 -05:00
Michal Privoznik
7be63dbe25 qemuGetDHCPInterfaces: Switch to GLib
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
c06f4b48fe qemuGetDHCPInterfaces: Move some variables inside the loop
Some variables are not used outside of the for() loop. Move their
declaration to clean up the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
dae430ccbc qemu: Don't use dom->conn to lookup virNetwork
When using the monolithic daemon, then dom->conn has all driver
tables filled in properly and thus it's safe to call an API other
than virDomain*(). However, when using split daemons then
dom->conn has only hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have obtain the cached connection object accessible via
virGetConnectNetwork().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
5910b180ca qemu_driver: Push qemuDomainInterfaceAddresses() a few lines down
If we place qemuDomainInterfaceAddresses() a few lines below the
two functions its using then we can drop forward declarations of
those functions.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Pavel Mores
b036505279 qemu: use g_autofree instead of VIR_FREE in qemuMonitorTextCreateSnapshot()
While at bugfixing, convert the whole function to the new-style memory
allocation handling.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
2019-12-17 10:49:30 -05:00
Michal Privoznik
430715604f qemu_hotplug: Prepare NVMe disks on hotplug
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
6edb4321b2 qemu: Allow forcing VFIO when computing memlock limit
With NVMe disks, one can start a blockjob with a NVMe disk
that is not visible in domain XML (at least right away). Usually,
it's fairly easy to override this limitation of
qemuDomainGetMemLockLimitBytes() - for instance for hostdevs we
temporarily add the device to domain def, let the function
calculate the limit and then remove the device. But it's not so
easy with virStorageSourcePtr - in some cases they don't
necessarily are attached to a disk. And even if they are it's
done later in the process and frankly, I find it too complicated
to be able to use the simple trick we use with hostdevs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
da27be1b09 qemu: Don't leak storage perms on failure in qemuDomainAttachDiskGeneric
At the very beginning of the attach function the
qemuDomainStorageSourceChainAccessAllow() is called which
modifies CGroups, locks and seclabels for new disk and its
backing chain. This must be followed by a counterpart which
reverts back all the changes if something goes wrong. This boils
down to calling qemuDomainStorageSourceChainAccessRevoke() which
is done under 'error' label. But not all failure branches jump
there. They just jump onto 'cleanup' label where no revoke is
done. Such mistake is easy to do because 'cleanup' label does
exist. Therefore, dissolve 'error' block in 'cleanup' and have
everything jump onto 'cleanup' label.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
1038505420 qemu_monitor_text: Catch IOMMU/VFIO related errors in qemuMonitorTextAddDrive
Because this is a HMP we're dealing with, there is nothing like
class of reply message, so we have to do some string comparison
to guess if the command fails. Well, with NVMe disks whole new
class of errors comes to play because qemu needs to initialize
IOMMU and VFIO for them. You can see all the messages it may
produce in qemu_vfio_init_pci().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
8e2026cc18 qemu: Generate command line of NVMe disks
Now, that we have everything prepared, we can generate command
line for NVMe disks.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
c4062d5620 qemu_capabilities: Introduce QEMU_CAPS_DRIVE_NVME
This capability tracks if qemu is capable of:

  -drive file.driver=nvme

The feature was added in QEMU's commit of v2.12.0-rc0~104^2~2.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
c988a39c7b qemu: Allow NVMe disk in CGroups
If a domain has an NVMe disk configured, then we need to allow it
on devices CGroup so that qemu can access it. There is one caveat
though - if an NVMe disk is read only we need CGroup to allow
write too. This is because when opening the device, qemu does
couple of ioctl()-s which are considered as write.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
329a680297 qemu: Mark NVMe disks as 'need VFIO'
There are couple of places where a domain with a VFIO device gets
special treatment: in CGroups when enabling/disabling access to
/dev/vfio/vfio, and when creating/removing nodes in domain mount
namespace. Well, a NVMe disk is a VFIO device too. Fortunately,
we have this qemuDomainNeedsVFIO() function which is the only
place that needs adjustment.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
a80ebd2a2a qemu: Create NVMe disk in domain namespace
If a domain has an NVMe disk configured, then we need to create
/dev/vfio/* paths in domain's namespace so that qemu can open
them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00
Michal Privoznik
d3f06dcdb5 qemu: Take NVMe disks into account when calculating memlock limit
We have this beautiful function that does crystal ball
divination. The function is named
qemuDomainGetMemLockLimitBytes() and it calculates the upper
limit of how much locked memory is given guest going to need. The
function bases its guess on devices defined for a domain. For
instance, if there is a VFIO hostdev defined then it adds 1GiB to
the guessed maximum. Since NVMe disks are pretty much VFIO
hostdevs (but not quite), we have to do the same sorcery.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00