35958 Commits

Author SHA1 Message Date
Daniel Henrique Barboza
a15de75dc5 qemu: command: move qemuBuildSmartcardCommandLine validation to qemu_domain.c
Move the smartcard validation done by qemuBuildSmartcardCommandLine()
to the existing qemuDomainSmartcardDefValidate() function. This
function is called by qemuDomainDeviceDefValidate(), allowing smartcard
validation at domain define time.

Tests were adapted to account for the new capabilities that are now
needed at this earlier stage.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 13:01:30 -05:00
Daniel Henrique Barboza
379e955eb8 qemu: command: move qemuBuildGraphicsEGLHeadlessCommandLine validation to qemu_domain.c
Move EGL Headless validation from qemuBuildGraphicsEGLHeadlessCommandLine()
to qemuDomainDeviceDefValidateGraphics(). This function is called by
qemuDomainDefValidate(), validating the graphics parameters at domain
define time.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 12:54:56 -05:00
Daniel Henrique Barboza
2acbbd821b qemu: command: move NVDIMM validation to qemu_domain.c
Move the NVDIMM validation from qemuBuildMachineCommandLine()
to a new function in qemu_domain.c, qemuDomainDeviceDefValidateMemory(),
which is called by qemuDomainDeviceDefValidate(). This allows
NVDIMM validation to occur at domain define time.

It also extends memory hotplug validation, as shown by the failures
in the hotplug tests in qemuxml2xmltest.c that needed to be adjusted
after the move.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 12:54:56 -05:00
Fabiano Fidêncio
5742d4c018 util: Rewrite virGetUserRuntimeDirectory() using g_get_user_runtime_dir()
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-18 17:58:19 +01:00
Fabiano Fidêncio
520e626e7e util: Rewrite virGetUserCacheDirectory() using g_get_user_cache_dir()
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-18 17:58:15 +01:00
Fabiano Fidêncio
e59b946ce4 util: Rewrite virGetUserConfigDirectory() using g_get_user_config_dir()
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-18 17:58:11 +01:00
Fabiano Fidêncio
850fb89a43 util: Rewrite virGetUserDirectory() using g_get_home_dir()
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-18 17:58:05 +01:00
Daniel P. Berrangé
e67ccd3cf8 conf: fix populating of fake NUMA in multi-node hosts
If the host OS doesn't have NUMA present, we fall back to
populating fake NUMA info, and the code thus assumes only a
single NUMA node.

Unfortunately, we also fall back to fake NUMA if numactl-devel
was not present, and in that case we can still have multiple
NUMA nodes. We then create all CPUs, but only the CPUs in the
first node have any data filled in, resulting in capabilities
like:

    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='48'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='11'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
          </cpus>
        </cell>
      </cells>
    </topology>

With this new code we get something slightly less broken:

    <topology>
      <cells num='4'>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='12'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='10-11'/>
          </cpus>
        </cell>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='12'>
            <cpu id='12' socket_id='0' core_id='0' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='0' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='1' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='1' siblings='14-15'/>
            <cpu id='16' socket_id='0' core_id='2' siblings='16-17'/>
            <cpu id='17' socket_id='0' core_id='2' siblings='16-17'/>
            <cpu id='18' socket_id='0' core_id='3' siblings='18-19'/>
            <cpu id='19' socket_id='0' core_id='3' siblings='18-19'/>
            <cpu id='20' socket_id='0' core_id='4' siblings='20-21'/>
            <cpu id='21' socket_id='0' core_id='4' siblings='20-21'/>
            <cpu id='22' socket_id='0' core_id='5' siblings='22-23'/>
            <cpu id='23' socket_id='0' core_id='5' siblings='22-23'/>
          </cpus>
        </cell>
      </cells>
    </topology>

The topology at least now reflects what 'virsh nodeinfo' reports.
The main remaining bug is that the CPU "id" values won't match the
ones the Linux host actually uses.
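A minimal sketch of how a fake topology can fill in every CPU rather than only those in the first node. It assumes a uniform layout: one socket per node, consecutive CPU IDs, and `threads` threads per core. That layout matches the sample output above. The struct and function names are hypothetical and are not libvirt's code.

```c
#include <assert.h>

/* Hypothetical description of one fake CPU entry. */
struct fake_cpu {
    int id;
    int node;
    int socket_id;
    int core_id;
    int sibling_first; /* first CPU in the sibling range */
    int sibling_last;  /* last CPU in the sibling range */
};

/* Fill in CPU 'id' for a host with 'cpus_per_node' CPUs per node and
 * 'threads' threads per core. Assumes cpus_per_node is a multiple of
 * threads and that every node holds a single socket. */
static void fake_cpu_fill(struct fake_cpu *cpu, int id,
                          int cpus_per_node, int threads)
{
    int local = id % cpus_per_node;  /* position within its node */
    int first = id - (id % threads); /* first thread of this core */

    assert(cpus_per_node % threads == 0);
    cpu->id = id;
    cpu->node = id / cpus_per_node;
    cpu->socket_id = 0;
    cpu->core_id = local / threads;
    cpu->sibling_first = first;
    cpu->sibling_last = first + threads - 1;
}
```

With 12 CPUs per node and 2 threads per core, CPU 13 lands in the second node as core 0 with siblings 12-13, as in the sample above.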

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-18 15:19:22 +00:00
Daniel P. Berrangé
fb5aaf3d05 conf: avoid mem leak re-allocating fake NUMA capabilities
The 'caps' object is already allocated when the fake NUMA
initialization takes place, so allocating it again leaks the
original object.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-18 15:19:22 +00:00
Daniel Henrique Barboza
aed9bcd11b qemu_command: tidy up qemuBuildHostdevCommandLine loop
The current 'for' loop with 5 consecutive 'ifs' inside
qemuBuildHostdevCommandLine can be a bit smarter:

- all 5 'ifs' fail if hostdev->mode is not equal to
VIR_DOMAIN_HOSTDEV_MODE_SUBSYS. This check can be moved to the
start of the loop, skipping to the next element immediately
if it fails;

- all 5 'ifs' check for a specific subsys->type to build the proper
command line argument (virHostdevIsSCSIDevice and virHostdevIsMdevDevice
do that but within a helper). The problem is that the code keeps
checking for matches even after one was found, and there is
no way a hostdev can match more than one 'if' (i.e. a hostdev can't
have 2+ different types). This means that a SUBSYS_TYPE_USB will
create its command line argument in the first 'if', and all the other
conditionals will surely fail but will still be checked anyway.

All of this can be avoided by moving the hostdev->mode comparison
to the start of the loop and using a switch statement on
subsys->type to execute the proper code for a given hostdev
type.
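A simplified sketch of the loop shape described above: one mode check with an early `continue`, then a `switch` on the subsystem type. The enum values, struct, and `build_one()` helper are hypothetical stand-ins rather than libvirt's types. Each case returns a label in place of building a real command line argument.

```c
#include <assert.h>
#include <stddef.h>

enum hostdev_mode { MODE_SUBSYS, MODE_CAPABILITIES };
enum subsys_type { SUBSYS_USB, SUBSYS_PCI, SUBSYS_SCSI,
                   SUBSYS_SCSI_HOST, SUBSYS_MDEV };

struct hostdev {
    enum hostdev_mode mode;
    enum subsys_type type;
};

/* Stand-in for building one argument. A hostdev has exactly one type,
 * so the switch stops at the first (and only) match. */
static const char *build_one(const struct hostdev *dev)
{
    switch (dev->type) {
    case SUBSYS_USB:       return "usb";
    case SUBSYS_PCI:       return "pci";
    case SUBSYS_SCSI:      return "scsi";
    case SUBSYS_SCSI_HOST: return "scsi_host";
    case SUBSYS_MDEV:      return "mdev";
    }
    return NULL; /* unknown type */
}

/* Collect one label per subsys hostdev into 'out'; returns the count. */
static size_t build_hostdev_args(const struct hostdev *devs, size_t ndevs,
                                 const char **out)
{
    size_t n = 0;

    for (size_t i = 0; i < ndevs; i++) {
        /* Check the mode once and move on immediately if it doesn't match. */
        if (devs[i].mode != MODE_SUBSYS)
            continue;

        const char *arg = build_one(&devs[i]);
        if (arg)
            out[n++] = arg;
    }
    return n;
}
```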

Suggested-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 16:02:08 +01:00
Daniel P. Berrangé
2e07a1e146 event: add API for requiring an event loop impl to be registered
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-18 14:04:59 +00:00
Daniel P. Berrangé
cccc3fc1bb access: report an error if no access manager is present
The code calling this method expects it to have reported an error on
failure.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-18 14:04:51 +00:00
Michal Privoznik
39a7dff726 qemu: Don't leak hostcpu or hostnuma on driver cleanup
When freeing qemu driver struct members, we forgot to free
@hostcpu and @hostnuma members.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Michal Privoznik
7cf76d4e3a qemu: Reorder cleanup in qemuStateCleanup()
This function is supposed to clean up the virQEMUDriver structure and
free its individual members. However, it does so in random order,
which makes it hard to track which members are being freed and
which are not. Free them in the reverse order of the structure
definition, assuming that the most important members (like the
mutex) are declared first and freed last.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Michal Privoznik
599f9c73d4 virCapabilitiesHostNUMAUnref: Accept NULL
Fortunately, this is not causing any problems now because glib
does this check for us when calling this function via attribute
cleanup. But in a future commit we will explicitly call this
function over a struct member that might be NULL.
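A minimal sketch of a NULL-tolerant unref. The commit says glib performs this same check when the function runs via attribute cleanup. The `host_numa` struct and its `refs` counter are hypothetical simplifications, not libvirt's virCapsHostNUMA.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the host NUMA struct. */
struct host_numa {
    int refs;
};

/* Accepting NULL lets callers unref an optional struct member
 * unconditionally, e.g. during driver cleanup. */
static void host_numa_unref(struct host_numa *numa)
{
    if (!numa)
        return;
    if (--numa->refs == 0)
        free(numa);
}
```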

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Michal Privoznik
98f1f4a439 testutilsxen: Avoid double free of driver caps
In testXLInitDriver() a dummy driver structure is filled and it
is freed later in testXLFreeDriver(). However, it is sufficient
to unref just driver->config because that results in
libxlDriverConfigDispose() being called which unrefs
driver->config->caps. There is no need to unref it again in
testXLFreeDriver() - in fact it's undesired.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Michal Privoznik
08a7e88b6f domaincapstest: Don't leak cpu definitions
When generating domain capabilities, we need to fake the host CPU to
get reproducible results. We do this by copying a pre-existing CPU
config and setting the VIR_TEST_MOCK_FAKE_HOST_CPU env variable, which
is then consumed by qemucpumock. However, we forget to free the
CPU copy afterwards.

 2,196 (2,016 direct, 180 indirect) bytes in 18 blocks are definitely lost in loss record 291 of 297
    at 0x4838B86: calloc (vg_replace_malloc.c:762)
    by 0x57CB6A0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7)
    by 0x4A0F72D: virCPUDefNew (cpu_conf.c:87)
    by 0x4A0FAC7: virCPUDefCopyWithoutModel (cpu_conf.c:235)
    by 0x4A0FBBE: virCPUDefCopy (cpu_conf.c:273)
    by 0x10E3C0: testUtilsHostCpusGetDefForArch (testutilshostcpus.h:157)
    by 0x10E3C0: fakeHostCPU (domaincapstest.c:61)
    by 0x10E3C0: fillQemuCaps (domaincapstest.c:86)
    by 0x10E3C0: test_virDomainCapsFormat (domaincapstest.c:234)
    by 0x10F4BC: virTestRun (testutils.c:146)
    by 0x10DE93: doTestQemuInternal (domaincapstest.c:301)
    by 0x10E13D: doTestQemu (domaincapstest.c:332)
    by 0x1124CF: testQemuCapsIterate (testutilsqemu.c:635)
    by 0x10DCE3: mymain (domaincapstest.c:435)
    by 0x10FD8B: virTestMain (testutils.c:916)

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-18 14:28:48 +01:00
Daniel P. Berrangé
5209791e47 src: warn against virNodeGetInfo() API call due to inaccurate info
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-18 11:57:18 +00:00
Peter Krempa
3e719fe949 test: qemucaps: Refresh x86_64 caps probe data for the qemu-4.2 release
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2019-12-18 09:49:31 +01:00
Peter Krempa
5949ac0f59 kbase: Add document outlining backing chain XML config and troubleshooting
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 09:36:49 +01:00
Peter Krempa
3615e8b39b util: storage: Don't treat files with missing backing store format as 'raw'
Assuming that the backing image format is raw is wrong when doing image
detection:

1) In -drive mode qemu will still probe the image format of the backing
   image. This means it will try to open a backing file of the image
   which will fail if a more advanced security model is in use.

2) In blockdev mode the image will actually be opened as raw, which is
   wrong since it might be qcow. Not opening the backing images will
   also result in the guest seeing corrupted data.

Rather than attempting to handle the various corner cases where
assuming the storage file is raw happens to be right, forbid startup
when the guest image doesn't have the backing store format specified
in its metadata.
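The new policy can be sketched as a check that refuses to guess. Everything here is hypothetical: `FORMAT_NONE` stands for "no backing format recorded in the image metadata", and the function is not libvirt's virStorageFileGetMetadata().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical formats; FORMAT_NONE means the metadata names no format. */
enum image_format { FORMAT_NONE, FORMAT_RAW, FORMAT_QCOW2 };

/* Returns 0 if the backing image may be used (storing its format in
 * *out), or -1 to refuse startup. The old behaviour silently treated
 * FORMAT_NONE as FORMAT_RAW. */
static int check_backing_format(enum image_format recorded,
                                const char *backing_path,
                                enum image_format *out)
{
    if (backing_path == NULL)
        return 0;   /* no backing image, nothing to check */
    if (recorded == FORMAT_NONE)
        return -1;  /* don't guess 'raw'; report an error instead */
    *out = recorded;
    return 0;
}
```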

https://bugzilla.redhat.com/show_bug.cgi?id=1588373

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 09:36:48 +01:00
Peter Krempa
a649369480 tests: storage: Remove unused test modes
EXP_WARN and ALLOW_PROBE flags for the testStorageChain cases are no
longer used so we can remove them.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 09:36:48 +01:00
Peter Krempa
7e582fe995 tests: storage: Use strict version of virStorageFileGetMetadata
Pass in 'true' as '@report_broken' of virStorageFileGetMetadata to make
it fail in the tests. The most important code paths (when starting the
VM) expect this function to fail rather than silently return partial
data. Switch the test to exercise this more important code path.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-18 09:36:48 +01:00
Laine Stump
6c17606b7c qemu: homogenize MAC address in live & config when hotplugging a netdev
Prior to commit 55ce6564634 (first in libvirt 4.6.0), the XML sent to
virDomainAttachDeviceFlags() was parsed only once, and the results of
that parse were inserted into both the live object of the running
domain and into the persistent config. Thus, if the MAC address was
omitted from the XML for a network device (<interface>), both the live
and config objects would have the same MAC address.

Commit 55ce6564634 changed the code to parse the incoming XML twice -
once for live and once for config. This does eliminate the problem of
PCI (/scsi/sata) address conflicts caused by allocating an address
based on existing devices in the live object, but then inserting the
result into the config (which may already have a device using that
address). BUT it also means that when the MAC address of a network
device hasn't been specified in the XML, each copy will get a
different auto-generated MAC address.

This results in the MAC address of the device changing the next time
the domain is shut down and restarted, which creates havoc with the
guest OS's network config.

There have been several discussions about this over more than a year,
attempting to find the ideal solution to this problem that makes MAC
addresses consistent and accounts for all sorts of corner cases with
PCI/scsi/sata addresses. All of these discussions fizzled out because
every proposal was either too difficult to implement or failed to fix
some esoteric case someone thought up.

So, in the interest of solving the MAC address problem while not
making the "other address" situation any worse than before, this patch
simply adds a qemuDomainAttachDeviceLiveAndConfigHomogenize() function
that (for now) copies the MAC address from the config object to the
live object. If the original XML had <mac address='blah'/>, this is
effectively a NOP, as the MACs already match.
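A minimal sketch of the homogenizing step. The `net_def` struct is a hypothetical simplification with only a 6-byte MAC. The real function works on libvirt's device definitions.

```c
#include <assert.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified network device definition. */
struct net_def {
    unsigned char mac[MAC_LEN];
};

/* Copy the MAC from the persistent (config) copy to the live copy, so
 * both parses of the same XML end up with one address. If the XML
 * already specified a MAC, both copies match and this changes nothing. */
static void homogenize_mac(struct net_def *live, const struct net_def *config)
{
    memcpy(live->mac, config->mac, MAC_LEN);
}
```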

Any downstream libvirt containing upstream commit
55ce6564634 should have this patch as well.

https://bugzilla.redhat.com/1783411

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2019-12-17 21:21:09 -05:00
Michal Privoznik
b86c65e170 get_nonnull_domain: Drop useless comment
The intent of get_nonnull_domain() is not to validate virDomain
as sent by the client but just to construct the virDomain
structure. The validation is then done in each API when looking
up the domain in our internal hash tables.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:43 +01:00
Michal Privoznik
dd2fd7d449 lxc: Cleanup virConnectPtr usage
There are some functions which pass virConnectPtr around for one
reason and one reason only: to obtain virLXCDriverPtr in the end.
We might as well replace the argument and pass a pointer to the
driver right from the start.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:43 +01:00
Michal Privoznik
f1625edc16 libxlGetDHCPInterfaces: Switch to GLib
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:43 +01:00
Michal Privoznik
66eafbc26f libxlGetDHCPInterfaces: Move some variables inside the loop
Some variables are not used outside of the for() loop. Move their
declaration to clean up the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:43 +01:00
Michal Privoznik
068fd891cd libxl: Don't use dom->conn to lookup virNetwork
When using the monolithic daemon, dom->conn has all driver
tables filled in properly, and thus it's safe to call an API other
than virDomain*(). However, when using split daemons,
dom->conn has only the hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have to obtain the cached connection object accessible via
virGetConnectNetwork().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
7be63dbe25 qemuGetDHCPInterfaces: Switch to GLib
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
c06f4b48fe qemuGetDHCPInterfaces: Move some variables inside the loop
Some variables are not used outside of the for() loop. Move their
declaration to clean up the code a bit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
dae430ccbc qemu: Don't use dom->conn to lookup virNetwork
When using the monolithic daemon, dom->conn has all driver
tables filled in properly, and thus it's safe to call an API other
than virDomain*(). However, when using split daemons,
dom->conn has only the hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have to obtain the cached connection object accessible via
virGetConnectNetwork().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Michal Privoznik
5910b180ca qemu_driver: Push qemuDomainInterfaceAddresses() a few lines down
If we place qemuDomainInterfaceAddresses() a few lines below the
two functions it uses, then we can drop the forward declarations of
those functions.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 16:58:42 +01:00
Pavel Mores
b036505279 qemu: use g_autofree instead of VIR_FREE in qemuMonitorTextCreateSnapshot()
While fixing the bug, convert the whole function to the new-style
memory allocation handling.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
2019-12-17 10:49:30 -05:00
Ján Tomko
b87cca75c3 build: relax the relaxed stack frame limit further
Pick 256k as the limit.

While -Wno-frame-larger-than would make more sense for usage
in our test suite, the -Wno version seems to have no effect
if -Wframe-larger-than was already specified.

Use an (un)reasonably large value instead.

Fixes the build with clang:
../../tests/cputest.c:964:1: error: stack frame size of 33176 bytes
in function 'mymain' [-Werror,-Wframe-larger-than=]
mymain(void)
^
1 error generated.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-17 14:39:56 +01:00
Ján Tomko
5657608b5e build: warn on a large frame by default
My commit e73889b6311f5b43d859caa4bae84bfdb299967a
split the -Wframe-larger-than warning setting into
two different variables - STRICT_FRAME_LIMIT_CFLAGS
for the library code and RELAXED_FRAME_LIMIT_CFLAGS
which was needed for tests.

Use the strict limit by default and specify the warning
flag twice for the parts that require a larger stack
frame, relying on the fact that the compiler will pick
up the latter value.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-17 14:39:56 +01:00
Michal Privoznik
67010e8749 virsh: Introduce nvme disk to domblklist
This is slightly more complicated because the NVMe disk source is not
a simple attribute of the <source/> element. The PCI address and
namespace ID are printed in the same format that QEMU accepts
them in:

  nvme://XXXX:XX:XX.X/X
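A minimal sketch of producing that string with `snprintf()`. The `pci_addr` struct is hypothetical. The widths follow the usual domain:bus:slot.function layout. Printing the namespace ID in decimal is an assumption, because the placeholder above does not say which base is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical PCI address (domain:bus:slot.function). */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

/* Format an NVMe source as nvme://DDDD:BB:SS.F/NS. Returns the
 * snprintf() result, i.e. the length that would have been written. */
static int format_nvme(char *buf, size_t len, const struct pci_addr *addr,
                       unsigned long long ns)
{
    return snprintf(buf, len, "nvme://%04x:%02x:%02x.%x/%llu",
                    addr->domain, addr->bus, addr->slot,
                    addr->function, ns);
}
```

For example, namespace 1 of the device at PCI address 0000:01:00.0 is formatted as `nvme://0000:01:00.0/1`.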

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
430715604f qemu_hotplug: Prepare NVMe disks on hotplug
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
6edb4321b2 qemu: Allow forcing VFIO when computing memlock limit
With NVMe disks, one can start a blockjob with an NVMe disk
that is not visible in domain XML (at least not right away). Usually,
it's fairly easy to override this limitation of
qemuDomainGetMemLockLimitBytes(). For instance, for hostdevs we
temporarily add the device to the domain def, let the function
calculate the limit and then remove the device. But it's not so
easy with virStorageSourcePtr: in some cases it isn't necessarily
attached to a disk. And even if it is, that happens later in the
process, and frankly, I find it too complicated to use the simple
trick we use with hostdevs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
da27be1b09 qemu: Don't leak storage perms on failure in qemuDomainAttachDiskGeneric
At the very beginning of the attach function,
qemuDomainStorageSourceChainAccessAllow() is called, which
modifies CGroups, locks and seclabels for the new disk and its
backing chain. This must be followed by a counterpart which
reverts all the changes if something goes wrong. This boils
down to calling qemuDomainStorageSourceChainAccessRevoke(), which
is done under the 'error' label. But not all failure branches jump
there. Some just jump to the 'cleanup' label, where no revoke is
done. Such a mistake is easy to make because a 'cleanup' label
exists as well. Therefore, dissolve the 'error' block into 'cleanup'
and have everything jump to the 'cleanup' label.
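A minimal sketch of the single-label pattern. The counter and the `chain_access_allow()`/`chain_access_revoke()` helpers are hypothetical stand-ins for the CGroup, lock, and seclabel changes. They are not the real qemuDomainStorageSourceChainAccess*() functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for cgroup/lock/seclabel state. */
static int access_granted;

static int chain_access_allow(void)   { access_granted++; return 0; }
static void chain_access_revoke(void) { access_granted--; }

/* One 'cleanup' label: every failure after the allow step goes through
 * the same revoke, so no failure branch can skip it. */
static int attach_disk(bool fail_prepare, bool fail_monitor)
{
    int ret = -1;
    bool allowed = false;

    if (chain_access_allow() < 0)
        goto cleanup;
    allowed = true;

    if (fail_prepare)
        goto cleanup;   /* e.g. preparing the backend failed */
    if (fail_monitor)
        goto cleanup;   /* e.g. the monitor command failed */

    ret = 0;

 cleanup:
    if (ret < 0 && allowed)
        chain_access_revoke();
    return ret;
}
```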

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
1038505420 qemu_monitor_text: Catch IOMMU/VFIO related errors in qemuMonitorTextAddDrive
Because we're dealing with HMP, there is no such thing as a reply
message class, so we have to do some string comparison to guess
whether the command failed. With NVMe disks, a whole new class of
errors comes into play, because qemu needs to initialize IOMMU and
VFIO for them. You can see all the messages it may produce in
qemu_vfio_init_pci().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
8e2026cc18 qemu: Generate command line of NVMe disks
Now that we have everything prepared, we can generate the command
line for NVMe disks.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
c4062d5620 qemu_capabilities: Introduce QEMU_CAPS_DRIVE_NVME
This capability tracks if qemu is capable of:

  -drive file.driver=nvme

The feature was added in QEMU commit v2.12.0-rc0~104^2~2.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
284a12bae0 virSecuritySELinuxRestoreImageLabelInt: Don't skip non-local storage
This function is currently not called for any type of storage
source that is not considered 'local' (as defined by
virStorageSourceIsLocalStorage()). Well, NVMe disks are not
'local' from that point of view and therefore we will need to
call this function more frequently.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
c988a39c7b qemu: Allow NVMe disk in CGroups
If a domain has an NVMe disk configured, then we need to allow it
in the devices CGroup so that qemu can access it. There is one caveat
though - if an NVMe disk is read-only, we need the CGroup to allow
write too. This is because when opening the device, qemu issues a
couple of ioctl()s which are considered writes.
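The permission rule can be sketched as follows. The `PERM_*` bits and the function are hypothetical and do not reflect libvirt's CGroup API. The point is that read-only only drops write access for non-NVMe disks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for a devices CGroup rule. */
enum { PERM_READ = 1, PERM_WRITE = 2 };

static int disk_cgroup_perms(bool is_nvme, bool readonly)
{
    int perms = PERM_READ;

    /* Ordinary read-only disks get read access only. NVMe disks always
     * get write too, because qemu's ioctl()s on the device count as
     * writes. */
    if (!readonly || is_nvme)
        perms |= PERM_WRITE;
    return perms;
}
```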

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
329a680297 qemu: Mark NVMe disks as 'need VFIO'
There are a couple of places where a domain with a VFIO device gets
special treatment: in CGroups when enabling/disabling access to
/dev/vfio/vfio, and when creating/removing nodes in the domain mount
namespace. Well, an NVMe disk is a VFIO device too. Fortunately,
we have the qemuDomainNeedsVFIO() function, which is the only
place that needs adjustment.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Michal Privoznik
a80ebd2a2a qemu: Create NVMe disk in domain namespace
If a domain has an NVMe disk configured, then we need to create
/dev/vfio/* paths in domain's namespace so that qemu can open
them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00
Michal Privoznik
d3f06dcdb5 qemu: Take NVMe disks into account when calculating memlock limit
We have this beautiful function that does crystal ball
divination. The function is named
qemuDomainGetMemLockLimitBytes() and it calculates the upper
limit of how much locked memory a given guest is going to need. The
function bases its guess on the devices defined for a domain. For
instance, if there is a VFIO hostdev defined, then it adds 1GiB to
the guessed maximum. Since NVMe disks are pretty much VFIO
hostdevs (but not quite), we have to do the same sorcery.
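A heavily simplified sketch of the rule this commit extends. It assumes that guest memory plus 1 GiB is the limit whenever a VFIO hostdev or an NVMe disk is present, and that 0 means no explicit limit. The function is hypothetical. The real qemuDomainGetMemLockLimitBytes() considers more cases than this.

```c
#include <assert.h>
#include <stdbool.h>

#define GIB (1024ULL * 1024 * 1024)

/* Hypothetical guess: NVMe disks are treated like VFIO hostdevs and
 * add 1 GiB on top of guest memory. 0 means "no explicit limit". */
static unsigned long long memlock_limit(unsigned long long guest_mem_bytes,
                                        bool has_vfio_hostdev,
                                        bool has_nvme_disk)
{
    if (has_vfio_hostdev || has_nvme_disk)
        return guest_mem_bytes + GIB;
    return 0;
}
```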

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00
Michal Privoznik
8943ca11b2 qemu: prepare NVMe devices too
The qemu driver has its own wrappers around virHostdev module (so
that some arguments are filled in automatically). Extend these to
include NVMe devices too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00
Michal Privoznik
d58facd781 virhostdevtest: Test virNVMeDevice assignment
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:43 +01:00