If the host OS doesn't have NUMA present, we fall back to
populating fake NUMA info and the code thus assumes only a
single NUMA node.
Unfortunately we also fall back to fake NUMA if numactl-devel
was not present at build time, and in that case we can still
have multiple NUMA nodes. We then create all CPUs, but only
the CPUs in the first node have any data filled in, resulting
in capabilities like:
  <topology>
    <cells num='1'>
      <cell id='0'>
        <memory unit='KiB'>15977572</memory>
        <cpus num='48'>
          <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
          <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
          <cpu id='2' socket_id='0' core_id='1' siblings='2'/>
          <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
          <cpu id='4' socket_id='0' core_id='2' siblings='4'/>
          <cpu id='5' socket_id='0' core_id='2' siblings='5'/>
          <cpu id='6' socket_id='0' core_id='3' siblings='6'/>
          <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
          <cpu id='8' socket_id='0' core_id='4' siblings='8'/>
          <cpu id='9' socket_id='0' core_id='4' siblings='9'/>
          <cpu id='10' socket_id='0' core_id='5' siblings='10'/>
          <cpu id='11' socket_id='0' core_id='5' siblings='11'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
          <cpu id='0'/>
        </cpus>
      </cell>
    </cells>
  </topology>
With this new code we get something slightly less broken:
  <topology>
    <cells num='4'>
      <cell id='0'>
        <memory unit='KiB'>15977572</memory>
        <cpus num='12'>
          <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
          <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
          <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
          <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
          <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
          <cpu id='5' socket_id='0' core_id='2' siblings='4-5'/>
          <cpu id='6' socket_id='0' core_id='3' siblings='6-7'/>
          <cpu id='7' socket_id='0' core_id='3' siblings='6-7'/>
          <cpu id='8' socket_id='0' core_id='4' siblings='8-9'/>
          <cpu id='9' socket_id='0' core_id='4' siblings='8-9'/>
          <cpu id='10' socket_id='0' core_id='5' siblings='10-11'/>
          <cpu id='11' socket_id='0' core_id='5' siblings='10-11'/>
        </cpus>
      </cell>
      <cell id='1'>
        <memory unit='KiB'>15977572</memory>
        <cpus num='12'>
          <cpu id='12' socket_id='0' core_id='0' siblings='12-13'/>
          <cpu id='13' socket_id='0' core_id='0' siblings='12-13'/>
          <cpu id='14' socket_id='0' core_id='1' siblings='14-15'/>
          <cpu id='15' socket_id='0' core_id='1' siblings='14-15'/>
          <cpu id='16' socket_id='0' core_id='2' siblings='16-17'/>
          <cpu id='17' socket_id='0' core_id='2' siblings='16-17'/>
          <cpu id='18' socket_id='0' core_id='3' siblings='18-19'/>
          <cpu id='19' socket_id='0' core_id='3' siblings='18-19'/>
          <cpu id='20' socket_id='0' core_id='4' siblings='20-21'/>
          <cpu id='21' socket_id='0' core_id='4' siblings='20-21'/>
          <cpu id='22' socket_id='0' core_id='5' siblings='22-23'/>
          <cpu id='23' socket_id='0' core_id='5' siblings='22-23'/>
        </cpus>
      </cell>
    </cells>
  </topology>
The topology at least now reflects what 'virsh nodeinfo' reports.
The main bug is that the CPU "id" values won't match what the Linux
host actually uses.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The 'caps' object is already allocated when the fake NUMA
initialization takes place.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The current 'for' loop with 5 consecutive 'ifs' inside
qemuBuildHostdevCommandLine can be a bit smarter:
- all 5 'ifs' fail if hostdev->mode is not equal to
VIR_DOMAIN_HOSTDEV_MODE_SUBSYS. This check can be moved to the
start of the loop, skipping to the next element immediately
if the check fails;
- all 5 'ifs' check for a specific subsys->type to build the proper
command line argument (virHostdevIsSCSIDevice and virHostdevIsMdevDevice
do that, but within a helper). The problem is that the code keeps
checking for matches even after one was found, although there is
no way a hostdev can match more than one 'if' (i.e. a hostdev can't
have 2+ different types). This means that a SUBSYS_TYPE_USB hostdev
will create its command line argument in the first 'if', and then
all the other conditionals are guaranteed to fail but end up being
checked anyway.
All of this can be avoided by moving the hostdev->mode comparison
to the start of the loop and using a switch statement on
subsys->type to execute the proper code for a given hostdev
type, as the sketch below shows.
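A minimal sketch of the reshaped loop (structure only; the enum and
field names follow libvirt conventions, but the surrounding code is
elided and should be treated as an assumption):

    for (i = 0; i < def->nhostdevs; i++) {
        virDomainHostdevDefPtr hostdev = def->hostdevs[i];
        virDomainHostdevSubsysPtr subsys = &hostdev->source.subsys;

        /* the common check, done once per element */
        if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS)
            continue;

        /* exactly one branch can match a given hostdev */
        switch ((virDomainHostdevSubsysType) subsys->type) {
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB:
            /* build the USB command line argument */
            break;
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI:
            /* build the PCI command line argument */
            break;
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI:
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST:
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV:
            /* likewise for SCSI, SCSI host and mdev devices */
            break;
        case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST:
            break;
        }
    }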
Suggested-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The code calling this method expects it to have reported an error on
failure.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When freeing qemu driver struct members, we forgot to free the
@hostcpu and @hostnuma members.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This function is supposed to clean up the virQEMUDriver structure
and free individual members. However, it does so in a seemingly
random order, which makes it hard to track which members are being
freed and which are not. Free the members in the reverse order of
the structure definition - assuming that the most important members
(like the mutex) are declared first and freed last.
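As a hypothetical illustration of the rule (the member names here
are invented, not the real virQEMUDriver layout):

    struct _virQEMUDriver {
        virMutex lock;              /* declared first ... */
        char *stateDir;
        virHashTablePtr domains;    /* ... declared last */
    };

    static void
    qemuDriverFree(virQEMUDriverPtr driver)
    {
        /* free in reverse declaration order */
        virHashFree(driver->domains);
        g_free(driver->stateDir);
        virMutexDestroy(&driver->lock);  /* the mutex goes last */
    }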
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Fortunately, this is not causing any problems now, because glib
does this check for us when the function is called via the cleanup
attribute. But in a future commit we will explicitly call this
function on a struct member that might be NULL.
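The pattern being introduced is the usual NULL-tolerant free
function; a sketch with a made-up type:

    void
    virFooFree(virFooPtr foo)
    {
        if (!foo)
            return;  /* safe to call with NULL, like g_free() */

        g_free(foo->name);
        g_free(foo);
    }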
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In testXLInitDriver() a dummy driver structure is filled in and
later freed in testXLFreeDriver(). However, it is sufficient
to unref just driver->config, because that results in
libxlDriverConfigDispose() being called, which unrefs
driver->config->caps. There is no need to unref it again in
testXLFreeDriver() - in fact it's undesired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
When generating domain capabilities, we need to fake the host CPU
to get reproducible results. We do this by copying a pre-existing
CPU config and setting the VIR_TEST_MOCK_FAKE_HOST_CPU env
variable, which is then consumed by qemucpumock. However, we
forget to free the CPU copy afterwards:
  2,196 (2,016 direct, 180 indirect) bytes in 18 blocks are definitely lost in loss record 291 of 297
     at 0x4838B86: calloc (vg_replace_malloc.c:762)
     by 0x57CB6A0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7)
     by 0x4A0F72D: virCPUDefNew (cpu_conf.c:87)
     by 0x4A0FAC7: virCPUDefCopyWithoutModel (cpu_conf.c:235)
     by 0x4A0FBBE: virCPUDefCopy (cpu_conf.c:273)
     by 0x10E3C0: testUtilsHostCpusGetDefForArch (testutilshostcpus.h:157)
     by 0x10E3C0: fakeHostCPU (domaincapstest.c:61)
     by 0x10E3C0: fillQemuCaps (domaincapstest.c:86)
     by 0x10E3C0: test_virDomainCapsFormat (domaincapstest.c:234)
     by 0x10F4BC: virTestRun (testutils.c:146)
     by 0x10DE93: doTestQemuInternal (domaincapstest.c:301)
     by 0x10E13D: doTestQemu (domaincapstest.c:332)
     by 0x1124CF: testQemuCapsIterate (testutilsqemu.c:635)
     by 0x10DCE3: mymain (domaincapstest.c:435)
     by 0x10FD8B: virTestMain (testutils.c:916)
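The fix boils down to releasing the copy once the test is done;
schematically (the local variable name is an assumption):

    virCPUDefPtr cpu = virCPUDefCopy(hostCpu);  /* handed to qemucpumock */

    /* ... run the test ... */

    virCPUDefFree(cpu);  /* previously missing */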
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Assuming that the backing image format is raw is wrong when doing image
detection:
1) In -drive mode qemu will still probe the image format of the backing
image. This means it will try to open the backing file of the image,
which will fail if a more advanced security model is in use.
2) In blockdev mode the image will actually be opened as raw, which is
wrong since it might be qcow. Not opening the backing images will
also result in the guest seeing corrupted data.
Rather than attempting to handle the various corner cases where our
assumption that the storage file is raw happens to be right, forbid
startup when the guest image doesn't have the format specified in its
metadata.
https://bugzilla.redhat.com/show_bug.cgi?id=1588373
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The EXP_WARN and ALLOW_PROBE flags for the testStorageChain cases are
no longer used, so we can remove them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Pass in 'true' as '@report_broken' of virStorageFileGetMetadata to make
it fail in the tests. The most important code paths (when starting the
VM) expect this function to fail rather than silently return partial
data. Switch the test to exercise this more important code path.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Prior to commit 55ce656463 (first in libvirt 4.6.0), the XML sent to
virDomainAttachDeviceFlags() was parsed only once, and the results of
that parse were inserted into both the live object of the running
domain and the persistent config. Thus, if the MAC address was
omitted from the XML for a network device (<interface>), both the
live and config objects would have the same MAC address.
Commit 55ce656463 changed the code to parse the incoming XML twice -
once for the live object and once for the config. This does
eliminate the problem of PCI (/scsi/sata) address conflicts caused
by allocating an address based on the existing devices in the live
object and then inserting the result into the config (which may
already have a device using that address), BUT it also means that
when the MAC address of a network device hasn't been specified in
the XML, each copy will get a different auto-generated MAC address.
As a result, the MAC address of the device changes the next time
the domain is shut down and restarted, which creates havoc with the
guest OS's network config.
There have been several discussions about this over the past year
and more, attempting to find the ideal solution that makes MAC
addresses consistent and accounts for all sorts of corner cases
with PCI/scsi/sata addresses. All of these discussions fizzled out
because every proposal was either too difficult to implement or
failed to fix some esoteric case someone thought up.
So, in the interest of solving the MAC address problem while not
making the "other address" situation any worse than before, this
patch simply adds a qemuDomainAttachDeviceLiveAndConfigHomogenize()
function that (for now) copies the MAC address from the config
object to the live object. If the original XML had
<mac address='blah'/> then this is effectively a NOP, since the
MACs already match.
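In outline, the helper just mirrors the config-side MAC onto the
live copy; a sketch, handling only the net case as described above
(the parameter names are assumptions):

    static int
    qemuDomainAttachDeviceLiveAndConfigHomogenize(virDomainDeviceDefPtr devConf,
                                                  virDomainDeviceDefPtr devLive)
    {
        /* for now, only an auto-generated MAC needs homogenizing */
        if (devConf->type == VIR_DOMAIN_DEVICE_NET &&
            devLive->type == VIR_DOMAIN_DEVICE_NET)
            virMacAddrSet(&devLive->data.net->mac, &devConf->data.net->mac);

        return 0;
    }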
Any downstream libvirt containing upstream commit
55ce656463 should have this patch as well.
https://bugzilla.redhat.com/1783411
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The intent of get_nonnull_domain() is not to validate virDomain
as sent by the client but just to construct the virDomain
structure. The validation is then done in each API when looking
up the domain in our internal hash tables.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There are some functions which pass virConnectPtr around for one
reason and one reason only: to obtain virLXCDriverPtr in the end.
We might just as well replace the argument and pass a pointer to
the driver right from the start.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.
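For illustration, the kind of change involved (a hypothetical
snippet; VIR_ALLOC_N is the old-style allocator):

    /* before: OOM had to be handled, forcing a 'cleanup' label */
    if (VIR_ALLOC_N(names, ndomains) < 0)
        goto cleanup;

    /* after: g_new0() aborts on OOM, so no error path is needed */
    names = g_new0(char *, ndomains);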
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Some variables are not used outside of the for() loop. Move their
declarations into the loop to clean up the code a bit.
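That is, instead of declaring the variable before the loop
(made-up names):

    for (i = 0; i < ndevices; i++) {
        g_autofree char *path = NULL;  /* only lives in this iteration */

        /* ... */
    }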
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
When using the monolithic daemon, dom->conn has all driver
tables filled in properly and thus it's safe to call an API other
than virDomain*(). However, when using split daemons,
dom->conn has only the hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have to obtain the cached connection object accessible via
virGetConnectNetwork().
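Sketched, the lookup then looks like this (error handling mostly
elided; variable names assumed):

    virConnectPtr netconn = NULL;
    virNetworkPtr net = NULL;

    if (!(netconn = virGetConnectNetwork()))
        return -1;

    /* use the network driver's own connection, not dom->conn */
    net = virNetworkLookupByName(netconn, netname);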
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If we use glib alloc functions, we can drop the 'cleanup' label
and @rv variable and also simplify the code a bit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Some variables are not used outside of the for() loop. Move their
declarations into the loop to clean up the code a bit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
When using the monolithic daemon, dom->conn has all driver
tables filled in properly and thus it's safe to call an API other
than virDomain*(). However, when using split daemons,
dom->conn has only the hypervisor driver table set
(dom->conn->driver) and the rest is NULL. Therefore, if we want
to call a non-domain API (virNetworkLookupByName() in this case),
we have to obtain the cached connection object accessible via
virGetConnectNetwork().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If we place qemuDomainInterfaceAddresses() a few lines below the
two functions it uses, we can drop the forward declarations of
those functions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
While fixing the bug, also convert the whole function to the
new-style memory allocation handling.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
Pick 256k as the limit.
While -Wno-frame-larger-than would make more sense for usage
in our test suite, the -Wno version seems to have no effect
if -Wframe-larger-than was already specified.
Use an (un)reasonably large value instead.
Fixes the build with clang:
../../tests/cputest.c:964:1: error: stack frame size of 33176 bytes
in function 'mymain' [-Werror,-Wframe-larger-than=]
mymain(void)
^
1 error generated.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
My commit e73889b631
split the -Wframe-larger-than warning setting into
two different variables - STRICT_FRAME_LIMIT_CFLAGS
for the library code and RELAXED_FRAME_LIMIT_CFLAGS
which was needed for tests.
Use the strict limit by default and specify the warning
flag twice for the parts that require a larger stack
frame, relying on the fact that the compiler will pick
up the latter value.
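In other words, compiling with something like
'-Wframe-larger-than=4096 ... -Wframe-larger-than=262144' leaves
the relaxed limit in effect, since the compiler honours the last
occurrence of the flag (the values here are illustrative).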
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This is slightly more complicated because the NVMe disk source is
not a simple attribute of the <source/> element. The PCI address
and namespace ID are printed in the same format that QEMU accepts
them in:
nvme://XXXX:XX:XX.X/X
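For instance, namespace 1 of a controller at PCI address
0000:01:00.0 would be written as nvme://0000:01:00.0/1 (the address
here is illustrative).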
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
With NVMe disks, one can start a blockjob with an NVMe disk that
is not visible in the domain XML (at least not right away).
Usually, it's fairly easy to work around this limitation of
qemuDomainGetMemLockLimitBytes() - for instance, for hostdevs we
temporarily add the device to the domain def, let the function
calculate the limit, and then remove the device. But it's not so
easy with virStorageSourcePtr - in some cases they aren't
necessarily attached to a disk. And even if they are, that happens
later in the process and, frankly, I find it too complicated to
use the simple trick we use with hostdevs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
At the very beginning of the attach function,
qemuDomainStorageSourceChainAccessAllow() is called, which
modifies CGroups, locks and seclabels for the new disk and its
backing chain. This must be followed by a counterpart which
reverts all the changes if something goes wrong. That boils down
to calling qemuDomainStorageSourceChainAccessRevoke(), which is
done under the 'error' label. But not all failure branches jump
there; some jump to the 'cleanup' label where no revoke is done.
Such a mistake is easy to make because the 'cleanup' label does
exist. Therefore, dissolve the 'error' block into 'cleanup' and
have everything jump to the 'cleanup' label.
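Schematically, the tail of the function then reads (a sketch; the
exact arguments are assumptions):

     cleanup:
        /* the revoke that used to live under 'error:' */
        if (ret < 0)
            qemuDomainStorageSourceChainAccessRevoke(driver, vm, disk->src);
        return ret;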
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Because it is HMP we're dealing with here, there is nothing like a
class of reply message, so we have to do some string comparison to
guess whether the command failed. Well, with NVMe disks a whole
new class of errors comes into play, because qemu needs to
initialize the IOMMU and VFIO for them. You can see all the
messages it may produce in qemu_vfio_init_pci().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that we have everything prepared, we can generate the command
line for NVMe disks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This capability tracks if qemu is capable of:
-drive file.driver=nvme
The feature was added in QEMU's commit of v2.12.0-rc0~104^2~2.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This function is currently not called for any type of storage
source that is not considered 'local' (as defined by
virStorageSourceIsLocalStorage()). Well, NVMe disks are not
'local' from that point of view and therefore we will need to
call this function more frequently.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If a domain has an NVMe disk configured, then we need to allow it
in the devices CGroup so that qemu can access it. There is one
caveat though - even if an NVMe disk is read-only, we need the
CGroup to allow write access too. This is because when opening the
device, qemu does a couple of ioctl()s which are considered
writes.
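Sketched (the cgroup handle and path variable are assumptions):

    /* VIR_CGROUP_DEVICE_RW even for read-only disks: qemu's open
     * path issues ioctl()s that the kernel accounts as writes */
    if (virCgroupAllowDevicePath(priv->cgroup, nvmePath,
                                 VIR_CGROUP_DEVICE_RW, false) < 0)
        return -1;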
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There are a couple of places where a domain with a VFIO device
gets special treatment: in CGroups when enabling/disabling access
to /dev/vfio/vfio, and when creating/removing nodes in the domain
mount namespace. Well, an NVMe disk is a VFIO device too.
Fortunately, we have the qemuDomainNeedsVFIO() function, which is
the only place that needs adjustment.
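The adjustment is essentially one more predicate in that helper
(a sketch; the existing body is simplified here):

    bool
    qemuDomainNeedsVFIO(const virDomainDef *def)
    {
        return virDomainDefHasVFIOHostdev(def) ||
               virDomainDefHasNVMeDisk(def);
    }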
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If a domain has an NVMe disk configured, then we need to create
the /dev/vfio/* paths in the domain's namespace so that qemu can
open them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
We have this beautiful function that does crystal ball
divination. The function is named
qemuDomainGetMemLockLimitBytes() and it calculates the upper
limit of how much locked memory a given guest is going to need.
The function bases its guess on the devices defined for the
domain. For instance, if there is a VFIO hostdev defined, it adds
1GiB to the guessed maximum. Since NVMe disks are pretty much
VFIO hostdevs (but not quite), we have to do the same sorcery.
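So the calculation gains a clause along these lines (a sketch; the
1GiB figure mirrors what is already done for VFIO hostdevs):

    if (virDomainDefHasNVMeDisk(def))
        memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024;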
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The qemu driver has its own wrappers around the virHostdev module
(so that some arguments are filled in automatically). Extend these
to include NVMe devices too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The device configs (which are actually one and the same config)
come from an NVMe disk of mine.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that we have the virNVMeDevice module (introduced in the
previous commit), let's use it in virHostdev to track which NVMe
devices are free to be used by a domain and which are taken.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This module will be used by virHostdevManager and is inspired by
the virPCIDevice module. The two are very similar, except in what
identifies an NVMe device: a PCI address AND a namespace ID. This
means that an NVMe device can appear in a domain multiple times,
each time with a different namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This function will return true if any of the disks (or their
backing chains) of the given domain contains an NVMe disk.
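A sketch of the helper (assuming the per-source checker described
in the next message is named virStorageSourceChainHasNVMe):

    bool
    virDomainDefHasNVMeDisk(const virDomainDef *def)
    {
        size_t i;

        for (i = 0; i < def->ndisks; i++) {
            if (virStorageSourceChainHasNVMe(def->disks[i]->src))
                return true;
        }

        return false;
    }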
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This function will return true if there's a storage source of
type VIR_STORAGE_TYPE_NVME, or false otherwise.
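Sketched (virStorageSourceIsBacking() terminates the walk at the
end of the backing chain; the function name follows the libvirt
naming convention and is an assumption here):

    bool
    virStorageSourceChainHasNVMe(const virStorageSource *src)
    {
        const virStorageSource *n;

        for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) {
            if (n->type == VIR_STORAGE_TYPE_NVME)
                return true;
        }

        return false;
    }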
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
To simplify the implementation, some restrictions are added. For
instance, an NVMe disk can't go on any bus but virtio, has to be
of type 'disk', and can't have startupPolicy set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
There is this class of PCI devices that act like disks: NVMe.
Therefore, they are both PCI devices and disks. While we already
have <hostdev/> (and can assign an NVMe device to a domain
successfully), we don't have a disk representation. There are
three problems with PCI assignment in the case of an NVMe device:
1) domains with <hostdev/> can't be migrated
2) an NVMe device is assigned as a whole; there's no way to assign
only a namespace
3) because hypervisors see a <hostdev/>, they don't put a block
layer on top of it - users don't get all the fancy features like
snapshots
NVMe namespaces are a way of splitting one continuous chunk of
non-volatile memory into smaller ones, effectively creating
smaller NVMe devices (which can then be partitioned, LVMed, etc.)
Because of all of this, the following XML was chosen to model an
NVMe device:
  <disk type='nvme' device='disk'>
    <driver name='qemu' type='raw'/>
    <source type='pci' managed='yes' namespace='1'>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>