libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2024-10-14 01:59:14 +00:00

Author	SHA1	Message	Date
Michal Privoznik	2f0b3b103b	qemuDomainDetachDeviceUnlink: Don't unlink files we haven't created Even though there are several checks before calling this function and for some scenarios we don't call it at all (e.g. on disk hot unplug), it may be possible to sneak in some weird files (e.g. if domain would have RNG with /dev/shm/some_file as its backend). No matter how improbable, we shouldn't unlink it as we would be unlinking a file from the host which we haven't created in the first place. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>	2017-05-03 17:23:03 +02:00
Michal Privoznik	b3418f36be	qemuDomainAttachDeviceMknodRecursive: Don't try to create devices under preserved mount points Just like in previous commit, this fixes the same issue for hotplug. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>	2017-05-03 17:23:03 +02:00
Michal Privoznik	e30dbf35a1	qemuDomainCreateDeviceRecursive: Don't try to create devices under preserved mount points While the code allows devices to already be there (by some miracle), we shouldn't try to create devices that don't belong to us. For instance, we shouldn't try to create /dev/shm/file because /dev/shm is a mount point that is preserved. Therefore if a file is created there from an outside (e.g. by mgmt application or some other daemon running on the system like vhostmd), it exists in the qemu namespace too as the mount point is the same. It's only /dev and /dev only that is different. The same reasoning applies to all other preserved mount points. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>	2017-05-03 17:23:03 +02:00
Michal Privoznik	26c14be8d6	qemuDomainCreateDeviceRecursive: pass a structure instead of bare path Currently, all we need to do in qemuDomainCreateDeviceRecursive() is to take given @device, get all kinds of info on it (major & minor numbers, owner, seclabels) and create its copy at a temporary location @path (usually /var/run/libvirt/qemu/$domName.dev), if @device live under /dev. This is, however, very loose condition, as it also means /dev/shm/* is created too. Therefor, we will need to pass more arguments into the function for better decision making (e.g. list of mount points under /dev). Instead of adding more arguments to all the functions (not easily reachable because some functions are callback with strictly defined type), lets just turn this one 'const char ' into a 'struct '. New "arguments" can be then added at no cost. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>	2017-05-03 17:23:03 +02:00
Michal Privoznik	a7cc039dc7	qemuDomainBuildNamespace: Move /dev/* mountpoints later When setting up mount namespace for a qemu domain the following steps are executed: 1) get list of mountpoints under /dev/ 2) move them to /var/run/libvirt/qemu/$domName.ext 3) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev 4) move the mountpoint of the new device tree to /dev 5) restore original mountpoints from step 2) Note the problem with this approach is that if some device in step 3) requires access to a mountpoint from step 2) it will fail as the mountpoint is not there anymore. For instance consider the following domain disk configuration: <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/dev/shm/vhostmd0'/> <target dev='vdb' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </disk> In this case operation fails as we are unable to create vhostmd0 in the new device tree because after step 2) there is no /dev/shm anymore. Leave aside fact that we shouldn't try to create devices living in other mountpoints. That's a separate bug that will be addressed later. Currently, the order described above is rearranged to: 1) get list of mountpoints under /dev/ 2) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev 3) move them to /var/run/libvirt/qemu/$domName.ext 4) move the mountpoint of the new device tree to /dev 5) restore original mountpoints from step 3) Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>	2017-05-03 17:23:03 +02:00
Pavel Hrdina	568887a32f	qemu: use qemu-xhci USB controller by default for ppc64 and aarch64 Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438682 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Acked-by: Andrea Bolognani <abologna@redhat.com>	2017-04-28 10:47:12 +02:00
Pavel Hrdina	278e70f8f8	qemu: add support for qemu-xhci USB controller Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438682 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Acked-by: Andrea Bolognani <abologna@redhat.com>	2017-04-28 10:44:36 +02:00
Pavel Hrdina	233f8d0bd4	qemu: use nec-usb-xhci as a default controller for aarch64 if available This is a USB3 controller and it's a better choice than piix3-uhci. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Acked-by: Andrea Bolognani <abologna@redhat.com>	2017-04-28 10:42:26 +02:00
Pavel Hrdina	e69001b464	qemu: change the logic of setting default USB controller The new logic will set the piix3-uhci if available regardless of any architecture and it will be updated to better model based on architecture and device existence. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Acked-by: Andrea Bolognani <abologna@redhat.com>	2017-04-28 10:41:53 +02:00
Jiri Denemark	df13c0b477	qemu: Add support for guest CPU cache This patch maps /domain/cpu/cache element into -cpu parameters: - <cache mode='passthrough'/> is translated to host-cache-info=on - <cache level='3' mode='emulate'/> is transformed into l3-cache=on - <cache mode='disable'/> is turned in host-cache-info=off,l3-cache=off Any other <cache> element is forbidden. The tricky part is detecting whether QEMU supports the CPU properties. The 'host-cache-info' property is introduced in v2.4.0-1389-ge265e3e480, earlier QEMU releases enabled host-cache-info by default and had no way to disable it. If the property is present, it defaults to 'off' for any QEMU until at least 2.9.0. The 'l3-cache' property was introduced later by v2.7.0-200-g14c985cffa. Earlier versions worked as if l3-cache=off was passed. For any QEMU until at least 2.9.0 l3-cache is 'off' by default. QEMU 2.9.0 was the first release which supports probing both properties by running device-list-properties with typename=host-x86_64-cpu. Older QEMU releases did not support device-list-properties command for CPU devices. Thus we can't really rely on probing them and we can just use query-cpu-model-expansion QMP command as a witness. Because the cache property probing is only reliable for QEMU >= 2.9.0 when both are already supported for quite a few releases, we let QEMU report an error if a specific cache mode is explicitly requested. The other mode (or both if a user requested CPU cache to be disabled) is explicitly turned off for QEMU >= 2.9.0 to avoid any surprises in case the QEMU defaults change. Any older QEMU already turns them off so not doing so explicitly does not make any harm. Signed-off-by: Jiri Denemark <jdenemar@redhat.com>	2017-04-27 22:41:10 +02:00
Jiri Denemark	2a978269fc	qemu: Report VIR_DOMAIN_JOB_OPERATION Not all async jobs are visible via virDomainGetJobStats (either they are too fast or getting the stats is not allowed during the job), but forcing all of them to advertise the operation is easier than hunting the jobs for which fetching statistics is allowed. And we won't need to think about this when we add support for getting stats for more jobs. https://bugzilla.redhat.com/show_bug.cgi?id=1441563 Signed-off-by: Jiri Denemark <jdenemar@redhat.com>	2017-04-27 15:08:12 +02:00
Yuri Chornoivan	5efa7f2a4b	Fix minor typos	2017-04-24 14:40:00 +02:00
Martin Kletzander	523c996062	conf, docs: Add support for coalesce setting(s) We are currently parsing only rx/frames/max because that's the only value that makes sense for us. The tun device just added support for this one and the others are only supported by hardware devices which we don't need to worry about as the only way we'd pass those to the domain is using <hostdev/> or <interface type='hostdev'/>. And in those cases the guest can modify the settings itself. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2017-04-21 13:34:41 +02:00
Pavel Hrdina	90acbc76ec	qemu_domain: use correct default USB controller on ppc64 The history of USB controller for ppc64 guest is complex and goes back to libvirt 1.3.1 where the fun started. Prior Libvirt 1.3.1 if no model for USB controller was specified we've simply passed "-usb" on QEMU command line. Since Libvirt 1.3.1 there is a patch (`8156493d8d`) that fixes this issue by using "-device pci-ohci,..." but it breaks migration with older Libvirts which was agreed that's acceptable. However this patch didn't reflect this change in the domain XML and the model was still missing. Since Libvirt 2.2.0 there is a patch (`f55eaccb0c`) that fixes the issue with not setting the USB model into domain XML which we need to know about to not break the migration and since the default model was pci-ohci it was used as default in this patch as well. This patch tries to take all the previous changes into account and also change the default for newly defined domains that don't specify any model for USB controller. The VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set only if new domain is defined or new device is added into a domain which means that in all other cases we will use the old pci-ohci model instead of the better and not broken nec-usb-xhci model. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1373184 Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2017-04-20 09:03:53 +02:00
Pavel Hrdina	ac97658d4f	qemu: refactor qemuDomainMachine* functions Introduce new wrapper functions without Machine in the function name that take the whole virDomainDef structure as argument and call the existing functions with Machine in the function name. Change the arguments of existing functions to machine and arch because they don't need the whole virDomainDef structure and they could be used in places where we don't have virDomainDef. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2017-04-18 13:27:11 +02:00
Marc Hartmayer	b8cc509882	qemu: Turn qemuDomainLogContext into virObject This way qemuDomainLogContextRef() and qemuDomainLogContextFree() is no longer needed. The naming qemuDomainLogContextFree() was also somewhat misleading. Additionally, it's easier to turn qemuDomainLogContext in a self-locking object. Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>	2017-04-10 14:49:20 +02:00
Andrea Bolognani	396ca36cb0	qemu: Enforce ACPI, UEFI requirements Depending on the architecture, requirements for ACPI and UEFI can be different; more specifically, while on x86 UEFI requires ACPI, on aarch64 it's the other way around. Enforce these requirements when validating the domain, and make the error message more accurate by mentioning that they're not necessarily applicable to all architectures. Several aarch64 test cases had to be tweaked because they would have failed the validation step otherwise.	2017-04-03 10:58:00 +02:00
Michal Privoznik	462c4b66fa	Introduce and use virDomainDiskEmptySource Currently, if we want to zero out disk source (e,g, due to startupPolicy when starting up a domain) we use virDomainDiskSetSource(disk, NULL). This works well for file based storage (storage type file, dir, or block). But it doesn't work at all for other types like volume and network. So imagine that you have a domain that has a CDROM configured which source is a volume from an inactive pool. Because it is startupPolicy='optional', the CDROM is empty when the domain starts. However, the source element is not cleared out in the status XML and thus when the daemon restarts and tries to reconnect to the domain it refreshes the disks (which fails - the storage pool is still not running) and thus the domain is killed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-04-03 08:35:57 +02:00
Peter Krempa	20ee78bf9b	qemu: domain: Properly lookup top of chain in qemuDomainGetStorageSourceByDevstr When idx is 0 virStorageFileChainLookup returns the base (bottom) of the backing chain rather than the top. This is expected by the callers of qemuDomainGetStorageSourceByDevstr. Add a special case for idx == 0	2017-03-29 16:56:05 +02:00
Andrea Bolognani	7e667664d2	qemu: Fix memory locking limit calculation For guests that use <memoryBacking><locked>, our only option is to remove the memory locking limit altogether. Partially-resolves: https://bugzilla.redhat.com/1431793	2017-03-28 10:54:49 +02:00
Andrea Bolognani	1f7661af8c	qemu: Remove qemuDomainRequiresMemLock() Instead of having a separate function, we can simply return zero from the existing qemuDomainGetMemLockLimitBytes() to signal the caller that the memory locking limit doesn't need to be set for the guest. Having a single function instead of two makes it less likely that we will use the wrong value, which is exactly what happened when we started applying the limit that was meant for VFIO-using guests to <memoryBacking><locked>-using guests.	2017-03-28 10:54:47 +02:00
Andrea Bolognani	4b67e7a377	Revert "qemu: Forbid <memoryBacking><locked> without <memtune><hard_limit>" This reverts commit `c2e60ad0e5`. Turns out this check is excessively strict: there are ways other than <memtune><hard_limit> to raise the memory locking limit for QEMU processes, one prominent example being tweaking /etc/security/limits.conf. Partially-resolves: https://bugzilla.redhat.com/1431793	2017-03-28 10:44:25 +02:00
Erik Skultety	c8e6775f30	qemu: Bump the memory locking limit for mdevs as well Since mdevs are just another type of VFIO devices, we should increase the memory locking limit the same way we do for VFIO PCI devices. Signed-off-by: Erik Skultety <eskultet@redhat.com>	2017-03-27 15:39:35 +02:00
Erik Skultety	de4e8bdbc7	qemu: cgroup: Adjust cgroups' logic to allow mediated devices As goes for all the other hostdev device types, grant the qemu process access to /dev/vfio/<mediated_device_iommu_group>. Signed-off-by: Erik Skultety <eskultet@redhat.com>	2017-03-27 15:39:35 +02:00
Erik Skultety	ec783d7c77	conf: Introduce new hostdev device type mdev A mediated device will be identified by a UUID (with 'model' now being a mandatory <hostdev> attribute to represent the mediated device API) of the user pre-created mediated device. We also need to make sure that if user explicitly provides a guest address for a mdev device, the address type will be matching the device API supported on that specific mediated device and error out with an incorrect XML message. The resulting device XML: <devices> <hostdev mode='subsystem' type='mdev' model='vfio-pci'> <source> <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'> </source> </hostdev> </devices> Signed-off-by: Erik Skultety <eskultet@redhat.com>	2017-03-27 15:39:35 +02:00
Peter Krempa	9b93c4c264	qemu: domain: Add helper to look up disk soruce by the backing store string	2017-03-27 10:18:16 +02:00
Peter Krempa	4e1618ce72	qemu: domain: Add helper to generate indexed backing store names The code is currently simple, but if we later add node names, it will be necessary to generate the names based on the node name. Add a helper so that there's a central point to fix once we add self-generated node names.	2017-03-27 09:29:57 +02:00
Peter Krempa	1a5e2a8098	qemu: domain: Add helper to lookup disk by node name Looks up a disk and its corresponding backing chain element by node name.	2017-03-27 09:29:57 +02:00
John Ferlan	1a6b6d9a56	qemu: Set up the migration TLS objects for target If the migration flags indicate this migration will be using TLS, then set up the destination during the prepare phase once the target domain has been started to add the TLS objects to perform the migration. This will create at least an "-object tls-creds-x509,endpoint=server,..." for TLS credentials and potentially an "-object secret,..." to handle the passphrase response to access the TLS credentials. The alias/id used for the TLS objects will contain "libvirt_migrate". Once the objects are created, the code will set the "tls-creds" and "tls-hostname" migration parameters to signify usage of TLS. During the Finish phase we'll be sure to attempt to clear the migration parameters and delete those objects (whether or not they were created). We'll also perform the same reset during recovery if we've reached FINISH3. If the migration isn't using TLS, then be sure to check if the migration parameters exist and clear them if so.	2017-03-25 08:19:49 -04:00
Jiri Denemark	fcd56ce866	qemu: Set default values for CPU check attribute Signed-off-by: Jiri Denemark <jdenemar@redhat.com>	2017-03-17 11:50:48 +01:00
Michal Privoznik	7b89f857d9	qemu: Namespaces for NVDIMM Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-15 17:04:33 +01:00
Michal Privoznik	1bc173199e	qemu: Implement NVDIMM So, majority of the code is just ready as-is. Well, with one slight change: differentiate between dimm and nvdimm in places like device alias generation, generating the command line and so on. Speaking of the command line, we also need to append 'nvdimm=on' to the '-machine' argument so that the nvdimm feature is advertised in the ACPI tables properly. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-15 14:16:32 +01:00
Michal Privoznik	b4e8a49f8d	Introduce NVDIMM memory model NVDIMM is new type of memory introduced into QEMU 2.6. The idea is that we have a Non-Volatile memory module that keeps the data persistent across domain reboots. At the domain XML level, we already have some representation of 'dimm' modules. Long story short, NVDIMM will utilize the existing <memory/> element that lives under <devices/> by adding a new attribute 'nvdimm' to the existing @model and introduce a new <path/> element for <source/> while reusing other fields. The resulting XML would appear as: <memory model='nvdimm'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> <address type='dimm' slot='0'/> </memory> So far, this is just a XML parser/formatter extension. QEMU driver implementation is in the next commit. For more info on NVDIMM visit the following web page: http://pmem.io/ Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-15 13:30:58 +01:00
Michal Privoznik	290a00e41d	qemuDomainBuildNamespace: Handle file mount points https://bugzilla.redhat.com/show_bug.cgi?id=1431112 Yeah, that's right. A mount point doesn't have to be a directory. It can be a file too. However, the code that tries to preserve mount points under /dev for new namespace for qemu does not count with that option. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-13 13:32:45 +01:00
Michal Privoznik	e915942b05	qemuProcessHandleMonitorEOF: Disable namespace for domain https://bugzilla.redhat.com/show_bug.cgi?id=1430634 If a qemu process has died, we get EOF on its monitor. At this point, since qemu process was the only one running in the namespace kernel has already cleaned the namespace up. Any attempt of ours to enter it has to fail. This really happened in the bug linked above. We've tried to attach a disk to qemu and while we were in the monitor talking to qemu it just died. Therefore our code tried to do some roll back (e.g. deny the device in cgroups again, restore labels, etc.). However, during the roll back (esp. when restoring labels) we still thought that domain has a namespace. So we used secdriver's transactions. This failed as there is no namespace to enter. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-10 16:02:34 +01:00
Pavel Hrdina	cd4a8b9304	conf: store "autoGenerated" for graphics listen in status XML When libvirtd is started we call qemuDomainRecheckInternalPaths to detect whether a domain has VNC socket path generated by libvirt based on option from qemu.conf. However if we are parsing status XML for running domain the existing socket path can be generated also if the config XML uses the new <listen type='socket'/> element without specifying any socket. The current code doesn't make difference how the socket was generated and always marks it as "fromConfig". We need to store the "autoGenerated" value in the status XML in order to preserve that information. The difference between "fromConfig" and "autoGenerated" is important for migration, because if the socket is based on "fromConfig" we don't print it into the migratable XML and we assume that user has properly configured qemu.conf on both hosts. However if the socket is based on "autoGenerated" it means that a new feature was used and therefore we need to leave the socket in migratable XML to make sure that if this feature is not supported on destination the migration will fail. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2017-03-09 10:22:43 +01:00
John Ferlan	b2e5de96c7	qemu: Rename variable Rename 'secretUsageType' to 'usageType' since it's superfluous in an API qemuSecret	2017-03-08 14:37:05 -05:00
John Ferlan	7c2b7891cc	qemu: Introduce qemuDomainSecretInfoTLSNew Building upon the qemuDomainSecretInfoNew, create a helper which will build the secret used for TLS. Signed-off-by: John Ferlan <jferlan@redhat.com>	2017-03-08 14:31:09 -05:00
John Ferlan	c9a7b7b6ea	qemu: Introduce qemuDomainSecretInfoNew Create a helper which will create the secinfo used for disks, hostdevs, and chardevs. Signed-off-by: John Ferlan <jferlan@redhat.com>	2017-03-08 14:31:07 -05:00
Pavel Hrdina	3ffea19acd	qemu_domain: cleanup the controller post parse code Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2017-03-07 16:50:35 +01:00
Pavel Hrdina	57404ff7a7	qemu_domain: move controller post parse code into its own function Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2017-03-07 16:50:34 +01:00
Michal Privoznik	4da534c0b9	qemu: Enforce qemuSecurity wrappers Now that we have some qemuSecurity wrappers over virSecurityManager APIs, lets make sure everybody sticks with them. We have them for a reason and calling virSecurityManager API directly instead of wrapper may lead into accidentally labelling a file on the host instead of namespace. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-03-06 08:54:28 +01:00
Marc Hartmayer	e22de286b1	qemu: Fix deadlock across fork() in QEMU driver The functions in virCommand() after fork() must be careful with regard to accessing any mutexes that may have been locked by other threads in the parent process. It is possible that another thread in the parent process holds the lock for the virQEMUDriver while fork() is called. This leads to a deadlock in the child process when 'virQEMUDriverGetConfig(driver)' is called and therefore the handshake never completes between the child and the parent process. Ultimately the virDomainObjectPtr will never be unlocked. It gets much worse if the other thread of the parent process, that holds the lock for the virQEMUDriver, tries to lock the already locked virDomainObject. This leads to a completely unresponsive libvirtd. It's possible to reproduce this case with calling 'virsh start XXX' and 'virsh managedsave XXX' in a tight loop for multiple domains. This commit fixes the deadlock in the same way as it is described in commit `61b52d2e38`. Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>	2017-02-21 15:47:32 +01:00
Michal Privoznik	5c74cf1f44	qemu: Allow @rendernode for virgl domains When enabling virgl, qemu opens /dev/dri/render*. So far, we are not allowing that in devices CGroup nor creating the file in domain's namespace and thus requiring users to set the paths in qemu.conf. This, however, is suboptimal as it allows access to ALL qemu processes even those which don't have virgl configured. Now that we have a way to specify render node that qemu will use we can be more cautious and enable just that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-20 10:44:22 +01:00
Michal Privoznik	1bb787fdc9	qemuDomainGetHostdevPath: Report /dev/vfio/vfio less frequently So far, qemuDomainGetHostdevPath has no knowledge of the reasong it is called and thus reports /dev/vfio/vfio for every VFIO backed device. This is suboptimal, as we want it to: a) report /dev/vfio/vfio on every addition or domain startup b) report /dev/vfio/vfio only on last VFIO device being unplugged If a domain is being stopped then namespace and CGroup die with it so no need to worry about that. I mean, even when a domain that's exiting has more than one VFIO devices assigned to it, this function does not clean /dev/vfio/vfio in CGroup nor in the namespace. But that doesn't matter. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-02-20 07:21:59 +01:00
Michal Privoznik	b8e659aa98	qemuDomainGetHostdevPath: Create /dev/vfio/vfio iff needed So far, we are allowing /dev/vfio/vfio in the devices cgroup unconditionally (and creating it in the namespace too). Even if domain has no hostdev assignment configured. This is potential security hole. Therefore, when starting the domain (or hotplugging a hostdev) create & allow /dev/vfio/vfio too (if needed). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-02-20 07:21:58 +01:00
Michal Privoznik	9d92f533f8	qemuSetupHostdevCgroup: Use qemuDomainGetHostdevPath Since these two functions are nearly identical (with qemuSetupHostdevCgroup actually calling virCgroupAllowDevicePath) we can have one function call the other and thus de-duplicate some code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-02-20 07:21:58 +01:00
Michal Privoznik	b57bd206b9	qemu_conf: Check for namespaces availability more wisely The bare fact that mnt namespace is available is not enough for us to allow/enable qemu namespaces feature. There are other requirements: we must copy all the ACL & SELinux labels otherwise we might grant access that is administratively forbidden or vice versa. At the same time, the check for namespace prerequisites is moved from domain startup time to qemu.conf parser as it doesn't make much sense to allow users to start misconfigured libvirt just to find out they can't start a single domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-15 12:43:23 +01:00
Andrea Bolognani	ee6ec7824d	qemu: Call chmod() after mknod() mknod() is affected my the current umask, so we're not guaranteed the newly-created device node will have the right permissions. Call chmod(), which is not affected by the current umask, immediately afterwards to solve the issue. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1421036	2017-02-14 19:23:05 +01:00
Ján Tomko	723fef99c0	qemu: enforce maximum ports value for nec-xhci This controller only allows up to 15 ports. https://bugzilla.redhat.com/show_bug.cgi?id=1375417	2017-02-13 16:34:09 +01:00
Michal Privoznik	c2130c0d47	qemu_security: Introduce ImageLabel APIs Just like we need wrappers over other virSecurityManager APIs, we need one for virSecurityManagerSetImageLabel and virSecurityManagerRestoreImageLabel. Otherwise we might end up relabelling device in wrong namespace. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-09 08:04:57 +01:00
Michal Privoznik	b7feabbfdc	qemuDomainNamespaceSetupDisk: Simplify disk check Firstly, instead of checking for next->path the virStorageSourceIsEmpty() function should be used which also takes disk type into account. Secondly, not every disk source passed has the correct type set (due to our laziness). Therefore, instead of checking for virStorageSourceIsBlockLocal() and also S_ISBLK() the former can be refined to just virStorageSourceIsLocalStorage(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-08 15:56:21 +01:00
Michal Privoznik	786d8d91b4	qemuDomainDiskChainElement{Prepare,Revoke}: manage /dev entry Again, one missed bit. This time without this commit there is no /dev entry in the namespace of the qemu process when doing disk snapshots or block-copy. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-08 15:56:13 +01:00
Michal Privoznik	18ce9d139d	qemuDomainNamespace{Setup,Teardown}Disk: Don't pass pointer to full disk These functions do not need to see the whole virDomainDiskDef. Moreover, they are going to be called from places where we don't have access to the full disk definition. Sticking with virStorageSource is more than enough. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-08 15:56:05 +01:00
Michal Privoznik	76d491ef14	qemuDomainNamespaceSetupDisk: Drop useless @src variable Since its introduction in `81df21507b` this variable was never used. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-08 15:55:56 +01:00
Michal Privoznik	8dc867e978	qemu_domain: Don't pass virDomainDeviceDefPtr to ns helpers There is no need for this. None of the namespace helpers uses it. Historically it was used when calling secdriver APIs, but we don't to that anymore. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-08 15:55:52 +01:00
Andrea Bolognani	c2e60ad0e5	qemu: Forbid <memoryBacking><locked> without <memtune><hard_limit> In order for memory locking to work, the hard limit on memory locking (and usage) has to be set appropriately by the user. The documentation mentions the requirement already: with this patch, it's going to be enforced by runtime checks as well, by forbidding a non-compliant guest from being defined as well as edited and started. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1316774	2017-02-07 18:43:10 +01:00
Michal Privoznik	7f0b382522	qemuDomainAttachDeviceMknod: Don't loop endlessly When working with symlinks it is fairly easy to get into a loop. Don't. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-07 13:20:19 +01:00
Michal Privoznik	3f5fcacf89	qemuDomainAttachDeviceMknod: Deal with symlinks Similarly to one of the previous commits, we need to deal properly with symlinks in hotplug case too. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-07 13:20:17 +01:00
Michal Privoznik	4ac847f93b	qemuDomainCreateDevice: Don't loop endlessly When working with symlinks it is fairly easy to get into a loop. Don't. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-07 13:18:32 +01:00
Michal Privoznik	54ed672214	qemuDomainCreateDevice: Properly deal with symlinks Imagine you have a disk with the following source set up: /dev/disk/by-uuid/$uuid (symlink to) -> /dev/sda After `cbc45525cb` the transitive end of the symlink chain is created (/dev/sda), but we need to create any item in chain too. Others might rely on that. In this case, /dev/disk/by-uuid/$uuid comes from domain XML thus it is this path that secdriver tries to relabel. Not the resolved one. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-07 13:18:10 +01:00
Michal Privoznik	b621291f5c	qemuDomain{Attach,Detach}Device NS helpers: Don't relabel devices After previous commit this has become redundant step. Also setting up devices in namespace and setting their label later on are two different steps and should be not done at once. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-02-07 10:40:53 +01:00
Michal Privoznik	572eda12ad	qemu: Implement mtu on interface Not only we should set the MTU on the host end of the device but also let qemu know what MTU did we set. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-26 10:00:01 +01:00
Michal Privoznik	b020cf73fe	domain_conf: Introduce <mtu/> to <interface/> So far we allow to set MTU for libvirt networks. However, not all domain interfaces have to be plugged into a libvirt network and even if they are, they might want to have a different MTU (e.g. for testing purposes). Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-26 09:59:56 +01:00
Chen Hanxiao	980f2a35c7	qemu_domain: add timestamp in tainting of guests log We lacked of timestamp in tainting of guests log, which bring troubles for finding guest issues: such as whether a guest powerdown caused by qemu-monitor-command or others issues inside guests. If we had timestamp in tainting of guests log, it would be helpful when checking guest's /var/log/messages. Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>	2017-01-21 12:34:19 -05:00
Michal Privoznik	57b5e27d3d	qemu: set default vhost-user ifname Based on work of Mehdi Abaakouk <sileht@sileht.net>. When parsing vhost-user interface XML and no ifname is found we can try to fill it in in post parse callback. The way this works is we try to make up interface name from given socket path and then ask openvswitch whether it knows the interface. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-20 15:42:12 +01:00
Michal Privoznik	d0baf54e53	qemu: Actually unshare() iff running as root https://bugzilla.redhat.com/show_bug.cgi?id=1413922 While all the code that deals with qemu namespaces correctly detects whether we are running as root (and turn into NO-OP for qemu:///session) the actual unshare() call is not guarded with such check. Therefore any attempt to start a domain under qemu:///session shall fail as unshare() is reserved for root. The fix consists of moving unshare() call (for which we have a wrapper called virProcessSetupPrivateMountNS) into qemuDomainBuildNamespace() where the proper check is performed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2017-01-17 13:23:56 +01:00
Michal Privoznik	93a062c3b2	qemu: Copy SELinux labels for namespace too When creating new /dev/* for qemu, we do chown() and copy ACLs to create the exact copy from the original /dev. I though that copying SELinux labels is not necessary as SELinux will chose the sane defaults. Surprisingly, it does not leaving namespace with the following labels: crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0 random crw-------. root root system_u:object_r:tmpfs_t:s0 rtc0 drwxrwxrwt. root root system_u:object_r:tmpfs_t:s0 shm crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0 urandom As a result, domain is unable to start: error: internal error: process exited while connecting to monitor: Error in GnuTLS initialization: Failed to acquire random data. qemu-kvm: cannot initialize crypto: Unable to initialize GNUTLS library: Failed to acquire random data. The solution is to copy the SELinux labels as well. Reported-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-13 14:45:52 +01:00
Michal Privoznik	cbc45525cb	qemuDomainCreateDevice: Canonicalize paths So far the decision whether /dev/* entry is created in the qemu namespace is really simple: does the path starts with "/dev/"? This can be easily fooled by providing path like the following (for any considered device like disk, rng, chardev, ..): /dev/../var/lib/libvirt/images/disk.qcow2 Therefore, before making the decision the path should be canonicalized. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-11 18:08:13 +01:00
Michal Privoznik	49f326edc0	qemu: Use namespaces iff available on the host kernel So far the namespaces were turned on by default unconditionally. For all non-Linux platforms we provided stub functions that just ignored whatever namespaces setting there was in qemu.conf and returned 0 to indicate success. Moreover, we didn't really check if namespaces are available on the host kernel. This is suboptimal as we might have ignored user setting. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-11 18:07:43 +01:00
Michal Privoznik	41816751a7	util: Introduce virFileMoveMount This is a simple wrapper over mount(). However, not every system out there is capable of moving a mount point. Therefore, instead of having to deal with this fact in all the places of our code we can have a simple wrapper and deal with this fact at just one place. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-11 18:06:30 +01:00
Michal Privoznik	2ff8c30548	qemuDomainSetupAllInputs: Update debug message Due to a copy-paste error, the debug message reads: Setting up disks It should have been: Setting up inputs. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-11 17:39:24 +01:00
Michal Privoznik	269589146c	qemu_domain: Move qemuDomainGetPreservedMounts This function is used only from code compiled on Linux. Therefore on non-Linux platforms it triggers compilation error: ../../src/qemu/qemu_domain.c:209:1: error: unused function 'qemuDomainGetPreservedMounts' [-Werror,-Wunused-function] Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-10 19:23:49 +01:00
Michal Privoznik	406e390962	qemu: Drop qemuDomainDeleteNamespace After previous commits, this function is no longer needed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-10 13:04:57 +01:00
Michal Privoznik	5d198c2b2c	qemuDomainCreateNamespace: move mkdir to qemuDomainBuildNamespace Again, there is no need to create /var/lib/libvirt/$domain.* directories in CreateNamespace(). It is sufficient to create them as soon as we need them which is in BuildNamespace. This way we don't leave them around for the whole lifetime of domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-10 13:04:57 +01:00
Michal Privoznik	5d30057695	qemuDomainGetPreservedMounts: Do not special case /dev The `c1140eb9e` got me thinking. We don't want to special case /dev in qemuDomainGetPreservedMounts(), but in all other places in the code we special case it anyway. I mean, /var/run/libvirt/$domain.dev path is constructed separately just so that it is not constructed here. It makes only a little sense (if any at all). Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-10 13:04:57 +01:00
Michal Privoznik	40ebbf72d5	qemuDomainCreateNamespace: s/unlink/rmdir/ If something goes wrong in this function we try a rollback. That is unlink all the directories we created earlier. For some weird reason unlink() was called instead of rmdir(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-10 13:04:57 +01:00
Martin Kletzander	c1140eb9ed	qemu: Remove /dev mount info properly Just so it doesn't bite us in the future, even though it's unlikely. And fix the comment above it as well. Commit `e08ee7cd34` took the info from the function it's calling, but that was lie itself in the first place. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2017-01-05 16:24:55 +01:00
Michal Privoznik	e08ee7cd34	qemuDomainGetPreservedMounts: Fetch list of /dev/* mounts dynamically With my namespace patches, we are spawning qemu in its own namespace so that we can manage /dev entries ourselves. However, some filesystems mounted under /dev needs to be preserved in order to be shared with the parent namespace (e.g. /dev/pts). Currently, the list of mount points to preserve is hardcoded which ain't right - on some systems there might be less or more items under real /dev that on our list. The solution is to parse /proc/mounts and fetch the list from there. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-05 16:00:20 +01:00
Michal Privoznik	dd78da09b0	qemuDomainCreateDevice: Be more careful about device path Again, not something that I'd hit, but there is a chance in theory that this might bite us. Currently the way we decide whether or not to create /dev entry for a device is by marching first four characters of path with "/dev". This might be not enough. Just imagine somebody has a disk image stored under "/devil/path/to/disk". We ought to be matching against "/dev/". Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-04 15:36:42 +01:00
Michal Privoznik	ce01a2b11c	qemuDomainAttachDeviceMknodHelper: Don't unlink() so often Not that I'd encounter any bug here, but the code doesn't look 100% correct. Imagine, somebody is trying to attach a device to a domain, and the device's /dev entry already exists in the qemu namespace. This is handled gracefully and the control continues with setting up ACLs and calling security manager to set up labels. Now, if any of these steps fail, control jump on the 'cleanup' label and unlink() the file straight away. Even when it was not us who created the file in the first place. This can be possibly dangerous. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-04 15:36:42 +01:00
Michal Privoznik	3aae99fe71	qemu: Handle EEXIST gracefully in qemuDomainCreateDevice https://bugzilla.redhat.com/show_bug.cgi?id=1406837 Imagine you have a domain configured in such way that you are assigning two PCI devices that fall into the same IOMMU group. With mount namespace enabled what happens is that for the first PCI device corresponding /dev/vfio/X entry is created and when the code tries to do the same for the second mknod() fails as /dev/vfio/X already exists: 2016-12-21 14:40:45.648+0000: 24681: error : qemuProcessReportLogError:1792 : internal error: Process exited prior to exec: libvirt: QEMU Driver error : Failed to make device /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists Worse, by default there are some devices that are created in the namespace regardless of domain configuration (e.g. /dev/null, /dev/urandom, etc.). If one of them is set as backend for some guest device (e.g. rng, chardev, etc.) it's the same story as described above. Weirdly, in attach code this is already handled. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2017-01-04 15:36:42 +01:00
John Ferlan	7f7d990483	qemu: Don't assume secret provided for LUKS encryption https://bugzilla.redhat.com/show_bug.cgi?id=1405269 If a secret was not provided for what was determined to be a LUKS encrypted disk (during virStorageFileGetMetadata processing when called from qemuDomainDetermineDiskChain as a result of hotplug attach qemuDomainAttachDeviceDiskLive), then do not attempt to look it up (avoiding a libvirtd crash) and do not alter the format to "luks" when adding the disk; otherwise, the device_add would fail with a message such as: "unable to execute QEMU command 'device_add': Property 'scsi-hd.drive' can't find value 'drive-scsi0-0-0-0'" because of assumptions that when the format=luks that libvirt would have provided the secret to decrypt the volume. Access to unlock the volume will thus be left to the application.	2017-01-03 12:59:18 -05:00
Marc Hartmayer	fb2cd32c9a	qemu: qemuDomainDiskChangeSupported: Add missing 'address' check Disk->info is not live updatable so add a check for this. Otherwise libvirt reports success even though no data was updated. Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>	2016-12-20 11:22:44 +01:00
Michal Privoznik	ab41ce7f4e	qemu: Mark more namespace code linux-only Some of the functions are not called on non-linux platforms which makes them useless there. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-16 11:51:06 +00:00
Michal Privoznik	f444faa94a	qemu: Enable mount namespace https://bugzilla.redhat.com/show_bug.cgi?id=1404952 Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	661887f558	qemu: Let users opt-out from containerization Given how intrusive previous patches are, it might happen that there's a bug or imperfection. Lets give users a way out: if they set 'namespaces' to an empty array in qemu.conf the feature is suppressed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	f95c5c48d4	qemu: Manage /dev entry on RNG hotplug When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	f5fdf23a68	qemu: Manage /dev entry on chardev hotplug When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	6e57492839	qemu: Manage /dev entry on hostdev hotplug When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	81df21507b	qemu: Manage /dev entry on disk hotplug When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	2160f338a7	qemu: Prepare RNGs when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	8ec8a8c5ff	qemu: Prepare inputs when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	2c654490f3	qemu: Prepare TPM when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	4e4451019c	qemu: Prepare chardevs when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	73267cec46	qemu: Prepare hostdevs when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	054202d020	qemu: Prepare disks when starting a domain When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	bb4e529664	qemu: Spawn qemu under mount namespace Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also. From technical POV, on domain startup process the parent (libvirtd) creates: /var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev. What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one. To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-15 09:25:16 +01:00
Michal Privoznik	7ed6934f3b	virDomainObjGetShortName: take virDomainDef So far this function takes virDomainObjPtr which: 1) is an overkill, 2) might be not available in all the places we will use it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2016-12-08 15:45:52 +01:00
Laine Stump	9b0848d523	qemu: propagate virQEMUDriver object to qemuDomainDeviceCalculatePCIConnectFlags If libvirtd is running unprivileged, it can open a device's PCI config data in sysfs, but can only read the first 64 bytes. But as part of determining whether a device is Express or legacy PCI, qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond the first 64 bytes of the PCI config data and fails with an error log if the read is unsuccessful. In order to avoid creating a parallel "quiet" version of virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down through all the call chains that initialize the qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver pointer with the rest of the iterdata so that it can be used by qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used yet, but will be used in an upcoming patch (that detects Express vs legacy PCI for VFIO assigned devices) to examine driver->privileged.	2016-11-30 15:28:07 -05:00

1 2 3 4 5 ...

695 Commits