Since mdevs are just another type of VFIO devices, we should increase
the memory locking limit the same way we do for VFIO PCI devices.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
As goes for all the other hostdev device types, grant the qemu process
access to /dev/vfio/<mediated_device_iommu_group>.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
A mediated device will be identified by a UUID (with 'model' now being
a mandatory <hostdev> attribute to represent the mediated device API) of
the user pre-created mediated device. We also need to make sure that if
user explicitly provides a guest address for a mdev device, the address
type will be matching the device API supported on that specific mediated
device and error out with an incorrect XML message.
The resulting device XML:
<devices>
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
<source>
<address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'>
</source>
</hostdev>
</devices>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
The code is currently simple, but if we later add node names, it will be
necessary to generate the names based on the node name. Add a helper so
that there's a central point to fix once we add self-generated node
names.
If the migration flags indicate this migration will be using TLS,
then set up the destination during the prepare phase once the target
domain has been started to add the TLS objects to perform the migration.
This will create at least an "-object tls-creds-x509,endpoint=server,..."
for TLS credentials and potentially an "-object secret,..." to handle the
passphrase response to access the TLS credentials. The alias/id used for
the TLS objects will contain "libvirt_migrate".
Once the objects are created, the code will set the "tls-creds" and
"tls-hostname" migration parameters to signify usage of TLS.
During the Finish phase we'll be sure to attempt to clear the
migration parameters and delete those objects (whether or not they
were created). We'll also perform the same reset during recovery
if we've reached FINISH3.
If the migration isn't using TLS, then be sure to check if the
migration parameters exist and clear them if so.
So, majority of the code is just ready as-is. Well, with one
slight change: differentiate between dimm and nvdimm in places
like device alias generation, generating the command line and so
on.
Speaking of the command line, we also need to append 'nvdimm=on'
to the '-machine' argument so that the nvdimm feature is
advertised in the ACPI tables properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
NVDIMM is new type of memory introduced into QEMU 2.6. The idea
is that we have a Non-Volatile memory module that keeps the data
persistent across domain reboots.
At the domain XML level, we already have some representation of
'dimm' modules. Long story short, NVDIMM will utilize the
existing <memory/> element that lives under <devices/> by adding
a new attribute 'nvdimm' to the existing @model and introduce a
new <path/> element for <source/> while reusing other fields. The
resulting XML would appear as:
<memory model='nvdimm'>
<source>
<path>/tmp/nvdimm</path>
</source>
<target>
<size unit='KiB'>523264</size>
<node>0</node>
</target>
<address type='dimm' slot='0'/>
</memory>
So far, this is just a XML parser/formatter extension. QEMU
driver implementation is in the next commit.
For more info on NVDIMM visit the following web page:
http://pmem.io/
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1431112
Yeah, that's right. A mount point doesn't have to be a directory.
It can be a file too. However, the code that tries to preserve
mount points under /dev for new namespace for qemu does not count
with that option.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1430634
If a qemu process has died, we get EOF on its monitor. At this
point, since qemu process was the only one running in the
namespace kernel has already cleaned the namespace up. Any
attempt of ours to enter it has to fail.
This really happened in the bug linked above. We've tried to
attach a disk to qemu and while we were in the monitor talking to
qemu it just died. Therefore our code tried to do some roll back
(e.g. deny the device in cgroups again, restore labels, etc.).
However, during the roll back (esp. when restoring labels) we
still thought that domain has a namespace. So we used secdriver's
transactions. This failed as there is no namespace to enter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When libvirtd is started we call qemuDomainRecheckInternalPaths
to detect whether a domain has VNC socket path generated by libvirt
based on option from qemu.conf. However if we are parsing status XML
for running domain the existing socket path can be generated also if
the config XML uses the new <listen type='socket'/> element without
specifying any socket.
The current code doesn't make difference how the socket was generated
and always marks it as "fromConfig". We need to store the
"autoGenerated" value in the status XML in order to preserve that
information.
The difference between "fromConfig" and "autoGenerated" is important
for migration, because if the socket is based on "fromConfig" we don't
print it into the migratable XML and we assume that user has properly
configured qemu.conf on both hosts. However if the socket is based
on "autoGenerated" it means that a new feature was used and therefore
we need to leave the socket in migratable XML to make sure that if
this feature is not supported on destination the migration will fail.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Now that we have some qemuSecurity wrappers over
virSecurityManager APIs, lets make sure everybody sticks with
them. We have them for a reason and calling virSecurityManager
API directly instead of wrapper may lead into accidentally
labelling a file on the host instead of namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The functions in virCommand() after fork() must be careful with regard
to accessing any mutexes that may have been locked by other threads in
the parent process. It is possible that another thread in the parent
process holds the lock for the virQEMUDriver while fork() is called.
This leads to a deadlock in the child process when
'virQEMUDriverGetConfig(driver)' is called and therefore the handshake
never completes between the child and the parent process. Ultimately
the virDomainObjectPtr will never be unlocked.
It gets much worse if the other thread of the parent process, that
holds the lock for the virQEMUDriver, tries to lock the already locked
virDomainObject. This leads to a completely unresponsive libvirtd.
It's possible to reproduce this case with calling 'virsh start XXX'
and 'virsh managedsave XXX' in a tight loop for multiple domains.
This commit fixes the deadlock in the same way as it is described in
commit 61b52d2e38.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
When enabling virgl, qemu opens /dev/dri/render*. So far, we are
not allowing that in devices CGroup nor creating the file in
domain's namespace and thus requiring users to set the paths in
qemu.conf. This, however, is suboptimal as it allows access to
ALL qemu processes even those which don't have virgl configured.
Now that we have a way to specify render node that qemu will use
we can be more cautious and enable just that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far, qemuDomainGetHostdevPath has no knowledge of the reasong
it is called and thus reports /dev/vfio/vfio for every VFIO
backed device. This is suboptimal, as we want it to:
a) report /dev/vfio/vfio on every addition or domain startup
b) report /dev/vfio/vfio only on last VFIO device being unplugged
If a domain is being stopped then namespace and CGroup die with
it so no need to worry about that. I mean, even when a domain
that's exiting has more than one VFIO devices assigned to it,
this function does not clean /dev/vfio/vfio in CGroup nor in the
namespace. But that doesn't matter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
So far, we are allowing /dev/vfio/vfio in the devices cgroup
unconditionally (and creating it in the namespace too). Even if
domain has no hostdev assignment configured. This is potential
security hole. Therefore, when starting the domain (or
hotplugging a hostdev) create & allow /dev/vfio/vfio too (if
needed).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Since these two functions are nearly identical (with
qemuSetupHostdevCgroup actually calling virCgroupAllowDevicePath)
we can have one function call the other and thus de-duplicate
some code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The bare fact that mnt namespace is available is not enough for
us to allow/enable qemu namespaces feature. There are other
requirements: we must copy all the ACL & SELinux labels otherwise
we might grant access that is administratively forbidden or vice
versa.
At the same time, the check for namespace prerequisites is moved
from domain startup time to qemu.conf parser as it doesn't make
much sense to allow users to start misconfigured libvirt just to
find out they can't start a single domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
mknod() is affected my the current umask, so we're not
guaranteed the newly-created device node will have the
right permissions.
Call chmod(), which is not affected by the current umask,
immediately afterwards to solve the issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1421036
Just like we need wrappers over other virSecurityManager APIs, we
need one for virSecurityManagerSetImageLabel and
virSecurityManagerRestoreImageLabel. Otherwise we might end up
relabelling device in wrong namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Firstly, instead of checking for next->path the
virStorageSourceIsEmpty() function should be used which also
takes disk type into account.
Secondly, not every disk source passed has the correct type set
(due to our laziness). Therefore, instead of checking for
virStorageSourceIsBlockLocal() and also S_ISBLK() the former can
be refined to just virStorageSourceIsLocalStorage().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Again, one missed bit. This time without this commit there is no
/dev entry in the namespace of the qemu process when doing disk
snapshots or block-copy.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
These functions do not need to see the whole virDomainDiskDef.
Moreover, they are going to be called from places where we don't
have access to the full disk definition. Sticking with
virStorageSource is more than enough.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There is no need for this. None of the namespace helpers uses it.
Historically it was used when calling secdriver APIs, but we
don't to that anymore.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In order for memory locking to work, the hard limit on memory
locking (and usage) has to be set appropriately by the user.
The documentation mentions the requirement already: with this
patch, it's going to be enforced by runtime checks as well,
by forbidding a non-compliant guest from being defined as well
as edited and started.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1316774
Similarly to one of the previous commits, we need to deal
properly with symlinks in hotplug case too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Imagine you have a disk with the following source set up:
/dev/disk/by-uuid/$uuid (symlink to) -> /dev/sda
After cbc45525cb the transitive end of the symlink chain is
created (/dev/sda), but we need to create any item in chain too.
Others might rely on that.
In this case, /dev/disk/by-uuid/$uuid comes from domain XML thus
it is this path that secdriver tries to relabel. Not the resolved
one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
After previous commit this has become redundant step.
Also setting up devices in namespace and setting their label
later on are two different steps and should be not done at once.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not only we should set the MTU on the host end of the device but
also let qemu know what MTU did we set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far we allow to set MTU for libvirt networks. However, not all
domain interfaces have to be plugged into a libvirt network and
even if they are, they might want to have a different MTU (e.g.
for testing purposes).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We lacked of timestamp in tainting of guests log,
which bring troubles for finding guest issues:
such as whether a guest powerdown caused by qemu-monitor-command
or others issues inside guests.
If we had timestamp in tainting of guests log,
it would be helpful when checking guest's /var/log/messages.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Based on work of Mehdi Abaakouk <sileht@sileht.net>.
When parsing vhost-user interface XML and no ifname is found we
can try to fill it in in post parse callback. The way this works
is we try to make up interface name from given socket path and
then ask openvswitch whether it knows the interface.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1413922
While all the code that deals with qemu namespaces correctly
detects whether we are running as root (and turn into NO-OP for
qemu:///session) the actual unshare() call is not guarded with
such check. Therefore any attempt to start a domain under
qemu:///session shall fail as unshare() is reserved for root.
The fix consists of moving unshare() call (for which we have a
wrapper called virProcessSetupPrivateMountNS) into
qemuDomainBuildNamespace() where the proper check is performed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
When creating new /dev/* for qemu, we do chown() and copy ACLs to
create the exact copy from the original /dev. I though that
copying SELinux labels is not necessary as SELinux will chose the
sane defaults. Surprisingly, it does not leaving namespace with
the following labels:
crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0 random
crw-------. root root system_u:object_r:tmpfs_t:s0 rtc0
drwxrwxrwt. root root system_u:object_r:tmpfs_t:s0 shm
crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0 urandom
As a result, domain is unable to start:
error: internal error: process exited while connecting to monitor: Error in GnuTLS initialization: Failed to acquire random data.
qemu-kvm: cannot initialize crypto: Unable to initialize GNUTLS library: Failed to acquire random data.
The solution is to copy the SELinux labels as well.
Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far the decision whether /dev/* entry is created in the qemu
namespace is really simple: does the path starts with "/dev/"?
This can be easily fooled by providing path like the following
(for any considered device like disk, rng, chardev, ..):
/dev/../var/lib/libvirt/images/disk.qcow2
Therefore, before making the decision the path should be
canonicalized.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far the namespaces were turned on by default unconditionally.
For all non-Linux platforms we provided stub functions that just
ignored whatever namespaces setting there was in qemu.conf and
returned 0 to indicate success. Moreover, we didn't really check
if namespaces are available on the host kernel.
This is suboptimal as we might have ignored user setting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is a simple wrapper over mount(). However, not every system
out there is capable of moving a mount point. Therefore, instead
of having to deal with this fact in all the places of our code we
can have a simple wrapper and deal with this fact at just one
place.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Due to a copy-paste error, the debug message reads:
Setting up disks
It should have been:
Setting up inputs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function is used only from code compiled on Linux. Therefore
on non-Linux platforms it triggers compilation error:
../../src/qemu/qemu_domain.c:209:1: error: unused function 'qemuDomainGetPreservedMounts' [-Werror,-Wunused-function]
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Again, there is no need to create /var/lib/libvirt/$domain.*
directories in CreateNamespace(). It is sufficient to create them
as soon as we need them which is in BuildNamespace. This way we
don't leave them around for the whole lifetime of domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The c1140eb9e got me thinking. We don't want to special case /dev
in qemuDomainGetPreservedMounts(), but in all other places in the
code we special case it anyway. I mean,
/var/run/libvirt/$domain.dev path is constructed separately just
so that it is not constructed here. It makes only a little sense
(if any at all).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If something goes wrong in this function we try a rollback. That
is unlink all the directories we created earlier. For some weird
reason unlink() was called instead of rmdir().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Just so it doesn't bite us in the future, even though it's unlikely.
And fix the comment above it as well. Commit e08ee7cd34 took the
info from the function it's calling, but that was lie itself in the
first place.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Again, not something that I'd hit, but there is a chance in
theory that this might bite us. Currently the way we decide
whether or not to create /dev entry for a device is by marching
first four characters of path with "/dev". This might be not
enough. Just imagine somebody has a disk image stored under
"/devil/path/to/disk". We ought to be matching against "/dev/".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not that I'd encounter any bug here, but the code doesn't look
100% correct. Imagine, somebody is trying to attach a device to a
domain, and the device's /dev entry already exists in the qemu
namespace. This is handled gracefully and the control continues
with setting up ACLs and calling security manager to set up
labels. Now, if any of these steps fail, control jump on the
'cleanup' label and unlink() the file straight away. Even when it
was not us who created the file in the first place. This can be
possibly dangerous.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1406837
Imagine you have a domain configured in such way that you are
assigning two PCI devices that fall into the same IOMMU group.
With mount namespace enabled what happens is that for the first
PCI device corresponding /dev/vfio/X entry is created and when
the code tries to do the same for the second mknod() fails as
/dev/vfio/X already exists:
2016-12-21 14:40:45.648+0000: 24681: error :
qemuProcessReportLogError:1792 : internal error: Process exited
prior to exec: libvirt: QEMU Driver error : Failed to make device
/var/run/libvirt/qemu/windoze.dev//vfio/22: File exists
Worse, by default there are some devices that are created in the
namespace regardless of domain configuration (e.g. /dev/null,
/dev/urandom, etc.). If one of them is set as backend for some
guest device (e.g. rng, chardev, etc.) it's the same story as
described above.
Weirdly, in attach code this is already handled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1405269
If a secret was not provided for what was determined to be a LUKS
encrypted disk (during virStorageFileGetMetadata processing when
called from qemuDomainDetermineDiskChain as a result of hotplug
attach qemuDomainAttachDeviceDiskLive), then do not attempt to
look it up (avoiding a libvirtd crash) and do not alter the format
to "luks" when adding the disk; otherwise, the device_add would
fail with a message such as:
"unable to execute QEMU command 'device_add': Property 'scsi-hd.drive'
can't find value 'drive-scsi0-0-0-0'"
because of assumptions that when the format=luks that libvirt would have
provided the secret to decrypt the volume.
Access to unlock the volume will thus be left to the application.
Disk->info is not live updatable so add a check for this. Otherwise
libvirt reports success even though no data was updated.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Given how intrusive previous patches are, it might happen that
there's a bug or imperfection. Lets give users a way out: if they
set 'namespaces' to an empty array in qemu.conf the feature is
suppressed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Prime time. When it comes to spawning qemu process and
relabelling all the devices it's going to touch, there's inherent
race with other applications in the system (e.g. udev). Instead
of trying convincing udev to not touch libvirt managed devices,
we can create a separate mount namespace for the qemu, and mount
our own /dev there. Of course this puts more work onto us as we
have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent
(libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev
/var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare()
to create new mount namespace. From now on anything that child
does is invisible to the parent. Child then mounts tmpfs on
$domain.dev (so that it still sees original /dev from the host)
and creates some devices (as explained in one of the previous
patches). The devices have to be created exactly as they are in
the host (including perms, seclabels, ACLs, ...). After that it
moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML allowing users to connect to them.
Therefore we must preserve devpts mount to be shared with the
host's one.
To make this patch as small as possible, creating of devices
configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far this function takes virDomainObjPtr which:
1) is an overkill,
2) might be not available in all the places we will use it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.
In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
We have couple of functions that operate over NULL terminated
lits of strings. However, our naming sucks:
virStringJoin
virStringFreeList
virStringFreeListCount
virStringArrayHasString
virStringGetFirstWithPrefix
We can do better:
virStringListJoin
virStringListFree
virStringListFreeCount
virStringListHasString
virStringListGetFirstWithPrefix
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemuDomainObjExitAgent is unsafe.
First it accesses domain object without domain lock.
Second it uses outdated logic that goes back to commit 79533da1 of
year 2009 when code was quite different. (unref function
instead of unreferencing only unlocked and disposed object
in case of last reference and leaved unlocking to the caller otherwise).
Nowadays this logic may lead to disposing locked object
i guess.
Another problem is that the callers of qemuDomainObjEnterAgent
use domain object again (namely priv->agent) without domain lock.
This patch address these two problems.
qemuDomainGetAgent is dropped as unused.
Use the util function virHostdevIsSCSIDevice() to simplify if
statements.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Add missing checks if a hostdev is a subsystem/SCSI device before access
the union member 'subsys'/'scsi'. Also fix indentation and simplify
qemuDomainObjCheckHostdevTaint().
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Don't use qemuMonitorGetCPUInfo which does a lot of matching to get the
full picture which is not necessary and would be mostly discarded.
Refresh only the vcpu halted state using data from query-cpus.
Previously we added a set of EHCI+UHCI controllers to Q35 machines to
mimic real hardware as closely as possible, but recent discussions
have pointed out that the nec-usb-xhci (USB3) controller is much more
virtualization-friendly (uses less CPU), so this patch switches the
default for Q35 machinetypes to add an XHCI instead (if it's
supported, which it of course *will* be).
Since none of the existing test cases left out USB controllers in the
input XML, a new Q35 test case was added which has *no* devices, so
ends up with only the defaults always put in by qemu, plus those added
by libvirt.
Now the a dmi-to-pci-bridge is automatically added just as it's needed
(when a pci-bridge is being added), we no longer have any need to
force-add one to every single Q35 domain.
We're keeping some things at default and that's not something we want to
do intentionally. Let's save some sensible defaults upfront in order to
avoid having problems later. The details for the defaults (of the newer
implementation) can be found in qemu's commit 5400c02b90bb:
http://git.qemu.org/?p=qemu.git;a=commit;h=5400c02b90bb
Since we are merely saving the defaults it will not change the guest ABI
and thanks to the fact that we're doing it in the PostParse callback it
will not break the ABI stability checks.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add the secret object so the 'passwordid=' can be added if the command line
if there's a secret defined in/on the host for TCP chardev TLS objects.
Preparation for the secret involves adding the secinfo to the char source
device prior to command line processing. There are multiple possibilities
for TCP chardev source backend usage.
Add test for at least a serial chardev as an example.
Adding a field to the domain's private vcpu object to hold the halted
state information.
Adding two functions in support of the halted state:
- qemuDomainGetVcpuHalted: retrieve the halted state from a
private vcpu object
- qemuDomainRefreshVcpuHalted: obtain the per-vcpu halted states
via qemu monitor and store the results in the private vcpu objects
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Hao QingFeng <haoqf@linux.vnet.ibm.com>
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Since TLS was introduced hostwide for libvirt 2.3.0 and a domain
configurable haveTLS was implemented for libvirt 2.4.0, we have to
modify the migratable XML for specific case where the 'tls' attribute
is based on setting from qemu.conf.
The "tlsFromConfig" is libvirt internal attribute and is stored only in
status XML to ensure that when libvirtd is restarted this internal flag
is not lost by the restart.
That flag is used to decide whether we should put *tls* attribute to
migratable XML or not.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Add an optional "tls='yes|no'" attribute for a TCP chardev.
For QEMU, this will allow for disabling the host config setting of the
'chardev_tls' for a domain chardev channel by setting the value to "no" or
to attempt to use a host TLS environment when setting the value to "yes"
when the host config 'chardev_tls' setting is disabled, but a TLS environment
is configured via either the host config 'chardev_tls_x509_cert_dir' or
'default_tls_x509_cert_dir'
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Commit id '5f2a132786' should have placed the data in the host source
def structure since that's also used by smartcard, redirdev, and rng in
order to provide a backend tcp channel. The data in the private structure
will be necessary in order to provide the secret properly.
This also renames the previous names from "Chardev" to "ChrSource" for
the private data structures and API's
Change the virDomainChrDef to use a pointer to 'source' and allocate
that pointer during virDomainChrDefNew.
This has tremendous "fallout" in the rest of the code which mainly
has to change source.$field to source->$field.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Modeled after the qemuDomainHostdevPrivatePtr (commit id '27726d8c'),
create a privateData pointer in the _virDomainChardevDef to allow storage
of private data for a hypervisor in order to at least temporarily store
secret data for usage during qemuBuildCommandLine.
NB: Since the qemu_parse_command (qemuParseCommandLine) code is not
expecting to restore the secret data, there's no need to add code
code to handle this new structure there.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This improves commit 706b5b6277 in a way that we check qemu capabilities
instead of what architecture we are running on to detect whether we can
use *virtio-vga* model or not. This is not a case only for arm/aarch64.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
All definition validation that doesn't depend on qemu capabilities
and was allowed previously as valid definition should be placed into
qemuDomainDefValidate.
The check whether video type is supported or not was based on an enum
that translates type into model. Use switch to ensure that if new
video type is added, it will be properly handled.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This breaks vCPU hotplug, because when starting a domain, we
create a copy of domain definition (which becomes live XML) and
during the post parse callbacks we might adjust some tunings so
that vCPU hotplug is possible.
This reverts commit 581b7756af.
When creating a copy of virDomainDef we save ourselves the
trouble of writing deep-copy functions and just format and parse
back domain/device XML. However, the XML we are parsing was
already fully formatted - there is no reason to run post parse
callbacks (which fill in blanks - there are none!).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Just like we did two commits ago, don't try to fetch capabilities
for non-existing binary. Re-use the ones we have for running
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Just like we did two commits ago, don't try to fetch capabilities
for non-existing binary. Re-use the ones we have for running
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We can't rely on def->emulator path. It may be provided by user
as we give them opportunity to provide their own XML for
migration. Therefore the path may point to just whatever binary
(or even to a non-existent file). Moreover, this path is meant
for destination, but the capabilities lookup is done on source.
What we can do is to assume same capabilities for post parse
callbacks as the running domain has. They will be used just to
add some default models/controllers/devices/... anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Just like virDomainDefPostParseCallback has gained new
parseOpaque argument, we need to follow the logic with
virDomainDeviceDefPostParse.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We want to pass the proper opaque pointer instead of NULL to
virDomainDefParse and subsequently virDomainDefParseNode too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Some callers might want to pass yet another pointer to opaque
data to post parse callbacks. The driver generic one is not
enough because two threads executing post parse callback might
want to see different data (e.g. domain object pointer that
domain def belongs to).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The reworked API is now called virCPUUpdate and it should change the
provided CPU definition into a one which can be consumed by the QEMU
command line builder:
- host-passthrough remains unchanged
- host-model is turned into custom CPU with a model and features
copied from host
- custom CPU with minimum match is converted similarly to host-model
- optional features are updated according to host's CPU
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Put it into qemuDomainPrepareShmemChardev() so it can be used later.
Also don't fill in the path unless the server option is enabled.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Use the state information (online, hotpluggable) provided by the monitor
code rather than trying to infer it. This fixes an issue where on
architectures that require hotplug of multiple threads at once the
sub-cores would get updated as offline on daemon restart thus creating
an invalid configuration.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1375783
When migration fails, we need to poke QEMU monitor to check for a reason
of the failure. We did this using query-migrate QMP command, which is
not supposed to return any meaningful result on the destination side.
Thus if the monitor was still functional when we detected the migration
failure, parsing the answer from query-migrate always failed with the
following error message:
"info migration reply was missing return status"
This irrelevant message was then used as the reason for the migration
failure replacing any message we might have had.
Let's use harmless query-status for poking the monitor to make sure we
only get an error if the monitor connection is broken.
https://bugzilla.redhat.com/show_bug.cgi?id=1374613
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When a source image is dropped when missing due to startup policy the
policy needs to be cleared since it was relevant only for the given
storage source. New sources need to update it if needed.
Since the domain lock is not held during preparation of an external XML
config, it is possible that the value can change resulting in unexpected
failures during ABI consistency checking for some save and migrate
operations.
This patch adds a new flag to skip the checking of the cur_balloon value
and then sets the destination value to the source value to ensure
subsequent checks without the skip flag will succeed.
This way it is protected from forges and is keeped up to date too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Add support for using the new approach to hotplug vcpus using device_add
during startup of qemu to allow sparse vcpu topologies.
There are a few limitations imposed by qemu on the supported
configuration:
- vcpu0 needs to be always present and not hotpluggable
- non-hotpluggable cpus need to be ordered at the beginning
- order of the vcpus needs to be unique for every single hotpluggable
entity
Qemu also doesn't really allow to query the information necessary to
start a VM with the vcpus directly on the commandline. Fortunately they
can be hotplugged during startup.
The new hotplug code uses the following approach:
- non-hotpluggable vcpus are counted and put to the -smp option
- qemu is started
- qemu is queried for the necessary information
- the configuration is checked
- the hotpluggable vcpus are hotplugged
- vcpus are started
This patch adds a lot of checking code and enables the support to
specify the individual vcpu element with qemu.
The vcpu order information is extracted only for hotpluggable entities,
while vcpu definitions belonging to the same hotpluggable entity need
to all share the order information.
We also can't overwrite it right away in the vcpu info detection code as
the order is necessary to add the hotpluggable vcpus enabled on boot in
the correct order.
The helper will store the order information in places where we are
certain that it's necessary.
Introduce a new migration cookie flag that will be used for any
configurations that are not compatible with libvirt that would not
support the specific vcpu hotplug approach. This will make sure that old
libvirt does not fail to reproduce the configuration correctly.
Individual vCPU hotplug requires us to track the state of any vCPU. To
allow this add the following XML:
<domain>
...
<vcpu current='2'>3</vcpu>
<vcpus>
<vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
<vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
<vcpu id='1' enabled='no' hotpluggable='yes'/>
</vcpus>
...
The 'enabled' attribute allows to control the state of the vcpu.
'hotpluggable' controls whether given vcpu can be hotplugged and 'order'
allows to specify the order to add the vcpus.
Similarly to devices the guest may allow unplug of the VCPU if libvirt
is down. To avoid problems, refresh the vcpu state on reconnect. Don't
mess with the vcpu state otherwise.
Now that the monitor code gathers all the data we can extract it to
relevant places either in the definition or the private data of a vcpu.
As only thread id is broken for TCG guests we may extract the rest of
the data and just skip assigning of the thread id. In case where qemu
would allow cpu hotplug in TCG mode this will make it work eventually.
For hotplug purposes it's necessary to retrieve data using
query-hotpluggable-cpus while the old query-cpus API report thread IDs
and order of hotplug.
This patch adds code that merges the data using a rather non-trivial
algorithm and fills the data to the qemuMonitorCPUInfo structure for
adding to appropriate place in the domain definition.
As of qemu commit:
commit a32ef3bfc12c8d0588f43f74dcc5280885bbdb30
Author: Thomas Huth <thuth@redhat.com>
Date: Wed Jul 22 15:59:50 2015 +0200
vl: Add another sanity check to smp_parse() function
v2.4.0-952-ga32ef3b
configuration where the maximum CPU count doesn't match the topology is
rejected. Prior to that only configurations where the topology would
contain more cpus than the maximum count would be rejected.
Use QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS as a relevant recent enough
witness to avoid breaking old configs.
Now that the default USB controller model is explicit rather
than implicit for i440fx machines, we have to tweak the
conditions for dropping it in order to keep migration towards
libvirt <= 0.9.4 working.
When the user doesn't specify any model for a USB controller,
we use an architecture-dependent default, but we don't reflect
it in the guest XML.
Pick the default USB controller model when parsing the guest
XML instead of when creating the QEMU command line, so that
our choice is saved back to disk.
In qemu, enabling this feature boils down to adding the following
onto the command line:
-global driver=cfi.pflash01,property=secure,value=on
However, there are some constraints resulting from the
implementation. For instance, System Management Mode (SMM) is
required to be enabled, the machine type must be q35-2.4 or
later, and the guest should be x86_64. While technically it is
possible to have 32 bit guests with secure boot, some non-trivial
CPU flags tuning is required (for instance lm and nx flags must
be prohibited). Given complexity of our CPU driver, this is not
trivial. Therefore I've chosen to forbid 32 bit guests for now.
If there's ever need, we can refine the check later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Call the vcpu thread info validation separately to decrease complexity
of returned values by qemuDomainRefreshVcpuInfo.
This function now returns 0 on success and -1 on error. Certain
failures of qemu to report data are still considered as success. Any
error reported now is fatal.
Validate the presence of the thread id according to state of the vCPU
rather than just checking the vCPU count. Additionally put the new
validation code into a separate function so that the information
retrieval can be split from the validation.
Move QEMU_DRIVE_HOST_PREFIX into the qemu_alias.c to dissuade future
callers from using it. Create qemuAliasDiskDriveSkipPrefix in order
to handle the current consumers that desire to check if an alias has
the drive- prefix and "get beyond it" in order to get the disk alias.
To sync with virDomainControllerModelUSB, we add two models
in qemuControllerModelUSB 'qusb1' and 'qusb2', but those
models are not supported in qemu driver. So add check in
device post parse to report errors if 'qusb1' and 'qusb2'
are specified.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Until now we simply errored out when the translation from pool+volume
failed. However, we should instead check whether that disk is needed or
not since there is an option for that.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1168453
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There is an error reset following the function and check for
startupPolicy before that. Let's reflect those things inside that
function so that future code doesn't have to be that complex.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The panic devices with models s390 and pseries are autogenerated.
For backwards compatibility reasons the devices are to be removed
when migrating.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Consider the following XML snippet:
<memory model=''>
<target>
<size unit='KiB'>523264</size>
<node>0</node>
</target>
</memory>
Whats wrong you ask? The @model attribute. This should result in
an error thrown into users faces during virDomainDefine phase.
Except it doesn't. The XML validation catches this error, but if
users chose to ignore that, they will end up with invalid XML.
Well, they won't be able to start the machine - that's when error
is produced currently. But it would be nice if we could catch the
error like this earlier.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The current LUKS support has a "luks" volume type which has
a "luks" encryption format.
This partially makes sense if you consider the QEMU shorthand
syntax only requires you to specify a format=luks, and it'll
automagically uses "raw" as the next level driver. QEMU will
however let you override the "raw" with any other driver it
supports (vmdk, qcow, rbd, iscsi, etc, etc)
IOW the intention though is that the "luks" encryption format
is applied to all disk formats (whether raw, qcow2, rbd, gluster
or whatever). As such it doesn't make much sense for libvirt
to say the volume type is "luks" - we should be saying that it
is a "raw" file, but with "luks" encryption applied.
IOW, when creating a storage volume we should use this XML
<volume>
<name>demo.raw</name>
<capacity>5368709120</capacity>
<target>
<format type='raw'/>
<encryption format='luks'>
<secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccd2f80d6f'/>
</encryption>
</target>
</volume>
and when configuring a guest disk we should use
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/home/berrange/VirtualMachines/demo.raw'/>
<target dev='sda' bus='scsi'/>
<encryption format='luks'>
<secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccd2f80d6f'/>
</encryption>
</disk>
This commit thus removes the "luks" storage volume type added
in
commit 318ebb36f1
Author: John Ferlan <jferlan@redhat.com>
Date: Tue Jun 21 12:59:54 2016 -0400
util: Add 'luks' to the FileTypeInfo
The storage file probing code is modified so that it can probe
the actual encryption formats explicitly, rather than merely
probing existance of encryption and letting the storage driver
guess the format.
The rest of the code is then adapted to deal with
VIR_STORAGE_FILE_RAW w/ VIR_STORAGE_ENCRYPTION_FORMAT_LUKS
instead of just VIR_STORAGE_FILE_LUKS.
The commit mentioned above was included in libvirt v2.0.0.
So when querying volume XML this will be a change in behaviour
vs the 2.0.0 release - it'll report 'raw' instead of 'luks'
for the volume format, but still report 'luks' for encryption
format. I think this change is OK because the storage driver
did not include any support for creating volumes, nor starting
guets with luks volumes in v2.0.0 - that only since then.
Clearly if we change this we must do it before v2.1.0 though.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Dropping the caching of ccw address set.
The cached set is not required anymore, because the set is now being
recalculated from the domain definition on demand, so the cache
can be deleted.
Dropping the caching of virtio serial address set.
The cached set is not required anymore, because the set is now being
recalculated from the domain definition on demand, so the cache
can be deleted.
Credit goes to Cole Robinson.
Resolves a CI test integration failure with a RHEL6/Centos6 environment.
In order to use a LUKS encrypted device, the design decision was to
generate an encrypted secret based on the master key. However, commit
id 'da86c6c' missed checking for that specifically.
When qemuDomainSecretSetup was implemented, a design decision was made
to "fall back" to a plain text secret setup if the specific cipher was
not available (e.g. virCryptoHaveCipher(VIR_CRYPTO_CIPHER_AES256CBC))
as well as the QEMU_CAPS_OBJECT_SECRET. For the luks encryption setup
there is no fall back to the plaintext secret, thus if that gets set
up by qemuDomainSecretSetup, then we need to fail.
Also, while the qemuxml2argvtest has set the QEMU_CAPS_OBJECT_SECRET
bit, it didn't take into account the second requirement that the
ability to generate the encrypted secret is possible. So modify the
test to not attempt to run the luks-disk if we know we don't have
the encryption algorithm.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1301021
Generate the luks command line using the AES secret key to encrypt the
luks secret. A luks secret object will be in addition to a an AES secret.
For hotplug, check if the encinfo exists and if so, add the AES secret
for the passphrase for the secret object used to decrypt the device.
Modify/augment the fakeSecret* in qemuxml2argvtest in order to handle
find a uuid or a volume usage with a specific path prefix in the XML
(corresponds to the already generated XML tests). Add error message
when the 'usageID' is not 'mycluster_myname'. Commit id '1d632c39'
altered the error message generation to rely on the errors from the
secret_driver (or it's faked replacement).
Add the .args output for adding the LUKS disk to the domain
Signed-off-by: John Ferlan <jferlan@redhat.com>
Soon we will be adding luks encryption support. Since a volume could require
both a luks secret and a secret to give to the server to use of the device,
alter the alias generation to create a slightly different alias so that
we don't have two objects with the same alias.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Migration to an older libvirt (pre v1.3.0-175-g7140807) is broken
because older versions of libvirt generated different channel paths and
they didn't drop the default paths when parsing domain XMLs. We'd get
such a nice error message:
internal error: process exited while connecting to monitor:
2016-07-08T15:28:02.665706Z qemu-kvm: -chardev socket,
id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/
domain-3-nest/org.qemu.guest_agent.0,server,nowait: Failed to bind
socket to /var/lib/libvirt/qemu/channel/target/domain-3-nest/
org.qemu.guest_agent.0: No such file or directory
That said, we should not even format the default paths when generating a
migratable XML.
https://bugzilla.redhat.com/show_bug.cgi?id=1320470
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Playing directly with our live definition, updating it, and reverting it
back once we are done is very nice and it's quite dangerous too. Let's
just make a copy of the domain definition if needed and do all tricks on
the copy.
https://bugzilla.redhat.com/show_bug.cgi?id=1320470
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This one's a bit more complicated. In qemuProcessPrepareDomain()
a master key for encrypting secret for ciphered disks is created.
This object lives within qemuDomainObjPrivate object. It is freed
in qemuProcessStop(), but if nobody calls it (for instance like
our qemuxml2argvtest does), the key object leaks.
==17078== 32 bytes in 1 blocks are definitely lost in loss record 633 of 707
==17078== at 0x4C2C070: calloc (vg_replace_malloc.c:623)
==17078== by 0xAD924DF: virAllocN (viralloc.c:191)
==17078== by 0x5050BA6: virCryptoGenerateRandom (qemuxml2argvmock.c:166)
==17078== by 0x453DC8: qemuDomainMasterKeyCreate (qemu_domain.c:678)
==17078== by 0x47A36B: qemuProcessPrepareDomain (qemu_process.c:4913)
==17078== by 0x47C728: qemuProcessCreatePretendCmd (qemu_process.c:5542)
==17078== by 0x433698: testCompareXMLToArgvFiles (qemuxml2argvtest.c:332)
==17078== by 0x4339AC: testCompareXMLToArgvHelper (qemuxml2argvtest.c:413)
==17078== by 0x446E7A: virTestRun (testutils.c:179)
==17078== by 0x445BD9: mymain (qemuxml2argvtest.c:2022)
==17078== by 0x44886F: virTestMain (testutils.c:969)
==17078== by 0x445D9B: main (qemuxml2argvtest.c:2036)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code in qemuDomainObjPrivateXMLParseVcpu for parsing
the 'idstr' string was comparing the overall boolean
result against 0 which was always true
qemu/qemu_domain.c: In function 'qemuDomainObjPrivateXMLParseVcpu':
qemu/qemu_domain.c:1482:59: error: comparison of constant '0' with boolean expression is always false [-Werror=bool-compare]
if ((idstr && virStrToLong_uip(idstr, NULL, 10, &idx)) < 0 ||
^
It was further performing two distinct error checks in
the same conditional and reporting a single error message,
which was misleading in one of the two cases.
This splits the conditional check into two parts with
distinct error messages and fixes the logic error.
Fixes the bug in
commit 5184f398b4
Author: Peter Krempa <pkrempa@redhat.com>
Date: Fri Jul 1 14:56:14 2016 +0200
qemu: Store vCPU thread ids in vcpu private data objects
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Otherwise migration during which we didn't send client_migrate_info QMP
command will get stuck waiting for SPICE migration to finish if libvirtd
sent the QMP command in a previous migration attempt.
Broken by bd7c8a69.
https://bugzilla.redhat.com/show_bug.cgi?id=1151723
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Due to the way the hardware works, KVM on ppc64 always requires
memory locking; however, that is not the case for non-KVM ppc64
guests, eg. ppc64 guests that are running on x86_64 with TCG.
Only require memory locking for ppc64 guests if they are using
KVM or, as it's the case for all architectures, they have host
devices assigned using VFIO.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1350772
Introduce a helper to help determine if a disk src could be possibly used
for a disk secret... Going to need this for hot unplug.
Signed-off-by: John Ferlan <jferlan@redhat.com>
libvirt's qemu driver doesn't have direct access to the config on the
guest side of a network interface, and currently doesn't have any
method in place to even inform the guest of the desired config. In the
future, an unenforceable attempt to set the guest-side IP info could
be made by adding a static host entry to the appropriate dnsmasq
configuration (or changing the default dhcp client address on the qemu
commandline for type='user' interfaces), or enhancing the guest agent
to allow setting an IP address, but for now it can't have any effect,
and we don't want to give the illusion that it does.
To prevent the "disappearance" of any existing configs with ip
address/route info (due to parser failure), this check is added in the
newly implemented qemuDomainDeviceDefValidate(), which is only called
when a domain is defined or started, *not* when it is reread from disk
at libvirtd startup.
Rather than pass authdef, pass the 'authdef->username' and the
'&authdef->secdef'
Note that a username may be NULL.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than assume/pass the protocol to the qemuDomainSecretPlainSetup
and qemuDomainSecretAESSetup, set and pass the secretUsageType based
on the src->protocol type. This will eventually be used by the
virSecretGetSecretString call
Signed-off-by: John Ferlan <jferlan@redhat.com>
Move the enum into a new src/util/virsecret.h, rename it to be
virSecretLookupType. Add a src/util/virsecret.h in order to perform
a couple of simple operations on the secret XML and virSecretLookupTypeDef
for clearing and copying.
This includes quite a bit of collateral damage, but the goal is to remove
the "virStorage*" and replace with the virSecretLookupType so that it's
easier to to add new lookups that aren't necessarily storage pool related.
Signed-off-by: John Ferlan <jferlan@redhat.com>
There has been some progress lately in enabling virtio-pci on
aarch64 guests; however, guest OS support is still spotty at best,
so most guests are going to be using virtio-mmio instead.
Currently, mach-virt guests are closely modeled after q35 guests,
and that includes always adding a dmi-to-pci-bridge that's just
impossible to get rid of. While that's acceptable (if suboptimal)
for q35, where you will always need some kind of PCI device anyway,
mach-virt guests should be allowed to avoid it.
While we need to know the difference between the total memory stored in
<memory> and the actual size not included in the possible memory modules
we can't pre-calculate it reliably. This is due to the fact that
libvirt's XML is copied via formatting and parsing the XML and the
initial memory size can be reliably calculated only when certain
conditions are met due to backwards compatibility.
This patch removes the storage of 'initial_memory' and fixes the helpers
to recalculate the initial memory size all the time from the total
memory size. This conversion is possible when we also make sure that
memory hotplug accounts properly for the update of the total memory size
and thus the helpers for inserting and removing memory devices need to
be tweaked too.
This fixes a bug where a cold-plug and cold-remove of a memory device
would increase the size reported in <memory> in the XML by the size of
the memory device. This would happen as the persistent definition is
copied before attaching the device and this would lead to the loss of
data in 'initial_memory'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1344892
Until now, a Q35 domain (or arm/virt, or any other domain that has a
pcie-root bus) would always have a pci-bridge added, so that there
would be a hotpluggable standard PCI slot available to plug in any PCI
devices that might be added. This patch removes the explicit add,
instead relying on the pci-bridge being auto-added during PCI address
assignment (it will add a pci-bridge if there are no free slots).
This doesn't eliminate the dmi-to-pci-bridge controller that is
explicitly added whether or not a standard PCI slot is required (and
that is almost never used as anything other than a converter between
pcie.0's PCIe slots and standard PCI). That will be done separately.
Previously there was no way to have a Q35 domain that didn't have
these two controllers. This patch skips their creation as long as
there are some other kinds of pci controllers at index 1 and 2
(e.g. some pcie-root-port controllers).
I'm hoping that soon we won't add them at all, plugging all devices
into auto-added pcie-*-port ports instead, but in the meantime this
makes it easier to experiment with alternative bus hierarchies.
VNC graphics already supports sockets but only via 'socket' attribute.
This patch coverts that attribute into listen type 'socket'.
For backward compatibility we need to handle listen type 'socket' and 'socket'
attribute properly to support old XMLs and new XMLs. If both are provided they
have to match, if only one of them is provided we need to be able to parse that
configuration too.
To not break migration back to old libvirt if the socket is provided by user we
need to generate migratable XML without the listen element and use only 'socket'
attribute.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Even though it's auto-generated it's based on qemu.conf option and listen type
address already uses "fromConfig" to carry this information. Following commits
will convert the socket to listen element so this rename is required because
there will be also an option to get socket auto-generated independently on the
qemu.conf option.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Put it into separate function called qemuDomainPrepareChannel() and call
it from the new qemuProcessPrepareDomain().
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Along with the virtlogd addition of the log file appending API implement
a helper for logging one-shot entries to the log file including the
fallback approach of using direct file access.
This will be used for noting the shutdown of the qemu proces and
possibly other actions such as VM migration and other critical VM
lifecycle events.
Until now we weren't able to add checks that would reject configuration
once accepted by the parser. This patch adds a new callback and
infrastructure to add such checks. In this patch all the places where
rejecting a now-invalid configuration wouldn't be a good idea are marked
with a new parser flag.
https://bugzilla.redhat.com/show_bug.cgi?id=1182074
If they're available and we need to pass secrets to qemu, then use the
qemu domain secret object in order to pass the secrets for RBD volumes
instead of passing the base64 encoded secret on the command line.
The goal is to make AES secrets the default and have no user interaction
required in order to allow using the AES mechanism. If the mechanism
is not available, then fall back to the current plain mechanism using
a base64 encoded secret.
New APIs:
qemu_domain.c:
qemuDomainGetSecretAESAlias:
Generate/return the secret object alias for an AES Secret Info type.
This will be called from qemuDomainSecretAESSetup.
qemuDomainSecretAESSetup: (private)
This API handles the details of the generation of the AES secret
and saves the pieces that need to be passed to qemu in order for
the secret to be decrypted. The encrypted secret based upon the
domain master key, an initialization vector (16 byte random value),
and the stored secret. Finally, the requirement from qemu is the IV
and encrypted secret are to be base64 encoded.
qemu_command.c:
qemuBuildSecretInfoProps: (private)
Generate/return a JSON properties object for the AES secret to
be used by both the command building and eventually the hotplug
code in order to add the secret object. Code was designed so that
in the future perhaps hotplug could use it if it made sense.
qemuBuildObjectSecretCommandLine (private)
Generate and add to the command line the -object secret for the
secret. This will be required for the subsequent RBD reference
to the object.
qemuBuildDiskSecinfoCommandLine (private)
Handle adding the AES secret object.
Adjustments:
qemu_domain.c:
The qemuDomainSecretSetup was altered to call either the AES or Plain
Setup functions based upon whether AES secrets are possible (we have
the encryption API) or not, we have secrets, and of course if the
protocol source is RBD.
qemu_command.c:
Adjust the qemuBuildRBDSecinfoURI API's in order to generate the
specific command options for an AES secret, such as:
-object secret,id=$alias,keyid=$masterKey,data=$base64encodedencrypted,
format=base64
-drive file=rbd:pool/image:id=myname:auth_supported=cephx\;none:\
mon_host=mon1.example.org\:6321,password-secret=$alias,...
where the 'id=' value is the secret object alias generated by
concatenating the disk alias and "-aesKey0". The 'keyid= $masterKey'
is the master key shared with qemu, and the -drive syntax will
reference that alias as the 'password-secret'. For the -drive
syntax, the 'id=myname' is kept to define the username, while the
'key=$base64 encoded secret' is removed.
While according to the syntax described for qemu commit '60390a21'
or as seen in the email archive:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04083.html
it is possible to pass a plaintext password via a file, the qemu
commit 'ac1d8878' describes the more feature rich 'keyid=' option
based upon the shared masterKey.
Add tests for checking/comparing output.
NB: For hotplug, since the hotplug code doesn't add command line
arguments, passing the encoded secret directly to the monitor
will suffice.
Move the logic from qemuDomainGenerateRandomKey into this new
function, altering the comments, variable names, and error messages
to keep things more generic.
NB: Although perhaps more reasonable to add soemthing to virrandom.c.
The virrandom.c was included in the setuid_rpc_client, so I chose
placement in vircrypto.
This wires up qemuDomainAssignAddresses into the new
virDomainDefAssignAddressesCallback, so it's always triggered
via virDomainDefPostParse. We are essentially doing this already
with open coded calls sprinkled about.
qemu argv parse output changes slightly since previously it wasn't
hitting qemuDomainAssignAddresses.
When the <gic/> element in not present in the domain XML, use the
domain capabilities to figure out what GIC version is usable and
choose that one automatically.
This allows guests to be created on hardware that only supports
GIC v3 without having to update virt-manager and similar tools.
Keep using the default GIC version if the <gic/> element has been
added to the domain XML but no version has been specified, as not
to break existing guests.
Rather than returning a "char *" indicating perhaps some sized set of
characters that is NUL terminated, alter the function to return 0 or -1
for success/failure and add two parameters to handle returning the
buffer and it's size.
The function no longer encodes the returned secret, rather it returns
the unencoded secret forcing callers to make the necessary adjustments.
Alter the callers to handle the adjusted model.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Call the internal driver callbacks rather than the public APIs to avoid
calling unnecessarily the error dispatching code and don't overwrite
the error messages provided by the APIs. They are good enough to
describe which secret is missing either by UUID or the usage (basically
name).
Further followup discussions in list on commit 192a53e concluded
that we should be leaving out the USB controller only for
i440fx machines as default USB can be used by someone on q35
at random slots.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
libvirt may automatically add a pci-root or pcie-root controller to a
domain, depending on the arch/machinetype, and it hopefully always
makes the right decision about which to add (since in all cases these
controllers are an implicit part of the virtual machine).
But it's always possible that someone will create a config that
explicitly supplies the wrong type of PCI controller for the selected
machinetype. In the past that would lead to an error later when
libvirt was trying to assign addresses to other devices, for example:
XML error: PCI bus is not compatible with the device at
0000:00:02.0. Device requires a PCI Express slot, which is not
provided by bus 0000:00
(that's the error message that appears if you replace the pcie-root
controller in a Q35 domain with a pci-root controller).
This patch adds a check at the same place that the implicit
controllers are added (to ensure that the same logic is used to check
which type of pci root is correct). If a pci controller with index='0'
is already present, we verify that it is of the model that we would
have otherwise added automatically; if not, an error is logged:
The PCI controller with index='0' must be " model='pcie-root' for
this machine type, " but model='pci-root' was found instead.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1004602
Remove the possibility that a NULL hostdev->privateData or a
disk->privateData could crash libvirtd by checking for NULL
before dereferencing for the secinfo structure in the
qemuDomainSecret{Disk|Hostdev}Destroy functions. The hostdevPriv
could be NULL if qemuProcessNetworkPrepareDevices adds a new
hostdev during virDomainNetGetActualHostdev that then gets
inserted via virDomainHostdevInsert. The hostdevPriv was added
by commit id '27726d8' and is currently only used by scsi hostdev.
qemuDomainCheckDiskPresence has short-circuit code to skip the
determination of the disk backing chain for storage formats that can't
have backing volumes. The code treats VIR_STORAGE_FILE_NONE as not
having backing chain and skips the call to qemuDomainDetermineDiskChain.
This is wrong as qemuDomainDetermineDiskChain is responsible for storage
format detection and has logic to determine the default type if format
detection is disabled.
This allows to storage passed via <disk type="volume"> to circumvent the
enforcement to have correct storage format or that we shall default to
format='raw', since we don't set the default type via the post parse
callback for "volume" backed disks as the translation code could come up
with a better guess.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1328003