With QEMU older than 2.9.0 libvirt uses the CPUID instruction to
determine what CPU features are supported on the host. This was later
used when checking compatibility of guest CPUs. Since QEMU 2.9.0 we ask
QEMU for the host CPU data. But the two methods usually provide
different sets of CPU features because QEMU/KVM does not support all
features provided by the host CPU and, on the other hand, it can enable
some features even if the host CPU does not support them.
So if there is a domain which requires a CPU feature disabled by
QEMU/KVM, libvirt will refuse to start it with QEMU >= 2.9.0 as its
guest CPU is incompatible with the host CPU data we got from QEMU. But
such a domain would happily start on older QEMU (of course, the
features would be missing in the guest CPU). To fix this regression, we
need to combine both CPU feature sets when checking guest CPU
compatibility.
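The combination is essentially a union of the two feature sets. A
minimal sketch of the check, built on virCPUDataCheckFeature (the
helper name and plumbing here are illustrative, not the actual code):

  /* sketch: a feature is usable if either probing method reports it */
  static bool
  hostSupportsFeature(virCPUDataPtr dataFromCPUID,
                      virCPUDataPtr dataFromQEMU,
                      const char *feature)
  {
      return virCPUDataCheckFeature(dataFromCPUID, feature) == 1 ||
             virCPUDataCheckFeature(dataFromQEMU, feature) == 1;
  }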
https://bugzilla.redhat.com/show_bug.cgi?id=1439933
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We already know from QEMU which CPU features will block migration. Let's
use this information to make a migratable copy of the host CPU model and
use it for updating the guest CPU specification. This will allow us to
drop feature filtering from virCPUUpdate, where it was just a hack.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Soon we will need to store multiple host CPU definitions in
virQEMUCapsHostCPUData and qemuCaps users will want to request the one
they need. This patch introduces virQEMUCapsHostCPUType enum which will
be used for specifying the requested CPU definition.
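For illustration, the enum will have roughly this shape (sketch only;
the authoritative definition lives in qemu_capabilities.h):

  typedef enum {
      VIR_QEMU_CAPS_HOST_CPU_REPORTED,   /* host CPU as reported by QEMU */
      VIR_QEMU_CAPS_HOST_CPU_MIGRATABLE, /* only migratable features */
      VIR_QEMU_CAPS_HOST_CPU_FULL,       /* reported model enriched with
                                            CPUID data */
  } virQEMUCapsHostCPUType;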
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This attribute is not needed here, since @mon is in use.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
This way qemuDomainLogContextRef() and qemuDomainLogContextFree() are
no longer needed. The name qemuDomainLogContextFree() was also
somewhat misleading. Additionally, it's easier to turn
qemuDomainLogContext into a self-locking object.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
This new API is supposed to reset all migration parameters to make sure
future migrations won't accidentally use them. This patch makes the
first step and moves qemuMigrationResetTLS call inside
qemuMigrationReset.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuProcessRecoverMigrationOut doesn't explicitly call
qemuMigrationResetTLS, relying on two things:
- qemuMigrationCancel resets TLS parameters
- our migration code resets TLS before entering
QEMU_MIGRATION_PHASE_PERFORM3_DONE phase
But this is not obvious and the assumptions will be broken soon. Let's
explicitly reset TLS parameters on all paths which do not kill the
domain.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
There is no async job running when a freshly started libvirtd is trying
to recover from an interrupted incoming migration. While at it, let's
call qemuMigrationResetTLS every time we don't kill the domain. This is
not strictly necessary since TLS is not supported when the v2 migration
protocol is used, but doing so makes more sense.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuProcessVerifyHypervFeatures is supposed to check whether all
requested hyperv features were actually honored by QEMU/KVM. This is
done by checking the corresponding CPUID bits reported by the virtual
CPU. In other words, it doesn't work for string properties, such as
VIR_DOMAIN_HYPERV_VENDOR_ID (there is no CPUID bit we could check). We
could theoretically check all 96 bits corresponding to the vendor
string, but luckily we don't have to check the feature at all. If QEMU
is too old to support hyperv features, the domain won't even start.
Otherwise, it is always supported.
Without this patch, libvirt refuses to start a domain which contains

  <features>
    <hyperv>
      <vendor_id state='on' value='...'/>
    </hyperv>
  </features>
reporting an internal error: "unknown CPU feature __kvm_hv_vendor_id".
This regression was introduced by commit v3.1.0-186-ge9dbe7011, which
(by fixing the virCPUDataCheckFeature condition in
qemuProcessVerifyHypervFeatures) revealed an old bug in the feature
verification code. It's been there ever since the verification was
implemented by commit v1.3.3-rc1-5-g95bbe4bf5, which effectively did not
check VIR_DOMAIN_HYPERV_VENDOR_ID at all.
https://bugzilla.redhat.com/show_bug.cgi?id=1439424
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Currently, if we want to zero out a disk source (e.g. due to
startupPolicy when starting up a domain) we use
virDomainDiskSetSource(disk, NULL). This works well for file
based storage (storage type file, dir, or block), but it doesn't
work at all for other types like volume and network.
So imagine that you have a domain with a CDROM whose source is a
volume from an inactive pool. Because of startupPolicy='optional',
the CDROM is empty when the domain starts. However, the source
element is not cleared out in the status XML, so when the daemon
restarts and tries to reconnect to the domain it refreshes the
disks, which fails (the storage pool is still not running), and
the domain is killed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For certain kinds of panic notifiers (notably hyper-v) qemu is able to
report some data regarding the crash passed from the guest.
Make the data accessible to the callback in qemu so that it can be
processed further.
Detect node names when setting the block threshold, when reconnecting,
and when they are cleared after a block job finishes. This operation
will become a no-op once we fully support node names.
If the migration flags indicate this migration will be using TLS, then
set up the destination during the prepare phase: once the target domain
has been started, add the TLS objects needed to perform the migration.
This will create at least an "-object tls-creds-x509,endpoint=server,..."
for TLS credentials and potentially an "-object secret,..." to handle the
passphrase response to access the TLS credentials. The alias/id used for
the TLS objects will contain "libvirt_migrate".
Once the objects are created, the code will set the "tls-creds" and
"tls-hostname" migration parameters to signify usage of TLS.
During the Finish phase we'll be sure to attempt to clear the
migration parameters and delete those objects (whether or not they
were created). We'll also perform the same reset during recovery
if we've reached FINISH3.
If the migration isn't using TLS, then be sure to check if the
migration parameters exist and clear them if so.
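For illustration, the destination ends up with something like the
following (the ids and certificate directory are made up here; only the
"libvirt_migrate" substring in the aliases is guaranteed):

  -object tls-creds-x509,id=libvirt_migrate_tls0,endpoint=server,dir=/etc/pki/qemu
  -object secret,id=libvirt_migrate_secret0,...

followed by pointing the "tls-creds" migration parameter at that object
id and setting "tls-hostname" as needed.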
Calling virCPUUpdateLive on a domain with no guest CPU configuration
does not make sense. Especially when doing so would crash libvirtd.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When starting a domain with a custom guest CPU specification QEMU may add
or remove some CPU features. There are several reasons for this, e.g.,
QEMU/KVM does not support some requested features or the definition of
the requested CPU model in libvirt's cpu_map.xml differs from the one
QEMU is using. We can't really avoid this because CPU models are allowed
to change with machine types and libvirt doesn't know (and probably
doesn't even want to know) about such changes.
Thus when we want to make sure guest ABI doesn't change when a domain
gets migrated to another host, we need to update our live CPU definition
according to the CPU QEMU created. Once updated, we will change CPU
checking to VIR_CPU_CHECK_FULL to make sure the virtual CPU created
after migration exactly matches the one on the source.
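In rough outline the update looks like this (sketch; the enabled and
disabled feature data come from the monitor, and error handling is
omitted):

  /* fold what QEMU actually enabled or filtered out back into the
   * live guest CPU definition, then pin the checking mode */
  if (virCPUUpdateLive(vm->def->os.arch, vm->def->cpu,
                       cpuEnabled, cpuDisabled) < 0)
      goto error;
  vm->def->cpu->check = VIR_CPU_CHECK_FULL;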
https://bugzilla.redhat.com/show_bug.cgi?id=822148
https://bugzilla.redhat.com/show_bug.cgi?id=824989
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuMonitorGetGuestCPU can now optionally create CPU data from
filtered-features in addition to feature-words.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The checks are now in a dedicated qemuProcessVerifyHypervFeatures
function.
In addition to moving the code this patch also fixes a few bugs: the
original code was leaking cpuFeature and the return value of
virCPUDataCheckFeature was not checked properly.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
There were a couple of reports on the list (e.g. [1]) that guests
with huge amounts of RAM are unable to start because libvirt
kills qemu in the initialization phase. The problem is that if the
guest is configured to use hugepages, the kernel has to zero them
all out before handing them over to the qemu process. For
instance, 402GiB worth of 1GiB pages took around 105 seconds
(~3.8GiB/s). Since we do not want to make the timeout for
connecting to the monitor configurable, we have to teach libvirt
to account for this fact. This commit implements the "1s per each
1GiB of RAM" approach as suggested here [2].
1: https://www.redhat.com/archives/libvir-list/2017-March/msg00373.html
2: https://www.redhat.com/archives/libvir-list/2017-March/msg00405.html
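The approach boils down to something like this (the function name and
base value are illustrative, not the actual libvirt code):

  /* allow one extra second of monitor-connect timeout per GiB of
   * guest RAM on top of the usual base timeout */
  static unsigned long long
  monitorConnectTimeout(unsigned long long memTotalKiB)
  {
      return 30 + memTotalKiB / (1024 * 1024); /* base + 1s per GiB */
  }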
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1430634
If a qemu process has died, we get EOF on its monitor. At this
point, since the qemu process was the only one running in the
namespace, the kernel has already cleaned the namespace up. Any
attempt of ours to enter it has to fail.
This really happened in the bug linked above. We tried to
attach a disk to qemu and while we were in the monitor talking to
qemu it just died. Therefore our code tried to roll back
(e.g. deny the device in cgroups again, restore labels, etc.).
However, during the rollback (esp. when restoring labels) we
still thought that the domain had a namespace, so we used
secdriver's transactions. This failed as there is no namespace to
enter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have some qemuSecurity wrappers over the
virSecurityManager APIs, let's make sure everybody sticks with
them. We have them for a reason, and calling a virSecurityManager
API directly instead of the wrapper may lead to accidentally
labelling a file in the host instead of the namespace.
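In other words, the rule is roughly this (sketch; argument lists are
reduced to the essentials):

  /* wrong: may label the file in the host even though the domain
   * has its own mount namespace */
  virSecurityManagerSetAllLabel(driver->securityManager, vm->def, stdin_path);

  /* right: the wrapper enters the domain's namespace when there is one */
  qemuSecuritySetAllLabel(driver, vm, stdin_path);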
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Querying "host" CPU model expansion only makes sense for KVM. QEMU 2.9.0
introduces a new "max" CPU model which can be used to ask QEMU what the
best CPU it can provide to a TCG domain is.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
After eca76884ea, in case of an error in qemuDomainSetPrivatePaths()
during a pretended start we jump to 'stop'. I changed this during
review from 'cleanup', but the original label turned out to be
correct. Well, sort of. We can't call qemuProcessStop() as it
decrements driver->nactive and we did not increment it. However,
it calls virDomainObjRemoveTransientDef(), which is basically the
only function we need to call. So call that function and goto
cleanup;
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The new API is called virCPUDataFree. Individual CPU drivers are no
longer required to implement their own freeing function unless they need
to free architecture specific data from virCPUData.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The port is stored in the graphics configuration and it will
also get released in qemuProcessStop in case of error.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1397440
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Fix incorrect jump labels in error paths as the stop jump is only
needed if the driver has already changed the state. For example
'virAtomicIntInc(&driver->nactive)' will be 'reverted' in the
qemuProcessStop call.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The functions called in virCommand() after fork() must be careful with
regard to accessing any mutexes that may have been locked by other
threads in the parent process. It is possible that another thread in
the parent process holds the lock for the virQEMUDriver while fork()
is called. This leads to a deadlock in the child process when
'virQEMUDriverGetConfig(driver)' is called and therefore the handshake
never completes between the child and the parent process. Ultimately
the virDomainObjPtr will never be unlocked.
It gets much worse if the other thread of the parent process that
holds the lock for the virQEMUDriver tries to lock the already locked
virDomainObj. This leads to a completely unresponsive libvirtd.
It's possible to reproduce this case by calling 'virsh start XXX'
and 'virsh managedsave XXX' in a tight loop for multiple domains.
This commit fixes the deadlock in the same way as it is described in
commit 61b52d2e38.
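The pattern, in short (sketch; the virCommand hook machinery is left
out):

  /* parent, before fork(): grab everything the child will need;
   * mutexes held by other threads at fork() time stay locked
   * forever in the child */
  virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);

  /* child, after fork(): use only the pre-fetched 'cfg' and never
   * call back into driver-wide state that needs locking */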
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
With that, users could access files outside /dev/shm. That itself
isn't a security problem, but it might cause some errors we want
to avoid. So let's forbid slashes, as we do with domain and volume
names, and also mention that in the schema.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1395496
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The situation covered by the removed code will never happen.
This code is called only while starting a new QEMU process, where
the capabilities were already checked, and while attaching to an
existing QEMU process, where we don't even detect the iothreads.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Rename to avoid duplicate code, because virDomainMemoryAccess will be
used in memorybacking for setting the default behaviour.
NOTE: The enum cannot be moved to qemu/domain_conf because of header
dependencies.
https://bugzilla.redhat.com/show_bug.cgi?id=1413922
While all the code that deals with qemu namespaces correctly
detects whether we are running as root (and turns into a NO-OP for
qemu:///session), the actual unshare() call is not guarded by such
a check. Therefore any attempt to start a domain under
qemu:///session fails, as unshare() is reserved for root.
The fix consists of moving the unshare() call (for which we have a
wrapper called virProcessSetupPrivateMountNS) into
qemuDomainBuildNamespace(), where the proper check is performed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
If we restart libvirtd while a VM was doing an external memory
snapshot, the VM's state will be updated to paused as a result of the
running migration-to-file operation, and the VM will then be left in
the paused state. In this case we must restart the VM's CPUs to resume
it.
Signed-off-by: Wang King <king.wang@huawei.com>
After qemu delivers the resume event it's already running, and thus
it's too late to enter lockspaces since it may already have modified
the disk. The code only creates false log entries in the case when
locking is enabled. The lockspace needs to be acquired prior to
starting the CPUs.
Instead of trying to fix our security drivers, we can use a
simple trick to relabel paths in both the namespace and the host.
I mean, if we enter the namespace, some paths are still shared
with the host, so any change done to them is visible from the
host too.
Therefore, we can just enter the namespace and call
SetAllLabel()/RestoreAllLabel() from there. Yes, it has a slight
overhead because we have to fork in order to enter the namespace.
But on the other hand, no complexity is added to our code.
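A sketch of that approach (the callback body and opaque payload are
hypothetical; virProcessRunInMountNamespace does the fork-and-setns
dance for us):

  /* runs in a forked child that has entered the domain's mount
   * namespace; changes to shared paths are visible in the host too */
  static int
  relabelInNamespace(pid_t pid, void *opaque)
  {
      return doRelabel(opaque); /* hypothetical SetAllLabel/RestoreAllLabel call */
  }

  if (virProcessRunInMountNamespace(vm->pid, relabelInNamespace, data) < 0)
      goto cleanup;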
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Prime time. When it comes to spawning the qemu process and
relabelling all the devices it's going to touch, there's an
inherent race with other applications in the system (e.g. udev).
Instead of trying to convince udev not to touch libvirt-managed
devices, we can create a separate mount namespace for qemu and
mount our own /dev there. Of course this puts more work onto us
as we have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, it also enhances security.
From a technical POV, on domain startup the parent (libvirtd)
creates:
/var/lib/libvirt/qemu/$domain.dev
/var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare()
to create a new mount namespace. From now on anything the child
does is invisible to the parent. The child then mounts tmpfs on
$domain.dev (so that it still sees the original /dev from the
host) and creates some devices (as explained in one of the
previous patches). The devices have to be created exactly as they
are in the host (including perms, seclabels, ACLs, ...). After
that it moves the $domain.dev mount to /dev.
What's the $domain.devpts mount there for then, you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML, allowing users to connect to them.
Therefore we must preserve the devpts mount to be shared with the
host's one.
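Stripped down to the syscalls, the child's sequence is roughly this
(error handling omitted; 'dev' stands for
/var/lib/libvirt/qemu/$domain.dev):

  #define _GNU_SOURCE
  #include <sched.h>     /* unshare */
  #include <sys/mount.h> /* mount, MS_* */

  static void
  buildNamespaceSketch(const char *dev)
  {
      unshare(CLONE_NEWNS);                            /* private mount ns */
      mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL); /* no propagation back */
      mount("devfs", dev, "tmpfs", MS_NOSUID, "mode=755");
      /* ... create device nodes here, copying perms, seclabels and
       * ACLs from the host ... */
      mount(dev, "/dev", NULL, MS_MOVE, NULL);         /* swap it in */
  }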
To make this patch as small as possible, creating the devices
configured for the domain in question is implemented in the next
patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If the cpuset cgroup controller is disabled in /etc/libvirt/qemu.conf
QEMU virtual machines can in principle use all host CPUs, even if they
are hot plugged, if they have no explicit CPU affinity defined.
However, there's libvirt code supposed to handle the situation where
the libvirt daemon itself is not using all host CPUs. The code in
qemuProcessInitCpuAffinity attempts to set an affinity mask including
all defined host CPUs. Unfortunately, the resulting affinity mask for
the process will not contain the offline CPUs. See also the
sched_setaffinity(2) man page.
That means that even if the host CPUs come online again, they won't
be used by the QEMU process anymore. The same is true for newly
hot-plugged CPUs. So we are effectively preventing QEMU from using
all processors instead of enabling it to use them.
It only makes sense to set the QEMU process affinity if we're able
to actually grow the set of usable CPUs, i.e. if the process affinity
is a subset of the online host CPUs.
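The subset test itself is simple (plain libc sketch, not the actual
libvirt bitmap helpers):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdbool.h>

  /* widening the affinity only helps if the current mask is missing
   * some CPU that is online in the host */
  static bool
  affinityIsSubsetOfOnline(cpu_set_t *current, cpu_set_t *online)
  {
      for (int i = 0; i < CPU_SETSIZE; i++) {
          if (CPU_ISSET(i, current) && !CPU_ISSET(i, online))
              return false;
      }
      return true;
  }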
There's still the chance that for some reason the deliberately chosen
libvirtd affinity matches the online host CPU mask by accident. In this
case the behavior remains as it was before (CPUs offline while setting
the affinity will not be used if they show up later on).
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>