https://bugzilla.redhat.com/show_bug.cgi?id=1503284
The way we currently start qemu from CPU affinity POV is as
follows:
1) the child process is set affinity to all online CPUs (unless
some vcpu pinning was given in the domain XML)
2) Once qemu is running, cpuset cgroup is configured taking
memory pinning into account
Problem is that we let qemu allocate its memory just anywhere in
1) and then rely in 2) to be able to move the memory to
configured NUMA nodes. This might not be always possible (e.g.
qemu might lock some parts of its memory) and is very suboptimal
(copying large memory between NUMA nodes takes significant amount
of time).
The solution is to set affinity to one of (in priority order):
- The CPUs associated with NUMA memory affinity mask
- The CPUs associated with emulator pinning
- All online host CPUs
Later (once QEMU has allocated its memory) we then change this
again to (again in priority order):
- The CPUs associated with emulator pinning
- The CPUs returned by numad
- The CPUs associated with vCPU pinning
- All online host CPUs
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This is mainly about /dev/sev and its default permissions 0600. Of
course, rule of 'tinfoil' would be that we can't trust anything, but the
probing code in QEMU is considered safe from security's perspective + we
can't create an udev rule for this at the moment, because ioctls and
file system permissions aren't cross-checked in kernel and therefore a
user with read permissions could issue a 'privileged' operation on SEV
which is currently only limited to root.
https://bugzilla.redhat.com/show_bug.cgi?id=1665400
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Instead of exposing /dev/sev to every domain, do it selectively.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
SEV has a limit on number of concurrent guests. From security POV we
should only expose resources (any resources for that matter) to domains
that truly need them.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We should not give domains access to something they don't necessarily
need by default. Remove it from the qemu driver docs too.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
virtio-mmio is still used by default, so if PCI is desired
it's necessary to explicitly opt-in by adding an appropriate
<address type='pci' domain='0x0000' ... />
element to the corresponding device.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The 'qemu' binary used to provide the i386 emulator until it was renamed
to qemu-system-i386 in QEMU 1.0. Since we don't support such old
versions we don't need to check for 'qemu' when probing capabilities.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Some clients poll virDomainGetBlockJobInfo rather than wait for the
VIR_DOMAIN_BLOCK_JOB_READY event. In some cases qemu can get to 100% and
still not reach the synchronised phase. Initiating a pivot in that case
will fail.
Given that computers are interacting here, the error that the job
can't be finalized yet is not handled very well by those specific
implementations.
Our docs now correctly state to use the event. We already do a similar
output adjustment in case when the progress is not available from qemu
as in that case we'd report 0 out of 0, which some apps also incorrectly
considered as 100% complete.
In this case we subtract 1 from the progress if the ready state is not
signalled by qemu if the progress was at 100% otherwise.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The virDomainDeviceInfo parameter is a large struct so it is preferrable
to pass it by reference instead of by value.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Be more sensible when setting labels of the target of a
virDomainBlockCopy operation. Previously we'd relabel everything in case
it's a copy job even if there's no unlabelled backing chain. Since we
are also not sure whether the backing chain is shared we don't relabel
the chain on completion of the blockjob. This certainly won't play nice
with the image permission relabelling feature.
While this does not fix the case where the image is reused and has
backing chain it certainly sanitizes all the other cases. Later on it
will also allow to do the correct thing in cases where only one layer
was introduced.
The change is necessary as in case when -blockdev will be used we will
need to hotplug the backing chain and thus labeling needs to be setup in
advance and not only at the time of pivot. To avoid multiple code paths
move the labeling now.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Rather than passing in a virStorageSource which would override the
originally passed disk->src we can now drop passing in a disk completely
as all functions called inside here require a virStorageSource.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the functions designed to deal with single images as the *Disk
functions were just wrappers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Previously there weren't any suitable functions which would allow
setting up host side of a full disk chain so we've opted to replace the
'src' in a virDomainDiskDef by the new image source.
That is now no longer necessary so remove the munging.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The same can be achieved by using qemuSecurity[Set|Restore]ImageLabel.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The flag will control the VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN
flag of the security driver image labeling APIs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Security labeling of disks consists of labeling of the disk image
itself and it's backing chain. Modify
virSecurityManager[Set|Restore]ImageLabel to take a boolean flag that
will label the full chain rather than the top image itself.
This allows to delete/unify some parts of the code and will also
simplify callers in some cases.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since the disk is necessary only to get the source modify the functions
to take the source directly and rename them to
qemu[Setup|Teardown]ImageChainCgroup.
Additionally drop a pointless comment containing the old function name.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When we need to detect a chain for a image which will become the new
source for a disk (e.g. after a disk media change or a blockjob) we'd
need to replace disk->src temporarily to do so.
Move the 'disksrc' temporary variable to an argument and adjust callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The function at first validates the top image of the chain, then
traverses the chain as declared in the XML (if any) and then procedes to
detect the rest of the chain from images. All of the steps have their
own temporary iterator.
Clarify the use scope of the steps by introducing a new temp variable
holding the top level source and adding comments.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This reverts commit 8b035c84d8.
The MTTCG impl in QEMU does allow pinning vCPUs.
When the guest is running we already check if pinning is
possible in the qemuDomainPinVcpuLive method, so this
check was adding no benefit.
When the guest is not running, we cannot know whether the
subsequent launch will use MTTCG or TCG, so we must allow
the pinning request. If the guest does use TCG on the next
launch it will fail, but this is no worse than if the user
had done a virDomainDefineXML with an XML doc specifying
vCPU pinning.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
MTTCG is the new multi-threaded impl of TCG which follows
KVM in having one host OS thread per vCPU. Historically
we have discarded all PIDs reported for TCG guests, but
we must now selectively honour this data.
We don't have anything in the domain XML that indicates
whether a guest is using TCG or MTTCG. While QEMU does
have an option (-accel tcg,thread=single|multi), it is
not desirable to expose this in libvirt. QEMU will
automatically use MTTCG when the host/guest architecture
pairing is known to be safe. Only developers of QEMU TCG
have a strong reason to override this logic.
Thus we use two sanity checks to decide if the vCPU
PID information is usable. First we see if the PID
duplicates the main emulator PID, and second we see
if the PID duplicates any other vCPUs.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add documentation that the 'VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB' flag
is auto-assumed if the block copy job is started while the VM is
transient and remove the restriction to define the domain when copy
is running.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
disk->mirror would not be cleared while the local pointer was freed in
qemuDomainBlockCommit if qemuDomainObjExitMonitor or qemuBlockJobDiskNew
would return a failure.
Since block job handling is executed in the separate handler which needs
a qemu job, we don't need to pre-set the mirror state prior to starting
the job. Similarly the block copy job does not do that.
Move the setting of the data after starting the job so that we avoid
this problem.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
While this should not be necessary as we clear it in the event handler,
let's be sure and clear it prior to starting the job.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Switching a block job to some states (e.g. QEMU_BLOCKJOB_STATE_READY)
might not require a job, thus if it will become ready asynchronously we
should not overwrite the state any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
While the callers should make sure that they don't call
qemuBlockJobEmitEvents for any internal state or job, let's add checks
that prevents us from emitting wrong events altogether.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We have this very handy macro called VIR_STEAL_PTR() which steals
one pointer into the other and sets the other to NULL. The
following coccinelle patch was used to create this commit:
@ rule1 @
identifier a, b;
@@
- b = a;
...
- a = NULL;
+ VIR_STEAL_PTR(b, a);
Some places were clean up afterwards to make syntax-check happy
(e.g. some curly braces were removed where the body become a one
liner).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Use qemuBuildControllersCommandLine since it builds the command line
for (nearly) all controllers, not just one.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that the inner loop does not require any other variables,
it can be easily separated. Apart from reducing the indentation
level this will allow it to be called from different code paths.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that it's no longer needed, remove the argument.
This removes the last helper variable in
qemuBuildControllerDevCommandLine.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
qemuBuildLegacyUSBControllerCommandLine is the only place where
we need to count the USB controllers.
Count them again instead of keeping track in a variable passed to
qemuBuildControllerDevStr.
This removes the need for another variable in the loop in
qemuBuildControllerDevCommandLine.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Count them in qemuBuildLegacyUSBControllerCommandLine to remove
yet another variable accessed from the loop in
qemuBuildControllerDevCommandLine.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This removes the need to mark it in the 'usbcontroller' variable.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Move out the code formatting "-usb" on the QEMU command line.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Switch qemuBuildVirtioDevStr to use virDomainDeviceSetData: callers
pass in the virDomainDeviceType and the void * DefPtr. This will
save us from having to repeatedly extend the function argument
list in subsequent patches.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Current code essentially duplicates the same logic, but misses
some cases (like vhost-vsock-device).
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
The vhost-scsi device string should depend on the requested
address type, not strictly on the emulated arch. This is the
same logic used by qemuBuildVirtioDevStr, and this particular
path is already tested in the hostdev-scsi-vhost-scsi-ccw tests
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Move the rng->model == VIRTIO check to parse time. This also
allows us to remove similar checks throughout the qemu driver
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
If we validate that memballoon is NONE|VIRTIO at parse time,
we can drop similar checks elsewhere in the qemu driver
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
This will be extended in the future, so let's simplify things by
centralizing the checks.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Asserting the value we set four lines earlier in qemuBlockjobState
doesn't buy us any safety (if the public header adds a value, we end
up skipping that value without the compiler warning us of our gap);
what we really want is to assert that the value auto-assigned by the
compiler matches the actual last value in the public headers (as was
done below for qemuBlockJobType). Add useful comments while at it.
Signed-off-by: Eric Blake <eblake@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Hanlde all the possible failure codes as per ACPI standard documented in
the function header.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1660410
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We forgot to document the specific fields for the 0x103 and 0x200
sources which are tied to device removal and device hotplug
respectively.
The value description is based on the ACPI 6.2A standard Table 6-207 and
Table 6-208. At the time of writing of this patch the standard can be
accessed e.g. at:
https://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The device xml parser code does not set "model" while parsing the
following XML:
<interface type='hostdev'>
<source>
<address type='pci' domain='0x0002' bus='0x01' slot='0x00' function='0x2'/>
</source>
</interface>
The net->model can be NULL and therefore must be compared using
STREQ_NULLABLE instead of plain STREQ.
Fixes: ac47e4a622 (qemu: replace "def->nets[i]" with "net" and "def->sounds[i]" with "sound")
Fixes: c7fc151eec (qemu: assign virtio devices to PCIe slot when appropriate)
Signed-off-by: Radoslaw Biernacki <radoslaw.biernacki@linaro.org>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add a capability check to qemuDomainDefValidate and refuse to start
a domain with VNC graphics if the TLS secret was set in qemu.conf
and it's not supported.
Note that qemuDomainSecretGraphicsPrepare does not generate any
secret data if the capability is not present and qemuBuildTLSx509BackendProps
is not called at all.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the password stored in the secret driver under
the uuid specified by the vnc_tls_x509_secret_uuid
option in qemu.conf.
https://bugzilla.redhat.com/show_bug.cgi?id=1602418
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>