This reverts commit a5a777a8ba.
After previous commit the domain won't disappear while connecting
to monitor. There's no need to ref monitor config then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
When connecting to qemu's monitor the @vm object is unlocked.
This is justified - connecting may take a long time and we don't
want to wait with the domain object locked. However, just before
the domain object is locked again, the monitor's FD is registered
in the event loop. Therefore, there is a small window where the
event loop has a chance to call a handler for an event that
occurred on the monitor FD but vm is not initalized properly just
yet (i.e. priv->mon is not set). For instance, if there's an
incoming migration, qemu creates its socket but then fails to
initialize (for various reasons, I'm reproducing this by using
hugepages but leaving the HP pool empty) then the following may
happen:
1) qemuConnectMonitor() unlocks @vm
2) qemuMonitorOpen() connects to the monitor socket and by
calling qemuMonitorOpenInternal() which subsequently calls
qemuMonitorRegister() the event handler is installed
3) qemu fails to initialize and exit()-s, which closes the
monitor
4) The even loop sees EOF on the monitor and the control gets to
qemuProcessEventHandler() which locks @vm and calls
processMonitorEOFEvent() which then calls
qemuMonitorLastError(priv->mon). But priv->mon is not set just
yet.
5) qemuMonitorLastError() dereferences NULL pointer
The solution is to unlock the domain object for a shorter time
and most importantly, register event handler with domain object
locked so that any possible event processing is done only after
@vm's private data was properly initialized.
This issue is also mentioned in v4.2.0-99-ga5a777a8ba.
Since we are unlocking @vm and locking it back, another thread
might have destroyed the domain meanwhile. Therefore we have to
check if domain is still active, and we have to do it at the
same place where domain lock is acquired back, i.e. in
qemuMonitorOpen(). This creates a small problem for our test
suite which calls qemuMonitorOpen() directly and passes @vm which
has no definition. This makes virDomainObjIsActive() call crash.
Fortunately, allocating empty domain definition is sufficient.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
QEMU 2.11 for ppc64 changed all CPU model names to lower case. Since
libvirt can't change the model names for compatibility reasons, we need
to translate the matching lower case models to the names known by
libvirt.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Call qemuExtVhostUserGPUPrepareDomain() to fill the domain with the
location of the vhost-user binary to start.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Check qemu capability, and accept 3d acceleration. 3d acceleration
support is checked when looking for a suitable vhost-user helper.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The fact that qemu is capable -netdev socket is not enough to
start a migratable domain. It also needs dbus-vmstate capability.
Since there are already some qemu releases which have
net-socket-dgram capability and don't have dbus-vmstate we need
to check for dbus-vmstate.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If managed='no', then the tap device must already exist, and setting
of MAC address and online status (IFF_UP) is skipped.
NB: we still set IFF_VNET_HDR and IFF_MULTI_QUEUE as appropriate,
because those bits must be properly set in the TUNSETIFF we use to set
the tap device name of the handle we've opened - if IFF_VNET_HDR has
not been set and we set it the request will be honored even when
running libvirtd unprivileged; if IFF_MULTI_QUEUE is requested to be
different than how it was created, that will result in an error from
the kernel. This means that you don't need to pay attention to
IFF_VNET_HDR when creating the tap devices, but you *do* need to set
IFF_MULTI_QUEUE if you're going to use multiple queues for your tap
device.
NB2: /dev/vhost-net normally has permissions 600, so it can't be
opened by an unprivileged process. This would normally cause a warning
message when using a virtio net device from an unprivileged
libvirtd. I've found that setting the permissions for /dev/vhost-net
permits unprivileged libvirtd to use vhost-net for virtio devices, but
have no idea what sort of security implications that has. I haven't
changed libvrit's code to avoid *attempting* to open /dev/vhost-net -
if you are concerned about the security of opening up permissions of
/dev/vhost-net (probably a good idea at least until we ask someone who
knows about the code) then add <driver name='qemu'/> to the interface
definition and you'll avoid the warning message.
Note that virNetDevTapCreate() is the correct function to call in the
case of an existing device, because the same ioctl() that creates a
new tap device will also open an existing tap device.
Resolves: https://bugzilla.redhat.com/1723367 (partially)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In some places where virDomainObjListForEach() is called the
passed callback calls virDomainObjListRemoveLocked(). Well, this
is unsafe, because the former only grabs a read lock but the
latter modifies the list.
I've identified the following unsafe calls:
- qemuProcessReconnectAll()
- libxlReconnectDomains()
The rest seem to be safe.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When the network interface is of "user" type, and QEMU has the "-net
socket,fd=" datagram support, call qemuInterfacePrepareSlirp() to
probe and associate a slirp-helper with the interface.
The usage of automated slirp-helper can be prevented with
disableSlirp (in particular when resuming a
VM that didn't start with slirp-helper before).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If a slirp-helper is associated with a network interface,
prepare/start/stop the process via qemu-extdevice.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
pid filenames (from swtpm and other helpers from this series) are
based on VM shortname, which is derived from VM id. If the id is reset
to early, the state filenames will not be found.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Once QEMU is started, the qemuDomainLogContext is owned by it, and can
no longer be used from libvirt. Instead, use
qemuDomainLogAppendMessage() which will redirect the log.
This is not strictly necessary for swtpm, but the following patches
are going to reuse qemuExtDeviceLogCommand().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With blockdev we must issue the block_set_io_throttle QMP command to
setup disk throttling as we currently can't do it with the 'throttle'
layer.
Unfortunately there's nothing we can do if it fails.
https://bugzilla.redhat.com/show_bug.cgi?id=1733163
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU-4.1 supports 'Direct Mode' for Hyper-V synthetic timers
(hv-stimer-direct CPU flag): Windows guests can request that timer
expiration notifications are delivered as normal interrupts (and not
VMBus messages). This is used by Hyper-V on KVM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Since qemuDomainDefPostParse callback requires qemuCaps, we need to make
sure it gets the capabilities stored in the domain's private data if the
domain is running. Passing NULL may cause QEMU capabilities probing to
be triggered in case QEMU binary changed in the meantime. When this
happens while a running domain object is locked, QMP event delivered to
the domain before QEMU capabilities probing finishes will deadlock the
event loop.
This patch fixes all paths leading to virDomainDefPostParse.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since qemuDomainDefPostParse callback requires qemuCaps, we need to make
sure it gets the capabilities stored in the domain's private data if the
domain is running. Passing NULL may cause QEMU capabilities probing to
be triggered in case QEMU binary changed in the meantime. When this
happens while a running domain object is locked, QMP event delivered to
the domain before QEMU capabilities probing finishes will deadlock the
event loop.
Several general functions from domain_conf.c were lazily passing NULL as
the parseOpaque pointer instead of letting their callers pass the right
data. This patch fixes all paths leading to virDomainDefCopy to do the
right thing.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since qemuDomainDefPostParse callback requires qemuCaps, we need to make
sure it gets the capabilities stored in the domain's private data if the
domain is running. Passing NULL may cause QEMU capabilities probing to
be triggered in case QEMU binary changed in the meantime. When this
happens while a running domain object is locked, QMP event delivered to
the domain before QEMU capabilities probing finishes will deadlock the
event loop.
This patch fixes all paths leading to qemuDomainDefFormatBufInternal.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'default monitor of an allocation' is defined as the resctrl
monitor group that created along with an resctrl allocation,
which is created by resctrl file system. If the monitor group
specified in domain configuration file is happened to be a
default monitor group of an allocation, then it is not necessary
to create monitor group since it is already created. But if
an monitor group is not an allocation default group, you
should create the group under folder
'/sys/fs/resctrl/mon_groups' and fill the vcpu PIDs to 'tasks'
file.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
hv-spinlocks is not a CPUID feature and should not be checked as such.
While starting a domain with hv-spinlocks enabled, we would report a
warning about unsupported hyperv spinlocks feature even though it was
set properly.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Originally the names of the hyperv CPU features were only used
internally for looking up their CPUID bits. So we used "__kvm_hv_"
prefix for them to make sure the names do not collide with normal CPU
features stored in our CPU map.
But with QEMU 4.1 we check which features were enabled or disabled by a
freshly started QEMU process using their names rather than their CPUID
bits (mostly because of MSR features). Thus we need to change our made
up internal names into the actual names used by QEMU. Most of the names
are only used with QEMU 4.1 and newer and the reset was introduced with
QEMU recently enough to already support spelling with "-". Thus we don't
need to define them as "hv_*" with a translation to "hv-*" for new QEMU.
Without this patch libvirt would mistakenly report all hyperv features
as unavailable and refuse to start any domain using them with QEMU 4.1.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In case of an incoming migration we do not need to run swtpm_setup
with all the parameters but only want to get the benefit of it
creating a TPM state file for us that we can then label with an
SELinux label. The actual state will be overwritten by the in-
coming state. So we have to pass an indicator for incomingMigration
all the way to the command line parameter generation for swtpm_setup.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
If we are using -blockdev, then node names are always available
(because we set them). But when not using it, we have to scrape node
names from QMP, and want to do so as infrequently as possible. We
were scraping node names after reconnecting a new libvirtd to an
existing guest (see qemuProcessReconnect), and after any block job
that may have changed the set of node names we care about (legacy
block jobs), but forgot to scrape the names when first starting a
guest. Do so now in order to allow the checkpoint code to always have
access to a node name without having to repeat a node name scrape
itself.
Future patches may need to clean up qemuDomainSetBlockThreshold (if
node names are always available, then it doesn't need to repeat a
scrape) and/or hotplug and media changes (if the addition of new nodes
can result in a null node name, then scraping at that point in time
would be appropriate). But for now, this patch addresses only the
most common instance of a missing node name.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The block job event handler qemuProcessHandleBlockJob looks at the block
job data to see whether the job requires synchronous handling. Since the
block job event may arrive before we continue the job handling (if the
job has no data to copy) we could hit the state when the job is still
set as QEMU_BLOCKJOB_STATE_NEW (as we move it to the
QEMU_BLOCKJOB_STATE_RUNNING state only after returning from monitor).
If the event handler uses qemuBlockJobStartupFinalize it would
unregister and free the job. Thankfully this is not a big problem for
legacy blockjobs as we don't need much data for them but since we'd
re-instantiate the job data structure we'd report wrong job type for
active commit as qemu reports it as a regular commit job.
Fix it by not using qemuBlockJobStartupFinalize function in
qemuProcessHandleBlockJob as it is not starting the job anyways.
https://bugzilla.redhat.com/show_bug.cgi?id=1721375
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The PR manager is a property of the format layer in qemu so we need to
be able to track it also in the chains of orphaned block jobs.
Add a helper for qemu to look also into the blockjob state.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Refresh the state of the jobs and process any events that might have
happened while libvirt was not running.
The job state processing requires some care to figure out if a job
needs to be bumped.
For any invalid job try doing our best to cancel it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add support for handling the event either synchronously or
asynchronously using the event thread.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
With blockdev we'll need to use the JOB_STATUS_CHANGE so gate the old
events by the blockdev capability.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that block job data is stored in the status XML portion we need to
make sure that everything which changes the state also saves the status
XML. The job registering function is used while parsing the status XML
so in that case we need to skip the XML saving.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add the job structure to the table when instantiating a new job and
remove it when it terminates/fails.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Pass an xmlopt argument through all the needed network conf
functions, like is done for domain XML handling. No functional
change for now
Reviewed-by: Laine Stump <laine@laine.org>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Filter out the given capabilities and set domain taint if we've done so.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For testing purposes it's sometimes desired to be able to control the
presence of capabilities of qemu. This adds the possibility to do this
via the qemu namespace.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When connecting to virtlogd fails e.g. due to wrong libvirtd selinux
process label we'd report an utterly useless error message:
$ virsh start upstream
error: Failed to start domain upstream
error: Cannot recv data: Connection reset by peer
Use virLastErrorPrefixMessage in the correct place to give a better
sense of what's going on:
$ virsh start upstream
error: Failed to start domain upstream
error: can't connect to virtlogd: Cannot recv data: Connection reset by peer
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Without "unavailable-features" CPU property we cannot properly detect
whether a specific MSR feature we asked for (either explicitly or
implicitly via a CPU model) was disabled by QEMU for some reason.
Because this could break migration, snapshots, and save/restore
operaions, it's better to just forbid any use of MSR features with QEMU
which lacks "unavailable-features" CPU property.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Always assume JSON monitor was requested, since all the callers
pass true anyway.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Now that we no longer support the HMP monitor, remove some dead code.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Now that the virDomainQemuAttach API returns an error, we can remove the
unused qemuProcessAttach function as well, deleting the only user
that possibly could have requested to open a non-JSON monitor.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
If spawning qemu fails then we report an error and proceed to
writing status XML onto the disk. This is unnecessary as we are
sure that the domain is not running.
At the same time, if virPidFileReadPath() fails it returns
-errno. Use it in the error message. It may explain what went
wrong.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When updating guest CPU definition according to the vCPU actually
created by QEMU, we want to use the generic qemuMonitorGetGuestCPU to
get both CPUID and MSR features.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It was never implemented or used for anything else anyway. Mainly
because it uses CPUID features bits. The function is renamed as
qemuMonitorGetGuestCPUx86 to make this explicit.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Properly filter features which should not be passed to QEMU because they
were never supported by QEMU or they did nothing and QEMU dropped them.
Currently they are just silently ignored by the command line generator.
Let's make this process more visible and clean by dropping the features
from the domain's active definition in qemuProcessUpdateGuestCPU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If libvirt receives DISCONNECTED event and prDaemonRunning is set
to false, and qemuDomainRemoveDiskDevice() is performing in the
meantime, then qemuDomainRemoveDiskDevice() will fail to remove
pr-helper object because prDaemonRunning is false. But removing
that check from qemuHotplugRemoveManagedPR() is not enough,
because after removing the object through monitor the
qemuProcessKillManagedPRDaemon() is called which contains the
same check. Thus the pr-helper process might be left behind.
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The hash table returned by qemuMonitorGetAllBlockJobInfo is organized by
the frontend name (which skipps the 'drive-' prefix). While our code
properly matches the jobs to the disk, qemu needs the full job name
including the 'drive-' prefix to be able to identify jobs.
Fix this by adding an argument to qemuMonitorGetAllBlockJobInfo which
does not modify the job name before filling the hash.
This fixes a regression where users would not be able to cancel/pivot
block jobs after restarting libvirtd while a blockjob is running.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 2f2254c7f4 attempted to fix a memory leak by ensuring
cpumapToSet is always a freshly allocated bitmap, but regrettably
introduced a NULL pointer access while doing so, because it called
virBitmapCopy() without allocating the destination bitmap first.
Solve the issue by using virBitmapNewCopy() instead.
Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We're using VIR_AUTOPTR() for everything now, plus the
cleanup section was not doing anything useful anyway.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In two out of three scenarios we are cleaning up properly after
ourselves, but commit 5f2212c062 has changed the remaining one
in a way that caused it to start leaking cpumapToSet.
Refactor the logic so that cpumapToSet is always a freshly
allocated bitmap that gets cleaned up automatically thanks to
VIR_AUTOPTR(); this also allows us to remove the hostcpumap
variable.
Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ever since the feature was introduced with commit 0f8e7ae33a,
it has contained a logic error in that it attempted to use a NUMA
node map where a CPU map was expected.
Because of that, guests using <numatune> might fail to start:
# virsh start guest
error: Failed to start domain guest
error: cannot set CPU affinity on process 40055: Invalid argument
This was particularly easy to trigger on POWER 8 machines, where
secondary threads always show up as offline in the host: having
<numatune>
<memory mode='strict' placement='static' nodeset='1'/>
</numatune>
in the guest configuration, for example, would result in libvirt
trying to set the process affinity so that it would prefer
running on CPU 1, but since that's a secondary thread and thus
shows up as offline, the operation would fail, and so would
starting the guest.
Use the newly introduced virNumaNodesetToCPUset() to convert the
NUMA node map to a CPU map, which in the example above would be
48,56,64,72,80,88 - a valid input for virProcessSetAffinity().
https://bugzilla.redhat.com/show_bug.cgi?id=1703661
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When migrating a domain with invtsc CPU feature enabled, the TSC
frequency of the destination host must match the frequency used when the
domain was started on the source host or the destination host has to
support TSC scaling.
If the frequencies do not match and the destination host does not
support TSC scaling, QEMU will fail to set the right TSC frequency when
starting vCPUs on the destination and thus migration will fail. However,
this is quite late since both host might have spent significant time
transferring memory and perhaps even storage data.
By adding the check to libvirt we can let migration fail before any data
starts to be sent over. If for some reason libvirt is unable to detect
the host's TSC frequency or scaling support, we'll just let QEMU try and
the migration will either succeed or fail later.
Luckily, we mandate TSC frequency to be explicitly set in the domain XML
to even allow migration of domains with invtsc. We can just check
whether the requested frequency is compatible with the current host
before starting QEMU.
https://bugzilla.redhat.com/show_bug.cgi?id=1641702
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
If the scheduler is set before vCPU0 cannot be moved into its cpu,cpuacct
cgroup. While it is not yet known whether this is a bug or not, it makes sense
for us to do that later as otherwise the scheduler would be inherited by vCPU
and I/O Threads even when they do not have any such setting specified.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It's funny how this went unnoticed for such a long time. Long
story short, if a domain is configured with
VIR_DOMAIN_NUMATUNE_MEM_STRICT libvirt doesn't really honour
that. This is because of 7e72ac7878 after which libvirt allowed
qemu to allocate memory just anywhere and only after that it used
some magic involving cpuset.memory_migrate and cpuset.mems to
move the memory to desired NUMA nodes. This was done in order to
work around some KVM bug where KVM would fail if there wasn't a
DMA zone available on the NUMA node. Well, while the work around
might stopped libvirt tickling the KVM bug it also caused a bug
on libvirt side: if there is not enough memory on configured NUMA
node(s) then any attempt to start a domain must fail. Because of
the way we play with guest memory domains can start just happily.
The solution is to move the child we've just forked into emulator
cgroup, set up cpuset.mems and exec() qemu only after that.
This basically reverts 7e72ac7878 which was a workaround
for kernel bug. This bug was apparently fixed because I've tested
this successfully with recent kernel.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The call to resolve the actual network type will turn any NICs with
type=network into one of the other types. Thus there should be no need
to handle type=network in later switch() statements jumping off the
actual type.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The APIs for allocating/notifying/removing network ports just take
an internal domain interface struct right now. As a step towards
turning these into public facing APIs, add a virNetworkPtr argument
to all of them.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The port allocation APIs are currently called unconditionally for all
types of NIC, but (mostly) only do anything for NICs with type=network.
The exception is the port allocate API which does some validation even
for NICs with type!=network. Relying on this validation is flawed,
however, since the network driver may not even be installed. IOW virt
drivers must not delegate validation to the network driver for NICs
with type != network.
This change allows us to report errors when the virtual network driver
is not registered.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This helps in a scenarios where vCPUs run with a priority that is so high they
might starve the emulator thread. And it also fits with the rest of the
settings.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This does not cause a problem in usual scenarios thanks to us allowing
CAP_DAC_OVERRIDE for the qemu process, however in some scenarios this might be
an issue because the directory is created with mkdtemp(3) which explicitly
creates that with 0700 permissions and qemu running as non-root cannot access
that.
The scenarios include:
- Builds without CAPNG
- Running libvirtd in certain container configurations [1]
- and possibly others.
[1] https://github.com/kubevirt/kubevirt/pull/2181#issuecomment-481840304
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Since the STOP event handler can use the pausedReason as sent to
qemuProcessStopCPUs, we no longer need to send duplicate suspended
lifecycle events because we know what caused the stop along with extra
details. This processing allows us to also remove the duplicated state
change from qemuProcessStopCPUs.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Map is based on existing cases in code where we send suspended
event after changing domain state to paused.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Similar to commit [1] which saves and passes the running reason to
the RESUME event handler, during qemuProcessStopCPUs let's save and pass
the pause reason in the domain private data so that the STOP event
handler can use it.
[1] 5dab984ed : qemu: Pass running reason to RESUME event handler
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Now that the core of SnapshotObj is agnostic to snapshots and can be
shared with upcoming checkpoint code, it is time to rename the struct
and the functions specific to list operations. A later patch will
shuffle which file holds the common code. This is a fairly mechanical
patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Replace virDomainChrSourceDefFree with virObjectUnref.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Use refcounting for priv->monConfig instead of asymmetric freeing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
qemuProcessQMPStart starts a QEMU process and monitor connection that
can be used by multiple functions possibly for multiple QMP commands.
The QMP exchange to exit capabilities negotiation mode and enter command
mode can only be performed once after the monitor connection is
established.
Move responsibility for entering QMP command mode into the
qemuProcessQMP code so multiple functions can issue QMP commands in
arbitrary orders.
This also simplifies the functions using the connection provided by
qemuProcessQMPStart to issue QMP commands.
Test code now needs to call qemuMonitorSetCapabilities to send the
message to switch to command mode because the test code does not use the
qemuProcessQMP command that internally calls qemuMonitorSetCapabilities.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Multiple QEMU processes for QMP commands can operate concurrently.
Use a unique directory under libDir for each QEMU process to avoid
pidfile and unix socket collision between processes.
The pid file name is changed from "capabilities.pidfile" to "qmp.pid"
because we no longer need to avoid a possible clash with a qemu domain
called "capabilities" now that the processes artifacts are stored in
their own unique temporary directories.
"Capabilities" was changed to "qmp" in the pid file name because these
processes are no longer specific to the capabilities usecase and are
more generic in terms of being used for any general purpose QMP message
exchanges with a QEMU process that is not associated with a domain.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Users qemuProcessQMP struct were always forced to call both
qemuProcessQMPStop and qemuProcessQMPFree when they are done with the
process. We can just call qemuProcessQMPStop from qemuProcessQMPFree and
let users call qemuProcessQMPFree only.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemuProcessQMPNew is one of the public functions used to create and
manage a QEMU process for QMP command exchanges outside of domain
operations.
Add descriptive comment block, debug statement and make source
consistent with the cleanup / VIR_STEAL_PTR format used elsewhere.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The monitor config data is removed from the qemuProcessQMP struct.
The monitor config data can be initialized immediately before call to
qemuMonitorOpen and does not need to be maintained after the call
because qemuMonitorOpen copies any strings it needs.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Move code for setting paths and prepping file system from
qemuProcessQMPNew to qemuProcessQMPInit.
This keeps qemuProcessQMPNew limited to data structures and path
initialization is done in qemuProcessQMPInit.
The patch is a non-functional, cut / paste change, however goto is now
"cleanup" rather than "error".
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Store libDir path in the qemuProcessQMP struct in anticipation of moving
path construction code into qemuProcessQMPInit function.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All code related to QEMU monitor is moved from qemuProcessQMPNew and
qemuProcessQMPInit into qemuProcessQMPConnectMonitor.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This is a replacement for qemuProcessQMPRun to make the name consistent
with qemuProcessStart. The original qemuProcessQMPRun function is
renamed as qemuProcessQMPLaunch and becomes one of the simpler functions
called from the main qemuProcessQMPStart entry point. The following
patches will move parts of the code in qemuProcessQMPLaunch to the other
functions (qemuProcessQMPInit and qemuProcessQMPConnectMonitor).
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Keep the pointer to QEMU stderr output in qemuProcessQMP struct instead
of requiring the caller to provide it (and free it).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
While qemuProcessQMPRun and virQEMUCapsInitQMPMonitor* functions called
from virQEMUCapsInit ignore some errors, the caller of virQEMUCapsInit
would report an error unless usedQMP is true anyway. And since usedQMP
can only be true if the probing code really succeeded (i.e., no errors
were ignored), we can just simplify the logic by not ignoring the errors
in the first place.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In new process code, move from model where qemuProcessQMP struct can be
used to activate a series of Qemu processes to model where one
qemuProcessQMP struct is used for one and only one Qemu process.
By allowing only one process activation per qemuProcessQMP struct, the
struct can safely store process outputs like status and stderr, without
being overwritten, until qemuProcessQMPFree is called.
By doing this, process outputs like status and stderr can remain stored
in the qemuProcessQMP struct without being overwritten by subsequent
process activations.
The forceTCG parameter (use / don't use KVM) will be passed when the
qemuProcessQMP struct is initialized since the qemuProcessQMP struct
won't be reused.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virQEMUCapsInitQMP now stops QEMU process in all execution paths,
before freeing the process structure.
The qemuProcessQMPStop function can be called multiple times without
problems... Won't attempt to stop processes and free resources multiple
times.
Follow the convention established in qemu_process of
1) alloc process structure
2) start process
3) use process
4) stop process
5) free process data structure
The process data structure persists after the process activation fails
or the process dies or is killed so stderr strings can be retrieved
until the process data structure is freed.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
s/qemuProcessQMPAbort/qemuProcessQMPStop/ applied to change function
name used to stop QEMU processes in process code moved from
qemu_capabilities.
No functionality change.
The new name, qemuProcessQMPStop, is consistent with the existing
function qemuProcessStop used to stop Domain processes.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
s/cmd/proc/ in process code imported from qemu_capabilities.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add the const qualifier on non modified strings
(string only copied inside qemuProcessQMPNew)
so that const strings can be used directly in calls to
qemuProcessQMPNew in future patches.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU process code in qemu_capabilities.c is moved to qemu_process.c in
order to make the code usable outside the original capabilities use
cases.
The moved code activates and manages QEMU processes without establishing
a guest domain.
This patch is a straight cut/paste move between files.
Signed-off-by: Chris Venteicher <cventeic@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For normal starts (no incoming migration) the refresh of the QEMU
state must be done before the VCPUs getting started since otherwise
there might be a race condition between a possible shutdown of the
guest OS and the QEMU monitor queries.
This fixes "qemu: migration: Refresh device information after
transferring state" (93db7eea1b).
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1503284
The way we currently start qemu from CPU affinity POV is as
follows:
1) the child process is set affinity to all online CPUs (unless
some vcpu pinning was given in the domain XML)
2) Once qemu is running, cpuset cgroup is configured taking
memory pinning into account
Problem is that we let qemu allocate its memory just anywhere in
1) and then rely in 2) to be able to move the memory to
configured NUMA nodes. This might not be always possible (e.g.
qemu might lock some parts of its memory) and is very suboptimal
(copying large memory between NUMA nodes takes significant amount
of time).
The solution is to set affinity to one of (in priority order):
- The CPUs associated with NUMA memory affinity mask
- The CPUs associated with emulator pinning
- All online host CPUs
Later (once QEMU has allocated its memory) we then change this
again to (again in priority order):
- The CPUs associated with emulator pinning
- The CPUs returned by numad
- The CPUs associated with vCPU pinning
- All online host CPUs
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Be more sensible when setting labels of the target of a
virDomainBlockCopy operation. Previously we'd relabel everything in case
it's a copy job even if there's no unlabelled backing chain. Since we
are also not sure whether the backing chain is shared we don't relabel
the chain on completion of the blockjob. This certainly won't play nice
with the image permission relabelling feature.
While this does not fix the case where the image is reused and has
backing chain it certainly sanitizes all the other cases. Later on it
will also allow to do the correct thing in cases where only one layer
was introduced.
The change is necessary as in case when -blockdev will be used we will
need to hotplug the backing chain and thus labeling needs to be setup in
advance and not only at the time of pivot. To avoid multiple code paths
move the labeling now.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When we need to detect a chain for a image which will become the new
source for a disk (e.g. after a disk media change or a blockjob) we'd
need to replace disk->src temporarily to do so.
Move the 'disksrc' temporary variable to an argument and adjust callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
If we validate that memballoon is NONE|VIRTIO at parse time,
we can drop similar checks elsewhere in the qemu driver
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Hanlde all the possible failure codes as per ACPI standard documented in
the function header.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1660410
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We forgot to document the specific fields for the 0x103 and 0x200
sources which are tied to device removal and device hotplug
respectively.
The value description is based on the ACPI 6.2A standard Table 6-207 and
Table 6-208. At the time of writing of this patch the standard can be
accessed e.g. at:
https://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Be generic instead of trying to enumerate all the involved
device types.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Currently the job name corresponds to the disk the job belongs to. For
jobs which will not correspond to disks we'll need to track the name
separately.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add a field tracking the current state of job so that it can be queried
later. Until now the job state e.g. that the job is _READY for
finalizing was tracked only for mirror jobs. Add tracking of state for
all jobs.
Similarly to 'qemuBlockJobType' this maps the existing states of the
blockjob from virConnectDomainEventBlockJobStatus to
'qemuBlockJobState' so that we can track some internal states as well.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We can properly track the job type when starting the job so that we
don't have to infer it later.
This patch also adds an enum of block job types specific to qemu
(qemuBlockjobType) which mirrors the public block job types
(virDomainBlockJobType) but allows for other types to be added later
which will not be public.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rather than directly modifying fields in the qemuBlockJobDataPtr
structure add a bunch of fields which allow to do the transitions.
This will help later when adding more complexity to the job handling.
APIs introduced in this patch are:
qemuBlockJobDiskNew - prepare for starting a new blockjob on a disk
qemuBlockJobDiskGetJob - get the block job data structure for a disk
For individual job state manipulation the following APIs are added:
qemuBlockJobStarted - Sets the job as started with qemu. Until that
the job can be cancelled without asking qemu.
qemuBlockJobStartupFinalize - finalize job startup. If the job was
started in qemu already, just releases
reference to the job object. Otherwise
clears everything as if the job was never
started.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The field is used to note the state the job has transitioned to while
handling the blockjob state change event. Rename the field so that it's
obvious that this is the new state and not the general state of the
blockjob.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Block job state was widely untracked by libvirt across restarts which
was allowed by a stateless block job finishing handler which discarded
disk state and redetected it. This is undesirable since we'll need to
track more information for individual blockjobs due to -blockdev
integration requirements.
In case of legacy blockjobs we can recover whether the job is present at
reconnect time by querying qemu. Adding tracking whether a job is
present will allow simplification of the non-shared-storage cancellation
code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'cleanup' label was accessed only from a jump to 'error'. Consolidate
everyting into 'cleanup'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Struct qemuDomainDiskPrivate was holding multiple variables connected to
a disk block job. Consolidate them into a new struct qemuBlockJobData.
This will also allow simpler extensions to the block job mechanisms.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In the previous commit we are using uint64_t for storing subnet
prefix and interface id that qemu reports in
RDMA_GID_STATUS_CHANGED event. We also report them in some debug
messages. This poses a problem because uint64_t can be UL or ULL
depending on the host architecture and hence we wouldn't know
which format to use. Switch to ULL which is big enough and
doesn't suffer from the issue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This event is emitted on the monitor when a GID table in pvrdma device
is modified and the change needs to be propagate to the backend RDMA
device's GID table.
The control over the RDMA device's GID table is done by updating the
device's Ethernet function addresses.
Usually the first GID entry is determine by the MAC address, the second
by the first IPv6 address and the third by the IPv4 address. Other
entries can be added by adding more IP addresses. The opposite is the
same, i.e. whenever an address is removed, the corresponding GID entry
is removed.
The process is done by the network and RDMA stacks. Whenever an address
is added the ib_core driver is notified and calls the device driver's
add_gid function which in turn update the device.
To support this in pvrdma device we need to hook into the create_bind
and destroy_bind HW commands triggered by pvrdma driver in guest.
Whenever a changed is made to the pvrdma device's GID table a special
QMP messages is sent to be processed by libvirt to update the address of
the backend Ethernet device.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Before launching a SEV guest we take the base64-encoded guest owner's
data specified in launchSecurity and create files with the same content
under /var/lib/libvirt/qemu/<domain>. The reason for this is that we
need to pass these files on to QEMU which then uses them to communicate
with the SEV firmware, except when it doesn't have permissions to open
those files since we don't relabel them.
https://bugzilla.redhat.com/show_bug.cgi?id=1658112
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
Since SEV operates on a per domain basis, it's very likely that all
SEV launch-related data will be created under
/var/lib/libvirt/qemu/<domain_name>. Therefore, when calling into
qemuProcessSEVCreateFile we can assume @libDir as the directory prefix
rather than passing it explicitly.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
Because missing optional storage source is not error. The patch
address only local files. Fixing other cases is a bit ugly.
Below is example of error notice in log now:
error: virStorageFileReportBrokenChain:427 :
Cannot access storage file '/path/to/missing/optional/disk':
No such file or directory
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
The QEMU command line arguments are very long and currently all written
on a single line to /var/log/libvirt/qemu/$GUEST.log. This introduces
logic to add line breaks after every env variable and "-" optional
argument, and every positional argument. This will create a clearer log
file, which will in turn present better in bug reports when people cut +
paste from the log into a bug comment.
An example log file entry now looks like this:
2018-12-14 12:57:03.677+0000: starting up libvirt version: 5.0.0, qemu version: 3.0.0qemu-3.0.0-1.fc29, kernel: 4.19.5-300.fc29.x86_64, hostname: localhost.localdomain
LC_ALL=C \
PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin \
HOME=/home/berrange \
USER=berrange \
LOGNAME=berrange \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-ppc64 \
-name guest=guest,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/home/berrange/.config/libvirt/qemu/lib/domain-33-guest/master-key.aes \
-machine pseries-2.10,accel=tcg,usb=off,dump-guest-core=off \
-m 1024 \
-realtime mlock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid c8a74977-ab18-41d0-ae3b-4041c7fffbcd \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=23,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2018-12-14 12:57:03.730+0000: shutting down, reason=failed
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Require that all headers are guarded by a symbol named
LIBVIRT_$FILENAME
where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.
Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This introduces a syntax-check script that validates header files use a
common layout:
/*
...copyright header...
*/
<one blank line>
#ifndef SYMBOL
# define SYMBOL
....content....
#endif /* SYMBOL */
For any file ending priv.h, before the #ifndef, we will require a
guard to prevent bogus imports:
#ifndef SYMBOL_ALLOW
# error ....
#endif /* SYMBOL_ALLOW */
<one blank line>
The many mistakes this script identifies are then fixed.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Unlike with SPICE and SDL which use the <gl> subelement to enable OpenGL
acceleration, specifying egl-headless graphics in the XML has
essentially the same meaning, thus in case of egl-headless we don't have
a need for the 'enable' element attribute and we'll only be interested
in the 'rendernode' one further down the road.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Up until now, we formatted 'rendernode=' onto QEMU cmdline only if the
user specified it in the XML, otherwise we let QEMU do it for us. This
causes permission issues because by default the /dev/dri/renderDX
permissions are as follows:
crw-rw----. 1 root video
There's literally no reason why it shouldn't be libvirt picking the DRM
render node instead of QEMU, that way (and because we're using
namespaces by default), we can safely relabel the device within the
namespace.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Post-copy migration has been broken on the source since commit
v3.8.0-245-g32c29f10db which implemented support for
pause-before-switchover QEMU migration capability.
Even though the migration itself went well, the source did not really
know when it switched to the post-copy mode despite the messages logged
by MIGRATION event handler. As a result of this, the events emitted by
source libvirtd were not accurate and statistics of the completed
migration would cover only the pre-copy part of migration. Moreover, if
migration failed during the post-copy phase for some reason, the source
libvirtd would just happily resume the domain, which could lead to disk
corruption.
With the pause-before-switchover capability enabled, the order of events
emitted by QEMU changed:
pause-before-switchover
disabled enabled
MIGRATION, postcopy-active STOP
STOP MIGRATION, pre-switchover
MIGRATION, postcopy-active
The STOP even handler checks the migration status (postcopy-active) and
sets the domain state accordingly. Which is sufficient when
pause-before-switchover is disabled, but once we enable it, the
migration status is still active when we get STOP from QEMU. Thus the
domain state set in the STOP handler has to be corrected once we are
notified that migration changed to postcopy-active.
This results in two SUSPENDED events to be emitted by the source
libvirtd during post-copy migration. The first one with
VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED detail, while the second one reports
the corrected VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY detail. This is
inevitable because we don't know whether migration will eventually
switch to post-copy at the time we emit the first event.
https://bugzilla.redhat.com/show_bug.cgi?id=1647365
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For metadata locking we might need an extra fork() which given
latest attempts to do fewer fork()-s is suboptimal. Therefore,
there will be a qemu.conf knob to {en|dis}able this feature. But
since the feature is actually not metadata locking itself rather
than remembering of the original owner of the file this is named
as 'rememberOwner'. But patches for that feature are not even
posted yet so there is actually no qemu.conf entry in this patch
nor a way to enable this feature.
Even though this is effectively a dead code for now it is still
desired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The TPM code currently accepts pointer to a domain definition.
This is okay for now, but in near future the security driver APIs
it calls will require domain object. Therefore, change the TPM
code to accept the domain object pointer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit ("qemu_domain.c: moving maxCpu validation to
qemuDomainDefValidate") shortened the code of qemuProcessStartValidateXML.
The function is called only by qemuProcessStartValidate, in the
same file, and its code is now a single check that calls virDomainDefValidate.
Instead of leaving a function call just to execute a single check,
this patch puts the check in the body of qemuProcessStartValidate in the
place where qemuProcessStartValidateXML was being called. The function can
now be removed.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Previous patch removed the call to qemuProcessValidateCpuCount
from qemuProcessStartValidateXML, in qemu_process.c. The only
caller left is qemuDomainDefValidate, in qemu_domain.c.
Instead of having a public function declared inside qemu_process.c
that isn't used in that file, this patch moves the function to
qemu_domain.c, making in static and renaming it to
qemuDomainValidateCpuCount to be compliant with other static
functions names in the file.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Adding maxCpu validation in qemuDomainDefValidate allows the user to
spot over the board maxCpus counts at editing time, instead of
facing a runtime error when starting the domain. This check is also
arch independent.
This leaves us with 2 calls to qemuProcessValidateCpuCount: one in
qemuProcessStartValidateXML and the new one at qemuDomainDefValidate.
The call in qemuProcessStartValidateXML is redundant. Following
up in that code, there is a call to virDomainDefValidate, which
in turn will call config.domainValidateCallback. In this case, the
callback function is qemuDomainDefValidate. This means that, on startup
time, qemuProcessValidateCpuCount will be called twice.
To avoid that, let's also remove the qemuProcessValidateCpuCount call
from qemuProcessStartValidateXML.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
qemuValidateCpuCount validates the maxCpus value of a domain at
startup time, preventing it to start if the value exceeds a maximum.
This checking is also done at qemu_domain.c, qemuDomainDefValidate.
However, it is done only for x86 (and even then, in a specific
scenario). We want this check to be done for all archs.
To accomplish this, let's first make qemuValidateCpuCount public so
it can be used inside qemuDomainDefValidate. The function was renamed
to qemuProcessValidateCpuCount to be compliant with the other public
methods at qemu_process.h. The method signature was slightly adapted
to fit the const 'def' variable used in qemuDomainDefValidate. This
change has no downside in in its original usage at
qemuProcessStartValidateXML.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Adding the maxCpus value in the error message of qemuValidateCpuCount
allows the user to set an acceptable maxCpus count without knowing
QEMU internals.
x86 guests, that might have been created prior to the x86
qemuDomainDefValidate maxCpus check code (that validates the maxCpus value
in editing time), will also benefit from this change.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Support Hyper-V Enlightened VMCS in domain config. QEMU support will
be implemented in the next patch, adding interim VIR_DOMAIN_HYPERV_EVMCS
cases to src/qemu/* for now.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
QEMU 3.1 supports Hyper-V-style PV IPIs making it cheaper for Windows
guests to send an IPI, especially when it targets many CPUs.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Support Hyper-V PV IPI enlightenment in domain config. QEMU support will
be implemented in the next patch, adding interim VIR_DOMAIN_HYPERV_IPI
cases to src/qemu/* for now.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1631622
If polkit authentication is enabled, an attempt to open
the connection failed during virAccessDriverPolkitGetCaller
when the call to virIdentityGetCurrent returned NULL resulting
in the errors:
virAccessDriverPolkitGetCaller:87 : access denied:
Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>
Because qemuProcessReconnect runs in a thread during
daemonRunStateInit processing it doesn't have the thread
local identity. Thus when the virGetConnectNWFilter is
called as part of the qemuProcessFiltersInstantiate when
virDomainConfNWFilterInstantiate is run the attempt to get
the idenity fails and results in the anonymous error above.
To fix this, let's grab/use the virIdenityPtr of the process
that will be creating the thread, e.g. what daemonRunStateInit
has set and use that for our thread. That way any other similar
processing that uses/requires an identity for any other call
that would have previously been successfully run won't fail in
a similar manner.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add functions for creating, destroying, reconnecting resctrl
monitor in qemu according to the configuration in domain XML.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch introduces a new shutdown reason "daemon" in order
to indicate that the daemon needed to force shutdown the domain
as the best course of action to take at the moment.
This action would occur during reconnection when processing
encounters an error once the monitor reconnection is successful.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The gotShutdown bool has been redundant since we started setting
VIR_DOMAIN_SHUTDOWN state after receiving SHUTDOWN event from QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
If gotShutdown is true, the domain state cannot be running because of
the following code in qemuProcessHandleShutdown:
priv->gotShutdown = true;
VIR_DEBUG("Transitioned guest %s to shutdown state",
vm->def->name);
virDomainObjSetState(vm,
VIR_DOMAIN_SHUTDOWN,
VIR_DOMAIN_SHUTDOWN_UNKNOWN);
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since commit v4.7.0-302-ge6d77a75c4 processing RESUME event is mandatory
for updating domain state. But the event handler explicitly ignored this
event in some cases. Thus the state would be wrong after a fake reboot
or when a domain was rebooted after it crashed.
BTW, the code to ignore RESUME event after SHUTDOWN didn't make sense
even before making RESUME event mandatory. Most likely it was there as a
result of careless copy&paste from qemuProcessHandleStop.
The corresponding debug message was clarified since the original state
does not have to be "paused" only and while we have a "resumed" event,
the state is called "running".
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The current qemuProcessReconnect logic paints a broad brush
determining that the shutdown reason must be crashed if it was
determined that the domain was started with -no-shutdown; however,
there's many other ways to get to the error label, so let's narrow
our reasoning window for using VIR_DOMAIN_SHUTOFF_CRASHED to the
period where we essentially know we've tried to create to the
monitor and before we were successful in opening the connection.
Failures that occur outside that window would thus be considered
as VIR_DOMAIN_SHUTOFF_UNKNOWN, at least for now.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
When qemuProcessReconnectHelper was introduced (commit d38897a5d)
reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
or VIR_DOMAIN_SHUTOFF_UNKNOWN.
When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
So introduce qemuDomainIsUsingNoShutdown which will manage the
condition when the domain was started with -no-shutdown so that
when/if reconnection failure occurs we can restore the decision
point used to determine whether CRASHED or UNKNOWN is provided.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
IOThread pids info will lost after libvirtd restart, then
if we call pinIOThread, sched_setaffinity will be called with
pid 0, not IOThread pid. So pinIOThread cannot work normally.
Signed-off-by: Jie Wang <wangjie88.huawei.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The QEMU @cfg config variable is unused in context of qemuProcessInit,
let's drop it.
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This function updates the used QEMU capabilities of @vm by querying
the QEMU capabilities cache.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The only place where VIR_DOMAIN_EVENT_RESUMED should be generated is the
RESUME event handler to make sure we don't generate duplicate events or
state changes. In the worse case the duplicity can revert or cover
changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
we could happily mark the domain as running and report
VIR_DOMAIN_EVENT_RESUMED to registered clients.
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Thanks to the previous commit the RESUME event handler knows what reason
should be used when changing the domain state to VIR_DOMAIN_RUNNING, but
the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic
VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can
be easily deduced from the running reason, which saves us from having to
pass one more value to the handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Whenever we get the RESUME event from QEMU, we change the state of the
affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED
reason. This is fine if the domain is resumed unexpectedly, but when we
sent "cont" to QEMU we usually have a better reason for the state
change. The better reason is used in qemuProcessStartCPUs which also
sets the domain state to running if qemuMonitorStartCPUs reports
success. Thus we may end up with two state updates in a row, but the
final reason is correct.
This patch is a preparation for dropping the state change done in
qemuMonitorStartCPUs for which we need to pass the actual running reason
to the RESUME event handler and use it there instead of
VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED
reasons when changing domain state to running with more specific ones.
All of them are done when libvirtd reconnects to an existing domain
after being restarted and sees an unfinished migration or save.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Once we introduce cgroup v2 support we need to handle processes and
threads differently.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroup v2 we need to handle processes and threads differently,
following patch will introduce virCgroupAddThread.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In a following case:
virsh start $domain
service libvirtd stop
<shutdown> the guest from within the $domain
service libvirtd start
Notice that PCI devices which have been assigned to the $domain will
still be bound to stub drivers instead rebound to host drivers.
In that case the call stack is like below:
libvirtd start
qemuProcessReconnect
qemuProcessStop (because $domain was shutdown without
libvirtd event to process that)
qemuHostdevReAttachDomainDevices
qemuHostdevReAttachPCIDevices
virHostdevReAttachPCIDevices
However, because qemuHostdevUpdateActiveDomainDevices was called
after the qemuConnectMonitor, the setup of the tracking of each
host device in the $domain on either the activePCIHostdevs list
or inactivePCIHostdev list will not occur in an orderly manner.
Therefore, virHostdevReAttachPCIDevices just neglects these host PCI
devices which are bound to stub drivers and doesn't rebind them to
host drivers.
This patch fixs that by moving qemuHostdevUpdateActiveDomainDevices before
qemuConnectMonitor during libvirtd reconnection processing.
Signed-off-by: Wu Zongyong <cordius.wu@huawei.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the new qemuDomainRemoveInactiveJobLocked to remove the
@obj during the virDomainObjListForEach call which holds a
lock on the domain object list.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1607202
It's essentially stated in the nwfilterBindingDelete that we
will allow the admin to shoot themselves in the foot by deleting
the nwfilter binding which then allows them to undefine the
nwfilter that is in use for the running guest...
However, by allowing this we cause a problem for libvirtd
restart reconnect processing which would then try to recreate
the missing binding attempting to use the deleted filter
resulting in an error and thus shutting the guest down.
So rather than keep adding virDomainConfNWFilterInstantiate
flags to "ignore" specific error conditions, modify the logic
to ignore, but VIR_WARN errors other than ignoreExists. This
will at least allow the guest to not shutdown for only nwfilter
binding errors that we can now perhaps recover from since we
have the binding create/delete capability.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Even though the current use of the function does not require full
implementation with transactions (none of the callers pass a path
somewhere under /dev), it doesn't hurt either. Moreover, in
future patches the paradigm is going to shift so that any API
that touches a file is required to use transactions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The qemuSecurityDomainSetPathLabel() function reports perfect
error itself. Do not overwrite it to something less meaningful.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemuProcessInitCpuAffinity prevents a VM from getting started on a
platform that uses cpu affinity wrapper stubs e.g. macOS.
The patch adds qemuProcessInitCpuAffinity stub on all platforms without
HAVE_SCHED_GETAFFINITY or HAVE_BSD_CPU_AFFINITY.
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
It was found that in cases with host devices virProcessKillPainfully
might be able to send signal zero to the target PID for quite a while
with the process already being gone from /proc/<PID>.
That is due to cleanup and reset of devices which might include a
secondary bus reset that on top of the actions taken has a 1s delay
to let the bus settle. Due to that guests with plenty of Host devices
could easily exceed the default timeouts.
To solve that, this adds an extra delay of 2s per hostdev that is associated
to a VM.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Switch to using the QOM/qdev handles in all calls to
qemuMonitorGetBlockInfo when using -blockdev. The callers also need to
make sure to use the correct handle afterwards to extract the data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use the 'node-name' provided in the event if 'device' is empty to look
up the disk.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add handling of the 'id' field in the event which corresponds to the
QDEV id of the device.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Allow looking up also via QOM id and rename the function accordingly.
Also add documentation of the specifics.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The proper way to do this would be to use the 'throttle' driver but
unfortunately it can't change the 'throttle_group' so we can't provide
feature parity. This hack uses the block_set_io_throttle command to do
so until we can properly replace it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Node names for block objects in qemu need to be unique for an instance
of the qemu process. Add a counter to generate objects sequentially and
store it in the status XML so that we can restore it.
The helpers added allow to create new node names and reset the counter
after the VM process terminates.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We'll specify them ourselves so it's pointless to attempt to redetect
them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We need to load the backing chain from the XML when using -blockdev.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
SD cards are currently passed by using -drive only which would not be
compatible with using -blockdev fully.
Clear QEMU_CAPS_BLOCKDEV if the VM has such devices.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Currently we'd report the alias of the drive which is backing the cdrom
rather than the device itself:
$ virsh event ds tray-change --loop
event 'tray-change' for domain ds disk drive-ide0-0-1: opened
event 'tray-change' for domain ds disk drive-ide0-0-1: closed
Report the disk device alias as we document in the API docs:
https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventTrayChangeCallback
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Qemu-3.0 supports Hyper-V-style PV TLB flush, Windows guests can benefit
from this feature as KVM knows which vCPUs are not currently scheduled (and
thus don't require any immediate action).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Qemu-3.0 supports so-called 'Reenlightenment' notifications and this (in
conjunction with 'hv-frequencies') can be used make Hyper-V on KVM pass
stable TSC page clocksource to L2 guests.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Qemu-2.12 gained 'hv-frequencies' cpu flag to enable Hyper-V frequency
MSRs. These MSRs are required (but not sufficient) to make Hyper-V on
KVM pass stable TSC page clocksource to L2 guests.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Resctrl not only supports cache tuning, but also memory bandwidth
tuning. Renaming cachetune to resctrl to reflect that. With resctrl,
all allocation for different resources (cache, memory bandwidth) are
aggregated and represented by a virResctrlAllocPtr inside
virDomainResctrlDef.
Signed-off-by: Bing Niu <bing.niu@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This reverts commit 0f80c71822.
Turns out, our code relies on virCgroupFree(&var) setting
var = NULL.
Conflicts:
src/util/vircgroup.c: context because 94f1855f09 is not
reverted.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Modify virCgroupFree function signature to take a value of type
virCgroupPtr instead of virCgroupPtr * as the parameter.
Change the argument type in all calls to virCgroupFree function
from virCgroupPtr * to virCgroupPtr. This is a step towards
having consistent function signatures for Free helpers so that
they can be used with VIR_AUTOPTR cleanup macro.
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Use of enum types for struct fields is generally avoided since it causes
warnings if the compiler assumes the enum is unsigned. For example
commit 8e2982b576
Author: Cole Robinson <crobinso@redhat.com>
Date: Tue Jul 24 16:27:54 2018 -0400
conf: Clean up virDomainDefParseCaps
Introduced a line:
if ((def->virtType = virDomainVirtTypeFromString(virttype)) < 0) {
which causes a build failure with CLang
conf/domain_conf.c:19143:65: error: comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-compare]
as the compiler is free to optimize away the "< 0" check due to the
assumption that the enum type is unsigned and always in range.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
With 'switch' we can utilize the compile time enum checks which we can't
rely on with plain 'if' conditions.
Signed-off-by: Shi Lei <shilei.massclouds@gmx.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
In some cases backing chain needs to be cleared prior to re-detection.
Move this step out of qemuDomainDetermineDiskChain as only certain
places need it and the function itself is able to skip to the end of the
chain to perform detection.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Since 2.10 QEMU supports a new display type egl-headless which uses the
drm nodes for OpenGL rendering copying back the rendered bits back to
QEMU into a dma-buf which can be accessed by standard "display" apps
like VNC or SPICE. Although this display type can be used on its own,
for any practical use case it makes sense to pair it with either VNC or
SPICE display. The clear benefit of this display is that VNC gains
OpenGL support, which it natively doesn't have, and SPICE gains remote
OpenGL support (native OpenGL support only works locally through a UNIX
socket, i.e. listen type=socket/none).
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
If qemu-pr-helper process died while libvirtd was not running no
event is emitted. Therefore, when reconnecting to the monitor we
must check the qemu-pr-helper process status and act accordingly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This event is emitted on the monitor if one of pr-managers lost
connection to its pr-helper process. What libvirt needs to do is
restart the pr-helper process iff it corresponds to managed
pr-manager.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Users have possibility to disable qemu namespace feature (e.g.
because they are running on *BSD which lacks Linux NS support).
If that's the case we should not try to move qemu-pr-helper into
the same namespace as qemu is in.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When using domxml-to-native, we must generate CLI args that can be used
in a standalone scenario. This means no FD passing can be used. To
achieve this we must clear the QEMU_CAPS_CHARDEV_FD_PASS capability bit.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Remove the callbacks that the nwfilter driver registers with the domain
object config layer. Instead make the current helper methods call into
the public API for creating/deleting nwfilter bindings.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The UNIX socket FDs were we passing to QEMU inherited a label based on
libvirtd's context. QEMU is thus denied ability to access the UNIX
socket. We need to use the security manager to change our current
context temporarily when creating the UNIX socket FD.
Reviewed-by: Laine Stump <laine@laine.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Be more consistent and use 'preparing' instead of 'prepare' here.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
When commit 6718132d enforced usage of the cleanup label, it forgot to
set the @ret variable to 0 on "success" exit path.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Some identifiers use Sev, some SEV. Prefer the latter.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
A common cleanup path for both the success and the error case.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Make the function prefix match the file it's in.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
And replace all calls with virObjectEventStateQueue such that:
qemuDomainEventQueue(driver, event);
becomes:
virObjectEventStateQueue(driver->domainEventState, event);
And remove NULL checking from all callers.
Signed-off-by: Anya Harter <aharter@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
QEMU >= 2.12 provides 'sev-guest' object which is used to launch encrypted
VMs on AMD platform using SEV feature. The various inputs required to
launch SEV guest is provided through the <launch-security> tag. A typical
SEV guest launch command line looks like this:
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=5 ...\
-machine memory-encryption=sev0 \
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Add the external swtpm to the emulator cgroup so that upper limits of CPU
usage can be enforced on the emulated TPM.
To enable this we need to have the swtpm write its process id (pid) into a
file. We then read it from the file to configure the emulator cgroup.
The PID file is created in /var/run/libvirt/qemu/swtpm:
[root@localhost swtpm]# ls -lZ /var/run/libvirt/qemu/swtpm/
total 4
-rw-r--r--. 1 tss tss system_u:object_r:qemu_var_run_t:s0 5 Apr 10 12:26 1-testvm-swtpm.pid
srw-rw----. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c597,c632 0 Apr 10 12:26 1-testvm-swtpm.sock
The swtpm command line now looks as follows:
root@localhost testvm]# ps auxZ | grep swtpm | grep socket | grep -v grep
system_u:system_r:virtd_t:s0:c597,c632 tss 18697 0.0 0.0 28172 3892 ? Ss 16:46 0:00 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/var/run/libvirt/qemu/swtpm/1-testvm-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/485d0004-a48f-436a-8457-8a3b73e28568/tpm1.2/ --log file=/var/log/swtpm/libvirt/qemu/testvm-swtpm.log --pid file=/var/run/libvirt/qemu/swtpm/1-testvm-swtpm.pid
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Implement functions for managing the storage of the external swtpm as well
as starting and stopping it. Also implement functions to use swtpm_setup,
which simulates the manufacturing of a TPM, which includes creation of
certificates for the device.
Further, the external TPM needs storage on the host that we need to set
up before it can be run. We can clean up the host once the domain is
undefined.
This patch also implements a small layer for external device support that
calls into the TPM device layer if a domain has an attached TPM. This is
the layer we will wire up later on.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Replace instances where we previously called virGetLastError just to
either get the code or to check if an error exists with
virGetLastErrorCode to avoid a validity pre-check.
Signed-off-by: Ramy Elkest <ramyelkest@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Since libvirt called bind() and listen() on the UNIX socket, it is
guaranteed that connect() will immediately succeed, if QEMU is running
normally. It will only fail if QEMU has closed the monitor socket by
mistake or if QEMU has exited, letting the kernel close it.
With this in mind we can remove the retry loop and timeout when
connecting to the QEMU monitor if we are doing FD passing. Libvirt can
go straight to sending the QMP greeting and will simply block waiting
for a reply until QEMU is ready.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now that the old qcow2 encryption is removed we can safely delete all
this code since it's not needed any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Create a new vsock endpoint by opening /dev/vhost-vsock,
set the requested CID via ioctl (or assign a free one if auto='yes'),
pass the file descriptor to QEMU and build the command line.
https://bugzilla.redhat.com/show_bug.cgi?id=1291851
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Before we generate the command line for qemu, if the domain about to
be launched desires to utilize the VM Generation ID functionality, then
handle both the regenerating the GUID value for backup recovery (restore
operation) and the startup after snapshot as both require a new GUID to
be generated to allow the guest operating system to recognize the VM
is re-executing something that has already executed before.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
After the text monitor was deleted this event can't be triggered.
Remove it and all the unnecessary code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We currently print the libvirt and qemu version strings into the
per-guest logfile. It would be useful to know what kernel is running
too, so add that.
Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt only manages one PR daemon. This means that we don't need to
pass the 'disk' object and also rename the functions dealing with this
so that it's obvious we only deal with the managed PR daemon.
Signed-off-by: Peter Krempa <pkrempa@redhat st.com>
Rather than always checking which path to use pre-assign it when
preparing storage source.
This reduces the need to pass 'vm' around too much. For later use the
path can be retrieved from the status XML.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When attaching a disk that requires pr-manager we might need to
plug the pr-manager object and start the pr-helper process.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>