Require that all headers are guarded by a symbol named
LIBVIRT_$FILENAME
where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.
Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This introduces a syntax-check script that validates header files use a
common layout:
/*
...copyright header...
*/
<one blank line>
#ifndef SYMBOL
# define SYMBOL
....content....
#endif /* SYMBOL */
For any file ending priv.h, before the #ifndef, we will require a
guard to prevent bogus imports:
#ifndef SYMBOL_ALLOW
# error ....
#endif /* SYMBOL_ALLOW */
<one blank line>
The many mistakes this script identifies are then fixed.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
For some reason, xdr_free uses char * instead of void * for its 2nd
argument which is passed to a custom free routine. Commit
dc54b3ec missed this detail which made the build fail on a number of
platforms. Fix it by explicitly casting the object pointer to char *
just like we do in other places throughout the code base.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
When libvirt is reconnecting to running domain that uses cgroup v2
the QEMU process reports cgroup for the emulator directory because the
main thread is in that cgroup. We need to remove the "/emulator" part
in order to match with the root cgroup directory name for that domain.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The rewrite to support cgroup v2 missed this function. In cgroup v2
we have different files to track tasks.
We would fail to remove cgroup on non-systemd OSes if there is any
extra process assigned to guest cgroup because we would not kill any
process form the guest cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Turns out there some build platforms that must not define MOUNT
or VGCHANGE in config.h... So moving the commands from the storage
backend specific module into a common storage_util module causes
issues for those platforms.
So instead of assuming they are there, let's just pass the command
string to the storage util API's from the storage backend specific
code (as would have been successful before). Also modify the test
to determine whether the MOUNT and/or VGCHANGE doesn't exist and
just define it to (for example) what Fedora has for the path. Could
have just used "mount" and "vgchange" in the call, but that defeats
the purpose of adding the call to virTestClearCommandPath.
Signed-off-by: John Ferlan <jferlan@redhat.com>
The make_nonnull_XXX methods can all fail due to OOM but this was being
silently ignored and thus also not checked by callers. Make the methods
propagate errors and use ATTRIBUTE_RETURN_CHECK to force callers to deal
with it.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Support for nested KVM is handled via a kernel module configuration
parameters values for kvm_intel, kvm_amd, kvm_hv (PPC), or kvm (s390).
While it's possible to fetch the kmod config values via virKModConfig,
unfortunately that is the static value and we need to get the
current/dynamic value from the kernel file system.
So this patch adds a new API virHostKVMSupportsNesting that will
search the 3 kernel modules to get the nesting value and check if
it is 'Y' (or 'y' just in case) to return a true/false whether
the KVM kernel supports nesting.
We need to do this in order to handle cases where adjustments to
the value are made after libvirtd is started to force a refetch of
the latest QEMU capabilities since the correct CPU settings need
to be made for a guest to add the "vmx=on" to/for the guest config.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1656255
If virSecretGetSecretString is using by secretLookupByUUID,
then it's possible the found sec->usageType doesn't match the
desired @secretUsageType. If this occurs for the encrypted
volume creation processing and a subsequent pool refresh is
executed, then the secret used to create the volume will not
be found by the storageBackendLoadDefaultSecrets which expects
to find secrets by VIR_SECRET_USAGE_TYPE_VOLUME.
Add a check to virSecretGetSecretString to avoid the possibility
along with an error indicating the incorrect matched types.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Add the logical storage pool startup validation (xml2argv) tests.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
It's only pass as 0 or 1 and used as a bool, let's just use a bool
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Let's create helpers for each style of command line created. This
primarily is easier on the eyes rather than the large multi line
if-then-else-else clause used, but may also be useful if in the
future any particular pool needs to add to the command line based
on pool xml format.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Move virStorageBackendFileSystemMountCmd to storage_util so that
it can be used by the test harness.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Extract out the code that is used to create the MOUNT command
for starting the pool. We can use this for Storage Pool XML
to Argv testing to ensure code changes don't alter how a
storage pool is started.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1624223
There are two ways to request memory preallocation on cmd line:
-mem-prealloc and .prealloc attribute for a memory-backend-file.
However, as it turns out it's not safe to use both at the same
time. If -mem-prealloc is used then qemu will fully allocate the
memory (this is done by actually touching every page that has
been allocated). Then, if .prealloc=yes is specified,
mbind(flags = MPOL_MF_STRICT | MPOL_MF_MOVE) is called which:
a) has to (possibly) move the memory to a different NUMA node,
b) can have no effect when hugepages are in play (thus ignoring user
request to place memory on desired NUMA nodes).
Prefer -mem-prealloc as it is more backward compatible
compared to switching to "-numa node,memdev= + -object
memory-backend-file".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
So far we have two arguments that we are passing to
qemuBuildMemoryBackendProps() and that are taken from domain
private data: @qemuCaps and @autoNodeset. In the next commit I
will use one more item from there. Therefore, instead of having
it as yet another argument to the function, pass pointer to the
private data object.
There is one change in qemuDomainAttachMemory() where previously
@autoNodeset was NULL but now is priv->autoNodeset (which may be
set). This is safe to do as @autoNodeset is advisory only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1624336
Add a check during virDomainDefCompatibleDevice whether the
domain supports cold/hotplug of a memory module even though
this duplicates the qemuDomainDefValidateMemoryHotplug check.
Without this check, the cold/hot plug would fail on the
subsequent mem_memory check (since it's 0). Adding a check
for max_memory > 0 would allow the subsequent hotplug check
to fail, but would cause coldplug to fail with the somewhat
opaque message "no free memory device slot available".
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
If virDomainDefCompatibleDevice fails because there is insufficient
domain def->mem.max_memory, then let's also print out that value in
the error message.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
The QEMU validation code for graphics has been in place for a while, but
because it is only executed from virDomainDeviceInfoIterateInternal, it
was never run, since the iterator expects the device to have boot info
which graphics don't have. The unfortunate side effect of this whole mess
was that a few capabilities were missing from the test suite (as commit
d8266ebe1 demonstrated with graphics-spice-invalid-egl-headless test),
which in turn meant that a few graphics tests which expected a failure
happily accepted any failure the test runtime returned which made them
succeed. The impact of this was that we then allowed to start a domain
with multiple OpenGL-enabled graphics devices.
This patch enables iteration over graphics devices. Unsurprisingly,
a few tests started to fail as a result, so fix those too.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Validation of domain devices is accomplished via a generic device
iterator which takes a callback, iterates over all kinds of supported
device types and invokes the callback on every single device. However,
there might be cases when we need to alter the behaviour of the
iteration (most notably skip or include a group of devices). Therefore,
this patch introduces iterator flags.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since the code was never run, it would have been very hard to spot this
mistake, especially since the compiler can't really warn about it.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This commit fixes a bug when you have multiple network settings defined.
Basically, if you set an IPv6 or IPv4 gateway, it carries on next
network settings. It is happening because the data is not being
initialized when a new network type is defined. So, the old data still
persists into the pointer. Another way to initialized the data was
introduced using memset() to avoid missing attributes from the struct.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit 255e0732 introduced a few graphics-related helpers. The problem
is that virDomainGraphicsNeedsAutoRenderNode returns true if it gets
NULL as a response from virDomainGraphicsNeedsAutoRenderNode. That's
okay for egl-headless because that one always needs a DRM render node,
the same is not true for SPICE though, and unless the XML specifies
<gl enable='yes'> for SPICE, there's no need for any renderer.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Disable external snapshot of a readonly disk for domains as
this operation is not very useful. Such a snapshot is not
possible for active domains but the error message from QEMU
is more cryptic:
error: internal error: unable to execute QEMU command 'transaction':
Could not create file: Permission denied
This error at least makes the error more understandable for
active domains and disallows for inactive domains as well.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
If domain is killed with `xl destroy`, libvirt will not notice it and
still report the domain as running. Also trying to destroy the domain
through libvirt will fail. The only way to recover from such a situation
is to restart libvirt daemon. The problem is that even though libxl
report LIBXL_EVENT_TYPE_DOMAIN_DEATH, libvirt ignore it as all the
domain cleanup is done in a function actually destroying the domain. If
destroy is done outside of libvirt, there is no place where it would be
handled.
Fix this by doing domain cleanup in LIBXL_EVENT_TYPE_DOMAIN_DEATH too.
To avoid doing it twice, add a ignoreDeathEvent flag
libxlDomainObjPrivate, set when the domain death is triggered by libvirt
itself.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Since domain was suspended before and on failed wakeup is destroyed,
send an event.
Also, add missing libxlDomainCleanup.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Commit 017dfa27d changed a few switch statements in the LXC code to
have all possible enum values, and in the process changed the switch
statement in virLXCControllerGetNICIndexes() to return an error status
for unsupported interface types, but it erroneously put type='direct'
on the list of unsupported types.
type='direct' (implemented with a macvlan interface) is supported on
LXC, but it's interface shouldn't be placed on the list of interfaces
given to CreateMachineWithNetwork() because the interface is put
inside the container, while CreateMachineWithNetwork() only wants to
know about the parent veths of veth pairs (the parent veth remains on
the host side, while the child veth is put into the container).
Resolves: https://bugzilla.redhat.com/1656463
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virLXCControllerGetNICIndexes() was deciding whether or not to add the
ifindex for an interface's ifname to the list of ifindexes sent to
CreateMachineWithNetwork based on the interface type stored in the
config. This would be incorrect in the case of <interface
type='network'> where the network was giving out macvlan interfaces
tied to a physical device (i.e. when the actual interface type was
"direct").
Instead of checking the setting of "net->type", we should be checking
the setting of virDomainNetGetActualType(net).
I don't think this caused any actual misbehavior, it was just
technically wrong.
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Most of the iptables APIs share code for the add/delete paths, but a
couple were separated. Merge the remaining APIs to facilitate future
changes.
Reviewed-by: Laine Stump <laine@laine.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add support for converting openvswitch interface configuration
to/from libvirt domXML and xl.cfg(5). The xl config syntax for
virtual interfaces is described in detail in the
xl-network-configuration(5) man page. The Xen Networking wiki
also contains information and examples for using openvswitch
in xl.cfg config format
https://wiki.xenproject.org/wiki/Xen_Networking#Open_vSwitch
Tests are added to check conversions of openvswitch tagged and
trunked VLAN configuration.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
It is currently possible to use <interface>s of type openvswitch
with the libxl driver in a non-standard way, e.g.
<interface type='bridge'>
<source bridge='ovsbr0'/>
<mac address='00:16:3e:7a:35:ce'/>
<script path='vif-openvswitch'/>
</interface>
This patch adds support for openvswitch <interface>s specified
in typical libvirt config
<interface type='bridge'>
<source bridge='ovsbr0'/>
<mac address='00:16:3e:7a:35:ce'/>
<virtualport type='openvswitch'/>
</interface>
VLAN tags and trunking are also supported using the extended
syntax for specifying an openvswitch bridge in libxl
BRIDGE_NAME[.VLAN][:TRUNK:TRUNK]
See Xen's networking wiki for more details on openvswitch support
https://wiki.xenproject.org/wiki/Xen_Networking#Open_vSwitch
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 212dc9286 made a generic qemuDomainGetIOThreadsMon which
would fail if the QEMU_CAPS_OBJECT_IOTHREAD didn't exist. Then
commit d1eac927 used that helper for the collection of all domain
stats. However, if the capability doesn't exist, then the entire
stats collection fails. Since the IOThread stats were meant to be
if available only, thus rather than failing if the capability
doesn't exist, let's just not collect the stats. Restore the caps
failure logic for qemuDomainGetIOThreadsLive.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes
a failure, then when collecting more than one domain's worth of
statistics the loop in virDomainStatsRecordListFree would call
virDomainFree which would call virResetLastError effectively wiping
out the reason we failed leaving the caller with no idea why the
collection failed.
To fix this, let's Preserve the error and Restore it prior to return
so that a caller such as 'virsh domstats' doesn't get the generic
"error: An error occurred, but the cause is unknown".
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We already have a way stricter check in the code which is doing the
snapshot so duplicating it in the parser does not make much sense. Also
gets rid of an ugly ternary operator.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We are preparing a certain disk source passed in as '@src' so the
individual functions should use that rather than disk->src which
corresponds to the top level element of the chain only.
Without this change TLS and persistent reservations would not work for
backing images of a chain when using -blockdev.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function clears and frees the passed buffers on success, but not in
one case of failure. Modify the control flow that the args are always
consumed, record it in the docs and remove few pointless cleanup paths
in callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1656014
An RNG device can consists of more devices than RND device
itself. For instance, in case of EGD there is a chardev that
connects to EGD daemon and feeds the qemu with random data. When
doing RNG device removal we have to remove the associated chardev
as well.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The way that the code is currently written makes my eyes hurt.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There are two functions called from syncNicRxFilterMultiMode:
virNetDevSetRcvAllMulti() and virNetDevSetRcvMulti(). Both of
them return 0 on success and -1 on error. However, currently
their return value is checked for != 0 which conflicts with our
assumptions on retvals: a positive value is still considered
success but with current check it would lead to failure.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Depending on whether QEMU actually supports the option, we can put the
'rendernode' on the '-display egl-headless' cmdline.
https://bugzilla.redhat.com/show_bug.cgi?id=1628892
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Just like for SPICE, we need to change the permissions on the DRI device
used as the @rendernode for egl-headless graphics type.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Just like for SPICE, we need to put the render node DRI device into the
device cgroup list so that users don't need to add it manually via
qemu.conf file.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Just like for SPICE, we need to put the DRI device into the namespace,
otherwise it will be left out from the DAC relabeling process.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Unlike with SPICE and SDL which use the <gl> subelement to enable OpenGL
acceleration, specifying egl-headless graphics in the XML has
essentially the same meaning, thus in case of egl-headless we don't have
a need for the 'enable' element attribute and we'll only be interested
in the 'rendernode' one further down the road.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Since we need to specify the rendernode option onto QEMU cmdline, we
need this union member to retain consistency in how we build the
cmdline.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that we have QAPI introspection of display types in QEMU upstream,
we can check whether the 'rendernode' option is supported with
egl-headless display type.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We're going to need a bit more logic for egl-headless down the road so
prepare a helper just like for the other display types.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Up until now, we formatted 'rendernode=' onto QEMU cmdline only if the
user specified it in the XML, otherwise we let QEMU do it for us. This
causes permission issues because by default the /dev/dri/renderDX
permissions are as follows:
crw-rw----. 1 root video
There's literally no reason why it shouldn't be libvirt picking the DRM
render node instead of QEMU, that way (and because we're using
namespaces by default), we can safely relabel the device within the
namespace.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
A few simple helpers that allow us to determine whether a graphics can
and will need to make use of a DRM render node.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This is the first step towards libvirt picking the first available
render node instead of QEMU. It also makes sense for us to be able to do
that, since we allow specifying the node directly for SPICE, so if
there's no render node specified by the user, we should pick the first
available one. The algorithm used for that is essentially the same as
the one QEMU uses.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Guest network devices can set 'overflow' when there are a number of multicast
ips configured. For virtio_net, the limit is only 64. In this case, the list
of mac addresses is empty and the 'overflow' condition is set. Thus, the guest
will currently receive no multicast traffic in this state.
When 'overflow' is set in the guest, let's turn this into ALLMULTI on the host.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
Support for armv6l qemu guests has been added.
Tested with arm1176 CPU on x86.
Signed-off-by: Stefan Schallenberg <infos@nafets.de>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Post-copy migration has been broken on the source since commit
v3.8.0-245-g32c29f10db which implemented support for
pause-before-switchover QEMU migration capability.
Even though the migration itself went well, the source did not really
know when it switched to the post-copy mode despite the messages logged
by MIGRATION event handler. As a result of this, the events emitted by
source libvirtd were not accurate and statistics of the completed
migration would cover only the pre-copy part of migration. Moreover, if
migration failed during the post-copy phase for some reason, the source
libvirtd would just happily resume the domain, which could lead to disk
corruption.
With the pause-before-switchover capability enabled, the order of events
emitted by QEMU changed:
pause-before-switchover
disabled enabled
MIGRATION, postcopy-active STOP
STOP MIGRATION, pre-switchover
MIGRATION, postcopy-active
The STOP even handler checks the migration status (postcopy-active) and
sets the domain state accordingly. Which is sufficient when
pause-before-switchover is disabled, but once we enable it, the
migration status is still active when we get STOP from QEMU. Thus the
domain state set in the STOP handler has to be corrected once we are
notified that migration changed to postcopy-active.
This results in two SUSPENDED events to be emitted by the source
libvirtd during post-copy migration. The first one with
VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED detail, while the second one reports
the corrected VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY detail. This is
inevitable because we don't know whether migration will eventually
switch to post-copy at the time we emit the first event.
https://bugzilla.redhat.com/show_bug.cgi?id=1647365
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Both VIR_DOMAIN_FEATURE_HPT and VIR_DOMAIN_FEATURE_HTM are
handled in the exact same way, so we can remove some duplicated
code without losing any functionality.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
The call of virResctrlMonitorGetStats will allocate the memory for
holding cache occupancy or memory bandwidth statistics.
This patch adds the function virResctrlMonitorFreeStats as the
opposing action of virResctrlMonitorGetStats to free the memory.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Return a list of virResctrlMonitorStatsPtr instead of
a virResctrlMonitorStats array in virResctrlMonitorGetStats.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Handle PVH domain type in both directions (xen-xl->xml, xml->xen-xl).
And add a test for it.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
builder="hvm" is deprecated since Xen 4.10, new syntax is type="hvm" (or
type="pv", which is default). Since the old one is still supported,
still use it when writing native config, so the config will work on
older Xen too (and will also not complicate tests).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Since this is something between PV and HVM, it makes sense to put the
setting in place where domain type is specified.
To enable it, use <os><type machine="xenpvh">xenpvh</type></os>. It is
also included in capabilities.xml, for every supported HVM guest type - it
doesn't seems to be any other requirement (besides new enough Xen).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Make it easier to share HVM and PVH code where relevant. No functional
change.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
The test driver state (@testDriver) uses it's own reference counting
and locking implementation. Instead of doing that, convert @testDriver
into a virObjectLockable and use the provided functionalities.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
There are certain cases e.g. containers where the sysfs path might
exists, but might fail. Unfortunately the exact restrictions are only
known to libvirt when trying to write to it so we need to try it.
But in case it fails there is no need to fully abort, in those cases try
to fall back to the older ioctl interface which can still work.
That makes setting up a bridge in unprivileged LXD containers work.
Fixes: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802906
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Reported-by: Brian Candler <b.candler@pobox.com>
If migration is cancelled or confirm phase fails the domain
should be kept on the source even if VIR_MIGRATE_UNDEFINE_SOURCE
was requested.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
There are some checks done when parsing a migration cookie. For
instance, one of the checks ensures that the domain is not being
migrated onto the same host. If that is the case, then we are in
big trouble because the @vm is the same domain object used by
source and it has some jobs sets and everything so recovering
from failed cookie parsing would be needlessly hard.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The function currently takes virDomainObjPtr because it's using
both: the domain definition and domain private data.
Unfortunately, this means that in prepare phase we can't parse
migration cookie before putting incoming domain def onto domain
objects list (addressed in the very next commit). Change the
arguments so that virDomainDef and private data are passed
instead of virDomainObjPtr.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
There are several functions called in the cleanup path. Some of
them do report error (e.g. qemuDomainRemoveInactiveJob()) which
may result in overwriting an error reported earlier with some
less useful message.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
virt-aa-helper needs to grant QEMU access to VFIO MDEV devices.
This extends commit 74e86b6b which only covered PCI hostdevs for VFIO-PCI
assignment by now also covering vfio MDEVs.
It has still the same limitations regarding the device lifecycle, IOW we're
unable to predict the actual VFIO device being created, thus we need
wildcards.
Also note that the hotplug case, where apparmor is able to detect the actual
VFIO device during runtime, is already covered by commit 606afafb.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
The path argument of virFileIsDir should be a full name
of file, pathname and filename. Fixed it by passing the
full path name to virFileIsDir.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since the functions only return 0 or 1, they should return bool. I missed the
change when "refactoring" the first commit.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1545732
Implement the QEMU driver mechanism in order to set the polling
parameters for an IOThread within the bounds specified by the
QEMU qapi parameter passing.
Based heavily on patches originally posted by Pavel Hrdina
<phrdina@redhat.com>, but modified to only handle alterations
for a running guest. For the most part the API names changed,
the typed parameters removed the poll enabled value, and the
capabilities check was moved to just before the live attempt
to set. Since changes are only supported for a running guest,
no guest XML alterations were kept.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Add a capability check for IOThread polling (all were added at the
same time, so only one check is necessary).
Based on code originally posted by Pavel Hrdina <phrdina@redhat.com>
with the only changes to include the more recent QEMU releases.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Rather than passing an iothread_id, let's pass a qemuMonitorIOThreadInfo
structure so that a subsequent change to modify the iothread info can
just generate and pass one.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
We're about to add a new state "modify" and thus the function
goes from just Add/Del. Use an enum to manage.
Extracted from code originally posted by Pavel Hrdina
<phrdina@redhat.com>, but placed into a separate patch.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Add functions to set the IOThreadInfo param data for the live guest.
Modify the _qemuMonitorIOThreadInfo to have a flag to indicate when
a value was set so that we don't set a value unless it was desired
to be set.
Based on code originally posted by Pavel Hrdina <phrdina@redhat.com>,
but extracted into a separate patch. Note that qapi expects to receive
integer parameters rather than unsigned long long or unsigned int's.
QEMU does save the value in larger signed 64 bit values eventually.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Create a new API that will allow an adjustment of IOThread
polling parameters for the specified IOThread. These parameters
will not be saved in the guest XML. Currently the only parameters
supported will allow the hypervisor to adjust the parameters used
to limit and alter the scope of the polling interval. The polling
interval allows the IOThread to spend more or less time processing
in the guest.
Based on code originally posted by Pavel Hrdina <phrdina@redhat.com>
to add virDomainAddIOThreadParams and virDomainModIOThreadParams.
Modification of those changes to use virDomainSetIOThreadParams
instead and remove concepts related to saving the data in guest
XML as well as the way to specifically enable the polling parameters.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Process the IOThreads polling stats if available. Generate the
output params record to be returned to the caller with the three
values - poll-max-ns, poll-grow, and poll-shrink.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Separate out the fetch of the IOThread monitor call into a separate
helper so that a subsequent domain statistics change can fetch the raw
IOThread data and parse it as it sees fit.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
If there are IOThread polling values in the query-iothreads return
buffer, then fill them in and set a bool indicating their presence.
This will allow for displaying in a domain stats output eventually.
Note that the QEMU values are managed a bit differently (as int's
stored in int64_t's) than we will manage them (as unsigned long and
int values). This is intentional to allow for value validation
checking when it comes time to provide the values to QEMU.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
virXMLFormatElement() might fail, but we were not checking
its return value.
Fixing this requires us to change virDomainDeviceInfoFormat()
so that it can report an error back to the caller.
Introduced-by: 0d6b87335c
Spotted-by: Coverity
Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In many cases, an early exit from a function would cause
memory allocated by local virBuffer instances not to be
released.
Provide proper cleanup paths to solve the issue.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This avoids setting 'ret' multiple times, which will result
in errors being masked if the first operation fails but the
second one succeeds.
Introduced-by: f183b87fc1
Spotted-by: Coverity
Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Trying to use virlockd to lock metadata turns out to be too big
gun. Since we will always spawn a separate process for relabeling
we are safe to use thread unsafe POSIX locks and take out
virtlockd completely out of the picture.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When metadata locking is enabled that means the security commit
processing will be run in a fork similar to how namespaces use fork()'s
for processing. This is done to ensure libvirt can properly and
synchronously modify the metadata to store the original owner data.
Since fork()'s (e.g. virFork) have been seen as a performance bottleneck
being able to disable them allows the admin to choose whether the
performance 'hit' is worth the extra 'security' of being able to
remember the original owner of a lock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
For metadata locking we might need an extra fork() which given
latest attempts to do fewer fork()-s is suboptimal. Therefore,
there will be a qemu.conf knob to {en|dis}able this feature. But
since the feature is actually not metadata locking itself rather
than remembering of the original owner of the file this is named
as 'rememberOwner'. But patches for that feature are not even
posted yet so there is actually no qemu.conf entry in this patch
nor a way to enable this feature.
Even though this is effectively a dead code for now it is still
desired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The TPM code currently accepts pointer to a domain definition.
This is okay for now, but in near future the security driver APIs
it calls will require domain object. Therefore, change the TPM
code to accept the domain object pointer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Both virProcessRunInMountNamespace() and virProcessRunInFork()
look very similar. De-duplicate the code and make
virProcessRunInMountNamespace() call virProcessRunInFork().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This new helper can be used to spawn a child process and run
passed callback from it. This will come handy esp. if the
callback is not thread safe.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when
the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous
source type could be automatically using a memfd. However, there are
some complications when migrating from different memory backends in
qemu (mainly due to the internal object naming at this point, but
there could be more). For now, it is simpler and safer to simply
introduce a new source type "memfd". Eventually, the "anonymous" type
could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files,
and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
QEMU 3.1 should only expose the property if the host is actually
capable of creating hugetable-backed memfd. However, it may fail
at runtime depending on requested "hugetlbsize".
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit ("qemu_domain.c: moving maxCpu validation to
qemuDomainDefValidate") shortened the code of qemuProcessStartValidateXML.
The function is called only by qemuProcessStartValidate, in the
same file, and its code is now a single check that calls virDomainDefValidate.
Instead of leaving a function call just to execute a single check,
this patch puts the check in the body of qemuProcessStartValidate in the
place where qemuProcessStartValidateXML was being called. The function can
now be removed.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Previous patch removed the call to qemuProcessValidateCpuCount
from qemuProcessStartValidateXML, in qemu_process.c. The only
caller left is qemuDomainDefValidate, in qemu_domain.c.
Instead of having a public function declared inside qemu_process.c
that isn't used in that file, this patch moves the function to
qemu_domain.c, making in static and renaming it to
qemuDomainValidateCpuCount to be compliant with other static
functions names in the file.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Adding maxCpu validation in qemuDomainDefValidate allows the user to
spot over the board maxCpus counts at editing time, instead of
facing a runtime error when starting the domain. This check is also
arch independent.
This leaves us with 2 calls to qemuProcessValidateCpuCount: one in
qemuProcessStartValidateXML and the new one at qemuDomainDefValidate.
The call in qemuProcessStartValidateXML is redundant. Following
up in that code, there is a call to virDomainDefValidate, which
in turn will call config.domainValidateCallback. In this case, the
callback function is qemuDomainDefValidate. This means that, on startup
time, qemuProcessValidateCpuCount will be called twice.
To avoid that, let's also remove the qemuProcessValidateCpuCount call
from qemuProcessStartValidateXML.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
qemuValidateCpuCount validates the maxCpus value of a domain at
startup time, preventing it to start if the value exceeds a maximum.
This checking is also done at qemu_domain.c, qemuDomainDefValidate.
However, it is done only for x86 (and even then, in a specific
scenario). We want this check to be done for all archs.
To accomplish this, let's first make qemuValidateCpuCount public so
it can be used inside qemuDomainDefValidate. The function was renamed
to qemuProcessValidateCpuCount to be compliant with the other public
methods at qemu_process.h. The method signature was slightly adapted
to fit the const 'def' variable used in qemuDomainDefValidate. This
change has no downside in in its original usage at
qemuProcessStartValidateXML.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Adding the maxCpus value in the error message of qemuValidateCpuCount
allows the user to set an acceptable maxCpus count without knowing
QEMU internals.
x86 guests, that might have been created prior to the x86
qemuDomainDefValidate maxCpus check code (that validates the maxCpus value
in editing time), will also benefit from this change.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The commit 89563efc02 fix the
monitor error when closing the QEMU monitor. The QEMU agent
has a problem similar to QEMU monitor. So fix the QEMU agent
with the same method.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We were mistakenly skipping virZPCIDeviceAddressIsEmpty() and
virZPCIDeviceAddressIsValid() when compiling on non-Linux,
which unsurprisingly ended up causing linking failures later
in the build process.
Clue-stick-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This commit adds hotplug support for PCI devices on S390 guests.
There's no need to implement hot unplug for zPCI as QEMU implements
an unplug callback which will unplug both PCI and zPCI device in a
cascaded way.
Currently, the following PCI devices are supported:
virtio-blk-pci
virtio-net-pci
virtio-rng-pci
virtio-input-host-pci
virtio-keyboard-pci
virtio-mouse-pci
virtio-tablet-pci
vfio-pci
SCSIVhost device
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Add new functions to generate zPCI command string and append it to
QEMU command line. And the related tests are added.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch adds new functions for reservation, assignment and release
to handle the uid/fid. If the uid/fid is defined in the domain XML,
they will be reserved directly in the collecting phase. If any of them
is not defined, we will find out an available value for them from the
zPCI address hashtable, and reserve them. For the hotplug case there
might not be a zPCI definition. So allocate and reserve uid/fid the
case. Assign if needed and reserve uid/fid for the defined case.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
We should ensure that QEMU supports zPCI when a zPCI address is defined
in XML and otherwise report an error. This patch introduces a generic
validation function qemuDomainDeviceDefValidateAddress() which calls
qemuDomainDeviceDefValidateZPCIAddress() if address type is PCI address.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch introduces new XML parser/formatter functions. Uid is
16-bit and non-zero. Fid is 32-bit. They are the two attributes of zpci
which is introduced as PCI address element. Zpci element is parsed and
formatted along with PCI address. And add the related test cases.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
In order to add zPCI child element for PCI address, we update
virDomainDeviceInfoFormat() to format device info by helper function
virXMLFormatElement(). Then we could simply format zPCI address into
child buffer later.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The pci-root depends on zpci capability. So autogenerate pci-root if
zpci exists.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch provides a caching mechanism for the device address
extensions uid and fid on S390. For efficient sparse address allocation,
we introduce two hash tables for uid/fid which hold the address set
information per domain. Also in order to improve performance of
searching available value, we introduce our own callbacks for the two
hashtables. In this way, uid/fid is saved in hash key and hash value
could be any non-NULL pointer due to no operation on hash value. That is
also the reason why we don't introduce hash value free callback.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch introduces PCI address extension flag for virDomainDeviceInfo
and virPCIDeviceAddress. The extension flag in virDomainDeviceInfo is
used internally during calculating PCI extension flag. The one in
virPCIDeviceAddress is the duplicate to indicate extension address is
being used. Currently only zPCI extension address is introduced to deal
with 'uid' and 'fid' on the S390 platform.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
QEMU on s390 supports PCI multibus since forever.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Let's introduce zPCI capability.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Add zPCI definitions in preparation of extending the PCI address
with parameters uid (user-defined identifier) and fid (PCI function
identifier).
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Support Hyper-V Enlightened VMCS in domain config. QEMU support will
be implemented in the next patch, adding interim VIR_DOMAIN_HYPERV_EVMCS
cases to src/qemu/* for now.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
QEMU 3.1 supports Hyper-V-style PV IPIs making it cheaper for Windows
guests to send an IPI, especially when it targets many CPUs.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Support Hyper-V PV IPI enlightenment in domain config. QEMU support will
be implemented in the next patch, adding interim VIR_DOMAIN_HYPERV_IPI
cases to src/qemu/* for now.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
This is a simple removal of a duplicated check of the return of the
filter function. There is a nested conditional checking exactly the same
thing since commit c9ede1cf removed the (ret > 0) check condition.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The function qemuDomainGetHostdevPath() is using VIR_FREE to free the
paths stored in tmpPaths. Both syntax analyzer are reporting a warning
about this. Replacing the old method to function
virStringListFreeCount() fixes the warnings/errors.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch introduce the new settings for LXC 3.0 or higher. The older
versions keep the compatibility to deprecated settings for LXC, but
after release 3.0, the compatibility was removed. This commit adds the
support to the refactored settings.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit 901d2b9c introduced virCgroupGetMemoryStat and replaced
the LXC virLXCCgroupGetMemStat logic in commit e634c7cd0. However,
in doing so the replacement wasn't exact as the LXC logic used
getline() to process the cgroup controller data, while the new
virCgroupGetMemoryStat used "memory.stat" manual buffer read/
processing which neglected to forward through @line in order
to read each line in the output.
To fix that, we should be sure to carry forward the @line value
for each line read updating it beyond that current @newLine value
once we've calculated the values that we want.
Signed-off-by: Peter Chubb <peter.chubb@data61.csiro.au>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1631622
If polkit authentication is enabled, an attempt to open
the connection failed during virAccessDriverPolkitGetCaller
when the call to virIdentityGetCurrent returned NULL resulting
in the errors:
virAccessDriverPolkitGetCaller:87 : access denied:
Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>
Because qemuProcessReconnect runs in a thread during
daemonRunStateInit processing it doesn't have the thread
local identity. Thus when the virGetConnectNWFilter is
called as part of the qemuProcessFiltersInstantiate when
virDomainConfNWFilterInstantiate is run the attempt to get
the idenity fails and results in the anonymous error above.
To fix this, let's grab/use the virIdenityPtr of the process
that will be creating the thread, e.g. what daemonRunStateInit
has set and use that for our thread. That way any other similar
processing that uses/requires an identity for any other call
that would have previously been successfully run won't fail in
a similar manner.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1631606
Changes made to manage and utilize a secondary connection
driver to APIs outside the scope of the primary connection
driver have resulted in some confusion processing polkit rules
since the simple "access denied" error message doesn't provide
enough of a clue when combined with the "authentication failed:
access denied by policy" as to which connection driver refused
or failed the ACL check.
In order to provide some context, let's modify the existing
"access denied" error returned from the various vir*EnsureACL
API's to provide the connection driver name that is causing
the failure. This should provide the context for writing the
polkit rules that would allow access via the driver, but yet
still adhere to the virAccessManagerSanitizeError commentary
regarding not telling the user why access was denied.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This reverts commit ccc72d5cbd.
Based on upstream comment to a follow-up patch, this didn't take the
right approach and the right thing to do is revert and rework.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Missed during review and surprisingly my run through Coverity also
didn't see this. I only noticed it when reading the code while fixing
the build breaker for commit 36780a86a.
With all those continues we would leak @stats.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Refactoring qemuDomainGetStatsCpu, make it possible to add
more CPU statistics.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add functions for creating, destroying, reconnecting resctrl
monitor in qemu according to the configuration in domain XML.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Introducing <monitor> element under <cachetune> to represent
a cache monitor.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Introduced virDomainResctrlNew to do the most part of virDomainResctrlAppend
and move the operation of appending resctrl to @def->resctrls out of
function.
Rather than rely on virDomainResctrlAppend to perform the allocation, move
the onus to the caller and make use of virBitmapNewCopy for @vcpus and
virObjectRef for @alloc, thus removing the need to set each to NULL after the
call.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add interfaces monitor group to support operations such
as GetID, SetID, Remove, SetAlloc, etc.
Implement the internal virResctrlMonitorGetStats to fetch all
the statistical data and the virResctrlMonitorGetCacheOccupancy
in order to fetch the cache specific "llc_occupancy" value.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Refactor virResctrlAllocSetID generating an error if an attempt
is made to overwrite the existing value.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add interface for creating the resource monitoring group according
to '@virResctrlMonitor->path'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code for creating resctrl allocation group could be reused
for monitoring group, refactor it for reuse in the later patch.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code of adding PID to the allocation could be reused, refactor it
for later reuse.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add interface for resctrl monitor to determine the path.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The code for determining resctrl allocation path could be reused
for monitor. Refactor it for reuse.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Cache Monitoring Technology (aka CMT) provides the capability
to report cache utilization information of system task.
This patch introduces the concept of resctrl monitor through
data structure virResctrlMonitor.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Refactor schemas and virresctrl to support optional <cache> element
in <cachetune>.
Later, the monitor entry will be introduced and to be placed
under <cachetune>. Either cache entry or monitor entry is
an optional element of <cachetune>.
An cachetune has no <cache> element is taking the default resource
allocating policy defined in '/sys/fs/resctrl/schemata'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When no server name is provided in the URI, modern versions of libxml2
will set the port to '-1'. This is a change from behaviour with earlier
versions which set it to 0.
Libvirt expects the port to be 0 in these cases and as a result we get a
bug when connecting to URIs which lack a server name:
$ virsh -c test+ssh:///default list
error: failed to connect to the hypervisor
error: Cannot recv data: Bad port '-1': Connection reset by peer
This libxml2 change was attempting to fix another bug identified by
libvirt where it didn't roundtrip URIs correctly in:
beb7281055
Essentially libxml2 was not expecting apps to look at the URI port
field when the server name is not provided. This was a reasonable
assumption, but none the less libvirt did look at it :-)
The fix is to ensure we explicitly set port to 0 when server name
is not present, avoiding undefined behaviour for the port field in
libxml2.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This patch introduces a new shutdown reason "daemon" in order
to indicate that the daemon needed to force shutdown the domain
as the best course of action to take at the moment.
This action would occur during reconnection when processing
encounters an error once the monitor reconnection is successful.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Return -1 and report an error message if no transaction is set and
virSecuritySELinuxTransactionCommit is called.
The function description of virSecuritySELinuxTransactionCommit says:
"Also it is considered as error if there's no transaction set and this
function is called."
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
In 4674fc6afd I've implemented transactions for selinux driver.
Well, now that I am working in this area I've noticed a subtle
bug: @ret is initialized to 0 instead of -1. Facepalm.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
VFIO AP has a limitation on a single device per domain, however, when
commit 11708641 added the support for vfio-ap, check for this limitation
was performed as part of the post parse code. Generally, checks like that
should be performed within the driver's validation callback to eliminate
any slight chance of failing in post parse, which could potentially
result in the domain XML config vanishing.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Since we'll need to validate other models apart from VFIO PCI too,
having a helper for each model should keep the code base cleaner.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
There's a lot of stuff going on in src/conf/nodedev_conf which is
sometimes not directly related to config and we're not really consistent
with putting only parser/formatter related stuff here, e.g. like we do
for domains. So, let's start simply by adding a new module
node_device_util containing some of the helpers. Unfortunately, even
though these helpers tend to open a secondary driver connection and would
be much therefore better suited as a nodedev driver module, we can't do
that without pulling headers from the driver into conf/ and that's wrong
because we want conf/ to stay driver-agnostic.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
The gotShutdown bool has been redundant since we started setting
VIR_DOMAIN_SHUTDOWN state after receiving SHUTDOWN event from QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
If gotShutdown is true, the domain state cannot be running because of
the following code in qemuProcessHandleShutdown:
priv->gotShutdown = true;
VIR_DEBUG("Transitioned guest %s to shutdown state",
vm->def->name);
virDomainObjSetState(vm,
VIR_DOMAIN_SHUTDOWN,
VIR_DOMAIN_SHUTDOWN_UNKNOWN);
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
On aarch64, lauch vm with the follow configuration:
<interface type="hostdev" managed="yes">
<mac address="fa:16:3e:14:41:00"/>
<source>
<address type="pci" domain="0x0000" bus="0x01" slot="0x0b" function="0x2"/>
</source>
</interface>
libvirtd will crash when accessing net->model.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
If qemuDomainSnapshotDiscard() fails for any reason (rare,
but possible with an ill-timed ENOMEM or if
qemuDomainSnapshotForEachQcow2() has problems talking to the
qemu guest monitor), then an attempt to retry the snapshot
deletion API will crash because we didn't undo the effects
of virDomainSnapshotDropParent() temporarily rearranging the
internal list structures, and the second attempt to drop
parents will dereference NULL. Fix it by instead noting that
there are only two callers to qemuDomainSnapshotDiscard(),
and only one of the two callers wants the parent to be updated;
thus we can move the call to virDomainSnapshotDropParent()
into a code path that only gets executed on success.
Signed-off-by: Eric Blake <eblake@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Since commit v4.7.0-302-ge6d77a75c4 processing RESUME event is mandatory
for updating domain state. But the event handler explicitly ignored this
event in some cases. Thus the state would be wrong after a fake reboot
or when a domain was rebooted after it crashed.
BTW, the code to ignore RESUME event after SHUTDOWN didn't make sense
even before making RESUME event mandatory. Most likely it was there as a
result of careless copy&paste from qemuProcessHandleStop.
The corresponding debug message was clarified since the original state
does not have to be "paused" only and while we have a "resumed" event,
the state is called "running".
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The array "mount" inside lxc_container is not being checked before for
loop. Clang syntax scan is complaining about this segmentation fault.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The current qemuProcessReconnect logic paints a broad brush
determining that the shutdown reason must be crashed if it was
determined that the domain was started with -no-shutdown; however,
there's many other ways to get to the error label, so let's narrow
our reasoning window for using VIR_DOMAIN_SHUTOFF_CRASHED to the
period where we essentially know we've tried to create to the
monitor and before we were successful in opening the connection.
Failures that occur outside that window would thus be considered
as VIR_DOMAIN_SHUTOFF_UNKNOWN, at least for now.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
When qemuProcessReconnectHelper was introduced (commit d38897a5d)
reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
or VIR_DOMAIN_SHUTOFF_UNKNOWN.
When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
So introduce qemuDomainIsUsingNoShutdown which will manage the
condition when the domain was started with -no-shutdown so that
when/if reconnection failure occurs we can restore the decision
point used to determine whether CRASHED or UNKNOWN is provided.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
V2 of the libxl soft reset patch, which was pushed as commit da4b0fd9,
dropped the hunk that disposed of the libxl_domain_config object. Add
the missing hunk to properly dispose the object.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
The pvops Linux kernel implements machine_ops.crash_shutdown as
static void xen_hvm_crash_shutdown(struct pt_regs *regs)
{
native_machine_crash_shutdown(regs);
xen_reboot(SHUTDOWN_soft_reset);
}
but currently the libxl driver does not handle the soft reset
shutdown event. As a result, the guest domain never proceeds
past xen_reboot(), making it impossible for HVM domains to save
a crash dump using kexec.
This patch adds support for handling the soft reset event by
calling libxl_domain_soft_reset() and re-enabling domain death
events, which is similar to the xl tool handling of soft reset
shutdown event.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
There are too many goto labels in libxlDomainShutdownThread. Convert the
'destroy' and 'restart' labels to helper functions, leaving only the
commonly used pattern of 'endjob' and 'cleanup' labels.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
In libxlDomainShutdownThread, virObjectEventStateQueue is needlessly
called in the destroy and restart labels. The cleanup label aready
queues whatever event was created based on libxl_shutdown_reason.
There is no need to handle destroy and restart differently.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Our HACKING guide forbids these.
There's no point in exempting these from the spacing check
if their existence is against our coding style.
Note that the non-usage of these comments itself is not enforced
by syntax check, probably because of the need to implement a C parser.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1631606
Changes made to manage and utilize a secondary connection
driver to APIs outside the scope of the primary connection
driver have resulted in some confusion processing polkit rules
since the simple "access denied" error message doesn't provide
enough of a clue when combined with the "authentication failed:
access denied by policy" as to which connection driver refused
or failed the ACL check.
In order to provide some context, let's modify the existing
"access denied" error returne from the various vir*EnsureACL
API's to provide the connection driver name that is causing
the failure. This should provide the context for writing the
polkit rules that would allow access via the driver.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 57f5621f modified nwfilterInstantiateFilter to detect when
a filter binding was already present before attempting to add the
new binding and instantiate it. Additionally, the change to
nwfilterStateInitialize to call virNWFilterBindingObjListLoadAllConfigs
(from commit c21679fa3f) to load active domain filter bindings, but
not instantiate them eventually leads to a problem for the QEMU
driver reconnection logic after a daemon restart where the filter
bindings would no longer be instantiated.
Subsequent commit f14c37ce4c replaced the nwfilterInstantiateFilter
with virDomainConfNWFilterInstantiate which uses @ignoreExists to
detect presence of the filter and still did not restore the filter
instantiation call when making the new nwfilter bindings logic active.
Thus in order to instantiate any active domain filter, we will call
virNWFilterBuildAll with 'false' to indicate the need to go through
all the active bindings calling virNWFilterInstantiateFilter to
instantiate the filter bindings.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit cdbe1332 neglected to document the API. So let's add some
details about the algorithm and why it was used to help future
readers understand the issues encountered.
NB: Management of the processing udev device notification is a
delicate balance between the udev process, the scheduler, and when
exactly the data from/for the socket is received. The balance is
particularly important for environments when multiple devices are
added into the system more or less simultaneously such as is done
for mdev or SRIOV. In these cases old libudev blocking on the udev
recv() occurs more frequently. It's expected that future devices
will follow similar algorithms. Even though the algorithm does
present some challenges for older OS's (such as Centos 6), trying
to rewrite the algorithm to fit both models would be more complex
and involve pulling the monitor object out of the private data
lockable object and would need to be guarded by a separate lock.
Devising such an algorithm to work around issues with older OS's
at the expense of more modern OS algorithms in newer event processing
code may result in unexpected issues, so the choice is to encourage
use of newer OS's with newer udev event processing code.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1524230
The qemuBuildVhostuserCommandLine builds command line for
vhostuser type interfaces. It is duplicating some code of the
function it is called from (qemuBuildInterfaceCommandLine)
because of the way it's called. If we merge it into the caller
not only we save a few lines but we also enable checks that we
would have to duplicate otherwise (e.g. QoS availability).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
When we have variables A, B, C then there are two ways to free
them. Either in the order they are declared or the reversed one.
Any other ordering is confusing. In this commit I'm reordering
calls to VIR_FREE in the reversed order.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The result of libssh2_userauth_password is being assigned to 'ret' in
one branch and 'rc' in the other branch. Checks are all done against the
'ret' variable, so one branch never does the correct check.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Adjusting domain format documentation, adding device address
support and adding command line generation for vfio-ap.
Since only one mediated hostdev with model vfio-ap is supported a check
disallows to define domains with more than one such hostdev device.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Chris Venteicher <cventeic@redhat.com>
Introduce vfio-ap capability.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Chris Venteicher <cventeic@redhat.com>
IOThread pids info will lost after libvirtd restart, then
if we call pinIOThread, sched_setaffinity will be called with
pid 0, not IOThread pid. So pinIOThread cannot work normally.
Signed-off-by: Jie Wang <wangjie88.huawei.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
virXMLFormatElement() frees attrBuf on success, but not necessarily
on failure. Most other callers of this function take the time to
reset attrBuf afterwords, but qemuDomainObjPrivateXMLFormatBlockjobs()
was relying on it succeeding, and could thus result in a memory leak.
Signed-off-by: Eric Blake <eblake@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1640465
Weirdly enough, there can be symlinks in the path we are trying
to fix. If it is the case our clever algorithm that finds matches
against mount table won't work. Canonicalize path at the
beginning then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The virFileInData() function should return to the caller if the
current position the passed file is in is a data section or a
hole (and also how long the current section is). At any rate,
upon return from this function (be it successful or not) the
original position in the file is restored. This may mess up with
errno which might have been set earlier. Save the errno into a
local variable so it can be restored for the caller's sake.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The QEMU @cfg config variable is unused in context of qemuProcessInit,
let's drop it.
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
If the learning thread is configured to learn on all ethernet frames
(which is hardcoded) then chances are high that there is a packet on
every iteration of inspecting frames loop. As result we will hang on
shutdown because we don't check threadsTerminate if there is packet.
Let's just check termination conditions on every iteration. Since
we'll check each iteration, the check after pcap_next essentially
is unnecessary since on failure we'd loop back to the top and timeout
and then fail.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1632833
When doing a SCSI passthrough we don't put format= onto the
command line. This causes qemu to probe the format automatically
which ends up in a warning in the domain log and possible qemu
disabling writes to the first block (according to the warning
message).
Based-on-work-of: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The @alloc object returned by virDomainResctrlVcpuMatch is not
properly referenced and un-referenced in virDomainCachetuneDefParse.
This patch fixes this problem.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The URI parser used by libvirt does not populate uri->path if the
trailing slash is missing. The code virStorageSourceParseBackingURI
would then not populate src->path.
As only NBD network disks are allowed to have the 'name' field in the
XML defining the disk source omitted we'd generate an invalid XML which
we'd not parse again.
Fix it by populating src->path with an empty string if the uri is
lacking slash.
As pointed out above NBD is special in this case since we actually allow
it being NULL. The URI path is used as export name. Since an empty
export does not make sense the new approach clears the src->path if the
trailing slash is present but nothing else.
Add test cases now to cover all the various cases for NBD and non-NBD
uris as there was to time only 1 test abusing the quirk witout slash for
NBD and all other URIs contained the slash or in case of NBD also the
export name.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The name is misleading. Change it to 'uristr' so that 'path' can be
reused in the proper context later.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
If the same source gets built twice ('build same source on different
hosts at different times') the resulting files may differ.
Fix this by sorting the hash keys before usage.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
There are couple of things wrong with the current implementation.
The first one is that in the first loop the code tries to build a
list of fuse.glusterfs mount points. Well, since the strings are
allocated in a temporary buffer and are not duplicated this
results in wrong decision made later in the code.
The second problem is that the code does not take into account
subtree mounts. For instance, if there's a fuse.gluster mounted
at /some/path and another FS mounted at /some/path/subdir the
code would not recognize this subdir mount.
Reported-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
If the given path is already a mount point (e.g. a bind mount of
a file, or simply a direct mount point of a FS), then our code
fails to detect that because the first thing it does is cutting
off part after last slash '/'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
On s390x the struct member f_type of statsfs is hard coded to 'unsigned
int'. Change virFileIsSharedFixFUSE() to take a 'long long int' and use
a temporary to avoid pointer-casting.
This fixes the following error:
../../src/util/virfile.c:3578:38: error: cast increases required alignment of target type [-Werror=cast-align]
virFileIsSharedFixFUSE(path, (long *) &sb.f_type);
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
virFileReadValueUint does not log errors for non-existient files,
it merely returns -2.
Commit 12093f1 introduced this.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
-net name= will be deprecated in QEMU 3.1:
commit 101625a4d4ac7e96227a156bc5f6d21a9cc383cd
net: Deprecate the "name" parameter of -net
git describe: v3.0.0-791-g101625a4d4
Use the id option instead, supported since QEMU 1.2:
commit 6687b79d636cd60ed9adb1177d0d946b58fa7717
convert net_client_init() to OptsVisitor
git describe: v1.0-3564-g6687b79d63 contains: v1.2.0-rc0~142^2~8
Thankfully, libvirt only uses -net for non-PCI, non-virtio NICs
on ARM.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
We now explicitly handle media change elsewhere so we can drop the
switch statement. This will also make it more intuitive once CDROM
device hotplug might be supported.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Disk hotplug has slightly different semantics from media changing. Move
the media change code out and add proper initialization of the new
source object and proper cleanups if something fails.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The disk hotplug code also overloads media change which is not ideal.
This will allow splitting out of the media change code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The disk storage source needs to be prepared if we want to use -blockdev
or secrets for the new media image. It does not hurt to do the same for
the legacy hotplug code as well.
Unfortunately helpers like qemuDomainPrepareDiskSource take
virDomainDiskDef as an argument and it would be hard to fix them to take
an explicit source, so the function also temporarily replaces disk->src
for the new source in this function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Some functions require us to replace disk->src with the new source for
them to work properly. To avoid confusion all places which allow
explicit virStorageSource should get the appropriate definition.
The legacy code fortunately does not need anything from the old source
so that does not require modifications.
Blockdev does require the old definition so we'll pass it explicitly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Since the code is also used when changing media we need to allow
specifying explicit source for which we are going to prepare. With this
change callers don't have to replace disk->src with the new source
definition for generating these.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
qemu media changing code tried to assume old media's format for the new
one if that was not specified. Since the format will always be present
it does not make sense to keep the code around.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Old media changing code does not bother setting up the secrets for new
media or actually removing/adding of the corresponding objects.
Additionally it uses secrets setup for the old image to be removed as
the secret for the new image which is wrong.
Remove the support for secrets while changing media for the legacy
approach. The only reasonable way to fix it is when using blockdev.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
While the idea was good the implementation not so much as we need to
take into account the old disk data and the new source. The code will be
consolidated later in a different way.
This reverts commit 663b1d55de.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Preparing the storage source prior to assigning the alias will not work
as the names of the certain objects depend on the alias for the legacy
hotplug case as we generate the object names for the secrets based on
the alias.
This reverts commit 192fdaa614.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This enables to use both cgroup v1 and v2 at the same time together
with libvirt. It is supported by kernel and there is valid use-case,
not all controllers are implemented in cgroup v2 so there might be
configurations where administrator would enable these missing
controllers in cgroup v1.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In order to set CPU cfs period using cgroup v2 'cpu.max' interface
we need to load the current value of CPU cfs quota first because
format of 'cpu.max' interface is '$quota $period' and in order to
change 'period' we need to write 'quota' as well. Writing only one
number changes only 'quota'.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroups v2 we need to handle threads and processes differently.
If you need to move a process you need to write its pid into
cgrou.procs file and it will move the process with all its threads
as well. The whole process will be moved if you use tid of any thread.
In order to move only threads at first we need to create threaded group
and after that we can write the relevant thread tids into cgroup.threads
file. Threads can be moved only into cgroups that are children of
cgroup of its process.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When creating cgroup hierarchy we need to enable controllers in the
parent cgroup in order to be usable. That means writing "+{controller}"
into cgroup.subtree_control file. We can enable only controllers that
are enabled for parent cgroup, that means we need to do that for the
whole cgroup tree.
Cgroups for threads needs to be handled differently in cgroup v2. There
are two types of controllers:
- domain controllers: these cannot be enabled for threads
- threaded controllers: these can be enabled for threads
In addition there are multiple types of cgroups:
- domain: normal cgroup
- domain threaded: a domain cgroup that serves as root for threaded
cgroups
- domain invalid: invalid cgroup, can be changed into threaded, this
is the default state if you create subgroup inside
domain threaded group or threaded group
- threaded: threaded cgroup which can have domain threaded or
threaded as parent group
In order to create threaded cgroup it's sufficient to write "threaded"
into cgroup.type file, it will automatically make parent cgroup
"domain threaded" if it was only "domain". In case the parent cgroup
is already "domain threaded" or "threaded" it will modify only the type
of current cgroup. After that we can enable threaded controllers.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Cgroup v2 has only single mount point for all controllers. The list
of controllers is stored in cgroup.controllers file, name of controllers
are separated by space.
In cgroup v2 there is no cpuacct controller, the cpu.stat file always
exists with usage stats.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If the placement was copied from parent or set to absolute path
there is nothing to do, otherwise set the placement based on
process placement from /proc/self/cgroup or /proc/{pid}/cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When reconnecting to a domain we are validating the cgroup name.
In case of cgroup v2 we need to validate only the new format for host
without systemd '{machinename}.libvirt-{drivername}' or scope name
generated by systemd.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We cannot detect only mount points to figure out whether cgroup v2
is available because systemd uses cgroup v2 for process tracking and
all controllers are mounted as cgroup v1 controllers.
To make sure that this is no the situation we need to check
'cgroup.controllers' file if it's not empty to make sure that cgroup
v2 is not mounted only for process tracking.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Place cgroup v2 backend type before cgroup v1 to make it obvious
that cgroup v2 is preferred implementation.
Following patches will introduce support for hybrid configuration
which will allow us to use both at the same time, but we should
prefer cgroup v2 regardless.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
My commit d6b8838 fixed the uid:gid for the pre-created UNIX sockets
but did not account for the different umask of libvirtd and QEMU.
Since commit 0e1a1a8c we set umask to '0002' for the QEMU process.
Manually tune-up the permissions to match what we would have gotten
if QEMU had created the socket.
https://bugzilla.redhat.com/show_bug.cgi?id=1633389
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1632711
GlusterFS is typically safe when it comes to migration. It's a
network FS after all. However, it can be mounted via FUSE driver
they provide. If that is the case we fail to identify it and
think migration is not safe and require VIR_MIGRATE_UNSAFE flag.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In commit v4.7.0-168-g993d85ae5e I introduced two Icelake CPU models,
but failed to actually include them in the CPU map index.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We switched to opening mode='bind' sockets ourselves:
commit 30fb2276d8
qemu: support passing pre-opened UNIX socket listen FD
in v4.5.0-rc1~251
Then fixed qemuBuildChrChardevStr to change libvirtd's label
while creating the socket:
commit b0c6300fc4
qemu: ensure FDs passed to QEMU for chardevs have correct SELinux labels
v4.5.0-rc1~52
Also add labeling of these sockets to the DAC driver.
Instead of duplicating the logic which decides whether libvirt should
pre-create the socket, assume an existing path meaning that it was created
by libvirt.
https://bugzilla.redhat.com/show_bug.cgi?id=1633389
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This function updates the used QEMU capabilities of @vm by querying
the QEMU capabilities cache.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move the fetch of @ipAddrLeft to after the goto skip_instantiate
and remove the (req->binding) guard since we know that as long
as req->binding is created, then req->threadkey is filled in.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 87a8a30d6 added the function based on the virsh function,
but used an unsigned long long instead of a double and thus that
limits the maximum result.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
When libxlDomainMigrationDstPrepare adds the @args to an
virNetSocketAddIOCallback using libxlMigrateDstReceive as
the target of the virNetSocketIOFunc @func with the knowledge
that the libxlMigrateDstReceive will virObjectUnref @args
at the end thus not needing to Unref during normal processing
for libxlDomainMigrationDstPrepare.
However, Coverity believes there's an issue with this. The
problem is there can be @nsocks virNetSocketAddIOCallback's
added, but only one virObjectUnref. That means the first
one done will Unref and the subsequent callers may not get
the @args (or @opaque) as they expected. If there's only
one socket returned from virNetSocketNewListenTCP, then sure
that works. However, if it returned more than one there's
going to be a problem.
To resolve this, since we start with 1 reference from the
virObjectNew for @args, we will add 1 reference for each
time @args is used for virNetSocketAddIOCallback. Then
since libxlDomainMigrationDstPrepare would be done with
@args, move it's virObjectUnref from the error: label to
the done: label (since error: falls through). That way
once the last IOCallback is done, then @args will be freed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Remove the "!params" check from the condition since it's possible
someone could pass a non NULL value there, but a 0 for the nparams
and thus continue on. The external API only checks if @nparams is
non-zero, then check for NULL @params.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
The pointer related to uml_driver needs to be checked before its usage
inside the function. Some attributes of the driver are being accessed
while the pointer is NULL considering the current logic.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This reverts commit 1602aa28f8.
There is no need to call virCgroupRemove() nor virCgroupFree() if
virCgroupEnableMissingControllers() fails because it will not modify
'group' at all.
The cleanup of directories is done in virCgroupMakeGroup().
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Turns out, there are couple of bugs that prevent this feature
from being operational. Given how close to the release we are
disable the feature temporarily. Hopefully, it can be enabled
back after all the bugs are fixed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Cgroups are linux specific and we need to make sure that the code is
compiled only on linux. On different OSes it fails the compilation:
../../src/util/vircgroupv1.c:65:19: error: variable has incomplete type 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:65:12: note: forward declaration of 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:74:12: error: implicit declaration of function 'getmntent_r' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
while (getmntent_r(mounts, &entry, buf, sizeof(buf)) != NULL) {
^
../../src/util/vircgroupv1.c:814:39: error: use of undeclared identifier 'MS_NOSUID'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:49: error: use of undeclared identifier 'MS_NODEV'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:58: error: use of undeclared identifier 'MS_NOEXEC'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:841:65: error: use of undeclared identifier 'MS_BIND'
if (mount(src, group->legacy[i].mountPoint, "none", MS_BIND,
^
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
All the system headers are used only if we are compiling on linux
and they all are present otherwise we would have seen build errors
because in our tests/vircgrouptest.c we use only __linux__ to check
whether to skip the cgroup tests or not.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
tests/vircgrouptest.c uses #ifdef __linux__ for a long time and no
failure was reported so far so it's safe to assume that __linux__ is
good enough to guard cgroup code.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Let's ignore the checking of interface type when we call the function
qemuARPGetInterfaces to get IP from host's arp table.
Signed-off-by: Lin Ma <lma@suse.com>
Reviewed-by: Chen Hanxiao <chenhanxiao@gmail.com>
It may happen that in the list of paths/disk sources to relabel
there is a disk source. If that is the case, the path is NULL. In
that case, we shouldn't try to lock the path. It's likely a
network disk anyway and therefore there is nothing to lock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This shouldn't be needed per-se. Security manager shouldn't
disappear during transactions - it's immutable. However, it
doesn't hurt to grab a reference either - transaction code uses
it after all.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This array is allocated in virSecuritySELinuxContextListAppend()
but never freed. This commit is essentially the same as ca25026.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the mnemonic macros of libdbus for 1 (TRUE) and 0 (FALSE).
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Report a debug message if dbus_watch_handle() returns FALSE.
dbus_watch_handle() returns FALSE if there wasn't enough memory for
reading or writing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
As documented at
https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga2522ac5075dfe0a1535471f6e045e1ee
the creator of a non-shared D-Bus connection has to release the last
reference after closing for freeing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Grab a ref for info->bus (a DBus connection) as long as the while loop
is running. With the grabbed reference it is ensured that info->bus
isn't freed as long as the while loop is executed. This is necessary
as it's allowed to drop the last ref for the bus connection in a
handler.
There was already a bug of this kind in libdbus itself:
https://bugs.freedesktop.org/show_bug.cgi?id=15635.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The only place where VIR_DOMAIN_EVENT_RESUMED should be generated is the
RESUME event handler to make sure we don't generate duplicate events or
state changes. In the worse case the duplicity can revert or cover
changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
we could happily mark the domain as running and report
VIR_DOMAIN_EVENT_RESUMED to registered clients.
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Thanks to the previous commit the RESUME event handler knows what reason
should be used when changing the domain state to VIR_DOMAIN_RUNNING, but
the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic
VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can
be easily deduced from the running reason, which saves us from having to
pass one more value to the handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Whenever we get the RESUME event from QEMU, we change the state of the
affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED
reason. This is fine if the domain is resumed unexpectedly, but when we
sent "cont" to QEMU we usually have a better reason for the state
change. The better reason is used in qemuProcessStartCPUs which also
sets the domain state to running if qemuMonitorStartCPUs reports
success. Thus we may end up with two state updates in a row, but the
final reason is correct.
This patch is a preparation for dropping the state change done in
qemuMonitorStartCPUs for which we need to pass the actual running reason
to the RESUME event handler and use it there instead of
VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED
reasons when changing domain state to running with more specific ones.
All of them are done when libvirtd reconnects to an existing domain
after being restarted and sees an unfinished migration or save.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT was defined but not used anywhere
in our event generation code. This fixes qemuDomainRevertToSnapshot to
properly report why the domain was resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
A deadlock situation can occur when autostarting a LXC domain 'guest'
due to two threads attempting to take opposing locks while holding
opposing locks (AB BA problem). Thread A takes and holds the 'vm' lock
while attempting to take the 'client' lock, meanwhile, thread B takes
and holds the 'client' lock while attempting to take the 'vm' lock.
The potential for this can be seen as follows:
Thread A:
virLXCProcessAutostartDomain (takes vm lock)
--> virLXCProcessStart
--> virLXCProcessConnectMonitor
--> virLXCMonitorNew
--> virNetClientSetCloseCallback (wants client lock)
Thread B:
virNetClientIncomingEvent (takes client lock)
--> virNetClientIOHandleInput
--> virNetClientCallDispatch
--> virNetClientCallDispatchMessage
--> virNetClientProgramDispatch
--> virLXCMonitorHandleEventInit
--> virLXCProcessMonitorInitNotify (wants vm lock)
Since these threads are scheduled independently and are preemptible it
is possible for the deadlock scenario to occur where each thread locks
their first lock but both will fail to get their second lock and just
spin forever. You get something like:
virLXCProcessAutostartDomain (takes vm lock)
--> virLXCProcessStart
--> virLXCProcessConnectMonitor
--> virLXCMonitorNew
<...>
virNetClientIncomingEvent (takes client lock)
--> virNetClientIOHandleInput
--> virNetClientCallDispatch
--> virNetClientCallDispatchMessage
--> virNetClientProgramDispatch
--> virLXCMonitorHandleEventInit
--> virLXCProcessMonitorInitNotify (wants vm lock but spins)
<...>
--> virNetClientSetCloseCallback (wants client lock but spins)
Neither thread ever gets the lock it needs to be able to continue
while holding the lock that the other thread needs.
The actual window for preemption which can cause this deadlock is
rather small, between the calls to virNetClientProgramNew() and
execution of virNetClientSetCloseCallback(), both in
virLXCMonitorNew(). But it can be seen in real world use that this
small window is enough.
By moving the call to virNetClientSetCloseCallback() ahead of
virNetClientProgramNew() we can close any possible chance of the
deadlock taking place. There should be no other implications to the
move since the close callback (in the unlikely event was called) will
spin on the vm lock. The remaining work that takes place between the
old call location of virNetClientSetCloseCallback() and the new
location is unaffected by the move.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
With the introduction of cgroup v2 there are new names used with
cgroups based on which version is used:
- legacy: cgroup v1
- unified: cgroup v2
- hybrid: cgroup v1 and cgroup v2
Let's use 'legacy' instead of 'cgroupv1' or 'controllers' in our code.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
They all need virCgroupV1GetMemoryUnlimitedKB() so it's easier to
move them in one commit.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We need to update one test-case because now new cgroup object will be
created only if there is any cgroup backend available.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We will need to extract current cgroup v1 implementation into separate
backend because there will be new cgroup v2 implementation and both will
have to co-exist.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This will be required once cgroup v2 is introduced. The cgroup
detection is not simple and we will have multiple backends so we
should not just jump into the middle of the detection code.
In order to use virCgroupNewSelf we need to create all the remaining
data files:
- {name}.cgroups represents /proc/cgroups, it is a list of cgroup
controllers compiled into kernel
- {name}.self.cgroup represents /proc/self/cgroup, it describes
cgroups to which the process belongs
For "no-cgroups" we need to modify the expected behavior because
virCgroupNewSelf() will fail if there are no controllers available.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Because we can set which files to return for cgroup tests there
is no need to have special function tailored to run tests.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Once we introduce cgroup v2 support we need to handle processes and
threads differently.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Use flags in virCgroupAddTaskInternal instead of boolean parameter.
Following patch will add new flag to indicate thread instead of process.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroup v2 we need to handle processes and threads differently,
following patch will introduce virCgroupAddThread.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If we are on host with systemd we need to build cgroup hierarchy
ourselves for controllers that are not managed by systemd.
As a starting parent we need to force root group because
virCgroupMakeGroup() takes that parent in order to inherit values
for cpuset controller.
By default cpuset controller is managed by systemd so we will never
hit the issue but for v2 cgroups we need to use parent cgroup every
time.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If virCgroupEnableMissingControllers() fails it could have already
created some directories, we should clean it up as well.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If available, use b_info->nested_hvm instead of
b_info->u.hvm.nested_hvm. This will make nested HVM config available
also for PVH domains.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Otherwise starting PVH guest will result in "arch_setup_bootlate:
mapping shared_info failed (pfn=..., rc=-1, errno: 12): Internal error".
After this change the behavior is the same as in `xl`.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Commit 40b5c99a modified the virConfGetValue callers to use
virConfGetValueString. However, using the virConfGetValueString
resulted in leaking the returned @value string in each case.
So, let's modify each instance to use the VIR_AUTOFREE(char *)
syntax. In some instances changing the variable name since
@value was used more than once.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Since lxcConvertSize already creates an error message, there
is no need to use an error: label in lxcSetMemTune to just
overwrite or essentially rewrite the same error. So remove
the label.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
In virDomainMemoryDefParseXML and virDomainVideoDefParseXML if
the VIR_ALLOC fails and NULL is returned, then the alteration
to ctxt->node isn't reversed.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
In a following case:
virsh start $domain
service libvirtd stop
<shutdown> the guest from within the $domain
service libvirtd start
Notice that PCI devices which have been assigned to the $domain will
still be bound to stub drivers instead rebound to host drivers.
In that case the call stack is like below:
libvirtd start
qemuProcessReconnect
qemuProcessStop (because $domain was shutdown without
libvirtd event to process that)
qemuHostdevReAttachDomainDevices
qemuHostdevReAttachPCIDevices
virHostdevReAttachPCIDevices
However, because qemuHostdevUpdateActiveDomainDevices was called
after the qemuConnectMonitor, the setup of the tracking of each
host device in the $domain on either the activePCIHostdevs list
or inactivePCIHostdev list will not occur in an orderly manner.
Therefore, virHostdevReAttachPCIDevices just neglects these host PCI
devices which are bound to stub drivers and doesn't rebind them to
host drivers.
This patch fixs that by moving qemuHostdevUpdateActiveDomainDevices before
qemuConnectMonitor during libvirtd reconnection processing.
Signed-off-by: Wu Zongyong <cordius.wu@huawei.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the new qemuDomainRemoveInactiveJobLocked to remove the
@obj during the virDomainObjListForEach call which holds a
lock on the domain object list.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Create a qemuDomainRemoveInactiveJobLocked which copies
qemuDomainRemoveInactiveJob except of course calling
another new helper qemuDomainRemoveInactiveLocked.
The qemuDomainRemoveInactiveLocked is a copy of
qemuDomainRemoveInactive except that instead of calling
virDomainObjListRemove it calls virDomainObjListRemoveLocked.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Introduce qemuDomainRemoveInactiveJobCommon to handle what will
be the common parts of the code with a new function that will
be used to call virDomainObjListRemoveLocked instead of the
unlocked variant.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It was already available in 1.5.0, so we can assume it's
present and avoid checking for it at runtime.
This commit is best viewed with 'git show -w'.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
We already prefer them in capabilities, and domcapabilities
should be consistent with that.
This commit is best viewed with 'git show -w'.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
The new implementation contains less duplicated code and
is easier to extend.
This commit is best viewed with 'git show -w'.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Now that we have reduced the number of sensible options down
to either the native QEMU binary or RHEL's qemu-kvm, we can
make virQEMUCapsInitGuest() a bit simpler.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Both Fedora's qemu-kvm and Debian's/Ubuntu's kvm are nothing
more than paper-thin wrappers around the native QEMU binary,
so we gain nothing by looking for them.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
We're only ever passing a single binary when calling this
function, so we can remove all code dealing with the
possibility of a second binary being specified.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
When the guest is native, we are currently looking at
potential KVM binaries regardless of whether or not we have
already located a QEMU binary suitable to run the guest.
This made sense back when KVM support was not part of QEMU
proper, but these days the KVM binaries are in most cases
just trivial wrapper scripts around the native QEMU binary
so it doesn't make sense to poke at them unless they're
the only binaries on the system, such as when running on
RHEL.
This will allow us to simplify both virQEMUCapsInitGuest()
and virQEMUCapsInitGuestFromBinary().
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
When running an armv7l guest on an aarch64 hosts, the
qemu-system-aarch64 binary should be our first choice instead
of qemu-system-arm since the former can take advantage of KVM
acceleration.
Move the special case to virQEMUCapsFindBinaryForArch() so
that it's handled along with all other cases rather than on
its own later on.
Doing so will also make further refactoring easier.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
virCapabilitiesAddGuestDomain() takes an optional binary
name: this is intended for cases where a certain domain
type can't use the default one registered for the guest
architecture, but has to use a special binary instead.
The current code, however, will pass 'binary' again when
'kvmbin' is not defined, which is unnecessary as 'binary'
has been registered as default earlier, and will result
in capabilities output such as
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<domain type='qemu'/>
<domain type='kvm'>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
</domain>
with the second <emulator> element providing no additional
information.
Change it so that, when 'kvmbin' is not defined, NULL is
passed and so the default emulator will be used instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
The function performing the checks, rather than its callers,
should contain comments explaining the rationale behind said
checks.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1630164
Since 2a13a0a103 we are querying the vhostuser's interface name
when building qemu command line. However, we forgot to do so on
hotplug.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The `if(!list || list->type != VIR_CONF_LIST)` check couldn't be
written in a 100% similar way. Instead, we're just checking whether
`virConfGetValueStringList() <= 0` and creating a new function to:
- return -1 in case virConfGetValueStringList fails either due to some
allocation failure or when traversing the list;
- resetting the last error and return 0 otherwise;
Taking this approach we can have the behaviour with the new code as
close as possible to the old one.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This change actually changes the behaviour of xenConfigGetString() as
now it returns a newly-allocated string.
Unfortunately, there's not much that can be done in order to avoid that
and all the callers have to be changed in order to avoid leaking the
return value.
Also, as a side-effect of the change above, the function now takes a
"char **" argument instead of a "const char **" one.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
There seems to be no need to add the ignore_value wrapper or
caste with (void) to the unlink() calls, so let's just remove
them. I assume at one point in time Coverity complained. So,
let's just be consistent - those that care to check the return
status can and those that don't can just have the naked unlink.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This patch is introducing cache monitor(CMT) to cache and
memory bandwidth monitor(MBM) for monitoring CPU memory
bandwidth.
The host capability of the two monitors is also introduced
in this patch.
For CMT, the host capability is shown like:
<host>
...
<cache>
<bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
<monitor level='3' 'reuseThreshold'='270336' maxMonitors='176'>
<feature name='llc_occupancy'/>
</monitor>
</cache>
...
</host>
For MBM, the capability is shown like this:
<host>
...
<memory_bandwidth>
<node id='1' cpus='6-11'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
<monitor maxMonitors='176'>
<feature name='mbm_total_bytes'/>
<feature name='mbm_local_bytes'/>
</monitor>
</memory_bandwidth>
...
</host>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move memory bandwidth capability nodes into one data structure,
this allows us to add a monitor for memory bandwidth.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move all cache banks into one data structure, this allows
us to add other cache component, such as cache monitor.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch introduces the resource monitor and creates the interface
for getting host capability of resource monitor from the system resource
control file system.
The resource monitor takes the role of RDT monitoring group and could be
used to monitor the resource consumption information, such as the last
level cache occupancy and the utilization of memory bandwidth.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1614283
Save the error from the refresh failure because the stopPool
processing may overwrite the error or even worse clear it
due to calling an external libvirt API that resets the last
error such as is the case with the SCSI pool which may call
virGetConnectNodeDev (see commit decaeb288) in order to
process deleting an NPIV vport.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Create a common pool refresh failure handling method as the
same code is repeated multiple times.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Rather than duplicate the error code, let's create an error
label to keep code common.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Alter the code path to remove the need to to go cleanup and thus
remove the label completely.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
If the virStoragePoolRefresh fails and we call stopPool, the
code neglected to clean up the state file leading to the next
libvirtd restart attempting to start the pool. For a transient
pool this could make it unexpectedly reappear.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1607202
It's essentially stated in the nwfilterBindingDelete that we
will allow the admin to shoot themselves in the foot by deleting
the nwfilter binding which then allows them to undefine the
nwfilter that is in use for the running guest...
However, by allowing this we cause a problem for libvirtd
restart reconnect processing which would then try to recreate
the missing binding attempting to use the deleted filter
resulting in an error and thus shutting the guest down.
So rather than keep adding virDomainConfNWFilterInstantiate
flags to "ignore" specific error conditions, modify the logic
to ignore, but VIR_WARN errors other than ignoreExists. This
will at least allow the guest to not shutdown for only nwfilter
binding errors that we can now perhaps recover from since we
have the binding create/delete capability.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
All of the ones being removed are pulled in by internal.h. The only
exception is sanlock which expects the application to include <stdint.h>
before sanlock's headers, because sanlock prototypes use fixed width
int, but they don't include stdint.h themselves, so we have to leave
that one in place.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
It doesn't really make sense for us to have stdlib.h and string.h but
not stdio.h in the internal.h header.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
QEMU commits:
e37a5c7fa4 (v2.12.0)
i386: Add Intel Processor Trace feature support
c2f193b538 (v2.7.0)
target-i386: Add support for UMIP and RDPID CPUID bits
aff9e6e46a (v2.12.0)
x86/cpu: Enable new SSE/AVX/AVX512 cpu features
f77543772d (v2.9.0)
x86: add AVX512_VPOPCNTDQ features
5131dc433d (v3.1.0)
i386: Add CPUID bit for PCONFIG
59a80a19ca (v3.1.0)
i386: Add CPUID bit for WBNOINVD
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When restoring a domain from a compressed image, we launch an
intermediate process for decompressing the saved data. If QEMU fails to
load the data for some reason, we force close the stdin/stdout file
descriptors of the intermediate process and wait for it to die. However,
virCommandWait can report various errors which would overwrite the real
error from QEMU. Thus instead of getting something useful:
internal error: process exited while connecting to monitor:
2018-09-17T15:17:29.998910Z qemu-system-x86_64: can't apply global
Skylake-Client-x86_64-cpu.osxsave=off: Property '.osxsave' not found
we could get an irrelevant error message:
internal error: Child process (lzop -dc --ignore-warn) unexpected
fatal signal 13
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Lock all the paths we want to relabel to mutually exclude other
libvirt daemons.
The only hitch here is that directories can't be locked.
Therefore, when relabeling a directory do not lock it (this
happens only when setting up some domain private paths anyway,
e.g. huge pages directory).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
So far the whole transaction handling is done
virSecuritySELinuxSetFileconHelper(). This needs to change for
the sake of security label remembering and locking. Otherwise we
would be locking a path when only appending it to transaction
list and not when actually relabelling it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Firstly, the following code pattern is harder to follow:
if (func() < 0) {
error();
} else {
/* success */
}
We should put 'goto cleanup' into the error branch and move the
else branch one level up.
Secondly, 'rc' should really be named 'ret' because it holds
return value of the function. Not some intermediate value.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This label is used in both successful and error paths. Therefore
it should be named 'cleanup' and not 'err'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Lock all the paths we want to relabel to mutually exclude other
libvirt daemons.
The only hitch here is that directories can't be locked.
Therefore, when relabeling a directory do not lock it (this
happens only when setting up some domain private paths anyway,
e.g. huge pages directory).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Firstly, the message that says we're setting uid:gid shouldn't be
called from virSecurityDACSetOwnershipInternal() because
virSecurityDACRestoreFileLabelInternal() is calling it too.
Secondly, there are places between us reporting label restore and
us actually doing it where we can quit. Don't say we're doing
something until we are actually about to do it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
So far the whole transaction handling is done
virSecurityDACSetOwnershipInternal(). This needs to change for
the sake of security label remembering and locking. Otherwise we
would be locking a path when only appending it to transaction
list and not when actually relabeling it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Two new APIs are added so that security driver can lock and
unlock paths it wishes to touch. These APIs are not for other
drivers to call but security drivers (DAC and SELinux). That is
the reason these APIs are not exposed through our
libvirt_private.syms file.
Three interesting things happen in this commit. The first is the
global @lockManagerMutex. Unfortunately, this has to exist so that
there is only one thread talking to virtlockd at a time. If there
were more threads and one of them closed the connection
prematurely, it would cause virtlockd killing libvirtd. Instead
of complicated code that would handle that, let's have a mutex
and keep the code simple.
The second interesting thing is keeping connection open between
lock and unlock API calls. This is achieved by duplicating client
FD and keeping it open until unlock is called. This trick is used
by regular disk content locking code when the FD is leaked to
qemu.
Finally, the third thing is polling implemented at client side.
Since virtlockd has only one thread that handles locking
requests, all it can do is either acquire lock or error out.
Therefore, the polling has to be implemented in client. The
polling is capped at 60 second timeout, which should be plenty
since the metadata lock is held only for a fraction of a second.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Now that we know what metadata lock manager user wishes to use we
can load it when initializing security driver. This is achieved
by adding new argument to virSecurityManagerNewDriver() and
subsequently to all functions that end up calling it.
The cfg.mk change is needed in order to allow lock_manager.h
inclusion in security driver without 'syntax-check' complaining.
This is safe thing to do as locking APIs will always exist (it's
only backend implementation that changes). However, instead of
allowing the include for all other drivers (like cpu, network,
and so on) allow it only for security driver. This will still
trigger the error if including from other drivers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This config option allows users to set and enable lock manager
for domain metadata. The lock manager is going to be used by
security drivers to serialize each other when changing a file
ownership or changing the SELinux label. The only supported lock
manager is 'lockd' for now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
In some cases we might want to not load the lock driver config.
Alter virLockManagerPluginNew() and the lock drivers to cope with
this fact.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Soon there will be a virtlockd client that wants to either lock
all the resources or none (in order to avoid virtlockd killing
the client on connection close). Because on the RPC layer we can
only acquire one resource at a time, we have to perform a
rollback once we hit a resource that can't be acquired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This is a new type of object that lock drivers can handle.
Currently, it is supported by lockd driver only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The fact whether domain has or doesn't have RW disks is specific
to VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN and therefore should
reside in union specific to it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We will want virtlockd to lock files on behalf of libvirtd and
not qemu process, because it is libvirtd that needs an exclusive
access not qemu. This requires new lock context.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This flag causes virtlockd to use different offset when locking
the file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
So far the virLockSpaceAcquireResource() locks the first byte in
the underlying file. But caller might want to lock other range.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The file being present doesn't necessarily mean anything these
days, as it's created independently of whether the kvm module
has been loaded[1]; moreover, we're already gathering all the
information we need through QMP, so poking the filesystem at
all is entirely unnecessary.
[1] https://github.com/systemd/systemd/commit/d35d6249d5a7ed3228
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
This capability is documented as having one meaning (whether
KVM is enabled by default) but is actually assigned two other
meanings over its life: whether the query-kvm QMP command is
available at first, and later on whether KVM is usable / was
used during probing.
Since the query-kvm QMP command was available in 1.5.0, we
can avoid probing for it; additionally, we can simplify the
logic by setting the flag when it applies instead of initially
setting it and then clearing it when it doesn't.
The flag's description is also updated to reflect reality.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
A side effect of recent changes is that we would always try
to regenerate the capabilities cache for non-native QEMU
binaries based on /dev/kvm availability, which is of course
complete nonsense. Make sure that doesn't happen.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
It was already available in 1.5.0.
Moreover, we're not even formatting it on the QEMU command
line, ever: we just use it as part of some logic that decides
whether KVM support should be advertised, and as it turns out
that logic is actually buggy and dropping this capability
fixes it.
https://bugzilla.redhat.com/show_bug.cgi?id=1628469
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Now that committing transactions using pid == -1 means that we're
not fork()-ing to run the transaction in a specific namespace, we
can utilize the transaction processing semantics in order to
start, run a or multiple commands, and then commit the
transaction without being concerned with other interactions or
transactions interrupting the processing. This will eventually
allow us to have a single place where all the paths can be
locked, followed by relabeling and unlocking again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It will be desirable to run transactions more often than we
currently do. Even if the domain we're relabeling the paths for
does not run in a namespace. If that's the case, there is no need
to fork() as we are already running in the right namespace. To
differentiate whether transaction code should fork() or not the
@pid argument now accepts -1 (which means do not fork).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
In the future, the transactions are not going to be optional and
they will be run regardless of domain using namespace to collect
list of paths to be relabeled.
To make sure there won't be an API that goes behind transaction
code back update the comment that serves as decision manual
whether an API must be fully implemented or plain #define is
sufficient.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Even though the current use of the functions does not require full
implementation with transactions (none of the callers passes a path
somewhere under /dev), it doesn't hurt either. Moreover, in
future patches the paradigm is going to shift so that any API
that touches a file is required to use transactions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Even though the current use of the function does not require full
implementation with transactions (none of the callers pass a path
somewhere under /dev), it doesn't hurt either. Moreover, in
future patches the paradigm is going to shift so that any API
that touches a file is required to use transactions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Functions that deal with virPCIDeviceAddress exclusively
belong to util/virpci.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Currently the libxl driver claims support for Xen >= 4.4, but
Xen 4.4 and 4.5 are no longer supported upstream. Let's increase
the minimum supported Xen version to 4.6 and change the defined
LIBXL_API_VERSION to 0x040500, which is the API version defined
when Xen 4.6 was released.
Since Xen 4.6 contains a pkgconfig file, drop the now unused code
that falls back to using LIBVIRT_CHECK_LIB in the absence of
pkgconfig file. In addition, bumping the LIBXL_API_VERSION
required adjusting the calls to libxl_set_vcpuaffinity to account
for the extra parameter in the 0x040500 version of the API.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
commit b00c9c39 removed the label end_of_netlink_messages and 'return
table' statement, It causes the function virArpTableGet doesn't return
a proper virArpTable pointer.
How to reproduce:
# virsh domiflist sles12sp3
Interface Type Source Model MAC
-------------------------------------------------------
vnet0 network default virtio 52:54:00💿02:e6
# virsh domifaddr sles12sp3 --source arp
error: Failed to query for interfaces addresses
error: An error occurred, but the cause is unknown
It seems that the "if (nh->nlmsg_type == NLMSG_DONE)" statement won't be
meted. So this patch adds 'return table' when the iterations of nlmsghdr
for loop is over.
Signed-off-by: Lin Ma <lma@suse.com>
Reviewed-by: Chen Hanxiao <chenhanxiao@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It is not a problem at all if the `tss` user/group does not exist, the code
fallbacks to the `root` user/group. However we report a warning for no reason
on every start-up. Fix this by checking if the user/group actually exists.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Instead of duplicating the code from virGet{User,Group}IDByName(), which are
static anyway, extend those functions to accept NULL pointers for the result and
a boolean for controlling the error reporting.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Otherwise after libvirtd restart we come back to issues fixed by
introducing this flag in [1].
[1] 61a0026a : qemu: Fix xml dump of autogenerated websocket
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
After removing the host CPU model re-computation,
this function is no longer necessary.
This reverts commits:
commit d0498881a0
virQEMUCapsFreeHostCPUModel: Don't always free host cpuData
commit 5276ec712a
testUpdateQEMUCaps: Don't leak host cpuData
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Commit 82327038 moved a couple of checks out of the XML parser
into the domain validation; however, those checks seem to be more
useful as hypervisor specific checks rather than the more general
domain conf checks (nothing in the docs indicate a specific error).
Fortunately only QEMU was processing the memoryBacking, thus
add the changes to qemuDomainDefValidateMemory and change the
code a bit to make usage of the similar deref to def->mem and
the mem->nhugepages filter.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
virDomainDefCollectBootOrder() is called for every item on the list
for each type of device. One of the checks it makes is to gather the
order attributes from the <boot> element of all devices, and assure
that no two devices have been given the same order.
Since (internally to libvirt, *not* in the domain XML) an <interface
type='hostdev'> is on both the list of hostdev devices and the list of
network devices, it will be counted twice, and the code that checks
for multiple devices with the same boot order will give a false
positive.
To remedy this, we make sure to return early for hostdev devices that
have a parent.type != NONE.
This was introduced in commit 5b75a4, which was first in libvirt-4.4.0.
Resolves: https://bugzilla.redhat.com/1601318
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We require QEMU 1.5.0 these days, so checking for versions
older than that is pointless.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability was introduced in QEMU 1.5.0, which is our
minimum supported QEMU version these days.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability was introduced in QEMU 1.3.1 and we require
QEMU 1.5.0 these days.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1613737
When processing the inputvol for encryption, we need to handle
the case where the inputvol is encrypted. This then allows for
the encrypted inputvol to be used either for an output encrypted
volume or an output volume of some XML provided type.
Add tests to show the various conversion options when either input
or output is encrypted. This includes when both are encrypted.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 39cef12a9 altered/fixed the inputvol processing to create
a multistep process when using an inputvol to create an encrypted
output volume; however, it unnecessarily assumed/restricted the
inputvol to be of 'raw' format only.
Modify the processing code to allow the inputvol format to be checked
and used in order to create the encrypted volume.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
There's really no need for it to be there since it's only ever
used inside virStorageBackendCreateQemuImgCmdFromVol
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
In some cases we are checking if the mount namespace is enabled
at two places: one is at the beginning of exported function (e.g.
qemuDomainNamespaceSetupDisk()) and the other is at the beginning
of qemuDomainNamespaceMknodPaths() which is called from the
former function anyway. Then we have some other functions which
rely on the later check solely.
In order to compensate for possibly needless function call,
qemuDomainNamespaceMknodPaths() returns early if @npaths is zero.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch simplifies virNetDevBridgeCreate and virNetDevMacVLanCreate
functions by making use of the virNetlinkNewLink helper.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This patch adds wrapper macros around nla_nest_[start|end] and nla_put,
thus getting rid of some redundancy and making virNetlinkNewLink more
readable.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This patch introduces virNetlinkNewLink helper which wraps the common
libnl/netlink code to create a new link.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
It is possible the incoming VM is not fully started when the finish
phase of migration is executed. In libxlDomainMigrationDstFinish,
wait for the thread receiving the VM to complete before executing
finish phase tasks.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>