So far if qemu is spawned under separate mount namespace in order
to relabel everything it needs an access to the security driver
to run in that namespace too. This has a very nasty down side -
it is being run in a separate process, so any internal state
transition is NOT reflected in the daemon. This can lead to many
sleepless nights. Therefore, use the transaction APIs so that
libvirt developers can sleep tight again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
With our new qemu namespace code in place, the relabelling of
devices is done not as good is it could: a child process is
spawned, it enters the mount namespace of the qemu process and
then runs desired API of the security driver.
Problem with this approach is that internal state transition of
the security driver done in the child process is not reflected in
the parent process. While currently it wouldn't matter that much,
it is fairly easy to forget about that. We should take the extra
step now while this limitation is still fresh in our minds.
Three new APIs are introduced here:
virSecurityManagerTransactionStart()
virSecurityManagerTransactionCommit()
virSecurityManagerTransactionAbort()
The Start() is going to be used to let security driver know that
we are starting a new transaction. During a transaction no
security labels are actually touched, but rather recorded and
only at Commit() phase they are actually updated. Should
something go wrong Abort() aborts the transaction freeing up all
memory allocated by transaction.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code at the very bottom of the DAC secdriver that calls
chown() should be fine with read-only data. If something needs to
be prepared it should have been done beforehand.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.
Users and applications can already opt-in by explicitly using
<address type='pci'/>
inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.
What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.
That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.
Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
When coldplugging vcpus to a VM that already has a few hotpluggable
vcpus the code might generate invalid configuration as
non-hotpluggable cpus need to be clustered starting from vcpu 0.
This fix forces the added vcpus to be hotpluggable in such case.
Fixes a corner case described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1370357
The virsh manpage lists "shutdown" and "dying" as two of the possible
domain states that could be listed in the output of the "virsh list"
command. However, a domain that is being shutdown will be listed as
"in shutdown", and the "dying" state doesn't even exist (and never
has, as far as I can tell from looking through git history - it was
shown in the original import of the virsh.pod file in 2006; there was
no VIR_DOMAIN_DYING state then, there wasn't one when those lines of
virsh.pod were tweaked in 2008, and there still isn't one
today. Apparently it was just something that sounded like a good idea
to someone at some time, but was never implemented...)
Resolves: https://bugzilla.redhat.com/1408778
This patch adds support and documentation for
a generalized hardware cache event called cache_l1d
perf event.
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
The virsh manpage lists options --uuid and --name as
mutually exclusive along option --table when actually
the option --table is mutually exclusive and can't go
with options --uuid and/or --name. This patch rewords the
virsh manpage to state the correct meaning.
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
When setting perf events, the enabled/disabled perf events are not
listed. Since we know which events were changed it's possible to
print out the values on successful set, such as :
virsh perf Domain --enable instructions --disable cache_misses
instructions : enabled
cache_misses : disabled
Created a helper to print the messages - use the vshPrintExtra to
adhere to the --quiet|-q option being set by some script. This will
cause the get code to print nothing, but will return success/failure.
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
The public virSecret object has a single "usage_id" field
but the virSecretDef object has a different 'char *' field
for each usage type, but the code all assumes every usage
type has a corresponding single string. Get rid of the
pointless union in virSecretDef and just use "usage_id"
everywhere. This doesn't impact public XML format, only
the internal handling.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The handler for the device removal failed event was using
the struct for the device added event. Fortunately the
layout was the same, so this was harmless.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When changing the metadata via virDomainSetMetadata, we now
emit an event to notify the app of changes. This is useful
when co-ordinating different applications read/write of
custom metadata.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently when spawning containers with systemd, the container PID 1
will get moved into the systemd machine slice. Libvirt then manually
moves the libvirt_lxc and qemu-nbd processes into the cgroups associated
with the slice, but skips the systemd controller cgroup. This means that
from systemd's POV, libvirt_lxc and qemu-nbd are still part of the
libvirtd.service unit.
On systemctl daemon-reload, it will notice that libvirt_lxc & qemu-nbd
are in the libvirtd.service unit for the systemd controller, but in the
machine cgroups for resources. Systemd will thus move them back into
the libvirtd.service resource cgroups next time libvirtd is restarted.
This causes libvirtd to kill off the container due to incorrect cgroup
placement.
The solution is to ensure that when moving libvirt_lxc & qemu-nbd, we
also move the systemd cgroup controller placement. Normally this is
not something we ever want todo, but this is a special case as we are
intentionally wanting to move them to a different systemd unit.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there's only linux implementation for
virGetFCHostNameByFabricWWN(). Since the symbol is exported in
our private symbols we ought to have implementation for other
platforms too. This also triggers compilation error on FreeBSD:
../src/.libs/libvirt_driver_storage_impl.a(libvirt_driver_storage_impl_la-storage_backend_scsi.o): In function `createVport':
/usr/home/jenkins/libvirt-master/systems/libvirt-freebsd/build/src/../../src/storage/storage_backend_scsi.c:740: undefined reference to `virGetFCHostNameByFabricWWN'
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add a test case for when the QEMU_CAPS_NO_KVM_PIT capability is set.
This capability is mutually exclusive to QEMU_CAPS_KVM_PIT_TICK_POLICY
and results in the same output regardless of whether "discard" or
"delay" was specified in the guest XML for 'tickpolicy'.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Separate out the "policy=discard" into it's own specific
qemu command line.
We'll rename "kvm-pit-device" test case to be "kvm-pit-discard"
since it has the syntax we'd be using.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
By a mistake, for the VIR_DOMAIN_TIMER_TICKPOLICY_DELAY qemu
command line creation, 'discard' was used instead of 'delay'
in commit id '1569fa14'.
Test "kvm-pit-delay" is fixed accordingly to show the correct
option being generated.
Remove the (now) redundant kvm-pit-device tests. As it turns
out there is no need to specify both QEMU_CAPS_NO_KVM_PIT and
QEMU_CAPS_KVM_PIT_TICK_POLICY since they are mutually exclusive
and "kvm-pit-device" becomes just the same as "kvm-pit-delay".
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1349696
As it turns out using only the 'parent' to achieve the goal of a
consistent vHBA parent has issues with reboots where the scsi_hostX
parent could change to scsi_hostY causing either failure to create
the vHBA or usage of the wrong HBA for our vHBA.
Thus add the ability to search for the "parent" by the parent wwnn/
wwpn values or just a fabric_name if someone only cares to ensure
usage of the same SAN for the vHBA.
Add new fields to the fchost structure to allow creation of a vHBA via
the storage pool when a parent_wwnn/parent_wwpn or parent_fabric_wwn is
supplied in the storage pool XML.
https://bugzilla.redhat.com/show_bug.cgi?id=1349696
When creating a vHBA, the process is to feed XML to nodeDeviceCreateXML
that lists the <parent> scsi_hostX to use to create the vHBA. However,
between reboots, it's possible that the <parent> changes its scsi_hostX
to scsi_hostY and saved XML to perform the creation will either fail or
create a vHBA using the wrong parent.
So add the ability to provide "wwnn" and "wwpn" or "fabric_wwn" to
the <parent> instead of a name of the scsi_hostN that is the parent.
The allowed XML will thus be:
<parent>scsi_host3</parent> (current)
or
<parent wwnn='$WWNN' wwpn='$WWPN'/>
or
<parent fabric_wwn='$WWNN'/>
Using the wwnn/wwpn or fabric_wwn ensures the same 'scsi_hostN' is
selected between hardware reconfigs or host reboots. The fabric_wwn
Using the wwnn/wwpn pair will provide the most specific search option,
while fabric_wwn will at least ensure usage of the same SAN, but maybe
not the same scsi_hostN.
This patch will add the new fields to the nodedev.rng for input purposes
only since the input XML is essentially thrown away, no need to Format
the values since they'd already be printed as part of the scsi_host
data block.
New API virNodeDeviceGetParentHostByWWNs will take the parent "wwnn" and
"wwpn" in order to search the list of devices for matching capability
data fields wwnn and wwpn.
New API virNodeDeviceGetParentHostByFabricWWN will take the parent "fabric_wwn"
in order to search the list of devices for matching capability data field
fabric_wwn.
Qemu has abandoned the +/-feature syntax in favor of key=value. Some
architectures (s390) do not support +/-feature. So we update libvirt to handle
both formats.
If we detect a sufficiently new Qemu (indicated by support for qmp
query-cpu-model-expansion) we use key=value else we fall back to +/-feature.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
When qmp query-cpu-model-expansion is available probe Qemu for its view of the
host model. In kvm environments this can provide a more complete view of the
host model because features supported by Qemu and Kvm can be considered.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
query-cpu-model-expansion is used to get a list of features for a given cpu
model name or to get the model and features of the host hardware/environment
as seen by Qemu/kvm.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Tests domain capabilities on s390x using the Qemu 2.8 capabilities data.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Expected Qemu replies for versions 2.7 and 2.8 from the s390x
Qemu binary.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
On s390, the host's features are heavily influenced by not only the host
hardware but also by hardware microcode level, host OS version, qemu
version and kvm version. In this environment it does not make sense to
attempt to report exact host details.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
Implement compare for s390. Required to test the guest against the host for
guest cpu model runnability checking. We always return IDENTICAL to bypass
Libvirt's checking. s390 will rely on Qemu to perform the runnability checking.
Implement update for s390. required to support use of cpu "host-model" mode.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
Just so it doesn't bite us in the future, even though it's unlikely.
And fix the comment above it as well. Commit e08ee7cd34 took the
info from the function it's calling, but that was lie itself in the
first place.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The resulting function virFileGetMountSubtreeImpl() just uses
virStringSortRevCompare or virStringSortCompare which uses strcmp().
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Our STREQ_NULLABLE and STRNEQ_NULLABLE macros are too
complicated. This was a result of some broken version of gcc.
However, that is long gone and therefore we can simplify the
macros.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>