18596 Commits

Author SHA1 Message Date
John Ferlan
f573f84eb7 storage: Add overwrite flag checking for logical pool
https://bugzilla.redhat.com/show_bug.cgi?id=1373711

Add support and documentation for the [NO_]OVERWRITE flags for the
logical backend.

Update virsh.pod with a description of the process for usage of
the flags and building of the pool's volume group.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
d5cc5f8997 storage: Extract logical device initialize into a helper
Make the remaining code a bit cleaner.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
71a08b5a5a storage: Clean up logical pool devices on build failure
If the build fails, then we need to ensure that we've run pvremove
on any devices which we've run pvcreate on; otherwise, a subsequent
build could fail since running pvcreate twice on a device requires
special force arguments.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
a4cb4a74f9 storage: Adjust disk label found to match labels
Currently as long as the disk is formatted using a known parted format
type, the algorithm is happy to continue. However, that leaves a scenario
whereby a disk formatted using "pc98" could be used by a pool that's defined
using "dvh" (or vice versa). Alter the check to be match and different
and adjust the caller.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
a48c674fba storage: Move and rename disk backend label checking
Rather than have the Disk code having to use PARTED to determine if
there's something on the device, let's use the virStorageBackendDeviceProbe.
and only fallback to the PARTED probing if the BLKID code isn't built in.

This will also provide a mechanism for the other current caller (File
System Backend) to utilize a PARTED parsing algorithm in the event that
BLKID isn't built in to at least see if *something* exists on the disk
before blindly trying to use. The PARTED error checking will not find
file system types, but if there is a partition table set on the device,
it will at least cause a failure.

Move virStorageBackendDiskValidLabel and virStorageBackendDiskFindLabel
to storage_backend and rename/rework the code to fit the new model.

Update the virsh.pod description to provide a more generic description
of the process since we could now use either blkid or parted to find
data on the target device.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
a11fd69735 storage: For FS pool check for properly formatted target volume
Prior to starting up, let's be sure the target volume device is
formatted as we expect; otherwise, inhibit the start.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
19ced38f1c storage: Add writelabel bool for virStorageBackendDeviceProbe
It's possible that the API could be called from a startup path in
order to check whether the label on the device matches what our
format is. In order to handle that condition, add a 'writelabel'
boolean to the API in order to indicate whether a write or just
read is about to happen.

This alters two "error" conditions that would care about knowing.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
a22e1a0032 storage: Add partition type checks for BLKID probing
A device may be formatted using some sort of disk partition format type.
We can check that using the blkid_ API's as well - so alter the logic to
allow checking the device for both a filesystem and a disk partition.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
f23d4bbce3 storage: Fix implementation of no-overwrite for file system backend
https://bugzilla.redhat.com/show_bug.cgi?id=1363586

Commit id '27758859' introduced the "NO_OVERWRITE" flag check for
file system backends; however, the implementation, documentation,
and algorithm was inconsistent. For the "flag" description for the
API the flag was described as "Do not overwrite existing pool";
however, within the storage backend code the flag is described
as "it probes to determine if filesystem already exists on the
target device, renurning an error if exists".

The code itself was implemented using the paradigm to set up the
superblock probe by creating a filter that would cause the code
to only search for the provided format type. If that type wasn't
found, then the algorithm would return success allowing the caller
to format the device. If the format type already existed on the
device, then the code would fail indicating that the a filesystem
of the same type existed on the device.

The result is that if someone had a file system of one type on the
device, it was possible to overwrite it if a different format type
was specified in updated XML effectively trashing whatever was on
the device already.

This patch alters what NO_OVERWRITE does for a file system backend
to be more realistic and consistent with what should be expected when
the caller requests to not overwrite the data on the disk.

Rather than filter results based on the expected format type, the
code will allow success/failure be determined solely on whether the
blkid_do_probe calls finds some known format on the device. This
adjustment also allows removal of the virStoragePoolProbeResult
enum that was under utilized.

If it does find a formatted file system different errors will be
generated indicating a file system of a specific type already exists
or a file system of some other type already exists.

In the original virsh support commit id 'ddcd5674', the description
for '--no-overwrite' within the 'pool-build' command help output
has an ambiguous "of this type" included in the short description.
Compared to the longer description within the "Build a given pool."
section of the virsh.pod file it's more apparent that the meaning
of this flag would cause failure if a probe of the target already
has a filesystem.

So this patch also modifies the short description to just be the
antecedent of the 'overwrite' flag, which matches the API description.
This patch also modifies the grammar in virsh.pod for no-overwrite
as well as reworking the paragraph formats to make it easier to read.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
John Ferlan
553d21da6c storage: Introduce virStorageBackendDeviceIsEmpty
Rename virStorageBackendFileSystemProbe and to virStorageBackendBLKIDFindFS
and move to the more common storage_backend module.

Create a shim virStorageBackendDeviceIsEmpty which will make the call
to the virStorageBackendBLKIDFindFS and check the return value.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-01-10 08:44:50 -05:00
Michal Privoznik
406e390962 qemu: Drop qemuDomainDeleteNamespace
After previous commits, this function is no longer needed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d198c2b2c qemuDomainCreateNamespace: move mkdir to qemuDomainBuildNamespace
Again, there is no need to create /var/lib/libvirt/$domain.*
directories in CreateNamespace(). It is sufficient to create them
as soon as we need them which is in BuildNamespace. This way we
don't leave them around for the whole lifetime of domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d30057695 qemuDomainGetPreservedMounts: Do not special case /dev
The c1140eb9e got me thinking. We don't want to special case /dev
in qemuDomainGetPreservedMounts(), but in all other places in the
code we special case it anyway. I mean,
/var/run/libvirt/$domain.dev path is constructed separately just
so that it is not constructed here. It makes only a little sense
(if any at all).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
40ebbf72d5 qemuDomainCreateNamespace: s/unlink/rmdir/
If something goes wrong in this function we try a rollback. That
is unlink all the directories we created earlier. For some weird
reason unlink() was called instead of rmdir().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
095f042ed6 qemu: Use transactions from security driver
So far if qemu is spawned under separate mount namespace in order
to relabel everything it needs an access to the security driver
to run in that namespace too. This has a very nasty down side -
it is being run in a separate process, so any internal state
transition is NOT reflected in the daemon. This can lead to many
sleepless nights. Therefore, use the transaction APIs so that
libvirt developers can sleep tight again.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:11 +01:00
Michal Privoznik
4674fc6afd security_selinux: Implement transaction APIs
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 12:50:00 +01:00
Michal Privoznik
67232478db security_dac: Implement transaction APIs
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 12:50:00 +01:00
Michal Privoznik
95576b4df0 security driver: Introduce transaction APIs
With our new qemu namespace code in place, the relabelling of
devices is done not as good is it could: a child process is
spawned, it enters the mount namespace of the qemu process and
then runs desired API of the security driver.

Problem with this approach is that internal state transition of
the security driver done in the child process is not reflected in
the parent process. While currently it wouldn't matter that much,
it is fairly easy to forget about that. We should take the extra
step now while this limitation is still fresh in our minds.

Three new APIs are introduced here:
  virSecurityManagerTransactionStart()
  virSecurityManagerTransactionCommit()
  virSecurityManagerTransactionAbort()

The Start() is going to be used to let security driver know that
we are starting a new transaction. During a transaction no
security labels are actually touched, but rather recorded and
only at Commit() phase they are actually updated. Should
something go wrong Abort() aborts the transaction freeing up all
memory allocated by transaction.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 12:49:59 +01:00
Michal Privoznik
39779eb195 security_dac: Resolve virSecurityDACSetOwnershipInternal const correctness
The code at the very bottom of the DAC secdriver that calls
chown() should be fine with read-only data. If something needs to
be prepared it should have been done beforehand.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 12:49:59 +01:00
Andrea Bolognani
1d8454639f qemu: Use virtio-pci by default for mach-virt guests
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.

Users and applications can already opt-in by explicitly using

  <address type='pci'/>

inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.

What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.

That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.

Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
2017-01-10 12:33:53 +01:00
Peter Krempa
a946ea1a33 qemu: setvcpus: Properly coldplug vcpus when hotpluggable vcpus are present
When coldplugging vcpus to a VM that already has a few hotpluggable
vcpus the code might generate invalid configuration as
non-hotpluggable cpus need to be clustered starting from vcpu 0.

This fix forces the added vcpus to be hotpluggable in such case.

Fixes a corner case described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1370357
2017-01-10 10:47:06 +01:00
Nitesh Konkar
ae16c95f1b perf: Add cache_l1d perf event support
This patch adds support and documentation for
a generalized hardware cache event called cache_l1d
perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2017-01-09 18:15:31 -05:00
Jiri Denemark
dc2bfdc815 Update remote_protocol-structs for new events
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-09 19:53:55 +01:00
Daniel P. Berrange
42241208d9 secret: add support for value change events
Emit an event whenever a secret value changes

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 16:42:04 +00:00
Daniel P. Berrange
06fcee63cf secret: add support for lifecycle events
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:49 +00:00
Daniel P. Berrange
3b7bd6e540 remote: implement secret lifecycle event APIs
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:49 +00:00
Daniel P. Berrange
bd300b7194 conf: simplify internal virSecretDef handling of usage
The public virSecret object has a single "usage_id" field
but the virSecretDef object has a different 'char *' field
for each usage type, but the code all assumes every usage
type has a corresponding single string. Get rid of the
pointless union in virSecretDef and just use "usage_id"
everywhere. This doesn't impact public XML format, only
the internal handling.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:49 +00:00
Daniel P. Berrange
df740caf54 conf: add secret event handling
Add helper APIs / objects for managing secret events

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:48 +00:00
Daniel P. Berrange
34fd3caabf Introduce secret lifecycle event APIs
Add public APIs to allow applications to watch for define and
undefine of secret objects.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:48 +00:00
Daniel P. Berrange
89283c138e remote: fix struct for device removal failed event
The handler for the device removal failed event was using
the struct for the device added event. Fortunately the
layout was the same, so this was harmless.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:46 +00:00
Daniel P. Berrange
c50070173d Add domain event for metadata changes
When changing the metadata via virDomainSetMetadata, we now
emit an event to notify the app of changes. This is useful
when co-ordinating different applications read/write of
custom metadata.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:00 +00:00
Daniel P. Berrange
f1e48297cf cgroup: add virCgroupAddMachineTask stub for win32
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 14:27:34 +00:00
Daniel P. Berrange
44f79a0bd0 lxc: ensure libvirt_lxc and qemu-nbd move into systemd machine slice
Currently when spawning containers with systemd, the container PID 1
will get moved into the systemd machine slice. Libvirt then manually
moves the libvirt_lxc and qemu-nbd processes into the cgroups associated
with the slice, but skips the systemd controller cgroup. This means that
from systemd's POV, libvirt_lxc and qemu-nbd are still part of the
libvirtd.service unit.

On systemctl daemon-reload, it will notice that libvirt_lxc & qemu-nbd
are in the libvirtd.service unit for the systemd controller, but in the
machine cgroups for resources. Systemd will thus move them back into
the libvirtd.service resource cgroups next time libvirtd is restarted.
This causes libvirtd to kill off the container due to incorrect cgroup
placement.

The solution is to ensure that when moving libvirt_lxc & qemu-nbd, we
also move the systemd cgroup controller placement. Normally this is
not something we ever want todo, but this is a special case as we are
intentionally wanting to move them to a different systemd unit.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 12:46:52 +00:00
Michal Privoznik
65fb0b79f7 security_selinux: s/virSecuritySELinuxSecurity/virSecuritySELinux/
It doesn't make much sense to have two different prefix for
functions within the same driver.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-09 09:17:42 +01:00
Michal Privoznik
981d979047 virutil: Provide non-linux impl for virGetFCHostNameByFabricWWN
Currently, there's only linux implementation for
virGetFCHostNameByFabricWWN(). Since the symbol is exported in
our private symbols we ought to have implementation for other
platforms too. This also triggers compilation error on FreeBSD:

../src/.libs/libvirt_driver_storage_impl.a(libvirt_driver_storage_impl_la-storage_backend_scsi.o): In function `createVport':
/usr/home/jenkins/libvirt-master/systems/libvirt-freebsd/build/src/../../src/storage/storage_backend_scsi.c:740: undefined reference to `virGetFCHostNameByFabricWWN'

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-09 09:13:41 +01:00
Maxim Nestratov
af78cb0486 qemu: Allow to specify pit timer tick policy=discard
Separate out the "policy=discard" into it's own specific
qemu command line.

We'll rename "kvm-pit-device" test case to be "kvm-pit-discard"
since it has the syntax we'd be using.

Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
2017-01-06 18:27:06 -05:00
Maxim Nestratov
ef5c8bb412 qemu: Fix pit timer tick policy=delay
By a mistake, for the VIR_DOMAIN_TIMER_TICKPOLICY_DELAY qemu
command line creation, 'discard' was used instead of 'delay'
in commit id '1569fa14'.

Test "kvm-pit-delay" is fixed accordingly to show the correct
option being generated.

Remove the (now) redundant kvm-pit-device tests. As it turns
out there is no need to specify both QEMU_CAPS_NO_KVM_PIT and
QEMU_CAPS_KVM_PIT_TICK_POLICY since they are mutually exclusive
and "kvm-pit-device" becomes just the same as "kvm-pit-delay".

Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
2017-01-06 18:27:06 -05:00
John Ferlan
78be2e8b74 iscsi: Add parent wwnn/wwpn or fabric capability for createVport
https://bugzilla.redhat.com/show_bug.cgi?id=1349696

As it turns out using only the 'parent' to achieve the goal of a
consistent vHBA parent has issues with reboots where the scsi_hostX
parent could change to scsi_hostY causing either failure to create
the vHBA or usage of the wrong HBA for our vHBA.

Thus add the ability to search for the "parent" by the parent wwnn/
wwpn values or just a fabric_name if someone only cares to ensure
usage of the same SAN for the vHBA.
2017-01-06 17:15:34 -05:00
John Ferlan
8366cb0a20 util: Introduce virGetFCHostNameByFabricWWN
Create a utility routine in order to read the scsi_host fabric_name files
looking for a match to a passed fabric_name
2017-01-06 17:15:34 -05:00
John Ferlan
bb74a7ffeb conf: Add more fchost search fields for storage pool vHBA creation
Add new fields to the fchost structure to allow creation of a vHBA via
the storage pool when a parent_wwnn/parent_wwpn or parent_fabric_wwn is
supplied in the storage pool XML.
2017-01-06 17:15:34 -05:00
John Ferlan
2b13361bc7 nodedev: Add the ability to create vHBA by parent wwnn/wwpn or fabric_wwn
https://bugzilla.redhat.com/show_bug.cgi?id=1349696

When creating a vHBA, the process is to feed XML to nodeDeviceCreateXML
that lists the <parent> scsi_hostX to use to create the vHBA. However,
between reboots, it's possible that the <parent> changes its scsi_hostX
to scsi_hostY and saved XML to perform the creation will either fail or
create a vHBA using the wrong parent.

So add the ability to provide "wwnn" and "wwpn" or "fabric_wwn" to
the <parent> instead of a name of the scsi_hostN that is the parent.
The allowed XML will thus be:

  <parent>scsi_host3</parent>  (current)

or

  <parent wwnn='$WWNN' wwpn='$WWPN'/>

or

  <parent fabric_wwn='$WWNN'/>

Using the wwnn/wwpn or fabric_wwn ensures the same 'scsi_hostN' is
selected between hardware reconfigs or host reboots. The fabric_wwn
Using the wwnn/wwpn pair will provide the most specific search option,
while fabric_wwn will at least ensure usage of the same SAN, but maybe
not the same scsi_hostN.

This patch will add the new fields to the nodedev.rng for input purposes
only since the input XML is essentially thrown away, no need to Format
the values since they'd already be printed as part of the scsi_host
data block.

New API virNodeDeviceGetParentHostByWWNs will take the parent "wwnn" and
"wwpn" in order to search the list of devices for matching capability
data fields wwnn and wwpn.

New API virNodeDeviceGetParentHostByFabricWWN will take the parent "fabric_wwn"
in order to search the list of devices for matching capability data field
fabric_wwn.
2017-01-06 17:14:12 -05:00
Collin L. Walling
d47db7b16d qemu: command: Support new cpu feature argument syntax
Qemu has abandoned the +/-feature syntax in favor of key=value. Some
architectures (s390) do not support +/-feature. So we update libvirt to handle
both formats.

If we detect a sufficiently new Qemu (indicated by support for qmp
query-cpu-model-expansion) we use key=value else we fall back to +/-feature.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Jiri Denemark
5d513d4659 qemu-caps: Get host model directly from Qemu when available
When qmp query-cpu-model-expansion is available probe Qemu for its view of the
host model. In kvm environments this can provide a more complete view of the
host model because features supported by Qemu and Kvm can be considered.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Collin L. Walling
fab9d6e1a9 qemu: qmp query-cpu-model-expansion command
query-cpu-model-expansion is used to get a list of features for a given cpu
model name or to get the model and features of the host hardware/environment
as seen by Qemu/kvm.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Jason J. Herne
8f77821522 s390-cpu: Remove nodeData and decode
On s390, the host's features are heavily influenced by not only the host
hardware but also by hardware microcode level, host OS version, qemu
version and kvm version. In this environment it does not make sense to
attempt to report exact host details.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-06 12:24:56 +01:00
Jason J. Herne
79d72011ee s390: Cpu driver support for update and compare
Implement compare for s390. Required to test the guest against the host for
guest cpu model runnability checking. We always return IDENTICAL to bypass
Libvirt's checking. s390 will rely on Qemu to perform the runnability checking.

Implement update for s390. required to support use of cpu "host-model" mode.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-06 12:24:56 +01:00
Martin Kletzander
c1140eb9ed qemu: Remove /dev mount info properly
Just so it doesn't bite us in the future, even though it's unlikely.

And fix the comment above it as well.  Commit e08ee7cd3405 took the
info from the function it's calling, but that was lie itself in the
first place.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-05 16:24:55 +01:00
Martin Kletzander
08ad8f9fe2 util: Don't lie in virFileGetMount*Subtree's docstrings
The resulting function virFileGetMountSubtreeImpl() just uses
virStringSortRevCompare or virStringSortCompare which uses strcmp().

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-05 16:23:25 +01:00
Michal Privoznik
e08ee7cd34 qemuDomainGetPreservedMounts: Fetch list of /dev/* mounts dynamically
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-05 16:00:20 +01:00
Michal Privoznik
486fd7f700 internal: Simplify STREQ_NULLABLE
Our STREQ_NULLABLE and STRNEQ_NULLABLE macros are too
complicated. This was a result of some broken version of gcc.
However, that is long gone and therefore we can simplify the
macros.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-05 14:40:15 +01:00