Commit Graph

6029 Commits

Author SHA1 Message Date
Martin Kletzander
87ee705183 qemu: Miscellaneous Block I/O tune cleanups
Well, just two.  One indentation and the usage of 'ret'.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-29 19:53:52 +01:00
Martin Kletzander
e9d75343d4 qemu: Only set group_name when actually requested
We were setting it based on whether it was supported and that lead to
setting it to NULL, which our JSON code caught.  However it ended up
producing the following results:

 $ virsh blkdeviotune fedora vda --total-bytes-sec-max 2000
 error: Unable to change block I/O throttle
 error: internal error: argument key 'group' must not have null value

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-29 19:46:51 +01:00
Michal Privoznik
572eda12ad qemu: Implement mtu on interface
Not only we should set the MTU on the host end of the device but
also let qemu know what MTU did we set.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-26 10:00:01 +01:00
Michal Privoznik
b020cf73fe domain_conf: Introduce <mtu/> to <interface/>
So far we allow to set MTU for libvirt networks. However, not all
domain interfaces have to be plugged into a libvirt network and
even if they are, they might want to have a different MTU (e.g.
for testing purposes).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-26 09:59:56 +01:00
Chen Hanxiao
980f2a35c7 qemu_domain: add timestamp in tainting of guests log
We lacked of timestamp in tainting of guests log,
which bring troubles for finding guest issues:
such as whether a guest powerdown caused by qemu-monitor-command
or others issues inside guests.
If we had timestamp in tainting of guests log,
it would be helpful when checking guest's /var/log/messages.

Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
2017-01-21 12:34:19 -05:00
Jiri Denemark
6cb204b7ac qemu: Reset hostModelInfo in virQEMUCapsReset
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-20 15:52:56 +01:00
Michal Privoznik
57b5e27d3d qemu: set default vhost-user ifname
Based on work of Mehdi Abaakouk <sileht@sileht.net>.

When parsing vhost-user interface XML and no ifname is found we
can try to fill it in in post parse callback. The way this works
is we try to make up interface name from given socket path and
then ask openvswitch whether it knows the interface.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-20 15:42:12 +01:00
Peter Krempa
1d4fd2dd0f qemu: hotplug: Properly emit "DEVICE_DELETED" event when unplugging memory
The event needs to be emitted after the last monitor call, so that it's
not possible to find the device in the XML accidentally while the vm
object is unlocked.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1414393
2017-01-20 14:24:35 +01:00
Daniel P. Berrange
b9cc6316c0 qemu: catch failure of drive_add
Previously when QEMU failed "drive_add" due to an error opening
a file it would report

  "could not open disk image"

These days though, QEMU reports

  "Could not open '/tmp/virtd-test_e3hnhh5/disk1.qcow2': Permission denied"

which we were not detecting as an error condition.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-19 10:56:53 +00:00
Peter Krempa
9d14cf595a qemu: Move cpu hotplug code into qemu_hotplug.c
Move all the worker code into the appropriate file. This will also allow
testing of cpu hotplug.
2017-01-18 09:57:06 +01:00
Peter Krempa
5570f26763 qemu: Prepare for reuse of qemuDomainSetVcpusLive
Extract the call to qemuDomainSelectHotplugVcpuEntities outside of
qemuDomainSetVcpusLive and decide whether to hotplug or unplug the
entities specified by the cpumap using a boolean flag.

This will allow to use qemuDomainSetVcpusLive in cases where we prepare
the list of vcpus to enable or disable by other means.
2017-01-18 09:57:06 +01:00
Peter Krempa
5cd670fea8 qemu: monitor: More strict checking of 'query-cpus' if hotplug is supported
In cases where CPU hotplug is supported by qemu force the monitor to
reject invalid or broken responses to 'query-cpus'. It's expected that
the command returns usable data in such case.
2017-01-18 09:57:06 +01:00
Jiri Denemark
f66b185c46 qemu: Don't leak hostCPUModelInfo in virQEMUCaps
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-17 14:36:52 +01:00
Michal Privoznik
d0baf54e53 qemu: Actually unshare() iff running as root
https://bugzilla.redhat.com/show_bug.cgi?id=1413922

While all the code that deals with qemu namespaces correctly
detects whether we are running as root (and turn into NO-OP for
qemu:///session) the actual unshare() call is not guarded with
such check. Therefore any attempt to start a domain under
qemu:///session shall fail as unshare() is reserved for root.

The fix consists of moving unshare() call (for which we have a
wrapper called virProcessSetupPrivateMountNS) into
qemuDomainBuildNamespace() where the proper check is performed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
2017-01-17 13:23:56 +01:00
Daniel P. Berrange
2d0c4947ab Revert "perf: Add cache_l1d perf event support"
This reverts commit ae16c95f1b.
2017-01-16 16:54:34 +00:00
Collin L. Walling
e8a43f1995 qemu-capabilities: Fix query-cpu-model-expansion on s390 with older kernel
When running on s390 with a kernel that does not support cpu model checking and
with a Qemu new enough to support query-cpu-model-expansion, the gathering of qemu
capabilities will fail. Qemu responds to the query-cpu-model-expansion qmp
command with an error because the needed kernel ioct does not exist. When this
happens a guest cannot even be defined due to missing qemu capabilities data.

This patch fixes the problem by silently ignoring generic errors stemming from
calls to query-cpu-model-expansion.

Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-13 16:55:58 +01:00
Michal Privoznik
93a062c3b2 qemu: Copy SELinux labels for namespace too
When creating new /dev/* for qemu, we do chown() and copy ACLs to
create the exact copy from the original /dev. I though that
copying SELinux labels is not necessary as SELinux will chose the
sane defaults. Surprisingly, it does not leaving namespace with
the following labels:

crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0     random
crw-------. root root system_u:object_r:tmpfs_t:s0     rtc0
drwxrwxrwt. root root system_u:object_r:tmpfs_t:s0     shm
crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0     urandom

As a result, domain is unable to start:

error: internal error: process exited while connecting to monitor: Error in GnuTLS initialization: Failed to acquire random data.
qemu-kvm: cannot initialize crypto: Unable to initialize GNUTLS library: Failed to acquire random data.

The solution is to copy the SELinux labels as well.

Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-13 14:45:52 +01:00
Jiri Denemark
19e06cfa25 qemu: Ignore non-boolean CPU model properties
The query-cpu-model-expansion is currently implemented for s390(x) only
and all CPU properties it returns are booleans. However, x86
implementation will report more types of properties. Without making the
code more tolerant older libvirt would fail to probe newer QEMU
versions.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-12 11:58:25 +01:00
Jiri Denemark
ec23791517 qemu: Don't check CPU model property key
The qemuMonitorJSONParseCPUModelProperty function is a callback for
virJSONValueObjectForeachKeyValue and is called for each key/value pair,
thus it doesn't really make sense to check whether key is NULL.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-01-12 11:58:25 +01:00
Michal Privoznik
cbc45525cb qemuDomainCreateDevice: Canonicalize paths
So far the decision whether /dev/* entry is created in the qemu
namespace is really simple: does the path starts with "/dev/"?
This can be easily fooled by providing path like the following
(for any considered device like disk, rng, chardev, ..):

  /dev/../var/lib/libvirt/images/disk.qcow2

Therefore, before making the decision the path should be
canonicalized.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:08:13 +01:00
Michal Privoznik
49f326edc0 qemu: Use namespaces iff available on the host kernel
So far the namespaces were turned on by default unconditionally.
For all non-Linux platforms we provided stub functions that just
ignored whatever namespaces setting there was in qemu.conf and
returned 0 to indicate success. Moreover, we didn't really check
if namespaces are available on the host kernel.

This is suboptimal as we might have ignored user setting.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:07:43 +01:00
Michal Privoznik
41816751a7 util: Introduce virFileMoveMount
This is a simple wrapper over mount(). However, not every system
out there is capable of moving a mount point. Therefore, instead
of having to deal with this fact in all the places of our code we
can have a simple wrapper and deal with this fact at just one
place.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:06:30 +01:00
Michal Privoznik
2ff8c30548 qemuDomainSetupAllInputs: Update debug message
Due to a copy-paste error, the debug message reads:

  Setting up disks

It should have been:

  Setting up inputs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 17:39:24 +01:00
Laine Stump
5949b53aec conf: eliminate virDomainPCIAddressReleaseSlot() in favor of ...Addr()
Surprisingly there was a virDomainPCIAddressReleaseAddr() function
already, but it was completely unused. Since we don't reserve entire
slots at once any more, there is no need to release entire slots
either, so we just replace the single call to
virDomainPCIAddressReleaseSlot() with a call to
virDomainPCIAddressReleaseAddr() and remove the now unused function.

The keen observer may be concerned that ...Addr() doesn't call
virDomainPCIAddressValidate(), as ...Slot() did. But really the
validation was pointless anyway - if the device hadn't been suitable
to be connected at that address, it would have failed validation
before every being reserved in the first place, so by definition it
will pass validation when it is being unplugged. (And anyway, even if
something "bad" happened and we managed to have a device incorrectly
at the given address, we would still want to be able to free it up for
use by a device that *did* validate properly).
2017-01-11 05:00:34 -05:00
Laine Stump
6cc2014202 qemu: rename qemuDomainPCIAddressReserveNextSlot() to ...Addr()
This function doesn't actually reserve an entire slot any more, it
reserves a single PCI address, so this name is more appropriate.
2017-01-11 05:00:08 -05:00
Laine Stump
c5aea19d56 qemu: remove qemuDomainPCIAddressReserveNextAddr()
This function is only called in two places, and the function itself is
just adding a single argument and calling
virDomainPCIAddressReserveNextAddr(), so we can remove it and instead
call virDomainPCIAddressReserveNextAddr() directly. (The main
motivation for doing this is to free up the name so that
qemuDomainPCIAddressReserveNextSlot() can be renamed in the next
patch, as its current name is now inaccurate and misleading).
2017-01-11 04:59:42 -05:00
Laine Stump
27b0f971c4 conf: rename virDomainPCIAddressReserveSlot() to ...Addr()
This function doesn't actually reserve an entire slot any more, it
reserves a single PCI address, so this name is more appropriate.
2017-01-11 04:58:32 -05:00
Laine Stump
905859a6e5 qemu: replace virDomainPCIAddressReserveAddr with virDomainPCIAddressReserveSlot
All occurences of the former use fromConfig=true, and that's exactly
how virDomainPCIAddressReserveSlot() calls
virDomainPCIaddressReserveAddr(), so just use *Slot() so that *Addr()
can be made static to conf/domain_addr.c (both functions will be
renamed in upcoming patches).
2017-01-11 04:55:06 -05:00
Laine Stump
b59bbdba4b conf: fix fromConfig argument to virDomainPCIAddressValidate()
fromConfig should be true if the caller wants
virDomainPCIAddressValidate() to loosen restrictions on its
interpretation of the pciConnectFlags. In particular, either
PCI_DEVICE or PCIE_DEVICE will be counted as equivalent to both, and
HOTPLUG will be ignored. In a few cases where libvirt was manually
overriding automatic address assignment, it was setting fromConfig to
false when validating the hardcoded manual override. This patch
changes those to fromConfig=true as a preemptive strike against any
future bugs that might otherwise surface.
2017-01-11 04:51:54 -05:00
Laine Stump
79901543b9 conf: fix fromConfig argument to virDomainPCIAddressReserveAddr()
Although setting virDomainPCIAddressReserveAddr()'s fromConfig=true is
correct when a PCI addres is coming from a domain's config, the *true*
purpose of the fromConfig argument is to lower restrictions on what
kind of device can plug into what kind of controller - if fromConfig
is true, then a PCIE_DEVICE can plug into a slot that is marked as
only compatible with PCI_DEVICE (and vice versa), and the HOTPLUG flag
is ignored.

For a long time there have been several calls to
virDomainPCIAddressReserveAddr() that have fromConfig incorrectly set
to false - it's correct that the addresses aren't coming from user
config, but they are coming from hardcoded exceptions in libvirt that
should, if anything, pay *even less* attention to following the
pciConnectFlags (under the assumption that the libvirt programmer knew
what they were doing).

See commit b87703cf7 for an example of an actual bug caused by the
incorrect setting of the "fromConfig" argument to
virDomainPCIAddressReserveAddr(). Although they haven't resulted in
any reported bugs, this patch corrects all the other incorrect
settings of fromConfig in calls to virDomainPCIAddressReserveAddr().
2017-01-11 04:47:12 -05:00
Laine Stump
48d39cf96d conf: aggregate multiple devices on a slot when assigning PCI addresses
If a PCI device has VIR_PCI_CONNECT_AGGREGATE_SLOT set in its
pciConnectFlags, then during address assignment we allow multiple
instances of this type of device to be auto-assigned to multiple
functions on the same device. A slot is used for aggregating multiple
devices only if the first device assigned to that slot had
VIR_PCI_CONNECT_AGGREGATE_SLOT set. but any device types that have
AGGREGATE_SLOT set might be mix/matched on the same slot.

(NB: libvirt should never set the AGGREGATE_SLOT flag for a device
type that might need to be hotplugged. Currently it is only planned
for pcie-root-port and possibly other PCI controller types, and none
of those are hotpluggable anyway)

There aren't yet any devices that use this flag. That will be in a
later patch.
2017-01-11 04:43:22 -05:00
Laine Stump
8f4008713a qemu: use virDomainPCIAddressSetAllMulti() to set multi when needed
If there are multiple devices assigned to the different functions of a
single PCI slot, they will not work properly if the device at function
0 doesn't have its "multi" attribute turned on, so it makes sense for
libvirt to turn it on during PCI address assignment. Setting multi
then assures that the new setting is stored in the config (so it will
be used next time the domain is started), preventing any potential
problems in the case that a future change in the configuration
eliminates the devices on all non-0 functions (multi will still be set
for function 0 even though it is the only function in use on the slot,
which has no useful purpose, but also doesn't cause any problems).

(NB: If we were to instead just decide on the setting for
multifunction at runtime, a later removal of the non-0 functions of a
slot would result in a silent change in the guest ABI for the
remaining device on function 0 (although it may seem like an
inconsequential guest ABI change, it *is* a guest ABI change to turn
off the multi bit).)
2017-01-11 04:42:08 -05:00
Laine Stump
9ff9d9f5a9 conf: eliminate concept of "reserveEntireSlot"
setting reserveEntireSlot really accomplishes nothing - instead of
going to the trouble of computing the value for reserveEntireSlot and
then possibly setting *all* functions of the slot as in-use, we can
just set the in-use bit only for the specific function being used by a
device.  Later we will know from the context (the PCI connect flags,
and whether we are reserving a specific address or asking for "the
next available") whether or not it is okay to allocate other functions
on the same slot.

Although it's not used yet, we allow specifying "-1" for the function
number when looking for the "next available slot" - this is going to
end up meaning "return the lowest available function in the slot, but
since we currently only provide a function from an otherwise unused
slot, "-1" ends up meaning "0".
2017-01-11 04:36:34 -05:00
Laine Stump
9838cad9cd conf: use struct instead of int for each slot in virDomainPCIAddressBus
When keeping track of which functions of which slots are allocated, we
will need to have more information than just the current bitmap with a
bit for each function that is currently stored for each slot in a
virDomainPCIAddressBus. To prepare for adding more per-slot info, this
patch changes "uint8_t slots" into "virDomainPCIAddressSlot slot", which
currently has a single member named "functions" that serves the same
purpose previously served directly by "slots".
2017-01-11 04:29:48 -05:00
Michal Privoznik
269589146c qemu_domain: Move qemuDomainGetPreservedMounts
This function is used only from code compiled on Linux. Therefore
on non-Linux platforms it triggers compilation error:

../../src/qemu/qemu_domain.c:209:1: error: unused function 'qemuDomainGetPreservedMounts' [-Werror,-Wunused-function]

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 19:23:49 +01:00
Peter Krempa
b469853812 qemu: blockjob: Fix locking of block copy/active block commit
For the blockjobs, where libvirt is able to track the state internally
we can fix locking of images we can remove the appropriate locks.

Also when doing a pivoting operation we should not acquire the lock on
any of those images since both are actually locked already.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1302168
2017-01-10 19:12:19 +01:00
Peter Krempa
f61e40610d qemu: snapshot: Properly handle image locking
Images that became the backing chain of the current image due to the
snapshot need to be unlocked in the lock manager. Also if qemu was
paused during the snapshot the current top level images need to be
released until qemu is resumed so that they can be acquired properly.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1191901
2017-01-10 19:12:19 +01:00
Peter Krempa
cbb4d229de qemu: snapshot: Refactor snapshot rollback on failure
The code at first changed the definition and then rolled it back in case
of failure. This was ridiculous. Refactor the code so that the image in
the definition is changed only when the snapshot is successful.

The refactor will also simplify further fix of image locking when doing
snapshots.
2017-01-10 19:12:19 +01:00
Peter Krempa
7456c4f5f0 qemu: snapshot: Don't redetect backing chain after snapshot
Libvirt is able to properly model what happens to the backing chain
after a snapshot so there's no real need to redetect the data.
Additionally with the _REUSE_EXT flag this might end up in redetecting
wrong data if the user puts wrong backing chain reference into the
snapshot image.
2017-01-10 19:12:19 +01:00
Michal Privoznik
406e390962 qemu: Drop qemuDomainDeleteNamespace
After previous commits, this function is no longer needed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d198c2b2c qemuDomainCreateNamespace: move mkdir to qemuDomainBuildNamespace
Again, there is no need to create /var/lib/libvirt/$domain.*
directories in CreateNamespace(). It is sufficient to create them
as soon as we need them which is in BuildNamespace. This way we
don't leave them around for the whole lifetime of domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d30057695 qemuDomainGetPreservedMounts: Do not special case /dev
The c1140eb9e got me thinking. We don't want to special case /dev
in qemuDomainGetPreservedMounts(), but in all other places in the
code we special case it anyway. I mean,
/var/run/libvirt/$domain.dev path is constructed separately just
so that it is not constructed here. It makes only a little sense
(if any at all).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
40ebbf72d5 qemuDomainCreateNamespace: s/unlink/rmdir/
If something goes wrong in this function we try a rollback. That
is unlink all the directories we created earlier. For some weird
reason unlink() was called instead of rmdir().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
095f042ed6 qemu: Use transactions from security driver
So far if qemu is spawned under separate mount namespace in order
to relabel everything it needs an access to the security driver
to run in that namespace too. This has a very nasty down side -
it is being run in a separate process, so any internal state
transition is NOT reflected in the daemon. This can lead to many
sleepless nights. Therefore, use the transaction APIs so that
libvirt developers can sleep tight again.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:11 +01:00
Michal Privoznik
39779eb195 security_dac: Resolve virSecurityDACSetOwnershipInternal const correctness
The code at the very bottom of the DAC secdriver that calls
chown() should be fine with read-only data. If something needs to
be prepared it should have been done beforehand.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 12:49:59 +01:00
Andrea Bolognani
1d8454639f qemu: Use virtio-pci by default for mach-virt guests
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.

Users and applications can already opt-in by explicitly using

  <address type='pci'/>

inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.

What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.

That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.

Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
2017-01-10 12:33:53 +01:00
Peter Krempa
a946ea1a33 qemu: setvcpus: Properly coldplug vcpus when hotpluggable vcpus are present
When coldplugging vcpus to a VM that already has a few hotpluggable
vcpus the code might generate invalid configuration as
non-hotpluggable cpus need to be clustered starting from vcpu 0.

This fix forces the added vcpus to be hotpluggable in such case.

Fixes a corner case described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1370357
2017-01-10 10:47:06 +01:00
Nitesh Konkar
ae16c95f1b perf: Add cache_l1d perf event support
This patch adds support and documentation for
a generalized hardware cache event called cache_l1d
perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2017-01-09 18:15:31 -05:00
Daniel P. Berrange
c50070173d Add domain event for metadata changes
When changing the metadata via virDomainSetMetadata, we now
emit an event to notify the app of changes. This is useful
when co-ordinating different applications read/write of
custom metadata.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 15:53:00 +00:00
Maxim Nestratov
af78cb0486 qemu: Allow to specify pit timer tick policy=discard
Separate out the "policy=discard" into it's own specific
qemu command line.

We'll rename "kvm-pit-device" test case to be "kvm-pit-discard"
since it has the syntax we'd be using.

Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
2017-01-06 18:27:06 -05:00
Maxim Nestratov
ef5c8bb412 qemu: Fix pit timer tick policy=delay
By a mistake, for the VIR_DOMAIN_TIMER_TICKPOLICY_DELAY qemu
command line creation, 'discard' was used instead of 'delay'
in commit id '1569fa14'.

Test "kvm-pit-delay" is fixed accordingly to show the correct
option being generated.

Remove the (now) redundant kvm-pit-device tests. As it turns
out there is no need to specify both QEMU_CAPS_NO_KVM_PIT and
QEMU_CAPS_KVM_PIT_TICK_POLICY since they are mutually exclusive
and "kvm-pit-device" becomes just the same as "kvm-pit-delay".

Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
2017-01-06 18:27:06 -05:00
Collin L. Walling
d47db7b16d qemu: command: Support new cpu feature argument syntax
Qemu has abandoned the +/-feature syntax in favor of key=value. Some
architectures (s390) do not support +/-feature. So we update libvirt to handle
both formats.

If we detect a sufficiently new Qemu (indicated by support for qmp
query-cpu-model-expansion) we use key=value else we fall back to +/-feature.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Jiri Denemark
5d513d4659 qemu-caps: Get host model directly from Qemu when available
When qmp query-cpu-model-expansion is available probe Qemu for its view of the
host model. In kvm environments this can provide a more complete view of the
host model because features supported by Qemu and Kvm can be considered.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Collin L. Walling
fab9d6e1a9 qemu: qmp query-cpu-model-expansion command
query-cpu-model-expansion is used to get a list of features for a given cpu
model name or to get the model and features of the host hardware/environment
as seen by Qemu/kvm.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2017-01-06 12:24:57 +01:00
Martin Kletzander
c1140eb9ed qemu: Remove /dev mount info properly
Just so it doesn't bite us in the future, even though it's unlikely.

And fix the comment above it as well.  Commit e08ee7cd34 took the
info from the function it's calling, but that was lie itself in the
first place.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-05 16:24:55 +01:00
Michal Privoznik
e08ee7cd34 qemuDomainGetPreservedMounts: Fetch list of /dev/* mounts dynamically
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-05 16:00:20 +01:00
Michal Privoznik
6de3f11637 qemuProcessLaunch: fix indentation
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-05 14:38:45 +01:00
Wangjing (King, Euler)
3afaae4984 qemu: snapshot: restart CPUs when recover from interrupted snapshot job
If we restart libvirtd while VM was doing external memory snapshot, VM's
state be updated to paused as a result of running a migration-to-file
operation, and then VM will be left as paused state. In this case we must
restart the VM's CPUs to resume it.

Signed-off-by: Wang King <king.wang@huawei.com>
2017-01-05 10:47:03 +01:00
Peter Krempa
2e86c0816f qemu: snapshot: Resume VM after live snapshot
Commit 4b951d1e38 missed the fact that the
VM needs to be resumed after a live external checkpoint (memory
snapshot) where the cpus would be paused by the migration rather than
libvirt.
2017-01-04 16:50:18 +01:00
Michal Privoznik
dd78da09b0 qemuDomainCreateDevice: Be more careful about device path
Again, not something that I'd hit, but there is a chance in
theory that this might bite us. Currently the way we decide
whether or not to create /dev entry for a device is by marching
first four characters of path with "/dev". This might be not
enough. Just imagine somebody has a disk image stored under
"/devil/path/to/disk". We ought to be matching against "/dev/".

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
Michal Privoznik
ce01a2b11c qemuDomainAttachDeviceMknodHelper: Don't unlink() so often
Not that I'd encounter any bug here, but the code doesn't look
100% correct. Imagine, somebody is trying to attach a device to a
domain, and the device's /dev entry already exists in the qemu
namespace. This is handled gracefully and the control continues
with setting up ACLs and calling security manager to set up
labels. Now, if any of these steps fail, control jump on the
'cleanup' label and unlink() the file straight away. Even when it
was not us who created the file in the first place. This can be
possibly dangerous.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
Michal Privoznik
3aae99fe71 qemu: Handle EEXIST gracefully in qemuDomainCreateDevice
https://bugzilla.redhat.com/show_bug.cgi?id=1406837

Imagine you have a domain configured in such way that you are
assigning two PCI devices that fall into the same IOMMU group.
With mount namespace enabled what happens is that for the first
PCI device corresponding /dev/vfio/X entry is created and when
the code tries to do the same for the second mknod() fails as
/dev/vfio/X already exists:

2016-12-21 14:40:45.648+0000: 24681: error :
qemuProcessReportLogError:1792 : internal error: Process exited
prior to exec: libvirt: QEMU Driver error : Failed to make device
/var/run/libvirt/qemu/windoze.dev//vfio/22: File exists

Worse, by default there are some devices that are created in the
namespace regardless of domain configuration (e.g. /dev/null,
/dev/urandom, etc.). If one of them is set as backend for some
guest device (e.g. rng, chardev, etc.) it's the same story as
described above.

Weirdly, in attach code this is already handled.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
John Ferlan
7f7d990483 qemu: Don't assume secret provided for LUKS encryption
https://bugzilla.redhat.com/show_bug.cgi?id=1405269

If a secret was not provided for what was determined to be a LUKS
encrypted disk (during virStorageFileGetMetadata processing when
called from qemuDomainDetermineDiskChain as a result of hotplug
attach qemuDomainAttachDeviceDiskLive), then do not attempt to
look it up (avoiding a libvirtd crash) and do not alter the format
to "luks" when adding the disk; otherwise, the device_add would
fail with a message such as:

   "unable to execute QEMU command 'device_add': Property 'scsi-hd.drive'
    can't find value 'drive-scsi0-0-0-0'"

because of assumptions that when the format=luks that libvirt would have
provided the secret to decrypt the volume.

Access to unlock the volume will thus be left to the application.
2017-01-03 12:59:18 -05:00
Shivaprasad G Bhat
5f65c96e8d Allow virtio-console on PPC64
virQEMUCapsSupportsChardev existing checks returns true
for spapr-vty alone. Instead verify spapr-vty validity
and let the logic to return true for other device types
so that virtio-console passes.

The non-pseries machines dont have spapr-vio-bus. So, the
function always returned false for them before.

Fixes - https://bugzilla.redhat.com/show_bug.cgi?id=1257813

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
2016-12-21 18:01:10 +01:00
Nikolay Shirokovskiy
9f08b76631 qemu: clean out unused migrate to unix 2016-12-21 16:24:59 +01:00
John Ferlan
b9b1aa6392 qemu: Adjust qemuDomainGetBlockInfo data for sparse backed files
According to commit id '0282ca45a' the 'physical' value should
essentially be the last offset of the image or the host physical
size in bytes of the image container. However, commit id '15fa84ac'
refactored the GetBlockInfo to use the same returned data as the
GetStatsBlock API for an active domain. For the 'entry->physical'
that would end up being the "actual-size" as set through the
qemuMonitorJSONBlockStatsUpdateCapacityOne (commit '7b11f5e5').
Digging deeper into QEMU code one finds that actual_size is
filled in using the same algorithm as GetBlockInfo has used for
setting the 'allocation' field when the domain is inactive.

The difference in values is seen primarily in sparse raw files
and other container type files (such as qcow2), which will return
a smaller value via the stat API for 'st_blocks'. Additionally
for container files, the 'capacity' field (populated via the
QEMU "virtual-size" value) may be slightly different (smaller)
in order to accomodate the overhead for the container. For
sparse files, the state 'st_size' field is returned.

This patch thus alters the allocation and physical values for
sparse backed storage files to be more appropriate to the API
contract. The result for GetBlockInfo is the following:

 capacity: logical size in bytes of the image (how much storage
           the guest will see)
 allocation: host storage in bytes occupied by the image (such
             as highest allocated extent if there are no holes,
             similar to 'du')
 physical: host physical size in bytes of the image container
           (last offset, similar to 'ls')

NB: The GetStatsBlock API allows a different contract for the
values:

 "block.<num>.allocation" - offset of the highest written sector
                            as unsigned long long.
 "block.<num>.capacity" - logical size in bytes of the block device
                          backing image as unsigned long long.
 "block.<num>.physical" - physical size in bytes of the container
                          of the backing image as unsigned long long.
2016-12-20 12:56:44 -05:00
Marc Hartmayer
fb2cd32c9a qemu: qemuDomainDiskChangeSupported: Add missing 'address' check
Disk->info is not live updatable so add a check for this. Otherwise
libvirt reports success even though no data was updated.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
2016-12-20 11:22:44 +01:00
Peter Krempa
8551d39f4f qemu: blockcopy: Save monitor error prior to calling into lock manager
The error would be overwritten otherwise producing a meaningless error
message.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1302171
2016-12-19 17:28:41 +01:00
Peter Krempa
9e9305542e qemu: block copy: Forbid block copy to relative paths
Similarly to 29bb066915 forbid paths used with blockjobs to be relative.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1300177
2016-12-16 18:30:39 +01:00
Michal Privoznik
ab41ce7f4e qemu: Mark more namespace code linux-only
Some of the functions are not called on non-linux platforms
which makes them useless there.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-16 11:51:06 +00:00
Nitesh Konkar
71bbe65311 perf: add ref_cpu_cycles perf event support
This patch adds support and documentation for
the ref_cpu_cycles perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-15 17:32:03 -05:00
Nitesh Konkar
9ae79400ff perf: add stalled_cycles_backend perf event support
This patch adds support and documentation for
the stalled_cycles_backend perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-15 16:47:05 -05:00
Nitesh Konkar
060c159b08 perf: add stalled_cycles_frontend perf event support
This patch adds support and documentation
for the stalled_cycles_frontend perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-15 16:47:05 -05:00
Nitesh Konkar
7d34731067 perf: add bus_cycles perf event support
This patch adds support and documentation
for the bus_cycles perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-15 16:47:05 -05:00
Peter Krempa
4b951d1e38 qemu: snapshot: Don't attempt to resume cpus if they were not paused
External disk-only snapshots with recent enough qemu don't require
libvirt to pause the VM. The logic determining when to resume cpus was
slightly flawed and attempted to resume them even if they were not
paused by the snapshot code. This normally was not a problem, but with
locking enabled the code would attempt to acquire the lock twice.

The fallout of this bug would be a error from the API, but the actual
snapshot being created. The bug was introduced with when adding support
for external snapshots with memory (checkpoints) in commit f569b87.

Resolves problems described by:
https://bugzilla.redhat.com/show_bug.cgi?id=1403691
2016-12-15 09:46:41 +01:00
Peter Krempa
e8f167a623 qemu: monitor: Don't resume lockspaces in resume event handler
After qemu delivers the resume event it's already running and thus it's
too late to enter lockspaces since it may already have modified the
disk. The code only creates false log entries in the case when locking
is enabled. The lockspace needs to be acquired prior to starting cpus.
2016-12-15 09:46:41 +01:00
Michal Privoznik
f444faa94a qemu: Enable mount namespace
https://bugzilla.redhat.com/show_bug.cgi?id=1404952

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
661887f558 qemu: Let users opt-out from containerization
Given how intrusive previous patches are, it might happen that
there's a bug or imperfection. Lets give users a way out: if they
set 'namespaces' to an empty array in qemu.conf the feature is
suppressed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
f95c5c48d4 qemu: Manage /dev entry on RNG hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
f5fdf23a68 qemu: Manage /dev entry on chardev hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
6e57492839 qemu: Manage /dev entry on hostdev hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
81df21507b qemu: Manage /dev entry on disk hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
eadaa97548 qemu: Enter the namespace on relabelling
Instead of trying to fix our security drivers, we can use a
simple trick to relabel paths in both namespace and the host.
I mean, if we enter the namespace some paths are still shared
with the host so any change done to them is visible from the host
too.
Therefore, we can just enter the namespace and call
SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight
overhead because we have to fork in order to enter the namespace.
But on the other hand, no complexity is added to our code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
2160f338a7 qemu: Prepare RNGs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
8ec8a8c5ff qemu: Prepare inputs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
2c654490f3 qemu: Prepare TPM when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
4e4451019c qemu: Prepare chardevs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
73267cec46 qemu: Prepare hostdevs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
054202d020 qemu: Prepare disks when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
bb4e529664 qemu: Spawn qemu under mount namespace
Prime time. When it comes to spawning qemu process and
relabelling all the devices it's going to touch, there's inherent
race with other applications in the system (e.g. udev). Instead
of trying convincing udev to not touch libvirt managed devices,
we can create a separate mount namespace for the qemu, and mount
our own /dev there. Of course this puts more work onto us as we
have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, this enhances security also.

From technical POV, on domain startup process the parent
(libvirtd) creates:

  /var/lib/libvirt/qemu/$domain.dev
  /var/lib/libvirt/qemu/$domain.devpts

The child (which is going to be qemu eventually) calls unshare()
to create new mount namespace. From now on anything that child
does is invisible to the parent. Child then mounts tmpfs on
$domain.dev (so that it still sees original /dev from the host)
and creates some devices (as explained in one of the previous
patches). The devices have to be created exactly as they are in
the host (including perms, seclabels, ACLs, ...). After that it
moves $domain.dev mount to /dev.

What's the $domain.devpts mount there for then you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML allowing users to connect to them.
Therefore we must preserve devpts mount to be shared with the
host's one.

To make this patch as small as possible, creating of devices
configured for domain in question is implemented in next patches.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
a5896e8ca4 qemu_cgroup: Expose defaultDeviceACL
This is a list of devices that qemu needs for its run (apart from
what's configured for domain). The devices on the list are
enabled in the CGroups by default so they will be good candidates
for initial /dev for new qemu.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Daniel P. Berrange
a81cfb649d Avoid variable named 'stat'
Using a variable named 'stat' clashes with the system function
'stat()' causing compiler warnings on some platforms

cc1: warnings being treated as errors
../../src/qemu/qemu_monitor_text.c: In function 'parseMemoryStat':
../../src/qemu/qemu_monitor_text.c:604: error: declaration of 'stat' shadows a global declaration [-Wshadow]
/usr/include/sys/stat.h:455: error: shadowed declaration is here [-Wshadow]

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-12-14 12:17:08 +00:00
Viktor Mihajlovski
283e290434 qemu: Allow use of hot plugged host CPUs if no affinity set
If the cpuset cgroup controller is disabled in /etc/libvirt/qemu.conf
QEMU virtual machines can in principle use all host CPUs, even if they
are hot plugged, if they have no explicit CPU affinity defined.

However, there's libvirt code supposed to handle the situation where
the libvirt daemon itself is not using all host CPUs. The code in
qemuProcessInitCpuAffinity attempts to set an affinity mask including
all defined host CPUs. Unfortunately, the resulting affinity mask for
the process will not contain the offline CPUs. See also the
sched_setaffinity(2) man page.

That means that even if the host CPUs come online again, they won't be
used by the QEMU process anymore. The same is true for newly hot
plugged CPUs. So we are effectively preventing that QEMU uses all
processors instead of enabling it to use them.

It only makes sense to set the QEMU process affinity if we're able
to actually grow the set of usable CPUs, i.e. if the process affinity
is a subset of the online host CPUs.

There's still the chance that for some reason the deliberately chosen
libvirtd affinity matches the online host CPU mask by accident. In this
case the behavior remains as it was before (CPUs offline while setting
the affinity will not be used if they show up later on).

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
2016-12-13 18:25:00 -05:00
Jiri Denemark
f00c00475f qemu: Fix virQEMUCapsFindTarget on ppc64le
virQEMUCapsFindTarget is supposed to find an alternative QEMU binary if
qemu-system-$GUEST_ARCH doesn't exist. The alternative is using host
architecture when it is compatible with $GUEST_ARCH. But a special
treatment has to be applied for ppc64le since the QEMU binary is always
called qemu-system-ppc64.

Broken by me in v2.2.0-171-gf2e71550d.

https://bugzilla.redhat.com/show_bug.cgi?id=1403745

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-12-13 22:11:33 +01:00
Nitesh Konkar
8981d7925e perf: add branch_misses perf event support
This patch adds support and documentation
for the branch_misses perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-12 18:04:52 -05:00
Nikolay Shirokovskiy
cdd6819318 qemu: agent: take monitor lock in qemuAgentNotifyEvent
qemuAgentNotifyEvent accesses monitor structure and is called on qemu
reset/shutdown/suspend events under domain lock. Other monitor
functions on the other hand take monitor lock and don't hold domain lock.
Thus it is possible to have risky simultaneous access to the structure
from 2 threads. Let's take monitor lock here to make access exclusive.
2016-12-12 17:14:11 -05:00
Nikolay Shirokovskiy
c9a191fc48 qemu: don't use vm when lock is dropped in qemuDomainGetFSInfo
Current call to qemuAgentGetFSInfo in qemuDomainGetFSInfo is
unsafe. Domain lock is dropped and we use vm->def. Let's make
def copy to fix that.
2016-12-12 17:14:11 -05:00
Nikolay Shirokovskiy
3ab9652a86 qemu: agent: fix uninitialized var case in qemuAgentGetFSInfo
In case of 0 filesystems *info is not set while according
to virDomainGetFSInfo contract user should call free on it even
in case of 0 filesystems. Thus we need to properly set
it. NULL will be enough as free eats NULLs ok.
2016-12-12 17:14:11 -05:00
John Ferlan
cf436a560d qemu: Fix GetBlockInfo setting allocation from wr_highest_offset
The libvirt-domain.h documentation indicates that for a qcow2 file
in a filesystem being used for a backing store should report the disk
space occupied by a file; however, commit id '15fa84ac' altered the
code to trust that the wr_highest_offset should be used whenever
wr_highest_offset_valid was set.

As it turns out this will lead to indeterminite results. For an active
domain when qemu hasn't yet had the need to find the wr_highest_offset
value, qemu will report 0 even though qemu-img will report the proper
disk size. This causes reporting of the following XML:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/test-1g.qcow2'/>

to be as follows:

Capacity:       1073741824
Allocation:     0
Physical:       1074139136

with qemu-img indicating:

image: /path/to/test-1g.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G

Once the backing source file is opened on the guest, then wr_highest_offset
is updated, but only to the high water mark and not the size of the file.

This patch will adjust the logic to check for the file backed qcow2 image
and enforce setting the allocation to the returned 'physical' value, which
is the 'actual-size' value from a 'query-block' operation.

NB: The other consumer of the wr_highest_offset output (GetAllDomainStats)
has a contract that indicates 'allocation' is the offset of the highest
written sector, so it doesn't need adjustment.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
9d734b60a7 util: Introduce virStorageSourceUpdateCapacity
Instead of having duplicated code in qemuStorageLimitsRefresh and
virStorageBackendUpdateVolTargetInfo to get capacity specific data
about the storage backing source or volume -- create a common API
to handle the details for both.

As a side effect, virStorageFileProbeFormatFromBuf returns to being
a local/static helper to virstoragefile.c

For the QEMU code - if the probe is done, then the format is saved so
as to avoid future such probes.

For the storage backend code, there is no need to deal with the probe
since we cannot call the new API if target->format == NONE.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
3039ec962e util: Introduce virStorageSourceUpdateBackingSizes
Instead of having duplicated code in qemuStorageLimitsRefresh and
virStorageBackendUpdateVolTargetInfoFD to fill in the storage backing
source or volume allocation, capacity, and physical values - create a
common API that will handle the details for both.

The common API will fill in "default" capacity values as well - although
those more than likely will be overridden by subsequent code. Having just
one place to make the determination of what the values should be will
make things be more consistent.

For the QEMU code - the data filled in will be for inactive domains
for the GetBlockInfo and DomainGetStatsOneBlock API's. For the storage
backend code - the data will be filled in during the volume updates.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
c5f6151390 util: Introduce virStorageSourceUpdatePhysicalSize
Commit id '8dc27259' introduced virStorageSourceUpdateBlockPhysicalSize
in order to retrieve the physical size for a block backed source device
for an active domain since commit id '15fa84ac' changed to use the
qemuMonitorGetAllBlockStatsInfo and qemuMonitorBlockStatsUpdateCapacity
API's to (essentially) retrieve the "actual-size" from a 'query-block'
operation for the source device.

However, the code only was made functional for a BLOCK backing type
and it neglected to use qemuOpenFile, instead using just open. After
the open the block lseek would find the end of the block and set the
physical value, close the fd and return.

Since the code would return 0 immediately if the source device wasn't
a BLOCK backed device, the physical would be displayed incorrectly,
such as follows in domblkinfo for a file backed source device:

Capacity:       1073741824
Allocation:     0
Physical:       0

This patch will modify the algorithm to get the physical size for other
backing types and it will make use of the qemuDomainStorageOpenStat
helper in order to open/stat the source file depending on its type.
The qemuDomainGetStatsOneBlock will no longer inhibit printing errors,
but it will still ignore them leaving the physical value set to 0.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
a7fea19fcd qemu: Introduce helper qemuDomainStorageUpdatePhysical
Currently just a shim to call virStorageSourceUpdateBlockPhysicalSize

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
732af77cce qemu: Add helpers to handle stat data for qemuStorageLimitsRefresh
Split out the opening of the file and fetch of the stat buffer into a
helper qemuDomainStorageOpenStat. This will handle either opening the
local or remote storage.

Additionally split out the cleanup of that into a separate helper
qemuDomainStorageCloseStat which will either close the file or
call the virStorageFileDeinit function.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
John Ferlan
7149d1693d qemu: Clean up description for qemuStorageLimitsRefresh
Originally added by commit id '89646e69' prior to commit id '15fa84ac'
and '71d2c172' which ensured that qemuStorageLimitsRefresh was only called
for inactive domains.

Adjust the comment describing the need for FIXME and move all the text
to the function description.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-12 16:04:17 -05:00
Nikolay Shirokovskiy
1215965a4c qemu: mark user defined websocket as used
We need extra state variable to distinguish between autogenerated
and user defined cases after auto generation is done.
2016-12-09 07:54:34 -05:00
Nikolay Shirokovskiy
b07cfd724f qemu: Refactor qemuProcessGraphicsReservePorts
Use switch for enums rather than if/else conditions.
2016-12-09 07:40:46 -05:00
Michal Privoznik
b492f7ef0f qemuGetDomainHugepagePath: Initialize @ret
The variable may be used uninitialized in this function.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-09 10:51:37 +01:00
Mehdi Abaakouk
e0d893e86d Move virstat.c code to virnetdevtap.c
This is just a code move of virstat.c to virnetdevtap.c
2016-12-09 10:28:07 +01:00
Mehdi Abaakouk
9b6de7c506 virstat: fix signature of virstat helper
In preparation to the code move to virnetdevtap.c, this change:

* renames virNetInterfaceStats to virNetDevTapInterfaceStats
* changes 'path' to 'ifname', to use the same vocable as other
  method in virnetdevtap.c.
* Add the attributes checker
2016-12-09 10:27:56 +01:00
Mehdi Abaakouk
013df874db Gathering vhostuser interface stats with ovs
When vhostuser interfaces are used, the interface statistics
are not available in /proc/net/dev.

This change looks at the openvswitch interfaces statistics
tables to provide this information for vhostuser interface.

Note that in openvswitch world drop/error doesn't always make sense
for some interface type. When these informations are not available we
set them to 0 on the virDomainInterfaceStats.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-09 10:23:09 +01:00
Peter Krempa
a4ed5b4212 qemu: Don't try to find compression program for "raw" memory images
There's nothing to compress if the requested snapshot memory format is
set to 'raw' explicitly. After commit 9e14689ea libvirt would try to
run /sbin/raw to process the memory stream if the qemu.conf option
snapshot_image_format is set.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1402726
2016-12-08 17:12:54 +01:00
Michal Privoznik
ce937d3710 security: Drop virSecurityManagerSetHugepages
Since its introduction in 2012 this internal API did nothing.
Moreover we have the same API that does exactly the same:
virSecurityManagerDomainSetPathLabel.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-08 15:45:52 +01:00
Michal Privoznik
f55afd83b1 qemu: Create hugepage path on per domain basis
If you've ever tried running a huge page backed guest under
different user than in qemu.conf, you probably failed. Problem is
even though we have corresponding APIs in the security drivers,
there's no implementation and thus we don't relabel the huge page
path. But even if we did, so far all of the domains share the
same path:

   /hugepageMount/libvirt/qemu

Our only option there would be to set 0777 mode on the qemu dir
which is totally unsafe. Therefore, we can create dir on
per-domain basis, i.e.:

   /hugepageMount/libvirt/qemu/domainName

and chown domainName dir to the user that domain is configured to
run under.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-08 15:45:52 +01:00
Michal Privoznik
7ed6934f3b virDomainObjGetShortName: take virDomainDef
So far this function takes virDomainObjPtr which:
1) is an overkill,
2) might be not available in all the places we will use it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-08 15:45:52 +01:00
Peter Krempa
cf44dc072a qemu: capabilities: Add gluster.debug_level detection for 2.8.0+
Qemu 2.8.0+ changes arguments structure for blockdev-add in the effort
to make it finally stable. Since libvirt recently added the detection of
gluster debug support relying on the old syntax we need to add the new
as well.
2016-12-07 13:34:22 +01:00
Nitesh Konkar
8546adf80b perf: add one more perf event support
With current perf framework, this patch adds support and documentation
for the branch_instructions perf event.

Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-12-07 07:03:57 -05:00
John Ferlan
1ff38366b8 qemu: Add the group name option to the iotune command line
Add in the block I/O throttling group parameter to the command line
if supported. If not supported, fail command creation.

Add the xml2argvtest for testing.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-05 18:30:38 -05:00
John Ferlan
c53bd25b13 qemu: Add support for parsing iotune group setting
Add support to read/parse the iotune group setting for qemu.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-12-05 18:12:08 -05:00
John Ferlan
d0f82df205 qemu: Adjust various bool BlockIoTune set_ values into a single mask
Rather than have multiple bool values, create a single enum with bits
representing what fields are set. Fields are generally set in groups
of 3 (read, write, total).
2016-12-05 18:12:08 -05:00
John Ferlan
ad9f127302 qemu: Alter qemuMonitorJSONSetBlockIoThrottle command logic
Currently we build the JSON object for the "block_set_io_throttle"
command using the knowledge that a NULL for a support*Options boolean
would essentially ignore the rest of the arguments.

This may not work properly if some capability was backported, plus it just
looks rather ugly. So instead, build the "base" arguments and then if
the support*Option bool capability is set, add in the arguments on the fly.

Then append those arguments to the basic command and send to qemu.
2016-12-05 18:12:08 -05:00
John Ferlan
c84ad82a2d qemu: Adjust maxparams logic for qemuDomainGetBlockIoTune
Rather than using negative logic and setting the maxparams to a lesser
value based on which capabilities exist, alter the logic to modify the
maxparams based on a base value plus the found capabilities. Reduces the
chance that some backported feature produces an incorrect value.
2016-12-05 18:12:08 -05:00
John Ferlan
d3364dfdc8 caps: Add new capability for the iotune group name
Add the capability to detect if the qemu binary can support the feature
to use throttling.group.
2016-12-05 18:12:08 -05:00
Yuri Chornoivan
ff8e021225 Fix minor typos 2016-12-02 09:25:13 +01:00
gaohaifeng
f81b33b50c qemuDomainAttachNetDevice: pass mq and vectors for vhost-user with multiqueue
Two reasons:
1.in none hotplug, we will pass it. We can see from libvirt function
qemuBuildVhostuserCommandLine
2.qemu will use this vetcor num to init msix table. If we don't pass, qemu
will use default value, this will cause VM can only use default value
interrupts at most.

Signed-off-by: gaohaifeng <gaohaifeng.gao@huawei.com>
2016-12-01 15:02:35 +01:00
Eric Farman
655429a0d4 qemu: Prevent detaching SCSI controller used by hostdev
Consider the following XML snippets:

  $ cat scsicontroller.xml
      <controller type='scsi' model='virtio-scsi' index='0'/>
  $ cat scsihostdev.xml
      <hostdev mode='subsystem' type='scsi'>
        <source>
          <adapter name='scsi_host0'/>
          <address bus='0' target='8' unit='1074151456'/>
        </source>
      </hostdev>

If we create a guest that includes the contents of scsihostdev.xml,
but forget the virtio-scsi controller described in scsicontroller.xml,
one is silently created for us.  The same holds true when attaching
a hostdev before the matching virtio-scsi controller.
(See qemuDomainFindOrCreateSCSIDiskController for context.)

Detaching the hostdev, followed by the controller, works well and the
guest behaves appropriately.

If we detach the virtio-scsi controller device first, any associated
hostdevs are detached for us by the underlying virtio-scsi code (this
is fine, since the connection is broken).  But all is not well, as the
guest is unable to receive new virtio-scsi devices (the attach commands
succeed, but devices never appear within the guest), nor even be
shutdown, after this point.

While this is not libvirt's problem, we can prevent falling into this
scenario by checking if a controller is being used by any hostdev
devices.  The same is already done for disk elements today.

Applying this patch and then using the XML snippets from earlier:

  $ virsh detach-device guest_01 scsicontroller.xml
  error: Failed to detach device from scsicontroller.xml
  error: operation failed: device cannot be detached: device is busy

  $ virsh detach-device guest_01 scsihostdev.xml
  Device detached successfully

  $ virsh detach-device guest_01 scsicontroller.xml
  Device detached successfully

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
2016-11-30 17:16:47 -05:00
Laine Stump
70249927b7 qemu: assign VFIO devices to PCIe addresses when appropriate
Although nearly all host devices that are assigned to guests using
VFIO ("<hostdev>" devices in libvirt) are physically PCI Express
devices, until now libvirt's PCI address assignment has always
assigned them addresses on legacy PCI controllers in the guest, even
if the guest's machinetype has a PCIe root bus (e.g. q35 and
aarch64/virt).

This patch tries to assign them to an address on a PCIe controller
instead, when appropriate. First we do some preliminary checks that
might allow setting the flags without doing any extra work, and if
those conditions aren't met (and if libvirt is running privileged so
that it has proper permissions), we perform the (relatively) time
consuming task of reading the device's PCI config to see if it is an
Express device. If this is successful, the connect flags are set based
on the result, but if we aren't able to read the PCI config (most
likely due to the device not being present on the system at the time
of the check) we assume it is (or will be) an Express device, since
that is almost always the case anyway.
2016-11-30 15:41:57 -05:00
Laine Stump
9b0848d523 qemu: propagate virQEMUDriver object to qemuDomainDeviceCalculatePCIConnectFlags
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.

In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
2016-11-30 15:28:07 -05:00
Jiri Denemark
0355de2e77 qemuProcessReconnect: Avoid relabeling images after migration
Restarting libvirtd on the source host at the end of migration when a
domain is already running on the destination would cause image labels to
be reset effectively killing the domain. Commit e8d0166e1d fixed similar
issue on the destination host, but kept the source always resetting the
labels, which was mostly correct except for the specific case handled by
this patch.

https://bugzilla.redhat.com/show_bug.cgi?id=1343858

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-29 12:37:04 +01:00
Jiri Denemark
ee3ea86b37 qemu: Report tunnelled post-copy migration as unsupported
Post-copy migration needs bi-directional communication between the
source and the destination QEMU processes, which is not supported by
tunnelled migration.

https://bugzilla.redhat.com/show_bug.cgi?id=1371358

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-29 12:31:25 +01:00
Peter Krempa
b87a11340f qemu: capabilities: Don't partially reprope caps on process reconnect
Thanks to the complex capability caching code virQEMUCapsProbeQMP was
never called when we were starting a new qemu VM. On the other hand,
when we are reconnecting to the qemu process we reload the capability
list from the status XML file. This means that the flag preventing the
function being called was not set and thus we partially reprobed some of
the capabilities.

The recent addition of CPU hotplug clears the
QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS if the machine does not support it.
The partial re-probe on reconnect results into attempting to call the
unsupported command and then killing the VM.

Remove the partial reprobe and depend on the stored capabilities. If it
will be necessary to reprobe the capabilities in the future, we should
do a full reprobe rather than this partial one.
2016-11-28 10:02:36 +01:00
Jiri Denemark
a1adfb0f06 qemu: Add support for unavailable-features
QEMU 2.8.0 adds support for unavailable-features in
query-cpu-definitions reply. The unavailable-features array lists CPU
features which prevent a corresponding CPU model from being usable on
current host. It can only be used when all the unavailable features are
disabled. Empty array means the CPU model can be used without
modifications.

We can use unavailable-features for providing CPU model usability info
in domain capabilities XML:

    <domainCapabilities>
      ...
      <cpu>
        <mode name='host-passthrough' supported='yes'/>
        <mode name='host-model' supported='yes'>
          <model fallback='allow'>Skylake-Client</model>
          ...
        </mode>
        <mode name='custom' supported='yes'>
          <model usable='yes'>qemu64</model>
          <model usable='yes'>qemu32</model>
          <model usable='no'>phenom</model>
          <model usable='yes'>pentium3</model>
          <model usable='yes'>pentium2</model>
          <model usable='yes'>pentium</model>
          <model usable='yes'>n270</model>
          <model usable='yes'>kvm64</model>
          <model usable='yes'>kvm32</model>
          <model usable='yes'>coreduo</model>
          <model usable='yes'>core2duo</model>
          <model usable='no'>athlon</model>
          <model usable='yes'>Westmere</model>
          <model usable='yes'>Skylake-Client</model>
          ...
        </mode>
      </cpu>
      ...
    </domainCapabilities>

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-28 09:11:22 +01:00
Jiri Denemark
73411a7ff1 qemu: Avoid reporting "host" as a supported CPU model
"host" CPU model is supported by a special host-passthrough CPU mode and
users is not allowed to specify this model directly with custom mode.
Thus we should not advertise "host" CPU model in domain capabilities.
This worked well on architectures for which libvirt provides a list of
supported CPU models in cpu_map.xml (since "host" is not in the list).
But we need to explicitly filter "host" model out for all other
architectures.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:59:19 +01:00
Jiri Denemark
7bf6f345e0 qemu: Probe CPU models for KVM and TCG
CPU models (and especially some additional details which we will start
probing for later) differ depending on the accelerator. Thus we need to
call query-cpu-definitions in both KVM and TCG mode to get all data we
want.

Tests in tests/domaincapstest.c are temporarily switched to TCG to avoid
having to squash even more stuff into this single patch. They will all
be switched back later in separate commits.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:27 +01:00
Jiri Denemark
7c95619cb1 qemu: Introduce virQEMUCapsFormatCPUModels
This patch moves the CPU models formatting code from
virQEMUCapsFormatCache into a separate function.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
1bdcd7a4ee qemu: Introduce virQEMUCapsLoadCPUModels
This patch moves the CPU models parsing code from virQEMUCapsLoadCache
into a separate function.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
f9d57f2b57 qemu: Refresh caps in virQEMUCapsCacheLookupByArch
The function just returned cached capabilities without checking whether
they are still valid. We should check that and refresh the capabilities
to make sure we don't return stale data. In other words, we should do
what all other lookup functions do.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
72e5aa4e1e qemu: Refactor virQEMUCapsCacheLookup
The function is made a little bit more readable and the code which
refreshes cached capabilities if they are not valid any more was moved
into a separate function (virQEMUCapsCacheValidate) so that it can be
reused in other places.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
cd51b90fbf qemu: Don't return unusable virttype in domain capabilities
If a user asked for a KVM domain capabilities when KVM is not available,
we would happily return data we got when probing through TCG and
pretended they were relevant for KVM. Let's just report KVM is not
supported to avoid confusion.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
8f55eef246 qemu: Use saner defaults for domain capabilities
When domain capabilities were introduced we did not have enough data to
decide whether KVM works on the host or not and thus working legacy/VFIO
device assignment was used as a witness. Now that we know whether KVM
was enabled when probing QEMU capabilities (and thus we know it's
working), we can use this knowledge to provide better default value for
virttype.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
d87df9bd39 qemu: Discard caps cache when KVM availability changes
Since some may depend on the accelerator used when probing QEMU the
cache becomes invalid when KVM becomes available or if it is not
available anymore.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
25ba9c31f5 qemu: Enable KVM when probing capabilities
CPU related capabilities may differ depending on accelerator used when
probing. Let's use KVM if available when probing QEMU and fall back to
TCG. The created capabilities already contain all we need to distinguish
whether KVM or TCG was used:

    - KVM was used when probing capabilities:
        QEMU_CAPS_KVM is set
        QEMU_CAPS_ENABLE_KVM is not set

    - TCG was used and QEMU supports KVM, but it failed (e.g., missing
      kernel module or wrong /dev/kvm permissions)
        QEMU_CAPS_KVM is not set
        QEMU_CAPS_ENABLE_KVM is set

    - KVM was not used and QEMU does not support it
        QEMU_CAPS_KVM is not set
        QEMU_CAPS_ENABLE_KVM is not set

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
429a7b231c qemu: Probe KVM state earlier
Let's set QEMU_CAPS_KVM and QEMU_CAPS_ENABLE_KVM early so that the rest
of the probing code can use these capabilities to handle KVM/TCG replies
differently.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
e73447f693 qemu: Use -machine when probing capabilities via QMP
Using -machine instead of -M for QMP probing is safe because any QEMU
binary which is capable of QMP probing supports -machine.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Jiri Denemark
4c5d05ea8a qemu: Make QMP probing process reusable
The code that runs a new QEMU process to be used for probing
capabilities is separated into four reusable functions so that any code
that wants to probe a QEMU process may just follow a few simple steps:

    cmd = virQEMUCapsInitQMPCommandNew(...);
    virQEMUCapsInitQMPCommandRun(cmd);

    /* talk to the running QEMU process using its QMP monitor */

    if (reprobeIsRequired) {
        virQEMUCapsInitQMPCommandAbort(cmd, ...);
        virQEMUCapsInitQMPCommandRun(cmd);

        /* talk to the running QEMU process again */
    }

    virQEMUCapsInitQMPCommandFree(cmd);

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-11-25 20:34:26 +01:00
Michal Privoznik
c2a5a4e7ea virstring: Unify string list function names
We have couple of functions that operate over NULL terminated
lits of strings. However, our naming sucks:

virStringJoin
virStringFreeList
virStringFreeListCount
virStringArrayHasString
virStringGetFirstWithPrefix

We can do better:

virStringListJoin
virStringListFree
virStringListFreeCount
virStringListHasString
virStringListGetFirstWithPrefix

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-11-25 13:54:05 +01:00
Boris Fiuczynski
b178fa8ecb qemu: fix internal error: NUMA isn't available on this host
If libvirt is compiled without NUMACTL support starting libvirtd
reports a libvirt internal error "NUMA isn't available on this host"
without checking if NUMA support is compiled into the libvirt binaries.
This patch adds the missing NUMA support check to prevent the internal error.
It also includes a check if the cgroup controller cpuset is available before
using it.

The error was noticed when libvirtd was restarted with running domains and
on libvirtd start the qemuConnectCgroup gets called during qemuProcessReconnect.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
2016-11-25 09:48:41 +01:00
Eric Farman
8c6d365373 qemu: Allow hotplug of vhost-scsi device
Adjust the device string that is built for vhost-scsi devices so that it
can be invoked from hotplug.

From the QEMU command line, the file descriptors are expect to be numeric only.
However, for hotplug, the file descriptors are expected to begin with at least
one alphabetic character else this error occurs:

  # virsh attach-device guest_0001 ~/vhost.xml
  error: Failed to attach device from /root/vhost.xml
  error: internal error: unable to execute QEMU command 'getfd':
  Parameter 'fdname' expects a name not starting with a digit

We also close the file descriptor in this case, so that shutting down the
guest cleans up the host cgroup entries and allows future guests to use
vhost-scsi devices.  (Otherwise the guest will silently end.)

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
2016-11-24 12:16:23 -05:00
Eric Farman
9cc26dc622 qemu: Add vhost-scsi string for -device parameter
Open /dev/vhost-scsi, and record the resulting file descriptor, so that
the guest has access to the host device outside of the libvirt daemon.
Pass this information, along with data parsed from the XML file, to build
a device string for the qemu command line.  That device string will be
for either a vhost-scsi-ccw device in the case of an s390 machine, or
vhost-scsi-pci for any others.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
2016-11-24 12:16:19 -05:00
Eric Farman
fc0e627bac Introduce framework for a hostdev SCSI_host subsystem type
We already have a "scsi" hostdev subsys type, which refers to a single
LUN that is passed through to a guest.  But what of things where
multiple LUNs are passed through via a single SCSI HBA, such as with
the vhost-scsi target?  Create a new hostdev subsys type that will
carry this.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
2016-11-24 12:15:26 -05:00