Commit Graph

20811 Commits

Author SHA1 Message Date
Martin Kletzander
e28ccd2643 util: Remove unused variable in virResctrlGetInfo
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-29 21:40:54 +01:00
Martin Kletzander
6899118043 util: Make it possible for virResctrlAllocSetMask to replace existing mask
This wil be used in the future, but it makes sense for now as well.  It makes
sure there is no mask leftover that would leak.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-29 21:40:54 +01:00
Martin Kletzander
ebafc603c1 util: Use "resctrl" instead of "resctrlfs" spelling
Pointed out during review on one or two places, but it actually appears in lot
more places.  So let's be consistent.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-29 21:40:54 +01:00
Martin Kletzander
bd5d07425d util: Check for empty allocation instead of just NULL pointer
When working on the CAT series one of the changes was that the pointer got
allocated in another part of the code, even when resctrl was not available on
the host system.  However this one particular place neglected that so it needs
to be fixed in order to get the proper error message when requesting
<cachetune/> on HW with no support for it.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-29 21:40:53 +01:00
Erik Skultety
b7a7912411 build: Fix broken build on FreeBSD and OSX after recent nodedev series
Commits f83c7c88 and 6eb1f2b9 broke the build on FreeBSD and OSX because
of symbols being undefined for those platforms.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 17:53:13 +01:00
John Ferlan
6b7c5c4726 qemu: Fix memory leak in processGuestPanicEvent
After processing the processEvent->data for a qemuProcessEventHandler
callout, it's expected that the called processEvent->eventType helper
will perform the proper free on the data field. In this case it's
a qemuMonitorEventPanicInfoPtr.
2018-01-29 11:44:20 -05:00
Erik Skultety
75dbb27b10 conf: nodedev: Update PCI mdev capabilities dynamically
Just like SRIOV, a PCI device is only capable of the mediated devices
framework when it's bound to the vendor native driver, thus if a driver
change occurs, e.g. vendor_native->vfio, we need to refresh some of the
device's capabilities to reflect the reality, mdev included.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Suggested-by: Wu Zongyong <cordius.wu@huawei.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
882fd7a13c conf: Replace usage of virNodeDevCapMdevType with virMediatedDeviceType
Now that we have all the building blocks in place, switch the nodedev
driver to use the "new" virMediatedDeviceType type instead of the "old"
virNodeDevCapMdevType one.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
3cbac4dec0 nodedev: udev: Drop the unused mdev type helpers
These are not necessary anymore, since these are going to be shadowed by
the helpers provided by util/virmdev.c module.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
6eb1f2b9d0 util: pci: Introduce virPCIGetMdevTypes helper
This is a replacement for the existing udevPCIGetMdevTypesCap which is
static to the udev backend. This simple helper constructs the sysfs path
from the device's base path for each mdev type and queries the
corresponding attributes of that type.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
3545cbef7f util: mdev: Introduce virMediatedDeviceTypeReadAttrs getter
This should serve as a replacement for the existing udevFillMdevType
which is responsible for fetching the device type's attributes from the
sysfs interface. The problem with the existing solution is that it's
tied to the udev backend.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
c3faa92a1b util: mdev: Introduce virMediatedDeviceType structure
This is later going to replace the existing virNodeDevCapMdevType, since:
1) it's going to couple related stuff in a single module
2) util is supposed to contain helpers that are widely accessible across
the whole repository.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
0674ddd317 util: mdev: Drop some unused symbols/includes from the header
There were some leftovers from early development which never got used.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
d18feadc0c conf: nodedev: Refresh capabilities before touching them
Most of them are static, however in case of PCI and SCSI_HOST devices,
the nested capabilities can change dynamically, e.g. due to a driver
change (from host_pci_driver -> vfio_pci).

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Suggested-by: Wu Zongyong <cordius.wu@huawei.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
36546e3cdb nodedev: Introduce virNodeDeviceCapsListExport
Whether asking for a number of capabilities supported by a device or
listing them, it's handled essentially by a copy-paste code, so extract
the common stuff into this new helper which also updates all
capabilities just before touching them.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
349dda1fc8 nodedev: Export nodeDeviceUpdateCaps from node_device_conf.c
Since we moved the helpers from nodedev driver to src/conf, the actual
'update' function using those helpers should be moved as well so that we
don't need to call back into the driver.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
f83c7c88c5 nodedev: Move the sysfs-related cap handling to node_device_conf.c
The capabilities are defined/parsed/formatted/queried from this module,
no reason for 'update' not being part of the module as well. This also
involves some module-specific prefix changes.
This patch also drops the node_device_linux_sysfs module from the repo
since:
a) it only contained the capability handlers we just moved
b) it's only linked with the driver (by design) and thus unreachable to
other modules
c) we touch sysfs across all the src/util modules so the module being
deleted hasn't been serving its original intention for some time already.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
d1860140cc nodedev: Drop the nodeDeviceSysfsGetSCSIHostCaps wrapper
We can call directly the virNodeDeviceGetSCSIHostCaps helper instead.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
b20ec49e57 conf: nodedev: Convert virNodeDevObjHasCapStr to a simple wrapper
This patch drops the capability matching redundancy by simply converting
the string input to our internal types which are then in turn used for
the actual capability matching.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
54cab10518 conf: nodedev: Rename virNodeDeviceCapMatch to virNodeDevObjHasCap
We currently have 2 methods that do the capability matching. This should
be condensed to a single function and all the derivates should just call
into that using a proper type conversion.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Erik Skultety
1cbaeda707 conf: nodedev: Rename virNodeDevObjHasCap to virNodeDevObjHasCapStr
We currently have 2 methods that do the capability matching. This should
be condensed to a single function and all the derivates should just call
into that using a proper type conversion.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2018-01-29 15:34:30 +01:00
Julio Faracco
3f630aa0e9 test: Implementing testDomainRename().
There is no method to rename inactive domains for test driver.
After this patch, we can rename the domains using 'domrename'.

    virsh# domrename test anothertest
    Domain successfully renamed

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2018-01-29 09:50:47 +01:00
ZhiPeng Lu
614be3b882 vhost-user: add support reconnect for vhost-user ports
For vhost-user ports, Open vSwitch acts as the server and QEMU the client.
When OVS crashes or restarts, the QEMU process should be reconnected to
OVS.

Signed-off-by: ZhiPeng Lu <lu.zhipeng@zte.com.cn>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2018-01-26 14:02:46 +01:00
Daniel P. Berrangé
a020ab03fd resctl: stub out functions with Linux-only APIs used
The flock() function and d_type field in struct dirent are not portable
to the mingw platform.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-01-25 17:42:36 +00:00
Daniel P. Berrange
4e13fb02fe rpc: fix race sending and encoding sasl data
The virNetSocketWriteSASL method has to encode the buffer it is given and then
write it to the underlying socket. This write is not guaranteed to send the
full amount of data that was encoded by SASL. We cache the SASL encoded data so
that on the next invocation of virNetSocketWriteSASL we carry on sending it.

The subtle problem is that the 'len' value passed into virNetSocketWriteSASL on
the 2nd call may be larger than the original value. So when we've completed
sending the SASL encoded data we previously cached, we must return the original
length we encoded, not the new length.

This flaw means we could potentially have been discarded queued data without
sending it. This would have exhibited itself as a libvirt client never receiving
the reply to a method it invokes, async events silently going missing, or worse
stream data silently getting dropped.

For this to be a problem libvirtd would have to be queued data to send to the
client, while at the same time the TCP socket send buffer is full (due to a very
slow client). This is quite unlikely so if this bug was ever triggered by a real
world user it would be almost impossible to reproduce or diagnose, if indeed it
was ever noticed at all.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2018-01-25 16:29:24 +00:00
Jim Fehlig
0c710a37ea libxl: resume lock process after failed migration
During migration, the lock process is paused in the perform phase
but not resumed if there is a subsequent failure, leaving the locked
resource unprotected.

The perform phase itself can fail, in which case the lock process
should be resumed before returning from perform. The finish phase
could also fail on the destination host, in which case the migration
is canceled in the confirm phase and the VM is resumed. The lock
process needs to be resumed there as well.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
2018-01-25 09:22:14 -07:00
Martin Kletzander
9a2fc2db8f qemu: Add support for resctrl
We've been building up to this.  This adds support for cputune/cachetune
settings for domains in the QEMU driver.  The addition into
qemuProcessSetupVcpu() automatically adds support for hotplug.  For hot-unplug
we need to remove the allocation only if all the vCPUs were unplugged.  But
since the threads are left running, we can't really do much about it now.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1289368

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
7387e3fea4 conf: Add support for cputune/cachetune
More info in the documentation, this is basically the XML parsing/formatting
support, schemas, tests and documentation for the new cputune/cachetune element
that will get used by following patches.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
a64c761c27 resctrl: Add functions to work with resctrl allocations
With this commit we finally have a way to read and manipulate basic resctrl
settings.  Locking is done only on exposed functions that read/write from/to
resctrlfs.  Not in functions that are exposed in virresctrlpriv.h as those are
only supposed to be used from tests.

More information about how resctrl works:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
434848d7dc fixup_resctrlinfo
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
6328e48713 util: Remove now-unneeded resctrl functions
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
3bbae43d8c conf: Use virResctrlInfo in capabilities
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
cd572df89b util: Add virResctrlInfo
This will make the current functions obsolete and it will provide more
information to the virresctrl module so that it can be used later.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Martin Kletzander
b2211a9e54 Rename virResctrlInfo to virResctrlInfoPerCache
Just to ease the review of following patches.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-25 17:16:08 +01:00
Daniel P. Berrange
7697706135 qemu: add support for generating SMBIOS OEM strings command line
This wires up the previously added OEM strings XML schema to be able to
generate comamnd line args for QEMU. This requires QEMU >= 2.12 release
containing this patch:

  commit 2d6dcbf93fb01b4a7f45a93d276d4d74b16392dd
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Sat Oct 28 21:51:36 2017 +0100

    smbios: support setting OEM strings table

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2018-01-25 14:48:56 +00:00
Daniel P. Berrange
68eed56b2d conf: add support for setting OEM strings SMBIOS data fields
The OEM strings table in SMBIOS allows the vendor to pass arbitrary
strings into the guest OS. This can be used as a way to pass data to an
application like cloud-init, or potentially as an alternative to the
kernel command line for OS installers where you can't modify the install
ISO image to change the kernel args.

As an example, consider if cloud-init and anaconda supported OEM strings
you could use something like

    <oemStrings>
      <entry>cloud-init:ds=nocloud-net;s=http://10.10.0.1:8000/</entry>
      <entry>anaconda:method=http://dl.fedoraproject.org/pub/fedora/linux/releases/25/x86_64/os</entry>
    </oemStrings>

use of a application specific prefix as illustrated above is
recommended, but not mandated, so that an app can reliably identify
which of the many OEM strings are targetted at it.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2018-01-25 14:48:56 +00:00
Shaohe Feng
e7cb9c4e23 cpu: Add support for al57 Intel features
We can start qemu with a "cpu,+la57" to set 57-bit vitrual address
space. So VM can be aware that it need to enable 5-level paging.

Corresponding QEMU commits:
        al57 6c7c3c21f95dd9af8a0691c0dd29b07247984122
2018-01-25 15:30:32 +01:00
Laine Stump
ed2049ea19 qemu: auto-add generic xhci rather than NEC xhci to Q35 domains
We recently added a generic XHCI USB3 controller to QEMU, and libvirt
supports adding that controller rather than the NEC XHCI USB3
controller, but when auto-adding a USB controller to Q35 domains we
were still adding the vendor-specific NEC controller. This patch
changes to add the generic controller instead, if it's available in
the QEMU binary that will be used.

Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-01-22 10:13:16 -05:00
Jiri Denemark
ba9ea2ad7d qemu: Don't initialize struct utsname
It breaks the build and it is not really useful for anything.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2018-01-22 14:53:39 +01:00
Jiri Denemark
52b7d910b6 qemu: Refresh caps cache after booting a different kernel
Whenever a different kernel is booted, some capabilities related to KVM
(such as CPUID bits) may change. We need to refresh the cache to see the
changes.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2018-01-22 14:11:58 +01:00
Laine Stump
7ce8ff0f88 qemu: move qemuDomainDefValidateVideo into qemuDomainDeviceDefValidateVideo
qemuDomainDefValidateVideo() (called from qemuDomainDefValidate()) is
just a loop performing various checks on each video device. Rather
than maintaining this separate function, just fold the validations
into qemuDomainDeviceDefValidateVideo(), which is called once for each
video device.
2018-01-21 11:10:03 -05:00
Laine Stump
18c24bc686 qemu: assign correct type of PCI address for vhost-scsi when using pcie-root
Commit 10c73bf1 fixed a bug that I had introduced back in commit
70249927 - if a vhost-scsi device had no manually assigned PCI
address, one wouldn't be assigned automatically. There was a slight
problem with the logic of the fix though - in the case of domains with
pcie-root (e.g. those with a q35 machinetype),
qemuDomainDeviceCalculatePCIConnectFlags() will attempt to determine
if the host-side PCI device is Express or legacy by examining sysfs
based on the host-side PCI address stored in
hostdev->source.subsys.u.pci.addr, but that part of the union is only
valid for PCI hostdevs, *not* for SCSI hostdevs. So we end up trying
to read sysfs for some probably-non-existent device, which fails, and
the function virPCIDeviceIsPCIExpress() returns failure (-1).

By coincidence, the return value is being examined as a boolean, and
since -1 is true, we still end up assigning the vhost-scsi device to
an Express slot, but that is just by chance (and could fail in the
case that the gibberish in the "hostside PCI address" was the address
of a real device that happened to be legacy PCI).

Since (according to Paolo Bonzini) vhost-scsi devices appear just like
virtio-scsi devices in the guest, they should follow the same rules as
virtio devices when deciding whether they should be placed in an
Express or a legacy slot. That's accomplished in this patch by
returning early with virtioFlags, rather than erroneously using
hostdev->source.subsys.u.pci.addr. It also adds a test case for PCIe
to assure it doesn't get broken in the future.
2018-01-20 22:01:24 -05:00
Jim Fehlig
71d56a3979 nodedev: Fix failing to parse PCI address for non-PCI network devices
Commit 8708ca01c added virNetDevSwitchdevFeature() to check if a network
device has Switchdev capabilities. virNetDevSwitchdevFeature() attempts
to retrieve the PCI device associated with the network device, ignoring
non-PCI devices. It does so via the following call chain

  virNetDevSwitchdevFeature()->virNetDevGetPCIDevice()->
  virPCIGetDeviceAddressFromSysfsLink()

For non-PCI network devices (qeth, Xen vif, etc),
virPCIGetDeviceAddressFromSysfsLink() will report an error when
virPCIDeviceAddressParse() fails. virPCIDeviceAddressParse() also
logs an error. After commit 8708ca01c there are now two errors reported
for each non-PCI network device even though the errors are harmless.

To avoid the errors, introduce virNetDevIsPCIDevice() and use it in
virNetDevGetPCIDevice() before attempting to retrieve the associated
PCI device. virNetDevIsPCIDevice() uses the 'subsystem' property of the
device to determine if it is PCI. See the sysfs rules in kernel
documentation for more details

https://www.kernel.org/doc/html/latest/admin-guide/sysfs-rules.html
2018-01-19 09:53:01 -07:00
Michal Privoznik
72adaf2f10 Revert "qemu: monitor: do not report error on shutdown"
https://bugzilla.redhat.com/show_bug.cgi?id=1536461

This reverts commit aeda1b8c56.

Problem is that we need mon->lastError to be set because it's
used all over the place. Also, there's nothing wrong with
reporting error if one occurred. I mean, if there's a thread
executing an API and which currently is talking on monitor it
definitely wants the error reported.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2018-01-19 14:31:03 +01:00
Jiri Denemark
bcc5710708 qemu: Fix crash in offline migration
When migrating a shutoff domain (i.e., offline migration), we have no
statistics to report and thus jobInfo will be NULL in
qemuMigrationFinish.

Broken by me in v3.10.0-183-ge8784e7868.

https://bugzilla.redhat.com/show_bug.cgi?id=1536351

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-01-19 10:51:19 +01:00
Jiri Denemark
6d4a3cd427 cpu: Add EPYC-IBPB CPU model
This is a variant of EPYC with indirect branch prediction protection.
The only difference between EPYC and EPYC-IBPB is the added "ibpb"
feature.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-01-18 15:04:18 +01:00
Daniel P. Berrange
bc251ea91b qemu: avoid denial of service reading from QEMU monitor (CVE-2018-5748)
We read from QEMU until seeing a \r\n pair to indicate a completed reply
or event. To avoid memory denial-of-service though, we must have a size
limit on amount of data we buffer. 10 MB is large enough that it ought
to cope with normal QEMU replies, and small enough that we're not
consuming unreasonable mem.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2018-01-18 09:04:27 +00:00
Marc Hartmayer
029e024770 qemu: qemuDomainNamespaceUnlinkPaths: Return 0 in case of success
Commit 7a931a4204 refactored the code and probably forgot to add
this line.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
2018-01-17 17:08:53 +01:00
Jiri Denemark
24d504396c cpu: Add Skylake-Server-IBRS CPU model
This is a variant of Skylake-Server with indirect branch prediction
protection. The only difference between Skylake-Server and
Skylake-Server-IBRS is the added "spec-ctrl" feature.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-01-17 17:07:04 +01:00
Jiri Denemark
b2042020c3 cpu: Add Skylake-Client-IBRS CPU model
This is a variant of Skylake-Client with indirect branch prediction
protection. The only difference between Skylake-Client and
Skylake-Client-IBRS is the added "spec-ctrl" feature.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-01-17 17:07:04 +01:00