This wil be used in the future, but it makes sense for now as well. It makes
sure there is no mask leftover that would leak.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Pointed out during review on one or two places, but it actually appears in lot
more places. So let's be consistent.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When working on the CAT series one of the changes was that the pointer got
allocated in another part of the code, even when resctrl was not available on
the host system. However this one particular place neglected that so it needs
to be fixed in order to get the proper error message when requesting
<cachetune/> on HW with no support for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commits f83c7c88 and 6eb1f2b9 broke the build on FreeBSD and OSX because
of symbols being undefined for those platforms.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
After processing the processEvent->data for a qemuProcessEventHandler
callout, it's expected that the called processEvent->eventType helper
will perform the proper free on the data field. In this case it's
a qemuMonitorEventPanicInfoPtr.
Just like SRIOV, a PCI device is only capable of the mediated devices
framework when it's bound to the vendor native driver, thus if a driver
change occurs, e.g. vendor_native->vfio, we need to refresh some of the
device's capabilities to reflect the reality, mdev included.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Suggested-by: Wu Zongyong <cordius.wu@huawei.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Now that we have all the building blocks in place, switch the nodedev
driver to use the "new" virMediatedDeviceType type instead of the "old"
virNodeDevCapMdevType one.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
These are not necessary anymore, since these are going to be shadowed by
the helpers provided by util/virmdev.c module.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This is a replacement for the existing udevPCIGetMdevTypesCap which is
static to the udev backend. This simple helper constructs the sysfs path
from the device's base path for each mdev type and queries the
corresponding attributes of that type.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This should serve as a replacement for the existing udevFillMdevType
which is responsible for fetching the device type's attributes from the
sysfs interface. The problem with the existing solution is that it's
tied to the udev backend.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This is later going to replace the existing virNodeDevCapMdevType, since:
1) it's going to couple related stuff in a single module
2) util is supposed to contain helpers that are widely accessible across
the whole repository.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Most of them are static, however in case of PCI and SCSI_HOST devices,
the nested capabilities can change dynamically, e.g. due to a driver
change (from host_pci_driver -> vfio_pci).
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Suggested-by: Wu Zongyong <cordius.wu@huawei.com>
Whether asking for a number of capabilities supported by a device or
listing them, it's handled essentially by a copy-paste code, so extract
the common stuff into this new helper which also updates all
capabilities just before touching them.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Since we moved the helpers from nodedev driver to src/conf, the actual
'update' function using those helpers should be moved as well so that we
don't need to call back into the driver.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
The capabilities are defined/parsed/formatted/queried from this module,
no reason for 'update' not being part of the module as well. This also
involves some module-specific prefix changes.
This patch also drops the node_device_linux_sysfs module from the repo
since:
a) it only contained the capability handlers we just moved
b) it's only linked with the driver (by design) and thus unreachable to
other modules
c) we touch sysfs across all the src/util modules so the module being
deleted hasn't been serving its original intention for some time already.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This patch drops the capability matching redundancy by simply converting
the string input to our internal types which are then in turn used for
the actual capability matching.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
We currently have 2 methods that do the capability matching. This should
be condensed to a single function and all the derivates should just call
into that using a proper type conversion.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
We currently have 2 methods that do the capability matching. This should
be condensed to a single function and all the derivates should just call
into that using a proper type conversion.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
There is no method to rename inactive domains for test driver.
After this patch, we can rename the domains using 'domrename'.
virsh# domrename test anothertest
Domain successfully renamed
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For vhost-user ports, Open vSwitch acts as the server and QEMU the client.
When OVS crashes or restarts, the QEMU process should be reconnected to
OVS.
Signed-off-by: ZhiPeng Lu <lu.zhipeng@zte.com.cn>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The virNetSocketWriteSASL method has to encode the buffer it is given and then
write it to the underlying socket. This write is not guaranteed to send the
full amount of data that was encoded by SASL. We cache the SASL encoded data so
that on the next invocation of virNetSocketWriteSASL we carry on sending it.
The subtle problem is that the 'len' value passed into virNetSocketWriteSASL on
the 2nd call may be larger than the original value. So when we've completed
sending the SASL encoded data we previously cached, we must return the original
length we encoded, not the new length.
This flaw means we could potentially have been discarded queued data without
sending it. This would have exhibited itself as a libvirt client never receiving
the reply to a method it invokes, async events silently going missing, or worse
stream data silently getting dropped.
For this to be a problem libvirtd would have to be queued data to send to the
client, while at the same time the TCP socket send buffer is full (due to a very
slow client). This is quite unlikely so if this bug was ever triggered by a real
world user it would be almost impossible to reproduce or diagnose, if indeed it
was ever noticed at all.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
During migration, the lock process is paused in the perform phase
but not resumed if there is a subsequent failure, leaving the locked
resource unprotected.
The perform phase itself can fail, in which case the lock process
should be resumed before returning from perform. The finish phase
could also fail on the destination host, in which case the migration
is canceled in the confirm phase and the VM is resumed. The lock
process needs to be resumed there as well.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
We've been building up to this. This adds support for cputune/cachetune
settings for domains in the QEMU driver. The addition into
qemuProcessSetupVcpu() automatically adds support for hotplug. For hot-unplug
we need to remove the allocation only if all the vCPUs were unplugged. But
since the threads are left running, we can't really do much about it now.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1289368
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
More info in the documentation, this is basically the XML parsing/formatting
support, schemas, tests and documentation for the new cputune/cachetune element
that will get used by following patches.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With this commit we finally have a way to read and manipulate basic resctrl
settings. Locking is done only on exposed functions that read/write from/to
resctrlfs. Not in functions that are exposed in virresctrlpriv.h as those are
only supposed to be used from tests.
More information about how resctrl works:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/intel_rdt_ui.txt
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This will make the current functions obsolete and it will provide more
information to the virresctrl module so that it can be used later.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This wires up the previously added OEM strings XML schema to be able to
generate comamnd line args for QEMU. This requires QEMU >= 2.12 release
containing this patch:
commit 2d6dcbf93fb01b4a7f45a93d276d4d74b16392dd
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Sat Oct 28 21:51:36 2017 +0100
smbios: support setting OEM strings table
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The OEM strings table in SMBIOS allows the vendor to pass arbitrary
strings into the guest OS. This can be used as a way to pass data to an
application like cloud-init, or potentially as an alternative to the
kernel command line for OS installers where you can't modify the install
ISO image to change the kernel args.
As an example, consider if cloud-init and anaconda supported OEM strings
you could use something like
<oemStrings>
<entry>cloud-init:ds=nocloud-net;s=http://10.10.0.1:8000/</entry>
<entry>anaconda:method=http://dl.fedoraproject.org/pub/fedora/linux/releases/25/x86_64/os</entry>
</oemStrings>
use of a application specific prefix as illustrated above is
recommended, but not mandated, so that an app can reliably identify
which of the many OEM strings are targetted at it.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We can start qemu with a "cpu,+la57" to set 57-bit vitrual address
space. So VM can be aware that it need to enable 5-level paging.
Corresponding QEMU commits:
al57 6c7c3c21f95dd9af8a0691c0dd29b07247984122
We recently added a generic XHCI USB3 controller to QEMU, and libvirt
supports adding that controller rather than the NEC XHCI USB3
controller, but when auto-adding a USB controller to Q35 domains we
were still adding the vendor-specific NEC controller. This patch
changes to add the generic controller instead, if it's available in
the QEMU binary that will be used.
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Whenever a different kernel is booted, some capabilities related to KVM
(such as CPUID bits) may change. We need to refresh the cache to see the
changes.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
qemuDomainDefValidateVideo() (called from qemuDomainDefValidate()) is
just a loop performing various checks on each video device. Rather
than maintaining this separate function, just fold the validations
into qemuDomainDeviceDefValidateVideo(), which is called once for each
video device.
Commit 10c73bf1 fixed a bug that I had introduced back in commit
70249927 - if a vhost-scsi device had no manually assigned PCI
address, one wouldn't be assigned automatically. There was a slight
problem with the logic of the fix though - in the case of domains with
pcie-root (e.g. those with a q35 machinetype),
qemuDomainDeviceCalculatePCIConnectFlags() will attempt to determine
if the host-side PCI device is Express or legacy by examining sysfs
based on the host-side PCI address stored in
hostdev->source.subsys.u.pci.addr, but that part of the union is only
valid for PCI hostdevs, *not* for SCSI hostdevs. So we end up trying
to read sysfs for some probably-non-existent device, which fails, and
the function virPCIDeviceIsPCIExpress() returns failure (-1).
By coincidence, the return value is being examined as a boolean, and
since -1 is true, we still end up assigning the vhost-scsi device to
an Express slot, but that is just by chance (and could fail in the
case that the gibberish in the "hostside PCI address" was the address
of a real device that happened to be legacy PCI).
Since (according to Paolo Bonzini) vhost-scsi devices appear just like
virtio-scsi devices in the guest, they should follow the same rules as
virtio devices when deciding whether they should be placed in an
Express or a legacy slot. That's accomplished in this patch by
returning early with virtioFlags, rather than erroneously using
hostdev->source.subsys.u.pci.addr. It also adds a test case for PCIe
to assure it doesn't get broken in the future.
Commit 8708ca01c added virNetDevSwitchdevFeature() to check if a network
device has Switchdev capabilities. virNetDevSwitchdevFeature() attempts
to retrieve the PCI device associated with the network device, ignoring
non-PCI devices. It does so via the following call chain
virNetDevSwitchdevFeature()->virNetDevGetPCIDevice()->
virPCIGetDeviceAddressFromSysfsLink()
For non-PCI network devices (qeth, Xen vif, etc),
virPCIGetDeviceAddressFromSysfsLink() will report an error when
virPCIDeviceAddressParse() fails. virPCIDeviceAddressParse() also
logs an error. After commit 8708ca01c there are now two errors reported
for each non-PCI network device even though the errors are harmless.
To avoid the errors, introduce virNetDevIsPCIDevice() and use it in
virNetDevGetPCIDevice() before attempting to retrieve the associated
PCI device. virNetDevIsPCIDevice() uses the 'subsystem' property of the
device to determine if it is PCI. See the sysfs rules in kernel
documentation for more details
https://www.kernel.org/doc/html/latest/admin-guide/sysfs-rules.html
https://bugzilla.redhat.com/show_bug.cgi?id=1536461
This reverts commit aeda1b8c56.
Problem is that we need mon->lastError to be set because it's
used all over the place. Also, there's nothing wrong with
reporting error if one occurred. I mean, if there's a thread
executing an API and which currently is talking on monitor it
definitely wants the error reported.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When migrating a shutoff domain (i.e., offline migration), we have no
statistics to report and thus jobInfo will be NULL in
qemuMigrationFinish.
Broken by me in v3.10.0-183-ge8784e7868.
https://bugzilla.redhat.com/show_bug.cgi?id=1536351
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This is a variant of EPYC with indirect branch prediction protection.
The only difference between EPYC and EPYC-IBPB is the added "ibpb"
feature.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We read from QEMU until seeing a \r\n pair to indicate a completed reply
or event. To avoid memory denial-of-service though, we must have a size
limit on amount of data we buffer. 10 MB is large enough that it ought
to cope with normal QEMU replies, and small enough that we're not
consuming unreasonable mem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 7a931a4204 refactored the code and probably forgot to add
this line.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
This is a variant of Skylake-Server with indirect branch prediction
protection. The only difference between Skylake-Server and
Skylake-Server-IBRS is the added "spec-ctrl" feature.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This is a variant of Skylake-Client with indirect branch prediction
protection. The only difference between Skylake-Client and
Skylake-Client-IBRS is the added "spec-ctrl" feature.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>