In the past we automatically added a USB controller and assigned
it a PCI address (0:0:1.2) even on machines without a PCI bus.
This didn't break machines with no PCI bus because the command
line for it is just '-usb', with no mention of the PCI bus.
The implicit IDE controller (reserved address 0:0:1.1) has
no command line at all.
Commit b33eb0dc removed the ability to reserve PCI addresses
on machines without a PCI bus. This made them stop working,
since there would always be the implicit USB controller.
Skip the reservation of addresses for these controllers when
there is no PCI bus, instead of failing.
This isn't strictly speaking a bugfix, but I realized I'd gotten a bit
too verbose when I chose the names for
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_*. This shortens them all a bit.
<source type='bridge'> uses a helper application to do the necessary
TUN/TAP setup to use an existing network bridge, thus letting
unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid
programs at all, which would include this helper program. From
a security POV, any setuid helper needs to be run by libvirtd itself,
not QEMU.
This is what this patch does. libvirt now invokes the setuid helper,
gets the TAP fd and then passes it to QEMU in the normal manner.
The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the
XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VFIO requires all of the guest's memory and IO space to be lockable in
RAM. The domain's max_balloon is the maximum amount of memory the
domain can have (in KiB). We add a generous 1GiB to that for IO space
(still much better than KVM device assignment, where the KVM module
actually *ignores* the process limits and locks everything anyway),
and convert from KiB to bytes.
In the case of hotplug, we are changing the limit for the already
existing qemu process (prlimit() is used under the hood), and for
regular commandline additions of vfio devices, we schedule a call to
setrlimit() that will happen after the qemu process is forked.
These were previously being set in a custom hook function, but now
that virCommand directly supports setting them, we can eliminate that
part of the hook and call the APIs directly.
The differences from virNodeDeviceDettach are very minor:
1) Check that the flags are 0.
2) Set the virPCIDevice's stubDriver according to the driverName that
is passed in.
3) Call virPCIDeviceDetach with a NULL stubDriver, indicating it
should get the name of the stub driver from the virPCIDevice
object.
If the config for a device has specified <driver name='vfio'/>,
"backend" in the pci part of the hostdev object will be set to
..._VFIO. In this case, when creating a virPCIDevice set the
stubDriver to "vfio-pci", otherwise set it to "pci-stub". We will rely
on the lower levels to report an error if the vfio driver isn't
loaded.
The detach/attach functions in virpci.c will pay attention to the
stubDriver setting in the device, and bind/unbind the appropriate
driver when preparing hostdevs for the domain.
Note that we don't yet attempt to do anything to mark active any other
devices in the same vfio "group" as a single device that is being
marked active. We do need to do that, but in order to get basic VFIO
functionality testing sooner rather than later, initially we'll just
live with more cryptic errors when someone tries to do that.
The device option for vfio-pci is nearly identical to that for
pci-assign - only the configfd parameter isn't supported (or needed).
Checking for presence of the bootindex parameter is done separately
from constructing the commandline, similar to how it is done for
pci-assign.
This patch contains tests to check for proper commandline
construction. It also includes tests for parser-formatter-parser
roundtrips (xml2xml), because those tests use the same data files, and
would have failed had they been included before now.
qemu: xml/args tests for VFIO hostdev and <interface type='hostdev'/>
These should be squashed in with the patch that adds commandline
handling of vfio (they would fail at any earlier time).
There will soon be other items related to pci hostdevs that need to be
in the same part of the hostdevsubsys union as the pci address (which
is currently a single member called "pci". This patch replaces the
single member named pci with a struct named pci that contains a single
member named "addr".
QEMU_CAPS_DEVICE_VFIO_PCI is set if the device named "vfio-pci" is
supported in the qemu binary.
QEMU_CAPS_VFIO_PCI_BOOTINDEX is set if the vfio-pci device supports
the "bootindex" parameter; for some reason, the bootindex parameter
wasn't included in early versions of vfio support (qemu 1.4) so we
have to check for it separately from vfio itself.
Jim Fehlig reported on IRC that older gcc/glibc triggers this warning:
cc1: warnings being treated as errors
qemu/qemu_domain.c: In function 'qemuDomainDefFormatBuf':
qemu/qemu_domain.c:1297: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:157: error: shadowed declaration is here [-Wshadow]
make[3]: *** [libvirt_driver_qemu_impl_la-qemu_domain.lo] Error 1
Fix it like we have done in the past (such as commit 2e6322a).
* src/qemu/qemu_domain.c (qemuDomainDefFormatBuf): Avoid shadowing
a function name.
Signed-off-by: Eric Blake <eblake@redhat.com>
After 9d6e56db the syntax-check was unhappy due to wrong whitespacing:
src/qemu/qemu_command.c:1637: for ( ; a.slot < QEMU_PCI_ADDRESS_SLOT_LAST; a.slot++) {
maint.mk: incorrect whitespace around brackets, see HACKING for rules
make: *** [bracket-spacing-check] Error 1
After 78d7c3c5 we are strdup()-ing path to qemu-bridge-helper.
However, the check for its return value is missing. So it is
possible we've ignored the OOM error silently.
Add a "dry run" address allocation to figure out how many bridges
will be needed for all the devices without explicit addresses.
Auto-add just enough bridges to put all the devices on, or up to the
bridge with the largest specified index.
<controller type='pci' index='0' model='pci-root'/>
is auto-added to pc* machine types.
Without this controller PCI bus 0 is not available and
no PCI addresses are assigned by default.
Since older libvirt supported PCI bus 0 even without
this controller, it is removed from the XML when migrating.
Now we set the default disk driver name when parsing
the qemu command line too, hence all the test changes.
Assume format type is 'auto' when none is specified on
qemu command line.
Currently, if there has been an error in building command line
process after virtual interfaces has been created, the flow jumps
to 'error' label, where virDomainConfNWFilterTeardown() is
called. This may report an error as well, but should not
overwrite the original cause why we jumped to 'error' label.
Instead of making a choice between the underscore and camelCase, this
simply changes "num_queues" into "queues", which is also consistent
with Michal's multiple queue support for interface.
Improve error reporting and generating of SPICE command line arguments
according to the need to enable TLS. If TLS is disabled, there's no need
to pass the certificate dir to qemu.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=953126
Ensure that all drivers implementing public APIs use a
naming convention for their implementation that matches
the public API name.
eg for the public API virDomainCreate make sure QEMU
uses qemuDomainCreate and not qemuDomainStart
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that the driver struct field names match the public
API names. For an API virXXXX we must have a driver struct
field xXXXX. ie strip the leading 'vir' and lowercase any
leading uppercase letters.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Switch the function from a bunch of ifs to a switch statement with
correct type and reflow some code.
Also fix comment in enum describing possible graphics types
Decrease size of qemuBuildGraphicsCommandLine() by splitting out
spice-related code into qemuBuildGraphicsVNCCommandLine().
This patch also fixes 2 possible memory leaks on error path in the code
that was split-out. The buffer containing the already generated options
and a listen address string could be leaked.
Also break a few very long lines and reflow code that fits now.
Decrease size of qemuBuildGraphicsCommandLine() by splitting out
spice-related code into qemuBuildGraphicsSPICECommandLine().
This patch also fixes 2 possible memory leaks on error path in the code
that was split-out. The buffer containing the already generated options
and a listen address string could be leaked.
Also break a few very long lines.
Refactoring done in 19c6ad9ac7e7eb2fd3c8262bff5f087b508ad07f didn't
correctly take into account the order cgroup limit modification needs to
be done in. This resulted into errors when decreasing the limits.
The operations need to take place in this order:
decrease hard limit
change swap hard limit
or
change swap hard limit
increase hard limit
This patch also fixes the check if the hard_limit is less than
swap_hard_limit to print better error messages. For this purpose I
introduced a helper function virCompareLimitUlong to compare limit
values where value of 0 is equal to unlimited. Additionally the check is
now applied also when the user does not provide all of the tunables
through the API and in that case the currently set values are used.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=950478
The change in commit aed4986322fe77bdf718e31a0587d00f04f3d97a
was incomplete, missing a couple of cases of /system. This
caused failure to start VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After discussions with systemd developers it was decided that
a better default policy for resource partitions is to have
3 default partitions at the top level
/system - system services
/machine - virtual machines / containers
/user - user login session
This ensures that the default policy isolates guest from
user login sessions & system services, so a mis-behaving
guest can't consume 100% of CPU usage if other things are
contending for it.
Thus we change the default partition from /system to
/machine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wrong use of the parentheses causes "rc" always having a boolean value,
either "1" or "0", and thus we can't get the detailed error message
when it fails:
Before (I only have 1 node):
% virsh numatune f18 --nodeset 12
error: Unable to change numa parameters
error: unable to set numa tunable: Unknown error -1
After:
virsh numatune f18 --nodeset 12
error: Unable to change numa parameters
error: unable to set numa tunable: Invalid argument
Each bus is represented as an array of 32 8-bit integers
where each bit represents a PCI function and each byte represents
a PCI slot.
Uses just one bus so far.
Create a new function qemuPCIAddressValidate and call it everywhere
the user might supply an incorrect address:
* qemuCollectPCIAddress for domain definition
* qemuDomainPCIAddressEnsureAddr and ReleaseSlot for hotplug
Slot and function shouldn't be wrong at this point, since values
out of range should be rejected by the XML parser.
Change QEMU_PCI_ADDRESS_LAST_SLOT to the number of slots in the bus,
not the maximum slot value, to match QEMU_PCI_ADDRESS_LAST_FUNCTION
and rename them both to have _LAST at the end.
Currently, -device xxx still doesn't work well for ppc64 platform.
It's better use legacy USB option with default for ppc64.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reusing the result of virArchFromHost instead of calling it multiple times
Signed-off-by: Tal Kain <tal.kain@ravellosystems.com>
Signed-off-by: Eric Blake <eblake@redhat.com>