Some types of PCI controllers have configuration options that are
currently derived automatically from other parts of the controller's
configuration. For example, in qemu a pci-bridge controller has an
option called "chassis_nr"; up until now
libvirt has always set chassis_nr to the index of the pci-bridge. So
this:
<controller type='pci' model='pci-bridge' index='2'/>
will always result in:
-device pci-bridge,chassis_nr=2,...
on the qemu commandline. In the future we may decide there is a better
way to derive that option, but even in that case we will need
existing domains to retain the same chassis_nr they were using in the
past - chassis_nr is visible to the guest, so it is part of the guest
ABI, and changing it would lead to problems for migrating
guests (or just guests with very picky OSes).
The <target> subelement has been added as a place to put the new
"chassisNr" attribute that will be filled in by libvirt when it
auto-generates the chassisNr; it will be saved in the config, then
reused any time the domain is started:
<controller type='pci' model='pci-bridge' index='2'>
<model name='pci-bridge'/>
<target chassisNr='2'/>
</controller>
The one oddity of all this is that if the controller configuration
is changed (for example to change the index or the pci address
where the controller is plugged in), the items in <target> will
*not* be re-generated, which might lead to conflict. I can't
really see any way around this, but fortunately if there is a
material conflict qemu will let us know and we will pass that on
to the user.
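To illustrate the persistence (hypothetical config, assuming someone
later changes the index by hand):
<controller type='pci' model='pci-bridge' index='3'>
  <target chassisNr='2'/>
</controller>
would still result in:
-device pci-bridge,chassis_nr=2,...
because the value saved in <target> is reused rather than re-derived
from the new index.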
This patch provides qemu support for the contents of <model> in
<controller> for the two existing PCI controller types that need it
(i.e. the two controller types that are backed by a device that must
be specified on the qemu commandline):
1) pci-bridge - defaults the <model> name attribute to "pci-bridge"
2) dmi-to-pci-bridge - defaults the <model> name attribute to
"i82801b11-bridge".
These both match current hardcoded practice.
The defaults are set at the end of qemuDomainAssignPCIAddresses().
This can't be done earlier because some of the options that will be
autogenerated need full PCI address info for the controller, and
because qemuDomainAssignPCIAddresses() might create extra controllers
which would need default settings added, and that hasn't yet been done
at the time the PostParse callbacks are being run.
qemuDomainAssignPCIAddresses() is still called prior to the XML being
written to disk, though, so the autogenerated defaults are persistent.
qemu capabilities bits aren't checked when the domain is defined, but
rather when the commandline is actually created (so the domain can
possibly be defined on a host that doesn't yet have support for the
given device, or a host different from the one where it will
eventually be run). When the commandline is being generated we compare
the modelName to known qemu device names implementing the given type
of controller, and check the capabilities bit for that device.
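A minimal standalone sketch of that lookup (illustrative only; the
real code uses libvirt's controller model enums and qemu capability
flags rather than strings and bitmask literals):

  #include <string.h>

  struct pci_impl {
      const char *model_name;   /* from <model name='...'/> */
      const char *qemu_device;  /* name passed to qemu's -device */
      unsigned int caps_bit;    /* capability qemu must advertise */
  };

  static const struct pci_impl impls[] = {
      { "pci-bridge",       "pci-bridge",       1u << 0 },
      { "i82801b11-bridge", "i82801b11-bridge", 1u << 1 },
  };

  /* return the qemu device name, or NULL if qemu lacks the device */
  static const char *
  pci_impl_device(const char *model_name, unsigned int qemu_caps)
  {
      size_t i;

      for (i = 0; i < sizeof(impls) / sizeof(impls[0]); i++)
          if (strcmp(impls[i].model_name, model_name) == 0)
              return (qemu_caps & impls[i].caps_bit)
                  ? impls[i].qemu_device : NULL;
      return NULL;
  }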
This new subelement is used in PCI controllers: the toplevel
*attribute* "model" of a controller denotes what kind of PCI
controller is being described, e.g. a "dmi-to-pci-bridge",
"pci-bridge", or "pci-root". But in the future there will be different
implementations of some of those types of PCI controllers, which
behave similarly from libvirt's point of view (and so should have the
same model), but use a different device in qemu (and present
themselves as a different piece of hardware in the guest). In an ideal
world we (i.e. "I") would have thought of that back when the pci
controllers were added, and used some sort of type/class/model
notation (where class was used in the way we are now using model, and
model was used for the actual manufacturer's model number of a
particular family of PCI controller), but that opportunity is long
past, so as an alternative, this patch allows selecting a particular
implementation of a pci controller with the "name" attribute of the
<model> subelement, e.g.:
<controller type='pci' model='dmi-to-pci-bridge' index='1'>
<model name='i82801b11-bridge'/>
</controller>
In this case, "dmi-to-pci-bridge" is the kind of controller (one that
has a single PCIe port upstream, and 32 standard PCI ports downstream,
which are not hotpluggable), and the qemu device to be used to
implement this kind of controller is named "i82801b11-bridge".
Implementing the above now will allow us in the future to add a new
kind of dmi-to-pci-bridge that doesn't use qemu's i82801b11-bridge
device, but instead uses something else (which doesn't yet exist, but
qemu people have been discussing it), all without breaking existing
configs.
(note that for the existing "pci-bridge" type of PCI controller, both
the model attribute and <model> name are 'pci-bridge'. This is just a
coincidence, since it turns out that in this case the device name in
qemu really is a generic 'pci-bridge' rather than being the name of
some real-world chip)
If a pci address had a function number out of range, the error message
would be:
Insufficient specification for PCI address
which is logged by virDevicePCIAddressParseXML() after
virDevicePCIAddressIsValid returns a failure.
This patch enhances virDevicePCIAddressIsValid() to optionally report
the error itself (since it is the place that decides which part of the
address is "invalid"), and uses that feature when calling from
virDevicePCIAddressParseXML(), so that the error will be more useful,
e.g.:
Invalid PCI address function=0x8, must be <= 7
Previously, virDevicePCIAddressIsValid didn't check for the
theoretical limits of domain or bus, only for slot or function. While
adding log messages, we also correct that omission. (The RNG for PCI
addresses already enforces this limit, which by the way means that we
can't add any negative tests for this - as far as I know our
domainschematest has no provisions for passing XML that is supposed to
fail).
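A standalone sketch of the enhanced check (simplified: a plain struct
and fprintf() stand in for virDevicePCIAddress and virReportError; the
limits are the conventional PCI maximums):

  #include <stdbool.h>
  #include <stdio.h>

  struct pci_addr { unsigned int domain, bus, slot, function; };

  static bool
  pci_addr_is_valid(const struct pci_addr *a, bool report)
  {
      if (a->domain > 0xffff) {
          if (report)
              fprintf(stderr, "Invalid PCI address domain=0x%x, "
                      "must be <= 0xFFFF\n", a->domain);
          return false;
      }
      if (a->bus > 0xff) {
          if (report)
              fprintf(stderr, "Invalid PCI address bus=0x%x, "
                      "must be <= 0xFF\n", a->bus);
          return false;
      }
      if (a->slot > 0x1f) {
          if (report)
              fprintf(stderr, "Invalid PCI address slot=0x%x, "
                      "must be <= 0x1F\n", a->slot);
          return false;
      }
      if (a->function > 7) {
          if (report)
              fprintf(stderr, "Invalid PCI address function=0x%x, "
                      "must be <= 7\n", a->function);
          return false;
      }
      return true;
  }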
Note that virDevicePCIAddressIsValid() can only check against the
absolute maximum attribute values for *any* possible PCI controller,
not for the actual maximums of the specific controller that this
device is attaching to; fortunately there is later more specific
validation for guest-side PCI addresses when building the set of
assigned PCI addresses. For host-side PCI addresses (e.g. for
<hostdev> and for network device pools), we rely on the error that
will be logged when it is found that the device doesn't actually
exist.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1004596
https://bugzilla.redhat.com/show_bug.cgi?id=1176020
Some users think this is a good idea:
<vcpu placement='static'>4</vcpu>
<cpu mode='host-model'>
<model fallback='allow'/>
<numa>
<cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
<cell id='1' cpus='9-10' memory='2097152' unit='KiB'/>
</numa>
</cpu>
It's not. Let's therefore introduce a check and discourage them from
doing so.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function should return the greatest CPU number set in
/domain/cpu/numa/cell/@cpus. The idea is that we should compare
the returned value against the /domain/vcpu value. Yes, there exist
users who think the following is a good idea:
<vcpu placement='static'>4</vcpu>
<cpu mode='host-model'>
<model fallback='allow'/>
<numa>
<cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
<cell id='1' cpus='9-10' memory='2097152' unit='KiB'/>
</numa>
</cpu>
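A standalone sketch of the intended check (simplified: each cell's
greatest CPU number is precomputed instead of walking the cpumask):

  #include <stdio.h>

  /* greatest CPU number used by any <cell cpus='...'/> */
  static int
  numa_cells_max_cpu(const int *cell_max, size_t ncells)
  {
      int max = -1;
      size_t i;

      for (i = 0; i < ncells; i++)
          if (cell_max[i] > max)
              max = cell_max[i];
      return max;
  }

  int main(void)
  {
      int cell_max[] = { 1, 10 }; /* cpus='0-1' and cpus='9-10' */
      int vcpus = 4;              /* <vcpu>4</vcpu> */

      if (numa_cells_max_cpu(cell_max, 2) >= vcpus)
          fprintf(stderr,
                  "NUMA cells reference CPUs beyond the vcpu count\n");
      return 0;
  }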
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Qemu reports physical size 0 for block devices. Since commit
15fa84acbb55ebfee6a4 changed the behavior of qemuDomainGetBlockInfo to
just query the monitor, this created a regression: we no longer
reported the size correctly.
This patch adds code to refresh the physical size of a block device by
opening it and seeking to the end, and uses it both in
qemuDomainGetBlockInfo and in qemuDomainGetStatsOneBlock, which had
been broken in this respect since it was introduced.
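A standalone sketch of the refresh (the real helper lives in the qemu
driver and operates on the storage source; names here are simplified):

  #include <fcntl.h>
  #include <sys/types.h>
  #include <unistd.h>

  static int
  refresh_physical_size(const char *path, unsigned long long *physical)
  {
      off_t end;
      int fd = open(path, O_RDONLY);

      if (fd < 0)
          return -1;

      end = lseek(fd, 0, SEEK_END); /* device length in bytes */
      close(fd);

      if (end == (off_t) -1)
          return -1;

      *physical = (unsigned long long) end;
      return 0;
  }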
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1250982
Commit 7e72de4 didn't consider hotplug scenarios. This patch addresses
the hotplug case whereby, if at least one of the PCI functions is owned
by a guest, hotplug of other functions/devices in the same iommu group
to the same guest goes through successfully.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
The virtio-net-pci adapter can use irqfd with vhost-net only in MSI-X
mode, which appears to be available only on a PCIe bus, at least on ARM.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
The legacy -net option works correctly only with embedded device models,
which do not require any bus specification. Therefore, we should use
-device for PCI hardware.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Here we assume that if qemu supports a generic PCI host controller,
it is part of the virt machine and can be used for adding PCI devices.
In qemu this is actually a PCIe bus, so we also declare the multibus
capability so that the 0th bus is specified to qemu correctly as 'pcie.0'.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This capability specifies that qemu can implement a generic PCI host
controller, which is often used for virtual environments, including ARM.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Libvirt doesn't reliably know the location of the backing chain when
pre-creating images for non-shared migration. This isn't a problem for
full copy, but incremental copy requires the information.
Forbid pre-creating the image in cases where incremental migration is
required. This limitation can perhaps be lifted once libvirt fully
supports loading of backing chain information from the XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1249587
Only the symbols exported by the driver have been updated;
the driver implementation itself still uses the old names
internally.
No functional changes.
The driver only supports VIR_ARCH_PPC64 and VIR_ARCH_PPC64LE.
Just shuffling files around and updating the build system
accordingly. No functional changes.
The recent changes to perform SCSI device address checks during the
post parse callbacks ran afoul of the Coverity checker since the changes
assumed that the 'xmlopt' parameter to virDomainDeviceDefPostParse
would be non-NULL (commit id 'ca2cf74e87'); however, what was missed
is that there was an "if (xmlopt &&" check being made, so Coverity
believed 'xmlopt' could possibly be NULL.
Checking the various calling paths seemingly disproves that. If called
from virDomainDeviceDefParse, there were two other possible calls that
would end up dereferencing it, so that path could not pass NULL. If called via
virDomainDefPostParseDeviceIterator via virDomainDefPostParse there
are two callers (virDomainDefParseXML and qemuParseCommandLine)
which deref xmlopt either directly or through another call.
So I'm removing the check for non-NULL xmlopt.
Rather than provide a somewhat generic error message when the API
returns false, allow the caller to supply a "report = true" option
in order to cause virReportError calls describing which of the 3 paths
caused the failure.
Some callers don't care about what caused the failure, they just want
to have a true/false - for those, calling with report = false should
be sufficient.
PowerPC pseries based VMs do not support a floppy disk controller.
This prohibits libvirt from creating a qemu command line with a floppy device.
Signed-off-by: Kothapally Madhu Pavan <kmp@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1180486
Signed-off-by: Ján Tomko <jtomko@redhat.com>
PowerPC pseries based VMs do not support a floppy disk controller.
This prohibits libvirt from adding a floppy disk to a PowerPC pseries VM.
Signed-off-by: Kothapally Madhu Pavan <kmp@linux.vnet.ibm.com>
Rather than calling virDomainDiskDefAssignAddress during the parsing of
the XML, move the setting of disk addresses into the domain/device post
processing.
Commit id '37588b25', which introduced VIR_DOMAIN_DEF_PARSE_DISK_SOURCE
in order to avoid generating addresses that weren't required, is not
affected by this, as all it cared about was processing the source XML.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than calling virDomainHostdevAssignAddress during the parsing
of the XML, move the setting of a default hostdev address to domain/
device post processing.
Since the parse code no longer generates an address, we can remove
the virDomainDefMaybeAddHostdevSCSIcontroller since the call to
virDomainHostdevAssignAddress will attempt to add the controllers
that were not already defined in the XML.
This patch will also enforce that the address type is type 'drive'
when a SCSI subsystem <hostdev> element is provided with an <address>.
Signed-off-by: John Ferlan <jferlan@redhat.com>
If virDomainControllerSCSINextUnit failed to find a slot on the current
VIR_DOMAIN_CONTROLLER_TYPE_SCSI controller(s), try to add a new controller;
otherwise, there may be multiple unit=0 entries for the same "next"
controller.
Signed-off-by: John Ferlan <jferlan@redhat.com>
While searching the hostdevs the drive type can be either *_TYPE_DRIVE
or *_TYPE_NONE. If the type is _TYPE_NONE on the first scsi_host, then
there is an erroneous "match" that the address already exists.
Although this works by chance currently because hostdevs are added one
at a time and 'nhostdevs' would be zero, thus returning false for the
first hostdev added, a future patch will move the hostdev address
assignment into post processing resulting in the bad match.
This code is only called by paths expecting either drive or none.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add the xmlopt parameter that was saved during virDomainDefPostParse
to the parameters. A future patch will use it.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Modify virDomainDriveAddressIsUsedBy{Disk|Hostdev} and
virDomainSCSIDriveAddressIsUsed to take 'bus' and 'target'
parameters. These will be used by future patches for more complete
address conflict checks.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since the only way virDomainHostdevAssignAddress can be called is from
within virDomainHostdevDefParseXML when hostdev->source.subsys.type is
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI, there's no need for the redundant check.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Compiler error:
../../src/nodeinfo.c: In function 'nodeGetThreadsPerSubcore':
../../src/nodeinfo.c:2393: error: label 'out' defined but not used [-Wunused-label]
../../src/nodeinfo.c:2352: error: unused parameter 'arch' [-Wunused-parameter]
The virDomainObjListRemove() function unlocks a domain that it's given
due to legacy code. And because of that code, which should be
refactored, that last virObjectUnlock() cannot be just removed. So
instead, lock it right back for qemu for now. All calls to
qemuDomainRemoveInactive() are followed by code that unlocks the domain
again, plus the domain should be locked during qemuDomainObjEndJob(), so
the right place to lock it is right after virDomainObjListRemove().
The only place where this would cause a problem is the autodestroy
callback, so we need to get another reference there and unref+unlock it
afterwards. Luckily, returning NULL from that function doesn't mean an
error, and only means that it doesn't need to be unlocked anymore.
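So the qemu pattern now looks roughly like this (a sketch, not the
exact code):

  virDomainObjListRemove(driver->domains, vm); /* unlocks vm */
  virObjectLock(vm);                           /* lock it right back */
  /* ... callers unlock (or unref+unlock) vm afterwards ... */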
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The nodeinfo is reporting an incorrect number of cpus and an incorrect host
topology on PPC64 KVM hosts. The KVM hypervisor on PPC64 needs only
the primary thread in a core to be online, and the secondaries offlined.
While scheduling a guest in, the kvm scheduler wakes up the secondaries to
run in guest context.
The host scheduling of the guests happens at the core level (as only the
primary thread is online). The kvm scheduler exploits as many threads of
the core as needed by the guest. Further, starting with POWER8, the
processor allows splitting
a physical core into multiple subcores with 2 or 4 threads each. Again, only
the primary thread in a subcore is online in the host. The KVM-PPC
scheduler allows guests to exploit all the offline threads in the subcore,
by bringing them online when needed.
(Kernel patches on split-core http://www.spinics.net/lists/kvm-ppc/msg09121.html)
Recent dynamic micro-threading changes in ppc-kvm make sure all the
offline cpus are utilized across guests, even across guests with
different cpu topologies.
(https://www.mail-archive.com/kvm@vger.kernel.org/msg115978.html)
Since the offline cpus are brought online in the guest context, it is safe
to count them as online. Nodeinfo today excludes these offline cpus from
the cpu count/topology calculation, so the nodeinfo output is not of much
help and the host appears overcommitted when it actually is not.
The patch carefully counts those offline threads whose primary threads are
online. The host topology displayed by the nodeinfo is also fixed when the
host is in a valid kvm state.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Use an I/O vector (iovec) instead of one huge memory buffer, as
suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1026137#c7.
This avoids doing memmove() on big buffers, and performance doesn't
degrade if the source (virNetClientStreamQueuePacket()) is faster than
the sink (virNetClientStreamRecvPacket()).
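A standalone sketch of the queueing side of the idea (simplified: the
real code also tracks an offset into the first entry for partial
reads):

  #include <stdlib.h>
  #include <string.h>
  #include <sys/uio.h>

  struct stream_queue {
      struct iovec *iov; /* one entry per queued packet */
      size_t niov;
  };

  /* producer: append a packet as a new iovec entry; unlike with one
   * big buffer, nothing already queued has to be moved */
  static int
  queue_packet(struct stream_queue *q, const void *buf, size_t len)
  {
      struct iovec *tmp;
      void *data;

      if (!(data = malloc(len)))
          return -1;
      memcpy(data, buf, len);

      if (!(tmp = realloc(q->iov, (q->niov + 1) * sizeof(*tmp)))) {
          free(data);
          return -1;
      }
      q->iov = tmp;
      q->iov[q->niov].iov_base = data;
      q->iov[q->niov].iov_len = len;
      q->niov++;
      return 0;
  }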
Resolves: http://bugzilla.redhat.com/1026137
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If cpuset is disabled or not available, libvirt must not use it,
mainly for actions that do not need it and can use sched_setaffinity()
or numa_membind() instead, because otherwise they will fail without
good reason.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1244664
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When stopping a domain on the destination host after a failed migration,
we need to avoid resetting security labels since the domain is still
running on the source host. While we were correctly doing so in some
cases, there were still some paths which did this wrong.
https://bugzilla.redhat.com/show_bug.cgi?id=1242904
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In addition to checking the current asynchronous job,
qemuMigrationJobIsActive reports an error if the current job does not
match the one we asked for. Let's just check the job directly since we
are not interested in the error in qemuProcessHandleMonitorEOF.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
If the destination libvirt doesn't support memory hotplug (all the
support was introduced by adding new elements), the destination would
attempt to start qemu with an invalid configuration. The worse part is
that qemu might hang in such a situation.
Fix this by sending a required migration feature called 'memory-hotplug'
to the destination. If the destination doesn't recognize it, it will
fail the migration.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1248350
So far qemu-nbd is run even if the nbd kernel module isn't loaded. This
leads to errors when the user starts their lxc container, even though
libvirt could easily load the nbd module automatically.
Our atomic increment (virAtomicIntInc) uses (if available) the gcc
__sync_add_and_fetch builtin. In the qemu driver, though, we'd profit
more from the __sync_fetch_and_add builtin. To keep things simple, this
patch adjusts the qemu driver initialization rather than adding a new
atomic increment macro.
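For reference, the difference between the two gcc builtins in a
standalone example:

  #include <stdio.h>

  int main(void)
  {
      int a = 41, b = 41;

      /* add-and-fetch returns the NEW value */
      printf("%d\n", __sync_add_and_fetch(&a, 1)); /* prints 42 */
      /* fetch-and-add returns the OLD value */
      printf("%d\n", __sync_fetch_and_add(&b, 1)); /* prints 41 */
      return 0;
  }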
virDomainDeleteConfig is meant to delete the persistent config and thus
it resets vm->autostart. Copy parts of qemuProcessRemoveDomainStatus to
a new helper to avoid using the incorrect function.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1230071
The remoteDomainOpenGraphicsFD method was using the wrong RPC
arg struct remote_domain_open_graphics_args instead of
remote_domain_open_graphics_fd_args. Fortunately both structs
had identical contents so there was no functional bug, but to
avoid confusing future maintainers, we should fix it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
First hunk changes the use of srcdir to top_srcdir so it complies with
other rules in the Makefile. The second one removes the need for
remote_protocol.h in admin_protocol.h, as was suggested and worked in,
but this one line was apparently missed. The last one just removes the
'remote' naming from the admin protocol specification, just so it's
cleaner.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit d506a51aeb2a7a7b0c963f760e32b94376ea7173 meant to check for
QEMU_CAPS_DRIVE_IOTUNE_MAX, but checked for QEMU_CAPS_DRIVE_IOTUNE
instead. That's clearly visible from the diff, but it got in anyway.
Because of that, we were supplying options unknown to QEMU if it wasn't
new enough, and we couldn't even properly handle the error, leading to
"Unexpected error". Also, iops_size arrived at the same time as all the
other "_max" options, so also check that we're not setting it if
QEMU_CAPS_DRIVE_IOTUNE_MAX is not supported.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1224053
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There are some non-0 default values in virDomainControllerDef (and
there will soon be more) that are easier not to forget if the
remembering is done by a single initializer function (rather than by
inline code after allocating the object with the generic VIR_ALLOC()).
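A standalone sketch of the pattern (simplified stand-ins for the
libvirt types; the point is that one function knows every non-0
default):

  #include <stdlib.h>

  typedef struct {
      int type;
      int model; /* -1 == unspecified: a non-0 default */
      int idx;
  } ControllerDef;

  static ControllerDef *
  controller_def_new(int type)
  {
      ControllerDef *def = calloc(1, sizeof(*def));

      if (!def)
          return NULL;
      def->type = type;
      def->model = -1; /* defaults live here, not at each caller */
      return def;
  }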
This loop occurs just after we've assured that all devices that
require a PCI address have been assigned one and all necessary PCI
controllers have been added. It is the perfect place to add other
potentially auto-generated PCI controller attributes that are
dependent on the controller's PCI address (upcoming patch).
There is a convenient loop through all controllers at the end of the
function, but the patch to add new functionality will be cleaner if we
first rearrange that loop a bit.
Note that the loop originally was accessing info.addr.pci.bus prior to
determining that the pci part of the object was valid. This isn't
dangerous in any way, but seemed a bit ugly, so I fixed it.
The function that auto-assigns PCI addresses was written with the
hardcoded assumptions that any PCI bus would have slots available
starting at 1 and ending at 31. This isn't true for many types of
controllers (some have a single slot/port at 0, some have slots/ports
from 0 to 31). This patch updates that function to remove the
hardcoded assumptions. It will properly find/assign addresses for
devices that can only connect to pcie-(root|downstream)-port (which
have minSlot/maxSlot of 0/0) or a pcie-switch-upstream-port (0/31).
It still will not auto-create a new bus of the proper kind for these
connections when one doesn't exist; that task is for another day.
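A standalone sketch of the generalized search (illustrative types and
bounds; the real code works on libvirt's PCI address set):

  struct pci_bus {
      unsigned int min_slot;        /* e.g. 0 for pcie-root-port, 1 for
                                     * a plain pci bus */
      unsigned int max_slot;        /* e.g. 0 for pcie-root-port, 31 for
                                     * pcie-switch-upstream-port */
      unsigned char slot_in_use[32];
  };

  /* return the first free slot within this bus's own bounds, or -1 */
  static int
  find_free_slot(const struct pci_bus *bus)
  {
      unsigned int s;

      for (s = bus->min_slot; s <= bus->max_slot; s++)
          if (!bus->slot_in_use[s])
              return (int) s;
      return -1; /* caller must look at (or some day create) another bus */
  }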