and re-adjust if the hotplug fails.
This fixes a bug found during testing of
https://bugzilla.redhat.com/1939776, which was supposed to be resolved
by commit 98e22ff749, but failed to account for the case of device
hotplug.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
An upcoming patch will be checking if the addition of a new net device
requires adjusting the domain locked memory limit, which must be done
prior to sending the command to qemu to add the new device. But
qemuDomainAdjustMaxMemLock() checks all (and only) the devices that
are currently in the domain definition, and currently we are adding
new net devices to the domain definition only at the very end of the
hotplug operation, after qemu has already executed the device_add
command.
In order for the upcoming patch to work, this patch changes
qemuDomainAttachNetDevice() to add the device to the domain nets list
at an earlier time. It can't be added until after PCI address and
alias name have been determined (because both of those examine
existing devices in the domain to figure out a unique value for the
new device), but must be done before making the qemu monitor call.
Since the device has been added to the list earlier, we need to
potentially remove it on failure. This is done by replacing the
existing call to virDomainNetRemoveHostdev() (which checks if this is
a hostdev net device, and if so removes it from the hostdevs list,
since it could have already been added to that list) with a call to
the new virDomainNetRemoveByObj(), which looks for the device on both
nets and hostdevs lists, and removes it where it finds it.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
virDomainNetRemove() requires the index of the net device you want to
remove from the list, but in some cases you may not have the index
handy, only the object itself (or the object may not have been added
to the domain's list). virDomainNetRemoveByObj() first tries to find
the given object in the nets list, and deletes that if it is found.
As with virDomainNetRemove() it always unconditionally tries to remove
the device from the hostdevs list (in case it is the ridiculous
combined net+hostdev device created for <interface type='hostdev'>).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We have many places where the earliest error returns from a function
skip any cleanup label at the bottom (the assumption being that it is
so early in the function that there isn't yet anything that needs to
be explicitly undone on failure). But in general it is a bad sign if
there are any direct "return" statements in a function at any time
after there has been a "goto cleanup" - that indicates someone thought
that an earlier point in the code had done something needing cleanup,
so we shouldn't be skipping it.
There were two occurences of a "return -1" after "goto cleanup" in
qemuDomainAttachDeviceNet(). The first of these has been around for a
very long time (since 2013) and my assumption is that the earlier
"goto cleanup" didn't exist at that time (so it was proper), and when
the code further up in the function was added, the this return -1 was
missed. The second was added during a mass change to check the return
from qemuInterfacePrepareSlirp() in several places (commit
99a1cfc43889c6d); in this case it was erroneous from the start.
Change both of these "return -1"s to "goto cleanup". Since we already
have code paths earlier in the function that goto cleanup, this should
not cause any new problem.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
There was a recent change in libxml2 that caused a trouble for
us. To us, <metadata/> in domain or network XMLs are just opaque
value where management application can store whatever data it
finds fit. At XML parser/formatter level, we just make a copy of
the element during parsing and then format it back. For
formatting we use xmlNodeDump() which allows caller to specify
level of indentation. Previously, the indentation was not
applied onto the very first line, but as of v2.9.12-2-g85b1792e
libxml2 is applying indentation also on the first line.
This does not work well with out virBuffer because as soon as we
call virBufferAsprintf() to append <metadata/> element,
virBufferAsprintf() will apply another level of indentation.
Instead of version checking, let's skip any indentation added by
libxml2 before virBufferAsprintf() is called.
Note, the problem is only when telling xmlNodeDump() to use
indentation, i.e. level argument is not zero. Therefore,
virXMLNodeToString() which also calls xmlNodeDump() is safe as it
passes zero.
Tested-by: Bjoern Walk <bwalk@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
I guess this is more of an academic problem, because if
<metadata/> content was problematic we would have caught the
error during parsing. Anyway, as is this function returns -1
without any error reported. Fix it by reporting one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
So far, we have to places where we format <metadata/> into XMLs:
domain and network. Bot places share the same code. Move it into
a helper function and just call it from those places.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
There's an if-else statement in libxlCapsInitNuma() that can
really be just two standalone if()-s. Writing it as such helps
with code readability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Implement this behaviour by skipping the disks on traditional
commandline and hotplug them before resuming CPUs. That allows to use
the support for hotplugging of transient disks which inherently allows
sharing of the backing image as we open it read-only.
This commit implements the validation code to allow it only with buses
supporting hotplug and the hotplug code while starting up the VM.
When we have such disk we need to issue a system-reset so that firmware
tables are regenerated to allow booting from such device.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In case the user wants to share the disk image between multiple VMs the
qemu driver needs to hotplug such disks to instantiate the backends.
Since that doesn't work for all disk configs add a switch to force this
behaviour.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The qemuDomainAttachDiskGeneric will also be used on startup for
transient disks which share the overlay. The VM startup code passes the
asyncJob around so we need to pass it into qemuDomainAttachDiskGeneric.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Add code which creates the transient overlay after hotplugging the disk
backend before attaching the disk frontend.
The state of the topmost image is modified to be already read-only to
prevent the need to open the image in read-write mode.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In preparation for hotplug of <transient> disks we'll need to track
whether the overlay file was created individually per-disk.
Add 'transientOverlayCreated' to 'struct _qemuDomainDiskPrivate' and
remove 'inhibitDiskTransientDelete' from 'qemuDomainObjPrivate' and
adjust the code for the change.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Split up the monitor contexts to attach the backend of the disk and the
frontend device in preparation for hotplugging transient disks where
we'll need to add the code for adding the transient overlay between
these two steps.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Modify the rollback section to use its own monitor context so that we
can later split up the hotplug into multiple steps and move the
detachment of the extension device into the rollback section rather than
doing it inline.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Similarly to previous refactors we want to move all hotplug related
setup which isn't strictly relevant to attaching the disk into
qemuDomainAttachDeviceDiskLiveInternal.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Remove the 'ret' variable and 'cleanup' label in favor of directly
returning the value since we don't have anything under the 'cleanup:'
label.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Remove two empty lines.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Move the auditing entry and insertion into the disk definition from the
function which deals with qemu to 'qemuDomainAttachDeviceDiskLiveInternal'
which deals with the hotplug related specifics.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemuDomainAttachDeviceDiskLiveInternal already sets up certain pieces of
the disk definition so it's better suited to move the setup of the
virStorageSource structs, granting access to the storage and allocation
of the alias from qemuDomainAttachDiskGeneric which will be just
handling the qemu interaction.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We can call it in one place as all per-device-type subcases use the same
code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Move the validation of the SCSI device address and the attachment of the
controller into qemuDomainAttachDeviceDiskLiveInternal as there's no
specific need for a special helper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Move the specific device setup and address reservation code into the
main hotplug helper as it's just one extra function call.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Move the specific device setup and address reservation code into the
main hotplug helper as it's just one extra function call.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Unify the handling of the copy-on-read filter by changing the handling
to use qemuBlockStorageSourceChainData.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Fill in the required fields in qemuBlockStorageSourceChainData to handle
the hotplug so that we can simplify the cleanup code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemuBlockStorageSourceChainData encapsulates the backend of the disk for
startup and hotplug operations. Add the handling for the copy-on-read
filter so that the hotplug code doesn't need to have separate cleanup.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Replace the last use of the function by virDomainDiskInsert and remove
the unused helper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Pre-extending the disk array size is pointless nowadays since we've
switched to memory APIs which don't return failure.
Switch all uses of reallocation of the array followed by
'virDomainDiskInsertPreAlloced' with direct virDomainDiskInsert.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The "machine-loadparm-multiple-disks-nets-s390" case now requires the
QEMU_CAPS_CCW feature to pass validation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We can skip the formatting of the bootindex for floppies directly at the
place where it's being formatted.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The logic assigning the bootindices from the legacy boot order
configuration was spread through the command line formatters for the
disk device and for the floppy controller.
This patch adds 'effectiveBootindex' property to the disk private data
which holds the calculated boot index and moves the logic of determining
the boot index into 'qemuProcessPrepareDomainDiskBootorder' called from
'qemuProcessPrepareDomainStorage'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Later patches will implement sharing of the backing file, so we'll need
to be able to discriminate the overlays per VM.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The code deals with the startup of the VM and just uses the snapshot
code to achieve the desired outcome.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We store the virQEMUDriverConfig object in the context.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Remove all the arguments which are present in qemuSnapshotDiskContext.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The config is used both with the preparation and execution functions, so
we can store it in the context to simplify other helpers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Rather than filling various parts of the context from arguments pass in
the whole context.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Creating the overlay for the disk is needed when starting a new VM only.
Additionally for now migration with transient disks is forbidden
anyways.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The code will be later reused when adding support for sharing the
backing image of the snapshot.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
None of them are currently needed to pass our upstream CI, most were
either for ancient clang versions or coverity for silencing false
positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
They were added mostly randomly and we don't really want to keep working
around of false positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
After previous patches we have two structures:
virCapsHostNUMACellDistance and virNumaDistance which express the
same thing. And have the exact same members (modulo their names).
Drop the former in favor of the latter.
This change means that distances with value of 0 are no longer
printed out into capabilities XML, because domain XML code allows
partial distance specification and thus threats value of 0 as
unspecified by user (see virDomainNumaGetNodeDistance() which
returns the default LOCAL/REMOTE distance for value of 0).
Also, from ACPI 6.1 specification, section 5.2.17 System Locality
Distance Information Table (SLIT):
Distance values of 0-9 are reserved and have no meaning.
Thus we shouldn't be ever reporting 0 in neither domain nor
capabilities XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Expose virNumaDistance XML formatter so that it can be re-used by
other parts of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
There's nothing domain specific about NUMA distances. Rename the
virDomainNumaDistance structure to just virNumaDistance.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The virCapsHostNUMACellSiblingInfo structure really represents
distance to other NUMA node. Rename the structure and variables
of that type to make it more obvious.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This reverts commit <1b22dd6dd44202094e0f78f887cbe790c00e9ebc>.
First of all, the reverted commit is incomplete. It only sets
cpuset.mems in the VM root cgroup when the API is used but there is no
code that would do the same when the VM is started.
Libvirt never places any process into the VM root cgroup directly. All
the supporting processes like slirp-helper or dbus-daemon are placed
into the emulator sub-cgroup and all the QEMU threads are distributed
between emulator, vcpu* and iothread* sub-cgroups. The scenario
described in the reverted commit can happen only if someone manually
adds any process there which we should not care about.
If we would like to set the limit in the VM root cgroup we need to
introduce better logic:
- set both (old and new) numa group in the VM root cgroup
- change the numa group in all sub-cgroups to new value
- finally set only the new value in the VM root cgroup
The simplest fix now is to revert the commit.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The used libxl_domain_build_info, which is contained in
libxl_domain_config, is owned and already initialized by the caller.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
The passed libxl_domain_create_info is owned, and already initialized,
by the caller.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>