We can finally introduce a specific target model for the spapr-vty
device used by pSeries guests, which means isa-serial will no longer
show up to confuse users.
We make sure migration works in both directions by interpreting the
isa-serial target type, or the lack of target type, appropriately
when parsing the guest XML, and skipping the newly-introduced type
when formatting if for migration. We also verify that spapr-vty is
not used for non-pSeries guests and add a bunch of test cases.
This commit is best viewed with 'git show -w'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1511421
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Target model and target type must agree for the configuration
to make sense, so check that's actually the case and error out
otherwise.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Instead of validating each target type / address type combination
separately, create a small helper to perform the matching and
collapse all existing checks into a single one.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Instead of waiting until we get to command line generation, we can
validate the target for a char device much earlier.
Move all the checks out of qemuBuildSerialChrDeviceStr() and into
the new fuction. This will later allow us to validate the target
for platform devices.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This is the first step in getting rid of the assumption that
isa-serial is the default target type for serial devices.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Having a separate function for char device handling is better than
adding even more code to qemuDomainDeviceDefPostParse().
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1425757
The blockdev-add code provides a mechanism to sanely provide user
and password-secret arguments for iscsi without placing them on the
command line to be viewable by a 'ps -ef' type command or needing
to create separate -iscsi devices for each disk/volume found.
So modify the iSCSI command line building to check for the presence
of the capability in order properly setup and use the domain master
secret object to encrypt the password in a secret object and alter
the parameters for the command line to utilize.
Modify the xml2argvtest to exhibit the syntax for both disk and
hostdev configurations.
Rather than picking apart the two pieces we need/want (path, hosts,
and auth)- let's allocate/use a virStorageSourcePtr for iSCSI storage.
The end result is that qemuBuildSCSIiSCSIHostdevDrvStr doesn't need
to "fake" one for the qemuBuildNetworkDriveStr call.
Move the setup of the disk attribute to the disk source prepare function
which will allow proper usage with JSON props and move the fallback
(legacy) generating code into the block which is executed with legacy
options.
As a side-effect of this change we can clean up propagation of 'cfg'
into the command generator.
Also it's nice to see that the test output is the same even when the
value is generated in a different place.
Disk sharing between two VMs may corrupt the images if the format driver
does not support it. Check that the user declared use of a supported
storage format when they want to share the disk.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1511480
When doing block commit we need to allow write for members of the
backing chain so that we can commit the data into them.
qemuDomainDiskChainElementPrepare was used for this which since commit
786d8d91b4 calls qemuDomainNamespaceSetupDisk which has very adverse
side-effects, namely it relabels the nodes to the same label it has in
the main namespace. This was messing up permissions for the commit
operation since its touching various parts of a single backing chain.
Since we are are actually not introducing new images at that point add a
flag for qemuDomainDiskChainElementPrepare which will refrain from
calling to the namespace setup function.
Calls from qemuDomainSnapshotCreateSingleDiskActive and
qemuDomainBlockCopyCommon do introduce new members all calls from
qemuDomainBlockCommit do not, so the calls are anotated accordingly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1506072
Most of the time it's okay to leave this up to negotiation between
the guest and the host, but in some situations it can be useful to
manually decide the behavior, especially to enforce its availability.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1308743
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Setup everything related to disks in one place rather than calling in
from various places.
The change to ordering of the setup steps is necessary since secrets
need the master key to be present.
In 4f15707202 I've tried to make duplicates detection for
nested /dev mount better. However, I've missed the obvious case
when there are two same mount points. For instance if:
# mount --bind /dev/blah /dev/blah
# mount --bind /dev/blah /dev/blah
Yeah, very unlikely (in qemu driver world) but possible.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function works over domain definition and not domain object.
Its name is thus misleading.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Right-aligning backslashes when defining macros or using complex
commands in Makefiles looks cute, but as soon as any changes is
required to the code you end up with either distractingly broken
alignment or unnecessarily big diffs where most of the changes
are just pushing all backslashes a few characters to one side.
Generated using
$ git grep -El '[[:blank:]][[:blank:]]\\$' | \
grep -E '*\.([chx]|am|mk)$$' | \
while read f; do \
sed -Ei 's/[[:blank:]]*[[:blank:]]\\$/ \\/g' "$f"; \
done
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
When a user provides the backing chain, we will not need to re-detect
all the backing stores again, but should move to the end of the user
specified chain. Additionally if a user provides a full terminated chain
we should not attempt any further detection.
Separate it so that it deals only with single virStorageSource, so that
it can later be reused for full backing chain support.
Two aliases are passed since authentication is more relevant to the
'storage backend' whereas encryption is more relevant to the protocol
layer. When using node names, the aliases will be different.
qemuDomainGetImageIds and qemuDomainStorageFileInit are helpful when
trying to access a virStorageSource from the qemu driver since they
figure out the correct uid and gid for the image.
When accessing members of a backing chain the permissions for the top
level would be used. To allow using specific permissions per backing
chain level but still allow inheritance from the parent of the chain we
need to add a new parameter to the image ID APIs.
This new capability enables a pause before device state serialization so
that we can finish all block jobs without racing with the end of the
migration. The pause is indicated by "pre-switchover" state. Once we're
done QEMU enters "device" migration state.
This patch just defines the new capability and QEMU migration states and
their mapping to our job states.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Since we will be allowing users to set device aliases and memory
devices are fragile when it comes to aliases we have to make sure
they won't change during migration. Other devices should be fine.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
All calls to qemuMonitorGetMigrationCapability in QEMU driver are
replaced with qemuMigrationCapsGet.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Each time we need to check whether a given migration capability is
supported by QEMU, we call query-migrate-capabilities QMP command and
lookup the capability in the returned list. Asking for the list of
supported capabilities once when we connect to QEMU and storing the
result in a bitmap is much better and we don't need to enter a monitor
just to check whether a migration capability is supported.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since the encryption information can also be disk source specific
move it from qemuDomainDiskPrivate to qemuDomainStorageSourcePrivate
Since the last allocated element from qemuDomainDiskPrivate is
removed, that means we no longer need qemuDomainDiskPrivateDispose.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Since the secret information is really virStorageSource specific
piece of data, let's manage the privateData from there instead of
at the Disk level.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Add the object definition and helpers to store security-related private
data for virStorageSources.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When commit id 'da86c6c22' added support for diskPriv->encinfo in
qemuDomainSecretDiskPrepare a change to qemuDomainSecretDiskDestroy
to was missed. Although qemuDomainDiskPrivateDispose probably would
do the trick.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1495511
When creating new /dev for domain ran in namespace we try to
preserve all sub-mounts of /dev. Well, not quite all. For
instance if /dev/foo/bar and /dev/foo are both mount points, only
/dev/foo needs preserving. /dev/foo/bar is preserved with it too.
Now, to identify such cases like this one STRPREFIX() is used.
That is not good enough. While it works for [/dev/foo/bar;
/dev/foo] case, it fails for [/dev/prefix; /dev/prefix2] where
the strings share the same prefix but are in fact two different
paths. The solution is to use STRSKIP().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
We need to send allowReboot in the migration cookie to ensure the same
behavior of the virDomainSetLifecycleAction() API on the destination.
Consider this scenario:
1. On the source the domain is started with:
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
2. User calls an API to set "destroy" for <on_reboot>:
<on_poweroff>destroy</on_poweroff>
<on_reboot>destroy</on_reboot>
<on_crash>destroy</on_crash>
3. The guest is migrated to a different host
4a. Without the allowReboot in the migration cookie the QEMU
process on destination would be started with -no-reboot
which would prevent using the virDomainSetLifecycleAction() API
for the rest of the guest lifetime.
4b. With the allowReboot in the migration cookie the QEMU process
on destination is started without -no-reboot like it was started
on the source host and the virDomainSetLifecycleAction() API
continues to work.
The following patch adds a QEMU implementation of the
virDomainSetLifecycleAction() API and that implementation disallows
using the API if all actions are set to "destroy" because we add
"-no-reboot" on the QEMU command line. Changing the lifecycle action
is in this case pointless because the QEMU process is always terminated.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This will be used later on in implementation of new API
virDomainSetLifecycleAction(). In order to use it, we need to store
the value in status XML to not lose the information if libvirtd is
restarted.
If some guest was started by old libvirt where it was not possible
to change the lifecycle action for running guest, we can safely
detect it based on the current actions from the status XML.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When libvirt older than 3.9.0 reconnected to a running domain started by
old libvirt it could have messed up the expansion of host-model by
adding features QEMU does not support (such as cmt). Thus whenever we
reconnect to a running domain, revert to an active snapshot, or restore
a saved domain we need to check the guest CPU model and remove the
CPU features unknown to QEMU. We can do this because we know the domain
was successfully started, which means the CPU did not contain the
features when libvirt started the domain.
https://bugzilla.redhat.com/show_bug.cgi?id=1495171
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Add helpers that will simplify checking if a backing file is valid or
whether it has backing store. The helper virStorageSourceIsBacking
returns true if the given virStorageSource is a valid backing store
member. virStorageSourceHasBacking returns true if the virStorageSource
has a backing store child.
Adding these functions creates a central points for further refactors.
The command "info migrate" of qemu outputs the dirty-pages-rate during
migration, but page size is different in different architectures. So
page size should be output to calculate dirty pages in bytes.
Page size is already implemented with commit
030ce1f8612215fcbe9d353dfeaeb2937f8e3f94 in qemu.
Now Implement the counter-part in libvirt.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Currently we don't do it. Therefore we accept senseless
combinations of models and buses they are attached to.
Moreover, diag288 watchdog is exclusive to s390(x).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Checking of disk presence accesses storage on the host so it should be
done from the host setup function. Move the code to new function called
qemuProcessPrepareHostStorage and remove qemuDomainCheckDiskPresence.
Introduce a new function to prepare domain disks which will also do the
volume source to actual disk source translation.
The 'pretend' condition is not transferred to the new location since it
does not help in writing tests and also no tests abuse it.
Introduce a function to setup any TLS needs for a disk source.
If there's a configuration or other error setting up the disk source
for TLS, then cause the domain startup to fail.
For VxHS, follow the chardevTLS model where if the src->haveTLS hasn't
been configured, then take the system/global cfg->haveTLS setting for
the storage source *and* mark that we've done so via the tlsFromConfig
setting in storage source.
Next, if we are using TLS, then generate an alias into a virStorageSource
'tlsAlias' field that will be used to create the TLS object and added to
the disk object in order to link the two together for QEMU.
Signed-off-by: John Ferlan <jferlan@redhat.com>
VM private data is cleared when the VM is turned off and also when the
VM object is being freed. Some of the clearing code was duplicated.
Extract it to a separate function.
This also removes the now unnecessary function
qemuDomainClearPrivatePaths.
Because qemuDomainDefCopy needs a string representation of a domain
definition, there's no reason for calling the lower level
qemuDomainDefFormatBuf API.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
virDomainDefFormatInternal (called by qemuDomainDefFormatXMLInternal)
already checks for buffer errors and properly resets the buffer on
failure.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1471225
Commit id '99a2d6af2' was a bit too aggressive with determining whether
the provided path was a "physical" cd-rom in order to generate a taint
message due to the possibility of some guest and host trying to control
the tray. For cd-rom guest devices backed to some VIR_STORAGE_TYPE_FILE
storage, this wouldn't be a problem and as such it shouldn't be a problem
for guest devices using some sort of block device on the host such as
iSCSI, LVM, or a Disk pool would present.
So before issuing a taint message, let's check if the provided path of
the VIR_STORAGE_TYPE_BLOCK backed device is a "known" physical cdrom name
by comparing the beginning of the path w/ "/dev/cdrom" and "/dev/sr".
Also since it's possible the provided path could resolve to some /dev/srN
device, let's get that path as well and perform the same check.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since commit v2.2.0-199-g7ce711a30e libvirt stores an updated guest CPU
in domain's live definition and there's no need to update it every time
we want to format the definition. The commit itself tried to address
this in qemuDomainFormatXML, but forgot to fix qemuDomainDefFormatLive.
Not to mention that masking a previously set flag is only acceptable if
the flag was set by a public API user. Internally, libvirt should have
never set the flag in the first place.
https://bugzilla.redhat.com/show_bug.cgi?id=1485022
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When a user requested a domain XML description with
VIR_DOMAIN_XML_UPDATE_CPU flag, libvirt would use the host CPU
definition from host capabilities rather than the one which will
actually be used once the domain is started.
https://bugzilla.redhat.com/show_bug.cgi?id=1481309
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In the past we updated host-model CPUs with host CPU data by adding a
model and features, but keeping the host-model mode. And since the CPU
model is not normally formatted for host-model CPU defs, we had to pass
the updateCPU flag to the formatting code to be able to properly output
updated host-model CPUs. Libvirt doesn't do this anymore, host-model
CPUs are turned into custom mode CPUs once updated with host CPU data
and thus there's no reason for keeping the hacks inside CPU XML
formatters.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This will be used to improve the validation for this type of devices.
The former @def parameter is renamed to @dev, leaving @def for the
virDomainDef (following the style used elsewhere).
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1075520
Apart from generic checks, we need to constrain netmask/prefix
length a bit. Thing is, with current implementation QEMU needs to
be able to 'assign' some IP addresses to the virtual network. For
instance, the default gateway is at x.x.x.2, dns is at x.x.x.3,
the default DHCP range is x.x.x.15-x.x.x.30. Since we don't
expose these settings yet, it's safer to require shorter prefix
to have room for the defaults.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: laine@laine.org
https://bugzilla.redhat.com/show_bug.cgi?id=1075520
Currently, all that users can specify for an interface type of
'user' is the common attributes: PCI address, NIC model (and
that's basically it). However, some need to configure other
address range than the default one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: laine@laine.org
Block job QMP commands with underscores rather than dashes were never
released in upstream qemu, (they were added, but modified in the same
release [1]), but a certain distro managed to backport the version in the
middle.
The change also slightly modified semantics for the abort command, which
made us have a lot of code which was only ever present in certain
downstream distros.
Clean the upstream code from the legacy cruft and support only the
upstream implementations.
[1] See qemu commit v1.0-2176-gdb58f9c060
Reviewed-by: Eric Blake <eblake@redhat.com>
No need to pass a @driver parameter since all that's done is deref
the @cfg especially since the only caller can just pass an already
referenced @cfg.
Also, looks like commit id '0298531b' at one time had a different
name for the API, so I took the liberty of fixing the comments too
since I would already be updating them for the @cfg variable.
In case of real migration (not migrating to file on save, dump etc)
migration info is not complete at time qemu finishes migration
in normal (non postcopy) mode. We need to update disks stats,
downtime info etc. Thus let's not expose this job status as
completed.
To archive this let's set status to 'qemu completed' after
qemu reports migration is finished. It is not visible as complete
job to clients. Cookie code on confirm phase will finally turn
job into completed. As we don't need more things to do when
migrating to file status is set to 'completed' as before
in this case.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When getting job info in case mirror does not reach ready phase
fetch mirror stats from qemu. Otherwise mirror stats are already
saved in current job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Instead of checking stat.status let's set status to migrating
as soon as migrate command is send (waiting for completion
is a good place too).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Setting status to none has little value - getting job status
will not return even elapsed time.
After this patch getting job stats stays correct in a sence
it will not fetch migration stats because it consults
stats.status before doing the fetch.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Let's introduce QEMU_DOMAIN_JOB_STATUS_POSTCOPY state for job.current->status
instead of checking job.current->stats.status. The latter can be changed
when fetching migration statistics. Moving state function from the variable
and leave only store function seems more managable.
This patch removes all state checking usage of stats except for
qemuDomainGetJobStatsInternal. This place will be handled separately.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This patch simply switches code from using VIR_DOMAIN_JOB_* to
introduced QEMU_DOMAIN_JOB_STATUS_*. Later this gives us freedom
to introduce states for postcopy and mirroring phases.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Neither @cfg nor (now) @driver is used in the API, so remove them
and mark @opaque as UNUSED.
NB: Commit id 'fa3c558596' dropped the unused @qemuCaps which was the
last consumer of @driver other than @cfg, but even @cfg was never used
even in the original implementation from commit id 'd987f63a'.
arm/aarch64 -M virt on KVM doesn't and will never work with standard
VGA card emulation. The recommended method is to use type=virtio, so
let's make it the default for video devices without an explicit type
set by the user.
https://bugzilla.redhat.com/show_bug.cgi?id=1404112
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Will be needed for future patches to pull the default video type
setting out of XML parsing routines.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
We call qemuDomainGetMachineName on domain start. On first
start (after daemon start) pid is 0 and virSystemdGetMachineNameByPID
don't get called. But after domain shutting down pid became -1 so
on next start virSystemdGetMachineNameByPID is called and returned an error.
Error is ignored so it is not critical. But at least on my system
(systemd-219 with extra patches) systemd-machined is crashed on
this request.
This behaviour is triggered by eaf2c9f89.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Currently while parsing domain XML we clear the UNIX path if it matches
one of the auto-generated paths by libvirt. After that when the guest
is started new path is generated but the mode is also changed to "bind".
In the real-world use-case the mode should not change, it only happens
if a user provides a mode='connect' and path that matches one of the
auto-generated path or not provides a path at all.
Before *reconnect* feature was introduced there was no issue, but with
the new feature we need to make sure that it's used only with "connect"
mode, therefore we need to move the mode change into parsing in order
to have a proper error reported by validation code.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When recreating folders with namespaces, the directory type was not
being handled at all. It's not special, we probably just didn't know
that that can be used as a volume path as well. The code failed
gracefully, but we want to allow that so that we can use <disk
type='dir'> in domains again.
Partially-resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1443434
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
At some places we either already have synchronous job or we just
released it. Also, some APIs might want to use this code without
having to release their job. Anyway, the job acquire code is
moved out to qemuDomainRemoveInactiveJob so that
qemuDomainRemoveInactive does just what it promises.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Otherwise longer domain names might generate paths that are too long
to be created. This follows what other parts of the code do as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1453194
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The reconnect attribute for chardev devices in QEMU is used to
configure the reconnect timeout in seconds. Setting '0' value disables
the reconnect functionality thus we don't allow to set '0' for QEMU.
To disable the reconnect user should use <reconnect enabled='no'/>.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1254971
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
And into DeviceDefValidate which is the expected place
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Use the new facility which allows to ignore failures in post parse
callbacks if they are not fatal so that VM configs are not lost if the
emulator binary is missing.
If qemuCaps can't be populated on daemon restart skip certain portions
of the post parse callbacks during config reload and re-run the callback
during VM startup.
This fixes VMs vanishing if the emulator binary was broken or
uninstalled and libvirtd was restarted.
qemuDomainControllerDefPostParse assigns the default USB controller
model when it was not specified by the user. Skip this step if @qemuCaps
is missing so that we don't fill wrong data. This will then be fixes by
re-running the post parse callback.
The domain post parse callback, domain address callback and the domain
device callback (for every single device) would each grab qemuCaps for
the current emulator. This is quite wasteful. Use the new callback to do
this just once.
We're storing the machine name in @priv but free it just in
qemuProcessStop, Therefore this may leak.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is a preparation for following patches where we switch to
virFileCache for QEMU capabilities cache
The host arch will always remain the same but virCaps may change. Now
the host arch is stored while creating new qemu capabilities cache.
It removes the need to pass virCaps into virQEMUCapsCache*() functions.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
It's possible to have more than one unnamed virtio-serial unix channel.
We need to generate a unique name for each channel. Currently, we use
".../unknown.sock" for all of them. Better practice would be to specify
an explicit target path name; however, in the absence of that, we need
uniqueness in the names we generate internally.
Before the changes we'd get /var/lib/libvirt/qemu/channel/target/unknown.sock
for each instance of
<channel type='unix'>
<source mode='bind'/>
<target type='virtio'/>
</channel>
Now, we get vioser-00-00-01.sock, vioser-00-00-02.sock, etc.
Signed-off-by: Scott Garfinkle <seg@us.ibm.com>
It is more related to a domain as we might use it even when there is
no systemd and it does not use any dbus/systemd functions. In order
not to use code from conf/ in util/ pass machineName in cgroups code
as a parameter. That also fixes a leak of machineName in the lxc
driver and cleans up and de-duplicates some code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The original name didn't hint at the fact that PHBs are
a pSeries-specific concept.
Suggested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This patch addresses the same aspects on PPC the bug 1103314 addressed
on x86.
PCI expander bus creates multiple primary PCI busses, where each of these
busses can be assigned a specific NUMA affinity, which, on x86 is
advertised through ACPI on a per-bus basis.
For SPAPR, a PHB's NUMA affinities are assigned on a per-PHB basis, and
there is no mechanism for advertising NUMA affinities to a guest on a
per-bus basis. So, even if qemu-ppc manages to get some sort of multi-bus
topology working using PXB, there is no way to expose the affinities
of these busses to the guest. It can only be exposed on a per-PHB/per-domain
basis.
So patch enables NUMA node tag in pci-root controller on PPC.
The way to set the NUMA node is through the numa_node option of
spapr-pci-host-bridge device. However for the implicit PHB, the only way
to set the numa_node is from the -global option. The -global option applies
to all the PHBs unless explicitly specified with the option on the
respective PHB of CLI. The default PHB has the emulated devices only, so
the patch prevents setting the NUMA node for the default PHB.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Now that the multi-phb support series is in, work on the TODO at
qemuDomainGetMemLockLimitBytes() to arrive at the correct memlock limit
value.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), meaning the current check
on the controller index no longer applies to them.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), which of course means that
all but one of them will have a non-zero index; hence,
we'll need to relax the current check.
However, right now the check is performed in the conf
module, which is generic rather than tied to the QEMU
driver, and where we don't have information such as the
guest machine type available.
To make this change of behavior possible down the line,
we need to move the check from the XML parser to the
drivers. Luckily, only QEMU and bhyve are using PCI
controllers, so this doesn't result in much duplication.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
When checking ABI stability between two domain definitions, we first
make migratable copies of them. However, we also asked for the guest CPU
to be updated, even though the updated CPU is supposed to be already
included in the original definitions. Moreover, if we do this on the
destination host during migration, we're potentially updating the
definition with according to an incompatible host CPU.
While updating the CPU when checking ABI stability doesn't make any
sense, it actually just worked because updating the CPU doesn't do
anything for custom CPUs (only host-model CPUs are affected) and we
updated both definitions in the same way.
Less then a year ago commit v2.3.0-rc1~42 stopped updating the CPU in
the definition we got internally and only the user supplied definition
was updated. However, the same commit started updating host-model CPUs
to custom CPUs which are not affected by the request to update the CPU.
So it still seemed to work right, unless a user upgraded libvirt 2.2.0
to a newer version while there were some domains with host-model CPUs
running on the host. Such domains couldn't be migrated with a user
supplied XML since libvirt would complain:
Target CPU mode custom does not match source host-model
The fix is pretty straightforward, we just need to stop updating the CPU
when checking ABI stability.
https://bugzilla.redhat.com/show_bug.cgi?id=1463957
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Obviously, old gcc-s ale sad when a variable shares the name with
a function. And we do have such variable (added in 4d8a914be0):
@mount. Rename it to @mountpoint so that compiler's happy again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The way we create devices under /dev is highly linux specific.
For instance we do mknod(), mount(), umount(), etc. Some
platforms are even missing some of these functions. Then again,
as declared in qemuDomainNamespaceAvailable(): namespaces are
linux only. Therefore, to avoid obfuscating the code by trying to
make it compile on weird platforms, just provide a non-linux stub
for qemuDomainAttachDeviceMknodRecursive(). At the same time,
qemuDomainAttachDeviceMknodHelper() which actually calls the
non-existent functions is moved under ifdef __linux__ block since
its only caller is in that block too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Currently, the only type of chardev that we create the backend
for in the namespace is type='dev'. This is not enough, other
backends might have files under /dev too. For instance channels
might have a unix socket under /dev (well, bind mounted under
/dev from a different place).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1462060
Just like in the previous commit, when attaching a file based
device which has its source living under /dev (that is not a
device rather than a regular file), calling mknod() is no help.
We need to:
1) bind mount device to some temporary location
2) enter the namespace
3) move the mount point to desired place
4) umount it in the parent namespace from the temporary location
At the same time, the check in qemuDomainNamespaceSetupDisk makes
no longer sense. Therefore remove it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1462060
When building a qemu namespace we might be dealing with bare
regular files. Files that live under /dev. For instance
/dev/my_awesome_disk:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/dev/my_awesome_disk'/>
<target dev='vdc' bus='virtio'/>
</disk>
# qemu-img create -f qcow2 /dev/my_awesome_disk 10M
So far we were mknod()-ing them which is
obviously wrong. We need to touch the file and bind mount it to
the original:
1) touch /var/run/libvirt/qemu/fedora.dev/my_awesome_disk
2) mount --bind /dev/my_awesome_disk /var/run/libvirt/qemu/fedora.dev/my_awesome_disk
Later, when the new /dev is built and replaces original /dev the
file is going to live at expected location.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This function is going to be used on other places, so
instead of copying code we can just call the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1459592
In 290a00e41d I've tried to fix the process of building a
qemu namespace when dealing with file mount points. What I
haven't realized then is that we might be dealing not with just
regular files but also special files (like sockets). Indeed, try
the following:
1) socat unix-listen:/tmp/soket stdio
2) touch /dev/socket
3) mount --bind /tmp/socket /dev/socket
4) virsh start anyDomain
Problem with my previous approach is that I wasn't creating the
temporary location (where mount points under /dev are moved) for
anything but directories and regular files.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
vcpu properties gathered from query-hotpluggable cpus need to be passed
back to qemu. As qemu did not use the node-id property until now and
libvirt forgot to pass it back properly (it was parsed but not passed
around) we did not honor this.
This patch adds node-id to the structures where it was missing and
passes it around as necessary.
The test data was generated with a VM with following config:
<numa>
<cell id='0' cpus='0,2,4,6' memory='512000' unit='KiB'/>
<cell id='1' cpus='1,3,5,7' memory='512000' unit='KiB'/>
</numa>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1452053
In QEMU driver we can use virtlogd as stdio handler for source backend
of char devices if current QEMU is new enough and it's enabled in
qemu.conf. We should store this information while starting a guest
because the config option may change while the guest is running.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1431112
Imagine a FS mounted on /dev/blah/blah2. Our process of creating
suffix for temporary location where all the mounted filesystems
are moved is very simplistic. We want:
/var/run/libvirt/qemu/$domName.$suffix\
were $suffix is just the mount point path stripped of the "/dev/"
prefix. For instance:
/var/run/libvirt/qemu/fedora.mqueue for /dev/mqueue
/var/run/libvirt/qemu/fedora.pts for /dev/pts
and so on. Now if we plug /dev/blah/blah2 into the example we see
some misbehaviour:
/var/run/libvirt/qemu/fedora.blah/blah2
Well, misbehaviour if /dev/blah/blah2 is a file, because in that
case we call virFileTouch() instead of virFileMakePath().
The solution is to replace all the slashes in the suffix with say
dots. That way we don't have to care about nested directories.
IOW, the result we want for given example is:
/var/run/libvirt/qemu/fedora.blah.blah2
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1431112
There can be nested mount points. For instance /dev/shm/blah can
be a mount point and /dev/shm too. It doesn't make much sense to
return the former path because callers preserve the latter (and
with that the former too). Therefore prune nested mount points.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1431112
After 290a00e41d we know how to deal with file mount points.
However, when cleaning up the temporary location for preserved
mount points we are still calling rmdir(). This won't fly for
files. We need to call unlink(). Now, since we don't really care
if the cleanup succeeded or not (it's the best effort anyway), we
can call both rmdir() and unlink() without need for
differentiation between files and directories.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When making ABI stability checks for an active domain, we need to make
sure we use the same migratable definition which virDomainGetXMLDesc
with the MIGRATABLE flag provides, otherwise the ABI check will fail.
This is implemented in the new qemuDomainCheckABIStability which takes a
domain object and generates the right migratable definition from it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This patch separates the actual ABI checks from getting migratable defs
in qemuDomainDefCheckABIStability so that we can create another wrapper
which will use different methods to get the migratable defs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The main goal of this function is to enable reusing the parsing code
from qemuDomainDefCopy.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Use ATTRIBUTE_FALLTHROUGH, introduced by commit
5d84f5961b, instead of comments to
indicate that the fall through is an intentional behavior.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1459091
Currently, we are querying for vhostuser interface name in post
parse callback. At that time interface might not yet exist.
However, it has to exist when starting domain. Therefore it makes
more sense to query its name at that point. This partially
reverts 57b5e27.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If QEMU is new enough and we have the live updated CPU definition in
either save or migration cookie, we can use it to enforce ABI. The
original guest CPU from domain XML will be stored in private data.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Since the domain XML saved in a snapshot or saved image uses the
original guest CPU definition but we still want to enforce ABI when
restoring the domain if libvirt and QEMU are new enough, we save the
live updated CPU definition in a save cookie.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The destination host may not be able to start a domain using the live
updated CPU definition because either libvirt or QEMU may not be new
enough. Thus we need to send the original guest CPU definition.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When starting a domain we update the guest CPU definition to match what
QEMU actually provided (since it is allowed to add or removed some
features unless check='full' is specified). Let's store the original CPU
in domain private data so that we can use it to provide a backward
compatible domain XML.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This patch implements a new save cookie object and callbacks for qemu
driver. The actual useful content will be added in the object later.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This will be used later when a save cookie will become part of the
snapshot XML using new driver specific parser/formatter functions.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1450349
Problem is, qemu fails to load guest memory image if these
attribute change on migration/restore from an image.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While checking for ABI stability, drivers might pose additional
checks that are not valid for general case. For instance, qemu
driver might check some memory backing attributes because of how
qemu works. But those attributes may work well in other drivers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The @meminfo allocated in qemuMonitorGetMemoryDeviceInfo() may be
lost when qemuDomainObjExitMonitor() failed.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Namely, this patch is about virMediatedDeviceGetIOMMUGroup{Dev,Num}
functions. There's no compelling reason why these functions should take
an object, on the contrary, having to create an object every time one
needs to query the IOMMU group number, discarding the object afterwards,
seems odd.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
The QEMU default is GICv2, and some of the code in libvirt
relies on the exact value. Stop pretending that's not the
case and use GICv2 explicitly where needed.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
There are currently some limitations in the emulated GICv3
that make it unsuitable as a default. Use GICv2 instead.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1450433
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Currently we consider all UNIX paths with specific prefix as generated
by libvirt, but that's a wrong assumption. Let's make the detection
better by actually checking whether the whole path matches one of the
paths that we generate or generated in the past.
The UNIX path isn't stored in config XML since libvirt-1.3.1.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1446980
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Add kernel_irqchip=split/on to the QEMU command line
and a capability that looks for it in query-command-line-options
output. For the 'split' option, use a version check
since it cannot be reasonably probed.
https://bugzilla.redhat.com/show_bug.cgi?id=1427005
Even though there are several checks before calling this function
and for some scenarios we don't call it at all (e.g. on disk hot
unplug), it may be possible to sneak in some weird files (e.g. if
domain would have RNG with /dev/shm/some_file as its backend). No
matter how improbable, we shouldn't unlink it as we would be
unlinking a file from the host which we haven't created in the
first place.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Just like in previous commit, this fixes the same issue for
hotplug.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
While the code allows devices to already be there (by some
miracle), we shouldn't try to create devices that don't belong to
us. For instance, we shouldn't try to create /dev/shm/file
because /dev/shm is a mount point that is preserved. Therefore if
a file is created there from an outside (e.g. by mgmt application
or some other daemon running on the system like vhostmd), it
exists in the qemu namespace too as the mount point is the same.
It's only /dev and /dev only that is different. The same
reasoning applies to all other preserved mount points.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Currently, all we need to do in qemuDomainCreateDeviceRecursive() is to
take given @device, get all kinds of info on it (major & minor numbers,
owner, seclabels) and create its copy at a temporary location @path
(usually /var/run/libvirt/qemu/$domName.dev), if @device live under
/dev. This is, however, very loose condition, as it also means
/dev/shm/* is created too. Therefor, we will need to pass more arguments
into the function for better decision making (e.g. list of mount points
under /dev). Instead of adding more arguments to all the functions (not
easily reachable because some functions are callback with strictly
defined type), lets just turn this one 'const char *' into a 'struct *'.
New "arguments" can be then added at no cost.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
When setting up mount namespace for a qemu domain the following
steps are executed:
1) get list of mountpoints under /dev/
2) move them to /var/run/libvirt/qemu/$domName.ext
3) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev
4) move the mountpoint of the new device tree to /dev
5) restore original mountpoints from step 2)
Note the problem with this approach is that if some device in step
3) requires access to a mountpoint from step 2) it will fail as
the mountpoint is not there anymore. For instance consider the
following domain disk configuration:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/dev/shm/vhostmd0'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
In this case operation fails as we are unable to create vhostmd0
in the new device tree because after step 2) there is no /dev/shm
anymore. Leave aside fact that we shouldn't try to create devices
living in other mountpoints. That's a separate bug that will be
addressed later.
Currently, the order described above is rearranged to:
1) get list of mountpoints under /dev/
2) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev
3) move them to /var/run/libvirt/qemu/$domName.ext
4) move the mountpoint of the new device tree to /dev
5) restore original mountpoints from step 3)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
This is a USB3 controller and it's a better choice than piix3-uhci.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
The new logic will set the piix3-uhci if available regardless of
any architecture and it will be updated to better model based on
architecture and device existence.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
This patch maps /domain/cpu/cache element into -cpu parameters:
- <cache mode='passthrough'/> is translated to host-cache-info=on
- <cache level='3' mode='emulate'/> is transformed into l3-cache=on
- <cache mode='disable'/> is turned in host-cache-info=off,l3-cache=off
Any other <cache> element is forbidden.
The tricky part is detecting whether QEMU supports the CPU properties.
The 'host-cache-info' property is introduced in v2.4.0-1389-ge265e3e480,
earlier QEMU releases enabled host-cache-info by default and had no way
to disable it. If the property is present, it defaults to 'off' for any
QEMU until at least 2.9.0.
The 'l3-cache' property was introduced later by v2.7.0-200-g14c985cffa.
Earlier versions worked as if l3-cache=off was passed. For any QEMU
until at least 2.9.0 l3-cache is 'off' by default.
QEMU 2.9.0 was the first release which supports probing both properties
by running device-list-properties with typename=host-x86_64-cpu. Older
QEMU releases did not support device-list-properties command for CPU
devices. Thus we can't really rely on probing them and we can just use
query-cpu-model-expansion QMP command as a witness.
Because the cache property probing is only reliable for QEMU >= 2.9.0
when both are already supported for quite a few releases, we let QEMU
report an error if a specific cache mode is explicitly requested. The
other mode (or both if a user requested CPU cache to be disabled) is
explicitly turned off for QEMU >= 2.9.0 to avoid any surprises in case
the QEMU defaults change. Any older QEMU already turns them off so not
doing so explicitly does not make any harm.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Not all async jobs are visible via virDomainGetJobStats (either they are
too fast or getting the stats is not allowed during the job), but
forcing all of them to advertise the operation is easier than hunting
the jobs for which fetching statistics is allowed. And we won't need to
think about this when we add support for getting stats for more jobs.
https://bugzilla.redhat.com/show_bug.cgi?id=1441563
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We are currently parsing only rx/frames/max because that's the only
value that makes sense for us. The tun device just added support for
this one and the others are only supported by hardware devices which
we don't need to worry about as the only way we'd pass those to the
domain is using <hostdev/> or <interface type='hostdev'/>. And in
those cases the guest can modify the settings itself.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The history of USB controller for ppc64 guest is complex and goes
back to libvirt 1.3.1 where the fun started.
Prior Libvirt 1.3.1 if no model for USB controller was specified
we've simply passed "-usb" on QEMU command line.
Since Libvirt 1.3.1 there is a patch (8156493d8d) that fixes this
issue by using "-device pci-ohci,..." but it breaks migration with
older Libvirts which was agreed that's acceptable. However this
patch didn't reflect this change in the domain XML and the model
was still missing.
Since Libvirt 2.2.0 there is a patch (f55eaccb0c) that fixes the
issue with not setting the USB model into domain XML which we need
to know about to not break the migration and since the default
model was *pci-ohci* it was used as default in this patch as well.
This patch tries to take all the previous changes into account and
also change the default for newly defined domains that don't specify
any model for USB controller.
The VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set only if new domain is
defined or new device is added into a domain which means that in
all other cases we will use the old *pci-ohci* model instead of the
better and not broken *nec-usb-xhci* model.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1373184
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Introduce new wrapper functions without *Machine* in the function
name that take the whole virDomainDef structure as argument and
call the existing functions with *Machine* in the function name.
Change the arguments of existing functions to *machine* and *arch*
because they don't need the whole virDomainDef structure and they
could be used in places where we don't have virDomainDef.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This way qemuDomainLogContextRef() and qemuDomainLogContextFree() is
no longer needed. The naming qemuDomainLogContextFree() was also
somewhat misleading. Additionally, it's easier to turn
qemuDomainLogContext in a self-locking object.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Depending on the architecture, requirements for ACPI and UEFI can
be different; more specifically, while on x86 UEFI requires ACPI,
on aarch64 it's the other way around.
Enforce these requirements when validating the domain, and make
the error message more accurate by mentioning that they're not
necessarily applicable to all architectures.
Several aarch64 test cases had to be tweaked because they would
have failed the validation step otherwise.
Currently, if we want to zero out disk source (e,g, due to
startupPolicy when starting up a domain) we use
virDomainDiskSetSource(disk, NULL). This works well for file
based storage (storage type file, dir, or block). But it doesn't
work at all for other types like volume and network.
So imagine that you have a domain that has a CDROM configured
which source is a volume from an inactive pool. Because it is
startupPolicy='optional', the CDROM is empty when the domain
starts. However, the source element is not cleared out in the
status XML and thus when the daemon restarts and tries to
reconnect to the domain it refreshes the disks (which fails - the
storage pool is still not running) and thus the domain is killed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When idx is 0 virStorageFileChainLookup returns the base (bottom) of the
backing chain rather than the top. This is expected by the callers of
qemuDomainGetStorageSourceByDevstr.
Add a special case for idx == 0
For guests that use <memoryBacking><locked>, our only option
is to remove the memory locking limit altogether.
Partially-resolves: https://bugzilla.redhat.com/1431793
Instead of having a separate function, we can simply return
zero from the existing qemuDomainGetMemLockLimitBytes() to
signal the caller that the memory locking limit doesn't need
to be set for the guest.
Having a single function instead of two makes it less likely
that we will use the wrong value, which is exactly what
happened when we started applying the limit that was meant
for VFIO-using guests to <memoryBacking><locked>-using
guests.
This reverts commit c2e60ad0e5.
Turns out this check is excessively strict: there are ways
other than <memtune><hard_limit> to raise the memory locking
limit for QEMU processes, one prominent example being
tweaking /etc/security/limits.conf.
Partially-resolves: https://bugzilla.redhat.com/1431793
Since mdevs are just another type of VFIO devices, we should increase
the memory locking limit the same way we do for VFIO PCI devices.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
As goes for all the other hostdev device types, grant the qemu process
access to /dev/vfio/<mediated_device_iommu_group>.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
A mediated device will be identified by a UUID (with 'model' now being
a mandatory <hostdev> attribute to represent the mediated device API) of
the user pre-created mediated device. We also need to make sure that if
user explicitly provides a guest address for a mdev device, the address
type will be matching the device API supported on that specific mediated
device and error out with an incorrect XML message.
The resulting device XML:
<devices>
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
<source>
<address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'>
</source>
</hostdev>
</devices>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
The code is currently simple, but if we later add node names, it will be
necessary to generate the names based on the node name. Add a helper so
that there's a central point to fix once we add self-generated node
names.
If the migration flags indicate this migration will be using TLS,
then set up the destination during the prepare phase once the target
domain has been started to add the TLS objects to perform the migration.
This will create at least an "-object tls-creds-x509,endpoint=server,..."
for TLS credentials and potentially an "-object secret,..." to handle the
passphrase response to access the TLS credentials. The alias/id used for
the TLS objects will contain "libvirt_migrate".
Once the objects are created, the code will set the "tls-creds" and
"tls-hostname" migration parameters to signify usage of TLS.
During the Finish phase we'll be sure to attempt to clear the
migration parameters and delete those objects (whether or not they
were created). We'll also perform the same reset during recovery
if we've reached FINISH3.
If the migration isn't using TLS, then be sure to check if the
migration parameters exist and clear them if so.
So, majority of the code is just ready as-is. Well, with one
slight change: differentiate between dimm and nvdimm in places
like device alias generation, generating the command line and so
on.
Speaking of the command line, we also need to append 'nvdimm=on'
to the '-machine' argument so that the nvdimm feature is
advertised in the ACPI tables properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
NVDIMM is new type of memory introduced into QEMU 2.6. The idea
is that we have a Non-Volatile memory module that keeps the data
persistent across domain reboots.
At the domain XML level, we already have some representation of
'dimm' modules. Long story short, NVDIMM will utilize the
existing <memory/> element that lives under <devices/> by adding
a new attribute 'nvdimm' to the existing @model and introduce a
new <path/> element for <source/> while reusing other fields. The
resulting XML would appear as:
<memory model='nvdimm'>
<source>
<path>/tmp/nvdimm</path>
</source>
<target>
<size unit='KiB'>523264</size>
<node>0</node>
</target>
<address type='dimm' slot='0'/>
</memory>
So far, this is just a XML parser/formatter extension. QEMU
driver implementation is in the next commit.
For more info on NVDIMM visit the following web page:
http://pmem.io/
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1431112
Yeah, that's right. A mount point doesn't have to be a directory.
It can be a file too. However, the code that tries to preserve
mount points under /dev for new namespace for qemu does not count
with that option.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1430634
If a qemu process has died, we get EOF on its monitor. At this
point, since qemu process was the only one running in the
namespace kernel has already cleaned the namespace up. Any
attempt of ours to enter it has to fail.
This really happened in the bug linked above. We've tried to
attach a disk to qemu and while we were in the monitor talking to
qemu it just died. Therefore our code tried to do some roll back
(e.g. deny the device in cgroups again, restore labels, etc.).
However, during the roll back (esp. when restoring labels) we
still thought that domain has a namespace. So we used secdriver's
transactions. This failed as there is no namespace to enter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When libvirtd is started we call qemuDomainRecheckInternalPaths
to detect whether a domain has VNC socket path generated by libvirt
based on option from qemu.conf. However if we are parsing status XML
for running domain the existing socket path can be generated also if
the config XML uses the new <listen type='socket'/> element without
specifying any socket.
The current code doesn't make difference how the socket was generated
and always marks it as "fromConfig". We need to store the
"autoGenerated" value in the status XML in order to preserve that
information.
The difference between "fromConfig" and "autoGenerated" is important
for migration, because if the socket is based on "fromConfig" we don't
print it into the migratable XML and we assume that user has properly
configured qemu.conf on both hosts. However if the socket is based
on "autoGenerated" it means that a new feature was used and therefore
we need to leave the socket in migratable XML to make sure that if
this feature is not supported on destination the migration will fail.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Now that we have some qemuSecurity wrappers over
virSecurityManager APIs, lets make sure everybody sticks with
them. We have them for a reason and calling virSecurityManager
API directly instead of wrapper may lead into accidentally
labelling a file on the host instead of namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The functions in virCommand() after fork() must be careful with regard
to accessing any mutexes that may have been locked by other threads in
the parent process. It is possible that another thread in the parent
process holds the lock for the virQEMUDriver while fork() is called.
This leads to a deadlock in the child process when
'virQEMUDriverGetConfig(driver)' is called and therefore the handshake
never completes between the child and the parent process. Ultimately
the virDomainObjectPtr will never be unlocked.
It gets much worse if the other thread of the parent process, that
holds the lock for the virQEMUDriver, tries to lock the already locked
virDomainObject. This leads to a completely unresponsive libvirtd.
It's possible to reproduce this case with calling 'virsh start XXX'
and 'virsh managedsave XXX' in a tight loop for multiple domains.
This commit fixes the deadlock in the same way as it is described in
commit 61b52d2e38.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
When enabling virgl, qemu opens /dev/dri/render*. So far, we are
not allowing that in devices CGroup nor creating the file in
domain's namespace and thus requiring users to set the paths in
qemu.conf. This, however, is suboptimal as it allows access to
ALL qemu processes even those which don't have virgl configured.
Now that we have a way to specify render node that qemu will use
we can be more cautious and enable just that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far, qemuDomainGetHostdevPath has no knowledge of the reasong
it is called and thus reports /dev/vfio/vfio for every VFIO
backed device. This is suboptimal, as we want it to:
a) report /dev/vfio/vfio on every addition or domain startup
b) report /dev/vfio/vfio only on last VFIO device being unplugged
If a domain is being stopped then namespace and CGroup die with
it so no need to worry about that. I mean, even when a domain
that's exiting has more than one VFIO devices assigned to it,
this function does not clean /dev/vfio/vfio in CGroup nor in the
namespace. But that doesn't matter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
So far, we are allowing /dev/vfio/vfio in the devices cgroup
unconditionally (and creating it in the namespace too). Even if
domain has no hostdev assignment configured. This is potential
security hole. Therefore, when starting the domain (or
hotplugging a hostdev) create & allow /dev/vfio/vfio too (if
needed).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Since these two functions are nearly identical (with
qemuSetupHostdevCgroup actually calling virCgroupAllowDevicePath)
we can have one function call the other and thus de-duplicate
some code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The bare fact that mnt namespace is available is not enough for
us to allow/enable qemu namespaces feature. There are other
requirements: we must copy all the ACL & SELinux labels otherwise
we might grant access that is administratively forbidden or vice
versa.
At the same time, the check for namespace prerequisites is moved
from domain startup time to qemu.conf parser as it doesn't make
much sense to allow users to start misconfigured libvirt just to
find out they can't start a single domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
mknod() is affected my the current umask, so we're not
guaranteed the newly-created device node will have the
right permissions.
Call chmod(), which is not affected by the current umask,
immediately afterwards to solve the issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1421036
Just like we need wrappers over other virSecurityManager APIs, we
need one for virSecurityManagerSetImageLabel and
virSecurityManagerRestoreImageLabel. Otherwise we might end up
relabelling device in wrong namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Firstly, instead of checking for next->path the
virStorageSourceIsEmpty() function should be used which also
takes disk type into account.
Secondly, not every disk source passed has the correct type set
(due to our laziness). Therefore, instead of checking for
virStorageSourceIsBlockLocal() and also S_ISBLK() the former can
be refined to just virStorageSourceIsLocalStorage().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>