Commit Graph

895 Commits

Author SHA1 Message Date
Peter Krempa
0a294a8e28 util: storagefile: Add helpers to check presence of backing store
Add helpers that will simplify checking if a backing file is valid or
whether it has backing store. The helper virStorageSourceIsBacking
returns true if the given virStorageSource is a valid backing store
member. virStorageSourceHasBacking returns true if the virStorageSource
has a backing store child.

Adding these functions creates a central points for further refactors.
2017-10-17 06:19:18 +02:00
Chao Fan
79b7ac43fa qemu: add the print of page size in cmd domjobinfo
The command "info migrate" of qemu outputs the dirty-pages-rate during
migration, but page size is different in different architectures. So
page size should be output to calculate dirty pages in bytes.

Page size is already implemented with commit
030ce1f8612215fcbe9d353dfeaeb2937f8e3f94 in qemu.
Now Implement the counter-part in libvirt.

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-10-12 17:06:07 +02:00
Peter Krempa
802fd24506 qemu: domain: Mark if no blockjobs are active in the status XML
Note when no blockjobs are running in the status XML so that we know
that the backing chain will not change until we reconnect.
2017-10-06 08:47:30 +02:00
Michal Privoznik
8a54cc1d08 qemuDomainDeviceDefValidate: Validate watchdog
Currently we don't do it. Therefore we accept senseless
combinations of models and buses they are attached to.
Moreover, diag288 watchdog is exclusive to s390(x).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-10-05 14:23:20 +02:00
Peter Krempa
8418aed978 qemu: process: move disk presence checking to host setup function
Checking of disk presence accesses storage on the host so it should be
done from the host setup function. Move the code to new function called
qemuProcessPrepareHostStorage and remove qemuDomainCheckDiskPresence.
2017-10-05 09:46:46 +02:00
Peter Krempa
0c09c5b0d1 qemu: process: Move TLS setup for storage source to qemuProcessPrepareDomainStorage 2017-10-05 09:45:59 +02:00
Peter Krempa
f1cec8829e qemu: process: Move 'volume' translation to domain prepare stage
Introduce a new function to prepare domain disks which will also do the
volume source to actual disk source translation.

The 'pretend' condition is not transferred to the new location since it
does not help in writing tests and also no tests abuse it.
2017-10-05 09:45:10 +02:00
Peter Krempa
76039bba87 qemu: domain: Document and export qemuDomainCheckDiskStartupPolicy 2017-10-05 09:40:15 +02:00
John Ferlan
5c09486c1e qemu: Introduce qemuDomainPrepareDiskSource
Introduce a function to setup any TLS needs for a disk source.

If there's a configuration or other error setting up the disk source
for TLS, then cause the domain startup to fail.

For VxHS, follow the chardevTLS model where if the src->haveTLS hasn't
been configured, then take the system/global cfg->haveTLS setting for
the storage source *and* mark that we've done so via the tlsFromConfig
setting in storage source.

Next, if we are using TLS, then generate an alias into a virStorageSource
'tlsAlias' field that will be used to create the TLS object and added to
the disk object in order to link the two together for QEMU.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-09-28 09:45:14 -04:00
Peter Krempa
3685e2dd64 qemu: domain: Extract common clearing of VM private data
VM private data is cleared when the VM is turned off and also when the
VM object is being freed. Some of the clearing code was duplicated.
Extract it to a separate function.

This also removes the now unnecessary function
qemuDomainClearPrivatePaths.
2017-09-27 17:02:02 +02:00
Ján Tomko
01f86fb301 qemu: adjust indentation of qemuDomainObjPrivateXMLFormatAutomaticPlacement
Commit 6801da94 fixed the typo in the function name, but forgot
to adjust the indentation level of the next line.
2017-09-26 17:10:51 +02:00
Peter Krempa
6801da940e qemu: domain: Fix typo in qemuDomainObjPtrivateXMLFormatAutomaticPlacement 2017-09-26 16:38:04 +02:00
Jiri Denemark
369199d1a9 qemu: Use qemuDomainDefFormatXML in qemuDomainDefCopy
Because qemuDomainDefCopy needs a string representation of a domain
definition, there's no reason for calling the lower level
qemuDomainDefFormatBuf API.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-22 14:02:59 +02:00
Jiri Denemark
5c4fc07d1a qemu: Fix error checking in qemuDomainDefFormatXMLInternal
virDomainDefFormatInternal (called by qemuDomainDefFormatXMLInternal)
already checks for buffer errors and properly resets the buffer on
failure.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-22 14:02:59 +02:00
John Ferlan
ed2a741e48 qemu: Be more selective when determining cdrom for taint messaging
https://bugzilla.redhat.com/show_bug.cgi?id=1471225

Commit id '99a2d6af2' was a bit too aggressive with determining whether
the provided path was a "physical" cd-rom in order to generate a taint
message due to the possibility of some guest and host trying to control
the tray. For cd-rom guest devices backed to some VIR_STORAGE_TYPE_FILE
storage, this wouldn't be a problem and as such it shouldn't be a problem
for guest devices using some sort of block device on the host such as
iSCSI, LVM, or a Disk pool would present.

So before issuing a taint message, let's check if the provided path of
the VIR_STORAGE_TYPE_BLOCK backed device is a "known" physical cdrom name
by comparing the beginning of the path w/ "/dev/cdrom" and "/dev/sr".
Also since it's possible the provided path could resolve to some /dev/srN
device, let's get that path as well and perform the same check.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-09-21 10:22:34 -04:00
Jiri Denemark
06f75ff2cb qemu: Don't update CPU when formatting live def
Since commit v2.2.0-199-g7ce711a30e libvirt stores an updated guest CPU
in domain's live definition and there's no need to update it every time
we want to format the definition. The commit itself tried to address
this in qemuDomainFormatXML, but forgot to fix qemuDomainDefFormatLive.
Not to mention that masking a previously set flag is only acceptable if
the flag was set by a public API user. Internally, libvirt should have
never set the flag in the first place.

https://bugzilla.redhat.com/show_bug.cgi?id=1485022

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-21 15:27:39 +02:00
Jiri Denemark
7e874326a3 qemu: Use correct host model for updating guest cpu
When a user requested a domain XML description with
VIR_DOMAIN_XML_UPDATE_CPU flag, libvirt would use the host CPU
definition from host capabilities rather than the one which will
actually be used once the domain is started.

https://bugzilla.redhat.com/show_bug.cgi?id=1481309

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-21 15:27:39 +02:00
Jiri Denemark
4fd179f518 cpu_conf: Drop updateCPU from virCPUDefFormat
In the past we updated host-model CPUs with host CPU data by adding a
model and features, but keeping the host-model mode. And since the CPU
model is not normally formatted for host-model CPU defs, we had to pass
the updateCPU flag to the formatting code to be able to properly output
updated host-model CPUs. Libvirt doesn't do this anymore, host-model
CPUs are turned into custom mode CPUs once updated with host CPU data
and thus there's no reason for keeping the hacks inside CPU XML
formatters.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-21 15:23:39 +02:00
Pino Toscano
cf4acafe8b qemu: reject parallel ports for pseries machines
They are simply not supported on that machine type.

Partially-resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1487499

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
2017-09-21 13:05:14 +02:00
Pino Toscano
02b1908de6 qemu: reject parallel ports for s390 archs
They are simply not supported on those architectures.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1487499

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
2017-09-21 13:05:14 +02:00
Pino Toscano
2c79a2b26c qemu: pass the virDomainDef to qemuDomainChrDefValidate
This will be used to improve the validation for this type of devices.

The former @def parameter is renamed to @dev, leaving @def for the
virDomainDef (following the style used elsewhere).

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
2017-09-21 13:05:14 +02:00
Michal Privoznik
8703813aae qemu: Implement usernet address
https://bugzilla.redhat.com/show_bug.cgi?id=1075520

Apart from generic checks, we need to constrain netmask/prefix
length a bit. Thing is, with current implementation QEMU needs to
be able to 'assign' some IP addresses to the virtual network. For
instance, the default gateway is at x.x.x.2, dns is at x.x.x.3,
the default DHCP range is x.x.x.15-x.x.x.30. Since we don't
expose these settings yet, it's safer to require shorter prefix
to have room for the defaults.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: laine@laine.org
2017-09-18 13:54:27 +02:00
Michal Privoznik
d1dbb30782 conf: Allow usernet to have an address
https://bugzilla.redhat.com/show_bug.cgi?id=1075520

Currently, all that users can specify for an interface type of
'user' is the common attributes: PCI address, NIC model (and
that's basically it). However, some need to configure other
address range than the default one.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: laine@laine.org
2017-09-18 13:54:27 +02:00
Peter Krempa
2350d10149 qemu: Remove support for legacy block jobs
Block job QMP commands with underscores rather than dashes were never
released in upstream qemu, (they were added, but modified in the same
release [1]), but a certain distro managed to backport the version in the
middle.

The change also slightly modified semantics for the abort command, which
made us have a lot of code which was only ever present in certain
downstream distros.

Clean the upstream code from the legacy cruft and support only the
upstream implementations.

[1] See qemu commit v1.0-2176-gdb58f9c060

Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-14 10:03:25 +02:00
John Ferlan
23706c1708 qemu: Clean up qemuDomainSecretPrepare
No need to pass a @driver parameter since all that's done is deref
the @cfg especially since the only caller can just pass an already
referenced @cfg.

Also, looks like commit id '0298531b' at one time had a different
name for the API, so I took the liberty of fixing the comments too
since I would already be updating them for the @cfg variable.
2017-09-13 06:22:52 -04:00
Nikolay Shirokovskiy
3f2d6d829e qemu: migration: don't expose incomplete job as complete
In case of real migration (not migrating to file on save, dump etc)
migration info is not complete at time qemu finishes migration
in normal (non postcopy) mode. We need to update disks stats,
downtime info etc. Thus let's not expose this job status as
completed.

To archive this let's set status to 'qemu completed' after
qemu reports migration is finished. It is not visible as complete
job to clients. Cookie code on confirm phase will finally turn
job into completed. As we don't need more things to do when
migrating to file status is set to 'completed' as before
in this case.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 12:52:36 +02:00
Nikolay Shirokovskiy
8c46658337 qemu: migrate: add mirror stats to migration stats
When getting job info in case mirror does not reach ready phase
fetch mirror stats from qemu. Otherwise mirror stats are already
saved in current job.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 11:18:10 +02:00
Nikolay Shirokovskiy
5a274d4fdc qemu: introduce migrating job status
Instead of checking stat.status let's set status to migrating
as soon as migrate command is send (waiting for completion
is a good place too).

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 11:15:43 +02:00
Nikolay Shirokovskiy
b6868c3cdd qemu: start all async job with job status active
Setting status to none has little value - getting job status
will not return even elapsed time.

After this patch getting job stats stays correct in a sence
it will not fetch migration stats because it consults
stats.status before doing the fetch.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 11:15:01 +02:00
Nikolay Shirokovskiy
09f57f9aac qemu: introduce QEMU_DOMAIN_JOB_STATUS_POSTCOPY
Let's introduce QEMU_DOMAIN_JOB_STATUS_POSTCOPY state for job.current->status
instead of checking job.current->stats.status. The latter can be changed
when fetching migration statistics. Moving state function from the variable
and leave only store function seems more managable.

This patch removes all state checking usage of stats except for
qemuDomainGetJobStatsInternal. This place will be handled separately.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 09:41:45 +02:00
Nikolay Shirokovskiy
751a1c7f0a qemu: introduce qemu domain job status
This patch simply switches code from using VIR_DOMAIN_JOB_* to
introduced QEMU_DOMAIN_JOB_STATUS_*. Later this gives us freedom
to introduce states for postcopy and mirroring phases.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 09:41:45 +02:00
Nikolay Shirokovskiy
16bf7619b8 qemu: drop code for VIR_DOMAIN_JOB_BOUNDED and timeRemaining
qemu driver does not have VIR_DOMAIN_JOB_BOUNDED jobs and
timeRemaining is always 0.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-09-07 09:41:45 +02:00
John Ferlan
d143837bd1 qemu: Remove unused params from qemuDomainDeviceDefValidate
Neither @cfg nor (now) @driver is used in the API, so remove them
and mark @opaque as UNUSED.

NB: Commit id 'fa3c558596' dropped the unused @qemuCaps which was the
last consumer of @driver other than @cfg, but even @cfg was never used
even in the original implementation from commit id 'd987f63a'.
2017-09-05 10:56:58 -04:00
Cole Robinson
dda0da14cd qemu: Default to video type=virtio for machvirt
arm/aarch64 -M virt on KVM doesn't and will never work with standard
VGA card emulation. The recommended method is to use type=virtio, so
let's make it the default for video devices without an explicit type
set by the user.

https://bugzilla.redhat.com/show_bug.cgi?id=1404112

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2017-09-05 10:41:32 -04:00
Cole Robinson
ef08a54538 qemu: Set default video type in qemu PostParse
And not generic domain_conf code. We will need qemu private functions
in a bit.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2017-09-05 10:41:32 -04:00
Cole Robinson
a2ca7ca52e conf: domain: add VIDEO_TYPE_DEFAULT
Will be needed for future patches to pull the default video type
setting out of XML parsing routines.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2017-09-05 10:41:32 -04:00
Nikolay Shirokovskiy
9820756cd3 qemu: handle -1 for pid in qemuDomainGetMachineName
We call qemuDomainGetMachineName on domain start. On first
start (after daemon start) pid is 0 and virSystemdGetMachineNameByPID
don't get called. But after domain shutting down pid became -1 so
on next start virSystemdGetMachineNameByPID is called and returned an error.
Error is ignored so it is not critical. But at least on my system
(systemd-219 with extra patches) systemd-machined is crashed on
this request.

This behaviour is triggered by eaf2c9f89.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-01 10:49:44 +02:00
Pavel Hrdina
be6a415e51 qemu: set bind mode for chardev while parsing XML
Currently while parsing domain XML we clear the UNIX path if it matches
one of the auto-generated paths by libvirt.  After that when the guest
is started new path is generated but the mode is also changed to "bind".

In the real-world use-case the mode should not change, it only happens
if a user provides a mode='connect' and path that matches one of the
auto-generated path or not provides a path at all.

Before *reconnect* feature was introduced there was no issue, but with
the new feature we need to make sure that it's used only with "connect"
mode, therefore we need to move the mode change into parsing in order
to have a proper error reported by validation code.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-08-30 17:47:56 +02:00
Martin Kletzander
ed8661a309 qemu: Also treat directories properly when using namespaces
When recreating folders with namespaces, the directory type was not
being handled at all.  It's not special, we probably just didn't know
that that can be used as a volume path as well.  The code failed
gracefully, but we want to allow that so that we can use <disk
type='dir'> in domains again.

Partially-resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1443434

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-08-29 16:30:04 +02:00
Michal Privoznik
9115dcd83e qemu: Introduce and use qemuDomainRemoveInactiveJob
At some places we either already have synchronous job or we just
released it. Also, some APIs might want to use this code without
having to release their job. Anyway, the job acquire code is
moved out to qemuDomainRemoveInactiveJob so that
qemuDomainRemoveInactive does just what it promises.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-08-29 11:18:34 +02:00
Martin Kletzander
f5ef291bdb qemu: Use short domain name in qemuDomainGetPreservedMountPath
Otherwise longer domain names might generate paths that are too long
to be created.  This follows what other parts of the code do as well.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1453194

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-08-29 11:17:16 +02:00
Pavel Hrdina
3ba6b532d1 qemu: implement chardev source reconnect
The reconnect attribute for chardev devices in QEMU is used to
configure the reconnect timeout in seconds.  Setting '0' value disables
the reconnect functionality thus we don't allow to set '0' for QEMU.
To disable the reconnect user should use <reconnect enabled='no'/>.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1254971

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-08-29 10:30:05 +02:00
Cole Robinson
5db046211f qemu: domain: Move some validation out of DeviceDefPostParse
And into DeviceDefValidate which is the expected place

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
2017-08-27 09:38:12 -04:00
Peter Krempa
7726d1581f qemu: Implement postParse callback skipping on config reload
Use the new facility which allows to ignore failures in post parse
callbacks if they are not fatal so that VM configs are not lost if the
emulator binary is missing.

If qemuCaps can't be populated on daemon restart skip certain portions
of the post parse callbacks during config reload and re-run the callback
during VM startup.

This fixes VMs vanishing if the emulator binary was broken or
uninstalled and libvirtd was restarted.
2017-08-18 15:07:44 +02:00
Peter Krempa
7808884808 qemu: domain: Don't set default USB model if qemuCaps is missing
qemuDomainControllerDefPostParse assigns the default USB controller
model when it was not specified by the user. Skip this step if @qemuCaps
is missing so that we don't fill wrong data. This will then be fixes by
re-running the post parse callback.
2017-08-18 15:07:44 +02:00
Peter Krempa
fde772cf82 qemu: domain: Don't return default NIC model if @qemuCaps are missing
Return NULL in qemuDomainDefaultNetModel if qemuCaps is missing and the
network card model would be determined by the capabilities.
2017-08-18 15:07:44 +02:00
Peter Krempa
18a8c36610 qemu: domain: Don't re-allocate qemuCaps in post parse callbacks
The domain post parse callback, domain address callback and the domain
device callback (for every single device) would each grab qemuCaps for
the current emulator. This is quite wasteful. Use the new callback to do
this just once.
2017-08-18 15:07:44 +02:00
Peter Krempa
03132bf487 qemu: Move assignment of default emulator to the basic post parse callback 2017-08-18 15:07:44 +02:00
Michal Privoznik
e0a4eaa913 qemuDomainObjPrivateFree: Free @machineName
We're storing the machine name in @priv but free it just in
qemuProcessStop, Therefore this may leak.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-08-07 10:44:06 +02:00
Michal Privoznik
2074ef6cd4 Add support for virtio-net.tx_queue_size
https://bugzilla.redhat.com/show_bug.cgi?id=1462653

Just like I've added support for setting rx_queue_size (in
c56cdf259 and friends), qemu just gained support for setting tx
ring size.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-08-02 15:37:09 +02:00
Pavel Hrdina
f366cfed62 qemu: pass only host arch instead of the whole virCaps
This is a preparation for following patches where we switch to
virFileCache for QEMU capabilities cache

The host arch will always remain the same but virCaps may change.  Now
the host arch is stored while creating new qemu capabilities cache.
It removes the need to pass virCaps into virQEMUCapsCache*() functions.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2017-07-26 15:35:24 +02:00
Pavel Hrdina
cc1329b627 qemu: we prefer C89 comment styles over C99
Introduced by commit 'a7bc2c8cfd6f'.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-07-25 23:10:00 +02:00
Scott Garfinkle
a7bc2c8cfd Generate unique socket file
It's possible to have more than one unnamed virtio-serial unix channel.
We need to generate a unique name for each channel. Currently, we use
".../unknown.sock" for all of them. Better practice would be to specify
an explicit target path name; however, in the absence of that, we need
uniqueness in the names we generate internally.

Before the changes we'd get /var/lib/libvirt/qemu/channel/target/unknown.sock
for each instance of
    <channel type='unix'>
        <source mode='bind'/>
        <target type='virtio'/>
    </channel>

Now, we get vioser-00-00-01.sock, vioser-00-00-02.sock, etc.

Signed-off-by: Scott Garfinkle <seg@us.ibm.com>
2017-07-25 22:38:35 +02:00
Martin Kletzander
eaf2c9f891 Move machineName generation from virsystemd into domain_conf
It is more related to a domain as we might use it even when there is
no systemd and it does not use any dbus/systemd functions.  In order
not to use code from conf/ in util/ pass machineName in cgroups code
as a parameter.  That also fixes a leak of machineName in the lxc
driver and cleans up and de-duplicates some code.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-25 17:02:27 +02:00
Martin Kletzander
2e6ecba1bc qemu: Save qemu driver in qemuDomainObjPrivateData
This way we can finally make it static and not use any externs anywhere.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-25 17:02:27 +02:00
Martin Kletzander
6e6faf6d62 conf: Pass config.priv to xmlopt->privateData.alloc
This will help us to get to some data more easily.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-25 17:02:27 +02:00
Andrea Bolognani
bbda2883c4 conf: Rename virDomainControllerIsPCIHostBridge() to IsPSeriesPHB()
The original name didn't hint at the fact that PHBs are
a pSeries-specific concept.

Suggested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
2017-07-25 09:42:38 +02:00
Shivaprasad G Bhat
e5a0579996 qemu: Enable NUMA node tag in pci-root for PPC64
This patch addresses the same aspects on PPC the bug 1103314 addressed
on x86.

PCI expander bus creates multiple primary PCI busses, where each of these
busses can be assigned a specific NUMA affinity, which, on x86 is
advertised through ACPI on a per-bus basis.

For SPAPR, a PHB's NUMA affinities are assigned on a per-PHB basis, and
there is no mechanism for advertising NUMA affinities to a guest on a
per-bus basis. So, even if qemu-ppc manages to get some sort of multi-bus
topology working using PXB, there is no way to expose the affinities
of these busses to the guest. It can only be exposed on a per-PHB/per-domain
basis.

So patch enables NUMA node tag in pci-root controller on PPC.

The way to set the NUMA node is through the numa_node option of
spapr-pci-host-bridge device. However for the implicit PHB, the only way
to set the numa_node is from the -global option. The -global option applies
to all the PHBs unless explicitly specified with the option on the
respective PHB of CLI. The default PHB has the emulated devices only, so
the patch prevents setting the NUMA node for the default PHB.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2017-07-21 15:46:29 +02:00
Peter Krempa
95d5601018 qemu: domain: Store and restore autoCpuset to status XML
Decouple them by storing them in the XML separately rather than
regenerating them. This will simplify upcoming fixes.
2017-07-20 16:14:50 +02:00
Peter Krempa
2dda319a9f qemu: domain: Extract parsing and formatting of priv->autoNodeset
Move the code to separate functions to avoid complicating the existing
ones with changes.
2017-07-20 16:14:50 +02:00
Shivaprasad G Bhat
210dd0c58d qemu: Take all PHBs into account while calculating memlock limits
Now that the multi-phb support series is in, work on the TODO at
qemuDomainGetMemLockLimitBytes() to arrive at the correct memlock limit
value.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2017-07-15 14:50:42 +02:00
Andrea Bolognani
591b42f39f qemu: Relax pci-root index requirement for pSeries guests
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), meaning the current check
on the controller index no longer applies to them.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
2017-07-15 14:50:42 +02:00
Andrea Bolognani
620c390c73 conf: Move index number checking to drivers
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), which of course means that
all but one of them will have a non-zero index; hence,
we'll need to relax the current check.

However, right now the check is performed in the conf
module, which is generic rather than tied to the QEMU
driver, and where we don't have information such as the
guest machine type available.

To make this change of behavior possible down the line,
we need to move the check from the XML parser to the
drivers. Luckily, only QEMU and bhyve are using PCI
controllers, so this doesn't result in much duplication.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
2017-07-15 14:50:42 +02:00
Jiri Denemark
ee68bb391e qemu: Don't update CPU when checking ABI stability
When checking ABI stability between two domain definitions, we first
make migratable copies of them. However, we also asked for the guest CPU
to be updated, even though the updated CPU is supposed to be already
included in the original definitions. Moreover, if we do this on the
destination host during migration, we're potentially updating the
definition with according to an incompatible host CPU.

While updating the CPU when checking ABI stability doesn't make any
sense, it actually just worked because updating the CPU doesn't do
anything for custom CPUs (only host-model CPUs are affected) and we
updated both definitions in the same way.

Less then a year ago commit v2.3.0-rc1~42 stopped updating the CPU in
the definition we got internally and only the user supplied definition
was updated. However, the same commit started updating host-model CPUs
to custom CPUs which are not affected by the request to update the CPU.
So it still seemed to work right, unless a user upgraded libvirt 2.2.0
to a newer version while there were some domains with host-model CPUs
running on the host. Such domains couldn't be migrated with a user
supplied XML since libvirt would complain:

    Target CPU mode custom does not match source host-model

The fix is pretty straightforward, we just need to stop updating the CPU
when checking ABI stability.

https://bugzilla.redhat.com/show_bug.cgi?id=1463957

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-07-13 09:53:15 +02:00
Michal Privoznik
c19d98d7c4 qemuDomainGetPreservedMountPath: rename @mount
Obviously, old gcc-s ale sad when a variable shares the name with
a function. And we do have such variable (added in 4d8a914be0):
@mount. Rename it to @mountpoint so that compiler's happy again.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-07-12 10:01:25 +02:00
Michal Privoznik
a4d9c31eac qemu: Provide non-linux stub for qemuDomainAttachDeviceMknodRecursive
The way we create devices under /dev is highly linux specific.
For instance we do mknod(), mount(), umount(), etc. Some
platforms are even missing some of these functions. Then again,
as declared in qemuDomainNamespaceAvailable(): namespaces are
linux only. Therefore, to avoid obfuscating the code by trying to
make it compile on weird platforms, just provide a non-linux stub
for qemuDomainAttachDeviceMknodRecursive(). At the same time,
qemuDomainAttachDeviceMknodHelper() which actually calls the
non-existent functions is moved under ifdef __linux__ block since
its only caller is in that block too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-07-12 08:44:57 +02:00
Peter Krempa
9506bd25a3 storage: Split out virStorageSource accessors to separate file
The helper methods for actually accessing the storage objects don't
really belong to the main storage driver implementation file. Split them
out.
2017-07-11 17:07:04 +02:00
Michal Privoznik
e93d844b90 qemu ns: Create chardev backends more frequently
Currently, the only type of chardev that we create the backend
for in the namespace is type='dev'. This is not enough, other
backends might have files under /dev too. For instance channels
might have a unix socket under /dev (well, bind mounted under
/dev from a different place).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
7976d1a514 qemuDomainAttachDeviceMknodRecursive: Support file mount points
https://bugzilla.redhat.com/show_bug.cgi?id=1462060

Just like in the previous commit, when attaching a file based
device which has its source living under /dev (that is not a
device rather than a regular file), calling mknod() is no help.
We need to:

1) bind mount device to some temporary location
2) enter the namespace
3) move the mount point to desired place
4) umount it in the parent namespace from the temporary location

At the same time, the check in qemuDomainNamespaceSetupDisk makes
no longer sense. Therefore remove it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
4f05f188de qemuDomainCreateDeviceRecursive: Support file mount points
https://bugzilla.redhat.com/show_bug.cgi?id=1462060

When building a qemu namespace we might be dealing with bare
regular files. Files that live under /dev. For instance
/dev/my_awesome_disk:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/dev/my_awesome_disk'/>
    <target dev='vdc' bus='virtio'/>
  </disk>

  # qemu-img create -f qcow2 /dev/my_awesome_disk 10M

So far we were mknod()-ing them which is
obviously wrong. We need to touch the file and bind mount it to
the original:

1) touch /var/run/libvirt/qemu/fedora.dev/my_awesome_disk
2) mount --bind /dev/my_awesome_disk /var/run/libvirt/qemu/fedora.dev/my_awesome_disk

Later, when the new /dev is built and replaces original /dev the
file is going to live at expected location.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
4fedbac620 qemuDomainAttachDeviceMknodHelper: Fail on unsupported file type
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
89921f54cd qemuDomainCreateDeviceRecursive: Fail on unsupported file type
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
4d8a914be0 qemu: Move preserved mount points path generation into a separate function
This function is going to be used on other places, so
instead of copying code we can just call the function.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Michal Privoznik
7154917908 qemuDomainBuildNamespace: Handle special file mount points
https://bugzilla.redhat.com/show_bug.cgi?id=1459592

In 290a00e41d I've tried to fix the process of building a
qemu namespace when dealing with file mount points. What I
haven't realized then is that we might be dealing not with just
regular files but also special files (like sockets). Indeed, try
the following:

1) socat unix-listen:/tmp/soket stdio
2) touch /dev/socket
3) mount --bind /tmp/socket /dev/socket
4) virsh start anyDomain

Problem with my previous approach is that I wasn't creating the
temporary location (where mount points under /dev are moved) for
anything but directories and regular files.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-07-11 14:45:15 +02:00
Peter Krempa
ccac446545 qemu: domain: Use vcpu 'node-id' property and pass it back to qemu
vcpu properties gathered from query-hotpluggable cpus need to be passed
back to qemu. As qemu did not use the node-id property until now and
libvirt forgot to pass it back properly (it was parsed but not passed
around) we did not honor this.

This patch adds node-id to the structures where it was missing and
passes it around as necessary.

The test data was generated with a VM with following config:
    <numa>
      <cell id='0' cpus='0,2,4,6' memory='512000' unit='KiB'/>
      <cell id='1' cpus='1,3,5,7' memory='512000' unit='KiB'/>
    </numa>

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1452053
2017-07-10 13:23:04 +02:00
Peter Krempa
0ca7f8b5f5 qemu: domain: Add missing newline to last element in status XML formatter
Commit f9758109a7 did not put a newline after the element it added.
2017-07-07 14:27:50 +02:00
Pavel Hrdina
f9758109a7 qemu: introduce chardevStdioLogd to qemu private data
In QEMU driver we can use virtlogd as stdio handler for source backend
of char devices if current QEMU is new enough and it's enabled in
qemu.conf.  We should store this information while starting a guest
because the config option may change while the guest is running.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-06-16 15:52:11 +02:00
Michal Privoznik
6451b55ec3 qemuDomainGetPreservedMounts: Fix suffixes for corner cases
https://bugzilla.redhat.com/show_bug.cgi?id=1431112

Imagine a FS mounted on /dev/blah/blah2. Our process of creating
suffix for temporary location where all the mounted filesystems
are moved is very simplistic. We want:

/var/run/libvirt/qemu/$domName.$suffix\

were $suffix is just the mount point path stripped of the "/dev/"
prefix. For instance:

/var/run/libvirt/qemu/fedora.mqueue  for /dev/mqueue
/var/run/libvirt/qemu/fedora.pts     for /dev/pts

and so on. Now if we plug /dev/blah/blah2 into the example we see
some misbehaviour:

/var/run/libvirt/qemu/fedora.blah/blah2

Well, misbehaviour if /dev/blah/blah2 is a file, because in that
case we call virFileTouch() instead of virFileMakePath().
The solution is to replace all the slashes in the suffix with say
dots. That way we don't have to care about nested directories.
IOW, the result we want for given example is:

/var/run/libvirt/qemu/fedora.blah.blah2

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-06-16 14:38:49 +02:00
Michal Privoznik
cdd9205dff qemuDomainGetPreservedMounts: Prune nested mount points
https://bugzilla.redhat.com/show_bug.cgi?id=1431112

There can be nested mount points. For instance /dev/shm/blah can
be a mount point and /dev/shm too. It doesn't make much sense to
return the former path because callers preserve the latter (and
with that the former too). Therefore prune nested mount points.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-06-16 14:38:23 +02:00
Michal Privoznik
6ab3e2f6c4 qemuDomainBuildNamespace: Clean up temp files
https://bugzilla.redhat.com/show_bug.cgi?id=1431112

After 290a00e41d we know how to deal with file mount points.
However, when cleaning up the temporary location for preserved
mount points we are still calling rmdir(). This won't fly for
files. We need to call unlink(). Now, since we don't really care
if the cleanup succeeded or not (it's the best effort anyway), we
can call both rmdir() and unlink() without need for
differentiation between files and directories.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2017-06-16 14:29:12 +02:00
Jiri Denemark
063b2b8788 qemu: Add qemuDomainCheckABIStability
When making ABI stability checks for an active domain, we need to make
sure we use the same migratable definition which virDomainGetXMLDesc
with the MIGRATABLE flag provides, otherwise the ABI check will fail.
This is implemented in the new qemuDomainCheckABIStability which takes a
domain object and generates the right migratable definition from it.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-14 17:08:16 +02:00
Jiri Denemark
a0912df3fa qemu: Add qemuDomainMigratableDefCheckABIStability
This patch separates the actual ABI checks from getting migratable defs
in qemuDomainDefCheckABIStability so that we can create another wrapper
which will use different methods to get the migratable defs.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-14 17:04:32 +02:00
Jiri Denemark
0810d4f5e0 qemu: Introduce qemuDomainDefFromXML helper
The main goal of this function is to enable reusing the parsing code
from qemuDomainDefCopy.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-14 17:04:12 +02:00
Marc Hartmayer
adf846d3c9 Use ATTRIBUTE_FALLTHROUGH
Use ATTRIBUTE_FALLTHROUGH, introduced by commit
5d84f5961b, instead of comments to
indicate that the fall through is an intentional behavior.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
2017-06-12 19:11:30 -04:00
Michal Privoznik
2a13a0a103 qemu: Query for vhostuser iface names at runtime
https://bugzilla.redhat.com/show_bug.cgi?id=1459091

Currently, we are querying for vhostuser interface name in post
parse callback. At that time interface might not yet exist.
However, it has to exist when starting domain. Therefore it makes
more sense to query its name at that point. This partially
reverts 57b5e27.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-06-08 15:02:22 +02:00
Jiri Denemark
8e34f47813 qemu: Use updated CPU when starting QEMU if possible
If QEMU is new enough and we have the live updated CPU definition in
either save or migration cookie, we can use it to enforce ABI. The
original guest CPU from domain XML will be stored in private data.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:02 +02:00
Jiri Denemark
8c19fbf452 qemu: Store updated CPU in save cookie
Since the domain XML saved in a snapshot or saved image uses the
original guest CPU definition but we still want to enforce ABI when
restoring the domain if libvirt and QEMU are new enough, we save the
live updated CPU definition in a save cookie.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:02 +02:00
Jiri Denemark
356a2161e2 qemu: Report the original CPU in migratable xml
The destination host may not be able to start a domain using the live
updated CPU definition because either libvirt or QEMU may not be new
enough. Thus we need to send the original guest CPU definition.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:02 +02:00
Jiri Denemark
ea6d898311 qemu: Remember CPU def from domain start
When starting a domain we update the guest CPU definition to match what
QEMU actually provided (since it is allowed to add or removed some
features unless check='full' is specified). Let's store the original CPU
in domain private data so that we can use it to provide a backward
compatible domain XML.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:02 +02:00
Jiri Denemark
215476b642 qemu: Implement virSaveCookie object and callbacks
This patch implements a new save cookie object and callbacks for qemu
driver. The actual useful content will be added in the object later.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:01 +02:00
Jiri Denemark
957cd268a9 conf: Pass xmlopt to virDomainSnapshotDefFormat
This will be used later when a save cookie will become part of the
snapshot XML using new driver specific parser/formatter functions.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2017-06-07 13:36:01 +02:00
Michal Privoznik
7b4e9b2c55 virQEMUDriverDomainABIStability: Check for memoryBacking
https://bugzilla.redhat.com/show_bug.cgi?id=1450349

Problem is, qemu fails to load guest memory image if these
attribute change on migration/restore from an image.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-06-05 09:18:34 +02:00
Michal Privoznik
4f0aeed871 virDomainXMLOption: Introduce virDomainABIStabilityDomain
While checking for ABI stability, drivers might pose additional
checks that are not valid for general case. For instance, qemu
driver might check some memory backing attributes because of how
qemu works. But those attributes may work well in other drivers.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-06-05 09:08:52 +02:00
Ján Tomko
381e638d81 qemu: format eim on intel-iommu command line
This option turns on extended interrupt mode,
which allows more than 255 vCPUs.

https://bugzilla.redhat.com/show_bug.cgi?id=1451282

Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2017-05-26 08:16:29 +02:00
Yi Wang
c679e8a41d qemu: Fix memory leak in qemuDomainUpdateMemoryDeviceInfo
The @meminfo allocated in qemuMonitorGetMemoryDeviceInfo() may be
lost when qemuDomainObjExitMonitor() failed.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-05-24 16:57:35 +02:00
Erik Skultety
3a2a2a7401 mdev: Pass a uuidstr rather than an mdev object to some util functions
Namely, this patch is about virMediatedDeviceGetIOMMUGroup{Dev,Num}
functions. There's no compelling reason why these functions should take
an object, on the contrary, having to create an object every time one
needs to query the IOMMU group number, discarding the object afterwards,
seems odd.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2017-05-18 12:20:15 +02:00
Andrea Bolognani
5645badd1f gic: Remove VIR_GIC_VERSION_DEFAULT
The QEMU default is GICv2, and some of the code in libvirt
relies on the exact value. Stop pretending that's not the
case and use GICv2 explicitly where needed.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
2017-05-16 16:48:30 +02:00
Andrea Bolognani
bc07101a7c qemu: Use GICv2 for aarch64/virt TCG guests
There are currently some limitations in the emulated GICv3
that make it unsuitable as a default. Use GICv2 instead.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1450433

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
2017-05-16 16:48:30 +02:00
Pavel Hrdina
ed99660446 qemu: improve detection of UNIX path generated by libvirt
Currently we consider all UNIX paths with specific prefix as generated
by libvirt, but that's a wrong assumption.  Let's make the detection
better by actually checking whether the whole path matches one of the
paths that we generate or generated in the past.

The UNIX path isn't stored in config XML since libvirt-1.3.1.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1446980

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-05-16 11:33:49 +02:00
Ján Tomko
6b5c6314b2 qemu: format kernel_irqchip on the command line
Add kernel_irqchip=split/on to the QEMU command line
and a capability that looks for it in query-command-line-options
output. For the 'split' option, use a version check
since it cannot be reasonably probed.

https://bugzilla.redhat.com/show_bug.cgi?id=1427005
2017-05-15 15:44:11 +02:00
Michal Privoznik
2f0b3b103b qemuDomainDetachDeviceUnlink: Don't unlink files we haven't created
Even though there are several checks before calling this function
and for some scenarios we don't call it at all (e.g. on disk hot
unplug), it may be possible to sneak in some weird files (e.g. if
domain would have RNG with /dev/shm/some_file as its backend). No
matter how improbable, we shouldn't unlink it as we would be
unlinking a file from the host which we haven't created in the
first place.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
2017-05-03 17:23:03 +02:00
Michal Privoznik
b3418f36be qemuDomainAttachDeviceMknodRecursive: Don't try to create devices under preserved mount points
Just like in previous commit, this fixes the same issue for
hotplug.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
2017-05-03 17:23:03 +02:00
Michal Privoznik
e30dbf35a1 qemuDomainCreateDeviceRecursive: Don't try to create devices under preserved mount points
While the code allows devices to already be there (by some
miracle), we shouldn't try to create devices that don't belong to
us. For instance, we shouldn't try to create /dev/shm/file
because /dev/shm is a mount point that is preserved. Therefore if
a file is created there from an outside (e.g. by mgmt application
or some other daemon running on the system like vhostmd), it
exists in the qemu namespace too as the mount point is the same.
It's only /dev and /dev only that is different. The same
reasoning applies to all other preserved mount points.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
2017-05-03 17:23:03 +02:00
Michal Privoznik
26c14be8d6 qemuDomainCreateDeviceRecursive: pass a structure instead of bare path
Currently, all we need to do in qemuDomainCreateDeviceRecursive() is to
take given @device, get all kinds of info on it (major & minor numbers,
owner, seclabels) and create its copy at a temporary location @path
(usually /var/run/libvirt/qemu/$domName.dev), if @device live under
/dev. This is, however, very loose condition, as it also means
/dev/shm/* is created too. Therefor, we will need to pass more arguments
into the function for better decision making (e.g. list of mount points
under /dev). Instead of adding more arguments to all the functions (not
easily reachable because some functions are callback with strictly
defined type), lets just turn this one 'const char *' into a 'struct *'.
New "arguments" can be then added at no cost.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
2017-05-03 17:23:03 +02:00
Michal Privoznik
a7cc039dc7 qemuDomainBuildNamespace: Move /dev/* mountpoints later
When setting up mount namespace for a qemu domain the following
steps are executed:

1) get list of mountpoints under /dev/
2) move them to /var/run/libvirt/qemu/$domName.ext
3) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev
4) move the mountpoint of the new device tree to /dev
5) restore original mountpoints from step 2)

Note the problem with this approach is that if some device in step
3) requires access to a mountpoint from step 2) it will fail as
the mountpoint is not there anymore. For instance consider the
following domain disk configuration:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/shm/vhostmd0'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>

In this case operation fails as we are unable to create vhostmd0
in the new device tree because after step 2) there is no /dev/shm
anymore. Leave aside fact that we shouldn't try to create devices
living in other mountpoints. That's a separate bug that will be
addressed later.

Currently, the order described above is rearranged to:

1) get list of mountpoints under /dev/
2) start constructing new device tree under /var/run/libvirt/qemu/$domName.dev
3) move them to /var/run/libvirt/qemu/$domName.ext
4) move the mountpoint of the new device tree to /dev
5) restore original mountpoints from step 3)

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cedric Bosdonnat <cbosdonnat@suse.com>
2017-05-03 17:23:03 +02:00
Pavel Hrdina
568887a32f qemu: use qemu-xhci USB controller by default for ppc64 and aarch64
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438682

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
2017-04-28 10:47:12 +02:00
Pavel Hrdina
278e70f8f8 qemu: add support for qemu-xhci USB controller
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438682

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
2017-04-28 10:44:36 +02:00
Pavel Hrdina
233f8d0bd4 qemu: use nec-usb-xhci as a default controller for aarch64 if available
This is a USB3 controller and it's a better choice than piix3-uhci.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
2017-04-28 10:42:26 +02:00
Pavel Hrdina
e69001b464 qemu: change the logic of setting default USB controller
The new logic will set the piix3-uhci if available regardless of
any architecture and it will be updated to better model based on
architecture and device existence.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
2017-04-28 10:41:53 +02:00
Jiri Denemark
df13c0b477 qemu: Add support for guest CPU cache
This patch maps /domain/cpu/cache element into -cpu parameters:

- <cache mode='passthrough'/> is translated to host-cache-info=on
- <cache level='3' mode='emulate'/> is transformed into l3-cache=on
- <cache mode='disable'/> is turned in host-cache-info=off,l3-cache=off

Any other <cache> element is forbidden.

The tricky part is detecting whether QEMU supports the CPU properties.

The 'host-cache-info' property is introduced in v2.4.0-1389-ge265e3e480,
earlier QEMU releases enabled host-cache-info by default and had no way
to disable it. If the property is present, it defaults to 'off' for any
QEMU until at least 2.9.0.

The 'l3-cache' property was introduced later by v2.7.0-200-g14c985cffa.
Earlier versions worked as if l3-cache=off was passed. For any QEMU
until at least 2.9.0 l3-cache is 'off' by default.

QEMU 2.9.0 was the first release which supports probing both properties
by running device-list-properties with typename=host-x86_64-cpu. Older
QEMU releases did not support device-list-properties command for CPU
devices. Thus we can't really rely on probing them and we can just use
query-cpu-model-expansion QMP command as a witness.

Because the cache property probing is only reliable for QEMU >= 2.9.0
when both are already supported for quite a few releases, we let QEMU
report an error if a specific cache mode is explicitly requested. The
other mode (or both if a user requested CPU cache to be disabled) is
explicitly turned off for QEMU >= 2.9.0 to avoid any surprises in case
the QEMU defaults change. Any older QEMU already turns them off so not
doing so explicitly does not make any harm.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-04-27 22:41:10 +02:00
Jiri Denemark
2a978269fc qemu: Report VIR_DOMAIN_JOB_OPERATION
Not all async jobs are visible via virDomainGetJobStats (either they are
too fast or getting the stats is not allowed during the job), but
forcing all of them to advertise the operation is easier than hunting
the jobs for which fetching statistics is allowed. And we won't need to
think about this when we add support for getting stats for more jobs.

https://bugzilla.redhat.com/show_bug.cgi?id=1441563

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-04-27 15:08:12 +02:00
Yuri Chornoivan
5efa7f2a4b Fix minor typos 2017-04-24 14:40:00 +02:00
Martin Kletzander
523c996062 conf, docs: Add support for coalesce setting(s)
We are currently parsing only rx/frames/max because that's the only
value that makes sense for us.  The tun device just added support for
this one and the others are only supported by hardware devices which
we don't need to worry about as the only way we'd pass those to the
domain is using <hostdev/> or <interface type='hostdev'/>.  And in
those cases the guest can modify the settings itself.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-04-21 13:34:41 +02:00
Pavel Hrdina
90acbc76ec qemu_domain: use correct default USB controller on ppc64
The history of USB controller for ppc64 guest is complex and goes
back to libvirt 1.3.1 where the fun started.

Prior Libvirt 1.3.1 if no model for USB controller was specified
we've simply passed "-usb" on QEMU command line.

Since Libvirt 1.3.1 there is a patch (8156493d8d) that fixes this
issue by using "-device pci-ohci,..." but it breaks migration with
older Libvirts which was agreed that's acceptable.  However this
patch didn't reflect this change in the domain XML and the model
was still missing.

Since Libvirt 2.2.0 there is a patch (f55eaccb0c) that fixes the
issue with not setting the USB model into domain XML which we need
to know about to not break the migration and since the default
model was *pci-ohci* it was used as default in this patch as well.

This patch tries to take all the previous changes into account and
also change the default for newly defined domains that don't specify
any model for USB controller.

The VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set only if new domain is
defined or new device is added into a domain which means that in
all other cases we will use the old *pci-ohci* model instead of the
better and not broken *nec-usb-xhci* model.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1373184

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-04-20 09:03:53 +02:00
Pavel Hrdina
ac97658d4f qemu: refactor qemuDomainMachine* functions
Introduce new wrapper functions without *Machine* in the function
name that take the whole virDomainDef structure as argument and
call the existing functions with *Machine* in the function name.

Change the arguments of existing functions to *machine* and *arch*
because they don't need the whole virDomainDef structure and they
could be used in places where we don't have virDomainDef.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-04-18 13:27:11 +02:00
Marc Hartmayer
b8cc509882 qemu: Turn qemuDomainLogContext into virObject
This way qemuDomainLogContextRef() and qemuDomainLogContextFree() is
no longer needed. The naming qemuDomainLogContextFree() was also
somewhat misleading. Additionally, it's easier to turn
qemuDomainLogContext in a self-locking object.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
2017-04-10 14:49:20 +02:00
Andrea Bolognani
396ca36cb0 qemu: Enforce ACPI, UEFI requirements
Depending on the architecture, requirements for ACPI and UEFI can
be different; more specifically, while on x86 UEFI requires ACPI,
on aarch64 it's the other way around.

Enforce these requirements when validating the domain, and make
the error message more accurate by mentioning that they're not
necessarily applicable to all architectures.

Several aarch64 test cases had to be tweaked because they would
have failed the validation step otherwise.
2017-04-03 10:58:00 +02:00
Michal Privoznik
462c4b66fa Introduce and use virDomainDiskEmptySource
Currently, if we want to zero out disk source (e,g, due to
startupPolicy when starting up a domain) we use
virDomainDiskSetSource(disk, NULL). This works well for file
based storage (storage type file, dir, or block). But it doesn't
work at all for other types like volume and network.

So imagine that you have a domain that has a CDROM configured
which source is a volume from an inactive pool. Because it is
startupPolicy='optional', the CDROM is empty when the domain
starts. However, the source element is not cleared out in the
status XML and thus when the daemon restarts and tries to
reconnect to the domain it refreshes the disks (which fails - the
storage pool is still not running) and thus the domain is killed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-04-03 08:35:57 +02:00
Peter Krempa
20ee78bf9b qemu: domain: Properly lookup top of chain in qemuDomainGetStorageSourceByDevstr
When idx is 0 virStorageFileChainLookup returns the base (bottom) of the
backing chain rather than the top. This is expected by the callers of
qemuDomainGetStorageSourceByDevstr.

Add a special case for idx == 0
2017-03-29 16:56:05 +02:00
Andrea Bolognani
7e667664d2 qemu: Fix memory locking limit calculation
For guests that use <memoryBacking><locked>, our only option
is to remove the memory locking limit altogether.

Partially-resolves: https://bugzilla.redhat.com/1431793
2017-03-28 10:54:49 +02:00
Andrea Bolognani
1f7661af8c qemu: Remove qemuDomainRequiresMemLock()
Instead of having a separate function, we can simply return
zero from the existing qemuDomainGetMemLockLimitBytes() to
signal the caller that the memory locking limit doesn't need
to be set for the guest.

Having a single function instead of two makes it less likely
that we will use the wrong value, which is exactly what
happened when we started applying the limit that was meant
for VFIO-using guests to <memoryBacking><locked>-using
guests.
2017-03-28 10:54:47 +02:00
Andrea Bolognani
4b67e7a377 Revert "qemu: Forbid <memoryBacking><locked> without <memtune><hard_limit>"
This reverts commit c2e60ad0e5.

Turns out this check is excessively strict: there are ways
other than <memtune><hard_limit> to raise the memory locking
limit for QEMU processes, one prominent example being
tweaking /etc/security/limits.conf.

Partially-resolves: https://bugzilla.redhat.com/1431793
2017-03-28 10:44:25 +02:00
Erik Skultety
c8e6775f30 qemu: Bump the memory locking limit for mdevs as well
Since mdevs are just another type of VFIO devices, we should increase
the memory locking limit the same way we do for VFIO PCI devices.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2017-03-27 15:39:35 +02:00
Erik Skultety
de4e8bdbc7 qemu: cgroup: Adjust cgroups' logic to allow mediated devices
As goes for all the other hostdev device types, grant the qemu process
access to /dev/vfio/<mediated_device_iommu_group>.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2017-03-27 15:39:35 +02:00
Erik Skultety
ec783d7c77 conf: Introduce new hostdev device type mdev
A mediated device will be identified by a UUID (with 'model' now being
a mandatory <hostdev> attribute to represent the mediated device API) of
the user pre-created mediated device. We also need to make sure that if
user explicitly provides a guest address for a mdev device, the address
type will be matching the device API supported on that specific mediated
device and error out with an incorrect XML message.

The resulting device XML:
<devices>
  <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
    <source>
      <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'>
    </source>
  </hostdev>
</devices>

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2017-03-27 15:39:35 +02:00
Peter Krempa
9b93c4c264 qemu: domain: Add helper to look up disk soruce by the backing store string 2017-03-27 10:18:16 +02:00
Peter Krempa
4e1618ce72 qemu: domain: Add helper to generate indexed backing store names
The code is currently simple, but if we later add node names, it will be
necessary to generate the names based on the node name. Add a helper so
that there's a central point to fix once we add self-generated node
names.
2017-03-27 09:29:57 +02:00
Peter Krempa
1a5e2a8098 qemu: domain: Add helper to lookup disk by node name
Looks up a disk and its corresponding backing chain element by node
name.
2017-03-27 09:29:57 +02:00
John Ferlan
1a6b6d9a56 qemu: Set up the migration TLS objects for target
If the migration flags indicate this migration will be using TLS,
then set up the destination during the prepare phase once the target
domain has been started to add the TLS objects to perform the migration.

This will create at least an "-object tls-creds-x509,endpoint=server,..."
for TLS credentials and potentially an "-object secret,..." to handle the
passphrase response to access the TLS credentials. The alias/id used for
the TLS objects will contain "libvirt_migrate".

Once the objects are created, the code will set the "tls-creds" and
"tls-hostname" migration parameters to signify usage of TLS.

During the Finish phase we'll be sure to attempt to clear the
migration parameters and delete those objects (whether or not they
were created). We'll also perform the same reset during recovery
if we've reached FINISH3.

If the migration isn't using TLS, then be sure to check if the
migration parameters exist and clear them if so.
2017-03-25 08:19:49 -04:00
Jiri Denemark
fcd56ce866 qemu: Set default values for CPU check attribute
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2017-03-17 11:50:48 +01:00
Michal Privoznik
7b89f857d9 qemu: Namespaces for NVDIMM
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-15 17:04:33 +01:00
Michal Privoznik
1bc173199e qemu: Implement NVDIMM
So, majority of the code is just ready as-is. Well, with one
slight change: differentiate between dimm and nvdimm in places
like device alias generation, generating the command line and so
on.

Speaking of the command line, we also need to append 'nvdimm=on'
to the '-machine' argument so that the nvdimm feature is
advertised in the ACPI tables properly.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-15 14:16:32 +01:00
Michal Privoznik
b4e8a49f8d Introduce NVDIMM memory model
NVDIMM is new type of memory introduced into QEMU 2.6. The idea
is that we have a Non-Volatile memory module that keeps the data
persistent across domain reboots.

At the domain XML level, we already have some representation of
'dimm' modules. Long story short, NVDIMM will utilize the
existing <memory/> element that lives under <devices/> by adding
a new attribute 'nvdimm' to the existing @model and introduce a
new <path/> element for <source/> while reusing other fields. The
resulting XML would appear as:

    <memory model='nvdimm'>
      <source>
        <path>/tmp/nvdimm</path>
      </source>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>

So far, this is just a XML parser/formatter extension. QEMU
driver implementation is in the next commit.

For more info on NVDIMM visit the following web page:

    http://pmem.io/

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-15 13:30:58 +01:00
Michal Privoznik
290a00e41d qemuDomainBuildNamespace: Handle file mount points
https://bugzilla.redhat.com/show_bug.cgi?id=1431112

Yeah, that's right. A mount point doesn't have to be a directory.
It can be a file too. However, the code that tries to preserve
mount points under /dev for new namespace for qemu does not count
with that option.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-13 13:32:45 +01:00
Michal Privoznik
e915942b05 qemuProcessHandleMonitorEOF: Disable namespace for domain
https://bugzilla.redhat.com/show_bug.cgi?id=1430634

If a qemu process has died, we get EOF on its monitor. At this
point, since qemu process was the only one running in the
namespace kernel has already cleaned the namespace up. Any
attempt of ours to enter it has to fail.

This really happened in the bug linked above. We've tried to
attach a disk to qemu and while we were in the monitor talking to
qemu it just died. Therefore our code tried to do some roll back
(e.g. deny the device in cgroups again, restore labels, etc.).
However, during the roll back (esp. when restoring labels) we
still thought that domain has a namespace. So we used secdriver's
transactions. This failed as there is no namespace to enter.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-10 16:02:34 +01:00
Pavel Hrdina
cd4a8b9304 conf: store "autoGenerated" for graphics listen in status XML
When libvirtd is started we call qemuDomainRecheckInternalPaths
to detect whether a domain has VNC socket path generated by libvirt
based on option from qemu.conf.  However if we are parsing status XML
for running domain the existing socket path can be generated also if
the config XML uses the new <listen type='socket'/> element without
specifying any socket.

The current code doesn't make difference how the socket was generated
and always marks it as "fromConfig".  We need to store the
"autoGenerated" value in the status XML in order to preserve that
information.

The difference between "fromConfig" and "autoGenerated" is important
for migration, because if the socket is based on "fromConfig" we don't
print it into the migratable XML and we assume that user has properly
configured qemu.conf on both hosts.  However if the socket is based
on "autoGenerated" it means that a new feature was used and therefore
we need to leave the socket in migratable XML to make sure that if
this feature is not supported on destination the migration will fail.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-03-09 10:22:43 +01:00
John Ferlan
b2e5de96c7 qemu: Rename variable
Rename 'secretUsageType' to 'usageType' since it's superfluous in an
API qemu*Secret*
2017-03-08 14:37:05 -05:00
John Ferlan
7c2b7891cc qemu: Introduce qemuDomainSecretInfoTLSNew
Building upon the qemuDomainSecretInfoNew, create a helper which will
build the secret used for TLS.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-03-08 14:31:09 -05:00
John Ferlan
c9a7b7b6ea qemu: Introduce qemuDomainSecretInfoNew
Create a helper which will create the secinfo used for disks, hostdevs,
and chardevs.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2017-03-08 14:31:07 -05:00
Pavel Hrdina
3ffea19acd qemu_domain: cleanup the controller post parse code
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-03-07 16:50:35 +01:00
Pavel Hrdina
57404ff7a7 qemu_domain: move controller post parse code into its own function
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2017-03-07 16:50:34 +01:00
Michal Privoznik
4da534c0b9 qemu: Enforce qemuSecurity wrappers
Now that we have some qemuSecurity wrappers over
virSecurityManager APIs, lets make sure everybody sticks with
them. We have them for a reason and calling virSecurityManager
API directly instead of wrapper may lead into accidentally
labelling a file on the host instead of namespace.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-06 08:54:28 +01:00
Marc Hartmayer
e22de286b1 qemu: Fix deadlock across fork() in QEMU driver
The functions in virCommand() after fork() must be careful with regard
to accessing any mutexes that may have been locked by other threads in
the parent process. It is possible that another thread in the parent
process holds the lock for the virQEMUDriver while fork() is called.
This leads to a deadlock in the child process when
'virQEMUDriverGetConfig(driver)' is called and therefore the handshake
never completes between the child and the parent process. Ultimately
the virDomainObjectPtr will never be unlocked.

It gets much worse if the other thread of the parent process, that
holds the lock for the virQEMUDriver, tries to lock the already locked
virDomainObject. This leads to a completely unresponsive libvirtd.

It's possible to reproduce this case with calling 'virsh start XXX'
and 'virsh managedsave XXX' in a tight loop for multiple domains.

This commit fixes the deadlock in the same way as it is described in
commit 61b52d2e38.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
2017-02-21 15:47:32 +01:00
Michal Privoznik
5c74cf1f44 qemu: Allow @rendernode for virgl domains
When enabling virgl, qemu opens /dev/dri/render*. So far, we are
not allowing that in devices CGroup nor creating the file in
domain's namespace and thus requiring users to set the paths in
qemu.conf. This, however, is suboptimal as it allows access to
ALL qemu processes even those which don't have virgl configured.
Now that we have a way to specify render node that qemu will use
we can be more cautious and enable just that.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-20 10:44:22 +01:00
Michal Privoznik
1bb787fdc9 qemuDomainGetHostdevPath: Report /dev/vfio/vfio less frequently
So far, qemuDomainGetHostdevPath has no knowledge of the reasong
it is called and thus reports /dev/vfio/vfio for every VFIO
backed device. This is suboptimal, as we want it to:

a) report /dev/vfio/vfio on every addition or domain startup
b) report /dev/vfio/vfio only on last VFIO device being unplugged

If a domain is being stopped then namespace and CGroup die with
it so no need to worry about that. I mean, even when a domain
that's exiting has more than one VFIO devices assigned to it,
this function does not clean /dev/vfio/vfio in CGroup nor in the
namespace. But that doesn't matter.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:59 +01:00
Michal Privoznik
b8e659aa98 qemuDomainGetHostdevPath: Create /dev/vfio/vfio iff needed
So far, we are allowing /dev/vfio/vfio in the devices cgroup
unconditionally (and creating it in the namespace too). Even if
domain has no hostdev assignment configured. This is potential
security hole. Therefore, when starting the domain (or
hotplugging a hostdev) create & allow /dev/vfio/vfio too (if
needed).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
9d92f533f8 qemuSetupHostdevCgroup: Use qemuDomainGetHostdevPath
Since these two functions are nearly identical (with
qemuSetupHostdevCgroup actually calling virCgroupAllowDevicePath)
we can have one function call the other and thus de-duplicate
some code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
b57bd206b9 qemu_conf: Check for namespaces availability more wisely
The bare fact that mnt namespace is available is not enough for
us to allow/enable qemu namespaces feature. There are other
requirements: we must copy all the ACL & SELinux labels otherwise
we might grant access that is administratively forbidden or vice
versa.
At the same time, the check for namespace prerequisites is moved
from domain startup time to qemu.conf parser as it doesn't make
much sense to allow users to start misconfigured libvirt just to
find out they can't start a single domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-15 12:43:23 +01:00
Andrea Bolognani
ee6ec7824d qemu: Call chmod() after mknod()
mknod() is affected my the current umask, so we're not
guaranteed the newly-created device node will have the
right permissions.

Call chmod(), which is not affected by the current umask,
immediately afterwards to solve the issue.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1421036
2017-02-14 19:23:05 +01:00
Ján Tomko
723fef99c0 qemu: enforce maximum ports value for nec-xhci
This controller only allows up to 15 ports.

https://bugzilla.redhat.com/show_bug.cgi?id=1375417
2017-02-13 16:34:09 +01:00
Michal Privoznik
c2130c0d47 qemu_security: Introduce ImageLabel APIs
Just like we need wrappers over other virSecurityManager APIs, we
need one for virSecurityManagerSetImageLabel and
virSecurityManagerRestoreImageLabel. Otherwise we might end up
relabelling device in wrong namespace.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-09 08:04:57 +01:00
Michal Privoznik
b7feabbfdc qemuDomainNamespaceSetupDisk: Simplify disk check
Firstly, instead of checking for next->path the
virStorageSourceIsEmpty() function should be used which also
takes disk type into account.
Secondly, not every disk source passed has the correct type set
(due to our laziness). Therefore, instead of checking for
virStorageSourceIsBlockLocal() and also S_ISBLK() the former can
be refined to just virStorageSourceIsLocalStorage().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-08 15:56:21 +01:00
Michal Privoznik
786d8d91b4 qemuDomainDiskChainElement{Prepare,Revoke}: manage /dev entry
Again, one missed bit. This time without this commit there is no
/dev entry  in the namespace of the qemu process when doing disk
snapshots or block-copy.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-08 15:56:13 +01:00
Michal Privoznik
18ce9d139d qemuDomainNamespace{Setup,Teardown}Disk: Don't pass pointer to full disk
These functions do not need to see the whole virDomainDiskDef.
Moreover, they are going to be called from places where we don't
have access to the full disk definition. Sticking with
virStorageSource is more than enough.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-08 15:56:05 +01:00
Michal Privoznik
76d491ef14 qemuDomainNamespaceSetupDisk: Drop useless @src variable
Since its introduction in 81df21507b this variable was never
used.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-08 15:55:56 +01:00
Michal Privoznik
8dc867e978 qemu_domain: Don't pass virDomainDeviceDefPtr to ns helpers
There is no need for this. None of the namespace helpers uses it.
Historically it was used when calling secdriver APIs, but we
don't to that anymore.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-08 15:55:52 +01:00
Andrea Bolognani
c2e60ad0e5 qemu: Forbid <memoryBacking><locked> without <memtune><hard_limit>
In order for memory locking to work, the hard limit on memory
locking (and usage) has to be set appropriately by the user.

The documentation mentions the requirement already: with this
patch, it's going to be enforced by runtime checks as well,
by forbidding a non-compliant guest from being defined as well
as edited and started.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1316774
2017-02-07 18:43:10 +01:00
Michal Privoznik
7f0b382522 qemuDomainAttachDeviceMknod: Don't loop endlessly
When working with symlinks it is fairly easy to get into a loop.
Don't.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-07 13:20:19 +01:00
Michal Privoznik
3f5fcacf89 qemuDomainAttachDeviceMknod: Deal with symlinks
Similarly to one of the previous commits, we need to deal
properly with symlinks in hotplug case too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-07 13:20:17 +01:00
Michal Privoznik
4ac847f93b qemuDomainCreateDevice: Don't loop endlessly
When working with symlinks it is fairly easy to get into a loop.
Don't.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-07 13:18:32 +01:00
Michal Privoznik
54ed672214 qemuDomainCreateDevice: Properly deal with symlinks
Imagine you have a disk with the following source set up:

/dev/disk/by-uuid/$uuid (symlink to) -> /dev/sda

After cbc45525cb the transitive end of the symlink chain is
created (/dev/sda), but we need to create any item in chain too.
Others might rely on that.
In this case, /dev/disk/by-uuid/$uuid comes from domain XML thus
it is this path that secdriver tries to relabel. Not the resolved
one.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-07 13:18:10 +01:00
Michal Privoznik
b621291f5c qemuDomain{Attach,Detach}Device NS helpers: Don't relabel devices
After previous commit this has become redundant step.
Also setting up devices in namespace and setting their label
later on are two different steps and should be not done at once.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-07 10:40:53 +01:00
Michal Privoznik
572eda12ad qemu: Implement mtu on interface
Not only we should set the MTU on the host end of the device but
also let qemu know what MTU did we set.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-26 10:00:01 +01:00
Michal Privoznik
b020cf73fe domain_conf: Introduce <mtu/> to <interface/>
So far we allow to set MTU for libvirt networks. However, not all
domain interfaces have to be plugged into a libvirt network and
even if they are, they might want to have a different MTU (e.g.
for testing purposes).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-26 09:59:56 +01:00
Chen Hanxiao
980f2a35c7 qemu_domain: add timestamp in tainting of guests log
We lacked of timestamp in tainting of guests log,
which bring troubles for finding guest issues:
such as whether a guest powerdown caused by qemu-monitor-command
or others issues inside guests.
If we had timestamp in tainting of guests log,
it would be helpful when checking guest's /var/log/messages.

Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
2017-01-21 12:34:19 -05:00
Michal Privoznik
57b5e27d3d qemu: set default vhost-user ifname
Based on work of Mehdi Abaakouk <sileht@sileht.net>.

When parsing vhost-user interface XML and no ifname is found we
can try to fill it in in post parse callback. The way this works
is we try to make up interface name from given socket path and
then ask openvswitch whether it knows the interface.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-20 15:42:12 +01:00
Michal Privoznik
d0baf54e53 qemu: Actually unshare() iff running as root
https://bugzilla.redhat.com/show_bug.cgi?id=1413922

While all the code that deals with qemu namespaces correctly
detects whether we are running as root (and turn into NO-OP for
qemu:///session) the actual unshare() call is not guarded with
such check. Therefore any attempt to start a domain under
qemu:///session shall fail as unshare() is reserved for root.

The fix consists of moving unshare() call (for which we have a
wrapper called virProcessSetupPrivateMountNS) into
qemuDomainBuildNamespace() where the proper check is performed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
2017-01-17 13:23:56 +01:00
Michal Privoznik
93a062c3b2 qemu: Copy SELinux labels for namespace too
When creating new /dev/* for qemu, we do chown() and copy ACLs to
create the exact copy from the original /dev. I though that
copying SELinux labels is not necessary as SELinux will chose the
sane defaults. Surprisingly, it does not leaving namespace with
the following labels:

crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0     random
crw-------. root root system_u:object_r:tmpfs_t:s0     rtc0
drwxrwxrwt. root root system_u:object_r:tmpfs_t:s0     shm
crw-rw-rw-. root root system_u:object_r:tmpfs_t:s0     urandom

As a result, domain is unable to start:

error: internal error: process exited while connecting to monitor: Error in GnuTLS initialization: Failed to acquire random data.
qemu-kvm: cannot initialize crypto: Unable to initialize GNUTLS library: Failed to acquire random data.

The solution is to copy the SELinux labels as well.

Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-13 14:45:52 +01:00
Michal Privoznik
cbc45525cb qemuDomainCreateDevice: Canonicalize paths
So far the decision whether /dev/* entry is created in the qemu
namespace is really simple: does the path starts with "/dev/"?
This can be easily fooled by providing path like the following
(for any considered device like disk, rng, chardev, ..):

  /dev/../var/lib/libvirt/images/disk.qcow2

Therefore, before making the decision the path should be
canonicalized.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:08:13 +01:00
Michal Privoznik
49f326edc0 qemu: Use namespaces iff available on the host kernel
So far the namespaces were turned on by default unconditionally.
For all non-Linux platforms we provided stub functions that just
ignored whatever namespaces setting there was in qemu.conf and
returned 0 to indicate success. Moreover, we didn't really check
if namespaces are available on the host kernel.

This is suboptimal as we might have ignored user setting.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:07:43 +01:00
Michal Privoznik
41816751a7 util: Introduce virFileMoveMount
This is a simple wrapper over mount(). However, not every system
out there is capable of moving a mount point. Therefore, instead
of having to deal with this fact in all the places of our code we
can have a simple wrapper and deal with this fact at just one
place.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 18:06:30 +01:00
Michal Privoznik
2ff8c30548 qemuDomainSetupAllInputs: Update debug message
Due to a copy-paste error, the debug message reads:

  Setting up disks

It should have been:

  Setting up inputs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-11 17:39:24 +01:00
Michal Privoznik
269589146c qemu_domain: Move qemuDomainGetPreservedMounts
This function is used only from code compiled on Linux. Therefore
on non-Linux platforms it triggers compilation error:

../../src/qemu/qemu_domain.c:209:1: error: unused function 'qemuDomainGetPreservedMounts' [-Werror,-Wunused-function]

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 19:23:49 +01:00
Michal Privoznik
406e390962 qemu: Drop qemuDomainDeleteNamespace
After previous commits, this function is no longer needed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d198c2b2c qemuDomainCreateNamespace: move mkdir to qemuDomainBuildNamespace
Again, there is no need to create /var/lib/libvirt/$domain.*
directories in CreateNamespace(). It is sufficient to create them
as soon as we need them which is in BuildNamespace. This way we
don't leave them around for the whole lifetime of domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
5d30057695 qemuDomainGetPreservedMounts: Do not special case /dev
The c1140eb9e got me thinking. We don't want to special case /dev
in qemuDomainGetPreservedMounts(), but in all other places in the
code we special case it anyway. I mean,
/var/run/libvirt/$domain.dev path is constructed separately just
so that it is not constructed here. It makes only a little sense
(if any at all).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Michal Privoznik
40ebbf72d5 qemuDomainCreateNamespace: s/unlink/rmdir/
If something goes wrong in this function we try a rollback. That
is unlink all the directories we created earlier. For some weird
reason unlink() was called instead of rmdir().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-10 13:04:57 +01:00
Martin Kletzander
c1140eb9ed qemu: Remove /dev mount info properly
Just so it doesn't bite us in the future, even though it's unlikely.

And fix the comment above it as well.  Commit e08ee7cd34 took the
info from the function it's calling, but that was lie itself in the
first place.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-01-05 16:24:55 +01:00
Michal Privoznik
e08ee7cd34 qemuDomainGetPreservedMounts: Fetch list of /dev/* mounts dynamically
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-05 16:00:20 +01:00
Michal Privoznik
dd78da09b0 qemuDomainCreateDevice: Be more careful about device path
Again, not something that I'd hit, but there is a chance in
theory that this might bite us. Currently the way we decide
whether or not to create /dev entry for a device is by marching
first four characters of path with "/dev". This might be not
enough. Just imagine somebody has a disk image stored under
"/devil/path/to/disk". We ought to be matching against "/dev/".

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
Michal Privoznik
ce01a2b11c qemuDomainAttachDeviceMknodHelper: Don't unlink() so often
Not that I'd encounter any bug here, but the code doesn't look
100% correct. Imagine, somebody is trying to attach a device to a
domain, and the device's /dev entry already exists in the qemu
namespace. This is handled gracefully and the control continues
with setting up ACLs and calling security manager to set up
labels. Now, if any of these steps fail, control jump on the
'cleanup' label and unlink() the file straight away. Even when it
was not us who created the file in the first place. This can be
possibly dangerous.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
Michal Privoznik
3aae99fe71 qemu: Handle EEXIST gracefully in qemuDomainCreateDevice
https://bugzilla.redhat.com/show_bug.cgi?id=1406837

Imagine you have a domain configured in such way that you are
assigning two PCI devices that fall into the same IOMMU group.
With mount namespace enabled what happens is that for the first
PCI device corresponding /dev/vfio/X entry is created and when
the code tries to do the same for the second mknod() fails as
/dev/vfio/X already exists:

2016-12-21 14:40:45.648+0000: 24681: error :
qemuProcessReportLogError:1792 : internal error: Process exited
prior to exec: libvirt: QEMU Driver error : Failed to make device
/var/run/libvirt/qemu/windoze.dev//vfio/22: File exists

Worse, by default there are some devices that are created in the
namespace regardless of domain configuration (e.g. /dev/null,
/dev/urandom, etc.). If one of them is set as backend for some
guest device (e.g. rng, chardev, etc.) it's the same story as
described above.

Weirdly, in attach code this is already handled.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-01-04 15:36:42 +01:00
John Ferlan
7f7d990483 qemu: Don't assume secret provided for LUKS encryption
https://bugzilla.redhat.com/show_bug.cgi?id=1405269

If a secret was not provided for what was determined to be a LUKS
encrypted disk (during virStorageFileGetMetadata processing when
called from qemuDomainDetermineDiskChain as a result of hotplug
attach qemuDomainAttachDeviceDiskLive), then do not attempt to
look it up (avoiding a libvirtd crash) and do not alter the format
to "luks" when adding the disk; otherwise, the device_add would
fail with a message such as:

   "unable to execute QEMU command 'device_add': Property 'scsi-hd.drive'
    can't find value 'drive-scsi0-0-0-0'"

because of assumptions that when the format=luks that libvirt would have
provided the secret to decrypt the volume.

Access to unlock the volume will thus be left to the application.
2017-01-03 12:59:18 -05:00
Marc Hartmayer
fb2cd32c9a qemu: qemuDomainDiskChangeSupported: Add missing 'address' check
Disk->info is not live updatable so add a check for this. Otherwise
libvirt reports success even though no data was updated.

Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
2016-12-20 11:22:44 +01:00
Michal Privoznik
ab41ce7f4e qemu: Mark more namespace code linux-only
Some of the functions are not called on non-linux platforms
which makes them useless there.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-16 11:51:06 +00:00
Michal Privoznik
f444faa94a qemu: Enable mount namespace
https://bugzilla.redhat.com/show_bug.cgi?id=1404952

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
661887f558 qemu: Let users opt-out from containerization
Given how intrusive previous patches are, it might happen that
there's a bug or imperfection. Lets give users a way out: if they
set 'namespaces' to an empty array in qemu.conf the feature is
suppressed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
f95c5c48d4 qemu: Manage /dev entry on RNG hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
f5fdf23a68 qemu: Manage /dev entry on chardev hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
6e57492839 qemu: Manage /dev entry on hostdev hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
81df21507b qemu: Manage /dev entry on disk hotplug
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
2160f338a7 qemu: Prepare RNGs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
8ec8a8c5ff qemu: Prepare inputs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
2c654490f3 qemu: Prepare TPM when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
4e4451019c qemu: Prepare chardevs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
73267cec46 qemu: Prepare hostdevs when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
054202d020 qemu: Prepare disks when starting a domain
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
bb4e529664 qemu: Spawn qemu under mount namespace
Prime time. When it comes to spawning qemu process and
relabelling all the devices it's going to touch, there's inherent
race with other applications in the system (e.g. udev). Instead
of trying convincing udev to not touch libvirt managed devices,
we can create a separate mount namespace for the qemu, and mount
our own /dev there. Of course this puts more work onto us as we
have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, this enhances security also.

From technical POV, on domain startup process the parent
(libvirtd) creates:

  /var/lib/libvirt/qemu/$domain.dev
  /var/lib/libvirt/qemu/$domain.devpts

The child (which is going to be qemu eventually) calls unshare()
to create new mount namespace. From now on anything that child
does is invisible to the parent. Child then mounts tmpfs on
$domain.dev (so that it still sees original /dev from the host)
and creates some devices (as explained in one of the previous
patches). The devices have to be created exactly as they are in
the host (including perms, seclabels, ACLs, ...). After that it
moves $domain.dev mount to /dev.

What's the $domain.devpts mount there for then you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML allowing users to connect to them.
Therefore we must preserve devpts mount to be shared with the
host's one.

To make this patch as small as possible, creating of devices
configured for domain in question is implemented in next patches.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Michal Privoznik
7ed6934f3b virDomainObjGetShortName: take virDomainDef
So far this function takes virDomainObjPtr which:
1) is an overkill,
2) might be not available in all the places we will use it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-08 15:45:52 +01:00
Laine Stump
9b0848d523 qemu: propagate virQEMUDriver object to qemuDomainDeviceCalculatePCIConnectFlags
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.

In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
2016-11-30 15:28:07 -05:00