250 Commits

Author SHA1 Message Date
Martin Kletzander
272649a1d7 qemu: Restore machinename even without cgroups
The virresctrl will use this as well and we need to have that info after restart
to properly clean up /sys/fs/resctrl.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2018-01-31 14:51:34 +01:00
Eduardo Habkost
9a22251bbe qemu_cgroup: Fix 'rc' argument on virDomainAuditCgroupPath() calls
All calls to virDomainAuditCgroupPath() were passing 'rc == 0' as
argument, when it was supposed to pass the 'rc' value directly.

As a consequence, the audit events that were supposed to be
logged (actual cgroup changes) were never being logged, and bogus
audit events were logged when using regular files as disk image.

Fix all calls to use the return value of
virCgroup{Allow,Deny}Device*() directly as the 'rc' argument.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2018-01-04 10:50:38 +01:00
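
The effect described in the commit above can be sketched in a few lines of C. This is only an illustration: `audit_cgroup_path()` is a hypothetical stand-in for virDomainAuditCgroupPath(), and the return convention (positive means "not a device, nothing to audit", zero means success, negative means failure) is an assumption inferred from the commit message, not copied from libvirt.

```c
/* Outcome of the hypothetical audit helper below. */
enum audit_result {
    AUDIT_SKIPPED,
    AUDIT_LOGGED_SUCCESS,
    AUDIT_LOGGED_FAILURE
};

/*
 * Hypothetical stand-in for virDomainAuditCgroupPath(). Assumed convention:
 *   rc > 0   path is not a device, so there is nothing to audit
 *   rc == 0  the cgroup change succeeded
 *   rc < 0   the cgroup change failed
 */
static enum audit_result audit_cgroup_path(int rc)
{
    if (rc > 0)
        return AUDIT_SKIPPED;
    return rc == 0 ? AUDIT_LOGGED_SUCCESS : AUDIT_LOGGED_FAILURE;
}
```

Under that assumption, a caller that passes `rc == 0` turns a successful change (rc of 0) into 1, which is skipped, and turns a regular file (rc of 1) into 0, which is logged as a success. Passing `rc` directly avoids both problems.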
Ján Tomko
f29612fd35 qemu: Introduce functions for input device cgroup manipulation
Export qemuSetupInputCgroup and introduce qemuTeardownInputCgroup
for hotunplug.
2017-11-24 17:38:51 +01:00
Peter Krempa
0a294a8e28 util: storagefile: Add helpers to check presence of backing store
Add helpers that will simplify checking if a backing file is valid or
whether it has backing store. The helper virStorageSourceIsBacking
returns true if the given virStorageSource is a valid backing store
member. virStorageSourceHasBacking returns true if the virStorageSource
has a backing store child.

Adding these functions creates a central point for further refactors.
2017-10-17 06:19:18 +02:00
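
A minimal sketch of what the two helpers from the commit above might look like. The struct, the enum and the function names are hypothetical simplifications, and the rule that a source with type "none" terminates the chain is an assumption made for illustration; the real virStorageSource has many more fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified storage types. */
enum storage_type {
    STORAGE_TYPE_NONE = 0,   /* assumed to mark the end of a backing chain */
    STORAGE_TYPE_FILE,
    STORAGE_TYPE_BLOCK
};

/* Hypothetical, simplified stand-in for virStorageSource. */
struct storage_source {
    enum storage_type type;
    struct storage_source *backing;   /* next image in the chain, or NULL */
};

/* True if src is a valid member of a backing chain. */
static bool storage_source_is_backing(const struct storage_source *src)
{
    return src != NULL && src->type != STORAGE_TYPE_NONE;
}

/* True if src is valid and has a valid backing store child. */
static bool storage_source_has_backing(const struct storage_source *src)
{
    return storage_source_is_backing(src) &&
           storage_source_is_backing(src->backing);
}
```

Callers that walk a chain can then test these predicates instead of repeating NULL and type checks at every site.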
Martin Kletzander
e1bafb0099 qemu_cgroup: Remove unnecessary virQEMUDriverPtr arguments
Since commit 2e6ecba1bc, the pointer to the qemu driver is saved in
domain object's private data and hence does not have to be passed as
yet another parameter if domain object is already one of them.

This is a first (example) patch of this kind of clean up, others will
hopefully follow.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-26 17:47:25 +02:00
Martin Kletzander
eaf2c9f891 Move machineName generation from virsystemd into domain_conf
It is more related to a domain as we might use it even when there is
no systemd and it does not use any dbus/systemd functions.  In order
not to use code from conf/ in util/ pass machineName in cgroups code
as a parameter.  That also fixes a leak of machineName in the lxc
driver and cleans up and de-duplicates some code.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-25 17:02:27 +02:00
Michal Privoznik
6e95abb446 qemu: Allow nvdimm in devices CGroups
Some users might want to pass a blockdev or a chardev as a
backend for NVDIMM. In fact, this is expected to be the most
commonly used configuration. Therefore libvirt should allow the
device in the devices CGroup.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-03-15 16:55:30 +01:00
Michal Privoznik
3cddd63aec qemu_cgroup: Only try to allow devices if devices CGroup's available
When a domain needs access to some device (be it a disk, RNG,
chardev, whatever), we have to allow it in the devices CGroup (if
it is available), because by default we disallow all devices.
But some of the functions responsible for setting up the devices
CGroup lack a check for whether any CGroup is available. Thus
users might be unable to hotplug some devices:

  virsh # attach-device fedora rng.xml
  error: Failed to attach device from rng.xml
  error: internal error: Controller 'devices' is not mounted

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-23 11:21:26 +01:00
Michal Privoznik
5c74cf1f44 qemu: Allow @rendernode for virgl domains
When enabling virgl, qemu opens /dev/dri/render*. So far, we are
not allowing that in devices CGroup nor creating the file in
domain's namespace and thus requiring users to set the paths in
qemu.conf. This, however, is suboptimal as it allows access to
ALL qemu processes even those which don't have virgl configured.
Now that we have a way to specify render node that qemu will use
we can be more cautious and enable just that.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-02-20 10:44:22 +01:00
Michal Privoznik
1bb787fdc9 qemuDomainGetHostdevPath: Report /dev/vfio/vfio less frequently
So far, qemuDomainGetHostdevPath has no knowledge of the reason
it is called and thus reports /dev/vfio/vfio for every VFIO
backed device. This is suboptimal, as we want it to:

a) report /dev/vfio/vfio on every addition or domain startup
b) report /dev/vfio/vfio only on last VFIO device being unplugged

If a domain is being stopped then namespace and CGroup die with
it so no need to worry about that. I mean, even when a domain
that's exiting has more than one VFIO device assigned to it,
this function does not clean /dev/vfio/vfio in CGroup nor in the
namespace. But that doesn't matter.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:59 +01:00
Michal Privoznik
b8e659aa98 qemuDomainGetHostdevPath: Create /dev/vfio/vfio iff needed
So far, we are allowing /dev/vfio/vfio in the devices cgroup
unconditionally (and creating it in the namespace too), even if
the domain has no hostdev assignment configured. This is a
potential security hole. Therefore, when starting the domain (or
hotplugging a hostdev) create & allow /dev/vfio/vfio too (if
needed).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
9d92f533f8 qemuSetupHostdevCgroup: Use qemuDomainGetHostdevPath
Since these two functions are nearly identical (with
qemuSetupHostdevCgroup actually calling virCgroupAllowDevicePath)
we can have one function call the other and thus de-duplicate
some code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
60ddceff8f qemu_cgroup: Kill qemuSetupHostSCSIVHostDeviceCgroup
There's no need for this function. Currently it is passed as a
callback to virSCSIVHostDeviceFileIterate(). However, SCSI host
devices have just one file path. Therefore we can mimic the approach
used in qemuDomainGetHostdevPath() to get path and call
virCgroupAllowDevicePath() directly.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
7bb01ed3cd qemu_cgroup: Kill qemuSetupHostSCSIDeviceCgroup
There's no need for this function. Currently it is passed as a
callback to virSCSIDeviceFileIterate(). However, SCSI devices
have just one file path. Therefore we can mimic the approach used in
qemuDomainGetHostdevPath() to get path and call
virCgroupAllowDevicePath() directly.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
4d7d1c4bc3 qemu_cgroup: Kill qemuSetupHostUSBDeviceCgroup
There's no need for this function. Currently it is passed as a
callback to virUSBDeviceFileIterate(). However, USB devices have
just one file path. Therefore we can mimic the approach used in
qemuDomainGetHostdevPath() to get path and call
virCgroupAllowDevicePath() directly.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-02-20 07:21:58 +01:00
Michal Privoznik
a5896e8ca4 qemu_cgroup: Expose defaultDeviceACL
This is a list of devices that qemu needs for its run (apart from
what's configured for domain). The devices on the list are
enabled in the CGroups by default so they will be good candidates
for initial /dev for new qemu.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-15 09:25:16 +01:00
Boris Fiuczynski
b178fa8ecb qemu: fix internal error: NUMA isn't available on this host
If libvirt is compiled without NUMACTL support, starting libvirtd
reports a libvirt internal error "NUMA isn't available on this host"
without checking whether NUMA support is compiled into the libvirt binaries.
This patch adds the missing NUMA support check to prevent the internal error.
It also includes a check if the cgroup controller cpuset is available before
using it.

The error was noticed when libvirtd was restarted with running domains and
on libvirtd start the qemuConnectCgroup gets called during qemuProcessReconnect.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
2016-11-25 09:48:41 +01:00
Eric Farman
9cc26dc622 qemu: Add vhost-scsi string for -device parameter
Open /dev/vhost-scsi, and record the resulting file descriptor, so that
the guest has access to the host device outside of the libvirt daemon.
Pass this information, along with data parsed from the XML file, to build
a device string for the qemu command line.  That device string will be
for either a vhost-scsi-ccw device in the case of an s390 machine, or
vhost-scsi-pci for any others.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
2016-11-24 12:16:19 -05:00
Eric Farman
fc0e627bac Introduce framework for a hostdev SCSI_host subsystem type
We already have a "scsi" hostdev subsys type, which refers to a single
LUN that is passed through to a guest.  But what of things where
multiple LUNs are passed through via a single SCSI HBA, such as with
the vhost-scsi target?  Create a new hostdev subsys type that will
carry this.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
2016-11-24 12:15:26 -05:00
Michal Privoznik
5d9c2c7081 qemu: Update cgroup on chardev hotplug
Just like in the previous commit, we are not updating CGroups on
chardev hot(un-)plug and thus leaving qemu unable to access any
non-default device users are trying to hotplug.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-11-23 16:38:02 +01:00
Michal Privoznik
085692c8bb qemu: Update cgroup on RNG hotplug
If users try to hotplug RNG device with a backend different to
/dev/random or /dev/urandom the whole operation fails as qemu is
unable to access the device. The problem is we don't update
device CGroups during the operation.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-11-23 16:37:57 +01:00
Eric Farman
85b0721095 Cleanup switch statements on the hostdev subsystem type
As was suggested in an earlier review comment[1], we can
catch some additional code points by cleaning up how we use the
hostdev subsystem type in some switch statements.

[1] End of https://www.redhat.com/archives/libvir-list/2016-September/msg00399.html

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-11-11 16:58:56 -05:00
John Ferlan
77a12987a4 Introduce virDomainChrSourceDefNew for virDomainChrDefPtr
Change the virDomainChrDef to use a pointer to 'source' and allocate
that pointer during virDomainChrDefNew.

This has tremendous "fallout" in the rest of the code which mainly
has to change source.$field to source->$field.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-10-21 14:03:36 -04:00
Peter Krempa
77cb01bc0f numa: Rename virNumaGetHostNodeset and make it return only nodes with memory
Name it virNumaGetHostMemoryNodeset and return only NUMA nodes which
have memory installed. This is necessary as the kernel is not very happy
to set the memory cgroup setting for nodes which do not have any memory.

This would break vcpu hotplug with the following message on such a
configuration:

  Invalid value '0,8' for 'cpuset.mems': Invalid argument

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1375268
2016-09-14 08:41:41 +02:00
Peter Krempa
f428ff8ad4 qemu: Add missing 'p' to qemuCgrouEmulatorAllNodesRestore
2016-09-13 12:24:02 +02:00
Peter Krempa
eb5dee3534 qemu: cgroup: Extract temporary relaxing of cgroup setting for vcpu hotplug
When hot-adding vcpus qemu needs to allocate some structures in the DMA
zone which may be outside of the numa pinning. Extract the code doing
this in a set of helpers so that it can be reused.
2016-09-07 16:05:01 +02:00
Peter Krempa
c7d5dd3974 conf: Rename virDomainVcpuInfoPtr to virDomainVcpuDefPtr
2016-07-11 09:06:09 +02:00
Ján Tomko
d033d4762f Revert "qemu_cgroup: allow access to /dev/dri for virtio-vga"
This reverts commit 3943bdd60c.
2016-05-23 10:48:27 +02:00
Ján Tomko
3943bdd60c qemu_cgroup: allow access to /dev/dri for virtio-vga
QEMU needs access to the /dev/dri/render* device for
virgl to work.

Allow access to all /dev/dri/* devices for domains with
<video>
  <model type='virtio' heads='1' primary='yes'>
    <acceleration accel3d='yes'/>
  </model>
</video>

https://bugzilla.redhat.com/show_bug.cgi?id=1337290
2016-05-19 10:52:50 +02:00
Martin Kletzander
16b41728b5 qemu: Free priv->machineName
Commit c3bd0019c0 forgot to cleanup after itself.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1325043

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2016-04-11 11:46:09 +02:00
Alexander Burluka
ef1fa55e46 Implement qemuSetupGlobalCpuCgroup
This function sets up per-domain CPU bandwidth parameters.

Signed-off-by: Alexander Burluka <aburluka@virtuozzo.com>
2016-03-01 14:30:11 +00:00
Peter Krempa
a06ef20782 qemu: process: Move emulator thread setting code into one function
Similarly to the refactors to iothreads and vcpus, move the code that
initializes the emulator thread settings into a single function.
2016-03-01 14:07:27 +00:00
Bjoern Walk
65c4c7d850 qemu: cgroup: fix cgroup permission logic
Fix logic error introduced in commit d6c91b3c which essentially broke
starting any domain.

Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
2016-02-18 10:32:46 +01:00
Peter Krempa
d1242ba24a qemu: cgroup: Setup cgroups for bios/firmware images
oVirt wants to use OVMF images on top of lvm for their 'logical'
storage thus we should set up device ACLs for them so it will actually
work.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1305922
2016-02-17 12:29:00 +01:00
Peter Krempa
d6c91b3c03 qemu: cgroup: Extract guts of qemuSetupImageCgroupInternal
They will later be reused for setting cgroup for other image backed
devices.
2016-02-17 10:54:05 +01:00
Peter Krempa
2b15f2a196 qemu: cgroup: Split up qemuSetImageCgroupInternal
Separate the Teardown and Setup code paths into separate helpers.
2016-02-17 10:54:05 +01:00
Peter Krempa
5dd610d01d qemu: cgroup: Switch to qemu(Setup|Teardown)ImageCgroup
For other objects we use the two functions rather than one with a bool.
Convert qemuSetImageCgroup to the same approach.
2016-02-17 10:54:05 +01:00
Peter Krempa
4e22355ee1 qemu: cgroup: Avoid reporting errors from inaccessible NFS volumes
Rather than reporting it and then resetting the error, don't report it in
the first place.
2016-02-17 10:54:05 +01:00
Peter Krempa
cf113e8d54 util: cgroup: Allow ignoring EACCES in virCgroup(Allow|Deny)DevicePath
When adding disk images to the ACL we may call those functions on NFS
shares. In that case we might get an EACCES, which isn't really relevant
since NFS would not hold a block device. This patch adds a flag that
suppresses the error report on EACCES to avoid spamming logs.

Currently there's no functional change.
2016-02-17 10:54:05 +01:00
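
The flag described in the commit above can be illustrated with a small POSIX sketch. `check_device_path()` is a hypothetical helper, not the libvirt function, and its return convention (0 for a device or an ignored EACCES, 1 for a path that is not a device, -1 for a reported error) is an assumption chosen for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not the libvirt function. Assumed return convention:
 *    0  path is a block or character device, or EACCES was ignored
 *    1  path exists but is not a device, so there is nothing to allow
 *   -1  error, reported on stderr
 */
static int check_device_path(const char *path, bool ignore_eacces)
{
    struct stat sb;

    if (stat(path, &sb) < 0) {
        /* Stay quiet when the caller expects inaccessible paths. */
        if (errno == EACCES && ignore_eacces)
            return 0;
        fprintf(stderr, "cannot stat '%s': %s\n", path, strerror(errno));
        return -1;
    }

    if (!S_ISCHR(sb.st_mode) && !S_ISBLK(sb.st_mode))
        return 1;

    return 0;
}
```

Only EACCES is silenced; every other stat() failure, such as a missing path, is still reported, so the flag does not hide unrelated errors.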
Peter Krempa
9cd5da710e util: cgroup: Drop virCgroup(Allow|Deny)DeviceMajor
Since commit 47e5b5ae virCgroupAllowDevice allows passing -1 as either
the minor or major device number and automatically uses '*' in place
of it. Reuse the new approach throughout the code and drop the duplicated
functions.
2016-02-17 10:54:05 +01:00
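
The wildcard behaviour mentioned in the commit above can be sketched as follows. `format_device_rule()` is a hypothetical helper written for this illustration; it only builds a rule string in the "type major:minor access" form used by the cgroup v1 devices controller and does not write to any cgroup file.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a devices-controller rule such as
 * "c 136:* rw". A negative major or minor is rendered as the '*' wildcard.
 * Returns the value of the final snprintf() call.
 */
static int format_device_rule(char *out, size_t outlen, char type,
                              int major, int minor, const char *perms)
{
    char majorstr[16];
    char minorstr[16];

    if (major < 0)
        snprintf(majorstr, sizeof(majorstr), "*");
    else
        snprintf(majorstr, sizeof(majorstr), "%d", major);

    if (minor < 0)
        snprintf(minorstr, sizeof(minorstr), "*");
    else
        snprintf(minorstr, sizeof(minorstr), "%d", minor);

    return snprintf(out, outlen, "%c %s:%s %s",
                    type, majorstr, minorstr, perms);
}
```

With a single function that accepts -1, separate "allow by major only" variants become redundant, which is the duplication the commit removes.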
Peter Krempa
21212fca13 qemu: cgroup: Remove abandoned function qemuAddToCgroup
This function doesn't do anything useful since 2049ef9942.
2016-02-17 10:28:34 +01:00
Peter Krempa
1dcc4c7ffd qemu: iothread: Aggregate code to set IOThread tuning
Rather than iterating 3 times for various settings this function
aggregates all the code into a single place. One of the other advantages
is that it can then be reused for properly setting IOThread info on
hotplug.
2016-02-08 17:05:00 +01:00
Peter Krempa
56971667ee qemu: vcpu: Aggregate code to set vCPU tuning
Rather than iterating 3 times for various settings this function
aggregates all the code into a single place. One of the other advantages
is that it can then be reused for properly setting vCPU info on hotplug.

With this approach autoCpuset is also used when setting the process
affinity rather than just via cgroups.
2016-02-08 17:05:00 +01:00
Peter Krempa
d2a6fc79e3 conf: Store cpu pinning data in def->vcpus
Now with the new struct the data can be stored in a much saner place.
2016-02-08 09:51:34 +01:00
Martin Kletzander
c3bd0019c0 systemd: Modernize machine naming
So, systemd-machined has this philosophy that machine names are like
hostnames and hence should follow the same rules.  But we always allowed
international characters in domain names.  Thus we need to modify the
machine name we are passing to systemd.

In order to change some machine names that we will be passing to systemd,
we also need to call TerminateMachine at the end of a lifetime of a
domain.  Even for domains that were started with older libvirt.  That
can be achieved thanks to virSystemdGetMachineNameByPID().  And because
we can change machine names, we can get rid of the inconsistent and
pointless escaping of domain names when creating machine names.

So this patch modifies the naming in the following way.  It creates the
name as <drivername>-<id>-<name> where invalid hostname characters are
stripped out of the name and if the resulting name is longer, it
truncates it to 64 characters.  That way we can start domains we
couldn't start before.  Well, at least on systemd.

To make it work all together, the machineName (which is needed only with
systemd) is saved in domain's private data.  That way the generation is
moved to the driver and we don't need to pass various unnecessary
arguments to cgroup functions.

The only thing this complicates a bit is the scope generation when
validating a cgroup where we must check both old and new naming, so a
slight modification was needed there.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1282846

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2016-02-05 16:11:50 +01:00
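
The naming scheme from the commit above can be sketched like this. `make_machine_name()` is a hypothetical helper, and the set of characters treated as valid in a hostname (ASCII letters, digits and '-') is an assumption; the commit message does not list the exact set.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define MACHINE_NAME_MAX 64   /* length limit quoted in the commit message */

/*
 * Hypothetical helper: writes "<driver>-<id>-<name>" into out, keeping only
 * ASCII letters, digits and '-' from name (assumed valid hostname
 * characters) and truncating the result to MACHINE_NAME_MAX characters.
 * out must hold at least MACHINE_NAME_MAX + 1 bytes.
 * Returns the length of the generated name.
 */
static size_t make_machine_name(char *out, const char *driver,
                                int id, const char *name)
{
    int n = snprintf(out, MACHINE_NAME_MAX + 1, "%s-%d-", driver, id);
    size_t len;

    if (n < 0) {
        out[0] = '\0';
        return 0;
    }
    /* snprintf() reports the untruncated length, so clamp it. */
    len = (size_t)n > MACHINE_NAME_MAX ? MACHINE_NAME_MAX : (size_t)n;

    for (; *name != '\0' && len < MACHINE_NAME_MAX; name++) {
        unsigned char c = (unsigned char)*name;

        /* Bytes of multi-byte UTF-8 sequences are >= 128 and are dropped. */
        if (c < 128 && (isalnum(c) || c == '-'))
            out[len++] = (char)c;
    }
    out[len] = '\0';
    return len;
}
```

Because the numeric domain id is part of the name, two domains whose names differ only in stripped characters would still get distinct machine names in this sketch.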
John Ferlan
d6d7e2885b cgroup: Fix possible bug as a result of code motion for vcpu cgroup setup
Commit id '90b721e43' moved the virCgroupAddTask call to after the
vcpupin checks. However, in doing so it missed the case where the
cpumap didn't exist and the code would continue back to the top of
the current vcpu loop. The result was that virCgroupAddTask wouldn't
be called.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2016-01-14 11:02:53 -05:00
John Ferlan
d41bd09596 Revert "util: cgroups do not implicitly add task to new machine cgroup"
This reverts commit 71ce475967.

Since commit id 'a41c00b47' has been reverted, this is no longer
necessary.
2016-01-14 11:00:25 -05:00
John Ferlan
f8f6907284 Revert "qemu: do not put a task into machine cgroup"
This reverts commit a41c00b472.

After much testing and upstream discussion this has been deemed to be
the incorrect operation since it means we no longer have any guarantee
about which resource controllers the QEMU processes in general are in.
2016-01-14 10:56:53 -05:00
Henning Schild
90b721e43e qemu cgroups: move new threads to new cgroup after cpuset is set up
Moving tasks to cgroups implies sched_setaffinity. Changing the cpus in
a set implies the same for all tasks in the group.
The old code put the thread into the cpuset inherited from the
machine cgroup, which allowed it to run outside of vcpupin for a short
while.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:58:05 -05:00
Henning Schild
a41c00b472 qemu: do not put a task into machine cgroup
The machine cgroup is a superset, a parent to the emulator and vcpuX
cgroups. The parent cgroup should never have any tasks directly in it.
In fact the parent cpuset might contain way more cpus than the sum of
emulatorpin and vcpupins. So putting tasks in the superset will allow
them to run outside of <cputune>.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:48:05 -05:00