Currently, the Linux kernel treats values of '0' and '1' as
the minimum of 2. Values larger than the maximum are changed
to the maximum.
Re-reading the shares value after setting it reflects this in
the live domain XML.
Currently, <cputune><shares>0</shares></cputune> is treated
as if it were not specified.
Treat is as a valid value if it was explicitly specified
and write it to the cgroups.
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To support passing the path of the test data to the utils, one
more argument is added to virSCSIDeviceGetSgName,
virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related
code is changed accordingly.
Later tests for the scsi utils will be based on this patch.
Signed-off-by: Osier Yang <jyang@redhat.com>
Creating a qemu VM with /dev/hwrng as backend RNG device throws the
following error - "Could not open '/dev/hwrng': Permission denied"
This patch fixes the issue
Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Unlike the host devices of other types, SCSI host device XML supports
"shareable" tag. This patch introduces it for the virSCSIDevice struct
for a later patch use (to detect if the SCSI device is shareable when
preparing the SCSI host device in QEMU driver).
This patch introduces virCgroupSetBlkioDeviceReadIops,
virCgroupSetBlkioDeviceWriteIops,
virCgroupSetBlkioDeviceReadBps and
virCgroupSetBlkioDeviceWriteBps,
we can use these interfaces to set up throttle
blkio cgroup for domain.
This patch also adds the new throttle blkio cgroup
elements to the test xml.
Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Most of our code base uses space after comma but not before;
fix the remaining uses before adding a syntax check.
* src/qemu/qemu_cgroup.c: Consistently use commas.
* src/qemu/qemu_command.c: Likewise.
* src/qemu/qemu_conf.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/qemu/qemu_monitor.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
On my machine, a guest fails to boot if it has a sound card, but not
graphical device/display is configured, because pulseaudio fails to
initialize since it can't access $HOME.
A workaround is removing the audio device, however on ARM boards there
isn't any option to do that, so -nographic always fails.
Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately
this has massive test suite fallout.
Add a qemu.conf parameter nographics_allow_host_audio, that if enabled
will pass through QEMU_AUDIO_DRV from sysconfig (similar to
vnc_allow_host_audio)
Since 16bcb3 we have a regression. The hard_limit is set
unconditionally. By default the limit is zero. Hence, if user hasn't
configured any, we set the zero in cgroup subsystem making the kernel
kill the corresponding qemu process immediately. The proper fix is to
set hard_limit iff user has configured any.
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
If upgrading from a libvirt that is older than 1.0.5, we can
not assume that vm->def->resource is non-NULL. This bogus
assumption caused libvirtd to crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the QEMU driver creates the VM's cgroup prior to
forking, and then uses a virCommand hook to move the child
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.
Fortunately we have a handshake taking place between the
QEMU driver and the child process prior to QEMU being exec()d,
which was introduced to allow setup of disk locking. By good
fortune this synchronization point can be used to enable the
QEMU driver to do atomic setup of cgroups removing the use
of the hook script.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use the new virCgroupNewDetect function to determine cgroup
placement of existing running VMs. This will allow the legacy
cgroups creation APIs to be removed entirely
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
Change bbe97ae968eba60b71e0066d49f9fc909966d9d6 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Found that I was unable to start existing domains after updating
to a kernel with no cgroups support
# zgrep CGROUP /proc/config.gz
# CONFIG_CGROUPS is not set
# virsh start test
error: Failed to start domain test
error: Unable to initialize /machine cgroup: Cannot allocate memory
virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
attempting to open /proc/cgroups on such a system, but it was being
dropped in virCgroupSetPartitionSuffix().
Change virCgroupSetPartitionSuffix() to propagate errors returned by
its callees. Also check for ENOENT in qemuInitCgroup() when determining
if cgroups support is available.
This adds the scsi-generic device into the device controller's
whitelist, so that it's allowed to used by the qemu process.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
I must have looked at this a couple dozen times before I noticed it
had "!=" instead of "==". Not doing this setup prevented qemu from
doing anything with the vfio group device.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
The USB-specific cgroup setup had been inserted inline in
qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a
common cgroup setup function called for all hostdevs, so it makes sens
to put the usb-specific setup there and just rely on that function
being called.
The one thing I'm uncertain of here (and a reason for not pushing
until after release) is that previously hostdev->missing was checked
only when starting a domain (and cgroup setup for the device skipped
if missing was true), but with this consolidation, it is now checked
in the case of hotplug as well. I don't know if this will have any
practical effect (does it make sense to hotplug a "missing" usb
device?)
PCIO device assignment using VFIO requires read/write access by the
qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the
VFIO group number that the assigned device belongs to (and can be
found with the function virPCIDeviceGetVFIOGroupDev)
/dev/vfio/vfio can be accessible to any guest without danger
(according to vfio developers), so it is added to the static ACL.
The group device must be dynamically added to the cgroup ACL for each
vfio hostdev in two places:
1) for any devices in the persistent config when the domain is started
(done during qemuSetupCgroup())
2) at device attach time for any hotplug devices (done in
qemuDomainAttachHostDevice)
The group device must be removed from the ACL when a device it
"hot-unplugged" (in qemuDomainDetachHostDevice())
Note that USB devices are already doing their own cgroup setup and
teardown in the hostdev-usb specific function. I chose to make the new
functions generic and call them in a common location though. We can
then move the USB-specific code (which is duplicated in two locations)
to this single location. I'll be posting a followup patch to do that.
The change in commit aed4986322fe77bdf718e31a0587d00f04f3d97a
was incomplete, missing a couple of cases of /system. This
caused failure to start VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After discussions with systemd developers it was decided that
a better default policy for resource partitions is to have
3 default partitions at the top level
/system - system services
/machine - virtual machines / containers
/user - user login session
This ensures that the default policy isolates guest from
user login sessions & system services, so a mis-behaving
guest can't consume 100% of CPU usage if other things are
contending for it.
Thus we change the default partition from /system to
/machine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDriver method had a 'bool privileged' param.
If a false value was ever passed in, it would simply not
work, since non-root users don't have any privileges to create
new cgroups. Just delete this broken code entirely and make
the QEMU driver skip cgroup setup in non-privileged mode
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Historically QEMU/LXC guests have been placed in a cgroup layout
that is
$LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME
This is bad for a number of reasons
- The cgroup hierarchy gets very deep which seriously
impacts kernel performance due to cgroups scalability
limitations.
- It is hard to setup cgroup policies which apply across
services and virtual machines, since all VMs are underneath
the libvirtd service.
To address this the default cgroup location is changed to
be
/system/$VMNAME.{lxc,qemu}.libvirt
This puts virtual machines at the same level in the hierarchy
as system services, allowing consistent policy to be setup
across all of them.
This also honours the new resource partition location from the
XML configuration, for example
<resource>
<partition>/virtualmachines/production</partitions>
</resource>
will result in the VM being placed at
/virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt
NB, with the exception of the default, /system, path which
is intended to always exist, libvirt will not attempt to
auto-create the partitions in the XML. It is the responsibility
of the admin/app to configure the partitions. Later libvirt
APIs will provide a way todo this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>