docs: Convert 'cgroups' page to rST
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
parent 492576edb8
commit 42b5e496a7
@@ -1,424 +0,0 @@
docs/cgroups.rst (new file, 364 lines)
@@ -0,0 +1,364 @@
==================================
Control Groups Resource Management
==================================

.. contents::

The QEMU and LXC drivers make use of the Linux "Control Groups" facility for
applying resource management to their virtual machines and containers.

Required controllers
--------------------

The control groups filesystem supports multiple "controllers". By default the
init system (such as systemd) should mount all controllers compiled into the
kernel at ``/sys/fs/cgroup/$CONTROLLER-NAME``. Libvirt will never attempt to
mount any controllers itself, merely detect where they are mounted.

The QEMU driver is capable of using the ``cpuset``, ``cpu``, ``cpuacct``,
``memory``, ``blkio`` and ``devices`` controllers. None of them are compulsory.
If any controller is not mounted, the resource management APIs which use it will
cease to operate. It is possible to explicitly turn off use of a controller,
even when mounted, via the ``/etc/libvirt/qemu.conf`` configuration file.
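
For example, the ``cgroup_controllers`` setting in ``qemu.conf`` restricts the
driver to a subset of controllers; the list shown here is purely illustrative
and should reflect what the host actually provides:

::

   # /etc/libvirt/qemu.conf
   cgroup_controllers = [ "cpu", "devices", "memory", "blkio" ]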

The LXC driver is capable of using the ``cpuset``, ``cpu``, ``cpuacct``,
``freezer``, ``memory``, ``blkio`` and ``devices`` controllers. The ``cpuacct``,
``devices`` and ``memory`` controllers are compulsory. Without them mounted, no
containers can be started. If any of the other controllers are not mounted, the
resource management APIs which use them will cease to operate.
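
Whether the compulsory controllers are mounted can be checked by inspecting
``/proc/mounts``; a sketch on a cgroups v1 host (output abbreviated):

::

   $ grep cgroup /proc/mounts
   cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
   cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
   cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
   ...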

Current cgroups layout
----------------------

As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
simplified, in order to facilitate the setup of resource control policies by
administrators / management applications. The new layout is based on the
concepts of "partitions" and "consumers". A "consumer" is a cgroup which holds
the processes for a single virtual machine or container. A "partition" is a
cgroup which does not contain any processes, but can have resource controls
applied. A "partition" will have zero or more child directories which may be
either "consumer" or "partition".

As of libvirt 1.1.1 or later, the cgroups layout will have some slight
differences when running on a host with systemd 205 or later. The overall tree
structure is the same, but there are some differences in the naming conventions
for the cgroup directories. Thus the following docs are split in two, one part
describing systemd hosts and the other non-systemd hosts.

Systemd cgroups integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

On hosts which use systemd, each consumer maps to a systemd scope unit, while
partitions map to a systemd slice unit.

Systemd scope naming
^^^^^^^^^^^^^^^^^^^^

The systemd convention is for the scope name of virtual machines / containers to
be of the general format ``machine-$NAME.scope``. Libvirt forms the ``$NAME``
part of this by concatenating the driver type with the id and truncated name of
the guest, and then escaping any systemd reserved characters. So for a guest
``demo`` running under the ``lxc`` driver, we get a ``$NAME`` of
``lxc-12345-demo`` which when escaped is ``lxc\x2d12345\x2ddemo``. So the
complete scope name is ``machine-lxc\x2d12345\x2ddemo.scope``. The scope names
map directly to the cgroup directory names.
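
The escaping follows systemd's unit name escaping rules, so the result can be
reproduced with the ``systemd-escape`` utility (illustrative):

::

   $ systemd-escape 'lxc-12345-demo'
   lxc\x2d12345\x2ddemo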

Systemd slice naming
^^^^^^^^^^^^^^^^^^^^

The systemd convention for slice naming is that a slice should include the name
of all of its parents prepended to its own name. So for a libvirt partition
``/machine/engineering/testing``, the slice name will be
``machine-engineering-testing.slice``. Again the slice names map directly to the
cgroup directory names. Systemd creates three top-level slices by default,
``system.slice``, ``user.slice`` and ``machine.slice``. All virtual machines or
containers created by libvirt will be associated with ``machine.slice`` by
default.
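
The top-level slices can be listed with ``systemctl`` (illustrative output):

::

   $ systemctl list-units --type=slice
   UNIT           LOAD   ACTIVE SUB    DESCRIPTION
   machine.slice  loaded active active Virtual Machine and Container Slice
   system.slice   loaded active active System Slice
   user.slice     loaded active active User and Session Slice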

Systemd cgroup layout
^^^^^^^^^^^^^^^^^^^^^

Given this, a possible systemd cgroups layout involving 3 qemu guests, 3 lxc
containers and 4 custom child slices, would be:

::

   $ROOT
     |
     +- system.slice
     |   |
     |   +- libvirtd.service
     |
     +- machine.slice
         |
         +- machine-qemu\x2d1\x2dvm1.scope
         |   |
         |   +- libvirt
         |       |
         |       +- emulator
         |       +- vcpu0
         |       +- vcpu1
         |
         +- machine-qemu\x2d2\x2dvm2.scope
         |   |
         |   +- libvirt
         |       |
         |       +- emulator
         |       +- vcpu0
         |       +- vcpu1
         |
         +- machine-qemu\x2d3\x2dvm3.scope
         |   |
         |   +- libvirt
         |       |
         |       +- emulator
         |       +- vcpu0
         |       +- vcpu1
         |
         +- machine-engineering.slice
         |   |
         |   +- machine-engineering-testing.slice
         |   |   |
         |   |   +- machine-lxc\x2d11111\x2dcontainer1.scope
         |   |
         |   +- machine-engineering-production.slice
         |       |
         |       +- machine-lxc\x2d22222\x2dcontainer2.scope
         |
         +- machine-marketing.slice
             |
             +- machine-lxc\x2d33333\x2dcontainer3.scope

Prior to libvirt 7.1.0 the topology did not have the extra ``libvirt``
directory.
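
On a systemd host the resulting hierarchy can be browsed with ``systemd-cgls``
(an illustrative, abbreviated invocation):

::

   $ systemd-cgls /machine.slice
   Control group /machine.slice:
   └─machine-qemu\x2d1\x2dvm1.scope
     └─libvirt
       ├─emulator
       ...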

Non-systemd cgroups layout
~~~~~~~~~~~~~~~~~~~~~~~~~~

On hosts which do not use systemd, each consumer has a corresponding cgroup
named ``$VMNAME.libvirt-{qemu,lxc}``. Each consumer is associated with exactly
one partition, which also has a corresponding cgroup, usually named
``$PARTNAME.partition``. The exception to this naming rule is the top-level
default partition for virtual machines and containers, ``/machine``.

Given this, a possible non-systemd cgroups layout involving 3 qemu guests, 3 lxc
containers and 4 custom child partitions, would be:

::

   $ROOT
     |
     +- machine
         |
         +- qemu-1-vm1.libvirt-qemu
         |   |
         |   +- emulator
         |   +- vcpu0
         |   +- vcpu1
         |
         +- qemu-2-vm2.libvirt-qemu
         |   |
         |   +- emulator
         |   +- vcpu0
         |   +- vcpu1
         |
         +- qemu-3-vm3.libvirt-qemu
         |   |
         |   +- emulator
         |   +- vcpu0
         |   +- vcpu1
         |
         +- engineering.partition
         |   |
         |   +- testing.partition
         |   |   |
         |   |   +- lxc-11111-container1.libvirt-lxc
         |   |
         |   +- production.partition
         |       |
         |       +- lxc-22222-container2.libvirt-lxc
         |
         +- marketing.partition
             |
             +- lxc-33333-container3.libvirt-lxc

Using custom partitions
-----------------------

If there is a need to apply resource constraints to groups of virtual machines
or containers, then the single default partition ``/machine`` may not be
sufficiently flexible. The administrator may wish to sub-divide the default
partition, for example into "testing" and "production" partitions, and then
assign each guest to a specific sub-partition. This is achieved via a small
element addition to the guest domain XML config, just below the main ``domain``
element:

::

   ...
   <resource>
     <partition>/machine/production</partition>
   </resource>
   ...

Note that the partition names in the guest XML use a generic naming format, not
the low-level naming convention required by the underlying host OS. That is,
you should not include any of the ``.partition`` or ``.slice`` suffixes in the
XML config. Given a partition name ``/machine/production``, libvirt will
automatically apply the platform-specific translation required to get
``/machine/production.partition`` (non-systemd) or
``/machine.slice/machine-production.slice`` (systemd) as the underlying cgroup
name.

Libvirt will not auto-create the cgroups directory to back this partition. In
the future, libvirt / virsh will provide APIs / commands to create custom
partitions, but currently this is left as an exercise for the administrator.

**Note:** the ability to place guests in custom partitions is only available
with libvirt >= 1.0.5, using the new cgroup layout. The legacy cgroups layout
described later in this document did not support customization per guest.

Creating custom partitions (systemd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given the XML config above, the admin on a systemd-based host would need to
create a unit file ``/etc/systemd/system/machine-production.slice``:

::

   # cat > /etc/systemd/system/machine-production.slice <<EOF
   [Unit]
   Description=VM production slice
   Before=slices.target
   Wants=machine.slice
   EOF
   # systemctl start machine-production.slice

Creating custom partitions (non-systemd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given the XML config above, the admin on a non-systemd host would need to
create a cgroup named ``/machine/production.partition``:

::

   # cd /sys/fs/cgroup
   # for i in blkio cpu,cpuacct cpuset devices freezer memory net_cls perf_event
     do
       mkdir $i/machine/production.partition
     done
   # for i in cpuset.cpus cpuset.mems
     do
       cat cpuset/machine/$i > cpuset/machine/production.partition/$i
     done

Resource management APIs/commands
---------------------------------

Since libvirt aims to provide an API which is portable across hypervisors, the
concept of cgroups is not exposed directly in the API or XML configuration. It
is considered to be an internal implementation detail. Instead libvirt provides
a set of APIs for applying resource controls, which are then mapped to
corresponding cgroup tunables.

Scheduler tuning
~~~~~~~~~~~~~~~~

Parameters from the "cpu" controller are exposed via the ``schedinfo`` command
in virsh.

::

   # virsh schedinfo demo
   Scheduler      : posix
   cpu_shares     : 1024
   vcpu_period    : 100000
   vcpu_quota     : -1
   emulator_period: 100000
   emulator_quota : -1
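
The same command can also set values, e.g. to double the CPU shares (an
illustrative tweak):

::

   # virsh schedinfo demo cpu_shares=2048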

Block I/O tuning
~~~~~~~~~~~~~~~~

Parameters from the "blkio" controller are exposed via the ``blkiotune`` command
in virsh.

::

   # virsh blkiotune demo
   weight         : 500
   device_weight  :
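
The same command can update the values (illustrative):

::

   # virsh blkiotune demo --weight 600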

Memory tuning
~~~~~~~~~~~~~

Parameters from the "memory" controller are exposed via the ``memtune`` command
in virsh.

::

   # virsh memtune demo
   hard_limit     : 580192
   soft_limit     : unlimited
   swap_hard_limit: unlimited
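
Limits can likewise be set via ``memtune``; an illustrative example (values are
in kibibytes by default):

::

   # virsh memtune demo --soft-limit 524288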

Network tuning
~~~~~~~~~~~~~~

The ``net_cls`` controller is not currently used. Instead traffic filter
policies are set directly against individual virtual network interfaces.
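
For instance, per-interface bandwidth can be adjusted with the ``domiftune``
command; an illustrative invocation, where ``vnet0`` is a hypothetical
interface name and the inbound triple is average,peak,burst:

::

   # virsh domiftune demo vnet0 --inbound 1000,2000,2048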

Legacy cgroups layout
---------------------

Prior to libvirt 1.0.5, the cgroups layout created by libvirt was different from
that described above, and did not allow for administrator customization. Libvirt
used a fixed, 3-level hierarchy ``libvirt/{qemu,lxc}/$VMNAME`` which was rooted
at the point in the hierarchy where libvirtd itself was located. So if libvirtd
was placed at ``/system/libvirtd.service`` by systemd, the groups for each
virtual machine / container would be located at
``/system/libvirtd.service/libvirt/{qemu,lxc}/$VMNAME``. In addition to this,
the QEMU driver created further child groups for each vCPU thread and the
emulator thread(s). This led to a hierarchy that looked like:

::

   $ROOT
     |
     +- system
         |
         +- libvirtd.service
             |
             +- libvirt
                 |
                 +- qemu
                 |   |
                 |   +- vm1
                 |   |   |
                 |   |   +- emulator
                 |   |   +- vcpu0
                 |   |   +- vcpu1
                 |   |
                 |   +- vm2
                 |   |   |
                 |   |   +- emulator
                 |   |   +- vcpu0
                 |   |   +- vcpu1
                 |   |
                 |   +- vm3
                 |       |
                 |       +- emulator
                 |       +- vcpu0
                 |       +- vcpu1
                 |
                 +- lxc
                     |
                     +- container1
                     |
                     +- container2
                     |
                     +- container3

Although current releases are much improved, historically the use of deep
hierarchies has had a significant negative impact on kernel scalability. The
legacy libvirt cgroups layout highlighted these problems, to the detriment of
the performance of virtual machines and containers.
@@ -19,7 +19,6 @@ docs_assets = [
 docs_html_in_files = [
   '404',
-  'cgroups',
   'csharp',
   'dbus',
   'docs',
@@ -70,6 +69,7 @@ docs_rst_files = [
   'best-practices',
   'bindings',
   'bugs',
+  'cgroups',
   'ci',
   'coding-style',
   'committer-guidelines',