Improve cgroups docs to cover systemd integration

As of libvirt 1.1.1 and systemd 205, the cgroups layout used by
libvirt has some changes. Update the 'cgroups.html' file from
the website to describe how it works in a systemd world.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This commit is contained in:
Daniel P. Berrange 2013-11-11 16:58:35 +00:00
parent 5087a5a009
commit 7f2b173feb

View File

@ -47,17 +47,121 @@
<p>
As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
simplified, in order to facilitate the setup of resource control policies by
administrators / management applications. The layout is based on the concepts of
"partitions" and "consumers". Each virtual machine or container is a consumer,
and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>.
Each consumer is associated with exactly one partition, which also have a
corresponding cgroup usually named <code>$PARTNAME.partition</code>. The
exceptions to this naming rule are the three top level default partitions,
named <code>/system</code> (for system services), <code>/user</code> (for
user login sessions) and <code>/machine</code> (for virtual machines and
containers). By default every consumer will of course be associated with
the <code>/machine</code> partition. This leads to a hierarchy that looks
like
administrators / management applications. The new layout is based on the concepts
of "partitions" and "consumers". A "consumer" is a cgroup which holds the
processes for a single virtual machine or container. A "partition" is a cgroup
which does not contain any processes, but can have resource controls applied.
A "partition" will have zero or more child directories which may be either
"consumer" or "partition".
</p>
<p>
As of libvirt 1.1.1 or later, the cgroups layout will have some slight
differences when running on a host with systemd 205 or later. The overall
tree structure is the same, but there are some differences in the naming
conventions for the cgroup directories. Thus the following docs split
in two, one describing systemd hosts and the other non-systemd hosts.
</p>
<h3><a name="currentLayoutSystemd">Systemd cgroups integration</a></h3>
<p>
On hosts which use systemd, each consumer maps to a systemd scope unit,
while partitions map to a system slice unit.
</p>
<h4><a name="systemdScope">Systemd scope naming</a></h4>
<p>
The systemd convention is for the scope name of virtual machines / containers
to be of the general format <code>machine-$NAME.scope</code>. Libvirt forms the
<code>$NAME</code> part of this by concatenating the driver type with the name
of the guest, and then escaping any systemd reserved characters.
So for a guest <code>demo</code> running under the <code>lxc</code> driver,
we get a <code>$NAME</code> of <code>lxc-demo</code> which when escaped is
<code>lxc\x2ddemo</code>. So the complete scope name is <code>machine-lxc\x2ddemo.scope</code>.
The scope names map directly to the cgroup directory names.
</p>
<h4><a name="systemdSlice">Systemd slice naming</a></h4>
<p>
The systemd convention for slice naming is that a slice should include the
name of all of its parents prepended on its own name. So for a libvirt
partition <code>/machine/engineering/testing</code>, the slice name will
be <code>machine-engineering-testing.slice</code>. Again the slice names
map directly to the cgroup directory names. Systemd creates three top level
slices by default, <code>system.slice</code> <code>user.slice</code> and
<code>machine.slice</code>. All virtual machines or containers created
by libvirt will be associated with <code>machine.slice</code> by default.
</p>
<h4><a name="systemdLayout">Systemd cgroup layout</a></h4>
<p>
Given this, a possible systemd cgroups layout involving 3 qemu guests,
3 lxc containers and 3 custom child slices, would be:
</p>
<pre>
$ROOT
|
+- system.slice
| |
| +- libvirtd.service
|
+- machine.slice
|
+- machine-qemu\x2dvm1.scope
| |
| +- emulator
| +- vcpu0
| +- vcpu1
|
+- machine-qemu\x2dvm2.scope
| |
| +- emulator
| +- vcpu0
| +- vcpu1
|
+- machine-qemu\x2dvm3.scope
| |
| +- emulator
| +- vcpu0
| +- vcpu1
|
+- machine-engineering.slice
| |
| +- machine-engineering-testing.slice
| | |
| | +- machine-lxc\x2dcontainer1.scope
| |
| +- machine-engineering-production.slice
| |
| +- machine-lxc\x2dcontainer2.scope
|
+- machine-marketing.slice
|
+- machine-lxc\x2dcontainer3.scope
</pre>
<h3><a name="currentLayoutGeneric">Non-systemd cgroups layout</a></h3>
<p>
On hosts which do not use systemd, each consumer has a corresponding cgroup
named <code>$VMNAME.libvirt-{qemu,lxc}</code>. Each consumer is associated
with exactly one partition, which also have a corresponding cgroup usually
named <code>$PARTNAME.partition</code>. The exceptions to this naming rule
are the three top level default partitions, named <code>/system</code> (for
system services), <code>/user</code> (for user login sessions) and
<code>/machine</code> (for virtual machines and containers). By default
every consumer will of course be associated with the <code>/machine</code>
partition.
</p>
<p>
Given this, a possible systemd cgroups layout involving 3 qemu guests,
3 lxc containers and 2 custom child slices, would be:
</p>
<pre>
@ -87,23 +191,21 @@ $ROOT
| +- vcpu0
| +- vcpu1
|
+- container1.libvirt-lxc
+- engineering.partition
| |
| +- testing.partition
| | |
| | +- container1.libvirt-lxc
| |
| +- production.partition
| |
| +- container2.libvirt-lxc
|
+- container2.libvirt-lxc
|
+- container3.libvirt-lxc
+- marketing.partition
|
+- container3.libvirt-lxc
</pre>
<p>
The default cgroups layout ensures that, when there is contention for
CPU time, it is shared equally between system services, user sessions
and virtual machines / containers. This prevents virtual machines from
locking the administrator out of the host, or impacting execution of
system services. Conversely, when there is no contention from
system services / user sessions, it is possible for virtual machines
to fully utilize the host CPUs.
</p>
<h2><a name="customPartiton">Using custom partitions</a></h2>
<p>
@ -126,13 +228,55 @@ $ROOT
...
</pre>
<p>
Note that the partition names in the guest XML are using a
generic naming format, not the low level naming convention
required by the underlying host OS. That is, you should not include
any of the <code>.partition</code> or <code>.slice</code>
suffixes in the XML config. Given a partition name
<code>/machine/production</code>, libvirt will automatically
apply the platform specific translation required to get
<code>/machine/production.partition</code> (non-systemd)
or <code>/machine.slice/machine-production.slice</code>
(systemd) as the underlying cgroup name
</p>
<p>
Libvirt will not auto-create the cgroups directory to back
this partition. In the future, libvirt / virsh will provide
APIs / commands to create custom partitions, but currently
this is left as an exercise for the administrator. For
example, given the XML config above, the admin would need
to create a cgroup named '/machine/production.partition'
this is left as an exercise for the administrator.
</p>
<p>
<strong>Note:</strong> the ability to place guests in custom
partitions is only available with libvirt &gt;= 1.0.5, using
the new cgroup layout. The legacy cgroups layout described
later in this document did not support customization per guest.
</p>
<h3><a name="createSystemd">Creating custom partitions (systemd)</a></h3>
<p>
Given the XML config above, the admin on a systemd based host would
need to create a unit file <code>/etc/systemd/system/machine-production.slice</code>
</p>
<pre>
# cat &gt; /etc/systemd/system/machine-testing.slice &lt;&lt;EOF
[Unit]
Description=VM testing slice
Before=slices.target
Wants=machine.slice
EOF
# systemctl start machine-testing.slice
</pre>
<h3><a name="createNonSystemd">Creating custom partitions (non-systemd)</a></h3>
<p>
Given the XML config above, the admin on a non-systemd based host
would need to create a cgroup named '/machine/production.partition'
</p>
<pre>
@ -147,18 +291,6 @@ $ROOT
done
</pre>
<p>
<strong>Note:</strong> the cgroups directory created as a ".partition"
suffix, but the XML config does not require this suffix.
</p>
<p>
<strong>Note:</strong> the ability to place guests in custom
partitions is only available with libvirt &gt;= 1.0.5, using
the new cgroup layout. The legacy cgroups layout described
later did not support customization per guest.
</p>
<h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>
<p>