mirror of https://gitlab.com/libvirt/libvirt.git (synced 2025-03-20 07:59:00 +00:00)
Improve cgroups docs to cover systemd integration

As of libvirt 1.1.1 and systemd 205, the cgroups layout used by libvirt
has some changes. Update the 'cgroups.html' file from the website to
describe how it works in a systemd world.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

parent 5087a5a009
commit 7f2b173feb
<p>
As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
simplified, in order to facilitate the setup of resource control policies by
administrators / management applications. The new layout is based on the concepts
of "partitions" and "consumers". A "consumer" is a cgroup which holds the
processes for a single virtual machine or container. A "partition" is a cgroup
which does not contain any processes, but which can have resource controls applied.
A "partition" will have zero or more child directories, each of which may be
either a "consumer" or a "partition".
</p>

<p>
As of libvirt 1.1.1 or later, the cgroups layout will have some slight
differences when running on a host with systemd 205 or later. The overall
tree structure is the same, but there are some differences in the naming
conventions for the cgroup directories. Thus the following docs are split
in two, one describing systemd hosts and the other non-systemd hosts.
</p>

<h3><a name="currentLayoutSystemd">Systemd cgroups integration</a></h3>

<p>
On hosts which use systemd, each consumer maps to a systemd scope unit,
while each partition maps to a systemd slice unit.
</p>

<h4><a name="systemdScope">Systemd scope naming</a></h4>

<p>
The systemd convention is for the scope name of virtual machines / containers
to be of the general format <code>machine-$NAME.scope</code>. Libvirt forms the
<code>$NAME</code> part of this by concatenating the driver type with the name
of the guest, and then escaping any systemd reserved characters.
So for a guest <code>demo</code> running under the <code>lxc</code> driver,
we get a <code>$NAME</code> of <code>lxc-demo</code>, which when escaped is
<code>lxc\x2ddemo</code>. The complete scope name is thus
<code>machine-lxc\x2ddemo.scope</code>.
The scope names map directly to the cgroup directory names.
</p>
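<p>
The escaping rule can be illustrated with a short sketch. The helpers below
are hypothetical, not libvirt's actual implementation, and handle only the
common cases (systemd's real escaping also covers leading dots and non-ASCII
bytes):
</p>

```python
# Hypothetical sketch of systemd-style name escaping; libvirt's real
# implementation covers more cases (leading dots, non-ASCII bytes).

def systemd_escape(name: str) -> str:
    out = []
    for ch in name:
        # "-" is the unit-name path separator, so it must be escaped.
        if ch.isalnum() or ch in ":_.":
            out.append(ch)
        else:
            out.append("\\x%02x" % ord(ch))
    return "".join(out)

def scope_name(driver: str, guest: str) -> str:
    # e.g. driver "lxc" + guest "demo" -> "machine-lxc\x2ddemo.scope"
    return "machine-%s.scope" % systemd_escape("%s-%s" % (driver, guest))

print(scope_name("lxc", "demo"))   # machine-lxc\x2ddemo.scope
```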

<h4><a name="systemdSlice">Systemd slice naming</a></h4>

<p>
The systemd convention for slice naming is that a slice name includes the
names of all of its parents, prepended to its own name. So for a libvirt
partition <code>/machine/engineering/testing</code>, the slice name will
be <code>machine-engineering-testing.slice</code>. Again, the slice names
map directly to the cgroup directory names. Systemd creates three top level
slices by default, <code>system.slice</code>, <code>user.slice</code> and
<code>machine.slice</code>. All virtual machines or containers created
by libvirt will be associated with <code>machine.slice</code> by default.
</p>
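<p>
As a sketch, the flattening rule can be expressed in a few lines of Python
(a hypothetical illustration of the convention, not libvirt code):
</p>

```python
# Hypothetical illustration of the slice naming rule: every ancestor
# partition name is folded into the slice name, separated by "-".

def slice_name(partition_path: str) -> str:
    parts = [p for p in partition_path.split("/") if p]
    return "-".join(parts) + ".slice"

def slice_cgroup_path(partition_path: str) -> str:
    # Each level of the tree is itself a slice, so the cgroup directory
    # path repeats the prefix at every level.
    parts = [p for p in partition_path.split("/") if p]
    return "/".join("-".join(parts[:i + 1]) + ".slice"
                    for i in range(len(parts)))

print(slice_name("/machine/engineering/testing"))
# machine-engineering-testing.slice
print(slice_cgroup_path("/machine/engineering/testing"))
# machine.slice/machine-engineering.slice/machine-engineering-testing.slice
```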

<h4><a name="systemdLayout">Systemd cgroup layout</a></h4>

<p>
Given this, a possible systemd cgroups layout involving 3 qemu guests,
3 lxc containers and 4 custom child slices would be:
</p>

<pre>
$ROOT
 |
 +- system.slice
 |   |
 |   +- libvirtd.service
 |
 +- machine.slice
     |
     +- machine-qemu\x2dvm1.scope
     |   |
     |   +- emulator
     |   +- vcpu0
     |   +- vcpu1
     |
     +- machine-qemu\x2dvm2.scope
     |   |
     |   +- emulator
     |   +- vcpu0
     |   +- vcpu1
     |
     +- machine-qemu\x2dvm3.scope
     |   |
     |   +- emulator
     |   +- vcpu0
     |   +- vcpu1
     |
     +- machine-engineering.slice
     |   |
     |   +- machine-engineering-testing.slice
     |   |   |
     |   |   +- machine-lxc\x2dcontainer1.scope
     |   |
     |   +- machine-engineering-production.slice
     |       |
     |       +- machine-lxc\x2dcontainer2.scope
     |
     +- machine-marketing.slice
         |
         +- machine-lxc\x2dcontainer3.scope
</pre>

<h3><a name="currentLayoutGeneric">Non-systemd cgroups layout</a></h3>

<p>
On hosts which do not use systemd, each consumer has a corresponding cgroup
named <code>$VMNAME.libvirt-{qemu,lxc}</code>. Each consumer is associated
with exactly one partition, which also has a corresponding cgroup, usually
named <code>$PARTNAME.partition</code>. The exceptions to this naming rule
are the three top level default partitions, named <code>/system</code> (for
system services), <code>/user</code> (for user login sessions) and
<code>/machine</code> (for virtual machines and containers). By default
every consumer will of course be associated with the <code>/machine</code>
partition.
</p>

<p>
Given this, a possible non-systemd cgroups layout involving 3 qemu guests,
3 lxc containers and 4 custom child partitions would be:
</p>

<pre>
...
     |   +- vcpu0
     |   +- vcpu1
     |
     +- engineering.partition
     |   |
     |   +- testing.partition
     |   |   |
     |   |   +- container1.libvirt-lxc
     |   |
     |   +- production.partition
     |       |
     |       +- container2.libvirt-lxc
     |
     +- marketing.partition
         |
         +- container3.libvirt-lxc
</pre>

<p>
The default cgroups layout ensures that, when there is contention for
CPU time, it is shared equally between system services, user sessions
and virtual machines / containers. This prevents virtual machines from
locking the administrator out of the host, or impacting execution of
system services. Conversely, when there is no contention from
system services / user sessions, it is possible for virtual machines
to fully utilize the host CPUs.
</p>

<h2><a name="customPartiton">Using custom partitions</a></h2>

<p>
...
</p>

<pre>
...
</pre>

<p>
Note that the partition names in the guest XML use a
generic naming format, not the low level naming convention
required by the underlying host OS. That is, you should not include
any of the <code>.partition</code> or <code>.slice</code>
suffixes in the XML config. Given a partition name
<code>/machine/production</code>, libvirt will automatically
apply the platform specific translation required to get
<code>/machine/production.partition</code> (non-systemd)
or <code>/machine.slice/machine-production.slice</code>
(systemd) as the underlying cgroup name.
</p>
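<p>
The translation just described can be sketched as follows; these helpers are
hypothetical illustrations of the naming rules, not libvirt's actual code:
</p>

```python
# Hypothetical helpers showing how a generic partition name maps to the
# platform specific cgroup paths described above.

def to_nonsystemd(path: str) -> str:
    parts = [p for p in path.split("/") if p]
    # Top level default partitions keep plain names; children get ".partition".
    return "/" + "/".join([parts[0]] + [p + ".partition" for p in parts[1:]])

def to_systemd(path: str) -> str:
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join("-".join(parts[:i + 1]) + ".slice"
                          for i in range(len(parts)))

print(to_nonsystemd("/machine/production"))  # /machine/production.partition
print(to_systemd("/machine/production"))     # /machine.slice/machine-production.slice
```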

<p>
Libvirt will not auto-create the cgroups directory to back
this partition. In the future, libvirt / virsh will provide
APIs / commands to create custom partitions, but currently
this is left as an exercise for the administrator.
</p>

<p>
<strong>Note:</strong> the ability to place guests in custom
partitions is only available with libvirt >= 1.0.5, using
the new cgroup layout. The legacy cgroups layout described
later in this document did not support customization per guest.
</p>

<h3><a name="createSystemd">Creating custom partitions (systemd)</a></h3>

<p>
Given the XML config above, the admin on a systemd based host would
need to create a unit file <code>/etc/systemd/system/machine-production.slice</code>:
</p>

<pre>
# cat > /etc/systemd/system/machine-production.slice <<EOF
[Unit]
Description=VM production slice
Before=slices.target
Wants=machine.slice
EOF
# systemctl start machine-production.slice
</pre>

<h3><a name="createNonSystemd">Creating custom partitions (non-systemd)</a></h3>

<p>
Given the XML config above, the admin on a non-systemd based host
would need to create a cgroup named '/machine/production.partition':
</p>

<pre>
...
done
</pre>
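<p>
The creation step can also be sketched in Python, run here against a scratch
directory rather than a live <code>/sys/fs/cgroup</code> mount. The helper name
and the controller list are illustrative assumptions, not a libvirt API; a real
host may expose a different set of controllers:
</p>

```python
import os
import tempfile

def create_partition_dirs(cg_root, partition="machine/production.partition",
                          controllers=("blkio", "cpu", "cpuset", "memory")):
    # Mirror the partition directory under every controller hierarchy,
    # as an admin would do by hand under /sys/fs/cgroup.
    for ctrl in controllers:
        os.makedirs(os.path.join(cg_root, ctrl, partition), exist_ok=True)

# Demonstrate against a throwaway directory instead of /sys/fs/cgroup.
root = tempfile.mkdtemp()
create_partition_dirs(root)
print(sorted(os.listdir(root)))  # ['blkio', 'cpu', 'cpuset', 'memory']
```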

<p>
<strong>Note:</strong> the cgroups directory is created with a ".partition"
suffix, but the XML config does not require this suffix.
</p>

<h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>

<p>