diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
new file mode 100644
index 0000000000..77656b2500
--- /dev/null
+++ b/docs/cgroups.html.in
@@ -0,0 +1,285 @@
The QEMU and LXC drivers make use of the Linux "Control Groups" facility
for applying resource management to their virtual machines and containers.
The control groups filesystem supports multiple "controllers". By default
the init system (such as systemd) should mount all controllers compiled
into the kernel at /sys/fs/cgroup/$CONTROLLER-NAME. Libvirt will never
attempt to mount any controllers itself, merely detect where they are
mounted.
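To see which controllers the init system has mounted, inspect
/proc/mounts; the listing below is an illustrative sample from a
systemd host, and will vary with kernel configuration:

$ grep cgroup /proc/mounts
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0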
The QEMU driver is capable of using the cpuset, cpu, memory, blkio and
devices controllers. None of them are compulsory. If any controller is
not mounted, the resource management APIs which use it will cease to
operate. It is possible to explicitly turn off use of a controller,
even when mounted, via the /etc/libvirt/qemu.conf configuration file.
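The relevant setting is cgroup_controllers; a sketch of disabling the
cpuset controller by listing only the others (the default list shipped
in qemu.conf may differ between libvirt versions):

# /etc/libvirt/qemu.conf
# Omitting "cpuset" here stops libvirt using it, even when mounted.
cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuacct" ]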
The LXC driver is capable of using the cpuset, cpu, cpuacct, freezer,
memory, blkio and devices controllers. The cpuset, devices and memory
controllers are compulsory. Without them mounted, no containers can be
started. If any of the other controllers are not mounted, the resource
management APIs which use them will cease to operate.
As of libvirt 1.0.5, the cgroups layout created by libvirt has been
simplified, in order to facilitate the setup of resource control
policies by administrators / management applications. The layout is
based on the concepts of "partitions" and "consumers". Each virtual
machine or container is a consumer, and has a corresponding cgroup
named $VMNAME.libvirt-{qemu,lxc}. Each consumer is associated with
exactly one partition, which also has a corresponding cgroup, usually
named $PARTNAME.partition. The exceptions to this naming rule are the
three top level default partitions, named /system (for system
services), /user (for user login sessions) and /machine (for virtual
machines and containers). By default every consumer will of course be
associated with the /machine partition. This leads to a hierarchy that
looks like:
$ROOT
  |
  +- system
  |   |
  |   +- libvirtd.service
  |
  +- machine
      |
      +- vm1.libvirt-qemu
      |   |
      |   +- emulator
      |   +- vcpu0
      |   +- vcpu1
      |
      +- vm2.libvirt-qemu
      |   |
      |   +- emulator
      |   +- vcpu0
      |   +- vcpu1
      |
      +- vm3.libvirt-qemu
      |   |
      |   +- emulator
      |   +- vcpu0
      |   +- vcpu1
      |
      +- container1.libvirt-lxc
      |
      +- container2.libvirt-lxc
      |
      +- container3.libvirt-lxc
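To see this layout in practice, the cgroup placement of a running
guest's QEMU process can be read from /proc. This is a sketch using
the "demo" guest from the examples further below; the pidfile path and
the controller numbering vary between hosts, and the output shown is
illustrative:

# PID=$(cat /var/run/libvirt/qemu/demo.pid)
# grep machine /proc/$PID/cgroup
6:cpuset:/machine/demo.libvirt-qemu
4:cpu,cpuacct:/machine/demo.libvirt-qemu
3:memory:/machine/demo.libvirt-qemu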
The default cgroups layout ensures that, when there is contention for
CPU time, it is shared equally between system services, user sessions
and virtual machines / containers. This prevents virtual machines from
locking the administrator out of the host, or impacting execution of
system services. Conversely, when there is no contention from system
services / user sessions, it is possible for virtual machines to fully
utilize the host CPUs.
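The equal sharing follows from the top level groups all keeping the
kernel's default cpu.shares weight of 1024, which can be checked
directly (illustrative output, assuming the cpu controller is mounted
at the default location):

# cat /sys/fs/cgroup/cpu/system/cpu.shares
1024
# cat /sys/fs/cgroup/cpu/machine/cpu.shares
1024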
If there is a need to apply resource constraints to groups of virtual
machines or containers, then the single default partition /machine may
not be sufficiently flexible. The administrator may wish to sub-divide
the default partition, for example into "testing" and "production"
partitions, and then assign each guest to a specific sub-partition.
This is achieved via a small element addition to the guest domain XML
config, just below the main domain element:
  ...
  <resource>
    <partition>/machine/production</partition>
  </resource>
  ...
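The element can be added with virsh edit (shown here for the "demo"
guest used in later examples); the new placement takes effect the next
time the guest starts, provided the partition's cgroups already exist
(see below):

# virsh edit demo
# virsh shutdown demo
# virsh start demo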
Libvirt will not auto-create the cgroups directory to back this
partition. In the future, libvirt / virsh will provide APIs / commands
to create custom partitions, but currently this is left as an exercise
for the administrator. For example, given the XML config above, the
admin would need to create a cgroup named '/machine/production.partition':
# cd /sys/fs/cgroup
# for i in blkio cpu,cpuacct cpuset devices freezer memory net_cls perf_event
  do
    mkdir $i/machine/production.partition
  done
# for i in cpuset.cpus cpuset.mems
  do
    cat cpuset/machine/$i > cpuset/machine/production.partition/$i
  done
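A quick sanity check that the partition now exists in every mounted
controller (illustrative output, truncated):

# ls -d /sys/fs/cgroup/*/machine/production.partition
/sys/fs/cgroup/blkio/machine/production.partition
/sys/fs/cgroup/cpu,cpuacct/machine/production.partition
...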
Note: the cgroups directory is created with a ".partition" suffix, but
the partition path in the guest XML config should not include this
suffix.
Note: the ability to place guests in custom partitions is only
available with libvirt >= 1.0.5, using the new cgroup layout. The
legacy cgroups layout described later did not support customization
per guest.
Since libvirt aims to provide an API which is portable across
hypervisors, the concept of cgroups is not exposed directly in the API
or XML configuration. It is considered to be an internal implementation
detail. Instead libvirt provides a set of APIs for applying resource
controls, which are then mapped to the corresponding cgroup tunables.
Parameters from the "cpu" controller are exposed via the schedinfo
command in virsh:
# virsh schedinfo demo
Scheduler      : posix
cpu_shares     : 1024
vcpu_period    : 100000
vcpu_quota     : -1
emulator_period: 100000
emulator_quota : -1
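The same command updates the tunables; for example, doubling the
relative CPU weight of the guest (remaining output lines omitted):

# virsh schedinfo demo --set cpu_shares=2048
Scheduler      : posix
cpu_shares     : 2048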
Parameters from the "blkio" controller are exposed via the blkiotune
command in virsh:
# virsh blkiotune demo
weight         : 500
device_weight  :
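Values are set through the same command, e.g. raising the guest's
relative I/O weight:

# virsh blkiotune demo --weight 800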
Parameters from the "memory" controller are exposed via the memtune
command in virsh:
# virsh memtune demo
hard_limit     : 580192
soft_limit     : unlimited
swap_hard_limit: unlimited
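The corresponding setter flags take sizes in kibibytes; for example,
capping the guest at 1 GiB:

# virsh memtune demo --hard-limit 1048576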
The net_cls controller is not currently used. Instead traffic filter
policies are set directly against individual virtual network
interfaces.
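For instance, a network filter policy (here the built-in clean-traffic
filter from libvirt's nwfilter driver, shown as a minimal sketch) is
referenced from the guest interface XML rather than through any cgroup
tunable:

<interface type='network'>
  <source network='default'/>
  <filterref filter='clean-traffic'/>
</interface>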
Prior to libvirt 1.0.5, the cgroups layout created by libvirt was
different from that described above, and did not allow for
administrator customization. Libvirt used a fixed, 3-level hierarchy
libvirt/{qemu,lxc}/$VMNAME which was rooted at the point in the
hierarchy where libvirtd itself was located. So if libvirtd was placed
at /system/libvirtd.service by systemd, the groups for each virtual
machine / container would be located at
/system/libvirtd.service/libvirt/{qemu,lxc}/$VMNAME. In addition to
this, the QEMU driver created further child groups for each vCPU
thread and the emulator thread(s). This resulted in a hierarchy that
looked like:
$ROOT
  |
  +- system
      |
      +- libvirtd.service
          |
          +- libvirt
              |
              +- qemu
              |   |
              |   +- vm1
              |   |   |
              |   |   +- emulator
              |   |   +- vcpu0
              |   |   +- vcpu1
              |   |
              |   +- vm2
              |   |   |
              |   |   +- emulator
              |   |   +- vcpu0
              |   |   +- vcpu1
              |   |
              |   +- vm3
              |       |
              |       +- emulator
              |       +- vcpu0
              |       +- vcpu1
              |
              +- lxc
                  |
                  +- container1
                  |
                  +- container2
                  |
                  +- container3
Although current releases are much improved, historically the use of
deep hierarchies had a significant negative impact on kernel
scalability. The legacy libvirt cgroups layout highlighted these
problems, to the detriment of the performance of virtual machines and
containers.
diff --git a/docs/sitemap.html.in b/docs/sitemap.html.in
index afabf2dc22..ce8593a250 100644
--- a/docs/sitemap.html.in
+++ b/docs/sitemap.html.in
@@ -86,6 +86,10 @@
             Disk locking
             Ensuring exclusive guest access to disks
+