mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2025-02-14 07:31:30 +00:00
https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a
problem starting domains when cgroups are enabled; I was able
to reliably reproduce the race about 5% of the time when I added
hooks to delay domain startup by 3 seconds (as that seemed to be
about the length of time in which qemu created and then closed a
temporary thread, probably related to aio handling of initially
opening a disk image).  The problem has existed since we
introduced virCgroupMoveTask in commit 9102829 (v0.10.0).

There are some inherent TOCTTOU races when moving tasks between
kernel cgroups, precisely because threads can be created or
completed in the window between when we read a thread id from the
source and when we write to the destination.  As the goal of
virCgroupMoveTask is merely to move ALL tasks into the new cgroup,
it is sufficient to iterate until no more threads are being
created in the old group, and to ignore any threads that die
before we can move them.

It would be nicer to start the threads in the right cgroup to
begin with, but by default, all child threads are created in the
same cgroup as their parent, and we don't want vcpu child threads
in the emulator cgroup, so I don't see any good way of avoiding
the move.  It would also be nice if the kernel were to implement
something like rename() as a way to atomically move a group of
threads from one cgroup to another, instead of forcing a window
where we have to read and parse the source, then format and write
back into the destination.

* src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore
ESRCH, because a thread ended between read and write attempts.
(virCgroupMoveTask): Loop until all threads have moved.

Signed-off-by: Eric Blake <eblake@redhat.com>

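The retry strategy described in the commit message can be sketched roughly as follows. This is not the code from src/util/vircgroup.c: the callback types, `move_all_tasks`, and the in-memory `fake_groups` model are all hypothetical names invented for illustration, standing in for reading the source group's task list and writing one thread id into the destination. The sketch only shows the two ideas from the message: keep re-reading the source until it is empty, and treat ESRCH as "the thread already exited".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks: read a snapshot of thread ids from the source
 * group, and move one thread id into the destination (0 or -errno). */
typedef size_t (*read_tasks_fn)(void *opaque, int *tids, size_t max);
typedef int (*move_task_fn)(void *opaque, int tid);

/* Loop until a read of the source finds no tasks.  Returns 0 on success,
 * or a negative errno value for any failure other than ESRCH. */
int move_all_tasks(read_tasks_fn read_tasks, move_task_fn move_task,
                   void *opaque)
{
    int tids[64];

    for (;;) {
        size_t n = read_tasks(opaque, tids, sizeof(tids) / sizeof(tids[0]));
        if (n == 0)
            return 0; /* source is empty: every task moved or exited */
        for (size_t i = 0; i < n; i++) {
            int rc = move_task(opaque, tids[i]);
            /* -ESRCH means the thread ended between read and write. */
            if (rc < 0 && rc != -ESRCH)
                return rc;
        }
    }
}

/* In-memory stand-in for two cgroups, used only to exercise the loop. */
struct fake_groups {
    int src[8]; size_t nsrc;
    int dst[8]; size_t ndst;
    int doomed; /* tid that exits before it can be moved */
    int late;   /* tid created in src right after the first read; 0 = none */
};

static size_t fake_read(void *opaque, int *tids, size_t max)
{
    struct fake_groups *g = opaque;
    size_t n = g->nsrc < max ? g->nsrc : max;

    memcpy(tids, g->src, n * sizeof(int));
    if (g->late) { /* a new thread shows up just after our snapshot */
        assert(g->nsrc < 8);
        g->src[g->nsrc++] = g->late;
        g->late = 0;
    }
    return n;
}

static int fake_move(void *opaque, int tid)
{
    struct fake_groups *g = opaque;

    /* Either way the tid leaves the source: it moved or it died. */
    for (size_t i = 0; i < g->nsrc; i++) {
        if (g->src[i] == tid) {
            g->src[i] = g->src[--g->nsrc];
            break;
        }
    }
    if (tid == g->doomed)
        return -ESRCH;
    assert(g->ndst < 8);
    g->dst[g->ndst++] = tid;
    return 0;
}

/* Runs the loop against the fake groups; returns how many tasks ended up
 * in the destination, or (size_t)-1 if anything was left behind. */
size_t demo_moved_count(void)
{
    struct fake_groups g = {
        .src = {100, 101, 102}, .nsrc = 3, .doomed = 101, .late = 103,
    };

    if (move_all_tasks(fake_read, fake_move, &g) < 0)
        return (size_t)-1;
    return g.nsrc == 0 ? g.ndst : (size_t)-1;
}
```

In the fake scenario one thread dies before it can be moved and another appears after the first snapshot; a single pass would miss the latecomer, while the loop picks it up on the second read.
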
     libvirt library code README
     ===========================

This directory provides the bulk of the libvirt codebase: everything
except for the libvirtd daemon and client tools. The build uses a
large number of libtool convenience libraries, one for each child
directory, and then links them together for the final libvirt.so,
although some bits get linked directly to the libvirtd daemon instead.

The files directly in this directory support the public API
entry points & data structures.

There are two core shared modules to be aware of:

 * util/  - a collection of shared APIs that can be used by any
            code. This directory is always in the include path
            for all things built

 * conf/  - APIs for parsing / manipulating all the official XML
            files used by the public API. This directory is only
            in the include path for driver implementation modules

 * vmx/   - VMware VMX config handling (used by esx/ and vmware/)

Then there are the hypervisor implementations:

 * esx/      - VMware ESX and GSX support using vSphere API over SOAP
 * hyperv/   - Microsoft Hyper-V support using WinRM
 * lxc/      - Linux Native Containers
 * openvz/   - OpenVZ containers using CLI tools
 * phyp/     - IBM Power Hypervisor using CLI tools over SSH
 * qemu/     - QEMU / KVM using qemu CLI/monitor
 * remote/   - Generic libvirt native RPC client
 * test/     - A "mock" driver for testing
 * uml/      - User Mode Linux
 * vbox/     - VirtualBox using native API
 * vmware/   - VMware Workstation and Player using the vmrun tool
 * xen/      - Xen using hypercalls, XenD SEXPR & XenStore
 * xenapi/   - Xen using libxenserver

Finally, there are some secondary drivers that are shared by several
hypervisors. Currently these are used by the LXC, OpenVZ, QEMU, UML
and Xen drivers. The ESX, Hyper-V, Power Hypervisor, Remote, Test &
VirtualBox drivers all implement the secondary drivers directly.

 * cpu/          - CPU feature management
 * interface/    - Host network interface management
 * network/      - Virtual NAT networking
 * nwfilter/     - Network traffic filtering rules
 * node_device/  - Host device enumeration
 * secret/       - Secret management
 * security/     - Mandatory access control drivers
 * storage/      - Storage management drivers

Since both the hypervisor and secondary drivers can be built as
dlopen()able modules, it is *FORBIDDEN* to have build dependencies
between these directories. Drivers are only allowed to depend on
the public API, and the internal APIs in the util/ and conf/
directories.