Commit Graph

221 Commits

Author SHA1 Message Date
Peter Krempa
7095006921 qemu: cgroup: Refactor setup for IOThread cgroups
Use the default or auto cpuset if they are provided for IOThreads.
2015-04-02 10:12:08 +02:00
Peter Krempa
c9f9fa25d3 qemu: cgroup: Store auto cpuset instead of re-creating it on demand
The automatic cpuset can be stored along with automatic nodeset and it
does not have to be recreated when used.
2015-04-02 10:12:08 +02:00
Martin Kletzander
3a0e5b0c20 qemu: Migrate memory on numatune change
We've never set the cpuset.memory_migrate value to anything, keeping it
on default.  However, we allow changing cpuset.mems on live domain.
That setting, however, don't have any consequence on a domain unless
it's going to allocate new memory.

I managed to make 'virsh numatune' move all the memory to any node I
wanted even without disabling libnuma's numa_set_membind(), so this
should be safe to use with it as well.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1198497

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-03-20 13:40:02 +01:00
John Ferlan
a9f528ab29 Convert virDomainPinDefPtr->vcpuid to virDomainPinDefPtr->id
Since we're not specifically a vcpu related structure anymore...
2015-03-16 11:54:57 -04:00
John Ferlan
59ba70237a Convert virDomainVcpuPinDefPtr to virDomainPinDefPtr
As pointed out by jtomko in his review of the IOThreads pinning code:

http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html

there are some comments sprinkled in indicating IOThreads were using
the same structure as the VcpuPin code...

This is the first patch of a few that will change the virDomainVcpuPin*
structures and code to just virDomainPin* - starting with the data
structure naming...
2015-03-16 11:54:56 -04:00
Pavel Hrdina
cf521fc8ba memtune: change the way how we store unlimited value
There was a mess in the way how we store unlimited value for memory
limits and how we handled values provided by user.  Internally there
were two possible ways how to store unlimited value: as 0 value or as
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED.  Because we chose to store memory
limits as unsigned long long, we cannot use -1 to represent unlimited.
It's much easier for us to say that everything greater than
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as valid
value despite that it makes no sense to set limit to 0.

Remove unnecessary function virCompareLimitUlong.  The update of test
is to prevent the 0 to be miss-used as unlimited in future.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-03-06 11:52:24 +01:00
Peter Krempa
6bc80fa86d conf: numa: Rename virDomainNumatune to virDomainNuma
The structure will gradually become the only place for NUMA related
config, thus rename it appropriately.
2015-02-20 17:43:04 +01:00
Pavel Hrdina
77a9dc0b8d qemu_cgroup: initialize mem_mask to NULL
If 'virNumaGetHostNodeset()' fails then the error path will try to free
uninitialized pointer mem_mask. Introduced by commit af2a1f058.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-02-17 14:22:50 +01:00
Daniel P. Berrange
f7afeddce9 qemu: report TAP device indexes to systemd
Record the index of each TAP device created and report them to
systemd, so they show up in machinectl status for the VM.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
318df5a05f Add support for systemd-machined CreateMachineWithNetwork
systemd-machined introduced a new method CreateMachineWithNetwork
that obsoletes CreateMachine. It expects to be given a list of
VETH/TAP device indexes for the host side device(s) associated
with a container/machine.

This falls back to the old CreateMachine method when the new
one is not supported.
2015-01-15 11:07:07 +00:00
Martin Kletzander
86759ec61a qemu: Add missing goto error in qemuRestoreCgroupState
Commit af2a1f05 tried clearly separating each condition in
qemuRestoreCgroupState() for the sake of readability, however somehow
one condition body was missing.  That means that the body of the next
condition got executed only if both of there were true, which is
impossible, thus resulting in a dead code and a logic error.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-12-16 20:44:33 +01:00
Martin Kletzander
af2a1f0587 qemu: Leave cpuset.mems in parent cgroup alone
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the cgroup alone and rather set the child data
every time there is new cgroup created.  We don't leave any task in the
parent group anyway.  This will ease both current and future code.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-12-16 11:15:27 +01:00
Martin Kletzander
c74d58ad47 qemu: Save numad advice into qemuDomainObjPrivate
Thanks to that we don't need to drag the pointer everywhere and future
code will get cleaner.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-12-16 11:15:27 +01:00
Martin Kletzander
f801a81208 qemu: Remove unnecessary qemuSetupCgroupPostInit function
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-12-16 11:15:27 +01:00
Martin Kletzander
5cca4cd16f Remove unnecessary curly brackets in src/qemu/
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-11-14 17:13:01 +01:00
Wang Rui
c6e9024867 qemu: fix domain startup failing with 'strict' mode in numatune
If the memory mode is specified as 'strict' and with one node, we
get the following error when starting domain.

error: Unable to write to '$cgroup_path/cpuset.mems': Device or resource busy

XML is configured with numatune as follows:
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

It's broken by Commit 411cea638f
which moved qemuSetupCgroupForEmulator() before setting cpuset.mems
in qemuSetupCgroupPostInit.

Directory '$cgroup_path/emulator/' is created in qemuSetupCgroupForEmulator.
But '$cgroup_path/emulator/cpuset.mems' it not set and has a default value
(all nodes, such as 0-1). Then we setup '$cgroup_path/cpuset.mems' to the
nodemask (in this case it's '0') in qemuSetupCgroupPostInit. It must fail.

This patch makes '$cgroup_path/emulator/cpuset.mems' is set before
'$cgroup_path/cpuset.mems'. The action is similar with that in
qemuDomainSetNumaParamsLive.

Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
2014-11-11 12:14:09 +01:00
Wang Rui
38a0f6df64 qemu: don't setup cpuset.mems if memory mode in numatune is not 'strict'
If the memory mode in numatune is specified as 'preferred' with one node
(such as nodeset='0'), domain's memory is not all in node 0 absolutely.
Assumption that node 0 doesn't have enough memory, memory can be allocated
on node 1 when qemu process startup. Then if we set cpuset.mems to '0',
it may invoke OOM.

Commit 1a7be8c600 changed the former logic of
checking memory mode in virDomainNumatuneGetNodeset. This patch adds the
check as before.

Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
2014-11-11 12:14:09 +01:00
Martin Kletzander
9661ac2f46 qemu: unref cfg after TerminateMachine has been called
Commit 4882618ed1 added the code that
requests driver cfg, but forgot to unref it.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-10-21 13:54:09 +02:00
Guido Günther
4882618ed1 qemu: use systemd's TerminateMachine to kill all processes
If we don't properly clean up all processes in the
machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm
starts fail with

  'CreateMachine: File exists'

Additional processes can e.g. be added via

  echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks

but there are other cases like

  http://bugs.debian.org/761521

Invoke TerminateMachine to be on the safe side since systemd tracks the
cgroup anyway. This is a noop if all processes have terminated already.
2014-10-01 20:17:46 +02:00
Ján Tomko
e26bbf49cc Fix crash cpu_shares change event crash on domain startup
Introduced by commit 0dce260.

qemuDomainEventQueue was called with qemuDomainObjPrivatePtr instead
of virQEMUDriverPtr.

https://bugzilla.redhat.com/show_bug.cgi?id=1147494
2014-09-29 13:58:43 +02:00
Daniel P. Berrange
0778c0be8d Rename tunable event constants
For the new VIR_DOMAIN_EVENT_ID_TUNABLE event we have a bunch of
constants added

   VIR_DOMAIN_EVENT_CPUTUNE_<blah>
   VIR_DOMAIN_EVENT_BLKDEVIOTUNE_<blah>

This naming convention is bad for two reasons

  - There is no common prefix unique for the events to both
    relate them, and distinguish them from other event
    constants

  - The values associated with the constants were chosen
    to match the names used with virConnectGetAllDomainStats
    so having EVENT in the constant name is not applicable in
    that respect

This patch proposes renaming the constants to

    VIR_DOMAIN_TUNABLE_CPU_<blah>
    VIR_DOMAIN_TUNABLE_BLKDEV_<blah>

ie, given them a common VIR_DOMAIN_TUNABLE prefix.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-09-26 10:58:15 +01:00
Pavel Hrdina
0dce260cc8 cputune_event: queue the event for cputune updates
Now we have universal tunable event so we can use it for reporting
changes to user. The cputune values will be prefixed with "cputune" to
distinguish it from other tunable events.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2014-09-23 21:58:09 +02:00
Ján Tomko
c1480871bb Fixes for domains with no iothreads
Plug a memory leak and silence a warning.
2014-09-18 14:49:01 +02:00
John Ferlan
500c91c57d qemu_cgroup: Adjust spacing around incrementor
Change "i+1" to "i + 1"
2014-09-15 21:05:46 -04:00
John Ferlan
5f6ad32c73 qemu_cgroup: Introduce cgroup functions for IOThreads
In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin
and qemuSetupCgroupForIOThreads to mimic the existing Vcpu API's.

These will support having an 'iotrhreadpin' element in the 'cpuset' in
order to pin named IOThreads to specific CPU's. The IOThread pin names
will follow the IOThread naming scheme starting at 1 (eg "iothread1")
up through an including the def->iothreads value.
2014-09-15 13:18:56 -04:00
Peter Krempa
1c6999d340 conf: RNG: Always fill in default random source path for default backend
Libvirt documents that the default entropy source for the 'random'
backend of a RNG device is /dev/random. Instead of storing and
propagating NULL across our code and checking it in multiple places fill
the default in the post parse callback and use that in the other places.
2014-07-28 10:07:09 +02:00
Peter Krempa
bbddbefa2f virtio-rng: allow multiple RNG devices
qemu supports adding multiple RNG devices. This patch allows libvirt to
support this.
2014-07-25 09:34:53 +02:00
Peter Krempa
99ff49eed1 qemu: cgroup: Don't use NULL path on default backed RNGs
The "random" backend for virtio-rng can be started with no path
specified which equals to /dev/random. The cgroup code didn't consider
this and called few of the functions with NULL resulting into:

 $ virsh start rng-vm
 error: Failed to start domain rng-vm
 error: Path '(null)' is not accessible: Bad address

Problem introduced by commit c6320d3463
2014-07-25 09:34:53 +02:00
John Ferlan
17bddc46f4 hostdev: Introduce virDomainHostdevSubsysSCSIiSCSI
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
2014-07-24 07:04:44 -04:00
John Ferlan
42957661dc hostdev: Introduce virDomainHostdevSubsysSCSIHost
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
2014-07-24 06:39:28 -04:00
John Ferlan
5805621cd9 hostdev: Introduce virDomainHostdevSubsysSCSI
Create a separate typedef for the hostdev union data describing SCSI
Then adjust the code to use the new pointer
2014-07-24 06:39:27 -04:00
John Ferlan
1c8da0d44e hostdev: Introduce virDomainHostdevSubsysPCI
Create a separate typedef for the hostdev union data describing PCI.
Then adjust the code to use the new pointer
2014-07-24 06:39:27 -04:00
John Ferlan
7540d07f09 hostdev: Introduce virDomainHostdevSubsysUSB
Create a separate typedef for the hostdev union data describing USB.
Then adjust the code to use the new pointer
2014-07-24 06:39:27 -04:00
Martin Kletzander
7e72ac7878 qemu: leave restricting cpuset.mems after initialization
When domain is started with numatune memory mode strict and the
nodeset does not include host NUMA node with DMA and DMA32 zones, KVM
initialization fails.  This is because cgroup restrict even kernel
allocations.  We are already doing numa_set_membind() which does the
same thing, only it does not restrict kernel allocations.

This patch leaves the userspace numa_set_membind() in place and moves
the cpuset.mems setting after the point where monitor comes up, but
before vcpu and emulator sub-groups are created.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-16 20:15:46 +02:00
Martin Kletzander
aa668fccf0 qemu: split out cpuset.mems setting
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-16 20:15:46 +02:00
Martin Kletzander
1a7be8c600 numatune: add support for per-node memory bindings in private APIs
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-16 20:15:45 +02:00
Martin Kletzander
93e82727ec numatune: Encapsulate numatune configuration in order to unify results
There were numerous places where numatune configuration (and thus
domain config as well) was changed in different ways.  On some
places this even resulted in persistent domain definition not to be
stable (it would change with daemon's restart).

In order to uniformly change how numatune config is dealt with, all
the internals are now accessible directly only in numatune_conf.c and
outside this file accessors must be used.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-16 20:15:45 +02:00
Martin Kletzander
e764ec7ae3 numatune: unify numatune struct and enum names
Since there was already public virDomainNumatune*, I changed the
private virNumaTune to match the same, so all the uses are unified and
public API is kept:

s/vir\(Domain\)\?Numa[tT]une/virDomainNumatune/g

then shrunk long lines, and mainly functions, that were created after
that:

sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g'

And to cope with the enum name, I haad to change the constants as
well:

s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g

Last thing I did was at least a little shortening of already long
name:

s/virDomainNumatuneDef/virDomainNumatune/g

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-16 20:15:45 +02:00
Martin Kletzander
0c04906fa8 qemu: don't error out when cgroups don't exist
When creating cgroups for vcpu and emulator threads whilst starting a
domain, we explicitly skip creating those cgroups in case priv->cgroup
is NULL (cgroups not supported) because SetAffinity() serves the same
purpose.  If the host supports only some cgroups (the ones we need are
either unmounted or disabled in qemu.conf), we error out with weird
message even though we could continue starting the domain.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1097028

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-07-09 15:09:54 +02:00
Peter Krempa
1ba14d6df2 qemu: cgroup: Setup only the top level disk image for read-write access
Only the top level gets writes, so the rest of the backing chain
requires only read-only access.
2014-07-09 10:38:55 +02:00
Peter Krempa
aa53c77e1d qemu: cgroup: Add functions to set cgroup image stuff on individual imgs
Add functions that will allow to set all the required cgroup stuff on
individual images taking a virStorageSourcePtr. Also convert functions
designed to setup whole backing chain to take advantage of the change.
2014-07-09 10:38:55 +02:00
Peter Krempa
63834faadb storage: Move readonly and shared flags to disk source from disk def
In the future we might need to track state of individual images. Move
the readonly and shared flags to the virStorageSource struct so that we
can keep them in a per-image basis.
2014-07-08 14:27:19 +02:00
Ján Tomko
d4edce5f1e Always report an error if virBitmapFormat fails
It already reports an error if STRDUP fails.
2014-06-06 14:35:19 +02:00
Michal Privoznik
4dae1eddde qemuSetupCgroupForVcpu: s/virProcessInfoSetAffinity/virProcessSetAffinity/
In the f56c773bf we've made the substitution but forgot to fix one
comment which is still referring to the old name. This may be
potentially misleading.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-05-22 12:30:20 +02:00
Nehal J Wani
3d5c29a17c Fix typos in src/*
Fix minor typos in source comments

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-04-21 16:49:08 -06:00
Daniel P. Berrange
edfe82c7f9 Replace Usb with USB throughout
Since it is an abbreviation, USB should always be fully
capitalized or full lower case, never Usb.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-04-08 11:10:59 +01:00
Daniel P. Berrange
21a2446d92 Replace Scsi with SCSI throughout
Since it is an abbreviation, SCSI should always be fully
capitalized or full lower case, never Scsi.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-04-08 11:10:31 +01:00
Ján Tomko
97814d8ab3 Show the real cpu shares value in live XML
Currently, the Linux kernel treats values of '0' and '1' as
the minimum of 2. Values larger than the maximum are changed
to the maximum.

Re-reading the shares value after setting it reflects this in
the live domain XML.
2014-03-26 10:10:13 +01:00
Ján Tomko
bdffab0d5c Treat zero cpu shares as a valid value
Currently, <cputune><shares>0</shares></cputune> is treated
as if it were not specified.

Treat is as a valid value if it was explicitly specified
and write it to the cgroups.
2014-03-26 10:10:02 +01:00
Ján Tomko
5922d05aec Indent top-level labels by one space in src/qemu/ 2014-03-25 14:58:39 +01:00
Daniel P. Berrange
2835c1e730 Add virLogSource variables to all source files
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-03-18 14:29:22 +00:00
Osier Yang
10c9ceff6d util: Add one argument for several scsi utils
To support passing the path of the test data to the utils, one
more argument is added to virSCSIDeviceGetSgName,
virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related
code is changed accordingly.

Later tests for the scsi utils will be based on this patch.

Signed-off-by: Osier Yang <jyang@redhat.com>
2014-01-30 15:48:28 +08:00
Pradipta Kr. Banerjee
c6320d3463 Add hw random number generator (/dev/hwrng) to cgroup ACL
Creating a qemu VM with /dev/hwrng as backend RNG device throws the
following error - "Could not open '/dev/hwrng': Permission denied"
This patch fixes the issue

Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2014-01-27 09:48:39 -07:00
Osier Yang
2b66504ded util: Add "shareable" field for virSCSIDevice struct
Unlike the host devices of other types, SCSI host device XML supports
"shareable" tag. This patch introduces it for the virSCSIDevice struct
for a later patch use (to detect if the SCSI device is shareable when
preparing the SCSI host device in QEMU driver).
2014-01-23 17:52:33 +08:00
Gao feng
3b431929a2 blkio: Setting throttle blkio cgroup for domain
This patch introduces virCgroupSetBlkioDeviceReadIops,
virCgroupSetBlkioDeviceWriteIops,
virCgroupSetBlkioDeviceReadBps and
virCgroupSetBlkioDeviceWriteBps,

we can use these interfaces to set up throttle
blkio cgroup for domain.

This patch also adds the new throttle blkio cgroup
elements to the test xml.

Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2014-01-20 10:52:44 +08:00
Gao feng
b9ce5d388f rename virBlkioDeviceWeightPtr to virBlkioDevicePtr
The throttle blkio cgroup will reuse this struct.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-12-12 12:29:59 +00:00
Eric Blake
5d509e9ee2 maint: fix comma style issues: qemu
Most of our code base uses space after comma but not before;
fix the remaining uses before adding a syntax check.

* src/qemu/qemu_cgroup.c: Consistently use commas.
* src/qemu/qemu_command.c: Likewise.
* src/qemu/qemu_conf.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/qemu/qemu_monitor.c: Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-11-20 09:14:55 -07:00
Cole Robinson
a924d9d083 qemu: cgroup: Fix crash if starting nographics guest
We can dereference graphics[0] even if guest has no graphics device
configured. I screwed this up in a216e64872

https://bugzilla.redhat.com/show_bug.cgi?id=1014088
2013-10-01 11:22:18 -04:00
Peter Krempa
4baa8d7637 cleanup: Kill usage of access(PATH, F_OK) in favor of virFileExists()
Semantics of the libvirt helper are more clear. This change also allows
to clean up some pieces of code.
2013-09-16 10:37:39 +02:00
Cole Robinson
a216e64872 qemu: Set QEMU_AUDIO_DRV=none with -nographic
On my machine, a guest fails to boot if it has a sound card, but not
graphical device/display is configured, because pulseaudio fails to
initialize since it can't access $HOME.

A workaround is removing the audio device, however on ARM boards there
isn't any option to do that, so -nographic always fails.

Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately
this has massive test suite fallout.

Add a qemu.conf parameter nographics_allow_host_audio, that if enabled
will pass through QEMU_AUDIO_DRV from sysconfig (similar to
vnc_allow_host_audio)
2013-09-02 16:53:39 -04:00
Michal Privoznik
94a24dd3a9 qemuSetupMemoryCgroup: Handle hard_limit properly
Since 16bcb3 we have a regression. The hard_limit is set
unconditionally. By default the limit is zero. Hence, if user hasn't
configured any, we set the zero in cgroup subsystem making the kernel
kill the corresponding qemu process immediately. The proper fix is to
set hard_limit iff user has configured any.
2013-08-20 15:03:17 +02:00
Michal Privoznik
16bcb3b616 qemu: Drop qemuDomainMemoryLimit
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
2013-08-19 11:16:58 +02:00
Daniel P. Berrange
1166eeba61 Fix crashing upgrading from older libvirts with running guests
If upgrading from a libvirt that is older than 1.0.5, we can
not assume that vm->def->resource is non-NULL. This bogus
assumption caused libvirtd to crash

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-08-02 15:32:26 +01:00
Daniel P. Berrange
2fe2470181 Enable support for systemd-machined in cgroups creation
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:29:19 +01:00
Daniel P. Berrange
5ec5a22493 Add 'controllers' arg to virCgroupNewDetect
When detecting cgroups we must honour any controllers
whitelist the driver may have.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:47 +01:00
Daniel P. Berrange
a45b99ead9 Introduce a more convenient virCgroupNewDetectMachine
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:47:30 +01:00
Daniel P. Berrange
02098ac260 Convert QEMU driver to use virCgroupNewMachine
Convert the QEMU driver code to use the new atomic API
for setup of cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 11:42:47 +01:00
Daniel P. Berrange
2049ef9942 Create + setup cgroups atomically for QEMU process
Currently the QEMU driver creates the VM's cgroup prior to
forking, and then uses a virCommand hook to move the child
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.

Fortunately we have a handshake taking place between the
QEMU driver and the child process prior to QEMU being exec()d,
which was introduced to allow setup of disk locking. By good
fortune this synchronization point can be used to enable the
QEMU driver to do atomic setup of cgroups removing the use
of the hook script.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
87b2e6fa84 Auto-detect existing cgroup placement
Use the new virCgroupNewDetect function to determine cgroup
placement of existing running VMs. This will allow the legacy
cgroups creation APIs to be removed entirely

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
0d7f45aea7 Convert remainder of cgroups code to report errors
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
b64dabff27 Report full errors from virCgroupNew*
Instead of returning raw errno values, report full libvirt
errors in virCgroupNew* functions.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Peter Krempa
bac2182041 qemu: Cleanup coding style nits in qemu_cgroup.c 2013-07-18 14:58:12 +02:00
Osier Yang
a39f69d2bb qemu: Set cpuset.cpus for domain process
When either "cpuset" of <vcpu> is specified, or the "placement" of
<vcpu> is "auto", only setting the cpuset.mems might cause the guest
starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is
"auto"):

1) Related XMLs
  <vcpu placement='auto'>4</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

2) Host NUMA topology
  % numactl --hardware
  available: 8 nodes (0-7)
  node 0 cpus: 0 4 8 12 16 20 24 28
  node 0 size: 16374 MB
  node 0 free: 11899 MB
  node 1 cpus: 32 36 40 44 48 52 56 60
  node 1 size: 16384 MB
  node 1 free: 15318 MB
  node 2 cpus: 2 6 10 14 18 22 26 30
  node 2 size: 16384 MB
  node 2 free: 15766 MB
  node 3 cpus: 34 38 42 46 50 54 58 62
  node 3 size: 16384 MB
  node 3 free: 15347 MB
  node 4 cpus: 3 7 11 15 19 23 27 31
  node 4 size: 16384 MB
  node 4 free: 15041 MB
  node 5 cpus: 35 39 43 47 51 55 59 63
  node 5 size: 16384 MB
  node 5 free: 15202 MB
  node 6 cpus: 1 5 9 13 17 21 25 29
  node 6 size: 16384 MB
  node 6 free: 15197 MB
  node 7 cpus: 33 37 41 45 49 53 57 61
  node 7 size: 16368 MB
  node 7 free: 15669 MB

4) cpuset.cpus will be set as: (from debug log)

2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
to '0-63'

5) The advisory nodeset got from querying numad (from debug log)

2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
Nodeset returned from numad: 1

6) cpuset.mems will be set as: (from debug log)

2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
to '0-7'

I.E, the domain process's memory is restricted on the first NUMA node,
however, it can use all of the CPUs, which will likely cause the domain
process to fail to start because of the kernel fails to allocate
memory with the the memory policy as "strict".

% tail -n 20 /var/log/libvirt/qemu/toy.log
...
2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 :
Handshake with parent is done
char device redirected to /dev/pts/2 (label charserial0)
kvm_init_vcpu failed: Cannot allocate memory
...

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2013-07-18 14:57:57 +02:00
Daniel P. Berrange
50760e2a8a Convert 'int i' to 'size_t i' in src/qemu files
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-10 17:55:15 +01:00
Michal Privoznik
e987a30dfa Adapt to VIR_ALLOC and virAsprintf in src/qemu/* 2013-07-10 11:07:32 +02:00
Jiri Denemark
e0e438af00 qemu: Move memory limit computation to a reusable function 2013-07-08 12:35:27 +02:00
Laine Stump
1d829e1306 pci: rename virPCIDeviceGetVFIOGroupDev to virPCIDeviceGetIOMMUGroupDev
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).

I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
2013-06-25 18:07:38 -04:00
Osier Yang
8da9516a84 qemu: Abstract code for the cpu controller setting into a helper 2013-06-05 19:25:48 +08:00
Michal Privoznik
a88fb3009f Adapt to VIR_STRDUP and VIR_STRNDUP in src/qemu/* 2013-05-23 09:56:38 +02:00
Osier Yang
66194f71df src/qemu: Remove the whitespace before ';' 2013-05-21 23:41:44 +08:00
Osier Yang
58f8e0cd58 qemu: Don't remove the "return 0"
Commit f60a50c795 intended to remove the warning only, but not with
the "return 0" together.
2013-05-21 23:08:57 +08:00
Osier Yang
479d5991cd qemu: Abstract code for cpuset controller setting into a helper 2013-05-20 19:57:00 +08:00
Osier Yang
9f2455d359 qemu: Abstract code for devices controller setting into a helper 2013-05-20 19:52:35 +08:00
Osier Yang
f60a50c795 qemu: Abstract code for memory controller setting into a helper 2013-05-20 19:39:54 +08:00
Osier Yang
2fd16df7b5 qemu: Abstract the code for blkio controller setting into a helper 2013-05-20 19:24:45 +08:00
Daniel P. Berrange
c2cf5f1c2a Fix failure to detect missing cgroup partitions
Change bbe97ae968 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-17 10:25:15 +01:00
Jim Fehlig
bbe97ae968 Fix starting domains when kernel has no cgroups support
Found that I was unable to start existing domains after updating
to a kernel with no cgroups support

  # zgrep CGROUP /proc/config.gz
  # CONFIG_CGROUPS is not set
  # virsh start test
  error: Failed to start domain test
  error: Unable to initialize /machine cgroup: Cannot allocate memory

virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
attempting to open /proc/cgroups on such a system, but it was being
dropped in virCgroupSetPartitionSuffix().

Change virCgroupSetPartitionSuffix() to propagate errors returned by
its callees.  Also check for ENOENT in qemuInitCgroup() when determining
if cgroups support is available.
2013-05-13 09:27:46 -06:00
Han Cheng
6eb42e38e8 qemu: Allow the scsi-generic device in cgroup
This adds the scsi-generic device into the device controller's
whitelist, so that it's allowed to used by the qemu process.

Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
2013-05-13 19:08:34 +08:00
Laine Stump
52ba0f6e1c qemu: fix stupid typos in VFIO cgroup setup/teardown
I must have looked at this a couple dozen times before I noticed it
had "!=" instead of "==". Not doing this setup prevented qemu from
doing anything with the vfio group device.
2013-05-03 14:32:54 -04:00
Michal Privoznik
7c9a2d88cd virutil: Move string related functions to virstring.c
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
2013-05-02 16:56:55 +02:00
Laine Stump
811143c0b6 qemu: put usb cgroup setup in common function
The USB-specific cgroup setup had been inserted inline in
qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a
common cgroup setup function called for all hostdevs, so it makes sens
to put the usb-specific setup there and just rely on that function
being called.

The one thing I'm uncertain of here (and a reason for not pushing
until after release) is that previously hostdev->missing was checked
only when starting a domain (and cgroup setup for the device skipped
if missing was true), but with this consolidation, it is now checked
in the case of hotplug as well. I don't know if this will have any
practical effect (does it make sense to hotplug a "missing" usb
device?)
2013-04-29 21:52:28 -04:00
Laine Stump
6e13860cb4 qemu: add vfio devices to cgroup ACL when appropriate
PCIO device assignment using VFIO requires read/write access by the
qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the
VFIO group number that the assigned device belongs to (and can be
found with the function virPCIDeviceGetVFIOGroupDev)

/dev/vfio/vfio can be accessible to any guest without danger
(according to vfio developers), so it is added to the static ACL.

The group device must be dynamically added to the cgroup ACL for each
vfio hostdev in two places:

1) for any devices in the persistent config when the domain is started
   (done during qemuSetupCgroup())

2) at device attach time for any hotplug devices (done in
   qemuDomainAttachHostDevice)

The group device must be removed from the ACL when a device it
"hot-unplugged" (in qemuDomainDetachHostDevice())

Note that USB devices are already doing their own cgroup setup and
teardown in the hostdev-usb specific function. I chose to make the new
functions generic and call them in a common location though. We can
then move the USB-specific code (which is duplicated in two locations)
to this single location. I'll be posting a followup patch to do that.
2013-04-29 21:52:28 -04:00
Daniel P. Berrange
1e05073fbb Replace more cases of /system with /machine
The change in commit aed4986322
was incomplete, missing a couple of cases of /system. This
caused failure to start VMs.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-22 17:11:36 +01:00
Daniel P. Berrange
aed4986322 Change default resource partition to /machine
After discussions with systemd developers it was decided that
a better default policy for resource partitions is to have
3 default partitions at the top level

   /system   - system services
   /machine - virtual machines / containers
   /user    - user login session

This ensures that the default policy isolates guest from
user login sessions & system services, so a mis-behaving
guest can't consume 100% of CPU usage if other things are
contending for it.

Thus we change the default partition from /system to
/machine

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-22 12:10:12 +01:00
Daniel P. Berrange
767596bdb4 Remove non-functional code for setting up non-root cgroups
The virCgroupNewDriver method had a 'bool privileged' param.
If a false value was ever passed in, it would simply not
work, since non-root users don't have any privileges to create
new cgroups. Just delete this broken code entirely and make
the QEMU driver skip cgroup setup in non-privileged mode

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
db44eb1b5f Change default cgroup layout for QEMU/LXC and honour XML config
Historically QEMU/LXC guests have been placed in a cgroup layout
that is

   $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME

This is bad for a number of reasons

 - The cgroup hierarchy gets very deep which seriously
   impacts kernel performance due to cgroups scalability
   limitations.

 - It is hard to setup cgroup policies which apply across
   services and virtual machines, since all VMs are underneath
   the libvirtd service.

To address this the default cgroup location is changed to
be

    /system/$VMNAME.{lxc,qemu}.libvirt

This puts virtual machines at the same level in the hierarchy
as system services, allowing consistent policy to be setup
across all of them.

This also honours the new resource partition location from the
XML configuration, for example

  <resource>
    <partition>/virtualmachines/production</partitions>
  </resource>

will result in the VM being placed at

    /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt

NB, with the exception of the default, /system, path which
is intended to always exist, libvirt will not attempt to
auto-create the partitions in the XML. It is the responsibility
of the admin/app to configure the partitions. Later libvirt
APIs will provide a way todo this.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
aa8604dd45 Add a new virCgroupNewPartition for setting up resource partitions
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
04c18d25f1 Rename virCgroupForXXX to virCgroupNewXXX
Rename all the virCgroupForXXX methods to use the form
virCgroupNewXXX since they are all constructors. Also
make sure the output parameter is the last one in the
list, and annotate all pointers as non-null. Fix up
all callers, and make sure they use true/false not 0/1
for the boolean parameters

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
632f78caaf Store a virCgroupPtr instance in qemuDomainObjPrivatePtr
Instead of calling virCgroupForDomain every time we need
the virCgrouPtr instance, just do it once at Vm startup
and cache a reference to the object in qemuDomainObjPrivatePtr
until shutdown of the VM. Removing the virCgroupPtr from
the QEMU driver state also means we don't have stale mount
info, if someone mounts the cgroups filesystem after libvirtd
has been started

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Stefan Berger
22feb0d3e7 QEMU Cgroup support for TPM passthrough
Some refactoring for virDomainChrSourceDef type of devices so
we can use common code.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
2013-04-12 16:55:46 -04:00