Commit Graph

21027 Commits

Author SHA1 Message Date
Henning Schild
90b721e43e qemu cgroups: move new threads to new cgroup after cpuset is set up
Moving tasks to cgroups implied sched_setaffinity. Changing the cpus in
a set implies the same for all tasks in the group.
The old code put the the thread into the cpuset inherited from the
machine cgroup, which allowed it to run outside of vcpupin for a short
while.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:58:05 -05:00
Henning Schild
a41c00b472 qemu: do not put a task into machine cgroup
The machine cgroup is a superset, a parent to the emulator and vcpuX
cgroups. The parent cgroup should never have any tasks directly in it.
In fact the parent cpuset might contain way more cpus than the sum of
emulatorpin and vcpupins. So putting tasks in the superset will allow
them to run outside of <cputune>.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:48:05 -05:00
Henning Schild
71ce475967 util: cgroups do not implicitly add task to new machine cgroup
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicit anymore.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:43:29 -05:00
Michal Privoznik
65e3451ea9 virNetDevMacVLanTapSetup: Drop @multiqueue argument
Firstly, there's a bug (or typo) in the only place where we call
this function: @multiqueue is set whenever @tapfdSize is greater
than zero, while in fact the condition should have been 'greater
than one'.
Then, secondly, since the condition depends on just one
variable, that we are even passing down to the function, we can
move the condition into the function and drop useless argument.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-14 15:58:18 +01:00
Martin Kletzander
686eb7a24f qemu: Warn when using vhost-user without shared memory
When user configures vhost-user interface and forgets to also configure
any shared memory, the search for the root cause of non-operational
interface might take unpleasantly long time.  Let's enhance user
experience by emitting a warning in the logs.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1266982

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-12-14 08:54:19 +01:00
Michal Privoznik
ec93cc25ec virNetDevMacVLanTapSetup: Work around older systems
Some older systems, e.g. RHEL-6 do not have IFF_MULTI_QUEUE flag
which we use to enable multiqueue feature. Therefore one gets the
following compile error there:

  CC     util/libvirt_util_la-virnetdevmacvlan.lo
util/virnetdevmacvlan.c: In function 'virNetDevMacVLanTapSetup':
util/virnetdevmacvlan.c:338: error: 'IFF_MULTI_QUEUE' undeclared (first use in this function)
util/virnetdevmacvlan.c:338: error: (Each undeclared identifier is reported only once
util/virnetdevmacvlan.c:338: error: for each function it appears in.)
make[3]: *** [util/libvirt_util_la-virnetdevmacvlan.lo] Error 1

So, whenever user wants us to enable the feature on such systems,
we will just throw a runtime error instead.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-13 08:35:46 +01:00
Eric Blake
034e47c338 CVE-2015-5313: storage: don't allow '/' in filesystem volume names
The libvirt file system storage driver determines what file to
act on by concatenating the pool location with the volume name.
If a user is able to pick names like "../../../etc/passwd", then
they can escape the bounds of the pool.  For that matter,
virStoragePoolListVolumes() doesn't descend into subdirectories,
so a user really shouldn't use a name with a slash.

Normally, only privileged users can coerce libvirt into creating
or opening existing files using the virStorageVol APIs; and such
users already have full privilege to create any domain XML (so it
is not an escalation of privilege).  But in the case of
fine-grained ACLs, it is feasible that a user can be granted
storage_vol:create but not domain:write, and it violates
assumptions if such a user can abuse libvirt to access files
outside of the storage pool.

Therefore, prevent all use of volume names that contain "/",
whether or not such a name is actually attempting to escape the
pool.

This changes things from:

$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
Vol ../../../../../../etc/haha created
$ rm /etc/haha

to:

$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
error: Failed to create vol ../../../../../../etc/haha
error: Requested operation is not valid: volume name '../../../../../../etc/haha' cannot contain '/'

Signed-off-by: Eric Blake <eblake@redhat.com>
2015-12-11 16:34:53 -07:00
John Ferlan
afe73ed468 util: Fixup virnetdevmacvlan.h ATTRIBUTE_NONNULL's
Commit id '56e2171c6' removed a variable from the argument list, but
neglected to update the ATTRIBUTE_NONNULL values, so when commit id
'08da97bfb' added a couple of arguments, the values were off.
2015-12-11 07:16:16 -05:00
Peter Krempa
ace1ee225f test: qemuxml2argv: Mock virMemoryMaxValue to remove 32/64 bit difference
Always return LLONG_MAX even on 32 bit systems. The limitation
originates from our use of "unsigned long" in several APIs. The internal
data type is unsigned long long. Make the test suite deterministic by
removing the architecture difference.

Flaw was introduced in 645881139b where
I've added a test that uses too large numbers.
2015-12-11 12:23:38 +01:00
Michal Privoznik
81a110edc7 qemu: Enable multiqueue for macvtaps
https://bugzilla.redhat.com/show_bug.cgi?id=1240439

Ta-da! Now that we know how to open a macvtap device multiple
times, we can finally enable the multiqueue feature. Everything
else is already prepared (e.g. command line generation) from the
previous iteration where the feature was implemented for
TUN/TAP devices.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:44:44 +01:00
Michal Privoznik
08da97bfb9 virNetDevMacVLanCreateWithVPortProfile: Rework to support multiple FDs
For the multiqueue on macvtaps we are going to need to open
the device multiple times. Currently, this is not supported.
Rework the function, so that upper layers can be reworked too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:44:43 +01:00
Michal Privoznik
1e90c744d5 virNetDevMacVLanTapSetup: Allow enabling of IFF_MULTI_QUEUE
Like we are doing for TUN/TAP devices, we should do the same for
macvtaps. Although, it's not as critical as in that case, we
should do it for the consistency.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:44:39 +01:00
Michal Privoznik
136fe2f7cc virNetDevMacVLanTapSetup: Rework to support multiple FDs
For the multiqueue on macvtaps we are going to need to open
the device multiple times. Currently, this is not supported.
Rework the function, so that upper layers can be reworked too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:42:50 +01:00
Michal Privoznik
d36897c765 virNetDevMacVLanTapOpen: Rework to support multiple FDs
For the multiqueue on macvtaps we are going to need to open
the device multiple times. Currently, this is not supported.
Rework the function, so that upper layers can be reworked too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:42:50 +01:00
Michal Privoznik
025a87065f virNetDevMacVLanTapOpen: Slightly rework
There are few outdated things. Firstly, we don't need to undergo
the torture of fopen, fscanf and fclose just to get the interface
index when we have nice wrapper over that: virNetDevGetIndex.
Secondly, we don't need to have statically allocated buffer for
the path we are opening.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:42:49 +01:00
Michal Privoznik
56e2171c6f virNetDevMacVLanCreateWithVPortProfile: Turn vnet_hdr into flag
So yet again one of integer arguments that we use as a boolean.
Since the argument count of the function is unbearably long
enough, lets turn those booleans into flags.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-11 08:42:49 +01:00
Daniel P. Berrange
1ce929603b log: include hostname in initial log message
On the very first log message we send to any output, we include
the libvirt version number and package string. In some bug reports
we have been given libvirtd.log files that came from a different
host than the corresponding /var/log/libvirt/qemu log files. So
extend the initial log message to include the hostname too.

eg on first log message we would now see:

 $ libvirtd
 2015-12-04 17:35:36.610+0000: 20917: info : libvirt version: 1.3.0
 2015-12-04 17:35:36.610+0000: 20917: info : hostname: dhcp-1-180.lcy.redhat.com
 2015-12-04 17:35:36.610+0000: 20917: error : qemuMonitorIO:687 : internal error: End of file from monitor

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-12-10 18:05:49 +00:00
John Ferlan
a523770c32 storage: Ignore block devices that fail format detection
https://bugzilla.redhat.com/show_bug.cgi?id=1276198

Prior to commit id '98322052' failure to saferead the block device would
cause an error to be logged and the device to be skipped while attempting
to discover/create a stable target path for a new LUN (NPIV).

This was because virStorageBackendSCSIFindLUs ignored errors from
processLU and virStorageBackendSCSINewLun.

Ignoring the failure allowed a multipath device with an "active" and
"ghost" to be present on the host with the "ghost" block device being
ignored. This patch will return a -2 to the caller indicating the desire
to ignore the block device since it cannot be used directly rather than
fail the pool startup.
2015-12-09 16:31:15 -05:00
John Ferlan
b3df72c4dd storage: Add debug message
I found this useful while processing a volume that wouldn't end up
showing up in the resulting list of block volumes. In this case, the
partition type wasn't found in the disk_types table.
2015-12-09 16:31:14 -05:00
John Ferlan
1bc84b0a08 storage: Handle readflags errors
Similar to the openflags VIR_STORAGE_VOL_OPEN_NOERROR processing, if some
read processing operation fails, check the readflags for the corresponding
error flag being set. If so, rather then causing an error - use VIR_WARN
to flag the error, but return -2 which some callers can use to perform
specific actions. Use a new VIR_STORAGE_VOL_READ_NOERROR flag in a new
VolReadErrorMode enum.
2015-12-09 16:31:14 -05:00
John Ferlan
1edfce9b18 storage: Set ret = -1 on failures in virStorageBackendUpdateVolTargetInfo
While processing the volume for lseek, virFileReadHeaderFD, and
virStorageFileGetMetadataFromBuf - failure would cause an error,
but ret would not be set. That would result in an error message being
sent, but successful status being returned.
2015-12-09 16:31:14 -05:00
John Ferlan
af4028dccd storage: Add comments for backend APIs
Just so it's clearer what to expect upon input and what types of return
values could be generated.  These were loosely copied from existing
virStorageBackendUpdateVolTargetInfoFD.
2015-12-09 16:31:14 -05:00
John Ferlan
22346003dc storage: Add readflags for backend error processing
Similar to the openflags which allow VIR_STORAGE_VOL_OPEN_NOERROR to be
passed to avoid open errors, add a 'readflags' variable so that in the
future read failures could also be ignored.
2015-12-09 16:31:14 -05:00
Andrea Bolognani
8df2f1d874 tests: scsihost: Use fakerootdir instead of fakesysfsdir
This updates the test program to make it consistent with recent changes
to the mock libraries, and also opens up the possibility of mocking more
than just /sys in the future.
2015-12-09 15:22:59 +01:00
Andrea Bolognani
d2939c0f4d tests: Use more specific names for variables
Instead of fakesysfsdir, which is very generic, use fakesysfspcidir and
fakesysfscgroupdir. This makes it explicit what part of the fake sysfs
filesystem they're referring to, and also leaves open the possibility of
handling files in two unrelated parts of the fake sysfs filesystem.

No functional changes.
2015-12-09 15:22:58 +01:00
Andrea Bolognani
46468c62ff tests: Rename LIBVIRT_FAKE_SYSFS_DIR to LIBVIRT_FAKE_ROOT_DIR
The old name is no longer accurate, since now we're using its value as
the root of the fake filesystem.

No functional changes.
2015-12-09 15:22:58 +01:00
Andrea Bolognani
bc80c45d5d tests: cgroupmock: Use the temporary directory as fake root
We might need to mock files living outside SYSFS_PREFIX later on,
so it's better to treat the temporary directory we are passed via
the environment as the root of the fake filesystem and create
SYSFS_PREFIX inside it.

The environment variable name will be changed to reflect the new use
we're making of it in a later commit.
2015-12-09 15:22:58 +01:00
Andrea Bolognani
f94398e72e tests: pcimock: Use the temporary directory as fake root
We might need to mock files living outside PCI_SYSFS_PREFIX later on,
so it's better to treat the temporary directory we are passed via
the environment as the root of the fake filesystem and create
PCI_SYSFS_PREFIX inside it.

The environment variable name will be changed to reflect the new use
we're making of it in a later commit.
2015-12-09 15:22:58 +01:00
Andrea Bolognani
b76d936670 tests: pcimock: Remove check for fakesysfsdir
init_env() will return right away if fakesysfsdir is already
initialized, so this check is redundant.
2015-12-09 15:22:58 +01:00
Andrea Bolognani
8e3122faf5 tests: scsihost: Don't set LIBVIRT_FAKE_SYSFS_DIR
The test program is not preloading any of the mock libraries that read
that environment variable, so setting it is pointless.
2015-12-09 15:22:58 +01:00
Peter Krempa
8715120e4d qemu: cgroup: Don't use priv->ncpupids to iterate domain vCPUs
Use the proper data structures for the iteration since ncpupids will be
made private later.
2015-12-09 14:57:12 +01:00
Peter Krempa
ce43cca0eb qemu: driver: Refactor qemuDomainHelperGetVcpus
Change some of the control structures and switch to using the new vcpu
structure.
2015-12-09 14:57:12 +01:00
Peter Krempa
e6b36736a8 qemu: Add helper to retrieve vCPU pid
Instead of directly accessing the array add a helper to do this.
2015-12-09 14:57:12 +01:00
Peter Krempa
220a2d51de qemu: Replace checking for vcpu<->pid mapping availability with a helper
Add qemuDomainHasVCpuPids to do the checking and replace in place checks
with it.

We no longer need checking whether the thread contains fake data
(vcpupids[0] == vm->pid) as in b07f3d821d
and 65686e5a81 this was removed.
2015-12-09 14:57:12 +01:00
Peter Krempa
e4bf9a3bcc qemu: Drop checking vcpu threads in emulator bandwidth getter/setter
The vCPU threads make sense in the counterparts that set the vCPU
bandwidth/quota, not in the emulator one. The emulator tunables are set
all the time anyways.

Drop the extra check and remove the now unneeded vm argument.
2015-12-09 14:57:12 +01:00
Peter Krempa
6ba02c21ac qemu: cgroup: Remove now unreachable check
Since commit 0c04906fa the check for priv->cgroup doesn't make sense as
the calls to virCgroupHasController return the same information. Remove
it and move it's comment partially to the new check.

The already spurious check was also later copied to the iothreads code.
2015-12-09 14:57:12 +01:00
Peter Krempa
233c3ac861 conf: Add helper to get pointer to a certain vCPU definition
Once more stuff will be moved into the vCPU data structure it will be
necessary to get a specific one in some ocasions. Add a helper that will
simplify this task.
2015-12-09 14:57:12 +01:00
Peter Krempa
24a7beea5a conf: ABI: Split up and improve vcpu info ABI checking
Extract the checking code into a separate function and prepare the
infrastructure for checking the new structure type.
2015-12-09 14:57:12 +01:00
Peter Krempa
4e86838d89 conf: turn def->vcpus into a structure
To allow collecting all relevant data at one place let's make def->vcpus
a structure and then we can start moving stuff into it.
2015-12-09 14:57:12 +01:00
Peter Krempa
9d5ac29eef qemu: refactor qemuDomainHotunplugVcpus
Refactor the code flow so that 'exit_monitor:' can be removed.

This patch moves the auditing functions into places where it's certain
that hotunplug was or was not successful and reports errors from
qemuMonitorGetCPUInfo properly.
2015-12-09 14:57:12 +01:00
Peter Krempa
de3db7d27f qemu: Refactor qemuDomainHotplugVcpus
Refactor the code flow so that 'exit_monitor:' can be removed.

This patch also moves the auditing and setting of the new vCPU count
right to the place where the hotplug happens, since it's possible that
the hotplug succeeds and adds a cpu while other stuff fails.

Lastly, failures of qemuMonitorGetCPUInfo are now reported rather than
ignored. The function retuns 0 if it "successfully" detected 0 threads.
2015-12-09 14:57:12 +01:00
Peter Krempa
3b3b98056d qemu: cpu hotplug: Move loops to qemuDomainSetVcpusFlags
qemuDomainHotplugVcpus/qemuDomainHotunplugVcpus are complex enough in
regards of adding one CPU. Additionally it will be desired to reuse
those functions later with specific vCPU hotplug.

Move the loops for adding vCPUs into qemuDomainSetVcpusFlags so that the
helpers can be made simpler and more straightforward.
2015-12-09 14:57:12 +01:00
Peter Krempa
7912d87920 qemu: monitor: Remove weird return values from qemuMonitorSetCPU
Let the function report errors internally and change it to return
standard return codes.
2015-12-09 14:57:12 +01:00
Peter Krempa
8cf65dabf2 qemu: cpu hotplug: Fix error handling logic
The cpu hotplug helper functions used negative error handling in a part
of them, although some code that was added later didn't properly set the
error codes in some cases. This would cause improper error messages in
cases where we couldn't modify the numa cpu mask and a few other cases.

Fix the logic by converting it to the regularly used pattern.
2015-12-09 14:57:12 +01:00
Peter Krempa
bb1d8d7a84 qemu: Split up vCPU hotplug and hotunplug
There's only very little common code among the two operations. Split the
functions so that the internals are easier to understand and refactor
later.
2015-12-09 14:57:12 +01:00
Peter Krempa
2642a36db5 qemu: qemuDomainSetVcpusAgent: re-check agent before calling it the again
With a very unfortunate timing, the agent might vanish before we do the
second call while the locks were down. Re-check that the agent is
available before attempting it again.
2015-12-09 14:57:12 +01:00
Peter Krempa
da6620ffac qemu: Extract vCPU onlining/offlining via agent into a separate function
Separate the code so that qemuDomainSetVcpusFlags contains only code
relevant to hardware hotplug/unplug.
2015-12-09 14:57:12 +01:00
Peter Krempa
31fea86564 qemu: domain: Add helper to access vm->privateData->agent
As in commit 88dc7e0c2f, the helper can be used in cases where the
function actually does not access anyting in the private data besides
the agent.
2015-12-09 14:57:12 +01:00
Peter Krempa
80a59b4aae conf: Turn def->maxvcpus into size_t
Later on this will also be used to track size of the vcpu data array.
Use size_t so that we can utilize the memory allocation helpers.
2015-12-09 14:57:12 +01:00
Peter Krempa
71c89ac9df conf: Replace read accesses to def->vcpus with accessor 2015-12-09 14:57:12 +01:00