CC libvirt_util_la-vircommand.lo
../../src/util/vircommand.c:2358:1: error: 'virCommandHandshakeChild' defined but not used [-Werror=unused-function]
The function is only implemented inside #ifndef WIN32.
* src/util/vircommand.c (virCommandHandshakeChild): Hoist earlier,
so that win32 build doesn't hit an unused forward declaration.
virCommand was previously calling virSetUIDGID() to change the uid and
gid of the child process, then separately calling
virSetCapabilities(). This did not work if the desired uid was != 0,
since a setuid to anything other than 0 normally clears all
capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the
uid, gid, and capabilities bits. This will get the new process setup
properly.
Since the static functions virSetCapabilities() and
virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch
will make CAP_SYS_RAWIO (which is required for passthrough of generic
scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and
74e0349) be retained by qemu when necessary. Apparently that
capability has been broken for non-root qemu ever since it was
originally added.
Normally when a process' uid is changed to non-0, all the capabilities
bits are cleared, even those explicitly set with calls to
capng_update()/capng_apply() made immediately before setuid. And
*after* the process' uid has been changed, it no longer has the
necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities
bits, it is necessary to either call capng_change_id() (which
unfortunately doesn't currently call initgroups to setup auxiliary
group membership), or to perform the small amount of calisthenics
contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities
setting/clearing in virSetUIDGIDWithCaps() and virCommand's
virSetCapabilities() (which it will replace in the next patch) is that
the new function properly clears the capabilities bounding set, so it
will not be possible for a child process to set any new
capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in
capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed
to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current
capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities
maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to
setuid/setgid/change the bounding set (unless they were also requested
by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled
with setting the uid, this is necessarily done in a single function,
rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at
time of execve) for a process that has uid != 0, the *file*
capabilities of the binary being executed must also have the desired
capabilities bit(s) set (see "man 7 capabilities"). This can be done
with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio").
This is an interim measure to make sure everything still works in this
order. The next step will be to perform capabilities drop and
setuid/gid as a single operation (which is the only way to keep any
capabilities when switching to a non-root uid).
virCommand gets two new APIs: virCommandSetSELinuxLabel() and
virCommandSetAppArmorProfile(), which both save a copy of a
null-terminated string in the virCommand. During virCommandRun, if the
string is non-NULL and we've been compiled with AppArmor and/or
SELinux security driver support, the appropriate security library
function is called for the child process, using the string that was
previously set. In the case of SELinux, setexeccon_raw() is called,
and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use
the upcoming virSecurityManagerSetChildProcessLabel() prior to running
a child process, rather than needing to setup a hook function to be
called (and in turn call virSecurityManagerSetProcessLabel()) *during*
the setup of the child process.
Rather than treating uid:gid of 0:0 as a NOP, we blindly pass that
through to the lower layers. However, we *do* check for a requested
value of "-1" to mean "don't change this setting". setregid() and
setreuid() already interpret -1 as a NOP, so this is just an
optimization, but we are also calling getpwuid_r and initgroups, and
it's unclear what the former would do with a uid of -1.
If a uid and/or gid is specified for a command, it will be set just
after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set
uid/gid. This moves the setting of uid/gid and dropping of
capabilities closer to each other, which is important since the two
should really be done at the same time (libcapng provides a single
function that does both, which we will be unable to use, but want to
mimic as closely as possible).
All args except "cmd" in the call to virExec are now redundant, since
they can all be found in cmd, so remove the args and reference the
data directly in cmd. One exception to this is that "infd" was being
modified within virExec, and modifying the original in cmd caused make
check failures, so cmd->infd is copied to a local, and the local is
used during virExec().
virExecWithHook is only called from one place, so it always has the
same "hook" function (virHookCommand), and the data sent to that
function is always a virCommandPtr, so eliminate the function and
generic data from the arglist, and replace it with "virCommandPtr
cmd". The call to (hook)(data) is replaced with
"virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the
next patch, which will remove all other args to virExec (since they
are now redundant, as they're all members of virCommandPtr).
Currently, if a command wants to do asynchronous IO, a callback
is registered in the libvirtd eventloop to handle writes and
reads. However, there's a race in virCommandWait. The eventloop
may already be executing the callback, while virCommandWait is
mangling internal state of virCommand. To deal with it, we need
to either introduce locking or spawn a separate thread where we
poll() on stdio from child. The former, however, requires to
unlock all mutexes held, as the event loop may execute other
callbacks which tries to lock one of the mutexes, deadlock and
thus never wake us up. So it's safer to spawn a separate thread.
This makes code easier to read, by avoiding lines longer than
80 columns and removing the repetition from the callers.
* src/util/virstoragefile.c (qedGetHeaderUL, qedGetHeaderULL):
Delete in favor of more generic macros.
(qcow2GetBackingStoreFormat, qcowXGetBackingStore)
(qedGetBackingStore, virStorageFileMatchesVersion)
(virStorageFileGetMetadataInternal): Use new macros.
* src/cpu/cpu_x86.c (x86VendorLoad): Likewise.
We have several cases where we need to read endian-dependent
data regardless of host endianness; rather than open-coding
these call sites, it will be nicer to funnel things through
a macro.
The virendian.h file can be expanded to add writer functions,
and/or 16-bit access patterns, if needed. Also, if we need
to turn things into a function to avoid multiple evaluations
of buf, that can be done later. But for now, a macro worked.
* src/util/virendian.h: New file.
* src/Makefile.am (UTIL_SOURCES): Ship it.
* tests/virendiantest.c: New test.
* tests/Makefile.am (test_programs, virendiantest_SOURCES): Run
the test.
* .gitignore: Ignore built file.
Instead of creating an iptables command in one shot, do it in steps
so we can add conditional options like physdev and protocol.
This removes code duplication while keeping existing behaviour.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are requesting for stderr catching for all cases in
virFileWrapperFdNew(). There is no need to have a separate
function just to report an error, esp. when we can do it in
virFileWrapperFdClose().
We had an easy way to iterate set bits, but not for iterating
cleared bits.
* src/util/virbitmap.h (virBitmapNextClearBit): New prototype.
* src/util/virbitmap.c (virBitmapNextClearBit): Implement it.
* src/libvirt_private.syms (bitmap.h): Export it.
* tests/virbitmaptest.c (test4): Test it.
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 34e8f63a32 introduced support for catching errors from
libvirt iohelper. However, at those times there wasn't such fancy
API as virCommandDoAsyncIO(), so everything has to be implemented
on our own. But since we do have the API now, we can use it and
drop our implementation then.
Currently, if we want to feed stdin, or catch stdout or stderr of a
virCommand we have to use virCommandRun(). When using virCommandRunAsync()
we have to register FD handles by hand. This may lead to code duplication.
Hence, introduce an internal API, which does this automatically within
virCommandRunAsync(). The intended usage looks like this:
virCommandPtr cmd = virCommandNew*(...);
char *buf = NULL;
...
virCommandSetOutputBuffer(cmd, &buf);
virCommandDoAsyncIO(cmd);
if (virCommandRunAsync(cmd, NULL) < 0)
goto cleanup;
...
if (virCommandWait(cmd, NULL) < 0)
goto cleanup;
/* @buf now contains @cmd's stdout */
VIR_DEBUG("STDOUT: %s", NULLSTR(buf));
...
cleanup:
VIR_FREE(buf);
virCommandFree(cmd);
Note, that both stdout and stderr buffers may change until virCommandWait()
returns.
QEMU is fully capable of handling VDI images and we just refuse to
work with them. As qemu-img knows and supports this, there should be
no problem with this addition.
This is of course, just basic functionality, without searching for any
backing files, etc.
Some files have the magic shifted to some offset other than 0, so we
have to support that. I also cleaned up some lines to be more
readable and added missing magic for iso file format.
Commit 6094ad7b (0.9.3 release) promoted several functions from
internal to public, but forgot to fix the documentation generator
to provide details about those functions.
For an example of what this fixes, look at:
file:///path/to/libvirt/docs/html/libvirt-libvirt.html#virEventAddHandle
before and after the patch.
* docs/apibuild.py (ignored_functions): Don't ignore functions
that were turned into official API.
* src/util/virevent.c: Fix comments to pass through parser.
Way back when I started making changes for Coverity messages my first set
were to a bunch of CHECKED_RETURN errors. In particular virAsprintf() had
a few callers that Coverity noted didn't check their return (although some
did check if the buffer being printed to was NULL or not).
It was suggested at the time as a further patch an ATTRIBUTE_RETURN_CHECK
should be added to virAsprintf(), see:
https://www.redhat.com/archives/libvir-list/2013-January/msg00120.html
This patch does that and fixes a few more instances not found by Coverity
that failed the check.
Setting the log output prefix to 0 is not supported and in fact results
in the following message:
warning : virLogParseOutputs:1021 : Ignoring invalid log output setting.
This patch changes the name of the @sep argument to @terminator and
clarifies it's usage. This patch also explicitly documents that
whitespace can't be used as @terminator as it is skipped multiple times
in the implementation.
When building with static analysis enabled, we turn on attribute
nonnull checking. However, this caused the build to fail with:
../../src/util/virobject.c: In function 'virObjectOnceInit':
../../src/util/virobject.c:55:40: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Creation of the virObject class is the one instance where the
parent class is allowed to be NULL. Making things conditional
will let us keep static analysis checking for all other .c file
callers, without breaking the build on this one exception.
* src/util/virobject.c: Define witness.
* src/util/virobject.h (virClassNew): Use it to force most callers
to pass non-null parameter.
The Coverity static analyzer was generating many false positives for the
unary operation inside the VIR_FREE() definition as it was trying to evaluate
the else portion of the "?:" even though the if portion was (1).
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, whenever somebody calls saferead() on nonblocking FD
(safewrite() is totally interchangeable for purpose of this message)
he might get wrong return value. For instance, in the first iteration
some data is read. The number of bytes read is stored into local
variable 'nread'. However, in next iterations we can get -1 from
read() with errno == EAGAIN, in which case the -1 is returned despite
fact some data has already been read. So the caller gets confused.
Bare read() should be used for nonblocking FD.
Working with virTypedParameters in clients written in C is ugly and
requires all clients to duplicate the same code. This set of APIs makes
this code for manipulating with virTypedParameters integral part of
libvirt so that all clients may benefit from it.
A build on FreeBSD failed with:
util/virportallocator.c:108: error: storage size of 'addr' isn't known
util/virportallocator.c:123: error: 'INADDR_ANY' undeclared (first use in this function)
It turns out that while POSIX allows sockaddr_in to leak in through
<arpa/inet.h> (the way Linux does it), it is not mandatory, and
conforming applications are required to get it through <netinet/in.h>.
* src/util/virportallocator.c: Include header for struct
sockaddr_in.
* tests/virportallocatortest.c: Likewise.
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
A great many virObject instances require a mutex, so introduce
a convenient class for this which provides a mutex. This avoids
repeating the tedious init/destroy code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make cpuset local to the while loop and free it once done with it each
time through the loop. Add a sa_assert() to virBitmapParse() to keep Coverity
from believing there could be a negative return and possible resource leak.
Commit c308a9ae was incomplete; it resolved the configure failure,
but not a later build failure.
* src/util/virnetdevbridge.c: Include pre-req header.
* configure.ac (AC_CHECK_HEADERS): Prefer standard in.h over
non-standard ip6.h.
There's no need to do lots of readlink() calls to canonicalize
a name if we're only going to use stat() on it, since stat()
already chases symlinks.
* src/util/virutil.c (virGetDeviceID): Let stat() do the symlink
chasing.
Pass stub driver name directly to pciDettachDevice and pciReAttachDevice to fit
for different libvirt drivers. For example, qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
"virGetDeviceID" could be used across the sources, but it doesn't
relate with this series, and could be done later.
* src/util/virutil.h: (Declare virGetDeviceID, and
vir{Get,Set}DeviceUnprivSGIO)
* src/util/virutil.c: (Implement virGetDeviceID and
vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of upper helpers
This is an adjustment to the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=889319
to account for two bonehead mistakes I made.
commit ac2797cf2a attempted to fix a
problem with netlink in newer kernels requiring an extra attribute
with a filter flag set in order to receive an IFLA_VFINFO_LIST from
netlink. Unfortunately, the #ifdef that protected against compiling it
in on systems without the new flag went a bit too far, assuring that
the new code would *never* be compiled, and even if it had, the code
was incorrect.
The first problem was that, while some IFLA_* enum values are also
their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
checking to see if it's #defined is not a valid method of determining
whether or not to add the attribute. Fortunately, the flag that is
being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
And to top it off, due to the code not actually compiling when I
thought it did, I didn't realize that I'd been given the wrong arglist
to nla_put() - you can't just send a const value to nla_put, you have
to send it a pointer to memory containing what you want to add to the
message, along with the length of that memory.
This time I've actually sent the patch over to the other machine
that's experiencing the problem, applied it to the branch being used
(0.10.2) and verified that it works properly, i.e. it does fix the
problem it's supposed to fix. :-/
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=889319
When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit
115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
https://bugzilla.redhat.com/show_bug.cgi?id=889319
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
Revert the complex workaround of commit 39d91e9, now that we have
a nicer framework for shutting up broken gcc.
* src/util/buf.c (virBufferEscape): Simplify.
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
Introduce a 'virArch' enum for CPU architectures. Include
data type providing wordsize and endianness, and APIs to
query this info and convert to/from enum and string form.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This is yet another refinement to the fix for CVE-2012-3411:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
It turns out that it would be very intrusive to correctly backport the
entire --bind-dynamic option to older dnsmasq versions
(e.g. dnsmasq-2.48 that is used on RHEL6.x and CentOS 6.x), but very
simple to patch those versions to just use SO_BINDTODEVICE on all
their listening sockets (SO_BINDTODEVICE also has the desired effect
of permitting only traffic that was received on the interface(s) where
dnsmasq was set to listen.)
This patch modifies the dnsmasq capabilities detection to detect the
string:
--bind-interfaces with SO_BINDTODEVICE
in the output of "dnsmasq --version", and in that case realize that
using the old --bind-interfaces option is just as safe as
--bind-dynamic (and therefore *not* forbid creation of networks that
use public IP address ranges).
If -bind-dynamic is available, it is still preferred over
--bind-interfaces.
Note that this patch does no harm in upstream, or in any distro's
downstream if it happens to end up there, but builds for distros that
have a new enough dnsmasq to support --bind-dynamic do *NOT* need to
specifically backport this patch; it's only required for distro
releases that have dnsmasq too old to have --bind-dynamic (and those
distros will need to add the SO_BINDTODEVICE patch to dnsmasq,
*including the extra string in the --version output*, as well.
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There was a double free issue caused by virSysinfoRead on s390,
as the same manufacturer string instance was assigned to more
than one processor record.
Cleaned up other potential memory issues and restructured the sysinfo
parsing code by moving repeating patterns into a helper function.
The restructuring made it necessary to conditionally disable
-Wlogical-op for some older GCC versions, using pragma GCC diagnostic.
This is a GCC specific pragma, which is acceptable, since we're
using it to work around a GCC specific bug.
Finally, added a function virSysinfoSetup to configure the sysinfo
data source files/script during run time, to facilitate writing test
programs. This function is not published in sysinfo.h and only
there for testing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The current virStorageFileGet{LVM,SCSI}Key methods return
the key as the return value. Unfortunately it is desirable
for "NULL" to be a valid return value, as well as an error
indicator. Thus the returned key must instead be provided
as an out-parameter.
When we invoke lvs or scsi_id to extract ID for block devices,
we don't want virCommandWait logging errors messages. Thus we
must explicitly check 'status != 0', rather than letting
virCommandWait do it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QED file format is non-versioned, so although the magic
value matched, libvirt rejected it due to lack of a version
number to compare against. We need to distinguish this case
by allowing a value of '-2' to indicate a non-versioned file
where only the magic is required to match
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To help us detect when new storage file versions come into
existance log a warning if the storage file magic matches,
but the version does not
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fully stub out the virCgroupGetAppRoot method as done with other
methods in the file, rather than just the body. This lets us
annotate the unused parameter to avoid a warning
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
This patch gets rid of the undeterministic error reporting code done on
return values of get(pw|gr)nam_r. With this patch, if the group record
is not returned by the corresponding function this error is not
considered fatal even if errno != 0. The error is logged in such case.
virStorageFileGetLVMKey and virStorageFileGetSCSIKey
both return heap allocated strings, so the return value
should not be marked const.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This will be used whenever a NIC with guaranteed throughput is to
be plugged into a bridge. It will adjust the average throughput of
non guaranteed NICs (classid 1:2) to meet new requirements.
These set bridge part of QoS when bringing domain's interface up.
Long story short, if there's a 'floor' set, a new QoS class is created.
ClassID MUST be unique within the bridge and should be kept for
unplug phase.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
This is however supported only on domain interfaces with
type='network'. Moreover, target network needs to have at least
inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average'
(old) or 'floor' (new). This new attribute can be used just for
interfaces type of network (<interface type='network'/>) currently.
Stochastic Fairness Queuing (SFQ) is queuing discipline
(qdisc) which doesn't really shape any traffic but 'just'
re-arrange packets in sending buffer so no stream starve.
The goal is to ensure fairness. There is basically only one
configuration parameter (perturb) which is set to advised
value of 10.
The DHCPv6 support includes IPV6 dhcp-range and dhcp-host for one
IPv6 subnetwork on one interface. This support will only work
if dnsmasq version >= 2.64; otherwise an error occurs if
dhcp-range or dhcp-host is specified for an IPv6 address.
Essentially, this change provides the same DHCP support for IPv6
that has been available for IPv4.
With dnsmasq >= 2.64, support for the RA service is also now provided
by dnsmasq (radvd is no longer used/started). (Although at least one
version of dnsmasq prior to 2.64 "supported" IPv6 Router
Advertisement, there were bugs (fixed in 2.64) that rendered it
unusable.)
Documentation and the network schema has been updated
to reflect the new support.
I noticed when writing the backend functions for virNetworkUpdate that
I was repeating the same sequence of memmove, VIR_REALLOC, nXXX-- (and
messed up the args to memmove at least once), and had seen the same
sequence in a lot of other places, so I decided to write a few
utility functions/macros - see the .h file for full documentation.
The intent is to reduce the number of lines of code, but more
importantly to eliminate the need to check the element size and
element count arithmetic every time we need to do this (I *always*
make at least one mistake.)
VIR_INSERT_ELEMENT: insert one element at an arbitrary index within an
array of objects. The size of each object is determined
automatically by the macro using sizeof(*array). The new element's
contents are copied into the inserted space, then the original copy
of contents are 0'ed out (if everything else was
successful). Compile-time assignment and size compatibility between
the array and the new element is guaranteed (see explanation below
[*])
VIR_INSERT_ELEMENT_COPY: identical to VIR_INSERT_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_APPEND_ELEMENT: This is just a special case of VIR_INSERT_ELEMENT
that "inserts" one past the current last element.
VIR_APPEND_ELEMENT_COPY: identical to VIR_APPEND_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_DELETE_ELEMENT: delete one element at an arbitrary index within an
array of objects. It's assumed that the element being deleted is
already saved elsewhere (or cleared, if that's what is appropriate).
All five of these macros have an _INPLACE variant, which skips the
memory re-allocation of the array, assuming that the caller has
already done it (when inserting) or will do it later (when deleting).
Note that VIR_DELETE_ELEMENT* can return a failure, but only if an
invalid index is given (index + amount to delete is > current array
size), so in most cases you can safely ignore the return (that's why
the helper function virDeleteElementsN isn't declared with
ATTRIBUTE_RETURN_CHECK). A warning is logged if this ever happens,
since it is surely a coding error.
[*] One initial problem with the INSERT and APPEND macros was that,
due to both the array pointer and newelem pointer being cast to void*
when passing to virInsertElementsN(), any chance of type-checking was
lost. If we were going to move in newelem with a memmove anyway, we
would be no worse off for this. However, most current open-coded
insert/append operations use direct struct assignment to move the new
element into place (or just populate the new element directly) - thus
use of the new macros would open a possibility for new usage errors
that didn't exist before (e.g. accidentally sending &newelemptr rather
than newelemptr - I actually did this quite a lot in my test
conversions of existing code).
But thanks to Eric Blake's clever thinking, I was able to modify the
INSERT and APPEND macros so that they *do* check for both assignment
and size compatibility of *ptr (an element in the array) and newelem
(the element being copied into the new position of the array). This is
done via clever use of the C89-guaranteed fact that the sizeof()
operator must have *no* side effects (so an assignment inside sizeof()
is checked for validity, but not actually evaluated), and the fact
that virInsertElementsN has a "# of new elements" argument that we
want to always be 1.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
virGetGroupIDByName is documented as returning 1 if the groupname
cannot be found. getgrnam_r is documented as returning:
« 0 or ENOENT or ESRCH or EBADF or EPERM or ... The given name
or gid was not found. »
and that:
« The formulation given above under "RETURN VALUE" is from POSIX.1-2001.
It does not call "not found" an error, hence does not specify what
value errno might have in this situation. But that makes it impossible to
recognize errors. One might argue that according to POSIX errno should be
left unchanged if an entry is not found. Experiments on various UNIX-like
systems shows that lots of different values occur in this situation: 0,
ENOENT, EBADF, ESRCH, EWOULDBLOCK, EPERM and probably others. »
virGetGroupIDByName returns an error when the return value of getgrnam_r
is non-0. However on my RHEL system, getgrnam_r returns ENOENT when the
requested user cannot be found, which then causes virGetGroupID not
to behave as documented (it returns an error instead of falling back
to parsing the passed-in value as an gid).
This commit makes virGetGroupIDByName only report an error when errno
is set to one of the values in the posix description of getgrnam_r
(which are the same as the ones described in the manpage on my system).
virGetUserIDByName is documented as returning 1 if the username
cannot be found. getpwnam_r is documented as returning:
« 0 or ENOENT or ESRCH or EBADF or EPERM or ... The given name
or uid was not found. »
and that:
« The formulation given above under "RETURN VALUE" is from POSIX.1-2001.
It does not call "not found" an error, hence does not specify what
value errno might have in this situation. But that makes it impossible to
recognize errors. One might argue that according to POSIX errno should be
left unchanged if an entry is not found. Experiments on various UNIX-like
systems shows that lots of different values occur in this situation: 0,
ENOENT, EBADF, ESRCH, EWOULDBLOCK, EPERM and probably others. »
virGetUserIDByName returns an error when the return value of getpwnam_r
is non-0. However on my RHEL system, getpwnam_r returns ENOENT when the
requested user cannot be found, which then causes virGetUserID not
to behave as documented (it returns an error instead of falling back
to parsing the passed-in value as an uid).
This commit makes virGetUserIDByName only report an error when errno
is set to one of the values in the posix description of getpwnam_r
(which are the same as the ones described in the manpage on my system).
If debugging is enabled, the debug messages are sent to stderr.
Moreover, if a command has catching of stderr set, the messages
gets mixed with stdout output (assuming both outputs are stored
in the same variable). The resulting string then doesn't
necessarily have to start with desired prefix then. This bug
exposes itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06
11:11:02.232+0000: 18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version doesn't
start with expected "Dnsmasq version " string but a libvirt debug
output.
If the debugging is enabled, the virCommand subsystem catches debug
messages in the command output as well. In that case, we can't assume
the string corresponding to command's stdout will start with specific
prefix. But the prefix can be moved deeper in the string. This bug
shows itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06 11:11:02.232+0000:
18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version
doesn't start with expected "Dnsmasq version " string but a libvirt
debug output.
The pciWrite32 function assembled the array of data to be written to the
fd with a bad offset on the last byte. This issue was probably caused by
a typo (14, 24).
This introduces a few new APIs for dealing with strings.
One to split a char * into a char **, another to join a
char ** into a char *, and finally one to free a char **
There is a simple test suite to validate the edge cases
too. No more need to use the horrible strtok_r() API,
or hand-written code for splitting strings.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To be able todo controlled shutdown/reboot of containers an
API to talk to init via /dev/initctl is required. Fortunately
this is quite straightforward to implement, and is supported
by both sysvinit and systemd. Upstart support for /dev/initctl
is unclear.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This new function returns true if the given address is in the range of
any "private" or "local" networks as defined in RFC1918 (IPv4) or
RFC3484/RFC4193 (IPv6), otherwise they return false.
These ranges are:
192.168.0.0/16
172.16.0.0/16
10.0.0.0/24
FC00::/7
FEC0::/10
In order to optionally take advantage of new features in dnsmasq when
the host's version of dnsmasq supports them, but still be able to run
on hosts that don't support the new features, we need to be able to
detect the version of dnsmasq running on the host, and possibly
determine from the help output what options are in this dnsmasq.
This patch implements a greatly simplified version of the capabilities
code we already have for qemu. A dnsmasqCaps device can be created and
populated either from running a program on disk, reading a file with
the concatenated output of "dnsmasq --version; dnsmasq --help", or
examining a buffer in memory that contains the concatenated output of
those two commands. Simple functions to retrieve capabilities flags,
the version number, and the path of the binary are also included.
bridge_driver.c creates a single dnsmasqCaps object at driver startup,
and disposes of it at driver shutdown. Any time it must be used, the
dnsmasqCapsRefresh method is called - it checks the mtime of the
binary, and re-runs the checks if the binary has changed.
networkxml2argvtest.c creates 2 "artificial" dnsmasqCaps objects at
startup - one "restricted" (doesn't support --bind-dynamic) and one
"full" (does support --bind-dynamic). Some of the test cases use one
and some the other, to make sure both code pathes are tested.
Found by coverity:
Error: REVERSE_INULL (CWE-476):
libvirt-0.10.2/src/util/processinfo.c:141: deref_ptr: Directly
dereferencing pointer "map".
libvirt-0.10.2/src/util/processinfo.c:142: check_after_deref:
Null-checking "map" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
because libvirt_lxc's cgroup mountpoint is what it shown
in /proc/self/cgroup.
we can get container's cgroup through virCgroupNew("/", &group),
add interface virCgroupGetAppRoot to help container to
get it's cgroup.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
virCgroupGetMemSwapUsage is used to get container's swap usage,
with this interface,we can get swap usage in fuse filesystem.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
This bug leads to getting incorrect vcpupin information via
qemudDomainGetVcpuPinInfo() API when the number of maximum
cpu on a host falls into a range such as 31 < ncpus < 64.
gcc warning:
left shift count >= width of type
The following bug is such the case
https://bugzilla.redhat.com/show_bug.cgi?id=876415
In virNetDevVethDelete the virRun method will properly report
errors, but when checking the exit status for non-zero exit
code no error is reported
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When failing to create a macvlan interface, make sure the
error message contains the name of the host interface
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It is possible for there to be deleted timers when we
calculate the next timeout, and they must be skipped.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The event code is a no-op if requested to update a non-existent
timer/handle watch. This makes it hard to detect bugs in the
caller who have passed bogus data. Add a VIR_WARN output in
such cases, since the API does not allow for return errors.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The docs for virDiskNameToIndex claim it ignores partition
numbers. In actual fact though, a code ordering bug means
that a partition number will cause the code to accidentally
multiply the result by 26.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit e0c469e58b that fixes the detection
of image chain wasn't complete. Iteration through the backing image
chain has to stop at the last existing image if some of the images are
missing otherwise the backing chain that is cached contains entries with
paths being set to NULL resulting to:
error: Unable to allow access for disk path (null): Bad address
Fortunately stat() is kind enough not to crash when it's presented with
a NULL argument. At least on Linux.