When building with static analysis enabled, we turn on attribute
nonnull checking. However, this caused the build to fail with:
../../src/util/virobject.c: In function 'virObjectOnceInit':
../../src/util/virobject.c:55:40: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Creation of the virObject class is the one instance where the
parent class is allowed to be NULL. Making things conditional
will let us keep static analysis checking for all other .c file
callers, without breaking the build on this one exception.
* src/util/virobject.c: Define witness.
* src/util/virobject.h (virClassNew): Use it to force most callers
to pass non-null parameter.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
For the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
The Coverity static analyzer was generating many false positives for the
unary operation inside the VIR_FREE() definition as it was trying to evaluate
the else portion of the "?:" even though the if portion was (1).
Signed-off-by: Eric Blake <eblake@redhat.com>
The count of vCPUs for a domain is extracted as a usingned long variable
but is stored in a unsigned short. If the actual number was too large,
a faulty number was stored.
The local redefinition of PED_PARTITION_PROTECTED results in the error
but is not a problem especially if the built code doesn't have the latest
definitions.
Upon successful return of virNetClientStreamEventAddCallback() the
allocated cbdata field will be freed by virNetClientStreamEventRemoveCallback()
as cbOpaque using the free function remoteStreamCallbackFree().
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
The use of switch statements inside a bounded for loop resulted in some
false positives regarding the "default:" label which cannot be reached
since each of the other case statements use the possible for loop values.
A [dead_error_begin] was added before the default label.
Commit id ebdbe25a adjusted the algorithm and the caller guarantees that
the 'params' will have a '_' in the name being searched. Add the [returned_null]
tag to the two instances.
The use of switch statements inside a bounded for loop resulted in some
false positives regarding the "default:" label which cannot be reached
since each of the other case statements use the possible for loop values.
Commit id a994ef2d1 changed the mechanism to store/update the default
security label from using disk->seclabels[0] to allocating one on the
fly. That change allocated the label, but never saved it. This patch
will save the label. The new virDomainDiskDefAddSecurityLabelDef() is
a copy of the virDomainDefAddSecurityLabelDef().
The code is not reachable as of commit id: bb85f229. Removed
virKeepAliveStop() and virObjectUnref() because 'ka' cannot be
anything but NULL at the cleanup label.
Currently, whenever somebody calls saferead() on nonblocking FD
(safewrite() is totally interchangeable for purpose of this message)
he might get wrong return value. For instance, in the first iteration
some data is read. The number of bytes read is stored into local
variable 'nread'. However, in next iterations we can get -1 from
read() with errno == EAGAIN, in which case the -1 is returned despite
fact some data has already been read. So the caller gets confused.
Bare read() should be used for nonblocking FD.
The snapshot name is used to create path to the definition save file.
When the name contains slashes the creation of the file fails. Reject
such names.
When the snapshot definition can't be saved, the
qemuDomainSnapshotCreate function succeeded without filling some of the
fields in the internal definition.
This patch removes the snapshot and returns failure if the XML file
cannot be written.
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Working with virTypedParameters in clients written in C is ugly and
requires all clients to duplicate the same code. This set of APIs makes
this code for manipulating with virTypedParameters integral part of
libvirt so that all clients may benefit from it.
When virStorageBackendLogicalCreateVol() creates a snapshot for a
logical volume with backingStore element, it fails with the message
below:
2013-01-17 03:10:18.869+0000: 1967: error : virCommandWait:2345 :
internal error Child process (/sbin/lvcreate --name lvm-snapshot -L 51200K
-s=/dev/lvm-pool/lvm-volume) unexpected exit status 3: /sbin/lvcreate:
invalid option -- '=' Error during parsing of command line.
This is because virCommandAddArgPair() uses '=' to connect the two
parameters, it's unsuitable for -s option of the lvcreate.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
A build on FreeBSD failed with:
util/virportallocator.c:108: error: storage size of 'addr' isn't known
util/virportallocator.c:123: error: 'INADDR_ANY' undeclared (first use in this function)
It turns out that while POSIX allows sockaddr_in to leak in through
<arpa/inet.h> (the way Linux does it), it is not mandatory, and
conforming applications are required to get it through <netinet/in.h>.
* src/util/virportallocator.c: Include header for struct
sockaddr_in.
* tests/virportallocatortest.c: Likewise.
The fetch of 'ipdef' in networkRefreshDhcpDaemon() when the loop to fill
in ipv4def fails to find an ipv4 address with dhcp defined. The filled in
ipdef value was not used. Code was made unnecessary with commit it 2d5cd1.
The driver mutex was unlocked in qemuDomainModifyDeviceFlags before
entering qemuDomainObjBeginJobWithDriver where it will be unlocked once
more leaving it in an undefined state. The result was that two
threads were simultaneously looking up the domain hash table during
multiple parallel device attach/detach operations.
Luckily this triggered a virHashIterationError.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When starting a VM, /var/log/messages was spammed with the following message:
xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
With each extra VM I start, the messages get amplified
exponentially. This results in longer starting times every new VM,
relative the the previously started VM. When I ran a test with
starting 100 equal VM's, the first VM started in about 2 seconds, the
100th VM took 48 seconds to start. I'm running a vanilla 3.7.1 kernel,
but I have the same issue on VM hosts with kernel 3.2.28 or 3.2.0,
running libvirt 0.9.12 and 0.9.8 respectively.
Looking into the warning, it seemed that iptables need an extra argument,
--physdev-is-bridged, in commands like:
iptables -A libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet99 -g FP-vnet99
With that, the warnings in /var/log/messages are gone and running the
test again proved the 100th VM started in 3.8 seconds.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=895294
The symptom was that attempts to modify a network device using
virDomainUpdateDeviceFlags() would fail if the original device had a
<boot> element (e.g. "<boot order='1'/>"), even if the updated device
had the same <boot> element. Instead, the following error would be logged:
cannot modify network device boot index setting
It's true that it's not possible to change boot order (internally
known as bootIndex) of a live device; qemuDomainChangeNet checks for
that, but the problem was that the information it was checking was
incorrect.
Explanation:
When a complete domain is parsed, a global (to the domain) "bootMap"
is passed down to the parse for each device; the bootMap is used to
make sure that devices don't have conflicting settings for their boot
orders.
When a single device is parsed by itself (as in the case of
virDomainUpdateDeviceFlags), there is no global bootMap that would be
appropriate to send, so NULL is sent instead. However, although the
lowest level function that parses just the boot order *does* simply
skip the sanity check in that case, the next higher level
"virDomainDeviceInfoParseXML" function refuses to call down to the
lower "virDomainDeviceBootParseXML" if bootMap is NULL. So, the boot
order is never set in the "new" device object, and when it is compared
to the original (which does have a boot order), they don't match.
The fix is to patch virDomainDeviceInfoParseXML to not care about
bootMap, and just always call virDomainDeviceInfoBootParseXML whenever
there is a <boot> element. When we are only parsing a single device,
we don't care whether or not any specified boot order is consistent
with the rest of the domain; we will always do this check later (in
the current case, we do it by verifying that the net bootIndex exactly
matches the old bootIndex).
The bandwidth plug and unplug functions were assuming that an
interface's bandwidth setting was always specified directly in the
domain's <interface> definition, but that's not necessarily true - it
could have been obtained from a <portgroup> definition in the network
definition. This patch fixes those functions to use
virDomainNetGetActualBandwidth(), which gets the bandwidth pointer
from iface->data.network.actual if it exists, otherwise returns
iface->bandwidth.
Remove extraneous check for 'netdef' when dereferencing for vlan.nTags.
Prior code would already check if netdef was NULL.
Coverity complained about a path where the 'vlan' was potentially valid,
but a prior checks may not have allocated 'iface->data.network.actual',
so like other paths it needs to be allocated on the fly.
Move the copying of vlan up earlier in networkAllocateActualDevice, so
that actual.type gets properly set.
Since the first assignment to vlan is redundant except in the case of
jumping immediately to validate from the start of the function,
eliminate its initial setting at the top of the function in favor of
calling the helper function virDomainNetGetActualVlan() (which doesn't
depend on the local vlan pointer being initialized) down at validate:
Signed-off-by: Laine Stump <laine@redhat.com>
When creating the virClass object for virNetClient, we specified
virObject as the parent instead of virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
that broke the build like:
CC libvirt_conf_la-domain_conf.lo
conf/domain_conf.c: In function 'virDomainVcpuPinAdd':
conf/domain_conf.c:11920:29: error: 'vpcupin' undeclared (first use in this function)
conf/domain_conf.c:11920:29: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [libvirt_conf_la-domain_conf.lo] Error 1
Commit dfa1e1dd added functions whose definitions do not conform
to the style used in the libxl driver. Change these functions to
be consistent throughout the driver.
In commit c4bbaaf8, caps->arch was checked uninitialized, rendering the
whole check useless.
This patch moves the conditional setting of QEMU_CAPS_NO_ACPI to
qemuCapsInitQMP, and removes the no longer needed exception for S390.
It also clears the flag for all non-x86 archs instead of just S390 in
qemuCapsInitHelp.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Check status when attempting to set SO_REUSEADDR flag on outgoing connection
On failure, VIR_WARN(), but continue to connect. This code path is on the
sender side where the setting is just a hint and would only take effect if
the sender is overflowed with TCP connections. Inability to set doesn't mean
failure to establish a connection.
In virLockSpaceProtocolDispatchNew() the returned value of lockspace from
virLockDaemonFindLockSpace() is overwritten by the virLockSpaceNew() return.
Coverity complains that it's unused.
In virLockSpaceProtocolDispatchCreateLockSpace() lockspace is also overwritten
in a similar manner resulting in the same Coverity message.
After live change of cpu counts, the number of processor threads is
verified. This patch makes use of this approach to check if qemu ignored
the request for cpu hot-unplug and report an appropriate message.
A great many virObject instances require a mutex, so introduce
a convenient class for this which provides a mutex. This avoids
repeating the tedious init/destroy code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
libvirt lxc will fail to start when selinux is disabled.
error: Failed to start domain noroot
error: internal error guest failed to start: PATH=/bin:/sbin TERM=linux container=lxc-libvirt container_uuid=b9873916-3516-c199-8112-1592ff694a9e LIBVIRT_LXC_UUID=b9873916-3516-c199-8112-1592ff694a9e LIBVIRT_LXC_NAME=noroot /bin/sh
2013-01-09 11:04:05.384+0000: 1: info : libvirt version: 1.0.1
2013-01-09 11:04:05.384+0000: 1: error : lxcContainerMountBasicFS:546 : Failed to mkdir /sys/fs/selinux: No such file or directory
2013-01-09 11:04:05.384+0000: 7536: info : libvirt version: 1.0.1
2013-01-09 11:04:05.384+0000: 7536: error : virLXCControllerRun:1466 : error receiving signal from container: Input/output error
2013-01-09 11:04:05.404+0000: 7536: error : virCommandWait:2287 : internal error Child process (ip link del veth1) unexpected exit status 1: Cannot find device "veth1"
fix this problem by checking if selinuxfs is mounted
in host before we try to create dir /sys/fs/selinux.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Commit 509eb51 added lxc_protocol.x; but without the initial
checkin of src/lxc_protocol-structs, 'make check' would fail for
anyone with pdwtags installed:
make[3]: *** No rule to make target `lxc_protocol-structs', needed by `check-protocol'. Stop.
* src/lxc_protocol-structs: New file.
The virDomainLxcOpenNamespace method needs to open every file
in /proc/$INITPID/ns and return the open file descriptor to the
client application.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make cpuset local to the while loop and free it once done with it each
time through the loop. Add a sa_assert() to virBitmapParse() to keep Coverity
from believing there could be a negative return and possible resource leak.
Commit-id 'afc4631b' added the regfree(reg) to free resources alloc'd
during regcomp; however, reg still needed to be VIR_FREE()'d. The call
to regfree() also didn't account for possible NULL value. Reformatted
the call to be closer to usage.
Commit c308a9ae was incomplete; it resolved the configure failure,
but not a later build failure.
* src/util/virnetdevbridge.c: Include pre-req header.
* configure.ac (AC_CHECK_HEADERS): Prefer standard in.h over
non-standard ip6.h.
Some places missed the conversion from LIBCURL_{CFLAGS,LIBS} to
CURL_{CFLAGS,LIBS}, and a part of curl check was left in
configure.ac instead of m4/virt-curl.m4 by mistake
This patch introduces support for LXC specific public APIs. In
common with what was done for QEMU, this creates a libvirt_lxc.so
library and libvirt/libvirt-lxc.h header file.
The actual APIs are
int virDomainLxcOpenNamespace(virDomainPtr domain,
int **fdlist,
unsigned int flags);
int virDomainLxcEnterNamespace(virDomainPtr domain,
unsigned int nfdlist,
int *fdlist,
unsigned int *noldfdlist,
int **oldfdlist,
unsigned int flags);
which provide a way to use the setns() system call to move the
calling process into the container's namespace. It is not
practical to write in a generically applicable manner. The
nearest that we could get to such an API would be an API which
allows to pass a command + argv to be executed inside a
container. Even if we had such a generic API, this LXC specific
API is still useful, because it allows the caller to maintain
the current process context, in particular any I/O streams they
have open.
NB the virDomainLxcEnterNamespace() API is special in that it
runs client side, so does not involve the internal driver API.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This converts the libssh2 configure check to use LIBVIRT_CHECK_PKG.
Previously it would check version 1.0 and 1.3, but this simplifies
things to just require version 1.3
If addition of rules in networkAddIptablesRules() failed the real error
was masked by error reported when trying to clean up the remaining
rules.
With this patch the original error message is saved and set back after
the removal is complete.
Commit 0211fd6e04 introduced regression
where newly defined networks were not made persistent.
This patch makes the network persistent on each successful definition.
The phypUUIDTable_Push and phypUUIDTable_Pull leaked their file descriptors
on normal return. Each function had an unnecessary use of creating a buffer
to print conn->uri->user and needed a bit better flow control. I also noted
that the Read function had a cut-n-paste error from the write function on a
couple of VIR_WARN's.
The openSSHSession leaked the sock on the failure path. Additionally that
turns into the internal_socket in the phypOpen code. That was neither saved
nor closed on any path. So I used the connnection_data->sock field to save
the socket for eventual close. Of interest here is that phypExec used the
connection_data->sock field even though it had never been initialized.
I ran 'make dist' in the directory left over from ./autobuild.sh
(which was configured for a mingw cross build); the resulting
tarball had more files than 'make dist' on a normal Linux build.
I traced it to the fact that we were distributing a generated
file, but only when configure said the end user had to generate
the file in the first place. In the process, I noticed that
we had some difference in symbol file names; I added a comment
explaining why the difference exists (after first trying to
normalize the names and hitting VPATH build failures).
* configure.ac (LIBVIRT_QEMU_SYMBOL_FILE): Add some comments.
* src/Makefile.am (EXTRA_DIST): No need to ship a generated file;
particularly since which file is built depends on configure results.
There's no need to do lots of readlink() calls to canonicalize
a name if we're only going to use stat() on it, since stat()
already chases symlinks.
* src/util/virutil.c (virGetDeviceID): Let stat() do the symlink
chasing.
Pass stub driver name directly to pciDettachDevice and pciReAttachDevice to fit
for different libvirt drivers. For example, qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
https://bugzilla.redhat.com/show_bug.cgi?id=892079
With current code, if user calls virDomainPMSuspendForDuration()
followed by virDomainDestroy(), the former API checks for qemu agent
presence, which will evaluate as true (if agent is configured). While
talking to qemu agent, the qemu driver is unlocked, so the latter API
starts executing. However, if machine dies meanwhile, libvirtd gets
EOF on the agent socket and qemuProcessHandleAgentEOF() is called. The
handler clears reference to qemu agent while the destroy API already
holding a reference to it. This leads to NULL dereferencing later in
the code. Therefore, the agent pointer should be set to NULL only if
we are the exclusive owner of it.
While OOM can have knock-on effects that trash a system, generally
the first symptom is one of memory thrashing.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
when we has no host's src mapped to container.
there is no .oldroot dir,so libvirt lxc will fail
to start when mouting meminfo.
in this case,the parameter srcprefix of function
lxcContainerMountProcFuse should be NULL.and make
this method handle NULL correctly.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Perform all the appropriate plumbing.
When qemu/KVM VMs are paused manually through a monitor not-owned by libvirt,
libvirt will think of them as "paused" event after they are resumed and
effectively running. With this patch the discrepancy goes away.
This is meant to address bug 892791.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
gcc 4.1.2 on RHEL 5 warned:
conf/network_conf.c:3136: warning: 'foundIdx' may be used uninitialized in this function
The warning is spurious, but initializing the variable doesn't hurt.
* src/conf/network_conf.c (virNetworkDefUpdateDNSHost): Silence
unused variable warning.
POSIX does not guarantee whether uid_t and gid_t are signed or
unsigned, nor does it guarantee whether they are smaller, same
size, or larger than int (or even the same size as one another).
Therefore, it is possible to have platforms where '(uid_t)-1==-1'
is false or where 'uid = gid = -1' sets uid to the wrong value,
thanks to integer promotion rules. The only portable way to use
the placeholder value of these two types is to always use a cast.
Thankfully, the issue is mostly theoretical - sanlock only
compiles on Linux for now, and on Linux, these types do not
suffer from strange promotion problems.
* src/locking/lock_driver_sanlock.c
(virLockManagerSanlockSetupLockspace, virLockManagerSanlockInit)
(virLockManagerSanlockCreateLease): Cast -1 to proper type before
comparing with uid_t or gid_t.
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
This is the QEMU backend code for the SCLP console support.
It includes SCLP capability detection, QEMU command line generation
and a test case.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The SCLP console is the native console type for s390 and is preferred
over the virtio console as it doesn't require special drivers and
is more efficient. Recent versions of QEMU come with SCLP support
which is hereby enabled.
The new target types 'sclp' and 'sclplm' can be used to specify a
SCLP console. Adding documentation, domain schema and XML processing
support.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
To avoid confusion between the LXC driver <-> controller
monitor RPC protocol and the libvirt-lxc.so <-> libvirtd public
RPC protocol, rename the former to lxc_monitor_protocol.x
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the libvirt client can pass FDs to the server, but the
dispatch mechanism provides no way to return FDs back from the
server to the client. Tweak the dispatch code, such that if a
dispatcher returns '1', this indicates that it populated the
virNetMessagePtr with FDs to return
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A number of bugs handling file descriptors received from the
server caused the FDs to be lost and leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change calling sequence to only call xenUnifiedDomainSetVcpusFlags() when
'dom' is not NULL. Use the GET_PRIVATE() macro to reference privateData.
Just return -1 if dom is NULL.
The code for setting up a private /dev/pts for the containers
is also responsible for making the LXC controller have a
private mount namespace. Unfortunately the /dev/pts code is
not run if launching a container without a custom root. This
causes the LXC FUSE mount to leak into the host FS.
Since we daemonized QEMU for capabilities probing there is a long
time if QEMU fails to launch. This is because we're not passing in
any virDomainObjPtr instance and thus the monitor code can not
check to see if the PID is still alive.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code is initializing capabilities before setting
directory permissions. Thus the QEMU binaries being run may
not have the ability to create the UNIX monitor socket on
the first run of libvirtd.
See also commit 66ff2dd, where we avoided installing these files
as executables.
* daemon/Makefile.am (libvirtd.service): Drop chmod.
* tools/Makefile.am (libvirt-guests.service): Likewise.
* src/Makefile.am (virtlockd.service, virtlockd.socket):
Likewise.
virtlockd.service could be installed to a configurable root,
but virtlockd.socket was hardcoded to installation into a
distro.
* src/Makefile.am (virtlockd.service, virtlockd.socket): Drop
unused substitutions.
* src/locking/virtlockd.socket.in (ListenStream): Don't hard-code
/var.
We had several different styles of .in conversion in our Makefiles:
ALLCAPS, @ALLCAPS@, @lower@, ::lower::
Canonicalize on one form, to make it easier to copy and paste
between .in files.
Also, we were using some non-portable sed constructs: \@ is an
undefined escape sequence (it happens to be @ itself in GNU sed,
but POSIX allows it to mean something else), as well as risky
behavior (failure to consistently quote things means a space
in $(sysconfdir) could throw things off; also, Autoconf recommends
using | rather than , or ! in the s||| operator, because | has to
be quoted in shell and is therefore less likely to appear in file
names than , or !).
Fix all of these uses to follow the same syntax.
* daemon/libvirtd.8.in: Switch to @var@.
* tools/virt-xml-validate.in: Likewise.
* tools/virt-pki-validate.in: Likewise.
* src/locking/virtlockd.init.in: Likewise.
* daemon/Makefile.am: Prefer | over ! in sed.
(libvirtd.8): Prefer consistent substitution.
(libvirtd.init, libvirtd.service): Avoid non-portable sed.
* tools/Makefile.am (libvirt-guests.sh, libvirt-guests.init)
(libvirt-guests.service): Likewise.
(virt-xml-validate, virt-pki-validate, virt-sanlock-cleanup):
Prefer consistent capitalization.
* src/Makefile.am (virtlockd.init, virtlockd.service)
(virtlockd.socket): Prefer consistent substitution.
This prevents domain starting and disk attaching if the shared disk's
setting conflicts with other active domain(s), E.g. A domain with
"sgio" set as "filtered", however, another active domain is using
it set as "unfiltered".
Like "rawio", "sgio" is only allowed for block disk of device
type "lun".
It doesn't default disk->sgio to "filtered" when parsing, as
it won't be able to distinguish explicitly requested "filtered"
and a default "filtered" in driver then. We have to error out for
explicit request when the kernel doesn't support the new sysfs
knob "unpriv_sgio", however, for defaulted "filtered", we can
just ignore it if the kernel doesn't support "unpriv_sgio".
This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which shares the disk.
Since we only care about if the disk support unprivileged SG_IO
commands, and the SG_IO commands only make sense for block disk,
this patch only manages (add/remove hash entry) the shared disk for
block disk.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
"virGetDeviceID" could be used across the sources, but it doesn't
relate with this series, and could be done later.
* src/util/virutil.h: (Declare virGetDeviceID, and
vir{Get,Set}DeviceUnprivSGIO)
* src/util/virutil.c: (Implement virGetDeviceID and
vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of upper helpers
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
This patch adds a new API, virDomainOpenChannel, that uses streams to
connect to a virtio channel on a guest. This creates a secure
communication channel between a guest and a libvirt client.
This behaves the same as virDomainOpenConsole, except on channels
instead of console/serial/parallel devices.
gcc -O2 complained:
../../src/conf/network_conf.c: In function 'virNetworkDefUpdateDNSSrv':
../../src/conf/network_conf.c:3232: error: 'foundIdx' may be used uninitialized in this function [-Wuninitialized]
It turned out to be a spurious warning (we didn't use foundIdx
unless foundCt was non-zero). But in investigating that, I noticed
a worse problem: we were using 'if (foundCt > 1)', but since foundCt
was bool, it could never be > 1.
* src/conf/network_conf.c (virNetworkDefUpdateDNSHost): Use
correct type.
(virNetworkDefUpdateDNSSrv): Likewise, and silence compiler
warning.
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessary wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't notice unless upcoming qemu 1.4 changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting chardev's path has changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such change, which
results in us being unable to start any domain.
Since sanlock doesn't run under root:root, we have chown()'ed the
__LIBVIRT__DISKS__ lease file to the user:group defined in the
sanlock config. However, when writing the patch I've forgot about
lease files for each disk (this is the
/var/lib/libvirt/sanlock/<md5>) file.
Many internal qemu APIs must find domain object from passed
virDomainPtr. And with function Peter's introduced, we can use it
instead of copying multiple lines among code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This is an adjustment to the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=889319
to account for two bonehead mistakes I made.
commit ac2797cf2a attempted to fix a
problem with netlink in newer kernels requiring an extra attribute
with a filter flag set in order to receive an IFLA_VFINFO_LIST from
netlink. Unfortunately, the #ifdef that protected against compiling it
in on systems without the new flag went a bit too far, assuring that
the new code would *never* be compiled, and even if it had, the code
was incorrect.
The first problem was that, while some IFLA_* enum values are also
their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
checking to see if it's #defined is not a valid method of determining
whether or not to add the attribute. Fortunately, the flag that is
being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
And to top it off, due to the code not actually compiling when I
thought it did, I didn't realize that I'd been given the wrong arglist
to nla_put() - you can't just send a const value to nla_put, you have
to send it a pointer to memory containing what you want to add to the
message, along with the length of that memory.
This time I've actually sent the patch over to the other machine
that's experiencing the problem, applied it to the branch being used
(0.10.2) and verified that it works properly, i.e. it does fix the
problem it's supposed to fix. :-/
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, it only considers PTY backend serial devices for pseries.
It need to support all kinds of serial devices.
This patch is to fix the problem which is that it doesn't work
when specifying source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on x86 platform, PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch is to remove QEMU_CAPS_NO_ACPI capability for
non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Cirrus VGA model is not supported on ppc64 currently.
It needs to set std VGA model as the default model.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=889319
When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit
115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
https://bugzilla.redhat.com/show_bug.cgi?id=889319
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
Revert the complex workaround of commit 39d91e9, now that we have
a nicer framework for shutting up broken gcc.
* src/util/buf.c (virBufferEscape): Simplify.
When changing to virArch, the virt-aa-helper.c file was not
completely changed. The vahControl struct was left with a
char *arch field, instead of virArch arch field.
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
When parsing the arch from domain XML, the result was only
saved to a local variable, not the virDomainDefPtr
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Prior to the virArch changes, the CPU baseline method would
free the arch string in the returned CPU. Fix the regression
by setting arch to VIR_ARCH_NONE at the end
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We can use VIR_REALLOC_N with NULL pointer, which behaves the same way
as VIR_ALLOC_N in that case, so no need for a condition that's
checking if some data are allocated already.
---
I tried to find other parts of the code similar to this, so I can do a
full cleanup for the whole repository, so I used this (excuse the long
line, but that's how I was writing it):
git grep -nHC 5 -e VIR_REALLOC_N -e VIR_ALLOC_N | while read line; do if [[ "$line" == "--" ]]; then if [[ ${#tmpbuf} -gt 10 && "$REALLOC_N" == "true" && "$ALLOC_N" == "true" ]]; then echo $line; while [[ ${#tmpbuf[*]} -gt 0 ]]; do echo "${tmpbuf[0]}"; tmpbuf=( "${tmpbuf[@]:1:${#tmpbuf[*]}}" ); done; fi; unset tmpbuf REALLOC_N ALLOC_N; else if [[ "$ALLOC_N" != "true" && "${line/VIR_ALLOC_N//}" != "${line}" ]]; then ALLOC_N="true"; fi; if [[ "$REALLOC_N" != "true" && "${line/VIR_REALLOC_N//}" != "${line}" ]]; then REALLOC_N="true"; fi; tmpbuf[${#tmpbuf[*]}]="$line"; fi; done | less
And reviewed the output just to find out this was the only occurrence of
the inconsistency.
On few places there are too many levels of indentation when some of
them can be fixed with negating the option they are in or omitting
useless condition altogether.
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Replace use of uname in nodeGetInfo with virArch APIs to
provide canonicalization of host architecture name
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a 'virArch' enum for CPU architectures. Include
data type providing wordsize and endianness, and APIs to
query this info and convert to/from enum and string form.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This is yet another refinement to the fix for CVE-2012-3411:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
It turns out that it would be very intrusive to correctly backport the
entire --bind-dynamic option to older dnsmasq versions
(e.g. dnsmasq-2.48 that is used on RHEL6.x and CentOS 6.x), but very
simple to patch those versions to just use SO_BINDTODEVICE on all
their listening sockets (SO_BINDTODEVICE also has the desired effect
of permitting only traffic that was received on the interface(s) where
dnsmasq was set to listen.)
This patch modifies the dnsmasq capabilities detection to detect the
string:
--bind-interfaces with SO_BINDTODEVICE
in the output of "dnsmasq --version", and in that case realize that
using the old --bind-interfaces option is just as safe as
--bind-dynamic (and therefore *not* forbid creation of networks that
use public IP address ranges).
If -bind-dynamic is available, it is still preferred over
--bind-interfaces.
Note that this patch does no harm in upstream, or in any distro's
downstream if it happens to end up there, but builds for distros that
have a new enough dnsmasq to support --bind-dynamic do *NOT* need to
specifically backport this patch; it's only required for distro
releases that have dnsmasq too old to have --bind-dynamic (and those
distros will need to add the SO_BINDTODEVICE patch to dnsmasq,
*including the extra string in the --version output*, as well.
Somehow I managed to push the changes to this file with improper
indentation. This patch just re-indents, reformats the comment lines,
and re-groups a couple of multi-line strings so that they fit within
80 columns. The resulting binary should be identical.
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of host misc devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of host storage devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of USB host devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of hostdevs in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of disks in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of network interfaces in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This wires up the LXC driver to support the domain device attach/
detach/update APIs, following the same code design as used in
the QEMU driver. No actual changes are possible with this commit,
it is only providing the framework
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This extends support for host device passthrough with LXC to
cover misc devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This extends support for host device passthrough with LXC to
cover storage devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds support for host device passthrough with the
LXC driver. Since there is only a single kernel image,
it doesn't make sense to pass through PCI devices, but
USB devices are fine. For the latter we merely need to
make the /dev/bus/usb/NNN/MMM character device exist
in the container's /dev
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently LXC guests can be given arbitrary pre-mounted
filesystems, however, for some usecases it is more appropriate
to provide block devices which the container can mount itself.
This first impl only allows for <disk type='block'>, in other
words exposing a host disk device to a container. Since LXC
does not have device namespace virtualization, we are cheating
a little bit. If the XML specifies /dev/sdc4 to be given to
the container as /dev/sda1, when we do the mknod /dev/sda1
in the container's /dev, we actually use the major:minor
number of /dev/sdc4, not /dev/sda1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Prepare to support different types of hostdevs by refactoring
the current SELinux security driver code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virSecurityManager{Set,Restore}AllLabel methods are invoked
at domain startup/shutdown to relabel resources associated with
a domain. This works fine with QEMU, but with LXC they are in
fact both currently no-ops since LXC does not support disks,
hostdevs, or kernel/initrd files. Worse, when LXC gains support
for disks/hostdevs, they will do the wrong thing, since they
run in host context, not container context. Thus this patch
turns then into a formal no-op when used with LXC. The LXC
controller will call out to specific security manager labelling
APIs as required during startup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code for creating veth/macvlan devices is part of the
LXC process startup code. Refactor this a little and export
the methods to the rest of the LXC driver. This allows them
to be reused for NIC hotplug code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There was a double free issue caused by virSysinfoRead on s390,
as the same manufacturer string instance was assigned to more
than one processor record.
Cleaned up other potential memory issues and restructured the sysinfo
parsing code by moving repeating patterns into a helper function.
The restructuring made it necessary to conditionally disable
-Wlogical-op for some older GCC versions, using pragma GCC diagnostic.
This is a GCC specific pragma, which is acceptable, since we're
using it to work around a GCC specific bug.
Finally, added a function virSysinfoSetup to configure the sysinfo
data source files/script during run time, to facilitate writing test
programs. This function is not published in sysinfo.h and only
there for testing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch simplifies the code that parses the fallback and vendor_id
attributes from the domain xml cpu definition.
Changes done:
- free temp variables in the cleanup section instead of local use
- remove checking for presence of the attribute to directly getting the
value (saving call to virXPathBoolean)
- replace loop used to check for ',' in the vendor_id string with strchr
This patch fixes a problem that vendor_id attribute can not be defined
when fallback attribute is not defined.
If I define domain xml like below:
<domain>
<cpu>
<model vendor_id='aaaabbbbcccc'>core2duo</model>
</cpu>
</domain>
In dumpxml, vendor_id is not reflected:
<domain>
<cpu mode='custom' match='exact'>
<model fallback='allow'>core2duo</model>
</cpu>
</domain>
The expected output is:
<domain>
<cpu mode='custom' match='exact'>
<model fallback='allow' vendor_id='aaaabbbbcccc'>core2duo</model>
</cpu>
</domain>
If the fallback attribute and vendor_id attribute is defined at the same
time, it's reflected as expected.
Signed-off-by: Ken ICHIKAWA <ichikawa.ken@jp.fujitsu.com>
The current SELinux policy only works for KVM guests, since
TCG requires the 'execmem' privilege. There is a 'virt_use_execmem'
boolean to turn this on globally, but that is unpleasant for users.
This changes libvirt to automatically use a new 'svirt_tcg_t'
context for TCG based guests. This obsoletes the previous
boolean tunable and makes things 'just work(tm)'
Since we can't assume we run with new enough policy, I also
make us log a warning message (once only) if we find the policy
lacks support. In this case we fallback to the normal label and
expect users to set the boolean tunable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
If there are multiple video devices
primary = 'yes' marks this video device as the primary one.
The rest are secondary video devices. No more than one could be
mark as primary. If none of them has primary attribute, the first
one will be the primary by default like what it was.
The reason of this changing is that for qemu, only one primary video
device is permitted which can be of any type. For secondary video
devices, only qxl is allowd. Primary attribute removes the restriction
that the first have to be the primary one.
We always put the primary video device into the first position of
video device structure array after parsing.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
This adds an implementation of virNetSocketGetUNIXIdentity()
using LOCAL_PEERCRED socket option and xucred struct, defined
in <sys/ucred.h> on systems that have it.
A forgotten "!" in recently-modified code at the top of
networkRefreshDaemon() meant an improper early return, which led to 1)
dnsmasq config files not being updated from the newly modified config,
and 2) dnsmasq not being sent a SIGHUP so that it could learn about
the changes to the config.
virNetworkDefGetIpByIndex() returns NULL if there are no ip objects of
the requested type, and if there are no IP elements, then dnsmasq
shouldn't be running, so we can return early. Otherwise we should
rewrite the config files and send a SIGHUP.
Currently, if sanlock is already registering a lockspace other
libvirtd instances (from other hosts) obtain -EINPROGRESS. On
sufficiently new sanlock, sanlock_inq_lockspace() is called,
which suspend execution until lockspace state is changed. With
current libvirt implementation, we fail to retry adding the
lockspace again but continue in error path. Therefore we produce
meaningless error message:
virLockManagerSanlockSetupLockspace:363 : Unable to add lockspace
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Success
qemudLoadDriverConfig:558 : Failed to load lock manager sanlock
We should try to re-add the lockspace after its state change to
be sure it was added successfully. In fact, with sufficiently new
sanlock we can just avoid dummy usleep() which is used if there's
no inquire API.
The virtlockd daemon scripts were lousy, when compared to their
counterparts in daemon/Makefile.am. In particular, when init
scripts were selected, this resulted in 'make distcheck' failing
due to failure to clean up src/virtlockd.init.
* src/Makefile.am (install-systemd): Fix dependencies. Use MKDIR_P.
(uninstall-systemd): Remove empty directory. Use fewer processes.
(install-init, install-sysconfig): Use MKDIR_P.
(uninstall-init): Remove correct file, and also empty directory.
(uninstall-sysconfig): Remove empty directory.
(DISTCLEANFILES): Clean up trivially built sources.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
This patch resolves the problem reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=886663
The source of the problem was the fix for CVE 2011-3411:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
which was originally committed upstream in commit
753ff83a50. That commit improperly
removed the "--except-interface lo" from dnsmasq commandlines when
--bind-dynamic was used (based on comments in the latter bug).
It turns out that the problem reported in the CVE could be eliminated
without removing "--except-interface lo", and removing it actually
caused each instance of dnsmasq to listen on localhost on port 53,
which created a new problem:
If another instance of dnsmasq using "bind-interfaces" (instead of
"bind-dynamic") had already been started (or if another instance
started later used "bind-dynamic"), this wouldn't have any immediately
visible ill effects, but if you tried to start another dnsmasq
instance using "bind-interfaces" *after* starting any libvirt
networks, the new dnsmasq would fail to start, because there was
already another process listening on port 53.
(Subsequent to the CVE fix, another patch changed the network driver
to put dnsmasq options in a conf file rather than directly on the
dnsmasq commandline, but preserved the same options.)
This patch changes the network driver to *always* add
"except-interface=lo" to dnsmasq conf files, regardless of whether we use
bind-dynamic or bind-interfaces. This way no libvirt dnsmasq instances
are listening on localhost (and the CVE is still fixed).
The actual code change is miniscule, but must be propogated through all
of the test files as well.
The default lockd driver behavour is to acquire leases
directly on the disk files. This introduces an alternative
mode, where leases are acquire indirectly on a file that
is based on a SHA256 hash of the disk filename.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds a 'lockd' lock driver which is just a client which
talks to the lockd daemon to perform all locking. This will
be the default lock driver for any hypervisor which needs one.
* src/Makefile.am: Add lockd.so plugin
* src/locking/lock_driver_lockd.c: Lockd driver impl
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon maintains file locks on behalf of libvirtd
and any VMs it is running. These file locks must be held for as
long as any VM is running. If virtlockd itself ever quits, then
it is expected that a node would be fenced/rebooted. Thus to
allow for software upgrads on live systemd, virtlockd needs the
ability to re-exec() itself.
Upon receipt of SIGUSR1, virtlockd will save its current live
state out to a file /var/run/virtlockd-restart-exec.json
It then re-exec()'s itself with exactly the same argv as it
originally had, and loads the state file, reconstructing any
objects as appropriate.
The state file contains information about all locks held and
all network services and clients currently active. An example
state document is
{
"server": {
"min_workers": 1,
"max_workers": 20,
"priority_workers": 0,
"max_clients": 20,
"keepaliveInterval": 4294967295,
"keepaliveCount": 0,
"keepaliveRequired": false,
"services": [
{
"auth": 0,
"readonly": false,
"nrequests_client_max": 1,
"socks": [
{
"fd": 6,
"errfd": -1,
"pid": 0,
"isClient": false
}
]
}
],
"clients": [
{
"auth": 0,
"readonly": false,
"nrequests_max": 1,
"sock": {
"fd": 9,
"errfd": -1,
"pid": 0,
"isClient": true
},
"privateData": {
"restricted": true,
"ownerPid": 1722,
"ownerId": 6,
"ownerName": "f18x86_64",
"ownerUUID": "97586ba9-df27-9459-c806-f016c8bbd224"
}
},
{
"auth": 0,
"readonly": false,
"nrequests_max": 1,
"sock": {
"fd": 10,
"errfd": -1,
"pid": 0,
"isClient": true
},
"privateData": {
"restricted": true,
"ownerPid": 1784,
"ownerId": 7,
"ownerName": "f16x86_64",
"ownerUUID": "7b8e5e42-b875-61e9-b981-91ad8fa46979"
}
}
]
},
"defaultLockspace": {
"resources": [
{
"name": "/var/lib/libvirt/images/f16x86_64.raw",
"path": "/var/lib/libvirt/images/f16x86_64.raw",
"fd": 14,
"lockHeld": true,
"flags": 0,
"owners": [
1784
]
},
{
"name": "/var/lib/libvirt/images/shared.img",
"path": "/var/lib/libvirt/images/shared.img",
"fd": 12,
"lockHeld": true,
"flags": 1,
"owners": [
1722,
1784
]
},
{
"name": "/var/lib/libvirt/images/f18x86_64.img",
"path": "/var/lib/libvirt/images/f18x86_64.img",
"fd": 11,
"lockHeld": true,
"flags": 0,
"owners": [
1722
]
}
]
},
"lockspaces": [
],
"magic": "30199"
}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This enhancement virtlockd so that it can receive a pre-opened
UNIX domain socket from systemd at launch time, and adds the
systemd service/socket unit files
* daemon/libvirtd.service.in: Require virtlockd to be running
* libvirt.spec.in: Add virtlockd systemd files
* src/Makefile.am: Install systemd files
* src/locking/lock_daemon.c: Support socket activation
* src/locking/virtlockd.service.in, src/locking/virtlockd.socket.in:
systemd unit files
* src/rpc/virnetserverservice.c, src/rpc/virnetserverservice.h:
Add virNetServerServiceNewFD() method
* src/rpc/virnetsocket.c, src/rpc/virnetsocket.h: Add virNetSocketNewListenFD
method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a lock_daemon_dispatch.c file which implements the
server side dispatcher the RPC APIs previously defined in the
lock protocol.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon will be responsible for managing locks
on virtual machines. Communication will be via the standard
RPC infrastructure. This provides the XDR protocol definition
* src/locking/lock_protocol.x: Wire protocol for virtlockd
* src/Makefile.am: Include lock_protocol.[ch] in virtlockd
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon will maintain locks on behalf of libvirtd.
There are two reasons for it to be separate
- Avoid risk of other libvirtd threads accidentally
releasing fcntl() locks by opening + closing a file
that is locked
- Ensure locks can be preserved across libvirtd restarts.
virtlockd will need to be able to re-exec itself while
maintaining locks. This is simpler to achieve if its
sole job is maintaining locks
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current virStorageFileGet{LVM,SCSI}Key methods return
the key as the return value. Unfortunately it is desirable
for "NULL" to be a valid return value, as well as an error
indicator. Thus the returned key must instead be provided
as an out-parameter.
When we invoke lvs or scsi_id to extract ID for block devices,
we don't want virCommandWait logging errors messages. Thus we
must explicitly check 'status != 0', rather than letting
virCommandWait do it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The libxl driver ignored boot devices in the domain config,
preventing PXE booting HVM domains. This patch accounts for
user-specified boot devices when building the libxl domain
configuration.
The QED file format is non-versioned, so although the magic
value matched, libvirt rejected it due to lack of a version
number to compare against. We need to distinguish this case
by allowing a value of '-2' to indicate a non-versioned file
where only the magic is required to match
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To help us detect when new storage file versions come into
existance log a warning if the storage file magic matches,
but the version does not
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fully stub out the virCgroupGetAppRoot method as done with other
methods in the file, rather than just the body. This lets us
annotate the unused parameter to avoid a warning
I noticed that /var/lib/libvirt/dnsmasq/*.conf used the wrong word;
it was intended to match the wording in src/util/xml.c.
* src/network/bridge_driver.c (networkDnsmasqConfContents): Fix typo.
* tests/networkxml2confdata/*.conf: Update accordingly.
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
make check fails in check-symsorting if configure is not run in
the source directory. Prefixing symfile names with $(srcdir)
fixes this.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch adds a new domain lookup helper qemuDomObjFromDomainDriver
that lookups the domain and leaves the driver locked. The driver is
returned as the second argument of that function. If the lookup fails
the driver is unlocked to help avoid cleanup codepaths.
This patch also improves docs for the helpers.
This patch gets rid of the undeterministic error reporting code done on
return values of get(pw|gr)nam_r. With this patch, if the group record
is not returned by the corresponding function this error is not
considered fatal even if errno != 0. The error is logged in such case.
Move the code for matching hostdev instances out of virDomainHostdevFind
and into virDomainHostdevMatch method, which in turn calls out to other
helper methods depending on the type of hostdev.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rename virDomainHostdevPartsParse to virDomainHostdevDefParseSubsys
to reflect the fact that it only deals with hostdevs uing the
traditional mode=subsystem, and not mode=capabilities
Rename virDomainHostSourceFormat to virDomainHostdevDefFormatSubsys
for the same reason.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virStorageFileGetLVMKey and virStorageFileGetSCSIKey
both return heap allocated strings, so the return value
should not be marked const.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add check-symsorting.pl to perform case-insensitive alphabetical
sorting of groups of symbols. Fix all violations it reports
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When using vnc gaphics over a unix socket, virt-aa-helper needs to provide
access for the qemu domain to access the sockfile.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch exports qemuMigrationIsAllowed and adds a new parameter to it
to denote if it's a remote migration or a local migration. Local
migrations are used in snapshots and saving of the machine state and
have fewer restrictions. This patch also adjusts callers of the function
and tweaks some error messages to be more universal.
Currently there is no way to detect it via QMP and requesting "-sandbox
off" works correctly even if it was compiled out, so this will work
unless someone both requests the sandbox in qemu.conf and builds QEMU
without the support for it.
Interfaces keeps a class_id, which is an ID from which bridge
part of QoS settings is derived. We need to store class_id
in domain status file, so we can later pass it to
virNetDevBandwidthUnplug.
Currently, we are only keeping a inactive XML configuration
in status dir. This is no longer enough as we need to keep
this class_id attribute so we don't overwrite old entries
when the daemon restarts. However, since there has already
been release which has just <network/> as root element,
and we want to keep things compatible, detect that loaded
status file is older one, and don't scream about it.
Network should be notified if we plug in or unplug an
interface, so it can perform some action, e.g. set/unset
network part of QoS. However, we are doing this in very
early stage, so iface->ifname isn't filled in yet. So
whenever we want to report an error, we must use a different
identifier, e.g. the MAC address.
This will be used whenever a NIC with guaranteed throughput is to
be plugged into a bridge. It will adjust the average throughput of
non guaranteed NICs (classid 1:2) to meet new requirements.
These set bridge part of QoS when bringing domain's interface up.
Long story short, if there's a 'floor' set, a new QoS class is created.
ClassID MUST be unique within the bridge and should be kept for
unplug phase.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
This is however supported only on domain interfaces with
type='network'. Moreover, target network needs to have at least
inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average'
(old) or 'floor' (new). This new attribute can be used just for
interfaces type of network (<interface type='network'/>) currently.
Stochastic Fairness Queuing (SFQ) is queuing discipline
(qdisc) which doesn't really shape any traffic but 'just'
re-arrange packets in sending buffer so no stream starve.
The goal is to ensure fairness. There is basically only one
configuration parameter (perturb) which is set to advised
value of 10.
Network adapters of type 'routed' is a special case. Other adapters
have 'network' parameter in prlctl's output instead.
Routed network adapters should be connected to 'routed' network
from libvirt's view.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Historically if traffic from the adapter is routed to LAN without
NAT, it isn't connected to any virtual networks, but has a 'type'
instead. Sinse libvirt has special virtual network type for such case,
let's add pseudo network 'routed' to fit libvirt's API well.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Parallels Cloud Server uses virtual networks model for network
configuration. It uses own tools for virtual network management.
So add network driver, which will be responsible for listing
virtual networks and performing different operations on them
(in consequent patched).
This patch only allows listing virtual network names, without
any parameters like DHCP server settings.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Allow changing network interfaces in domain configuration.
ifname is used as iterface identifier: if there is interface
with some ifname in old config and there are no interfaces with
such name in the new config - issue prlctl command to delete
the network interface. And vice versa - if interface with
some ifname exists only in new config - issue prlctl command
to create it.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Parse network interfaces info from prlctl output.
Parallels Cloud Server uses virtual networks model for
network configuration: You can add network adapter to
VM and connect it to some predefined virtual network.
Fill type, mac, network name and linkstate fields of
virDomainNetDef structure.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
When the disk snapshot part of an external system checkpoint fails the
memory image is retained. This patch adds code to remove the image in
such case.
In case the snapshot code isn't able to restart CPUs after an external
checkpoint we would leak a copy of the domains XML definition. This
patch fixes the cleanup path.