Commit 2d74822a9e renamed
"freebsdNodeGetCPUCount" to "appleFreebsdNodeGetCPUCount", leaving one
call to "freebsdNodeGetCPUCount". Fix this other case.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We currently have other error codes in singular form, e.g.
VIR_ERR_NETWORK_EXIST. Cleanup the previous patch to match the form.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I created a storage volume(eg: test) from a storage pool(eg:vg10) using
the following command:"virsh vol-create-as --pool vg10 --name test --capacity 300M."
When I re-executed the above command, the output was as the following:
"error: Failed to create vol test
error: Storage volume not found: storage vol 'test' already exists"
I think the output "Storage volume not found" is not appropriate. Because in fact storage
vol test has been found at this time. And then I think virErrorNumber should includes
VIR_ERR_STORAGE_EXIST which can also be used elsewhere. So I make this patch. The result
is as following:
"error: Failed to create vol test
error: storage volume 'test' exists already"
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.
Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.
The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The nwfilter driver only needs a reference to its private
state object, not a full virConnectPtr. Update the domUpdateCBStruct
struct to have a 'void *opaque' field instead of a virConnectPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 27e81517a8 set the payload size to 256 KB, which is
actually the max packet size, including the size of the header.
Reduce this by VIR_NET_MESSAGE_HEADER_MAX (24) and set
VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX to 262120, which was the original
value before increasing the limit in commit eb635de1fe.
This fixes the following error:
error : nodeGetInfo:933 : this function is not supported
by the connection driver: node info not implemented on this platform
The freebsdNodeGetCPUCount was renamed to appleFreebsdNodeGetCPUCount
in order to make more visible the fact, that it works on Mac OS X too.
Mac OS X can use sysctlbyname as same as FreeBSD to get the CPU
frequency. However, the MIB style name is different from FreeBSD's.
And the unit of the return frequency is also different.
Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This fixes the following error:
error : virGetUserEnt:703 : Failed to find user record for uid '32654'
'32654' (it's random and varies) comes from getsockopt with
LOCAL_PEERCRED option. getsockopt returns w/o error but seems
to not set any value to the buffer for uid.
For Mac OS X, LOCAL_PEERCRED has to be used with SOL_LOCAL level.
With SOL_LOCAL, getsockopt returns a correct uid.
Note that SOL_LOCAL can be found in
/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/un.h.
Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
On RHEL 5, compilation fails with:
storage/storage_backend.c: In function 'createRawFile':
storage/storage_backend.c:339: warning: implicit declaration of function 'fallocate'
storage/storage_backend.c:339: warning: nested extern declaration of 'fallocate' [-Wnested-externs]
But:
$ grep HAVE_FALLOCATE config.h
/* #undef HAVE_FALLOCATE */
Huh? It turns out that in kernels that old, fallocate() is not
implemented (config.h is correct), but <linux/fs.h> defines
HAVE_FALLOCATE as an empty witness macro for a completely
different purpose. Since storage_backend.c is including
<linux/fs.h> on RHEL 5, we are hosed by the kernel definition.
Newer kernels no longer pollute the namespace, and it's fairly
easy to convert to an expression that works with both the old
kernel witness and the new-style config.h (undefined or 1).
Problem introduced in commit 532fef3.
* src/storage/storage_backend.c (createRawFile): Avoid namespace
pollution from kernel, by checking HAVE_FALLOCATE for a value.
Signed-off-by: Eric Blake <eblake@redhat.com>
I tried to test ./configure --without-lxc --without-remote.
First, the build failed with some odd errors, such as an
inability to build xen, or link failures for virNetTLSInit.
But when you think about it, once there is no remote code,
all of libvirtd is useless, any stateful driver that depends
on libvirtd is also not worth compiling, and any libraries
used only by RPC code are not needed. So I patched
configure.ac to make for some saner defaults when an
explicit disable is attempted. Similarly, since we have
migrated virnetdevbridge into generic code, the workaround
for Linux kernel stupidity must not depend on stateful
drivers being in use.
Then there's 'make check' that needs segregation.
Wow - quite a bit of cleanup to make --without-remote useful :)
* configure.ac: Let --without-remote toggle defaults on stateful
drivers and other libraries. Pick up Linux kernel workarounds
even when qemu and lxc are not being compiled.
* tests/Makefile.am (test_programs): Factor out programs that
require remote.
* src/libvirt_private.syms (rpc/virnet*.h): Move...
* src/libvirt_remote.syms: ...into new file.
* src/Makefile.am (SYM_FILES): Ship new syms file.
Signed-off-by: Eric Blake <eblake@redhat.com>
Fixed the safezero call for allocating the rest of the file after cloning
an existing volume; it used to always use a zero offset, causing it to
only allocate the beginning of the file.
Also modified file creation to try to use fallocate(2) to pre-allocate
disk space before copying any data to make sure it fails early on if disk
is full and makes sure we can skip zero blocks when copying file contents.
If fallocate isn't available we will zero out the rest of the file after
cloning and only use sparse cloning if client requested a lower allocation
than the input volume's capacity.
Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
My previous commit 7dc1d4ab was supposed to change safezero to allocate
1 megabyte at maximum, but had the logic reversed and will allocate 1
megabyte at minimum (and a lot more at maximum.)
Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
mmap can fail on 32-bit systems if we're trying to zero out a lot of data.
Fall back to using block-by-block writing in that case. While we could map
smaller blocks it's unlikely that this code is used a lot and its easier to
just fall back to one of the existing methods.
Also modified the block-by-block zeroing to not allocate a megabyte of
zeroes if we're writing less than that.
Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
Again stolen from qemu_driver.c, but dropping all the unneeded bits.
This aims to copy all the current qemu validation checks since that's
the most commonly used real driver, but some of the checks are
completely artificial in the test driver.
This only supports creation of internal snapshots for initial
simplicity.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1012824https://bugzilla.redhat.com/show_bug.cgi?id=1012834
Note that a similar problem was reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=827519
but the fix only worked for <interface type='hostdev'>, *not* for
<interface type='network'> where the network itself was a pool of
hostdevs.
The symptom in both cases was this error message:
internal error: Unable to determine device index for network device
In both cases the cause was lack of proper handling for netdevs
(<interface>) of type='hostdev' when scanning the netdev list looking
for alias names in qemuAssignDeviceNetAlias() - those that aren't
type='hostdev' have an alias of the form "net%d", while those that are
hostdev use "hostdev%d". This special handling was completely lacking
prior to the fix for Bug 827519 which was:
When searching for the highest alias index, libvirt looks at the alias
for each netdev and if it is type='hostdev' it ignores the entry. If
the type is not hostdev, then it expects the "net%d" form; if it
doesn't find that, it fails and logs the above error message.
That fix works except in the case of <interface type='network'> where
the network uses hostdev (i.e. the network is a pool of VFs to be
assigned to the guests via PCI passthrough). In this case, the check
for type='hostdev' would fail because it was done as:
def->net[i]->type == VIR_DOMAIN_NET_TYPE_HOSTDEV
(which compares what was written in the config) when it actually
should have been:
virDomainNetGetActualType(def->net[i]) == VIR_DOMAIN_NET_TYPE_HOSTDEV
(which compares the type of netdev that was actually allocated from
the network at runtime).
Of course the latter wouldn't be of any use if the netdevs of
type='network' hadn't already acquired their actual network connection
yet, but manual examination of the code showed that this is never the
case.
While looking through qemu_command.c, two other places were found to
directly compare the net[i]->type field rather than getting actualType:
* qemuAssignDeviceAliases() - in this case, the incorrect comparison
would cause us to create a "net%d" alias for a netdev with
type='network' but actualType='hostdev'. This alias would be
subsequently overwritten by the proper "hostdev%d" form, so
everything would operate properly, but a string would be
leaked. This patch also fixes this problem.
* qemuAssignDevicePCISlots() - would defer assigning a PCI address to
a netdev if it was type='hostdev', but not for type='network +
actualType='hostdev'. In this case, the actual device usually hasn't
been acquired yet anyway, and even in the case that it has, there is
no practical difference between assigning a PCI address while
traversing the netdev list or while traversing the hostdev
list. Because changing it would be an effective NOP (but potentially
cause some unexpected regression), this usage was left unchanged.
The XML parser reserves 'vnet' as a prefix for automatically
generated NIC device names. Switch the veth device creation
to use this prefix, so it does not have to worry about clashes
with user specified names in the XML.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The veth device creation code run in two steps, first it looks
for two free veth device names, then it runs ip link to create
the veth pair. There is an obvious race between finding free
names and creating them, when guests are started in parallel.
Rewrite the code to loop and re-try creation if it fails, to
deal with the race condition.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If veth device allocation has a fatal error, the veths
array may contain NULL device names. Avoid calling the
virNetDevVethDelete function on such names.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The kernel automatically destroys veth devices when cleaning
up the container network namespace. During normal shutdown, it
is thus likely that the attempt to run 'ip link del vethN'
will fail. If it fails, check if the device exists, and avoid
reporting an error if it has gone. This switches to use the
virCommand APIs instead of virRun too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
During container cleanup there is a race where the kernel may
have destroyed the veth device before we try to set it offline.
This causes log error messages. Given that we're about to
delete the device entirely, setting it offline is pointless.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When querying for kvm, we try to find 'enabled' field. Hence the error
message should report we haven't found 'enabled' and not 'running'
(which is not even in the reply). Probably a typo or copy-paste error.
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far the virNetDevBandwidthEqual() expected both ->in and ->out items
to be allocated for both @a and @b compared. This is not necessary true
for all our code. For instance, running 'update-device' twice over a NIC
with the very same XML results in SIGSEGV-ing in this function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This should resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1012085
libvirt previously recognized NFS, GFS2, OCFS2, and AFS filesystems as
"shared", and thus eligible for exceptions to certain rules/actions
about chowning image files before handing them off to a guest. This
patch widens the definition of "shared filesystem" to include SMB and
CIFS filesystems (aka "Windows file sharing"); both of these use the
same protocol, but different drivers so there are different magic
numbers for each.
This basically covers the talking-to-monitor part of
virQEMUCapsInitQMP. The patch itself has no real value,
but it creates an entity to be tested in the next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We forgot to do cleanup when lxcContainerMountFSTmpfs
failed to bind fs as read-only.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The libvirtd server pushes data out to clients. It does not
know what protocol version the client might have, so must be
conservative and use the old payload limits. ie send no more
than 256kb of data per packet.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The problem is described by [0] but its effect on libvirt is that
starting a container with a full distro running systemd after having
stopped it simply fails.
The container cleanup now calls the machined Terminate function to make
sure that everything is in order for the next run.
[0]: https://bugs.freedesktop.org/show_bug.cgi?id=68370
mmap's offset must be aligned to page size or mapping will fail.
mmap-based safezero is only used if posix_fallocate isn't available.
Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
Fixed the retrieval of the AdapterId from the AdapterName of the
hostdev source so it does return an error instead of leaving the
adapter_id uninitialized.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The change in ef29de14c3 that introduced
better error logging from qemu introduced a warning from coverity about
unused return value from lseek. Silence this warning and fix typo in the
corresponding error message.
Reported by: John Ferlan
Currently the VMware version check code only supports two types of
VMware backends, Workstation and Player. But in the near future we will
have an additional one so we need to support more. Additionally, we
discover and cache the path to the vmrun binary so we should use that
path when using the corresponding binary from the VMware VIX SDK.
Another case missed by commits 716c7bb and 6973e02.
* src/Makefile.am (VIR_NET_RPC_GENERATED): Drop $(srcdir).
(libvirt_net_rpc_la_SOURCES): List generated files more compactly.
Signed-off-by: Eric Blake <eblake@redhat.com>
When running 'make dist' on a system without policykit, we currently
fail. This is because $(srcdir)/access/org.libvirt.api.policy is in
EXTRA_DIST, however, the rule to generate the file is conditional
whether we build with polkit or not.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
On some systems (linux, cygwin and gnukfreebsd) rpcgen generates files
which when compiling produces this warning:
remote/remote_protocol.c: In function 'xdr_remote_node_get_cpu_stats_ret':
remote/remote_protocol.c:530: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
Hence, on those systems we need to post-process the files by the
rpc/genprotocol.pl perl script. At the beginning of the script the OS is
detected via $^O perl variable. From my latest build on FreeBSD I see we
need to fix the code there too. On FreeBSD the variable contains
'freebsd' string:
http://perldoc.perl.org/perlport.html#PLATFORMS
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The debug message said there was a timeout of 0 pending for -1 ms which
made me think this is where a hang was coming from but according to the
function comments this case means that there is no timeout pending so
make the debug message say that instead of saying there's a -1 ms
timeout.
While BSDs don't support process creation timestamp information via
PEERCRED for Unix sockets, we need to actually initialize the value
because it is used by the libvirt code.
https://bugzilla.redhat.com/show_bug.cgi?id=1011330 (case A)
While activeScsiHostdevs and webSocketPorts were allocated in
qemuStateInitialize, they were not freed in qemuStateCleanup.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1011330 (case D)
qemuProcessStart created two references to virQEMUDriverConfigPtr before
calling fork():
cfg = virQEMUDriverGetConfig(driver);
...
hookData.cfg = virObjectRef(cfg);
However, the child only unreferenced hookData.cfg and the parent only
removed the cfg reference. That said, we don't need to increment the
reference counter when assigning cfg to hookData. Both the child and the
parent will correctly remove the reference on cfg (the child will do
that through hookData).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1012818
Commit 6d7d0b1869 (in 1.1.2) added bounds
checking to virDomainGetJobStats. But even at that time the API was able
to return 20 parameters while the limit was set to 16.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The return value of virDomainControllerFind >=0 means that
the specific controller was found.
But some functions invoke it and treat 0 as not found.
This patch fix these incorrect invocation.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Since commit 297c99a5 an invalid source definition XML of a character
device that is used as backend for RNG devices, smartcards and redirdevs
causes crash of the daemon when parsing such a definition.
The device types mentioned above are not a part of a regular character
device but are backends for other types. Thus when parsing such device
NULL is passed as the argument @chr_def. Later when checking the
validity of the definition @chr_def was dereferenced when parsing a UNIX
socket backend with missing path of the socket and crashed the daemon.
Sample offending configuration:
<devices>
...
<rng model='virtio'>
<backend model='egd' type='unix'>
<source mode='bind' service='1024'/>
</backend>
</rng>
</devices>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1012196
When passing in custom driver XML, allow a block like
<domain xmlns:test='http://libvirt.org/schemas/domain/test/1.0'>
...
<test:runstate>5</test:runstate>
</domain>
This is only read at initial driver start time, and sets the initial
run state of the object. This is handy for UI testing.
It's only wired up for domains, since that's the only conf/
infrastructure that supports namespaces at the moment.
For inexplicable reasons, the nwfilter XML parser is intentionally
ignoring errors that arise during parsing. As well as meaning that
users don't get any feedback on their XML mistakes, this will lead
it to silently drop data in OOM conditions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Normally a lockspace resource is not freed while there are
active owners. During initial resource creation though, an
OOM error will trigger this scenario. virLockSpaceResourceFree
was not freeing the 'owners' field in this case.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM or another error occurs in virJSONValueFromString the
parser state object will be leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in virJSONParserHandleStartMap it will free
a variable that is owned by another object. This leads to
a later double-free.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If virDBusMessageIterEncode hits an OOM condition it often
leaks the memory associated with the dbus iterator object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code parsing comments in config files called virConfAddEntry
but did not check for failure. This caused the comment string to
leak on OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virVMXFormatConfig called virVMXEscapeHexPipe but
forgot to check for OOM. This caused data to silently
be lost.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virStoragePoolDefParseSource method would set def->nhosts
before allocating def->hosts. If the allocation failed due to
OOM, the cleanup code would crash accessing out of bounds.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If xenParseSxprPCI failed to expand the def->hostdevs array
due to OOM, it would free the hostdev instance twice.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virDomainSnapshotDefParse method assigned to def->ndisks
before allocating def->disks. Thus if an OOM occurred, the
cleanup code would access out of bounds.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Several places in virInterfaceDefParseProtoIPv6 clobber the
default 'ret' return value. So when jumping to cleanup on
error, 'ret' may mistakenly be set to 0 instead of -1. This
caused failure to report OOM errors, meaning data was silently
lost during parsing.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The methods for obtaining the Xen dom ID cannot distinguish
between returning -1 due to an error and returning -1 due to
the domain being shutoff. Change them to return the dom ID
via an output parameter.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The xenParseSxpr method sets def->nconsoles to 1 before allocating
the def->consoles array. If the allocation fails due to OOM the
cleanup code will thus crash accessing out of bounds.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If an OOM occurs in xenFormatXM when formatting to the
serial device value, the value is leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If an OOM occurs when xenFormatXM is setting the 'hpet'
variable it is silently ignored. Fix it to propagate
to the callers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The xenParseXM sets def->nconsoles to 1 before claling
VIR_REALLOC_N on def->consoles. So if the alloc fails
due to OOM, the cleanup code will crash accessing a
console that does not exist.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If qemuParseCommandLine finds an arg it does not understand
it adds it to the QEMU passthrough custom arg list. If the
qemuParseCommandLine method hits an error for any reason
though, it just does 'VIR_FREE(cmd)' on the custom arg list.
This means all actual args / env vars are leaked. Introduce
a qemuDomainCmdlineDefFree method to be used for cleanup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the call to virDomainControllerInsert fails in
qemuParseCommandLine, the controller struct is leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The 'qemuStringToArgvEnv' method splits up a string of command
line env/args to an 'arglist' array. It then copies env vars
to a 'progenv' array and args to a 'progargv' array. When
copyin the env vars, it NULL-ifies the element in 'arglist'
that is copied.
Upon OOM the 'virStringListFree' is called on progenv and
arglist. Unfortunately, because the elements in 'arglist'
related to env vars have been set to NULL, the call to
virStringListFree(arglist) doesn't free anything, even
though some non-NULL args vars still exist later in the
array.
To fix this leak, stop NULL-ifying the 'arglist' elements,
and change the cleanup code to only free elements in the
'arglist' array, not 'progenv'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In a number of places in qemuParseCommandLineDisk, an error
is reported, but no 'goto error' jump is used. This causes
failure to report OOM conditions to the caller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in qemuParseCommandLineDisk some intermediate
variables will be leaked when parsing Sheepdog or RBD disks.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuBuildCommandLine code for parsing sound cards will leak
an intermediate variable if an OOM occurs. Move the free'ing of
the variable earlier to avoid the leak.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In qemuParseNBDString, if the virURIParse fails, the
error is not reported to the caller. Instead execution
falls through to the non-URI codepath causing memory
leaks later on.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If qemuAddRBDHost fails due to parsing problems or OOM, then
qemuParseRBDString cleanup is skipped causing a memory leak.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemuDomainPCIAddressGetNextSlot has a loop for finding
compatible PCI buses. In the loop body it creates a
PCI address string, but never frees this. This causes
a leak if the loop executes more than one iteration,
or if a call in the loop body fails.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If virDomainSoundCodecDefParseXML returns an error (eg due
to OOM), then the xml nodeset codecNodes is leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If virDomainVcpuPinDefArrayFree is called with def != NULL,
but nvcpupin == 0, then it leaks memory for 'def'. This is
an unusual scenario, but it hits when cleaning up after an
OOM during parsing of XML.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves one of the issues listed in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
00:1E.0 is the location of this controller on at least some actual Q35
hardware, so we try to replicate the placement. The bridge should work
just as well in any other location though, so if 00:1E.0 isn't
available, just allow it to be auto-assigned anywhere appropriate.
This resolves one of the issues in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
This device is identical to qemu's "intel-hda" device (known as "ich6"
in libvirt), but has a different PCI device ID (which matches the ID
of the hda audio built into the ich9 chipset, of course). It's not
supported in earlier versions of qemu, so it requires a capability
bit.
I'm not sure why this code was written to compare the strings that it
had just retrieved from an enum->string conversion, rather than just
look at the original enum values, but this yields the same results,
and is much more efficient (especially as you add more devices).
This is a prerequisite for patches to resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
Part of the resolution to:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
Although most devices available in qemu area defined as PCI devices,
and strictly speaking should only be attached via a PCI slot, in
practice qemu allows them to be attached to a PCIe slot and sometimes
this makes sense.
For example, The UHCI and EHCI USB controllers are usually attached
directly to the PCIe "root complex" (i.e. PCIe slots) on real
hardware, so that should be possible for a Q35-based qemu virtual
machine as well.
We still want to prefer a standard PCI slot when auto-assigning
addresses, though, and in general to disallow attaching PCI devices
via PCIe slots.
This patch makes that possible by adding a new
QEMU_PCI_CONNECT_TYPE_EITHER_IF_CONFIG flag. Three things are done
with this flag:
1) It is set for the "pcie-root" controller
2) qemuCollectPCIAddress() now has a set of nested switches that set
this "EITHER" flag for devices that we want to allow connecting to
pcie-root when specifically requested in the config.
3) qemuDomainPCIAddressFlagsCompatible() adds this new flag to the
"flagsMatchMask" if the address being checked came from config rather
than being newly auto-allocated by libvirt (this knowledge is
conveniently already available in the "fromConfig" arg).
Now any device having the EITHER flag set can be connected to
pcie-root if explicitly requested, but auto-allocated addresses for
those devices will still be standard PCI slots instead.
This patch only loosens the restrictions on devices that have been
specifically requested, but the setup is such that it should be fairly
easy to add new devices.
Replace them with switch cases. This will make it more efficient when
we add exceptions for more controller types, and other device types.
This is a prerequisite for patches to resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
Packets sent by guests on virbrN, *or* by dnsmasq on the same, to
- 255.255.255.255/32 (netmask-independent local network broadcast
address), or to
- 224.0.0.0/24 (local subnetwork multicast range)
are never forwarded, hence it is not necessary to masquerade them.
In fact we must not masquerade them: translating their source addresses or
source ports (where applicable) may confuse receivers on virbrN.
One example is the DHCP client in OVMF (= UEFI firmware for virtual
machines):
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/1506/focus=2640
It expects DHCP replies to arrive from remote source port 67. Even though
dnsmasq conforms to that, the destination address (255.255.255.255) and
the source address (eg. 192.168.122.1) in the reply allow the UDP
masquerading rule to match, which rewrites the source port to or above
1024. This prevents the DHCP client in OVMF from accepting the packet.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=709418
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
The functions
- iptablesAddForwardDontMasquerade(),
- iptablesRemoveForwardDontMasquerade
handle exceptions in the masquerading implemented in the POSTROUTING chain
of the "nat" table. Such exceptions should be added as chronologically
latest, logically top-most rules.
The bridge driver will call these functions beginning with the next patch:
some special destination IP addresses always refer to the local
subnetwork, even though they don't match any practical subnetwork's
netmask. Packets from virbrN targeting such IP addresses are never routed
outwards, but the current rules treat them as non-virbrN-destined packets
and masquerade them. This causes problems for some receivers on virbrN.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
The previous patches added infrastructure to report better errors from
monitor in some cases. This patch finalizes this "feature" by enabling
this enhanced error reporting on early phases of VM startup. In these
phases the possibility of qemu producing a useful error message is
really high compared to running it during the whole life cycle. After
the start up is complete, the feature is disabled to provide the usual
error messages so that users are not confused by possibly irrelevant
messages that may be in the domain log.
The original motivation to do this enhancement is to capture errors when
using VFIO device passthrough, where qemu reports errors after the
monitor is initialized and the existing error catching code couldn't
catch this producing a unhelpful message:
# virsh start test
error: Failed to start domain test
error: Unable to read from monitor: Connection reset by peer
With this change, the message is changed to:
# virsh start test
error: Failed to start domain test
error: internal error: early end of file from monitor: possible problem:
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: error, group 8 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to get group 8
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
Change the monitor error code to add the ability to access the qemu log
file using a file descriptor so that we can dig in it for a more useful
error message. The error is now logged on monitor hangups and overwrites
a possible lesser error. A hangup on the monitor usualy means that qemu
has crashed and there's a significant chance it produced a useful error
message.
The functionality will be latent until the next patch.
Early VM startup errors usually produce a better error message in the
machine log file. Currently we were accessing it only when the process
exited during certain phases of startup. This will help adding a more
comprehensive error extraction for early qemu startup phases.
This patch adds infrastructure to keep a file descriptor for the machine
log file that will be used in case an error happens.
Teach the function to skip character device definitions printed by qemu
at startup in addition to libvirt log messages and make it usable from
outside of qemu_process.c. Also add documentation about the func.
The parsing of '-usb' did not check for failure of the
virDomainControllerInsert method. As a result on OOM, the
parser mistakenly attached USB disks to the IDE controller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code formatting NUMA args was ignoring the return value
of virBitmapFormat, so on OOM, it would silently drop the
NUMA cpumask arg.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When building boot menu args, if OOM occurred the CLI args
would end up containing 'order=(null)' due to a missing
call to 'virBufferError'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
On win32, using text mode for binary files might result in short
reads since ASCII character 0x1A is interpreted as EOF. Also, it
could lead to problems using the seek functions because of the \r
handling.
Signed-off-by: Claudio Bley <cbley@av-test.de>
N.B. This had no ill effects as long as O_RDONLY is defined to
to be 0, such that the expression (O_RDONLY < 0) yielded 0
again.
Signed-off-by: Claudio Bley <cbley@av-test.de>
The virCommandAddEnvPassCommon method ignored the failure to
pre-allocate the env variable array with VIR_RESIZE_N. While
this is harmless, it confuses the test harness which is trying
to validate OOM handling of every individual allocation call.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When the various viralloc.c functions were changed to use the
normal error reporting code, the OOM injection code paths
were not updated to report errors.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuParseCommandLine method did not check the return value of
virStringSplit to see if OOM had occurred. This lead to dereference
of a NULL pointer on OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Most callers of qemuParseKeywords were assigning its return
value to a 'size_t' variable. Then then also checked '< 0'
for error condition, but this will never be true with the
unsigned size_t variable. Rather than using 'ssize_t', change
qemuParseKeywords so that the element count is returned via
an output parameter, leaving the return value solely as an
error indicator.
This avoids a crash accessing beyond the end of an error
upon OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In
commit 41b5505679
Author: Eric Blake <eblake@redhat.com>
Date: Wed Aug 28 15:01:23 2013 -0600
qemu: simplify list cleanup
The qemuStringToArgvEnv method was changed to use virStringFreeList
to free the 'arglist' array. This method assumes the string list
array is NULL terminated, however, qemuStringToArgvEnv was not
ensuring this when populating 'arglist'. This caused an out of
bounds access by virStringFreeList when OOM occured in the initial
loop of qemuStringToArgvEnv
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When parsing the RBD hosts, it increments the 'nhosts' counter
before increasing the 'hosts' array allocation. If an OOM then
occurs when increasing the array allocation, the cleanup block
will attempt to access beyond the end of the array. Switch
to using VIR_EXPAND_N instead of VIR_REALLOC_N to protect against
this mistake
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in qemuDomainCCWAddressSetCreate, it jumps to
a cleanup block and frees the partially initialized object.
It then mistakenly returns the address of the just free'd
pointer instead of NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a OOM error occurs in virGetConnect, this may cause the
virConnectDispose method to de-reference a NULL pointer,
since the close callback will not have been initialized.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virDomainDefParseXML method did not check the return value
of the virBitmapNew API call for NULL. This lead to a crash on
OOM
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If an OOM error occurs in virSecurityDeviceLabelDefParseXML the
cleanup code may free an uninitialized pointer, causing a crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow creation of a virNetSocketPtr instance from a pre-opened
socketpair FD, add a virNetSocketNewConnectSockFD method.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The new function virConnectGetCPUModelNames allows to retrieve the list
of CPU models known by the hypervisor for a specific architecture.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The fix for CVE-2013-4311 had a pre-requisite enhancement
to the identity code
commit db7a5688c0
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu Aug 22 16:00:01 2013 +0100
Also store user & group ID values in virIdentity
This had a typo which caused the group ID to overwrite the
user ID string. This meant any checks using this would have
the wrong ID value. This only affected the ACL code, not the
initial polkit auth. It also leaked memory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a dir does not exist, raise an immediate error in logs
rather than letting virFileResolveAllLinks fail, since this
gives better error reporting to the user.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
When FUSE is enabled, the LXC container is setup with
a custom /proc/meminfo file. This file uses "KB" as a
suffix, rather than "kB" which is the kernel's style.
Fix this inconsistency to avoid confusing apps.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
The ABI compatibility check for domain features didn't check the
expanded HyperV and APIC EOI values, thus possibly allowing change in
guest ABI.
Add the check and use typecasted switch statement to warn developers
when adding a new HyperV feature.
Since the wait is done during migration (still inside
QEMU_ASYNC_JOB_MIGRATION_OUT), the code should enter the monitor as such
in order to prohibit all other jobs from interfering in the meantime.
This patch fixes bug #1009886 in which qemuDomainGetBlockInfo was
waiting on the monitor condition and after GetSpiceMigrationStatus
mangled its internal data, the daemon crashed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1009886
This splits up the version parsing code into a callable API like QEMU
help/version string parsing so that we can test it as we need to add
additional patterns for newer versions/products.
Rather than looking up the path to vmrun each time we call it, look it
up once and save it. This sets up the ability for us to detect where the
path is on Mac OS X and not have to look it up each time we execute it.
The VMware driver supports multiple backends for the VMware Player and
VMware Workstation, convert this logic into enum and use VIR_ENUM_IMPL()
to provide conversions to and from strings.
This resolves https://bugzilla.redhat.com/show_bug.cgi?id=1008903
The Q35 machinetype has an implicit SATA controller at 00:1F.2 which
isn't given the "expected" id of ahci0 by qemu when it's created. The
original suggested solution to this problem was to not specify any
controller for the disks that use the default controller and just
specify "unit=n" instead; qemu should then use the first IDE or SATA
controller for the disk.
Unfortunately, this "solution" is ignorant of the fact that in the
case of SATA disks, the "unit" attribute in the disk XML is actually
*not* being used for the unit, but is instead used to specify the
"bus" number; each SATA controller has 6 buses, and each bus only
allows a single unit. This makes it nonsensical to specify unit='n'
where n is anything other than 0. It also means that the only way to
connect more than a single device to the implicit SATA controller is
to explicitly give the bus names, which happen to be "ide.$n", where
$n can be replaced by the disk's "unit" number.
In commit 6d41cb8, the interface for virEventAddHandleFunc was changed.
This patch updates the documentation for virEventAddHandle to reflect
the new significance of the return value. Also, both functions now
mention -1 for failure.
With the existing pkcheck (pid, start time) tuple for identifying
the process, there is a race condition, where a process can make
a libvirt RPC call and in another thread exec a setuid application,
causing it to change to effective UID 0. This in turn causes polkit
to do its permission check based on the wrong UID.
To address this, libvirt must get the UID the caller had at time
of connect() (from SO_PEERCRED) and pass a (pid, start time, uid)
triple to the pkcheck program.
This fix requires that libvirt is re-built against a version of
polkit that has the fix for its CVE-2013-4288, so that libvirt
can see 'pkg-config --variable pkcheck_supports_uid polkit-gobject-1'
Signed-off-by: Colin Walters <walters@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The polkit access driver will want to use the process start
time field. This was already set for network identities, but
not for the system identity.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Future improvements to the polkit code will require access to
the numeric user ID, not merely user name.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virDomainSetBlockIoTuneEnsureACL was incorrectly called after we already
started a job. As a result of this, the job was not cleaned up when an
access driver had forbidden the action.
Useful to set custom forwarders instead of using the contents of
/etc/resolv.conf. It helps me to setup dnsmasq as local nameserver to
resolve VM domain names from domain 0, when domain option is used.
Signed-off-by: Diego Woitasen <diego.woitasen@vhgroup.net>
Signed-off-by: Eric Blake <eblake@redhat.com>
VMWare Fusion 5 can set the CD-ROM's device name to be 'auto detect' when
using the physical drive via 'cdrom-raw' device type. VMWare will then
connect to first available host CD-ROM to the virtual machine upon start
up according to VMWare documentation. If no device is available, it
appears that the device will remain disconnected.
To better model this a CD-ROM that is marked as "auto detect" when in
the off state would be modeled as the following with this patch:
<disk type='block' device='lun'>
<source startupPolicy='optional'/>
<target dev='hda' bus='ide'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
Once the domain transitions to the powered on state, libvirt can
populate the remaining source data with what is connected, if anything.
However future power cycles, the domain may not always start with that
device attached.
Currently the XML parser already allows the following syntax:
<disk type='block' device='cdrom'>
<source startupPolicy='optional'/>
<target dev='hda' bus='ide'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
But it if the dev value is NULL then it would not have the leading
"<source ", resulting in invalid XML.
qemu/KVM also supports a tftp URL while specifying the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='tftp' name='/url/path'>
<host name='host.name' port='69'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The ftps protocol is another protocol supported by qemu/KVM while specifying
the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftps' name='/url/path'>
<host name='host.name' port='990'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The https protocol is also accepted by qemu/KVM when specifying the cdrom ISO
image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='https' name='/url/path'>
<host name='host.name' port='443'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
GCC 4.8.0+ whines about variable "new" being uninitialized since
commit 73bfac0e71. This is a false positive as the
xmlFreeNode(new) statement can be only reached if new was actually
allocated successfully.
CC conf/libvirt_conf_la-domain_conf.lo
conf/domain_conf.c: In function 'virDomainDefSetMetadata':
conf/domain_conf.c:18650:24: error: 'new' may be used uninitialized in this function [-Werror=maybe-uninitialized]
xmlFreeNode(new);
Reported independently by John Ferlan and Michal Privoznik.
Commit 073e1575 tried to set things up so that 1) generated files
to be shipped in the tarball always live in srcdir, and 2) we have
no files in SOURCES that depend on any other files with a literal
$(srcdir) in the name, because that situation can cause confusing
results for the make expansion of $@ depending on whether the file
is found locally or via VPATH. But all my testing for that patch
was done incrementally, where all the protocol.[ch] files had
already been generated prior to the patch and were up-to-date in
the srcdir, and thus I missed one case where $@ causes grief in a
VPATH build from a fresh checkout:
We have a pattern rule for generating remote_protocol.[ch], and
what's more, the rule for protocol.c depends on protocol.h AND
on the protocol.x file. The pattern for protocol.c is only
satisfied via the VPATH lookup for protocol.x, and if protocol.h
doesn't yet exist, the VPATH rule kicks in and we end up with a
dependency on a file with $(srcdir) in the name. Based on make's
rules for $@, this resulted in make building remote_protocol.h
into srcdir (where we want it), then remote_protocol.c into
builddir (oops, not so good for the tarball), and also causes
the build to fail (the compiler can't find the .h if it lives
in a different directory than the .c):
CC remote/libvirt_driver_remote_la-remote_protocol.lo
remote/remote_protocol.c:7:29: fatal error: remote_protocol.h: No such file or directory
#include "remote_protocol.h"
^
compilation terminated.
As before, the fix is to hard-code the output file to go into
srcdir in spite of $@; but since this is in a pattern rule, we
are forced to use $@ in the recipe, so the patch is a bit
trickier than what was done in commit 073e1575.
* src/Makefile.am (%protocol.c, %protocol.h): Force output to srcdir.
Signed-off-by: Eric Blake <eblake@redhat.com>
Eric Blake suggested that we could do a little better in case copying of
the metadata to be set fails. With this patch, the old metadata is
discarded after the new string is copied successfuly.
If the ABI compatibility check with the "migratable" user XML is
successful, we would leak the originally parsed XML from the user that
would not be used in this case.
Reported by Ján Tomko.
virDomainSetMetadata when operating on the metadata element was
requesting the @key argument to be passed even if @metadata was NULL
used to delete the corresponding metadata element. This is not needed as
the key is only used when adding the element and matching is done via
the XML namespace.
The virDomainGetMetadata function was designed to support also retrieval
of app specific metadata from the <metadata> element. This functionality
was never implemented originally.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
In the original implementation of external checkpoints I've mistakenly
used the live definition to be stored in the save image. The normal
approach is to use the "migratable" definition. This was discovered when
commit 07966f6a8b changed the behavior to
use a converted XML from the user to do the compatibility check to fix
problem when using the regular machine saving.
As the previous patch added a compatibility layer, we can now change the
type of the XML in the image.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1008340
External checkpoints have a bug in the implementation where they use the
normal definition instead of the "migratable" one. This causes errors
when the snapshot is being reverted using the workaround method via
qemuDomainRestoreFlags() with a custom XML. This issue was introduced
when commit 07966f6a8b changed the code to
compare "migratable" XMLs from the user as we should have used
migratable in the image too.
This patch adds a compatibility layer, so that fixing the snapshot code
won't make existing snapshots fail to load.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1008340
CD-ROMs and Floppies are allowed to have no source to imply they are
empty or disconnected. Since the LUN type is used for raw CD-ROM access
with QEMU (and VMWare in the future), it also needs to allow an empty
source when the raw CD-ROM device is disconnected from the domain.
qemuMigrationEatCookie has flags to control if these should
be parsed, but it does not fill mig->flags. These cookies might
get leaked if these flags are not set by qemuMigrationBakeCookie.
42 (32 direct, 10 indirect) bytes in 1 blocks are definitely lost in
loss record 361 of 662
==123== by 0x1BA33FCA: qemuMigrationEatCookie (qemu_migration.c:678)
==123== by 0x1BA34A1E: qemuMigrationRun (qemu_migration.c:3108)
==123== by 0x1BA3622B: doNativeMigrate (qemu_migration.c:3343)
==123== by 0x1BA3B408: qemuMigrationPerform (qemu_migration.c:4138)
https://bugzilla.redhat.com/show_bug.cgi?id=1008619
1,003 bytes in 1 blocks are definitely lost in loss record 599 of 635
==404== by 0x50728A7: virBufferAddChar (virbuffer.c:185)
==404== by 0x50BC466: virSystemdEscapeName (virsystemd.c:67)
==404== by 0x50BC6B2: virSystemdMakeSliceName (virsystemd.c:108)
==404== by 0x50BC870: virSystemdCreateMachine (virsystemd.c:169)
==404== by 0x5078267: virCgroupNewMachine (vircgroup.c:1498)
Bother those kernel developers. In the latest rawhide, kernel
and glibc have now been unified so that <netinet/in.h> and
<linux/in6.h> no longer clash; but <linux/if_bridge.h> is still
not self-contained. Because of the latest header change, the
build is failing with:
checking for linux/param.h... no
configure: error: You must install kernel-headers in order to compile libvirt with QEMU or LXC support
with details:
In file included from conftest.c:561:0:
/usr/include/linux/in6.h:71:18: error: field 'flr_dst' has incomplete type
struct in6_addr flr_dst;
We need a workaround to avoid our workaround :)
* configure.ac (NETINET_LINUX_WORKAROUND): New test.
* src/util/virnetdevbridge.c (includes): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since virnetsocket conditionally uses selinux we need to link against it
otherwise the build fails with:
CCLD libvirtd
/usr/bin/ld: ../src/.libs/libvirt-lxc.so: undefined reference to symbol 'freecon'
/lib/i386-linux-gnu/libselinux.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[3]: *** [libvirtd] Error 1
An off-list bug report mentioned some confusion where the public
documentation of libvirt.c:virConnectGetHostname did not match
the private documentation of util/virutil.c:virGetHostname.
* src/libvirt.c (virConnectGetHostname): Tweak docs.
Signed-off-by: Eric Blake <eblake@redhat.com>
The VIR_ACCESS_PERM_CONNECT_DETECT_STORAGE_POOLS enum
constant had its string format be 'detect_storage_pool',
note the missing trailing 's'. This prevent the ACL
check from ever succeeding. Fix this and add a simple
test script to validate this problem of matching names.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Right now we mount selinuxfs even user namespace is enabled and
ignore the error. But we shouldn't ignore these errors when user
namespace is not enabled.
This patch skips mounting selinuxfs when user namespace enabled.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
When reverting a live internal snapshot with a live guest the ABI
compatiblity check was comparing a "migratable" definition with a normal
one. This resulted in the check failing with:
revert requires force: Target device address type none does not match source pci
This patch generates a "migratable" definition from the actual one to
check against the definition from the snapshot to avoid this problem.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1006886
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=983026
The netcf interface driver previously had no state driver associated
with it - as a connection was opened, it would create a new netcf
instance just for that connection, and close it when it was
finished. the problem with this is that each connection to libvirt
used up a netlink socket, and there is a per process maximum of ~1000
netlink sockets.
The solution is to create a state driver to go along with the netcf
driver. The state driver will opens a netcf instance, then all
connections share that same netcf instance, thus only a single
netlink socket will be used no matter how many connections are mde to
libvirtd.
This was rather simple to do - a new virObjectLockable class is
created for the single driverState object, which is created in
netcfStateInitialize and contains the single netcf handle; instead of
creating a new object for each client connection, netcfInterfaceOpen
now just increments the driverState object's reference count and puts
a pointer to it into the connection's privateData. Similarly,
netcfInterfaceClose() just un-refs the driverState object (as does
netcfStateCleanup()), and virNetcfInterfaceDriverStateDispose()
handles closing the netcf instance. Since all the functions already
have locking around them, the static lock functions used by all
functions just needed to be changed to call virObjectLock() and
virObjectUnlock() instead of directly calling the virMutex* functions.
This better fits the modern naming scheme in libvirt, and anticipates
an upcoming change where a single instance of this state will be
maintained by a separate state driver, and every instance of the netcf
driver will share the same state.
If the guest is configured with
<filesystem type='mount'>
<source dir='/'/>
<target dir='/'/>
<readonly/>
</filesystem>
Then any submounts under / should also end up readonly, except
for those setup as basic mounts. eg if the user has /home on a
separate volume, they'd expect /home to be readonly, but we
should not touch the /sys, /proc, etc dirs we setup ourselves.
Users can selectively make sub-mounts read-write again by
simply listing them as new mounts without the <readonly>
flag set
<filesystem type='mount'>
<source dir='/home'/>
<target dir='/home'/>
</filesystem>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the array of basic mounts out of the lxcContainerMountBasicFS
function, to a global variable. This is to allow it to be referenced
by other methods wanting to know what the basic mount paths are.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Debian systems may run the 'systemd-logind' daemon, which causes the
/sys/fs/cgroup/systemd mount to be setup, but no other cgroup
controllers are created. While the LXC driver considers cgroups to
be mandatory, the QEMU driver is supposed to accept them as optional.
We detect whether they are present by looking in /proc/mounts for
any mounts of type 'cgroups', but this is not sufficient. We need to
skip any named mounts (as seen by a name=XXX string in the mount
options), so that we only detect actual resource controllers.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=721979
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The polkit access driver used the wrong permission names for checks
on storage pools, volumes and node devices. This led to them always
being denied access.
The 'dettach' permission was also mis-spelt and should have been
'detach'. While permission names are ABI sensitive, the fact that
the code used the wrong object name for checking node device
permissions, means that no one could have used the mis-spelt
'dettach' permission.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This fixes the description of virConnectGetType() API function in
API documentation to match the real functionality that it can be
used to get driver name, and provide a hint on how to learn about
full capabilities.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch introduces virDBusIsServiceEnabled, we can use
this method to get if the service is supported.
In one case, if org.freedesktop.machine1 is unavailable on
host, we should skip creating machine through systemd.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
The devpts, dev and fuse filesystems are mounted temporarily.
there is no need to export them to container if container shares
the root directory with host.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Some users in Ubuntu/Debian seem to have a setup where all the
cgroup controllers are mounted on /sys/fs/cgroup rather than
any /sys/fs/cgroup/<controller> name. In the loop which detects
which controllers are present for a mount point we were modifying
'mnt_dir' field in the 'struct mntent' var, but not always restoring
the original value. This caused detection to break in the all-in-one
mount setup.
Fix that logic bug and add test case coverage for this mount
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After freeing the bitmap pointer, it must set the pointer to NULL.
This will avoid any other use of the freed memory of the bitmap pointer.
https://bugzilla.redhat.com/show_bug.cgi?id=1006710
Signed-off-by: Liuji (Jeremy) <jeremy.liu@huawei.com>
Osier Yang pointed out that ever since commit 31cb030, the
signature of qemuDomainObjEndJob was changed to return a bool.
While comparison against 0 or > 0 still gives the right results,
it looks fishy; we also had one place that was comparing < 0
which is effectively dead code.
* src/qemu/qemu_migration.c (qemuMigrationPrepareAny): Fix dead
code bug.
(qemuMigrationBegin): Use more canonical form of bool check.
* src/qemu/qemu_driver.c (qemuAutostartDomain)
(qemuDomainCreateXML, qemuDomainSuspend, qemuDomainResume)
(qemuDomainShutdownFlags, qemuDomainReboot, qemuDomainReset)
(qemuDomainDestroyFlags, qemuDomainSetMemoryFlags)
(qemuDomainSetMemoryStatsPeriod, qemuDomainInjectNMI)
(qemuDomainSendKey, qemuDomainGetInfo, qemuDomainScreenshot)
(qemuDomainSetVcpusFlags, qemuDomainGetVcpusFlags)
(qemuDomainRestoreFlags, qemuDomainGetXMLDesc)
(qemuDomainCreateWithFlags, qemuDomainAttachDeviceFlags)
(qemuDomainUpdateDeviceFlags, qemuDomainDetachDeviceFlags)
(qemuDomainBlockResize, qemuDomainBlockStats)
(qemuDomainBlockStatsFlags, qemuDomainMemoryStats)
(qemuDomainMemoryPeek, qemuDomainGetBlockInfo)
(qemuDomainAbortJob, qemuDomainMigrateSetMaxDowntime)
(qemuDomainMigrateGetCompressionCache)
(qemuDomainMigrateSetCompressionCache)
(qemuDomainMigrateSetMaxSpeed)
(qemuDomainSnapshotCreateActiveInternal)
(qemuDomainRevertToSnapshot, qemuDomainSnapshotDelete)
(qemuDomainQemuMonitorCommand, qemuDomainQemuAttach)
(qemuDomainBlockJobImpl, qemuDomainBlockCopy)
(qemuDomainBlockCommit, qemuDomainOpenGraphics)
(qemuDomainGetBlockIoTune, qemuDomainGetDiskErrors)
(qemuDomainPMSuspendForDuration, qemuDomainPMWakeup)
(qemuDomainQemuAgentCommand, qemuDomainFSTrim): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Automake 2.0 will enable subdir-objects by default; in preparation
for that change, automake 1.14 outputs LOADS of warnings:
daemon/Makefile.am:38: warning: source file '../src/remote/remote_protocol.c' is in a subdirectory,
daemon/Makefile.am:38: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled. For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory. However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.
daemon/Makefile.am:38: warning: source file '../src/remote/lxc_protocol.c' is in a subdirectory,
daemon/Makefile.am:38: but option 'subdir-objects' is disabled
...
As automake 1.9 also supported this option, and the previous patches
fixed up the code base to work with it, it is safe to now turn it on
unconditionally.
* configure.ac (AM_INIT_AUTOMAKE): Enable subdir-objects.
* .gitignore: Ignore .dirstamp directories.
* src/Makefile.am (PDWTAGS, *-protocol-struct): Adjust to
new subdir-object location of .lo files.
Signed-off-by: Eric Blake <eblake@redhat.com>
We have been adding new .x files without keeping the list of
*-structs files up-to-date. This adds the support for the
recent additions.
In the process of testing this, I also noticed that Fedora 19's
use of dwarves-1.10 (providing pdwtags version 1.9) was producing
a single line on stderr but still giving enough useful info on
stdout that we could check structs; the real goal of checking
stderr separately from stdout was to avoid the bug in dwarves-1.9
where stdout was empty (see bug http://bugzilla.redhat.com/772358).
* src/Makefile.am (struct_prefix, PROTOCOL_STRUCTS): Add missing
struct tests.
(PDWTAGS): Work with Fedora 19 pdwtags.
(lxc_monitor_protocol-struct, lock_protocol-struct): New rules.
* src/lxc_monitor_protocol-structs: New file.
* src/lock_protocol-structs): Likewise.
* cfg.mk (generated_files): Enlarge list.
Signed-off-by: Eric Blake <eblake@redhat.com>
Trying to enable automake's subdir-objects option resulted in
the creation of literal directories such as src/$(srcdir)/remote/.
I traced this to the fact that we had used a literal $(srcdir)
in a location that later fed an automake *_SOURCES variable.
This has also been reported as an automake bug:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928
but it's better to fix our code than to wait for an automake fix.
Some things to remember that affect VPATH builds, and where an
in-tree build is blissfully unaware of the issues: if a VPATH
build fails to find a file that was used as a prereq of any
other target, then the rule for that file will expand $@ to
prefer the current build dir (bad because a VPATH build on a
fresh checkout will then stick $@ in the current directory
instead of the desired srcdir); conversely, if a VPATH build
finds the file in srcdir but decides it needs to be rebuilt,
then the rule for that file will expand $@ to include the
directory where it was found out-of-date (bad for an explicit
listing of $(srcdir)/$@ because an incremental VPATH build will
then expand srcdir twice). As we want these files to go into
srcdir unconditionally, we have to massage or avoid $@ for any
recipe that involves one of these files.
Therefore, this patch removes all uses of $(srcdir) from any
generated file name that later feeds a *_SOURCES variable, and
then rewrites all the recipes to generate those files to
hard-code their creation into srcdir without the use of $@.
* src/Makefile.am (REMOTE_DRIVER_GENERATED): Drop $(srcdir); VPATH
builds know how to find the files, and automake subdir-objects
fails with it in place.
(LXC_MONITOR_PROTOCOL_GENERATED, (LXC_MONITOR_GENERATED)
(ACCESS_DRIVER_GENERATED, LOCK_PROTOCOL_GENERATED): Likewise.
(*_client_bodies.h): Hard-code rules to write into srcdir, as
VPATH tries to build $@ locally if missing.
(util/virkeymaps.h): Likewise.
(lxc/lxc_monitor_dispatch.h): Likewise.
(access/viraccessapi*): Likewise.
(locking/lock_daemon_dispatch_stubs.h): Likewise.
* daemon/Makeflie.am (DAEMON_GENERATED, remote_dispatch.h):
Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
fixup DAEMON_GENERATED
Failure to attach to a domain during 'virsh qemu-attach' left
the list of domains in an odd state:
$ virsh qemu-attach 4176
error: An error occurred, but the cause is unknown
$ virsh list --all
Id Name State
----------------------------------------------------
2 foo shut off
$ virsh qemu-attach 4176
error: Requested operation is not valid: domain is already active as 'foo'
$ virsh undefine foo
error: Failed to undefine domain foo
error: Requested operation is not valid: cannot undefine transient domain
$ virsh shutdown foo
error: Failed to shutdown domain foo
error: invalid argument: monitor must not be NULL
It all stems from leaving the list of domains unmodified on
the initial failure; we should follow the lead of createXML
which removes vm on failure (the actual initial failure still
needs to be fixed in a later patch, but at least this patch
gets us to the point where we aren't getting stuck with an
unremovable "shut off" transient domain).
While investigating, I also found a leak in qemuDomainCreateXML;
the two functions should behave similarly. Note that there are
still two unusual paths: if dom is not allocated, the user will
see an OOM error even though the vm remains registered (but oom
errors already indicate tricky cleanup); and if the vm starts
and then quits again all before the job ends, it is possible
to return a non-NULL dom even though the dom will no longer be
useful for anything (but this at least lets the user know their
short-lived vm ran).
* src/qemu/qemu_driver.c (qemuDomainCreateXML): Don't leak vm on
failure to obtain job.
(qemuDomainQemuAttach): Match cleanup of qemuDomainCreateXML.
Signed-off-by: Eric Blake <eblake@redhat.com>
ARM v7 can operate in either little or big endian modes. Add
support for the big-endian version known as armv7b from uname.
Signed-off-by: Yogesh Tillu <tillu.yogesh@gmail.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, only X86 provides users CPU features with CPUID instruction.
If users specify the features for non-x86, it should tell users to
remove them.
This patch is to report one error if features are specified by
users for non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
While debugging a failure of 'virsh qemu-attach', I noticed that
we were leaking the count of active domains on failure. This
means that a libvirtd session that is supposed to quit after
active domains disappear will hang around forever.
* src/qemu/qemu_process.c (qemuProcessAttach): Undo count of
active domains on failure.
Signed-off-by: Eric Blake <eblake@redhat.com>
In Fedora 19, 'qemu-kvm' is a simple wrapper that calls
'qemu-system-x86_64 -machine accel=kvm'. Attempting
to use 'virsh qemu-attach $pid' to a machine started as:
qemu-kvm -cdrom /var/lib/libvirt/images/foo.img \
-monitor unix:/tmp/demo,server,nowait -name foo \
--uuid cece4f9f-dff0-575d-0e8e-01fe380f12ea
was failing with:
error: XML error: No PCI buses available
because we did not see 'kvm' in the executable name read from
/proc/$pid/cmdline, and tried to assign os.machine as
"accel=kvm" instead of "pc"; this in turn led to refusal to
recognize the pci bus.
Noticed while investigating https://bugzilla.redhat.com/995312
although there are still other issues to fix before that bug
will be completely solved.
I've concluded that the existing parser code for native-to-xml
is a horrendous hodge-podge of ad-hoc approaches; I basically
rewrote the -machine section to be a bit saner.
* src/qemu/qemu_command.c (qemuParseCommandLine): Don't assume
-machine argument is always appropriate for os.machine; set
virtType if accel is present.
Signed-off-by: Eric Blake <eblake@redhat.com>
'virsh domxml-from-native' and 'virsh qemu-attach' could misbehave
for an emulator installed in (a somewhat unlikely) location
such as /usr/local/qemu-1.6/qemu-system-x86_64 or (an even less
likely) /opt/notxen/qemu-system-x86_64. Limit the strstr seach
to just the basename of the file where we are assuming details
about the binary based on its name.
While testing, I accidentally triggered a core dump during strcmp
when I forgot to set os.type on one of my code paths; this patch
changes such a coding error to raise a nicer internal error instead.
* src/qemu/qemu_command.c (qemuParseCommandLine): Compute basename
earlier.
* src/conf/domain_conf.c (virDomainDefPostParseInternal): Avoid
NULL deref.
Signed-off-by: Eric Blake <eblake@redhat.com>
The regular expression used to determine guest capabilities
was compiled in libxlCapsInitHost() but used in libxlCapsInitGuests().
Move compilation to libxlCapsInitGuests() where it is used, and free
the compiled regex after use. Ensure not to free the regex if
compilation fails.
On Power platform, Power7+ can support Power7 guest.
It needs to define XML configuration to specify guest's CPU model.
For exmaple:
<cpu match='exact'>
<model>POWER7_v2.1</model>
<vendor>IBM</vendor>
</cpu>
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
CPU features are not supported on non-x86 and hasFeatures will be NULL.
This patch is to remove CPU features functions calling to avoid errors.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
This patch changes virFileLoopDeviceOpen() to use the new loop-control
device to allocate a new loop device. If this behavior is unsupported
we fall back to the previous method of searching /dev for a free device.
With this patch you can start as many image based LXC domains as you
like (well almost).
Fixes bug https://bugzilla.redhat.com/show_bug.cgi?id=995543
Right now, securityfs is disallowed to be mounted in non-initial
user namespace, so we must avoid trying to mount securityfs in
a container which has user namespace enabled.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
The ESX code has a method esxVI_Alloc which would call
virAllocN directly, instead of using the VIR_ALLOC_N
macro. Remove this method and make the callers just
use VIR_ALLOC as is normal practice.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Delete the USB controller check from the USB Device checklist in
virDomainDeviceIsUSB as USB controller is a PCI device rather than
a USB one.
Signed-off-by: Liu Ji <jeremy.liu@huawei.com>
The s390, ppc and arm CPU drivers never set the 'arch' field
in their impl of cpuArchNodeData. This leads to error messages
being reported from cpuDataFree later, due to trying to use
VIR_ARCH_NONE.
#0 virRaiseErrorFull (filename=filename@entry=0x76f94434 "cpu/cpu.c", funcname=funcname@entry=0x76f942dc <__FUNCTION__.18096> "cpuGetSubDriver", linenr=linenr@entry=58,
domain=domain@entry=31, code=code@entry=1, level=level@entry=VIR_ERR_ERROR, str1=0x76f70e18 "internal error: %s",
str2=str2@entry=0x7155f2ec "undefined hardware architecture", str3=str3@entry=0x0, int1=int1@entry=-1, int2=int2@entry=-1, fmt=0x76f70e18 "internal error: %s")
at util/virerror.c:646
#1 0x76e682ea in virReportErrorHelper (domcode=domcode@entry=31, errorcode=errorcode@entry=1, filename=0x76f94434 "cpu/cpu.c",
funcname=0x76f942dc <__FUNCTION__.18096> "cpuGetSubDriver", linenr=linenr@entry=58, fmt=0x76f7e7e4 "%s") at util/virerror.c:1292
#2 0x76ed82d4 in cpuGetSubDriver (arch=<optimized out>) at cpu/cpu.c:57
#3 cpuGetSubDriver (arch=VIR_ARCH_NONE) at cpu/cpu.c:51
#4 0x76ed8818 in cpuDataFree (data=data@entry=0x70c22d78) at cpu/cpu.c:216
#5 0x716aaec0 in virQEMUCapsInitCPU (arch=VIR_ARCH_ARMV7L, caps=0x70c29a08) at qemu/qemu_capabilities.c:867
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_FREE() macro will cast away any const-ness. This masked a
number of places where we passed a 'const char *' string to
VIR_FREE. Fortunately in all of these cases, the variable was not
in fact const data, but a heap allocated string. Fix all the
variable declarations to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
No need to open code now that we have a nice function.
Interestingly, our virStringFreeList function is typed correctly
(a malloc'd list of malloc'd strings is NOT const, whether at the
point where it is created, or at the point where it is cleand up),
so using it with a 'const char **' argument would require a cast
to keep the compiler. I chose instead to remove const from code
even where we don't modify the argument, just to avoid the need
to cast.
* src/qemu/qemu_command.h (qemuParseCommandLine): Drop declaration.
* src/qemu/qemu_command.c (qemuParseProcFileStrings)
(qemuStringToArgvEnv): Don't force malloc'd result to be const.
(qemuParseCommandLinePid, qemuParseCommandLineString): Simplify
cleanup.
(qemuParseCommandLine, qemuFindEnv): Drop const-correctness to
avoid the need to cast in callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
In commit 991270db99 I've used virDomainNetGetActualHostdev() to get
the actual hostdev from a network when removing the network from the
list to avoid leaving the hostdev in the list. I didn't notice that this
function doesn't check if the actual network is allocated and
dereferences it. This crashes the daemon when cleaning up a domain
object in early startup phases when the actual network definition isn't
allocated. When the actual definition isn't present, the hostdev that
might correspond to it won't be present anyways so it's safe to return
NULL.
Thanks to Cole Robinson for noticing this problem.
No need to check if privileged when reading hostsysinfo, since
that check was already done in libxlDriverShouldLoad(). The
libxl driver fails to load if not privileged.
John Ferlan reported the following Coverity warning:
In libxlDomainCoreDump() Coverity has noted a FORWARD_NULL reference:
2004 if ((flags & VIR_DUMP_CRASH) && !vm->persistent) {
2005 virDomainObjListRemove(driver->domains, vm);
(20) Event assign_zero: Assigning: "vm" = "NULL".
Also see events: [var_deref_model]
2006 vm = NULL;
2007 }
2008
2009 ret = 0;
2010
2011 cleanup_unpause:
(21) Event var_deref_model: Passing null pointer "vm" to function
"virDomainObjIsActive(virDomainObjPtr)", which dereferences it. [details]
Also see events: [assign_zero]
2012 if (virDomainObjIsActive(vm) && paused) {
2013 if (libxl_domain_unpause(priv->ctx, dom->id) != 0) {
2014 virReportError(VIR_ERR_INTERNAL_ERROR,
Removing the vm from domain obj list and setting it to NULL can be
done in the previous 'if (flags & VIR_DUMP_CRASH)' conditional. Fix
the Coverity warning by ensuring vm is not NULL before testing if it
is still active.
daemon/Makefile.am installs a .policy file if WITH_LIBVIRTD and
WITH_POLKIT are both set. src/Makefile.am, on the other hand,
installs a .policy file if WITH_POLKIT1 is set, but without checking
WITH_LIBVIRTD. When running 'make rpm' with client_only manually
set, on a Fedora 19 box, that leads to a failure:
RPM build errors:
Installed (but unpackaged) file(s) found:
/usr/share/polkit-1/actions/org.libvirt.api.policy
Fix it by adding another conditional.
* src/Makefile.am (polkitaction_DATA): Make conditional.
Signed-off-by: Eric Blake <eblake@redhat.com>
When virGetUserEnt() and virGetGroupEnt() fail due to the uid or gid not
existing on the machine they'll print a message like:
$ virsh -c vbox:///session list
error: failed to connect to the hypervisor
error: Failed to find user record for uid '32655': Success
The success at the end is a bit confusing. This changes it to:
$ virsh -c vbox:///session list
error: failed to connect to the hypervisor
error: Failed to find user record for uid '32655'
Automake has builtin support to prevent botched conditional nesting,
but only if you use:
if FOO
else !FOO
endif !FOO
An example error message when using the wrong name:
daemon/Makefile.am:378: error: else reminder (LIBVIRT_INIT_SCRIPT_SYSTEMD_TRUE) incompatible with current conditional: LIBVIRT_INIT_SCRIPT_SYSTEMD_FALSE
daemon/Makefile.am:381: error: endif reminder (LIBVIRT_INIT_SCRIPT_SYSTEMD_TRUE) incompatible with current conditional: LIBVIRT_INIT_SCRIPT_SYSTEMD_FALSE
As our makefiles tend to have quite a bit of nested conditionals,
it's better to take advantage of the benefits of the build system
double-checking that our conditionals are well-nested, but that
requires a syntax check to enforce our usage style.
Alas, unlike C preprocessor and spec files, we can't use indentation
to make it easier to see how deeply nesting goes.
* cfg.mk (sc_makefile_conditionals): New rule.
* daemon/Makefile.am: Enforce the style.
* gnulib/tests/Makefile.am: Likewise.
* python/Makefile.am: Likewise.
* src/Makefile.am: Likewise.
* tests/Makefile.am: Likewise.
* tools/Makefile.am: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The 'uuid' field in virDomainDefPtr is not a pointer, it is a
fixed length array. Calling VIR_ALLOC on it is thus wrong and
leaks memory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 50348e6edf reused the code to remove the hostdev portion of a
network definition on multiple places but forgot to take into account
that sometimes the "actual" network is passed and in some cases the
parent of that.
This patch uses the virDomainNetGetActualHostdev() helper to acquire the
correct pointer all the time while removing the hostdev portion from the
list.
https://bugzilla.redhat.com/show_bug.cgi?id=999352
Since commit v1.0.5-56-g449e6b1 (Pull parsing of migration xml up into
QEMU driver APIs) any attempt to rename a domain during migration fails
with the following error message:
internal error Incoming cookie data had unexpected name DOM vs DOM2
This is because migration cookies always use the original domain name
and the mentioned commit failed to propagate the name back to
qemuMigrationPrepareAny.
Now that most fields of libxlDriverPrivate struct are immutable
or self-locking, there is no need to acquire the driver lock in
much of the libxl driver.
The libxlDriverPrivate struct contains an variety of data with
varying access needs. Similar to the QEMU and LXC drivers,
move all the static config data into a dedicated libxlDriverConfig
object. The only locking requirement is to hold the driver lock
while obtaining an instance of libxlDriverConfig. Once a reference
is held on the config object, it can be used completely lockless
since it is immutable.
libxlDomainGetInfo() uses the driver-wide libxl ctx when
it would be more appropriate to use the per-domain ctx
associated with the domain. Switch to using the per-domain
libxl ctx.
libxlMakeDomCreateInfo() uses the driver-wide libxl ctx when
it would be more appropriate to use the per-domain ctx
associated with the domain. Switch to using the per-domain
libxl ctx.
libxl version info is static data as far as the libxl driver
is concerned, so retrieve this info when the driver is initialized
and stash it in the libxlDriverPrivate object. Subsequently use
the stashed info instead of repeatedly calling libxl_get_version_info().
Detect early on in libxl driver initialization if the driver
should be loaded at all, avoiding needless initialization steps
that only have to be undone later. While at it, move the
detection to a helper function to improve readability.
After detecting that the driver should be loaded, subsequent
failures such as initializing the log stream, allocating libxl
ctx, etc. should be treated as failure to initialize the driver.
Create libxl_domain.[ch] and move all functions operating on
libxlDomainObjPrivate to these files. This will be useful for
future patches that e.g. add job support for libxlDomainObjPrivate.
New coverity installation determined that the muliple if condition for
"*Alloc" and "*AppendToList" could fail during AppendToList thus leaking
memory.
Currently, kernel supports up to 8 queues for a multiqueue tap device.
However, if user tries to enter a huge number (e.g. one million) the tap
allocation fails, as expected. But what is not expected is the log full
of warnings:
warning : virFileClose:83 : Tried to close invalid fd 0
The problem is, upon error we iterate over an array of FDs (handlers to
queues) and VIR_FORCE_CLOSE() over each item. However, the array is
pre-filled with zeros. Hence, we repeatedly close stdin. Ouch.
But there's more. The queues allocation is done in virNetDevTapCreate()
which cleans up the FDs in case of error. Then, its caller, the
virNetDevTapCreateInBridgePort() iterates over the FD array and tries to
close them too. And so does qemuNetworkIfaceConnect() and
qemuBuildInterfaceCommandLine().
According to VMWare's documentation 'cdrom-raw' is an acceptable value
for deviceType for a CD-ROM drive. The documentation states that the VMX
configuration for a CD-ROM deviceType is as follows:
ide|scsi(n):(n).deviceType = "cdrom-raw|atapi-cdrom|cdrom-image"
From the documentation it appears the following is true:
- cdrom-image = Provides the ISO to the VM
- atapi-cdrom = Provides a NEC emulated ATAPI CD-ROM on top of the host
CD-ROM
- cdrom-raw = Passthru for a host CD-ROM drive. Allows CD-R burning from
within the guest.
A CD-ROM prior to this patch would always provide an 'atapi-cdrom' is
modeled as:
<disk type='block' device='cdrom'>
<source dev='/dev/scd0'/>
<target dev='hda' bus='ide'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
This patch allows the 'device' attribute to be set to 'lun' for a raw
acccess CD-ROM such as:
<disk type='block' device='lun'>
<source dev='/dev/scd0'/>
<target dev='hda' bus='ide'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
Sometimes a serial port might not be actually wired to a device when the
user does not have the VM powered on and we should not consider this a
fatal error.
Starting with qemu 1.6, the qemu-system-arm vexpress-a9 model has a
hardcoded virtio-mmio transport which enables attaching all virtio
devices.
On the command line, we have to use virtio-XXX-device rather than
virtio-XXX-pci, thankfully s390 already set the precedent here so
it's fairly straight forward.
At the XML level, this adds a new device address type virtio-mmio.
The controller and addressing don't have any subelements at the
moment because we they aren't needed for this usecase, but could
be added later if needed.
Add a test case for an ARM guest with one of every virtio device
enabled.
Similar to the chardev bit, ARM boards depend on the old style '-net nic'
for actually instantiating net devices. But we can't block out
-netdev altogether since it's needed for upcoming virtio support.
And add tests for working ARM XML with console, disk, and networking.
This corresponds to '-sd' and '-drive if=sd' on the qemu command line.
Needed for many ARM boards which don't provide any other way to
pass in storage.
QEMU ARM boards don't give us any way to explicitly wire in
a -chardev, so use the old style -serial options.
Unfortunately this isn't as simple as just turning off the CHARDEV flag
for qemu-system-arm, as upcoming virtio support _will_ use device/chardev.
On my machine, a guest fails to boot if it has a sound card, but not
graphical device/display is configured, because pulseaudio fails to
initialize since it can't access $HOME.
A workaround is removing the audio device, however on ARM boards there
isn't any option to do that, so -nographic always fails.
Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately
this has massive test suite fallout.
Add a qemu.conf parameter nographics_allow_host_audio, that if enabled
will pass through QEMU_AUDIO_DRV from sysconfig (similar to
vnc_allow_host_audio)
Add an attribute named 'removable' to the 'target' element of disks,
which controls the removable flag. For instance, on a Linux guest it
controls the value of /sys/block/$dev/removable. This option is only
valid for USB disks (i.e. bus='usb'), and its default value is 'off',
which is the same behaviour as before.
To achieve this, 'removable=on' (or 'off') is appended to the '-device
usb-storage' parameter sent to qemu when adding a USB disk via
'-disk'. A capability flag QEMU_CAPS_USB_STORAGE_REMOVABLE was added
to keep track if this option is supported by the qemu version used.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=922495
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Allow use of the usb-storage device only if the new capability flag
QEMU_CAPS_DEVICE_USB_STORAGE is set, which it is for qemu(-kvm)
versions >= 0.12.1.2-rhel62-beta.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
virVMXFormatHardDisk() and virVMXFormatCDROM() duplicated a lot of code
from each other and made a lot of nested if checks to build each part of
the VMX file. This hopefully simplifies the code path while combining
the two functions with no net difference.
Since virtlockd is only built when libvirtd is built, we should
not install its auxiliary files unconditionally. This solves
two failures. 1. 'make distcheck' complains:
rm -f Makefile
ERROR: files left in build directory after distclean:
./src/virtlockd.8
2. './autobuild.sh' complains:
Checking for unpackaged file(s): /usr/lib/rpm/check-files
/home/eblake/rpmbuild/BUILDROOT/mingw-libvirt-1.1.1-1.fc19.eblake1377879911.x86_64
error: Installed (but unpackaged) file(s) found:
/usr/i686-w64-mingw32/sys-root/mingw/etc/libvirt/virtlockd.conf
/usr/i686-w64-mingw32/sys-root/mingw/share/augeas/lenses/tests/test_virtlockd.aug
/usr/i686-w64-mingw32/sys-root/mingw/share/augeas/lenses/virtlockd.aug
/usr/i686-w64-mingw32/sys-root/mingw/share/man/man8/virtlockd.8
/usr/x86_64-w64-mingw32/sys-root/mingw/etc/libvirt/virtlockd.conf
/usr/x86_64-w64-mingw32/sys-root/mingw/share/augeas/lenses/tests/test_virtlockd.aug
/usr/x86_64-w64-mingw32/sys-root/mingw/share/augeas/lenses/virtlockd.aug
/usr/x86_64-w64-mingw32/sys-root/mingw/share/man/man8/virtlockd.8
* src/Makefile.am (CLEANFILES): Add virtlockd.8.
(man8_MANS, conf_DATA, augeas_DATA, augeastest_DATA): Only install
virtlockd files when daemon is built.
Signed-off-by: Eric Blake <eblake@redhat.com>
vhost only works in KVM mode at the moment, and is infact compiled
out if the emulator is built for non-native architecture. While it
may work at some point in the future for plain qemu, for now it's
just noise on the command line (and which contributes to arm cli
breakage).
FreeBSD 10 recently changed their definition of RAND_MAX, to try
and cover the fact that their evenly distributed results of rand()
really are a smaller range than a full power of 2. As a result,
I did some investigation, and learned:
1. POSIX requires random() to be evenly distributed across exactly
31 bits. glibc also guarantees this for rand(), but the two are
unrelated, and POSIX only associates RAND_MAX with rand().
Avoiding RAND_MAX altogether thus avoids a build failure on
FreeBSD 10.
2. Concatenating random bits from a PRNG will NOT provide uniform
coverage over the larger value UNLESS the period of the original
PRNG is at least as large as the number of bits being concatenated.
Simple example: suppose that RAND_MAX were 1 with a period of 2**1
(which means that the PRNG merely alternates between 0 and 1).
Concatenating two successive rand() calls would then invariably
result in 01 or 10, which is a rather non-uniform distribution
(00 and 11 are impossible) and an even worse period (2**0, since
our second attempt will get the same number as our first attempt).
But a RAND_MAX of 1 with a period of 2**2 (alternating between
0, 1, 1, 0) provides sane coverage of all four values, if properly
tempered. (Back-to-back calls would still only see half the values
if we don't do some tempering). We therefore want to guarantee a
period of at least 2**64, preferably larger (as a tempering factor);
POSIX only makes this guarantee for random() with 256 bytes of info.
* src/util/virrandom.c (virRandomBits): Use constants that are
accurate for the PRNG we are using, not an unrelated PRNG.
(randomState): Ensure the period of our PRNG exceeds our usage.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit 29fe5d7 (released in 1.1.1) introduced a latent problem
for any caller of virSecurityManagerSetProcessLabel and where
the domain already had a uid:gid label to be parsed. Such a
setup would collect the list of supplementary groups during
virSecurityManagerPreFork, but then ignores that information,
and thus fails to call setgroups() to adjust the supplementary
groups of the process.
Upstream does not use virSecurityManagerSetProcessLabel for
qemu (it uses virSecurityManagerSetChildProcessLabel instead),
so this problem remained latent until backporting the initial
commit into v0.10.2-maint (commit c061ff5, released in 0.10.2.7),
where virSecurityManagerSetChildProcessLabel has not been
backported. As a result of using a different code path in the
backport, attempts to start a qemu domain that runs as qemu:qemu
will end up with supplementary groups unchanged from the libvirtd
parent process, rather than the desired supplementary groups of
the qemu user. This can lead to failure to start a domain
(typical Fedora setup assigns user 107 'qemu' to both group 107
'qemu' and group 36 'kvm', so a disk image that is only readable
under kvm group rights is locked out). Worse, it is a security
hole (the qemu process will inherit supplemental group rights
from the parent libvirtd process, which means it has access
rights to files owned by group 0 even when such files should
not normally be visible to user qemu).
LXC does not use the DAC security driver, so it is not vulnerable
at this time. Still, it is better to plug the latent hole on
the master branch first, before cherry-picking it to the only
vulnerable branch v0.10.2-maint.
* src/security/security_dac.c (virSecurityDACGetIds): Always populate
groups and ngroups, rather than only when no label is parsed.
Signed-off-by: Eric Blake <eblake@redhat.com>
The return values for the virConnectListAllSecrets call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllNWFilters call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllNodeDevices call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllInterfaces call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllNetworks call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virStoragePoolListAllVolumes call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllStoragePools call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virConnectListAllDomains call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virDomain{SnapshotListAllChildren,ListAllSnapshots}
calls were not bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The return values for the virDomainGetJobStats call were not
bounds checked. This is a robustness issue for clients if
something where to cause corruption of the RPC stream data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The parameters for the virDomainMigrate*Params RPC calls were
not bounds checks, meaning a malicious client can cause libvirtd
to consume arbitrary memory
This issue was introduced in the 1.1.0 release of libvirt
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Similarly to qemu_driver.c, we can join often repeating code of looking
up network into one function: networkObjFromNetwork.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
allias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
depending memory to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
QEMU commit 3984890 introduced the "pci-hole64-size" property,
to i440FX-pcihost and q35-pcihost with a default setting of 2 GB.
Translate <pcihole64>x<pcihole64/> to:
-global q35-pcihost.pci-hole64-size=x for q35 machines and
-global i440FX-pcihost.pci-hole64-size=x for i440FX-based machines.
Error out on other machine types or if the size was specified
but the pcihost device lacks 'pci-hole64-size' property.
https://bugzilla.redhat.com/show_bug.cgi?id=990418
<controller type='pci' index='0' model='pci-root'>
<pcihole64 unit='KiB'>1048576</pcihole64>
</controller>
It can be used to adjust (or disable) the size of the 64-bit
PCI hole. The size attribute is in kilobytes (different unit
can be specified on input), but it gets rounded up to
the nearest GB by QEMU.
Disabling it will be needed for guests that crash with the
64-bit PCI hole (like Windows XP), see:
https://bugzilla.redhat.com/show_bug.cgi?id=990418
The ftp protocol is already recognized by qemu/KVM so add this support to
libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftp' name='/url/path'>
<host name='host.name' port='21'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
QEMU/KVM already allows a HTTP URL for the cdrom ISO image so add this support
to libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='http' name='/url/path'>
<host name='host.name' port='80'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
qemu-img is going to switch the default for QCOW2
to QCOW2v3 (compat=1.1)
Extend the probing for qemu-img command line options to check
if -o compat is supported. If the volume definition specifies
the qcow2 format but no compat level and -o compat is supported,
specify -o compat=0.10 to create a QCOW2v2 image.
https://bugzilla.redhat.com/show_bug.cgi?id=997977
If there's no hard_limit set and domain uses VFIO we still must lock
the guest memory (prerequisite from qemu). Hence, we should compute
the amount to be locked from max_balloon.
When cpu hotplug fails without reporting an error, we would fail the
command but update the count of vCPUs anyways.
Commit 761fc48136 fixed the case when CPU
hot-unplug failed silently, but forgot to fix up the value in this case.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000357
The virDomainOpenGraphics method accepts a UNIX socket FD from
the client app. It must set the label on this FD otherwise QEMU
will be prevented from receiving it with recvmsg.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
Re-arrange the code so that the returned bitmap is always initialized to
NULL even on early failures and return an error message as some callers
are already expecting it. Fix up the rest not to shadow the error.
Previously the error message showed the following:
error: internal error: Invalid or not yet handled value 'auto detect'
for VMX entry 'ide0:0.fileName'
This left the user unsure if it was a CD-ROM or a disk device that they
needed to fix. Now the error shows:
error: internal error: Invalid or not yet handled value 'auto detect'
for VMX entry 'ide0:0.fileName' for device type 'cdrom-raw'
Which should hopefully make it easier to see the issue with the VMX
configuration.
More fallout from commit d72ef888. When reconnecting to running
domains, the libxl_ctx in libxlDomainObjPrivate was used before
initializing it, causing a segfault in libxl and consequently
crashing libvirtd.
Initialize the libxlDomainObjPrivate libxl_ctx in libxlReconnectDomain,
and while at it use this ctx in libxlReconnectDomain instead of the
driver-wide ctx.
https://bugzilla.redhat.com/show_bug.cgi?id=822052
When doing a live migration, if the destination fails for any
reason after the point in which files should be labeled, then
the cleanup of the destination would restore the labels to their
defaults, even though the source is still trying to continue
running with the image open. Bug 822052 mentioned one source
of live migration failure - a mismatch in SELinux virt_use_nfs
settings (on for source, off for destination); but I found other
situations that would also trigger it (for example, having a
graphics device tied to port 5999 on the source, and a different
domain on the destination already using that port, so that the
destination cannot reuse the port).
In short, just as cleanup of the source on a successful migration
must not relabel files (because the destination would be crippled
by the relabel), cleanup of the destination on a failed migration
must not relabel files (because the source would be crippled).
* src/qemu/qemu_process.c (qemuProcessStart): Set flag to avoid
label restoration when cleaning up on failed migration.
Signed-off-by: Eric Blake <eblake@redhat.com>
Introduced by commit e0139e3044. virStorageVolDefFree free'ed the
pointers that are still used by the added volume object, this changes
it back to VIR_FREE.
Each of the modules handled reporting error messages from the secret fetching
slightly differently with respect to the error. Provide a similar message
for each error case and provide as much data as possible.
Following XML would fail :
<disk type='network' device='lun'>
<driver name='qemu' type='raw'/>
<source protocol='iscsi' name='iqn.2013-07.com.example:iscsi/1'>
<host name='example.com' port='3260'/>
</source>
<target dev='sda' bus='scsi'/>
</disk>
With the message:
error: Failed to start domain iscsilun
error: Unable to get device ID 'iqn.2013-07.com.example:iscsi/1': No such fi
Cause was commit id '1f49b05a' which added 'virDomainDiskSourceIsBlockType'
If we reached cleanup: prior to allocating cpus, it was possible that
'nr_nodes' had a value, but cpus was NULL leading to a possible NULL
deref. Add a 'cpus' as an end condition to for loop
https://bugzilla.redhat.com/show_bug.cgi?id=924153
Commit 904e05a2 (v0.9.9) added a per-<disk> seclabel element with
an attribute relabel='no' in order to try and minimize the
impact of shutdown delays when an NFS server disappears. The idea
was that if a disk is on NFS and can't be labeled in the first
place, there is no need to attempt the (no-op) relabel on domain
shutdown. Unfortunately, the way this was implemented was by
modifying the domain XML so that the optimization would survive
libvirtd restart, but in a way that is indistinguishable from an
explicit user setting. Furthermore, once the setting is turned
on, libvirt avoids attempts at labeling, even for operations like
snapshot or blockcopy where the chain is being extended or pivoted
onto non-NFS, where SELinux labeling is once again possible. As
a result, it was impossible to do a blockcopy to pivot from an
NFS image file onto a local file.
The solution is to separate the semantics of a chain that must
not be labeled (which the user can set even on persistent domains)
vs. the optimization of not attempting a relabel on cleanup (a
live-only annotation), and using only the user's explicit notation
rather than the optimization as the decision on whether to skip
a label attempt in the first place. When upgrading an older
libvirtd to a newer, an NFS volume will still attempt the relabel;
but as the avoidance of a relabel was only an optimization, this
shouldn't cause any problems.
In the ideal future, libvirt will eventually have XML describing
EVERY file in the backing chain, with each file having a separate
<seclabel> element. At that point, libvirt will be able to track
more closely which files need a relabel attempt at shutdown. But
until we reach that point, the single <seclabel> for the entire
<disk> chain is treated as a hint - when a chain has only one
file, then we know it is accurate; but if the chain has more than
one file, we have to attempt relabel in spite of the attribute,
in case part of the chain is local and SELinux mattered for that
portion of the chain.
* src/conf/domain_conf.h (_virSecurityDeviceLabelDef): Add new
member.
* src/conf/domain_conf.c (virSecurityDeviceLabelDefParseXML):
Parse it, for live images only.
(virSecurityDeviceLabelDefFormat): Output it.
(virDomainDiskDefParseXML, virDomainChrSourceDefParseXML)
(virDomainDiskSourceDefFormat, virDomainChrDefFormat)
(virDomainDiskDefFormat): Pass flags on through.
* src/security/security_selinux.c
(virSecuritySELinuxRestoreSecurityImageLabelInt): Honor labelskip
when possible.
(virSecuritySELinuxSetSecurityFileLabel): Set labelskip, not
norelabel, if labeling fails.
(virSecuritySELinuxSetFileconHelper): Fix indentation.
* docs/formatdomain.html.in (seclabel): Document new xml.
* docs/schemas/domaincommon.rng (devSeclabel): Allow it in RNG.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.xml:
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.args:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-seclabel-*-labelskip.xml:
New test files.
* tests/qemuxml2argvtest.c (mymain): Run the new tests.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
If there's no hard_limit set and domain uses VFIO we still must lock the
guest memory (prerequisite from qemu). Hence, we should compute the
amount to be locked from max_balloon.
Since 16bcb3 we have a regression. The hard_limit is set
unconditionally. By default the limit is zero. Hence, if user hasn't
configured any, we set the zero in cgroup subsystem making the kernel
kill the corresponding qemu process immediately. The proper fix is to
set hard_limit iff user has configured any.
From: Dario Faggioli <dario.faggioli@citrix.com>
Starting from Xen 4.2, libxl has all the bits and pieces in place
for retrieving an adequate amount of information about the host
NUMA topology. It is therefore possible, after a bit of shuffling,
to arrange those information in the way libvirt wants to present
them to the outside world.
Therefore, with this patch, the <topology> section of the host
capabilities is properly populated, when running on Xen, so that
we can figure out whether or not we're running on a NUMA host,
and what its characteristics are.
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// capabilities
<capabilities>
<host>
<cpu>
....
<topology>
<cells num='2'>
<cell id='0'>
<memory unit='KiB'>6291456</memory>
<cpus num='8'>
<cpu id='0' socket_id='1' core_id='0' siblings='0-1'/>
<cpu id='1' socket_id='1' core_id='0' siblings='0-1'/>
<cpu id='2' socket_id='1' core_id='1' siblings='2-3'/>
<cpu id='3' socket_id='1' core_id='1' siblings='2-3'/>
<cpu id='4' socket_id='1' core_id='9' siblings='4-5'/>
<cpu id='5' socket_id='1' core_id='9' siblings='4-5'/>
<cpu id='6' socket_id='1' core_id='10' siblings='6-7'/>
<cpu id='7' socket_id='1' core_id='10' siblings='6-7'/>
</cpus>
</cell>
<cell id='1'>
<memory unit='KiB'>6881280</memory>
<cpus num='8'>
<cpu id='8' socket_id='0' core_id='0' siblings='8-9'/>
<cpu id='9' socket_id='0' core_id='0' siblings='8-9'/>
<cpu id='10' socket_id='0' core_id='1' siblings='10-11'/>
<cpu id='11' socket_id='0' core_id='1' siblings='10-11'/>
<cpu id='12' socket_id='0' core_id='9' siblings='12-13'/>
<cpu id='13' socket_id='0' core_id='9' siblings='12-13'/>
<cpu id='14' socket_id='0' core_id='10' siblings='14-15'/>
<cpu id='15' socket_id='0' core_id='10' siblings='14-15'/>
</cpus>
</cell>
</cells>
</topology>
</host>
....
When the daemon is compiled with firewalld support but the DBus message
bus isn't started in the system, the initialization of the nwfilter
driver fails even if there are fallback options.
On hosts that don't have the DBus service running or installed the new
systemd cgroups code failed with hard error instead of falling back to
"manual" cgroup creation.
Use the new helper to check for the system bus and use the fallback code
in case it isn't available.
Some systems may not use DBus in their system. Add a method to check if
the system bus is available that doesn't print error messages so that
code can later check for this condition and use an alternative approach.
Each new VM requires a new connection from libvirtd to virtlockd.
The default max clients limit in virtlockd of 20 is thus woefully
insufficient. virtlockd sockets are only accessible to matching
users, so there is no security need for such a tight limit. Make
it configurable and default to 1024.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
One has to refresh the pool to get the correct pool info after
adding/removing/resizing a volume, this updates the pool metadata
(allocation, available) after those operation are done.
The function that parses custom driver XML was getting pretty unruly,
split the object parsing into their own functions. Rename some variables
to be consistent across each function. This should be functionally
identical.
Currently the virConnectBaselineCPU API does not expose the CPU features
that are part of the CPU's model. This patch adds a new flag,
VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES, that causes the API to explicitly
list all features that are part of that model.
Signed-off-by: Don Dugger <donald.d.dugger@intel.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Cleanup the libxl capabilities code to be a bit more extensible,
splitting out the creation of host and guest capabilities. This
should make it easier to implement additional capabilities in the
future, such as NUMA topology reporting.
The virBitmapParse function was calling virBitmapIsSet() function that
requires the caller to check the bounds of the bitmap without checking
them. This resulted into crashes when parsing a bitmap string that was
exceeding the bounds used as argument.
This patch refactors the function to use virBitmapSetBit without
checking if the bit is set (this function does the checks internally)
and then counts the bits in the bitmap afterwards (instead of keeping
track while parsing the string).
This patch also changes the "parse_error" label to a more common
"error".
The refactor should also get rid of the need to call sa_assert on the
returned variable as the callpath should allow coverity to infer the
possible return values.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997367
Thanks to Alex Jia for tracking down the issue. This issue is introduced
by commit 0fc8909.
There is a potential leak of a newly created libxlDomainObjPrivate
when subsequent allocation of the object's chrdev field fails.
Unref the object on such an error so that it is properly disposed.
This resolves the issue that prompted the filing of
https://bugzilla.redhat.com/show_bug.cgi?id=928638
(although the request there is for something much larger and more
general than this patch).
commit f3868259ca disabled the
forwarding to upstream DNS servers of unresolved DNS requests for
names that had no domain, but were just simple host names (no "."
character anywhere in the name). While this behavior is frowned upon
by DNS root servers (that's why it was changed in libvirt), it is
convenient in some cases, and since dnsmasq can be configured to allow
it, it must not be strictly forbidden.
This patch restores the old behavior, but since it is usually
undesirable, restoring it requires specification of a new option in
the network config. Adding the attribute "forwardPlainNames='yes'" to
the <dns> elemnt does the trick - when that attribute is added to a
network config, any simple hostnames that can't be resolved by the
network's dnsmasq instance will be forwarded to the DNS servers listed
in the host's /etc/resolv.conf for an attempt at resolution (just as
any FQDN would be forwarded).
When that attribute *isn't* specified, unresolved simple names will
*not* be forwarded to the upstream DNS server - this is the default
behavior.
If booting a container with a root FS that isn't the host's
root, we must ensure that the /dev mount point exists.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd/libvirtd daemons had listed '?' as the short option
for --help. getopt_long uses '?' for any unknown option. We want
to be able to distinguish unknown options (which use EXIT_FAILURE)
from correct usage of help (which should use EXIT_SUCCESS). Thus
we should use 'h' as a short option for --help. Also add this to
the man page docs
The virtlockd/libvirtd daemons did not list any short option
for the --version arg. Add -V as a valid short option, since
-v is already used for --verbose.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The lxcContainerMountFSBlockAuto method can be used to mount the
initial root filesystem, so it cannot assume a prefix of /.oldroot.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
- Convert virCgroupGet* to VIR_CGROUP_SUPPORTED
- Convert virCgroup(Get|Set)FreezerState to VIR_CGROUP_SUPPORTED
Signed-off-by: Eric Blake <eblake@redhat.com>
- Introduce VIR_CGROUP_SUPPORTED conditional
- Convert virCgroupKill* to use it
- Convert virCgroupIsolateMount() to use it
- Convert virCgroupRemoveRecursively to VIR_CGROUP_SUPPORTED
Signed-off-by: Eric Blake <eblake@redhat.com>
Make future patches smaller by matching a sane header listing in
the first place. No semantic change.
* src/util/vircgroup.h: Move free next to new, and controller
functions next to each other.
* src/util/vircgroup.c (virCgroupFree, virCgroupHasController)
(virCgroupPathOfController, virCgroupRemoveRecursively)
(virCgroupRemove): Sort implementation to be closer to header.
Signed-off-by: Eric Blake <eblake@redhat.com>
Avoid a forward declaration of a static function.
* src/util/vircgroup.c (virCgroupPartitionNeedsEscaping)
(virCgroupParticionEscape): Move up.
Signed-off-by: Eric Blake <eblake@redhat.com>
Format all functions with two blank lines between, and return type
on separate line from function name. Also break some lines longer
than 80 columns. This makes the subsequent macro refactoring
less noisy.
* src/util/vircgroup.c: Match prevailing style.
Signed-off-by: Eric Blake <eblake@redhat.com>
otherwise having a strict --no-copy-dt-needed-entries fails in several
places like:
CCLD virdbustest
/usr/bin/ld: virdbustest-virdbustest.o: undefined reference to symbol 'dbus_message_unref'
/lib/x86_64-linux-gnu/libdbus-1.so.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
https://bugzilla.redhat.com/show_bug.cgi?id=951637
Newer gnutls uses nettle, rather than gcrypt, which is a lot nicer
regarding initialization. Yet we were unconditionally initializing
gcrypt even when gnutls wouldn't be using it, and having two crypto
libraries linked into libvirt.so is pointless, but mostly harmless
(it doesn't crash, but does interfere with certification efforts).
There are three distinct version ranges to worry about when
determining which crypto lib gnutls uses, per these gnutls mails:
2.12: http://lists.gnu.org/archive/html/gnutls-devel/2011-03/msg00034.html
3.0: http://lists.gnu.org/archive/html/gnutls-devel/2011-07/msg00035.html
If pkg-config can prove version numbers and/or list the crypto
library used for static linking, we have our proof; if not, it
is safer (even if pointless) to continue to use gcrypt ourselves.
* configure.ac (WITH_GNUTLS): Probe whether to add -lgcrypt, and
define a witness WITH_GNUTLS_GCRYPT.
* src/libvirt.c (virTLSMutexInit, virTLSMutexDestroy)
(virTLSMutexLock, virTLSMutexUnlock, virTLSThreadImpl)
(virGlobalInit): Honor the witness.
* libvirt.spec.in (BuildRequires): Make gcrypt usage conditional,
no longer needed in Fedora 19.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit d72ef888 introduced a bug in the libxl driver that will
segfault libvirtd if libxl reports an error message, e.g. when
attempting to initialize the driver on a non-Xen system. I
assumed it was valid to pass a NULL logger to libxl_ctx_alloc(),
but that is not the case since any errors associated with the ctx
that are emitted by libxl will dereference the logger and crash
libvirtd.
Errors associated with the libxl driver-wide ctx could be useful
for debugging anyway, so create a 'libxl-driver.log' to capture
these errors.
Recentish (2011) kernels introduced a new device called /dev/loop-control,
which causes libvirt's detection of loop devices to get confused
since it only checks for a prefix of 'loop'. Also check that the
next character is a digit
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds two new pages to the website, acl.html describing
the general access control framework and permissions models,
and aclpolkit.html describing the use of polkit as an
access control driver.
page.xsl is modified to support a new syntax
<div id="include" filename="somefile.htmlinc"/>
which will cause the XSL transform to replace that <div>
with the contents of 'somefile.htmlinc'. We use this in
the acl.html.in file, to pull the table of permissions
for each libvirt object. This table is autogenerated
from the enums in src/access/viraccessperms.h by the
genaclperms.pl script.
newapi.xsl is modified so that the list of permissions
checks shown against each API will link to the description
of the permissions in acl.html
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The gendispatch.pl script puts comments at the top of files
it creates, saying that it auto-generated them. Also include
the name of the source data file which it reads when doing
the auto-generation.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
introduced by cs 4b9eec50fe ("libxl: implement per
NUMA node free memory reporting"). What was wrong was that
libxl_get_numainfo() put in nr_nodes the actual number of
host NUMA nodes, not the highest node ID (like libnuma's
numa_max_node() does instead).
While at it, turn the failure of libxl_get_numainfo() from
a simple warning to a proper error, as requested during the
review of another patch of the original series.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
This is a second attempt at fixing the problem first attempted
in commit 2df8d99; basically undoing the fact that it was
reverted in commit 43cee32f, plus fixing two more issues: the
code in configure.ac has to EXACTLY match virnetdevbridge.c
with regards to declaring in6 types before using if_bridge.h,
and the fact that RHEL 5 has even more conflicts:
In file included from util/virnetdevbridge.c:49:
/usr/include/linux/in6.h:47: error: conflicting types for 'in6addr_any'
/usr/include/netinet/in.h:206: error: previous declaration of 'in6addr_any' was here
/usr/include/linux/in6.h:49: error: conflicting types for 'in6addr_loopback'
/usr/include/netinet/in.h:207: error: previous declaration of 'in6addr_loopback' was here
The rest of this commit message borrows from the original try
of 2df8d99:
A fresh checkout on a RHEL 6 machine with these packages:
kernel-headers-2.6.32-405.el6.x86_64
glibc-2.12-1.128.el6.x86_64
failed to configure with this message:
checking for linux/if_bridge.h... no
configure: error: You must install kernel-headers in order to compile libvirt with QEMU or LXC support
Digging in config.log, we see that the problem is identical to
what we fixed earlier in commit d12c2811:
configure:98831: checking for linux/if_bridge.h
configure:98853: gcc -std=gnu99 -c -g -O2 conftest.c >&5
In file included from /usr/include/linux/if_bridge.h:17,
from conftest.c:559:
/usr/include/linux/in6.h:31: error: redefinition of 'struct in6_addr'
/usr/include/linux/in6.h:48: error: redefinition of 'struct sockaddr_in6'
/usr/include/linux/in6.h:56: error: redefinition of 'struct ipv6_mreq'
configure:98860: $? = 1
I had not hit it earlier because I was using incremental builds,
where config.cache had shielded me from the kernel-headers breakage.
* configure.ac (if_bridge.h): Avoid conflicting type definitions.
* src/util/virnetdevbridge.c (includes): Also sanitize for RHEL 5.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, only one log file is created by the libxl driver, with
all output from libxl for all domains going to this one file.
Create a per-domain log file based on domain name, making sifting
through the logs a bit easier. This required deferring libxl_ctx
allocation until starting the domain, which is fine since the
ctx is not used when the domain is inactive.
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
The virtlockd daemon supports an /etc/libvirt/virtlockd.conf
config file, but we never installed a default config, nor
created any augeas scripts. This change addresses that omission.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Coverity complained about the usage of the uninitialized cacerts in the
event(s) that "access(certFile, R_OK)" and/or "access(cacertFile, R_OK)"
fail the for loop used to fill in the certs will have indeterminate data
as well as the possibility that both failures would result in the
gnutls_x509_crt_deinit() call having a similar fate.
Initializing cacerts only would resolve the issue; however, it still
would leave the indeterminate action, so rather add a parameter to
the virNetTLSContextLoadCACertListFromFile() to pass the max size rather
then overloading the returned count parameter. If the the call is never
made, then we won't go through the for loops referencing the empty
cacerts
Valgrind defects memory error:
==16759== 1 errors in context 1 of 8:
==16759== Invalid free() / delete / delete[] / realloc()
==16759== at 0x4A074C4: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==16759== by 0x83CD329: xdr_string (in /usr/lib64/libc-2.17.so)
==16759== by 0x4D93E4D: xdr_remote_nonnull_string (remote_protocol.c:31)
==16759== by 0x4D94350: xdr_remote_nonnull_domain (remote_protocol.c:58)
==16759== by 0x4D976C8: xdr_remote_domain_create_with_flags_ret (remote_protocol.c:1762)
==16759== by 0x83CC734: xdr_free (in /usr/lib64/libc-2.17.so)
==16759== by 0x4D7F1E0: remoteDomainCreateWithFlags (remote_driver.c:2441)
==16759== by 0x4D4BF17: virDomainCreateWithFlags (libvirt.c:9499)
==16759== by 0x13127A: cmdStart (virsh-domain.c:3376)
==16759== by 0x12BF83: vshCommandRun (virsh.c:1751)
==16759== by 0x126FFB: main (virsh.c:3205)
==16759== Address 0xe1394a0 is not stack'd, malloc'd or (recently) free'd
==16759== 1 errors in context 2 of 8:
==16759== Conditional jump or move depends on uninitialised value(s)
==16759== at 0x4A07477: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==16759== by 0x83CD329: xdr_string (in /usr/lib64/libc-2.17.so)
==16759== by 0x4D93E4D: xdr_remote_nonnull_string (remote_protocol.c:31)
==16759== by 0x4D94350: xdr_remote_nonnull_domain (remote_protocol.c:58)
==16759== by 0x4D976C8: xdr_remote_domain_create_with_flags_ret (remote_protocol.c:1762)
==16759== by 0x83CC734: xdr_free (in /usr/lib64/libc-2.17.so)
==16759== by 0x4D7F1E0: remoteDomainCreateWithFlags (remote_driver.c:2441)
==16759== by 0x4D4BF17: virDomainCreateWithFlags (libvirt.c:9499)
==16759== by 0x13127A: cmdStart (virsh-domain.c:3376)
==16759== by 0x12BF83: vshCommandRun (virsh.c:1751)
==16759== by 0x126FFB: main (virsh.c:3205)
==16759== Uninitialised value was created by a stack allocation
==16759== at 0x4D7F120: remoteDomainCreateWithFlags (remote_driver.c:2423)
How to reproduce?
# virsh start <domain> --paused
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=994855
Signed-off-by: Alex Jia <ajia@redhat.com>
If securityfs is available on the host, we should ensure to
mount it read-only in the container. This will avoid systemd
trying to mount it during startup causing SELinux AVCs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Hotplugging a single SCSI device works, but adding additional ones
result in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Changing the index in the hotplug case
to a negative one so the next available index is used.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
So that app developers / admins know what access control checks
are performed for each API, this patch extends the API docs
generator to include details of the ACLs for each.
The gendispatch.pl script is extended so that it generates
a simple XML describing ACL rules, eg.
<aclinfo>
...
<api name='virConnectNumOfDomains'>
<check object='connect' perm='search_domains'/>
<filter object='domain' perm='getattr'/>
</api>
<api name='virDomainAttachDeviceFlags'>
<check object='domain' perm='write'/>
<check object='domain' perm='save' flags='!VIR_DOMAIN_AFFECT_CONFIG|VIR_DOMAIN_AFFECT_LIVE'/>
<check object='domain' perm='save' flags='VIR_DOMAIN_AFFECT_CONFIG'/>
</api>
...
</aclinfo>
The newapi.xsl template loads the XML files containing the ACL
rules and generates a short block of HTML for each API describing
the parameter checks and return value filters (if any).
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code added to validate CA certificates did not take into
account the possibility that the cacert.pem file can contain
multiple (concatenated) cert data blocks. Extend the code for
loading CA certs to use the gnutls APIs for loading cert lists.
Add test cases to check that multi-level trees of certs will
validate correctly.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 3d0e3c1 reintroduced a problem previously squelched in
commit 7e5aa78. Add a syntax check this time around.
util/virutil.c: In function 'virGetGroupList':
util/virutil.c:1015: error: 'for' loop initial declaration used outside C99 mode
* cfg.mk (sc_prohibit_loop_var_decl): New rule.
* src/util/virutil.c (virGetGroupList): Fix offender.
Signed-off-by: Eric Blake <eblake@redhat.com>
Before, missing attributes were only OK when adding entries;
modification and deletion required all of them.
Now, only deletion works with missing attributes, as long as
the host is uniquely identified.
Go through disks of guest, if one disk doesn't exist or its backing
chain is broken, with 'optional' startupPolicy, for CDROM and Floppy
we only discard its source path definition in xml, for disks we drop
it from disk list and free it.
Since iptables version 1.4.16 '-m state --state NEW' is converted to
'-m conntrack --ctstate NEW'. Therefore, when encountering this or later
versions of iptables use '-m conntrack --ctstate'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
The change from initgroups to virGetGroupList/setgroups in
cab36cfe71ba83b71e536ba5c98e596f02b697b0 dropped the primary group from
processes group list iff the passed in group to virGetGroupList differs
from the user's primary group.
So always include the primary group to bring back the old behaviour.
Debian has the kvm group as primary group but uses
libvirt-qemu:libvirt-qemu as user:group to run the kvm process so
without this change the /dev/kvm is inaccessible.
Since commit 95e18efd most public interfaces (xenUnified...) obtain
a virDomainDefPtr via xenGetDomainDefFor...() which take the unified
lock.
This is already taken before calling xenDomainUsedCpus(), so we get
a deadlock for active guests. Avoid this by splitting up
xenUnifiedDomainGetVcpusFlags() and xenUnifiedDomainGetVcpus() into
public and private function calls (which get the virDomainDefPtr passed)
and use those in xenDomainUsedCpus().
xenDomainUsedCpus
...
nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom);
return xenUnifiedDomainGetVcpusFlags(...)
...
if (!(def = xenGetDomainDefForDom(dom)))
return xenGetDomainDefForUUID(dom->conn, dom->uuid);
...
ret = xenHypervisorLookupDomainByUUID(conn, uuid);
...
xenUnifiedLock(priv);
name = xenStoreDomainGetName(conn, id);
xenUnifiedUnlock(priv);
...
if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu,
...
if (!(def = xenGetDomainDefForDom(dom)))
[again like above]
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This patch addresses two concerns with the error reporting when an
incompatible PCI address is specified for a device:
1) It wasn't always apparent which device had the problem. With this
patch applied, any error about an incompatible address will always
contain the full address as given in the config, so it will be easier
to determine which device's config aused the problem.
2) In some cases when the problem came from bad config, the error
message was erroneously classified as VIR_ERR_INTERNAL_ERROR. With
this patch applied, the same error message will be changed to indicate
either "internal" or "xml" error depending on whether the address came
from the config, or was automatically generated by libvirt.
Note that in the case of "internal" (due to bad auto-generation)
errors, the PCI address won't be of much use in finding the location
in config to change (because it was automatically generated). Of
course that makes perfect sense, but still the address could provide a
clue about a bug in libvirt attempting to use a type of pci bus that
doesn't have its flags set correctly (or something similar). In other
words, it's not perfect, but it is definitely better.
q35 machines have an implicit ahci (sata) controller at 00:1F.2 which
has no "id" associated with it. For this reason, we can't refer to it
as "ahci0". Instead, we don't give an id on the commandline, which
qemu interprets as "use the first ahci controller". We then need to
specify the unit with "unit=%d" rather than adding it onto the bus
arg.
https://bugzilla.redhat.com/show_bug.cgi?id=979477
Since 1.0.3 we are using the new way to copy non shared storage during
migration (the NBD way). However, whether the new or old way is used is
not controllable by user but unconditionally turned on if both sides of
migration support it. Moreover, the implementation is not complete: the
combination for VIR_MIGRATE_TUNNELLED flag is missing (as we need to
open new port on the destination) in which case we just error out. This
is a deadly combination: not letting users choose their destiny and
erroring out. We should not do that but VIR_WARN and turn the NBD off
instead.
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
This patch adds in special handling for a few devices that need to be
treated differently for q35 domains:
usb - there is no implicit/default usb controller for the q35
machinetype. This is done because normally the default usb controller
is added to a domain by just adding "-usb" to the qemu commandline,
and it's assumed that this will add a single piix3 usb1 controller at
slot 1 function 2. That's not what happens when the machinetype is
q35, though. Instead, adding -usb to the commandline adds 3 usb
(version 2) controllers to the domain at slot 0x1D.{1,2,7}. Rather
than having
<controller type='usb' index='0'/>
translate into 3 separate devices on the PCI bus, it's cleaner to not
automatically add a default usb device; one can always be added
explicitly if desired. Or we may decide that on q35 machines, 3 usb
controllers will be automatically added when none is given. But for
this initial commit, at least we aren't locking ourselves into
something we later won't want.
video - qemu always initializes the primary video device immediately
after any integrated devices for the machinetype. Unless instructed
otherwise (by using "-device vga..." instead of "-vga" which libvirt
uses in many cases to work around deficiencies and bugs in various
qemu versions) qemu will always pick the first unused slot. In the
case of the "pc" machinetype and its derivatives, this is always slot
2, but on q35 machinetypes, the first free slot is slot 1 (since the
q35's integrated peripheral devices are placed in other slots,
e.g. slot 0x1f). In order to make the PCI address of the video device
predictable, that slot (1 or 2, depending on machinetype) is reserved
even when no video device has been specified.
sata - a q35 machine always has a sata controller implicitly added at
slot 0x1F, function 2. There is no way to avoid this controller, so we
always add it. Note that the xml2xml tests for the pcie-root and q35
cases were changed to use DO_TEST_DIFFERENT() so that we can check for
the sata controller being automatically added. This is especially
important because we can't check for it in the xml2argv output (it has
no effect on that output since it's an implicit device).
ide - q35 has no ide controllers.
isa and smbus controllers - these two are always present in a q35 (at
slot 0x1F functions 0 and 3) but we have no way of modelling them in
our config. We do need to reserve those functions so that the user
doesn't attempt to put anything else there though. (note that the "pc"
machine type also has an ISA controller, which we also ignore).
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
Previous refactoring of the guest PCI address reservation/allocation
code allowed for slot types other than basic PCI (e.g. PCI express,
non-hotpluggable slots, etc) but would not auto-allocate a slot for a
device that required any type other than a basic hot-pluggable
PCI slot.
This patch refactors the code to be aware of different slot types
during auto-allocation of addresses as well - as long as there is an
empty slot of the required type, it will be found and used.
The piece that *wasn't* added is that we don't auto-create a new PCI
bus when needed for anything except basic PCI devices. This is because
there are multiple different types of controllers that can provide,
for example, a PCI express slot (in addition to the pcie-root
controller, these can also be found on a "root-port" or on a
"downstream-switch-port"). Since we currently don't support any PCIe
devices (except pending support for dmi-to-pci-bridge), we can defer
any decision on what to do about this.
Commit 632180d1 introduced memory corruption in xenDaemonListDefinedDomains
by starting to populate the names array at index -1, causing all sorts
of havoc in libvirtd such as aborts like the following
*** Error in `/usr/sbin/libvirtd': double free or corruption (out): 0x00007fffe00ccf20 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7abf6)[0x7ffff3fa0bf6]
/lib64/libc.so.6(+0x7b973)[0x7ffff3fa1973]
/lib64/libc.so.6(xdr_array+0xde)[0x7ffff403cbae]
/usr/sbin/libvirtd(+0x50251)[0x5555555a4251]
/lib64/libc.so.6(xdr_free+0x15)[0x7ffff403ccd5]
/usr/lib64/libvirt.so.0(+0x1fad34)[0x7ffff76b1d34]
/usr/lib64/libvirt.so.0(virNetServerProgramDispatch+0x1fc)[0x7ffff76b16f1]
/usr/lib64/libvirt.so.0(+0x1f214a)[0x7ffff76a914a]
/usr/lib64/libvirt.so.0(+0x1f222d)[0x7ffff76a922d]
/usr/lib64/libvirt.so.0(+0xbcc4f)[0x7ffff7573c4f]
/usr/lib64/libvirt.so.0(+0xbc5e5)[0x7ffff75735e5]
/lib64/libpthread.so.0(+0x7e0f)[0x7ffff48f7e0f]
/lib64/libc.so.6(clone+0x6d)[0x7ffff400e7dd]
Fix by initializing ret to 0 and only setting to error on failure path.
This configuration knob lets user to set the length of queue of
connection requests waiting to be accept()-ed by the daemon. IOW, it
just controls the @backlog passed to listen:
int listen(int sockfd, int backlog);
Currently, even if max_client limit is hit, we accept() incoming
connection request, but close it immediately. This has disadvantage of
not using listen() queue. We should accept() only those clients we
know we can serve and let all other wait in the (limited) queue.
* The functions qemuDomainPCIAddressReserveAddr and
qemuDomainPCIAddressReserveSlot were very similar (and should have
been more similar) and were about to get more code added to them which
would create even more duplicated code, so this patch gives
qemuDomainPCIAddressReserveAddr a "reserveEntireSlot" arg, then
replaces the body of qemuDomainPCIAddressReserveSlot with a call to
qemuDomainPCIAddressReserveAddr.
You will notice that addrs->lastaddr was previously set in
qemuDomainPCIAddressReserveAddr (but *not* set in
qemuDomainPCIAddressReserveSlot). For consistency and cleanliness of
code, that bit was removed and put into the one caller of
qemuDomainPCIAddressReserveAddr (there is a similar place where the
caller of qemuDomainPCIAddressReserveSlot sets lastaddr). This does
guarantee identical functionality to pre-patch code, but in practice
isn't really critical, because lastaddr is just keeping track of where
to start when looking for a free slot - if it isn't updated, we will
just start looking on a slot that's already occupied, then skip up to
one that isn't.
* qemuCollectPCIAddress was essentially doing the same thing as
qemuDomainPCIAddressReserveAddr, but with some extra special case
checking at the beginning. The duplicate code has been replaced with
a call to qemuDomainPCIAddressReserveAddr. This required adding a
"fromConfig" boolean, which is only used to change the log error
code from VIR_ERR_INTERNAL_ERROR (when the address was
auto-generated by libvirt) to VIR_ERR_XML_ERROR (when the address is
coming from the config); without this differentiation, it would be
difficult to tell if an error was caused by something wrong in
libvirt's auto-allocate code or just bad config.
* the bit of code in qemuDomainPCIAddressValidate that checks the
connect type flags is going to be used in a couple more places where
we don't need to also check the slot limits (because we're generating
the slot number ourselves), so that has been pulled out into a
separate qemuDomainPCIAddressFlagsCompatible function.
* qemuDomainPCIAddressSetNextAddr
The name of this function was confusing because 1) other functions in
the file that end in "Addr" are only operating on a single function of
one PCI slot, not the entire slot, while functions that do something
with the entire slot end in "Slot", and 2) it didn't contain a verb
describing what it is doing (the "Set" refers to the set that contains
all PCI buses in the system, used to keep track of which slots in
which buses are already reserved for use).
It is now renamed to qemuDomainPCIAddressReserveNextSlot, which more
clearly describes what it is doing. Arguably, it could have been
changed to qemuDomainPCIAddressSetReserveNextSlot, but 1) the word
"set" is confusing in this context because it could be intended as a
verb or as a noun, and 2) most other functions that operate on a
single slot or address within this set are also named
qemuDomainPCIAddress... rather than qemuDomainPCIAddressSet... Only
the Create, Free, and Grow functions for an address set (which modify the
entire set, not just one element) use "Set" in their name.
* qemuPCIAddressAsString, qemuPCIAddressValidate
All the other functions in this set are named
qemuDomainPCIAddressxxxxx, so I renamed these to be consistent.
The parser shouldn't be doing arch-specific things like adding in
implicit controllers to the config. This should instead be done in the
hypervisor's post-parse callback.
This patch removes the auto-add of a usb controller from the domain
parser, and puts it into the qemu driver's post-parse callback (just
as is already done with the auto-add of the pci-root controller). In
the future, any machine/arch that shouldn't have a default usb
controller added should just set addDefaultUSB = false in this
function.
We've recently seen that q35 and ARMV7L domains shouldn't get a default USB
controller, so I've set addDefaultUSB to false for both of those.
If upgrading from a libvirt that is older than 1.0.5, we can
not assume that vm->def->resource is non-NULL. This bogus
assumption caused libvirtd to crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The journald code would crash if a NULL was passed for the
filename / funcname in the logging code. This shouldn't
happen in general, but it is better to be safe, since there
have been bugs triggering this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virLibConnError macros in libvirt-lxc.c and
libvirt-qemu.c were passing NULL for the filename.
This causes a crash if the logging code is configured
to use journald.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* Move platform specific things (e.g. firewalling and route
collision checks) into bridge_driver_platform
* Create two platform specific implementations:
- bridge_driver_linux: Linux implementation using iptables,
it's actually the code moved from bridge_driver.c
- bridge_driver_nop: dumb implementation that does nothing
Signed-off-by: Eric Blake <eblake@redhat.com>
*src/util/virstoragefile.c: Add a helper function to get
the first name of missing backing files, if the name is NULL,
it means the diskchain is not broken.
*src/qemu/qemu_domain.c: qemuDiskChainCheckBroken(disk) to
check if its chain is broken
Refactor this function to make it focus on disk presence checking,
including diskchain checking, and not only for CDROM and Floppy.
This change is good for the following patches.
The virDomainDef is allocated by the caller and also used after
calling to xenDaemonCreateXML. So it must not get freed by the
callee.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When systemd is involved in managing processes, it may start
killing off & tearing down croups associated with the process
while we're still doing virCgroupKillPainfully. We must
explicitly check for ENOENT and treat it as if we had finished
killing processes
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Systemd uses a named cgroup mount for tracking processes. Add
it as another type of controller, albeit one which we have to
special case in a number of places. In particular we must
never create/delete directories there, nor add tasks. Essentially
the systemd mount is to be considered read-only for libvirt.
With this change both the virCgroupDetectPlacement and
virCgroupCopyPlacement methods must be invoked. The copy
placement method will copy setup for resource controllers
only. The detect placement method will probe for any
named controllers, or resource controllers not already
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are some interesting escaping rules to consider when dealing
with systemd slice/scope names. Thus it is helpful to have APIs
for formatting names
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Although this isn't apparently needed for the guest agent itself, the
test I will be adding later depends on the newline as a separator of
messages to process.
this patch introduce the console api in libxl driver for both pv and
hvm guest. and import and update the libxlMakeChrdevStr function
which was deleted in commit dfa1e1dd.
Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com>
This function is needed for virt-login-shell. Also modify virGirUserDirectory
to use the new function, to simplify the code.
Signed-off-by: Eric Blake <eblake@redhat.com>
The VIR_DOMAIN_PAUSED_GUEST_PANICKED constant is badly named,
leaking the QEMU event name. Elsewhere in the API we use
'CRASHED' rather than 'PANICKED', and the addition of 'GUEST'
is redundant since all events are guest related.
Thus rename it to VIR_DOMAIN_PAUSED_CRASHED, which matches
with VIR_DOMAIN_RUNNING_CRASHED and VIR_DOMAIN_EVENT_CRASHED.
It was added in commit 14e7e0ae8d
which post-dates v1.1.0, so is safe to rename before 1.1.1
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_DOMAIN_SHUTDOWN_CRASHED state constant does not appear
to be used in the QEMU code anyway. It also doesn't make much
(any) sense, since the 'shutdown' state is a transient state
between 'running' and 'shutoff' and when a guest crashes, it
does not end up in a 'shutdown' state, only 'shutoff'.
It was added in commit 14e7e0ae8d
which post-dates v1.1.0, so is safe to remove before 1.1.1
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The way we were casting small (<32bit) integers was broken
on big endian hosts, causing stack smashing. This was detected
in the test suite either by test failures due to incorrect
results, or by libc/gcc abort'ing with its stack canary
triggered.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Depending on the set of mingw packages installed, it is possible
that other .c files hit the mingw header pollution from the
virdbus.h file.
In file included from ../../src/rpc/virnetserver.c:39:0:
../../src/util/virdbus.h:41:35: error: expected ';', ',' or ')' before 'struct'
const char *interface,
^
* src/util/virdbus.h (virDBusCallMethod): Match .c file change.
Signed-off-by: Eric Blake <eblake@redhat.com>
On platforms without decent group support, the build failed:
Cannot export virGetGroupList: symbol not defined
./.libs/libvirt_security_manager.a(libvirt_security_manager_la-security_dac.o): In function `virSecurityDACPreFork':
/home/eblake/libvirt-tmp/build/src/../../src/security/security_dac.c:248: undefined reference to `virGetGroupList'
collect2: error: ld returned 1 exit status
* src/util/virutil.c (virGetGroupList): Provide dummy implementation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Our recent conversion to make VIR_ALLOC report oom wasn't
tested on mingw:
In file included from ../../src/util/virthread.c:29:0:
../../src/util/virthreadwin32.c: In function 'virCondWait':
../../src/util/virthreadwin32.c:166:81: error: 'VIR_FROM_THIS' undeclared (first use in this function)
if (VIR_REALLOC_N(c->waiters, c->nwaiters + 1) < 0) {
^
* src/util/virthreadwin32.c (VIR_FROM_THIS): Define.
Signed-off-by: Eric Blake <eblake@redhat.com>
The previous patch was incomplete.
CC libvirt_util_la-vircgroup.lo
../../src/util/vircgroup.c:70:12: error: 'virCgroupPartitionEscape' declared 'static' but never defined [-Werror=unused-function]
static int virCgroupPartitionEscape(char **path);
^
* src/util/vircgroup.c (virCgroupPartitionEscape): Move forward
declaration inside conditional.
Signed-off-by: Eric Blake <eblake@redhat.com>
The virCgroupValidateMachineGroup method calls some functions
which are only conditionally compiled, thus it too must be
made conditional. This fixes the build on non-Linux hosts.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A VPATH build 'make check' was failing with:
GEN check-driverimpls
Can't open ../../src/../../src/lxc/lxc_monitor_protocol.h: No such file or directory at ../../src/check-driverimpls.pl line 29, <> line 27153.
Can't open ../../src/../../src/lxc/lxc_monitor_protocol.c: No such file or directory at ../../src/check-driverimpls.pl line 29, <> line 27153.
...
GEN check-aclrules
cannot read ../../src/../../src/remote/remote_protocol.x at ../../src/check-aclrules.pl line 128.
because $(srcdir) was being prepended to file names that already
included it.
* src/Makefile.am (check-driverimpls): Don't add srcdir twice.
Signed-off-by: Eric Blake <eblake@redhat.com>
When the legacy Xen driver probes with a NULL URI, and
finds itself running on Xen, it will set conn->uri. A
little bit later though it checks to see if libxl support
exists, and if so declines the driver. This leaves the
conn->uri set to 'xen:///', so if libxl also declines
it, it prevents probing of the QEMU driver.
Once a driver has set the conn->uri, it must *never*
decline an open request. So we must move the libxl
check earlier
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=981094
The commit 0ad9025ef introduce qemu flag QEMU_CAPS_DEVICE_VIDEO_PRIMARY
for using -device VGA, -device cirrus-vga, -device vmware-svga and
-device qxl-vga. In use, for -device qxl-vga, mouse doesn't display
in guest window like the desciption in above bug.
This patch try to use -device for primary video when qemu >=1.6 which
contains the bug fix patch
Otherwise, with new enough gcc compiling at -O2, the build fails with:
../../src/conf/domain_conf.c: In function ‘virDomainDeviceDefPostParse’:
../../src/conf/domain_conf.c:2821:29: error: ‘cnt’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
for (i = 0; i < *cnt; i++) {
^
../../src/conf/domain_conf.c:2795:20: note: ‘cnt’ was declared here
size_t i, *cnt;
^
../../src/conf/domain_conf.c:2794:30: error: ‘arrPtr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
virDomainChrDefPtr **arrPtr;
^
* src/conf/domain_conf.c (virDomainChrGetDomainPtrs): Always
assign into output parameters.
Signed-off-by: Eric Blake <eblake@redhat.com>
By setting the default partition in libvirt_lxc it is not
visible when querying the live XML. Move setting of the
default partition into libvirtd virLXCProcessStart
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Decrementing it when it was already 0 causes an invalid free
in virNetworkDefUpdateDNSHost if virNetworkDNSHostDefParseXML
fails and virNetworkDNSHostDefClear gets called twice.
virNetworkForwardDefClear left the number untouched even if it
freed all the elements.
If the app has provided a whitelist of controllers to be used,
we skip detecting its mount point. We still, however, fill in
the placement info which later confuses the machine name
validation code. Skip detecting placement if the controller
mount point is not set
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a VM has an 'emulator' child cgroup present, we must
strip off that suffix when detecting the cgroup for a
machine
Rename the virCgroupIsValidMachineGroup method to
virCgroupValidateMachineGroup to make a bit clearer
that this isn't simply a boolean check, it will make
changes to the object.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupIsValidMachine does not need to be called from
outside the cgroups file now, so make it static.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Both virStoragePoolFree and virStorageVolFree reset the last error,
which might lead to the cryptic message:
An error occurred, but the cause is unknown
When the volume wasn't found, virStorageVolFree was called with NULL,
leading to an error:
invalid storage volume pointer in virStorageVolFree
This patch changes it to:
Storage volume not found: no storage vol with matching name 'tomato'
Add protection such that the virCgroupRemove and
virCgroupKill* do not do anything to the root cgroup.
Killing all PIDs in the root cgroup does not end well.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring one API call to create a cgroup and
another to add a task to it, introduce a new API
virCgroupNewMachine which does both jobs at once. This
will facilitate the later code to talk to systemd to
achieve this job which is also atomic.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There's a race in lxc driver causing a deadlock. If a domain is
destroyed immediately after started, the deadlock can occur. When domain
is started, the even loop tries to connect to the monitor. If the
connecting succeeds, virLXCProcessMonitorInitNotify() is called with
@mon->client locked. The first thing that callee does, is
virObjectLock(vm). So the order of locking is: 1) @mon->client, 2) @vm.
However, if there's another thread executing virDomainDestroy on the
very same domain, the first thing done here is locking the @vm. Then,
the corresponding libvirt_lxc process is killed and monitor is closed
via calling virLXCMonitorClose(). This callee tries to lock @mon->client
too. So the order is reversed to the first case. This situation results
in deadlock and unresponsive libvirtd (since the eventloop is involved).
The proper solution is to unlock the @vm in virLXCMonitorClose prior
entering virNetClientClose(). See the backtrace as follows:
Thread 25 (Thread 0x7f1b7c9b8700 (LWP 16312)):
0 0x00007f1b80539714 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007f1b8053516c in _L_lock_516 () from /lib64/libpthread.so.0
2 0x00007f1b80534fbb in pthread_mutex_lock () from /lib64/libpthread.so.0
3 0x00007f1b82a637cf in virMutexLock (m=0x7f1b3c0038d0) at util/virthreadpthread.c:85
4 0x00007f1b82a4ccf2 in virObjectLock (anyobj=0x7f1b3c0038c0) at util/virobject.c:320
5 0x00007f1b82b861f6 in virNetClientCloseInternal (client=0x7f1b3c0038c0, reason=3) at rpc/virnetclient.c:696
6 0x00007f1b82b862f5 in virNetClientClose (client=0x7f1b3c0038c0) at rpc/virnetclient.c:721
7 0x00007f1b6ee12500 in virLXCMonitorClose (mon=0x7f1b3c007210) at lxc/lxc_monitor.c:216
8 0x00007f1b6ee129f0 in virLXCProcessCleanup (driver=0x7f1b68100240, vm=0x7f1b680ceb70, reason=VIR_DOMAIN_SHUTOFF_DESTROYED) at lxc/lxc_process.c:174
9 0x00007f1b6ee14106 in virLXCProcessStop (driver=0x7f1b68100240, vm=0x7f1b680ceb70, reason=VIR_DOMAIN_SHUTOFF_DESTROYED) at lxc/lxc_process.c:710
10 0x00007f1b6ee1aa36 in lxcDomainDestroyFlags (dom=0x7f1b5c002560, flags=0) at lxc/lxc_driver.c:1291
11 0x00007f1b6ee1ab1a in lxcDomainDestroy (dom=0x7f1b5c002560) at lxc/lxc_driver.c:1321
12 0x00007f1b82b05be5 in virDomainDestroy (domain=0x7f1b5c002560) at libvirt.c:2303
13 0x00007f1b835a7e85 in remoteDispatchDomainDestroy (server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0, rerr=0x7f1b7c9b7c30, args=0x7f1b5c004a50) at remote_dispatch.h:3143
14 0x00007f1b835a7d78 in remoteDispatchDomainDestroyHelper (server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0, rerr=0x7f1b7c9b7c30, args=0x7f1b5c004a50, ret=0x7f1b5c0029e0) at remote_dispatch.h:3121
15 0x00007f1b82b93704 in virNetServerProgramDispatchCall (prog=0x7f1b8573af90, server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0) at rpc/virnetserverprogram.c:435
16 0x00007f1b82b93263 in virNetServerProgramDispatch (prog=0x7f1b8573af90, server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0) at rpc/virnetserverprogram.c:305
17 0x00007f1b82b8c0f6 in virNetServerProcessMsg (srv=0x7f1b857419d0, client=0x7f1b8574ae40, prog=0x7f1b8573af90, msg=0x7f1b8574acf0) at rpc/virnetserver.c:163
18 0x00007f1b82b8c1da in virNetServerHandleJob (jobOpaque=0x7f1b8574dca0, opaque=0x7f1b857419d0) at rpc/virnetserver.c:184
19 0x00007f1b82a64158 in virThreadPoolWorker (opaque=0x7f1b8573cb10) at util/virthreadpool.c:144
20 0x00007f1b82a63ae5 in virThreadHelper (data=0x7f1b8574b9f0) at util/virthreadpthread.c:161
21 0x00007f1b80532f4a in start_thread () from /lib64/libpthread.so.0
22 0x00007f1b7fc4f20d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f1b83546740 (LWP 16297)):
0 0x00007f1b80539714 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007f1b8053516c in _L_lock_516 () from /lib64/libpthread.so.0
2 0x00007f1b80534fbb in pthread_mutex_lock () from /lib64/libpthread.so.0
3 0x00007f1b82a637cf in virMutexLock (m=0x7f1b680ceb80) at util/virthreadpthread.c:85
4 0x00007f1b82a4ccf2 in virObjectLock (anyobj=0x7f1b680ceb70) at util/virobject.c:320
5 0x00007f1b6ee13bd7 in virLXCProcessMonitorInitNotify (mon=0x7f1b3c007210, initpid=4832, vm=0x7f1b680ceb70) at lxc/lxc_process.c:601
6 0x00007f1b6ee11fd3 in virLXCMonitorHandleEventInit (prog=0x7f1b3c001f10, client=0x7f1b3c0038c0, evdata=0x7f1b8574a7d0, opaque=0x7f1b3c007210) at lxc/lxc_monitor.c:109
7 0x00007f1b82b8a196 in virNetClientProgramDispatch (prog=0x7f1b3c001f10, client=0x7f1b3c0038c0, msg=0x7f1b3c003928) at rpc/virnetclientprogram.c:259
8 0x00007f1b82b87030 in virNetClientCallDispatchMessage (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1019
9 0x00007f1b82b876bb in virNetClientCallDispatch (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1140
10 0x00007f1b82b87d41 in virNetClientIOHandleInput (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1312
11 0x00007f1b82b88f51 in virNetClientIncomingEvent (sock=0x7f1b3c0044e0, events=1, opaque=0x7f1b3c0038c0) at rpc/virnetclient.c:1832
12 0x00007f1b82b9e1c8 in virNetSocketEventHandle (watch=3321, fd=54, events=1, opaque=0x7f1b3c0044e0) at rpc/virnetsocket.c:1695
13 0x00007f1b82a272cf in virEventPollDispatchHandles (nfds=21, fds=0x7f1b8574ded0) at util/vireventpoll.c:498
14 0x00007f1b82a27af2 in virEventPollRunOnce () at util/vireventpoll.c:645
15 0x00007f1b82a25a61 in virEventRunDefaultImpl () at util/virevent.c:273
16 0x00007f1b82b8e97e in virNetServerRun (srv=0x7f1b857419d0) at rpc/virnetserver.c:1097
17 0x00007f1b8359db6b in main (argc=2, argv=0x7ffff98dbaa8) at libvirtd.c:1512
The avail_vcpu bitmap has to be allocated before it can be used (using
the maximum allowed value for that). Then for each available VCPU the
bit in the mask has to be set (libxl_bitmap_set takes a bit position
as an argument, not the number of bits to set).
Without this, I would always only get one VCPU for guests created
through libvirt/libxl.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
virCgroupAvailable() implementation calls getmntent_r
without checking if HAVE_GETMNTENT_R is defined, so it fails
to build on platforms without getmntent_r support.
Make virCgroupAvailable() just return false without
HAVE_GETMNTENT_R.
Function qemuOpenFile() haven't had any idea about seclabels applied
to VMs only, so in case the seclabel differed from the "user:group"
from configuration, there might have been issues with opening files.
Make qemuOpenFile() VM-aware, but only optionally, passing NULL
argument means skipping VM seclabel info completely.
However, all current qemuOpenFile() calls look like they should use VM
seclabel info in case there is any, so convert these calls as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=869053
Parsing 'user:group' is useful even outside the DAC security driver,
so expose the most abstract function which has no DAC security driver
bits in itself.
Since PCI bridges, PCIe bridges, PCIe switches, and PCIe root ports
all share the same namespace, they are all defined as controllers of
type='pci' in libvirt (but with a differing model attribute). Each of
these controllers has a certain connection type upstream, allows
certain connection types downstream, and each can either allow a
single downstream connection at slot 0, or connections from slot 1 -
31.
Right now, we only support the pci-root and pci-bridge devices, both
of which only allow PCI devices to connect, and both which have usable
slots 1 - 31. In preparation for adding other types of controllers
that have different capabilities, this patch 1) adds info to the
qemuDomainPCIAddressBus object to indicate the capabilities, 2) sets
those capabilities appropriately for pci-root and pci-bridge devices,
and 3) validates that the controller being connected to is the proper
type when allocating slots or validating that a user-selected slot is
appropriate for a device..
Having this infrastructure in place will make it much easier to add
support for the other PCI controller types.
While it would be possible to do all the necessary checking by just
storing the controller model in the qemyuDomainPCIAddressBus, it
greatly simplifies all the validation code to also keep a "flags",
"minSlot" and "maxSlot" for each - that way we can just check those
attributes rather than requiring a nearly identical switch statement
everywhere we need to validate compatibility.
You may notice many places where the flags are seemingly hard-coded to
QEMU_PCI_CONNECT_HOTPLUGGABLE | QEMU_PCI_CONNECT_TYPE_PCI
This is currently the correct value for all PCI devices, and in the
future will be the default, with small bits of code added to change to
the flags for the few devices which are the exceptions to this rule.
Finally, there are a few places with "FIXME" comments. Note that these
aren't indicating places that are broken according to the currently
supported devices, they are places that will need fixing when support
for new PCI controller models is added.
To assure that there was no regression in the auto-allocation of PCI
addresses or auto-creation of integrated pci-root, ide, and usb
controllers, a new test case (pci-bridge-many-disks) has been added to
both the qemuxml2argv and qemuxml2xml tests. This new test defines a
domain with several dozen virtio disks but no pci-root or
pci-bridges. The .args file of the new test case was created using
libvirt sources from before this patch, and the test still passes
after this patch has been applied.
Although these two enums are named ..._LAST, they really had the value
of ..._SIZE. This patch changes their values so that, e.g.,
QEMU_PCI_ADDRESS_SLOT_LAST really is the slot number of the last slot
on a PCI bus.
The implicit IDE, USB, and video controllers provided by the PIIX3
chipset in the pc-* machinetypes are not present on other
machinetypes, so we shouldn't be doing the special checking for
them. This patch places those validation checks into a separate
function that is only called for machine types that have a PIIX3 chip
(which happens to be the i440fx-based pc-* machine types).
One qemuxml2argv test data file had to be changed - the
pseries-usb-multi test had included a piix3-usb-uhci device, which was
being placed at a specific address, and also had slot 2 auto reserved
for a video device, but the pseries virtual machine doesn't actually
have a PIIX3 chip, so even if there was a piix3-usb-uhci driver for
it, the device wouldn't need to reside at slot 1 function 2. I just
changed the .argv file to have the generic slot info for the two
devices that results when the special PIIX3 code isn't executed.
qemuDomainPCIAddressBus was an array of QEMU_PCI_ADDRESS_SLOT_LAST
uint8_t's, which worked fine as long as every PCI bus was
identical. In the future, some PCI busses will allow connecting PCI
devices, and some will allow PCIe devices; also some will only allow
connection of a single device, while others will allow connecting 31
devices.
In order to keep track of that information for each bus, we need to
turn qemuDomainPCIAddressBus into a struct, for now with just one
member:
uint8_t slots[QEMU_PCI_ADDRESS_SLOT_LAST];
Additional members will come in later patches.
The item in qemuDomainPCIAddresSet that contains the array of
qemuDomainPCIAddressBus is now called "buses" to be more consistent
with the already existing "nbuses" (and with the new "slots" array).
Thanks to a lack of coordination between kernel and glibc folks,
it has been impossible to mix code using <linux/in.h> and
<net/in.h> for some time now (see for example commit c308a9a).
On at least RHEL 6, <linux/if_bridge.h> tries to use the kernel
side, and fails due to our desire to use the glibc side elsewhere:
In file included from /usr/include/linux/if_bridge.h:17,
from util/virnetdevbridge.c:42:
/usr/include/linux/in6.h:31: error: redefinition of ‘struct in6_addr’
/usr/include/linux/in6.h:48: error: redefinition of ‘struct sockaddr_in6’
/usr/include/linux/in6.h:56: error: redefinition of ‘struct ipv6_mreq’
Thankfully, the kernel layout of these structs is ABI-compatible,
they only differ in the type system presented to the C compiler.
While there are other versions of kernel headers that avoid the
problem, it is easier to just work around the issue than to expect
all developers to upgrade to working kernel headers.
* src/util/virnetdevbridge.c (includes): Coerce the kernel version
of in.h to not collide with the normal version.
Signed-off-by: Eric Blake <eblake@redhat.com>
dbus 1.2.24 (on RHEL 6) lacks DBUS_TYPE_UNIX_FD; but as we aren't
trying to pass one of those anyways, we can just drop support for
it in our wrapper. Solves this build error introduced in commit
834c9c94:
CC libvirt_util_la-virdbus.lo
util/virdbus.c:242: error: 'DBUS_TYPE_UNIX_FD' undeclared here (not in a function)
* src/util/virdbus.c (virDBusBasicTypes): Drop support for unix fds.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit id '4421e257' strdup'd devAlias, but didn't free
Running qemuhotplugtest under valgrind resulted in the following:
==7375== 9 bytes in 1 blocks are definitely lost in loss record 11 of 70
==7375== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==7375== by 0x37C1085D71: strdup (strdup.c:42)
==7375== by 0x4CBBD5F: virStrdup (virstring.c:554)
==7375== by 0x4CFF9CB: virDomainEventDeviceRemovedNew (domain_event.c:1174)
==7375== by 0x427791: qemuDomainRemoveChrDevice (qemu_hotplug.c:2508)
==7375== by 0x42C65D: qemuDomainDetachChrDevice (qemu_hotplug.c:3357)
==7375== by 0x41C94F: testQemuHotplug (qemuhotplugtest.c:115)
==7375== by 0x41D817: virtTestRun (testutils.c:168)
==7375== by 0x41C400: mymain (qemuhotplugtest.c:322)
==7375== by 0x41DF3A: virtTestMain (testutils.c:764)
==7375== by 0x37C1021A04: (below main) (libc-start.c:225)
Commit 'c8695053' resulted in the following:
Coverity error seen in the output:
ERROR: REVERSE_INULL
FUNCTION: lxcProcessAutoDestroy
Due to the 'dom' being checked before 'dom->persistent' since 'dom'
is already dereferenced prior to that.
Currently the LXC driver creates the VM's cgroup prior to
forking, and then libvirt_lxc moves the child process
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.
Fortunately we simply move the entire cgroups setup into
the libvirt_lxc child process. We make it take place before
fork'ing into the background, so by the time virCommandRun
returns in the LXC driver, the cgroup is guaranteed to be
present.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the QEMU driver creates the VM's cgroup prior to
forking, and then uses a virCommand hook to move the child
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.
Fortunately we have a handshake taking place between the
QEMU driver and the child process prior to QEMU being exec()d,
which was introduced to allow setup of disk locking. By good
fortune this synchronization point can be used to enable the
QEMU driver to do atomic setup of cgroups removing the use
of the hook script.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use the new virCgroupNewDetect function to determine cgroup
placement of existing running VMs. This will allow the legacy
cgroups creation APIs to be removed entirely
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virCgroupIsValidMachine API to check whether an auto
detected cgroup is valid for a machine. This lets us
check if a VM has just been placed into some generic
shared cgroup, or worse, the root cgroup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a virCgroupNewDetect API which is used to initialize a
cgroup object with the placement of an arbitrary process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If systemd machine does not exist, return -2 instead of -1,
so that applications don't need to repeat the tedious error
checking code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Current code for handling dbus errors only works for errors
received from the remote application itself. We must also
handle errors emitted by the bus itself, for example, when
it fails to spawn the target service.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a privileged field to storageDriverState
Use the privileged value in order to generate a connection which could
be passed to the various storage backend drivers.
In particular, the iSCSI driver will need a connect in order to perform
pool authentication using the 'chap' secrets and the RBD driver utilizes
the connection during pool refresh for pools using 'ceph' secrets.
For now that connection will be to be to qemu driver until a mechanism
is devised to get a connection to just the secret driver without qemu.
Update virStorageBackendRBDOpenRADOSConn() to use the internal API to the
secret driver in order to get the secret value instead of the external
virSecretGetValue() path. Without the flag VIR_SECRET_GET_VALUE_INTERNAL_CALL
there is no way to get the value of private secret.
This also requires ensuring there is a connection which wasn't true for
for the refreshPool() path calls from storageDriverAutostart() prior to
adding support for the connection to a qemu driver. It seems calls to
virSecretLookupByUUIDString() and virSecretLookupByUsage() from the
refreshPool() path would have failed with no way to find the secret - that is
theoretically speaking since the 'conn' was NULL the failure would have been
"failed to find the secret".
Although the XML for CHAP authentication with plain "password"
was introduced long ago, the function was never implemented. This
patch replaces the login/password mechanism by following the
'ceph' (or RBD) model of using a 'username' with a 'secret' which
has the authentication information.
This patch performs the authentication during startPool() processing
of pools with an authType of VIR_STORAGE_POOL_AUTH_CHAP specified
for iSCSI pools.
There are two types of CHAP configurations supported for iSCSI
authentication:
* Initiator Authentication
Forward, one-way; The initiator is authenticated by the target.
* Target Authentication
Reverse, Bi-directional, mutual, two-way; The target is authenticated
by the initiator; This method also requires Initiator Authentication
This only supports the "Initiator Authentication". (I don't have any
enterprise iSCSI env for testing, only have a iSCSI target setup with
tgtd, which doesn't support "Target Authentication").
"Discovery authentication" is not supported by tgt yet too. So this only
setup the session authentication by executing 3 iscsiadm commands, E.g:
% iscsiadm -m node --target "iqn.2013-05.test:iscsi.foo" --name \
"node.session.auth.authmethod" -v "CHAP" --op update
% iscsiadm -m node --target "iqn.2013-05.test:iscsi.foo" --name \
"node.session.auth.username" -v "Jim" --op update
% iscsiadm -m node --target "iqn.2013-05.test:iscsi.foo" --name \
"node.session.auth.password" -v "Jimsecret" --op update
During qemuTranslateDiskSourcePool() execution, if the srcpool has been
defined with authentication information, then for iSCSI pools copy the
authentication and host information to virDomainDiskDef.
Due to a goto statement missed when refactoring in 2771f8b74c
when acquiring of a domain job failed the error path was not taken. This
resulted into a crash afterwards as an extra reference was removed from a
domain object leading to it being freed. An attempt to list the domains
leaded to a crash of the daemon afterwards.
https://bugzilla.redhat.com/show_bug.cgi?id=928672
Commit 834c9c94 introduced virDBusMessageEncode and
virDBusMessageDecode functions, however corresponding stubs
were not added to !WITH_DBUS section, therefore 'make check'
started to fail when compiled w/out dbus support like that:
Expected symbol virDBusMessageDecode is not in ELF library
The translation must be done before both of cgroup and security
setting, otherwise since the disk source is not translated yet,
it might be skipped on cgroup and security setting.
virDomainDiskDefForeachPath is not only used by the security
setting helpers, also used by cgroup setting helpers, so this
is to ignore the volume type disk with mode="direct" for cgroup
setting.
The difference with already supported pool types (dir, fs, block)
is: there are two modes for iscsi pool (or network pools in future),
one can specify it either to use the volume target path (the path
showed up on host) with mode='host', or to use the remote URI qemu
supports (e.g. file=iscsi://example.org:6000/iqn.1992-01.com.example/1)
with mode='direct'.
For 'host' mode, it copies the volume target path into disk->src. For
'direct' mode, the corresponding info in the *one* pool source host def
is copied to disk->hosts[0].
There are two ways to use a iSCSI LUN as disk source for qemu.
* The LUN's path as it shows up on host, e.g.
/dev/disk/by-path/ip-$ip:3260-iscsi-$iqn-fc18:iscsi.iscsi0-lun-1
* The libiscsi URI from the storage pool source element host attribute, e.g.
iscsi://demo.org:6000/iqn.1992-01.com.example/1
For a "volume" type disk, if the specified "pool" is of iscsi
type, we should support to use the LUN in either of above 2 ways.
That's why to introduce a new XML tag "mode" for the disk source
(libvirt should support iscsi pool with libiscsi, but it's another
new feature, which should be done later).
The "mode" can be either of "host" or "direct". Use "host" to indicate
use of the LUN with the path as it shows up on host. Use "direct" to
indicate to use it with the source pool host URI (future patches may support
to use network type libvirt storage too, e.g. Ceph)
This is another cleanup before extracting platform-specific
parts from bridge_driver.
Rename struct network_driver to _virNetworkDriverState and
add appropriate typedefs: virNetworkDriverState and
virNetworkDriverStatePtr.
This will help us to avoid potential problems when moving
this struct to the .h file.
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virErrorSetErrnoFromLastError and virLastErrorIsSystemErrno
to simplify code which wants to handle system errors in a more
graceful fashion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To register virtual machines and containers with systemd-machined,
and thus have cgroups auto-created, we need to talk over DBus.
This is somewhat tedious code, so introduce a dedicated function
to isolate the DBus call in one place.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doing DBus method calls using libdbus.so is tedious in the
extreme. systemd developers came up with a nice high level
API for DBus method calls (sd_bus_call_method). While
systemd doesn't use libdbus.so, their API design can easily
be ported to libdbus.so.
This patch thus introduces methods virDBusCallMethod &
virDBusMessageRead, which are based on the code used for
sd_bus_call_method and sd_bus_message_read. This code in
systemd is under the LGPLv2+, so we're license compatible.
This code is probably pretty unintelligible unless you are
familiar with the DBus type system. So I added some API
docs trying to explain how to use them, as well as test
cases to validate that I didn't screw up the adaptation
from the original systemd code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Until now CPU features inherited from a specified CPU model could only
be overridden with 'disable' policy. With this patch, any explicitly
specified feature always overrides the same feature inherited from a CPU
model regardless on the specified policy.
The CPU in x86-exact-force-Haswell.xml would previously be incompatible
with x86-host-SandyBridge.xml CPU even though x86-host-SandyBridge.xml
provides all features required by x86-exact-force-Haswell.xml.
If no explicit driver is set for an image backed filesystem,
set it to use the loop driver (if raw) or nbd driver (if
non-raw)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A couple of places in LXC setup for filesystems did not do
a "goto cleanup" after reporting errors. While fixing this,
also add in many more debug statements to aid troubleshooting
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The alias for hostdevs of type SCSI can be too long for QEMU if
larger LUNs are encountered. Here's a real life example:
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1088634913'/>
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
this results in a too long drive id, resulting in QEMU yelling
Property 'scsi-generic.drive' can't find value 'drive-hostdev-scsi_host0-0-19-1088634913'
This commit changes the alias back to the default hostdev$(index)
scheme.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
In case libvirtd is asked to unplug a device but the device is actually
unplugged later when libvirtd is not running, we need to detect that and
remove such device when libvirtd starts again and reconnects to running
domains.
Attempts to start a domain with both SELinux and DAC security
modules loaded will deadlock; latent problem introduced in commit
fdb3bde and exposed in commit 29fe5d7. Basically, when recursing
into the security manager for other driver's prefork, we have to
undo the asymmetric lock taken at the manager level.
Reported by Jiri Denemark, with diagnosis help from Dan Berrange.
* src/security/security_stack.c (virSecurityStackPreFork): Undo
extra lock grabbed during recursion.
Signed-off-by: Eric Blake <eblake@redhat.com>
Makefiles are another easy file to enforce line limits.
Mostly straightforward; interesting tricks worth noting:
src/Makefile.am: $(confdir) was already defined, use it in more places
tests/Makefile.am: path_add and VG required some interesting compression
* cfg.mk (sc_prohibit_long_lines): Add another test.
* Makefile.am: Fix offenders.
* daemon/Makefile.am: Likewise.
* docs/Makefile.am: Likewise.
* python/Makefile.am: Likewise.
* src/Makefile.am: Likewise.
* tests/Makefile.am: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc's use of virSecurityManagerSetProcessLabel. Hoist
the supplemental group detection to the time that the security
manager needs to fork. Qemu is safe, as it uses
virSecurityManagerSetChildProcessLabel which in turn uses
virCommand to determine supplemental groups.
This does not fix the fact that virSecurityManagerSetProcessLabel
calls virSecurityDACParseIds calls parseIds which eventually
calls getpwnam_r, which also violates fork/exec async-signal-safe
safety rules, but so far no one has complained of hitting
deadlock in that case.
* src/security/security_dac.c (_virSecurityDACData): Track groups
in private data.
(virSecurityDACPreFork): New function, to set them.
(virSecurityDACClose): Clean up new fields.
(virSecurityDACGetIds): Alter signature.
(virSecurityDACSetSecurityHostdevLabelHelper)
(virSecurityDACSetChardevLabel, virSecurityDACSetProcessLabel)
(virSecurityDACSetChildProcessLabel): Update callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch wants the DAC security manager to be able to safely
get the supplemental group list for a given uid, but at the time
of a fork rather than during initialization so as to pick up on
live changes to the system's group database. This patch adds the
framework, including the possibility of a pre-fork callback
failing.
For now, any driver that implements a prefork callback must be
robust against the possibility of being part of a security stack
where a later element in the chain fails prefork. This means
that drivers cannot do any action that requires a call to postfork
for proper cleanup (no grabbing a mutex, for example). If this
is too prohibitive in the future, we would have to switch to a
transactioning sequence, where each driver has (up to) 3 callbacks:
PreForkPrepare, PreForkCommit, and PreForkAbort, to either clean
up or commit changes made during prepare.
* src/security/security_driver.h (virSecurityDriverPreFork): New
callback.
* src/security/security_manager.h (virSecurityManagerPreFork):
Change signature.
* src/security/security_manager.c (virSecurityManagerPreFork):
Optionally call into driver, and allow returning failure.
* src/security/security_stack.c (virSecurityDriverStack):
Wrap the handler for the stack driver.
* src/qemu/qemu_process.c (qemuProcessStart): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
These helpers use the remembered host capabilities to retrieve the cpu
map rather than query the host again. The intended usage for this
helpers is to fix automatic NUMA placement with strict memory alloc. The
code doing the prepare needs to pin the emulator process only to cpus
belonging to a subset of NUMA nodes of the host.
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=974943
With the majority of fields in the virLXCDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the LXC driver lock. Only a handful
of helper APIs now need it.
The activeUsbHostdevs item in LXCDriver are lockable, but the lock has
to be called explicitly. Call the virObject(Un)Lock() in order to
achieve mutual exclusion once lxcDriverLock is removed.
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Currently the virLXCDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virLXCDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virLXCDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
vmwareMakePath seems to be unused.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the new virDomainCreate{XML}WithFiles methods in the
LXC driver, so that FDs get passed down to the init process.
The lxc_container code needs to do a little dance in order
to renumber the file descriptors it receives into linear
order, starting from STDERR_FILENO + 1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the following commit:
commit 03d813bbcd
Author: Marek Marczykowski <marmarek@invisiblethingslab.com>
Date: Thu May 23 02:01:30 2013 +0200
remote: fix dom->id after virDomainCreateWithFlags
The virDomainCreateWithFlags remote client helper was made to
invoke REMOTE_PROC_DOMAIN_LOOKUP_BY_UUID to refresh the 'id'
of the domain, following the pattern used in the previous
virDomainCreate method impl.
The remote protocol for virDomainCreateWithFlags though did
actually fix the design flaw in virDomainCreate, by directly
returning the new domain info. For some reason, this data was
never used. So we can just use that data now instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since they make use of file descriptor passing, the remote protocol
methods for virDomainCreate{XML}WithFiles must be written by hand.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
With container based virt, it is useful to be able to pass
pre-opened file descriptors to the container init process.
This allows for containers to be auto-activated from incoming
socket connections, passing the active socket into the container.
To do this, introduce a pair of new APIs, virDomainCreateXMLWithFiles
and virDomainCreateWithFiles, which accept an array of file
descriptors. For the LXC driver, UNIX file descriptor passing
will be used to send them to libvirtd, which will them pass
them down to libvirt_lxc, which will then pass them to the container
init process.
This will only be implemented for LXC right now, but the design
is generic enough it could work with other hypervisors, hence
I suggest adding this to libvirt.so, rather than libvirt-lxc.so
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add support for creating disk-only (no memory) snapshots in esx, and
for quiescing the VM before taking the snapshot. The VMware API
supports these operations directly, so adding support to libvirt is
just a matter of setting the flags correctly when calling
VMware. VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY and
VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE are now valid flags for esx.
Signed-off-by: Eric Blake <eblake@redhat.com>
Although, having it depending on Xen >= 4.3 (by using the proper
libxl feature flag).
Xen currently implements a NUMA placement policy which is basically
the same as the 'interleaved' policy of `numactl', although it can
be applied on a subset of the available nodes. We therefore hardcode
"interleave" as 'numa_mode', and we use the newly introduced libxl
interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node
affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
Id Name State
----------------------------------------------------
23 F18_x64 running
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode : interleave
numa_nodeset : 1
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
domainGetNumaParameters has a string typed parameter, hence it
is necessary for the libxl driver to support this.
This change implements the connectSupportsFeature hook for the
libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING
is supported.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
Xen 4.3 changes sysctl version to 10 and domctl version to 9. Update
the hypervisor driver to work with those.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc.
Patch originally posted by Eric Blake <eblake@redhat.com>.
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
While generating seclabels, we check the seclabel stack if required
driver is in the stack. If not, an error is returned. However, it is
possible for a seclabel to not have any model set (happens with LXC
domains that have just <seclabel type='none'>). If that's the case,
we should just skip the iteration instead of calling STREQ(NULL, ...)
and SIGSEGV-ing subsequently.
Currently, if the primary security driver is 'none', we skip
initializing caps->host.secModels. This means, later, when LXC domain
XML is parsed and <seclabel type='none'/> is found (see
virSecurityLabelDefsParseXML), the model name is not copied to the
seclabel. This leads to subsequent crash in virSecurityManagerGenLabel
where we call STREQ() over the model (note, that we are expecting model
to be !NULL).