https://bugzilla.redhat.com/show_bug.cgi?id=894723
Currently, if qemuProcessStart() succeeds, but it's decompression
binary that returns nonzero status, we don't kill the qemu process,
but remove it from internal domain list, leaving the qemu process
hanging around totally uncontrolled.
This patch resolves CVE-2013-0170:
https://bugzilla.redhat.com/show_bug.cgi?id=893450
When reading and dispatching of a message failed the message was freed
but wasn't removed from the message queue.
After that when the connection was about to be closed the pointer for
the message was still present in the queue and it was passed to
virNetMessageFree which tried to call the callback function from an
uninitialized pointer.
This patch removes the message from the queue before it's freed.
* rpc/virnetserverclient.c: virNetServerClientDispatchRead:
- avoid use after free of RPC messages
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen 10 retries
maximum, each per 500ms. This should give enough time for guest
to eject a media and open the tray.
Adjust the macros to free memory allocated during various calls to
perform the check if parameter is NULL prior to really freeing and to
set the pointer to NULL after done freeing.
Resolve a false positive from 'vboxIIDFromUUID_v2_x()'. The code sets
'iid->value = &iid->backing' unconditionally prior to calling 'nsIDFromChar()'.
The 'vboxIIDUnalloc_v2_x()' checks iid->value to not be &iid->backing. The
iid->backing is a static buffer within the initialized structure.
Since libxl provides the domain ID in the event handler callback,
find the domain object based on the ID. This approach prevents
processing the callback on a domain that has already been reaped.
Also, similar to the xl implementation, ignore the SUSPEND shutdown
reason. By calling libxl_domain_suspend(), we know a shutdown
event with SUSPEND reason will be generated, but it can be safely
ignored since any subsequent cleanup will be done by the callers.
libxlDoDomainSave() was removing non-persistent domains, but
required callers to have the virDomainObj locked. Callers could
potentially unlock an already freed virDomainObj. Move this
logic to the callers of libxlDoDomainSave().
I've noticed that libxl can invoke timeout reregister/modify hooks
after returning from libxl_ctx_free. Explicitly remove the
timeouts before freeing the libxl ctx to avoid executing hooks on
stale objects.
It is possible to destroy and cleanup a VM, resulting in freeing the
libxlDomainObjPrivate object and associated libxl ctx, before all fds and
timeouts have been deregistered and destroyed.
Fix this race by incrementing the reference count on libxlDomainObjPrivate
for each fd and timeout registration. Only when all fds and timeouts are
deregistered and destroyed will the libxlDomainObjPrivate be destroyed.
The libxl driver is racy in it's interactions with libxl and libvirt's
event loop. The event loop can invoke callbacks after libxl has
deregistered the event, and possibly access freed data associated with
the event.
This patch fixes the race by converting libxlDomainObjPrivate to a
virObjectLockable, and locking it while executing libxl upcalls and
libvirt event loop callbacks.
Note that using the virDomainObj lock is not satisfactory since it may
be desirable to hold the virDomainObj lock even when libxl events such
as reading and writing to xenstore need processed.
xen-unstable changeset 26469 makes changes wrt modifying and deregistering
timeouts.
First, timeout modify callbacks will only be invoked with an
abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this
commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be
compatible before and after changeset 26469.
While at it, fix a potential overflow in the timeout register callback.
While working with a pmsuspend vs. snapshot issue, I noticed that
the state file in /var/run/libvirt/qemu/dom.xml contained a rather
suspicious "(null)" string, which does not round-trip well through
a libvirtd restart. Had I been on a platform other than glibc
where printf("%s",NULL) crashes instead of printing (null), we might
have noticed the problem much sooner.
And in fixing that problem, I also noticed that we had several
missing states, because we were #defining several *_LAST names
to a value _different_ than what they were already given as enums
in libvirt.h. Yuck. I got rid of default: labels in the case
statements, because they get in the way of gcc's -Wswitch helping
us ensure we cover all enum values.
* src/conf/domain_conf.c (virDomainStateReasonToString)
(virDomainStateReasonFromString): Fill in missing domain states;
rewrite case statement to let compiler enforce checking.
(VIR_DOMAIN_NOSTATE_LAST, VIR_DOMAIN_RUNNING_LAST)
(VIR_DOMAIN_BLOCKED_LAST, VIR_DOMAIN_PAUSED_LAST)
(VIR_DOMAIN_SHUTDOWN_LAST, VIR_DOMAIN_SHUTOFF_LAST)
(VIR_DOMAIN_CRASHED_LAST): Drop dead defines.
(VIR_DOMAIN_PMSUSPENDED_LAST): Drop dead define.
(virDomainPMSuspendedReason): Add missing enum function.
(virDomainRunningReason, virDomainPausedReason): Add missing enum
value.
* src/conf/domain_conf.h (virDomainPMSuspendedReason): Declare
missing functions.
* src/libvirt_private.syms (domain_conf.h): Export them.
With our code, we fail to query for tray-open attribute currently.
That's because in HMP it is 'tray-open' and in QMP it's 'tray_open'.
It always has been. However, we got it exactly the opposite.
A logic bug meant we reported KVM was possible for every
architecture, merely based on whether the query-kvm command
exists. We should instead have been doing it based on whether
the query-kvm command returns 'present: 1'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently QEMU capabilities are initialized before the QEMU driver
sets ownership on its various directories. The upshot is that if
you change the user/group in the qemu.conf file, libvirtd will fail
to probe QEMU the first time it is run after the config change.
Moving QEMU capabilities initialization to after the chown() calls
fixes this
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This previous commit
commit 1a50ba2cb0
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date: Mon Nov 26 15:17:13 2012 +0100
qemu: Fix QMP Capabability Probing Failure
which attempted to make sure the QEMU process used for probing
ran as the right user id, caused serious performance regression
and unreliability in probing. The -daemonize switch in QEMU
guarantees that the monitor socket is present before the parent
process exits. This means libvirtd is guaranteed to be able to
connect immediately. By switching from -daemonize to the
virCommandDaemonize API libvirtd was no longer synchronized with
QEMU's startup process. The result was that the QEMU monitor
failed to open and went into its 200ms sleep loop. This happened
for all 25 binaries resulting in 5 seconds worth of sleeping
at libvirtd startup. In addition sometimes when libvirt connected,
QEMU would be partially initialized and crash causing total
failure to probe that binary.
This commit reverts the previous change, ensuring we do use the
-daemonize flag to QEMU. Startup delay is cut from 7 seconds
to 2 seconds on my machine, which is on a par with what it was
prior to the capabilities rewrite.
To deal with the fact that QEMU needs to be able to create the
pidfile, we switch pidfile location fron runDir to libDir, which
QEMU is guaranteed to be able to write to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there is no reason to hold qemu driver locked
throughout whole API execution. Moreover, we can use the
new qemuDomObjFromDomain() internal API to lookup domain then.
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
if a guest is put into S3 state (such as via virsh dompmsuspend)
and then an external snapshot is taken, qemu forcefully transitions
the domain to paused, but libvirt doesn't reflect that change
internally. Thus, a user has to use 'virsh suspend' to get libvirt
back in sync with qemu state, and if the user doesn't know this
trick, then the guest appears hung.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
Track fact that qemu wakes up a suspended domain on migration.
https://bugzilla.redhat.com/show_bug.cgi?id=895882
virDomainSnapshot.getDomain() and virDomainSnapshot.getConnect()
wrappers around virDomainSnapshotGet{Domain,Connect} were not supposed
to be ever implemented. The class should contain proper domain() and
connect() accessors that fetch python objects stored internally within
the class. While domain() was already provided, connect() was missing.
This patch adds connect() method to virDomainSnapshot class and
reimplements getDomain() and getConnect() methods as aliases to domain()
and connect() for backward compatibility.
The previous fix to avoid leaking securityDriverNames forgot to
handle the case of securityDriverNames being NULL, leading to
a crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The autodestroy callback code has the following function
called from a hash iterator
qemuDriverCloseCallbackRun(void *payload,
const void *name,
void *opaque)
{
...
char *uuidstr = name
...
dom = closeDef->cb(data->driver, dom, data->conn);
if (dom)
virObjectUnlock(dom);
virHashRemoveEntry(data->driver->closeCallbacks, uuidstr);
}
The closeDef->cb function may well cause the current callback
to be removed, if it shuts down 'dom'. As such the use of
'uuidstr' in virHashRemoveEntry is accessing free'd memory.
We must make a copy of the uuid str before invoking the
callback to be safe.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When linuxNodeInfoCPUPopulate() method triggered use of an
uninitialize value, since it did not initialize the 'sockets'
field in the virNodeInfoPtr struct:
==30020== Conditional jump or move depends on uninitialised value(s)
==30020== at 0x5125DBD: linuxNodeInfoCPUPopulate (nodeinfo.c:513)
==30020== by 0x51261A0: nodeGetInfo (nodeinfo.c:884)
==30020== by 0x149B9B10: qemuCapsInit (qemu_capabilities.c:846)
==30020== by 0x14A11B25: qemuCreateCapabilities (qemu_driver.c:424)
==30020== by 0x14A12426: qemuStartup (qemu_driver.c:874)
==30020== by 0x512A7AF: virStateInitialize (libvirt.c:822)
==30020== by 0x40DE04: daemonRunStateInit (libvirtd.c:877)
==30020== by 0x50ADCE5: virThreadHelper (virthreadpthread.c:161)
==30020== by 0x328CA07D14: start_thread (pthread_create.c:308)
==30020== by 0x328C6F246C: clone (clone.S:114)
(happened twice)
if (socks > nodeinfo->sockets) <--- here
nodeinfo->sockets = socks;
Rather than doing this for each field, just make the caller memset
the entire struct to zero.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 87b4c10c6c added code that may call
the virCapabilitiesClearHostNUMACellCPUTopology function with
uninitialized second argument. Although the value wouldn't be used some
compilers whine about that.
Be sure to VIR_FREE(accel) and moved virDomainVideoDefFree() within no_memory
label to be consistent
Resolve resource leak in parallelsApplyIfaceParams() when the 'oldnet' is
allocated locally. Also virCommandFree(cmd) as necessary.
This patch adds data gathering to the NUMA gathering files and adds
support for outputting the data. The test driver and xend driver need to
be adapted to fill sensible data to the structure in a future patch.
This will allow storing additional topology data in the NUMA topology
definition.
This patch changes the storage type and fixes fallout of the change
across the drivers using it.
This patch also changes semantics of adding new NUMA cell information.
Until now the data were re-allocated and copied to the topology
definition. This patch changes the addition function to steal the
pointer to a pre-allocated structure to simplify the code.
The way in that memory balloon suppression was handled for S390
is flawed for a number or reasons.
1. Just preventing the default balloon to be created in the case
of VIR_ARCH_S390[X] is not sufficient. An explicit memballoon
element in the guest definition will still be honored, resulting
both in a -balloon option and the allocation of a PCI bus address,
neither being supported.
2. Prohibiting balloon for S390 altogether at a domain_conf level
is no good solution either as there's work in progress on the QEMU
side to implement a virtio-balloon device, although in
conjunction with a new machine type. Suppressing the balloon
should therefore be done at the QEMU driver level depending
on the present capabilities.
Therefore we remove the conditional suppression of the default
balloon in domain_conf.c.
Further, we are claiming the memballoon device for virtio-s390
during device address assignment to prevent it from being considered
as a PCI device.
Finally, we suppress the generation of the balloon command line option
if this is a virtio-s390 machine.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Should have been done in commit 56fd513 already, but was missed
due to oversight: qemuDomainSendKey didn't release the driver lock
in its cleanup section. This fixes an issue introduced by commit
8c5d2ba.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch changes the name of the @sep argument to @terminator and
clarifies it's usage. This patch also explicitly documents that
whitespace can't be used as @terminator as it is skipped multiple times
in the implementation.
https://bugzilla.redhat.com/show_bug.cgi?id=892079
One of my previous patches (f2a4e5f176) tried to fix crashing
libvirtd on domain detroy. However, we need to copy pattern from
qemuProcessHandleMonitorEOF() instead of decrementing reference
counter. The rationale for this is, if qemu process is dying due
to domain being destroyed, we obtain EOF on both the monitor and
agent sockets. However, if the exit is expected, qemuProcessStop
is called, which cleans both agent and monitor sockets up. We
want qemuAgentClose() to be called iff the EOF is not expected,
so we don't leak an FD and memory. Moreover, there could be race
with qemuProcessHandleMonitorEOF() which could have already
closed the agent socket, in which case we don't want to do
anything.
A followon to commit id: 68dceb635 - if iface->iname is NULL, then
neither virNetDevOpenvswitchRemovePort() nor virNetDevVethDelete()
should be called. Found by Coverity.
Although the nwfilter driver skips startup when running in a
session libvirtd, it did not skip reload or shutdown. This
caused errors to be reported when sending SIGHUP to libvirtd,
and caused an abort() in libdbus on shutdown due to trying
to remove a dbus filter that was never added
When building with static analysis enabled, we turn on attribute
nonnull checking. However, this caused the build to fail with:
../../src/util/virobject.c: In function 'virObjectOnceInit':
../../src/util/virobject.c:55:40: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Creation of the virObject class is the one instance where the
parent class is allowed to be NULL. Making things conditional
will let us keep static analysis checking for all other .c file
callers, without breaking the build on this one exception.
* src/util/virobject.c: Define witness.
* src/util/virobject.h (virClassNew): Use it to force most callers
to pass non-null parameter.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
For the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
The Coverity static analyzer was generating many false positives for the
unary operation inside the VIR_FREE() definition as it was trying to evaluate
the else portion of the "?:" even though the if portion was (1).
Signed-off-by: Eric Blake <eblake@redhat.com>
The count of vCPUs for a domain is extracted as a usingned long variable
but is stored in a unsigned short. If the actual number was too large,
a faulty number was stored.
The local redefinition of PED_PARTITION_PROTECTED results in the error
but is not a problem especially if the built code doesn't have the latest
definitions.
Upon successful return of virNetClientStreamEventAddCallback() the
allocated cbdata field will be freed by virNetClientStreamEventRemoveCallback()
as cbOpaque using the free function remoteStreamCallbackFree().
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
The use of switch statements inside a bounded for loop resulted in some
false positives regarding the "default:" label which cannot be reached
since each of the other case statements use the possible for loop values.
A [dead_error_begin] was added before the default label.
Commit id ebdbe25a adjusted the algorithm and the caller guarantees that
the 'params' will have a '_' in the name being searched. Add the [returned_null]
tag to the two instances.
The use of switch statements inside a bounded for loop resulted in some
false positives regarding the "default:" label which cannot be reached
since each of the other case statements use the possible for loop values.
Commit id a994ef2d1 changed the mechanism to store/update the default
security label from using disk->seclabels[0] to allocating one on the
fly. That change allocated the label, but never saved it. This patch
will save the label. The new virDomainDiskDefAddSecurityLabelDef() is
a copy of the virDomainDefAddSecurityLabelDef().
The code is not reachable as of commit id: bb85f229. Removed
virKeepAliveStop() and virObjectUnref() because 'ka' cannot be
anything but NULL at the cleanup label.
Currently, whenever somebody calls saferead() on nonblocking FD
(safewrite() is totally interchangeable for purpose of this message)
he might get wrong return value. For instance, in the first iteration
some data is read. The number of bytes read is stored into local
variable 'nread'. However, in next iterations we can get -1 from
read() with errno == EAGAIN, in which case the -1 is returned despite
fact some data has already been read. So the caller gets confused.
Bare read() should be used for nonblocking FD.
The snapshot name is used to create path to the definition save file.
When the name contains slashes the creation of the file fails. Reject
such names.
When the snapshot definition can't be saved, the
qemuDomainSnapshotCreate function succeeded without filling some of the
fields in the internal definition.
This patch removes the snapshot and returns failure if the XML file
cannot be written.
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Working with virTypedParameters in clients written in C is ugly and
requires all clients to duplicate the same code. This set of APIs makes
this code for manipulating with virTypedParameters integral part of
libvirt so that all clients may benefit from it.
When virStorageBackendLogicalCreateVol() creates a snapshot for a
logical volume with backingStore element, it fails with the message
below:
2013-01-17 03:10:18.869+0000: 1967: error : virCommandWait:2345 :
internal error Child process (/sbin/lvcreate --name lvm-snapshot -L 51200K
-s=/dev/lvm-pool/lvm-volume) unexpected exit status 3: /sbin/lvcreate:
invalid option -- '=' Error during parsing of command line.
This is because virCommandAddArgPair() uses '=' to connect the two
parameters, it's unsuitable for -s option of the lvcreate.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
A build on FreeBSD failed with:
util/virportallocator.c:108: error: storage size of 'addr' isn't known
util/virportallocator.c:123: error: 'INADDR_ANY' undeclared (first use in this function)
It turns out that while POSIX allows sockaddr_in to leak in through
<arpa/inet.h> (the way Linux does it), it is not mandatory, and
conforming applications are required to get it through <netinet/in.h>.
* src/util/virportallocator.c: Include header for struct
sockaddr_in.
* tests/virportallocatortest.c: Likewise.
The fetch of 'ipdef' in networkRefreshDhcpDaemon() when the loop to fill
in ipv4def fails to find an ipv4 address with dhcp defined. The filled in
ipdef value was not used. Code was made unnecessary with commit it 2d5cd1.
The driver mutex was unlocked in qemuDomainModifyDeviceFlags before
entering qemuDomainObjBeginJobWithDriver where it will be unlocked once
more leaving it in an undefined state. The result was that two
threads were simultaneously looking up the domain hash table during
multiple parallel device attach/detach operations.
Luckily this triggered a virHashIterationError.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When starting a VM, /var/log/messages was spammed with the following message:
xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
With each extra VM I start, the messages get amplified
exponentially. This results in longer starting times every new VM,
relative the the previously started VM. When I ran a test with
starting 100 equal VM's, the first VM started in about 2 seconds, the
100th VM took 48 seconds to start. I'm running a vanilla 3.7.1 kernel,
but I have the same issue on VM hosts with kernel 3.2.28 or 3.2.0,
running libvirt 0.9.12 and 0.9.8 respectively.
Looking into the warning, it seemed that iptables need an extra argument,
--physdev-is-bridged, in commands like:
iptables -A libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet99 -g FP-vnet99
With that, the warnings in /var/log/messages are gone and running the
test again proved the 100th VM started in 3.8 seconds.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=895294
The symptom was that attempts to modify a network device using
virDomainUpdateDeviceFlags() would fail if the original device had a
<boot> element (e.g. "<boot order='1'/>"), even if the updated device
had the same <boot> element. Instead, the following error would be logged:
cannot modify network device boot index setting
It's true that it's not possible to change boot order (internally
known as bootIndex) of a live device; qemuDomainChangeNet checks for
that, but the problem was that the information it was checking was
incorrect.
Explanation:
When a complete domain is parsed, a global (to the domain) "bootMap"
is passed down to the parse for each device; the bootMap is used to
make sure that devices don't have conflicting settings for their boot
orders.
When a single device is parsed by itself (as in the case of
virDomainUpdateDeviceFlags), there is no global bootMap that would be
appropriate to send, so NULL is sent instead. However, although the
lowest level function that parses just the boot order *does* simply
skip the sanity check in that case, the next higher level
"virDomainDeviceInfoParseXML" function refuses to call down to the
lower "virDomainDeviceBootParseXML" if bootMap is NULL. So, the boot
order is never set in the "new" device object, and when it is compared
to the original (which does have a boot order), they don't match.
The fix is to patch virDomainDeviceInfoParseXML to not care about
bootMap, and just always call virDomainDeviceInfoBootParseXML whenever
there is a <boot> element. When we are only parsing a single device,
we don't care whether or not any specified boot order is consistent
with the rest of the domain; we will always do this check later (in
the current case, we do it by verifying that the net bootIndex exactly
matches the old bootIndex).
The bandwidth plug and unplug functions were assuming that an
interface's bandwidth setting was always specified directly in the
domain's <interface> definition, but that's not necessarily true - it
could have been obtained from a <portgroup> definition in the network
definition. This patch fixes those functions to use
virDomainNetGetActualBandwidth(), which gets the bandwidth pointer
from iface->data.network.actual if it exists, otherwise returns
iface->bandwidth.
Remove extraneous check for 'netdef' when dereferencing for vlan.nTags.
Prior code would already check if netdef was NULL.
Coverity complained about a path where the 'vlan' was potentially valid,
but a prior checks may not have allocated 'iface->data.network.actual',
so like other paths it needs to be allocated on the fly.
Move the copying of vlan up earlier in networkAllocateActualDevice, so
that actual.type gets properly set.
Since the first assignment to vlan is redundant except in the case of
jumping immediately to validate from the start of the function,
eliminate its initial setting at the top of the function in favor of
calling the helper function virDomainNetGetActualVlan() (which doesn't
depend on the local vlan pointer being initialized) down at validate:
Signed-off-by: Laine Stump <laine@redhat.com>
When creating the virClass object for virNetClient, we specified
virObject as the parent instead of virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
that broke the build like:
CC libvirt_conf_la-domain_conf.lo
conf/domain_conf.c: In function 'virDomainVcpuPinAdd':
conf/domain_conf.c:11920:29: error: 'vpcupin' undeclared (first use in this function)
conf/domain_conf.c:11920:29: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [libvirt_conf_la-domain_conf.lo] Error 1
Commit dfa1e1dd added functions whose definitions do not conform
to the style used in the libxl driver. Change these functions to
be consistent throughout the driver.
In commit c4bbaaf8, caps->arch was checked uninitialized, rendering the
whole check useless.
This patch moves the conditional setting of QEMU_CAPS_NO_ACPI to
qemuCapsInitQMP, and removes the no longer needed exception for S390.
It also clears the flag for all non-x86 archs instead of just S390 in
qemuCapsInitHelp.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Check status when attempting to set SO_REUSEADDR flag on outgoing connection
On failure, VIR_WARN(), but continue to connect. This code path is on the
sender side where the setting is just a hint and would only take effect if
the sender is overflowed with TCP connections. Inability to set doesn't mean
failure to establish a connection.
In virLockSpaceProtocolDispatchNew() the returned value of lockspace from
virLockDaemonFindLockSpace() is overwritten by the virLockSpaceNew() return.
Coverity complains that it's unused.
In virLockSpaceProtocolDispatchCreateLockSpace() lockspace is also overwritten
in a similar manner resulting in the same Coverity message.
After live change of cpu counts, the number of processor threads is
verified. This patch makes use of this approach to check if qemu ignored
the request for cpu hot-unplug and report an appropriate message.
A great many virObject instances require a mutex, so introduce
a convenient class for this which provides a mutex. This avoids
repeating the tedious init/destroy code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
libvirt lxc will fail to start when selinux is disabled.
error: Failed to start domain noroot
error: internal error guest failed to start: PATH=/bin:/sbin TERM=linux container=lxc-libvirt container_uuid=b9873916-3516-c199-8112-1592ff694a9e LIBVIRT_LXC_UUID=b9873916-3516-c199-8112-1592ff694a9e LIBVIRT_LXC_NAME=noroot /bin/sh
2013-01-09 11:04:05.384+0000: 1: info : libvirt version: 1.0.1
2013-01-09 11:04:05.384+0000: 1: error : lxcContainerMountBasicFS:546 : Failed to mkdir /sys/fs/selinux: No such file or directory
2013-01-09 11:04:05.384+0000: 7536: info : libvirt version: 1.0.1
2013-01-09 11:04:05.384+0000: 7536: error : virLXCControllerRun:1466 : error receiving signal from container: Input/output error
2013-01-09 11:04:05.404+0000: 7536: error : virCommandWait:2287 : internal error Child process (ip link del veth1) unexpected exit status 1: Cannot find device "veth1"
fix this problem by checking if selinuxfs is mounted
in host before we try to create dir /sys/fs/selinux.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Commit 509eb51 added lxc_protocol.x; but without the initial
checkin of src/lxc_protocol-structs, 'make check' would fail for
anyone with pdwtags installed:
make[3]: *** No rule to make target `lxc_protocol-structs', needed by `check-protocol'. Stop.
* src/lxc_protocol-structs: New file.
The virDomainLxcOpenNamespace method needs to open every file
in /proc/$INITPID/ns and return the open file descriptor to the
client application.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make cpuset local to the while loop and free it once done with it each
time through the loop. Add a sa_assert() to virBitmapParse() to keep Coverity
from believing there could be a negative return and possible resource leak.
Commit-id 'afc4631b' added the regfree(reg) to free resources alloc'd
during regcomp; however, reg still needed to be VIR_FREE()'d. The call
to regfree() also didn't account for possible NULL value. Reformatted
the call to be closer to usage.
Commit c308a9ae was incomplete; it resolved the configure failure,
but not a later build failure.
* src/util/virnetdevbridge.c: Include pre-req header.
* configure.ac (AC_CHECK_HEADERS): Prefer standard in.h over
non-standard ip6.h.
Some places missed the conversion from LIBCURL_{CFLAGS,LIBS} to
CURL_{CFLAGS,LIBS}, and a part of curl check was left in
configure.ac instead of m4/virt-curl.m4 by mistake
This patch introduces support for LXC specific public APIs. In
common with what was done for QEMU, this creates a libvirt_lxc.so
library and libvirt/libvirt-lxc.h header file.
The actual APIs are
int virDomainLxcOpenNamespace(virDomainPtr domain,
int **fdlist,
unsigned int flags);
int virDomainLxcEnterNamespace(virDomainPtr domain,
unsigned int nfdlist,
int *fdlist,
unsigned int *noldfdlist,
int **oldfdlist,
unsigned int flags);
which provide a way to use the setns() system call to move the
calling process into the container's namespace. It is not
practical to write in a generically applicable manner. The
nearest that we could get to such an API would be an API which
allows to pass a command + argv to be executed inside a
container. Even if we had such a generic API, this LXC specific
API is still useful, because it allows the caller to maintain
the current process context, in particular any I/O streams they
have open.
NB the virDomainLxcEnterNamespace() API is special in that it
runs client side, so does not involve the internal driver API.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This converts the libssh2 configure check to use LIBVIRT_CHECK_PKG.
Previously it would check version 1.0 and 1.3, but this simplifies
things to just require version 1.3
If addition of rules in networkAddIptablesRules() failed the real error
was masked by error reported when trying to clean up the remaining
rules.
With this patch the original error message is saved and set back after
the removal is complete.
Commit 0211fd6e04 introduced regression
where newly defined networks were not made persistent.
This patch makes the network persistent on each successful definition.
The phypUUIDTable_Push and phypUUIDTable_Pull leaked their file descriptors
on normal return. Each function had an unnecessary use of creating a buffer
to print conn->uri->user and needed a bit better flow control. I also noted
that the Read function had a cut-n-paste error from the write function on a
couple of VIR_WARN's.
The openSSHSession leaked the sock on the failure path. Additionally that
turns into the internal_socket in the phypOpen code. That was neither saved
nor closed on any path. So I used the connnection_data->sock field to save
the socket for eventual close. Of interest here is that phypExec used the
connection_data->sock field even though it had never been initialized.
I ran 'make dist' in the directory left over from ./autobuild.sh
(which was configured for a mingw cross build); the resulting
tarball had more files than 'make dist' on a normal Linux build.
I traced it to the fact that we were distributing a generated
file, but only when configure said the end user had to generate
the file in the first place. In the process, I noticed that
we had some difference in symbol file names; I added a comment
explaining why the difference exists (after first trying to
normalize the names and hitting VPATH build failures).
* configure.ac (LIBVIRT_QEMU_SYMBOL_FILE): Add some comments.
* src/Makefile.am (EXTRA_DIST): No need to ship a generated file;
particularly since which file is built depends on configure results.
There's no need to do lots of readlink() calls to canonicalize
a name if we're only going to use stat() on it, since stat()
already chases symlinks.
* src/util/virutil.c (virGetDeviceID): Let stat() do the symlink
chasing.
Pass stub driver name directly to pciDettachDevice and pciReAttachDevice to fit
for different libvirt drivers. For example, qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
https://bugzilla.redhat.com/show_bug.cgi?id=892079
With current code, if user calls virDomainPMSuspendForDuration()
followed by virDomainDestroy(), the former API checks for qemu agent
presence, which will evaluate as true (if agent is configured). While
talking to qemu agent, the qemu driver is unlocked, so the latter API
starts executing. However, if machine dies meanwhile, libvirtd gets
EOF on the agent socket and qemuProcessHandleAgentEOF() is called. The
handler clears reference to qemu agent while the destroy API already
holding a reference to it. This leads to NULL dereferencing later in
the code. Therefore, the agent pointer should be set to NULL only if
we are the exclusive owner of it.
While OOM can have knock-on effects that trash a system, generally
the first symptom is one of memory thrashing.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
when we has no host's src mapped to container.
there is no .oldroot dir,so libvirt lxc will fail
to start when mouting meminfo.
in this case,the parameter srcprefix of function
lxcContainerMountProcFuse should be NULL.and make
this method handle NULL correctly.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Perform all the appropriate plumbing.
When qemu/KVM VMs are paused manually through a monitor not-owned by libvirt,
libvirt will think of them as "paused" event after they are resumed and
effectively running. With this patch the discrepancy goes away.
This is meant to address bug 892791.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
gcc 4.1.2 on RHEL 5 warned:
conf/network_conf.c:3136: warning: 'foundIdx' may be used uninitialized in this function
The warning is spurious, but initializing the variable doesn't hurt.
* src/conf/network_conf.c (virNetworkDefUpdateDNSHost): Silence
unused variable warning.
POSIX does not guarantee whether uid_t and gid_t are signed or
unsigned, nor does it guarantee whether they are smaller, same
size, or larger than int (or even the same size as one another).
Therefore, it is possible to have platforms where '(uid_t)-1==-1'
is false or where 'uid = gid = -1' sets uid to the wrong value,
thanks to integer promotion rules. The only portable way to use
the placeholder value of these two types is to always use a cast.
Thankfully, the issue is mostly theoretical - sanlock only
compiles on Linux for now, and on Linux, these types do not
suffer from strange promotion problems.
* src/locking/lock_driver_sanlock.c
(virLockManagerSanlockSetupLockspace, virLockManagerSanlockInit)
(virLockManagerSanlockCreateLease): Cast -1 to proper type before
comparing with uid_t or gid_t.
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
This is the QEMU backend code for the SCLP console support.
It includes SCLP capability detection, QEMU command line generation
and a test case.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The SCLP console is the native console type for s390 and is preferred
over the virtio console as it doesn't require special drivers and
is more efficient. Recent versions of QEMU come with SCLP support
which is hereby enabled.
The new target types 'sclp' and 'sclplm' can be used to specify a
SCLP console. Adding documentation, domain schema and XML processing
support.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
To avoid confusion between the LXC driver <-> controller
monitor RPC protocol and the libvirt-lxc.so <-> libvirtd public
RPC protocol, rename the former to lxc_monitor_protocol.x
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the libvirt client can pass FDs to the server, but the
dispatch mechanism provides no way to return FDs back from the
server to the client. Tweak the dispatch code, such that if a
dispatcher returns '1', this indicates that it populated the
virNetMessagePtr with FDs to return
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A number of bugs handling file descriptors received from the
server caused the FDs to be lost and leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change calling sequence to only call xenUnifiedDomainSetVcpusFlags() when
'dom' is not NULL. Use the GET_PRIVATE() macro to reference privateData.
Just return -1 if dom is NULL.
The code for setting up a private /dev/pts for the containers
is also responsible for making the LXC controller have a
private mount namespace. Unfortunately the /dev/pts code is
not run if launching a container without a custom root. This
causes the LXC FUSE mount to leak into the host FS.
Since we daemonized QEMU for capabilities probing there is a long
time if QEMU fails to launch. This is because we're not passing in
any virDomainObjPtr instance and thus the monitor code can not
check to see if the PID is still alive.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code is initializing capabilities before setting
directory permissions. Thus the QEMU binaries being run may
not have the ability to create the UNIX monitor socket on
the first run of libvirtd.
See also commit 66ff2dd, where we avoided installing these files
as executables.
* daemon/Makefile.am (libvirtd.service): Drop chmod.
* tools/Makefile.am (libvirt-guests.service): Likewise.
* src/Makefile.am (virtlockd.service, virtlockd.socket):
Likewise.
virtlockd.service could be installed to a configurable root,
but virtlockd.socket was hardcoded to installation into a
distro.
* src/Makefile.am (virtlockd.service, virtlockd.socket): Drop
unused substitutions.
* src/locking/virtlockd.socket.in (ListenStream): Don't hard-code
/var.
We had several different styles of .in conversion in our Makefiles:
ALLCAPS, @ALLCAPS@, @lower@, ::lower::
Canonicalize on one form, to make it easier to copy and paste
between .in files.
Also, we were using some non-portable sed constructs: \@ is an
undefined escape sequence (it happens to be @ itself in GNU sed,
but POSIX allows it to mean something else), as well as risky
behavior (failure to consistently quote things means a space
in $(sysconfdir) could throw things off; also, Autoconf recommends
using | rather than , or ! in the s||| operator, because | has to
be quoted in shell and is therefore less likely to appear in file
names than , or !).
Fix all of these uses to follow the same syntax.
* daemon/libvirtd.8.in: Switch to @var@.
* tools/virt-xml-validate.in: Likewise.
* tools/virt-pki-validate.in: Likewise.
* src/locking/virtlockd.init.in: Likewise.
* daemon/Makefile.am: Prefer | over ! in sed.
(libvirtd.8): Prefer consistent substitution.
(libvirtd.init, libvirtd.service): Avoid non-portable sed.
* tools/Makefile.am (libvirt-guests.sh, libvirt-guests.init)
(libvirt-guests.service): Likewise.
(virt-xml-validate, virt-pki-validate, virt-sanlock-cleanup):
Prefer consistent capitalization.
* src/Makefile.am (virtlockd.init, virtlockd.service)
(virtlockd.socket): Prefer consistent substitution.
This prevents domain starting and disk attaching if the shared disk's
setting conflicts with other active domain(s), E.g. A domain with
"sgio" set as "filtered", however, another active domain is using
it set as "unfiltered".
Like "rawio", "sgio" is only allowed for block disk of device
type "lun".
It doesn't default disk->sgio to "filtered" when parsing, as
it won't be able to distinguish explicitly requested "filtered"
and a default "filtered" in driver then. We have to error out for
explicit request when the kernel doesn't support the new sysfs
knob "unpriv_sgio", however, for defaulted "filtered", we can
just ignore it if the kernel doesn't support "unpriv_sgio".
This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which shares the disk.
Since we only care about if the disk support unprivileged SG_IO
commands, and the SG_IO commands only make sense for block disk,
this patch only manages (add/remove hash entry) the shared disk for
block disk.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
"virGetDeviceID" could be used across the sources, but it doesn't
relate with this series, and could be done later.
* src/util/virutil.h: (Declare virGetDeviceID, and
vir{Get,Set}DeviceUnprivSGIO)
* src/util/virutil.c: (Implement virGetDeviceID and
vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of upper helpers
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
This patch adds a new API, virDomainOpenChannel, that uses streams to
connect to a virtio channel on a guest. This creates a secure
communication channel between a guest and a libvirt client.
This behaves the same as virDomainOpenConsole, except on channels
instead of console/serial/parallel devices.
gcc -O2 complained:
../../src/conf/network_conf.c: In function 'virNetworkDefUpdateDNSSrv':
../../src/conf/network_conf.c:3232: error: 'foundIdx' may be used uninitialized in this function [-Wuninitialized]
It turned out to be a spurious warning (we didn't use foundIdx
unless foundCt was non-zero). But in investigating that, I noticed
a worse problem: we were using 'if (foundCt > 1)', but since foundCt
was bool, it could never be > 1.
* src/conf/network_conf.c (virNetworkDefUpdateDNSHost): Use
correct type.
(virNetworkDefUpdateDNSSrv): Likewise, and silence compiler
warning.
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessary wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't notice unless upcoming qemu 1.4 changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting chardev's path has changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such change, which
results in us being unable to start any domain.
Since sanlock doesn't run under root:root, we have chown()'ed the
__LIBVIRT__DISKS__ lease file to the user:group defined in the
sanlock config. However, when writing the patch I've forgot about
lease files for each disk (this is the
/var/lib/libvirt/sanlock/<md5>) file.
Many internal qemu APIs must find domain object from passed
virDomainPtr. And with function Peter's introduced, we can use it
instead of copying multiple lines among code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This is an adjustment to the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=889319
to account for two bonehead mistakes I made.
commit ac2797cf2a attempted to fix a
problem with netlink in newer kernels requiring an extra attribute
with a filter flag set in order to receive an IFLA_VFINFO_LIST from
netlink. Unfortunately, the #ifdef that protected against compiling it
in on systems without the new flag went a bit too far, assuring that
the new code would *never* be compiled, and even if it had, the code
was incorrect.
The first problem was that, while some IFLA_* enum values are also
their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
checking to see if it's #defined is not a valid method of determining
whether or not to add the attribute. Fortunately, the flag that is
being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
And to top it off, due to the code not actually compiling when I
thought it did, I didn't realize that I'd been given the wrong arglist
to nla_put() - you can't just send a const value to nla_put, you have
to send it a pointer to memory containing what you want to add to the
message, along with the length of that memory.
This time I've actually sent the patch over to the other machine
that's experiencing the problem, applied it to the branch being used
(0.10.2) and verified that it works properly, i.e. it does fix the
problem it's supposed to fix. :-/
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, it only considers PTY backend serial devices for pseries.
It need to support all kinds of serial devices.
This patch is to fix the problem which is that it doesn't work
when specifying source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on x86 platform, PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch is to remove QEMU_CAPS_NO_ACPI capability for
non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Cirrus VGA model is not supported on ppc64 currently.
It needs to set std VGA model as the default model.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=889319
When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit
115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
https://bugzilla.redhat.com/show_bug.cgi?id=889319
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
Revert the complex workaround of commit 39d91e9, now that we have
a nicer framework for shutting up broken gcc.
* src/util/buf.c (virBufferEscape): Simplify.
When changing to virArch, the virt-aa-helper.c file was not
completely changed. The vahControl struct was left with a
char *arch field, instead of virArch arch field.
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
When parsing the arch from domain XML, the result was only
saved to a local variable, not the virDomainDefPtr
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Prior to the virArch changes, the CPU baseline method would
free the arch string in the returned CPU. Fix the regression
by setting arch to VIR_ARCH_NONE at the end
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We can use VIR_REALLOC_N with NULL pointer, which behaves the same way
as VIR_ALLOC_N in that case, so no need for a condition that's
checking if some data are allocated already.
---
I tried to find other parts of the code similar to this, so I can do a
full cleanup for the whole repository, so I used this (excuse the long
line, but that's how I was writing it):
git grep -nHC 5 -e VIR_REALLOC_N -e VIR_ALLOC_N | while read line; do if [[ "$line" == "--" ]]; then if [[ ${#tmpbuf} -gt 10 && "$REALLOC_N" == "true" && "$ALLOC_N" == "true" ]]; then echo $line; while [[ ${#tmpbuf[*]} -gt 0 ]]; do echo "${tmpbuf[0]}"; tmpbuf=( "${tmpbuf[@]:1:${#tmpbuf[*]}}" ); done; fi; unset tmpbuf REALLOC_N ALLOC_N; else if [[ "$ALLOC_N" != "true" && "${line/VIR_ALLOC_N//}" != "${line}" ]]; then ALLOC_N="true"; fi; if [[ "$REALLOC_N" != "true" && "${line/VIR_REALLOC_N//}" != "${line}" ]]; then REALLOC_N="true"; fi; tmpbuf[${#tmpbuf[*]}]="$line"; fi; done | less
And reviewed the output just to find out this was the only occurrence of
the inconsistency.
On few places there are too many levels of indentation when some of
them can be fixed with negating the option they are in or omitting
useless condition altogether.
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Replace use of uname in nodeGetInfo with virArch APIs to
provide canonicalization of host architecture name
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a 'virArch' enum for CPU architectures. Include
data type providing wordsize and endianness, and APIs to
query this info and convert to/from enum and string form.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This is yet another refinement to the fix for CVE-2012-3411:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
It turns out that it would be very intrusive to correctly backport the
entire --bind-dynamic option to older dnsmasq versions
(e.g. dnsmasq-2.48 that is used on RHEL6.x and CentOS 6.x), but very
simple to patch those versions to just use SO_BINDTODEVICE on all
their listening sockets (SO_BINDTODEVICE also has the desired effect
of permitting only traffic that was received on the interface(s) where
dnsmasq was set to listen.)
This patch modifies the dnsmasq capabilities detection to detect the
string:
--bind-interfaces with SO_BINDTODEVICE
in the output of "dnsmasq --version", and in that case realize that
using the old --bind-interfaces option is just as safe as
--bind-dynamic (and therefore *not* forbid creation of networks that
use public IP address ranges).
If -bind-dynamic is available, it is still preferred over
--bind-interfaces.
Note that this patch does no harm in upstream, or in any distro's
downstream if it happens to end up there, but builds for distros that
have a new enough dnsmasq to support --bind-dynamic do *NOT* need to
specifically backport this patch; it's only required for distro
releases that have dnsmasq too old to have --bind-dynamic (and those
distros will need to add the SO_BINDTODEVICE patch to dnsmasq,
*including the extra string in the --version output*, as well.
Somehow I managed to push the changes to this file with improper
indentation. This patch just re-indents, reformats the comment lines,
and re-groups a couple of multi-line strings so that they fit within
80 columns. The resulting binary should be identical.
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of host misc devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of host storage devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach device drivers in LXC to support the
hotplug/unplug of USB host devices.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of hostdevs in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of disks in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wire up the attach/detach/update device APIs to support changing
of network interfaces in the persistent config file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This wires up the LXC driver to support the domain device attach/
detach/update APIs, following the same code design as used in
the QEMU driver. No actual changes are possible with this commit,
it is only providing the framework
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This extends support for host device passthrough with LXC to
cover misc devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This extends support for host device passthrough with LXC to
cover storage devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds support for host device passthrough with the
LXC driver. Since there is only a single kernel image,
it doesn't make sense to pass through PCI devices, but
USB devices are fine. For the latter we merely need to
make the /dev/bus/usb/NNN/MMM character device exist
in the container's /dev
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently LXC guests can be given arbitrary pre-mounted
filesystems, however, for some usecases it is more appropriate
to provide block devices which the container can mount itself.
This first impl only allows for <disk type='block'>, in other
words exposing a host disk device to a container. Since LXC
does not have device namespace virtualization, we are cheating
a little bit. If the XML specifies /dev/sdc4 to be given to
the container as /dev/sda1, when we do the mknod /dev/sda1
in the container's /dev, we actually use the major:minor
number of /dev/sdc4, not /dev/sda1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Prepare to support different types of hostdevs by refactoring
the current SELinux security driver code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virSecurityManager{Set,Restore}AllLabel methods are invoked
at domain startup/shutdown to relabel resources associated with
a domain. This works fine with QEMU, but with LXC they are in
fact both currently no-ops since LXC does not support disks,
hostdevs, or kernel/initrd files. Worse, when LXC gains support
for disks/hostdevs, they will do the wrong thing, since they
run in host context, not container context. Thus this patch
turns then into a formal no-op when used with LXC. The LXC
controller will call out to specific security manager labelling
APIs as required during startup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code for creating veth/macvlan devices is part of the
LXC process startup code. Refactor this a little and export
the methods to the rest of the LXC driver. This allows them
to be reused for NIC hotplug code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There was a double free issue caused by virSysinfoRead on s390,
as the same manufacturer string instance was assigned to more
than one processor record.
Cleaned up other potential memory issues and restructured the sysinfo
parsing code by moving repeating patterns into a helper function.
The restructuring made it necessary to conditionally disable
-Wlogical-op for some older GCC versions, using pragma GCC diagnostic.
This is a GCC specific pragma, which is acceptable, since we're
using it to work around a GCC specific bug.
Finally, added a function virSysinfoSetup to configure the sysinfo
data source files/script during run time, to facilitate writing test
programs. This function is not published in sysinfo.h and only
there for testing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch simplifies the code that parses the fallback and vendor_id
attributes from the domain xml cpu definition.
Changes done:
- free temp variables in the cleanup section instead of local use
- remove checking for presence of the attribute to directly getting the
value (saving call to virXPathBoolean)
- replace loop used to check for ',' in the vendor_id string with strchr
This patch fixes a problem that vendor_id attribute can not be defined
when fallback attribute is not defined.
If I define domain xml like below:
<domain>
<cpu>
<model vendor_id='aaaabbbbcccc'>core2duo</model>
</cpu>
</domain>
In dumpxml, vendor_id is not reflected:
<domain>
<cpu mode='custom' match='exact'>
<model fallback='allow'>core2duo</model>
</cpu>
</domain>
The expected output is:
<domain>
<cpu mode='custom' match='exact'>
<model fallback='allow' vendor_id='aaaabbbbcccc'>core2duo</model>
</cpu>
</domain>
If the fallback attribute and vendor_id attribute is defined at the same
time, it's reflected as expected.
Signed-off-by: Ken ICHIKAWA <ichikawa.ken@jp.fujitsu.com>
The current SELinux policy only works for KVM guests, since
TCG requires the 'execmem' privilege. There is a 'virt_use_execmem'
boolean to turn this on globally, but that is unpleasant for users.
This changes libvirt to automatically use a new 'svirt_tcg_t'
context for TCG based guests. This obsoletes the previous
boolean tunable and makes things 'just work(tm)'
Since we can't assume we run with new enough policy, I also
make us log a warning message (once only) if we find the policy
lacks support. In this case we fallback to the normal label and
expect users to set the boolean tunable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
If there are multiple video devices
primary = 'yes' marks this video device as the primary one.
The rest are secondary video devices. No more than one could be
mark as primary. If none of them has primary attribute, the first
one will be the primary by default like what it was.
The reason of this changing is that for qemu, only one primary video
device is permitted which can be of any type. For secondary video
devices, only qxl is allowd. Primary attribute removes the restriction
that the first have to be the primary one.
We always put the primary video device into the first position of
video device structure array after parsing.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
This adds an implementation of virNetSocketGetUNIXIdentity()
using LOCAL_PEERCRED socket option and xucred struct, defined
in <sys/ucred.h> on systems that have it.
A forgotten "!" in recently-modified code at the top of
networkRefreshDaemon() meant an improper early return, which led to 1)
dnsmasq config files not being updated from the newly modified config,
and 2) dnsmasq not being sent a SIGHUP so that it could learn about
the changes to the config.
virNetworkDefGetIpByIndex() returns NULL if there are no ip objects of
the requested type, and if there are no IP elements, then dnsmasq
shouldn't be running, so we can return early. Otherwise we should
rewrite the config files and send a SIGHUP.
Currently, if sanlock is already registering a lockspace other
libvirtd instances (from other hosts) obtain -EINPROGRESS. On
sufficiently new sanlock, sanlock_inq_lockspace() is called,
which suspend execution until lockspace state is changed. With
current libvirt implementation, we fail to retry adding the
lockspace again but continue in error path. Therefore we produce
meaningless error message:
virLockManagerSanlockSetupLockspace:363 : Unable to add lockspace
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Success
qemudLoadDriverConfig:558 : Failed to load lock manager sanlock
We should try to re-add the lockspace after its state change to
be sure it was added successfully. In fact, with sufficiently new
sanlock we can just avoid dummy usleep() which is used if there's
no inquire API.
The virtlockd daemon scripts were lousy, when compared to their
counterparts in daemon/Makefile.am. In particular, when init
scripts were selected, this resulted in 'make distcheck' failing
due to failure to clean up src/virtlockd.init.
* src/Makefile.am (install-systemd): Fix dependencies. Use MKDIR_P.
(uninstall-systemd): Remove empty directory. Use fewer processes.
(install-init, install-sysconfig): Use MKDIR_P.
(uninstall-init): Remove correct file, and also empty directory.
(uninstall-sysconfig): Remove empty directory.
(DISTCLEANFILES): Clean up trivially built sources.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
This patch resolves the problem reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=886663
The source of the problem was the fix for CVE 2011-3411:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
which was originally committed upstream in commit
753ff83a50. That commit improperly
removed the "--except-interface lo" from dnsmasq commandlines when
--bind-dynamic was used (based on comments in the latter bug).
It turns out that the problem reported in the CVE could be eliminated
without removing "--except-interface lo", and removing it actually
caused each instance of dnsmasq to listen on localhost on port 53,
which created a new problem:
If another instance of dnsmasq using "bind-interfaces" (instead of
"bind-dynamic") had already been started (or if another instance
started later used "bind-dynamic"), this wouldn't have any immediately
visible ill effects, but if you tried to start another dnsmasq
instance using "bind-interfaces" *after* starting any libvirt
networks, the new dnsmasq would fail to start, because there was
already another process listening on port 53.
(Subsequent to the CVE fix, another patch changed the network driver
to put dnsmasq options in a conf file rather than directly on the
dnsmasq commandline, but preserved the same options.)
This patch changes the network driver to *always* add
"except-interface=lo" to dnsmasq conf files, regardless of whether we use
bind-dynamic or bind-interfaces. This way no libvirt dnsmasq instances
are listening on localhost (and the CVE is still fixed).
The actual code change is miniscule, but must be propogated through all
of the test files as well.
The default lockd driver behavour is to acquire leases
directly on the disk files. This introduces an alternative
mode, where leases are acquire indirectly on a file that
is based on a SHA256 hash of the disk filename.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This adds a 'lockd' lock driver which is just a client which
talks to the lockd daemon to perform all locking. This will
be the default lock driver for any hypervisor which needs one.
* src/Makefile.am: Add lockd.so plugin
* src/locking/lock_driver_lockd.c: Lockd driver impl
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon maintains file locks on behalf of libvirtd
and any VMs it is running. These file locks must be held for as
long as any VM is running. If virtlockd itself ever quits, then
it is expected that a node would be fenced/rebooted. Thus to
allow for software upgrads on live systemd, virtlockd needs the
ability to re-exec() itself.
Upon receipt of SIGUSR1, virtlockd will save its current live
state out to a file /var/run/virtlockd-restart-exec.json
It then re-exec()'s itself with exactly the same argv as it
originally had, and loads the state file, reconstructing any
objects as appropriate.
The state file contains information about all locks held and
all network services and clients currently active. An example
state document is
{
"server": {
"min_workers": 1,
"max_workers": 20,
"priority_workers": 0,
"max_clients": 20,
"keepaliveInterval": 4294967295,
"keepaliveCount": 0,
"keepaliveRequired": false,
"services": [
{
"auth": 0,
"readonly": false,
"nrequests_client_max": 1,
"socks": [
{
"fd": 6,
"errfd": -1,
"pid": 0,
"isClient": false
}
]
}
],
"clients": [
{
"auth": 0,
"readonly": false,
"nrequests_max": 1,
"sock": {
"fd": 9,
"errfd": -1,
"pid": 0,
"isClient": true
},
"privateData": {
"restricted": true,
"ownerPid": 1722,
"ownerId": 6,
"ownerName": "f18x86_64",
"ownerUUID": "97586ba9-df27-9459-c806-f016c8bbd224"
}
},
{
"auth": 0,
"readonly": false,
"nrequests_max": 1,
"sock": {
"fd": 10,
"errfd": -1,
"pid": 0,
"isClient": true
},
"privateData": {
"restricted": true,
"ownerPid": 1784,
"ownerId": 7,
"ownerName": "f16x86_64",
"ownerUUID": "7b8e5e42-b875-61e9-b981-91ad8fa46979"
}
}
]
},
"defaultLockspace": {
"resources": [
{
"name": "/var/lib/libvirt/images/f16x86_64.raw",
"path": "/var/lib/libvirt/images/f16x86_64.raw",
"fd": 14,
"lockHeld": true,
"flags": 0,
"owners": [
1784
]
},
{
"name": "/var/lib/libvirt/images/shared.img",
"path": "/var/lib/libvirt/images/shared.img",
"fd": 12,
"lockHeld": true,
"flags": 1,
"owners": [
1722,
1784
]
},
{
"name": "/var/lib/libvirt/images/f18x86_64.img",
"path": "/var/lib/libvirt/images/f18x86_64.img",
"fd": 11,
"lockHeld": true,
"flags": 0,
"owners": [
1722
]
}
]
},
"lockspaces": [
],
"magic": "30199"
}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This enhancement virtlockd so that it can receive a pre-opened
UNIX domain socket from systemd at launch time, and adds the
systemd service/socket unit files
* daemon/libvirtd.service.in: Require virtlockd to be running
* libvirt.spec.in: Add virtlockd systemd files
* src/Makefile.am: Install systemd files
* src/locking/lock_daemon.c: Support socket activation
* src/locking/virtlockd.service.in, src/locking/virtlockd.socket.in:
systemd unit files
* src/rpc/virnetserverservice.c, src/rpc/virnetserverservice.h:
Add virNetServerServiceNewFD() method
* src/rpc/virnetsocket.c, src/rpc/virnetsocket.h: Add virNetSocketNewListenFD
method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a lock_daemon_dispatch.c file which implements the
server side dispatcher the RPC APIs previously defined in the
lock protocol.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon will be responsible for managing locks
on virtual machines. Communication will be via the standard
RPC infrastructure. This provides the XDR protocol definition
* src/locking/lock_protocol.x: Wire protocol for virtlockd
* src/Makefile.am: Include lock_protocol.[ch] in virtlockd
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virtlockd daemon will maintain locks on behalf of libvirtd.
There are two reasons for it to be separate
- Avoid risk of other libvirtd threads accidentally
releasing fcntl() locks by opening + closing a file
that is locked
- Ensure locks can be preserved across libvirtd restarts.
virtlockd will need to be able to re-exec itself while
maintaining locks. This is simpler to achieve if its
sole job is maintaining locks
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current virStorageFileGet{LVM,SCSI}Key methods return
the key as the return value. Unfortunately it is desirable
for "NULL" to be a valid return value, as well as an error
indicator. Thus the returned key must instead be provided
as an out-parameter.
When we invoke lvs or scsi_id to extract ID for block devices,
we don't want virCommandWait logging errors messages. Thus we
must explicitly check 'status != 0', rather than letting
virCommandWait do it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The libxl driver ignored boot devices in the domain config,
preventing PXE booting HVM domains. This patch accounts for
user-specified boot devices when building the libxl domain
configuration.
The QED file format is non-versioned, so although the magic
value matched, libvirt rejected it due to lack of a version
number to compare against. We need to distinguish this case
by allowing a value of '-2' to indicate a non-versioned file
where only the magic is required to match
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To help us detect when new storage file versions come into
existance log a warning if the storage file magic matches,
but the version does not
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fully stub out the virCgroupGetAppRoot method as done with other
methods in the file, rather than just the body. This lets us
annotate the unused parameter to avoid a warning
I noticed that /var/lib/libvirt/dnsmasq/*.conf used the wrong word;
it was intended to match the wording in src/util/xml.c.
* src/network/bridge_driver.c (networkDnsmasqConfContents): Fix typo.
* tests/networkxml2confdata/*.conf: Update accordingly.
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
make check fails in check-symsorting if configure is not run in
the source directory. Prefixing symfile names with $(srcdir)
fixes this.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch adds a new domain lookup helper qemuDomObjFromDomainDriver
that lookups the domain and leaves the driver locked. The driver is
returned as the second argument of that function. If the lookup fails
the driver is unlocked to help avoid cleanup codepaths.
This patch also improves docs for the helpers.
This patch gets rid of the undeterministic error reporting code done on
return values of get(pw|gr)nam_r. With this patch, if the group record
is not returned by the corresponding function this error is not
considered fatal even if errno != 0. The error is logged in such case.
Move the code for matching hostdev instances out of virDomainHostdevFind
and into virDomainHostdevMatch method, which in turn calls out to other
helper methods depending on the type of hostdev.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rename virDomainHostdevPartsParse to virDomainHostdevDefParseSubsys
to reflect the fact that it only deals with hostdevs uing the
traditional mode=subsystem, and not mode=capabilities
Rename virDomainHostSourceFormat to virDomainHostdevDefFormatSubsys
for the same reason.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virStorageFileGetLVMKey and virStorageFileGetSCSIKey
both return heap allocated strings, so the return value
should not be marked const.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add check-symsorting.pl to perform case-insensitive alphabetical
sorting of groups of symbols. Fix all violations it reports
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When using vnc gaphics over a unix socket, virt-aa-helper needs to provide
access for the qemu domain to access the sockfile.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch exports qemuMigrationIsAllowed and adds a new parameter to it
to denote if it's a remote migration or a local migration. Local
migrations are used in snapshots and saving of the machine state and
have fewer restrictions. This patch also adjusts callers of the function
and tweaks some error messages to be more universal.
Currently there is no way to detect it via QMP and requesting "-sandbox
off" works correctly even if it was compiled out, so this will work
unless someone both requests the sandbox in qemu.conf and builds QEMU
without the support for it.
Interfaces keeps a class_id, which is an ID from which bridge
part of QoS settings is derived. We need to store class_id
in domain status file, so we can later pass it to
virNetDevBandwidthUnplug.
Currently, we are only keeping a inactive XML configuration
in status dir. This is no longer enough as we need to keep
this class_id attribute so we don't overwrite old entries
when the daemon restarts. However, since there has already
been release which has just <network/> as root element,
and we want to keep things compatible, detect that loaded
status file is older one, and don't scream about it.
Network should be notified if we plug in or unplug an
interface, so it can perform some action, e.g. set/unset
network part of QoS. However, we are doing this in very
early stage, so iface->ifname isn't filled in yet. So
whenever we want to report an error, we must use a different
identifier, e.g. the MAC address.
This will be used whenever a NIC with guaranteed throughput is to
be plugged into a bridge. It will adjust the average throughput of
non guaranteed NICs (classid 1:2) to meet new requirements.
These set bridge part of QoS when bringing domain's interface up.
Long story short, if there's a 'floor' set, a new QoS class is created.
ClassID MUST be unique within the bridge and should be kept for
unplug phase.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
This is however supported only on domain interfaces with
type='network'. Moreover, target network needs to have at least
inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average'
(old) or 'floor' (new). This new attribute can be used just for
interfaces type of network (<interface type='network'/>) currently.
Stochastic Fairness Queuing (SFQ) is queuing discipline
(qdisc) which doesn't really shape any traffic but 'just'
re-arrange packets in sending buffer so no stream starve.
The goal is to ensure fairness. There is basically only one
configuration parameter (perturb) which is set to advised
value of 10.
Network adapters of type 'routed' is a special case. Other adapters
have 'network' parameter in prlctl's output instead.
Routed network adapters should be connected to 'routed' network
from libvirt's view.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Historically if traffic from the adapter is routed to LAN without
NAT, it isn't connected to any virtual networks, but has a 'type'
instead. Sinse libvirt has special virtual network type for such case,
let's add pseudo network 'routed' to fit libvirt's API well.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Parallels Cloud Server uses virtual networks model for network
configuration. It uses own tools for virtual network management.
So add network driver, which will be responsible for listing
virtual networks and performing different operations on them
(in consequent patched).
This patch only allows listing virtual network names, without
any parameters like DHCP server settings.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Allow changing network interfaces in domain configuration.
ifname is used as iterface identifier: if there is interface
with some ifname in old config and there are no interfaces with
such name in the new config - issue prlctl command to delete
the network interface. And vice versa - if interface with
some ifname exists only in new config - issue prlctl command
to create it.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Parse network interfaces info from prlctl output.
Parallels Cloud Server uses virtual networks model for
network configuration: You can add network adapter to
VM and connect it to some predefined virtual network.
Fill type, mac, network name and linkstate fields of
virDomainNetDef structure.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
When the disk snapshot part of an external system checkpoint fails the
memory image is retained. This patch adds code to remove the image in
such case.
In case the snapshot code isn't able to restart CPUs after an external
checkpoint we would leak a copy of the domains XML definition. This
patch fixes the cleanup path.
False positive, but it breaks the build with gcc-4.6.3.
qemu/qemu_migration.c:2931:37: error: 'offline' may be used
uninitialized in this function [-Werror=uninitialized]
qemu/qemu_migration.c:2887:10: note: 'offline' was declared here
This patch changes how parameters are passed to dnsmasq. Instead of
being on the command line, the parameters are put into a file (one
parameter per line) and a commandline --conf-file= specifies the
location of the file. The file is located in the same directory as
the leases file.
Putting the dnsmasq parameters into a configuration file
allows them to be examined and more easily understood than
examining the command lines displayed by "ps ax". This is
especially true when a number of networks have been started.
When the use of dnsmasq was originally done, the required command line
was simple, but it has gotten more complicated over time and will
likely become even more complicated in the future.
Note: The test conf files have all been renamed .conf instead of
.argv, and tests/networkxml2xmlargvdata was moved to
tests/networkxml2xmlconfdata.
The DHCPv6 support includes IPV6 dhcp-range and dhcp-host for one
IPv6 subnetwork on one interface. This support will only work
if dnsmasq version >= 2.64; otherwise an error occurs if
dhcp-range or dhcp-host is specified for an IPv6 address.
Essentially, this change provides the same DHCP support for IPv6
that has been available for IPv4.
With dnsmasq >= 2.64, support for the RA service is also now provided
by dnsmasq (radvd is no longer used/started). (Although at least one
version of dnsmasq prior to 2.64 "supported" IPv6 Router
Advertisement, there were bugs (fixed in 2.64) that rendered it
unusable.)
Documentation and the network schema has been updated
to reflect the new support.
virNetworkDefUpdateForward requires separate functions to parse and
clear a virNetworkForwardDef by itself, but they were previously just
inlined in the virNetworkDef parse and free functions. This patch
makes them into separate functions.
The attributes of a <network> element's <forward> element were
previously stored directly in the virNetworkDef object, but
virNetworkUpdateForward() needs to operate on a <forward> in
isolation, so this patchs pulls out all those attributes into a
separate virNetworkForwardDef struct (and shortens their names
appropriately). This new object is contained in the virNetworkDef, not
pointed to by it, so there is no extra memory management.
This patch makes no functional changes, it only changes, e.g.,
"nForwardIfs" to "forward.nifs".
The other clear functions in network_conf.c that clear out arrays of
sub-objects do so by using the n[itemname]s value as a counter going
down to 0. Make this one consistent. There's no functional value, just
makes the style more consistent with the rest of the file.
This makes some function names and arg lists for consistent with other
parse functions in network_conf.c. While modifying
virNetworkIPParseXML(), also change its "error" label to "cleanup",
since the code at that label is executed on success as well as
failure.
These three functions are very similar - none allow a MODIFY
operation; you can only add or delete.
The biggest difference between them (other than the data itself) is in
the criteria for determining a match, and whether or not multiple
matches are possible:
1) for HOST records, it's considered a match if the IP address or any
of the hostnames of an existing record matches.
2) for SRV records, it's a match if all of
domain+service+protocol+target *which have been specified* are
matched.
3) for TXT records, there is only a single field to match - name
(value can be the same for multiple records, and isn't considered a
search term), so by definition there can be no ambiguous matches.
In all three cases, if any matches are found, ADD will fail; if
multiple matches are found, it means the search term was ambiguous,
and a DELETE will fail.
The upper level code in bridge_driver.c is already implemented for
these functions - appropriate conf files will be re-written, and
dnsmasq will be SIGHUPed or restarted as appropriate.
Since there is only a single virNetworkDNSDef for any virNetworkDef,
and it's trivial to determine whether or not it contains any real
data, it's much simpler (and fits more uniformly with the parse
function calling sequence of the parsers for many other objects that
are subordinates of virNetworkDef) if virNetworkDef *contains* an
virNetworkDNSDef rather than pointing to one.
Since it is now just a part of another object rather than its own
object, it no longer makes sense to have a *Free() function, so that
is changed to a *Clear() function.
More importantly though, ParseXML and Clear functions are needed for
the individual items contained in a virNetworkDNSDef (srv, txt, and
host records), but none of them have a *Clear(), and only two of the
three had *ParseXML() functions (both of which used a non-uniform
arglist). Those problems are cleared up by this patch - it splits the
higher-level Clear function into separate functions for each of the
three, creates a parse for txt records, and cleans up the srv and host
parsers, so we now have all the utility functions necessary to
implement virNetworkDefUpdateDNS(Host|Srv|Txt).
This shortens the name of the structs for srv and txt, and their
instances in virNetworkDNSDef, to be more compact and uniform with the
naming of the dns host array. It also changes the type of ntxts, etc
from unsigned int to size_t, so that they can be used directly as args
to VIR_*_ELEMENT.
The already-written backend functions for virNetworkUpdate that add
and delete items into lists within the a network were already debugged
to work properly, but future such functions will use
VIR_(INSERT|DELETE)_ELEMENT instead, so these are changed for
uniformity.
I noticed when writing the backend functions for virNetworkUpdate that
I was repeating the same sequence of memmove, VIR_REALLOC, nXXX-- (and
messed up the args to memmove at least once), and had seen the same
sequence in a lot of other places, so I decided to write a few
utility functions/macros - see the .h file for full documentation.
The intent is to reduce the number of lines of code, but more
importantly to eliminate the need to check the element size and
element count arithmetic every time we need to do this (I *always*
make at least one mistake.)
VIR_INSERT_ELEMENT: insert one element at an arbitrary index within an
array of objects. The size of each object is determined
automatically by the macro using sizeof(*array). The new element's
contents are copied into the inserted space, then the original copy
of contents are 0'ed out (if everything else was
successful). Compile-time assignment and size compatibility between
the array and the new element is guaranteed (see explanation below
[*])
VIR_INSERT_ELEMENT_COPY: identical to VIR_INSERT_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_APPEND_ELEMENT: This is just a special case of VIR_INSERT_ELEMENT
that "inserts" one past the current last element.
VIR_APPEND_ELEMENT_COPY: identical to VIR_APPEND_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_DELETE_ELEMENT: delete one element at an arbitrary index within an
array of objects. It's assumed that the element being deleted is
already saved elsewhere (or cleared, if that's what is appropriate).
All five of these macros have an _INPLACE variant, which skips the
memory re-allocation of the array, assuming that the caller has
already done it (when inserting) or will do it later (when deleting).
Note that VIR_DELETE_ELEMENT* can return a failure, but only if an
invalid index is given (index + amount to delete is > current array
size), so in most cases you can safely ignore the return (that's why
the helper function virDeleteElementsN isn't declared with
ATTRIBUTE_RETURN_CHECK). A warning is logged if this ever happens,
since it is surely a coding error.
[*] One initial problem with the INSERT and APPEND macros was that,
due to both the array pointer and newelem pointer being cast to void*
when passing to virInsertElementsN(), any chance of type-checking was
lost. If we were going to move in newelem with a memmove anyway, we
would be no worse off for this. However, most current open-coded
insert/append operations use direct struct assignment to move the new
element into place (or just populate the new element directly) - thus
use of the new macros would open a possibility for new usage errors
that didn't exist before (e.g. accidentally sending &newelemptr rather
than newelemptr - I actually did this quite a lot in my test
conversions of existing code).
But thanks to Eric Blake's clever thinking, I was able to modify the
INSERT and APPEND macros so that they *do* check for both assignment
and size compatibility of *ptr (an element in the array) and newelem
(the element being copied into the new position of the array). This is
done via clever use of the C89-guaranteed fact that the sizeof()
operator must have *no* side effects (so an assignment inside sizeof()
is checked for validity, but not actually evaluated), and the fact
that virInsertElementsN has a "# of new elements" argument that we
want to always be 1.
When restarting CPUs after an external snapshot, the restarting function
was called without the appropriate async job type. This caused that a
new sync job wasn't created and allowed races in the monitor.
New VM will have default values for all parameters, like
cpu number, we have to change its configuration as provided
by xml definition, given to parallelsDomainDefineXML.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Implement creation of new disks - if a new disk found
in configuration, find a volume by disk path and
actually create a disk image by issuing prlctl command.
If it's successfully finished - remove the file with volume
definition.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Add function for convertion bus from libvirt's numeric constant
to a name, used in a parallels command-line tools.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Move part, which deletes existing volume, to a new function
parallelsStorageVolumeDefRemove so that we can use it later
in parallels_driver.c
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Read disk images size from xml description and fill
virStorageVolDef.capacity and allocation (let's consider
that allocation is the same as capacity, calculating real
allcoation will be implemented later).
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Disk images in Parallels Cloud Server stored in directories. Each
one has files with data and xml description of an image stored in
file DiskDescriptior.xml.
Since we have to support 'detached' images, which are not used by
any VM, the better way to collect info about volumes is searching for
directories with a file DiskDescriptior.xml in each VM directory.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
There are no storage pools in Parallels Cloud Server -
All VM data stored in a single directory: config, snapshots,
memory dump together with disk images.
Let's look through list of VMs and create a storage pool for
each directory, containing VMs.
So if you have 3 vms: /var/parallels/vm-1.pvm,
/var/parallels/vm-2.pvm and /root/test.pvm - 2 storage pools
appear: -var-parallels and -root. xml descriptions of the pools
will be saved in /etc/libvirt/parallels-storage, so UUIDs will
not change netween connections to libvirt.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
We don't support unprivileged users anymore, so remove code, which
selects configuration directory depending on user.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Allow changing some parameters of the hard disks: bus,
image and drive address.
Creating new disk devices and removing existing ones
require changes in the storage driver, so it will be
implemented later.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Offline migration transfers inactive definition of a domain (which may
or may not be active). After successful completion, the domain remains
in its current state on source host and is defined but inactive on
destination host. It's a bit more clever than virDomainGetXMLDesc() on
source host followed by virDomainDefineXML() on destination host, as
offline migration will run pre-migration hook to update the domain XML
on destination host. Currently, copying non-shared storage is not
supported during offline migration.
Offline migration can be requested with a new migration flag called
VIR_MIGRATE_OFFLINE (which has to be combined with
VIR_MIGRATE_PERSIST_DEST flag).
This fixes a problem that showed up during testing of:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
Due to a logic error in the function that gets the name of the bridge
an interface connects to, any time a bridge was specified directly
(type='bridge') rather than indirectly (type='network'), An error
would be logged (although the operation would then complete
successfully):
Network type 6 is not supported
The final virReportError() in the function
qemuDomainNetGetBridgeName() was apparently avoided in the past with a
"goto cleanup" at the end of each case, but the case of bridge somehow
no longer has that final goto cleanup.
The proper solution is anyway to not rely on goto's, but put the error
log inside an else {} clause, so that it's executed only if the type
is neither bridge nor network (in reality, this function should only
ever be called for those two types, that's why this is an internal
error).
While making this change, the error message was also tuned to be more
correct (since it's not really the type of the network, but the type
of the interface, and it *is* otherwise supported, it's just that the
interface type in question doesn't *have* a bridge device associated
with it, or at least we don't know how to get it).
If a network interface model is not specified, libvirt will run
into an unchecked NULL pointer coredump. On the other hand if
the empty model is ignored, a PCI bus address would be generated,
which is not supported by S390.
Since the only valid network type model for S390 is virtio,
we use this as the default value, which is the same for QEMU.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Things are supposed to look like:
<machine canonical='pc-0.12'>pc</machine>
But are currently swapped. This can cause many VMs to revert to having
machine type='pc' which will affect save/restore across qemu upgrades.
Add VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA flag to virStorageVolCreateXML
and virStorageVolCreateXMLFrom. This flag requests metadata
preallocation when creating/cloning qcow2 images, resulting in creating
a sparse file with qcow2 metadata. It has only slightly larger disk usage
compared to new image with no allocation, but offers higher performance.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Based on a patch originally authored by Daniel De Graaf
http://lists.xen.org/archives/html/xen-devel/2012-05/msg00565.html
This patch converts the Xen libxl driver to support only Xen >= 4.2.
Support for Xen 4.1 libxl is dropped since that version of libxl is
designated 'technology preview' only and is incompatible with Xen 4.2
libxl. Additionally, the default toolstack in Xen 4.1 is still xend,
for which libvirt has a stable, functional driver.
virGetGroupIDByName is documented as returning 1 if the groupname
cannot be found. getgrnam_r is documented as returning:
« 0 or ENOENT or ESRCH or EBADF or EPERM or ... The given name
or gid was not found. »
and that:
« The formulation given above under "RETURN VALUE" is from POSIX.1-2001.
It does not call "not found" an error, hence does not specify what
value errno might have in this situation. But that makes it impossible to
recognize errors. One might argue that according to POSIX errno should be
left unchanged if an entry is not found. Experiments on various UNIX-like
systems shows that lots of different values occur in this situation: 0,
ENOENT, EBADF, ESRCH, EWOULDBLOCK, EPERM and probably others. »
virGetGroupIDByName returns an error when the return value of getgrnam_r
is non-0. However on my RHEL system, getgrnam_r returns ENOENT when the
requested user cannot be found, which then causes virGetGroupID not
to behave as documented (it returns an error instead of falling back
to parsing the passed-in value as an gid).
This commit makes virGetGroupIDByName only report an error when errno
is set to one of the values in the posix description of getgrnam_r
(which are the same as the ones described in the manpage on my system).
virGetUserIDByName is documented as returning 1 if the username
cannot be found. getpwnam_r is documented as returning:
« 0 or ENOENT or ESRCH or EBADF or EPERM or ... The given name
or uid was not found. »
and that:
« The formulation given above under "RETURN VALUE" is from POSIX.1-2001.
It does not call "not found" an error, hence does not specify what
value errno might have in this situation. But that makes it impossible to
recognize errors. One might argue that according to POSIX errno should be
left unchanged if an entry is not found. Experiments on various UNIX-like
systems shows that lots of different values occur in this situation: 0,
ENOENT, EBADF, ESRCH, EWOULDBLOCK, EPERM and probably others. »
virGetUserIDByName returns an error when the return value of getpwnam_r
is non-0. However on my RHEL system, getpwnam_r returns ENOENT when the
requested user cannot be found, which then causes virGetUserID not
to behave as documented (it returns an error instead of falling back
to parsing the passed-in value as an uid).
This commit makes virGetUserIDByName only report an error when errno
is set to one of the values in the posix description of getpwnam_r
(which are the same as the ones described in the manpage on my system).
If debugging is enabled, the debug messages are sent to stderr.
Moreover, if a command has catching of stderr set, the messages
gets mixed with stdout output (assuming both outputs are stored
in the same variable). The resulting string then doesn't
necessarily have to start with desired prefix then. This bug
exposes itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06
11:11:02.232+0000: 18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version doesn't
start with expected "Dnsmasq version " string but a libvirt debug
output.
If the debugging is enabled, the virCommand subsystem catches debug
messages in the command output as well. In that case, we can't assume
the string corresponding to command's stdout will start with specific
prefix. But the prefix can be moved deeper in the string. This bug
shows itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06 11:11:02.232+0000:
18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version
doesn't start with expected "Dnsmasq version " string but a libvirt
debug output.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=767057
It was possible to define a network with <forward mode='bridge'> that
had both a bridge device and a forward device defined. These two are
mutually exclusive by definition (if you are using a bridge device,
then this is a host bridge, and if you have a forward dev defined,
this is using macvtap). It was also possible to put <ip>, <dns>, and
<domain> elements in this definition, although those aren't supported
by the current driver (although it's conceivable that some other
driver might support that).
The items that are invalid by definition, are now checked in the XML
parser (since they will definitely *always* be wrong), and the others
are checked in networkValidate() in the network driver (since, as
mentioned, it's possible that some other network driver, or even this
one, could some day support setting those).
This patch adds the capability for virtual guests to do IPv6
communication via a virtual network interface with no IPv6 (gateway)
addresses specified. This capability has always been enabled by
default for IPv4, but disabled for IPv6 for security concerns, and
because it requires the ip6tables command to be operational (which
isn't the case on a system with the ipv6 module completely disabled).
This patch adds a new attribute "ipv6" at the toplevel of a <network>
object. If ipv6='yes', the extra ip6tables rules required to permite
inter-guest communications are added when the network is started. If
it is 'no', or not present, those rules will not be added; thus the
default behavior doesn't change, so there should be no compatibility
issues with any existing installations.
Note that virtual guests cannot communication with the virtualization
host via this interface, because the following kernel tunable has
been set:
net.ipv6.conf.<bridge_interface_name>.disable_ipv6 = 1
This assures that the bridge interface will not have an IPv6
link-local (fe80::) address.
To control this behavior so that it is not enabled by default, the parameter
ipv6='yes' on the <network> statement has been added.
Documentation related to this patch has been updated.
The network schema has also been updated.
https://bugzilla.redhat.com/show_bug.cgi?id=832302
It's odd to fall through to buildVol, and the existed file is
removed when buildVol fails. This checks if the volume target
path already exists in createVol. The reason for not using
error like "Volume already exists" is that there isn't volume
maintained by libvirt for the path until a operation like
pool-refresh, using error like that will just cause confusion.
https://bugzilla.redhat.com/show_bug.cgi?id=866524
Since the virConnect object is not locked wholely when doing
virConenctDispose, a thread can get the lock and thus might
cause the race.
Detected by valgrind:
==23687== Invalid read of size 4
==23687== at 0x38BAA091EC: pthread_mutex_lock (pthread_mutex_lock.c:61)
==23687== by 0x3FBA919E36: remoteClientCloseFunc (remote_driver.c:337)
==23687== by 0x3FBA936BF2: virNetClientCloseLocked (virnetclient.c:688)
==23687== by 0x3FBA9390D8: virNetClientIncomingEvent (virnetclient.c:1859)
==23687== by 0x3FBA851AAE: virEventPollRunOnce (event_poll.c:485)
==23687== by 0x3FBA850846: virEventRunDefaultImpl (event.c:247)
==23687== by 0x40CD61: vshEventLoop (virsh.c:2128)
==23687== by 0x3FBA8626F8: virThreadHelper (threads-pthread.c:161)
==23687== by 0x38BAA077F0: start_thread (pthread_create.c:301)
==23687== by 0x33F68E570C: clone (clone.S:115)
==23687== Address 0x4ca94e0 is 144 bytes inside a block of size 312 free'd
==23687== at 0x4A0595D: free (vg_replace_malloc.c:366)
==23687== by 0x3FBA8588B8: virFree (memory.c:309)
==23687== by 0x3FBA86AAFC: virObjectUnref (virobject.c:145)
==23687== by 0x3FBA8EA767: virConnectClose (libvirt.c:1458)
==23687== by 0x40C8B8: vshDeinit (virsh.c:2584)
==23687== by 0x41071E: main (virsh.c:3022)
The above race is caused by the eventLoop thread tries to handle
the net client event by calling the callback set by:
virNetClientSetCloseCallback(priv->client,
remoteClientCloseFunc,
conn, NULL);
I.E. remoteClientCloseFunc, which lock/unlock the virConnect object.
This patch is to fix the bug by setting the callback to NULL when
doRemoteClose.
The pciWrite32 function assembled the array of data to be written to the
fd with a bad offset on the last byte. This issue was probably caused by
a typo (14, 24).
Unmanaged PCI devices were only leaked if pciDeviceListAdd failed but
managed devices were always leaked. And leaking PCI device is likely to
leave PCI config file descriptor open. This patch fixes
qemuReattachPciDevice to either free the PCI device or add it to the
inactivePciHostdevs list.
An attempt to attach device that is already attached to a domain results
in the following error:
virsh # attach-device rhel6 pci2 --persistent
error: Failed to attach device from pci2
error: invalid argument: device is already in the domain configuration
The "invalid argument" error code looks wrong, we usually use "operation
invalid" when the action cannot be done in current state.
Only one error in qemu_monitor was already using the relatively
new OPERATION_UNSUPPORTED error, even though it is a better fit
for all of the messages related to options that are unsupported
due to the version of qemu in use rather than due to a user's
XML or .conf file choice. Suggested by Osier Yang.
* src/qemu/qemu_monitor.c (qemuMonitorSendFileHandle)
(qemuMonitorAddHostNetwork, qemuMonitorRemoveHostNetwork)
(qemuMonitorAttachDrive, qemuMonitorDiskSnapshot)
(qemuMonitorDriveMirror, qemuMonitorTransaction)
(qemuMonitorBlockCommit, qemuMonitorDrivePivot)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup)
(qemuMonitorGetVersion, qemuMonitorGetMachines)
(qemuMonitorGetCPUDefinitions, qemuMonitorGetCommands)
(qemuMonitorGetEvents, qemuMonitorGetKVMState)
(qemuMonitorGetObjectTypes, qemuMonitorGetObjectProps)
(qemuMonitorGetTargetArch): Use better error category.
Without this patch, attempts to create a disk snapshot when qemu
is too old results in a cryptic message:
virsh # snapshot-create 23 --disk-only
error: operation failed: Failed to take snapshot: unknown command: 'snapshot_blkdev'
Now it reports:
virsh # snapshot-create 23 --disk-only
error: unsupported configuration: live disk snapshot not supported with this QEMU binary
All versions of qemu that support live disk snapshot also support
QMP (basically upstream qemu 1.1 and later, and backports to RHEL 6.2).
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DISK_SNAPSHOT): New
capability.
* src/qemu/qemu_capabilities.c (qemuCaps): Track it.
(qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
it.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Simplify.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextDiskSnapshot):
Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextDiskSnapshot):
Likewise.
RHEL 6.3 uses dbus-devel-1.2.24, which lacked support for the
DBUS_TYPE_UNIX_FD define (contrast with Fedora 18 using 1.6.8).
But since it is an older dbus, it also lacks support for shutdown
inhibitions as provided by newer systemd.
Compilation failure introduced in commit 31330926.
* src/rpc/virnetserver.c (virNetServerAddShutdownInhibition):
Compile out if dbus is too old.
Implement the domainManagedSave, domainHasManagedSaveImage, and
domainManagedSaveRemove functions in the libvirt legacy xen driver.
domainHasManagedSaveImage check the managedsave image from filesystem
everytime. This is different from qemu and libxl driver. In qemu or
libxl driver, there is a hasManagesSave flag in virDomainObjPtr which
is not used in xen legacy driver. This flag could not add into xen
driver ptr either, because the driver ptr will be released at the end of
every libvirt api call. Meanwhile, AFAIK, xen store all the flags in
xen not in libvirt xen driver. There is no need to add this flag in xen.
Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com>
Use the freedesktop inhibition DBus service to prevent host
shutdown or session logout while any VMs are running.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The only important state that should prevent libvirtd shutdown
is from running VMs. Networks, host devices, network filters
and storage pools are all long lived resources that have no
significant in-memory state. They should not block shutdown.
The patch adds the backend driver to support iSCSI format storage pools
and volumes for ESX host. The mapping of ESX iSCSI specifics to Libvirt
is as follows:
1. ESX static iSCSI target <------> Libvirt Storage Pools
2. ESX iSCSI LUNs <------> Libvirt Storage Volumes.
The above understanding is based on http://libvirt.org/storage.html.
The operation supported on iSCSI pools includes:
1. List storage pools & volumes.
2. Get XML descriptor operaion on pools & volumes.
3. Lookup operation on pools & volumes by name, UUID and path (if applicable).
iSCSI pools does not support operations such as: Create / remove pools
and volumes.
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
To detect if an interface's nwfilter has changed, we need to also
compare the filterparams, which is a hashtable of virNWFilterVarValue.
virHashEqual can do this nicely, but requires a pointer to a function
that will compare two of the items being stored in the hashes.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
These three functions:
virDomainNetGetActualBridgeName
virDomainNetGetActualDirectDev
virDomainNetGetActualDirectMode
return attributes that are in a union whose contents are interpreted
differently depending on the actual->type and so they should only
return non-0 when actual->type is 'bridge' (in the first case) or
'direct' (in the other two cases, but I had neglected to do that, so
...DirectDev() was returning bridge.brname (which happens to share the
same spot in the union with direct.linkdev) if actual->type was
'bridge', and ...BridgeName was returning direct.linkdev when
actual->type was 'direct'.
How does this involve Bug 881480 (which was about the inability to
switch between two networks that both have "<forward mode='bridge'/>
<bridge name='xxx'/>"? Whenever the return value of
virDomainNetGetActualDirectDev() for the new and old network
definitions doesn't match, qemuDomainChangeNet() requires a "complete
reconnect" of the device, which qemu currently doesn't
support. ...DirectDev() *should* have been returning NULL for old and
new, but was instead returning the old and new bridge names, which
differ.
(The other two functions weren't causing any behavioral problems in
virDomainChangeNet(), but their problem and fix was identical, so I
included them in this same patch).
Fix the null pointer access when UUID is not specified.
Introduce a bool 'uuidUsable' to virStoragePoolAuthCephx that indicates
if uuid was specified or not and use it instead of the pointless
comparison of the static UUID array to NULL.
Add an error message if both uuid and usage are specified.
Fixes:
Error: FORWARD_NULL (CWE-476):
libvirt-0.10.2/src/conf/storage_conf.c:461: var_deref_model: Passing
null pointer "uuid" to function "virUUIDParse(char const *, unsigned
char *)", which dereferences it. (The dereference is assumed on the
basis of the 'nonnull' parameter attribute.)
Error: NO_EFFECT (CWE-398):
libvirt-0.10.2/src/conf/storage_conf.c:979: array_null: Comparing an
array to null is not useful: "src->auth.cephx.secret.uuid != NULL".
Commit a21f5112 fixed one API, but missed two others that also
failed to log their 'flags' argument.
* src/libvirt.c (virNodeSuspendForDuration, virDomainGetHostname):
Log flags parameter.
This introduces a few new APIs for dealing with strings.
One to split a char * into a char **, another to join a
char ** into a char *, and finally one to free a char **
There is a simple test suite to validate the edge cases
too. No more need to use the horrible strtok_r() API,
or hand-written code for splitting strings.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add support for doing controlled shutdown / reboot in the LXC
driver. The default behaviour is to try talking to /dev/initctl
inside the container's virtual root (/proc/$INITPID/root). This
works with sysvinit or systemd. If that file does not exist
then send SIGTERM (for shutdown) or SIGHUP (for reboot). These
signals are not any kind of particular standard for shutdown
or reboot, just something apps can choose to handle. The new
virDomainSendProcessSignal allows for sending custom signals.
We might allow the choice of SIGTERM/HUP to be configured for
LXC containers via the XML in the future.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The fact that only the guest agent, or ACPI flag can be used
when requesting reboot/shutdown is merely a limitation of the
QEMU driver impl at this time. Thus it should not be in
libvirt.c code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To be able todo controlled shutdown/reboot of containers an
API to talk to init via /dev/initctl is required. Fortunately
this is quite straightforward to implement, and is supported
by both sysvinit and systemd. Upstart support for /dev/initctl
is unclear.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When seeing a message
virNetSASLContextCheckIdentity:146 : SASL client admin not allowed in whitelist
it isn't immediately obvious that 'admin' is the identity
being checked. Quote the string to make it more obvious
The default machine type must be stored in the first element of
the caps->machineTypes array. This was done for help output
parsing but not for QMP probing.
Added a helper function qemuSetDefaultMachine to apply the same
fix up for both probing methods.
Further, it was necessary to set caps->nmachineTypes after QMP
probing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=872292
Libvirt should not attempt to call a QMP command that has not been
documented in qemu.git - if future qemu introduces a command by the
same name but with subtly different semantics, then libvirt will be
broken when trying to use that command.
We also had some code that could never be reached - some of our
commands have an alternate for new vs. old qemu HMP commands; but
if we are new enough to support QMP, we only need a fallback to
the new HMP counterpart, and don't need to try for a QMP counterpart
for the old HMP version.
See also this attempt to convert the three snapshot commands to QMP:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01597.html
although it looks like that will still not happen before qemu 1.3.
That thread eventually decided that qemu would use the name
'save-vm' rather than 'savevm', which mitigates the fact that
libvirt's attempt to use a QMP 'savevm' would be broken, but we
might not be as lucky on the other commands.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONSetCPU)
(qemuMonitorJSONAddDrive, qemuMonitorJSONDriveDel)
(qemuMonitorJSONCreateSnapshot, qemuMonitorJSONLoadSnapshot)
(qemuMonitorJSONDeleteSnapshot): Use only HMP fallback for now.
(qemuMonitorJSONAddHostNetwork, qemuMonitorJSONRemoveHostNetwork)
(qemuMonitorJSONAttachDrive, qemuMonitorJSONGetGuestDriveAddress):
Delete; QMP implies QEMU_CAPS_DEVICE, which prefers AddNetdev,
RemoveNetdev, and AddDrive anyways (qemu_hotplug.c has all callers).
* src/qemu/qemu_monitor.c (qemuMonitorAddHostNetwork)
(qemuMonitorRemoveHostNetwork, qemuMonitorAttachDrive): Reflect
deleted commands.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddHostNetwork)
(qemuMonitorJSONRemoveHostNetwork, qemuMonitorJSONAttachDrive):
Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
Commit 1b2ebf95 overlooked that there were two spots affected.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice):
Transfer backing chain before deletion.
* src/qemu/qemu_driver.c (qemuDomainDetachDeviceDiskLive): Fix
spacing (partly to ensure a different-looking patch).
Also removed some unreachable code found by coverity:
libvirt-0.10.2/src/nwfilter/nwfilter_driver.c:259: unreachable: This
code cannot be reached: "nwfilterDriverUnlock(driver...".
This patch adds two labels and gets rid of a ton of duplicated code.
This patch also fixes some error message and switches most of them to
proper error reporting functions.
This patch adds macros to help retrieve configuration values from qemu
driver's configuration. Some configuration options are grouped
together in the process.
This bug resolves CVE-2012-3411, which is described in the following
bugzilla report:
https://bugzilla.redhat.com/show_bug.cgi?id=833033
The following report is specifically for libvirt on Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=874702
In short, a dnsmasq instance run with the intention of listening for
DHCP/DNS requests only on a libvirt virtual network (which is
constructed using a Linux host bridge) would also answer queries sent
from outside the virtualization host.
This patch takes advantage of a new dnsmasq option "--bind-dynamic",
which will cause the listening socket to be setup such that it will
only receive those requests that actually come in via the bridge
interface. In order for this behavior to actually occur, not only must
"--bind-interfaces" be replaced with "--bind-dynamic", but also all
"--listen-address" options must be replaced with a single
"--interface" option. Fully:
--bind-interfaces --except-interface lo --listen-address x.x.x.x ...
(with --listen-address possibly repeated) is replaced with:
--bind-dynamic --interface virbrX
Of course libvirt can't use this new option if the host's dnsmasq
doesn't have it, but we still want libvirt to function (because the
great majority of libvirt installations, which only have mode='nat'
networks using RFC1918 private address ranges (e.g. 192.168.122.0/24),
are immune to this vulnerability from anywhere beyond the local subnet
of the host), so we use the new dnsmasqCaps API to check if dnsmasq
supports the new option and, if not, we use the "old" option style
instead. In order to assure that this permissiveness doesn't lead to a
vulnerable system, we do check for non-private addresses in this case,
and refuse to start the network if both a) we are using the old-style
options, and b) the network has a publicly routable IP
address. Hopefully this will provide the proper balance of not being
disruptive to those not practically affected, and making sure that
those who *are* affected get their dnsmasq upgraded.
(--bind-dynamic was added to dnsmasq in upstream commit
54dd393f3938fc0c19088fbd319b95e37d81a2b0, which was included in
dnsmasq-2.63)
This new function returns true if the given address is in the range of
any "private" or "local" networks as defined in RFC1918 (IPv4) or
RFC3484/RFC4193 (IPv6), otherwise they return false.
These ranges are:
192.168.0.0/16
172.16.0.0/16
10.0.0.0/24
FC00::/7
FEC0::/10
In order to optionally take advantage of new features in dnsmasq when
the host's version of dnsmasq supports them, but still be able to run
on hosts that don't support the new features, we need to be able to
detect the version of dnsmasq running on the host, and possibly
determine from the help output what options are in this dnsmasq.
This patch implements a greatly simplified version of the capabilities
code we already have for qemu. A dnsmasqCaps device can be created and
populated either from running a program on disk, reading a file with
the concatenated output of "dnsmasq --version; dnsmasq --help", or
examining a buffer in memory that contains the concatenated output of
those two commands. Simple functions to retrieve capabilities flags,
the version number, and the path of the binary are also included.
bridge_driver.c creates a single dnsmasqCaps object at driver startup,
and disposes of it at driver shutdown. Any time it must be used, the
dnsmasqCapsRefresh method is called - it checks the mtime of the
binary, and re-runs the checks if the binary has changed.
networkxml2argvtest.c creates 2 "artificial" dnsmasqCaps objects at
startup - one "restricted" (doesn't support --bind-dynamic) and one
"full" (does support --bind-dynamic). Some of the test cases use one
and some the other, to make sure both code pathes are tested.
On OOM, xdr_destroy got called even though it wasn't created yet.
Found by coverity:
Error: UNINIT (CWE-457):
libvirt-0.10.2/src/rpc/virnetmessage.c:214: var_decl: Declaring
variable "xdr" without initializer.
libvirt-0.10.2/src/rpc/virnetmessage.c:219: cond_true: Condition
"virReallocN(&msg->buffer, 1UL /* sizeof (*msg->buffer) */,
msg->bufferLength) < 0", taking true branch
libvirt-0.10.2/src/rpc/virnetmessage.c:221: goto: Jumping to label
"cleanup"
libvirt-0.10.2/src/rpc/virnetmessage.c:257: label: Reached label
"cleanup"
libvirt-0.10.2/src/rpc/virnetmessage.c:258: uninit_use: Using
uninitialized value "xdr.x_ops".
Found by coverity:
Error: REVERSE_INULL (CWE-476):
libvirt-0.10.2/src/util/processinfo.c:141: deref_ptr: Directly
dereferencing pointer "map".
libvirt-0.10.2/src/util/processinfo.c:142: check_after_deref:
Null-checking "map" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
Found by coverity:
Error: REVERSE_INULL (CWE-476):
libvirt-0.10.2/src/conf/netdev_bandwidth_conf.c:99: deref_ptr:
Directly dereferencing pointer "node".
libvirt-0.10.2/src/conf/netdev_bandwidth_conf.c:107:
check_after_deref: Null-checking "node" suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow actions to be performed in libvirtd when the host
shuts down, or user session exits, introduce a 'stop'
method to virDriverState. This will do things like saving
the VM state to a file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Implement the new API for sending signals to processes in a guest
for the LXC driver. Only support sending signals to the init
process for now, because
- The kernel does not appear to expose the mapping between
container PID numbers and host PID numbers anywhere in the
host OS namespace
- There is no race-free way to validate whether a host PID
corresponds to a process in a container.
* src/lxc/lxc_driver.c: Allow sending processes signals
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* src/remote/remote_protocol.x: message definition
* src/remote/remote_driver.c: Register driver function
* src/remote_protocol-structs: Test case
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an API for sending signals to arbitrary processes in the
guest OS. This is primarily useful for container based virt,
but can be used for machine virt too, if there is a suitable
guest agent,
* include/libvirt/libvirt.h.in: Add virDomainSendProcessSignal
and virDomainProcessSignal enum
* src/driver.h: Driver entry point
* src/libvirt.c, src/libvirt_public.syms: Impl for new API
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of 1a50ba2cb0 we fail to connect to the
monitor instead of getting an exit status != 0 from qemu itself. This
breaks capabilities probing for the non QMP case.
The documentation to this API has some defects from
grammar and wording POV. These were raised after I've
pushed the patches, so they are in a separate commit.
It makes no sense to fail the whole getting command if there is
a parameter unsupported by the kernel. This patch fixes it by
omitting the unsupported parameter for getMemoryParameters.
And for setMemoryParameters, this checks if there is an unsupported
parameter up front of the setting, and just returns failure if not
all parameters are supported.
Replace the following names
* struct qemu_snap_remove with virQEMUSnapRemovePtr
* struct qemu_snap_reparent with virQEMUSnapReparentPtr
* struct qemu_save_header with virQEMUSaveHeaderPtr
* enum qemu_save_formats with virQEMUSaveFormat
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=879473
The name attribute is required for portgroup elements (yes, the RNG
specifies that), and there is code in libvirt that assumes it is
non-null. Unfortunately, the portgroup parsing function wasn't
checking for lack of portgroup. One adverse result of this was that
attempts to update a network by adding a portgroup with no name would
cause libvirtd to segfault. For example:
virsh net-update default add portgroup "<portgroup default='yes'/>"
This patch causes virNetworkPortGroupParseXML to fail if no name is
specified, thus avoiding any later problems.
Throughout the code, we've always used VIR_DOMAIN_SHUTDOWN* flags
even for virDomainReboot() API and its implementation. Fortunately,
the appropriate macros has the same value. But if we want to keep
things consistent, we should be using the correct macros. This
patch doesn't break anything, luckily.
Commit cb022152 went overboard and introduced a dead conditional
while trying to get rid of a potential NULL dereference.
* src/nwfilter/nwfilter_dhcpsnoop.c (virNWFilterSnoopReqNew):
Remove redundant conditional.
In a few places, the return value could get passed to VIR_ALLOC_N without
being checked, resulting in a request to allocate a lot of memory if the
return value was negative.
This can't lead to a crash since virNWFilterSnoopReqNew is only called
with a static array as the argument, but if we check for NULL we should
do it right.
Error messages produced while dispatching guest agent commands didn't
have an apparent reference to the fact that they are dealing with guest
agent commands. This patch fixes up some of the messages to contain that
reference.
using qemu guest agent. As said in previous patch,
@mountPoint must be NULL and @flags zero because
qemu guest agent doesn't support these arguments
yet. If qemu learns them, we can start supporting
them as well.
This will call FITRIM within guest. The API has 4 arguments,
however, only 2 will be used for now (@dom and @minumum).
The rest two are there if in future qemu guest agent learns them.
With QMP capability probing, the version was not set.
virsh version returns:
...
Cannot extract running QEMU hypervisor version
This is fixed by computing caps->version from QMP major,
minor, micro values.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
QMP Capability probing will fail if QEMU cannot bind to the
QMP monitor socket in the qemu_driver->libDir directory.
That's because the child process is stripped of all
capabilities and this directory is chown'ed to the configured
QEMU user/group (normally qemu:qemu) by the QEMU driver.
To prevent this from happening, the driver startup will now pass
the QEMU uid and gid down to the capability probing code.
All capability probing invocations of QEMU will be run with
the configured QEMU uid instead of libvirtd's.
Furter, the pid file handling is moved to libvirt, as QEMU
cannot write to the qemu_driver->runDir (root:root). This also
means that the libvirt daemonizing must be used.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If qemuMonitorOpenUnix is called without a related pid, i.e. for
QMP probing, a connect failure can happen as the result of a race.
Without a pid there is no retry and thus we give up too early.
This changes the code to retry if no pid is supplied.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
we already have virtualize meminfo for container through fuse filesystem,
add function lxcContainerMountProcFuse to mount this meminfo file to
the container's /proc/meminfo.
So we can isolate container's /proc/meminfo from host now.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
with this patch,container's meminfo will be shown based on
containers' mem cgroup.
Right now,it's impossible to virtualize all values in meminfo,
I collect some values such as MemTotal,MemFree,Cached,Active,
Inactive,Active(anon),Inactive(anon),Active(file),Inactive(anon),
Active(file),Inactive(file),Unevictable,SwapTotal,SwapFree.
if I miss something, please let me know.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
because libvirt_lxc's cgroup mountpoint is what it shown
in /proc/self/cgroup.
we can get container's cgroup through virCgroupNew("/", &group),
add interface virCgroupGetAppRoot to help container to
get it's cgroup.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
virCgroupGetMemSwapUsage is used to get container's swap usage,
with this interface,we can get swap usage in fuse filesystem.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
this patch addes fuse support for libvirt lxc.
we can use fuse filesystem to generate sysinfo dynamically,
So we can isolate /proc/meminfo,cpuinfo and so on through
fuse filesystem.
we mount fuse filesystem for every container.
the mount name is libvirt,mount point is
localstatedir/run/libvirt/lxc/containername.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
This bug leads to getting incorrect vcpupin information via
qemudDomainGetVcpuPinInfo() API when the number of maximum
cpu on a host falls into a range such as 31 < ncpus < 64.
gcc warning:
left shift count >= width of type
The following bug is such the case
https://bugzilla.redhat.com/show_bug.cgi?id=876415
Change some legacy function names to use 'qemu' as their
prefix instead of 'qemud' which was a hang over from when
the QEMU driver ran inside a separate daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When starting an LXC guest with a virNetwork based NIC device,
if the network was not active, the virNetworkPtr device would
be leaked
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In virNetDevVethDelete the virRun method will properly report
errors, but when checking the exit status for non-zero exit
code no error is reported
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When starting a container, newDef is initialized to a
copy of 'def', but when startup fails newDef is never
removed. This cause later attempts to use 'virDomainDefine'
to lose the new data being defined.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A mistaken initialization of 'ret' caused failure to create
macvtap devices to be ignored. The libvirt_lxc process
would later fail to start due to missing devices
Also make sure code checks '< 0' and not '!= 0' since only
-1 is considered an error condition
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the <interface> device did not contain any <target>
element, LXC would crash on a NULL pointer if starting
the container failed
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When failing to create a macvlan interface, make sure the
error message contains the name of the host interface
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC driver relies on use of cgroups to kill off LXC processes
in shutdown. If cgroups aren't available, we're unable to kill
off processes, so we must treat lack of cgroups as a fatal startup
error.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code setting up LXC cgroups used an 'rc' variable both
for capturing the return value of methods it calls, and
its own return status. The result was that several failures
in setting up cgroups would actually result in success being
returned.
Use a separate 'ret' for tracking return value as per normal
code design in other parts of libvirt
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The initpid will be required long term to enable LXC to
implement various hotplug operations. Thus it needs to be
persisted in the domain status XML. LXC has not used the
domain status XML before, so this introduces use of the
helpers.
Currently the lxcContainerSetupMounts method uses the
virSecurityManagerPtr instance to obtain the mount options
string and then only passes the string down into methods
it calls. As functionality in LXC grows though, those
methods need to have direct access to the virSecurityManagerPtr
instance. So push the code down a level.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The impls of virSecurityManagerGetMountOptions had no way to
return errors, since the code was treating 'NULL' as a success
value. This is somewhat pointless, since the calling code did
not want NULL in the first place and has to translate it into
the empty string "". So change the code so that the impls can
return "" directly, allowing use of NULL for error reporting
once again
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
The actual regression is due to a latent bug. The hot unplug code
was computing the set of files needing cgroup ACL revocation based
on the XML passed in by the user, rather than based on the domain's
details on which disk was being deleted. As long as the revoke
path was always recomputing the backing chain, this didn't really
matter; but now that we want to compute the chain exactly once and
remember that computation, we need to hang on to the backing chain
until after the revoke has happened.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice):
Transfer backing chain before deletion.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Although we require various C99 features, we don't yet require a
complete C99 compiler. On RHEL 5, compilation complained:
qemu/qemu_command.c: In function 'qemuBuildGraphicsCommandLine':
qemu/qemu_command.c:4688: error: 'for' loop initial declaration used outside C99 mode
* src/qemu/qemu_command.c (qemuBuildGraphicsCommandLine): Declare
variable sooner.
* src/qemu/qemu_process.c (qemuProcessInitPasswords): Likewise.
The patch refactors the current ESX storage driver due to following reasons:
1. Given most of the public APIs exposed by the storage driver in Libvirt
remains same, ESX storage driver should not implement logic specific
for only one supported format (current implementation only supports VMFS).
2. Decoupling interface from specific storage implementation gives us an
extensible design to hook implementation for other supported storage
formats.
This patch refactors the current driver to implement it as a facade pattern i.e.
the driver exposes all the public libvirt APIs, but uses backend drivers to get
the required task done. The backend drivers provide implementation specific to
the type of storage device.
File changes:
------------------
esx_storage_driver.c ----> esx_storage_driver.c (base storage driver)
|
|---> esx_storage_backend_vmfs.c (VMFS backend)
When no security driver is specified libvirt_lxc segfaults as a debug
message tries to access security labels for the container that are not
present.
This problem was introduced in commit 6c3cf57d6c.
Early jumps to the cleanup label caused a crash of the libvirt_lxc
container helper as the cleanup section called
virLXCControllerDeleteInterfaces(ctrl) without checking the ctrl argument
for NULL. The argument was de-referenced soon after.
$ /usr/libexec/libvirt_lxc
/usr/libexec/libvirt_lxc: missing --name argument for configuration
Segmentation fault
This will simplify the refactoring of the ESX storage driver to support
a VMFS and an iSCSI backend.
One of the tasks the storage driver needs to do is to decide which backend
driver needs to be invoked for a given request. This approach extends
virStoragePool and virStorageVol to store extra parameters:
1. privateData: stores pointer to respective backend storage driver.
2. privateDataFreeFunc: stores cleanup function pointer.
virGetStoragePool and virGetStorageVol are modfied to accept these extra
parameters as user params. virStoragePoolDispose and virStorageVolDispose
checks for cleanup operation if available.
The private data pointer allows the ESX storage driver to store a pointer
to the used backend with each storage pool and volume. This avoids the need
to detect the correct backend in each storage driver function call.
The new model supports following features in addition to those supported
by SandyBridge:
fma, pcid, movbe, fsgsbase, bmi1, hle, avx2, smep, bmi2, erms, invpcid,
rtm
Commit 258e06c removed setting of the volume type to
VIR_STORAGE_VOL_BLOCK, which leads to failures in
storageVolumeCreateXMLFrom.
The type (and target.format) of the volume was set to zero. In
virStorageBackendGetBuildVolFromFunction, this gets interpreted as
VIR_STORAGE_FILE_NONE and the qemu-img tool is called with unknown
"none" format.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=879780
bridge_driver.h: silence gcc warnings:
statement with no effect [-Wunused-value]
unused variable 'net' [-Wunused-variable]
virdrivermoduletest.c: don't require network driver module
if it hasn't been built.
The virLXCControllerClientCloseHook method was mistakenly
assuming that the private data associated with the network
client was the virLXCControllerPtr. In fact it was just a
dummy int, so we were derefencing a bogus struct. The
frequent result of this was that we would never quit, because
we tried to arm a non-existant timer.
Fix the code by removing the dummy private data and just
using the virLXCControllerPtr instance as private data
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It is possible for there to be deleted timers when we
calculate the next timeout, and they must be skipped.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The event code is a no-op if requested to update a non-existent
timer/handle watch. This makes it hard to detect bugs in the
caller who have passed bogus data. Add a VIR_WARN output in
such cases, since the API does not allow for return errors.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The docs for virDiskNameToIndex claim it ignores partition
numbers. In actual fact though, a code ordering bug means
that a partition number will cause the code to accidentally
multiply the result by 26.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit e0c469e58b that fixes the detection
of image chain wasn't complete. Iteration through the backing image
chain has to stop at the last existing image if some of the images are
missing otherwise the backing chain that is cached contains entries with
paths being set to NULL resulting to:
error: Unable to allow access for disk path (null): Bad address
Fortunately stat() is kind enough not to crash when it's presented with
a NULL argument. At least on Linux.
The error "... but the cause is unknown" appeared for XMLs similar to
this:
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/dev/zero'/>
<target dev='sr0'/>
</disk>
Notice unsupported disk type (for the driver), but also no address
specified. The first part is not a problem and we should not abort
immediately because of that, but the combination with the address
unknown was causing an unspecified error.
While fixing this, I added an error to one place where this return
value was not managed properly.
Fixes this error when building with -Werror on Alpine Linux:
util/processinfo.c: In function 'virProcessInfoSetAffinity':
util/processinfo.c:52:5: error: implicit declaration of function 'malloc' [-Werror=implicit-function-declaration]
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Currently the LXC driver logs audit messages when a container
is started or stopped. These audit messages, however, contain
the PID of the libvirt_lxc supervisor process. To enable
sysadmins to correlate with audit messages generated by
processes /inside/ the container, we need to include the
container init process PID.
We can't do this in the main 'start' audit message, since
the init PID is not available at that point. Instead we output
a completely new audit record, that lists both PIDs.
type=VIRT_CONTROL msg=audit(1353433750.071:363): pid=20180 uid=0 auid=501 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='virt=lxc op=init vm="busy" uuid=dda7b947-0846-1759-2873-0f375df7d7eb vm-pid=20371 init-pid=20372 exe="/home/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd" hostname=? addr=? terminal=pts/6 res=success'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC controller code currently directly invokes the
libvirt main loop code. The problem is that this misses
the cleanup of virNetServerClient connections that
virNetServerRun takes care of.
The result is that when libvirtd is stopped, the
libvirt_lxc controller process gets stuck in a I/O loop.
When libvirtd is then started again, it fails to connect
to the controller and thus kills off the entire domain.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Regression introduced by commit 258e06c85b, "ret" could be set to 1
or 0 by virStorageBackendFileSystemIsMounted before goto cleanup.
This could mislead the callers (up to the public API
virStoragePoolDestroy) to return success even the underlying umount
command fails.
I have been testing libvirt v1.0.0 for deployment within my
organization, and in the process discovered what appears to be a bug
that breaks virsh attach-device, when attaching an RBD volume to an
instance. First, here is the error presented, with v1.0.0 (this worked
in v0.10.2):
[root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
error: Failed to attach device from G84VV1.xml
error: cannot open file 'dc3-1-test/G84VV1': No such file or directory
Using git bisect, I narrowed the problem down to this as the first
commit to break this setup:
4d34c92947 is the first bad commit
Commit a4c19459aa only added the
QEMU capability flag, command line option and added the boot element
for redirdev's in the XML schema.
This patch adds support for parsing and writing the XML with redirdevs
with the boot flag. It also ignores unknown XML elements in redirdev
instead of failing with:
"error: An error occurred, but the cause is unknown"
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=805414
Upcoming patches for revert-and-clone branching of snapshots need
to be able to copy a domain definition; make this step reusable.
* src/conf/domain_conf.h (virDomainDefCopy): New prototype.
* src/conf/domain_conf.c (virDomainObjCopyPersistentDef): Split...
(virDomainDefCopy): ...into new function.
(virDomainObjSetDefTransient): Use it.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use it.
Relatively straight-forward. And since qemu was already using
VIR_DOMAIN_SNAPSHOT_FILTERS_ALL, with 6 different APIs all calling
into this common code, I've instantly added all 5 flags to 6 APIs.
* src/conf/snapshot_conf.h (VIR_DOMAIN_SNAPSHOT_FILTERS_ALL):
Enable new filters.
* src/conf/snapshot_conf.c (virDomainSnapshotObjListGetNames):
Prep the new flags.
(virDomainSnapshotObjListCopyNames): Actually do the filtering.
As we enable more modes of snapshot creation, it becomes more important
to be able to quickly filter based on snapshot properties. This patch
introduces new filter flags; subsequent patches will introduce virsh
back-compat filtering, as well as actual libvirt filtering.
* include/libvirt/libvirt.h.in (virDomainSnapshotListFlags): Add
five new flags in two new groups.
* src/libvirt.c (virDomainSnapshotNum, virDomainSnapshotListNames)
(virDomainListAllSnapshots, virDomainSnapshotNumChildren)
(virDomainSnapshotListChildrenNames)
(virDomainSnapshotListAllChildren): Document them.
* src/conf/snapshot_conf.h (VIR_DOMAIN_SNAPSHOT_FILTERS_STATUS)
(VIR_DOMAIN_SNAPSHOT_FILTERS_LOCATION): Add new convenience filter
collection macros.
* tools/virsh-snapshot.c (cmdSnapshotList): Add 5 new flags.
* tools/virsh.pod (snapshot-list): Document them.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=873134
The reported problem is that an attempt to restore a saved domain that
was configured with <currentMemory> and <memory> set to some (same for
both) number that's not a multiple of 4096KiB results in an error like
this:
error: Failed to start domain libvirt_test_api
error: XML error: current memory '4001792k' exceeds maximum '4000768k'
(in this case, currentMemory was set to 4000000KiB).
The reason for this failure is:
1) a saved image contains the "live xml" of the domain at the time of
the save.
2) the live xml of a running domain gets its currentMemory
(a.k.a. cur_balloon) directly from the qemu monitor rather than from
the configuration of the domain.
3) the value reported by qemu is (sometimes) not exactly what was
originally given to qemu when the domain was started, but is rounded
up to [some indeterminate granularity] - in some versions of qemu that
granularity is apparently 1MiB, and in others it is 4MiB.
4) When the XML is parsed to setup the state of the restored domain,
the XML parser for <currentMemory> compares it to <memory> (which is
the maximum allowed memory size for the domain) and if <currentMemory>
is greater than the next 1024KiB boundary above <memory>, it spits out
an error and fails.
For example (from the BZ) if you start qemu on RHEL6 with both
<currentMemory> and <memory> of 4000000 (this number is in KiB),
libvirt's dominfo or dumpxml will report "4001792" back (rounded up to
next 4MiB) for 10-20 seconds after the start, then revert to reporting
"4000000". On Fedora 16 (which uses qemu-1.0), it will instead report
"4000768" (rounded up to next 1MiB). On Fedora 17 (qemu-1.2), it seems
to always report "4000000". ("4000000" is of course okay, and
"4000768" is also okay since that's the next 1024KiB boundary above
"4000000" and the parser was already allowing for that. But "4001792
is *not* okay and produces the error message.)
This patch solves the problem by changing the allowed "fudge factor"
when parsing from 1024KiB to 4096KiB to match the maximum up-rounding
that could be done in qemu.
(I had earlier thought to fix this by up-rounding <memory> in the
dumpxml that's put into the saved image, but that wouldn't have fixed
the case where the save image was produced by an "unfixed"
libvirtd.)
Prior to this patch, 'virsh nodecpumap' on older kernels reported:
error: Unable to get cpu map
error: out of memory
* src/nodeinfo.c (linuxParseCPUmax): Don't overwrite error.
(nodeGetCPUBitmap): Provide backup implementation.
On RHEL 5, I was getting a segfault trying to start libvirtd,
because we were failing virNodeParseSocket but not checking
for errors, and then calling CPU_SET(-1, &sock_map) as a result.
But if you don't have a topology/physical_package_id file,
then you can just assume that the cpu belongs to socket 0.
* src/nodeinfo.c (virNodeGetCpuValue): Change bool into
default_value.
(virNodeParseSocket): Allow for default value when file is missing,
different from fatal error on reading file.
(virNodeParseNode): Update call sites to fail on error.
For disk snapshots, the user could request an external snapshot
but not supply a filename; later on, we would check this condition
and generate a suitable name if possible, or gracefully error out
when not possible (such as when the original file was a block
device). But unless we come up with a suitable way to generate
external memory file names, we have no later code point that was
checking for NULL, so we should forbid this up front.
* src/conf/snapshot_conf.c (virDomainSnapshotDefParseString):
Avoid NULL deref, since we don't generate names yet.
It may take some time for sanlock to add a lockspace. And if user
restart libvirtd service meanwhile, the fresh daemon can fail adding
the same lockspace with EINPROGRESS. Recent sanlock has
sanlock_inq_lockspace() function which should block until lockspace
changes state. If we are building against older sanlock we should
retry a few times before claiming an error. This issue can be easily
reproduced:
for i in {1..1000} ; do echo $i; service libvirtd restart; sleep 2; done
20
Stopping libvirtd daemon: [FAILED]
Starting libvirtd daemon: [ OK ]
21
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
22
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
error : virLockManagerSanlockSetupLockspace:334 : Unable to add
lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in
progress
Since /sys/devices/system/cpu/present is not available on
older kernels like on RHEL 5.x nodeGetCPUCount will
fail there. The fallback implemented is to scan for
/sys/devices/system/cpu/cpuNN entries.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This simplifies the top-level code, at the cost of using a little more
stack space. The primary benefit is being able to send more fields
without knowing in advance how many of them, and of which types, these
fields will be, and without having to individually add buffer variables.
The code imposes an upper limit on the total number of iovs/buffers
used, and fields that wouldn't fit are silently dropped. This is not
significant in this patch, but will affect the following one.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... and update all users. No change in functionality, the parameter
will be used later.
The metadata representation is as minimal as possible, but requires
the caller to allocate an array on stack explicitly.
The alternative of using varargs in the virLogMessage() callers:
* Would not allow the caller to optionally omit some metadata elements,
except by having two calls to virLogMessage.
* Would not be as type-safe (e.g. using int vs. size_t), and the compiler
wouldn't be able to do type checking
* Depending on parameter order:
a) virLogMessage(..., message format, message params...,
metadata..., NULL)
can not be portably implemented (parse_printf_format() is a glibc
function)
b) virLogMessage(..., metadata..., NULL,
message format, message params...)
would prevent usage of ATTRIBUTE_FMT_PRINTF and the associated
compiler checking.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The "restart" function for locks allocates a new array according to
and pre-sets its length, then reads the owner pids from a JSON
document in a loop. Rather than adding each owner at a different
index, though, it repeatedly overwrites the last element of the array
with all the owners.
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.
Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
Lately there were a few reports of the output of the virsh nodeinfo
command being inaccurate. This patch tries to avoid that by checking if
the topology actually makes sense. If it doesn't we then report a
synthetic topology that indicates to the user that the host capabilities
should be checked for the actual topology.
Currently, if user calls virDomainAbortJob we just issue
'migrate_cancel' and hope for the best. However, if user calls
the API in wrong phase when migration hasn't been started yet
(perform phase) the cancel request is just ignored. With this
patch, the request is remembered and as soon as perform phase
starts, migration is cancelled.
For S390, the default console target type cannot be of type 'serial'.
It is necessary to at least interpret the 'arch' attribute
value of the os/type element to produce the correct default type.
Therefore we need to extend the signature of defaultConsoleTargetType
to account for architecture. As a consequence all the drivers
supporting this capability function must be updated.
Despite the amount of changed files, the only change in behavior is
that for S390 the default console target type will be 'virtio'.
N.B.: A more future-proof approach could be to to use hypervisor
specific capabilities to determine the best possible console type.
For instance one could add an opaque private data pointer to the
virCaps structure (in case of QEMU to hold capsCache) which could
then be passed to the defaultConsoleTargetType callback to determine
the console target type.
Seems to be however a bit overengineered for the use case...
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When the libvirt daemon is restarted it tries to reconnect to running
qemu domains. Since commit d38897a5d4 the
re-connection code runs in separate threads. In the original
implementation the maximum of domain ID's (that is used as an
initializer for numbering guests created next) while libvirt was
reconnecting to the guest.
With the threaded implementation this opens a possibility for race
conditions with the thread that is autostarting guests. When there's a
guest running with id 1 and the daemon is restarted. The autostart code
is reached first and spawns the first guest that should be autostarted
as id 1. This results into the following unwanted situation:
# virsh list
Id Name State
----------------------------------------------------
1 guest1 running
1 guest2 running
This patch extracts the detection code before the re-connection threads
are started so that the maximum id of the guests being reconnected to is
known.
The only semantic change created by this is if the guest with greatest ID
quits before we are able to reconnect it's ID is used anyway as the
greatest one as without this patch the greatest ID of a process we could
successfuly reconnect to would be used.
82507838 refactored the code to keep both the raw and canonicalized form
of the backingStore, which breaks badly when the storage pool contains a
storage volume, which is missing its backing store file:
# ./daemon/libvirtd -l
2012-11-07 12:43:33.279+0000: 22175: info : libvirt version: 1.0.0
2012-11-07 12:43:33.279+0000: 22175: error : absolutePathFromBaseFile:542 : Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
2012-11-07 12:43:33.280+0000: 22175: error : storageDriverAutostart:115 : Failed to autostart storage pool 'default': Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
This is because virStorageFileGetMetadataFromBuf() aborts with -1 if the
filename of the backingStore can not be canonicalized:
#0 absolutePathFromBaseFile () at util/storage_file.c:541
#1 virStorageFileGetMetadataFromBuf () at util/storage_file.c:728
#2 virStorageFileGetMetadataFromFD () at util/storage_file.c:932
#3 virStorageBackendProbeTarget () at storage/storage_backend_fs.c:94
#4 virStorageBackendFileSystemRefresh () at storage/storage_backend_fs.c:849
#5 storagePoolStart () at storage/storage_driver.c:700
#6 virStoragePoolCreate () at libvirt.c:12471
...
Treat files which miss their backing file as standalone files.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch adds support for external disk snapshots of inactive domains.
The snapshot is created by calling using qemu-img by calling:
qemu-img create -f format_of_snapshot -o
backing_file=/path/to/src,backing_fmt=format_of_backing_image
/path/to/snapshot
in case the backing image format is known or probing is allowed and
otherwise:
qemu-img create -f format_of_snapshot -o backing_file=/path/to/src
/path/to/snapshot
on each of the disks selected for snapshotting. This patch also modifies
the snapshot preparing function to support creating external snapshots
and to sanitize arguments. For now the user isn't able to mix external
and internal snapshots but this restriction might be lifted in the
future.