Here is a new version of this patch:
https://www.redhat.com/archives/libvir-list/2011-April/msg00337.html
v2:
- store the cputune info for the whole runtime of the domain
- remove cputune info when domain is destroyed
The nodeGetInfo code had to be moved into a helper
function to reuse it without a virConnectPtr.
Rather than copying and pasting lots of code, factor it into a
single helper function.
This commit adds a warning if tighter integer parsing would fail
due to any stray bytes after the number, but should not change
any behavior other than the bug fix for phypNumDomainsGeneric
looking only at numeric lines.
* src/phyp/phyp_driver.c (phypExecInt): New function.
(phypGetVIOSPartitionID, phypNumDomainsGeneric, phypGetLparID)
(phypGetLparMem, phypGetLparCPUGeneric, phypGetRemoteSlot)
(phypGetVIOSNextSlotNumber, phypAttachDevice)
(phypGetStoragePoolSize, phypStoragePoolNumOfVolumes)
(phypNumOfStoragePools, phypInterfaceDestroy)
(phypInterfaceDefineXML, phypInterfaceLookupByName)
(phypInterfaceIsActive, phypNumOfInterfaces): Use it.
(phypNumDomainsGeneric): Correctly find numeric line.
This last minute addition caused a build failure
cc1: warnings being treated as errors
qemu/qemu_process.c: In function 'qemuProcessHandleWatchdog':
qemu/qemu_process.c:436:34: error: ignoring return value of 'virDomainObjUnref', declared with attribute warn_unused_result [-Wunused-result]
make[3]: *** [libvirt_driver_qemu_la-qemu_process.lo] Error 1
This patch does the following two things:
1. hold an extra reference while handling watchdog event
If the domain is not persistent, and qemu quits unexpectedly before
calling processWatchdogEvent(), vm will be freed and the function
processWatchdogEvent() will be dangerous.
2. unlock qemu driver and vm before returning from processWatchdogEvent()
When the function processWatchdogEvent() failed, we only free wdEvent,
but forget to unlock qemu driver and vm, free dumpfile.
We do not lock qemu_driver when calling virThreadPoolNew(). If it failed,
we will unlock qemu_driver. It is dangerous.
We may use this pool during auto starting domains. So we must create it before
calling qemuAutostartDomains(). Otherwise, libvirtd will crash.
Also mark error messages in block_stats.c for translation, add the
new macro to the msg_gen functions in cfg.mk and add block_stats.c
to po/POTFILES.in
commit d4601696 introduces two more generated files: esx_vi.generated.h
and esx_vi.generated.h. But we do not include them into dist file.
It will break building if using dist file to build.
Use the name 'ret' for all phypExec results, to make it easier
to wrap phypExec. Don't allow a possibly NULL ret through printf.
* src/phyp/phyp_driver.c (phypBuildVolume, phypDestroyStoragePool)
(phypBuildStoragePool, phypBuildLpar): Avoid NULL dereference.
(phypInterfaceDestroy): Avoid redundant free.
(phypVolumeLookupByPath, phypVolumeGetPath): Use consistent
naming.
Ever since commit ebc46f, the destroy function built two command
variants but only used one. I went with the variant that matches
the idiom used in the counterpart of phypBuildStoragePool.
* src/phyp/phyp_driver.c (phypDestroyStoragePool): Avoid
clobbering cmd. Fix error message typo.
This warnings come from partly generated code. Therefore, the best
solution is to mark them as potentially being unused using the
ATTRIBUTE_UNUSED macro. This is suggested by the gcc documentation.
Reported by Christophe Fergeau
This patch addresses:
https://bugzilla.redhat.com/show_bug.cgi?id=694382
In order to give each libvirt-created bridge a fixed MAC address,
commit 5754dbd56d, added code to create
a dummy tap device with guaranteed lowest MAC address and attach it to
the bridge. This tap device was given the name "${bridgename}-nic".
However, an interface device name must be IFNAMSIZ (15) characters or
less, so a valid ${bridgename} such as "verylongname123" (15
characters) would lead to an invalid tap device name
("verylongname123-nic" - 19 characters), and that in turn led to a
failure to bring up the network.
The solution is to shorten the part of the original name used to
generate the tap device name. However, simply truncating it is
insufficient, because the last few characters of an interface name are
often a number used to indicate one of a list of several similar
devices (for example, "verylongname123", "verylongname124", etc) and
simple truncation would lead to duplicate names (eg "verlongnam-nic"
and "verylongnam-nic"). So instead we take the first 8 characters of
$bridgename ("verylong" in the example), add on the final 3 bytes
("123"), then add "-nic" (so "verylong123-nic"). Not pretty, but it
is much more likely to generate a unique name, and is reproducible
(unlike, say, a random number).
Due to differences in /proc/cpuinfo the parsing of the cpu data is
different between architectures. On PPC /proc/cpuinfo looks like this:
[original formatting with tabs]
processor : 0
cpu : PPC970MP, altivec supported
clock : 2297.700000MHz
revision : 1.1 (pvr 0044 0101)
processor : 1
cpu : PPC970MP, altivec supported
clock : 2297.700000MHz
revision : 1.1 (pvr 0044 0101)
[..]
timebase : 14318000
platform : pSeries
model : IBM,8844-AC1
machine : CHRP IBM,8844-AC1
The patch adapts the parsing of the data found in /proc/cpuinfo.
/sys/devices/system/cpu/cpuX/topology/physical_package_id also
always returns -1. Check for it on ppc and make it '0' if found negative.
This patch enables the migration of Qemu VMs between hosts of different endianess. I tested this by migrating a i686 VM between a x86 and ppc64 host.
I am converting the 'int's in the VM's state header to uint32_t assuming this doesn't break compatibility with existing deployments other than Linux.
gcc 4.6 warns when a variable is initialized but isn't used afterwards:
vmware/vmware_driver.c:449:18: warning: variable 'vmxPath' set but not used [-Wunused-but-set-variable]
This patch fixes these warnings. There are still 2 offending files:
- vbox_tmpl.c: the variable is used inside an #ifdef and is assigned several
times outside of #ifdef. Fixing the warning would have required wrapping
all the assignment inside #ifdef which hurts readability.
vbox/vbox_tmpl.c: In function 'vboxAttachDrives':
vbox/vbox_tmpl.c:3918:22: warning: variable 'accessMode' set but not used [-Wunused-but-set-variable]
- esx_vi_types.generated.c: the name implies it's generated code and I
didn't want to dive into the code generator
esx/esx_vi_types.generated.c: In function 'esxVI_FileQueryFlags_Free':
esx/esx_vi_types.generated.c:1203:3: warning: variable 'item' set but not used [-Wunused-but-set-variable]
Make: passed
Make check: passed
Make syntax-check: passed
this is the commit to introduce the function to create new character
device definition for the domain as advised by Cole Robinson
<crobinso@redhat.com>.
The function is used on the relevant places and also new tests has
been added.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
This extends the SPICE XML to allow variable compression settings for audio,
images and streaming:
<graphics type='spice' port='5901' tlsPort='-1' autoport='yes'>
<image compression='auto_glz'/>
<jpeg compression='auto'/>
<zlib compression='auto'/>
<playback compression='on'/>
</graphics>
All new elements are optional.
This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=696660
While starting a network, if brSetForwardDelay() fails, we go to err1
where we want to access macTapIfName variable which was just
VIR_FREE'd a few lines above. Instead, keep macTapIfName until we are
certain of success.
The methods qemuDomain{Get,Set}{Memory,Blkio,Scheduler}Parameters
all forgot to do a check on virDomainIsActive(), resulting in bogus
error messages from later parts of their impl
* src/qemu/qemu_driver.c: Add missing checks on virDomainIsActive()
sizeof(domain->name) is the wrong thing. Instead of using strdup here
rewrite escape_specialcharacters to allocate the buffer itself.
Add a contains_specialcharacters to be used in phypOpen, as phypOpen is
not interested in the escaped version.
Don't pre-allocate 4kb per key, make phypVolumeGetKey allocate the memory.
Make phypBuildVolume return the volume key instead of using pre-allocated
memory to store it.
Also fix a memory leak in phypVolumeLookupByName when phypVolumeGetKey
fails. Fix another memory leak in phypVolumeLookupByPath in the success
path. Fix phypVolumeGetXMLDesc leaking voldef.key.
Move the virInterfacePtr declaration to the top of the
function to avoid jump uninitialized variable warnings
* src/phyp/phyp_driver.c: Fix var declaration
This is the implementation of the previous patch now using virInterface*
API. Ended up this patch got much more simpler, smaller and easier to
review. Here is some details:
* MAC size and interface name are fixed due to specifications on HMC,
both are created automatically and CAN'T be specified from user. They
have the following format:
* MAC: 122980003002
* Interface name: U9124.720.067BE8B-V3-C0
* I did replaced all the |grep|sed following the comments Eric Blake
did on the last patch.
* According to my last email, It's not possible to create a network
interface without assigning it to a specific lpar. Then, I am using
this very minimalistic XML file for testing:
<interface type='ethernet' name='LPAR01'>
</interface>
In this file I am using "name" as the lpar name which I am going to
assign the new network interface. I couldn't find a better way to
refer to it. Comments are welcome.
* Regarding the fact I am sleeping one second waiting for the HMC to
complete creation of the interface, I don't have means to check
if the whole process is done. All I do is execute a command, wait
until is complete (which is not enough in this case) check
the return and the exit status. The process of actually creating
a networking interface seems to take a little longer than just the
return of the ssh control.
In qemuDomainObjBeginJobWithDriver, when virCondWaitUntil timeouts,
the function tries to call qemuDriverLock with virDomainObj locked,
this causes the dead-lock problem. This patch fixes this.
Commit 9677cd33ee made it possible to
remove current entry when iterating through all hash entries. However,
it didn't properly handle a special case of removing first entry
assigned to a given key which contains several entries in its collision
list.
This patch implements the code to support virDomainSetMaxMemory API,
and to support VIR_DOMAIN_MEM_MAXIMUM flag in qemudDomainSetMemoryFlags function.
As a result, we can change the maximum memory size of inactive QEMU guests.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Move "returns" keyword from beginning of API doc lines
when it does not describe return values.
Maybe the API doc extractor could be changed to look for
"returns: " to avoid such confusion.
The libexec program libvirt_iohelper is only for libvirtd. If we build rpm
without libvirtd, we will receive the following messages:
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/wency/rpmbuild/BUILDROOT/libvirt-0.9.0-1.el6.x86_64
error: Installed (but unpackaged) file(s) found:
/usr/libexec/libvirt_iohelper
This patch adds support for the evaluation of TCP flags in nwfilters.
It adds documentation to the web page and extends the tests as well.
Also, the nwfilter schema is extended.
The following are some example for rules using the tcp flags:
<rule action='accept' direction='in'>
<tcp state='NONE' flags='SYN/ALL' dsptportstart='80'/>
</rule>
<rule action='drop' direction='in'>
<tcp state='NONE' flags='SYN/ALL'/>
</rule>
This patch adds virDomainSetMemoryFlags(,,VIR_DOMAIN_MEM_CURRENT) support
code to qemu driver.
Also, change virDomainObjIsActive to return bool, given its usage.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
This patch introduces VIR_DOMAIN_MEM_CURRENT flag and
modifies virDomainSetMemoryFlags function to support it.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
* .gnulib: Update to latest, for pipe2.
* bootstrap.conf (gnulib_modules): Add pipe2.
* src/util/event_poll.c (virEventPollInit): Use it, to avoid
problematic virSetCloseExec on mingw.
1) Both "qemuDomainStartWithFlags" and "qemuAutostartDomain" try to
restore the domain from managedsave'ed image if it exists (by
invoking "qemuDomainObjRestore"), but it unlinks the image even
if restoring fails, which causes data loss. (This problem exists
for "virsh managedsave dom; virsh start dom").
The fix for is to unlink the managed state file only if restoring
succeeded.
2) For "virsh save dom; virsh restore dom;", it can cause data
corruption if one reuse the saved state file for restoring. Add
doc to tell user about it.
3) In "qemuDomainObjStart", if "managed_save" is NULL, we shouldn't
fallback to start the domain, skipping it to cleanup as a incidental
fix. Discovered by Eric.
We should bind pci device to original driver when pciBindDeviceToStub() failed.
If the pci device is not bound to any driver before calling pciBindDeviceToStub(),
we should only unbind it from pci-stub. If it is bound to pci-stub, we should not
unbind it from pci-stub.
This patch do the following things:
1. rename the function as 'Unbind' is better than 'UnBind'.
2. pciUnbindDeviceFromStub() will be used in the function pciBindDeviceToStub() in
next patch. Float it up, instead of having to have a forward declaration
In file included from util/threads.c:31:
util/threads-pthread.c: In function 'virThreadSelfID':
util/threads-pthread.c:214: warning: cast from function call of type 'pthread_t' to non-matching type 'int' [-Wbad-function-cast]
* src/util/threads-pthread.c (virThreadSelfID) [!SYS_gettid]:
Add intermediate cast to silence gcc.
We're seeing bugs apparently resulting from thread unsafety of
libpciaccess, such as
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/726099
To prevent those, as suggested by danpb on irc, move the
nodeDeviceLock(driverState) higher into the callers. In
particular:
udevDeviceMonitorStartup should hold the lock while calling
udevEnumerateDevices(), and udevEventHandleCallback should hold it
over its entire execution.
It's not clear to me whether it is ok to hold the
nodeDeviceLock while taking the virNodeDeviceObjLock(dev) on a
device. If not, then the lock will need to be dropped around
the calling of udevSetupSystemDev(), and udevAddOneDevice()
may not actually be safe to call from higher layers with the
driverstate lock held.
libvirt 0.8.8 with this patch on it seems to work fine for me.
Assuming it looks ok and I haven't done anything obviously dumb,
I'll ask the bug submitters to try this patch.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch adds max_processes option to qemu.conf which can be used to
override system default limit on number of processes that are allowed to
be running for qemu user.
cc1: warnings being treated as errors
libxl/libxl_driver.c: In function 'libxlDomainSetVcpusFlags':
libxl/libxl_driver.c:1570:14: error: cast from function call of type 'double' to non-matching type 'unsigned int' [-Wbad-function-cast]
libxl/libxl_driver.c:1578:15: error: cast from function call of type 'double' to non-matching type 'unsigned int' [-Wbad-function-cast]
This was the only use of floor() and ceil(), and floating-point
is overkill for power-of-two manipulations.
* src/libxl/libxl_driver.c (libxlDomainSetVcpusFlags): Avoid -lm
for trivial computations.
GCC is a little confused about the cast of beginthread/beginthreadex
from unsigned long -> void *. Go via an intermediate variable avoids
the bogus warning, and makes the code a little cleaner
* src/util/threads-win32.c: Avoid compiler warning in cast
The SCSI volumes get a better 'key' field based on the fully
qualified volume path. All SCSI volumes have a unique serial
available in hardware which can be obtained by sending a
suitable SCSI command. Call out to udev's 'scsi_id' command
to fetch this value
* src/storage/storage_backend_scsi.c: Improve volume key
field value stability and uniqueness
When initializing qemu guest capabilities, we should ignore qemu
binaries that we are not able to extract version/help info from since
they will be unusable for creating domains anyway. Ignoring them is also
much better than letting initialization of qemu driver fail.
A couple of functions were declared using the old style foo()
for no-parameters, instead of foo(void)
* src/xen/xen_hypervisor.c, tests/testutils.c: Replace () with (void)
in some function declarations
* m4/virt-compile-warnings.m4: Enable -Wold-style-definition
Replace openvz_readline with getline in several places to get rid of stack
allocated buffers to hold lines.
openvzReadConfigParam allocates memory for return values instead of
expecting a preexisting buffer.
This patch enables the relative backing file path support provided by
qemu-img create.
If a relative path is specified for the backing file, it is converted
to an absolute path using the storage pool path. The absolute path is
used to verify that the backing file exists. If the backing file exists,
the relative path is allowed and will be provided to qemu-img create.
Even with -Wuninitialized (which is part of autobuild.sh
--enable-compile-warnings=error), gcc does NOT catch this
use of an uninitialized variable:
{
if (cond)
goto error;
int a = 1;
error:
printf("%d", a);
}
which prints 0 (supposing the stack started life wiped) if
cond was true. Clang will catch it, but we don't use clang
as often. Using gcc -Wjump-misses-init catches it, but also
gives false positives:
{
if (cond)
goto error;
int a = 1;
return a;
error:
return 0;
}
Here, a was never used in the scope of the error block, so
declaring it after goto is technically fine (and clang agrees).
However, given that our HACKING already documents a preference
to C89 decl-before-statement, the false positive warning is
enough of a prod to comply with HACKING.
[Personally, I'd _really_ rather use C99 decl-after-statement
to minimize scope, but until gcc can efficiently and reliably
catch scoping and uninitialized usage bugs, I'll settle with
the compromise of enforcing a coding standard that happens to
reject false positives if it can also detect real bugs.]
* acinclude.m4 (LIBVIRT_COMPILE_WARNINGS): Add -Wjump-misses-init.
* src/util/util.c (__virExec): Adjust offenders.
* src/conf/domain_conf.c (virDomainTimerDefParseXML): Likewise.
* src/remote/remote_driver.c (doRemoteOpen): Likewise.
* src/phyp/phyp_driver.c (phypGetLparNAME, phypGetLparProfile)
(phypGetVIOSFreeSCSIAdapter, phypVolumeGetKey)
(phypGetStoragePoolDevice)
(phypVolumeGetPhysicalVolumeByStoragePool)
(phypVolumeGetPath): Likewise.
* src/vbox/vbox_tmpl.c (vboxNetworkUndefineDestroy)
(vboxNetworkCreate, vboxNetworkDumpXML)
(vboxNetworkDefineCreateXML): Likewise.
* src/xenapi/xenapi_driver.c (getCapsObject)
(xenapiDomainDumpXML): Likewise.
* src/xenapi/xenapi_utils.c (createVMRecordFromXml): Likewise.
* src/security/security_selinux.c (SELinuxGenNewContext):
Likewise.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainChangeEjectableMedia):
Likewise.
* src/qemu/qemu_process.c (qemuProcessWaitForMonitor): Likewise.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetPtyPaths):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainShutdown)
(qemudDomainBlockStats, qemudDomainMemoryPeek): Likewise.
* src/storage/storage_backend_iscsi.c
(virStorageBackendCreateIfaceIQN): Likewise.
* src/node_device/node_device_udev.c (udevProcessPCI): Likewise.
If strdup("x509dname") or strdup("saslUsername") success, but
strdup(x509dname) or strdup(saslUsername) failed, subject->nidentity
is not the num elements of subject->identities, and we will leak some
memory.
When you happen to have a libvirtd binary compiled with the
libxenlight driver (say you have installed xen-4.1 libraries)
but not running a xen enabled system, then libvirtd fails to start.
The cause is that libxlStartup() returns -1 when failing to initialize
the library, and this propagates to virStateInitialize() which consider
this a failure. We should only exit libxlStartup with an error code
if something like an allocation error occurs, not if the driver failed
to initialize.
* src/libxl/libxl_driver.c: fix libxlStartup() to not return -1
when failing to initialize the libxenlight library
qemu driver uses a 4K buffer for reading qemu log file. This is enough
when only qemu's output is present in the log file. However, when
debugging messages are turned on, intermediate libvirt process fills the
log with a bunch of debugging messages before it executes qemu binary.
In such a case the buffer may become too small. However, we are not
really interested in libvirt messages so they can be filtered out from
the buffer.
It throws errors as long as the cgroup controller is not available,
regardless of whether we really want to use it to do setup or not,
which is not what we want, fixing it with throwing error when need
to use the controller.
And change "VIR_WARN" to "qemuReportError" for memory controller
incidentally.
We create a temporary file to save memory, and we will remove it after reading
memory to buffer. But we free the variable that contains the temporary filename
before we remove it. So we should free tmp after unlinking it.
strcase{cmp/str} have the drawback of being sensitive to the global
locale; this is unacceptable in a library setting. Prefer a
hard-coded C locale alternative for all but virsh, which is user
facing and where the global locale isn't changing externally.
* .gnulib: Update to latest, for c-strcasestr change.
* bootstrap.conf (gnulib_modules): Drop strcasestr, add c-strcase
and c-strcasestr.
* cfg.mk (sc_avoid_strcase): New rule.
(exclude_file_name_regexp--sc_avoid_strcase): New exception.
* src/internal.h (STRCASEEQ, STRCASENEQ, STRCASEEQLEN)
(STRCASENEQLEN): Adjust offenders.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextEjectMedia):
Likewise.
* tools/virsh.c (namesorter): Document exception.
If qemu quited unexpectedly when we call qemuMonitorJSONHMP(),
libvirt will crash.
Steps to reproduce this bug:
1. use gdb to attach libvirtd, and set a breakpoint in the function
qemuMonitorSetCapabilities()
2. start a vm
3. let the libvirtd to run until qemuMonitorJSONSetCapabilities() returns.
4. kill the qemu process
5. continue running libvirtd
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
If the monitor met a error, and we will call qemuProcessHandleMonitorEOF().
But we may try to send monitor command after qemuProcessHandleMonitorEOF()
returned. Then libvirtd will be blocked in qemuMonitorSend().
Steps to reproduce this bug:
1. use gdb to attach libvirtd, and set a breakpoint in the function
qemuConnectMonitor()
2. start a vm
3. let the libvirtd to run until qemuMonitorOpen() returns.
4. kill the qemu process
5. continue running libvirtd
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Currently libvirt's default logging is limited and it is difficult to
determine what was happening when a proglem occurred (especially on a
machines where one don't know the detail.) This patch helps to do that
by making additional logging available for the following events:
creating/defining/undefining domains
creating/defining/undefining/starting/stopping networks
creating/defining/undefining/starting/stopping storage pools
creating/defining/undefining/starting/stopping storage volumes.
* AUTHORS: add Naoya Horiguchi
* src/network/bridge_driver.c src/qemu/qemu_driver.c
src/storage/storage_driver.c: provide more VIR_INFO logging
Not sure if it's the correct way to add cputune xml for xend driver,
and besides, seems "xm driver" and "xen hypervisor" also support
vcpu affinity, do we need to add support for them too?
When domain startup, setting cpu affinity and cpu shares according
to the cputune xml specified in domain xml.
Modify "qemudDomainPinVcpu" to update domain config for vcpupin,
and modify "qemuSetSchedulerParameters" to update domain config
for cpu shares.
v1 - v2:
* Use "VIR_ALLOC_N" instead of "VIR_ALLOC_VAR"
* But keep raising error when it fails on adding vcpupin xml
entry, as I still don't have a better idea yet.
Implementations of following functions:
virDomainVcpupinIsDuplicate
virDomainVcpupinFindByVcpu
virDomainVcpupinAdd
Update "virDomainDefParseXML" to parse, and "virDomainDefFormatXML"
to build cputune xml, also implementations of new internal helper
functions.
v1 - v2:
* Resolve potential crash bug of "virDomainVcpupinAdd"
Also related new functions' declaration, and expose the new introduced
functions in libvirt_private.syms.
v1 - v2:
Don't expose "virAllocVar" in libvirt_private.syms
My earlier testing for commit 34fa0de0 was done while starting
just-built libvirt from an unconfined_t shell, where the fds happened
to work when transferring to qemu. But when installed and run under
virtd_t, failure to label the raw file (with no compression) or the
pipe (with compression) triggers SELinux failures when passing fds
over SCM_RIGHTS to svirt_t qemu.
* src/qemu/qemu_migration.c (qemuMigrationToFile): When passing
FDs, make sure they are labeled.
First fallout of fd: migration - it looks like SELinux enforcing
_does_ require fd labeling (running uninstalled libvirtd from an
unconstrained shell had no problems, but once faked out by doing
chcon `stat -c %C /usr/sbin/libvirtd` daemon/libvirtd
run_init $PWD/daemon/libvirtd
to run it with the same context as an init script service, and with
SELinux enforcing, I got a rather confusing failure:
error: Failed to save domain fedora_12 to fed12.img
error: internal error unable to send TAP file handle: No file descriptor supplied via SCM_RIGHTS
This fixes the error message, then I need to figure out a subsequent
patch that does the fsetfilecon() necessary to keep things happy.
It also appears that libvirtd hangs on a failed fd transfer; I don't
know if that needs an independent fix.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextSendFileHandle):
Improve message, since TAP is no longer only client.
* src/Makefile.am src/libvirt_private.syms configure.ac: share and
reuse the sexpr routines from sexpr.h of the old xen driver
* src/libxl/libxl_driver.c: implements libxlDomainXMLFromNative and
libxlDomainXMLToNative
Hook the virtual cpu functions to their libxenlight counterparts
* src/libxl/libxl_driver.c: implements libxlDomainSetVcpus,
libxlDomainGetVcpus, libxlDomainSetVcpusFlags,
libxlDomainGetVcpusFlags and libxlDomainPinVcpu
* src/libxl/libxl_conf.h: add the necessary fields to the driver
private structure
* src/libxl/libxl_driver.c: add lifecycle event support and entry
points for event(de)register(any)
New APIs are added allowing streaming of content to/from
storage volumes.
* include/libvirt/libvirt.h.in: Add virStorageVolUpload and
virStorageVolDownload APIs
* src/driver.h, src/libvirt.c, src/libvirt_public.syms: Stub
code for new APIs
* src/storage/storage_driver.c, src/esx/esx_storage_driver.c:
Add dummy entries in driver table for new APIs
The O_NONBLOCK flag doesn't work as desired on plain files
or block devices. Introduce an I/O helper program that does
the blocking I/O operations, communicating over a pipe that
can support O_NONBLOCK
* src/fdstream.c, src/fdstream.h: Add non-blocking I/O
on plain files/block devices
* src/Makefile.am, src/util/iohelper.c: I/O helper program
* src/qemu/qemu_driver.c, src/lxc/lxc_driver.c,
src/uml/uml_driver.c, src/xen/xen_driver.c: Update for
streams API change
Spawn the compressor ourselves, instead of requiring the shell.
* src/qemu/qemu_migration.c (qemuMigrationToFile): Spawn
compression helper process when needed.
SELinux labeling and cgroup ACLs aren't required if we hand a
pre-opened fd to qemu. All the more reason to love fd: migration.
* src/qemu/qemu_migration.c (qemuMigrationToFile): Skip steps
that are irrelevant in fd migration.
This points out that core dumps (still) don't work for root-squash
NFS, since the fd is not opened correctly. This patch should not
introduce any functionality change, it is just a refactoring to
avoid duplicated code.
* src/qemu/qemu_migration.h (qemuMigrationToFile): New prototype.
* src/qemu/qemu_migration.c (qemuMigrationToFile): New function.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag, doCoreDump): Use
it.
Direct access to an open file is so much simpler than passing
everything through a pipe!
* src/qemu/qemu_driver.c (qemudOpenAsUID)
(qemudDomainSaveImageClose): Delete.
(qemudDomainSaveImageOpen): Rename...
(qemuDomainSaveImageOpen): ...and drop read_pid argument. Use
virFileOpenAs instead of qemudOpenAsUID.
(qemudDomainSaveImageStartVM, qemudDomainRestore)
(qemudDomainObjRestore): Rename...
(qemuDomainSaveImageStartVM, qemuDomainRestore)
(qemDomainObjRestore): ...and simplify accordingly.
(qemudDomainObjStart, qemuDriver): Update callers.
This patch intentionally doesn't change indentation, in order to
make it easier to review the real changes.
* src/util/util.h (VIR_FILE_OP_RETURN_FD, virFileOperationHook):
Delete.
(virFileOperation): Rename...
(virFileOpenAs): ...and reduce parameters.
* src/util/util.c (virFileOperationNoFork, virFileOperation):
Rename and simplify.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust caller.
* src/storage/storage_backend.c (virStorageBackendCreateRaw):
Likewise.
* src/libvirt_private.syms: Reflect rename.
Currently, the hook function in virFileOperation is extremely limited:
it must be async-signal-safe, and cannot modify any memory in the
parent process. It is much handier to return a valid fd and operate
on it in the parent than to deal with hook restrictions.
* src/util/util.h (VIR_FILE_OP_RETURN_FD): New flag.
* src/util/util.c (virFileOperationNoFork, virFileOperation):
Honor new flag.
This allows direct saves (no compression, no root-squash NFS) to use
the more efficient fd: migration, which in turn avoids a race where
qemu exec: migration can sometimes fail because qemu does a generic
waitpid() that conflicts with the pclose() used by exec:. Further
patches will solve compression and root-squash NFS.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new function
when there is no compression.
Latent bug introduced in commit 2d6a581960 (Aug 2009), but not exposed
until commit 1859939a (Jan 2011). Basically, when virExec creates a
pipe, it always marks libvirt's side as cloexec. If libvirt then
wants to hand that pipe to another child process, things work great if
the fd is dup2()'d onto stdin or stdout (as with stdin: or exec:
migration), but if the pipe is instead used as-is (such as with fd:
migration) then qemu sees EBADF because the fd was closed at exec().
This is a minimal fix for the problem at hand; it is slightly racy,
but no more racy than the rest of libvirt fd handling, including the
case of uncompressed save images. A more invasive fix, but ultimately
safer at avoiding leaking unintended fds, would be to _always and
atomically_ open all fds as cloexec in libvirt (thanks to primitives
like open(O_CLOEXEC), pipe2(), accept4(), ...), then teach virExec to
clear that bit for all fds explicitly marked to be handed to the child
only after forking.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Clear cloexec
flag.
* tests/qemuxml2argvtest.c (testCompareXMLToArgvFiles): Tweak test.
* src/util/logging.c (virLogStartup, virLogSetBufferSize):
Over-allocate, so that a debugger can just print the circular
buffer. Suggested by Daniel Veillard.
* src/Makefile.am (remote_protocol-structs): Flatten tabs.
* src/remote_protocol-structs: Likewise. Also add a hint to emacs
to make it easier to keep spaces in the file.
Otherwise, if something like doStopVcpus fails after the first
restore, a second restore is attempted and throws a useless
warning.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Avoid second
restore of state label.
The Open Nebula driver has been unmaintained since it was first
introduced. The only commits have been for tree-wide cleanups.
It also has a major design flaw, in that it only knows about guests
that it has created itself, which makes it of very limited use.
Discussions wrt evolution of the VMWare ESX driver, concluded that
it should limit itself to single-node ESX operation and not try to
manage the multi-node architecture of VirtualCenter. Open Nebula
is a cluster like Virtual Center, not a single node system, so
the same reasoning applies.
The DeltaCloud project includes an Open Nebula driver and is a much
better fit architecturally, since it is explicitly targetting the
distributed multihost cluster scenario.
Thus this patch deletes the libvirt Open Nebula driver with the
recommendation that people use DeltaCloud for managing it instead.
* configure.ac: Remove probe for xmlrpc & --with-one arg
* daemon/Makefile.am, daemon/libvirtd.c, src/Makefile.am: Remove
ONE driver build
* src/opennebula/one_client.c, src/opennebula/one_client.h,
src/opennebula/one_conf.c, src/opennebula/one_conf.h,
src/opennebula/one_driver.c, src/opennebula/one_driver.c: Delete
files
* autobuild.sh, libvirt.spec.in, mingw32-libvirt.spec.in: Remove
build rules for Open Nebula
* docs/drivers.html.in, docs/sitemap.html.in: Remove reference
to OpenNebula
* docs/drvone.html.in: Delete file
Add missing open curly brace between function declaration of non-linux
variant of qemudDomainInterfaceStats() and its body.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Sometimes, an asynchronous helper is started (such as a compressor
or iohelper program), but a later error means that we want to
abort that child. Make this easier.
Note that since daemons and virCommandRunAsync can't mix, the only
time virCommandFree can reap a process is if someone did
virCommandRunAsync for a non-daemon and didn't stash the pid.
* src/util/command.h (virCommandAbort): New prototype.
* src/util/command.c (_virCommand): Add new field.
(virCommandRunAsync, virCommandWait): Track whether pid was used.
(virCommandFree): Reap child if caller did not request pid.
(virCommandAbort): New function.
* src/libvirt_private.syms (command.h): Export it.
* tests/commandtest.c (test19): New test.
It doesn't make sense to run a daemon without synchronously
waiting for the child process to reply whether the daemon has
been kicked off and pidfile written yet.
* src/util/command.c (VIR_EXEC_RUN_SYNC): New constant.
(virCommandRun): Set temporary flag.
(virCommandRunAsync): Use it to prevent async runs of intermediate
child when spawning asynchronous daemon grandchild.
Child processes don't always reach _exit(); if they die from a
signal, then any messages should still be accurate. Most users
either expect a 0 status (thankfully, if status==0, then
WIFEXITED(status) is true and WEXITSTATUS(status)==0 for all
known platforms) or were filtering on WIFEXITED before printing
a status, but a few were missing this check. Additionally,
nwfilter_ebiptables_driver was making an assumption that works
on Linux (where WEXITSTATUS shifts and WTERMSIG just masks)
but fails on other platforms (where WEXITSTATUS just masks and
WTERMSIG shifts).
* src/util/command.h (virCommandTranslateStatus): New helper.
* src/libvirt_private.syms (command.h): Export it.
* src/util/command.c (virCommandTranslateStatus): New function.
(virCommandWait): Use it to also diagnose status from signals.
* src/security/security_apparmor.c (load_profile): Likewise.
* src/storage/storage_backend.c
(virStorageBackendQEMUImgBackingFormat): Likewise.
* src/util/util.c (virExecDaemonize, virRunWithHook)
(virFileOperation, virDirCreate): Likewise.
* daemon/remote.c (remoteDispatchAuthPolkit): Likewise.
* src/nwfilter/nwfilter_ebiptables_driver.c (ebiptablesExecCLI):
Likewise.
Hotpluging host usb device by text mode will fail, because the monitor
command 'device_add' outputs 'husb: using...' if it succeeds, but we
think the command should not output anything.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Add the compiler attribute to ensure we don't introduce any more
ref bugs like were just patched in commit 9741f34, then explicitly
mark the remaining places in code that are safe.
* src/qemu/qemu_monitor.h (qemuMonitorUnref): Mark
ATTRIBUTE_RETURN_CHECK.
* src/conf/domain_conf.h (virDomainObjUnref): Likewise.
* src/conf/domain_conf.c (virDomainObjParseXML)
(virDomainLoadStatus): Fix offenders.
* src/openvz/openvz_conf.c (openvzLoadDomains): Likewise.
* src/vmware/vmware_conf.c (vmwareLoadDomains): Likewise.
* src/qemu/qemu_domain.c (qemuDomainObjBeginJob)
(qemuDomainObjBeginJobWithDriver)
(qemuDomainObjExitRemoteWithDriver): Likewise.
* src/qemu/qemu_monitor.c (QEMU_MONITOR_CALLBACK): Likewise.
Suggested by Daniel P. Berrange.
This simplifies several callers that were repeating checks already
guaranteed by util.c, and makes other callers more robust to now
reject directories. remote_driver.c was over-strict - access(,R_OK)
is only needed to execute a script file; a binary only needs
access(,X_OK) (besides, it's unusual to see a file with x but not
r permissions, whether script or binary).
* cfg.mk (sc_prohibit_access_xok): New syntax-check rule.
(exclude_file_name_regexp--sc_prohibit_access_xok): Exempt one use.
* src/network/bridge_driver.c (networkStartRadvd): Fix offenders.
* src/qemu/qemu_capabilities.c (qemuCapsProbeMachineTypes)
(qemuCapsInitGuest, qemuCapsInit, qemuCapsExtractVersionInfo):
Likewise.
* src/remote/remote_driver.c (remoteFindDaemonPath): Likewise.
* src/uml/uml_driver.c (umlStartVMDaemon): Likewise.
* src/util/hooks.c (virHookCheck): Likewise.
Steps to reproduce this bug:
1. virsh attach-disk domain --source diskimage --target sdb --sourcetype file --driver qemu --subdriver qcow2
error: Failed to attach disk
error: operation failed: adding scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 device failed: Property 'scsi-disk.drive' can't find value 'drive-scsi0-0-1'
2. service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
3. virsh attach-disk domain --source diskimage --target sdb --sourcetype file --driver qemu --subdriver raw
error: Failed to attach disk
error: operation failed: adding lsi,id=scsi0,bus=pci.0,addr=0x6 device failed: Duplicate ID 'scsi0' for device
The reason is that we create a new scsi controller but we do not update
/var/run/libvirt/qemu/domain.xml.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
The ref count was assigned to 1 at creation, then never modified again
until it was decremented just before freeing the object.
* src/conf/domain_conf.h (_virDomainSnapshotObj): Delete unused
field.
(virDomainSnapshotObjUnref): Delete unused prototype.
* src/libvirt_private.syms: Likewise.
* src/conf/domain_conf.c (virDomainSnapshotObjNew)
(virDomainSnapshotObjListDataFree): Update users.
(virDomainSnapshotObjUnref): Delete.
Problem:
"parser.head" is not NULL even if it's free'ed by "virJSONValueFree",
returning "parser.head" when "virJSONValueFromString" fails will cause
unexpected errors (libvirtd will crash sometimes), e.g.
In function "qemuMonitorJSONArbitraryCommand":
if (!(cmd = virJSONValueFromString(cmd_str)))
goto cleanup;
if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0)
goto cleanup;
......
cleanup:
virJSONValueFree(cmd);
It will continues to send command to monitor even if "virJSONValueFromString"
is failed, and more worse, it trys to free "cmd" again.
Crash example:
{"error":{"class":"QMPBadInputObject","desc":"Expected 'execute' in QMP input","data":{"expected":"execute"}}}
{"error":{"class":"QMPBadInputObject","desc":"Expected 'execute' in QMP input","data":{"expected":"execute"}}}
error: server closed connection:
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor
This fix is to:
1) return NULL for failure of "virJSONValueFromString",
2) and it seems "virJSONValueFree" uses incorrect loop index for type
of "VIR_JSON_TYPE_OBJECT", fix it together.
* src/util/json.c
Steps to reproduce this bug:
# cat usb.xml
<hostdev mode='subsystem' type='usb'>
<source>
<address bus='0x001' device='0x003'/>
</source>
</hostdev>
# virsh attach-device vm1 usb.xml
error: Failed to attach device from usb.xml
error: server closed connection:
The reason of this bug is that we set data.cgroup to NULL, and this will cause
libvirtd crashed.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
A future patch will change reference counting idioms; consolidating
this pattern now makes the next patch smaller (touch only the new
macro rather than every caller).
* src/qemu/qemu_monitor.c (QEMU_MONITOR_CALLBACK): New helper.
(qemuMonitorGetDiskSecret, qemuMonitorEmitShutdown)
(qemuMonitorEmitReset, qemuMonitorEmitPowerdown)
(qemuMonitorEmitStop, qemuMonitorEmitRTCChange)
(qemuMonitorEmitWatchdog, qemuMonitorEmitIOError)
(qemuMonitorEmitGraphics): Use it to reduce duplication.
This patch introduces PREASSOCIATE-RR during incoming VM migration on the
destination host. This is similar to the usage of PREASSOCIATE during
migration in 8021qbg libvirt code today. PREASSOCIATE-RR is a VDP operation.
With the latest at IEEE, 8021qbh will need to support VDP operations.
A corresponding enic driver patch to support PREASSOCIATE-RR for 8021qbh
will be posted for net-next-2.6 inclusion soon.
THe veth setup in LXC had a couple of flaws, first brInit did
not report any error when it failed. Second vethCreate() did
not correctly initialize the variable containing the return
code, so could report failure even when it succeeded.
* src/lxc/lxc_driver.c: Report error when brInit fails
* src/lxc/veth.c: Fix uninitialized variable
Enhance the QEMU migration monitoring loop, so that it can get
a signal to change migration speed on the fly
* src/qemu/qemu_domain.h: Add signal for changing speed on the fly
* src/qemu/qemu_driver.c: Wire up virDomainMigrateSetSpeed driver
* src/qemu/qemu_migration.c: Support signal for changing speed
It is possible to set a migration speed limit when starting
migration. This new API allows the speed limit to be changed
on the fly to adjust to changing conditions
* src/driver.h, src/libvirt.c, src/libvirt_public.syms,
include/libvirt/libvirt.h.in: Add virDomainMigrateSetMaxSpeed
* src/esx/esx_driver.c, src/lxc/lxc_driver.c,
src/opennebula/one_driver.c, src/openvz/openvz_driver.c,
src/phyp/phyp_driver.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/vmware/vmware_driver.c, src/xen/xen_driver.c,
src/libxl/libxl_driver.c: Stub new API
Fix for bug https://bugzilla.redhat.com/show_bug.cgi?id=618970
The "prepare" hook is called very early in the VM statup process
before device labeling, so that it can allocate ressources not
managed by libvirt, such as DRBD, or for instance create missing
bridges and vlan interfaces.
* src/util/hooks.c src/util/hooks.h: add definitions for new hooks
VIR_HOOK_QEMU_OP_PREPARE and VIR_HOOK_QEMU_OP_RELEASE
* src/qemu/qemu_process.c: use them in qemuProcessStart and
qemuProcessStop()
With only a single caller to these two monitor commands, I
didn't need to wrap a new WithFds version, but just change
the command itself.
* src/qemu/qemu_monitor.h (qemuMonitorAddNetdev)
(qemuMonitorAddHostNetwork): Add parameters.
* src/qemu/qemu_monitor.c (qemuMonitorAddNetdev)
(qemuMonitorAddHostNetwork): Add support for fd passing.
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Use it to
simplify code.
This is also a bug fix - on the error path, qemu_hotplug would
leave the configfd file leaked into qemu. At least the next
attempt to hotplug a PCI device would reuse the same fdname,
and when the qemu getfd monitor command gets a new fd by the
same name as an earlier one, it closes the earlier one, so there
is no risk of qemu running out of fds.
* src/qemu/qemu_monitor.h (qemuMonitorAddDeviceWithFd): New
prototype.
* src/qemu/qemu_monitor.c (qemuMonitorAddDevice): Move guts...
(qemuMonitorAddDeviceWithFd): ...to new function, and add support
for fd passing.
* src/qemu/qemu_hotplug.c (qemuDomainAttachHostPciDevice): Use it
to simplify code.
Suggested by Daniel P. Berrange.
qemu_monitor was already returning -1 and setting errno to EINVAL
on any attempt to send an fd without a unix socket, but this was
a silent failure in the case of qemuDomainAttachHostPciDevice.
Meanwhile, qemuDomainAttachNetDevice was doing some sanity checking
for a better error message; it's better to consolidate that to a
central point in the API.
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Move sanity
checking...
* src/qemu/qemu_monitor.c (qemuMonitorSendFileHandle): ...into
central location.
Suggested by Chris Wright.
https://bugzilla.redhat.com/show_bug.cgi?id=684655 points out
a regression introduced in commit 2215050edd - non-root users
can't connect to qemu:///session because libvirtd dies when
it can't use pciaccess initialization.
* src/node_device/node_device_udev.c (udevDeviceMonitorStartup):
Don't abort udev driver (and libvirtd overall) if non-root user
can't use pciaccess.
Valgrind caught that our log wrap-around was going 1 past the end.
Regression introduced in commit b16f47a; previously the
buffer was static and size+1 bytes, but now it is dynamic and
exactly size bytes.
* src/util/logging.c (virLogStr): Don't write past end of log.
We have reported error in the function prepareCall(), and
the error is not only OOM error. So we should not report
OOM error in the function call() when prepareCall() failed.
If virFileIsExecutable is to replace access(file,X_OK), then
errno must be usable on failure.
* src/util/util.c (virFileIsExecutable): Set errno on failure.
The current description suggests that you always have to provide
a valid typeVer pointer. But if you want only the libvirt version
it's also possible to set type and typeVer to NULL to skip the
hypervisor part.
This patch enables cgroup controllers as much as possible by skipping
the creation of blkio controller when running with old kernels that
doesn't support multi-level directory for blkio controller.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
THREADS.txt states that the contents of vm should not be read or
modified while the vm lock is not held, but that the lock must not
be held while performing a monitor command. This fixes all the
offenders that I could find.
* src/qemu/qemu_process.c (qemuProcessStartCPUs)
(qemuProcessInitPasswords, qemuProcessStart): Don't modify or
refer to vm state outside lock.
* src/qemu/qemu_driver.c (qemudDomainHotplugVcpus): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainChangeGraphicsPasswords):
Likewise.
This is detailed in:
https://bugzilla.redhat.com/show_bug.cgi?id=688957
Since radvd is executed by daemonizing it, the attempt to exec the
radvd binary doesn't happen until after libvirtd has already received
an exit code from the intermediate forked process, so no error is
detected or logged by __virExec().
We can't require radvd as a prerequisite for the libvirt package (many
installations don't use IPv6, so they don't need it), so instead we
add in a check to verify there is an executable radvd binary prior to
trying to exec it.
When SASL is active, it was possible that we read and decoded
more data off the wire than we initially wanted. The loop
processing this data terminated after only one message to
avoid delaying the calling thread, but this could delay
event delivery. As long as there is decoded SASL data in
memory, we must process it, before returning to the poll()
event loop.
This is a counterpart to the same kind of issue solved in
commit 68d2c3482f
in a different area of the code
* src/remote/remote_driver.c: Process all pending SASL data
virExec would only resolved the binary to $PATH if no env
variables were being set. Since there is no execvep() API
in POSIX, we use virFindFileInPath to manually resolve
the binary and then use execv() instead of execvp().
Add a new xen driver based on libxenlight [1], which is the primary
toolstack starting with Xen 4.1.0. The driver is stateful and runs
privileged only.
Like the existing xen-unified driver, the libxenlight driver is
accessed with xen:// URI. Driver selection is based on the status
of xend. If xend is running, the libxenlight driver will not load
and xen:// connections are handled by xen-unified. If xend is not
running *and* the libxenlight driver is available, xen://
connections are deferred to the libxenlight driver.
V6:
- Address several code style issues noted by Daniel Veillard
- Make drive work with xen:/// URI
- Hold domain object reference while domain is injected in
libvirt event loop. Race found and fixed by Markus Groß.
V5:
- Ensure events are unregistered when domain private data
is destroyed. Discovered and fixed by Markus Groß.
V4:
- Handle restart of libvirtd, reconnecting to previously
started domains
- Rebased to current master
- Tested against Xen 4.1 RC7-pre (c/s 22961:c5d121fd35c0)
V3:
- Reserve vnc port within driver when autoport=yes
V2:
- Update to Xen 4.1 RC6-pre (c/s 22940:5a4710640f81)
- Rebased to current master
- Plug memory leaks found by Stefano Stabellini and valgrind
- Handle SHUTDOWN_crash domain death event
[1] http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00436.html
Calling most hash APIs is not safe from inside of an iterator callback.
Exceptions are APIs that do not modify the hash table and removing
current hash entry from virHashFroEach callback.
This patch make all APIs which are not safe fail instead of just relying
on the callback being nice not calling any unsafe APIs.
Steps to reproduce this bug:
# cat test.sh
#! /bin/bash -x
virsh start domain
sleep 5
virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
# while true; do ./test.sh ; done
Then libvirtd will crash.
The reason is that:
we add a reference of obj when we open the monitor. We will reduce this
reference when we free the monitor.
If the reference of monitor is 0, we will free monitor automatically and
the reference of obj is reduced.
But in the function qemuDomainObjExitMonitorWithDriver(), we reduce this
reference again when the reference of monitor is 0.
It will cause the obj be freed in the function qemuDomainObjEndJob().
Then we start the domain again, and libvirtd will crash in the function
virDomainObjListSearchName(), because we pass a null pointer(obj->def->name)
to strcmp().
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
This bug was reported by Shi Jin(jinzishuai@gmail.com):
=============
# virsh attach-disk RHEL6RC /var/lib/libvirt/images/test3.img vdb \
--driver file --subdriver qcow2
Disk attached successfully
# virsh save RHEL6RC /var/lib/libvirt/images/memory.save
Domain RHEL6RC saved to /var/lib/libvirt/images/memory.save
# virsh restore /var/lib/libvirt/images/memory.save
error: Failed to restore domain from /var/lib/libvirt/images/memory.save
error: internal error unsupported driver name 'file'
for disk '/var/lib/libvirt/images/test3.img'
=============
We check the driver name when we start or restore VM, but we do
not check it while attaching a disk. This adds the same check on disk
driverName used in qemuBuildCommandLine to qemudDomainAttachDevice.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
As pointed out, locking the buffer from the signal handler
cannot been guaranteed to be safe, so to avoid any hazard
we prefer the trade off of dumping logs possibly messed up
by concurrent logging activity rather than risk a daemon
crash.
* src/util/logging.c: change virLogEmergencyDumpAll() to not
take any lock on the log buffer but reset buffer content variables
to an empty set before starting the actual dump.
Steps to reproduce this bug:
# virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
The domain has 2 cpus, and we try to set the third cpu online.
The qemu crashes, and this command will hang.
The reason is that the refs is not 1 when we unwatch the monitor.
We lock the monitor, but we do not unlock it. So virCondWait()
will be blocked.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
* Correct the documentation for cgroup: the swap_hard_limit indicates
mem+swap_hard_limit.
* Change cgroup private apis to: virCgroupGet/SetMemSwapHardLimit
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
I'm proposing we make use of $PCIDIR/reset in qemu-kvm to reset
devices on VM reset. We need to add it to libvirt's list of
files that get ownership for device assignment.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
xen-unstable c/s 21118:28e5409e3fb3 bumped sysctl version to 8.
xen-unstable c/s 21212:de94884a669c introduced CPU pools feature,
adding another member to xen_domctl_getdomaininfo struct. Add a
corresponding domctl v7 struct in xen hypervisor sub-driver and
detect sysctl v8 during initialization.
The virCond of the remote_thread_call struct was leaked in some
places. This results in leaking the underlying mutex. Which in turn
leaks a handle on Windows.
Reported by Aliaksandr Chabatar and Ihar Smertsin.
A bug in libnl (see https://bugzilla.redhat.com/show_bug.cgi?id=677724
and https://bugzilla.redhat.com/show_bug.cgi?id=677725) makes it very
easy to create a failure to connect to the netlink socket when trying
to open a macvtap network device ("type='direct'" in domain interface
XML). When that error occurred (during a call to libnl's nl_connect()
from libvirt's nlComm(), there was no log message, leading virsh (for
example) to report "unknown error".
There were two other cases in nlComm where an error in a libnl
function might return with failure but no error reported. In all three
cases, this patch logs a message which will hopefully be more useful.
Note that more detailed information about the failure might be
available from libnl's nl_geterror() function, but it calls
strerror(), which is not threadsafe, so we can't use it.
If pool xml has no definition for "port", then "Segmentation fault"
happens when jumping to "cleanup:" to do "VIR_FREE(port)", as "port"
was not initialized in this situation.
* src/conf/storage_conf.c
POSIX states about dd:
If the bs=expr operand is specified and no conversions other than
sync, noerror, or notrunc are requested, the data returned from each
input block shall be written as a separate output block; if the read
returns less than a full block and the sync conversion is not
specified, the resulting output block shall be the same size as the
input block. If the bs=expr operand is not specified, or a conversion
other than sync, noerror, or notrunc is requested, the input shall be
processed and collected into full-sized output blocks until the end of
the input is reached.
Since we aren't using conv=sync, there is no zero-padding, but our
use of bs= means that a short read results in a short write. If
instead we use ibs= and obs=, then short reads are collected and dd
only has to do a single write, which can make dd more efficient.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile):
Avoid 'dd bs=', since it can cause short writes.
The virCommandNewArgs() method would free the virCommandPtr
if it failed to add the args. This meant errors reported in
virCommandAddArgSet() were lost. Simply removing the check
for errors from the constructor means they can be reported
correctly later
The virCommandAddEnvPassCommon() method failed to check for
errors before reallocating the cmd->env array, causing a
potential SEGV if cmd was NULL
The virCommandAddArgSet() method needs to validate that at
least 1 element in 'val's parameter is non-NULL, otherwise
code like
cmd = virCommandNew(binary)
virCommandAddAtg(cmd, "foo")
Would end up trying todo execve("foo"), if binary was
NULL.
The virSetNonBlock() API only allows enabling non-blocking
operations. It doesn't allow turning blocking back on. Add
a new API to allow arbitrary toggling.
* src/libvirt_private.syms, src/util/util.h
src/util/util.c: Add virSetBlocking
This patch fix a simple bug in virDomainSetMemoryFlags function.
The patch sent before lacks the consideration of the case
where the driver doesn't support virDomainSetMemoryFlags API.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
The current LXC I/O controller looks for HUP to detect
when a guest has quit. This isn't reliable as during
initial bootup it is possible that 'init' will close
the console and let mingetty re-open it. The shutdown
of containers was also flakey because it only killed
the libvirt I/O controller and expected container
processes to gracefully follow.
Change the I/O controller such that when it see HUP
or an I/O error, it uses kill($PID, 0) to see if the
process has really quit.
Change the container shutdown sequence to use the
virCgroupKillPainfully function to ensure every
really goes away
This change makes the use of the 'cpu', 'devices'
and 'memory' cgroups controllers compulsory with
LXC
* docs/drvlxc.html.in: Document that certain cgroups
controllers are now mandatory
* src/lxc/lxc_controller.c: Check if PID is still
alive before quitting on I/O error/HUP
* src/lxc/lxc_driver.c: Use virCgroupKillPainfully
This is the part allowing to dynamically resize the debug log
buffer from it's default 64kB size. The buffer is now dynamically
allocated.
It adds a new API virLogSetBufferSize() which resizes the buffer
If passed a zero size, the buffer is deallocated and we do the small
optimization of not formatting messages which are not output anymore.
On the daemon side, it just adds a new option log_buffer_size to
libvirtd.conf and call virLogSetBufferSize() if needed
* src/util/logging.h src/util/logging.c src/libvirt_private.syms:
make buffer dynamic and add virLogSetBufferSize() internal API
* daemon/libvirtd.conf: document the new log_buffer_size option
* daemon/libvirtd.c: read and use the new log_buffer_size option
Outgoing migration still uses a Unix socket and or exec netcat until
the next patch.
* src/qemu/qemu_migration.c (qemuMigrationPrepareTunnel):
Replace Unix socket with simpler pipe.
Suggested by Paolo Bonzini.
The newly added call to qemuAuditNetDevice in qemuPhysIfaceConnect was
assuming that res_ifname (the name of the macvtap device) was always
valid, but this isn't the case. If openMacvtapTap fails, it always
returns NULL, which would result in a segv.
Since the audit log only needs a record of devices that are actually
sent to qemu, and a failure to open the macvtap device means that no
device will be sent to qemu, we can solve this problem by only doing
the audit if openMacvtapTap is successful (in which case res_ifname is
guaranteed valid).
Normally dnsmasq will send a default route (the address of the host in
the network definition) to any client requesting an address via
DHCP. On an isolated network this makes no sense, as we have iptables
to prevent any traffic going out via that interface, so anything sent
that way would be dropped anyway.
This extra/unusable default route becomes problematic if you have
setup a guest with multiple network interfaces, with one connected to
an isolated network and another that provides connectivity to the
outside (example - one interface directly connecting to a physical
interface via macvtap, with a second connected to an isolated network
so that the host and guest can communicate (macvtap doesn't support
guest<->host communication without an external switch that supports
vepa, or reflecting all traffic back)). In this case, if the guest
chooses the default route of the isolated network, the guest will not
be able to get network traffic beyond the host.
To prevent dnsmasq from sending a default route, you can tell it to
send 0 bytes of data for the default route option (option number 3)
with --dhcp-option=3 (normally the data to send for the option would
follow the option number; no extra data means "don't send this option").
I have checked on RHEL5 (a good representative of the oldest supported
libvirt platforms) and its version of dnsmasq (2.45) does support
--dhcp-option, so this shouldn't create any compatibility problems.
As pointed on CVE-2011-1146, some API forgot to check the read-only
status of the connection for entry point which modify the state
of the system or may lead to a remote execution using user data.
The entry points concerned are:
- virConnectDomainXMLToNative
- virNodeDeviceDettach
- virNodeDeviceReAttach
- virNodeDeviceReset
- virDomainRevertToSnapshot
- virDomainSnapshotDelete
* src/libvirt.c: fix the above set of entry points to error on read-only
connections
By default, all dnsmasq processes share the same leases file. libvirt
also uses the --dhcp-lease-max option to control the maximum number of
leases allowed. The problem is that libvirt puts in a number equal to
the number of addresses in the range for the one network handled by a
single instance of dnsmasq, but dnsmasq checks the total number of
leases in the file (which could potentially contain many more).
The solution is to tell each instance of dnsmasq to create and use its
own leases file. (/var/lib/libvirt/network/<net-name>.leases).
This file is created by dnsmasq when it starts, but not deleted when
it exists. This is fine when the network is just being stopped, but if
the leases file was left around when a network was undefined, we could
end up with an ever-increasing number of dead files - instead, we
explicitly unlink the leases file when a network is undefined.
Note that Ubuntu carries a patch against an older version of libvirt for this:
hhttps://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/713071
ttp://bazaar.launchpad.net/~serge-hallyn/ubuntu/maverick/libvirt/bugall/revision/109
I was certain I'd also seen discussion of this on libvir-list or
libvirt-users, but couldn't find it.
This fixes a regression introduced in commit ad48df, and reported on
the libvirt-users list:
https://www.redhat.com/archives/libvirt-users/2011-March/msg00018.html
The problem in that commit was that we began searching a list of ip
address definitions (rather than just having one) to look for a dhcp
range or static host; when we didn't find any, our pointer (ipdef) was
left at NULL, and when ipdef was NULL, we returned without starting up
dnsmasq.
Previously dnsmasq was started even without any dhcp ranges or static
entries, because it's still useful for DNS services.
Another problem I noticed while investigating was that, if there are
IPv6 addresses, but no IPv4 addresses of any kind, we would jump out
at an ever higher level in the call chain.
This patch does the following:
1) networkBuildDnsmasqArgv() = all uses of ipdef are protected from
NULL dereference. (this patch doesn't change indentation, to make
review easier. The next patch will change just the
indentation). ipdef is intended to point to the first IPv4 address
with DHCP info (or the first IPv4 address if none of them have any
dhcp info).
2) networkStartDhcpDaemon() = if the loop looking for an ipdef with
DHCP info comes up empty, we then grab the first IPv4 def from the
list. Also, instead of returning if there are no IPv4 defs, we just
return if there are no IP defs at all (either v4 or v6). This way a
network that is IPv6-only will still get dnsmasq listening for DNS
queries.
3) in networkStartNetworkDaemon() - we will startup dhcp not just if there
are any IPv4 addresses, but also if there are any IPv6 addresses.
Currently a single storage volume with a broken backing file will disable the
whole storage pool. This can happen when the backing file is on some
unavailable network storage or if the backing volume is deleted, while the
storage volumes using it remain.
Since the storage pool can not be re-activated, re-creating the missing
or deleting the now useless volumes using libvirt only is not possible.
Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file,
if the backing file format is not explicitly specified. If the
backing file is created using
kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
no error is detected at this stage.
The new return code -3 signals that the backing file could not be
opened.
2. The backingStore.format must be >= 0, since values < 0 would break
virStorageVolTargetDefFormat() when dumping the XML data such as
<format type='...'/>
Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file
and thus always detects a missing backing file.
Since it "only" updates the capacity, allocation, owner, group, mode
and SELinux label, just ignore errors at this stage, print an error
message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at least
vol-destroy and pool-refresh do work now.
To reproduce:
dir=$(mktemp -d)
virsh pool-create-as tmp dir '' '' '' '' "$dir"
virsh vol-create-as --format qcow2 tmp back 1G
virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
virsh vol-delete --pool tmp back
virsh pool-refresh tmp
After the last step, the pool will be gone (because it was not persistent). As
long as the now broken image stays in the directory, you will not be able to
re-create or re-start the pool.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch introduces a new libvirt API (virDomainSetMemoryFlags) and
a flag (virDomainMemoryModFlags).
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Opening raw network devices with the intent of passing those fds to
qemu is worth an audit point. This makes a multi-part audit: first,
we audit the device(s) that libvirt opens on behalf of the MAC address
of a to-be-created interface (which can independently succeed or
fail), then we audit whether qemu actually started the network device
with the same MAC (so searching backwards for successful audits with
the same MAC will show which fd(s) qemu is actually using). Note that
it is possible for the fd to be successfully opened but no attempt
made to pass the fd to qemu (for example, because intermediate
nwfilter operations failed) - no interface start audit will occur in
that case; so the audit for a successful opened fd does not imply
rights given to qemu unless there is a followup audit about the
attempt to start a new interface.
Likewise, when a network device is hot-unplugged, there is only one
audit message about the MAC being discontinued; again, searching back
to the earlier device open audits will show which fds that qemu quits
using (and yes, I checked via /proc/<qemu-pid>/fd that qemu _does_
close out the fds associated with an interface on hot-unplug). The
code would require much more refactoring to be able to definitively
state which device(s) were discontinued at that point, since we
currently don't record anywhere in the XML whether /dev/vhost-net was
opened for a given interface.
* src/qemu/qemu_audit.h (qemuAuditNetDevice): New prototype.
* src/qemu/qemu_audit.c (qemuAuditNetDevice): New function.
* src/qemu/qemu_command.h (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Adjust prototype.
* src/qemu/qemu_command.c (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Add audit points and
adjust parameters.
(qemuBuildCommandLine): Adjust caller.
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Likewise.
Since libvirt always passes /dev/net/tun to qemu via fd, we should
never trigger the cases where qemu tries to directly open the
device. Therefore, it is safer to deny the cgroup device ACL.
* src/qemu/qemu_cgroup.c (defaultDeviceACL): Remove /dev/net/tun.
* src/qemu/qemu.conf (cgroup_device_acl): Reflect this change.
qemu driver in libvirt gained support for creating domain snapshots
almost a year ago in libvirt 0.8.0. Since then we enabled QMP support
for qemu >= 0.13.0 but QMP equivalents of {save,load,del}vm commands are
not implemented in current qemu (0.14.0) so the domain snapshot support
is not very useful.
This patch detects when the appropriate QMP command is not implemented
and tries to use human-monitor-command (aka HMP passthrough) to run
it's HMP equivalent.
JSON monitor command implementation can now just directly call text
monitor implementation and it will be automatically encapsulated into
QMP's human-monitor-command.
Some qemu monitor event handlers were issuing inadequate warning when
virDomainSaveStatus() failed. They copied the message from I/O error
handler without customizing it to provide better information on why
virDomainSaveStatus() was called.
For newer qemu-img, the help string for "backing file format" is
"[-F backing_fmt]".
Fix the wrong logic error by commit e997c268.
* src/storage/storage_backend.c