Commit Graph

4403 Commits

Author SHA1 Message Date
Jiri Denemark
81711cee34 qemu: Escape snapshot name passed to {save,load,del}vm 2011-03-10 14:36:05 +01:00
Jiri Denemark
89241fe0d1 qemu: Fallback to HMP for snapshot commands
qemu driver in libvirt gained support for creating domain snapshots
almost a year ago in libvirt 0.8.0. Since then we enabled QMP support
for qemu >= 0.13.0 but QMP equivalents of {save,load,del}vm commands are
not implemented in current qemu (0.14.0) so the domain snapshot support
is not very useful.

This patch detects when the appropriate QMP command is not implemented
and tries to use human-monitor-command (aka HMP passthrough) to run
it's HMP equivalent.
2011-03-10 14:36:05 +01:00
Jiri Denemark
b3c6ec03b8 qemu: Rename qemuMonitorCommandWithHandler as qemuMonitorText*
To make it more obvious that it is only used for text monitor. The
naming also matches the style of qemuMonitorTextCommandWithFd.
2011-03-10 14:36:05 +01:00
Jiri Denemark
39b4f4aab2 qemu: Rename qemuMonitorCommand{,WithFd} as qemuMonitorHMP*
So that it's obvious that they are supposed to be used with HMP commands.
2011-03-10 14:36:04 +01:00
Jiri Denemark
266265a560 qemu: Setup infrastructure for HMP passthrough
JSON monitor command implementation can now just directly call text
monitor implementation and it will be automatically encapsulated into
QMP's human-monitor-command.
2011-03-10 14:36:04 +01:00
Jiri Denemark
3b8bf4a3a9 qemu: Fix warnings in event handlers
Some qemu monitor event handlers were issuing inadequate warning when
virDomainSaveStatus() failed. They copied the message from I/O error
handler without customizing it to provide better information on why
virDomainSaveStatus() was called.
2011-03-10 14:18:37 +01:00
Osier Yang
d999376954 storage: Update qemu-img flag checking
For newer qemu-img, the help string for "backing file format" is
"[-F backing_fmt]".

Fix the wrong logic error by commit e997c268.

* src/storage/storage_backend.c
2011-03-10 15:02:28 +08:00
Osier Yang
e997c268ef qemu: Replace deprecated option of qemu-img
qemu-img silently disable "-e", so we can't use it for volume
encryption anymore, change it into "-o encryption=on" if qemu
supports "-o" option.
2011-03-10 10:05:14 +08:00
Eric Blake
340ab27dd2 audit: also audit cgroup ACL permissions
* src/qemu/qemu_audit.h (qemuAuditCgroupMajor)
(qemuAuditCgroupPath): Add parameter.
* src/qemu/qemu_audit.c (qemuAuditCgroupMajor)
(qemuAuditCgroupPath): Add 'acl=rwm' to cgroup audit entries.
* src/qemu/qemu_cgroup.c: Update clients.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Likewise.
2011-03-09 11:36:59 -07:00
Eric Blake
5564c57528 cgroup: allow fine-tuning of device ACL permissions
Adding audit points showed that we were granting too much privilege
to qemu; it should not need any mknod rights to recreate any
devices.  On the other hand, lxc should have all device privileges.
The solution is adding a flag parameter.

This also lets us restrict write access to read-only disks.

* src/util/cgroup.h (virCgroup*Device*): Adjust prototypes.
* src/util/cgroup.c (virCgroupAllowDevice)
(virCgroupAllowDeviceMajor, virCgroupAllowDevicePath)
(virCgroupDenyDevice, virCgroupDenyDeviceMajor)
(virCgroupDenyDevicePath): Add parameter.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update clients.
* src/lxc/lxc_controller.c (lxcSetContainerResources): Likewise.
* src/qemu/qemu_cgroup.c: Likewise.
(qemuSetupDiskPathAllow): Also, honor read-only disks.
2011-03-09 11:35:36 -07:00
Eric Blake
48096a0064 audit: rename remaining qemu audit functions
Also add ATTRIBUTE_NONNULL markers.

* src/qemu/qemu_audit.h: The pattern qemuDomainXXXAudit is
inconsistent; prefer qemuAuditXXX instead.
* src/qemu/qemu_audit.c: Reflect the renames.
* src/qemu/qemu_driver.c: Likewise.
* src/qemu/qemu_hotplug.c: Likewise.
* src/qemu/qemu_migration.c: Likewise.
* src/qemu/qemu_process.c: Likewise.
2011-03-09 11:35:20 -07:00
Eric Blake
f2512684ad audit: also audit cgroup controller path
Although the cgroup device ACL controller path can be worked out
by researching the code, it is more efficient to include that
information directly in the audit message.

* src/util/cgroup.h (virCgroupPathOfController): New prototype.
* src/util/cgroup.c (virCgroupPathOfController): Export.
* src/libvirt_private.syms: Likewise.
* src/qemu/qemu_audit.c (qemuAuditCgroup): Use it.
2011-03-09 10:19:17 -07:00
Eric Blake
d04916faae audit: split cgroup audit types to allow more information
Device names can be manipulated, so it is better to also log
the major/minor device number corresponding to the cgroup ACL
changes that libvirt made.  This required some refactoring
of the relatively new qemu cgroup audit code.

Also, qemuSetupChardevCgroup was only auditing on failure, not success.

* src/qemu/qemu_audit.h (qemuDomainCgroupAudit): Delete.
(qemuAuditCgroup, qemuAuditCgroupMajor, qemuAuditCgroupPath): New
prototypes.
* src/qemu/qemu_audit.c (qemuDomainCgroupAudit): Rename...
(qemuAuditCgroup): ...and drop a parameter.
(qemuAuditCgroupMajor, qemuAuditCgroupPath): New functions, to
allow listing device major/minor in audit.
(qemuAuditGetRdev): New helper function.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust callers.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupHostUsbDeviceCgroup, qemuSetupCgroup)
(qemuTeardownDiskPathDeny): Likewise.
(qemuSetupChardevCgroup): Likewise, fixing missing audit.
2011-03-09 09:08:10 -07:00
Eric Blake
30ad48836e audit: tweak audit messages to match conventions
* src/qemu/qemu_audit.c (qemuDomainHostdevAudit): Avoid use of
"type", which has a pre-defined meaning.
(qemuDomainCgroupAudit): Likewise, as well as "item".
2011-03-09 08:11:31 -07:00
Eric Blake
b12a02803e docs: silence warnings about generated API docs
I noticed these while testing 'make dist'.

Parsing ./../src/util/event.c
Function comment for virEventRegisterDefaultImpl lacks description of return value
Function comment for virEventRunDefaultImpl lacks description of return value
Parsing ./../src/util/virterror.c
Missing comment for function virSetErrorLogPriorityFunc

* src/util/event.c (virEventRegisterDefaultImpl)
(virEventRunDefaultImpl): Document return types.
* src/util/virterror.c (virSetErrorLogPriorityFunc): Provide docs.
2011-03-09 08:07:09 -07:00
Cole Robinson
9189301426 Don't overwrite virRun error messages
virRun gives pretty useful error output, let's not overwrite it unless there
is a good reason. Some places were providing more information about what
the commands were _attempting_ to do, however that's usually less useful from
a debugging POV than what actually happened.
2011-03-09 08:53:12 -05:00
Guido Günther
ae1c5a936c libvirtd: Remove indirect linking
as described at
http://wiki.debian.org/ToolChain/DSOLinking
https://fedoraproject.org/wiki/UnderstandingDSOLinkChange

otherwise the build fails on current Debian unstable with:

CCLD   libvirtd
/usr/bin/ld: ../src/.libs/libvirt_driver_lxc.a(libvirt_driver_lxc_la-lxc_container.o): undefined reference to symbol 'capng_apply'
/usr/bin/ld: note: 'capng_apply' is defined in DSO //usr/lib/libcap-ng.so.0 so try adding it to the linker command line

CCLD   libvirtd
/usr/bin/ld: ../src/.libs/libvirt_driver_storage.a(libvirt_driver_storage_la-storage_backend.o): undefined reference to symbol 'fgetfilecon'
/usr/bin/ld: note: 'fgetfilecon' is defined in DSO //lib/libselinux.so.1 so try adding it to the linker command line
//lib/libselinux.so.1: could not read symbols: Invalid operation

and similar errors.
2011-03-09 13:59:58 +01:00
Hu Tao
83d35233a9 Fix a wrong error message thrown to user
* src/qemu/qemu_driver.c: qemuDomainUpdateDeviceFlags() is not disk
  specific as the message suggests
2011-03-09 20:09:25 +08:00
Eric Blake
3dfd4ea398 build: avoid compiler warning on cygwin
On cygwin:

  CC       libvirt_driver_security_la-security_dac.lo
security/security_dac.c: In function 'virSecurityDACSetProcessLabel':
security/security_dac.c:618: warning: format '%d' expects type 'int', but argument 7 has type 'uid_t' [-Wformat]

We've done this before (see src/util/util.c).

* src/security/security_dac.c (virSecurityDACSetProcessLabel): On
cygwin, uid_t is a 32-bit long.
2011-03-08 21:56:18 -07:00
Eric Blake
b1a5aefcee build: fix build on cygwin
On cygwin:

  CC        libvirt_util_la-cgroup.lo
util/cgroup.c: In function 'virCgroupKillRecursiveInternal':
util/cgroup.c:1458: warning: implicit declaration of function 'virCgroupNew' [-Wimplicit-function-declaration]

* src/util/cgroup.c (virCgroupKill): Don't build on platforms
where virCgroupNew is unsupported.
2011-03-08 21:44:24 -07:00
Daniel Veillard
d299e1d08e Fix build on cygwin
Apparently some signals found on Unix are not exposed, this led
to a compilation failure
* src/util/logging.c: make code related to each signal dependant
  upon the definition of that signal
2011-03-08 16:01:25 +08:00
Wen Congyang
0e29f71135 support to detach USB disk
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-07 11:40:12 -07:00
Wen Congyang
8f338032b9 rename qemuDomainDetachSCSIDiskDevice to qemuDomainDetachDiskDevice
The way to detach a USB disk is the same as that to detach a SCSI
disk. Rename this function and we can use it to detach a USB disk.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-07 11:28:15 -07:00
Cole Robinson
56a4d8127f qemu_hotplug: Reword error if spice password change not available
Currently it sounds like spice is completely unsupported, which is
confusing.
2011-03-07 13:26:46 -05:00
Wen Congyang
ac9ee6b5e0 unlock eventLoop before calling callback function
When I use newest libvirt to save a domain, libvirtd will be deadlock.
Here is the output of gdb:
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f972a1fc710 (LWP 30265))]#0  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
    at qemu/qemu_driver.c:2074
    ret=0x7f972a1fbbe0) at remote.c:2273
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f9730bcd710 (LWP 30261))]#0  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
(gdb) p *(virMutexPtr)0x6fdd60
$2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 30261, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\065v\000\000\001", '\000' <repeats 26 times>, __align = 2}}
(gdb) p *(virMutexPtr)0x1a63ac0
$3 = {lock = {__data = {__lock = 2, __count = 0, __owner = 30265, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\071v\000\000\001", '\000' <repeats 26 times>, __align = 2}}
(gdb) info threads
  7 Thread 0x7f9730bcd710 (LWP 30261)  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  6 Thread 0x7f972bfff710 (LWP 30262)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f972b5fe710 (LWP 30263)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f972abfd710 (LWP 30264)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 3 Thread 0x7f972a1fc710 (LWP 30265)  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f97297fb710 (LWP 30266)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1 Thread 0x7f9737aac800 (LWP 30260)  0x000000351fe0803d in pthread_join () from /lib64/libpthread.so.0

The reason is that we will try to lock some object in callback function, and we may call event API with locking the same object.
In the function virEventDispatchHandles(), we unlock eventLoop before calling callback function. I think we should
do the same thing in the function virEventCleanupTimeouts() and virEventCleanupHandles().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2011-03-07 10:05:17 -07:00
Daniel P. Berrange
2ed6cc7bec Expose event loop implementation as a public API
Not all applications have an existing event loop they need
to integrate with. Forcing them to implement the libvirt
event loop integration APIs is an undue burden. This just
exposes our simple poll() based implementation for apps
to use. So instead of calling

   virEventRegister(....callbacks...)

The app would call

   virEventRegisterDefaultImpl()

And then have a thread somewhere calling

    static bool quit = false;
    ....
    while (!quit)
      virEventRunDefaultImpl()

* daemon/libvirtd.c, tools/console.c,
  tools/virsh.c: Convert to public event loop APIs
* include/libvirt/libvirt.h.in, src/libvirt_private.syms: Add
  virEventRegisterDefaultImpl and virEventRunDefaultImpl
* src/util/event.c: Implement virEventRegisterDefaultImpl
  and virEventRunDefaultImpl using poll() event loop
* src/util/event_poll.c: Add full error reporting
* src/util/virterror.c, include/libvirt/virterror.h: Add
  VIR_FROM_EVENTS
2011-03-07 14:16:13 +00:00
Daniel P. Berrange
343eaa150b Move event code out of the daemon/ into src/util/
The event loop implementation is used by more than just the
daemon, so move it into the shared area.

* daemon/event.c, src/util/event_poll.c: Renamed
* daemon/event.h, src/util/event_poll.h: Renamed
* tools/Makefile.am, tools/console.c, tools/virsh.c: Update
  to use new virEventPoll APIs
* daemon/mdns.c, daemon/mdns.c, daemon/Makefile.am: Update
  to use new virEventPoll APIs
2011-03-07 14:16:13 +00:00
Daniel Veillard
bcb40b852c Cleaning up some of the logging code
* src/util/logging.c: fix virLogDumpAllFD() to avoid snprintf, simplify
  the code and provide more useful signal descriptions. Also remove an
  unused variable.
2011-03-07 21:23:53 +08:00
Osier Yang
82dfc6f38e qemu: Support vram for video of qxl type
For qemu names the primary vga as "qxl-vga":

  1) if vram is specified for 2nd qxl device:

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,vram_size=$SIZE,...

  2) if vram is not specified for 2nd qxl device, (use the default
     set by global):

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,...

For qemu names all qxl devices as "qxl":

  1) if vram is specified for 2nd qxl device:

    -vga qxl -global qxl.vram_size=$SIZE \
    -device qxl,id=video1,vram_size=$SIZE ...

  2) if vram is not specified for 2nd qxl device:

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,...

"-global" is the only way to define vram_size for the primary qxl
device, regardless of how qemu names it, (It's not good a good
way, as original idea of "-global" is to set a global default for
a driver property, but to specify vram for first qxl device, we
have to use it).

For other qxl devices, as they are represented by "-device", could
specify it directly and seperately for each, and it overrides the
default set by "-global" if specified.

v1 - v2:
  * modify "virDomainVideoDefaultRAM" so that it returns 16M as the
    default vram_size for qxl device.

  * vram_size * 1024 (qemu accepts bytes for vram_size).

  * apply default vram_size for qxl device for which vram_size is
    not specified.

  * modify "graphics-spice" tests (more sensiable vram_size)

  * Add an argument of virDomainDefPtr type for qemuBuildVideoDevStr,
    to use virDomainVideoDefaultRAM in qemuBuildVideoDevStr).

v2 - v3:
  * Modify default video memory size for qxl device from 16M to 24M

  * Update codes to be consistent with changes on qemu_capabilities.*
2011-03-06 22:00:27 +08:00
Phil Petty
5a81401235 fixes for several memory leaks
Signed-off-by: Eric Blake <eblake@redhat.com>
2011-03-04 09:52:12 -07:00
Daniel Veillard
398553c157 Add an an internal API for emergency dump of debug buffer
virLogEmergencyDumpAll() allows to dump the content of the
debug buffer from within a signal handler. It saves to all
log file or stderr if none is found
* src/util/logging.h src/util/logging.c: add the new API
  and cleanup the old virLogDump code
* src/libvirt_private.syms: exports it as a private symbol
2011-03-04 22:43:55 +08:00
Daniel Veillard
35708ec151 Fix a counter bug in the log buffer
* src/util/logging.c: the start pointer need to wrap around too
2011-03-04 22:43:55 +08:00
Daniel Veillard
8b9a1190c1 Force all logs to go to the round robbin memory buffer
Initially only the log actually written out by libvirt were
saved on the memory buffer, this patch forces all informations
including info and debug to be saved in memory too. This is
useful to get full data in case of crash.
2011-03-04 22:43:55 +08:00
Laine Stump
f8ac67909d qemu: avoid corruption of domain hashtable and misuse of freed domains
This was also found while investigating

   https://bugzilla.redhat.com/show_bug.cgi?id=670848

An EOF on a domain's monitor socket results in an event being queued
to handle the EOF. The handler calls qemuProcessHandleMonitorEOF. If
it is a transient domain, this leads to a call to
virDomainRemoveInactive, which removes the domain from the driver's
hashtable and unref's it. Nowhere in this code is the qemu driver lock
acquired.

However, all modifications to the driver's domain hashtable *must* be
done while holding the driver lock, otherwise the hashtable can become
corrupt, and (even more likely) another thread could call a different
hashtable function and acquire a pointer to the domain that is in the
process of being destroyed.

To prevent such a disaster, qemuProcessHandleMonitorEOF must get the
qemu driver lock *before* it gets the DomainObj's lock, and hold it
until it is finished with the DomainObj. This guarantees that nobody
else modifies the hashtable at the same time, and that anyone who had
already gotten the DomainObj from the hashtable prior to this call has
finished with it before we remove/destroy it.
2011-03-04 08:13:11 -05:00
Laine Stump
e570ca1246 qemu: Add missing lock of virDomainObj before calling virDomainUnref
This was found while researching the root cause of:

https://bugzilla.redhat.com/show_bug.cgi?id=670848

virDomainUnref should only be called with the lock held for the
virDomainObj in question. However, when a transient qemu domain gets
EOF on its monitor socket, it queues an event which frees the monitor,
which unref's the virDomainObj without first locking it. If another
thread has already locked the virDomainObj, the modification of the
refcount could potentially be corrupted. In an extreme case, it could
also be potentially unlocked by virDomainObjFree, thus left open to
modification by anyone else who would have otherwise waited for the
lock (not to mention the fact that they would be accessing freed
data!).

The solution is to have qemuMonitorFree lock the domain object right
before unrefing it. Since the caller to qemuMonitorFree doesn't expect
this lock to be held, if the refcount doesn't go all the way to 0,
qemuMonitorFree must unlock it after the unref.
2011-03-04 08:12:58 -05:00
Matthias Bolte
b31d6c1269 esx: Escape password for XML
Passwords are allowed to contain <, >, &, ', " characters.
Those need to be replaced by the corresponding entities.

Reported by Hereward Cooper.
2011-03-03 22:18:09 +01:00
Eric Blake
d152f64760 util: correct retry path in virFileOperation
In virFileOperation, the parent does a fallback to a non-fork
attempt if it detects that the child returned EACCES.  However,
the child was calling _exit(-EACCES), which does _not_ appear
as EACCES in the parent.

* src/util/util.c (virFileOperation): Correctly pass EACCES from
child to parent.
2011-03-03 08:12:08 -07:00
Soren Hansen
e5f3b90e97 Pass virSecurityManagerPtr to virSecurityDAC{Set, Restore}ChardevCallback
virSecurityDAC{Set,Restore}ChardevCallback expect virSecurityManagerPtr,
but are passed virDomainObjPtr instead. This makes
virSecurityDACSetChardevLabel set a wrong uid/gid on chardevs. This
patch fixes this behaviour.

Signed-off-by: Soren Hansen <soren@linux2go.dk>
2011-03-03 08:08:16 -07:00
Jiri Denemark
9677cd33ee util: Allow removing hash entries in virHashForEach
This fixes a possible crash of libvirtd during its startup. When qemu
driver reconnects to running domains, it iterates over all domain
objects in a hash. When reconnecting to an associated qemu monitor
fails and the domain is transient, it's immediately removed from the
hash. Despite the fact that it's explicitly forbidden to do so. If
libvirtd is lucky enough, virHashForEach will access random memory when
the callback finishes and the deamon will crash.

Since it's trivial to fix virHashForEach to allow removal of hash
entries while iterating through them, I went this way instead of fixing
qemuReconnectDomain callback (and possibly others) to avoid deleting the
entries.
2011-03-03 15:22:16 +01:00
Daniel P. Berrange
d6d30cd4ae Attempt to improve an error message
Replace the 'Unknown failure' error message with something a
little bit more descriptive.

* src/util/virterror.c: Improve error message
2011-03-03 14:17:12 +00:00
Eric Blake
4f805dcdc4 qemu: avoid double close on domain restore
qemudDomainSaveImageStartVM was evil - it closed the incoming fd
argument on some, but not all, code paths, without informing the
caller about that action.  No wonder that this resulted in
double-closes: https://bugzilla.redhat.com/show_bug.cgi?id=672725

* src/qemu/qemu_driver.c (qemudDomainSaveImageStartVM): Alter
signature, to avoid double-close.
(qemudDomainRestore, qemudDomainObjRestore): Update callers.
2011-03-02 08:58:49 -07:00
Daniel P. Berrange
6a8ef183be add additional event debug points
Followup to commit 2222bd24
2011-03-01 16:54:20 -07:00
Eric Blake
7c6b22c4d5 qemu: only request sound cgroup ACL when required
When a SPICE or VNC graphics controller is present, and sound is
piggybacked over a channel to the graphics device rather than
directly accessing host hardware, then there is no need to grant
host hardware access to that qemu process.

* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Prevent sound with
spice, and with vnc when vnc_allow_host_audio is 0.
Reported by Daniel Berrange.
2011-02-28 09:42:25 -07:00
Daniel P. Berrange
3c37a171a2 Add check for kill() to fix build of cgroups on win32
The kill() function doesn't exist on Win32, so it needs to be
checked for at build time & code disabled in cgroups

* configure.ac: Check for kill()
* src/util/cgroup.c: Stub out virCGroupKill* functions
  when kill() isn't available
2011-02-28 14:13:58 +00:00
Michal Novotny
3ee7cf6c9b Add support for multiple serial ports into the Xen driver
this is the patch to add support for multiple serial ports to the
libvirt Xen driver. It support both old style (serial = "pty") and
new style (serial = [ "/dev/ttyS0", "/dev/ttyS1" ]) definition and
tests for xml2sexpr, sexpr2xml and xmconfig have been added as well.

Written and tested on RHEL-5 Xen dom0 and working as designed but
the Xen version have to have patch for RHBZ #614004 but this patch
is for upstream version of libvirt.

Also, this patch is addressing issue described in RHBZ #670789.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
2011-02-25 11:33:27 -07:00
Michal Novotny
79c3fe4d16 Fix port value parsing for serial and parallel ports
this is the patch to fix the virDomainChrDefParseTargetXML() functionality
to parse the target port from XML if available. This is necessary for
multiple serial port support which is the second part of this patch.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
2011-02-25 11:33:27 -07:00
Daniel P. Berrange
33191b419c Add APIs for killing off processes inside a cgroup
The virCgroupKill method kills all PIDs found in a cgroup

The virCgroupKillRecursively method does this recursively
for child cgroups.

The virCgroupKillPainfully method does a recursive kill
several times in a row until everything has really died
2011-02-25 14:21:30 +00:00
Daniel P. Berrange
16ba2aafc4 Allow hash tables to use generic pointers as keys
Relax the restriction that the hash table key must be a string
by allowing an arbitrary hash code generator + comparison func
to be provided

* util/hash.c, util/hash.h: Allow any pointer as a key
* internal.h: Include stdbool.h as standard.
* conf/domain_conf.c, conf/domain_conf.c,
  conf/nwfilter_params.c, nwfilter/nwfilter_gentech_driver.c,
  nwfilter/nwfilter_gentech_driver.h, nwfilter/nwfilter_learnipaddr.c,
  qemu/qemu_command.c, qemu/qemu_driver.c,
  qemu/qemu_process.c, uml/uml_driver.c,
  xen/xm_internal.c: s/char */void */ in hash callbacks
2011-02-25 13:00:54 +00:00
Daniel P. Berrange
6952708ca4 Remove deallocator parameter from hash functions
Since the deallocator is passed into the constructor of
a hash table it is not desirable to pass it into each
function again. Remove it from all functions, but provide
a virHashSteal to allow a item to be removed from a hash
table without deleteing it.

* src/util/hash.c, src/util/hash.h: Remove deallocator
  param from all functions. Add virHashSteal
* src/libvirt_private.syms: Add virHashSteal
* src/conf/domain_conf.c, src/conf/nwfilter_params.c,
  src/nwfilter/nwfilter_learnipaddr.c,
  src/qemu/qemu_command.c, src/xen/xm_internal.c: Update
  for changed hash API
2011-02-25 13:00:46 +00:00
Philipp Hahn
0905d1ee95 Fix spelling mistake: seek
Replace wrong "set" by correct "seek" in error message.

Signed-off-by: Philipp Hahn <hahn@univention.de>
2011-02-24 14:19:15 -07:00
Eric Blake
1aaef5ad72 audit: audit qemu pci and usb device passthrough
* src/qemu/qemu_audit.h (qemuDomainHostdevAudit): New prototype.
* src/qemu/qemu_audit.c (qemuDomainHostdevAudit): New function.
(qemuDomainStartAudit): Call as appropriate.
* src/qemu/qemu_hotplug.c (qemuDomainAttachHostPciDevice)
(qemuDomainAttachHostUsbDevice, qemuDomainDetachHostPciDevice)
(qemuDomainDetachHostUsbDevice): Likewise.
2011-02-24 13:32:17 -07:00
Eric Blake
e25f2c74df audit: audit qemu memory and vcpu adjusments
* src/qemu/qemu_audit.h (qemuDomainMemoryAudit)
(qemuDomainVcpuAudit): New prototypes.
* src/qemu/qemu_audit.c (qemuDomainResourceAudit)
(qemuDomainMemoryAudit, qemuDomainVcpuAudit): New functions.
(qemuDomainStartAudit): Call as appropriate.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainHotplugVcpus): Likewise.
2011-02-24 13:32:17 -07:00
Eric Blake
6bb98d419f audit: add qemu hooks for auditing cgroup events
* src/qemu/qemu_audit.h (qemuDomainCgroupAudit): New prototype.
* src/qemu/qemu_audit.c (qemuDomainCgroupAudit): Implement it.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Add audit.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupChardevCgroup, qemuSetupHostUsbDeviceCgroup)
(qemuSetupCgroup, qemuTeardownDiskPathDeny): Likewise.
2011-02-24 13:32:15 -07:00
Eric Blake
b4d3434fc2 audit: prepare qemu for listing vm in cgroup audits
* src/qemu/qemu_cgroup.h (struct qemuCgroupData): New helper type.
(qemuSetupDiskPathAllow, qemuSetupChardevCgroup)
(qemuTeardownDiskPathDeny): Drop unneeded prototypes.
(qemuSetupDiskCgroup, qemuTeardownDiskCgroup): Adjust prototype.
* src/qemu/qemu_cgroup.c
(qemuSetupDiskPathAllow, qemuSetupChardevCgroup)
(qemuTeardownDiskPathDeny): Mark static and use new type.
(qemuSetupHostUsbDeviceCgroup): Use new type.
(qemuSetupDiskCgroup): Alter signature.
(qemuSetupCgroup): Adjust caller.
* src/qemu/qemu_hotplug.c (qemuDomainAttachHostUsbDevice)
(qemuDomainDetachPciDiskDevice, qemuDomainDetachSCSIDiskDevice):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainAttachDevice)
(qemuDomainUpdateDeviceFlags): Likewise.
2011-02-24 13:31:05 -07:00
Eric Blake
061738764d cgroup: determine when skipping non-devices
* src/util/cgroup.c (virCgroupAllowDevicePath)
(virCgroupDenyDevicePath): Don't fail with EINVAL for
non-devices.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update caller.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupChardevCgroup, qemuSetupHostUsbDeviceCgroup)
(qemuSetupCgroup, qemuTeardownDiskPathDeny): Likewise.
2011-02-24 13:31:05 -07:00
Eric Blake
fd21ecfd49 virExec: avoid uninitialized memory usage
valgrind warns:

==21079== Syscall param rt_sigaction(act->sa_mask) points to uninitialised byte(s)
==21079==    at 0x329840F63E: __libc_sigaction (sigaction.c:67)
==21079==    by 0x4E5A8E7: __virExec (util.c:661)

Regression introduced in commit ab07533e.  Technically, sa_mask
shouldn't affect operation if sa_flags selects sa_handler, and
sa_handler selects SIG_IGN, but better safe than sorry.

* src/util/util.c (__virExec): Supply missing sigemptyset.
2011-02-24 13:12:52 -07:00
Daniel P. Berrange
4f2094a8a6 Allow 32-on-64 execution for LXC guests
Using the 'personality(2)' system call, we can make a container
on an x86_64 host appear to be i686. Likewise for most other
Linux 64bit arches.

* src/lxc/lxc_conf.c: Fill in 32bit capabilities for x86_64 hosts
* src/lxc/lxc_container.h, src/lxc/lxc_container.c: Add API to
  check if an arch has a 32bit alternative
* src/lxc/lxc_controller.c: Set the process personality when
  starting guest
2011-02-24 12:04:29 +00:00
Daniel P. Berrange
35416720c2 Put <stdbool.h> into internal.h so it is available everywhere
Remove the <stdbool.h> header from all source files / headers
and just put it into internal.h

* src/internal.h: Add <stdbool.h>
2011-02-24 12:04:06 +00:00
Jiri Denemark
9fc4b6a606 qemu: Switch over command line capabilities to virBitmap
This is done for two reasons:
- we are getting very close to 64 flags which is the maximum we can use
  with unsigned long long
- by using LL constants in enum we already violates C99 constraint that
  enum values have to fit into int
2011-02-24 12:10:00 +01:00
Jiri Denemark
23d935bd97 qemu: Rename qemud\?CmdFlags to qemuCaps
The new name complies more with the fact that it contains a set of
qemuCapsFlags.
2011-02-24 12:08:34 +01:00
Jiri Denemark
a96d08dc53 qemu: Use helper functions for handling cmd line capabilities
Three new functions (qemuCapsSet, qemuCapsClear, and qemuCapsGet) were
introduced replacing direct bit operations.
2011-02-24 12:07:06 +01:00
Jiri Denemark
21642e82b1 qemu: Rename QEMUD_CMD_FLAG_* to QEMU_CAPS_*
The new names comply more with the fact that they are all members of
enum qemuCapsFlags.
2011-02-24 12:05:39 +01:00
Jiri Denemark
40641f2a63 util: Add API for converting virBitmap into printable string 2011-02-24 12:03:04 +01:00
Jiri Denemark
533bee8249 util: Use unsigned long as a base type for virBitmap 2011-02-24 12:00:52 +01:00
Daniel P. Berrange
6704e3fdb3 Expose name + UUID to LXC containers via env variables
When spawning 'init' in the container, set

  LIBVIRT_LXC_UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  LIBVIRT_LXC_NAME=YYYYYYYYYYYY

to allow guest software to detect & identify that they
are in a container

* src/lxc/lxc_container.c: Set LIBVIRT_LXC_UUID and
  LIBVIRT_LXC_NAME env vars
2011-02-23 11:41:02 +00:00
Daniel P. Berrange
9f5bbe3b92 Fix off-by-1 in virFileAbsPath.
The virFileAbsPath was not taking into account the '/' directory
separator when allocating memory for combining cwd + path. Convert
to use virAsprintf to avoid this type of bug completely.

* src/util/util.c: Convert virFileAbsPath to use virAsprintf
2011-02-23 11:11:45 +00:00
Daniel P. Berrange
08fb2a9ce8 Fix group/mode for /dev/pts inside LXC container
Normal practice for /dev/pts is to have it mode=620,gid=5
but LXC was leaving mode=000,gid=0 preventing unprivilegd
users in the guest use of PTYs

* src/lxc/lxc_controller.c: Fix /dev/pts setup
2011-02-23 11:11:35 +00:00
Eric Blake
009fce98be security: avoid memory leak
Leak introduced in commit d6623003.

* src/qemu/qemu_driver.c (qemuSecurityInit): Avoid leak on failure.
* src/security/security_stack.c (virSecurityStackClose): Avoid
leaking component drivers.
2011-02-22 09:50:34 -07:00
Roopa Prabhu
dfd39ccda8 802.1Qbh: Delay IFF_UP'ing interface until migration final stage
Current code does an IFF_UP on a 8021Qbh interface immediately after a port
profile set. This is ok in most cases except when its the migration prepare
stage. During migration we want to postpone IFF_UP'ing the interface on the
destination host until the source host has disassociated the interface.
This patch moves IFF_UP of the interface to the final stage of migration.
The motivation for this change is to postpone any addr registrations on the
destination host until the source host has done the addr deregistrations.

While at it, for symmetry with associate move ifDown of a 8021Qbh interface
to before disassociate
2011-02-22 07:48:54 -05:00
Osier Yang
9f928af12b storage: make debug log more useful
"__func__" is useless there, as VIR_DEBUG will print the function
name.

* src/storage/storage_backend_mpath.c
2011-02-22 10:10:31 +08:00
Markus Groß
3e53c7f954 Renamed functions in xenxs 2011-02-21 11:15:08 -07:00
Markus Groß
2e69e66e38 Moved XM formatting functions to xenxs 2011-02-21 11:14:52 -07:00
Markus Groß
1556ced2de Moved XM parsing functions to xenxs 2011-02-21 11:11:22 -07:00
Markus Groß
2f2a88b998 Moved SEXPR formatting functions to xenxs 2011-02-21 11:07:43 -07:00
Markus Groß
07129039cc Moved SEXPR parsing functions to xenxs 2011-02-21 10:53:53 -07:00
Markus Groß
c71328b9aa Moved some SEXPR functions from xen-unified 2011-02-21 10:50:18 -07:00
Markus Groß
8606ca0d0b Moved SEXPR unit to utils 2011-02-21 10:48:02 -07:00
Wen Congyang
cf61114cee protect the scsi controller to be deleted when it is in use
Steps to reproduce this bug:
1. virsh attach-disk domain --source imagefile --target sdb --sourcetype file --driver qemu --subdriver raw
2. virsh detach-device controller.xml # remove scsi controller 0
3. virsh detach-disk domain sdb
   error: Failed to detach disk
   error: operation failed: detaching scsi0-0-1 device failed: Device 'scsi0-0-1' not found

I think we should not detach a controller when it is used by some other device.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-02-21 10:38:19 -07:00
Eric Blake
994e7567b6 maint: kill all remaining uses of old DEBUG macro
Done mechanically with:
$ git grep -l '\bDEBUG0\? *(' | xargs -L1 sed -i 's/\bDEBUG0\? *(/VIR_&/'

followed by manual deletion of qemudDebug in daemon/libvirtd.c, along
with a single 'make syntax-check' fallout in the same file, and the
actual deletion in src/util/logging.h.

* src/util/logging.h (DEBUG, DEBUG0): Delete.
* daemon/libvirtd.h (qemudDebug): Likewise.
* global: Change remaining clients over to VIR_DEBUG counterpart.
2011-02-21 08:46:52 -07:00
Eric Blake
03ba07cb73 hash: make virHashFree more free-like
Two-argument free functions are uncommon; match the style elsewhere
by caching the callback at creation.

* src/util/hash.h (virHashCreate, virHashFree): Move deallocator
argument to creation.
* cfg.mk (useless_free_options): Add virHashFree.
* src/util/hash.c (_virHashTable): Track deallocator.
(virHashCreate, virHashFree): Update to new signature.
* src/conf/domain_conf.c (virDomainObjListDeinit)
(virDomainObjListInit, virDomainDiskDefForeachPath)
(virDomainSnapshotObjListDeinit, virDomainSnapshotObjListInit):
Update callers.
* src/conf/nwfilter_params.c (virNWFilterHashTableFree)
(virNWFilterHashTableCreate): Likewise.
* src/conf/nwfilter_conf.c (virNWFilterTriggerVMFilterRebuild):
Likewise.
* src/cpu/cpu_generic.c (genericHashFeatures, genericBaseline):
Likewise.
* src/xen/xm_internal.c (xenXMOpen, xenXMClose): Likewise.
* src/nwfilter/nwfilter_learnipaddr.c (virNWFilterLearnInit)
(virNWFilterLearnShutdown): Likewise.
* src/qemu/qemu_command.c (qemuDomainPCIAddressSetCreate)
(qemuDomainPCIAddressSetFree): Likewise.
* src/qemu/qemu_process.c (qemuProcessWaitForMonitor): Likewise.
2011-02-21 08:27:02 -07:00
Daniel P. Berrange
2ff4c13754 Remove all object hashtable caches from virConnectPtr
The virConnectPtr struct will cache instances of all other
objects. APIs like virDomainLookupByUUID will return a
cached object, so if you do virDomainLookupByUUID twice in
a row, you'll get the same exact virDomainPtr instance.

This does not have any performance benefit, since the actual
logic in virDomainLookupByUUID (and other APIs returning
virDomainPtr, etc instances) is not short-circuited. All
it does is to ensure there is only one single virDomainPtr
in existance for any given UUID.

The caching has a number of downsides though, all relating
to stale data. If APIs aren't careful to always overwrite
the 'id' field in virDomainPtr it may become out of data.
Likewise for the name field, if a guest is renamed, or if
a guest is deleted, and then a new one created with the
same UUID but different name.

This has been an ongoing, endless source of bugs for all
applications using libvirt from languages with garbage
collection, causing them to get virDomainPtr instances
from long ago with stale data.

The caching is also a waste of memory resources, since
both applications, and language bindings often maintain
their own hashtable caches of object instances.

This patch removes all the hash table caching, so all
APIs return brand new virDomainPtr (etc) object instances.

* src/datatypes.h: Delete all hash tables.
* src/datatypes.c: Remove all object caching code
2011-02-21 11:50:46 +00:00
Stefan Berger
912d170f87 nwfilter: enable rejection of packets
This patch adds the possibility to not just drop packets, but to also have them rejected where iptables at least sends an ICMP msg back to the originator. On ebtables this again maps into dropping packets since rejecting is not supported.

I am adding 'since 0.8.9' to the docs assuming this will be the next version of libvirt.
2011-02-18 20:13:40 -05:00
Guido Günther
acab8a97ce Drop empty argument from dnsmasq call
since dnsmasq >= 2.56 now bails out with empty arguments. See
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613944
for the Debian bug and
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589885
for the upstream reasoning.
2011-02-18 22:23:15 +01:00
Eric Blake
7b6286b780 build: fix broken mingw cross-compilation
Two regressions:
Commit df1011ca broke builds for systems that lack devmapper
(non-Linux, as well as Linux with ./autogen.sh --without-libvirtd
and without the libraries present).
Commit ce6fd650 broke cross-compilation, due to a gnulib bug.

* .gnulib: Update to latest, for cross-compilation fix.
* src/util/util.c (virIsDevMapperDevice): Provide stub for
platforms not using storage driver.
* configure.ac (devmapper): Arrange to define HAVE_LIBDEVMAPPER_H.
devmapper issue reported by Wen Congyang.
2011-02-18 12:02:22 -07:00
Matthias Bolte
791da4e736 esx: Ignore malformed host UUID from BIOS
Etienne Gosset reported that libvirt fails to connect to his ESX
server because it failed to parse its malformed host UUID, that
contains an additional space and lacks one hexdigit in the last
group:

xxxxxxxx-xxxx-xxxx-xxxx- xxxxxxxxxxx

Don't treat this as a fatal error, just ignore it.
2011-02-18 17:59:05 +01:00
Michal Privoznik
595174aeb7 virsh: freecell --all getting wrong NUMA nodes count
Virsh freecell --all was not only getting wrong NUMA nodes count, but
even the NUMA nodes IDs. They doesn't have to be continuous, as I've
found out during testing this. Therefore a modification of
nodeGetCellsFreeMemory() error message.
2011-02-18 09:26:40 -07:00
Eric Blake
053013b0da build: recompute symbols after changing configure options
$ ./configure
...
$ make
...
  GEN    libvirt.syms
...
$ ./configure --with-driver-modules
...
$ make
...

libvirt.syms doesn't get regenerated but it should as it should
contain virDriverLoadModule now.

* src/Makefile.am (libvirt.syms): Depend on configure changes.
Reported by Matthias Bolte.
2011-02-18 08:58:36 -07:00
Jim Fehlig
efc2594b4e Do not add drive 'boot=on' param when a kernel is specified
libvirt-tck was failing several domain tests [1] with qemu 0.14, which
is now less tolerable of specifying 2 bootroms with the same boot index [2].

Drop the 'boot=on' param if kernel has been specfied.

[1] https://www.redhat.com/archives/libvir-list/2011-February/msg00559.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2011-02/msg01892.html
2011-02-17 20:32:48 -07:00
Christophe Fergeau
50daaa0a3e remove duplicated call to reportOOMError 2011-02-17 17:12:23 -07:00
Christophe Fergeau
787e38890e remove space between function name and (
There were several occurrences of an extra space inserted between
a function name and the ( opening the argument list in
datatypes.c. This is not consistent with the coding style used in
the rest of this file so removing this extra space makes the
code slightly more readable.
2011-02-17 17:10:27 -07:00
Christophe Fergeau
7b9a509953 don't check for NULL before calling virHashFree
virHashFree follows the convention described in HACKING that
XXXFree() functions can be called with a NULL argument.
2011-02-17 17:02:32 -07:00
Christophe Fergeau
9905c69e4f remove no longer needed calls to virReportOOMError
Now that the virHash handling functions call virReportOOMError by
themselves when needed, users of the virHash API no longer need to
do it by themselves. Since users of the virHash API were not
consistently calling virReportOOMError after memory failures from
the virHash code, this has the added benefit of making OOM
reporting from this code more consistent and reliable.
2011-02-17 16:59:14 -07:00
Christophe Fergeau
7f1c65e551 factor common code in virHashAddEntry and virHashUpdateEntry
The only difference between these 2 functions is that one errors
out when the entry is already present while the other modifies
the existing entry. Add an helper function with a boolean argument
indicating whether existing entries should be updated or not, and
use this helper in both functions.
2011-02-17 16:46:54 -07:00
Christophe Fergeau
5c5880e047 add hash table rebalancing in virHashUpdateEntry
The code in virHashUpdateEntry and virHashAddEntry is really
similar. However, the latter rebalances the hash table when
one of its buckets contains too many elements while the former
does not. Fix this discrepancy.
2011-02-17 16:45:25 -07:00
Eric Blake
aebe04d75e hash: modernize debug code
* src/util/hash.c (virHashGrow) [DEBUG_GROW]: Use modern logging.
Reported by Christophe Fergeau.
2011-02-17 16:33:12 -07:00
Wen Congyang
34c13d0d1a check more error info about whether drive_add failed
When we attach a disk, but we specify a wrong format of disk image,
qemu monitor command drive_add will fail, but libvirt does not detect
this error.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-02-17 14:10:26 -07:00
Eric Blake
bb904f45ff logging: make VIR_ERROR and friends preserve errno
Followup to commit 17e19add, and would have prevented the bug
independently fixed in commit 76c57a7c.

* src/util/logging.c (virLogMessage): Preserve errno, since
logging should be as unintrusive as possible.
2011-02-17 14:03:11 -07:00
Laine Stump
5754dbd56d Give each virtual network bridge its own fixed MAC address
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=609463

The problem was that, since a bridge always acquires the MAC address
of the connected interface with the numerically lowest MAC, as guests
are started and stopped, it was possible for the MAC address to change
over time, and this change in the network was being detected by
Windows 7 (it sees the MAC of the default route change), so on each
reboot it would bring up a dialog box asking about this "new network".

The solution is to create a dummy tap interface with a MAC guaranteed
to be lower than any guest interface's MAC, and attach that tap to the
bridge as soon as it's created. Since all guest MAC addresses start
with 0xFE, we can just generate a MAC with the standard "0x52, 0x54,
0" prefix, and it's guaranteed to always win (physical interfaces are
never connected to these bridges, so we don't need to worry about
competing numerically with them).

Note that the dummy tap is never set to IFF_UP state - that's not
necessary in order for the bridge to take its MAC, and not setting it
to UP eliminates the clutter of having an (eg) "virbr0-nic" displayed
in the output of the ifconfig command.

I chose to not auto-generate the MAC address in the network XML
parser, as there are likely to be consumers of that API that don't
need or want to have a MAC address associated with the
bridge.

Instead, in bridge_driver.c when the network is being defined, if
there is no MAC, one is generated. To account for virtual network
configs that already exist when upgrading from an older version of
libvirt, I've added a %post script to the specfile that searches for
all network definitions in both the config directory
(/etc/libvirt/qemu/networks) and the state directory
(/var/lib/libvirt/network) that are missing a mac address, generates a
random address, and adds it to the config (and a matching address to
the state file, if there is one).

docs/formatnetwork.html.in: document <mac address.../>
docs/schemas/network.rng: add nac address to schema
libvirt.spec.in: %post script to update existing networks
src/conf/network_conf.[ch]: parse and format <mac address.../>
src/libvirt_private.syms: export a couple private symbols we need
src/network/bridge_driver.c:
    auto-generate mac address when needed,
    create dummy interface if mac address is present.
tests/networkxml2xmlin/isolated-network.xml
tests/networkxml2xmlin/routed-network.xml
tests/networkxml2xmlout/isolated-network.xml
tests/networkxml2xmlout/routed-network.xml: add mac address to some tests
2011-02-17 13:36:32 -05:00
Laine Stump
13ae7a02b3 Allow brAddTap to create a tap device that is down
An upcoming patch has a use for a tap device to be created that
doesn't need to be actually put into the "up" state, and keeping it
"down" keeps the output of ifconfig from being unnecessarily cluttered
(ifconfig won't show down interfaces unless you add "-a").

bridge.[ch]: add "up" as an arg to brAddTap()
uml_conf.c, qemu_command.c: add "up" (set to "true") to brAddTap() call.
2011-02-17 13:36:22 -05:00
Laine Stump
e9bd5c0e24 Add txmode attribute to interface XML for virtio backend
This is in response to:

   https://bugzilla.redhat.com/show_bug.cgi?id=629662

Explanation

qemu's virtio-net-pci driver allows setting the algorithm used for tx
packets to either "bh" or "timer". This is done by adding ",tx=bh" or
",tx=timer" to the "-device virtio-net-pci" commandline option.

'bh' stands for 'bottom half'; when this is set, packet tx is all done
in an iothread in the bottom half of the driver. (In libvirt, this
option is called the more descriptive "iothread".)

'timer' means that tx work is done in qemu, and if there is more tx
data than can be sent at the present time, a timer is set before qemu
moves on to do other things; when the timer fires, another attempt is
made to send more data. (libvirt retains the name "timer" for this
option.)

The resulting difference, according to the qemu developer who added
the option is:

    bh makes tx more asynchronous and reduces latency, but potentially
    causes more processor bandwidth contention since the cpu doing the
    tx isn't necessarily the cpu where the guest generated the
    packets.

Solution

This patch provides a libvirt domain xml knob to change the option on
the qemu commandline, by adding a new attribute "txmode" to the
<driver> element that can be placed inside any <interface> element in
a domain definition. It's use would be something like this:

    <interface ...>
      ...
      <model type='virtio'/>
      <driver txmode='iothread'/>
      ...
    </interface>

I chose to put this setting as an attribute to <driver> rather than as
a sub-element to <tune> because it is specific to the virtio-net
driver, not something that is generally usable by all network drivers.
(note that this is the same placement as the "driver name=..."
attribute used to choose kernel vs. userland backend for the
virtio-net driver.)

Actually adding the tx=xxx option to the qemu commandline is only done
if the version of qemu being used advertises it in the output of

    qemu -device virtio-net-pci,?

If a particular txmode is requested in the XML, and the option isn't
listed in that help output, an UNSUPPORTED_CONFIG error is logged, and
the domain fails to start.
2011-02-17 11:07:58 -05:00