Commit Graph

4501 Commits

Author SHA1 Message Date
Jim Fehlig
2b84e445d5 Add libxenlight driver
Add a new xen driver based on libxenlight [1], which is the primary
toolstack starting with Xen 4.1.0.  The driver is stateful and runs
privileged only.

Like the existing xen-unified driver, the libxenlight driver is
accessed with xen:// URI.  Driver selection is based on the status
of xend.  If xend is running, the libxenlight driver will not load
and xen:// connections are handled by xen-unified.  If xend is not
running *and* the libxenlight driver is available, xen://
connections are deferred to the libxenlight driver.

V6:
 - Address several code style issues noted by Daniel Veillard
 - Make drive work with xen:/// URI
 - Hold domain object reference while domain is injected in
   libvirt event loop.  Race found and fixed by Markus Groß.

V5:
 - Ensure events are unregistered when domain private data
   is destroyed.  Discovered and fixed by Markus Groß.

V4:
 - Handle restart of libvirtd, reconnecting to previously
   started domains
 - Rebased to current master
 - Tested against Xen 4.1 RC7-pre (c/s 22961:c5d121fd35c0)

V3:
  - Reserve vnc port within driver when autoport=yes

V2:
  - Update to Xen 4.1 RC6-pre (c/s 22940:5a4710640f81)
  - Rebased to current master
  - Plug memory leaks found by Stefano Stabellini and valgrind
  - Handle SHUTDOWN_crash domain death event

[1] http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00436.html
2011-03-18 08:57:48 -06:00
Jiri Denemark
fba550f651 util: Forbid calling hash APIs from iterator callback
Calling most hash APIs is not safe from inside of an iterator callback.
Exceptions are APIs that do not modify the hash table and removing
current hash entry from virHashFroEach callback.

This patch make all APIs which are not safe fail instead of just relying
on the callback being nice not calling any unsafe APIs.
2011-03-18 10:54:56 +01:00
Jiri Denemark
c3ad755f58 qemu: Fix copy&paste error messages in text monitor 2011-03-18 10:49:11 +01:00
Wen Congyang
d5df67be3c do not unref obj in qemuDomainObjExitMonitor*
Steps to reproduce this bug:
# cat test.sh
  #! /bin/bash -x
  virsh start domain
  sleep 5
  virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
# while true; do ./test.sh ; done

Then libvirtd will crash.

The reason is that:
we add a reference of obj when we open the monitor. We will reduce this
reference when we free the monitor.

If the reference of monitor is 0, we will free monitor automatically and
the reference of obj is reduced.

But in the function qemuDomainObjExitMonitorWithDriver(), we reduce this
reference again when the reference of monitor is 0.

It will cause the obj be freed in the function qemuDomainObjEndJob().

Then we start the domain again, and libvirtd will crash in the function
virDomainObjListSearchName(), because we pass a null pointer(obj->def->name)
to strcmp().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-18 01:26:31 -04:00
Wen Congyang
e2aec53b97 qemu: check driver name while attaching disk
This bug was reported by Shi Jin(jinzishuai@gmail.com):
=============
# virsh attach-disk RHEL6RC /var/lib/libvirt/images/test3.img vdb \
        --driver file --subdriver qcow2
Disk attached successfully

# virsh save RHEL6RC /var/lib/libvirt/images/memory.save
Domain RHEL6RC saved to /var/lib/libvirt/images/memory.save

# virsh restore /var/lib/libvirt/images/memory.save
error: Failed to restore domain from /var/lib/libvirt/images/memory.save
error: internal error unsupported driver name 'file'
       for disk '/var/lib/libvirt/images/test3.img'
=============

We check the driver name when we start or restore VM, but we do
not check it while attaching a disk. This adds the same check on disk
driverName used in qemuBuildCommandLine to qemudDomainAttachDevice.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-18 00:36:37 -04:00
Daniel Veillard
10598dd568 Avoid taking lock in libvirt debug dump
As pointed out, locking the buffer from the signal handler
cannot been guaranteed to be safe, so to avoid any hazard
we prefer the trade off of dumping logs possibly messed up
by concurrent logging activity rather than risk a daemon
crash.

* src/util/logging.c: change virLogEmergencyDumpAll() to not
  take any lock on the log buffer but reset buffer content variables
  to an empty set before starting the actual dump.
2011-03-18 10:06:30 +08:00
Wen Congyang
9741f3461b unlock the monitor when unwatching the monitor
Steps to reproduce this bug:
# virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
The domain has 2 cpus, and we try to set the third cpu online.
The qemu crashes, and this command will hang.

The reason is that the refs is not 1 when we unwatch the monitor.
We lock the monitor, but we do not unlock it. So virCondWait()
will be blocked.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-17 17:29:38 -06:00
Hu Tao
d69171566d Make virDomainObjParseNode() static
Make virDomainObjParseNode() static since it is called only
in one file.
2011-03-17 16:47:55 -06:00
Nikunj A. Dadhania
78ba748ef1 virsh: fix memtune's help message for swap_hard_limit
* Correct the documentation for cgroup: the swap_hard_limit indicates
  mem+swap_hard_limit.
* Change cgroup private apis to: virCgroupGet/SetMemSwapHardLimit

Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
2011-03-17 16:45:06 -06:00
Alex Williamson
2090b0f52d Add PCI sysfs reset access
I'm proposing we make use of $PCIDIR/reset in qemu-kvm to reset
devices on VM reset.  We need to add it to libvirt's list of
files that get ownership for device assignment.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2011-03-17 14:52:50 -06:00
Jim Fehlig
b24b442bf7 Support Xen sysctl v8, domctl v7
xen-unstable c/s 21118:28e5409e3fb3 bumped sysctl version to 8.
xen-unstable c/s 21212:de94884a669c introduced CPU pools feature,
adding another member to xen_domctl_getdomaininfo struct.  Add a
corresponding domctl v7 struct in xen hypervisor sub-driver and
detect sysctl v8 during initialization.
2011-03-17 14:16:47 -06:00
Matthias Bolte
55fb386671 remote: Add missing virCondDestroy calls
The virCond of the remote_thread_call struct was leaked in some
places. This results in leaking the underlying mutex. Which in turn
leaks a handle on Windows.

Reported by Aliaksandr Chabatar and Ihar Smertsin.
2011-03-17 17:51:33 +01:00
Laine Stump
12775d9491 macvtap: log an error if on failure to connect to netlink socket
A bug in libnl (see https://bugzilla.redhat.com/show_bug.cgi?id=677724
and https://bugzilla.redhat.com/show_bug.cgi?id=677725) makes it very
easy to create a failure to connect to the netlink socket when trying
to open a macvtap network device ("type='direct'" in domain interface
XML). When that error occurred (during a call to libnl's nl_connect()
from libvirt's nlComm(), there was no log message, leading virsh (for
example) to report "unknown error".

There were two other cases in nlComm where an error in a libnl
function might return with failure but no error reported. In all three
cases, this patch logs a message which will hopefully be more useful.

Note that more detailed information about the failure might be
available from libnl's nl_geterror() function, but it calls
strerror(), which is not threadsafe, so we can't use it.
2011-03-16 13:46:29 -04:00
Osier Yang
98a4e5a301 storage: Fix a problem which will cause libvirtd crashed
If pool xml has no definition for "port", then "Segmentation fault"
happens when jumping to "cleanup:" to do "VIR_FREE(port)", as "port"
was not initialized in this situation.

* src/conf/storage_conf.c
2011-03-16 16:28:07 +08:00
Eric Blake
100bba0647 qemu: support migration to fd
* src/qemu/qemu_monitor.h (qemuMonitorMigrateToFd): New
prototype.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFd): New function.
2011-03-15 16:35:43 -06:00
Eric Blake
8e42c50bd4 qemu: improve efficiency of dd during snapshots
POSIX states about dd:

If the bs=expr operand is specified and no conversions other than
sync, noerror, or notrunc are requested, the data returned from each
input block shall be written as a separate output block; if the read
returns less than a full block and the sync conversion is not
specified, the resulting output block shall be the same size as the
input block. If the bs=expr operand is not specified, or a conversion
other than sync, noerror, or notrunc is requested, the input shall be
processed and collected into full-sized output blocks until the end of
the input is reached.

Since we aren't using conv=sync, there is no zero-padding, but our
use of bs= means that a short read results in a short write.  If
instead we use ibs= and obs=, then short reads are collected and dd
only has to do a single write, which can make dd more efficient.

* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile):
Avoid 'dd bs=', since it can cause short writes.
2011-03-15 16:35:43 -06:00
Wen Congyang
ce81bc5ce8 qemu: Fallback to HMP when cpu_set QMP command is not found 2011-03-15 09:55:06 -06:00
Daniel P. Berrange
a9c32b5d62 Change message for VIR_FROM_RPC error domain
The VIR_FROM_RPC error domain is used generically for any RPC
problem, not simply XML-RPC problems.

* src/util/virterror.c: s/XML-RPC/RPC/
2011-03-15 15:26:35 +00:00
Daniel P. Berrange
bd82db4057 Add compat function for geteuid()
* configure.ac: Check for geteuid()
* src/util/util.h: Compat for geteuid()
2011-03-15 15:26:35 +00:00
Daniel P. Berrange
2a2a00eb69 Fix misc bugs in virCommandPtr
The virCommandNewArgs() method would free the virCommandPtr
if it failed to add the args. This meant errors reported in
virCommandAddArgSet() were lost. Simply removing the check
for errors from the constructor means they can be reported
correctly later

The virCommandAddEnvPassCommon() method failed to check for
errors before reallocating the cmd->env array, causing a
potential SEGV if cmd was NULL

The virCommandAddArgSet() method needs to validate that at
least 1 element in 'val's parameter is non-NULL, otherwise
code like

    cmd = virCommandNew(binary)
    virCommandAddAtg(cmd, "foo")

Would end up trying todo  execve("foo"), if binary was
NULL.
2011-03-15 15:26:35 +00:00
Daniel P. Berrange
2737b6c20b Add virSetBlocking() to allow O_NONBLOCK to be toggle on or off
The virSetNonBlock() API only allows enabling non-blocking
operations. It doesn't allow turning blocking back on. Add
a new API to allow arbitrary toggling.

* src/libvirt_private.syms, src/util/util.h
  src/util/util.c: Add virSetBlocking
2011-03-15 15:26:35 +00:00
Eric Blake
30a50fc3b0 qemu: use more appropriate error
Fixes bug in commit acacced

* src/qemu/qemu_command.c (qemuBuildCommandLine):
s/INVALID_ARG/CONFIG_UNSUPPORTED/.
Reported by Daniel P. Berrange.
2011-03-15 08:49:04 -06:00
Taku Izumi
e5d46c08af libvirt: fix a simple bug in virDomainSetMemoryFlags()
This patch fix a simple bug in virDomainSetMemoryFlags function.
The patch sent before lacks the consideration of the case
where the driver doesn't support virDomainSetMemoryFlags API.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
2011-03-15 08:14:41 -04:00
Daniel P. Berrange
4e3117ae50 Make LXC container startup/shutdown/I/O more robust
The current LXC I/O controller looks for HUP to detect
when a guest has quit. This isn't reliable as during
initial bootup it is possible that 'init' will close
the console and let mingetty re-open it. The shutdown
of containers was also flakey because it only killed
the libvirt I/O controller and expected container
processes to gracefully follow.

Change the I/O controller such that when it see HUP
or an I/O error, it uses kill($PID, 0) to see if the
process has really quit.

Change the container shutdown sequence to use the
virCgroupKillPainfully function to ensure every
really goes away

This change makes the use of the 'cpu', 'devices'
and 'memory' cgroups controllers compulsory with
LXC

* docs/drvlxc.html.in: Document that certain cgroups
  controllers are now mandatory
* src/lxc/lxc_controller.c: Check if PID is still
  alive before quitting on I/O error/HUP
* src/lxc/lxc_driver.c: Use virCgroupKillPainfully
2011-03-15 12:12:53 +00:00
Daniel Veillard
b16f47ab61 Allow to dynamically set the size of the debug buffer
This is the part allowing to dynamically resize the debug log
buffer from it's default 64kB size. The buffer is now dynamically
allocated.
It adds a new API virLogSetBufferSize() which resizes the buffer
If passed a zero size, the buffer is deallocated and we do the small
optimization of not formatting messages which are not output anymore.
On the daemon side, it just adds a new option log_buffer_size to
libvirtd.conf and call virLogSetBufferSize() if needed
* src/util/logging.h src/util/logging.c src/libvirt_private.syms:
  make buffer dynamic and add virLogSetBufferSize() internal API
* daemon/libvirtd.conf: document the new log_buffer_size option
* daemon/libvirtd.c: read and use the new log_buffer_size option
2011-03-15 15:13:21 +08:00
Eric Blake
1c5dc4c607 qemu: consolidate duplicated monitor migration code
* src/qemu/qemu_monitor_text.h (qemuMonitorTextMigrate): Declare
in place of individual monitor commands.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONMigrate): Likewise.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToHost)
(qemuMonitorTextMigrateToCommand, qemuMonitorTextMigrateToFile)
(qemuMonitorTextMigrateToUnix): Delete.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToHost)
(qemuMonitorJSONMigrateToCommand, qemuMonitorJSONMigrateToFile)
(qemuMonitorJSONMigrateToUnix): Delete.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToHost)
(qemuMonitorMigrateToCommand, qemuMonitorMigrateToFile)
(qemuMonitorMigrateToUnix): Consolidate shared code.
2011-03-14 21:57:43 -06:00
Eric Blake
c7af07ace4 qemu: use lighter-weight fd:n on incoming tunneled migration
Outgoing migration still uses a Unix socket and or exec netcat until
the next patch.

* src/qemu/qemu_migration.c (qemuMigrationPrepareTunnel):
Replace Unix socket with simpler pipe.
Suggested by Paolo Bonzini.
2011-03-14 21:57:42 -06:00
Osier Yang
acacced812 qemu: Check the unsigned integer overflow
As perhaps other hypervisor drivers use different capacity units,
do the checking in qemu driver instead of in conf/domain_conf.c.
2011-03-15 11:50:09 +08:00
Minoru Usui
9bfde34661 Fix performance problem of virStorageVolCreateXMLFrom()
This patch changes zerobuf variable from array to VIR_ALLOC_N().

Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
2011-03-14 21:02:17 -06:00
Laine Stump
7cc101ce0e audit: eliminate potential null pointer deref when auditing macvtap devices
The newly added call to qemuAuditNetDevice in qemuPhysIfaceConnect was
assuming that res_ifname (the name of the macvtap device) was always
valid, but this isn't the case. If openMacvtapTap fails, it always
returns NULL, which would result in a segv.

Since the audit log only needs a record of devices that are actually
sent to qemu, and a failure to open the macvtap device means that no
device will be sent to qemu, we can solve this problem by only doing
the audit if openMacvtapTap is successful (in which case res_ifname is
guaranteed valid).
2011-03-14 12:45:03 -04:00
Laine Stump
013427e6e7 network driver: don't send default route to clients on isolated networks
Normally dnsmasq will send a default route (the address of the host in
the network definition) to any client requesting an address via
DHCP. On an isolated network this makes no sense, as we have iptables
to prevent any traffic going out via that interface, so anything sent
that way would be dropped anyway.

This extra/unusable default route becomes problematic if you have
setup a guest with multiple network interfaces, with one connected to
an isolated network and another that provides connectivity to the
outside (example - one interface directly connecting to a physical
interface via macvtap, with a second connected to an isolated network
so that the host and guest can communicate (macvtap doesn't support
guest<->host communication without an external switch that supports
vepa, or reflecting all traffic back)). In this case, if the guest
chooses the default route of the isolated network, the guest will not
be able to get network traffic beyond the host.

To prevent dnsmasq from sending a default route, you can tell it to
send 0 bytes of data for the default route option (option number 3)
with --dhcp-option=3 (normally the data to send for the option would
follow the option number; no extra data means "don't send this option").

I have checked on RHEL5 (a good representative of the oldest supported
libvirt platforms) and its version of dnsmasq (2.45) does support
--dhcp-option, so this shouldn't create any compatibility problems.
2011-03-14 08:24:23 -04:00
Guido Günther
71753cb7f7 Add missing checks for read only connections
As pointed on CVE-2011-1146, some API forgot to check the read-only
status of the connection for entry point which modify the state
of the system or may lead to a remote execution using user data.
The entry points concerned are:
  - virConnectDomainXMLToNative
  - virNodeDeviceDettach
  - virNodeDeviceReAttach
  - virNodeDeviceReset
  - virDomainRevertToSnapshot
  - virDomainSnapshotDelete

* src/libvirt.c: fix the above set of entry points to error on read-only
                 connections
2011-03-14 10:56:28 +08:00
Laine Stump
13c00dde31 network driver: Use a separate dhcp leases file for each network
By default, all dnsmasq processes share the same leases file. libvirt
also uses the --dhcp-lease-max option to control the maximum number of
leases allowed. The problem is that libvirt puts in a number equal to
the number of addresses in the range for the one network handled by a
single instance of dnsmasq, but dnsmasq checks the total number of
leases in the file (which could potentially contain many more).

The solution is to tell each instance of dnsmasq to create and use its
own leases file. (/var/lib/libvirt/network/<net-name>.leases).

This file is created by dnsmasq when it starts, but not deleted when
it exists. This is fine when the network is just being stopped, but if
the leases file was left around when a network was undefined, we could
end up with an ever-increasing number of dead files - instead, we
explicitly unlink the leases file when a network is undefined.

Note that Ubuntu carries a patch against an older version of libvirt for this:

hhttps://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/713071
ttp://bazaar.launchpad.net/~serge-hallyn/ubuntu/maverick/libvirt/bugall/revision/109

I was certain I'd also seen discussion of this on libvir-list or
libvirt-users, but couldn't find it.
2011-03-11 23:49:47 -05:00
Laine Stump
e368e71040 network driver: Fix indentation from previous commit
The previous commit put a large portion of networkBuildDnsmasqArgv
inside an if { } block. This readjusts the indentation.
2011-03-11 23:49:30 -05:00
Laine Stump
7892edc9cc network driver: Start dnsmasq even if no dhcp ranges/hosts are specified.
This fixes a regression introduced in commit ad48df, and reported on
the libvirt-users list:

  https://www.redhat.com/archives/libvirt-users/2011-March/msg00018.html

The problem in that commit was that we began searching a list of ip
address definitions (rather than just having one) to look for a dhcp
range or static host; when we didn't find any, our pointer (ipdef) was
left at NULL, and when ipdef was NULL, we returned without starting up
dnsmasq.

Previously dnsmasq was started even without any dhcp ranges or static
entries, because it's still useful for DNS services.

Another problem I noticed while investigating was that, if there are
IPv6 addresses, but no IPv4 addresses of any kind, we would jump out
at an ever higher level in the call chain.

This patch does the following:

1) networkBuildDnsmasqArgv() = all uses of ipdef are protected from
   NULL dereference. (this patch doesn't change indentation, to make
   review easier. The next patch will change just the
   indentation). ipdef is intended to point to the first IPv4 address
   with DHCP info (or the first IPv4 address if none of them have any
   dhcp info).

2) networkStartDhcpDaemon() = if the loop looking for an ipdef with
   DHCP info comes up empty, we then grab the first IPv4 def from the
   list. Also, instead of returning if there are no IPv4 defs, we just
   return if there are no IP defs at all (either v4 or v6). This way a
   network that is IPv6-only will still get dnsmasq listening for DNS
   queries.

3) in networkStartNetworkDaemon() - we will startup dhcp not just if there
   are any IPv4 addresses, but also if there are any IPv6 addresses.
2011-03-11 23:49:11 -05:00
Eric Blake
de6b8a0800 qemu: fix -global argument usage
* src/qemu/qemu_command.c (qemuBuildCommandLine): Pass two
separate arguments, and fix indentation.
2011-03-11 10:11:48 -07:00
Philipp Hahn
0ed445e79c Ignore backing file errors in FS storage pool
Currently a single storage volume with a broken backing file will disable the
whole storage pool. This can happen when the backing file is on some
unavailable network storage or if the backing volume is deleted, while the
storage volumes using it remain.
Since the storage pool can not be re-activated, re-creating the missing
or deleting the now useless volumes using libvirt only is not possible.

Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file,
   if the backing file format is not explicitly specified. If the
   backing file is created using
	   kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
   no error is detected at this stage.
   The new return code -3 signals that the backing file could not be
   opened.
2. The backingStore.format must be >= 0, since values < 0 would break
   virStorageVolTargetDefFormat() when dumping the XML data such as
       <format type='...'/>
   Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file
   and thus always detects a missing backing file.
   Since it "only" updates the capacity, allocation, owner, group, mode
   and SELinux label, just ignore errors at this stage, print an error
   message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at least
   vol-destroy and pool-refresh do work now.

To reproduce:
  dir=$(mktemp -d)
  virsh pool-create-as tmp dir '' '' '' '' "$dir"
  virsh vol-create-as --format qcow2 tmp back 1G
  virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
  virsh vol-delete --pool tmp back
  virsh pool-refresh tmp
After the last step, the pool will be gone (because it was not persistent). As
long as the now broken image stays in the directory, you will not be able to
re-create or re-start the pool.

Signed-off-by: Philipp Hahn <hahn@univention.de>
2011-03-11 09:54:44 -07:00
Gui Jianfeng
5f4a48cbdf remote-protocol: implement new BlkioParameters API
Remote protocol implementation of virDomainSetBlkioParameters and virDomainGetBlkioParameters.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
2011-03-10 17:54:12 -07:00
Gui Jianfeng
f84a756eca qemu: implement new BlkioParameters API
Implement domainSetBlkioParameters and domainGetBlkioParameters for QEmu

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
2011-03-10 17:53:52 -07:00
Gui Jianfeng
d55aa8694e libvirt: implements virDomain{Get,Set}BlkioParameters
Implements virDomainSetBlkioParameters and virDomainGetBlkioParameters and initialization

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
2011-03-10 17:53:33 -07:00
Taku Izumi
55141abf9d setmem: implement the remote protocol to address the new API
This patch implements the remote protocol to address the new API.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
2011-03-10 15:02:58 -07:00
Taku Izumi
cad769001c setmem: implement the code to address the new API in the qemu driver
This patch implements the code to address the new API
in the qemu driver.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
2011-03-10 15:02:58 -07:00
Taku Izumi
e8340a8b79 setmem: introduce a new libvirt API (virDomainSetMemoryFlags)
This patch introduces a new libvirt API (virDomainSetMemoryFlags) and
a flag (virDomainMemoryModFlags).

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
2011-03-10 15:02:58 -07:00
Eric Blake
9516a0eca3 audit: audit use of /dev/net/tun, /dev/tapN, /dev/vhost-net
Opening raw network devices with the intent of passing those fds to
qemu is worth an audit point.  This makes a multi-part audit: first,
we audit the device(s) that libvirt opens on behalf of the MAC address
of a to-be-created interface (which can independently succeed or
fail), then we audit whether qemu actually started the network device
with the same MAC (so searching backwards for successful audits with
the same MAC will show which fd(s) qemu is actually using).  Note that
it is possible for the fd to be successfully opened but no attempt
made to pass the fd to qemu (for example, because intermediate
nwfilter operations failed) - no interface start audit will occur in
that case; so the audit for a successful opened fd does not imply
rights given to qemu unless there is a followup audit about the
attempt to start a new interface.

Likewise, when a network device is hot-unplugged, there is only one
audit message about the MAC being discontinued; again, searching back
to the earlier device open audits will show which fds that qemu quits
using (and yes, I checked via /proc/<qemu-pid>/fd that qemu _does_
close out the fds associated with an interface on hot-unplug).  The
code would require much more refactoring to be able to definitively
state which device(s) were discontinued at that point, since we
currently don't record anywhere in the XML whether /dev/vhost-net was
opened for a given interface.

* src/qemu/qemu_audit.h (qemuAuditNetDevice): New prototype.
* src/qemu/qemu_audit.c (qemuAuditNetDevice): New function.
* src/qemu/qemu_command.h (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Adjust prototype.
* src/qemu/qemu_command.c (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Add audit points and
adjust parameters.
(qemuBuildCommandLine): Adjust caller.
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Likewise.
2011-03-10 08:35:42 -07:00
Eric Blake
c52cbe487c qemu: don't request cgroup ACL access for /dev/net/tun
Since libvirt always passes /dev/net/tun to qemu via fd, we should
never trigger the cases where qemu tries to directly open the
device.  Therefore, it is safer to deny the cgroup device ACL.

* src/qemu/qemu_cgroup.c (defaultDeviceACL): Remove /dev/net/tun.
* src/qemu/qemu.conf (cgroup_device_acl): Reflect this change.
2011-03-10 08:32:43 -07:00
Eric Blake
5d09151341 qemu: support vhost in attach-interface
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Honor vhost
designations, similar to qemu_command code paths.
* src/qemu/qemu_command.h (qemuOpenVhostNet): New prototype.
* src/qemu/qemu_command.c (qemuOpenVhostNet): Export.
2011-03-10 08:28:16 -07:00
Jiri Denemark
346236fea9 qemu: Stop guest CPUs before creating a snapshot 2011-03-10 14:36:05 +01:00
Jiri Denemark
89e75b01a0 qemu: Refactor qemuDomainSnapshotCreateXML 2011-03-10 14:36:05 +01:00
Jiri Denemark
81711cee34 qemu: Escape snapshot name passed to {save,load,del}vm 2011-03-10 14:36:05 +01:00
Jiri Denemark
89241fe0d1 qemu: Fallback to HMP for snapshot commands
qemu driver in libvirt gained support for creating domain snapshots
almost a year ago in libvirt 0.8.0. Since then we enabled QMP support
for qemu >= 0.13.0 but QMP equivalents of {save,load,del}vm commands are
not implemented in current qemu (0.14.0) so the domain snapshot support
is not very useful.

This patch detects when the appropriate QMP command is not implemented
and tries to use human-monitor-command (aka HMP passthrough) to run
it's HMP equivalent.
2011-03-10 14:36:05 +01:00
Jiri Denemark
b3c6ec03b8 qemu: Rename qemuMonitorCommandWithHandler as qemuMonitorText*
To make it more obvious that it is only used for text monitor. The
naming also matches the style of qemuMonitorTextCommandWithFd.
2011-03-10 14:36:05 +01:00
Jiri Denemark
39b4f4aab2 qemu: Rename qemuMonitorCommand{,WithFd} as qemuMonitorHMP*
So that it's obvious that they are supposed to be used with HMP commands.
2011-03-10 14:36:04 +01:00
Jiri Denemark
266265a560 qemu: Setup infrastructure for HMP passthrough
JSON monitor command implementation can now just directly call text
monitor implementation and it will be automatically encapsulated into
QMP's human-monitor-command.
2011-03-10 14:36:04 +01:00
Jiri Denemark
3b8bf4a3a9 qemu: Fix warnings in event handlers
Some qemu monitor event handlers were issuing inadequate warning when
virDomainSaveStatus() failed. They copied the message from I/O error
handler without customizing it to provide better information on why
virDomainSaveStatus() was called.
2011-03-10 14:18:37 +01:00
Osier Yang
d999376954 storage: Update qemu-img flag checking
For newer qemu-img, the help string for "backing file format" is
"[-F backing_fmt]".

Fix the wrong logic error by commit e997c268.

* src/storage/storage_backend.c
2011-03-10 15:02:28 +08:00
Osier Yang
e997c268ef qemu: Replace deprecated option of qemu-img
qemu-img silently disable "-e", so we can't use it for volume
encryption anymore, change it into "-o encryption=on" if qemu
supports "-o" option.
2011-03-10 10:05:14 +08:00
Eric Blake
340ab27dd2 audit: also audit cgroup ACL permissions
* src/qemu/qemu_audit.h (qemuAuditCgroupMajor)
(qemuAuditCgroupPath): Add parameter.
* src/qemu/qemu_audit.c (qemuAuditCgroupMajor)
(qemuAuditCgroupPath): Add 'acl=rwm' to cgroup audit entries.
* src/qemu/qemu_cgroup.c: Update clients.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Likewise.
2011-03-09 11:36:59 -07:00
Eric Blake
5564c57528 cgroup: allow fine-tuning of device ACL permissions
Adding audit points showed that we were granting too much privilege
to qemu; it should not need any mknod rights to recreate any
devices.  On the other hand, lxc should have all device privileges.
The solution is adding a flag parameter.

This also lets us restrict write access to read-only disks.

* src/util/cgroup.h (virCgroup*Device*): Adjust prototypes.
* src/util/cgroup.c (virCgroupAllowDevice)
(virCgroupAllowDeviceMajor, virCgroupAllowDevicePath)
(virCgroupDenyDevice, virCgroupDenyDeviceMajor)
(virCgroupDenyDevicePath): Add parameter.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update clients.
* src/lxc/lxc_controller.c (lxcSetContainerResources): Likewise.
* src/qemu/qemu_cgroup.c: Likewise.
(qemuSetupDiskPathAllow): Also, honor read-only disks.
2011-03-09 11:35:36 -07:00
Eric Blake
48096a0064 audit: rename remaining qemu audit functions
Also add ATTRIBUTE_NONNULL markers.

* src/qemu/qemu_audit.h: The pattern qemuDomainXXXAudit is
inconsistent; prefer qemuAuditXXX instead.
* src/qemu/qemu_audit.c: Reflect the renames.
* src/qemu/qemu_driver.c: Likewise.
* src/qemu/qemu_hotplug.c: Likewise.
* src/qemu/qemu_migration.c: Likewise.
* src/qemu/qemu_process.c: Likewise.
2011-03-09 11:35:20 -07:00
Eric Blake
f2512684ad audit: also audit cgroup controller path
Although the cgroup device ACL controller path can be worked out
by researching the code, it is more efficient to include that
information directly in the audit message.

* src/util/cgroup.h (virCgroupPathOfController): New prototype.
* src/util/cgroup.c (virCgroupPathOfController): Export.
* src/libvirt_private.syms: Likewise.
* src/qemu/qemu_audit.c (qemuAuditCgroup): Use it.
2011-03-09 10:19:17 -07:00
Eric Blake
d04916faae audit: split cgroup audit types to allow more information
Device names can be manipulated, so it is better to also log
the major/minor device number corresponding to the cgroup ACL
changes that libvirt made.  This required some refactoring
of the relatively new qemu cgroup audit code.

Also, qemuSetupChardevCgroup was only auditing on failure, not success.

* src/qemu/qemu_audit.h (qemuDomainCgroupAudit): Delete.
(qemuAuditCgroup, qemuAuditCgroupMajor, qemuAuditCgroupPath): New
prototypes.
* src/qemu/qemu_audit.c (qemuDomainCgroupAudit): Rename...
(qemuAuditCgroup): ...and drop a parameter.
(qemuAuditCgroupMajor, qemuAuditCgroupPath): New functions, to
allow listing device major/minor in audit.
(qemuAuditGetRdev): New helper function.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust callers.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupHostUsbDeviceCgroup, qemuSetupCgroup)
(qemuTeardownDiskPathDeny): Likewise.
(qemuSetupChardevCgroup): Likewise, fixing missing audit.
2011-03-09 09:08:10 -07:00
Eric Blake
30ad48836e audit: tweak audit messages to match conventions
* src/qemu/qemu_audit.c (qemuDomainHostdevAudit): Avoid use of
"type", which has a pre-defined meaning.
(qemuDomainCgroupAudit): Likewise, as well as "item".
2011-03-09 08:11:31 -07:00
Eric Blake
b12a02803e docs: silence warnings about generated API docs
I noticed these while testing 'make dist'.

Parsing ./../src/util/event.c
Function comment for virEventRegisterDefaultImpl lacks description of return value
Function comment for virEventRunDefaultImpl lacks description of return value
Parsing ./../src/util/virterror.c
Missing comment for function virSetErrorLogPriorityFunc

* src/util/event.c (virEventRegisterDefaultImpl)
(virEventRunDefaultImpl): Document return types.
* src/util/virterror.c (virSetErrorLogPriorityFunc): Provide docs.
2011-03-09 08:07:09 -07:00
Cole Robinson
9189301426 Don't overwrite virRun error messages
virRun gives pretty useful error output, let's not overwrite it unless there
is a good reason. Some places were providing more information about what
the commands were _attempting_ to do, however that's usually less useful from
a debugging POV than what actually happened.
2011-03-09 08:53:12 -05:00
Guido Günther
ae1c5a936c libvirtd: Remove indirect linking
as described at
http://wiki.debian.org/ToolChain/DSOLinking
https://fedoraproject.org/wiki/UnderstandingDSOLinkChange

otherwise the build fails on current Debian unstable with:

CCLD   libvirtd
/usr/bin/ld: ../src/.libs/libvirt_driver_lxc.a(libvirt_driver_lxc_la-lxc_container.o): undefined reference to symbol 'capng_apply'
/usr/bin/ld: note: 'capng_apply' is defined in DSO //usr/lib/libcap-ng.so.0 so try adding it to the linker command line

CCLD   libvirtd
/usr/bin/ld: ../src/.libs/libvirt_driver_storage.a(libvirt_driver_storage_la-storage_backend.o): undefined reference to symbol 'fgetfilecon'
/usr/bin/ld: note: 'fgetfilecon' is defined in DSO //lib/libselinux.so.1 so try adding it to the linker command line
//lib/libselinux.so.1: could not read symbols: Invalid operation

and similar errors.
2011-03-09 13:59:58 +01:00
Hu Tao
83d35233a9 Fix a wrong error message thrown to user
* src/qemu/qemu_driver.c: qemuDomainUpdateDeviceFlags() is not disk
  specific as the message suggests
2011-03-09 20:09:25 +08:00
Eric Blake
3dfd4ea398 build: avoid compiler warning on cygwin
On cygwin:

  CC       libvirt_driver_security_la-security_dac.lo
security/security_dac.c: In function 'virSecurityDACSetProcessLabel':
security/security_dac.c:618: warning: format '%d' expects type 'int', but argument 7 has type 'uid_t' [-Wformat]

We've done this before (see src/util/util.c).

* src/security/security_dac.c (virSecurityDACSetProcessLabel): On
cygwin, uid_t is a 32-bit long.
2011-03-08 21:56:18 -07:00
Eric Blake
b1a5aefcee build: fix build on cygwin
On cygwin:

  CC        libvirt_util_la-cgroup.lo
util/cgroup.c: In function 'virCgroupKillRecursiveInternal':
util/cgroup.c:1458: warning: implicit declaration of function 'virCgroupNew' [-Wimplicit-function-declaration]

* src/util/cgroup.c (virCgroupKill): Don't build on platforms
where virCgroupNew is unsupported.
2011-03-08 21:44:24 -07:00
Daniel Veillard
d299e1d08e Fix build on cygwin
Apparently some signals found on Unix are not exposed, this led
to a compilation failure
* src/util/logging.c: make code related to each signal dependant
  upon the definition of that signal
2011-03-08 16:01:25 +08:00
Wen Congyang
0e29f71135 support to detach USB disk
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-07 11:40:12 -07:00
Wen Congyang
8f338032b9 rename qemuDomainDetachSCSIDiskDevice to qemuDomainDetachDiskDevice
The way to detach a USB disk is the same as that to detach a SCSI
disk. Rename this function and we can use it to detach a USB disk.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-03-07 11:28:15 -07:00
Cole Robinson
56a4d8127f qemu_hotplug: Reword error if spice password change not available
Currently it sounds like spice is completely unsupported, which is
confusing.
2011-03-07 13:26:46 -05:00
Wen Congyang
ac9ee6b5e0 unlock eventLoop before calling callback function
When I use newest libvirt to save a domain, libvirtd will be deadlock.
Here is the output of gdb:
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f972a1fc710 (LWP 30265))]#0  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
    at qemu/qemu_driver.c:2074
    ret=0x7f972a1fbbe0) at remote.c:2273
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f9730bcd710 (LWP 30261))]#0  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
(gdb) p *(virMutexPtr)0x6fdd60
$2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 30261, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\065v\000\000\001", '\000' <repeats 26 times>, __align = 2}}
(gdb) p *(virMutexPtr)0x1a63ac0
$3 = {lock = {__data = {__lock = 2, __count = 0, __owner = 30265, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\071v\000\000\001", '\000' <repeats 26 times>, __align = 2}}
(gdb) info threads
  7 Thread 0x7f9730bcd710 (LWP 30261)  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  6 Thread 0x7f972bfff710 (LWP 30262)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f972b5fe710 (LWP 30263)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f972abfd710 (LWP 30264)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 3 Thread 0x7f972a1fc710 (LWP 30265)  0x000000351fe0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f97297fb710 (LWP 30266)  0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1 Thread 0x7f9737aac800 (LWP 30260)  0x000000351fe0803d in pthread_join () from /lib64/libpthread.so.0

The reason is that we will try to lock some object in callback function, and we may call event API with locking the same object.
In the function virEventDispatchHandles(), we unlock eventLoop before calling callback function. I think we should
do the same thing in the function virEventCleanupTimeouts() and virEventCleanupHandles().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2011-03-07 10:05:17 -07:00
Daniel P. Berrange
2ed6cc7bec Expose event loop implementation as a public API
Not all applications have an existing event loop they need
to integrate with. Forcing them to implement the libvirt
event loop integration APIs is an undue burden. This just
exposes our simple poll() based implementation for apps
to use. So instead of calling

   virEventRegister(....callbacks...)

The app would call

   virEventRegisterDefaultImpl()

And then have a thread somewhere calling

    static bool quit = false;
    ....
    while (!quit)
      virEventRunDefaultImpl()

* daemon/libvirtd.c, tools/console.c,
  tools/virsh.c: Convert to public event loop APIs
* include/libvirt/libvirt.h.in, src/libvirt_private.syms: Add
  virEventRegisterDefaultImpl and virEventRunDefaultImpl
* src/util/event.c: Implement virEventRegisterDefaultImpl
  and virEventRunDefaultImpl using poll() event loop
* src/util/event_poll.c: Add full error reporting
* src/util/virterror.c, include/libvirt/virterror.h: Add
  VIR_FROM_EVENTS
2011-03-07 14:16:13 +00:00
Daniel P. Berrange
343eaa150b Move event code out of the daemon/ into src/util/
The event loop implementation is used by more than just the
daemon, so move it into the shared area.

* daemon/event.c, src/util/event_poll.c: Renamed
* daemon/event.h, src/util/event_poll.h: Renamed
* tools/Makefile.am, tools/console.c, tools/virsh.c: Update
  to use new virEventPoll APIs
* daemon/mdns.c, daemon/mdns.c, daemon/Makefile.am: Update
  to use new virEventPoll APIs
2011-03-07 14:16:13 +00:00
Daniel Veillard
bcb40b852c Cleaning up some of the logging code
* src/util/logging.c: fix virLogDumpAllFD() to avoid snprintf, simplify
  the code and provide more useful signal descriptions. Also remove an
  unused variable.
2011-03-07 21:23:53 +08:00
Osier Yang
82dfc6f38e qemu: Support vram for video of qxl type
For qemu names the primary vga as "qxl-vga":

  1) if vram is specified for 2nd qxl device:

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,vram_size=$SIZE,...

  2) if vram is not specified for 2nd qxl device, (use the default
     set by global):

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,...

For qemu names all qxl devices as "qxl":

  1) if vram is specified for 2nd qxl device:

    -vga qxl -global qxl.vram_size=$SIZE \
    -device qxl,id=video1,vram_size=$SIZE ...

  2) if vram is not specified for 2nd qxl device:

    -vga qxl -global qxl-vga.vram_size=$SIZE \
    -device qxl,id=video1,...

"-global" is the only way to define vram_size for the primary qxl
device, regardless of how qemu names it, (It's not good a good
way, as original idea of "-global" is to set a global default for
a driver property, but to specify vram for first qxl device, we
have to use it).

For other qxl devices, as they are represented by "-device", could
specify it directly and seperately for each, and it overrides the
default set by "-global" if specified.

v1 - v2:
  * modify "virDomainVideoDefaultRAM" so that it returns 16M as the
    default vram_size for qxl device.

  * vram_size * 1024 (qemu accepts bytes for vram_size).

  * apply default vram_size for qxl device for which vram_size is
    not specified.

  * modify "graphics-spice" tests (more sensiable vram_size)

  * Add an argument of virDomainDefPtr type for qemuBuildVideoDevStr,
    to use virDomainVideoDefaultRAM in qemuBuildVideoDevStr).

v2 - v3:
  * Modify default video memory size for qxl device from 16M to 24M

  * Update codes to be consistent with changes on qemu_capabilities.*
2011-03-06 22:00:27 +08:00
Phil Petty
5a81401235 fixes for several memory leaks
Signed-off-by: Eric Blake <eblake@redhat.com>
2011-03-04 09:52:12 -07:00
Daniel Veillard
398553c157 Add an an internal API for emergency dump of debug buffer
virLogEmergencyDumpAll() allows to dump the content of the
debug buffer from within a signal handler. It saves to all
log file or stderr if none is found
* src/util/logging.h src/util/logging.c: add the new API
  and cleanup the old virLogDump code
* src/libvirt_private.syms: exports it as a private symbol
2011-03-04 22:43:55 +08:00
Daniel Veillard
35708ec151 Fix a counter bug in the log buffer
* src/util/logging.c: the start pointer need to wrap around too
2011-03-04 22:43:55 +08:00
Daniel Veillard
8b9a1190c1 Force all logs to go to the round robbin memory buffer
Initially only the log actually written out by libvirt were
saved on the memory buffer, this patch forces all informations
including info and debug to be saved in memory too. This is
useful to get full data in case of crash.
2011-03-04 22:43:55 +08:00
Laine Stump
f8ac67909d qemu: avoid corruption of domain hashtable and misuse of freed domains
This was also found while investigating

   https://bugzilla.redhat.com/show_bug.cgi?id=670848

An EOF on a domain's monitor socket results in an event being queued
to handle the EOF. The handler calls qemuProcessHandleMonitorEOF. If
it is a transient domain, this leads to a call to
virDomainRemoveInactive, which removes the domain from the driver's
hashtable and unref's it. Nowhere in this code is the qemu driver lock
acquired.

However, all modifications to the driver's domain hashtable *must* be
done while holding the driver lock, otherwise the hashtable can become
corrupt, and (even more likely) another thread could call a different
hashtable function and acquire a pointer to the domain that is in the
process of being destroyed.

To prevent such a disaster, qemuProcessHandleMonitorEOF must get the
qemu driver lock *before* it gets the DomainObj's lock, and hold it
until it is finished with the DomainObj. This guarantees that nobody
else modifies the hashtable at the same time, and that anyone who had
already gotten the DomainObj from the hashtable prior to this call has
finished with it before we remove/destroy it.
2011-03-04 08:13:11 -05:00
Laine Stump
e570ca1246 qemu: Add missing lock of virDomainObj before calling virDomainUnref
This was found while researching the root cause of:

https://bugzilla.redhat.com/show_bug.cgi?id=670848

virDomainUnref should only be called with the lock held for the
virDomainObj in question. However, when a transient qemu domain gets
EOF on its monitor socket, it queues an event which frees the monitor,
which unref's the virDomainObj without first locking it. If another
thread has already locked the virDomainObj, the modification of the
refcount could potentially be corrupted. In an extreme case, it could
also be potentially unlocked by virDomainObjFree, thus left open to
modification by anyone else who would have otherwise waited for the
lock (not to mention the fact that they would be accessing freed
data!).

The solution is to have qemuMonitorFree lock the domain object right
before unrefing it. Since the caller to qemuMonitorFree doesn't expect
this lock to be held, if the refcount doesn't go all the way to 0,
qemuMonitorFree must unlock it after the unref.
2011-03-04 08:12:58 -05:00
Matthias Bolte
b31d6c1269 esx: Escape password for XML
Passwords are allowed to contain <, >, &, ', " characters.
Those need to be replaced by the corresponding entities.

Reported by Hereward Cooper.
2011-03-03 22:18:09 +01:00
Eric Blake
d152f64760 util: correct retry path in virFileOperation
In virFileOperation, the parent does a fallback to a non-fork
attempt if it detects that the child returned EACCES.  However,
the child was calling _exit(-EACCES), which does _not_ appear
as EACCES in the parent.

* src/util/util.c (virFileOperation): Correctly pass EACCES from
child to parent.
2011-03-03 08:12:08 -07:00
Soren Hansen
e5f3b90e97 Pass virSecurityManagerPtr to virSecurityDAC{Set, Restore}ChardevCallback
virSecurityDAC{Set,Restore}ChardevCallback expect virSecurityManagerPtr,
but are passed virDomainObjPtr instead. This makes
virSecurityDACSetChardevLabel set a wrong uid/gid on chardevs. This
patch fixes this behaviour.

Signed-off-by: Soren Hansen <soren@linux2go.dk>
2011-03-03 08:08:16 -07:00
Jiri Denemark
9677cd33ee util: Allow removing hash entries in virHashForEach
This fixes a possible crash of libvirtd during its startup. When qemu
driver reconnects to running domains, it iterates over all domain
objects in a hash. When reconnecting to an associated qemu monitor
fails and the domain is transient, it's immediately removed from the
hash. Despite the fact that it's explicitly forbidden to do so. If
libvirtd is lucky enough, virHashForEach will access random memory when
the callback finishes and the deamon will crash.

Since it's trivial to fix virHashForEach to allow removal of hash
entries while iterating through them, I went this way instead of fixing
qemuReconnectDomain callback (and possibly others) to avoid deleting the
entries.
2011-03-03 15:22:16 +01:00
Daniel P. Berrange
d6d30cd4ae Attempt to improve an error message
Replace the 'Unknown failure' error message with something a
little bit more descriptive.

* src/util/virterror.c: Improve error message
2011-03-03 14:17:12 +00:00
Eric Blake
4f805dcdc4 qemu: avoid double close on domain restore
qemudDomainSaveImageStartVM was evil - it closed the incoming fd
argument on some, but not all, code paths, without informing the
caller about that action.  No wonder that this resulted in
double-closes: https://bugzilla.redhat.com/show_bug.cgi?id=672725

* src/qemu/qemu_driver.c (qemudDomainSaveImageStartVM): Alter
signature, to avoid double-close.
(qemudDomainRestore, qemudDomainObjRestore): Update callers.
2011-03-02 08:58:49 -07:00
Daniel P. Berrange
6a8ef183be add additional event debug points
Followup to commit 2222bd24
2011-03-01 16:54:20 -07:00
Eric Blake
7c6b22c4d5 qemu: only request sound cgroup ACL when required
When a SPICE or VNC graphics controller is present, and sound is
piggybacked over a channel to the graphics device rather than
directly accessing host hardware, then there is no need to grant
host hardware access to that qemu process.

* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Prevent sound with
spice, and with vnc when vnc_allow_host_audio is 0.
Reported by Daniel Berrange.
2011-02-28 09:42:25 -07:00
Daniel P. Berrange
3c37a171a2 Add check for kill() to fix build of cgroups on win32
The kill() function doesn't exist on Win32, so it needs to be
checked for at build time & code disabled in cgroups

* configure.ac: Check for kill()
* src/util/cgroup.c: Stub out virCGroupKill* functions
  when kill() isn't available
2011-02-28 14:13:58 +00:00
Michal Novotny
3ee7cf6c9b Add support for multiple serial ports into the Xen driver
this is the patch to add support for multiple serial ports to the
libvirt Xen driver. It support both old style (serial = "pty") and
new style (serial = [ "/dev/ttyS0", "/dev/ttyS1" ]) definition and
tests for xml2sexpr, sexpr2xml and xmconfig have been added as well.

Written and tested on RHEL-5 Xen dom0 and working as designed but
the Xen version have to have patch for RHBZ #614004 but this patch
is for upstream version of libvirt.

Also, this patch is addressing issue described in RHBZ #670789.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
2011-02-25 11:33:27 -07:00
Michal Novotny
79c3fe4d16 Fix port value parsing for serial and parallel ports
this is the patch to fix the virDomainChrDefParseTargetXML() functionality
to parse the target port from XML if available. This is necessary for
multiple serial port support which is the second part of this patch.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
2011-02-25 11:33:27 -07:00
Daniel P. Berrange
33191b419c Add APIs for killing off processes inside a cgroup
The virCgroupKill method kills all PIDs found in a cgroup

The virCgroupKillRecursively method does this recursively
for child cgroups.

The virCgroupKillPainfully method does a recursive kill
several times in a row until everything has really died
2011-02-25 14:21:30 +00:00
Daniel P. Berrange
16ba2aafc4 Allow hash tables to use generic pointers as keys
Relax the restriction that the hash table key must be a string
by allowing an arbitrary hash code generator + comparison func
to be provided

* util/hash.c, util/hash.h: Allow any pointer as a key
* internal.h: Include stdbool.h as standard.
* conf/domain_conf.c, conf/domain_conf.c,
  conf/nwfilter_params.c, nwfilter/nwfilter_gentech_driver.c,
  nwfilter/nwfilter_gentech_driver.h, nwfilter/nwfilter_learnipaddr.c,
  qemu/qemu_command.c, qemu/qemu_driver.c,
  qemu/qemu_process.c, uml/uml_driver.c,
  xen/xm_internal.c: s/char */void */ in hash callbacks
2011-02-25 13:00:54 +00:00
Daniel P. Berrange
6952708ca4 Remove deallocator parameter from hash functions
Since the deallocator is passed into the constructor of
a hash table it is not desirable to pass it into each
function again. Remove it from all functions, but provide
a virHashSteal to allow a item to be removed from a hash
table without deleteing it.

* src/util/hash.c, src/util/hash.h: Remove deallocator
  param from all functions. Add virHashSteal
* src/libvirt_private.syms: Add virHashSteal
* src/conf/domain_conf.c, src/conf/nwfilter_params.c,
  src/nwfilter/nwfilter_learnipaddr.c,
  src/qemu/qemu_command.c, src/xen/xm_internal.c: Update
  for changed hash API
2011-02-25 13:00:46 +00:00
Philipp Hahn
0905d1ee95 Fix spelling mistake: seek
Replace wrong "set" by correct "seek" in error message.

Signed-off-by: Philipp Hahn <hahn@univention.de>
2011-02-24 14:19:15 -07:00
Eric Blake
1aaef5ad72 audit: audit qemu pci and usb device passthrough
* src/qemu/qemu_audit.h (qemuDomainHostdevAudit): New prototype.
* src/qemu/qemu_audit.c (qemuDomainHostdevAudit): New function.
(qemuDomainStartAudit): Call as appropriate.
* src/qemu/qemu_hotplug.c (qemuDomainAttachHostPciDevice)
(qemuDomainAttachHostUsbDevice, qemuDomainDetachHostPciDevice)
(qemuDomainDetachHostUsbDevice): Likewise.
2011-02-24 13:32:17 -07:00
Eric Blake
e25f2c74df audit: audit qemu memory and vcpu adjusments
* src/qemu/qemu_audit.h (qemuDomainMemoryAudit)
(qemuDomainVcpuAudit): New prototypes.
* src/qemu/qemu_audit.c (qemuDomainResourceAudit)
(qemuDomainMemoryAudit, qemuDomainVcpuAudit): New functions.
(qemuDomainStartAudit): Call as appropriate.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainHotplugVcpus): Likewise.
2011-02-24 13:32:17 -07:00
Eric Blake
6bb98d419f audit: add qemu hooks for auditing cgroup events
* src/qemu/qemu_audit.h (qemuDomainCgroupAudit): New prototype.
* src/qemu/qemu_audit.c (qemuDomainCgroupAudit): Implement it.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Add audit.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupChardevCgroup, qemuSetupHostUsbDeviceCgroup)
(qemuSetupCgroup, qemuTeardownDiskPathDeny): Likewise.
2011-02-24 13:32:15 -07:00
Eric Blake
b4d3434fc2 audit: prepare qemu for listing vm in cgroup audits
* src/qemu/qemu_cgroup.h (struct qemuCgroupData): New helper type.
(qemuSetupDiskPathAllow, qemuSetupChardevCgroup)
(qemuTeardownDiskPathDeny): Drop unneeded prototypes.
(qemuSetupDiskCgroup, qemuTeardownDiskCgroup): Adjust prototype.
* src/qemu/qemu_cgroup.c
(qemuSetupDiskPathAllow, qemuSetupChardevCgroup)
(qemuTeardownDiskPathDeny): Mark static and use new type.
(qemuSetupHostUsbDeviceCgroup): Use new type.
(qemuSetupDiskCgroup): Alter signature.
(qemuSetupCgroup): Adjust caller.
* src/qemu/qemu_hotplug.c (qemuDomainAttachHostUsbDevice)
(qemuDomainDetachPciDiskDevice, qemuDomainDetachSCSIDiskDevice):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainAttachDevice)
(qemuDomainUpdateDeviceFlags): Likewise.
2011-02-24 13:31:05 -07:00
Eric Blake
061738764d cgroup: determine when skipping non-devices
* src/util/cgroup.c (virCgroupAllowDevicePath)
(virCgroupDenyDevicePath): Don't fail with EINVAL for
non-devices.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update caller.
* src/qemu/qemu_cgroup.c (qemuSetupDiskPathAllow)
(qemuSetupChardevCgroup, qemuSetupHostUsbDeviceCgroup)
(qemuSetupCgroup, qemuTeardownDiskPathDeny): Likewise.
2011-02-24 13:31:05 -07:00
Eric Blake
fd21ecfd49 virExec: avoid uninitialized memory usage
valgrind warns:

==21079== Syscall param rt_sigaction(act->sa_mask) points to uninitialised byte(s)
==21079==    at 0x329840F63E: __libc_sigaction (sigaction.c:67)
==21079==    by 0x4E5A8E7: __virExec (util.c:661)

Regression introduced in commit ab07533e.  Technically, sa_mask
shouldn't affect operation if sa_flags selects sa_handler, and
sa_handler selects SIG_IGN, but better safe than sorry.

* src/util/util.c (__virExec): Supply missing sigemptyset.
2011-02-24 13:12:52 -07:00
Daniel P. Berrange
4f2094a8a6 Allow 32-on-64 execution for LXC guests
Using the 'personality(2)' system call, we can make a container
on an x86_64 host appear to be i686. Likewise for most other
Linux 64bit arches.

* src/lxc/lxc_conf.c: Fill in 32bit capabilities for x86_64 hosts
* src/lxc/lxc_container.h, src/lxc/lxc_container.c: Add API to
  check if an arch has a 32bit alternative
* src/lxc/lxc_controller.c: Set the process personality when
  starting guest
2011-02-24 12:04:29 +00:00
Daniel P. Berrange
35416720c2 Put <stdbool.h> into internal.h so it is available everywhere
Remove the <stdbool.h> header from all source files / headers
and just put it into internal.h

* src/internal.h: Add <stdbool.h>
2011-02-24 12:04:06 +00:00
Jiri Denemark
9fc4b6a606 qemu: Switch over command line capabilities to virBitmap
This is done for two reasons:
- we are getting very close to 64 flags which is the maximum we can use
  with unsigned long long
- by using LL constants in enum we already violates C99 constraint that
  enum values have to fit into int
2011-02-24 12:10:00 +01:00
Jiri Denemark
23d935bd97 qemu: Rename qemud\?CmdFlags to qemuCaps
The new name complies more with the fact that it contains a set of
qemuCapsFlags.
2011-02-24 12:08:34 +01:00
Jiri Denemark
a96d08dc53 qemu: Use helper functions for handling cmd line capabilities
Three new functions (qemuCapsSet, qemuCapsClear, and qemuCapsGet) were
introduced replacing direct bit operations.
2011-02-24 12:07:06 +01:00
Jiri Denemark
21642e82b1 qemu: Rename QEMUD_CMD_FLAG_* to QEMU_CAPS_*
The new names comply more with the fact that they are all members of
enum qemuCapsFlags.
2011-02-24 12:05:39 +01:00
Jiri Denemark
40641f2a63 util: Add API for converting virBitmap into printable string 2011-02-24 12:03:04 +01:00
Jiri Denemark
533bee8249 util: Use unsigned long as a base type for virBitmap 2011-02-24 12:00:52 +01:00
Daniel P. Berrange
6704e3fdb3 Expose name + UUID to LXC containers via env variables
When spawning 'init' in the container, set

  LIBVIRT_LXC_UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  LIBVIRT_LXC_NAME=YYYYYYYYYYYY

to allow guest software to detect & identify that they
are in a container

* src/lxc/lxc_container.c: Set LIBVIRT_LXC_UUID and
  LIBVIRT_LXC_NAME env vars
2011-02-23 11:41:02 +00:00
Daniel P. Berrange
9f5bbe3b92 Fix off-by-1 in virFileAbsPath.
The virFileAbsPath was not taking into account the '/' directory
separator when allocating memory for combining cwd + path. Convert
to use virAsprintf to avoid this type of bug completely.

* src/util/util.c: Convert virFileAbsPath to use virAsprintf
2011-02-23 11:11:45 +00:00
Daniel P. Berrange
08fb2a9ce8 Fix group/mode for /dev/pts inside LXC container
Normal practice for /dev/pts is to have it mode=620,gid=5
but LXC was leaving mode=000,gid=0 preventing unprivilegd
users in the guest use of PTYs

* src/lxc/lxc_controller.c: Fix /dev/pts setup
2011-02-23 11:11:35 +00:00
Eric Blake
009fce98be security: avoid memory leak
Leak introduced in commit d6623003.

* src/qemu/qemu_driver.c (qemuSecurityInit): Avoid leak on failure.
* src/security/security_stack.c (virSecurityStackClose): Avoid
leaking component drivers.
2011-02-22 09:50:34 -07:00
Roopa Prabhu
dfd39ccda8 802.1Qbh: Delay IFF_UP'ing interface until migration final stage
Current code does an IFF_UP on a 8021Qbh interface immediately after a port
profile set. This is ok in most cases except when its the migration prepare
stage. During migration we want to postpone IFF_UP'ing the interface on the
destination host until the source host has disassociated the interface.
This patch moves IFF_UP of the interface to the final stage of migration.
The motivation for this change is to postpone any addr registrations on the
destination host until the source host has done the addr deregistrations.

While at it, for symmetry with associate move ifDown of a 8021Qbh interface
to before disassociate
2011-02-22 07:48:54 -05:00
Osier Yang
9f928af12b storage: make debug log more useful
"__func__" is useless there, as VIR_DEBUG will print the function
name.

* src/storage/storage_backend_mpath.c
2011-02-22 10:10:31 +08:00
Markus Groß
3e53c7f954 Renamed functions in xenxs 2011-02-21 11:15:08 -07:00
Markus Groß
2e69e66e38 Moved XM formatting functions to xenxs 2011-02-21 11:14:52 -07:00
Markus Groß
1556ced2de Moved XM parsing functions to xenxs 2011-02-21 11:11:22 -07:00
Markus Groß
2f2a88b998 Moved SEXPR formatting functions to xenxs 2011-02-21 11:07:43 -07:00
Markus Groß
07129039cc Moved SEXPR parsing functions to xenxs 2011-02-21 10:53:53 -07:00
Markus Groß
c71328b9aa Moved some SEXPR functions from xen-unified 2011-02-21 10:50:18 -07:00
Markus Groß
8606ca0d0b Moved SEXPR unit to utils 2011-02-21 10:48:02 -07:00
Wen Congyang
cf61114cee protect the scsi controller to be deleted when it is in use
Steps to reproduce this bug:
1. virsh attach-disk domain --source imagefile --target sdb --sourcetype file --driver qemu --subdriver raw
2. virsh detach-device controller.xml # remove scsi controller 0
3. virsh detach-disk domain sdb
   error: Failed to detach disk
   error: operation failed: detaching scsi0-0-1 device failed: Device 'scsi0-0-1' not found

I think we should not detach a controller when it is used by some other device.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-02-21 10:38:19 -07:00
Eric Blake
994e7567b6 maint: kill all remaining uses of old DEBUG macro
Done mechanically with:
$ git grep -l '\bDEBUG0\? *(' | xargs -L1 sed -i 's/\bDEBUG0\? *(/VIR_&/'

followed by manual deletion of qemudDebug in daemon/libvirtd.c, along
with a single 'make syntax-check' fallout in the same file, and the
actual deletion in src/util/logging.h.

* src/util/logging.h (DEBUG, DEBUG0): Delete.
* daemon/libvirtd.h (qemudDebug): Likewise.
* global: Change remaining clients over to VIR_DEBUG counterpart.
2011-02-21 08:46:52 -07:00
Eric Blake
03ba07cb73 hash: make virHashFree more free-like
Two-argument free functions are uncommon; match the style elsewhere
by caching the callback at creation.

* src/util/hash.h (virHashCreate, virHashFree): Move deallocator
argument to creation.
* cfg.mk (useless_free_options): Add virHashFree.
* src/util/hash.c (_virHashTable): Track deallocator.
(virHashCreate, virHashFree): Update to new signature.
* src/conf/domain_conf.c (virDomainObjListDeinit)
(virDomainObjListInit, virDomainDiskDefForeachPath)
(virDomainSnapshotObjListDeinit, virDomainSnapshotObjListInit):
Update callers.
* src/conf/nwfilter_params.c (virNWFilterHashTableFree)
(virNWFilterHashTableCreate): Likewise.
* src/conf/nwfilter_conf.c (virNWFilterTriggerVMFilterRebuild):
Likewise.
* src/cpu/cpu_generic.c (genericHashFeatures, genericBaseline):
Likewise.
* src/xen/xm_internal.c (xenXMOpen, xenXMClose): Likewise.
* src/nwfilter/nwfilter_learnipaddr.c (virNWFilterLearnInit)
(virNWFilterLearnShutdown): Likewise.
* src/qemu/qemu_command.c (qemuDomainPCIAddressSetCreate)
(qemuDomainPCIAddressSetFree): Likewise.
* src/qemu/qemu_process.c (qemuProcessWaitForMonitor): Likewise.
2011-02-21 08:27:02 -07:00
Daniel P. Berrange
2ff4c13754 Remove all object hashtable caches from virConnectPtr
The virConnectPtr struct will cache instances of all other
objects. APIs like virDomainLookupByUUID will return a
cached object, so if you do virDomainLookupByUUID twice in
a row, you'll get the same exact virDomainPtr instance.

This does not have any performance benefit, since the actual
logic in virDomainLookupByUUID (and other APIs returning
virDomainPtr, etc instances) is not short-circuited. All
it does is to ensure there is only one single virDomainPtr
in existance for any given UUID.

The caching has a number of downsides though, all relating
to stale data. If APIs aren't careful to always overwrite
the 'id' field in virDomainPtr it may become out of data.
Likewise for the name field, if a guest is renamed, or if
a guest is deleted, and then a new one created with the
same UUID but different name.

This has been an ongoing, endless source of bugs for all
applications using libvirt from languages with garbage
collection, causing them to get virDomainPtr instances
from long ago with stale data.

The caching is also a waste of memory resources, since
both applications, and language bindings often maintain
their own hashtable caches of object instances.

This patch removes all the hash table caching, so all
APIs return brand new virDomainPtr (etc) object instances.

* src/datatypes.h: Delete all hash tables.
* src/datatypes.c: Remove all object caching code
2011-02-21 11:50:46 +00:00
Stefan Berger
912d170f87 nwfilter: enable rejection of packets
This patch adds the possibility to not just drop packets, but to also have them rejected where iptables at least sends an ICMP msg back to the originator. On ebtables this again maps into dropping packets since rejecting is not supported.

I am adding 'since 0.8.9' to the docs assuming this will be the next version of libvirt.
2011-02-18 20:13:40 -05:00
Guido Günther
acab8a97ce Drop empty argument from dnsmasq call
since dnsmasq >= 2.56 now bails out with empty arguments. See
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613944
for the Debian bug and
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589885
for the upstream reasoning.
2011-02-18 22:23:15 +01:00
Eric Blake
7b6286b780 build: fix broken mingw cross-compilation
Two regressions:
Commit df1011ca broke builds for systems that lack devmapper
(non-Linux, as well as Linux with ./autogen.sh --without-libvirtd
and without the libraries present).
Commit ce6fd650 broke cross-compilation, due to a gnulib bug.

* .gnulib: Update to latest, for cross-compilation fix.
* src/util/util.c (virIsDevMapperDevice): Provide stub for
platforms not using storage driver.
* configure.ac (devmapper): Arrange to define HAVE_LIBDEVMAPPER_H.
devmapper issue reported by Wen Congyang.
2011-02-18 12:02:22 -07:00
Matthias Bolte
791da4e736 esx: Ignore malformed host UUID from BIOS
Etienne Gosset reported that libvirt fails to connect to his ESX
server because it failed to parse its malformed host UUID, that
contains an additional space and lacks one hexdigit in the last
group:

xxxxxxxx-xxxx-xxxx-xxxx- xxxxxxxxxxx

Don't treat this as a fatal error, just ignore it.
2011-02-18 17:59:05 +01:00
Michal Privoznik
595174aeb7 virsh: freecell --all getting wrong NUMA nodes count
Virsh freecell --all was not only getting wrong NUMA nodes count, but
even the NUMA nodes IDs. They doesn't have to be continuous, as I've
found out during testing this. Therefore a modification of
nodeGetCellsFreeMemory() error message.
2011-02-18 09:26:40 -07:00
Eric Blake
053013b0da build: recompute symbols after changing configure options
$ ./configure
...
$ make
...
  GEN    libvirt.syms
...
$ ./configure --with-driver-modules
...
$ make
...

libvirt.syms doesn't get regenerated but it should as it should
contain virDriverLoadModule now.

* src/Makefile.am (libvirt.syms): Depend on configure changes.
Reported by Matthias Bolte.
2011-02-18 08:58:36 -07:00
Jim Fehlig
efc2594b4e Do not add drive 'boot=on' param when a kernel is specified
libvirt-tck was failing several domain tests [1] with qemu 0.14, which
is now less tolerable of specifying 2 bootroms with the same boot index [2].

Drop the 'boot=on' param if kernel has been specfied.

[1] https://www.redhat.com/archives/libvir-list/2011-February/msg00559.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2011-02/msg01892.html
2011-02-17 20:32:48 -07:00
Christophe Fergeau
50daaa0a3e remove duplicated call to reportOOMError 2011-02-17 17:12:23 -07:00
Christophe Fergeau
787e38890e remove space between function name and (
There were several occurrences of an extra space inserted between
a function name and the ( opening the argument list in
datatypes.c. This is not consistent with the coding style used in
the rest of this file so removing this extra space makes the
code slightly more readable.
2011-02-17 17:10:27 -07:00
Christophe Fergeau
7b9a509953 don't check for NULL before calling virHashFree
virHashFree follows the convention described in HACKING that
XXXFree() functions can be called with a NULL argument.
2011-02-17 17:02:32 -07:00
Christophe Fergeau
9905c69e4f remove no longer needed calls to virReportOOMError
Now that the virHash handling functions call virReportOOMError by
themselves when needed, users of the virHash API no longer need to
do it by themselves. Since users of the virHash API were not
consistently calling virReportOOMError after memory failures from
the virHash code, this has the added benefit of making OOM
reporting from this code more consistent and reliable.
2011-02-17 16:59:14 -07:00
Christophe Fergeau
7f1c65e551 factor common code in virHashAddEntry and virHashUpdateEntry
The only difference between these 2 functions is that one errors
out when the entry is already present while the other modifies
the existing entry. Add an helper function with a boolean argument
indicating whether existing entries should be updated or not, and
use this helper in both functions.
2011-02-17 16:46:54 -07:00
Christophe Fergeau
5c5880e047 add hash table rebalancing in virHashUpdateEntry
The code in virHashUpdateEntry and virHashAddEntry is really
similar. However, the latter rebalances the hash table when
one of its buckets contains too many elements while the former
does not. Fix this discrepancy.
2011-02-17 16:45:25 -07:00
Eric Blake
aebe04d75e hash: modernize debug code
* src/util/hash.c (virHashGrow) [DEBUG_GROW]: Use modern logging.
Reported by Christophe Fergeau.
2011-02-17 16:33:12 -07:00
Wen Congyang
34c13d0d1a check more error info about whether drive_add failed
When we attach a disk, but we specify a wrong format of disk image,
qemu monitor command drive_add will fail, but libvirt does not detect
this error.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
2011-02-17 14:10:26 -07:00
Eric Blake
bb904f45ff logging: make VIR_ERROR and friends preserve errno
Followup to commit 17e19add, and would have prevented the bug
independently fixed in commit 76c57a7c.

* src/util/logging.c (virLogMessage): Preserve errno, since
logging should be as unintrusive as possible.
2011-02-17 14:03:11 -07:00
Laine Stump
5754dbd56d Give each virtual network bridge its own fixed MAC address
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=609463

The problem was that, since a bridge always acquires the MAC address
of the connected interface with the numerically lowest MAC, as guests
are started and stopped, it was possible for the MAC address to change
over time, and this change in the network was being detected by
Windows 7 (it sees the MAC of the default route change), so on each
reboot it would bring up a dialog box asking about this "new network".

The solution is to create a dummy tap interface with a MAC guaranteed
to be lower than any guest interface's MAC, and attach that tap to the
bridge as soon as it's created. Since all guest MAC addresses start
with 0xFE, we can just generate a MAC with the standard "0x52, 0x54,
0" prefix, and it's guaranteed to always win (physical interfaces are
never connected to these bridges, so we don't need to worry about
competing numerically with them).

Note that the dummy tap is never set to IFF_UP state - that's not
necessary in order for the bridge to take its MAC, and not setting it
to UP eliminates the clutter of having an (eg) "virbr0-nic" displayed
in the output of the ifconfig command.

I chose to not auto-generate the MAC address in the network XML
parser, as there are likely to be consumers of that API that don't
need or want to have a MAC address associated with the
bridge.

Instead, in bridge_driver.c when the network is being defined, if
there is no MAC, one is generated. To account for virtual network
configs that already exist when upgrading from an older version of
libvirt, I've added a %post script to the specfile that searches for
all network definitions in both the config directory
(/etc/libvirt/qemu/networks) and the state directory
(/var/lib/libvirt/network) that are missing a mac address, generates a
random address, and adds it to the config (and a matching address to
the state file, if there is one).

docs/formatnetwork.html.in: document <mac address.../>
docs/schemas/network.rng: add nac address to schema
libvirt.spec.in: %post script to update existing networks
src/conf/network_conf.[ch]: parse and format <mac address.../>
src/libvirt_private.syms: export a couple private symbols we need
src/network/bridge_driver.c:
    auto-generate mac address when needed,
    create dummy interface if mac address is present.
tests/networkxml2xmlin/isolated-network.xml
tests/networkxml2xmlin/routed-network.xml
tests/networkxml2xmlout/isolated-network.xml
tests/networkxml2xmlout/routed-network.xml: add mac address to some tests
2011-02-17 13:36:32 -05:00
Laine Stump
13ae7a02b3 Allow brAddTap to create a tap device that is down
An upcoming patch has a use for a tap device to be created that
doesn't need to be actually put into the "up" state, and keeping it
"down" keeps the output of ifconfig from being unnecessarily cluttered
(ifconfig won't show down interfaces unless you add "-a").

bridge.[ch]: add "up" as an arg to brAddTap()
uml_conf.c, qemu_command.c: add "up" (set to "true") to brAddTap() call.
2011-02-17 13:36:22 -05:00
Laine Stump
e9bd5c0e24 Add txmode attribute to interface XML for virtio backend
This is in response to:

   https://bugzilla.redhat.com/show_bug.cgi?id=629662

Explanation

qemu's virtio-net-pci driver allows setting the algorithm used for tx
packets to either "bh" or "timer". This is done by adding ",tx=bh" or
",tx=timer" to the "-device virtio-net-pci" commandline option.

'bh' stands for 'bottom half'; when this is set, packet tx is all done
in an iothread in the bottom half of the driver. (In libvirt, this
option is called the more descriptive "iothread".)

'timer' means that tx work is done in qemu, and if there is more tx
data than can be sent at the present time, a timer is set before qemu
moves on to do other things; when the timer fires, another attempt is
made to send more data. (libvirt retains the name "timer" for this
option.)

The resulting difference, according to the qemu developer who added
the option is:

    bh makes tx more asynchronous and reduces latency, but potentially
    causes more processor bandwidth contention since the cpu doing the
    tx isn't necessarily the cpu where the guest generated the
    packets.

Solution

This patch provides a libvirt domain xml knob to change the option on
the qemu commandline, by adding a new attribute "txmode" to the
<driver> element that can be placed inside any <interface> element in
a domain definition. It's use would be something like this:

    <interface ...>
      ...
      <model type='virtio'/>
      <driver txmode='iothread'/>
      ...
    </interface>

I chose to put this setting as an attribute to <driver> rather than as
a sub-element to <tune> because it is specific to the virtio-net
driver, not something that is generally usable by all network drivers.
(note that this is the same placement as the "driver name=..."
attribute used to choose kernel vs. userland backend for the
virtio-net driver.)

Actually adding the tx=xxx option to the qemu commandline is only done
if the version of qemu being used advertises it in the output of

    qemu -device virtio-net-pci,?

If a particular txmode is requested in the XML, and the option isn't
listed in that help output, an UNSUPPORTED_CONFIG error is logged, and
the domain fails to start.
2011-02-17 11:07:58 -05:00
Laine Stump
b670a41206 Restructure domain struct interface "driver" data for easier expansion
When the <driver> element (and its "name" attribute) was added to the
domain XML's interface element, a "backend" enum was simply added to
the toplevel of the virDomainNetDef struct.

Ignoring the naming inconsistency ("name" vs. "backend"), this is fine
when there's only a single item contained in the driver element of the
XML, but doesn't scale well as we add more attributes that apply to
the backend of the virtio-net driver, or add attributes applicable to
other drivers.

This patch changes virDomainNetDef in two ways:

1) Rename the item in the struct from "backend" to "name", so that
   it's the same in the XML and in the struct, hopefully avoiding
   confusion for someone unfamiliar with the function of the
   attribute.

2) Create a "driver" union within virDomainNetDef, and a "virtio"
   struct in that struct, which contains the "name" enum value.

3) Move around the virDomainNetParse and virDomainNetFormat functions
   to allow for simple plugin of new attributes without disturbing
   existing code. (you'll note that this results in a seemingly
   redundant if() in the format function, but that will no longer be
   the case as soon as a 2nd attribute is added).

In the future, new attributes for the virtio driver backend can be
added to the "virtio" struct, and any other network device backend that
needs an attribute will have its own struct added to the "driver"
union.
2011-02-17 11:07:45 -05:00
Daniel P. Berrange
766de43533 Move all the QEMU migration code to a new file
The introduction of the v3 migration protocol, along with
support for migration cookies, will significantly expand
the size of the migration code. Move it all to a separate
file to make it more manageable

The functions are not moved 100%. The API entry points
remain in the main QEMU driver, but once the public
virDomainPtr is resolved to the internal virDomainObjPtr,
all following code is moved.

This will allow the new v3 API entry points to call into the
same shared internal migration functions

* src/qemu/qemu_domain.c, src/qemu/qemu_domain.h: Add
  qemuDomainFormatXML helper method
* src/qemu/qemu_driver.c: Remove all migration code
* src/qemu/qemu_migration.c, src/qemu/qemu_migration.h: Add
  all migration code.
2011-02-17 12:56:10 +00:00