593 Commits

Author SHA1 Message Date
Stefan Berger
4435f3c477 nwfilter: resolve deadlock between VM ops and filter update
This is from a bug report and conversation on IRC where Soren reported that while a filter update is occurring on one or more VMs (due to a rule having been edited for example), a deadlock can occur when a VM referencing a filter is started.

The problem is caused by the two locking sequences of

qemu_driver, qemu_domain, filter            # for the VM start operation
filter, qemu_driver, qemu_domain            # for the filter update operation

that obviously don't lock in the same order. The problem is the 2nd lock sequence. Here the qemu_driver lock is being grabbed in qemu_driver:qemudVMFilterRebuild()

The following solution is based on the idea of trying to re-arrange the 2nd sequence of locks as follows:

qemu_driver, filter, qemu_driver, qemu_domain

and making the qemu driver recursively lockable so that a second lock can occur. This would then lead to the following net locking sequence

qemu_driver, filter, qemu_domain

where the 2nd qemu_driver lock has been ( logically ) eliminated.

The 2nd part of the idea is that the sequence of locks (filter, qemu_domain) and (qemu_domain, filter) becomes interchangeable if all code paths where filter AND qemu_domain are locked have a preceding qemu_driver lock that basically blocks their concurrent execution.

So, the following code paths exist towards qemu_driver:qemudVMFilterRebuild where we now want to put a qemu_driver lock in front of the filter lock.

-> nwfilterUndefine()   [ locks the filter ]
    -> virNWFilterTestUnassignDef()
        -> virNWFilterTriggerVMFilterRebuild()
            -> qemudVMFilterRebuild()

-> nwfilterDefine()
    -> virNWFilterPoolAssignDef() [ locks the filter ]
        -> virNWFilterTriggerVMFilterRebuild()
            -> qemudVMFilterRebuild()

-> nwfilterDriverReload()
    -> virNWFilterPoolLoadAllConfigs()
        -> virNWFilterPoolObjLoad()
            -> virNWFilterPoolAssignDef() [ locks the filter ]
                -> virNWFilterTriggerVMFilterRebuild()
                    -> qemudVMFilterRebuild()

-> nwfilterDriverStartup()
    -> virNWFilterPoolLoadAllConfigs()
        -> virNWFilterPoolObjLoad()
            -> virNWFilterPoolAssignDef() [ locks the filter ]
                -> virNWFilterTriggerVMFilterRebuild()
                    -> qemudVMFilterRebuild()

Qemu is not the only driver using the nwfilter driver; the UML driver also calls into it. Therefore qemudVMFilterRebuild() can be exchanged with umlVMFilterRebuild(), along with the driver lock of qemu_driver, which can now be a uml_driver. Further, since UML and Qemu domains can be running on the same machine, triggering a rebuild of the filter can touch both types of drivers and their domains.

In the patch below I am now extending each nwfilter callback driver with functions for locking and unlocking the (VM) driver (UML, QEMU) and introducing new functions for locking all registered callback drivers and unlocking them. Then I am distributing the lock-all-cbdrivers/unlock-all-cbdrivers calls into the above call paths. The last shown call path, starting with nwfilterDriverStartup(), is problematic since the nwfilter driver is initialized before the Qemu and UML drivers are, and thus a lock in that path would result in an attempt to lock a NULL pointer. However, virNWFilterTriggerVMFilterRebuild() is never called in that path, so we never lock either the qemu_driver or the uml_driver there. Therefore, only the first 3 paths now receive calls to lock and unlock all callback drivers. Now that the locks are distributed where it matters I can remove the qemu_driver and uml_driver lock from qemudVMFilterRebuild() and umlVMFilterRebuild() and no longer require the recursive locks.

For now I want to put this out as an RFC patch. I have tested it by 'stretching' the critical section after the define/undefine functions each lock the filter so I can (easily) concurrently execute another VM operation (suspend,start). That code is in this patch and if you want you can de-activate it. It seems to work ok and operations are being blocked while the update is being done.
I still also want to verify the other assumption above that locking filter and qemu_domain always has a preceding qemu_driver lock.
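
The intended net lock order on the filter update path can be sketched roughly as below. This is only an illustration and not the code from the patch: the CallbackDriver struct, the register/lock-all/unlock-all helpers and the trace log are hypothetical names, and logging stands in for taking real mutexes so that the order is easy to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trace of lock events; real code would take mutexes instead of logging. */
static char lock_trace[256];

static void trace(const char *event)
{
    strncat(lock_trace, event, sizeof(lock_trace) - strlen(lock_trace) - 1);
    strncat(lock_trace, " ", sizeof(lock_trace) - strlen(lock_trace) - 1);
}

/* Hypothetical callback-driver record: each VM driver (QEMU, UML)
 * registers functions that lock and unlock its driver-wide lock. */
typedef struct {
    const char *name;
    void (*lock)(void);
    void (*unlock)(void);
} CallbackDriver;

#define MAX_CB_DRIVERS 4
static CallbackDriver *cb_drivers[MAX_CB_DRIVERS];
static size_t n_cb_drivers;

static int register_callback_driver(CallbackDriver *drv)
{
    if (n_cb_drivers == MAX_CB_DRIVERS)
        return -1;
    cb_drivers[n_cb_drivers++] = drv;
    return 0;
}

/* Lock every registered VM driver, always in registration order. */
static void lock_all_callback_drivers(void)
{
    size_t i;
    for (i = 0; i < n_cb_drivers; i++)
        cb_drivers[i]->lock();
}

/* Unlock in reverse order. */
static void unlock_all_callback_drivers(void)
{
    size_t i;
    for (i = n_cb_drivers; i > 0; i--)
        cb_drivers[i - 1]->unlock();
}

static void qemu_lock(void)   { trace("+qemu_driver"); }
static void qemu_unlock(void) { trace("-qemu_driver"); }
static void uml_lock(void)    { trace("+uml_driver"); }
static void uml_unlock(void)  { trace("-uml_driver"); }

static CallbackDriver qemu_cb = { "QEMU", qemu_lock, qemu_unlock };
static CallbackDriver uml_cb  = { "UML", uml_lock, uml_unlock };

/* Stand-in for the rebuild callback, which locks each domain in turn. */
static void rebuild_domains(void)
{
    trace("+domain");
    trace("-domain");
}

/* Filter update path: driver locks first, then the filter, then domains. */
static void filter_update(void (*rebuild)(void))
{
    lock_all_callback_drivers();
    trace("+filter");
    rebuild();
    trace("-filter");
    unlock_all_callback_drivers();
}
```

With both drivers registered, filter_update(rebuild_domains) takes the driver locks before the filter lock, which matches the order used by the VM start path.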
2010-10-13 10:33:26 -04:00
Daniel P. Berrange
a5c646a770 Implement support for virtio plan9fs filesystem passthrough in QEMU
Make use of the existing <filesystem> element to support plan9fs
filesystem passthrough in the QEMU driver

    <filesystem type='mount'>
      <source dir='/export/to/guest'/>
      <target dir='/import/from/host'/>
    </filesystem>

NB, the target is not actually a directory, it is merely an arbitrary
string tag that is exported to the guest as a hint for where to mount
it.
2010-10-13 12:04:50 +01:00
Nikunj A. Dadhania
261ad74e52 Adding memtunables to qemuSetupCgroup
QEmu startup will pick up the memory tunables specified in the domain
configuration file
2010-10-12 19:26:09 +02:00
Nikunj A. Dadhania
013fe4b848 Implement domainGetMemoryParameters for QEmu
Driver interface for getting memory parameters, eg. hard_limit,
soft_limit and swap_hard_limit based on cgroup support
2010-10-12 19:26:09 +02:00
Nikunj A. Dadhania
71d0b4275d Implement domainSetMemoryParameters for QEmu
Driver interface for setting memory hard_limit, soft_limit and swap
hard_limit based on cgroup support
2010-10-12 19:26:09 +02:00
Nikunj A. Dadhania
d390fce413 XML parsing for memory tunables
Add parsing code for memory tunables in the domain XML file.
Also change the internal structures used for domain memory
information.
Adds a new specific test
2010-10-12 19:26:09 +02:00
Nikunj A. Dadhania
0cd7823271 Adding virDomainSetMemoryParameters and virDomainGetMemoryParameters API
Public api to set/get memory tunables supported by the hypervisors.

dv:
* some cleanups in libvirt.c
* adding extra checks in libvirt.c new entry points

v4:
* Move exporting public API to this patch
* Add unsigned int flags to the public api for future extensions

v3:
* Add domainGetMemoryParameters and NULL in all the driver interface

v2:
* Initialize domainSetMemoryParameters to NULL in all the driver
  interface structure.
2010-10-12 19:26:09 +02:00
Guido Günther
2ae5086c97 Return a suitable error message if we can't find a matching emulator
2010-10-12 09:07:53 +02:00
Daniel P. Berrange
48ab20999f Fix off-by-1 in QEMU boot arg array handling
A QEMU guest can have up to VIR_DOMAIN_BOOT_LAST boot entries
defined. When building the QEMU arg, each entry takes a
single byte. This means the array must be declared to be
VIR_DOMAIN_BOOT_LAST+1 bytes in length to allow for the
trailing null
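
A minimal sketch of the sizing rule, assuming a hypothetical enum and helper that only mirror the idea (these are not the actual libvirt definitions): one byte per boot entry plus one for the terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the boot-device enum; BOOT_LAST is the count. */
enum { BOOT_FLOPPY, BOOT_CDROM, BOOT_DISK, BOOT_NET, BOOT_LAST };

/* QEMU boot letters: a = floppy, d = cdrom, c = hard disk, n = network. */
static const char boot_letters[BOOT_LAST] = { 'a', 'd', 'c', 'n' };

/* Build the boot string; out needs BOOT_LAST + 1 bytes so that a guest
 * with the maximum number of entries still has room for the '\0'. */
static void build_boot_arg(const int *devs, size_t ndevs,
                           char out[BOOT_LAST + 1])
{
    size_t i;

    if (ndevs > BOOT_LAST)
        ndevs = BOOT_LAST;
    for (i = 0; i < ndevs; i++)
        out[i] = boot_letters[devs[i]];
    out[ndevs] = '\0';
}
```

Declaring the buffer with only BOOT_LAST bytes would make the final store write one byte past the end whenever all entries are used.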

* src/qemu/qemu_conf.c: Fix off-by-1 boot arg array size
2010-09-10 11:14:01 +01:00
Luiz Capitulino
e70880c51b qemu: qemuMonitorJSONEjectMedia(): Fix arguments' type
QMP in QEMU 0.13 has been fixed to enforce type correctness,
this means that boolean types must be true or false, not
integers.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2010-09-09 16:29:40 -06:00
Luiz Capitulino
ffefe5fb86 qemu: qemuMonitorJSONMigrate(): Fix arguments' type
QMP in QEMU 0.13 has been fixed to enforce type correctness,
this means that boolean types must be true or false, not
integers.
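
As an illustration of the type fix, here is a small sketch that emits a JSON boolean rather than an integer for the "force" argument of the QMP eject command. The format_eject() helper is hypothetical and is not how libvirt builds its monitor commands; the device name is assumed to need no JSON escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format a QMP "eject" command. The boolean is written as the JSON
 * literal true/false, not as 1/0, which a type-checking QMP rejects. */
static int format_eject(char *buf, size_t len, const char *device, bool force)
{
    return snprintf(buf, len,
                    "{\"execute\":\"eject\",\"arguments\":"
                    "{\"device\":\"%s\",\"force\":%s}}",
                    device, force ? "true" : "false");
}
```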

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2010-09-09 16:29:27 -06:00
Soren Hansen
efe4e210b8 Rename qemudShrinkDisks to virDomainDiskRemove and move to domain_conf.c
Other drivers will need this same functionality, so move it up to
conf/domain_conf.c and give it a more general name.

Signed-off-by: Soren Hansen <soren@linux2go.dk>
2010-08-24 20:17:48 +02:00
Daniel P. Berrange
6e44ec7a91 Add support for -enable-kqemu flag
Previously QEMU enabled KQEMU by default and had -no-kqemu.
0.11.x switched to requiring -enable-kqemu. 0.12.x dropped
kqemu entirely. This patch adds support for -enable-kqemu
so 0.11.x works. It replaces a huge set of if() with a
switch() to make the code a bit more readable.

* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Support
  -enable-kqemu
2010-08-23 14:10:15 +01:00
Jiri Denemark
7fb3435186 qemu: Remove code duplication
We already filled the PCI address structure when we checked whether it's
free or not, so let's just use the structure here instead of filling it
again.
2010-08-20 16:26:28 +02:00
Jiri Denemark
1208e6e488 qemu: Check for errors when converting PCI address to string
2010-08-20 16:26:28 +02:00
Jiri Denemark
72c791e430 qemu: Fix JSON migrate_set_downtime command
2010-08-20 16:26:28 +02:00
Eric Blake
4b93002358 build: delete dead comments
* src/qemu/qemu_driver.c (qemudGetProcessInfo): Clean up.
* src/uml/uml_driver.c (umlGetProcessInfo): Likewise.
* src/xen/sexpr.c (_string2sexpr): Likewise.
2010-08-19 16:09:46 -06:00
Chris Lalancette
4303c91cc3 Fix up qemu domain save/managed save locking.
The current version of the qemu managed save implementation
is subject to a race where the domain shuts down between
the time that we start the command and the time that we
actually try to do the save.  Close this race by making
qemuDomainSaveFlags() expect both the driver and the passed-in
vm object to be locked before executing.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-08-17 16:18:49 -04:00
Jiri Denemark
0a5f3ae0c6 qemu: Fix copy&paste error in warning message
This also makes the message consistent with the message used in error
path of qemudDomainAttachHostPciDevice.
2010-08-16 21:37:13 +02:00
Jiri Denemark
5afec51730 qemu: Release PCI slot when detaching disk and net devices
2010-08-16 21:36:59 +02:00
Jiri Denemark
4f86613ba1 qemu: Re-reserve all PCI addresses on libvirtd restart
When reconnecting to existing VMs, we re-reserved only those PCI
addresses which were explicitly mentioned in domain XML. Since some
addresses are always reserved (e.g., 0:0:0 and 0:0:1), we need to handle
those too.

Also all this should only be done if device flag is supported by qemu.
2010-08-16 21:36:53 +02:00
Stefan Berger
cf6f8b9a97 nwfilter: extend nwfilter reload support
In this patch I am extending and fixing the nwfilter module's reload support to stop all ongoing threads (for learning IP addresses of interfaces) and rebuild the filtering rules of all interfaces of all VMs when libvirt is started. Now libvirtd rebuilds the filters upon the SIGHUP signal and libvirtd restart.

About the patch: The nwfilter functions require a virConnectPtr. Therefore I am opening a connection in qemudStartup, which later on needs to be closed outside where the driver lock is held since otherwise it ends up in a deadlock due to virConnectClose() trying to lock the driver as well.

I have tested this now for a while with several machines running and needing the IP address learner thread(s). The rebuilding of the firewall rules seems to work fine following libvirtd restart or a SIGHUP. Also the termination of libvirtd worked fine.
2010-08-16 12:59:54 -04:00
Chris Lalancette
e80f1a7e3f Move the tunnelled migration unix socket to /var/lib/libvirt/qemu
Since the qemu process is running as qemu:qemu, it can't actually
look at the unix socket in /var/run/libvirt/qemu which is owned by
root and has permission 700.  Move the unix socket to
/var/lib/libvirt/qemu, which is already owned by qemu:qemu.

Thanks to Justin Clift for testing this out for me.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-08-13 08:39:53 -04:00
Chris Lalancette
a2f0b6b81d Fix tunnelled migration with qemu running as qemu:qemu.
The problem is that on the source of the migration, libvirtd
is responsible for creating the unix socket over which the data
will flow.  Since libvirtd is running as root, this file will
be created as root.  When the qemu process running as qemu:qemu
goes to access the unix file to write data to it, it will get
permission denied and fail.  Make sure to change the owner
of the unix file to qemu:qemu.

Thanks to Justin Clift for testing this patch out for me.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-08-13 08:39:46 -04:00
Daniel Veillard
986c208695 qemu: avoid calling the balloon info command if disabled
Basically a followup of the previous patch about balloon deactivation:
if the balloon is deactivated, do not ask qemu for balloon information,
as we will just get an error back.
This can make a huge difference in the time needed for domain
information or list when a machine is loaded, and balloon has been
deactivated in the guests.

* src/qemu/qemu_driver.c: do not get the balloon info if the balloon
  support is disabled
2010-08-12 18:32:16 +02:00
Daniel Veillard
79c27a620a allow memballoon type of none to deactivate it
The balloon device is automatically added to qemu guests if supported,
but it may be useful to deactivate it. The simplest way to not change the
existing behaviour is to allow
  <memballoon type="none"/>
as an extra option to deactivate it (it is automatically added if the
memballoon construct is missing for the domain).
The following simple patch just adds the extra option and does not
change the default behaviour but avoid creating a balloon device if
type="none" is used.

* docs/schemas/domain.rng: add the extra type attribute value
* src/conf/domain_conf.c src/conf/domain_conf.h: add the extra enum
  value
* src/qemu/qemu_conf.c: if enum is NONE, don't activate the device,
  i.e. don't pass the args to qemu/kvm
2010-08-11 11:28:17 +02:00
Jiri Denemark
d1e5676c0d qemu: Hack around asynchronous device_del
device_del command is not synchronous for PCI devices, it merely asks
the guest to release the device and returns. If the host wants to use
that device before the guest actually releases it, we are in big
trouble. To avoid this, we already added a loop which waits up to 10
seconds until the device is actually released before we do anything else
with that device. But we only added this loop for managed PCI devices
before we try to reattach them back to the host.

However, we need to wait even for non-managed devices. We don't reattach
them automatically, but we still want to prevent the host from using it.
This was revealed thanks to sVirt: when we relabel sysfs files
corresponding to the PCI device before the guest finished releasing the
device, qemu is no longer allowed to access those files and if it wants
(as a result of guest's request) to write anything to them, it just
exits, which kills the guest.

This is not a proper fix and needs some further work both on libvirt and
qemu side in the future.
2010-08-10 16:59:49 +02:00
Doug Goldstein
0890a70a19 Fix return value usage
Fix the error checking to use the return value from brAddTap() instead
of checking the current errno value which might have been changed by
clean up calls inside of brAddTap().
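
The pitfall can be sketched as follows. add_tap() is a hypothetical stand-in for brAddTap(), assumed here to return 0 on success or an errno-style code on failure; the chosen codes are only examples.

```c
#include <assert.h>
#include <errno.h>

/* Cleanup that happens to overwrite errno, as a failing close() might. */
static void cleanup(void)
{
    errno = EBADF;
}

/* Hypothetical stand-in for brAddTap(): the failure code is saved before
 * cleanup runs and is handed back through the return value. */
static int add_tap(int simulate_failure)
{
    if (simulate_failure) {
        int saved = ENODEV;   /* the real cause of the failure */

        errno = saved;
        cleanup();            /* errno no longer holds the real cause */
        return saved;
    }
    return 0;
}
```

A caller that inspects errno after add_tap() fails sees the cleanup's EBADF, while the return value still carries the original cause.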

Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
2010-08-05 17:05:16 -06:00
Doug Goldstein
bcc8b58be3 qemu: improve error if tun device is missing
Added a more detailed error message when adding a tap device fails and
the kernel is missing tun support.

Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
2010-08-05 17:04:38 -06:00
Daniel Veillard
634ea3faae Do not use boot=on on IDE device
A followup on the boot=on problem: basically it does not need to be
specified when booting from IDE devices when using KVM
* src/qemu/qemu_conf.c: do not use boot=on for IDE devices
* tests/qemuxml2argvdata/qemuxml2argv*.args: this changes the output
  for 5 of the tests
2010-08-04 18:31:44 +02:00
Jiri Denemark
bf0bf4e783 qemu: Fix PCI address allocation
Version of Jiri Denemark's <jdenemar@redhat.com> original patch,
revamped by Eric Blake <eblake@redhat.com>

When attaching a PCI device which doesn't explicitly set its PCI
address, libvirt allocates the address automatically. The problem is
that when checking which PCI address is unused, we only check for those
with slot number higher than the highest slot number ever used.

Thus attaching/detaching such device several times in a row (31 is the
theoretical limit, fewer than 30 tries are enough in practice) makes any
further device attachment fail. Furthermore, attaching a device with
predefined PCI address to 0:0:31 immediately forbids attachment of any
PCI device without explicit address.

This patch changes the logic so that we always check all PCI addresses
before we say there is no PCI address available.

Modifications from v1: revert back to remembering the last slot
reserved, but allow wraparound to not be limited by the end.
In this way, slots are still assigned in the same order as
before the patch, rather than filling in the gaps closest to
0 and risking making windows guests mad.
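
The wraparound search can be sketched like this, assuming a hypothetical SlotTable for a single bus with 32 slots (the real reservation code tracks full PCI addresses rather than a plain array):

```c
#include <assert.h>
#include <stdbool.h>

#define PCI_SLOTS 32          /* slots 0..31 on one bus */

typedef struct {
    bool used[PCI_SLOTS];
    int next;                 /* slot to try first on the next allocation */
} SlotTable;

/* Start after the last reserved slot and wrap around, so every slot is
 * checked before giving up. Returns the slot, or -1 if all are in use. */
static int reserve_next_slot(SlotTable *t)
{
    int i;

    for (i = 0; i < PCI_SLOTS; i++) {
        int slot = (t->next + i) % PCI_SLOTS;

        if (!t->used[slot]) {
            t->used[slot] = true;
            t->next = (slot + 1) % PCI_SLOTS;
            return slot;
        }
    }
    return -1;
}

static void release_slot(SlotTable *t, int slot)
{
    t->used[slot] = false;
}
```

Because the search resumes after the last reserved slot, a freed low slot is only reused once the search has wrapped, which keeps the assignment order the same as before the patch.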

* src/qemu/qemu_conf.c: fix pci reservation code to do a round-robin
  check of all PCI slots for availability before failing.
2010-08-04 14:46:06 +02:00
Eric Blake
6790805d6e qemu: don't lose error on setting monitor capabilities
Spotted by clang.  Regression introduced in commit e72cc3c11d.

* src/qemu/qemu_driver.c (qemuConnectMonitor): Don't lose error status.
2010-08-02 14:16:10 -06:00
Eric Blake
68e4be71be qemu: kill some dead stores
Spotted by clang.

* src/qemu/qemu_monitor.c (qemuMonitorClose): Kill dead store.
* src/qemu/qemu_driver.c (qemudDomainSaveImageStartVM): Likewise.
2010-07-30 11:33:26 -06:00
Daniel Veillard
e7da872294 Do not activate boot=on on devices when not using KVM
Basically the 'boot=on' boot selection device is something present in
KVM but not in upstream QEmu, as a result if we boot a QEmu domain
without KVM acceleration we must disable boot=on ... even if the front
end kvm binary exposes that capability in the help page.

* src/qemu/qemu_conf.c: in qemudBuildCommandLine if -no-kvm
  is passed, then deactivate QEMUD_CMD_FLAG_DRIVE_BOOT
2010-07-30 16:38:48 +02:00
Chris Lalancette
4313e1b9b1 Fix a memory leak in the qemudBuildCommandLine.
ADD_ARG_LIT should only be used for literal arguments,
since it duplicates the memory.  Since virBufferContentAndReset
is already allocating memory, we should only use ADD_ARG.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-07-30 10:01:50 -04:00
Chris Lalancette
56b408231a Fix a potential race in pciInitDevice.
If detecting the FLR flag of a pci device fails, then we
could run into the situation of trying to close a file
descriptor twice, once in pciInitDevice() and once in pciFreeDevice().
Fix that by removing the pciCloseConfig() in pciInitDevice() and
just letting pciFreeDevice() handle it.

Thanks to Chris Wright for pointing out this problem.

While we are at it, fix an error check.  While it would actually
work as-is (since success returns 0), it's still more clear to
check for < 0 (as the rest of the code does).

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-07-29 10:18:23 -04:00
Cole Robinson
82b6d7600e qemu: virtio console support
Enable specifying a virtio console device with:

<console type='pty'>
  <target type='virtio'/>
</console>
2010-07-28 16:48:00 -04:00
Cole Robinson
6b24755235 domain conf: Track <console> target type
All <console> devices now export a <target> type attribute. QEMU defaults
to 'serial', UML defaults to 'uml', xen can be either 'serial' or 'xen'
depending on fullvirt. Understandably there is lots of test fallout.

This will be used to differentiate between a serial vs. virtio console for
QEMU.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2010-07-28 16:47:59 -04:00
Cole Robinson
6488ea2c5c domain conf: char: Add an explicit targetType field
targetType only tracks the actual <target> format we are parsing. Currently
we only fill in and abide by this value for channel devices.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2010-07-28 16:47:58 -04:00
Cole Robinson
50147933a5 domain conf: Rename character prop targetType -> deviceType
There is actually a difference between the character device type (serial,
parallel, channel, ...) and the target type (virtio, guestfwd). Currently
they are awkwardly conflated.

Start to pull them apart by renaming targetType -> deviceType. This is
an entirely mechanical change.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
2010-07-28 16:47:57 -04:00
Chris Lalancette
8bb0cd14e7 Fix up confusing indentation in qemudDomainAttachHostPciDevice.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-07-28 09:47:47 -04:00
Daniel P. Berrange
9749d94f7b Invert logic for checking for QEMU disk cache options
QEMU has had two different syntaxes for disk cache options

 Old: on|off
 New: writeback|writethrough|none

QEMU recently added another 'unsafe' option which broke the
libvirt check. We can avoid this & future breakage, if we
do a negative check for the old syntax, instead of a positive
check for the new syntax
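
To illustrate why the negative check is more robust, here is a sketch of both variants run against help-text fragments. The fragments and function names are assumptions made for the example; they are not copied from QEMU's help output or from qemu_conf.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Positive check: matches one exact spelling of the new syntax, so it
 * stops matching as soon as QEMU adds another value to the list. */
static bool cache_v2_positive(const char *help)
{
    return strstr(help, "cache=writethrough|writeback|none") != NULL;
}

/* Negative check: any cache= option that is not the old on|off form is
 * treated as the new syntax, whatever values it lists. */
static bool cache_v2_negative(const char *help)
{
    return strstr(help, "cache=") != NULL &&
           strstr(help, "cache=on|off") == NULL;
}
```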

* src/qemu/qemu_conf.c: Invert cache option check
2010-07-28 11:27:13 +01:00
Cole Robinson
4f24ca01e8 qemu: Allow setting boot menu on/off
Add a new element to the <os> block:

  <bootmenu enable="yes|no"/>

Which maps to -boot,menu=on|off on the QEMU command line.

I decided to use an explicit 'enable' attribute rather than just make the
bootmenu element boolean. This allows us to treat lack of a bootmenu element
as 'use hypervisor default'.
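
The three-state handling could look roughly like the sketch below. The enum and helper names are hypothetical; only the idea of keeping "not specified" distinct from yes and no comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero means the <bootmenu> element was absent from the XML. */
enum BootMenu {
    BOOT_MENU_DEFAULT = 0,
    BOOT_MENU_ENABLED,
    BOOT_MENU_DISABLED
};

/* Return the menu= fragment for the -boot option, or NULL when nothing
 * should be added and the hypervisor default applies. */
static const char *boot_menu_arg(enum BootMenu menu)
{
    switch (menu) {
    case BOOT_MENU_ENABLED:
        return "menu=on";
    case BOOT_MENU_DISABLED:
        return "menu=off";
    default:
        return NULL;
    }
}
```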
2010-07-27 16:38:32 -04:00
Cole Robinson
6fe9025eb5 qemu: Error on unsupported graphics config
Throw an explicit error if multiple graphics devices are specified, or
an unsupported type is specified (rdp).
2010-07-27 15:41:36 -04:00
Chris Wright
46bcdb960d pciResetDevice: use inactive devices to determine safe reset
When doing a PCI secondary bus reset, we must be sure that there are no
active devices on the same bus segment.  The active device tracking is
designed to only track host devices that are actively in use by guests.
This ignores host devices that are actively in use by the host.  So the
current logic will reset host devices.

Switch this logic around and allow sbus reset when we are assigning all
devices behind a bridge to the same guest at guest startup or as a result
of a single attach-device command.

* src/util/pci.h: change signature of pciResetDevice to add an
  inactive devices list
* src/qemu/qemu_driver.c src/xen/xen_driver.c: use (or not) the new
  functionality of pciResetDevice() depending on the place of use
* src/util/pci.c: implement the interface and logic changes
2010-07-26 18:43:04 +02:00
Chris Wright
042b208370 qemudDomainAttachHostPciDevice refactor to use new helpers
- src/qemu/qemu_driver.c: Eliminate code duplication by using the new
  helpers qemuPrepareHostdevPCIDevices and qemuDomainReAttachHostdevDevices.
  This reduces the number of open coded calls to pciResetDevice.
2010-07-26 18:34:24 +02:00
Chris Wright
f1365b558d Add helpers qemuPrepareHostdevPCIDevice and qemuDomainReAttachHostdevDevices
- src/qemu/qemu_driver.c: These new helpers take hostdev list and count
  directly rather than getting them indirectly from domain definition.
  This will allow reuse for the attach-device case.
2010-07-26 18:23:17 +02:00
Chris Wright
8bd00c0edf qemuGetPciHostDeviceList take hostdev list directly
- src/qemu/qemu_driver.c: Update qemuGetPciHostDeviceList to take a
  hostdev list and count directly, rather than getting this indirectly
  from domain definition. This will allow reuse for the attach-device case.
2010-07-26 18:17:20 +02:00
Chris Lalancette
a71be01f04 Add tests for the new Qemu namespace XML.
Thanks to DV for knocking together the Relax-NG changes
quickly for me.

Changes since v1:
 - Change the domain.rng to correspond to the new schema
 - Don't allocate caps->ns in testQemuCapsInit since it is a static table

Changes since v2:
 - Change domain.rng to add restrictions on allowed environment names

Changes since v3:
 - Remove a bogus comment in the tests

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-07-23 17:30:45 -04:00
Chris Lalancette
057e855324 Qemu arbitrary monitor commands.
Implement the qemu driver's virDomainQemuMonitorCommand
and hook it into the API entry point.

Changes since v1:
 - Rename the (external) qemuMonitorCommand to qemuDomainMonitorCommand
 - Add virCheckFlags to qemuDomainMonitorCommand

Changes since v2:
 - Drop ATTRIBUTE_UNUSED from the flags

Changes since v3:
 - Add a flag to priv so we only print out monitor command warning once.  Note
   that this has not been plumbed into qemuDomainObjPrivateXMLFormat or
   qemuDomainObjPrivateXMLParse, which means that if you run a monitor command,
   restart libvirtd, and then run another monitor command, you may get an
   erroneous VIR_INFO.  It's a pretty minor matter, and I didn't think it
   warranted the additional code.
 - Add BeginJob/EndJob calls around EnterMonitor/ExitMonitor

Signed-off-by: Chris Lalancette <clalance@redhat.com>
2010-07-23 17:30:24 -04:00