This is from a bug report and conversation on IRC where Soren reported that while a filter update is occurring on one or more VMs (due to a rule having been edited for example), a deadlock can occur when a VM referencing a filter is started.
The problem is caused by the two locking sequences of
qemu driver, qemu domain, filter # for the VM start operation
filter, qemu_driver, qemu_domain # for the filter update operation
that obviously don't lock in the same order. The problem is the 2nd lock sequence. Here the qemu_driver lock is being grabbed in qemu_driver:qemudVMFilterRebuild()
The following solution is based on the idea of trying to re-arrange the 2nd sequence of locks as follows:
qemu_driver, filter, qemu_driver, qemu_domain
and making the qemu driver recursively lockable so that a second lock can occur, this would then lead to the following net-locking sequence
qemu_driver, filter, qemu_domain
where the 2nd qemu_driver lock has been ( logically ) eliminated.
The 2nd part of the idea is that the sequence of locks (filter, qemu_domain) and (qemu_domain, filter) becomes interchangeable if all code paths where filter AND qemu_domain are locked have a preceding qemu_domain lock that basically blocks their concurrent execution
So, the following code paths exist towards qemu_driver:qemudVMFilterRebuild where we now want to put a qemu_driver lock in front of the filter lock.
-> nwfilterUndefine() [ locks the filter ]
-> virNWFilterTestUnassignDef()
-> virNWFilterTriggerVMFilterRebuild()
-> qemudVMFilterRebuild()
-> nwfilterDefine()
-> virNWFilterPoolAssignDef() [ locks the filter ]
-> virNWFilterTriggerVMFilterRebuild()
-> qemudVMFilterRebuild()
-> nwfilterDriverReload()
-> virNWFilterPoolLoadAllConfigs()
->virNWFilterPoolObjLoad()
-> virNWFilterPoolAssignDef() [ locks the filter ]
-> virNWFilterTriggerVMFilterRebuild()
-> qemudVMFilterRebuild()
-> nwfilterDriverStartup()
-> virNWFilterPoolLoadAllConfigs()
->virNWFilterPoolObjLoad()
-> virNWFilterPoolAssignDef() [ locks the filter ]
-> virNWFilterTriggerVMFilterRebuild()
-> qemudVMFilterRebuild()
Qemu is not the only driver using the nwfilter driver, but also the UML driver calls into it. Therefore qemuVMFilterRebuild() can be exchanged with umlVMFilterRebuild() along with the driver lock of qemu_driver that can now be a uml_driver. Further, since UML and Qemu domains can be running on the same machine, the triggering of a rebuild of the filter can touch both types of drivers and their domains.
In the patch below I am now extending each nwfilter callback driver with functions for locking and unlocking the (VM) driver (UML, QEMU) and introduce new functions for locking all registered callback drivers and unlocking them. Then I am distributing the lock-all-cbdrivers/unlock-all-cbdrivers call into the above call paths. The last shown callpath starting with nwfilterDriverStart() is problematic since it is initialize before the Qemu and UML drives are and thus a lock in the path would result in a NULL pointer attempted to be locked -- the call to virNWFilterTriggerVMFilterRebuild() is never called, so we never lock either the qemu_driver or the uml_driver in that path. Therefore, only the first 3 paths now receive calls to lock and unlock all callback drivers. Now that the locks are distributed where it matters I can remove the qemu_driver and uml_driver lock from qemudVMFilterRebuild() and umlVMFilterRebuild() and not requiring the recursive locks.
For now I want to put this out as an RFC patch. I have tested it by 'stretching' the critical section after the define/undefine functions each lock the filter so I can (easily) concurrently execute another VM operation (suspend,start). That code is in this patch and if you want you can de-activate it. It seems to work ok and operations are being blocked while the update is being done.
I still also want to verify the other assumption above that locking filter and qemu_domain always has a preceding qemu_driver lock.
This makes 'virsh --conn test:///default help help' work right;
previously, the abbreviation confused our hand-rolled option parsing.
* tools/virsh.c (vshParseArgv): Use getopt_long feature, rather
than (incorrectly) reparsing options ourselves.
Some users may type command like this at the virsh shell:
virsh # somecmd 'some arg'
because they often use single quote in linux shell.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Old virsh command parsing mashes all the args back into a string and
miss the quotes, this patches fix it. It is also needed for introducing
qemu-monitor-command which is very useful.
This patches uses the new vshCommandParser abstraction and adds
vshCommandArgvParse() for arguments vector, so we don't need
to mash arguments vector into a command sting.
And the usage was changed:
old:
virsh [options] [commands]
new:
virsh [options]... [<command_string>]
virsh [options]... <command> [args...]
So we still support commands like:
"define D.xml; dumpxml D" was parsed as a commands-string.
and support commands like:
we will not mash them into a string, we use new argv parser for it.
But we don't support the command like:
"define D.xml; dumpxml" was parsed as a command-name, but we have no such command-name.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
add vshCommandParser and make vshCommandParse() accept different
parsers.
the current code for parse command string is integrated as
vshCommandStringParse().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
in old code the following commands are equivalent:
virsh # dumpxml --update-cpu=vm1
virsh # dumpxml --update-cpu vm1
because the old code split the option argument into 2 parts:
--update-cpu=vm1 is split into update-cpu and vm1,
and update-cpu is a boolean option, so the parser takes vm1 as another
argument, very strange.
after this patch applied, the first one will become illegal.
To achieve this, we don't parse/check options when parsing command sting,
but check options when parsing a command argument. And the argument is
not split when parsing command sting.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
* tools/virsh.c (malloc, calloc, realloc, strdup): Enforce that
within this file, we use the safe vsh wrappers instead.
(cmdNodeListDevices, cmdSnapshotCreate, main): Fix violations of
this policy.
In origin code, double quote is only allowed at the begin or end
"complicated argument"
--some_opt="complicated string" (we split this argument into 2 parts,
option and data, the data is "complicated string").
This patch makes it allow double quote at any position of
an argument:
complicated" argument"
complicated" "argument
--"some opt=complicated string"
This patch is also needed for the following patches,
the following patches will not split option argument into 2 parts,
so we have to allow double quote at any position of an argument.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
* include/libvirt/libvirt.h.in: some of the function type description
were broken so they could not be automatically documented
* src/util/event.c docs/apibuild.py: event.c exports one public API
so it needs to be scanned too, avoid a few warnings
Make use of the existing <filesystem> element to support plan9fs
filesystem passthrough in the QEMU driver
<filesystem type='mount'>
<source dir='/export/to/guest'/>
<target dir='/import/from/host'/>
</filesystem>
NB, the target is not actually a directory, it is merely a arbitrary
string tag that is exported to the guest as a hint for where to mount
it.
Add proper documentation to the new VIR_DOMAIN_MEMORY_* macros in
libvirt.h.in to placate apibuild.py.
Mark args as unused in for libvirt_virDomain{Get,Set}MemoryParameters
in the Python bindings and add both to the libvirtMethods array.
Update remote_protocol-structs to placate make syntax-check.
Undo unintended modifications in vboxDomainGetInfo.
Update the function table of the VirtualBox and XenAPI drivers.
The command helps to control the memory/swap parameters for the system, for
eg. hard_limit (max memory the vm can use), soft_limit (limit during memory
contention), swap_hard_limit(max swap the vm can use)
Adding parsing code for memory tunables in the domain xml file
also change the internal define structures used for domain memory
informations
Adds a new specific test
Public api to set/get memory tunables supported by the hypervisors.
dv:
* some cleanups in libvirt.c
* adding extra checks in libvirt.c new entry points
v4:
* Move exporting public API to this patch
* Add unsigned int flags to the public api for future extensions
v3:
* Add domainGetMemoryParamters and NULL in all the driver interface
v2:
* Initialize domainSetMemoryParameters to NULL in all the driver
interface structure.
This patch adds a structure virMemoryParameter, it contains the name of
the
parameter and the type of the parameter along with a union.
dv:
+ rename enums to VIR_DOMAIN_MEMORY_PARAM_*
+ remove some extraneous tabs
v4:
+ Add unsigned int flags to the public api for future extensions
v3:
+ Protoype for virDomainGetMemoryParameters and dummy python binding.
v2:
+ Includes dummy python bindings for the library to build cleanly.
+ Define string constants like "hard_limit", etc.
+ re-order this patch.
Some features provided by the recently added CPU models were mentioned
twice for each model. This was a result of automatic generation of the
XML from qemu's CPU configuration file without noticing this redundancy.
To enable the CPU XML from the capabilities to be pasted directly
into the guest XML with no editing, pick a sensible default for
match and feature policy. The CPU match will be exact and the
feature policy will be require. This should ensure safety for
migration and give DWIM semantics for users
* src/conf/cpu_conf.c: Default to exact match and require policy
* docs/formatdomain.html.in: Document new defaults
This adds a script to generate the todo item page from
bugzilla. This requires a valid username+password for
bugzilla, so it is intended that this only be run on
the libvirt.org website via cron. Normal usage will just
generate an empty stub page.
* docs/todo.pl: Script to extract todo items from bugzilla
* docs/todo.cfg-example: Example config file
* docs/sitemap.html.in: Add todo page
* docs/Makefile.am: Generation rules for todo items
According to API documentation virDomain{At,De}tachDevice calls are
supposed to only work on active guests for device hotplug. For anything
beyond that, their *Flags variants have to be used.
Despite the variant which was acked on libvirt mailing list
(https://www.redhat.com/archives/libvir-list/2010-January/msg00385.html)
commit ed9c14a7ef (by Jim Fehlig)
introduced automagic behavior of these API calls for xen driver. Since
January, these calls always change persistent configuration of a guest
and if the guest is currently active, they also hot(un)plug the device.
That change didn't follow API documentation and also broke device
hot(un)plug for older xend implementations which do not support changing
persistent configuration of a guest and hot(un)plugging in one step.
This patch should not break anything for active guests. On the other
hand, changing inactive guests is not supported any more.
When a user calls to virDomain{Attach,Detach,Update}DeviceFlags() with
flags == VIR_DOMAIN_DEVICE_MODIFY_LIVE on an inactive guest running on
an old Xen hypervisor (such as RHEL-5) xend_internal driver reports:
Xend version does not support modifying persistent config
which is pretty confusing since no-one requested to modify persistent
config.
This patch adds another example to the nwfilter html page and provides 2 solutions for how to write a filter meeting the given requirements using newly added features.
This patch adds a test case for testing the XML parser's and instantiator's
support of the state attribute. The other test case tests existing
capabilities. Both test cases will be used in TCK again.
In this patch I am extending the rule instantiator to create the state
match according to the state attribute in the XML. Only one iptables
rule in the incoming or outgoing direction will be created for a rule
in direction 'in' or 'out' respectively. A rule in direction 'inout' does
get iptables rules in both directions.