The nwfilter conf update mutex previously serialized
updates to the internal data structures for firewall
rules, and updates to the firewall itself. The latter
was recently turned into a read/write lock, and filter
instantiation allowed to proceed in parallel. It was
believed that this was ok, since each filter is created
on a separate iptables/ebtables chain.
It turns out that there is a subtle lock ordering problem
on virNWFilterObjPtr instances. __virNWFilterInstantiateFilter
will hold a lock on the virNWFilterObjPtr it is instantiating.
This in turn invokes virNWFilterInstantiate which then invokes
virNWFilterDetermineMissingVarsRec which then invokes
virNWFilterObjFindByName. This iterates over every single
virNWFilterObjPtr in the list, locking them and checking their
name. So if 2 or more threads try to instantiate a filter in
parallel, they'll all hold 1 lock at the top level in the
__virNWFilterInstantiateFilter method which will cause the
other thread to deadlock in virNWFilterObjFindByName.
The fix is to add an exclusive mutex to serialize the
execution of __virNWFilterInstantiateFilter.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 925de19ed7f13e0d12d0b993496d314bab886589)
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb)
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.
Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.
The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 999d72fbd59ea712128ae294b69b6a54039d757b)
The nwfilter driver only needs a reference to its private
state object, not a full virConnectPtr. Update the domUpdateCBStruct
struct to have a 'void *opaque' field instead of a virConnectPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit ebca369e3fe5ac999c261c2d44e60a1bac3cfe65)
None of the virNWFilterDefParse* methods require a virConnectPtr
arg, so just drop it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit b77b16ce4166dcc87963ae5d279b77b162ddbb55)
When opening a new connection to the driver, nwfilterOpen
only succeeds if the driverState has been allocated.
Move the privilege check in driver initialization before
the state allocation to disable the driver.
This changes the nwfilter-define error from:
error: cannot create config directory (null): Bad address
To:
this function is not supported by the connection driver:
virNWFilterDefineXML
https://bugzilla.redhat.com/show_bug.cgi?id=1029266
(cherry picked from commit b7829f959b33c6e32422222a9ed745c0da7dc696)
When the daemon is compiled with firewalld support but the DBus message
bus isn't started in the system, the initialization of the nwfilter
driver fails even if there are fallback options.
Since iptables version 1.4.16 '-m state --state NEW' is converted to
'-m conntrack --ctstate NEW'. Therefore, when encountering this or later
versions of iptables use '-m conntrack --ctstate'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that all APIs which list nwfilter objects filter
them against the access control system.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch is in relation to Bug 966449:
https://bugzilla.redhat.com/show_bug.cgi?id=966449
This is a patch addressing the coredump.
Thread 1 must be calling nwfilterDriverRemoveDBusMatches(). It does so with
nwfilterDriverLock held. In the patch below I am now moving the
nwfilterDriverLock(driverState) further up so that the initialization, which
seems to either take a long time or is entirely stuck, occurs with the lock
held and the shutdown cannot occur at the same time.
Remove the lock in virNWFilterDriverIsWatchingFirewallD to avoid
double-locking.
https://bugzilla.redhat.com/show_bug.cgi?id=903480
During domain destruction it's possible that the learnIPAddressThread has
already removed the interface prior to the teardown filter path being run.
The teardown code would only be telling the thread to terminate.
Remove error reporting when calling the virNWFilterDHCPSnoopEnd
function with an interface for which no thread is snooping traffic.
Document the usage of this function.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Linux netfilter at some point (Linux 2.6.39) inverted the meaning of the
'--ctdir reply' and newer netfilter implementations now expect
'--ctdir original' instead and vice-versa.
We check for the kernel version and assume that all Linux kernels with version
2.6.39 have the newer inverted logic.
Any distro backporting the Linux kernel patch that inverts the --ctdir logic
(Linux commit 96120d86f) must also backport this patch for Linux and
adapt the kernel version being tested for.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
Ensure that all drivers implementing public APIs use a
naming convention for their implementation that matches
the public API name.
eg for the public API virDomainCreate make sure QEMU
uses qemuDomainCreate and not qemuDomainStart
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It will simplify later work if the sub-drivers have dedicated
APIs / field names. ie virNetworkDriver should have
virDrvNetworkOpen and virDrvNetworkClose methods
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that the driver struct field names match the public
API names. For an API virXXXX we must have a driver struct
field xXXXX. ie strip the leading 'vir' and lowercase any
leading uppercase letters.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of places which generate cast alignment
warnings, which are difficult or impossible to address. Use
pragmas to disable the warnings in these few places
conf/nwfilter_conf.c: In function 'virNWFilterRuleDetailsParse':
conf/nwfilter_conf.c:1806:16: warning: cast increases required alignment of target type [-Wcast-align]
item = (nwItemDesc *)((char *)nwf + att[idx].dataIdx);
conf/nwfilter_conf.c: In function 'virNWFilterRuleDefDetailsFormat':
conf/nwfilter_conf.c:3238:16: warning: cast increases required alignment of target type [-Wcast-align]
item = (nwItemDesc *)((char *)def + att[i].dataIdx);
storage/storage_backend_mpath.c: In function 'virStorageBackendCreateVols':
storage/storage_backend_mpath.c:247:17: warning: cast increases required alignment of target type [-Wcast-align]
names = (struct dm_names *)(((char *)names) + next);
nwfilter/nwfilter_dhcpsnoop.c: In function 'virNWFilterSnoopDHCPDecode':
nwfilter/nwfilter_dhcpsnoop.c:994:15: warning: cast increases required alignment of target type [-Wcast-align]
pip = (struct iphdr *) pep->eh_data;
nwfilter/nwfilter_dhcpsnoop.c:1004:11: warning: cast increases required alignment of target type [-Wcast-align]
pup = (struct udphdr *) ((char *) pip + (pip->ihl << 2));
nwfilter/nwfilter_learnipaddr.c: In function 'procDHCPOpts':
nwfilter/nwfilter_learnipaddr.c:327:33: warning: cast increases required alignment of target type [-Wcast-align]
uint32_t *tmp = (uint32_t *)&dhcpopt->value;
nwfilter/nwfilter_learnipaddr.c: In function 'learnIPAddressThread':
nwfilter/nwfilter_learnipaddr.c:501:43: warning: cast increases required alignment of target type [-Wcast-align]
struct iphdr *iphdr = (struct iphdr*)(packet +
nwfilter/nwfilter_learnipaddr.c:538:43: warning: cast increases required alignment of target type [-Wcast-align]
struct iphdr *iphdr = (struct iphdr*)(packet +
nwfilter/nwfilter_learnipaddr.c:544:48: warning: cast increases required alignment of target type [-Wcast-align]
struct udphdr *udphdr= (struct udphdr *)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Between revision 65fb9d49 and before this patch, an upgrade of libvirt while
VMs are running and instantiating iptables filtering rules due to nwfilter
rules, may leave stray iptables rules behind when shutting VMs down.
Left-over iptables rules may look like this:
Chain FP-vnet0 (1 references)
target prot opt source destination
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp spt:122
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
[...]
Chain libvirt-out (1 references)
target prot opt source destination
FO-vnet0 all -- 0.0.0.0/0 0.0.0.0/0 [goto] PHYSDEV match --physdev-out vnet0
The reason is that the recent nwfilter code only removed filtering rules in
the libvirt-out chain that contain the --physdev-is-bridged parameter.
Older rules didn't match and were not removed.
Note that the user-defined chain FO-vnet0 could not be removed due to the
reference from the rule in libvirt-out.
Often the work around may be done through
service iptables restart
kill -SIGHUP $(pidof libvirtd)
This patch now also removes older libvirt versions' iptables rules.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
As a step towards making virDomainObjList thread-safe turn it
into an opaque virObject, preventing any direct access to its
internals.
As part of this a new method virDomainObjListForEach is
introduced to replace all existing usage of virHashForEach
Although the nwfilter driver skips startup when running in a
session libvirtd, it did not skip reload or shutdown. This
caused errors to be reported when sending SIGHUP to libvirtd,
and caused an abort() in libdbus on shutdown due to trying
to remove a dbus filter that was never added
When starting a VM, /var/log/messages was spammed with the following message:
xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
With each extra VM I start, the messages get amplified
exponentially. This results in longer starting times every new VM,
relative the the previously started VM. When I ran a test with
starting 100 equal VM's, the first VM started in about 2 seconds, the
100th VM took 48 seconds to start. I'm running a vanilla 3.7.1 kernel,
but I have the same issue on VM hosts with kernel 3.2.28 or 3.2.0,
running libvirt 0.9.12 and 0.9.8 respectively.
Looking into the warning, it seemed that iptables need an extra argument,
--physdev-is-bridged, in commands like:
iptables -A libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet99 -g FP-vnet99
With that, the warnings in /var/log/messages are gone and running the
test again proved the 100th VM started in 3.8 seconds.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The only important state that should prevent libvirtd shutdown
is from running VMs. Networks, host devices, network filters
and storage pools are all long lived resources that have no
significant in-memory state. They should not block shutdown.
Also removed some unreachable code found by coverity:
libvirt-0.10.2/src/nwfilter/nwfilter_driver.c:259: unreachable: This
code cannot be reached: "nwfilterDriverUnlock(driver...".
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>