Following Daniel Berrange's multiple helpful suggestions for improving
this patch and introducing another driver interface, I now wrote the
below patch where the nwfilter driver registers the functions to
instantiate and teardown the nwfilters with a function in
conf/domain_nwfilter.c called virDomainConfNWFilterRegister. Previous
helper functions that were called from qemu_driver.c and qemu_conf.c
were move into conf/domain_nwfilter.h with slight renaming done for
consistency. Those functions now call the function expored by
domain_nwfilter.c, which in turn call the functions of the new driver
interface, if available.
This patch adds an optional XML attribute to a nwfilter rule to give the user control over whether the rule is supposed to be using the iptables state match or not. A rule may now look like shown in the XML below with the statematch attribute either having value '0' or 'false' (case-insensitive).
[...]
<rule action='accept' direction='in' statematch='false'>
<tcp srcmacaddr='1:2:3:4:5:6'
srcipaddr='10.1.2.3' srcipmask='32'
dscp='33'
srcportstart='20' srcportend='21'
dstportstart='100' dstportend='1111'/>
</rule>
[...]
I am also extending the nwfilter schema and add this attribute to a test case.
When a filter is updated, only those interfaces must have their old
rules cleared that either reference the filter directly or indirectly
through another filter. Remember between the different steps of the
instantiation of the filters which interfaces must be skipped. I am
using a hash map to remember the names of the interfaces and store a
bogus pointer to ~0 into it that need not be freed.
For the decision on whether to instantiate the rules, the check for a
pending IP address learn request is not sufficient since then only the
thread could instantiate the rules. So, a boolean needs to be passed
when the thread instantiates the filter rules late and the IP address
learn request is still pending in order to override the check for the
pending learn request. If the rules are to be updated while the thread
is active, this will not be done immediately but the thread will do that
later on.
Introduce a function to notify the IP address learning
thread to terminate and thus release the lock on the interface.
Notify the thread before grabbing the lock on the interface
and tearing down the rules. This prevents a 'virsh destroy' to
tear down the rules that the IP address learning thread has
applied.
The functions invoked by the IP address learning thread
that apply some basic filtering rules did not clean up
any previous filtering rules that may still be there
(due to a libvirt restart for example). With the
patch below all the rules are cleaned up first.
Also, I am introducing a function to drop all traffic
in case the IP address learning thread could not apply
the rules.
The local DHCP server on virtbr0 sends DHCP ACK messages when a VM is
started and requests an IP address while the initial DHCP lease on the
VM's MAC address hasn't expired. So, also pick the IP address of the VM
if that type of message is seen.
Thanks to Gerhard Stenzel for providing a test case for this.
Changes from V1 to V2:
- cleanup: replacing DHCP option numbers through constants
This patch adds support for the RARP protocol. This may be needed due to
qemu sending out a RARP packet (at least that's what it seems to want to
do even though the protocol id is wrong) when migration finishes and
we'd need a rule to let the packets pass.
Unfortunately my installation of ebtables does not understand -p RARP
and also seems to otherwise depend on strings in /etc/ethertype
translated to protocol identifiers. Therefore I need to pass -p 0x8035
for RARP. To generally get rid of the dependency of that file I switch
all so far supported protocols to use their protocol identifier in the
-p parameter rather than the string.
I am also extending the schema and added a test case.
changes from v1 to v2:
- added test case into patch
With this patch I want to enable hex number inputs in the filter XML. A
number that was entered as hex is also printed as hex unless a string
representing the meaning can be found.
I am also extending the schema and adding a test case. A problem with
the DSCP value is fixed on the way as well.
Changes from V1 to V2:
- using asHex boolean in all printf type of functions to select the
output format in hex or decimal format
The nwfilterDriverActive() could de-reference a NULL pointer
if it hadn't be started at the point it was called. It was
also not thread safe, since it lacked locking around data
accesses.
* src/nwfilter/nwfilter_driver.c: Fix locking & NULL checks
in nwfilterDriverActive()
- using INT_BUFSIZE_BOUND() to determine the length of the buffersize
for printing and integer into
- not explicitly initializing static var threadsTerminate to false
anymore, since that's done automatically
Changes after V2:
- removed while looks in case of OOM error
- removed on ifaceDown() call
- preceding one ifaceDown() call with an ifaceCheck() call
Since the name of an interface can be the same between stops and starts
of different VMs I have to switch the IP address learning thread to use
the index of the interface to determine whether an interface is still
available or not - in the case of macvtap the thread needs to listen for
traffic on the physical interface, thus having to time out periodically
to check whether the VM's macvtap device is still there as an indication
that the VM is still alive. Previously the following sequence of 2 VMs
with macvtap device
virsh start testvm1; virsh destroy testvm1 ; virsh start testvm2
would not terminate the thread upon testvm1's destroy since the name of
the interface on the host could be the same (i.e, macvtap0) on testvm1
and testvm2, thus it was easily race-able. The thread would then
determine the IP address parameter for testvm2 but apply the rule set
for testvm1. :-(
I am also introducing a lock for the interface (by name) that the thread
must hold while it listens for the traffic and releases when it
terminates upon VM termination or 0.5 second thereafter. Thus, the new
thread for a newly started VM with the same interface name will not
start while the old one still holds the lock. The only other code that I
see that also needs to grab the lock to serialize operation is the one
that tears down the firewall that were established on behalf of an
interface.
I am moving the code applying the 'basic' firewall rules during the IP
address learning phase inside the thread but won't start the thread
unless it is ensured that the firewall driver has the ability to apply
the 'basic' firewall rules.
I am moving some of the eb/iptables related functions into the interface
of the firewall driver and am making them only accessible via the driver's
interface. Otherwise exsiting code is adapted where needed. I am adding one
new function to the interface that checks whether the 'basic' rules can be
applied, which will then be used by a subsequent patch.
To avoid race-conditions, the tear down of a filter has to happen before
the tap interface disappears and another tap interface with the same
name can re-appear. This patch tries to fix this. In one place, where
communication with the qemu monitor may fail, I am only tearing the
filters down after knowing that the function did not fail.
I am also moving the tear down functions into an include file for other
drivers to reuse.
* src/nwfilter/nwfilter_ebiptables_driver.c (ebiptablesApplyNewRules):
Don't dereference a NULL or uninitialized pointer when given
an empty list of rules. Add an sa_assert(inst) in each loop to
tell clang that the uses of "inst[i]" are valid.
I am getting rid of determining the path to necessary CLI tools at
compile time. Instead, now the firewall driver has an initialization
function that uses virFindFileInPath() to determine the path to
necessary CLI tools and a shutdown function to free allocated memory.
The rest of the patch mostly deals with availability of the CLI tools
and to not call certain code blocks if a tool is not available and that
strings now have to be built slightly differently.
Changes from v1 to v2:
- changed function name prefixes to 'iface' from previous 'Iface'
- Further to make make syntax-check pass:
- indentation fix in interface.h
- added entry to POTFILES.in
I am consolidating network interface related functions used in nwfilter
and macvtap code in utils/interface.c. All function names are prefixed
with 'Iface'. The following functions are now available through
interface.h:
int ifaceCtrl(const char *name, bool up);
int ifaceUp(const char *name);
int ifaceDown(const char *name);
int ifaceCheck(bool reportError, const char *ifname,
const unsigned char *macaddr, int ifindex);
int ifaceGetIndex(bool reportError, const char *ifname, int *ifindex);
I added 'int ifindex' as parameter to ifaceCheck to the original
function and modified the code accordingly.
I mistakenly took the op field in the DHCP message as the DHCP_OFFER
type. Rather than basing the decision to read the VM's IP address on
that field, process the appended DHCP options where option 53 indicates
the actual type of the packet. I am also reading the broadcast address
of the VM, but don't use it so far.
Changes from V1 to V2 of this patch
- I had reversed the logic thinking that icmp type 0 is a echo
request,but it's reply -- needed to reverse the logic
- Found that ebtables takes the --ip-tos argument only as a hex number
This patch enables the skipping of some of the ICMP traffic rules on the
iptables level under certain circumstances so that the following filter
properly enables unidirectional pings:
<filter name='testcase'>
<uuid>d6b1a2af-def6-2898-9f8d-4a74e3c39558</uuid>
<!-- allow incoming ICMP Echo Request -->
<rule action='accept' direction='in' priority='500'>
<icmp type='8'/>
</rule>
<!-- allow outgoing ICMP Echo Reply -->
<rule action='accept' direction='out' priority='500'>
<icmp type='0'/>
</rule>
<!-- drop all other ICMP traffic -->
<rule action='drop' direction='inout' priority='600'>
<icmp/>
</rule>
</filter>
This patch implements support for learning a VM's IP address. It uses
the pcap library to listen on the VM's backend network interface (tap)
or the physical ethernet device (macvtap) and tries to capture packets
with source or destination MAC address of the VM and learn from DHCP
Offers, ARP traffic, or first-sent IPv4 packet what the IP address of
the VM's interface is. This then allows to instantiate the network
traffic filtering rules without the user having to provide the IP
parameter somewhere in the filter description or in the interface
description as a parameter. This only supports to detect the parameter
IP, which is for the assumed single IPv4 address of a VM. There is not
support for interfaces that may have multiple IP addresses (IP
aliasing) or IPv6 that may then require more than one valid IP address
to be detected. A VM can have multiple independent interfaces that each
uses a different IP address and in that case it will be attempted to
detect each one of the address independently.
So, when for example an interface description in the domain XML has
looked like this up to now:
<interface type='bridge'>
<source bridge='mybridge'/>
<model type='virtio'/>
<filterref filter='clean-traffic'>
<parameter name='IP' value='10.2.3.4'/>
</filterref>
</interface>
you may omit the IP parameter:
<interface type='bridge'>
<source bridge='mybridge'/>
<model type='virtio'/>
<filterref filter='clean-traffic'/>
</interface>
Internally I am walking the 'tree' of a VM's referenced network filters
and determine with the given variables which variables are missing. Now,
the above IP parameter may be missing and this causes a libvirt-internal
thread to be started that uses the pcap library's API to listen to the
backend interface (in case of macvtap to the physical interface) in an
attempt to determine the missing IP parameter. If the backend interface
disappears the thread terminates assuming the VM was brought down. In
case of a macvtap device a timeout is being used to wait for packets
from the given VM (filtering by VM's interface MAC address). If the VM's
macvtap device disappeared the thread also terminates. In all other
cases it tries to determine the IP address of the VM and will then apply
the rules late on the given interface, which would have happened
immediately if the IP parameter had been explicitly given. In case an
error happens while the firewall rules are applied, the VM's backend
interface is 'down'ed preventing it to communicate. Reasons for failure
for applying the network firewall rules may that an ebtables/iptables
command failes or OOM errors. Essentially the same failure reasons may
occur as when the firewall rules are applied immediately on VM start,
except that due to the late application of the filtering rules the VM
now is already running and cannot be hindered anymore from starting.
Bringing down the whole VM would probably be considered too drastic.
While a VM's IP address is attempted to be determined only limited
updates to network filters are allowed. In particular it is prevented
that filters are modified in such a way that they would introduce new
variables.
A caveat: The algorithm does not know which one is the appropriate IP
address of a VM. If the VM spoofs an IP address in its first ARP traffic
or IPv4 packets its filtering rules will be instantiated for this IP
address, thus 'locking' it to the found IP address. So, it's still
'safer' to explicitly provide the IP address of a VM's interface in the
filter description if it is known beforehand.
* configure.ac: detect libpcap
* libvirt.spec.in: require libpcap[-devel] if qemu is built
* src/internal.h: add the new ATTRIBUTE_PACKED define
* src/Makefile.am src/libvirt_private.syms: add the new modules and symbols
* src/nwfilter/nwfilter_learnipaddr.[ch]: new module being added
* src/nwfilter/nwfilter_driver.c src/conf/nwfilter_conf.[ch]
src/nwfilter/nwfilter_ebiptables_driver.[ch]
src/nwfilter/nwfilter_gentech_driver.[ch]: plu the new functionality in
* tests/nwfilterxml2xmltest: extend testing
The attached patch fixes a problem due to the mac match in iptables only
supporting --mac-source and no --mac-destination, thus it not being
symmetric. Therefore a rule like this one
<rule action='drop' direction='out'>
<all match='no' srcmacaddr='$MAC'/>
</rule>
should only have the MAC match on traffic leaving the VM and not test
for the same source MAC address on traffic that the VM receives.
With Eric Blake's suggestions applied.
The following rule for direction 'in'
<rule direction='in' action='drop'>
<mac srcmacaddr='1:2:3:4:5:6'/>
</rule>
drops all traffic from the given mac address.
The following rule for direction 'out'
<rule direction='out' action='drop'>
<mac dstmacaddr='1:2:3:4:5:6'/>
</rule>
drops all traffic to the given mac address.
The following rule in direction 'inout'
<rule direction='inout' action='drop'>
<mac srcmacaddr='1:2:3:4:5:6'/>
</rule>
now drops all traffic from and to the given MAC address.
So far it would have dropped traffic from the given MAC address
and outgoing traffic with the given source MAC address, which is not useful
since the packets will always have the VM's MAC address as source
MAC address. The attached patch fixes this.
This is the last bug I currently know of and want to fix.
- ebtables requires that some of the command line parameters are passed as hex numbers; so have those attributes call a function that prints 16 and 8 bit integers as hex nunbers.
- ip6tables requires '--icmpv6-type' rather than '--icmp-type'
- ebtables complains about protocol identifiers lower than 0x600, so already discard anything lower than 0x600 in the parser
- make the protocol entry types more readable using a #define for its entries
- continue parsing a filtering rule even if a faulty entry is encountered; return an error value at the end and let the caller decide what to do with the rule's object
- fix an error message
found some cases where the output ended up not looking as expected. So
the following changes are in the patch below:
- if the protocol ID in the MAC header is an integer, just write it into
the datastructure without trying to find a corresponding string for it
and if none is found failing
- when writing the protocol ID as string, simply write it as integer if
no corresponding string can be found
- same changes for arpOpcode parsing and printing
- same changes for protocol ID in an IP packet
- DSCP value needs to be written into the data structure
- IP protocol version number is redundant at this level, so remove it
- parse the protocol ID found inside an IP packet not only as string but
also as uint8
- arrange the display of the src and destination masks to be shown after
the src and destination ip address respectively in the XML
- the existing libvirt IP address parser accepts for example '25' as an
IP address. I want this to be parsed as a CIDR type netmask. So try to
parse it as an integer first (CIDR netmask) and if that doesn't work as
a dotted IP address style netmask.
- instantiation of rules with MAC masks didn't work because they weren't
printed into a buffer, yet.
This patch changes the network filtering code to use libvirt's existing
IPv4 and IPv6 address parsers/printers rather than my self-written ones.
I am introducing a new function in network.c that counts the number of
bits in a netmask and ensures that the given address is indeed a netmask,
return -1 on error or values of 0-32 for IPv4 addresses and 0-128 for
IPv6 addresses. I then based the function checking for valid netmask
on invoking this function.
This patch adds IPv6 filtering support for the following protocols:
- tcp-ipv6
- udp-ipv6
- udplite-ipv6
- esp-ipv6
- ah-ipv6
- sctp-ipv6
- all-ipv6
- icmpv6
Many of the IPv4 data structure could be re-used for IPv6 support.
Since ip6tables also supports pretty much the same command line parameters
as iptables does, also much of the code could be re-used and now
command lines are invoked with the ip(6)tables tool parameter passed
through the functions as a parameter.
This patch removes the driver dependency from nwfilter_conf.c and moves
a callback function calling into the driver into
nwfilter_gentech_driver.c and passes a pointer to that callback function
upon initialization of nwfilter_conf.c.
This patch adds support for L3/L4 filtering using iptables. This adds
support for 'tcp', 'udp', 'icmp', 'igmp', 'sctp' etc. filtering.
As mentioned in the introduction, a .c file provided by this patch
is #include'd into a .c file. This will need work, but should be alright
for review.
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
This patch adds IPv6 support for the ebtables layer. Since the parser
etc. are all parameterized, it was fairly easy to add this...
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
This patch implements the core driver and provides
- management functionality for managing the filter XMLs
- compiling the internal filter representation into ebtables rules
- applying ebtables rules on a network (tap,macvtap) interface
- tearing down ebtables rules that were applied on behalf of an
interface
- updating of filters while VMs are running and causing the firewalls to
be rebuilt
- other bits and pieces
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>