NWFilters can be provided name-value pairs using the following
XML notation:
<filterref filter='xyz'>
<parameter name='PORT' value='80'/>
<parameter name='VAL' value='abc'/>
</filterref>
The internal representation currently is so that a name is stored as a
string and the value as well. This patch now addresses the value part of it
and introduces a data structure for storing a value either as a simple
value or as an array for later support of lists.
This patch adjusts all code that was handling the values in hash tables
and makes it use the new data type.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch adds several aspects of documentation about the network filtering
system:
- chains, chains' priorities and chains' default priorities
- talks about lists of elements, i.e., a variable assigned multiple values
(part of already ACK-ed series)
- already mentions the vlan, stp and mac chains added later on
(https://www.redhat.com/archives/libvir-list/2011-October/msg01238.html)
- mentions limitations of vlan filtering (when sent by VM) on Linux systems
The previous patch extends the priority of filtering rules into negative
numbers. We now use this possibility to interleave the jumping into
chains with filtering rules to for example create the 'root' table of
an interface with the following sequence of rules:
Bridge chain: libvirt-I-vnet0, entries: 6, policy: ACCEPT
-p IPv4 -j I-vnet0-ipv4
-p ARP -j I-vnet0-arp
-p ARP -j ACCEPT
-p 0x8035 -j I-vnet0-rarp
-p 0x835 -j ACCEPT
-j DROP
The '-p ARP -j ACCEPT' rule now appears between the jumps.
Since the 'arp' chain has been assigned priority -700 and the 'rarp'
chain -600, the above ordering can now be achieved with the following
rule:
<rule action='accept' direction='out' priority='-650'>
<mac protocolid='arp'/>
</rule>
This patch now sorts the commands generating the above shown jumps into
chains and interleaves their execution with those for generating rules.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
So far rules' priorities have only been valid in the range [0,1000].
Now I am extending their priority into the range [-1000, 1000] for subsequently
being able to sort rules and the access of (jumps into) chains following
priorities.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch enables chains that have a known prefix in their name.
Known prefixes are: 'ipv4', 'ipv6', 'arp', 'rarp'. All prefixes
are also protocols that can be evaluated on the ebtables level.
Following the prefix they will be automatically connected to an interface's
'root' chain and jumped into following the protocol they evaluate, i.e.,
a table 'arp-xyz' will be accessed from the root table using
ebtables -t nat -A <iface root table> -p arp -j I-<ifname>-arp-xyz
thus generating a 'root' chain like this one here:
Bridge chain: libvirt-O-vnet0, entries: 5, policy: ACCEPT
-p IPv4 -j O-vnet0-ipv4
-p ARP -j O-vnet0-arp
-p 0x8035 -j O-vnet0-rarp
-p ARP -j O-vnet0-arp-xyz
-j DROP
where the chain 'arp-xyz' is accessed for filtering of ARP packets.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch extends the filter XML to support priorities of chains
in the XML. An example would be:
<filter name='allow-arpxyz' chain='arp-xyz' priority='200'>
[...]
</filter>
The permitted values for priorities are [-1000, 1000].
By setting the priority of a chain the order in which it is accessed
from the interface root chain can be influenced.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Use the name of the chain rather than its type index (enum).
This pushes the later enablement of chains with user-given names
into the XML parser. For now we still only allow those names that
are well known ('root', 'arp', 'rarp', 'ipv4' and 'ipv6').
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Use scripts for the renaming and cleaning up of chains. This allows us to get
rid of some of the code that is only capable of renaming and removing chains
whose names are hardcoded.
A shell function 'collect_chains' is introduced that is given the name
of an ebtables chain and then recursively determines the names of all
chains that are accessed from this chain and its sub-chains using 'jumps'.
The resulting list of chain names is then used to delete all the found
chains by first flushing and then deleting them.
The same function is also used for renaming temporary filters to their final
names.
I tested this with the bash and dash as script interpreters.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Use the previously introduced chain priorities to sort the chains for access
from an interface's 'root' table and have them created in the proper order.
This gets rid of a lot of code that was previously creating the chains in a
more hardcoded way.
To determine what protocol a filter is used for evaluation do prefix-
matching, i.e., the filter 'arp' is used to filter for the 'arp' protocol,
'ipv4' for the 'ipv4' protocol and 'arp-xyz' will also be used to filter
for the 'arp' protocol following the prefix 'arp' in its name.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
For better handling of the sorting of chains introduce an internally used
priority. Use a lookup table to store the priorities. For now their actual
values do not matter just that the values cause the chains to be properly
sorted through changes in the following patches. However, the values are
chosen as negative so that once they are sorted along with filtering rules
(whose priority may only be positive for now) they will always be instantiated
before them (lower values cause instantiation before higher values). This
is done to maintain backwards compatibility.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Add a function to the virHashTable for getting an array of the hash table's
key-value pairs and have the keys (optionally) sorted.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch adds support for a systemd init service for libvirtd
and libvirt-guests. The libvirtd.service is *not* written to use
socket activation, since we want libvirtd to start on boot so it
can do guest auto-start.
The libvirt-guests.service is pretty lame, just exec'ing the
original init script for now. Ideally we would factor out the
functionality, into some shared tool.
Instead of
./configure --with-init-script=redhat
You can now do
./configure --with-init-script=systemd
Or better still:
./configure --with-init-script=systemd+redhat
We can also now support install of the upstart init script
* configure.ac: Add systemd, and systemd+redhat options to
--with-init-script option
* daemon/Makefile.am: Install systemd services
* daemon/libvirtd.sysconf: Add note about unused env variable
with systemd
* daemon/libvirtd.service.in: libvirtd systemd service unit
* libvirt.spec.in: Add scripts to installing systemd services
and migrating from legacy init scripts
* tools/Makefile.am: Install systemd services
* tools/libvirt-guests.init.sh: Rename to tools/libvirt-guests.init.in
* tools/libvirt-guests.service.in: systemd service unit
Support creation of macvlan devices for LXC containers. Do not
allow setting of bandwidth controls or vport profiles due to the
complication that there is no host side visible device to work
with.
* src/lxc/lxc_driver.c: Support type=direct interfaces
Update virNetDevMacVLanCreateWithVPortProfile to allow creation
of plain macvlan devices, as well as macvtap devices. The former
is useful for LXC containers
* src/qemu/qemu_command.c: Explicitly request a macvtap device
* src/util/virnetdevmacvlan.c, src/util/virnetdevmacvlan.h: Add
new flag to allow switching between macvlan and macvtap
creation
The current lxcSetupInterfaces() method directly performs setup
of the bridge devices. Since it will shortly need to also create
macvlan devices, move the bridge related code into a separate
method
* src/lxc/lxc_driver.c: Split lxcSetupInterfaces() to create a
new lxcSetupInterfaceBridge()
The virDomainNetGetActualBridgeName and virDomainNetGetActualDirectDev
methods both return strings that point to data in the virDomainDefPtr
struct, and should therefore not be freed. The return values should
thus be 'const char *' not 'char *'.
* src/conf/domain_conf.c, src/conf/domain_conf.h: Mark const
* src/network/bridge_driver.c: Update to use a const char *
Move the ifaceMacvtapLinkDump and ifaceGetNthParent functions
into virnetdevvportprofile.c since they are specific to that
code. This avoids polluting the headers with the Linux specific
netlink data types
* src/util/interface.c, src/util/interface.h: Move
ifaceMacvtapLinkDump and ifaceGetNthParent functions and delete
remaining file
* src/util/virnetdevvportprofile.c: Add ifaceMacvtapLinkDump
and ifaceGetNthParent functions
* src/network/bridge_driver.c, src/nwfilter/nwfilter_gentech_driver.c,
src/nwfilter/nwfilter_learnipaddr.c, src/util/virnetdevmacvlan.c:
Remove include of interface.h
Rename ifaceIsVirtualFunction to virNetDevIsVirtualFunction,
ifaceGetVirtualFunctionIndex to virNetDevGetVirtualFunctionIndex
and ifaceGetPhysicalFunction to virNetDevGetPhysicalFunction
* src/util/interface.c, src/util/interface.h: Rename APIs
* src/util/virnetdevvportprofile.c: Update for API rename
Rename the ifaceCheck method to virNetDevValidateConfig and change
so that it always raises an error and returns -1 on error.
* src/util/interface.c, src/util/interface.h: Rename ifaceCheck
to virNetDevValidateConfig
* src/nwfilter/nwfilter_gentech_driver.c,
src/nwfilter/nwfilter_learnipaddr.c: Update for API rename
To match up with the existing virNetDevSetIPv4Address, rename
ifaceGetIPAddress to virNetDevGetIPv4Address
* util/interface.h, util/interface.c: Rename API
* network/bridge_driver.c: Update for API rename
Rename the ifaceGetIndex method to virNetDevGetIndex and
ifaceGetVlanID to virNetDevGetVLanID. Also change the error
reporting behaviour to always raise errors and return -1 on
failure
* util/interface.c, util/interface.h: Rename ifaceGetIndex
and ifaceGetVLAN
* nwfilter/nwfilter_gentech_driver.c, nwfilter/nwfilter_learnipaddr.c,
nwfilter/nwfilter_learnipaddr.c, util/virnetdevvportprofile.c: Update
for API renames and error handling changes
Move virNetDevReplaceMacAddress and virNetDevRestoreMacAddress
to the virnetdev.c file where they naturally belong
* util/interface.c, util/interface.h: Remove
virNetDevReplaceMacAddress and virNetDevRestoreMacAddress
* util/virnetdev.c, util/virnetdev.h: Add
virNetDevReplaceMacAddress and virNetDevRestoreMacAddress
Rename ifaceReplaceMacAddress to virNetDevReplaceMacAddress
and ifaceRestoreMacAddress to virNetDevRestoreMacAddress.
* util/interface.c, util/interface.h, util/virnetdevmacvlan.c:
Rename APIs
Move the low level macvlan creation APIs into the
virnetdevmacvlan.c file where they more naturally
belong
* util/interface.c, util/interface.h: Remove virNetDevMacVLanCreate
and virNetDevMacVLanDelete
* util/virnetdevmacvlan.c, util/virnetdevmacvlan.h: Add
virNetDevMacVLanCreate and virNetDevMacVLanDelete
Rename ifaceMacvtapLinkAdd to virNetDevMacVLanCreate and
ifaceLinkDel to virNetDevMacVLanDelete. Strictly speaking
the latter isn't restricted to macvlan devices, but that's
the only use libvirt has for it.
* util/interface.c, util/interface.h,
util/virnetdevmacvlan.c: Rename APIs
Rename virNetDevMacVLanCreate to virNetDevMacVLanCreateWithVPortProfile
and virNetDevMacVLanDelete to virNetDevMacVLanDeleteWithVPortProfile
To make way for renaming the other macvlan creation APIs in
interface.c
* util/virnetdevmacvlan.c, util/virnetdevmacvlan.h,
qemu/qemu_command.c, qemu/qemu_hotplug.c, qemu/qemu_process.c:
Rename APIs
Rename the macvtap.c file to virnetdevmacvlan.c to reflect its
functionality. Move the port profile association code out into
virnetdevvportprofile.c. Make the APIs available unconditionally
to callers
* src/util/macvtap.h: rename to src/util/virnetdevmacvlan.h,
* src/util/macvtap.c: rename to src/util/virnetdevmacvlan.c
* src/util/virnetdevvportprofile.c, src/util/virnetdevvportprofile.h:
Pull in vport association code
* src/Makefile.am, src/conf/domain_conf.h, src/qemu/qemu_conf.c,
src/qemu/qemu_conf.h, src/qemu/qemu_driver.c: Update include
paths & remove conditional compilation
In preparation for code re-organization, rename the Macvtap
management APIs to have the following patterns
virNetDevMacVLanXXXXX - macvlan/macvtap interface management
virNetDevVPortProfileXXXX - virtual port profile management
* src/util/macvtap.c, src/util/macvtap.h: Rename APIs
* src/conf/domain_conf.c, src/network/bridge_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_command.h,
src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/qemu/qemu_process.h: Update for renamed APIs
Add routines to generate -numa QEMU command line option based on
<numa> ... </numa> XML specifications.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:
<cpu>
...
<topology sockets='2' cores='4' threads='2'/>
<numa>
<cell cpus='0-7' memory='512000'/>
<cell cpus='8-15' memory='512000'/>
</numa>
...
</cpu>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
For whatever reason, the kernel allows you to create a regular
file named /dev/sdc.12345; although this file will disappear the
next time devtmpfs is remounted. If you let libvirt generate
the name of the external snapshot for a disk image originally
using the block device /dev/sdc, then the domain will be rendered
unbootable once the qcow2 file is lost on the next devtmpfs
remount. In this case, the user should have used 'virsh
snapshot-create --xmlfile' or 'virsh snapshot-create-as --diskspec'
to specify the name for the qcow2 file in a sane location, rather
than relying on libvirt generating a name that is most likely to
be wrong. We can help avoid naive mistakes by enforcing that
the user provide the external name for any backing file that is
not a regular file.
* src/conf/domain_conf.c (virDomainSnapshotAlignDisks): Only
generate names if backing file exists as regular file.
Reported by MATSUDA Daiki.
I missed adding virNetServerGetDBusConn() to libvirtd_private.syms
in commit b8adfcc6, which didn't cause a problem in 0.9.6 but
results in this build error in 0.9.7
libvirtd-remote.o: In function `remoteDispatchAuthPolkit':
remote.c:(.text+0x188dd): undefined reference to `virNetServerGetDBusConn'
One of the top questions by libvirt users is how to create a host
bridge device so that guests can be directly on the physical
network. There are several example documents that explain how to do
this manually, but following them often results in confusion and
failure. virt-manager does a good job of creating a bridge based on an
existing network device, but not everyone wants to use virt-manager.
This patch adds a new command, iface-bridge that makes it just about
as simple as possible to create a new bridge device based on an
existing ethernet/vlan/bond device (including associating IP
configuration with the bridge rather than the now-attached device),
and start that new bridge up ready for action, eg:
virsh iface-bridge eth0 br0
For symmetry's sake, it also adds a command to remove a device from a
bridge, restoring the IP config to the now-unattached device:
virsh iface-unbridge br0
(I had a short debate about whether to do "iface-unbridge eth0"
instead, but that would involve searching through all bridge devices
for the one that contained eth0, which seems like a bit too much
trouble).
NOTE: These two commands require that the netcf library be available
on the host. Hopefully this will provide some extra incentive for
people using suse, debian, ubuntu, and other similar systems to polish
up (and push downstream) the ports to those distros recently pushed to
the upstream netcf repo by Dan Berrange. Anyone interested in helping
with that effort in any way should join the netcf-devel mailing list
(subscription info at
https://fedorahosted.org/mailman/listinfo/netcf-devel)
During creation of the bridge, it's possible to specify whether or not
the STP protocol should be started up on the bridge and, if so, how
many seconds the bridge should squelch traffic from newly added
devices while learning new topology (defaults are stp='on' and
delay='0', which seems to usually work best for bridges used in the
context of libvirt guests).
There is also an option to not immediately start the bridge (and a
similar option to not immediately start the un-attached device after
destroying the bridge. Default is to start the new device, because in
the case of iface-unbridge not starting is strongly discouraged as it
will leave the system with no network connectivity on that interface
(because it's necessary to destroy/undefine the bridge device before
the unattached device can be defined), and it seemed better to make
the option for iface-bridge behave consistently.
NOTE TO THOSE TRYING THESE COMMANDS FOR THE FIRST TIME: to guard
against any "unexpected" change to configuration, it is advisable to
issue an "virsh iface-begin" command before starting any interface
config changes, and "virsh iface-commit" only after you've verified
that everything is working as you expect. If something goes wrong,
you can always run "virsh iface-rollback" or reboot the system (which
should automatically do iface-rollback).
Aside from adding the code for these two functions, and the two
entries into the command table, the only other change to virsh.c was
to add the option name to vshCommandOptInterfaceBy(), because the
iface-unbridge command names its interface option as "bridge".
virsh.pod has also been updated with short descriptions of these two
new commands.
Due to the asynchronous nature of streams, we might continue to
receive some stream packets from the server even after we have
shutdown the stream on the client side. These should be discarded
silently, rather than raising an error in the RPC layer.
* src/rpc/virnetclient.c: Discard stream data silently
Very occasionally the sequence of events from poll would result
in getting a HANGUP on its own, instead of a HANGUP+READABLE
at the same time. In the former case we would send back an error
event to the client, but never send the empty packet to indicate
EOF.
Add a new virNetClientSendNonBlock which returns 2 on
full send, 1 on partial send, 0 on no send, -1 on error
If a partial send occurs, then a subsequent call to any
of the virNetClientSend* APIs will finish any outstanding
I/O.
TODO: the virNetClientEvent event handler could be used
to speed up completion of partial sends if an event loop
is present.
* src/rpc/virnetsocket.h, src/rpc/virnetsocket.c: Add new
virNetSocketHasPendingData() API to test for cached
data pending send.
* src/rpc/virnetclient.c, src/rpc/virnetclient.h: Add new
virNetClientSendNonBlock() API to send non-blocking API
Stop multiplexing virNetClientSend for two different purposes,
instead add virNetClientSendWithReply and virNetClientSendNoReply
* src/rpc/virnetclient.c, src/rpc/virnetclient.h: Replace
virNetClientSend with virNetClientSendWithReply and
virNetClientSendNoReply
* src/rpc/virnetclientprogram.c, src/rpc/virnetclientstream.c:
Update for new API names
Remove some duplication by pulling the code for passing the
buck out into a helper method
* src/rpc/virnetclient.c: Introduce virNetClientIOEventLoopPassTheBuck
Instead of inferring whether the buck is held from the waitDispatch
pointer, use an explicit 'bool haveTheBuck' field
* src/rpc/virnetclient.c: Explicitly track the buck
Directly messing around with the linked list is potentially
dangerous. Introduce some helper APIs to deal with list
manipulating the list
* src/rpc/virnetclient.c: Create linked list handlers