Commit Graph

32033 Commits

Author SHA1 Message Date
Ján Tomko
6dec641394 storagepoolxml2argvtest: run mountopts test conditionally
This test relies on namespace support, which is only compiled in
if we have the 'fs' and 'netfs' backends.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2019-01-31 16:48:25 +01:00
Ján Tomko
55aa7ab182 storagepoolxml2argvtest: introduce DO_TEST_PLATFORM
Instead of repeating the same platform for every test,
set it once, since we do the same tests with the same
input for all platforms, it's just the output that differs.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2019-01-31 16:48:25 +01:00
Ján Tomko
f6b839f7b4 storagepoolxml2argvtest: pass the platform suffix as a string
Instead of a pair of bools.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2019-01-31 16:48:24 +01:00
Erik Skultety
13500ee289 docs: Drop /dev/net/tun from the list of shared devices
This was a left-over that should have been dropped along the change in
qemu.conf.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2019-01-31 16:07:28 +01:00
John Ferlan
6c87c75a0c tests: Fix storagepoolxml2xmltest execution for XML namespaces
Only run the pool-netfs-ns-mountopts if built WITH_STORAGE_FS and only
run pool-rbd-ns-configopts if built with WITH_STORAGE_RBD since the
namespace support is only enabled if the pool is enabled.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-01-31 08:31:17 -05:00
Daniel P. Berrangé
6bb582bff8 qemu: remove check for 'qemu' binary
The 'qemu' binary used to provide the i386 emulator until it was renamed
to qemu-system-i386 in QEMU 1.0. Since we don't support such old
versions we don't need to check for 'qemu' when probing capabilities.

Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 13:28:40 +00:00
Daniel P. Berrangé
4a8d9d4953 storage: change custom namespace URIs to drop '/source' component
The custom namespaces were originally registered against the storage
pool source struct, but during review this was changed to the top level
storage pool struct. The namespace URIs were not updated to match, so
had a redundant '/source' component.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 12:34:05 +00:00
Peter Krempa
73ce3911aa qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Some clients poll virDomainGetBlockJobInfo rather than wait for the
VIR_DOMAIN_BLOCK_JOB_READY event. In some cases qemu can get to 100% and
still not reach the synchronised phase. Initiating a pivot in that case
will fail.

Given that computers are interacting here, the error that the job
can't be finalized yet is not handled very well by those specific
implementations.

Our docs now correctly state to use the event. We already do a similar
output adjustment in case when the progress is not available from qemu
as in that case we'd report 0 out of 0, which some apps also incorrectly
considered as 100% complete.

In this case we subtract 1 from the progress if the ready state is not
signalled by qemu if the progress was at 100% otherwise.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2019-01-31 13:03:01 +01:00
Peter Krempa
52bf9ada8e docs: css: Make docs page wider while still accomodating narrow screens
Bump the width to 70em while keeping a maximum width of 95% to allow for
some border.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2019-01-31 12:03:32 +01:00
Peter Krempa
63cbad4e05 docs: Format bit shift and hex notation for bitwise flag enums
Big number itself does not make much sense in some cases. Format the
bitshift format as well.

Changes our web page docs from:

VIR_MIGRATE_POSTCOPY = 32768 : Setting the VIR_MIGRATE_POSTCOPY...
VIR_MIGRATE_TLS      = 65536 : Setting the VIR_MIGRATE_TLS flag...

to:

VIR_MIGRATE_POSTCOPY = 32768 (0x8000; 1 << 15)  : Setting the VIR_MIGRATE_POSTCOPY...
VIR_MIGRATE_TLS      = 65536 (0x10000; 1 << 16) : Setting the VIR_MIGRATE_TLS flag...

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2019-01-31 12:02:35 +01:00
Daniel P. Berrangé
6a306a6b8f conf: fix enum convertor function for feature capability errors
A copy+paste mistaken meant the wrong enum -> string convertor
function was used for the error when an incorrect feature capability was
used.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:54:22 +00:00
Daniel P. Berrangé
8c618e17d1 hyperv: use "is None" not "== None" for PEP-8 compliance
PEP 8 says:

    "Comparisons to singletons like None should always be done
     with 'is' or 'is not', never the equality operators."

There are potentially semantics differences, though in the case of this
libvirt code its merely a style change:

  http://jaredgrubb.blogspot.com/2009/04/python-is-none-vs-none.html

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:43 +00:00
Daniel P. Berrangé
a962af7df3 hyperv: remove unused 'total' variable
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:39 +00:00
Daniel P. Berrangé
a507edef33 qemu: pass virDomainDeviceInfo by reference
The virDomainDeviceInfo parameter is a large struct so it is preferrable
to pass it by reference instead of by value.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:36 +00:00
Daniel P. Berrangé
72e8c721eb storage: pass struct _virStorageBackendQemuImgInfo by reference
The struct _virStorageBackendQemuImgInfo is quite large so it is
preferrable to pass it by reference instead of by value. This requires
us to stop modifying the "compat" field.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:33 +00:00
Daniel P. Berrangé
75d4defe8f remote: remove variable whose value is a constant
The 'rv' variable is never changed after being declared, so can be
removed.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:30 +00:00
Daniel P. Berrangé
df7b679c58 conf: remove pointless check on enum value
'val' is initialized from virDomainCapsFeatureTypeFromString and a
few lines earlier there was already a check for 'val < 0'.

The 'val >= 0' is thus always true. The enum conversion similarly
ensures that the val will be less than VIR_DOMAIN_CAPS_FEATURE_LAST,
so "val < VIR_DOMAIN_CAPS_FEATURE_LAST' is thus always true too.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-31 10:38:13 +00:00
Peter Krempa
d56afb8e39 qemu: Label backing chain of user-provided target of blockCopy when starting the job
Be more sensible when setting labels of the target of a
virDomainBlockCopy operation. Previously we'd relabel everything in case
it's a copy job even if there's no unlabelled backing chain. Since we
are also not sure whether the backing chain is shared we don't relabel
the chain on completion of the blockjob. This certainly won't play nice
with the image permission relabelling feature.

While this does not fix the case where the image is reused and has
backing chain it certainly sanitizes all the other cases. Later on it
will also allow to do the correct thing in cases where only one layer
was introduced.

The change is necessary as in case when -blockdev will be used we will
need to hotplug the backing chain and thus labeling needs to be setup in
advance and not only at the time of pivot.  To avoid multiple code paths
move the labeling now.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
9b197f0e36 qemu: hotplug: Refactor qemuHotplugPrepareDiskAccess to work on virStorageSource
Rather than passing in a virStorageSource which would override the
originally passed disk->src we can now drop passing in a disk completely
as all functions called inside here require a virStorageSource.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
083b74cd20 locking: Use virDomainLockImage[Attach|Detach] instead of *Disk
Use the functions designed to deal with single images as the *Disk
functions were just wrappers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
93a1659171 qemu: driver: Remove disk source munging in qemuDomainBlockPivot
Previously there weren't any suitable functions which would allow
setting up host side of a full disk chain so we've opted to replace the
'src' in a virDomainDiskDef by the new image source.

That is now no longer necessary so remove the munging.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
c938c35363 security: Remove disk labeling functions and fix callers
Now that we have replacement in the form of the image labeling function
we can drop the unnecessary functions by replacing all callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
787e4a3dc8 qemu: security: Replace and remove qemuSecurity[Set|Restore]DiskLabel
The same can be achieved by using qemuSecurity[Set|Restore]ImageLabel.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
81594afb05 qemu: security: Add 'backingChain' flag to qemuSecurity[Set|Restore]ImageLabel
The flag will control the VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN
flag of the security driver image labeling APIs.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
43479005ee security: Remove security driver internals for disk labeling
Security labeling of disks consists of labeling of the disk image
itself and it's backing chain. Modify
virSecurityManager[Set|Restore]ImageLabel to take a boolean flag that
will label the full chain rather than the top image itself.

This allows to delete/unify some parts of the code and will also
simplify callers in some cases.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
e7d14bf965 qemu: cgroup: Change qemu[Setup|Teardown]DiskCgroup to take virStorageSource
Since the disk is necessary only to get the source modify the functions
to take the source directly and rename them to
qemu[Setup|Teardown]ImageChainCgroup.

Additionally drop a pointless comment containing the old function name.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
33b0a3bab8 qemu: domain: Allow overriding disk source in qemuDomainDetermineDiskChain
When we need to detect a chain for a image which will become the new
source for a disk (e.g. after a disk media change or a blockjob) we'd
need to replace disk->src temporarily to do so.

Move the 'disksrc' temporary variable to an argument and adjust callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:38 +01:00
Peter Krempa
73163a0e86 qemu: domain: Clarify temp variable scope in qemuDomainDetermineDiskChain
The function at first validates the top image of the chain, then
traverses the chain as declared in the XML (if any) and then procedes to
detect the rest of the chain from images. All of the steps have their
own temporary iterator.

Clarify the use scope of the steps by introducing a new temp variable
holding the top level source and adding comments.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
2019-01-30 17:20:37 +01:00
John Ferlan
adb15b5add tests: Add storagepoolxml2argvtest source to EXTRA_DIST
Commit f2f84b4d4 added storagepoolxml2argvtest processing; however,
it didn't follow alter the else to !WITH_STORAGE and add the source
itself to the EXTRA_DIST like the other WITH_STORAGE options for
virstorageutiltest and storagevolxml2argvtest.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-30 10:35:28 -05:00
John Ferlan
406473990c tests: Fix build issue with storagevolxml2xmltest
Commit 7a227688a caused a build failure on mingw. Following
other uses of including ../src/libvirt_driver_storage_impl.la
I moved to under the WITH_STORAGE conditional.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-30 10:35:07 -05:00
John Ferlan
ab6ca81276 rbd: Utilize storage pool namespace to manage config options
Allow for adjustment of RBD configuration options via Storage
Pool XML Namespace adjustments. When namespace arguments are
used to start the pool, add a VIR_WARN to indicate that the
startup was tainted by custom config_opts.

Based off original patch/concept:

https://www.redhat.com/archives/libvir-list/2014-May/msg00940.html

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:25 -05:00
John Ferlan
ab995c1fe9 storage: Add storage pool namespace options to fs and netfs command lines
If the Storage Pool Namespace XML data exists, format the mount
options on the MOUNT command line and issue a VIR_WARN to indicate
that the storage pool was tainted by custom mount_opts.

When the pool is started, the options will be generated on the
command line along with the options already defined.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:19 -05:00
John Ferlan
7a227688a8 storage: Add infrastructure to manage XML namespace options
Introduce the virStoragePoolFSMountOptionsDef to be used to
manage the Storage Pool XML Namespace for mount options.

Using a new virStorageBackendNamespaceInit function, set the
virStoragePoolXMLNamespace into the _virStoragePoolOptions when
the storage backend is loaded.

Modify the storagepool.rng to allow for the usage of a different
XML namespace to parse the fs_mount_opts to be included with
the fs and netfs storage pool definitions.

Modify the storagepoolxml2xmltest to utilize a properly modified
XML file to parse and format the namespace for a netfs storage pool.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:13 -05:00
John Ferlan
fa7a66d079 conf: Introduce virStoragePoolXMLNamespace
Introduce the infrastructure necessary to manage a Storage Pool XML
Namespace. The general concept is similar to virDomainXMLNamespace,
except that for Storage Pools the storage backend specific details
can be stored within the _virStoragePoolOptions unlike the domain
processing code which manages its xmlopt's via the virDomainXMLOption
which is allocated/passed around for each domain.

This patch defines the add the parse, format, free, and href methods
required to process the XML and callout from the Storage Pool Def
parse, format, and free API's to perform the action on the XML data
for/from the backend.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:13 -05:00
John Ferlan
a3dbaa3647 virsh: Add source-protocol-ver for pool commands
Allow the addition of the <protocol ver='n'/> to the provided XML.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:13 -05:00
John Ferlan
3d3647e14f storage: Add the nfsvers to the command line
If protocolVer present, add the -o nfsvers=# to the command
line for the NFS Storage Pool

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:16:07 -05:00
John Ferlan
801f8cfb37 conf: Add optional NFS Source Pool <protocol ver='n'/> option
Add an optional way to define which NFS Server version will be
used to content the target NFS server.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:15:27 -05:00
John Ferlan
f06e94af07 docs: Add news mention of default fs/netfs storage pool mount options
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:15:27 -05:00
John Ferlan
f00cde7f11 storage: Add default mount options for fs/netfs storage pools
https://bugzilla.redhat.com/show_bug.cgi?id=1584663

Modify the command generation to add some default options to the
fs/netfs storage pools based on the OS type. For Linux, it'll be
the "nodev, nosuid, noexec". For FreeBSD, it'll be "nosuid, noexec".
For others, just leave the options alone.

Modify the storagepoolxml2argvtest to handle the fact that the
same input XML could generate different output XML based on whether
Linux, FreeBSD, or other was being built.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 19:15:20 -05:00
John Ferlan
d0ba8d6553 conf: Alter virCapabilitiesFormatGuestXML to take virCapsGuestPtr
Rather than deref off of "caps->guests", let's pass "caps->guests" and
caps->nguests to have the helper use "guests[i]->" instead.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-29 13:24:46 -05:00
John Ferlan
181acfe9a8 conf: Extract guest XML formatting from virCapabilitiesFormatXML
Let's extract out the <guest> code into it's own method/helper.

NB: One minor change between the two is usage of "buf" instead
of "&buf" in the new code since we pass the address of &buf to
the helper.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-29 13:24:41 -05:00
John Ferlan
0d832b873c conf: Alter virCapabilitiesFormatHostXML to take virCapsHostPtr
Rather than deref off of "caps->host.", let's pass "&caps->host"
and make the helper use "host->" instead.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-29 13:24:36 -05:00
John Ferlan
da87aa5963 conf: Extract host XML formatting from virCapabilitiesFormatXML
Let's extract out the <host> code into it's own method/helper.

NB: One minor change between the two is usage of "buf" instead
of "&buf" in the new code since we pass the address of &buf to
the helper.

Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
2019-01-29 13:24:14 -05:00
Daniel P. Berrangé
9047b9aec0 Revert "qemu: Forbid pinning vCPUs for TCG domain"
This reverts commit 8b035c84d8.

The MTTCG impl in QEMU does allow pinning vCPUs.

When the guest is running we already check if pinning is
possible in the qemuDomainPinVcpuLive method, so this
check was adding no benefit.

When the guest is not running, we cannot know whether the
subsequent launch will use MTTCG or TCG, so we must allow
the pinning request. If the guest does use TCG on the next
launch it will fail, but this is no worse than if the user
had done a virDomainDefineXML with an XML doc specifying
vCPU pinning.

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 17:19:10 +00:00
Daniel P. Berrangé
34f77437da qemu: fix recording of vCPU pids for MTTCG
MTTCG is the new multi-threaded impl of TCG which follows
KVM in having one host OS thread per vCPU. Historically
we have discarded all PIDs reported for TCG guests, but
we must now selectively honour this data.

We don't have anything in the domain XML that indicates
whether a guest is using TCG or MTTCG. While QEMU does
have an option (-accel tcg,thread=single|multi), it is
not desirable to expose this in libvirt. QEMU will
automatically use MTTCG when the host/guest architecture
pairing is known to be safe. Only developers of QEMU TCG
have a strong reason to override this logic.

Thus we use two sanity checks to decide if the vCPU
PID information is usable. First we see if the PID
duplicates the main emulator PID, and second we see
if the PID duplicates any other vCPUs.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 16:31:04 +00:00
Peter Krempa
38757744c2 lib: domain: Emphasise that users should wait for block job READY state via events
The transition to the ready state is best observed by events as it's
ansynchronous and does not hint users to do polling. As currently only
the qemu driver supports block copy and block commit and the ready state
event was introduced by qemu 1.3 we can fully switch to the new
approach.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-01-29 17:11:02 +01:00
Peter Krempa
b7bd97fbe7 lib: Clarify that any block job may block VM save or device detach
The documentation was only referring to a copy job, but in fact any
running blockjob will have the same results.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-01-29 17:11:02 +01:00
Peter Krempa
5ea24bbb54 qemu: Don't reject making domain persistent if block copy is running
Add documentation that the 'VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB' flag
is auto-assumed if the block copy job is started while the VM is
transient and remove the restriction to define the domain when copy
is running.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-01-29 17:11:02 +01:00
Daniel P. Berrangé
7431b3eb9a util: move virtual network firwall rules into private chains
The previous commit created new chains to hold the firewall rules. This
commit changes the code that creates rules to place them in the new
private chains instead of the builtin top level chains.

With two networks running, the rules in the filter table now look like

  -N LIBVIRT_FWI
  -N LIBVIRT_FWO
  -N LIBVIRT_FWX
  -N LIBVIRT_INP
  -N LIBVIRT_OUT
  -A INPUT -j LIBVIRT_INP
  -A FORWARD -j LIBVIRT_FWX
  -A FORWARD -j LIBVIRT_FWI
  -A FORWARD -j LIBVIRT_FWO
  -A OUTPUT -j LIBVIRT_OUT
  -A LIBVIRT_FWI -d 192.168.0.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A LIBVIRT_FWI -o virbr0 -j REJECT --reject-with icmp-port-unreachable
  -A LIBVIRT_FWI -d 192.168.1.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A LIBVIRT_FWI -o virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A LIBVIRT_FWO -s 192.168.0.0/24 -i virbr0 -j ACCEPT
  -A LIBVIRT_FWO -i virbr0 -j REJECT --reject-with icmp-port-unreachable
  -A LIBVIRT_FWO -s 192.168.1.0/24 -i virbr1 -j ACCEPT
  -A LIBVIRT_FWO -i virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A LIBVIRT_FWX -i virbr0 -o virbr0 -j ACCEPT
  -A LIBVIRT_FWX -i virbr1 -o virbr1 -j ACCEPT
  -A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
  -A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
  -A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
  -A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
  -A LIBVIRT_INP -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
  -A LIBVIRT_INP -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
  -A LIBVIRT_INP -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
  -A LIBVIRT_INP -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
  -A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
  -A LIBVIRT_OUT -o virbr1 -p udp -m udp --dport 68 -j ACCEPT

While in the nat table:

  -N LIBVIRT_PRT
  -A POSTROUTING -j LIBVIRT_PRT
  -A LIBVIRT_PRT -s 192.168.0.0/24 -d 224.0.0.0/24 -j RETURN
  -A LIBVIRT_PRT -s 192.168.0.0/24 -d 255.255.255.255/32 -j RETURN
  -A LIBVIRT_PRT -s 192.168.0.0/24 ! -d 192.168.0.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
  -A LIBVIRT_PRT -s 192.168.0.0/24 ! -d 192.168.0.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
  -A LIBVIRT_PRT -s 192.168.0.0/24 ! -d 192.168.0.0/24 -j MASQUERADE
  -A LIBVIRT_PRT -s 192.168.1.0/24 -d 224.0.0.0/24 -j RETURN
  -A LIBVIRT_PRT -s 192.168.1.0/24 -d 255.255.255.255/32 -j RETURN
  -A LIBVIRT_PRT -s 192.168.1.0/24 ! -d 192.168.1.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
  -A LIBVIRT_PRT -s 192.168.1.0/24 ! -d 192.168.1.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
  -A LIBVIRT_PRT -s 192.168.1.0/24 ! -d 192.168.1.0/24 -j MASQUERADE

And finally the mangle table:

  -N LIBVIRT_PRT
  -A POSTROUTING -j LIBVIRT_PRT
  -A LIBVIRT_PRT -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
  -A LIBVIRT_PRT -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 13:37:11 +00:00
Daniel P. Berrangé
5f1e6a7d48 util: create private chains for virtual network firewall rules
Historically firewall rules for virtual networks were added straight
into the base chains. This works but has a number of bugs and design
limitations:

  - It is inflexible for admins wanting to add extra rules ahead
    of libvirt's rules, via hook scripts.

  - It is not clear to the admin that the rules were created by
    libvirt

  - Each rule must be deleted by libvirt individually since they
    are all directly in the builtin chains

  - The ordering of rules in the forward chain is incorrect
    when multiple networks are created, allowing traffic to
    mistakenly flow between networks in one direction.

To address all of these problems, libvirt needs to move to creating
rules in its own private chains. In the top level builtin chains,
libvirt will add links to its own private top level chains.

Addressing the traffic ordering bug requires some extra steps. With
everything going into the FORWARD chain there was interleaving of rules
for outbound traffic and inbound traffic for each network:

  -A FORWARD -d 192.168.3.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A FORWARD -s 192.168.3.0/24 -i virbr1 -j ACCEPT
  -A FORWARD -i virbr1 -o virbr1 -j ACCEPT
  -A FORWARD -o virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -i virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -d 192.168.2.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A FORWARD -s 192.168.2.0/24 -i virbr0 -j ACCEPT
  -A FORWARD -i virbr0 -o virbr0 -j ACCEPT
  -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable

The rule allowing outbound traffic from virbr1 would mistakenly
allow packets from virbr1 to virbr0, before the rule denying input
to virbr0 gets a chance to run.

What we really need todo is group the forwarding rules into three
distinct sets:

 * Cross rules - LIBVIRT_FWX

  -A FORWARD -i virbr1 -o virbr1 -j ACCEPT
  -A FORWARD -i virbr0 -o virbr0 -j ACCEPT

 * Incoming rules - LIBVIRT_FWI

  -A FORWARD -d 192.168.3.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A FORWARD -o virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -d 192.168.2.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable

 * Outgoing rules - LIBVIRT_FWO

  -A FORWARD -s 192.168.3.0/24 -i virbr1 -j ACCEPT
  -A FORWARD -i virbr1 -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -s 192.168.2.0/24 -i virbr0 -j ACCEPT
  -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable

There is thus no risk of outgoing rules for one network mistakenly
allowing incoming traffic for another network, as all incoming rules
are evalated first.

With this in mind, we'll thus need three distinct chains linked from
the FORWARD chain, so we end up with:

        INPUT --> LIBVIRT_INP   (filter)

       OUTPUT --> LIBVIRT_OUT   (filter)

      FORWARD +-> LIBVIRT_FWX   (filter)
              +-> LIBVIRT_FWO
              \-> LIBVIRT_FWI

  POSTROUTING --> LIBVIRT_PRT   (nat & mangle)

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-01-29 13:35:58 +00:00