Please see the commit log for commit v10.9.0-rc1-1-g42ab0148dd for the
history and explanation of the problem that this patch is fixing.
A shorter explanation is that when a guest is connected to a libvirt
virtual network using a virtio-net adapter with in-kernel "vhost-net"
packet processing enabled, it will fail to acquire an IP address from
a DHCP seever running on the host.
In commit v10.9.0-rc1-1-g42ab0148dd we tried fixing this by *zeroing
out* the checksums of these packets with an nftables rule (nftables
can't recompute the checksum, but it can set it to 0) . This
*appeared* to work initially, but it turned out that zeroing the
checksum ends up breaking dhcp packets on *non* virtio/vhost-net guest
interfaces. That attempt was reverted in commit v10.9.0-rc2.
Fortunately, there is an existing way to recompute the checksum of a
packet as it leaves an interface - the "tc" (traffic control) utility
that libvirt already uses for bandwidth management. This patch uses a
tc filter rule to match dhcp response packets on the bridge and
recompute their checksum.
The filter rule must be attached to a tc qdisc, which may also have a
filter attached for bandwidth management (in the <bandwidth> element
of the network config). Not only must we add the qdisc only once
(which was already handled by the patch two prior to this one), but
also the filter rule for checksum fixing and the filter rule for
bandwidth management must be different priorities so they don't clash;
this is solved by adding the checksum-fix filter with "priority 2",
while the bandwidth management filter remains "priority 1" (both will
always be evaluated anyway, it's just a matter of which is evaluated
first).
So far this method has worked with every different guest we could
throw at it, including several that failed with the previous method.
Fixes: b89c4991da
Reported-by: Rich Jones <rjones@redhat.com>
Reported-by: Andrea Bolognani <abologna@redhat.com>
Fix-Suggested-by: Eric Garver <egarver@redhat.com>
Fix-Suggested-by: Phil Sutter <psutter@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If the layer of a virFirewallCmd is "tc", then the "tc" utility will
be executed using the arguments that had been added to the
virFirewallCmd
tc layer doesn't support auto-rollback command creation (any rollback
needs to be added manually with virFirewallAddRollbackCmd()), and also
tc layer isn't supported by the iptables backend (it would have been
straightforward to add, but the iptables backend doesn't need it, and
I didn't want to take the chance of causing a regression in that
code for no good reason).
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
There will soon be two separate users of tc on virtual networks, and
both will use the "qdisc root handle 1: htb" to add tx filters. One or the
other could get the first chance to add the qdisc, and then if at a
later time the other decides to use it, we need to prevent the 2nd
user from attempting to re-add the qdisc (because that just generates
an error).
We do this by running "tc qdisc show dev $bridge handle 1:" then
checking if the output of that command contains both "qdisc" and " 1:
".[*] If it does then the qdisc has already been added. If not then we
need to add it now.
[*]As of this writing, the output more exactly starts with "qdisc
htb 1: root", but our comparison is made purposefully generous to
increase the chances that it will continue to work properly if tc
modifies the format of its output.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
virNetDevBandwidthSet() adds a queue discipline (qdisc) for each
interface that it will need to add tc transmit filters to, and the
filters are then attached to the qdisc.
There are other circumstances where some other function will need to
add tc transmit filters to an interface (in particular an upcoming
patch to the network driver nftables backend that will use a tc tx
filter to fix the checksum of dhcp packets), so that function will
also need a qdisc for the tx filter. To assure both always use exactly
the same qdisc, this patch puts the command that adds the tx filter
qdisc into a separate helper function that can (and will) be called
from either place
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
virNetDevBandwidthSet() always clears all existing qdiscs and their
subordinate filters before adding all the new qdiscs/filters. This is
normally exactly what we want, but there is one case (the network
driver) where the Qdisc added by virNetDevBandwidthSet() may already
be in use by the nftables backend (which will add a rule to fix the
checksum of dhcp packets); in that case, we *don't* want
virNetDevBandwidthSet() to clear out the qdisc that was already added
for nftables, and none of the bandwidth filters have been added yet,
so there already aren't any "old" filters that need to be removed
either - it is safe to just skip virNetDevBandwidthClear() in this
case.
To allow the network driver to set bandwidth without first clearing
it, this patch adds the flag VIR_NETDEV_BANDWIDTH_SET_CLEAR_ALL to the
virNetDevBandwidthSetFlags enum, and recognizes it in
virNetDevBandwidthSet() - if the flag is set, then
virNetDevBandwidth() will call virNetDevBandwidthClear() just as it
always has. But if the flag isn't set it *won't* call
virNetDevBandwidthClear().
As suggested above, VIR_NETDEV_BANDWIDTH_SET_CLEAR_ALL is set for all
calls to virNetdevBandwidthSet() except for two places in the network
driver.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Having two bools in the arg list is on the borderline of being
confusing to anyone trying to read the code, but we're about to add a
3rd. This patch replaces the two bools with a single flags argument
which will instead have one or more bits from virNetDevBandwidthFlags
set.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Some models are just aliases to other models. Make this relation
available to users via domain capabilities.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Record a fact a specific CPU model was derived from another one. The
original model is also marked as an alias of the new one in case it did
not change any properties of the original CPU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The signatures in the CPU map are used for matching physical CPUs and
thus we need to cover all possible real world variants we know about.
When adding a new version of an existing CPU model, we should copy the
signature(s) of the existing model rather than replacing it with the
signature that QEMU uses.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Add all newly generated CPU models to the appropriate section of
index.xml.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We already visually group the included models using comments. This patch
introduces a new <group name='...'> element for doing it properly in a
machine friendly way.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
XMLs parse/format round trip using lxml results in an XML document that
almost exactly matches the original (including comments).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We don't really need or want the extra info to be included in the CPU
model definitions in git, it's mostly useful for verifying the output of
the script. Let's store it in a separate file rather than in a comment
block of the CPU model definition itself.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Each CPU model with -v* suffix is defined as a standalone model copying
all attributes of the previous version. CPU model versions with an alias
are handled differently. The full definition is used for the alias and
the versioned model is created as an identical copy of the alias.
To avoid breaking migration compatibility of host-model CPUs all
versioned models are marked with <decode guest='off'/> so that they are
ignored when selecting candidates for host-model. It's not ideal but not
doing so would break almost all host-model CPUs as the new versioned CPU
models have all vmx-* features included since their introduction while
existing CPU models were updated later. This meas existing models would
be accompanied with a long list of vmx-* features to properly describe a
host CPU while the newly added CPU models would have those features
enabled implicitly and their list of features would be significantly
shorter. Thus the new models would always be better candidates for
host-model than the existing models.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While the script for synchronizing CPU features expects a path to QEMU
source tree, this CPU model script insisted on getting a full patch to
cpu.c file, even though it could easily deduce it from the path to QEMU
source tree.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We don't change definitions of CPU models which were already included in
a libvirt release to maintain migration compatibility. Thus the script
can just skip existing models and save us from having to drop the
changes it would do to them.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When removing features unknown to QEMU (they have a different name or
are completely missing as they are not configurable by a user) I should
not have removed them from the list of features unknown to QEMU in the
script for synchronizing QEMU features to the CPU map.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When a CPU model is defined based on another model, we were completely
ignoring features marked as added to or removed from the original model
after it was released. For added features this is the right thing to do
as it will promote them to become normal features included in the new
model. But features marked as removed would become included in the new
model as well. We need to explicitly remove them as if they were never
included in the model.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Document which fields are inherited when a CPU model is based on another
model.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>