While the code that's setting default qdisc is clever enough to
not overwrite any bandwidth (potentially) set by
virNetDevBandwidthSet() (and thus the root qdisc htb is not
replaced with noqueue), it does print a debug message when that's
the case. It's needless. We can set the root qdisc beforehand and
let virNetDevBandwidthSet() overwrite it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Currently, we configure QEMU to prealloc memory almost by
default. Well, by default for NVDIMMs, hugepages and if user
asked us to (via memoryBacking <allocation mode="immediate"/>).
However, when guest's NVDIMM is backed by real life NVDIMM this
approach is not the best. In this case users should put <pmem/>
into the <memory/> device <source/>, like this:
<memory model='nvdimm' access='shared'>
<source>
<path>/dev/pmem0</path>
<pmem/>
</source>
</memory>
Instructing QEMU to do prealloc in this case means that each
page of the NVDIMM is "touched" (the first byte is read and
written back - see QEMU commit v2.9.0-rc1~26^2) which cripples
device wear.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Although the function currently only returns errors for PCI addresses,
check it here too, in case that changes in the future.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So far our memory modules could go only into DIMM slots. But with
virtio model this assumption is no longer true - virtio-pmem goes
onto PCI bus. But for formatting PCI address onto command line we
already have a function - qemuBuildDeviceAddressStr(). Therefore,
mode DIMM address generation into it so that we don't have to
special case address building later on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Han Han <hhan@redhat.com>
The virDomainMemoryModel structure has a @type member which is
really type of virDomainMemoryModel but we store it as int
because the virDomainMemoryModelTypeFromString() call stores its
retval right into it. Then, to have compiler do compile time
check for us, every switch() typecasts the @type. This is
needlessly verbose because the parses already has @val - a
variable to store temporary values. Switch @type in the struct to
virDomainMemoryModel and drop all typecasts.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Han Han <hhan@redhat.com>
qemuBuildCommandLine() is calling qemuDomainAlignMemorySizes(),
which is an operation that changes live XML and domain and has
little to do with the command line build process.
Move it to qemuProcessPrepareDomain() where we're supposed to
make live XML and domain changes before launch. qemuProcessStart()
is setting VIR_QEMU_PROCESS_START_NEW if !migrate && !snapshot,
same conditions used in qemuBuildCommandLine() to call
qemuDomainAlignMemorySizes(), making this change seamless.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Do not look up the index of the passed FD in places where
we already have it.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
An alternative to qemuVirCommandGetFDSet that takes the index
into the passed FD set as an argument and does not try to look it up.
Use it as well ass virCommandPassFDIndex in qemuBuildChrChardevFileStr
and qemuBuildInterfaceCommandLine.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The ESP SCSI controllers (NCR53C90, DC390, AM53C974) have the same
requirement as the LSI Logic controller for each disk to be set via
the scsi-id=NNN property, not the lun=NNN property.
Switching the code to use an enum will force authors to pay attention
to this difference when adding future SCSI controllers.
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The NCR53C90 is the built-in SCSI controller on all sparc machine types,
but not sparc64. Note that it has the fixed alias "scsi", which differs
from our normal naming convention of "scsi0".
The DC390 and AM53C974 are PCI SCSI controllers that can be added to any
PCI machine.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The NCR53C90 is the built-in SCSI controller on all sparc machine types,
and some mips and m68k machine types.
The DC390 and AM53C974 are PCI SCSI controllers that can be added to any
PCI machine.
These are only interesting for emulating obsolete hardware platforms.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
QEMU version 4.2 introduced a performance feature under commit
d645e13287 ("kvm: i386: halt poll control MSR support").
This patch adds a new KVM feature 'poll-control' to set this performance
hint for KVM guests. The feature is off by default.
To enable this hint and have libvirt add "-cpu host,kvm-poll-control=on"
to the QEMU command line, the following XML code needs to be added to the
guest's domain description:
<features>
<kvm>
<poll-control state='on'/>
</kvm>
</features>
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
There are two types of vhostuser ports:
dpdkvhostuser - OVS creates the socket and QEMU connects to it
dpdkvhostuserclient - QEMU creates the socket and OVS connects to it
But of course ovs-vsctl syntax for fetching ifname is different.
So far, we've implemented the former. The lack of implementation
for the latter means that we are not detecting the interface name
and thus not reporting it in domain XML, or failing to get
interface statistics.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1767013
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Although the code in qemuProcessStartValidateTSC works as if the
timer frequency was already unsigned long long (by using an appropriate
temporary variable), the virDomainTimerDef structure actually defines
frequency as unsigned long, which is not guaranteed to be 64b.
Fixes support for frequencies higher than 2^32 - 1 on 32b systems.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Add logic to validate and then pass through 'fmode' and 'dmode' to the
QEMU call.
Signed-off-by: Brian Turek <brian.turek@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Coverity reported a potential resource leak. While it's probably not
a real-world scenario, the code could technically jump to cleanup
between the time that vdpafd is opened and when it is used. Ensure that
it gets cleaned up in that case.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Use of the -enable-fips option is being deprecated in QEMU >= 5.2.0. If
FIPS compliance is required, QEMU must be built with libcrypt which will
unconditionally enforce it.
Thus there is no need for libvirt to pass -enable-fips to modern QEMU.
Unfortunately there was never any way to probe for -enable-fips in the
first instance, it was enabled by libvirt based on version number
originally, and then later unconditionally enabled when libvirt dropped
support for older QEMU. Similarly we now use a version number check to
decide when to stop passing -enable-fips.
Note that the qemu-5.2 capabilities are currently from the pre-release
version and will be updated once qemu-5.2 is released.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Enable <interface type='vdpa'> for qemu domains. This provides basic
support and does not support hotplug or migration.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This patch adds new schema and adds support for parsing and formatting
domain configurations that include vdpa devices.
vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.
When a device on the host is bound to an appropriate vendor-specific
driver, it will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
That chardev path can then be used to define a new interface with
type='vdpa'.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
SCSI hostdev setup requires querying the host os for the actual path of
the configured hostdev. This was historically done in the command line
formatter. Our new approach is to split out this part into
'qemuProcessPrepareHost' which is designed to be skipped in tests.
Refactor the hostdev code to use this new semantics, and add appropriate
handlers filling in the data for tests and the qemuConnectDomainXMLToNative
users.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemuBuildHostdevSCSIAttachPrepare is supposed to prepare the data
structure used for attaching the hostdev not preparing the hostdev
definition itself. Move the corresponding bits to qemuDomainPrepareHostdev
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
All but VIR_CPU_MODE_HOST_MODEL were moved. 'host_model' mode
has nuances that forbid the verification to be moved to parse
time.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We have a lot of "if (usingVirtio)" checks being done while
constructing the NIC command line. Let's put all of them in
a single "if".
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
A few tweaks were made during the move:
- the error messages were changed to mention 'sata controller'
instead of 'ide controller';
- a check for address type 'drive' was added like it is done
with other bus types. The error message of qemuxml2argdata was
updated to reflect that now, instead of erroring it out from the
common code in virDomainDiskDefValidate(), we're failing earlier
with a different error message.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch enables the free-page-reporting in qemu.
Signed-off-by: Nico Pache <npache@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
By default, pfifo_fast queueing discipline (qdisc) is set on
newly created interfaces (including TAPs). This qdisc has three
queues and packets that want to be sent through given NIC are
placed into one of the queues based on TOS field. Queues are then
emptied based on their priority allowing interactive sessions
stay interactive whilst something else is downloading a large
file.
Obviously, this means that kernel has to be involved and some
locking has to happen (when placing packets into queues). If
virtualization is taken into account then the above algorithm
happens twice - once in the guest and the second time in the
host.
This is arguably not optimal as it burns host CPU cycles
needlessly. Guest already made it choice and sent packets in the
order it wants.
To resolve this, Linux kernel offers 'noqueue' qdisc which can be
applied on virtual interfaces and in fact for 'lo' it is by
default:
lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
Set it for other TAP devices we create for domains too. With this
change I was able to squeeze 1Mbps more from a macvtap attached
to a guest and to my 1Gbps LAN (as measured by iperf3).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1329644
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In 88957116c9 I've switched to -machine memory-backend=ID and
-object memory-backend-* because QEMU is obsoleting -mem-path
and -mem-prealloc. However, what I did not foresee was that using
-machine memory-backend in combination with -numa is not allowed
in QEMU. This was reported upstream and fortunately not released
yet.
The problem is that if domain has NUMA nodes then we will
generate memory-backend-* objects for NUMA nodes (because if QEMU
is new enough to expose default RAM ID it also supports -numa
memdev=) and adding non-NUMA memory backend is wrong.
Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
So far, Libvirt configures memory-backend-* for memory hotplug,
possibly NUMA nodes and in a few other cases. This patch
switches to constructing the memory-backend-* command line for
all cases. To keep ability to migrate guests a little hack is
used: the ID of the object is set to the one that QEMU uses
internally anyways. These IDs are stable (first started to appear
somewhere around v0.13.0-rc0~96) and can't change.
In fact, this patch does exactly what QEMU does internally. The
reason for moving the logic into Libvirt is that QEMU wants to
deprecate the old style of specifying memory.
So far, only x84_64 test cases are changed, because tests for
other architectures use older capabilities, which still lack the
QEMU_CAPS_MACHINE_MEMORY_BACKEND capability and they don't report
the RAM ID.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1836043
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The objects at @def and @mem pointers are only read and not
written. Make the arguments const to make that explicit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If a domain was using hugepages through memory-backend-file or
via -mem-path, we would turn prealloc on. But we are not doing
that for memory-backend-memfd. Fix this, because we need QEMU to
fully allocate hugepages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If user specifies immediate memory allocation in the domain XML,
they want QEMU to fully allocate its memory. And if the memory
was allocated using plain '-m' then we would honour it. But, if a
memory backend is used, then we don't set the prealloc attribute
of the backend.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>