When libvirt is managing the MAC table of a Linux host bridge, it must
turn off learning and unicast_flood for each tap device attached to
that bridge, then add a Forwarding Database (fdb) entry for the tap
device using the MAC address from the domain interface config.
Once we have disabled learning and flooding, any packet that has a
destination MAC address not present in the fdb will be dropped by the
bridge. This, along with the opportunistic disabling of promiscuous
mode[*], can result in enhanced network performance and a potential
slight security improvement.
[*] If there is only one device on the bridge with learning/unicast_flood
enabled, then that device will automatically have promiscuous mode
disabled. If there are *no* devices with learning/unicast_flood
enabled (e.g. for a libvirt "route", "nat", or isolated network that
has no physical device attached), then all non-tap devices will have
promiscuous mode disabled (tap devices always have promiscuous mode
enabled, which may be a bug in the kernel, but in practice has 0
effect).
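For reference, a minimal standalone sketch of the per-port tweaks described
above (this is not libvirt's code; the tap name, MAC address, and the use of
the iproute2 "bridge" tool are assumptions for illustration, libvirt itself
does the equivalent via netlink):

#include <stdio.h>
#include <stdlib.h>

/* write 0/1 to /sys/class/net/<tap>/brport/<flag> */
static int
set_brport_flag(const char *tap, const char *flag, int value)
{
    char path[256];
    FILE *fp;

    snprintf(path, sizeof(path), "/sys/class/net/%s/brport/%s", tap, flag);
    if (!(fp = fopen(path, "w")))
        return -1;
    fprintf(fp, "%d\n", value);
    return fclose(fp);
}

int main(void)
{
    const char *tap = "vnet0";              /* assumed tap device name */
    const char *mac = "52:54:00:12:34:56";  /* MAC from the interface config */
    char cmd[256];

    if (set_brport_flag(tap, "learning", 0) < 0 ||
        set_brport_flag(tap, "unicast_flood", 0) < 0)
        return 1;

    /* rough equivalent of the fdb entry libvirt adds via netlink */
    snprintf(cmd, sizeof(cmd), "bridge fdb add %s dev %s master", mac, tap);
    return system(cmd) == 0 ? 0 : 1;
}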
None of this has any effect for kernels prior to 3.15 (upstream kernel
commit 2796d0c648c940b4796f84384fbcfb0a2399db84 "bridge: Automatically
manage port promiscuous mode"). Even after that, until kernel 3.17
(upstream commit 5be5a2df40f005ea7fb7e280e87bbbcfcf1c2fc0 "bridge: Add
filtering support for default_pvid") traffic will not be properly
forwarded without manually adding vlan table entries. Unfortunately,
although the presence of the first patch is signalled by existence of
the "learning" and "unicast_flood" options in sysfs, there is no
reliable way to query whether or not the system's kernel has the
second of those patches installed; the only thing that can be done is
to try the setting and see if traffic continues to pass.
I'm about to make block stats optionally more complex to cover
backing chains, where block.count will no longer equal the number
of <disks> for a domain. For these reasons, it is nicer if the
statistics output includes the source path (for local files).
This patch doesn't add anything for network disks, although we
may decide to add that later.
With this patch, I now see the following for the same domain as
in the previous patch (one qcow2 file, and an empty cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.1.name=hdc
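A hedged client-side sketch of consuming the new field through the public
virConnectGetAllDomainStats() API (the connection URI and the string-only
filtering are illustrative):

#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
    virDomainStatsRecordPtr *records = NULL;
    int nrecords, i, j;

    if (!conn)
        return 1;

    nrecords = virConnectGetAllDomainStats(conn, VIR_DOMAIN_STATS_BLOCK,
                                           &records, 0);
    for (i = 0; i < nrecords; i++) {
        printf("Domain: '%s'\n", virDomainGetName(records[i]->dom));
        for (j = 0; j < records[i]->nparams; j++) {
            virTypedParameterPtr p = &records[i]->params[j];
            /* block.N.name is always present; block.N.path only for
             * local files/block devices */
            if (p->type == VIR_TYPED_PARAM_STRING)
                printf("  %s=%s\n", p->field, p->value.s);
        }
    }

    if (records)
        virDomainStatsRecordListFree(records);
    virConnectClose(conn);
    return 0;
}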
* src/libvirt-domain.c (virConnectGetAllDomainStats): Document
new field.
* tools/virsh.pod (domstats): Document new field.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Return the new
stat for local files/block devices.
(QEMU_ADD_NAME_PARAM): Add parameter.
(qemuDomainGetStatsInterface): Update caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
I noticed that for an offline domain, 'virsh domstats --block $dom'
was producing just the domain name, with no stats. But the older
'virsh domblkinfo' works just fine on offline domains. This patch
starts to get us closer, by at least reporting the disk names for
an offline domain.
With this patch, I now see the following for an offline domain
with one qcow2 disk and an empty cdrom drive:
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Don't short-circuit
output of block name.
Signed-off-by: Eric Blake <eblake@redhat.com>
qemuDomainGetStatsBlock() could leak a stats hash table if it
encountered OOM while populating the virTypedParameters.
Oddly, the fix doesn't even touch qemuDomainGetStatsBlock :)
* src/qemu/qemu_driver.c (QEMU_ADD_COUNT_PARAM)
(QEMU_ADD_NAME_PARAM): Don't return early.
(qemuDomainGetStatsInterface): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
If probing capabilities via QMP fails, we now have a check
that prevents us falling back to -help parsing. Unfortunately
the error message
"Failed to probe capabilities for /usr/bin/qemu-kvm:
unsupported configuration: QEMU 2.1.2 is too new for help parsing"
is proving rather unhelpful to the user. We need to be telling
them why QMP failed (the root cause), rather than that they can't
use -help (the side effect).
To do this we should capture stderr during QMP probing, and
if -help parsing then sees a new QEMU version, we know that
QMP should have worked, and so we can show the messages from
stderr. The message thus becomes
"Failed to probe capabilities for /usr/bin/qemu-kvm:
internal error: QEMU / QMP failed: Could not access
KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory"
When attempting to create an internal system checkpoint with a passthrough
device, qemu will report the following error:
error: operation failed: Error -22 while writing VM
This patch calls the function to check if migration is possible with
the given VM and thus improves the error to:
error: Requested operation is not valid: domain has assigned non-USB host devices
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=874418#c19
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
3ecebf07110ca8d3413072557f29137943e848e3 breaks the build as it adds a
way to jump to cleanup before the 'cfg' object is retrieved and 'priv'
is initialized.
Move entering the job into the thread to simplify the program flow. Also,
as the code holds a separate reference to the domain object, some
conditions can be simplified.
After this patch qemuDomainObjTransferJob is no longer needed so this
patch removes it.
If someone removes the blockcopy storage file while still in the mirroring
phase and then requests a blockjob abort using pivot, the virsh command
freezes. This is not an issue with older qemu versions which did not support
asynchronous jobs (which we prefer by default).
As we have reached the mirroring phase successfully, polling the monitor for
blockjob info always returns 1 and the loop never ends.
This fix introduces a check of the qemuDomainBlockPivot return code, skipping
the asynchronous waiting completely if an error occurred and asynchronous
waiting was the preferred method.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1139567
Reconnecting to the VM is a possibly long-running job spawned in a separate
thread. We should reload the snapshot defs and managedsave state prior
to spawning the thread to avoid blocking daemon startup, which would
otherwise serialize on the VM lock.
Also, the reloading code would violate the domain job held while
reconnecting, as the loader functions don't create jobs.
Based on the previous commit, we can now precreate missing volumes. While
digging the functionality out of the storage driver would be nicer, anyone
who has seen the code knows it's nearly impossible. So I'm going from the
other end:
1) For a given disk target, the disk path is looked up.
2) For the disk path, the storage pool is looked up, a volume XML is
constructed and then passed to virStorageVolCreateXML(), which has all
the knowledge of how to create raw images, (encrypted) qcow(2) images,
etc.
One of the advantages of this approach is that we don't have to care about
image conversion; qemu does that for us. So, for instance, users can
transform qcow2 into raw on migration (if the correct XML is passed to
the migration API).
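A hedged sketch of step 2 as it might look through the public storage APIs
(pool name, volume name, size and format are illustrative assumptions; the
actual patch builds the volume XML inside the qemu migration code):

#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virStoragePoolPtr pool = NULL;
    virStorageVolPtr vol = NULL;
    const char *volxml =
        "<volume>"
        "  <name>migrated-disk.qcow2</name>"
        "  <capacity unit='bytes'>10737418240</capacity>"
        "  <target><format type='qcow2'/></target>"
        "</volume>";

    if (!conn)
        return 1;
    if (!(pool = virStoragePoolLookupByName(conn, "default")))
        goto cleanup;

    /* the storage driver knows how to create raw, (encrypted) qcow(2), ... */
    vol = virStorageVolCreateXML(pool, volxml, 0);

 cleanup:
    if (vol)
        virStorageVolFree(vol);
    if (pool)
        virStoragePoolFree(pool);
    virConnectClose(conn);
    return vol ? 0 : 1;
}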
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Up until now, users have needed to precreate non-shared storage for migration
themselves. This is not a very friendly requirement and we should do
something about it. In this patch, the migration cookie is extended
so that the <nbd/> section does not only contain the NBD port, but also info
on the disks being migrated. This patch sends a list of pairs of:
<disk target; disk size>
to the destination. The actual storage allocation is left for next
commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The function queries the block devices visible to qemu
('query-block') and parses qemu's output. The info is
returned in a hash table which is expected to be pre-filled by
qemuMonitorJSONGetAllBlockStatsInfo(). However, in the next patch
we are not going to call the latter function at all, so we should
make the former function add devices into the hash table if not
found there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since virDomainSnapshotFree will call virObjectUnref anyway, let's just use
that directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Since virNetworkFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Since virDomainFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
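A minimal internal-style sketch of the pattern all three commits apply (the
helper is illustrative; virObjectUnref is libvirt's internal reference drop,
which, unlike virDomainFree, neither resets the last error nor objects to
NULL):

#include <libvirt/libvirt.h>
#include "virobject.h"       /* internal header declaring virObjectUnref */

static int
exampleLookupAndWork(virConnectPtr conn, const char *name)
{
    virDomainPtr dom = virDomainLookupByName(conn, name);
    int ret = -1;

    if (!dom)
        goto cleanup;        /* error already queued for the caller */

    /* ... work that may report an error and jump to cleanup ... */
    ret = 0;

 cleanup:
    virObjectUnref(dom);     /* instead of virDomainFree(dom) */
    return ret;
}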
There is a race condition between the fopen and fscanf calls
in qemuGetProcessInfo. If fopen succeeds, there is a small
possibility that the file no longer exists by the time we read from it.
Now, if either the fopen or fscanf call fails, the function behaves
just as if only fopen had failed.
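A hedged sketch of the resulting pattern (the field read from /proc/<pid>/stat
and the fallback value are simplified for illustration):

#include <stdio.h>
#include <sys/types.h>

static int
example_get_process_state(pid_t pid, char *state)
{
    char path[64];
    FILE *fp;
    int scanned;

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    if (!(fp = fopen(path, "r"))) {
        *state = '?';            /* process already gone: report a default */
        return 0;
    }

    /* pid (comm) state ...; assumes no spaces in comm for brevity */
    scanned = fscanf(fp, "%*d %*s %c", state);
    fclose(fp);
    if (scanned != 1)
        *state = '?';            /* treat exactly like the fopen failure */
    return 0;
}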
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1169055
Signed-off-by: Eric Blake <eblake@redhat.com>
Coverity complained that because the cfg->macFilter call checked
net->ifname != NULL before calling ebtablesRemoveForwardAllowIn, then
the virNetDevOpenvswitchRemovePort call should have the same check.
However, if I move the ebtables call prior to the check for TYPE_DIRECT
(where there is a VIR_FREE(net->ifname)), then it seems Coverity is
happy. Since firewall info is tacked on last during setup, removing
it in the opposite order of initialization seems natural anyway.
There are some small issues in qemuProcessAttach:
1. Fix virSecurityManagerGetProcessLabel always getting pid = 0 by
moving 'vm->pid = pid' before the call to virSecurityManagerGetProcessLabel.
2. Use virSecurityManagerGenLabel to get the image label.
3. Fix always setting the selinux label model for other security drivers'
labels.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When a block{commit,copy} job was aborted on a domain, the block job handler
did not process it correctly, leaving a phantom job in the background.
Any further call to any blockjob then failed with a "block <jobtype> still
active" error. This patch fixes the blockjob handler so that it checks not
only for the VIR_DOMAIN_BLOCK_JOB_FAILED status, but for
VIR_DOMAIN_BLOCK_JOB_CANCELED as well, followed by our existing cleanup
routine.
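A hedged sketch of the handler shape after the fix (the enum values are
libvirt's public block job statuses; the handler itself and the cleanup
placeholder are illustrative):

#include <libvirt/libvirt.h>

static void
exampleBlockJobHandler(virDomainPtr dom, const char *disk, int status)
{
    (void)dom;
    (void)disk;

    switch (status) {
    case VIR_DOMAIN_BLOCK_JOB_FAILED:
    case VIR_DOMAIN_BLOCK_JOB_CANCELED:  /* previously left a phantom job */
        /* run the existing cleanup routine here */
        break;
    case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
    case VIR_DOMAIN_BLOCK_JOB_READY:
    default:
        break;
    }
}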
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1135169
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
If the job fails in qemuMigrationRun, we expect the jobinfo type to be
FAILED. But the jobinfo type won't be updated until we enter
qemuMigrationWaitForCompletion. We should make sure it is updated in all
cases. Moreover, we can't use qemuMigrationUpdateJobStatus
here because the job may fail inside libvirt, in which case we can't query
the job status from QEMU.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
The migration job status is traced in qemuMigrationUpdateJobStatus
which is called in qemuMigrationRun. But if the migration is cancelled
before the trace, such as in qemuMigrationDriveMirror, the jobinfo
type won't be updated to CANCELLED. After this patch, we can get
jobinfo type CANCELLED if migration is cancelled during drive
mirror. Moreover, we can't use qemuMigrationUpdateJobStatus
because from qemu's point of view it's just the drive mirror being
cancelled and the migration hasn't even started yet.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160084
As of b6d4dad11b (1.2.5) we have been trying to keep track of the FSFreeze
status in the guest. Even though I've tried to fix a couple of corner cases
(6ea54769ba18), it occurred to me just recently that the approach is
broken by design. Firstly, there are many other ways to talk to
qemu-ga (even through libvirt, e.g. qemu-agent-command) by which
filesystems can be thawed without libvirt noticing. Moreover, there are
plenty of ways to thaw filesystems without even qemu-ga noticing (yes,
qemu-ga keeps its own internal track of the FSFreeze status). So, instead
of keeping track ourselves, or asking qemu-ga for stale state, it's
best to let qemu-ga deal with that (and possibly let the guest kernel
propagate an error).
Moreover, there was one bug with the old approach: if the fsfreeze
command failed, we subsequently executed fsthaw. So issuing
domfsfreeze in virsh gave the following result:
virsh # domfsfreeze gentoo
Froze 1 filesystem(s)
virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
virsh # domfsfreeze gentoo
Froze 1 filesystem(s)
virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
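For comparison, a hedged client-side sketch of issuing the freeze through the
public API; with the state tracking dropped, whatever qemu-ga reports
(including "already frozen") is what reaches the caller (URI and domain name
are illustrative):

#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = conn ? virDomainLookupByName(conn, "gentoo") : NULL;
    int n = -1;

    if (!dom)
        goto cleanup;

    /* NULL/0 means "freeze all mounted filesystems" */
    n = virDomainFSFreeze(dom, NULL, 0, 0);
    if (n < 0)
        fprintf(stderr, "freeze failed: %s\n", virGetLastErrorMessage());
    else
        printf("Froze %d filesystem(s)\n", n);

 cleanup:
    if (dom)
        virDomainFree(dom);
    if (conn)
        virConnectClose(conn);
    return n < 0;
}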
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virReportSystemError is reserved for reporting system errors, calling it
with VIR_ERR_* error codes produces error messages that do not make any
sense, such as
internal error: guest failed to start: Kernel doesn't support user
namespace: Link has been severed
We should prohibit wrong usage with a syntax-check rule.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Commit 6fcddfcd refactored job statistics but missed updating the jobinfo type
in qemuDomainGetJobInfo. After this patch, we can use virDomainGetJobInfo to
get jobinfo type again.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Add an attribute to set the vgamem_mb parameter of the QXL device for QEMU.
This value sets the size of the VGA framebuffer for the QXL device. The
default value in QEMU is 8MB, so reuse it in libvirt as well so as not to
break things.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
So far we didn't have any option to set the video memory size for qemu video
devices. There was only the vram (ram for QXL) attribute, but it was valid
only for the QXL video device.
To provide this feature to users, QEMU has a dedicated device attribute
called 'vgamem_mb' to set the video memory size. We will use the 'vram'
attribute for setting the video memory size of other QEMU video devices.
For the cirrus device we will ignore the vram value because its video
memory size is hardcoded in QEMU.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
QEMU has two different types of QXL display device. The first, "qxl-vga",
is for the primary video device and the second, "qxl", is for secondary
video devices.
There are also two different ways to specify those devices on the qemu
command line: the first, obsolete way is using the "-vga" option and the
current one is using the "-device" option. The "-vga" option can be used only
to set up the primary video device, so "-vga qxl" is equal to
"-device qxl-vga". Unfortunately "-vga qxl" doesn't support setting
additional parameters for the device and the "-global" option must be used
for this purpose. It's mandatory to use "-global qxl-vga...." to set the
parameters of a primary video device previously defined with "-vga qxl".
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The vram attribute was introduced to set the video memory, but it is
usable only for a few hypervisors, excluding QEMU/KVM and the old Xen
driver. Only in the case of QEMU was vram used, and only for QXL.
This patch updates the documentation to reflect the current code in libvirt
and also changes the cases in which we will set the default vram attribute.
It also fixes the existing strange default value for VGA devices, 9MB, by
raising it to 16MB, because the video ram should be rounded to a power of
two.
The change of the default value could affect migrations, but I found out that
QEMU always rounds the video ram to a power of two internally, so it's safe
to change the default value to the next closest power of two and also
silently correct every domain XML definition. And it's also safe because
we don't pass the value to QEMU.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Get the list of mounted filesystems, which contains hardware info about the
disks and their controllers, from QEMU guest agent 2.2+. Then, convert the
hardware info to the corresponding device aliases for the disks.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Improve the monitor function to also retrieve the guest state of
character device (if provided) so that we can refresh the state of
virtio-serial channels and perhaps react to changes in the state in
future patches.
This patch changes the returned data from qemuMonitorGetChardevInfo to
return a structure containing the pty path and the state for all the
character devices.
The change to the testsuite makes sure that the data is parsed
correctly.
To be able to express some use cases of RBD backing with libvirt, we
need to be able to specify a config file for the RBD client to qemu, as
that is one of the commonly used options.
Some storage systems have internal support for snapshots. Libvirt should
be able to select a correct snapshot when starting a VM.
This patch adds an XML element to select a storage source snapshot for
the RBD protocol, which supports this feature.
To allow reuse of this non-trivial parser code in the backing store parser,
this part of the command line parser needs to be split out into a
separate function.
Instead of splitting out various fields, pass the complete structure and
let the function pick the various things it needs out of it.
As one of the callers isn't using virStorageSourcePtr to store the data,
this patch adds glue code that fills the data into a dummy
virStorageSourcePtr before calling the function.
This change will help when adding new fields that need output processing
in the future.
Newer qemu added a new event that is emitted when a virtio serial channel
is opened in the guest OS. This allows us to update the state of the
port in the output-only XML element.
This patch implements the monitor callbacks and necessary handlers to
update the state in the definition.
To unify future additions that require information from "query-chardev"
rename qemuMonitorGetPtyPaths and friends to qemuMonitorGetChardevInfo
and move the allocation of the returned hash into the top level
function.
When creating a disk image snapshot, the libvirt code would blindly copy
the parent's label to the newly created image. This runs into problems
when you start a VM from an image hosted on NFS (or other storage system
that doesn't support selinux labels) and the snapshot destination is on
a storage system that does support selinux labels. Libvirt's code in
that case generates a different security label for the image hosted on
NFS. This label is valid only for NFS images and doesn't allow access in
case of a locally stored image.
To fix this issue libvirt needs to refrain from copying security
information in cases where the default domain seclabel is a better
choice.
This patch repurposes the now unused @force argument of
virStorageSourceInitChainElement to denote whether a copy of the
security labelling stuff should be attempted or not. This allows
fine-grained control of the copy operation for cases where we need to keep the
label of the old disk vs. the cases where we need to keep the label
unset to use the default domain imagelabel.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1151718
Oops, I forgot to squash one more instance of the same check in the
previous commit (v1.2.10-144-g52691f9).
https://bugzilla.redhat.com/show_bug.cgi?id=1147331
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Any attempt to start a tunnelled migration with libvirtd that supports
RDMA migration (specifically commit v1.2.8-226-ged22a47) crashes
libvirtd on the destination host.
The crash is inevitable because qemuMigrationPrepareAny is always called
with a NULL protocol in the case of tunnelled migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1147331
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
As discussed on the upstream list, it's better not to make this
kind of prediction in libvirt. It may happen that qemu learns
how to enable OVMF on other architectures too, and we shouldn't
try to chase that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, we are whitelisting architectures that we know how to run
OVMF on. So far, only x86_64 was enabled. However, looking at the qemu
code, the same command line can be used to enable OVMF for armv7l and
aarch64.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I noticed this while working on qemuDomainGetBlockInfo. Assigning
a bool value to an int variable compiles fine, but raises red flags
on the maintenance front as it becomes too easy to assign -1 or 2
or any other non-bool value to the same variable.
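For illustration, the kind of assignment the new rule flags versus the
preferred form (virFileExists really does return bool in libvirt's util code;
the wrapper function is illustrative):

#include <stdbool.h>

bool virFileExists(const char *path);      /* bool-returning helper */

static void
example(const char *path)
{
    int found_int = virFileExists(path);   /* flagged: bool stored in an int */
    bool found = virFileExists(path);      /* preferred: keeps the type honest */

    (void)found_int;
    (void)found;
}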
* cfg.mk (sc_prohibit_int_assign_bool): New rule.
* src/conf/snapshot_conf.c (virDomainSnapshotRedefinePrep): Fix
offenders.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo)
(qemuDomainSnapshotCreateXML): Likewise.
* src/test/test_driver.c (testDomainSnapshotAlignDisks):
Likewise.
* src/util/vircgroup.c (virCgroupSupportsCpuBW): Likewise.
* src/util/virpci.c (virPCIDeviceBindToStub): Likewise.
* src/util/virutil.c (virIsCapableVport): Likewise.
* tools/virsh-domain-monitor.c (cmdDomMemStat): Likewise.
* tools/virsh-domain.c (cmdBlockResize, cmdScreenshot)
(cmdInjectNMI, cmdSendKey, cmdSendProcessSignal)
(cmdDetachInterface): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Ethernet interfaces in libvirt currently do not support bandwidth settings.
For example, the following XML for an interface will not apply these
settings to the corresponding qdiscs.
<interface type="ethernet">
<mac address="02:36:1d:18:2a:e4"/>
<model type="virtio"/>
<script path=""/>
<target dev="tap361d182a-e4"/>
<bandwidth>
<inbound average="984" peak="1024" burst="64"/>
<outbound average="2000" peak="2048" burst="128"/>
</bandwidth>
</interface>
Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>