We were not directly saving the domain XML to file after starting
or finishing a blockcopy. Without the startup write, a libvirtd
restart in the middle of a copy job would forget that the job was
underway. Then at pivot, we were indirectly writing new XML in
reaction to events that occur as we stop and restart the guest CPUs.
But there was a race: since pivot is an async action, it is possible
that libvirtd is restarted before the pivot completes, so if XML
changes during the event, that change was not written. The original
blockcopy code cleared out the <mirror> element prior to restarting
the CPUs, but this is also a race, observed if a user does an async
pivot and a dumpxml before the event occurs. Furthermore, this race
will interfere with active commit in a future patch, because that
code will rely on the <mirror> element at the time of the qemu event
to determine whether to inform the user of a normal commit or an
active commit.
Fix things by saving state any time we modify live XML, while
delaying XML disk modifications until after the event completes. We
still need a to teach libvirtd restarts to examine all existing
<mirror> elements to see if the job completed in the meantime (that
is, if libvirtd misses the event, the updated state still needs to be
updated in live XML), but that will be a later patch, in part because
we also need to to start taking advantage of newer qemu's ability to
keep the job around after completion rather than the current usage
where the job disappears both on error and on success.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
on disk.
(qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML
rewrites...
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
Signed-off-by: Eric Blake <eblake@redhat.com>
Doing a blockcopy operation across a libvirtd restart is not very
robust at the moment. In particular, we are clearing the <mirror>
element prior to telling qemu to finish the job. Also, thanks to the
ability to request async completion, the user can easily regain
control prior to qemu actually finishing the effort, and they should
be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done
before modifying the XML to reflect the job completion. But since
qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether
the job was cancelled (kept the original disk) or completed (pivoted
to the new disk), we have to track which of the two operations were
used to end the job. Furthermore, we'd like to avoid attempts to
end a job where we are already waiting on an earlier request to qemu
to end the job. Likewise, if we miss the qemu event (perhaps because
it arrived during a libvirtd restart), we still need enough state
recorded to be able to determine how to modify the domain XML once
we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a
preliminary step that makes it possible to track whether a job
has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirrorState): New enum.
(_virDomainDiskDef): Convert bool mirroring to new enum.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Handle new values.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust
client.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
* docs/schemas/domaincommon.rng (diskMirror): Expose new values.
* docs/formatdomain.html.in (elementsDisks): Document it.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
If PCI passthrough type is not supported, we should error out rather than
continue building the command line.
When starting a domain, the type has been already checked by
qemuPrepareHostdevPCICheckSupport() before building qemu command line,
so the problem doesn't emerge.
But when coverting a domain xml without specifying passthrough type explictly
to qemu arg, we will get a malformed command line.
the xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0001' bus='0x03' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
the converted command line:
-device ,host=0001:03:00.0,id=hostdev0,bus=pci.0,addr=0x5
After this patch, virsh gives an error message:
virsh domxml-to-native qemu-argv /tmp/tmp.xml
error: internal error: invalid PCI passthrough type 'default'
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Use better detection of hugetlbfs mount points. Yes, there can be
multiple mount points each serving different huge page size.
Since we already have ability to override the mount point in the
qemu.conf file, this crazy backward compatibility code is brought in.
Now we allow multiple mount points, so the "hugetlbfs_mount" option
must take an list of strings (mount points). But previously, it was
just a string, so we must accept both types now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This should iterate over mount tab and search for hugetlbfs among with
looking for the default value of huge pages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Use correct mode when pre-creating files (for snapshots). The refactor
changing to storage driver usage caused a regression as some systems
created the file with 000 permissions forbidding qemu to write the file.
Pass mode to the creating functions to avoid the problem.
Regression since 185e07a5f8.
* docs/schemas/domaincommon.rng: Add bhyve domain type, nmdm
serial type and master and slave optional attributes for
serial that are used by nmdm
* tests/domainschematest: Add bhyvexml2argvdata directory
to validate bhyve XMLs
Leak introduced in commit 16ebf10f (v1.2.6), detected by valgrind:
==9816== 216 (96 direct, 120 indirect) bytes in 6 blocks are definitely lost in loss record 665 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x1DBDBE27: udevProcessPCI (node_device_udev.c:546)
==9816== by 0x1DBDD79D: udevGetDeviceDetails (node_device_udev.c:1293)
* src/util/virpci.h (virPCIEDeviceInfoFree): New prototype.
* src/util/virpci.c (virPCIEDeviceInfoFree): New function.
* src/conf/node_device_conf.c (virNodeDevCapsDefFree): Clear
pci_express under pci case.
(virNodeDevCapPCIDevParseXML): Avoid leak.
* src/node_device/node_device_udev.c (udevProcessPCI): Likewise.
* src/libvirt_private.syms (virpci.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Finding virPCIE* code is more intuitive if located in virpci.h
instead of node_device_conf.h.
* src/conf/node_device_conf.h (virPCIELinkSpeed, virPCIELink)
(virPCIEDeviceInfo): Move...
* src/util/virpci.h: ...here.
* src/conf/node_device_conf.c (virPCIELinkSpeed): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The compiler can alert us to places where we need to expand switch
statements because we add a new enum value, but only if we don't
have a default case.
* src/conf/node_device_conf.c (virNodeDeviceDefFormat)
(virNodeDevCapsDefParseXML, virNodeDevCapsDefFree): Drop default
case.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit e5f36698e3 introduces a
false-positive build failure in the sound card model handling switch.
Initialize the model to NULL although the value should never be used.
Libvirt documents that the default entropy source for the 'random'
backend of a RNG device is /dev/random. Instead of storing and
propagating NULL across our code and checking it in multiple places fill
the default in the post parse callback and use that in the other places.
Since 24e5cafba6 (thankfully unreleased)
when a VM with an empty disk drive would be started the code would call
stat() on NULL path as a check was missing from the callback rendering
machines unstartable.
Report success when the path is empty (denoting an empty drive).
virTimeFieldsThenRaw will never return negative result, so I clean up
the related meaningless judgements to make it better.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The "random" backend for virtio-rng can be started with no path
specified which equals to /dev/random. The cgroup code didn't consider
this and called few of the functions with NULL resulting into:
$ virsh start rng-vm
error: Failed to start domain rng-vm
error: Path '(null)' is not accessible: Bad address
Problem introduced by commit c6320d3463
If user hasn't provided any @emulatorbin, the qemuCaps are
searched by @arch provided (which in fact can be guessed from the
host). However, there's no guarantee that the qemu binary for
@arch will exist. Therefore qemu capabilities may be nonexistent
too. If that's the case, we should throw an error message prior
jumping onto 'cleanup' label as the helper lookup function
remains silent on no search result.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add support for CDROM devices for bhyve driver using
bhyve(8)'s 'ahci-cd' device type.
As bhyve currently does not support media insertion at runtime,
disallow to start a domain with an empty source path for cdrom
devices.
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
To integrate the security driver with the storage driver we need to
pass a callback for a function that will chown storage volumes.
Introduce and document the callback prototype.
When restoring security labels in the dac driver the code would resolve
the file path and use the resolved one to be chown-ed. The setting code
doesn't do that. Remove the unnecessary code.
With my intended use of storage driver assist to chown files on remote
storage we will need a witness that will tell us whether the given
storage volume supports operations needed by the storage driver.
Gluster storage works on a similar principle to NFS where it takes the
uid and gid of the actual process and uses it to access the storage
volume on the remote server. This introduces a need to chown storage
files on gluster via native API.
Up to now, users have to pass two arguments at least: domain virt type
('qemu' vs 'kvm') and one of emulatorbin or architecture. This is not
much user friendly. Nowadays users mostly use KVM and share the host
architecture with the guest. So now, the API (and subsequently virsh
command) can be called with all NULLs (without any arguments).
Before this patch:
# virsh domcapabilities
error: failed to get emulator capabilities
error: virttype_str in qemuConnectGetDomainCapabilities must not be NULL
# virsh domcapabilities kvm
error: failed to get emulator capabilities
error: invalid argument: at least one of emulatorbin or architecture fields must be present
After:
# virsh domcapabilities
<domainCapabilities>
<path>/usr/bin/qemu-system-x86_64</path>
<domain>kvm</domain>
<machine>pc-i440fx-2.1</machine>
<arch>x86_64</arch>
<vcpu max='255'/>
</domainCapabilities>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds back the virDomainDef typedef into domain_conf and
makes all the numatune_conf functions independent of any virDomainDef
definitions.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 5028160 accidentally weakened the strtol prohibitions to
skip ALL files under src/util instead of the former situation of
just protecting util/virsexpr.c; even though NONE of the files
in that directory need any protection.
Shorten some long lines while at it.
* cfg.mk (exclude_file_name_regexp--sc_prohibit_strtol): No need
to exclude all of util.
(exclude_file_name_regexp--sc_prohibit_sprintf): Reduce long line.
(exclude_file_name_regexp--sc_prohibit_raw_allocation): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Our seclabel parsing was repeatedly assigning malloc'd data into a
temporary variable, without first freeing the previous use. Among
other leaks flagged by valgrind:
==9312== 8 bytes in 1 blocks are definitely lost in loss record 88 of 821
==9312== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9312== by 0x8C40369: strdup (strdup.c:42)
==9312== by 0x50EA799: virStrdup (virstring.c:676)
==9312== by 0x50FAEB9: virXPathString (virxml.c:90)
==9312== by 0x50FAF1E: virXPathStringLimit (virxml.c:112)
==9312== by 0x510F516: virSecurityLabelDefParseXML (domain_conf.c:4571)
==9312== by 0x510FB20: virSecurityLabelDefsParseXML (domain_conf.c:4720)
While it was multiple problems, it looks like commit da78351 (thankfully
unreleased) was to blame for all of them.
* src/conf/domain_conf.c (virSecurityLabelDefParseXML): Plug leaks
detected by valgrind.
Signed-off-by: Eric Blake <eblake@redhat.com>