introduce function
xenParseXMEventActions(virConfPtr conf,........)
which parses events leading to certain actions
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
introduce function
xenParseXMTimeOffset(virConfPtr conf,.......);
which parses time offset config instead
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
During review of the iSCSI hostdev series, eblake noted that the
prototypes shouldn't have the extranenous space between the "*" and
the function name:
http://www.redhat.com/archives/libvir-list/2014-July/msg01227.html
Since it was more invasive than 1 or 2 lines - I said I'd send a
patch covering this once committed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Introduce a new structure to handle an iSCSI host device based on the
existing virDomainHostdevSubsysSCSI by adding a "protocol='iscsi'" to
the <source/> element. The existing scsi_host subsystem RNG was modified
to read an optional "protocol='adapter'", although it won't be written
out nor is it documented as an option (by choice).
The new hostdev structure mimics the existing <disk/> element for an
iSCSI device (network) device. New XML is:
<hostdev mode='subsystem' type='scsi' managed='yes'>
<source protocol='iscsi' name='iqn.1992-01.com.example'>
<host name='example.org' port='3260'/>
<auth username='myname'>
<secret type='iscsi' usage='mycluster_myname'/>
</auth>
</source>
<address type='drive' controller='0' bus='0' target='2' unit='5'/>
</hostdev>
The controller element will mimic the existing scsi_host code insomuch
as when 'lsi' and 'virtio-scsi' are used.
In preparation for hostdev support for iSCSI and a virStorageNetHostDefPtr,
split out the network disk storage parsing of the 'host' element into a
separate routine.
Commit febf84c2 tried to delay in-memory modification of the actual
domain disk structure until after the qemu event was received.
However, I missed that the code for block pivot had been temporarily
setting disk->src = disk->mirror prior to the qemu command, in order
to label the backing chain of a reused external blockcopy disk;
and calls into qemu while still in that state before finally undoing
things at the cleanup label. Since the qemu event handler then does:
virStorageSourceFree(disk->src);
disk->src = disk->mirror;
we have the sad race that a fast enough qemu event can cause a leak of
the original disk->src, as well as a use-after-free of the disk->mirror
contents, bad enough to crash libvirtd in some of my test runs, even
though the common case of the qemu event being much later won't trip
the race.
I'll go wear the brown paper bag of shame, for introducing a crasher
in between rc1 and rc2 of the freeze for 1.2.7 :( My only
consolation is that virDomainBlockJobAbort requires the domain:write
ACL, so it is not a CVE.
The valgrind report when the race occurs looks like:
==25612== Invalid read of size 4
==25612== at 0x50E7C90: virStorageSourceGetActualType (virstoragefile.c:1948)
==25612== by 0x209C0B18: qemuDomainDetermineDiskChain (qemu_domain.c:2473)
==25612== by 0x209D7F6A: qemuProcessHandleBlockJob (qemu_process.c:1087)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
...
==25612== Address 0xe4b5610 is 0 bytes inside a block of size 200 free'd
==25612== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==25612== by 0x50839E9: virFree (viralloc.c:582)
==25612== by 0x50E7E51: virStorageSourceFree (virstoragefile.c:2015)
==25612== by 0x209D7EFF: qemuProcessHandleBlockJob (qemu_process.c:1073)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Don't corrupt
disk->src, and only label chain for blockcopy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Valgrind caught a memory leak:
==2018== 9 bytes in 1 blocks are definitely lost in loss record 143 of 927
==2018== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==2018== by 0x8C42369: strdup (strdup.c:42)
==2018== by 0x50EACC9: virStrdup (virstring.c:676)
==2018== by 0x50E79E5: virStorageSourceCopy (virstoragefile.c:1845)
==2018== by 0x20A3FAA7: qemuDomainBlockCommit (qemu_driver.c:15620)
==2018== by 0x51DC6B2: virDomainBlockCommit (libvirt.c:20092)
I traced it to the fact that blockcopy and blockcommit end up
reparsing a backing chain on pivot, but the chain parsing code
doesn't gracefully handle the case where the backing file is
already known.
I'm not exactly sure when this was introduced, but suspect that the
refactoring in commit 9944b71 and friends that moved towards probing
in-place rather than into a temporary structure are part of the cause.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal):
Don't leak any prior value.
Signed-off-by: Eric Blake <eblake@redhat.com>
Fix a comment in virDomainAuditNetDevice.
Fix a typo in comment of qemuPhysIfaceConnect which is
the caller of virDomainAuditNetDevice.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
RNG schema as well as the qemu driver requires absolute paths for memory
and disk snapshot image files but the XML parser was not enforcing it.
Add checks to avoid problems in qemu where the configuration it creates
is invalid.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1126329
Since commit be0782e1 we are parsing /proc/meminfo to find out the
default huge page size. However, if the host we are running at does
not support any huge pages (e.g. CONFIG_HUGETLB_PAGE is turned off),
we will not successfully parse the meminfo file and hence the whole
qemu driver init process fails. Moreover, the default huge page size
is needed if and only if there's at least one hugetlbfs mount point.
So the fix consists of moving the virFileGetDefaultHugepageSize
function call after the first hugetlbfs mount point is found.
With this fix, we fail to start with one or more hugetlbfs mounts and
malformed meminfo file, but that's expected (how can one mount
hugetlbfs without kernel supporting huge pages?). Workaround in that
case is to umount all the hugetlbfs mounts.
Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In a system with Fiber Channel Host Adapters, a query to list all Fibre Channel
HBAs OR Vports currently returns empty list:
$ virsh nodedev-list --cap fc_host
$
Libvirt correctly discovers properties for all HBAs. However, the reporting
fails because of incorrect flag comparison while filtering these types.
This is fixed by removing references to 'VIR_CONNECT_LIST_NODE_DEVICES_CAP_*'
for comparison and replacing those with 'VIR_NODE_DEV_CAP_*'
Introduced by original commit id '652a2ec6'
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Commit 232a31b munged job info to report 'active commit' instead of
'commit' when generating events, but forgot to also munge the polling
variant of the command.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Adjust type as
needed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Otherwise this beautiful error would be overwritten when
the function is called with a really high rate number:
2014-07-28 12:51:47.920+0000: 2304: error : virCommandWait:2399 :
internal error: Child process (/sbin/tc class add dev vnet0 parent 1:
classid 1:1 htb rate 4294968kbps) unexpected exit status 1: Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
mpu minimum packet size used in rate computations
overhead per-packet size overhead used in rate computations
linklay adapting to a linklayer e.g. atm
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lowe
https://bugzilla.redhat.com/show_bug.cgi?id=1043735
https://bugzilla.redhat.com/show_bug.cgi?id=1072653
Upon successful upload of a volume, the target volume and storage pool
were not updated to reflect any changes as a result of the upload. Make
use of the existing stream close callback mechanism to force a backend
pool refresh to occur in a separate thread once the stream closes. The
separate thread should avoid potential deadlocks if the refresh needed
to wait on some event from the event loop which is used to perform
the stream callback.
There are multiple mount points after commit 725a211f, but one comment
wasn't changed to use plurals.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Cygwin has getifaddrs(), but not AF_LINK, leading to:
util/virstats.c: In function 'virNetInterfaceStats':
util/virstats.c:138:41: error: 'AF_LINK' undeclared (first use in this function)
if (ifa->ifa_addr->sa_family != AF_LINK)
...
* src/util/virstats.c (virNetInterfaceStats): Only use getifaddrs
if AF_LINK is present.
Signed-off-by: Eric Blake <eblake@redhat.com>
libvirt previously only touched an interface's disable_ipv6 setting in
sysfs if it needed to be set to 1, assuming that 0 is the
default. Apparently that isn't always the case though (kernel 3.15.7-1
in Arch Linux reportedly defaults a new interface's disable_ipv6
setting to 1) so this patch explicitly sets it to 0 or 1 as
appropriate.
With this in place, I can (finally!) now do:
virsh blockcommit $dom vda --shallow --verbose --pivot
and watch qemu shorten the backing chain by one, followed by
libvirt automatically updating the dumpxml output, effectively
undoing the work of virsh snapshot-commit --no-metadata --disk-only.
Commit is SOOOO much faster than blockpull, when I'm still fairly
close in time to when the temporary qcow2 wrapper file was created
via a snapshot operation!
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live
commit.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch is going to wire up qemu active block commit jobs;
but as they have similar events and are canceled/pivoted in the
same way as block copy jobs, it is easiest to track all bookkeeping
for the commit job by reusing the <mirror> element. This patch
adds domain XML to track which job was responsible for creating a
mirroring situation, and adds a job='copy' attribute to all
existing uses of <mirror>. Along the way, it also massages the
qemu monitor backend to read the new field in order to generate
the correct type of libvirt job (even though it requires a
future patch to actually cause a qemu event that can be reported
as an active commit). It also prepares to update persistent XML
to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema.
* docs/formatdomain.html.in: Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): Add a field.
* src/conf/domain_conf.c (virDomainBlockJobType): String conversion.
(virDomainDiskDefParseXML): Parse job type.
(virDomainDiskDefFormat): Output job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish
active from regular commit.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type.
(qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type
on completion.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New
file.
* tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
If all features are set to default (including the capabilities policy),
but some capabilities are toggled, we need to output the <features>
element when formatting the config.
We were not directly saving the domain XML to file after starting
or finishing a blockcopy. Without the startup write, a libvirtd
restart in the middle of a copy job would forget that the job was
underway. Then at pivot, we were indirectly writing new XML in
reaction to events that occur as we stop and restart the guest CPUs.
But there was a race: since pivot is an async action, it is possible
that libvirtd is restarted before the pivot completes, so if XML
changes during the event, that change was not written. The original
blockcopy code cleared out the <mirror> element prior to restarting
the CPUs, but this is also a race, observed if a user does an async
pivot and a dumpxml before the event occurs. Furthermore, this race
will interfere with active commit in a future patch, because that
code will rely on the <mirror> element at the time of the qemu event
to determine whether to inform the user of a normal commit or an
active commit.
Fix things by saving state any time we modify live XML, while
delaying XML disk modifications until after the event completes. We
still need a to teach libvirtd restarts to examine all existing
<mirror> elements to see if the job completed in the meantime (that
is, if libvirtd misses the event, the updated state still needs to be
updated in live XML), but that will be a later patch, in part because
we also need to to start taking advantage of newer qemu's ability to
keep the job around after completion rather than the current usage
where the job disappears both on error and on success.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
on disk.
(qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML
rewrites...
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
Signed-off-by: Eric Blake <eblake@redhat.com>
Doing a blockcopy operation across a libvirtd restart is not very
robust at the moment. In particular, we are clearing the <mirror>
element prior to telling qemu to finish the job. Also, thanks to the
ability to request async completion, the user can easily regain
control prior to qemu actually finishing the effort, and they should
be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done
before modifying the XML to reflect the job completion. But since
qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether
the job was cancelled (kept the original disk) or completed (pivoted
to the new disk), we have to track which of the two operations were
used to end the job. Furthermore, we'd like to avoid attempts to
end a job where we are already waiting on an earlier request to qemu
to end the job. Likewise, if we miss the qemu event (perhaps because
it arrived during a libvirtd restart), we still need enough state
recorded to be able to determine how to modify the domain XML once
we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a
preliminary step that makes it possible to track whether a job
has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirrorState): New enum.
(_virDomainDiskDef): Convert bool mirroring to new enum.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Handle new values.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust
client.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
* docs/schemas/domaincommon.rng (diskMirror): Expose new values.
* docs/formatdomain.html.in (elementsDisks): Document it.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
If PCI passthrough type is not supported, we should error out rather than
continue building the command line.
When starting a domain, the type has been already checked by
qemuPrepareHostdevPCICheckSupport() before building qemu command line,
so the problem doesn't emerge.
But when coverting a domain xml without specifying passthrough type explictly
to qemu arg, we will get a malformed command line.
the xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0001' bus='0x03' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
the converted command line:
-device ,host=0001:03:00.0,id=hostdev0,bus=pci.0,addr=0x5
After this patch, virsh gives an error message:
virsh domxml-to-native qemu-argv /tmp/tmp.xml
error: internal error: invalid PCI passthrough type 'default'
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Use better detection of hugetlbfs mount points. Yes, there can be
multiple mount points each serving different huge page size.
Since we already have ability to override the mount point in the
qemu.conf file, this crazy backward compatibility code is brought in.
Now we allow multiple mount points, so the "hugetlbfs_mount" option
must take an list of strings (mount points). But previously, it was
just a string, so we must accept both types now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This should iterate over mount tab and search for hugetlbfs among with
looking for the default value of huge pages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Use correct mode when pre-creating files (for snapshots). The refactor
changing to storage driver usage caused a regression as some systems
created the file with 000 permissions forbidding qemu to write the file.
Pass mode to the creating functions to avoid the problem.
Regression since 185e07a5f8.
Leak introduced in commit 16ebf10f (v1.2.6), detected by valgrind:
==9816== 216 (96 direct, 120 indirect) bytes in 6 blocks are definitely lost in loss record 665 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x1DBDBE27: udevProcessPCI (node_device_udev.c:546)
==9816== by 0x1DBDD79D: udevGetDeviceDetails (node_device_udev.c:1293)
* src/util/virpci.h (virPCIEDeviceInfoFree): New prototype.
* src/util/virpci.c (virPCIEDeviceInfoFree): New function.
* src/conf/node_device_conf.c (virNodeDevCapsDefFree): Clear
pci_express under pci case.
(virNodeDevCapPCIDevParseXML): Avoid leak.
* src/node_device/node_device_udev.c (udevProcessPCI): Likewise.
* src/libvirt_private.syms (virpci.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Finding virPCIE* code is more intuitive if located in virpci.h
instead of node_device_conf.h.
* src/conf/node_device_conf.h (virPCIELinkSpeed, virPCIELink)
(virPCIEDeviceInfo): Move...
* src/util/virpci.h: ...here.
* src/conf/node_device_conf.c (virPCIELinkSpeed): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The compiler can alert us to places where we need to expand switch
statements because we add a new enum value, but only if we don't
have a default case.
* src/conf/node_device_conf.c (virNodeDeviceDefFormat)
(virNodeDevCapsDefParseXML, virNodeDevCapsDefFree): Drop default
case.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit e5f36698e3 introduces a
false-positive build failure in the sound card model handling switch.
Initialize the model to NULL although the value should never be used.
Libvirt documents that the default entropy source for the 'random'
backend of a RNG device is /dev/random. Instead of storing and
propagating NULL across our code and checking it in multiple places fill
the default in the post parse callback and use that in the other places.
Since 24e5cafba6 (thankfully unreleased)
when a VM with an empty disk drive would be started the code would call
stat() on NULL path as a check was missing from the callback rendering
machines unstartable.
Report success when the path is empty (denoting an empty drive).
virTimeFieldsThenRaw will never return negative result, so I clean up
the related meaningless judgements to make it better.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The "random" backend for virtio-rng can be started with no path
specified which equals to /dev/random. The cgroup code didn't consider
this and called few of the functions with NULL resulting into:
$ virsh start rng-vm
error: Failed to start domain rng-vm
error: Path '(null)' is not accessible: Bad address
Problem introduced by commit c6320d3463
If user hasn't provided any @emulatorbin, the qemuCaps are
searched by @arch provided (which in fact can be guessed from the
host). However, there's no guarantee that the qemu binary for
@arch will exist. Therefore qemu capabilities may be nonexistent
too. If that's the case, we should throw an error message prior
jumping onto 'cleanup' label as the helper lookup function
remains silent on no search result.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add support for CDROM devices for bhyve driver using
bhyve(8)'s 'ahci-cd' device type.
As bhyve currently does not support media insertion at runtime,
disallow to start a domain with an empty source path for cdrom
devices.
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
To integrate the security driver with the storage driver we need to
pass a callback for a function that will chown storage volumes.
Introduce and document the callback prototype.
When restoring security labels in the dac driver the code would resolve
the file path and use the resolved one to be chown-ed. The setting code
doesn't do that. Remove the unnecessary code.
With my intended use of storage driver assist to chown files on remote
storage we will need a witness that will tell us whether the given
storage volume supports operations needed by the storage driver.
Gluster storage works on a similar principle to NFS where it takes the
uid and gid of the actual process and uses it to access the storage
volume on the remote server. This introduces a need to chown storage
files on gluster via native API.
Up to now, users have to pass two arguments at least: domain virt type
('qemu' vs 'kvm') and one of emulatorbin or architecture. This is not
much user friendly. Nowadays users mostly use KVM and share the host
architecture with the guest. So now, the API (and subsequently virsh
command) can be called with all NULLs (without any arguments).
Before this patch:
# virsh domcapabilities
error: failed to get emulator capabilities
error: virttype_str in qemuConnectGetDomainCapabilities must not be NULL
# virsh domcapabilities kvm
error: failed to get emulator capabilities
error: invalid argument: at least one of emulatorbin or architecture fields must be present
After:
# virsh domcapabilities
<domainCapabilities>
<path>/usr/bin/qemu-system-x86_64</path>
<domain>kvm</domain>
<machine>pc-i440fx-2.1</machine>
<arch>x86_64</arch>
<vcpu max='255'/>
</domainCapabilities>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds back the virDomainDef typedef into domain_conf and
makes all the numatune_conf functions independent of any virDomainDef
definitions.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Our seclabel parsing was repeatedly assigning malloc'd data into a
temporary variable, without first freeing the previous use. Among
other leaks flagged by valgrind:
==9312== 8 bytes in 1 blocks are definitely lost in loss record 88 of 821
==9312== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9312== by 0x8C40369: strdup (strdup.c:42)
==9312== by 0x50EA799: virStrdup (virstring.c:676)
==9312== by 0x50FAEB9: virXPathString (virxml.c:90)
==9312== by 0x50FAF1E: virXPathStringLimit (virxml.c:112)
==9312== by 0x510F516: virSecurityLabelDefParseXML (domain_conf.c:4571)
==9312== by 0x510FB20: virSecurityLabelDefsParseXML (domain_conf.c:4720)
While it was multiple problems, it looks like commit da78351 (thankfully
unreleased) was to blame for all of them.
* src/conf/domain_conf.c (virSecurityLabelDefParseXML): Plug leaks
detected by valgrind.
Signed-off-by: Eric Blake <eblake@redhat.com>
Introduced in commit 70571ccc (v1.2.4). Caught by valgrind:
==9816== 170 (32 direct, 138 indirect) bytes in 1 blocks are definitely lost in loss record 646 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x50AEC2B: virFirewallNew (virfirewall.c:204)
==9816== by 0x1E2308ED: ebiptablesDriverProbeStateMatch (nwfilter_ebiptables_driver.c:3715)
==9816== by 0x1E2309AD: ebiptablesDriverInit (nwfilter_ebiptables_driver.c:3742)
* src/nwfilter/nwfilter_ebiptables_driver.c
(ebiptablesDriverProbeStateMatch): Properly clean up.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1122205
Although the edits were changing in-memory XML, it was not flushed
to disk; so unless some other action changes XML, a libvirtd restart
would lose the changed information.
* src/conf/domain_conf.c (virDomainObjSetMetadata): Add parameter,
to save live status across restarts.
(virDomainSaveXML): Allow for test driver.
* src/conf/domain_conf.h (virDomainObjSetMetadata): Adjust
signature.
* src/bhyve/bhyve_driver.c (bhyveDomainSetMetadata): Adjust caller.
* src/lxc/lxc_driver.c (lxcDomainSetMetadata): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetMetadata): Likewise.
* src/test/test_driver.c (testDomainSetMetadata): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The patch described above introduced two problems caught by the compiler
and thus breaking the build.
One of the problems was comparison of unsigned with < 0 and the second
one jumped a variable init.
Before:
virsh # dominfo chx3
State: shut off
Max memory: 92160 KiB
Used memory: 92160 KiB
After:
virsh # dominfo container1
State: shut off
Max memory: 92160 KiB
Used memory: 0 KiB
Similar to qemu cases.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Added <capabilities> in the <features> section of LXC domains
configuration. This section can contain elements named after the
capabilities like:
<mknod state="on"/>, keep CAP_MKNOD capability
<sys_chroot state="off"/> drop CAP_SYS_CHROOT capability
Users can restrict or give more capabilities than the default using
this mechanism.
kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e
forbid us doing a fresh mount for sysfs
when enable userns but disable netns.
This patch will create a bind mount in this senario.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Under ./configure --without-numactl but with numactl-devel installed,
the build fails with:
../../src/util/virnuma.c: In function 'virNumaNodeIsAvailable':
../../src/util/virnuma.c:407:5: error: implicit declaration of function 'numa_bitmask_isbitset' [-Werror=implicit-function-declaration]
return numa_bitmask_isbitset(numa_nodes_ptr, node);
^
and other failures, all because the configure results for particular
functions were used without regard to whether libnuma was even being
linked in.
* src/util/virnuma.c (virNumaGetPages): Fix message typo.
(virNumaNodeIsAvailable): Correct build when not using numactl.
Signed-off-by: Eric Blake <eblake@redhat.com>
virStorageBackendLogicalCreateVol contains a piece like:
if (vol->target.path != NULL) {
/* A target path passed to CreateVol has no meaning */
VIR_FREE(vol->target.path);
}
The 'if' is useless here, but 'syntax-check' doesn't catch that
because of the comment, so drop the 'if'.
Commit ef48a1b introduced virFindSCSIHostByPCI for Linux and
a stub for other platforms that returns -1 while the function
should return 'char *', so use 'return NULL' instead.
Commit fbd91d4 introduced virReadSCSIUniqueId with the third
argument 'int *result', however the stub for non-Linux patform
uses 'unsigned int *result', so change it to 'int *result'.
Pushed under the build breaker rule.
If a parentaddr was provided in the XML, have getAdapterName lookup
the stable address. This allows virStorageBackendSCSICheckPool() and
virStorageBackendSCSIRefreshPool() to automagically find the scsi_host
by its PCI address and unique_id
Introduce a new function to parse the provided scsi_host parent address
and unique_id value in order to find the /sys/class/scsi_host directory
which will allow a stable SCSI host address
Add a test to scsihosttest to lookup the host# name by using the PCI address
and unique_id value
Add an optional unique_id parameter to nodedev. Allows for easier lookup
and display of the unique_id value in order to document for use with
scsi_host code.
Introduce a new function to read the current scsi_host entry and return
the value found in the 'unique_id' file.
Add a 'scsihosttest' test (similar to the fchosttest, but incorporating some
of the concepts of the mocked pci test library) in order to read the
unique_id file like would be found in the /sys/class/scsi_host tree.
Between reboots and kernel reloads, the SCSI host number used for SCSI
storage pools may change requiring modification to the storage pool XML
in order to use a specific SCSI host adapter.
This patch introduces the "parentaddr" element and "unique_id" attribute
for the SCSI host adapter in order to uniquely identify the adapter
between reboots and kernel reloads. For now the goal is to only parse
and format the XML. Both will be required to be provided in order to
uniquely identify the desired SCSI host.
The new XML is expected to be as follows:
<adapter type='scsi_host'>
<parentaddr unique_id='3'>
<address domain='0x0000' bus='0x00' slot='0x1f' func='0x2'/>
</parentaddr>
</adapter>
where "parentaddr" is the parent device of the SCSI host using the PCI
address on which the device resides and the value from the unique_id file
for the device. Both the PCI address and unique_id values will be used
to traverse the /sys/class/scsi_host/ directories looking at each link
to match the PCI address reformatted to the directory link format where
"domain🚌slot:function" is found. Then for each matching directory
the unique_id file for the scsi_host will be used to match the unique_id
value in the xml.
For a PCI address listed above, this will be formatted to "0000:00:1f.2"
and the links in /sys/class/scsi_host will be used to find the host#
to be used for the 'scsi_host' device. Each entry is a link to the
/sys/bus/pci/devices directories, e.g.:
% ls -al /sys/class/scsi_host/host2
lrwxrwxrwx. 1 root root 0 Jun 1 00:22 /sys/class/scsi_host/host2 -> ../../devices/pci0000:00/0000:00:1f.2/ata3/host2/scsi_host/host2
% cat /sys/class/scsi_host/host2/unique_id
3
The "parentaddr" and "name" attributes are mutually exclusive to identify
the SCSI host number. Use of the "parentaddr" element will be the preferred
mechanism.
This patch only supports to parse and format the XMLs. Later patches will
add code to find out the scsi host number.
Rather than assume that NOT FC_HOST is SCSI_HOST, let's call them out
specifically. Makes it easier to find SCSI_HOST code/structs and ensures
something isn't missed in the future
We do so in the vast majority of places, so there's no problem of adding
the attribute to enforce it by the complier and fix a few leftover
places.
This was originally pointed out by Coverity as a recent change triggered
it's warning that our code checked the vast majority of returns from
virStrToLong_ui.
Convert the target snapshot state selector to a switch statement
enumerating all possible values. This points out a few mistakes in the
original selector.
The logic of the code is preserved until later patches.
Try to reconnect to the running domains after libvirtd restart. To
achieve that, do:
* Save domain state
- Modify virBhyveProcessStart() to save domain state to the state
dir
- Modify virBhyveProcessStop() to cleanup the pidfile and the state
* Detect if the state information loaded from the driver's state
dir matches the actual state. Consider domain active if:
- PID it points to exist
- Process title of this PID matches the expected one with the
domain name
Otherwise, mark the domain as shut off.
Note: earlier development bhyve versions before FreeBSD 10.0-RELEASE
didn't set proctitle we expect, so the current code will not detect
it. I don't plan adding support for this unless somebody requests
this.
As with the local SCSI passthrough devicesm qemu can't support snapshots
on those as the block ops are handled by the device. This is also true
for iSCSI backing of the disk. Remove the check for the local block
device and just forbid snapshot when the disk is of type 'lun'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1073368
There's no need to use it since we have this shiny functions
that even checks for conversion and overflow errors.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
LXC network devices can now be assigned a custom NIC device name on the
container side. For example, this is configured with:
<interface type='network'>
<source network='default'/>
<guest dev="eth1"/>
</interface>
In this example the network card will appear as eth1 in the guest.
https://bugzilla.redhat.com/show_bug.cgi?id=1091866
Add a new boolean 'sparse'. This will be used by the logical backend
storage driver to determine whether the target volume is sparse or not
(also known by a snapshot or thin logical volume). Although setting sparse
to true at creation could be seen as duplicitous to setting during
virStorageBackendLogicalMakeVol() in case there are ever other code paths
between Create and FindLVs that need to know about the volume be sparse.
Use the 'sparse' in a new virStorageBackendLogicalVolWipe() to decide whether
to attempt to wipe the logical volume or not. For now, I have found no
means to wipe the volume without writing to it. Writing to the sparse
volume causes it to be filled. A sparse logical volume is not completely
writeable as there exists metadata which if overwritten will cause the
sparse lv to go INACTIVE which means pool-refresh will not find it.
Access to whatever lvm uses to manage data blocks is not provided by
any API I could find.
Commit 93e82727 introduced numatune_conf.h file that contains
typedefs already defined in domain_conf.h, such as:
- virDomainNumatune
- virDomainNumatunePtr
- virDomainDef
- virDomainDefPtr
As numatune_conf.h is included by domain_conf.h, clang
complains about redefinition of typedef and the build fails.
In order to fix it, drop typedefs already defined by numatume_conf.h
from domain_conf.h.
Coverity complains about the return value of ioctl not being checked.
Even though we carry on when this fails (just like qemu-img does),
we can log an error.
For non-local storage drivers we can't expect to use the "scrub" tool to
wipe the volume. Split the code into a separate backend function so that
we can add protocol specific code later.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1118710
The next patch will move the storage volume wiping code into the
individual backends. This patch splits out the common code to wipe a
local volume into a separate backend helper so that the next patch is
simpler.
When domain is started with numatune memory mode strict and the
nodeset does not include host NUMA node with DMA and DMA32 zones, KVM
initialization fails. This is because cgroup restrict even kernel
allocations. We are already doing numa_set_membind() which does the
same thing, only it does not restrict kernel allocations.
This patch leaves the userspace numa_set_membind() in place and moves
the cpuset.mems setting after the point where monitor comes up, but
before vcpu and emulator sub-groups are created.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Currently, we only bind the whole QEMU domain to memory nodes
specified in nodemask altogether. That, however, doesn't make much
sense when one wants to control from where the memory for particular
guest nodes should be allocated. QEMU allows us to do that by
specifying 'host-nodes' parameter for the 'memory-backend-ram' object,
so let's use that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When qemu switched to using OptsVisitor for -numa parameter, it did
two things in the same patch. One of them is that the numa parameter
is now visible in "query-command-line-options", the second one is that
it enabled using disjoint cpu ranges for -numa specification. This
will be used in later patch.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The numa patch series in qemu adds "memory-backend-ram" object type by
which we can tell whether we can use such objects.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
That can be lately achieved with by having .param == NULL in the
virQEMUCapsCommandLineProps struct.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There were numerous places where numatune configuration (and thus
domain config as well) was changed in different ways. On some
places this even resulted in persistent domain definition not to be
stable (it would change with daemon's restart).
In order to uniformly change how numatune config is dealt with, all
the internals are now accessible directly only in numatune_conf.c and
outside this file accessors must be used.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since there was already public virDomainNumatune*, I changed the
private virNumaTune to match the same, so all the uses are unified and
public API is kept:
s/vir\(Domain\)\?Numa[tT]une/virDomainNumatune/g
then shrunk long lines, and mainly functions, that were created after
that:
sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g'
And to cope with the enum name, I haad to change the constants as
well:
s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g
Last thing I did was at least a little shortening of already long
name:
s/virDomainNumatuneDef/virDomainNumatune/g
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There are many places with numatune-related code that should be put
into special numatune_conf and this patch creates a basis for that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In XML format, by definition, order of fields should not matter, so
order of parsing the elements doesn't affect the end result. When
specifying guest NUMA cells, we depend only on the order of the 'cell'
elements. With this patch all older domain XMLs are parsed as before,
but with the 'id' attribute they are parsed and formatted according to
that field. This will be useful when we have tuning settings for
particular guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Excerpt from the virCommandAddArgBuffer() description: "Correctly
transfers memory errors or contents from buf to cmd."
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch adds support for the QEMU vhost-user feature to libvirt.
vhost-user enables the communication between a QEMU virtual machine
and other userspace process using the Virtio transport protocol.
It uses a char dev (e.g. Unix socket) for the control plane,
while the data plane based on shared memory.
The XML looks like:
<interface type='vhostuser'>
<mac address='52:54:00:3b:83:1a'/>
<source type='unix' path='/tmp/vhost.sock' mode='server'/>
<model type='virtio'/>
</interface>
Signed-off-by: Michele Paolino <m.paolino@virtualopensystems.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1119173 documents that
commit eaba79d was flawed in the implementation of the
VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag when it comes to completing
a blockcopy. Basically, the qemu pivot action is async (the QMP
command returns immediately, but the user must wait for the
BLOCK_JOB_COMPLETE event to know that all I/O related to the job
has finally been flushed), but the libvirt command was documented
as synchronous by default. As active block commit will also be
using this code, it is worth fixing now.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Don't skip wait
loop after pivot.
Signed-off-by: Eric Blake <eblake@redhat.com>
Now that we've finally fixed all the violators, it's time to
enforce that any pointer to a const object is never freed (it
is aliasing some other memory, where the non-const original
should be freed instead). Alas, the code still needs a normal
vs. Coverity version, but at least we are still guaranteeing
that the macro call evaluates its argument exactly once.
I verified that we still get the following compiler warnings,
which in turn halts the build thanks to -Werror on gcc (hmm,
gcc 4.8.3's placement of the ^ for ?: type mismatch is a bit
off, but that's not our problem):
int oops1 = 0;
VIR_FREE(oops1);
const char *oops2 = NULL;
VIR_FREE(oops2);
struct blah { int dummy; } oops3;
VIR_FREE(oops3);
util/virauthconfig.c:159:35: error: pointer/integer type mismatch in conditional expression [-Werror]
VIR_FREE(oops1);
^
util/virauthconfig.c:161:5: error: passing argument 1 of 'virFree' discards 'const' qualifier from pointer target type [-Werror]
VIR_FREE(oops2);
^
In file included from util/virauthconfig.c:28:0:
util/viralloc.h:79:6: note: expected 'void *' but argument is of type 'const void *'
void virFree(void *ptrptr) ATTRIBUTE_NONNULL(1);
^
util/virauthconfig.c:163:35: error: type mismatch in conditional expression
VIR_FREE(oops3);
^
* src/util/viralloc.h (VIR_FREE): No longer cast away const.
* src/xenapi/xenapi_utils.c (xenSessionFree): Work around bogus
header.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add 'nocow' to storage volume xml so that user can have an option
to set NOCOW flag to the newly created volume. It's useful on btrfs
file system to enhance performance.
Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this
bad performance is to turn off COW attributes on VM files. Generally, there
are two ways to turn off COW on btrfs: a) by mounting fs with nodatacow,
then all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.
This patch tries the second way, according to 'nocow' option, it could set
NOCOW flag per file:
for raw file images, handle 'nocow' in libvirt code; for non-raw file images,
pass 'nocow=on' option to qemu-img, and let qemu-img to handle that (requires
qemu-img version >= 2.1).
Signed-off-by: Chunyan Liu <cyliu@suse.com>
In many places we define a variable as a 'const char *' when in fact
we modify it just a few lines below. Or even free it. We should not do
that.
There's one exception though, in xenSessionFree() xenapi_utils.c. We
are freeing the xen_session structure which is defined in
xen/api/xen_common.h public header. The structure contains session_id
which is type of 'const char *' when in fact it should have been just
'char *'. So I'm leaving this unmodified, just noticing the fact in
comment.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When the backing store of a volume wasn't accessible while updating the
volume definition the call would fail altogether. In cases where we
currently (incorrectly) treat remote backing stores as local one this
might lead to strange errors.
Ignore the opening errors until we figure out how to track proper volume
metadata.
Use the backing store parser to properly create the information about a
volume's backing store. Unfortunately as the storage driver isn't
prepared to allow volumes backed by networked filesystems add a
workaround that will avoid changing the XML output.
Rework the apparmor lxc profile abstraction to mimic ubuntu's container-default.
This profile allows quite a lot, but strives to restrict access to
dangerous resources.
Removing the explicit authorizations to bash, systemd and cron files,
forces them to keep the lxc profile for all applications inside the
container. PUx permissions where leading to running systemd (and others
tasks) unconfined.
Put the generic files, network and capabilities restrictions directly
in the TEMPLATE.lxc: this way, users can restrict them on a per
container basis.
Rename linuxDomainInterfaceStats to virNetInterfaceStats in order
to allow adding platform specific implementations without
making consumer worrying about specific implementation to be used.
Also, rename util/virstatslinux.c to util/virstats.c so placing
other platform specific implementations into this file don't
look unexpected from the file name.
Code logic in libxlDomainAttachDeviceFlags and libxlDomainDetachDeviceFlags
is wrong with return value in error cases.
'ret' was being set to 0 if 'flags & VIR_DOMAIN_DEVICE_MODIFY_CONFIG' was
false. Then if something like virDomainDeviceDefParse() failed in the
VIR_DOMAIN_DEVICE_MODIFY_LIVE logic, the error would be reported but the
function would return success.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
4cc1f1a01f introduced a crash when doing a
block copy as virStorageSourceInitChainElement was called on
"disk->mirror" that is still NULL at that point instead of "mirror"
which temporarily holds the mirror source struct until it's fully
initialized. This resulted into a crash as a NULL was dereferenced.
Reported by: Shanzi Yu <shyu@redhat.com>
Commit id '3ea661de' refactored the code to use the 'disk->src->path'
instead of getting the path from virDomainDiskGetSource(). The one
call to qemuOpenFile() didn't use the disk source path, rather it used
the path as passed from the caller (in this case 'vda') - this caused
a failure with the virt-test/tp-libvirt as follows:
$ virsh domblkinfo virt-tests-vm1 vda
error: cannot stat file '/home/virt-test/shared/data/images/jeos-20-64.qcow2': Bad file descriptor
$
If the openvswitch service is stopped, and is followed by destroying a
VM, the openvswitch bridge translates into a state where it doesn't
recover the port configuration. While it successfully fetches data
from the internal DB, since the corresponding virtual interface does
not exists anymore the whole recovery process fails leaving restarted
VM with inability to connect to the bridge. The following set of
commands will trigger the problem:
virsh start vm
service openvswitch-switch stop
virsh destroy vm
service openvswitch-switch start
virsh start vm
Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of allocating the virSecurityLabelDef structure ourselves, we
can utilize virSecurityLabelDefNew which even sets the default values
for us.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1113860
We've always done that. Well, until 990e46c45. Point is, if we don't
format model, we may lose a domain on libvirtd restart. If the
seclabel is implicit however, we should skip it's formatting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Snapshots and block-copy have a flag that forces qemu to re-use existing
file. Our docs weren't exactly clear on what the existing file should
contain for this to actually work.
Re-word the docs a bit to state that the file needs to be pre-created in
the desired format and the backing chain metadata needs to be set prior
to handing it over to qemu.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1084360
Commit dae1568c6c converted the perms
member of the virStorageVolTarget struct into a pointer to make it
optional. But virStorageVolTargetDefFormat did not check perms for
NULL before dereferencing it.
Don't fail when there is nothing to do, as a tweak to the previous
patch regarding output of libvirt-UUID.files for LXC apparmor profiles
Signed-off-by: Eric Blake <eblake@redhat.com>