When a user would try changing the persistent IO tuning settings for a
disk that was hotplugged to a vm in a transient way, the
qemuDomainSetBlockIoTune API would use the same index for both the
live and config disk array. The disk was missing from the config array
though causing a crash of libvirtd.
To fix the issue, determine the indexes separately.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1131819
https://bugzilla.redhat.com/show_bug.cgi?id=1095636
When starting up the domain the domain's NICs are allocated. As of
1f24f682 (v1.0.6) we are able to use multiqueue feature on virtio
NICs. It breaks network processing into multiple queues which can be
processed in parallel by different host CPUs. The queues are, however,
created by opening /dev/net/tun several times. Unfortunately, only the
first FD in the row is labelled so when turning the multiqueue feature
on in the guest, qemu will get AVC denial. Make sure we label all the
FDs needed.
Moreover, the default label of /dev/net/tun doesn't allow
attaching a queue:
type=AVC msg=audit(1399622478.790:893): avc: denied { attach_queue }
for pid=7585 comm="qemu-kvm"
scontext=system_u:system_r:svirt_t:s0:c638,c877
tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023
tclass=tun_socket
And as suggested by SELinux maintainers, the tun FD should be labeled
as svirt_t. Therefore, we don't need to adjust any range (as done
previously by Guannan in ae368ebf) rather set the seclabel of the
domain directly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Removing a shared device needs special steps for disks and hostdevs.
Instead of having one function dealing this split the code into two
separate functions that can be used with better granularity.
Adding a shared device needs special steps for disks and hostdevs.
Instead of having one function dealing this split the code into two
separate functions that can be used with better granularity.
The qemuCheckSharedDevice function is operating only on disk devices.
Rename it and change the arguments to reflect that and refactor some
logic for more readability.
Split it out into a separate function and simplify the code. There's no
need to copy the entry to update it as the hash returns pointer to the
existing item.
Also remove the now unused qemuSharedDeviceEntryCopy function.
To allow reuse split the code into a separate function and refactor it.
To update an existing entry there's no need to copy it first, just
update it inplace.
Pass the source of the changed media instead of a complete disk
definition.
Note that the @disk argument now contains what @olddisk would contain.
The new source is passed as a virStorageSource struct.
When we are changing media (or doing other hotplug operations) we need
to setup cgroups, locking and seclabels on the new disk. This is a
multi-step process where every piece can fail. To simplify dealing with
this introduce qemuDomainPrepareDisk that similarly to
qemuDomainPrepareDiskChainElement initializes/tears down a whole new
disk to be used with the domain.
Additionally the function supports passing a different source struct for
media changes of cdroms that will be refactored later.
Currently, qemu driver uses qemuTranslateDiskSourcePool()
to translate disk volume information. This function is
general enough and could be used for other drivers as well,
so move it to conf/domain_conf.c along with its helpers.
- qemuTranslateDiskSourcePool: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePool,
- qemuAddISCSIPoolSourceHost: move to storage/storage_driver.c
and rename to virStorageAddISCSIPoolSourceHost,
- qemuTranslateDiskSourcePoolAuth: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePoolAuth,
- Update users of qemuTranslateDiskSourcePool to use a
new name.
In commit 45ad1adb I added a nicer message for tunings that need
cgroups when unavailable (unprivileged), but I added this check for
I/O tuning of block devices, which doesn't need cgroups, because it is
done by QEMU, so let's fix that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Pin existing vcpus rather than existing vcpu pinning infos. This
increases the complexity of the lookup, but avoids pinning cpus that are
not enabled actually.
Remove the pinning info when removing to CPU, otherwise when the VM will
be started our code will try to pin non-existing vcpus as the definition
wasn't updated.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1129372
When editing guest's XML (on QEMU), it was possible to add multiple
listen elements into graphics parent element. However QEMU does not
support listening on multiple addresses. Configuration is tested for
multiple 'listen address' and if positive, an error is raised.
https://bugzilla.redhat.com/show_bug.cgi?id=1119212
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:
"entering monitor without asking for a nested job is dangerous"
The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.
This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.
https://bugzilla.redhat.com/show_bug.cgi?id=1130089
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
At the beginning of the qemu config file parsing function there
are 3 helper macros defined: GET_VALUE_BOOL, GET_VALUE_LONG and
GET_VALUE_STR. Later, when they are no longer needed they are
undefined in order to keep the namespace clean. However, the
GET_VALUE_STRING is undefined instead of GET_VALUE_STR.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Implement ZFS storage backend driver. Currently supported
only on FreeBSD because of ZFS limitations on Linux.
Features supported:
- pool-start, pool-stop
- pool-info
- vol-list
- vol-create / vol-delete
Pool definition looks like that:
<pool type='zfs'>
<name>myzfspool</name>
<source>
<name>actualpoolname</name>
</source>
</pool>
The 'actualpoolname' value is a name of the pool on the system,
such as shown by 'zpool list' command. Target makes no sense
here because volumes path is always /dev/zvol/$poolname/$volname.
User has to create a pool on his own, this driver doesn't
support pool creation currently.
A volume could be used with Qemu by adding an entry like this:
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='myzfspool' volume='vol5'/>
<target dev='hdc' bus='ide'/>
</disk>
In qemuMigrationToFile we enter the monitor multiple times and don't
check if the VM is still alive after returning form the monitor. Add the
checks to skip pieces of code in case the VM crashes while saving it's
state.
Saving a shutoff VM doesn't make sense and libvirtd crashes while
attempting to do that. Check that the domain is alive after entering
the save async job.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1129207
A command to freeze a part of mounted file systems is implemented in
upstream QEMU-guest-agent with a name of 'guest-fsfreeze-freeze-list'.
This fixes the name of the command used to partial fsfreeze in qemu driver
when 'mountpoints' option is specified to virDomainFSFreeze API.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
The virDomainSetInterfaceParameters implementation in qemu over
VIR_DOMAIN_AFFECT_CONFIG doesn't work as expected. When trying to
clear out the bandwidth settings for an interface, it has no
actual effect:
virsh # domiftune --config $domain $interface
inbound.average: 100
inbound.peak : 0
inbound.burst : 0
outbound.average: 10
outbound.peak : 0
outbound.burst : 0
virsh domiftune --config $domain $interface 0 0
virsh # domiftune --config $domain $interface
inbound.average: 100
inbound.peak : 0
inbound.burst : 0
outbound.average: 10
outbound.peak : 0
outbound.burst : 0
But according to virsh man page:
To clear inbound or outbound settings, use --inbound or
--outbound respectfully with average value of zero.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
During review of the iSCSI hostdev series, eblake noted that the
prototypes shouldn't have the extranenous space between the "*" and
the function name:
http://www.redhat.com/archives/libvir-list/2014-July/msg01227.html
Since it was more invasive than 1 or 2 lines - I said I'd send a
patch covering this once committed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit febf84c2 tried to delay in-memory modification of the actual
domain disk structure until after the qemu event was received.
However, I missed that the code for block pivot had been temporarily
setting disk->src = disk->mirror prior to the qemu command, in order
to label the backing chain of a reused external blockcopy disk;
and calls into qemu while still in that state before finally undoing
things at the cleanup label. Since the qemu event handler then does:
virStorageSourceFree(disk->src);
disk->src = disk->mirror;
we have the sad race that a fast enough qemu event can cause a leak of
the original disk->src, as well as a use-after-free of the disk->mirror
contents, bad enough to crash libvirtd in some of my test runs, even
though the common case of the qemu event being much later won't trip
the race.
I'll go wear the brown paper bag of shame, for introducing a crasher
in between rc1 and rc2 of the freeze for 1.2.7 :( My only
consolation is that virDomainBlockJobAbort requires the domain:write
ACL, so it is not a CVE.
The valgrind report when the race occurs looks like:
==25612== Invalid read of size 4
==25612== at 0x50E7C90: virStorageSourceGetActualType (virstoragefile.c:1948)
==25612== by 0x209C0B18: qemuDomainDetermineDiskChain (qemu_domain.c:2473)
==25612== by 0x209D7F6A: qemuProcessHandleBlockJob (qemu_process.c:1087)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
...
==25612== Address 0xe4b5610 is 0 bytes inside a block of size 200 free'd
==25612== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==25612== by 0x50839E9: virFree (viralloc.c:582)
==25612== by 0x50E7E51: virStorageSourceFree (virstoragefile.c:2015)
==25612== by 0x209D7EFF: qemuProcessHandleBlockJob (qemu_process.c:1073)
==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Don't corrupt
disk->src, and only label chain for blockcopy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Fix a comment in virDomainAuditNetDevice.
Fix a typo in comment of qemuPhysIfaceConnect which is
the caller of virDomainAuditNetDevice.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
Commit 232a31b munged job info to report 'active commit' instead of
'commit' when generating events, but forgot to also munge the polling
variant of the command.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Adjust type as
needed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Otherwise this beautiful error would be overwritten when
the function is called with a really high rate number:
2014-07-28 12:51:47.920+0000: 2304: error : virCommandWait:2399 :
internal error: Child process (/sbin/tc class add dev vnet0 parent 1:
classid 1:1 htb rate 4294968kbps) unexpected exit status 1: Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
mpu minimum packet size used in rate computations
overhead per-packet size overhead used in rate computations
linklay adapting to a linklayer e.g. atm
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lowe
https://bugzilla.redhat.com/show_bug.cgi?id=1043735
There are multiple mount points after commit 725a211f, but one comment
wasn't changed to use plurals.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With this in place, I can (finally!) now do:
virsh blockcommit $dom vda --shallow --verbose --pivot
and watch qemu shorten the backing chain by one, followed by
libvirt automatically updating the dumpxml output, effectively
undoing the work of virsh snapshot-commit --no-metadata --disk-only.
Commit is SOOOO much faster than blockpull, when I'm still fairly
close in time to when the temporary qcow2 wrapper file was created
via a snapshot operation!
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live
commit.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch is going to wire up qemu active block commit jobs;
but as they have similar events and are canceled/pivoted in the
same way as block copy jobs, it is easiest to track all bookkeeping
for the commit job by reusing the <mirror> element. This patch
adds domain XML to track which job was responsible for creating a
mirroring situation, and adds a job='copy' attribute to all
existing uses of <mirror>. Along the way, it also massages the
qemu monitor backend to read the new field in order to generate
the correct type of libvirt job (even though it requires a
future patch to actually cause a qemu event that can be reported
as an active commit). It also prepares to update persistent XML
to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema.
* docs/formatdomain.html.in: Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): Add a field.
* src/conf/domain_conf.c (virDomainBlockJobType): String conversion.
(virDomainDiskDefParseXML): Parse job type.
(virDomainDiskDefFormat): Output job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish
active from regular commit.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type.
(qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type
on completion.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New
file.
* tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
We were not directly saving the domain XML to file after starting
or finishing a blockcopy. Without the startup write, a libvirtd
restart in the middle of a copy job would forget that the job was
underway. Then at pivot, we were indirectly writing new XML in
reaction to events that occur as we stop and restart the guest CPUs.
But there was a race: since pivot is an async action, it is possible
that libvirtd is restarted before the pivot completes, so if XML
changes during the event, that change was not written. The original
blockcopy code cleared out the <mirror> element prior to restarting
the CPUs, but this is also a race, observed if a user does an async
pivot and a dumpxml before the event occurs. Furthermore, this race
will interfere with active commit in a future patch, because that
code will rely on the <mirror> element at the time of the qemu event
to determine whether to inform the user of a normal commit or an
active commit.
Fix things by saving state any time we modify live XML, while
delaying XML disk modifications until after the event completes. We
still need a to teach libvirtd restarts to examine all existing
<mirror> elements to see if the job completed in the meantime (that
is, if libvirtd misses the event, the updated state still needs to be
updated in live XML), but that will be a later patch, in part because
we also need to to start taking advantage of newer qemu's ability to
keep the job around after completion rather than the current usage
where the job disappears both on error and on success.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
on disk.
(qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML
rewrites...
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
Signed-off-by: Eric Blake <eblake@redhat.com>
Doing a blockcopy operation across a libvirtd restart is not very
robust at the moment. In particular, we are clearing the <mirror>
element prior to telling qemu to finish the job. Also, thanks to the
ability to request async completion, the user can easily regain
control prior to qemu actually finishing the effort, and they should
be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done
before modifying the XML to reflect the job completion. But since
qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether
the job was cancelled (kept the original disk) or completed (pivoted
to the new disk), we have to track which of the two operations were
used to end the job. Furthermore, we'd like to avoid attempts to
end a job where we are already waiting on an earlier request to qemu
to end the job. Likewise, if we miss the qemu event (perhaps because
it arrived during a libvirtd restart), we still need enough state
recorded to be able to determine how to modify the domain XML once
we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a
preliminary step that makes it possible to track whether a job
has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirrorState): New enum.
(_virDomainDiskDef): Convert bool mirroring to new enum.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Handle new values.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust
client.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
* docs/schemas/domaincommon.rng (diskMirror): Expose new values.
* docs/formatdomain.html.in (elementsDisks): Document it.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
If PCI passthrough type is not supported, we should error out rather than
continue building the command line.
When starting a domain, the type has been already checked by
qemuPrepareHostdevPCICheckSupport() before building qemu command line,
so the problem doesn't emerge.
But when coverting a domain xml without specifying passthrough type explictly
to qemu arg, we will get a malformed command line.
the xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0001' bus='0x03' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
the converted command line:
-device ,host=0001:03:00.0,id=hostdev0,bus=pci.0,addr=0x5
After this patch, virsh gives an error message:
virsh domxml-to-native qemu-argv /tmp/tmp.xml
error: internal error: invalid PCI passthrough type 'default'
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Use better detection of hugetlbfs mount points. Yes, there can be
multiple mount points each serving different huge page size.
Since we already have ability to override the mount point in the
qemu.conf file, this crazy backward compatibility code is brought in.
Now we allow multiple mount points, so the "hugetlbfs_mount" option
must take an list of strings (mount points). But previously, it was
just a string, so we must accept both types now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit e5f36698e3 introduces a
false-positive build failure in the sound card model handling switch.
Initialize the model to NULL although the value should never be used.
Libvirt documents that the default entropy source for the 'random'
backend of a RNG device is /dev/random. Instead of storing and
propagating NULL across our code and checking it in multiple places fill
the default in the post parse callback and use that in the other places.