This patch fixes console corruption, that happens if two concurrent
sessions are opened for a single console on a domain. Result of this
corruption was that each of the console streams recieved just a part
of the data written to the pipe so every console rendered unusable.
New helper function for safe console handling is used to establish the
console stream connection. This function ensures that no other libvirt
client is using the console (with the ability to disconnect consoles of
libvirt clients) and that no UUCP style lockfile is placed on the PTY
device.
* src/qemu/qemu_domain.h
- add data structure to domain's private data dealing with
console connections
* src/qemu/qemu_domain.c:
- allocate/free domain's console data structure
* src/qemu/qemu_driver.c
- use the new helper function for console handling
using 'system-wakeup' monitor command. It is supported only in JSON,
as we are enabling it if possible. Moreover, this command is available
in qemu-1.1+ which definitely has JSON.
Function xmlParseURI does not remove square brackets around IPv6
address when parsing. One of the solutions is making wrappers around
functions working with xmlURI*. This assures that uri->server will be
always properly assigned and it doesn't have to be changed when used
on some new place in the code.
For this purpose, functions virParseURI and virSaveURI were
added. These function are wrappers around xmlParseURI and xmlSaveUri
respectively.
Also there is one new syntax check function to prohibit these functions
anywhere else.
File changes:
- src/util/viruri.h -- declaration
- src/util/viruri.c -- definition
- src/libvirt_private.syms -- symbol export
- src/Makefile.am -- added source and header files
- cfg.mk -- added sc_prohibit_xmlURI
- all others -- ID name and include fixes
https://bugzilla.redhat.com/show_bug.cgi?id=795656 mentions
that a graceful destroy request can time out, meaning that the
error message is user-visible and should be more appropriate
than just internal error.
* src/qemu/qemu_driver.c (qemuDomainDestroyFlags): Swap error type.
Migrating domains with disks using cache != none is unsafe unless the
disk images are stored on coherent clustered filesystem. Thus we forbid
migrating such domains unless VIR_MIGRATE_UNSAFE flags is used.
This patch is based on an earlier patch by Eric Blake which was never
committed:
https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html
Aside from rebasing, this patch only drops the driver lock once (prior
to the first time the function sleeps), then leaves it dropped until
it returns (Eric's patch would drop and re-acquire the lock around
each call to sleep).
At the time Eric sent his patch, the response (from Dan Berrange) was
that, while it wasn't a good thing to be holding the driver lock while
sleeping, we really need to rethink locking wrt the driver object,
switching to a finer-grained approach that locks individual items
within the driver object separately to allow for greater concurrency.
This is a good plan, and at the time it made sense to not apply the
patch because there was no known bug related to the driver lock being
held in this function.
However, we now know that the length of the wait in qemuProcessKill is
sometimes too short to allow the qemu process to fully flush its disk
cache before SIGKILL is sent, so we need to lengthen the timeout (in
order to improve the situation with management applications until they
can be updated to use the new VIR_DOMAIN_DESTROY_GRACEFUL flag added
in commit 72f8a7f19753506ed957b78ad800c0f3892c9304). But, if we
lengthen the timeout, we also lengthen the amount of time that all
other threads in libvirtd are essentially blocked from doing anything
(since just about everything needs to acquire the driver lock, if only
for long enough to get a pointer to a domain).
The solution is to modify qemuProcessKill to drop the driver lock
while sleeping, as proposed in Eric's patch. Then we can increase the
timeout with a clear conscience, and thus at least lower the chances
that someone running with existing management software will suffer the
consequence's of qemu's disk cache not being flushed.
In the meantime, we still should work on Dan's proposal to make
locking within the driver object more fine grained.
(NB: although I couldn't find any instance where qemuProcessKill() was
called with no jobs active for the domain (or some other guarantee
that the current thread had at least one refcount on the domain
object), this patch still follows Eric's method of temporarily adding
a ref prior to unlocking the domain object, because I couldn't
convince myself 100% that this was the case.)
When blkdeviotune was first committed in 0.9.8, we had the limitation
that setting one value reset all others. But bytes and iops should
be relatively independent. Furthermore, setting tuning values on
a live domain followed by dumpxml did not output the new settings.
* src/qemu/qemu_driver.c (qemuDiskPathToAlias): Add parameter, and
update callers.
(qemuDomainSetBlockIoTune): Don't lose previous unrelated
settings. Make live changes reflect to dumpxml output.
* tools/virsh.pod (blkdeviotune): Update documentation.
The auto-generated WWN comply with the new addressing schema of WWN:
<quote>
the first nibble is either hex 5 or 6 followed by a 3-byte vendor
identifier and 36 bits for a vendor-specified serial number.
</quote>
We choose hex 5 for the first nibble. And for the 3-bytes vendor ID,
we uses the OUI according to underlying hypervisor type, (invoking
virConnectGetType to get the virt type). e.g. If virConnectGetType
returns "QEMU", we use Qumranet's OUI (00:1A:4A), if returns
ESX|VMWARE, we use VMWARE's OUI (00:05:69). Currently it only
supports qemu|xen|libxl|xenapi|hyperv|esx|vmware drivers. The last
36 bits are auto-generated.
virsh blkiotune dom --device-weights /dev/sda,400 --config
wasn't working correctly.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Use
correct definition.
Qemu uses non-blocking I/O which doesn't play nice with regular file
descriptors. We need to pass a pipe to qemu instead, which can easily be
done using iohelper.
virFileDirectFd was used for accessing files opened with O_DIRECT using
libvirt_iohelper. We will want to use the helper for accessing files
regardless on O_DIRECT and thus virFileDirectFd was generalized and
renamed to virFileWrapperFd.
virFileOpenAs previously would only try opening a file as the current
user, or as a different user, but wouldn't try both methods in a
single call. This made it cumbersome to use as a replacement for
open(2). Additionally, it had a lot of historical baggage that led to
it being difficult to understand.
This patch refactors virFileOpenAs in the following ways:
* reorganize the code so that everything dealing with both the parent
and child sides of the "fork+setuid+setgid+open" method are in a
separate function. This makes the public function easier to understand.
* Allow a single call to virFileOpenAs() to first attempt the open as
the current user, and if that fails to automatically re-try after
doing fork+setuid (if deemed appropriate, i.e. errno indicates it
would now be successful, and the file is on a networkFS). This makes
it possible (in many, but possibly not all, cases) to drop-in
virFileOpenAs() as a replacement for open(2).
(NB: currently qemuOpenFile() calls virFileOpenAs() twice, once
without forking, then again with forking. That unfortunately can't
be changed without at least some discussion of the ramifications,
because the requested file permissions are different in each case,
which is something that a single call to virFileOpenAs() can't deal
with.)
* Add a flag so that any fchown() of the file to a different uid:gid
is explicitly requested when the function is called, rather than it
being implied by the presence of the O_CREAT flag. This just makes
for less subtle surprises to consumers. (Commit
b1643dc15c5de886fefe56ad18608d65f1325a2c added the check for O_CREAT
before forcing ownership. This patch just makes that restriction
more explicit.)
* If either the uid or gid is specified as "-1", virFileOpenAs will
interpret this to mean "the current [gu]id".
All current consumers of virFileOpenAs should retain their present
behavior (after a few minor changes to their setup code and
arguments).
When libvirt's virDomainDestroy API is shutting down the qemu process,
it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the
process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss
because the guest running in qemu doesn't have time to flush its disk
cache buffers before it's unceremoniously whacked.
This patch maintains that default behavior, but provides a new flag
VIR_DOMAIN_DESTROY_GRACEFUL to alter the behavior. If this flag is set
in the call to virDomainDestroyFlags, SIGKILL will never be sent to
the qemu process; instead, if the timeout is reached and the qemu
process still exists, virDomainDestroy will return an error.
Once this patch is in, the recommended method for applications to call
virDomainDestroyFlags will be with VIR_DOMAIN_DESTROY_GRACEFUL
included. If that fails, then the application can decide if and when
to call virDomainDestroyFlags again without
VIR_DOMAIN_DESTROY_GRACEFUL (to force the issue with SIGKILL).
(Note that this does not address the issue of existing applications
that have not yet been modified to use VIR_DOMAIN_DESTROY_GRACEFUL.
That is a separate patch.)
Curently security labels can be of type 'dynamic' or 'static'.
If no security label is given, then 'dynamic' is assumed. The
current code takes advantage of this default, and avoids even
saving <seclabel> elements with type='dynamic' to disk. This
means if you temporarily change security driver, the guests
can all still start.
With the introduction of sVirt to LXC though, there needs to be
a new default of 'none' to allow unconfined LXC containers.
This patch introduces two new security label types
- default: the host configuration decides whether to run the
guest with type 'none' or 'dynamic' at guest start
- none: the guest will run unconfined by security policy
The 'none' label type will obviously be undesirable for some
deployments, so a new qemu.conf option allows a host admin to
mandate confined guests. It is also possible to turn off default
confinement
security_default_confined = 1|0 (default == 1)
security_require_confined = 1|0 (default == 0)
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add new
seclabel types
* src/security/security_manager.c, src/security/security_manager.h:
Set default sec label types
* src/security/security_selinux.c: Handle 'none' seclabel type
* src/qemu/qemu.conf, src/qemu/qemu_conf.c, src/qemu/qemu_conf.h,
src/qemu/libvirtd_qemu.aug: New security config options
* src/qemu/qemu_driver.c: Tell security driver about default
config
This is a trivial implementation, which works with the current
released qemu 1.0 with backports of preliminary block pull but
no partial rebase. Future patches will update the monitor handling
to support an optional parameter for partial rebase; but as qemu
1.1 is unreleased, it can be in later patches, designed to be
backported on top of the supported API.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Add parameter,
and adjust callers. Drop redundant check.
(qemuDomainBlockPull): Move guts...
(qemuDomainBlockRebase): ...to new function.
This patch adds support for the new api into the qemu driver to support
modification and retrieval of domain description and title. This patch
does not add support for modifying the <metadata> element.
In qemuDomainShutdownFlags if we try to use guest agent,
which has error or is not configured, we jump go endjob
label even if we haven't started any job yet. This may
lead to the daemon crash:
1) virsh shutdown --mode agent on a domain without agent configured
2) wait until domain quits
3) virsh edit
To help consolidate the commonality between virDomainHostdevDef and
virDomainNetDef into as few members as possible (and because I
think it makes sense), this patch moves the rombar and bootIndex
members into the "info" member that is common to both (and to all the
other structs that use them).
It's a bit problematic that this gives rombar and bootIndex to many
device types that don't use them, but this is already the case for the
master and mastertype members of virDomainDeviceInfo, and is properly
commented as such in the definition.
Note that this opens the door to supporting rombar for other devices
that are attached to the guest PCI bus - virtio-blk-pci,
virtio-net-pci, various other network adapters - which which have that
capability in qemu, but previously had no support in libvirt.
This makes use of the QEMU guest agent to implement the
virDomainShutdownFlags and virDomainReboot APIs. With
no flags specified, it will prefer to use the agent, but
fallback to ACPI. Explicit choice can be made by using
a suitable flag
* src/qemu/qemu_driver.c: Wire up use of agent
Reusing common code makes things smaller; it also buys us some
additional safety, such as now rejecting duplicate parameters
during a set operation.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters)
(qemuDomainSetMemoryParameters, qemuDomainSetNumaParameters)
(qemuSetSchedulerParametersFlags)
(qemuDomainSetInterfaceParameters, qemuDomainSetBlockIoTune)
(qemuDomainGetBlkioParameters, qemuDomainGetMemoryParameters)
(qemuDomainGetNumaParameters, qemuGetSchedulerParametersFlags)
(qemuDomainBlockStatsFlags, qemuDomainGetInterfaceParameters)
(qemuDomainGetBlockIoTune): Use new helpers.
* src/esx/esx_driver.c (esxDomainSetSchedulerParametersFlags)
(esxDomainSetMemoryParameters)
(esxDomainGetSchedulerParametersFlags)
(esxDomainGetMemoryParameters): Likewise.
* src/libxl/libxl_driver.c
(libxlDomainSetSchedulerParametersFlags)
(libxlDomainGetSchedulerParametersFlags): Likewise.
* src/lxc/lxc_driver.c (lxcDomainSetMemoryParameters)
(lxcSetSchedulerParametersFlags, lxcDomainSetBlkioParameters)
(lxcDomainGetMemoryParameters, lxcGetSchedulerParametersFlags)
(lxcDomainGetBlkioParameters): Likewise.
* src/test/test_driver.c (testDomainSetSchedulerParamsFlags)
(testDomainGetSchedulerParamsFlags): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorSetSchedulerParameters)
(xenHypervisorGetSchedulerParameters): Likewise.
pciTrySecondaryBusReset checks if there is active device on the
same bus, however, qemu driver doesn't maintain an effective
list for the inactive devices, and it passes meaningless argument
for parameter "inactiveDevs". e.g. (qemuPrepareHostdevPCIDevices)
if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
return -1;
..skipped...
if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
goto reattachdevs;
NB, the "pcidevs" used above are extracted from domain def, and
thus one won't be able to attach a device of which bus has other
device even detached from host (nodedev-detach). To see more
details of the problem:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=773667
This patch is to resolve the problem by introducing an inactive
PCI device list (just like qemu_driver->activePciHostdevs), and
the whole logic is:
* Add the device to inactive list during nodedev-dettach
* Remove the device from inactive list during nodedev-reattach
* Remove the device from inactive list during attach-device
(for non-managed device)
* Add the device to inactive list after detach-device, only
if the device is not managed
With the above, we have a sufficient inactive PCI device list, and thus
we can use it for pciResetDevice. e.g.(qemuPrepareHostdevPCIDevices)
if (pciResetDevice(dev, driver->activePciHostdevs,
driver->inactivePciHostdevs) < 0)
goto reattachdevs;
When sVirt is integrated with the LXC driver, it will be neccessary
to invoke the security driver APIs using only a virDomainDefPtr
since the lxc_container.c code has no virDomainObjPtr available.
Aside from two functions which want obj->pid, every bit of the
security driver code only touches obj->def. So we don't need to
pass a virDomainObjPtr into the security drivers, a virDomainDefPtr
is sufficient. Two functions also gain a 'pid_t pid' argument.
* src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/security/security_apparmor.c,
src/security/security_dac.c,
src/security/security_driver.h,
src/security/security_manager.c,
src/security/security_manager.h,
src/security/security_nop.c,
src/security/security_selinux.c,
src/security/security_stack.c: Change all security APIs to use a
virDomainDefPtr instead of virDomainObjPtr
When disk snapshots were first implemented, libvirt blindly refused
to allow an external snapshot destination that already exists, since
qemu will blindly overwrite the contents of that file during the
snapshot_blkdev monitor command, and we don't like a default of
data loss by default. But VDSM has a scenario where NFS permissions
are intentionally set so that the destination file can only be
created by the management machine, and not the machine where the
guest is running, so that libvirt will necessarily see the destination
file already existing; adding a flag will allow VDSM to force the file
reuse without libvirt complaining of possible data loss.
https://bugzilla.redhat.com/show_bug.cgi?id=767104
* include/libvirt/libvirt.h.in (virDomainSnapshotCreateFlags): Add
VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it. Add
note about partial failure.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Add new
flag.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document
it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateXML): Implement the new flag.
It should be a copy-paste error, the result is programming will result in an
infinite loop again due to without iterating 'j' variable.
* src/qemu/qemu_driver.c: fix a typo on qemuDomainSetBlkioParameters.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=770520
Signed-off-by: Alex Jia <ajia@redhat.com>
In the past, generic SCSI commands issued from a guest to a virtio
disk were always passed through to the underlying disk by qemu, and
the kernel would also pass them on.
As a result of CVE-2011-4127 (see:
http://seclists.org/oss-sec/2011/q4/536), qemu now honors its
scsi=on|off device option for virtio-blk-pci (which enables/disables
passthrough of generic SCSI commands), and the kernel will only allow
the commands for physical devices (not for partitions or logical
volumes). The default behavior of qemu is still to allow sending
generic SCSI commands to physical disks that are presented to a guest
as virtio-blk-pci devices, but libvirt prefers to disable those
commands in the standard virtio block devices, enabling it only when
specifically requested (hopefully indicating that the requester
understands what they're asking for). For this purpose, a new libvirt
disk device type (device='lun') has been created.
device='lun' is identical to the default device='disk', except that:
1) It is only allowed if bus='virtio', type='block', and the qemu
version is "new enough" to support it ("new enough" == qemu 0.11 or
better), otherwise the domain will fail to start and a
CONFIG_UNSUPPORTED error will be logged).
2) The option "scsi=on" will be added to the -device arg to allow
SG_IO commands (if device !='lun', "scsi=off" will be added to the
-device arg so that SG_IO commands are specifically forbidden).
Guests which continue to use disk device='disk' (the default) will no
longer be able to use SG_IO commands on the disk; those that have
their disk device changed to device='lun' will still be able to use SG_IO
commands.
*docs/formatdomain.html.in - document the new device attribute value.
*docs/schemas/domaincommon.rng - allow it in the RNG
*tests/* - update the args of several existing tests to add scsi=off, and
add one new test that will test scsi=on.
*src/conf/domain_conf.c - update domain XML parser and formatter
*src/qemu/qemu_(command|driver|hotplug).c - treat
VIR_DOMAIN_DISK_DEVICE_LUN *almost* identically to
VIR_DOMAIN_DISK_DEVICE_DISK, except as indicated above.
Note that no support for this new device value was added to any
hypervisor drivers other than qemu, because it's unclear what it might
mean (if anything) to those drivers.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=638633
Although scripts are not used by interfaces of type other than
"ethernet" in qemu, due to the fact that the parser stores the script
name in a union that is only valid when type is ethernet or bridge,
there is no way for anyone except the parser itself to catch the
problem of specifying an interface script for an inappropriate
interface type (by the time the parsed data gets back to the code that
called the parser, all evidence that a script was specified is
forgotten).
Since the parser itself should be agnostic to which type of interface
allows scripts (an example of why: a script specified for an interface
of type bridge is valid for xen domains, but not for qemu domains),
the solution here is to move the script out of the union(s) in the
DomainNetDef, always populate it when specified (regardless of
interface type), and let the driver decide whether or not it is
appropriate.
Currently the qemu, xen, libxml, and uml drivers recognize the script
parameter and do something with it (the uml driver only to report that
it isn't supported). Those drivers have been updated to log a
CONFIG_UNSUPPORTED error when a script is specified for an interface
type that's inappropriate for that particular hypervisor.
(NB: There was earlier discussion of solving this problem by adding a
VALIDATE flag to all libvirt APIs that accept XML, which would cause
the XML to be validated against the RNG files. One statement during
that discussion was that the RNG shouldn't contain hypervisor-specific
things, though, and a proper solution to this problem would require
that (again, because a script for an interface of type "bridge" is
accepted by xen, but not by qemu).
Commit ae523427 missed one pair of functions that could use
the helper routine.
* src/qemu/qemu_driver.c (qemuSetSchedulerParametersFlags)
(qemuGetSchedulerParametersFlags): Simplify.
When setting numa nodeset for a domain which has no nodeset set
before, libvirtd crashes by dereferencing the pointer to the old
nodemask which is null in that case.
Commit baade4d fixed a memory leak on failure, but in the process,
introduced a use-after-free on success, which can be triggered with:
1. set bandwidth with --live
2. query bandwidth
3. set bandwidth with --live
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Don't
free newBandwidth on success.
Reported by Hu Tao.
Most severe here is a latent (but currently untriggered) memory leak
if any hypervisor ever adds a string interface property; the
remainder are mainly cosmetic.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BANDWIDTH_*): Move
macros closer to interface that uses them, and document type.
* src/libvirt.c (virDomainSetInterfaceParameters)
(virDomainGetInterfaceParameters): Formatting tweaks.
* daemon/remote.c (remoteDispatchDomainGetInterfaceParameters):
Avoid memory leak.
* src/libvirt_public.syms (LIBVIRT_0.9.9): Sort lines.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Fix
comments, break long lines.
Leak detected by Coverity, and introduced in commit 93ab585.
Reported by Alex Jia.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Free
devices array on error.
https://bugzilla.redhat.com/show_bug.cgi?id=770520
We had two nested loops both trying to use 'i' as the iteration
variable, which can result in an infinite loop when the inner
loop interferes with the outer loop. Introduced in commit 93ab585.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Don't
reuse iteration variable across two loops.
The lifetime of the virDomainEventState object is tied to
the lifetime of the driver, which in stateless drivers is
tied to the lifetime of the virConnectPtr.
If we add & remove a timer when allocating/freeing the
virDomainEventState object, we can get a situation where
the timer still triggers once after virDomainEventState
has been freed. The timeout callback can't keep a ref
on the event state though, since that would be a circular
reference.
The trick is to only register the timer when a callback
is registered with the event state & remove the timer
when the callback is unregistered.
The demo for the bug is to run
while true ; do date ; ../tools/virsh -q -c test:///default 'shutdown test; undefine test; dominfo test' ; done
prior to this fix, it will frequently hang and / or
crash, or corrupt memory
Currently all drivers using domain events need to provide a callback
for handling a timer to dispatch events in a clean stack. There is
no technical reason for dispatch to go via driver specific code. It
could trivially be dispatched directly from the domain event code,
thus removing tedious boilerplate code from all drivers
Also fix the libxl & xen drivers to pass 'true' when creating the
virDomainEventState, since they run inside the daemon & thus always
expect events to be present.
* src/conf/domain_event.c, src/conf/domain_event.h: Internalize
dispatch of events from timer callback
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_domain.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Remove all timer dispatch functions