Update the QEMU cgroups code, QEMU DAC security driver, SELinux
and AppArmour security drivers over to use the shared helper API
virDomainDiskDefForeachPath().
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Convert over to use virDomainDiskDefForeachPath()
There is duplicated code which iterates over disk backing stores
performing some action. Provide a convenient helper for doing
this to eliminate duplication & risk of mistakes with disk format
probing
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/libvirt_private.syms: Add virDomainDiskDefForeachPath()
Require the disk image to be passed into virStorageFileGetMetadata.
If this is set to VIR_STORAGE_FILE_AUTO, then the format will be
resolved using probing. This makes it easier to control when
probing will be used
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Set VIR_STORAGE_FILE_AUTO when calling virStorageFileGetMetadata.
* src/storage/storage_backend_fs.c: Probe for disk format before
calling virStorageFileGetMetadata.
* src/util/storage_file.h, src/util/storage_file.c: Remove format
from virStorageFileMeta struct & require it to be passed into
method.
The virStorageFileGetMetadataFromFD did two jobs in one. First
it probed for storage type, then it extracted metadata for the
type. It is desirable to be able to separate these jobs, allowing
probing without querying metadata, and querying metadata without
probing.
To prepare for this, split out probing code into a new pair of
methods
virStorageFileProbeFormatFromFD
virStorageFileProbeFormat
* src/util/storage_file.c, src/util/storage_file.h,
src/libvirt_private.syms: Introduce virStorageFileProbeFormat
and virStorageFileProbeFormatFromFD
Instead of including a field in FileTypeInfo struct for the
disk format, rely on the array index matching the format.
Use verify() to assert the correct number of elements in the
array.
* src/util/storage_file.c: remove type field from FileTypeInfo
When QEMU opens a backing store for a QCow2 file, it will
normally auto-probe for the format of the backing store,
rather than assuming it has the same format as the referencing
file. There is a QCow2 extension that allows an explicit format
for the backing store to be embedded in the referencing file.
This closes the auto-probing security hole in QEMU.
This backing store format can be useful for libvirt users
of virStorageFileGetMetadata, so extract this data and report
it.
QEMU does not require disk image backing store files to be in
the same format the file linkee. It will auto-probe the disk
format for the backing store when opening it. If the backing
store was intended to be a raw file this could be a security
hole, because a guest may have written data into its disk that
then makes the backing store look like a qcow2 file. If it can
trick QEMU into thinking the raw file is a qcow2 file, it can
access arbitrary files on the host by adding further backing
store links.
To address this, callers of virStorageFileGetMeta need to be
told of the backing store format. If no format is declared,
they can make a decision whether to allow format probing or
not.
IPtables will seek to preserve the source port unchanged when
doing masquerading, if possible. NFS has a pseudo-security
option where it checks for the source port <= 1023 before
allowing a mount request. If an admin has used this to make the
host OS trusted for mounts, the default iptables behaviour will
potentially allow NAT'd guests access too. This needs to be
stopped.
With this change, the iptables -t nat -L -n -v rules for the
default network will be
Chain POSTROUTING (policy ACCEPT 95 packets, 9163 bytes)
pkts bytes target prot opt in out source destination
14 840 MASQUERADE tcp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
75 5752 MASQUERADE udp -- * * 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
0 0 MASQUERADE all -- * * 192.168.122.0/24 !192.168.122.0/24
* src/network/bridge_driver.c: Add masquerade rules for TCP
and UDP protocols
* src/util/iptables.c, src/util/iptables.c: Add source port
mappings for TCP & UDP protocols when masquerading.
There are many naming conventions for partitions associated with a
block device. Some of the major ones are:
/dev/foo -> /dev/foo1
/dev/foo1 -> /dev/foo1p1
/dev/mapper/foo -> /dev/mapper/foop1
/dev/disk/by-path/foo -> /dev/disk/by-path/foo-part1
The universe of possible conventions isn't clear. Rather than trying
to understand all possible conventions, this patch divides devices
into two groups, device mapper devices and everything else. Device
mapper devices seem always to follow the convention of device ->
devicep1; everything else is canonicalized.
* src/uml/uml_driver.c (umlMonitorCommand): Correct flaw that would
cause unconditional "incomplete reply ..." failure, since "nbytes"
was always 0 or 1.
* src/qemu/qemu_driver.c (qemuConnectMonitor): Correct erroneous
parenthesization in two expressions. Without this fix, failure
to set or clear SELinux security context in the monitor would go
undiagnosed. Also correct a diagnostic and split some long lines.
When comparing a CPU without <model> element, such as
<cpu>
<topology sockets='1' cores='1' threads='1'/>
</cpu>
libvirt would happily crash without warning.
When autodetecting whether XML describes guest or host CPU, the presence
of <arch> element is checked. If it's present, we treat the XML as host
CPU definition. Which is right, since guest CPU definitions do not
contain <arch> element. However, if at the same time the root <cpu>
element contains `match' attribute, we would silently ignore it and
still treat the XML as host CPU. We should rather refuse such invalid
XML.
When a CPU to be compared with host CPU describes a host CPU instead of
a guest CPU, the result is incorrect. This is because instead of
treating additional features in host CPU description as required, they
were treated as if they were mentioned with all possible policies at the
same time.
In case qemu supports -nodefconfig, libvirt adds uses it when launching
new guests. Since this option may affect CPU models supported by qemu,
we need to use it when probing for available models.
An indentation mistake meant that a check for return status
was not properly performed in all cases. This could result
in a crash on NULL pointer in a following line.
* src/qemu/qemu_monitor_json.c: Fix check for return status
when processing JSON for blockstats
By specifying <vendor> element in CPU requirements a guest can be
restricted to run only on CPUs by a given vendor. Host CPU vendor is
also specified in capabilities XML.
The vendor is checked when migrating a guest but it's not forced, i.e.,
guests configured without <vendor> element can be freely migrated.
All features in the baseline CPU definition were always created with
policy='require' even though an arch driver returned them with different
policy settings.
This allows the user to give an explicit path to configure
./configure --with-vbox=/path/to/virtualbox
instead of having the VirtualBox driver probe a set of possible
paths at runtime. If no explicit path is specified then configure
probes the set of "known" paths.
https://bugzilla.redhat.com/show_bug.cgi?id=609185
We only use libpciaccess for resolving device product/vendor. If
initializing the library fails (say if using qemu:///session), don't
warn so loudly, and carry on as usual.
Any error message raised after the process has forked needs
to be followed by virDispatchError, otherwise we have no chance of
ever seeing it. This was selectively done for hook functions in the past,
but really applies to all post-fork errors.
As pointed out by Eric Blake, using dirent->d_type breaks
compilation on MinGW. This patch addresses this by using
'#if defined' as same as doing for virCgroupForDriver.
Some, but not all, codepaths in the qemuMonitorOpen() method
would trigger the destroy callback. The caller does not expect
this to be invoked if construction fails, only during normal
release of the monitor. This resulted in a possible double-unref
of the virDomainObjPtr, because the caller explicitly unrefs
the virDomainObjPtr if qemuMonitorOpen() fails
* src/qemu/qemu_monitor.c: Don't invoke destroy callback from
qemuMonitorOpen() failure paths
ENOENT happens normally when a subsystem is enabled with any other
subsystems and the directory of the target group has already removed
in a prior loop. In that case, the function should just return without
leaving an error message.
NB this is the same behavior as before introducing virCgroupRemoveRecursively.
Make sure to *not* call qemuDomainPCIAddressReleaseAddr if
QEMUD_CMD_FLAG_DEVICE is *not* set (for older qemu). This
prevents a crash when trying to do device detachment from
a qemu guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
In the current libvirt PCI code, there is no checking whether
a PCI device is in use by a guest when doing node device
detach or reattach. This causes problems when a device is
assigned to a guest, and the administrator starts issuing
nodedevice commands. Make it so that we check the list
of active devices when trying to detach/reattach, and only
allow the operation if the device is not assigned to a guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Fix regression introduced in commit a4a287242 - basically, the
phyp storage driver should only accept the same URIs that the
main phyp driver is willing to accept. Blindly accepting all
URIs meant that the phyp storage driver was being consulted for
'virsh -c qemu:///session pool-list --all', rather than the
qemu storage driver, then since the URI was not for phyp, attempts
to then use the phyp driver crashed because it was not initialized.
* src/phyp/phyp_driver.c (phypStorageOpen): Only accept connections
already open to a phyp driver.
This code was just recently added (by me) and didn't account for the
fact that stdin_path is sometimes NULL. If it's NULL, and
SetSecurityAllLabel fails, a segfault would result.
Because tty path is unexpectedly not saved in the live configuration
file of a domain, libvirtd cannot get the console of the domain back
after restarting.
The reason why the tty path isn't saved is that, to save the tty path,
the save function, virDomainSaveConfig, requires that the target domain
is running (pid != -1), however, lxc driver calls the function before
starting the domain to pass the configuration to controller.
To ensure to save the tty path, the patch lets lxc driver call the save
function again after starting the domain.
The function is expected to return negative value on failure,
however, it returns positive value when either setInterfaceName
or vethInterfaceUpOrDown fails. Because the function returns
the return value of either as is, however, the two functions
may return positive value on failure.
The patch fixes the defects and add error messages.
When the saved domain image is on an NFS share, at least some part of
domainSetSecurityAllLabel will fail (for example, selinux labels can't
be modified). To allow domain restore to still work in this case, just
ignore the errors.
virStorageFileIsSharedFS would previously only work if the entire path
in question was stat'able by the uid of the libvirtd process. This
patch changes it to crawl backwards up the path retrying the statfs
call until it gets to a partial path that *can* be stat'ed.
This is necessary to use the function to learn the fstype for files
stored as a different user (and readable only by that user) on a
root-squashed remote filesystem.
Also restore the label to its original value after qemu is finished
with the file.
Prior to this patch, qemu domain restore did not function properly if
selinux was set to enforce.
If an active migration operation fails, or is cancelled by the
admin, the QEMU on the destination is shutdown and the one on
the source continues running. It is important in shutting down
the QEMU on the destination, the security drivers don't reset
the file labelling/permissions.
* src/qemu/qemu_driver.c: Don't reset labelling/permissions
on migration abort
Minor speedups by using the full power of sed.
* src/phyp/phyp_driver.c (phypGetVIOSFreeSCSIAdapter)
(phypDiskType, phypListDefinedDomains): Use fewer processes, by
folding other work into sed.
(phypGetVIOSPartitionID): Likewise. Also avoid non-portable use
of 'sed -s'.
Add the storage management driver to the Power Hypervisor driver.
This is a big but simple patch, it's just a new set of functions.
This patch includes:
* Storage driver: The set of pool-* and vol-* functions.
* attach-disk function.
* Support for IVM on the new functions.
Signed-off-by: Eric Blake <eblake@redhat.com>
* src/phyp/phyp_driver.c (phypStorageDriver): New driver.
(phypStorageOpen, phypStorageClose): New functions.
(phypRegister): Register it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Several phyp functions are not namespace clean, and had no reason
to be exported since no one outside the phyp driver needed to use
them. Rather than do lots of forward declarations, I was able
to topologically sort the file. So, this patch looks huge, but
is really just a matter of marking things static and dealing with
the compiler fallout.
* src/phyp/phyp_driver.h (PHYP_DRIVER_H): Add include guard.
(phypCheckSPFreeSapce): Delete unused declaration.
(phypGetSystemType, phypGetVIOSPartitionID, phypCapsInit)
(phypBuildLpar, phypUUIDTable_WriteFile, phypUUIDTable_ReadFile)
(phypUUIDTable_AddLpar, phypUUIDTable_RemLpar, phypUUIDTable_Pull)
(phypUUIDTable_Push, phypUUIDTable_Init, phypUUIDTable_Free)
(escape_specialcharacters, waitsocket, phypGetLparUUID)
(phypGetLparMem, phypGetLparCPU, phypGetLparCPUGeneric)
(phypGetRemoteSlot, phypGetBackingDevice, phypDiskType)
(openSSHSession): Move declarations to phyp_driver.c and make static.
* src/phyp/phyp_driver.c: Rearrange file contents to provide
topological sorting of newly-static funtions (no semantic changes
other than reduced scope).
(phypGetBackingDevice, phypDiskType): Mark unused, for now.
The patches for shared storage migration were not correctly written
for json mode. Thus the 'blk' and 'inc' parameters were never being
set. In addition they didn't set the QEMU_MONITOR_MIGRATE_BACKGROUND
so migration was synchronous. Due to multiple bugs in QEMU's JSON
impl this wasn't noticed because it treated the sync migration requst
as asynchronous anyway. Finally 'background' parameter was converted
to take arbitrary flags but not renamed, and not all uses were changed
to unsigned int.
* src/qemu/qemu_driver.c: Set QEMU_MONITOR_MIGRATE_BACKGROUND in
doNativeMigrate
* src/qemu/qemu_monitor_json.c: Process QEMU_MONITOR_MIGRATE_NON_SHARED_DISK
and QEMU_MONITOR_MIGRATE_NON_SHARED_INC flags
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.h, src/qemu/qemu_monitor_text.c,
src/qemu/qemu_monitor_text.h: change 'int background' to
'unsigned int flags' in migration APIs. Add logging of flags
parameter
During incoming migration the QEMU monitor is not able to be
used. The incoming migration code did not keep hold of the
job lock because migration is split across multiple API calls.
This meant that further monitor commands on the guest would
hang until migration finished with no timeout.
In this change the qemuDomainMigratePrepare method sets the
job flag just before it returns. The qemuDomainMigrateFinish
method checks for this job flag & clears it once done. This
prevents any use of the monitor between prepare+finish steps.
The qemuDomainGetJobInfo method is also updated to refresh
the job elapsed time. This means that virsh domjobinfo can
return time data during incoming migration
* src/qemu/qemu_driver.c: Keep a job active during incoming
migration. Refresh job elapsed time when returning job info
When configuring serial, parallel, console or channel devices
with a file, dev or pipe backend type, it is necessary to label
the file path in the security drivers. For char devices of type
file, it is neccessary to pre-create (touch) the file if it does
not already exist since QEMU won't be allowed todo so itself.
dev/pipe configs already require the admin to pre-create before
starting the guest.
* src/qemu/qemu_security_dac.c: set file ownership for character
devices
* src/security/security_selinux.c: Set file labeling for character
devices
* src/qemu/qemu_driver.c: Add character devices to cgroup ACL
The parallel, serial, console and channel devices are all just
character devices. A lot of code needs todo the same thing to
all these devices. This provides an convenient API for iterating
over all of them.
* src/conf/domain_conf.c, src/conf/domain_conf.c,
src/libvirt_private.syms: Add virDomainChrDefForeach
We previously assumed that if the -device option existed in qemu, that
-nodefconfig would also exist. It turns out that isn't the case, as
demonstrated by qemu-kvm-0.12.3 in Fedora 13.
*/src/qemu/qemu_conf.[hc] - add a new QEMUD_CMD_FLAG, set it via the
help output, and check it before adding
-nodefconfig to the qemu commandline.