* src/uml/uml_driver.c (umlMonitorCommand): This function would
sometimes return -1, yet fail to free the "reply" it had allocated.
Hence, no caller would know to free the corresponding argument.
When returning -1, be sure to free all allocated resources.
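The error-path rule above can be sketched as follows. This is a minimal
illustration, not the actual umlMonitorCommand code: monitor_command and
its simulated failure are hypothetical stand-ins.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a monitor command. On success, *reply owns a
 * heap string that the caller must free. On failure, nothing is left for
 * the caller to free and *reply is NULL. */
int monitor_command(const char *cmd, char **reply)
{
    *reply = malloc(64);
    if (*reply == NULL)
        return -1;
    (*reply)[0] = '\0';

    if (cmd == NULL || *cmd == '\0')
        goto error;              /* simulated failure after allocation */

    strncat(*reply, cmd, 63);    /* pretend the reply echoes the command */
    return 0;

error:
    free(*reply);                /* release before reporting failure */
    *reply = NULL;               /* leave no dangling pointer behind */
    return -1;
}
```

With this shape, a caller only has to free the reply when the function
returned 0.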
* src/storage/storage_backend_mpath.c (virStorageBackendIsMultipath):
The result of dm_get_next_target was never used (and isn't needed),
so don't store it.
Similar to the Set*Mem commands, this implementation was bogus and
misleading. Make it clear this is a hotplug only operation, and that the
hotplug piece isn't even implemented.
Also drop the overkill maxvcpus validation: we don't perform this check
at XML define time so clearly no one is missing it, and there is
always the risk that our info will be out of date, possibly preventing
legitimate CPU values.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
SetMem and SetMaxMem are hotplug only APIs, any persistent config
changes are supposed to go via XML definition. The original implementation
of these calls was incorrect and had the nasty side effect of making
a pseudo-persistent change that would be lost after libvirtd restart
(I didn't know any better).
Fix these APIs to rightly reject non running domains.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
The plain QEMU tree does not include 'thread_id' in the JSON
output. Thus we need to treat it as non-fatal if missing.
* src/qemu/qemu_monitor_json.c: Treat missing thread_id as non-fatal
A typo in the check for the primary IDE controller could cause
a crash on restore depending on the exact guest config.
* src/qemu/qemu_conf.c: Fix s/video/controller/ typo & slot
number typo
Current error reporting for JSON mode returns the full JSON
command string and full JSON error string. This is not very
user friendly, so this change makes the error report only
contain the basic command name, and friendly error message
description string. The full JSON data is logged instead.
* src/qemu/qemu_monitor_json.c: Always return the 'desc' field from
the JSON error message to users.
When in JSON mode, QEMU requires that 'qmp_capabilities' is run as
the first command in the monitor. This is a no-op when run in the
text mode monitor
* src/qemu/qemu_driver.c: Run capabilities negotiation when
connecting to the monitor
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Add
support for the 'qmp_capabilities' command, no-op in text mode.
This part adds support for qemu making a macvtap tap device available
via file descriptor passed to qemu command line. This also attempts to
tear down the macvtap device when a VM terminates. This includes support
for attachment and detachment to/from running VM.
* src/qemu/qemu_conf.[ch] src/qemu/qemu_driver.c: add support in the
QEmu driver
This part adds the helper code to setup and tear down macvtap devices
using direct communication with the device driver via netlink sockets.
The rather short messages received from the netlink layer are now
written into a dynamically allocated buffer
* src/util/macvtap.h src/util/macvtap.c: provides the new module
* po/POTFILES.in: the module contains translated strings
This part adds support to domain_conf.{c|h} for parsing the new
interface XML of type 'direct'. The parsed mode is now stored as
an int.
* src/conf/domain_conf.c src/conf/domain_conf.h: extend parsing code
* src/util/macvtap.h: empty header to not break compilation
This patch adds build support for libvirt checking for certain contents
of /usr/include/linux/if_link.h to see whether macvtap support is
compilable on that system. One can disable macvtap support in libvirt
via --without-macvtap passed to configure.
* configure.ac src/Makefile.am: new build support
* src/libvirt_macvtap.syms: list of exported symbols
* src/util/macvtap.c: empty module to not break compilation
The virRaiseError macro inside of virSecurityReportError expands to
virRaiseErrorFull and includes the __FILE__, __FUNCTION__ and __LINE__
information. But these three values are always the same for every call
to virSecurityReportError and do not reflect the actual error context.
Converting virSecurityReportError into a macro results in getting the
correct __FILE__, __FUNCTION__ and __LINE__ information.
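The difference between the function and the macro form can be sketched
like this. All names here (report_error_full, report_error_fn,
REPORT_ERROR, last_error) are hypothetical and only mirror the idea, not
the real virSecurityReportError code.

```c
#include <stddef.h>

/* Records where the most recent error was reported from. */
struct error_site {
    const char *file;
    const char *func;
    int line;
    const char *msg;
};
static struct error_site last_error;

void report_error_full(const char *file, const char *func, int line,
                       const char *msg)
{
    last_error.file = file;
    last_error.func = func;
    last_error.line = line;
    last_error.msg = msg;
}

/* Function wrapper: __func__ and __LINE__ are expanded here, so every
 * caller is recorded as "report_error_fn" at the same line. */
void report_error_fn(const char *msg)
{
    report_error_full(__FILE__, __func__, __LINE__, msg);
}

/* Macro wrapper: expanded at each call site, so the recorded location
 * is that of the caller. */
#define REPORT_ERROR(msg) \
    report_error_full(__FILE__, __func__, __LINE__, (msg))
```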
Current PCI addresses are allocated at time of VM startup.
To make them truly persistent, it is necessary to do this
at time of virDomainDefine/virDomainCreate. The code in
qemuStartVMDaemon still remains in order to cope with upgrades
from older libvirt releases
* src/qemu/qemu_driver.c: Rename existing qemuAssignPCIAddresses
to qemuDetectPCIAddresses. Add new qemuAssignPCIAddresses which
does auto-allocation upfront. Call qemuAssignPCIAddresses from
qemuDomainDefine and qemuDomainCreate to assign PCI addresses that
can then be persisted. Don't clear PCI addresses at shutdown if
they are intended to be persistent
The old text mode monitor prompts for a password when disks are
encrypted. This interactive approach doesn't work for JSON mode
monitor. Thus there is a new 'block_passwd' command that can be
used.
* src/qemu/qemu_driver.c: Split out code for looking up a disk
secret from findVolumeQcowPassphrase, into a new method
getVolumeQcowPassphrase. Enhance qemuInitPasswords() to also
set the disk encryption password via the monitor
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for the 'block_passwd' monitor command.
Since c26cb9234f, the dname
parameter has been ignored by these two functions. Use it.
* src/qemu/qemu_driver.c (qemudDomainMigratePrepareTunnel): Honor dname
parameter once again.
(qemudDomainMigratePrepare2): Likewise.
All other libvirt functions use array first and then number of elements
in that array. Let's make cpuDecode follow this rule.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
With QEMU >= 0.12 the host and guest side of disks no longer have
the same naming convention. Specifically the host side will now
get a 'drive-' prefix added to its name. The 'info blockstats'
monitor command returns the host side name, so it is necessary
to strip this off when looking up stats since libvirt stores the
guest side name.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Move 'drive-' prefix
string to a defined constant
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Strip
off 'drive-' prefix (if found) when looking up disk stats
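A minimal sketch of the prefix stripping described above. The constant
name DRIVE_HOST_PREFIX and the helper strip_drive_prefix are assumptions
made for illustration; the actual identifiers in qemu_conf.h may differ.

```c
#include <string.h>

/* Hypothetical name for the shared prefix constant. */
#define DRIVE_HOST_PREFIX "drive-"

/* Return the guest side name for a host side disk name: skip the
 * "drive-" prefix when present, otherwise return the name unchanged. */
const char *strip_drive_prefix(const char *host_name)
{
    size_t n = strlen(DRIVE_HOST_PREFIX);

    if (strncmp(host_name, DRIVE_HOST_PREFIX, n) == 0)
        return host_name + n;
    return host_name;
}
```

Names reported by older QEMU without the prefix pass through untouched,
which matches the "(if found)" wording above.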
Currently the timeout for reading startup output is 3 seconds. If the
host is under any sort of load, we can easily trigger this. Let's bump
it to 30 seconds.
Since the polling loop checks to see if the process has died, we shouldn't
erroneously hit this timeout if qemu bombs (only if it is stuck in some
infinite loop).
Use the ATTRIBUTE_NONNULL annotation to mark some virConnectPtr
args as mandatory non-null so the compiler can warn of mistakes
* src/conf/domain_event.h: All virConnectPtr args must be non-null
* src/qemu/qemu_conf.h: qemudBuildCommandLine and
qemudNetworkIfaceConnect() must be given non-null connection
* tests/qemuxml2argvtest.c: Provide a non-null (dummy) connection to
qemudBuildCommandLine()
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in secret_conf.{h,c} and update all callers to
match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in interface_conf.{h,c} and update all callers to
match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in cpu_conf.{h,c} and update all callers to
match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in storage_conf.{h,c} and storage_encryption_conf.{h,c}
and update all callers to match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in node_device_conf.{h,c} and update all callers to
match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in network_conf.{h,c} and update all callers to
match
All callers now pass a NULL virConnectPtr into the USB/PCI device
iterator functions. Therefore the virConnectPtr arg can now be
removed from these functions
* src/util/hostusb.h, src/util/hostusb.c: Remove virConnectPtr
from usbDeviceFileIterate
* src/util/pci.c, src/util/pci.h: Remove virConnectPtr arg from
pciDeviceFileIterate
* src/qemu/qemu_security_dac.c, src/security/security_selinux.c: Update
to drop redundant virConnectPtr arg
The QEMU flags are commonly stored as a signed or unsigned int,
allowing only 31 flags. This limit is rather close, so to aid
future patches, change it to a 64-bit int
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h, src/qemu/qemu_driver.c,
tests/qemuargv2xmltest.c, tests/qemuhelptest.c, tests/qemuxml2argvtest.c:
Use 'unsigned long long' for QEMU flags
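To illustrate why the wider type matters, here is a small sketch with
made-up flag names (DEMO_FLAG_*); the real QEMU flag constants are not
reproduced here.

```c
/* Hypothetical flags. The 1ULL suffix keeps the shift in 64-bit
 * arithmetic; a plain (1 << 40) would overflow an int. */
#define DEMO_FLAG_FIRST (1ULL << 0)
#define DEMO_FLAG_BIT31 (1ULL << 31)
#define DEMO_FLAG_BIT40 (1ULL << 40)

/* Flags are carried as unsigned long long so that bits above 31
 * survive being passed around. */
int has_flag(unsigned long long flags, unsigned long long flag)
{
    return (flags & flag) != 0;
}
```

Storing such a value in an unsigned int would silently drop
DEMO_FLAG_BIT40, which is the failure mode the type change avoids.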
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in security_driver.{h,c} and update all callers to
match
The security driver was mistakenly initialized before the QEMU
config file was loaded. This prevents it being turned off again.
The capabilities XML was also getting the wrong security driver
name, due to the stacked driver arrangement.
* src/qemu/qemu_driver.c: Fix initialization order and capabilities
model name
* src/util/util.h (virAsprintf): Remove ATTRIBUTE_RETURN_CHECK, since
it is perfectly fine to ignore the return value, now that the pointer
is guaranteed to be set to NULL upon failure.
* src/util/storage_file.c (absolutePathFromBaseFile): Remove now-
unnecessary use of ignore_value.
* src/util/storage_file.c (absolutePathFromBaseFile): While this use
of virAsprintf is slightly cleaner than using stpncpy(stpcpy(...,
it does impose an artificial limitation on the length of the base_file
name. Rather than asserting that it does not exceed INT_MAX, return
NULL when it does.
When creating preallocated large raw files opening them with O_DSYNC
prevents long delays in reading because cache pages can be immediately
reused without writing them on a disk first.
virDomain{Attach,Detach}Device is now only permitted on active
domains. Explicitly state this restriction in the API
documentation.
V2: Only change doc, dropping the hunk that forced the restriction
in libvirt frontend.
When configured with --enable-gcc-warnings, it didn't even compile.
* src/util/storage_file.c: Include <assert.h>.
(absolutePathFromBaseFile): Assert that converting size_t to int is valid.
Reverse length/string args to match "%.*s".
Explicitly ignore the return value of virAsprintf.
* src/util/storage_file.c: Include "dirname.h".
(absolutePathFromBaseFile): Rewrite not to leak, and to require
fewer allocations.
* bootstrap (modules): Add dirname-lgpl.
* .gnulib: Update submodule to the latest.
When restoring from a saved guest image, the XML would already
contain the PCI slot ID of the IDE controller & video card.
The attempt to explicitly reserve this upfront would thus fail
every time.
* src/qemu/qemu_conf.c: Reserve IDE controller / video card
slot at time of need, rather than upfront
Similar fix as previous one but for fork() usage when creating
a file or directory
* src/util/util.c: virLogLock() and virLogUnlock() around fork()
in virFileCreate() and virDirCreateSimple()
As pointed out by Dan Berrange:
So if some thread in libvirtd is currently executing a logging call,
while another thread calls virExec(), that other thread no longer
exists in the child, but its lock is never released. So when the
child then does virLogReset() it deadlocks.
The only way I see to address this, is for the parent process to call
virLogLock(), immediately before fork(), and then virLogUnlock()
afterwards in both parent & child. This will ensure that no other
thread can be holding the lock across fork().
* src/util/logging.[ch] src/libvirt_private.syms: export virLogLock() and
virLogUnlock()
* src/util/util.c: lock just before forking and unlock just after - in
both parent and child.
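The locking scheme can be sketched with a plain pthread mutex. The names
log_lock_acquire, log_lock_release and fork_with_log_lock are
hypothetical stand-ins for virLogLock(), virLogUnlock() and the fork
call sites in util.c.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

void log_lock_acquire(void) { pthread_mutex_lock(&log_lock); }
void log_lock_release(void) { pthread_mutex_unlock(&log_lock); }

/* Take the logging lock before fork() so no other thread can hold it
 * at the moment the address space is copied, then release it on both
 * sides. */
pid_t fork_with_log_lock(void)
{
    pid_t pid;

    log_lock_acquire();
    pid = fork();
    log_lock_release();   /* runs in the parent and, if fork worked,
                           * in the child as well */
    return pid;
}
```

Because the forking thread is the only one that exists in the child, it
is also the thread that took the lock, so the child can release it and
later logging calls there do not deadlock.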
The original udev node device backend neglected to lock the driverState
struct containing the device list when adding and removing devices
* src/node_device/node_device_udev.c: add necessary locks in
udevRemoveOneDevice() and udevAddOneDevice()
* src/xen/xs_internal.c (xenStoreDomainIntroduced): Don't use -1
as an allocation size upon xenStoreNumOfDomains failure.
(xenStoreDomainReleased): Likewise.
If the primary security driver (SELinux/AppArmor) was disabled
then the secondary QEMU DAC security driver was also disabled.
This is mistaken, because the latter must be active at all times
* src/qemu/qemu_driver.c: Ensure DAC driver is always active
* src/xen/xen_hypervisor.c: Remove all "domain == NULL" tests.
* src/xen/xen_hypervisor.h: Instead, use ATTRIBUTE_NONNULL to
mark each "domain" parameter as "known always to be non-NULL".
When attaching a USB host device based on vendor/product, libvirt
will resolve the vendor/product into a device/bus pair. This means
that when printing XML we should allow device/bus info to be printed
at any time if present
* src/conf/domain_conf.c, docs/schemas/domain.rng: Allow USB device
bus info alongside vendor/product
To allow devices to be hot(un-)plugged it is necessary to ensure
they all have unique device aliases. This fixes the hotplug
methods to assign device aliases before invoking the monitor
commands which need them
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Expose methods
for assigning device aliases for disks, host devices and
controllers
* src/qemu/qemu_driver.c: Assign device aliases when hotplugging
all types of device
* tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.args,
tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.args:
Update for changed hostdev naming scheme
This patch re-arranges the QEMU device alias assignment code to
make it easier to call into the same codeblock when performing
device hotplug. The new code has the ability to skip over already
assigned names to facilitate hotplug
* src/qemu/qemu_driver.c: Call qemuAssignDeviceNetAlias()
instead of qemuAssignNetNames
* src/qemu/qemu_conf.h: Export qemuAssignDeviceNetAlias()
instead of qemuAssignNetNames
* src/qemu/qemu_driver.c: Merge the legacy disk/network alias
assignment code into the main methods
The current way of assigning names to the host network backend and
NIC device in QEMU was over complicated, by varying naming scheme
based on the NIC model and backend type. This simplifies the naming
to simply be 'net0' and 'hostnet0', allowing code to easily determine
the host network name and vlan based off the primary device alias
name 'net0'. This in turn allows removal of a lot of QEMU specific
code from the XML parser, and makes it easier to assign new unique
names for NICs that are hotplugged
* src/conf/domain_conf.c, src/conf/domain_conf.h: Remove hostnet_name
and vlan fields from virNetworkDefPtr
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h, src/qemu/qemu_driver.c:
Use a single network alias naming scheme regardless of NIC type
or backend type. Determine VLANs from the alias name.
* tests/qemuxml2argvdata/qemuxml2argv-net-eth-names.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.args: Update
for new simpler naming scheme
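A rough sketch of deriving the host side name from the primary alias.
The helper names are hypothetical, and treating the numeric suffix as
the value a VLAN number could be derived from is an assumption, not
something stated above.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return N for an alias of the form "netN", or -1 if it does not match. */
int net_alias_index(const char *alias)
{
    char *end;
    long idx;

    if (strncmp(alias, "net", 3) != 0)
        return -1;
    if (alias[3] < '0' || alias[3] > '9')
        return -1;
    idx = strtol(alias + 3, &end, 10);
    if (*end != '\0' || idx > INT_MAX)
        return -1;
    return (int)idx;
}

/* Write "hostnetN" for the alias "netN" into buf. Returns 0 on success,
 * -1 on a malformed alias or a buffer that is too small. */
int host_net_name(const char *alias, char *buf, size_t len)
{
    int idx = net_alias_index(alias);
    int n;

    if (idx < 0)
        return -1;
    n = snprintf(buf, len, "hostnet%d", idx);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```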
The QEMU 0.12.x tree has the -netdev command line argument, but not
corresponding monitor command. We can't enable the former, without
the latter since it will break hotplug/unplug.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Disable -netdev usage
until 0.13 at earliest
* tests/qemuxml2argvtest.c: Add test for -netdev syntax
* tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.xml: Test
data files for -netdev syntax
PCI disk, disk controllers, net devices and host devices need to
have PCI addresses assigned before they are hot-plugged
* src/qemu/qemu_conf.c: Add APIs for ensuring a device has an
address and releasing unused addresses
* src/qemu/qemu_driver.c: Ensure all devices have addresses
when hotplugging.
The current QEMU code allocates PCI addresses incrementally starting
at 4. This is not satisfactory because the user may have given some
addresses in their XML config, which need to be skipped over when
allocating addresses to remaining devices.
It is thus necessary to maintain a list of already allocated PCI
addresses and then only allocate ones that remain unused. This is
also required for domain device hotplug to work properly later.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Add APIs for creating
list of existing PCI addresses, and allocating new addresses.
Refactor address assignment to use this code
* src/qemu/qemu_driver.c: Pull PCI address assignment up into the
qemuStartVMDaemon() method, as a prelude to moving it into the
'define' method. Update list of allocated addresses when connecting
to a running VM at daemon startup.
* tests/qemuxml2argvtest.c, tests/qemuargv2xmltest.c,
tests/qemuxml2xmltest.c: Remove USB product test since all
passthrough is done based on address
* tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-product.args,
tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-product.xml: Kill
unused data files
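The "track used addresses, then allocate only unused ones" idea can be
sketched on a single bus with 32 slots. The structure and function names
are hypothetical, and starting the search at slot 1 is an assumption
made for the example.

```c
#include <stdbool.h>

#define PCI_SLOTS 32

/* Which slots on the (single, assumed) bus are already taken. */
struct pci_addr_set {
    bool used[PCI_SLOTS];
};

/* Record a slot given explicitly in the XML. Returns -1 if the slot is
 * out of range or already in use. */
int pci_reserve_slot(struct pci_addr_set *set, int slot)
{
    if (slot < 0 || slot >= PCI_SLOTS || set->used[slot])
        return -1;
    set->used[slot] = true;
    return 0;
}

/* Allocate the lowest unused slot, skipping reserved ones. Returns the
 * slot number, or -1 when the bus is full. */
int pci_alloc_next_slot(struct pci_addr_set *set)
{
    for (int slot = 1; slot < PCI_SLOTS; slot++) {
        if (!set->used[slot]) {
            set->used[slot] = true;
            return slot;
        }
    }
    return -1;
}
```

Reserving the user-specified slots first and then allocating for the
remaining devices gives the skip-over behaviour described above.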
The virDomainDeviceInfoIterate() function will provide a
convenient way to iterate over all devices in a domain.
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/libvirt_private.syms: Add virDomainDeviceInfoIterate()
function.
Since QEMU startup uses the new -device argument, the hotplug
code needs to do the same. This converts disk, network and
host device hotplug to use the device_add command
* src/qemu/qemu_driver.c: Use new device_add monitor APIs
wherever possible
The way QEMU is started has been changed to use '-device' and
the new style '-drive' syntax. This needs to be mirrored in
the hotplug code, requiring addition of two new APIs.
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Define APIs
qemuMonitorAddDevice() and qemuMonitorAddDrive()
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Implement the new monitor APIs
To allow for better code reuse from hotplug methods, the code for
generating PCI/USB hostdev arg values is split out into separate
methods
* qemu/qemu_conf.h, qemu/qemu_conf.c: Introduce new APIs for
qemuBuildPCIHostdevPCIDevStr, qemuBuildUSBHostdevUsbDevStr
and qemuBuildUSBHostdevDevStr
All the helper functions for building command line arguments
now return a 'char *', instead of accepting a 'char **' or
virBufferPtr argument
* qemu/qemu_conf.c: Standardize syntax for building args
* qemu/qemu_conf.h: Export all functions for building args
* qemu/qemu_driver.c: Update for changed syntax for building
NIC/hostnet args
udevGetUintProperty was called with base set to 0 for busnum and devnum.
With base 0 strtoul parses the number as octal if it starts with a 0. But
busnum and devnum are decimal and udev returns them padded with leading
zeros. So strtoul parses them as octal. This works for certain decimal
values like 001-007, but fails for values like 008.
Change udevProcessUSBDevice to use base 10 for busnum and devnum.
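The base 0 versus base 10 behaviour is easy to reproduce with strtoul
directly. The wrapper parse_uint below is a hypothetical helper that,
like a strict parser would, rejects input with trailing characters.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse the whole string as an unsigned int in the given base.
 * Returns 0 on success, -1 on overflow, empty input or trailing junk. */
int parse_uint(const char *s, int base, unsigned int *out)
{
    char *end;
    unsigned long val;

    errno = 0;
    val = strtoul(s, &end, base);
    if (errno != 0 || end == s || *end != '\0' || val > UINT_MAX)
        return -1;
    *out = (unsigned int)val;
    return 0;
}
```

With base 0, "007" parses as 7 and "010" as 8 (octal), while "008" is
rejected because 8 is not an octal digit. With base 10 all three parse
as the decimal values udev intends.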
* src/util/util.c (virGetUserID, virGetGroupID): In the unlikely event
that sysconf(_SC_GETPW_R_SIZE_MAX) fails, don't use -1 as the size in
the subsequent allocation.
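A minimal sketch of guarding the sysconf result before allocating. The
fallback size of 1024 and the helper names are assumptions for
illustration.

```c
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback when the system gives no size hint. */
#define PW_BUF_FALLBACK 1024

/* sysconf() returns -1 when the limit is indeterminate or on error.
 * Converting that to size_t would request a huge allocation. */
size_t pw_buf_size(void)
{
    long val = sysconf(_SC_GETPW_R_SIZE_MAX);

    return val <= 0 ? (size_t)PW_BUF_FALLBACK : (size_t)val;
}

/* Allocate a buffer suitable for getpwnam_r()/getgrnam_r() style calls. */
char *alloc_pw_buf(size_t *len)
{
    *len = pw_buf_size();
    return malloc(*len);
}
```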
Similar to the race fixed by
be34c3c7ef, make sure
to wait around for KVM to release the resources from
a hot-detached PCI device before attempting to
rebind that device to the host driver.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The QEMU driver contained code to generate a -device string for piix4-ide, but
wasn't using it. This change removes this string generation. It also adds a
comment explaining why IDE and FDC controllers don't generate -device strings.
The change also generates an error if a sata controller is specified for a QEMU
domain, as this isn't supported.
* src/qemu/qemu_conf.c: Remove VIR_DOMAIN_CONTROLLER_TYPE_IDE handler in
qemuBuildControllerDevStr(). Ignore IDE and FDC controllers. Error if
SATA controller is discovered. Add comments.
On RHEL-5 the qemu-kvm binary is located in /usr/libexec.
To reduce confusion for people trying to run upstream libvirt
on RHEL-5 machines, make the qemu driver look in /usr/libexec
for the qemu-kvm binary.
To make this work, I modified virFindFileInPath to handle an
absolute path correctly. I also ran into an issue where
NULL was sometimes being passed for the file parameter
to virFindFileInPath; it didn't crash prior to this patch
since it was building paths like /usr/bin/(null). This
is non-standard behavior, though, so I added a NULL
check at the beginning.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
* src/util/util.c (virGetUserEnt): In the unlikely event that
sysconf(_SC_GETPW_R_SIZE_MAX) fails, don't use -1 as the size in
the subsequent allocation.
xen-unstable c/s 20762 bumped XEN_SYSCTL_INTERFACE_VERSION to 7. The
interface change does not affect libvirt, other than xenHypervisorInit()
failing since version 7 is not tried.
The attached patch accommodates the upcoming Xen 4.0 release by checking
for XEN_SYSCTL_INTERFACE_VERSION 7. If found, it sets
XEN_DOMCTL_INTERFACE_VERSION to 6, which is also new to Xen 4.0.
it causes a NULL-dereference on some systems like Solaris 10.
* src/node_device/node_device_linux_sysfs.c: Include <stdlib.h>.
(get_sriov_function): Use canonicalize_file_name, not realpath.
* bootstrap (modules): Add canonicalize-lgpl.
This fixes a segfault when the event handler is called after shutdown
when the global driver state is NULL again.
Also fix a locking issue in an error path.
virFileMakePath is a recursive function that created a buffer
PATH_MAX bytes long for each recursion (one recursion for each element
in the path). This changes it to have no buffers on the stack, and to
allocate just one buffer total, no matter how many elements are in the
path. Because the modified algorithm requires a char* to be passed in
rather than const char *, it is now 2 functions - a toplevel API
function that remains identical in function, and a 2nd helper function
called for the recursions, which 1) doesn't allocate anything, and 2)
takes a char* arg, so it can modify the contents.
* src/util/util.c: rewrite virFileMakePath
This reverts commit cdc42d0a48.
As DanB pointed out, this patch is actually wrong. The real
bug that was causing me to see this problem is a bug
introduced in a RHEL-5 libvirt snapshot, and I'm going to
fix the real bug there.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
If you shutdown libvirtd while a domain with PCI
devices is running, then try to restart libvirtd,
libvirtd will crash.
This happens because qemuUpdateActivePciHostdevs() is calling
pciDeviceListSteal() with a dev of 0x0 (NULL), and then trying
to dereference it. This patch fixes it up so that
qemuUpdateActivePciHostdevs() steals the devices after first
Get()'ting them, avoiding the crash.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
* src/qemu/qemu_monitor_text.c (qemuMonitorTextAttachDrive): Most other
failures in this function would "goto cleanup", but one mistakenly
returned directly, skipping the cleanup and resulting in a leak.
In addition, iterating the "try_command" loop would clobber, and
thus leak, the "cmd" allocated on the first iteration,
so be careful to free it in addition to "reply" beforehand.
The KVM build of QEMU includes the thread ID of each vCPU in the
'query-cpus' output. This is required for pinning guests to
particular host CPUs
* src/qemu/qemu_monitor_json.c: Extract 'thread_id' from CPU info
* src/util/json.c, src/util/json.h: Declare returned strings
to be const
* src/qemu/qemu_monitor.c: Wire up JSON mode for qemuMonitorGetPtyPaths
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Fix
const correctness. Add missing error message in the function
qemuMonitorJSONGetAllPCIAddresses. Add implementation of the
qemuMonitorGetPtyPaths function calling 'query-chardev'.
Two files were using functions from <sys/stat.h> but not including
it. Most of the time they got this automatically via another header,
but certain build flag combinations can reveal the problem
* src/lxc/lxc_container.c, src/node_device/node_device_linux_sysfs.c:
Add <sys/stat.h>
The <console> tag is supposed to result in addition of a single
<serial> device for HVM guests. The 'targetType' attribute was
missing though causing the compatibility code to add a second
<console> device
* src/conf/domain_conf.c: Set targetType for serial device
When libvirtd shuts down, it places a <state/> tag in the XML
state file it writes out for guests with PCI passthrough
devices. For devices that are attached at bootup time, the
state tag is empty. However, at libvirtd startup time, it
ignores anything with a <state/> tag in the XML, effectively
hiding the guest.
This patch removes the check for VIR_DOMAIN_XML_INTERNAL_STATUS
when parsing the XML.
* src/conf/domain_conf.c: remove VIR_DOMAIN_XML_INTERNAL_STATUS
flag check in virDomainHostdevSubsysPciDefParseXML()
Certain hypervisors (like qemu/kvm) map the PCI bar(s) on
the host when doing device passthrough. This can lead to a race
condition where the hypervisor is still cleaning up the device while
libvirt is trying to re-attach it to the host device driver. To avoid
this situation, we look through /proc/iomem, and if the hypervisor is
still holding onto the bar (denoted by the string in the matcher variable),
then we can wait around a bit for that to clear up.
v2: Thanks to review by DV, make sure we wait the full timeout per-device
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The patches to add ACS checking to PCI device passthrough
introduced a bug. With the current code, if you try to
passthrough a device on the root bus (i.e. bus 0), then
it denies the passthrough. This is because the code in
pciDeviceIsBehindSwitchLackingACS() to check for a parent
device doesn't take into account the possibility of the
root bus. If we are on the root bus, it means we
legitimately can't find a parent, and it also means that
we don't have to worry about whether ACS is enabled.
Therefore return 0 (indicating we don't lack ACS) from
pciDeviceIsBehindSwitchLackingACS().
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Fix a small problem with the qemu memory stats parsing algorithm. If qemu
reports a stat that libvirt does not recognize, skip past it so parsing can
continue. This corrects a potential infinite loop in the parsing code that can
only be triggered if new statistics are added to qemu.
* src/qemu/qemu_monitor_text.c: qemuMonitorParseExtraBalloonInfo add a
skip for extra ','
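The skip-unknown-field fix can be sketched on a simplified
comma-separated "key=value" string. The field names and the function
parse_balloon_stats are hypothetical; the real monitor output format is
not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "key=value,key=value,..." and pick out the fields we know.
 * Unknown fields are skipped up to the next ',' so the loop always
 * makes progress. Returns the number of recognized fields. */
int parse_balloon_stats(const char *text, unsigned long *actual,
                        unsigned long *free_mem)
{
    const char *p = text;
    char *end;
    int found = 0;

    while (*p != '\0') {
        if (strncmp(p, "actual=", 7) == 0) {
            *actual = strtoul(p + 7, &end, 10);
            found++;
            p = end;
        } else if (strncmp(p, "free_mem=", 9) == 0) {
            *free_mem = strtoul(p + 9, &end, 10);
            found++;
            p = end;
        } else {
            p += strcspn(p, ",");   /* unrecognized stat: skip past it */
        }
        if (*p == ',')
            p++;                    /* step over the separator */
    }
    return found;
}
```

Without the final else branch, an unrecognized field would leave p
unchanged and the loop would never terminate.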
The loop looking for the controller associated with a SCSI drive had
an off by one, causing it to miss the last controller.
* src/qemu/qemu_driver.c: Fix off-by-1 in searching for SCSI
drive hotplug
The hotplug code in QEMU was leaking memory because although the
inner device object was being moved into the main virDomainDefPtr
config object, the outer container virDomainDeviceDefPtr was not.
* src/qemu/qemu_driver.c: Clarify code to show that the inner
device object is owned by the main domain config upon
successful attach.
Add the ability to turn off dynamic management of file permissions
for libvirt guests.
* qemu/libvirtd_qemu.aug: Support 'dynamic_ownership' flag
* qemu/qemu.conf: Document 'dynamic_ownership' flag.
* qemu/qemu_conf.c: Load 'dynamic_ownership' flag
* qemu/test_libvirtd_qemu.aug: Test 'dynamic_ownership' flag
The hotplug code was not correctly invoking the security driver
in error paths. If a hotplug attempt failed, the device would
be left with VM permissions applied, rather than restored to the
original permissions. Also, a CDROM media that is ejected was
not restored to original permissions. Finally there was a bogus
call to set hostdev permissions in the hostdev unplug code
* qemu/qemu_driver.c: Fix security driver usage in hotplug/unplug
If there is a problem with VM startup, PCI devices may be left
assigned to pci-stub / pci-back. Adding a call to reattach
host devices in the cleanup path is required.
* qemu/qemu_driver.c: qemuDomainReAttachHostDevices() when
VM startup fails
Remove all the QEMU driver calls for setting file ownership and
process uid/gid. Instead wire in the QEMU DAC security driver,
stacking it on top of the primary SELinux/AppArmor driver.
* qemu/qemu_driver.c: Switch over to new DAC security driver
This new security driver is responsible for managing UID/GID changes
to the QEMU process, and any files/disks/devices assigned to it.
* qemu/qemu_conf.h: Add flag for disabling automatic file permission
changes
* qemu/qemu_security_dac.h, qemu/qemu_security_dac.c: New DAC driver
for QEMU guests
* Makefile.am: Add new files
Pulling the disk labelling code out of the exec hook, and into
libvirtd will allow it to access shared state in the daemon. It
will also make debugging & error reporting easier / more reliable.
* qemu/qemu_driver.c: Move initial disk labelling calls up into
libvirtd. Add cleanup of disk labels upon failure
If a VM fails to start, we can't simply free the security label
strings, we must call the domainReleaseSecurityLabel() method
otherwise the reserved 'mcs' level will be leaked in SELinux
* src/qemu/qemu_driver.c: Invoke domainReleaseSecurityLabel()
when domain fails to start
The current security driver architecture has the following
split of logic
* domainGenSecurityLabel
Allocate the unique label for the domain about to be started
* domainGetSecurityLabel
Retrieve the current live security label for a process
* domainSetSecurityLabel
Apply the previously allocated label to the current process
Setup all disk image / device labelling
* domainRestoreSecurityLabel
Restore the original disk image / device labelling.
Release the unique label for the domain
The 'domainSetSecurityLabel' method is special because it runs
in the context of the child process between the fork + exec.
This is required in order to set the process label. It is not
required in order to label disks/devices though. Having the
disk labelling code run in the child process limits what it
can do.
In particular, libvirtd would like to remember the current
disk image label, and only change shared image labels for the
first VM to start. This requires use & update of global state
in the libvirtd daemon, and thus cannot run in the child
process context.
The solution is to split domainSetSecurityLabel into two parts,
one applies process label, and the other handles disk image
labelling. At the same time domainRestoreSecurityLabel is
similarly split, just so that it matches the style. Thus the
previous 4 methods are replaced by the following 6 new methods
* domainGenSecurityLabel
Allocate the unique label for the domain about to be started
No actual change here.
* domainReleaseSecurityLabel
Release the unique label for the domain
* domainGetSecurityProcessLabel
Retrieve the current live security label for a process
Merely renamed for clarity.
* domainSetSecurityProcessLabel
Apply the previously allocated label to the current process
* domainRestoreSecurityAllLabel
Restore the original disk image / device labelling.
* domainSetSecurityAllLabel
Setup all disk image / device labelling
The SELinux and AppArmor drivers are then updated to comply with
this new spec. Notice that the AppArmor driver was actually a
little different. It was creating its profile for the disk image
and device labels in the 'domainGenSecurityLabel' method, whereas
the SELinux driver did it in 'domainSetSecurityLabel'. With the
new method split, we can have consistency, with both drivers doing
that in the domainSetSecurityAllLabel method.
NB, the AppArmour changes here haven't been compiled so may not
build.
The QEMU driver is doing 90% of the calls to check for static vs
dynamic labelling. Except it is forgetting to do so in many places,
in particular hotplug is mistakenly assigning disk labels. Move
all this logic into the security drivers themselves, so the HV
drivers don't have to think about it.
* src/security/security_driver.h: Add virDomainObjPtr parameter
to virSecurityDomainRestoreHostdevLabel and to
virSecurityDomainRestoreSavedStateLabel
* src/security/security_selinux.c, src/security/security_apparmor.c:
Add explicit checks for VIR_DOMAIN_SECLABEL_STATIC and skip all
chcon() code in those cases
* src/qemu/qemu_driver.c: Remove all checks for VIR_DOMAIN_SECLABEL_STATIC
or VIR_DOMAIN_SECLABEL_DYNAMIC. Add missing checks for possibly NULL
driver entry points.
Allows the initiator to use a variety of IQNs rather than just the
system IQN when creating iSCSI pools.
* docs/schemas/storagepool.rng: extend the syntax with <iqn name="..."/>
* src/conf/storage_conf.[ch]: read and store the iqn name
* src/storage/storage_backend_iscsi.[ch]: implement the IQN selection
when detected
* src/lxc/lxc_container.c src/lxc/lxc_controller.c src/lxc/lxc_driver.c
src/network/bridge_driver.c src/qemu/qemu_driver.c
src/uml/uml_driver.c: virFileMakePath returns 0 for success, or the
value of errno on failure, so error checking should be to test
if non-zero, not if lower than 0
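The difference between the two checks can be illustrated with a minimal
sketch. The helper make_dir_errno_style() below is a hypothetical stand-in
that only follows the same return convention (0 on success, errno value on
failure); it is not virFileMakePath itself.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in following the convention described above:
 * returns 0 on success, or a positive errno value on failure.
 * An already existing directory is treated as success here. */
static int make_dir_errno_style(const char *path)
{
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return errno;
    return 0;
}

/* Old style check: errno values are positive, so "< 0" never fires. */
static int failed_old_check(const char *path)
{
    return make_dir_errno_style(path) < 0;
}

/* Corrected check: any non-zero return value is a failure. */
static int failed_new_check(const char *path)
{
    return make_dir_errno_style(path) != 0;
}
```

With a path whose parent does not exist, the helper returns ENOENT; the
old check silently treats that as success, while the corrected check
reports the failure.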
Previously the uid/gid/mode in the xml was ignored when creating new
storage pool directories. This commit attempts to honor the requested
permissions, and spits out an error if it can't.
Note that when creating the directory, the rest of the path leading up
to the final element is created using current uid/gid/mode, and the
final element gets the settings from xml. It is NOT an error for the
directory to already exist; in this case, the perms for the existing
directory are just set (if necessary).
* src/storage/storage_backend_fs.c: update the virStorageBackendFileSystemBuild
function to check the directory hierarchy separately then create the
leaf directory with the right attributes
In order to avoid problems trying to chown files that were created by
root on a root-squashing nfs server, fork a new process that setuid's
to the desired uid before creating the file. (It's only done this way
if the pool containing the new volume is of type 'netfs', otherwise
the old method of creating the file followed by chown() is used.)
This changes the semantics of the "create_func" slightly - previously
it was assumed that this function just created the file, then the
caller would chown it to the desired uid. Now, create_func does both
operations.
There are multiple functions that can take on the role of create_func:
createFileDir - previously called mkdir(), now calls virDirCreate().
virStorageBackendCreateRaw - previously called open(),
now calls virFileCreate().
virStorageBackendCreateQemuImg - use virRunWithHook() to setuid/gid.
virStorageBackendCreateQcowCreate - same.
virStorageBackendCreateBlockFrom - preserve old behavior (but attempt
chown when necessary even if not root)
* src/storage/storage_backend.[ch] src/storage/storage_backend_disk.c
src/storage/storage_backend_fs.c src/storage/storage_backend_logical.c
src/storage/storage_driver.c: change the create_func implementations,
also propagate the pool information to be able to detect NETFS ones.
These functions create a new file or directory with the given
uid/gid. If the flag VIR_FILE_CREATE_AS_UID is given, they do this by
forking a new process, calling setuid/setgid in the new process, and
then creating the file. This is better than simply calling open then
fchown, because in the latter case, a root-squashing nfs server would
create the new file as user nobody, then refuse to allow fchown.
If VIR_FILE_CREATE_AS_UID is not specified, the simpler tactic of
creating the file/dir, then chowning it is used. This gives better
results in cases where the parent directory isn't on a root-squashing
NFS server, but doesn't give permission for the specified uid/gid to
create files. (Note that if the fork/setuid method fails to create the
file due to access privileges, the parent process will make a second
attempt using this simpler method.)
If the bit VIR_FILE_CREATE_ALLOW_EXIST is set in the flags, an
existing file/directory will not cause an error; in this case, the
function will simply set the permissions of the file/directory to
those requested. If VIR_FILE_CREATE_ALLOW_EXIST is not specified, an
existing file/directory is considered (and reported as) an error.
Return from both of these functions is 0 on success, or the value of
errno if there was a failure.
* src/util/util.[ch]: add the 2 new util functions
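The fork/setuid tactic described above can be sketched roughly as follows.
This is an illustration under assumptions, not the code added to
src/util/util.c: the name create_file_as() is hypothetical, the flags
handling and the fallback to the simpler method are left out, and the
errno value is passed back through the child's exit status (so it must
fit in 8 bits).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create 'path' from a forked child running as
 * uid/gid. Returns 0 on success, or an errno value on failure. */
static int create_file_as(const char *path, mode_t mode, uid_t uid, gid_t gid)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return errno;

    if (pid == 0) {
        /* Child: change group first, then user, then create the file,
         * so the file is owned by uid/gid from the moment it exists. */
        int fd;

        if (gid != getegid() && setgid(gid) < 0)
            _exit(errno);
        if (uid != geteuid() && setuid(uid) < 0)
            _exit(errno);
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
        if (fd < 0)
            _exit(errno);
        close(fd);
        _exit(0);
    }

    /* Parent: the child's exit status carries 0 or the errno value. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return errno;
    }
    if (!WIFEXITED(status))
        return EIO; /* arbitrary choice for an abnormal child exit */
    return WEXITSTATUS(status);
}
```

Because the file is created by a process that already runs as the target
uid, no chown is needed afterwards, which is the step a root-squashing
NFS server would refuse.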
The test expected all environment variables copied in qemudBuildCommandLine
to have known values. So all of them have to be either set to a known value
or be unset. SDL_VIDEODRIVER and QEMU_AUDIO_DRV are not handled at all but
should be handled. Unset both, otherwise the test will fail if they are set
in the testing environment.
* src/qemu/qemu_conf.c: add a comment about copied environment variables
and qemuxml2argvtest
* tests/qemuxml2argvtest.c: unset SDL_VIDEODRIVER and QEMU_AUDIO_DRV
* src/conf/domain_conf.c (virDomainChrDefFormat): Plug a leak on
an error path, and at the same time, eliminate the need for a
"cleanup:" block. Before, the "return -1" after the switch
would leak an "addr" string. Now, by reversing the port,addr-
getting blocks we can free "addr" immediately and skip the goto.
The 'int virInterfaceIsActive()' method was directly returning the
value of the 'int active:1' bitfield in virInterfaceDefPtr. A 1-bit
signed integer bitfield holds the values 0 and -1, not 0 and +1
as might be expected. This meant that virInterfaceIsActive() was
always returning -1 when the interface was active, not +1 & thus all
callers thought an error had occurred. To protect against this kind
of mistake again, change all bitfields to be unsigned ints
* daemon/libvirtd.h, src/conf/domain_conf.h, src/conf/interface_conf.h,
src/conf/network_conf.h: Change bitfields to unsigned int.
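A small sketch of the bitfield behaviour behind this fix. The struct and
function names are made up for illustration, and the -1 result relies on
how GCC and Clang convert an out-of-range value into a 1-bit signed
field (the C standard leaves that conversion implementation-defined).

```c
/* Illustrative structs only; not the libvirt definitions. */
struct iface_signed {
    signed int active : 1;    /* 1-bit signed field: holds 0 or -1 */
};

struct iface_unsigned {
    unsigned int active : 1;  /* 1-bit unsigned field: holds 0 or 1 */
};

/* Stores a 0/1 flag in the signed bitfield and reads it back. */
static int is_active_signed(int on)
{
    struct iface_signed def;

    def.active = on;
    return def.active;        /* -1 when on == 1 with GCC/Clang */
}

/* Same operation with the unsigned bitfield. */
static int is_active_unsigned(int on)
{
    struct iface_unsigned def;

    def.active = on;
    return def.active;        /* 1 when on == 1 */
}
```

A caller that treats a negative return value as an error would misread
the signed variant as a failure whenever the flag is set.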
Invoking the virConnectGetCapabilities() method causes the QEMU
driver to rebuild its internal capabilities object. Unfortunately
it was forgetting to register the custom domain status XML hooks
again.
To avoid this kind of error in the future, the code which builds
capabilities is refactored into one single method, which can be
called from all locations, ensuring reliable rebuilds.
* src/qemu/qemu_driver.c: Fix rebuilding of capabilities XML and
guarantee it is always consistent
* src/util/logging.c (virLogMessage): Include "ignore-value.h".
Use it to ignore the return value of safewrite.
Use STDERR_FILENO, rather than "2".
* bootstrap (modules): Add ignore-value.
* gnulib: Update to latest, for ignore-value that is now LGPLv2+.
This was accomplished in xml parsing by doing away with the
stripped-down virInterfaceBareDef object, and just always using
virInterfaceDef, but with restrictions in certain places (eg, the type
of subordinate interface allowed in parsing depends on the parent
interface).
xml formatting was similarly adjusted. In addition, the formatting
functions keep track of the level of interface nesting, and insert
extra leading spaces on each line accordingly (using %*s).
The only change in formatted xml from previous (aside from supporting
new combinations of interface types) is that the subordinate ethernet
interfaces take up 2 lines rather than one, eg:
<interface type='ethernet' name='eth0'>
</interface>
instead of:
<interface type='ethernet' name='eth0'/>
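The %*s indentation technique mentioned above can be sketched like this.
The function format_interface() and the two-spaces-per-level choice are
assumptions for illustration, not the actual formatting code.

```c
#include <stdio.h>

/* Hypothetical formatter: writes a two-line <interface> element into
 * buf, indented according to its nesting level. "%*s" with an empty
 * string argument emits exactly 'width' spaces. Returns the value of
 * snprintf(). */
static int format_interface(char *buf, size_t len, const char *type,
                            const char *name, int level)
{
    int indent = level * 2; /* assumed: two spaces per nesting level */

    return snprintf(buf, len,
                    "%*s<interface type='%s' name='%s'>\n"
                    "%*s</interface>\n",
                    indent, "", type, name,
                    indent, "");
}
```

Calling it with level 0 produces the element at the left margin; each
additional level shifts both lines right by the same amount, so nested
subordinate interfaces line up under their parent.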
I noticed some debug messages are printed with an empty line after
them. This patch removes these empty lines from all invocations of the
following macros:
VIR_DEBUG
VIR_DEBUG0
VIR_ERROR
VIR_ERROR0
VIR_INFO
VIR_WARN
VIR_WARN0
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
New pciDeviceIsAssignable() function for checking whether a given PCI
device can be assigned to a guest was added. Currently it only checks
for ACS being enabled on all PCIe switches between root and the PCI
device. In the future, it could be the right place to check whether a
device is unbound or bound to a stub driver.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Currently CPU topology may only be specified together with CPU model:
<cpu match='exact'>
<model>name</model>
<topology sockets='1' cores='2' threads='3'/>
</cpu>
This patch allows for CPU topology specification without the need for
also specifying CPU model:
<cpu>
<topology sockets='1' cores='2' threads='3'/>
</cpu>
'match' attribute and 'model' element are made optional with the
restriction that 'match' attribute has to be set when 'model' is
present.
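The restriction can be expressed as a simple check. The struct below is a
hypothetical, simplified view of a parsed <cpu> element, not the real
definition, and a 'match' attribute without a 'model' is accepted here
only because the text above does not forbid it.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a parsed <cpu> element. */
struct cpu_def {
    const char *match;   /* NULL when the attribute is absent */
    const char *model;   /* NULL when the element is absent */
    unsigned int sockets;
    unsigned int cores;
    unsigned int threads;
};

/* Returns 0 if the definition is acceptable, -1 otherwise.
 * Rule: 'match' has to be set whenever 'model' is present. */
static int cpu_def_validate(const struct cpu_def *def)
{
    if (def->model != NULL && def->match == NULL)
        return -1;
    return 0;
}
```

A topology-only definition (no 'match', no 'model') passes, as does a
full definition with both; only a 'model' without 'match' is rejected.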
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>