There was missing capability for blkiotune and thus specifying these
settings caused libvirt to run qemu with invalid parameters and then
reporting qemu error instead of the standard libvirt one. The support
for blkiotune setting was added in upstream qemu repo under commit
0563e191516289c9d2f282a8c50f2eecef2fa773.
Given an LXC guest with a root filesystem path of
/export/lxc/roots/helloworld/root
During startup, we will pivot the root filesystem to end up
at
/.oldroot/export/lxc/roots/helloworld/root
We then try to open
/.oldroot/export/lxc/roots/helloworld/root/dev/pts
Now consider if '/export/lxc' is an absolute symlink pointing
to '/media/lxc'. The kernel will try to open
/media/lxc/roots/helloworld/root/dev/pts
whereas it should be trying to open
/.oldroot//media/lxc/roots/helloworld/root/dev/pts
To deal with the fact that the root filesystem can be moved,
we need to resolve symlinks in *any* part of the filesystem
source path.
* src/libvirt_private.syms, src/util/util.c,
src/util/util.h: Add virFileResolveAllLinks to resolve
all symlinks in a path
* src/lxc/lxc_container.c: Resolve all symlinks in filesystem
paths during startup
pciTrySecondaryBusReset checks if there is active device on the
same bus, however, qemu driver doesn't maintain an effective
list for the inactive devices, and it passes meaningless argument
for parameter "inactiveDevs". e.g. (qemuPrepareHostdevPCIDevices)
if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
return -1;
..skipped...
if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
goto reattachdevs;
NB, the "pcidevs" used above are extracted from domain def, and
thus one won't be able to attach a device of which bus has other
device even detached from host (nodedev-detach). To see more
details of the problem:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=773667
This patch is to resolve the problem by introducing an inactive
PCI device list (just like qemu_driver->activePciHostdevs), and
the whole logic is:
* Add the device to inactive list during nodedev-dettach
* Remove the device from inactive list during nodedev-reattach
* Remove the device from inactive list during attach-device
(for non-managed device)
* Add the device to inactive list after detach-device, only
if the device is not managed
With the above, we have a sufficient inactive PCI device list, and thus
we can use it for pciResetDevice. e.g.(qemuPrepareHostdevPCIDevices)
if (pciResetDevice(dev, driver->activePciHostdevs,
driver->inactivePciHostdevs) < 0)
goto reattachdevs;
Execute bit on *.stp files in examples/systemtap/ caused dependency when
building RPM packages. Disabling execute permission should help the auto
dependency resolver to see that systemtap is not needed.
This introduces new attribute wrpolicy with only supported
value as immediate. This will be an optional
attribute with no defaults. This helps specify whether
to skip the host page cache.
When wrpolicy is specified, meaning when wrpolicy=immediate
a writeback is explicitly initiated for the dirty pages in
the host page cache as part of the guest file write operation.
Usage:
<filesystem type='mount' accessmode='passthrough'>
<driver type='path' wrpolicy='immediate'/>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
Currently this only works with type='mount' for the QEMU/KVM driver.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
In the past we didn't reserve 0:0:2.0 PCI address if there was no video
device assigned to a domain, which made it impossible to add a video
device later on. So we fixed it (commit v0.9.0-37-g7b2cac1) by always
reserving that address. However, that breaks existing domains without
video devices that already have another device assigned to the
problematic address.
This patch reserves address 0:0:2.0 only in case it was not explicitly
assigned to another device, which means libvirt will try to keep this
address free and will not automatically assign it new devices. But
existing domains for which older libvirt already assigned the address to
a non-video device will keep working as they used to work before 0.9.1.
Moreover, users who want to create a domain without a video device and
use its address for another device may do so by explicitly configuring
the PCI address in domain XML.
qemuxml2argvtest sanitizes PATH to just /bin, but on at least
Fedora 16, dirname lives in /usr/bin instead. Regression
introduced in commit e7201afd.
* tests/qemuxml2argvdata/qemu.sh: Avoid forking a dirname call,
since dirname might not be in PATH after test sanitization.
* tests/qemuxml2argvdata/qemu-supported-cpus.sh: Likewise.
Diagnosed by Michal Privoznik.
Fix a typing error in the no-ip-spoofing filter.
Return DHCP request packets passing through this filter. Have
the user use another filter to actually allow DHCP requests to be
sent (action='accept').
There are several reasons for doing this:
- the CPU specification is out of libvirt's control so we cannot
guarantee stable guest ABI
- not every feature of a CPU may actually work as expected when
advertised directly to a guest
- migration between two machines with exactly the same CPU may work but
no guarantees can be made
- this mode is not supported and its use is at one's own risk
VIR_DOMAIN_XML_UPDATE_CPU flag for virDomainGetXMLDesc may be used to
get updated custom mode guest CPU definition in case it depends on host
CPU. This patch implements the same behavior for host-model and
host-passthrough CPU modes.
The mode can be either of "custom" (default), "host-model",
"host-passthrough". The semantics of each mode is described in the
following examples:
- guest CPU is a default model with specified topology:
<cpu>
<topology sockets='1' cores='2' threads='1'/>
</cpu>
- guest CPU matches selected model:
<cpu mode='custom' match='exact'>
<model>core2duo</model>
</cpu>
- guest CPU should be a copy of host CPU as advertised by capabilities
XML (this is a short cut for manually copying host CPU specification
from capabilities to domain XML):
<cpu mode='host-model'/>
In case a hypervisor does not support the exact host model, libvirt
automatically falls back to a closest supported CPU model and
removes/adds features to match host. This behavior can be disabled by
<cpu mode='host-model'>
<model fallback='forbid'/>
</cpu>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-model' match='exact'>
<model fallback='allow'>Penryn</model> --+
<vendor>Intel</vendor> |
<topology sockets='2' cores='4' threads='1'/> + copied from
<feature policy='require' name='dca'/> | capabilities XML
<feature policy='require' name='xtpr'/> |
... --+
</cpu>
- guest CPU should be exactly the same as host CPU even in the aspects
libvirt doesn't model (such domain cannot be migrated unless both
hosts contain exactly the same CPUs):
<cpu mode='host-passthrough'/>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-passthrough' match='minimal'>
<model>Penryn</model> --+ copied from caps
<vendor>Intel</vendor> | XML but doesn't
<topology sockets='2' cores='4' threads='1'/> | describe all
<feature policy='require' name='dca'/> | aspects of the
<feature policy='require' name='xtpr'/> | actual guest CPU
... --+
</cpu>
In case a hypervisor doesn't support the exact CPU model requested by a
domain XML, we automatically fallback to a closest CPU model the
hypervisor supports (and make sure we add/remove any additional features
if needed). This patch adds 'fallback' attribute to model element, which
can be used to disable this automatic fallback.
It's not totally obvious that a failure in
CPU guest data(x86): host/guest (models, pref="qemu64")
test means one needs to fix
x86-host+guest,models,qemu64-result.xml
where the expected XML is stored. Better to provide a nice hint in
verbose mode for failed tests.
Commit 5d784bd6d7 was a nice attempt to
clarify the semantics by requiring domain name from dxml to either match
original name or dname. However, setting dxml domain name to dname
doesn't really work since destination host needs to know the original
domain name to be able to use it in migration cookies. This patch
requires domain name in dxml to match the original domain name. The
change should be safe and backward compatible since migration would fail
just a bit later in the process.
We support <interface> of type "mcast", "server", and "client",
but the RNG schema for them are missed. Attribute "address" is
optional for "server" type. And these 3 types support
<mac address='MAC'/>, too.
Commit 29db7a0 picked up a gnulib bug, where a change in
bootstrap meant that it would fail to run libtoolize on
projects, like libvirt, that used the older spelling
AM_PROG_LIBTOOL instead of LT_INIT for the sake of building
on RHEL 5. Now that gnulib is fixed, we should pick up that
fix.
* .gnulib: Update to latest, for bootstrap fix.
* bootstrap: Resync from gnulib.
Though <alias> is ignored when defining a domain, it can cause
failure if one validates (e.g. virt-xml-validate) the XML dumped
from a running domain. This patch expose it in domain RNG schema
for all the devices which support it.
There are three address validation routines that do nothing:
virDomainDeviceDriveAddressIsValid()
virDomainDeviceUSBAddressIsValid()
virDomainDeviceVirtioSerialAddressIsValid()
Remove them, and replace their call sites with "1" which is what they
currently return. In some cases this means we can remove an entire
if block.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Add four tests of the XML -> argv handling for the PPC64 pseries machine.
The first is just a basic test of a bare bones machine.
The three others test various aspects of the spapr-vio address handling.
It seems that currently we can't include network devices, doing so leads
to a segfault because the network driverState is not initialised. Working
around that leads us to the problem that the 'default' network doesn't
exist. So for now just leave network devices out.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
We can't call qemuCapsExtractVersionInfo() from test code, because it
expects to be able to call the emulator, and for testing we have fake
emulators that can't be executed. For that reason qemuxml2argvtest.c
doesn't call qemuDomainAssignPCIAddresses(), instead it open codes its
own version.
That means we can't call qemuDomainAssignAddresses() from the test code,
instead we need to manually call qemuDomainAssignSpaprVioAddresses().
Also add logic to cope with qemuDomainAssignSpaprVioAddresses() failing,
so that we can write a test that checks for a known failure in there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
The "unit" attribute of a drive address is optional in the code, so should
also be in the XML schema.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit d09f6ba5fe introduced a regression in event
registration. virDomainEventCallbackListAddID() will only return a positive
integer if the type of event being registered is VIR_DOMAIN_EVENT_ID_LIFECYCLE.
For other event types, 0 is always returned on success. This has the
unfortunate side effect of not enabling remote event callbacks because
remoteDomainEventRegisterAny() uses the return value from the local call to
determine if an event callback needs to be registered on the remote end.
Make sure virDomainEventCallbackListAddID() returns the callback count for the
eventID being registered.
Signed-off-by: Adam Litke <agl@us.ibm.com>
When using "virsh domifstat" command or "virsh domiftune" command,
we pass an interface name as a parameter, so interface name is
important.
"virsh domiflist" output should display interface names
on the first row.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Disk "type" and "device" are generally interesting stuff the
user may want to known, too. To not break any scripts which
parsed the output field, a new option "--details" is introduced
to output the two introduced fields.
The new introduced optional attribute "copy_on_read</code> controls
whether to copy read backing file into the image file. The value can
be either "on" or "off". Copy-on-read avoids accessing the same backing
file sectors repeatedly and is useful when the backing file is over a
slow network. By default copy-on-read is off.
Earlier, when the number of vcpus was greater than the topology allowed,
libvirt didn't raise an error and continued, resulting in running qemu
with parameters making no sense. Even though qemu did not report any
error itself, the number of vcpus was set to maximum allowed by the
topology.
Detected by Coverity. Although unlikely, if we are ever started
with stdin closed, we could reach a situation where we open a
uuid file but then fail to close it, making that file the new
stdin for the rest of the process.
* src/util/uuid.c (getDMISystemUUID): Allow for stdin.
Commit 69f0b446 failed to update the expected test output.
* tests/virshtest.c (testCompareListDefault)
(testCompareListCustom): Adjust to recent code change.
Currently the LXC controller attempts to deal with EOF on a
tty by spawning a thread to do an edge triggered epoll_wait().
This avoids the normal event loop spinning on POLLHUP. There
is a subtle mistake though - even after seeing POLLHUP on a
master PTY, it is still perfectly possible & valid to write
data to the PTY. There is a buffer that can be filled with
data, even when no client is present.
The second mistake is that the epoll_wait() thread was not
looking for the EPOLLOUT condition, so when a new client
connects to the LXC console, it had to explicitly send a
character before any queued output would appear.
Finally, there was in fact no need to spawn a new thread to
deal with epoll_wait(). The epoll file descriptor itself
can be poll()'d on normally.
This patch attempts to deal with all these problems.
- The blocking epoll_wait() thread is replaced by a poll
on the epoll file descriptor which then does a non-blocking
epoll_wait() to handle events
- Even if POLLHUP is seen, we continue trying to write
any pending output until getting EAGAIN from write.
- Once write returns EAGAIN, we modify the epoll event
mask to also look for EPOLLOUT
* src/lxc/lxc_controller.c: Avoid stalled I/O upon
connected to an LXC console
Domain IDs are at least 16 bits for most hypervisors, theoretically
event 32-bits. 3 characters is clearly too small an alignment.
Increase alignment to 5 characters to allow 16-bit domain IDs to
display cleanly. Commonly seen with LXC where domain IDs are the
process IDs by default. Also increase the 'name' field from 20
to 30 characters to cope with longer guest names which are quite
common
If client stream does not have any data to sink and neither received
EOF, a dummy packet is sent to the daemon signalising client is ready to
sink some data. However, after we added event loop to client a race may
occur:
Thread 1 calls virNetClientStreamRecvPacket and since no data are cached
nor stream has EOF, it decides to send dummy packet to server which will
sent some data in turn. However, during this decision and actual message
exchange with server -
Thread 2 receives last stream data from server. Therefore an EOF is set
on stream and if there is a call waiting (which is not yet) it is woken
up. However, Thread 1 haven't sent anything so far, so there is no call
to be woken up. So this thread sent dummy packet to daemon, which
ignores that as no stream is associated with such packet and therefore
no reply will ever come.
This race causes client to hang indefinitely.
Just like command "domblklist", the command extracts "type",
"source", "target", "model", and "MAC" of all virtual interfaces
from domain XML (live or persistent).
QEMU does not support security_model for anything but 'path' fs driver type.
Currently in libvirt, when security_model ( accessmode attribute) is not
specified it auto-generates it irrespective of the fs driver type, which
can result in a qemu error for drivers other than path. This patch ensures
that the qemu cmdline is correctly generated by taking into account the
fs driver type.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
If a system has 64 or more VF's, it is quite tedious to mention each VF
in the interface pool.
The following modification will implicitly create an interface pool from
the SR-IOV PF.
This functions enables us to get the Virtual Functions attached to
a Physical function given the name of a SR-IOV physical functio.
In order to accomplish the task, added a getter function pciGetDeviceAddrString
to get the BDF of the Virtual Function in a char array.
Although the netcf interface driver can in theory be used by
the stateless drivers, in practice none of them want to use
it because they have different ways of dealing with interfaces.
Furthermore, if you have mingw32-netcf installed, then the
libvirt mingw32 build will fail with
../../src/interface/netcf_driver.c:644:5: error: unknown field 'close_used_without_including_unistd_h' specified in initializer
* configure.ac: disable netcf if built without libvirtd
The autobuilder pointed out an odd failure on mingw:
../../src/interface/netcf_driver.c:644:5: error: unknown field 'close_used_without_including_unistd_h' specified in initializer
cc1: warnings being treated as errors
This is because the gnulib headers #define close to different strings,
according to which headers are included, in order to work around some
odd mingw problems with close(), and these defines happen to also
affect field members declared with a name of struct foo.close. As long
as all headers are included before both the definition and use of the
struct, the various #define doesn't matter, but the netcf file hit
an instance where things were included in a different order. Fix this
for all clients that use a struct member named 'close'.
* src/driver.h: Include <unistd.h> before using 'close'.