qemuDomainPCIAddressBus was an array of QEMU_PCI_ADDRESS_SLOT_LAST
uint8_t's, which worked fine as long as every PCI bus was
identical. In the future, some PCI busses will allow connecting PCI
devices, and some will allow PCIe devices; also some will only allow
connection of a single device, while others will allow connecting 31
devices.
In order to keep track of that information for each bus, we need to
turn qemuDomainPCIAddressBus into a struct, for now with just one
member:
uint8_t slots[QEMU_PCI_ADDRESS_SLOT_LAST];
Additional members will come in later patches.
The item in qemuDomainPCIAddresSet that contains the array of
qemuDomainPCIAddressBus is now called "buses" to be more consistent
with the already existing "nbuses" (and with the new "slots" array).
The difference with already supported pool types (dir, fs, block)
is: there are two modes for iscsi pool (or network pools in future),
one can specify it either to use the volume target path (the path
showed up on host) with mode='host', or to use the remote URI qemu
supports (e.g. file=iscsi://example.org:6000/iqn.1992-01.com.example/1)
with mode='direct'.
For 'host' mode, it copies the volume target path into disk->src. For
'direct' mode, the corresponding info in the *one* pool source host def
is copied to disk->hosts[0].
The alias for hostdevs of type SCSI can be too long for QEMU if
larger LUNs are encountered. Here's a real life example:
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1088634913'/>
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
this results in a too long drive id, resulting in QEMU yelling
Property 'scsi-generic.drive' can't find value 'drive-hostdev-scsi_host0-0-19-1088634913'
This commit changes the alias back to the default hostdev$(index)
scheme.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=974943
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
vmwareMakePath seems to be unused.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The function being introduced is responsible for creating command
line argument for '-device' for given character device. Based on
the chardev type, it calls appropriate qemuBuild.*ChrDeviceStr(),
e.g. qemuBuildSerialChrDeviceStr() for serial chardev and so on.
The chardev alias assignment is going to be needed in a separate
places, so it should be moved into a separate function rather
than copying code randomly around.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
<features>
<hyperv>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
</features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
https://bugzilla.redhat.com/show_bug.cgi?id=964177
Though both libvirt and QEMU's document say RTC_CHANGE returns
the offset from the host UTC, qemu actually returns the offset
from the specified date instead when specific date is provided
(-rtc base=$date).
It's not safe for qemu to fix it in code, it worked like that
for 3 years, changing it now may break other QEMU use cases.
What qemu tries to do is to fix the document:
http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg04782.html
And in libvirt side, instead of replying on the value from qemu,
this converts the offset returned from qemu to the offset from
host UTC, by:
/*
* a: the offset from qemu RTC_CHANGE event
* b: The specified date (-rtc base=$date)
* c: the host date when libvirt gets the RTC_CHANGE event
* offset: What libvirt will report
*/
offset = a + (b - c);
The specified date (-rtc base=$date) is recorded in clock's def as
an internal only member (may be useful to exposed outside?).
Internal only XML tag "basetime" is introduced to not lose the
guest's basetime after libvirt restarting/reloading:
<clock offset='variable' adjustment='304' basis='utc' basetime='1370423588'/>
Currently, if there's an error opening /dev/vhost-net (e.g. because
it doesn't exist) but it's not required we proceed with vhostfd array
filled with -1 and vhostfdSize unchanged. Later, when constructing
the qemu command line only non-negative items within vhostfd array
are taken into account. This means, vhostfdSize may be greater than
the actual count of non-negative items in vhostfd array. This results
in improper command line arguments being generated, e.g.:
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=(null)
With previous patch, we accept negative value as length of string to
duplicate. So there is no need to pass strlen(src) in case we want to do
duplicate the whole string.
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
Since 0d70656afded, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
During building of the qemu command line determine whether to add/use the
'-no-reboot' option only if each of the 'on' events want to to destroy
the domain; otherwise, use the '-no-shutdown' option.
Prior to this change both could be on the command line, which while allowed
could be construed as a conflict.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
The QEMU command line syntax for RBD disks is
file=rbd:pool/image:opt1=val1:opt2=val2...
There is no way to escape the ':' if it appears in the
pool or image name. Thus it must be explicitly forbidden
if it occurs in the libvirt XML. People are known to
be abusing the lack of escaping in current libvirt to
pass arbitrary args to QEMU.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <filesystem> element can now accept a <driver type='nbd'/>
as an alternative to 'loop'. The benefit of NBD is support
for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the <driver> element in filesystem devices to
allow a storage format to be set. The new attribute
uses 'format' to reflect the storage format. This is
different from the <driver> element in disk devices
which use 'type' to reflect the storage format. This
is because the 'type' attribute on filesystem devices
is already used for the driver backend, for which the
disk devices use the 'name' attribute. Arggggh.
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
static const struct SCSIBusInfo megasas_scsi_info = {
.tcq = true,
.max_target = MFI_MAX_LD,
.max_lun = 255,
.transfer_data = megasas_xfer_complete,
.get_sg_list = megasas_get_sg_list,
.complete = megasas_command_complete,
.cancel = megasas_command_cancel,
};
Example of the qemu command line (lsilogic controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,scsi-id=8,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Example of the qemu command line (virtio-scsi controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,channel=0,scsi-id=128,lun=128,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
VFIO device assignment requires a cgroup ACL to be setup for access to
the /dev/vfio/nn "group" device for any devices that will be assigned
to a guest. In the case of a host device that is allocated from a
pool, it was being allocated during qemuBuildCommandLine(), which is
called by qemuProcessStart() *after* the all-encompassing
qemuSetupCgroup() was called, meaning that the standard Cgroup ACL
setup wasn't creating ACLs for these devices allocated from pools.
One possible solution was to manually add a single ACL down inside
qemuBuildCommandLine() when networkAllocateActualDevice() is called,
but that has two problems: 1) the function that adds the cgroup ACL
requires a virDomainObjPtr, which isn't available in
qemuBuildCommandLine(), and 2) we really shouldn't be doing network
device setup inside qemuBuildCommandLine() anyway.
Instead, I've created a new function called
qemuNetworkPrepareDevices() which is called just before
qemuPrepareHostDevices() during qemuProcessStart() (explanation of
ordering in the comments), i.e. well before the call to
qemuSetupCgroup(). To minimize code churn in a patch that will be
backported to 1.0.5-maint, qemuNetworkPrepareDevices only does
networkAllocateActualDevice() and the bare amount of setup required
for type='hostdev network devices, but it eventually should do *all*
device setup for guest network devices.
Note that some of the code that was previously needed in
qemuBuildCommandLine() is no longer required when
networkAllocateActualDevice() is called earlier:
* qemuAssignDeviceHostdevAlias() is already done further down in
qemuProcessStart().
* qemuPrepareHostdevPCIDevices() is called by
qemuPrepareHostDevices() which is called after
qemuNetworkPrepareDevices() in qemuProcessStart().
As hinted above, this new function should be moved into a separate
qemu_network.c (or similarly named) file along with
qemuPhysIfaceConnect(), qemuNetworkIfaceConnect(), and
qemuOpenVhostNet(), and expanded to call those functions as well, then
the nnets loop in qemuBuildCommandLine() should be reduced to only
build the commandline string (which itself can be in a separate
qemuInterfaceBuilldCommandLine() function as suggested by
Michal). However, this will require storing away an array of tapfd and
vhostfd that are needed for the commandline, so I would rather do that
in a separate patch and leave this patch at the minimum to fix the
bug.