The code formatting NUMA args was ignoring the return value
of virBitmapFormat, so on OOM, it would silently drop the
NUMA cpumask arg.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When building boot menu args, if OOM occurred the CLI args
would end up containing 'order=(null)' due to a missing
call to 'virBufferError'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuParseCommandLine method did not check the return value of
virStringSplit to see if OOM had occurred. This lead to dereference
of a NULL pointer on OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Most callers of qemuParseKeywords were assigning its return
value to a 'size_t' variable. Then then also checked '< 0'
for error condition, but this will never be true with the
unsigned size_t variable. Rather than using 'ssize_t', change
qemuParseKeywords so that the element count is returned via
an output parameter, leaving the return value solely as an
error indicator.
This avoids a crash accessing beyond the end of an error
upon OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In
commit 41b5505679
Author: Eric Blake <eblake@redhat.com>
Date: Wed Aug 28 15:01:23 2013 -0600
qemu: simplify list cleanup
The qemuStringToArgvEnv method was changed to use virStringFreeList
to free the 'arglist' array. This method assumes the string list
array is NULL terminated, however, qemuStringToArgvEnv was not
ensuring this when populating 'arglist'. This caused an out of
bounds access by virStringFreeList when OOM occured in the initial
loop of qemuStringToArgvEnv
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When parsing the RBD hosts, it increments the 'nhosts' counter
before increasing the 'hosts' array allocation. If an OOM then
occurs when increasing the array allocation, the cleanup block
will attempt to access beyond the end of the array. Switch
to using VIR_EXPAND_N instead of VIR_REALLOC_N to protect against
this mistake
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in qemuDomainCCWAddressSetCreate, it jumps to
a cleanup block and frees the partially initialized object.
It then mistakenly returns the address of the just free'd
pointer instead of NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves https://bugzilla.redhat.com/show_bug.cgi?id=1008903
The Q35 machinetype has an implicit SATA controller at 00:1F.2 which
isn't given the "expected" id of ahci0 by qemu when it's created. The
original suggested solution to this problem was to not specify any
controller for the disks that use the default controller and just
specify "unit=n" instead; qemu should then use the first IDE or SATA
controller for the disk.
Unfortunately, this "solution" is ignorant of the fact that in the
case of SATA disks, the "unit" attribute in the disk XML is actually
*not* being used for the unit, but is instead used to specify the
"bus" number; each SATA controller has 6 buses, and each bus only
allows a single unit. This makes it nonsensical to specify unit='n'
where n is anything other than 0. It also means that the only way to
connect more than a single device to the implicit SATA controller is
to explicitly give the bus names, which happen to be "ide.$n", where
$n can be replaced by the disk's "unit" number.
qemu/KVM also supports a tftp URL while specifying the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='tftp' name='/url/path'>
<host name='host.name' port='69'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The ftps protocol is another protocol supported by qemu/KVM while specifying
the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftps' name='/url/path'>
<host name='host.name' port='990'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The https protocol is also accepted by qemu/KVM when specifying the cdrom ISO
image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='https' name='/url/path'>
<host name='host.name' port='443'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
Currently, only X86 provides users CPU features with CPUID instruction.
If users specify the features for non-x86, it should tell users to
remove them.
This patch is to report one error if features are specified by
users for non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
In Fedora 19, 'qemu-kvm' is a simple wrapper that calls
'qemu-system-x86_64 -machine accel=kvm'. Attempting
to use 'virsh qemu-attach $pid' to a machine started as:
qemu-kvm -cdrom /var/lib/libvirt/images/foo.img \
-monitor unix:/tmp/demo,server,nowait -name foo \
--uuid cece4f9f-dff0-575d-0e8e-01fe380f12ea
was failing with:
error: XML error: No PCI buses available
because we did not see 'kvm' in the executable name read from
/proc/$pid/cmdline, and tried to assign os.machine as
"accel=kvm" instead of "pc"; this in turn led to refusal to
recognize the pci bus.
Noticed while investigating https://bugzilla.redhat.com/995312
although there are still other issues to fix before that bug
will be completely solved.
I've concluded that the existing parser code for native-to-xml
is a horrendous hodge-podge of ad-hoc approaches; I basically
rewrote the -machine section to be a bit saner.
* src/qemu/qemu_command.c (qemuParseCommandLine): Don't assume
-machine argument is always appropriate for os.machine; set
virtType if accel is present.
Signed-off-by: Eric Blake <eblake@redhat.com>
'virsh domxml-from-native' and 'virsh qemu-attach' could misbehave
for an emulator installed in (a somewhat unlikely) location
such as /usr/local/qemu-1.6/qemu-system-x86_64 or (an even less
likely) /opt/notxen/qemu-system-x86_64. Limit the strstr seach
to just the basename of the file where we are assuming details
about the binary based on its name.
While testing, I accidentally triggered a core dump during strcmp
when I forgot to set os.type on one of my code paths; this patch
changes such a coding error to raise a nicer internal error instead.
* src/qemu/qemu_command.c (qemuParseCommandLine): Compute basename
earlier.
* src/conf/domain_conf.c (virDomainDefPostParseInternal): Avoid
NULL deref.
Signed-off-by: Eric Blake <eblake@redhat.com>
CPU features are not supported on non-x86 and hasFeatures will be NULL.
This patch is to remove CPU features functions calling to avoid errors.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
The VIR_FREE() macro will cast away any const-ness. This masked a
number of places where we passed a 'const char *' string to
VIR_FREE. Fortunately in all of these cases, the variable was not
in fact const data, but a heap allocated string. Fix all the
variable declarations to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
No need to open code now that we have a nice function.
Interestingly, our virStringFreeList function is typed correctly
(a malloc'd list of malloc'd strings is NOT const, whether at the
point where it is created, or at the point where it is cleand up),
so using it with a 'const char **' argument would require a cast
to keep the compiler. I chose instead to remove const from code
even where we don't modify the argument, just to avoid the need
to cast.
* src/qemu/qemu_command.h (qemuParseCommandLine): Drop declaration.
* src/qemu/qemu_command.c (qemuParseProcFileStrings)
(qemuStringToArgvEnv): Don't force malloc'd result to be const.
(qemuParseCommandLinePid, qemuParseCommandLineString): Simplify
cleanup.
(qemuParseCommandLine, qemuFindEnv): Drop const-correctness to
avoid the need to cast in callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, kernel supports up to 8 queues for a multiqueue tap device.
However, if user tries to enter a huge number (e.g. one million) the tap
allocation fails, as expected. But what is not expected is the log full
of warnings:
warning : virFileClose:83 : Tried to close invalid fd 0
The problem is, upon error we iterate over an array of FDs (handlers to
queues) and VIR_FORCE_CLOSE() over each item. However, the array is
pre-filled with zeros. Hence, we repeatedly close stdin. Ouch.
But there's more. The queues allocation is done in virNetDevTapCreate()
which cleans up the FDs in case of error. Then, its caller, the
virNetDevTapCreateInBridgePort() iterates over the FD array and tries to
close them too. And so does qemuNetworkIfaceConnect() and
qemuBuildInterfaceCommandLine().
Starting with qemu 1.6, the qemu-system-arm vexpress-a9 model has a
hardcoded virtio-mmio transport which enables attaching all virtio
devices.
On the command line, we have to use virtio-XXX-device rather than
virtio-XXX-pci, thankfully s390 already set the precedent here so
it's fairly straight forward.
At the XML level, this adds a new device address type virtio-mmio.
The controller and addressing don't have any subelements at the
moment because we they aren't needed for this usecase, but could
be added later if needed.
Add a test case for an ARM guest with one of every virtio device
enabled.
Similar to the chardev bit, ARM boards depend on the old style '-net nic'
for actually instantiating net devices. But we can't block out
-netdev altogether since it's needed for upcoming virtio support.
And add tests for working ARM XML with console, disk, and networking.
This corresponds to '-sd' and '-drive if=sd' on the qemu command line.
Needed for many ARM boards which don't provide any other way to
pass in storage.
QEMU ARM boards don't give us any way to explicitly wire in
a -chardev, so use the old style -serial options.
Unfortunately this isn't as simple as just turning off the CHARDEV flag
for qemu-system-arm, as upcoming virtio support _will_ use device/chardev.
On my machine, a guest fails to boot if it has a sound card, but not
graphical device/display is configured, because pulseaudio fails to
initialize since it can't access $HOME.
A workaround is removing the audio device, however on ARM boards there
isn't any option to do that, so -nographic always fails.
Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately
this has massive test suite fallout.
Add a qemu.conf parameter nographics_allow_host_audio, that if enabled
will pass through QEMU_AUDIO_DRV from sysconfig (similar to
vnc_allow_host_audio)
Add an attribute named 'removable' to the 'target' element of disks,
which controls the removable flag. For instance, on a Linux guest it
controls the value of /sys/block/$dev/removable. This option is only
valid for USB disks (i.e. bus='usb'), and its default value is 'off',
which is the same behaviour as before.
To achieve this, 'removable=on' (or 'off') is appended to the '-device
usb-storage' parameter sent to qemu when adding a USB disk via
'-disk'. A capability flag QEMU_CAPS_USB_STORAGE_REMOVABLE was added
to keep track if this option is supported by the qemu version used.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=922495
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Allow use of the usb-storage device only if the new capability flag
QEMU_CAPS_DEVICE_USB_STORAGE is set, which it is for qemu(-kvm)
versions >= 0.12.1.2-rhel62-beta.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
vhost only works in KVM mode at the moment, and is infact compiled
out if the emulator is built for non-native architecture. While it
may work at some point in the future for plain qemu, for now it's
just noise on the command line (and which contributes to arm cli
breakage).
QEMU commit 3984890 introduced the "pci-hole64-size" property,
to i440FX-pcihost and q35-pcihost with a default setting of 2 GB.
Translate <pcihole64>x<pcihole64/> to:
-global q35-pcihost.pci-hole64-size=x for q35 machines and
-global i440FX-pcihost.pci-hole64-size=x for i440FX-based machines.
Error out on other machine types or if the size was specified
but the pcihost device lacks 'pci-hole64-size' property.
https://bugzilla.redhat.com/show_bug.cgi?id=990418
The ftp protocol is already recognized by qemu/KVM so add this support to
libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftp' name='/url/path'>
<host name='host.name' port='21'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
QEMU/KVM already allows a HTTP URL for the cdrom ISO image so add this support
to libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='http' name='/url/path'>
<host name='host.name' port='80'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
Each of the modules handled reporting error messages from the secret fetching
slightly differently with respect to the error. Provide a similar message
for each error case and provide as much data as possible.
If there's no hard_limit set and domain uses VFIO we still must lock the
guest memory (prerequisite from qemu). Hence, we should compute the
amount to be locked from max_balloon.
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
This patch addresses two concerns with the error reporting when an
incompatible PCI address is specified for a device:
1) It wasn't always apparent which device had the problem. With this
patch applied, any error about an incompatible address will always
contain the full address as given in the config, so it will be easier
to determine which device's config aused the problem.
2) In some cases when the problem came from bad config, the error
message was erroneously classified as VIR_ERR_INTERNAL_ERROR. With
this patch applied, the same error message will be changed to indicate
either "internal" or "xml" error depending on whether the address came
from the config, or was automatically generated by libvirt.
Note that in the case of "internal" (due to bad auto-generation)
errors, the PCI address won't be of much use in finding the location
in config to change (because it was automatically generated). Of
course that makes perfect sense, but still the address could provide a
clue about a bug in libvirt attempting to use a type of pci bus that
doesn't have its flags set correctly (or something similar). In other
words, it's not perfect, but it is definitely better.
q35 machines have an implicit ahci (sata) controller at 00:1F.2 which
has no "id" associated with it. For this reason, we can't refer to it
as "ahci0". Instead, we don't give an id on the commandline, which
qemu interprets as "use the first ahci controller". We then need to
specify the unit with "unit=%d" rather than adding it onto the bus
arg.
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
This patch adds in special handling for a few devices that need to be
treated differently for q35 domains:
usb - there is no implicit/default usb controller for the q35
machinetype. This is done because normally the default usb controller
is added to a domain by just adding "-usb" to the qemu commandline,
and it's assumed that this will add a single piix3 usb1 controller at
slot 1 function 2. That's not what happens when the machinetype is
q35, though. Instead, adding -usb to the commandline adds 3 usb
(version 2) controllers to the domain at slot 0x1D.{1,2,7}. Rather
than having
<controller type='usb' index='0'/>
translate into 3 separate devices on the PCI bus, it's cleaner to not
automatically add a default usb device; one can always be added
explicitly if desired. Or we may decide that on q35 machines, 3 usb
controllers will be automatically added when none is given. But for
this initial commit, at least we aren't locking ourselves into
something we later won't want.
video - qemu always initializes the primary video device immediately
after any integrated devices for the machinetype. Unless instructed
otherwise (by using "-device vga..." instead of "-vga" which libvirt
uses in many cases to work around deficiencies and bugs in various
qemu versions) qemu will always pick the first unused slot. In the
case of the "pc" machinetype and its derivatives, this is always slot
2, but on q35 machinetypes, the first free slot is slot 1 (since the
q35's integrated peripheral devices are placed in other slots,
e.g. slot 0x1f). In order to make the PCI address of the video device
predictable, that slot (1 or 2, depending on machinetype) is reserved
even when no video device has been specified.
sata - a q35 machine always has a sata controller implicitly added at
slot 0x1F, function 2. There is no way to avoid this controller, so we
always add it. Note that the xml2xml tests for the pcie-root and q35
cases were changed to use DO_TEST_DIFFERENT() so that we can check for
the sata controller being automatically added. This is especially
important because we can't check for it in the xml2argv output (it has
no effect on that output since it's an implicit device).
ide - q35 has no ide controllers.
isa and smbus controllers - these two are always present in a q35 (at
slot 0x1F functions 0 and 3) but we have no way of modelling them in
our config. We do need to reserve those functions so that the user
doesn't attempt to put anything else there though. (note that the "pc"
machine type also has an ISA controller, which we also ignore).
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
Previous refactoring of the guest PCI address reservation/allocation
code allowed for slot types other than basic PCI (e.g. PCI express,
non-hotpluggable slots, etc) but would not auto-allocate a slot for a
device that required any type other than a basic hot-pluggable
PCI slot.
This patch refactors the code to be aware of different slot types
during auto-allocation of addresses as well - as long as there is an
empty slot of the required type, it will be found and used.
The piece that *wasn't* added is that we don't auto-create a new PCI
bus when needed for anything except basic PCI devices. This is because
there are multiple different types of controllers that can provide,
for example, a PCI express slot (in addition to the pcie-root
controller, these can also be found on a "root-port" or on a
"downstream-switch-port"). Since we currently don't support any PCIe
devices (except pending support for dmi-to-pci-bridge), we can defer
any decision on what to do about this.
* The functions qemuDomainPCIAddressReserveAddr and
qemuDomainPCIAddressReserveSlot were very similar (and should have
been more similar) and were about to get more code added to them which
would create even more duplicated code, so this patch gives
qemuDomainPCIAddressReserveAddr a "reserveEntireSlot" arg, then
replaces the body of qemuDomainPCIAddressReserveSlot with a call to
qemuDomainPCIAddressReserveAddr.
You will notice that addrs->lastaddr was previously set in
qemuDomainPCIAddressReserveAddr (but *not* set in
qemuDomainPCIAddressReserveSlot). For consistency and cleanliness of
code, that bit was removed and put into the one caller of
qemuDomainPCIAddressReserveAddr (there is a similar place where the
caller of qemuDomainPCIAddressReserveSlot sets lastaddr). This does
guarantee identical functionality to pre-patch code, but in practice
isn't really critical, because lastaddr is just keeping track of where
to start when looking for a free slot - if it isn't updated, we will
just start looking on a slot that's already occupied, then skip up to
one that isn't.
* qemuCollectPCIAddress was essentially doing the same thing as
qemuDomainPCIAddressReserveAddr, but with some extra special case
checking at the beginning. The duplicate code has been replaced with
a call to qemuDomainPCIAddressReserveAddr. This required adding a
"fromConfig" boolean, which is only used to change the log error
code from VIR_ERR_INTERNAL_ERROR (when the address was
auto-generated by libvirt) to VIR_ERR_XML_ERROR (when the address is
coming from the config); without this differentiation, it would be
difficult to tell if an error was caused by something wrong in
libvirt's auto-allocate code or just bad config.
* the bit of code in qemuDomainPCIAddressValidate that checks the
connect type flags is going to be used in a couple more places where
we don't need to also check the slot limits (because we're generating
the slot number ourselves), so that has been pulled out into a
separate qemuDomainPCIAddressFlagsCompatible function.
* qemuDomainPCIAddressSetNextAddr
The name of this function was confusing because 1) other functions in
the file that end in "Addr" are only operating on a single function of
one PCI slot, not the entire slot, while functions that do something
with the entire slot end in "Slot", and 2) it didn't contain a verb
describing what it is doing (the "Set" refers to the set that contains
all PCI buses in the system, used to keep track of which slots in
which buses are already reserved for use).
It is now renamed to qemuDomainPCIAddressReserveNextSlot, which more
clearly describes what it is doing. Arguably, it could have been
changed to qemuDomainPCIAddressSetReserveNextSlot, but 1) the word
"set" is confusing in this context because it could be intended as a
verb or as a noun, and 2) most other functions that operate on a
single slot or address within this set are also named
qemuDomainPCIAddress... rather than qemuDomainPCIAddressSet... Only
the Create, Free, and Grow functions for an address set (which modify the
entire set, not just one element) use "Set" in their name.
* qemuPCIAddressAsString, qemuPCIAddressValidate
All the other functions in this set are named
qemuDomainPCIAddressxxxxx, so I renamed these to be consistent.
Since PCI bridges, PCIe bridges, PCIe switches, and PCIe root ports
all share the same namespace, they are all defined as controllers of
type='pci' in libvirt (but with a differing model attribute). Each of
these controllers has a certain connection type upstream, allows
certain connection types downstream, and each can either allow a
single downstream connection at slot 0, or connections from slot 1 -
31.
Right now, we only support the pci-root and pci-bridge devices, both
of which only allow PCI devices to connect, and both which have usable
slots 1 - 31. In preparation for adding other types of controllers
that have different capabilities, this patch 1) adds info to the
qemuDomainPCIAddressBus object to indicate the capabilities, 2) sets
those capabilities appropriately for pci-root and pci-bridge devices,
and 3) validates that the controller being connected to is the proper
type when allocating slots or validating that a user-selected slot is
appropriate for a device..
Having this infrastructure in place will make it much easier to add
support for the other PCI controller types.
While it would be possible to do all the necessary checking by just
storing the controller model in the qemyuDomainPCIAddressBus, it
greatly simplifies all the validation code to also keep a "flags",
"minSlot" and "maxSlot" for each - that way we can just check those
attributes rather than requiring a nearly identical switch statement
everywhere we need to validate compatibility.
You may notice many places where the flags are seemingly hard-coded to
QEMU_PCI_CONNECT_HOTPLUGGABLE | QEMU_PCI_CONNECT_TYPE_PCI
This is currently the correct value for all PCI devices, and in the
future will be the default, with small bits of code added to change to
the flags for the few devices which are the exceptions to this rule.
Finally, there are a few places with "FIXME" comments. Note that these
aren't indicating places that are broken according to the currently
supported devices, they are places that will need fixing when support
for new PCI controller models is added.
To assure that there was no regression in the auto-allocation of PCI
addresses or auto-creation of integrated pci-root, ide, and usb
controllers, a new test case (pci-bridge-many-disks) has been added to
both the qemuxml2argv and qemuxml2xml tests. This new test defines a
domain with several dozen virtio disks but no pci-root or
pci-bridges. The .args file of the new test case was created using
libvirt sources from before this patch, and the test still passes
after this patch has been applied.
Although these two enums are named ..._LAST, they really had the value
of ..._SIZE. This patch changes their values so that, e.g.,
QEMU_PCI_ADDRESS_SLOT_LAST really is the slot number of the last slot
on a PCI bus.
The implicit IDE, USB, and video controllers provided by the PIIX3
chipset in the pc-* machinetypes are not present on other
machinetypes, so we shouldn't be doing the special checking for
them. This patch places those validation checks into a separate
function that is only called for machine types that have a PIIX3 chip
(which happens to be the i440fx-based pc-* machine types).
One qemuxml2argv test data file had to be changed - the
pseries-usb-multi test had included a piix3-usb-uhci device, which was
being placed at a specific address, and also had slot 2 auto reserved
for a video device, but the pseries virtual machine doesn't actually
have a PIIX3 chip, so even if there was a piix3-usb-uhci driver for
it, the device wouldn't need to reside at slot 1 function 2. I just
changed the .argv file to have the generic slot info for the two
devices that results when the special PIIX3 code isn't executed.
qemuDomainPCIAddressBus was an array of QEMU_PCI_ADDRESS_SLOT_LAST
uint8_t's, which worked fine as long as every PCI bus was
identical. In the future, some PCI busses will allow connecting PCI
devices, and some will allow PCIe devices; also some will only allow
connection of a single device, while others will allow connecting 31
devices.
In order to keep track of that information for each bus, we need to
turn qemuDomainPCIAddressBus into a struct, for now with just one
member:
uint8_t slots[QEMU_PCI_ADDRESS_SLOT_LAST];
Additional members will come in later patches.
The item in qemuDomainPCIAddresSet that contains the array of
qemuDomainPCIAddressBus is now called "buses" to be more consistent
with the already existing "nbuses" (and with the new "slots" array).
The difference with already supported pool types (dir, fs, block)
is: there are two modes for iscsi pool (or network pools in future),
one can specify it either to use the volume target path (the path
showed up on host) with mode='host', or to use the remote URI qemu
supports (e.g. file=iscsi://example.org:6000/iqn.1992-01.com.example/1)
with mode='direct'.
For 'host' mode, it copies the volume target path into disk->src. For
'direct' mode, the corresponding info in the *one* pool source host def
is copied to disk->hosts[0].
The alias for hostdevs of type SCSI can be too long for QEMU if
larger LUNs are encountered. Here's a real life example:
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1088634913'/>
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
this results in a too long drive id, resulting in QEMU yelling
Property 'scsi-generic.drive' can't find value 'drive-hostdev-scsi_host0-0-19-1088634913'
This commit changes the alias back to the default hostdev$(index)
scheme.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=974943
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
vmwareMakePath seems to be unused.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The function being introduced is responsible for creating command
line argument for '-device' for given character device. Based on
the chardev type, it calls appropriate qemuBuild.*ChrDeviceStr(),
e.g. qemuBuildSerialChrDeviceStr() for serial chardev and so on.
The chardev alias assignment is going to be needed in a separate
places, so it should be moved into a separate function rather
than copying code randomly around.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
<features>
<hyperv>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
</features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
https://bugzilla.redhat.com/show_bug.cgi?id=964177
Though both libvirt and QEMU's document say RTC_CHANGE returns
the offset from the host UTC, qemu actually returns the offset
from the specified date instead when specific date is provided
(-rtc base=$date).
It's not safe for qemu to fix it in code, it worked like that
for 3 years, changing it now may break other QEMU use cases.
What qemu tries to do is to fix the document:
http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg04782.html
And in libvirt side, instead of replying on the value from qemu,
this converts the offset returned from qemu to the offset from
host UTC, by:
/*
* a: the offset from qemu RTC_CHANGE event
* b: The specified date (-rtc base=$date)
* c: the host date when libvirt gets the RTC_CHANGE event
* offset: What libvirt will report
*/
offset = a + (b - c);
The specified date (-rtc base=$date) is recorded in clock's def as
an internal only member (may be useful to exposed outside?).
Internal only XML tag "basetime" is introduced to not lose the
guest's basetime after libvirt restarting/reloading:
<clock offset='variable' adjustment='304' basis='utc' basetime='1370423588'/>
Currently, if there's an error opening /dev/vhost-net (e.g. because
it doesn't exist) but it's not required we proceed with vhostfd array
filled with -1 and vhostfdSize unchanged. Later, when constructing
the qemu command line only non-negative items within vhostfd array
are taken into account. This means, vhostfdSize may be greater than
the actual count of non-negative items in vhostfd array. This results
in improper command line arguments being generated, e.g.:
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=(null)
With previous patch, we accept negative value as length of string to
duplicate. So there is no need to pass strlen(src) in case we want to do
duplicate the whole string.
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
Since 0d70656afd, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
During building of the qemu command line determine whether to add/use the
'-no-reboot' option only if each of the 'on' events want to to destroy
the domain; otherwise, use the '-no-shutdown' option.
Prior to this change both could be on the command line, which while allowed
could be construed as a conflict.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
The QEMU command line syntax for RBD disks is
file=rbd:pool/image:opt1=val1:opt2=val2...
There is no way to escape the ':' if it appears in the
pool or image name. Thus it must be explicitly forbidden
if it occurs in the libvirt XML. People are known to
be abusing the lack of escaping in current libvirt to
pass arbitrary args to QEMU.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <filesystem> element can now accept a <driver type='nbd'/>
as an alternative to 'loop'. The benefit of NBD is support
for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the <driver> element in filesystem devices to
allow a storage format to be set. The new attribute
uses 'format' to reflect the storage format. This is
different from the <driver> element in disk devices
which use 'type' to reflect the storage format. This
is because the 'type' attribute on filesystem devices
is already used for the driver backend, for which the
disk devices use the 'name' attribute. Arggggh.
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
static const struct SCSIBusInfo megasas_scsi_info = {
.tcq = true,
.max_target = MFI_MAX_LD,
.max_lun = 255,
.transfer_data = megasas_xfer_complete,
.get_sg_list = megasas_get_sg_list,
.complete = megasas_command_complete,
.cancel = megasas_command_cancel,
};
Example of the qemu command line (lsilogic controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,scsi-id=8,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Example of the qemu command line (virtio-scsi controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,channel=0,scsi-id=128,lun=128,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
VFIO device assignment requires a cgroup ACL to be setup for access to
the /dev/vfio/nn "group" device for any devices that will be assigned
to a guest. In the case of a host device that is allocated from a
pool, it was being allocated during qemuBuildCommandLine(), which is
called by qemuProcessStart() *after* the all-encompassing
qemuSetupCgroup() was called, meaning that the standard Cgroup ACL
setup wasn't creating ACLs for these devices allocated from pools.
One possible solution was to manually add a single ACL down inside
qemuBuildCommandLine() when networkAllocateActualDevice() is called,
but that has two problems: 1) the function that adds the cgroup ACL
requires a virDomainObjPtr, which isn't available in
qemuBuildCommandLine(), and 2) we really shouldn't be doing network
device setup inside qemuBuildCommandLine() anyway.
Instead, I've created a new function called
qemuNetworkPrepareDevices() which is called just before
qemuPrepareHostDevices() during qemuProcessStart() (explanation of
ordering in the comments), i.e. well before the call to
qemuSetupCgroup(). To minimize code churn in a patch that will be
backported to 1.0.5-maint, qemuNetworkPrepareDevices only does
networkAllocateActualDevice() and the bare amount of setup required
for type='hostdev network devices, but it eventually should do *all*
device setup for guest network devices.
Note that some of the code that was previously needed in
qemuBuildCommandLine() is no longer required when
networkAllocateActualDevice() is called earlier:
* qemuAssignDeviceHostdevAlias() is already done further down in
qemuProcessStart().
* qemuPrepareHostdevPCIDevices() is called by
qemuPrepareHostDevices() which is called after
qemuNetworkPrepareDevices() in qemuProcessStart().
As hinted above, this new function should be moved into a separate
qemu_network.c (or similarly named) file along with
qemuPhysIfaceConnect(), qemuNetworkIfaceConnect(), and
qemuOpenVhostNet(), and expanded to call those functions as well, then
the nnets loop in qemuBuildCommandLine() should be reduced to only
build the commandline string (which itself can be in a separate
qemuInterfaceBuilldCommandLine() function as suggested by
Michal). However, this will require storing away an array of tapfd and
vhostfd that are needed for the commandline, so I would rather do that
in a separate patch and leave this patch at the minimum to fix the
bug.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
As a result of commit id '19c345f2', 'make -C tests valgrind' has the
following for qemuxml2argvtest:
==22482== 197 (80 direct, 117 indirect) bytes in 1 blocks are definitely lost in loss record 101 of 120
==22482== at 0x4A06B6F: calloc (vg_replace_malloc.c:593)
==22482== by 0x4C6F301: virAlloc (viralloc.c:124)
==22482== by 0x4C840FC: virSaveLastError (virerror.c:308)
==22482== by 0x431882: qemuBuildCommandLine (qemu_command.c:8204)
==22482== by 0x41E8F0: testCompareXMLToArgvHelper (qemuxml2argvtest.c:155)
==22482== by 0x41FE9F: virtTestRun (testutils.c:157)
==22482== by 0x419DEB: mymain (qemuxml2argvtest.c:654)
==22482== by 0x4204DA: virtTestMain (testutils.c:719)
==22482== by 0x39D0821A04: (below main) (libc-start.c:225)
==22482==
qemuBuildMemballoonDevStr returns NULL if memballoon doesn't have
the right address type, but it doesn't report an error, leading to:
error: An error occurred, but the cause is unknown
Report a helpful error message instead, e.g.:
error: XML error: memballoon unsupported with address type 'usb'