Currently, there is no possibility for user to specify desired behaviour of
output to file - truncate or append. This patch adds an ability to explicitly
specify that user wants to preserve file's content on reopen.
Signed-off-by: Dmitry Mishin <dim@virtuozzo.com>
Using more than 4TiB of memory per NUMA node would not be possible to
express in the XML without violating the schema. Not that such boxes
would be common, but we should use a longer type at this point.
The pattern is not necessary since libvirt redefines the type already in
basictypes.rng with the same pattern.
To be used by the family of virtio input devices:
<input type='mouse' bus='virtio'/>
<input type='tablet' bus='virtio'/>
<input type='keyboard' bus='virtio'/>
https://bugzilla.redhat.com/show_bug.cgi?id=1231114
qemu 2.5 provides virtio video device. It can be used with -device
virtio-vga for primary devices, or -device virtio-gpu for non-vga
devices. However, only the primary device (VGA) is supported with this
patch.
Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1195176
Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
'model' attribute was added to a panic device but only one panic
device is allowed. This patch changes panic device presence
from 'optional' to 'zeroOrMore'.
Libvirt already has two types of panic devices - pvpanic and pSeries firmware.
This patch introduces the 'model' attribute and a new type of panic device.
'isa' model is for ISA pvpanic device.
'pseries' model is a default value for pSeries guests.
'hyperv' model is the new type. It's used for Hyper-V crash.
Schema and docs are updated for the new attribute.
Adjust the config code so that it does not enforce that target memory
node is specified. To avoid breakage, adjust the qemu memory hotplug
config checker to disallow such config for now.
Adds a new interface type using UDP sockets, this seems only applicable
to QEMU but have edited tree-wide to support the new interface type.
The interface type required the addition of a "localaddr" (local
address), this then maps into the following xml and qemu call.
<interface type='udp'>
<mac address='52:54:00:5c:67:56'/>
<source address='127.0.0.1' port='11112'>
<local address='127.0.0.1' port='22222'/>
</source>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
QEMU call:
-net socket,udp=127.0.0.1:11112,localaddr=127.0.0.1:22222
Notice the xml "local" entry becomes the "localaddr" for the qemu call.
reference:
http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg00629.html
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
This patch adds feature for lxc containers to inherit namespaces.
This is very similar to what lxc-tools or docker provides. Look
for "man lxc-start" and you will find that you can pass command
args as [ --share-[net|ipc|uts] name|pid ]. Or check out docker
networking option in which you can give --net=container:NAME_or_ID
as an option for sharing +namespace.
>From this patch you can add extra libvirt option to share
namespace in following way.
<lxc:namespace>
<lxc:sharenet type='netns' value='red'/>
<lxc:shareipc type='pid' value='12345'/>
<lxc:shareuts type='name' value='container1'/>
</lxc:namespace>
The netns option is specific to sharenet. It can be used to
inherit from existing network namespace.
Co-authored: Daniel P. Berrange <berrange@redhat.com>
This controller can be connected only to a port on a
pcie-switch-upstream-port. It provides a single hotpluggable port that
will accept any PCI or PCIe device, as well as any device requiring a
pcie-*-port (the only current example of such a device is the
pcie-switch-upstream-port).
This controller can be connected only to a pcie-root-port or a
pcie-switch-downstream-port (which will be added in a later patch),
which is the reason for the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_PORT. A pcie-switch-upstream-port provides
32 ports (slot=0 to slot=31) on the downstream side, which can only
have pci controllers of model "pcie-switch-downstream-port" plugged
into them, which is the reason for the other new connect type
VIR_PCI_CONNECT_TYPE_PCIE_SWITCH.
This controller can be connected (at domain startup time only - not
hotpluggable) only to a port on the pcie root complex ("pcie-root" in
libvirt config), hence the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_ROOT. It provides a hotpluggable port that
will accept any PCI or PCIe device.
New attributes must be added to the controller <target> subelement for
this - chassis and port are guest-visible option values that will be
set by libvirt with values derived from the controller's index and pci
address information.
There are some configuration options to some types of pci controllers
that are currently automatically derived from other parts of the
controller's configuration. For example, in qemu a pci-bridge
controller has an option that is called "chassis_nr"; up until now
libvirt has always set chassis_nr to the index of the pci-bridge. So
this:
<controller type='pci' model='pci-bridge' index='2'/>
will always result in:
-device pci-bridge,chassis_nr=2,...
on the qemu commandline. In the future we may decide there is a better
way to derive that option, but even in that case we will need for
existing domains to retain the same chassis_nr they were using in the
past - that is something that is visible to the guest so it is part of
the guest ABI and changing it would lead to problems for migrating
guests (or just guests with very picky OSes).
The <target> subelement has been added as a place to put the new
"chassisNr" attribute that will be filled in by libvirt when it
auto-generates the chassisNr; it will be saved in the config, then
reused any time the domain is started:
<controller type='pci' model='pci-bridge' index='2'>
<model type='pci-bridge'/>
<target chassisNr='2'/>
</controller>
The one oddity of all this is that if the controller configuration
is changed (for example to change the index or the pci address
where the controller is plugged in), the items in <target> will
*not* be re-generated, which might lead to conflict. I can't
really see any way around this, but fortunately if there is a
material conflict qemu will let us know and we will pass that on
to the user.
This new subelement is used in PCI controllers: the toplevel
*attribute* "model" of a controller denotes what kind of PCI
controller is being described, e.g. a "dmi-to-pci-bridge",
"pci-bridge", or "pci-root". But in the future there will be different
implementations of some of those types of PCI controllers, which
behave similarly from libvirt's point of view (and so should have the
same model), but use a different device in qemu (and present
themselves as a different piece of hardware in the guest). In an ideal
world we (i.e. "I") would have thought of that back when the pci
controllers were added, and used some sort of type/class/model
notation (where class was used in the way we are now using model, and
model was used for the actual manufacturer's model number of a
particular family of PCI controller), but that opportunity is long
past, so as an alternative, this patch allows selecting a particular
implementation of a pci controller with the "name" attribute of the
<model> subelement, e.g.:
<controller type='pci' model='dmi-to-pci-bridge' index='1'>
<model name='i82801b11-bridge'/>
</controller>
In this case, "dmi-to-pci-bridge" is the kind of controller (one that
has a single PCIe port upstream, and 32 standard PCI ports downstream,
which are not hotpluggable), and the qemu device to be used to
implement this kind of controller is named "i82801b11-bridge".
Implementing the above now will allow us in the future to add a new
kind of dmi-to-pci-bridge that doesn't use qemu's i82801b11-bridge
device, but instead uses something else (which doesn't yet exist, but
qemu people have been discussing it), all without breaking existing
configs.
(note that for the existing "pci-bridge" type of PCI controller, both
the model attribute and <model> name are 'pci-bridge'. This is just a
coincidence, since it turns out that in this case the device name in
qemu really is a generic 'pci-bridge' rather than being the name of
some real-world chip)
This patch provides support for the new watchdog model "diag288".
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
This patch provides support for a new watchdog action "inject-nmi" which
allows to define an inject of a non-maskable interrupt into a guest.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Currently the grammar uses "none" for a "valid" Disk Storage Pool
format type; however, virStoragePoolFormatDisk uses "unknown" so
virt-xml-validate will fail to validate when "unknown" is found
Defining a domain with a SCSI disk attached via a hostdev
tag and a source address unit value longer than two digits
causes an error when editing the domain with virsh edit,
even if no changes are made to the domain definition.
The error suggests invalid XML, somewhere:
# virsh edit lmb_guest
error: XML document failed to validate against schema:
Unable to validate doc against /usr/local/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content
The virt-xml-validate tool fails with a similar error:
# virt-xml-validate lmb_guest.xml
Relax-NG validity error : Extra element devices in interleave
lmb_guest.xml:17: element devices: Relax-NG validity error :
Element domain failed to validate content
lmb_guest.xml fails to validate
The hostdev tag requires a source address to be specified,
which includes bus, target, and unit address attributes.
According to the SCSI Architecture Model spec (section
4.9 of SAM-2), a LUN address is 64 bits and thus could be
up to 20 decimal digits long. Unfortunately, the XML
schema limits this string to just two digits. Similarly,
the target field can be up to 32 bits in length, which
would be 10 decimal digits.
# lsscsi -xx
[0:0:19:0x4022401100000000] disk IBM 2107900 3.44 /dev/sda
# lsscsi
[0:0:19:1074872354]disk IBM 2107900 3.44 /dev/sda
# cat lmb_guest.xml
<domain type='kvm'>
<name>lmb_guest</name>
<memory unit='MiB'>1024</memory>
...trimmed...
<devices>
<controller type='scsi' model='virtio-scsi' index='0'/>
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1074872354'/>
</source>
</hostdev>
...trimmed...
Since the reference unit and target fields are used in
several places in the XML schema, create a separate one
specific for SCSI Logical Units that will permit the
greater length. This permits both the validation utility
and the virsh edit command to succeed when a hostdev
tag is included.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1220527
This type of information defines attributes of a system
baseboard. With one exception: board type is yet not implemented
in qemu so it's not introduced here either.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit id '887dd362' added support for a netfs pool format type 'cifs'
and 'gluster' in order to add rng support for Samba and glusterfs netfs
pools. Originally, the CIFS type support was added as part of commit
id '61fb6979'. Eventually commit id 'b325be12' fixed the gluster rng
definition to match expectations.
As it turns out the CIFS rng needed a similar change since the directory
path is not an absDirPath, rather just a dirPath will be required.
https://bugzilla.redhat.com/show_bug.cgi?id=1228007
When attaching a scsi volume lun via the attach-device --config or
--persistent options, there was no translation of the source pool
like there was for the live path, thus the attempt to modify the config
would fail since not enough was known about the disk.
I see no reason to duplicate this list of architectures. This also allows
more guest architectures to be used with libvirt (like the mips64el qemu
machine I am trying to run).
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
The network name is currently of type "deviceName" but it should be
"text" as name is defined in the network.rng.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
The XML parser sets a default <mode> if none is explicitly passed in.
This is then used at pool/vol creation time, and unconditionally reported
in the XML.
The problem with this approach is that it's impossible for other code
to determine if the user explicitly requested a storage mode. There
are some cases where we want to make this distinction, but we currently
can't.
Handle <mode> parsing like we handle <owner>/<group>: if no value is
passed in, set it to -1, and adjust the internal consumers to handle
it.
As of netcf-0.2.8, netcf supports configuring multipl IPv4 addresses,
as well as simultaneously configuring dhcp and static IPv4 addresses,
on a single interface. This patch updates libvirt's interface.rng to
allow such configurations.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1223688
https://bugzilla.redhat.com/show_bug.cgi?id=998813
Like usb-serial, the pci-serial device allows a serial device to be
attached to PCI bus. An example XML looks like this:
<serial type='dev'>
<source path='/dev/ttyS2'/>
<target type='pci-serial' port='0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</serial>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Two new domain configuration XML elements are added to enable/disable
the protected key management operations for a guest:
<domain>
...
<keywrap>
<cipher name='aes|dea' state='on|off'/>
</keywrap>
...
</domain>
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Some platforms, like aarch64, don't have APIC but GIC. So there's
no reason to have <apic/> feature turned on. However, we are
still missing <gic/> feature. This commit introduces the feature
to XML parser and formatter, adds documentation and updates RNG
schema.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
A new feature that can be turned on or off.
The QEMU machine vmport option allows to set the VMWare IO port
emulation. This emulation is useful for absolute pointer input when the
guest has vmware input drivers, and is enabled by default for kvm.
However it is unnecessary for Spice-enabled VM, since the agent already
handles absolute pointer and multi-monitors. Furthermore, it prevents
Spice from switching to relative input since the regular ps/2 pointer
driver is replaced by the vmware driver. It is thus advised to disable
vmport when using a Spice VM. This will permit the Spice client to
switch from absolute to relative pointer, as it may be required for
certain games or applications.
The phyp driver stuffed it into a DomainDefPtr during its attachdevice
routine, but the value is never advertised via capabilities so it should
be safe to drop.
Have the phyp driver use OSTYPE_LINUX, which is what it advertises via
capabilities.
Adding a new XML element 'iothreadids' in order to allow defining
specific IOThread ID's rather than relying on the algorithm to assign
IOThread ID's starting at 1 and incrementing to iothreads count.
This will allow future patches to be able to add new IOThreads by
a specific iothread_id and of course delete any exisiting IOThread.
Each iothreadids element will have 'n' <iothread> children elements
which will have attribute "id". The "id" will allow for definition
of any "valid" (eg > 0) iothread_id value.
On input, if any <iothreadids> <iothread>'s are provided, they will
be marked so that we only print out what we read in.
On input, if no <iothreadids> are provided, the PostParse code will
self generate a list of ID's starting at 1 and going to the number
of iothreads defined for the domain (just like the current algorithm
numbering scheme). A future patch will rework the existing algorithm
to make use of the iothreadids list.
On output, only print out the <iothreadids> if they were read in.
The PortNumber data type is declared to derive from 'short'.
Unfortunately this is an signed type, so validates the range
[-32,768, 32,767] which excludes valid port numbers between
32767 and 65535.
We can't use 'unsignedShort', since we need -1 to be a valid
port number too.
This change is to use 'int' and set an explicit max boundary
instead of relying on the data types' built-in max.
One of the existing tests is changed to use a high port number
to validate the schema.
https://bugzilla.redhat.com/show_bug.cgi?id=1214664
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
According to docs, using 'lun' as a value for device attribute is only valid
with disk types 'block' and 'network'. However current RNG schema also allows
a combination type='file' device='lun' which results in a successfull
xml validation, but fails at qemuBuildCommandLine.
Besides fixing the RNG schema, this patch also adds a qemuxml2argvtest
for this case.
https://bugzilla.redhat.com/show_bug.cgi?id=1210669
The <inbound/> element to <bandwidth/> has several attributes from
which two are mandatory. Well, from two at least one has to be
present: @average or @floor or both. Instead of inventing crazy RNG
schema, let's make all the attributes optional there and rely on our
parsing code to correctly handle the situation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
On IRC, Hydrar pointed a problem where 'virsh edit' failed on
his domain created through an ISCSI pool managed by virt-manager,
all because the XML included a block device with colons in the
name.
* docs/schemas/basictypes.rng (absFilePath): Add colon as safe.
* tests/qemuxml2argvdata/qemuxml2argv-disk-iscsi.xml: New file.
* tests/qemuxml2argvdata/qemuxml2argv-disk-iscsi.args: Likewise.
* tests/qemuxml2argvtest.c (mymain): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
When using QEMU's 9pfs the target "dir" element is not necessarily an
absolute path but merely an arbitrary identifier. So validation in that
case currently fails with the misleading
$ virt-xml-validate /tmp/test.xml
Relax-NG validity error : Extra element devices in interleave
/tmp/test.xml:24: element devices: Relax-NG validity error : Element domain failed to validate content
/tmp/test.xml fails to validate
This patch adds code that parses and formats configuration for memory
devices.
A simple configuration would be:
<memory model='dimm'>
<target>
<size unit='KiB'>524287</size>
<node>0</node>
</target>
</memory>
A complete configuration of a memory device:
<memory model='dimm'>
<source>
<pagesize unit='KiB'>4096</pagesize>
<nodemask>1-3</nodemask>
</source>
<target>
<size unit='KiB'>524287</size>
<node>1</node>
</target>
</memory>
This patch preemptively forbids use of the <memory> device in individual
drivers so the users are warned right away that the device is not
supported.
Add a XML element that will allow to specify maximum supportable memory
and the count of memory slots to use with memory hotplug.
To avoid possible confusion and misuse of the new element this patch
also explicitly forbids the use of the maxMemory setting in individual
drivers's post parse callbacks. This limitation will be lifted when the
support is implemented.
Wikipedia's list of common misspellings [1] has a machine-readable
version. This patch fixes those misspellings mentioned in the list
which don't have multiple right variants (as e.g. "accension", which can
be both "accession" and "ascension"), such misspellings are left
untouched. The list of changes was manually re-checked for false
positives.
[1] https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Midonet is an opensource virtual networking that over lays the IP
network between hypervisors. Currently, such networks can be made
with the openvswitch virtualport type.
This patch, defines the schema and documentation that will serve
as basis for the follow up patches that will add support to libvirt
for using Midonet virtual ports for its interfaces. The schema
definition requires that the port profile expresses its interfaceid
as part of the port profile. For that reason, this is part of the
patch too.
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
All the devices we have format their address as its last sub-element, so
let's change memballoon to follow suit. Also adjust RNG to allow any
order of them so 'virsh edit' doesn't shout at us.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Now that the size of guest's memory can be inferred from the NUMA
configuration (if present) make it optional to specify <memory>
explicitly.
To make sure that memory is specified add a check that some form of
memory size was specified. One side effect of this change is that it is
no longer possible to specify 0KiB as memory size for the VM, but I
don't think it would be any useful to do so. (I can imagine embedded
systems without memory, just registers, but that's far from what libvirt
is usually doing).
Forbidding 0 memory for guests also fixes a few corner cases where 0 was
not interpreted correctly and caused failures. (Arguments for numad when
using automatic placement, size of the balloon). This fixes problems
described in https://bugzilla.redhat.com/show_bug.cgi?id=1161461
Test case changes are added to verify that the schema change and code
behave correctly.
Our code supports that for ages. When using a <filterref/> to an
<interface/> several parameters can be passed to the filter. Later,
when building firewall rules, parameters are substituted for their
values. However, our RNG schema allowed only one parameter to be
passed.
Reported-by: Brian Rak <brak@gameservers.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Adding functionality to libvirt that will allow it
query the ethtool interface for the availability
of certain NIC HW offload features
Here is an example of the feature XML definition:
<device>
<name>net_eth4_90_e2_ba_5e_a5_45</name>
<path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/net/eth4</path>
<parent>pci_0000_08_00_1</parent>
<capability type='net'>
<interface>eth4</interface>
<address>90:e2:ba:5e:a5:45</address>
<link speed='10000' state='up'/>
<feature name='rx'/>
<feature name='tx'/>
<feature name='sg'/>
<feature name='tso'/>
<feature name='gso'/>
<feature name='gro'/>
<feature name='rxvlan'/>
<feature name='txvlan'/>
<feature name='rxhash'/>
<capability type='80203'/>
</capability>
</device>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
In commit edd1295e1d I've introduced an
XML element that allows to configure state of the network interface
link. Somehow the RNG schema hunk ended up in a weird place in the
network schema definition. Move it to the right place and add a test
case.
Note that the link state is set up via the monitor at VM startup so I
originally didn't think of adding a test case.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1173468
The element wasn't declared under the interleave thus it was required
always to be first. This made it inconvenient when pasting new stuff to
the XML manually in the "wrong" place.
The "virtio-mmio" is perfectly valid address type which we parse and
format correctly, but it's missing in our RNG schemas, hence editing a
domain with device having such address fails the validation.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
At least Xen supports backend drivers in another domain (aka "driver
domain"). This patch introduces an XML config option for specifying the
backend domain name for <disk> and <interface> devices. E.g.
<disk>
<backenddomain name='diskvm'/>
...
</disk>
<interface type='bridge'>
<backenddomain name='netvm'/>
...
</interface>
In the future, same option will be needed for USB devices (hostdev
objects), but for now libxl doesn't have support for PVUSB.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Add an XML attribute to allow disabling merge of rx buffers
on the host:
<interface ...>
...
<model type='virtio'/>
<driver ...>
<host mrg_rxbuf='off'/>
</driver>
</interface>
https://bugzilla.redhat.com/show_bug.cgi?id=1186886
In order for QEMU vCPU (and other) threads to run with RT scheduler,
libvirt needs to take care of that so QEMU doesn't have to run privileged.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178986
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In our RNG schema we do allow multiple (different) seclabels per-domain,
but don't allow this for devices, yet we neither have a check in our XML parser,
nor in a post-parse callback. In that case we should allow multiple
(different) seclabels for devices as well.
There are some interface types (notably 'server' and 'client')
which instead of allowing the default set of elements and
attributes (like the rest do), try to enumerate only the elements
they know of. This way it's, however, easy to miss something. For
instance, the <address/> element was not mentioned at all. This
resulted in a strange behavior: when such interface was added
into XML, the address was automatically generated by parsing
code. Later, the formatted XML hasn't passed the RNG schema. This
became more visible once we've turned on the XML validation on
domain XML changes: appending an empty line at the end of
formatted XML (to trick virsh think the XML had changed) made
libvirt to refuse the very same XML it formatted.
Instead of trying to find each element and attribute we are
missing in the schema, lets just allow all the elements and
attributes like we're doing that for the rest of types. It's no
harm if the schema is wider than our parser allows.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1130390
The listen address is not mandatory for <interface type='server'>
but when it's not specified, we've been formatting it as:
-netdev socket,listen=(null):5558,id=hostnet0
which failed with:
Device 'socket' could not be initialized
Omit the address completely and only format the port in the listen
attribute.
Also fix the schema to allow specifying a model.
This adds a new "localOnly" attribute on the domain element of the
network xml. With this set to "yes", DNS requests under that domain
will only be resolved by libvirt's dnsmasq, never forwarded upstream.
This was how it worked before commit f69a6b987d, and I found that
functionality useful. For example, I have my host's NetworkManager
dnsmasq configured to forward that domain to libvirt's dnsmasq, so I can
easily resolve guest names from outside. But if libvirt's dnsmasq
doesn't know a name and forwards it to the host, I'd get an endless
forwarding loop. Now I can set localOnly="yes" to prevent the loop.
Signed-off-by: Josh Stone <jistone@redhat.com>
Ploop is a pseudo device which makeit possible to access
to an image in a file as a block device. Like loop devices,
but with additional features, like snapshots, write tracker
and without double-caching.
It used in PCS for containers and in OpenVZ. You can manage
ploop devices and images with ploop utility
(http://git.openvz.org/?p=ploop).
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
QEMU supports feature specification with -cpu host and we just skip
using that. Since QEMU developers themselves would like to use this
feature, this patch modifies the code to work.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178850
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Make use of the ebtables functionality to be able to filter certain
parameters of icmpv6 packets. Extend the XML parser for icmpv6 types,
type ranges, codes, and code ranges. Extend the nwfilter documentation,
schema, and test cases.
Being able to filter icmpv6 types and codes helps extending the DHCP
snooper for IPv6 and filtering at least some parameters of IPv6's NDP
(Neighbor Discovery Protocol) packets. However, the filtering will not
be as good as the filtering of ARP packets since we cannot
check on IP addresses in the payload of the NDP packets.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
Volume and pool formatting functions took different approaches to
unspecified uids/gids. When unknown, it is always parsed as -1, but one
of the functions formatted it as unsigned int (wrong) and one as
int (better). Due to that, our two of our XML files from tests cannot
be parsed on 32-bit machines.
RNG schema needs to be modified as well, but because both
storagepool.rng and storagevol.rng need same schema for permission
element, save some space by moving it to storagecommon.rng.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The macTableManager attribute of a network's bridge subelement tells
libvirt how the bridge's MAC address table (used to determine the
egress port for packets) is managed. In the default mode, "kernel",
management is left to the kernel, which usually determines entries in
part by turning on promiscuous mode on all ports of the bridge,
flooding packets to all ports when the correct destination is unknown,
and adding/removing entries to the fdb as it sees incoming traffic
from particular MAC addresses. In "libvirt" mode, libvirt turns off
learning and flooding on all the bridge ports connected to guest
domain interfaces, and adds/removes entries according to the MAC
addresses in the domain interface configurations. A side effect of
turning off learning and unicast_flood on the ports of a bridge is
that (with Linux kernel 3.17 and newer), the kernel can automatically
turn off promiscuous mode on one or more of the bridge's ports
(usually only the one interface that is used to connect the bridge to
the physical network). The result is better performance (because
packets aren't being flooded to all ports, and can be dropped earlier
when they are of no interest) and slightly better security (a guest
can still send out packets with a spoofed source MAC address, but will
only receive traffic intended for the guest interface's configured MAC
address).
The attribute looks like this in the configuration:
<network>
<name>test</name>
<bridge name='br0' macTableManager='libvirt'/>
...
This patch only adds the config knob, documentation, and test
cases. The functionality behind this knob is added in later patches.
Add attribute to set vgamem_mb parameter of QXL device for QEMU. This
value sets the size of VGA framebuffer for QXL device. Default value in
QEMU is 8MB so reuse it also in libvirt to not break things.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1076098
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
To be able to express some use cases of the RBD backing with libvirt, we
need to be able to specify a config file for the RBD client to qemu as
that is one of the commonly used options.
Some storage systems have internal support for snapshots. Libvirt should
be able to select a correct snapshot when starting a VM.
This patch adds a XML element to select a storage source snapshot for
the RBD protocol which supports this feature.
To track state of virtio channels this patch adds a new output-only
attribute called 'state' to the <target> element of virtio channels.
This will be later populated with the guest state of the channel.
https://bugzilla.redhat.com/show_bug.cgi?id=1160926
Introduce a 'managed' attribute to allow libvirt to decide whether to
delete a vHBA vport created via external means such as nodedev-create.
The code currently decides whether to delete the vHBA based solely on
whether the parent was provided at creation time. However, that may not
be the desired action, so rather than delete and force someone to create
another vHBA via an additional nodedev-create allow the configuration of
the storage pool to decide the desired action.
During createVport when libvirt does the VPORT_CREATE, set the managed
value to YES if not already set to indicate to the deleteVport code that
it should delete the vHBA when the pool is destroyed.
If libvirtd is restarted all the memory only state was lost, so for a
persistent storage pool, use the virStoragePoolSaveConfig in order to
write out the managed value.
Because we're now saving the current configuration, we need to be sure
to not save the parent in the output XML if it was undefined at start.
Saving the name would cause future starts to always use the same parent
which is not the expected result when not providing a parent. By not
providing a parent, libvirt is expected to find the best available
vHBA port for each subsequent (re)start.
At deleteVport, use the new managed value to decide whether to execute
the VPORT_DELETE. Since we no longer save the parent in memory or in
XML when provided, if it was not provided, then we have to look it up.
Modify the structure _virDomainBlockIoTuneInfo to support these the new
options.
Change the initialization of the variable expectedInfo in qemumonitorjsontest.c
to avoid compiling problem.
Add documentation about the new xml options
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
CPU numa topology implicitly allows memory specification in 'KiB'.
Enabling this to accept the 'unit' in which memory needs to be specified.
This now allows users to specify memory in units of choice, and
lists the same in 'KiB' -- just like other 'memory' elements in XML.
<numa>
<cell cpus='0-3' memory='1024' unit='MiB' />
<cell cpus='4-7' memory='1024' unit='MiB' />
</numa>
Also augment test cases to correctly model NUMA memory specification.
This adds the tag 'unit="KiB"' for memory attribute in NUMA cells.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This new attribute will control whether or not libvirt will pay
attention to guest notifications about changes to network device mac
addresses and receive filters. The default for this is 'no' (for
security reasons). If it is set to 'yes' *and* the specified device
model and connection support it (currently only macvtap+virtio) then
libvirt will watch for NIC_RX_FILTER_CHANGED events, and when it
receives one, it will issue a query-rx-filter command, retrieve the
result, and modify the host-side macvtap interface's mac address and
unicast/multicast filters accordingly.
The functionality behind this attribute will be in a later patch. This
patch merely adds the attribute to the top-level of a domain's
<interface> as well as to <network> and <portgroup>, and adds
documentation and schema/xml2xml tests. Rather than adding even more
test files, I've just added the net attribute in various applicable
places of existing test files.
This patch adds parsing/formatting code as well as documentation for
shared memory devices. This will currently be only accessible in QEMU
using it's ivshmem device, but is designed as generic as possible to
allow future expansion for other hypervisors.
In the devices section in the domain XML users may specify:
- For shmem device using a server:
<shmem name='shmem0'>
<server path='/tmp/socket-ivshmem0'/>
<size unit='M'>32</size>
<msi vectors='32' ioeventfd='on'/>
</shmem>
- For ivshmem device not using an ivshmem server:
<shmem name='shmem1'>
<size unit='M'>32</size>
</shmem>
Most of the configuration is made optional so it also allows
specifications like:
<shmem name='shmem1/>
<shmem name='shmem2'>
<server/>
</shmem>
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
According to our documentation logical pool supports formats 'auto' and
'lvm2'. However, in storage_conf.c we previously defined storage pool
formats: unknown, lvm2. Due to backward compatibility reasons
we must continue refer to pool format type 'unknown' instead of 'auto'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1123767
Add options for tuning segment offloading:
<driver>
<host csum='off' gso='off' tso4='off' tso6='off'
ecn='off' ufo='off'/>
<guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
</driver>
which control the respective host_ and guest_ properties
of the virtio-net device.
- Provide an implementation for buildPool and deletePool operations
for the ZFS storage backend.
- Add VIR_STORAGE_POOL_SOURCE_DEVICE flag to ZFS pool poolOptions
as now we can specify devices to build pool from
- storagepool.rng: add an optional 'sourceinfodev' to 'sourcezfs' and
add an optional 'target' to 'poolzfs' entity
- Add a couple of tests to storagepoolxml2xmltest
Check to see if the UEFI binary mentioned in qemu.conf actually
exists, and if so expose it in domcapabilities like
<loader ...>
<value>/path/to/ovmf</value>
</loader>
We introduce some generic domcaps infrastructure for handling
a dynamic list of string values, it may be of use for future bits.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
As of 542899168c we learned libvirt to use UEFI for domains.
However, management applications may firstly query if libvirt
supports it. And this is where virConnectGetDomainCapabilities()
API comes handy.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For tuning the network, alternative devices
for creating tap and vhost devices can be specified via:
<backend tap='/dev/net/tun' vhost='/dev/net-vhost'/>
I noticed this with the recent iothread pinning code, but the
problem existed longer than that. The XML validation required
users to supply <cputune> children in a strict order, even though
there was no conceptual reason why they can't occur in any order.
docs/ changes best viewed with -w
* docs/schemas/domaincommon.rng (cputune): Add interleave.
* tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml: Swap
up order, copying canonical form...
* tests/qemuxml2xmloutdata/qemuxml2xmlout-cputune-iothreads.xml:
...here.
* tests/qemuxml2xmltest.c (mymain): Mark the difference.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1101574
Add an option 'iothreadpin' to the <cpuset> to allow for setting the
CPU affinity for each IOThread.
The iothreadspin will mimic the vcpupin with respect to being able to
assign each iothread to a specific CPU, although iothreads ids start
at 1 while vcpu ids start at 0. This matches the iothread naming scheme.
When spanning tree protocol is allowed in bridge settings, forward delay
value is set as well (default is 0 if omitted). Until now, there was no
check for delay value validity. Delay makes sense only as a positive
numerical value.
Note: However, even if you provide positive numerical value, brctl
utility only uses values from range <2,30>, so the number provided can
be modified (kernel most likely) to fall within this range.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1125764
When using split UEFI image, it may come handy if libvirt manages per
domain _VARS file automatically. While the _CODE file is RO and can be
shared among multiple domains, you certainly don't want to do that on
the _VARS file. This latter one needs to be per domain. So at the
domain startup process, if it's determined that domain needs _VARS
file it's copied from this master _VARS file. The location of the
master file is configurable in qemu.conf.
Temporary, on per domain basis the location of master NVRAM file can
be overridden by this @template attribute I'm inventing to the
<nvram/> element. All it does is holding path to the master NVRAM file
from which local copy is created. If that's the case, the map in
qemu.conf is not consulted.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Up to now, users can configure BIOS via the <loader/> element. With
the upcoming implementation of UEFI this is not enough as BIOS and
UEFI are conceptually different. For instance, while BIOS is ROM, UEFI
is programmable flash (although all writes to code section are
denied). Therefore we need new attribute @type which will
differentiate the two. Then, new attribute @readonly is introduced to
reflect the fact that some images are RO.
Moreover, the OVMF (which is going to be used mostly), works in two
modes:
1) Code and UEFI variable store is mixed in one file.
2) Code and UEFI variable store is separated in two files
The latter has advantage of updating the UEFI code without losing the
configuration. However, in order to represent the latter case we need
yet another XML element: <nvram/>. Currently, it has no additional
attributes, it's just a bare element containing path to the variable
store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add a new disk "driver" attribute "iothread" to be parsed as the thread
number for the disk to use. In order to more easily facilitate the usage
and configuration of the iothread, a "zero" for the attribute indicates
iothreads are not supported for the device and a positive value indicates
the specific thread to try and use.
Introduce XML to allowing adding iothreads to the domain. These can be
used by virtio-blk-pci devices in order to assign a specific thread to
handle the workload for the device. The iothreads are the official
implementation of the virtio-blk Data Plane that's been in tech preview
for QEMU.
QEMU 2.1 added support for the kvm=off option to the -cpu command,
allowing the KVM hypervisor signature to be hidden from the guest.
This enables disabling of some paravirualization features in the
guest as well as allowing certain drivers which test for the
hypervisor to load. Domain XML syntax is as follows:
<domain type='kvm>
...
<features>
...
<kvm>
<hidden state='on'/>
</kvm>
</features>
...
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The correct vlanid range is 0~4095.
After merging this patch, we can not validate a interface xml with vlanid >= 4096.
[root@localhost ~]# cat vlan.xml
<interface type='vlan' name='eno1.4096'>
<start mode='onboot'/>
<protocol family='ipv4'>
<dhcp/>
</protocol>
<vlan tag='4096'>
<interface name='eno1'/>
</vlan>
</interface>
[root@localhost ~]# virt-xml-validate vlan.xml
vlan.xml:1: element interface: Relax-NG validity error : Invalid sequence in interleave
vlan.xml:6: element vlan: Relax-NG validity error : Element interface failed to validate content
vlan.xml:6: element vlan: Relax-NG validity error : Element vlan failed to validate attributes
vlan.xml fails to validate
[root@localhost ~]#
Here is a ip command help on this.
[root@localhost /]# ip link add link eno1 name eno1.90 type vlan help
Usage: ... vlan [ protocol VLANPROTO ] id VLANID [ FLAG-LIST ]
[ ingress-qos-map QOS-MAP ] [ egress-qos-map QOS-MAP ]
VLANPROTO: [ 802.1Q / 802.1ad ]
VLANID := 0-4095
FLAG-LIST := [ FLAG-LIST ] FLAG
FLAG := [ reorder_hdr { on | off } ] [ gvrp { on | off } ] [ mvrp { on | off } ]
[ loose_binding { on | off } ]
QOS-MAP := [ QOS-MAP ] QOS-MAPPING
QOS-MAPPING := FROM:TO
Implement ZFS storage backend driver. Currently supported
only on FreeBSD because of ZFS limitations on Linux.
Features supported:
- pool-start, pool-stop
- pool-info
- vol-list
- vol-create / vol-delete
Pool definition looks like that:
<pool type='zfs'>
<name>myzfspool</name>
<source>
<name>actualpoolname</name>
</source>
</pool>
The 'actualpoolname' value is a name of the pool on the system,
such as shown by 'zpool list' command. Target makes no sense
here because volumes path is always /dev/zvol/$poolname/$volname.
User has to create a pool on his own, this driver doesn't
support pool creation currently.
A volume could be used with Qemu by adding an entry like this:
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='myzfspool' volume='vol5'/>
<target dev='hdc' bus='ide'/>
</disk>
Introduce a new structure to handle an iSCSI host device based on the
existing virDomainHostdevSubsysSCSI by adding a "protocol='iscsi'" to
the <source/> element. The existing scsi_host subsystem RNG was modified
to read an optional "protocol='adapter'", although it won't be written
out nor is it documented as an option (by choice).
The new hostdev structure mimics the existing <disk/> element for an
iSCSI device (network) device. New XML is:
<hostdev mode='subsystem' type='scsi' managed='yes'>
<source protocol='iscsi' name='iqn.1992-01.com.example'>
<host name='example.org' port='3260'/>
<auth username='myname'>
<secret type='iscsi' usage='mycluster_myname'/>
</auth>
</source>
<address type='drive' controller='0' bus='0' target='2' unit='5'/>
</hostdev>
The controller element will mimic the existing scsi_host code insomuch
as when 'lsi' and 'virtio-scsi' are used.
A future patch is going to wire up qemu active block commit jobs;
but as they have similar events and are canceled/pivoted in the
same way as block copy jobs, it is easiest to track all bookkeeping
for the commit job by reusing the <mirror> element. This patch
adds domain XML to track which job was responsible for creating a
mirroring situation, and adds a job='copy' attribute to all
existing uses of <mirror>. Along the way, it also massages the
qemu monitor backend to read the new field in order to generate
the correct type of libvirt job (even though it requires a
future patch to actually cause a qemu event that can be reported
as an active commit). It also prepares to update persistent XML
to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema.
* docs/formatdomain.html.in: Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): Add a field.
* src/conf/domain_conf.c (virDomainBlockJobType): String conversion.
(virDomainDiskDefParseXML): Parse job type.
(virDomainDiskDefFormat): Output job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish
active from regular commit.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type.
(qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type
on completion.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New
file.
* tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Doing a blockcopy operation across a libvirtd restart is not very
robust at the moment. In particular, we are clearing the <mirror>
element prior to telling qemu to finish the job. Also, thanks to the
ability to request async completion, the user can easily regain
control prior to qemu actually finishing the effort, and they should
be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done
before modifying the XML to reflect the job completion. But since
qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether
the job was cancelled (kept the original disk) or completed (pivoted
to the new disk), we have to track which of the two operations were
used to end the job. Furthermore, we'd like to avoid attempts to
end a job where we are already waiting on an earlier request to qemu
to end the job. Likewise, if we miss the qemu event (perhaps because
it arrived during a libvirtd restart), we still need enough state
recorded to be able to determine how to modify the domain XML once
we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a
preliminary step that makes it possible to track whether a job
has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirrorState): New enum.
(_virDomainDiskDef): Convert bool mirroring to new enum.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Handle new values.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust
client.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot)
(qemuDomainBlockJobImpl): Likewise.
* docs/schemas/domaincommon.rng (diskMirror): Expose new values.
* docs/formatdomain.html.in (elementsDisks): Document it.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
* docs/schemas/domaincommon.rng: Add bhyve domain type, nmdm
serial type and master and slave optional attributes for
serial that are used by nmdm
* tests/domainschematest: Add bhyvexml2argvdata directory
to validate bhyve XMLs
Added <capabilities> in the <features> section of LXC domains
configuration. This section can contain elements named after the
capabilities like:
<mknod state="on"/>, keep CAP_MKNOD capability
<sys_chroot state="off"/> drop CAP_SYS_CHROOT capability
Users can restrict or give more capabilities than the default using
this mechanism.
Add an optional unique_id parameter to nodedev. Allows for easier lookup
and display of the unique_id value in order to document for use with
scsi_host code.
Between reboots and kernel reloads, the SCSI host number used for SCSI
storage pools may change requiring modification to the storage pool XML
in order to use a specific SCSI host adapter.
This patch introduces the "parentaddr" element and "unique_id" attribute
for the SCSI host adapter in order to uniquely identify the adapter
between reboots and kernel reloads. For now the goal is to only parse
and format the XML. Both will be required to be provided in order to
uniquely identify the desired SCSI host.
The new XML is expected to be as follows:
<adapter type='scsi_host'>
<parentaddr unique_id='3'>
<address domain='0x0000' bus='0x00' slot='0x1f' func='0x2'/>
</parentaddr>
</adapter>
where "parentaddr" is the parent device of the SCSI host using the PCI
address on which the device resides and the value from the unique_id file
for the device. Both the PCI address and unique_id values will be used
to traverse the /sys/class/scsi_host/ directories looking at each link
to match the PCI address reformatted to the directory link format where
"domain🚌slot:function" is found. Then for each matching directory
the unique_id file for the scsi_host will be used to match the unique_id
value in the xml.
For a PCI address listed above, this will be formatted to "0000:00:1f.2"
and the links in /sys/class/scsi_host will be used to find the host#
to be used for the 'scsi_host' device. Each entry is a link to the
/sys/bus/pci/devices directories, e.g.:
% ls -al /sys/class/scsi_host/host2
lrwxrwxrwx. 1 root root 0 Jun 1 00:22 /sys/class/scsi_host/host2 -> ../../devices/pci0000:00/0000:00:1f.2/ata3/host2/scsi_host/host2
% cat /sys/class/scsi_host/host2/unique_id
3
The "parentaddr" and "name" attributes are mutually exclusive to identify
the SCSI host number. Use of the "parentaddr" element will be the preferred
mechanism.
This patch only supports to parse and format the XMLs. Later patches will
add code to find out the scsi host number.
Gluster volumes don't start with a leading slash. Our schema for netfs
gluster pools enforces it though. Luckily mount.glusterfs skips it.
Allow a slashless volume name for glusterfs netfs mounts in the schema.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1101999
LXC network devices can now be assigned a custom NIC device name on the
container side. For example, this is configured with:
<interface type='network'>
<source network='default'/>
<guest dev="eth1"/>
</interface>
In this example the network card will appear as eth1 in the guest.
The previous commit 09d4d26 put the interleave at the wrong point;
it didn't allow interleaving with <memory>.
* docs/schema/domaincommon.rng (numatune): Fix interleave location.
* tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode.xml: Adjust test.
Signed-off-by: Eric Blake <eblake@redhat.com>
In XML format, by definition, order of fields should not matter, so
order of parsing the elements doesn't affect the end result. When
specifying guest NUMA cells, we depend only on the order of the 'cell'
elements. With this patch all older domain XMLs are parsed as before,
but with the 'id' attribute they are parsed and formatted according to
that field. This will be useful when we have tuning settings for
particular guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch adds support for the QEMU vhost-user feature to libvirt.
vhost-user enables the communication between a QEMU virtual machine
and other userspace process using the Virtio transport protocol.
It uses a char dev (e.g. Unix socket) for the control plane,
while the data plane based on shared memory.
The XML looks like:
<interface type='vhostuser'>
<mac address='52:54:00:3b:83:1a'/>
<source type='unix' path='/tmp/vhost.sock' mode='server'/>
<model type='virtio'/>
</interface>
Signed-off-by: Michele Paolino <m.paolino@virtualopensystems.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add 'nocow' to storage volume xml so that user can have an option
to set NOCOW flag to the newly created volume. It's useful on btrfs
file system to enhance performance.
Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this
bad performance is to turn off COW attributes on VM files. Generally, there
are two ways to turn off COW on btrfs: a) by mounting fs with nodatacow,
then all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.
This patch tries the second way, according to 'nocow' option, it could set
NOCOW flag per file:
for raw file images, handle 'nocow' in libvirt code; for non-raw file images,
pass 'nocow=on' option to qemu-img, and let qemu-img to handle that (requires
qemu-img version >= 2.1).
Signed-off-by: Chunyan Liu <cyliu@suse.com>
This new module holds and formats capabilities for emulator. If you
are about to create a new domain, you may want to know what is the
host or hypervisor capable of. To make sure we don't regress on the
XML, the formatting is not something left for each driver to
implement, rather there's general format function.
The domain capabilities is a lockable object (even though the locking
is not necessary yet) which uses reference counter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This introduces two new attributes "cmd_per_lun" and "max_sectors" same
with the names QEMU uses for virtio-scsi. An example of the XML:
<controller type='scsi' index='0' model='virtio-scsi' cmd_per_lun='50'
max_sectors='512'/>
The corresponding QEMU command line:
-device virtio-scsi-pci,id=scsi0,cmd_per_lun=50,max_sectors=512,
bus=pci.0,addr=0x3
Signed-off-by: Mike Perez <thingee@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The interface state for bonds and vlans does seem to reflect the state
of the underlying physical devices, at least in some cases, so it
makes sense to allow reporting it (netcf now does).
The link state/speed for bridge devices is meaningless though, so we
don't even look for it.
The interface xml schema was written with strict rules about the
ordering of the elements. This was never intentional, but just due to
omission of <interleave> in the appropriate places. This patch just
adds in <interleave> wherever there is more than one element, and
re-indents everything else appropriately.
There are two places where you'll find info on page sizes. The first
one is under <cpu/> element, where all supported pages sizes are
listed. Then the second one is under each <cell/> element which refers
to concrete NUMA node. At this place, the size of page's pool is
reported. So the capabilities XML looks something like this:
<capabilities>
<host>
<uuid>01281cda-f352-cb11-a9db-e905fe22010c</uuid>
<cpu>
<arch>x86_64</arch>
<model>Westmere</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='1' threads='1'/>
...
<pages unit='KiB' size='4'/>
<pages unit='KiB' size='2048'/>
<pages unit='KiB' size='1048576'/>
</cpu>
...
<topology>
<cells num='4'>
<cell id='0'>
<memory unit='KiB'>4054408</memory>
<pages unit='KiB' size='4'>1013602</pages>
<pages unit='KiB' size='2048'>3</pages>
<pages unit='KiB' size='1048576'>1</pages>
<distances/>
<cpus num='1'>
<cpu id='0' socket_id='0' core_id='0' siblings='0'/>
</cpus>
</cell>
<cell id='1'>
<memory unit='KiB'>4071072</memory>
<pages unit='KiB' size='4'>1017768</pages>
<pages unit='KiB' size='2048'>3</pages>
<pages unit='KiB' size='1048576'>1</pages>
<distances/>
<cpus num='1'>
<cpu id='1' socket_id='0' core_id='0' siblings='1'/>
</cpus>
</cell>
...
</cells>
</topology>
...
</host>
<guest/>
</capabilities>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 7c6fc39 introduced a regression in the XML produced for older
clients. The argument at the time was that clients shouldn't be
depending on output-only data for something that is only going to
be triggered for a transient guest; but John Ferlan reported that
the automated testsuite was such a client. It's better to be safe
than sorry by guaranteeing back-compat cruft. Note that later
patches will be using <mirror> for active block commit, but there
we don't have to worry about back-compat.
* src/conf/domain_conf.c (virDomainDiskDefFormat): Restore old
style output when necessary.
* docs/schemas/domaincommon.rng: Validate back-compat style.
* docs/formatdomain.html.in: Update the documentation.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
This new element is there to represent PCI-Express capabilities
of a PCI devices, like link speed, number of lanes, etc.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While exposing the info under <interface/> in previous patch works, it
may work only in cases where interface is configured on the host.
However, orchestrating application may want to know the link state and
speed even in that case. That's why we ought to expose this in nodedev
XML too:
virsh # nodedev-dumpxml net_eth0_f0_de_f1_2b_1b_f3
<device>
<name>net_eth0_f0_de_f1_2b_1b_f3</name>
<path>/sys/devices/pci0000:00/0000:00:19.0/net/eth0</path>
<parent>pci_0000_00_19_0</parent>
<capability type='net'>
<interface>eth0</interface>
<address>f0🇩🇪f1:2b:1b:f3</address>
<link speed='1000' state='up'/>
<capability type='80203'/>
</capability>
</device>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently it is not possible to determine the speed of an interface
and whether a link is actually detected from the API. Orchestrating
platforms want to be able to determine when the link has failed and
where multiple speeds may be available which one the interface is
actually connected at. This commit introduces an extension to our
interface XML (without implementation to interface driver backends):
<interface type='ethernet' name='eth0'>
<start mode='none'/>
<mac address='aa:bb:cc:dd:ee:ff'/>
<link speed='1000' state='up'/>
<mtu size='1492'/>
...
</interface>
Where @speed is negotiated link speed in Mbits per second, and state
is the current NIC state (can be one of the following: "unknown",
"notpresent", "down", "lowerlayerdown","testing", "dormant", "up").
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we track a disk mirror as a virStorageSource, we might
as well update the XML to theoretically allow any type of
mirroring destination (not just a local file). A later patch
will also be reusing <mirror> to track the block commit of the
top layer of a chain, which is another case where libvirt needs
to update the backing chain after the job is finally pivoted,
and since backing chains can have network backing files as the
destination to commit into, it makes more sense to display that
in the XML.
This patch changes output-only XML; it was already documented
that <mirror> does not affect a domain definition at this point
(because qemu doesn't provide persistent bitmaps yet). Any
application that was starting a block copy job with older libvirt
and then relying on the domain XML to determine if it was
complete will no longer be able to access the file= and format=
attributes of mirror that were previously used. However, this is
not going to be a problem in practice: the only time a block copy
job works is on a transient domain, and any app that is managing
a transient domain probably already does enough of its own
bookkeeping to know which file it is mirroring into without
having to re-read it from the libvirt XML. The one thing that
was likely to be used in a mirroring job was the ready=
attribute, which is unchanged. Meanwhile, I made sure the schema
and parser still accept the old format, even if we no longer
output it, so that upgrading from an older version of libvirt is
seamless.
* docs/schemas/domaincommon.rng (diskMirror): Alter definition.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Parse two
styles of mirror elements.
(virDomainDiskDefFormat): Output new style.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror-old.xml: New
file, copied from...
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: ...here
before modernizing.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old*: New
files.
* tests/qemuxml2xmltest.c (mymain): Test both styles.
Signed-off-by: Eric Blake <eblake@redhat.com>
A PCI device can be associated with a specific NUMA node. Later, when
a guest is pinned to one NUMA node the PCI device can be assigned on
different NUMA node. This makes DMA transfers travel across nodes and
thus results in suboptimal performance. We should expose the NUMA node
locality for PCI devices so management applications can make better
decisions.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If user or management application wants to create a guest,
it may be useful to know the cost of internode latencies
before the guest resources are pinned. For example:
<capabilities>
<host>
...
<topology>
<cells num='2'>
<cell id='0'>
<memory unit='KiB'>4004132</memory>
<distances>
<sibling id='0' value='10'/>
<sibling id='1' value='20'/>
</distances>
<cpus num='2'>
<cpu id='0' socket_id='0' core_id='0' siblings='0'/>
<cpu id='2' socket_id='0' core_id='2' siblings='2'/>
</cpus>
</cell>
<cell id='1'>
<memory unit='KiB'>4030064</memory>
<distances>
<sibling id='0' value='20'/>
<sibling id='1' value='10'/>
</distances>
<cpus num='2'>
<cpu id='1' socket_id='0' core_id='0' siblings='1'/>
<cpu id='3' socket_id='0' core_id='2' siblings='3'/>
</cpus>
</cell>
</cells>
</topology>
...
</host>
...
</capabilities>
We can see the distance from node1 to node0 is 20 and within nodes 10.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The XML for quite a longish backing chain is shown below:
<disk type='network' device='disk'>
<driver name='qemu' type='qcow2'/>
<source protocol='nbd' name='bar'>
<host transport='unix' socket='/var/run/nbdsock'/>
</source>
<backingStore type='block' index='1'>
<format type='qcow2'/>
<source dev='/dev/HostVG/QEMUGuest1'/>
<backingStore type='file' index='2'>
<format type='qcow2'/>
<source file='/tmp/image2.qcow'/>
<backingStore type='file' index='3'>
<format type='qcow2'/>
<source file='/tmp/image3.qcow'/>
<backingStore type='file' index='4'>
<format type='qcow2'/>
<source file='/tmp/image4.qcow'/>
<backingStore type='file' index='5'>
<format type='qcow2'/>
<source file='/tmp/image5.qcow'/>
<backingStore type='file' index='6'>
<format type='raw'/>
<source file='/tmp/Fedora-17-x86_64-Live-KDE.iso'/>
<backingStore/>
</backingStore>
</backingStore>
</backingStore>
</backingStore>
</backingStore>
</backingStore>
<target dev='vdb' bus='virtio'/>
</disk>
Various disk types and formats can be mixed in one chain. The
<backingStore/> empty element marks the end of the backing chain and it
is there mostly for future support of parsing the chain provided by a
user. If it's missing, we are supposed to probe for the rest of the
chain ourselves, otherwise complete chain was provided by the user. The
index attributes of backingStore elements can be used to unambiguously
identify a specific part of the image chain.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
I noticed that depending on the <driver> attributes the user passed
in, the output may omit the <driver> element altogether. For example,
the rerror_policy has had this problem since commit 4bb4109 in Oct
2011. But in adding testsuite coverage to expose it, I found another
problem: the C code is just fine without a driver name, but the
XML validator required either a name or a cache mode.
* src/conf/domain_conf.c (virDomainDiskDefFormat): Update
conditional.
* docs/schemas/domaincommon.rng (diskDriver): Simplify.
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-copy-on-read.xml:
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-copy-on-read.args:
New files.
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-discard.xml:
Enhance test.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-drive-discard.xml:
Likewise.
* tests/qemuxml2argvtest.c (mymain): New test.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
To make <disk> schema more maintainable and to allow for moving the
pieces to a common file in the future. It relies on the ability to
override definitions as part of an include, set up in the previous
patch.
The diff is a bit hard to read, because it mixes reindentation
with refactoring; 'git diff -b --patience' may help.
* docs/schemas/domaincommon.rng (disk): Refactor into pieces.
(diskSource, diskSourceFile, diskSourceBlock, diskSourceDir)
(diskSourceVolume: New defines.
(diskSourceNetwork): Revise scope.
* docs/schemas/domainsnapshot.rng (disksnapshot): Adjust.
* tests/domainsnapshotxml2xmlin/disk-seclabel-invalid.xml,
tests/domainsnapshotxml2xmlin/disk-network-seclabel-invalid.xml: New
tests to check seclabel is forbidden in domain snapshot by schema.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This patch is my first experience playing with nested grammars,
as documented in http://relaxng.org/tutorial-20011203.html#IDA3PZR.
I plan on doing more overrides in order to make the RelaxNG
grammar mirror the C code refactoring into a common
virStorageSource, but where different clients of that source do
not support the same subset of functionality. By starting with
something fairly easy to validate, I can make sure my later
patches will be possible.
This patch adds a use of the no-op <ref
name='sourceStartupPolicy'/> to the disksnapshot definition, so
that the snapshot version of a type='file' <source> more closely
resembles the version in domaincommon. A future patch will merge
the two files into using a common define, but this patch is
sufficient for testing that adding <source
startupPolicy='optional'/> in any of the
tests/domainsnapshotxml2xmlin/*.xml files still gets rejected
unless it occurs within the <domain> subelement, because the
definition of startupPolicy is empty outside of domain.rng.
* docs/schemas/storagecommon.rng (storageStartupPolicy)
(storageSourceExtra): Create no-op defaults.
* docs/schemas/domainsnapshot.rng (domain): Use nested grammar
to avoid restricting <domain>.
(storageSourceExtra): Create new override.
(disksnapshot): Access overrides through common names.
* docs/schemas/domaincommon.rng (disk): Access overrides through
common names.
* docs/schemas/domain.rng (storageStartupPolicy)
(storageSourceExtra): Create new overrides.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Domain snapshots should only permit an external snapshot into
a storage format that permits a backing chain, since the new
snapshot file necessarily must be backed by the existing file.
The C code for the qemu driver is a little bit stricter in
currently enforcing only qcow2 or qed, but at the XML parser
level, including virt-xml-validate, it is fairly easy to
enforce that a user can't request a 'raw' external snapshot.
* docs/schemas/storagecommon.rng (storageFormat): Split out...
(storageFormatBacking): ...new sublist.
* docs/schemas/domainsnapshot.rng (disksnapshotdriver): Use new
type.
* src/util/virstoragefile.h (virStorageFileFormat): Rearrange for
easier code management.
* src/util/virstoragefile.c (virStorageFileFormat, fileTypeInfo):
Likewise.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefParseXML): Use
new marker to limit selection of formats.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We had incomplete RelaxNG support for storage formats listed
in virstoragefile.h: commit 027bf2e added 'vdi' but forgot
to update the <volume> and <domain> xml lists; the <volume>
list was also missing 'fat' and 'vhd'. Maintaining two lists
is a recipe for them getting out of sync, so make the list
common so that both contexts benefit the next time we add a
format in a single location.
* docs/schemas/domaincommon.rng (storageFormat): Move...
* docs/schemas/storagecommon.rng: ...here, and add vdi.
* docs/schemas/storagevol.rng (formatfile): Use common list.
Signed-off-by: Eric Blake <eblake@redhat.com>
In general, we try to make virt-xml-validate tolerant of input
elements in any order when possible. However, as written, the
RNG grammar did not permit <source> unless there was an explicit
type= attribute (even though the C code manages just fine by
defaulting to type='file'). After making the attribute optional
on the 'file' branch, I noticed that the use of diskspec was now
redundant with the branch when no <source> was supplied.
View this patch with 'git diff -b' for a better picture of the
schema change.
* docs/schemas/domaincommon.rng (disk): Hoist 'diskspec' out of
choice, make type='file' default, and still preserve interleave.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-source-pool.xml:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-drive-discard.xml:
New files.
* tests/qemuxml2argvdata/qemuxml2argv-disk-source-pool.xml:
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-discard.xml:
Reorder XML.
* tests/qemuxml2xmltest.c (mymain): Cover new files.
Signed-off-by: Eric Blake <eblake@redhat.com>
Having two tiny files with a couple definitions didn't make
as much sense as one common file, especially since I plan to
add more definitions and use it in more places.
* docs/schemas/storageencryption.rng: Merge this...
* docs/schemas/storagefilefeatures.rng: ...and this, into...
* docs/schemas/storagecommon.rng: ...this new file.
* docs/schemas/Makefile.am (schema_DATA): Reflect renames.
* docs/schemas/storagevol.rng: Likewise.
* docs/schemas/domaincommon.rng: Likewise.
* libvirt.spec.in: Likewise.
* mingw-libvirt.spec.in: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch adds an element to QEMU's capability XML, to
show if the underlying QEMU binary supports the live disk
snapshotting or not.
This allows any client to know ahead of time if the feature
is available.
Without this information available, the only way to check
for the snapshot support is to request one and check for
errors.
Signed-off-by: Francesco Romani <fromani@redhat.com>
There is no keyboard support currently in libvirt.
For some platforms (PPC64 QEMU) this makes graphics unusable,
since the keyboard is not implicit and it can't be added via libvirt.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Add a new character device backend called 'spiceport' that uses
spice's channel for communications and apart from spicevmc can be used
as a backend for any character device from libvirt's point of view.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add a new <timer> for the HyperV reference time counter enlightenment
and the iTSC reference page for Windows guests.
This feature provides a paravirtual approach to track timer events for
the guest (similar to kvmclock) with the option to use real hardware
clock on systems with a iTSC with compensation across various hosts.
According to the documentation describing various tunables for domain
timers not all the fields are supported by all the driver types. Express
these in the RNG:
- rtc, platform: Only these support the "track" attribute.
- tsc: only one to support "frequency" and "mode" attributes
- hpet, pit: tickpolicy/catchup attribute/element
- kvmclock: no extra attributes are supported
Additionally the attributes of the <catchup> element for
tickpolicy='catchup' are optional according to the parsing code. Express
this in the XML and fix a spurious space added while formatting the
<catchup> element and add tests for it.
Add support for specifying various types when doing snapshots. This will
later allow to do snapshots on network backed volumes. Disks of type
'volume' are not supported by snapshots (yet).
Also amend the test suite to check parsing of the various new disk
types that can now be specified.
spice-server offers an API to disable file transfer messages
on the agent channel between the client and the guest.
This is supported in qemu through the disable-agent-file-xfer option.
This patch exposes this option to libvirt.
Adds a new element 'filetransfer', with one property,
'enable', which accepts a boolean.
Default is enabled, for backward compatibility.
Depends on the capability exported in the first patch of the series.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch introduces new xml elements under <blkiotune>,
we use these new elements to setup the throttle blkio
cgroup for domain. The new blkiotune node looks like this:
<blkiotune>
<device>
<path>/path/to/block</path>
<weight>1000</weight>
<read_iops_sec>10000</read_iops_sec>
<write_iops_sec>10000</write_iops_sec>
<read_bytes_sec>1000000</read_bytes_sec>
<write_bytes_sec>1000000</write_bytes_sec>
</device>
</blkiotune>
Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
When idmap was added to LXC, we forgot to cover it in the testsuite.
The schema was missing an <element> layer, and as a result,
virt-xml-validate was failing on valid dumpxml output.
Reported by Eduard - Gabriel Munteanu on IRC.
* docs/schemas/domaincommon.rng (idmap): Include <idmap> element,
and support interleaves.
* tests/lxcxml2xmldata/lxc-idmap.xml: New file.
* tests/lxcxml2xmltest.c (mymain): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
AArch64 qemu has similar behavior as armv7l, like use of mmio etc.
This patch adds similar bypass checks what we have for armv7l to aarch64.
E.g. we are enabling mmio transport for Nicdev.
Making addDefaultUSB and addDefaultMemballoon to false etc.
V3:
- Adding missing domain rng schema for aarcg64 and test case in
testutilsqemu.c which was causing test suite failure
while running make check.
V2:
- Added testcase to qemuxml2argvtest as suggested
during review comments of V1.
V1:
- Initial patch.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
https://bugzilla.redhat.com/show_bug.cgi?id=1035118
When outputting the XML for the RNG device, the code didn't format the
PCI address info. Additionally the schema wasn't expecting the info
although it was being parsed and used internally. Fix those mistakes and
add test for the PCI info section.
In the 'directory' and 'netfs' storage pools, a user can see
both 'file' and 'dir' storage volume types, to know when they
can descend into a subdirectory. But in a network-based storage
pool, such as the upcoming 'gluster' pool, we use 'network'
instead of 'file', and did not have any counterpart for a
directory until this patch. Adding a new volume type
'network-dir' is better than reusing 'dir', because it makes
it clear that the only way to access 'network' volumes within
that container is through the network mounting (leaving 'dir'
for something accessible in the local file system).
* include/libvirt/libvirt.h.in (virStorageVolType): Expand enum.
* docs/formatstorage.html.in: Document it.
* docs/schemasa/storagevol.rng (vol): Allow new value.
* src/conf/storage_conf.c (virStorageVol): Use new value.
* src/qemu/qemu_command.c (qemuBuildVolumeString): Fix client.
* src/qemu/qemu_conf.c (qemuTranslateDiskSourcePool): Likewise.
* tools/virsh-volume.c (vshVolumeTypeToString): Likewise.
* src/storage/storage_backend_fs.c
(virStorageBackendFileSystemVolDelete): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add support for a new <pool type='gluster'>, similar to
RBD and Sheepdog. Terminology wise, a gluster volume
forms a libvirt storage pool, within the gluster volume,
individual files are treated as libvirt storage volumes.
* docs/schemas/storagepool.rng (poolgluster): New pool type.
* docs/formatstorage.html.in: Document gluster.
* docs/storage.html.in: Likewise, and contrast it with netfs.
* tests/storagepoolxml2xmlin/pool-gluster.xml: New test.
* tests/storagepoolxml2xmlout/pool-gluster.xml: Likewise.
* tests/storagepoolxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
I got annoyed at having to use both 'virsh vol-list $pool --details'
AND 'virsh vol-dumpxml $vol $pool' to learn if I had populated
the volume correctly. Since two-thirds of the data present in
virStorageVolGetInfo() already appears in virStorageVolGetXMLDesc(),
this just adds the remaining piece of information, as:
<volume type='...'>
...
</volume>
* docs/formatstorage.html.in: Document new <volume type=...>.
* docs/schemas/storagevol.rng (vol): Add it to RelaxNG.
* src/conf/storage_conf.h (virStorageVolTypeToString): Declare.
* src/conf/storage_conf.c (virStorageVolTargetDefFormat): Output
the metatype.
(virStorageVolDefParseXML): Parse it, for unit tests.
* tests/storagevolxml2xmlout/vol-*.xml: Update tests to match.
Signed-off-by: Eric Blake <eblake@redhat.com>
The RNG grammar did not allow arbitrary interleaving, which makes
it harder than necessary to create a new volume from handwritten XML.
(Compare also to commit caf516db for pools).
* docs/schemas/storagevol.rng: Support interleaving.
* tests/storagevolxml2xmlin/vol-file-backing.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Older xmllint version don't allow such characters in datatype anyURI.
In order not to change too much, I'm suggesting making a choice of
anyURI or 'absPathName' which should be fine (checked with upstream
and that old xmllint, both work fine).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
$ touch /var/lib/libvirt/images/'a<b>c'
$ virsh pool-refresh default
$ virsh vol-dumpxml 'a<b>c' default | head -n2
<volume>
<name>a<b>c</name>
Oops. That's not valid XML. And when we fix the XML
generation, it fails RelaxNG validation.
I'm also tired of seeing <key>(null)</key> in the example
output for volume xml; while we used NULLSTR() to avoid
a NULL deref rather than relying on glibc's printf
extension behavior, it's even better if we avoid the issue
in the first place. But this requires being careful that
we don't invalidate any storage backends that were relying
on key being unassigned during virStoragVolCreateXML[From].
I would have split this into two patches (one for escaping,
one for avoiding <key>(null)</key>), but since they both
end up touching a lot of the same test files, I ended up
merging it into one.
Note that this patch allows pretty much any volume name
that can appear in a directory (excluding . and .. because
those are special), but does nothing to change the current
(unenforced) RelaxNG claim that pool names will consist
only of letters, numbers, _, -, and +. Tightening the C
code to match RelaxNG patterns and/or relaxing the grammar
to match the C code for pool names is a task for another
day (but remember, we DID recently tighten C code for
domain names to exclude a leading '.').
* src/conf/storage_conf.c (virStoragePoolSourceFormat)
(virStoragePoolDefFormat, virStorageVolTargetDefFormat)
(virStorageVolDefFormat): Escape user-controlled strings.
(virStorageVolDefParseXML): Parse key, for use in unit tests.
* src/storage/storage_driver.c (storageVolCreateXML)
(storageVolCreateXMLFrom): Ensure parsed key doesn't confuse
volume creation.
* docs/schemas/basictypes.rng (volName): Relax definition.
* tests/storagepoolxml2xmltest.c (mymain): Test it.
* tests/storagevolxml2xmltest.c (mymain): Likewise.
* tests/storagepoolxml2xmlin/pool-dir-naming.xml: New file.
* tests/storagepoolxml2xmlout/pool-dir-naming.xml: Likewise.
* tests/storagevolxml2xmlin/vol-file-naming.xml: Likewise.
* tests/storagevolxml2xmlout/vol-file-naming.xml: Likewise.
* tests/storagevolxml2xmlout/vol-*.xml: Fix fallout.
Signed-off-by: Eric Blake <eblake@redhat.com>
While trying to compare netfs against my new gluster pool, I
discovered two things:
virt-xml-validate chokes on valid xml produced by 'virsh pool-dumpxml'
[yet another reason that ALL patches that add new xml should be adding
corresponding tests]
When using glusterfs FUSE mounts, you cannot access a subdirectory
of a gluster volume. The recommended workaround in the gluster
community is to mount the volume to an intermediate location, then
bind-mount the desired subdirectory to the final location. Maybe
we should teach libvirt to do bind-mounting, but for now I chose to
just document the limitation.
* docs/storage.html.in: Improve documentation.
* docs/schemas/storagepool.rng (sourcefmtnetfs): Allow all
formats, and drop redundant info-vendor.
* tests/storagepoolxml2xmltest.c (mymain): New test.
* tests/storagepoolxml2xmlin/pool-netfs-gluster.xml: New file.
* tests/storagepoolxml2xmlout/pool-netfs-gluster.xml: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The linux kernel recently added support for paravirtual spinlock
handling to avoid performance regressions on overcomitted hosts. This
feature needs to be turned in the hypervisor so that the guest OS is
notified about the possible support.
This patch adds a new feature "paravirt-spinlock" to the XML and
supporting code to enable the "kvm_pv_unhalt" pseudo CPU feature in
qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=1008989
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:
<secmodel>
<model>selinux</model>
<doi>0</doi>
<baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
<baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
</secmodel>
<secmodel>
<model>dac</model>
<doi>0</doi>
<baselabel type='kvm'>107:107</baselabel>
<baselabel type='qemu'>107:107</baselabel>
</secmodel>
"baselabel" is driver-specific information, e.g. in the DAC security
model, it indicates USER_ID:GROUP_ID.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The RNG grammar did not allow arbitrary interleaving, which makes
it harder than necessary to create a new pool from handwritten XML.
* docs/schemas/storagepool.rng: Allow interleaving.
* tests/storagepoolxml2xmlin/pool-sheepdog.xml: Test interleave.
* tests/storagepoolxml2xmlin/pool-iscsi-auth.xml: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since 76b644c when the support for RAM filesystems was introduced,
libvirt accepted the following XML:
<source usage='1024' unit='KiB'/>
This was parsed correctly and internally stored in bytes, but it
was formatted as (with an extra 's'):
<source usage='1024' units='KiB'/>
When read again, this was treated as if the units were missing,
meaning libvirt was unable to parse its own XML correctly.
The usage attribute was documented as being in KiB, but it was not
scaled if the unit was missing. Transient domains still worked,
because this was balanced by an extra 'k' in the mount options.
This patch:
Changes the parser to use 'units' instead of 'unit', as the latter
was never documented (fixing persistent domains) and some programs
(libvirt-glib, libvirt-sandbox) already parse the 'units' attribute.
Removes the extra 'k' from the tmpfs mount options, which is needed
because now we parse our own XML correctly.
Changes the default input unit to KiB to match documentation, fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1015689
Commit id 'c4a4603de' added an output <path> to the nodedev xml, but
did not update the schema.
This resulted in the failure of the 'virt-xml-validate' on a file
generated by 'virsh nodedev-dumpxml pci_0000_00_00_0' (for example).
This was found/seen by running autotest on my host.
This resolves one of the issues in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
This device is identical to qemu's "intel-hda" device (known as "ich6"
in libvirt), but has a different PCI device ID (which matches the ID
of the hda audio built into the ich9 chipset, of course). It's not
supported in earlier versions of qemu, so it requires a capability
bit.
Useful to set custom forwarders instead of using the contents of
/etc/resolv.conf. It helps me to setup dnsmasq as local nameserver to
resolve VM domain names from domain 0, when domain option is used.
Signed-off-by: Diego Woitasen <diego.woitasen@vhgroup.net>
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently the XML parser already allows the following syntax:
<disk type='block' device='cdrom'>
<source startupPolicy='optional'/>
<target dev='hda' bus='ide'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
But it if the dev value is NULL then it would not have the leading
"<source ", resulting in invalid XML.
qemu/KVM also supports a tftp URL while specifying the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='tftp' name='/url/path'>
<host name='host.name' port='69'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The ftps protocol is another protocol supported by qemu/KVM while specifying
the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftps' name='/url/path'>
<host name='host.name' port='990'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The https protocol is also accepted by qemu/KVM when specifying the cdrom ISO
image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='https' name='/url/path'>
<host name='host.name' port='443'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
Commits 905629f4 and 1716e7a6 have added support for specifying
an IPv4 range and a port range to be used by NAT:
<forward mode='nat'>
<nat>
<address start='10.20.30.40' end='10.20.30.44'/>
<port start='60000' end='65432'/>
</nat>
</forward>
https://bugzilla.redhat.com/show_bug.cgi?id=1004364
This corresponds to '-sd' and '-drive if=sd' on the qemu command line.
Needed for many ARM boards which don't provide any other way to
pass in storage.
Add an attribute named 'removable' to the 'target' element of disks,
which controls the removable flag. For instance, on a Linux guest it
controls the value of /sys/block/$dev/removable. This option is only
valid for USB disks (i.e. bus='usb'), and its default value is 'off',
which is the same behaviour as before.
To achieve this, 'removable=on' (or 'off') is appended to the '-device
usb-storage' parameter sent to qemu when adding a USB disk via
'-disk'. A capability flag QEMU_CAPS_USB_STORAGE_REMOVABLE was added
to keep track if this option is supported by the qemu version used.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=922495
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
<controller type='pci' index='0' model='pci-root'>
<pcihole64 unit='KiB'>1048576</pcihole64>
</controller>
It can be used to adjust (or disable) the size of the 64-bit
PCI hole. The size attribute is in kilobytes (different unit
can be specified on input), but it gets rounded up to
the nearest GB by QEMU.
Disabling it will be needed for guests that crash with the
64-bit PCI hole (like Windows XP), see:
https://bugzilla.redhat.com/show_bug.cgi?id=990418
The ftp protocol is already recognized by qemu/KVM so add this support to
libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftp' name='/url/path'>
<host name='host.name' port='21'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
QEMU/KVM already allows a HTTP URL for the cdrom ISO image so add this support
to libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='http' name='/url/path'>
<host name='host.name' port='80'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=924153
Commit 904e05a2 (v0.9.9) added a per-<disk> seclabel element with
an attribute relabel='no' in order to try and minimize the
impact of shutdown delays when an NFS server disappears. The idea
was that if a disk is on NFS and can't be labeled in the first
place, there is no need to attempt the (no-op) relabel on domain
shutdown. Unfortunately, the way this was implemented was by
modifying the domain XML so that the optimization would survive
libvirtd restart, but in a way that is indistinguishable from an
explicit user setting. Furthermore, once the setting is turned
on, libvirt avoids attempts at labeling, even for operations like
snapshot or blockcopy where the chain is being extended or pivoted
onto non-NFS, where SELinux labeling is once again possible. As
a result, it was impossible to do a blockcopy to pivot from an
NFS image file onto a local file.
The solution is to separate the semantics of a chain that must
not be labeled (which the user can set even on persistent domains)
vs. the optimization of not attempting a relabel on cleanup (a
live-only annotation), and using only the user's explicit notation
rather than the optimization as the decision on whether to skip
a label attempt in the first place. When upgrading an older
libvirtd to a newer, an NFS volume will still attempt the relabel;
but as the avoidance of a relabel was only an optimization, this
shouldn't cause any problems.
In the ideal future, libvirt will eventually have XML describing
EVERY file in the backing chain, with each file having a separate
<seclabel> element. At that point, libvirt will be able to track
more closely which files need a relabel attempt at shutdown. But
until we reach that point, the single <seclabel> for the entire
<disk> chain is treated as a hint - when a chain has only one
file, then we know it is accurate; but if the chain has more than
one file, we have to attempt relabel in spite of the attribute,
in case part of the chain is local and SELinux mattered for that
portion of the chain.
* src/conf/domain_conf.h (_virSecurityDeviceLabelDef): Add new
member.
* src/conf/domain_conf.c (virSecurityDeviceLabelDefParseXML):
Parse it, for live images only.
(virSecurityDeviceLabelDefFormat): Output it.
(virDomainDiskDefParseXML, virDomainChrSourceDefParseXML)
(virDomainDiskSourceDefFormat, virDomainChrDefFormat)
(virDomainDiskDefFormat): Pass flags on through.
* src/security/security_selinux.c
(virSecuritySELinuxRestoreSecurityImageLabelInt): Honor labelskip
when possible.
(virSecuritySELinuxSetSecurityFileLabel): Set labelskip, not
norelabel, if labeling fails.
(virSecuritySELinuxSetFileconHelper): Fix indentation.
* docs/formatdomain.html.in (seclabel): Document new xml.
* docs/schemas/domaincommon.rng (devSeclabel): Allow it in RNG.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.xml:
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.args:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-seclabel-*-labelskip.xml:
New test files.
* tests/qemuxml2argvtest.c (mymain): Run the new tests.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
This resolves the issue that prompted the filing of
https://bugzilla.redhat.com/show_bug.cgi?id=928638
(although the request there is for something much larger and more
general than this patch).
commit f3868259ca disabled the
forwarding to upstream DNS servers of unresolved DNS requests for
names that had no domain, but were just simple host names (no "."
character anywhere in the name). While this behavior is frowned upon
by DNS root servers (that's why it was changed in libvirt), it is
convenient in some cases, and since dnsmasq can be configured to allow
it, it must not be strictly forbidden.
This patch restores the old behavior, but since it is usually
undesirable, restoring it requires specification of a new option in
the network config. Adding the attribute "forwardPlainNames='yes'" to
the <dns> elemnt does the trick - when that attribute is added to a
network config, any simple hostnames that can't be resolved by the
network's dnsmasq instance will be forwarded to the DNS servers listed
in the host's /etc/resolv.conf for an attempt at resolution (just as
any FQDN would be forwarded).
When that attribute *isn't* specified, unresolved simple names will
*not* be forwarded to the upstream DNS server - this is the default
behavior.
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
There are two ways to use a iSCSI LUN as disk source for qemu.
* The LUN's path as it shows up on host, e.g.
/dev/disk/by-path/ip-$ip:3260-iscsi-$iqn-fc18:iscsi.iscsi0-lun-1
* The libiscsi URI from the storage pool source element host attribute, e.g.
iscsi://demo.org:6000/iqn.1992-01.com.example/1
For a "volume" type disk, if the specified "pool" is of iscsi
type, we should support to use the LUN in either of above 2 ways.
That's why to introduce a new XML tag "mode" for the disk source
(libvirt should support iscsi pool with libiscsi, but it's another
new feature, which should be done later).
The "mode" can be either of "host" or "direct". Use "host" to indicate
use of the LUN with the path as it shows up on host. Use "direct" to
indicate to use it with the source pool host URI (future patches may support
to use network type libvirt storage too, e.g. Ceph)
When using logical pools, we had to trust the target->path provided.
This parameter, however, can be completely ommited and we can use
'/dev/<source.name>' safely and populate it to target.path.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=952973
The existing 'chap' XML logic was never used - just defined. Rather than
try to insert a square peg into a round hole, blow it up and rewrite the
logic to follow the 'ceph' format.
Remove the former "chap.login" and "chap.passwd" fields and replace
with "chap.username" and "chap.secret" in _virStoragePoolAuthChap.
Adjust the virStoragePoolDefParseAuthChap() to process.
Change the rng file to describe the new layout
Update the formatstorage.html to describe the usage of the secret element
to mention that the secret type "iscsi" and "ceph" can be used
to storage pool too.
Update the formatsecret.html to include a reference to the storage pool
Update tests to handle the changes from 'login' and 'passwd' to 'username'
and '<secret>' format
<hyperv>
<spinlocks state='off'/>
</hyperv>
results in:
error: XML error: missing HyperV spinlock retry count
Don't require retries when state is off and use virXPathUInt
instead of virXPathString to simplify parsing.
https://bugzilla.redhat.com/show_bug.cgi?id=784836#c19
This patch introduces new element <idmap> for
user namespace. for example
<idmap>
<uid start='0' target='1000' count='10'/>
<gid start='0' target='1000' count='10'/>
</idmap>
this new element is used for setting proc files
/proc/<pid>/{uid_map,gid_map}.
This patch also supports multiple uid/gid elements
setting in XML configuration.
We don't support the semi configuation, user has to
configure uid and gid both.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Implement check whether (maximum) vCPUs doesn't exceed machine
type's cpu-max settings.
On older versions of QEMU the check is disabled.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
This includes adding it to the nodedev parser and formatter, docs, and
test.
An example of the new iommuGroup element that is a part of the output
from "virsh nodedev-dumpxml" (virNodeDeviceGetXMLDesc()):
<device>
<name>pci_0000_02_00_1</name>
<capability type='pci'>
...
<iommuGroup number='12'>
<address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
<address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
</iommuGroup>
</capability>
</device>
This patch adds functionality to allow libvirt to configure the
'native-tagged' and 'native-untagged' modes on openvswitch networks.
Signed-off-by: Laine Stump <laine@redhat.com>
Add <features> and <compat> elements to volume target XML.
<compat> is a string which for qcow2 represents the QEMU version
it should be compatible with. Valid values are 0.10 and 1.1.
1.1 is implicit if the <features> element is present, otherwise
qemu-img default is used. 0.10 can be specified to explicitly
create older images after the qemu-img default changes.
<features> contains optional features, so far
<lazy_refcounts/> is available, which enables caching of reference
counters, improving performance for snapshots.
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
<features>
<hyperv>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
</features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
This attribute is going to represent number of queues for
multique vhost network interface. This commit implements XML
extension part of the feature and add one test as well. For now,
we can only do xml2xml test as qemu command line generation code
is not adapted yet.
-vnc :5900,share=allow-exclusive
allows clients to ask for exclusive access which is
implemented by dropping other connections Connecting
multiple clients in parallel requires all clients asking
for a shared session (vncviewer: -shared switch)
-vnc :5900,share=force-shared
disables exclusive client access. Useful for shared
desktop sessions, where you don't want someone forgetting
specify -shared disconnect everybody else.
-vnc :5900,share=ignore
completely ignores the shared flag and allows everybody
connect unconditionally
QEMU might support more values for "-drive discard", so using Bi-state
values (on/off) for it doesn't make sense.
"on" maps to "unmap", "off" maps to "ignore":
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
The following XML configuration can be used to request all domain's
memory pages to be kept locked in host's memory (i.e., domain's memory
pages will not be swapped out):
<memoryBacking>
<locked/>
</memoryBacking>
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
Adding support for new attribute 'websocket' in the '<graphics>'
element, the attribute value is the port to listen on with '-1'
meaning auto-allocation, '0' meaning no websockets.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
network: static route support for <network>
This patch adds the <route> subelement of <network> to define a static
route. the address and prefix (or netmask) attribute identify the
destination network, and the gateway attribute specifies the next hop
address (which must be directly reachable from the containing
<network>) which is to receive the packets destined for
"address/(prefix|netmask)".
These attributes are translated into an "ip route add" command that is
executed when the network is started. The command used is of the
following form:
ip route add <address>/<prefix> via <gateway> \
dev <virbr-bridge> proto static metric <metric>
Tests are done to validate that the input data are correct. For
example, for a static route ip definition, the address must be a
network address and not a host address. Additional checks are added
to ensure that the specified gateway is directly reachable via this
network (i.e. that the gateway IP address is in the same subnet as one
of the IP's defined for the network).
prefix='0' is supported for both family='ipv4' address='0.0.0.0'
netmask='0.0.0.0' or prefix='0', and for family='ipv6' address='::',
prefix=0', although care should be taken to not override a desired
system default route.
Anytime an attempt is made to define a static route which *exactly*
duplicates an existing static route (for example, address=::,
prefix=0, metric=1), the following error message will be sent to
syslog:
RTNETLINK answers: File exists
This can be overridden by decreasing the metric value for the route
that should be preferred, or increasing the metric for the route that
shouldn't be preferred (and is thus in place only in anticipation that
the preferred route may be removed in the future). Caution should be
used when manipulating route metrics, especially for a default route.
Note: The use of the command-line interface should be replaced by
direct use of libnl so that error conditions can be handled better. But,
that is being left as an exercise for another day.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Signed-off-by: Laine Stump <laine@laine.org>
The <filesystem> element can now accept a <driver type='nbd'/>
as an alternative to 'loop'. The benefit of NBD is support
for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the <driver> element in filesystem devices to
allow a storage format to be set. The new attribute
uses 'format' to reflect the storage format. This is
different from the <driver> element in disk devices
which use 'type' to reflect the storage format. This
is because the 'type' attribute on filesystem devices
is already used for the driver backend, for which the
disk devices use the 'name' attribute. Arggggh.
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
An example of the scsi hostdev XML:
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='0' unit='0'/>
</source>
<address type='drive' controller='0' bus='0' target='4' unit='8'/>
</hostdev>
Controller is implicitly added for scsi hostdev, though the scsi
controller's model defaults to "lsilogic", which might be not what
the user wants (same problem exists for virtio-scsi disk). It's
the existing problem, will be addressed later.
The device address must be specified manually. Later patch will let
libvirt generate it automatically.
This only introduces the generic XMLs for scsi hostdev, later patches
will add other elements, e.g. <readonly>, <shareable>.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
A domain's <interface> or <hostdev>, as well as a <network>'s
<forward>, can now have an optional <driver name='kvm|vfio'/>
element. As of this patch, there is no functionality behind this new
knob - this patch adds support to the domain and network
formatter/parser, and to the RNG and documentation.
When the backend is added, legacy KVM PCI device assignment will
continue to be used when no driver name is specified (or if <driver
name='kvm'/> is specified), but if driver name is 'vfio', the new UEFI
Secure Boot compatible VFIO device assignment will be used.
Note that the parser doesn't automatically insert the current default
value of this setting. This is done on purpose because the two
possibilities are functionally equivalent from the guest's point of
view, and we want to be able to automatically start using vfio as the
default (even for existing domains) at some time in the future. This
is similar to what was done with the "vhost" driver option in
<interface>.
For pSeries guest in QEMU, NVRAM is one kind of spapr-vio device.
Users are allowed to specify spapr-vio devices'address.
But NVRAM is not supported in libvirt. So this patch is to
add NVRAM device to allow users to specify its address.
In QEMU, NVRAM device's address is specified by
"-global spapr-nvram.reg=xxxxx".
In libvirt, XML file is defined as the following:
<nvram>
<address type='spapr-vio' reg='0x3000'/>
</nvram>
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Instead of making a choice between the underscore and camelCase, this
simply changes "num_queues" into "queues", which is also consistent
with Michal's multiple queue support for interface.
The rng schema for <controller> had been non-specific about which
types of controllers allowed which models, and also allowed the
num_queues attribute (since that hasn't been released yet, should we
rename it to "numQueues"?) and <master> subelement to be included for
any controller type. In reality, half of the models are allowed only
for type='scsi', and the other half only for type='usb', num_queues is
allowed only for type='scsi', and <master> only for type='usb'.
This patch makes a separate <group> for type='scsi' and type='usb',
with each group allowing only the appropriate model values, and
allowing num_queue and <master> only when appropriate.
<interleave> also hadn't been specified, forcing a specific order of
subelements, which should never be done. (Note that the <interleave>
had to surround the main element attributes that are in the <group>
subelements, due to one of the <group>s containing a subelement).
The recent qemu requires "0x" prefix for the disk wwn, this patch
changes virValidateWWN to allow the prefix, and prepend "0x" if
it's not specified. E.g.
qemu-kvm: -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,\
drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,wwn=6000c60016ea71ad:
Property 'scsi-hd.wwn' doesn't take value '6000c60016ea71ad'
Though it's a qemu regression, but it's nice to allow the prefix,
and doesn't hurt for us to always output "0x".
Allow VMs to be placed into resource groups using the
following syntax
<resource>
<partition>/virtualmachines/production</partition>
</resource>
A resource cgroup will be backed by some hypervisor specific
functionality, such as cgroups with KVM/LXC.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The definiton of scsi adapter in storagespool.rng (sourceinfoadapter)
can be used by scsi hostdev in later patch. Move it to basictypes.rng.
PortNumber is defined in both domaincommon.rng and storagespool.rng,
simplify it by moving it to basictypes.rng.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
This updates the definitions and supporting structures in the XML
schema and domain configuration files.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
With this patch, one can specify the disk source using libvirt
storage like:
<disk type='volume' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source pool='default' volume='fc18.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
"seclabels" and "startupPolicy" are not supported for this new
disk type ("volume"). They will be supported in later patches.
docs/formatdomain.html.in:
* Add documents for new XMLs
docs/schemas/domaincommon.rng:
* Add rng for new XMLs;
src/conf/domain_conf.h:
* New struct for 'volume' type disk source (virDomainDiskSourcePoolDef)
* Add VIR_DOMAIN_DISK_TYPE_VOLUME for enum virDomainDiskType
src/conf/domain_conf.c:
* New helper virDomainDiskSourcePoolDefParse to parse the 'volume'
type disk source.
* New helper virDomainDiskSourcePoolDefFree to free the source def
if 'volume' type disk.
tests/qemuxml2argvdata/qemuxml2argv-disk-source-pool.xml:
tests/qemuxml2xmltest.c:
* New test
This introduces 4 new attributes for storage pool source adapter.
E.g.
<adapter type='fc_host' parent='scsi_host5' wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
Attribute 'type' can be either 'scsi_host' or 'fc_host', and defaults
to 'scsi_host' if attribute 'name' is specified. I.e. It's optional
for 'scsi_host' adapter, for back-compat reason. However, mandatory
for 'fc_host' adapter and any new future adapter types. Attribute
'parent' is to specify the parent for the fc_host adapter.
* docs/formatstorage.html.in:
- Add documents for the 4 new attrs
* docs/schemas/storagepool.rng:
- Add RNG schema
* src/conf/storage_conf.c:
- Parse and format the new XMLs
* src/conf/storage_conf.h:
- New struct virStoragePoolSourceAdapter, replace "char *adapter" with it;
- New enum virStoragePoolSourceAdapterType
* src/libvirt_private.syms:
- Export TypeToString and TypeFromString
* src/phyp/phyp_driver.c:
- Replace "adapter" with "adapter.data.name", which is member of the union
of the new struct virStoragePoolSourceAdapter now. Later patch will
add the checking, as "adapter.data.name" is only valid for "scsi_host"
adapter.
* src/storage/storage_backend_scsi.c:
- Like above
* tests/storagepoolxml2xmlin/pool-scsi-type-scsi-host.xml:
* tests/storagepoolxml2xmlin/pool-scsi-type-fc-host.xml:
- New test for 'fc_host' and "scsi_host" adapter
* tests/storagepoolxml2xmlout/pool-scsi.xml:
- Change the expected output, as the 'type' defaults to 'scsi_host' if 'name"
specified now
* tests/storagepoolxml2xmlout/pool-scsi-type-scsi-host.xml:
* tests/storagepoolxml2xmlout/pool-scsi-type-fc-host.xml:
- New test
* tests/storagepoolxml2xmltest.c:
- Include the test
This introduce a new attribute "num_queues" (same with the good name
QEMU uses) for virtio-scsi controller. An example of the XML:
<controller type='scsi' index='0' model='virtio-scsi' num_queues='8'/>
The corresponding QEMU command line:
-device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x3 \
Split the "resource" define out into multiple smaller
defines, one for each type of resource tuning parameter.
This makes the schema a bit clearer to read
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This enrichs HBA's xml by dumping the number of max vports and
vports in use. Format is like:
<capability type='vport_ops'>
<max_vports>164</max_vports>
<vports>5</vports>
</capability>
* docs/formatnode.html.in: (Document the new XML)
* docs/schemas/nodedev.rng: (Add the schema)
* src/conf/node_device_conf.h: (New member for data.scsi_host)
* src/node_device/node_device_linux_sysfs.c: (Collect the value of
max_vports and vports)
This does nothing more than adding the new device and capability.
The device is present since QEMU 1.2.0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Only sheepdog actually required it in the code, and we can use 7000 as the
default---the same value that QEMU uses for the simple "sheepdog:VOLUME"
syntax. With this change, the schema can be fixed to allow no port.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'trang' utility, which is able to transform '.rng' files into
'.rnc' files, reported some errors in our schemas that weren't caught
by the tools we use in the build. I haven't added a test for this,
but the validity can be checked by the following command:
trang -I rng -O rnc domain.rng domain.rnc
There were unescaped minuses in regular expressions and we were
constraining int (which is by default in the range of [-2^31;2^31-1]
to maximum of 2^32. But what we wanted was exactly an unsignedInt.
This plumbs in the XML description of iSCSI shares. The next patches
will add support for the libiscsi userspace initiator.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The storage volume formats supported by the disk storage pool are
missing from the allowed values.
Add partition types.
Signed-off-by: Philipp Hahn <hahn@univention.de>
iSCSI qualified names (iqn) from RFC3721 may contain colons (':'), which
neither matches the absFilePath nor genericName:
$ virsh pool-dumpxml myiscsipool
<pool type='iscsi'>
...
<source>
...
<device path='iqn.2003-01.org.linux-iscsi.phahn-sid93.x8664:sn.8a3daa0d4efd'/>
</source>
...
</pool>
Add IscsiQualifiedName type and allow its use in sourceiscsi.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Qemu's implementation of virtio RNG supports rate limiting of the
entropy used. This patch exposes the option to tune this functionality.
This patch is based on qemu commit 904d6f588063fb5ad2b61998acdf1e73fb4
The rate limiting is exported in the XML as:
<devices>
...
<rng model='virtio'>
<rate bytes='123' period='1234'/>
<backend model='random'/>
</rng>
...
The native bus for s390 I/O is called CCW (channel command word).
As QEMU has added basic support for the CCW bus, i.e. the
ability to assign CCW devnos (bus addresses) to devices.
Domains with the new machine type s390-ccw-virtio can use the
CCW bus. Currently QEMU will only allow to define virtio
devices on the CCW bus.
Here we add the new machine type and the new device address to the
schema definition and add a new paragraph to the domain XML
documentation.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
'virsh capabilities' will now include a new <memory> element
per <cell> of the topology, as in:
<topology>
<cells num='2'>
<cell id='0'>
<memory unit='KiB'>12572412</memory>
<cpus num='12'>
...
</cell>
Signed-off-by: Eric Blake <eblake@redhat.com>
There is some controversy[1] on the qemu list on whether qemu should
have ever allowed arbitrary file name passthrough, or whether it
should be restricted to JUST /dev/random and /dev/hwrng. It is
always easier to add support for additional filenames than it is
to remove support for something once released, so this patch
restricts libvirt 1.0.3 (where the virtio-random backend was first
supported) to just the two uncontroversial names, letting us defer
to a later date any decision on whether supporting arbitrary files
makes sense. Additionally, since qemu 1.4 does NOT support
/dev/fdset/nnn fd passthrough for the backend, limiting to just
two known names means that we don't get tempted to try fd
passthrough where it won't work.
[1]https://lists.gnu.org/archive/html/qemu-devel/2013-03/threads.html#00023
* src/conf/domain_conf.c (virDomainRNGDefParseXML): Only allow
/dev/random and /dev/hwrng.
* docs/schemas/domaincommon.rng: Flag invalid files.
* docs/formatdomain.html.in (elementsRng): Document this.
* tests/qemuxml2argvdata/qemuxml2argv-virtio-rng-random.args:
Update test to match.
* tests/qemuxml2argvdata/qemuxml2argv-virtio-rng-random.xml:
Likewise.
This reverts commit 383ebc4694.
We decided the xml for this feature needed more thought to make sure
we are doing it the best way, in particular wrt option values that
have multiple items.
This reverts commit 24aa7f8d11.
The implementation to match the documentation is not complete yet,
and the final design might change the name of the 'schid' attribute.
virStrToLong(..., 8, ...) already requires the mode to be octal.
Change the relax-ng schema to check for octal as well.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch documents XML elements used for (basic) support of virtual
RNG devices.
In the devices section in the domain XML users may specify:
For the default 'random' backend:
<devices>
<rng model='virtio'>
<backend model='random'>/dev/urandom</backend>
</rng>
</devices>
For the slightly more advanced EGD backend:
<devices>
<rng model='virtio'>
<backend model='egd' type='udp'>
<!-- this is a definition of a character device -->
<source mode='bind' service='1234'/>
<source mode='connect' host='1.2.3.4' service='1234'/>
<!-- or other valid character device configuration -->
</backend>
</rng>
</devices>
For the planned random daemon/pool:
<devices>
<rng model='virtio'>
<backend model='pool' pool='poolname'>class</backend>
</rng>
</devices>
to enable the RNG device for guests.
Originally, only a host name was used to associate a
DHCPv6 request with a specific IPv6 address. Further testing
demonstrates that this is an unreliable method and, instead,
a client-id or DUID needs to be used. According to DHCPv6
standards, this id can be a duid-LLT, duid-LL, or duid-UUID
even though dnsmasq will accept almost any text string.
Although validity checking of a specified string makes sure it is
hexadecimal notation with bytes separated by colons, there is no
rigorous check to make sure it meets the standard.
Documentation and schemas have been updated.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Signed-off-by: Laine Stump <laine@laine.org>
Although in IPv4 one must pick either mac or name, either
can be omitted. Similarly, for IPv6, the name
can be optionally omitted.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Signed-off-by: Laine Stump <laine@laine.org>
This patch adds support for a new <option>-Tag in the <dhcp> block of
network configs, based on a subset of the fifth proposal by Laine
Stump in the mailing list discussion at
https://www.redhat.com/archives/libvir-list/2012-November/msg01054.html.
Any such defined option will result in a dhcp-option=<number>,"<value>"
statement in the generated dnsmasq configuration file.
Currently, DHCP options can be specified by number only and there is
no whitelisting or blacklisting of option numbers, which should
probably be added.
Signed-off-by: Pieter Hollants <pieter@hollants.com>
Signed-off-by: Laine Stump <laine@laine.org>
The native bus for s390 I/O is called CCW (channel command word).
As QEMU has added basic support for the CCW bus, i.e. the
ability to assign CCW devnos (bus addresses) to devices.
Domains with the new machine type s390-ccw-virtio can use the
CCW bus. Currently QEMU will only allow to define virtio
devices on the CCW bus.
Here we add the new machine type and the new device address to the
schema definition and add a new paragraph to the domain XML
documentation.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This patch adds RNG schemas for adding more information in the topology
output of the NUMA section in the capabilities XML.
The added elements are designed to provide more information about the
placement and topology of the processors in the system to management
applications.
A demonstration of supported XML added by this patch:
<capabilities>
<host>
<topology>
<cells num='3'>
<cell id='0'>
<cpus num='4'> <!-- this is node with Hyperthreading -->
<cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
<cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
<cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
<cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
</cpus>
</cell>
<cell id='1'>
<cpus num='4'> <!-- this is node with modules (Bulldozer) -->
<cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
<cpu id='5' socket_id='0' core_id='3' siblings='4-5'/>
<cpu id='6' socket_id='0' core_id='4' siblings='6-7'/>
<cpu id='7' socket_id='0' core_id='5' siblings='6-7'/>
</cpus>
</cell>
<cell id='2'>
<cpus num='4'> <!-- this is a normal multi-core node -->
<cpu id='8' socket_id='1' core_id='0' siblings='8'/>
<cpu id='9' socket_id='1' core_id='1' siblings='9'/>
<cpu id='10' socket_id='1' core_id='2' siblings='10'/>
<cpu id='11' socket_id='1' core_id='3' siblings='11'/>
</cpus>
</cell>
</cells>
</topology>
</host>
</capabilities>
The socket_id field represents identification of the physical socket the
CPU is plugged in. This ID may not be identical to the physical socket
ID reported by the kernel.
The core_id identifies a core within a socket. Also this field may not
accurately represent physical ID's.
The core_id is guaranteed to be unique within a cell and a socket. There
may be duplicates between sockets. Only cores sharing core_id within one
cell and one socket can be considered as threads. Cores sharing core_id
within sparate cells are distinct cores.
The siblings field is a list of CPU id's the cpu id's the CPU is sibling
with - thus a thread. The list is in the cpuset format.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
For the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
The SCLP console is the native console type for s390 and is preferred
over the virtio console as it doesn't require special drivers and
is more efficient. Recent versions of QEMU come with SCLP support
which is hereby enabled.
The new target types 'sclp' and 'sclplm' can be used to specify a
SCLP console. Adding documentation, domain schema and XML processing
support.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This introduces new XML tag "sgio" for disk, its valid values
are "filtered" and "unfiltered", setting it as "filtered" will
set the disk's unpriv_sgio to 0, and "unfiltered" to set it
as 1, which allows the unprivileged SG_IO commands.
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If there are multiple video devices
primary = 'yes' marks this video device as the primary one.
The rest are secondary video devices. No more than one could be
mark as primary. If none of them has primary attribute, the first
one will be the primary by default like what it was.
The reason of this changing is that for qemu, only one primary video
device is permitted which can be of any type. For secondary video
devices, only qxl is allowd. Primary attribute removes the restriction
that the first have to be the primary one.
We always put the primary video device into the first position of
video device structure array after parsing.
This is however supported only on domain interfaces with
type='network'. Moreover, target network needs to have at least
inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average'
(old) or 'floor' (new). This new attribute can be used just for
interfaces type of network (<interface type='network'/>) currently.
The DHCPv6 support includes IPV6 dhcp-range and dhcp-host for one
IPv6 subnetwork on one interface. This support will only work
if dnsmasq version >= 2.64; otherwise an error occurs if
dhcp-range or dhcp-host is specified for an IPv6 address.
Essentially, this change provides the same DHCP support for IPv6
that has been available for IPv4.
With dnsmasq >= 2.64, support for the RA service is also now provided
by dnsmasq (radvd is no longer used/started). (Although at least one
version of dnsmasq prior to 2.64 "supported" IPv6 Router
Advertisement, there were bugs (fixed in 2.64) that rendered it
unusable.)
Documentation and the network schema has been updated
to reflect the new support.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
This patch adds the capability for virtual guests to do IPv6
communication via a virtual network interface with no IPv6 (gateway)
addresses specified. This capability has always been enabled by
default for IPv4, but disabled for IPv6 for security concerns, and
because it requires the ip6tables command to be operational (which
isn't the case on a system with the ipv6 module completely disabled).
This patch adds a new attribute "ipv6" at the toplevel of a <network>
object. If ipv6='yes', the extra ip6tables rules required to permite
inter-guest communications are added when the network is started. If
it is 'no', or not present, those rules will not be added; thus the
default behavior doesn't change, so there should be no compatibility
issues with any existing installations.
Note that virtual guests cannot communication with the virtualization
host via this interface, because the following kernel tunable has
been set:
net.ipv6.conf.<bridge_interface_name>.disable_ipv6 = 1
This assures that the bridge interface will not have an IPv6
link-local (fe80::) address.
To control this behavior so that it is not enabled by default, the parameter
ipv6='yes' on the <network> statement has been added.
Documentation related to this patch has been updated.
The network schema has also been updated.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Each <domainsnapshot> can now contain an optional <memory>
element that describes how the VM state was handled, similar
to disk snapshots. The new element will always appear in
output; for back-compat, an input that lacks the element will
assume 'no' or 'internal' according to the domain state.
Along with this change, it is now possible to pass <disks> in
the XML for an offline snapshot; this also needs to be wired up
in a future patch, to make it possible to choose internal vs.
external on a per-disk basis for each disk in an offline domain.
At that point, using the --disk-only flag for an offline domain
will be able to work.
For some examples below, remember that qemu supports the
following snapshot actions:
qemu-img: offline external and internal disk
savevm: online internal VM and disk
migrate: online external VM
transaction: online external disk
=====
<domainsnapshot>
<memory snapshot='no'/>
...
</domainsnapshot>
implies that there is no VM state saved (mandatory for
offline and disk-only snapshots, not possible otherwise);
using qemu-img for offline domains and transaction for online.
=====
<domainsnapshot>
<memory snapshot='internal'/>
...
</domainsnapshot>
state is saved inside one of the disks (as in qemu's 'savevm'
system checkpoint implementation). If needed in the future,
we can also add an attribute pointing out _which_ disk saved
the internal state; maybe disk='vda'.
=====
<domainsnapshot>
<memory snapshot='external' file='/path/to/state'/>
...
</domainsnapshot>
This is not wired up yet, but future patches will allow this to
control a combination of 'virsh save /path/to/state' plus disk
snapshots from the same point in time.
=====
So for 1.0.1 (and later, as needed), I plan to implement this table
of combinations, with '*' designating new code and '+' designating
existing code reached through new combinations of xml and/or the
existing DISK_ONLY flag:
domain memory disk disk-only | result
-----------------------------------------
offline omit omit any | memory=no disk=int, via qemu-img
offline no omit any |+memory=no disk=int, via qemu-img
offline omit/no no any | invalid combination (nothing to snapshot)
offline omit/no int any |+memory=no disk=int, via qemu-img
offline omit/no ext any |*memory=no disk=ext, via qemu-img
offline int/ext any any | invalid combination (no memory to save)
online omit omit off | memory=int disk=int, via savevm
online omit omit on | memory=no disk=default, via transaction
online omit no/ext off | unsupported for now
online omit no on | invalid combination (nothing to snapshot)
online omit ext on | memory=no disk=ext, via transaction
online omit int off |+memory=int disk=int, via savevm
online omit int on | unsupported for now
online no omit any |+memory=no disk=default, via transaction
online no no any | invalid combination (nothing to snapshot)
online no int any | unsupported for now
online no ext any |+memory=no disk=ext, via transaction
online int/ext any on | invalid combination (disk-only vs. memory)
online int omit off |+memory=int disk=int, via savevm
online int no/ext off | unsupported for now
online int int off |+memory=int disk=int, via savevm
online ext omit off |*memory=ext disk=default, via migrate+trans
online ext no off |+memory=ext disk=no, via migrate
online ext int off | unsupported for now
online ext ext off |*memory=ext disk=ext, via migrate+transaction
* docs/schemas/domainsnapshot.rng (memory): New RNG element.
* docs/formatsnapshot.html.in: Document it.
* src/conf/snapshot_conf.h (virDomainSnapshotDef): New fields.
* src/conf/domain_conf.c (virDomainSnapshotDefFree)
(virDomainSnapshotDefParseString, virDomainSnapshotDefFormat):
Manage new fields.
* tests/domainsnapshotxml2xmltest.c: New test.
* tests/domainsnapshotxml2xmlin/*.xml: Update existing tests.
* tests/domainsnapshotxml2xmlout/*.xml: Likewise.
At one point, the code passed through arbitrary strings for file
formats, which supposedly lets qemu handle a new file type even
before libvirt has been taught to handle it. However, to properly
label files, libvirt has to learn the file type anyway, so we
might as well make our life easier by only accepting file types
that we are prepared to handle. This patch lets the RNG validation
ensure that only known strings are let through.
* docs/schemas/domaincommon.rng (driverFormat): Limit to list of
supported strings.
* docs/schemas/domainsnapshot.rng (driver): Likewise.
Hypervisors are starting to support HyperV Enlightenment features that
improve behavior of guests running Microsoft Windows operating systems.
This patch adds support for the "relaxed" feature that improves timer
behavior and also establishes a framework to add these features in
future.
When startupPolicy set for a USB devices allows such device to be
missing, there was no way this could be detected from domain XML. With
this patch, libvirt emits a new missing='yes' attribute for such devices
when active domain XML is generated.
USB devices can disappear without OS being mad about it, which makes
them ideal for startupPolicy. With this attribute, USB devices can be
configured to be mandatory (the default), requisite (will disappear
during migration if they cannot be found), or completely optional.
While current on_{poweroff,reboot,crash} action configuration is about
configuring life cycle actions, they can all be considered events and
actions that need to be done on a particular event. Let's generalize the
code by renaming life cycle actions to event actions so that it can be
reused later for non-lifecycle events.
This allows the user to control labelling of each character device
separately (the default is to inherit from the VM).
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Sometimes when guest machine crashes, coredump can get huge due to the
guest memory. This can be limited using madvise(2) system call and is
being used in QEMU hypervisor. This patch adds an option for configuring
that in the domain XML and related documentation.
Whenever the guest machine fails to boot, new parameter (reboot-timeout)
controls whether it should reboot and after how many ms it should do so.
Docs included.
New options is added to support EOI (End of Interrupt) exposure for
guests. As it makes sense only when APIC is enabled, I added this into
the <apic> element in <features> because this should be tri-state
option (cannot be handled as standalone feature).
After discussion with DB we decided to rename the new iolimit
element as it creates the impression it would be there to
limit (i.e. throttle) I/O instead of specifying immutable
characteristics of a block device.
This is also backed by the fact that the term I/O Limits has
vanished from newer storage admin documentation.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
There is a new <pm/> element implemented that can control what ACPI
sleeping states will be advertised by BIOS and allowed to be switched
to by libvirt. The default keeps defaults on hypervisor, otherwise
forces chosen setting.
The documentation of the pm element is added as well.
Introducing a new iolimits element allowing to override certain
properties of a guest block device like the physical and logical
block size.
This can be useful for platforms with 'non-standard' disk formats
like S390 DASD with its 4K block size.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch introduces support of setting emulator's period and
quota to limit cpu bandwidth when the vm starts. Also updates
XML Schema for new entries and docs.
This patch adds a new xml element <emulatorpin>, which is a sibling
to the existing <vcpupin> element under the <cputune>, to pin emulator
threads to specified physical CPUs.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
A hypervisor may allow to override the disk geometry of drives.
Qemu, as an example with cyls=,heads=,secs=[,trans=].
This patch extends the domain config to allow the specification of
disk geometry with libvirt.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This patch updates the domain and capability XML parser and formatter to
support more than one "seclabel" element for each domain and device. The
RNG schema and the tests related to this are also updated by this patch.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
This patch introduces the new forward mode='hostdev' along with
attribute managed. Includes updates to the network RNG and new xml
parser/formatter code.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
The following config elements now support a <vlan> subelements:
within a domain: <interface>, and the <actual> subelement of <interface>
within a network: the toplevel, as well as any <portgroup>
Each vlan element must have one or more <tag id='n'/> subelements. If
there is more than one tag, it is assumed that vlan trunking is being
requested. If trunking is required with only a single tag, the
attribute "trunk='yes'" should be added to the toplevel <vlan>
element.
Some examples:
<interface type='hostdev'/>
<vlan>
<tag id='42'/>
</vlan>
<mac address='52:54:00:12:34:56'/>
...
</interface>
<network>
<name>vlan-net</name>
<vlan trunk='yes'>
<tag id='30'/>
</vlan>
<virtualport type='openvswitch'/>
</network>
<interface type='network'/>
<source network='vlan-net'/>
...
</interface>
<network>
<name>trunk-vlan</name>
<vlan>
<tag id='42'/>
<tag id='43'/>
</vlan>
...
</network>
<network>
<name>multi</name>
...
<portgroup name='production'/>
<vlan>
<tag id='42'/>
</vlan>
</portgroup>
<portgroup name='test'/>
<vlan>
<tag id='666'/>
</vlan>
</portgroup>
</network>
<interface type='network'/>
<source network='multi' portgroup='test'/>
...
</interface>
IMPORTANT NOTE: As of this patch there is no backend support for the
vlan element for *any* network device type. When support is added in
later patches, it will only be for those select network types that
support setting up a vlan on the host side, without the guest's
involvement. (For example, it will be possible to configure a vlan for
a guest connected to an openvswitch bridge, but it won't be possible
to do that for one that is connected to a standard Linux host bridge.)
<portgroup> allows a <bandwidth> element, but the schema didn't have
this. Since this makes for multiple elements in portgroup, they must
be interleaved.
<interface type='bridge'> needs to allow <virtualport> elements
for openvswitch, but the schema didn't allow this.
Just as each physical device used by a network has a connections
counter, now each network has a connections counter which is
incremented once for each guest interface that connects using this
network.
The count is output in the live network XML, like this:
<network connections='20'>
...
</network>
It is read-only, and for informational purposes only - it isn't used
internally anywhere by libvirt.
Until now, all attributes in a <virtualport> parameter list that were
acceptable for a particular type, were also required. There were no
optional attributes.
One of the aims of supporting <virtualport> in libvirt's virtual
networks and portgroups is to allow specifying the group-wide
parameters in the network's virtualport, and merge that with the
interface's virtualport, which will have the instance-specific info
(i.e. the interfaceid or instanceid).
Additionally, the guest's interface XML shouldn't need to know what
type of network connection will be used prior to runtime - it could be
openvswitch, 802.1Qbh, 802.1Qbg, or none of the above - but should
still be able to specify instance-specific info just in case it turns
out to be applicable.
Finally, up to now, the parser for virtualport has always generated a
random instanceid/interfaceid when appropriate, making it impossible
to leave it blank (which is what's required for virtualports within a
network/portprofile definition).
This patch modifies the parser and formatter of the <virtualport>
element in the following ways:
* because most of the attributes in a virNetDevVPortProfile are fixed
size binary data with no reserved values, there is no way to embed a
"this value wasn't specified" sentinel into the existing data. To
solve this problem, the new *_specified fields in the
virNetDevVPortProfile object that were added in a previous patch of
this series are now set when the corresponding attribute is present
during the parse.
* allow parsing/formatting a <virtualport> that has no type set. In
this case, all fields are settable, but all are also optional.
* add a GENERATE_MISSING_DEFAULTS flag to the parser - if this flag is
set and an instanceid/interfaceid is expected but not provided, a
random one will be generated. This was previously the default
behavior, but is now done only for virtualports inside an
<interface> definition, not for those in <network> or <portgroup>.
* add a REQUIRE_ALL_ATTRIBUTES flag to the parser - if this flag is
set the parser will call the new
virNetDevVPortProfileCheckComplete() functions at the end of the
parser to check for any missing attributes (based on type), and
return failure if anything is missing. This used to be default
behavior. Now it is only used for the virtualport defined inside an
interface's <actual> element (by the time you've figured out the
contents of <actual>, you should have all the necessary data to fill
in the entire virtualport)
* add a REQUIRE_TYPE flag to the parser - if this flag is set, the
parser will return an error if the virtualport has no type
attribute. This also was previously the default behavior, but isn't
needed in the case of the virtualport for a type='network' interface
(i.e. the exact type isn't yet known), or the virtualport of a
portgroup (i.e. the portgroup just has modifiers for the network's
virtualport, which *does* require a type) - in those cases, the
check will be done at domain startup, once the final virtualport is
assembled (this is handled in the next patch).
The access, birth, modification and change times are added to
storage volumes and corresponding xml representations. This
shows up in the XML in this format:
<timestamps>
<atime>1341933637.027319099</atime>
<mtime>1341933637.027319099</mtime>
</timestamps>
Signed-off-by: Eric Blake <eblake@redhat.com>
capability.rng: Guest features can be in any order.
nodedev.rng: Added <driver> element, <capability> phys_function and
virt_functions for PCI devices.
storagepool.rng: Owner or group ID can be -1.
schema tests: New capabilities and nodedev files; changed owner and
group to -1 in pool-dir.xml.
storage_conf: Print uid_t and gid_t as signed to storage pool XML.
Libvirt adds a USB controller to the guest even if the user does not
specify any in the XML. This is due to back-compat reasons.
To allow disabling USB for a guest this patch adds a new USB controller
type "none" that disables USB support for the guest.
This patch brings support to manage sheepdog pools and volumes to libvirt.
It uses the "collie" command-line utility that comes with sheepdog for that.
A sheepdog pool in libvirt maps to a sheepdog cluster.
It needs a host and port to connect to, which in most cases
is just going to be the default of localhost on port 7000.
A sheepdog volume in libvirt maps to a sheepdog vdi.
To create one specify the pool, a name and the capacity.
Volumes can also be resized later.
In the volume XML the vdi name has to be put into the <target><path>.
To use the volume as a disk source for virtual machines specify
the vdi name as "name" attribute of the <source>.
The host and port information from the pool are specified inside the host tag.
<disk type='network'>
...
<source protocol="sheepdog" name="vdi_name">
<host name="localhost" port="7000"/>
</source>
</disk>
To work right this patch parses the output of collie,
so it relies on the raw output option. There recently was a bug which caused
size information to be reported wrong. This is fixed upstream already and
will be in the next release.
Signed-off-by: Sebastian Wiedenroth <wiedi@frubar.net>
Added s390-virtio machine type to the XML schema for domains in order
to not fail the domain schema tests.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Currently you can configure LXC to bind a host directory to
a guest directory, but not to bind a guest directory to a
guest directory. While the guest container init could do
this itself, allowing it in the libvirt XML means a stricter
SELinux policy can be written
Introduce a new syntax for filesystems to allow use of a RAM
filesystem
<filesystem type='ram'>
<source usage='10' units='MiB'/>
<target dir='/mnt'/>
</filesystem>
The usage units default to KiB to limit consumption of host memory.
* docs/formatdomain.html.in: Document new syntax
* docs/schemas/domaincommon.rng: Add new attributes
* src/conf/domain_conf.c: Parsing/formatting of RAM filesystems
* src/lxc/lxc_container.c: Mounting of RAM filesystems
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'boot' tag shouldn't be exclusive with 'kernel', 'initrd', and 'cmdline',
though the boot sequence doesn't make sense when the guest boots from
kernel directly. But it's useful if booting from kernel is to install
a newguest, even if it's not to install a guest, there is no hurt. And
on the other hand, we allow 'boot' and the kernel tags when parsing.
This patch adds support for a new storage backend with RBD support.
RBD is the RADOS Block Device and is part of the Ceph distributed storage
system.
It comes in two flavours: Qemu-RBD and Kernel RBD, this storage backend only
supports Qemu-RBD, thus limiting the use of this storage driver to Qemu only.
To function this backend relies on librbd and librados being present on the
local system.
The backend also supports Cephx authentication for safe authentication with
the Ceph cluster.
For storing credentials it uses the built-in secret mechanism of libvirt.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This patch adds support for the recent ipset iptables extension
to libvirt's nwfilter subsystem. Ipset allows to maintain 'sets'
of IP addresses, ports and other packet parameters and allows for
faster lookup (in the order of O(1) vs. O(n)) and rule evaluation
to achieve higher throughput than what can be achieved with
individual iptables rules.
On the command line iptables supports ipset using
iptables ... -m set --match-set <ipset name> <flags> -j ...
where 'ipset name' is the name of a previously created ipset and
flags is a comma-separated list of up to 6 flags. Flags use 'src' and 'dst'
for selecting IP addresses, ports etc. from the source or
destination part of a packet. So a concrete example may look like this:
iptables -A INPUT -m set --match-set test src,src -j ACCEPT
Since ipset management is quite complex, the idea was to leave ipset
management outside of libvirt but still allow users to reference an ipset.
The user would have to make sure the ipset is available once the VM is
started so that the iptables rule(s) referencing the ipset can be created.
Using XML to describe an ipset in an nwfilter rule would then look as
follows:
<rule action='accept' direction='in'>
<all ipset='test' ipsetflags='src,src'/>
</rule>
The two parameters on the command line are also the two distinct XML attributes
'ipset' and 'ipsetflags'.
FYI: Here is the man page for ipset:
https://ipset.netfilter.org/ipset.man.html
Regards,
Stefan
Though numad will manage the memory allocation of task dynamically,
it wants management application (libvirt) to pre-set the memory
policy according to the advisory nodeset returned from querying numad,
(just like pre-bind CPU nodeset for domain process), and thus the
performance could benefit much more from it.
This patch introduces new XML tag 'placement', value 'auto' indicates
whether to set the memory policy with the advisory nodeset from numad,
and its value defaults to the value of <vcpu> placement, or 'static'
if 'nodeset' is specified. Example of the new XML tag's usage:
<numatune>
<memory placement='auto' mode='interleave'/>
</numatune>
Just like what current "numatune" does, the 'auto' numa memory policy
setting uses libnuma's API too.
If <vcpu> "placement" is "auto", and <numatune> is not specified
explicitly, a default <numatume> will be added with "placement"
set as "auto", and "mode" set as "strict".
The following XML can now fully drive numad:
1) <vcpu> placement is 'auto', no <numatune> is specified.
<vcpu placement='auto'>10</vcpu>
2) <vcpu> placement is 'auto', no 'placement' is specified for
<numatune>.
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='interleave'/>
</numatune>
And it's also able to control the CPU placement and memory policy
independently. e.g.
1) <vcpu> placement is 'auto', and <numatune> placement is 'static'
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='strict' nodeset='0-10,^7'/>
</numatune>
2) <vcpu> placement is 'static', and <numatune> placement is 'auto'
<vcpu placement='static' cpuset='0-24,^12'>10</vcpu>
<numatune>
<memory mode='interleave' placement='auto'/>
</numatume>
A follow up patch will change the XML formatting codes to always output
'placement' for <vcpu>, even it's 'static'.
qemu's behavior in this case is to change the spice server behavior to
require secure connection to any channel not otherwise specified as
being in plaintext mode. libvirt doesn't currently allow requesting this
(via plaintext-channel=<channel name>).
RHBZ: 819499
Signed-off-by: Alon Levy <alevy@redhat.com>
In order to track a block copy job across libvirtd restarts, we
need to save internal XML that tracks the name of the file
holding the mirror. Displaying this name in dumpxml might also
be useful to the user, even if we don't yet have a way to (re-)
start a domain with mirroring enabled up front. This is done
with a new <mirror> sub-element to <disk>, as in:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/var/lib/libvirt/images/original.img'/>
<mirror file='/var/lib/libvirt/images/copy.img' format='qcow2' ready='yes'/>
...
</disk>
For now, the element is output-only, in live domains; it is ignored
when defining a domain or hot-plugging a disk (since those contexts
use VIR_DOMAIN_XML_INACTIVE in parsing). The 'ready' attribute appears
when libvirt knows that the job has changed from the initial pulling
phase over to the mirroring phase, although absence of the attribute
is not a sure indicator of the current phase. If we come up with a way
to make qemu start with mirroring enabled, we can relax the xml
restriction, and allow <mirror> (but not attribute 'ready') on input.
Testing active-only XML meant tweaking the testsuite slightly, but it
was worth it.
* docs/schemas/domaincommon.rng (diskspec): Add diskMirror.
* docs/formatdomain.html.in (elementsDisks): Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): New members.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean them.
(virDomainDiskDefParseXML): Parse them, but only internally.
(virDomainDiskDefFormat): Output them.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: New test file.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml: Likewise.
* tests/qemuxml2xmltest.c (testInfo): Alter members.
(testCompareXMLToXMLHelper): Allow more test control.
(mymain): Run new test.
<filesystemtgt> is redundant, as every group uses it; <address>
shouldn't be in <filesystemtgt> in case of the meaning could be
"filesystemtarget"; The elements <address>, <alias>, <target>,
... should be interleaved.
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Pass argv to the init binary of LXC, using a new <initarg> element.
* docs/formatdomain.html.in: Document <os> usage for containers
* docs/schemas/domaincommon.rng: Add <initarg> element
* src/conf/domain_conf.c, src/conf/domain_conf.h: parsing and
formatting of <initarg>
* src/lxc/lxc_container.c: Setup LXC argv
* tests/Makefile.am, tests/lxcxml2xmldata/lxc-systemd.xml,
tests/lxcxml2xmltest.c, tests/testutilslxc.c,
tests/testutilslxc.h: Test parsing/formatting of LXC related
XML parts
A few times libvirt users manually setting mac addresses have
complained of a networking failure that ends up being due to a multicast
mac address being used for a guest interface. This patch prevents that
by logging an error and failing if a multicast mac address is
encountered in each of the three following cases:
1) domain xml <interface> mac address.
2) network xml bridge mac address.
3) network xml dhcp/host mac address.
There are several other places where a mac address can be input that
aren't controlled in this manner because failure to do so has no
consequences (e.g., if the address will be used to search through
existing interfaces for a match).
The RNG has been updated to add multiMacAddr and uniMacAddr along with
the existing macAddr, and macAddr was switched to uniMacAddr where
appropriate.
If no <interface> elements are included in an LXC guest XML
description, then the LXC guest will just see the host's
network interfaces. It is desirable to be able to hide the
host interfaces, without having to define any guest interfaces.
This patch introduces a new feature flag <privnet/> to allow
forcing of a private network namespace for LXC. In the future
I also anticipate that we will add <privuser/> to force a
private user ID namespace.
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add support
for <privnet/> feature. Auto-set <privnet> if any <interface>
devices are defined
* src/lxc/lxc_container.c: Honour request for private network
namespace
numad is an user-level daemon that monitors NUMA topology and
processes resource consumption to facilitate good NUMA resource
alignment of applications/virtual machines to improve performance
and minimize cost of remote memory latencies. It provides a
pre-placement advisory interface, so significant processes can
be pre-bound to nodes with sufficient available resources.
More details: http://fedoraproject.org/wiki/Features/numad
"numad -w ncpus:memory_amount" is the advisory interface numad
provides currently.
This patch add the support by introducing a new XML attribute
for <vcpu>. e.g.
<vcpu placement="auto">4</vcpu>
<vcpu placement="static" cpuset="1-10^6">4</vcpu>
The returned advisory nodeset from numad will be printed
in domain's dumped XML. e.g.
<vcpu placement="auto" cpuset="1-10^6">4</vcpu>
If placement is "auto", the number of vcpus and the current
memory amount specified in domain XML will be used for numad
command line (numad uses MB for memory amount):
numad -w $num_of_vcpus:$current_memory_amount / 1024
The advisory nodeset returned from numad will be used to set
domain process CPU affinity then. (e.g. qemuProcessInitCpuAffinity).
If the user specifies both CPU affinity policy (e.g.
(<vcpu cpuset="1-10,^7,^8">4</vcpu>) and placement == "auto"
the specified CPU affinity will be overridden.
Only QEMU/KVM drivers support it now.
See docs update in patch for more details.
If there is a disk file with a comma in the name, QEmu expects a double
comma instead of a single one (e.g., the file "virtual,disk.img" needs
to be specified as "virtual,,disk.img" in QEmu's command line). This
patch fixes libvirt to work with that feature. Fix RHBZ #801036.
Based on an initial patch by Crístian Viana.
* src/util/buf.h (virBufferEscape): Alter signature.
* src/util/buf.c (virBufferEscape): Add parameter.
(virBufferEscapeSexpr): Fix caller.
* src/qemu/qemu_command.c (qemuBuildRBDString): Likewise. Also
escape commas in file names.
(qemuBuildDriveStr): Escape commas in file names.
* docs/schemas/basictypes.rng (absFilePath): Relax RNG to allow
commas in input file names.
* tests/qemuxml2argvdata/*-disk-drive-network-sheepdog.*: Update
test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Output is still in kibibytes, but input can now be in different
scales for ease of typing.
* src/conf/domain_conf.c (virDomainParseMemory): New helper.
(virDomainDefParseXML): Use it when parsing.
* docs/schemas/domaincommon.rng: Expand XML; rename memoryKBElement
to memoryElement and update callers.
* docs/formatdomain.html.in (elementsMemoryAllocation): Document
scaling.
* tests/qemuxml2argvdata/qemuxml2argv-memtune.xml: Adjust test.
* tests/qemuxml2xmltest.c: Likewise.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-memtune.xml: New file.
The test domain allows <memory>0</memory>, but the RNG was stating
that memory had to be at least 4096000 bytes. Hypervisors should
enforce their own limits, rather than complicating the RNG.
Meanwhile, some copy and paste had introduced some fishy constructs
in various unit tests.
* docs/schemas/domaincommon.rng (memoryKB, memoryKBElement): Drop
limit that isn't enforced in code.
* src/conf/domain_conf.c (virDomainDefParseXML): Require current
<= maximum.
* tests/qemuxml2argvdata/*.xml: Fix offenders.
Disk manufacturers are fond of quoting sizes in powers of 10,
rather than powers of 2 (after all, 2.1 GB sounds larger than
2.0 GiB, even though the exact opposite is true). So, we might
as well follow coreutils' lead in supporting three types of
suffix: single letter ${u} (which we already had) and ${u}iB
for the power of 2, and ${u}B for power of 10.
Additionally, it is impossible to create a file with more than
2**63 bytes, since off_t is signed (if you have enough storage
to even create one 8EiB file, I'm jealous). This now reports
failure up front rather than down the road when the kernel
finally refuses an impossible size.
* docs/schemas/basictypes.rng (unit): Add suffixes.
* src/conf/storage_conf.c (virStorageSize): Use new function.
* docs/formatstorage.html.in: Document it.
* tests/storagevolxml2xmlin/vol-file-backing.xml: Test it.
* tests/storagevolxml2xmlin/vol-file.xml: Likewise.
Make it obvious to 'dumpxml' readers what unit we are using,
since our default of KiB for memory (1024) differs from qemu's
default of MiB; and differs from our use of bytes for storage.
Tests were updated via:
$ find tests/*data tests/*out -name '*.xml' | \
xargs sed -i 's/<\(memory\|currentMemory\|hard_limit\|soft_limit\|min_guarantee\|swap_hard_limit\)>/<\1 unit='"'KiB'>/"
$ find tests/*data tests/*out -name '*.xml' | \
xargs sed -i 's/<\(capacity\|allocation\|available\)>/<\1 unit='"'bytes'>/"
followed by a few fixes for the stragglers.
Note that with this patch, the RNG for <memory> still forbids
validation of anything except unit='KiB', since the code silently
ignores the attribute; a later patch will expand <memory> to allow
scaled input in the code and update the RNG to match.
* docs/schemas/basictypes.rng (unit): Add 'bytes'.
(scaledInteger): New define.
* docs/schemas/storagevol.rng (sizing): Use it.
* docs/schemas/storagepool.rng (sizing): Likewise.
* docs/schemas/domaincommon.rng (memoryKBElement): New define; use
for memory elements.
* src/conf/storage_conf.c (virStoragePoolDefFormat)
(virStorageVolDefFormat): Likewise.
* src/conf/domain_conf.h (_virDomainDef): Document unit used
internally.
* src/conf/storage_conf.h (_virStoragePoolDef, _virStorageVolDef):
Likewise.
* tests/*data/*.xml: Update all tests.
* tests/*out/*.xml: Likewise.
* tests/define-dev-segfault: Likewise.
* tests/openvzutilstest.c (testReadNetworkConf): Likewise.
* tests/qemuargv2xmltest.c (blankProblemElements): Likewise.
The code supported unit='E' for "exabyte", but the RNG did not;
conversely, the RNG supported "z" and "y" but the code did not
(I'm jealous if you have that much storage, particularly since
it won't fit in 64-bit off_t). Also, the code supported
<allocation unit='...'>, but not the RNG.
In an effort to make 'unit' more worthwhile in future patches,
it's easier to share it between files.
In making this factorization, note that absFilePath is more
permissive than 'path', so storage pools and storage volumes
will now validate with a wider set of file names than before.
I don't think this should be a problem in practice.
* docs/schemas/storagepool.rng: Include basic types, rather than
repeating things here.
* docs/schemas/storagevol.rng: Likewise.
* docs/schemas/basictypes.rng: Add 'unsignedLong', 'unit', and fix
to match storage code.
This is the new interface type that sets up an SR-IOV PCI network
device to be assigned to the guest with PCI passthrough after
initializing some network device-specific things from the config
(e.g. MAC address, virtualport profile parameters). Here is an example
of the syntax:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0' bus='0' slot='4' function='3'/>
</source>
<mac address='00:11:22:33:44:55'/>
<address type='pci' domain='0' bus='0' slot='7' function='0'/>
</interface>
This would assign the PCI card from bus 0 slot 4 function 3 on the
host, to bus 0 slot 7 function 0 on the guest, but would first set the
MAC address of the card to 00:11:22:33:44:55.
NB: The parser and formatter don't care if the PCI card being
specified is a standard single function network adapter, or a virtual
function (VF) of an SR-IOV capable network adapter, but the upcoming
code that implements the back end of this config will work *only* with
SR-IOV VFs. This is because modifying the mac address of a standard
network adapter prior to assigning it to a guest is pointless - part
of the device reset that occurs during that process will reset the MAC
address to the value programmed into the card's firmware.
Although it's not supported by any of libvirt's hypervisor drivers,
usb network hostdevs are also supported in the parser and formatter
for completeness and consistency. <source> syntax is identical to that
for plain <hostdev> devices, except that the <address> element should
have "type='usb'" added if bus/device are specified:
<interface type='hostdev'>
<source>
<address type='usb' bus='0' device='4'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
If the vendor/product form of usb specification is used, type='usb'
is implied:
<interface type='hostdev'>
<source>
<vendor id='0x0012'/>
<product id='0x24dd'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
Again, the upcoming patch to fill in the backend of this functionality
will log an error and fail with "Unsupported Config" if you actually
try to assign a USB network adapter to a guest using <interface
type='hostdev'> - just use a standard <hostdev> entry in that case
(and also for single-port PCI adapters).
* src/conf/domain_conf.h: Add new member "target" to struct
_virDomainDeviceDriveAddress.
* src/conf/domain_conf.c: Parse and format "target"
* Lots of tests (.xml) in tests/domainsnapshotxml2xmlout,
tests/qemuxml2argvdata, tests/qemuxml2xmloutdata, and
tests/vmx2xmldata/ are modified for newly introduced
attribute "target" for address of "drive" type.
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
Commit b170eb99 introduced a bug: domains that had an explicit
<seclabel type='none'/> when started would not be reparsed if
libvirtd restarted. It turns out that our testsuite was not
exercising this because it never tried anything but inactive
parsing. Additionally, the live XML for such a domain failed
to re-validate. Applying just the tests/ portion of this patch
will expose the bugs that are fixed by the other two files.
* docs/schemas/domaincommon.rng (seclabel): Allow relabel under
type='none'.
* src/conf/domain_conf.c (virSecurityLabelDefParseXML): Per RNG,
presence of <seclabel> with no type implies dynamic. Don't
require sub-elements for type='none'.
* tests/qemuxml2xmltest.c (mymain): Add test.
* tests/qemuxml2argvtest.c (mymain): Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-none.xml: Add file.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-none.args: Add file.
Reported by Ansis Atteka.
Curently security labels can be of type 'dynamic' or 'static'.
If no security label is given, then 'dynamic' is assumed. The
current code takes advantage of this default, and avoids even
saving <seclabel> elements with type='dynamic' to disk. This
means if you temporarily change security driver, the guests
can all still start.
With the introduction of sVirt to LXC though, there needs to be
a new default of 'none' to allow unconfined LXC containers.
This patch introduces two new security label types
- default: the host configuration decides whether to run the
guest with type 'none' or 'dynamic' at guest start
- none: the guest will run unconfined by security policy
The 'none' label type will obviously be undesirable for some
deployments, so a new qemu.conf option allows a host admin to
mandate confined guests. It is also possible to turn off default
confinement
security_default_confined = 1|0 (default == 1)
security_require_confined = 1|0 (default == 0)
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add new
seclabel types
* src/security/security_manager.c, src/security/security_manager.h:
Set default sec label types
* src/security/security_selinux.c: Handle 'none' seclabel type
* src/qemu/qemu.conf, src/qemu/qemu_conf.c, src/qemu/qemu_conf.h,
src/qemu/libvirtd_qemu.aug: New security config options
* src/qemu/qemu_driver.c: Tell security driver about default
config
This patch adds a new element <title> to the domain XML. This attribute
can hold a short title defined by the user to ease the identification of
domains. The title may not contain newlines and should be reasonably short.
*docs/formatdomain.html.in
*docs/schemas/domaincommon.rng
- add schema grammar for the new element and documentation
*src/conf/domain_conf.c
*src/conf/domain_conf.h
- add field to hold the new attribute
- add code to parse and create XML with the new attribute
Commit 8a09ee410 tickles a bug in libxml2-2.7.6 on RHEL 6.2,
where libxml2 treats the pattern [^\n] as excluding literal
backslash and n, instead of the intended newline, thus failing
to validate any domain name containing 'n'.
* docs/schemas/domaincommon.rng: Use literal newline instead.
This patch adds a new attribute "rawio" to the "disk" element
of domain XML. Valid values of "rawio" attribute are "yes"
and "no".
rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO.
If you specify the following XML:
<disk type='block' device='lun' rawio='yes'>
...
</disk>
the domain will be granted CAP_SYS_RAWIO.
(of course, the domain have to be executed with root privilege)
NOTE:
- "rawio" attribute is only valid when device='lun'
- At the moment, any other disks you won't use rawio can use rawio.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
This patch addresses: https://bugzilla.redhat.com/show_bug.cgi?id=781562
Along with the "rombar" option that controls whether or not a boot rom
is made visible to the guest, qemu also has a "romfile" option that
allows specifying a binary file to present as the ROM BIOS of any
emulated or passthrough PCI device. This patch adds support for
specifying romfile to both passthrough PCI devices, and emulated
network devices that attach to the guest's PCI bus (just about
everything other than ne2k_isa).
One example of the usefulness of this option is described in the
bugzilla report: 82576 sriov network adapters don't provide a ROM BIOS
for the cards virtual functions (VF), but an image of such a ROM is
available, and with this ROM visible to the guest, it can PXE boot.
In libvirt's xml, the new option is configured like this:
<hostdev>
...
<rom file='/etc/fake/boot.bin'/>
...
</hostdev
(similarly for <interface>).
When support for the rombar option was added, it was only added for
PCI passthrough devices, configured with <hostdev>. The same option is
available for any network device that is attached to the guest's PCI
bus. This patch allows setting rombar for any PCI network device type.
After adding cases to test this to qemuxml2argv-hostdev-pci-rombar.*,
I decided to rename those files (to qemuxml2argv-pci-rom.*) to more
accurately reflect the additional tests, and also noticed that up to
now we've only been performing a domainschematest for that case, so I
added the "pci-rom" test to both qemuxml2argv and qemuxml2xml (and in
the process found some bugs whose fixes I squashed into previous
commits of this series).
Add kvmclock timer to documentation, schema and parsers. Keep the
platform timer first since it is kind of special, and alphabetize
the others when possible (i.e. when it does not change the ABI).
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The domain schema enforced restrictions on the domain name string that
the code doesn't. This patch relaxes the check, leaving the restrictions
on the driver or hypervisor. The only invalid character is a newline.
Applications can now insert custom nodes and hierarchies into domain
configuration XML. Although currently not enforced, applications are
required to use their own namespaces on every custom node they insert,
with only one top-level element per namespace.
This introduces new attribute wrpolicy with only supported
value as immediate. This will be an optional
attribute with no defaults. This helps specify whether
to skip the host page cache.
When wrpolicy is specified, meaning when wrpolicy=immediate
a writeback is explicitly initiated for the dirty pages in
the host page cache as part of the guest file write operation.
Usage:
<filesystem type='mount' accessmode='passthrough'>
<driver type='path' wrpolicy='immediate'/>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
Currently this only works with type='mount' for the QEMU/KVM driver.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
The mode can be either of "custom" (default), "host-model",
"host-passthrough". The semantics of each mode is described in the
following examples:
- guest CPU is a default model with specified topology:
<cpu>
<topology sockets='1' cores='2' threads='1'/>
</cpu>
- guest CPU matches selected model:
<cpu mode='custom' match='exact'>
<model>core2duo</model>
</cpu>
- guest CPU should be a copy of host CPU as advertised by capabilities
XML (this is a short cut for manually copying host CPU specification
from capabilities to domain XML):
<cpu mode='host-model'/>
In case a hypervisor does not support the exact host model, libvirt
automatically falls back to a closest supported CPU model and
removes/adds features to match host. This behavior can be disabled by
<cpu mode='host-model'>
<model fallback='forbid'/>
</cpu>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-model' match='exact'>
<model fallback='allow'>Penryn</model> --+
<vendor>Intel</vendor> |
<topology sockets='2' cores='4' threads='1'/> + copied from
<feature policy='require' name='dca'/> | capabilities XML
<feature policy='require' name='xtpr'/> |
... --+
</cpu>
- guest CPU should be exactly the same as host CPU even in the aspects
libvirt doesn't model (such domain cannot be migrated unless both
hosts contain exactly the same CPUs):
<cpu mode='host-passthrough'/>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-passthrough' match='minimal'>
<model>Penryn</model> --+ copied from caps
<vendor>Intel</vendor> | XML but doesn't
<topology sockets='2' cores='4' threads='1'/> | describe all
<feature policy='require' name='dca'/> | aspects of the
<feature policy='require' name='xtpr'/> | actual guest CPU
... --+
</cpu>
In case a hypervisor doesn't support the exact CPU model requested by a
domain XML, we automatically fallback to a closest CPU model the
hypervisor supports (and make sure we add/remove any additional features
if needed). This patch adds 'fallback' attribute to model element, which
can be used to disable this automatic fallback.
We support <interface> of type "mcast", "server", and "client",
but the RNG schema for them are missed. Attribute "address" is
optional for "server" type. And these 3 types support
<mac address='MAC'/>, too.
Though <alias> is ignored when defining a domain, it can cause
failure if one validates (e.g. virt-xml-validate) the XML dumped
from a running domain. This patch expose it in domain RNG schema
for all the devices which support it.
The "unit" attribute of a drive address is optional in the code, so should
also be in the XML schema.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new introduced optional attribute "copy_on_read</code> controls
whether to copy read backing file into the image file. The value can
be either "on" or "off". Copy-on-read avoids accessing the same backing
file sectors repeatedly and is useful when the backing file is over a
slow network. By default copy-on-read is off.
This patch introduces the capability to use a different iterator per
variable.
The currently supported notation of variables in a filtering rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A' srcportstart='$B'/>
</rule>
processes the two lists 'A' and 'B' in parallel. This means that A and B
must have the same number of 'N' elements and that 'N' rules will be
instantiated (assuming all tuples from A and B are unique).
In this patch we now introduce the assignment of variables to different
iterators. Therefore a rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@1]' srcportstart='$B[@2]'/>
</rule>
will now create every combination of elements in A with elements in B since
A has been assigned to an iterator with Id '1' and B has been assigned to an
iterator with Id '2', thus processing their value independently.
The first rule has an equivalent notation of
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@0]' srcportstart='$B[@0]'/>
</rule>
In the past, generic SCSI commands issued from a guest to a virtio
disk were always passed through to the underlying disk by qemu, and
the kernel would also pass them on.
As a result of CVE-2011-4127 (see:
http://seclists.org/oss-sec/2011/q4/536), qemu now honors its
scsi=on|off device option for virtio-blk-pci (which enables/disables
passthrough of generic SCSI commands), and the kernel will only allow
the commands for physical devices (not for partitions or logical
volumes). The default behavior of qemu is still to allow sending
generic SCSI commands to physical disks that are presented to a guest
as virtio-blk-pci devices, but libvirt prefers to disable those
commands in the standard virtio block devices, enabling it only when
specifically requested (hopefully indicating that the requester
understands what they're asking for). For this purpose, a new libvirt
disk device type (device='lun') has been created.
device='lun' is identical to the default device='disk', except that:
1) It is only allowed if bus='virtio', type='block', and the qemu
version is "new enough" to support it ("new enough" == qemu 0.11 or
better), otherwise the domain will fail to start and a
CONFIG_UNSUPPORTED error will be logged).
2) The option "scsi=on" will be added to the -device arg to allow
SG_IO commands (if device !='lun', "scsi=off" will be added to the
-device arg so that SG_IO commands are specifically forbidden).
Guests which continue to use disk device='disk' (the default) will no
longer be able to use SG_IO commands on the disk; those that have
their disk device changed to device='lun' will still be able to use SG_IO
commands.
*docs/formatdomain.html.in - document the new device attribute value.
*docs/schemas/domaincommon.rng - allow it in the RNG
*tests/* - update the args of several existing tests to add scsi=off, and
add one new test that will test scsi=on.
*src/conf/domain_conf.c - update domain XML parser and formatter
*src/qemu/qemu_(command|driver|hotplug).c - treat
VIR_DOMAIN_DISK_DEVICE_LUN *almost* identically to
VIR_DOMAIN_DISK_DEVICE_DISK, except as indicated above.
Note that no support for this new device value was added to any
hypervisor drivers other than qemu, because it's unclear what it might
mean (if anything) to those drivers.
Hi,
this is the fifth version of my SRV record for DNSMasq patch rebased
for the current codebase to the bridge driver and libvirt XML file to
include support for the SRV records in the DNS. The syntax is based on
DNSMasq man page and tests for both xml2xml and xml2argv were added as
well. There are some things written a better way in comparison with
version 4, mainly there's no hack in tests/networkxml2argvtest.c and
also the xPath context is changed to use a simpler query using the
virXPathInt() function relative to the current node.
Also, the patch is also fixing the networkxml2argv test to pass both
checks, i.e. both unit tests and also syntax check.
Please review,
Michal
Signed-off-by: Michal Novotny <minovotn@redhat.com>
When doing security relabeling, there are cases where a per-file
override might be appropriate. For example, with a static label
and relabeling, it might be appropriate to skip relabeling on a
particular disk, where the backing file lives on NFS that lacks
the ability to track labeling. Or with dynamic labeling, it might
be appropriate to use a custom (non-dynamic) label for a disk
specifically intended to be shared across domains.
The new XML resembles the top-level <seclabel>, but with fewer
options (basically relabel='no', or <label>text</label>):
<domain ...>
...
<devices>
<disk type='file' device='disk'>
<source file='/path/to/image1'>
<seclabel relabel='no'/> <!-- override for just this disk -->
</source>
...
</disk>
<disk type='file' device='disk'>
<source file='/path/to/image1'>
<seclabel relabel='yes'> <!-- override for just this disk -->
<label>system_u:object_r:shared_content_t:s0</label>
</seclabel>
</source>
...
</disk>
...
</devices>
<seclabel type='dynamic' model='selinux'>
<baselabel>text</baselabel> <!-- used for all devices without override -->
</seclabel>
</domain>
This patch only introduces the XML and documentation; future patches
will actually parse and make use of it. The intent is that we can
further extend things as needed, adding a per-device <seclabel> in
more places (such as the source of a console device), and possibly
allowing a <baselabel> instead of <label> for labeling where we want
to reuse the cNNN,cNNN pair of a dynamically labeled domain but a
different base label.
First suggested by Daniel P. Berrange here:
https://www.redhat.com/archives/libvir-list/2011-December/msg00258.html
* docs/schemas/domaincommon.rng (devSeclabel): New define.
(disk): Use it.
* docs/formatdomain.html.in (elementsDisks, seclabel): Document
the new XML.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-dynamic-override.xml:
New test, to validate RNG.
The RNG for <seclabel> was too strict - if it was present, then it
had to have sub-elements, even if those didn't make sense for the
given attributes. Also, we didn't have any tests of <seclabel>
parsing or XML output.
In this patch, I added more parsing tests than output tests (since
the output populates and/or reorders fields not present in certain
inputs). Making the RNG reliable is a precursor to using <seclabel>
variants in more places in the XML in later patches.
See also:
http://berrange.com/posts/2011/09/29/two-small-improvements-to-svirt-guest-configuration-flexibility-with-kvmlibvirt/
* docs/schemas/domaincommon.rng (seclabel): Tighten rules.
* tests/qemuxml2argvtest.c (mymain): New tests.
* tests/qemuxml2xmltest.c (mymain): Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*.*: New files.
Original patch by Bharata. Updated to use {1,16} in spaprvioReg based
on example from Eric Blake.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
In QEMU PPC64 we have a network device called "spapr-vlan". We can specify
this using the existing syntax for network devices, however libvirt
currently rejects "spapr-vlan" in virDomainNetDefParseXML() because of
the "-". Fix the code to accept "-".
* src/conf/domain_conf.c (virDomainNetDefParseXML): Allow '-' in
model name, and be more efficient.
* docs/schemas/domaincommon.rng: Limit valid model names to match code.
Based on a patch by Michael Ellerman.
ppc64 as new arch type and pseries as new machine type are
added under <os> ... </os>.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
This patch is to expose the fabric_name of fc_host class, which
might be useful for users who wants to known which fabric the
(v)HBA connects to.
The patch also adds the missed capabilities' XML schema of scsi_host,
(of course, with fabric_wwn added), and update the documents
(docs/formatnode.html.in)
Enable block I/O throttle for per-disk in XML, as the first
per-disk IO tuning parameter.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The capabilities XML uses the x86 specific terms 'S3', 'S4'
and 'Hybrid-Syspend'. Switch it to use the same terminology
as the API constants and virsh options, eg 'suspend_mem'
'suspend_disk' and 'suspend_hybrid'
* docs/formatcaps.html.in, docs/schemas/capability.rng,
src/conf/capabilities.c: Rename suspend constants
This adds per-device weights to <blkiotune>. Note that the
cgroups implementation only supports weights per block device,
and not per-file within the device; hence this option must be
global to the domain definition rather than tied to individual
<devices>/<disk> entries:
<domain ...>
<blkiotune>
<device>
<path>/path/to/block</path>
<weight>1000</weight>
</device>
</blkiotune>
..
This patch also adds a parameter --device-weights to virsh command
blkiotune for setting/getting blkiotune.weight_device for any
hypervisor that supports it. All <device> entries under
<blkiotune> are concatenated into a single string attribute under
virDomain{Get,Set}BlkioParameters, named "device_weight".
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Some systems support a feature known as 'Hybrid-Suspend', apart from the
usual system-wide sleep states such as Suspend-to-RAM (S3) or Suspend-to-Disk
(S4). Add the functionality to discover this power management feature and
export it in the capabilities XML under the <power_management> tag.
virt-xml-validate fails when run on a domain XML file of type 'vbox'.
For failing test case, see https://bugzilla.redhat.com/show_bug.cgi?id=757097
This patch updates the XML schema to accept all valid hypervisor
types, as well as dropping hypervisor types that are not in use
by the current code base.
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch adds support for filtering of STP (spanning tree protocol) traffic
to the parser and makes us of the ebtables support for STP filtering. This code
now enables the filtering of traffic in chains with prefix 'stp'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
With hunks borrowed from one of David Steven's previous patches, we now
add the capability of having a 'mac' chain which is useful to filter
for multiple valid MAC addresses.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from pm-utils package) is run to discover if
Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them are supported, then a new tag "<power_management>" is
introduced in the XML under the <host> tag.
However in case the query to check for power management features succeeded,
but the host does not support any such feature, then the XML will contain
an empty <power_management/> tag. In the event that the PM query itself
failed, the XML will not contain any "power_management" tag.
To use this, new APIs could be implemented in libvirt to exploit power
management features such as S3/S4.
This patch adds support for filtering of VLAN (802.1Q) traffic to the
parser and makes us of the ebtables support for VLAN filtering. This code
now enables the filtering of traffic in chains with prefix 'vlan'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch modifies the NWFilter parameter parser to support multiple
elements with the same name and to internally build a list of items.
An example of the XML looks like this:
<parameter name='TEST' value='10.1.2.3'/>
<parameter name='TEST' value='10.2.3.4'/>
<parameter name='TEST' value='10.1.1.1'/>
The list of values is then stored in the newly introduced data type
virNWFilterVarValue.
The XML formatter is also adapted to print out all items in alphabetical
order sorted by 'name'.
This patch also fixes a bug in the XML schema on the way.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch enables chains that have a known prefix in their name.
Known prefixes are: 'ipv4', 'ipv6', 'arp', 'rarp'. All prefixes
are also protocols that can be evaluated on the ebtables level.
Following the prefix they will be automatically connected to an interface's
'root' chain and jumped into following the protocol they evaluate, i.e.,
a table 'arp-xyz' will be accessed from the root table using
ebtables -t nat -A <iface root table> -p arp -j I-<ifname>-arp-xyz
thus generating a 'root' chain like this one here:
Bridge chain: libvirt-O-vnet0, entries: 5, policy: ACCEPT
-p IPv4 -j O-vnet0-ipv4
-p ARP -j O-vnet0-arp
-p 0x8035 -j O-vnet0-rarp
-p ARP -j O-vnet0-arp-xyz
-j DROP
where the chain 'arp-xyz' is accessed for filtering of ARP packets.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch extends the filter XML to support priorities of chains
in the XML. An example would be:
<filter name='allow-arpxyz' chain='arp-xyz' priority='200'>
[...]
</filter>
The permitted values for priorities are [-1000, 1000].
By setting the priority of a chain the order in which it is accessed
from the interface root chain can be influenced.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:
<cpu>
...
<topology sockets='2' cores='4' threads='2'/>
<numa>
<cell cpus='0-7' memory='512000'/>
<cell cpus='8-15' memory='512000'/>
</numa>
...
</cpu>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>