This patch introduces support for LXC specific public APIs. In
common with what was done for QEMU, this creates a libvirt_lxc.so
library and libvirt/libvirt-lxc.h header file.
The actual APIs are
int virDomainLxcOpenNamespace(virDomainPtr domain,
int **fdlist,
unsigned int flags);
int virDomainLxcEnterNamespace(virDomainPtr domain,
unsigned int nfdlist,
int *fdlist,
unsigned int *noldfdlist,
int **oldfdlist,
unsigned int flags);
which provide a way to use the setns() system call to move the
calling process into the container's namespace. It is not
practical to write in a generically applicable manner. The
nearest that we could get to such an API would be an API which
allows to pass a command + argv to be executed inside a
container. Even if we had such a generic API, this LXC specific
API is still useful, because it allows the caller to maintain
the current process context, in particular any I/O streams they
have open.
NB the virDomainLxcEnterNamespace() API is special in that it
runs client side, so does not involve the internal driver API.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When parsing the top level comment of a file, apibuild.py used
to split on any ':' character of a line regarding the first part
as a key for a setting, e.g. "Summary". The second part would then
be assigned as the value for that key.
This means you could not use a ':' character inside those comments
without ill effects.
Now, a key must consist solely of alphanumeric characters, '_' or '.'.
I've noticed a number of people sending patches with file
renames not compressed, so we might as well document how to
set this up. (Git won't do it by default, for back-compat
reasons)
* docs/hacking.html.in: Add git config tip.
* HACKING: Regenerate.
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
Libvirt's HTML documentation is not as easy to the eyes as it could
be since long text has no visual breaks.
Take advantage of the formatting in documentation comments and wrap
each part separated by two consecutive \n into a HTML <p> element.
The SCLP console is the native console type for s390 and is preferred
over the virtio console as it doesn't require special drivers and
is more efficient. Recent versions of QEMU come with SCLP support
which is hereby enabled.
The new target types 'sclp' and 'sclplm' can be used to specify a
SCLP console. Adding documentation, domain schema and XML processing
support.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This introduces new XML tag "sgio" for disk, its valid values
are "filtered" and "unfiltered", setting it as "filtered" will
set the disk's unpriv_sgio to 0, and "unfiltered" to set it
as 1, which allows the unprivileged SG_IO commands.
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If there are multiple video devices
primary = 'yes' marks this video device as the primary one.
The rest are secondary video devices. No more than one could be
mark as primary. If none of them has primary attribute, the first
one will be the primary by default like what it was.
The reason of this changing is that for qemu, only one primary video
device is permitted which can be of any type. For secondary video
devices, only qxl is allowd. Primary attribute removes the restriction
that the first have to be the primary one.
We always put the primary video device into the first position of
video device structure array after parsing.
This is however supported only on domain interfaces with
type='network'. Moreover, target network needs to have at least
inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average'
(old) or 'floor' (new). This new attribute can be used just for
interfaces type of network (<interface type='network'/>) currently.
The DHCPv6 support includes IPV6 dhcp-range and dhcp-host for one
IPv6 subnetwork on one interface. This support will only work
if dnsmasq version >= 2.64; otherwise an error occurs if
dhcp-range or dhcp-host is specified for an IPv6 address.
Essentially, this change provides the same DHCP support for IPv6
that has been available for IPv4.
With dnsmasq >= 2.64, support for the RA service is also now provided
by dnsmasq (radvd is no longer used/started). (Although at least one
version of dnsmasq prior to 2.64 "supported" IPv6 Router
Advertisement, there were bugs (fixed in 2.64) that rendered it
unusable.)
Documentation and the network schema has been updated
to reflect the new support.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Based on a patch originally authored by Daniel De Graaf
http://lists.xen.org/archives/html/xen-devel/2012-05/msg00565.html
This patch converts the Xen libxl driver to support only Xen >= 4.2.
Support for Xen 4.1 libxl is dropped since that version of libxl is
designated 'technology preview' only and is incompatible with Xen 4.2
libxl. Additionally, the default toolstack in Xen 4.1 is still xend,
for which libvirt has a stable, functional driver.
This patch adds the capability for virtual guests to do IPv6
communication via a virtual network interface with no IPv6 (gateway)
addresses specified. This capability has always been enabled by
default for IPv4, but disabled for IPv6 for security concerns, and
because it requires the ip6tables command to be operational (which
isn't the case on a system with the ipv6 module completely disabled).
This patch adds a new attribute "ipv6" at the toplevel of a <network>
object. If ipv6='yes', the extra ip6tables rules required to permite
inter-guest communications are added when the network is started. If
it is 'no', or not present, those rules will not be added; thus the
default behavior doesn't change, so there should be no compatibility
issues with any existing installations.
Note that virtual guests cannot communication with the virtualization
host via this interface, because the following kernel tunable has
been set:
net.ipv6.conf.<bridge_interface_name>.disable_ipv6 = 1
This assures that the bridge interface will not have an IPv6
link-local (fe80::) address.
To control this behavior so that it is not enabled by default, the parameter
ipv6='yes' on the <network> statement has been added.
Documentation related to this patch has been updated.
The network schema has also been updated.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Each <domainsnapshot> can now contain an optional <memory>
element that describes how the VM state was handled, similar
to disk snapshots. The new element will always appear in
output; for back-compat, an input that lacks the element will
assume 'no' or 'internal' according to the domain state.
Along with this change, it is now possible to pass <disks> in
the XML for an offline snapshot; this also needs to be wired up
in a future patch, to make it possible to choose internal vs.
external on a per-disk basis for each disk in an offline domain.
At that point, using the --disk-only flag for an offline domain
will be able to work.
For some examples below, remember that qemu supports the
following snapshot actions:
qemu-img: offline external and internal disk
savevm: online internal VM and disk
migrate: online external VM
transaction: online external disk
=====
<domainsnapshot>
<memory snapshot='no'/>
...
</domainsnapshot>
implies that there is no VM state saved (mandatory for
offline and disk-only snapshots, not possible otherwise);
using qemu-img for offline domains and transaction for online.
=====
<domainsnapshot>
<memory snapshot='internal'/>
...
</domainsnapshot>
state is saved inside one of the disks (as in qemu's 'savevm'
system checkpoint implementation). If needed in the future,
we can also add an attribute pointing out _which_ disk saved
the internal state; maybe disk='vda'.
=====
<domainsnapshot>
<memory snapshot='external' file='/path/to/state'/>
...
</domainsnapshot>
This is not wired up yet, but future patches will allow this to
control a combination of 'virsh save /path/to/state' plus disk
snapshots from the same point in time.
=====
So for 1.0.1 (and later, as needed), I plan to implement this table
of combinations, with '*' designating new code and '+' designating
existing code reached through new combinations of xml and/or the
existing DISK_ONLY flag:
domain memory disk disk-only | result
-----------------------------------------
offline omit omit any | memory=no disk=int, via qemu-img
offline no omit any |+memory=no disk=int, via qemu-img
offline omit/no no any | invalid combination (nothing to snapshot)
offline omit/no int any |+memory=no disk=int, via qemu-img
offline omit/no ext any |*memory=no disk=ext, via qemu-img
offline int/ext any any | invalid combination (no memory to save)
online omit omit off | memory=int disk=int, via savevm
online omit omit on | memory=no disk=default, via transaction
online omit no/ext off | unsupported for now
online omit no on | invalid combination (nothing to snapshot)
online omit ext on | memory=no disk=ext, via transaction
online omit int off |+memory=int disk=int, via savevm
online omit int on | unsupported for now
online no omit any |+memory=no disk=default, via transaction
online no no any | invalid combination (nothing to snapshot)
online no int any | unsupported for now
online no ext any |+memory=no disk=ext, via transaction
online int/ext any on | invalid combination (disk-only vs. memory)
online int omit off |+memory=int disk=int, via savevm
online int no/ext off | unsupported for now
online int int off |+memory=int disk=int, via savevm
online ext omit off |*memory=ext disk=default, via migrate+trans
online ext no off |+memory=ext disk=no, via migrate
online ext int off | unsupported for now
online ext ext off |*memory=ext disk=ext, via migrate+transaction
* docs/schemas/domainsnapshot.rng (memory): New RNG element.
* docs/formatsnapshot.html.in: Document it.
* src/conf/snapshot_conf.h (virDomainSnapshotDef): New fields.
* src/conf/domain_conf.c (virDomainSnapshotDefFree)
(virDomainSnapshotDefParseString, virDomainSnapshotDefFormat):
Manage new fields.
* tests/domainsnapshotxml2xmltest.c: New test.
* tests/domainsnapshotxml2xmlin/*.xml: Update existing tests.
* tests/domainsnapshotxml2xmlout/*.xml: Likewise.
This documents the following whitespace rules
if(foo) // Bad
if (foo) // Good
int foo (int wizz) // Bad
int foo(int wizz) // Good
bar = foo (wizz); // Bad
bar = foo(wizz); // Good
typedef int (*foo) (int wizz); // Bad
typedef int (*foo)(int wizz); // Good
int foo( int wizz ); // Bad
int foo(int wizz); // Good
There is a syntax-check rule extension to validate all these rules.
Checking for 'function (...args...)' is quite difficult since it
needs to ignore valid usage with keywords like 'if (...test...)'
and while/for/switch. It must also ignore source comments and
quoted strings.
It is not possible todo this with a simple regex in the normal
syntax-check style. So a short Perl script is created instead
to analyse the source. In practice this works well enough. The
only thing it can't cope with is multi-line quoted strings of
the form
"start of string\
more lines\
more line\
the end"
but this can and should be written as
"start of string"
"more lines"
"more line"
"the end"
with this simple change, the bracket checking script does not
have any false positives across libvirt source, provided it
is only run against .c files. It is not practical to run it
against .h files, since those use whitespace extensively to
get alignment (though this is somewhat inconsistent and could
arguably be fixed).
The only limitation is that it cannot detect a violation where
the first arg starts with a '*', eg
foo(*wizz);
since this generates too many false positives on function
typedefs which can't be supressed efficiently.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* configure.ac docs/news.html.in libvirt.spec.in: update for the new release
* po/*.po*: update from transifex, a lot of added support e.g. Indian
languages, and regenerate
Also remove warnings for upcoming versions. There hadn't been any
compatibility problems with new ESX version over the whole lifetime
of the ESX driver, so I don't expect any in the future.
Update documentation to mention vSphere 5.x support.
On F17 at least, every time libvirtd starts we get this in syslog:
libvirtd: Could not find keytab file: /etc/libvirt/krb5.tab: No such file or directory
This comes from cyrus-sasl, and happens regardless of whether the
gssapi plugin is requested, which is what actually uses
/etc/libvirt/krb5.tab.
While cyrus-sasl shouldn't complain, we can easily make it shut up by
commenting out the keytab value by default.
Also update the keytab comment to the more modern one from qemu's
sasl config file.
At one point, the code passed through arbitrary strings for file
formats, which supposedly lets qemu handle a new file type even
before libvirt has been taught to handle it. However, to properly
label files, libvirt has to learn the file type anyway, so we
might as well make our life easier by only accepting file types
that we are prepared to handle. This patch lets the RNG validation
ensure that only known strings are let through.
* docs/schemas/domaincommon.rng (driverFormat): Limit to list of
supported strings.
* docs/schemas/domainsnapshot.rng (driver): Likewise.
Hypervisors are starting to support HyperV Enlightenment features that
improve behavior of guests running Microsoft Windows operating systems.
This patch adds support for the "relaxed" feature that improves timer
behavior and also establishes a framework to add these features in
future.
Given Daniel's announcement[1], code targetting the next release will
be in 1.0.0, not 0.10.3. Changed mechanically with:
for f in $(git grep -l '0\(.\)10\13\b') ; do
sed -i -e 's/0\(.\)10\13/1\10\10/g' $f
done
[1]https://www.redhat.com/archives/libvir-list/2012-October/msg00403.html
* docs/formatdomain.html.in: Use 1.0.0 for next release.
* src/interface/interface_backend_udev.c: Likewise.
These 3 elements conflicts with each other in either the doc
or the underlying codes.
Current problems:
Problem 1:
The doc shouldn't simply say "These settings are superseded
by CPU tuning. " for element <vcpu>. As except the tuning, <vcpu>
allows to specify the current, maxmum vcpu number. Apart from that,
<vcpu> also allows to specify the placement as "auto", which binds
the domain process to the advisory nodeset from numad.
Problem 2:
Doc for <vcpu> says its "cpuset" specify the physical CPUs
that the vcpus can be pinned. But it's not the truth, as
actually it only pin domain process to the specified physical
CPUs. So either it's a document bug, or code bug.
Problem 3:
Doc for <vcpupin> says it supersed "cpuset" of <vcpu>, it's
not quite correct, as each <vcpupin> specify the pinning policy
only for one vcpu. How about the ones which doesn't have
<vcpupin> specified? it says the vcpu will be pinned to all
available physical CPUs, but what's the meaning of attribute
"cpuset" of <vcpu> then?
Problem 4:
Doc for <emulatorpin> says it pin the emulator threads (domain
process in other context, perhaps another follow up patch to
cleanup the inconsistency is needed) to the physical CPUs
specified its attribute "cpuset". Which conflicts with
<vcpu>'s "cpuset". And actually in the underlying codes,
it set the affinity for domain process twice if both
"cpuset" for <vcpu> and <emulatorpin> are specified,
and <emulatorpin>'s pinning will override <vcpu>'s.
Problem 5:
When "placement" of <vcpu> is "auto" (I.e. uses numad to
get the advisory nodeset to which the domain process is
pinned to), it will also be overridden by <emulatorpin>,
This patch is trying to sort out the conflicts or bugs by:
1) Don't say <vcpu> is superseded by <cputune>
2) Keep the semanteme for "cpuset" of <vcpu> (I.e. Still says it
specify the physical CPUs the virtual CPUs). But modifying it
to mention it also set the pinning policy for domain process,
and the CPU placement of domain process specified by "cpuset"
of <vcpu> will be ingored if <emulatorpin> specified, and
similary, the CPU placement of vcpu thread will be ignored
if it has <vcpupin> specified, for vcpu which doesn't have
<vcpupin> specified, it inherits "cpuset" of <vcpu>.
3) Don't say <vcpu> is supersed by <vcpupin>. If neither <vcpupin>
nor "cpuset" of <vcpu> is specified, the vcpu will be pinned
to all available pCPUs.
4) If neither <emulatorpin> nor "cpuset" of <vcpu> is specified,
the domain process (emulator threads in the context) will be
pinned to all available pCPUs.
5) If "placement" of <vcpu> is "auto", <emulatorpin> is not allowed.
6) hotplugged vcpus will also inherit "cpuset" of <vcpu>
Codes changes according to above document changes:
1) Inherit def->cpumask for each vcpu which doesn't have <vcpupin>
specified, during parsing.
2) ping the vcpu which doesn't have <vcpupin> specified to def->cpumask
either by cgroup for sched_setaffinity(2), which is actually done
by 1).
3) Error out if "placement" == "auto", and <emulatorpin> is specified.
Otherwise, <emulatorpin> is honored, and "cpuset" of <cpuset> is
ignored.
4) Setup cgroup for each hotplugged vcpu, and setup the pinning policy
by either cgroup or sched_setaffinity(2).
5) Remove cgroup and <vcpupin> for each hot unplugged vcpu.
Patches are following (6 in total except this patch)
When startupPolicy set for a USB devices allows such device to be
missing, there was no way this could be detected from domain XML. With
this patch, libvirt emits a new missing='yes' attribute for such devices
when active domain XML is generated.
USB devices can disappear without OS being mad about it, which makes
them ideal for startupPolicy. With this attribute, USB devices can be
configured to be mandatory (the default), requisite (will disappear
during migration if they cannot be found), or completely optional.
While the changes to sanlock driver should be stable, the actual
implementation of sanlock_helper is supposed to be replaced in the
future. However, before we can implement a better sanlock_helper, we
need an administrative interface to libvirtd so that the helper can just
pass a "leases lost" event to the particular libvirt driver and
everything else will be taken care of internally. This approach will
also allow libvirt to pass such event to applications and use
appropriate reasons when changing domain states.
The temporary implementation handles all actions directly by calling
appropriate libvirt APIs (which among other things means that it needs
to know the credentials required to connect to libvirtd).
While current on_{poweroff,reboot,crash} action configuration is about
configuring life cycle actions, they can all be considered events and
actions that need to be done on a particular event. Let's generalize the
code by renaming life cycle actions to event actions so that it can be
reused later for non-lifecycle events.