Add two new APIs virNetSocketNewPostExecRestart and
virNetSocketPreExecRestart which allow a virNetSocketPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Since it is not possible to serialize SASL or TLS encryption
state, an error will be raised if attempting to perform
serialization on non-raw sockets
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virLockSpaceNewPostExecRestart and
virLockSpacePreExecRestart which allow a virLockSpacePtr
object to be created from a JSON object and saved to a
JSON object, for the purposes of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The previously introduced virFile{Lock,Unlock} APIs provide a
way to acquire/release fcntl() locks on individual files. For
unknown reason though, the POSIX spec says that fcntl() locks
are released when *any* file handle referring to the same path
is closed. In the following sequence
threadA: fd1 = open("foo")
threadB: fd2 = open("foo")
threadA: virFileLock(fd1)
threadB: virFileLock(fd2)
threadB: close(fd2)
you'd expect threadA to come out holding a lock on 'foo', and
indeed it does hold a lock for a very short time. Unfortunately
when threadB does close(fd2) this releases the lock associated
with fd1. For the current libvirt use case for virFileLock -
pidfiles - this doesn't matter since the lock is acquired
at startup while single threaded an never released until
exit.
To provide a more generally useful API though, it is necessary
to introduce a slightly higher level abstraction, which is to
be referred to as a "lockspace". This is to be provided by
a virLockSpacePtr object in src/util/virlockspace.{c,h}. The
core idea is that the lockspace keeps track of what files are
already open+locked. This means that when a 2nd thread comes
along and tries to acquire a lock, it doesn't end up opening
and closing a new FD. The lockspace just checks the current
list of held locks and immediately returns VIR_ERR_RESOURCE_BUSY.
NB, the API as it stands is designed on the basis that the
files being locked are not being otherwise opened and used
by the application code. One approach to using this API is to
acquire locks based on a hash of the filepath.
eg to lock /var/lib/libvirt/images/foo.img the application
might do
virLockSpacePtr lockspace = virLockSpaceNew("/var/lib/libvirt/imagelocks");
lockname = md5sum("/var/lib/libvirt/images/foo.img");
virLockSpaceAcquireLock(lockspace, lockname);
NB, in this example, the caller should ensure that the path
is canonicalized before calculating the checksum.
It is also possible to do locks directly on resources by
using a NULL lockspace directory and then using the file
path as the lock name eg
virLockSpacePtr lockspace = virLockSpaceNew(NULL);
virLockSpaceAcquireLock(lockspace, "/var/lib/libvirt/images/foo.img");
This is only safe to do though if no other part of the process
will be opening the files. This will be the case when this
code is used inside the soon-to-be-reposted virlockd daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0
Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0
There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file
This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
The onlined vcpu pinning policy should inherit def->cpuset if
it's not specified explicitly, and the affinity should be set
in this case. Oppositely, the offlined vcpu pinning policy should
be free()'ed.
Add support for logging to the systemd journal, using its
simple client library. The benefit over syslog is that it
accepts structured log data, so the journald can store
individual items like code file/line/func separately from
the string message. Tools which require structured log
data can then query the journal to extract exactly what
they desire without resorting to string parsing
While systemd provides a simple client library for logging,
it is more convenient for libvirt to directly write its
own client code. This lets us build up the iovec's on
the stack, avoiding the need to alloc memory when writing
log messages.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the cgroups APIs we have a virCgroupKillPainfully function
which does the loop sending SIGTERM, then SIGKILL and waiting
for the process to exit. There is similar functionality for
simple processes in qemuProcessKill, but it is tangled with
the QEMU code. Untangle it to provide a virProcessKillPainfuly
function
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Continue consolidation of process functions by moving some
helpers out of command.{c,h} into virprocess.{c,h}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of process related functions spread
across multiple files. Start to consolidate them by
creating a virprocess.{c,h} file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCommand prefix was inappropriate because the API
does not use any virCommandPtr object instance. This
API closely related to waitpid/exit, so use virProcess
as the prefix
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Sometimes when guest machine crashes, coredump can get huge due to the
guest memory. This can be limited using madvise(2) system call and is
being used in QEMU hypervisor. This patch adds an option for configuring
that in the domain XML and related documentation.
This patch cleans up building the "-boot" parameter and while on that
fixes one inconsistency by modifying these things:
- I completed the unfinished virDomainBootMenu enum by specifying
LAST, declaring it and also declaring the TypeFromString and
TypeToString parameters.
- Previously mentioned TypeFromString and TypeToString are used when
parsing the XML.
- Last, but not least, visible change is that the "-boot" parameter
is built and parsed properly:
- The "order=" prefix is used only when additional parameters are
used (menu, etc.).
- It's rewritten in a way that other parameters can be added
easily in the future (used in following patch).
- The "order=" parameter is properly parsed regardless to where it
is placed in the string (e.g. "menu=on,order=nc").
- The "menu=" parameter (and others in the future) are created
when they should be (i.e. even when bootindex is supported and
used, but not when bootloader is selected).
Commit f36309d added an export with no matching implementation;
probably a misspelling of an earlier version of the final addition
of virNetworkObjSetDefTransient.
* src/libvirt_private.syms (network_conf.h): Drop bogus
virNetworkSetDefTransient.
virNetworkObjUpdate takes care of all virNetworkUpdate-related changes
to the data stored in the in-memory virNetworkObj list. It should be
called by network drivers that use this in-memory list.
virNetworkObjUpdate *does not* take care of updating any disk-based
copies of the config, nor does it perform any other operations
necessary to have the new config data take effect (e.g. it won't
re-write dnsmasq host files, nor will it send a SIGHUP to dnsmasq) -
those things should all be taken care of in the network driver
function that calls virNetworkObjUpdate (assuming that it returns
success).
These new functions are highly inspired by those in domain_conf.c (but
not identical), and are intended to make it simpler to update the
various combinations of live/persistent network configs.
The network driver wasn't previously as careful about the separation
between the live "status" in network->def and the persistent "config"
in network->newDef (or sometimes in network->def). This series
attempts to remedy some of that, but probably doesn't go all the way
(enough to get these functions working and enable continued work on
virNetworkUpdate though).
bridge_driver.c and test_driver.c were updated in a few places to take
advantage of the new functions and/or account for changes in argument
lists.
Validates the wwn while parsing, error out if it's malformed.
* src/util/util.h: Declare virValidateWWN
* src/util/util.c: Implement virValidateWWN
* src/libvirt_private.syms: Export virValidateWWN.
* src/conf/domain_conf.h: New member 'wwn' for disk def.
* src/conf/domain_conf.c: Parse and format disk <wwn>
In many places we store bitmap info in a chunk of data
(pointed to by a char *), and have redundant codes to
set/unset bits. This patch extends virBitmap, and convert
those codes to use virBitmap in subsequent patches.
Only implemented for linux platform.
* src/nodeinfo.h: (Declare node{Get,Set}MemoryParameters)
* src/nodeinfo.c: (Implement node{Get,Set}MemoryParameters)
* src/libvirt_private.syms: (Export those two new internal APIs to
private symbols)
tools/virsh-nodedev.c:
* vshNodeDeviceSorter to sort node devices by name
* vshNodeDeviceListFree to free the node device objects list.
* vshNodeDeviceListCollect to collect the node device objects, trying
to use new API first, fall back to older APIs if it's not supported.
* Change option --cap to accept multiple capability types.
tools/virsh.pod
* Update document for --cap
src/conf/node_device_conf.h:
* New macro VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP
* Declare virNodeDeviceList
src/conf/node_device_conf.c:
* New helpers virNodeDeviceCapMatch, virNodeDeviceMatch.
virNodeDeviceCapMatch looks up the list of all the caps the device
support, to see if the device support the cap type.
* Implement virNodeDeviceList
src/libvirt_private.syms:
* Export virNodeDeviceList
* Export virNodeDevCapTypeFromString
New options is added to support EOI (End of Interrupt) exposure for
guests. As it makes sense only when APIC is enabled, I added this into
the <apic> element in <features> because this should be tri-state
option (cannot be handled as standalone feature).
src/conf/network_conf.c: Add virNetworkMatch to filter the networks;
and virNetworkList to iterate over all the networks with the filter.
src/conf/network_conf.h: Declare virNetworkList and define the macros
for filters.
src/libvirt_private.syms: Export virNetworkList.
This patch adds a helper to deal with assigning values to
virTypedParameter structures from strings. The helper parses the value
from the string and assigns it to the corresponding union value.
When the event symbols were added to the public API, not all
of them were removed from the private exports list. Solaris
gets unhappy when there are duplicated symbols. Extend the
symfile check to test for this scenario
src/conf/storage_conf.c: Add virStoragePoolMatch to filter the
pools; Add virStoragePoolList to iterate over the pool objects
with filter.
src/conf/storage_conf.h: Declare virStoragePoolMatch,
virStoragePoolList, and the macros for filters.
src/libvirt_private.syms: Export helper virStoragePoolList.
There is a new <pm/> element implemented that can control what ACPI
sleeping states will be advertised by BIOS and allowed to be switched
to by libvirt. The default keeps defaults on hypervisor, otherwise
forces chosen setting.
The documentation of the pm element is added as well.
The name 'virDomainDiskSnapshot' didn't fit in with our normal
conventions of using a prefix hinting that it is related to a
virDomainSnapshotPtr. Also, a future patch will reuse the
enum for declaring where the VM memory is stored.
* src/conf/snapshot_conf.h (virDomainDiskSnapshot): Rename...
(virDomainSnapshotLocation): ...to this.
(_virDomainSnapshotDiskDef): Update clients.
* src/conf/domain_conf.h (_virDomainDiskDef): Likewise.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Likewise.
* src/conf/snapshot_conf.c: (virDomainSnapshotDiskDefParseXML)
(virDomainSnapshotAlignDisks, virDomainSnapshotDefFormat):
Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotCreateDiskActive, qemuDomainSnapshotCreateXML):
Likewise.
We were failing to react to allocation failure when initializing
a snapshot object list. Changing things to store a pointer
instead of a complete object adds one more possible point of
allocation failure, but at the same time, will make it easier to
react to failure now, as well as making it easier for a future
patch to split all virDomainSnapshotPtr handling into a separate
file, as I continue to add even more snapshot code.
Luckily, there was only one client outside of domain_conf.c that
was actually peeking inside the object, and a new wrapper function
was easy.
* src/conf/domain_conf.h (_virDomainObj): Use a pointer.
(virDomainSnapshotObjListInit): Rename.
(virDomainSnapshotObjListFree, virDomainSnapshotForEach): New
declarations.
(_virDomainSnapshotObjList): Move definitions...
* src/conf/domain_conf.c: ...here.
(virDomainSnapshotObjListInit, virDomainSnapshotObjListDeinit):
Rename...
(virDomainSnapshotObjListNew, virDomainSnapshotObjListFree): ...to
these.
(virDomainSnapshotForEach): New function.
(virDomainObjDispose, virDomainListPopulate): Adjust callers.
* src/qemu/qemu_domain.c (qemuDomainSnapshotDiscard)
(qemuDomainSnapshotDiscardAllMetadata): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotLoad)
(qemuDomainUndefineFlags, qemuDomainSnapshotCreateXML)
(qemuDomainSnapshotListNames, qemuDomainSnapshotNum)
(qemuDomainListAllSnapshots)
(qemuDomainSnapshotListChildrenNames)
(qemuDomainSnapshotNumChildren)
(qemuDomainSnapshotListAllChildren)
(qemuDomainSnapshotLookupByName, qemuDomainSnapshotGetParent)
(qemuDomainSnapshotGetXMLDesc, qemuDomainSnapshotIsCurrent)
(qemuDomainSnapshotHasMetadata, qemuDomainRevertToSnapshot)
(qemuDomainSnapshotDelete): Likewise.
* src/libvirt_private.syms (domain_conf.h): Export new function.
This patch introduce virNetlinkEventServiceStopAll() to stop
all the monitors to receive netlink messages for libvirtd.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Introduce 2 APIs to support emulator threads pin.
1) virDomainEmulatorPinAdd: setup emulator threads pin with a given cpumap string.
2) virDomainEmulatorPinDel: remove all emulator threads pin.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
vcpu threads pin are implemented using sched_setaffinity(), but
not controlled by cgroup. This patch does the following things:
1) enable cpuset cgroup
2) reflect all the vcpu threads pin info to cgroup
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Introduce a new API to move tasks of one controller from a cgroup to another cgroup
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Introduce the function virCgroupForEmulator() to create sub directory
for simulator thread(include I/O thread, vhost-net thread)
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
A hypervisor may allow to override the disk geometry of drives.
Qemu, as an example with cyls=,heads=,secs=[,trans=].
This patch extends the domain config to allow the specification of
disk geometry with libvirt.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When running libvirtd from a build directory, libvirtd would load lock
drivers from system directory unless explicitly overridden by setting
LIBVIRT_LOCK_MANAGER_PLUGIN_DIR environment variable. Since we already
autodetect driver directory if libvirt is build with driver modules, we
can use the same trick to automagically set lock driver directory.
This patch adds a glue layer to enable using libssh2 code with the
network client code.
As in the original client implementation, shell code is sent to the
server to detect correct options for netcat and connect to libvirt's
unix socket.
This patch enables virNetSocket to be used as an ssh client when
properly configured.
This patch adds function virNetSocketNewConnectLibSSH2() that takes all
needed parameters and creates a libssh2 session and performs steps
needed to open the connection and then create a virNetSocket that
seamlesly encapsulates the communication.
These changes make the security drivers able to find and handle the
correct security label information when more than one label is
available. They also update the DAC driver to be used as an usual
security driver.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
This patch updates the domain and capability XML parser and formatter to
support more than one "seclabel" element for each domain and device. The
RNG schema and the tests related to this are also updated by this patch.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>