There are currently two places in the domain where this combination is
used, and there is about to be another. This patch puts them together
for brevity and uniformity.
As with the newly-renamed virNetDevIPAddr and virNetDevIPRoute
objects, the new virNetDevIPInfo object will need to be accessed by a
utility function that calls low level Netlink functions (so we don't
want it to be in the conf directory) and will be called from multiple
hypervisor drivers (so it can't be in any hypervisor directory); the
most appropriate place is thus once again the util directory.
The parse and format functions are in conf/domain_conf.c because only
the domain XML (i.e. *not* the network XML) has this exact combination
of IP addresses plus routes. Note that virDomainNetIPInfoFormat() will
end up being the only caller to virDomainNetRoutesFormat() and
virDomainNetIPsFormat(), so it will just subsume those functions in a
later patch, but we can't do that until they are no longer called.
(It would have been nice to include the interface name within the
virNetDevIPInfo object (with a slight name change), but that can't
be done cleanly, because in each case the interface name is provided
in a different place in the XML relative to the routes and IP
addresses, so putting it in this object would actually make the code
more confused rather than simpler).
These functions all need to be called from a utility function that
must be located in the util directory, so we move them all into
util/virnetdevip.[ch] now that it exists.
Function and struct names were appropriately changed for the new
location, but all code is unchanged aside from motion and renaming.
This patch splits virnetdev.[ch] into multiple files, with the new
virnetdevip.[ch] containing all the functions related to setting and
retrieving IP-related info for a device (both addresses and routes).
I'm tired of mistyping this all the time, so let's do it the same all
the time (similar to how we changed all "Pci" to "PCI" awhile back).
(NB: I've left alone some things in the esx and vbox drivers because
I'm unable to compile them and they weren't obviously *not* a part of
some API. I also didn't change a couple of variables named,
e.g. "somethingIptables", because they were derived from the name of
the "iptables" command)
These had been declared in conf/device_conf.h, but then used in
util/virnetdev.c, meaning that we had to #include conf/device_conf.h
in virnetdev.c (which we have for a long time said shouldn't be done.
This caused a bigger problem when I tried to #include util/virnetdev.h
in a file in src/conf (which is allowed) - for some reason the
"device_conf.h: File not found" error.
The solution is to move the data types and functions used in util
sources from conf to util. Some names were adjusted during the move
("virInterface" --> "virNetDevIf", and "VIR_INTERFACE" -->
"VIR_NETDEV_IF")
virNetDevLinkDump should have been in virnetlink.c, but that file
didn't exist yet when the function was created. It didn't really
matter until now - I found that having virnetlink.h included by
virnetdev.h caused build problems when trying to #include virnetdev.h
in a .c file in src/conf (due to missing directory in -I). Rather than
fix that to further institutionalize the incorrect placement of this
one function, this patch moves the function.
The VIR_STORAGE_POOL_EVENT_REFRESHED constant does not
reflect any change in the lifecycle of the storage pool.
It should thus not be part of the storage pool lifecycle
event set, but rather be a top level event in its own
right. Thus we introduce VIR_STORAGE_POOL_EVENT_ID_REFRESH
to replace it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This kvmGetMaxVCPUs() needs to be used at two different places
so move it to utils with appropriate name and mark it as private
global now.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Move to virsecret.c and rename to virSecretLookupParseSecret. Also convert
to usage xmlNodePtr and virXMLPropString rather than virXPathString.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Move the enum into a new src/util/virsecret.h, rename it to be
virSecretLookupType. Add a src/util/virsecret.h in order to perform
a couple of simple operations on the secret XML and virSecretLookupTypeDef
for clearing and copying.
This includes quite a bit of collateral damage, but the goal is to remove
the "virStorage*" and replace with the virSecretLookupType so that it's
easier to to add new lookups that aren't necessarily storage pool related.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This will be used for the caller that needs to specify a separator.
Currently identical to virBitmapParse.
Also change one test case to use the new function.
Basically, there are just two functions introduced here:
virDomainRedirdevDefFind which looks up given redirdev in domain
definition, and virDomainRedirdevDefRemove which removes the
device at given index in the array of devices.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While we need to know the difference between the total memory stored in
<memory> and the actual size not included in the possible memory modules
we can't pre-calculate it reliably. This is due to the fact that
libvirt's XML is copied via formatting and parsing the XML and the
initial memory size can be reliably calculated only when certain
conditions are met due to backwards compatibility.
This patch removes the storage of 'initial_memory' and fixes the helpers
to recalculate the initial memory size all the time from the total
memory size. This conversion is possible when we also make sure that
memory hotplug accounts properly for the update of the total memory size
and thus the helpers for inserting and removing memory devices need to
be tweaked too.
This fixes a bug where a cold-plug and cold-remove of a memory device
would increase the size reported in <memory> in the XML by the size of
the memory device. This would happen as the persistent definition is
copied before attaching the device and this would lead to the loss of
data in 'initial_memory'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1344892
This option allows or disallows detection of zero-writes if it is set to
"on" or "off", respectively. It can be also set to "unmap" in which
case it will try discarding that part of image based on the value of the
"discard" option.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The virQEMUDriverConfig object contains lists of
loader:nvram pairs to advertise firmwares supported by
by the driver, and qemu_conf.c contains code to populate
the lists, all of which is useful for other drivers too.
To avoid code duplication, introduce a virFirmware object
to encapsulate firmware details and switch the qemu driver
to use it.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
In libxl driver we do virObjectRef in libxlDomainObjBeginJob,
If virCondWaitUntil failed, it goes to error, do virObjectUnref,
There's a chance that someone undefine the vm at the same time,
and refs unref to zero, vm is freed in libxlDomainObjBeginJob.
But the vm outside function is not Null, we do virObjectUnlock(vm).
That's how we overwrite the vm memory after it's freed. I fix it.
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the 162efa1a commit the function was introduced, but the
commit forgot to update livirt_private.syms accordingly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In preparation for moving all the CPU related APIs out of
the nodeinfo file, give them a virHostCPU name prefix.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In preparation for moving all the memory related APIs out of
the nodeinfo file, give them a virHostMem name prefix.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virCPUData, virCPUx86Feature, and virCPUx86Model all contained a pointer
to virCPUx86Data, which was not very nice since the real CPUID data were
accessible by yet another pointer from virCPUx86Data. Moreover, using
virCPUx86Data directly will make static definitions of internal CPU
features a bit easier.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Since it will not be called from outside of conf we can unexport it too
if we move it to the appropriate place.
Test suite change is necessary since the error will be reported sooner
now.
Until now we weren't able to add checks that would reject configuration
once accepted by the parser. This patch adds a new callback and
infrastructure to add such checks. In this patch all the places where
rejecting a now-invalid configuration wouldn't be a good idea are marked
with a new parser flag.
Move the module from qemu_command.c to a new module virqemu.c and
rename the API to virQEMUBuildObjectCommandline.
This API will then be shareable with qemu-img and the need to build
a security object for luks support.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This function is not used anywhere. Moreover, the code that would
use lives in virperf.c and therefore has access to the FD anyway.
Well, for instance virPerfReadEvent is doing just that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Make virDomainControllerFindUnusedIndex() a global function so that it
can be used outside domain_conf.c (as well as higher up in
domain_conf.c itself)/ Also make its DomainDef arg a const* so that
functions which only have a const* to the domain can use it.
Move the logic from qemuDomainGenerateRandomKey into this new
function, altering the comments, variable names, and error messages
to keep things more generic.
NB: Although perhaps more reasonable to add soemthing to virrandom.c.
The virrandom.c was included in the setuid_rpc_client, so I chose
placement in vircrypto.
Introduce virCryptoHaveCipher and virCryptoEncryptData to handle
performing encryption.
virCryptoHaveCipher:
Boolean function to determine whether the requested cipher algorithm
is available. It's expected this API will be called prior to
virCryptoEncryptdata. It will return true/false.
virCryptoEncryptData:
Based on the requested cipher type, call the specific encryption
API to encrypt the data.
Currently the only algorithm support is the AES 256 CBC encryption.
Adjust tests for the API's
This solution does not keep snapshots cache because vz sdk lacks good support
for snapshot related events.
Libvirt and vz sdk has different approach to snapshot ids. vz sdk always
auto generate them while libvirt has ability to specify id by user.
Thus I have no other choice rather than simply ignore ids set by user
or generated by libvirt.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
For a few cases where we handle secret information it's good to clear
the buffers containing sensitive data before freeing them.
Introduce VIR_DISPOSE, VIR_DISPOSE_N and VIR_DISPOSE_STRING that allow
simple clearing fo the buffers holding sensitive information on cleanup
paths.
Move some parts of virStorageFileRemoveLastPathComponent
into a separate function so they can be reused.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For disks sources described by a libvirt volume we don't need to do a
complicated check since virStorageTranslateDiskSourcePool already
correctly determines the actual disk type.
Replace the checks using a new accessor that does not open-code the
whole logic.
We had both and the only difference was that the latter also included
information about multifunction setting. The problem with that was that
we couldn't use functions made for only one of the structs (e.g.
parsing). To consolidate those two structs, use the one in virpci.h,
include that in domain_conf.h and add the multifunction member in it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When the domain definition describes a machine with NUMA, setting the
maximum vCPU count via the API might lead to an invalid config.
Add a check that will forbid this until we add more advanced cpu config
capabilities.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1327499
Add virDomainObjGetShortName() and use it. For now that's used in one
place, but we should expose it so that future patches can use it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Introduce the final accessor's to _virSecretObject data and move the
structure from virsecretobj.h to virsecretobj.c
The virSecretObjSetValue logic will handle setting both the secret
value and the value_size. Some slight adjustments to the error path
over what was in secretSetValue were made.
Additionally, a slight logic change in secretGetValue where we'll
check for the internalFlags and error out before checking for
and erroring out for a NULL secret->value. That way, it won't be
obvious to anyone that the secret value wasn't set rather they'll
just know they cannot get the secret value since it's private.
Move and rename the secretRewriteFile, secretSaveDef, and secretSaveValue
from secret_driver to virsecretobj
Need to make some slight adjustments since the secretSave* functions
called secretEnsureDirectory, but otherwise mostly just a move of code.
Move and rename secretDeleteSaved from secret_driver into virsecretobj and
split it up into two parts since there is error path code that looks to
just delete the secret data file
Move to secret_conf.c and rename to virSecretLoadAllConfigs. Also includes
moving/renaming the supporting virSecretLoad, virSecretLoadValue, and
virSecretLoadValidateUUID.
This patch replaces most of the guts of secret_driver.c with recently
added secret_conf.c APIs in order manage secret lists and objects
using the hashed virSecretObjList* lookup API's.
Since threadpool increments the current number of threads according to current
load, i.e. how many jobs are waiting in the queue. The count however, is
constrained by max and min limits of workers. The logic of this new API works
like this:
1) setting the minimum
a) When the limit is increased, depending on the current number of
threads, new threads are possibly spawned if the current number of
threads is less than the new minimum limit
b) Decreasing the minimum limit has no possible effect on the current
number of threads
2) setting the maximum
a) Icreasing the maximum limit has no immediate effect on the current
number of threads, it only allows the threadpool to spawn more
threads when new jobs, that would otherwise end up queued, arrive.
b) Decreasing the maximum limit may affect the current number of
threads, if the current number of threads is less than the new
maximum limit. Since there may be some ongoing time-consuming jobs
that would effectively block this API from killing any threads.
Therefore, this API is asynchronous with best-effort execution,
i.e. the necessary number of workers will be terminated once they
finish their previous job, unless other workers had already
terminated, decreasing the limit to the requested value.
3) setting priority workers
- both increase and decrease in count of these workers have an
immediate impact on the current number of workers, new ones will be
spawned or some of them get terminated respectively.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
In order for the client to see all thread counts and limits, current total
and free worker count getters need to be introduced. Client might also be
interested in the job queue length, so provide a getter for that too. As with
the other getters, preparing for the admin interface, mutual exclusion is used
within all getters.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
In a few places in libvirt we busy-wait for events, for example qemu
creating a monitor socket. This is problematic because:
- We need to choose a sufficiently small polling period so that
libvirt doesn't add unnecessary delays.
- We need to choose a sufficiently large polling period so that
the effect of busy-waiting doesn't affect the system.
The solution to this conflict is to use an exponential backoff.
This patch adds two functions to hide the details, and modifies a few
places where we currently busy-wait.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
There are two places in qemu_domain_address.c where we have a switch
statement to convert PCI controller models
(VIR_DOMAIN_CONTROLLER_MODEL_PCI*) into the connection type flag that
is matched when looking for an upstream connection for that model of
controller (VIR_PCI_CONNECT_TYPE_*). This patch makes a utility
function in conf/domain_addr.c to do that, so that when a new PCI
controller is added, we only need to add the new model-->connect-type
in a single place.
commit 30c61901 added new functions to libvirt_private.syms
not alpabetically sorted and erroneously added vz sources to
STATEFUL_DRIVER_SOURCE_FILES, which triggered check-aclrules
running while vz driver isn't ready for it yet.
Pushing under build-breaker rule.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Make it possible to build vz driver as a module and don't link it with
libvirt.so statically.
Remove registering it on client's side as far as we start relying on daemon
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Since we didn't opt to use one single event for device lifecycle for a
VM we are missing one last event if the device removal failed. This
event will be emitted once we asked to eject the device but for some
reason it is not possible.
Those are the last two places that uses the getter functions. Use a
direct access instead and remove those getters.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Removes the check for graphics type, it's not a public API and developer
know what he's doing and this check makes no sense. It also removes
the ability to allocate a new array if there is none. This was used by
the virDomainGraphicsListenAdd* functions and isn't used anymore.
This is now a simple getter with simple check for listens array presence
and whether the index in out of bounds.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This effectively removes virDomainGraphicsListenSetAddress which was
used only to change the address of listen structure and possible change
the listen type. The new function will auto-expand the listens array
and append a new listen.
The old function was used on pre-allocated array of listens and in most
cases it only "add" a new listen. The two remaining uses can access the
listen structure directly.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Commit id 'fb2bd208' essentially copied the qemuGetSecretString
creating an libxlGetSecretString. Rather than have multiple copies
of the same code, create src/secret/secret_util.{c,h} files and
place the common function in there.
Modify the the build in order to build the module as a library
which is then pulled in by both the qemu and libxl drivers for
usage from both qemu_command.c and libxl_conf.c
Using the existing virUUIDGenerateRandomBytes, move API to virrandom.c
rename it to virRandomBytes and add it to libvirt_private.syms.
This will be used as a fallback for generating a domain master key.
In some cases it's impractical to use the regular APIs as the bitmap
size needs to be pre-declared. These new APIs allow to use bitmaps that
self expand.
The new code adds a property to the bitmap to track the allocation of
memory so that VIR_RESIZE_N can be used.
This patch implement a set of interfaces for perf event. Based on
these interfaces, we can implement internal driver API for perf,
and get the results of perf conuter you care about.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Message-id: 1459171833-26416-4-git-send-email-qiaowei.ren@intel.com
This allows setting the address in host and/or network order and makes
the naming consistent. Now you don't need to call [hn]to[nh]l()
functions as that is taken care of by these functions. Also, now
the *NetOrder take the address in network order, the other functions in
host order so the naming and usage is consistent. Some places were
having the address in network order and calling ntohl() just so the
original function can call htonl() again. This makes it nicer to read.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It's just a combination of AddImplicitControllers, and AddConsoleCompat.
Every caller that wants ImplicitControllers also wants the ConsoleCompat
AFAICT, so lump them together. We also need it for future patches.
If we expose this information, which is one byte in every PCI config
file, we let all mgmt apps know whether the device itself is an endpoint
or not so it's easier for them to decide whether such device can be
passed through into a VM (endpoint) or not (*-bridge).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This is a missing counterpart for virSocketAddrSetIPv4Addr()
and is going to be needed later in the tests.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
These functions are going to be reused very shortly. So instead
of duplicating the code, lets move them into utils module.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduce a helper to check supported device and domain config and move
the memory hotplug checks to it.
The advantage of this approach is that by default all new features are
considered unsupported by all hypervisors unless specifically changed
rather than the previous approach where every hypervisor would need to
declare that a given feature is unsupported.
The VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event will be triggered once a job
(such as migration) finishes and it will contain statistics for the job
as one would get by calling virDomainGetJobStats. Thanks to this event
it is now possible to get statistics of a completed migration of a
transient domain on the source host.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
While trying to build with -Os I've encountered some build
failures.
util/vircommand.c: In function 'virCommandAddEnvFormat':
util/vircommand.c:1257:1: error: inlining failed in call to 'virCommandAddEnv': call is unlikely and code size would grow [-Werror=inline]
virCommandAddEnv(virCommandPtr cmd, char *env)
^
util/vircommand.c:1308:5: error: called from here [-Werror=inline]
virCommandAddEnv(cmd, env);
^
This function is big enough for the compiler to be not inlined.
This is the error message I'm seeing:
Then virDomainNumatuneNodeSpecified is exported and called from
other places. It shouldn't be inlined then.
In file included from network/bridge_driver_platform.h:30:0,
from network/bridge_driver_platform.c:26:
network/bridge_driver_linux.c: In function 'networkRemoveRoutingFirewallRules':
./conf/network_conf.h:350:1: error: inlining failed in call to 'virNetworkDefForwardIf.constprop': call is unlikely and code size would grow [-Werror=inline]
virNetworkDefForwardIf(const virNetworkDef *def, size_t n)
^
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemuProcessSetupEmulator runs at a point in time where there is only
the qemu main thread. Use virCgroupAddTask to put just that one task
into the emulator cgroup. That patch makes virCgroupMoveTask and
virCgroupAddTaskStrController obsolete.
Signed-off-by: Henning Schild <henning.schild@siemens.com>
Introduce virPolkitAgentCreate and virPolkitAgentDestroy
virPolkitAgentCreate will run the polkit pkttyagent image as an asynchronous
command in order to handle the local agent authentication via stdin/stdout.
The code makes use of the pkttyagent --notify-fd mechanism to let it know
when the agent is successfully registered.
virPolkitAgentDestroy will close the command effectively reaping our
child process
Since commit 47e5b5ae virCgroupAllowDevice allows to pass -1 as either
the minor or major device number and it automatically uses '*' in place
of that. Reuse the new approach through the code and drop the duplicated
functions.
Apparently we are not the only ones with dumb free functions
because dbus_message_unref() does not accept NULL either. But if
I were to vote, this one is even more evil. Instead of returning
an error just like we do it immediately dereference any pointer
passed and thus crash you app. Well done DBus!
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f878ebda700 (LWP 31264)]
0x00007f87be4016e5 in ?? () from /usr/lib64/libdbus-1.so.3
(gdb) bt
#0 0x00007f87be4016e5 in ?? () from /usr/lib64/libdbus-1.so.3
#1 0x00007f87be3f004e in dbus_message_unref () from /usr/lib64/libdbus-1.so.3
#2 0x00007f87bf6ecf95 in virSystemdGetMachineNameByPID (pid=9849) at util/virsystemd.c:228
#3 0x00007f879761bd4d in qemuConnectCgroup (driver=0x7f87600a32a0, vm=0x7f87600c7550) at qemu/qemu_cgroup.c:909
#4 0x00007f87976386b7 in qemuProcessReconnect (opaque=0x7f87600db840) at qemu/qemu_process.c:3386
#5 0x00007f87bf6edfff in virThreadHelper (data=0x7f87600d5580) at util/virthread.c:206
#6 0x00007f87bb602334 in start_thread (arg=0x7f878ebda700) at pthread_create.c:333
#7 0x00007f87bb3481bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) frame 2
#2 0x00007f87bf6ecf95 in virSystemdGetMachineNameByPID (pid=9849) at util/virsystemd.c:228
228 dbus_message_unref(reply);
(gdb) p reply
$1 = (DBusMessage *) 0x0
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Move the logic from virDomainDiskDefDstDuplicates into
virDomainDiskDefCheckDuplicateInfo so that we don't have to loop
multiple times through the array of disks. Since the original function
was called in qemuBuildDriveDevStr, it was actually called for every
single disk which was quite wasteful.
Additionally the target uniqueness check needed to be duplicated in
the disk hotplug case, since the disk was inserted into the domain
definition after the device string was formatted and thus
virDomainDiskDefDstDuplicates didn't do anything in that case.
In the reverted commit d2e5538b1, the libxl driver was changed to copy
interface names autogenerated by libxl to the corresponding network def
in the domain's virDomainDef object. The copied name is freed when the
domain transitions to the shutoff state. But when migrating a domain,
the autogenerated name is included in the XML sent to the destination
host. It is possible an interface with the same name already exists on
the destination host, causing migration to fail.
This patch defines a new capability for setting the network device
prefix that will be used in the driver. Valid prefixes are
VIR_NET_GENERATED_PREFIX or the one announced by the driver.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Same as for deserializer, this method might get handy for admin one day.
The major reason for this patch is to stay consistent with idea, i.e.
when deserializer can be shared, why not serializer as well. The only
problem to be solved was that the daemon side serializer uses a code
snippet which handles sparse arrays returned by some APIs as well as
removes any string parameters that can't be returned to older clients.
This patch makes of the new virTypedParameterRemote datatype introduced
by one of the pvious patches.
Since the method is static to remote_driver, it can't even be used by our
daemon. Other than that, it would be useful to be able to use it with admin as
well. This patch uses the new virTypedParameterRemote datatype introduced in
one of previous patches.
Currently, the deserializer is hardcoded into remote_driver which makes
it impossible for admin to use it. One way to achieve a shared implementation
(besides moving the code to another module) would be pass @ret_params_val as a
void pointer as opposed to the remote_typed_param pointer and add a new extra
argument specifying which of those two protocols is being used and typecast
the pointer at the function entry. An example from remote_protocol:
struct remote_typed_param_value {
int type;
union {
int i;
u_int ui;
int64_t l;
uint64_t ul;
double d;
int b;
remote_nonnull_string s;
} remote_typed_param_value_u;
};
typedef struct remote_typed_param_value remote_typed_param_value;
struct remote_typed_param {
remote_nonnull_string field;
remote_typed_param_value value;
};
That would leave us with a bunch of if-then-elses that needed to be used across
the method. This patch takes the other approach using the new datatype
introduced in one of earlier commits.
A pretty nasty deadlock occurs while trying to rename a VM in parallel
with virDomainObjListNumOfDomains.
The short description of the problem is as follows:
Thread #1:
qemuDomainRename:
------> aquires domain lock by qemuDomObjFromDomain
---------> waits for domain list lock in any of the listed functions:
- virDomainObjListFindByName
- virDomainObjListRenameAddNew
- virDomainObjListRenameRemove
Thread #2:
virDomainObjListNumOfDomains:
------> aquires domain list lock
---------> waits for domain lock in virDomainObjListCount
Introduce generic virDomainObjListRename function for renaming domains.
It aquires list lock in right order to avoid deadlock. Callback is used
to make driver specific domain updates.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In some cases it may be better to have a bitmap representing state of
individual vcpus rather than iterating the definition. The new helper
creates a bitmap representing the state from the domain definition.
The name is confusing, and there are just two uses: one is a test case,
and the other will be removed as part of an upcoming refactoring of
the hostdev code.
This patch creates two bitmaps, one for macvlan device names and one
for macvtap. The bitmap position is used to indicate that libvirt is
currently using a device with the name macvtap%d/macvlan%d, where %d
is the position in the bitmap. When requested to create a new
macvtap/macvlan device, libvirt will now look for the first clear bit
in the appropriate bitmap and derive the device name from that rather
than just starting at 0 and counting up until one works.
When libvirtd is restarted, the qemu driver code that reattaches to
active domains calls the appropriate function to "re-reserve" the
device names as it is scanning the status of running domains.
Note that it may seem strange that the retry counter now starts at
8191 instead of 5. This is because we now don't do a "pre-check" for
the existence of a device once we've reserved it in the bitmap - we
move straight to creating it; although very unlikely, it's possible
that someone has a running system where they have a large number of
network devices *created outside libvirt* named "macvtap%d" or
"macvlan%d" - such a setup would still allow creating more devices
with the old code, while a low retry max in the new code would cause a
failure. Since the objective of the retry max is just to prevent an
infinite loop, and it's highly unlikely to do more than 1 iteration
anyway, having a high max is a reasonable concession in order to
prevent lots of new failures.
On the host when we start a container, it will be
placed in a cgroup path of
/machine.slice/machine-lxc\x2ddemo.scope
under /sys/fs/cgroup/*
Inside the containers' namespace we need to setup
/sys/fs/cgroup mounts, and currently will bind
mount /machine.slice/machine-lxc\x2ddemo.scope on
the host to appear as / in the container.
While this may sound nice, it confuses applications
dealing with cgroups, because /proc/$PID/cgroup
now does not match the directory in /sys/fs/cgroup
This particularly causes problems for systems and
will make it create repeated path components in
the cgroup for apps run in the container eg
/machine.slice/machine-lxc\x2ddemo.scope/machine.slice/machine-lxc\x2ddemo.scope/user.slice/user-0.slice/session-61.scope
This also causes any systemd service that uses
sd-notify to fail to start, because when systemd
receives the notification it won't be able to
identify the corresponding unit it came from.
In particular this break rabbitmq-server startup
Future kernels will provide proper cgroup namespacing
which will handle this problem, but until that time
we should not try to play games with hiding parent
cgroups.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_DOMAIN_EVENT_ID_MIGRATION_ITERATION event will be triggered
whenever VIR_DOMAIN_JOB_MEMORY_ITERATION changes its value, i.e.,
whenever a new iteration over guest memory pages is started during
migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This new function will add a single controller of the given model,
except the case of ich9-usb-ehci1 (the master controller for a USB2
controller set) in which case a set of related controllers will be
added (EHCI1, UHCI1, UHCI2, UHCI3). These controllers will not be
given PCI addresses, but should be otherwise ready to use.
"-1" is allowed for controller model, and means "default for this
machinetype". This matches the existing practice in
qemuDomainDefPostParse(), which always adds the default controller
with model = -1, and relies on the commandline builder to set a model
(that is wrong, but will be fixed later).
This replaces the virPCIKnownStubs string array that was used
internally for stub driver validation.
Advantages:
* possible values are well-defined
* typos in driver names will be detected at compile time
* avoids having several copies of the same string around
* no error checking required when setting / getting value
The names used mirror those in the
virDomainHostdevSubsysPCIBackendType enumeration.
We only support hotplugging SCSI controllers.
The USB and virtio-serial related code was never reachable because
this function was only called for VIR_DOMAIN_CONTROLLER_TYPE_SCSI
controllers.
This reverts commit ee0d97a and parts of commits 16db8d2
and d6d54cd1.
This function can be used to retrieve the current locked memory
limit for a process, so that the setting can be later restored.
Add a configure check for getrlimit(), which we now use.
On the very first log message we send to any output, we include
the libvirt version number and package string. In some bug reports
we have been given libvirtd.log files that came from a different
host than the corresponding /var/log/libvirt/qemu log files. So
extend the initial log message to include the hostname too.
eg on first log message we would now see:
$ libvirtd
2015-12-04 17:35:36.610+0000: 20917: info : libvirt version: 1.3.0
2015-12-04 17:35:36.610+0000: 20917: info : hostname: dhcp-1-180.lcy.redhat.com
2015-12-04 17:35:36.610+0000: 20917: error : qemuMonitorIO:687 : internal error: End of file from monitor
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Once more stuff will be moved into the vCPU data structure it will be
necessary to get a specific one in some ocasions. Add a helper that will
simplify this task.
Our domain_conf.* files are big enough. Not only they contain XML
parsing code, but they served as a storage of all functions whose
name is virDomain prefixed. This is just wrong as it gathers not
related functions (and modules) into one big file which is then
harder to maintain. Split virDomainObjList module into a separate
file called virdomainobjlist.[ch].
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
As we need to provide support for URI aliases in libvirt-admin as well, URI
alias matching needs to be internally visible. Since
virConnectOpenResolveURIAlias does have a compatible signature, it could be
easily reused by libvirt-admin. This patch moves URI alias matching to util,
renaming it accordingly.
virConnectGetConfig and virConnectGetConfigPath were static libvirt
methods, merely because there hasn't been any need for having them
internally exported yet. Since libvirt-admin also needs to reference
its config file, 'xGetConfig' should be exported.
Besides moving, this patch also renames the methods accordingly,
as they are libvirt config specific.
Add the virLogManager API which allows for communication with
the virtlogd daemon to RPC program. This provides the client
side API to open log files for guest domains.
The virtlogd daemon is setup to auto-spawn on first use when
running unprivileged. For privileged usage, systemd socket
activation is used instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virRotatingFileReader and virRotatingFileWriter objects
which allow reading & writing from/to files with automation
rotation to N backup files when a size limit is reached. This
is useful for guest logging when a guaranteed finite size
limit is required. Use of external tools like logrotate is
inadequate since it leaves the possibility for guest to DOS
the host in between invokations of logrotate.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a new helper function "virDiskNameParse" which extends
virDiskNameToIndex but handling both disk index and partition index.
Also rework virDiskNameToIndex to be based on virDiskNameParse.
A test is also added for this function testing both valid and
invalid disk names.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
commit db488c79 assumed that dnsmasq would complete IPv6 DAD before
daemonizing, but in reality it doesn't wait, which creates problems
when libvirt's bridge driver sets the matching "dummy tap device" to
IFF_DOWN prior to DAD completing.
This patch waits for DAD completion by periodically polling the kernel
using netlink to check whether there are any IPv6 addresses assigned
to bridge which have a 'tentative' state (if there are any in this
state, then DAD hasn't yet finished). After DAD is finished, execution
continues. To avoid an endless hang in case something was wrong with
the kernel's DAD, we wait a maximum of 5 seconds.
Let's check to ensure we can find the Partition Table in the label
and that libvirt actually recognizes that type; otherwise, when we
go to read the partitions during a refresh operation we may not be
reading what we expect.
This will expand upon the types of errors or reason that a build
would fail, so we can create more direct error messages.
Add 'initial_memory' member to struct virDomainMemtune so that the
memory size can be pre-calculated once instead of inferring it always
again and again.
Separating of the fields will also allow finer granularity of decisions
in later patches where it will allow to keep the old initial memory
value in cases where we are handling incomming migration from older
versions that did not always update the size from NUMA as the code did
previously.
The change also requires modification of the qemu memory alignment
function since at the point where we are modifying the size of NUMA
nodes the total size needs to be recalculated too.
The refactoring done in this patch also fixes a crash in the hyperv
driver that did not properly initialize def->numa and thus
virDomainNumaGetMemorySize(def->numa) crashed.
In summary this patch should have no functional impact at this point.
Similar to commit id '35847860', it's possible to attempt to create
a 'netfs' directory in an NFS root-squash environment which will cause
the 'vol-delete' command to fail. It's also possible error paths from
the 'vol-create' would result in an error to remove a created directory
if the permissions were incorrect (and disallowed root access).
Thus rename the virFileUnlink to be virFileRemove to match the C API
functionality, adjust the code to following using rmdir or unlink
depending on the path type, and then use/call it for the VIR_STORAGE_VOL_DIR
I always felt like this function is qemu specific rather than
libvirt-wide. Other drivers may act differently on virDomainDef
change and in fact may require talking to underlying hypervisor
even if something else's than disk->src has changed. I know that
the function is still incomplete, but lets break that into two
commits that are easier to review. This one is pure code
movement.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
These functions were made static as a part of commit cbfe38c since
they were no longer called from outside virnetdev.c. We once again
need to call them from another file, so this patch makes them once
again public.
In an NFS root-squashed environment the 'vol-delete' command will fail to
'unlink' the target volume since it was created under a different uid:gid.
This code continues the concepts introduced in virFileOpenForked and
virDirCreate[NoFork] with respect to running the unlink command under
the uid/gid of the child. Unlike the other two, don't retry on EACCES
(that's why we're here doing this now).
That function takes string list and returns first string in that list
that starts with the @prefix parameter with that prefix being skipped as
the caller knows what it starts with (also for easier manipulation in
future).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In order to share as much virsh' logic as possible with upcomming
virt-admin client we need to split virsh logic into virsh specific and
client generic features.
Since majority of virsh methods should be generic enough to be used by
other clients, it's much easier to rename virsh specific data to virshX
than doing this vice versa. It moved generic virsh commands (including info
and opts structures) to generic module vsh.c.
Besides renaming methods and structures, this patch also involves introduction
of a client specific control structure being referenced as private data in the
original control structure, introduction of a new global vsh Initializer,
which currently doesn't do much, but there is a potential for added
functionality in the future.
Lastly it introduced client hooks which are especially necessary during
client connecting phase.
We just need to update the entry in the second hash table. Since commit 8728a56
we have two hash tables for the domain list so that we can do O(1) lookup
regardless of looking up by UUID or name. Since with renaming a domain UUID does
not change, we only need to update the second hash table, where domains are
referenced by their name.
We will call both functions from the qemuDomainRename().
Signed-off-by: Tomas Meszaros <exo@tty.sk>
This new subelement is used in PCI controllers: the toplevel
*attribute* "model" of a controller denotes what kind of PCI
controller is being described, e.g. a "dmi-to-pci-bridge",
"pci-bridge", or "pci-root". But in the future there will be different
implementations of some of those types of PCI controllers, which
behave similarly from libvirt's point of view (and so should have the
same model), but use a different device in qemu (and present
themselves as a different piece of hardware in the guest). In an ideal
world we (i.e. "I") would have thought of that back when the pci
controllers were added, and used some sort of type/class/model
notation (where class was used in the way we are now using model, and
model was used for the actual manufacturer's model number of a
particular family of PCI controller), but that opportunity is long
past, so as an alternative, this patch allows selecting a particular
implementation of a pci controller with the "name" attribute of the
<model> subelement, e.g.:
<controller type='pci' model='dmi-to-pci-bridge' index='1'>
<model name='i82801b11-bridge'/>
</controller>
In this case, "dmi-to-pci-bridge" is the kind of controller (one that
has a single PCIe port upstream, and 32 standard PCI ports downstream,
which are not hotpluggable), and the qemu device to be used to
implement this kind of controller is named "i82801b11-bridge".
Implementing the above now will allow us in the future to add a new
kind of dmi-to-pci-bridge that doesn't use qemu's i82801b11-bridge
device, but instead uses something else (which doesn't yet exist, but
qemu people have been discussing it), all without breaking existing
configs.
(note that for the existing "pci-bridge" type of PCI controller, both
the model attribute and <model> name are 'pci-bridge'. This is just a
coincidence, since it turns out that in this case the device name in
qemu really is a generic 'pci-bridge' rather than being the name of
some real-world chip)
This function should return the greatest CPU number set in
/domain/cpu/numa/cell/@cpus. The idea is that we should compare
the returned value against /domain/vcpu value. Yes, there exist
users who think the following is a good idea:
<vcpu placement='static'>4</vcpu>
<cpu mode='host-model'>
<model fallback='allow'/>
<numa>
<cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
<cell id='1' cpus='9-10' memory='2097152' unit='KiB'/>
</numa>
</cpu>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Qemu reports physical size 0 for block devices. As 15fa84acbb
changed the behavior of qemuDomainGetBlockInfo to just query the monitor
this created a regression since we didn't report the size correctly any
more.
This patch adds code to refresh the physical size of a block device by
opening it and seeking to the end and uses it both in
qemuDomainGetBlockInfo and also in qemuDomainGetStatsOneBlock that was
broken since it was introduced in this respect.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1250982
The nodeinfo is reporting incorrect number of cpus and incorrect host
topology on PPC64 KVM hosts. The KVM hypervisor on PPC64 needs only
the primary thread in a core to be online, and the secondaries offlined.
While scheduling a guest in, the kvm scheduler wakes up the secondaries to
run in guest context.
The host scheduling of the guests happen at the core level(as only primary
thread is online). The kvm scheduler exploits as many threads of the core
as needed by guest. Further, starting POWER8, the processor allows splitting
a physical core into multiple subcores with 2 or 4 threads each. Again, only
the primary thread in a subcore is online in the host. The KVM-PPC
scheduler allows guests to exploit all the offline threads in the subcore,
by bringing them online when needed.
(Kernel patches on split-core http://www.spinics.net/lists/kvm-ppc/msg09121.html)
Recently with dynamic micro-threading changes in ppc-kvm, makes sure
to utilize all the offline cpus across guests, and across guests with
different cpu topologies.
(https://www.mail-archive.com/kvm@vger.kernel.org/msg115978.html)
Since the offline cpus are brought online in the guest context, it is safe
to count them as online. Nodeinfo today discounts these offline cpus from
cpu count/topology calclulation, and the nodeinfo output is not of any help
and the host appears overcommited when it is actually not.
The patch carefully counts those offline threads whose primary threads are
online. The host topology displayed by the nodeinfo is also fixed when the
host is in valid kvm state.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
The new name makes it clear that the returned bitmap contains the
information about which CPUs are online, not eg. which CPUs are
present.
No behavioral change.
If one calls update-device with information that is not updatable,
libvirt reports success even though no data were updated. The example
used in the bug linked below uses updating device with <boot order='2'/>
which, in my opinion, is a valid thing to request from user's
perspective. Mainly since we properly error out if user wants to update
such data on a network device for example.
And since there are many things that might happen (update-device on disk
basically knows just how to change removable media), check for what's
changing and moreover, since the function might be usable in other
drivers (updating only disk path is a valid possibility) let's abstract
it for any two disks.
We can't possibly check for everything since for many fields our code
does not properly differentiate between default and unspecified values.
Even though this could be changed, I don't feel like it's worth the
complexity so it's not the aim of this patch.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1007228
This is a self-locking wrapper around virHashTable. Only a limited set
of APIs are implemented now (the ones which are used in the following
patch) as more can be added on demand.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
While working in qemu_monitor_json, I repeatedly found myself
getting a value then checking if it was an object. Add some
wrappers to make this task easier.
* src/util/virjson.c (virJSONValueObjectGetByType)
(virJSONValueObjectGetObject, virJSONValueObjectGetArray): New
functions.
(virJSONValueObjectGetString, virJSONValueObjectGetNumberInt)
(virJSONValueObjectGetNumberUint)
(virJSONValueObjectGetNumberLong)
(virJSONValueObjectGetNumberUlong)
(virJSONValueObjectGetNumberDouble)
(virJSONValueObjectGetBoolean): Simplify.
(virJSONValueIsNull): Change return type.
* src/util/virjson.h: Reflect changes.
* src/libvirt_private.syms (virjson.h): Export them.
* tests/jsontest.c (testJSONLookup): New test.
Signed-off-by: Eric Blake <eblake@redhat.com>
The wrapper is useful for calling qemuBlockJobEventProcess with the
event details stored in disk's privateData, which is the most likely
usage of qemuBlockJobEventProcess.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Complex jobs, such as migration, need to monitor several events at once,
which is impossible when each of the event uses its own condition
variable. This patch adds a single condition variable to each domain
object. This variable can be used instead of the other event specific
conditions.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Add multikey API:
* virTypedParamsFilter that filters all the parameters with specified name.
* virTypedParamsGetStringList that returns a list with all the values for
specified name and string type.
Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virDomainObjGetOneDef will help to retrieve the correct definition
pointer from @vm in cases where VIR_DOMAIN_AFFECT_LIVE and
VIR_DOMAIN_AFFECT_CONFIG are mutually exclusive. The function simply
returns the correct pointer. This similarly to virDomainObjGetDefs will
greatly simplify the code.
https://bugzilla.redhat.com/show_bug.cgi?id=1220527
This type of information defines attributes of a system
baseboard. With one exception: board type is yet not implemented
in qemu so it's not introduced here either.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Move all the system_* fields into a separate struct. Not only this
simplifies the code a bit it also helps us to identify whether BIOS
info is present. We don't have to check all the four variables for
being not-NULL, but we can just check the pointer to the struct.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Move all the bios_* fields into a separate struct. Not only this
simplifies the code a bit it also helps us to identify whether BIOS
info is present. We don't have to check all the four variables for
being not-NULL, but we can just check the pointer to the struct.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virDomainLiveConfigHelperMethod that is used for this job now does
modify the flags but still requires the callers to extract the correct
definition objects.
In addition coverity and other static analyzers are usually unhappy as
they don't grasp the fact that @flags are upadted according to the
correct def to be present.
To work this issue around and simplify the calling chain let's add a new
helper that will work only on drivers that always copy the persistent
def to a transient at start of a vm. This will allow to drop a few
arguments. The new function syntax will also fill two definition
pointers rather than modifying the @flags parameter.
Since some functions can be optimized by reusing the buffers that they
already have instead of allocating and copying new ones, lets split
virBitmapToData to two functions where one only converts the data and
the second one is a wrapper that allocates the buffer if necessary.
Store the emulator pinning cpu mask as a pure virBitmap rather than the
virDomainPinDef since it stores only the bitmap and refactor
qemuDomainPinEmulator to do the same operations in a much saner way.
As a side effect virDomainEmulatorPinAdd and virDomainEmulatorPinDel can
be removed since they don't add any value.
Sometimes the only thing we need is the pointer to virDomainDiskDef and
having to call virDomainDiskIndexBy* APIs, storing the disk index, and
looking it up in the disks array is ugly. After this patch, we can just
call virDomainDiskBy* and get the pointer in one step.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Two new domain configuration XML elements are added to enable/disable
the protected key management operations for a guest:
<domain>
...
<keywrap>
<cipher name='aes|dea' state='on|off'/>
</keywrap>
...
</domain>
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Because there are multiple potential reasons for an error, this
function logs any errors before returning NULL (since the caller won't
have the information needed to determine which was the reason for
failure).
The APIs take the memory value in KiB and we store it in KiB
internally, but we cannot parse the whole ULONG_MAX range
on 64-bit systems, because virDomainParseScaledValue
needs to fit the value in bytes in an unsigned long long.
https://bugzilla.redhat.com/show_bug.cgi?id=1176739
Until now the virDomainListAllDomains API would lock the domain list and
then every single domain object to access and filter it. This would
potentially allow a unresponsive VM to block the whole daemon if a
*listAllDomains call would get stuck.
To avoid this problem this patch collects a list of referenced domain
objects first from the list and then unlocks it right away. The
expensive operation requiring locking of the domain object is executed
after the list lock is dropped. While a single blocked domain will still
lock up a listAllDomains call, the domain list won't be held locked and
thus other APIs won't be blocked.
Additionally this patch also fixes the lookup code, where we'd ignore
the vm->removing flag and thus potentially return domain objects that
would be deleted very soon so calling any API wouldn't make sense.
As other clients also could benefit from operating on a list of domain
objects rather than the public domain descriptors a new intermediate
API - virDomainObjListCollect - is introduced by this patch.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1181074
Extend it to a universal helper used for clearing lists of any objects.
Note that the argument type is specifically void * to allow implicit
typecasting.
Additionally add a helper that works on non-NULL terminated arrays once
we know the length.
We already check that any auto-assigned bridge device name for a
virtual network (e.g. "virbr1") doesn't conflict with the bridge name
for any existing libvirt network (via virNetworkSetBridgeName() in
conf/network_conf.c).
We also want to check that the name doesn't conflict with any bridge
device created on the host system outside the control of libvirt
(history: possibly due to the ploriferation of references to libvirt's
bridge devices in HOWTO documents all around the web, it is not
uncommon for an admin to manually create a bridge in their host's
system network config and name it "virbrX"). To add such a check to
virNetworkBridgeInUse() (which is called by virNetworkSetBridgeName())
we would have to call virNetDevExists() (from util/virnetdev.c); this
function calls ioctl(SIOCGIFFLAGS), which everyone on the mailing list
agreed should not be done from an XML parsing function in the conf
directory.
To remedy that problem, this patch removes virNetworkSetBridgeName()
from conf/network_conf.c and puts an identically functioning
networkBridgeNameValidate() in network/bridge_driver.c (because it's
reasonable for the bridge driver to call virNetDevExists(), although
we don't do that yet because I wanted this patch to have as close to 0
effect on function as possible).
There are a couple of inevitable changes though:
1) We no longer check the bridge name during
virNetworkLoadConfig(). Close examination of the code shows that
this wasn't necessary anyway - the only *correct* way to get XML
into the config files is via networkDefine(), and networkDefine()
will always call networkValidate(), which previously called
virNetworkSetBridgeName() (and now calls
networkBridgeNameValidate()). This means that the only way the
bridge name can be unset during virNetworkLoadConfig() is if
someone edited the config file on disk by hand (which we explicitly
prohibit).
2) Just on the off chance that somebody *has* edited the file by hand,
rather than crashing when they try to start their malformed
network, a check for non-NULL bridge name has been added to
networkStartNetworkVirtual().
(For those wondering why I don't instead call
networkValidateBridgeName() there to set a bridge name if one
wasn't present - the problem is that during
networkStartNetworkVirtual(), the lock for the network being
started has already been acquired, but the lock for the network
list itself *has not* (because we aren't adding/removing a
network). But virNetworkBridgeInuse() iterates through *all*
networks (including this one) and locks each network as it is
checked for a duplicate entry; it is necessary to lock each network
even before checking if it is the designated "skip" network because
otherwise some other thread might acquire the list lock and delete
the very entry we're examining. In the end, permitting a setting of
the bridge name during network start would require that we lock the
entire network list during any networkStartNetwork(), which
eliminates a *lot* of parallelism that we've worked so hard to
achieve (it can make a huge difference during libvirtd startup). So
rather than try to adjust for someone playing against the rules, I
choose to instead give them the error they deserve.)
3) virNetworkAllocateBridge() (now removed) would leak any "template"
string set as the bridge name. Its replacement
networkFindUnusedBridgeName() doesn't leak the template string - it
is properly freed.
Add qemuDomainAddIOThread and qemuDomainDelIOThread in order to add or
remove an IOThread to/from the host either for live or config optoins
The implementation for the 'live' option will use the iothreadpids list
in order to make decision, while the 'config' option will use the
iothreadids list. Additionally, for deletion each may have to adjust
the iothreadpin list.
IOThreads are implemented by qmp objects, the code makes use of the existing
qemuMonitorAddObject or qemuMonitorDelObject APIs.
Signed-off-by: John Ferlan <jferlan@redhat.com>
We're about to allow IOThreads to be deleted, but an iothreadid may be
included in some domain thread sched, so add a new API to allow removing
an iothread from some entry.
Then during the writing of the threadsched data and an additional check
to determine whether the bitmap is all clear before writing it out.
Since it's only ever referenced in domain_conf.c, make the function
static, but also will need to move it to somewhere before it's referenced
rather than forward referencing it.
Adding a new XML element 'iothreadids' in order to allow defining
specific IOThread ID's rather than relying on the algorithm to assign
IOThread ID's starting at 1 and incrementing to iothreads count.
This will allow future patches to be able to add new IOThreads by
a specific iothread_id and of course delete any exisiting IOThread.
Each iothreadids element will have 'n' <iothread> children elements
which will have attribute "id". The "id" will allow for definition
of any "valid" (eg > 0) iothread_id value.
On input, if any <iothreadids> <iothread>'s are provided, they will
be marked so that we only print out what we read in.
On input, if no <iothreadids> are provided, the PostParse code will
self generate a list of ID's starting at 1 and going to the number
of iothreads defined for the domain (just like the current algorithm
numbering scheme). A future patch will rework the existing algorithm
to make use of the iothreadids list.
On output, only print out the <iothreadids> if they were read in.
This is basically turning qemuDomObjEndAPI into a more general
function. Other drivers which gets a reference to domain objects may
benefit from this function too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1113474
When we set the MAC address of a network device as a part of setting
up macvtap "passthrough" mode (where the domain has an emulated netdev
connected to a host macvtap device that has exclusive use of the
physical device, and sets the device MAC address to match its own,
i.e. "<interface type='direct'> <source mode='passthrough' .../>"), we
use ioctl(SIOCSIFHWADDR) giving it the name of that device. This is
true even if it is an SRIOV Virtual Function (VF).
But, when we are setting the MAC address / vlan ID of a VF in
preparation for "hostdev network" passthrough (this is where we set
the MAC address and vlan id of the VF after detaching the host net
driver and before assigning the device to the domain with PCI
passthrough, i.e. "<interface type='hostdev'>", we do the setting via
a netlink RTM_SETLINK message for that VF's Physical Function (PF),
telling it the VF# we want to change. This sets an "administratively
changed MAC" flag for that VF in the PF's driver, and from that point
on (until the PF driver is reloaded, *not* merely the VF driver) that
VF's MAC address can't be changed using ioctl(SIOCSIFHWADDR) - the
only way to change it is via the PF with RTM_SETLINK.
This means that if a VF is used for hostdev passthrough, it will have
the admin flag set, and future attempts to use that VF for macvtap
passthrough will fail.
The solution to this problem is to check if the device being used for
macvtap passthrough is actually a VF; if so, we use the netlink
RTM_SETLINK message to the PF to set the VF's mac address instead of
ioctl(SIOCSIFHWADDR) directly to the VF; if not, behavior does not
change from previously.
There are three pieces to making this work:
1) virNetDevMacVLan(Create|Delete)WithVPortProfile() now call
virNetDev(Replace|Restore)NetConfig() rather than
virNetDev(Replace|Restore)MacAddress() (simply passing -1 for VF#
and vlanid).
2) virNetDev(Replace|Restore)NetConfig() check to see if the device is
a VF. If so, they find the PF's name and VF#, allowing them to call
virNetDev(Replace|Restore)VfConfig().
3) To prevent mixups when detaching a macvtap passthrough device that
had been attached while running an older version of libvirt,
virNetDevRestoreVfConfig() is potentially given the preserved name
of the VF, and if the proper statefile for a VF can't be found in
the stateDir (${stateDir}/${pfname}_vf${vfid}),
virNetDevRestoreMacAddress() is called instead (which will look in
the file named ${stateDir}/${vfname}).
This problem has existed in every version of libvirt that has both
macvtap passthrough and interface type='hostdev'. Fortunately people
seem to use one or the other though, so it hasn't caused any real
world problem reports.
This revealed that GuestDefaultEmulator was a bit buggy, capable
of returning an emulator that didn't match the passed domain type. Fix
up the test suite input to continue to pass.
This is a helper function to look up all capabilities data for all
the OS bits that are relevant to <domain>. This is
- os type
- arch
- domain type
- emulator
- machine type
This will be used to replace several functions in later commits.
But the internal API stays the same, and we just convert the value as
needed. Not useful yet, but this is the beginning step of using an enum
for ostype throughout the code.
This is a simple wrapper around virNetDevBandwidthManipulateFilter() that
will update the desired filter on an interface (usually a network bridge)
with a new MAC address. Although, the MAC address in question usually
refers to some other interface - the one that the filter is constructed
for. Yeah, hard to parse. Thing is, our NATed network has a bridge where
some part of QoS takes place. And vNICs from guests are plugged into
the bridge. However, if a guest decides to change the MAC of its vNIC,
the corresponding qemu process emits an event which we can use to
update the QoS configuration based on the new MAC address.. However,
our QoS hierarchy is currently not notified, therefore it falls apart.
This function (when called in response to the aforementioned event)
will update our QoS hierarchy and duct tape it together again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add virStringHasControlChars that checks if the string has
any control characters other than \t\r\n,
and virStringStripControlChars that removes them in-place.
Two non-static functions in virjson.c were missing their export info in
libvirt_private.syms, so they couldn't be used anywhere it the code (and
that's about to get changed).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Rename it to virNetDevGetIPv4AddressIoctl and make
virNetDevGetIPAddress a wrapper around it, allowing
other ways of getting the address to be implemented,
and still falling back to the old method.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Create a new common API to replace the virCgroupNew{Vcpu|Emulator|IOThread}
API's using an emum to generate the cgroup name
Signed-off-by: John Ferlan <jferlan@redhat.com>
This new internal API checks if given CGroup controller is
available. It is going to be needed later when we need to make a
decision whether pin domain memory onto NUMA nodes using cpuset
CGroup controller or using numa_set_membind().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduce virStoragePoolSaveState to properly format the state XML in
the same manner as virStoragePoolDefFormat, except for adding a
<poolstate> ... </poolstate> around the definition. This is similar to
virNetworkObjFormat used to save the live/active network information.
Instead of always using controller 0 and incrementing port number,
respect the maximum port numbers of controllers and use all of them.
Ports for virtio consoles are quietly reserved, but not formatted
(neither in XML nor on QEMU command line).
Also rejects duplicate virtio-serial addresses.
https://bugzilla.redhat.com/show_bug.cgi?id=890606https://bugzilla.redhat.com/show_bug.cgi?id=1076708
Test changes:
* virtio-auto.args
Filling out the port when just the controller is specified.
switched from using
maxport + 1
to:
first free port on the controller
* virtio-autoassign.args
Filling out the address when no <address> is specified.
Started using all the controllers instead of 0, also discards
the bus value.
* xml -> xml output of virtio-auto
The port assignment is no longer done as a part of XML parsing,
so the unspecified values stay 0.
Create a sorted array of virtio-serial controllers.
Each of the elements contains the controller index
and a bitmap of available ports.
Buses are not tracked, because they aren't supported by QEMU.
virDomainHasDiskMirror() currently detects only jobs that add the mirror
elements. Since some operations like migration are interlocked by
existing block jobs on the given domain the check needs to be
instrumented to check regular jobs too.
This patch renames virDomainHasDiskMirror to virDomainHasDiskBlockjob
and adds an argument that allows to select that it returns true only for
block copy jobs as those interlock making the domain persistent.
Other two uses trigger on any block job type.
Signed-off-by: Shanzhi Yu <shyu@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
libvirt has always used the netlink RTM_DELLINK message to delete
macvtap/macvlan devices, but it can actually be used to delete other
types of network devices, such as bonds and bridges. This patch makes
virNetDevMacVLanDelete() available as a generic function so it can
intelligibly be called to delete these other types of interfaces.
Recently we've fixed a bug where the status XML could not be parsed as
the parser used absolute path XPath queries. This test enhancement tests
all XML files used in the qemu-xml-2-xml test as a part of a status XML
snippet to see whether they are parsed correctly. The status XML-2-XML is
currently tested in 223 cases with this patch.
The current auto-indentation buffer code applies indentation only on
complete strings. To allow adding a string containing newlines and
having it properly indented this patch adds virBufferAddStr.
Every thread created as a worker thread within a pool gets a name
according to virThreadPoolJobFunc name.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Automatically assign a job to every thread created by virThreadCreate.
The name of the virThreadFunc function passed to virThreadCreate is used
as the job or worker name in case no name is explicitly passed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Each thread can use a thread local variable to keep the name of a job
which is currently running in the job.
The virThreadJobSetWorker API is supposed to be called once by any
thread which is used as a worker, i.e., it is waiting in a pool, woken
up to do a job, and returned back to the pool.
The virThreadJobSet/virThreadJobClear APIs are to be called at the
beginning/end of each job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Add a few helpers that allow to operate with memory device definitions
on the domain config and use them to implement memory device coldplug in
the qemu driver.
This patch adds code that parses and formats configuration for memory
devices.
A simple configuration would be:
<memory model='dimm'>
<target>
<size unit='KiB'>524287</size>
<node>0</node>
</target>
</memory>
A complete configuration of a memory device:
<memory model='dimm'>
<source>
<pagesize unit='KiB'>4096</pagesize>
<nodemask>1-3</nodemask>
</source>
<target>
<size unit='KiB'>524287</size>
<node>1</node>
</target>
</memory>
This patch preemptively forbids use of the <memory> device in individual
drivers so the users are warned right away that the device is not
supported.
Add a XML element that will allow to specify maximum supportable memory
and the count of memory slots to use with memory hotplug.
To avoid possible confusion and misuse of the new element this patch
also explicitly forbids the use of the maxMemory setting in individual
drivers's post parse callbacks. This limitation will be lifted when the
support is implemented.
This function does not make any sense now, that network driver is
(almost) dropped. I mean, previously, when threads were
serialized, this function was there to check, if no other network
with the same name or UUID exists. However, nowadays that threads
can run more in parallel, this function is useless, in fact it
gives misleading return values. Consider the following scenario.
Two threads, both trying to define networks with same name but
different UUID (e.g. because it was generated during XML parsing
phase, whatever). Lets assume that both threads are about to call
networkValidate() which immediately calls
virNetworkObjIsDuplicate().
T1: calls virNetworkObjIsDuplicate() and since no network with
given name or UUID exist, success is returned.
T2: calls virNetworkObjIsDuplicate() and since no network with
given name or UUID exist, success is returned.
T1: calls virNetworkAssignDef() and successfully places its
network into the virNetworkObjList.
T2: calls virNetworkAssignDef() and since network with the same
name exists, the network definition is replaced.
Okay, this is mainly because virNetworkAssignDef() does not check
whether name and UUID matches. Well, lets make it so! And drop
useless function too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Adds the port type definitions and methods that will be used to bind
interfaces to the Midonet virtual ports.
virtnetdevmidonet.c adds the way to bind and unbind the ports by
calling into the Midonet Host Agent control command line (installed
with the midolman package).
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
As there are two possible approaches to define a domain's memory size -
one used with legacy, non-NUMA VMs configured in the <memory> element
and per-node based approach on NUMA machines - the user needs to make
sure that both are specified correctly in the NUMA case.
To avoid this burden on the user I'd like to replace the NUMA case with
automatic totaling of the memory size. To achieve this I need to replace
direct access to the virDomainMemtune's 'max_balloon' field with
two separate getters depending on the desired size.
The two sizes are needed as:
1) Startup memory size doesn't include memory modules in some
hypervisors.
2) After startup these count as the usable memory size.
Note that the comments for the functions are future aware and document
state that will be present after a few later patches.
virDomainNetFindIdx no longer returns info whether device was not found,
or there was multiple matches. Additionally it already handle error
reporting. Introduce virDomainHasNet which does a simple task, without
implicit error reporting.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
A helper that never returns an error and treats bits out of bitmap range
as false.
Use it everywhere we use ignore_value on virBitmapGetBit, or loop over
the bitmap size.
https://bugzilla.redhat.com/show_bug.cgi?id=1135491
More or less a virtual copy of the existing virDomainVcpuPin{Add|Del} API's.
NB: The IOThreads implementation "reused" the virDomainVcpuPinDefPtr
since it provided everything necessary - an "id" and a "map" for each
thread id configured.
This is going to be needed later, when some functions already
have the virNetworkObjList object already locked and need to
lookup a object to work on. As an example of such function is
virNetworkAssignDef(). The other use case might be in
virNetworkObjListForEach() callback.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is practically copy of qemuDomObjEndAPI. The reason why is
it so widely available is to avoid code duplication, since the
function is going to be called from our bridge driver, test
driver and parallels driver too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far it's just a structure which happens to have 'Obj' in its
name, but otherwise it not related to virObject at all. No
reference counting, not virObjectLock(), nothing.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Well, one day this will be self-locking object, but not today.
But lets prepare the code for that! Moreover,
virNetworkObjListFree() is no longer needed, so turn it into
virNetworkObjListDispose().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This API will be used in the future to call passed callback over
each network object in the list. It's slightly different to its
virDomainObjListForEach counterpart, because virDomainObjList
uses a hash table to store domain object, while virNetworkObjList
uses an array.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There was a mess in the way how we store unlimited value for memory
limits and how we handled values provided by user. Internally there
were two possible ways how to store unlimited value: as 0 value or as
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory
limits as unsigned long long, we cannot use -1 to represent unlimited.
It's much easier for us to say that everything greater than
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as valid
value despite that it makes no sense to set limit to 0.
Remove unnecessary function virCompareLimitUlong. The update of test
is to prevent the 0 to be miss-used as unlimited in future.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The first one is to truncate the memory limit to
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED if the value is greater and the second
one is to decide whether the memory limit is set or not, unlimited means
that it's not set.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Implement virCommandPassFDGetFDIndex to determine the index a given
file descriptor will have when passed to the child process.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Adding functionality to libvirt that will allow it
query the ethtool interface for the availability
of certain NIC HW offload features
Here is an example of the feature XML definition:
<device>
<name>net_eth4_90_e2_ba_5e_a5_45</name>
<path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/net/eth4</path>
<parent>pci_0000_08_00_1</parent>
<capability type='net'>
<interface>eth4</interface>
<address>90:e2:ba:5e:a5:45</address>
<link speed='10000' state='up'/>
<feature name='rx'/>
<feature name='tx'/>
<feature name='sg'/>
<feature name='tso'/>
<feature name='gso'/>
<feature name='gro'/>
<feature name='rxvlan'/>
<feature name='txvlan'/>
<feature name='rxhash'/>
<capability type='80203'/>
</capability>
</device>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Since adding the support for scheduler policy settings in commit
8680ea97, there are two enums with the same information. That was
caused by rewriting the patch since first draft.
Find out thanks to clang, but there was no impact whatsoever.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1142631
This patch resolves a situation where the same "<target dev='$name'...>"
can be used for multiple disks in the domain.
While the $name is "mostly" advisory regarding the expected order that
the disk is added to the domain and not guaranteed to map to the device
name in the guest OS, it still should be unique enough such that other
domblk* type operations can be performed.
Without the patch, the domblklist will list the same Target twice:
$ virsh domblklist $dom
Target Source
------------------------------------------------
sda /var/lib/libvirt/images/file.qcow2
sda /var/lib/libvirt/images/file.img
Additionally, getting domblkstat, domblkerror, domblkinfo, and other block*
type calls will not be able to reference the second target.
Fortunately, hotplug disallows adding a "third" sda value:
$ qemu-img create -f raw /var/lib/libvirt/images/file2.img 10M
$ virsh attach-disk $dom /var/lib/libvirt/images/file2.img sda
error: Failed to attach disk
error: operation failed: target sda already exists
$
BUT, it since 'sdb' doesn't exist one would get the following on the same
hotplug attempt, but changing to use 'sdb' instead of 'sda'
$ virsh attach-disk $dom /var/lib/libvirt/images/file2.img sdb
error: Failed to attach disk
error: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'scsi0-0-1' for device
$
Since we cannot fix this issue at parsing time, the best that can be done so
as to not "lose" a domain is to make the check prior to starting the guest
with the results as follows:
$ virsh start $dom
error: Failed to start domain $dom
error: XML error: target 'sda' duplicated for disk sources '/var/lib/libvirt/images/file.qcow2' and '/var/lib/libvirt/images/file.img'
$
Running 'make check' found a few more instances in the tests where this
duplicated target dev value was being used. These also exhibited some
duplicated 'id=' values (negating the uniqueness argument of aliases) in
the corresponding .args file and of course the *xmlout version of a few
input XML files.
This API joins the following two lines:
char *s = virBufferContentAndReset(buf1);
virBufferAdd(buf2, s, -1);
into one:
virBufferAddBuffer(buf2, buf1);
With one exception: there's no re-indentation applied to @buf1.
The idea is, that in general both can have different indentation
(like the test I'm adding proves)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since our formatter now handles well if the config is allocated and not
filled we can safely always-allocate the NUMA config and remove the
ad-hoc allocation code.
This will help in later patches as the parser will be refactored to just
fill the data.
Move the existing virDomainDefNew to virDomainDefNewFull as it's setting
a few things in the conf and re-introduce virDomainDefNew as a function
without parameters for common use.
For a while now there are two places that gather information about NUMA
related guest configuration. While the XML can't be changed we can at
least store the data in one place in the definition.
Rename the numatune_conf.[ch] files to numa_conf as later patches will
move the rest of the definitions from the cpu definition to this one.
We do have a check for valid per-domain security model, however we still
do permit an invalid security model for a domain's device (those which
are specified with <source> element).
This patch introduces a new function virSecurityManagerCheckAllLabel
which compares user specified security model against currently
registered security drivers. That being said, it also permits 'none'
being specified as a device security model.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165485
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The enum converters are defined in the domain_conf.h (so
accessible widely across the code), but on the symbol layer, only
virDomainNetTypeToString was exposed. However, FromString variant
is going to be needed shortly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In order for QEMU vCPU (and other) threads to run with RT scheduler,
libvirt needs to take care of that so QEMU doesn't have to run privileged.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178986
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This function uses sched_setscheduler() function so it works with
processes and threads as well (even threads not created by us, which is
what we'll need in the future).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The helpers will be useful when implementing hotplug and coldplug of
random number generator devices.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When adding devices to the definition it's useful to check whether the
devices don't reside on a conflicting address. This patch adds a helper
that iterates all device info and compares the addresses with the given
info.
Some code paths have special logic depending on the page size
reported by sysconf, which in turn affects the test results.
We must mock this so tests always have a consistent page size.
This helper eases iterating all key=value pairs stored in a JSON
object. Usually we pick only certain known keys from a JSON object, but
this will allow to walk complete objects and have the callback act on
those.
To be able to easily represent nodesets and other data stored in
virBitmaps in libvirt, this patch introduces a set of helpers that allow
to convert the bitmap to and from JSON value objects.
Extract the logic to determine which nodeset has to be used for a domain
from the formatting step so that it can be reused separately when the
nodeset is used in a different way.
This patch provides the utility functions needed to synchronize
the rxfilter changes made to a guest domain with the corresponding
macvtap devices on the host:
* Get/set PROMISC flag
* Get/set ALLMULTI, MULTICAST
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Do the allocation first, then add the actual device.
The second part should never fail. This is good
for live hotplug where we don't want to remove the device
on OOM after the monitor command succeeded.
The only change in behavior is that on failure, the
vmdef->consoles array is freed, not just the first console.
For stateless, client side drivers, it is never correct to
probe for secondary drivers. It is only ever appropriate to
use the secondary driver that is associated with the
hypervisor in question. As a result the ESX & HyperV drivers
have both been forced to do hacks where they register no-op
drivers for the ones they don't implement.
For stateful, server side drivers, we always just want to
use the same built-in shared driver. The exception is
virtualbox which is really a stateless driver and so wants
to use its own server side secondary drivers. To deal with
this virtualbox has to be built as 3 separate loadable
modules to allow registration to work in the right order.
This can all be simplified by introducing a new struct
recording the precise set of secondary drivers each
hypervisor driver wants
struct _virConnectDriver {
virHypervisorDriverPtr hypervisorDriver;
virInterfaceDriverPtr interfaceDriver;
virNetworkDriverPtr networkDriver;
virNodeDeviceDriverPtr nodeDeviceDriver;
virNWFilterDriverPtr nwfilterDriver;
virSecretDriverPtr secretDriver;
virStorageDriverPtr storageDriver;
};
Instead of registering the hypervisor driver, we now
just register a virConnectDriver instead. This allows
us to remove all probing of secondary drivers. Once we
have chosen the primary driver, we immediately know the
correct secondary drivers to use.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A bunch of code is wrapped in #if WITH_LIBVIRTD in order to
enable the virStateDriver to be disabled when libvirtd is not
built. Disabling this code doesn't have any real functional
benefit beyond removing 1 pointer from the virConnectPtr struct,
while having a cost of many more conditionals.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virDBusMethodCall method has a DBusError as one of its
parameters. If the caller wants to pass a non-NULL value
for this, it immediately makes the calling code require
DBus at build time. This has led to breakage of non-DBus
builds several times. It is desirable that only the virdbus.c
file should need WITH_DBUS conditionals, so we must ideally
remove the DBusError parameter from the method.
We can't simply raise a libvirt error, since the whole point
of this parameter is to give the callers a way to check if
the error is one they want to ignore, without having the logs
polluted with an error message. So, we add a virErrorPtr
parameter which the caller can then either ignore or raise
using the new virReportErrorObject method.
This new method is distinct from virSetError in that it
ensures the logging hooks are run.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Moving code for parsing and formatting network routes to
networkcommon_conf helps reusing those routes for domains. The route
definition has been hidden to help reducing the number of unnecessary
checks in the format function.
The virDomainDefParse* and virDomainDefFormat* methods both
accept the VIR_DOMAIN_XML_* flags defined in the public API,
along with a set of other VIR_DOMAIN_XML_INTERNAL_* flags
defined in domain_conf.c.
This is seriously confusing & error prone for a number of
reasons:
- VIR_DOMAIN_XML_SECURE, VIR_DOMAIN_XML_MIGRATABLE and
VIR_DOMAIN_XML_UPDATE_CPU are only relevant for the
formatting operation
- Some of the VIR_DOMAIN_XML_INTERNAL_* flags only apply
to parse or to format, but not both.
This patch cleanly separates out the flags. There are two
distint VIR_DOMAIN_DEF_PARSE_* and VIR_DOMAIN_DEF_FORMAT_*
flags that are used by the corresponding methods. The
VIR_DOMAIN_XML_* flags received via public API calls must
be converted to the VIR_DOMAIN_DEF_FORMAT_* flags where
needed.
The various calls to virDomainDefParse which hardcoded the
use of the VIR_DOMAIN_XML_INACTIVE flag change to use the
VIR_DOMAIN_DEF_PARSE_INACTIVE flag.
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
Add a default implementation of virNetDevSetIPv4Address using netlink
and libnl. This avoids requiring /usr/sbin/ip or /usr/sbin/ifconfig
external binaries.
Some of the nwfilter tests are now failing since --concurrent shows
up in the ebtables command. To avoid this, implement a function
preventing the probing for lock support in the eb/iptables tools
and use it in the tests.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Currently, when there is an API that's blocking with locked domain and
second API that's waiting in virDomainObjListFindByUUID() for the domain
lock (with the domain list locked) no other API can be executed on any
domain on the whole hypervisor because all would wait for the domain
list to be locked. This patch adds new optional approach to this in
which the domain is only ref'd (reference counter is incremented)
instead of being locked and is locked *after* the list itself is
unlocked. We might consider only ref'ing the domain in the future and
leaving locking on particular APIs, but that's no tonight's fairy tale.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
At the time that the network driver allocates a connection to a
network, the tap device that will be used hasn't yet been created -
that will be done later by qemu (or lxc or whoever) - but if the
network has macTableManager='libvirt', then when we do get around to
creating the tap device, we will need to add an entry for it to the
network bridge's fdb (forwarding database) *and* turn off learning and
unicast_flood for that tap device in the bridge's sysfs settings. This
means that qemu needs to know both the bridge name as well as the
setting of macTableManager, so we either need to create a new API to
retrieve that info, or just pass it back in the ActualNetDef that is
created during networkAllocateActualDevice. We choose the latter
method, since it's already done for the bridge device, and it has the
side effect of making the information available in domain status.
(NB: in the future, I think that the tap device should actually be
created by networkAllocateActualDevice(), as that will solve several
other problems, but that is a battle for another day, and this
information will still be useful outside the network driver)
The macTableManager attribute of a network's bridge subelement tells
libvirt how the bridge's MAC address table (used to determine the
egress port for packets) is managed. In the default mode, "kernel",
management is left to the kernel, which usually determines entries in
part by turning on promiscuous mode on all ports of the bridge,
flooding packets to all ports when the correct destination is unknown,
and adding/removing entries to the fdb as it sees incoming traffic
from particular MAC addresses. In "libvirt" mode, libvirt turns off
learning and flooding on all the bridge ports connected to guest
domain interfaces, and adds/removes entries according to the MAC
addresses in the domain interface configurations. A side effect of
turning off learning and unicast_flood on the ports of a bridge is
that (with Linux kernel 3.17 and newer), the kernel can automatically
turn off promiscuous mode on one or more of the bridge's ports
(usually only the one interface that is used to connect the bridge to
the physical network). The result is better performance (because
packets aren't being flooded to all ports, and can be dropped earlier
when they are of no interest) and slightly better security (a guest
can still send out packets with a spoofed source MAC address, but will
only receive traffic intended for the guest interface's configured MAC
address).
The attribute looks like this in the configuration:
<network>
<name>test</name>
<bridge name='br0' macTableManager='libvirt'/>
...
This patch only adds the config knob, documentation, and test
cases. The functionality behind this knob is added in later patches.
These two functions use netlink RTM_NEWNEIGH and RTM_DELNEIGH messages
to add and delete entries from a bridge's fdb. The bridge itself is
not referenced in the arguments to the functions, only the name of the
device that is attached to the bridge (since a device can only be
attached to one bridge at a time, and must be attached for this
function to make sense, the kernel easily infers which bridge's fdb is
being modified by looking at the device name/index).
https://bugzilla.redhat.com/show_bug.cgi?id=1159180
Move the API from the backend to storage_conf and rename it to
virStoragePoolGetVhbaSCSIHostParent. A future patch will need to
use this functionality from storage_conf
Get mounted filesystems list, which contains hardware info of disks and its
controllers, from QEMU guest agent 2.2+. Then, convert the hardware info
to corresponding device aliases for the disks.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
As qemu is now able to notify us about change of the channel state used
for communication with the guest agent we now can more precisely track
the state of the guest agent.
To allow notifying management apps this patch implements a new event
that will be triggered on changes of the guest agent state.
To allow reuse this non-trivial parser code in the backing store parser
this part of the command line parser needs to be split out into a
separate funciton.
Ethernet interfaces in libvirt currently do not support bandwidth setting.
For example, following xml file for an interface will not apply these
settings to corresponding qdiscs.
<interface type="ethernet">
<mac address="02:36:1d:18:2a:e4"/>
<model type="virtio"/>
<script path=""/>
<target dev="tap361d182a-e4"/>
<bandwidth>
<inbound average="984" peak="1024" burst="64"/>
<outbound average="2000" peak="2048" burst="128"/>
</bandwidth>
</interface>
Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 01b4de2b9f abstracts virDomainParseMemory()
for use by other functions in domain_conf.c
Extend the same for use, for functions outside of this file.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is a reaction to Michal's fix [1] for non-NUMA systems that also
splits out conf/ out of util/ because libvirt_util shouldn't require
libvirt_conf if it is the other way around. This particular use case
worked, but we're trying to avoid it as mentioned [2], many times.
The only functions from virnuma.c that needed numatune_conf were
virDomainNumatuneNodesetIsAvailable() and virNumaSetupMemoryPolicy().
The first one should be in numatune_conf as it works with
virDomainNumatune, the second one just needs nodeset and mode, both of
which can be passed without the need of numatune_conf.
Apart from fixing that, this patch also fixes recently added
code (between commits d2460f85^..5c8515620) that doesn't support
non-contiguous nodesets. It uses new function
virNumaNodesetIsAvailable(), which doesn't need a stub as it doesn't use
any libnuma functions, to check if every specified nodeset is available.
[1] https://www.redhat.com/archives/libvir-list/2014-November/msg00118.html
[2] http://www.redhat.com/archives/libvir-list/2011-June/msg01040.html
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
As of 90286418 the function is introduced. However, it's missing
an entry in the libvirt_private.syms so it can't be mocked.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There was no check for 'nodeset' attribute in numatune-related
elements. This patch adds validation that any nodeset specified does
not exceed maximum host node.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
This function is used to cleanup a pidfile doing whatever it takes, even
killing the owning process.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch provides the utility functions to needed to synchronize the
changes made to a guest domain network device's multicast filter
with the corresponding macvtap device's filter on the host:
* Get/add/remove multicast MAC addresses
* Get the macvtap device's RX filter list
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
To prepare for introducing a single global driver, rename the
virDriver struct to virHypervisorDriver and the registration
API to virRegisterHypervisorDriver()
The helper checks whether a string contains only whitespace or is NULL.
This will be helpful to skip cases where a user string is optional, but
may be provided empty with the same meaning.
Our qemu monitor code has a converter from key-value pairs to a json
value object. I want to re-use the code later and having it part of the
monitor command generator is inflexible. Split it out into a separate
helper.
if specifying migration_host to an Ipv6 address without brackets,
it was resolved to an incorrect address, such as:
tcp:2001:0DB8::1428:4444,
but the correct address should be:
tcp:[2001:0DB8::1428]:4444
so we should add brackets when parsing it.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
This same structure will be used to retrieve RX filter info for
interfaces on the host via netlink messages, and RX filter info for
interfaces on the guest via the qemu "query-rx-filter" command.
This new attribute will control whether or not libvirt will pay
attention to guest notifications about changes to network device mac
addresses and receive filters. The default for this is 'no' (for
security reasons). If it is set to 'yes' *and* the specified device
model and connection support it (currently only macvtap+virtio) then
libvirt will watch for NIC_RX_FILTER_CHANGED events, and when it
receives one, it will issue a query-rx-filter command, retrieve the
result, and modify the host-side macvtap interface's mac address and
unicast/multicast filters accordingly.
The functionality behind this attribute will be in a later patch. This
patch merely adds the attribute to the top-level of a domain's
<interface> as well as to <network> and <portgroup>, and adds
documentation and schema/xml2xml tests. Rather than adding even more
test files, I've just added the net attribute in various applicable
places of existing test files.
If we don't properly clean up all processes in the
machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm
starts fail with
'CreateMachine: File exists'
Additional processes can e.g. be added via
echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks
but there are other cases like
http://bugs.debian.org/761521
Invoke TerminateMachine to be on the safe side since systemd tracks the
cgroup anyway. This is a noop if all processes have terminated already.
There are now two places in libvirt which use polkit. Currently
they use pkexec, which is set to be replaced by direct DBus API
calls. Add a common API which they will both be able to use for
this purpose.
No tests are added at this time, since the impl will be gutted
in favour of a DBus API call shortly.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This new event will use typedParameters to expose what has been actually
updated and the reason is that we can in the future extend any tunable
values or add new tunable values. With typedParameters we don't have to
worry about creating some other events, we will just use this universal
event to inform user about updates.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
To express empty drive we historically use storage source with empty
path. Unfortunately NBD disks may be declared without a path.
Add a helper to wrap this logic.
Up to now, users can configure BIOS via the <loader/> element. With
the upcoming implementation of UEFI this is not enough as BIOS and
UEFI are conceptually different. For instance, while BIOS is ROM, UEFI
is programmable flash (although all writes to code section are
denied). Therefore we need new attribute @type which will
differentiate the two. Then, new attribute @readonly is introduced to
reflect the fact that some images are RO.
Moreover, the OVMF (which is going to be used mostly), works in two
modes:
1) Code and UEFI variable store is mixed in one file.
2) Code and UEFI variable store is separated in two files
The latter has advantage of updating the UEFI code without losing the
configuration. However, in order to represent the latter case we need
yet another XML element: <nvram/>. Currently, it has no additional
attributes, it's just a bare element containing path to the variable
store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The new blockcopy API wants to reuse only a subset of the disk
hotplug parser - namely, we only care about the embedded
virStorageSourcePtr inside a <disk> XML. Strange as it may
seem, it was easier to just parse an entire disk definition,
then throw away everything but the embedded source, than it
was to disentangle the source parsing code from the rest of
the overall disk parsing function. All that I needed was a
couple of tweaks and a new internal flag that determines
whether the normally-mandatory target element can be
gracefully skipped, since everything else was already optional.
* src/conf/domain_conf.h (virDomainDiskSourceParse): New
prototype.
* src/conf/domain_conf.c (VIR_DOMAIN_XML_INTERNAL_DISK_SOURCE):
New flag.
(virDomainDiskDefParseXML): Honor flag to make target optional.
(virDomainDiskSourceParse): New function.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add umask to _virCommand, allow user to set umask to command.
Set umask(002) to qemu process to overwrite the default umask
of 022 set by many distros, so that unix sockets created for
virtio-serial has expected permissions.
Fix problem reported here:
https://sourceware.org/bugzilla/show_bug.cgi?id=13078#c11https://bugzilla.novell.com/show_bug.cgi?id=888166
To use virtio-serial device, unix socket created for chardev with
default umask(022) has insufficient permissions.
e.g.:
-device virtio-serial \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-device virtserialport,chardev=foo,name=org.fedoraproject.port.0
srwxr-xr-x 1 qemu qemu 0 21. Jul 14:19 /tmp/somefile.sock
Other users in the same group (like real user, test engines, etc)
cannot write to this socket.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
virStorageBackendVolDownloadLocal and virStorageBackendVolUploadLocal
use virFDStreamOpenFile function to work with the volume fd.
virFDStreamOpenFile calls virFDStreamOpenFileInternal that implements
handling of the non-blocking I/O. If a file is not a character device and
not a fifo, it uses libvirt_iohelper.
On FreeBSD, it doesn't work as expected because disk devices (including
ZFS volumes) are exposed as character devices, and ZFS volumes do not
support open(2) with O_NONBLOCK.
To overcome this, introduce a forceIOHelper flag to
virFDStreamOpenFileInternal that forces using libvirt_iohelper. And
introduce virFDStreamOpenBlockDevice that calls
virFDStreamOpenFileInternal with the forceIOHelper set to true.
That sets a new flag, but that flag does mean the child will get
LISTEN_FDS and LISTEN_PID environment variables properly set and
passed FDs reordered so that it corresponds with LISTEN_FDS (they must
start right after STDERR_FILENO).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since not only systemd can do this (we'll be doing it as well few
patches later), change 'systemd' to 'caller' and fix LISTEN_FDS to
LISTEN_PID where applicable.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1072653
Upon successful upload of a volume, the target volume and storage pool
were not updated to reflect any changes as a result of the upload. Make
use of the existing stream close callback mechanism to force a backend
pool refresh to occur in a separate thread once the stream closes. The
separate thread should avoid potential deadlocks if the refresh needed
to wait on some event from the event loop which is used to perform
the stream callback.
This should iterate over mount tab and search for hugetlbfs among with
looking for the default value of huge pages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Leak introduced in commit 16ebf10f (v1.2.6), detected by valgrind:
==9816== 216 (96 direct, 120 indirect) bytes in 6 blocks are definitely lost in loss record 665 of 821
==9816== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9816== by 0x50836FB: virAlloc (viralloc.c:144)
==9816== by 0x1DBDBE27: udevProcessPCI (node_device_udev.c:546)
==9816== by 0x1DBDD79D: udevGetDeviceDetails (node_device_udev.c:1293)
* src/util/virpci.h (virPCIEDeviceInfoFree): New prototype.
* src/util/virpci.c (virPCIEDeviceInfoFree): New function.
* src/conf/node_device_conf.c (virNodeDevCapsDefFree): Clear
pci_express under pci case.
(virNodeDevCapPCIDevParseXML): Avoid leak.
* src/node_device/node_device_udev.c (udevProcessPCI): Likewise.
* src/libvirt_private.syms (virpci.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
virTimeFieldsThenRaw will never return negative result, so I clean up
the related meaningless judgements to make it better.
Signed-off-by: James <james.wangyufei@huawei.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Added <capabilities> in the <features> section of LXC domains
configuration. This section can contain elements named after the
capabilities like:
<mknod state="on"/>, keep CAP_MKNOD capability
<sys_chroot state="off"/> drop CAP_SYS_CHROOT capability
Users can restrict or give more capabilities than the default using
this mechanism.
Introduce a new function to parse the provided scsi_host parent address
and unique_id value in order to find the /sys/class/scsi_host directory
which will allow a stable SCSI host address
Add a test to scsihosttest to lookup the host# name by using the PCI address
and unique_id value
Introduce a new function to read the current scsi_host entry and return
the value found in the 'unique_id' file.
Add a 'scsihosttest' test (similar to the fchosttest, but incorporating some
of the concepts of the mocked pci test library) in order to read the
unique_id file like would be found in the /sys/class/scsi_host tree.
There were numerous places where numatune configuration (and thus
domain config as well) was changed in different ways. On some
places this even resulted in persistent domain definition not to be
stable (it would change with daemon's restart).
In order to uniformly change how numatune config is dealt with, all
the internals are now accessible directly only in numatune_conf.c and
outside this file accessors must be used.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since there was already public virDomainNumatune*, I changed the
private virNumaTune to match the same, so all the uses are unified and
public API is kept:
s/vir\(Domain\)\?Numa[tT]une/virDomainNumatune/g
then shrunk long lines, and mainly functions, that were created after
that:
sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g'
And to cope with the enum name, I haad to change the constants as
well:
s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g
Last thing I did was at least a little shortening of already long
name:
s/virDomainNumatuneDef/virDomainNumatune/g
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There are many places with numatune-related code that should be put
into special numatune_conf and this patch creates a basis for that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Rename linuxDomainInterfaceStats to virNetInterfaceStats in order
to allow adding platform specific implementations without
making consumer worrying about specific implementation to be used.
Also, rename util/virstatslinux.c to util/virstats.c so placing
other platform specific implementations into this file don't
look unexpected from the file name.
Add security driver functions to label separate storage images using the
virStorageSource definition. This will help to avoid the need to do ugly
changes to the disk struct and use the source directly.
Cgroups code uses VIR_CGROUP_DEVICE_* flags to specify the mode but in
the end it needs to be converted to a string. Add a helper to do it and
use it in the cgroup code before introducing it into the rest of the
code.
We are going to modify storage source chains in place. Add a helper that
will copy relevant information such as security labels to the new
element if that doesn't contain it.
The qemu block info function relied on working with local storage. Break
this assumption by adding support for remote volumes. Unfortunately we
still need to take a hybrid approach as some of the operations require a
filedescriptor.
Previously you'd get:
$ virsh domblkinfo gl vda
error: cannot stat file '/img10': Bad file descriptor
Now you get some stats:
$ virsh domblkinfo gl vda
Capacity: 10485760
Allocation: 197120
Physical: 197120
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1110198
There's a lot of places where we skip doing actions based on the
locality of given storage type. The usual pattern is to skip it if:
virStorageSourceGetActualType(src) == VIR_STORAGE_TYPE_NETWORK
Add a simple helper to simplify the pattern to
virStorageSourceIsLocalStorage(src)
Replace the inline "auth" struct in virStorageSource with a pointer
to a virStorageAuthDefPtr and utilize between the domain_conf, qemu_conf,
and qemu_command sources for finding the auth data for a domain disk
Introduce virStorageAuthDef and friends. Future patches will merge/utilize
their view of storage source/pool auth/secret definitions.
New API's include:
virStorageAuthDefParse: Parse the "<auth/>" XML data for either the
domain disk or storage pool returning a
virStorageAuthDefPtr
virStorageAuthDefCopy: Copy a virStorageAuthDefPtr - to be used by
the qemuTranslateDiskSourcePoolAuth when it
copies storage pool auth data into domain
disk auth data
virStorageAuthDefFormat: Common output of the "<auth" in the domain
disk or storage pool XML
virStorageAuthDefFree: Free memory associated with virStorageAuthDef
Subsequent patches will utilize the new functions for the domain disk and
storage pools.
Future work in the hostdev pass through can then make use of common data
structures and code.
So far only information on disks and host devices are exposed in the
capabilities XML. Well, at least something. Even a new test is
introduced. The qemu capabilities are stolen from already existing
qemucapabilities test. There's one tricky point though. Functions that
checks host's KVM and VFIO capabilities, are impossible to mock
currently. So in the test, we are setting the capabilities by hand.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This new module holds and formats capabilities for emulator. If you
are about to create a new domain, you may want to know what is the
host or hypervisor capable of. To make sure we don't regress on the
XML, the formatting is not something left for each driver to
implement, rather there's general format function.
The domain capabilities is a lockable object (even though the locking
is not necessary yet) which uses reference counter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Check if the buffer is in error state and report an error if it is.
This replaces the pattern:
if (virBufferError(buf)) {
virReportOOMError();
goto cleanup;
}
with:
if (virBufferCheckError(buf) < 0)
goto cleanup;
Document typical buffer usage to favor this.
Also remove the redundant FreeAndReset - if an error has
been set via virBufferSetError, the content is already freed.
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
This patch introduces a function that will allow us to resolve a
relative difference between two elements of a disk backing chain. This
function will be used to allow relative block commit and block pull
where we need to specify the new relative name of the image to qemu.
This patch also adds unit tests for the function to verify that it works
correctly.
virPortAllocatorSetUsed permits to set a port as already used and
prevent the port allocator to use it without any attempt to bind it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduce a common function that will take a callback to resolve links
that will be used to canonicalize paths on various storage systems and
add extensive tests.
To free string lists with some strings stolen from the middle we need to
walk the complete array. Introduce a new helper that takes the string
list size to free such string lists.
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
The image labels are stored in the virStorageSource struct. Convert the
virDomainDiskDefGetSecurityLabelDef helper not to use the full disk def
and move it appropriately.
For future work we need two functions that fetches total number of
pages and number of free pages for given NUMA node and page size
(virNumaGetPageInfo()).
Then we need to learn pages of what sizes are supported on given node
(virNumaGetPages()).
Note that system page size is disabled at the moment as there's one
issue connected. If you have a NUMA node with huge pages allocated the
kernel would return the normal size of memory for that node. It
basically ignores the fact that huge pages steal size from the system
memory. Until we resolve this, it's safer to not confuse users and
hence not report any system pages yet.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For future work we want to get info for not only the free memory
but overall memory size too. That's why the function must have
new signature too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not on all hosts the set of NUMA nodes IDs is continuous. This is
critical, because our code currently assumes the set doesn't contain
holes. For instance in nodeGetFreeMemory() we can see the following
pattern:
if ((max_node = virNumaGetMaxNode()) < 0)
return 0;
for (n = 0; n <= max_node; n++) {
...
}
while it should be something like this:
if ((max_node = virNumaGetMaxNode()) < 0)
return 0;
for (n = 0; n <= max_node; n++) {
if (!virNumaNodeIsAvailable(n))
continue;
...
}
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When the block job event was first added, it was for block pull,
where the active layer of the disk remains the same name. It was
also in a day where we only cared about local files, and so we
always had a canonical absolute file name. But two things have
changed since then: we now have network disks, where determining
a single absolute string does not really make sense; and we have
two-phase jobs (copy and active commit) where the name of the
active layer changes between the first event (ready, on the old
name) and second (complete, on the pivoted name).
Adam Litke reported that having an unstable string between events
makes life harder for clients. Furthermore, all of our API that
operate on a particular disk of a domain accept multiple strings:
not only the absolute name of the active layer, but also the
destination device name (such as 'vda'). As this latter name is
stable, even for network sources, it serves as a better string
to supply in block job events.
But backwards-compatibility demands that we should not change the
name handed to users unless they explicitly request it. Therefore,
this patch adds a new event, BLOCK_JOB_2 (alas, I couldn't think of
any nicer name - but at least Migrate2 and Migrate3 are precedent
for a number suffix). We must double up on emitting both old-style
and new-style events according to what clients have registered for
(see also how IOError and IOErrorReason emits double events, but
there the difference was a larger struct rather than changed
meaning of one of the struct members).
Unfortunately, adding a new event isn't something that can easily
be broken into pieces, so the commit is rather large.
* include/libvirt/libvirt.h.in (virDomainEventID): Add a new id
for VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2.
(virConnectDomainEventBlockJobCallback): Document new semantics.
* src/conf/domain_event.c (_virDomainEventBlockJob): Rename field,
to ensure we catch all clients.
(virDomainEventBlockJobNew): Add parameter.
(virDomainEventBlockJobDispose)
(virDomainEventBlockJobNewFromObj)
(virDomainEventBlockJobNewFromDom)
(virDomainEventDispatchDefaultFunc): Adjust clients.
(virDomainEventBlockJob2NewFromObj)
(virDomainEventBlockJob2NewFromDom): New functions.
* src/conf/domain_event.h: Add new prototypes.
* src/libvirt_private.syms (domain_event.h): Export new functions.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Generate two
different events.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Likewise.
* src/remote/remote_protocol.x
(remote_domain_event_block_job_2_msg): New struct.
(REMOTE_PROC_DOMAIN_EVENT_BLOCK_JOB_2): New RPC.
* src/remote/remote_driver.c
(remoteDomainBuildEventBlockJob2): New handler.
(remoteEvents): Register new event.
* daemon/remote.c (remoteRelayDomainEventBlockJob2): New handler.
(domainEventCallbacks): Register new event.
* tools/virsh-domain.c (vshEventCallbacks): Likewise.
(vshEventBlockJobPrint): Adjust client.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
These functions will handle PCIe devices and their link capabilities
to query some info about it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The purpose of this function is to fetch link state
and link speed for given NIC name from the SYSFS.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently it is not possible to determine the speed of an interface
and whether a link is actually detected from the API. Orchestrating
platforms want to be able to determine when the link has failed and
where multiple speeds may be available which one the interface is
actually connected at. This commit introduces an extension to our
interface XML (without implementation to interface driver backends):
<interface type='ethernet' name='eth0'>
<start mode='none'/>
<mac address='aa:bb:cc:dd:ee:ff'/>
<link speed='1000' state='up'/>
<mtu size='1492'/>
...
</interface>
Where @speed is negotiated link speed in Mbits per second, and state
is the current NIC state (can be one of the following: "unknown",
"notpresent", "down", "lowerlayerdown","testing", "dormant", "up").
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
A future patch wants to create disk definitions with non-zero
default contents; to avoid crashes, all callers that allocate
a disk definition should go through a common point.
I found allocation points by looking for any code that increments
ndisks, as well as any matches for ALLOC.*disk. Most places that
modified ndisks were covered by the parse from XML to domain/device
definition by initial domain creation or device hotplug; I also
hand-checked all drivers that generate a device struct on the
fly during getXMLDesc.
* src/conf/domain_conf.h (virDomainDiskDefNew): New prototype.
* src/conf/domain_conf.c (virDomainDiskDefNew): New function.
(virDomainDiskDefParseXML): Use it.
* src/parallels/parallels_driver.c (parallelsAddHddInfo):
Likewise.
* src/qemu/qemu_command.c (qemuParseCommandLine): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainGetXMLDesc): Likewise.
* src/vmx/vmx.c (virVMXParseDisk): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxprDisks, xenParseSxpr):
Likewise.
* src/xenxs/xen_xm.c (xenParseXM): Likewise.
* src/libvirt_private.syms (domain_conf.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
The API gets a NUMA node and find distances to other nodes. The
distances are returned in an array. If an item X within the array
equals to value of zero, then there's no such node as X.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
To allow using the array manipulation macros on the arrays returned by
virStringSplit we need to know the count of the elements in the array.
Modify virStringSplit to return this value, rename it and add a helper
with the old name so that we don't need to update all the code.
Add parsers for relative and absolute backing names for local and remote
storage files.
This parser parses relative paths as relative to their parents and
absolute paths according to the protocol or local access.
For remote storage volumes, all URI based backing file names are
supported and for the qemu colon syntax the NBD protocol is supported.
Use virStorageFileReadHeader() to read headers of storage files possibly
on remote storage to retrieve the image metadata.
The backend information is now parsed by
virStorageFileGetMetadataInternal which is now exported from the util
source and virStorageFileGetMetadataFromFDInternal now doesn't need to
be exported.
My future work will modify the metadata crawler function to use the
storage driver file APIs to access the files instead of accessing them
directly so that we will be able to request the metadata for remote
files too. To avoid linking the storage driver to every helper file
using the utils code, the backing chain traversal function needs to be
moved to the storage driver source.
Additionally the virt-aa-helper and virstoragetest programs need to be
linked with the storage driver as a result of this change.
Since there isn't a single libc API to get this value, this patch
supplies one which gets the value by grabbing current time, then
converting that into a struct tm with gmtime_r(), then back to a
time_t using mktime.
The returned value is the difference between UTC and localtime in
seconds. If localtime is ahead of UTC (east) the offset will be a
positive number, and if localtime is behind UTC (west) the offset will
be negative.
This function should be POSIX-compliant, and is threadsafe, but not
async signal safe. If it was ever necessary to know this value in a
child process, we could cache it with a one-time init function when
libvirtd starts, then just supply the cached value, but that
complexity isn't needed for current usage; that would also have the
problem that it might not be accurate after a local daylight savings
boundary.
(If it weren't for DST, we could simply replace this entire function
with "-timezone"; timezone contains the offset of the current timezone
(negated from what we want) but doesn't account for DST. And in spite
of being guaranteed by POSIX, it isn't available on older versions of
mingw.)
Signed-off-by: Eric Blake <eblake@redhat.com>
The VIR_ENUM_DECL/VIR_ENUM_IMPL helper macros already append 'Type'
to the enum name being converted; it looks silly to have functions
with 'TypeType' in their name. Even though some of our enums have
to have a 'Type' suffix, the corresponding string conversion
functions do not.
* src/conf/secret_conf.h (VIR_ENUM_DECL): Rename virSecretUsageType.
* src/conf/storage_conf.h (VIR_ENUM_DECL): Rename
virStoragePoolAuthType, virStoragePoolSourceAdapterType,
virStoragePartedFsType.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainFSDefParseXML, virDomainFSDefFormat): Update callers.
* src/conf/secret_conf.c (virSecretDefParseUsage)
(virSecretDefFormatUsage): Likewise.
* src/conf/storage_conf.c (virStoragePoolDefParseAuth)
(virStoragePoolDefParseSource, virStoragePoolSourceFormat):
Likewise.
* src/lxc/lxc_controller.c (virLXCControllerSetupLoopDevices):
Likewise.
* src/storage/storage_backend_disk.c
(virStorageBackendDiskPartFormat): Likewise.
* src/util/virstorageencryption.c (virStorageEncryptionSecretParse)
(virStorageEncryptionSecretFormat): Likewise.
* tools/virsh-secret.c (cmdSecretList): Likewise.
* src/libvirt_private.syms (secret_conf.h, storage_conf.h): Export
corrected names.
Signed-off-by: Eric Blake <eblake@redhat.com>
Move sharable PCI handling functions to domain_addr.[ch], and
change theirs prefix from 'qemu' to 'vir':
- virDomainPCIAddressAsString;
- virDomainPCIAddressBusSetModel;
- virDomainPCIAddressEnsureAddr;
- virDomainPCIAddressFlagsCompatible;
- virDomainPCIAddressGetNextSlot;
- virDomainPCIAddressReleaseSlot;
- virDomainPCIAddressReserveAddr;
- virDomainPCIAddressReserveNextSlot;
- virDomainPCIAddressReserveSlot;
- virDomainPCIAddressSetFree;
- virDomainPCIAddressSetGrow;
- virDomainPCIAddressSlotInUse;
- virDomainPCIAddressValidate;
The only change here is function names, the implementation itself
stays untouched.
Extract common allocation code from DomainPCIAddressSetCreate
into virDomainPCIAddressSetAlloc.
All callers of virStorageFileGetMetadataFromBuf were first calling
virStorageFileProbeFormatFromBuf, to learn what format to pass in.
But this function is already wired to do the exact same probe if
the incoming format is VIR_STORAGE_FILE_AUTO, so it's simpler to
just refactor the probing into the central function.
* src/util/virstoragefile.h (virStorageFileGetMetadataFromBuf):
Drop parameter.
(virStorageFileProbeFormatFromBuf): Drop declaration.
* src/util/virstoragefile.c (virStorageFileGetMetadataFromBuf):
Do probe here instead of in callers.
(virStorageFileProbeFormatFromBuf): Make static.
* src/libvirt_private.syms (virstoragefile.h): Drop function.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Update caller.
* src/storage/storage_backend_gluster.c
(virStorageBackendGlusterRefreshVol): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
strtoul() is required to parse negative numbers as their
twos-complement positive counterpart. But sometimes we want
to reject negative numbers. Add new functions to do this.
The 'p' suffix is a mnemonic for 'positive' (technically it
also parses 0, but 'non-negative' doesn't lend itself to a
nice one-letter suffix).
* src/util/virstring.h (virStrToLong_uip, virStrToLong_ulp)
(virStrToLong_ullp): New prototypes.
* src/util/virstring.c (virStrToLong_uip, virStrToLong_ulp)
(virStrToLong_ullp): New functions.
* src/libvirt_private.syms (virstring.h): Export them.
* tests/virstringtest.c (testStringToLong): Test them.
Signed-off-by: Eric Blake <eblake@redhat.com>
To avoid memory leak of the "backingStoreRaw" field when reparsing
backing chains a new function is being introduced by this patch that
shall be used to clear backing store information.
The memory leak was introduced in commit 8823272d41.
SO_REUSEADDR on Windows is actually akin to SO_REUSEPORT
on Linux/BSD. ie it allows 2 apps to listen to the same
port at once. Thus we must not set it on Win32 platforms
See http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a wrapper for readdir. This helps us make sure that we always
set errno before calling readdir and it will make sure errors are
properly logged.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
Create a nwfilterxml2firewalltest to exercise the
ebiptables_driver.applyNewRules method with a variety of
different XML input files. The XML input files are taken
from the libvirt-tck nwfilter tests. While the nwfilter
tests verify the final state of the iptables chains, this
test verifies the set of commands invoked to create the
chains.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The network and nwfilter drivers both have a need to update
firewall rules. The currently share no code for interacting
with iptables / firewalld. The nwfilter driver is fairly
tied to the concept of creating shell scripts to execute
which makes it very hard to port to talk to firewalld via
DBus APIs.
This patch introduces a virFirewallPtr object which is able
to represent a complete sequence of rule changes, with the
ability to have multiple transactional checkpoints with
rollbacks. By formally separating the definition of the rules
to be applied from the mechanism used to apply them, it is
also possible to write a firewall engine that uses firewalld
DBus APIs natively instead of via the slow firewalld-cmd.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virNWFilterRuleIsProtocol{Ethernet,IPv4,IPv6} helper methods
to avoid having to write a giant switch statements with many cases.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of hardcoding LIBEXECDIR as the location of the libvirt_iohelper
binary, use virFileFindResource to optionally find it in the current
build directory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virFileFindResource which will try to locate files
in the local build tree if the calling binary (eg libvirtd or
test suite) is being run from the build tree. The corresponding
virFileActivateDirOverride should be called at startup passing
in argv[0]. This will be examined for evidence of libtool magic
binary prefix / sub-directory in order to activate the override.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Each backing store of a given disk is associated with a unique index
(which is also formatted in domain XML) for easier addressing of any
particular backing store. With this patch, any backing store can be
addressed by its disk target and the index. For example, "vdc[4]"
addresses the backing store with index equal to 4 of the disk identified
by "vdc" target. Such shorthand can be used in any API in place for a
backing file path:
virsh blockcommit domain vda --base vda[3] --top vda[2]
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Remove the pointer from def->cputune.vcpupin after unplugging
the CPU and also free the bitmap contained in the structure
by calling virDomainVcpuPinDel instead of VIR_FREE.
Introduced by commit 0df1a79.
This makes virDomainLookupVcpuPin redundant.
https://bugzilla.redhat.com/show_bug.cgi?id=1088165
When checking if two filenames point to the same inode (whether
by hardlink or symlink), sometimes one of the names might be
relative. This convenience function makes it easier to check.
* src/util/virfile.h (virFileRelLinkPointsTo): New prototype.
* src/util/virfile.c (virFileRelLinkPointsTo): New function.
* src/libvirt_private.syms (virfile.h): Export it.
* src/xen/xm_internal.c (xenXMDomainGetAutostart): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Deciding if a user string represents a local file instead of a
network path is an operation worth exposing directly, particularly
since the next patch will be removing a redundant variable that
was caching the information.
* src/util/virstoragefile.h (virStorageIsFile): New declaration.
* src/util/virstoragefile.c (virBackingStoreIsFile): Rename...
(virStorageIsFile): ...export, and allow NULL input.
(virStorageFileGetMetadataInternal)
(virStorageFileGetMetadataRecurse, virStorageFileGetMetadata):
Update callers.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Use it.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
* src/libvirt_private.syms (virstoragefile.h): Export function.
Signed-off-by: Eric Blake <eblake@redhat.com>
Since it is an abbreviation, PCI should always be fully
capitalized or full lower case, never Pci.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I almost wrote a hash value free function that just called
VIR_FREE, then realized I couldn't be the first person to
do that. Sure enough, it was worth factoring into a common
helper routine.
* src/util/virhash.h (virHashValueFree): New function.
* src/util/virhash.c (virHashValueFree): Implement it.
* src/util/virobject.h (virObjectFreeHashData): New function.
* src/libvirt_private.syms (virhash.h, virobject.h): Export them.
* src/nwfilter/nwfilter_learnipaddr.c (virNWFilterLearnInit): Use
common function.
* src/qemu/qemu_capabilities.c (virQEMUCapsCacheNew): Likewise.
* src/qemu/qemu_command.c (qemuDomainCCWAddressSetCreate):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorGetBlockInfo): Likewise.
* src/qemu/qemu_process.c (qemuProcessWaitForMonitor): Likewise.
* src/util/virclosecallbacks.c (virCloseCallbacksNew): Likewise.
* src/util/virkeyfile.c (virKeyFileParseGroup): Likewise.
* tests/qemumonitorjsontest.c
(testQemuMonitorJSONqemuMonitorJSONGetBlockInfo): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Move some functions out of domain_conf for use in the next
patch where snapshot starts to directly use structs in
virstoragefile.
* src/conf/domain_conf.c (virDomainDiskDefFree)
(virDomainDiskSourcePoolDefParse): Adjust callers.
(virDomainDiskSourceDefClear, virDomainDiskSourcePoolDefFree)
(virDomainDiskAuthClear): Move...
* src/util/virstoragefile.c (virStorageSourceClear)
(virStorageSourcePoolDefFree, virStorageSourceAuthClear): ...and
rename.
* src/conf/domain_conf.h (virDomainDiskAuthClear): Drop
declaration.
* src/qemu/qemu_conf.c (qemuTranslateDiskSourcePool): Adjust
caller.
* src/util/virstoragefile.h: Declare them.
* src/libvirt_private.syms (virstoragefile.h): Export them.
Signed-off-by: Eric Blake <eblake@redhat.com>
The code in virstoragefile.c is getting more complex as I
consolidate backing chain handling code. But for the setuid
virt-login-shell, we don't need to crawl backing chains. It's
easier to audit things for setuid security if there are fewer
files involved, so this patch moves the one function that
virFileOpen() was actually relying on to also live in virfile.c.
* src/util/virstoragefile.c (virStorageFileIsSharedFS)
(virStorageFileIsSharedFSType): Move...
* src/util/virfile.c (virFileIsSharedFS, virFileIsSharedFSType):
...to here, and rename.
(virFileOpenAs): Update caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetFileconHelper)
(virSecuritySELinuxSetSecurityAllLabel)
(virSecuritySELinuxRestoreSecurityImageLabelInt): Likewise.
* src/security/security_dac.c
(virSecurityDACRestoreSecurityImageLabelInt): Likewise.
* src/qemu/qemu_driver.c (qemuOpenFileAs): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsSafe): Likewise.
* src/util/virstoragefile.h: Adjust declarations.
* src/util/virfile.h: Likewise.
* src/libvirt_private.syms (virfile.h, virstoragefile.h): Move
symbols as appropriate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Another struct being moved to util. This one doesn't have as
much use yet, thankfully.
* src/conf/domain_conf.h (virDomainDiskSourcePoolMode)
(virDomainDiskSourcePoolDef): Move...
* src/util/virstoragefile.h (virStorageSourcePoolMode)
(virStorageSourcePoolDef): ...and rename.
* src/conf/domain_conf.c (virDomainDiskSourcePoolDefFree)
(virDomainDiskSourceDefClear, virDomainDiskSourcePoolDefParse)
(virDomainDiskDefParseXML, virDomainDiskSourceDefParse)
(virDomainDiskSourceDefFormatInternal)
(virDomainDiskDefForeachPath, virDomainDiskSourceIsBlockType):
Adjust clients.
* src/qemu/qemu_conf.c (qemuTranslateDiskSourcePool): Likewise.
* src/libvirt_private.syms (domain_conf.h): Move symbols...
(virstoragefile.h): ...as appropriate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Encryption keys can be associated with each source file in a
backing chain; as such, this file belongs more in util/ where
it can be used by virstoragefile.h.
* src/conf/storage_encryption_conf.h: Rename...
* src/util/virstorageencryption.h: ...to this.
* src/conf/storage_encryption_conf.c: Rename...
* src/util/virstorageencryption.c: ...to this.
* src/Makefile.am (ENCRYPTION_CONF_SOURCES, CONF_SOURCES)
(UTIL_SOURCES): Update to new file names.
* src/libvirt_private.syms: Likewise.
* src/conf/domain_conf.h: Update client.
* src/conf/storage_conf.h: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order to reuse the newly-created host-side disk struct in
the virstoragefile backing chain code, I first have to move
it to util/. This starts the process, by first moving the
security label structures.
* src/conf/domain_conf.h (virDomainDefGenSecurityLabelDef)
(virDomainDiskDefGenSecurityLabelDef, virSecurityLabelDefFree)
(virSecurityDeviceLabelDefFree, virSecurityLabelDef)
(virSecurityDeviceLabelDef): Move...
* src/util/virseclabel.h: ...to new file.
(virSecurityLabelDefNew, virSecurityDeviceLabelDefNew): Rename the
GenSecurity functions.
* src/qemu/qemu_process.c (qemuProcessAttach): Adjust callers.
* src/security/security_manager.c (virSecurityManagerGenLabel):
Likewise.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityFileLabel): Likewise.
* src/util/virseclabel.c: New file.
* src/conf/domain_conf.c: Move security code, and fix fallout.
* src/Makefile.am (UTIL_SOURCES): Build new file.
* src/libvirt_private.syms (domain_conf.h): Move symbols...
(virseclabel.h): ...to new section.
Signed-off-by: Eric Blake <eblake@redhat.com>
To ease mocking for bhyve unit tests move virBhyveTapGetRealDeviceName()
out of bhyve_command.c to virnetdevtap and rename it to
virNetDevTapGetRealDeviceName().
A future patch will split virDomainDiskDef, in order to track
multiple host resources per guest <disk>. To reduce the size
of that patch, I've factored out the four most common accesses
into functions, so that I can incrementally upgrade the code
base to use the accessors, and so that code that doesn't care
about the distinction of per-file details won't have to be
changed when the struct changes.
* src/conf/domain_conf.h (virDomainDiskGetType)
(virDomainDiskSetType, virDomainDiskGetSource)
(virDomainDiskSetSource, virDomainDiskGetDriver)
(virDomainDiskSetDriver, virDomainDiskGetFormat)
(virDomainDiskSetFormat): New prototypes.
* src/conf/domain_conf.c (virDomainDiskGetType)
(virDomainDiskSetType, virDomainDiskGetSource)
(virDomainDiskSetSource, virDomainDiskGetDriver)
(virDomainDiskSetDriver, virDomainDiskGetFormat)
(virDomainDiskSetFormat): Implement them.
* src/libvirt_private.syms (domain_conf.h): Export them.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add virFDStreamOpenPTY() function which is a wrapper around
virFDStreamOpenFileInternal() with putting the device it opens into a
raw mode.
Make virChrdevOpen() use virFDStreamOpenPTY() for
VIR_DOMAIN_CHR_TYPE_PTY devices.
This fixes mangled console output when libvirt runs on FreeBSD as it
requires device it opens to be placed into a raw mode explicitly.
The test suites often have to create DBus method reply messages
with payloads. Create two helpers for simplifying the process
of creating replies with payloads.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Split the virDBusMethodCall method into a couple of new methods
virDBusCall, virDBusCreateMethod and virDBusCreateMethodV.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The offset of virDomainDeviceInfo structure within a device definition
varies with device type and some types do not contain the info structure
at all. This new API makes it easier to access the info structure from a
generic virDomainDeviceDef structure.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Create qemu monitor events as a distinct class to normal domain
events, because they will be filtered differently. For ease of
review, the logic for filtering by event name is saved for a later
patch.
* src/conf/domain_event.c (virDomainQemuMonitorEventClass): New
class.
(virDomainEventsOnceInit): Register it.
(virDomainQemuMonitorEventDispose, virDomainQemuMonitorEventNew)
(virDomainQemuMonitorEventDispatchFunc)
(virDomainQemuMonitorEventStateRegisterID): New functions.
* src/conf/domain_event.h (virDomainQemuMonitorEventNew)
(virDomainQemuMonitorEventStateRegisterID): New prototypes.
* src/libvirt_private.syms (conf/domain_conf.h): Export them.
A earlier commit changed the global log buffer so that it only
records messages that are explicitly requested via the log
filters setting. This removes the performance burden, and
improves the signal/noise ratio for messages in the global
buffer. At the same time though, it is somewhat pointless, since
all the recorded log messages are already going to be sent to an
explicit log output like syslog, stderr or the journal. The
global log buffer is thus just duplicating this data on stderr
upon crash.
The log_buffer_size config parameter is left in the augeas
lens to prevent breakage for users on upgrade. It is however
completely ignored hereafter.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As part of the goal to get away from doing string matching on
filenames when deciding whether to emit a log message, turn
the virLogSource enum into a struct which contains a log
"name". There will eventually be one virLogSource instance
statically declared per source file. To minimise churn in this
commit though, a single global instance is used.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some virHostdevXXXX methods included the string Hostdev again
as a suffix. Change the latter to Device instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change any method names with Usb, Pci or Scsi to use
USB, PCI and SCSI since they are abbreviations.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The future QEMU capabilities cache needs to be able to invalidate
itself if the libvirtd binary or any loadable modules are changed
on disk. Record the 'ctime' value for these binaries and provide
helper APIs to query it. This approach assumes that if libvirt.so
is changed, then libvirtd will also change, which should usually
be the case with libtool's wrapper scripts that cause libvirtd to
get re-linked
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
GNULIB provides APIs for calculating md5 and sha256 hashes,
but these APIs only return you raw byte arrays. Most users
in libvirt want the hash in printable string format. Add
some helper APIs in util/vircrypto.{c,h} for doing this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Auditing all callers of virCommandRun and virCommandWait that
passed a non-NULL pointer for exit status turned up some
interesting observations. Many callers were merely passing
a pointer to avoid the overall command dying, but without
caring what the exit status was - but these callers would
be better off treating a child death by signal as an abnormal
exit. Other callers were actually acting on the status, but
not all of them remembered to filter by WIFEXITED and convert
with WEXITSTATUS; depending on the platform, this can result
in a status being reported as 256 times too big. And among
those that correctly parse the output, it gets rather verbose.
Finally, there were the callers that explicitly checked that
the status was 0, and gave their own message, but with fewer
details than what virCommand gives for free.
So the best idea is to move the complexity out of callers and
into virCommand - by default, we return the actual exit status
already cleaned through WEXITSTATUS and treat signals as a
failed command; but the few callers that care can ask for raw
status and act on it themselves.
* src/util/vircommand.h (virCommandRawStatus): New prototype.
* src/libvirt_private.syms (util/command.h): Export it.
* docs/internals/command.html.in: Document it.
* src/util/vircommand.c (virCommandRawStatus): New function.
(virCommandWait): Adjust semantics.
* tests/commandtest.c (test1): Test it.
* daemon/remote.c (remoteDispatchAuthPolkit): Adjust callers.
* src/access/viraccessdriverpolkit.c (virAccessDriverPolkitCheck):
Likewise.
* src/fdstream.c (virFDStreamCloseInt): Likewise.
* src/lxc/lxc_process.c (virLXCProcessStart): Likewise.
* src/qemu/qemu_command.c (qemuCreateInBridgePortWithHelper):
Likewise.
* src/xen/xen_driver.c (xenUnifiedXendProbe): Simplify.
* tests/reconnect.c (mymain): Likewise.
* tests/statstest.c (mymain): Likewise.
* src/bhyve/bhyve_process.c (virBhyveProcessStart)
(virBhyveProcessStop): Don't overwrite virCommand error.
* src/libvirt.c (virConnectAuthGainPolkit): Likewise.
* src/openvz/openvz_driver.c (openvzDomainGetBarrierLimit)
(openvzDomainSetBarrierLimit): Likewise.
* src/util/virebtables.c (virEbTablesOnceInit): Likewise.
* src/util/viriptables.c (virIpTablesOnceInit): Likewise.
* src/util/virnetdevveth.c (virNetDevVethCreate): Fix debug
message.
* src/qemu/qemu_capabilities.c (virQEMUCapsInitQMP): Add comment.
* src/storage/storage_backend_iscsi.c
(virStorageBackendISCSINodeUpdate): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Thanks to namespaces, we have a couple of places in the code
base that want to reflect a child exit status, including the
ability to detect death by a signal, back to a grandparent.
Best to make it a reusable function.
* src/util/virprocess.h (virProcessExitWithStatus): New prototype.
* src/libvirt_private.syms (util/virprocess.h): Export it.
* src/util/virprocess.c (virProcessExitWithStatus): New function.
* tests/commandtest.c (test23): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
This function is needed for user namespaces, where we need to chmod()
the cgroup to the initial uid/gid such that systemd is allowed to
use the cgroup.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a virStringSearch method to virstring.{c,h} which performs
a regex match against a string and returns the matching substrings.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Systemd does not forget about the cases, where client service needs to
wait for daemon service to initialize and start accepting new clients.
Setting a dependency in client is not enough as systemd doesn't know
when the daemon has initialized itself and started accepting new
clients. However, it offers a mechanism to solve this. The daemon needs
to call a special systemd function by which the daemon tells "I'm ready
to accept new clients". This is exactly what we need with
libvirtd-guests (client) and libvirtd (daemon). So now, with this
change, libvirt-guests.service is invoked not any sooner than
libvirtd.service calls the systemd notify function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The virDomainGetRootFilesystem method can be generalized to allow
any filesystem path to be obtained.
While doing this, start a new test case for purpose of testing various
helper methods in the domain_conf.{c,h} files, such as this one.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Basically, the idea is copied from domain code, where tainting
exists for a while. Currently, only one taint reason exists -
VIR_NETWORK_TAINT_HOOK to mark those networks which caused invoking
of hook script.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the next patch I'm going to need the network format function that
takes virBuffer as argument. However, slightly change of name is more
appropriate then: virNetworkDefFormatBuf to match the rest of functions
that format an object to buffer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Implement virProcessRunInMountNamespace, which runs callback of type
virProcessNamespaceCallback in a container namespace. This uses a
child process to run the callback, since you can't change the mount
namespace of a thread. This implies that callbacks have to be careful
about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add a helper function which takes a file path and ensures
that all directory components leading up to the file exist.
IOW, it strips the filename part of the path and passes
the result to virFileMakePath.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
All the data for getting the actual type is present in the snapshot
config. There is no need to have this function private to the qemu
driver and it will be re-used later in other parts of libvirt
All the data for getting the actual type is present in the domain
config. There is no need to have this function private to the qemu
driver and it will be re-used later in other parts of libvirt
virConf now honours a VIR_CONF_FLAG_LXC_FORMAT flag to handle LXC
configuration files. The differences are that property names can
contain '.' character and values are all strings without any bounding
quotes.
Provide a new virConfWalk function calling a handler on all non-comment
values. This function will be used by the LXC conversion code to loop
over LXC configuration lines.
This commit allows to attach/detach a <filesystem> device in qemu. For
this purpose I'm introducing two new functions: virDomainFSInsert() and
virDomainFSRemove() and adding necessary code in the qemu driver. It
compares filesystems based on their "destination" folder. So if two
filesystems share the same destination, they are considered equal and
the qemu driver would reject the insertion.
Signed-off-by: Matthieu Coudron <mattator@gmail.com>
virKModConfig() - Return a buffer containing kernel module configuration
virKModLoad() - Load a specific module into the kernel configuration
virKModUnload() - Unload a specific module from the kernel configuration
virKModIsBlacklisted() - Determine whether a module is blacklisted within
the kernel configuration
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It doesn't make sense to fail if the SCSI host device is specified
as "shareable" explicitly between domains (NB, it works if and only
if the device is specified as "shareable" for *all* domains,
otherwise it fails).
To fix the problem, this patch introduces an array for virSCSIDevice
struct, which records all the names of domain which are using the
device (note that the recorded domains must specify the device as
shareable). And the change on the data struct brings on many
subsequent changes in the code.
Prior to this patch, the "shareable" tag didn't work as expected,
it actually work like "non-shareable". So this patch also added notes
in formatdomain.html to declare the fact.
* src/util/virscsi.h:
- Remove virSCSIDeviceGetUsedBy
- Change definition of virSCSIDeviceGetUsedBy and virSCSIDeviceListDel
- Add virSCSIDeviceIsAvailable
* src/util/virscsi.c:
- struct virSCSIDevice: Change "used_by" to be an array; Add
"n_used_by" as the array count
- virSCSIDeviceGetUsedBy: Removed
- virSCSIDeviceFree: frees the "used_by" array
- virSCSIDeviceSetUsedBy: Copy the domain name to avoid potential
memory corruption
- virSCSIDeviceIsAvailable: New
- virSCSIDeviceListDel: Change the logic, for device which is already
in the list, just remove the corresponding entry in "used_by". And
since it's only used in one place, we can safely removing the code
to find out the dev in the list first.
- Copyright updating
* src/libvirt_private.sys:
- virSCSIDeviceGetUsedBy: Remove
- virSCSIDeviceIsAvailable: New
* src/qemu/qemu_hostdev.c:
- qemuUpdateActiveScsiHostdevs: Check if the device existing before
adding it to the list;
- qemuPrepareHostdevSCSIDevices: Error out if the not all domains
use the device as "shareable"; Also don't try to add the device
to the activeScsiHostdevs list if it already there; And make
more sensible error w.r.t the current "shareable" value in
driver->activeScsiHostdevs.
- qemuDomainReAttachHostScsiDevices: Change the logic according
to the changes on helpers.
Signed-off-by: Osier Yang <jyang@redhat.com>
This reverts commit 2996e6be19
and some parts of 2636dc8c4d.
The former one tried to implement QoS setting on bridgeless networks.
However, as discussed upstream [1], the patch is far away from being
useful in even a single case. The whole idea of network QoS is to have
aggregated limits over several interfaces. This patch is doing
completely the opposite when merging two QoS settings (from the network
and the domain interface) into one which is then set at the domain
interface itself, not the network.
The latter one is the test for the previous one. Now none of them makes
sense.
1: https://www.redhat.com/archives/libvir-list/2014-January/msg01441.html
Conflicts:
tests/virnetdevbandwidthtest.c: New test has been introduced since
then.
There are some units within libvirt that utilize virCommand API to run
some commands and deserve own unit testing. These units are, however,
not desired to be rewritten to dig virCommand API usage out. As a great
example virNetDevBandwidth could be used. The problem with the bandwidth
unit is: it uses virCommand API heavily. Therefore we need a mechanism
to not really run a command, but rather see its string representation
after which we can decide if the unit construct the correct sequence of
commands or not.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1055484
Currently, libvirt's XML schema of network allows QoS to be defined for
every network even though it has no bridge. For instance:
<network>
<name>vdsm-no-bridge</name>
<forward mode='passthrough'>
<interface dev='em1.10'/>
</forward>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='1000' burst='1024'/>
</bandwidth>
</network>
The bandwidth limitations can be, however, applied even on such
networks. In fact, they are going to be applied on the interface that
will be connected to the network on a domain startup. This approach,
however, has one limitation. With bridged networks, there are two points
where QoS can be set: bridge and domain interface. The lower limit of
the two is enforced then. For instance, if the interface has 10Mbps
average, but the network only 1Mbps, there's no way for interface to
transmit packets faster than the 1Mbps limit. With two points this is
enforced by kernel. With only one point, we must combine both QoS
settings into one which is set afterwards. Look at
virNetDevBandwidthMinimal() and you'll understand immediately what I
mean.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Unlike the host devices of other types, SCSI host device XML supports
"shareable" tag. This patch introduces it for the virSCSIDevice struct
for a later patch use (to detect if the SCSI device is shareable when
preparing the SCSI host device in QEMU driver).
spice-server offers an API to disable file transfer messages
on the agent channel between the client and the guest.
This is supported in qemu through the disable-agent-file-xfer option.
This patch exposes this option to libvirt.
Adds a new element 'filetransfer', with one property,
'enable', which accepts a boolean.
Default is enabled, for backward compatibility.
Depends on the capability exported in the first patch of the series.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch introduces virCgroupSetBlkioDeviceReadIops,
virCgroupSetBlkioDeviceWriteIops,
virCgroupSetBlkioDeviceReadBps and
virCgroupSetBlkioDeviceWriteBps,
we can use these interfaces to set up throttle
blkio cgroup for domain.
This patch also adds the new throttle blkio cgroup
elements to the test xml.
Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
In order to mirror a server with per-object filtering, the client
needs to track which server callbackID is servicing the client
callback. This patch introduces the notion of a serverID, as
well as the plumbing to use it for network events, although the
actual complexity of using per-object filtering in the remote
driver is deferred to a later patch.
* src/conf/object_event.h (virObjectEventStateEventID): Add parameter.
(virObjectEventStateQueueRemote, virObjectEventStateSetRemote):
New prototypes.
(virObjectEventStateRegisterID): Move...
* src/conf/object_event_private.h: ...here, and add parameter.
(_virObjectEvent): Add field.
* src/conf/network_event.h (virNetworkEventStateRegisterClient): New
prototype.
* src/conf/object_event.c (_virObjectEventCallback): Add field.
(virObjectEventStateSetRemote): New function.
(virObjectEventStateQueue): Make wrapper around...
(virObjectEventStateQueueRemote): New function.
(virObjectEventCallbackListCount): Tweak return count when remote
id matching is used.
(virObjectEventCallbackLookup, virObjectEventStateRegisterID):
Tweak registration when remote id matching will be used.
(virObjectEventNew): Default to no remote id.
(virObjectEventCallbackListAddID): Likewise, but set remote id
when one is available.
(virObjectEventCallbackListRemoveID)
(virObjectEventCallbackListMarkDeleteID): Adjust return value when
remote id was set.
(virObjectEventStateEventID): Query existing id.
(virObjectEventDispatchMatchCallback): Require matching event id.
(virObjectEventStateCallbackID): Adjust caller.
* src/conf/network_event.c (virNetworkEventStateRegisterClient): New
function.
(virNetworkEventStateRegisterID): Update caller.
* src/conf/domain_event.c (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Update callers.
* src/remote/remote_driver.c
(remoteConnectNetworkEventRegisterAny)
(remoteConnectNetworkEventDeregisterAny)
(remoteConnectDomainEventDeregisterAny): Likewise.
(remoteEventQueue): Hoist earlier to avoid forward declaration,
and add parameter. Adjust all callers.
* src/libvirt_private.syms (conf/object_event.h): Drop function.
Signed-off-by: Eric Blake <eblake@redhat.com>
When the host is configured with very restrictive firewall (default policy
is DROP for all chains, including OUTPUT), the bridge driver for Linux
adds netfilter entries to allow DHCP and DNS requests to go from the VM
to the dnsmasq of the host.
The issue that this commit fixes is the fact that a DROP policy on the OUTPUT
chain blocks the DHCP replies from the host’s dnsmasq to the VM.
As DHCP replies are sent in UDP, they are not caught by any --ctstate ESTABLISHED
rule and so, need to be explicitly allowed.
Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr.eu.org>
While working on events, I found a number of minor issues; I'm
hoisting these to the front rather than doing it piecemeal in
the patches where I first noticed bad or missing documentation.
* src/conf/object_event.c: Fix grammar, document all parameters
of public functions, wrap some long lines.
* src/conf/object_event.h: Likewise.
* src/conf/network_event.c: Likewise.
* src/conf/domain_event.c: Likewise (except for the large number
of event creation functions).
* src/libvirt_private.cyms (conf/object_event.h): Split...
(conf/network_event.h): ...to account for new file.
Signed-off-by: Eric Blake <eblake@redhat.com>
Define the public API for (de-)registering network events
and the callbacks for receiving lifecycle events. The lifecycle
event includes a 'detail' parameter to match the domain lifecycle
event data, but this is currently unused.
The network events related code goes into its own set of internal
files src/conf/network_event.[ch]
Before this patch, the translation function still needs a second ugly
helper function to actually format the command line for qemu. But if we
do the right stuff in the translation function, we don't have to bother
with the second function any more.
This patch removes the messy qemuBuildVolumeString function and changes
qemuTranslateDiskSourcePool to set stuff up correctly so that the
regular code paths meant for volumes can be used to format the command
line correctly.
For this purpose a new helper "qemuDiskGetActualType()" is introduced to
return the type of the volume in a pool.
As a part of the refactor the qemuTranslateDiskSourcePool function is
fixed to do decisions based on the pool type instead of the volume type.
This allows to separate pool-type-specific stuff more clearly and will
ease addition of other pool types that will require certain other
operations to get the correct pool source.
The previously fixed tests should make sure that we don't break stuff
that was working before.
Move the code for lxcContainerGetSubtree into the virfile
module creating 2 new functions
int virFileGetMountSubtree(const char *mtabpath,
const char *prefix,
char ***mountsret,
size_t *nmountsret);
int virFileGetMountReverseSubtree(const char *mtabpath,
const char *prefix,
char ***mountsret,
size_t *nmountsret);
Add a new virfiletest.c test case to validate the new code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This gets rid of another stat() per volume, as well as cutting
bytes read in half, when populating the volumes of a directory
pool during a pool refresh. Not to mention that it provides an
interface that can let gluster pools also probe file types.
* src/util/virstoragefile.h (virStorageFileProbeFormatFromFD):
Delete.
(virStorageFileProbeFormatFromBuf): New prototype.
(VIR_STORAGE_MAX_HEADER): New constant, based on...
* src/util/virstoragefile.c (STORAGE_MAX_HEAD): ...old name.
(vmdk4GetBackingStore, virStorageFileGetMetadataInternal)
(virStorageFileProbeFormat): Adjust clients.
(virStorageFileProbeFormatFromFD): Delete.
(virStorageFileProbeFormatFromBuf): Export.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Adjust client.
* src/libvirt_private.syms (virstoragefile.h): Adjust exports.
Signed-off-by: Eric Blake <eblake@redhat.com>
Future patches will want to learn metadata about a file using
a buffer that was already parsed in order to probe the file's
format. Rather than reopening and re-reading the file, it makes
sense to separate getting file contents from actually parsing
those contents.
* src/util/virstoragefile.c (virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFDInternal): New functions.
(virStorageFileGetMetadataInternal): Hoist fstat() and read() into
callers.
(virStorageFileGetMetadataFromFD)
(virStorageFileGetMetadataRecurse): Rework clients.
* src/util/virstoragefile.h (virStorageFileGetMetadataFromBuf):
New prototype.
* src/libvirt_private.syms (virstoragefile.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Our backing file chain code was not very robust to an ill-timed
EINTR, which could lead to a short read causing us to randomly
treat metadata differently than usual. But the existing
virFileReadLimFD forces an error if we don't read the entire
file, even though we only care about the header of the file.
So add a new virFile function that does what we want.
* src/util/virfile.h (virFileReadHeaderFD): New prototype.
* src/util/virfile.c (virFileReadHeaderFD): New function.
* src/libvirt_private.syms (virfile.h): Export it.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal)
(virStorageFileProbeFormatFromFD): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add a function for efficiently checking if a path is a filesystem
mount point.
NB will not work for bind mounts, only true filesystem mounts.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This makes virCPUx86DataAddCPUID, virCPUx86DataFree, and
virCPUx86MakeData available for direct usage outside of cpu driver in
tests and the new qemu monitor that will request the actual CPU
definition from a running qemu instance.
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:
<secmodel>
<model>selinux</model>
<doi>0</doi>
<baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
<baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
</secmodel>
<secmodel>
<model>dac</model>
<doi>0</doi>
<baselabel type='kvm'>107:107</baselabel>
<baselabel type='qemu'>107:107</baselabel>
</secmodel>
"baselabel" is driver-specific information, e.g. in the DAC security
model, it indicates USER_ID:GROUP_ID.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When running setuid, we must be careful about what env vars
we allow commands to inherit from us. Replace the
virCommandAddEnvPass function with two new ones which do
filtering
virCommandAddEnvPassAllowSUID
virCommandAddEnvPassBlockSUID
And make virCommandAddEnvPassCommon use the appropriate
ones
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Care must be taken accessing env variables when running
setuid. Introduce a virGetEnvAllowSUID for env vars which
are safe to use in a setuid environment, and another
virGetEnvBlockSUID for vars which are not safe. Also add
a virIsSUID helper method for any other non-env var code
to use.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds cpuDataFormat and cpuDataParse APIs to be used in unit
tests for testing APIs that deal with virCPUData. In the x86 world, this
means we can now store/load arbitrary CPUID data in the test suite to
check correctness of CPU related APIs that could not be tested before.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The dbus_bus_get() function returns a shared bus connection that
all libraries in a process can use. You are forbidden from calling
close on this connection though, since you can never know if any
other code might be using it.
Add an option to use private dbus bus connections, if the app
wants to be able to close the connection.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This function takes exactly one argument: an address to check.
It returns true, if the address is an IPv4 or IPv6 address in numeric
format, false otherwise (e.g. for "examplehost").
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
I tried to test ./configure --without-lxc --without-remote.
First, the build failed with some odd errors, such as an
inability to build xen, or link failures for virNetTLSInit.
But when you think about it, once there is no remote code,
all of libvirtd is useless, any stateful driver that depends
on libvirtd is also not worth compiling, and any libraries
used only by RPC code are not needed. So I patched
configure.ac to make for some saner defaults when an
explicit disable is attempted. Similarly, since we have
migrated virnetdevbridge into generic code, the workaround
for Linux kernel stupidity must not depend on stateful
drivers being in use.
Then there's 'make check' that needs segregation.
Wow - quite a bit of cleanup to make --without-remote useful :)
* configure.ac: Let --without-remote toggle defaults on stateful
drivers and other libraries. Pick up Linux kernel workarounds
even when qemu and lxc are not being compiled.
* tests/Makefile.am (test_programs): Factor out programs that
require remote.
* src/libvirt_private.syms (rpc/virnet*.h): Move...
* src/libvirt_remote.syms: ...into new file.
* src/Makefile.am (SYM_FILES): Ship new syms file.
Signed-off-by: Eric Blake <eblake@redhat.com>
The problem is described by [0] but its effect on libvirt is that
starting a container with a full distro running systemd after having
stopped it simply fails.
The container cleanup now calls the machined Terminate function to make
sure that everything is in order for the next run.
[0]: https://bugs.freedesktop.org/show_bug.cgi?id=68370
The functions
- iptablesAddForwardDontMasquerade(),
- iptablesRemoveForwardDontMasquerade
handle exceptions in the masquerading implemented in the POSTROUTING chain
of the "nat" table. Such exceptions should be added as chronologically
latest, logically top-most rules.
The bridge driver will call these functions beginning with the next patch:
some special destination IP addresses always refer to the local
subnetwork, even though they don't match any practical subnetwork's
netmask. Packets from virbrN targeting such IP addresses are never routed
outwards, but the current rules treat them as non-virbrN-destined packets
and masquerade them. This causes problems for some receivers on virbrN.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
To allow creation of a virNetSocketPtr instance from a pre-opened
socketpair FD, add a virNetSocketNewConnectSockFD method.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virDomainGetMetadata function was designed to support also retrieval
of app specific metadata from the <metadata> element. This functionality
was never implemented originally.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
allias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
depending memory to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
Some systems may not use DBus in their system. Add a method to check if
the system bus is available that doesn't print error messages so that
code can later check for this condition and use an alternative approach.
*src/util/virstoragefile.c: Add a helper function to get
the first name of missing backing files, if the name is NULL,
it means the diskchain is not broken.
*src/qemu/qemu_domain.c: qemuDiskChainCheckBroken(disk) to
check if its chain is broken
There are some interesting escaping rules to consider when dealing
with systemd slice/scope names. Thus it is helpful to have APIs
for formatting names
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This function is needed for virt-login-shell. Also modify virGirUserDirectory
to use the new function, to simplify the code.
Signed-off-by: Eric Blake <eblake@redhat.com>
The virCgroupIsValidMachine does not need to be called from
outside the cgroups file now, so make it static.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring one API call to create a cgroup and
another to add a task to it, introduce a new API
virCgroupNewMachine which does both jobs at once. This
will facilitate the later code to talk to systemd to
achieve this job which is also atomic.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Parsing 'user:group' is useful even outside the DAC security driver,
so expose the most abstract function which has no DAC security driver
bits in itself.
Since PCI bridges, PCIe bridges, PCIe switches, and PCIe root ports
all share the same namespace, they are all defined as controllers of
type='pci' in libvirt (but with a differing model attribute). Each of
these controllers has a certain connection type upstream, allows
certain connection types downstream, and each can either allow a
single downstream connection at slot 0, or connections from slot 1 -
31.
Right now, we only support the pci-root and pci-bridge devices, both
of which only allow PCI devices to connect, and both which have usable
slots 1 - 31. In preparation for adding other types of controllers
that have different capabilities, this patch 1) adds info to the
qemuDomainPCIAddressBus object to indicate the capabilities, 2) sets
those capabilities appropriately for pci-root and pci-bridge devices,
and 3) validates that the controller being connected to is the proper
type when allocating slots or validating that a user-selected slot is
appropriate for a device..
Having this infrastructure in place will make it much easier to add
support for the other PCI controller types.
While it would be possible to do all the necessary checking by just
storing the controller model in the qemyuDomainPCIAddressBus, it
greatly simplifies all the validation code to also keep a "flags",
"minSlot" and "maxSlot" for each - that way we can just check those
attributes rather than requiring a nearly identical switch statement
everywhere we need to validate compatibility.
You may notice many places where the flags are seemingly hard-coded to
QEMU_PCI_CONNECT_HOTPLUGGABLE | QEMU_PCI_CONNECT_TYPE_PCI
This is currently the correct value for all PCI devices, and in the
future will be the default, with small bits of code added to change to
the flags for the few devices which are the exceptions to this rule.
Finally, there are a few places with "FIXME" comments. Note that these
aren't indicating places that are broken according to the currently
supported devices, they are places that will need fixing when support
for new PCI controller models is added.
To assure that there was no regression in the auto-allocation of PCI
addresses or auto-creation of integrated pci-root, ide, and usb
controllers, a new test case (pci-bridge-many-disks) has been added to
both the qemuxml2argv and qemuxml2xml tests. This new test defines a
domain with several dozen virtio disks but no pci-root or
pci-bridges. The .args file of the new test case was created using
libvirt sources from before this patch, and the test still passes
after this patch has been applied.
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virCgroupIsValidMachine API to check whether an auto
detected cgroup is valid for a machine. This lets us
check if a VM has just been placed into some generic
shared cgroup, or worse, the root cgroup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a virCgroupNewDetect API which is used to initialize a
cgroup object with the placement of an arbitrary process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add virErrorSetErrnoFromLastError and virLastErrorIsSystemErrno
to simplify code which wants to handle system errors in a more
graceful fashion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To register virtual machines and containers with systemd-machined,
and thus have cgroups auto-created, we need to talk over DBus.
This is somewhat tedious code, so introduce a dedicated function
to isolate the DBus call in one place.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doing DBus method calls using libdbus.so is tedious in the
extreme. systemd developers came up with a nice high level
API for DBus method calls (sd_bus_call_method). While
systemd doesn't use libdbus.so, their API design can easily
be ported to libdbus.so.
This patch thus introduces methods virDBusCallMethod &
virDBusMessageRead, which are based on the code used for
sd_bus_call_method and sd_bus_message_read. This code in
systemd is under the LGPLv2+, so we're license compatible.
This code is probably pretty unintelligible unless you are
familiar with the DBus type system. So I added some API
docs trying to explain how to use them, as well as test
cases to validate that I didn't screw up the adaptation
from the original systemd code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
These helpers use the remembered host capabilities to retrieve the cpu
map rather than query the host again. The intended usage for this
helpers is to fix automatic NUMA placement with strict memory alloc. The
code doing the prepare needs to pin the emulator process only to cpus
belonging to a subset of NUMA nodes of the host.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
This new function updates or adds a feature to a existing cpu model
definition. This function will be helpful to allow tuning of
"host-model" features in later patches.
For now, only these three helpers are needed:
virDomainChrFind - to find a duplicate chardev within VM def
virDomainChrInsert - wrapper for inserting a new chardev into VM def
virDomainChrRemove - wrapper for removing chardev from VM def
There is, however, one internal helper as well:
virDomainChrGetDomainPtrs which sets given pointers to one of
vmdef->{parallels,serials,consoles,channels} based on passed
chardev type.
Since neither getpwuid_r() nor initgroups() are safe to call in
between fork and exec (they obtain a mutex, but if some other
thread in the parent also held the mutex at the time of the fork,
the child will deadlock), we have to split out the functionality
that is unsafe. At least glibc's initgroups() uses getgrouplist
under the hood, so the ideal split is to expose getgrouplist for
use before a fork. Gnulib already gives us a nice wrapper via
mgetgroups; we wrap it once more to look up by uid instead of name.
* bootstrap.conf (gnulib_modules): Add mgetgroups.
* src/util/virutil.h (virGetGroupList): New declaration.
* src/util/virutil.c (virGetGroupList): New function.
* src/libvirt_private.syms (virutil.h): Export it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Actually, I'm turning this function into a macro as filename,
function name and line number needs to be passed. The new
function virAsprintfInternal is introduced with the extended set
of arguments.
While iterating with virDomainObjListForEach it is safe to remove
current element. But while iterating, 'doms' lock is already taken, so
can't use standard virDomainObjListRemove. So introduce
virDomainObjListRemoveLocked for this purpose.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Ensure that all APIs which list node device objects filter
them against the access control system.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
iptablesContext holds only 4 pairs of iptables
(table, chain) and there's no need to pass
it around.
This is a first step towards separating bridge_driver.c
in platform-specific parts.
Any device which belongs to an "IOMMU group" (used by vfio) will
have links to all devices of its group listed in
/sys/bus/pci/$device/iommu_group/devices;
/sys/bus/pci/$device/iommu_group is actually a link to
/sys/kernel/iommu_groups/$n, where $n is the group number (there
will be a corresponding device node at /dev/vfio/$n once the
devices are bound to the vfio-pci driver)
The following functions are added:
virPCIDeviceGetIOMMUGroupList
Gets a virPCIDeviceList with one virPCIDeviceList for each device
in the same IOMMU group as the provided virPCIDevice (a copy of the
original device object is included in the list.
virPCIDeviceAddressIOMMUGroupIterate
Calls the function @actor once for each device in the group that
contains the given virPCIDeviceAddress.
virPCIDeviceAddressGetIOMMUGroupAddresses
Fills in a virPCIDeviceAddressPtr * with an array of
virPCIDeviceAddress, one for each device in the iommu group of the
provided virPCIDeviceAddress (including a copy of the original).
virPCIDeviceAddressGetIOMMUGroupNum
Returns the group number as an int (a valid group number will always
be 0 or greater). If there is no iommu_group link in the device's
directory (usually indicating that vfio isn't loaded), -2 will be
returned. On any real error, -1 will be returned.
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
All APIs that take typed parameters are only using params address in
their entry point debug messages. With the new VIR_TYPED_PARAMS_DEBUG
macro, all functions can easily log all individual typed parameters
passed to them.
* virPCIDeviceFindByIDs - find a device on a list w/o creating an object
This makes searching for an existing device on a list lighter weight.
* virPCIDeviceCopy - make a copy of an existing virPCIDevice object.
* virPCIDeviceGetDriverPathAndName - construct new strings containing
1) the name of the driver bound to this device.
2) the full path to the sysfs config for that driver.
(This code was lifted from virPCIDeviceUnbindFromStub, and replaced
there with a call to this new function).
Add a new 'access_drivers' config parameter to the libvirtd.conf
configuration file. This allows admins to setup the default
access control drivers to use for API authorization. The same
driver is to be used by all internal drivers & APIs
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch introduces the virAccessManagerPtr class as the
interface between virtualization drivers and the access
control drivers. The viraccessperm.h file defines the
various permissions that will be used for each type of object
libvirt manages
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add <features> and <compat> elements to volume target XML.
<compat> is a string which for qcow2 represents the QEMU version
it should be compatible with. Valid values are 0.10 and 1.1.
1.1 is implicit if the <features> element is present, otherwise
qemu-img default is used. 0.10 can be specified to explicitly
create older images after the qemu-img default changes.
<features> contains optional features, so far
<lazy_refcounts/> is available, which enables caching of reference
counters, improving performance for snapshots.
Call virLogVMessage instead of virLogMessage, since libudev
called us with a va_list object, not a list of arguments.
Honor message priority and strip the trailing newline.
https://bugzilla.redhat.com/show_bug.cgi?id=969152
We can't use GNULIB's fprintf-posix due to licensing
incompatibilities. We do already have a portable
formatting via virAsprintf() which we got from GNULIB
though. We can use to create a virFilePrintf() function.
But really gnulib could just provide a 'fprintf'
module, that depended on just its 'asprintf' module.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
-vnc :5900,share=allow-exclusive
allows clients to ask for exclusive access which is
implemented by dropping other connections Connecting
multiple clients in parallel requires all clients asking
for a shared session (vncviewer: -shared switch)
-vnc :5900,share=force-shared
disables exclusive client access. Useful for shared
desktop sessions, where you don't want someone forgetting
specify -shared disconnect everybody else.
-vnc :5900,share=ignore
completely ignores the shared flag and allows everybody
connect unconditionally
Introduce use of a virDomainDefPtr in the domain lookup
APIs to simplify introduction of ACL security checks.
The virDomainPtr cannot be safely used, since the app
may have supplied mis-matching name/uuid/id fields. eg
the name points to domain X, while the uuid points to
domain Y. Resolving the virDomainPtr to a virDomainDefPtr
ensures a consistent name/uuid/id set.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
In an upcoming patch, I need the way to safely transfer a nested
virJSON object out of its parent container for independent use,
even after the parent is freed.
* src/util/virjson.h (virJSONValueObjectRemoveKey): New function.
(_virJSONObject, _virJSONArray): Use correct type.
* src/util/virjson.c (virJSONValueObjectRemoveKey): Implement it.
* src/libvirt_private.syms (virjson.h): Export it.
* tests/jsontest.c (mymain): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
network: static route support for <network>
This patch adds the <route> subelement of <network> to define a static
route. the address and prefix (or netmask) attribute identify the
destination network, and the gateway attribute specifies the next hop
address (which must be directly reachable from the containing
<network>) which is to receive the packets destined for
"address/(prefix|netmask)".
These attributes are translated into an "ip route add" command that is
executed when the network is started. The command used is of the
following form:
ip route add <address>/<prefix> via <gateway> \
dev <virbr-bridge> proto static metric <metric>
Tests are done to validate that the input data are correct. For
example, for a static route ip definition, the address must be a
network address and not a host address. Additional checks are added
to ensure that the specified gateway is directly reachable via this
network (i.e. that the gateway IP address is in the same subnet as one
of the IP's defined for the network).
prefix='0' is supported for both family='ipv4' address='0.0.0.0'
netmask='0.0.0.0' or prefix='0', and for family='ipv6' address='::',
prefix=0', although care should be taken to not override a desired
system default route.
Anytime an attempt is made to define a static route which *exactly*
duplicates an existing static route (for example, address=::,
prefix=0, metric=1), the following error message will be sent to
syslog:
RTNETLINK answers: File exists
This can be overridden by decreasing the metric value for the route
that should be preferred, or increasing the metric for the route that
shouldn't be preferred (and is thus in place only in anticipation that
the preferred route may be removed in the future). Caution should be
used when manipulating route metrics, especially for a default route.
Note: The use of the command-line interface should be replaced by
direct use of libnl so that error conditions can be handled better. But,
that is being left as an exercise for another day.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Signed-off-by: Laine Stump <laine@laine.org>
Add a virFileNBDDeviceAssociate method, which given a filename
will setup a NBD device, using qemu-nbd as the server.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the fdstream function hardcodes the location
of the iohelper to LIBEXECDIR "/libvirt_iohelper". This
is not convenient when trying to write test cases which
use this code. Add a virFDStreamSetIOHelper method to
allow the test cases to point to the location of the
un-installed iohelper binary.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
These all existed before virfile.c was created, and for some reason
weren't moved.
This is mostly straightfoward, although the syntax rule prohibiting
write() had to be changed to have an exception for virfile.c instead
of virutil.c.
This movement pointed out that there is a function called
virBuildPath(), and another almost identical function called
virFileBuildPath(). They really should be a single function, which
I'll take care of as soon as I figure out what the arglist should look
like.
Since PIDs can be reused, polkit prefers to be given
a (PID,start time) pair. If given a PID on its own,
it will attempt to lookup the start time in /proc/pid/stat,
though this is subject to races.
It is safer if the client app resolves the PID start
time itself, because as long as the app has the client
socket open, the client PID won't be reused.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are various methods named "virXXXXSecurityContext",
which are specific to SELinux. Rename them all to
"virXXXXSELinuxContext". They will still raise errors at
runtime if SELinux is not compiled in
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code adaptation is not done right now, but in subsequent patches.
Hence I am not implementing syntax-check rule as it would break
compilation. Developers are strongly advised to use these new macros.
They are similar to VIR_ALLOC() logic: VIR_STRDUP(dst, src) returns zero
on success, -1 otherwise. In case you don't want to report OOM error,
use the _QUIET variant of a macro.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
This will be used on a tap file descriptor returned by the bridge helper
to populate the <target> element, because the helper does not provide
the interface name.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds two sets of functions:
1) lower level virProcessSet*() functions that will immediately set
the RLIMIT_MEMLOCK. RLIMIT_NPROC, or RLIMIT_NOFILE of either the
current process (using setrlimit()) or any other process (using
prlimit()). "current process" is indicated by passing a 0 for pid.
2) functions for virCommand* that will setup a virCommand object to
set those limits at a later time just after it has forked a new
process, but before it execs the new program.
configure.ac has prlimit and setrlimit added to the list of functions
to check for, and the low level functions log an "unsupported" error)
on platforms that don't support those functions.
This can be set when the virPCIDevice is created and placed on a list,
then used later when traversing the list to determine which stub
driver to bind/unbind for managed devices.
The existing Detach and Attach functions' signatures haven't been
changed (they still accept a stub driver name in the arg list), but if
the arg list has NULL for stub driver and one is available in the
device's object, that will be used. (we may later deprecate and remove
the arg from those functions).
<controller type='pci' index='0' model='pci-root'/>
is auto-added to pc* machine types.
Without this controller PCI bus 0 is not available and
no PCI addresses are assigned by default.
Since older libvirt supported PCI bus 0 even without
this controller, it is removed from the XML when migrating.
Now we set the default disk driver name when parsing
the qemu command line too, hence all the test changes.
Assume format type is 'auto' when none is specified on
qemu command line.
The driver.h struct for node devices used an inconsistent
naming scheme 'DeviceMonitor' instead of the more usual
'NodeDeviceDriver'. Fix this everywhere it has leaked
out to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that the driver struct field names match the public
API names. For an API virXXXX we must have a driver struct
field xXXXX. ie strip the leading 'vir' and lowercase any
leading uppercase letters.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Refactoring done in 19c6ad9ac7 didn't
correctly take into account the order cgroup limit modification needs to
be done in. This resulted into errors when decreasing the limits.
The operations need to take place in this order:
decrease hard limit
change swap hard limit
or
change swap hard limit
increase hard limit
This patch also fixes the check if the hard_limit is less than
swap_hard_limit to print better error messages. For this purpose I
introduced a helper function virCompareLimitUlong to compare limit
values where value of 0 is equal to unlimited. Additionally the check is
now applied also when the user does not provide all of the tunables
through the API and in that case the currently set values are used.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=950478
Create the utility function virSocketAddrGetIpPrefix() to
determine the prefix for this network. The code in this
function was adapted from virNetworkIpDefPrefix().
Update virNetworkIpDefPrefix() in src/conf/network_conf.c
to use the new utility function.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Until now tranisent networks weren't really useful as libvirtd wasn't
able to remember them across restarts. This patch adds support for
loading status files of transient networks (that already were generated)
so that the status isn't lost.
This patch chops up virNetworkObjUpdateParseFile and turns it into
virNetworkLoadState and a few friends that will help us to load status
XMLs and refactors the functions that are loading the configs to use
them.
Add a virCgroupIsolateMount method which looks at where the
current process is place in the cgroups (eg /system/demo.lxc.libvirt)
and then remounts the cgroups such that this sub-directory
becomes the root directory from the current process' POV.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rename all the virCgroupForXXX methods to use the form
virCgroupNewXXX since they are all constructors. Also
make sure the output parameter is the last one in the
list, and annotate all pointers as non-null. Fix up
all callers, and make sure they use true/false not 0/1
for the boolean parameters
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Parse the domain XML with TPM passthrough support.
The TPM passthrough XML may look like this:
<tpm model='tpm-tis'>
<backend type='passthrough'>
<device path='/dev/tpm0'/>
</backend>
</tpm>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
The helper function to look up disk controller model may be used by scsi
hostdev. But it should be changed to use device info.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
The virCgroupMounted method is badly named, since a controller can be
mounted, but disabled in the current object. Rename the method to be
virCgroupHasController. Also make it tolerant to a NULL virCgroupPtr
and out-of-range controller index, to avoid duplication of these
checks in all callers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This finds the parent for vHBA by iterating over all the HBA
which supports vport_ops capability on the host, and return
the first one which is online, not saturated (vports in use
is less than max_vports).
The helper iterates over sysfs, to find out the matched scsi host
name by comparing the wwnn,wwpn pair. It will be used by checkPool
and refreshPool of storage scsi backend. New helper getAdapterName
is introduced in storage_backend_scsi.c, which uses the new util
helper virGetFCHostNameByWWN to get the fc_host adapter name.
This introduces 4 new attributes for storage pool source adapter.
E.g.
<adapter type='fc_host' parent='scsi_host5' wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
Attribute 'type' can be either 'scsi_host' or 'fc_host', and defaults
to 'scsi_host' if attribute 'name' is specified. I.e. It's optional
for 'scsi_host' adapter, for back-compat reason. However, mandatory
for 'fc_host' adapter and any new future adapter types. Attribute
'parent' is to specify the parent for the fc_host adapter.
* docs/formatstorage.html.in:
- Add documents for the 4 new attrs
* docs/schemas/storagepool.rng:
- Add RNG schema
* src/conf/storage_conf.c:
- Parse and format the new XMLs
* src/conf/storage_conf.h:
- New struct virStoragePoolSourceAdapter, replace "char *adapter" with it;
- New enum virStoragePoolSourceAdapterType
* src/libvirt_private.syms:
- Export TypeToString and TypeFromString
* src/phyp/phyp_driver.c:
- Replace "adapter" with "adapter.data.name", which is member of the union
of the new struct virStoragePoolSourceAdapter now. Later patch will
add the checking, as "adapter.data.name" is only valid for "scsi_host"
adapter.
* src/storage/storage_backend_scsi.c:
- Like above
* tests/storagepoolxml2xmlin/pool-scsi-type-scsi-host.xml:
* tests/storagepoolxml2xmlin/pool-scsi-type-fc-host.xml:
- New test for 'fc_host' and "scsi_host" adapter
* tests/storagepoolxml2xmlout/pool-scsi.xml:
- Change the expected output, as the 'type' defaults to 'scsi_host' if 'name"
specified now
* tests/storagepoolxml2xmlout/pool-scsi-type-scsi-host.xml:
* tests/storagepoolxml2xmlout/pool-scsi-type-fc-host.xml:
- New test
* tests/storagepoolxml2xmltest.c:
- Include the test
The virCgroupGetAppRoot is not clear in its meaning. Change
to virCgroupForSelf to highlight that this returns the
cgroup config for the caller's process
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use the virDomainXMLConf structure to hold this data and tweak the code
to avoid semantic change.
Without configuration the KVM mac prefix is used by default. I chose it
as it's in the privately administered segment so it should be usable for
any purposes.
This patch removes the emulatorRequired field and associated
infrastructure from the virCaps object. Instead the driver specific
callbacks are used as this field isn't enforced by all drivers.
This patch implements the appropriate callbacks in the qemu and lxc
driver and moves to check to that location.
This patch is the result of running:
for i in $(git ls-files | grep -v html | grep -v \.po$ ); do
sed -i -e "s/virDomainXMLConf/virDomainXMLOption/g" -e "s/xmlconf/xmlopt/g" $i
done
and a few manual tweaks.
One of my previous patches manipulated virSecurityLabel* APIs,
some were added to header files, and some were renamed. However,
these changes were not reflected in libvirt_private.syms.
The virDomainDefGetSecurityLabelDef was modifying the domain XML.
It tried to find a seclabel corresponding to given sec driver. If the
label wasn't found, the function created one which is wrong. In fact
it's security manager which should modify this part of domain XML.
This abstracts nodeDeviceVportCreateDelete as an util function
virManageVport, which can be further used by later storage patches
(to support persistent vHBA, I don't want to create the vHBA
using the public API, which is not good).
This adds two util functions (virIsCapableFCHost and virIsCapableVport),
and rename helper check_fc_host_linux as detect_scsi_host_caps,
check_capable_vport_linux is removed, as it's abstracted to the util
function virIsCapableVport. detect_scsi_host_caps nows detect both
the fc_host and vport_ops capabilities. "stat(2)" is replaced with
"access(2)" for saving.
* src/util/virutil.h:
- Declare virIsCapableFCHost and virIsCapableVport
* src/util/virutil.c:
- Implement virIsCapableFCHost and virIsCapableVport
* src/node_device/node_device_linux_sysfs.c:
- Remove check_capable_vport_linux
- Rename check_fc_host_linux as detect_scsi_host_caps, and refactor
it a bit to detect both fc_host and vport_os capabilities
* src/node_device/node_device_driver.h:
- Change/remove the related declarations
* src/node_device/node_device_udev.c: (Use detect_scsi_host_caps)
* src/node_device/node_device_hal.c: (Likewise)
* src/node_device/node_device_driver.c (Likewise)
"open_wwn_file" in node_device_linux_sysfs.c is redundant, on one
hand it duplicates work of virFileReadAll, on the other hand, it's
waste to use a function for it, as there is no other users of it.
So I don't see why the file opening work cannot be done in
"read_wwn_linux".
"read_wwn_linux" can be abstracted as an util function. As what all
it does is to read the sysfs entry.
So this patch removes "open_wwn_file", and abstract "read_wwn_linux"
as an util function "virReadFCHost" (a more general name, because
after changes, it can read each of the fc_host entry now).
* src/util/virutil.h: (Declare virReadFCHost)
* src/util/virutil.c: (Implement virReadFCHost)
* src/node_device/node_device_linux_sysfs.c: (Remove open_wwn_file,
and read_wwn_linux)
src/node_device/node_device_driver.h: (Remove the declaration of
read_wwn_linux, and the related macros)
src/libvirt_private.syms: (Export virReadFCHost)
Intend to reduce the redundant code,use virNumaSetupMemoryPolicy
to replace virLXCControllerSetupNUMAPolicy and
qemuProcessInitNumaMemoryPolicy.
This patch also moves the numa related codes to the
file virnuma.c and virnuma.h
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
qemuGetNumadAdvice will be used by LXC driver, rename
it to virNumaGetAutoPlacementAdvice and move it to virnuma.c
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Add APIs which allow creation of a virIdentity from the info
associated with a virNetServerClientPtr instance. This is done
based on the results of client authentication processes like
TLS, x509, SASL, SO_PEERCRED
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a local object virIdentity for managing security
attributes used to form a client application's identity.
Instances of this object are intended to be used as if they
were immutable, once created & populated with attributes
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A socket object has various pieces of security data associated
with it, such as the SELinux context, the SASL username and
the x509 distinguished name. Add new APIs to virNetServerClient
and related modules to access this data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add necessary handling code for the new s390 CCW address type to
virDomainDeviceInfo. Further, introduce memory management, XML
parsing, output formatting and range validation for the new
virDomainDeviceCCWAddress type.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
A number of symbols are only present when GNUTLS is enabled.
Thus we must use a separate libvirt_gnutls.syms file for them
instead of libvirt_private.syms
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCaps structure gathered a ton of irrelevant data over time that.
The original reason is that it was propagated to the XML parser
functions.
This patch aims to create a new data structure virDomainXMLConf that
will contain immutable data that are used by the XML parser. This will
allow two things we need:
1) Get rid of the stuff from virCaps
2) Allow us to add callbacks to check and add driver specific stuff
after domain XML is parsed.
This first attempt removes pointers to private data allocation functions
to this new structure and update all callers and function that require
them.
Currently the server determines whether authentication of clients
is complete, by checking whether an identity is set. This patch
removes that lame hack and replaces it with an explicit method
for changing the client auth code
* daemon/remote.c: Update for new APis
* src/libvirt_private.syms, src/rpc/virnetserverclient.c,
src/rpc/virnetserverclient.h: Remove virNetServerClientGetIdentity
and virNetServerClientSetIdentity, adding a new method
virNetServerClientSetAuth.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a virThreadCancel function. This functional is inherently
dangerous and not something we want to use in general, but
integration with SELinux requires that we provide this stub.
We leave out any Win32 impl to discourage further use and
because obviously SELinux isn't enabled on Win32
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When reading log output from QEMU/LXC we need to skip over any
libvirt log messages. Currently the QEMU driver checks for a
fixed string, but this is better done with a regex. Add a method
virLogProbablyLogMessage to do a regex check
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This reverts commit 0bbbd42c30.
The design for this feature is not complete, and may change the
name of the 'schid' attribute. Revert requested by Viktor Mihajlovski.
This patch adds basic configuration support for the RNG device
supporting the virtio model with the "random" and "egd" backend types as
described in the schema in the previous patch.
For both AttachDevice and UpdateDevice APIs, if the disk device
is 'cdrom' or 'floppy', the operations could be ejecting, updating,
and inserting. For either ejecting or updating, the shared disk
entry of the original disk src has to be removed, because it's
not useful anymore.
And since the original disk def will be changed, new disk def passed
as argument will be free'ed in qemuDomainChangeEjectableMedia, so
we need to copy the orignal disk def before
qemuDomainChangeEjectableMedia, to use it for qemuRemoveSharedDisk.
Automating a sorting check is the only way to ensure we don't
regress. Suggested by Dan Berrange.
* src/check-symsorting.pl (check_sorting): Add a parameter,
validate that groups are in order, and that files exist.
* src/Makefile.am (check-symsorting): Adjust caller.
* src/libvirt_private.syms: Fix typo.
* src/libvirt_linux.syms: Fix file name.
* src/libvirt_vmx.syms: Likewise.
* src/libvirt_xenxs.syms: Likewise.
* src/libvirt_sasl.syms: Likewise.
* src/libvirt_libssh2.syms: Likewise.
* src/libvirt_esx.syms: Mention file name.
* src/libvirt_openvz.syms: Likewise.
Recent renames were not reflected into the comments of
libvirt_private.syms; furthermore, since we mix private headers from
several directories into this file, knowing where the file lives
can be helpful.
* src/libvirt_private.sym: Reflect recent names.
Normally when a process' uid is changed to non-0, all the capabilities
bits are cleared, even those explicitly set with calls to
capng_update()/capng_apply() made immediately before setuid. And
*after* the process' uid has been changed, it no longer has the
necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities
bits, it is necessary to either call capng_change_id() (which
unfortunately doesn't currently call initgroups to setup auxiliary
group membership), or to perform the small amount of calisthenics
contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities
setting/clearing in virSetUIDGIDWithCaps() and virCommand's
virSetCapabilities() (which it will replace in the next patch) is that
the new function properly clears the capabilities bounding set, so it
will not be possible for a child process to set any new
capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in
capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed
to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current
capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities
maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to
setuid/setgid/change the bounding set (unless they were also requested
by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled
with setting the uid, this is necessarily done in a single function,
rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at
time of execve) for a process that has uid != 0, the *file*
capabilities of the binary being executed must also have the desired
capabilities bit(s) set (see "man 7 capabilities"). This can be done
with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio").
The existing virSecurityManagerSetProcessLabel() API is designed so
that it must be called after forking the child process, but before
exec'ing the child. Due to the way the virCommand API works, that
means it needs to be put in a "hook" function that virCommand is told
to call out to at that time.
Setting the child process label is a basic enough need when executing
any process that virCommand should have a method of doing that. But
virCommand must be told what label to set, and only the security
driver knows the answer to that question.
The new virSecurityManagerSet*Child*ProcessLabel() API is the way to
transfer the knowledge about what label to set from the security
driver to the virCommand object. It is given a virCommandPtr, and each
security driver calls the appropriate virCommand* API to tell
virCommand what to do between fork and exec.
1) in the case of the DAC security driver, it calls
virCommandSetUID/GID() to set a uid and gid that must be set for the
child process.
2) for the SELinux security driver, it calls
virCommandSetSELinuxLabel() to save a copy of the char* that will be
sent to setexeccon_raw() *after forking the child process*.
3) for the AppArmor security drivers, it calls
virCommandSetAppArmorProfile() to save a copy of the char* that will
be sent to aa_change_profile() *after forking the child process*.
With this new API in place, we will be able to remove
virSecurityManagerSetProcessLabel() from any virCommand pre-exec
hooks.
(Unfortunately, the LXC driver uses clone() rather than virCommand, so
it can't take advantage of this new security driver API, meaning that
we need to keep around the older virSecurityManagerSetProcessLabel(),
at least for now.)
virCommand gets two new APIs: virCommandSetSELinuxLabel() and
virCommandSetAppArmorProfile(), which both save a copy of a
null-terminated string in the virCommand. During virCommandRun, if the
string is non-NULL and we've been compiled with AppArmor and/or
SELinux security driver support, the appropriate security library
function is called for the child process, using the string that was
previously set. In the case of SELinux, setexeccon_raw() is called,
and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use
the upcoming virSecurityManagerSetChildProcessLabel() prior to running
a child process, rather than needing to setup a hook function to be
called (and in turn call virSecurityManagerSetProcessLabel()) *during*
the setup of the child process.
If a uid and/or gid is specified for a command, it will be set just
after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set
uid/gid. This moves the setting of uid/gid and dropping of
capabilities closer to each other, which is important since the two
should really be done at the same time (libcapng provides a single
function that does both, which we will be unable to use, but want to
mimic as closely as possible).
The hook scripts used by virCommand must be careful wrt
accessing any mutexes that may have been held by other
threads in the parent process. With the recent refactoring
there are 2 potential flaws lurking, which will become real
deadlock bugs once the global QEMU driver lock is removed.
Remove use of the QEMU driver lock from the hook function
by passing in the 'virQEMUDriverConfigPtr' instance directly.
Add functions to the virSecurityManager to be invoked before
and after fork, to ensure the mutex is held by the current
thread. This allows it to be safely used in the hook script
in the child process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add necessary handling code for the new s390 CCW address type to
virDomainDeviceInfo. Further, introduce memory management, XML
parsing, output formatting and range validation for the new
virDomainDeviceCCWAddress type.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
To enable locking to be introduced to the security manager
objects later, turn virSecurityManager into a virObjectLockable
class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We are requesting for stderr catching for all cases in
virFileWrapperFdNew(). There is no need to have a separate
function just to report an error, esp. when we can do it in
virFileWrapperFdClose().
We had an easy way to iterate set bits, but not for iterating
cleared bits.
* src/util/virbitmap.h (virBitmapNextClearBit): New prototype.
* src/util/virbitmap.c (virBitmapNextClearBit): Implement it.
* src/libvirt_private.syms (bitmap.h): Export it.
* tests/virbitmaptest.c (test4): Test it.
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The duplicate VM checking should be done atomically with
virDomainObjListAdd, so shoud not be a separate function.
Instead just use flags to indicate what kind of checks are
required.
This pair, used in virDomainCreateXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainRestoreFlags:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, true)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainDefineXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 0) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
0, NULL)))
goto cleanup;
As a step towards making virDomainObjList thread-safe turn it
into an opaque virObject, preventing any direct access to its
internals.
As part of this a new method virDomainObjListForEach is
introduced to replace all existing usage of virHashForEach
Currently, if we want to feed stdin, or catch stdout or stderr of a
virCommand we have to use virCommandRun(). When using virCommandRunAsync()
we have to register FD handles by hand. This may lead to code duplication.
Hence, introduce an internal API, which does this automatically within
virCommandRunAsync(). The intended usage looks like this:
virCommandPtr cmd = virCommandNew*(...);
char *buf = NULL;
...
virCommandSetOutputBuffer(cmd, &buf);
virCommandDoAsyncIO(cmd);
if (virCommandRunAsync(cmd, NULL) < 0)
goto cleanup;
...
if (virCommandWait(cmd, NULL) < 0)
goto cleanup;
/* @buf now contains @cmd's stdout */
VIR_DEBUG("STDOUT: %s", NULLSTR(buf));
...
cleanup:
VIR_FREE(buf);
virCommandFree(cmd);
Note, that both stdout and stderr buffers may change until virCommandWait()
returns.
While working with a pmsuspend vs. snapshot issue, I noticed that
the state file in /var/run/libvirt/qemu/dom.xml contained a rather
suspicious "(null)" string, which does not round-trip well through
a libvirtd restart. Had I been on a platform other than glibc
where printf("%s",NULL) crashes instead of printing (null), we might
have noticed the problem much sooner.
And in fixing that problem, I also noticed that we had several
missing states, because we were #defining several *_LAST names
to a value _different_ than what they were already given as enums
in libvirt.h. Yuck. I got rid of default: labels in the case
statements, because they get in the way of gcc's -Wswitch helping
us ensure we cover all enum values.
* src/conf/domain_conf.c (virDomainStateReasonToString)
(virDomainStateReasonFromString): Fill in missing domain states;
rewrite case statement to let compiler enforce checking.
(VIR_DOMAIN_NOSTATE_LAST, VIR_DOMAIN_RUNNING_LAST)
(VIR_DOMAIN_BLOCKED_LAST, VIR_DOMAIN_PAUSED_LAST)
(VIR_DOMAIN_SHUTDOWN_LAST, VIR_DOMAIN_SHUTOFF_LAST)
(VIR_DOMAIN_CRASHED_LAST): Drop dead defines.
(VIR_DOMAIN_PMSUSPENDED_LAST): Drop dead define.
(virDomainPMSuspendedReason): Add missing enum function.
(virDomainRunningReason, virDomainPausedReason): Add missing enum
value.
* src/conf/domain_conf.h (virDomainPMSuspendedReason): Declare
missing functions.
* src/libvirt_private.syms (domain_conf.h): Export them.
This will allow storing additional topology data in the NUMA topology
definition.
This patch changes the storage type and fixes fallout of the change
across the drivers using it.
This patch also changes semantics of adding new NUMA cell information.
Until now the data were re-allocated and copied to the topology
definition. This patch changes the addition function to steal the
pointer to a pre-allocated structure to simplify the code.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A great many virObject instances require a mutex, so introduce
a convenient class for this which provides a mutex. This avoids
repeating the tedious init/destroy code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
"virGetDeviceID" could be used across the sources, but it doesn't
relate with this series, and could be done later.
* src/util/virutil.h: (Declare virGetDeviceID, and
vir{Get,Set}DeviceUnprivSGIO)
* src/util/virutil.c: (Implement virGetDeviceID and
vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of upper helpers
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
<hostdev mode='capabilities' type='storage'>
<source>
<block>/dev/sdf1</block>
</source>
</hostdev>
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
<hostdev mode='capabilities' type='misc'>
<source>
<char>/dev/input/event3</char>
</source>
</hostdev>
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There was a double free issue caused by virSysinfoRead on s390,
as the same manufacturer string instance was assigned to more
than one processor record.
Cleaned up other potential memory issues and restructured the sysinfo
parsing code by moving repeating patterns into a helper function.
The restructuring made it necessary to conditionally disable
-Wlogical-op for some older GCC versions, using pragma GCC diagnostic.
This is a GCC specific pragma, which is acceptable, since we're
using it to work around a GCC specific bug.
Finally, added a function virSysinfoSetup to configure the sysinfo
data source files/script during run time, to facilitate writing test
programs. This function is not published in sysinfo.h and only
there for testing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Add check-symsorting.pl to perform case-insensitive alphabetical
sorting of groups of symbols. Fix all violations it reports
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently, we are only keeping a inactive XML configuration
in status dir. This is no longer enough as we need to keep
this class_id attribute so we don't overwrite old entries
when the daemon restarts. However, since there has already
been release which has just <network/> as root element,
and we want to keep things compatible, detect that loaded
status file is older one, and don't scream about it.
Network should be notified if we plug in or unplug an
interface, so it can perform some action, e.g. set/unset
network part of QoS. However, we are doing this in very
early stage, so iface->ifname isn't filled in yet. So
whenever we want to report an error, we must use a different
identifier, e.g. the MAC address.
I noticed when writing the backend functions for virNetworkUpdate that
I was repeating the same sequence of memmove, VIR_REALLOC, nXXX-- (and
messed up the args to memmove at least once), and had seen the same
sequence in a lot of other places, so I decided to write a few
utility functions/macros - see the .h file for full documentation.
The intent is to reduce the number of lines of code, but more
importantly to eliminate the need to check the element size and
element count arithmetic every time we need to do this (I *always*
make at least one mistake.)
VIR_INSERT_ELEMENT: insert one element at an arbitrary index within an
array of objects. The size of each object is determined
automatically by the macro using sizeof(*array). The new element's
contents are copied into the inserted space, then the original copy
of contents are 0'ed out (if everything else was
successful). Compile-time assignment and size compatibility between
the array and the new element is guaranteed (see explanation below
[*])
VIR_INSERT_ELEMENT_COPY: identical to VIR_INSERT_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_APPEND_ELEMENT: This is just a special case of VIR_INSERT_ELEMENT
that "inserts" one past the current last element.
VIR_APPEND_ELEMENT_COPY: identical to VIR_APPEND_ELEMENT, except that
the original contents of newelem are not cleared to 0 (i.e. a copy
is made).
VIR_DELETE_ELEMENT: delete one element at an arbitrary index within an
array of objects. It's assumed that the element being deleted is
already saved elsewhere (or cleared, if that's what is appropriate).
All five of these macros have an _INPLACE variant, which skips the
memory re-allocation of the array, assuming that the caller has
already done it (when inserting) or will do it later (when deleting).
Note that VIR_DELETE_ELEMENT* can return a failure, but only if an
invalid index is given (index + amount to delete is > current array
size), so in most cases you can safely ignore the return (that's why
the helper function virDeleteElementsN isn't declared with
ATTRIBUTE_RETURN_CHECK). A warning is logged if this ever happens,
since it is surely a coding error.
[*] One initial problem with the INSERT and APPEND macros was that,
due to both the array pointer and newelem pointer being cast to void*
when passing to virInsertElementsN(), any chance of type-checking was
lost. If we were going to move in newelem with a memmove anyway, we
would be no worse off for this. However, most current open-coded
insert/append operations use direct struct assignment to move the new
element into place (or just populate the new element directly) - thus
use of the new macros would open a possibility for new usage errors
that didn't exist before (e.g. accidentally sending &newelemptr rather
than newelemptr - I actually did this quite a lot in my test
conversions of existing code).
But thanks to Eric Blake's clever thinking, I was able to modify the
INSERT and APPEND macros so that they *do* check for both assignment
and size compatibility of *ptr (an element in the array) and newelem
(the element being copied into the new position of the array). This is
done via clever use of the C89-guaranteed fact that the sizeof()
operator must have *no* side effects (so an assignment inside sizeof()
is checked for validity, but not actually evaluated), and the fact
that virInsertElementsN has a "# of new elements" argument that we
want to always be 1.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=767057
It was possible to define a network with <forward mode='bridge'> that
had both a bridge device and a forward device defined. These two are
mutually exclusive by definition (if you are using a bridge device,
then this is a host bridge, and if you have a forward dev defined,
this is using macvtap). It was also possible to put <ip>, <dns>, and
<domain> elements in this definition, although those aren't supported
by the current driver (although it's conceivable that some other
driver might support that).
The items that are invalid by definition, are now checked in the XML
parser (since they will definitely *always* be wrong), and the others
are checked in networkValidate() in the network driver (since, as
mentioned, it's possible that some other network driver, or even this
one, could some day support setting those).
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
To detect if an interface's nwfilter has changed, we need to also
compare the filterparams, which is a hashtable of virNWFilterVarValue.
virHashEqual can do this nicely, but requires a pointer to a function
that will compare two of the items being stored in the hashes.
This introduces a few new APIs for dealing with strings.
One to split a char * into a char **, another to join a
char ** into a char *, and finally one to free a char **
There is a simple test suite to validate the edge cases
too. No more need to use the horrible strtok_r() API,
or hand-written code for splitting strings.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To be able todo controlled shutdown/reboot of containers an
API to talk to init via /dev/initctl is required. Fortunately
this is quite straightforward to implement, and is supported
by both sysvinit and systemd. Upstart support for /dev/initctl
is unclear.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This new function returns true if the given address is in the range of
any "private" or "local" networks as defined in RFC1918 (IPv4) or
RFC3484/RFC4193 (IPv6), otherwise they return false.
These ranges are:
192.168.0.0/16
172.16.0.0/16
10.0.0.0/24
FC00::/7
FEC0::/10
In order to optionally take advantage of new features in dnsmasq when
the host's version of dnsmasq supports them, but still be able to run
on hosts that don't support the new features, we need to be able to
detect the version of dnsmasq running on the host, and possibly
determine from the help output what options are in this dnsmasq.
This patch implements a greatly simplified version of the capabilities
code we already have for qemu. A dnsmasqCaps device can be created and
populated either from running a program on disk, reading a file with
the concatenated output of "dnsmasq --version; dnsmasq --help", or
examining a buffer in memory that contains the concatenated output of
those two commands. Simple functions to retrieve capabilities flags,
the version number, and the path of the binary are also included.
bridge_driver.c creates a single dnsmasqCaps object at driver startup,
and disposes of it at driver shutdown. Any time it must be used, the
dnsmasqCapsRefresh method is called - it checks the mtime of the
binary, and re-runs the checks if the binary has changed.
networkxml2argvtest.c creates 2 "artificial" dnsmasqCaps objects at
startup - one "restricted" (doesn't support --bind-dynamic) and one
"full" (does support --bind-dynamic). Some of the test cases use one
and some the other, to make sure both code pathes are tested.
because libvirt_lxc's cgroup mountpoint is what it shown
in /proc/self/cgroup.
we can get container's cgroup through virCgroupNew("/", &group),
add interface virCgroupGetAppRoot to help container to
get it's cgroup.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
virCgroupGetMemSwapUsage is used to get container's swap usage,
with this interface,we can get swap usage in fuse filesystem.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Currently the LXC driver logs audit messages when a container
is started or stopped. These audit messages, however, contain
the PID of the libvirt_lxc supervisor process. To enable
sysadmins to correlate with audit messages generated by
processes /inside/ the container, we need to include the
container init process PID.
We can't do this in the main 'start' audit message, since
the init PID is not available at that point. Instead we output
a completely new audit record, that lists both PIDs.
type=VIRT_CONTROL msg=audit(1353433750.071:363): pid=20180 uid=0 auid=501 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='virt=lxc op=init vm="busy" uuid=dda7b947-0846-1759-2873-0f375df7d7eb vm-pid=20371 init-pid=20372 exe="/home/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd" hostname=? addr=? terminal=pts/6 res=success'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Upcoming patches for revert-and-clone branching of snapshots need
to be able to copy a domain definition; make this step reusable.
* src/conf/domain_conf.h (virDomainDefCopy): New prototype.
* src/conf/domain_conf.c (virDomainObjCopyPersistentDef): Split...
(virDomainDefCopy): ...into new function.
(virDomainObjSetDefTransient): Use it.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use it.
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.
Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
Currently, we use iohelper when saving/restoring a domain.
However, if there's some kind of error (like I/O) it is not
propagated to libvirt. Since it is not qemu who is doing
the actual write() it will not get error. The iohelper does.
Therefore we should check for iohelper errors as it makes
libvirt more user friendly.
In the XML warning, we print a virsh command line that can be used to
edit that XML. This patch prints UUIDs if the entity name contains
special characters (like shell metacharacters, or "--" that would break
parsing of the XML comment). If the entity doesn't have a UUID, just
print the virsh command that can be used to edit it.
For now, disk migration via block copy job is not implemented in
libvirt. But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact. Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need. Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.
We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain. This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.
The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=862515
which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.
(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)
A synopsis of the problems:
1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.
2) you *can* do a live attach-interface of such a device.
3) you *can* directly edit a domain and put in two devices with
matching mac addresses.
4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.
5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.
Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:
1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.
2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.
This patch does three things:
1) removes the check for duplicate mac address during a persistent
netdev attach.
2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.
Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)
3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.
Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device
* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
* added new virDomainNetFindIdx function
* removed now unused virDomainNetIndexByMac and
virDomainNetRemoveByMac
* src/qemu/qemu_driver.c
* remove check for duplicate max from qemuDomainAttachDeviceConfig
* use virDomainNetFindIdx/virDomainNetRemove instead
of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
* use virDomainNetFindIdx instead of virDomainIndexByMac
in qemuDomainUpdateDeviceConfig
* src/qemu/qemu_hotplug.c
* use virDomainNetFindIdx instead of a homespun loop in
qemuDomainDetachNetDevice.
* tools/virsh-domain.c: modified detach-interface command as described
above
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones. So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:
$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
Cleaning this up unraveled a chain of other unused variables.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus. Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
Added an implemention of virNodeGetCPUMap to nodeinfo.c,
(nodeGetCPUMap) which can be used by all drivers for a Linux
hypervisor host.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.
While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.
* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
Sometimes it's handy to know how many bits are set.
* src/util/bitmap.h (virBitmapCountBits): New prototype.
(virBitmapNextSetBit): Use correct type.
* src/util/bitmap.c (virBitmapNextSetBit): Likewise.
(virBitmapSetAll): Maintain invariant of clear tail bits.
(virBitmapCountBits): New function.
* src/libvirt_private.syms (bitmap.h): Export it.
* tests/virbitmaptest.c (test2): Test it.
Add utility functions for Open vSwitch to both save
per-port data before a live migration, and restore the
per-port data after a live migration.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
https://bugzilla.redhat.com/show_bug.cgi?id=866364
pointed out a crash due to virNetworkObjAssignDef free'ing
network->newDef without NULLing it afterward. A fix for this is in
upstream commit b7e9202401. While the
NULLing of newDef was a legitimate fix, newDef should have already
been empty (NULL) anyway (as indicated in the comment that was deleted
by that commit).
The reason that newDef had a non-NULL value (i.e. the root cause) was
that networkStartNetwork() had failed after populating
network->newDef, but then neglected to free/NULL newDef in the
cleanup.
(A bit of background here: network->newDef should contain the
persistent config of a network when a network is active (and of course
only when it is persisten), and NULL at all other times. There is also
a network->def which should contain the persistent definition of the
network when it is inactive, and the current live state at all other
times. The idea is that you can make changes to network->newDef which
will take effect the next time the network is restarted, but won't
mess with the current state of the network (virDomainObj has a similar
pair of virDomainDefs that behave in the same fashion). Personally I
think there should be a network->live and network->config, and the
location of the persistent config should *always* be in
network->config, but that's for a later cleanup).
Since I love things to be symmetric, I created a new function called
virNetworkObjUnsetDefTransient(), which reverses the effects of
virNetworkObjSetDefTransient(). I don't really like the name of the
new function, but then I also didn't really like the name of the old
one either (it's just named that way to match a similar function in
the domain conf code).
In order to temporarily label files read/write during a commit
operation, we need to crawl the backing chain and find the absolute
file name that needs labeling in the first place, as well as the
name of the file that owns the backing file.
* src/util/storage_file.c (virStorageFileChainLookup): New
function.
* src/util/storage_file.h: Declare it.
* src/libvirt_private.syms (storage_file.h): Export it.
Hypervisors are starting to support HyperV Enlightenment features that
improve behavior of guests running Microsoft Windows operating systems.
This patch adds support for the "relaxed" feature that improves timer
behavior and also establishes a framework to add these features in
future.
Add two new APIs virNetServerNewPostExecRestart and
virNetServerPreExecRestart which allow a virNetServerPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of all registered services
and clients
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virNetServerClientNewPostExecRestart and
virNetServerClientPreExecRestart which allow a virNetServerClientPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of the connected socket associated
with the client
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virNetServerServiceNewPostExecRestart and
virNetServerServicePreExecRestart which allow a virNetServerServicePtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of the listening sockets associated
with the service
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virNetSocketNewPostExecRestart and
virNetSocketPreExecRestart which allow a virNetSocketPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Since it is not possible to serialize SASL or TLS encryption
state, an error will be raised if attempting to perform
serialization on non-raw sockets
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virLockSpaceNewPostExecRestart and
virLockSpacePreExecRestart which allow a virLockSpacePtr
object to be created from a JSON object and saved to a
JSON object, for the purposes of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The previously introduced virFile{Lock,Unlock} APIs provide a
way to acquire/release fcntl() locks on individual files. For
unknown reason though, the POSIX spec says that fcntl() locks
are released when *any* file handle referring to the same path
is closed. In the following sequence
threadA: fd1 = open("foo")
threadB: fd2 = open("foo")
threadA: virFileLock(fd1)
threadB: virFileLock(fd2)
threadB: close(fd2)
you'd expect threadA to come out holding a lock on 'foo', and
indeed it does hold a lock for a very short time. Unfortunately
when threadB does close(fd2) this releases the lock associated
with fd1. For the current libvirt use case for virFileLock -
pidfiles - this doesn't matter since the lock is acquired
at startup while single threaded an never released until
exit.
To provide a more generally useful API though, it is necessary
to introduce a slightly higher level abstraction, which is to
be referred to as a "lockspace". This is to be provided by
a virLockSpacePtr object in src/util/virlockspace.{c,h}. The
core idea is that the lockspace keeps track of what files are
already open+locked. This means that when a 2nd thread comes
along and tries to acquire a lock, it doesn't end up opening
and closing a new FD. The lockspace just checks the current
list of held locks and immediately returns VIR_ERR_RESOURCE_BUSY.
NB, the API as it stands is designed on the basis that the
files being locked are not being otherwise opened and used
by the application code. One approach to using this API is to
acquire locks based on a hash of the filepath.
eg to lock /var/lib/libvirt/images/foo.img the application
might do
virLockSpacePtr lockspace = virLockSpaceNew("/var/lib/libvirt/imagelocks");
lockname = md5sum("/var/lib/libvirt/images/foo.img");
virLockSpaceAcquireLock(lockspace, lockname);
NB, in this example, the caller should ensure that the path
is canonicalized before calculating the checksum.
It is also possible to do locks directly on resources by
using a NULL lockspace directory and then using the file
path as the lock name eg
virLockSpacePtr lockspace = virLockSpaceNew(NULL);
virLockSpaceAcquireLock(lockspace, "/var/lib/libvirt/images/foo.img");
This is only safe to do though if no other part of the process
will be opening the files. This will be the case when this
code is used inside the soon-to-be-reposted virlockd daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0
Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0
There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file
This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
The onlined vcpu pinning policy should inherit def->cpuset if
it's not specified explicitly, and the affinity should be set
in this case. Oppositely, the offlined vcpu pinning policy should
be free()'ed.
Add support for logging to the systemd journal, using its
simple client library. The benefit over syslog is that it
accepts structured log data, so the journald can store
individual items like code file/line/func separately from
the string message. Tools which require structured log
data can then query the journal to extract exactly what
they desire without resorting to string parsing
While systemd provides a simple client library for logging,
it is more convenient for libvirt to directly write its
own client code. This lets us build up the iovec's on
the stack, avoiding the need to alloc memory when writing
log messages.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the cgroups APIs we have a virCgroupKillPainfully function
which does the loop sending SIGTERM, then SIGKILL and waiting
for the process to exit. There is similar functionality for
simple processes in qemuProcessKill, but it is tangled with
the QEMU code. Untangle it to provide a virProcessKillPainfuly
function
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Continue consolidation of process functions by moving some
helpers out of command.{c,h} into virprocess.{c,h}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of process related functions spread
across multiple files. Start to consolidate them by
creating a virprocess.{c,h} file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCommand prefix was inappropriate because the API
does not use any virCommandPtr object instance. This
API closely related to waitpid/exit, so use virProcess
as the prefix
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Sometimes when guest machine crashes, coredump can get huge due to the
guest memory. This can be limited using madvise(2) system call and is
being used in QEMU hypervisor. This patch adds an option for configuring
that in the domain XML and related documentation.
This patch cleans up building the "-boot" parameter and while on that
fixes one inconsistency by modifying these things:
- I completed the unfinished virDomainBootMenu enum by specifying
LAST, declaring it and also declaring the TypeFromString and
TypeToString parameters.
- Previously mentioned TypeFromString and TypeToString are used when
parsing the XML.
- Last, but not least, visible change is that the "-boot" parameter
is built and parsed properly:
- The "order=" prefix is used only when additional parameters are
used (menu, etc.).
- It's rewritten in a way that other parameters can be added
easily in the future (used in following patch).
- The "order=" parameter is properly parsed regardless to where it
is placed in the string (e.g. "menu=on,order=nc").
- The "menu=" parameter (and others in the future) are created
when they should be (i.e. even when bootindex is supported and
used, but not when bootloader is selected).
Commit f36309d added an export with no matching implementation;
probably a misspelling of an earlier version of the final addition
of virNetworkObjSetDefTransient.
* src/libvirt_private.syms (network_conf.h): Drop bogus
virNetworkSetDefTransient.
virNetworkObjUpdate takes care of all virNetworkUpdate-related changes
to the data stored in the in-memory virNetworkObj list. It should be
called by network drivers that use this in-memory list.
virNetworkObjUpdate *does not* take care of updating any disk-based
copies of the config, nor does it perform any other operations
necessary to have the new config data take effect (e.g. it won't
re-write dnsmasq host files, nor will it send a SIGHUP to dnsmasq) -
those things should all be taken care of in the network driver
function that calls virNetworkObjUpdate (assuming that it returns
success).
These new functions are highly inspired by those in domain_conf.c (but
not identical), and are intended to make it simpler to update the
various combinations of live/persistent network configs.
The network driver wasn't previously as careful about the separation
between the live "status" in network->def and the persistent "config"
in network->newDef (or sometimes in network->def). This series
attempts to remedy some of that, but probably doesn't go all the way
(enough to get these functions working and enable continued work on
virNetworkUpdate though).
bridge_driver.c and test_driver.c were updated in a few places to take
advantage of the new functions and/or account for changes in argument
lists.
Validates the wwn while parsing, error out if it's malformed.
* src/util/util.h: Declare virValidateWWN
* src/util/util.c: Implement virValidateWWN
* src/libvirt_private.syms: Export virValidateWWN.
* src/conf/domain_conf.h: New member 'wwn' for disk def.
* src/conf/domain_conf.c: Parse and format disk <wwn>
In many places we store bitmap info in a chunk of data
(pointed to by a char *), and have redundant codes to
set/unset bits. This patch extends virBitmap, and convert
those codes to use virBitmap in subsequent patches.
Only implemented for linux platform.
* src/nodeinfo.h: (Declare node{Get,Set}MemoryParameters)
* src/nodeinfo.c: (Implement node{Get,Set}MemoryParameters)
* src/libvirt_private.syms: (Export those two new internal APIs to
private symbols)
tools/virsh-nodedev.c:
* vshNodeDeviceSorter to sort node devices by name
* vshNodeDeviceListFree to free the node device objects list.
* vshNodeDeviceListCollect to collect the node device objects, trying
to use new API first, fall back to older APIs if it's not supported.
* Change option --cap to accept multiple capability types.
tools/virsh.pod
* Update document for --cap
src/conf/node_device_conf.h:
* New macro VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP
* Declare virNodeDeviceList
src/conf/node_device_conf.c:
* New helpers virNodeDeviceCapMatch, virNodeDeviceMatch.
virNodeDeviceCapMatch looks up the list of all the caps the device
support, to see if the device support the cap type.
* Implement virNodeDeviceList
src/libvirt_private.syms:
* Export virNodeDeviceList
* Export virNodeDevCapTypeFromString
New options is added to support EOI (End of Interrupt) exposure for
guests. As it makes sense only when APIC is enabled, I added this into
the <apic> element in <features> because this should be tri-state
option (cannot be handled as standalone feature).
src/conf/network_conf.c: Add virNetworkMatch to filter the networks;
and virNetworkList to iterate over all the networks with the filter.
src/conf/network_conf.h: Declare virNetworkList and define the macros
for filters.
src/libvirt_private.syms: Export virNetworkList.
This patch adds a helper to deal with assigning values to
virTypedParameter structures from strings. The helper parses the value
from the string and assigns it to the corresponding union value.
When the event symbols were added to the public API, not all
of them were removed from the private exports list. Solaris
gets unhappy when there are duplicated symbols. Extend the
symfile check to test for this scenario
src/conf/storage_conf.c: Add virStoragePoolMatch to filter the
pools; Add virStoragePoolList to iterate over the pool objects
with filter.
src/conf/storage_conf.h: Declare virStoragePoolMatch,
virStoragePoolList, and the macros for filters.
src/libvirt_private.syms: Export helper virStoragePoolList.
There is a new <pm/> element implemented that can control what ACPI
sleeping states will be advertised by BIOS and allowed to be switched
to by libvirt. The default keeps defaults on hypervisor, otherwise
forces chosen setting.
The documentation of the pm element is added as well.
The name 'virDomainDiskSnapshot' didn't fit in with our normal
conventions of using a prefix hinting that it is related to a
virDomainSnapshotPtr. Also, a future patch will reuse the
enum for declaring where the VM memory is stored.
* src/conf/snapshot_conf.h (virDomainDiskSnapshot): Rename...
(virDomainSnapshotLocation): ...to this.
(_virDomainSnapshotDiskDef): Update clients.
* src/conf/domain_conf.h (_virDomainDiskDef): Likewise.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/conf/domain_conf.c (virDomainDiskDefParseXML)
(virDomainDiskDefFormat): Likewise.
* src/conf/snapshot_conf.c: (virDomainSnapshotDiskDefParseXML)
(virDomainSnapshotAlignDisks, virDomainSnapshotDefFormat):
Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotCreateDiskActive, qemuDomainSnapshotCreateXML):
Likewise.
We were failing to react to allocation failure when initializing
a snapshot object list. Changing things to store a pointer
instead of a complete object adds one more possible point of
allocation failure, but at the same time, will make it easier to
react to failure now, as well as making it easier for a future
patch to split all virDomainSnapshotPtr handling into a separate
file, as I continue to add even more snapshot code.
Luckily, there was only one client outside of domain_conf.c that
was actually peeking inside the object, and a new wrapper function
was easy.
* src/conf/domain_conf.h (_virDomainObj): Use a pointer.
(virDomainSnapshotObjListInit): Rename.
(virDomainSnapshotObjListFree, virDomainSnapshotForEach): New
declarations.
(_virDomainSnapshotObjList): Move definitions...
* src/conf/domain_conf.c: ...here.
(virDomainSnapshotObjListInit, virDomainSnapshotObjListDeinit):
Rename...
(virDomainSnapshotObjListNew, virDomainSnapshotObjListFree): ...to
these.
(virDomainSnapshotForEach): New function.
(virDomainObjDispose, virDomainListPopulate): Adjust callers.
* src/qemu/qemu_domain.c (qemuDomainSnapshotDiscard)
(qemuDomainSnapshotDiscardAllMetadata): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotLoad)
(qemuDomainUndefineFlags, qemuDomainSnapshotCreateXML)
(qemuDomainSnapshotListNames, qemuDomainSnapshotNum)
(qemuDomainListAllSnapshots)
(qemuDomainSnapshotListChildrenNames)
(qemuDomainSnapshotNumChildren)
(qemuDomainSnapshotListAllChildren)
(qemuDomainSnapshotLookupByName, qemuDomainSnapshotGetParent)
(qemuDomainSnapshotGetXMLDesc, qemuDomainSnapshotIsCurrent)
(qemuDomainSnapshotHasMetadata, qemuDomainRevertToSnapshot)
(qemuDomainSnapshotDelete): Likewise.
* src/libvirt_private.syms (domain_conf.h): Export new function.
This patch introduce virNetlinkEventServiceStopAll() to stop
all the monitors to receive netlink messages for libvirtd.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Introduce 2 APIs to support emulator threads pin.
1) virDomainEmulatorPinAdd: setup emulator threads pin with a given cpumap string.
2) virDomainEmulatorPinDel: remove all emulator threads pin.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
vcpu threads pin are implemented using sched_setaffinity(), but
not controlled by cgroup. This patch does the following things:
1) enable cpuset cgroup
2) reflect all the vcpu threads pin info to cgroup
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Introduce a new API to move tasks of one controller from a cgroup to another cgroup
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Introduce the function virCgroupForEmulator() to create sub directory
for simulator thread(include I/O thread, vhost-net thread)
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
A hypervisor may allow to override the disk geometry of drives.
Qemu, as an example with cyls=,heads=,secs=[,trans=].
This patch extends the domain config to allow the specification of
disk geometry with libvirt.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When running libvirtd from a build directory, libvirtd would load lock
drivers from system directory unless explicitly overridden by setting
LIBVIRT_LOCK_MANAGER_PLUGIN_DIR environment variable. Since we already
autodetect driver directory if libvirt is build with driver modules, we
can use the same trick to automagically set lock driver directory.
This patch adds a glue layer to enable using libssh2 code with the
network client code.
As in the original client implementation, shell code is sent to the
server to detect correct options for netcat and connect to libvirt's
unix socket.
This patch enables virNetSocket to be used as an ssh client when
properly configured.
This patch adds function virNetSocketNewConnectLibSSH2() that takes all
needed parameters and creates a libssh2 session and performs steps
needed to open the connection and then create a virNetSocket that
seamlesly encapsulates the communication.
These changes make the security drivers able to find and handle the
correct security label information when more than one label is
available. They also update the DAC driver to be used as an usual
security driver.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
This patch updates the domain and capability XML parser and formatter to
support more than one "seclabel" element for each domain and device. The
RNG schema and the tests related to this are also updated by this patch.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
In order to support systemd socket based activation, it needs to
be possible to create virNetSocketPtr and virNetServerServicePtr
instance from a pre-opened file descriptor
This function is needed by the network driver in a later commit.
It is useful in functions like networkNotifyActualDevice and
networkReleaseActualDevice
Move the functions the parse/format, and validate PCI addresses to
their own file so they can be conveniently used in other places
besides device_conf.c
Refactoring existing code without causing any functional changes to
prepare for new code.
This patch makes the code reusable.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Previous commit:
commit 9093ab7734
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Wed Jul 18 17:03:17 2012 +0100
Add lots of internal symbols to libvirt_private.syms
mistakenly put some conditional SASL symbols in libvirt_private.syms
instead of libvirt_sasl.syms
The following config elements now support a <vlan> subelements:
within a domain: <interface>, and the <actual> subelement of <interface>
within a network: the toplevel, as well as any <portgroup>
Each vlan element must have one or more <tag id='n'/> subelements. If
there is more than one tag, it is assumed that vlan trunking is being
requested. If trunking is required with only a single tag, the
attribute "trunk='yes'" should be added to the toplevel <vlan>
element.
Some examples:
<interface type='hostdev'/>
<vlan>
<tag id='42'/>
</vlan>
<mac address='52:54:00:12:34:56'/>
...
</interface>
<network>
<name>vlan-net</name>
<vlan trunk='yes'>
<tag id='30'/>
</vlan>
<virtualport type='openvswitch'/>
</network>
<interface type='network'/>
<source network='vlan-net'/>
...
</interface>
<network>
<name>trunk-vlan</name>
<vlan>
<tag id='42'/>
<tag id='43'/>
</vlan>
...
</network>
<network>
<name>multi</name>
...
<portgroup name='production'/>
<vlan>
<tag id='42'/>
</vlan>
</portgroup>
<portgroup name='test'/>
<vlan>
<tag id='666'/>
</vlan>
</portgroup>
</network>
<interface type='network'/>
<source network='multi' portgroup='test'/>
...
</interface>
IMPORTANT NOTE: As of this patch there is no backend support for the
vlan element for *any* network device type. When support is added in
later patches, it will only be for those select network types that
support setting up a vlan on the host side, without the guest's
involvement. (For example, it will be possible to configure a vlan for
a guest connected to an openvswitch bridge, but it won't be possible
to do that for one that is connected to a standard Linux host bridge.)
To allow for the possibility of vlan "trunks", which have more than
one vlan tag associated with them, we need a vlan struct. Since it
will be used by multiple files in src/util, src/conf, src/network, and
src/qemu, it must be defined in src/util. Unfortunately there isn't
currently a common file for simple netdev data definitions, so I
created a new file.
Currently there is a hook function that is invoked when a
new client connection comes in, which allows an app to
setup private data. This setup will make it difficult to
serialize client state during process re-exec(). Change to
a model where the app registers a callback when creating
the virNetServerPtr instance, which is used to allocate
the client private data immediately during virNetClientPtr
construction.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the virNetClientPtr constructor will always register
the async IO event handler and the keepalive objects. In the
case of the lock manager, there will be no event loop available
nor keepalive support required. Split this setup out of the
constructor and into separate methods.
The remote driver will enable async IO and keepalives, while
the LXC driver will only enable async IO
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the virNetServerServicePtr is responsible for
creating the virNetServerClientPtr instance when accepting
a new connection. Change this so that the virNetServerServicePtr
merely gives virNetServerPtr a virNetSocketPtr instance. The
virNetServerPtr can then create the virNetServerClientPtr
as it desires
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It is desirable to be able to query the config params of
the thread pool, in order to save the server state. Add
virThreadPoolGetMinWorkers, virThreadPoolGetMaxWorkers
and virThreadPoolGetPriorityWorkers APIs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds three utility functions that operate on
virNetDevVPortProfile objects.
* virNetDevVPortProfileCheckComplete() - verifies that all attributes
required for the type of the given virtport are specified.
* virNetDevVPortProfileCheckNoExtras() - verifies that there are no
attributes specified which are inappropriate for the type of the
given virtport.
* virNetDevVPortProfileMerge3() - merges 3 virtports into a single,
newly allocated virtport. If any attributes are specified in
more than one of the three sources, and do not exactly match,
an error is logged and the function fails.
These new functions depend on new fields in the virNetDevVPortProfile
object that keep track of whether or not each attribute was
specified. Since the higher level parse function doesn't yet set those
fields, these functions are not actually usable yet (but that's okay,
because they also aren't yet used - all of that functionality comes in
a later patch.)
Note that these three functions return 0 on success and -1 on
failure. This may seem odd for the first two Check functions, since
they could also easily return true/false, but since they actually log
an error when the requested condition isn't met (and should result in
a failure of the calling function), I thought 0/-1 was more
appropriate.
The current virRandomBits() API is only usable if the caller wants
a random number in the range [0, n-1) where n is a power of two.
This adds a virRandom() API which generates a double in the
range [0.0,1.0) with 48 bits of entropy. It then also adds a
virRandomInt(uint32_t max) API which generates an unsigned
in the range [0,@max)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
libvirt creates invalid commands if wrong locale is selected. For
example with locale that uses comma as a decimal point, JSON commands
created with decimal numbers are invalid because comma separates the
entries in JSON. Fortunately even when decimal point is affected,
thousands grouping is not, because for grouping to be enabled with
*printf, there has to be an apostrophe flag specified (and supported).
This patch adds specific internal function for converting doubles to
strings with C locale.
The meat of this patch is just moving the calls to
virNWFilterRegisterCallbackDriver from each hypervisor's "register"
function into its "initialize" function. The rest is just code
movement to allow that, and a new virNWFilterUnRegisterCallbackDriver
function to undo what the register function does.
The long explanation:
There is an array in nwfilter called callbackDrvArray that has
pointers to a table of functions for each hypervisor driver that are
called by nwfilter. One of those function pointers is to a function
that will lock the hypervisor driver. Entries are added to the table
by calling each driver's "register" function, which happens quite
early in libvirtd's startup.
Sometime later, each driver's "initialize" function is called. This
function allocates a driver object and stores a pointer to it in a
static variable that was previously initialized to NULL. (and here's
the important part...) If the "initialize" function fails, the driver
object is freed, and that pointer set back to NULL (but the entry in
nwfilter's callbackDrvArray is still there).
When the "lock the driver" function mentioned above is called, it
assumes that the driver was successfully loaded, so it blindly tries
to call virMutexLock on "driver->lock".
BUT, if the initialize never happened, or if it failed, "driver" is
NULL. And it just happens that "lock" is always the first field in
driver so it is also NULL.
Boom.
To fix this, the call to virNWFilterRegisterCallbackDriver for each
driver shouldn't be called until the end of its (*already guaranteed
successful*) "initialize" function, not during its "register" function
(which is currently the case). This implies that there should also be
a virNWFilterUnregisterCallbackDriver() function that is called in a
driver's "shutdown" function (although in practice, that function is
currently never called).
Switch virDomainObjPtr to use the virObject APIs for reference
counting. The main change is that virObjectUnref does not return
the reference count, merely a bool indicating whether the object
still has any refs left. Checking the return value is also not
mandatory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This converts the following public API datatypes to use the
virObject infrastructure:
virConnectPtr
virDomainPtr
virDomainSnapshotPtr
virInterfacePtr
virNetworkPtr
virNodeDevicePtr
virNWFilterPtr
virSecretPtr
virStreamPtr
virStorageVolPtr
virStoragePoolPtr
The code is significantly simplified, since the mutex in the
virConnectPtr object now only needs to be held when accessing
the per-connection virError object instance. All other operations
are completely lock free.
* src/datatypes.c, src/datatypes.h, src/libvirt.c: Convert
public datatypes to use virObject
* src/conf/domain_event.c, src/phyp/phyp_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_migration.c,
src/qemu/qemu_process.c, src/storage/storage_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xend_internal.c,
tests/qemuxml2argvtest.c, tests/qemuxmlnstest.c,
tests/sexpr2xmltest.c, tests/xmconfigtest.c: Convert
to use virObjectUnref/virObjectRef
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This introduces a fairly basic reference counted virObject type
and an associated virClass type, that use atomic operations for
ref counting.
In a global initializer (recommended to be invoked using the
virOnceInit API), a virClass type must be allocated for each
object type. This requires a class name, a "dispose" callback
which will be invoked to free memory associated with the object's
fields, and the size in bytes of the object struct.
eg,
virClassPtr connclass = virClassNew("virConnect",
sizeof(virConnect),
virConnectDispose);
The struct for the object, must include 'virObject' as its
first member
eg
struct _virConnect {
virObject object;
virURIPtr uri;
};
The 'dispose' callback is only responsible for freeing
fields in the object, not the object itself. eg a suitable
impl for the above struct would be
void virConnectDispose(void *obj) {
virConnectPtr conn = obj;
virURIFree(conn->uri);
}
There is no need to reset fields to 'NULL' or '0' in the
dispose callback, since the entire object will be memset
to 0, and the klass pointer & magic integer fields will
be poisoned with 0xDEADBEEF before being free()d
When creating an instance of an object, one needs simply
pass the virClassPtr eg
virConnectPtr conn = virObjectNew(connclass);
if (!conn)
return NULL;
conn->uri = virURIParse("foo:///bar")
Object references can be manipulated with
virObjectRef(conn)
virObjectUnref(conn)
The latter returns a true value, if the object has been
freed (ie its ref count hit zero)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
All callers used the same initialization seed (well, the new
viratomictest forgot to look at getpid()); so we might as well
make this value automatic. And while it may feel like we are
giving up functionality, I documented how to get it back in the
unlikely case that you actually need to debug with a fixed
pseudo-random sequence. I left that crippled by default, so
that a stray environment variable doesn't cause a lack of
randomness to become a security issue.
* src/util/virrandom.c (virRandomInitialize): Rename...
(virRandomOnceInit): ...and make static, with one-shot call.
Document how to do fixed-seed debugging.
* src/util/virrandom.h (virRandomInitialize): Drop prototype.
* src/libvirt_private.syms (virrandom.h): Don't export it.
* src/libvirt.c (virInitialize): Adjust caller.
* src/lxc/lxc_controller.c (main): Likewise.
* src/security/virt-aa-helper.c (main): Likewise.
* src/util/iohelper.c (main): Likewise.
* tests/seclabeltest.c (main): Likewise.
* tests/testutils.c (virtTestMain): Likewise.
* tests/viratomictest.c (mymain): Likewise.
* src/conf/domain_conf.c:
- Add virDomainControllerFind to find controller device by type
and index.
- Add virDomainControllerRemove to remove the controller device
from maintained controler list.
* src/conf/domain_conf.h:
- Declare the two new helpers.
* src/libvirt_private.syms:
- Expose private symbols for the two new helpers.
* src/qemu/qemu_driver.c:
- Support attach/detach controller device persistently
* src/qemu/qemu_hotplug.c:
- Use the two helpers to simplify the codes.
Security manager is not a dynamically loadable driver, it's a common
infrastructure similar to util, conf, cpu, etc. used by individual
drivers. Such code is allowed to be linked into libvirt.so.
This reverts commit ec5b7bd2ec and most of
aae5cfb699.
This patch is supposed to fix virdrivermoduletest failures for qemu and
lxc drivers as well as libvirtd's ability to load qemu and lxc drivers.
Remove the use of a manually run virLogStartup and
virNodeSuspendInitialize methods. Instead make sure they
are automatically run using VIR_ONCE_GLOBAL_INIT
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds helpers that validate domain's device configuration.
This will be needed later on to verify devices being hot-plugged to
guests. If the guest has no USB bus, then it's not valid to plug a USB
device to that guest.
The nwfilter and secrets drivers are both stateful and are already
linked directly to libvirtd. Linking them to libvirt.so is thus
wrong, likewise exporting their symbols in libvirt.so is wrong
Allow detection of socket close in virNetClient via a callback
function, triggered on any condition that causes the socket to
be closed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make sure that libvirt_private.syms has all the internal symbols
from APIs in src/rpc/*.h and src/util/cgroup.h, since the LXC
controller/driver will shortly need them
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce new members in the virMacAddr 'class'
- virMacAddrSet: set virMacAddr from a virMacAddr
- virMacAddrSetRaw: setting virMacAddr from raw 6 byte MAC address buffer
- virMacAddrGetRaw: writing virMacAddr into raw 6 byte MAC address buffer
- virMacAddrCmp: comparing two virMacAddr
- virMacAddrCmpRaw: comparing a virMacAddr with a raw 6 byte MAC address buffer
then replace raw MAC addresses by replacing
- 'unsigned char *' with virMacAddrPtr
- 'unsigned char ... [VIR_MAC_BUFLEN]' with virMacAddr
and introduce usage of above functions where necessary.
When the guest changes its memory balloon applications may want
to know what the new value is, without having to periodically
poll on XML / domain info. Introduce a "balloon change" event
to let apps see this
* include/libvirt/libvirt.h.in: Define the
virConnectDomainEventBalloonChangeCallback callback
and VIR_DOMAIN_EVENT_ID_BALLOON_CHANGE constant
* python/libvirt-override-virConnect.py,
python/libvirt-override.c: Wire up helpers for new event
* daemon/remote.c: Helper for serializing balloon event
* examples/domain-events/events-c/event-test.c,
examples/domain-events/events-python/event-test.py: Add
example of balloon event usage
* src/conf/domain_event.c, src/conf/domain_event.h: Handling
of balloon events
* src/remote/remote_driver.c: Add handler of balloon events
* src/remote/remote_protocol.x: Define wire protocol for
balloon events
* src/remote_protocol-structs: Likewise.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of changing the existed virFileMakePath to accept mode
argument and modifying a pile of its uses, this patch introduces
virFileMakePathWithMode, and use it instead of mkdir() to create
the readline history dir.
While it is not currently used elsewhere in libvirt, the code
for finding a free loop device & associating a file with it
is not LXC specific. Move it into the viffile.{c,h} file where
potentially shared code is more commonly kept.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Wraps the conversion from 'char *name' to virDomainSnapshotPtr in
a reusable manner.
* src/conf/virdomainlist.h (virDomainListSnapshots): New declaration.
* src/conf/virdomainlist.c (virDomainListSnapshots): Implement it.
* src/libvirt_private.syms (virdomainlist.h): Export it.
Now that domain listing is a thin wrapper around child listing,
it's easier to have a common entry point. This restores the
hashForEach optimization lost in the previous patch when there
are no snapshots being filtered out of the entire list.
* src/conf/domain_conf.h (virDomainSnapshotObjListGetNames)
(virDomainSnapshotObjListNum): Add parameter.
(virDomainSnapshotObjListGetNamesFrom)
(virDomainSnapshotObjListNumFrom): Delete.
* src/libvirt_private.syms (domain_conf.h): Drop deleted functions.
* src/conf/domain_conf.c (virDomainSnapshotObjListGetNames):
Merge, and (re)add an optimization.
* src/qemu/qemu_driver.c (qemuDomainUndefineFlags)
(qemuDomainSnapshotListNames, qemuDomainSnapshotNum)
(qemuDomainSnapshotListChildrenNames)
(qemuDomainSnapshotNumChildren): Update callers.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
* src/conf/virdomainlist.c (virDomainListPopulate): Likewise.
This patch adds common code to list domains in fashion used by
virListAllDomains with all currently supported flags. The header file
also contains macros that group filters together that are used to
shorten filter conditions.
Right now, the only way to get at the contents of a virBuffer is
to destroy it. But there are cases in my upcoming patches where
peeking at the contents makes life easier. I suppose this does
open up the potential for bad code to dereference a stale pointer,
by disregarding the docs that the return value is invalid on the
next virBuf operation, but such is life.
* src/util/buf.h (virBufferCurrentContent): New declaration.
* src/util/buf.c (virBufferCurrentContent): Implement it.
* src/libvirt_private.syms (buf.h): Export it.
* tests/virbuftest.c (testBufAutoIndent): Test it.
The goal of this patch is to prepare for support for multiple IP
addresses per interface in the DHCP snooping code.
Move the code for the IP address map that maps interface names to
IP addresses into their own file. Rename the functions on the way
but otherwise leave the code as-is. Initialize this new layer
separately before dependent layers (iplearning, dhcpsnooping)
and shut it down after them.
Add an impl of +virGetUserRuntimeDirectory, virGetUserCacheDirectory
virGetUserConfigDirectory and virGetUserDirectory for Win32 platform.
Also create stubs for non-Win32 platforms which lack getpwuid_r()
In adding these two helpers were added virFileIsAbsPath and
virFileSkipRoot, along with some macros VIR_FILE_DIR_SEPARATOR,
VIR_FILE_DIR_SEPARATOR_S, VIR_FILE_IS_DIR_SEPARATOR,
VIR_FILE_PATH_SEPARATOR, VIR_FILE_PATH_SEPARATOR_S
All this code was adapted from GLib2 under terms of LGPLv2+ license.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The libvirt_test.la library was introduced to allow test suites
to reference internal-only symbols. These days, nearly every
symbol we care about is in src/libvirt_private.syms, so there
is no need for libvirt_test.la to continue to exist
* src/Makefile.am: Delete libvirt_test.la & add new .syms files
* src/libvirt_private.syms: Export symbols needed by test suite
* tests/Makefile.am: Link to libvirt_test.la. Ensure LXC tests link
to network_driver.la
* src/libvirt_esx.syms, src/libvirt_openvz.syms: Add exports needed
by test suite
When the last reference to a virConnectPtr is released by
libvirtd, it was possible for a deadlock to occur in the
virDomainEventState functions. The virDomainEventStatePtr
holds a reference on virConnectPtr for each registered
callback. When removing a callback, the virUnrefConnect
function is run. If this causes the last reference on the
virConnectPtr to be released, then virReleaseConnect can
be run, which in turns calls qemudClose. This function has
a call to virDomainEventStateDeregisterConn which is intended
to remove all callbacks associated with the virConnectPtr
instance. This will try to grab a lock on virDomainEventState
but this lock is already held. Deadlock ensues
Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):
Since each callback associated with a virConnectPtr holds a
reference on virConnectPtr, it is impossible for the qemudClose
method to be invoked while any callbacks are still registered.
Thus the call to virDomainEventStateDeregisterConn must in fact
be a no-op. Thus it is possible to just remove all trace of
virDomainEventStateDeregisterConn and avoid the deadlock.
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
calls to virDomainEventStateDeregisterConn
Some security drivers require special options to be passed to
the mount system call. Add a security driver API for handling
this data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As defined in:
http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
This offers a number of advantages:
* Allows sharing a home directory between different machines, or
sessions (eg. using NFS)
* Cleanly separates cache, runtime (eg. sockets), or app data from
user settings
* Supports performing smart or selective migration of settings
between different OS versions
* Supports reseting settings without breaking things
* Makes it possible to clear cache data to make room when the disk
is filling up
* Allows us to write a robust and efficient backup solution
* Allows an admin flexibility to change where data and settings are stored
* Dramatically reduces the complexity and incoherence of the
system for administrators
Though numad will manage the memory allocation of task dynamically,
it wants management application (libvirt) to pre-set the memory
policy according to the advisory nodeset returned from querying numad,
(just like pre-bind CPU nodeset for domain process), and thus the
performance could benefit much more from it.
This patch introduces new XML tag 'placement', value 'auto' indicates
whether to set the memory policy with the advisory nodeset from numad,
and its value defaults to the value of <vcpu> placement, or 'static'
if 'nodeset' is specified. Example of the new XML tag's usage:
<numatune>
<memory placement='auto' mode='interleave'/>
</numatune>
Just like what current "numatune" does, the 'auto' numa memory policy
setting uses libnuma's API too.
If <vcpu> "placement" is "auto", and <numatune> is not specified
explicitly, a default <numatume> will be added with "placement"
set as "auto", and "mode" set as "strict".
The following XML can now fully drive numad:
1) <vcpu> placement is 'auto', no <numatune> is specified.
<vcpu placement='auto'>10</vcpu>
2) <vcpu> placement is 'auto', no 'placement' is specified for
<numatune>.
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='interleave'/>
</numatune>
And it's also able to control the CPU placement and memory policy
independently. e.g.
1) <vcpu> placement is 'auto', and <numatune> placement is 'static'
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='strict' nodeset='0-10,^7'/>
</numatune>
2) <vcpu> placement is 'static', and <numatune> placement is 'auto'
<vcpu placement='static' cpuset='0-24,^12'>10</vcpu>
<numatune>
<memory mode='interleave' placement='auto'/>
</numatume>
A follow up patch will change the XML formatting codes to always output
'placement' for <vcpu>, even it's 'static'.
This value will be needed to set the src_pid when sending netlink
messages to lldpad. It is part of the solution to:
https://bugzilla.redhat.com/show_bug.cgi?id=816465
Note that libnl's port generation algorithm guarantees that the
nl_socket_get_local_port() will always be > 0 (since it is "getpid() +
(n << 22>" where n is always < 1024), so it is okay to cast the
uint32_t to int (thus allowing us to use -1 as an error sentinel).
usbFindDevice():get usb device according to
idVendor, idProduct, bus, device
it is the exact match of the four parameters
usbFindDeviceByBus():get usb device according to bus, device
it returns only one usb device same as usbFindDevice
usbFindDeviceByVendor():get usb device according to idVendor,idProduct
it probably returns multiple usb devices.
usbDeviceSearch(): a helper function to do the actual search
Add function virJSONValueObjectKeysNumber, virJSONValueObjectGetKey
and virJSONValueObjectGetValue, which allow you to iterate over all
fields of json object: you can get number of fields and then get
name and value, stored in field with that name by index.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
When libvirtd is started, we create "libvirt/qemu" directories under
hugetlbfs mount point. Only the "qemu" subdirectory is chowned to qemu
user and "libvirt" remains owned by root. If umask was too restrictive
when libvirtd started, qemu user may lose access to "qemu"
subdirectory. Let's explicitly grant search permissions to "libvirt"
directory for all users.
Add 2 new functions to the virSocketAddr 'class':
- virSocketAddrEqual: tests whether two IP addresses and their ports are equal
- virSocketaddSetIPv4Addr: set a virSocketAddr given a 32 bit int
DBus connection. The HAL device code further requires that
the DBus connection is integrated with the event loop and
provides such glue logic itself.
The forthcoming FirewallD integration also requires a
dbus connection with event loop integration. Thus we need
to pull the current event loop glue out of the HAL driver.
Thus we create src/util/virdbus.{c,h} files. This contains
just one method virDBusGetSystemBus() which obtains a handle
to the single shared system bus instance, with event glue
automagically setup.
This patch adds a netlink callback when migrating a VEPA enabled
virtual machine. It fixes a Bug where a VM would not request a port
association when it was cleared by lldpad.
This patch requires the latest git version of lldpad to work.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch introduces a new event type for the QMP event
SUSPEND:
VIR_DOMAIN_EVENT_ID_PMSUSPEND
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventSuspendCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
WAKEUP:
VIR_DOMAIN_EVENT_ID_PMWAKEUP
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventWakeupCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
DEVICE_TRAY_MOVED, which occurs when the tray of a removable
disk is moved (i.e opened or closed):
VIR_DOMAIN_EVENT_ID_TRAY_CHANGE
The event's data includes the device alias and the reason
for tray status' changing, which indicates why the tray
status was changed. Thus the callback definition for the event
is:
enum {
VIR_DOMAIN_EVENT_TRAY_CHANGE_OPEN = 0,
VIR_DOMAIN_EVENT_TRAY_CHANGE_CLOSE,
\#ifdef VIR_ENUM_SENTINELS
VIR_DOMAIN_EVENT_TRAY_CHANGE_LAST
\#endif
} virDomainEventTrayChangeReason;
typedef void
(*virConnectDomainEventTrayChangeCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *devAlias,
int reason,
void *opaque);
Ensure that the functions in virauth.h have names matching the file
prefix, by renaming virRequest{Username,Password} to
virAuthGet{Username,Password}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The '.ini' file format is a useful alternative to the existing
config file style, when you need to have config files which
are hashes of hashes. The 'virKeyFilePtr' object provides a
way to parse these file types.
* src/Makefile.am, src/util/virkeyfile.c,
src/util/virkeyfile.h: Add .ini file parser
* tests/Makefile.am, tests/virkeyfiletest.c: Test
basic parsing capabilities
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Convert drivers currently using the qparams APIs, to instead
use the virURIPtr query parameters directly.
* src/esx/esx_util.c, src/hyperv/hyperv_util.c,
src/remote/remote_driver.c, src/xenapi/xenapi_utils.c: Remove
use of qparams
* src/util/qparams.h, src/util/qparams.c: Delete
* src/Makefile.am, src/libvirt_private.syms: Remove qparams
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Avoid the need for each driver to parse query parameters itself
by storing them directly in the virURIPtr struct. The parsing
code is a copy of that from src/util/qparams.c The latter will
be removed in a later patch
* src/util/viruri.h: Add query params to virURIPtr
* src/util/viruri.c: Parse query parameters when creating virURIPtr
* tests/viruritest.c: Expand test to cover params
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we defined a custom virURIPtr type, we should use a
virURIFree method instead of assuming it will always be
a typedef for xmlURIPtr
* src/util/viruri.c, src/util/viruri.h, src/libvirt_private.syms:
Add a virURIFree method
* src/datatypes.c, src/esx/esx_driver.c, src/libvirt.c,
src/qemu/qemu_migration.c, src/vmx/vmx.c, src/xen/xend_internal.c,
tests/viruritest.c: s/xmlFreeURI/virURIFree/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A few times libvirt users manually setting mac addresses have
complained of a networking failure that ends up being due to a multicast
mac address being used for a guest interface. This patch prevents that
by logging an error and failing if a multicast mac address is
encountered in each of the three following cases:
1) domain xml <interface> mac address.
2) network xml bridge mac address.
3) network xml dhcp/host mac address.
There are several other places where a mac address can be input that
aren't controlled in this manner because failure to do so has no
consequences (e.g., if the address will be used to search through
existing interfaces for a match).
The RNG has been updated to add multiMacAddr and uniMacAddr along with
the existing macAddr, and macAddr was switched to uniMacAddr where
appropriate.
numad is an user-level daemon that monitors NUMA topology and
processes resource consumption to facilitate good NUMA resource
alignment of applications/virtual machines to improve performance
and minimize cost of remote memory latencies. It provides a
pre-placement advisory interface, so significant processes can
be pre-bound to nodes with sufficient available resources.
More details: http://fedoraproject.org/wiki/Features/numad
"numad -w ncpus:memory_amount" is the advisory interface numad
provides currently.
This patch add the support by introducing a new XML attribute
for <vcpu>. e.g.
<vcpu placement="auto">4</vcpu>
<vcpu placement="static" cpuset="1-10^6">4</vcpu>
The returned advisory nodeset from numad will be printed
in domain's dumped XML. e.g.
<vcpu placement="auto" cpuset="1-10^6">4</vcpu>
If placement is "auto", the number of vcpus and the current
memory amount specified in domain XML will be used for numad
command line (numad uses MB for memory amount):
numad -w $num_of_vcpus:$current_memory_amount / 1024
The advisory nodeset returned from numad will be used to set
domain process CPU affinity then. (e.g. qemuProcessInitCpuAffinity).
If the user specifies both CPU affinity policy (e.g.
(<vcpu cpuset="1-10,^7,^8">4</vcpu>) and placement == "auto"
the specified CPU affinity will be overridden.
Only QEMU/KVM drivers support it now.
See docs update in patch for more details.
As documented in linux.git/Documentation/cgroups/cpuacct.txt,
cpuacct.stat returns user and system time in ticks (the same
unit used in times(2)). It would be a bit nicer if it were like
getrusage(2) and reported timeval contents, or like cpuacct.usage
and in nanoseconds, but we can't be picky.
* src/util/cgroup.h (virCgroupGetCpuacctStat): New function.
* src/util/cgroup.c (virCgroupGetCpuacctStat): Implement it.
(virCgroupGetValueStr): Allow for multi-line files.
* src/libvirt_private.syms (cgroup.h): Export it.
Some members are generated during XML parse (e.g. MAC address of
an interface); However, with current implementation, if we
are plugging a device both to persistent and live config,
we parse given XML twice: first time for live, second for config.
This is wrong then as the second time we are not guaranteed
to generate same values as we did for the first time.
To prevent that we need to create a copy of DeviceDefPtr;
This is done through format/parse process instead of writing
functions for deep copy as it is easier to maintain:
adding new field to any virDomain*DefPtr doesn't require change
of copying function.
Scaling an integer based on a suffix is something we plan on reusing
in several contexts: XML parsing, virsh CLI parsing, and possibly
elsewhere. Make it easy to reuse, as well as adding in support for
powers of 1000.
* src/util/util.h (virScaleInteger): New function.
* src/util/util.c (virScaleInteger): Implement it.
* src/libvirt_private.syms (util.h): Export it.
* For now, only "cpu_time" is supported.
* cpuacct cgroup is used for providing percpu cputime information.
* src/qemu/qemu.conf - take care of cpuacct cgroup.
* src/qemu/qemu_conf.c - take care of cpuacct cgroup.
* src/qemu/qemu_driver.c - added an interface
* src/util/cgroup.c/h - added interface for getting percpu cputime
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
These changes are applied only if the hostdev has a parent net device
(i.e. if it was defined as "<interface type='hostdev'>" rather than
just "<hostdev>"). If the parent netdevice has virtual port
information, the original virtualport associate functions are called
(these set and restore both mac and port profile on an
interface). Otherwise, only mac address is set on the device.
Note that This is only supported for SR-IOV Virtual Functions (not for
standard PCI or USB netdevs), and virtualport association is only
supported for 802.1Qbh. For all other types of cards and types of
virtualport, a "Config Unsupported" error is returned and the
operation fails.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
This patch adds the following:
- functions to set and get vf configs
- Functions to replace and store vf configs (Only mac address is handled today.
But the functions can be easily extended for vlans and other vf configs)
- function to dump link dev info (This is moved from virnetdevvportprofile.c)
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
pciDeviceGetVirtualFunctionInfo returns pf netdevice name and virtual
function index for a given vf. This is just a wrapper around existing functions
to return vf's pf and vf_index with one api call
pciConfigAddressToSysfsfile returns the sysfile pci device link
from a 'struct pci_config_address'
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
This is the new interface type that sets up an SR-IOV PCI network
device to be assigned to the guest with PCI passthrough after
initializing some network device-specific things from the config
(e.g. MAC address, virtualport profile parameters). Here is an example
of the syntax:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0' bus='0' slot='4' function='3'/>
</source>
<mac address='00:11:22:33:44:55'/>
<address type='pci' domain='0' bus='0' slot='7' function='0'/>
</interface>
This would assign the PCI card from bus 0 slot 4 function 3 on the
host, to bus 0 slot 7 function 0 on the guest, but would first set the
MAC address of the card to 00:11:22:33:44:55.
NB: The parser and formatter don't care if the PCI card being
specified is a standard single function network adapter, or a virtual
function (VF) of an SR-IOV capable network adapter, but the upcoming
code that implements the back end of this config will work *only* with
SR-IOV VFs. This is because modifying the mac address of a standard
network adapter prior to assigning it to a guest is pointless - part
of the device reset that occurs during that process will reset the MAC
address to the value programmed into the card's firmware.
Although it's not supported by any of libvirt's hypervisor drivers,
usb network hostdevs are also supported in the parser and formatter
for completeness and consistency. <source> syntax is identical to that
for plain <hostdev> devices, except that the <address> element should
have "type='usb'" added if bus/device are specified:
<interface type='hostdev'>
<source>
<address type='usb' bus='0' device='4'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
If the vendor/product form of usb specification is used, type='usb'
is implied:
<interface type='hostdev'>
<source>
<vendor id='0x0012'/>
<product id='0x24dd'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
Again, the upcoming patch to fill in the backend of this functionality
will log an error and fail with "Unsupported Config" if you actually
try to assign a USB network adapter to a guest using <interface
type='hostdev'> - just use a standard <hostdev> entry in that case
(and also for single-port PCI adapters).
Three new functions useful in other files:
virDomainHostdevInsert:
Add a new hostdev at the end of the array. This would more sensibly be
called virDomainHostdevAppend, but the existing functions for other
types of devices are called Insert.
virDomainHostdevRemove:
Eliminates one entry from the hostdevs array, but doesn't free it;
patterned after the code at the end of the two
qemuDomainDetachHostXXXDevice functions (and also other pre-existing
virDomainXXXRemove functions for other device types).
virDomainHostdevFind:
This function is patterned from the search loops at the top of
qemuDomainDetachHostPciDevice and qemuDomainDetachHostUsbDevice, and
will be used to re-factor those (and other detach-related) functions.
In order to allow for a virDomainHostdevDef that uses the
virDomainDeviceInfo of a "higher level" device (such as a
virDomainNetDef), this patch changes the virDomainDeviceInfo in the
HostdevDef into a virDomainDeviceInfoPtr. Rather than adding checks
all over the code to check for a null info, we just guarantee that it
is always valid. The new function virDomainHostdevDefAlloc() allocates
a virDomainDeviceInfo and plugs it in, and virDomainHostdevDefFree()
makes sure it is freed.
There were 4 places allocating virDomainHostdevDefs, all of them
parsers of one sort or another, and those have all had their
VIR_ALLOC(hostdev) changed to virDomainHostdevDefAlloc(). Other than
that, and the new functions, all the rest of the changes are just
mechanical removals of "&" or changing "." to "->".
This code adds a netlink event interface to libvirt.
It is based upon the event_poll code and makes use of
it. An event is generated for each netlink message sent
to the libvirt pid.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
This patch adds a set of functions used in creating console streams for
domains using PTYs and ensures mutually exclusive access to the PTYs.
If mutually exclusive access is not used, two clients may open the same
console, which results in corruption on both clients as both of them
race to read data from the PTY.
Two approaches are used to ensure this:
1) Internal data structure holding open PTYs.
This is used internally and enables the user to forcibly
terminate another console connection eg. when somebody leaves
the console open on another host.
2) UUCP style lock files:
This uses UUCP lock files according to the FHS
( http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES )
to check if other programs (like minicom) are not using the pty
device of the console.
This feature is disabled by default and may be enabled using
configure parameter
--with-console-lock-files=/path/to/lock/file/directory
or --with-console-lock-files=auto (which tries to infer the
location from OS used (currently only linux).
On usual linux systems, normal users may not write to the
/var/lock directory containing the locks. This poses problems
while in session mode. If the current user has no access to the
lockfile directory, check for presence of the file is still
done, but no lock file is created. This does NOT result in an
error.
Function xmlParseURI does not remove square brackets around IPv6
address when parsing. One of the solutions is making wrappers around
functions working with xmlURI*. This assures that uri->server will be
always properly assigned and it doesn't have to be changed when used
on some new place in the code.
For this purpose, functions virParseURI and virSaveURI were
added. These function are wrappers around xmlParseURI and xmlSaveUri
respectively.
Also there is one new syntax check function to prohibit these functions
anywhere else.
File changes:
- src/util/viruri.h -- declaration
- src/util/viruri.c -- definition
- src/libvirt_private.syms -- symbol export
- src/Makefile.am -- added source and header files
- cfg.mk -- added sc_prohibit_xmlURI
- all others -- ID name and include fixes
This patch allows libvirt to add interfaces to already
existing Open vSwitch bridges. The following syntax in
domain XML file can be used:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'/>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
or if libvirt should auto-generate the interfaceid use
following syntax:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
It is also possible to pass an optional profileid. To do that
use following syntax:
<interface type='bridge'>
<source bridge='ovsbr'/>
<mac address='00:55:1a:65:a2:8d'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'
profileid='test-profile'/>
</virtualport>
</interface>
To create Open vSwitch bridge install Open vSwitch and
run the following command:
ovs-vsctl add-br ovsbr
The auto-generated WWN comply with the new addressing schema of WWN:
<quote>
the first nibble is either hex 5 or 6 followed by a 3-byte vendor
identifier and 36 bits for a vendor-specified serial number.
</quote>
We choose hex 5 for the first nibble. And for the 3-bytes vendor ID,
we uses the OUI according to underlying hypervisor type, (invoking
virConnectGetType to get the virt type). e.g. If virConnectGetType
returns "QEMU", we use Qumranet's OUI (00:1A:4A), if returns
ESX|VMWARE, we use VMWARE's OUI (00:05:69). Currently it only
supports qemu|xen|libxl|xenapi|hyperv|esx|vmware drivers. The last
36 bits are auto-generated.
Rename the src/util/netlink files to src/util/virnetlink to
better fit the naming scheme. Also rename nlComm to virNetlinkCommand.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
This was forgotten when the function was originally written (not
noticed because it wasn't used at the time). It's required for
proper compilation with modules enabled after applying the recent
virStorageVolResize patches.
This was forgotten when the function was initially written (not
noticed because it wasn't used at the time). It's required for proper
compilation with modules enabled after applying the recent rawio
patches.
The virEmitXMLWarning function should always have been in
the xml.[hc] files, and should use virXML as its name
prefix
* src/util/util.c, src/util/util.h: Remove virEmitXMLWarning
* src/util/xml.c, src/util/xml.h: Add virXMLEmitWarning
Introduce a function that rebuilds all running VMs' filters. Call
this function when reloading the nwfilter driver.
This addresses a problem introduced by the 2nd patch that typically
causes no filters to be reinstantiate anymore upon driver reload
since their XML has not changed. Yet the current behavior is that
upon a SIGHUP all filters get reinstantiated.
The old virRandom() API was not generating good random numbers.
Replace it with a new API virRandomBits which instead of being
told the upper limit, gets told the number of bits of randomness
required.
* src/util/virrandom.c, src/util/virrandom.h: Add virRandomBits,
and move virRandomInitialize
* src/util/util.h, src/util/util.c: Delete virRandom and
virRandomInitialize
* src/libvirt.c, src/security/security_selinux.c,
src/test/test_driver.c, src/util/iohelper.c: Update for
changes from virRandom to virRandomBits
* src/storage/storage_backend_iscsi.c: Remove bogus call
to virRandomInitialize & convert to virRandomBits
Add a virFileTouch API which ensures that a file will always
exist, even if zero length
* src/util/virfile.c, src/util/virfile.h,
src/libvirt_private.syms: Introduce virFileTouch
While we still don't want to enable gcc's new -Wformat-literal
warning, I found a rather easy case where the warning could be
reduced, by getting rid of obsolete error-reporting practices.
This is the last place where we were passing the (unused) net
and conn arguments for constructing an error.
* src/util/virterror_internal.h (virErrorMsg): Delete prototype.
(virReportError): Delete macro.
* src/util/virterror.c (virErrorMsg): Make static.
* src/libvirt_private.syms (virterror_internal.h): Drop export.
* src/util/conf.c (virConfError): Convert to macro.
(virConfErrorHelper): New function, and adjust error calls.
* src/xen/xen_hypervisor.c (virXenErrorFunc): Delete.
(xenHypervisorGetSchedulerType)
(xenHypervisorGetSchedulerParameters)
(xenHypervisorSetSchedulerParameters)
(xenHypervisorDomainBlockStats)
(xenHypervisorDomainInterfaceStats)
(xenHypervisorDomainGetOSType)
(xenHypervisorNodeGetCellsFreeMemory, xenHypervisorGetVcpus):
Update callers.
Preparation for another patch that refactors common patterns
into the new file for fewer lines of code overall.
* src/util/util.h (virTypedParameterArrayClear): Move...
* src/util/virtypedparam.h: ...to new file.
(virTypedParameterArrayValidate, virTypedParameterAssign): New
prototypes.
* src/util/util.c (virTypedParameterArrayClear): Likewise.
* src/util/virtypedparam.c: New file.
* po/POTFILES.in: Mark file for translation.
* src/Makefile.am (UTIL_SOURCES): Build it.
* src/libvirt_private.syms (util.h): Split...
(virtypedparam.h): to new section.
(virkeycode.h): Sort.
* daemon/remote.c: Adjust callers.
* tools/virsh.c: Likewise.
To avoid a namespace clash with forthcoming identity APIs,
rename the virNet*GetLocalIdentity() APIs to have the form
virNet*GetUNIXIdentity()
* daemon/remote.c, src/libvirt_private.syms: Update
for renamed APIs
* src/rpc/virnetserverclient.c, src/rpc/virnetserverclient.h,
src/rpc/virnetsocket.c, src/rpc/virnetsocket.h: s/LocalIdentity/UNIXIdentity/
Given an LXC guest with a root filesystem path of
/export/lxc/roots/helloworld/root
During startup, we will pivot the root filesystem to end up
at
/.oldroot/export/lxc/roots/helloworld/root
We then try to open
/.oldroot/export/lxc/roots/helloworld/root/dev/pts
Now consider if '/export/lxc' is an absolute symlink pointing
to '/media/lxc'. The kernel will try to open
/media/lxc/roots/helloworld/root/dev/pts
whereas it should be trying to open
/.oldroot//media/lxc/roots/helloworld/root/dev/pts
To deal with the fact that the root filesystem can be moved,
we need to resolve symlinks in *any* part of the filesystem
source path.
* src/libvirt_private.syms, src/util/util.c,
src/util/util.h: Add virFileResolveAllLinks to resolve
all symlinks in a path
* src/lxc/lxc_container.c: Resolve all symlinks in filesystem
paths during startup
This introduces new attribute wrpolicy with only supported
value as immediate. This will be an optional
attribute with no defaults. This helps specify whether
to skip the host page cache.
When wrpolicy is specified, meaning when wrpolicy=immediate
a writeback is explicitly initiated for the dirty pages in
the host page cache as part of the guest file write operation.
Usage:
<filesystem type='mount' accessmode='passthrough'>
<driver type='path' wrpolicy='immediate'/>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
Currently this only works with type='mount' for the QEMU/KVM driver.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
VIR_DOMAIN_XML_UPDATE_CPU flag for virDomainGetXMLDesc may be used to
get updated custom mode guest CPU definition in case it depends on host
CPU. This patch implements the same behavior for host-model and
host-passthrough CPU modes.
The new introduced optional attribute "copy_on_read</code> controls
whether to copy read backing file into the image file. The value can
be either "on" or "off". Copy-on-read avoids accessing the same backing
file sectors repeatedly and is useful when the backing file is over a
slow network. By default copy-on-read is off.
Address side effect of accessing a variable via an index: Filters
accessing a variable where an element is accessed that is beyond the
size of the list (for example $TEST[10] and only 2 elements are available)
cannot instantiate that filter. Test for this and report proper error
to user.
This patch introduces the capability to use a different iterator per
variable.
The currently supported notation of variables in a filtering rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A' srcportstart='$B'/>
</rule>
processes the two lists 'A' and 'B' in parallel. This means that A and B
must have the same number of 'N' elements and that 'N' rules will be
instantiated (assuming all tuples from A and B are unique).
In this patch we now introduce the assignment of variables to different
iterators. Therefore a rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@1]' srcportstart='$B[@2]'/>
</rule>
will now create every combination of elements in A with elements in B since
A has been assigned to an iterator with Id '1' and B has been assigned to an
iterator with Id '2', thus processing their value independently.
The first rule has an equivalent notation of
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@0]' srcportstart='$B[@0]'/>
</rule>
Most severe here is a latent (but currently untriggered) memory leak
if any hypervisor ever adds a string interface property; the
remainder are mainly cosmetic.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BANDWIDTH_*): Move
macros closer to interface that uses them, and document type.
* src/libvirt.c (virDomainSetInterfaceParameters)
(virDomainGetInterfaceParameters): Formatting tweaks.
* daemon/remote.c (remoteDispatchDomainGetInterfaceParameters):
Avoid memory leak.
* src/libvirt_public.syms (LIBVIRT_0.9.9): Sort lines.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Fix
comments, break long lines.
Currently all drivers using domain events need to provide a callback
for handling a timer to dispatch events in a clean stack. There is
no technical reason for dispatch to go via driver specific code. It
could trivially be dispatched directly from the domain event code,
thus removing tedious boilerplate code from all drivers
Also fix the libxl & xen drivers to pass 'true' when creating the
virDomainEventState, since they run inside the daemon & thus always
expect events to be present.
* src/conf/domain_event.c, src/conf/domain_event.h: Internalize
dispatch of events from timer callback
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_domain.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Remove all timer dispatch functions
The virDomainEventCallbackList and virDomainEventQueue APIs are
now solely helpers used internally by virDomainEventState APIs.
Remove their decls from domain_event.h since no driver code should
need to use them any more.
* src/conf/domain_event.c: Make virDomainEventCallbackList and
virDomainEventQueue APIs static & remove some unused APIs
* src/conf/domain_event.h, src/libvirt_private.syms: Remove
virDomainEventCallbackList and virDomainEventQueue APIs
While virDomainEventState has APIs for managing removal of callbacks,
while locked, adding callbacks in the first place requires direct
access to the virDomainEventCallbackList structure. This is not
threadsafe since it is bypassing the virDomainEventState locks
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Add APIs for managing callbacks
via virDomainEventState.
When registering a callback for a particular event some callers
need to know how many callbacks already exist for that event.
While it is possible to ask for a count, this is not free from
race conditions when threaded. Thus the API for registering
callbacks should return the count of callbacks. Also rename
virDomainEventStateDeregisterAny to virDomainEventStateDeregisterID
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Return count of callbacks when
registering callbacks
* src/libxl/libxl_driver.c, src/libxl/libxl_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/remote/remote_driver.c, src/uml/uml_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c: Update
for change in APIs
This chunk of code below repeated in several functions, factor it into
a helper method virDomainLiveConfigHelperMethod to eliminate duplicated code
based on Eric and Adam's suggestion. I have tested it for all the
relevant APIs changed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
On RHEL 5, with libxml2-2.6.26, the build failed with:
virsh.c: In function 'vshNodeIsSuperset':
virsh.c:11951: warning: implicit declaration of function 'xmlChildElementCount'
(or if warnings aren't errors, a link failure later on).
* src/util/xml.h (virXMLChildElementCount): New prototype.
* src/util/xml.c (virXMLChildElementCount): New function.
* src/libvirt_private.syms (xml.h): Export it.
* tools/virsh.c (vshNodeIsSuperset): Use it.
The Mingw32 linker highlighted that the symbols for virtime.h
declared in libvirt_private.syms were incorrect
* src/libvirt_private.syms: Fix virtime.h symbols
The virTimestamp and virTimeMs functions in src/util/util.h
duplicate functionality from virtime.h, in a non-async signal
safe manner. Remove them, and convert all code over to the new
APIs.
* src/util/util.c, src/util/util.h: Delete virTimeMs and virTimestamp
* src/lxc/lxc_driver.c, src/qemu/qemu_domain.c,
src/qemu/qemu_driver.c, src/qemu/qemu_migration.c,
src/qemu/qemu_process.c, src/util/event_poll.c: Convert to use
virtime APIs
The logging APIs need to be able to generate formatted timestamps
using only async signal safe functions. This rules out using
gmtime/localtime/malloc/gettimeday(!) and much more.
Introduce a new internal API which is async signal safe.
virTimeMillisNowRaw replacement for gettimeofday. Uses clock_gettime
where available, otherwise falls back to the unsafe
gettimeofday
virTimeFieldsNowRaw replacements for gmtime(), convert a timestamp
virTimeFieldsThenRaw into a broken out set of fields. No localtime()
replacement is provided, because converting to
local time is not practical with only async signal
safe APIs.
virTimeStringNowRaw replacements for strftime() which print a timestamp
virTimeStringThenRaw into a string, using a pre-determined format, with
a fixed size buffer (VIR_TIME_STRING_BUFLEN)
For each of these there is also a version without the Raw postfix
which raises a full libvirt error. These versions are not async
signal safe
* src/Makefile.am, src/util/virtime.c, src/util/virtime.h: New files
* src/libvirt_private.syms: New APis
* configure.ac: Check for clock_gettime in -lrt
* tests/virtimetest.c, tests/Makefile.am: Test new APIs
Fix the build on Mingw32 by removing the now obsolete
virGetPMCapabilities symbol from the private exports file
* src/libvirt_private.syms: Remove virGetPMCapabilities
This adds per-device weights to <blkiotune>. Note that the
cgroups implementation only supports weights per block device,
and not per-file within the device; hence this option must be
global to the domain definition rather than tied to individual
<devices>/<disk> entries:
<domain ...>
<blkiotune>
<device>
<path>/path/to/block</path>
<weight>1000</weight>
</device>
</blkiotune>
..
This patch also adds a parameter --device-weights to virsh command
blkiotune for setting/getting blkiotune.weight_device for any
hypervisor that supports it. All <device> entries under
<blkiotune> are concatenated into a single string attribute under
virDomain{Get,Set}BlkioParameters, named "device_weight".
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add the core functions that implement the functionality of the API.
Suspend is done by using an asynchronous mechanism so that we can return
the status to the caller before the host gets suspended. This asynchronous
operation is achieved by suspending the host in a separate thread of
execution. However, returning the status to the caller is only best-effort,
but not guaranteed.
To resume the host, an RTC alarm is set up (based on how long we want to
suspend) before suspending the host. When this alarm fires, the host
gets woken up.
Suspend-to-RAM operation on a host running Linux can take upto more than 20
seconds, depending on the load of the system. (Freezing of tasks, an operation
preceding any suspend operation, is given up after a 20 second timeout).
And Suspend-to-Disk can take even more time, considering the time required
for compaction, creating the memory image and writing it to disk etc.
So, we do not allow the user to specify a suspend duration of less than 60
seconds, to be on the safer side, since we don't want to prematurely declare
failure when we only had to wait for some more time.
In preparation of DHCP Snooping and the detection of multiple IP
addresses per interface:
The hash table that is used to collect the detected IP address of an
interface can so far only handle one IP address per interface. With
this patch we extend this to allow it to handle a list of IP addresses.
Above changes the returned variable type of virNWFilterGetIpAddrForIfname()
from char * to virNWFilterVarValuePtr; adapt all existing functions calling
this function.
This patch adds support for filtering of STP (spanning tree protocol) traffic
to the parser and makes us of the ebtables support for STP filtering. This code
now enables the filtering of traffic in chains with prefix 'stp'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from pm-utils package) is run to discover if
Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them are supported, then a new tag "<power_management>" is
introduced in the XML under the <host> tag.
However in case the query to check for power management features succeeded,
but the host does not support any such feature, then the XML will contain
an empty <power_management/> tag. In the event that the PM query itself
failed, the XML will not contain any "power_management" tag.
To use this, new APIs could be implemented in libvirt to exploit power
management features such as S3/S4.
Now, when we support multiple consoles per domain,
the vm->def->console[0] can still remain an alias
for vm->def->serial[0]; However, we need to copy
it's source definition as well otherwise we'll regress
on virDomainOpenConsole.
Mingw32 complains if you request export of a symbol which does
not in fact exist.
* src/libvirt_bridge.syms, src/libvirt_macvtap.syms: Delete
obsolete files
* src/libvirt_private.syms: Remove virNetServerGetDBusConn
* src/libvirt_dbus.syms: Add virNetServerGetDBusConn
This patch extends the NWFilter driver for Linux (ebiptables) to create
rules for each member of a previously introduced list. If for example
an attribute value (internally) looks like this:
IP = [10.0.0.1, 10.0.0.2, 10.0.0.3]
then 3 rules will be generated for a rule accessing the variable 'IP',
one for each member of the list. The effect of this is that this now
allows for filtering for multiple values in one field. This can then be
used to support for filtering/allowing of multiple IP addresses per
interface.
An iterator is introduced that extracts each member of a list and
puts it into a hash table which then is passed to the function creating
a rule. For the above example the iterator would cause 3 loops.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
NWFilters can be provided name-value pairs using the following
XML notation:
<filterref filter='xyz'>
<parameter name='PORT' value='80'/>
<parameter name='VAL' value='abc'/>
</filterref>
The internal representation currently is so that a name is stored as a
string and the value as well. This patch now addresses the value part of it
and introduces a data structure for storing a value either as a simple
value or as an array for later support of lists.
This patch adjusts all code that was handling the values in hash tables
and makes it use the new data type.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Add a function to the virHashTable for getting an array of the hash table's
key-value pairs and have the keys (optionally) sorted.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Move the ifaceMacvtapLinkDump and ifaceGetNthParent functions
into virnetdevvportprofile.c since they are specific to that
code. This avoids polluting the headers with the Linux specific
netlink data types
* src/util/interface.c, src/util/interface.h: Move
ifaceMacvtapLinkDump and ifaceGetNthParent functions and delete
remaining file
* src/util/virnetdevvportprofile.c: Add ifaceMacvtapLinkDump
and ifaceGetNthParent functions
* src/network/bridge_driver.c, src/nwfilter/nwfilter_gentech_driver.c,
src/nwfilter/nwfilter_learnipaddr.c, src/util/virnetdevmacvlan.c:
Remove include of interface.h
Rename ifaceIsVirtualFunction to virNetDevIsVirtualFunction,
ifaceGetVirtualFunctionIndex to virNetDevGetVirtualFunctionIndex
and ifaceGetPhysicalFunction to virNetDevGetPhysicalFunction
* src/util/interface.c, src/util/interface.h: Rename APIs
* src/util/virnetdevvportprofile.c: Update for API rename
Rename the ifaceCheck method to virNetDevValidateConfig and change
so that it always raises an error and returns -1 on error.
* src/util/interface.c, src/util/interface.h: Rename ifaceCheck
to virNetDevValidateConfig
* src/nwfilter/nwfilter_gentech_driver.c,
src/nwfilter/nwfilter_learnipaddr.c: Update for API rename
To match up with the existing virNetDevSetIPv4Address, rename
ifaceGetIPAddress to virNetDevGetIPv4Address
* util/interface.h, util/interface.c: Rename API
* network/bridge_driver.c: Update for API rename
Rename the ifaceGetIndex method to virNetDevGetIndex and
ifaceGetVlanID to virNetDevGetVLanID. Also change the error
reporting behaviour to always raise errors and return -1 on
failure
* util/interface.c, util/interface.h: Rename ifaceGetIndex
and ifaceGetVLAN
* nwfilter/nwfilter_gentech_driver.c, nwfilter/nwfilter_learnipaddr.c,
nwfilter/nwfilter_learnipaddr.c, util/virnetdevvportprofile.c: Update
for API renames and error handling changes
Rename ifaceReplaceMacAddress to virNetDevReplaceMacAddress
and ifaceRestoreMacAddress to virNetDevRestoreMacAddress.
* util/interface.c, util/interface.h, util/virnetdevmacvlan.c:
Rename APIs
Rename ifaceMacvtapLinkAdd to virNetDevMacVLanCreate and
ifaceLinkDel to virNetDevMacVLanDelete. Strictly speaking
the latter isn't restricted to macvlan devices, but that's
the only use libvirt has for it.
* util/interface.c, util/interface.h,
util/virnetdevmacvlan.c: Rename APIs
In preparation for code re-organization, rename the Macvtap
management APIs to have the following patterns
virNetDevMacVLanXXXXX - macvlan/macvtap interface management
virNetDevVPortProfileXXXX - virtual port profile management
* src/util/macvtap.c, src/util/macvtap.h: Rename APIs
* src/conf/domain_conf.c, src/network/bridge_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_command.h,
src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/qemu/qemu_process.h: Update for renamed APIs
I missed adding virNetServerGetDBusConn() to libvirtd_private.syms
in commit b8adfcc6, which didn't cause a problem in 0.9.6 but
results in this build error in 0.9.7
libvirtd-remote.o: In function `remoteDispatchAuthPolkit':
remote.c:(.text+0x188dd): undefined reference to `virNetServerGetDBusConn'
The ifaceSetMac and ifaceGetMac APIs duplicate the functionality
of the virNetDevSetMAC and virNetDevGetMAC APIs, but returning
errno's instead of raising errors.
* src/util/interface.c, src/util/interface.h: Remove
ifaceSetMac and ifaceGetMac APIs, adjusting callers
for new error behaviour
The ifaceUp, ifaceDown, ifaceCtrl & ifaceIsUp APIs can be replaced
with calls to virNetDevSetOnline and virNetDevIsOnline
* src/util/interface.c, src/util/interface.h: Delete ifaceUp,
ifaceDown, ifaceCtrl & ifaceIsUp
* src/nwfilter/nwfilter_gentech_driver.c, src/util/macvtap.c:
Update to use virNetDevSetOnline and virNetDevIsOnline
Rename the virVirtualPortProfileParams struct to be
virNetDevVPortProfile, and rename the APIs to match
this prefix.
* src/util/network.c, src/util/network.h: Rename port profile
APIs
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/conf/network_conf.c, src/conf/network_conf.h,
src/network/bridge_driver.c, src/qemu/qemu_hotplug.c,
src/util/macvtap.c, src/util/macvtap.h: Update for
renamed APIs/structs
This allows strings to be transported between client and server
in the context of name-type-value virTypedParameter functions.
For compatibility,
o new clients will not send strings to old servers, based on
a feature check
o new servers will not send strings to old clients without the
flag VIR_TYPED_PARAM_STRING_OKAY; this will be enforced at
the RPC layer in the next patch, so that drivers need not
worry about it in general. The one exception is that
virDomainGetSchedulerParameters lacks a flags argument, so
it must not return a string; drivers that forward that
function on to virDomainGetSchedulerParametersFlags will
have to pay attention to the flag.
o the flag VIR_TYPED_PARAM_STRING_OKAY is set automatically,
based on a feature check (so far, no driver implements it),
so clients do not have to worry about it
Future patches can then enable the feature on a per-driver basis.
This patch also ensures that drivers can blindly strdup() field
names (previously, a malicious client could stuff 80 non-NUL bytes
into field and cause a read overrun).
* src/libvirt_internal.h (VIR_DRV_FEATURE_TYPED_PARAM_STRING): New
driver feature.
* src/libvirt.c (virTypedParameterValidateSet)
(virTypedParameterSanitizeGet): New helper functions.
(virDomainSetMemoryParameters, virDomainSetBlkioParameters)
(virDomainSetSchedulerParameters)
(virDomainSetSchedulerParametersFlags)
(virDomainGetMemoryParameters, virDomainGetBlkioParameters)
(virDomainGetSchedulerParameters)
(virDomainGetSchedulerParametersFlags, virDomainBlockStatsFlags):
Use them.
* src/util/util.h (virTypedParameterArrayClear): New helper
function.
* src/util/util.c (virTypedParameterArrayClear): Implement it.
* src/libvirt_private.syms (util.h): Export it.
Based on an initial patch by Hu Tao, with feedback from
Daniel P. Berrange.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add virnetdev.h,virnetdevbridge.h,virnetdevtap.h to private symbols,
since debian linker no longer allows transitive link resolution
Signed-off-by: Eli Qiao <taget@linux.vnet.ibm.com>
The socket address APIs in src/util/network.h either take the
form virSocketAddrXXX, virSocketXXX or virSocketXXXAddr.
Sanitize this so everything is virSocketAddrXXXX, and ensure
that the virSocketAddr parameter is always the first one.
* src/util/network.c, src/util/network.h: Santize socket
address API naming
* src/conf/domain_conf.c, src/conf/network_conf.c,
src/conf/nwfilter_conf.c, src/network/bridge_driver.c,
src/nwfilter/nwfilter_ebiptables_driver.c,
src/nwfilter/nwfilter_learnipaddr.c,
src/qemu/qemu_command.c, src/rpc/virnetsocket.c,
src/util/dnsmasq.c, src/util/iptables.c,
src/util/virnetdev.c, src/vbox/vbox_tmpl.c: Update for
API renaming
To support "managed" mode of host PCI device, we record the original
states (unbind_from_stub, remove_slot, and reprobe) so that could
reattach the device to host with original driver. But there is no XML
for theses attrs, and thus after daemon is restarted, we lose the
original states. It's easy to reproduce:
1) virsh start domain
2) virsh attach-device dom hostpci.xml (in 'managed' mode)
3) service libvirtd restart
4) virsh destroy domain
You will see the device won't be bound to the original driver
if there was one.
This patch is to solve the problem by introducing internal XML
(won't be dumped to user, only dumped to status XML). The XML is:
<origstates>
<unbind/>
<remove_slot/>
<reprobe/>
</origstates>
Which will be child node of <hostdev><source>...</souce></hostdev>.
(only for PCI device).
A new struct "virDomainHostdevOrigStates" is introduced for the XML,
and the according members are updated when preparing the PCI device.
And function "qemuUpdateActivePciHostdevs" is modified to honor
the original states. Use of qemuGetPciHostDeviceList is removed
in function "qemuUpdateActivePciHostdevs", and the "managed" value of
the device config is honored by the change. This fixes another problem
alongside:
qemuGetPciHostDeviceList set the device as "managed" force
regardless of whether the device is configured as "managed='yes'"
or not in XML, which is not right.
Add additional fields to let you specify the how to authenticate with a disk.
The secret to use may be referenced by a usage string or a UUID, i.e.:
<auth username='myuser'>
<secret type='ceph' usage='secretname'/>
</auth>
or
<auth username='myuser'>
<secret type='ceph' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
</auth>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
The RPC server classes are extended to allow FDs to be received
from clients with calls. There is not currently any way for a
procedure to pass FDs back to the client with replies
* daemon/remote.c, src/rpc/gendispatch.pl: Change virNetMessageHeaderPtr
param to virNetMessagePtr in dispatcher impls
* src/rpc/virnetserver.c, src/rpc/virnetserverclient.c,
src/rpc/virnetserverprogram.c, src/rpc/virnetserverprogram.h:
Extend to support FD passing
Extend the RPC client code to allow file descriptors to be sent
to the server with calls, and received back with replies.
* src/remote/remote_driver.c: Stub extra args
* src/libvirt_private.syms, src/rpc/virnetclient.c,
src/rpc/virnetclient.h, src/rpc/virnetclientprogram.c,
src/rpc/virnetclientprogram.h: Extend APIs to allow
FD passing
Define two new RPC message types VIR_NET_CALL_WITH_FDS and
VIR_NET_REPLY_WITH_FDS. These message types are equivalent
to VIR_NET_CALL and VIR_NET_REPLY, except that between the
message header, and payload there is a 32-bit integer field
specifying how many file descriptors have been passed.
The actual file descriptors are sent/recv'd out of band.
* src/rpc/virnetmessage.c, src/rpc/virnetmessage.h,
src/libvirt_private.syms: Add support for handling
passed file descriptors
* src/rpc/virnetprotocol.x: Extend protocol for FD
passing
Add APIs to the virNetSocket object, to allow file descriptors
to be sent/received over UNIX domain socket connections
* src/rpc/virnetsocket.c, src/rpc/virnetsocket.h,
src/libvirt_private.syms: Add APIs for FD send/recv
Every time we write XML into a file we call virEmitXMLWarning to write a
warning that the file is automatically generated. virXMLSaveFile
simplifies this into a single step and makes rewriting existing XML file
safe by using virFileRewrite internally.
When saving config files we just overwrite old content of the file. In
case something fails during that process (e.g. disk gets full) we lose
both old and new content. This patch makes the process more robust by
writing the new content into a separate file and only if that succeeds
the original file is atomically replaced with the new one.
If a disk source gets dropped because it is not accessible,
mgmt application might want to be informed about this. Therefore
we need to emit an event. The event presented in this patch
is however a bit superset of what written above. The reason is simple:
an intention to be easily expanded, e.g. on 'user ejected disk
in guest' events. Therefore, callback gets source string and disk alias
(which should be unique among a domain) and reason (an integer);