Following the patterns established by lifecycle events, this
creates all the new RPC calls needed to pass callback IDs
for every domain event, and changes the limits in client and
server codes to use modern style when possible.
I've tested all combinations: both 'old client and new server'
and 'new client and old server' continue to work with the old
RPCs, and 'new client and new server' benefit from server-side
filtering with the new RPCs.
* src/remote/remote_protocol.x (REMOTE_PROC_DOMAIN_EVENT_*): Add
REMOTE_PROC_DOMAIN_EVENT_CALLBACK_* counterparts.
* daemon/remote.c (remoteRelayDomainEvent*): Send callbackID via
newer RPC when used with new-style registration.
(remoteDispatchConnectDomainEventCallbackRegisterAny): Extend to
cover all domain events.
* src/remote/remote_driver.c (remoteDomainBuildEvent*): Add new
Callback and Helper functions.
(remoteEvents): Match order of RPC numbers, register new handlers.
(remoteConnectDomainEventRegisterAny)
(remoteConnectDomainEventDeregisterAny): Extend to cover all
domain events.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
The counterpart to the server RPC additions; here, a single
function can serve both old and new calls, while incoming
events must be serviced by two different functions. Again,
some wise choices in our XDR made it easier to share code
managing similar events.
While this only supports lifecycle events, it covers the
harder part of how Register and RegisterAny interact; the
remaining 15 events will be a mechanical change in a later
patch. For Register, we now have a callbackID locally for
more efficient cleanup if the RPC fails; we also prefer to
use the newer RPC where we know it is supported (the older
RPC must be used if we don't know if RegisterAny is
supported).
* src/remote/remote_driver.c (remoteEvents): Register new RPC
event handler.
(remoteDomainBuildEventLifecycle): Move guts...
(remoteDomainBuildEventLifecycleHelper): ...here.
(remoteDomainBuildEventCallbackLifecycle): New function.
(remoteConnectDomainEventRegister)
(remoteConnectDomainEventDeregister)
(remoteConnectDomainEventRegisterAny)
(remoteConnectDomainEventDeregisterAny): Use new RPC when supported.
We want to convert over to server-side events, even for older
APIs. To do that, the client side of the remote driver wants
to distinguish between legacy virConnectDomainEventRegister and
normal virConnectDomainEventRegisterAny, while knowing the
client callbackID and the server's serverID for both types of
registration. The client also needs to probe whether the
server supports server-side filtering. However, for ease of
review, we don't actually use the new RPCs until a later patch.
* src/conf/object_event_private.h (virObjectEventStateCallbackID):
Add parameter.
* src/conf/object_event.c (virObjectEventCallbackListAddID)
(virObjectEventStateRegisterID): Separate legacy from callbackID.
(virObjectEventStateCallbackID): Pass through parameter.
(virObjectEventCallbackLookup): Let legacy and global domain
lifecycle events share a common remoteID.
* src/conf/network_event.c (virNetworkEventStateRegisterID):
Update caller.
* src/conf/domain_event.c (virDomainEventStateRegister)
(virDomainEventStateRegisterID, virDomainEventStateDeregister):
Likewise.
(virDomainEventStateRegisterClient)
(virDomainEventStateCallbackID): Implement new functions.
* src/conf/domain_event.h (virDomainEventStateRegisterClient)
(virDomainEventStateCallbackID): New prototypes.
* src/remote/remote_driver.c (private_data): Add field.
(doRemoteOpen): Probe server feature.
(remoteConnectDomainEventRegister)
(remoteConnectDomainEventRegisterAny): Use new function.
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch adds some new RPC call numbers, but for ease of review,
they sit idle until a later patch adds the client counterpart to
drive the new RPCs. Also for ease of review, I limited this patch
to just the lifecycle event; although converting the remaining
15 domain events will be quite mechanical. On the server side,
we have to have a function per RPC call, largely with duplicated
bodies (the key difference being that we store in our callback
opaque pointer whether events should be fired with old or new
style); meanwhile, a single function can drive multiple RPC
messages. With a strategic choice of XDR struct layout, we can
make the event generation code for both styles fairly compact.
I debated about adding a tri-state witness variable per
connection (values 'unknown', 'legacy', 'modern'). It would start
as 'unknown', move to 'legacy' if any RPC call is made to a legacy
event call, and move to 'modern' if the feature probe is made;
then the event code could issue an error if the witness state is
incorrect (a legacy RPC call while in 'modern', a modern RPC call
while in 'unknown' or 'legacy', and a feature probe while in
'legacy' or 'modern'). But while it might prevent odd behavior
caused by protocol fuzzing, I don't see that it would prevent
any security holes, so I considered it bloat.
Note that sticking @acl markers on the new RPCs generates unused
functions in access/viraccessapicheck.c, because there is no new
API call that needs to use the new checks; however, having a
consistent .x file is worth the dead code.
* src/libvirt_internal.h (VIR_DRV_FEATURE_REMOTE_EVENT_CALLBACK):
New feature.
* src/remote/remote_protocol.x
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_CALLBACK_REGISTER_ANY)
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_CALLBACK_DEREGISTER_ANY)
(REMOTE_PROC_DOMAIN_EVENT_CALLBACK_LIFECYCLE): New RPCs.
* daemon/remote.c (daemonClientCallback): Add field.
(remoteDispatchConnectDomainEventCallbackRegisterAny)
(remoteDispatchConnectDomainEventCallbackDeregisterAny): New
functions.
(remoteDispatchConnectDomainEventRegisterAny)
(remoteDispatchConnectDomainEventDeregisterAny): Mark legacy use.
(remoteRelayDomainEventLifecycle): Change message based on legacy
or new use.
(remoteDispatchConnectSupportsFeature): Advertise new feature.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
virGetStorageVol can return NULL on out-of-memory. If it does, cleanly
abort the volume clone operation.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
This reverts commit 67ccf91bf2.
We only generate the volume key after we've built it, but the storage
driver expects it to be filled after createVol finishes.
Squash the volume building back with creating to fulfill this
expectation.
Openstack Nova calls virConnectBaselineCPU() during initialization
of the instance to get a full list of CPU features.
This patch adds a stub to aarch64-specific code to handle
this request (no actual work is done). That's enough to have
this stub with limited functionality because qemu/kvm backend
supports only 'host-passthrough' cpu mode on aarch64.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
A small fix for the possiblitiy of jumping to an error path before
registering for domain events, preventing receiving important ones
like shutdown and death.
This shadows the index function on some systems (RHEL-6.4, FreeBSD 9):
../../src/conf/capabilities.c: In function 'virCapabilitiesGetCpusForNode':
../../src/conf/capabilities.c:1005: warning: declaration of'index'
shadows a global declaration [-Wshadow]
/usr/include/strings.h:57: warning: shadowed declaration is here [-Wshadow]
On some platforms like IBM PowerNV the NUMA node numbers can be
non-sequential. For eg. numactl --hardware o/p from such a machine looks
as given below
node distances:
node 0 1 16 17
0: 10 40 40 40
1: 40 10 40 40
16: 40 40 10 40
17: 40 40 40 10
The NUMA nodes are 0,1,16,17
Libvirt uses sequential index as NUMA node numbers and this can
result in crash or incorrect results.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
Add a new backend for any character device. This backend uses channel
in spice connection. This channel is similar to spicevmc, but
all-purpose in contrast to spicevmc.
Apart from spicevmc, spiceport-backed chardev will not be formatted
into the command-line if there is no spice to use (with test for that
as well). For this I moved the def->graphics counting to the start
of the function so its results can be used in rest of the code even in
the future.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch is here just to ease the code review and make related
changes look more sensible. Apart from removing the condition this is
merely a whitespace (indentation) change.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Limiting ourselves to qemu without QEMU_CAPS_DEVICE capability, we
used '-serial none' only if there was no serial device defined in the
domain XML. This means that if we want to have a possibility of the
device being defined in XML, but not used in the command-line
(e.g. when it's pointless), we'll fail to attach '-serial none' to the
command-line (when skipping the device's command-line building and the
device being the only one).
Since there is no such device, this patch doesn't actually do
anything, but enables easier future additions in this manner.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add a new character device backend called 'spiceport' that uses
spice's channel for communications and apart from spicevmc can be used
as a backend for any character device from libvirt's point of view.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This new RBD format supports snapshotting and cloning. By having
libvirt create images in format 2 end-users of the created images
can benefit from the new RBD format.
Older versions of libvirt can work with this new RBD format as long
as librbd supports format 2. RBD format is supported by librbd since
version 0.56 (Ceph Bobtail).
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When restarting sheepdog pool, all volumes are missing.
This patch add automatically all volume from the added pool.
Adding last Daniel P. Berrange's syntaxes correction.
Adding vol on separeted function 'inspired' from parallels_storage :
parallelsAddDiskVolume
In order to make a client-only build successful on RHEL4 (yes, you
read that correctly!), commit 3ed2e54 modified src/util/virnetdev.c so
that the functional version of virNetDevGetVLanID() was only compiled
if GET_VLAN_VID_CMD was defined. However, it is *never* defined, but
is only an enum value, so the proper version was no longer compiled
even on platforms that support it. This resulted in the vlan tag not
being properly set for guest traffic on VEPA mode guest macvtap
interfaces that were bound to a vlan interface (that's the only place
that libvirt currently uses virNetDevGetVLanID)
Since there is no way to compile conditionally based on the presence
of an enum value, this patch modifies configure.ac to check for said
enum value with AC_CHECK_DECLS(), which #defines
HAVE_DECL_GET_VLAN_VID_CMD to 1 if it's successful compiling a test
program that uses GET_VLAN_VID_CMD (and still #defines it, but to 0,
if it's not successful). We can then make the compilation of
virNetDevGetVLanID() conditional on the value of
HAVE_DECL_GET_VLAN_VID_CMD.
Reset line numbering on each input file in check-aclrules.pl. Otherwise
it reports wrong line numbers in its error messages.
Signed-off-by: Yuri Myasoedov <ymyasoedov@yandex.ru>
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
In the network status XML we may have the <floor/> element with the
'sum' attribute. The attribute represents sum of all 'floor'-s of
computed over each interface connected to the network (this is needed to
guarantee certain bandwidth for certain domain). The sum is therefore a
number. However, if the number was mangled (e.g. by an user's
interference to network status file), we've just ignored it without
refusing to parse such file. This was all due to 'goto error' missing.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code took into account only the global permissions. The domains now
support per-vm DAC labels and per-image DAC labels. Use the most
specific label available.
The lack of debug printings might be frustrating in the future.
Moreover, this function doesn't follow the usual pattern we have in the
rest of the code:
int ret = -1;
/* do some work */
ret = 0;
cleanup:
/* some cleanup work */
return ret;
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add a new <timer> for the HyperV reference time counter enlightenment
and the iTSC reference page for Windows guests.
This feature provides a paravirtual approach to track timer events for
the guest (similar to kvmclock) with the option to use real hardware
clock on systems with a iTSC with compensation across various hosts.
According to the documentation various timer options are only supported
by certain timer types. Add a post parse check to verify that the user
didn't specify invalid options.
Also fix the qemu command line parsing function to set correct default
values for the kvmclock timer so that it passes the new check.
According to the documentation describing various tunables for domain
timers not all the fields are supported by all the driver types. Express
these in the RNG:
- rtc, platform: Only these support the "track" attribute.
- tsc: only one to support "frequency" and "mode" attributes
- hpet, pit: tickpolicy/catchup attribute/element
- kvmclock: no extra attributes are supported
Additionally the attributes of the <catchup> element for
tickpolicy='catchup' are optional according to the parsing code. Express
this in the XML and fix a spurious space added while formatting the
<catchup> element and add tests for it.
Coverity complains about "USE_AFTER_FREE" due to how virPCIDeviceSetStubDriver
"could" return either -1, 0, or 1 from the VIR_STRDUP() and then possibly makes
a call to virPCIDeviceDetach().
The only way this could happen is if NULL were passed as the "driver" name
and virStrdup() returned 0. Since the calling functions check < 0 on the
initial function call, the 0 possibility causes Coverity to complain.
To fix this - enforce that the second parameter is not NULL using
ATTRIBUTE_NONNULL(2) for the function prototype, then in virPCIDeviceDetach
add an sa_assert(dev->stubDriver). This will result in Coverity not complaining
any more.
Couple of codepaths shared the same code which can be moved out to a
function and on one of such places, qemuMigrationConfirmPhase(), the
domain was resumed even if it wasn't running before the migration
started.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1057407
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
libxlDomainRestoreFlags acquires the driver lock while reading the
domain config from the save file and adding it to
libxlDriverPrivatePtr->domains. But virDomainObjList provides
self-locking APIs, so remove the needless driver locking.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
If available, let libxl handle reaping any children it creates by
specifying libxl_sigchld_owner_libxl_always_selective_reap. This
feature was added to improve subprocess handling in libxl when used
in an application that does not install a SIGCHLD handler like
libvirt
http://lists.xen.org/archives/html/xen-devel/2014-01/msg01555.html
Prior to this patch, it is possible to hit asserts in libxl when
reaping subprocesses, particularly during simultaneous operations
on multiple domains. With this patch, and the corresponding changes
to libxl, I no longer see the asserts. Note that the libxl changes
will be included in Xen 4.4.0. Previous Xen versions will be
susceptible to hitting the asserts even with this patch applied to
the libvirt libxl driver.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Handling the domain shutdown event within the event handler seems
a bit unfair to libxl's event machinery. Domain "shutdown" could
take considerable time. E.g. if the shutdown reason is reboot,
the domain must be reaped and then started again.
Spawn a shutdown handler thread to do this work, allowing libxl's
event machinery to go about its business.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Due to some misunderstanding of requirements libxl places on timer
handling, I introduced the half-brained idea of maintaining a list
of timeouts that the driver could force to expire before freeing a
libxlDomainObjPrivate (and hence libxl_ctx). But testing all
the latest versions of Xen supported by the libxl driver (4.2.3,
4.3.1, 4.4.0 RC3), I see that libxl will handle this just fine and
there is no need to force expiration behind libxl's back. Indeed it
may be harmful to do so.
This patch removes the timer list, allowing libxl to handle cleanup
of its timer registrations.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
When libxl registers an FD with the libxl driver, the refcnt of the
associated libxlDomainObjPrivate object is incremented. The refcnt
is decremented when libxl deregisters the FD. But some FDs are only
deregistered when their libxl ctx is freed, which unfortunately is
done in the libxlDomainObjPrivate dispose function. With references
held by the FDs, libxlDomainObjPrivate is never disposed.
I added the ref/unref in FD registration/deregistration when adding
the same in timer registration/deregistration. For timers, this
is a simple approach to ensuring the libxlDomainObjPrivate is not
disposed prior to their expirtation, which libxl guarantees will
occur. It is not needed for FDs, and only causes
libxlDomainObjPrivate to leak.
This patch removes the reference on libxlDomainObjPrivate for FD
registrations, but retains them for timer registrations. Tested on
the latest releases of Xen supported by the libxl driver: 4.2.3,
4.3.1, and 4.4.0 RC3.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This commit allows to attach/detach a <filesystem> device in qemu. For
this purpose I'm introducing two new functions: virDomainFSInsert() and
virDomainFSRemove() and adding necessary code in the qemu driver. It
compares filesystems based on their "destination" folder. So if two
filesystems share the same destination, they are considered equal and
the qemu driver would reject the insertion.
Signed-off-by: Matthieu Coudron <mattator@gmail.com>
If virDomainMemoryStats was run on a domain with virtio balloon driver
running on an old qemu which supports QMP but does not support qom-list
QMP command, libvirtd would crash. The reason is we did not check if
qemuMonitorJSONGetObjectListPaths failed and moreover we even stored its
result in an unsigned integer type.
When attempting a blockcommit from the top layer, the base argument
passed is NULL. This will be dereferenced when attempting a commit with
an empty image chain. Output the real volume path instead:
virsh blockcommit --verbose --path vda --domain DOMNAME --wait
error: invalid argument: top '/path/somefile' in chain for 'vda' has no backing file
instead of:
error: invalid argument: top '(null)' in chain for 'vda' has no backing file
Eric Blake suggested to change this message to be different from the
glibc's NULL deref protection message in printf to be able to
differentiate errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1046192
Commit b8bf79a, which adds clock='variable', forgets to check
localtime basis in qemuBuildClockArgStr(). So that localtime
basis could not be used.
Reported-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 2ce63c1 added imagelabel generation when relabeling is turned
off. But we weren't filling out the sensitivity for type 'none' labels,
resulting in an invalid label:
$ virsh managedsave domain
error: unable to set security context 'system_u:object_r:svirt_image_t'
on fd 28: Invalid argument
Noticed a misuse of 'to' while testing my event regression under
polkit ACLs, and decided to review the entire conf files for
other legibility bugs.
* daemon/libvirtd.conf: Use correct grammar.
* src/qemu/qemu.conf: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1058839
Commit f9f56340 for CVE-2014-0028 almost had the right idea - we
need to check the ACL rules to filter which events to send. But
it overlooked one thing: the event dispatch queue is running in
the main loop thread, and therefore does not normally have a
current virIdentityPtr. But filter checks can be based on current
identity, so when libvirtd.conf contains access_drivers=["polkit"],
we ended up rejecting access for EVERY event due to failure to
look up the current identity, even if it should have been allowed.
Furthermore, even for events that are triggered by API calls, it
is important to remember that the point of events is that they can
be copied across multiple connections, which may have separate
identities and permissions. So even if events were dispatched
from a context where we have an identity, we must change to the
correct identity of the connection that will be receiving the
event, rather than basing a decision on the context that triggered
the event, when deciding whether to filter an event to a
particular connection.
If there were an easy way to get from virConnectPtr to the
appropriate virIdentityPtr, then object_event.c could adjust the
identity prior to checking whether to dispatch an event. But
setting up that back-reference is a bit invasive. Instead, it
is easier to delay the filtering check until lower down the
stack, at the point where we have direct access to the RPC
client object that owns an identity. As such, this patch ends
up reverting a large portion of the framework of commit f9f56340.
We also have to teach 'make check' to special-case the fact that
the event registration filtering is done at the point of dispatch,
rather than the point of registration. Note that even though we
don't actually use virConnectDomainEventRegisterCheckACL (because
the RegisterAny variant is sufficient), we still generate the
function for the purposes of documenting that the filtering
takes place.
Also note that I did not entirely delete the notion of a filter
from object_event.c; I still plan on using that for my upcoming
patch series for qemu monitor events in libvirt-qemu.so. In
other words, while this patch changes ACL filtering to live in
remote.c and therefore we have no current client of the filtering
in object_event.c, the notion of filtering in object_event.c is
still useful down the road.
* src/check-aclrules.pl: Exempt event registration from having to
pass checkACL filter down call stack.
* daemon/remote.c (remoteRelayDomainEventCheckACL)
(remoteRelayNetworkEventCheckACL): New functions.
(remoteRelay*Event*): Use new functions.
* src/conf/domain_event.h (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Drop unused parameter.
* src/conf/network_event.h (virNetworkEventStateRegisterID):
Likewise.
* src/conf/domain_event.c (virDomainEventFilter): Delete unused
function.
* src/conf/network_event.c (virNetworkEventFilter): Likewise.
* src/libxl/libxl_driver.c: Adjust caller.
* src/lxc/lxc_driver.c: Likewise.
* src/network/bridge_driver.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/remote/remote_driver.c: Likewise.
* src/test/test_driver.c: Likewise.
* src/uml/uml_driver.c: Likewise.
* src/vbox/vbox_tmpl.c: Likewise.
* src/xen/xen_driver.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1057321
pointed out that we weren't honoring the <bandwidth> element in
libvirt networks using <forward mode='bridge'/>. In fact, these
networks are just a method of giving a libvirt network name to an
existing Linux host bridge on the system, and libvirt doesn't have
enough information to know where to set such limits. We are working on
a method of supporting network bandwidths for some specific cases of
<forward mode='bridge'/>, but currently libvirt doesn't support it. So
the proper thing to do now is just log an error when someone tries to
put a <bandwidth> element in that type of network. (It's unclear if we
will be able to do proper bandwidth limiting for macvtap networks, and
most definitely we will not be able to support it for hostdev
networks).
While looking through the network XML documentation and comparing it
to the networkValidate function, I noticed that we also ignore the
presence of a mac address in the config in the same cases, rather than
failing so that the user will understand that their desired action has
not been taken.
This patch updates networkValidate() (which is called any time a
persistent network is defined, or a transient network created) to log
an error and fail if it finds either a <bandwidth> or <mac> element
and the network forward mode is anything except 'route'. 'nat', or
nothing. (Yes, neither of those elements is acceptable for any macvtap
mode, nor for a hostdev network).
NB: This does *not* cause failure to start any existing network that
contains one of those elements, so someone might have erroneously
defined such a network in the past, and that network will continue to
function unmodified. I considered it too disruptive to suddenly break
working configs on the next reboot after a libvirt upgrade.
https://bugzilla.redhat.com/show_bug.cgi?id=1045124
When loading modules, libvirt does not honor the modprobe blacklist.
Use the new virKModLoad() API in order to attempt load with blacklist check.
Use the new virKModIsBlacklisted() API to check if the failure to load
was due to the blacklist
Signed-off-by: John Ferlan <jferlan@redhat.com>
virKModConfig() - Return a buffer containing kernel module configuration
virKModLoad() - Load a specific module into the kernel configuration
virKModUnload() - Unload a specific module from the kernel configuration
virKModIsBlacklisted() - Determine whether a module is blacklisted within
the kernel configuration
commit f094aaac changed qemuPrepareHostdevPCIDevices() such that it
may modify the "backend" (vfio vs. legacy kvm) setting in the
virHostdevDef. However, qemuDomainAttachHostPciDevice() (used by
hotplug) copies the backend setting into a local *before* calling
qemuPrepareHostdevPCIDevices(), and then later makes a decision based
on that pre-change value.
The result is that, if the backend had been set to "default" (i.e. not
specified in the config) and was later updated to "VFIO" by
qemuPrepareHostdevPCIDevices(), the qemu process' MacMemLock is not
increased (as is required for VFIO device assignment).
This patch delays making the local copy of backend until after its
potential modification.
The previous patch fixed "forwardPlainNames" so that it really is
doing only what is intended, but left the default to be
"forwardPlainNames='no'". Discussion around the initial version of
that patch led to the decision that the default should instead be
"forwardPlainNames='yes'" (i.e. the original behavior before commit
f3886825). This patch makes that change to the default.
In commit f386825 we began adding the options
--domain-needed
--local=/$mydomain/
to all dnsmasq commandlines with the stated reason of preventing
forwarding of DNS queries for names that weren't fully qualified
domain names ("FQDN", i.e. a name that included some "."s and a domain
name). This was later changed to
domain-needed
local=/$mydomain/
when we moved the options from the dnsmasq commandline to a conf file.
The original patch on the list, and discussion about it, is here:
https://www.redhat.com/archives/libvir-list/2012-August/msg01594.html
When a domain name isn't specified (mydomain == ""), the addition of
"domain-needed local=//" will prevent forwarding of domain-less
requests to the virtualization host's DNS resolver, but if a domain
*is* specified, the addition of "local=/domain/" will prevent
forwarding of any requests for *qualified* names within that domain
that aren't resolvable by libvirt's dnsmasq itself.
An example of the problems this causes - let's say a network is
defined with:
<domain name='example.com'/>
<dhcp>
..
<host mac='52:54:00:11:22:33' ip='1.2.3.4' name='myguest'/>
</dhcp>
This results in "local=/example.com/" being added to the dnsmasq options.
If a guest requests "myguest" or "myguest.example.com", that will be
resolved by dnsmasq. If the guest asks for "www.example.com", dnsmasq
will not know the answer, but instead of forwarding it to the host, it
will return NOT FOUND to the guest. In most cases that isn't the
behavior an admin is looking for.
A later patch (commit 4f595ba) attempted to remedy this by adding a
"forwardPlainNames" attribute to the <dns> element. The idea was that
if forwardPlainNames='yes' (default is 'no'), we would allow
unresolved names to be forwarded. However, that patch was botched, in
that it only removed the "domain-needed" option when
forwardPlainNames='yes', and left the "local=/mydomain/".
Really we should have been just including the option "--domain-needed
--local=//" (note the lack of domain name) regardless of the
configured domain of the network, so that requests for names without a
domain would be treated as "local to dnsmasq" and not forwarded, but
all others (including those in the network's configured domain) would
be forwarded. We also shouldn't include *either* of those options if
forwardPlainNames='yes'. This patch makes those corrections.
This patch doesn't remedy the fact that default behavior was changed
by the addition of this feature. That will be handled in a subsequent
patch.
We support only one spicevmc channel name anyway and the code is
prepared to use the default one, there's only one check missing. It
is also mentioned in the documentation already and helps defining
domains with spice vdagent for people using virsh.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Coverity complains about default: label in libxl_driver.c not be able
to be reached. It's by design for the code and since it's not necessary
in the code nor does it elicit any compiler/make check warnings - just
remove it rather than adding a coverity[dead_error_begin] tag.
While I'm at it, lxc_driver.c and nodeinfo.c have the same design, so I
removed the default labels and the existing coverity tags.
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of pthreads impls available on Win32
these days, in particular the mingw64 project has a good
impl. Delete the native windows thread implementation and
rely on using pthreads everywhere.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The check-augeas-lockd test depends on the file
locking/qemu-lockd.conf, so must be skipped when QEMU
is disabled.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 10c9ceff6d intended to introduce new argument for the
testing purpose, but it missed the similar changing of the
device's sg_path. The problem was hidden since my laptop has
the /dev/sg0 and /dev/sg1. A later patch will modify the tests
accordingly.
Signed-off-by: Osier Yang <jyang@redhat.com>
Reported-by: Pavel Hrdina <phrdina@redhat.com>
To support passing the path of the test data to the utils, one
more argument is added to virSCSIDeviceGetSgName,
virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related
code is changed accordingly.
Later tests for the scsi utils will be based on this patch.
Signed-off-by: Osier Yang <jyang@redhat.com>
It doesn't make sense to fail if the SCSI host device is specified
as "shareable" explicitly between domains (NB, it works if and only
if the device is specified as "shareable" for *all* domains,
otherwise it fails).
To fix the problem, this patch introduces an array for virSCSIDevice
struct, which records all the names of domain which are using the
device (note that the recorded domains must specify the device as
shareable). And the change on the data struct brings on many
subsequent changes in the code.
Prior to this patch, the "shareable" tag didn't work as expected,
it actually work like "non-shareable". So this patch also added notes
in formatdomain.html to declare the fact.
* src/util/virscsi.h:
- Remove virSCSIDeviceGetUsedBy
- Change definition of virSCSIDeviceGetUsedBy and virSCSIDeviceListDel
- Add virSCSIDeviceIsAvailable
* src/util/virscsi.c:
- struct virSCSIDevice: Change "used_by" to be an array; Add
"n_used_by" as the array count
- virSCSIDeviceGetUsedBy: Removed
- virSCSIDeviceFree: frees the "used_by" array
- virSCSIDeviceSetUsedBy: Copy the domain name to avoid potential
memory corruption
- virSCSIDeviceIsAvailable: New
- virSCSIDeviceListDel: Change the logic, for device which is already
in the list, just remove the corresponding entry in "used_by". And
since it's only used in one place, we can safely removing the code
to find out the dev in the list first.
- Copyright updating
* src/libvirt_private.sys:
- virSCSIDeviceGetUsedBy: Remove
- virSCSIDeviceIsAvailable: New
* src/qemu/qemu_hostdev.c:
- qemuUpdateActiveScsiHostdevs: Check if the device existing before
adding it to the list;
- qemuPrepareHostdevSCSIDevices: Error out if the not all domains
use the device as "shareable"; Also don't try to add the device
to the activeScsiHostdevs list if it already there; And make
more sensible error w.r.t the current "shareable" value in
driver->activeScsiHostdevs.
- qemuDomainReAttachHostScsiDevices: Change the logic according
to the changes on helpers.
Signed-off-by: Osier Yang <jyang@redhat.com>
This reverts commit 2996e6be19
and some parts of 2636dc8c4d.
The former one tried to implement QoS setting on bridgeless networks.
However, as discussed upstream [1], the patch is far away from being
useful in even a single case. The whole idea of network QoS is to have
aggregated limits over several interfaces. This patch is doing
completely the opposite when merging two QoS settings (from the network
and the domain interface) into one which is then set at the domain
interface itself, not the network.
The latter one is the test for the previous one. Now none of them makes
sense.
1: https://www.redhat.com/archives/libvir-list/2014-January/msg01441.html
Conflicts:
tests/virnetdevbandwidthtest.c: New test has been introduced since
then.
There are some units within libvirt that utilize virCommand API to run
some commands and deserve own unit testing. These units are, however,
not desired to be rewritten to dig virCommand API usage out. As a great
example virNetDevBandwidth could be used. The problem with the bandwidth
unit is: it uses virCommand API heavily. Therefore we need a mechanism
to not really run a command, but rather see its string representation
after which we can decide if the unit construct the correct sequence of
commands or not.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add support for specifying various types when doing snapshots. This will
later allow to do snapshots on network backed volumes. Disks of type
'volume' are not supported by snapshots (yet).
Also amend the test suite to check parsing of the various new disk
types that can now be specified.
Commit df36af58 broke parsing of http response from xend. The prior
use of atoi() would happily parse e.g. a string containing "200 OK\r\n",
whereas virStrToLong_i() will fail when called with a NULL end_ptr.
Change the calls to virStrToLong_i() to provide a non-NULL end_ptr.
https://bugzilla.redhat.com/show_bug.cgi?id=1049391
When all source CPU XMLs contain just a single CPU model (with a
possibly varying set of additional feature elements),
virConnectBaselineCPU will try to use this CPU model in the computed
guest CPU. Thus, when used on just a single CPU (useful with
VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES), the result will not use a
different CPU model.
If the computed CPU uses the source model, set fallback mode to 'forbid'
to make sure the guest CPU will always be as close as possible to the
source CPUs.
https://bugzilla.redhat.com/show_bug.cgi?id=1049391
VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES flag for virConnectBaselineCPU
did not work if the resulting guest CPU would disable some features
present in its base model. This patch makes sure we won't try to add
such features twice.
Implement virProcess{Get,Set}Affinity() using cpuset_getaffinity()
and cpuset_setaffinity() calls. Quick search showed that they are
only available on FreeBSD, so placed it inside existing #ifdef
blocks for FreeBSD instead of adding configure checks.
Creating a qemu VM with /dev/hwrng as backend RNG device throws the
following error - "Could not open '/dev/hwrng': Permission denied"
This patch fixes the issue
Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1055484
Currently, libvirt's XML schema of network allows QoS to be defined for
every network even though it has no bridge. For instance:
<network>
<name>vdsm-no-bridge</name>
<forward mode='passthrough'>
<interface dev='em1.10'/>
</forward>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='1000' burst='1024'/>
</bandwidth>
</network>
The bandwidth limitations can be, however, applied even on such
networks. In fact, they are going to be applied on the interface that
will be connected to the network on a domain startup. This approach,
however, has one limitation. With bridged networks, there are two points
where QoS can be set: bridge and domain interface. The lower limit of
the two is enforced then. For instance, if the interface has 10Mbps
average, but the network only 1Mbps, there's no way for interface to
transmit packets faster than the 1Mbps limit. With two points this is
enforced by kernel. With only one point, we must combine both QoS
settings into one which is set afterwards. Look at
virNetDevBandwidthMinimal() and you'll understand immediately what I
mean.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch allows libvirt user to specify 'host-passthrough'
cpu mode while using qemu/kvm backend on aarch64.
It uses 'host' as a CPU model name instead of some other stub
(correct CPU detection is not implemented yet) to allow libvirt
user to specify 'host-model' cpu mode as well.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
(crobinso: fix some indentation)
Currently the qemuDomainGetBlockInfo will return allocation == physical
for most backing stores. For a qcow2 block backed device it's possible
to return the highest lv extent allocated from qemu for an active guest.
That is a value where allocation != physical and one would hope be less.
However, if the guest is not running, then the code falls back to returning
allocation == physical. This turns out to be problematic for rhev which
monitors the size of the backing store. During a migration, before the
VM has been started on the target and while it is deemed inactive on the
source, there's a small window of time where the allocation is returned
as physical triggering the code to extend the file unnecessarily.
Since rhev uses transient domains and this is edge condition for a transient
domain, rather than returning good status and allocation == physical when
this "window of opportunity" exists, this patch will check for a transient
(or non persistent) domain and return a failure to the caller rather than
returning the defaults. For a persistent domain, the defaults will be
returned. The description for the virDomainGetBlockInfo has been updated
to describe the phenomena.
the array params is allocated by VIR_ALLOC_N in
remoteDispatchDomainGetCPUStats. it had been set
to zero. No need to reset it to zero again, and
this reset here is incorrect too, nparams * ncpus
is the array length not the size of params array.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Unlike the host devices of other types, SCSI host device XML supports
"shareable" tag. This patch introduces it for the virSCSIDevice struct
for a later patch use (to detect if the SCSI device is shareable when
preparing the SCSI host device in QEMU driver).
The "checkPool" is a bit different for pool with "fc_host"
type source adapter, since the vHBA it's based on might be
not created yet (it's created by "startPool", which is
involked after "checkPool" in storageDriverAutostart). So it
should not fail, otherwise the "autostart" of the pool will
fail either.
The problem is easy to reproduce:
* Enable "autostart" for the pool
* Restart libvirtd service
* Check the pool's state
94f8205 added a space to the string but didn't change the buffer size.
Signed-off-by: Bing Bu Cao <mars@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
For pool which relies on remote resources, such as a "iscsi" type
pool, since how long it takes to export the corresponding devices
to host's sysfs is really depended, it could depend on the network
connection, it also could depend on the host's udev procedures. So
it's likely that the volumes are not able to be detected during pool
starting process, polling the sysfs doesn't work, since we don't
know how much time is best for the polling, and even worse, the
volumes could still be not detected or partly not detected even after
the polling. So we end up with a documentation to prompt the fact,
in virsh manual.
And as a small improvement, let's explicitly say no LUNs found in
the debug log in that case.
There are 2 issues here: First we shouldn't add "1" to the return
value of numa_max_node(), since the semanteme of the error message
was changed, it's not saying about the number of total NUMA nodes
anymore. Second, the value of "bit" is the position of the first
bit which exceeds either numa_max_node() or NUMA_NUM_NODES, it can
be any number in the range, so saying "bigger than $bit" is quite
confused now. For example, assuming there is a NUMA machine which
has 10 NUMA nodes, and one specifies the "nodeset" as "0,5,88",
the error message will be like:
Nodeset is out of range, host cannot support NUMA node bigger than 88
It sounds like all NUMA node number less than 88 is fine, but
actually the maximum NUMA node number the machine supports is 9.
This patch fixes the issues by removing the addition with "1" and
simplifies the error message as "NUMA node $bit is out of range".
Also simplifies the comparision in the while loop by getting the
smaller one of numa_max_node() and NUMA_NUM_NODES up front.
I noticed that we allow virDomainGetVcpusFlags even for read-only
connections, but that with a flag, it can require guest agent
interaction. It is feasible that a malicious guest could
intentionally abuse the replies it sends over the guest agent
connection to possibly trigger a bug in libvirt's JSON parser,
or withhold an answer so as to prevent the use of the agent
in a later command such as a shutdown request. Although we
don't know of any such exploits now (and therefore don't mind
posting this patch publicly without trying to get a CVE assigned),
it is better to err on the side of caution and explicitly require
full access to any domain where the API requires guest interaction
to operate correctly.
I audited all commands that are marked as conditionally using a
guest agent. Note that at least virDomainFSTrim is documented
as needing a guest agent, but that such use is unconditional
depending on the hypervisor (so the existing domain:fs_trim ACL
should be sufficient there, rather than also requirng domain:write).
But when designing future APIs, such as the plans for obtaining
a domain's IP addresses, we should copy the approach of this patch
in making interaction with the guest be specified via a flag, and
use that flag to also require stricter access checks.
* src/libvirt.c (virDomainGetVcpusFlags): Forbid guest interaction
on read-only connection.
(virDomainShutdownFlags, virDomainReboot): Improve docs on agent
interaction.
* src/remote/remote_protocol.x
(REMOTE_PROC_DOMAIN_SNAPSHOT_CREATE_XML)
(REMOTE_PROC_DOMAIN_SET_VCPUS_FLAGS)
(REMOTE_PROC_DOMAIN_GET_VCPUS_FLAGS, REMOTE_PROC_DOMAIN_REBOOT)
(REMOTE_PROC_DOMAIN_SHUTDOWN_FLAGS): Require domain:write for any
conditional use of a guest agent.
* src/xen/xen_driver.c: Fix clients.
* src/libxl/libxl_driver.c: Likewise.
* src/uml/uml_driver.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/lxc/lxc_driver.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Bugs have been found in the VirtualBox API C bindings. These bugs have
been fixed in versions 4.2.20 and 4.3.4. However, the changes in the
C bindings are incompatible with the vbox_CAPI_v4_2.h and vbox_CAPI_v4_3.h
files which are bundled in libvirt source code.
This is why the following patch adds vbox_CAPI_v4_2_20.h and
vbox_CAPI_v4_3_4.h.
The actual underlying problem here is that until now,
libvirt assumed that VirtualBox API can only change between minor
versions (4.2 -> 4.3), but we have a case here where it changed
(or got fixed) between patch versions (4.2.18 -> 4.2.20).
This patch makes the VBOX_API_VERSION represent the full API
version number (i.e 4002 => 4002000) so there are specific version
numbers for Vbox 4.2.20 (4002020) and 4.3.4 (4003004)
Libvirtd would crash if a domain contained an empty cdrom drive of
type='volume' as the disk def->srcpool member would be dereferenced. Fix
it by checking if the source pool is present before dereferencing it.
Also alter tests to catch this issue in the future.
Reported by: Kevin Shanahan
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1056328
- Use $XDG_RUNTIME_DIR for re-exec state file when running unprivileged.
- argv[0] may not contain a full path to the binary, however it should
contain something that can be looked up in the PATH. Use execvp() to
do path lookup on re-exec.
- As per list discussion [1], ignore --daemon on re-exec.
[1] https://www.redhat.com/archives/libvir-list/2013-December/msg00514.html
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
To retrieve node cpu statistics on Linux system, the
linuxNodeGetCPUstats function simply uses STRPREFIX() to match the cpuid
with the one read from /proc/stat. However, as the file is read line by
line it may happen, that some CPUs share the same prefix. So if user
requested stats for the first CPU, which is offline, then there's no
cpu1 in the stats file so the one that we match is cpu10. Which is
obviously wrong. Fortunately, the IDs are terminated by a space, so we
can utilize that.
Signed-off-by: Bing Bu Cao <mars@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1034993
SCSI passthrough disks (<disk .. device="lun">) can't be used as backing
for snapshots. Currently with upstream qemu the vm crashes on such
attempt.
This patch adds a early check to catch an attempt to do such a snapshot
and rejects it right away. qemu will fix the issue but this will let us
control the error message.
I noticed this problem when adding systemd support to netcf, because I
setup the configure.ac to automatically prefer using systemd over
initscripts when possible - although I had copied the
install-data-local target from the example of libvirt's
"libvirt-guests" service more or less verbatim, "make distcheck" would
fail because it was trying to install the service file directly into
/lib/systemd/system rather than into
/home/user/some/unimportant/name/lib/systemd/system.
This is caused by the install/uninstall rules for the systemd unit
files relying on $(DESTDIR) pointing the installed files to the right
place, but in reality $(DESTDIR) is empty during this part of make
distcheck - it instead sets $(prefix) with the toplevel directory used
for its test build/install/uninstall cycle.
(This problem hasn't been seen when running "make distcheck" in
libvirt because libvirt will never build/install systemd support
unless explicitly told to do so on the configure commandline, and
"make distcheck" doesn't put the "--with-initscript=..." option on the
configure commandline.)
I verified that the same problem does exist in libvirt by modifying
libvirt's configure.ac to set:
init_systemd=yes
with_init_script=systemd+redhat
This forces a build/install of the systemd unit files during
distcheck, which yields an error like this:
/usr/bin/install -c -m 644 virtlockd.service \
/lib/systemd/system/
libtool: install: warning: relinking `libvirt-qemu.la'
/usr/bin/install: cannot remove '/lib/systemd/system/virtlockd.service': Permission denied
make[4]: *** [install-systemd] Error 1
After adding $(prefix) to all the definitions of SYSTEMD_UNIT_DIR,
make distcheck now completes successfully with the modified
configure.ac, and the above lines change to something like this:
/usr/bin/install -c -m 644 virtlockd.service \
/home/laine/devel/libvirt/libvirt-1.2.1/_inst/lib/systemd/system/
We shouldn't access the domain definition while we are in the monitor
section as the domain is unlocked. Additionally after we exit from the
monitor we need to check if the VM is still alive. Not doing so resulted
in a crash if qemu exits while attempting to do an external VM snapshot.
spice-server offers an API to disable file transfer messages
on the agent channel between the client and the guest.
This is supported in qemu through the disable-agent-file-xfer option.
This patch exposes this option to libvirt.
Adds a new element 'filetransfer', with one property,
'enable', which accepts a boolean.
Default is enabled, for backward compatibility.
Depends on the capability exported in the first patch of the series.
Signed-off-by: Francesco Romani <fromani@redhat.com>
spice-server offers an API to disable file transfer messages
on the agent channel between the client and the guest.
This is supported in qemu through the disable-agent-file-xfer option.
This patch detects if QEMU supports this option, and add
a capability if does.
Signed-off-by: Francesco Romani <fromani@redhat.com>
With this patch,user can set throttle blkio cgroup for
lxc domain through virsh tool.
Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
This is useful in certain circumstances, for example when
libvirtd is being executed by FreeBSD rc script, it cannot find
dmidecode installed from FreeBSD ports because it doesn't have
/usr/local (default prefix for ports) in PATH.