For extracting hostdev codes from qemu_hostdev.c to common library, change qemu
specific COLD_BOOT handling to be a flag, and pass it to hostdev functions.
For extracting hostdev codes from qemu_hostdev.c to common library, change qemu
specific cfg->relaxedACS handling to be a flag, and pass it to hostdev
functions.
Same logic of preparing/reattaching hostdevs could be used in attach/detach
hotplug places, so reuse hostdev interfaces to avoid duplicate, also for later
extracting general code to common library.
Update parameters from vm->def to specific name, hostdevs, nhostdevs to keep
consistentcy with PreparePCIDevices and PrepareSCSIDevices. And, at the same
time, make it reusable in later patch.
Use virObject to virHostdevManager, so that each driver using virHostdevManager
can keep a reference to it, and through counting refs to make virHostdevManager
get freed.
When libvirtd is run from a build directory without being installed, it
should not depend on files from a libvirt package installed in the
system. Not only because there may not be any libvirt installed at all.
We already do a good job for plugins but cpu_map.xml was still loaded
from the system.
The Makefile.am change is necessary to make this all work from VPATH
builds since libvirtd has no idea where to find libvirt sources. It only
knows the path from which it was started, i.e, a builddir.
https://bugzilla.redhat.com/show_bug.cgi?id=1074327
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This allows it to be used by the VIR_*_ELEMENT macros.
Also use them for parsing the definiton and remove the redundant
freeing of 'nodeset' before jumping to the cleanup label.
https://bugzilla.redhat.com/show_bug.cgi?id=1071095
Add a missing goto err_exit in the error path where an unsupported
value is assigned to the CTRL_IP_LEARNING key.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
When attaching to a QEMU process, the def->seclabels array is
going to be empty. The qemuProcessAttach method must thus
populate it with data for the security drivers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
While investigating https://bugzilla.redhat.com/show_bug.cgi?id=1061827
I noticed that we pass user input unscathed for block-pull, but
always pass a canonical absolute name through for block-commit.
[Note that we probably _ought_ to validate that the user's request
for block-pull actually matches the backing chain, the way we already
do for block-commit - but that's a separate issue. Further note that
the ability to pass user input through unscathed allows backdoors
such as specifying a backing image that is a network URI such as
a gluster disk, instead of forcing things to the local file system;
which is an area still under active investigation on whether libvirt
needs to behave differently for network disks.]
Since qemu may write the name that the user passed in as the backing
file, a user may have a reason to want a relative file name passed
through to qemu, and always munging things to absolute prevents that.
Put another way, if you have the backing chain:
[A] <- [B(back=./A)] <- [C(back=./B)]
and commit B into A (virsh blockcommit $dom vda --base A --top B),
the metadata of C will have to be re-written. But should it be
rewritten as [C(back=./A)] or as [C(back=/path/to/A)]? Still up in
the air is whether qemu's decision should be based on whether B
and/or C had relative paths, or on whether the --base and/or
--top arguments to the command were relative paths; but if we always
pass a canonical name, we've prevented the spelling of the command
arguments from being part of the hueristics that qemu uses.
I also audited the code, and verified that we never call
qemuMonitorBlockCommit() with a NULL base, either before or after
the change to qemu_driver.c.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Preserve user's
spelling, since absolute vs. relative matters to qemu.
* src/qemu/qemu_monitor.h (qemuMonitorBlockCommit): Base is never
null.
* src/qemu/qemu_monitor.c (qemuMonitorBlockCommit): Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit b9dd878f caused a regression in iptables interaction by
logging non-zero status at a higher level than VIR_INFO. Revert
that portion of the commit, as well as adding a comment explaining
why we check the status ourselves.
Reported by Nehal J Wani.
* src/util/viriptables.c (virIpTablesOnceInit): Undo log regression.
Signed-off-by: Eric Blake <eblake@redhat.com>
Supporting sexpr in connectDomainXMLFromNative in the libxl driver
adds flexibility for users importing legacy Xen configuration into
libvirt. E.g. this patch allows importing previous xend-managed
domains from /var/lib/xend/domains/<dom-uuid>/config.sxp into the
libvirt libxl driver.
From commit id 'd53bbfd1'
Found one core and one possible memory leak. Core seen during local
virt-test/tp_libvirt run for the vol_create_from test. The memory leak
was seen by inspection during a review of all VIR_APPEND_ELEMENT changes
In storage_backend_disk/virStorageBackendDiskMakeDataVol(), the 'vol'
needs to be kept around since it's used later, so use the _COPY macro.
This caused a segv in libvirtd:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe87c3700 (LWP 6919)]
virStorageBackendDiskMakeDataVol (vol=0x0, groups=0x7fffc8000d70, pool=0x7fffc8002460) at storage/storage_backend_disk.c:66
66 if (vol->target.path == NULL) {
In storage_backend_rbd/virStorageBackendRBDRefreshPool() there's a failure
path where the 'vol' needs to go through virStorageVolDefFree() since it
wouldn't be appended.
The qemu_bridge_filter.c file had some helpers for calling
the ebtablesXXX functions todo bridge filtering. The only
thing these helpers did was to overwrite the original error
message from the ebtables code. For added fun, the callers
of these helpers overwrote the errors yet again. For even
more fun, one of the helpers called another helper and
overwrite its errors too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The ebtablesRemoveForwardPolicyReject method was unused and
would not do anything useful even if called.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The ebtRules data structure serves no useful purpose as
the table name is never used and only 1 single chain name
needs to be stored. Just store the chain name directly
in the ebtablesContext instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When adding/removing ebtables rules, the code would keep
an array of all rules in memory. This list of rules was
never used for any purpose and would be lost if libvirtd
restarted. Delete all the unused code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The ebtablesForwardPolicyReject method is only used internally
to the ebtables code and thus should have been static.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The bridge_driver_platform.h defines many functions that
a platform driver must implement. Only two of these
functions are actually called from the main bridge driver
code. The remainder can be made internal to the linux
driver only.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extracting capabilities from QEMU takes a notable amount of time
when all QEMU binaries are installed. Each system emulator
needs about 200-300ms multiplied by 26 binaries == ~5-8 seconds.
This change causes the QEMU driver to save an XML file containing
the content of the virQEMUCaps object instance in the cache
dir eg /var/cache/libvirt/qemu/capabilities/$SHA256(binarypath).xml
or $HOME/.cache/libvirt/qemu/cache/capabilities/$SHA256(binarypath).xml
We attempt to load this and only if it fails, do we fallback to
probing the QEMU binary. The ctime of the QEMU binary and libvirtd
are stored in the cached file and its data discarded if either
of them change.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Debian's package manager will preserve mtime timestamp on binaries
from the time they are built, rather than installed. So if a
user downgrades their QEMU dpkg, the libvirt capabilities
cache will not refresh. The fix is to use ctime instead of mtime
since it cannot be faked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The future QEMU capabilities cache needs to be able to invalidate
itself if the libvirtd binary or any loadable modules are changed
on disk. Record the 'ctime' value for these binaries and provide
helper APIs to query it. This approach assumes that if libvirt.so
is changed, then libvirtd will also change, which should usually
be the case with libtool's wrapper scripts that cause libvirtd to
get re-linked
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Avoid the freeing of an array of zero file descriptors in case
of error. Initialize the array to -1 using memset.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Convert the sanlock and lockd lock driver plugins over to use
the new virCryptoHashString APIs instead of having their own
duplicated code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
GNULIB provides APIs for calculating md5 and sha256 hashes,
but these APIs only return you raw byte arrays. Most users
in libvirt want the hash in printable string format. Add
some helper APIs in util/vircrypto.{c,h} for doing this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This fixes a possible double free. In virNetworkAssignDef() if
virBitmapNew() fails, then virNetworkObjFree(network) is called.
However, with network->def pointing to actual @def. So if caller
frees @def again, ...
Moreover, this fixes one possible memory leak too. In
virInterfaceAssignDef() if appending to the list of interfaces
fails, we ought to call virInterfaceObjFree() instead of bare
VIR_FREE().
Although, in order to do that some array size variables needs
to be turned into size_t rather than int.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The nwfilter conf update mutex previously serialized
updates to the internal data structures for firewall
rules, and updates to the firewall itself. The latter
was recently turned into a read/write lock, and filter
instantiation allowed to proceed in parallel. It was
believed that this was ok, since each filter is created
on a separate iptables/ebtables chain.
It turns out that there is a subtle lock ordering problem
on virNWFilterObjPtr instances. __virNWFilterInstantiateFilter
will hold a lock on the virNWFilterObjPtr it is instantiating.
This in turn invokes virNWFilterInstantiate which then invokes
virNWFilterDetermineMissingVarsRec which then invokes
virNWFilterObjFindByName. This iterates over every single
virNWFilterObjPtr in the list, locking them and checking their
name. So if 2 or more threads try to instantiate a filter in
parallel, they'll all hold 1 lock at the top level in the
__virNWFilterInstantiateFilter method which will cause the
other thread to deadlock in virNWFilterObjFindByName.
The fix is to add an exclusive mutex to serialize the
execution of __virNWFilterInstantiateFilter.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves a Coverity RESOURCE_LEAK issue introduced by commit
id 'de6fa535' where the virSCSIDeviceSetUsedBy() didn't VIR_FREE
the 'copy' or possibly VIR_STRDUP()'d values. It also ensures that
the VIR_APPEND_ELEMENT is successful...
If SELinux is compiled into libvirt but it is disabled on the host,
libvirtd logs:
error : virIdentityGetSystem:173 : Unable to lookup SELinux process
context: Invalid argument
on each and every client connection.
Use is_selinux_enabled() to skip retrieval of the process's SELinux
context if SELinux is disabled.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
When domain is started with setting that cannot be done, i.e. those
that require cgroups, there is no error reported and it succeeds
without any message whatsoever.
When setting with API, virsh, an error is reported, but only due to
the fact that no cgroups are mounted (priv->cgroup == NULL).
Given the above it seems reasonable to reject such unsupported
settings.
This patch effectively changes the error message from:
$ virsh -c qemu:///session schedinfo dummy
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
to:
$ virsh -c qemu:///session schedinfo dummy
Scheduler : Unknown
error: Operation not supported: CPU tuning is not available in session mode
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1023366
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
As of commit 46ec5f85, the conn.lock mutex does not need to be held
when calling any vir*Dispose() function in datatypes.c (via virObjectUnref()).
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The qemuMonitorJSONBlockJob handles a few errors internally. If qemu
returns a different error we would report a rather unhelpful message:
$ virsh blockpull gluster-job vda --base /dev/null
error: internal error: Unexpected error
As the actual message from qemu contains a bit more info, let's use it
to report something a little more useful:
$ virsh blockpull gluster-job vda --base /dev/null
error: internal error: Unexpected error: (GenericError) 'Base '/dev/null' not found'
In storageVolLookupByPath the provided path is "sanitized" at first.
This removes some extra slashes and stuff. When the lookup of the volume
fails the original path is used which makes it hard to trace errors in
some cases.
Improve the error message to print the sanitized path along with the
user provided path if they are not equal.
When looking up a volume by path on a non-local filesystem don't use the
"cleaned" path that might be mangled in such a way that it will differ
from a path provided by a storage backend.
Skip the cleanup step for gluster, sheepdog and RBD.
Pools that are not backed by files in the filesystem cause problems with
some APIs. Error out when attempting to upload a volume in such a pool
as currently we expect a local file representation for it.
use_apparmor() was first designed to be called from withing libvirtd,
but libvirt_lxc also uses it. in libvirt_lxc, there is no need to check
whether to use apparmor or not: just use it if possible.
In qemuMonitorJSONExtractCPUInfo an error message hinted on missing
character device data which is wrong.
Also a comment states that only qemu-kvm tree includes the thread_id
field. This is no longer true.
https://bugzilla.redhat.com/show_bug.cgi?id=1071264
Reverting of external snapshots is not supported currently. The check
that is present doesn't properly check for all aspects that make a
snapshot external. Use virDomainSnapshotIsExternal() to do the check.
As I did previously in 4f588a1b46, libvirt needs to set virtio vectors.
Previously, we were advised to use vectors=N, where
N = 2 * (number of queues) + 1
However, just recently this advisory has changed on the Multiquue wiki
page [1] to:
N = 2 * (number of queues) + 2
1: http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If systemd is installed, but is not the init system,
systemd-machined fails with an unhelpful error message:
Launch helper exited with unknown return code 1
Currently we only check if the "machine1" service is
available (in ListActivatableNames).
Also check if "systemd1" service is registered with DBus
(ListNames).
This fixes https://bugs.gentoo.org/show_bug.cgi?id=493246#c22
Introduce virDBusIsServiceInList which can be used to call other
methods for listing services (ListNames), not just ListActivatableNames.
No functional change, fixed the 'Retruns' typo.
Jenkins pointed out that the previous commit violates syntax
check when cppi is installed.
* src/nwfilter/nwfilter_dhcpsnoop.c (SNOOP_POLL_MAX_TIMEOUT_MS):
Update indentation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Libpcap 1.5 requires a larger buffer than previous pcap versions.
Adjust the size of the buffer to 128kb.
This patch should address symptoms in BZ 1071181 and BZ 731059
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Cap the poll timeout in the DHCP Snooping code to a max. of 10 seconds
to not hold up the libvirt shutdown longer than this.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
The old semantics of virFork() violates the priciple of good
usability: it requires the caller to check the pid argument
after use, *even when virFork returned -1*, in order to properly
abort a child process that failed setup done immediately after
fork() - that is, the caller must call _exit() in the child.
While uses in virfile.c did this correctly, uses in 'virsh
lxc-enter-namespace' and 'virt-login-shell' would happily return
from the calling function in both the child and the parent,
leading to very confusing results. [Thankfully, I found the
problem by inspection, and can't actually trigger the double
return on error without an LD_PRELOAD library.]
It is much better if the semantics of virFork are impossible
to abuse. Looking at virFork(), the parent could only ever
return -1 with a non-negative pid if it misused pthread_sigmask,
but this never happens. Up until this patch series, the child
could return -1 with non-negative pid if it fails to set up
signals correctly, but we recently fixed that to make the child
call _exit() at that point instead of forcing the caller to do
it. Thus, the return value and contents of the pid argument are
now redundant (a -1 return now happens only for failure to fork,
a child 0 return only happens for a successful 0 pid, and a
parent 0 return only happens for a successful non-zero pid),
so we might as well return the pid directly rather than an
integer of whether it succeeded or failed; this is also good
from the interface design perspective as users are already
familiar with fork() semantics.
One last change in this patch: before returning the pid directly,
I found cases where using virProcessWait unconditionally on a
cleanup path of a virFork's -1 pid return would be nicer if there
were a way to avoid it overwriting an earlier message. While
such paths are a bit harder to come by with my change to a direct
pid return, I decided to keep the virProcessWait change in this
patch.
* src/util/vircommand.h (virFork): Change signature.
* src/util/vircommand.c (virFork): Guarantee that child will only
return on success, to simplify callers. Return pid rather than
status, now that the situations are always the same.
(virExec): Adjust caller, also avoid open-coding process death.
* src/util/virprocess.c (virProcessWait): Tweak semantics when pid
is -1.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/commandtest.c (test23): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Auditing all callers of virCommandRun and virCommandWait that
passed a non-NULL pointer for exit status turned up some
interesting observations. Many callers were merely passing
a pointer to avoid the overall command dying, but without
caring what the exit status was - but these callers would
be better off treating a child death by signal as an abnormal
exit. Other callers were actually acting on the status, but
not all of them remembered to filter by WIFEXITED and convert
with WEXITSTATUS; depending on the platform, this can result
in a status being reported as 256 times too big. And among
those that correctly parse the output, it gets rather verbose.
Finally, there were the callers that explicitly checked that
the status was 0, and gave their own message, but with fewer
details than what virCommand gives for free.
So the best idea is to move the complexity out of callers and
into virCommand - by default, we return the actual exit status
already cleaned through WEXITSTATUS and treat signals as a
failed command; but the few callers that care can ask for raw
status and act on it themselves.
* src/util/vircommand.h (virCommandRawStatus): New prototype.
* src/libvirt_private.syms (util/command.h): Export it.
* docs/internals/command.html.in: Document it.
* src/util/vircommand.c (virCommandRawStatus): New function.
(virCommandWait): Adjust semantics.
* tests/commandtest.c (test1): Test it.
* daemon/remote.c (remoteDispatchAuthPolkit): Adjust callers.
* src/access/viraccessdriverpolkit.c (virAccessDriverPolkitCheck):
Likewise.
* src/fdstream.c (virFDStreamCloseInt): Likewise.
* src/lxc/lxc_process.c (virLXCProcessStart): Likewise.
* src/qemu/qemu_command.c (qemuCreateInBridgePortWithHelper):
Likewise.
* src/xen/xen_driver.c (xenUnifiedXendProbe): Simplify.
* tests/reconnect.c (mymain): Likewise.
* tests/statstest.c (mymain): Likewise.
* src/bhyve/bhyve_process.c (virBhyveProcessStart)
(virBhyveProcessStop): Don't overwrite virCommand error.
* src/libvirt.c (virConnectAuthGainPolkit): Likewise.
* src/openvz/openvz_driver.c (openvzDomainGetBarrierLimit)
(openvzDomainSetBarrierLimit): Likewise.
* src/util/virebtables.c (virEbTablesOnceInit): Likewise.
* src/util/viriptables.c (virIpTablesOnceInit): Likewise.
* src/util/virnetdevveth.c (virNetDevVethCreate): Fix debug
message.
* src/qemu/qemu_capabilities.c (virQEMUCapsInitQMP): Add comment.
* src/storage/storage_backend_iscsi.c
(virStorageBackendISCSINodeUpdate): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, a caller waiting for a child process either requires
the child to have status 0, or must use WIFEXITED() and friends
itself. But in many cases, we want the middle ground of treating
fatal signals as an error, and directly accessing the normal exit
value without having to use WEXITSTATUS(), in order to easily
detect an expected non-zero exit status. This adds the middle
ground to the low-level virProcessWait; the next patch will add
it to virCommand.
* src/util/virprocess.h (virProcessWait): Alter signature.
* src/util/virprocess.c (virProcessWait): Add parameter.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/vircommand.c (virCommandWait): Likewise.
* src/util/virfile.c (virFileAccessibleAs): Likewise.
* src/lxc/lxc_container.c (lxcContainerHasReboot)
(lxcContainerAvailable): Likewise.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* tests/commandtest.c (test23): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The documentation of namespace callbacks was inconsistent on whether
it preserved positive return values. Now that we have a dedicated
EXIT_CANCELED to flag all errors before getting to the callback,
it is possible to use positive return values (not that any of the
current callers do, but it is better to match the docs).
Also, while vircommand.c is careful to close fds that a child should
not have, it's still better to be in the practice of setting
FD_CLOEXEC up front.
* src/util/virprocess.c (virProcessRunInMountNamespace): Tweak
return value to pass back non-zero status. Avoid leaking pipe fds
to other threads.
* src/util/virprocess.h: Fix comment.
Signed-off-by: Eric Blake <eblake@redhat.com>
Thanks to namespaces, we have a couple of places in the code
base that want to reflect a child exit status, including the
ability to detect death by a signal, back to a grandparent.
Best to make it a reusable function.
* src/util/virprocess.h (virProcessExitWithStatus): New prototype.
* src/libvirt_private.syms (util/virprocess.h): Export it.
* src/util/virprocess.c (virProcessExitWithStatus): New function.
* tests/commandtest.c (test23): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
When a child fails without exec'ing, we want a well-known status;
best is to match what env(1), nice(1), su(1), and other wrapper
programs do. This patch adds enum values that later patches will
use, and sets up virFork as the first client of EXIT_CANCELED
for errors detected prior to even attempting exec, as well as
virExec to distinguish between a missing executable vs. a binary
that cannot be executed.
This is a slight semantic change in the unlikely case of a child
process failing to restore its signal mask - we now kill the
child with a known status instead of relying on the caller to
notice and do an appropriate _exit(). A subsequent patch will
make further cleanups based on an audit of all callers.
* src/internal.h (EXIT_CANCELED, EXIT_CANNOT_INVOKE)
(EXIT_ENOENT): New enum.
* src/util/vircommand.c (virFork): Document specific exit value if
child aborts early.
(virExec): Distinguish between various exec failures.
* tests/commandtest.c (test1): Enhance test.
(test22): New test.
Signed-off-by: Eric Blake <eblake@redhat.com>
While auditing all callers of virCommandRun, I noticed that nwfilter
code never paid attention to commands with a non-zero status; they
were merely passing a pointer to avoid spamming the logs with a
message about commands that might indeed fail. But proving this
required chasing through a lot of code; refactoring things to
localize the decision of whether to ignore non-zero status makes
it easier to prove that later changes to virFork don't negatively
affect this code.
While at it, I also noticed that ebiptablesRemoveRules would
actually report success if the child process failed for a
reason other than non-zero status, such as OOM.
* src/nwfilter/nwfilter_ebiptables_driver.c (ebiptablesExecCLI):
Change parameter from pointer to bool.
(ebtablesApplyBasicRules, ebtablesApplyDHCPOnlyRules)
(ebtablesApplyDropAllRules, ebtablesCleanAll)
(ebiptablesApplyNewRules, ebiptablesTearNewRules)
(ebiptablesTearOldRules, ebiptablesAllTeardown)
(ebiptablesDriverInitWithFirewallD)
(ebiptablesDriverTestCLITools, ebiptablesDriverProbeStateMatch):
Adjust all clients.
(ebiptablesRemoveRules): Likewise, and fix return value on failure.
Signed-off-by: Eric Blake <eblake@redhat.com>
Openstack Nova calls virConnectBaselineCPU() during initialization
of the instance to get a full list of CPU features.
This patch adds a stub to arm-specific code to handle
this request (no actual work is done).
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
When probing QEMU capabilities fails for a binary generate a
log message with MESSAGE_ID==8ae2f3fb-2dbe-498e-8fbd-012d40afa361.
This can be directly queried from journald based on the UUID
instead of needing string grep. This lets tools like libguestfs'
bug reporting tool trivially do automated sanity tests on the
host they're running on.
$ journalctl MESSAGE_ID=8ae2f3fb-2dbe-498e-8fbd-012d40afa361
Feb 21 17:11:01 localhost.localdomain lt-libvirtd[9196]:
Failed to probe capabilities for /bin/qemu-system-alpha:
internal error: Child process (LC_ALL=C LD_LIBRARY_PATH=
/home/berrange/src/virt/libvirt/src/.libs PATH=/usr/lib64/
ccache:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:
/usr/bin:/root/bin HOME=/root USER=root LOGNAME=root
/bin/qemu-system-alpha -help) unexpected exit status 127:
/bin/qemu-system-alpha: error while loading shared libraries:
libglapi.so.0: cannot open shared object file: No such file
or directory
$ journalctl MESSAGE_ID=8ae2f3fb-2dbe-498e-8fbd-012d40afa361 --output=json
{ ...snip...
"LIBVIRT_SOURCE" : "file",
"PRIORITY" : "3",
"CODE_FILE" : "qemu/qemu_capabilities.c",
"CODE_LINE" : "2770",
"CODE_FUNC" : "virQEMUCapsLogProbeFailure",
"MESSAGE_ID" : "8ae2f3fb-2dbe-498e-8fbd-012d40afa361",
"LIBVIRT_QEMU_BINARY" : "/bin/qemu-system-xtensa",
"MESSAGE" : "Failed to probe capabilities for /bin/qemu-system-xtensa:
internal error: Child process (LC_ALL=C LD_LIBRARY_PATH=/home/berrange
/src/virt/libvirt/src/.libs PATH=/usr/lib64/ccache:/usr/local/sbin:
/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin HOME=/root
USER=root LOGNAME=root /bin/qemu-system-xtensa -help) unexpected
exit status 127: /bin/qemu-system-xtensa: error while loading shared
libraries: libglapi.so.0: cannot open shared object file: No such
file or directory\n" }
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a virError is raised, pass the error domain and code
onto the systemd journald using metadata fields.
This allows error messages to be queried by code eg
$ journalctl LIBVIRT_CODE=43
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The systemd journal expects log record PRIORITY values to
be encoded using the syslog compatible numbering scheme,
not libvirt's own native numbering scheme. We must therefore
apply a conversion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The systemd journal accepts arbitrary user specified log
fields. These can be passed into virLogMessage via the
virLogMetadata structure. Allow up to 5 custom fields to
be reported by libvirt callers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch allows libvirt user to specify 'host-passthrough'
cpu mode while using qemu/kvm backend on arm (arm32).
It uses 'host' as a CPU model name instead of some other stub
(correct CPU detection is not implemented yet) to allow libvirt
user to specify 'host-model' cpu mode as well.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
As of 0bd2ccdec an empty disk path for virDomainBlockStats (or the one
with Flags) is allowed meaning "get me overall summarized statistics".
However, running 'virsh domblkstat $dom' throws a misleading error:
# ./tools/virsh domblkstat dom
error: Failed to get block stats dom
error: invalid argument: invalid path:
while after this commit
# virsh domblkstat dom
error: Operation not supported: summary statistics are not supported yet
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Libvirt uses a domain name to fill in owner_name in sanlock_options in
virLockManagerSanlockAcquire. Unfortunately, owner_name is limited to
SANLK_NAME_LEN characters (including trailing '\0'), which means domains
with longer names fail to start when sanlock is enabled. However, we can
truncate the name when setting owner_name as explained by sanlock's
author:
Setting sanlk_options or the owner_name is unnecessary, and has very
little to no benefit. If you do provide something in owner_name, it can
be anything, sanlock doesn't care or use it.
If you run the command "sanlock status", the output will display a list
of clients connected to the sanlock daemon. This client list is
displayed as "pid owner_name" if the client has provided an owner_name
via sanlk_options. This debugging output is the only usage of
owner_name, so its only benefit is to potentially provide a more human
friendly output for debugging purposes.
Only tested on v7 but the v8 equivalent seems pretty obvious.
XEN_CAP_REGEX already accepts more than it should (e.g. x86_64p or x86_32be)
but I have stuck with the existing pattern.
With this I can create a guest from:
<domain type='xen'>
<name>libvirt-test</name>
<uuid>6343998e-9eda-11e3-98f6-77252a7d02f3</uuid>
<memory>393216</memory>
<currentMemory>393216</currentMemory>
<vcpu>1</vcpu>
<os>
<type arch='armv7l' machine='xenpv'>linux</type>
<kernel>/boot/vmlinuz-arm-native</kernel>
<cmdline>console=hvc0 earlyprintk debug root=/dev/xvda1</cmdline>
</os>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<disk type='block' device='disk'>
<source dev='/dev/marilith-n0/debian-disk'/>
<target dev='xvda1'/>
</disk>
<interface type='bridge'>
<mac address='8e:a7:8e:3c:f4:f6'/>
<source bridge='xenbr0'/>
</interface>
</devices>
</domain>
Using virsh create and I can destroy it too.
Currently virsh console fails with:
Connected to domain libvirt-test
Escape character is ^]
error: internal error: cannot find character device <null>
I haven't investigated yet.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
According to commit b4e0299d if networkAllocateActualDevice() was
successful, it will *always* allocate an iface->data.network.actual,
so we can use this during networkReleaseActualDevice() to know if
there is really anything to undo. We were properly using this
information to only decrement the network connections counter if it
had previously been incremented, but we were unconditionally
unplugging bandwidth and calling the "unplugged" network hook for
*all* interfaces (during qemuProcessStop()) whether they had been
previously plugged or not. This caused problems if a domain failed to
start at some time prior to all interfaces being allocated. (I
encountered this when an interface had a bandwidth floor set but no
inbound QoS).
This patch changes both the call to networkUnplugBandwidth() and the
call to networkRunHook() to only be called if there was a previous
call to "plug" for the same interface.
networkAllocateActualDevice() is called for *all* interfaces, not just
those with type='network'. In that case, it will jump down to its
validate: label immediately, without allocating anything. After
validation is done, two counters are potentially updated (one for the
network, and one for any particular physical device that is chosen),
and then networkRunHook() is called.
This patch refactors that code a slight bit so that networkRunHook()
doesn't get called if netdef is NULL (i.e. type != network) and to
place the conditional increment of dev->connections inside the "if
(netdef)" as well - dev can never be non-null if netdef is null
(because "dev" is the pointer to a device in a network's pool of
devices), so this doesn't have any functional effect, it just makes
the code clearer.
While running virscsitest, it was found that valgrind pointed out the following
memory leak:
==320== 5 bytes in 1 blocks are definitely lost in loss record 4 of 37
==320== at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==320== by 0x3E6CE81171: strdup (strdup.c:43)
==320== by 0x4CB28DF: virStrdup (virstring.c:554)
==320== by 0x4CAC987: virSCSIDeviceSetUsedBy (virscsi.c:289)
==320== by 0x402321: test2 (virscsitest.c:100)
==320== by 0x403231: virtTestRun (testutils.c:199)
==320== by 0x402121: mymain (virscsitest.c:180)
==320== by 0x4039AD: virtTestMain (testutils.c:782)
==320== by 0x3E6CE1ED1C: (below main) (libc-start.c:226)
==320==
Introduced by commit fd243fc.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Consider dozen of LXC domains, each of them having this type of interface:
<interface type='network'>
<mac address='52:54:00:a7:05:4b'/>
<source network='default'/>
</interface>
When starting these domain in parallel, all workers may meet in
virNetDevVethCreate() where a race starts. Race over allocating veth
pairs because allocation requires two steps:
1) find first nonexistent '/sys/class/net/vnet%d/'
2) run 'ip link add ...' command
Now consider two threads. Both of them find N as the first unused veth
index but only one of them succeeds allocating it. The other one fails.
For such cases, we are running the allocation in a loop with 10 rounds.
However this is very flaky synchronization. It should be rather used
when libvirt is competing with other process than when libvirt threads
fight each other. Therefore, internally we should use mutex to serialize
callers, and do the allocation in loop (just in case we are competing
with a different process). By the way we have something similar already
since 1cf97c87.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Running ./autobuild.sh detected a mingw failure:
CCLD libvirt.la
Cannot export virCgroupGetPercpuStats: symbol not defined
Cannot export virCgroupSetOwner: symbol not defined
* src/util/vircgroup.c (virCgroupGetPercpuStats)
(virCgroupSetOwner): Implement stubs.
Signed-off-by: Eric Blake <eblake@redhat.com>
The shutdown handler may restart a domain when handling a reboot
event or when <on_*> is set to 'restart'. Restarting consists of
calling libxlVmCleanup followed by libxlVmStart. libxlVmStart will
emit a VIR_DOMAIN_EVENT_STARTED event, but the SHUTDOWN event is
not emitted until exiting the shutdown handler, after the STARTED
event.
This patch changes the logic a bit to queue the event at the start
of the shutdown action, ensuring it is queued before any subsequent
events that may be generated while executing the shutdown action.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
The network hook script gets called whenever an interface is plugged
into or unplugged from a network, but even though the full XML of both
the network and the domain is included, there is no reasonable way to
determine what exact resources the plugged interface is using:
1) Prior to a recent patch which modified the status XML of interfaces
to include the information about actual hardware resources used, it
would be possible to scan through the domain XML output sent to the
hook, and from there find the correct interface, but that interface
definition would not include any runtime info (e.g. bandwidth or vlan
taken from a portgroup, or which physdev was used in case of a macvtap
network).
2) After the patch modifying the status XML of interfaces, the network
name would no longer be included in the domain XML, so it would be
completely impossible to determine which interface was the one being
plugged.
To solve that problem, this patch includes a single <interface>
element at the beginning of the XML sent to the network hook for
"plugged" and "unplugged" (just inside <hookData>) that is the status
XML of the interface being plugged. This XML will include all info
gathered from the chosen network and portgroup.
NB: due to hardcoded spaces in all of the device *Format() functions,
the <interface> element inside the <hookData> will be indented by 6
spaces rather than 2. I had intended to fix this, but it turns out
that to make virDomainNetDefFormat() indentation relative, I would
have to do the same to virDomainDeviceInfoFormat(), and that function
is called from 19 places - making that a prerequisite of this patch
would cause too many merge difficulties if we needed to backport
network hooks, so I chose to ignore the problem here and fix the
problem for *all* devices in a followup later.
Until now, the "live" XML status of an <interface type='network'>
device would always show the network information, rather than the
exact hardware device that was used. It would also show the name of
any portgroup the interface belonged to, rather than providing the
configuration that was derived from that portgroup. As an example,
given the following network definition:
[A]
<network>
<name>testnet</name>
<forward type='bridge' dev='p4p1_0'>
<interface dev='p4p1_0'/>
<interface dev='p4p1_1'/>
<interface dev='p4p1_2'/>
<interface dev='p4p1_3'/>
</forward>
<portgroup name='admin'>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
</portgroup>
</network>
and the following domain <interface>:
[B]
<interface type='network'>
<source network='testnet' portgroup='admin'/>
</interface>
the output of "virsh dumpxml $domain" while the domain was running
would yield something like this:
[C]
<interface type='network'>
<source network='testnet' portgroup='admin'/>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
In order to learn the exact bandwidth information of the interface, a
management application would need to retrieve the XML for testnet,
then search for the portgroup named "admin". Even worse, there was no
simple and standard way to learn which host physdev the macvtap0
device is attached to.
Internally, libvirt has always kept this information in the
virDomainDef that is held in memory, as well as storing it in the
(libvirt-internal-only) domain status XML (in
/var/run/libvirt/qemu/$domain.xml). In order to not confuse the runtime
"actual state" with the config of the device, it's internally stored
like this:
[D]
<interface type='network'>
<source network='testnet' portgroup='admin'/>
<actual type='direct'>
<source dev='p4p1_0' mode='bridge'/>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
</actual>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
This was never exposed outside of libvirt though, because I thought it
would be too awkward for a management application to need to look in
two places for the same information, but I also wasn't sure that it
would be okay to overwrite the config info (in this case "<source
network='testnet' portgroup='admin'/>") with the actual runtime info
(everything inside <actual> above).
Now we have a need for this information to be made available to
management applications (in particular, so that a network "plugged"
hook will have full information about the device that is being plugged
in), so it's time to take the leap and decide that it is acceptable
for the config info to be replaced with actual runtime state (but
*only* when reporting domain live status, *not* when saving state in
/var/run/libvirt/qemu/$domain.xml - that remains the same so that
there is no loss of information). That is what this patch does - once
applied, the output of "virsh dumpxml $domain" when the domain is
running will contain something like this:
[E]
<interface type='direct'>
<source dev='p4p1_0' mode='bridge'/>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
In effect, everything that is internally stored within <actual> is
moved up a level to where a management application will expect
it. This means that the management application will only look in a
single place to learn - the type of interface in use, the name of the
physdev (if relevant), the <bandwidth>, <vlan>, and <virtualport>
settings in use.
The potential downside is that a management app looking at this output
will not see that the physdev 'p4p1_0' was actually allocated from the
network named 'testnet', or that the bandwidth numbers were taken from
the portgroup 'admin'. However, if they are interested in that info,
they can always get the "inactive" XML for the domain.
An example of where this could cause problems is in virt-manager's
network device display, which shows the status of the device, but
allows you to edit that status info and save it as the new
config. Previously virt-manager would always display the information
in example [C] above, and allow editing that. With this patch, it will
instead display what is in [E] and allow editing it directly, which
could lead to some confusion. I would suggest that virt-manager have
an "edit" button which would change the display from the "live" xml to
the "inactive" xml, so that editing would be done on that; such a
change would both handle the new situation, and also be compatible
with older releases.
This function is currently only called from one place, but in a
subsequent patch will be called from a 2nd place.
The new function exactly replicates the original behavior of the part
of virDomainActualNetDefFormat() that it replaces, but takes a
virDomainNetDefPtr instead of virDomainActualNetDefPtr, and uses the
virDomainNetGetActual*() functions whenever possible, rather than
reaching into def->data.network.actual - this is to be sure that we
are reporting exactly what is being used internally, just in case
there are any discrepancies (there shouldn't be).
This moves the call to virNetDevBandwidthFormat() in
virDomainNetDefFormat() to be called right after the call to
virNetDevVPortProfileFormat(), so that a single chunk of that function
can be placed inside an if that conditionally calls
virDomainActualNetDefContentsFormat() instead (next patch). The
re-ordering necessitates modifying a couple of test data files.
Other *Format() functions (e.g. virNetDevBandwidthFormat()) return
with no action when called with a NULL *Def pointer. This makes
virNetDevVlanFormat() consistent with that behavior.
In practice, if a virDomainNetDef has a virDomainActualNetDef
allocated, the ActualNetDef will *always* contain the bandwidth and
vlan data from the NetDef (unless there was also a portgroup involved
- see networkAllocateActualDevice()).
However, virDomainNetGetActual(Bandwidth|Vlan)() were coded to make it
appear as if it might be possible to have a valid bandwidth/vlan in
the NetDef, but a NULL in the ActualNetDef. Believing this un-truth
could lead to writing unnecessarily defensive code when dealing with
the virDomainGetActual*() functions, so this patch makes it more
obvious:
If there is an ActualNetDef, it will always have a copy of the
various appropriate bits from its parent NetDef, and the
virDomainGetActual* function will *always* return the data from the
ActualNetDef, not from the NetDef.
The reason for this effective-NOP patch is that a subsequent patch to
change virDomainNetDefFormat will rely on the above rule.
These timeout values make librados/librbd return -ETIMEDOUT when a
operation is blocking due to a failing/unreachable Ceph cluster.
By having the operations time out libvirt will not block.
The libxl driver was ignoring the <on_*> domain event configuration,
causing e.g. a domain to be rebooted even when on_reboot is set to
destroy.
This patch honors the <on_*> configuration in the shutdown event
handler.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This function is needed for user namespaces, where we need to chmod()
the cgroup to the initial uid/gid such that systemd is allowed to
use the cgroup.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a virStringSearch method to virstring.{c,h} which performs
a regex match against a string and returns the matching substrings.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Systemd does not forget about the cases, where client service needs to
wait for daemon service to initialize and start accepting new clients.
Setting a dependency in client is not enough as systemd doesn't know
when the daemon has initialized itself and started accepting new
clients. However, it offers a mechanism to solve this. The daemon needs
to call a special systemd function by which the daemon tells "I'm ready
to accept new clients". This is exactly what we need with
libvirtd-guests (client) and libvirtd (daemon). So now, with this
change, libvirt-guests.service is invoked not any sooner than
libvirtd.service calls the systemd notify function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1031696
When creating a new domain, we let systemd know about it by calling
CreateMachine() function via dbus. Systemd then creates a scope and
places domain into it. However, later when the host is shutting
down, systemd computes the shutdown order to see what processes can
be shut down in parallel. And since we were not setting
dependencies at all, the slices (and thus domains) were most likely
killed before libvirt-guests.service. So user domains that had to
be saved, shut off, whatever were in fact killed. This problem can
be solved by letting systemd know that scopes we're creating must
not be killed before libvirt-guests.service.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 6515889 broke the build on FreeBSD:
In function `qemuDomainGetCPUStats':
/../../src/qemu/qemu_driver.c:16102:
undefined reference to `virCgroupGetDomainTotalCpuStats'
https://bugzilla.redhat.com/show_bug.cgi?id=1038363
If a domain has a different maximum for persistent and live maxmem
or max vcpus, then it is possible to hit cases where libvirt
refuses to adjust the current values or gets halfway through
the adjustment before failing. Better is to determine up front
if the change is possible for all requested flags.
Based on an idea by Geoff Franks.
* src/qemu/qemu_driver.c (qemuDomainSetMemoryFlags): Compute
correct maximum if both live and config are being set.
(qemuDomainSetVcpusFlags): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The virDomainGetRootFilesystem method can be generalized to allow
any filesystem path to be obtained.
While doing this, start a new test case for purpose of testing various
helper methods in the domain_conf.{c,h} files, such as this one.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupXXX APIs' return value must be checked for
being less than 0, not equal to 0.
An VIR_ERR_OPERATION_INVALID error must also be raised
when the VM is not running to prevent a crash on NULL
priv->cgroup field.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
And provide domain summary stat in that case, for lxc backend.
Use case is a container inheriting all devices from the host,
e.g. when doing application containerization.
Destroying a suspended domain needs special action.
We cannot simply terminate all process because they are frozen.
Do deal with that we send them SIGKILL and thaw them.
Upon wakeup the process sees the pending signal and dies immediately.
Signed-off-by: Richard Weinberger <richard@nod.at>
IN6ADDR_ANY_INIT does not seem to be working as expected on MinGW:
error: missing braces around initializer [-Werror=missing-braces]
.sin6_addr = IN6ADDR_ANY_INIT,
Use the in6addr_any variable instead.
Reported by Daniel P. Berrange.
Currently, networkRunHook() is called in networkAllocateActualDevice and
friends. These functions, however, doesn't necessarily work on networks,
For example, if domain's interface is defined in this fashion:
<interface type='bridge'>
<mac address='52:54:00:0b:3b:16'/>
<source bridge='virbr1'/>
<model type='rtl8139'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>
The networkAllocateActualDevice jumps directly onto 'validate' label as
the interface is not type of 'network'. Hence, @network is left
initialized to NULL and networkRunHook(network, ...) is called. One of
the things that the hook function does is dereference @network. Soupir.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Dumping a domain's core can take considerable time. Use the
recently added job functions and unlock the virDomainObj while
dumping core.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Saving domain memory and cpu state can take considerable time.
Use the recently added job functions and unlock the virDomainObj
while saving the domain.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
When explicitly destroying a domain (libxlDomainDestroyFlags), or
handling an out-of-band domain shutdown event, cleanup the domain
in the context of a job. Introduce libxlVmCleanupJob to wrap
libxlVmCleanup in a job block.
Large balloon operation can be time consuming. Use the recently
added job functions and unlock the virDomainObj while ballooning.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Creating a large domain could potentially be time consuming. Use the
recently added job functions and unlock the virDomainObj while
the create operation is in progress.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This function, which only has five call sites, simply calls
libxl_domain_destroy and libxlVmCleanup. Call those functions
directly at the call sites, allowing more control over how a
domain is destroyed and cleaned up. This patch maintains the
existing semantic, leaving changes to a subsequent patch.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This patch changes network device type used by default from rtl8139
to virtio when architecture type is aarch64 and machine type is virt.
Qemu doesn't support any other machine types for aarch64 right now and
we can't make any other aarch64-specific tuning in this function yet.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
At this point it has a limited functionality and is highly
experimental. Supported domain operations are:
* define
* start
* destroy
* dumpxml
* dominfo
It's only possible to have only one disk device and only one
network, which should be of type bridge.
There is no keyboard working on PPC64 and PS2 mouse is only for X86
when graphics are enabled.
Add a USB keyboard and USB mouse for PPC64 when graphics are enabled.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Format qemu command line for USB keyboard
and add test cases for it.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
PS2 devices only work on X86 platform, other platforms may need
USB devices instead. Athough it doesn't influence the QEMU command line,
it's not right to add PS2 mouse/keyboard for non-X86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
There is no keyboard support currently in libvirt.
For some platforms (PPC64 QEMU) this makes graphics unusable,
since the keyboard is not implicit and it can't be added via libvirt.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The networkNotifyActualDevice function is accepting two arguments, not
one:
qemu/qemu_process.c: In function 'qemuProcessNotifyNets':
qemu/qemu_process.c:2776:47: error: macro "networkNotifyActualDevice" passed 2 arguments, but takes just 1
if (networkNotifyActualDevice(def, net) < 0)
^
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
aebbcdd didn't change the non-linux definition of the function,
breaking the build on FreeBSD:
../../src/util/virinitctl.c:164: error: conflicting types for
'virInitctlSetRunLevel'
../../src/util/virinitctl.h:40: error: previous declaration of
'virInitctlSetRunLevel' was here
Basically, the idea is copied from domain code, where tainting
exists for a while. Currently, only one taint reason exists -
VIR_NETWORK_TAINT_HOOK to mark those networks which caused invoking
of hook script.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There might be some use cases, where user wants to prepare the host or
its environment prior to starting a network and do some cleanup after
the network has been shut down. Consider all the functionality that
libvirt doesn't currently have as an example what a hook script can
possibly do.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the next patch I'm going to need the network format function that
takes virBuffer as argument. However, slightly change of name is more
appropriate then: virNetworkDefFormatBuf to match the rest of functions
that format an object to buffer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Rewrite multiple hotunplug functions to to use the
virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with an absolute
symlink, tricking the driver into changing the host OS
filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rewrite lxcDomainAttachDeviceHostdevMiscLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rewrite lxcDomainAttachDeviceHostdevStorageLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rewrite lxcDomainAttachDeviceDiskLive function to use the
virProcessRunInMountNamespace helper. This avoids risk of
a malicious guest replacing /dev with a absolute symlink,
tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and
lxcDomainReboot. Otherwise, a malicious guest could use symlinks
to force the host to manipulate the wrong file in the host's namespace.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Eric Blake <eblake@redhat.com>
Implement virProcessRunInMountNamespace, which runs callback of type
virProcessNamespaceCallback in a container namespace. This uses a
child process to run the callback, since you can't change the mount
namespace of a thread. This implies that callbacks have to be careful
about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add a helper function which takes a file path and ensures
that all directory components leading up to the file exist.
IOW, it strips the filename part of the path and passes
the result to virFileMakePath.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The check for whether the cgroup devices ACL is available is
done quite late during LXC hotplug - in fact after the device
node is already created in the container in some cases. Better
to do it upfront so we fail immediately.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC disk hotplug code was allowing block or character devices
to be given as disk. A disk is always a block device.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When detaching a USB device from an LXC guest we must remove
the device from the cgroup ACL. Unfortunately we were telling
the cgroup code to use the guest /dev path, not the host /dev
path, and the guest device node had already been unlinked.
This was, however, fortunate since the code passed &priv->cgroup
instead of priv->cgroup, so would have crash if the device node
were accessible.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After hotplugging a USB device, the LXC driver forgot
to add the device def to the virDomainDefPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC code missed the 'usb' component out of the path
/dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually
setup cgroups for the device. This was in fact lucky
because the call to virLXCSetupHostUsbDeviceCgroup
was also mistakenly passing '&priv->cgroup' instead of
just 'priv->cgroup'. So once the path is fixed, libvirtd
would then crash trying to access the bogus virCgroupPtr
pointer. This would have been a security issue, were it
not for the bogus path preventing the pointer reference
being reached.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virDomainDefCompatibleDevice blocks use of USB if no USB
controller is present. This is not correct for containers
since devices can be assigned directly regardless of any
controllers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there's just one place where we care if hook script is
changing the domain XML: migration hook for incoming migration. In
all other places where a hook script is executed, we don't read the
XML back from the script.
Anyway, the hook script can alter domain XML and hence we should taint
it if the script did.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This new flag is to be used for tainting domains which
XML definition was altered at runtime by a hook script.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The internal pools were an idea in one of the first iterations of the
gluster series, which we decided not to use. Somehow the patch still
got pushed. Remove it as the internal flag isn't needed.
This reverts commit 362da8209d.
Also try to bind on IPv6 to check if the port is occupied.
Change the mocked bind in the test to return EADDRINUSE
for some ports only for the IPv4/IPv6 socket if we're testing
on a host with IPv6 compiled in.
Also mock socket() to make it fail with EAFNOTSUPPORTED
if LIBVIRT_TEST_IPV4ONLY is set in the environment, to
simulate a host without IPv6 support in the kernel. The
tests are repeated again with this variable set.
https://bugzilla.redhat.com/show_bug.cgi?id=1025407
In a44b7b87bc I've introduced a function
that initializes a storage file wrapper object on gluster based volumes.
The initialization function leaks the private data pointer in case of
failure. This patch fixes it.
Reported by John Ferlan.
In commit e32268184b I accidentally added
twice a typedef for virStorageFileBackend when I moved it between files
across patch iterations. The double declaration breaks build on older
compilers in RHEL5 and FreeBSD.
Remove the spurious definition.
Add support for gluster backed images as sources for snapshots in the
qemu driver. This will also simplify adding further network backed
volumes as sources for snapshot in case qemu will support them.
Use the new storage driver APIs to delete snapshot backing files in case
of failure instead of directly relying on "unlink". This will help us in
the future when we will be adding network based storage without local
representation in the host.
Add APIs that will allow to use the storage driver to assist in
operations on files even for remote filesystems without native
representation as files in the host.
All the data for getting the actual type is present in the snapshot
config. There is no need to have this function private to the qemu
driver and it will be re-used later in other parts of libvirt
All the data for getting the actual type is present in the domain
config. There is no need to have this function private to the qemu
driver and it will be re-used later in other parts of libvirt
The problem with VLAN is that the user still has to manually create the
vlan interface on the host. Then the generated configuration will use
it as a nerwork hostdev device. So the generated configurations of the
following two fragments are equivalent (see rhbz#1059637).
lxc.network.type = phys
lxc.network.link = eth0.5
lxc.network.type = vlan
lxc.network.link = eth0
lxc.network.vlan.id = 5
Some of the LXC configuration properties aren't migrated since they
would only cause problems in libvirt-lxc:
* lxc.network.ipv[46]: LXC driver doesn't setup IP address of guests,
see rhbz#1059624
* lxc.network.name, see rhbz#1059630
If no network configuration is provided, LXC only provides the loopback
interface. To match this, we need to use the privnet feature. LXC will
also define a 'none' network type in its 1.0.0 version that fits
libvirt LXC driver's default.
LXC rootfs can be either a directory or a block device or an image
file. The first two types have been implemented, but the image file is
still to be done since LXC auto-guesses the file format at mount time
and the LXC driver doesn't support the 'auto' format.
This function aims at converting LXC configuration into a libvirt
domain XML description to help users migrate from LXC to libvirt.
Here is an example of how the lxc configuration works:
virsh -c lxc:/// domxml-from-native lxc-tools /var/lib/lxc/migrate_test/config
It is possible that some parts couldn't be properly mapped into a
domain XML fragment, so users should carefully review the result
before creating the domain.
fstab files in lxc.mount lines will need to be merged into the
configuration file as lxc.mount.entry.
As we can't know the amount of memory of the host, we have to set a
default value for max_balloon that users will probably want to adjust.
virConf now honours a VIR_CONF_FLAG_LXC_FORMAT flag to handle LXC
configuration files. The differences are that property names can
contain '.' character and values are all strings without any bounding
quotes.
Provide a new virConfWalk function calling a handler on all non-comment
values. This function will be used by the LXC conversion code to loop
over LXC configuration lines.
Commit 57ddcc23 (v0.9.11) introduced the pmwakeup event, with
an optional 'reason' field reserved for possible future expansion.
But it failed to wire the field through RPC, so even if we do
add a reason in the future, we will be unable to get it back
to the user.
Worse, commit 7ba5defb (v1.0.0) repeated the same mistake with
the pmsuspend_disk event.
As long as we are adding new RPC calls, we might as well fix
the events to actually match the signature so that we don't have
to add yet another RPC in the future if we do decide to start
using the reason field.
* src/remote/remote_protocol.x
(remote_domain_event_callback_pmwakeup_msg)
(remote_domain_event_callback_pmsuspend_msg)
(remote_domain_event_callback_pmsuspend_disk_msg): Add reason
field.
* daemon/remote.c (remoteRelayDomainEventPMWakeup)
(remoteRelayDomainEventPMSuspend)
(remoteRelayDomainEventPMSuspendDisk): Pass reason to client.
* src/conf/domain_event.h (virDomainEventPMWakeupNewFromDom)
(virDomainEventPMSuspendNewFromDom)
(virDomainEventPMSuspendDiskNewFromDom): Require additional
parameter.
* src/conf/domain_event.c (virDomainEventPMClass): New class.
(virDomainEventPMDispose): New function.
(virDomainEventPMWakeupNew*, virDomainEventPMSuspendNew*)
(virDomainEventPMSuspendDiskNew*)
(virDomainEventDispatchDefaultFunc): Use new class.
* src/remote/remote_driver.c (remoteDomainBuildEvent*PM*): Pass
reason through.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Following the patterns established by lifecycle events, this
creates all the new RPC calls needed to pass callback IDs
for every domain event, and changes the limits in client and
server codes to use modern style when possible.
I've tested all combinations: both 'old client and new server'
and 'new client and old server' continue to work with the old
RPCs, and 'new client and new server' benefit from server-side
filtering with the new RPCs.
* src/remote/remote_protocol.x (REMOTE_PROC_DOMAIN_EVENT_*): Add
REMOTE_PROC_DOMAIN_EVENT_CALLBACK_* counterparts.
* daemon/remote.c (remoteRelayDomainEvent*): Send callbackID via
newer RPC when used with new-style registration.
(remoteDispatchConnectDomainEventCallbackRegisterAny): Extend to
cover all domain events.
* src/remote/remote_driver.c (remoteDomainBuildEvent*): Add new
Callback and Helper functions.
(remoteEvents): Match order of RPC numbers, register new handlers.
(remoteConnectDomainEventRegisterAny)
(remoteConnectDomainEventDeregisterAny): Extend to cover all
domain events.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
The counterpart to the server RPC additions; here, a single
function can serve both old and new calls, while incoming
events must be serviced by two different functions. Again,
some wise choices in our XDR made it easier to share code
managing similar events.
While this only supports lifecycle events, it covers the
harder part of how Register and RegisterAny interact; the
remaining 15 events will be a mechanical change in a later
patch. For Register, we now have a callbackID locally for
more efficient cleanup if the RPC fails; we also prefer to
use the newer RPC where we know it is supported (the older
RPC must be used if we don't know if RegisterAny is
supported).
* src/remote/remote_driver.c (remoteEvents): Register new RPC
event handler.
(remoteDomainBuildEventLifecycle): Move guts...
(remoteDomainBuildEventLifecycleHelper): ...here.
(remoteDomainBuildEventCallbackLifecycle): New function.
(remoteConnectDomainEventRegister)
(remoteConnectDomainEventDeregister)
(remoteConnectDomainEventRegisterAny)
(remoteConnectDomainEventDeregisterAny): Use new RPC when supported.
We want to convert over to server-side events, even for older
APIs. To do that, the client side of the remote driver wants
to distinguish between legacy virConnectDomainEventRegister and
normal virConnectDomainEventRegisterAny, while knowing the
client callbackID and the server's serverID for both types of
registration. The client also needs to probe whether the
server supports server-side filtering. However, for ease of
review, we don't actually use the new RPCs until a later patch.
* src/conf/object_event_private.h (virObjectEventStateCallbackID):
Add parameter.
* src/conf/object_event.c (virObjectEventCallbackListAddID)
(virObjectEventStateRegisterID): Separate legacy from callbackID.
(virObjectEventStateCallbackID): Pass through parameter.
(virObjectEventCallbackLookup): Let legacy and global domain
lifecycle events share a common remoteID.
* src/conf/network_event.c (virNetworkEventStateRegisterID):
Update caller.
* src/conf/domain_event.c (virDomainEventStateRegister)
(virDomainEventStateRegisterID, virDomainEventStateDeregister):
Likewise.
(virDomainEventStateRegisterClient)
(virDomainEventStateCallbackID): Implement new functions.
* src/conf/domain_event.h (virDomainEventStateRegisterClient)
(virDomainEventStateCallbackID): New prototypes.
* src/remote/remote_driver.c (private_data): Add field.
(doRemoteOpen): Probe server feature.
(remoteConnectDomainEventRegister)
(remoteConnectDomainEventRegisterAny): Use new function.
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch adds some new RPC call numbers, but for ease of review,
they sit idle until a later patch adds the client counterpart to
drive the new RPCs. Also for ease of review, I limited this patch
to just the lifecycle event; although converting the remaining
15 domain events will be quite mechanical. On the server side,
we have to have a function per RPC call, largely with duplicated
bodies (the key difference being that we store in our callback
opaque pointer whether events should be fired with old or new
style); meanwhile, a single function can drive multiple RPC
messages. With a strategic choice of XDR struct layout, we can
make the event generation code for both styles fairly compact.
I debated about adding a tri-state witness variable per
connection (values 'unknown', 'legacy', 'modern'). It would start
as 'unknown', move to 'legacy' if any RPC call is made to a legacy
event call, and move to 'modern' if the feature probe is made;
then the event code could issue an error if the witness state is
incorrect (a legacy RPC call while in 'modern', a modern RPC call
while in 'unknown' or 'legacy', and a feature probe while in
'legacy' or 'modern'). But while it might prevent odd behavior
caused by protocol fuzzing, I don't see that it would prevent
any security holes, so I considered it bloat.
Note that sticking @acl markers on the new RPCs generates unused
functions in access/viraccessapicheck.c, because there is no new
API call that needs to use the new checks; however, having a
consistent .x file is worth the dead code.
* src/libvirt_internal.h (VIR_DRV_FEATURE_REMOTE_EVENT_CALLBACK):
New feature.
* src/remote/remote_protocol.x
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_CALLBACK_REGISTER_ANY)
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_CALLBACK_DEREGISTER_ANY)
(REMOTE_PROC_DOMAIN_EVENT_CALLBACK_LIFECYCLE): New RPCs.
* daemon/remote.c (daemonClientCallback): Add field.
(remoteDispatchConnectDomainEventCallbackRegisterAny)
(remoteDispatchConnectDomainEventCallbackDeregisterAny): New
functions.
(remoteDispatchConnectDomainEventRegisterAny)
(remoteDispatchConnectDomainEventDeregisterAny): Mark legacy use.
(remoteRelayDomainEventLifecycle): Change message based on legacy
or new use.
(remoteDispatchConnectSupportsFeature): Advertise new feature.
* src/remote_protocol-structs: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
virGetStorageVol can return NULL on out-of-memory. If it does, cleanly
abort the volume clone operation.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
This reverts commit 67ccf91bf2.
We only generate the volume key after we've built it, but the storage
driver expects it to be filled after createVol finishes.
Squash the volume building back with creating to fulfill this
expectation.
Openstack Nova calls virConnectBaselineCPU() during initialization
of the instance to get a full list of CPU features.
This patch adds a stub to aarch64-specific code to handle
this request (no actual work is done). That's enough to have
this stub with limited functionality because qemu/kvm backend
supports only 'host-passthrough' cpu mode on aarch64.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
A small fix for the possiblitiy of jumping to an error path before
registering for domain events, preventing receiving important ones
like shutdown and death.