Commit Graph

11303 Commits

Author SHA1 Message Date
Peter Krempa
d410e6f19d qemu: snapshot: Use better check when reverting external snapshots
https://bugzilla.redhat.com/show_bug.cgi?id=1071264

Reverting of external snapshots is not supported currently. The check
that is present doesn't properly check for all aspects that make a
snapshot external. Use virDomainSnapshotIsExternal() to do the check.
2014-03-04 11:12:44 +01:00
Michal Privoznik
042c4ab1c9 qemuBuildNicDevStr: Adapt to new advisory on multiqueue
As I did previously in 4f588a1b46, libvirt needs to set virtio vectors.
Previously, we were advised to use vectors=N, where

N = 2 * (number of queues) + 1

However, just recently this advisory has changed on the Multiquue wiki
page [1] to:

N = 2 * (number of queues) + 2

1: http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-03-04 10:43:05 +01:00
Ján Tomko
12ee0b98d3 Check if systemd is running before creating machines
If systemd is installed, but is not the init system,
systemd-machined fails with an unhelpful error message:
Launch helper exited with unknown return code 1

Currently we only check if the "machine1" service is
available (in ListActivatableNames).
Also check if "systemd1" service is registered with DBus
(ListNames).

This fixes https://bugs.gentoo.org/show_bug.cgi?id=493246#c22
2014-03-04 09:14:52 +01:00
Ján Tomko
65a4cb03c7 Split out most of virDBusIsServiceEnabled
Introduce virDBusIsServiceInList which can be used to call other
methods for listing services (ListNames), not just ListActivatableNames.

No functional change, fixed the 'Retruns' typo.
2014-03-04 09:14:52 +01:00
Eric Blake
b75c7bd6b9 build: fix cppi warning
Jenkins pointed out that the previous commit violates syntax
check when cppi is installed.

* src/nwfilter/nwfilter_dhcpsnoop.c (SNOOP_POLL_MAX_TIMEOUT_MS):
Update indentation.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 14:02:42 -07:00
Stefan Berger
49b59a151f nwfilter: Increase buffer size for libpcap
Libpcap 1.5 requires a larger buffer than previous pcap versions.
Adjust the size of the buffer to 128kb.

This patch should address symptoms in BZ 1071181 and BZ 731059

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2014-03-03 15:13:50 -05:00
Stefan Berger
64df4c7518 nwfilter: Display the pcap errror message
Display the pcap error message in the log.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2014-03-03 15:13:47 -05:00
Stefan Berger
a718eb19e3 nwfilter: Cap the poll timeout in the DHCP Snooping code
Cap the poll timeout in the DHCP Snooping code to a max. of 10 seconds
to not hold up the libvirt shutdown longer than this.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2014-03-03 15:13:44 -05:00
Eric Blake
25f87817ab virFork: simplify semantics
The old semantics of virFork() violates the priciple of good
usability: it requires the caller to check the pid argument
after use, *even when virFork returned -1*, in order to properly
abort a child process that failed setup done immediately after
fork() - that is, the caller must call _exit() in the child.
While uses in virfile.c did this correctly, uses in 'virsh
lxc-enter-namespace' and 'virt-login-shell' would happily return
from the calling function in both the child and the parent,
leading to very confusing results. [Thankfully, I found the
problem by inspection, and can't actually trigger the double
return on error without an LD_PRELOAD library.]

It is much better if the semantics of virFork are impossible
to abuse.  Looking at virFork(), the parent could only ever
return -1 with a non-negative pid if it misused pthread_sigmask,
but this never happens.  Up until this patch series, the child
could return -1 with non-negative pid if it fails to set up
signals correctly, but we recently fixed that to make the child
call _exit() at that point instead of forcing the caller to do
it.  Thus, the return value and contents of the pid argument are
now redundant (a -1 return now happens only for failure to fork,
a child 0 return only happens for a successful 0 pid, and a
parent 0 return only happens for a successful non-zero pid),
so we might as well return the pid directly rather than an
integer of whether it succeeded or failed; this is also good
from the interface design perspective as users are already
familiar with fork() semantics.

One last change in this patch: before returning the pid directly,
I found cases where using virProcessWait unconditionally on a
cleanup path of a virFork's -1 pid return would be nicer if there
were a way to avoid it overwriting an earlier message.  While
such paths are a bit harder to come by with my change to a direct
pid return, I decided to keep the virProcessWait change in this
patch.

* src/util/vircommand.h (virFork): Change signature.
* src/util/vircommand.c (virFork): Guarantee that child will only
return on success, to simplify callers.  Return pid rather than
status, now that the situations are always the same.
(virExec): Adjust caller, also avoid open-coding process death.
* src/util/virprocess.c (virProcessWait): Tweak semantics when pid
is -1.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/commandtest.c (test23): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:32 -07:00
Eric Blake
b9dd878ff8 util: make it easier to grab only regular command exit
Auditing all callers of virCommandRun and virCommandWait that
passed a non-NULL pointer for exit status turned up some
interesting observations.  Many callers were merely passing
a pointer to avoid the overall command dying, but without
caring what the exit status was - but these callers would
be better off treating a child death by signal as an abnormal
exit.  Other callers were actually acting on the status, but
not all of them remembered to filter by WIFEXITED and convert
with WEXITSTATUS; depending on the platform, this can result
in a status being reported as 256 times too big.  And among
those that correctly parse the output, it gets rather verbose.
Finally, there were the callers that explicitly checked that
the status was 0, and gave their own message, but with fewer
details than what virCommand gives for free.

So the best idea is to move the complexity out of callers and
into virCommand - by default, we return the actual exit status
already cleaned through WEXITSTATUS and treat signals as a
failed command; but the few callers that care can ask for raw
status and act on it themselves.

* src/util/vircommand.h (virCommandRawStatus): New prototype.
* src/libvirt_private.syms (util/command.h): Export it.
* docs/internals/command.html.in: Document it.
* src/util/vircommand.c (virCommandRawStatus): New function.
(virCommandWait): Adjust semantics.
* tests/commandtest.c (test1): Test it.
* daemon/remote.c (remoteDispatchAuthPolkit): Adjust callers.
* src/access/viraccessdriverpolkit.c (virAccessDriverPolkitCheck):
Likewise.
* src/fdstream.c (virFDStreamCloseInt): Likewise.
* src/lxc/lxc_process.c (virLXCProcessStart): Likewise.
* src/qemu/qemu_command.c (qemuCreateInBridgePortWithHelper):
Likewise.
* src/xen/xen_driver.c (xenUnifiedXendProbe): Simplify.
* tests/reconnect.c (mymain): Likewise.
* tests/statstest.c (mymain): Likewise.
* src/bhyve/bhyve_process.c (virBhyveProcessStart)
(virBhyveProcessStop): Don't overwrite virCommand error.
* src/libvirt.c (virConnectAuthGainPolkit): Likewise.
* src/openvz/openvz_driver.c (openvzDomainGetBarrierLimit)
(openvzDomainSetBarrierLimit): Likewise.
* src/util/virebtables.c (virEbTablesOnceInit): Likewise.
* src/util/viriptables.c (virIpTablesOnceInit): Likewise.
* src/util/virnetdevveth.c (virNetDevVethCreate): Fix debug
message.
* src/qemu/qemu_capabilities.c (virQEMUCapsInitQMP): Add comment.
* src/storage/storage_backend_iscsi.c
(virStorageBackendISCSINodeUpdate): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:32 -07:00
Eric Blake
c72e76c3d9 util: make it easier to grab only regular process exit
Right now, a caller waiting for a child process either requires
the child to have status 0, or must use WIFEXITED() and friends
itself.  But in many cases, we want the middle ground of treating
fatal signals as an error, and directly accessing the normal exit
value without having to use WEXITSTATUS(), in order to easily
detect an expected non-zero exit status.  This adds the middle
ground to the low-level virProcessWait; the next patch will add
it to virCommand.

* src/util/virprocess.h (virProcessWait): Alter signature.
* src/util/virprocess.c (virProcessWait): Add parameter.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/vircommand.c (virCommandWait): Likewise.
* src/util/virfile.c (virFileAccessibleAs): Likewise.
* src/lxc/lxc_container.c (lxcContainerHasReboot)
(lxcContainerAvailable): Likewise.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* tests/commandtest.c (test23): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:31 -07:00
Eric Blake
8b24a803ad util: preserve exit status from mount namespace callback
The documentation of namespace callbacks was inconsistent on whether
it preserved positive return values.  Now that we have a dedicated
EXIT_CANCELED to flag all errors before getting to the callback,
it is possible to use positive return values (not that any of the
current callers do, but it is better to match the docs).

Also, while vircommand.c is careful to close fds that a child should
not have, it's still better to be in the practice of setting
FD_CLOEXEC up front.

* src/util/virprocess.c (virProcessRunInMountNamespace): Tweak
return value to pass back non-zero status.  Avoid leaking pipe fds
to other threads.
* src/util/virprocess.h: Fix comment.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:31 -07:00
Eric Blake
2b4f162eb4 util: make it easier to reflect child exit status
Thanks to namespaces, we have a couple of places in the code
base that want to reflect a child exit status, including the
ability to detect death by a signal, back to a grandparent.
Best to make it a reusable function.

* src/util/virprocess.h (virProcessExitWithStatus): New prototype.
* src/libvirt_private.syms (util/virprocess.h): Export it.
* src/util/virprocess.c (virProcessExitWithStatus): New function.
* tests/commandtest.c (test23): Test it.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:31 -07:00
Eric Blake
631923e7f2 virFork: give specific status on failure prior to exec
When a child fails without exec'ing, we want a well-known status;
best is to match what env(1), nice(1), su(1), and other wrapper
programs do.  This patch adds enum values that later patches will
use, and sets up virFork as the first client of EXIT_CANCELED
for errors detected prior to even attempting exec, as well as
virExec to distinguish between a missing executable vs. a binary
that cannot be executed.

This is a slight semantic change in the unlikely case of a child
process failing to restore its signal mask - we now kill the
child with a known status instead of relying on the caller to
notice and do an appropriate _exit().  A subsequent patch will
make further cleanups based on an audit of all callers.

* src/internal.h (EXIT_CANCELED, EXIT_CANNOT_INVOKE)
(EXIT_ENOENT): New enum.
* src/util/vircommand.c (virFork): Document specific exit value if
child aborts early.
(virExec): Distinguish between various exec failures.
* tests/commandtest.c (test1): Enhance test.
(test22): New test.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:31 -07:00
Eric Blake
f972a7c72c nwfilter: make ignoring non-zero status easier to follow
While auditing all callers of virCommandRun, I noticed that nwfilter
code never paid attention to commands with a non-zero status; they
were merely passing a pointer to avoid spamming the logs with a
message about commands that might indeed fail.  But proving this
required chasing through a lot of code; refactoring things to
localize the decision of whether to ignore non-zero status makes
it easier to prove that later changes to virFork don't negatively
affect this code.

While at it, I also noticed that ebiptablesRemoveRules would
actually report success if the child process failed for a
reason other than non-zero status, such as OOM.

* src/nwfilter/nwfilter_ebiptables_driver.c (ebiptablesExecCLI):
Change parameter from pointer to bool.
(ebtablesApplyBasicRules, ebtablesApplyDHCPOnlyRules)
(ebtablesApplyDropAllRules, ebtablesCleanAll)
(ebiptablesApplyNewRules, ebiptablesTearNewRules)
(ebiptablesTearOldRules, ebiptablesAllTeardown)
(ebiptablesDriverInitWithFirewallD)
(ebiptablesDriverTestCLITools, ebiptablesDriverProbeStateMatch):
Adjust all clients.
(ebiptablesRemoveRules): Likewise, and fix return value on failure.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-03-03 12:40:31 -07:00
Oleg Strikov
72bddd5f2f qemu: Implement a stub cpuArchDriver.baseline() handler for arm
Openstack Nova calls virConnectBaselineCPU() during initialization
of the instance to get a full list of CPU features.
This patch adds a stub to arm-specific code to handle
this request (no actual work is done).

Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
2014-03-03 11:06:25 -05:00
Daniel P. Berrange
36ff4ed1ec Generate a unique journald log for QEMU capabilities failure
When probing QEMU capabilities fails for a binary generate a
log message with MESSAGE_ID==8ae2f3fb-2dbe-498e-8fbd-012d40afa361.

This can be directly queried from journald based on the UUID
instead of needing string grep. This lets tools like libguestfs'
bug reporting tool trivially do automated sanity tests on the
host they're running on.

 $ journalctl MESSAGE_ID=8ae2f3fb-2dbe-498e-8fbd-012d40afa361
 Feb 21 17:11:01 localhost.localdomain lt-libvirtd[9196]:
 Failed to probe capabilities for /bin/qemu-system-alpha:
 internal error: Child process (LC_ALL=C LD_LIBRARY_PATH=
 /home/berrange/src/virt/libvirt/src/.libs PATH=/usr/lib64/
 ccache:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:
 /usr/bin:/root/bin HOME=/root USER=root LOGNAME=root
 /bin/qemu-system-alpha -help) unexpected exit status 127:
 /bin/qemu-system-alpha: error while loading shared libraries:
 libglapi.so.0: cannot open shared object file: No such file
 or directory

 $ journalctl MESSAGE_ID=8ae2f3fb-2dbe-498e-8fbd-012d40afa361 --output=json
 { ...snip...
  "LIBVIRT_SOURCE" : "file",
  "PRIORITY" : "3",
  "CODE_FILE" : "qemu/qemu_capabilities.c",
  "CODE_LINE" : "2770",
  "CODE_FUNC" : "virQEMUCapsLogProbeFailure",
  "MESSAGE_ID" : "8ae2f3fb-2dbe-498e-8fbd-012d40afa361",
  "LIBVIRT_QEMU_BINARY" : "/bin/qemu-system-xtensa",
  "MESSAGE" : "Failed to probe capabilities for /bin/qemu-system-xtensa:
   internal error: Child process (LC_ALL=C LD_LIBRARY_PATH=/home/berrange
   /src/virt/libvirt/src/.libs PATH=/usr/lib64/ccache:/usr/local/sbin:
   /usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin HOME=/root
   USER=root LOGNAME=root /bin/qemu-system-xtensa -help) unexpected
   exit status 127: /bin/qemu-system-xtensa: error while loading shared
   libraries: libglapi.so.0: cannot open shared object file: No such
    file or directory\n" }

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-03-03 11:42:37 +00:00
Roman Bogorodskiy
e2d85e6fa1 bhyve: add basic documentation 2014-03-01 23:44:58 +04:00
Roman Bogorodskiy
ae49a093c8 bhyve: defined domains should be persistent 2014-03-01 11:44:19 +04:00
Roman Bogorodskiy
91f396b33b bhyve: support domain undefine
Implement domainUndefine and required helper functions:
 - domainIsActive
 - domainIsPersistent
2014-02-28 23:23:44 +04:00
Daniel P. Berrange
f223b96051 Add comments describing the different log sources
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-28 17:38:46 +00:00
Daniel P. Berrange
0915053e97 Include error domain and code in log messages from errors
When a virError is raised, pass the error domain and code
onto the systemd journald using metadata fields.

This allows error messages to be queried by code eg

  $ journalctl LIBVIRT_CODE=43

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-28 17:38:46 +00:00
Daniel P. Berrange
21d370f0b9 Fix journald PRIORITY values
The systemd journal expects log record PRIORITY values to
be encoded using the syslog compatible numbering scheme,
not libvirt's own native numbering scheme. We must therefore
apply a conversion.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-28 17:37:38 +00:00
Daniel P. Berrange
54209df345 Send virLogMetadata fields onto the journal
The systemd journal accepts arbitrary user specified log
fields. These can be passed into virLogMessage via the
virLogMetadata structure. Allow up to 5 custom fields to
be reported by libvirt callers.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-28 17:37:38 +00:00
Oleg Strikov
97962616c1 qemu: Enable 'host-passthrough' cpu mode for arm
This patch allows libvirt user to specify 'host-passthrough'
cpu mode while using qemu/kvm backend on arm (arm32).
It uses 'host' as a CPU model name instead of some other stub
(correct CPU detection is not implemented yet) to allow libvirt
user to specify 'host-model' cpu mode as well.

Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
2014-02-28 11:31:00 -05:00
Michal Privoznik
1df00e2b22 virDomainBlockStats(Flags): Produce saner error message on empty disk path
As of 0bd2ccdec an empty disk path for virDomainBlockStats (or the one
with Flags) is allowed meaning "get me overall summarized statistics".
However, running 'virsh domblkstat $dom' throws a misleading error:

  # ./tools/virsh domblkstat dom
  error: Failed to get block stats dom
  error: invalid argument: invalid path:

while after this commit

  # virsh domblkstat dom
  error: Operation not supported: summary statistics are not supported yet

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-02-28 09:50:01 +01:00
Jiri Denemark
8f10c1e77f sanlock: Truncate domain names longer than SANLK_NAME_LEN
Libvirt uses a domain name to fill in owner_name in sanlock_options in
virLockManagerSanlockAcquire. Unfortunately, owner_name is limited to
SANLK_NAME_LEN characters (including trailing '\0'), which means domains
with longer names fail to start when sanlock is enabled. However, we can
truncate the name when setting owner_name as explained by sanlock's
author:

Setting sanlk_options or the owner_name is unnecessary, and has very
little to no benefit.  If you do provide something in owner_name, it can
be anything, sanlock doesn't care or use it.

If you run the command "sanlock status", the output will display a list
of clients connected to the sanlock daemon.  This client list is
displayed as "pid owner_name" if the client has provided an owner_name
via sanlk_options. This debugging output is the only usage of
owner_name, so its only benefit is to potentially provide a more human
friendly output for debugging purposes.
2014-02-27 09:32:41 +01:00
Ian Campbell
bf5dbce61e libxl: Recognise ARM architectures
Only tested on v7 but the v8 equivalent seems pretty obvious.

XEN_CAP_REGEX already accepts more than it should (e.g. x86_64p or x86_32be)
but I have stuck with the existing pattern.

With this I can create a guest from:
  <domain type='xen'>
    <name>libvirt-test</name>
    <uuid>6343998e-9eda-11e3-98f6-77252a7d02f3</uuid>
    <memory>393216</memory>
    <currentMemory>393216</currentMemory>
    <vcpu>1</vcpu>
    <os>
      <type arch='armv7l' machine='xenpv'>linux</type>
      <kernel>/boot/vmlinuz-arm-native</kernel>
      <cmdline>console=hvc0 earlyprintk debug root=/dev/xvda1</cmdline>
    </os>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <disk type='block' device='disk'>
        <source dev='/dev/marilith-n0/debian-disk'/>
        <target dev='xvda1'/>
      </disk>
      <interface type='bridge'>
        <mac address='8e:a7:8e:3c:f4:f6'/>
        <source bridge='xenbr0'/>
      </interface>
    </devices>
  </domain>

Using virsh create and I can destroy it too.

Currently virsh console fails with:
  Connected to domain libvirt-test
  Escape character is ^]
  error: internal error: cannot find character device <null>

I haven't investigated yet.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2014-02-26 06:33:23 -07:00
Laine Stump
eed46d4cfe network: unplug bandwidth and call networkRunHook only when appropriate
According to commit b4e0299d if networkAllocateActualDevice() was
successful, it will *always* allocate an iface->data.network.actual,
so we can use this during networkReleaseActualDevice() to know if
there is really anything to undo. We were properly using this
information to only decrement the network connections counter if it
had previously been incremented, but we were unconditionally
unplugging bandwidth and calling the "unplugged" network hook for
*all* interfaces (during qemuProcessStop()) whether they had been
previously plugged or not. This caused problems if a domain failed to
start at some time prior to all interfaces being allocated. (I
encountered this when an interface had a bandwidth floor set but no
inbound QoS).

This patch changes both the call to networkUnplugBandwidth() and the
call to networkRunHook() to only be called if there was a previous
call to "plug" for the same interface.
2014-02-26 13:08:56 +02:00
Laine Stump
0700a3dac4 network: don't even call networkRunHook if there is no network
networkAllocateActualDevice() is called for *all* interfaces, not just
those with type='network'. In that case, it will jump down to its
validate: label immediately, without allocating anything. After
validation is done, two counters are potentially updated (one for the
network, and one for any particular physical device that is chosen),
and then networkRunHook() is called.

This patch refactors that code a slight bit so that networkRunHook()
doesn't get called if netdef is NULL (i.e. type != network) and to
place the conditional increment of dev->connections inside the "if
(netdef)" as well - dev can never be non-null if netdef is null
(because "dev" is the pointer to a device in a network's pool of
devices), so this doesn't have any functional effect, it just makes
the code clearer.
2014-02-26 13:03:49 +02:00
Nehal J Wani
969493f91d Fix memory leak in virSCSIDeviceListDel()
While running virscsitest, it was found that valgrind pointed out the following
memory leak:

==320== 5 bytes in 1 blocks are definitely lost in loss record 4 of 37
==320==    at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==320==    by 0x3E6CE81171: strdup (strdup.c:43)
==320==    by 0x4CB28DF: virStrdup (virstring.c:554)
==320==    by 0x4CAC987: virSCSIDeviceSetUsedBy (virscsi.c:289)
==320==    by 0x402321: test2 (virscsitest.c:100)
==320==    by 0x403231: virtTestRun (testutils.c:199)
==320==    by 0x402121: mymain (virscsitest.c:180)
==320==    by 0x4039AD: virtTestMain (testutils.c:782)
==320==    by 0x3E6CE1ED1C: (below main) (libc-start.c:226)
==320==

Introduced by commit fd243fc.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2014-02-26 11:41:40 +01:00
Michal Privoznik
c0d162c68c virNetDevVethCreate: Serialize callers
Consider dozen of LXC domains, each of them having this type of interface:

    <interface type='network'>
      <mac address='52:54:00:a7:05:4b'/>
      <source network='default'/>
    </interface>

When starting these domain in parallel, all workers may meet in
virNetDevVethCreate() where a race starts. Race over allocating veth
pairs because allocation requires two steps:

  1) find first nonexistent '/sys/class/net/vnet%d/'
  2) run 'ip link add ...' command

Now consider two threads. Both of them find N as the first unused veth
index but only one of them succeeds allocating it. The other one fails.
For such cases, we are running the allocation in a loop with 10 rounds.
However this is very flaky synchronization. It should be rather used
when libvirt is competing with other process than when libvirt threads
fight each other. Therefore, internally we should use mutex to serialize
callers, and do the allocation in loop (just in case we are competing
with a different process). By the way we have something similar already
since 1cf97c87.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-02-26 08:50:47 +01:00
Eric Blake
fa2e4dbfd6 build: fix cgroups on non-Linux
Running ./autobuild.sh detected a mingw failure:

  CCLD     libvirt.la
Cannot export virCgroupGetPercpuStats: symbol not defined
Cannot export virCgroupSetOwner: symbol not defined

* src/util/vircgroup.c (virCgroupGetPercpuStats)
(virCgroupSetOwner): Implement stubs.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-02-25 17:38:46 -07:00
Jim Fehlig
4d975deddd libxl: queue domain event earlier in shutdown handler
The shutdown handler may restart a domain when handling a reboot
event or when <on_*> is set to 'restart'.  Restarting consists of
calling libxlVmCleanup followed by libxlVmStart.  libxlVmStart will
emit a VIR_DOMAIN_EVENT_STARTED event, but the SHUTDOWN event is
not emitted until exiting the shutdown handler, after the STARTED
event.

This patch changes the logic a bit to queue the event at the start
of the shutdown action, ensuring it is queued before any subsequent
events that may be generated while executing the shutdown action.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
2014-02-25 10:54:04 -07:00
Laine Stump
2122cf3979 network: include plugged interface XML in "plugged" network hook
The network hook script gets called whenever an interface is plugged
into or unplugged from a network, but even though the full XML of both
the network and the domain is included, there is no reasonable way to
determine what exact resources the plugged interface is using:

1) Prior to a recent patch which modified the status XML of interfaces
to include the information about actual hardware resources used, it
would be possible to scan through the domain XML output sent to the
hook, and from there find the correct interface, but that interface
definition would not include any runtime info (e.g. bandwidth or vlan
taken from a portgroup, or which physdev was used in case of a macvtap
network).

2) After the patch modifying the status XML of interfaces, the network
name would no longer be included in the domain XML, so it would be
completely impossible to determine which interface was the one being
plugged.

To solve that problem, this patch includes a single <interface>
element at the beginning of the XML sent to the network hook for
"plugged" and "unplugged" (just inside <hookData>) that is the status
XML of the interface being plugged. This XML will include all info
gathered from the chosen network and portgroup.

NB: due to hardcoded spaces in all of the device *Format() functions,
the <interface> element inside the <hookData> will be indented by 6
spaces rather than 2. I had intended to fix this, but it turns out
that to make virDomainNetDefFormat() indentation relative, I would
have to do the same to virDomainDeviceInfoFormat(), and that function
is called from 19 places - making that a prerequisite of this patch
would cause too many merge difficulties if we needed to backport
network hooks, so I chose to ignore the problem here and fix the
problem for *all* devices in a followup later.
2014-02-25 16:07:36 +02:00
Laine Stump
7d5bf48474 conf: output actual netdev status in <interface> XML
Until now, the "live" XML status of an <interface type='network'>
device would always show the network information, rather than the
exact hardware device that was used. It would also show the name of
any portgroup the interface belonged to, rather than providing the
configuration that was derived from that portgroup. As an example,
given the following network definition:

[A]
  <network>
    <name>testnet</name>
    <forward type='bridge' dev='p4p1_0'>
      <interface dev='p4p1_0'/>
      <interface dev='p4p1_1'/>
      <interface dev='p4p1_2'/>
      <interface dev='p4p1_3'/>
    </forward>
    <portgroup name='admin'>
      <bandwidth>
          <inbound average='1000' peak='5000' burst='1024'/>
          <outbound average='128' peak='256' burst='256'/>
      </bandwidth>
    </portgroup>
  </network>

and the following domain <interface>:

[B]
  <interface type='network'>
    <source network='testnet' portgroup='admin'/>
  </interface>

the output of "virsh dumpxml $domain" while the domain was running
would yield something like this:

[C]
  <interface type='network'>
    <source network='testnet' portgroup='admin'/>
    <target dev='macvtap0'/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

In order to learn the exact bandwidth information of the interface, a
management application would need to retrieve the XML for testnet,
then search for the portgroup named "admin". Even worse, there was no
simple and standard way to learn which host physdev the macvtap0
device is attached to.

Internally, libvirt has always kept this information in the
virDomainDef that is held in memory, as well as storing it in the
(libvirt-internal-only) domain status XML (in
/var/run/libvirt/qemu/$domain.xml). In order to not confuse the runtime
"actual state" with the config of the device, it's internally stored
like this:

[D]
  <interface type='network'>
    <source network='testnet' portgroup='admin'/>
    <actual type='direct'>
      <source dev='p4p1_0' mode='bridge'/>
      <bandwidth>
          <inbound average='1000' peak='5000' burst='1024'/>
          <outbound average='128' peak='256' burst='256'/>
      </bandwidth>
    </actual>
    <target dev='macvtap0'/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

This was never exposed outside of libvirt though, because I thought it
would be too awkward for a management application to need to look in
two places for the same information, but I also wasn't sure that it
would be okay to overwrite the config info (in this case "<source
network='testnet' portgroup='admin'/>") with the actual runtime info
(everything inside <actual> above).

Now we have a need for this information to be made available to
management applications (in particular, so that a network "plugged"
hook will have full information about the device that is being plugged
in), so it's time to take the leap and decide that it is acceptable
for the config info to be replaced with actual runtime state (but
*only* when reporting domain live status, *not* when saving state in
/var/run/libvirt/qemu/$domain.xml - that remains the same so that
there is no loss of information). That is what this patch does - once
applied, the output of "virsh dumpxml $domain" when the domain is
running will contain something like this:

[E]
  <interface type='direct'>
    <source dev='p4p1_0' mode='bridge'/>
    <bandwidth>
        <inbound average='1000' peak='5000' burst='1024'/>
        <outbound average='128' peak='256' burst='256'/>
    </bandwidth>
    <target dev='macvtap0'/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

In effect, everything that is internally stored within <actual> is
moved up a level to where a management application will expect
it. This means that the management application will only look in a
single place to learn - the type of interface in use, the name of the
physdev (if relevant), the <bandwidth>, <vlan>, and <virtualport>
settings in use.

The potential downside is that a management app looking at this output
will not see that the physdev 'p4p1_0' was actually allocated from the
network named 'testnet', or that the bandwidth numbers were taken from
the portgroup 'admin'. However, if they are interested in that info,
they can always get the "inactive" XML for the domain.

An example of where this could cause problems is in virt-manager's
network device display, which shows the status of the device, but
allows you to edit that status info and save it as the new
config. Previously virt-manager would always display the information
in example [C] above, and allow editing that. With this patch, it will
instead display what is in [E] and allow editing it directly, which
could lead to some confusion. I would suggest that virt-manager have
an "edit" button which would change the display from the "live" xml to
the "inactive" xml, so that editing would be done on that; such a
change would both handle the new situation, and also be compatible
with older releases.
2014-02-25 16:06:43 +02:00
Laine Stump
9da98aa5e1 conf: new function virDomainActualNetDefContentsFormat
This function is currently only called from one place, but in a
subsequent patch will be called from a 2nd place.

The new function exactly replicates the original behavior of the part
of virDomainActualNetDefFormat() that it replaces, but takes a
virDomainNetDefPtr instead of virDomainActualNetDefPtr, and uses the
virDomainNetGetActual*() functions whenever possible, rather than
reaching into def->data.network.actual - this is to be sure that we
are reporting exactly what is being used internally, just in case
there are any discrepancies (there shouldn't be).
2014-02-25 16:04:26 +02:00
Laine Stump
65487c0fc5 conf: re-situate <bandwidth> element in <interface>
This moves the call to virNetDevBandwidthFormat() in
virDomainNetDefFormat() to be called right after the call to
virNetDevVPortProfileFormat(), so that a single chunk of that function
can be placed inside an if that conditionally calls
virDomainActualNetDefContentsFormat() instead (next patch). The
re-ordering necessitates modifying a couple of test data files.
2014-02-25 16:03:05 +02:00
Laine Stump
7c39214cd4 conf: make virDomainNetDefFormat a public function
We will need to call virDomainNetDefFormat() from the network hook (in
the network driver).
2014-02-25 16:01:39 +02:00
Laine Stump
79358733b0 conf: handle null pointer in virNetDevVlanFormat
Other *Format() functions (e.g. virNetDevBandwidthFormat()) return
with no action when called with a NULL *Def pointer. This makes
virNetDevVlanFormat() consistent with that behavior.
2014-02-25 15:56:12 +02:00
Laine Stump
6d4ffae4fc conf: clarify what is returned for actual bandwidth and vlan
In practice, if a virDomainNetDef has a virDomainActualNetDef
allocated, the ActualNetDef will *always* contain the bandwidth and
vlan data from the NetDef (unless there was also a portgroup involved
- see networkAllocateActualDevice()).

However, virDomainNetGetActual(Bandwidth|Vlan)() were coded to make it
appear as if it might be possible to have a valid bandwidth/vlan in
the NetDef, but a NULL in the ActualNetDef. Believing this un-truth
could lead to writing unnecessarily defensive code when dealing with
the virDomainGetActual*() functions, so this patch makes it more
obvious:

   If there is an ActualNetDef, it will always have a copy of the
   various appropriate bits from its parent NetDef, and the
   virDomainGetActual* function will *always* return the data from the
   ActualNetDef, not from the NetDef.

The reason for this effective-NOP patch is that a subsequent patch to
change virDomainNetDefFormat will rely on the above rule.
2014-02-25 15:55:19 +02:00
Wido den Hollander
60f70542f9 rbd: Set timeout options for librados
These timeout values make librados/librbd return -ETIMEDOUT when a
operation is blocking due to a failing/unreachable Ceph cluster.

By having the operations time out libvirt will not block.
2014-02-25 11:14:44 +01:00
Wido den Hollander
761491eb7c rbd: Include return statuses from librados/librbd in logging
With this information it's easier for the user to debug what is
going wrong.
2014-02-25 11:14:28 +01:00
Jim Fehlig
cfad607b23 libxl: handle on_crash coredump actions
Add support for coredump-{destroy,restart} actions of <on_crash> event.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
2014-02-24 10:39:44 -07:00
Jim Fehlig
c2de456e4e libxl: add dump dir to libxlDriverConfig object
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
2014-02-24 10:27:53 -07:00
Jim Fehlig
51b9b39127 libxl: honor domain lifecycle event configuration
The libxl driver was ignoring the <on_*> domain event configuration,
causing e.g. a domain to be rebooted even when on_reboot is set to
destroy.

This patch honors the <on_*> configuration in the shutdown event
handler.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
2014-02-24 10:26:52 -07:00
Richard Weinberger
6fb42d7cdc Ensure systemd cgroup ownership is delegated to container with userns
This function is needed for user namespaces, where we need to chmod()
the cgroup to the initial uid/gid such that systemd is allowed to
use the cgroup.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-24 15:35:47 +00:00
Roman Bogorodskiy
8ca5f46c59 bhyve: implement node information reporting
- Implement nodeGetCPUStats using nodeGetCPUStats()
- Implement nodeGetMemoryStats using nodeGetMemoryStats()
2014-02-24 19:03:46 +04:00
Daniel P. Berrange
66e3a3e914 Add virStringReplace method for substring replacement
Add a virStringReplace method to virstring.{h,c} to perform
substring matching and replacement

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-24 10:51:22 +00:00
Manuel VIVES
12aa71dfde Add virStringSearch method for regex matching
Add a virStringSearch method to virstring.{c,h} which performs
a regex match against a string and returns the matching substrings.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-24 10:46:28 +00:00