14058 Commits

Author SHA1 Message Date
Daniel P. Berrange
a2bdfa5261 lxc: report veth device indexes to systemd
Record the index of each host-side veth device created and report
them to systemd, so they show up in machinectl status for the VM.

lxc-shell(95449419f969d649d9962566ec42af7d)
     Since: Fri 2015-01-16 16:53:37 GMT; 3s ago
    Leader: 28085 (sh)
   Service: libvirt-lxc; class container
     Iface: vnet0
   Address: fe80::216:3eff:fe00:c317%124
        OS: Fedora 21 (Twenty One)
      Unit: machine-lxc\x2dshell.scope
            └─28085 /bin/sh
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
e4fc4f0c99 lxc: more logging during startup paths
Add more logging to the lxc controller and container files to
facilitate debugging startup problems. Also make it clear when
the container is going to close stdout and thus no longer do
any logging.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
4acb01e43e lxc: delay setup of cgroup until we have the init pid
Don't create the cgroups ahead of launching the container since
there is no need for the limits to apply during initial bootstrap.
Create the cgroup after the container PID is known and tell
systemd the initpid is the leader, instead of the controller
pid.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
0a8addc103 lxc: only write XML once for lxc controller
Currently when launching the LXC controller we first write out
the plain, inactive XML configuration, then launch the controller,
then replace the file with the live status XML configuration.
By good fortune this hasn't caused any problems other than some
misleading error messages during failure scenarios.

This simplifies the code so it only writes out the XML once and
always writes the live status XML. To do this we need to handshake
with the child process, to make execution pause just before exec()
so we can write the XML status with the child PID present.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
e1de552150 lxc: re-arrange startup synchronization sequence with controller
Currently the lxc controller process itself is responsible for
daemonizing itself into the background and writing out its pid
file. The lxc driver would fork the controller and then attempt
to connect to the lxc monitor. This connection would only
succeed after the controller has backgrounded itself, setup
cgroups and written its pid file, so startup was race free.

The problem is that we need to delay create of the cgroups to
much later, such that we can tell systemd the container init
pid when we create the cgroups. If we delay cgroup creation
though the current synchronization won't work.

A second problem is that the controller needs the XML config
of the guest. Currently we write out the plain virDomainDefPtr
XML before starting the controller, and then later replace it
with the full virDomainObjPtr status XML. This is kind of gross
and also means that the controller doesn't get a record of the
live XML config right away. This means it doesn't have a record
of the veth device names either and so can't give that info
to systemd when creating the cgroups.

To address this we change the startup sequencing. The goal
is that we want to get the PID as soon as possible, before
the LXC controller even starts. So we stop letting the LXC
controller daemonize itself, and instead use virCommand's
built-in capabilities. This daemonizes and writes the PID
before LXC controller is exec'd. So the driver can read
the PID as soon as virCommandRun returns. It is no longer
safe to connect to the monitor or detect the cgroups though.

Fortunately the LXC controller already has a second point
of synchronization. Immediately before its  event loop
starts running, it performs a handshake with the driver.
So we move the opening of the monitor connection and cgroup
detection after this synchronization point.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
a5979e3374 lxc: don't build pidfile string multiple times
Build the pidfile string once when starting a guest and then
use the same string thereafter. This will benefit following
patches which need the pidfile string in more situations.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
b3e4401dc6 systemd: don't report an error if the guest is already terminated
In many cases where we invoke virSystemdTerminateMachine the
process(es) will have already gone away on their own accord.
In these cases we log an error message that the machine does
not exist. We should catch this particular error and simply
ignore it, so we don't pollute the logs.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
f7afeddce9 qemu: report TAP device indexes to systemd
Record the index of each TAP device created and report them to
systemd, so they show up in machinectl status for the VM.
2015-01-27 13:57:02 +00:00
Chen Hanxiao
95da191376 storage: add a flag to clone files on btrfs
When creating a RAW file, we don't take advantage
of clone of btrfs.

Add a VIR_STORAGE_VOL_CREATE_REFLINK flag to request
a reflink copy.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-01-27 13:41:14 +01:00
Chen Hanxiao
466b29c8c3 storage: introduce btrfsCloneFile() for COW copy
Add a wrapper for BTRFS_IOC_CLONE ioctl.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-01-27 13:24:10 +01:00
Daniel P. Berrange
55ea7be7d9 Removing probing of secondary drivers
For stateless, client side drivers, it is never correct to
probe for secondary drivers. It is only ever appropriate to
use the secondary driver that is associated with the
hypervisor in question. As a result the ESX & HyperV drivers
have both been forced to do hacks where they register no-op
drivers for the ones they don't implement.

For stateful, server side drivers, we always just want to
use the same built-in shared driver. The exception is
virtualbox which is really a stateless driver and so wants
to use its own server side secondary drivers. To deal with
this virtualbox has to be built as 3 separate loadable
modules to allow registration to work in the right order.

This can all be simplified by introducing a new struct
recording the precise set of secondary drivers each
hypervisor driver wants

struct _virConnectDriver {
    virHypervisorDriverPtr hypervisorDriver;
    virInterfaceDriverPtr interfaceDriver;
    virNetworkDriverPtr networkDriver;
    virNodeDeviceDriverPtr nodeDeviceDriver;
    virNWFilterDriverPtr nwfilterDriver;
    virSecretDriverPtr secretDriver;
    virStorageDriverPtr storageDriver;
};

Instead of registering the hypervisor driver, we now
just register a virConnectDriver instead. This allows
us to remove all probing of secondary drivers. Once we
have chosen the primary driver, we immediately know the
correct secondary drivers to use.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
220c01aa0a don't disable state driver when libvirtd is not built
A bunch of code is wrapped in #if WITH_LIBVIRTD in order to
enable the virStateDriver to be disabled when libvirtd is not
built. Disabling this code doesn't have any real functional
benefit beyond removing 1 pointer from the virConnectPtr struct,
while having a cost of many more conditionals.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
f35fa0fd95 Remove all secondary driver private data fields
Now all drivers are converted to use their global state
directly, there is no need for private data fields for
the secondary drivers in virConnectPtr
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
d85f9f1a7e Remove use of interfacePrivateData from udev driver
The udev driver can be implemented using global state instead
of the connect private data.
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
60b966b378 Remove use of nodeDevicePrivateData from nodeDev driver
The node device driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
47b7f661a4 Remove use of storagePrivateData/networkPrivateData from vbox
The vbox driver can use the main hypervisor private data and
so does not need to use the storage/network private data fields.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
7b1ba9566b Remove use of nwfilterPrivateData from nwfilter driver
The nwfilter driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
04101f23d0 Remove use of secretPrivateData from secret driver
The secret driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Peter Krempa
d13f56f08a qemu: Fix job handling in qemuDomainSetMetadata
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:39:21 +01:00
Peter Krempa
fb2ed975c3 qemu: Fix job type in qemuDomainGetBlockIoTune
The function just queries status so there's no need for a MODIFY type
job.
2015-01-27 10:39:21 +01:00
Peter Krempa
c5ee5cfb18 qemu: Fix job handling in qemuDomainSetSchedulerParametersFlags
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:38:47 +01:00
Peter Krempa
4fd7a72075 qemu: Fix job handling in qemuDomainSetMemoryParameters
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:24:04 +01:00
Peter Krempa
e3e72743df qemu: Fix job handling in qemuDomainSetAutostart
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.

This patch also fixes a few very long lines of code around the touched
parts.
2015-01-27 10:24:04 +01:00
Peter Krempa
79e5603307 qemu: Fix job handling in qemuDomainPinEmulator
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:24:04 +01:00
Peter Krempa
46d950443d qemu: Fix job handling in qemuDomainPinVcpuFlags
The domain modifies the domain configuration but doesn't take a MODIFY
type job to do it.
2015-01-27 10:24:03 +01:00
Ján Tomko
b54f48812d Fix a memory leak in virCgroupGetPercpuStats
Coverity reports that my commit af1c98e introduced
two memory leaks:
the cpumap if ncpus == 0 in virCgroupGetPercpuStats
and the params array in the test of the function.
2015-01-26 16:13:06 +01:00
Ján Tomko
495accb047 Use correct location for qcow1 encryption header
After the 8-byte size header, there are two one-byte headers
and two bytes of padding before the crypt_header field.

Our QCOW1_HDR_CRYPT constant did not skip the padding.
http://git.qemu.org/?p=qemu.git;a=blob;f=block/qcow.c;h=ece22697#l41

https://bugzilla.redhat.com/show_bug.cgi?id=1185165
2015-01-26 16:13:02 +01:00
Daniel P. Berrange
2d8b59c060 systemd: avoid string comparisons on dbus error messages
Add a virDBusErrorIsUnknownMethod helper so that callers
don't need todo string comparisons themselves to detect
standard error names.
2015-01-26 09:14:04 +00:00
Daniel P. Berrange
d13b586a91 systemd: fix build without dbus
The virDBusMethodCall method has a DBusError as one of its
parameters. If the caller wants to pass a non-NULL value
for this, it immediately makes the calling code require
DBus at build time. This has led to breakage of non-DBus
builds several times. It is desirable that only the virdbus.c
file should need WITH_DBUS conditionals, so we must ideally
remove the DBusError parameter from the method.

We can't simply raise a libvirt error, since the whole point
of this parameter is to give the callers a way to check if
the error is one they want to ignore, without having the logs
polluted with an error message. So, we add a virErrorPtr
parameter which the caller can then either ignore or raise
using the new virReportErrorObject method.

This new method is distinct from virSetError in that it
ensures the logging hooks are run.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-26 09:14:04 +00:00
Richard W.M. Jones
ee4c13ce1d aarch64: Support versioned machine types.
For distros that want to add versioned machine types, they will add
(downstream) machine types like "virt-foo-1.2.3".  Detect these as
MMIO too.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
2015-01-23 15:12:33 +00:00
Erik Skultety
b7e6f2fc80 qemu: Add check for PCI bridge placement if there are too many PCI devices
Previous patch of this series fixed the issue with adding a new PCI bridge
when all the slots were reserved by devices with user specified addresses.
In case there are still some PCI devices waiting to get a slot reserved
by qemuAssignDevicePCISlots, this means a new bus needs to be
created along with a corresponding bridge controller. By adding an
additional check, this scenario now results in a reasonable error
instead of generating wrong qemu command line.
2015-01-23 14:35:03 +01:00
Erik Skultety
5d6904b991 qemu: Fix auto-adding PCI bridge when all slots are reserved
Commit 93c8ca tried to fix the issue with auto-adding of a PCI bridge
controller, but didn't work properly in all scenarios.

This patch provides a better fix of the issue when all slots on a PCI bus
are reserved by devices with user specified addresses and no additional
bridges need to be created.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132900
2015-01-23 14:32:18 +01:00
Erik Skultety
a3ecd63e92 qemu: move PCI slot assignment for PIIX3, Q35 into a separate function
In order to be able to test for fully reserved PCI buses, assignment of
PCI slots for integrated devices needs to be moved to a separate function.
This also might be a good preparation if we decide to add support for
other chipsets as well.
2015-01-23 14:26:55 +01:00
Erik Skultety
3fb2a69284 qemu: reorder PCI slot assignment functions
Move qemuDomainAssignPCIAddresses after the definition
of the static function qemuDomainValidateDevicePCISlotsQ35.

This lets us define a new static function using
qemuDomainValidateDevicePCISlots* and use it in
qemuDomainAssignPCIAddresses without a forward declaration.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-01-23 14:16:40 +01:00
Peter Krempa
60e4e5783d util: json: Make argument of virJSONValueArraySize const
The function doesn't allow to modify the array in any way, thus the
argument can be const.
2015-01-23 13:18:04 +01:00
Peter Krempa
165c34778b qemu: command: Honor const-correctnes in qemuBuildNumaArgStr
@def is modified in the function indirectly although it's marked as
const.
2015-01-23 13:18:04 +01:00
Peter Krempa
f18f1183e5 conf: Fix comment mentioning actual type of @multi member of virDevicePCIAddress
After refactor to use the virTristateSwitch enum the comment in the
struct was not adjusted.
2015-01-23 13:18:03 +01:00
Erik Skultety
852cea52ec conf: virDomainDefMaybeAddController tweak return code
Previously the function returned either -1 in case of an error or 0 on
success. However, we should also distinguish between a case we
successfully added a controller and a case there wasn't a need to add any
controller
2015-01-23 11:03:45 +01:00
Erik Skultety
2fbfb3ac41 qemu: Remove dead code in qemuDomainAssignPCIAddresses revert patch
As it turned out, fix of dead code 419a22 changed the affected condition
from "never true" to "always true", so better fix would be to change the
return code of virDomainMaybeAddController from 0 to 1 if
a new bridge has been added, thus distinguishing case when we didn't need to
add any controller and case we successfully added one.

The return code is changed in the next commit
2015-01-23 11:03:45 +01:00
Pavel Hrdina
3baeea6239 esx_vi: fix possible segfault
Clang found possible dereference of NULL pointer which is right.
Function 'esxVI_LookupTaskInfoByTask' should find a task info. The issue
is that we could return 0 and leave 'taksInfo' pointer NULL because if
there is no match we simply end the search loop end set 'result' to 0.
Every caller count on the fact that if the return value is 0 than it's
safe to dereference 'taskInfo'. We should return 0 only in case we found
something and the '*taskInfo' is not NULL.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-01-22 18:30:18 +01:00
Pavel Hrdina
828e485bd5 xenapi_driver: fix copy-paste typo
Clang found that we are passing variable with wrong enum type to
'xenapiCrashExitEnum2virDomainLifecycle' function. This is probably
copy-paste typo as the correct variable exists in the code, but it isn't
used.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-01-22 18:30:18 +01:00
Ján Tomko
af1c98e406 Fix virCgroupGetPercpuStats with non-continuous present CPUs
Per-cpu stats are only shown for present CPUs in the cgroups,
but we were only parsing the largest CPU number from
/sys/devices/system/cpu/present and looking for stats even for
non-present CPUs.
This resulted in:
internal error: cpuacct parse error
2015-01-22 17:01:11 +01:00
Peter Krempa
b347c0c2a3 CVE-2015-0236: qemu: Check ACLs when dumping security info from snapshots
The ACL check didn't check the VIR_DOMAIN_XML_SECURE flag and the
appropriate permission for it. Found via code inspection while fixing
permissions for save images.
2015-01-22 14:32:54 +01:00
Peter Krempa
03c3c0c874 CVE-2015-0236: qemu: Check ACLs when dumping security info from save image
The ACL check didn't check the VIR_DOMAIN_XML_SECURE flag and the
appropriate permission for it.
2015-01-22 14:32:54 +01:00
Luyao Huang
860522d26b qemu: output error when try to hotplug unsupported console type
https://bugzilla.redhat.com/show_bug.cgi?id=1164627

When using 'virsh attach-device' to hotplug an unsupported console type
into a qemu guest the attachment would succeed as the command line
formatter didn't report error in such case.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
2015-01-22 11:17:14 +01:00
Ján Tomko
280ece4af9 qemu: format server interface without a listen address
https://bugzilla.redhat.com/show_bug.cgi?id=1130390

The listen address is not mandatory for <interface type='server'>
but when it's not specified, we've been formatting it as:
-netdev socket,listen=(null):5558,id=hostnet0
which failed with:
Device 'socket' could not be initialized

Omit the address completely and only format the port in the listen
attribute.

Also fix the schema to allow specifying a model.
2015-01-21 13:22:36 +01:00
Ján Tomko
c803c070c4 Fix virCgroupNewMachine prototype on non-Linux
Commit 318df5a changed the prototype of virCgroupNewMachine
without adjusting the stub function for platforms without
cgroups.
2015-01-20 10:02:53 +01:00
Josh Stone
298fa4858c network: Let domains be restricted to local DNS
This adds a new "localOnly" attribute on the domain element of the
network xml.  With this set to "yes", DNS requests under that domain
will only be resolved by libvirt's dnsmasq, never forwarded upstream.

This was how it worked before commit f69a6b987d616, and I found that
functionality useful.  For example, I have my host's NetworkManager
dnsmasq configured to forward that domain to libvirt's dnsmasq, so I can
easily resolve guest names from outside.  But if libvirt's dnsmasq
doesn't know a name and forwards it to the host, I'd get an endless
forwarding loop.  Now I can set localOnly="yes" to prevent the loop.

Signed-off-by: Josh Stone <jistone@redhat.com>
2015-01-20 01:07:18 -05:00
Ján Tomko
d16704fd60 qemu_conf: check for duplicate security drivers
Using the same driver multiple times is pointless and
it can result in confusing errors:

$ virsh start test
error: Failed to start domain test
error: internal error: security label already defined for VM

https://bugzilla.redhat.com/show_bug.cgi?id=1153891
2015-01-19 12:46:37 +01:00
Ján Tomko
5c703ca396 Always check return value of qemuDomainObjExitMonitor
Depending on the context, either error out if the domain
has disappeared in the meantime, or just ignore the value
to allow marking the function as ATTRIBUTE_RETURN_CHECK.
2015-01-19 10:12:32 +01:00