Commit Graph

18542 Commits

Author SHA1 Message Date
Peter Krempa
ea3891a0fc conf: numatune: Extract code for requesting memory nodeset from formatting
Extract the logic to determine which nodeset has to be used for a domain
from the formatting step so that it can be reused separately when the
nodeset is used in a different way.
2015-01-31 08:53:21 +01:00
Michal Privoznik
cd7702d456 xend: Don't crash in virDomainXMLDevID
The function is called from all {Attach,Update,Detach}Device APIs to
create config strings that are later passed to xend to perform the
desired action. The function is intended to handle all supported
devices. However, since 5b05358aba we have been trying to get the disk
driver of the device without checking whether the device really is a
disk. This leads to a segmentation fault:

  #0 0x00007ffff7571815 in virDomainDiskGetDriver () from /usr/lib/libvirt.so.0
  #1 0x00007fffeb9ad471 in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_xen.so
  #2 0x00007fffeb9b1062 in xenDaemonAttachDeviceFlags () from /usr/lib/libvirt/connection-driver/libvirt_driver_xen.so
  #3 0x00007fffeb9a8a86 in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_xen.so
  #4 0x00007ffff7609266 in virDomainAttachDevice () from /usr/lib/libvirt.so.0
  #5 0x0000555555593c9d in ?? ()
  #6 0x00007ffff76743c9 in virNetServerProgramDispatch () from /usr/lib/libvirt.so.0
  #7 0x00005555555a678d in ?? ()
  #8 0x00007ffff755460e in ?? () from /usr/lib/libvirt.so.0
  #9 0x00007ffff7553b06 in ?? () from /usr/lib/libvirt.so.0
  #10 0x00007ffff4998b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
  #11 0x00007ffff46e30ed in clone () from /lib/x86_64-linux-gnu/libc.so.6
  #12 0x0000000000000000 in ?? ()

Reported-by: Xiaolin Su <linxxnil@126.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-30 13:59:52 +01:00
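The bug fixed by cd7702d456 is a classic tagged-union mistake: reading a disk-only member without first checking the device type. The sketch below shows the guard in isolation. The types and the `device_get_disk_driver()` helper are hypothetical simplifications, not libvirt's virDomainDeviceDef or virDomainDiskGetDriver().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libvirt's device definitions. */
typedef enum { DEVICE_DISK, DEVICE_NET } device_type;

typedef struct { const char *driver_name; } disk_def;
typedef struct { const char *model; } net_def;

typedef struct {
    device_type type;
    union {
        disk_def *disk;   /* valid only when type == DEVICE_DISK */
        net_def *net;     /* valid only when type == DEVICE_NET */
    } data;
} device_def;

/* Return the disk driver name, or NULL when the device is not a disk.
 * Reading data.disk without checking type first would reinterpret the
 * union member that belongs to some other kind of device. */
static const char *
device_get_disk_driver(const device_def *dev)
{
    assert(dev != NULL);
    if (dev->type != DEVICE_DISK || dev->data.disk == NULL)
        return NULL;
    return dev->data.disk->driver_name;
}
```

Callers that build per-device config strings would then branch on the device type before touching any disk-specific field.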
Michal Privoznik
bbd3eb5098 conf: Don't mangle vcpu placement randomly
https://bugzilla.redhat.com/show_bug.cgi?id=1170492

In one of our previous commits (dc8b7ce7) we made a functional
change even though it was intended as a pure refactor. The problem is
that the following XML:

 <vcpu placement='static' current='2'>6</vcpu>
 <cputune>
   <emulatorpin cpuset='1-3'/>
 </cputune>
 <numatune>
   <memory mode='strict' placement='auto'/>
 </numatune>

gets translated into this one:

 <vcpu placement='auto' current='2'>6</vcpu>
 <cputune>
   <emulatorpin cpuset='1-3'/>
 </cputune>
 <numatune>
   <memory mode='strict' placement='auto'/>
 </numatune>

We should not change the vcpu placement mode. Moreover, we're doing
something similar in the case of emulatorpin and iothreadpin: if they
were set but vcpu placement was auto, we mistakenly removed them from
the domain XML even though they can be set independently of the
vcpus.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-30 13:51:22 +01:00
Tony Krowiak
79a8769479 qemu: change macvtap device options in response to NIC_RX_FILTER_CHANGED
This patch enables synchronization of the host macvtap
device options with those of the guest device in response to the
NIC_RX_FILTER_CHANGED event.

The following device options will be synchronized:
* PROMISC
* MULTICAST
* ALLMULTI

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-30 13:16:28 +01:00
Tony Krowiak
e562a61a07 util: Functions for getting/setting device options
This patch provides the utility functions needed to synchronize
the rxfilter changes made to a guest domain with the corresponding
macvtap devices on the host:

* Get/set PROMISC flag
* Get/set ALLMULTI, MULTICAST

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-30 13:07:27 +01:00
John Ferlan
7879d03197 qemu: Don't unconditionally delete file in qemuOpenFileAs
https://bugzilla.redhat.com/show_bug.cgi?id=1158034

If we're expecting to create a file somewhere and that fails for some
reason during qemuOpenFileAs, then we unlink the path we're attempting
to create, leaving no way to determine which "existing" privileges,
protections, or labels caused the failure (open, change owner
and group, change mode, etc.).

Furthermore, if we fall into the path where we'll be opening / creating
the file using VIR_FILE_OPEN_FORK, we need to first unlink/delete the file
we created in the first path; otherwise, the attempt by the child process
to open it as some specific user:group may fail because the file was already
created as nfsnobody:nfsnobody. Again, if we didn't create the file, we
don't want to blindly delete what already exists. That is a second reason
for the original check that sets need_unlink to false when the file already
exists even though O_CREAT is set.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-01-29 15:37:34 -05:00
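The rule in 7879d03197 ("only delete what we created") can be sketched as follows. This sketch uses `O_CREAT | O_EXCL` to learn reliably whether the file was created by this call, which is not necessarily how qemuOpenFileAs makes that decision; `open_noting_creation()` and `cleanup_on_failure()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open path for writing, creating it if absent, and report whether this
 * call created it. O_EXCL makes the "did we create it?" answer race-free. */
static int
open_noting_creation(const char *path, bool *created)
{
    int fd;

    assert(path != NULL && created != NULL);
    *created = false;
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        *created = true;
        return fd;
    }
    if (errno != EEXIST)
        return -1;
    return open(path, O_WRONLY);   /* pre-existing file: keep it */
}

/* Error path: delete the file only if we created it, so an existing
 * file's owner, mode, and labels remain available for diagnosis. */
static void
cleanup_on_failure(const char *path, int fd, bool created)
{
    if (fd >= 0)
        close(fd);
    if (created)
        unlink(path);
}
```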
John Ferlan
29946e3e53 virfile: Need to check for ENOTCONN from recvfd failure
A gnulib change (commit id 'beae0bdc') causes recvfd to return ENOTCONN,
which makes us fall into the throwaway waitpid() call and return ENOTCONN
to the caller. The error then gets displayed during a 'virsh save' in a
root-squashed NFS environment that is trying to save the file as
something other than root:root.

This patch adds a check for ENOTCONN to force the code into the
waitpid loop, which looks for the actual status from the _exit()'d
child fork.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-01-29 15:37:09 -05:00
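The waitpid loop referred to in 29946e3e53 retrieves the real result from the forked child's exit status. A minimal sketch follows. The error classification helper is hypothetical: only ENOTCONN comes from the commit, and including EACCES in the set is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* recvfd() failures after which the real result must come from the
 * child's exit status. EACCES here is an assumption; ENOTCONN is the
 * value this commit adds because gnulib may now return it. */
static bool
recvfd_failure_needs_child_status(int err)
{
    return err == EACCES || err == ENOTCONN;
}

/* Reap the child, retrying when interrupted by a signal, and return the
 * value it passed to _exit(), or -1 if it did not exit normally. */
static int
wait_child_exit_status(pid_t pid)
{
    int status;
    pid_t r;

    assert(pid > 0);
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```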
John Ferlan
8ff383366b qemu: Adjust EndAsyncJob for qemuDomainSaveInternal error path
Commit id '540c339a', which fixed issues with reference counting and
transient domains, moved the qemuDomainObjEndAsyncJob call before the
attempt to restart the guest CPUs, resulting in an error:

    error: Failed to save domain rhel70 to /tmp/pl/rhel70.save
    error: internal error: unexpected async job 3

when (ret != 0), e.g. on the error path from qemuDomainSaveMemory.

This patch adjusts the logic to call EndAsyncJob only after
we've tried to restart the guest CPUs. It also adds the ret == 0
condition to the test for qemuDomainRemoveInactive.

Additionally, if we get to endjob: because of some earlier error, then
we need to save that error in case the CPU restart logic fails.
We don't want to return the error from the CPU restart failure; rather, we
want to return the error from the failed save that caused us to fall
into the logic that restarts the CPUs.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-01-29 12:10:41 -05:00
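The ordering described in 8ff383366b (remember the original error, attempt the CPU restart, then end the async job) can be sketched like this. The global error slot, the job flag, and the failure-injection parameters are all hypothetical stand-ins for libvirt's error reporting and job machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for libvirt's last-error slot and job tracking. */
static char last_error[128];
static bool async_job_active;

static void
report_error(const char *msg)
{
    strncpy(last_error, msg, sizeof(last_error) - 1);
    last_error[sizeof(last_error) - 1] = '\0';
}

static void begin_async_job(void) { assert(!async_job_active); async_job_active = true; }
static void end_async_job(void) { assert(async_job_active); async_job_active = false; }

/* Simulated save; 'save_fails' and 'resume_fails' inject failures. */
static int
save_domain(bool save_fails, bool resume_fails)
{
    int ret = -1;
    char saved_error[sizeof(last_error)];

    begin_async_job();
    /* ... guest CPUs are paused and memory is saved here ... */
    if (save_fails) {
        report_error("save failed");
        goto endjob;
    }
    ret = 0;

 endjob:
    if (ret != 0) {
        /* Keep the save error; a failed CPU restart must not replace it. */
        memcpy(saved_error, last_error, sizeof(saved_error));
        if (resume_fails)
            report_error("failed to resume CPUs");
        memcpy(last_error, saved_error, sizeof(last_error));
    }
    /* End the async job only after the CPU restart has been attempted. */
    end_async_job();
    return ret;
}
```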
Michal Privoznik
5222256849 schemas: Allow all generic elements and attributes for all interfaces
There are some interface types (notably 'server' and 'client')
which, instead of allowing the default set of elements and
attributes (like the rest do), try to enumerate only the elements
they know of. However, this makes it easy to miss something. For
instance, the <address/> element was not mentioned at all. This
resulted in strange behavior: when such an interface was added
to the XML, the address was automatically generated by the parsing
code, but the formatted XML then failed to pass the RNG schema. This
became more visible once we turned on XML validation for
domain XML changes: appending an empty line at the end of the
formatted XML (to trick virsh into thinking the XML had changed) made
libvirt refuse the very same XML it had formatted.

Instead of trying to find each element and attribute we are
missing in the schema, let's just allow all the elements and
attributes, as we do for the rest of the types. It does no
harm if the schema is wider than what our parser allows.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-29 16:23:15 +01:00
Michal Privoznik
436dcf0b74 qemu: Add AAVMF to the list of known UEFIs
Even though users can pass the list of UEFI:NVRAM pairs at
configure time, we may maintain a list of widely available UEFI
firmware ourselves too. And as arm64 rises, OVMF has been ported there
too, with a slight name change: it's called AAVMF, with AAVMF_CODE.fd
being the UEFI firmware and AAVMF_VARS.fd being the NVRAM store file.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-29 15:20:47 +01:00
Michal Privoznik
bc03a23149 qemu: Allow UEFI paths to be specified at compile time
Up until now there have been just two ways to specify UEFI paths to
libvirt. The first one is editing qemu.conf; the other is editing
qemu_conf.c and recompiling, which is not that fancy. So, a new
configure option is introduced: --with-loader-nvram, which takes a
list of pairs of UEFI firmware and NVRAM store. This way, the
compiled-in defaults can be passed at compile time without the
need to change the code itself.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-01-29 15:20:42 +01:00
Ján Tomko
9783c20cfb Fix syntax-check
My commit 08d1ae1 broke syntax-check by adding ATTRIBUTE_UNUSED
to the flags parameter.

Rename the parameter to unused_flags to bypass the check.
2015-01-29 14:39:12 +01:00
Ján Tomko
08d1ae16d6 Remove flag checking in MacVLanCreate helper stub
When compiling without WITH_MACVTAP, we can get:
'unsupported flags (0x1) in function
virNetDevMacVLanCreateWithVPortProfile'
on an attempt to start a domain.

Remove the flag check to reach the more helpful error:
Cannot create macvlan devices on this platform

https://bugzilla.redhat.com/show_bug.cgi?id=1186928
2015-01-29 10:06:56 +01:00
Peter Krempa
00af238109 virsh: man: Document behavior of some blkdeviotune's flags when querying
--live and --config can't be specified together when querying the
configuration, but are valid when setting it. The man page implied that
they are always valid.
2015-01-29 08:33:51 +01:00
Peter Krempa
20448c2a72 util: bitmap: Tolerate NULL bitmaps in virBitmapEqual
Now that virBitmapEqual is able to compare NULL bitmaps, a few bits of
code can be cleaned up.
2015-01-29 08:22:41 +01:00
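Making an equality function NULL-tolerant, as 20448c2a72 does for virBitmapEqual, lets callers drop their own NULL checks. The sketch below uses a hypothetical minimal bitmap type, not libvirt's virBitmap layout, and it treats bitmaps of different sizes as unequal, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal bitmap; unused bits in the last byte are zero. */
typedef struct {
    size_t nbits;
    const unsigned char *bytes;  /* (nbits + 7) / 8 bytes */
} bitmap;

static bool
bitmap_equal(const bitmap *a, const bitmap *b)
{
    /* Two NULL bitmaps are equal; NULL never equals a non-NULL bitmap. */
    if (a == NULL || b == NULL)
        return a == b;
    if (a->nbits != b->nbits)
        return false;
    if (a->nbits == 0)
        return true;
    assert(a->bytes != NULL && b->bytes != NULL);
    return memcmp(a->bytes, b->bytes, (a->nbits + 7) / 8) == 0;
}
```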
John Ferlan
9bbbb91216 storage: Check the partition name against provided name
https://bugzilla.redhat.com/show_bug.cgi?id=1138516

If the provided volume name doesn't match what parted generated as the
partition name, then return a failure.

Update virsh.pod and formatstorage.html.in to describe the 'name' restriction
for disk pools as well as the usage of the <target>'s <format type='value'>.
2015-01-28 17:28:03 -05:00
John Ferlan
471e1c4e2a storage: When delete extended partition, need to refresh pool
When removing a volume that is the extended partition, all the logical
volume partitions that exist within the extended partition will also be
removed, so we need to refresh the pool to get the updated list.
2015-01-28 17:28:03 -05:00
John Ferlan
bce671b731 storage: Adjust how to refresh extended partition disk data
During virStorageBackendDiskMakeDataVol processing, if we find an extended
partition, then handle it specially when updating the capacity/allocation
rather than calling virStorageBackendUpdateVolInfo.

As it turns out, once a logical partition exists, any attempt to refresh
the pool, or a libvirtd restart/reload, will fail to open the extended
partition device, resulting in the inability to start the pool.
The downside is that we will lose the <permissions> and <timestamps> for
the extended partition upon subsequent restart, refresh, or reload, since
the stat() in virStorageBackendUpdateVolTargetInfoFD will not be called.
However, since it's really only a container and shouldn't be used directly
for storage, that seems reasonable.

Therefore, use the existing code, which already had a comment about
getting the allocation wrong for extended partitions, only for setting
the extended partition data.
2015-01-28 17:28:03 -05:00
John Ferlan
a0d88ed4e7 storage: Fix check for partition type for disk backing volumes
While checking the existing partitions in virStorageBackendDiskPartFormat,
the code would erroneously compare the volume target format type (e.g., the
virStoragePartedFsType) rather than the source partition type (e.g., the
virStorageVolTypeDisk), which is set during virStorageBackendDiskReadPartitions.
2015-01-28 17:28:03 -05:00
John Ferlan
290ffcfbbc storage: Attempt error recovery in virStorageBackendDiskCreateVol
During virStorageBackendDiskCreateVol, if virStorageBackendDiskReadPartitions
failed, we returned an error but left behind a partition on the disk for
which there was no corresponding volume, using space on the disk that
could only be reclaimed through direct parted activity. On a subsequent
restart, reload, or refresh, the volume may also magically appear.
2015-01-28 17:28:03 -05:00
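The recovery added in 290ffcfbbc (and enabled by moving the delete function ahead of the create function in 1e79ad6d35) follows a create-then-roll-back pattern. The sketch below models the disk with a hypothetical in-memory counter instead of calling parted; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory "disk" standing in for parted operations. */
typedef struct {
    int npartitions;
    bool fail_read;    /* inject a failure in read_partitions() */
} fake_disk;

static int create_partition(fake_disk *d) { d->npartitions++; return 0; }
static int read_partitions(const fake_disk *d) { return d->fail_read ? -1 : 0; }

/* Defined before create_vol() so it can be called from its error path. */
static int
delete_partition(fake_disk *d)
{
    assert(d->npartitions > 0);
    d->npartitions--;
    return 0;
}

/* Create a volume; if re-reading the partition table fails, remove the
 * partition just created so the disk is not left with an orphan. */
static int
create_vol(fake_disk *d)
{
    if (create_partition(d) < 0)
        return -1;

    if (read_partitions(d) < 0) {
        /* Best-effort rollback; the original failure is still returned. */
        delete_partition(d);
        return -1;
    }
    return 0;
}
```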
John Ferlan
1e79ad6d35 storage: Move virStorageBackendDiskDeleteVol
Move the API to before virStorageBackendDiskCreateVol in order to be
able to call the DeleteVol API when virStorageBackendDiskReadPartitions
fails so that we don't by chance leave a partition on the disk.
2015-01-28 17:28:03 -05:00
Pavel Hrdina
259dfe24a8 libvirt.spec: remove vbox storage and network .so files
Commit 55ea7be7 removed the separate modules for the vbox_network and
vbox_storage drivers but forgot to update the libvirt.spec.in file. This
patch fixes the RPM build.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-01-28 19:05:25 +01:00
Luyao Huang
f76df311e8 qemu: fix cannot set graphic passwd via qemuDomainSaveImageDefineXML
https://bugzilla.redhat.com/show_bug.cgi?id=1183890

When we try to update the XML in an image file, we clear the
graphics passwd settings: because we do not pass VIR_DOMAIN_XML_SECURE
to qemuDomainDefCopy, qemuDomainDefFormatBuf won't format the passwd.

Add the VIR_DOMAIN_XML_SECURE flag when calling qemuDomainDefCopy
in qemuDomainSaveImageUpdateDef.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
2015-01-28 16:56:34 +01:00
Ján Tomko
21e0e8866e hotplug: only add a chardev to vmdef after monitor call
https://bugzilla.redhat.com/show_bug.cgi?id=1161024

This way the device is in vmdef only if ret = 0 and the caller
(qemuDomainAttachDeviceFlags) does not free it.

Otherwise it might get double freed by qemuProcessStop
and qemuDomainAttachDeviceFlags if the domain crashed
in monitor after we've added it to vm->def.
2015-01-28 10:10:54 +01:00
Ján Tomko
daf51be5f1 Split qemuDomainChrInsert into two parts
Do the allocation first, then add the actual device.
The second part should never fail. This is good
for live hotplug where we don't want to remove the device
on OOM after the monitor command succeeded.

The only change in behavior is that on failure, the
vmdef->consoles array is freed, not just the first console.
2015-01-27 18:30:15 +01:00
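The split in daf51be5f1 (allocate first, then perform an insert that cannot fail) can be sketched with a hypothetical growable array. In a hotplug flow the reserve step would run before the monitor command and the insert after it succeeds, which is the ordering 21e0e8866e relies on; the names below are illustrative, not libvirt's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of console pointers. */
typedef struct {
    void **items;
    size_t count;
    size_t capacity;
} console_list;

/* Step 1: may fail (allocation). Run before the monitor command. */
static int
console_list_reserve(console_list *l)
{
    if (l->count < l->capacity)
        return 0;
    size_t newcap = l->capacity ? l->capacity * 2 : 4;
    void **tmp = realloc(l->items, newcap * sizeof(*tmp));
    if (!tmp)
        return -1;
    l->items = tmp;
    l->capacity = newcap;
    return 0;
}

/* Step 2: cannot fail; only valid after a successful reserve. */
static void
console_list_insert(console_list *l, void *item)
{
    assert(l->count < l->capacity);
    l->items[l->count++] = item;
}
```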
Daniel P. Berrange
a2bdfa5261 lxc: report veth device indexes to systemd
Record the index of each host-side veth device created and report
them to systemd, so they show up in machinectl status for the VM.

lxc-shell(95449419f969d649d9962566ec42af7d)
     Since: Fri 2015-01-16 16:53:37 GMT; 3s ago
    Leader: 28085 (sh)
   Service: libvirt-lxc; class container
     Iface: vnet0
   Address: fe80::216:3eff:fe00:c317%124
        OS: Fedora 21 (Twenty One)
      Unit: machine-lxc\x2dshell.scope
            └─28085 /bin/sh
2015-01-27 13:57:02 +00:00
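Reporting interface indexes, as a2bdfa5261 does for veth devices (and f7afeddce9 for TAP devices), starts with resolving each host-side interface name to its kernel index. A minimal sketch using the POSIX `if_nametoindex()` call follows; `collect_if_indexes()` is a hypothetical helper. systemd-machined's D-Bus API offers CreateMachineWithNetwork, which accepts such a list of indices, though how libvirt passes the list is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <net/if.h>   /* if_nametoindex() */

/* Resolve each host-side interface name (veth or TAP) to its kernel
 * index. Returns 0 on success, or -1 if any name cannot be resolved. */
static int
collect_if_indexes(const char *const *names, size_t n, int *indexes)
{
    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int idx = if_nametoindex(names[i]);
        if (idx == 0)
            return -1;   /* errno describes why */
        indexes[i] = (int)idx;
    }
    return 0;
}
```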
Daniel P. Berrange
e4fc4f0c99 lxc: more logging during startup paths
Add more logging to the lxc controller and container files to
facilitate debugging startup problems. Also make it clear when
the container is going to close stdout and thus no longer do
any logging.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
4acb01e43e lxc: delay setup of cgroup until we have the init pid
Don't create the cgroups ahead of launching the container since
there is no need for the limits to apply during initial bootstrap.
Create the cgroup after the container PID is known and tell
systemd the initpid is the leader, instead of the controller
pid.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
0a8addc103 lxc: only write XML once for lxc controller
Currently when launching the LXC controller we first write out
the plain, inactive XML configuration, then launch the controller,
then replace the file with the live status XML configuration.
By good fortune this hasn't caused any problems other than some
misleading error messages during failure scenarios.

This simplifies the code so it only writes out the XML once and
always writes the live status XML. To do this we need to handshake
with the child process, to make execution pause just before exec()
so we can write the XML status with the child PID present.
2015-01-27 13:57:02 +00:00
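The handshake described in 0a8addc103 (pause the child just before exec() so the parent can write status XML that already contains the child PID) can be sketched with a pipe. This is a simplified illustration: the child exits instead of calling exec(), the "status" is just the PID written to a file, and `run_with_handshake()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that blocks on a pipe until the parent releases it. In
 * between, the parent knows the child PID and records it in status_path.
 * Returns the child's exit status, or -1 on any failure. */
static int
run_with_handshake(const char *status_path)
{
    int sync_pipe[2];
    int status;
    int wrote_status = 0;
    ssize_t n;
    pid_t pid;
    FILE *f;

    assert(status_path != NULL);
    if (pipe(sync_pipe) < 0)
        return -1;

    if ((pid = fork()) < 0) {
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait for the go-ahead. A real launcher would exec() next. */
        char c;
        close(sync_pipe[1]);
        _exit(read(sync_pipe[0], &c, 1) == 1 ? 0 : 1);
    }

    close(sync_pipe[0]);

    /* Parent: record the PID while the child is still paused. */
    if ((f = fopen(status_path, "w")) != NULL) {
        wrote_status = fprintf(f, "%ld\n", (long)pid) > 0;
        if (fclose(f) != 0)
            wrote_status = 0;
    }

    /* Release the child only if the status was recorded; otherwise closing
     * the pipe gives it EOF and it exits with status 1. */
    n = wrote_status ? write(sync_pipe[1], "x", 1) : 0;
    close(sync_pipe[1]);

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (!wrote_status || n != 1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```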
Daniel P. Berrange
e1de552150 lxc: re-arrange startup synchronization sequence with controller
Currently the lxc controller process itself is responsible for
daemonizing itself into the background and writing out its pid
file. The lxc driver would fork the controller and then attempt
to connect to the lxc monitor. This connection would only
succeed after the controller has backgrounded itself, set up
cgroups and written its pid file, so startup was race free.

The problem is that we need to delay creation of the cgroups until
much later, so that we can tell systemd the container init
pid when we create the cgroups. If we delay cgroup creation,
though, the current synchronization won't work.

A second problem is that the controller needs the XML config
of the guest. Currently we write out the plain virDomainDefPtr
XML before starting the controller, and then later replace it
with the full virDomainObjPtr status XML. This is kind of gross
and also means that the controller doesn't get a record of the
live XML config right away. This means it doesn't have a record
of the veth device names either and so can't give that info
to systemd when creating the cgroups.

To address this we change the startup sequencing. The goal
is that we want to get the PID as soon as possible, before
the LXC controller even starts. So we stop letting the LXC
controller daemonize itself, and instead use virCommand's
built-in capabilities. This daemonizes and writes the PID
before LXC controller is exec'd. So the driver can read
the PID as soon as virCommandRun returns. It is no longer
safe to connect to the monitor or detect the cgroups though.

Fortunately the LXC controller already has a second point
of synchronization. Immediately before its event loop
starts running, it performs a handshake with the driver.
So we move the opening of the monitor connection and cgroup
detection after this synchronization point.
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
a5979e3374 lxc: don't build pidfile string multiple times
Build the pidfile string once when starting a guest and then
use the same string thereafter. This will benefit following
patches which need the pidfile string in more situations.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 13:57:02 +00:00
Daniel P. Berrange
b3e4401dc6 systemd: don't report an error if the guest is already terminated
In many cases where we invoke virSystemdTerminateMachine the
process(es) will have already gone away of their own accord.
In these cases we log an error message that the machine does
not exist. We should catch this particular error and simply
ignore it, so we don't pollute the logs.
2015-01-27 13:57:02 +00:00
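The idea in b3e4401dc6 is to treat one specific failure from the terminate request as success. A minimal sketch, assuming the error arrives as a D-Bus error name: `org.freedesktop.machine1.NoSuchMachine` is the name systemd-machined uses for an unknown machine, but whether libvirt matches on exactly this string is an assumption, and `handle_terminate_result()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* D-Bus error name systemd-machined uses for an unknown machine. */
#define MACHINED_NO_SUCH_MACHINE "org.freedesktop.machine1.NoSuchMachine"

/* Hypothetical helper: decide how to treat the result of a
 * TerminateMachine call. error_name is NULL when the call succeeded. */
static int
handle_terminate_result(const char *error_name)
{
    if (error_name == NULL)
        return 0;
    /* The guest already went away of its own accord: not an error. */
    if (strcmp(error_name, MACHINED_NO_SUCH_MACHINE) == 0)
        return 0;
    /* Anything else is still worth reporting. */
    return -1;
}
```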
Daniel P. Berrange
f7afeddce9 qemu: report TAP device indexes to systemd
Record the index of each TAP device created and report them to
systemd, so they show up in machinectl status for the VM.
2015-01-27 13:57:02 +00:00
Ján Tomko
d0ab79e9cd Fix shadowed variable warning
libvirtd.c: In function 'daemonSetupAccessManager':
libvirtd.c:730:18: error: declaration of 'driver' shadows
  a global declaration [-Werror=shadow]
     const char **driver = (const char **)config->access_drivers;
                  ^
In file included from libvirtd.c:95:0:
../src/node_device/node_device_driver.h:43:36: error: shadowed
  declaration is here [-Werror=shadow]
 extern virNodeDeviceDriverStatePtr driver;
                                    ^
2015-01-27 13:43:23 +01:00
Chen Hanxiao
95da191376 storage: add a flag to clone files on btrfs
When creating a RAW file, we don't take advantage
of btrfs's clone feature.

Add a VIR_STORAGE_VOL_CREATE_REFLINK flag to request
a reflink copy.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-01-27 13:41:14 +01:00
Chen Hanxiao
466b29c8c3 storage: introduce btrfsCloneFile() for COW copy
Add a wrapper for BTRFS_IOC_CLONE ioctl.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-01-27 13:24:10 +01:00
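A wrapper for the BTRFS_IOC_CLONE ioctl (466b29c8c3) and a caller that honours a reflink request (95da191376) might look like the sketch below. The names are hypothetical, and this sketch treats a requested reflink as mandatory, with no silent fallback to a full copy; whether libvirt behaves exactly this way is an assumption. On filesystems other than btrfs the ioctl is expected to fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>   /* BTRFS_IOC_CLONE */

/* Share src's data extents with dst (copy-on-write); btrfs only.
 * Returns 0 on success, or -1 with errno set by ioctl(). */
static int
btrfs_clone_file(int dst_fd, int src_fd)
{
    return ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd);
}

/* Plain byte-for-byte copy from src's current offset to dst. */
static int
copy_file_data(int dst_fd, int src_fd)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(src_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}

/* Hypothetical caller: a requested reflink either succeeds or fails. */
static int
copy_volume(int dst_fd, int src_fd, bool reflink)
{
    assert(dst_fd >= 0 && src_fd >= 0);
    if (reflink)
        return btrfs_clone_file(dst_fd, src_fd);
    return copy_file_data(dst_fd, src_fd);
}
```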
Daniel P. Berrange
55ea7be7d9 Removing probing of secondary drivers
For stateless, client side drivers, it is never correct to
probe for secondary drivers. It is only ever appropriate to
use the secondary driver that is associated with the
hypervisor in question. As a result the ESX & HyperV drivers
have both been forced to do hacks where they register no-op
drivers for the ones they don't implement.

For stateful, server side drivers, we always just want to
use the same built-in shared driver. The exception is
virtualbox which is really a stateless driver and so wants
to use its own server side secondary drivers. To deal with
this virtualbox has to be built as 3 separate loadable
modules to allow registration to work in the right order.

This can all be simplified by introducing a new struct
recording the precise set of secondary drivers each
hypervisor driver wants:

struct _virConnectDriver {
    virHypervisorDriverPtr hypervisorDriver;
    virInterfaceDriverPtr interfaceDriver;
    virNetworkDriverPtr networkDriver;
    virNodeDeviceDriverPtr nodeDeviceDriver;
    virNWFilterDriverPtr nwfilterDriver;
    virSecretDriverPtr secretDriver;
    virStorageDriverPtr storageDriver;
};

Instead of registering the hypervisor driver, we now
just register a virConnectDriver instead. This allows
us to remove all probing of secondary drivers. Once we
have chosen the primary driver, we immediately know the
correct secondary drivers to use.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 12:02:04 +00:00
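The effect of registering a bundle like struct _virConnectDriver (55ea7be7d9) is that choosing the primary driver also fixes the secondary drivers, so no probing step remains. The sketch below uses hypothetical cut-down types and selects the bundle by a URI scheme string, which is a simplification of how a primary driver is actually chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down driver tables; libvirt's are far larger. */
typedef struct { const char *name; } hypervisor_driver;
typedef struct { const char *name; } network_driver;
typedef struct { const char *name; } storage_driver;

/* One bundle per hypervisor, in the spirit of struct _virConnectDriver. */
typedef struct {
    const char *uri_scheme;
    const hypervisor_driver *hypervisor;
    const network_driver *network;  /* NULL if unsupported */
    const storage_driver *storage;  /* NULL if unsupported */
} connect_driver;

#define MAX_DRIVERS 8
static const connect_driver *registered[MAX_DRIVERS];
static size_t nregistered;

static int
register_connect_driver(const connect_driver *drv)
{
    assert(drv != NULL && drv->hypervisor != NULL);
    if (nregistered == MAX_DRIVERS)
        return -1;
    registered[nregistered++] = drv;
    return 0;
}

/* Choosing the bundle also fixes its secondary drivers: no probing. */
static const connect_driver *
open_connection(const char *scheme)
{
    for (size_t i = 0; i < nregistered; i++) {
        if (strcmp(registered[i]->uri_scheme, scheme) == 0)
            return registered[i];
    }
    return NULL;
}
```

A driver that lacks, say, storage support simply leaves that pointer NULL instead of registering a no-op driver.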
Daniel P. Berrange
220c01aa0a don't disable state driver when libvirtd is not built
A bunch of code is wrapped in #if WITH_LIBVIRTD in order to
enable the virStateDriver to be disabled when libvirtd is not
built. Disabling this code doesn't have any real functional
benefit beyond removing 1 pointer from the virConnectPtr struct,
while having a cost of many more conditionals.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
f35fa0fd95 Remove all secondary driver private data fields
Now that all drivers are converted to use their global state
directly, there is no need for private data fields for
the secondary drivers in virConnectPtr.
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
d85f9f1a7e Remove use of interfacePrivateData from udev driver
The udev driver can be implemented using global state instead
of the connect private data.
2015-01-27 12:02:04 +00:00
Daniel P. Berrange
60b966b378 Remove use of nodeDevicePrivateData from nodeDev driver
The node device driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
47b7f661a4 Remove use of storagePrivateData/networkPrivateData from vbox
The vbox driver can use the main hypervisor private data and
so does not need to use the storage/network private data fields.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
7b1ba9566b Remove use of nwfilterPrivateData from nwfilter driver
The nwfilter driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Daniel P. Berrange
04101f23d0 Remove use of secretPrivateData from secret driver
The secret driver can rely on its global state instead
of the connect private data.
2015-01-27 12:02:03 +00:00
Peter Krempa
d13f56f08a qemu: Fix job handling in qemuDomainSetMetadata
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:39:21 +01:00
Peter Krempa
fb2ed975c3 qemu: Fix job type in qemuDomainGetBlockIoTune
The function just queries status so there's no need for a MODIFY type
job.
2015-01-27 10:39:21 +01:00
Peter Krempa
c5ee5cfb18 qemu: Fix job handling in qemuDomainSetSchedulerParametersFlags
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:38:47 +01:00
Peter Krempa
4fd7a72075 qemu: Fix job handling in qemuDomainSetMemoryParameters
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:24:04 +01:00
Peter Krempa
e3e72743df qemu: Fix job handling in qemuDomainSetAutostart
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.

This patch also fixes a few very long lines of code around the touched
parts.
2015-01-27 10:24:04 +01:00
Peter Krempa
79e5603307 qemu: Fix job handling in qemuDomainPinEmulator
The code modifies the domain configuration but doesn't take a MODIFY
type job to do so.
2015-01-27 10:24:04 +01:00
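The job-handling fixes above share one rule: code that changes domain configuration takes a MODIFY job, while pure queries take a QUERY job. The sketch below is a hypothetical, simplified job model; the names imitate libvirt's QEMU_JOB_QUERY and QEMU_JOB_MODIFY, but the admission rule here (only queries may run while an async job is active) is a simplification of libvirt's per-async-job mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical job model; rules are simplified. */
typedef enum { JOB_NONE, JOB_QUERY, JOB_MODIFY } job_type;

typedef struct {
    job_type active;   /* synchronous job currently held */
    bool async_job;    /* e.g. a migration or save in progress */
    bool autostart;    /* a piece of domain configuration */
} domain_obj;

static int
begin_job(domain_obj *vm, job_type type)
{
    if (vm->active != JOB_NONE)
        return -1;              /* the real code waits instead */
    if (vm->async_job && type != JOB_QUERY)
        return -1;              /* only queries may run alongside */
    vm->active = type;
    return 0;
}

static void
end_job(domain_obj *vm)
{
    assert(vm->active != JOB_NONE);
    vm->active = JOB_NONE;
}

/* Changing configuration takes a MODIFY job ... */
static int
set_autostart(domain_obj *vm, bool value)
{
    if (begin_job(vm, JOB_MODIFY) < 0)
        return -1;
    vm->autostart = value;
    end_job(vm);
    return 0;
}

/* ... while a read-only query only needs a QUERY job. */
static int
get_autostart(domain_obj *vm, bool *value)
{
    if (begin_job(vm, JOB_QUERY) < 0)
        return -1;
    *value = vm->autostart;
    end_job(vm);
    return 0;
}
```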