10526 Commits

Author SHA1 Message Date
Peter Krempa
f094aaac48 qemu: Prefer VFIO for PCI device passthrough
Prefer using VFIO (if available) to the legacy KVM device passthrough.

With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
2013-10-10 12:00:56 +02:00
Peter Krempa
467b561ac2 qemu: hostdev: Add checks if PCI passthrough is available in the host
Add code to check availability of PCI passhthrough using VFIO and the
legacy KVM passthrough and use it when starting VMs and hotplugging
devices to live machine.
2013-10-10 10:35:01 +02:00
Peter Krempa
f24150b1f5 qemu: hostdev: Fix function spacing and header formatting 2013-10-10 10:32:07 +02:00
Peter Krempa
a863b89010 qemu: refactor qemuCompressProgramAvailable() 2013-10-09 18:26:48 +02:00
Peter Krempa
f2b0a5336e qemu: Fix coding style in qemuDomainSaveFlags()
Avoid mixed brace style in an if statement and fix formatting of error
messages.
2013-10-09 18:26:48 +02:00
Ján Tomko
3f029fb531 LXC: Fix handling of RAM filesystem size units
Since 76b644c when the support for RAM filesystems was introduced,
libvirt accepted the following XML:
<source usage='1024' unit='KiB'/>

This was parsed correctly and internally stored in bytes, but it
was formatted as (with an extra 's'):
<source usage='1024' units='KiB'/>
When read again, this was treated as if the units were missing,
meaning libvirt was unable to parse its own XML correctly.

The usage attribute was documented as being in KiB, but it was not
scaled if the unit was missing. Transient domains still worked,
because this was balanced by an extra 'k' in the mount options.

This patch:
Changes the parser to use 'units' instead of 'unit', as the latter
was never documented (fixing persistent domains) and some programs
(libvirt-glib, libvirt-sandbox) already parse the 'units' attribute.

Removes the extra 'k' from the tmpfs mount options, which is needed
because now we parse our own XML correctly.

Changes the default input unit to KiB to match documentation, fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1015689
2013-10-09 17:44:45 +02:00
Chen Hanxiao
fc9a416df7 cgroup: fix a comment typo in vircgroup.c
s/shoule/should

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-09 17:16:58 +02:00
Ján Tomko
63b6e59fd0 storage: Use bool instead of int
Commit 532fef3 added two-state 'need_alloc' and exposed
'want_sparse' which also only has two states.

Change their type from int to bool.
2013-10-09 09:37:12 +02:00
Giuseppe Scrivano
a90b9778c2 build: fix linker error on FreeBSD
Commit 2d74822a9eb4856c7f5216bb92bcb76630660f72 renamed
"freebsdNodeGetCPUCount" to "appleFreebsdNodeGetCPUCount", leaving one
call to "freebsdNodeGetCPUCount".  Fix this other case.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-08 12:45:20 -06:00
Peter Krempa
9d13298901 qemu: hostdev: Refactor PCI passhrough handling
To simplify future patches dealing with this code, simplify and refactor
some conditions to switch statements.
2013-10-08 15:24:27 +02:00
Michal Privoznik
4b744d7d00 virerror: s/VIR_ERR_STORAGE_VOL_EXISTS/VIR_ERR_STORAGE_VOL_EXISTS/
We currently have other error codes in singular form, e.g.
VIR_ERR_NETWORK_EXIST. Cleanup the previous patch to match the form.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-07 19:21:47 +02:00
Hongwei Bi
91875896d5 fix a ambiguous output of the command:'virsh vol-create-as'
I created a storage volume(eg: test) from a storage pool(eg:vg10) using
the following command:"virsh vol-create-as --pool vg10 --name test --capacity 300M."
When I re-executed the above command, the output was as the following:
"error: Failed to create vol test
 error: Storage volume not found: storage vol 'test' already exists"

I think the output "Storage volume not found" is not appropriate. Because in fact storage
vol test has been found at this time. And then I think virErrorNumber should includes
VIR_ERR_STORAGE_EXIST which can also be used elsewhere. So I make this patch. The result
is as following:
"error: Failed to create vol test
 error: storage volume 'test' exists already"
2013-10-07 18:26:09 +02:00
Daniel P. Berrange
999d72fbd5 Remove use of virConnectPtr from all remaining nwfilter code
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.

Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.

The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-07 14:19:10 +01:00
Daniel P. Berrange
ebca369e3f Don't pass virConnectPtr in nwfilter 'struct domUpdateCBStruct'
The nwfilter driver only needs a reference to its private
state object, not a full virConnectPtr. Update the domUpdateCBStruct
struct to have a 'void *opaque' field instead of a virConnectPtr.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-07 14:19:10 +01:00
Daniel P. Berrange
b77b16ce41 Remove virConnectPtr arg from virNWFilterDefParse*
None of the virNWFilterDefParse* methods require a virConnectPtr
arg, so just drop it

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-07 14:19:10 +01:00
Claudio Bley
609eb987c6 Adjust legacy max payload size to account for header information
Commit 27e81517a87 set the payload size to 256 KB, which is
actually the max packet size, including the size of the header.

Reduce this by VIR_NET_MESSAGE_HEADER_MAX (24) and set
VIR_NET_MESSAGE_LEGACY_PAYLOAD_MAX to 262120, which was the original
value before increasing the limit in commit eb635de1fed.
2013-10-07 13:28:44 +02:00
Ryota Ozaki
2d74822a9e nodeinfo: make freebsdNodeGetCPUCount work on Mac OS X
This fixes the following error:
  error : nodeGetInfo:933 : this function is not supported
  by the connection driver: node info not implemented on this platform

The freebsdNodeGetCPUCount was renamed to appleFreebsdNodeGetCPUCount
in order to make more visible the fact, that it works on Mac OS X too.

Mac OS X can use sysctlbyname as same as FreeBSD to get the CPU
frequency. However, the MIB style name is different from FreeBSD's.
And the unit of the return frequency is also different.

Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-07 10:28:18 +02:00
Ryota Ozaki
5a468b38b6 rpc: fix getsockopt for LOCAL_PEERCRED on Mac OS X
This fixes the following error:
  error : virGetUserEnt:703 : Failed to find user record for uid '32654'

'32654' (it's random and varies) comes from getsockopt with
LOCAL_PEERCRED option. getsockopt returns w/o error but seems
to not set any value to the buffer for uid.

For Mac OS X, LOCAL_PEERCRED has to be used with SOL_LOCAL level.
With SOL_LOCAL, getsockopt returns a correct uid.

Note that SOL_LOCAL can be found in
/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys/un.h.

Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-07 10:18:55 +02:00
Eric Blake
51c8216594 build: fix build on RHEL 5
On RHEL 5, compilation fails with:

storage/storage_backend.c: In function 'createRawFile':
storage/storage_backend.c:339: warning: implicit declaration of function 'fallocate'
storage/storage_backend.c:339: warning: nested extern declaration of 'fallocate' [-Wnested-externs]

But:

$ grep HAVE_FALLOCATE config.h
/* #undef HAVE_FALLOCATE */

Huh? It turns out that in kernels that old, fallocate() is not
implemented (config.h is correct), but <linux/fs.h> defines
HAVE_FALLOCATE as an empty witness macro for a completely
different purpose.  Since storage_backend.c is including
<linux/fs.h> on RHEL 5, we are hosed by the kernel definition.
Newer kernels no longer pollute the namespace, and it's fairly
easy to convert to an expression that works with both the old
kernel witness and the new-style config.h (undefined or 1).

Problem introduced in commit 532fef3.

* src/storage/storage_backend.c (createRawFile): Avoid namespace
pollution from kernel, by checking HAVE_FALLOCATE for a value.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-04 17:33:37 -06:00
Eric Blake
bdc55cc7d2 build: fix build --without-remote
I tried to test ./configure --without-lxc --without-remote.
First, the build failed with some odd errors, such as an
inability to build xen, or link failures for virNetTLSInit.
But when you think about it, once there is no remote code,
all of libvirtd is useless, any stateful driver that depends
on libvirtd is also not worth compiling, and any libraries
used only by RPC code are not needed.  So I patched
configure.ac to make for some saner defaults when an
explicit disable is attempted.  Similarly, since we have
migrated virnetdevbridge into generic code, the workaround
for Linux kernel stupidity must not depend on stateful
drivers being in use.

Then there's 'make check' that needs segregation.

Wow - quite a bit of cleanup to make --without-remote useful :)

* configure.ac: Let --without-remote toggle defaults on stateful
drivers and other libraries.  Pick up Linux kernel workarounds
even when qemu and lxc are not being compiled.
* tests/Makefile.am (test_programs): Factor out programs that
require remote.
* src/libvirt_private.syms (rpc/virnet*.h): Move...
* src/libvirt_remote.syms: ...into new file.
* src/Makefile.am (SYM_FILES): Ship new syms file.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-04 17:01:47 -06:00
Oskari Saarenmaa
532fef369f storage: fix file allocation behavior in file cloning
Fixed the safezero call for allocating the rest of the file after cloning
an existing volume; it used to always use a zero offset, causing it to
only allocate the beginning of the file.

Also modified file creation to try to use fallocate(2) to pre-allocate
disk space before copying any data to make sure it fails early on if disk
is full and makes sure we can skip zero blocks when copying file contents.

If fallocate isn't available we will zero out the rest of the file after
cloning and only use sparse cloning if client requested a lower allocation
than the input volume's capacity.

Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
2013-10-04 16:18:44 +02:00
Oskari Saarenmaa
b63a1d0e95 virfile: safezero: fix buffer allocation max size
My previous commit 7dc1d4ab was supposed to change safezero to allocate
1 megabyte at maximum, but had the logic reversed and will allocate 1
megabyte at minimum (and a lot more at maximum.)

Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
2013-10-04 16:10:27 +02:00
Cole Robinson
fc9ff1f249 test: Fix coverity warnings 2013-10-04 10:06:56 -04:00
Peter Krempa
f8e2da01be qemu: Use maximum guest memory size when getting NUMA placement advice
When starting the VM the guest balloon driver is not loaded at that
time. We need to ask numad for placement of the complete VM.
2013-10-04 14:57:54 +02:00
Gao feng
391b82722e Free cmd in virNetDevVethCreate
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-10-04 12:03:19 +01:00
Gao feng
524b21979a Free cmd in virNetDevVethDelete
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-10-04 12:02:38 +01:00
Oskari Saarenmaa
7dc1d4ab89 virfile: safezero: fall back to writing block by block if mmap fails
mmap can fail on 32-bit systems if we're trying to zero out a lot of data.
Fall back to using block-by-block writing in that case.  While we could map
smaller blocks it's unlikely that this code is used a lot and its easier to
just fall back to one of the existing methods.

Also modified the block-by-block zeroing to not allocate a megabyte of
zeroes if we're writing less than that.

Signed-off-by: Oskari Saarenmaa <os@ohmu.fi>
2013-10-04 08:22:36 +02:00
Cole Robinson
68cc45b6f9 test: snapshot: Add REDEFINE support 2013-10-03 17:31:55 -04:00
Cole Robinson
670e86bfd7 qemu: snapshot: Break out redefine preparation to shared function 2013-10-03 17:31:55 -04:00
Cole Robinson
390c06b675 test: Implement snapshot create/delete/revert APIs
Again stolen from qemu_driver.c, but dropping all the unneeded bits.
This aims to copy all the current qemu validation checks since that's
the most commonly used real driver, but some of the checks are
completely artificial in the test driver.

This only supports creation of internal snapshots for initial
simplicity.
2013-10-03 17:26:50 -04:00
Cole Robinson
1d24185284 test: Allow specifying domainsnapshot XML
The user can pass it as a <test:domainsnapshot> subelement of a <domain>.
2013-10-03 16:52:54 -04:00
Cole Robinson
56ff156d15 qemu: snapshots: Simplify REDEFINE flag check
Makes things more readable IMO
2013-10-03 16:52:54 -04:00
Laine Stump
9881bfed25 qemu: check actual netdev type rather than config netdev type during init
This resolves:

   https://bugzilla.redhat.com/show_bug.cgi?id=1012824
   https://bugzilla.redhat.com/show_bug.cgi?id=1012834

Note that a similar problem was reported in:

   https://bugzilla.redhat.com/show_bug.cgi?id=827519

but the fix only worked for <interface type='hostdev'>, *not* for
<interface type='network'> where the network itself was a pool of
hostdevs.

The symptom in both cases was this error message:

   internal error: Unable to determine device index for network device

In both cases the cause was lack of proper handling for netdevs
(<interface>) of type='hostdev' when scanning the netdev list looking
for alias names in qemuAssignDeviceNetAlias() - those that aren't
type='hostdev' have an alias of the form "net%d", while those that are
hostdev use "hostdev%d". This special handling was completely lacking
prior to the fix for Bug 827519 which was:

When searching for the highest alias index, libvirt looks at the alias
for each netdev and if it is type='hostdev' it ignores the entry. If
the type is not hostdev, then it expects the "net%d" form; if it
doesn't find that, it fails and logs the above error message.

That fix works except in the case of <interface type='network'> where
the network uses hostdev (i.e. the network is a pool of VFs to be
assigned to the guests via PCI passthrough). In this case, the check
for type='hostdev' would fail because it was done as:

     def->net[i]->type == VIR_DOMAIN_NET_TYPE_HOSTDEV

(which compares what was written in the config) when it actually
should have been:

    virDomainNetGetActualType(def->net[i]) == VIR_DOMAIN_NET_TYPE_HOSTDEV

(which compares the type of netdev that was actually allocated from
the network at runtime).

Of course the latter wouldn't be of any use if the netdevs of
type='network' hadn't already acquired their actual network connection
yet, but manual examination of the code showed that this is never the
case.

While looking through qemu_command.c, two other places were found to
directly compare the net[i]->type field rather than getting actualType:

* qemuAssignDeviceAliases() - in this case, the incorrect comparison
  would cause us to create a "net%d" alias for a netdev with
  type='network' but actualType='hostdev'. This alias would be
  subsequently overwritten by the proper "hostdev%d" form, so
  everything would operate properly, but a string would be
  leaked. This patch also fixes this problem.

* qemuAssignDevicePCISlots() - would defer assigning a PCI address to
  a netdev if it was type='hostdev', but not for type='network +
  actualType='hostdev'. In this case, the actual device usually hasn't
  been acquired yet anyway, and even in the case that it has, there is
  no practical difference between assigning a PCI address while
  traversing the netdev list or while traversing the hostdev
  list. Because changing it would be an effective NOP (but potentially
  cause some unexpected regression), this usage was left unchanged.
2013-10-03 11:06:45 -04:00
Daniel P. Berrange
fe3f108d85 Use 'vnet' as prefix for veth devices
The XML parser reserves 'vnet' as a prefix for automatically
generated NIC device names. Switch the veth device creation
to use this prefix, so it does not have to worry about clashes
with user specified names in the XML.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:28:44 +01:00
Daniel P. Berrange
f2e53555eb Retry veth device creation on failure
The veth device creation code run in two steps, first it looks
for two free veth device names, then it runs ip link to create
the veth pair. There is an obvious race between finding free
names and creating them, when guests are started in parallel.

Rewrite the code to loop and re-try creation if it fails, to
deal with the race condition.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:28:30 +01:00
Daniel P. Berrange
8766e9b5a5 Avoid deleting NULL veth device name
If veth device allocation has a fatal error, the veths
array may contain NULL device names. Avoid calling the
virNetDevVethDelete function on such names.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:28:08 +01:00
Daniel P. Berrange
10caf94ddc Avoid reporting an error if veth device is already deleted
The kernel automatically destroys veth devices when cleaning
up the container network namespace. During normal shutdown, it
is thus likely that the attempt to run 'ip link del vethN'
will fail. If it fails, check if the device exists, and avoid
reporting an error if it has gone. This switches to use the
virCommand APIs instead of virRun too.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:28:06 +01:00
Daniel P. Berrange
f5eae57086 Don't set netdev offline in container cleanup
During container cleanup there is a race where the kernel may
have destroyed the veth device before we try to set it offline.
This causes log error messages. Given that we're about to
delete the device entirely, setting it offline is pointless.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:25:20 +01:00
Michal Privoznik
3e8343e151 qemuMonitorJSONSendKey: Avoid double free
After successful @cmd construction the memory where @keys points to is
part of @cmd. Avoid double freeing it.
2013-10-03 08:57:57 +02:00
Michal Privoznik
ec07a9e84b qemuMonitorJSONGetVirtType: Fix error message
When querying for kvm, we try to find 'enabled' field. Hence the error
message should report we haven't found 'enabled' and not 'running'
(which is not even in the reply). Probably a typo or copy-paste error.
2013-10-03 08:57:50 +02:00
Michal Privoznik
9fa10d3901 qemu_hotplug: Allow QoS update in qemuDomainChangeNet
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-02 10:48:03 +02:00
Michal Privoznik
ee02fbc8e4 virNetDevBandwidthEqual: Make it more robust
So far the virNetDevBandwidthEqual() expected both ->in and ->out items
to be allocated for both @a and @b compared. This is not necessary true
for all our code. For instance, running 'update-device' twice over a NIC
with the very same XML results in SIGSEGV-ing in this function.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-02 10:47:22 +02:00
Cole Robinson
c4510fd147 test: Implement readonly snapshot APIs
This is just stolen from qemu_driver.c with tweaks to fit the
test driver.
2013-10-01 11:59:07 -04:00
Cole Robinson
25314fa6c5 test: Wire up managed save APIs
Also add a <test:hasmanagedsave> element to set this data when starting
the connection.
2013-10-01 11:33:56 -04:00
Cole Robinson
d82ea6ec4e test: Allow specifying object transient state in driver XML
Similar to the runstate commit, allow a boolean <test:transient/>
element for setting domain persistence at driver startup.
2013-10-01 11:27:21 -04:00
Cole Robinson
a924d9d083 qemu: cgroup: Fix crash if starting nographics guest
We can dereference graphics[0] even if guest has no graphics device
configured. I screwed this up in a216e6487255d3b65d97c7ec1fa5da63dbced902

https://bugzilla.redhat.com/show_bug.cgi?id=1014088
2013-10-01 11:22:18 -04:00
Ján Tomko
f1bdcb2be9 selinux: Only close the selabel_handle once
On selinux driver initialization failure (missing/incorrectly
formatted contexts file), selabel_handle was closed twice.

Introduced by 6159710.
2013-10-01 15:00:07 +02:00
Laine Stump
e4e73337e5 util: recognize SMB/CIFS filesystems as shared
This should resolve:

  https://bugzilla.redhat.com/show_bug.cgi?id=1012085

libvirt previously recognized NFS, GFS2, OCFS2, and AFS filesystems as
"shared", and thus eligible for exceptions to certain rules/actions
about chowning image files before handing them off to a guest. This
patch widens the definition of "shared filesystem" to include SMB and
CIFS filesystems (aka "Windows file sharing"); both of these use the
same protocol, but different drivers so there are different magic
numbers for each.
2013-10-01 05:45:05 -04:00
Michal Privoznik
64f1e1688d qemu_capabilities: Introduce virQEMUCapsInitQMPMonitor
This basically covers the talking-to-monitor part of
virQEMUCapsInitQMP.  The patch itself has no real value,
but it creates an entity to be tested in the next patches.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2013-10-01 10:48:47 +02:00
Chen Hanxiao
4b2b078a8b lxc: do cleanup when failed to bind fs as read-only
We forgot to do cleanup when lxcContainerMountFSTmpfs
failed to bind fs as read-only.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2013-09-30 13:30:43 -06:00