Finish is the final state in v2 of our migration protocol. If something
fails, we have no option to abort the migration and resume the original
domain. Non fatal errors (such as failure to start guest CPUs or make
the domain persistent) has to be treated as success. Keeping the domain
running while reporting the failure was just asking for trouble.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Whenever something fails during incoming migration in Finish phase
before we started guest CPUs, we need to kill the domain in addition to
reporting the failure.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When we save status XML at the point during migration where we have
already started the domain on destination, we can't really go back and
abort migration. Thus the only thing we can do is to log a warning and
report success.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Offline migration is quite special because we don't really need to do
anything but make the domain persistent. Let's do it separately from
normal migration to avoid cluttering the code with
!(flags & VIR_MIGRATE_OFFLINE).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
After attach-device a <hostdev> with --config, new device doesn't
show up in dumpxml and in guest.
To fix that, set dev->data.hostdev = NULL after work so that the
pointer is not freed, since vmdef has the pointer and still need it.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Tool such as libguestfs need the datacenter path to get access to disk
images. The ESX driver knows the correct datacenter path, but this
information cannot be accessed using libvirt API yet. Also, it cannot
be deduced from the connection URI in a robust way.
Expose the datacenter path in the domain XML as <vmware:datacenterpath>
node similar to the way the <qemu:commandline> node works. The new node
is ignored while parsing the domain XML. In contrast to <qemu:commandline>
it is output only.
Commit id 'f1f68ca33' added code to remove the directory paths for
auto-generated sockets, but that code could be called before the
paths were created resulting in generating error messages from
virFileDeleteTree indicating that the file doesn't exist.
Rather than "enforce" all callers to make the non-NULL and existence
checks, modify the virFileDeleteTree API to silently ignore NULL on
input and non-existent directory trees.
When looking for a QEMU binary suitable for running ppc64le guests
we have to take into account the fact that we use the QEMU target
as key for the hash, so direct comparison is not good enough.
Factor out the logic from virQEMUCapsFindBinaryForArch() to a new
virQEMUCapsFindTarget() function and use that both when looking
for QEMU binaries available on the system and when looking up
QEMU capabilities later.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1260753
libxl/libxl_conf.c: In function 'libxlDriverConfigNew':
libxl/libxl_conf.c:1560:30: error: 'log_level' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
Instead of a hardcoded DEBUG log level, use the overall
daemon log level specified in libvirtd.conf when opening
a log stream with libxl. libxl is very verbose when DEBUG
log level is set, resulting in huge log files that can
potentially fill a disk. Control of libxl verbosity should
be placed in the administrator's hands.
Fixes the following error when attempting to add a disk with bus='virtio'
to a machine which actually supports virtio-mmio (caught with ARM virt):
virtio disk cannot have an address of type 'virtio-mmio'
The problem has been likely introduced by
e8d5517254. Before that
qemuAssignDevicePCISlots() was never called for ARM "virt" machine.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1124841
If running in session mode it may happen that we fail to set
correct SELinux label, but the image may still be readable to
the qemu process. Take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We may want to do some decisions in drivers based on fact if we
are running as privileged user or not. Propagate this info there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We have plenty of callbacks in the driver. Some of these
callbacks require more than one argument to be passed. For that
we currently have a data type (struct) per each callback. Well,
so far for only one - SELinuxSCSICallbackData. But lets turn it
into more general name so it can be reused in other callbacks too
instead of each one introducing a new, duplicate data type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit f1f68ca334 tried fixing running multiple domains under various
users, but if the user can't browse the directory, it's hard for the
qemu running under that user to create the monitor socket.
The permissions need to be fixed in two places in the spec file due to
support for both installations with and without driver modules.
Creating a directory with '$(MKDIR_P) -m' shouldn't fail even on systems
where autoconf needs to fallback to 'install-sh -d'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146886
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 8125113c added code that should remove the disk backend if the
fronted hotplug failed for any reason. The code had a bug though as it
used the disk string for unplug rather than the backend alias. Fix the
code by pre-creating an alias string and using it instead of the disk
string. In cases where qemu does not support QEMU_CAPS_DEVICE, we ignore
the unplug of the backend since we can't really create an alias in that
case.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1262399
There's a couple reports of things failing in this area (bug 1259070),
but it's tough to tell what's going wrong without stderr from
qemu-bridge-helper. So let's report stderr in the error message
Couple new examples:
virbr0 is inactive:
internal error: /usr/libexec/qemu-bridge-helper --use-vnet --br=virbr0 --fd=21: failed to communicate with bridge helper: Transport endpoint is not connected
stderr=failed to get mtu of bridge `virbr0': No such device
bridge isn't on the ACL:
internal error: /usr/libexec/qemu-bridge-helper --use-vnet --br=br0 --fd=21: failed to communicate with bridge helper: Transport endpoint is not connected
stderr=access denied by acl file
The xenXMConfigCacheRefresh method scans /etc/xen and loads
all config files it finds. It then scans its internal hash
table and purges any (previously) loaded config files whose
refresh timestamp does not match the timestamp recorded at
the start of xenXMConfigCacheRefresh(). There is unfortunately
a subtle flaw in this, because if loading the config files
takes longer than 1 second, some of the config files will
have a refresh timestamp that is 1 or more seconds different
(newer) than is checked for. So we immediately purge a bunch
of valid config files we just loaded.
To avoid this flaw, we must pass the timestamp we record at
the start of xenXMConfigCacheRefresh() into the
xenXMConfigCacheAddFile() method, instead of letting the
latter call time(NULL) again.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
commit 4b53d0d4ac "libxl: don't remove persistent domain on start
failure" cleans up the vm object and sets it to NULL if the vm is not
persistent, however at end job vm (now NULL) is dereferenced via the call to
libxlDomainObjEndJob. Avoid this by skipping "endjob" and going
straight to "cleanup" in this case.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Up until now, the default has been rtl8139, but no check was in
place to make sure that device was actually available.
Now we try rtl8139, e1000 and virtio-net in turn, checking for
availability before using any of them: this means we have a much
better chance for the guest to be able to boot.
Commit f1f68ca334 did not report an error if virFileMakePath()
returned -1. Well, who would've guessed function with name starting
with 'vir' sets an errno instead of reporting an error the libvirt way.
Anyway, let's fix it, so the output changes from:
$ virsh start arm
error: Failed to start domain arm
error: An error occurred, but the cause is unknown
to:
$ virsh start arm
error: Failed to start domain arm
error: Cannot create directory '/var/lib/libvirt/qemu/domain-arm': Not
a directory
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146886
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If the current live definition does not have memory hotplug enabled, but
the persistent one does libvirt would reject migration if the
destination does not support memory hotplug even if the user didn't want
to persist the VM at the destination and thus the XML containing the
memory hotplug definition would not be used. To fix this corner case the
code will check for memory hotplug in the newDef only if
VIR_MIGRATE_PERSIST_DEST was used.
Commit 35847860f6 Added the virFileUnlink function, but failed to add
a version for mingw build, causing the following error:
Cannot export virFileUnlink: symbol not defined
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1260846
Introduced by 8fedbbdb, if we parse an unordered NUMA cell, will
get a segfault. This is because of a check for overlapping @cpus
sets we have there. However, since the array to hold guest NUMA
cells is allocated upfront and therefore it contains all zeros,
an out of order cell will break our assumption that cell IDs have
increasing character. At this point we try to access yet NULL
bitmap and therefore segfault.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Running valgrind on a very simplistic program consisting only of
opening and closing admin connection (virAdmConnect{Open,Close}) shows a
leak in remoteAdminPrivNew, because the last reference to privateData is
not decremented, thus the object won't be disposed. This patch unrefs
the privateData object once we closed the active connection to daemon,
making further use of this connection useless.
==24577== at 0x4A089C7: calloc (in /usr/lib64/valgrind/vgpreload_***linux.so)
==24577== by 0x4E8835F: virAllocVar (viralloc.c:560)
==24577== by 0x4EDFA5C: virObjectNew (virobject.c:193)
==24577== by 0x4EDFBD4: virObjectLockableNew (virobject.c:219)
==24577== by 0x4C14DAF: remoteAdminPrivNew (libvirt-admin.c:152)
==24577== by 0x4C1537E: virAdmConnectOpen (libvirt-admin.c:308)
==24577== by 0x400BAD: main (listservers.c:39)
==24577== LEAK SUMMARY:
==24577== definitely lost: 80 bytes in 1 blocks
==24577== indirectly lost: 840 bytes in 6 blocks
==24577== possibly lost: 0 bytes in 0 blocks
==24577== still reachable: 12,179 bytes in 199 blocks
==24577== suppressed: 0 bytes in 0 blocks
Coverity claims it could be possible to call virDBusTypeStackFree with
*stack == NULL and although the two API's that call it don't appear to
allow that - I suppose it's better to be safe than sorry
In virFileNBDDeviceFindUnused if virFileNBDDeviceIsBusy returns 0,
then both branches jumped to cleanup, so just use ignore_value
since the function returns NULL or some memory and the caller
handles the error.
Commit id '692e9fac7' used virProcessSetNamespaces instead of inlining
the similar functionality; however, Coverity notes that the function
prototype expects a size_t value and not an enum and complains. So,
just typecast the enum as a size_t to avoid the noise.
Commit id '2e7cea243' added a check for an error from Finish instead
of 'unexpected error'; however, if for some reason there wasn't an
error, then virGetLastError could return NULL resulting in the
NULL pointer deref to err->domain.
https://bugzilla.redhat.com/show_bug.cgi?id=1258361
When attaching a disk, controller, or rng using an address type ccw
or s390, we need to ensure the support is provided by both the machine.os
and the emulator capabilities (corollary to unconditional setting when
address was not provided for the correct machine.os and emulator.
For an inactive guest, an addition followed by a start would cause the
startup to fail after qemu_command builds the command line and attempts
to start the guest. For an active guest, libvirtd would crash.
Rather than have different usages of STR function in order to determine
whether the domain is s390-ccw or s390-ccw-virtio, make a single API
which will check the machine.os prefix. Then use the function.
Before libvirt sets the MAC address of the physdev (the physical
ethernet device) linked to a macvtap passthrough device, it always
saves the previous MAC address to restore when the guest is finished
(following a "leave nothing behind" policy). For a long time it
accomplished the save/restore with a combination of
ioctl(SIOCGIFHWADDR) and ioctl(SIOCSIFHWADDR), but in commit cbfe38c
(first in libvirt 1.2.15) this was changed to use netlink RTM_GETLINK
and RTM_SETLINK commands sent to the Physical Function (PF) of any
device that was detected to be a Virtual Function (VF).
We later found out that this caused problems with any devices using
the Cisco enic driver (e.g. vmfex cards) because the enic driver
hasn't implemented the function that is called to gather the
information in the IFLA_VFINFO_LIST attribute of RTM_GETLINK
(ndo_get_vf_config() for those keeping score), so we would never get
back a useful response.
In an ideal world, all drivers would implement all functions, but it
turns out that in this case we can work around this omission without
any bad side effects - since all macvtap passthrough <interface>
definitions pointing to a physdev that uses the enic driver *must*
have a <virtualport type='802.1Qbh'>, and since no other type of
ethernet devices use 802.1Qbh, libvirt can change its behavior in this
case to use the old-style. ioctl(SIOC[GS]IFHWADDR). That's what this
patch does.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1257004
These functions were made static as a part of commit cbfe38c since
they were no longer called from outside virnetdev.c. We once again
need to call them from another file, so this patch makes them once
again public.
Well, in 8ad126e6 we tried to fix a memory corruption problem.
However, the fix was not as good as it could be. I mean, the
commit has one line more than it should. I've noticed this output
just recently:
# ./run valgrind --leak-check=full --show-reachable=yes ./tools/virsh domblklist gentoo
==17019== Memcheck, a memory error detector
==17019== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==17019== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==17019== Command: /home/zippy/work/libvirt/libvirt.git/tools/.libs/virsh domblklist gentoo
==17019==
Target Source
------------------------------------------------
fda /var/lib/libvirt/images/fd.img
vda /var/lib/libvirt/images/gentoo.qcow2
hdc /home/zippy/tmp/install-amd64-minimal-20150402.iso
==17019== Thread 2:
==17019== Invalid read of size 4
==17019== at 0x4EFF5B4: virObjectUnref (virobject.c:258)
==17019== by 0x5038CFF: remoteClientCloseFunc (remote_driver.c:552)
==17019== by 0x5069D57: virNetClientCloseLocked (virnetclient.c:685)
==17019== by 0x506C848: virNetClientIncomingEvent (virnetclient.c:1852)
==17019== by 0x5082136: virNetSocketEventHandle (virnetsocket.c:1913)
==17019== by 0x4ECD64E: virEventPollDispatchHandles (vireventpoll.c:509)
==17019== by 0x4ECDE02: virEventPollRunOnce (vireventpoll.c:658)
==17019== by 0x4ECBF00: virEventRunDefaultImpl (virevent.c:308)
==17019== by 0x130386: vshEventLoop (vsh.c:1864)
==17019== by 0x4F1EB07: virThreadHelper (virthread.c:206)
==17019== by 0xA8462D3: start_thread (in /lib64/libpthread-2.20.so)
==17019== by 0xAB441FC: clone (in /lib64/libc-2.20.so)
==17019== Address 0x139023f4 is 4 bytes inside a block of size 240 free'd
==17019== at 0x4C2B1F0: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==17019== by 0x4EA8949: virFree (viralloc.c:582)
==17019== by 0x4EFF6D0: virObjectUnref (virobject.c:273)
==17019== by 0x4FE74D6: virConnectClose (libvirt.c:1390)
==17019== by 0x13342A: virshDeinit (virsh.c:406)
==17019== by 0x134A37: main (virsh.c:950)
The problem is, when registering remoteClientCloseFunc(), it's
conn->closeCallback which is ref'd. But in the function itself
it's conn->closeCallback->conn what is unref'd. This is causing
imbalance in reference counting. Moreover, there's no need for
the remote driver to increase/decrease conn refcount since it's
not used anywhere. It's just merely passed to client registered
callback. And for that purpose it's correctly ref'd in
virConnectRegisterCloseCallback() and then unref'd in
virConnectUnregisterCloseCallback().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit id '155ca616' added the 'refreshVol' API. In an NFS root-squash
environment it was possible that if the just created volume from XML wasn't
properly created with the right uid/gid and/or mode, then the followup
refreshVol will fail to open the volume in order to get the allocation/
capacity values. This would leave the volume still on the server and
cause a libvirtd crash because 'voldef' would be in the pool list, but
the cleanup code would free it.
Commit id '7c2d65dde2' changed the default value of mode to be -1 if not
supplied in the XML, which should cause creation of the volume using the
default mode of VIR_STORAGE_DEFAULT_VOL_PERM_MODE; however, the check
made was whether mode was '0' or not to use default or provided value.
This patch fixes the issue to check if the 'mode' was provided in the XML
and use that value.
In an NFS root-squashed environment the 'vol-delete' command will fail to
'unlink' the target volume since it was created under a different uid:gid.
This code continues the concepts introduced in virFileOpenForked and
virDirCreate[NoFork] with respect to running the unlink command under
the uid/gid of the child. Unlike the other two, don't retry on EACCES
(that's why we're here doing this now).
This will only be seen when debugging, but in order to help determine
whether a virFileOpenForceOwnerMode failed during an NFS root-squash
volume/file creation, add an error message from the child.
Adds a new interface type using UDP sockets, this seems only applicable
to QEMU but have edited tree-wide to support the new interface type.
The interface type required the addition of a "localaddr" (local
address), this then maps into the following xml and qemu call.
<interface type='udp'>
<mac address='52:54:00:5c:67:56'/>
<source address='127.0.0.1' port='11112'>
<local address='127.0.0.1' port='22222'/>
</source>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
QEMU call:
-net socket,udp=127.0.0.1:11112,localaddr=127.0.0.1:22222
Notice the xml "local" entry becomes the "localaddr" for the qemu call.
reference:
http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg00629.html
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Like we are checking for the correct order in SYM_FILES, we
should do the same for ADMIN_SYM_FILES.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We have this check rule in src/Makefile: check-symfile that
should check if all symbols we wanted to export are exported.
Moreover, if we are not exporting something more. Do the same
with libvirt_admin.syms.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
commit 09778e09 switched from using ioctl(SIOCBRDELBR) for bridge
device deletion to using a netlink RTM_DELLINK message, which is the
more modern way to delete a bridge (and also doesn't require the
bridge to be ~IFF_UP to succeed). However, although older kernels
(e.g. 2.6.32, in RHEL6/CentOS6) support deleting *some* link types
with RTM_NEWLINK, they don't support deleting bridges, and there is no
compile-time way to figure this out.
This patch moves the body of the SIOCBRDELBR version of
virNetDevBridgeDelete() into a static function, calls the new function
from the original, and also calls the new function from the
RTM_DELLINK version if the RTM_DELLINK message generates an EOPNOTSUPP
error. Since RTM_DELLINK is done from the subordinate function
virNetlinkDelLink, which is also called for other purposes (deleting a
macvtap interface), a function pointer called "fallback" has been
added to the arglist of virNetlinkDelLink() - if that arg != NULL, the
provided function will be called when (and only when) RTM_DELLINK
fails with EOPNOTSUPP.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252780 (part 2)
commit fc7b23db switched from using ioctl(SIOCBRADDBR) for bridge
creation to using a netlink RTM_NEWLINK message with IFLA_INFO_KIND =
"bridge", which is the more modern way to create a bridge. However,
although older kernels (e.g. 2.6.32, in RHEL6/CentOS6) support
creating *some* link types with RTM_NEWLINK, they don't support
creating bridges, and there is no compile-time way to figure this out
(since the "type" isn't an enum, but rather a character string).
This patch moves the body of the SIOCBRADDBR version of
virNetDevBridgeCreate() into a static function, calls the new function
from the original, and also calls the new function from the
RTM_NEWLINK version if the RTM_NEWLINK message generates an EOPNOTSUPP
error.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252780
This reverts commit 1ce7c1d20c,
which introduced a significant semantic change to the
virDomainGetInfo() API. Additionally, the change was only
made to 2 of the 15 virt drivers.
Conflicts:
src/qemu/qemu_driver.c
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Commit f86ae403 moved acquiring a job from libxlDomainStart()
to its callers. One spot missed was in libxlDoMigrateReceive().
Acquire a job in libxlDoMigrateReceive() before calling
libxlDomainStart().
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Failure of libxl_domain_suspend() does not leave the domain in
a suspended state, so no need to call libxl_domain_resume(),
which btw will fail with "domain not suspended".
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This patch fixes some flawed logic around ref counting the
libxlMigrationDstArgs object.
First, when adding sockets to the event loop with
virNetSocketAddIOCallback(), the generic virObjectFreeCallback()
was registered as a free function, with libxlMigrationDstArgs as
its parameter. A reference was also taken on
libxlMigrationDstArgs for each successful call to
virNetSocketAddIOCallback(). The rational behind this logic was
that the libxlMigrationDstArgs object had to out-live the socket
objects. But virNetSocketAddIOCallback() already takes a
reference on socket objects, ensuring their life until removed
from the event loop and unref'ed in virNetSocketEventFree(). We
only need to ensure libxlMigrationDstArgs lives until
libxlDoMigrateReceive() finishes, which can be done by simply
unref'ing libxlMigrationDstArgs at the end of
libxlDoMigrateReceive().
The second flaw was unref'ing the sockets in the failure path of
libxlMigrateReceive() and at the end of libxlDoMigrateReceive().
As mentioned above, the sockets are already unref'ed by
virNetSocketEventFree() when removed from the event loop.
Attempting to unref the socket a second time resulted in a
libvirtd crash since the socket was previously unref'ed and
disposed.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Now that virProcessSetNamespaces() does accept FD list in the
correct format, we can simply turn lxcAttachNS into calling
virProcessSetNamespaces().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far, if libvirt_lxc binary (usually to be found under
/usr/libexec/) is run with --help, due to a missing line
and our usual functions pattern, an 'uknown' error is returned.
Yeah, the help is printed out, but we should not claim error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far, the virProcessSetNamespaces() takes an array of FDs that
it tries to set namespace on. However, in the very next commit
this array may be sparse, having some -1's in it. Teach the
function to cope with that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So, after some movement in virt-aa-helper, I've noticed the
virt-aa-helper-test failing. I've ran gdb (it took me a while to
realize how to do that) and this showed up immediately:
Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0 strlen () at ../sysdeps/x86_64/strlen.S:106
#1 0x0000555555561a13 in array_starts_with (str=0x5555557ce910 "/tmp/tmp.6nI2Fkv0KL/1.img", arr=0x7fffffffd160, size=-1540438016) at security/virt-aa-helper.c:525
#2 0x0000555555561d49 in valid_path (path=0x5555557ce910 "/tmp/tmp.6nI2Fkv0KL/1.img", readonly=false) at security/virt-aa-helper.c:617
#3 0x0000555555562506 in vah_add_path (buf=0x7fffffffd3e0, path=0x5555557cb910 "/tmp/tmp.6nI2Fkv0KL/1.img", perms=0x555555581585 "rw", recursive=false) at security/virt-aa-helper.c:823
#4 0x0000555555562693 in vah_add_file (buf=0x7fffffffd3e0, path=0x5555557cb910 "/tmp/tmp.6nI2Fkv0KL/1.img", perms=0x555555581585 "rw") at security/virt-aa-helper.c:854
#5 0x0000555555562918 in add_file_path (disk=0x5555557d4440, path=0x5555557cb910 "/tmp/tmp.6nI2Fkv0KL/1.img", depth=0, opaque=0x7fffffffd3e0) at security/virt-aa-helper.c:931
#6 0x00007ffff78f18b1 in virDomainDiskDefForeachPath (disk=0x5555557d4440, ignoreOpenFailure=true, iter=0x5555555628a6 <add_file_path>, opaque=0x7fffffffd3e0) at conf/domain_conf.c:23286
#7 0x0000555555562b5f in get_files (ctl=0x7fffffffd670) at security/virt-aa-helper.c:982
#8 0x0000555555564100 in vahParseArgv (ctl=0x7fffffffd670, argc=5, argv=0x7fffffffd7e8) at security/virt-aa-helper.c:1277
#9 0x00005555555643d6 in main (argc=5, argv=0x7fffffffd7e8) at security/virt-aa-helper.c:1332
So I've taken look at valid_path() because it is obviously
calling array_starts_with() with malformed @size. And here's the
result: there are two variables to hold the size of three arrays
and their value is recalculated before each call of
array_starts_with(). What if we just use three variables,
initialize them and do not touch them afterwards?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1226234#c3
If the qemu monitor fails to remove the memory from the guest for
any reason, the auditlog message will incorrectly use the current
actual memory (via virDomainDefGetMemoryActual) instead of the
value we were attempting to reduce to. The result is the 'new-mem'
and 'old-mem' values for the auditlog message would be identical.
This patch creates a local 'newmem' which accounts for the current
memory size minus the memory which is being removed. NB, for the
success case this results in the same value that would be returned
by virDomainDefGetMemoryActual without the need to do the math. This
follows the existing code which would subtract the size for cur_balloon.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1226234#c3
Prior to this patch, after successfully hot plugging memory
the audit log indicated that the update failed, e.g.:
type=VIRT_RESOURCE ... old-mem=1024000 new-mem=1548288 \
exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=pts/2 res=failed
This patch will adjust where virDomainAuditMemory is called to
ensure the proper 'ret' value is used based on success or failure.
Additionally, the audit message should include the size of the
memory we were attempting to change to rather than the current
actual size. On failure to add, the message showed the same value
for old-mem and new-mem.
In order to do this, introduce a 'newmem' local which will compute
the new size based on the oldmem size plus the size of memory we
are about to add. NB: This would be the same as calling the
virDomainDefGetMemoryActual again on success, but avoids the
overhead of recalculating. Plus cur_balloon is already adjusted
by the same value, so this follows that.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
The ACS checks are meaningless when using the more modern VFIO driver
for device assignment since VFIO has its own more complete and exact
checks, but I didn't realize that when I added support for VFIO. This
patch eliminates the ACS check when preparing PCI devices for
assignment if VFIO is being used.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1256486
Older versions of glibc don't provide the setns() syscall
function wrapper, so we must define it ourselves to prevent
build failure on old distros.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This is a cryptographically signed message in MIME format.
Some UEFI firmwares may want to use a non-volatile memory to store some
variables.
If AppArmor is enabled, and NVRAM store file is set currently
virt-aa-helper does
not add the NVRAM store file to the template. Add this file for
read/write when
this functionality is defined in domain XML.
Signed-off-by: Peter Kieser <peter@kieser.ca>
This patch adds feature for lxc containers to inherit namespaces.
This is very similar to what lxc-tools or docker provides. Look
for "man lxc-start" and you will find that you can pass command
args as [ --share-[net|ipc|uts] name|pid ]. Or check out docker
networking option in which you can give --net=container:NAME_or_ID
as an option for sharing +namespace.
>From this patch you can add extra libvirt option to share
namespace in following way.
<lxc:namespace>
<lxc:sharenet type='netns' value='red'/>
<lxc:shareipc type='pid' value='12345'/>
<lxc:shareuts type='name' value='container1'/>
</lxc:namespace>
The netns option is specific to sharenet. It can be used to
inherit from existing network namespace.
Co-authored: Daniel P. Berrange <berrange@redhat.com>
Commit f1f68ca334 overused mdir_name()
event though it was not needed in the latest version, hence labelling
directory one level up in the tree and not the one it should.
If anyone with SElinux managed to try run a domain with guest agent set
up, it's highly possible that they will need to run 'restorecon -F
/var/lib/libvirt/qemu/channel/target' to fix what was done.
Reported-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1253107
Make a call virCgroupGetBlkioWeight to re-read blkio.weight right
after it is set in order to keep internal data up-to-date.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Commit aa2cc7 modified a previously unnecessary but innocuous check
for interface IP address during interface update incorrectly, causing
all attempted updates (e.g. changing link state) to interfaces of
type='ethernet' for QEMU to fail.
This patch fixes the issue by completely removing the check for IP
address, which is pointless since QEMU doesn't support setting
interface IP addresses from the domain interface XML anyway.
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Signed-off-by: Laine Stump <laine@laine.org>
We will try to set the node to cpuset.mems without check if
it is available, since we already have helper to check this.
Call virNumaNodesetIsAvailable to check if node is available,
then try to change it in the cgroup.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Remove unused variable, tag unused parameter and adjust return type.
introduced by 3f48345f7e
CC security/libvirt_security_manager_la-security_selinux.lo
security/security_selinux.c: In function 'virSecuritySELinuxDomainSetDirLabel':
security/security_selinux.c:2520:5: error: return makes pointer from integer without a cast [-Werror]
security/security_selinux.c:2514:9: error: unused variable 'ret' [-Werror=unused-variable]
security/security_selinux.c:2509:59: error: unused parameter 'mgr' [-Werror=unused-parameter]
While a zero allocation in safezero should be fine it isn't when we use
posix_fallocate which returns EINVAL on a zero allocation.
While we could skip the zero allocation in safezero_posix_fallocate it's
an optimization to do it for all allocations.
This fixes vm installation via virtinst for me which otherwise aborts
like:
Starting install...
Retrieving file linux... | 5.9 MB 00:01 ...
Retrieving file initrd.gz... | 29 MB 00:07 ...
ERROR Couldn't create storage volume 'virtinst-linux.sBgds4': 'cannot fill file '/var/lib/libvirt/boot/virtinst-linux.sBgds4': Invalid argument'
The error was introduced by e30297b0 as spotted by Chunyan Liu
We forbid access to /usr/share/, but (at least on Debian-based systems)
the Open Virtual Machine Firmware files needed for booting UEFI virtual
machines in QEMU live in /usr/share/ovmf/. Therefore, we need to add
that directory to the list of read only paths.
A similar patch was suggested by Jamie Strandboge <jamie@canonical.com>
on https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1483071.
First check overrides, then read only files then restricted access
itself.
This allows us to mark files for read only access whose parents were
already restricted for read write.
Based on a proposal by Martin Kletzander
We are automatically generating some socket paths for domains, but all
those paths end up in a directory that's the same for multiple domains.
The problem is that multiple domains can each run with different
seclabels (users, selinux contexts, etc.). The idea here is to create a
per-domain directory labelled in a way that each domain can access its
own unix sockets.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146886
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
SELinux security driver already does that, but DAC driver somehow missed
the memo. Let's fix it so it works the same way.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In virSecuritySELinuxSetSecurityChardevLabel() we are labelling unix
socket path, but accessing another structure of the union. This does
not pose a problem currently as both paths are at the same offset, but
this should be fixed for the future.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Now that we have virNetDaemon object holding all the data and being
capable of referencing multiple servers, having a duplicate reference to
a single server stored in virLockDaemon isn't necessary anymore. This
patch removes the above described element.
While the check is appropriate for eg. the x86 and generic drivers,
there are some valid ppc64 guest configurations where the CPU
model is supposed to be NULL.
Moving this check from the generic code to the drivers makes it
possible to accomodate both use cases.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1251927
Not all combinations of host CPU models and compatibility modes
are valid, so we need to make sure we don't try to do something
that QEMU will reject.
Moreover, we need to apply a different logic to guests using
host-model and host-passthrough modes when testing them for host
compatibility.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1251927
If a guest CPU is defined using
<cpu mode='host-model'/>
the <model> sub-element will contain the compatibility mode to use.
That means we can't just copy the host CPU model on cpuUpdate(),
otherwise we'll overwrite that information and migration of such
guests will fail.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1251927
Since iothreadid = 0 is invalid, we need to check for it when attempting
to add a disk; otherwise, someone would think/believe their attempt to
add an IOThread to the disk would succeed. Luckily other code ignored
things when ->iothread == 0...
Since we're linking this into libvirtd we need some symbols to be public
but not part of the public API so mark them as
LIBVIRT_ADMIN_PRIVATE_<VERSION> as we do with libvirt.
Making all other symbols local makes sure we don't accidentally leak
unwanted ones.
Commit 89c509a0 added getters for cgroup block device I/O throttling,
however stub versions of these functions have not matching function
prototypes that result in compilation fail on platforms not supporting
cgroup.
Fix build by correcting prototypes of the stubbed functions.
Pushing under build-breaker rule.
The problem here is that there are some values that kernel accepts, but
does not set them, for example 18446744073709551615 which acts the same
way as zero. Let's do the same thing we do with other tuning options
and re-read them right after they are set in order to keep our internal
structures up-to-date.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165580
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The problem here is that there are some values that kernel accepts, but
does not set them, for example 18446744073709551615 which acts the same
way as zero. Let's do the same thing we do with other tuning options
and re-read them right after they are set in order to keep our internal
structures up-to-date.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This function translates device paths to "major:minor " string, and all
virCgroupSetBlkioDevice* functions are modified to use it. It's a
cleanup with no functional change.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
That function takes string list and returns first string in that list
that starts with the @prefix parameter with that prefix being skipped as
the caller knows what it starts with (also for easier manipulation in
future).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1251886
Since iothread_id == 0 is an invalid value for QEMU let's point
that out specifically. For the IOThreadDel code, the failure would
have ended up being a failure to find the IOThread ID; however, for
the IOThreadAdd code - an IOThread 0 was added and that isn't good.
It seems during many reviews/edits to the code the check for
iothread_id = 0 being invalid was lost - it could have originally
been in the API code, but requested to be moved - I cannot remember.
The comment for the function indicated that iothread_id had to be
a positive non-zero value; however, that wasn't checked - that is
a value of 0 is/was allowed by the API and was left up to the
hypervisor to reject the value.
More than likely this nuance was missed during the many "adjustments"
to the API in the review phase.
Allow 0 as an iothread_id and force the hypervisor to handle.
The qemuDomainPinIOThread API will look up the iothread_id of
0 and not find it and message that anyway.
Just like in commit 704cf06, if virCgroup*() fails, the error is
already reported. There's no need to overwrite the error with a
generic one and possibly hiding the true root cause of the error.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Fix inconsistency between function description and actual
parameter name in virConfGetValue/virConfSetValue.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Ever since commit e44b0269, 64-bit mingw compilation fails with:
../../src/util/virprocess.c: In function 'virProcessGetPids':
../../src/util/virprocess.c:628:50: error: passing argument 4 of 'virStrToLong_i' from incompatible pointer type [-Werror=incompatible-pointer-types]
if (virStrToLong_i(ent->d_name, NULL, 10, &tmp_pid) < 0)
^
In file included from ../../src/util/virprocess.c:59:0:
../../src/util/virstring.h:53:5: note: expected 'int *' but argument is of type 'pid_t * {aka long long int *}'
int virStrToLong_i(char const *s,
^
cc1: all warnings being treated as errors
Although mingw won't be using this function, it does compile the
file, and the fix is relatively simple.
* src/util/virprocess.c (virProcessGetPids): Don't assume pid_t
fits in int.
Signed-off-by: Eric Blake <eblake@redhat.com>
It may happen that user (mistakenly) wants to rename a domain to
itself. Which is no renaming at all. We should reject that with
some meaningful error message.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If this function fails, the error message is reported only in
some cases (e.g. OOM), but in some it's not (e.g. duplicate key).
This fact is painful and we should either not report error at all
or report the error in all possible cases. I vote for the latter.
Unfortunately, since the key may be an arbitrary value (not
necessarily a string) we can't report it in the error message.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In 9190f0b0 we've tried to fix an OOM. And boy, was that fix
successful. But back then, the hash table implementation worked
strictly over string keys, which is not the case anymore. Hash
table have this function keyCopy() which returns void *.
Therefore a local variable that is temporarily holding the
intermediate return value from that function should be void *
too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Coverity complained that 'vm' wasn't initialized before jumping to
cleanup: and calling virDomainObjEndAPI if the VIR_STRDUP fails.
So I initialized vm = NULL and also moved the VIR_STRDUP closer to
usage and used endjob for goto. Lots of other reasons for failures.
In order to share as much virsh' logic as possible with upcomming
virt-admin client we need to split virsh logic into virsh specific and
client generic features.
Since majority of virsh methods should be generic enough to be used by
other clients, it's much easier to rename virsh specific data to virshX
than doing this vice versa. It moved generic virsh commands (including info
and opts structures) to generic module vsh.c.
Besides renaming methods and structures, this patch also involves introduction
of a client specific control structure being referenced as private data in the
original control structure, introduction of a new global vsh Initializer,
which currently doesn't do much, but there is a potential for added
functionality in the future.
Lastly it introduced client hooks which are especially necessary during
client connecting phase.
Currently supports only renaming inactive domains without snapshots.
Signed-off-by: Tomas Meszaros <exo@tty.sk>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We just need to update the entry in the second hash table. Since commit 8728a56
we have two hash tables for the domain list so that we can do O(1) lookup
regardless of looking up by UUID or name. Since with renaming a domain UUID does
not change, we only need to update the second hash table, where domains are
referenced by their name.
We will call both functions from the qemuDomainRename().
Signed-off-by: Tomas Meszaros <exo@tty.sk>
Also, among with this new API new ACL that restricts rename
capability is invented too.
Signed-off-by: Tomas Meszaros <exo@tty.sk>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Otherwise the error is just
error: Failed to create domain from test1.xml
error: failed to retrieve file descriptor for interface: Transport endpoint is not connected
since we don't get a sensible error after the fork.
Pinning information returned for emulatorpin and vcpupin calls is being
returned from our data without querying cgroups for some time. However,
not all the data were utilized. When automatic placement is used the
information is not returned for the calls mentioned above. Since the
numad hint in private data is properly saved/restored, we can safely use
it to return true information.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1162947
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The numad hint stored in priv->autoNodeset is information that gets lost
during daemon restart. And because we would like to use that
information in the future, we also need to save it in the status XML.
For the sake of tests, we need to initialize nnumaCell_max to some
value, so that the restoration doesn't fail in our test suite. There is
no need to fill in the actual numa cell data since the recalculating
function virCapabilitiesGetCpusForNodemask() will not fail, it will just
skip filling the data in the bitmap which we don't use in tests anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When parsing private domain data, there are two paths that are flawed.
They are both error paths, just from different parts of the function.
One of them can call free() on an uninitialized pointer. Initialization
to NULL is enough here. The other one is a bit trickier to explain, but
as easy as the first one to fix. We create capabilities, parse them and
then assign them into the private data pointer inside the domain object.
If, however, we get to fail from now on, the error path calls unrefs the
capabilities and then, when the domain object is being cleaned,
qemuDomainObjPrivateFree() tries to unref them as well. That causes a
segfault. Settin the pointer to NULL upon successful addition to the
private data is enough.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1210587 (completed)
When generating the default drive address for a SCSI <disk> device,
check the generated address to ensure it doesn't conflict with a SCSI
<hostdev> address. The <disk> address generation algorithm uses the
<target> "dev" name in order to determine which controller and unit
in order to place the device. Since a SCSI <hostdev> device doesn't
require a target device name, its placement on the guest SCSI address
"could" conflict. For instance, if a SCSI <hostdev> exists at
controller=0 unit=0 and an attempt to hotplug 'sda' into the guest
made, there would be a conflict if the <hostdev> is already using
/dev/sda.
https://bugzilla.redhat.com/show_bug.cgi?id=1210587 (partial)
If a SCSI subsystem <hostdev> element address is provided, we need to
make sure the address provided doesn't conflict with an existing or
libvirt generated address for a SCSI <disk> element. We can handle
this condition in device post processing since we're not generating an
address based on some target name - rather it's either generated based
on space or provided from the user. If the user provides one that conflicts,
then we need to disallow the change.
This will fix the issue where the domain XML provided an <address> for
the <hostdev>, but not the <disk> element where the address provided
ends up being the same address used for the <disk>. A <disk> address
is generated using it's assigned <target> 'dev' name prior to the
check/validation of the <hostdev> address value.
Hot-unplugging a disk from a guest that supports hot-unplugging generates an error
in the libvirt log when running QEMU with the "-msg timestamp=on" flag.
2015-08-06 10:48:59.945+0000: 11662: error : qemuMonitorTextDriveDel:2594 :
operation failed: deleting drive-virtio-disk4 drive failed:
2015-08-06T10:48:59.945058Z Device 'drive-virtio-disk4' not found
This error is caused because the HMP results are getting prefixed with a timestamp.
Parsing the output is not reliable with STRPREFIX as the results can be prefixed with a timestamp.
Using strstr ensures that parsing the output works whether the results are prefixed or not.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Frank Schreuder <fschreuder@transip.nl>
This reverts commit ede34470fd, which
was apparently written based on testing performed before commits
1e15be1 and 9a12b6 were pushed upstream. Once those two patches are in
place, commit ede34470 is redundant, and can even cause
incorrect/unexpected behavior when auto-assigning addresses for
virtio-net devices.
There's a check right at the beginning of the function that
shortcuts if the function was called over all NULL arguments.
However, this was meant just as a fool-proof check so that we
don't crash if function is used in a bad manner. Anyway, it makes
Coverity unhappy as it then thinks any of the arguments could be
NULL. Well, with the current state of the code it can't.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
It may happen that an interface don't have any bandwidth set and
a new one is to be set. In that case, @ifaceBand will be NULL.
This will cause troubles later in the code when deciding what to
do.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If you pass <disk><serial> XML to UpdateDevice, and the original device
didn't have a <serial> block, libvirtd crashes trying to read the original
NULL serial string.
Use _NULLABLE string comparisons to avoid the crash. A couple other
properties needed the change too.
Commit e8d5517 updated the domain post-parse to automatically add
pcie-root et al for certain ARM "virt" machinetypes, but didn't update
the function qemuDomainSupportsPCI() which is called later on when we
are auto-assigning PCI addresses and default settings for the PCI
controller <model> and <target> attributes. The result was that PCI
addresses weren't assigned, and the controllers didn't have their
attribute default values set, leading to an error when the domain was
started, e.g.:
internal error: autogenerated dmi-to-pci-bridge options not set
This patch adds the same check made in the earlier patch to
qemuDomainSupportsPCI(), so that PCI address auto-assignment and
target/model default values will be set.
When running the test suite using "unshare -n" we might have IPv6 but no
configured addresses. Due to AI_ADDRCONFIG getaddrinfo then fails with
EAI_NONAME which we should then treat as IPv6 unavailable.
This fixes the crash described here:
https://www.redhat.com/archives/libvir-list/2015-August/msg00162.html
In short, we were calling ioctl(SIOCETHTOOL) pointing to a too-short
object that was a local on the stack, resulting in the memory past the
end of the object being overwritten. This was because the struct used
by the ETHTOOL_GFEATURES command of SIOCETHTOOL ends with a 0-length
array, but we were telling ethtool that it could use 2 elements on the
array.
The fix is to allocate the necessary memory with VIR_ALLOC_VAR(),
including the extra length needed for a 2 element array at the end.
Well, there are just two places that needs adjustment:
qemuDomainGetInterfaceParameters - to report the @floor
qemuDomainSetInterfaceParameters - now that the function has been
fixed, we can allow updating @floor too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
As sketched in previous commits, imagine the following scenario:
virsh # domiftune gentoo vnet0
inbound.average: 100
inbound.peak : 0
inbound.burst : 0
outbound.average: 100
outbound.peak : 0
outbound.burst : 0
virsh # domiftune gentoo vnet0 --inbound 0
virsh # shutdown gentoo
Domain gentoo is being shutdown
virsh # list --all
error: Failed to list domains
error: Cannot recv data: Connection reset by peer
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe80ea221 in networkUnplugBandwidth (net=0x7fff9400c1a0, iface=0x7fff940ea3e0) at network/bridge_driver.c:4881
4881 net->floor_sum -= ifaceBand->in->floor;
This is rather unfortunate. We should not SIGSEGV here. The
problem is, that while in the second step the inbound QoS was
cleared out, the network part of it was not updated (moreover, we
don't report that vnet0 had inbound.floor set). Internal
structure therefore still had some fragments left (e.g.
class_id). So when qemuProcessStop() started to clean up the
environment it got to networkUnplugBandwidth(). Here, class_id is
set therefore function assumes that there is an inbound QoS. This
actually is a fair assumption to make, there's no need for a
special QoS box in network's QoS when there's no QoS to set.
Anyway, the problem is not the networkUnplugBandwidth() rather
than qemuDomainSetInterfaceParameters() which completely forgot
about QoS being disperse (some parts are set directly on
interface itself, some on bridge the interface is plugged into).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So, if a domain vNIC's bandwidth has been successfully set, it's
possible that because @floor is set on network's bridge, this
part may need updating too. And that's exactly what this function
does. While the previous commit introduced a function to check if
@floor can be satisfied, this does all the hard work. In general,
there may be three, well four possibilities:
1) No change in @floor value (either it remain unset, or its
value hasn't changed)
2) The @floor value has changed from a non-zero to a non-zero
value
3) New @floor is to be set
4) Old @floor must be cleared out
The difference between 2), 3) and 4) is, that while in 2) the QoS
tree on the network's bridge already has a special class for the
vNIC, in 3) the class must be created from scratch. In 4) it must
be removed. Fortunately, we have helpers for all three
interesting cases.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When a domain vNIC's bandwidth is to be changed (at runtime) it is
possible that guaranteed minimal bandwidth (@floor) will change too.
Well, so far it is, because we still don't have an implementation that
allows setting it dynamically, so it's effectively erased on:
#virsh domiftune $dom vnet0 --inbound 0
However, that's slightly unfortunate. We do some checks on domain
startup to see if @floor can be guaranteed. We ought do the same if
QoS is changed at runtime.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is no functional change. It's just that later in the series we
will need to pass class_id as an integer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There is no guarantee that an enum start it mapped onto a value
of zero. However, we are guaranteed that enum items are
consecutive integers. Moreover, it's a pity to define an enum to
avoid using magical constants but then using them anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit a6f9af8292 added checking for address colisions between
starting and ending addresses of forwarding addresses, but forgot that
there might be no addresses set at all.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Unlike what happens on x86, on ppc64 you can't mix and match CPU
features to obtain the guest CPU you want regardless of the host
CPU, so the concept of model fallback doesn't apply.
Make sure CPU definitions emitted by the driver, eg. as output of
the cpuBaseline() and cpuUpdate() calls, reflect this fact.
All previously recognized CPU models (POWER7_v2.1, POWER7_v2.3,
POWER7+_v2.1 and POWER8_v1.0) are internally converted to the
corrisponding generation name so that existing guests don't stop
working.
Use multiple PVRs per CPU model to reduce the number of models we
need to keep track of.
Remove specific CPU models (eg. POWER7+_v2.1): the corresponding
generic CPU model (eg. POWER7) should be used instead to ensure
the guest can be booted on any compatible host.
Get rid of all the entries that did not match any of the CPU
models supported by QEMU, like power8 and power8e.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1250977
This will allow us to perform PVR matching more broadly, eg. consider
both POWER8 and POWER8E CPUs to be the same even though they have
different PVR values.
This ensures comparison of two CPU definitions will be consistent
regardless of the fact that it is performed using cpuCompare() or
cpuGuestData(). The x86 driver uses the same exact code.
Limitations of the POWER architecture mean that you can't run
eg. a POWER7 guest on a POWER8 host when using KVM. This applies
to all guests, not just those using VIR_CPU_MATCH_STRICT in the
CPU definition; in fact, exact and strict CPU matching are
basically the same on ppc64.
This means, of course, that hosts using different CPUs have to be
considered incompatible as well.
Change ppc64Compute(), called by cpuGuestData(), to reflect this
fact and update test cases accordingly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1250977
ppc64Compute(), called by cpuNodeData(), is used not only to retrieve
the driver-specific data associated to a guest CPU definition, but
also to check whether said guest CPU is compatible with the host CPU.
If the user is not interested in the CPU data, it's perfectly fine
to pass a NULL pointer instead of a return location, and the
compatibility data returned should not be affected by this. One of
the checks, specifically the one on CPU model name, was however
only performed if the return location was non-NULL.
Use briefer checks, eg. (!model) instead of (model == NULL), and
avoid initializing to NULL a pointer that would be assigned in
the first line of the function anyway.
Also remove a pointless NULL assignment.
No functional changes.
Use the ppc64Driver prefix for all functions that are used to
fill in the cpuDriverPPC64 structure, ie. those that are going
to be called by the generic CPU code.
This makes it clear which functions are exported and which are
implementation details; it also gets rid of the ambiguity that
affected the ppc64DataFree() function which, despite what the
name suggested, was not related to ppc64DataCopy() and could
not be used to release the memory allocated for a
virCPUppc64Data* instance.
No functional changes.
nwfilter uses iptables and ebtables, which only work properly on
tap-based network connections (*not* on macvtap, for example), but we
just ignore any <filterref> elements for other types of networks,
potentially giving users a false sense of security.
This patch checks the network type and fails/logs an error if any
domain <interface> has a <filterref> when the connection isn't using a
tap device.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1180011
This patch modifies virSocketAddrGetRange() to function properly when
the containing network/prefix of the address range isn't known, for
example in the case of the NAT range of a virtual network (since it is
a range of addresses on the *host*, not within the network itself). We
then take advantage of this new functionality to validate the NAT
range of a virtual network.
Extra test cases are also added to verify that virSocketAddrGetRange()
works properly in both positive and negative cases when the network
pointer is NULL.
This is the *real* fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=985653
Commits 1e334a and 48e8b9 had earlier been pushed as fixes for that
bug, but I had neglected to read the report carefully, so instead of
fixing validation for the NAT range, I had fixed validation for the
DHCP range. sigh.
https://bugzilla.redhat.com/show_bug.cgi?id=1022292
The following XML really does not make any sense:
<inbound average="-1" burst="-2" peak="-3" floor="-4"/>
There can't be a negative packet rate. Well, so far we haven't
assigned any meaning to it. So reject it unless users harm themselves,
because otherwise we turn the negative numbers into really big values.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There's no need to set mon->fd to a dummy value since
it's initialized to proper value just a few lines below.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Since its introduction in 2011 (particularly in commit f4324e3292),
the option doesn't work. It just effectively disables all incoming
connections. That's because the client private data that contain the
'keepalive_supported' boolean, are initialized to zeroes so the bool is
false and the only other place where the bool is used is when checking
whether the client supports keepalive. Thus, according to the server,
no client supports keepalive.
Removing this instead of fixing it is better because a) apparently
nobody ever tried it since 2011 (4 years without one month) and b) we
cannot know whether the client supports keepalive until we get a ping or
pong keepalive packet. And that won't happen until after we dispatched
the ConnectOpen call.
Another two reasons would be c) the keepalive_required was tracked on
the server level, but keepalive_supported was in private data of the
client as well as the check that was made in the remote layer, thus
making all other instances of virNetServer miss this feature unless they
all implemented it for themselves and d) we can always add it back in
case there is a request and a use-case for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
So the API takes @dumpformat argument. This is what makes it special
when compared to virDomainCoreDump. The argument is there so that
users can choose the format of resulting core dump file. And to ease
them the choosing process we even have an enum with supported values
across all the hypervisors. But we don't mention the enum in the
function description anywhere. Fix it!
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
By specifying parentIndex in a call to virNetworkUpdate(), it was
possible to direct libvirt to add a dhcp range or static host of a
non-matching address family to the <dhcp> element of an <ip>. For
example, given:
<ip address='192.168.122.1' netmask='255.255.255.0'/>
<ip family='ipv6' address='2001:db6:ca3:45::1' prefix='64'/>
you could provide a static host entry with an IPv4 address, and
specify that it be added to the 2nd <ip> element (index 1):
virsh net-update default add ip-dhcp-host --parent-index 1 \
'<host mac="52:54:00:00:00:01" ip="192.168.122.45"/>'
This would be happily added with no error (and no concern of any
possible future consequences).
This patch checks that any dhcp range or host element being added to a
network ip's <dhcp> subelement has addresses of the same family as the
ip element they are being added to.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1184736
This controller can be connected only to a port on a
pcie-switch-upstream-port. It provides a single hotpluggable port that
will accept any PCI or PCIe device, as well as any device requiring a
pcie-*-port (the only current example of such a device is the
pcie-switch-upstream-port).
The downstream ports of an x3130-upstream switch can each have one of
these plugged into them (and that is the only place they can be
connected). Each xio3130-downstream provides a single PCIe port that
can have PCI or PCIe devices hotplugged into it. Apparently an entire
set of x3130-upstream + several xio3130-downstreams can be hotplugged
as a unit, but it's not clear to me yet how that would be done, since
qemu only allows attaching a single device at a time.
This device will be used to implement the
"pcie-switch-downstream-port" model of pci controller.
This controller can be connected only to a pcie-root-port or a
pcie-switch-downstream-port (which will be added in a later patch),
which is the reason for the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_PORT. A pcie-switch-upstream-port provides
32 ports (slot=0 to slot=31) on the downstream side, which can only
have pci controllers of model "pcie-switch-downstream-port" plugged
into them, which is the reason for the other new connect type
VIR_PCI_CONNECT_TYPE_PCIE_SWITCH.
This is the upstream part of a PCIe switch. It connects to a PCIe port
(but not PCI) on the upstream side, and can have up to 31
xio3130-downstream controllers (but no other types of devices)
connected to its downstream side.
This device will be used to implement the "pcie-switch-upstream-port"
model of pci controller.
This is backed by the qemu device ioh3420.
chassis and port from the <target> subelement are used to store/set the
respective qemu device options for the ioh3420. Currently, chassis is
set to be the index of the controller, and port is set to
"(slot << 3) + function" (per suggestion from Alex Williamson).
This controller can be connected (at domain startup time only - not
hotpluggable) only to a port on the pcie root complex ("pcie-root" in
libvirt config), hence the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_ROOT. It provides a hotpluggable port that
will accept any PCI or PCIe device.
New attributes must be added to the controller <target> subelement for
this - chassis and port are guest-visible option values that will be
set by libvirt with values derived from the controller's index and pci
address information.
This is a PCIE "root port". It connects only to a port of the
integrated pcie.0 bus of a Q35 machine (can't be hotplugged), and
provides a single PCIe port that can have PCI or PCIe devices
hotplugged into it.
This device will be used to implement the "pcie-root-port" model of
pci controller.
This uses the new subelement/attribute in two ways:
1) If a "pci-bridge" pci controller has no chassisNr attribute, it
will automatically be set to the controller's index as soon as the
controller's PCI address is known (during
qemuDomainAssignPCIAddresses()).
2) when creating the commandline for a pci-bridge device, chassisNr
will be used to set qemu's chassis_nr option (rather than the previous
practice of hard-coding it to the controller's index).
There are some configuration options to some types of pci controllers
that are currently automatically derived from other parts of the
controller's configuration. For example, in qemu a pci-bridge
controller has an option that is called "chassis_nr"; up until now
libvirt has always set chassis_nr to the index of the pci-bridge. So
this:
<controller type='pci' model='pci-bridge' index='2'/>
will always result in:
-device pci-bridge,chassis_nr=2,...
on the qemu commandline. In the future we may decide there is a better
way to derive that option, but even in that case we will need for
existing domains to retain the same chassis_nr they were using in the
past - that is something that is visible to the guest so it is part of
the guest ABI and changing it would lead to problems for migrating
guests (or just guests with very picky OSes).
The <target> subelement has been added as a place to put the new
"chassisNr" attribute that will be filled in by libvirt when it
auto-generates the chassisNr; it will be saved in the config, then
reused any time the domain is started:
<controller type='pci' model='pci-bridge' index='2'>
<model type='pci-bridge'/>
<target chassisNr='2'/>
</controller>
The one oddity of all this is that if the controller configuration
is changed (for example to change the index or the pci address
where the controller is plugged in), the items in <target> will
*not* be re-generated, which might lead to conflict. I can't
really see any way around this, but fortunately if there is a
material conflict qemu will let us know and we will pass that on
to the user.
This patch provides qemu support for the contents of <model> in
<controller> for the two existing PCI controller types that need it
(i.e. the two controller types that are backed by a device that must
be specified on the qemu commandline):
1) pci-bridge - sets <model> name attribute default as "pci-bridge"
2) dmi-to-pci-bridge - sets <model> name attribute default as
"i82801b11-bridge".
These both match current hardcoded practice.
The defaults are set at the end of qemuDomainAssignPCIAddresses().
This can't be done earlier because some of the options that will be
autogenerated need full PCI address info for the controller, and
because qemuDomainAssignPCIAddresses() might create extra controllers
which would need default settings added, and that hasn't yet been done
at the time the PostParse callbacks are being run.
qemuDomainAssignPCIAddresses() is still called prior to the XML being
written to disk, though, so the autogenerated defaults are persistent.
qemu capabilities bits aren't checked when the domain is defined, but
rather when the commandline is actually created (so the domain can
possibly be defined on a host that doesn't yet have support for the
given device, or a host different from the one where it will
eventually be run). When the commandline is being generated we compare
the modelName to known qemu device names implementing the given type
of controller, and check the capabilities bit for that device.
This new subelement is used in PCI controllers: the toplevel
*attribute* "model" of a controller denotes what kind of PCI
controller is being described, e.g. a "dmi-to-pci-bridge",
"pci-bridge", or "pci-root". But in the future there will be different
implementations of some of those types of PCI controllers, which
behave similarly from libvirt's point of view (and so should have the
same model), but use a different device in qemu (and present
themselves as a different piece of hardware in the guest). In an ideal
world we (i.e. "I") would have thought of that back when the pci
controllers were added, and used some sort of type/class/model
notation (where class was used in the way we are now using model, and
model was used for the actual manufacturer's model number of a
particular family of PCI controller), but that opportunity is long
past, so as an alternative, this patch allows selecting a particular
implementation of a pci controller with the "name" attribute of the
<model> subelement, e.g.:
<controller type='pci' model='dmi-to-pci-bridge' index='1'>
<model name='i82801b11-bridge'/>
</controller>
In this case, "dmi-to-pci-bridge" is the kind of controller (one that
has a single PCIe port upstream, and 32 standard PCI ports downstream,
which are not hotpluggable), and the qemu device to be used to
implement this kind of controller is named "i82801b11-bridge".
Implementing the above now will allow us in the future to add a new
kind of dmi-to-pci-bridge that doesn't use qemu's i82801b11-bridge
device, but instead uses something else (which doesn't yet exist, but
qemu people have been discussing it), all without breaking existing
configs.
(note that for the existing "pci-bridge" type of PCI controller, both
the model attribute and <model> name are 'pci-bridge'. This is just a
coincidence, since it turns out that in this case the device name in
qemu really is a generic 'pci-bridge' rather than being the name of
some real-world chip)
If a pci address had a function number out of range, the error message
would be:
Insufficient specification for PCI address
which is logged by virDevicePCIAddressParseXML() after
virDevicePCIAddressIsValid returns a failure.
This patch enhances virDevicePCIAddressIsValid() to optionally report
the error itself (since it is the place that decides which part of the
address is "invalid"), and uses that feature when calling from
virDevicePCIAddressParseXML(), so that the error will be more useful,
e.g.:
Invalid PCI address function=0x8, must be <= 7
Previously, virDevicePCIAddressIsValid didn't check for the
theoretical limits of domain or bus, only for slot or function. While
adding log messages, we also correct that ommission. (The RNG for PCI
addresses already enforces this limit, which by the way means that we
can't add any negative tests for this - as far as I know our
domainschematest has no provisions for passing XML that is supposed to
fail).
Note that virDevicePCIAddressIsValid() can only check against the
absolute maximum attribute values for *any* possible PCI controller,
not for the actual maximums of the specific controller that this
device is attaching to; fortunately there is later more specific
validation for guest-side PCI addresses when building the set of
assigned PCI addresses. For host-side PCI addresses (e.g. for
<hostdev> and for network device pools), we rely on the error that
will be logged when it is found that the device doesn't actually
exist.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1004596
https://bugzilla.redhat.com/show_bug.cgi?id=1176020
Some users think this is a good idea:
<vcpu placement='static'>4</vcpu>
<cpu mode='host-model'>
<model fallback='allow'/>
<numa>
<cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
<cell id='1' cpus='9-10' memory='2097152' unit='KiB'/>
</numa>
</cpu>
It's not. Lets therefore introduce a check and discourage them in
doing so.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function should return the greatest CPU number set in
/domain/cpu/numa/cell/@cpus. The idea is that we should compare
the returned value against /domain/vcpu value. Yes, there exist
users who think the following is a good idea:
<vcpu placement='static'>4</vcpu>
<cpu mode='host-model'>
<model fallback='allow'/>
<numa>
<cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
<cell id='1' cpus='9-10' memory='2097152' unit='KiB'/>
</numa>
</cpu>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Qemu reports physical size 0 for block devices. As 15fa84acbb
changed the behavior of qemuDomainGetBlockInfo to just query the monitor
this created a regression since we didn't report the size correctly any
more.
This patch adds code to refresh the physical size of a block device by
opening it and seeking to the end and uses it both in
qemuDomainGetBlockInfo and also in qemuDomainGetStatsOneBlock that was
broken since it was introduced in this respect.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1250982
The commit 7e72de4 didn't consider the hotplug scenarios. The patch addresses
the hotplug case whereby if atleast one of the pci function is owned by a
guest, the hotplug of other functions/devices in the same iommu group to the
same guest goes through successfully.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
virtio-net-pci adapter is capable to use irqfd with vhost-net only in MSI-X
mode, which appears to be available only on PCIe bus, at least on ARM
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Legacy -net option works correctly only with embedded device models, which
do not require any bus specification. Therefore, we should use -device for
PCI hardware
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Here we assume that if qemu supports generic PCI host controller,
it is a part of virt machine and can be used for adding PCI devices.
In qemu this is actually a PCIe bus, so we also declare multibus
capability so that 0'th bus is specified to qemu correctly as 'pcie.0'
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>