Given how intrusive previous patches are, it might happen that
there's a bug or imperfection. Lets give users a way out: if they
set 'namespaces' to an empty array in qemu.conf the feature is
suppressed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of trying to fix our security drivers, we can use a
simple trick to relabel paths in both namespace and the host.
I mean, if we enter the namespace some paths are still shared
with the host so any change done to them is visible from the host
too.
Therefore, we can just enter the namespace and call
SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight
overhead because we have to fork in order to enter the namespace.
But on the other hand, no complexity is added to our code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Prime time. When it comes to spawning qemu process and
relabelling all the devices it's going to touch, there's inherent
race with other applications in the system (e.g. udev). Instead
of trying convincing udev to not touch libvirt managed devices,
we can create a separate mount namespace for the qemu, and mount
our own /dev there. Of course this puts more work onto us as we
have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent
(libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev
/var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare()
to create new mount namespace. From now on anything that child
does is invisible to the parent. Child then mounts tmpfs on
$domain.dev (so that it still sees original /dev from the host)
and creates some devices (as explained in one of the previous
patches). The devices have to be created exactly as they are in
the host (including perms, seclabels, ACLs, ...). After that it
moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML allowing users to connect to them.
Therefore we must preserve devpts mount to be shared with the
host's one.
To make this patch as small as possible, creating of devices
configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is a list of devices that qemu needs for its run (apart from
what's configured for domain). The devices on the list are
enabled in the CGroups by default so they will be good candidates
for initial /dev for new qemu.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the USB device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and
virFileCopyACLs. These functions are going to be required when we
are creating /dev for qemu. We have copy anything that's in
host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
libvirt libxl picks its own default with respect to the default NIC
to use. libxlMakeNic is the one responsible for this and on boot it
picks LIBXL_NIC_TYPE_VIF_IOEMU for HVM domains such that it accomodates
both PV and emulated one. The good behaving guest at boot will then
select the pv and unplug the emulated device.
Now, on HVM when attaching an interface it will pick the same default
that is LIBXL_NIC_TYPE_VIF_IOEMU which as a result will fail the attach
(see xen commit 32e9d0f ("libxl: nic type defaults to vif in hotplug for
hvm guest"). Xen doesn't yet support the hotplug of emulated devices,
but we don't want to rule out that case either, which might get support
in the future. Hence we simply reverse the defaults when we are
attaching the interface which allows libvirt to prefer the PV nic first
without adding "model='netfront'" following the same pattern as above
commit. Also to avoid ruling out the emulated one we set to
LIBXL_NIC_TYPE_IOEMU when setting a model type that is not 'netfront'.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
The virDomainSendProcessSignal method says the flags values
come from virDomainProcessSignalFlag, but this enum has
never existed. No flags are needed for this method.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Almost none of our virJSONValue*Get* functions accept const virJSONValue
pointers and it wouldn't even make sense since we sometimes modify what
we get. And because there is no reason for preventing callers of
virJSONValueObjectForeachKeyValue from modifying the values they get in
each iteration we can just stop doing it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Using a variable named 'stat' clashes with the system function
'stat()' causing compiler warnings on some platforms
cc1: warnings being treated as errors
../../src/qemu/qemu_monitor_text.c: In function 'parseMemoryStat':
../../src/qemu/qemu_monitor_text.c:604: error: declaration of 'stat' shadows a global declaration [-Wshadow]
/usr/include/sys/stat.h:455: error: shadowed declaration is here [-Wshadow]
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the cpuset cgroup controller is disabled in /etc/libvirt/qemu.conf
QEMU virtual machines can in principle use all host CPUs, even if they
are hot plugged, if they have no explicit CPU affinity defined.
However, there's libvirt code supposed to handle the situation where
the libvirt daemon itself is not using all host CPUs. The code in
qemuProcessInitCpuAffinity attempts to set an affinity mask including
all defined host CPUs. Unfortunately, the resulting affinity mask for
the process will not contain the offline CPUs. See also the
sched_setaffinity(2) man page.
That means that even if the host CPUs come online again, they won't be
used by the QEMU process anymore. The same is true for newly hot
plugged CPUs. So we are effectively preventing that QEMU uses all
processors instead of enabling it to use them.
It only makes sense to set the QEMU process affinity if we're able
to actually grow the set of usable CPUs, i.e. if the process affinity
is a subset of the online host CPUs.
There's still the chance that for some reason the deliberately chosen
libvirtd affinity matches the online host CPU mask by accident. In this
case the behavior remains as it was before (CPUs offline while setting
the affinity will not be used if they show up later on).
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
The functions to retrieve online and present host CPU information
are only supported on Linux for the time being.
This leads to runtime errors if these function are used on other
platforms. To avoid that, code in higher levels using the functions
must replicate the conditional compilation in higher level which
is error prone (and is plainly spoken ugly).
Adding a function virHostCPUHasBitmap that can be used to check
for host CPU bitmap support.
NB: There are other functions including the host CPU count that
are lacking support on all platforms, but they are too essential
in order to be bypassed.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
virQEMUCapsFindTarget is supposed to find an alternative QEMU binary if
qemu-system-$GUEST_ARCH doesn't exist. The alternative is using host
architecture when it is compatible with $GUEST_ARCH. But a special
treatment has to be applied for ppc64le since the QEMU binary is always
called qemu-system-ppc64.
Broken by me in v2.2.0-171-gf2e71550d.
https://bugzilla.redhat.com/show_bug.cgi?id=1403745
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuAgentNotifyEvent accesses monitor structure and is called on qemu
reset/shutdown/suspend events under domain lock. Other monitor
functions on the other hand take monitor lock and don't hold domain lock.
Thus it is possible to have risky simultaneous access to the structure
from 2 threads. Let's take monitor lock here to make access exclusive.
In case of 0 filesystems *info is not set while according
to virDomainGetFSInfo contract user should call free on it even
in case of 0 filesystems. Thus we need to properly set
it. NULL will be enough as free eats NULLs ok.
The libvirt-domain.h documentation indicates that for a qcow2 file
in a filesystem being used for a backing store should report the disk
space occupied by a file; however, commit id '15fa84ac' altered the
code to trust that the wr_highest_offset should be used whenever
wr_highest_offset_valid was set.
As it turns out this will lead to indeterminite results. For an active
domain when qemu hasn't yet had the need to find the wr_highest_offset
value, qemu will report 0 even though qemu-img will report the proper
disk size. This causes reporting of the following XML:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/path/to/test-1g.qcow2'/>
to be as follows:
Capacity: 1073741824
Allocation: 0
Physical: 1074139136
with qemu-img indicating:
image: /path/to/test-1g.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G
Once the backing source file is opened on the guest, then wr_highest_offset
is updated, but only to the high water mark and not the size of the file.
This patch will adjust the logic to check for the file backed qcow2 image
and enforce setting the allocation to the returned 'physical' value, which
is the 'actual-size' value from a 'query-block' operation.
NB: The other consumer of the wr_highest_offset output (GetAllDomainStats)
has a contract that indicates 'allocation' is the offset of the highest
written sector, so it doesn't need adjustment.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Instead of having duplicated code in qemuStorageLimitsRefresh and
virStorageBackendUpdateVolTargetInfo to get capacity specific data
about the storage backing source or volume -- create a common API
to handle the details for both.
As a side effect, virStorageFileProbeFormatFromBuf returns to being
a local/static helper to virstoragefile.c
For the QEMU code - if the probe is done, then the format is saved so
as to avoid future such probes.
For the storage backend code, there is no need to deal with the probe
since we cannot call the new API if target->format == NONE.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Instead of having duplicated code in qemuStorageLimitsRefresh and
virStorageBackendUpdateVolTargetInfoFD to fill in the storage backing
source or volume allocation, capacity, and physical values - create a
common API that will handle the details for both.
The common API will fill in "default" capacity values as well - although
those more than likely will be overridden by subsequent code. Having just
one place to make the determination of what the values should be will
make things be more consistent.
For the QEMU code - the data filled in will be for inactive domains
for the GetBlockInfo and DomainGetStatsOneBlock API's. For the storage
backend code - the data will be filled in during the volume updates.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit id '8dc27259' introduced virStorageSourceUpdateBlockPhysicalSize
in order to retrieve the physical size for a block backed source device
for an active domain since commit id '15fa84ac' changed to use the
qemuMonitorGetAllBlockStatsInfo and qemuMonitorBlockStatsUpdateCapacity
API's to (essentially) retrieve the "actual-size" from a 'query-block'
operation for the source device.
However, the code only was made functional for a BLOCK backing type
and it neglected to use qemuOpenFile, instead using just open. After
the open the block lseek would find the end of the block and set the
physical value, close the fd and return.
Since the code would return 0 immediately if the source device wasn't
a BLOCK backed device, the physical would be displayed incorrectly,
such as follows in domblkinfo for a file backed source device:
Capacity: 1073741824
Allocation: 0
Physical: 0
This patch will modify the algorithm to get the physical size for other
backing types and it will make use of the qemuDomainStorageOpenStat
helper in order to open/stat the source file depending on its type.
The qemuDomainGetStatsOneBlock will no longer inhibit printing errors,
but it will still ignore them leaving the physical value set to 0.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Split out the opening of the file and fetch of the stat buffer into a
helper qemuDomainStorageOpenStat. This will handle either opening the
local or remote storage.
Additionally split out the cleanup of that into a separate helper
qemuDomainStorageCloseStat which will either close the file or
call the virStorageFileDeinit function.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Originally added by commit id '89646e69' prior to commit id '15fa84ac'
and '71d2c172' which ensured that qemuStorageLimitsRefresh was only called
for inactive domains.
Adjust the comment describing the need for FIXME and move all the text
to the function description.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This flag is used in Virtuozzo backend implicitly, thus
we need to support it and don't fail if it's set.
Signed-off-by: Pavel Glushchak <pglushchak@virtuozzo.com>
This flag tells backend not to create instance
disks making behavior the same as in qemu driver.
Disk files have to be created beforehand on target
host manually or by upper management layer i.e.
OpenStack Nova.
Signed-off-by: Pavel Glushchak <pglushchak@virtuozzo.com>
When save/migrate a domain and we autogenerated a port, then if we
print the inactive domain config, write out a -1 for the socket value;
otherwise, it's possible that the subsequent start will fail if the
autogenerated websocket used conflicts with an existing running config
that also used autogenerated websockets.
Examples:
== A. Can not restore domain with autoconfigured websocket.
domain 1 and 2 have autoconfigured websocket.
1. domain 1 is started then, saved
2. domain 2 is started
3. domain 1 restoration is failed:
error: internal error: qemu unexpectedly closed the monitor: 2016-11-21T10:23:11.356687Z
qemu-kvm: -vnc 0.0.0.0:2,websocket=5700: Failed to start VNC server on `(null)':
Failed to bind socket: Address already in use
== B. Can not migrate domain with autoconfigured websocket.
domain 1 on host A, domain 2 on host B, both have autoconfigured websocket
1. domain 1 started, domain 2 started
2. domain 1 migration to host B is failed with the above error.
In preparation to the code move to virnetdevtap.c, this change:
* renames virNetInterfaceStats to virNetDevTapInterfaceStats
* changes 'path' to 'ifname', to use the same vocable as other
method in virnetdevtap.c.
* Add the attributes checker
When vhostuser interfaces are used, the interface statistics
are not available in /proc/net/dev.
This change looks at the openvswitch interfaces statistics
tables to provide this information for vhostuser interface.
Note that in openvswitch world drop/error doesn't always make sense
for some interface type. When these informations are not available we
set them to 0 on the virDomainInterfaceStats.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There's nothing to compress if the requested snapshot memory format is
set to 'raw' explicitly. After commit 9e14689ea libvirt would try to
run /sbin/raw to process the memory stream if the qemu.conf option
snapshot_image_format is set.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1402726
Since its introduction in 2012 this internal API did nothing.
Moreover we have the same API that does exactly the same:
virSecurityManagerDomainSetPathLabel.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If you've ever tried running a huge page backed guest under
different user than in qemu.conf, you probably failed. Problem is
even though we have corresponding APIs in the security drivers,
there's no implementation and thus we don't relabel the huge page
path. But even if we did, so far all of the domains share the
same path:
/hugepageMount/libvirt/qemu
Our only option there would be to set 0777 mode on the qemu dir
which is totally unsafe. Therefore, we can create dir on
per-domain basis, i.e.:
/hugepageMount/libvirt/qemu/domainName
and chown domainName dir to the user that domain is configured to
run under.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far this function takes virDomainObjPtr which:
1) is an overkill,
2) might be not available in all the places we will use it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since the great rework of how we store vcpu- and iothread-related
data, we have overly complex part of code that is trying to format the
scheduler tuning data in as less lines as possible by grouping
settings for multiple threads. That was designed as an input syntax
sugar for users, but we don't need to also use that when formatting
the XML. Switching to simple enumeration makes the code nicer,
shorter and more welcoming to future changes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When redoing the website we deleted the libvirtLogo.png file
not remembering that the test driver screenshot API impl
relied on it.
Rather than having the test driver use the logo as a side
effect, give it its own dedicated image to use. This is
installed in /usr/share/libvirt/test-screenshot.png and
is taken from a NeXT Cube running WorldWideWeb[1]. The
very first web browser in existance, running on the
hardware it was originally written on.
[1] https://en.wikipedia.org/wiki/WorldWideWeb
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Qemu 2.8.0+ changes arguments structure for blockdev-add in the effort
to make it finally stable. Since libvirt recently added the detection of
gluster debug support relying on the old syntax we need to add the new
as well.
With current perf framework, this patch adds support and documentation
for the branch_instructions perf event.
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
Adjust the spacing a bit in order to generate 'cleaner' looking output.
This matches what virDomainMemoryStats does and it creates text/code boxes
in order to list each of the stats for each category.
With kernel 3.18 (since commit 3e32cb2e0a12b6915056ff04601cf1bb9b44f967)
the "unlimited" value for cgroup memory limits has changed once again as
its byte value is now computed from a page counter.
The new "unlimited" value reported by the cgroup fs is therefore 2**51-1
pages which is (VIR_DOMAIN_MEMORY_PARAM_UNLIMITED - 3072). This results
e.g. in virsh memtune displaying 9007199254740988 instead of unlimited
for the limits.
This patch uses the value of memory.limit_in_bytes from the cgroup
memory root which is the system's "real" unlimited value for comparison.
See also libvirt commit 231656bbeb for the
history for kernel 3.12 and before.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
So far the NSS module looks up only hostnames as provided by
guests themselves. However, there are some cases where this is
not enough: e.g. when there's a fresh new guest being installed
(with some generic hostname) say from a live ISO image; or some
(older) systems don't advertise their hostname in DHCP
transactions at all.
In cases like that it would be helpful if we translate domain
name as seen by libvirt too so that users can:
# virsh start $dom && ssh $dom
In order to achieve that new libvirt-guest module is introduced,
while older libvirt module maintains its current behaviour (that
is translating guest provided names into IP addresses).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have a module that's able to track
<domain, mac addres list> pairs, hook it up into
our network driver.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This module will be used to track:
<domain, mac address list>
pairs. It will be important to know these mappings without
libvirt connection (that is from a JSON file), because NSS
module will use those to provide better host name translation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There are couple of places where we have a string and want to
save it to a file. Atomically. In all those places we use
virFileRewrite() but also implement the very same callback which
takes the string and write it into temp file. This makes no
sense. Unify the callbacks and move them to one place.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In dd7bfb2cdc I've removed locking of the network driver upon
it's allocation. However, I forgot to remove one location of the
driver unlock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add in the block I/O throttling group parameter to the command line
if supported. If not supported, fail command creation.
Add the xml2argvtest for testing.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Modify _virDomainBlockIoTuneInfo and rng schema to support the group_name
option for iotune throttling. Document the new value.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than have multiple bool values, create a single enum with bits
representing what fields are set. Fields are generally set in groups
of 3 (read, write, total).
Currently we build the JSON object for the "block_set_io_throttle"
command using the knowledge that a NULL for a support*Options boolean
would essentially ignore the rest of the arguments.
This may not work properly if some capability was backported, plus it just
looks rather ugly. So instead, build the "base" arguments and then if
the support*Option bool capability is set, add in the arguments on the fly.
Then append those arguments to the basic command and send to qemu.
Rather than using negative logic and setting the maxparams to a lesser
value based on which capabilities exist, alter the logic to modify the
maxparams based on a base value plus the found capabilities. Reduces the
chance that some backported feature produces an incorrect value.
These features are included:
AVX512DQ, AVX512IFMA, AVX512BW, AVX512VL, AVX512VBMI, AVX512_4VNNIW and
AVX512_4FMAPS.
qemu commits: cc728d14 and 95ea69fb
Signed-off-by: Lin Ma <lma@suse.com>
Commit id '03e750f3' added support for checking the PLOOP type; however,
it used 'target.type' which no storage code ever fills in, so it will
never be set. Change to just vol->type (could use vol->target.format
as well).
Add a global check for duplicate drive addresses. This will fix the
problem of duplicate disk and hostdev drive addresses.
Example for duplicate drive addresses:
<disk>
...
<target name='sda'/>
</disk>
<disk>
...
<target name='sdb'/>
<address type='drive' controller=0 bus=0 target=0 unit=0/>
</disk>
Another example:
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
...
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
...
</source>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>
Unfortunately the fixes (1b08cc170a,
8d46386bfe) weren't enough to catch these
cases and it isn't possible to add additional checks in
virDomainDeviceDefPostParseInternal() for SCSI hostdevs or
virDomainDiskDefAssignAddress() for SCSI/IDE/FDC/SATA disks without
adding another parse flag (virDomainDefParseFlags) to disable this
validation while updating or detaching a disk or hostdev.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Comparing the parameter 'type' against the member 'bus' instead of
against the member 'type' is quite confusing. Rename the parameter
'type' to 'bus_type' to clarify its meaning.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Pass the virDomainDeviceDriveAddress as a struct instead of individual
arguments. Reworked the function descriptions.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Two reasons:
1.in none hotplug, we will pass it. We can see from libvirt function
qemuBuildVhostuserCommandLine
2.qemu will use this vetcor num to init msix table. If we don't pass, qemu
will use default value, this will cause VM can only use default value
interrupts at most.
Signed-off-by: gaohaifeng <gaohaifeng.gao@huawei.com>
Consider the following XML snippets:
$ cat scsicontroller.xml
<controller type='scsi' model='virtio-scsi' index='0'/>
$ cat scsihostdev.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='8' unit='1074151456'/>
</source>
</hostdev>
If we create a guest that includes the contents of scsihostdev.xml,
but forget the virtio-scsi controller described in scsicontroller.xml,
one is silently created for us. The same holds true when attaching
a hostdev before the matching virtio-scsi controller.
(See qemuDomainFindOrCreateSCSIDiskController for context.)
Detaching the hostdev, followed by the controller, works well and the
guest behaves appropriately.
If we detach the virtio-scsi controller device first, any associated
hostdevs are detached for us by the underlying virtio-scsi code (this
is fine, since the connection is broken). But all is not well, as the
guest is unable to receive new virtio-scsi devices (the attach commands
succeed, but devices never appear within the guest), nor even be
shutdown, after this point.
While this is not libvirt's problem, we can prevent falling into this
scenario by checking if a controller is being used by any hostdev
devices. The same is already done for disk elements today.
Applying this patch and then using the XML snippets from earlier:
$ virsh detach-device guest_01 scsicontroller.xml
error: Failed to detach device from scsicontroller.xml
error: operation failed: device cannot be detached: device is busy
$ virsh detach-device guest_01 scsihostdev.xml
Device detached successfully
$ virsh detach-device guest_01 scsicontroller.xml
Device detached successfully
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Although nearly all host devices that are assigned to guests using
VFIO ("<hostdev>" devices in libvirt) are physically PCI Express
devices, until now libvirt's PCI address assignment has always
assigned them addresses on legacy PCI controllers in the guest, even
if the guest's machinetype has a PCIe root bus (e.g. q35 and
aarch64/virt).
This patch tries to assign them to an address on a PCIe controller
instead, when appropriate. First we do some preliminary checks that
might allow setting the flags without doing any extra work, and if
those conditions aren't met (and if libvirt is running privileged so
that it has proper permissions), we perform the (relatively) time
consuming task of reading the device's PCI config to see if it is an
Express device. If this is successful, the connect flags are set based
on the result, but if we aren't able to read the PCI config (most
likely due to the device not being present on the system at the time
of the check) we assume it is (or will be) an Express device, since
that is almost always the case anyway.
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.
In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
The path to the config file for a PCI device is conventiently stored
in a virPCIDevice object, but that object's contents aren't directly
visible outside of virpci.c, so we need to have an accessor function
for it if anyone needs to look at it.
This new function just calls fstat() (if provided with a valid fd) or
stat() (if fd is -1) and returns st_size (or -1 if there is an
error). We may decide we want this function to be more complex, and
handle things like block devices - this is a placeholder (that works)
for any more complicated function.
We can't change feature names for compatibility reasons even if they
contain typos or other software uses different names for the same
features. By adding alternative spellings in our CPU map we at least
allow anyone to grep for them and find the correct libvirt's name.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When virt-aa-helper parses xml content it can fail on security labels.
It fails by requiring to parse active domain content on seclabels that
are not yet filled in.
Testcase with virt-aa-helper on a minimal xml:
$ cat << EOF > /tmp/test.xml
<domain type='kvm'>
<name>test-seclabel</name>
<uuid>12345678-9abc-def1-2345-6789abcdef00</uuid>
<memory unit='KiB'>1</memory>
<os><type arch='x86_64'>hvm</type></os>
<seclabel type='dynamic' model='apparmor' relabel='yes'/>
<seclabel type='dynamic' model='dac' relabel='yes'/>
</domain>
EOF
$ /usr/lib/libvirt/virt-aa-helper -d -r -p 0 \
-u libvirt-12345678-9abc-def1-2345-6789abcdef00 < /tmp/test.xml
Current Result:
virt-aa-helper: error: could not parse XML
virt-aa-helper: error: could not get VM definition
Expected Result is a valid apparmor profile
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Only the latest APIs are fully documented and the documentation of the
older variants (which are just limited versions of the new APIs anyway)
points to the newest APIs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Restarting libvirtd on the source host at the end of migration when a
domain is already running on the destination would cause image labels to
be reset effectively killing the domain. Commit e8d0166e1d fixed similar
issue on the destination host, but kept the source always resetting the
labels, which was mostly correct except for the specific case handled by
this patch.
https://bugzilla.redhat.com/show_bug.cgi?id=1343858
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Post-copy migration needs bi-directional communication between the
source and the destination QEMU processes, which is not supported by
tunnelled migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1371358
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We had a lot of rados_conf_set and check works.
Use helper virStorageBackendRBDRADOSConfSet for them.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Thanks to the complex capability caching code virQEMUCapsProbeQMP was
never called when we were starting a new qemu VM. On the other hand,
when we are reconnecting to the qemu process we reload the capability
list from the status XML file. This means that the flag preventing the
function being called was not set and thus we partially reprobed some of
the capabilities.
The recent addition of CPU hotplug clears the
QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS if the machine does not support it.
The partial re-probe on reconnect results into attempting to call the
unsupported command and then killing the VM.
Remove the partial reprobe and depend on the stored capabilities. If it
will be necessary to reprobe the capabilities in the future, we should
do a full reprobe rather than this partial one.
QEMU 2.8.0 adds support for unavailable-features in
query-cpu-definitions reply. The unavailable-features array lists CPU
features which prevent a corresponding CPU model from being usable on
current host. It can only be used when all the unavailable features are
disabled. Empty array means the CPU model can be used without
modifications.
We can use unavailable-features for providing CPU model usability info
in domain capabilities XML:
<domainCapabilities>
...
<cpu>
<mode name='host-passthrough' supported='yes'/>
<mode name='host-model' supported='yes'>
<model fallback='allow'>Skylake-Client</model>
...
</mode>
<mode name='custom' supported='yes'>
<model usable='yes'>qemu64</model>
<model usable='yes'>qemu32</model>
<model usable='no'>phenom</model>
<model usable='yes'>pentium3</model>
<model usable='yes'>pentium2</model>
<model usable='yes'>pentium</model>
<model usable='yes'>n270</model>
<model usable='yes'>kvm64</model>
<model usable='yes'>kvm32</model>
<model usable='yes'>coreduo</model>
<model usable='yes'>core2duo</model>
<model usable='no'>athlon</model>
<model usable='yes'>Westmere</model>
<model usable='yes'>Skylake-Client</model>
...
</mode>
</cpu>
...
</domainCapabilities>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
"host" CPU model is supported by a special host-passthrough CPU mode and
users is not allowed to specify this model directly with custom mode.
Thus we should not advertise "host" CPU model in domain capabilities.
This worked well on architectures for which libvirt provides a list of
supported CPU models in cpu_map.xml (since "host" is not in the list).
But we need to explicitly filter "host" model out for all other
architectures.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
CPU models (and especially some additional details which we will start
probing for later) differ depending on the accelerator. Thus we need to
call query-cpu-definitions in both KVM and TCG mode to get all data we
want.
Tests in tests/domaincapstest.c are temporarily switched to TCG to avoid
having to squash even more stuff into this single patch. They will all
be switched back later in separate commits.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This patch moves the CPU models formatting code from
virQEMUCapsFormatCache into a separate function.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The function just returned cached capabilities without checking whether
they are still valid. We should check that and refresh the capabilities
to make sure we don't return stale data. In other words, we should do
what all other lookup functions do.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The function is made a little bit more readable and the code which
refreshes cached capabilities if they are not valid any more was moved
into a separate function (virQEMUCapsCacheValidate) so that it can be
reused in other places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
If a user asked for a KVM domain capabilities when KVM is not available,
we would happily return data we got when probing through TCG and
pretended they were relevant for KVM. Let's just report KVM is not
supported to avoid confusion.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When domain capabilities were introduced we did not have enough data to
decide whether KVM works on the host or not and thus working legacy/VFIO
device assignment was used as a witness. Now that we know whether KVM
was enabled when probing QEMU capabilities (and thus we know it's
working), we can use this knowledge to provide better default value for
virttype.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Since some may depend on the accelerator used when probing QEMU the
cache becomes invalid when KVM becomes available or if it is not
available anymore.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
CPU related capabilities may differ depending on accelerator used when
probing. Let's use KVM if available when probing QEMU and fall back to
TCG. The created capabilities already contain all we need to distinguish
whether KVM or TCG was used:
- KVM was used when probing capabilities:
QEMU_CAPS_KVM is set
QEMU_CAPS_ENABLE_KVM is not set
- TCG was used and QEMU supports KVM, but it failed (e.g., missing
kernel module or wrong /dev/kvm permissions)
QEMU_CAPS_KVM is not set
QEMU_CAPS_ENABLE_KVM is set
- KVM was not used and QEMU does not support it
QEMU_CAPS_KVM is not set
QEMU_CAPS_ENABLE_KVM is not set
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Let's set QEMU_CAPS_KVM and QEMU_CAPS_ENABLE_KVM early so that the rest
of the probing code can use these capabilities to handle KVM/TCG replies
differently.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Using -machine instead of -M for QMP probing is safe because any QEMU
binary which is capable of QMP probing supports -machine.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The code that runs a new QEMU process to be used for probing
capabilities is separated into four reusable functions so that any code
that wants to probe a QEMU process may just follow a few simple steps:
cmd = virQEMUCapsInitQMPCommandNew(...);
virQEMUCapsInitQMPCommandRun(cmd);
/* talk to the running QEMU process using its QMP monitor */
if (reprobeIsRequired) {
virQEMUCapsInitQMPCommandAbort(cmd, ...);
virQEMUCapsInitQMPCommandRun(cmd);
/* talk to the running QEMU process again */
}
virQEMUCapsInitQMPCommandFree(cmd);
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This reverts commit 3a6cf6fc16.
Mistakenly this commit was pushed because I thought I missed the
corret one b880ff42dd while in fact I didn't.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
We have couple of functions that operate over NULL terminated
lits of strings. However, our naming sucks:
virStringJoin
virStringFreeList
virStringFreeListCount
virStringArrayHasString
virStringGetFirstWithPrefix
We can do better:
virStringListJoin
virStringListFree
virStringListFreeCount
virStringListHasString
virStringListGetFirstWithPrefix
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If libvirt is compiled without NUMACTL support starting libvirtd
reports a libvirt internal error "NUMA isn't available on this host"
without checking if NUMA support is compiled into the libvirt binaries.
This patch adds the missing NUMA support check to prevent the internal error.
It also includes a check if the cgroup controller cpuset is available before
using it.
The error was noticed when libvirtd was restarted with running domains and
on libvirtd start the qemuConnectCgroup gets called during qemuProcessReconnect.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
With the QEMU components in place, provide the XML parsing to
invoke that code when given the following XML snippet:
<hostdev mode='subsystem' type='scsi_host'>
<source protocol='vhost' wwpn='naa.501234567890abcd'/>
</hostdev>
An optional address element can be specified within the hostdev
(pick CCW or PCI as necessary):
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0625'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
Add basic vhost-scsi tests which were cloned from hostdev-scsi-virtio-scsi
in both xml2argv and xml2xml. Added ones for both vhost-scsi-ccw and
vhost-scsi-pci since the syntaxes are slightly different between them.
Also adjusted the docs to describe the changes.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Adjust the device string that is built for vhost-scsi devices so that it
can be invoked from hotplug.
From the QEMU command line, the file descriptors are expect to be numeric only.
However, for hotplug, the file descriptors are expected to begin with at least
one alphabetic character else this error occurs:
# virsh attach-device guest_0001 ~/vhost.xml
error: Failed to attach device from /root/vhost.xml
error: internal error: unable to execute QEMU command 'getfd':
Parameter 'fdname' expects a name not starting with a digit
We also close the file descriptor in this case, so that shutting down the
guest cleans up the host cgroup entries and allows future guests to use
vhost-scsi devices. (Otherwise the guest will silently end.)
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Open /dev/vhost-scsi, and record the resulting file descriptor, so that
the guest has access to the host device outside of the libvirt daemon.
Pass this information, along with data parsed from the XML file, to build
a device string for the qemu command line. That device string will be
for either a vhost-scsi-ccw device in the case of an s390 machine, or
vhost-scsi-pci for any others.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For a new hostdev type='scsi_host' we have a number of
required functions for managing, adding, and removing the
host device to/from guests. Provide the basic infrastructure
for these tasks.
The name "SCSIVHost" (and its variants) is chosen to avoid
conflicts with existing code named "SCSIHost" to refer to
a hostdev type='scsi' protcol='none'.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
We already have a "scsi" hostdev subsys type, which refers to a single
LUN that is passed through to a guest. But what of things where
multiple LUNs are passed through via a single SCSI HBA, such as with
the vhost-scsi target? Create a new hostdev subsys type that will
carry this.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Do all the stuff for the vhost-scsi capability in QEMU,
so it's in place for our checks later.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
* add vboxDriver object to serve as a singleton global object that
holds references to IVirtualBox and ISession to be shared among
multiple connections. The vbox_driver is instantiated only once in
the first call vboxGetDriverConnection function that is guarded by
a mutex.
* call vbox API initialize only when the first connection is
established, and likewise uninitialize when last connection
disconnects. The prevents each subsequent connection from overwriting
IVirtualBox/ISession instances of any other active connection that
led to libvirtd segfaults. The virConnectOpen and virConnectClose
implementations are guarded by mutex on the global vbox_driver_lock
where the global vbox_driver object counts connectios and decides
when it's safe to call vbox's init/uninit routines.
* add IVirutalBoxClient to vboxDriver and use it to in tandem with newer
pfnClientInitialize/pfnClientUninitalize APIs for vbox versions that
support it, to avoid usage of the old pfnComInitialize/Uninitialize.
Removed the comment 'Set the migration source' as it isn't valid anymore
and 'start it up' isn't useful as qemuProcessStart() is already a
speaking name.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
The path prefixes for sysfs trees are always prepended by paths
beginning with a slash, making the trailing slash in the prefix
redundant.
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Only generate a warning if there is something to report.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Just like in the previous commit, we are not updating CGroups on
chardev hot(un-)plug and thus leaving qemu unable to access any
non-default device users are trying to hotplug.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If users try to hotplug RNG device with a backend different to
/dev/random or /dev/urandom the whole operation fails as qemu is
unable to access the device. The problem is we don't update
device CGroups during the operation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemuDomainObjExitAgent is unsafe.
First it accesses domain object without domain lock.
Second it uses outdated logic that goes back to commit 79533da1 of
year 2009 when code was quite different. (unref function
instead of unreferencing only unlocked and disposed object
in case of last reference and leaved unlocking to the caller otherwise).
Nowadays this logic may lead to disposing locked object
i guess.
Another problem is that the callers of qemuDomainObjEnterAgent
use domain object again (namely priv->agent) without domain lock.
This patch address these two problems.
qemuDomainGetAgent is dropped as unused.
Sometimes after domain restart agent is unavailabe even
if it is up and running in guest. Diagnostic message is
"QEMU guest agent is not available due to an error"
that is 'priv->agentError' is set. Investiagion shows that
'priv->agent' is not NULL, so error flag is set probably
during domain shutdown process and not cleaned up eventually.
The patch is quite simple - just clean up error flag unconditionally
upon domain stop.
Other hunks address other cases when error flag is not cleaned up.
1. processSerialChangedEvent. We need to clean error flag
unconditionally here too. For example if upon first 'connected' event we
fail to connect and set error flag and then connect on second
'connected' event then error flag will remain set erroneously
and make agent unavailable.
2. qemuProcessHandleAgentEOF. If error flag is set and we get
EOF we need to change state (and diagnostic) from 'error' to
'not connected'.
qemuConnectAgent return -1 or -2 in case of different errors.
A. -1 is a case of unsuccessuful connection to guest agent.
B. -2 is a case of destoyed domain during connection attempt.
All qemuConnectAgent callers handle the first error the same way
so let's move this logic into qemuConnectAgent itself. Patched
function returns 0 in case A and -1 in case B.
It is already discussed in "[RFC] daemon: remove hardcode dep on libvirt-guests" [1].
Mgmt can use means to save/restore domains on system shutdown/boot other than
libvirt-guests.service. Thus we need to specify appropriate ordering dependency between
libvirtd, domains and save/restore service. This patch takes approach suggested
in RFC and introduces a systemd target, so that ordering can be built next way:
libvirtd -> domain -> virt-guest-shutdown.target -> save-restore.service.
This way domains are decoupled from specific shutdown service via intermediate
target.
[1] https://www.redhat.com/archives/libvir-list/2016-September/msg01353.html
Use the util function virHostdevIsSCSIDevice() to simplify if
statements.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Add the function virHostdevIsSCSIDevice() which detects whether a
hostdev is a SCSI device or not.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Add missing checks if a hostdev is a subsystem/SCSI device before access
the union member 'subsys'/'scsi'. Also fix indentation and simplify
qemuDomainObjCheckHostdevTaint().
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
New line character in name of storagepool is now forbidden because it
mess virsh output and can be confusing for users.
Validation of name is done in driver, after parsing XML to avoid
problems with dissappeared pools which was already created with
new-line char in name.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
New line character in name of domain is now forbidden because it
mess virsh output and can be confusing for users.
Validation of name is done in drivers, after parsing XML to avoid
problems with dissappeared domains which was already created with
new-line char in name.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 3f71c79768 added 'qemu_id' field to track the id of the cpu
as reported by query-cpus. The patch did not include changes necessary
to propagate the id through the functions matching the data to the
libvirt cpu structures and thus all vcpus had id 0.
Don't use qemuMonitorGetCPUInfo which does a lot of matching to get the
full picture which is not necessary and would be mostly discarded.
Refresh only the vcpu halted state using data from query-cpus.
We don't need to call qemuMonitorGetCPUInfo which is very inefficient to
get data required to update the vcpu 'halted' state.
Add a monitor helper that will retrieve the halted state and return it
in a bitmap so that it can be indexed easily.