A coming patch will make it optionally possible to list backing
chain block stats; in this mode of operation, block.count is no
longer the number of <disks> in the domain, but the number of
blocks in the array being reported. We still want block.count
listed first, but rather than iterate the tree twice (once to
count, and once to list stats), it's easier to just touch things
up after the fact.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
after the fact.
Signed-off-by: Eric Blake <eblake@redhat.com>
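The "touch up the count after the fact" idea above can be sketched as
follows. This is only an illustration under assumptions: struct
param_list, add_param(), report_blocks() and the "block.N.depth" name
are hypothetical stand-ins, not the typed-parameter API used in
qemu_driver.c. The count slot is reserved first, the tree is walked
once, and the slot is patched at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 64

/* Hypothetical stand-in for a typed-parameter array; not libvirt's API. */
struct param {
    char name[64];
    unsigned long long value;
};

struct param_list {
    struct param items[MAX_PARAMS];
    size_t n;
};

/* Append one name/value pair; returns -1 when the list is full. */
static int add_param(struct param_list *l, const char *name,
                     unsigned long long value)
{
    if (l->n == MAX_PARAMS)
        return -1;
    snprintf(l->items[l->n].name, sizeof(l->items[l->n].name), "%s", name);
    l->items[l->n].value = value;
    l->n++;
    return 0;
}

/* chain_len[d] is the (assumed) number of images in disk d's chain. */
int report_blocks(struct param_list *l, const unsigned int *chain_len,
                  size_t ndisks)
{
    size_t count_idx = l->n;          /* remember where block.count lives */
    unsigned long long visited = 0;
    char name[64];

    /* Reserve the slot so block.count is still listed first. */
    if (add_param(l, "block.count", 0) < 0)
        return -1;

    /* Single pass over every disk and every element of its chain. */
    for (size_t d = 0; d < ndisks; d++) {
        for (unsigned int depth = 0; depth < chain_len[d]; depth++) {
            snprintf(name, sizeof(name), "block.%llu.depth", visited);
            if (add_param(l, name, depth) < 0)
                return -1;
            visited++;
        }
    }

    /* Touch up the count now that the number of blocks is known. */
    l->items[count_idx].value = visited;
    return 0;
}
```

With two disks whose chains have lengths 2 and 1, the list ends up with
block.count=3 in the first slot followed by three per-block entries.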
The prior refactoring can now be put to use. With the same domain
as the earlier commit 7b49926 (one qcow2 disk and an empty
cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.0.allocation=1309614080
block.0.capacity=42949672960
block.0.physical=1309671424
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
qemuStorageLimitsRefresh to report offline statistics.
Signed-off-by: Eric Blake <eblake@redhat.com>
Create a helper function that can be reused for gathering block
info from virDomainListGetStats.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Split guts...
(qemuStorageLimitsRefresh): ...into new helper function.
Signed-off-by: Eric Blake <eblake@redhat.com>
The documentation for virDomainBlockInfo was confusing: it stated
that 'physical' was the size of the container, then gave an example
of it being the amount of storage used by a sparse file (that is,
for a sparse raw image on a regular file, the wording implied
capacity==physical, while allocation was smaller; but the example
instead claimed physical==allocation). Since we use 'physical' for
the last offset of a block device, we should do likewise for
regular files.
Furthermore, the example claimed that for a qcow2 regular file,
allocation==physical. At the time the code was first written,
this was true (qcow2 files were allocated sequentially, and were
never sparse, so the last sector written happened to also match
the disk space occupied); but modern qemu does much better and
can punch holes for a qcow2 with allocation < physical.
Basically, after this patch, the three fields are now reliably
mapped as:
'capacity' - how much storage the guest can see (equal to
physical for raw images, determined by image metadata otherwise)
'allocation' - how much storage the image occupies (similar to
what 'du' would report)
'physical' - the last offset of the image (similar to what 'ls'
would report)
'capacity' can be larger than 'physical' (such as for a qcow2
image that does not vary much from a backing file) or smaller
(such as for a qcow2 file with lots of internal snapshots).
Likewise, 'allocation' can be (slightly) larger than 'physical'
(such as counting the tail of cluster allocations required to
round a file size up to filesystem granularity) or smaller
(for a sparse file). A block-resize operation changes capacity
(which, for raw images, also changes physical); many non-raw
images automatically grow physical and allocation as necessary
when starting with an allocation smaller than capacity; and even
when capacity and physical stay unchanged, allocation can change
when converting sectors from holes to data or back.
Note that this does not change semantics for qcow2 images stored
on block devices; there, we still rely on qemu to report the
highest written extent for allocation. So using this API to
track when to extend a block device because a qcow2 image is
about to exceed a threshold will not see any changes.
Also, note that virStorageVolInfo is unfortunately limited to
just 'capacity' and 'allocation' (we can't expand it to add
'physical', although we can expand the XML to add it there);
historically, that struct's 'allocation' value has reported
file size for qcow2 files (what this patch terms 'physical'
for a domain block device), but disk usage for raw files (what
this patch terms 'allocation'). So follow-up patches will be
needed to make storage volumes report the same allocation
values and get at physical values, where those differ.
* include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
documentation to match saner definition.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
files, physical size is capacity, not allocation.
Signed-off-by: Eric Blake <eblake@redhat.com>
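For a raw image stored in a regular file, the mapping described above
can be sketched with fstat(). This is a minimal sketch, not the code in
qemuDomainGetBlockInfo: struct block_info and raw_file_block_info() are
hypothetical names, and st_blocks is assumed to be in 512-byte units as
on Linux.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical mirror of the three fields discussed above. */
struct block_info {
    unsigned long long capacity;   /* what the guest can see */
    unsigned long long allocation; /* disk usage, like 'du' */
    unsigned long long physical;   /* last offset, like 'ls' */
};

/* Fill info for a raw image in a regular file; returns -1 on error. */
int raw_file_block_info(int fd, struct block_info *info)
{
    struct stat sb;

    if (fstat(fd, &sb) < 0)
        return -1;
    if (!S_ISREG(sb.st_mode))
        return -1;

    info->physical = (unsigned long long)sb.st_size;
    /* Assumption: st_blocks counts 512-byte units (true on Linux). */
    info->allocation = (unsigned long long)sb.st_blocks * 512ULL;
    /* For raw images the guest sees the whole file. */
    info->capacity = info->physical;
    return 0;
}
```

On a sparse file, allocation is expected to come out smaller than
physical, while capacity and physical stay equal. A qcow2 image would
need its capacity read from the image metadata instead.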
Ultimately, we want to avoid read()ing a file while qemu is running.
We still have to open() block devices to determine their physical
size, but that is safer. This patch rearranges code to group
together all code that reads the image, to make it easier for later
patches to skip the metadata collection when possible.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
disk up front. Place metadata reading next to use.
Signed-off-by: Eric Blake <eblake@redhat.com>
When requested in a later patch, the QMP command results are now
examined recursively. As qemu_driver will eventually have to
read items out of the hash table as stored by this patch, the
computation of backing alias string is done in a shared location.
* src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
* src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
A future patch will allow recursion into backing chains when
collecting block stats. This patch should not change behavior,
but merely moves out the common code that will be reused once
recursion is enabled, and adds the parameter that will turn on
recursion.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
although it is ignored for now.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
split...
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
(qemuMonitorJSONGetBlockStatsInfo): Update caller.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
* src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, grabbing blockinfo always calls stat on the disk, then
opens the image to determine the capacity, using a throw-away
virStorageSourcePtr. This has a couple of drawbacks:
1. We are calling stat and opening a file on every invocation of
the API. However, there are cases where the stats should NOT be
changing between successive calls (if a domain is running, no
one should be changing the physical size of a block device or raw
image behind our backs; capacity of read-only files should not
be changing; and we are the gateway to the block-resize command
to know when the capacity of read-write files should be changing).
True, we still have to use stat in some cases (a sparse raw file
changes allocation if it is read-write and the amount of holes is
changing, and a read-write qcow2 image stored in a file changes
physical size if it was not fully pre-allocated). But for
read-only images, even this should be something we can remember
from the previous time, rather than repeating every call.
2. We want to enhance the power of virDomainListGetStats, by
sharing code. But we already have a virStorageSourcePtr for
each disk, and it would be easier to reuse the common structure
than to have to worry about the one-off virDomainBlockInfoPtr.
While this patch does not optimize reuse of information in point
1, it does get us closer to being able to do so, by updating a
structure that survives between consecutive calls.
* src/util/virstoragefile.h (_virStorageSource): Add physical, to
mirror virDomainBlockInfo; rearrange fields to match public struct.
(virStorageSourceCopy): Copy the new field.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
storage source, then copy to block info.
Signed-off-by: Eric Blake <eblake@redhat.com>
In order for a future patch to virDomainListGetStats to reuse
some code for determining disk usage of offline domains, we
need to make it easier to pull out part of the guts of grabbing
blockinfo. The current implementation grabs a job fairly late
in the game, while getstats will already own a job; reordering
things so that the job is always grabbed up front in both
functions will make it easier to pull out the common code.
This patch results in grabbing a job in cases where one was not
previously needed, but as it is a query job, it should not be
noticeably slower.
This patch touches the same code as the fix for CVE-2014-6458
(commit b799259); in that patch, we avoided hotplug changing
a disk reference during the time of obtaining a monitor lock
by copying all data we needed and no longer referencing disk;
this patch goes the other way and ensures that by holding the
job, the disk cannot be changed so we no longer need to worry
about the disk being invalidated across the monitor lock.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
control to be outside of disk information.
Signed-off-by: Eric Blake <eblake@redhat.com>
When any of the functions modified in commit 214c687b took the false
branch, the function itself used none of its parameters, resulting in
an "unused parameter" error. Rewriting these functions to the stubs we
use elsewhere should fix the problem.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
not supposed to reset the error. The usual procedure was followed,
saving the error to a temporary variable, but that saved error was
never freed and therefore leaked.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit af2a1f05 tried clearly separating each condition in
qemuRestoreCgroupState() for the sake of readability, however somehow
one condition body was missing. That means that the body of the next
condition got executed only if both of them were true, which is
impossible, thus resulting in dead code and a logic error.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In commit d2632d60 we agreed that we want the parsed uid to properly
overflow but only to -1, however the value was read into long and then
wrapped into uid_t. That meant it failed on 32-bit systems.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
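One way to get the same result regardless of the width of long is to
parse into a type that is at least 64 bits wide and only then narrow to
uid_t. The sketch below illustrates that idea only; parse_id() is a
hypothetical helper, not the function the commit touches, and it
assumes a 32-bit unsigned uid_t as on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse a uid/gid string, accepting "-1" as
 * "unspecified". Returns 0 on success, -1 on malformed or out-of-range
 * input. Assumes uid_t is a 32-bit unsigned type.
 */
int parse_id(const char *str, uid_t *out)
{
    char *end = NULL;
    long long val;

    errno = 0;
    val = strtoll(str, &end, 10);   /* long long is at least 64 bits */
    if (errno != 0 || end == str || *end != '\0')
        return -1;

    /* Allow the full unsigned range plus the single sentinel -1. */
    if (val < -1 || val > (long long)UINT_MAX)
        return -1;

    *out = (uid_t)val;              /* -1 wraps to the all-ones value */
    return 0;
}
```

Both "-1" and "4294967295" then yield (uid_t)-1 on 32-bit and 64-bit
hosts alike, while "-2" is rejected.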
Currently virStorageFileResize() function uses build conditionals to
choose either the posix_fallocate() or syscall(SYS_fallocate) with no
fallback in order to preallocate the space in the newly resized file.
Since the safezero code has a similar set of conditionals, modify the
resize and safezero code in order to allow the resize logic to make use
of safezero to unify the look/feel of the code paths.
Add a new boolean (resize) to safezero() to make the optional decision
whether to try syscall(SYS_fallocate) if the posix_fallocate fails because
HAVE_POSIX_FALLOCATE is not defined (e.g., return -1 and errno == 0).
Create a local safezero_sys_fallocate in order to handle the resize
code paths that support that. If not present, set errno = ENOSYS
in order to allow the caller to handle the failure scenarios.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Currently build conditionals decide which of two safezero() functions
should be built - either the posix_fallocate() or mmap() with a fallback
to a slower safewrite() algorithm in order to preallocate space in a raw file.
This patch will refactor safezero to utilize static functions for either
posix_fallocate or mmap/safewrite. The build conditionals still exist, but
are only for shorter sections of code.
The posix_fallocate path will make use of the ret/errno setting to contain
the logic for safezero to decide whether it needs to fallback to other
algorithms. A return of -1 with errno not changed will indicate the conditional
is not present; otherwise, a return of -1 with errno change indicates the
call was made and it failed (no functional difference to current algorithm).
The mmap/safewrite option changes only slightly to handle the ftruncate
failure for mmap. That is, previously if the ftruncate failed, there was
no fallback to the slow safewrite option.
Signed-off-by: John Ferlan <jferlan@redhat.com>
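The ret/errno convention described above can be sketched like this. It
is a simplified illustration, not the libvirt code: the function names
are hypothetical, the mmap step is omitted, and the slow path is a
plain write() loop standing in for safewrite().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success; -1 with errno set on failure; -1 with errno
 * untouched when posix_fallocate support was not compiled in. */
static int zero_posix_fallocate(int fd, off_t offset, off_t len)
{
#ifdef HAVE_POSIX_FALLOCATE
    int rc = posix_fallocate(fd, offset, len); /* returns an errno value */
    if (rc == 0)
        return 0;
    errno = rc;
    return -1;
#else
    (void)fd;
    (void)offset;
    (void)len;
    return -1;
#endif
}

/* Slow fallback: write zero-filled chunks until len bytes are written. */
static int zero_write(int fd, off_t offset, off_t len)
{
    char buf[4096];

    memset(buf, 0, sizeof(buf));
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        len -= n;
    }
    return 0;
}

int safezero_sketch(int fd, off_t offset, off_t len)
{
    int ret;

    errno = 0;
    ret = zero_posix_fallocate(fd, offset, len);
    /* Success, or a real failure of the call: report it as is. */
    if (ret == 0 || errno != 0)
        return ret;
    /* errno unchanged: the fast path is unavailable, so fall back. */
    return zero_write(fd, offset, len);
}
```

Built without HAVE_POSIX_FALLOCATE defined, safezero_sketch() takes the
write loop; built with it, a genuine posix_fallocate() failure is
returned to the caller without trying the fallback.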
Currently, when there is an API that's blocking with locked domain and
second API that's waiting in virDomainObjListFindByUUID() for the domain
lock (with the domain list locked) no other API can be executed on any
domain on the whole hypervisor because all would wait for the domain
list to be locked. This patch adds a new optional approach to this in
which the domain is only ref'd (reference counter is incremented)
instead of being locked and is locked *after* the list itself is
unlocked. We might consider only ref'ing the domain in the future and
leaving locking on particular APIs, but that's not tonight's fairy tale.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
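The ordering described above (take a reference under the list lock,
drop the list lock, then take the domain lock) can be sketched with
plain pthread mutexes and a C11 atomic counter. The structures and
dom_list_find_locked() are hypothetical simplifications, not
virDomainObjList.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified domain object and list. */
struct dom {
    pthread_mutex_t lock;
    atomic_int refs;
    char uuid[37];
};

struct dom_list {
    pthread_mutex_t lock;
    struct dom **doms;
    size_t ndoms;
};

/*
 * Look up a domain by UUID string. The list lock is held only long
 * enough to find the object and take a reference; the possibly
 * long wait for the domain lock happens after the list is unlocked,
 * so other lookups are not blocked behind it.
 * Returns the domain locked and ref'd, or NULL if not found.
 */
struct dom *dom_list_find_locked(struct dom_list *list, const char *uuid)
{
    struct dom *found = NULL;

    pthread_mutex_lock(&list->lock);
    for (size_t i = 0; i < list->ndoms; i++) {
        if (strcmp(list->doms[i]->uuid, uuid) == 0) {
            found = list->doms[i];
            /* The reference keeps the object alive once we unlock. */
            atomic_fetch_add(&found->refs, 1);
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);

    if (found)
        pthread_mutex_lock(&found->lock);
    return found;
}
```

The caller is expected to unlock the domain and drop the reference when
done; that release path is left out of the sketch.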
Volume and pool formatting functions took different approaches to
unspecified uids/gids. When unknown, it is always parsed as -1, but one
of the functions formatted it as unsigned int (wrong) and one as
int (better). Due to that, two of our XML files from tests cannot
be parsed on 32-bit machines.
RNG schema needs to be modified as well, but because both
storagepool.rng and storagevol.rng need same schema for permission
element, save some space by moving it to storagecommon.rng.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When hot-plugging a VCPU into the guest, kvm needs to allocate some data
from the DMA zone, which might be in a memory node that's not allowed in
cpuset.mems. Basically the same problem as there was with starting the
domain and due to which commit 7e72ac7878
exists. This patch just extends it to hotplugging as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1161540
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the cgroup alone and rather set the child data
every time there is new cgroup created. We don't leave any task in the
parent group anyway. This will ease both current and future code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In systemd >= 218, the udev_set_log_fn method has been marked
deprecated and turned into a no-op. Nothing in the udev client
library will print to stderr by default anymore, so we can
just stop installing a logging hook for new enough udev.
For SCSI and SATA devices, controller and unit are used
to specify the drive address. For IDE devices, bus specifies
the IDE bus, because usually there are 2 IDE buses on an IDE
controller.
Parallels SDK allows setting the drive position by calling
PrlVmDev_SetStackIndex. Since PCS VMs have only one
controller of each type, for SATA and SCSI devices it
simply means position on bus, for IDE devices -
2 * bus_number + position_on_bus.
This patch fixes mapping from libvirt's disk->info.addr.drive
to parallels's 'StackIndex'.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
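The address mapping above can be written down as a small function. The
types and drive_to_stack_index() below are hypothetical and only mirror
the formula from the text; the check that an IDE unit is 0 or 1 is an
added assumption (two positions per IDE bus).

```c
#include <assert.h>

/* Hypothetical stand-ins for the bus type and drive address. */
enum disk_bus { DISK_BUS_IDE, DISK_BUS_SCSI, DISK_BUS_SATA };

struct drive_addr {
    unsigned int controller;
    unsigned int bus;
    unsigned int unit;
};

/* Compute the stack index for a drive; returns -1 if it cannot be mapped. */
int drive_to_stack_index(enum disk_bus type, const struct drive_addr *addr,
                         unsigned int *idx)
{
    switch (type) {
    case DISK_BUS_IDE:
        /* Assumption: each IDE bus holds two devices (unit 0 or 1). */
        if (addr->unit > 1)
            return -1;
        *idx = 2 * addr->bus + addr->unit;
        return 0;
    case DISK_BUS_SCSI:
    case DISK_BUS_SATA:
        /* One controller per type, so the unit is the position on bus. */
        *idx = addr->unit;
        return 0;
    }
    return -1;
}
```

For example, the second device on the second IDE bus (bus 1, unit 1)
maps to index 3, while a SATA drive with unit 4 maps to index 4.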
It seems the file format is usually specified even for
real block devices. So report that the file format is
raw in virDomainGetXMLDesc and add checks for proper
file format to prlsdkAddDisk.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
NULL value of virDomainVideoAccelDefPtr means default
values for video acceleration, so don't report error in
this case.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174154
When we use attach-device to add a hostdev or chr device which has an
iscsi address or others (such as a guest agent or a subsys iscsi disk),
we find there is no basic controller for the newly attached device.
Sometimes this makes the guest unable to start after we add them
(although it can start on the second attempt).
Signed-off-by: Luyao Huang <lhuang@redhat.com>
When libvirt is managing a bridge's forwarding database (FDB)
(macTableManager='libvirt'), if we add FDB entries for a new guest
interface even before the qemu process is created, then in the case of
a migration any other guest attached to the "destination" bridge will
have its traffic immediately sent to the destination of the migration
even while the source domain is still running (and the destination, of
course, isn't). To make sure that traffic from other guests on the new
host continues flowing to the old guest until the new one is ready, we
have to wait until the new guest CPUs are started to add the FDB
entries.
Conversely, we need to remove the FDB entries from the bridge any time
the guest CPUs are stopped; among other things, this will assure
proper operation during a post-copy migration (which is just the
opposite of the problem described in the previous paragraph).
We can change vnc password by using virDomainUpdateDeviceFlags API with
live flag. But it can't be changed with config flag. Error is reported as
below.
error: Operation not supported: persistent update of device 'graphics' is not supported
This patch adds support for changing the graphics arguments with the config flag.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
It's not supported to change some graphics arguments with '--live'.
Replace some error code VIR_ERR_INTERNAL_ERROR and VIR_ERR_INVALID_ARG
with VIR_ERR_OPERATION_UNSUPPORTED.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174096
When both parameters have lockspaces present, virDomainLeaseIndex
always returns -1 even if there is a lease identical to the one we
check. This is due to broken logic in the 'if-else' statement.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
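The intended matching rule can be sketched as below. This is a guess at
the corrected shape of the check, not a copy of virDomainLeaseIndex:
struct lease and lease_index() are hypothetical, and the rule assumed
is that keys must be equal and lockspaces must either both be absent or
both be present and equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced lease definition. */
struct lease {
    const char *lockspace;  /* may be NULL */
    const char *key;        /* never NULL */
};

/* Return the index of the lease matching 'want', or -1 if none does. */
int lease_index(const struct lease *leases, size_t n,
                const struct lease *want)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(leases[i].key, want->key) != 0)
            continue;

        if (leases[i].lockspace && want->lockspace) {
            /* Both present: they must compare equal. */
            if (strcmp(leases[i].lockspace, want->lockspace) == 0)
                return (int)i;
        } else if (!leases[i].lockspace && !want->lockspace) {
            /* Both absent: the key match alone is enough. */
            return (int)i;
        }
        /* Exactly one present: not the same lease, keep looking. */
    }
    return -1;
}
```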
https://bugzilla.redhat.com/show_bug.cgi?id=1173507
It occurred to me that OpenStack uses the following XML when not using
regular huge pages:
<memoryBacking>
<hugepages>
<page size='4' unit='KiB'/>
</hugepages>
</memoryBacking>
However, since we are expecting to see huge pages only, we fail to
startup the domain with following error:
libvirtError: internal error: Unable to find any usable hugetlbfs
mount for 4 KiB
While regular system pages are not huge pages technically, our code is
prepared for that and if it helps OpenStack (or other management
applications) we should cope with that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1174053
Introduced by commit id '17bddc46f' - fix a libvirtd crash when
matching a network iscsi hostdev with a host iscsi hostdev.
When we use attach-device to coldplug a network iscsi hostdev,
libvirt will check if there is already a device in XML. But if
the 'b' is a host iscsi hostdev and 'a' is a network iscsi hostdev,
then libvirtd will crash in virDomainHostdevMatchSubsysSCSIiSCSI
because 'b' doesn't have a hostname.
Add a check in virDomainHostdevMatchSubsys that a's protocol
and b's protocol are the same.
Following is the backtrace:
0 0x00007f850d6bc307 in virDomainHostdevMatchSubsysSCSIiSCSI at conf/domain_conf.c:10889
1 virDomainHostdevMatchSubsys at conf/domain_conf.c:10911
2 virDomainHostdevMatch at conf/domain_conf.c:10973
3 virDomainHostdevFind at conf/domain_conf.c:10998
4 0x00007f84f6a10560 in qemuDomainAttachDeviceConfig at qemu/qemu_driver.c:7223
5 qemuDomainAttachDeviceFlags at qemu/qemu_driver.c:7554
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160995
In our config files users are expected to pass several integer values
for different configuration knobs. However, majority of them expect a
nonnegative number and only a few of them accept a negative number too
(notably keepalive_interval in libvirtd.conf).
Therefore, a new type to config value is introduced: VIR_CONF_ULONG
that is set whenever an integer is positive or zero. With this
approach knobs accepting VIR_CONF_LONG should accept VIR_CONF_ULONG
too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
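The typing rule above can be illustrated with a tiny tagged value. The
enum and functions below are hypothetical and only model the rule from
the text: non-negative integers are tagged as unsigned, and a knob that
accepts signed values accepts both tags.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the config value types. */
enum conf_type { CONF_STRING, CONF_LONG, CONF_ULONG };

struct conf_value {
    enum conf_type type;
    long long l;
};

/* Parse an integer and tag it CONF_ULONG when it is zero or positive. */
int conf_parse_int(const char *str, struct conf_value *val)
{
    char *end = NULL;
    long long n;

    errno = 0;
    n = strtoll(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0')
        return -1;
    val->type = n >= 0 ? CONF_ULONG : CONF_LONG;
    val->l = n;
    return 0;
}

/* Knob that may be negative: accepts both integer tags. */
int conf_get_signed(const struct conf_value *val, long long *out)
{
    if (val->type != CONF_LONG && val->type != CONF_ULONG)
        return -1;
    *out = val->l;
    return 0;
}

/* Knob that must be nonnegative: accepts only CONF_ULONG. */
int conf_get_unsigned(const struct conf_value *val, unsigned long long *out)
{
    if (val->type != CONF_ULONG)
        return -1;
    *out = (unsigned long long)val->l;
    return 0;
}
```

With this split, a value such as "-1" is still usable by a knob that
goes through conf_get_signed(), but is rejected by any knob that goes
through conf_get_unsigned().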
There's no need for condition of the following form:
if (str && STREQ(str, dst))
since we have STREQ_NULLABLE macro that handles NULL cases.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
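For readers unfamiliar with the macro, a NULL-tolerant string
comparison can be sketched as a function. streq_nullable() below is a
hypothetical re-implementation for illustration, assuming that two NULL
pointers compare equal and that NULL never equals a real string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if both are NULL, or both are non-NULL and have equal contents. */
bool streq_nullable(const char *a, const char *b)
{
    if (!a || !b)
        return a == b;
    return strcmp(a, b) == 0;
}
```

Under that assumption the helper differs from the explicit form in one
case: when both strings are NULL it reports a match, whereas
"str && STREQ(str, dst)" is false whenever str is NULL.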
For historical reasons, only the first <console> element might be of targetType
serial, but we checked for other consoles of targetType serial in our post-parse
callback if and only if we knew the first console was serial, otherwise
the check was skipped.
This patch moves the check one level up, so first
the check for secondary console of type serial is performed and then the
rest of operations continue unchanged.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1170092
We now have a qemuInterfaceStartDevices() which does the final
activation needed for the host-side tap/macvtap devices that are used
for qemu network connections. It will soon make sense to have the
converse qemuInterfaceStopDevices() which will undo whatever was done
during qemuInterfaceStartDevices().
A function to "stop" a single device has also been added, and is
called from the appropriate place in qemuDomainDetachNetDevice(),
although this is currently unnecessary - the device is going to
immediately be deleted anyway, so any extra "deactivation" will be for
naught. The call is included for completeness, though, in anticipation
that in the future there may be some required action that *isn't*
nullified by deleting the device.
This patch is a part of a more complete fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
The patch that added qemuInterfaceStartDevices() (upstream commit
82977058f5) had an extra conditional to
prevent calling it if the reason for starting the CPUs was
VIR_DOMAIN_RUNNING_UNPAUSED or VIR_DOMAIN_RUNNING_SAVE_CANCELED. This
was put in by the author as the result of a reviewer asking if it was
necessary to ifup the interfaces in *all* occasions (because these
were the two cases where the CPU would have already been started (and
stopped) once, so the interface would already be ifup'ed).
It turns out that, as long as there is no corresponding
qemuInterfaceStopDevices() to ifdown the interfaces anytime the CPUs
are stopped, neglecting to ifup when reason is RUNNING_UNPAUSED or
RUNNING_SAVE_CANCELED doesn't cause any problems (because it just
happens that the interface will have already been ifup'ed by a prior
call when the CPU was previously started for some other reason).
However, it also doesn't *help*, and there will soon be a
qemuInterfaceStopDevices() function which *will* ifdown these
interfaces when the guest CPUs are stopped, and once that is done, the
interfaces will be left down in some cases when they should be up (for
example, if a domain is paused and then unpaused).
So, this patch is removing the condition in favor of always calling
qemuInterfaceStartDevices() when the guest CPUs are started.
This patch (and the aforementioned patch) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
When one domain is being undefined and at the same time started, for
example, there is a possibility of a rare problem occurring.
- Thread 1 does virDomainUndefine(), has the lock, checks that the
domain is active and because it's not, calls
virDomainObjListRemove().
- Thread 2 does virDomainCreate() and tries to lock the domain.
- Thread 1 needs to lock domain list in order to remove the domain from
it, but must unlock domain first (proper order is to lock domain list
first and the domain itself second).
- Thread 2 grabs the lock, starts the domain and releases the lock.
- Thread 1 grabs the lock and removes the domain from list.
With this patch:
- The undefining domain gets marked as "to undefine" before it is
unlocked.
- If domain is found in any of the search APIs, it's returned only if
it is not marked as "to undefine". The check is done while the
domain is locked.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1150505
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When calling virCgroupAllowAllDevices we get these invalid entries
in the device cgroup config.
b -1:-1 rw
c -1:-1 rw
Check for positive values before outputting the major and minor to
avoid that.
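A minimal sketch of such a check follows. format_dev_rule() is a
hypothetical helper, and emitting the '*' wildcard for a negative major
or minor is an assumption about the intended output, based on the
wildcard syntax of the devices cgroup controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one device cgroup rule such as "c 1:3 rw". A negative major or
 * minor is written as the wildcard '*' rather than as "-1".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_dev_rule(char *buf, size_t len, char type, int major, int minor,
                    const char *perms)
{
    char maj[16];
    char min[16];
    int n;

    if (major < 0)
        snprintf(maj, sizeof(maj), "*");
    else
        snprintf(maj, sizeof(maj), "%d", major);

    if (minor < 0)
        snprintf(min, sizeof(min), "*");
    else
        snprintf(min, sizeof(min), "%d", minor);

    n = snprintf(buf, len, "%c %s:%s %s", type, maj, min, perms);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```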
For host-passthrough CPU we don't honor the CPU
features specified in the XML, but we allow
outputting them via the UPDATE_CPU flag for dumpxml,
this gives user a rough idea of what features the CPU
might have.
After restoring a managedsave'd domain, the features
might end up in the live status XML (in /var/run) without
the model. This XML cannot be parsed by the daemon after
restart and the domain might disappear.
This fix skips formatting the features for HOST_PASSTHROUGH
when UPDATE_CPU is not specified, so the newly restored domains
and newly created snapshots won't be affected.
Note: this doesn't fix existing snapshots or already restored
running domains.
https://bugzilla.redhat.com/show_bug.cgi?id=1030793
https://bugzilla.redhat.com/show_bug.cgi?id=1151885
A logic bug in qemuConnectGetAllDomainStats makes the code mark the
monitor as available when qemuDomainObjBeginJob fails, instead of when
it succeeds, as the correct flow requires.
This patch fixes the check and updates the code documentation
accordingly.
Broken by commit 57023c0a3a.
Signed-off-by: Francesco Romani <fromani@redhat.com>
When using qemuProcessAttach to attach a qemu process,
the DAC label is not filled correctly.
Introduce a new function to get the uid:gid from the system
and fill the label.
This fixes the daemon crash when 'virsh screenshot' is called:
https://bugzilla.redhat.com/show_bug.cgi?id=1161831
It also fixes qemu-attach after the prerequisite of this patch
(commit f8c1fb3) was pushed out of order.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Currently, MAC registration occurs during device creation, which is
early enough that, during live migration, you end up with duplicate
MAC addresses on still-running source and target devices, even though
the target device isn't actually being used yet.
This patch proposes to defer MAC registration until right before
the guest can actually use the device -- In other words, right
before starting guest CPUs.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
Some programs want to change some values for the network interfaces
configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
allows wicked to work on openSUSE 13.2+.
Reuse the lxcNeedNetworkNamespace function to tell
lxcContainerMountBasicFS if the netns is disabled. When no netns is
set up, then we don't mount the /proc/sys/net/ipv[46] folder RW as
these would provide full access to the host NICs config.
https://bugzilla.redhat.com/show_bug.cgi?id=1172015
The refactoring done as part of commit id '59446096' caused a regression
for the multi initiator IQN commit '6aabcb5b' because the sendtargets was
not done on/for the initiator IQN prior to login (or trying to disable
autologin).
Prior to that commit, the paths were essentially
virStorageBackendISCSIStartPool
virStorageBackendISCSILogin
virStorageBackendISCSIConnection
if initiatoriqn
virStorageBackendCreateIfaceIQN
Issue sendtargets
Perform --login
else
Issue sendtargets
Perform --login
After that commit:
virStorageBackendISCSIStartPool
Issue sendtargets
Call virStorageBackendISCSIConnection
If initiatoriqn
virStorageBackendCreateIfaceIQN
Perform --login
else
Perform --login
So for non initiator IQN paths, nothing changed. For the initiator path,
the --login fails as does any attempts to change autologin via "--op update
--name node.startup --value manual".
In old versions of parted, such as parted-2.1-25, the error message is
printed to stdout when printing info for a disk without a disk label.
Error: /dev/sda: unrecognised disk label
This line has been moved to stderr in newer versions of parted. So we
should check both stdout and stderr when locating this message.
This should fix bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1172468
Signed-off-by: Hao Liu <hliu@redhat.com>
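Checking both streams amounts to a substring search in each captured
buffer. The helper below is a hypothetical sketch of that check and
assumes the caller has already captured parted's stdout and stderr as
NUL-terminated strings (either may be NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if either captured stream carries the "no disk label" message. */
bool parted_reports_no_label(const char *out, const char *err)
{
    const char *msg = "unrecognised disk label";

    return (out && strstr(out, msg) != NULL) ||
           (err && strstr(err, msg) != NULL);
}
```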
When user doesn't have read access on one of the domains he requested,
the for loop could exit abruptly or continue and override pointer which
pointed to locked object.
This patch fixes two issues at once. One is that domflags might have
had QEMU_DOMAIN_STATS_HAVE_JOB even when there was no job started (this
is fixed by doing domflags |= QEMU_DOMAIN_STATS_HAVE_JOB only when the
job was acquired and cleaning domflags on every start of the loop).
The second one is that the domain is kept locked when
virConnectGetAllDomainStatsCheckACL() fails and the loop continues
instead of ending. Adding a simple virObjectUnlock() and clearing the
pointer ought to do.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If we want to perform some operation and domain state is not suitable
for that operation, we should report error VIR_ERR_OPERATION_INVALID.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
When PrlJob_GetRetCode sets second argument to
error value it means sdk function failed and we
must return error from getJobResultHelper.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Return error code, returned by parallels SDK from
waitJob and getJobResult, so that caller can handle
different errors.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Get cdrom devices list from parallels server in
prlsdkLoadDomains and add ability to define a domain
with cdroms.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
First, we don't need to call prlsdkApplyConfig after
creating new VM or containers, because it's done in
functions prlsdkCreateVm and prlsdkCreateCt.
No need to check, if domain exists in the list after
prlsdkAddDomain.
Also organize code, so that we can call virObjectUnlock
in one place.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
This patch replaces code, which creates domains by
running prlctl command.
prlsdkCreateVm/Ct will do prlsdkApplyConfig, because
we send request to the server only once in this case.
But prlsdkApplyConfig will be called also from
parallelsDomainDefineXML function. There is no problem with
it, parallelsDomainDefineXML will be refactored later.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Rewrite the code which applies the domain configuration given
to the virDomainDefineXML function to the VM or container
registered in PCS.
This code first checks if there are unsupported parameters
in the domain XML and, if so, reports an error. Some such
parameters are not supported by PCS; for others it's not
obvious how to convert them into PCS's corresponding params,
so let's put that off and implement only basic params in
this patch.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Change domain state using parallels SDK functions instead of
prlctl command.
We don't need to send events from these functions now, because
events handler will send them. But we still need to update
virDomainObj in privconn->domains.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Subscribe to events from parallels server. It's
needed for 2 things: to update cached domains list
and to send corresponding libvirt events.
Parallels server sends a lot of different events, in
this patch we handle only some of them. In the future
we can handle for example, changes in a host network
configuration or devices states.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Move macro parallelsDomNotFoundError to file parallels_utils.h, because
it will be used in parallels_sdk.c.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
Obtain information about domains using parallels sdk instead of prlctl.
The prlsdkLoadDomains function behaves like the former parallelsLoadDomains
with NULL as the second parameter (name): it fills the parallelsConn.domains list.
prlsdkLoadDomain is now able to update specified domain by given
virDomainObjPtr.
Signed-off-by: Dmitry Guryanov <dguryanov@parallels.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1082521
Support for shared hostdevs was added in a number of commits, initially
starting with 'f2c1d9a80' and most recently commit id 'fd243fc4' to fix
issues with the initial implementation. Missed in all those changes was
the need to mimic the virSELinux{Set|Restore}SecurityDiskLabel code to
handle the "shared" (or shareable) and readonly options when Setting
or Restoring the SELinux labels.
This patch adjusts virSecuritySELinuxSetSecuritySCSILabel so that it no
longer uses virSecuritySELinuxSetSecurityHostdevLabelHelper to set
the label. Instead it follows what the Disk code does and sets the label
differently based on whether shareable/readonly is set. This patch
also modifies virSecuritySELinuxRestoreSecuritySCSILabel to follow
the same logic as virSecuritySELinuxRestoreSecurityImageLabelInt and not
restore the label if the device is shared/readonly.
https://bugzilla.redhat.com/show_bug.cgi?id=1171582
When a device is given a negative controller number in its address,
some device types auto-generate a controller with an invalid index
number. This makes the guest disappear after libvirtd is restarted.
Instead of allowing a negative number for the controller index, we
should forbid negative numbers in these places (we did this before,
but after f18c02ec, virStrToLong_ui changed to allow negative
numbers). Therefore switch to virStrToLong_uip in these places.
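To illustrate why the stricter parser matters, here is a minimal
standalone sketch. It is not the libvirt implementation:
parse_uint_strict is a hypothetical helper that only mirrors the
intent of virStrToLong_uip, namely that a leading minus sign is an
error instead of being wrapped around to a huge unsigned value.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 unsigned int and reject any
 * input that carries a minus sign. Returns 0 on success, -1 on error. */
static int parse_uint_strict(const char *str, unsigned int *result)
{
    const char *p = str;
    char *end = NULL;
    unsigned long val;

    assert(str != NULL && result != NULL);

    /* strtoul() skips leading whitespace and then silently accepts a
     * minus sign, wrapping the value around, so look for it first. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')
        return -1;

    errno = 0;
    val = strtoul(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0' || val > UINT_MAX)
        return -1;

    *result = (unsigned int)val;
    return 0;
}
```

With this shape, a controller index of "-1" fails at parse time
rather than producing an index that cannot be represented later.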
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Avoid leaving the domain locked on a failed ACL check in
qemuDomainMigratePerform() and qemuDomainMigrateFinish2().
Introduced in commit abf75aea24 (Add ACL checks into the QEMU driver).
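The class of bug can be sketched with toy types. Everything below
(toy_domain, toy_lock, toy_unlock, toy_migrate_perform) is a
hypothetical stand-in and not the qemu driver code; the point is only
that the failed-ACL path must go through the same unlock as every
other exit.

```c
#include <assert.h>

/* Toy stand-ins for a lockable domain object and an ACL decision. */
struct toy_domain {
    int locked;
    int acl_allowed;
};

static void toy_lock(struct toy_domain *d)
{
    assert(!d->locked);
    d->locked = 1;
}

static void toy_unlock(struct toy_domain *d)
{
    assert(d->locked);
    d->locked = 0;
}

static int toy_migrate_perform(struct toy_domain *dom)
{
    int ret = -1;

    toy_lock(dom);

    if (!dom->acl_allowed)
        goto cleanup;   /* a bare "return -1" here would leave dom locked */

    /* ... the real work would happen here ... */
    ret = 0;

 cleanup:
    toy_unlock(dom);
    return ret;
}
```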
Commit c75425734 introduced a compilation failure:
../../src/access/viraccessdriverpolkit.c: In function 'virAccessDriverPolkitCheck':
../../src/access/viraccessdriverpolkit.c:137:5: error: format '%d' expects argument of type 'int', but argument 9 has type 'pid_t' [-Werror=format=]
VIR_DEBUG("Check action '%s' for process '%d' time %lld uid %d",
^
Since mingw pid_t is 64 bits, it's easier to just follow what we've
done elsewhere and cast to a large enough type when printing pids.
* src/access/viraccessdriverpolkit.c (virAccessDriverPolkitCheck):
Add cast.
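A standalone sketch of the casting idiom follows. format_pid is a
hypothetical helper, not part of libvirt; it only shows that widening
the pid to long long makes the format string correct no matter how
wide pid_t is on the platform.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format a pid into buf. Returns what snprintf()
 * returns, i.e. the number of characters the full string needs. */
static int format_pid(char *buf, size_t buflen, pid_t pid)
{
    assert(buf != NULL);

    /* pid_t may be int or a 64-bit type depending on the platform;
     * casting to long long always matches the %lld conversion. */
    return snprintf(buf, buflen, "process '%lld'", (long long)pid);
}
```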
Signed-off-by: Eric Blake <eblake@redhat.com>
lxcProcessSetupInterfaces() used to have a special case for
actualType='network' (a network with forward mode of route, nat, or
isolated) to call the libvirt public API to retrieve the bridge being
used by a network. That is no longer necessary - since all network
types that use a bridge and tap device now get the bridge name stored
in the ActualNetDef, we can just always use
virDomainNetGetActualBridgeName() instead.
qemuNetworkIfaceConnect() used to have a special case for
actualType='network' (a network with forward mode of route, nat, or
isolated) to call the libvirt public API to retrieve the bridge being
used by a network. That is no longer necessary - since all network
types that use a bridge and tap device now get the bridge name stored
in the ActualNetDef, we can just always use
virDomainNetGetActualBridgeName() instead.
(an audit of the two callers to qemuNetworkIfaceConnect() confirms
that it is never called for any other type of network, so the dead
code in the else statement (logging an internal error if it is called
for any other type of network) is eliminated in the process.)
When libvirt is managing the MAC table of a Linux host bridge, it must
turn off learning and unicast_flood for each tap device attached to
that bridge, then add a Forwarding Database (fdb) entry for the tap
device using the MAC address from the domain interface config.
Once we have disabled learning and flooding, any packet that has a
destination MAC address not present in the fdb will be dropped by the
bridge. This, along with the opportunistic disabling of promiscuous
mode[*], can result in enhanced network performance and a potential
slight security improvement.
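The forwarding behaviour described above can be modelled in a few
lines. This is a toy model only, not kernel or libvirt code; the
names toy_fdb_entry and toy_fdb_egress are made up for illustration.
With learning and flooding off, the table is the sole source of
truth, so an unknown destination means the frame is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* One statically configured fdb entry: MAC address -> bridge port. */
struct toy_fdb_entry {
    unsigned char mac[MAC_LEN];
    int port;
};

/* Returns the egress port for dst, or -1 if the frame is dropped. */
static int toy_fdb_egress(const struct toy_fdb_entry *fdb, size_t n,
                          const unsigned char dst[MAC_LEN])
{
    size_t i;

    assert(fdb != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (memcmp(fdb[i].mac, dst, MAC_LEN) == 0)
            return fdb[i].port;
    }

    /* No learning and no flooding: an unknown destination is dropped. */
    return -1;
}
```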
[*] If there is only one device on the bridge with learning/unicast_flood
enabled, then that device will automatically have promiscuous mode
disabled. If there are *no* devices with learning/unicast_flood
enabled (e.g. for a libvirt "route", "nat", or isolated network that
has no physical device attached), then all non-tap devices will have
promiscuous mode disabled (tap devices always have promiscuous mode
enabled, which may be a bug in the kernel, but in practice has 0
effect).
None of this has any effect for kernels prior to 3.15 (upstream kernel
commit 2796d0c648c940b4796f84384fbcfb0a2399db84 "bridge: Automatically
manage port promiscuous mode"). Even after that, until kernel 3.17
(upstream commit 5be5a2df40f005ea7fb7e280e87bbbcfcf1c2fc0 "bridge: Add
filtering support for default_pvid") traffic will not be properly
forwarded without manually adding vlan table entries. Unfortunately,
although the presence of the first patch is signalled by existence of
the "learning" and "unicast_flood" options in sysfs, there is no
reliable way to query whether or not the system's kernel has the
second of those patches installed; the only thing that can be done is
to try the setting and see if traffic continues to pass.
When the bridge device for a network has macTableManager='libvirt' the
intent is that all kernel management of the bridge's MAC table
(Forwarding Database, or fdb, in the case of a Linux Host Bridge) be
disabled, with libvirt handling updates to the table instead. The
setup required for the bridge itself is:
1) set the "vlan_filtering" property of the bridge device to 1.
2) If the bridge has a "Dummy" tap device used to set a fixed MAC
address on the bridge (which is always the case for a bridge created
by libvirt, and never the case for a bridge created by the host system
network config), turn off learning and unicast_flood on this tap (this
is needed even though this tap is never IFF_UP, because the kernel
ignores the IFF_UP flag of devices when using their settings to
automatically decide whether or not to turn off promiscuous mode for
any attached device).
(1) is done both for libvirt-created/managed bridges, and for bridges
that are created by the host system config, while (2) is done only for
bridges created by libvirt (i.e. for forward modes of nat, routed, and
isolated bridges).
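As a rough sketch of where these settings live, the helper below
builds the sysfs path for a bridge or bridge-port attribute. The
helper name toy_sysfs_path and the device names in the usage note are
assumptions for illustration; the code does not write anything and is
not how libvirt itself applies the settings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "/sys/class/net/<ifname>/<group>/<attr>" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
static int toy_sysfs_path(char *buf, size_t buflen, const char *ifname,
                          const char *group, const char *attr)
{
    int n;

    assert(buf && ifname && group && attr);

    n = snprintf(buf, buflen, "/sys/class/net/%s/%s/%s", ifname, group, attr);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Step (1) would then correspond to writing "1" to the path built from
("virbr0", "bridge", "vlan_filtering"), and step (2) to writing "0"
to the paths built from ("virbr0-nic", "brport", "learning") and
("virbr0-nic", "brport", "unicast_flood"), where virbr0 and
virbr0-nic are example names for the bridge and its dummy tap.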
There is no attempt to turn vlan_filtering off when destroying the
network because in the case of a libvirt-created bridge, the bridge is
about to be destroyed anyway, and in the case of a system bridge, if
the other devices attached to the bridge could operate properly before
destroying libvirt's network object, they will continue to operate
properly (this is similar to the way that libvirt will enable
ip_forwarding whenever a routed/natted network is started, but will
never attempt to disable it if they are stopped).
At the time that the network driver allocates a connection to a
network, the tap device that will be used hasn't yet been created -
that will be done later by qemu (or lxc or whoever) - but if the
network has macTableManager='libvirt', then when we do get around to
creating the tap device, we will need to add an entry for it to the
network bridge's fdb (forwarding database) *and* turn off learning and
unicast_flood for that tap device in the bridge's sysfs settings. This
means that qemu needs to know both the bridge name as well as the
setting of macTableManager, so we either need to create a new API to
retrieve that info, or just pass it back in the ActualNetDef that is
created during networkAllocateActualDevice. We choose the latter
method, since it's already done for the bridge device, and it has the
side effect of making the information available in domain status.
(NB: in the future, I think that the tap device should actually be
created by networkAllocateActualDevice(), as that will solve several
other problems, but that is a battle for another day, and this
information will still be useful outside the network driver)
When the actualType of a virDomainNetDef is "network", it means that
we are connecting to a libvirt-managed network (routed, natted, or
isolated) which does use a bridge device (created by libvirt). In the
past we have required drivers such as qemu to call the public API to
retrieve the bridge name in this case (even though it is available in
the NetDef's ActualNetDef if the actualType is "bridge", i.e., an
externally-created bridge that isn't managed by libvirt). There is no
real reason for this difference, and as a matter of fact it
complicates things for qemu. Also, there is another bridge-related
attribute (macTableManager) that will need to be available in both
cases, so this makes things consistent.
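A simplified model of the resulting layout is sketched here. The
types and the accessor (toy_actual_net, toy_actual_bridge_name) are
hypothetical and much smaller than the real virDomainActualNetDef;
they only show one accessor serving both the "network" and "bridge"
types from the same stored fields.

```c
#include <assert.h>
#include <stddef.h>

enum toy_net_type {
    TOY_NET_TYPE_NETWORK,   /* libvirt-managed network with a bridge */
    TOY_NET_TYPE_BRIDGE,    /* externally created bridge */
    TOY_NET_TYPE_DIRECT     /* no bridge involved */
};

struct toy_actual_net {
    enum toy_net_type type;
    union {
        struct {
            const char *brname;
            int mac_table_manager;
        } bridge;               /* used for both NETWORK and BRIDGE */
        struct {
            const char *linkdev;
        } direct;
    } data;
};

/* Returns the bridge name, or NULL if this type has no bridge. */
static const char *toy_actual_bridge_name(const struct toy_actual_net *actual)
{
    assert(actual != NULL);

    if (actual->type == TOY_NET_TYPE_NETWORK ||
        actual->type == TOY_NET_TYPE_BRIDGE)
        return actual->data.bridge.brname;

    return NULL;
}
```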
In order to avoid problems when restarting libvirtd after an update
from an older version that *doesn't* store the network's bridgename in
the ActualNetDef, we also need to put it in place during
networkNotifyActualDevice() (this function is run for each interface
of each domain whenever libvirtd is restarted).
Along with making the bridge name available in the internal object, it
is also now reported in the <source> element of the <interface> state
XML (or the <actual> subelement in the internally-stored format).
The one oddity about this change is that usually there is a separate
union for every different "type" in a higher level object (e.g. in the
case of a virDomainNetDef there are separate "network" and "bridge"
members of the union that pivots on the type), but in this case
network and bridge types both have exactly the same attributes, so the
"bridge" member is used for both type==network and type==bridge.
The macTableManager attribute of a network's bridge subelement tells
libvirt how the bridge's MAC address table (used to determine the
egress port for packets) is managed. In the default mode, "kernel",
management is left to the kernel, which usually determines entries in
part by turning on promiscuous mode on all ports of the bridge,
flooding packets to all ports when the correct destination is unknown,
and adding/removing entries to the fdb as it sees incoming traffic
from particular MAC addresses. In "libvirt" mode, libvirt turns off
learning and flooding on all the bridge ports connected to guest
domain interfaces, and adds/removes entries according to the MAC
addresses in the domain interface configurations. A side effect of
turning off learning and unicast_flood on the ports of a bridge is
that (with Linux kernel 3.17 and newer), the kernel can automatically
turn off promiscuous mode on one or more of the bridge's ports
(usually only the one interface that is used to connect the bridge to
the physical network). The result is better performance (because
packets aren't being flooded to all ports, and can be dropped earlier
when they are of no interest) and slightly better security (a guest
can still send out packets with a spoofed source MAC address, but will
only receive traffic intended for the guest interface's configured MAC
address).
The attribute looks like this in the configuration:
<network>
<name>test</name>
<bridge name='br0' macTableManager='libvirt'/>
...
This patch only adds the config knob, documentation, and test
cases. The functionality behind this knob is added in later patches.
These two functions use netlink RTM_NEWNEIGH and RTM_DELNEIGH messages
to add and delete entries from a bridge's fdb. The bridge itself is
not referenced in the arguments to the functions, only the name of the
device that is attached to the bridge (since a device can only be
attached to one bridge at a time, and must be attached for this
function to make sense, the kernel easily infers which bridge's fdb is
being modified by looking at the device name/index).
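For readers unfamiliar with these messages, here is a Linux-only
sketch that fills in (but does not send) such a request. It is not
the code added by this patch: the struct and function names are made
up, and the particular choice of ndm_state and ndm_flags values is an
assumption for illustration. Note that only the interface index of
the attached device appears in the message, never the bridge.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

#define MAC_LEN 6

/* Hypothetical request layout: header, neighbour message, attributes. */
struct toy_fdb_request {
    struct nlmsghdr hdr;
    struct ndmsg ndm;
    char attrs[64];
};

/* Fill in an add (add != 0) or delete (add == 0) request for the fdb
 * entry "mac" on the bridge that the device "ifindex" is attached to. */
static void toy_fdb_request_init(struct toy_fdb_request *req, int add,
                                 int ifindex, const unsigned char mac[MAC_LEN])
{
    struct rtattr *rta;

    assert(req != NULL && mac != NULL);
    memset(req, 0, sizeof(*req));

    req->hdr.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH;
    req->hdr.nlmsg_flags = NLM_F_REQUEST;
    if (add)
        req->hdr.nlmsg_flags |= NLM_F_CREATE | NLM_F_EXCL;

    req->ndm.ndm_family = AF_BRIDGE;
    req->ndm.ndm_ifindex = ifindex;                 /* the tap, not the bridge */
    req->ndm.ndm_state = NUD_NOARP | NUD_PERMANENT; /* assumed: static entry */
    req->ndm.ndm_flags = NTF_MASTER;                /* assumed: act on the master bridge */

    /* Append the MAC address as an NDA_LLADDR attribute. */
    rta = (struct rtattr *)req->attrs;
    rta->rta_type = NDA_LLADDR;
    rta->rta_len = RTA_LENGTH(MAC_LEN);
    memcpy(RTA_DATA(rta), mac, MAC_LEN);

    req->hdr.nlmsg_len = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(struct ndmsg))) +
                         RTA_LENGTH(MAC_LEN);
}
```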
I'm about to make block stats optionally more complex to cover
backing chains, where block.count will no longer equal the number
of <disks> for a domain. For these reasons, it is nicer if the
statistics output includes the source path (for local files).
This patch doesn't add anything for network disks, although we
may decide to add that later.
With this patch, I now see the following for the same domain as
in the previous patch (one qcow2 file, and an empty cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.1.name=hdc
* src/libvirt-domain.c (virConnectGetAllDomainStats): Document
new field.
* tools/virsh.pod (domstats): Document new field.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Return the new
stat for local files/block devices.
(QEMU_ADD_NAME_PARAM): Add parameter.
(qemuDomainGetStatsInterface): Update caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
I noticed that for an offline domain, 'virsh domstats --block $dom'
was producing just the domain name, with no stats. But the older
'virsh domblkinfo' works just fine on offline domains. This patch
starts to get us closer, by at least reporting the disk names for
an offline domain.
With this patch, I now see the following for an offline domain
with one qcow2 disk and an empty cdrom drive:
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Don't short-circuit
output of block name.
Signed-off-by: Eric Blake <eblake@redhat.com>
At least with 'virsh domstats --block' on an offline domain, we
currently output no stats even though we recognize the stat
category. Although a later patch will improve this situation,
it is better to document that this is expected behavior.
Also, while the current implementation rejects filtering flags
for virDomainListGetStats, this limitation may be lifted in the
future and we do not enforce it at the API level.
* src/libvirt-domain.c (virConnectGetAllDomainStats): Document
that recognized stats might not be reported.
(virDomainListGetStats): Likewise, and tweak filtering documentation.
Signed-off-by: Eric Blake <eblake@redhat.com>
qemuDomainGetStatsBlock() could leak a stats hash table if it
encountered OOM while populating the virTypedParameters.
Oddly, the fix doesn't even touch qemuDomainGetStatsBlock :)
* src/qemu/qemu_driver.c (QEMU_ADD_COUNT_PARAM)
(QEMU_ADD_NAME_PARAM): Don't return early.
(qemuDomainGetStatsInterface): Adjust caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
Whenever a client socket was marked as closed for some reason, that reason
could be overwritten later when the connection was actually closed. With this
patch the reason recorded the first time the socket is marked as closed is kept.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When a user tries to insert element metadata and provides a namespace
declaration as well, we currently insert the element without any validation
of the XML prefix (if provided). The next VM start would then
fail with a parse error. This patch fixes the issue by adding a call to
the xmlValidateNCName function to check for illegal characters in the
prefix.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143921
If probing capabilities via QMP fails, we now have a check
that prevents us falling back to -help parsing. Unfortunately
the error message
"Failed to probe capabilities for /usr/bin/qemu-kvm:
unsupported configuration: QEMU 2.1.2 is too new for help parsing"
is proving rather unhelpful to the user. We need to be telling
them why QMP failed (the root cause), rather than that they can't
use -help (the side effect).
To do this we should capture stderr during QMP probing, and
if -help parsing then sees a new QEMU version, we know that
QMP should have worked, and so we can show the messages from
stderr. The message thus becomes
"Failed to probe capabilities for /usr/bin/qemu-kvm:
internal error: QEMU / QMP failed: Could not access
KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory"
When attempting to create an internal system checkpoint with a passthrough
device, qemu will report the following error:
error: operation failed: Error -22 while writing VM
This patch calls the function that checks whether migration is possible
with the given VM and thus improves the error to:
error: Requested operation is not valid: domain has assigned non-USB host devices
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=874418#c19
Signed-off-by: Peter Krempa <pkrempa@redhat.com>