There are still some systems out there that have broken
setfilecon*() prototypes. Instead of taking 'const char *tcon' it
is taking 'char *tcon'. The function should just set the context,
not modify it.
We had been bitten with this problem before which resulted in
292d3f2d and subsequently b109c09765. However, with one my latest
commits (4674fc6afd) I've changed the type of @tcon variable to
'const char *' which results in build failure on the systems from
above.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function is used only from code compiled on Linux. Therefore
on non-Linux platforms it triggers compilation error:
../../src/qemu/qemu_domain.c:209:1: error: unused function 'qemuDomainGetPreservedMounts' [-Werror,-Wunused-function]
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For the blockjobs, where libvirt is able to track the state internally
we can fix locking of images we can remove the appropriate locks.
Also when doing a pivoting operation we should not acquire the lock on
any of those images since both are actually locked already.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1302168
Images that became the backing chain of the current image due to the
snapshot need to be unlocked in the lock manager. Also if qemu was
paused during the snapshot the current top level images need to be
released until qemu is resumed so that they can be acquired properly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1191901
The code at first changed the definition and then rolled it back in case
of failure. This was ridiculous. Refactor the code so that the image in
the definition is changed only when the snapshot is successful.
The refactor will also simplify further fix of image locking when doing
snapshots.
Libvirt is able to properly model what happens to the backing chain
after a snapshot so there's no real need to redetect the data.
Additionally with the _REUSE_EXT flag this might end up in redetecting
wrong data if the user puts wrong backing chain reference into the
snapshot image.
Commit id 'a48c674f' caused problems for systems without PARTED installed.
So move the PARTED probing code back to storage_backend_disk.c and create
a shim within storage_backend.c to call it if WITH_STORAGE_DISK is true;
otherwise, just return -1 with the error.
At startup time, rather than blindly trusting the target devices are
still properly formatted, let's check to make sure the pool's target
devices are all properly formatted before attempting to start the pool.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1373711
Add support and documentation for the [NO_]OVERWRITE flags for the
logical backend.
Update virsh.pod with a description of the process for usage of
the flags and building of the pool's volume group.
Signed-off-by: John Ferlan <jferlan@redhat.com>
If the build fails, then we need to ensure that we've run pvremove
on any devices which we've run pvcreate on; otherwise, a subsequent
build could fail since running pvcreate twice on a device requires
special force arguments.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Currently as long as the disk is formatted using a known parted format
type, the algorithm is happy to continue. However, that leaves a scenario
whereby a disk formatted using "pc98" could be used by a pool that's defined
using "dvh" (or vice versa). Alter the check to be match and different
and adjust the caller.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than have the Disk code having to use PARTED to determine if
there's something on the device, let's use the virStorageBackendDeviceProbe.
and only fallback to the PARTED probing if the BLKID code isn't built in.
This will also provide a mechanism for the other current caller (File
System Backend) to utilize a PARTED parsing algorithm in the event that
BLKID isn't built in to at least see if *something* exists on the disk
before blindly trying to use. The PARTED error checking will not find
file system types, but if there is a partition table set on the device,
it will at least cause a failure.
Move virStorageBackendDiskValidLabel and virStorageBackendDiskFindLabel
to storage_backend and rename/rework the code to fit the new model.
Update the virsh.pod description to provide a more generic description
of the process since we could now use either blkid or parted to find
data on the target device.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Prior to starting up, let's be sure the target volume device is
formatted as we expect; otherwise, inhibit the start.
Signed-off-by: John Ferlan <jferlan@redhat.com>
It's possible that the API could be called from a startup path in
order to check whether the label on the device matches what our
format is. In order to handle that condition, add a 'writelabel'
boolean to the API in order to indicate whether a write or just
read is about to happen.
This alters two "error" conditions that would care about knowing.
Signed-off-by: John Ferlan <jferlan@redhat.com>
A device may be formatted using some sort of disk partition format type.
We can check that using the blkid_ API's as well - so alter the logic to
allow checking the device for both a filesystem and a disk partition.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1363586
Commit id '27758859' introduced the "NO_OVERWRITE" flag check for
file system backends; however, the implementation, documentation,
and algorithm was inconsistent. For the "flag" description for the
API the flag was described as "Do not overwrite existing pool";
however, within the storage backend code the flag is described
as "it probes to determine if filesystem already exists on the
target device, renurning an error if exists".
The code itself was implemented using the paradigm to set up the
superblock probe by creating a filter that would cause the code
to only search for the provided format type. If that type wasn't
found, then the algorithm would return success allowing the caller
to format the device. If the format type already existed on the
device, then the code would fail indicating that the a filesystem
of the same type existed on the device.
The result is that if someone had a file system of one type on the
device, it was possible to overwrite it if a different format type
was specified in updated XML effectively trashing whatever was on
the device already.
This patch alters what NO_OVERWRITE does for a file system backend
to be more realistic and consistent with what should be expected when
the caller requests to not overwrite the data on the disk.
Rather than filter results based on the expected format type, the
code will allow success/failure be determined solely on whether the
blkid_do_probe calls finds some known format on the device. This
adjustment also allows removal of the virStoragePoolProbeResult
enum that was under utilized.
If it does find a formatted file system different errors will be
generated indicating a file system of a specific type already exists
or a file system of some other type already exists.
In the original virsh support commit id 'ddcd5674', the description
for '--no-overwrite' within the 'pool-build' command help output
has an ambiguous "of this type" included in the short description.
Compared to the longer description within the "Build a given pool."
section of the virsh.pod file it's more apparent that the meaning
of this flag would cause failure if a probe of the target already
has a filesystem.
So this patch also modifies the short description to just be the
antecedent of the 'overwrite' flag, which matches the API description.
This patch also modifies the grammar in virsh.pod for no-overwrite
as well as reworking the paragraph formats to make it easier to read.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rename virStorageBackendFileSystemProbe and to virStorageBackendBLKIDFindFS
and move to the more common storage_backend module.
Create a shim virStorageBackendDeviceIsEmpty which will make the call
to the virStorageBackendBLKIDFindFS and check the return value.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Again, there is no need to create /var/lib/libvirt/$domain.*
directories in CreateNamespace(). It is sufficient to create them
as soon as we need them which is in BuildNamespace. This way we
don't leave them around for the whole lifetime of domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The c1140eb9e got me thinking. We don't want to special case /dev
in qemuDomainGetPreservedMounts(), but in all other places in the
code we special case it anyway. I mean,
/var/run/libvirt/$domain.dev path is constructed separately just
so that it is not constructed here. It makes only a little sense
(if any at all).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If something goes wrong in this function we try a rollback. That
is unlink all the directories we created earlier. For some weird
reason unlink() was called instead of rmdir().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far if qemu is spawned under separate mount namespace in order
to relabel everything it needs an access to the security driver
to run in that namespace too. This has a very nasty down side -
it is being run in a separate process, so any internal state
transition is NOT reflected in the daemon. This can lead to many
sleepless nights. Therefore, use the transaction APIs so that
libvirt developers can sleep tight again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
With our new qemu namespace code in place, the relabelling of
devices is done not as good is it could: a child process is
spawned, it enters the mount namespace of the qemu process and
then runs desired API of the security driver.
Problem with this approach is that internal state transition of
the security driver done in the child process is not reflected in
the parent process. While currently it wouldn't matter that much,
it is fairly easy to forget about that. We should take the extra
step now while this limitation is still fresh in our minds.
Three new APIs are introduced here:
virSecurityManagerTransactionStart()
virSecurityManagerTransactionCommit()
virSecurityManagerTransactionAbort()
The Start() is going to be used to let security driver know that
we are starting a new transaction. During a transaction no
security labels are actually touched, but rather recorded and
only at Commit() phase they are actually updated. Should
something go wrong Abort() aborts the transaction freeing up all
memory allocated by transaction.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code at the very bottom of the DAC secdriver that calls
chown() should be fine with read-only data. If something needs to
be prepared it should have been done beforehand.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.
Users and applications can already opt-in by explicitly using
<address type='pci'/>
inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.
What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.
That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.
Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
When coldplugging vcpus to a VM that already has a few hotpluggable
vcpus the code might generate invalid configuration as
non-hotpluggable cpus need to be clustered starting from vcpu 0.
This fix forces the added vcpus to be hotpluggable in such case.
Fixes a corner case described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1370357
This patch adds support and documentation for
a generalized hardware cache event called cache_l1d
perf event.
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
The public virSecret object has a single "usage_id" field
but the virSecretDef object has a different 'char *' field
for each usage type, but the code all assumes every usage
type has a corresponding single string. Get rid of the
pointless union in virSecretDef and just use "usage_id"
everywhere. This doesn't impact public XML format, only
the internal handling.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The handler for the device removal failed event was using
the struct for the device added event. Fortunately the
layout was the same, so this was harmless.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When changing the metadata via virDomainSetMetadata, we now
emit an event to notify the app of changes. This is useful
when co-ordinating different applications read/write of
custom metadata.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently when spawning containers with systemd, the container PID 1
will get moved into the systemd machine slice. Libvirt then manually
moves the libvirt_lxc and qemu-nbd processes into the cgroups associated
with the slice, but skips the systemd controller cgroup. This means that
from systemd's POV, libvirt_lxc and qemu-nbd are still part of the
libvirtd.service unit.
On systemctl daemon-reload, it will notice that libvirt_lxc & qemu-nbd
are in the libvirtd.service unit for the systemd controller, but in the
machine cgroups for resources. Systemd will thus move them back into
the libvirtd.service resource cgroups next time libvirtd is restarted.
This causes libvirtd to kill off the container due to incorrect cgroup
placement.
The solution is to ensure that when moving libvirt_lxc & qemu-nbd, we
also move the systemd cgroup controller placement. Normally this is
not something we ever want todo, but this is a special case as we are
intentionally wanting to move them to a different systemd unit.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there's only linux implementation for
virGetFCHostNameByFabricWWN(). Since the symbol is exported in
our private symbols we ought to have implementation for other
platforms too. This also triggers compilation error on FreeBSD:
../src/.libs/libvirt_driver_storage_impl.a(libvirt_driver_storage_impl_la-storage_backend_scsi.o): In function `createVport':
/usr/home/jenkins/libvirt-master/systems/libvirt-freebsd/build/src/../../src/storage/storage_backend_scsi.c:740: undefined reference to `virGetFCHostNameByFabricWWN'
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Separate out the "policy=discard" into it's own specific
qemu command line.
We'll rename "kvm-pit-device" test case to be "kvm-pit-discard"
since it has the syntax we'd be using.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
By a mistake, for the VIR_DOMAIN_TIMER_TICKPOLICY_DELAY qemu
command line creation, 'discard' was used instead of 'delay'
in commit id '1569fa14'.
Test "kvm-pit-delay" is fixed accordingly to show the correct
option being generated.
Remove the (now) redundant kvm-pit-device tests. As it turns
out there is no need to specify both QEMU_CAPS_NO_KVM_PIT and
QEMU_CAPS_KVM_PIT_TICK_POLICY since they are mutually exclusive
and "kvm-pit-device" becomes just the same as "kvm-pit-delay".
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1349696
As it turns out using only the 'parent' to achieve the goal of a
consistent vHBA parent has issues with reboots where the scsi_hostX
parent could change to scsi_hostY causing either failure to create
the vHBA or usage of the wrong HBA for our vHBA.
Thus add the ability to search for the "parent" by the parent wwnn/
wwpn values or just a fabric_name if someone only cares to ensure
usage of the same SAN for the vHBA.
Add new fields to the fchost structure to allow creation of a vHBA via
the storage pool when a parent_wwnn/parent_wwpn or parent_fabric_wwn is
supplied in the storage pool XML.
https://bugzilla.redhat.com/show_bug.cgi?id=1349696
When creating a vHBA, the process is to feed XML to nodeDeviceCreateXML
that lists the <parent> scsi_hostX to use to create the vHBA. However,
between reboots, it's possible that the <parent> changes its scsi_hostX
to scsi_hostY and saved XML to perform the creation will either fail or
create a vHBA using the wrong parent.
So add the ability to provide "wwnn" and "wwpn" or "fabric_wwn" to
the <parent> instead of a name of the scsi_hostN that is the parent.
The allowed XML will thus be:
<parent>scsi_host3</parent> (current)
or
<parent wwnn='$WWNN' wwpn='$WWPN'/>
or
<parent fabric_wwn='$WWNN'/>
Using the wwnn/wwpn or fabric_wwn ensures the same 'scsi_hostN' is
selected between hardware reconfigs or host reboots. The fabric_wwn
Using the wwnn/wwpn pair will provide the most specific search option,
while fabric_wwn will at least ensure usage of the same SAN, but maybe
not the same scsi_hostN.
This patch will add the new fields to the nodedev.rng for input purposes
only since the input XML is essentially thrown away, no need to Format
the values since they'd already be printed as part of the scsi_host
data block.
New API virNodeDeviceGetParentHostByWWNs will take the parent "wwnn" and
"wwpn" in order to search the list of devices for matching capability
data fields wwnn and wwpn.
New API virNodeDeviceGetParentHostByFabricWWN will take the parent "fabric_wwn"
in order to search the list of devices for matching capability data field
fabric_wwn.
Qemu has abandoned the +/-feature syntax in favor of key=value. Some
architectures (s390) do not support +/-feature. So we update libvirt to handle
both formats.
If we detect a sufficiently new Qemu (indicated by support for qmp
query-cpu-model-expansion) we use key=value else we fall back to +/-feature.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
When qmp query-cpu-model-expansion is available probe Qemu for its view of the
host model. In kvm environments this can provide a more complete view of the
host model because features supported by Qemu and Kvm can be considered.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
query-cpu-model-expansion is used to get a list of features for a given cpu
model name or to get the model and features of the host hardware/environment
as seen by Qemu/kvm.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
On s390, the host's features are heavily influenced by not only the host
hardware but also by hardware microcode level, host OS version, qemu
version and kvm version. In this environment it does not make sense to
attempt to report exact host details.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
Implement compare for s390. Required to test the guest against the host for
guest cpu model runnability checking. We always return IDENTICAL to bypass
Libvirt's checking. s390 will rely on Qemu to perform the runnability checking.
Implement update for s390. required to support use of cpu "host-model" mode.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Jiri Denemark <jdenemar@redhat.com>
Just so it doesn't bite us in the future, even though it's unlikely.
And fix the comment above it as well. Commit e08ee7cd34 took the
info from the function it's calling, but that was lie itself in the
first place.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The resulting function virFileGetMountSubtreeImpl() just uses
virStringSortRevCompare or virStringSortCompare which uses strcmp().
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
With my namespace patches, we are spawning qemu in its own
namespace so that we can manage /dev entries ourselves. However,
some filesystems mounted under /dev needs to be preserved in
order to be shared with the parent namespace (e.g. /dev/pts).
Currently, the list of mount points to preserve is hardcoded
which ain't right - on some systems there might be less or more
items under real /dev that on our list. The solution is to parse
/proc/mounts and fetch the list from there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Our STREQ_NULLABLE and STRNEQ_NULLABLE macros are too
complicated. This was a result of some broken version of gcc.
However, that is long gone and therefore we can simplify the
macros.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If we restart libvirtd while VM was doing external memory snapshot, VM's
state be updated to paused as a result of running a migration-to-file
operation, and then VM will be left as paused state. In this case we must
restart the VM's CPUs to resume it.
Signed-off-by: Wang King <king.wang@huawei.com>
Rather than extraneous VIR_FREE's depending on where we are in the code,
move them to the top of the loop and in the cleanup path.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Move the check for an already existing vHBA to the top of the function.
No sense in first decoding a provided parent if the next thing we're going
to do is fail if a provided wwnn/wwpn already exists.
Signed-off-by: John Ferlan <jferlan@redhat.com>
If a <parent> is not supplied in the XML used to create a non-persistent
vHBA, then instead of failing, let's try to find a "vports" capable node
device and use that.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Extract out code from virNodeDeviceGetParentHost into helpers - it's
going to be reused in upcoming patches to search on more fields
Create virNodeDeviceFindVPORTCapDef in order to return a virNodeDevCapsDefPtr
of the VPORT_OPS and virNodeDeviceFindFCParentHost to use the function and
generate an error message if the device doesn't have the capability.
Also clean up the processing in virNodeDeviceGetParentHost to remove
need for goto's.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit 4b951d1e38 missed the fact that the
VM needs to be resumed after a live external checkpoint (memory
snapshot) where the cpus would be paused by the migration rather than
libvirt.
Again, not something that I'd hit, but there is a chance in
theory that this might bite us. Currently the way we decide
whether or not to create /dev entry for a device is by marching
first four characters of path with "/dev". This might be not
enough. Just imagine somebody has a disk image stored under
"/devil/path/to/disk". We ought to be matching against "/dev/".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not that I'd encounter any bug here, but the code doesn't look
100% correct. Imagine, somebody is trying to attach a device to a
domain, and the device's /dev entry already exists in the qemu
namespace. This is handled gracefully and the control continues
with setting up ACLs and calling security manager to set up
labels. Now, if any of these steps fail, control jump on the
'cleanup' label and unlink() the file straight away. Even when it
was not us who created the file in the first place. This can be
possibly dangerous.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1406837
Imagine you have a domain configured in such way that you are
assigning two PCI devices that fall into the same IOMMU group.
With mount namespace enabled what happens is that for the first
PCI device corresponding /dev/vfio/X entry is created and when
the code tries to do the same for the second mknod() fails as
/dev/vfio/X already exists:
2016-12-21 14:40:45.648+0000: 24681: error :
qemuProcessReportLogError:1792 : internal error: Process exited
prior to exec: libvirt: QEMU Driver error : Failed to make device
/var/run/libvirt/qemu/windoze.dev//vfio/22: File exists
Worse, by default there are some devices that are created in the
namespace regardless of domain configuration (e.g. /dev/null,
/dev/urandom, etc.). If one of them is set as backend for some
guest device (e.g. rng, chardev, etc.) it's the same story as
described above.
Weirdly, in attach code this is already handled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Clang 3.9 refuses to compile the existing code with the
following error:
util/virfirewall.c:425:20: error: passing an object that undergoes
default argument promotion to 'va_start'
has undefined behavior [-Werror,-Wvarargs]
va_start(args, layer);
^
util/virfirewall.c:420:37: note: parameter of type 'virFirewallLayer'
is declared here
virFirewallLayer layer,
^
This happens because 'layer' is of type virFirewallLayer, which
is an enum type and not a standard type such as eg. void* or int.
To solve the issue, turn virFirewallAddRule() from a very thin
wrapper around virFirewallAddRuleFullV() to a macro that expands
to a call to virFirewallAddRuleFull() - itself a very thin wrapper
around the aforementioned virFirewallAddRuleFullV() - with no loss
of functionality or type safety.
https://bugzilla.redhat.com/show_bug.cgi?id=1405269
If a secret was not provided for what was determined to be a LUKS
encrypted disk (during virStorageFileGetMetadata processing when
called from qemuDomainDetermineDiskChain as a result of hotplug
attach qemuDomainAttachDeviceDiskLive), then do not attempt to
look it up (avoiding a libvirtd crash) and do not alter the format
to "luks" when adding the disk; otherwise, the device_add would
fail with a message such as:
"unable to execute QEMU command 'device_add': Property 'scsi-hd.drive'
can't find value 'drive-scsi0-0-0-0'"
because of assumptions that when the format=luks that libvirt would have
provided the secret to decrypt the volume.
Access to unlock the volume will thus be left to the application.
After 478ddedc12 a bug is fixed where we wrongly presumed loopack
device name on non-Linux systems. It's lo0. However, the fix is
not reflected in the tests which are failing now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Due to nature of operations we do over the string list (more
precisely due to how virStringListRemove() works), it is not the
best idea to use dataFree callback. Problem is, on MAC address
remove, the string list remove function modifies the original
list in place. Then, virHashUpdateEntry() is called which frees
all the data stored in the list rendering @newMacsList point to
freed data.
==16002== Invalid read of size 8
==16002== at 0x50BC083: virFree (viralloc.c:582)
==16002== by 0x513DC39: virStringListFree (virstring.c:251)
==16002== by 0x51089B4: virMacMapHashFree (virmacmap.c:67)
==16002== by 0x50EF30B: virHashAddOrUpdateEntry (virhash.c:352)
==16002== by 0x50EF4FD: virHashUpdateEntry (virhash.c:415)
==16002== by 0x5108BED: virMacMapRemoveLocked (virmacmap.c:129)
==16002== by 0x51092D5: virMacMapRemove (virmacmap.c:346)
==16002== by 0x402F02: testMACRemove (virmacmaptest.c:107)
==16002== by 0x403F15: virTestRun (testutils.c:180)
==16002== by 0x4032C4: mymain (virmacmaptest.c:205)
==16002== by 0x405A3B: virTestMain (testutils.c:992)
==16002== by 0x403D87: main (virmacmaptest.c:237)
==16002== Address 0xdd5a4d0 is 0 bytes inside a block of size 24 free'd
==16002== at 0x4C2AD6F: realloc (vg_replace_malloc.c:693)
==16002== by 0x50BB99B: virReallocN (viralloc.c:245)
==16002== by 0x513DC0B: virStringListRemove (virstring.c:235)
==16002== by 0x5108BA6: virMacMapRemoveLocked (virmacmap.c:124)
==16002== by 0x51092D5: virMacMapRemove (virmacmap.c:346)
==16002== by 0x402F02: testMACRemove (virmacmaptest.c:107)
==16002== by 0x403F15: virTestRun (testutils.c:180)
==16002== by 0x4032C4: mymain (virmacmaptest.c:205)
==16002== by 0x405A3B: virTestMain (testutils.c:992)
==16002== by 0x403D87: main (virmacmaptest.c:237)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In virMacMapRemoveLocked() we have two variables: @macsList and
@newMacsList. Obviously, @newMacsList is supposed to hold pointer
to modified list but in fact it holds pointer to the old list.
It's confusing.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Before, boot devices information for CTs was always empty and we
didn't indicate that containers can boot from disk.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Virtuozzo SDK interface doesn't differ filesystems from disks and sees them as disks.
Before, we always mistakenly presented disks based on files as filesystems, which is
not completely correct. Now we are going to show either disks or filesystems depending
on a hint, which uses boot device section of VZ config. Though this information
doesn't change booting order of a CT, it is used by vz libvirt interface as a hint
for libvirt representation of disks. Since now, if we have filesystems in input xml,
then we add them to VZ booting devices list and rely on this information to show
corresponding libvirt xml.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Implicit devices like controllers are confusing for CTs and
function virDomainDefAddImplicitDevices never intended to be called
for CTs.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
virQEMUCapsSupportsChardev existing checks returns true
for spapr-vty alone. Instead verify spapr-vty validity
and let the logic to return true for other device types
so that virtio-console passes.
The non-pseries machines dont have spapr-vio-bus. So, the
function always returned false for them before.
Fixes - https://bugzilla.redhat.com/show_bug.cgi?id=1257813
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1332019
This function will essentially be a wrapper to virStorageVolInfo in order
to provide a mechanism to have the "physical" size of the volume returned
instead of the "allocation" size. This will provide similar capabilities to
the virDomainBlockInfo which can return both allocation and physical of a
domain storage volume.
NB: Since we're reusing the _virStorageVolInfo and not creating a new
_virStorageVolInfoFlags structure, we'll need to generate the rpc APIs
remoteStorageVolGetInfoFlags and remoteDispatchStorageVolGetInfoFlags
(although both were originally created from gendispatch.pl and then
just copied into daemon/remote.c and src/remote/remote_driver.c).
The new API will allow the usage of a VIR_STORAGE_VOL_GET_PHYSICAL flag
and will make the decision to return the physical or allocation value
into the allocation field.
In order to get that physical value, virStorageBackendUpdateVolTargetInfoFD
adds logic to fill in physical value matching logic in qemuStorageLimitsRefresh
used by virDomainBlockInfo when the domain is inactive.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Although the virStorageBackendUpdateVolTargetInfo will update the
target.physical value, there is no way to provide that information
via the virStorageGetVolInfo API since it only returns the capacity
and allocation of a volume. So as described in commit id '0282ca45',
it should be possible to generate an output only <physical> value
for that purpose.
This patch generates the <physical> value in the volume XML output
for the sole purpose of being able to view/see the value to allow
someone to parse the XML in order to obtain the value.
Update the documentation to describe the output only nature.
Signed-off-by: John Ferlan <jferlan@redhat.com>
According to commit id '0282ca45a' the 'physical' value should
essentially be the last offset of the image or the host physical
size in bytes of the image container. However, commit id '15fa84ac'
refactored the GetBlockInfo to use the same returned data as the
GetStatsBlock API for an active domain. For the 'entry->physical'
that would end up being the "actual-size" as set through the
qemuMonitorJSONBlockStatsUpdateCapacityOne (commit '7b11f5e5').
Digging deeper into QEMU code one finds that actual_size is
filled in using the same algorithm as GetBlockInfo has used for
setting the 'allocation' field when the domain is inactive.
The difference in values is seen primarily in sparse raw files
and other container type files (such as qcow2), which will return
a smaller value via the stat API for 'st_blocks'. Additionally
for container files, the 'capacity' field (populated via the
QEMU "virtual-size" value) may be slightly different (smaller)
in order to accomodate the overhead for the container. For
sparse files, the state 'st_size' field is returned.
This patch thus alters the allocation and physical values for
sparse backed storage files to be more appropriate to the API
contract. The result for GetBlockInfo is the following:
capacity: logical size in bytes of the image (how much storage
the guest will see)
allocation: host storage in bytes occupied by the image (such
as highest allocated extent if there are no holes,
similar to 'du')
physical: host physical size in bytes of the image container
(last offset, similar to 'ls')
NB: The GetStatsBlock API allows a different contract for the
values:
"block.<num>.allocation" - offset of the highest written sector
as unsigned long long.
"block.<num>.capacity" - logical size in bytes of the block device
backing image as unsigned long long.
"block.<num>.physical" - physical size in bytes of the container
of the backing image as unsigned long long.
This patch detects a misconfiguration between the disk bus type and disk
address type for controller based disk buses (SATA, SCSI, FDC and
IDE). The addresses of these bus types are all managed in common code so
it's possible to decide in common code whether the disk address and bus
type are compatible or not.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Disk->info is not live updatable so add a check for this. Otherwise
libvirt reports success even though no data was updated.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
This function will be needed by the QEMU driver in an upcoming
patch. Additionally, removed a useless empty line.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
This patch reduces the complexity of the filtering algorithm in
virCgroupDetect by first correcting the controller mask and then
checking for potential co-mounts without any correlating
controller mask modifications.
If you agree that this patch removes complexity and improves
readability it could simply be squashed into the first patch
of this series.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
The cgroup controller filtering in virCgroupDetect does not work
properly if the following conditions are met:
1) the host system does not have a cgroup controller which
libvirt requests (unavailable controller) and
2) libvirt is configured to disable a controller (disabled controller) and
3) the disabled controller is located before the unavailable controller
in virCgroupController.
As an example: The memory controller is unavailable and the cpuset
controller is configured to be disabled.
In this scenario trying to start a domain results in the error
error: Controller 'cpuset' is not wanted, but 'memory' is co-mounted: Invalid argument
This error occurs when virCgroupDetect is called with a valid parent group.
The resulting group created by virCgroupCopyMounts holds for cpuset and
memory controller empty mount points. The filtering of disabled controllers
checks for co-mounts by comparing the mount points. The cpuset controller
causes the filtering to occur before the memory controller is marked as to be
ignored by modifying the controller mask since it is unavailable.
Therefore the co-mount detection logic compares the cpuset and memory controller
mount points and since both are empty the memory controller is regarded
erroneously as being co-mounted.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to localOnly DNS domain, localPtr attribute can be used to
tell the DNS server not to forward reverse lookups for unknown IPs which
belong to the virtual network.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The API creates PTR domain which corresponds to a given addr/prefix.
Both IPv4 and IPv6 addresses are supported, but the prefix must be
divisible by 8 for IPv4 and divisible by 4 for IPv6.
The generated PTR domain has the following format
IPv4: 1.2.3.4.in-addr.arpa
IPv6: 0.1.2.3.4.5.6.7.8.9.a.b.c.d.e.f.0.1.2.3.4.5.6.7.8.9.a.b.c.d.e.f.ip6.arpa
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Iterating over all child nodes when we only support one instance of each
child is pretty weird. And it would even cause memory leaks if more
than one <tftp> element was specified.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The PERF_COUNT_HW_REF_CPU_CYCLES constant is not available
on all Linux distros libvirt targets, so its use must be
made conditional. Other constant have existed long enough
that we can assume they exist, as we don't support very
old distros like RHEL-5 any more.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Enable libvirt users to modify daemon's logging output settings from outside.
If either an empty string or NULL is passed, a default logging output will be
used the same way as it would be in case writing an empty string to the
libvirtd.conf
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Now that virLog{Get,Set}DefaultOutput routines are introduced we can wire them
up to the daemon's logging initialization code. Also, change the order of
operations a bit so that we still strictly honor our precedence of settings:
cmdline > env > config now that outputs and filters are not appended anymore.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Along with an empty string, it should also be possible for users to pass
NULL to the public APIs which in turn would trigger a routine(future
work) responsible for defining an appropriate default logging output
given the current circumstances.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
These helpers will manage the log destination defaults (fetch/set). The reason
for this is to stay consistent with the current daemon's behaviour with respect
to /etc/libvirt/<daemon>.conf file, since both assignment of an empty string
or not setting the log output variable at all trigger the daemon's decision on
the default log destination which depends on whether the daemon runs daemonized
or not.
This patch also changes the logic of the selection of the default
logging output compared to how it is done now. The main difference though is
that we should only really care if we're running daemonized or not, disregarding
the fact of (not) having a TTY completely (introduced by commit eba36a3878) as
that should be of the libvirtd's parent concern (what FD it will pass to it).
Before:
if (godaemon || !hasTTY):
if (journald):
use journald
if (godaemon):
if (privileged):
use SYSCONFIG/libvirtd.log
else:
use XDG_CONFIG_HOME/libvirtd.log
else:
use stderr
After:
if (godaemon):
if (journald):
use journald
else:
if (privileged):
use SYSCONFIG/libvirtd.log
else:
use XDG_CONFIG_HOME/libvirtd.log
else:
use stderr
Signed-off-by: Erik Skultety <eskultet@redhat.com>
External disk-only snapshots with recent enough qemu don't require
libvirt to pause the VM. The logic determining when to resume cpus was
slightly flawed and attempted to resume them even if they were not
paused by the snapshot code. This normally was not a problem, but with
locking enabled the code would attempt to acquire the lock twice.
The fallout of this bug would be a error from the API, but the actual
snapshot being created. The bug was introduced with when adding support
for external snapshots with memory (checkpoints) in commit f569b87.
Resolves problems described by:
https://bugzilla.redhat.com/show_bug.cgi?id=1403691
After qemu delivers the resume event it's already running and thus it's
too late to enter lockspaces since it may already have modified the
disk. The code only creates false log entries in the case when locking
is enabled. The lockspace needs to be acquired prior to starting cpus.
Given how intrusive previous patches are, it might happen that
there's a bug or imperfection. Lets give users a way out: if they
set 'namespaces' to an empty array in qemu.conf the feature is
suppressed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When attaching a device to a domain that's using separate mount
namespace we must maintain /dev entries in order for qemu process
to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of trying to fix our security drivers, we can use a
simple trick to relabel paths in both namespace and the host.
I mean, if we enter the namespace some paths are still shared
with the host so any change done to them is visible from the host
too.
Therefore, we can just enter the namespace and call
SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight
overhead because we have to fork in order to enter the namespace.
But on the other hand, no complexity is added to our code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a domain and separate mount namespace is used, we
have to create all the /dev entries that are configured for the
domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Prime time. When it comes to spawning qemu process and
relabelling all the devices it's going to touch, there's inherent
race with other applications in the system (e.g. udev). Instead
of trying convincing udev to not touch libvirt managed devices,
we can create a separate mount namespace for the qemu, and mount
our own /dev there. Of course this puts more work onto us as we
have to maintain /dev files on each domain start and device
hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent
(libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev
/var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare()
to create new mount namespace. From now on anything that child
does is invisible to the parent. Child then mounts tmpfs on
$domain.dev (so that it still sees original /dev from the host)
and creates some devices (as explained in one of the previous
patches). The devices have to be created exactly as they are in
the host (including perms, seclabels, ACLs, ...). After that it
moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can
create PTYs for some chardevs. And historically we exposed the
host ends in our domain XML allowing users to connect to them.
Therefore we must preserve devpts mount to be shared with the
host's one.
To make this patch as small as possible, creating of devices
configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This is a list of devices that qemu needs for its run (apart from
what's configured for domain). The devices on the list are
enabled in the CGroups by default so they will be good candidates
for initial /dev for new qemu.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We will need this function in near future so that we know what
/dev device corresponds to the USB device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and
virFileCopyACLs. These functions are going to be required when we
are creating /dev for qemu. We have copy anything that's in
host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
libvirt libxl picks its own default with respect to the default NIC
to use. libxlMakeNic is the one responsible for this and on boot it
picks LIBXL_NIC_TYPE_VIF_IOEMU for HVM domains such that it accomodates
both PV and emulated one. The good behaving guest at boot will then
select the pv and unplug the emulated device.
Now, on HVM when attaching an interface it will pick the same default
that is LIBXL_NIC_TYPE_VIF_IOEMU which as a result will fail the attach
(see xen commit 32e9d0f ("libxl: nic type defaults to vif in hotplug for
hvm guest"). Xen doesn't yet support the hotplug of emulated devices,
but we don't want to rule out that case either, which might get support
in the future. Hence we simply reverse the defaults when we are
attaching the interface which allows libvirt to prefer the PV nic first
without adding "model='netfront'" following the same pattern as above
commit. Also to avoid ruling out the emulated one we set to
LIBXL_NIC_TYPE_IOEMU when setting a model type that is not 'netfront'.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
The virDomainSendProcessSignal method says the flags values
come from virDomainProcessSignalFlag, but this enum has
never existed. No flags are needed for this method.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Almost none of our virJSONValue*Get* functions accept const virJSONValue
pointers and it wouldn't even make sense since we sometimes modify what
we get. And because there is no reason for preventing callers of
virJSONValueObjectForeachKeyValue from modifying the values they get in
each iteration we can just stop doing it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Using a variable named 'stat' clashes with the system function
'stat()' causing compiler warnings on some platforms
cc1: warnings being treated as errors
../../src/qemu/qemu_monitor_text.c: In function 'parseMemoryStat':
../../src/qemu/qemu_monitor_text.c:604: error: declaration of 'stat' shadows a global declaration [-Wshadow]
/usr/include/sys/stat.h:455: error: shadowed declaration is here [-Wshadow]
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the cpuset cgroup controller is disabled in /etc/libvirt/qemu.conf
QEMU virtual machines can in principle use all host CPUs, even if they
are hot plugged, if they have no explicit CPU affinity defined.
However, there's libvirt code supposed to handle the situation where
the libvirt daemon itself is not using all host CPUs. The code in
qemuProcessInitCpuAffinity attempts to set an affinity mask including
all defined host CPUs. Unfortunately, the resulting affinity mask for
the process will not contain the offline CPUs. See also the
sched_setaffinity(2) man page.
That means that even if the host CPUs come online again, they won't be
used by the QEMU process anymore. The same is true for newly hot
plugged CPUs. So we are effectively preventing that QEMU uses all
processors instead of enabling it to use them.
It only makes sense to set the QEMU process affinity if we're able
to actually grow the set of usable CPUs, i.e. if the process affinity
is a subset of the online host CPUs.
There's still the chance that for some reason the deliberately chosen
libvirtd affinity matches the online host CPU mask by accident. In this
case the behavior remains as it was before (CPUs offline while setting
the affinity will not be used if they show up later on).
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>