Commit id '5e46d7d6' did not take into account that usage of a luks
volume will require usage of the master key encrypted passphrase for
a QEMU environment. So rather than allow creation of something that
won't be usable, just fail the creation.
Partially resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1301021
If the volume xml was looking to create a luks volume take the necessary
steps in order to make that happen.
The processing will be:
1. create a temporary file (virStorageBackendCreateQemuImgSecretPath)
1a. use the storage driver state dir path that uses the pool and
volume name as a base.
2. create a secret object (virStorageBackendCreateQemuImgSecretObject)
2a. use an alias combinding the volume name and "_luks0"
2b. add the file to the object
3. create/add luks options to the commandline (virQEMUBuildLuksOpts)
3a. at the very least a "key-secret=%s" using the secret object alias
3b. if found in the XML the various "cipher" and "ivgen" options
Signed-off-by: John Ferlan <jferlan@redhat.com>
Vz containers are able to use ploop volumes from storage pools
to work upon.
To use filesystem type volume, pool name and volume name should be
specifaed in <source> :
<filesystem type='volume' accessmode='passthrough'>
<driver type='ploop' format='ploop'/>
<source pool='guest_images' volume='TEST_POOL_CT'/>
<target dir='/'/>
</filesystem>
The information about pool and volume is stored in ct dom configuration:
<StorageURL>libvirt://localhost/pool_name/vol_name</StorageURL>
and can be easily obtained via PrlVmDevHd_GetStorageURL sdk call.
The only shorcoming: if storage pool is moved somewhere the ct
should be redefined in order to refresh the information aboot path
to root.hdd
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
The modification of .volWipe callback wipes ploop volume using one of
given wiping algorithm: dod, nnsa, etc.
However, in case of ploop volume we need to reinitialize root.hds and DiskDescriptor.xml.
v2:
- added check on ploop tools presens
- virCommandAddArgFormat changed to virCommandAddArg
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
In order to use more common code and set up for a future type, modify the
encryption secret to allow the "usage" attribute or the "uuid" attribute
to define the secret. The "usage" in the case of a volume secret would be
the path to the volume as dictated by the backwards compatibility brought
on by virStorageGenerateQcowEncryption where it set up the usage field as
the vol->target.path and didn't allow someone to provide it. This carries
into virSecretObjListFindByUsageLocked which takes the secret usage attribute
value from from the domain disk definition and compares it against the
usage type from the secret definition. Since none of the code dealing
with qcow/qcow2 encryption secrets uses usage for lookup, it's a mostly
cosmetic change. The real usage comes in a future path where the encryption
is expanded to be a luks volume and the secret will allow definition of
the usage field.
This code will make use of the virSecretLookup{Parse|Format}Secret common code.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1316370
Consider the following disk for a domain:
<disk type='volume' device='cdrom'>
<driver name='qemu' type='raw'/>
<auth username='libvirt'>
<secret type='iscsi' usage='libvirtiscsi'/>
</auth>
<source pool='iscsi-secret-pool' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/>
<target dev='sda' bus='scsi'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
Now, startupPolicy is currently not allowed for iscsi disks, so
one would expect an error message to be thrown. But what a
surprise is waiting for users if they try to start up such
domain:
==15724== Invalid free() / delete / delete[] / realloc()
==15724== at 0x4C2B1F0: free (vg_replace_malloc.c:473)
==15724== by 0x54B7A69: virFree (viralloc.c:582)
==15724== by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549)
==15724== by 0x552F023: virStorageSourceClear (virstoragefile.c:2055)
==15724== by 0x552F054: virStorageSourceFree (virstoragefile.c:2067)
==15724== by 0x55556AA: virDomainDiskDefFree (domain_conf.c:1562)
==15724== by 0x5557ABE: virDomainDefFree (domain_conf.c:2547)
==15724== by 0x1B43CC42: qemuProcessStop (qemu_process.c:5918)
==15724== by 0x1B43BA2E: qemuProcessStart (qemu_process.c:5511)
==15724== by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050)
==15724== by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104)
==15724== by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122)
==15724== Address 0x21cfbb90 is 0 bytes inside a block of size 48 free'd
==15724== at 0x4C2B1F0: free (vg_replace_malloc.c:473)
==15724== by 0x54B7A69: virFree (viralloc.c:582)
==15724== by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549)
==15724== by 0x12D1C8D4: virStorageTranslateDiskSourcePool (storage_driver.c:3475)
==15724== by 0x1B4396E4: qemuProcessPrepareDomain (qemu_process.c:4896)
==15724== by 0x1B43B880: qemuProcessStart (qemu_process.c:5466)
==15724== by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050)
==15724== by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104)
==15724== by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122)
==15724== by 0x561CA97: virDomainCreate (libvirt-domain.c:6787)
==15724== by 0x12B6FD: remoteDispatchDomainCreate (remote_dispatch.h:4116)
==15724== by 0x12B61A: remoteDispatchDomainCreateHelper (remote_dispatch.h:4092)
The problem is, in virStorageTranslateDiskSourcePool disk
def->src->auth is freed, but the pointer is not set to NULL. So
later, when qemuProcessStop starts to free the domain definition,
virStorageAuthDefFree() tries to free the memory again, instead
of jumping out immediately.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Create a function to return a temporary file path to be used in a mkostemp
type call using the path to the stateDir + pool->def->name + vol->name
Signed-off-by: John Ferlan <jferlan@redhat.com>
The VIR_STORAGE_POOL_EVENT_REFRESHED constant does not
reflect any change in the lifecycle of the storage pool.
It should thus not be part of the storage pool lifecycle
event set, but rather be a top level event in its own
right. Thus we introduce VIR_STORAGE_POOL_EVENT_ID_REFRESH
to replace it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the unlikely case the iSCSI session path exists, but does not
contain an entry starting with "target", we would silently use
an initialized value.
Rewrite the function to correctly report errors.
The directories we iterate over are unlikely to contain any entries
starting with a dot, other than '.' and '..' which is already skipped
by virDirRead.
Move the enum into a new src/util/virsecret.h, rename it to be
virSecretLookupType. Add a src/util/virsecret.h in order to perform
a couple of simple operations on the secret XML and virSecretLookupTypeDef
for clearing and copying.
This includes quite a bit of collateral damage, but the goal is to remove
the "virStorage*" and replace with the virSecretLookupType so that it's
easier to to add new lookups that aren't necessarily storage pool related.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Every driver provides a refreshPool impl, and many other critical
places in the code unconditionally call it without checking if
it exists, so this check is pointless
Create a helper virStorageBackendCreateQemuImgSetOptions to set either
the qemu-img -o options or the previous mechanism using -F
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since we support QEMU 0.12 and later, checking for support of specific flags
added prior to that isn't necessary.
Thus start with the base of having the "-o options" available for the
qemu-img create option and then determine whether we have the compat
option for qcow2 files (which would be necessary up through qemu 2.0
where the default changes to compat 0.11).
Adjust test to no long check for NONE and FLAG options as well was removing
results of tests that would use that option.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Implement storage pool event callbacks for START, STOP, DEFINE, UNDEFINED
and REFRESHED in functions when a storage pool is created/started/stopped
etc. accordingly
Split out a helper from virStorageBackendCreateQemuImgCmdFromVol
to check the encryption - soon a new encryption sheriff will be
patroling and that'll mean all sorts of new checks.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit 5e54361c added virStoragePoolObjClearVols before refreshPool
to prevent duplicate volume entries.
However it is not needed here because we're not refreshing the pool yet,
just checking for the existence of the refresh callback.
The actual refresh is done via virStorageVolFDStreamCloseCb
in virStorageVolPoolRefreshThread, which already calls
virStoragePoolObjClearVols.
Rather than have virCommandRun just spit out the error, allow callers
to decide to pass the exitstatus so the caller can make intelligent
decisions based on the error.
Commit id 'df1011ca8' modified virStorageBackendDiskDeleteVol to use
"dmsetup remove --force" to remove the volume, but left things in an
inconsistent state since the partition still existed on the disk and
only the device mapper device (/dev/dm-#) was removed.
Prior to commit '1895b421' (or '1ffd82bb' and '471e1c4e'), this could
go unnoticed since virStorageBackendDiskRefreshPool wasn't called.
However, the pool would be unusable since the /dev/dm-# device would
be removed even though the partition was not removed unless a multipathd
restart reset the link. That would of course make the volume appear again
in the pool after a refresh or pool start after libvirt reload.
This patch removes the 'dmsetup' logic and re-implements the partition
deletion logic for device mapper devices. The removal of the partition
via 'parted rm --script #' will cause udev device change logic to allow
multipathd to handle removing the dm-* device associated with the partition.
https://bugzilla.redhat.com/show_bug.cgi?id=1265694
Commit id '020135dc' didn't quite get the algorithm correct when a
device mapper source ended with a non numeric value (e.g. ends with
an alphabet value).
This patch modifies the 'part_separator' logic to add the "p" separator
to the attempted target path name only when specified as part_separator='yes'.
For a source name that already ends with a number, the logic doesn't change
as the part separator would need to be there.
For a source name that ends with something other than a number, this allows
the possibility that a "p" separator can be added. The default for one of
these source devices is to not add the separator.
The key for device mapper and the need for a partition separator "p" is
the presence of a number in the last character of the device name link
in /dev/mapper. A name such as "/dev/mapper/mpatha1" would generate
a "/dev/mapper/mpatha1p1" partition, while "/dev/mapper/mpatha" would
generate partition "/dev/mapper/mpatha1". Similarly for a device
mapper entry not using friendly names or an alias, a device such as
"/dev/mapper/3600a0b80005b10ca00005ad656fd8d93" would generate a
paritition "/dev/mapper/3600a0b80005b10ca00005ad656fd8d93p1", while
a device such as "/dev/mapper/3600a0b80005b10ca00005e115729093f" would
generate a partition "/dev/mapper/3600a0b80005b10ca00005e115729093f1".
The long number is the WWID of the device. It's also possible to assign
an alias for a device mapper entry, that alias follows the same rules
with respect to ending with a number or not when adding a "p" to create
the target device path.
Prior to calling the 'refreshPool' during CreatePool or UploadPool
operations, we need to clear the pool; otherwise, the pool will
have duplicated entries.
https://bugzilla.redhat.com/show_bug.cgi?id=1318993
Commit id 'dd519a294' caused a regression cloning a volume into a
logical pool by removing just the 'allocation' adjustment during
storageVolCreateXMLFrom. Combined with the change to not require the
new volume input XML to have a capacity listed (commit id 'e3f1d2a8')
left the possibility that a zero allocation value (e.g., not provided)
would create a thin/sparse logical volume. When a thin lv becomes fully
populated, then LVM sets the partition 'inactive' and the subsequent
fdatasync() fails.
Add a new 'has_allocation' flag to be set at XML parse time to indicate
that allocation was provided. This is done so that if it's not provided
the create-from code uses the capacity value since we document that if
omitted, the volume will be fully allocated at time of creation.
For a logical backend, that creation time is 'createVol', while for a
file backend, creation doesn't set the size, but the 'createRaw' called
during buildVolFrom will decide whether the file is sparse or not based
on the provided capacity and allocation value.
For volume clones that provide different allocation and capacity values
to allow for sparse files, there is no change.
We had both and the only difference was that the latter also included
information about multifunction setting. The problem with that was that
we couldn't use functions made for only one of the structs (e.g.
parsing). To consolidate those two structs, use the one in virpci.h,
include that in domain_conf.h and add the multifunction member in it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Remove all the plumbing needed for the different qcow-create/kvm-img
non-raw file creation.
We can drop the error messages because CreateQemuImg will thrown an
error for us but with slightly less fidelity (unable to find qemu-img),
which I think is acceptable given the unlikeliness of that error in
practice.
This an ubuntu/debian packaging convention. At one point it may have
been an actually different binary, but at least as of ubuntu precise
(the oldest supported ubuntu distro, released april 2012) kvm-img is
just a symlink to qemu-img for back compat.
I think it's safe to drop support for it
qcow-create was a crippled qemu-img impl that shipped with xen. I
think supporting this was only relevant for really old distros
that didn't have a proper qemu package, like early RHEL5. I think
it's fair to drop support
By default, `zfs create -V ...` reserves space for the entire volsize,
plus some extra (which attempts to account for overhead).
If `zfs create -s -V ...` is used instead, zvols are (fully) sparse.
A middle ground (partial allocation) can be achieved with
`zfs create -s -o refreservation=... -V ...`. Both libvirt and ZFS
support this approach, so the ZFS storage backend should support it.
Signed-off-by: Richard Laager <rlaager@wiktel.com>
In case of ploop volume, target path of the volume is the path to the
directory that contains image file named root.hds and DiskDescriptor.xml.
While using uploadVol and downloadVol callbacks we need to open root.hds
itself.
Upload or download operations with ploop volume are only allowed when
images do not have snapshots. Otherwise operation fails.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Refreshes meta-information such as allocation, capacity, format, etc.
Ploop volumes differ from other volume types. Path to volume is the path
to directory with image file root.hds and DiskDescriptor.xml.
https://openvz.org/Ploop/format
Due to this fact, operations of opening the volume have to be done once
again. get the information.
To decide whether the given volume is ploops one, it is necessary to check
the presence of root.hds and DiskDescriptor.xml files in volumes' directory.
Only in this case the volume can be manipulated as the ploops one.
Such strategy helps us to resolve problems that might occure, when we
upload some other volume type from ploop source.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Recursively deletes whole directory of a ploop volume.
To delete ploop image it has to be unmounted.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
These callbacks let us to create ploop volumes in dir, fs and etc. pools.
If a ploop volume was created via buildVol callback, then this volume
is an empty ploop device with DiskDescriptor.xml.
If the volume was created via .buildFrom - then its content is similar to
input volume content.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Ploop image consists of directory with two files: ploop image itself,
called root.hds and DiskDescriptor.xml that contains information about
ploop device: https://openvz.org/Ploop/format.
Such volume are difficult to manipulate in terms of existing volume types
because they are neither a single files nor a directory.
This patch introduces new volume type - ploop. This volume type is used
by ploop volume's exclusively.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
We use device-mapper to enumerate all dm devices, and filter out
the list of multipath devices by checking the target_type string
name. The code however cancels all scanning if we encounter
target_type=NULL
I don't know how to reproduce that situation, but a user was hitting
it in their setup, and inspecting the lvm2/device-mapper code shows
many places where !target_type is explicitly ignored and processing
continues on to the next device. So I think we should do the same
https://bugzilla.redhat.com/show_bug.cgi?id=1069317
I tried compiling libvirt with older gcc and probably because I used
different configure options I got some shadowed declarations.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If the pool creation thread happens to detect the luns in
the scsi target, the size parameters will be calculated as
part of the refreshPool called from storagePoolCreate().
This means the virStoragePoolFCRefreshThread (commit id
'512b874') waiting to run and "refresh" the pool will
essentially double the allocation and capacity values.
A separate refresh would correct the values.
To avoid this, the FCRefreshThread needs to reinitialize
the pool size values prior to calling virStorageBackendSCSIFindLUs
which eventually calls virStorageBackendSCSINewLun and
updates the size values for each volume found.
After the recent commits the build didn't work for me. Fix it by
using size_t as the callback argument is using and the correct
formatter. The attempted fixup to use %llu as a formatter was wrong.
This reverts commit bb5f2dc91f.
The "if (vol->target.format != VIR_STORAGE_FILE_RAW)" check in the
createVol backend. This check is bogus because virStorageVolDefParseXML()
in conf/storage_conf.c sets target.format only if volOptions in
virStoragePoolTypeInfo has formatFromString set, and that's not the
case the zfs backend.
So the check always fails and breaks volume creation.
This reverts commit 6682d6219d.
The "if (vol->target.format != VIR_STORAGE_FILE_RAW)" check in the
createVol backend. This check is bogus because virStorageVolDefParseXML()
in conf/storage_conf.c sets target.format only if volOptions in
virStoragePoolTypeInfo has formatFromString set, and that's not the
case the logical backend.
So the check always fails and breaks volume creation.
While trying to build with -Os couple of compile errors showed
up.
conf/domain_conf.c: In function 'virDomainChrRemove':
conf/domain_conf.c:13666:24: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
virDomainChrDefPtr ret, **arrPtr = NULL;
^
Compiler fails to see that @ret is used only if set in the loop,
but whatever, there's no harm in initializing the variable.
In vboxAttachDrivesNew and _vboxAttachDrivesOld compiler thinks
that @rc may be used uninitialized. Well, not directly, but maybe
after some optimization. Yet again, no harm in initializing a
variable.
In file included from ./util/virthread.h:26:0,
from ./datatypes.h:28,
from vbox/vbox_tmpl.c:43,
from vbox/vbox_V3_1.c:37:
vbox/vbox_tmpl.c: In function '_vboxAttachDrivesOld':
./util/virerror.h:181:5: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \
^
In file included from vbox/vbox_V3_1.c:37:0:
vbox/vbox_tmpl.c:1041:14: note: 'rc' was declared here
nsresult rc;
^
Yet again, one uninitialized variable:
qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
qemu/qemu_driver.c:17194:9: error: 'baseSource' may be used uninitialized in this function [-Werror=maybe-uninitialized]
qemuDomainPrepareDiskChainElement(driver, vm, baseSource,
^
And another one:
storage/storage_backend_logical.c: In function 'virStorageBackendLogicalMatchPoolSource.isra.2':
storage/storage_backend_logical.c:618:33: error: 'thisSource' may be used uninitialized in this function [-Werror=maybe-uninitialized]
thisSource->devices[j].path))
^
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Found by inspection - after calling virStoragePoolObjAssignDef the
pool is part of the driver->pools.objs list and the failure path
for the virStoragePoolObjSaveDef will use virStoragePoolObjRemove
to remove the pool from the objs list which will unlock and free
the pool pointer (as pools->objs[i] during the loop). Since the call
doesn't clear the pool address from the callee, we need to set it
to NULL; otherwise, the virStoragePoolObjUnlock in the cleanup: code
will fail miserably.
Generates a false positive for Coverity, but it turns out there's no need
to check ret == -1 since if VIR_APPEND_ELEMENT is successful, the local
vol pointer is cleared anyway.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Found by my Coverity checker - virCheckFlags call could return -1, but
not virCommandFree(destroy_cmd).
Signed-off-by: John Ferlan <jferlan@redhat.com>
%zu is not always synonymous with uint64_t; on 32-bit machines,
size_t is only 32 bits. Prefer "%lld"/'unsigned long long' when
the variable is under our control, and "%"PRIu64 when we are
stuck with 'uint64_t' from RBD.
Fixes errors such as:
../../src/storage/storage_backend_rbd.c: In function 'virStorageBackendRBDVolWipe':
../../src/storage/storage_backend_rbd.c:1281:15: error: format '%zu' expects argument of type 'size_t', but argument 8 has type 'uint64_t {aka long long unsigned int}' [-Werror=format=]
VIR_DEBUG("Need to wipe %zu bytes from RBD image %s/%s",
^
../../src/util/virlog.h:90:73: note: in definition of macro 'VIR_DEBUG_INT'
virLogMessage(src, VIR_LOG_DEBUG, filename, linenr, funcname, NULL, __VA_ARGS__)
^
../../src/storage/storage_backend_rbd.c:1281:5: note: in expansion of macro 'VIR_DEBUG'
VIR_DEBUG("Need to wipe %zu bytes from RBD image %s/%s",
^
Signed-off-by: Eric Blake <eblake@redhat.com>
Checking whether x > 0 before looping over [0..x] items doesn't make
sense and multi-line body must have curly brackets around it.
Best viewed with '-w'.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This reverts commit 611a278fa4.
According to the original commit message, this is dead code:
It is highly unlikely that a backend will know how to create a
volume from a different volume (buildVolFrom) and not know how to
create an empty volume (createVol).
Since Ceph version Infernalis (9.2.0) the new fast-diff mechanism
of RBD allows for querying actual volume usage.
Prior to this version there was no easy and fast way to query how
much allocation a RBD volume had inside a Ceph cluster.
To use the fast-diff feature it needs to be enabled per RBD image
and is only supported by Ceph cluster running version Infernalis
(9.2.0) or newer.
Without the fast-diff feature enabled libvirt will report an allocation
identical to the image capacity. This is how libvirt behaves currently.
'virsh vol-info rbd/image2' might output for example:
Name: image2
Type: network
Capacity: 1,00 GiB
Allocation: 124,00 MiB
Newly created volumes will have the fast-diff feature enabled if the
backing Ceph cluster supports it.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
In commit 0b15f920 there is a #ifdef which requires LIBRBD_VERSION_CODE
266 or newer for rbd_diff_iterate2()
rbd_diff_iterate2() is available since 266, so this if-statement should
require anything newer than 265.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
As more and more features are added to RBD volumes we will need to
call this method more often.
By moving it into a internal function we can re-use code inside the
storage backend.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
It is highly unlikely that a backend will know how to create a
volume from a different volume (buildVolFrom) and not know how to
create an empty volume (createVol). But:
1) we call the function without any prior check so if that's the
case we would SIGSEGV immediatelly
2) it's better to be safe than sorry.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Firstly, we realloc internal list to hold new item (=volume that
will be potentially created) and then we check whether we
actually know how to create it. If we don't we consume more
memory than we really need for no good reason.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Do not store the return value of called functions in the same variable
as the (future) return value of the current function.
This makes tracking the origin of the value easier and reduces
the chance of introducing a new point of exit without resetting
the return value back to -1.
The virStringListLength function does not ever modify the passed
string list. It merely counts the items in it. Make sure that we
reflect this bit in the function header.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(crobinso: fix up spacing and squash in sheepdog bit suggested
by Andrea)
This was only used in debugging messages and not in any real code.
Ceph/RBD uses uint64_t for sizes internally and they can be printed
with %zu without any need for casting.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Through the years the RBD storage pool code hasn't maintained the
same or correct coding standard which applies to libvirt.
This patch doesn't change any logic in the code, it only applies
the proper coding standards to the code where possible without
making large changes.
This way the code style used in this storage pool is consistent
throughout the whole file.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Rather than have a unwieldy regex string - split it up into its components
each having it's own #define and then combine in a different #define
Signed-off-by: John Ferlan <jferlan@redhat.com>
There are slight differences in various ZFS implementations.
Specifically, ZFS on FreeBSD requires to set value of 'volmode'
option to 'dev' to expose volumes as raw disk device (that's what
we need) rather than geom provides, for example.
With ZFS on Linux, however, such option is not available and
volumes exposed like we need by default.
To make our implementation more flexible, only pass 'volmode'
when it's supported. Support is checked by parsing usage
information of the 'zfs get' command.
Rather than a loop reallocating space to build the regex, just allocate
it once up front, then if there's more than 1 nextent, append a comma and
another regex_unit string.
Signed-off-by: John Ferlan <jferlan@redhat.com>
The 'stripes' value is described as the "Number of stripes or mirrors in
a logical volume". So add "mirror" and anything that starts with "raid"
to the list of segtypes that can have an 'nextents' value greater than one.
Use of raid segtypes (raid1, raid4, raid5*, raid6*, and raid10) is favored
over mirror in more recent lvm code.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than preallocating a set number of elements, then walking through
the extents and adjusting the specific element in place, use the APPEND
macros to handle that chore.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Create a helper routine in order to parse any extents information
including the extent size, length, and the device string contained
within the generated 'lvs' output string.
A future patch would then be able to avoid the code more cleanly
Signed-off-by: John Ferlan <jferlan@redhat.com>
By opening a RBD volume in Read-Only we do not register a
watcher on the header object inside the Ceph cluster.
Refreshing a volume only calls rbd_stat() which is a operation
which does not write to a RBD image.
This allows us to use a cephx user which has no write
permissions if we would want to use the libvirt storage pool
for informational purposes only.
It also saves us a write into the Ceph cluster which should
speed up refreshing a RBD pool.
rbd_open_read_only() is available in all librbd versions which
also support rbd_open().
Signed-off-by: Wido den Hollander <wido@widodh.nl>
RBD supports cloning by creating a snapshot, protecting it and create
a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between
the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using
the rbd_clone2() function from librbd.
rbd_clone2() is available in librbd since Ceph version Dumpling (0.67) which
dates back to August 2013.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if
never changes. This reduces the amount of snapshots created for that RBD image
which benefits the performance of the Ceph cluster.
During build the decision will be made to use either rbd_diff_iterate() or
rbd_diff_iterate2().
The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt
are already format 2.
If a RBD format 1 image is used as the original volume the backend will report
a VIR_ERR_OPERATION_UNSUPPORTED error.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Using VIR_STORAGE_VOL_WIPE_ALG_TRIM a RBD volume can be trimmed down
to 0 bytes using rbd_discard()
Effectively all the data on the volume will be lost/gone, but the volume
remains available for use afterwards.
Starting at offset 0 the storage pool will call rbd_discard() in stripe
size * count increments which is usually 4MB. Stripe size being 4MB and
count 1.
rbd_discard() is available since Ceph version Dumpling (0.67) which dates
back to August 2013.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This new algorithm adds support for wiping volumes using TRIM.
It does not overwrite all the data in a volume, but it tells the
backing storage pool/driver that all bytes in a volume can be
discarded.
It depends on the backing storage pool how this is handled.
A SCSI backend might send UNMAP commands to remove all data present
on a LUN.
A Ceph backend might use rbd_discard() to instruct the Ceph cluster
that all data on that RBD volume can be discarded.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When wiping the RBD image will be filled with zeros started
at offset 0 and until the end of the volume.
This will result in the RBD volume growing to it's full allocation
on the Ceph cluster. All data on the volume will be overwritten
however, making it unavailable.
It does NOT take any RBD snapshots into account. The original data
might still be in a snapshot of that RBD volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Use the cast of (virStorageVolWipeAlgorithm) adding the missing case:'s
(VIR_STORAGE_VOL_WIPE_ALG_ZERO and VIR_STORAGE_VOL_WIPE_ALG_LAST).
Additionally, the old code would also still run the SCRUB command on
default since it didn't go to cleanup when a invalid flag was supplied.
We now go to cleanup and exit if a invalid flag would be provided.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When commit id '82c1740a' made changes to the output format (changing from
using a ',' separator to '#'), the examples in the lvs output from the
comments weren't changed.
Additionally, the two new fields added ('segtype' and 'stripes') were
not included in the output, leaving it well confusing.
This patch fixes the sample output, adds a 'striped' example, and makes
other comment related adjustments for long line and spacing between followup
'NB' remarks (while I'm there).
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1265694
In order to be able to process disk storage pool's using a multipath
device to handle the partitions, libvirt_parthelper will need a way to
not automatically add a partition separator "p" to the generated device
name for each partition found. This is designed to mimic the multipath
features known as 'user_friendly_names' and custom 'alias' name.
If the part_separator attribute is set to "no", then generation of the
multipath partition name will not include the "p" partition separator
unless the source device path name ends with a number. The generated
partition names that get passed back to libvirt are processed in order
to find the device mapper multipath (dm-#) path device.
For example, device path "/dev/mapper/mpatha" would create partitions
"/dev/mapper/mpatha1", "/dev/mapper/mpatha2", etc. instead of
"/dev/mapper/mpathap1", "/dev/mapper/mpathap2", etc. If the device
path ends with a number "/dev/mapper/mpatha1", then the algorithm
to generate names "/dev/mapper/mpatha1p1", "/dev/mapper/mpatha1p2", etc.
would be utilized.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This was reported in bug #1298024 where r would be filled with the
return code of rbd_open().
Should rbd_snap_unprotect() fail for any reason the virReportSystemError
call would return 'Success' since rbd_open() succeeded.
https://bugzilla.redhat.com/show_bug.cgi?id=1298024
Signed-off-by: Wido den Hollander <wido@widodh.nl>
If no port number was provided for a storage pool libvirt defaults to
port 6789; however, librbd/librados already default to 6789 when no port
number is provided.
In the future Ceph will switch to a new port for the Ceph monitors since
port 6789 is already assigned to a different application by IANA.
Port 6789 is assigned to SMC-HTTPS and Ceph now has port 3300 assigned as
the 'Ceph monitor' port.
In this case it is the best solution to not hardcode any port number into
libvirt and let librados handle the connection.
Only if a user specifies a different port number we pass it down to librados,
otherwise we leave it blank.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
merge
It could happen that rbd_list() returns X names, but that while
refreshing the pool one of those RBD images is removed from Ceph
through a different route then libvirt.
We do not need to error out in such case, we can simply ignore the
volume and continue.
error : volStorageBackendRBDRefreshVolInfo:289 :
failed to open the RBD image 'vol-998': No such file or directory
It could also be that one or more Placement Groups (PGs) inside Ceph
are inactive due to a system failure.
If that happens it could be that some RBD images can not be refreshed
and a timeout will be raised by librados.
error : volStorageBackendRBDRefreshVolInfo:289 :
failed to open the RBD image 'vol-893': Connection timed out
Ignore the error and continue to refresh the rest of the pool's
contents.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
It could be that we error out while the RBD image has not been
opened yet. This would cause us to call rbd_close() on pointer
which has not been initialized.
Set it to NULL by default and only close if it is not NULL.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Commit id 'aeb1078ab' added a buildPool option and failure path which
calls virStoragePoolObjRemove, which unlocks the pool, clears the 'pool'
variable, and goto cleanup. However, at cleanup virStoragePoolObjUnlock
is called without check if pool is non NULL.
This used to return 'unkown' and that was not correct.
A vol-dumpxml now returns:
<volume type='network'>
<name>image3</name>
<key>libvirt/image3</key>
<source>
</source>
<capacity unit='bytes'>10737418240</capacity>
<allocation unit='bytes'>10737418240</allocation>
<target>
<path>libvirt/image3</path>
<format type='raw'/>
</target>
</volume>
The RBD driver will now error out if a different format than RAW
is provided when creating a volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Valgrind complained:
==28277== 38 bytes in 1 blocks are definitely lost in loss record 298 of 957
==28277== at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==28277== by 0x82D7F57: __vasprintf_chk (in /lib64/libc-2.12.so)
==28277== by 0x52EF16A: virVasprintfInternal (stdio2.h:199)
==28277== by 0x52EF25C: virAsprintfInternal (virstring.c:514)
==28277== by 0x52B1FA9: virFileBuildPath (virfile.c:2831)
==28277== by 0x19B1947C: storageDriverAutostart (storage_driver.c:191)
==28277== by 0x19B196A7: storageStateAutoStart (storage_driver.c:307)
==28277== by 0x538527E: virStateInitialize (libvirt.c:793)
==28277== by 0x11D7CF: daemonRunStateInit (libvirtd.c:947)
==28277== by 0x52F4694: virThreadHelper (virthread.c:206)
==28277== by 0x6E08A50: start_thread (in /lib64/libpthread-2.12.so)
==28277== by 0x82BE93C: clone (in /lib64/libc-2.12.so)
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
The initial commit '74951eade' did not include the proper check for whether
any flags are supported by the driver.
Even though the driver doesn't support VIR_STORAGE_VOL_DELETE_ZEROED,
it still checks and allows the processing to continue
Also add the new VIR_STORAGE_VOL_DELETE_WITH_SNAPSHOTS since it is handled
as of commit id '3c7590e0a'.
https://bugzilla.redhat.com/show_bug.cgi?id=830056
Add flags handling to the virStoragePoolCreate and virStoragePoolCreateXML
API's which will allow the caller to provide the capability for the storage
pool create API's to also perform a pool build during creation rather than
requiring the additional buildPool step. This will allow transient pools
to be defined, built, and started.
The new flags are:
* VIR_STORAGE_POOL_CREATE_WITH_BUILD
Perform buildPool without any flags passed.
* VIR_STORAGE_POOL_CREATE_WITH_BUILD_OVERWRITE
Perform buildPool using VIR_STORAGE_POOL_BUILD_OVERWRITE flag.
* VIR_STORAGE_POOL_CREATE_WITH_BUILD_NO_OVERWRITE
Perform buildPool using VIR_STORAGE_POOL_BUILD_NO_OVERWRITE flag.
It is up to the backend to handle the processing of build flags. The
overwrite and no-overwrite flags are mutually exclusive.
NB:
This patch is loosely based upon code originally authored by Osier
Yang that were not reviewed and pushed, see:
https://www.redhat.com/archives/libvir-list/2012-July/msg01328.html
Commit id '71b803ac' assumed that the storage pool source device path
was required for a 'logical' pool. This resulted in a failure to start
a pool without any device path defined.
So, adjust the virStorageBackendLogicalMatchPoolSource logic to
return success if at least the pool name matches the vgs output
when no pool source device path is/are provided.
https://bugzilla.redhat.com/show_bug.cgi?id=1270709
When a volume wipe is successful, perform a volume refresh afterwards to
update any volume data that may be used in future volume commands, such as
volume resize. For a raw file volume, a wipe could truncate the file and
a followup volume resize the capacity may fail because the volume target
allocation isn't updated to reflect the wipe activity.
The only caller always passes 0 for the extent start.
Drop the 'extent_start' parameter, as well as the mention of extents
from the function name.
Change off_t extent_length to unsigned long long wipe_len, as well as the
'remain' variable.
Return -1:
* on all failures of fdatasync. Instead of propagating -errno
all the way up to the virStorageVolWipe API, which is documented
to return 0 or -1.
* after a partial wipe. If safewrite failed, we would re-use the
non-negative return value of lseek (which should be 0 in this case,
because that's the only offset we seek to).
https://bugzilla.redhat.com/show_bug.cgi?id=1025230
Add a new helper virStorageBackendLogicalMatchPoolSource to compare the
pool's source name against the output from a 'pvs' command to list all
volume group physical volume data on the host. In addition, compare the
pool's source device list against the particular volume group's device
list to ensure the source device(s) listed for the pool match what the
was listed for the volume group.
Then for pool startup or check API's we need to call this new API in
order to ensure that the pool we're about to start or declare active
during checkPool has a valid definition vs. the running host.
Rework virStorageBackendLogicalFindPoolSources a bit to create a
helper virStorageBackendLogicalGetPoolSources that will make the
pvs call in order to generate a list of associated pv_name and vg_name's.
A future patch will make use of this for start/check processing to
ensure the storage pool source definition matches expectations.
https://bugzilla.redhat.com/show_bug.cgi?id=1025230
When determining whether a FS pool is mounted, rather than assuming that
the FS pool is mounted just because the target.path is in the mount list,
let's make sure that the FS pool source matches what is mounted
Refactor the code that builds the pool source string during the FS
storage pool mount to be a separate helper.
A future patch will use the helper in order to validate the mounted
FS matches the pool's expectation during poolCheck processing
The libvirt file system storage driver determines what file to
act on by concatenating the pool location with the volume name.
If a user is able to pick names like "../../../etc/passwd", then
they can escape the bounds of the pool. For that matter,
virStoragePoolListVolumes() doesn't descend into subdirectories,
so a user really shouldn't use a name with a slash.
Normally, only privileged users can coerce libvirt into creating
or opening existing files using the virStorageVol APIs; and such
users already have full privilege to create any domain XML (so it
is not an escalation of privilege). But in the case of
fine-grained ACLs, it is feasible that a user can be granted
storage_vol:create but not domain:write, and it violates
assumptions if such a user can abuse libvirt to access files
outside of the storage pool.
Therefore, prevent all use of volume names that contain "/",
whether or not such a name is actually attempting to escape the
pool.
This changes things from:
$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
Vol ../../../../../../etc/haha created
$ rm /etc/haha
to:
$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
error: Failed to create vol ../../../../../../etc/haha
error: Requested operation is not valid: volume name '../../../../../../etc/haha' cannot contain '/'
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1276198
Prior to commit id '98322052' failure to saferead the block device would
cause an error to be logged and the device to be skipped while attempting
to discover/create a stable target path for a new LUN (NPIV).
This was because virStorageBackendSCSIFindLUs ignored errors from
processLU and virStorageBackendSCSINewLun.
Ignoring the failure allowed a multipath device with an "active" and
"ghost" to be present on the host with the "ghost" block device being
ignored. This patch will return a -2 to the caller indicating the desire
to ignore the block device since it cannot be used directly rather than
fail the pool startup.
I found this useful while processing a volume that wouldn't end up
showing up in the resulting list of block volumes. In this case, the
partition type wasn't found in the disk_types table.
Similar to the openflags VIR_STORAGE_VOL_OPEN_NOERROR processing, if some
read processing operation fails, check the readflags for the corresponding
error flag being set. If so, rather then causing an error - use VIR_WARN
to flag the error, but return -2 which some callers can use to perform
specific actions. Use a new VIR_STORAGE_VOL_READ_NOERROR flag in a new
VolReadErrorMode enum.
While processing the volume for lseek, virFileReadHeaderFD, and
virStorageFileGetMetadataFromBuf - failure would cause an error,
but ret would not be set. That would result in an error message being
sent, but successful status being returned.
Just so it's clearer what to expect upon input and what types of return
values could be generated. These were loosely copied from existing
virStorageBackendUpdateVolTargetInfoFD.
Similar to the openflags which allow VIR_STORAGE_VOL_OPEN_NOERROR to be
passed to avoid open errors, add a 'readflags' variable so that in the
future read failures could also be ignored.
https://bugzilla.redhat.com/show_bug.cgi?id=1282288
Rather than using just open on the path, allow for the possibility that
the path to be opened resides on an NFS root-squash target and was created
under a different uid/gid.
Without using virFileOpenAs an attempt to get the volume size data may fail
if the current user doesn't have permissions to read the volume, such as
would be the case if mode wasn't supplied in the volume XML and the default
VIR_STORAGE_DEFAULT_VOL_PERM_MODE (e.g. 0600) was used. Under this scenario
the owner/group is not root:root, thus this path run under root would fail
to open/read the volume.
NB: The virFileOpenAs code using OPEN_FORK will only work when the failure
is not EACESS/EPERM and the path resolves to a shared file system.
https://bugzilla.redhat.com/show_bug.cgi?id=1282288
Although commit id '77346f27' resolves part of the problem regarding creating
a qemu-img image in an NFS root-squash environment, it really didn't fix the
entire problem. Unfortunately it only masked the problem. It seems qemu-img
must open/create the image using 0644, which if used by target.perms would
result in the chmod not being called since the mode desired and set match.
Although qemu-img could conceivably ignore the mode when creating, libvirt
has more knowledge of the environment and can make the adjustment to the
mode far more easily by using virFileOpenAs with VIR_FILE_OPEN_FORCE_MODE.
If that's successful, then we know on return the file will have the right
owner and mode, so we can declare success
https://bugzilla.redhat.com/show_bug.cgi?id=1277781
The virStoragePoolFCRefreshThread had passed a pointer to the pool obj
in the virStoragePoolFCRefreshInfoPtr; however, we cannot assume that
the pool exists still since we don't keep the pool lock throughout
the duration of the thread.
Therefore, instead of passing the pool obj pointer, pass the UUID of
the pool and perform a lookup. If found, then we can perform the
refresh using the locked pool obj pointer; otherwise, we just exit
the thread since the pool is now gone.
https://bugzilla.redhat.com/show_bug.cgi?id=1233003
Commit id 'fdda3760' only managed a symptom where it was possible to
create a file in a pool without libvirt's knowledge, so it was reverted.
The real fix is to have all the createVol API's which actually create
a volume (disk, logical, zfs) and the buildVol API's which handle the
real creation of some volume file (fs, rbd, sheepdog) manage deleting
any volume which they create when there is some sort of error in
processing the volume.
This way the onus isn't left up to the storage_driver to determine whether
the buildVol failure was due to some failure as a result of adjustments
made to the volume after creation such as getting sizes, changing ownership,
changing volume protections, etc. or simple a failure in creation.
Without needing to consider that the volume has to be removed, the
buildVol failure path only needs to remove the volume from the pool.
This way if a creation failed due to duplicate name, libvirt wouldn't
remove a volume that it didn't create in the pool target.
This reverts commit fdda37608a.
This commit only manages a symptom of finding a buildRet failure
where a volume was not listed in the pool, but someone created the
volume outside of libvirt in the pool being managed by libvirt.
After successfully returning from virFileOpenAs, if subsequent calls fail,
then we need to remove the file since our caller expects that failures after
creation will remove the created file.
After a successful qemu-img/qcow-create of the backing file, if we
fail to stat the file, change it owner/group, or mode, then the
cleanup path should remove the file.
Currently the code does not handle the NFS root squash environment
properly since if the file gets created, then the subsequent chmod
will fail in a root squash environment where we're creating a file
in the pool with qemu tools, such as seen via:
$ virsh vol-create-from $pool $file.xml file.img --inputpool $pool
assuming $file.xml is creating a file of "<format type='qcow2'"> from
an existing file.img in the pool of "<format type='raw'>".
This patch will utilize the virCommandSetUmask when creating the file
in the NETFS pool. The virCommandSetUmask API was added in commit id
'0e1a1a8c4', which was after the original code was developed in commit
id 'e1f27784' to attempt to handle the root squash environment.
Also, rather than blindly attempting to chmod, check to see if the
st_mode bits from the stat match what we're trying to set and only
make the chmod if they don't.
Also, a slight adjustment to the fallback algorithm to move the
virCommandSetUID/virCommandSetGID inside the if (!filecreated) since
they're only useful if we need to attempt to create the file again.
When a RBD volume has snapshots it can not be removed.
This patch introduces a new flag to force volume removal,
VIR_STORAGE_VOL_DELETE_WITH_SNAPSHOTS.
With this flag any existing snapshots will be removed prior to
removing the volume.
No existing mechanism in libvirt allowed us to pass such information,
so that's why a new flag was introduced.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
We have macros for both positive and negative string matching.
Therefore there is no need to use !STREQ or !STRNEQ. At the same
time as we are dropping this, new syntax-check rule is
introduced to make sure we won't introduce it again.
Signed-off-by: Ishmanpreet Kaur Khera <khera.ishman@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1233003
Track when the logical volume was successfully created in order to
properly handle the call to virStorageBackendLogicalDeleteVol. It's
possible that the failure to create was because someone created an
LV in the pool outside of libvirt's knowledge. In this case, we don't
want to delete that LV. A subsequent or future refresh of the pool
will find the volume and cause an earlier failure
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit id '1b5685da' refactored the code to move buildvoldef inside
the buildVol conditional; however, the VIR_FREE of the memory was
left only when 'buildret' failed, thus we're leaking memory.
Signed-off-by: John Ferlan <jferlan@redhat.com>
As of commit id '155ca616' a 'refreshVol' is called after a buildVol
succeeds in storageVolCreateXML, thus a volStorageBackendSheepdogRefreshVolInfo
call in virStorageBackendSheepdogBuildVol is no longer necessary.
Additionally, the 'conn' parameter becomes unused.
Signed-off-by: John Ferlan <jferlan@redhat.com>
As of commit id '155ca616' a 'refreshVol' is called after the buildVol
succeeds in storageVolCreateXML, thus the volStorageBackendRBDRefreshVolInfo
call in virStorageBackendRBDBuildVol is no longer necessary.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1256999
After creating a copy of the 'authdef' in a pool -> disk translation,
unconditionally clear the 'authType' in the resulting disk auth def
structure since that's used for a storage pool and not a disk. This
ensures virStorageAuthDefFormat will properly format the <auth> XML
for a <disk> (e.g. it won't have a <auth type='%s'.../>).
https://bugzilla.redhat.com/show_bug.cgi?id=1247987
Calculation of the extended and logical partition values for the disk
pool is complex. As the bz points out an extended partition should have
it's allocation initialized to 0 (zero) and keep the capacity as the size
dictated by the extents read. Then for each logical partition found,
adjust the allocation of the extended partition.
Finally, previous logic tried to avoid recalculating things if a logical
partition was deleted; however, since we now have special logic to handle
the allocation of the extended partition, just make life easier by reading
the partition table again - rather than doing the reverse adjustment.
https://bugzilla.redhat.com/show_bug.cgi?id=1251461
When 'starting' up a disk pool, we need to make sure the label on the
device is valid; otherwise, the followup refreshPool will assume the
disk has been properly formatted for use. If we don't find the valid
label, then refuse the start and give a proper reason.