Rather than preallocating a set number of elements, then walking through
the extents and adjusting the specific element in place, use the APPEND
macros to handle that chore.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Create a helper routine in order to parse any extents information
including the extent size, length, and the device string contained
within the generated 'lvs' output string.
A future patch would then be able to avoid the code more cleanly
Signed-off-by: John Ferlan <jferlan@redhat.com>
By opening a RBD volume in Read-Only we do not register a
watcher on the header object inside the Ceph cluster.
Refreshing a volume only calls rbd_stat() which is a operation
which does not write to a RBD image.
This allows us to use a cephx user which has no write
permissions if we would want to use the libvirt storage pool
for informational purposes only.
It also saves us a write into the Ceph cluster which should
speed up refreshing a RBD pool.
rbd_open_read_only() is available in all librbd versions which
also support rbd_open().
Signed-off-by: Wido den Hollander <wido@widodh.nl>
RBD supports cloning by creating a snapshot, protecting it and create
a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between
the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using
the rbd_clone2() function from librbd.
rbd_clone2() is available in librbd since Ceph version Dumpling (0.67) which
dates back to August 2013.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if
never changes. This reduces the amount of snapshots created for that RBD image
which benefits the performance of the Ceph cluster.
During build the decision will be made to use either rbd_diff_iterate() or
rbd_diff_iterate2().
The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt
are already format 2.
If a RBD format 1 image is used as the original volume the backend will report
a VIR_ERR_OPERATION_UNSUPPORTED error.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Using VIR_STORAGE_VOL_WIPE_ALG_TRIM a RBD volume can be trimmed down
to 0 bytes using rbd_discard()
Effectively all the data on the volume will be lost/gone, but the volume
remains available for use afterwards.
Starting at offset 0 the storage pool will call rbd_discard() in stripe
size * count increments which is usually 4MB. Stripe size being 4MB and
count 1.
rbd_discard() is available since Ceph version Dumpling (0.67) which dates
back to August 2013.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This new algorithm adds support for wiping volumes using TRIM.
It does not overwrite all the data in a volume, but it tells the
backing storage pool/driver that all bytes in a volume can be
discarded.
It depends on the backing storage pool how this is handled.
A SCSI backend might send UNMAP commands to remove all data present
on a LUN.
A Ceph backend might use rbd_discard() to instruct the Ceph cluster
that all data on that RBD volume can be discarded.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When wiping the RBD image will be filled with zeros started
at offset 0 and until the end of the volume.
This will result in the RBD volume growing to it's full allocation
on the Ceph cluster. All data on the volume will be overwritten
however, making it unavailable.
It does NOT take any RBD snapshots into account. The original data
might still be in a snapshot of that RBD volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Use the cast of (virStorageVolWipeAlgorithm) adding the missing case:'s
(VIR_STORAGE_VOL_WIPE_ALG_ZERO and VIR_STORAGE_VOL_WIPE_ALG_LAST).
Additionally, the old code would also still run the SCRUB command on
default since it didn't go to cleanup when a invalid flag was supplied.
We now go to cleanup and exit if a invalid flag would be provided.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When commit id '82c1740a' made changes to the output format (changing from
using a ',' separator to '#'), the examples in the lvs output from the
comments weren't changed.
Additionally, the two new fields added ('segtype' and 'stripes') were
not included in the output, leaving it well confusing.
This patch fixes the sample output, adds a 'striped' example, and makes
other comment related adjustments for long line and spacing between followup
'NB' remarks (while I'm there).
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1265694
In order to be able to process disk storage pool's using a multipath
device to handle the partitions, libvirt_parthelper will need a way to
not automatically add a partition separator "p" to the generated device
name for each partition found. This is designed to mimic the multipath
features known as 'user_friendly_names' and custom 'alias' name.
If the part_separator attribute is set to "no", then generation of the
multipath partition name will not include the "p" partition separator
unless the source device path name ends with a number. The generated
partition names that get passed back to libvirt are processed in order
to find the device mapper multipath (dm-#) path device.
For example, device path "/dev/mapper/mpatha" would create partitions
"/dev/mapper/mpatha1", "/dev/mapper/mpatha2", etc. instead of
"/dev/mapper/mpathap1", "/dev/mapper/mpathap2", etc. If the device
path ends with a number "/dev/mapper/mpatha1", then the algorithm
to generate names "/dev/mapper/mpatha1p1", "/dev/mapper/mpatha1p2", etc.
would be utilized.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This was reported in bug #1298024 where r would be filled with the
return code of rbd_open().
Should rbd_snap_unprotect() fail for any reason the virReportSystemError
call would return 'Success' since rbd_open() succeeded.
https://bugzilla.redhat.com/show_bug.cgi?id=1298024
Signed-off-by: Wido den Hollander <wido@widodh.nl>
If no port number was provided for a storage pool libvirt defaults to
port 6789; however, librbd/librados already default to 6789 when no port
number is provided.
In the future Ceph will switch to a new port for the Ceph monitors since
port 6789 is already assigned to a different application by IANA.
Port 6789 is assigned to SMC-HTTPS and Ceph now has port 3300 assigned as
the 'Ceph monitor' port.
In this case it is the best solution to not hardcode any port number into
libvirt and let librados handle the connection.
Only if a user specifies a different port number we pass it down to librados,
otherwise we leave it blank.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
merge
It could happen that rbd_list() returns X names, but that while
refreshing the pool one of those RBD images is removed from Ceph
through a different route then libvirt.
We do not need to error out in such case, we can simply ignore the
volume and continue.
error : volStorageBackendRBDRefreshVolInfo:289 :
failed to open the RBD image 'vol-998': No such file or directory
It could also be that one or more Placement Groups (PGs) inside Ceph
are inactive due to a system failure.
If that happens it could be that some RBD images can not be refreshed
and a timeout will be raised by librados.
error : volStorageBackendRBDRefreshVolInfo:289 :
failed to open the RBD image 'vol-893': Connection timed out
Ignore the error and continue to refresh the rest of the pool's
contents.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
It could be that we error out while the RBD image has not been
opened yet. This would cause us to call rbd_close() on pointer
which has not been initialized.
Set it to NULL by default and only close if it is not NULL.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Commit id 'aeb1078ab' added a buildPool option and failure path which
calls virStoragePoolObjRemove, which unlocks the pool, clears the 'pool'
variable, and goto cleanup. However, at cleanup virStoragePoolObjUnlock
is called without check if pool is non NULL.
This used to return 'unkown' and that was not correct.
A vol-dumpxml now returns:
<volume type='network'>
<name>image3</name>
<key>libvirt/image3</key>
<source>
</source>
<capacity unit='bytes'>10737418240</capacity>
<allocation unit='bytes'>10737418240</allocation>
<target>
<path>libvirt/image3</path>
<format type='raw'/>
</target>
</volume>
The RBD driver will now error out if a different format than RAW
is provided when creating a volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Valgrind complained:
==28277== 38 bytes in 1 blocks are definitely lost in loss record 298 of 957
==28277== at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==28277== by 0x82D7F57: __vasprintf_chk (in /lib64/libc-2.12.so)
==28277== by 0x52EF16A: virVasprintfInternal (stdio2.h:199)
==28277== by 0x52EF25C: virAsprintfInternal (virstring.c:514)
==28277== by 0x52B1FA9: virFileBuildPath (virfile.c:2831)
==28277== by 0x19B1947C: storageDriverAutostart (storage_driver.c:191)
==28277== by 0x19B196A7: storageStateAutoStart (storage_driver.c:307)
==28277== by 0x538527E: virStateInitialize (libvirt.c:793)
==28277== by 0x11D7CF: daemonRunStateInit (libvirtd.c:947)
==28277== by 0x52F4694: virThreadHelper (virthread.c:206)
==28277== by 0x6E08A50: start_thread (in /lib64/libpthread-2.12.so)
==28277== by 0x82BE93C: clone (in /lib64/libc-2.12.so)
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
The initial commit '74951eade' did not include the proper check for whether
any flags are supported by the driver.
Even though the driver doesn't support VIR_STORAGE_VOL_DELETE_ZEROED,
it still checks and allows the processing to continue
Also add the new VIR_STORAGE_VOL_DELETE_WITH_SNAPSHOTS since it is handled
as of commit id '3c7590e0a'.
https://bugzilla.redhat.com/show_bug.cgi?id=830056
Add flags handling to the virStoragePoolCreate and virStoragePoolCreateXML
API's which will allow the caller to provide the capability for the storage
pool create API's to also perform a pool build during creation rather than
requiring the additional buildPool step. This will allow transient pools
to be defined, built, and started.
The new flags are:
* VIR_STORAGE_POOL_CREATE_WITH_BUILD
Perform buildPool without any flags passed.
* VIR_STORAGE_POOL_CREATE_WITH_BUILD_OVERWRITE
Perform buildPool using VIR_STORAGE_POOL_BUILD_OVERWRITE flag.
* VIR_STORAGE_POOL_CREATE_WITH_BUILD_NO_OVERWRITE
Perform buildPool using VIR_STORAGE_POOL_BUILD_NO_OVERWRITE flag.
It is up to the backend to handle the processing of build flags. The
overwrite and no-overwrite flags are mutually exclusive.
NB:
This patch is loosely based upon code originally authored by Osier
Yang that were not reviewed and pushed, see:
https://www.redhat.com/archives/libvir-list/2012-July/msg01328.html
Commit id '71b803ac' assumed that the storage pool source device path
was required for a 'logical' pool. This resulted in a failure to start
a pool without any device path defined.
So, adjust the virStorageBackendLogicalMatchPoolSource logic to
return success if at least the pool name matches the vgs output
when no pool source device path is/are provided.
https://bugzilla.redhat.com/show_bug.cgi?id=1270709
When a volume wipe is successful, perform a volume refresh afterwards to
update any volume data that may be used in future volume commands, such as
volume resize. For a raw file volume, a wipe could truncate the file and
a followup volume resize the capacity may fail because the volume target
allocation isn't updated to reflect the wipe activity.
The only caller always passes 0 for the extent start.
Drop the 'extent_start' parameter, as well as the mention of extents
from the function name.
Change off_t extent_length to unsigned long long wipe_len, as well as the
'remain' variable.
Return -1:
* on all failures of fdatasync. Instead of propagating -errno
all the way up to the virStorageVolWipe API, which is documented
to return 0 or -1.
* after a partial wipe. If safewrite failed, we would re-use the
non-negative return value of lseek (which should be 0 in this case,
because that's the only offset we seek to).
https://bugzilla.redhat.com/show_bug.cgi?id=1025230
Add a new helper virStorageBackendLogicalMatchPoolSource to compare the
pool's source name against the output from a 'pvs' command to list all
volume group physical volume data on the host. In addition, compare the
pool's source device list against the particular volume group's device
list to ensure the source device(s) listed for the pool match what the
was listed for the volume group.
Then for pool startup or check API's we need to call this new API in
order to ensure that the pool we're about to start or declare active
during checkPool has a valid definition vs. the running host.
Rework virStorageBackendLogicalFindPoolSources a bit to create a
helper virStorageBackendLogicalGetPoolSources that will make the
pvs call in order to generate a list of associated pv_name and vg_name's.
A future patch will make use of this for start/check processing to
ensure the storage pool source definition matches expectations.
https://bugzilla.redhat.com/show_bug.cgi?id=1025230
When determining whether a FS pool is mounted, rather than assuming that
the FS pool is mounted just because the target.path is in the mount list,
let's make sure that the FS pool source matches what is mounted
Refactor the code that builds the pool source string during the FS
storage pool mount to be a separate helper.
A future patch will use the helper in order to validate the mounted
FS matches the pool's expectation during poolCheck processing
The libvirt file system storage driver determines what file to
act on by concatenating the pool location with the volume name.
If a user is able to pick names like "../../../etc/passwd", then
they can escape the bounds of the pool. For that matter,
virStoragePoolListVolumes() doesn't descend into subdirectories,
so a user really shouldn't use a name with a slash.
Normally, only privileged users can coerce libvirt into creating
or opening existing files using the virStorageVol APIs; and such
users already have full privilege to create any domain XML (so it
is not an escalation of privilege). But in the case of
fine-grained ACLs, it is feasible that a user can be granted
storage_vol:create but not domain:write, and it violates
assumptions if such a user can abuse libvirt to access files
outside of the storage pool.
Therefore, prevent all use of volume names that contain "/",
whether or not such a name is actually attempting to escape the
pool.
This changes things from:
$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
Vol ../../../../../../etc/haha created
$ rm /etc/haha
to:
$ virsh vol-create-as default ../../../../../../etc/haha --capacity 128
error: Failed to create vol ../../../../../../etc/haha
error: Requested operation is not valid: volume name '../../../../../../etc/haha' cannot contain '/'
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1276198
Prior to commit id '98322052' failure to saferead the block device would
cause an error to be logged and the device to be skipped while attempting
to discover/create a stable target path for a new LUN (NPIV).
This was because virStorageBackendSCSIFindLUs ignored errors from
processLU and virStorageBackendSCSINewLun.
Ignoring the failure allowed a multipath device with an "active" and
"ghost" to be present on the host with the "ghost" block device being
ignored. This patch will return a -2 to the caller indicating the desire
to ignore the block device since it cannot be used directly rather than
fail the pool startup.
I found this useful while processing a volume that wouldn't end up
showing up in the resulting list of block volumes. In this case, the
partition type wasn't found in the disk_types table.
Similar to the openflags VIR_STORAGE_VOL_OPEN_NOERROR processing, if some
read processing operation fails, check the readflags for the corresponding
error flag being set. If so, rather then causing an error - use VIR_WARN
to flag the error, but return -2 which some callers can use to perform
specific actions. Use a new VIR_STORAGE_VOL_READ_NOERROR flag in a new
VolReadErrorMode enum.
While processing the volume for lseek, virFileReadHeaderFD, and
virStorageFileGetMetadataFromBuf - failure would cause an error,
but ret would not be set. That would result in an error message being
sent, but successful status being returned.
Just so it's clearer what to expect upon input and what types of return
values could be generated. These were loosely copied from existing
virStorageBackendUpdateVolTargetInfoFD.
Similar to the openflags which allow VIR_STORAGE_VOL_OPEN_NOERROR to be
passed to avoid open errors, add a 'readflags' variable so that in the
future read failures could also be ignored.
https://bugzilla.redhat.com/show_bug.cgi?id=1282288
Rather than using just open on the path, allow for the possibility that
the path to be opened resides on an NFS root-squash target and was created
under a different uid/gid.
Without using virFileOpenAs an attempt to get the volume size data may fail
if the current user doesn't have permissions to read the volume, such as
would be the case if mode wasn't supplied in the volume XML and the default
VIR_STORAGE_DEFAULT_VOL_PERM_MODE (e.g. 0600) was used. Under this scenario
the owner/group is not root:root, thus this path run under root would fail
to open/read the volume.
NB: The virFileOpenAs code using OPEN_FORK will only work when the failure
is not EACESS/EPERM and the path resolves to a shared file system.
https://bugzilla.redhat.com/show_bug.cgi?id=1282288
Although commit id '77346f27' resolves part of the problem regarding creating
a qemu-img image in an NFS root-squash environment, it really didn't fix the
entire problem. Unfortunately it only masked the problem. It seems qemu-img
must open/create the image using 0644, which if used by target.perms would
result in the chmod not being called since the mode desired and set match.
Although qemu-img could conceivably ignore the mode when creating, libvirt
has more knowledge of the environment and can make the adjustment to the
mode far more easily by using virFileOpenAs with VIR_FILE_OPEN_FORCE_MODE.
If that's successful, then we know on return the file will have the right
owner and mode, so we can declare success
https://bugzilla.redhat.com/show_bug.cgi?id=1277781
The virStoragePoolFCRefreshThread had passed a pointer to the pool obj
in the virStoragePoolFCRefreshInfoPtr; however, we cannot assume that
the pool exists still since we don't keep the pool lock throughout
the duration of the thread.
Therefore, instead of passing the pool obj pointer, pass the UUID of
the pool and perform a lookup. If found, then we can perform the
refresh using the locked pool obj pointer; otherwise, we just exit
the thread since the pool is now gone.
https://bugzilla.redhat.com/show_bug.cgi?id=1233003
Commit id 'fdda3760' only managed a symptom where it was possible to
create a file in a pool without libvirt's knowledge, so it was reverted.
The real fix is to have all the createVol API's which actually create
a volume (disk, logical, zfs) and the buildVol API's which handle the
real creation of some volume file (fs, rbd, sheepdog) manage deleting
any volume which they create when there is some sort of error in
processing the volume.
This way the onus isn't left up to the storage_driver to determine whether
the buildVol failure was due to some failure as a result of adjustments
made to the volume after creation such as getting sizes, changing ownership,
changing volume protections, etc. or simple a failure in creation.
Without needing to consider that the volume has to be removed, the
buildVol failure path only needs to remove the volume from the pool.
This way if a creation failed due to duplicate name, libvirt wouldn't
remove a volume that it didn't create in the pool target.
This reverts commit fdda37608a.
This commit only manages a symptom of finding a buildRet failure
where a volume was not listed in the pool, but someone created the
volume outside of libvirt in the pool being managed by libvirt.
After successfully returning from virFileOpenAs, if subsequent calls fail,
then we need to remove the file since our caller expects that failures after
creation will remove the created file.
After a successful qemu-img/qcow-create of the backing file, if we
fail to stat the file, change it owner/group, or mode, then the
cleanup path should remove the file.
Currently the code does not handle the NFS root squash environment
properly since if the file gets created, then the subsequent chmod
will fail in a root squash environment where we're creating a file
in the pool with qemu tools, such as seen via:
$ virsh vol-create-from $pool $file.xml file.img --inputpool $pool
assuming $file.xml is creating a file of "<format type='qcow2'"> from
an existing file.img in the pool of "<format type='raw'>".
This patch will utilize the virCommandSetUmask when creating the file
in the NETFS pool. The virCommandSetUmask API was added in commit id
'0e1a1a8c4', which was after the original code was developed in commit
id 'e1f27784' to attempt to handle the root squash environment.
Also, rather than blindly attempting to chmod, check to see if the
st_mode bits from the stat match what we're trying to set and only
make the chmod if they don't.
Also, a slight adjustment to the fallback algorithm to move the
virCommandSetUID/virCommandSetGID inside the if (!filecreated) since
they're only useful if we need to attempt to create the file again.
When a RBD volume has snapshots it can not be removed.
This patch introduces a new flag to force volume removal,
VIR_STORAGE_VOL_DELETE_WITH_SNAPSHOTS.
With this flag any existing snapshots will be removed prior to
removing the volume.
No existing mechanism in libvirt allowed us to pass such information,
so that's why a new flag was introduced.
Signed-off-by: Wido den Hollander <wido@widodh.nl>