https://bugzilla.redhat.com/show_bug.cgi?id=1251190
So, if domain loses access to storage, sanlock tries to kill it
after some timeout. So far, the default is 80 seconds. But for
some scenarios this might not be enough. We should allow users to
adjust the timeout according to their needs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Lets use wrapper functions virLockDaemonLock and
virLockDaemonUnlock instead of virMutexLock and virMutexUnlock.
This has no functional impact, but it's easier to read (at least
for me).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So we have this mechanism that on SIGUSR1 the virtlockd dumps its
internal state into a JSON file, reexec itself and the reloads
the internal state back. However, there's a bug in our
implementation:
(gdb) signal SIGUSR1
Continuing with signal SIGUSR1.
[Thread 0x7fd094f7b700 (LWP 10602) exited]
process 10600 is executing new program: /home/zippy/work/libvirt/libvirt.git/src/virtlockd
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fb28bc3c700 (LWP 14501)]
Program received signal SIGSEGV, Segmentation fault.
0x00007fb29133d530 in virExpandN (ptrptr=0x70, size=8, countptr=0x68, add=1, report=true, domcode=7, filename=0x7fb29138aeab "rpc/virnetserver.c", funcname=0x7fb29138b680 <__FUNCTION__.15821> "virNetServerAddProgram", linenr=661) at util/viralloc.c:288
288 if (*countptr + add < *countptr) {
(gdb) bt
#0 0x00007fb29133d530 in virExpandN (ptrptr=0x70, size=8, countptr=0x68, add=1, report=true, domcode=7, filename=0x7fb29138aeab "rpc/virnetserver.c", funcname=0x7fb29138b680 <__FUNCTION__.15821> "virNetServerAddProgram", linenr=661) at util/viralloc.c:288
#1 0x00007fb29132a267 in virNetServerAddProgram (srv=0x0, prog=0x7fb2915d08b0) at rpc/virnetserver.c:661
#2 0x00007fb29131f27f in main (argc=1, argv=0x7fff8f771298) at locking/lock_daemon.c:1445
Notice the NULL @srv passed to frame 2? Usually, the @srv
variable is initialized on fresh start. However, in case of
daemon reload, the code path that is responsible for initializing
the value was not triggered and therefore we crashed immediately.
Fix this by always setting the variable.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The internal representation of a JSON array counts the items in
size_t. However, for some reason, when asking for the count it's
reported as int. Firstly, we need the function to return a signed
type as it's returning -1 on an error. But, not every system has
integer the same size as size_t. Therefore, lets return ssize_t.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have virNetDaemon object holding all the data and being
capable of referencing multiple servers, having a duplicate reference to
a single server stored in virLockDaemon isn't necessary anymore. This
patch removes the above described element.
Since its introduction in 2011 (particularly in commit f4324e3292),
the option doesn't work. It just effectively disables all incoming
connections. That's because the client private data that contain the
'keepalive_supported' boolean, are initialized to zeroes so the bool is
false and the only other place where the bool is used is when checking
whether the client supports keepalive. Thus, according to the server,
no client supports keepalive.
Removing this instead of fixing it is better because a) apparently
nobody ever tried it since 2011 (4 years without one month) and b) we
cannot know whether the client supports keepalive until we get a ping or
pong keepalive packet. And that won't happen until after we dispatched
the ConnectOpen call.
Another two reasons would be c) the keepalive_required was tracked on
the server level, but keepalive_supported was in private data of the
client as well as the check that was made in the remote layer, thus
making all other instances of virNetServer miss this feature unless they
all implemented it for themselves and d) we can always add it back in
case there is a request and a use-case for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Some hypervisors like Xen do not have PIDs associated with domains.
Relax the requirement for PID != 0 in the locking code so it can
be used by hypervisors that do not represent domains as a process
running on the host.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
When acquiring resource via sanlock fails, we would report it as
VIR_ERR_INTERNAL_ERROR, which is not very friendly to applications using
libvirt. Moreover, the lockd driver would report the same failure as
VIR_ERR_RESOURCE_BUSY, which looks better.
Unfortunately, in sanlock driver we don't really know if acquiring the
resource failed because it was already locked or there was another
reason behind. But the end result is the same and I think using
VIR_ERR_RESOURCE_BUSY reason for all acquire failures is still better
than what we have now.
https://bugzilla.redhat.com/show_bug.cgi?id=1165119
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In the order of appearance:
* MAX_LISTEN - never used
added by 23ad665c (qemud) and addec57 (lock daemon)
* NEXT_FREE_CLASS_ID - never used, added by 07d1b6b
* virLockError - never used, added by eb8268a4
* OPENVZ_MAX_ARG, CMDBUF_LEN, CMDOP_LEN
unused since the removal of ADD_ARG_LIT in d8b31306
* QEMU_NB_PER_CPU_STAT_PARAM - unused since 897808e
* QEMU_CMD_PROMPT, QEMU_PASSWD_PROMPT - unused since 1dc10a7
* TEST_MODEL_WORDSIZE - unused since c25c18f7
* TEMPDIR - never used, added by 714bef5
* NSIG - workaround around old headers
added by commit 60ed1d2
unused since virExec was moved by commit 02e8691
* DO_TEST_PARSE - never used, added by 9afa006
* DIFF_MSEC, GETTIMEOFDAY - unused since eee6eb6
Commit v1.2.4-52-gda879e5 fixed issues with domains started before
sanlock driver was enabled by checking whether a running domain is
registered with sanlock and if it's not, sanlock driver is basically
ignored for the domain.
However, it was checking this even for domain which has just been
started and no sanlock_* API was called for them yet. This results in
cmd 9 target pid 2135544 not found
error messages to appear in sanlock.log whenever we start a new domain.
This patch avoids this useless check for freshly started domains.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
virLockManager*New APIs are never called with
VIR_LOCK_MANAGER_USES_STATE. Moreover, lockd driver does not maintain
any state that would need to be transferred during migration and thus it
should not mention VIR_LOCK_MANAGER_USES_STATE at all.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Not all files we want to find using virFileFindResource{,Full} are
generated when libvirt is built, some of them (such as RNG schemas) are
distributed with sources. The current API was not able to find source
files if libvirt was built in VPATH.
Both RNG schemas and cpu_map.xml are distributed in source tarball.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Remove the resize flag and use the same code path for all callers.
This flag was added by commit 18f0316 to allow virStorageFileResize
use 'safezero' while preserving the behavior.
Explicitly return -2 when a fallback to a different method should
be done, to make the code path more obvious.
Fail immediately when ftruncate fails in the mmap method,
as we did before commit 18f0316.
Currently virStorageFileResize() function uses build conditionals to
choose either the posix_fallocate() or syscall(SYS_fallocate) with no
fallback in order to preallocate the space in the newly resized file.
Since the safezero code has a similar set of conditionals modify the
resize and safezero code in order to allow the resize logic to make use
of safezero to unify the look/feel of the code paths.
Add a new boolean (resize) to safezero() to make the optional decision
whether to try syscall(SYS_fallocate) if the posix_fallocate fails because
HAVE_POSIX_FALLOCATE is not defined (eg, return -1 and errno == 0).
Create a local safezero_sys_fallocate in order to handle the resize
code paths that support that. If not present, the set errno = ENOSYS
in order to allow the caller to handle the failure scenarios.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160995
In our config files users are expected to pass several integer values
for different configuration knobs. However, majority of them expect a
nonnegative number and only a few of them accept a negative number too
(notably keepalive_interval in libvirtd.conf).
Therefore, a new type to config value is introduced: VIR_CONF_ULONG
that is set whenever an integer is positive or zero. With this
approach knobs accepting VIR_CONF_LONG should accept VIR_CONF_ULONG
too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since virDomainFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
On some places in the libvirt code we have:
f(a,z)
instead of
f(a, z)
This trivial patch fixes couple of such occurrences.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since not only systemd can do this (we'll be doing it as well few
patches later), change 'systemd' to 'caller' and fix LISTEN_FDS to
LISTEN_PID where applicable.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In many places we define a variable as a 'const char *' when in fact
we modify it just a few lines below. Or even free it. We should not do
that.
There's one exception though, in xenSessionFree() xenapi_utils.c. We
are freeing the xen_session structure which is defined in
xen/api/xen_common.h public header. The structure contains session_id
which is type of 'const char *' when in fact it should have been just
'char *'. So I'm leaving this unmodified, just noticing the fact in
comment.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the future we might need to track state of individual images. Move
the readonly and shared flags to the virStorageSource struct so that we
can keep them in a per-image basis.
Replace:
if (virBufferError(&buf)) {
virBufferFreeAndReset(&buf);
virReportOOMError();
...
}
with:
if (virBufferCheckError(&buf) < 0)
...
This should not be a functional change (unless some callers
misused the virBuffer APIs - a different error would be reported
then)
When a domain was started without registration in sanlock, but libvirt
was restarted after that, most of the operations failed due to
contacting sanlock about that process. E.g. migration could not be
performed because the locks couldn't be released (or inquired before a
release).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1088034
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Just move some code around for future patches to ease the review.
With this patch there is no need for drastic cleanup path later.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The virnetsocket.c API is hardcoded to pass --timeout=30 to
any daemon it auto-starts. For inexplicable reasons the virtlockd
daemon did not implement the --timeout option, so it would
immediately exit on autostart with an error.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make the lock plugin use virFileFindResource to find the
virtlockd daemon path, so that it executes the in-builddir
daemon if run from source tree.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=905282https://bugzilla.redhat.com/show_bug.cgi?id=967494
When lock failure is detected by sanlock, our sanlock_helper kill script
will try to restart (shutdown followed by start) the affected domain
when RESTART action is configured for it. While shutting down kills QEMU
and removes all its leases (which is what sanlock wants to happen),
trying to start it again just hangs because libvirt tries reacquire the
locks in the failed lock space. Hence, this action cannot be supported
by sanlock driver.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=905280https://bugzilla.redhat.com/show_bug.cgi?id=967493
Sanlock expects that the configured kill script either kills the PID on
lock failure or removes all locks the PID owns. If none of the two
options happen, sanlock will reboot the host. Although IGNORE action is
supposed to ignore the request to kill the PID or remove all leases,
it's certainly not designed to cause the host to be rebooted. That said,
IGNORE action is incompatible with sanlock and should be forbidden by
libvirt.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Part of a series of cleanups to use new accessor methods.
* src/locking/domain_lock.c (virDomainLockManagerAddDisk): Use
accessors.
Signed-off-by: Eric Blake <eblake@redhat.com>
A earlier commit changed the global log buffer so that it only
records messages that are explicitly requested via the log
filters setting. This removes the performance burden, and
improves the signal/noise ratio for messages in the global
buffer. At the same time though, it is somewhat pointless, since
all the recorded log messages are already going to be sent to an
explicit log output like syslog, stderr or the journal. The
global log buffer is thus just duplicating this data on stderr
upon crash.
The log_buffer_size config parameter is left in the augeas
lens to prevent breakage for users on upgrade. It is however
completely ignored hereafter.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit a1cbe4b5 added a check for spaces around assignments and this
patch extends it to checks for spaces around '=='. One exception is
virAssertCmpInt where comma after '==' is acceptable (since it is a
macro and '==' is its argument).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>