The recently added sanlock_strerror function can be used to translate
sanlock's numeric errors into human readable strings.
https://bugzilla.redhat.com/show_bug.cgi?id=1409511
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The log and lock protocol don't have an extra handshake to close the
connection. Instead they just close the socket. Unfortunately that
resulted into a lot of spurious garbage logged to the system log files:
2017-03-17 14:00:09.730+0000: 4714: error : virNetSocketReadWire:1800 : End of file while reading data: Input/output error
or in the journal as:
Mar 13 16:19:33 xxxx virtlogd[32360]: End of file while reading data: Input/output error
Use the new facility in the netserverclient to suppress the IO error
report from the virNetSocket layer.
Linux still defaults to a 1024 open file handle limit. This causes
scalability problems for libvirtd / virtlockd / virtlogd on large
hosts which might want > 1024 guest to be running. In fact if each
guest needs > 1 FD, we can't even get to 500 guests. This is not
good enough when we see machines with 100's of physical cores and
TBs of RAM.
In comparison to other memory requirements of libvirtd & related
daemons, the resource usage associated with open file handles
is essentially line noise. It is thus reasonable to increase the
limits unconditionally for all installs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After deploying virtlogd by default we identified a number of
mistakes in the systemd unit file. virtlockd's relationship
to libvirtd is the same as virtlogd, so we must apply the
same unit file fixes to virtlockd
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Now that virLog{Get,Set}DefaultOutput routines are introduced we can wire them
up to the daemon's logging initialization code. Also, change the order of
operations a bit so that we still strictly honor our precedence of settings:
cmdline > env > config now that outputs and filters are not appended anymore.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Along with an empty string, it should also be possible for users to pass
NULL to the public APIs which in turn would trigger a routine(future
work) responsible for defining an appropriate default logging output
given the current circumstances.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This initially started as a fix of some debug printing in
virCgroupDetect. However it turned out that other places suffer
from the similar problem. While dealing with pids, esp. in cases
where we cannot use pid_t for ABI stability reasons, we often
chose an unsigned integer type. This makes no sense as pid_t is
signed.
Also, new syntax-check rule is introduced so we won't repeat this
mistake.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Similar to outputs, parser should do parsing only, thus the 'define' logic
is going to be stripped from virLogParseAndDefineFilters by replacing calls to
this method to virLogSetFilters instead.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Since virLogParseAndDefineOutputs is going to be stripped from 'output defining'
logic, replace all relevant occurrences with virLogSetOutputs call to make the
change transparent to all original callers (daemons mostly).
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Right now virLogParse* functions are doing both parsing and defining of filters
and outputs which should be two separate operations. Since the naming is
apparently a bit poor this patch renames these functions to
virLogParseAndDefine* which eventually will be replaced by virLogSet*.
Additionally, virLogParse{Filter,Output} will be later (after the split) reused,
so that these functions do exactly what the their name suggests.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1292984
Hold on to your hats, because this is gonna be wild.
In bd3e16a3 I've tried to expose sanlock io_timeout. What I had
not realized (because there is like no documentation for sanlock
at all) was very unusual way their APIs work. Basically, what we
do currently is:
sanlock_add_lockspace_timeout(&ls, io_timeout);
which adds a lockspace to sanlock daemon. One would expect that
io_timeout sets the io_timeout for it. Nah! That's where you are
completely off the tracks. It sets timeout for next lockspace you
will probably add later. Therefore:
sanlock_add_lockspace_timeout(&ls, io_timeout = 10);
/* adds new lockspace with default io_timeout */
sanlock_add_lockspace_timeout(&ls, io_timeout = 20);
/* adds new lockspace with io_timeout = 10 */
sanlock_add_lockspace_timeout(&ls, io_timeout = 40);
/* adds new lockspace with io_timeout = 20 */
And so on. You get the picture.
Fortunately, we don't allow setting io_timeout per domain or per
domain disk. So we just need to set the default used in the very
first step and hope for the best (as all the io_timeout-s used
later will have the same value).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, we are checking for sanlock_add_lockspace_timeout
which is good for now. But in a subsequent patch we are going to
use sanlock_write_lockspace (which sets an initial value for io
timeout for sanlock). Now, there is no reason to check for both
functions in sanlock library as the sanlock_write_lockspace was
introduced in 2.7 release and the one we are currently checking
for in the 2.5 release. Therefore it is safe to assume presence
of sanlock_add_lockspace_timeout when sanlock_write_lockspace
is detected.
Moreover, the macro for conditional compilation is renamed to
HAVE_SANLOCK_IO_TIMEOUT (as it now encapsulates two functions).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Admin API needs a way of addressing specific clients. Unlike servers, which we
are happy to address by names both because its name reflects its purpose (to
some extent) and we only have two of them (so far), naming clients doesn't make
any sense, since a) each client is an anonymous, i.e. not recognized after a
disconnect followed by a reconnect, b) we can't predict what kind of requests
it's going to send to daemon, and c) the are loads of them comming and going,
so the only viable option is to use an ID which is of a reasonably wide data
type.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
After this commit, all man pages are generated using the same two
steps:
1. Process a source $command.pod file with pod2man(1) to obtain
a valid man page in $command.$section.in
2. Process $command.$section.in with sed(1) to obtain the final
man page in $command.$section
Since servers know their name, there is no need to supply such
information twice. Also defeats inconsistencies.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
At first I did not want to do this, but after trying to implement some
newer feaures in the admin API I realized we need that to make our lives
easier. On the other hand they are not saved redundantly and the
virNetServer objects are still kept in a hash table.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Since the daemon can manage and add (at fresh start) multiple servers,
we also should be able to add them from a JSON state file in case of a
daemon restart, so post exec restart support for multiple servers is also
provided. Patch also updates virnetdaemontest accordingly.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Our existing virHashForEach method iterates through all items disregarding the
fact, that some of the iterators might have actually failed. Errors are usually
dispatched through an error element in opaque data which then causes the
original caller of virHashForEach to return -1. In that case, virHashForEach
could return as soon as one of the iterators fail. This patch changes the
iterator return type and adjusts all of its instances accordingly, so the
actual refactor of virHashForEach method can be dealt with later.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
When building with gcc-5 (particularly gcc-5.3.0 now) and having pdwtags
installed (package dwarves) make check fails with the following error:
$ make lock_protocol-struct
GEN lock_protocol-struct
--- lock_protocol-structs 2016-01-13 15:04:59.318809607 +0100
+++ lock_protocol-struct-t3 2016-01-13 15:05:17.703501234 +0100
@@ -26,10 +26,6 @@
virLockSpaceProtocolNonNullString name;
u_int flags;
};
-enum virLockSpaceProtocolAcquireResourceFlags {
- VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_SHARED = 1,
- VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE = 2,
-};
struct virLockSpaceProtocolAcquireResourceArgs {
virLockSpaceProtocolNonNullString path;
virLockSpaceProtocolNonNullString name;
Makefile:10415: recipe for target 'lock_protocol-struct' failed
make: *** [lock_protocol-struct] Error 1
That happens because without any specific options gcc doesn't keep enum
information in the resulting binary object. I managed to isolate the
parameters of gcc that caused this issue to disappear, however I
remember that they influenced the resulting binaries quite a bit and
were definitely not something we would want to add as mandatory to the
build process.
So to deal with this cleanly, let's take that enum and separate it out
to its own header file. Since it is only used in the lockd driver and
the protocol, lock_driver_lockd.h feels like a suitable name.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit b22344f328 mistakenly reordered
Default-* lines. Thanks to that I noticed that we are very inconsistent
with our init scripts, so I took the liberty of synchronizing them,
updating them and making them all look shiny and new. So apart from
fixing the LSB requirements, I also fixed the ordering, specified
runlevels and fix the link to the reference specification.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1251190
So, if domain loses access to storage, sanlock tries to kill it
after some timeout. So far, the default is 80 seconds. But for
some scenarios this might not be enough. We should allow users to
adjust the timeout according to their needs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Lets use wrapper functions virLockDaemonLock and
virLockDaemonUnlock instead of virMutexLock and virMutexUnlock.
This has no functional impact, but it's easier to read (at least
for me).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So we have this mechanism that on SIGUSR1 the virtlockd dumps its
internal state into a JSON file, reexec itself and the reloads
the internal state back. However, there's a bug in our
implementation:
(gdb) signal SIGUSR1
Continuing with signal SIGUSR1.
[Thread 0x7fd094f7b700 (LWP 10602) exited]
process 10600 is executing new program: /home/zippy/work/libvirt/libvirt.git/src/virtlockd
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fb28bc3c700 (LWP 14501)]
Program received signal SIGSEGV, Segmentation fault.
0x00007fb29133d530 in virExpandN (ptrptr=0x70, size=8, countptr=0x68, add=1, report=true, domcode=7, filename=0x7fb29138aeab "rpc/virnetserver.c", funcname=0x7fb29138b680 <__FUNCTION__.15821> "virNetServerAddProgram", linenr=661) at util/viralloc.c:288
288 if (*countptr + add < *countptr) {
(gdb) bt
#0 0x00007fb29133d530 in virExpandN (ptrptr=0x70, size=8, countptr=0x68, add=1, report=true, domcode=7, filename=0x7fb29138aeab "rpc/virnetserver.c", funcname=0x7fb29138b680 <__FUNCTION__.15821> "virNetServerAddProgram", linenr=661) at util/viralloc.c:288
#1 0x00007fb29132a267 in virNetServerAddProgram (srv=0x0, prog=0x7fb2915d08b0) at rpc/virnetserver.c:661
#2 0x00007fb29131f27f in main (argc=1, argv=0x7fff8f771298) at locking/lock_daemon.c:1445
Notice the NULL @srv passed to frame 2? Usually, the @srv
variable is initialized on fresh start. However, in case of
daemon reload, the code path that is responsible for initializing
the value was not triggered and therefore we crashed immediately.
Fix this by always setting the variable.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The internal representation of a JSON array counts the items in
size_t. However, for some reason, when asking for the count it's
reported as int. Firstly, we need the function to return a signed
type as it's returning -1 on an error. But, not every system has
integer the same size as size_t. Therefore, lets return ssize_t.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have virNetDaemon object holding all the data and being
capable of referencing multiple servers, having a duplicate reference to
a single server stored in virLockDaemon isn't necessary anymore. This
patch removes the above described element.
Since its introduction in 2011 (particularly in commit f4324e3292),
the option doesn't work. It just effectively disables all incoming
connections. That's because the client private data that contain the
'keepalive_supported' boolean, are initialized to zeroes so the bool is
false and the only other place where the bool is used is when checking
whether the client supports keepalive. Thus, according to the server,
no client supports keepalive.
Removing this instead of fixing it is better because a) apparently
nobody ever tried it since 2011 (4 years without one month) and b) we
cannot know whether the client supports keepalive until we get a ping or
pong keepalive packet. And that won't happen until after we dispatched
the ConnectOpen call.
Another two reasons would be c) the keepalive_required was tracked on
the server level, but keepalive_supported was in private data of the
client as well as the check that was made in the remote layer, thus
making all other instances of virNetServer miss this feature unless they
all implemented it for themselves and d) we can always add it back in
case there is a request and a use-case for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Some hypervisors like Xen do not have PIDs associated with domains.
Relax the requirement for PID != 0 in the locking code so it can
be used by hypervisors that do not represent domains as a process
running on the host.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
When acquiring resource via sanlock fails, we would report it as
VIR_ERR_INTERNAL_ERROR, which is not very friendly to applications using
libvirt. Moreover, the lockd driver would report the same failure as
VIR_ERR_RESOURCE_BUSY, which looks better.
Unfortunately, in sanlock driver we don't really know if acquiring the
resource failed because it was already locked or there was another
reason behind. But the end result is the same and I think using
VIR_ERR_RESOURCE_BUSY reason for all acquire failures is still better
than what we have now.
https://bugzilla.redhat.com/show_bug.cgi?id=1165119
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In the order of appearance:
* MAX_LISTEN - never used
added by 23ad665c (qemud) and addec57 (lock daemon)
* NEXT_FREE_CLASS_ID - never used, added by 07d1b6b
* virLockError - never used, added by eb8268a4
* OPENVZ_MAX_ARG, CMDBUF_LEN, CMDOP_LEN
unused since the removal of ADD_ARG_LIT in d8b31306
* QEMU_NB_PER_CPU_STAT_PARAM - unused since 897808e
* QEMU_CMD_PROMPT, QEMU_PASSWD_PROMPT - unused since 1dc10a7
* TEST_MODEL_WORDSIZE - unused since c25c18f7
* TEMPDIR - never used, added by 714bef5
* NSIG - workaround around old headers
added by commit 60ed1d2
unused since virExec was moved by commit 02e8691
* DO_TEST_PARSE - never used, added by 9afa006
* DIFF_MSEC, GETTIMEOFDAY - unused since eee6eb6
Commit v1.2.4-52-gda879e5 fixed issues with domains started before
sanlock driver was enabled by checking whether a running domain is
registered with sanlock and if it's not, sanlock driver is basically
ignored for the domain.
However, it was checking this even for domain which has just been
started and no sanlock_* API was called for them yet. This results in
cmd 9 target pid 2135544 not found
error messages to appear in sanlock.log whenever we start a new domain.
This patch avoids this useless check for freshly started domains.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
virLockManager*New APIs are never called with
VIR_LOCK_MANAGER_USES_STATE. Moreover, lockd driver does not maintain
any state that would need to be transferred during migration and thus it
should not mention VIR_LOCK_MANAGER_USES_STATE at all.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Not all files we want to find using virFileFindResource{,Full} are
generated when libvirt is built, some of them (such as RNG schemas) are
distributed with sources. The current API was not able to find source
files if libvirt was built in VPATH.
Both RNG schemas and cpu_map.xml are distributed in source tarball.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Remove the resize flag and use the same code path for all callers.
This flag was added by commit 18f0316 to allow virStorageFileResize
use 'safezero' while preserving the behavior.
Explicitly return -2 when a fallback to a different method should
be done, to make the code path more obvious.
Fail immediately when ftruncate fails in the mmap method,
as we did before commit 18f0316.
Currently virStorageFileResize() function uses build conditionals to
choose either the posix_fallocate() or syscall(SYS_fallocate) with no
fallback in order to preallocate the space in the newly resized file.
Since the safezero code has a similar set of conditionals modify the
resize and safezero code in order to allow the resize logic to make use
of safezero to unify the look/feel of the code paths.
Add a new boolean (resize) to safezero() to make the optional decision
whether to try syscall(SYS_fallocate) if the posix_fallocate fails because
HAVE_POSIX_FALLOCATE is not defined (eg, return -1 and errno == 0).
Create a local safezero_sys_fallocate in order to handle the resize
code paths that support that. If not present, the set errno = ENOSYS
in order to allow the caller to handle the failure scenarios.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1160995
In our config files users are expected to pass several integer values
for different configuration knobs. However, majority of them expect a
nonnegative number and only a few of them accept a negative number too
(notably keepalive_interval in libvirtd.conf).
Therefore, a new type to config value is introduced: VIR_CONF_ULONG
that is set whenever an integer is positive or zero. With this
approach knobs accepting VIR_CONF_LONG should accept VIR_CONF_ULONG
too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>