Regression introduced in commit e6b68d7 (Nov 2010).
Prior to that point, handlesAlloc was always a multiple of
EVENT_ALLOC_EXTENT (10), and was an int (so even if the subtraction
had been able to wrap, a negative value would be less than the count
not try to free the handles array). But after that point,
VIR_RESIZE_N made handlesAlloc grow geometrically (with a pattern of
10, 20, 30, 45 for the handles array) but still freed in multiples of
EVENT_ALLOC_EXTENT; and the count changed to size_t. Which means that
after 31 handles have been created, then 30 handles destroyed,
handlesAlloc is 5 while handlesCount is 1, and since (size_t)(1 - 5)
is indeed greater than 1, this then tried to free 10 elements, which
had the awful effect of nuking the handles array while there were
still live handles.
Nuking live handles puts libvirtd in an inconsistent state, and was
easily reproducible by starting and then stopping 60 faqemu guests.
* daemon/event.c (virEventCleanupTimeouts, virEventCleanupHandles):
Avoid integer wrap-around causing us to delete the entire array
while entries are still active.
* tests/eventtest.c (mymain): Expose the bug.
This bug has been present since before the time that commit
f8a519 (Dec 2008) tried to make the dispatch loop re-entrant.
Dereferencing eventLoop.handles outside the lock risks crashing, since
any other thread could have reallocated the array in the meantime.
It's a narrow race window, however, and one that would have most
likely resulted in passing bogus data to the callback rather than
actually causing a segv, which is probably why it has gone undetected
this long.
* daemon/event.c (virEventDispatchHandles): Cache data while
inside the lock, as the array might be reallocated once outside.
* .gnulib: Update to latest.
* bootstrap.conf (gnulib_modules): Import pipe-posix and waitpid
for mingw.
* src/remote/remote_driver.c (pipe) [WIN32]: Drop dead macro.
* daemon/event.c (pipe) [WIN32]: Drop dead function.
* src/util/threads.h (virThreadID): New prototype.
* src/util/threads-pthread.c (virThreadID): New function.
* src/util/threads-win32.c (virThreadID): Likewise.
* src/libvirt_private.syms (threads.h): Export it.
* daemon/event.c (virEventInterruptLocked): Use it to avoid
warning on BSD systems.
* daemon/libvirtd.h (qemud_server): Change types of members
tracking array sizes, and add allocation trackers.
* daemon/event.c (virEventLoop): Likewise.
(virEventAddHandleImpl, virEventAddTimeoutImpl)
(virEventCleanupTimeouts, virEventCleanupHandles): Use
VIR_RESIZE_N instead of VIR_REALLOC_N. Tweak debug messages to
match type changes.
* daemon/libvirtd.c (qemudDispatchServer, qemudRunLoop): Likewise.
The code currently uses pthreads APIs directly. This is not
portable to Win32 threads. Switch it over to use the portability
APIs. Also add a wrapper for pipe() which is subtely different
on Win32
* daemon/event.c: Switch to use virMutex & virThread.
Based on a warning from coverity. The safe* functions
guarantee complete transactions on success, but don't guarantee
freedom from failure.
* src/util/util.h (saferead, safewrite, safezero): Add
ATTRIBUTE_RETURN_CHECK.
* src/remote/remote_driver.c (remoteIO, remoteIOEventLoop): Ignore
some failures.
(remoteIOReadBuffer): Adjust error messages on read failure.
* daemon/event.c (virEventHandleWakeup): Ignore read failure.
On kernels with HZ=100, the resolution of sleeps in poll() is
quite bad. Doing a precise check on the expiry time vs the
current time will thus often thing the timer has not expired
even though we're within 10ms of the expected expiry time. This
then causes another pointless sleep in poll() for <10ms. Timers
do not need to have such precise expiration, so we treat a timer
as expired if it is within 20ms of the expected expiry time. This
also fixes the eventtest.c test suite on kernels with HZ=100
* daemon/event.c: Add 20ms fuzz when checking for timer expiry