Bug https://bugzilla.redhat.com/show_bug.cgi?id=689374 reported libvirtd
crash during error dispatch.
The reason is that libvirtd uses remoteDispatchConnError() with non-NULL
conn parameter which means that virConnGetLastError() is used instead of
its thread safe replacement virGetLastError().
So when several libvirtd threads are reporting errors at the same time,
the errors can get mixed or corrupted or in case of bad luck libvirtd
itself crashes.
Since Daniel B. is going to rewrite this code from scratch on top of his
RPC infrastructure, I tried to come up with a minimal fix. Thus,
remoteDispatchConnError() now just ignores its conn argument and always
calls virGetLastError(). However, several callers had to be touched as
well, since no libvirt API is allowed to be called before dispatching
the error. Doing so would reset the error and we would have nothing to
dispatch. As a result of that, the code is not very nice but that
doesn't really make daemon/remote.c worse than it is now :-) And it will
all die soon, which is good.
The bug report also contains a reproducer in C which detects both mixed
up error messages and libvirtd crash. Before this patch, I was able to
crash libvirtd in about 20 seconds up to 3 minutes depending on number
of CPU cores (more is better) and luck.
Done mechanically with:
$ git grep -l '\bDEBUG0\? *(' | xargs -L1 sed -i 's/\bDEBUG0\? *(/VIR_&/'
followed by manual deletion of qemudDebug in daemon/libvirtd.c, along
with a single 'make syntax-check' fallout in the same file, and the
actual deletion in src/util/logging.h.
* src/util/logging.h (DEBUG, DEBUG0): Delete.
* daemon/libvirtd.h (qemudDebug): Likewise.
* global: Change remaining clients over to VIR_DEBUG counterpart.
When failing to serialize the normal RPC reply, try harder to
send a error message back to the client, instead of immediately
closing the connection.
* daemon/dispatch.c: Improve error messages when RPC reply
can not be sent
When libvirtd fails to serialize a message to XDR the client
connection is terminated immediately. To enable this to be
diagnosed, log the message which caused the problem on the
server
* daemon/dispatch.c: Log XDR serialization failures
Since we are adding a new "per-hypervisor" protocol, we
make it so that the qemu remote protocol uses a new
PROTOCOL and PROGRAM number. This allows us to easily
distinguish it from the normal REMOTE protocol.
This necessitates changing the proc in remote_message_header
from a "remote_procedure" to an "unsigned", which should
be the same size (and thus preserve the on-wire protocol).
Changes since v1:
- Fixed up a couple of script problems in remote_generate_stubs.pl
- Switch an int flag to a bool in dispatch.c
Changes since v2:
- None
Changes since v3:
- Change unsigned proc to signed proc, to conform to spec
Signed-off-by: Chris Lalancette <clalance@redhat.com>
dispatch.c requires stdio.h (and stdarg.h), however, currently
dispatch.c implicitly relys on rpc/xdr.h to include stdio.h.
If rpc/xdr.h unxpectedly does not include stdio.h, the compilation
of dispatch.c fails.
This can happen, for example, when portablexdr is installed
under /usr/local; because portablexdr's rpc/xdr.h does not
include stdio.h and gcc looks up it not /usr/include/rpc/xdr.h.
Note that stdarg.h is also included according to man va_start,
although stdio.h seems including it anyway.
The actual type of size_t is architecture dependent. Because the len
parameter is used as unsigned int in remoteSendStreamData(), change its
type to unsigned int.
* daemon/dispatch.[ch]: change size_t to unsigned int for
remoteSendStreamData()
Commit 47cab73499 changed the way how
qemud_client_message objects were reused. Before this commit
remoteDispatchClientRequest() reused the received message for normal responses
and to report non-fatal errors. If a fatal error occurred qemudWorker() frees
the message. After this commit non-fatal errors are reported by
remoteSerializeReplyError() using a new qemud_client_message object and the
original message leaks.
To fix this leak the original message has to be freed if
remoteSerializeReplyError() succeeds. If remoteSerializeReplyError()
fails the original message is freed in qemudWorker().
* daemon/dispatch.c: free qemud_client_message objects that will not be reused
and would leak otherwise, also free the allocated qemud_client_message object
in remoteSerializeError() if an error occurs
* daemon/dispatch.c: Set streamTX flag on outgoing data packets
* daemon/qemud.h: Add streamTX flag to track outgoing data
* daemon/qemud.c: Re-enable further TX when outgoing data packet
has been fully sent.
* daemon/stream.h, daemon/stream.c: Add method for enabling TX.
Support reading from streams and transmitting data out to client
Defines the extensions to the remote protocol for generic
data streams. Adds a bunch of helper code to the libvirtd
daemon for working with data streams.
* daemon/Makefile.am: Add stream.c/stream.h to build
* daemon/stream.c, qemud/stream.h: Generic helper functions for
creating new streams, associating streams with clients, finding
existing streams for a client and removing/deleting streams.
* src/remote/remote_protocol.x: Add a new 'REMOTE_STREAM' constant
for the 'enum remote_message_type' for encoding stream data
in wire messages. Add a new 'REMOTE_CONTINUE' constant to
'enum remote_message_status' to indicate further data stream
messsages are expected to follow. Document how the
remote_message_header is used to encode data streams
* src/remote/remote_protocol.h: Regenerate
* daemon/dispatch.c: Remove assumption that a error message
sent to client is always type=REMOTE_REPLY. It may now
also be type=REMOTE_STREAM. Add convenient method for
sending outgoing stream data packets. Log and ignore
non-filtered incoming stream packets. Add a method for
serializing a stream error message
* daemon/dispatch.h: Add API for serializing stream errors
and sending stream data packets
* daemon/qemud.h: Add struct qemud_client_stream for tracking
active data streams for clients. Tweak filter function
operation so that it accepts a client object too.
* daemon/qemud.c: Refactor code for free'ing message objects
which have been fully transmitted into separate method.
Release all active streams when client shuts down. Change
filter function to be responsible for queueing the message