The virLogSetFromEnv call was done too late in startup to
catch many log messages (eg from security driver initialization).
To assist debugging also explicitly log the security details
at startup
The driver->securityDriverName field may be NULL, if automatic
probing is used to determine security driver. This meant that
unless selinux was explicitly requested in lxc.conf, it was
not being sent to the libvirt_lxc process.
The driver->securityManager field is guaranteed non-NULL, since
there will always be the 'none' security driver present if
nothing else exists. So use that to set the driver name for
libvirt_lxc
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the libvirt_lxc process uses VIR_DOMAIN_XML_INACTIVE
when loading the XML for the container. This means it loses
any dynamic data such as the, just allocated, SELinux label.
Further there is an inconsistency in the libvirt LXC driver
whereby it saves the live config XML and then later overwrites
the file with the live status XML instead. Add a comment about
this for future reference.
* src/lxc/lxc_controller.c: Remove VIR_DOMAIN_XML_INACTIVE
when loading XML
* src/lxc/lxc_driver.c: Add comment about inconsistent
config file formats
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This works with newer qemu that doesn't allow escaping spaces.
It's backwards compatible as well.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
In fact, the 'tapfd' is always NULL, the function 'virNetDevTapCreate()' hasn't
assign 'fd' to 'tapfd', when the function 'virNetDevSetMAC()' is failed then
goto 'error' label, finally, the VIR_FORCE_CLOSE() will deref a NULL 'tapfd'.
* util/virnetdevtap.c (virNetDevTapCreateInBridgePort): fix a NULL pointer derefing.
* How to reproduce?
$ cat > /tmp/net.xml <<EOF
<network>
<name>test</name>
<forward mode='nat'/>
<bridge name='br1' stp='off' delay='1' />
<mac address='00:00:00:00:00:00'/>
<ip address='192.168.100.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.100.2' end='192.168.100.254' />
</dhcp>
</ip>
</network>
EOF
$ virsh net-define /tmp/net.xml
$ virsh net-start test
error: Failed to start network brTest
error: End of file while reading data: Input/output error
Signed-off-by: Alex Jia <ajia@redhat.com>
The previous storage patch missed an instance affected by the struct
member rename. It also had some botched whitespace detected by
'make check'.
* src/storage/storage_backend_iscsi.c
(virStorageBackendISCSIFindPoolSources): Adjust to new struct.
* src/conf/storage_conf.c (virStoragePoolSourceFormat): Fix
indentation.
It doesn't break out the "for" loop even if duplicate pool is
found, and thus the "matchpool" could be overriden as NULL again
if there is different pool afterwards.
To address the problem in libvirt-user list:
https://www.redhat.com/archives/libvirt-users/2012-April/msg00150.html
The current storage pools for NFS and iSCSI only require one host to
connect to. Future storage pools like RBD and Sheepdog will require
multiple hosts.
This patch allows multiple source hosts and rewrites the current
storage drivers.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When libvirtd is started, we create "libvirt/qemu" directories under
hugetlbfs mount point. Only the "qemu" subdirectory is chowned to qemu
user and "libvirt" remains owned by root. If umask was too restrictive
when libvirtd started, qemu user may lose access to "qemu"
subdirectory. Let's explicitly grant search permissions to "libvirt"
directory for all users.
Currently, qemu GA is not providing 'desc' field for errors like
we are used to from qemu monitor. Therefore, we fall back to this
general 'unknown error' string. However, GA is reporting 'class' which
is not perfect, but much more helpful than generic error string.
Thus we should fall back to class firstly and if even no class
is presented, then we can fall back to that generic string.
Before this patch:
virsh # dompmsuspend --target mem f16
error: Domain f16 could not be suspended
error: internal error unable to execute QEMU command
'guest-suspend-ram': unknown QEMU command error
After this patch:
virsh # dompmsuspend --target mem f16
error: Domain f16 could not be suspended
error: internal error unable to execute QEMU command
'guest-suspend-ram': The command has not been found
More bug extermination in the category of:
Error: CHECKED_RETURN:
/libvirt/src/conf/network_conf.c:595:
check_return: Calling function "virAsprintf" without checking return value (as is done elsewhere 515 out of 543 times).
/libvirt/src/qemu/qemu_process.c:2780:
unchecked_value: No check of the return value of "virAsprintf(&msg, "was paused (%s)", virDomainPausedReasonTypeToString(reason))".
/libvirt/tests/commandtest.c:809:
check_return: Calling function "setsid" without checking return value (as is done elsewhere 4 out of 5 times).
/libvirt/tests/commandtest.c:830:
unchecked_value: No check of the return value of "virTestGetDebug()".
/libvirt/tests/commandtest.c:831:
check_return: Calling function "virTestGetVerbose" without checking return value (as is done elsewhere 41 out of 42 times).
/libvirt/tests/commandtest.c:833:
check_return: Calling function "virInitialize" without checking return value (as is done elsewhere 18 out of 21 times).
One note about the error in commandtest line 809: setsid() seems to fail when running the test -- could be removed ?
With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race
with non-zero bandwidth: there is a window between the block_stream
and block_job_set_speed monitor commands where an unlimited amount
of data was let through, defeating the point of a throttle.
This race was first identified in commit a9d3495e, and libvirt was
able to reduce the size of the window for that race. In the meantime,
the qemu developers decided to fix things properly; per this message:
https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html
the fix will be in qemu 1.1, and changes block-job-set-speed to use
a different parameter name, as well as adding a new optional parameter
to block-stream, which eliminates the race altogether.
Since our documentation already mentioned that we can refuse a non-zero
bandwidth for some hypervisors, I think the best solution is to do
just that for RHEL 6.2 qemu, so that the race is obvious to the user
(anyone using stock RHEL 6.2 binaries won't have this patch, and anyone
building their own libvirt with this patch for RHEL can also rebuild
qemu to get the modern semantics, so it is no real loss in behavior).
Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
Rename the parameter to 'modern', since the naming difference now
covers more than just 'async' block-job-cancel. And while at it,
fix an unchecked integer overflow.
* src/qemu/qemu_monitor.h (enum BLOCK_JOB_CMD): Drop unused value,
rename enum to match conventions.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Reflect enum rename.
* src/qemu_qemu_monitor_json.h (qemuMonitorJSONBlockJob): Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Likewise,
and support difference between RHEL 6.2 and qemu 1.1 block pull.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Reject
bandwidth during pull with too-old qemu.
* src/libvirt.c (virDomainBlockPull, virDomainBlockRebase):
Document this.
Error: UNINIT:
/libvirt/src/lxc/lxc_driver.c:1412:
var_decl: Declaring variable "fd" without initializer.
/libvirt/src/lxc/lxc_driver.c:1460:
uninit_use_in_call: Using uninitialized value "fd" when calling "virFileClose".
/libvirt/src/util/virfile.c:50:
read_parm: Reading a parameter value.
Error: DEADCODE:
/libvirt/src/lxc/lxc_controller.c:960:
dead_error_condition: On this path, the condition "ret == 4" cannot be true.
/libvirt/src/lxc/lxc_controller.c:959:
at_most: After this line, the value of "ret" is at most -1.
/libvirt/src/lxc/lxc_controller.c:959:
new_values: Noticing condition "ret < 0".
/libvirt/src/lxc/lxc_controller.c:961:
dead_error_line: Execution cannot reach this statement "continue;".
Error: UNINIT:
/libvirt/src/lxc/lxc_controller.c:1104:
var_decl: Declaring variable "consoles" without initializer.
/libvirt/src/lxc/lxc_controller.c:1237:
uninit_use: Using uninitialized value "consoles".
QEMU binary is called several times when we probe different kinds of
capabilities the binary supports. This patch introduces new common
helper so that all probes use a consistent way of invoking qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=816662 pointed out
that attempting 'virsh blockpull' on an offline domain gave a
misleading error message about qemu lacking support for the
operation, even when qemu was specifically updated to support it.
The real problem is that we have several capabilities that are
only determined when starting a domain, and therefore are still
clear when first working with an inactive domain (namely, any
capability set by qemuMonitorJSONCheckCommands).
While this patch was able to hoist an existing check in one of the
three culprits, it had to add redundant checks in the other two
places (because you always have to check for an active domain after
obtaining a VM job lock, but the capability bits were being checked
prior to obtaining the job lock).
Someday it would be nice to patch libvirt to cache the set of
capabilities per qemu binary (as determined by inode and timestamp),
rather than re-probing the binary every time a domain is started,
and to teach the cache how to query the monitor during the one
time the probe is made rather than having to wait until a guest
is started; then, a capability probe would succeed even for offline
guests because it just refers to the cache, and the single check for
an active domain after grabbing the job lock would be sufficient.
But since that will involve a lot more coding, I'm happy to go
with this simpler solution for an immediate solution.
* src/qemu/qemu_driver.c (qemuDomainPMSuspendForDuration)
(qemuDomainSnapshotCreateXML, qemuDomainBlockJobImpl): Check for
offline state before checking an online-only cap.
Below patch fixes the following coverity findings
Error: OVERRUN_STATIC:
/libvirt/src/qemu/qemu_command.c:152:
overrun-buffer-val: Overrunning static array "net->mac" of size 6 bytes by passing it as an argument to a function which indexes it at byte position 15.
/libvirt/src/util/virnetdevmacvlan.c:948:
access_dbuff_const: Calling "virNetDevMacVLanVPortProfileRegisterCallback" indexes array "macaddress" at byte position 15.
/libvirt/src/util/virnetdevmacvlan.c:773:
access_dbuff_const: Calling "memcpy" indexes array "macaddress" with index "16UL" at byte position 15.
Error: OVERRUN_STATIC:
/libvirt/src/qemu/qemu_migration.c:2744:
overrun-buffer-val: Overrunning static array "net->mac" of size 6 bytes by passing it as an argument to a function which indexes it at byte position 15.
/libvirt/src/util/virnetdevmacvlan.c:773:
access_dbuff_const: Calling "memcpy" indexes array "macaddress" with index "16UL" at byte position 15.
Error: OVERRUN_STATIC:
/libvirt/src/qemu/qemu_driver.c:435:
overrun-buffer-val: Overrunning static array "net->mac" of size 6 bytes by passing it as an argument to a function which indexes it at byte position 15.
/libvirt/src/util/virnetdevmacvlan.c:1036:
access_dbuff_const: Calling "virNetDevMacVLanVPortProfileRegisterCallback" indexes array "macaddress" at byte position 15.
/libvirt/src/util/virnetdevmacvlan.c:773:
access_dbuff_const: Calling "memcpy" indexes array "macaddress" with index "16UL" at byte position 15.
This patch addresses the following coverity findings:
/libvirt/src/conf/nwfilter_params.c:390:
var_assigned: Assigning: "varValue" = null return value from "virHashLookup".
/libvirt/src/conf/nwfilter_params.c:392:
dereference: Dereferencing a pointer that might be null "varValue" when calling "virNWFilterVarValueGetNthValue".
/libvirt/src/conf/nwfilter_params.c:399:
dereference: Dereferencing a pointer that might be null "tmp" when calling "virNWFilterVarValueGetNthValue".
This patch addresses the following coverity findings:
/libvirt/src/conf/nwfilter_params.c:157:
deref_parm: Directly dereferencing parameter "val".
/libvirt/src/conf/nwfilter_params.c:473:
negative_returns: Using variable "iterIndex" as an index to array "res->iter".
/libvirt/src/nwfilter/nwfilter_ebiptables_driver.c:2891:
unchecked_value: No check of the return value of "virAsprintf(&protostr, "-d 01:80:c2:00:00:00 ")".
/libvirt/src/nwfilter/nwfilter_ebiptables_driver.c:2894:
unchecked_value: No check of the return value of "virAsprintf(&protostr, "-p 0x%04x ", l3_protocols[protoidx].attr)".
/libvirt/src/nwfilter/nwfilter_ebiptables_driver.c:3590:
var_deref_op: Dereferencing null variable "inst".
Some of the error messages in this function should have been
virReportSystemError (since they have an errno they want to log), but
were mistakenly written as netlinkError, which expects a libvirt error
code instead. The result was that when one of the errors was
encountered, "No error message provided" would be printed instead of
something meaningful (see
https://bugzilla.redhat.com/show_bug.cgi?id=816465 for an example).
Once qemu monitor reports migration has completed, we just closed our
end of the pipe and let migration tunnel die. This generated bogus error
in case we did so before the thread saw EOF on the pipe and migration
was aborted even though it was in fact successful.
With this patch we first wake up the tunnel thread and once it has read
all data from the pipe and finished the stream we close the
filedescriptor.
A small additional bonus of this patch is that real errors reported
inside qemuMigrationIOFunc are not overwritten by virStreamAbort any
more.
When QEMU reported failed or canceled migration, we correctly detected
it but didn't really consider it as an error condition and migration
protocol just went on. Luckily, some of the subsequent steps eventually
failed end we reported an (unrelated and mostly random) error back to
the caller.
Currently, non-blocking calls are either sent immediately or discarded
in case sending would block. This was implemented based on the
assumption that the non-blocking keepalive call is not needed as there
are other calls in the queue which would keep the connection alive.
However, if those calls are no-reply calls (such as those carrying
stream data), the remote party knows the connection is alive but since
we don't get any reply from it, we think the connection is dead.
This is most visible in tunnelled migration. If it happens to be longer
than keepalive timeout (30s by default), it may be unexpectedly aborted
because the connection is considered to be dead.
With this patch, we only discard non-blocking calls when the last call
with a thread is completed and thus there is no thread left to keep
sending the remaining non-blocking calls.
In some cases (spotted with broken connection during tunneled migration)
we were overwriting the original error with worse or even misleading
errors generated when we were cleaning up after failed migration.
The docs for virConnectSetKeepAlive() advertise that this function
should be able to disable keepalives on negative or zero interval time.
This patch removes the check that prohibited this and adds code to
disable keepalives on negative/zero interval.
* src/libvirt.c: virConnectSetKeepAlive(): - remove check for negative
values
* src/rpc/virnetclient.c
* src/rpc/virnetclient.h: - add virNetClientKeepAliveStop() to disable
keepalive messages
* src/remote/remote_driver.c: remoteSetKeepAlive(): -add ability to
disable keepalives
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an
arg "virtPortProfile", and was checking it for non-NULL before using
it. However, the prototype for
virNetDevMacVLanPortProfileRegisterCallback had marked that arg with
ATTRIBUTE_NONNULL(). Contrary to what one may think,
ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked
as such really is always non-null; the only effect to the code
generated by gcc, is that gcc *assumes* it is non-NULL; this results
in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is
enabled, and I am in the habit of doing local builds with optimization
off to ease debugging, so the bug didn't show up in my earlier local
testing).
In general, virPortProfile might always be NULL, so it shouldn't be
marked as ATTRIBUTE_NONNULL. One other function prototype made this
same error, so this patch fixes it as well.
Add 2 new functions to the virSocketAddr 'class':
- virSocketAddrEqual: tests whether two IP addresses and their ports are equal
- virSocketaddSetIPv4Addr: set a virSocketAddr given a 32 bit int
This patch improves the previously added virAtomicInt implementation
by using gcc-builtins if possible. The needed builtins are available
since GCC >= 4.1. At least the 4.0 docs don't mention them.
vboxArray is not castable to a COM item type. vboxArray is a
wrapper around the XPCOM and MSCOM specific array handling.
In this case we can avoid passing NULL as an empty array to
IMachine::Delete by passing a dummy IMedium* array with a single
NULL item.
In order to track a block copy job across libvirtd restarts, we
need to save internal XML that tracks the name of the file
holding the mirror. Displaying this name in dumpxml might also
be useful to the user, even if we don't yet have a way to (re-)
start a domain with mirroring enabled up front. This is done
with a new <mirror> sub-element to <disk>, as in:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/var/lib/libvirt/images/original.img'/>
<mirror file='/var/lib/libvirt/images/copy.img' format='qcow2' ready='yes'/>
...
</disk>
For now, the element is output-only, in live domains; it is ignored
when defining a domain or hot-plugging a disk (since those contexts
use VIR_DOMAIN_XML_INACTIVE in parsing). The 'ready' attribute appears
when libvirt knows that the job has changed from the initial pulling
phase over to the mirroring phase, although absence of the attribute
is not a sure indicator of the current phase. If we come up with a way
to make qemu start with mirroring enabled, we can relax the xml
restriction, and allow <mirror> (but not attribute 'ready') on input.
Testing active-only XML meant tweaking the testsuite slightly, but it
was worth it.
* docs/schemas/domaincommon.rng (diskspec): Add diskMirror.
* docs/formatdomain.html.in (elementsDisks): Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): New members.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean them.
(virDomainDiskDefParseXML): Parse them, but only internally.
(virDomainDiskDefFormat): Output them.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: New test file.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml: Likewise.
* tests/qemuxml2xmltest.c (testInfo): Alter members.
(testCompareXMLToXMLHelper): Allow more test control.
(mymain): Run new test.
This patch introduces a new block job, useful for live storage
migration using pre-copy streaming. Justification for including
this under virDomainBlockRebase rather than adding a new command
includes: 1) there are now two possible block jobs in qemu, with
virDomainBlockRebase starting either type of command, and
virDomainBlockJobInfo and virDomainBlockJobAbort working to end
either type; 2) reusing this command allows distros to backport
this feature to the libvirt 0.9.10 API without a .so bump.
Note that a future patch may add a more powerful interface named
virDomainBlockJobCopy, dedicated to just the block copy job, in
order to expose even more options (such as setting an arbitrary
format type for the destination without having to probe it from a
pre-existing destination file); adding a new command for targetting
just block copy would be similar to how we already have
virDomainBlockPull for targetting just the block pull job.
Using a live VM with the backing chain:
base <- snap1 <- snap2
as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY)
creates /path/to/copy with the same format as snap2, with no backing
file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)
creates /path/to/copy as a raw file, so entire chain is copied and
flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW)
creates /path/to/copy with the same format as snap2, but with snap1 as
a backing file, so only snap2 is copied.
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)
reuse existing /path/to/copy (must have empty contents, and format is
probed[*] from the metadata), and copy the full chain
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT|
VIR_DOMAIN_BLOCK_REBASE_SHALLOW)
reuse existing /path/to/copy (contents must be identical to snap1,
and format is probed[*] from the metadata), and copy only the contents
of snap2
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT|
VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)
reuse existing /path/to/copy (must be raw volume with contents
identical to snap1), and copy only the contents of snap2
Less useful combinations:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW|
VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)
fail if source is not raw, otherwise create /path/to/copy as raw and
the single file is copied (no chain involved)
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT|
VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)
makes little sense: the destination must be raw but have no contents,
meaning that it is an empty file, so there is nothing to reuse
The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY.
[*] Note that probing an existing file for its format can be a security
risk _if_ there is a possibility that the existing file is 'raw', in
which case the guest can manipulate the file to appear like some other
format. But, by virtue of the VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flag,
it is possible to avoid probing of raw files, at which point, probing
of any remaining file type is no longer a security risk.
It would be nice if we could issue an event when pivoting from phase 1
to phase 2, but qemu hasn't implemented that, and we would have to poll
in order to synthesize it ourselves. Meanwhile, qemu will give us a
distinct job info and completion event when we either cancel or pivot
to end the job. Pivoting is accomplished via the new:
virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
Management applications can pre-create the copy with a relative
backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT
flag to have qemu reuse the metadata; if the management application
also copies the backing files to a new location, this can be used
to perform live storage migration of an entire backing chain.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY):
New block job type.
(virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums.
* src/libvirt.c (virDomainBlockRebase): Document the new flags,
and implement general restrictions on flag combinations.
(virDomainBlockJobAbort): Document the new flag.
(virDomainSaveFlags, virDomainSnapshotCreateXML)
(virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document
restrictions.
* include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New
error.
* src/util/virterror.c (virErrorMsg): Define it.
This patch modifies the CPU comparrison function to report the
incompatibilities in more detail to ease identification of problems.
* src/cpu/cpu.h:
cpuGuestData(): Add argument to return detailed error message.
* src/cpu/cpu.c:
cpuGuestData(): Add passthrough for error argument.
* src/cpu/cpu_x86.c
x86FeatureNames(): Add function to convert a CPU definition to flag
names.
x86Compute(): - Add error message parameter
- Add macro for reporting detailed error messages.
- Improve error reporting.
- Simplify calculation of forbidden flags.
x86DataIteratorInit():
x86cpuidMatchAny(): Remove functions that are no longer needed.
* src/qemu/qemu_command.c:
qemuBuildCpuArgStr(): - Modify for new function prototype
- Add detailed error reports
- Change error code on incompatible processors
to VIR_ERR_CONFIG_UNSUPPORTED instead of
internal error
* tests/cputest.c:
cpuTestGuestData(): Modify for new function prototype
virThreadSelf tries to access the virThreadPtr stored in TLS for the
current thread via TlsGetValue. When virThreadSelf is called on a thread
that was not created via virThreadCreate (e.g. the main thread) then
TlsGetValue returns NULL as TlsAlloc initializes TLS slots to NULL.
virThreadSelf can be called on the main thread via this call chain from
virsh
vshDeinit
virEventAddTimeout
virEventPollAddTimeout
virEventPollInterruptLocked
virThreadIsSelf
triggering a segfault as virThreadSelf unconditionally dereferences the
return value of TlsGetValue.
Fix this by making virThreadSelf check the TLS slot value for NULL and
setting the given virThreadPtr accordingly.
Reported by Marcel Müller.
POSIX says that sa_sigaction is only safe to use if sa_flags
includes SA_SIGINFO; conversely, sa_handler is only safe to
use when flags excludes that bit. Gnulib doesn't guarantee
an implementation of SA_SIGINFO, but does guarantee that
if SA_SIGINFO is undefined, we can safely define it to 0 as
long as we don't dereference the 2nd or 3rd argument of
any handler otherwise registered via sa_sigaction.
Based on a report by Wen Congyang.
* src/rpc/virnetserver.c (SA_SIGINFO): Stub for mingw.
(virNetServerSignalHandler): Avoid bogus dereference.
(virNetServerFatalSignal, virNetServerNew): Set flags properly.
(virNetServerAddSignalHandler): Drop unneeded #ifdef.
I almost copied-and-pasted some redundant () into my new code,
and figured a general cleanup prereq patch would be better instead.
No semantic change.
* src/conf/domain_conf.c (virDomainLeaseDefParseXML)
(virDomainDiskDefParseXML, virDomainFSDefParseXML)
(virDomainActualNetDefParseXML, virDomainNetDefParseXML)
(virDomainGraphicsDefParseXML, virDomainVideoAccelDefParseXML)
(virDomainVideoDefParseXML, virDomainHostdevFind)
(virDomainControllerInsertPreAlloced, virDomainDefParseXML)
(virDomainObjParseXML, virDomainCpuSetFormat)
(virDomainCpuSetParse, virDomainDiskDefFormat)
(virDomainActualNetDefFormat, virDomainNetDefFormat)
(virDomainTimerDefFormat, virDomainGraphicsListenDefFormat)
(virDomainDefFormatInternal, virDomainNetGetActualHostdev)
(virDomainNetGetActualBandwidth, virDomainGraphicsGetListen):
Reduce extra ().
Ensure we don't introduce any more lousy integer parsing in new
code, while avoiding a scrub-down of existing legacy code.
Note that we also need to enable sc_prohibit_atoi_atof (see cfg.mk
local-checks-to-skip) before we are bulletproof, but that also
entails scrubbing I'm not ready to do at the moment.
* src/util/util.c (virStrToLong_i, virStrToLong_ui)
(virStrToLong_l, virStrToLong_ul, virStrToLong_ll)
(virStrToLong_ull, virStrToDouble): Mark exemptions.
* src/util/virmacaddr.c (virMacAddrParse): Likewise.
* cfg.mk (sc_prohibit_strtol): New syntax check.
(exclude_file_name_regexp--sc_prohibit_strtol): Ignore files that
I'm not willing to fix yet.
(local-checks-to-skip): Re-enable sc_prohibit_atoi_atof.
https://bugzilla.redhat.com/show_bug.cgi?id=617711 reported that
even with my recent patched to allow <memory unit='G'>1</memory>,
people can still get away with trying <memory>1G</memory> and
silently get <memory unit='KiB'>1</memory> instead. While
virt-xml-validate catches the error, our C parser did not.
Not to mention that it's always fun to fix bugs while reducing
lines of code. :)
* src/conf/domain_conf.c (virDomainParseMemory): Check for parse error.
(virDomainDefParseXML): Avoid strtoll.
* src/conf/storage_conf.c (virStorageDefParsePerms): Likewise.
* src/util/xml.c (virXPathLongBase, virXPathULongBase)
(virXPathULongLong, virXPathLongLong): Likewise.
Commit 78345c68 makes at least gcc 4.1.2 on RHEL 5 complain:
cc1: warnings being treated as errors
In file included from vbox/vbox_V4_0.c:13:
vbox/vbox_tmpl.c: In function 'vboxDomainUndefineFlags':
vbox/vbox_tmpl.c:5298: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
* src/vbox/vbox_tmpl.c (vboxDomainUndefineFlags): Use union to
avoid compiler warning.
DBus connection. The HAL device code further requires that
the DBus connection is integrated with the event loop and
provides such glue logic itself.
The forthcoming FirewallD integration also requires a
dbus connection with event loop integration. Thus we need
to pull the current event loop glue out of the HAL driver.
Thus we create src/util/virdbus.{c,h} files. This contains
just one method virDBusGetSystemBus() which obtains a handle
to the single shared system bus instance, with event glue
automagically setup.
Fix the support for trusted DHCP server in the ebtables code's
hard-coded function applying DHCP only filtering rules:
Rather than using a char * use the more flexible
virNWFilterVarValuePtr that contains the trusted DHCP server(s)
IP address. Process all entries.
Since all callers so far provided NULL as parameter, no changes
are necessary in any other code.
Currently upon a migration a callback is created when a 802.1qbg link
is set to PREASSOCIATE, this should not happen because this is a no-op
on most switches, and does not lead to an ASSOCIATE state. This patch
only creates callbacks when CREATE or RESTORE is requested. Migration
and libvirtd restart scenarios are already handled elsewhere.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
The below patch fixes the following memory leak.
==20624== 24 bytes in 2 blocks are definitely lost in loss record 532 of 1,867
==20624== at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==20624== by 0x38EC27FC01: strdup (strdup.c:43)
==20624== by 0x4EB6BA3: virDomainChrSourceDefCopy (domain_conf.c:1122)
==20624== by 0x495D76: qemuProcessFindCharDevicePTYs (qemu_process.c:1497)
==20624== by 0x498321: qemuProcessWaitForMonitor (qemu_process.c:1258)
==20624== by 0x49B5F9: qemuProcessStart (qemu_process.c:3652)
==20624== by 0x468B5C: qemuDomainObjStart (qemu_driver.c:4753)
==20624== by 0x469171: qemuDomainStartWithFlags (qemu_driver.c:4810)
==20624== by 0x4F21735: virDomainCreate (libvirt.c:8153)
==20624== by 0x4302BF: remoteDispatchDomainCreateHelper (remote_dispatch.h:852)
==20624== by 0x4F72C14: virNetServerProgramDispatch (virnetserverprogram.c:416)
==20624== by 0x4F6D690: virNetServerHandleJob (virnetserver.c:164)
==20624== by 0x4E8F43D: virThreadPoolWorker (threadpool.c:144)
==20624== by 0x4E8EAB5: virThreadHelper (threads-pthread.c:161)
==20624== by 0x38EC606CCA: start_thread (pthread_create.c:301)
==20624== by 0x38EC2E0C2C: clone (clone.S:115)
Most of our errors complaining about an inability to support a
particular action due to qemu limitations used CONFIG_UNSUPPORTED,
but we had a few outliers. Reported by Jiri Denemark.
* src/qemu/qemu_command.c (qemuBuildDriveDevStr): Prefer
CONFIG_UNSUPPORTED.
* src/qemu/qemu_driver.c (qemuDomainReboot)
(qemuDomainBlockJobImpl): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainAttachPciControllerDevice):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorTransaction)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup): Likewise.
So that a domain xml which doesn't have "placement" specified, but
"cpuset" is specified, could be parsed. And in this case, the
"placement" mode will be set as "static".
The objects (domain, pool, network, etc) for testing are defined/
started each time when opening a connect to test driver, and thus
the UUID for the objects will be generated each time, with different
values. e.g.
% for i in {1..3}; do ./tools/virsh --connect \
test:///default dumpxml test | grep uuid; done
<uuid>a1b6ee1f-97de-f0ee-617a-0cdb74947df5</uuid>
<uuid>ee68d7d2-3eb9-593e-2769-797ce1f4c4aa</uuid>
<uuid>fecb1d3a-918a-8412-e534-76192cf32b18</uuid>
It's the potential bug which can cause operations like below to fail:
$ virsh -c test:///default dumpxml test > test.xml
[ Some modificatons, though it's not supported, but it should work ]
$ virsh -c test:///default define test.xml
This patch set fixed UUID for objects which support it. (domain,
pool, network).
A "ide-drive" device can be either a hard disk or a CD-ROM,
if there is ",media=cdrom" specified for the backend, it's
a CD-ROM, otherwise it's a hard disk.
Upstream qemu splitted "ide-drive" into "ide-hd" and "ide-cd"
since commit 1f56e32, and ",media=cdrom" is not required for
ide-cd anymore. "ide-drive" is still supported for backwards
compatibility, but no doubt we should go foward.
A "scsi-disk" device can be either a hard disk or a CD-ROM,
if there is ",media=cdrom" specified for the backend, it's
a CD-ROM, otherwise it's a hard disk.
But upstream qemu splitted "scsi-disk" into "scsi-hd" and
"scsi-cd" since commit b443ae, and ",media=cdrom" is not
required for scsi-cd anymore. "scsi-disk" is still supported
for backwards compatibility, but no doubt we should go
foward.
If console[0] is an alias for serial[0], do not enforce the former to
have a PTY source type. This breaks serial consoles on stdio and makes
no sense.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
When using the xm/xend stack to manage instances there is a bug
that causes the emulated interfaces to be unusable when the vif
config contains type=ioemu.
The current code already has a special quirk to not use this
keyword if no specific model is given for the emulated NIC
(defaulting to rtl8139).
Essentially it works because regardless of the type argument,i
the Xen stack always creates emulated and paravirt interfaces and
lets the guest decide which one to use. So neither xl nor xm stack
actually require the type keyword for emulated NICs.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Currently, we have 3 boolean arguments we have to pass
to qemuProcessStart(). As libvirt grows it is harder and harder
to remember them and their position. Therefore we should
switch to flags instead.
lvcreate want's the parent pool's name, not the pool path
lvchange and lvremove want lv specified as $vgname/$lvname
This largely worked before because these commands strip off a
starting /dev. But https://bugzilla.redhat.com/show_bug.cgi?id=714986
is from a user using a 'nested VG' that was having problems.
I couldn't find any info on nested LVM and the reporter never responded,
but I reproduced with XML that specified a valid source name, and
set target path to a symlink.
As explained in previous patch, numad will balance the affinity
dynamically, so reflecting the cpuset from numad at the first
time doesn't make much case, and may just could cause confusion.
Instead of returning a CPUs list, numad returns NUMA node
list instead, this patch is to convert the node list to
cpumap before affinity setting. Otherwise, the domain
processes will be pinned only to CPU[$numa_cell_num],
which will cause significiant performance losses.
Also because numad will balance the affinity dynamically,
reflecting the cpuset from numad back doesn't make much
sense then, and it may just could produce confusion for
the users. Thus the better way is not to reflect it back
to XML. And in this case, it's better to ignore the cpuset
when parsing XML.
The codes to update the cpuset is removed in this patch
incidentally, and there will be a follow up patch to ignore
the manually specified "cpuset" if "placement" is "auto",
and document will be updated too.
The linux-2.6.32 kernel header does not yet define IFLA_VF_MAX and others,
which breaks compiling a new libvirt on old systems like Debian Squeeze.
(I also have to add --without-macvtap --disable-werror --without-virtualport to
./configure to get it to compile.)
Signed-off-by: Philipp Hahn <hahn@univention.de>
Although it should be harmless to do:
disk = disk = def->disks[i]
some not-so-wise compilers may fool around.
Besides, such assignment is useless here.
On newer xend (v3.x and after) there is no state and domid reported
for inactive domains. When initially creating connections this is
handled in various places by assigning domain->id = -1.
But once an instance has been running, the id is set to the current
domain id. And it does not change when the instance is shut down.
So when querying the domain info, the hypervisor driver, which gets
asked first will indicate it cannot find information, then the
xend driver is asked and will set the status to NOSTATE because it
checks for the -1 domain id.
Checking domain/status for 0 seems to be more reliable for that.
One note: I am not sure whether the domain->id also should get set
back to -1 whenever any sub-driver thinks the instance is no longer
running.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=746007
BugLink: http://bugs.launchpad.net/bugs/929626
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This patch adds a netlink callback when migrating a VEPA enabled
virtual machine. It fixes a Bug where a VM would not request a port
association when it was cleared by lldpad.
This patch requires the latest git version of lldpad to work.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
If dynamic_ownership is off and we are creating a file on NFS
we force chown. This will fail as chown/chmod are not supported
on NFS. However, with no dynamic_ownership we are not required
to do any chown.
In my testing, I was able to provoke an odd block pull failure:
$ virsh blockpull dom vda --bandwidth 10000
error: Requested operation is not valid: No active operation on device: drive-virtio-disk0
merely by using gdb to artifically wait to do the block job set speed
until after the pull had already finished. But in reality, that should
be a success, since the pull finished before we had a chance to set
speed. Furthermore, using a double job lock is not only annoying, but
a bug in itself - if you do parallel virDomainBlockRebase, and hit
the race window just right, the first call grabs the VM job to start
a fast block job, then the second call grabs the VM job to start
a long-running job with unspecified speed, then the first call finally
regrabs the VM job and sets the speed, which ends up running the
second job under the speed from the first call. By consolidating
things into a single job, we avoid opening that race, as well as reduce
the time between starting the job and changing the speed, for less
likelihood of the speed change happening after block job completion
in the first place.
* src/qemu/qemu_monitor.h (BLOCK_JOB_CMD): Add new mode.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Move secondary
job call...
(qemuDomainBlockJobImpl): ...here, for fewer locks.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Change
return value on new internal mode.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally
poll using qemu's "query-block-jobs" API and will not return until the
operation has been completed. API users are advised that this operation
is unbounded and further interaction with the domain during this period
may block. Future patches may refactor things to allow other queries in
parallel with this polling. For older qemu, we synthesize the cancellation
event, since qemu won't generate it.
The choice of polling duration copies from the code in qemu_migration.c.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Probably in the noise, but this will let us scale more efficiently
as we learn to recognize even more qemu events.
* src/qemu/qemu_monitor_json.c (eventHandlers): Sort.
(qemuMonitorEventCompare): New helper function.
(qemuMonitorJSONIOProcessEvent): Optimize event lookup.
Block job cancellation can take a while. Now that upstream qemu 1.1
has asynchronous block cancellation, we want to expose that to the user.
Therefore, the following updates are made to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELED is managed by
libvirt. Regardless of the flags used with virDomainBlockJobAbort, this
event will be raised: 1. when using synchronous block_job_cancel (the
event will be synthesized by libvirt), and 2. whenever it is received
from qemu (via asynchronous block-job-cancel). Note that the event
may be detected by libvirt even before the virDomainBlockJobAbort
completes (always true when it is synthesized, but also possible if
cancellation was fast).
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the
virDomainBlockJobAbort API. When enabled, this function will allow
(but not require) asynchronous operation (ie, it returns as soon as
possible, which might be before the job has actually been canceled).
When the API is used in this mode, it is the responsibility of the
caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELED event or poll via
the virDomainGetBlockJobInfo API to check the cancellation status.
This patch also exposes the new flag through virsh, and makes virsh
slightly easier to use (--async implies --abort, and lack of any options
implies --info), although it leaves the qemu implementation for later
patches.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
RHEL 6.2 was released with an early version of block jobs, which only
worked on the qed file format, where the commands were spelled with
underscore (contrary to QMP style), and where 'block_job_cancel' was
synchronous and did not trigger an event.
The upcoming qemu 1.1 release has fixed these short-comings [1][2]:
the commands now work on multiple file types, are spelled with dash,
and 'block-job-cancel' is asynchronous and emits an event upon conclusion.
[1]qemu commit 370521a1d6f5537ea7271c119f3fbb7b0fa57063
[2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01248.html
This patch recognizes the new spellings, and fixes virDomainBlockRebase
to give a graceful error when talking to a too-old qemu on a partial
rebase attempt. Fixes for the new semantics will come later. This
patch also removes a bogus ATTRIBUTE_NONNULL mistakenly added in
commit 10ec36e2.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCKJOB_SYNC)
(QEMU_CAPS_BLOCKJOB_ASYNC): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONBlockJob): Manage both command names.
(qemuMonitorJSONDiskSnapshot): Minor formatting fix.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Alter signature.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob): Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Pass through
capability bit.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Update callers.
The new safe console handling introduced a possibility to deadlock the
qemu driver when a new console connection forcibly disconnects a
previous console stream that belongs to an already closed connection.
The virStreamFree function calls subsequently a the virReleaseConnect
function that tries to lock the driver while discarding the connection,
but the driver was already locked in qemuDomainOpenConsole.
Backtrace of the deadlocked thread:
0 0x00007f66e5aa7f14 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007f66e5aa3411 in _L_lock_500 () from /lib64/libpthread.so.0
2 0x00007f66e5aa322a in pthread_mutex_lock () from/lib64/libpthread.so.0
3 0x0000000000462bbd in qemudClose ()
4 0x00007f66e6e178eb in virReleaseConnect () from/usr/lib64/libvirt.so.0
5 0x00007f66e6e19c8c in virUnrefStream () from /usr/lib64/libvirt.so.0
6 0x00007f66e6e3d1de in virStreamFree () from /usr/lib64/libvirt.so.0
7 0x00007f66e6e09a5d in virConsoleHashEntryFree () from/usr/lib64/libvirt.so.0
8 0x00007f66e6db7282 in virHashRemoveEntry () from/usr/lib64/libvirt.so.0
9 0x00007f66e6e09c4e in virConsoleOpen () from /usr/lib64/libvirt.so.0
10 0x00000000004526e9 in qemuDomainOpenConsole ()
11 0x00007f66e6e421f1 in virDomainOpenConsole () from/usr/lib64/libvirt.so.0
12 0x00000000004361e4 in remoteDispatchDomainOpenConsoleHelper ()
13 0x00007f66e6e80375 in virNetServerProgramDispatch () from/usr/lib64/libvirt.so.0
14 0x00007f66e6e7ae11 in virNetServerHandleJob () from/usr/lib64/libvirt.so.0
15 0x00007f66e6da897d in virThreadPoolWorker () from/usr/lib64/libvirt.so.0
16 0x00007f66e6da7ff6 in virThreadHelper () from/usr/lib64/libvirt.so.0
17 0x00007f66e5aa0c5c in start_thread () from /lib64/libpthread.so.0
18 0x00007f66e57e7fcd in clone () from /lib64/libc.so.6
* src/qemu/qemu_driver.c: qemuDomainOpenConsole()
-- unlock the qemu driver right after acquiring the domain
object
In case an API fails with "cannot acquire state change lock", searching
for the API that possibly forgot to end its job is not always easy.
Let's keep track of the job owner and print it out for easier
identification.
As reported by Daniel Berrangé, we have a huge performance regression
for virDomainGetInfo() due to the change which makes virDomainEndJob()
save the XML status file every time it is called. Previous to that
change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.
We made the change to be able to recover from libvirtd restart in the
middle of a job. However, only destroy and async jobs are taken care of.
Thus it makes more sense to only save domain state XML when these jobs
are started/stopped.
I noticed these compiler warnings when building for the s390 architecture.
* src/node_device/node_device_udev.c (udevDeviceMonitorStartup):
Mark unused variable.
* src/nodeinfo.c (linuxNodeInfoCPUPopulate): Avoid unused variable.
* src/qemu/qemu_command.c: Wire up -bios with <loader>
* tests/qemuxml2argvdata/qemuxml2argv-bios.args,
tests/qemuxml2argvdata/qemuxml2argv-bios.xml: Expand
existing BIOS test case to cover <loader>
Leak introduced in commit 0436d32. If we allocate an actions array,
but fail early enough to never consume it with the qemu monitor
transaction call, we leaked memory.
But our semantics of making the transaction command free the caller's
memory is awkward; avoiding the memory leak requires making every
intermediate function in the call chain check for error. It is much
easier to fix things so that the function that allocates also frees,
while the call chain leaves the caller's data intact. To do that,
I had to hack our JSON data structure to make it easy to protect a
portion of an arbitrary JSON tree from being freed.
* src/util/json.h (virJSONType): Name the enum.
(_virJSONValue): New field.
* src/util/json.c (virJSONValueFree): Use it to protect a portion
of an array.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONTransaction): Avoid
freeing caller's data.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive):
Free actions array on failure.
We can tell qemuDomainSnapshotFSThaw if we want it to report errors or
not. However, if we don't want to and an error has been already set by
previous qemuReportError() we must keep copy of that error not just a
pointer to it. Otherwise, it get overwritten if FSThaw reports an error.
Detected by valgrind. Leaks are introduced in commit b22eaa7.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): fix memory leaks.
How to reproduce?
% make && make -C tests check TESTS=qemuxml2argvtest
% cd tests && valgrind -v --leak-check=full ./qemuxml2argvtest
actual result:
==2143== 12 bytes in 2 blocks are definitely lost in loss record 74 of 179
==2143== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2143== by 0x39D90A67DD: xmlStrndup (xmlstring.c:45)
==2143== by 0x4F5EC0: virDomainDiskDefParseXML (domain_conf.c:3438)
==2143== by 0x502F00: virDomainDefParseXML (domain_conf.c:8304)
==2143== by 0x505FE3: virDomainDefParseNode (domain_conf.c:9080)
==2143== by 0x5069AE: virDomainDefParse (domain_conf.c:9030)
==2143== by 0x41CBF4: testCompareXMLToArgvHelper (qemuxml2argvtest.c:105)
==2143== by 0x41E5DD: virtTestRun (testutils.c:145)
==2143== by 0x416FA3: mymain (qemuxml2argvtest.c:399)
==2143== by 0x41DCB7: virtTestMain (testutils.c:700)
==2143== by 0x39CF01ECDC: (below main) (libc-start.c:226)
Signed-off-by: Alex Jia <ajia@redhat.com>
If the daemon is restarted it will lose list of active
USB devices assigned to active domains. Therefore we need
to rebuild this list on qemuProcessReconnect().
To prevent assigning one USB device to two domains,
we keep a list of assigned USB devices. On domain
startup - qemuProcessStart() - we insert devices
used by domain into the list but remove them only
on detach-device. Devices are, however, released
on qemuProcessStop() as well.
Originally, qemuDomainCheckEjectableMedia was entering monitor with qemu
driver lock. Commit 2067e31bf9, which I
made to fix that, revealed another issue we had (but didn't notice it
since the driver was locked): we didn't set nested job when
qemuDomainCheckEjectableMedia is called during migration. Thus the
original fix I made was wrong.
XenD-3.1 introduced managed domains. HV-domains have rtc_timeoffset
(hgd24f37b31030 from 2007-04-03), which tracks the offset between the
hypervisors clock and the domains RTC, and is persisted by XenD.
In combination with localtime=1 this had a bug until XenD-3.4
(hg5d701be7c37b from 2009-04-01) (I'm not 100% sure how that bug
manifests, but at least for me in TZ=Europe/Berlin I see the previous
offset relative to utc being applied to localtime again, which manifests
in an extra hour being added)
XenD implements the following variants for clock/@offset:
- PV domains don't have a RTC → 'localtime' | 'utc'
- <3.1: no managed domains → 'localtime' | 'utc'
- ≥3.1: the offset is tracked for HV → 'variable'
due to the localtime=1 bug → 'localtime' | 'utc'
- ≥3.4: the offset is tracked for HV → 'variable'
Current libvirtd still thinks XenD only implements <clock offset='utc'/>
and <clock offset='localtime'/>, which is wrong, since the semantic of
'utc' and 'localtime' specifies, that the offset will be reset on
domain-restart, while with 'variable' the offset is kept. (keeping the
offset over "virsh edit" is important, since otherwise the clock might
jump, which confuses certain guest OSs)
xendConfigVersion was last incremented to 4 by the xen-folks for
xen-3.1.0. I know of no way to reliably detect the version of XenD
(user space tools), which may be different from the version of the
hypervisor (kernel) version! Because of this only the change from
'utc'/'localtime' to 'variable' in XenD-3.1 is handled, not the buggy
behaviour of XenD-3.1 until XenD-3.4.
For backward compatibility with previous versions of libvirt Xen-HV
still accepts 'utc' and 'localtime', but they are returned as 'variable'
on the next read-back from Xend to libvirt, since this is what XenD
implements: The RTC is NOT reset back to the specified time on next
restart, but the previous offset is kept.
This behaviour can be turned off by adding the additional attribute
adjustment='reset', in which case libvirt will report an error instead
of doing the conversion. The attribute can also be used as a shortcut to
offset='variable' with basis='...'.
With these changes, it is also necessary to adjust the xen tests:
"localtime = 0" is always inserted, because otherwise on updates the
value is not changed within XenD.
adjustment='reset' is inserted for all cases, since they're all <
XEND_CONFIG_VERSION_3_1_0, only 3.1 introduced persistent
rtc_timeoffset.
Some statements change their order because code was moved around.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
https://bugzilla.redhat.com/show_bug.cgi?id=808979
The leak is really in virProcessInfoGetAffinity, as shown in the
valgrind output given in the above bug report - it calls CPU_ALLOC(),
but then fails to call CPU_FREE().
This leak has existed in every version of libvirt since 0.7.5.
Commit 1b1402b introduced a regression. Since older libvirt versions
would silently round memory up (until the previous patch), but populated
current memory based on querying the guest, it was possible to have
dumpxml show cur > max by the amount of the rounding. For example, if
a user requested 1048570 KiB memory (just shy of 1GiB), the qemu
driver would actually run with 1048576 KiB, and libvirt 0.9.10 would
output a current that was 6KiB larger than the maximum. Situations
where this could have an impact include, but are not limited to,
migration from old to new libvirt, managedsave in old libvirt and
start in new libvirt, snapshot creation in old libvirt and revert in
new libvirt - without this patch, the new libvirt would reject the
VM because of the rounding discrepancy.
Fix things by adding a fuzz factor, and silently clamp current down to
maximum in that case, rather than failing to reparse XML for an existing
VM. From a practical standpoint, this has no user impact: 'virsh
dumpxml' will continue to query the running guest rather than rely on
the incoming xml, which will see the currect current value, and even if
clamping down occurs during parsing, it will be by at most the fuzz
factor of a megabyte alignment, and rounded back up when passed back to
the hypervisor.
Meanwhile, we continue to reject cur > max if the difference is beyond
the fuzz factor of nearest megabyte. But this is not a real change in
behavior, since with 0.9.10, even though the parser allowed it, later
in the processing stream we would reject it at the qemu layer; so
rejecting it in the parser just moves error detection to a nicer place.
* src/conf/domain_conf.c (virDomainDefParseXML): Don't reject
existing XML.
Based on a report by Zhou Peng.
If we round up a user's memory request, we should update the XML
to reflect the actual value in use by the VM, rather than giving
an artificially small value back to the user.
* src/qemu/qemu_command.c (qemuBuildNumaArgStr)
(qemuBuildCommandLine): Reflect rounding back to XML.
This patch was created to resolve this upstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=784767
and is at least a partial solution to this RHEL RFE:
https://bugzilla.redhat.com/show_bug.cgi?id=805071
Previously the only attribute of a network device that could be
modified by virUpdateDeviceFlags() ("virsh update-device") was the
link state; attempts to change any other attribute would log an error
and fail.
This patch adds recognition of a change in bridge device name, and
supports reconnecting the guest's interface to the new device.
Standard audit logs for detaching and attaching a network device are
also generated. Although the current auditing function doesn't log the
bridge being attached to, this will later be changed in a separate
patch.
Regression introduced when we changed types in commit 3e2c3d8f6.
We've done this sort of cleanup before (see commit c685993d7).
* src/conf/storage_conf.c (virStoragePoolDefFormat)
(virStorageVolTargetDefFormat): Cast gid_t and uid_t.
We are so close to a release that we don't want to pull in a
gnulib submodule update and risk regressions, since there has
been a lot of other gnulib churn upstream. However, there are
a couple of gnulib issues that are worth fixing in isolation,
by applying local patches to gnulib.
There was an upstream gnulib bug in maint.mk that rendered most
of our syntax checks ineffective (and fixing it flushed out a
minor bug in our code):
https://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00194.html
There is still an upstream bug where gnulib uses the wrong type
for ssize_t on mingw; we need the fix now even though it has not
yet been accepted into gnulib:
https://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00188.html
* gnulib/local/top/maint.mk.diff: Pick up upstream gnulib
maint.mk.
* gnulib/local/m4/ssize_t.m4.diff: Work around gnulib bug.
* src/libvirt.c: Remove unused header.
* cfg.mk
(exclude_file_name_regexp--sc_prohibit_empty_lines_at_EOF): Exempt
gnulib local files.
qemuBuildHostNetStr had a switch-within-a-switch where both were
looking at the same variable. This was apparently to take advantage of
code common to three different cases (while also taking care of some
code that was different). However, there were only 2 lines common to
all, one of those can be eliminated by merging it into the
virAsprintfs that are in each case. On top of that, all the extra
empty cases cause Coverity complaints (because they are unreachable),
but absence of the empty cases causes a compile error due to
"enumeration value not handled in switch".
The solution is to just make each toplevel case independent, folding
in the common code to each.
commit b0e2bb33 set a default value for the SPICE agent channel by
inserting it during parsing of the channel XML. That method of setting
a default is problematic because it makes a format/parse roundtrip
unclean, and experience with setting other values as a side effect of
parsing has led to headaches (e.g. automatically setting a MAC address
in the parser when one isn't specified in the input XML).
This patch does not revert commit b0e2bb33 (it will be reverted in a
separate patch) but adds the alternate implementation of simply
inserting the default value in the appropriate place on the qemu
commandline when no value is provided.
If we issue guest command and GA is not running, the issuing thread
will block endlessly. We can check for GA presence by issuing
guest-sync with unique ID (timestamp). We don't want to issue real
command as even if GA is not running, once it is started, it process
all commands written to GA socket.
With latest gnulib we are checking even the lowest level functions
whether they check flags. Moreover, we are shadowing the real error
on system without TUNSETIFF support.
The code is splattered with a mix of
sizeof foo
sizeof (foo)
sizeof(foo)
Standardize on sizeof(foo) and add a syntax check rule to
enforce it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A handful of places used %zd for format specifiers even
though the args was size_t, not ssize_t.
* src/remote/remote_driver.c, src/util/xml.c: s/%zd/%zu/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* src/conf/domain_conf.c (virDomainChannelDefCheckABIStability): avoid
crashing libvirtd due to derefing a NULL pointer.
For details, please see bug:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=808371
Signed-off-by: Alex Jia <ajia@redhat.com>
When qemu cannot start, we may call qemuProcessStop() twice.
We have check whether the vm is running at the beginning of
qemuProcessStop() to avoid libvirt deadlock. We call
qemuProcessStop() with driver and vm locked. It seems that
we can avoid libvirt deadlock. But unfortunately we may
unlock driver and vm in the function qemuProcessKill() while
vm->def->id is not -1. So qemuProcessStop() will be run twice,
and monitor will be freed unexpectedly. So we should set
vm->def->id to -1 at the beginning of qemuProcessStop().
An upstream gnulib bug[1] meant that some of our syntax checks
weren't being run. Fix up our offenders before we upgrade to
a newer gnulib.
[1] https://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00194.html
* src/util/virnetdevtap.c (virNetDevTapCreate): Use flags.
* tests/lxcxml2xmltest.c (mymain): Strip useless ().
In the current V3 migration protocol, Libvirt does not
check the result of the function
qemuMigrationVPAssociatePortProfiles
This means that it is possible for a migration to complete
successfully even when the VM loses network connectivity on
the destination host.
With this change libvirt aborts the migration
(during the "finish" step) when the above function fails, that
is to say when at least one of the port profile associations fails.
Signed-off by: Christian Benvenuti <benve@cisco.com>
libvirt documentation for channels with type 'spicevmc' says that the
'target' child node has:
"an optional attribute name controls how the guest will have access
to the channel, and defaults to name='com.redhat.spice.0'."
However, this default value is never set in libvirt code base,
there's only a check in qemu_command.c to error out if the name
attribute doesn't have the expected value (if it's set).
This commit sets a default target name for spicevmc channels during
the domain configuration parsing so that the code agrees with the
documentation.
Commit d42a2ff caused a regression in creating a disk-only snapshot
of a qcow2 disk; by passing the wrong variable to the monitor call,
libvirt ended up creating JSON that looked like "format":null instead
of the intended "format":"qcow2".
To make it easier to diagnose this in the future, make JSON creation
error out if "s:arg" is paired with NULL (it is still possible to
use "n:arg" in the rare cases where qemu will accept a null).
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Pass correct value.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommandRaw):
Improve error message.
Pass argv to the init binary of LXC, using a new <initarg> element.
* docs/formatdomain.html.in: Document <os> usage for containers
* docs/schemas/domaincommon.rng: Add <initarg> element
* src/conf/domain_conf.c, src/conf/domain_conf.h: parsing and
formatting of <initarg>
* src/lxc/lxc_container.c: Setup LXC argv
* tests/Makefile.am, tests/lxcxml2xmldata/lxc-systemd.xml,
tests/lxcxml2xmltest.c, tests/testutilslxc.c,
tests/testutilslxc.h: Test parsing/formatting of LXC related
XML parts
The SELinux mount point moved from /selinux to /sys/fs/selinux
when systemd came along.
* configure.ac: Probe for SELinux mount point
* src/lxc/lxc_container.c: Use SELinux mount point determined
by configure.ac
When libvirtd is restarted, also restart the netlink event
message callbacks for existing VEPA connections and send
a message to lldpad for these existing links, so it learns
the new libvirtd pid.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
This avoids possible deadlock of the qemu driver in case a domain is
begin migrated (in Begin phase) and unrelated connection to qemu driver
is closed at the right time.
I checked all callers of qemuDomainCheckEjectableMedia() and they are
calling this function with qemu driver locked.
Found when attempting to build on Fedora 17 alpha with:
./autogen.sh --system --enable-compile-warnings=error
(this same build command works without problem on Fedora 16). Since
the consumer of the qemuProcessReconnectData doesn't assume that the
other fields of the struct are initialized (although it uses them
internally), the simpler solution is to just switch to C99-style
struct initialization (which doesn't require specification of all
fields).
libvirt always adds -Werror-frame-larger-than=4096 to the flags when
it builds. When building on Fedora 17, two functions with multiple
1024 buffers declared inside if {} blocks would generate frame size
errors; apparently the version of gcc on Fedora 16 will merge these
multiple buffers into a single buffer even when optimization is off,
but Fedora 17 won't.
The fix is to declare a single 1024 buffer at the top of the two
offending functions, and reuse the single buffer throughout the
functions.
Return statements with parameter enclosed in parentheses were modified
and parentheses were removed. The whole change was scripted, here is how:
List of files was obtained using this command:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$'
Found files were modified with this command:
sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
Then checked for nonsense.
The whole command looks like this:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$' | xargs sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
When qparams support was dropped in commit bc1ff160, we forgot
to add tests to ensure that viruri can do the same round trip
handling of a URI. This round trip was broken, due to use
of the old 'query' field of xmlUriPtr, instead of the new
'query_raw'
Also, we forgot to report an OOM error.
* tests/viruritest.c (mymain): Add tests based on just-deleted
qparamtest.
(testURIParse): Allow difference in input and expected output.
* src/util/viruri.c (virURIFormat): Add missing error. Use
query_raw, instead of query for xmlUriPtr object.
The oVirt developers have stated that the real reasons they want
to have qemu reuse existing volumes when creating a snapshot are:
1. the management framework is set up so that creation has to be
done from a central node for proper resource tracking, and having
libvirt and/or qemu create things violates the framework, and
2. qemu defaults to creating snapshots with an absolute path to
the backing file, but oVirt wants to manage a backing chain that
uses just relative names, to allow for easier migration of a chain
across storage locations.
When 0.9.10 added VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT (commit
4e9953a4), it only addressed point 1, but libvirt was still using
O_TRUNC which violates point 2. Meanwhile, the new qemu
'transaction' monitor command includes a new optional mode argument
that will force qemu to reuse the metadata of the file it just
opened (with the burden on the caller to have valid metadata there
in the first place). So, this tweaks the meaning of the flag to
cover both points as intended for use by oVirt. It is not strictly
backward-compatible to 0.9.10 behavior, but it can be argued that
the O_TRUNC of 0.9.10 was a bug.
Note that this flag is all-or-nothing, and only selects between
'existing' and the default 'absolute-paths'. A more flexible
approach that would allow per-disk selections, as well as adding
support for the 'no-backing-file' mode, would be possible by
extending the <domainsnapshot> xml to have a per-disk mode, but
until we have a management application expressing a need for that
additional complexity, it is not worth doing.
* src/libvirt.c (virDomainSnapshotCreateXML): Tweak documentation.
* src/qemu/qemu_monitor.h (qemuMonitorDiskSnapshot): Add
parameters.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Pass them
through.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot): Use
new monitor command arguments.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive)
(qemuDomainSnapshotCreateSingleDiskActive): Adjust callers.
(qemuDomainSnapshotDiskPrepare): Allow qed, modify rules on reuse.
The hardest part about adding transactions is not using the new
monitor command, but undoing the partial changes we made prior
to a failed transaction.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
transaction when available.
(qemuDomainSnapshotUndoSingleDiskActive): New function.
(qemuDomainSnapshotCreateSingleDiskActive): Pass through actions.
(qemuDomainSnapshotCreateXML): Adjust caller.
QEmu 1.1 is adding a 'transaction' command to the JSON monitor.
Each element of a transaction corresponds to a top-level command,
with the additional guarantee that the transaction flushes all
pending I/O, then guarantees that all actions will be successful
as a group or that failure will roll back the state to what it
was before the monitor command. The difference between a
top-level command:
{ "execute": "blockdev-snapshot-sync", "arguments":
{ "device": "virtio0", ... } }
and a transaction:
{ "execute": "transaction", "arguments":
{ "actions": [
{ "type": "blockdev-snapshot-sync", "data":
{ "device": "virtio0", ... } } ] } }
is just a couple of changed key names and nesting the shorter
command inside a JSON array to the longer command. This patch
just adds the framework; the next patch will actually use a
transaction.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommand): Move
guts...
(qemuMonitorJSONMakeCommandRaw): ...into new helper. Add support
for array element.
(qemuMonitorJSONTransaction): New command.
(qemuMonitorJSONDiskSnapshot): Support use in a transaction.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskSnapshot): Add
argument.
(qemuMonitorJSONTransaction): New declaration.
* src/qemu/qemu_monitor.h (qemuMonitorTransaction): Likewise.
(qemuMonitorDiskSnapshot): Add argument.
* src/qemu/qemu_monitor.c (qemuMonitorTransaction): New wrapper.
(qemuMonitorDiskSnapshot): Pass argument on.
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Update caller.
Taking an external snapshot of just one disk is atomic, without having
to pause and resume the VM. This also paves the way for later patches
to interact with the new qemu 'transaction' monitor command.
The various scenarios when requesting atomic are:
online, 1 disk, old qemu - safe, allowed by this patch
online, more than 1 disk, old qemu - failure, this patch
offline snapshot - safe, once a future patch implements offline disk snapshot
online, 1 or more disks, new qemu - safe, once future patch uses transaction
Taking an online system checkpoint snapshot is atomic, since it is
done via a single 'savevm' monitor command. Taking an offline system
checkpoint snapshot is atomic, thanks to the previous patch.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support
new flag for single-disk setups.
(qemuDomainSnapshotDiskPrepare): Check for atomic here.
(qemuDomainSnapshotCreateDiskActive): Skip pausing the VM when
atomic supported.
(qemuDomainSnapshotIsAllowed): Use bool instead of int.
Offline internal snapshots can be rolled back with just a little
bit of refactoring, meaning that we are now automatically atomic.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move
guts...
(qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow
rollbacks.
Right now, it is appallingly easy to cause qemu disk snapshots
to alter a domain then fail; for example, by requesting a two-disk
snapshot where the second disk name resides on read-only storage.
In this failure scenario, libvirt reports failure, but modifies
the live domain XML in-place to record that the first disk snapshot
was taken; and places a difficult burden on the management app
to grab the XML and reparse it to see which disks, if any, were
altered by the partial snapshot.
This patch adds a new flag where implementations can request that
the hypervisor make snapshots atomically; either no changes to
XML occur, or all disks were altered as a group. If you request
the flag, you either get outright failure up front, or you take
advantage of hypervisor abilities to make an atomic snapshot. Of
course, drivers should prefer the atomic means even without the
flag explicitly requested.
There's no way to make snapshots 100% bulletproof - even if the
hypervisor does it perfectly atomic, we could run out of memory
during the followup tasks of updating our in-memory XML, and report
a failure. However, these sorts of catastrophic failures are rare
and unlikely, and it is still nicer to know that either all
snapshots happened or none of them, as that is an easier state to
recover from.
* include/libvirt/libvirt.h.in
(VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC): New flag.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Expose it.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document
it.
We need a capability bit to gracefully error out if some of the
additions in future patches can't be implemented by the running qemu.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_TRANSACTION): New cap.
* src/qemu/qemu_capabilities.c (qemuCaps): Name it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
it.
Recent changes have caused build failures on systems where pdwtags works:
commit a26a196 mistakenly exported a public variable
commits a26a196, 57ddcc2, 487c063 all had copy-paste bugs in
hand-updating the golden API rather than rerunning pdwtags
* include/libvirt/libvirt.h.in (virDomainEventTrayChangeReason):
Make this a typedef, not external storage.
* src/remote_protocol-structs (remote_procedure): Fix spelling.
This introduces a new running reason VIR_DOMAIN_RUNNING_WAKEUP,
and new suspend event type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
While a wakeup event is emitted, the domain which entered into
VIR_DOMAIN_PMSUSPENDED will be transferred to "running"
with reason VIR_DOMAIN_RUNNING_WAKEUP, and a new domain lifecycle
event emitted with type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
This introduces a new domain state pmsuspended to represent
the domain which has been suspended by guest power management,
e.g. (entered itno s3 state). Because a "running" state could
be confused in this case, one will see the guest is paused
actually while playing. And state "paused" is for the domain
which was paused by virDomainSuspend.
This patch introduces a new event type for the QMP event
SUSPEND:
VIR_DOMAIN_EVENT_ID_PMSUSPEND
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventSuspendCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
WAKEUP:
VIR_DOMAIN_EVENT_ID_PMWAKEUP
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventWakeupCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This is similiar with physical world, one will be surprised if the
box starts with medium exists while the tray is open.
New tests are added, tests disk-{cdrom,floppy}-tray are for the qemu
supports "-device" flag, and disk-{cdrom,floppy}-no-device-cap are
for old qemu, i.e. which doesn't support "-device" flag.
This patch introduces a new event type for the QMP event
DEVICE_TRAY_MOVED, which occurs when the tray of a removable
disk is moved (i.e opened or closed):
VIR_DOMAIN_EVENT_ID_TRAY_CHANGE
The event's data includes the device alias and the reason
for tray status' changing, which indicates why the tray
status was changed. Thus the callback definition for the event
is:
enum {
VIR_DOMAIN_EVENT_TRAY_CHANGE_OPEN = 0,
VIR_DOMAIN_EVENT_TRAY_CHANGE_CLOSE,
\#ifdef VIR_ENUM_SENTINELS
VIR_DOMAIN_EVENT_TRAY_CHANGE_LAST
\#endif
} virDomainEventTrayChangeReason;
typedef void
(*virConnectDomainEventTrayChangeCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *devAlias,
int reason,
void *opaque);
Libvirt on x86 parses 'dmidecode' to gather characteristics of host
system. On PowerPC, this is now implemented by reading /proc/cpuinfo
NOTE: memory-DIMM information is not presently implemented.
Acked-by: Daniel Veillard <veillard@redhat.com>
Acked-by: Daniel P Berrange <berrange@redhat.com>
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
When SASL requests auth credentials, try to look them up in the
config file first. If any are found, remove them from the list
that the user is prompted for
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
SASL may prompt for credentials after either a 'start' or 'step'
invocation. In both cases the code to handle this is the same.
Refactor this code into a separate method to reduce the duplication,
since the complexity is about to grow
* src/remote/remote_driver.c: Refactor interaction with SASL
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that the functions in virauth.h have names matching the file
prefix, by renaming virRequest{Username,Password} to
virAuthGet{Username,Password}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To follow latest naming conventions, rename src/util/authhelper.[ch]
to src/util/virauth.[ch].
* src/util/authhelper.[ch]: Rename to src/util/virauth.[ch]
* src/esx/esx_driver.c, src/hyperv/hyperv_driver.c,
src/phyp/phyp_driver.c, src/xenapi/xenapi_driver.c: Update
for renamed include files
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The '.ini' file format is a useful alternative to the existing
config file style, when you need to have config files which
are hashes of hashes. The 'virKeyFilePtr' object provides a
way to parse these file types.
* src/Makefile.am, src/util/virkeyfile.c,
src/util/virkeyfile.h: Add .ini file parser
* tests/Makefile.am, tests/virkeyfiletest.c: Test
basic parsing capabilities
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Convert drivers currently using the qparams APIs, to instead
use the virURIPtr query parameters directly.
* src/esx/esx_util.c, src/hyperv/hyperv_util.c,
src/remote/remote_driver.c, src/xenapi/xenapi_utils.c: Remove
use of qparams
* src/util/qparams.h, src/util/qparams.c: Delete
* src/Makefile.am, src/libvirt_private.syms: Remove qparams
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Avoid the need for each driver to parse query parameters itself
by storing them directly in the virURIPtr struct. The parsing
code is a copy of that from src/util/qparams.c The latter will
be removed in a later patch
* src/util/viruri.h: Add query params to virURIPtr
* src/util/viruri.c: Parse query parameters when creating virURIPtr
* tests/viruritest.c: Expand test to cover params
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of just typedef'ing the xmlURIPtr struct for virURIPtr,
use a custom libvirt struct. This allows us to fix various
problems with libxml2. This initially just fixes the query vs
query_raw handling problems.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The parameter in the virURIFormat impl mistakenly used the
xmlURIPtr type, instead of virURIPtr. Since they will soon
cease to be identical, this needs fixing
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we defined a custom virURIPtr type, we should use a
virURIFree method instead of assuming it will always be
a typedef for xmlURIPtr
* src/util/viruri.c, src/util/viruri.h, src/libvirt_private.syms:
Add a virURIFree method
* src/datatypes.c, src/esx/esx_driver.c, src/libvirt.c,
src/qemu/qemu_migration.c, src/vmx/vmx.c, src/xen/xend_internal.c,
tests/viruritest.c: s/xmlFreeURI/virURIFree/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a client which started non-p2p migration dies in a bad time, the
source libvirtd never clears the migration job and almost nothing can be
done with the domain without restarting the daemon. This patch makes use
of connection close callbacks and ensures that migration job is properly
discarded when the client disconnects.
Destination daemon should not rely on the client or source daemon
(depending on the type of migration) to call Finish when migration
fails, because the client may crash before it can do so. The domain
prepared for incoming migration is set to be destroyed (and migration
job cleaned up) when connection with the client closes but this is not
enough. If the associated qemu process crashes after Prepare step and
the domain is cleaned up before the connection gets closed, autodestroy
is not called for the domain and migration jobs remains set. In case the
domain is defined on destination host (i.e., it is not completely
removed once destroyed) we keep the job set for ever. To fix this, we
register a cleanup callback which is responsible to clean migration-in
job when a domain dies anywhere between Prepare and Finish steps. Note
that we can't blindly clean any job when spotting EOF on monitor since
normally an API is running at that time.
This reverts commit 61f2b6ba5f and most of
commit d8916dc8e2, which effectively
brings back commit ef1065cf5a written by
Jim Fehlig:
The qemu migration speed default is 32MiB/s as defined in migration.c
/* Migration speed throttling */
static int64_t max_throttle = (32 << 20);
There's no need to throttle migration when targeting a file, so set
migration speed to unlimited prior to migration, and restore to libvirt
default value after migration.
Default units is MB for migrate_set_speed monitor command, so
(INT64_MAX / (1024 * 1024)) is used for unlimited migration speed.
This was reverted because migration to file could not be canceled and
even monitored since qemu was not processing any monitor commands until
the migration finished. This is now different as we make sure the
file descriptor we pass to qemu is able to properly report EAGAIN.
Recent qemu changes might have helped as well.
I tested managedsave with this patch in and indeed, it is 10x faster
while I can still monitor its progress.
A few times libvirt users manually setting mac addresses have
complained of a networking failure that ends up being due to a multicast
mac address being used for a guest interface. This patch prevents that
by logging an error and failing if a multicast mac address is
encountered in each of the three following cases:
1) domain xml <interface> mac address.
2) network xml bridge mac address.
3) network xml dhcp/host mac address.
There are several other places where a mac address can be input that
aren't controlled in this manner because failure to do so has no
consequences (e.g., if the address will be used to search through
existing interfaces for a match).
The RNG has been updated to add multiMacAddr and uniMacAddr along with
the existing macAddr, and macAddr was switched to uniMacAddr where
appropriate.
If an error was encountered parsing a dhcp host entry mac address or
name, parsing would continue and log a less descriptive error that
might make it more difficult to notice the true nature of the problem.
This patch returns immediately on logging the first error.
This patch is in response to:
https://bugzilla.redhat.com/show_bug.cgi?id=798467
If a guest's tap device is created using the same MAC address the
guest uses for its own network card (which connects to the tap
device), the Linux kernel will log the following message and traffic
will not pass:
kernel: vnet9: received packet with own address as source address
This patch disallows MAC addresses with a first byte of 0xFE, but only in
the case that the MAC address is used for a guest interface that's
connected by way of a standard tap device. (In other words, the
validation is done at runtime at the same place the MAC address is
modified for the tap device, rather than when mac address is parsed,
the idea being that it is then we know for sure the address will be
problematic.)
Using inheritance, this patch cleans up the cpu_map.xml file and also
sorts all CPU features according to the feature and registry
values. Model features are sorted the same way as foeatures in the
specification.
Also few models that are related were organized together and parts of
the XML are marked with comments
If a guest is paused, we were silently ignoring the quiesce flag,
which results in unclean snapshots, contrary to the intent of the
flag. Since we can't quiesce without guest agent support, we should
instead fail if the guest is not running.
Meanwhile, if we attempt a quiesce command, but the guest agent
doesn't respond, and we time out, we may have left the command
pending on the guest's queue, and when the guest resumes parsing
commands, it will freeze even though our command is no longer
around to issue a thaw. To be safe, we must _always_ pair every
quiesce call with a counterpart thaw, even if the quiesce call
failed due to a timeout, so that if a guest wakes up and starts
processing a command backlog, it will not get stuck in a frozen
state.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive):
Always issue thaw after a quiesce, even if quiesce failed.
(qemuDomainSnapshotFSThaw): Add a parameter.
This patch fixes a NULL pointer check that was causing SegFault on
some specific configurations. It also reverts commit 59d0c9801c
that was checking for this value in one place.
A common coding pattern for changing blkio parameters is
1. virDomainGetBlkioParameters
2. change one or more params
3. virDomainSetBlkioParameters
For this to work, it must be possible to roundtrip through
the methods without error. Unfortunately virDomainGetBlkioParameters
will return "" for the deviceWeight parameter for guests by default,
which virDomainSetBlkioParameters will then reject as invalid.
This fixes the handling of "" to be a no-op, and also improves the
error message to tell you what was invalid
How to reproduce:
% valgrind -v --leak-check=full virsh migrate mig \
qemu+ssh://$dest/system --unsafe
== 8 bytes in 1 blocks are definitely lost in loss record 1 of 28
== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
== by 0x3EB7115FB8: xdr_reference (in /lib64/libc-2.12.so)
== by 0x3EB7115F10: xdr_pointer (in /lib64/libc-2.12.so)
== by 0x4D1EA84: xdr_remote_string (remote_protocol.c:40)
== by 0x4D1EAD8: xdr_remote_domain_migrate_prepare3_ret (remote_protocol.c:4772)
== by 0x4D2FFD2: virNetMessageDecodePayload (virnetmessage.c:382)
== by 0x4D2789C: virNetClientProgramCall (virnetclientprogram.c:382)
== by 0x4D0707D: callWithFD (remote_driver.c:4549)
== by 0x4D070FB: call (remote_driver.c:4570)
== by 0x4D12AEE: remoteDomainMigratePrepare3 (remote_driver.c:4138)
== by 0x4CF7BE9: virDomainMigrateVersion3 (libvirt.c:4815)
== by 0x4CF9432: virDomainMigrate2 (libvirt.c:5454)
==
== LEAK SUMMARY:
== definitely lost: 8 bytes in 1 blocks
== indirectly lost: 0 bytes in 0 blocks
== possibly lost: 0 bytes in 0 blocks
== still reachable: 126,995 bytes in 1,343 blocks
== suppressed: 0 bytes in 0 blocks
This patch also fixes the leaks in remoteDomainMigratePrepare and
remoteDomainMigratePrepare2.
* src/libvirt.c (virStorageVolResize): correct comment typo according to
virStorageVolResizeFlags enum definition.
Signed-off-by: Alex Jia <ajia@redhat.com>
If no <interface> elements are included in an LXC guest XML
description, then the LXC guest will just see the host's
network interfaces. It is desirable to be able to hide the
host interfaces, without having to define any guest interfaces.
This patch introduces a new feature flag <privnet/> to allow
forcing of a private network namespace for LXC. In the future
I also anticipate that we will add <privuser/> to force a
private user ID namespace.
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add support
for <privnet/> feature. Auto-set <privnet> if any <interface>
devices are defined
* src/lxc/lxc_container.c: Honour request for private network
namespace
Commit e457d5ef20 adds ability to pass the
default URI using the client configuration file. If the file is not
present, it still accesses the NULL config object causing a segfault.
Caught running "make check".
Wire up the domain graphics event notifications for SPICE. Adapted
from a RHEL-only patch written by Dan Berrange that used custom
__com.redhat_SPICE events - equivalent events are now available in
upstream QEMU (including a SPICE_CONNECTED event, which was missing in
the __COM.redhat_SPICE version).
* src/qemu/qemu_monitor_json.c: Wire up SPICE graphics events
Currently if the URI passed to virConnectOpen* is NULL, then we
- Look for LIBVIRT_DEFAULT_URI env var
- Probe for drivers
This changes it so that
- Look for LIBVIRT_DEFAULT_URI env var
- Look for 'uri_default' in $HOME/.libvirt/libvirt.conf
- Probe for drivers
numad is an user-level daemon that monitors NUMA topology and
processes resource consumption to facilitate good NUMA resource
alignment of applications/virtual machines to improve performance
and minimize cost of remote memory latencies. It provides a
pre-placement advisory interface, so significant processes can
be pre-bound to nodes with sufficient available resources.
More details: http://fedoraproject.org/wiki/Features/numad
"numad -w ncpus:memory_amount" is the advisory interface numad
provides currently.
This patch add the support by introducing a new XML attribute
for <vcpu>. e.g.
<vcpu placement="auto">4</vcpu>
<vcpu placement="static" cpuset="1-10^6">4</vcpu>
The returned advisory nodeset from numad will be printed
in domain's dumped XML. e.g.
<vcpu placement="auto" cpuset="1-10^6">4</vcpu>
If placement is "auto", the number of vcpus and the current
memory amount specified in domain XML will be used for numad
command line (numad uses MB for memory amount):
numad -w $num_of_vcpus:$current_memory_amount / 1024
The advisory nodeset returned from numad will be used to set
domain process CPU affinity then. (e.g. qemuProcessInitCpuAffinity).
If the user specifies both CPU affinity policy (e.g.
(<vcpu cpuset="1-10,^7,^8">4</vcpu>) and placement == "auto"
the specified CPU affinity will be overridden.
Only QEMU/KVM drivers support it now.
See docs update in patch for more details.
With current code, we pass true iff domain is cold booting. However,
if disk is inaccessible and startupPolicy for that disk is set to
'requisite' we have to fail iff cold booting.
AMD Bulldozer (or Opteron_G4 as called in QEMU) was added to the list
of cpu models, flags were taken from upstream qemu cpu specifications
and should be sorted by bit values (or first occurence in the feature
specification part of cpu_map.xml).
Based on QEMU upstream commit 885bb0369a4f0abe2c0185178f3cb347cb02cdf1.
Even though we say in documentation setting (tls-)port to -1 is legacy
compat style for enabling autoport, we're roughly doing this for VNC.
However, in case of SPICE auto enable autoport iff both port & tlsPort
are equal -1 as documentation says autoport plays with both.
In qemuDomainDetachNetDevice, detach was being used before it had been
validated. If no matching device was found, this resulted in a
dereference of a NULL pointer.
This behavior was a regression introduced in commit
cf90342be0, so it has not been a part of
any official libvirt release.
When host-model and host-passthrouh CPU modes were introduced, qemu
driver was properly modify to update guest CPU definition during
migration so that we use the right CPU at the destination. However,
similar treatment is needed for (managed)save and snapshots since they
need to save the exact CPU so that a domain can be properly restored.
To avoid repetition of such situation, all places that need live XML
share the code which generates it.
As a side effect, this patch fixes error reporting from
qemuDomainSnapshotWriteMetadata().
Thanks to cgroups, providing user vs. system time of the overall
guest is easy to add to our existing API.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_CPU_STATS_USERTIME)
(VIR_DOMAIN_CPU_STATS_SYSTEMTIME): New constants.
* src/util/virtypedparam.h (virTypedParameterArrayValidate)
(virTypedParameterAssign): Enforce checking the result.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Fix offender.
(qemuDomainGetTotalcpuStats): Implement new parameters.
* tools/virsh.c (cmdCPUStats): Tweak output accordingly.
As documented in linux.git/Documentation/cgroups/cpuacct.txt,
cpuacct.stat returns user and system time in ticks (the same
unit used in times(2)). It would be a bit nicer if it were like
getrusage(2) and reported timeval contents, or like cpuacct.usage
and in nanoseconds, but we can't be picky.
* src/util/cgroup.h (virCgroupGetCpuacctStat): New function.
* src/util/cgroup.c (virCgroupGetCpuacctStat): Implement it.
(virCgroupGetValueStr): Allow for multi-line files.
* src/libvirt_private.syms (cgroup.h): Export it.
If there is a disk file with a comma in the name, QEmu expects a double
comma instead of a single one (e.g., the file "virtual,disk.img" needs
to be specified as "virtual,,disk.img" in QEmu's command line). This
patch fixes libvirt to work with that feature. Fix RHBZ #801036.
Based on an initial patch by Crístian Viana.
* src/util/buf.h (virBufferEscape): Alter signature.
* src/util/buf.c (virBufferEscape): Add parameter.
(virBufferEscapeSexpr): Fix caller.
* src/qemu/qemu_command.c (qemuBuildRBDString): Likewise. Also
escape commas in file names.
(qemuBuildDriveStr): Escape commas in file names.
* docs/schemas/basictypes.rng (absFilePath): Relax RNG to allow
commas in input file names.
* tests/qemuxml2argvdata/*-disk-drive-network-sheepdog.*: Update
test.
Signed-off-by: Eric Blake <eblake@redhat.com>
We found few more AMD-specific features in cpu64-rhel* models that
made it impossible to start qemu guest on Intel host (with this
setting) even though qemu itself starts correctly with them.
This impacts one test, thus the fix in tests/cputestdata/.
virNetworkDNSHostsDefParseXML was calling VIR_ALLOC(def->hosts) if
def->hosts was NULL. This is a waste of time, though, since
VIR_REALLOC_N is called a few lines further down, prior to any use of
def->hosts. (initializing def->nhosts to 0 is also redundant, because
the newly allocated memory will always be cleared to all 0's anyway).
If user hasn't supplied any tlsPort we default to setting it
to zero in our internal structure. However, when building command
line we test it against -1 which is obviously wrong.
This is nearly identical to an earlier patch for virnetlink.c.
There are special stub versions of all public functions in this file
that are compiled when the platform isn't linux. Each of these
functions had an almost identical message, differing only in the
function name included in the message. Since log messages already
contain the function name, we can just define a const char* with the
common part of the string, and use that same string for all the log
messages.
If nothing else, this at least makes for less strings that need
translating...
This function was freeing a virDomainNetDef with
VIR_FREE(). virDomainNetDef is a complex structure with many pointers
to other dynamically allocated data; to properly free it
virDomainNetDefFree() must be called instead, otherwise several
strings (and potentially other things) will be leaked.
For some reason, although live hotplug of <hostdev> devices is
supported, persistent hotplug is not. This patch adds the proper
VIR_DOMAIN_DEVICE_HOSTDEV cases to the switches in
qemuDomainAttachDeviceConfig and qemuDomainDetachDeviceConfig.
There are several functions that call virNetlinkCommand, and they all
follow a common pattern, with three exit labels: err_exit (or
cleanup), malformed_resp, and buffer_too_small. All three of these
labels do their own cleanup and have their own return. However, the
malformed_resp label usually frees the same items as the
cleanup/err_exit label, and the buffer_too_small label just doesn't
free recvbuf (because it's known to always be NULL at the time we goto
buffer_too_small.
In order to simplify and standardize the code, I've made the following
changes to all of these functions:
1) err_exit is replaced with the more libvirt-ish "cleanup", which
makes sense because in all cases this code is also executed in the
case of success, so labelling it err_exit may be confusing.
2) rc is initialized to -1, and set to 0 just before the cleanup
label. Any code that currently sets rc = -1 is made to instead goto
cleanup.
3) malformed_resp and buffer_too_small just log their error and goto
cleanup. This gives us a single return path, and a single place to
free up resources.
4) In one instance, rather then logging an error immediately, a char*
msg was pointed to an error string, then goto cleanup (and cleanup
would log an error if msg != NULL). It takes no more lines of code
to just log the message as we encounter it.
This patch should have 0 functional effects.
There are several functions in domain_conf.c that remove a device
object from the domain's list of that object type, but don't free the
object or return it to the caller to free. In many cases this isn't a
problem because the caller already had a pointer to the object and
frees it afterward, but in several cases the removed object was just
left floating around with no references to it.
In particular, the function qemuDomainDetachDeviceConfig() calls
functions to locate and remove net (virDomainNetRemoveByMac), disk
(virDomainDiskRemoveByName()), and lease (virDomainLeaseRemove())
devices, but neither it nor its caller qemuDomainModifyDeviceConfig()
ever obtain a pointer to the device being removed, much less free it.
This patch modifies the following "remove" functions to return a
pointer to the device object being removed from the domain device
arrays, to give the caller the option of freeing the device object
using that pointer if needed. In places where the object was
previously leaked, it is now freed:
virDomainDiskRemove
virDomainDiskRemoveByName
virDomainNetRemove
virDomainNetRemoveByMac
virDomainHostdevRemove
virDomainLeaseRemove
virDomainLeaseRemoveAt
The functions that had been leaking:
libxlDomainDetachConfig - leaked a virDomainDiskDef
qemuDomainDetachDeviceConfig - could leak a virDomainDiskDef,
a virDomainNetDef, or a
virDomainLeaseDef
qemuDomainDetachLease - leaked a virDomainLeaseDef
There were certain paths through the hostdev detach code that could
lead to the lower level function failing (and not removing the object
from the domain's hostdevs list), but the higher level function
free'ing the hostdev object anyway. This would leave a stale
hostdevdef pointer in the list, which would surely cause a problem
eventually.
This patch relocates virDomainHostdevRemove from the lower level
functions qemuDomainDetachThisHostDevice and
qemuDomainDetachHostPciDevice, to their caller
qemuDomainDetachThisHostDevice, placing it just before the call to
virDomainHostdevDefFree. This makes it easy to verify that either both
operations are done, or neither.
NB: The "dangling pointer" part of this problem was introduced in
commit 13d5a6, so it is not present in libvirt versions prior to
0.9.9. Earlier versions would return failure in certain cases even
though the the device object was removed/deleted, but the removal and
deletion operations would always both happen or neither.
There are special stub versions of all public functions in this file
that are compiled when either libnl isn't available or the platform
isn't linux. Each of these functions had two almost identical message,
differing only in the function name included in the message. Since log
messages already contain the function name, we can just define a const
char* with the common part of the string, and use that same string for
all the log messages.
Also, rather than doing #if defined ... #else ... #endif *inside the
error log macro invocation*, this patch does #if defined ... just
once, using it to decide which single string to define. This turns the
error log in each function from 6 lines, to 1 line.
This patch will allow OpenFlow controllers to identify which interface
belongs to a particular VM by using the Domain UUID.
ovs-vsctl get Interface vnet0 external_ids
{attached-mac="52:54:00:8C:55:2C", iface-id="83ce45d6-3639-096e-ab3c-21f66a05f7fa", iface-status=active, vm-id="142a90a7-0acc-ab92-511c-586f12da8851"}
V2 changes:
Replaced vm-uuid with vm-id. There was a discussion in Open vSwitch
mailinglist that we should stick with the same DB key postfixes for the
sake of consistency (e.g iface-id, vm-id ...).
The indentation on the final lines of the function was off by four
spaces, making me wonder for a second if there was something
missing. (There wasn't.)
Commit 5d4b0c4c80 tried to fix certain classes of VPATH builds,
but was too limited. In particular, Guannan Ren reported:
> For example: The libvirt source code resides in /home/testuser,
> I make dist in /tmp/buildvpath, the XDR routine .c file will
> include full path of the header file like:
>
> #include "/home/testuser/src/rpc/virnetprotocol.h"
> #include "internal.h"
> #include <arpa/inet.h>
>
> If we distribute the tarball to another machine to compile,
> it will report error as follows:
>
> rpc/virnetprotocol.c:7:59: fatal error:
> /home/testuser/src/rpc/virnetprotocol.h: No such file or directory
* src/rpc/genprotocol.pl: Fix more include lines.
If we need to virFork() to check assess() under different
UID+GID we need to translate returned status via WEXITSTATUS().
Otherwise, we may return values greater than 255 which is
obviously wrong.
The function sanlock_inquire can return NULL in the state string if the
message consists only of a header. The return value is arbitrary and
sent by the server. We should proceed carefully while touching such
pointers.
Some members are generated during XML parse (e.g. MAC address of
an interface); However, with current implementation, if we
are plugging a device both to persistent and live config,
we parse given XML twice: first time for live, second for config.
This is wrong then as the second time we are not guaranteed
to generate same values as we did for the first time.
To prevent that we need to create a copy of DeviceDefPtr;
This is done through format/parse process instead of writing
functions for deep copy as it is easier to maintain:
adding new field to any virDomain*DefPtr doesn't require change
of copying function.
Currently, startupPolicy='requisite' was determining cold boot
by migrateFrom != NULL. That means, if domain was started up
with migrateFrom set we didn't require disk source path and allowed
it to be dropped. However, on snapshot-revert domain wasn't migrated
but according to documentation, requisite should drop disk source
as well.
Output is still in kibibytes, but input can now be in different
scales for ease of typing.
* src/conf/domain_conf.c (virDomainParseMemory): New helper.
(virDomainDefParseXML): Use it when parsing.
* docs/schemas/domaincommon.rng: Expand XML; rename memoryKBElement
to memoryElement and update callers.
* docs/formatdomain.html.in (elementsMemoryAllocation): Document
scaling.
* tests/qemuxml2argvdata/qemuxml2argv-memtune.xml: Adjust test.
* tests/qemuxml2xmltest.c: Likewise.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-memtune.xml: New file.
Using 'unsigned long' for memory values is risky on 32-bit platforms,
as a PAE guest can have more than 4GiB memory. Our API is
(unfortunately) locked at 'unsigned long' and a scale of 1024, but
the rest of our system should consistently use 64-bit values,
especially since the previous patch centralized overflow checking.
* src/conf/domain_conf.h (_virDomainDef): Always use 64-bit values
for memory. Change hugepage_backed to a bool.
* src/conf/domain_conf.c (virDomainDefParseXML)
(virDomainDefCheckABIStability, virDomainDefFormatInternal): Fix
clients.
* src/vmx/vmx.c (virVMXFormatConfig): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxpr, xenFormatSxpr): Likewise.
* src/xenxs/xen_xm.c (xenXMConfigGetULongLong): New function.
(xenXMConfigGetULong, xenXMConfigSetInt): Avoid truncation.
(xenParseXM, xenFormatXM): Fix clients.
* src/phyp/phyp_driver.c (phypBuildLpar): Likewise.
* src/openvz/openvz_driver.c (openvzDomainSetMemoryInternal):
Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainDefineXML): Likewise.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Likewise.
* src/qemu/qemu_process.c (qemuProcessStart): Likewise.
* src/qemu/qemu_monitor.h (qemuMonitorGetBalloonInfo): Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetInfo)
(qemuDomainGetXMLDesc): Likewise.
* src/uml/uml_conf.c (umlBuildCommandLine): Likewise.
On 64-bit platforms, unsigned long and unsigned long long are
identical, so we don't have to worry about overflow checks.
On 32-bit platforms, anywhere we narrow unsigned long long back
to unsigned long, we have to worry about overflow; it's easier
to do this in one place by having most of the code use the same
or wider types, and only doing the narrowing at the last minute.
Therefore, the memory set commands remain unsigned long, and
the memory get command now centralizes the overflow check into
libvirt.c, so that drivers don't have to repeat the work.
This also fixes a bug where xen returned the wrong value on
failure (most APIs return -1 on failure, but getMaxMemory
must return 0 on failure).
* src/driver.h (virDrvDomainGetMaxMemory): Use long long.
* src/libvirt.c (virDomainGetMaxMemory): Raise overflow.
* src/test/test_driver.c (testGetMaxMemory): Fix driver.
* src/rpc/gendispatch.pl (name_to_ProcName): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorGetMaxMemory): Likewise.
* src/xen/xen_driver.c (xenUnifiedDomainGetMaxMemory): Likewise.
* src/xen/xend_internal.c (xenDaemonDomainGetMaxMemory):
Likewise.
* src/xen/xend_internal.h (xenDaemonDomainGetMaxMemory):
Likewise.
* src/xen/xm_internal.c (xenXMDomainGetMaxMemory): Likewise.
* src/xen/xm_internal.h (xenXMDomainGetMaxMemory): Likewise.
* src/xen/xs_internal.c (xenStoreDomainGetMaxMemory): Likewise.
* src/xen/xs_internal.h (xenStoreDomainGetMaxMemory): Likewise.
* src/xenapi/xenapi_driver.c (xenapiDomainGetMaxMemory):
Likewise.
* src/esx/esx_driver.c (esxDomainGetMaxMemory): Likewise.
* src/libxl/libxl_driver.c (libxlDomainGetMaxMemory): Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetMaxMemory): Likewise.
* src/lxc/lxc_driver.c (lxcDomainGetMaxMemory): Likewise.
* src/uml/uml_driver.c (umlDomainGetMaxMemory): Likewise.
The test domain allows <memory>0</memory>, but the RNG was stating
that memory had to be at least 4096000 bytes. Hypervisors should
enforce their own limits, rather than complicating the RNG.
Meanwhile, some copy and paste had introduced some fishy constructs
in various unit tests.
* docs/schemas/domaincommon.rng (memoryKB, memoryKBElement): Drop
limit that isn't enforced in code.
* src/conf/domain_conf.c (virDomainDefParseXML): Require current
<= maximum.
* tests/qemuxml2argvdata/*.xml: Fix offenders.
Disk manufacturers are fond of quoting sizes in powers of 10,
rather than powers of 2 (after all, 2.1 GB sounds larger than
2.0 GiB, even though the exact opposite is true). So, we might
as well follow coreutils' lead in supporting three types of
suffix: single letter ${u} (which we already had) and ${u}iB
for the power of 2, and ${u}B for power of 10.
Additionally, it is impossible to create a file with more than
2**63 bytes, since off_t is signed (if you have enough storage
to even create one 8EiB file, I'm jealous). This now reports
failure up front rather than down the road when the kernel
finally refuses an impossible size.
* docs/schemas/basictypes.rng (unit): Add suffixes.
* src/conf/storage_conf.c (virStorageSize): Use new function.
* docs/formatstorage.html.in: Document it.
* tests/storagevolxml2xmlin/vol-file-backing.xml: Test it.
* tests/storagevolxml2xmlin/vol-file.xml: Likewise.
Make it obvious to 'dumpxml' readers what unit we are using,
since our default of KiB for memory (1024) differs from qemu's
default of MiB; and differs from our use of bytes for storage.
Tests were updated via:
$ find tests/*data tests/*out -name '*.xml' | \
xargs sed -i 's/<\(memory\|currentMemory\|hard_limit\|soft_limit\|min_guarantee\|swap_hard_limit\)>/<\1 unit='"'KiB'>/"
$ find tests/*data tests/*out -name '*.xml' | \
xargs sed -i 's/<\(capacity\|allocation\|available\)>/<\1 unit='"'bytes'>/"
followed by a few fixes for the stragglers.
Note that with this patch, the RNG for <memory> still forbids
validation of anything except unit='KiB', since the code silently
ignores the attribute; a later patch will expand <memory> to allow
scaled input in the code and update the RNG to match.
* docs/schemas/basictypes.rng (unit): Add 'bytes'.
(scaledInteger): New define.
* docs/schemas/storagevol.rng (sizing): Use it.
* docs/schemas/storagepool.rng (sizing): Likewise.
* docs/schemas/domaincommon.rng (memoryKBElement): New define; use
for memory elements.
* src/conf/storage_conf.c (virStoragePoolDefFormat)
(virStorageVolDefFormat): Likewise.
* src/conf/domain_conf.h (_virDomainDef): Document unit used
internally.
* src/conf/storage_conf.h (_virStoragePoolDef, _virStorageVolDef):
Likewise.
* tests/*data/*.xml: Update all tests.
* tests/*out/*.xml: Likewise.
* tests/define-dev-segfault: Likewise.
* tests/openvzutilstest.c (testReadNetworkConf): Likewise.
* tests/qemuargv2xmltest.c (blankProblemElements): Likewise.
Scaling an integer based on a suffix is something we plan on reusing
in several contexts: XML parsing, virsh CLI parsing, and possibly
elsewhere. Make it easy to reuse, as well as adding in support for
powers of 1000.
* src/util/util.h (virScaleInteger): New function.
* src/util/util.c (virScaleInteger): Implement it.
* src/libvirt_private.syms (util.h): Export it.
Overflow can be user-induced, so it deserves more than being called
an internal error. Note that in general, 32-bit platforms have
far more places to trigger this error (anywhere the public API
used 'unsigned long' but the other side of the connection is a
64-bit server); but some are possible on 64-bit platforms (where
the public API computes the product of two numbers).
* include/libvirt/virterror.h (VIR_ERR_OVERFLOW): New error.
* src/util/virterror.c (virErrorMsg): Translate it.
* src/libvirt.c (virDomainSetVcpusFlags, virDomainGetVcpuPinInfo)
(virDomainGetVcpus, virDomainGetCPUStats): Use it.
* daemon/remote.c (HYPER_TO_TYPE): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockResize): Likewise.
Yes, I like kilobytes better than kibibytes (when I say kilobytes,
I generally mean 1024). But since the term is ambiguous, it can't
hurt to say what we mean, by using both the correct name and
calling out the numeric equivalent.
* src/libvirt.c (virDomainGetMaxMemory, virDomainSetMaxMemory)
(virDomainSetMemory, virDomainSetMemoryFlags)
(virNodeGetFreeMemory): Tweak wording.
* docs/formatdomain.html.in: Likewise.
* docs/formatstorage.html.in: Likewise.
ATTRIBUTE_UNUSED was accidentally forgotten on one arg of a stub
function for functionality that's not present on non-linux
platforms. This causes a non-linux build with
--enable-compile-warnings=error to fail.
The RPC code assumed that the array returned by the driver would be
fully populated; that is, ncpus on entry resulted in ncpus * return
value on exit. However, while we don't support holes in the middle
of ncpus, we do want to permit the case of ncpus on entry being
longer than the array returned by the driver (that is, it should be
safe for the caller to pass ncpus=128 on entry, and the driver will
stop populating the array when it hits max_id).
Additionally, a successful return implies that the caller will then
use virTypedParamArrayClear on the entire array; for this to not
free uninitialized memory, the driver must ensure that all skipped
entries are explicitly zeroed (the RPC driver did this, but not
the qemu driver).
There are now three cases:
server 0.9.10 and client 0.9.10 or newer: No impact - there were no
hypervisor drivers that supported cpu stats
server 0.9.11 or newer and client 0.9.10: if the client calls with
ncpus beyond the max, then the rpc call will fail on the client side
and disconnect the client, but the server is no worse for the wear
server 0.9.11 or newer and client 0.9.11: the server can return a
truncated array and the client will do just fine
I reproduced the problem by using a host with 2 CPUs, and doing:
virsh cpu-stats $dom --start 1 --count 2
* daemon/remote.c (remoteDispatchDomainGetCPUStats): Allow driver
to omit tail of array.
* src/remote/remote_driver.c (remoteDomainGetCPUStats):
Accommodate driver that omits tail of array.
* src/libvirt.c (virDomainGetCPUStats): Document this.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Clear all
unpopulated entries.
* For now, only "cpu_time" is supported.
* cpuacct cgroup is used for providing percpu cputime information.
* src/qemu/qemu.conf - take care of cpuacct cgroup.
* src/qemu/qemu_conf.c - take care of cpuacct cgroup.
* src/qemu/qemu_driver.c - added an interface
* src/util/cgroup.c/h - added interface for getting percpu cputime
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
These changes are applied only if the hostdev has a parent net device
(i.e. if it was defined as "<interface type='hostdev'>" rather than
just "<hostdev>"). If the parent netdevice has virtual port
information, the original virtualport associate functions are called
(these set and restore both mac and port profile on an
interface). Otherwise, only mac address is set on the device.
Note that This is only supported for SR-IOV Virtual Functions (not for
standard PCI or USB netdevs), and virtualport association is only
supported for 802.1Qbh. For all other types of cards and types of
virtualport, a "Config Unsupported" error is returned and the
operation fails.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
This patch includes the following changes to virnetdevmacvlan.c and
virnetdevvportprofile.c:
- removes some netlink functions which are now available in
virnetdev.c
- Adds a vf argument to all port profile functions.
For 802.1Qbh devices, the port profile calls can use a vf argument if
passed by the caller. If the vf argument is -1 it will try to derive the vf
if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device
argument because during port profile assignment on a hostdev, this argument
can be null.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
This patch adds the following:
- functions to set and get vf configs
- Functions to replace and store vf configs (Only mac address is handled today.
But the functions can be easily extended for vlans and other vf configs)
- function to dump link dev info (This is moved from virnetdevvportprofile.c)
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
pciDeviceGetVirtualFunctionInfo returns pf netdevice name and virtual
function index for a given vf. This is just a wrapper around existing functions
to return vf's pf and vf_index with one api call
pciConfigAddressToSysfsfile returns the sysfile pci device link
from a 'struct pci_config_address'
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
qemuDomainAttachNetDevice
- re-ordered some things at start of function because
networkAllocateActualDevice should always be run and a slot
in def->nets always allocated, but host_net_add isn't needed
if the actual type is hostdev.
- if actual type is hostdev, defer to
qemuDomainAttachHostDevice (which will reach up to the NetDef
for things like MAC address when necessary). After return
from qemuDomainAttachHostDevice, slip directly to cleanup,
since the rest of the function is specific to emulated net
devices.
- put assignment of new NetDef into expanded def->nets down
below cleanup: (but only on success) since it is also needed
for emulated and hostdev net devices.
qemuDomainDetachHostDevice
- after locating the exact device to detach, check if it's a
network device and, if so, use toplevel
qemuDomainDetachNetDevice instead so that the def->nets list
is properly updated, and 'actual device' properly returned to
network pool if appropriate. Otherwise, for normal hostdevs,
call the lower level qemuDomainDetachThisDevice.
qemuDomainDetachNetDevice
- This is where it gets a bit tricky. After locating the device
on the def->nets list, if the network device type == hostdev,
call the *lower level* qemuDomainDetachThisDevice (which will
reach back up to the parent net device for MAC address /
virtualport when appropriate, then clear the device out of
def->hostdevs) before skipping past all the emulated
net-device-specific code to cleanup:, where the network
device is removed from def->nets, and the network device
object is freed.
In short, any time a hostdev-type network device is detached, we must
go through the toplevel virDomaineDetachNetDevice function first and
last, to make sure 1) the def->nnets list is properly managed, and 2)
any device allocated with networkAllocateActualDevice is properly
freed. At the same time, in the middle we need to go through the
lower-level vidDomainDetach*This*HostDevice to be sure that 1) the
def->hostdevs list is properly managed, 2) the PCI device is properly
detached from the guest and reattached to the host (if appropriate),
and 3) any higher level teardown is called at the appropriate time, by
reaching back up to the NetDef config (part (3) will be covered in a
separate patch).
This patch makes sure that each network device ("interface") of
type='hostdev' appears on both the hostdevs list and the nets list of
the virDomainDef, and it modifies the qemu driver startup code so that
these devices will be presented to qemu on the commandline as hostdevs
rather than as network devices.
It does not add support for hotplug of these type of devices, or code
to honor the <mac address> or <virtualport> given in the config (both
of those will be done in separate patches).
Once each device is placed on both lists, much of what this patch does
is modify places in the code that traverse all the device lists so
that these hybrid devices are only acted on once - either along with
the other hostdevs, or along with the other network interfaces. (In
many cases, only one of the lists is traversed / a specific operation
is performed on only one type of device. In those instances, the code
can remain unchanged.)
There is one special case - when building the commandline, interfaces
are allowed to proceed all the way through
networkAllocateActualDevice() before deciding to skip the rest of
netdev-specific processing - this is so that (once we have support for
networks with pools of hostdev devices) we can get the actual device
allocated, then rely on the loop processing all hostdevs to generate
the correct commandline.
(NB: <interface type='hostdev'> is only supported for PCI network
devices that are SR-IOV Virtual Functions (VF). Standard PCI[e] and
USB devices, and even the Physical Functions (PF) of SR-IOV devices
can only be assigned to a guest using the more basic <hostdev> device
entry. This limitation is mostly due to the fact that non-SR-IOV
ethernet devices tend to lose mac address configuration whenever the
card is reset, which happens when a card is assigned to a guest;
SR-IOV VFs fortunately don't suffer the same problem.)
This is the new interface type that sets up an SR-IOV PCI network
device to be assigned to the guest with PCI passthrough after
initializing some network device-specific things from the config
(e.g. MAC address, virtualport profile parameters). Here is an example
of the syntax:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0' bus='0' slot='4' function='3'/>
</source>
<mac address='00:11:22:33:44:55'/>
<address type='pci' domain='0' bus='0' slot='7' function='0'/>
</interface>
This would assign the PCI card from bus 0 slot 4 function 3 on the
host, to bus 0 slot 7 function 0 on the guest, but would first set the
MAC address of the card to 00:11:22:33:44:55.
NB: The parser and formatter don't care if the PCI card being
specified is a standard single function network adapter, or a virtual
function (VF) of an SR-IOV capable network adapter, but the upcoming
code that implements the back end of this config will work *only* with
SR-IOV VFs. This is because modifying the mac address of a standard
network adapter prior to assigning it to a guest is pointless - part
of the device reset that occurs during that process will reset the MAC
address to the value programmed into the card's firmware.
Although it's not supported by any of libvirt's hypervisor drivers,
usb network hostdevs are also supported in the parser and formatter
for completeness and consistency. <source> syntax is identical to that
for plain <hostdev> devices, except that the <address> element should
have "type='usb'" added if bus/device are specified:
<interface type='hostdev'>
<source>
<address type='usb' bus='0' device='4'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
If the vendor/product form of usb specification is used, type='usb'
is implied:
<interface type='hostdev'>
<source>
<vendor id='0x0012'/>
<product id='0x24dd'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
Again, the upcoming patch to fill in the backend of this functionality
will log an error and fail with "Unsupported Config" if you actually
try to assign a USB network adapter to a guest using <interface
type='hostdev'> - just use a standard <hostdev> entry in that case
(and also for single-port PCI adapters).
This refactoring is necessary to support hotplug detach of
type=hostdev network devices, but needs to be in a separate patch to
make potential debugging of regressions more practical.
Rather than the lowest level functions searching for a matching
device, the search is now done in the toplevel function, and an
intermediate-level function (qemuDomainDetachThisHostDevice()), which
expects that the device's entry is already found, is called (this
intermediate function will be called by qemuDomainDetachNetDevice() in
order to support detach of type=hostdev net devices)
This patch should result in 0 differences in functionality.
Three new functions useful in other files:
virDomainHostdevInsert:
Add a new hostdev at the end of the array. This would more sensibly be
called virDomainHostdevAppend, but the existing functions for other
types of devices are called Insert.
virDomainHostdevRemove:
Eliminates one entry from the hostdevs array, but doesn't free it;
patterned after the code at the end of the two
qemuDomainDetachHostXXXDevice functions (and also other pre-existing
virDomainXXXRemove functions for other device types).
virDomainHostdevFind:
This function is patterned from the search loops at the top of
qemuDomainDetachHostPciDevice and qemuDomainDetachHostUsbDevice, and
will be used to re-factor those (and other detach-related) functions.
To shorten some new code that accesses the many fields within the
subsys struct of a hostdev, create a separate toplevel, typedefed
virDomainHostdevSubsys struct so that we can define temporary pointers
to the subsys part.
The parent can be any type of device. It defaults to type=none, and a
NULL pointer. The intent is that if a hostdevdef is contained in the
def for a higher level device (e.g. virDomainNetDef), hostdev->parent
will point to the higher level device, and type will be set to that
type of device. This way, during attach and detach of the device,
parent can be checked, and appropriate callouts made to do higher
level device initialization (e.g. setting MAC address).
Also, although these hostdevs with parents will be added to a domain's
hostdevs list, they will be treated slightly differently when
traversing the list, e.g. virDomainHostdefDefFree for a hostdev that
has a parent doesn't need to be called (and will be a NOP); it will
simply be removed from the list (since the parent device object is in
its own type-specific list, and will be freed from there).
In an upcoming patch, virDomainNetDef will acquire a
virDomainHostdevDef, and the <interface> XML will take on some of the
elements of a <hostdev>. To avoid duplicating the code for parsing and
formatting the <source> element (which will be nearly identical in
these two cases), this patch factors those parts out of the
HostdevDef's parse and format functions, and puts them into separate
helper functions that are now called by the HostdevDef
parser/formatter, and will soon be called by the NetDef
parser/formatter.
One change in behavior - previously virDomainHostdevDefParseXML() had
diverged from current common coding practice by logging an error and
failing if it found any subelements of <hostdev> other than those it
understood (standard libvirt practice is to ignore/discard unknown
elements and attributes during parse). The new helper function ignores
unknown elements, and thus so does the new
virDomainHostdevDefParseXML.
In order to allow for a virDomainHostdevDef that uses the
virDomainDeviceInfo of a "higher level" device (such as a
virDomainNetDef), this patch changes the virDomainDeviceInfo in the
HostdevDef into a virDomainDeviceInfoPtr. Rather than adding checks
all over the code to check for a null info, we just guarantee that it
is always valid. The new function virDomainHostdevDefAlloc() allocates
a virDomainDeviceInfo and plugs it in, and virDomainHostdevDefFree()
makes sure it is freed.
There were 4 places allocating virDomainHostdevDefs, all of them
parsers of one sort or another, and those have all had their
VIR_ALLOC(hostdev) changed to virDomainHostdevDefAlloc(). Other than
that, and the new functions, all the rest of the changes are just
mechanical removals of "&" or changing "." to "->".
There will be cases where the iterator callback will need to know the
type of the device whose info is being operated on, and possibly even
need to use some of the device's config. This patch adds a
virDomainDeviceDefPtr to the args of every callback, and fills it in
appropriately as the devices are iterated through.
The virDomainDeviceInfoPtrs in qemuCollectPCIAddress and
qemuComparePCIDevice are named "dev" and "dev1", but those functions
will be changed (in order to match a change in the args sent to
virDomainDeviceInfoIterate() callback args) to contain a
virDomainDeviceDefPtr device.
This patch renames "dev" to "info" (and "dev[n]" to "info[n]") to
avoid later confusion.
This patch is only code movement + adding some forward definitions of
typedefs.
virDomainHostdevDef (not just a pointer to it, but an actual object)
will be needed in virDomainNetDef and virDomainActualNetDef, so it
must be relocated earlier in the file.
Likewise, virDomainDeviceDef will be needed in virDomainHostdevDef, so
it must be moved up even earlier. This, in turn, creates a forward
reference problem, but fortunately only with pointers to other device
types, so their typedefs can be moved up in the file, eliminating the
problem.
Not all device types were represented in virDomainDeviceType, so some
types of devices couldn't be represented in a virDomainDeviceDef
(which requires a different type of pointer in the union for each
different kind of device).
Since serial, parallel, channel, and console devices are all
virDomainChrDef, and the virDomainDeviceType is never used to produce
a string from the type (and only used in the other direction
internally to code, never to produce XML), I only added one "CHR"
type, which is associated with "virDomainChrDefPtr chr" in the union.
Commit 723d5c (added after the release of 0.9.10) adds a
NetlinkEventClient for each interface sent to
virNetDevMacVLanCreateWithVPortProfile. This should only be done if
the interface actually *has* a virtPortProfile, otherwise the event
handler would be a NOP. The bigger problem is that part of the setup
to create the NetlinkEventClient is to do a memcpy of virtPortProfile
- if it's NULL, this triggers a segv.
This patch just qualifies the code that adds the client - if
virtPortProfile is NULL, it's skipped.
Qemu supports sizing by bytes; we shouldn't force the user to
round up if they really wanted an unaligned total size.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_RESIZE_BYTES):
New flag.
* src/libvirt.c (virDomainBlockResize): Document it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockResize): Take
size in bytes.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextBlockResize):
Likewise. Pass bytes, not megabytes, to monitor.
* src/qemu/qemu_driver.c (qemuDomainBlockResize): Implement new
flag.
No matter what cache mode is used, readonly disks are always safe wrt
migration. Shared disks are required to be readonly or to disable
host-side cache, which makes them safe as well.
A multi-threaded client with event loop may crash if one of its threads
closes a connection while event loop is in the middle of sending
keep-alive message (either request or response). The right place for it
is inside virNetClientIOEventLoop() between poll() and
virNetClientLock(). We should only close a connection directly if no-one
is using it and defer the closing to the last user otherwise. So far we
only did so if the close was initiated by keep-alive timeout.
Building virt-aa-helper with dtrace probes enabled, ldd complained about
undefined references:
./.libs/libvirt_util.a(libvirt_util_la-event_poll.o):(.note.stapsdt+0x24):
undefined reference to `libvirt_event_poll_purge_timeout_semaphore'
...
Lets say I got a volume with '1G' allocation and '10G' capacity. The
available space in the parent pool is '5G'. With the current check for
overcapacity, I can only try to resize to <= '6G'. You see the problem?
With an additional new bool added to determine whether or not to
discourage the use of the supplied MAC address by the bridge itself,
virNetDevTapCreateInBridgePort had three booleans (well, 2 bools and
an int used as a bool) in the arg list, which made it increasingly
difficult to follow what was going on. This patch combines those three
into a single flags arg, which not only shortens the arg list, but
makes it more self-documenting.
When a tap device for a domain is created and attached to a bridge,
the first byte of the tap device MAC address is set to 0xFE, while the
rest is set to match the MAC address that will be presented to the
guest as its network device MAC address. Setting this high value in
the tap's MAC address discourages the bridge from using the tap
device's MAC address as the bridge's own MAC address (Linux bridges
always take on the lowest numbered MAC address of all attached devices
as their own).
In one case within libvirt, a tap device is created and attached to
the bridge with the intent that its MAC address be taken on by the
bridge as its own (this is used to assure that the bridge has a fixed
MAC address to prevent network outages created by the bridge MAC
address "flapping" as guests are started and stopped). In this case,
the first byte of the mac address is *not* altered to 0xFE.
In the current code, callers to virNetDevTapCreateInBridgePort each
make the MAC address modification themselves before calling, which
leads to code duplication, and also prevents lower level functions
from knowing the real MAC address being used by the guest. The problem
here is that openvswitch bridges must be informed about this MAC
address, or they will be unable to pass traffic to/from the guest.
This patch centralizes the location of the MAC address "0xFE fixup"
into virNetDevTapCreateInBridgePort(), meaning 1) callers of this
function no longer need the extra strange bit of code, and 2)
bitNetDevTapCreateBridgeInPort itself now is called with the guest's
unaltered MAC address, and can pass it on, unmodified, to
virNetDevOpenvswitchAddPort.
There is no other behavioral change created by this patch.
Nuke the last vestiges of printing pid_t values with the wrong
types, at least in code compiled on mingw64. There may be other
places, but for now they are only compiled on systems where the
existing %d doesn't trigger gcc warnings.
* src/rpc/virnetsocket.c (virNetSocketNew): Use %lld and casting,
rather than assuming any particular int type for pid_t.
* src/util/command.c (virCommandRunAsync, virPidWait)
(virPidAbort): Likewise.
(verify): Drop a now stale assertion.
No thanks to 64-bit windows, with 64-bit pid_t, we have to avoid
constructs like 'int pid'. Our API in libvirt-qemu cannot be
changed without breaking ABI; but then again, libvirt-qemu can
only be used on systems that support UNIX sockets, which rules
out Windows (even if qemu could be compiled there) - so for all
points on the call chain that interact with this API decision,
we require a different variable name to make it clear that we
audited the use for safety.
Adding a syntax-check rule only solves half the battle; anywhere
that uses printf on a pid_t still needs to be converted, but that
will be a separate patch.
* cfg.mk (sc_correct_id_types): New syntax check.
* src/libvirt-qemu.c (virDomainQemuAttach): Document why we didn't
use pid_t for pid, and validate for overflow.
* include/libvirt/libvirt-qemu.h (virDomainQemuAttach): Tweak name
for syntax check.
* src/vmware/vmware_conf.c (vmwareExtractPid): Likewise.
* src/driver.h (virDrvDomainQemuAttach): Likewise.
* tools/virsh.c (cmdQemuAttach): Likewise.
* src/remote/qemu_protocol.x (qemu_domain_attach_args): Likewise.
* src/qemu_protocol-structs (qemu_domain_attach_args): Likewise.
* src/util/cgroup.c (virCgroupPidCode, virCgroupKillInternal):
Likewise.
* src/qemu/qemu_command.c(qemuParseProcFileStrings): Likewise.
(qemuParseCommandLinePid): Use pid_t for pid.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* src/conf/domain_conf.h (_virDomainObj): Likewise.
* src/probes.d (rpc_socket_new): Likewise.
* src/qemu/qemu_command.h (qemuParseCommandLinePid): Likewise.
* src/qemu/qemu_driver.c (qemudGetProcessInfo, qemuDomainAttach):
Likewise.
* src/qemu/qemu_process.c (qemuProcessAttach): Likewise.
* src/qemu/qemu_process.h (qemuProcessAttach): Likewise.
* src/uml/uml_driver.c (umlGetProcessInfo): Likewise.
* src/util/virnetdev.h (virNetDevSetNamespace): Likewise.
* src/util/virnetdev.c (virNetDevSetNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* src/conf/storage_conf.h (_virStoragePerms): Use mode_t, uid_t,
and gid_t rather than int.
* src/security/security_dac.c (virSecurityDACSetOwnership): Likewise.
* src/conf/storage_conf.c (virStorageDefParsePerms): Avoid
compiler warning.
Commit 7c90026 added #include "conf/domain_conf.h" to
util/virrandom.c. Fortunately it didn't actually use anything from
domain_conf.h, since as far as I'm aware, files in util aren't allowed
to reference anything in conf (although the opposite is allowed). So
this #include is unnecessary.
I verified it still compiles with the line removed, but have placed a
one day moratorium on me doing any "trivial rule" pushes, so will
wait for someone else to verify/ACK before pushing.
This actually wires up the new optional parameter to block_stream:
http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI
The error checking is still sparse, since libvirt must not use
qemu-img or header probing on a qcow2 file in use by qemu to
check if the backing file name is valid; so for now, libvirt is
relying on qemu to diagnose an incorrect backing name. Fixing this
will require libvirt to track the entire backing file chain at the
time qemu is started and keeps it updated with snapshot and pull
operations.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Add
parameter, and update callers.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob): Update
signature.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Update caller.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Likewise.
Block job commands are not part of upstream qemu until 1.1; and
proper support of job completion and cancellation depends on being
able to receive QMP events, which implies the JSON monitor.
Additionally, some early versions of block job commands were
backported to RHEL qemu, but these versions lacked asynchronous
job cancellation and partial block pull, so there are several
patches that will still be needed in this area of libvirt code
to support both flavors of block job commands.
Due to earlier patches in libvirt, we are guaranteed that all versions
of qemu that support block job commands already require libvirt to
use the JSON monitor. That means that the text version of block jobs
will not be used, and having to refactor two copies of the block job
handlers makes no sense. So instead, we delete the text handlers.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Drop text monitor
support.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextBlockJob): Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextParseBlockJobOne)
(qemuMonitorTextParseBlockJob, qemuMonitorTextBlockJob):
Likewise.
Add de-association handling for 802.1qbg (vepa) via lldpad
netlink messages. Also adds the possibility to perform an
association request without waiting for a confirmation.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
This code adds a netlink event interface to libvirt.
It is based upon the event_poll code and makes use of
it. An event is generated for each netlink message sent
to the libvirt pid.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
In qemu there are 2 cpu models (cpu64-rhel5 and cpu64-rhel6) not
supported by libvirt. This patch adds the support with the flags
specifications from /usr/share/qemu-kvm/cpu-model/cpu-x86_64.conf
The only difference is that AMD-specific features are removed so
the processor type is not vendor-specific. Those features are either
emulated or ignored by qemu if host CPU doesn't support them.
This call to virDomainDeviceDefParse is both unnecessary (since
it will again be called at the top of the immediately following if(),
and if not there, then at the top of the if following that), but it
also creates a leak of one virDomainDeviceDef and one [whatever type
of device the DeviceDef is pointing to; probably a virDomainDiskDef]
in the case that the function has been called with
VIR_DOMAIN_DEVICE_MODIFY_CONFIG (the second parse will overwrite the
devicedef that was just created).
For any disk controller model which is not "lsilogic", the command
line will be like:
-drive file=/dev/sda,if=none,id=drive-scsi0-0-3-0,format=raw \
-device scsi-disk,bus=scsi0.0,channel=0,scsi-id=3,lun=0,i\
drive=drive-scsi0-0-3-0,id=scsi0-0-3-0
The relationship between the libvirt address attrs and the qdev
properties are (controller model is not "lsilogic"; strings
inside <> represent libvirt adress attrs):
bus=scsi<controller>.0
channel=<bus>
scsi-id=<target>
lun=<unit>
* src/qemu/qemu_command.h: (New param "virDomainDefPtr def"
for function qemuBuildDriveDevStr; new param "virDomainDefPtr
vmdef" for function qemuAssignDeviceDiskAlias. Both for
virDomainDiskFindControllerModel's use).
* src/qemu/qemu_command.c:
- New param "virDomainDefPtr def" for qemuAssignDeviceDiskAliasCustom.
For virDomainDiskFindControllerModel's use, if the disk bus is "scsi"
and the controller model is not "lsilogic", "target" is one part of
the alias name.
- According change on qemuAssignDeviceDiskAlias and qemuBuildDriveDevStr
* src/qemu/qemu_hotplug.c:
- Changes to be consistent with declarations of qemuAssignDeviceDiskAlias
qemuBuildDriveDevStr, and qemuBuildControllerDevStr.
* tests/qemuxml2argvdata/qemuxml2argv-pseries-vio-user-assigned.args,
tests/qemuxml2argvdata/qemuxml2argv-pseries-vio.args: Update the
generated command line.
* src/conf/domain_conf.h: Add new member "target" to struct
_virDomainDeviceDriveAddress.
* src/conf/domain_conf.c: Parse and format "target"
* Lots of tests (.xml) in tests/domainsnapshotxml2xmlout,
tests/qemuxml2argvdata, tests/qemuxml2xmloutdata, and
tests/vmx2xmldata/ are modified for newly introduced
attribute "target" for address of "drive" type.
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
In qemuDomainAttachNetDevice, the guest's tap interface has only been
attached to the bridge if iface_connected is true. It's possible for
an error to occur prior to that happening, and previously we would
attempt to remove the tap interface from the bridge even if it hadn't
been attached.
QMP commands don't need to be escaped since converting them to json
also escapes special characters. When a QMP command fails, however,
libvirt falls back to HMP commands. These fallback functions
(qemuMonitorText*) do their own escaping, and pass the result directly
to qemuMonitorHMPCommandWithFd. If the monitor is in json mode, these
pre-escaped commands will be escaped again when converted to json,
which can result in the wrong arguments being sent.
For example, a filename test\file would be sent in json as
test\\file.
This prevented attaching an image file with a " or \ in its name in
qemu 1.0.50, and also broke rbd attachment (which uses backslashes to
escape some internal arguments.)
Reported-by: Masuko Tomoya <tomoya.masuko@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch fixes console corruption, that happens if two concurrent
sessions are opened for a single console on a domain. Result of this
corruption was that each of the console streams recieved just a part
of the data written to the pipe so every console rendered unusable.
New helper function for safe console handling is used to establish the
console stream connection. This function ensures that no other libvirt
client is using the console (with the ability to disconnect consoles of
libvirt clients) and that no UUCP style lockfile is placed on the PTY
device.
* src/qemu/qemu_domain.h
- add data structure to domain's private data dealing with
console connections
* src/qemu/qemu_domain.c:
- allocate/free domain's console data structure
* src/qemu/qemu_driver.c
- use the new helper function for console handling
This patch adds a set of functions used in creating console streams for
domains using PTYs and ensures mutually exclusive access to the PTYs.
If mutually exclusive access is not used, two clients may open the same
console, which results in corruption on both clients as both of them
race to read data from the PTY.
Two approaches are used to ensure this:
1) Internal data structure holding open PTYs.
This is used internally and enables the user to forcibly
terminate another console connection eg. when somebody leaves
the console open on another host.
2) UUCP style lock files:
This uses UUCP lock files according to the FHS
( http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLOCKLOCKFILES )
to check if other programs (like minicom) are not using the pty
device of the console.
This feature is disabled by default and may be enabled using
configure parameter
--with-console-lock-files=/path/to/lock/file/directory
or --with-console-lock-files=auto (which tries to infer the
location from OS used (currently only linux).
On usual linux systems, normal users may not write to the
/var/lock directory containing the locks. This poses problems
while in session mode. If the current user has no access to the
lockfile directory, check for presence of the file is still
done, but no lock file is created. This does NOT result in an
error.
This patch adds another callback to a FDstream object. The original
callback is used by the daemon stream driver to handle events.
This callback is called if and only if the stream is about to be closed.
This might be used to handle cleanup steps after a fdstream exits. This
will be used later on in ensuring mutually exclusive access to consoles.
* src/fdstream.c:
- emit the callback, when stream is being closed
- add data structures needed to handle the callback
- add function to register callback
* src/fdstream.h:
- define function prototypes for the callback
This patch causes the fdstream driver to call the stream event callback
if virStreamAbort() is called on a stream using this driver.
A remote handler for a stream can only detect changes via stream events,
so this event callback is necessary in order to enable a daemon to abort
a stream in such a way that the client will see the change.
* src/fdstream.c:
- modify close function to call stream event callback
This patch adds a set of flags to be used with the virDomainOpenConsole
API call to specify if the user wishes to interrupt an existing console
session or just to try open a new one.
VIR_DOMAIN_CONSOLE_SAFE - specifies that the console connection should
be opened only if the hypervisor supports
mutually exclusive access to console devices
VIR_DOMAIN_CONSOLE_FORCE - specifies that the caller wishes to interrupt
existing session and force a creation of a
new one.
This patch changes behavior of virPidFileRead to enable passing NULL as
path to the binary the pid file should be checked against to skip this
check. This enables using this function for reading files that have same
semantics as pid files, but belong to unknown processes.
using 'system-wakeup' monitor command. It is supported only in JSON,
as we are enabling it if possible. Moreover, this command is available
in qemu-1.1+ which definitely has JSON.
Function xmlParseURI does not remove square brackets around IPv6
address when parsing. One of the solutions is making wrappers around
functions working with xmlURI*. This assures that uri->server will be
always properly assigned and it doesn't have to be changed when used
on some new place in the code.
For this purpose, functions virParseURI and virSaveURI were
added. These function are wrappers around xmlParseURI and xmlSaveUri
respectively.
Also there is one new syntax check function to prohibit these functions
anywhere else.
File changes:
- src/util/viruri.h -- declaration
- src/util/viruri.c -- definition
- src/libvirt_private.syms -- symbol export
- src/Makefile.am -- added source and header files
- cfg.mk -- added sc_prohibit_xmlURI
- all others -- ID name and include fixes
The /usr/include/python/pyconfig.h file pollutes the global
namespace with a huge number of HAVE_XXX and WITH_XXX
defines. These change what we detected in our own config.h
In particular if you try to build without DTrace, python's
headers turn it back on with predictable fail.
THe hack to workaround this is to rename WITH_DTRACE to
WITH_DTRACE_PROBES to avoid the namespace clash
It's possible to disable SPICE TLS in qemu.conf. When this happens,
libvirt ignores any SPICE TLS port or x509 directory that may have
been set when it builds the qemu command line to use. However, it's
not ignoring the secure channels that may have been set and adds
tls-channel arguments to qemu command line.
Current qemu versions don't report an error when this happens, and try to use
TLS for the specified channels.
Before this patch
<domain type='kvm'>
<name>auto-tls-port</name>
<memory>65536</memory>
<os>
<type arch='x86_64' machine='pc'>hvm</type>
</os>
<devices>
<graphics type='spice' port='5900' tlsPort='-1' autoport='yes' listen='0' ke
<listen type='address' address='0'/>
<channel name='main' mode='secure'/>
<channel name='inputs' mode='secure'/>
</graphics>
</devices>
</domain>
generates
-spice port=5900,addr=0,disable-ticketing,tls-channel=main,tls-channel=inputs
and starts QEMU.
After this patch, an error is reported if a TLS port is set in the XML
or if secure channels are specified but TLS is disabled in qemu.conf.
This is the behaviour the oVirt people (where I spotted this issue) said
they would expect.
This fixes bug #790436
This patch adds support for vmx files with empty networkName
values (which is the case for vmx generated by Workstation).
It also adds support for vmx containing NATed network interfaces.
Update test suite accordingly
[forwarding this here from RH bug #796732]
When creating a network (virsh net-create) with an erroneous XML
containing an empty <name> element, the error message is misleading:
error: Failed to create network from foo.xml
error: missing domain name information
It took me a bit of time to figure out that it was the *network* name
that was missing (I generate this xml and didn't look at it, first).
I realized that the same message is used for missing name when creating
a domain, network, or device node.
https://bugzilla.redhat.com/show_bug.cgi?id=795656 mentions
that a graceful destroy request can time out, meaning that the
error message is user-visible and should be more appropriate
than just internal error.
* src/qemu/qemu_driver.c (qemuDomainDestroyFlags): Swap error type.
Migrating domains with disks using cache != none is unsafe unless the
disk images are stored on coherent clustered filesystem. Thus we forbid
migrating such domains unless VIR_MIGRATE_UNSAFE flags is used.
This patch adds VIR_MIGRATE_UNSAFE flag for migration APIs and new
VIR_ERR_MIGRATION_UNSAFE error code. The error code should be returned
whenever migrating a domain is considered unsafe (e.g., it's configured
in a way that does not ensure data integrity once it is migrated).
VIR_MIGRATE_UNSAFE flag may be used to force migration even though it
would normally be considered unsafe and forbidden.
AC_CHECK_PROG checks for program in given path. However, if it doesn't
exists, [variable] is set to [value-if-not-found]. We don't want this
to be the empty string in case of 'modprobe' and 'scrub' as we want to
fallback to runtime detection.
Adding "Expect:" to the header list stops libcurl from sending a
Expect header at all.
Before, a dummy Expect header was added that might confuse HTTP
proxies and result in HTTP error code 417 being reported.
Previously we would have:
"os type 'hvm' & arch 'idontexist' combination is not supported"
Now we get
"No guest options available for arch 'idontexist'"
or if options available but guest OS type not applicable:
"No os type 'xen' available for arch 'x86_64'"
* src/util/virfile.h: the virFileWrapperFdFlags being defined as
a globa variable instead of a type ended up generating a duplicate
symbol error.
* AUTHORS: added Lincoln Myers
* src/qemu/qemu_process.c (qemuFindAgentConfig): avoid crash libvirtd due to
deref a NULL pointer.
* How to reproduce?
1. virsh edit the following xml into guest configuration:
<channel type='pty'>
<target type='virtio'/>
</channel>
2. virsh start <domain>
or
% virt-install -n foo -r 1024 --disk path=/var/lib/libvirt/images/foo.img,size=1 \
--channel pty,target_type=virtio -l <installation tree>
Signed-off-by: Alex Jia <ajia@redhat.com>
When migrating a qemu domain, we enter the monitor, send some commands,
try to connect to destination qemu, send other commands, end exit the
monitor. However, if we couldn't connect to destination qemu we forgot
to exit the monitor.
Bug introduced by commit d9d518b1c8.
In case libvirtd cannot detect host CPU model (which may happen if it
runs inside a virtual machine), the daemon is likely to segfault when
starting a new qemu domain. It segfaults when domain XML asks for host
(either model or passthrough) CPU or does not ask for any specific CPU
model at all.
Currently, if scrub (used for wiping algorithms) is not present
at compile time, we don't support any other wiping algorithms than
zeroing, even if it was installed later. Switch to runtime detection
instead.
Bug introduced in commit 35abced. On an inactive domain,
$ virsh snapshot-create-as dom snap
$ virsh snapshot-create dom
$ virsh snapshot-create dom
$ virsh snapshot-delete --children dom snap
could crash libvirtd, due to a use-after-free that results
when the callback freed the current element in the iteration.
* src/conf/domain_conf.c (virDomainSnapshotForEachChild)
(virDomainSnapshotActOnDescendant): Allow iteration to delete
current child.
This patch allows libvirt to add interfaces to already
existing Open vSwitch bridges. The following syntax in
domain XML file can be used:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'/>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
or if libvirt should auto-generate the interfaceid use
following syntax:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
It is also possible to pass an optional profileid. To do that
use following syntax:
<interface type='bridge'>
<source bridge='ovsbr'/>
<mac address='00:55:1a:65:a2:8d'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'
profileid='test-profile'/>
</virtualport>
</interface>
To create Open vSwitch bridge install Open vSwitch and
run the following command:
ovs-vsctl add-br ovsbr
The current default method of terminating the qemu process is to send
a SIGTERM, wait for up to 1.6 seconds for it to cleanly shutdown, then
send a SIGKILL and wait for up to 1.4 seconds more for the process to
terminate. This is problematic because occasionally 1.6 seconds is not
long enough for the qemu process to flush its disk buffers, so the
guest's disk ends up in an inconsistent state.
Since this only occasionally happens when the timeout prior to SIGKILL
is 1.6 seconds, this patch increases that timeout to 10 seconds. At
the very least, this should reduce the occurrence from "occasionally"
to "extremely rarely". (Once SIGKILL is sent, it waits another 5
seconds for the process to die before returning).
Note that in the cases where it takes less than this for qemu to
shutdown cleanly, libvirt will *not* wait for any longer than it would
without this patch - qemuProcessKill polls the process and returns as
soon as it is gone.
This patch is based on an earlier patch by Eric Blake which was never
committed:
https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html
Aside from rebasing, this patch only drops the driver lock once (prior
to the first time the function sleeps), then leaves it dropped until
it returns (Eric's patch would drop and re-acquire the lock around
each call to sleep).
At the time Eric sent his patch, the response (from Dan Berrange) was
that, while it wasn't a good thing to be holding the driver lock while
sleeping, we really need to rethink locking wrt the driver object,
switching to a finer-grained approach that locks individual items
within the driver object separately to allow for greater concurrency.
This is a good plan, and at the time it made sense to not apply the
patch because there was no known bug related to the driver lock being
held in this function.
However, we now know that the length of the wait in qemuProcessKill is
sometimes too short to allow the qemu process to fully flush its disk
cache before SIGKILL is sent, so we need to lengthen the timeout (in
order to improve the situation with management applications until they
can be updated to use the new VIR_DOMAIN_DESTROY_GRACEFUL flag added
in commit 72f8a7f197). But, if we
lengthen the timeout, we also lengthen the amount of time that all
other threads in libvirtd are essentially blocked from doing anything
(since just about everything needs to acquire the driver lock, if only
for long enough to get a pointer to a domain).
The solution is to modify qemuProcessKill to drop the driver lock
while sleeping, as proposed in Eric's patch. Then we can increase the
timeout with a clear conscience, and thus at least lower the chances
that someone running with existing management software will suffer the
consequence's of qemu's disk cache not being flushed.
In the meantime, we still should work on Dan's proposal to make
locking within the driver object more fine grained.
(NB: although I couldn't find any instance where qemuProcessKill() was
called with no jobs active for the domain (or some other guarantee
that the current thread had at least one refcount on the domain
object), this patch still follows Eric's method of temporarily adding
a ref prior to unlocking the domain object, because I couldn't
convince myself 100% that this was the case.)
In the future (my next patch in fact) we may want to make
decisions depending on qemu having a monitor command or not.
Therefore, we want to set qemuCaps flag instead of querying
on the monitor each time we are about to make that decision.
When blkdeviotune was first committed in 0.9.8, we had the limitation
that setting one value reset all others. But bytes and iops should
be relatively independent. Furthermore, setting tuning values on
a live domain followed by dumpxml did not output the new settings.
* src/qemu/qemu_driver.c (qemuDiskPathToAlias): Add parameter, and
update callers.
(qemuDomainSetBlockIoTune): Don't lose previous unrelated
settings. Make live changes reflect to dumpxml output.
* tools/virsh.pod (blkdeviotune): Update documentation.
Detected by valgrind. Leaks are introduced in commit c1b2264.
* src/remote/remote_driver.c (doRemoteOpen): free client program memory in failure path.
* How to reproduce?
% valgrind -v --leak-check=full virsh -c qemu:
* Actual result
==3969== 40 bytes in 1 blocks are definitely lost in loss record 8 of 28
==3969== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==3969== by 0x4C89C41: virAlloc (memory.c:101)
==3969== by 0x4D5A236: virNetClientProgramNew (virnetclientprogram.c:60)
==3969== by 0x4D47AB4: doRemoteOpen (remote_driver.c:658)
==3969== by 0x4D49FFF: remoteOpen (remote_driver.c:871)
==3969== by 0x4D13373: do_open (libvirt.c:1196)
==3969== by 0x4D14535: virConnectOpenAuth (libvirt.c:1422)
==3969== by 0x425627: main (virsh.c:18537)
==3969==
==3969== 40 bytes in 1 blocks are definitely lost in loss record 9 of 28
==3969== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==3969== by 0x4C89C41: virAlloc (memory.c:101)
==3969== by 0x4D5A236: virNetClientProgramNew (virnetclientprogram.c:60)
==3969== by 0x4D47AD7: doRemoteOpen (remote_driver.c:664)
==3969== by 0x4D49FFF: remoteOpen (remote_driver.c:871)
==3969== by 0x4D13373: do_open (libvirt.c:1196)
==3969== by 0x4D14535: virConnectOpenAuth (libvirt.c:1422)
==3969== by 0x425627: main (virsh.c:18537)
==3969==
==3969== LEAK SUMMARY:
==3969== definitely lost: 80 bytes in 2 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
The auto-generated WWN comply with the new addressing schema of WWN:
<quote>
the first nibble is either hex 5 or 6 followed by a 3-byte vendor
identifier and 36 bits for a vendor-specified serial number.
</quote>
We choose hex 5 for the first nibble. And for the 3-bytes vendor ID,
we uses the OUI according to underlying hypervisor type, (invoking
virConnectGetType to get the virt type). e.g. If virConnectGetType
returns "QEMU", we use Qumranet's OUI (00:1A:4A), if returns
ESX|VMWARE, we use VMWARE's OUI (00:05:69). Currently it only
supports qemu|xen|libxl|xenapi|hyperv|esx|vmware drivers. The last
36 bits are auto-generated.
Some audit records generated by libvirt contain fields enclosed by single
quotes. Since those fields are inside the msg field, which is enclosed by
single quotes, these records generated by libvirt are not correctly parsed by
libauparse.
Some tools, such as virt-manager, prefers having the default USB
controller explicit in the XML document. This patch makes sure there
is one. With this patch, it is now possible to switch from USB1 to
USB2 from the release 0.9.1 of virt-manager.
Fix tests to pass with this change.
virsh blkiotune dom --device-weights /dev/sda,400 --config
wasn't working correctly.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Use
correct definition.
Now that no one is relying on the return value being a pointer to
somewhere inside of the passed-in argument, we can simplify the
callers to simply return success or failure. Also wrap some long
lines and add some const-correctness.
* src/util/sysinfo.c (virSysinfoParseBIOS, virSysinfoParseSystem)
(virSysinfoParseProcessor, virSysinfoParseMemory): Change return.
(virSysinfoRead): Adjust caller.
Reported by Alex Jia:
==21503== 112 (32 direct, 80 indirect) bytes in 1 blocks are
definitely lost in loss record 37 of 40
==21503== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==21503== by 0x4A8991: virAlloc (memory.c:101)
==21503== by 0x505A6C: x86DataCopy (cpu_x86.c:247)
==21503== by 0x507B34: x86Compute (cpu_x86.c:1225)
==21503== by 0x43103C: qemuBuildCommandLine (qemu_command.c:3561)
==21503== by 0x41C9F7: testCompareXMLToArgvHelper
(qemuxml2argvtest.c:183)
==21503== by 0x41E10D: virtTestRun (testutils.c:141)
==21503== by 0x41B942: mymain (qemuxml2argvtest.c:705)
==21503== by 0x41D7E7: virtTestMain (testutils.c:696)
In case the caller specifies that confined guests are required but the
security driver turns out to be 'none', we should return an error since
this driver clearly cannot meet that requirement. As a result of this
error, libvirtd fails to start when the host admin explicitly sets
confined guests are required but there is no security driver available.
Since security driver 'none' cannot create confined guests, we override
default confined setting so that hypervisor drivers do not thing they
should create confined guests.
Security label type 'none' requires relabel to be set to 'no' so there's
no reason to output this extra attribute. Moreover, since relabel is
internally stored in a negative from (norelabel), the default value for
relabel would be 'yes' in case there is no <seclabel> element in domain
configuration. In case VIR_DOMAIN_SECLABEL_DEFAULT turns into
VIR_DOMAIN_SECLABEL_NONE, we would incorrectly output relabel='yes' for
seclabel type 'none'.
Qemu uses non-blocking I/O which doesn't play nice with regular file
descriptors. We need to pass a pipe to qemu instead, which can easily be
done using iohelper.
virFileDirectFd was used for accessing files opened with O_DIRECT using
libvirt_iohelper. We will want to use the helper for accessing files
regardless on O_DIRECT and thus virFileDirectFd was generalized and
renamed to virFileWrapperFd.
dmidecode displays processor information, followed by BIOS, system and
memory-DIMM details.
Calls to virSysinfoParseBIOS(), virSysinfoParseSystem() would update
the buffer pointer 'base', so the processor information would be lost
before virSysinfoParseProcessor() was called. Sysinfo would therefore
not be able to display processor details -- It only described <bios>,
<system> and <memory_device> details.
This patch attempts to insulate sysinfo from ordering of dmidecode
output.
Before the fix:
---------------
virsh # sysinfo
<sysinfo type='smbios'>
<bios>
....
</bios>
<system>
....
</system>
<memory_device>
....
</memory_device>
After the fix:
-------------
virsh # sysinfo
<sysinfo type='smbios'>
<bios>
....
</bios>
<system>
....
</system>
<processor>
....
</processor>
<memory_device>
....
</memory_device>
Input to the volume cloning code is a source volume and an XML
descriptor for the new volume. It is possible for the new volume
to have a greater size than source volume, at which point libvirt
will just stick 0s on the end of the new image (for raw format
anyways).
Unfortunately a logic error messed up our tracking of the of the
excess amount that needed to be written: end result is that sparse
clones were made very much non-sparse, and cloning regular disk
images could end up excessively sized (though data unaltered).
Drop the 'remain' variable entriely here since it's redundant, and
track actual allocation directly against the desired 'total'.
gcc 4.7 complains:
util/virhashcode.c:49:17: error: always_inline function might not be inlinable [-Werror=attributes]
util/virhashcode.c:35:17: error: always_inline function might not be inlinable [-Werror=attributes]
Normal 'inline' is a hint that the compiler may ignore; the fact
that the function is static is good enough. We don't care if the
compiler decided not to inline after all.
* src/util/virhashcode.c (getblock, fmix): Relax attribute.
On CentOS5:
If "virsh edit $DOM" is used and an error happens (for example changing
any live cycle action to a non-existing value), libvirt forgets that
$DOM exists, since it is already removed from the internal hash tables,
which are used for domain lookup.
In once case (unreproducible) even the persistent configuration
/etc/xen/$DOM was deleted.
Instead of using the compound function xenXMConfigSaveFile() explicitly
use xenFomatXM() and virConfWriteFile() to distinguish between a failure
in converting the libvirt definition to the xen-xm format and a problem
when writing the file.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Commit b170eb99 introduced a bug: domains that had an explicit
<seclabel type='none'/> when started would not be reparsed if
libvirtd restarted. It turns out that our testsuite was not
exercising this because it never tried anything but inactive
parsing. Additionally, the live XML for such a domain failed
to re-validate. Applying just the tests/ portion of this patch
will expose the bugs that are fixed by the other two files.
* docs/schemas/domaincommon.rng (seclabel): Allow relabel under
type='none'.
* src/conf/domain_conf.c (virSecurityLabelDefParseXML): Per RNG,
presence of <seclabel> with no type implies dynamic. Don't
require sub-elements for type='none'.
* tests/qemuxml2xmltest.c (mymain): Add test.
* tests/qemuxml2argvtest.c (mymain): Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-none.xml: Add file.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-none.args: Add file.
Reported by Ansis Atteka.
On CentOS5 with xen-3.0.3:
Program received signal SIGSEGV, Segmentation fault.
virFree (ptrptr=0x8) at util/memory.c:310
310 free(*(void**)ptrptr);
(gdb) bt
#0 virFree (ptrptr=0x8) at util/memory.c:310
#1 0x00002aaaaae167c8 in xenXMDomainDefineXML (conn=0x694e80, xml=0x6b2ce0 "P\fk") at xen/xm_internal.c:1199
#2 0x00002aaaaae070d7 in xenUnifiedDomainDefineXML (conn=0x8,
xml=0x6ac040 "<domain type='xen'>\n <name>pv</name>\n <uuid>20291bc0-453a-4d6c-c6ac-4e5af63b932c</uuid>\n <memory>1048576</memory>\n <currentMemory>1048576</currentMemory>\n <vcpu>1</vcpu>\n <os>\n <type arch='x8"...) at xen/xen_driver.c:1524
#3 0x00002aaaaada7803 in virDomainDefineXML (conn=0x694e80,
xml=0x6ac040 "<domain type='xen'>\n <name>pv</name>\n <uuid>20291bc0-453a-4d6c-c6ac-4e5af63b932c</uuid>\n <memory>1048576</memory>\n <currentMemory>1048576</currentMemory>\n <vcpu>1</vcpu>\n <os>\n <type arch='x8"...) at libvirt.c:7823
#4 0x0000000000426173 in cmdEdit (ctl=0x7fffffffb8e0, cmd=<value optimized out>) at virsh.c:14882
#5 0x000000000041c9ce in vshCommandRun (ctl=0x7fffffffb8e0, cmd=0x658c50) at virsh.c:17712
#6 0x000000000042c3b9 in main (argc=1, argv=<value optimized out>) at virsh.c:19317
Signed-off-by: Philipp Hahn <hahn@univention.de>
Calling qemuDomainMigrateGraphicsRelocate notifies spice clients to
connect to destination qemu so that they can seamlessly switch streams
once migration is done. Unfortunately, current qemu is not able to
accept any connections while incoming migration connection is open.
Thus, we need to delay opening the migration connection to the point
spice client is already connected to the destination qemu.
Unlike .cvsignore under CVS, git allows for ignoring nested
names. We weren't very consistent where new tests were
being ignored (some in .gitignore, some in tests/.gitignore),
and I found it easier to just consolidate everything.
* .gitignore: Subsume entries from subdirectories.
* daemon/.gitignore: Delete.
* docs/.gitignore: Likewise.
* docs/devhelp/.gitignore: Likewise.
* docs/html/.gitignore: Likewise.
* examples/dominfo/.gitignore: Likewise.
* examples/domsuspend/.gitignore: Likewise.
* examples/hellolibvirt/.gitignore: Likewise.
* examples/openauth/.gitignore: Likewise.
* examples/domain-events/events-c/.gitignore: Likewise.
* include/libvirt/.gitignore: Likewise.
* src/.gitignore: Likewise.
* src/esx/.gitignore: Likewise.
* tests/.gitignore: Likewise.
* tools/.gitignore: Likewise.
This eliminates the warning message reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=624447
It was caused by a failure to open an image file that is not
accessible by root (the uid libvirtd is running as) because it's on a
root-squash NFS share, owned by a different user, with permissions of
660 (or maybe 600).
The solution is to use virFileOpenAs() rather than open(). The
codepath that generates the error is during qemuSetupDiskCGroup(), but
the actual open() is in a lower-level generic function called from
many places (virDomainDiskDefForeachPath), so some other pieces of the
code were touched just to add dummy (or possibly useful) uid and gid
arguments.
Eliminating this warning message has the nice side effect that the
requested operation may even succeed (which in this case isn't
necessary, but shouldn't hurt anything either).
virFileOpenAs previously would only try opening a file as the current
user, or as a different user, but wouldn't try both methods in a
single call. This made it cumbersome to use as a replacement for
open(2). Additionally, it had a lot of historical baggage that led to
it being difficult to understand.
This patch refactors virFileOpenAs in the following ways:
* reorganize the code so that everything dealing with both the parent
and child sides of the "fork+setuid+setgid+open" method are in a
separate function. This makes the public function easier to understand.
* Allow a single call to virFileOpenAs() to first attempt the open as
the current user, and if that fails to automatically re-try after
doing fork+setuid (if deemed appropriate, i.e. errno indicates it
would now be successful, and the file is on a networkFS). This makes
it possible (in many, but possibly not all, cases) to drop-in
virFileOpenAs() as a replacement for open(2).
(NB: currently qemuOpenFile() calls virFileOpenAs() twice, once
without forking, then again with forking. That unfortunately can't
be changed without at least some discussion of the ramifications,
because the requested file permissions are different in each case,
which is something that a single call to virFileOpenAs() can't deal
with.)
* Add a flag so that any fchown() of the file to a different uid:gid
is explicitly requested when the function is called, rather than it
being implied by the presence of the O_CREAT flag. This just makes
for less subtle surprises to consumers. (Commit
b1643dc15c added the check for O_CREAT
before forcing ownership. This patch just makes that restriction
more explicit.)
* If either the uid or gid is specified as "-1", virFileOpenAs will
interpret this to mean "the current [gu]id".
All current consumers of virFileOpenAs should retain their present
behavior (after a few minor changes to their setup code and
arguments).
Rename the src/util/netlink files to src/util/virnetlink to
better fit the naming scheme. Also rename nlComm to virNetlinkCommand.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
When libvirt's virDomainDestroy API is shutting down the qemu process,
it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the
process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss
because the guest running in qemu doesn't have time to flush its disk
cache buffers before it's unceremoniously whacked.
This patch maintains that default behavior, but provides a new flag
VIR_DOMAIN_DESTROY_GRACEFUL to alter the behavior. If this flag is set
in the call to virDomainDestroyFlags, SIGKILL will never be sent to
the qemu process; instead, if the timeout is reached and the qemu
process still exists, virDomainDestroy will return an error.
Once this patch is in, the recommended method for applications to call
virDomainDestroyFlags will be with VIR_DOMAIN_DESTROY_GRACEFUL
included. If that fails, then the application can decide if and when
to call virDomainDestroyFlags again without
VIR_DOMAIN_DESTROY_GRACEFUL (to force the issue with SIGKILL).
(Note that this does not address the issue of existing applications
that have not yet been modified to use VIR_DOMAIN_DESTROY_GRACEFUL.
That is a separate patch.)
Our HACKING discourages use of malloc and free, for at least
a couple of years now. But we weren't enforcing it, until now :)
For now, I've exempted python and tests, and will clean those up
in subsequent patches. Examples should be permanently exempt,
since anyone copying our examples won't have use of our
internal-only memory.h via libvirt_util.la.
* cfg.mk (sc_prohibit_raw_allocation): New rule.
(exclude_file_name_regexp--sc_prohibit_raw_allocation): and
exemptions.
* src/cpu/cpu.c (cpuDataFree): Avoid false positive.
* src/conf/network_conf.c (virNetworkDNSSrvDefParseXML): Fix
offenders.
* src/libxl/libxl_conf.c (libxlMakeDomBuildInfo, libxlMakeVfb)
(libxlMakeDeviceModelInfo): Likewise.
* src/rpc/virnetmessage.c (virNetMessageSaveError): Likewise.
* tools/virsh.c (_vshMalloc, _vshCalloc): Likewise.
Sometimes, its easier to run children with 2>&1 in shell notation,
and just deal with stdout and stderr interleaved. This was already
possible for fd handling; extend it to also work when doing string
capture of a child process.
* docs/internals/command.html.in: Document this.
* src/util/command.c (virCommandSetErrorBuffer): Likewise.
(virCommandRun, virExecWithHook): Implement it.
* tests/commandtest.c (test14): Test it.
* daemon/remote.c (remoteDispatchAuthPolkit): Use new command
feature.
This patch fixes the access of variable "con" in two files where the
variable was declared only on SELinux builds and thus the build failed
without SELinux. It's a rather nasty fix but helps fix the build
quickly and without any major changes to the code.
Detected by valgrind. Leak is introduced in commit 397e6a7.
* src/conf/domain_conf.c(virDomainDiskDefParseXML): fix memory leak.
How to reproduce?
% make -C tests check TESTS=qemuxml2argvtest
% cd tests && valgrind -v --leak-check=full ./qemuxml2argvtest
* Actual result:
==16352== 4 bytes in 1 blocks are definitely lost in loss record 12 of 147
==16352== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==16352== by 0x39D90A67DD: xmlStrndup (xmlstring.c:45)
==16352== by 0x4E83D5: virDomainDiskDefParseXML (domain_conf.c:2894)
==16352== by 0x4F542D: virDomainDefParseXML (domain_conf.c:7626)
==16352== by 0x4F8683: virDomainDefParseNode (domain_conf.c:8390)
==16352== by 0x4F904E: virDomainDefParse (domain_conf.c:8340)
==16352== by 0x41C626: testCompareXMLToArgvHelper (qemuxml2argvtest.c:105)
==16352== by 0x41DED1: virtTestRun (testutils.c:142)
==16352== by 0x418172: mymain (qemuxml2argvtest.c:486)
==16352== by 0x41D5C7: virtTestMain (testutils.c:697)
==16352== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
Signed-off-by: Alex Jia <ajia@redhat.com>
To allow the container to access /dev and /dev/pts when under
sVirt, set an explicit mount option. Also set a max size on
the /dev mount to prevent DOS on memory usage
* src/lxc/lxc_container.c: Set /dev mount context
* src/lxc/lxc_controller.c: Set /dev/pts mount context
For the sake of backwards compat, LXC guests are *not*
confined by default. This is because it is not practical
to dynamically relabel containers using large filesystem
trees. Applications can create confined containers though,
by giving suitable XML configs
* src/Makefile.am: Link libvirt_lxc to security drivers
* src/lxc/libvirtd_lxc.aug, src/lxc/lxc_conf.h,
src/lxc/lxc_conf.c, src/lxc/lxc.conf,
src/lxc/test_libvirtd_lxc.aug: Config file handling for
security driver
* src/lxc/lxc_driver.c: Wire up security driver functions
* src/lxc/lxc_controller.c: Add a '--security' flag to
specify which security driver to activate
* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Set
the process label just before exec'ing init.
Curently security labels can be of type 'dynamic' or 'static'.
If no security label is given, then 'dynamic' is assumed. The
current code takes advantage of this default, and avoids even
saving <seclabel> elements with type='dynamic' to disk. This
means if you temporarily change security driver, the guests
can all still start.
With the introduction of sVirt to LXC though, there needs to be
a new default of 'none' to allow unconfined LXC containers.
This patch introduces two new security label types
- default: the host configuration decides whether to run the
guest with type 'none' or 'dynamic' at guest start
- none: the guest will run unconfined by security policy
The 'none' label type will obviously be undesirable for some
deployments, so a new qemu.conf option allows a host admin to
mandate confined guests. It is also possible to turn off default
confinement
security_default_confined = 1|0 (default == 1)
security_require_confined = 1|0 (default == 0)
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add new
seclabel types
* src/security/security_manager.c, src/security/security_manager.h:
Set default sec label types
* src/security/security_selinux.c: Handle 'none' seclabel type
* src/qemu/qemu.conf, src/qemu/qemu_conf.c, src/qemu/qemu_conf.h,
src/qemu/libvirtd_qemu.aug: New security config options
* src/qemu/qemu_driver.c: Tell security driver about default
config
This re-introduces parsing & formatting for per device seclabels.
There is a new virDomainDeviceSeclabelPtr struct and corresponding
APIs for parsing/formatting.
Revert parsing changes:
commit 302fe95ffa
Author: Eric Blake <eblake@redhat.com>
Date: Wed Jan 4 16:01:24 2012 -0700
seclabel: fix regression in libvirtd restart
commit b43432931a
Author: Eric Blake <eblake@redhat.com>
Date: Thu Dec 22 17:47:50 2011 -0700
seclabel: allow a seclabel override on a disk src
These two commits changed the sec label parsing code so that
the same code dealt with both the VM level sec label, and the
per device label. Unfortunately, as we add more options to the
VM level sec label, the logic required to use the same parsing
code for the per device label becomes unintelligible.
* src/conf/domain_conf.c: Remove support for parsing per
device sec labels
I slightly botched commit be9fb5a - I converted '--arg=value' to
'--arg value', which has no semantic change, but did trip up the
testsuite.
* src/network/bridge_driver.c (networkBuildDnsmasqArgv): Restore
expected output.
libvirt supports 4 different versions of the user-land XenD daemon. When
queried the daemon just returns its generation number, which is hard to
match to the version of the Xen tools.
Replace the magic generation numbers by named enum definitions to
improve code readability.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Detected by valgrind. Leaks introduced in commit 973af236.
* src/network/bridge_driver.c: fix memory leaks on failure and successful path.
* How to reproduce?
% make -C tests check TESTS=networkxml2argvtest
% cd tests && valgrind -v --leak-check=full ./networkxml2argvtest
* Actual result:
==2226== 3 bytes in 1 blocks are definitely lost in loss record 1 of 24
==2226== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2226== by 0x39CF0FEDE7: __vasprintf_chk (in /lib64/libc-2.12.so)
==2226== by 0x41DFF7: virVasprintf (stdio2.h:199)
==2226== by 0x41E0B7: virAsprintf (util.c:1695)
==2226== by 0x41A2D9: networkBuildDhcpDaemonCommandLine (bridge_driver.c:545)
==2226== by 0x4145C8: testCompareXMLToArgvHelper (networkxml2argvtest.c:47)
==2226== by 0x4156A1: virtTestRun (testutils.c:141)
==2226== by 0x414332: mymain (networkxml2argvtest.c:123)
==2226== by 0x414D97: virtTestMain (testutils.c:696)
==2226== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
==2226==
==2226== 3 bytes in 1 blocks are definitely lost in loss record 2 of 24
==2226== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2226== by 0x39CF0FEDE7: __vasprintf_chk (in /lib64/libc-2.12.so)
==2226== by 0x41DFF7: virVasprintf (stdio2.h:199)
==2226== by 0x41E0B7: virAsprintf (util.c:1695)
==2226== by 0x41A307: networkBuildDhcpDaemonCommandLine (bridge_driver.c:551)
==2226== by 0x4145C8: testCompareXMLToArgvHelper (networkxml2argvtest.c:47)
==2226== by 0x4156A1: virtTestRun (testutils.c:141)
==2226== by 0x414332: mymain (networkxml2argvtest.c:123)
==2226== by 0x414D97: virtTestMain (testutils.c:696)
==2226== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
==2226==
==2226== 5 bytes in 1 blocks are definitely lost in loss record 4 of 24
==2226== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2226== by 0x39CF0FEDE7: __vasprintf_chk (in /lib64/libc-2.12.so)
==2226== by 0x41DFF7: virVasprintf (stdio2.h:199)
==2226== by 0x41E0B7: virAsprintf (util.c:1695)
==2226== by 0x41A2AB: networkBuildDhcpDaemonCommandLine (bridge_driver.c:539)
==2226== by 0x4145C8: testCompareXMLToArgvHelper (networkxml2argvtest.c:47)
==2226== by 0x4156A1: virtTestRun (testutils.c:141)
==2226== by 0x414332: mymain (networkxml2argvtest.c:123)
==2226== by 0x414D97: virtTestMain (testutils.c:696)
==2226== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
==2226==
==2226== LEAK SUMMARY:
==2226== definitely lost: 11 bytes in 3 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This is a trivial implementation, which works with the current
released qemu 1.0 with backports of preliminary block pull but
no partial rebase. Future patches will update the monitor handling
to support an optional parameter for partial rebase; but as qemu
1.1 is unreleased, it can be in later patches, designed to be
backported on top of the supported API.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Add parameter,
and adjust callers. Drop redundant check.
(qemuDomainBlockPull): Move guts...
(qemuDomainBlockRebase): ...to new function.
Qemu is adding the ability to do a partial rebase. That is, given:
base <- intermediate <- current
virDomainBlockPull will produce:
current
but qemu now has the ability to leave base in the chain, to produce:
base <- current
Note that current qemu can only do a forward merge, and only with
the current image as the destination, which is fully described by
this API without flags. But in the future, it may be possible to
enhance this API for additional scenarios by using flags:
Merging the current image back into a previous image (that is,
undoing a live snapshot), could be done by passing base as the
destination and flags with a bit requesting a backward merge.
Merging any other part of the image chain, whether forwards (the
backing image contents are pulled into the newer file) or backwards
(the deltas recorded in the newer file are merged back into the
backing file), could also be done by passing a new flag that says
that base should be treated as an XML snippet rather than an
absolute path name, where the XML could then supply the additional
instructions of which part of the image chain is being merged into
any other part.
* include/libvirt/libvirt.h.in (virDomainBlockRebase): New
declaration.
* src/libvirt.c (virDomainBlockRebase): Implement it.
* src/libvirt_public.syms (LIBVIRT_0.9.10): Export it.
* src/driver.h (virDrvDomainBlockRebase): New driver callback.
* src/rpc/gendispatch.pl (long_legacy): Add exemption.
* docs/apibuild.py (long_legacy_functions): Likewise.
This patch adds support for the new api into the qemu driver to support
modification and retrieval of domain description and title. This patch
does not add support for modifying the <metadata> element.
This patch adds API to modify domain metadata for running and stopped
domains. The api supports changing description, title as well as the
newly added <metadata> element. The API has support for storing data in
the metadata element using xml namespaces.
* include/libvirt/libvirt.h.in
* src/libvirt_public.syms
- add function headers
- add enum to select metadata to operate on
- export functions
* src/libvirt.c
- add public api implementation
* src/driver.h
- add driver support
* src/remote/remote_driver.c
* src/remote/remote_protocol.x
- wire up the remote protocol
* include/libvirt/virterror.h
* src/util/virterror.c
- add a new error message note that metadata for domain are
missing
This patch adds a new element <title> to the domain XML. This attribute
can hold a short title defined by the user to ease the identification of
domains. The title may not contain newlines and should be reasonably short.
*docs/formatdomain.html.in
*docs/schemas/domaincommon.rng
- add schema grammar for the new element and documentation
*src/conf/domain_conf.c
*src/conf/domain_conf.h
- add field to hold the new attribute
- add code to parse and create XML with the new attribute
This was forgotten when the function was originally written (not
noticed because it wasn't used at the time). It's required for
proper compilation with modules enabled after applying the recent
virStorageVolResize patches.
This was forgotten when the function was initially written (not
noticed because it wasn't used at the time). It's required for proper
compilation with modules enabled after applying the recent rawio
patches.
We already provide ways to detect when a domain has been paused as a
result of I/O error, but there was no way of getting the exact error or
even the device that experienced it. This new API may be used for both.
If we are building not on a WIN32 architecture and without HAVE_CAPNG
virSetCapabilities has unused argument and virClearCapabilities
is unused as well.
In qemuDomainShutdownFlags if we try to use guest agent,
which has error or is not configured, we jump go endjob
label even if we haven't started any job yet. This may
lead to the daemon crash:
1) virsh shutdown --mode agent on a domain without agent configured
2) wait until domain quits
3) virsh edit
Fix my typo at
commit 74e034964c
"disk->rawio == -1" indicates that this value is not
specified. So in case of this, domain must not
be tainted.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Using new function 'virTypedParameterArrayClear' to simplify block of codes.
* daemon/remote.c, src/remote/remote_driver.c: simplify codes.
Signed-off-by: Alex Jia <ajia@redhat.com>
This patch revises qemuProcessStart() function for qemu
processes to retain CAP_SYS_RAWIO if needed.
And in case of that, add taint flag to domain.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp>
This patch introduces virSetCapabilities() function and implements
virCommandAllowCap() function.
Existing virClearCapabilities() is function to clear all capabilities.
Instead virSetCapabilities() is function to set arbitrary capabilities.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp>
This patch adds a new attribute "rawio" to the "disk" element
of domain XML. Valid values of "rawio" attribute are "yes"
and "no".
rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO.
If you specify the following XML:
<disk type='block' device='lun' rawio='yes'>
...
</disk>
the domain will be granted CAP_SYS_RAWIO.
(of course, the domain have to be executed with root privilege)
NOTE:
- "rawio" attribute is only valid when device='lun'
- At the moment, any other disks you won't use rawio can use rawio.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Our existing virDomainBlockResize takes an unsigned long long
argument; if that command is later taught a DELTA and SHRINK flag,
we cannot change its type without breaking API (but at least such
a change would be ABI compatible). Meanwhile, the only time a
negative size makes sense is if both DELTA and SHRINK are used
together, but if we keep the argument unsigned, applications can
pass the positive delta amount by which they would like to shrink
the system, and have the flags imply the negative value. So,
since this API has not yet been released, and in the interest of
consistency with existing API, we swap virStorageVolResize to
always pass an unsigned value.
* include/libvirt/libvirt.h.in (virStorageVolResize): Use unsigned
argument.
* src/libvirt.c (virStorageVolResize): Likewise.
* src/driver.h (virDrvStorageVolUpload): Adjust clients.
* src/remote/remote_protocol.x (remote_storage_vol_resize_args):
Likewise.
* src/remote_protocol-structs: Regenerate.
Suggested by Daniel P. Berrange.
Fix several references to now renamed functions and parameters when the
functions were moved from src/xen/ to src/xenxs/.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch addresses: https://bugzilla.redhat.com/show_bug.cgi?id=781562
Along with the "rombar" option that controls whether or not a boot rom
is made visible to the guest, qemu also has a "romfile" option that
allows specifying a binary file to present as the ROM BIOS of any
emulated or passthrough PCI device. This patch adds support for
specifying romfile to both passthrough PCI devices, and emulated
network devices that attach to the guest's PCI bus (just about
everything other than ne2k_isa).
One example of the usefulness of this option is described in the
bugzilla report: 82576 sriov network adapters don't provide a ROM BIOS
for the cards virtual functions (VF), but an image of such a ROM is
available, and with this ROM visible to the guest, it can PXE boot.
In libvirt's xml, the new option is configured like this:
<hostdev>
...
<rom file='/etc/fake/boot.bin'/>
...
</hostdev
(similarly for <interface>).
When support for the rombar option was added, it was only added for
PCI passthrough devices, configured with <hostdev>. The same option is
available for any network device that is attached to the guest's PCI
bus. This patch allows setting rombar for any PCI network device type.
After adding cases to test this to qemuxml2argv-hostdev-pci-rombar.*,
I decided to rename those files (to qemuxml2argv-pci-rom.*) to more
accurately reflect the additional tests, and also noticed that up to
now we've only been performing a domainschematest for that case, so I
added the "pci-rom" test to both qemuxml2argv and qemuxml2xml (and in
the process found some bugs whose fixes I squashed into previous
commits of this series).
Since these two items are now in the virDomainDeviceInfo struct, it
makes sense to parse/format them in the functions written to
parse/format that structure. Not all types of devices allow them, so
two internal flags are added to indicate when it is appropriate to do
so.
I was lucky - only one test case needed to be re-ordered!
To help consolidate the commonality between virDomainHostdevDef and
virDomainNetDef into as few members as possible (and because I
think it makes sense), this patch moves the rombar and bootIndex
members into the "info" member that is common to both (and to all the
other structs that use them).
It's a bit problematic that this gives rombar and bootIndex to many
device types that don't use them, but this is already the case for the
master and mastertype members of virDomainDeviceInfo, and is properly
commented as such in the definition.
Note that this opens the door to supporting rombar for other devices
that are attached to the guest PCI bus - virtio-blk-pci,
virtio-net-pci, various other network adapters - which which have that
capability in qemu, but previously had no support in libvirt.
Unlike other users of virTypedParameter with RPC, this interface
can return zero-filled entries because the interface assumes
2 dimensional array. We compress these entries out from the
server when generating the over-the-wire contents, then reconstitute
them in the client.
Signed-off-by: Eric Blake <eblake@redhat.com>
add new API virDomainGetCPUStats() for getting cpu accounting information
per real cpus which is used by a domain. The API is designed to allow
future extensions for additional statistics.
based on ideas by Lai Jiangshan and Eric Blake.
* src/libvirt_public.syms: add API for LIBVIRT_0.9.10
* src/libvirt.c: define virDomainGetCPUStats()
* include/libvirt/libvirt.h.in: add virDomainGetCPUStats() header
* src/driver.h: add driver API
* python/generator.py: add python API (as not implemented)
Signed-off-by: Eric Blake <eblake@redhat.com>
This API allows a domain to be put into one of S# ACPI states.
Currently, S3 and S4 are supported. These states are shared
with virNodeSuspendForDuration.
However, for now we don't support any duration other than zero.
The same apply for flags.
Add a new function to allow changing of capacity of storage volumes.
Plan out several flags, even if not all of them will be implemented
up front.
Expose the new command via 'virsh vol-resize'.
Signed-off-by: Eric Blake <eblake@redhat.com>
And hook it up for policykit auth. This allows virt-manager to detect
that the user clicked the policykit 'cancel' button and not throw
an 'authentication failed' error message at the user.
If yajl was not compiled in, we end up freeing an incoming
parameter, which leads to a bogus free later on. Regression
introduced in commit 6e769eb.
* src/qemu/qemu_capabilities.c (qemuCapsParseHelpStr): Avoid alloc
on failure path, which in turn fixes bogus free.
Reported by Cole Robinson.
The virEmitXMLWarning function should always have been in
the xml.[hc] files, and should use virXML as its name
prefix
* src/util/util.c, src/util/util.h: Remove virEmitXMLWarning
* src/util/xml.c, src/util/xml.h: Add virXMLEmitWarning
QEMU supports a bunch of CPUID features that are tied to the kvm CPUID
nodes rather than the processor's. They are "kvmclock",
"kvm_nopiodelay", "kvm_mmu", "kvm_asyncpf". These are not known to
libvirt and their CPUID leaf might move if (for example) the Hyper-V
extensions are enabled. Hence their handling would anyway require some
special-casing.
However, among these the most useful is kvmclock; an additional
"property" of this feature is that a <timer> element is a better model
than a CPUID feature. Although, creating part of the -cpu command-line
from something other than the <cpu> XML element introduces some
ugliness.
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add kvmclock timer to documentation, schema and parsers. Keep the
platform timer first since it is kind of special, and alphabetize
the others when possible (i.e. when it does not change the ABI).
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid creating an empty <cpu> element when the QEMU command-line simply
specifies the default "-cpu qemu32" or "-cpu qemu64".
This requires the previous patch, which lets us represent "-cpu qemu32"
as <os arch='i686'> in the generated XML.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qemu32 CPU model is chosen based on the <os arch=...> name when
creating the QEMU command line for a 64-bit host. For the opposite
transformation we can test the guest CPU model for the "lm" feature.
If it is absent, def->os.arch needs to be corrected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When running under KVM, the arch is usually set to i686 because
the name of the emulator is not qemu-system-x86_64. Use the host
arch instead.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recently (or not so recently) QEMU added the kvm32 and kvm64
architectures, representing a least common denominator of all
hosts that can run KVM. Add them to the machine map.
Also, some features that TCG supports were added to qemu64.
Add them to the cpu_map.xml whenever KVM is guaranteed to support
those. We still have to leave some out, because they would not
be available to guests running on older hosts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qemu developers have made it clear that modern qemu will no
longer guarantee human monitor command stability; furthermore,
some features, such as async events, are only supported via qmp.
If we are compiled without support for handling JSON, we cannot
expect to sanely interact with modern qemu.
However, things must continue to build on RHEL 5, where qemu
is stuck at 0.10, and where yajl is not available.
Another benefit of this patch: future additions of new monitor
commands need only focus on qemu_monitor_json.c, instead of
also wasting time with qemu_monitor_text.c.
* src/qemu/qemu_capabilities.c (qemuCapsComputeCmdFlags): Report
error if yajl is missing but qemu requires qmp.
(qemuCapsParseHelpStr): Propagate error.
(qemuCapsExtractVersionInfo): Update caller.
* tests/qemuhelptest.c (testHelpStrParsing): Likewise.
I'm getting tired of remembering to backport RHEL-specific
patches when building upstream libvirt on RHEL 6.x or CentOS.
All the affected versions of RHEL qemu-kvm have backported
enough patches to a) make JSON useful, and b) modify the
-help text to mention libvirt as the preferred interface;
which means this string in the help output is a reliable
indicator that we can outsmart a strict version check,
even when upstream qemu 0.12 lacked the needed features.
* src/qemu/qemu_capabilities.c (qemuCapsComputeCmdFlags):
Recognize particular help string present when enough features were
backported to be worth using JSON.
* tests/qemuhelptest.c (mymain): Update tests accordingly.
Compare two filters' XML for equality and only rebuild/instantiate the new
filter if the new and current filters are found to be different. This
improves performance during an update of a filter with no obvious change
or the reloading of filters during a 'kill -SIGHUP'
Introduce a function that rebuilds all running VMs' filters. Call
this function when reloading the nwfilter driver.
This addresses a problem introduced by the 2nd patch that typically
causes no filters to be reinstantiate anymore upon driver reload
since their XML has not changed. Yet the current behavior is that
upon a SIGHUP all filters get reinstantiated.
QEMU always sends details about all available block devices as an answer
for "info block"/"query-block" command. On the other hand, our
qemuMonitorGetBlockInfo was made for a single block devices queries
only. Thus, when asking for multiple devices, we asked qemu multiple
times to always get the same answer from which different parts were
filtered. This patch makes qemuMonitorGetBlockInfo return a hash table
of all block devices, which may later be used for getting details about
specific devices.
Added a new field "vm-pid" to the VIRT_CONTROL audit record. This information
is useful to correlated another audit events to the events generated by
libvirt.
On RHEL5, I got:
util/virrandom.c:66: warning: nested extern declaration of '_gl_verify_function66' [-Wnested-externs]
The fix is to hoist the verify earlier. Also some other hodge-podge
fixes I noticed while reviewing Dan's recent series.
* .gitignore: Ignore new test.
* src/util/cgroup.c: Bump copyright year.
* src/util/virhash.c: Fix typo in description.
* src/util/virrandom.c (virRandomBits): Mark doc comment, and
hoist assert to silence older gcc.
Recent discussions have illustrated the potential for DOS attacks
with the hash table implementations used by most languages and
libraries.
https://lwn.net/Articles/474912/
libvirt has an internal hash table impl, and uses hash tables for
a variety of purposes. The hash key generation code is pretty
simple and thus not strongly collision resistant.
This patch replaces the current libvirt hash key generator with
the (public domain) Murmurhash3 code. In addition every hash
table now gets a random seed value which is used to perturb the
hashing code. This should make it impossible to mount any
practical attack against libvirt hashing code.
* bootstrap.conf: Import bitrotate module
* src/Makefile.am: Add virhashcode.[ch]
* src/util/util.c: Make virRandom() return a fixed 32 bit
integer value.
* src/util/hash.c, src/util/hash.h, src/util/cgroup.c: Replace
hash code generation with a call to virHashCodeGen()
* src/util/virhashcode.h, src/util/virhashcode.c: Add a new
virHashCodeGen() API using the Murmurhash3 algorithm.
In preparation for the patch to include Murmurhash3, which
introduces a virhashcode.h and virhashcode.c files, rename
the existing hash.h and hash.c to virhash.h and virhash.c
respectively.
In preparation for conversion over to use the Murmurhash3
algorithm, convert various virHash APIs to use size_t or
uint32 for their return values/parameters, instead of the
variable size 'unsigned long' or 'int' types
The old virRandom() API was not generating good random numbers.
Replace it with a new API virRandomBits which instead of being
told the upper limit, gets told the number of bits of randomness
required.
* src/util/virrandom.c, src/util/virrandom.h: Add virRandomBits,
and move virRandomInitialize
* src/util/util.h, src/util/util.c: Delete virRandom and
virRandomInitialize
* src/libvirt.c, src/security/security_selinux.c,
src/test/test_driver.c, src/util/iohelper.c: Update for
changes from virRandom to virRandomBits
* src/storage/storage_backend_iscsi.c: Remove bogus call
to virRandomInitialize & convert to virRandomBits
Currently, we support only filling a volume with zeroes on wiping.
However, it is not enough as data might still be readable by
experienced and equipped attacker. Many technical papers have been
written, therefore we should support other wiping algorithms.
In file included from ../gnulib/lib/unistd.h:51:0,
from ../src/util/util.h:30,
from rpc/virkeepalive.c:29:
/usr/x86_64-w64-mingw32/sys-root/mingw/include/winsock2.h:15:2: warning: #warning Please include winsock2.h before windows.h [-Wcpp]
Reported by Marc-André Lureau.
* src/util/threads-win32.h (includes): Pick up winsock2.h before
windows.h, as required by mingw64.
On F16 at least, empty volume groups don't have a directory under /dev.
The directory only appears once a logical volume is created.
This tickles some behavior in BackendStablePath which ends with
libvirt sleeping for 5 seconds while waiting for the directory to appear.
This causes all sorts of problems for the virStorageVolLookupByPath API
which virtinst uses, even if trying to resolve a path that is independent
of the logical pool.
In reality we don't even need to do that checking since logical pools
always have a stable target path. Short circuit the polling in that
case.
Fixes bug 782261
Systemd detects containers based on whether they have
an environment variable starting with 'container=lxc';
using a longer name fits the expectations, while also
allowing detection of who created the container.
Requested by Lennart Poettering, in response to
https://bugs.freedesktop.org/show_bug.cgi?id=45175
* src/lxc/lxc_container.c (lxcContainerBuildInitCmd): Add another
env-var.
The current setup code for LXC is bind mounting /dev/pts/ptmx
on top of a character device /dev/ptmx. This is denied by SELinux
policy and is just wrong. The target of a bind mount should just
be a plain file
* src/lxc/lxc_container.c: Don't bind /dev/pts/ptmx onto
a char device
Add a virFileTouch API which ensures that a file will always
exist, even if zero length
* src/util/virfile.c, src/util/virfile.h,
src/libvirt_private.syms: Introduce virFileTouch
Direct boot (using kernel, initrd, and command line) is used by
virt-install/virt-manager for network install. While any bootindex has
no direct effect since -kernel is always first, we need it as a hint for
SeaBIOS to present disks in the same order as they will be presented
during normal boot.
It's better to group all the metadata together. This is a
cosmetic output change; since the RNG allows interleave, it
doesn't matter where the user stuck it on input, and an XPath
query will find the same information when parsing the output.
* src/conf/domain_conf.c (virDomainDefFormatInternal): Output
metadata earlier.
* docs/formatdomain.html.in: Update documentation.
* tests/domainsnapshotxml2xmlout/metadata.xml: Update test.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-metadata.xml: Likewise.
Applications can now insert custom nodes and hierarchies into domain
configuration XML. Although currently not enforced, applications are
required to use their own namespaces on every custom node they insert,
with only one top-level element per namespace.
POLLIN and POLLHUP are not mutually exclusive. Currently the following
seems possible: the child writes 3K to its stdout or stderr pipe, and
immediately closes it. We get POLLIN|POLLHUP (I'm not sure that's possible
on Linux, but SUSv4 seems to allow it). We read 1K and throw away the
rest.
When poll() returns and we're about to check the /revents/ member in a
given array element, let's map all the revents bits to two (independent)
ideas: "let's attempt to read()", and "let's attempt to write()". This
should cover all errors, EOFs, and normal conditions; the read()/write()
call should report any pending error.
Under this approach, both POLLHUP and POLLERR are mapped to "needs read()"
if we're otherwise prepared for POLLIN. POLLERR also maps to "needs
write()" if we're otherwise prepared for POLLOUT. The rest of the mappings
(POLLPRI etc.) would be easy, but probably useless for pipes.
Additionally, SUSv4 doesn't appear to forbid POLLIN|POLLERR (or
POLLOUT|POLLERR) set simultaneously. One could argue that the read() or
write() call would return without blocking in these cases (with an error),
so POLLIN / POLLOUT would be justified beside POLLERR.
The code now penalizes POLLIN|POLLERR differently from plain POLLERR. The
former (ie. read() returning -1) is terminal and we jump to cleanup, while
plain POLLERR masks only the affected file descriptor for the future.
Let's unify those.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
This makes use of the QEMU guest agent to implement the
virDomainShutdownFlags and virDomainReboot APIs. With
no flags specified, it will prefer to use the agent, but
fallback to ACPI. Explicit choice can be made by using
a suitable flag
* src/qemu/qemu_driver.c: Wire up use of agent
Add a new API virDomainShutdownFlags and define:
VIR_DOMAIN_SHUTDOWN_DEFAULT = 0,
VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN = (1 << 0),
VIR_DOMAIN_SHUTDOWN_GUEST_AGENT = (1 << 1),
Also define some flags for the reboot API
VIR_DOMAIN_REBOOT_DEFAULT = 0,
VIR_DOMAIN_REBOOT_ACPI_POWER_BTN = (1 << 0),
VIR_DOMAIN_REBOOT_GUEST_AGENT = (1 << 1),
Although these two APIs currently have the same flags, using
separate enums allows them to expand separately in the future.
Add stub impls of the new API for all existing drivers
There is now a standard QEMU guest agent that can be installed
and given a virtio serial channel
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/f16x86_64.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
The protocol that runs over the guest agent is JSON based and
very similar to the JSON monitor. We can't use exactly the same
code because there are some odd differences in the way messages
and errors are structured. The qemu_agent.c file is based on
a combination and simplification of qemu_monitor.c and
qemu_monitor_json.c
* src/qemu/qemu_agent.c, src/qemu/qemu_agent.h: Support for
talking to the agent for shutdown
* src/qemu/qemu_domain.c, src/qemu/qemu_domain.h: Add thread
helpers for talking to the agent
* src/qemu/qemu_process.c: Connect to agent whenever starting
a guest
* src/qemu/qemu_monitor_json.c: Make variable static
When converting a linear enum to a string, we have checks in
place in the VIR_ENUM_IMPL macro to ensure that there is one
string for every value, which lets us quickly flag if a user
added a value but forgot to add a counterpart string. However,
this only works if we use the _LAST marker.
* cfg.mk (sc_require_enum_last_marker): New syntax check.
* src/conf/domain_conf.h (virDomainSnapshotState): Add new marker.
* src/conf/domain_conf.c (virDomainSnapshotState): Fix offender.
* src/qemu/qemu_monitor_json.c (qemuMonitorWatchdogAction)
(qemuMonitorIOErrorAction, qemuMonitorGraphicsAddressFamily):
Likewise.
* src/util/virtypedparam.c (virTypedParameter): Likewise.
Although this is a public API break, it only affects users that
were compiling against *_LAST values, and can be trivially
worked around without impacting compilation against older
headers, by the user defining VIR_ENUM_SENTINELS before using
libvirt.h. It is not an ABI break, since enum values do not
appear as .so entry points. Meanwhile, it prevents users from
using non-stable enum values without explicitly acknowledging
the risk of doing so.
See this list discussion:
https://www.redhat.com/archives/libvir-list/2012-January/msg00804.html
* include/libvirt/libvirt.h.in: Hide all sentinels behind
LIBVIRT_ENUM_SENTINELS, and add missing sentinels.
* src/internal.h (VIR_DEPRECATED): Allow inclusion after
libvirt.h.
(LIBVIRT_ENUM_SENTINELS): Expose sentinels internally.
* daemon/libvirtd.h: Use the sentinels.
* src/remote/remote_protocol.x (includes): Don't expose sentinels.
* python/generator.py (enum): Likewise.
* tests/cputest.c (cpuTestCompResStr): Silence compiler warning.
* tools/virsh.c (vshDomainStateReasonToString)
(vshDomainControlStateToString): Likewise.
While we still don't want to enable gcc's new -Wformat-literal
warning, I found a rather easy case where the warning could be
reduced, by getting rid of obsolete error-reporting practices.
This is the last place where we were passing the (unused) net
and conn arguments for constructing an error.
* src/util/virterror_internal.h (virErrorMsg): Delete prototype.
(virReportError): Delete macro.
* src/util/virterror.c (virErrorMsg): Make static.
* src/libvirt_private.syms (virterror_internal.h): Drop export.
* src/util/conf.c (virConfError): Convert to macro.
(virConfErrorHelper): New function, and adjust error calls.
* src/xen/xen_hypervisor.c (virXenErrorFunc): Delete.
(xenHypervisorGetSchedulerType)
(xenHypervisorGetSchedulerParameters)
(xenHypervisorSetSchedulerParameters)
(xenHypervisorDomainBlockStats)
(xenHypervisorDomainInterfaceStats)
(xenHypervisorDomainGetOSType)
(xenHypervisorNodeGetCellsFreeMemory, xenHypervisorGetVcpus):
Update callers.
Reusing common code makes things smaller; it also buys us some
additional safety, such as now rejecting duplicate parameters
during a set operation.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters)
(qemuDomainSetMemoryParameters, qemuDomainSetNumaParameters)
(qemuSetSchedulerParametersFlags)
(qemuDomainSetInterfaceParameters, qemuDomainSetBlockIoTune)
(qemuDomainGetBlkioParameters, qemuDomainGetMemoryParameters)
(qemuDomainGetNumaParameters, qemuGetSchedulerParametersFlags)
(qemuDomainBlockStatsFlags, qemuDomainGetInterfaceParameters)
(qemuDomainGetBlockIoTune): Use new helpers.
* src/esx/esx_driver.c (esxDomainSetSchedulerParametersFlags)
(esxDomainSetMemoryParameters)
(esxDomainGetSchedulerParametersFlags)
(esxDomainGetMemoryParameters): Likewise.
* src/libxl/libxl_driver.c
(libxlDomainSetSchedulerParametersFlags)
(libxlDomainGetSchedulerParametersFlags): Likewise.
* src/lxc/lxc_driver.c (lxcDomainSetMemoryParameters)
(lxcSetSchedulerParametersFlags, lxcDomainSetBlkioParameters)
(lxcDomainGetMemoryParameters, lxcGetSchedulerParametersFlags)
(lxcDomainGetBlkioParameters): Likewise.
* src/test/test_driver.c (testDomainSetSchedulerParamsFlags)
(testDomainGetSchedulerParamsFlags): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorSetSchedulerParameters)
(xenHypervisorGetSchedulerParameters): Likewise.
Preparation for another patch that refactors common patterns
into the new file for fewer lines of code overall.
* src/util/util.h (virTypedParameterArrayClear): Move...
* src/util/virtypedparam.h: ...to new file.
(virTypedParameterArrayValidate, virTypedParameterAssign): New
prototypes.
* src/util/util.c (virTypedParameterArrayClear): Likewise.
* src/util/virtypedparam.c: New file.
* po/POTFILES.in: Mark file for translation.
* src/Makefile.am (UTIL_SOURCES): Build it.
* src/libvirt_private.syms (util.h): Split...
(virtypedparam.h): to new section.
(virkeycode.h): Sort.
* daemon/remote.c: Adjust callers.
* tools/virsh.c: Likewise.
Based on qemu changes made in commits ae523427 and 659ded58.
* src/lxc/lxc_driver.c (lxcSetSchedulerParametersFlags)
(lxcGetSchedulerParametersFlags, lxcDomainSetBlkioParameters)
(lxcDomainGetBlkioParameters): Use helpers.
(lxcDomainSetBlkioParameters): Allow setting live and config at
once.
We had a memory leak on a very arcane OOM situation (unlikely to ever
hit in practice, but who knows if libvirt.so would ever be linked
into some other program that exhausts all thread-local storage keys?).
I found it by code inspection, while analyzing a valgrind report
generated by Alex Jia.
* src/util/threads.h (virThreadLocalSet): Alter signature.
* src/util/threads-pthread.c (virThreadHelper): Reduce allocation
lifetime.
(virThreadLocalSet): Detect failure.
* src/util/threads-win32.c (virThreadLocalSet): Likewise.
(virCondWait): Fix caller.
* src/util/virterror.c (virLastErrorObject): Likewise.
The RPC generator transforms methods matching certain
patterns like 'id' or 'uuid', etc but does not anchor
its matches to the end of the word. So if a method
contains 'id' in the middle (eg virIdentity) then the
RPC generator munges that.
* src/rpc/gendispatch.pl: Anchor matches
To avoid a namespace clash with forthcoming identity APIs,
rename the virNet*GetLocalIdentity() APIs to have the form
virNet*GetUNIXIdentity()
* daemon/remote.c, src/libvirt_private.syms: Update
for renamed APIs
* src/rpc/virnetserverclient.c, src/rpc/virnetserverclient.h,
src/rpc/virnetsocket.c, src/rpc/virnetsocket.h: s/LocalIdentity/UNIXIdentity/
There was missing capability for blkiotune and thus specifying these
settings caused libvirt to run qemu with invalid parameters and then
reporting qemu error instead of the standard libvirt one. The support
for blkiotune setting was added in upstream qemu repo under commit
0563e191516289c9d2f282a8c50f2eecef2fa773.
Given an LXC guest with a root filesystem path of
/export/lxc/roots/helloworld/root
During startup, we will pivot the root filesystem to end up
at
/.oldroot/export/lxc/roots/helloworld/root
We then try to open
/.oldroot/export/lxc/roots/helloworld/root/dev/pts
Now consider if '/export/lxc' is an absolute symlink pointing
to '/media/lxc'. The kernel will try to open
/media/lxc/roots/helloworld/root/dev/pts
whereas it should be trying to open
/.oldroot//media/lxc/roots/helloworld/root/dev/pts
To deal with the fact that the root filesystem can be moved,
we need to resolve symlinks in *any* part of the filesystem
source path.
* src/libvirt_private.syms, src/util/util.c,
src/util/util.h: Add virFileResolveAllLinks to resolve
all symlinks in a path
* src/lxc/lxc_container.c: Resolve all symlinks in filesystem
paths during startup
pciTrySecondaryBusReset checks if there is active device on the
same bus, however, qemu driver doesn't maintain an effective
list for the inactive devices, and it passes meaningless argument
for parameter "inactiveDevs". e.g. (qemuPrepareHostdevPCIDevices)
if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
return -1;
..skipped...
if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
goto reattachdevs;
NB, the "pcidevs" used above are extracted from domain def, and
thus one won't be able to attach a device of which bus has other
device even detached from host (nodedev-detach). To see more
details of the problem:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=773667
This patch is to resolve the problem by introducing an inactive
PCI device list (just like qemu_driver->activePciHostdevs), and
the whole logic is:
* Add the device to inactive list during nodedev-dettach
* Remove the device from inactive list during nodedev-reattach
* Remove the device from inactive list during attach-device
(for non-managed device)
* Add the device to inactive list after detach-device, only
if the device is not managed
With the above, we have a sufficient inactive PCI device list, and thus
we can use it for pciResetDevice. e.g.(qemuPrepareHostdevPCIDevices)
if (pciResetDevice(dev, driver->activePciHostdevs,
driver->inactivePciHostdevs) < 0)
goto reattachdevs;
This introduces new attribute wrpolicy with only supported
value as immediate. This will be an optional
attribute with no defaults. This helps specify whether
to skip the host page cache.
When wrpolicy is specified, meaning when wrpolicy=immediate
a writeback is explicitly initiated for the dirty pages in
the host page cache as part of the guest file write operation.
Usage:
<filesystem type='mount' accessmode='passthrough'>
<driver type='path' wrpolicy='immediate'/>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
Currently this only works with type='mount' for the QEMU/KVM driver.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
In the past we didn't reserve 0:0:2.0 PCI address if there was no video
device assigned to a domain, which made it impossible to add a video
device later on. So we fixed it (commit v0.9.0-37-g7b2cac1) by always
reserving that address. However, that breaks existing domains without
video devices that already have another device assigned to the
problematic address.
This patch reserves address 0:0:2.0 only in case it was not explicitly
assigned to another device, which means libvirt will try to keep this
address free and will not automatically assign it new devices. But
existing domains for which older libvirt already assigned the address to
a non-video device will keep working as they used to work before 0.9.1.
Moreover, users who want to create a domain without a video device and
use its address for another device may do so by explicitly configuring
the PCI address in domain XML.
There are several reasons for doing this:
- the CPU specification is out of libvirt's control so we cannot
guarantee stable guest ABI
- not every feature of a CPU may actually work as expected when
advertised directly to a guest
- migration between two machines with exactly the same CPU may work but
no guarantees can be made
- this mode is not supported and its use is at one's own risk
VIR_DOMAIN_XML_UPDATE_CPU flag for virDomainGetXMLDesc may be used to
get updated custom mode guest CPU definition in case it depends on host
CPU. This patch implements the same behavior for host-model and
host-passthrough CPU modes.
The mode can be either of "custom" (default), "host-model",
"host-passthrough". The semantics of each mode is described in the
following examples:
- guest CPU is a default model with specified topology:
<cpu>
<topology sockets='1' cores='2' threads='1'/>
</cpu>
- guest CPU matches selected model:
<cpu mode='custom' match='exact'>
<model>core2duo</model>
</cpu>
- guest CPU should be a copy of host CPU as advertised by capabilities
XML (this is a short cut for manually copying host CPU specification
from capabilities to domain XML):
<cpu mode='host-model'/>
In case a hypervisor does not support the exact host model, libvirt
automatically falls back to a closest supported CPU model and
removes/adds features to match host. This behavior can be disabled by
<cpu mode='host-model'>
<model fallback='forbid'/>
</cpu>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-model' match='exact'>
<model fallback='allow'>Penryn</model> --+
<vendor>Intel</vendor> |
<topology sockets='2' cores='4' threads='1'/> + copied from
<feature policy='require' name='dca'/> | capabilities XML
<feature policy='require' name='xtpr'/> |
... --+
</cpu>
- guest CPU should be exactly the same as host CPU even in the aspects
libvirt doesn't model (such domain cannot be migrated unless both
hosts contain exactly the same CPUs):
<cpu mode='host-passthrough'/>
- the same as previous returned by virDomainGetXMLDesc with
VIR_DOMAIN_XML_UPDATE_CPU flag:
<cpu mode='host-passthrough' match='minimal'>
<model>Penryn</model> --+ copied from caps
<vendor>Intel</vendor> | XML but doesn't
<topology sockets='2' cores='4' threads='1'/> | describe all
<feature policy='require' name='dca'/> | aspects of the
<feature policy='require' name='xtpr'/> | actual guest CPU
... --+
</cpu>
In case a hypervisor doesn't support the exact CPU model requested by a
domain XML, we automatically fallback to a closest CPU model the
hypervisor supports (and make sure we add/remove any additional features
if needed). This patch adds 'fallback' attribute to model element, which
can be used to disable this automatic fallback.
Commit 5d784bd6d7 was a nice attempt to
clarify the semantics by requiring domain name from dxml to either match
original name or dname. However, setting dxml domain name to dname
doesn't really work since destination host needs to know the original
domain name to be able to use it in migration cookies. This patch
requires domain name in dxml to match the original domain name. The
change should be safe and backward compatible since migration would fail
just a bit later in the process.
There are three address validation routines that do nothing:
virDomainDeviceDriveAddressIsValid()
virDomainDeviceUSBAddressIsValid()
virDomainDeviceVirtioSerialAddressIsValid()
Remove them, and replace their call sites with "1" which is what they
currently return. In some cases this means we can remove an entire
if block.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
We can't call qemuCapsExtractVersionInfo() from test code, because it
expects to be able to call the emulator, and for testing we have fake
emulators that can't be executed. For that reason qemuxml2argvtest.c
doesn't call qemuDomainAssignPCIAddresses(), instead it open codes its
own version.
That means we can't call qemuDomainAssignAddresses() from the test code,
instead we need to manually call qemuDomainAssignSpaprVioAddresses().
Also add logic to cope with qemuDomainAssignSpaprVioAddresses() failing,
so that we can write a test that checks for a known failure in there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit d09f6ba5fe introduced a regression in event
registration. virDomainEventCallbackListAddID() will only return a positive
integer if the type of event being registered is VIR_DOMAIN_EVENT_ID_LIFECYCLE.
For other event types, 0 is always returned on success. This has the
unfortunate side effect of not enabling remote event callbacks because
remoteDomainEventRegisterAny() uses the return value from the local call to
determine if an event callback needs to be registered on the remote end.
Make sure virDomainEventCallbackListAddID() returns the callback count for the
eventID being registered.
Signed-off-by: Adam Litke <agl@us.ibm.com>
The new introduced optional attribute "copy_on_read</code> controls
whether to copy read backing file into the image file. The value can
be either "on" or "off". Copy-on-read avoids accessing the same backing
file sectors repeatedly and is useful when the backing file is over a
slow network. By default copy-on-read is off.
Earlier, when the number of vcpus was greater than the topology allowed,
libvirt didn't raise an error and continued, resulting in running qemu
with parameters making no sense. Even though qemu did not report any
error itself, the number of vcpus was set to maximum allowed by the
topology.
Detected by Coverity. Although unlikely, if we are ever started
with stdin closed, we could reach a situation where we open a
uuid file but then fail to close it, making that file the new
stdin for the rest of the process.
* src/util/uuid.c (getDMISystemUUID): Allow for stdin.
Currently the LXC controller attempts to deal with EOF on a
tty by spawning a thread to do an edge triggered epoll_wait().
This avoids the normal event loop spinning on POLLHUP. There
is a subtle mistake though - even after seeing POLLHUP on a
master PTY, it is still perfectly possible & valid to write
data to the PTY. There is a buffer that can be filled with
data, even when no client is present.
The second mistake is that the epoll_wait() thread was not
looking for the EPOLLOUT condition, so when a new client
connects to the LXC console, it had to explicitly send a
character before any queued output would appear.
Finally, there was in fact no need to spawn a new thread to
deal with epoll_wait(). The epoll file descriptor itself
can be poll()'d on normally.
This patch attempts to deal with all these problems.
- The blocking epoll_wait() thread is replaced by a poll
on the epoll file descriptor which then does a non-blocking
epoll_wait() to handle events
- Even if POLLHUP is seen, we continue trying to write
any pending output until getting EAGAIN from write.
- Once write returns EAGAIN, we modify the epoll event
mask to also look for EPOLLOUT
* src/lxc/lxc_controller.c: Avoid stalled I/O upon
connected to an LXC console
If client stream does not have any data to sink and neither received
EOF, a dummy packet is sent to the daemon signalising client is ready to
sink some data. However, after we added event loop to client a race may
occur:
Thread 1 calls virNetClientStreamRecvPacket and since no data are cached
nor stream has EOF, it decides to send dummy packet to server which will
sent some data in turn. However, during this decision and actual message
exchange with server -
Thread 2 receives last stream data from server. Therefore an EOF is set
on stream and if there is a call waiting (which is not yet) it is woken
up. However, Thread 1 haven't sent anything so far, so there is no call
to be woken up. So this thread sent dummy packet to daemon, which
ignores that as no stream is associated with such packet and therefore
no reply will ever come.
This race causes client to hang indefinitely.
QEMU does not support security_model for anything but 'path' fs driver type.
Currently in libvirt, when security_model ( accessmode attribute) is not
specified it auto-generates it irrespective of the fs driver type, which
can result in a qemu error for drivers other than path. This patch ensures
that the qemu cmdline is correctly generated by taking into account the
fs driver type.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
If a system has 64 or more VF's, it is quite tedious to mention each VF
in the interface pool.
The following modification will implicitly create an interface pool from
the SR-IOV PF.
This functions enables us to get the Virtual Functions attached to
a Physical function given the name of a SR-IOV physical functio.
In order to accomplish the task, added a getter function pciGetDeviceAddrString
to get the BDF of the Virtual Function in a char array.
The autobuilder pointed out an odd failure on mingw:
../../src/interface/netcf_driver.c:644:5: error: unknown field 'close_used_without_including_unistd_h' specified in initializer
cc1: warnings being treated as errors
This is because the gnulib headers #define close to different strings,
according to which headers are included, in order to work around some
odd mingw problems with close(), and these defines happen to also
affect field members declared with a name of struct foo.close. As long
as all headers are included before both the definition and use of the
struct, the various #define doesn't matter, but the netcf file hit
an instance where things were included in a different order. Fix this
for all clients that use a struct member named 'close'.
* src/driver.h: Include <unistd.h> before using 'close'.
For some weird reason, i686-pc-mingw32-gcc version 4.6.1 at -O2 complained:
../../src/conf/nwfilter_params.c: In function 'virNWFilterVarCombIterCreate':
../../src/conf/nwfilter_params.c:346:23: error: 'minValue' may be used uninitialized in this function [-Werror=uninitialized]
../../src/conf/nwfilter_params.c:319:28: note: 'minValue' was declared here
../../src/conf/nwfilter_params.c:344:23: error: 'maxValue' may be used uninitialized in this function [-Werror=uninitialized]
../../src/conf/nwfilter_params.c:319:18: note: 'maxValue' was declared here
cc1: all warnings being treated as errors
even though all paths of the preceding switch statement either
assign the variables or return.
* src/conf/nwfilter_params.c (virNWFilterVarCombIterAddVariable):
Initialize variables.
Address side effect of accessing a variable via an index: Filters
accessing a variable where an element is accessed that is beyond the
size of the list (for example $TEST[10] and only 2 elements are available)
cannot instantiate that filter. Test for this and report proper error
to user.
This patch adds access to single elements of variables via index. Example:
<rule action='accept' direction='in' priority='500'>
<tcp srcipaddr='$ADDR[1]' srcportstart='$B[2]'/>
</rule>
This patch introduces the capability to use a different iterator per
variable.
The currently supported notation of variables in a filtering rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A' srcportstart='$B'/>
</rule>
processes the two lists 'A' and 'B' in parallel. This means that A and B
must have the same number of 'N' elements and that 'N' rules will be
instantiated (assuming all tuples from A and B are unique).
In this patch we now introduce the assignment of variables to different
iterators. Therefore a rule like
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@1]' srcportstart='$B[@2]'/>
</rule>
will now create every combination of elements in A with elements in B since
A has been assigned to an iterator with Id '1' and B has been assigned to an
iterator with Id '2', thus processing their value independently.
The first rule has an equivalent notation of
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@0]' srcportstart='$B[@0]'/>
</rule>
In this patch we introduce testing whether the iterator points to a
unique set of entries that have not been seen before at one of the previous
iterations. The point is to eliminate duplicates and with that unnecessary
filtering rules by preventing identical filtering rules from being
instantiated.
Example with two lists:
list1 = [1,2,1]
list2 = [1,3,1]
The 1st iteration would take the 1st items of each list -> 1,1
The 2nd iteration would take the 2nd items of each list -> 2,3
The 3rd iteration would take the 3rd items of each list -> 1,1 but
skip them since this same pair has already been encountered in the 1st
iteration
Implementation-wise this is solved by taking the n-th element of list1 and
comparing it against elements 1..n-1. If no equivalent is found, then there
is no possibility of this being a duplicate. In case an equivalent element
is found at position i, then the n-th element in the 2nd list is compared
against the i-th element in the 2nd list and if that is not the same, then
this is a unique pair, otherwise it is not unique and we may need to do
the same comparison on the 3rd list.
When sVirt is integrated with the LXC driver, it will be neccessary
to invoke the security driver APIs using only a virDomainDefPtr
since the lxc_container.c code has no virDomainObjPtr available.
Aside from two functions which want obj->pid, every bit of the
security driver code only touches obj->def. So we don't need to
pass a virDomainObjPtr into the security drivers, a virDomainDefPtr
is sufficient. Two functions also gain a 'pid_t pid' argument.
* src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/security/security_apparmor.c,
src/security/security_dac.c,
src/security/security_driver.h,
src/security/security_manager.c,
src/security/security_manager.h,
src/security/security_nop.c,
src/security/security_selinux.c,
src/security/security_stack.c: Change all security APIs to use a
virDomainDefPtr instead of virDomainObjPtr
When disk snapshots were first implemented, libvirt blindly refused
to allow an external snapshot destination that already exists, since
qemu will blindly overwrite the contents of that file during the
snapshot_blkdev monitor command, and we don't like a default of
data loss by default. But VDSM has a scenario where NFS permissions
are intentionally set so that the destination file can only be
created by the management machine, and not the machine where the
guest is running, so that libvirt will necessarily see the destination
file already existing; adding a flag will allow VDSM to force the file
reuse without libvirt complaining of possible data loss.
https://bugzilla.redhat.com/show_bug.cgi?id=767104
* include/libvirt/libvirt.h.in (virDomainSnapshotCreateFlags): Add
VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it. Add
note about partial failure.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Add new
flag.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document
it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateXML): Implement the new flag.
We had loads of different styles in describing the @flags parameter
for various APIs, as well as several APIs that didn't list which
enums provided the bit values valid for the flags.
The end result is one of two formats:
@flags: bitwise-OR of vir...Flags
@flags: extra flags; not used yet, so callers should always pass 0
* src/libvirt.c: Use common sentences for flags. Also,
(virDomainGetBlockIoTune): Mention virTypedParameterFlags.
(virConnectOpenAuth): Mention virConnectFlags.
(virDomainMigrate, virDomainMigrate2, virDomainMigrateToURI)
(virDomainMigrateToURI2): Mention virDomainMigrateFlags.
(virDomainMemoryPeek): Mention virDomainMemoryFlags.
(virStoragePoolBuild): Mention virStoragePoolBuildFlags.
(virStoragePoolDelete): Mention virStoragePoolDeleteFlags.
(virStreamNew): Mention virStreamFlags.
(virDomainOpenGraphics): Mention virDomainOpenGraphicsFlags.
This *kind of* addresses:
https://bugzilla.redhat.com/show_bug.cgi?id=772395
(it doesn't eliminate the failure to start, but causes libvirt to give
a better idea about the cause of the failure).
If a guest uses a kvm emulator (e.g. /usr/bin/qemu-kvm) and the guest
is started when kvm isn't available (either because virtualization is
unavailable / has been disabled in the BIOS, or the kvm modules
haven't been loaded for some reason), a semi-cryptic error message is
logged:
libvirtError: internal error Child process (LC_ALL=C
PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/bin/qemu-kvm -device ? -device
pci-assign,? -device virtio-blk-pci,? -device virtio-net-pci,?) status
unexpected: exit status 1
This patch notices at process start that a guest needs kvm, and checks
for the presence of /dev/kvm (a reasonable indicator that kvm is
available) before trying to execute the qemu binary. If kvm isn't
available, a more useful (too verbose??) error is logged.
It should be a copy-paste error, the result is programming will result in an
infinite loop again due to without iterating 'j' variable.
* src/qemu/qemu_driver.c: fix a typo on qemuDomainSetBlkioParameters.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=770520
Signed-off-by: Alex Jia <ajia@redhat.com>
I previously mentioned [1] a PolicyKit issue where libvirt would
proceed with authentication even though polkit-auth failed:
testusr xen134:~> virsh list --all
Attempting to obtain authorization for org.libvirt.unix.manage.
polkit-grant-helper: given auth type (8 -> yes) is bogus
Failed to obtain authorization for org.libvirt.unix.manage.
Id Name State
----------------------------------
0 Domain-0 running
- sles11sp1-pv shut off
AFAICT, libvirt attempts to obtain a privilege it already has,
causing polkit-auth to fail with above message. Instead of calling
obtain and then checking auth, IMO the workflow should be for the
server to check auth first, and if that fails ask the client to
obtain it and check again. This workflow also allows for checking
only successful exit of polkit-auth in virConnectAuthGainPolkit().
[1] https://www.redhat.com/archives/libvir-list/2011-December/msg00837.html
In the past, generic SCSI commands issued from a guest to a virtio
disk were always passed through to the underlying disk by qemu, and
the kernel would also pass them on.
As a result of CVE-2011-4127 (see:
http://seclists.org/oss-sec/2011/q4/536), qemu now honors its
scsi=on|off device option for virtio-blk-pci (which enables/disables
passthrough of generic SCSI commands), and the kernel will only allow
the commands for physical devices (not for partitions or logical
volumes). The default behavior of qemu is still to allow sending
generic SCSI commands to physical disks that are presented to a guest
as virtio-blk-pci devices, but libvirt prefers to disable those
commands in the standard virtio block devices, enabling it only when
specifically requested (hopefully indicating that the requester
understands what they're asking for). For this purpose, a new libvirt
disk device type (device='lun') has been created.
device='lun' is identical to the default device='disk', except that:
1) It is only allowed if bus='virtio', type='block', and the qemu
version is "new enough" to support it ("new enough" == qemu 0.11 or
better), otherwise the domain will fail to start and a
CONFIG_UNSUPPORTED error will be logged).
2) The option "scsi=on" will be added to the -device arg to allow
SG_IO commands (if device !='lun', "scsi=off" will be added to the
-device arg so that SG_IO commands are specifically forbidden).
Guests which continue to use disk device='disk' (the default) will no
longer be able to use SG_IO commands on the disk; those that have
their disk device changed to device='lun' will still be able to use SG_IO
commands.
*docs/formatdomain.html.in - document the new device attribute value.
*docs/schemas/domaincommon.rng - allow it in the RNG
*tests/* - update the args of several existing tests to add scsi=off, and
add one new test that will test scsi=on.
*src/conf/domain_conf.c - update domain XML parser and formatter
*src/qemu/qemu_(command|driver|hotplug).c - treat
VIR_DOMAIN_DISK_DEVICE_LUN *almost* identically to
VIR_DOMAIN_DISK_DEVICE_DISK, except as indicated above.
Note that no support for this new device value was added to any
hypervisor drivers other than qemu, because it's unclear what it might
mean (if anything) to those drivers.
This patch adds two capabilities flags to deal with various aspects
of supporting SG_IO commands on virtio-blk-pci devices:
QEMU_CAPS_VIRTIO_BLK_SCSI
set if -device virtio-blk-pci accepts the scsi="on|off" option
When present, this is on by default, but can be set to off to disable
SG_IO functions.
QEMU_CAPS_VIRTIO_BLK_SG_IO
set if SG_IO commands are supported in the virtio-blk-pci driver
(present since qemu 0.11 according to a qemu developer, if I
understood correctly)
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=638633
Although scripts are not used by interfaces of type other than
"ethernet" in qemu, due to the fact that the parser stores the script
name in a union that is only valid when type is ethernet or bridge,
there is no way for anyone except the parser itself to catch the
problem of specifying an interface script for an inappropriate
interface type (by the time the parsed data gets back to the code that
called the parser, all evidence that a script was specified is
forgotten).
Since the parser itself should be agnostic to which type of interface
allows scripts (an example of why: a script specified for an interface
of type bridge is valid for xen domains, but not for qemu domains),
the solution here is to move the script out of the union(s) in the
DomainNetDef, always populate it when specified (regardless of
interface type), and let the driver decide whether or not it is
appropriate.
Currently the qemu, xen, libxml, and uml drivers recognize the script
parameter and do something with it (the uml driver only to report that
it isn't supported). Those drivers have been updated to log a
CONFIG_UNSUPPORTED error when a script is specified for an interface
type that's inappropriate for that particular hypervisor.
(NB: There was earlier discussion of solving this problem by adding a
VALIDATE flag to all libvirt APIs that accept XML, which would cause
the XML to be validated against the RNG files. One statement during
that discussion was that the RNG shouldn't contain hypervisor-specific
things, though, and a proper solution to this problem would require
that (again, because a script for an interface of type "bridge" is
accepted by xen, but not by qemu).
Commit ae523427 missed one pair of functions that could use
the helper routine.
* src/qemu/qemu_driver.c (qemuSetSchedulerParametersFlags)
(qemuGetSchedulerParametersFlags): Simplify.
On rawhide, gcc is new enough to output new DWARF information that
pdwtags has not yet learned, but the resulting 'make check' output
was rather confusing:
$ make -C src check
...
GEN virkeepaliveprotocol-structs
die__process_function: DW_TAG_INVALID (0x4109) @ <0x58c> not handled!
WARNING: your pdwtags program is too old
WARNING: skipping the virkeepaliveprotocol-structs test
WARNING: install dwarves-1.3 or newer
...
$ pdwtags --version
v1.9
I've filed the pdwtags deficiency as
https://bugzilla.redhat.com/show_bug.cgi?id=772358
* src/Makefile.am (PDWTAGS): Don't leave -t file behind on version
mismatch. Soften warning message, since 1.9 is newer than 1.3.
Don't leak stderr from broken version.
Commit db371a2 mistakenly added new functions inside a #ifndef WIN32
guard, even though they are needed on all platforms.
* src/util/command.c (virCommandFDSet): Move outside WIN32
conditional.
Detected by valgrind. Leak introduced in commit 5745dc1.
* src/qemu/qemu_command.c: fix memory leak on failure and successful path.
* How to reproduce?
% valgrind -v --leak-check=full ./qemuargv2xmltest
* Actual result:
==2196== 80 bytes in 1 blocks are definitely lost in loss record 3 of 4
==2196== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2196== by 0x39CF07F6E1: strdup (in /lib64/libc-2.12.so)
==2196== by 0x419823: qemuParseRBDString (qemu_command.c:1657)
==2196== by 0x4221ED: qemuParseCommandLine (qemu_command.c:5934)
==2196== by 0x422AFB: qemuParseCommandLineString (qemu_command.c:7561)
==2196== by 0x416864: testCompareXMLToArgvHelper (qemuargv2xmltest.c:48)
==2196== by 0x417DB1: virtTestRun (testutils.c:141)
==2196== by 0x415CAF: mymain (qemuargv2xmltest.c:175)
==2196== by 0x4174A7: virtTestMain (testutils.c:696)
==2196== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
==2196==
==2196== LEAK SUMMARY:
==2196== definitely lost: 80 bytes in 1 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
When setting numa nodeset for a domain which has no nodeset set
before, libvirtd crashes by dereferencing the pointer to the old
nodemask which is null in that case.
Commit baade4d fixed a memory leak on failure, but in the process,
introduced a use-after-free on success, which can be triggered with:
1. set bandwidth with --live
2. query bandwidth
3. set bandwidth with --live
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Don't
free newBandwidth on success.
Reported by Hu Tao.
Commit b434329 has a logic bug: seclabel overrides don't set
def->type, but the default value is 0 (aka static). Restarting
libvirtd would thus reject the XML for any domain with an
override of <seclabel relabel='no'/> (which happens quite
easily if a disk image lives on NFS), with a message:
2012-01-04 22:29:40.949+0000: 6769: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing
Fix the logic to never read from an override's def->type, and
to allow a missing <label> subelement when relabel is no. There's
a lot of stupid double-negatives in the code (!norelabel) because
of the way that we want the zero-initialized defaults to behave.
* src/conf/domain_conf.c (virSecurityLabelDefParseXMLHelper): Use
type field from correct location.
Currently, virCommand implementation uses FD_ macros from
sys/select.h. However, those cannot handle more opened files
than FD_SETSIZE. Therefore switch to generalized implementation
based on array of integers.
xen-unstable c/s 23874:651aed73b39c added another member to
xen_domctl_getdomaininfo struct and bumped domctl version to 8.
Add a corresponding domctl v8 struct in xen hypervisor sub-driver
and detect domctl v8 during initialization.
The console path in xenstore is /local/domain/<id>/console/tty
for PV guests (PV console) and /local/domain/<id>/serial/0/tty
(serial console) for HVM guests. Similar to Xen's in-tree console
client, read the correct path for PV vs HVM.
It is a good practise to set revents to zero before doing any poll().
Moreover, we should check if event we waited for really occurred or
if any of fds we were polling on didn't encountered hangup.
Most severe here is a latent (but currently untriggered) memory leak
if any hypervisor ever adds a string interface property; the
remainder are mainly cosmetic.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BANDWIDTH_*): Move
macros closer to interface that uses them, and document type.
* src/libvirt.c (virDomainSetInterfaceParameters)
(virDomainGetInterfaceParameters): Formatting tweaks.
* daemon/remote.c (remoteDispatchDomainGetInterfaceParameters):
Avoid memory leak.
* src/libvirt_public.syms (LIBVIRT_0.9.9): Sort lines.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Fix
comments, break long lines.
Hi,
this is the fifth version of my SRV record for DNSMasq patch rebased
for the current codebase to the bridge driver and libvirt XML file to
include support for the SRV records in the DNS. The syntax is based on
DNSMasq man page and tests for both xml2xml and xml2argv were added as
well. There are some things written a better way in comparison with
version 4, mainly there's no hack in tests/networkxml2argvtest.c and
also the xPath context is changed to use a simpler query using the
virXPathInt() function relative to the current node.
Also, the patch is also fixing the networkxml2argv test to pass both
checks, i.e. both unit tests and also syntax check.
Please review,
Michal
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Leak detected by Coverity, and introduced in commit 93ab585.
Reported by Alex Jia.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Free
devices array on error.
The blocks to extract node information on a per-arch
basis wasn't well balanced leading to a compilation
failure if not on one of the handled arches (PCs and PPCs)
This wires up the XML changes in the previous patch to let SELinux
labeling honor user overrides, as well as affecting the live XML
configuration in one case where the user didn't specify anything
in the offline XML.
I noticed that the logs contained messages like this:
2011-12-05 23:32:40.382+0000: 26569: warning : SELinuxRestoreSecurityFileLabel:533 : cannot lookup default selinux label for /nfs/libvirt/images/dom.img
for all my domain images living on NFS. But if we would just remember
that on domain creation that we were unable to set a SELinux label (due to
NFSv3 lacking labels, or NFSv4 not being configured to expose attributes),
then we could avoid wasting the time trying to clear the label on
domain shutdown. This in turn is one less point of NFS failure,
especially since there have been documented cases of virDomainDestroy
hanging during an attempted operation on a failed NFS connection.
* src/security/security_selinux.c (SELinuxSetFilecon): Move guts...
(SELinuxSetFileconHelper): ...to new function.
(SELinuxSetFileconOptional): New function.
(SELinuxSetSecurityFileLabel): Honor override label, and remember
if labeling failed.
(SELinuxRestoreSecurityImageLabelInt): Skip relabeling based on
override.
Implement the parsing and formatting of the XML addition of
the previous commit. The new XML doesn't affect qemu command
line, so we can now test round-trip XML->memory->XML handling.
I chose to reuse the existing structure, even though per-device
override doesn't use all of those fields, rather than create a
new structure, in order to reuse more code.
* src/conf/domain_conf.h (_virDomainDiskDef): Add seclabel member.
* src/conf/domain_conf.c (virDomainDiskDefFree): Free it.
(virSecurityLabelDefFree): New function.
(virDomainDiskDefFormat): Print it.
(virSecurityLabelDefFormat): Reduce output if model not present.
(virDomainDiskDefParseXML): Alter signature, and parse seclabel.
(virSecurityLabelDefParseXML): Split...
(virSecurityLabelDefParseXMLHelper): ...into new helper.
(virDomainDeviceDefParse, virDomainDefParseXML): Update callers.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-dynamic-override.args:
New file.
* tests/qemuxml2xmltest.c (mymain): Enhance test.
* tests/qemuxml2argvtest.c (mymain): Likewise.
A future patch will parse and output <seclabel> in more than one
location in a <domain> xml; make it easier to reuse code.
* src/conf/domain_conf.c (virSecurityLabelDefFree): Rename...
(virSecurityLabelDefClear): ...and make static.
(virSecurityLabelDefParseXML): Alter signature.
(virDomainDefParseXML, virDomainDefFree): Adjust callers.
(virDomainDefFormatInternal): Split output...
(virSecurityLabelDefFormat): ...into new helper.
* daemon/remote.c: implement the server side support
* src/remote/remote_driver.c: implement the client side support
* src/remote/remote_protocol.x: definitions for the new entry points
* src/remote_protocol-structs: structure definitions
The APIs are used to set/get domain's network interface's parameters.
Currently supported parameters are bandwidth settings.
* include/libvirt/libvirt.h.in: new API and parameters definition
* python/generator.py: skip the Python API generation
* src/driver.h: add new entry to the driver structure
* src/libvirt_public.syms: export symbols
https://bugzilla.redhat.com/show_bug.cgi?id=770520
We had two nested loops both trying to use 'i' as the iteration
variable, which can result in an infinite loop when the inner
loop interferes with the outer loop. Introduced in commit 93ab585.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Don't
reuse iteration variable across two loops.
This patch adds max_files option to qemu.conf which can be used to
override system default limit on number of opened files that are
allowed for qemu user.
Remove the requirement that DHCP messages have to be broadcasted.
DHCP requests are most often sent via broadcast but can be directed
towards a specific DHCP server. For example 'dhclient' takes '-s <server>'
as a command line parameter thus allowing DHCP requests to be sent to a
specific DHCP server.
Add logic to assign addresses for devices with spapr-vio addresses.
We also do validation of addresses specified by the user, ie. ensuring
that there are not duplicate addresses on the bus.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
For QEMU PPC64 we have a machine type ("pseries") which has a virtual
bus called "spapr-vio". We need to be able to create devices on this
bus, and as such need a way to specify the address for those devices.
This patch adds a new address type "spapr-vio", which achieves this.
The addressing is specified with a "reg" property in the address
definition. The reg is optional, if it is not specified QEMU will
auto-assign an address for the device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Currently non-x86 guests must have <acpi/> defined in <features> to
prevent libvirt from running qemu with -no-acpi. Although it works, it
is a hack.
Instead add a capability flag which indicates whether qemu understands
the -no-acpi option. Use it to control whether libvirt emits -no-acpi.
Current versions of qemu always display -no-acpi in their help output,
so this patch has no effect. However the development version of qemu
has been modified such that -no-acpi is only displayed when it is
actually supported.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
The RPC code had several latent memory leaks and an attempt to
free the wrong string, but thankfully nothing triggered them
(blkiotune was the only one returning a string, and always as
the last parameter). Also, our cleanups for rpcgen ended up
nuking a line of code that renders VIR_TYPED_PARAM_INT broken,
because it was the only use of 'i' in a function, even though
it was a member usage rather than a standalone declaration.
* daemon/remote.c (remoteSerializeTypedParameters): Free the
correct array element.
(remoteDispatchDomainGetSchedulerParameters)
(remoteDispatchDomainGetSchedulerParametersFlags)
(remoteDispatchDomainBlockStatsFlags)
(remoteDispatchDomainGetMemoryParameters): Don't leak strings.
* src/rpc/genprotocol.pl: Don't nuke member-usage of 'buf' or 'i'.
The lifetime of the virDomainEventState object is tied to
the lifetime of the driver, which in stateless drivers is
tied to the lifetime of the virConnectPtr.
If we add & remove a timer when allocating/freeing the
virDomainEventState object, we can get a situation where
the timer still triggers once after virDomainEventState
has been freed. The timeout callback can't keep a ref
on the event state though, since that would be a circular
reference.
The trick is to only register the timer when a callback
is registered with the event state & remove the timer
when the callback is unregistered.
The demo for the bug is to run
while true ; do date ; ../tools/virsh -q -c test:///default 'shutdown test; undefine test; dominfo test' ; done
prior to this fix, it will frequently hang and / or
crash, or corrupt memory
Currently all drivers using domain events need to provide a callback
for handling a timer to dispatch events in a clean stack. There is
no technical reason for dispatch to go via driver specific code. It
could trivially be dispatched directly from the domain event code,
thus removing tedious boilerplate code from all drivers
Also fix the libxl & xen drivers to pass 'true' when creating the
virDomainEventState, since they run inside the daemon & thus always
expect events to be present.
* src/conf/domain_event.c, src/conf/domain_event.h: Internalize
dispatch of events from timer callback
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_domain.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Remove all timer dispatch functions
The virDomainEventCallbackList and virDomainEventQueue APIs are
now solely helpers used internally by virDomainEventState APIs.
Remove their decls from domain_event.h since no driver code should
need to use them any more.
* src/conf/domain_event.c: Make virDomainEventCallbackList and
virDomainEventQueue APIs static & remove some unused APIs
* src/conf/domain_event.h, src/libvirt_private.syms: Remove
virDomainEventCallbackList and virDomainEventQueue APIs
No caller of the domain events APIs should need to poke at the
struct internals. Thus they should all be removed from the
header file
* src/conf/domain_event.h: Remove struct definitions
* src/conf/domain_event.c: Add struct definitions
While virDomainEventState has APIs for managing removal of callbacks,
while locked, adding callbacks in the first place requires direct
access to the virDomainEventCallbackList structure. This is not
threadsafe since it is bypassing the virDomainEventState locks
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Add APIs for managing callbacks
via virDomainEventState.
When registering a callback for a particular event some callers
need to know how many callbacks already exist for that event.
While it is possible to ask for a count, this is not free from
race conditions when threaded. Thus the API for registering
callbacks should return the count of callbacks. Also rename
virDomainEventStateDeregisterAny to virDomainEventStateDeregisterID
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Return count of callbacks when
registering callbacks
* src/libxl/libxl_driver.c, src/libxl/libxl_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/remote/remote_driver.c, src/uml/uml_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c: Update
for change in APIs
The Xen & VBox drivers deal with callbacks & dispatching of
events directly. All the other drivers use a timer to dispatch
events from a clean stack state, rather than deep inside the
drivers. Convert Xen & VBox over to virDomainEventState so
that they match behaviour of other drivers
* src/conf/domain_event.c: Return count of remaining
callbacks when unregistering event callback
* src/vbox/vbox_tmpl.c, src/xen/xen_driver.c,
src/xen/xen_driver.h: Convert to virDomainEventState
If only iptables rules are created then two unnecessary ebtables chains
are also created. This patch fixes this and prevents these chains from
being created. They have been cleaned up properly, though.
A generic error code was returned, if the user aborted a migration job.
This made it hard to distinguish between a user requested abort and an
error that might have occured. This patch introduces a new error code,
which is returned in the specific case of a user abort, while leaving
all other failures with their existing code. This makes it easier to
distinguish between failure while mirgrating and an user requested
abort.
* include/libvirt/virterror.h: - add new error code
* src/util/virterror.c: - add message for the new error code
* src/qemu/qemu_migration.h: - Emit operation aborted error instead of
operation failed, on migration abort
If managed save fails at the right point in time, then the save
image can end up with 0 bytes in length (no valid header), and
our attempts in commit 55d88def to detect and skip invalid save
files missed this case.
* src/qemu/qemu_driver.c (qemuDomainSaveImageOpen): Also unlink
empty file as corrupt. Reported by Dennis Householder.
Currently, on device detach, we parse given XML, find the device
in domain object, free it and try to restore security labels.
However, in some cases (e.g. usb hostdev) parsed XML contains
less information than freed device. In usb case it is bus & device
IDs. These are needed during label restoring as a symlink into
/dev/bus is generated from them. Therefore don't drop device
configuration until security labels are restored.
In commit 6f84e110 I mistakenly set default migration speed to
33554432 Mb! The units of migMaxBandwidth is Mb, with conversion
handled in qemuMonitor{JSON,Text}SetMigrationSpeed().
Also, remove definition of QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX since
it is no longer used after reverting commit ef1065cf.
If an async job run on a domain will stop the domain at the end of the
job, a concurrently run query job can hang in qemu monitor and nothing
can be done with that domain from this point on. An attempt to start
such domain results in "Timed out during operation: cannot acquire state
change lock" error.
However, quite a few things have to happen at the right time... There
must be an async job running which stops a domain at the end. This race
was reported with dump --crash but other similar jobs, such as
(managed)save and migration, should be able to trigger this bug as well.
While this async job is processing its last monitor command, that is a
query-migrate to which qemu replies with status "completed", a new
libvirt API that results in a query job must arrive and stay waiting
until the query-migrate command finishes. Once query-migrate is done but
before the async job closes qemu monitor while stopping the domain, the
other thread needs to wake up and call qemuMonitorSend to send its
command to qemu. Before qemu gets a chance to respond to this command,
the async job needs to close the monitor. At this point, the query job
thread is waiting for a condition that no-one will ever signal so it
never finishes the job.
* src/qemu/qemu_hostdev.c (qemuDomainReAttachHostdevDevices):
pciDeviceListFree(pcidevs) in the end free()s the device even if
it's in use by other domain, which can cause a race.
How to reproduce:
<script>
virsh nodedev-dettach pci_0000_00_19_0
virsh start test
virsh attach-device test hostdev.xml
virsh start test2
for i in {1..5}; do
echo "[ -- ${i}th time --]"
virsh nodedev-reattach pci_0000_00_19_0
done
echo "clean up"
virsh destroy test
virsh nodedev-reattach pci_0000_00_19_0
</script>
Device pci_0000_00_19_0 dettached
Domain test started
Device attached successfully
error: Failed to start domain test2
error: Requested operation is not valid: PCI device 0000:00:19.0 is in use by domain test
[ -- 1th time --]
Device pci_0000_00_19_0 re-attached
[ -- 2th time --]
Device pci_0000_00_19_0 re-attached
[ -- 3th time --]
Device pci_0000_00_19_0 re-attached
[ -- 4th time --]
Device pci_0000_00_19_0 re-attached
[ -- 5th time --]
Device pci_0000_00_19_0 re-attached
clean up
Domain test destroyed
Device pci_0000_00_19_0 re-attached
The patch also fixes another problem, there won't be error like
"qemuDomainReAttachHostdevDevices: Not reattaching active
device 0000:00:19.0" in daemon log if some device is in active.
As pciResetDevice and pciReattachDevice won't be called for
the device anymore. This is sensible as we already reported
error when preparing the device if it's active. Blindly trying
to pciResetDevice & pciReattachDevice on the device and getting
an error is just redundant.
This patch fixes two problems:
1) The device will be reattached to host even if it's not
managed, as there is a "pciDeviceSetManaged".
2) The device won't be reattached to host with original
driver properly. As it doesn't honor the device original
properties which are maintained by driver->activePciHostdevs.
This chunk of code below repeated in several functions, factor it into
a helper method virDomainLiveConfigHelperMethod to eliminate duplicated code
based on Eric and Adam's suggestion. I have tested it for all the
relevant APIs changed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
If the vol object is newly created, it increases the volumes count,
but doesn't decrease the volumes count when do cleanup. It can
cause libvirtd to crash when one trying to free the volume objects
like:
for (i = 0; i < pool->volumes.count; i++)
virStorageVolDefFree(pool->volumes.objs[i]);
It's more reliable if we add the newly created vol object in the
end.
When destroying a domain qemuDomainDestroy kills its qemu process and
starts a new job, which means it unlocks the domain object and locks it
again after some time. Although the object is usually unlocked for a
pretty short time, chances are another thread processing an EOF event on
qemu monitor is able to lock the object first and does all the cleanup
by itself. This leads to wrong shutoff reason and lifecycle event detail
and virDomainDestroy API incorrectly reporting failure to destroy an
inactive domain.
Reported by Charlie Smurthwaite.
Current "-ay | -an" has problems on pool starting/refreshing if
the volumes are clustered. Rommer has posted a patch to list 2
months ago.
https://www.redhat.com/archives/libvir-list/2011-October/msg01116.html
But IMO we shouldn't skip the inactived vols. So this is a squashed
patch by Rommer.
Signed-off-by: Rommer <rommer@active.by>
Network disks don't have paths to be resolved or files to be checked
for ownership. ee3efc41e6 checked this
for some image label functions, but was partially reverted in a
refactor. This finishes adding the check to each security driver's
set and restore label methods for images.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
This patch addresses https://bugzilla.redhat.com/show_bug.cgi?id=760442
When a network has any forward type other than route, nat or none, the
network configuration should be done completely external to libvirt -
libvirt only uses these types to allow configuring guests in a manner
that isn't tied to a specific host (all the host-specific information,
in particular interface names, port profile data, and bandwidth
configuration is in the network definition, and the guest
configuration only references it).
Due to a bug in the bridge network driver, libvirt was adding iptables
rules for networks with forward type='bridge' etc. any time libvirtd
was restarted while one of these networks was active.
This patch eliminates that error by only "reloading" iptables rules if
forward type is route, nat, or none.
Currently qemuDomainAssignPCIAddresses() is called to assign addresses
to PCI devices.
We need to do something similar for devices with spapr-vio addresses.
So create one place where address assignment will be done, that is
qemuDomainAssignAddresses().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
For the PPC64 pseries machine type we need to add address information
for the spapr-vty device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
In QEMU PPC64 we have a network device called "spapr-vlan". We can specify
this using the existing syntax for network devices, however libvirt
currently rejects "spapr-vlan" in virDomainNetDefParseXML() because of
the "-". Fix the code to accept "-".
* src/conf/domain_conf.c (virDomainNetDefParseXML): Allow '-' in
model name, and be more efficient.
* docs/schemas/domaincommon.rng: Limit valid model names to match code.
Based on a patch by Michael Ellerman.
When parsing ppc64 models on an x86 host an out-of-memory error message is displayed due
to it checking for retcpus being NULL. Fix this by removing the check whether retcpus is NULL
since we will realloc into this variable.
Also in the X86 model parser display the OOM error at the location where it happens.
Fix memory leak:
==27534== 24 bytes in 1 blocks are definitely lost in loss record 207 of 530
==27534== at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==27534== by 0x38EC26EC37: vasprintf (in /lib64/libc-2.13.so)
==27534== by 0x4E998E6: virVasprintf (util.c:1677)
==27534== by 0x4E999F1: virAsprintf (util.c:1695)
==27534== by 0x4F1EAAC: nodeGetInfo (nodeinfo.c:593)
==27534== by 0x47948F: qemuCapsInitCPU (qemu_capabilities.c:855)
==27534== by 0x4796B1: qemuCapsInit (qemu_capabilities.c:915)
==27534== by 0x456550: qemuCreateCapabilities (qemu_driver.c:245)
==27534== by 0x4578C4: qemudStartup (qemu_driver.c:580)
==27534== by 0x4F20886: virStateInitialize (libvirt.c:852)
==27534== by 0x420E55: daemonRunStateInit (libvirtd.c:1156)
==27534== by 0x4E94C56: virThreadHelper (threads-pthread.c:157)
Mark this leaked variable as const char * when it is passed into another
function.
Pool creates new workers dynamically. However, it is possible
for a pool to have no workers. If we want to free that pool,
we don't want to wait on quit condition as it will never be
signaled.
Add support for newly supported Intel cpu features. Newly supported
flags are: pclmuldq, dtes64, smx, fma, pdcm, movbe, xsave, osxsave and
avx. This adds support for Intel's Sandy Bridge platform.
A preparatory patch for DHCP snooping where we want to be able to
differentiate between a VM's interface using the tuple of
<VM UUID, Interface MAC address>. We assume that MAC addresses could
possibly be re-used between different networks (VLANs) thus do not only
want to rely on the MAC address to identify an interface.
At the current 'final destination' in virNWFilterInstantiate I am leaving
the vmuuid parameter as ATTRIBUTE_UNUSED until the DHCP snooping patches arrive.
(we may not post the DHCP snooping patches for 0.9.9, though)
Mostly this is a pretty trivial patch. On the lowest layers, in lxc_driver
and uml_conf, I am passing the virDomainDefPtr around until I am passing
only the VM's uuid into the NWFilter calls.
This patch cleans up return codes in the nwfilter subsystem.
Some functions in nwfilter_conf.c (validators and formatters) are
keeping their bool return for now and I am converting their return
code to true/false.
All other functions now have failure return codes of -1 and success
of 0.
[I searched for all occurences of ' 1;' and checked all 'if ' and
adapted where needed. After that I did a grep for 'NWFilter' in the source
tree.]
In some error situations, the function testDomainRestoreFlags() could
unlock the test driver mutex without first locking it. This patch
moves the lock operation earlier, so that it occurs before any
potential jump down to the unlock call.
I found this problem while auditing the test driver lock usage to
determine the cause of a hang while running the following test:
cd tests; while true; do printf x; ./undefine; done
This patch *does not* solve that problem, but we now understand its
actual source, and danpb is working on a patch.
assumptions from generic code.
This implements the minimal set of changes needed in libvirt to launch a
PowerPC-KVM based guest.
It removes x86-specific assumptions about choice of serial driver backend
from generic qemu guest commandline generation code.
It also restricts the ACPI capability to be available for an x86 or
x86_64 domain.
This is not a complete solution -- it still does not guarantee libvirt
the capability to flag non-supported options in guest XML. (Eg, an ACPI
specification in a PowerPC guest XML will still get processed, even
though qemu-system-ppc64 does not support it while qemu-system-x86_64 does.)
This drawback exists because libvirt falls back on qemu to query supported
features, and qemu '-h' blindly lists all capabilities -- irrespective
of whether they are available while emulating a given architecture or not.
The long-term solution would be for qemu to list out capabilities based
on architecture and platform -- so that libvirt can cleanly make out what
devices are supported on an arch (say 'ppc64') and platform (say, 'mac99').
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
This enables libvirt to select the correct qemu binary (qemu-system-ppc64)
for a guest vm based on arch 'ppc64'.
Also, libvirt is enabled to correctly parse the list of supported PowerPC
CPUs, generated by running 'qemu-system-ppc64 -cpu ?'
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Acked-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
/proc/cpuinfo
Libvirt at present depends on /proc/cpuinfo to gather host
details such as CPUs, cores, threads, etc. This is an architecture-
dependent approach. An alternative is to use 'Sysfs', which provides
a platform-agnostic interface to parse host CPU topology.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
On RHEL 5, with libxml2-2.6.26, the build failed with:
virsh.c: In function 'vshNodeIsSuperset':
virsh.c:11951: warning: implicit declaration of function 'xmlChildElementCount'
(or if warnings aren't errors, a link failure later on).
* src/util/xml.h (virXMLChildElementCount): New prototype.
* src/util/xml.c (virXMLChildElementCount): New function.
* src/libvirt_private.syms (xml.h): Export it.
* tools/virsh.c (vshNodeIsSuperset): Use it.
When one thread passes the buck to another thread, it uses
virCondSignal to wake up the target thread. The variable
'haveTheBuck' is not updated in a race-free manner when
this occurs. The current thread sets it to false, and the
woken up thread sets it to true. There is a window where
a 3rd thread can come in and grab the buck.
Even if this didn't lead to crashes & deadlocks, this would
still result in unfairness in the buckpassing algorithm.
A better solution is to *never* set haveTheBuck to false
when we're passing the buck. Only set it to false when there
is no further thread waiting for the buck.
* src/rpc/virnetclient.c: Only set haveTheBuck to false
if no thread is waiting
Commit fd06692544 tried to fix
a race condition in
commit fa9595003d
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Fri Nov 11 15:28:41 2011 +0000
Explicitly track whether the buck is held in remote client
Unfortunately there is a second race condition whereby the
event loop can trigger due to incoming data to read. Revert
this fix, so a complete fix for the problem can be cleanly
applied
* src/rpc/virnetclient.c: Revert fd06692544
With security_driver set to "none" in /etc/libvirt/qemu.conf,
libvirtd would crash when attempted to attach to an existing
qemu process. Only copy the security model if it actually exists.
During virDomainDestroy, QEMU may emit SHUTDOWN event as a response to
SIGTERM and since domain object is still locked, the event is processed
after the domain is destroyed. We need to ignore this event in such case
to avoid changing domain state from shutoff to shutdown.
This patch is to expose the fabric_name of fc_host class, which
might be useful for users who wants to known which fabric the
(v)HBA connects to.
The patch also adds the missed capabilities' XML schema of scsi_host,
(of course, with fabric_wwn added), and update the documents
(docs/formatnode.html.in)
Currently if you try to connect to a local libvirtd when
libvirtd is not in $PATH, you'll get an error
error: internal error invalid use of command API
This is because remoteFindDaemonPath() returns NULL, which
causes us to pass NULL into virNetSocketConnectUNIX which
in turn causes us to pass NULL into virCommandNewArgList.
Adding missing error checks improves this to
error: internal error Unable to locate libvirtd daemon in $PATH
* src/remote/remote_driver.c: Report error if libvirtd
cannot be found
* src/rpc/virnetsocket.c: Report error if caller requested
spawning of daemon, but provided no binary path
The Mingw32 linker highlighted that the symbols for virtime.h
declared in libvirt_private.syms were incorrect
* src/libvirt_private.syms: Fix virtime.h symbols
When QEMU guest finishes its shutdown sequence, qemu stops virtual CPUs
and when started with -no-shutdown waits for us to kill it using
SGITERM. Since QEMU is flushing its internal buffers, some time may pass
before QEMU actually dies. We mistakenly used "paused" state (and
events) for this which is quite confusing since users may see a domain
going to pause while they expect it to shutdown. Since we already have
"shutdown" state with "the domain is being shut down" semantics, we
should use it for this state.
However, the state didn't have a corresponding event so I created one
and called its detail as VIR_DOMAIN_EVENT_SHUTDOWN_FINISHED (guest OS
finished its shutdown sequence) with the intent to add
VIR_DOMAIN_EVENT_SHUTDOWN_STARTED in the future if we have a
sufficiently capable guest agent that can notify us when guest OS starts
to shutdown.
Otherwise connections to older libvirt abort with:
$ virsh -c qemu+ssh://host.example.com/system list
error: invalid connection pointer in virDrvSupportsFeature
error: failed to connect to the hypervisor
Tested against 0.8.3 and 0.9.8-rc2.
https://bugzilla.redhat.com/show_bug.cgi?id=648855 mentioned a
misuse of 'an' where 'a' is proper; that has since been fixed,
but a search found other problems (some were a spelling error for
'and', while most were fixed by 'a').
* daemon/stream.c: Fix grammar.
* src/conf/domain_conf.c: Likewise.
* src/conf/domain_event.c: Likewise.
* src/esx/esx_driver.c: Likewise.
* src/esx/esx_vi.c: Likewise.
* src/rpc/virnetclient.c: Likewise.
* src/rpc/virnetserverprogram.c: Likewise.
* src/storage/storage_backend_fs.c: Likewise.
* src/util/conf.c: Likewise.
* src/util/dnsmasq.c: Likewise.
* src/util/iptables.c: Likewise.
* src/xen/xen_hypervisor.c: Likewise.
* src/xen/xend_internal.c: Likewise.
* src/xen/xs_internal.c: Likewise.
* tools/virsh.c: Likewise.
virBufferContentAndReset (intentionally) returns NULL for a buffer
with no content, but it is feasible to invoke a command with an
explicit empty string.
* src/util/command.c (virCommandAddEnvBuffer): Reject empty string.
(virCommandAddArgBuffer): Allow explicit empty argument.
* tests/commandtest.c (test9): Test it.
* tests/commanddata/test9.log: Adjust.
The RPC fixups needed on Linux are also needed on cygwin, and
worked without further tweaking to the list of fixups. Also,
unlike BSD, Cygwin exports 'struct ifreq', but unlike Linux,
Cygwin lacks the ioctls that we were using 'struct ifreq' to
access. This patch allows compilation under cygwin.
* src/rpc/genprotocol.pl: Also perform fixups on cygwin.
* src/util/virnetdev.c (HAVE_STRUCT_IFREQ): Also require AF_PACKET
definition.
* src/util/virnetdevbridge.c (virNetDevSetupControlFull): Only
compile if SIOCBRADDBR works.
The pathname for the pipe for tunnelled migration is unresolvable. The
libvirt apparmor driver therefore refuses access, causing migration to
fail. If we can't resolve the path, the worst that can happen is that
we should have given permission to the file but didn't. Otherwise
(especially since this is a /proc/$$/fd/N file) the file is already open
and libvirt won't be refused access by apparmor anyway.
Also adjust virt-aa-helper to allow access to the
*.tunnelmigrate.dest.name files.
For more information, see https://launchpad.net/bugs/869553.
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Originaly, the code checked if another client is the queue and infered
ownership of the buck from that. Commit fa9595003d
added a separate variable to track the buck. That caused, that a new
call might enter claiming it has the buck, while another thread was
signalled to take the buck. This ends in two threads claiming they hold
the buck and entering poll(). This happens due to a race on waking up
threads on the client lock mutex.
This caused multi-threaded clients to hang, most prominently visible and
reproducible on python based clients, like virt-manager.
This patch causes threads, that have been signalled to take the buck to
re-check if buck is held by another thread.
This ought to fix the build if you have net/if.h but do
not have struct ifreq
* configure.ac: Check for struct ifreq in net/if.h
* src/util/virnetdev.c: Conditionalize to avoid use of
struct ifreq if it does not exist
probes.h can only be generated on Linux, and then only with dtrace
installed. If it is part of the tarball, then either 'make dist'
will fail if you don't have that setup, or we would have to start
keeping probes.h in libvirt.git. Since we only need it to be
generated when dtrace is in use, it's better to avoid shipping
it in the first place, and avoid tracking it in git.
Meanwhile, there is a build dependency - since the RPC code is
generated, it can be built early; but when dtrace is enabled, we
must ensure probes.h is built even earlier. Commit 1afcfbdd tried
to fix this, but did so in a way that added probes.h into the
tarball, and broke VPATH as well. Commit ecbca767 fixed VPATH,
but didn't fix the more fundamental problem. This patch solves
the issue by adding a dependency instead.
Tested with 'make dist' in a clean VPATH builds, for both
'./configure --without-dtrace' and './configure --with-dtrace';
all configurations were able to correctly build a tarball, and
the dtrace configuration no longer sticks probes.h in the tarball.
* src/Makefile.am (REMOTE_DRIVER_GENERATED): Don't ship probes.h;
rather, make it a dependency.
Fix a logic error, the initial value of ret = -1, if just set --config,
it will goto endjob directly without doing its really job here.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
The glibc time.h header has an undocumented __isleap macro
that we are using. Since it is undocumented & does not appear
on any other OS, stop using it and just define the macro in
libvirt code instead.
* src/util/virtime.c: Remove __isleap usage
Only one IPv4 DHCP definition is supported. Originally the code checked
for a multiple definition and returned an error, but the new domain
definition was already added to networks. This patch moves the check
before the newly defined network is added to active networks.
*src/network/bridge_driver.c: networkDefine(): - move multiple IPv4
addresses check before
definition is used.
Detected by Coverity. Leak introduced in commit c1df2c1.
Two bugs here:
1. memory leak on successful parse
2. failure to parse still returned success
Signed-off-by: Alex Jia <ajia@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Detected by Coverity. Leak introduced in commit 8866eed.
Two bugs here:
1. logfd wasn't closed on all return paths
2. if we failed to mark a domain autodestroy, then the domain
was not made transient but we still returned success
Signed-off-by: Alex Jia <ajia@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Detected by Coverity. Leak introduced in commit 673adba.
Two separate bugs here:
1. call was not freed on all error paths
2. virCondDestroy was called even if virCondInit failed
Signed-off-by: Alex Jia <ajia@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
To add support for running libvirt on PowerPC, a CPU driver for the
PowerPC platform must be added.
Most generic cpu driver routines such as CPU compare, decode, etc
are based on CPUID comparison and are not relevant for non-x86
platforms.
Here, we introduce stubs for relevant PowerPC routines invoked by libvirt.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@au.ibm.com>
filter 0-device-weight when:
- getting blkio parameters with --config
- starting up a domain
When testing with blkio, I found these issues:
(dom is down)
virsh blkiotune dom --device-weights /dev/sda,300,/dev/sdb,500
virsh blkiotune dom --device-weights /dev/sda,300,/dev/sdb,0
virsh blkiotune dom
weight : 800
device_weight : /dev/sda,200,/dev/sdb,0
# issue 1: shows 0 device weight of /dev/sdb that may confuse user
(continued)
virsh start dom
# issue 2: If /dev/sdb doesn't exist, libvirt refuses to bring the
# dom up because it wants to set the device weight to 0 of a
# non-existing device. Since 0 means no weight-limit, we really don't
# have to set it.
Prior to this patch, for a running dom, the commands:
$ virsh blkiotune dom --device-weights /dev/sda,502,/dev/sdb,498
$ virsh blkiotune dom --device-weights /dev/sda,503
$ virsh blkiotune dom
weight : 500
device_weight : /dev/sda,503
claim that /dev/sdb no longer has a non-default weight, but
directly querying cgroups says otherwise:
$ cat /cgroup/blkio/libvirt/qemu/dom/blkio.weight_device
8:0 503
8:16 498
After this patch, an explicit 0 is required to remove a device path
from the XML, and omitting a device path that was previously
specified leaves that device path untouched in the XML, to match
cgroups behavior.
* src/qemu/qemu_driver.c (parseBlkioWeightDeviceStr): Rename...
(qemuDomainParseDeviceWeightStr): ...and use correct type.
(qemuDomainSetBlkioParameters): After parsing string, modify
rather than replacing existing table.
* tools/virsh.pod (blkiotune): Tweak wording.
The next patch will make it possible to have virDomainSetBlkioParameters
leave device weights unchanged if they are not mentioned in the incoming
string, but this only works if the list of block weights does not allow
duplicate paths. Technically, a user can still confuse libvirt by
passing alternate spellings that resolve to the same device, but it
is not worth worrying about working around that kind of abuse.
* src/conf/domain_conf.c (virDomainDefParseXML): Require unique
paths.
Implement the block I/O throttle setting and getting support to qemu
driver.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Enable block I/O throttle for per-disk in XML, as the first
per-disk IO tuning parameter.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Support Block I/O Throttle setting and query to remote driver.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>