Commit Graph

8181 Commits

Author SHA1 Message Date
Michal Privoznik
96a02703da sanlock: Retry after EINPROGRESS
It may take some time for sanlock to add a lockspace. And if user
restart libvirtd service meanwhile, the fresh daemon can fail adding
the same lockspace with EINPROGRESS. Recent sanlock has
sanlock_inq_lockspace() function which should block until lockspace
changes state. If we are building against older sanlock we should
retry a few times before claiming an error. This issue can be easily
reproduced:

for i in {1..1000} ; do echo $i; service libvirtd restart; sleep 2; done
20
Stopping libvirtd daemon:                                  [FAILED]
Starting libvirtd daemon:                                  [  OK  ]
21
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
22
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

 error : virLockManagerSanlockSetupLockspace:334 : Unable to add
 lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in
 progress
2012-11-16 08:00:11 +01:00
Viktor Mihajlovski
a2b3d7cff8 qemu, lxc: Change host CPU number detection logic.
The drivers for QEMU and LXC use virNodeGetInfo only to determine
the number of host CPUs. On Linux hosts nodeGetCPUCount has less
overhead.
2012-11-15 08:48:19 -07:00
Viktor Mihajlovski
0c996c10e4 nodeinfo: enable nodeGetCPUCount for older kernels
Since /sys/devices/system/cpu/present is not available on
older kernels like on RHEL 5.x nodeGetCPUCount will
fail there. The fallback implemented is to scan for
/sys/devices/system/cpu/cpuNN entries.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-11-14 20:43:54 -07:00
Miloslav Trmač
39c814ff46 Use helper functions to format the journal iov array
This simplifies the top-level code, at the cost of using a little more
stack space.  The primary benefit is being able to send more fields
without knowing in advance how many of them, and of which types, these
fields will be, and without having to individually add buffer variables.

The code imposes an upper limit on the total number of iovs/buffers
used, and fields that wouldn't fit are silently dropped.  This is not
significant in this patch, but will affect the following one.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2012-11-14 20:20:02 -07:00
Miloslav Trmač
37f7a1faf1 Add metadata to virLogOutputFunc
... and update all users.  No change in functionality, the parameter
will be used in the next patch.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2012-11-14 19:14:07 -07:00
Miloslav Trmač
c780e9b882 Add a metadata parameter to virLog{, V}Message
... and update all users.  No change in functionality, the parameter
will be used later.

The metadata representation is as minimal as possible, but requires
the caller to allocate an array on stack explicitly.

The alternative of using varargs in the virLogMessage() callers:
* Would not allow the caller to optionally omit some metadata elements,
  except by having two calls to virLogMessage.
* Would not be as type-safe (e.g. using int vs. size_t), and the compiler
  wouldn't be able to do type checking
* Depending on parameter order:
  a) virLogMessage(..., message format, message params...,
                   metadata..., NULL)
     can not be portably implemented (parse_printf_format() is a glibc
     function)
  b) virLogMessage(..., metadata..., NULL,
                   message format, message params...)
     would prevent usage of ATTRIBUTE_FMT_PRINTF and the associated
     compiler checking.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2012-11-14 19:08:31 -07:00
Ján Tomko
a4c19459aa qemu: add bootindex for usb-host and usb-redir devices
Allow bootindex to be specified for redirected USB devices and host USB
devices.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=805414
2012-11-14 19:03:18 -07:00
Laine Stump
bc4b433098 util: fix index when building lock owners array
The "restart" function for locks allocates a new array according to
and pre-sets its length, then reads the owner pids from a JSON
document in a loop. Rather than adding each owner at a different
index, though, it repeatedly overwrites the last element of the array
with all the owners.
2012-11-14 12:43:49 -05:00
Daniel P. Berrange
3782814d4a Fix uninitialized variable in virLXCControllerSetupDevPTS
The lack of initialization of 'opts' caused a SEGV in the
cleanup: path if the root->src directory did not exist
2012-11-14 15:39:48 +00:00
Michal Privoznik
9f87247235 qemu: Don't force port=0 for SPICE
If domain uses only TLS port we don't want to add
'port=0' explicitly to command line.
2012-11-14 10:07:27 +01:00
Peter Krempa
30f1bccf33 snapshot: qemu: Fix detection of external snapshots when deleting
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.

Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
2012-11-13 20:36:26 +01:00
Peter Krempa
9576afd110 nodeinfo: Add check and workaround to guarantee valid cpu topologies
Lately there were a few reports of the output of the virsh nodeinfo
command being inaccurate. This patch tries to avoid that by checking if
the topology actually makes sense. If it doesn't we then report a
synthetic topology that indicates to the user that the host capabilities
should be checked for the actual topology.
2012-11-13 00:35:29 +01:00
Michal Privoznik
fd723164c7 AbortJob: Fix documentation
This API was never synchronous and probably doesn't even need to be.
2012-11-12 10:39:39 +01:00
Michal Privoznik
ab5e7d4977 qemu: Allow migration to be cancelled at prepare phase
Currently, if user calls virDomainAbortJob we just issue
'migrate_cancel' and hope for the best. However, if user calls
the API in wrong phase when migration hasn't been started yet
(perform phase) the cancel request is just ignored. With this
patch, the request is remembered and as soon as perform phase
starts, migration is cancelled.
2012-11-12 10:39:39 +01:00
Viktor Mihajlovski
b1c88c1476 capabilities: defaultConsoleTargetType can depend on architecture
For S390, the default console target type cannot be of type 'serial'.
It is necessary to at least interpret the 'arch' attribute
value of the os/type element to produce the correct default type.

Therefore we need to extend the signature of defaultConsoleTargetType
to account for architecture. As a consequence all the drivers
supporting this capability function must be updated.

Despite the amount of changed files, the only change in behavior is
that for S390 the default console target type will be 'virtio'.

N.B.: A more future-proof approach could be to to use hypervisor
specific capabilities to determine the best possible console type.
For instance one could add an opaque private data pointer to the
virCaps structure (in case of QEMU to hold capsCache) which could
then be passed to the defaultConsoleTargetType callback to determine
the console target type.
Seems to be however a bit overengineered for the use case...

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-11-09 09:20:59 -07:00
Peter Krempa
02cf57c0d0 qemu: Fix domain ID numbering race condition
When the libvirt daemon is restarted it tries to reconnect to running
qemu domains. Since commit d38897a5d4 the
re-connection code runs in separate threads. In the original
implementation the maximum of domain ID's (that is used as an
initializer for numbering guests created next) while libvirt was
reconnecting to the guest.

With the threaded implementation this opens a possibility for race
conditions with the thread that is autostarting guests. When there's a
guest running with id 1 and the daemon is restarted. The autostart code
is reached first and spawns the first guest that should be autostarted
as id 1. This results into the following unwanted situation:

 # virsh list
   Id    Name                           State
  ----------------------------------------------------
   1     guest1                         running
   1     guest2                         running

This patch extracts the detection code before the re-connection threads
are started so that the maximum id of the guests being reconnected to is
known.

The only semantic change created by this is if the guest with greatest ID
quits before we are able to reconnect it's ID is used anyway as the
greatest one as without this patch the greatest ID of a process we could
successfuly reconnect to would be used.
2012-11-09 00:12:38 +01:00
Philipp Hahn
e0c469e58b storage: fix broken backing chain
82507838 refactored the code to keep both the raw and canonicalized form
of the backingStore, which breaks badly when the storage pool contains a
storage volume, which is missing its backing store file:
 # ./daemon/libvirtd -l
 2012-11-07 12:43:33.279+0000: 22175: info : libvirt version: 1.0.0
 2012-11-07 12:43:33.279+0000: 22175: error : absolutePathFromBaseFile:542 : Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
 2012-11-07 12:43:33.280+0000: 22175: error : storageDriverAutostart:115 : Failed to autostart storage pool 'default': Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory

This is because virStorageFileGetMetadataFromBuf() aborts with -1 if the
filename of the backingStore can not be canonicalized:
 #0  absolutePathFromBaseFile () at util/storage_file.c:541
 #1  virStorageFileGetMetadataFromBuf () at util/storage_file.c:728
 #2  virStorageFileGetMetadataFromFD () at util/storage_file.c:932
 #3  virStorageBackendProbeTarget () at storage/storage_backend_fs.c:94
 #4  virStorageBackendFileSystemRefresh () at storage/storage_backend_fs.c:849
 #5  storagePoolStart () at storage/storage_driver.c:700
 #6  virStoragePoolCreate () at libvirt.c:12471
 ...

Treat files which miss their backing file as standalone files.

Signed-off-by: Philipp Hahn <hahn@univention.de>
2012-11-08 16:03:36 -07:00
Peter Krempa
e124f49890 qemu: Fix function header formating of 2 functions
Headers of qemuDomainSnapshotLoad and qemuDomainNetsRestart were
improperly formatted.
2012-11-08 13:45:45 +01:00
Peter Krempa
9b5a514b31 snapshot: qemu: Add support for external inactive snapshots
This patch adds support for external disk snapshots of inactive domains.
The snapshot is created by calling using qemu-img by calling:

 qemu-img create -f format_of_snapshot -o
 backing_file=/path/to/src,backing_fmt=format_of_backing_image
 /path/to/snapshot

in case the backing image format is known or probing is allowed and
otherwise:

 qemu-img create -f format_of_snapshot -o  backing_file=/path/to/src
 /path/to/snapshot

on each of the disks selected for snapshotting. This patch also modifies
the snapshot preparing function to support creating external snapshots
and to sanitize arguments. For now the user isn't able to mix external
and internal snapshots but this restriction might be lifted in the
future.
2012-11-08 11:27:34 +01:00
Michal Privoznik
a08fc66d90 qemu: Emit event if 'cont' fails
Some operations, APIs needs domain to be paused prior operation can be
performed, e.g. (managed-) save of a domain. The processors should be
restored in the end. However, if 'cont' fails for some reason, we log a
message but this is not sufficient as an event should be emitted as
well. Mgmt application can then decide what to do.
2012-11-07 12:06:09 +01:00
Peter Krempa
fb58f8e2a4 qemu: Don't corrupt pointer in qemuDomainSaveMemory()
The code that was split out into the qemuDomainSaveMemory expands the
pointer containing the XML description of the domain that it gets from
higher layers. If the pointer changes the old one is invalid and the
upper layer function tries to free it causing an abort.

This patch changes the expansion of the original string to a new
allocation and copy of the contents.
2012-11-06 14:45:27 +01:00
Martin Kletzander
9c294e6f9a esx: Yet another connection fix for 5.1
After the connection to ESX 5.1 being broken since g1e7cd39, the fix
in bab7752c helped a bit, but still missed a spot, so the connection
is now successful, but some APIs (for example defineXML) don't work.
Two cases missing are added in this patch to avoid that.
2012-11-06 11:09:00 +01:00
Michal Privoznik
0f720ab35a qemu: Add controllers in specified order
qemu is sensitive to the order of arguments passed. Hence, if a
device requires a controller, the controller cmd string must
precede device cmd string. The same apply for controllers, when
for instance ccid controller requires usb controller. So
controllers create partial ordering in which they should be added
to qemu cmd line.
2012-11-06 10:11:34 +01:00
Michal Privoznik
77b93dbc3e qemu: Wrap controllers code into dummy loop
which just re-indent code and prepare it for next patch.
2012-11-06 10:11:34 +01:00
Michal Privoznik
46325e5131 iohelper: Don't report errors on special FDs
Some FDs may not implement fdatasync() functionality,
e.g.  pipes. In that case EINVAL or EROFS is returned.
We don't want to fail then nor report any error.

Reported-by: Christophe Fergeau <cfergeau@redhat.com>
2012-11-05 16:55:42 +01:00
Peter Krempa
0dac29d89f snapshot: qemu: Remove restrictions preventing external checkpoints
Some of the pre-snapshot check have restrictions wired in regarding
configuration options that influence taking of external checkpoints.

This patch removes restrictions that would inhibit taking of such a
snapshot.
2012-11-04 20:17:57 +01:00
Peter Krempa
f569b87f51 snapshot: qemu: Add support for external checkpoints
This patch adds support to take external system checkpoints.

The functionality is layered on top of the previous disk-only snapshot
code. When the checkpoint is requested the domain memory is saved to the
memory image file using migration to file. (The user may specify to
take the memory image while the guest is live with the
VIR_DOMAIN_SNAPSHOT_CREATE_LIVE flag.)

The memory save image shares format with the image created by
virDomainSave() API.
2012-11-04 16:53:32 +01:00
Peter Krempa
b5fd404471 snapshot: qemu: Rename qemuDomainSnapshotCreateActive
Before now, libvirt supported only internal snapshots for active guests.
This patch renames this function to qemuDomainSnapshotCreateActiveInternal
to prepare the grounds for external active snapshots.
2012-11-03 15:06:09 +01:00
Peter Krempa
2a59a3d597 snapshot: qemu: Add async job type for snapshots
The new external system checkpoints will require an async job while the
snapshot is taken. This patch adds QEMU_ASYNC_JOB_SNAPSHOT to track this
job type.
2012-11-03 14:57:43 +01:00
Peter Krempa
5f75bd4bbe snapshot: Add flag to enable creating checkpoints in live state
The default behavior while creating external checkpoints is to pause the
guest while the memory state is captured. We want the users to sacrifice
space saving for creating the memory save image while the guest is live
to minimize downtime.

This patch adds a flag that causes the guest not to be paused before
taking the snapshot.
 *include/libvirt/libvirt.h.in:
    - add new paused reason: VIR_DOMAIN_PAUSED_SNAPSHOT
    - add new flag for taking snapshot: VIR_DOMAIN_SNAPSHOT_CREATE_LIVE
 *tools/virsh-domain-monitor.c:
    - add string representation for VIR_DOMAIN_PAUSED_SNAPSHOT
 *tools/virsh-snapshot.c:
    - add support for VIR_DOMAIN_SNAPSHOT_CREATE_LIVE
 *tools/virsh.pod:
    - add docs for --live option added to use
    VIR_DOMAIN_SNAPSHOT_CREATE_LIVE flag
2012-11-03 14:43:01 +01:00
Peter Krempa
2771f8b74c qemu: Split out domain memory saving code to allow reuse
The code that saves domain memory by migration to file can be reused
while doing external checkpoints of a machine. This patch extracts the
common code and places it in a separate function.
2012-11-03 11:49:41 +01:00
Peter Krempa
ec69ca14f9 qemu: Clean up snapshot retrieval to use the new helper
Two other places were left with the old code to look up snapshots.
Change them to use the snapshot lookup helper.
2012-11-03 11:26:39 +01:00
Peter Krempa
d38b934c49 cpu: Add AMD Opteron G5 cpu model 2012-11-02 20:57:17 +01:00
Peter Krempa
bafffe7a10 cpu: Add newly added cpu flags
This patch adds a few new processor feature flags. Namely:
 f16c rdrand lwp tbm topoext perfctr_core perfctr_nb fsgsbase bmi1 hle
 avx2 bmi2 erms invpcid rtm rdseed adx tce
2012-11-02 20:52:40 +01:00
Peter Krempa
d0fc6dc831 qemu: Fix possible race when pausing guest
When pausing the guest while migration is running (to speed up
convergence) the virDomainSuspend API checks if the migration job is
active before entering the job. This could cause a possible race if the
virDomainSuspend is called while the job is active but ends before the
Suspend API enters the job (this would require that the migration is
aborted). This would cause a incorrect event to be emitted.
2012-11-02 20:18:46 +01:00
Eric Blake
de76cae971 snapshot: merge pre-snapshot checks
Both system checkpoint snapshots and disk snapshots were iterating
over all disks, doing a final sanity check before doing any work.
But since future patches will allow offline snapshots to be either
external or internal, it makes sense to share the pass over all
disks, and then relax restrictions in that pass as new modes are
implemented.  Future patches can then handle external disks when
the domain is offline, then handle offline --disk-snapshot, and
finally, combine with migration to file to gain a complete external
system checkpoint snapshot of an active domain without using 'savevm'.

* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotIsAllowed): Merge...
(qemuDomainSnapshotPrepare): ...into one function.
(qemuDomainSnapshotCreateXML): Update caller.
2012-11-02 10:19:03 -06:00
Eric Blake
e260e401a5 snapshot: populate new XML info for qemu snapshots
Now that the XML supports listing internal snapshots, it is worth
always populating the <memory> and <disks> element to match.

* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Always
parse disk info and set memory info.
2012-11-02 10:11:50 -06:00
Eric Blake
f9670bf8a4 snapshot: improve disk align checking
There were not previous callers with require_match set to true.
I originally implemented this bool with the intent of supporting
ESX snapshot semantics, where the choice of internal vs. external
vs. non-checkpointable must be made at domain start, but as ESX
has not been wired up to use it yet, we might as well fix it to
work with our next qemu patch for now, and worry about any further
improvements (changing the bool to a flags argument) if the ESX
driver decides to use this function in the future.

* src/conf/snapshot_conf.c (virDomainSnapshotAlignDisks): Alter
logic when require_match is true to deal with new XML.
2012-11-02 10:02:57 -06:00
Eric Blake
4201a7ea1c snapshot: new XML for external system checkpoint
Each <domainsnapshot> can now contain an optional <memory>
element that describes how the VM state was handled, similar
to disk snapshots.  The new element will always appear in
output; for back-compat, an input that lacks the element will
assume 'no' or 'internal' according to the domain state.

Along with this change, it is now possible to pass <disks> in
the XML for an offline snapshot; this also needs to be wired up
in a future patch, to make it possible to choose internal vs.
external on a per-disk basis for each disk in an offline domain.
At that point, using the --disk-only flag for an offline domain
will be able to work.

For some examples below, remember that qemu supports the
following snapshot actions:

qemu-img: offline external and internal disk
savevm: online internal VM and disk
migrate: online external VM
transaction: online external disk

=====
<domainsnapshot>
  <memory snapshot='no'/>
  ...
</domainsnapshot>

implies that there is no VM state saved (mandatory for
offline and disk-only snapshots, not possible otherwise);
using qemu-img for offline domains and transaction for online.

=====
<domainsnapshot>
  <memory snapshot='internal'/>
  ...
</domainsnapshot>

state is saved inside one of the disks (as in qemu's 'savevm'
system checkpoint implementation).  If needed in the future,
we can also add an attribute pointing out _which_ disk saved
the internal state; maybe disk='vda'.

=====
<domainsnapshot>
  <memory snapshot='external' file='/path/to/state'/>
  ...
</domainsnapshot>

This is not wired up yet, but future patches will allow this to
control a combination of 'virsh save /path/to/state' plus disk
snapshots from the same point in time.

=====

So for 1.0.1 (and later, as needed), I plan to implement this table
of combinations, with '*' designating new code and '+' designating
existing code reached through new combinations of xml and/or the
existing DISK_ONLY flag:

domain  memory  disk   disk-only | result
-----------------------------------------
offline omit    omit   any       | memory=no disk=int, via qemu-img
offline no      omit   any       |+memory=no disk=int, via qemu-img
offline omit/no no     any       | invalid combination (nothing to snapshot)
offline omit/no int    any       |+memory=no disk=int, via qemu-img
offline omit/no ext    any       |*memory=no disk=ext, via qemu-img
offline int/ext any    any       | invalid combination (no memory to save)
online  omit    omit   off       | memory=int disk=int, via savevm
online  omit    omit   on        | memory=no disk=default, via transaction
online  omit    no/ext off       | unsupported for now
online  omit    no     on        | invalid combination (nothing to snapshot)
online  omit    ext    on        | memory=no disk=ext, via transaction
online  omit    int    off       |+memory=int disk=int, via savevm
online  omit    int    on        | unsupported for now
online  no      omit   any       |+memory=no disk=default, via transaction
online  no      no     any       | invalid combination (nothing to snapshot)
online  no      int    any       | unsupported for now
online  no      ext    any       |+memory=no disk=ext, via transaction
online  int/ext any    on        | invalid combination (disk-only vs. memory)
online  int     omit   off       |+memory=int disk=int, via savevm
online  int     no/ext off       | unsupported for now
online  int     int    off       |+memory=int disk=int, via savevm
online  ext     omit   off       |*memory=ext disk=default, via migrate+trans
online  ext     no     off       |+memory=ext disk=no, via migrate
online  ext     int    off       | unsupported for now
online  ext     ext    off       |*memory=ext disk=ext, via migrate+transaction

* docs/schemas/domainsnapshot.rng (memory): New RNG element.
* docs/formatsnapshot.html.in: Document it.
* src/conf/snapshot_conf.h (virDomainSnapshotDef): New fields.
* src/conf/domain_conf.c (virDomainSnapshotDefFree)
(virDomainSnapshotDefParseString, virDomainSnapshotDefFormat):
Manage new fields.
* tests/domainsnapshotxml2xmltest.c: New test.
* tests/domainsnapshotxml2xmlin/*.xml: Update existing tests.
* tests/domainsnapshotxml2xmlout/*.xml: Likewise.
2012-11-02 09:56:23 -06:00
Eric Blake
e66bdbb784 snapshot: simplify OOM checking during parse
* src/conf/snapshot_conf.c (virDomainSnapshotDefParseString):
Simplify OOM reporting.
2012-11-02 09:43:49 -06:00
Daniel P. Berrange
1c04f99970 Remove spurious whitespace between function name & open brackets
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-02 13:36:49 +00:00
Peter Krempa
0211fd6e04 net: Mark network persistent when assigning persistent definition
When assigning the new persistent definition for a transient network
(thus making it persistent) the network needs to be marked persistent
before actually atempting to assign the definition.
2012-11-02 13:28:40 +01:00
Peter Krempa
fa16957ccd net: Add support for changing persistent networks to transient
Until now, the network undefine API was able to undefine only inactive
networks. The restriction doesn't make sense any more so this patch
implements changing networks to transient.
2012-11-02 13:28:40 +01:00
Peter Krempa
b6dbbae128 net: Re-use checks when creating transient networks
When a transient network was created some of the checks weren't run on
the definition allowing to start invalid networks.

This patch splits out code to the network validation function and
re-uses that code when creating transient networks.
2012-11-02 13:28:40 +01:00
Peter Krempa
e87af617fc net: Remove dnsmasq and radvd files also when destroying transient nets
The network driver didn't care about config files when a network was
destroyed, just when it was undefined leaving behind files for transient
networks.

This patch splits out the cleanup code to a helper function that handles
the cleanup if the inactive network object is being removed and re-uses
this code when getting rid of inactive networks.
2012-11-02 13:28:40 +01:00
Peter Krempa
23ae3fe425 net: Move creation of dnsmasq hosts file to function starting dnsmasq
The hosts file was created in the network definition function. This
patch moves the place the file is being created to the point where
dnsmasq is being started.
2012-11-02 13:28:40 +01:00
Peter Krempa
a3258c0eb9 net: Change argument type of virNetworkObjIsDuplicate()
The argument check_active is used only as a boolean so this patch
changes the type and updates callers.
2012-11-02 13:28:39 +01:00
Peter Krempa
f823089124 conf: net: Fix deadlock if assignment of network def fails
When the assignment fails, the network object is not unlocked and next
call that would use it deadlocks.
2012-11-02 13:28:39 +01:00
Peter Krempa
947230fb56 conf: net: Fix helper for applying new network definition
When there's no new definition the helper overwrote the old one with
NULL.
2012-11-02 13:28:39 +01:00
Daniel Veillard
bd0cb27cf6 Remove a chunk which should not have been pushed as part of 1.0.0
I didn't noticed that that small old patch was still applied locally
2012-11-02 19:23:13 +08:00
Michal Privoznik
30b398d5ef logging.c: Properly indent and ignore one syntax-check rule
With our fix of mkostemp (pushed as 2b435c15) we define a macro
to compile with uclibc. However, this definition is conditional
and thus needs to be properly indented. Moreover, with this definition
sc_prohibit_mkstemp syntax-check rule keeps yelling:

  src/util/logging.c:63:# define mkostemp(x,y) mkstemp(x)
  maint.mk: use mkostemp with O_CLOEXEC instead of mkstemp

Therefore we should ignore this file for this rule.
2012-11-02 11:19:04 +01:00
Guannan Ren
1851a0c864 qemu: use default machine type if missing it in qemu command line
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=871273
when using virsh qemu-attach to attach an existing qemu process,
if it misses the -M option in qemu command line, libvirtd crashed
because the NULL value of def->os.machine in later use.

Example:
/usr/libexec/qemu-kvm -name foo \
                      -cdrom /var/lib/libvirt/images/boot.img \
                      -monitor unix:/tmp/demo,server,nowait \

error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor

This patch tries to set default machine type if the value of
def->os.machine is still NULL after qemu command line parsing.
2012-11-02 12:55:29 +08:00
Daniel Veillard
2b435c153e Release of libvirt-1.0.0
* configure.ac docs/news.html.in libvirt.spec.in: update for the new release
* po/*.po*: update from transifex, a lot of added support e.g. Indian
  languages, and regenerate
2012-11-02 12:08:11 +08:00
Eric Blake
3d0130cbcc cpumap: optimize for clients that don't need online count
It turns out that calling virNodeGetCPUMap(conn, NULL, NULL, 0)
is both useful, and with Viktor's patches, common enough to
optimize.  Since this interface hasn't been released yet, we
can change the RPC call.

A bit more background on the optimization - learning the cpu count
is a single file read (/sys/devices/system/cpu/possible), but
learning the number of online cpus can possibly trigger a file
read per cpu, depending on the age of the kernel, and all wasted
if the caller passed NULL for both arguments.

* src/nodeinfo.c (nodeGetCPUMap): Avoid bitmap when not needed.
* src/remote/remote_protocol.x (remote_node_get_cpu_map_args):
Supply two separate flags for needed arguments.
* src/remote/remote_driver.c (remoteNodeGetCPUMap): Update
caller.
* daemon/remote.c (remoteDispatchNodeGetCPUMap): Likewise.
* src/remote_protocol-structs: Regenerate.
2012-11-01 20:36:01 -06:00
Doug Goldstein
ba804d9fd1 qemu: QMP capabilities support starts with 1.2
Per the code comment in qemuCapsInitQMPBasic() and commit 43e23c7, we
should only use QMP for capabilities probing starting with 1.2 and
newer.  The old code had dead logic that probed on 1.0 and newer.

Signed-off-by: Eric Blake <eblake@redhat.com>
2012-11-01 17:50:02 -06:00
Dan Walsh
2e03b08ead Linux Containers are not allowed to create device nodes.
This needs to be done before the container starts. Turning
off the mknod capability is noticed by systemd, which will
no longer attempt to create device nodes.

This eliminates SELinux AVC messages and ugly failure messages in the journal.
2012-11-01 15:14:25 -06:00
Stefan Hajnoczi
23d47b33a2 qemu: Fix name comparison in qemuMonitorJSONBlockIoThrottleInfo()
The string comparison logic was inverted and matched the first drive
that does *not* have the name we search for.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-01 13:23:27 -06:00
Stefan Hajnoczi
04ee70bfda qemu: Keep QEMU host drive prefix in BlkIoTune
The QEMU -drive id= begins with libvirt's QEMU host drive prefix
("drive-"), which is stripped off in several places two convert between
host ("-drive") and guest ("-device") device names.

In the case of BlkIoTune it is unnecessary to strip the QEMU host drive
prefix because we operate on "info block"/"query-block" output that uses
host drive names.

Stripping the prefix incorrectly caused string comparisons to fail since
we were comparing the guest device name against the host device name.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-01 13:03:26 -06:00
Michal Privoznik
f32e3a2dd6 iohelper: fdatasync() at the end
Currently, when we are doing (managed) save, we insert the
iohelper between the qemu and OS. The pipe is created, the
writing end is passed to qemu and the reading end to the
iohelper. It reads data and write them into given file. However,
with write() being asynchronous data may still be in OS
caches and hence in some (corner) cases, all migration data
may have been read and written (not physically though). So
qemu will report success, as well as iohelper. However, with
some non local filesystems, where ENOSPACE is polled every X
time units, we may get into situation where all operations
succeeded but data hasn't reached the disk. And in fact will
never do. Therefore we ought sync caches to make sure data
has reached the block device on remote host.
2012-11-01 16:55:01 +01:00
Peter Krempa
8cd327fa7f conf: Fix private symbols exported by files in conf
Some of the functions were moved to other files but the private symbol
file wasn't tweaked to reflect that.
2012-11-01 10:21:52 +01:00
Daniel P. Berrange
6fea88a119 Fix arch detection for qemu-system-i386 with QMP
QEMU uses 'i386' for its 32-bit x86 architecture, but libvirt
wants that to be 'i686', so we must fix it up

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-01 09:16:37 +00:00
Daniel P. Berrange
6bf55a9752 Don't assume pid_t is the same size as an int
virPidFileReadPathIfAlive passed in an 'int *' where a 'pid_t *'
was expected, which breaks on Mingw64 targets. Also a few places
were using '%d' for formatting pid_t, change them to '%lld' and
force a cast to the longer type as done elsewhere in the same
file.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-01 09:16:04 +00:00
Eric Blake
4dbd6e9654 build: prefer mkostemp for multi-thread safety
https://bugzilla.redhat.com/show_bug.cgi?id=871756

Commit cd1e8d1 assumed that systems new enough to have journald
also have mkostemp; but this is not true for uclibc.

For that matter, use of mkstemp[s] is unsafe in a multi-threaded
program.  We should prefer mkostemp[s] in the first place.

* bootstrap.conf (gnulib_modules): Add mkostemp, mkostemps; drop
mkstemp and mkstemps.
* cfg.mk (sc_prohibit_mkstemp): New syntax check.
* tools/virsh.c (vshEditWriteToTempFile): Adjust caller.
* src/qemu/qemu_driver.c (qemuDomainScreenshot)
(qemudDomainMemoryPeek): Likewise.
* src/secret/secret_driver.c (replaceFile): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Likewise.
2012-10-31 10:06:10 -06:00
Martin Kletzander
10c5212b10 qemu: Fix EmulatorPinInfo without emulatorpin
https://bugzilla.redhat.com/show_bug.cgi?id=871312

Recent fixes made almost all the right steps to make emulator pinned
to the cpuset of the whole domain in case <emulatorpin> isn't
specified, but qemudDomainGetEmulatorPinInfo still reports all the
CPUs even when cpuset is specified.  This patch fixes that.
2012-10-31 16:27:02 +01:00
Peter Krempa
ca043b8c06 util: Improve error reporting from absolutePathFromBaseFile helper
There are multiple reasons canonicalize_file_name() used in
absolutePathFromBaseFile helper can fail. This patch enhances error
reporting from that helper.
2012-10-31 11:53:07 +01:00
Martin Kletzander
037a49dc66 Make non-KVM machines work with QMP probing
When there is no 'qemu-kvm' binary and the emulator used for a machine
is, for example, 'qemu-system-x86_64' that, by default, runs without
kvm enabled, libvirt still supplies '-no-kvm' option to this process,
even though it does not recognize such option (making the start of a
domain fail in that case).

This patch fixes building a command-line for QEMU machines without KVM
acceleration and is based on following assumptions:

 - QEMU_CAPS_KVM flag means that QEMU is running KVM accelerated
   machines by default (without explicitly requesting that using a
   command-line option).  It is the closest to the truth according to
   the code with the only exception being the comment next to the
   flag, so it's fixed in this patch as well.

 - QEMU_CAPS_ENABLE_KVM flag means that QEMU is, by default, running
   without KVM acceleration and in case we need KVM acceleration it
   needs to be explicitly instructed to do so.  This is partially
   true for the past (this option essentially means that QEMU
   recognizes the '-enable-kvm' option, even though it's almost the
   same).
2012-10-31 08:31:49 +01:00
Gene Czarcinski
adaa7ab653 bugfix: ip6tables rule removal
Three FORWARD chain rules are added and two INPUT chain rules
are added when a network is started but only the FORWARD chain
rules are removed when the network is destroyed.
2012-10-30 16:04:25 -06:00
Eric Blake
270a9fef37 maint: log xml during volume creation
I noticed this while answering a list question about Java bindings
of volume creation.  All other functions that take xml logged xmlDesc.

* src/libvirt.c (virStorageVolCreateXML)
(virStorageVolCreateXMLFrom): Use consistent spelling of xmlDesc,
and log the argument.
2012-10-30 14:59:31 -06:00
Laine Stump
7bafe009d9 util: do a better job of matching up pids with their binaries
This patch resolves: https://bugzilla.redhat.com/show_bug.cgi?id=871201

If libvirt is restarted after updating the dnsmasq or radvd packages,
a subsequent "virsh net-destroy" will fail to kill the dnsmasq/radvd
process.

The problem is that when libvirtd restarts, it re-reads the dnsmasq
and radvd pidfiles, then does a sanity check on each pid it finds,
including checking that the symbolic link in /proc/$pid/exe actually
points to the same file as the path used by libvirt to execute the
binary in the first place. If this fails, libvirt assumes that the
process is no longer alive.

But if the original binary has been replaced, the link in /proc is set
to "$binarypath (deleted)" (it literally has the string " (deleted)"
appended to the link text stored in the filesystem), so even if a new
binary exists in the same location, attempts to resolve the link will
fail.

In the end, not only is the old dnsmasq/radvd not terminated when the
network is stopped, but a new dnsmasq can't be started when the
network is later restarted (because the original process is still
listening on the ports that the new process wants).

The solution is, when the initial "use stat to check for identical
inodes" check for identity between /proc/$pid/exe and $binpath fails,
to check /proc/$pid/exe for a link ending with " (deleted)" and if so,
truncate that part of the link and compare what's left with the
original binarypath.

A twist to this problem is that on systems with "merged" /sbin and
/usr/sbin (i.e. /sbin is really just a symlink to /usr/sbin; Fedora
17+ is an example of this), libvirt may have started the process using
one path, but /proc/$pid/exe lists a different path (indeed, on F17
this is the case - libvirtd uses /sbin/dnsmasq, but /proc/$pid/exe
shows "/usr/sbin/dnsmasq"). The further bit of code to resolve this is
to call virFileResolveAllLinks() on both the original binarypath and
on the truncated link we read from /proc/$pid/exe, and compare the
results.

The resulting code still succeeds in all the same cases it did before,
but also succeeds if the binary was deleted or replaced after it was
started.
2012-10-30 13:28:47 -04:00
Peter Krempa
7af929d065 cpu: Fix definition of flag smap
A mild case of dyslexia caused that commit
012f9b19ef specifies wrong mask for the
smap cpu feature flag. This patch fixes that mistake.
2012-10-30 15:01:27 +01:00
Michal Privoznik
9af1b30da3 sanlock: Introduce 'user' and 'group' conf variables
through which user set under what permissions does sanlock
daemon run so libvirt will set the same permissions for
files exposed to it.
2012-10-30 10:12:10 +01:00
Vladislav Bogdanov
81af5336ac qemu: pass -usb and usb hubs earlier, so USB disks with static address are handled properly 2012-10-30 08:54:32 +01:00
Vladislav Bogdanov
8f708761c0 qemu: Do not ignore address for USB disks 2012-10-30 08:54:28 +01:00
Martin Kletzander
bab7752c0c esx: Fix connection to ESX 5.1
After separating 5.x and 5.1 versions of ESX, we forgot to add 5.1
into the list of allowed connections, so connections to 5.1 fail since
v1.0.0-rc1-5-g1e7cd39
2012-10-30 08:35:24 +01:00
Eric Blake
c047f54749 build: place attributes in correct location
Ever since commit eefb881, ATTRIBUTE_NONNULL has normally been a
no-op under gcc (since it tends to cause more bugs than it cures
given gcc's current lame implementation of the attribute).  However,
the macro is still useful to Coverity and other static-analysis
tools, but only if we use it correctly.  Coverity follows gcc's lead
in accepting function declarations with attributes at the end, but
function bodies must attach attributes to the return type.  That is,
these are valid:

void foo(void *arg) ATTRIBUTE_NONNULL(1);
void ATTRIBUTE_NONNULL(1) foo(void *arg);
void ATTRIBUTE_NONNULL(1) foo(void *arg) {}

but this is not:

void foo(void *arg) ATTRIBUTE_NONNULL(1) {}

even though you don't get a compile failure until you do static
analysis.  Bug introduced in commit 80533ca, with these symptoms:

nodeinfo.c:206: error: expected ',' or ';' before '{' token
cc1: warning: unrecognized command line option "-Wno-suggest-attribute=const"
cc1: warning: unrecognized command line option "-Wno-suggest-attribute=pure"
make[3]: *** [libvirt_driver_la-nodeinfo.lo] Error 1

* src/nodeinfo.c (virNodeParseNode): Fix syntax error when
non-null attribute is in use.
2012-10-29 16:53:44 -06:00
Eric Blake
a047a24d11 build: fix linking with systemtap probes
Commit 34e8f63a3 altered virfile.o to drag in additional symbols,
which in turn led to pulling in other .o files and eventually causing
a link failure when systemtap probes are enabled, such as:

./.libs/libvirt_util.a(libvirt_util_la-event_poll.o): In function `virEventPollRunOnce':
/home/dummy/libvirt/src/util/event_poll.c:614: undefined reference to `libvirt_event_poll_run_semaphore'
./.libs/libvirt_util.a(libvirt_util_la-event_poll.o):(.note.stapsdt+0x24): undefined reference to `libvirt_event_poll_add_handle_semaphore'

Even though libvirt_iohelper and libvirt_parthelper don't directly
use the portion of virfile.o that drags in probing, it was easier
to satisfy the linker and get the build back up, than to figure out
whether it is even possible or worth trying to disentangle the mess.

* src/Makefile.am (libvirt_iohelper_LDADD)
(libvirt_parthelper_LDADD): Use libvirt_probes.lo when needed.
2012-10-29 14:22:28 -06:00
Michal Privoznik
34e8f63a32 qemu: Report errors from iohelper
Currently, we use iohelper when saving/restoring a domain.
However, if there's some kind of error (like I/O) it is not
propagated to libvirt. Since it is not qemu who is doing
the actual write() it will not get error. The iohelper does.
Therefore we should check for iohelper errors as it makes
libvirt more user friendly.
2012-10-29 17:04:26 +01:00
Peter Krempa
cbd10126ed util: Re-format literal strings in virXMLEmitWarning
And drop a stray space at the end of the first line of the warning.
2012-10-29 15:19:26 +01:00
Ján Tomko
0b121614a2 xml: print uuids in the warning
In the XML warning, we print a virsh command line that can be used to
edit that XML. This patch prints UUIDs if the entity name contains
special characters (like shell metacharacters, or "--" that would break
parsing of the XML comment). If the entity doesn't have a UUID, just
print the virsh command that can be used to edit it.
2012-10-29 14:38:43 +01:00
Jiri Denemark
23f5e74ed3 Revert "qemu: Do not require hostuuid in migration cookie"
This reverts commit 8d75e47ede.

Libvirt was never released with support for migration cookies without
hostuuid.
2012-10-29 09:04:27 +01:00
Cole Robinson
9a2975786b qemu: Fix domxml-to-native network model conversion
https://bugzilla.redhat.com/show_bug.cgi?id=636832
2012-10-27 12:20:49 -04:00
Eric Blake
dd0a7040f7 build: typo fix for qemu cpu affinity
Introduced in commit 0039a32f.

* src/qemu/qemu_process.c (qemuPrepareCpumap): s/covert/convert/
2012-10-27 08:09:51 -06:00
Eric Blake
5a3501be9e blockjob: relabel entire existing chain
When using block copy to pivot over to a new chain, the backing files
for the new chain might still need labeling (particularly if the user
passes --reuse-ext with a relative backing file name).  Relabeling a
file that is already labeled won't hurt, so this just labels the entire
chain at the point of the pivot.  Doing the relabel of the chain uses
the fact that we already safely probed the file type of an external
file at the start of the block copy.

* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before
asking qemu to pivot.
2012-10-27 07:43:39 -06:00
Eric Blake
35c7701c64 blockjob: allow mirroring under SELinux and cgroup
Use the recent addition of qemuDomainPrepareDiskChainElement to
obtain locking manager lease, permit a block device through cgroups,
and set the SELinux label; then audit the fact that we hand a new
file over to qemu.  Alas, releasing the lease and label at the end
of the mirroring is a trickier prospect (we would have to trace the
backing chain of both source and destination, and be sure not to
revoke rights to any part of the chain that is shared), so for now,
virDomainBlockJobAbort still leaves things with additional access
granted (as block-pull and block-commit have the same problem of
not clamping access after completion, a future cleanup would cover
all three commands).

* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling.
2012-10-27 07:43:39 -06:00
Eric Blake
8ee5073c1e blockjob: allow for existing files in block-copy
Support the REUSE_EXT flag, in part by copying sanity checks from
snapshot code.  This code introduces a case of probing an external
file for its type; such an action would be a security risk if the
existing file is supposed to be raw but the contents resemble some
other format; however, since the virDomainBlockRebase API has a
flag to force treating the file as raw rather than probe, we can
assume that probing is safe in all other instances.  Besides, if
we don't probe or force raw, then qemu will.

* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT
flag.
(qemuDomainBlockCopy): Wire up flag, and add some sanity checks.
2012-10-27 07:43:39 -06:00
Eric Blake
c1eb38053d blockjob: implement block copy for qemu
Minimal patch to wire up all the pieces in the previous patches
to actually enable a block copy job.  By minimal, I mean that
qemu creates the file (that is, no REUSE_EXT flag support yet),
SELinux must be disabled, a lock manager is not informed, and the
audit logs aren't updated.  But those will be added as
improvements in future patches.

This patch is designed so that if we ever add a future API
virDomainBlockCopy with more bells and whistles (such as letting
the user specify a destination image format different than the
source), where virDomainBlockRebase is a wrapper around the
simpler portions of the new functionality, then the new API can
just reuse the new qemuDomainBlockCopy function and already
support _SHALLOW and _REUSE_EXT flags.  Also note that libvirt.c
already filtered the new flags if _COPY is not present, so that
we are not impacting the case of BlockRebase being a wrapper
around BlockPull.

* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
(qemuDomainBlockRebase): Call it when appropriate.
2012-10-27 07:43:39 -06:00
Eric Blake
400ac797ef blockjob: make block pivot safer
Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a block-job-complete command; worse, it
is also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process.  A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.

At the time of this patch, the design for persistent bitmap has not
been clarified, so a followup patch will be needed once qemu
actually figures out how to expose it, and we figure out how to use
it.  In the meantime, we have a solution that avoids the worst of
the problem.  [This problem was first analyzed with the RHEL 6.3
__com.redhat_drive-reopen command; which partly explains why
upstream qemu 1.3 ditched the drive-reopen idea and went with
block-job-complete plus persistent bitmap instead.]

If we surround 'drive-reopen' with a pause/resume pair, then we can
guarantee that the guest cannot modify either source or destination
files in the window of libvirtd uncertainty, and the management app
is guaranteed that either libvirt knows the outcome and reported it
correctly; or that on libvirtd restart, the guest will still be
paused and that the qemu process cannot have disappeared due to
guest shutdown; and use that as a clue that the management app must
implement recovery protocol, with both source and destination files
still being in sync and with 'query-block' still being an option as
part of that recovery.  My testing shows that the pause window will
typically be only a fraction of a second.

* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
2012-10-27 07:43:38 -06:00
Eric Blake
eaba79d22e blockjob: support pivot operation on cancel
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).

This patch intentionally avoids SELinux, lock manager, and audit
actions.  Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands when libvirtd restarts.  This
patch is enough to test the common case of success when used
correctly, while saving the subtleties of proper cleanup for
worst-case errors for later.

When a mirror job is started, cancelling the job safely reverts back
to the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel).  Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.

Ideas for future enhancements via new flags:

Depending on when persistent bitmap support is added, it may be
worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up
front if we detect an older qemu with risky pivot operation.

Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file.  There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time.  And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain.  This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regards to guest I/O.

* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
2012-10-27 07:43:38 -06:00
Eric Blake
edecd45c78 blockjob: return appropriate event and info
Handle the new type of block copy event and info.  Of course,
this patch does nothing until a later patch actually allows the
creation/abort of a block copy job.

* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New
block job status.
* src/libvirt.c (virDomainBlockRebase): Document the event.
* src/qemu/qemu_monitor_json.c (eventHandlers): New event.
(qemuMonitorJSONHandleBlockJobReady): New function.
(qemuMonitorJSONGetBlockJobInfoOne): Translate new job type.
(qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize
the event to minimize snooping.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful
info query to save effort on a pivot request.
2012-10-27 07:43:38 -06:00
Eric Blake
b3822ed04a blockjob: react to active block copy
For now, disk migration via block copy job is not implemented in
libvirt.  But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact.  Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need.  Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.

But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.

We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain.  This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.

The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.

* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
2012-10-27 07:43:38 -06:00
Eric Blake
6d264c9182 blockjob: add qemu capabilities related to block jobs
Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror'
and 'block-job-complete'[1], which can drive live block copy and
storage migration.  [Additionally, RHEL 6.3 had backported an earlier
version of most of the same functionality, but under the names
'__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with
slightly different JSON arguments, and has been using patches similar
to these upstream patches for several months now.]

The libvirt API virDomainBlockRebase as already committed for 0.9.12
is flexible enough to expose the basics of block copy, but some
additional features in the 'drive-mirror' qemu command, such as
setting error policy, setting granularity, or using a persistent
bitmap, may later require a new libvirt API virDomainBlockCopy.  I
will wait to add that API until we know more about what qemu 1.3
will finally provide.

This patch caters only to the upstream qemu 1.3 interface, although
I have proven that the changes for RHEL 6.3 can be isolated to
just qemu_monitor_json.c, and the rest of this series will
gracefully handle either interface once the JSON differences are
papered over in a downstream patch.

For consistency with other block job commands, libvirt must handle
the bandwidth argument as MiB/sec from the user, even though qemu
exposes the speed argument as bytes/sec; then again, qemu rounds
up to cluster size internally, so using MiB hides the worst effects
of that rounding if you pass small numbers.

[1]https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04123.html

* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR)
(QEMU_CAPS_DRIVE_REOPEN): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror)
(qemuMonitorDrivePivot): Declare them.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): New passthroughs.
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): Declare them.
2012-10-27 07:43:37 -06:00
Laine Stump
def31e4c58 qemu: fix attach/detach of netdevs with matching mac addrs
This resolves:

   https://bugzilla.redhat.com/show_bug.cgi?id=862515

which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.

(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)

A synopsis of the problems:

1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.

2) you *can* do a live attach-interface of such a device.

3) you *can* directly edit a domain and put in two devices with
matching mac addresses.

4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.

5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.

Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:

1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.

2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.

This patch does three things:

1) removes the check for duplicate mac address during a persistent
netdev attach.

2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.

Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)

3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.

Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device

* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
  * added new virDomainNetFindIdx function
  * removed now unused virDomainNetIndexByMac and
    virDomainNetRemoveByMac

* src/qemu/qemu_driver.c
  * remove check for duplicate max from qemuDomainAttachDeviceConfig
  * use virDomainNetFindIdx/virDomainNetRemove instead
    of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
  * use virDomainNetFindIdx instead of virDomainIndexByMac
    in qemuDomainUpdateDeviceConfig

* src/qemu/qemu_hotplug.c
  * use virDomainNetFindIdx instead of a homespun loop in
    qemuDomainDetachNetDevice.

* tools/virsh-domain.c: modified detach-interface command as described
    above
2012-10-26 20:47:54 -04:00
Eric Blake
4fbf322fe9 cpustat: fix regression when cpus are offline
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones.  So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu.  Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:

$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()

error: An error occurred, but the cause is unknown

Cleaning this up unraveled a chain of other unused variables.

* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus.  Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
2012-10-26 15:34:52 -06:00
Eric Blake
60f54f6146 build: silence compiler warning about signedness
Commit 246143b fixed a warning on older gcc, but caused a warning
on newer gcc.

../../src/rpc/virnetserverservice.c: In function 'virNetServerServiceNewPostExecRestart':
../../src/rpc/virnetserverservice.c:277:41: error: pointer targets in passing argument 3 of 'virJSONValueObjectGetNumberUint' differ in signedness [-Werror=pointer-sign]

* src/rpc/virnetserverservice.c: Use correct types.
2012-10-26 14:29:51 -06:00
Eric Blake
246143b69f build: fix type-punning bug
With older gcc and 64-bit size_t, the compiler issues a real warning:
rpc/virnetserverservice.c:277: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Introduced in commit 0cc79255.  Depending on machine endianness,
this warning represents a real bug that could mis-interpret the
value by a factor of 2^32.  I don't know why I couldn't get newer
gcc to report the same warning message.

* src/rpc/virnetserverservice.c
(virNetServerServiceNewPostExecRestart): Use temporary instead.
2012-10-26 13:00:27 -06:00
Laine Stump
73ebd86d73 parallels: fix build for some older compilers
Found this when building on RHEL5:

parallels/parallels_storage.c: In function 'parallelsStorageOpen':
parallels/parallels_storage.c:180: error: 'for' loop initial declaration used outside C99 mode

(and similar error in parallels_driver.c). This was in spite of
configuring with "-Wno-error".
2012-10-26 13:23:56 -04:00
Cole Robinson
eba36a3878 daemon: Fix LIBVIRT_DEBUG=1 default output
This commit changes the behavior of LIBVIRT_DEBUG=1 libvirtd:

$ git show 7022b09111
commit 7022b09111
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Thu Sep 27 13:13:09 2012 +0100

    Automatically enable systemd journal logging

    Probe to see if the systemd journal is accessible, and if
    so enable logging to the journal by default, rather than
    stderr (current default under systemd).

Previously  'LIBVIRT_DEBUG=1 /usr/sbin/libvirtd' would show all debug
output to stderr, now it send debug output to the journal.

Only use the journal by default if running in daemon mode, or
if stdin is _not_ a tty. This should make libvirtd launched from
systemd use the journal, but preserve the old behavior in most
situations.
2012-10-25 16:46:23 -04:00
Laine Stump
d8aae15aa1 network: fix networkValidate check for default portgroup and vlan
This was found during testing of the fix for:

   https://bugzilla.redhat.com/show_bug.cgi?id=868483

networkValidate was supposed to check for the existence of multiple
portgroups and report an error if this was encountered. It did, but
there were two problems:

1) even though it logged an error, it still returned success, allowing
the operation to continue.

2) It could exit the portgroup checking loop early (or possibly not
even do it once) if a vlan tag was supplied in the base network config
or one of the portgroups.

This patch fixes networkValidate to return failure in addition to
logging the error, and also changes it to not exit the portgroup
checking loop early. The logic was a bit off in the checking for vlan
anyway, and it's intertwined with fixing the early loop exit, so I
fixed that as well. Now it correctly checks for combinations where a
<virtualport> is specified in the base network def and <vlan> is given
in a portgroup, as well as the opposite (<vlan> in base network def
and <virtualport> in portgroup), and ignores the case of a disallowed
vlan when using *no* portgroup if there is a default portgroup (since
in that case there is no way to not use any portgroup).
2012-10-25 16:32:04 -04:00
Viktor Mihajlovski
e3ba67037b virNodeGetCPUMap: Implement driver support
Driver support added for:
- test: pretending 8 host CPUS, 3 being online
- qemu, lxc, openvz, uml: using nodeGetCPUMap

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-10-25 11:20:15 -06:00
Viktor Mihajlovski
d34439c9e4 virNodeGetCPUMap: Implement support function in nodeinfo
Added an implemention of virNodeGetCPUMap to nodeinfo.c,
(nodeGetCPUMap) which can be used by all drivers for a Linux
hypervisor host.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-10-25 11:20:08 -06:00
Eric Blake
2f4c5338a6 nodeinfo: improve probing node cpu bitmap
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.

While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.

* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
2012-10-25 11:20:08 -06:00
Eric Blake
0711c4b74d bitmap: add virBitmapCountBits
Sometimes it's handy to know how many bits are set.

* src/util/bitmap.h (virBitmapCountBits): New prototype.
(virBitmapNextSetBit): Use correct type.
* src/util/bitmap.c (virBitmapNextSetBit): Likewise.
(virBitmapSetAll): Maintain invariant of clear tail bits.
(virBitmapCountBits): New function.
* src/libvirt_private.syms (bitmap.h): Export it.
* tests/virbitmaptest.c (test2): Test it.
2012-10-25 11:19:23 -06:00
Jiri Denemark
0111b409a3 Fix build with apparmor
Recent storage patches changed signature of virStorageFileGetMetadata
and replaced chain with backingChain in virDomainDiskDef.
2012-10-25 10:21:57 +02:00
Matthias Bolte
1e7cd39511 esx: Update version checks for vSphere 5.1
Also remove warnings for upcoming versions. There hadn't been any
compatibility problems with new ESX version over the whole lifetime
of the ESX driver, so I don't expect any in the future.

Update documentation to mention vSphere 5.x support.
2012-10-24 19:50:28 +02:00
Peter Krempa
012f9b19ef cpu: Add recently added cpu feature flags.
Qemu has added some new feature flags. This patch adds them to libvirt.

The new features are for the cpuid function 0x7 that takes an argument
in the ecx register. Currently only 0x0 is used as the argument so I was
lazy and I just clear the registers to 0 before calling cpuid. In future
when there maybe will be some other possible arguments, we will need to
improve the cpu detection code to take this into account.
2012-10-24 17:36:03 +02:00
Osier Yang
a6bd7c22ea qemu: Prohibit chaning affinity of domain process if placement is 'auto'
On one hand, numad probably will manage the affinity of domain process
dynamically in future. On the other hand, even numad won't manage it,
it still could confusion. Let's make things simpler enough to avoid
the lair for now.
2012-10-24 22:26:11 +08:00
Osier Yang
bb81021bfe qemu: Keep the affinity when creating cgroup for emulator thread
When the cpu placement model is "auto", it sets the affinity for
domain process with the advisory nodeset from numad, however,
creating cgroup for the domain process (called emulator thread
in some contexts) later overrides that with pinning it to all
available pCPUs.

How to reproduce:

  * Configure the domain with "auto" placement for <vcpu>, e.g.
    <vcpu placement='auto'>4</vcpu>
  * % virsh start dom
  * % cat /proc/$dompid/status

Though the emulator cgroup cause conflicts, but we can't simply
prohibit creating it, as other tunables are still useful, such
as "emulator_period", which is used by API
virDomainSetSchedulerParameter. So this patch doesn't prohibit
creating the emulator cgroup, but inherit the nodeset from numad,
and reset the affinity for domain process.

* src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
                          to accept the passed nodenet
* src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
2012-10-24 21:46:24 +08:00
Osier Yang
0039a32fca qemu: Add helper to prepare cpumap for affinity setting
Abstract the codes to prepare cpumap into a helper a function,
which can be used later.

* src/qemu/qemu_process.h: Declare qemuPrepareCpumap
* src/qemu/qemu_process.c: Implement qemuPrepareCpumap, and use it.
2012-10-24 21:24:10 +08:00
Viktor Mihajlovski
d804d35fac virNodeGetCPUMap: Implement wire protocol.
- Defined the wire protocol format for virNodeGetCPUMap and its
  arguments
- Implemented remote method invocation (remoteNodeGetCPUMap)
- Implemented method dispatcher (remoteDispatchNodeGetCPUMap)

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2012-10-23 18:46:48 -06:00
Viktor Mihajlovski
7ecc1d814a virNodeGetCPUMap: Define public API.
Adding a new API to obtain information about the
host node's present, online and offline CPUs.

int virNodeGetCPUMap(virConnectPtr conn,
                     unsigned char **cpumap,
                     unsigned int *online,
                     unsigned int flags);

The function will return the number of CPUs present on the host
or -1 on failure;
If cpumap is non-NULL virNodeGetCPUMap will allocate an array
containing a bit map representation of the online CPUs. It's
the callers responsibility to deallocate cpumap using free().
If online is non-NULL, the variable pointed to will contain
the number of online host node CPUs.
The variable flags has been added to support future extensions
and must be set to 0.

Extend the driver structure by nodeGetCPUMap entry in support of the
new API virNodeGetCPUMap.
Added implementation of virNodeGetCPUMap to libvirt.c

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2012-10-23 18:46:47 -06:00
Kyle Mestery
2f3e2c0c43 qemu_migration: Transport OVS per-port data during live migration
Transport Open vSwitch per-port data during live
migration by using the utility functions
virNetDevOpenvswitchGetMigrateData() and
virNetDevOpenvswitchSetMigrateData().

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
2012-10-23 15:26:04 -04:00
Kyle Mestery
f6a2f97eb9 openvswitch: Add utility functions for getting and setting Open vSwitch per-port data
Add utility functions for Open vSwitch to both save
per-port data before a live migration, and restore the
per-port data after a live migration.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
2012-10-23 15:26:04 -04:00
Kyle Mestery
694d0c520b qemu_migration: Add hooks to transport network data during migration
Add the ability for the Qemu V3 migration protocol to
include transporting network configuration. A generic
framework is proposed with this patch to allow for the
transfer of opaque data.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Laine Stump <laine@laine.org>
2012-10-23 15:26:04 -04:00
Jim Fehlig
9785f2b6f2 Fix detection of Xen sysctl version 9
In commit 371ddc98, I mistakenly added the check for sysctl
version 9 after setting the hypercall version to 1, which will
fail with

error : xenHypervisorDoV1Op:967 : Unable to issue hypervisor
ioctl 3166208: Function not implemented

This check should be included along with the others that use
hypercall version 2.
2012-10-23 11:18:20 -06:00
Cole Robinson
767be8be72 selinux: Don't fail RestoreAll if file doesn't have a default label
When restoring selinux labels after a VM is stopped, any non-standard
path that doesn't have a default selinux label causes the process
to stop and exit early. This isn't really an error condition IMO.

Of course the selinux API could be erroring for some other reason
but hopefully that's rare enough to not need explicit handling.

Common example here is storing disk images in a non-standard location
like under /mnt.
2012-10-23 11:45:24 -04:00
Eric Blake
add633bdf9 build: print uids as unsigned
Reported by Michal Privoznik.

* src/security/security_dac.c (virSecurityDACGenLabel): Use
correct format.
2012-10-23 08:38:33 -06:00
Ján Tomko
9b704ab823 xml: omit domain name from comment if it contains double hyphen
We put a comment containing "virsh edit <domain_name>" at the start of
the XML. W3C recommendation forbids the use of "--" in comments [1] and
libvirt can't parse it either. This patch omits the domain name if it
contains a double hyphen.

[1] http://www.w3.org/TR/REC-xml/#sec-comments
2012-10-23 14:24:31 +02:00
Ján Tomko
b326765c80 storage: don't shadow global 'wait' declaration
Rename the 'wait' parameter to 'loop'.
This silences the warning:
storage/storage_backend.c:1348:34: error: declaration of 'wait' shadows
a global declaration [-Werror=shadow]
and fixes the build with -Werror.
--
Note: loop is pool backwards.
2012-10-23 13:56:59 +02:00
Eric Blake
33eaebe48e snapshot: sanity check when reusing file for snapshot
The snapshot code when reusing an existing file had hard-to-read
logic, as well as a missing sanity check: REUSE_EXT should require
the destination to already be present.

* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare): Require
destination on REUSE_EXT, rename variable for legibility.
2012-10-22 15:10:16 -06:00
Eric Blake
23a4df886d build: use correct printf types for uid/gid
Fixes a build failure on cygwin:
cc1: warnings being treated as errors
security/security_dac.c: In function 'virSecurityDACSetProcessLabel':
security/security_dac.c:862:5: error: format '%u' expects type 'unsigned int', but argument 7 has type 'uid_t' [-Wformat]
security/security_dac.c:862:5: error: format '%u' expects type 'unsigned int', but argument 8 has type 'gid_t' [-Wformat]

* src/security/security_dac.c (virSecurityDACSetProcessLabel)
(virSecurityDACGenLabel): Use proper casts.
2012-10-22 14:41:00 -06:00
Cole Robinson
77eff5eeb2 storage: Don't do wait loops from VolLookupByPath
virStorageVolLookupByPath is an API call that virt-manager uses
quite a bit when dealing with storage. This call use BackendStablePath
which has several usleep() heuristics that can be tripped up
and hang virt-manager for a while.

Current example: an empty mpath pool pointing to /dev/mapper makes
_any_ calls to virStorageVolLookupByPath take 5 seconds.

The sleep heuristics are actually only needed in certain cases
when we are waiting for new storage to appear, so let's skip the
timeout steps when calling from LookupByPath.
2012-10-22 16:15:12 -04:00
Cole Robinson
e58dfad4a4 qemu: Don't use -enable-nesting with qemu 1.2.0+
Since the option doesn't exist. Fixes booting with
cpu mode='host-model' and qemu 1.2.0
2012-10-22 16:15:12 -04:00
Doug Goldstein
2da776b1d6 qemu: Don't blindly assume VNC is supported
Currently it's assumed that qemu always supports VNC, however it is
definitely possible to compile qemu without VNC support so we should at
the very least check for it and handle that correctly.
2012-10-22 23:16:17 +08:00
Eric Blake
d9d77bfa80 storage: let format probing work on root-squash NFS
Yet another instance of where using plain open() mishandles files
that live on root-squash NFS, and where improving the API can
improve the chance of a successful probe.

* src/util/storage_file.h (virStorageFileProbeFormat): Alter
signature.
* src/util/storage_file.c (virStorageFileProbeFormat): Use better
method for opening file.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Update caller.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
2012-10-22 09:04:57 -06:00
Ján Tomko
b6ab7a067f migrate: v2: use VIR_DOMAIN_XML_MIGRATABLE when available
In v2 migration protocol, XML is obtained by calling domainGetXMLDesc.
This includes the default USB controller in XML, which breaks migration
to older libvirt (before 0.9.2).

Commit 409b5f5495
    qemu: Emit compatible XML when migrating a domain
only fixed this for v3 migration.

This patch uses the new VIR_DOMAIN_XML_MIGRATABLE flag (detected by
VIR_DRV_FEATURE_XML_MIGRATABLE) to obtain XML without the default controller,
enabling backward v2 migration.
2012-10-22 10:48:50 +02:00
Michal Privoznik
508451e4ad qemu: set seamless migration capability
As we switched to setting capabilities based on QMP communication,
qemu seamless-migration capability was not set. In the -help output
this knob is called seamless-migration=[on|off]. The equivalent in
QMP world is SPICE_MIGRATE_COMPLETED event (qemu upstream commit
2fdd16e2).
2012-10-22 10:09:47 +02:00
Osier Yang
b0f1ba47dd qemu: Fix the unused parameter which causes the build failure 2012-10-22 15:51:13 +08:00
Osier Yang
5828080f71 qemu: Cleanup the unused 'nodeinfo'
"nodeinfo" is not used in these two functions, and it's waste
of goto in qemuProcessSetEmulatorAffinites
2012-10-22 15:12:57 +08:00
Cole Robinson
b62f9b99dd Log parameters passed to virFileMakePath 2012-10-21 13:21:50 -04:00
Cole Robinson
7fcf8d9d69 Log file name passed to virConfReadFile 2012-10-21 13:21:50 -04:00
Laine Stump
6f8a8b30c9 network: don't allow multiple default portgroups
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=868483

virNetworkUpdate, virNetworkDefine, and virNetworkCreate all three
allow network definitions to contain multiple <portgroup> elements
with default='yes'. Only a single default portgroup should be allowed
for each network.

This patch updates networkValidate() (called by both
virNetworkCreate() and virNetworkDefine()) and
virNetworkDefUpdatePortGroup (called by virNetworkUpdate() to not
allow multiple default portgroups.
2012-10-20 21:29:19 -04:00
Laine Stump
1cb1f9dabf network: always create dnsmasq hosts and addnhosts files, even if empty
This fixes the problem reported in:

  https://bugzilla.redhat.com/show_bug.cgi?id=868389

Previously, the dnsmasq hosts file (used for static dhcp entries, and
addnhosts file (used for additional dns host entries) were only
created/referenced on the dnsmasq commandline if there was something
to put in them at the time the network was started. Once we can update
a network definition while it's active (which is now possible with
virNetworkUpdate), this is no longer a valid strategy - if there were
0 dhcp static hosts (resulting in no reference to the hosts file on the
commandline), then one was later added, the commandline wouldn't have
linked dnsmasq up to the file, so even though we create it, dnsmasq
doesn't pay any attention.

The solution is to just always create these files and reference them
on the dnsmasq commandline (almost always, anyway). That way dnsmasq
can notice when a new entry is added at runtime (a SIGHUP is sent to
dnsmasq by virNetworkUdpate whenever a host entry is added or removed)

The exception to this is that the dhcp static hosts file isn't created
if there are no lease ranges *and* no static hosts. This is because in
this case dnsmasq won't be setup to listen for dhcp requests anyway -
in that case, if the count of dhcp hosts goes from 0 to 1, dnsmasq
will need to be restarted anyway (to get it listening on the dhcp
port). Likewise, if the dhcp hosts count goes from 1 to 0 (and there
are no dhcp ranges) we need to restart dnsmasq so that it will stop
listening on port 67. These special situations are handled in the
bridge driver's networkUpdate() by checking for ((bool)
nranges||nhosts) both before and after the update, and triggering a
dnsmasq restart if the before and after don't match.
2012-10-20 21:29:19 -04:00
Laine Stump
78fab2770b network: free/null newDef if network fails to start
https://bugzilla.redhat.com/show_bug.cgi?id=866364

pointed out a crash due to virNetworkObjAssignDef free'ing
network->newDef without NULLing it afterward. A fix for this is in
upstream commit b7e9202401. While the
NULLing of newDef was a legitimate fix, newDef should have already
been empty (NULL) anyway (as indicated in the comment that was deleted
by that commit).

The reason that newDef had a non-NULL value (i.e. the root cause) was
that networkStartNetwork() had failed after populating
network->newDef, but then neglected to free/NULL newDef in the
cleanup.

(A bit of background here: network->newDef should contain the
persistent config of a network when a network is active (and of course
only when it is persisten), and NULL at all other times. There is also
a network->def which should contain the persistent definition of the
network when it is inactive, and the current live state at all other
times. The idea is that you can make changes to network->newDef which
will take effect the next time the network is restarted, but won't
mess with the current state of the network (virDomainObj has a similar
pair of virDomainDefs that behave in the same fashion). Personally I
think there should be a network->live and network->config, and the
location of the persistent config should *always* be in
network->config, but that's for a later cleanup).

Since I love things to be symmetric, I created a new function called
virNetworkObjUnsetDefTransient(), which reverses the effects of
virNetworkObjSetDefTransient(). I don't really like the name of the
new function, but then I also didn't really like the name of the old
one either (it's just named that way to match a similar function in
the domain conf code).
2012-10-20 02:43:16 -04:00
Eric Blake
a172dfbe2e blockjob: avoid segv on early error
Gcc with optimization warns:
../../src/qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
../../src/qemu/qemu_driver.c:12813:46: error: 'disk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
../../src/qemu/qemu_driver.c:12698:25: note: 'disk' was declared here
cc1: all warnings being treated as errors

so obviously I had only been testing with optimization off.

* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Guard cleanup.
2012-10-19 21:17:00 -06:00
Eric Blake
2e43cb8e90 blockjob: properly label disks for qemu block-commit
I finally have all the pieces in place to perform a block-commit with
SELinux enforcing.  There's still missing cleanup work when the commit
completes, but doing that requires tracking both the backing chain and
the base and top files within that chain in domain XML across libvirtd
restarts.  Furthermore, from a security standpoint, once you have
granted access, you must assume any damage that can be done will be
done; later revoking access is nice to minimize the window of damage,
but less important as it does not affect the fact that damage can be
done in the first place.  Therefore, deferring the revoke efforts until
we have better XML tracking of what chain operations are in effect,
including across a libvirtd restart, is reasonable.

* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Label disks as
needed.
(qemuDomainPrepareDiskChainElement): Cast away const.
2012-10-19 17:56:39 -06:00
Eric Blake
35a2f5bc52 blockjob: refactor qemu disk chain permission grants
Previously, snapshot code did its own permission granting (lock
manager, cgroup device controller, and security manager labeling)
inline.  But now that we are adding block-commit and block-copy
which also have to change permissions, it's better to reuse
common code for the task.  While snapshot should fall back to
no access if read-write access failed, block-commit will want to
fall back to read-only access.  The common code doesn't know
whether failure to grant read-write access should revert to no
access (snapshot, block-copy) or read-only access (block-commit).
This code can also be used to revoke access to unused files after
block-pull.

It might be nice to clean things up in a future patch by adding
new functions to the lock manager, cgroup manager, and security
manager that takes a single file name and applies context of a
disk to that file, rather than the current semantics of applying
context to the entire chain already associated to a disk.  That
way, we could avoid the games this patch plays of temporarily
swapping out the disk->src and related fields of the disk.  But
that would involve more code changes, so this patch really is
the smallest hack for doing the necessary work; besides, this
patch is more or less code motion (the hack was already employed
by the snapshot creation code, we are just making it reusable).

* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Refactor labeling hacks...
(qemuDomainPrepareDiskChainElement): ...into new function.
2012-10-19 17:49:06 -06:00
Eric Blake
0a220e2225 blockjob: implement shallow commit flag in qemu
Now that we can crawl the chain of backing files, we can do
argument validation and implement the 'shallow' flag.  In
testing this, I discovered that it can be handy to pass the
shallow flag and an explicit base, as a means of validating
that the base is indeed the file we expected.

* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Crawl through
chain to implement shallow flag.
* src/libvirt.c (virDomainBlockCommit): Relax API.
2012-10-19 17:35:11 -06:00
Eric Blake
2cbc1fd892 blockjob: wire up online qemu block-commit
This is the bare minimum to kick off a block commit.  In particular,
flags support is missing (shallow requires us to crawl the backing
chain to determine the file name to pass to the qemu monitor command;
delete requires us to track what needs to be deleted at the time
the completion event fires).  Also, we are relying on qemu to do
error checking (such as validating 'top' and 'base' as being members
of the backing chain), including the fact that the current qemu code
does not support committing the active layer (although it is still
planned to add that before qemu 1.3).  Since the active layer won't
change, we have it easy and do not have to alter the domain XML.
Additionally, this will fail if SELinux is enforcing, because we fail
to grant qemu proper read/write access to the files it will modify.

* src/qemu/qemu_driver.c (qemuDomainBlockCommit): New function.
(qemuDriver): Register it.
2012-10-19 17:35:11 -06:00
Eric Blake
3f38c7e3a9 blockjob: manage qemu block-commit monitor command
qemu 1.3 will be adding a 'block-commit' monitor command, per
qemu.git commit ed61fc1.  It matches nicely to the libvirt API
virDomainBlockCommit.

* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_COMMIT): New bit.
* src/qemu/qemu_capabilities.c (qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_monitor.h (qemuMonitorBlockCommit): New prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockCommit): Implement it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise.
(qemuMonitorJSONHandleBlockJobImpl)
(qemuMonitorJSONGetBlockJobInfoOne): Handle new event type.
2012-10-19 17:35:11 -06:00
Eric Blake
67aea3fb78 blockjob: remove unused parameters after previous patch
Minor cleanup made possible by previous simplifications.

* src/qemu/qemu_cgroup.h (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Alter signature.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup, qemuSetupCgroup): Update all uses.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice)
(qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive)
(qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Likewise.
2012-10-19 17:35:11 -06:00
Eric Blake
38c4a9cc40 storage: use cache to walk backing chain
We used to walk the backing file chain at least twice per disk,
once to set up cgroup device whitelisting, and once to set up
security labeling.  Rather than walk the chain every iteration,
which possibly includes calls to fork() in order to open root-squashed
NFS files, we can exploit the cache of the previous patch.

* src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
signature.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
to supply backing chain via disk, if recursion is desired.
* src/security/security_dac.c
(virSecurityDACSetSecurityImageLabel): Adjust caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityImageLabel): Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Likewise.
(qemuSetupCgroup): Pre-populate chain.
2012-10-19 17:35:11 -06:00
Eric Blake
4d34c92947 storage: cache backing chain while qemu domain is live
Technically, we should not be re-probing any file that qemu might
be currently writing to.  As such, we should cache the backing
file chain prior to starting qemu.  This patch adds the cache,
but does not use it until the next patch.

Ultimately, we want to also store the chain in domain XML, so that
it is remembered across libvirtd restarts, and so that the only
kosher way to modify the backing chain of an offline domain will be
through libvirt API calls, but we aren't there yet.  So for now, we
merely invalidate the cache any time we do a live operation that
alters the chain (block-pull, block-commit, external disk snapshot),
as well as tear down the cache when the domain is not running.

* src/conf/domain_conf.h (_virDomainDiskDef): New field.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean new field.
* src/qemu/qemu_domain.h (qemuDomainDetermineDiskChain): New
prototype.
* src/qemu/qemu_domain.c (qemuDomainDetermineDiskChain): New
function.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive): Pre-populate chain.
(qemuDomainSnapshotCreateSingleDiskActive): Uncache chain before
snapshot.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Update
chain after block pull.
2012-10-19 17:35:10 -06:00
Eric Blake
5eaf605447 storage: make it easier to find file within chain
In order to temporarily label files read/write during a commit
operation, we need to crawl the backing chain and find the absolute
file name that needs labeling in the first place, as well as the
name of the file that owns the backing file.

* src/util/storage_file.c (virStorageFileChainLookup): New
function.
* src/util/storage_file.h: Declare it.
* src/libvirt_private.syms (storage_file.h): Export it.
2012-10-19 17:35:10 -06:00
Eric Blake
82507838e0 storage: remember relative names in backing chain
In order to search for a backing file name as literally present
in a chain, we need to remember if the chain had relative names.
Also, searching for absolute names is easier if we only have
to canonicalize once, rather than on every iteration.

* src/util/storage_file.h (_virStorageFileMetadata): Add field.
* src/util/storage_file.c (virStorageFileGetMetadataFromBuf):
(virStorageFileFreeMetadata): Manage it
(absolutePathFromBaseFile): Store absolute names in canonical form.
2012-10-19 17:35:10 -06:00
Eric Blake
1fc9593271 storage: don't require caller to pre-allocate metadata struct
Requiring pre-allocation was an unusual idiom.  It allowed iteration
over the backing chain to use fewer mallocs, but made one-shot
clients harder to read.  Also, this makes it easier for a future
patch to move away from opening fds on every iteration over the chain.

* src/util/storage_file.h (virStorageFileGetMetadataFromFD): Alter
signature.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Allocate
return value.
 (virStorageFileGetMetadata): Update clients.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
2012-10-19 17:35:10 -06:00
Eric Blake
35c74c1733 storage: get entire metadata chain in one call
Previously, no one was using virStorageFileGetMetadata, and for good
reason - it couldn't support root-squash NFS.  Change the signature
and make it useful to future patches, including enhancing the metadata
to recursively track the entire chain.

* src/util/storage_file.h (_virStorageFileMetadata): Add field.
(virStorageFileGetMetadata): Alter signature.
* src/util/storage_file.c (virStorageFileGetMetadata): Rewrite.
(virStorageFileGetMetadataRecurse): New function.
(virStorageFileFreeMetadata): Handle recursion.
2012-10-19 17:35:10 -06:00
Eric Blake
eac74c1f47 storage: don't probe non-files
Backing chains can end on a network protocol, such as nbd:xxx; we
should not attempt to probe the file system in this case.

* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Only probe files.
2012-10-19 17:35:10 -06:00
Eric Blake
1246640b3d storage: use enum for snapshot driver type
This is the last use of raw strings for disk formats throughout
the src/conf directory.

* src/conf/snapshot_conf.h (_virDomainSnapshotDiskDef): Store enum
rather than string for disk type.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefClear)
(virDomainSnapshotDiskDefParseXML, virDomainSnapshotDefFormat):
Adjust users.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive): Likewise.
2012-10-19 17:35:10 -06:00
Eric Blake
e5e8d5d082 storage: use enum for disk driver type
Actually use the enum in the domain conf structure.

* src/conf/domain_conf.h (_virDomainDiskDef): Store enum rather
than string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefFree)
(virDomainDiskDefParseXML, virDomainDiskDefFormat)
(virDomainDiskDefForeachPath): Adjust users.
* src/xenxs/xen_sxpr.c (xenParseSxprDisks, xenFormatSxprDisk):
Likewise.
* src/xenxs/xen_xm.c (xenParseXM, xenFormatXMDisk): Likewise.
* src/vbox/vbox_tmpl.c (vboxAttachDrives): Likewise.
* src/libxl/libxl_conf.c (libxlMakeDisk): Likewise.
2012-10-19 17:35:09 -06:00
Eric Blake
09e7fb5e1f storage: use enum for default driver type
Express the default disk type as an enum, for easier handling.

* src/conf/capabilities.h (_virCaps): Store enum rather than
string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Adjust
clients.
* src/qemu/qemu_driver.c (qemuCreateCapabilities): Likewise.
2012-10-19 17:35:09 -06:00
Eric Blake
41e0edaf84 storage: treat 'aio' like 'raw' at parse time
We have historically allowed 'aio' as a synonym for 'raw' for
back-compat to xen, but since a future patch will move to using
an enum value, we have to pick one to be our preferred output
name.  This is a slight change in the output XML, but the sexpr
and xm outputs should still be identical, and the input XML can
still use either form.

* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Move aio
back-compat...
(virDomainDiskDefParseXML): ...to parse time.
* src/xenxs/xen_sxpr.c (xenParseSxprDisks, xenFormatSxprDisk): ...and
to output time.
* src/xenxs/xen_xm.c (xenParseXM, xenFormatXMDisk): Likewise.
* tests/sexpr2xmldata/sexpr2xml-*.xml: Update tests.
2012-10-19 17:35:09 -06:00
Eric Blake
f772b3d91f storage: list more file types
When an image has no backing file, using VIR_STORAGE_FILE_AUTO
for its type is a bit confusing.  Additionally, a future patch
would like to reserve a default value for the case of no file
type specified in the XML, but different from the current use
of -1 to imply probing, since probing is not always safe.

Also, a couple of file types were missing compared to supported
code: libxl supports 'vhd', and qemu supports 'fat' for directories
passed through as a file system.

* src/util/storage_file.h (virStorageFileFormat): Add
VIR_STORAGE_FILE_NONE, VIR_STORAGE_FILE_FAT, VIR_STORAGE_FILE_VHD.
* src/util/storage_file.c (virStorageFileMatchesVersion): Match
documentation when version probing not supported.
(cowGetBackingStore, qcowXGetBackingStore, qcow1GetBackingStore)
(qcow2GetBackingStoreFormat, qedGetBackingStore)
(virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFD): Take NONE into account.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/conf/storage_conf.c (virStorageVolumeFormatFromString): New
function.
(poolTypeInfo): Use it.
2012-10-19 17:35:09 -06:00
Guannan Ren
4492ef7f48 selinux: relabel tapfd in qemuPhysIfaceConnect
Relabeling tapfd right after the tap device is created.
qemuPhysIfaceConnect is common function called both for static
netdevs and for hotplug netdevs.
2012-10-20 00:01:03 +08:00
Jiri Denemark
8d75e47ede qemu: Do not require hostuuid in migration cookie
Having hostuuid in migration cookie is a nice bonus since it provides an
easy way of detecting migration to the same host. However, requiring it
breaks backward compatibility with older libvirt releases.
2012-10-19 15:08:29 +02:00
Jiri Denemark
9fcc5436d3 qemu: Allow migration with host USB devices
Recently, patches were added support for (managed)saving, restoring, and
migrating domains with host USB devices. However, qemu driver would
still forbid migration of such domains because qemuMigrationIsAllowed
was not updated.
2012-10-19 14:18:26 +02:00
Guido Günther
c324bad93a qemu: Set arch to i686 if qemu-system-i386 is found
If we can't probe the architecture from QMP we parse the architecture
from the qemu binaries name. This results in the architecture being i386
instead of i686 which then results in QEMU_CAPS_PCI_MULTIBUS being unset
which gives a broken qemu command line.

This probably didn't show up earlier since most of the time there's also
a /usr/bin/qemu around which results in i686 capabilities.
2012-10-19 08:12:21 +02:00
Guido Günther
a605594f8e qemu: Don't fail without emulatorpin or cpumask
This unbreaks qemu:///session that got broken by
ba63d8f7d8.
2012-10-19 01:25:19 +02:00
Michal Privoznik
b7e9202401 network: Set to NULL after virNetworkDefFree()
which frees all allocated memory but doesn't set the passed pointer to
NULL.  Therefore, we must do it ourselves. This is causing actual
libvirtd crash: Basically, when doing 'virsh net-edit' the newDef should
be dropped.  And the memory is freed, indeed. However, the pointer is
not set to NULL but kept instead. And the next duo of calls 'virsh
net-start' and 'virsh net-destroy' starts the disaster. The latter one
does the same as 'virsh destroy'; it sees that newDef is nonNULL so it
replaces def with newDef (which has been freed already as said a few
lines above). Therefore any subsequent call accessing def will hit the ground.
2012-10-18 17:02:48 +02:00
Viktor Mihajlovski
47a7b93584 dist: added cpu/cpu_ppc_data.h to Makefile.am
Missing entry for cpu_ppc_data.h added to fix RPM build.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-10-18 16:50:47 +02:00
Jiri Denemark
f1c7010040 qemu: Always format CPU topology
When libvirt cannot find a suitable CPU model for host CPU (easily
reproducible by running libvirt in a guest), it would not provide CPU
topology in capabilities XML either. Even though CPU topology is known
and can be queried by virNodeGetInfo. With this patch, CPU topology will
always be provided in capabilities XML regardless on the presence of CPU
model.
2012-10-18 14:57:08 +02:00
Peter Krempa
09f10a12be qemu: Add support for HyperV Enlightenment feature "relaxed"
This patch adds QEMU support for the "relaxed" feature implemented by
previous patch.
2012-10-18 12:22:50 +02:00
Peter Krempa
cc922fddc3 conf: Add support for HyperV Enlightenment features
Hypervisors are starting to support HyperV Enlightenment features that
improve behavior of guests running Microsoft Windows operating systems.

This patch adds support for the "relaxed" feature that improves timer
behavior and also establishes a framework to add these features in
future.
2012-10-18 12:22:50 +02:00
Peter Krempa
88cac66d92 conf: Make tri-state feature options more universal
The apic-eoi feature enum and implementation can be made more universal
to allow re-use of the enum for other features.
2012-10-18 12:22:49 +02:00
Michal Privoznik
998dc17da3 qemu: Correctly wait for spice to migrate
Currently we query-spice after the main migration has completed
before moving to next state. Qemu reports this as boolean (not
enclosed within quotes). Therefore it is not correct to use
virJSONValueObjectGetString but virJSONValueObjectGetBoolean instead.
2012-10-18 10:31:56 +02:00
Viktor Mihajlovski
1916679506 qemu: Fixed default machine detection in qemuCapsParseMachineTypesStr
The machine in the last output line of <qemu-binary> -M ?
was always reported as default machine even if this wasn't the
actual default. Trivial fix.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2012-10-17 17:24:41 -06:00
Martin Kletzander
ba63d8f7d8 qemu: Pin the emulator when only cpuset is specified
According to our recent changes (clarifications), we should be pinning
qemu's emulator processes using the <vcpu> 'cpuset' attribute in case
there is no <emulatorpin> specified.  This however doesn't work
entirely as expected and this patch should resolve all the remaining
issues.
2012-10-17 17:37:10 +02:00
Jiri Denemark
837993d845 qemu: Clear async job when p2p migration fails early
When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe say migration should be cancelled, we fail to clear
the migration-out async job. As a result of that, further APIs called
for the same domain may fail with Timed out during operation: cannot
acquire state change lock.

Reported by Guido Winkelmann.
2012-10-17 15:43:38 +02:00
Doug Goldstein
1e7ec88d9a interface: add virInterfaceGetXMLDesc() in udev
Added support for retrieving the XML defining a specific interface via
the udev based backend to virInterface. Implement the following APIs
for the udev based backend:
* virInterfaceGetXMLDesc()

Note: Does not support bond devices.
2012-10-17 13:59:16 +02:00
Li Zhang
40f58ca75d Doc-fix for PowerPC CPU model driver
There are some descriptions not right in PowerPC CPU model driver.
This patch is to fix them.

Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
2012-10-17 10:03:34 +02:00
Li Zhang
9943a7341c Implement CPU model driver for PowerPC
Currently, the CPU model driver is not implemented for PowerPC.
Host's CPU information is needed to exposed to guests' XML file some
time.

This patch is to implement the callback functions of CPU model driver.

Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
2012-10-17 10:03:34 +02:00
Li Zhang
309f03db40 Add one file cpu_ppc_data.h to define CPU data for PPC
CPU version can be got by PVR on PowerPC. So this PVR is defined in
the CPU data in cpuData structure.

Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Acked-by: Michal Privoznik <mprivozn@redhat.com>
2012-10-17 10:03:34 +02:00
Guannan Ren
d37a3a1d6c selinux: remove unused variables in socket labelling 2012-10-17 13:13:17 +08:00
Guannan Ren
89b63f0ad4 selinux: fix wrong tapfd relablling
It should relabel tapfd of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT
rather than VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE
(commit ae368ebfcc introduced this bug)

Caution: The context of the two hunks is identical other than indentation.
Please be extremely cautious of where the patch gets applied.
2012-10-17 13:13:14 +08:00
Cole Robinson
9f0e9cba27 storage: lvm: lvcreate fails with allocation=0, don't do that
On F17 at least, this command fails:

$ sudo /usr/sbin/lvcreate --name sparsetest -L 0K --virtualsize 16384K vgvirt
  Unable to create new logical volume with no extents

Which is unfortunate since allocation=0 is what virt-manager tries to use
by default.

Rather than telling the user 'don't do that', let's just give them the
smallest allocation possible if alloc=0 is requested.

https://bugzilla.redhat.com/show_bug.cgi?id=866481
2012-10-16 21:16:44 -04:00
Cole Robinson
01df6f2bff storage: lvm: Don't overwrite lvcreate errors
Before:
$ sudo virsh vol-create-as --pool vgvirt sparsetest --capacity 16M --allocation 0
error: Failed to create vol sparsetest
error: internal error Child process (/usr/sbin/lvchange -aln vgvirt/sparsetest) unexpected exit status 5:   One or more specified logical volume(s) not found.

After:
$ sudo virsh vol-create-as --pool vgvirt sparsetest --capacity 16M --allocation 0
error: Failed to create vol sparsetest
error: internal error Child process (/usr/sbin/lvcreate --name sparsetest -L 0K --virtualsize 16384K vgvirt) unexpected exit status 5:   Unable to create new logical volume with no extents
2012-10-16 21:16:44 -04:00
Jiri Denemark
5ce6d95eed locking: Fix build with sanlock < 2.4
libvirt started using sanlock_killpath to implement on_lockfailure
action. Since sanlock_killpath was introduced in sanlock 2.4, libvirt
fails to build with older sanlock.
2012-10-16 21:32:05 +02:00
Daniel P. Berrange
7bd744c401 Fix typo in previous commit s/lik/like/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 16:37:50 +01:00
Daniel P. Berrange
d507f8f9b9 Make virInitialize thread safe
Currently there is a restriction that multi-threaded applications
must manually call virInitialize, before threads start using
libvirt, because it is not thread-safe. By switching it to use
a virOnceControl initializer we gain thread safety, and thus
applications no longer need to manually call it. They can rely
on virConnectOpen invoking it for them.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 16:33:38 +01:00
Daniel P. Berrange
84912e9c91 Fix virProcessKillPainfully on Win32
Win32 platforms don't have SIGKILL defined, but they do have
SIGABRT. Since our virProcess wrapper treats anything which
isn't SIGTERM/SIGINT as equivalent to SIGKILL, just use
SIGABRT on Win32.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:47:14 +01:00
Daniel P. Berrange
381a339e98 Add JSON serialization of virNetServerPtr objects for process re-exec()
Add two new APIs virNetServerNewPostExecRestart and
virNetServerPreExecRestart which allow a virNetServerPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.

This includes serialization of all registered services
and clients

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Daniel P. Berrange
3cfc3d7d2c Add JSON serialization of virNetServerClientPtr objects for process re-exec()
Add two new APIs virNetServerClientNewPostExecRestart and
virNetServerClientPreExecRestart which allow a virNetServerClientPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.

This includes serialization of the connected socket associated
with the client

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Daniel P. Berrange
0cc7925520 Add JSON serialization of virNetServerServicePtr objects for process re-exec()
Add two new APIs virNetServerServiceNewPostExecRestart and
virNetServerServicePreExecRestart which allow a virNetServerServicePtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.

This includes serialization of the listening sockets associated
with the service

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Daniel P. Berrange
c298145344 Add JSON serialization of virNetSocketPtr objects for process re-exec()
Add two new APIs virNetSocketNewPostExecRestart and
virNetSocketPreExecRestart which allow a virNetSocketPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.

As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()

Since it is not possible to serialize SASL or TLS encryption
state, an error will be raised if attempting to perform
serialization on non-raw sockets

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Daniel P. Berrange
8057c04e8d Add JSON serialization of virLockSpacePtr objects for process re-exec()
Add two new APIs virLockSpaceNewPostExecRestart and
virLockSpacePreExecRestart which allow a virLockSpacePtr
object to be created from a JSON object and saved to a
JSON object, for the purposes of re-exec'ing a process.

As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Daniel P. Berrange
eca72d4759 Introduce an internal API for handling file based lockspaces
The previously introduced virFile{Lock,Unlock} APIs provide a
way to acquire/release fcntl() locks on individual files. For
unknown reason though, the POSIX spec says that fcntl() locks
are released when *any* file handle referring to the same path
is closed. In the following sequence

  threadA: fd1 = open("foo")
  threadB: fd2 = open("foo")
  threadA: virFileLock(fd1)
  threadB: virFileLock(fd2)
  threadB: close(fd2)

you'd expect threadA to come out holding a lock on 'foo', and
indeed it does hold a lock for a very short time. Unfortunately
when threadB does close(fd2) this releases the lock associated
with fd1. For the current libvirt use case for virFileLock -
pidfiles - this doesn't matter since the lock is acquired
at startup while single threaded an never released until
exit.

To provide a more generally useful API though, it is necessary
to introduce a slightly higher level abstraction, which is to
be referred to as a "lockspace".  This is to be provided by
a virLockSpacePtr object in src/util/virlockspace.{c,h}. The
core idea is that the lockspace keeps track of what files are
already open+locked. This means that when a 2nd thread comes
along and tries to acquire a lock, it doesn't end up opening
and closing a new FD. The lockspace just checks the current
list of held locks and immediately returns VIR_ERR_RESOURCE_BUSY.

NB, the API as it stands is designed on the basis that the
files being locked are not being otherwise opened and used
by the application code. One approach to using this API is to
acquire locks based on a hash of the filepath.

eg to lock /var/lib/libvirt/images/foo.img the application
might do

   virLockSpacePtr lockspace = virLockSpaceNew("/var/lib/libvirt/imagelocks");
   lockname = md5sum("/var/lib/libvirt/images/foo.img");
   virLockSpaceAcquireLock(lockspace, lockname);

NB, in this example, the caller should ensure that the path
is canonicalized before calculating the checksum.

It is also possible to do locks directly on resources by
using a NULL lockspace directory and then using the file
path as the lock name eg

   virLockSpacePtr lockspace = virLockSpaceNew(NULL);
   virLockSpaceAcquireLock(lockspace, "/var/lib/libvirt/images/foo.img");

This is only safe to do though if no other part of the process
will be opening the files. This will be the case when this
code is used inside the soon-to-be-reposted virlockd daemon

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-10-16 15:45:55 +01:00
Eric Blake
819c8ce043 maint: prepare for next release number
Given Daniel's announcement[1], code targetting the next release will
be in 1.0.0, not 0.10.3.  Changed mechanically with:

for f in $(git grep -l '0\(.\)10\13\b') ; do
   sed -i -e 's/0\(.\)10\13/1\10\10/g' $f
done

[1]https://www.redhat.com/archives/libvir-list/2012-October/msg00403.html

* docs/formatdomain.html.in: Use 1.0.0 for next release.
* src/interface/interface_backend_udev.c: Likewise.
2012-10-16 08:09:01 -06:00
Martin Kletzander
280b8c9e7c conf: Fix crash with cleanup
There was a crash possible when both <boot dev... and <boot
order... were specified due to virDomainDefParseBootXML() erroring out
before setting *tmp (which was free'd in cleanup).  As a fix, I
created this cleanup that uses one pointer for all the temporary
stored XPath strings and values, plus this pointer is correctly
initialized to NULL.
2012-10-16 11:15:04 +02:00
Martin Kletzander
6676c1fc8f selinux: Use raw contexts 2
In commit 9674f2c637, I forgot to change
selabel_lookup with the other functions, so this one-liner does exactly
that.
2012-10-16 10:30:18 +02:00
Eric Blake
2cfa14bc8a maint: drop spurious semicolons
Detected with:
git grep ';;$' -- '**/*.[ch]'

* src/network/bridge_driver.c (networkRadvdConfContents): Fix
harmless typo.
* src/phyp/phyp_driver.c (phypUUIDTable_Pull): Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveDel):
Likewise.
2012-10-15 09:08:19 -06:00
Guannan Ren
ae368ebfcc selinux: add security selinux function to label tapfd
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0

Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0

There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc:  denied  { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file

This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
2012-10-15 21:01:07 +08:00
Martin Kletzander
7ba5defb5a Add support for SUSPEND_DISK event
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated.  The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off.  This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc.  This patch is not
aiming for that functionality.
2012-10-15 12:09:10 +02:00
Ján Tomko
a9e3b4f78e util: switch virLogEatParams to virLogSource
Commit e8fd8757c8 changed 'const char *'
category to virLogSource enum. This changes it in virLogEatParams as
well, thus fixing the build with --disable-debug.
--
Hopefully moving the enum declarations is less ugly than using int.
2012-10-15 11:13:43 +02:00
Osier Yang
f81f0f2f1d node_memory: Add new parameter field to tune the new sysfs knob
Upstream kernel introduced new sysfs knob "merge_across_nodes" to
specify if pages from different numa nodes can be merged. When set
to 0, only pages which physically reside in the memory area of
same NUMA node can be merged. When set to 1, pages from all nodes
can be merged.

This patch supports the tuning by adding new param field
"shm_merge_across_nodes".
2012-10-15 17:35:54 +08:00
Laine Stump
6bde0a1a37 qemu: reorganize qemuDomainChangeNet and qemuDomainChangeNetBridge
This patch resolves:

  https://bugzilla.redhat.com/show_bug.cgi?id=805071

to the extent that it can be resolved with current qemu functionality.
It attempts to detect as many situations as possible when the simple
operation of disconnecting an existing tap device from one bridge and
attaching it to another will satisfy the change requested in
virDomainUpdateDeviceFlags() for a network device. Before this patch,
that situation could only be detected if the pre-change interface
*and* the post-change interface definition were both "type='bridge'".
After this patch, it can also be detected if the before or after
interfaces are any combination of type='bridge' and type='network'
(the networks can be <forward mode='nat|route|bridge'>, as long as
they use a Linux host bridge and not macvtap connections).

This extra effort is especially useful since the recent discovery that
a netdev_del+netdev_add combo (to reconnect the network device with
completely different hostside configuration) doesn't work properly
with current qemu (1.2) unless it is accompanied by the matching
device_del+device_add - see this mailing list message for details:

  http://lists.nongnu.org/archive/html/qemu-devel/2012-10/msg02355.html

(A slight modification of the patch referenced there has been prepared
to apply on top of this patch, but won't be pushed until qemu can be
made to work with it.)

* qemuDomainChangeNet needs access to the virDomainDeviceDef that
holds the new netdef (so that it can clear out the virDomainDeviceDef
if it ends up using the NetDef to replace the original), so the
virDomainNetDefPtr arg is replaced with a virDomainDeviceDefPtr.

* qemuDomainChangeNet previously checked for *some* changes to the
interface config, but this check was by no means complete. It was also
a bit disorganized.

This refactoring of the code is (I believe) complete in its check of
all NetDef attributes that might be changed, and either returns a
failure (for changes that are simply impossible), or sets one of three
flags:

  needLinkStateChange - if the device link state needs to go up/down
  needBridgeChange    - if everything else is the same, but it needs
                        to be connected to a difference linux host
                        bridge
  needReconnect       - if the entire host side of the device needs
                        to be torn down and reconstructed (currently
                        non-working, as mentioned above)

Note that this function will refuse to make any change that requires
the *guest* side of the device to be detached (e.g. changing the PCI
address or mac address). Those would be disruptive enough to the guest
that it's reasonable to require an explicit detach/attach sequence
from the management application.

* As mentioned above, qemuDomainChangeNet also does its best to
understand when a simple change in attached bridge for the existing
tap device will work vs. the need to completely tear down/reconstruct
the host side of the device (including tap device).

This patch *does not* implement the "reconnect" code anyway - there is
a placeholder that turns that into an error. Rather, the purpose of
this patch is to replicate existing behavior with code that is ready
to have that functionality plugged in in a later patch.

* The expanded uses for qemuDomainChangeNetBridge meant that it needed
to be enhanced as well - it no longer replaces the original brname
string in olddev with the new brname; instead, it relies on the
caller to replace the *entire* olddev with newdev (since we've gone
to great lengths to assure they are functionally identical other
than the name of the bridge, this is now not only safe, but more
correct). Additionally, qemuDomainNetChangeBridge can now set the
bridge for type='network' interfaces as well as plain type='bridge'
interfaces. (Note that I had to make this change simultaneous to the
reorganization of qemuDomainChangeNet because the two are too
closely intertwined to separate).
2012-10-15 04:36:39 -04:00
Guido Günther
dc9d7a171c Avoid straying </cpuset>
by using the same condition as for the <cpuset>.

Fixes "make check" found by
    http://honk.sigxcpu.org:8001/job/libvirt-check/160/
2012-10-15 17:14:25 +08:00
Laine Stump
11c47d979c conf: virDomainDeviceInfoCopy utility function
This does a shallow copy of all the bits, then strdups the two items
that are actually allocated separately.
2012-10-15 04:03:06 -04:00
Laine Stump
310945597c conf: fix virDevicePCIAddressEqual args
This function really should have been taking virDevicePCIAddress*
instead of the inefficient virDevicePCIAddress (results in copying two
entire structs onto the stack rather than just two pointers), and
returning a bool true/false (not matching is not necessarily a
"failure", as a -1 return would imply, and also using "if
(!virDevicePCIAddressEqual(x, y))" to mean "if x == y" is just a bit
counterintuitive).
2012-10-15 04:03:06 -04:00
Guido Günther
a2b80edbc6 Fix tab vs space
that broke "make syntax-check"

found by http://honk.sigxcpu.org:8001/job/libvirt-syntax-check/157/

Pushed under the build breaker rule.
2012-10-15 09:18:18 +02:00
Osier Yang
3635b41e15 qemu: Ignore def->cpumask if emulatorpin is specified
If the vcpu placement is "static", it's just fine to ignore the
def->cpumask if emulatorpin is specified.
2012-10-15 12:20:37 +08:00
Osier Yang
5378effd57 conf: Ignore emulatorpin if vcpu placement is auto
When vcpu placement is "auto", the domain process will be pinned
to advisory nodeset from querying numad, While emulatorpin will
override the pinning. That means both of them are to set the
pinning policy for domain process, but conflicts with each other.

This patch ingore emulatorpin if vcpu placement is "auto", because
<vcpu> placement can't be simply ignored for <numatune> placement
could default to it.
2012-10-15 12:19:54 +08:00