Commit id '2c322378' missed the nuance that the redirdev backend could
be using a TCP chardev and if TLS is enabled on the host, thus will need
to have the TLS object added.
Use a pointer and the virDomainChrSourceDefNew() function in order to
allocate the structure for _virDomainRedirdevDef.
Signed-off-by: John Ferlan <jferlan@redhat.com>
from virDomainDefPtr to virDomainObjPtr so that the function has
access to the other parts of the virDomainObjPtr. Take advantage of
this by removing the "priv" arg and retrieving it from the
virDomainObjPtr instead.
No functional change.
Change the virDomainChrDef to use a pointer to 'source' and allocate
that pointer during virDomainChrDefNew.
This has tremendous "fallout" in the rest of the code which mainly
has to change source.$field to source->$field.
Signed-off-by: John Ferlan <jferlan@redhat.com>
There was inconsistency between alias used to create tls-creds-x509
object and alias used to link that object to chardev while hotpluging.
Hotplug ends with this error:
error: Failed to detach device from channel-tcp.xml
error: internal error: unable to execute QEMU command 'chardev-add':
No TLS credentials with id 'objcharchannel3_tls0'
In XML we have for example alias "serial0", but on qemu command line we
generate "charserial0".
The issue was that code, that creates QMP command to hotplug chardev
devices uses only the second alias "charserial0" and that alias is also
used to link the tls-creds-x509 object.
This patch unifies the aliases for tls-creds-x509 to be always generated
from "charserial0".
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We need to make sure that the chardev is TCP. Without this check we
may access different part of union and corrupt pointers.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1366108
There are couple of things that needs to be done in order to
allow vhost-user hotplug. Firstly, vhost-user requires a chardev
which is connected to vhost-user bridge and through which qemu
communicates with the bridge (no acutal guest traffic is sent
through there, just some metadata). In order to generate proper
chardev alias, we must assign device alias way sooner.
Then, because we are plugging the chardev first, we need to do
the proper undo if something fails - that is remove netdev too.
We don't want anything to be left over in case attach fails at
some point.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of blindly claim support for hot-plugging of every
interface type out there we should copy approach we have for
device types: white listing supported types and explicitly error
out on unsupported ones.
For instance, trying to hotplug vhostuser interface results in
nothing usable from guest currently. vhostuser typed interfaces
require additional work on our side.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The idea is to have function that does some checking at its
beginning and then have one big switch for all the interface
types it supports.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This function for some weird reason returns integer instead of
virDomainNetType type. It is important to return the correct type
so that we know what values we can expect.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Qemu always opens the tray if forced to. Skip the waiting step in such
case.
This also helps if qemu does not report the tray change event when
opening the cdrom forcibly (the documentation says that the event will
not be sent although qemu in fact does trigger it even if @force is
selceted).
This is a workaround for a qemu issue where qemu does not send the tray
change event in some cases (after migration with empty closed locked
drive) and thus renders the cdrom useless from libvirt's point of view.
Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1368368
If the incoming XML defined a path to a TLS X.509 certificate environment,
add the necessary 'tls-creds-x509' object to the VIR_DOMAIN_CHR_TYPE_TCP
character device.
Likewise, if the environment exists the hot unplug needs adjustment as
well. Note that all the return ret were changed to goto cleanup since
the cfg needs to be unref'd
Signed-off-by: John Ferlan <jferlan@redhat.com>
The linkstate setting of an <interface> is only meant to change the
online status reported to the guest system by the emulated network
device driver in qemu, but when support for auto-creating tap devices
for <interface type='ethernet'> was added in commit 9717d6, a chunk of
code was also added to qemuDomainChangeNetLinkState() that sets the
online status of the tap device (i.e. the *host* side of the
interface) for type='ethernet'. This was never done for tap devices
used in type='bridge' or type='network' interfaces, nor was it done in
the past for tap devices created by external scripts for
type='ethernet', so we shouldn't be doing it now.
This patch removes the bit of code in qemuDomainChangeNetLinkState()
that modifies online status of the tap device.
This patch removes the old vcpu unplug code completely and replaces it
with the new code using device_del. The old hotplug code basically never
worked with any recent qemu and thus is useless.
As the new code is using device_del all the implications of using it
are present. Contrary to the device deletion code, the vcpu deletion
code fails if the unplug request is not executed in time.
https://bugzilla.redhat.com/show_bug.cgi?id=1289391
Rather than pass the whole drive string (which contained the alias),
pass only the alias for the qemuMonitorDriveDel call in the error
path when adding a host device in the monitor fails.
Partial fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1336225
Similar to the other disk types, add the qemuMonitorDriveDel in the failure
to add/hotplug a USB.
Added a couple of other formatting changes just to have a less cluttered look
Since we already have a function that will generate the drivestr from
the alias, let's use it and remove the qemuDeviceDriveHostAlias.
Move the QEMU_DRIVE_HOST_PREFIX definition into qemu_alias.h
Also alter qemuAliasFromDisk to use the QEMU_DRIVE_HOST_PREFIX instead
of "drive-%s".
The current LUKS support has a "luks" volume type which has
a "luks" encryption format.
This partially makes sense if you consider the QEMU shorthand
syntax only requires you to specify a format=luks, and it'll
automagically uses "raw" as the next level driver. QEMU will
however let you override the "raw" with any other driver it
supports (vmdk, qcow, rbd, iscsi, etc, etc)
IOW the intention though is that the "luks" encryption format
is applied to all disk formats (whether raw, qcow2, rbd, gluster
or whatever). As such it doesn't make much sense for libvirt
to say the volume type is "luks" - we should be saying that it
is a "raw" file, but with "luks" encryption applied.
IOW, when creating a storage volume we should use this XML
<volume>
<name>demo.raw</name>
<capacity>5368709120</capacity>
<target>
<format type='raw'/>
<encryption format='luks'>
<secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccd2f80d6f'/>
</encryption>
</target>
</volume>
and when configuring a guest disk we should use
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/home/berrange/VirtualMachines/demo.raw'/>
<target dev='sda' bus='scsi'/>
<encryption format='luks'>
<secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccd2f80d6f'/>
</encryption>
</disk>
This commit thus removes the "luks" storage volume type added
in
commit 318ebb36f1
Author: John Ferlan <jferlan@redhat.com>
Date: Tue Jun 21 12:59:54 2016 -0400
util: Add 'luks' to the FileTypeInfo
The storage file probing code is modified so that it can probe
the actual encryption formats explicitly, rather than merely
probing existance of encryption and letting the storage driver
guess the format.
The rest of the code is then adapted to deal with
VIR_STORAGE_FILE_RAW w/ VIR_STORAGE_ENCRYPTION_FORMAT_LUKS
instead of just VIR_STORAGE_FILE_LUKS.
The commit mentioned above was included in libvirt v2.0.0.
So when querying volume XML this will be a change in behaviour
vs the 2.0.0 release - it'll report 'raw' instead of 'luks'
for the volume format, but still report 'luks' for encryption
format. I think this change is OK because the storage driver
did not include any support for creating volumes, nor starting
guets with luks volumes in v2.0.0 - that only since then.
Clearly if we change this we must do it before v2.1.0 though.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Dropping the caching of virtio serial address set.
Instead of using the cached address set, a function in qemu_hotplug.c
now recalculates it on demand.
Credit goes to Cole Robinson.
Since return code is checked globally at the end of the function, let's
make sure that we set it correctly at any point.
This fixes a regression introduced in commit 0aa19f35 where the first
command to eject changeable media would fail unconditionally.
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1301021
Generate the luks command line using the AES secret key to encrypt the
luks secret. A luks secret object will be in addition to a an AES secret.
For hotplug, check if the encinfo exists and if so, add the AES secret
for the passphrase for the secret object used to decrypt the device.
Modify/augment the fakeSecret* in qemuxml2argvtest in order to handle
find a uuid or a volume usage with a specific path prefix in the XML
(corresponds to the already generated XML tests). Add error message
when the 'usageID' is not 'mycluster_myname'. Commit id '1d632c39'
altered the error message generation to rely on the errors from the
secret_driver (or it's faked replacement).
Add the .args output for adding the LUKS disk to the domain
Signed-off-by: John Ferlan <jferlan@redhat.com>
Soon we will be adding luks encryption support. Since a volume could require
both a luks secret and a secret to give to the server to use of the device,
alter the alias generation to create a slightly different alias so that
we don't have two objects with the same alias.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit id 'a1344f70a' added AES secret processing for RBD when starting
up a guest. As such, when the hotplug code calls qemuDomainSecretDiskPrepare
an AES secret could be added to the disk about to be hotplugged. If an AES
secret was added, then the hotplug code would need to generate the secret
object because qemuBuildDriveStr would add the "password-secret=" to the
returned 'driveStr' rather than the base64 encoded password.
Signed-off-by: John Ferlan <jferlan@redhat.com>
A recent adjustment to qemuDomainAttachRNGDevice to properly cleanup
the props object after a qemuMonitorAddObject also would affect this
code. Alter the cleanup to be similar to RNG changes.
Based on recent review comment - rather than have a spate of goto failxxxx,
change to a boolean based model. Ensures that the original error can be
preserved and cleanup is a bit more orderly if more objects are added.
Based on recent review comment - rather than have a spate of goto failxxxx,
change to a boolean based model. Ensures that the original error can be
preserved and cleanup is a bit more orderly if more objects are added.
Based on recent review comment - rather than have a spate of goto failxxxx,
change to a boolean based model. Ensures that the original error can be
preserved and cleanup is a bit more orderly if more objects are added.
Based on recent review comment - rather than have a spate of goto failxxxx,
change to a boolean based model. Ensures that the original error can be
preserved and cleanup is a bit more orderly if more objects are added.
Based on recent review comment - rather than have a spate of goto failxxxx,
change to a boolean based model. Ensures that the original error can be
preserved and cleanup is a bit more orderly if more objects are added.
Ensure that the given controller and all controllers with a smaller
index exist; there must not be any missing index in between.
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
The commit "qemu: hot-plug: Assume support for -device in
qemuDomainAttachSCSIDisk" dropped the code for the automatic SCSI
controller creation used in SCSI disk hot-plugging. If we are
hot-plugging a SCSI disk to a domain and there is no proper SCSI
controller defined, it results in an "error: internal error: Could not
find scsi controller with index X required for device" error.
For that reason reverting a hunk of the commit
d4d32005d6.
This patch also adds an extra comment to the code to clarify the
loop.
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
CVE-2016-5008
Setting an empty graphics password is documented as a way to disable
VNC/SPICE access, but QEMU does not always behaves like that. VNC would
happily accept the empty password. Let's enforce the behavior by setting
password expiration to "now".
https://bugzilla.redhat.com/show_bug.cgi?id=1180092
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When support for <interface type='ethernet'> was added in commit
9a4b705f back in 2010, it erroneously looked at <source dev='blah'/>
for a user-specified guest-side interface name. This was never
documented though. (that attribute already existed at the time in the
data.ethernet union member of virDomainNetDef, but apparently had no
practical use - it was only used as a storage place for a NetDef's
bridge name during qemuDomainXMLToNative(), but even then that was
never used for anything).
When support for similar guest-side device naming was added to the lxc
driver several years later, it was put in a new subelement <guest
dev='blah'/>.
In the intervening years, since there was no validation that
ethernet.dev was NULL in the other drivers that didn't actually use
it, innocent souls who were adding other features assuming they needed
to account for non-NULL ethernet.dev when really they didn't, so
little bits of the usual pointless cargo-cult code showed up.
This patch not only switches the openvz driver to use the documented
<guest dev='blah'/> notation for naming the guest-side device (just in
case anyone is still using the openvz driver), and logs an error if
anyone tries to set <source dev='blah'/> for a type='ethernet'
interface, it also removes the cargo-cult uses of ethernet.dev and
<source dev='blah'/>, and eliminates if from the RNG and from
virDomainNetDef.
NB: I decided on this course of action after mentioning the
inconsistency here:
https://www.redhat.com/archives/libvir-list/2016-May/msg02038.html
and getting encouragement do eliminate it in a later IRC discussion
with danpb.
Since commit 7140807917, qemu agent
channel cannot be plugged in because we won't generate its path
automatically. Let's not only fix that, but also add tests for it so
next time it's checked for.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1322210
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Hand-entering indexes for 20 PCI controllers is not as tedious as
manually determining and entering their PCI addresses, but it's still
annoying, and the algorithm for determining the proper index is
incredibly simple (in all cases except one) - just pick the lowest
unused index.
The one exception is USB2 controllers because multiple controllers in
the same group have the same index. For these we look to see if 1) the
most recently added USB controller is also a USB2 controller, and 2)
the group *that* controller belongs to doesn't yet have a controller
of the exact model we're just now adding - if both are true, the new
controller gets the same index, but in all other cases we just assign
the lowest unused index.
With this patch in place and combined with the automatic PCI address
assignment, we can define a PCIe switch with several ports like this:
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-switch-upstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
...
These will each get a unique index, and PCI addresses that connect
them together appropriately with no pesky numbers required.
Use the detected tray presence flag to trigger the tray waiting code
only if the given storage device in qemu reports to have a tray.
This is necessary as the floppy device lost it's tray as of qemu commit:
commit abb3e55b5b718d6392441f56ba0729a62105ac56
Author: Max Reitz <mreitz@redhat.com>
Date: Fri Jan 29 20:49:12 2016 +0100
Revert "hw/block/fdc: Implement tray status"
Commit 1fad65d49a used a really big hammer
and overwrote the error message that might be reported by qemu if the
tray is locked. Fix it by reporting the error only if no error is
currently set.
Error after commit mentioned above:
error: internal error: timed out waiting for disk tray status update
New error:
error: internal error: unable to execute QEMU command 'eject': Tray of
device 'drive-ide0-0-0' is not open
Add the data structure and infrastructure to support an initialization
vector (IV) secrets. The IV secret generation will need to have access
to the domain private master key, so let's make sure the prepare disk
and hostdev functions can accept that now.
Anywhere that needs to make a decision over which secret type to use
in order to fill in or use the IV secret has a switch added.
Signed-off-by: John Ferlan <jferlan@redhat.com>
We had both and the only difference was that the latter also included
information about multifunction setting. The problem with that was that
we couldn't use functions made for only one of the structs (e.g.
parsing). To consolidate those two structs, use the one in virpci.h,
include that in domain_conf.h and add the multifunction member in it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Similar to the qemuDomainSecretDiskPrepare, generate the secret
for the Hostdev's prior to call qemuProcessLaunch which calls
qemuBuildCommandLine. Additionally, since the secret is not longer
added as part of building the command, the hotplug code will need
to make the call to add the secret in the hostdevPriv.
Since this then is the last requirement to pass a virConnectPtr
to qemuBuildCommandLine, we now can remove that as part of these
changes. That removal has cascading effects through various callers.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than needing to pass the conn parameter to various command
line building API's, add qemuDomainSecretPrepare just prior to the
qemuProcessLaunch which calls qemuBuilCommandLine. The function
must be called after qemuProcessPrepareHost since it's expected
to eventually need the domain masterKey generated during the prepare
host call. Additionally, future patches may require device aliases
(assigned during the prepare domain call) in order to associate
the secret objects.
The qemuDomainSecretDestroy is called after the qemuProcessLaunch
finishes in order to clear and free memory used by the secrets
that were recently prepared, so they are not kept around in memory
too long.
Placing the setup here is beneficial for future patches which will
need the domain masterKey in order to generate an encrypted secret
along with an initialization vector to be saved and passed (since
the masterKey shouldn't be passed around).
Finally, since the secret is not added during command line build,
the hotplug code will need to get the secret into the private disk data.
Signed-off-by: John Ferlan <jferlan@redhat.com>
After killing one of the conditionals it's now guaranteed to have
@drivealias populated when calling the monitor, so the code attempting
to cleanup can be simplified.
If qemu doesn't support DEVICE_TRAY_MOVED event the code that attempts
to change media would attempt to re-eject the tray even if it wouldn't
be notified when the tray opened. Add a capability bit and skip retrying
for old qemus.
Empty floppy drives start with tray in "open" state and libvirt did not
refresh it after startup. The code that inserts media into the tray then
waited until the tray was open before inserting the media and thus
floppies could not be inserted.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1326660
Rather than trying some magic calculations on our side query the monitor
for the current size of the memory balloon both on hotplug and
hotunplug.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1220702
Similarly to the DEVICE_DELETED event we will be able to tell when
unplug of certain device types will be rejected by the guest OS. Wire up
the device deletion signalling code to allow handling this.
Neither of the callers cares whether the DEVICE_DELETED event isn't
supported or the event was received. Simplify the code and callers by
unifying the two values and changing the return value constants so that
a temporary variable can be omitted.
Callers ignore if this function returns -1 and continue as though the
DEVICE_DELETED event was not received. Since we can't be sure that the
event was not received we should behave as if the event was not
supported and remove the device definition right away. The error
fortunately won't really happen here.
Essentially revert commit 3a6204c which added these to allow the test
suite to pass without depending on the host system state.
Since commit 4b527c1 we already mock virSCSIDeviceGetSgName, so these
callbacks are useless.
For device hotplug, the new alias ID needs to be checked in the list
rather than using the count of devices. Unplugging a device that is not
last in the array will make further hotplug impossible due to alias
collision.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1324551
For device hotplug, the new alias ID needs to be checked in the list
rather than using the count of devices. Unplugging a device that is not
last in the array will make further hotplug impossible due to alias
collision.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1324551
When a hostdev is attached to the guest (and removed from the host),
the order of operations is call qemuHostdevPreparePCIDevices to remove
the device from the host, call qemuSetupHostdevCgroup to setup the cgroups,
and virSecurityManagerSetHostdevLabel to set the labels.
When the device is removed from the guest, the code didn't use the
reverse order leading to possible issues (especially if the path to
the device no longer exists). This patch will move the call to
qemuTeardownHostdevCgroup to prior to reattaching the device to
the host.
When a hostdev is attached to the guest (and removed from the host),
the order of operations is call qemuHostdevPreparePCIDevices to remove
the device from the host, call qemuSetupHostdevCgroup to setup the cgroups,
and virSecurityManagerSetHostdevLabel to set the labels.
When the device is removed from the guest, the code didn't use the
reverse order leading to possible issues (especially if the path to
the device no longer exists). This patch will move the call to
virSecurityManagerRestoreHostdevLabel to prior to reattaching the
device to the host.
In certain cases, we need to assign a hostdevN-style alias in a case
when we don't have a virDomainHostdevDefPtr (instead we have a
virDomainNetDefPtr). Since qemuAssignDeviceHostdevAlias() doesn't use
anything in the virDomainHostdevDef except the alias string itself
anyway, this patch just changes the arguments to pass a pointer to the
alias pointer instead.
If a user specify network type ethernet, then create it via libvirt and run
script if it provided. After this commit user does not need to
run external script to create tap device or add root permissions to qemu
process.
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
QEMU changed the error message to:
"Tray of device 'drive-sata0-0-1' is not open"
and they may change the error massage in the future.
This updates the code to not depend on the text from the error message
but only on error itself.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Create new modules qemu_domain_address.c and qemu_domain_address.h to
contain all the new functions and header data. Additionally move any
supporting static functions.
Make qemuDomainSupportsPCI non static.
Also, move and rename the following:
qemuSetSCSIControllerModel to qemuDomainSetSCSIControllerModel
qemuCollectPCIAddress to qemuDomainCollectPCIAddress
qemuValidateDevicePCISlotsPIIX3 to qemuDomainValidateDevicePCISlotsPIIX3
qemuAssignDevicePCISlots to qemuDomainAssignDevicePCISlots
Signed-off-by: John Ferlan <jferlan@redhat.com>
Move the misplaced function from qemu_command.c to qemu_interface.c
since it's closer in functionality there and had less to do with building
the command line.
Rename function to qemuInterfaceBridgeConnect and modify callers.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Move the misplaced function from qemu_command.c to qemu_interface.c
since it's closer in functionality there and had less to do with building
the command line.
Rename function to qemuInterfaceDirectConnect and modify callers.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Currently, on hot unplug of PCI devices with VFIO driver for QEMU, libvirt is
trying to restore the host devices to it's previous value (basically a chown
on the previous user/group).
However for devices with VFIO driver, when the device is unbinded it is
removed from the /dev/vfio file system causing the restore label to fail.
The fix is to not restore the label for those PCI devices since they are going
to be teared down anyway.
Signed-off-by: Ludovic Beliveau <ludovic.beliveau@windriver.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Move the logic from virDomainDiskDefDstDuplicates into
virDomainDiskDefCheckDuplicateInfo so that we don't have to loop
multiple times through the array of disks. Since the original function
was called in qemuBuildDriveDevStr, it was actually called for every
single disk which was quite wasteful.
Additionally the target uniqueness check needed to be duplicated in
the disk hotplug case, since the disk was inserted into the domain
definition after the device string was formatted and thus
virDomainDiskDefDstDuplicates didn't do anything in that case.
We only support hotplugging SCSI controllers.
The USB and virtio-serial related code was never reachable because
this function was only called for VIR_DOMAIN_CONTROLLER_TYPE_SCSI
controllers.
This reverts commit ee0d97a and parts of commits 16db8d2
and d6d54cd1.
This function calls qemuDomainAttachControllerDevice for SCSI
controllers and reports an error for all other controllers.
Move the error inside qemuDomainAttachControllerDevice and delete this
wrapper.
We increase the limit before plugging in a PCI hostdev or a memory
module because some memory might need to be locked due to eg. VFIO.
Of course we should do the opposite after unplugging a device: this
was already the case for memory modules, but not for PCI hostdevs.
when appropriate, of course. If the config for a domain specifies boot
order with <boot dev='blah'/> elements, e.g.:
<os>
...
<boot dev='hd'/>
<boot dev='network'/>
</os>
Then the first disk device in the config will have ",bootindex=1"
appended to its qemu commandline -device options, and the first (and
*only* the first) network interface device will get ",bootindex=2".
However, if the first network interface device is a "hostdev" device
(an SRIOV Virtual Function (VF) being assigned to the domain with
vfio), then the bootindex option will *not* be appended. This happens
because the bootindex=n option corresponding to the order of "<boot
dev='network'/>" is added to the -device for the first network device
when network device commandline args are constructed, but if it's a
hostdev network device, its commandline arg is instead constructed in
the loop for hostdevs.
This patch fixes that omission by noticing (in bootHostdevNet) if the
first network device was a hostdev, and if so passing on the proper
bootindex to the commandline generator for hostdev devices - the
result is that ",bootindex=2" will be properly appended to the first
"network" device in the config even if it is really a hostdev
(including if it is assigned from a libvirt network pool). (note that
this is only the case if there is no <bootmenu enabled='yes'/> element
in the config ("-boot menu-on" in qemu) , since the two are mutually
exclusive - when the bootmenu is enabled, the individual per-device
bootindex options can't be used by qemu, and we revert to using "-boot
order=xyz" instead).
If a greater level of control over boot order is desired (e.g., more
than one network device should be tried, or a network device other
than the first one encountered in the config), then <boot
dev='network'/> in the <os> element should not be used; instead, the
individual device elements in the config should be given a "<boot
order='n'/>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1278421
https://bugzilla.redhat.com/show_bug.cgi?id=1240439
Ta-da! Now that we know how to open a macvtap device multiple
times, we can finally enable the multiqueue feature. Everything
else is already prepared (e.g. command line generation) from the
previous iteration where the feature was implemented for
TUN/TAP devices.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When a SCSI disk is hotplugged to a domain that does not have the required
SCSI controller already defined and loaded the following internal error occurs
error: Failed to attach device from scsi_disk.xml
error: internal error: Could not find scsi controller with index 0 required for device
Commit 0260506c added in method qemuBuildDriveDevStr a lookup of the controller
alias. The internal error occurs because in method qemuDomainAttachSCSIDisk
the automatic creation of the potentially missing SCSI controller occurs after
calling qemuBuildDriveDevStr.
This patch reverses the calling sequence.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
The function is used everywhere else to check whether the locked
memory limit should be set / updated, and it should be used here
as well.
Moreover, qemuDomainGetMlockLimitBytes() expects the hostdev to
have already been added to the domain definition, but we only do
that at the end of qemuDomainAttachHostPCIDevice(). Work around
the issue by adding the hostdev before adjusting the locked memory
limit and removing it immediately afterwards.
The -sdl and -net ...name=XXX arguments were both introduced
in QEMU 0.10, so the QEMU driver can assume they are always
available.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of QEMU 0.9.1 the -drive argument can be used to configure
all disks, so the QEMU driver can assume it is always available
and drop support for -hda/-cdrom/etc.
Many of the tests need updating because a great many were
running without CAPS_DRIVE set, so using the -hda legacy
syntax.
Fixing the tests uncovered a bug in the argv -> xml
convertor which failed to handle disk with if=floppy.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If mlock is required either due to use of VFIO hostdevs or due to the
fact that it's enabled it needs to be tweaked prior to adding new memory
or after removing a module. Add a helper to determine when it's
necessary and reuse it both on hotplug and hotunplug.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1273491
Adopt the same names used for virHostdevReAttach*Devices() for
consistency's sake and to make it easier to jump between the two.
No functional changes.
We have macros for both positive and negative string matching.
Therefore there is no need to use !STREQ or !STRNEQ. At the same
time as we are dropping this, new syntax-check rule is
introduced to make sure we won't introduce it again.
Signed-off-by: Ishmanpreet Kaur Khera <khera.ishman@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Extract the size determination into a separate function and reuse it
across the memory device alignment functions. Since later we will need
to decide the alignment size according to architecture let's pass def to
the functions.
Every single call to qemuDomainEventQueue() uses the following pattern:
if (event)
qemuDomainEventQueue(driver, event);
Let's move the check for valid event to qemuDomainEventQueue and
simplify all callers.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Commit 8125113c added code that should remove the disk backend if the
fronted hotplug failed for any reason. The code had a bug though as it
used the disk string for unplug rather than the backend alias. Fix the
code by pre-creating an alias string and using it instead of the disk
string. In cases where qemu does not support QEMU_CAPS_DEVICE, we ignore
the unplug of the backend since we can't really create an alias in that
case.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1262399
https://bugzilla.redhat.com/show_bug.cgi?id=1258361
When attaching a disk, controller, or rng using an address type ccw
or s390, we need to ensure the support is provided by both the machine.os
and the emulator capabilities (corollary to unconditional setting when
address was not provided for the correct machine.os and emulator.
For an inactive guest, an addition followed by a start would cause the
startup to fail after qemu_command builds the command line and attempts
to start the guest. For an active guest, libvirtd would crash.
Rather than have different usages of STR function in order to determine
whether the domain is s390-ccw or s390-ccw-virtio, make a single API
which will check the machine.os prefix. Then use the function.
Adds a new interface type using UDP sockets, this seems only applicable
to QEMU but have edited tree-wide to support the new interface type.
The interface type required the addition of a "localaddr" (local
address), this then maps into the following xml and qemu call.
<interface type='udp'>
<mac address='52:54:00:5c:67:56'/>
<source address='127.0.0.1' port='11112'>
<local address='127.0.0.1' port='22222'/>
</source>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
QEMU call:
-net socket,udp=127.0.0.1:11112,localaddr=127.0.0.1:22222
Notice the xml "local" entry becomes the "localaddr" for the qemu call.
reference:
http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg00629.html
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1226234#c3
If the qemu monitor fails to remove the memory from the guest for
any reason, the auditlog message will incorrectly use the current
actual memory (via virDomainDefGetMemoryActual) instead of the
value we were attempting to reduce to. The result is the 'new-mem'
and 'old-mem' values for the auditlog message would be identical.
This patch creates a local 'newmem' which accounts for the current
memory size minus the memory which is being removed. NB, for the
success case this results in the same value that would be returned
by virDomainDefGetMemoryActual without the need to do the math. This
follows the existing code which would subtract the size for cur_balloon.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1226234#c3
Prior to this patch, after successfully hot plugging memory
the audit log indicated that the update failed, e.g.:
type=VIRT_RESOURCE ... old-mem=1024000 new-mem=1548288 \
exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=pts/2 res=failed
This patch will adjust where virDomainAuditMemory is called to
ensure the proper 'ret' value is used based on success or failure.
Additionally, the audit message should include the size of the
memory we were attempting to change to rather than the current
actual size. On failure to add, the message showed the same value
for old-mem and new-mem.
In order to do this, introduce a 'newmem' local which will compute
the new size based on the oldmem size plus the size of memory we
are about to add. NB: This would be the same as calling the
virDomainDefGetMemoryActual again on success, but avoids the
overhead of recalculating. Plus cur_balloon is already adjusted
by the same value, so this follows that.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Commit aa2cc7 modified a previously unnecessary but innocuous check
for interface IP address during interface update incorrectly, causing
all attempted updates (e.g. changing link state) to interfaces of
type='ethernet' for QEMU to fail.
This patch fixes the issue by completely removing the check for IP
address, which is pointless since QEMU doesn't support setting
interface IP addresses from the domain interface XML anyway.
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Signed-off-by: Laine Stump <laine@laine.org>
nwfilter uses iptables and ebtables, which only work properly on
tap-based network connections (*not* on macvtap, for example), but we
just ignore any <filterref> elements for other types of networks,
potentially giving users a false sense of security.
This patch checks the network type and fails/logs an error if any
domain <interface> has a <filterref> when the connection isn't using a
tap device.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1180011
Some guests lock the tray and QEMU eject command will simply fail to
eject the media. But the guest OS can handle this attempt to eject the
media and can unlock the tray and open it. In this case, we should try
again to actually eject the media.
If the first attempt fails to detect a tray_open we will fail with
error, from monitor. If we receive that event, we know, that the guest
properly reacted to the eject request, unlocked the tray and opened it.
In this case, we need to run the command again to actually eject the
media from the device. The reason to call it again is, that QEMU
doesn't wait for the guest to react and report an error, that the tray
is locked.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1147471
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The target type comparison in qemuDomainDetachChrDevice
used the VIR_DOMAIN_CHR_SERIAL_TARGET_TYPE enum, so virtio-serial
addresses were not freed properly for channel devices.
Call qemuDomainReleaseDeviceAddress uncoditionally and decide
based on the address type instead of the target/device types.
Also check the device type when deciding what type the address should
be. Commit 9807c47 (aiming to fix another error in address allocation)
only checked the target type, but its value is different for different
device types. This resulted in an error when trying to attach
a channel with target type 'virtio':
error: Failed to attach device from channel-file.xml
error: internal error: virtio serial device has invalid address type
Make the logic for releasing the address dependent only on
* the address type
* whether it was allocated earlier
to avoid copying the device and target type checks.
https://bugzilla.redhat.com/show_bug.cgi?id=1230039
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The SCSI Architecture Model defines a logical unit address
as 64-bits in length, so change the field accordingly so
that the entire value could be stored.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
The address elements are all unsigned integers, so we should
use the appropriate print directive when printing it.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
If a guest has multiple network devices with the same MAC address,
when we online update the second device, libvirtd always updates
the first one.
commit def31e4c forgot to fix the online updating scenario. We need to
use virDomainNetFindIdx() to find the correct network device.
Signed-off-by: Zhou Yimin <zhouyimin@huawei.com>
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Commit id '980b265d' neglected to check for a successful status when
deciding whether to release the device address for the RNG attach thus
the address would be released even though the device was added.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Commit id '862473fa' neglected to return the status from the
qemuDomainRemoveRNGDevice call in qemuDomainRemoveDevice causing
the function to always fail when receiving an RNG device unplug
event. Additionally the domain status/state would not be updated
in the processDeviceDeletedEvent path.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Not every chardev is plugged onto virtio-serial bus. However, the
code introduced in 89e991a2aa assumes that. Incorrectly.
With previous patches we have three options where a chardev can
be plugged: virtio-serial, USB and PCI. This commit fixes the
detach part. However, since we are not auto allocating USB
addresses yet, I'm just marking the place where appropriate code
should go.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not every chardev is plugged onto virtio-serial bus. However, the
code introduced in 89e991a2aa assumes that. Incorrectly.
With previous patches we have three options where a chardev can
be plugged: virtio-serial, USB and PCI. This commit fixes the
attach part. However, since we are not auto allocating USB
addresses yet, I'm just marking the place where appropriate code
should go.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There are a few extra exceptions that weren't being accounted for when
creating the alias for a controller. This resulted in 1) incorrect
status XML, and 2) exceptions/printfs of what *should* have been
directly available in the controller alias when constructing device
commandline arguments:
1) The primary (and only) IDE controller on a 440FX machinetype is
hardcoded to be "ide" in qemu.
2) The primary SATA controller on a 440FX machinetype is also
hardcoded to be "ide" in qemu.
3) On machinetypes that don't support multiple PCI buses, the PCI bus
is hardcoded in qemu to have the name "pci".
4) The first usb master controller is "usb", all others are the normal
"usb%d". (note that usb controllers that are not a "master" will have
the same index, and thus alias, as the master).
We needed to pass in the full domainDef and qemuCaps in order to
properly make the decisions about these exceptions.
Since libvirt doesn't call to update the new balloon size in qemu add
code that will handle tweaking of the size of the current balloon
statistic until qemu reports the new size using the event.
Coverity notes that ->ifname is used after the VIR_FREE done in the
code path after the call to virNetDevMacVLanDeleteWithVPortProfile
by a call to virNetDevOpenvswitchRemovePort.
Since the ->ifname will be VIR_FREE()'d eventually in virDomainNetDefFree
just remove the extraneous VIR_FREE here.
When originally added, the Openvswitch code wasn't present and checks
were made for non NULL prior to use.
Coverity complains that in the error paths both the < 0 condition and
the success path after the qemuDomainObjExitMonitor failure will end
up going to cleanup. So just use ignore_value in this error path to
resolve the complaint.
A further fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1113474
Since there is no possibility that any type of macvtap will work if
the parent physdev it's attached to is offline, we should bring the
physdev online at the same time as the macvtap. When taking the
macvtap offline, it's also necessary to take the physdev offline for
macvtap passthrough mode (because the physdev has the same MAC address
as the macvtap device, so could potentially cause problems with
misdirected packets during migration, as outlined in commits 829770
and 879c13). We can't set the physdev offline for other macvtap modes
1) because there may be other macvtap devices attached to the same
physdev (and/or the host itself may be using the device) in the other
modes whereas passthrough mode is exclusive to one macvtap at a time,
and 2) there's no practical reason to do so anyway.
This patch adds code that parses and formats configuration for memory
devices.
A simple configuration would be:
<memory model='dimm'>
<target>
<size unit='KiB'>524287</size>
<node>0</node>
</target>
</memory>
A complete configuration of a memory device:
<memory model='dimm'>
<source>
<pagesize unit='KiB'>4096</pagesize>
<nodemask>1-3</nodemask>
</source>
<target>
<size unit='KiB'>524287</size>
<node>1</node>
</target>
</memory>
This patch preemptively forbids use of the <memory> device in individual
drivers so the users are warned right away that the device is not
supported.
virnetdevopenvswitch.h declares a few functions that can be called to
add ports to and remove them from OVS bridges, and retrieve the
migration data for a port. It does not contain any data definitions
that are used by domain_conf.h. But for some reason, domain_conf.h
virnetdevopenvswitch.h should be directly #including it. This adds a
few lines to the project, but saves all the files that don't need it
from the extra computing, and makes the dependencies more clear cut.
Use the utilities introduced in the previous patches so the qemu
driver is able to create tap devices that are bound (and unbound
on domain destroyal) to Midonet virtual ports.
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
As there are two possible approaches to define a domain's memory size -
one used with legacy, non-NUMA VMs configured in the <memory> element
and per-node based approach on NUMA machines - the user needs to make
sure that both are specified correctly in the NUMA case.
To avoid this burden on the user I'd like to replace the NUMA case with
automatic totaling of the memory size. To achieve this I need to replace
direct access to the virDomainMemtune's 'max_balloon' field with
two separate getters depending on the desired size.
The two sizes are needed as:
1) Startup memory size doesn't include memory modules in some
hypervisors.
2) After startup these count as the usable memory size.
Note that the comments for the functions are future aware and document
state that will be present after a few later patches.
There was a mess in the way how we store unlimited value for memory
limits and how we handled values provided by user. Internally there
were two possible ways how to store unlimited value: as 0 value or as
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory
limits as unsigned long long, we cannot use -1 to represent unlimited.
It's much easier for us to say that everything greater than
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as valid
value despite that it makes no sense to set limit to 0.
Remove unnecessary function virCompareLimitUlong. The update of test
is to prevent the 0 to be miss-used as unlimited in future.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Commit f7afeddc added code to report to systemd an array of interface
indexes for all tap devices used by a guest. Unfortunately it not only
didn't add code to report the ifindexes for macvtap interfaces
(interface type='direct') or the tap devices used by type='ethernet',
it ended up sending "-1" as the ifindex for each macvtap or hostdev
interface. This resulted in a failure to start any domain that had a
macvtap or hostdev interface (or actually any type other than
"network" or "bridge").
This patch does the following with the nicindexes array:
1) Modify qemuBuildInterfaceCommandLine() to only fill in the
nicindexes array if given a non-NULL pointer to an array (and modifies
the test jig calls to the function to send NULL). This is because
there are tests in the test suite that have type='ethernet' and still
have an ifname specified, but that device of course doesn't actually
exist on the test system, so attempts to call virNetDevGetIndex() will
fail.
2) Even then, only add an entry to the nicindexes array for
appropriate types, and to do so for all appropriate types ("network",
"bridge", and "direct"), but only if the ifname is known (since that
is required to call virNetDevGetIndex().
libvirt was unconditionally calling virNetDevBandwidthClear() for
every interface (and network bridge) of a type that supported
bandwidth, whether it actually had anything set or not. This doesn't
hurt anything (unless ifname == NULL!), but is wasteful.
This patch makes sure that all calls to virNetDevBandwidthClear() are
qualified by checking that the interface really had some bandwidth
setup done, and checks for a null ifname inside
virNetDevBandwidthClear(), silently returning success if it is null
(as well as removing the ATTRIBUTE_NONNULL from that function's
prototype, since we can't guarantee that it is never null,
e.g. sometimes a type='ethernet' interface has no ifname as it is
provided on the fly by qemu).
Export the required helpers and add backend code to hotplug RNG devices.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
This way the device is in vmdef only if ret = 0 and the caller
(qemuDomainAttachDeviceFlags) does not free it.
Otherwise it might get double freed by qemuProcessStop
and qemuDomainAttachDeviceFlags if the domain crashed
in monitor after we've added it to vm->def.
Do the allocation first, then add the actual device.
The second part should never fail. This is good
for live hotplug where we don't want to remove the device
on OOM after the monitor command succeeded.
The only change in behavior is that on failure, the
vmdef->consoles array is freed, not just the first console.
Depending on the context, either error out if the domain
has disappeared in the meantime, or just ignore the value
to allow marking the function as ATTRIBUTE_RETURN_CHECK.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
If the domain crashed while we were in monitor,
we cannot rely on the REALLOC done on live definition,
since vm->def now points to the persistent definition.
Skip adding the attached devices to domain definition
if the domain crashed.
In AttachChrDevice, the chardev was already added to the
live definition and freed by qemuProcessStop in the case
of a crash. Skip the device removal in that case.
Also skip audit if the domain crashed in the meantime.
https://bugzilla.redhat.com/show_bug.cgi?id=1161024
In the device type-specific functions, exit early
if the domain has disappeared, because the cleanup
should have been done by qemuProcessStop.
Check the return value in processDeviceDeletedEvent
and qemuProcessUpdateDevices.
Skip audit and removing the device from live def because
it has already been cleaned up.
https://bugzilla.redhat.com/show_bug.cgi?id=1165993
So, there are still plenty of vNIC types that we don't know how to set
bandwidth on. Let's warn explicitly in case user has requested it
instead of pretending everything was set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
We can change vnc password by using virDomainUpdateDeviceFlags API with
live flag. But it can't be changed with config flag. Error is reported as
below.
error: Operation not supported: persistent update of device 'graphics' is not supported
This patch supports the graphics arguments changed with config flag.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
It's not supported to change some graphics arguments with '--live'.
Replace some error code VIR_ERR_INTERNAL_ERROR and VIR_ERR_INVALID_ARG
with VIR_ERR_OPERATION_UNSUPPORTED.
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>
We now have a qemuInterfaceStartDevices() which does the final
activation needed for the host-side tap/macvtap devices that are used
for qemu network connections. It will soon make sense to have the
converse qemuInterfaceStopDevices() which will undo whatever was done
during qemuInterfaceStartDevices().
A function to "stop" a single device has also been added, and is
called from the appropriate place in qemuDomainDetachNetDevice(),
although this is currently unnecessary - the device is going to
immediately be deleted anyway, so any extra "deactivation" will be for
naught. The call is included for completeness, though, in anticipation
that in the future there may be some required action that *isn't*
nullified by deleting the device.
This patch is a part of a more complete fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
Currently, MAC registration occurs during device creation, which is
early enough that, during live migration, you end up with duplicate
MAC addresses on still-running source and target devices, even though
the target device isn't actually being used yet.
This patch proposes to defer MAC registration until right before
the guest can actually use the device -- In other words, right
before starting guest CPUs.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
qemuNetworkIfaceConnect() used to have a special case for
actualType='network' (a network with forward mode of route, nat, or
isolated) to call the libvirt public API to retrieve the bridge being
used by a network. That is no longer necessary - since all network
types that use a bridge and tap device now get the bridge name stored
in the ActualNetDef, we can just always use
virDomainNetGetActualBridgeName() instead.
(an audit of the two callers to qemuNetworkIfaceConnect() confirms
that it is never called for any other type of network, so the dead
code in the else statement (logging an internal error if it is called
for any other type of network) is eliminated in the process.)
Since virNetworkFree will call virObjectUnref anyway, let's just use that
directly so as to avoid the possibility that we inadvertently clear out
a pending error message when using the public API.
Coverity complained that because the cfg->macFilter call checked
net->ifname != NULL before calling ebtablesRemoveForwardAllowIn, then
the virNetDevOpenvswitchRemovePort call should have the same check.
However, if I move the ebtables call prior to the check for TYPE_DIRECT
(where there is a VIR_FREE(net->ifname)), then it seems Coverity is
happy. Since firewall info is tacked on last during setup, removing
it in the opposite order of initialization seems to be natural anyway
Ethernet interfaces in libvirt currently do not support bandwidth setting.
For example, following xml file for an interface will not apply these
settings to corresponding qdiscs.
<interface type="ethernet">
<mac address="02:36:1d:18:2a:e4"/>
<model type="virtio"/>
<script path=""/>
<target dev="tap361d182a-e4"/>
<bandwidth>
<inbound average="984" peak="1024" burst="64"/>
<outbound average="2000" peak="2048" burst="128"/>
</bandwidth>
</interface>
Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Hotplugging and hotunplugging char devices is only supported through
'-device' and the check for device capability should be independently.
Coverity also complains about 'tmpChr->info.alias' could be NULL and we
are dereferencing it but it somehow only in this case don't recognize
that the value is set by 'qemuAssignDeviceChrAlias' so it's clearly
false positive. Add sa_assert to make coverity happy.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In qemuDomainDetachControllerDevice if the info.alias already exists
a call to qemuAssignDeviceControllerAlias would overwrite the existing
so avoid this possibility.
Prior patch removed the need for the virConnectPtr in the unplug
detach host path which caused ripple effect to remove in multiple
callers. The previous patch just left things as ATTRIBUTE_UNUSED -
this patch will remove the variable.
https://bugzilla.redhat.com/show_bug.cgi?id=1141732
Introduced by commit id '8f76ad99' the logic to detach a scsi_host
device (SCSI or iSCSI) fails when attempting to remove the 'drive'
because as I found in my investigation - the DelDevice takes care of
that for us.
The investigation turned up commits to adjust the logic for the
qemuMonitorDelDevice and qemuMonitorDriveDel processing for interfaces
(commit id '81f76598'), disk bus=VIRTIO,SCSI,USB (commit id '0635785b'),
and chr devices (commit id '55b21f9b'), but nothing with the host devices.
This commit uses the model for the previous set of changes and applies
it to the hostdev path. The call to qemuDomainDetachHostSCSIDevice will
return to qemuDomainDetachThisHostDevice handling either the audit of
the failure or the wait for the removal and then call into
qemuDomainRemoveHostDevice for the event, removal from the domain hostdev
list, and audit of the removal similar to other paths.
NOTE: For now the 'conn' param to +qemuDomainDetachHostSCSIDevice is left
as ATTRIBUTE_UNUSED. Removing requires a cascade of other changes to be
left for a future patch.
This patch adds parsing/formatting code as well as documentation for
shared memory devices. This will currently be only accessible in QEMU
using it's ivshmem device, but is designed as generic as possible to
allow future expansion for other hypervisors.
In the devices section in the domain XML users may specify:
- For shmem device using a server:
<shmem name='shmem0'>
<server path='/tmp/socket-ivshmem0'/>
<size unit='M'>32</size>
<msi vectors='32' ioeventfd='on'/>
</shmem>
- For ivshmem device not using an ivshmem server:
<shmem name='shmem1'>
<size unit='M'>32</size>
</shmem>
Most of the configuration is made optional so it also allows
specifications like:
<shmem name='shmem1/>
<shmem name='shmem2'>
<server/>
</shmem>
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Request erroring out from the backing chain traveller and drop qemu's
internal backing chain integrity tester.
The backing chain traveller reports errors by itself with possibly more
detail than qemuDiskChainCheckBroken ever could.
We also need to make sure that we reconnect to existing qemu instances
even at the cost of losing the backing chain info (this really should be
stored in the XML rather than reloaded from disk, but that needs some
work).
https://bugzilla.redhat.com/show_bug.cgi?id=1095636
When starting up the domain the domain's NICs are allocated. As of
1f24f682 (v1.0.6) we are able to use multiqueue feature on virtio
NICs. It breaks network processing into multiple queues which can be
processed in parallel by different host CPUs. The queues are, however,
created by opening /dev/net/tun several times. Unfortunately, only the
first FD in the row is labelled so when turning the multiqueue feature
on in the guest, qemu will get AVC denial. Make sure we label all the
FDs needed.
Moreover, the default label of /dev/net/tun doesn't allow
attaching a queue:
type=AVC msg=audit(1399622478.790:893): avc: denied { attach_queue }
for pid=7585 comm="qemu-kvm"
scontext=system_u:system_r:svirt_t:s0:c638,c877
tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023
tclass=tun_socket
And as suggested by SELinux maintainers, the tun FD should be labeled
as svirt_t. Therefore, we don't need to adjust any range (as done
previously by Guannan in ae368ebf) rather set the seclabel of the
domain directly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Pass the source of the changed media instead of a complete disk
definition.
Note that the @disk argument now contains what @olddisk would contain.
The new source is passed as a virStorageSource struct.
When we are changing media (or doing other hotplug operations) we need
to setup cgroups, locking and seclabels on the new disk. This is a
multi-step process where every piece can fail. To simplify dealing with
this introduce qemuDomainPrepareDisk that similarly to
qemuDomainPrepareDiskChainElement initializes/tears down a whole new
disk to be used with the domain.
Additionally the function supports passing a different source struct for
media changes of cdroms that will be refactored later.
Currently, qemu driver uses qemuTranslateDiskSourcePool()
to translate disk volume information. This function is
general enough and could be used for other drivers as well,
so move it to conf/domain_conf.c along with its helpers.
- qemuTranslateDiskSourcePool: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePool,
- qemuAddISCSIPoolSourceHost: move to storage/storage_driver.c
and rename to virStorageAddISCSIPoolSourceHost,
- qemuTranslateDiskSourcePoolAuth: move to storage/storage_driver.c
and rename to virStorageTranslateDiskSourcePoolAuth,
- Update users of qemuTranslateDiskSourcePool to use a
new name.
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:
"entering monitor without asking for a nested job is dangerous"
The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.
This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.
https://bugzilla.redhat.com/show_bug.cgi?id=1130089
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Otherwise this beautiful error would be overwritten when
the function is called with a really high rate number:
2014-07-28 12:51:47.920+0000: 2304: error : virCommandWait:2399 :
internal error: Child process (/sbin/tc class add dev vnet0 parent 1:
classid 1:1 htb rate 4294968kbps) unexpected exit status 1: Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
[prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
mpu minimum packet size used in rate computations
overhead per-packet size overhead used in rate computations
linklay adapting to a linklayer e.g. atm
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lowe
https://bugzilla.redhat.com/show_bug.cgi?id=1043735
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
As we are doing with the enum structures, a cleanup in "src/qemu/"
directory was done now. All the enums that were defined in the
header files were converted to typedefs in this directory. This
patch includes all the adjustments to remove conflicts when you do
this kind of change. "Enum-to-typedef"'s conversions were made in
"src/qemu/qemu_{capabilities, domain, migration, hotplug}.h".
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
A future patch will add two-phase block commit jobs; as the
mechanism for managing them is similar to managing a block copy
job, existing errors should be made generic enough to occur
for either job type.
* src/conf/domain_conf.c (virDomainHasDiskMirror): Update
comment.
* src/qemu/qemu_driver.c (qemuDomainDefineXML)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemuDomainBlockCopy): Update error
message.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Some of the APIs already return int since they can produce errors that
need to be propagated. For consistency reasons, this patch changes the
rest of the APIs to also return int even though they do not fail or
report any errors.
In general, we should only remove a backend after seeing DEVICE_DELETED
event for a corresponding frontend.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In general, we should only remove a backend after seeing DEVICE_DELETED
event for a corresponding frontend. This doesn't make any difference for
disks attached using -drive or drive_add since QEMU automatically
removes their backends but it's still better to make our code
consistent. And it may start making difference in case we switch to
attaching disks using -blockdev.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
[1] reported that we are removing network's backend too early. I didn't
really get the reproducer but libvirt behaves strangely when a guest
does not confirm the removal, e.g., it does not support PCI hotplug. In
such case, detaching a network device leaves its frontend in place but
removes the backend, which makes the device unusable for the guest.
Moreover attaching the same device again succeeds and both the guest and
libvirt will see two network interfaces attached but only one of them is
actually working.
I checked with Paolo Bonzini and he confirmed we should only remove a
backend after seeing DEVICE_DELETED event for a corresponding frontend.
[1] https://www.redhat.com/archives/libvir-list/2014-March/msg01740.html
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In "src/conf/domain_conf.h" there are many enum declarations. The
cleanup in this header filer was started, but it wasn't enough and
there are many other files that has enum variables declared. So, the
commit was starting to be big. This commit finish the cleanup in this
header file and in other files that has enum variables, parameters,
or functions declared.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
In "src/conf/domain_conf.h" there are many enumerations (enum)
declarations to be converted as a typedef too. As mentioned before,
it's better to use a typedef for variable types, function types and
other usages. I think this file has most of those enum declarations
at "src/conf/". So, me and Eric Blake plan to keep the cleanups all
over the source code. This time, most of the files changed in this
commit are related to part of one file: "src/conf/domain_conf.h".
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
If QEMU supports DEVICE_DELETED event, we always call
qemuDomainRemoveDevice from the event handler. However, we will need to
push this call away from the main event loop and begin a job for it (see
the following commit), we need to make sure the device is fully removed
by the original thread (and within its existing job) in case the
DEVICE_DELETED event arrives before qemuDomainWaitForDeviceRemoval times
out.
Without this patch, device removals would be guaranteed to never finish
before the timeout because the could would be blocked by the original
job being still active.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The commit 84c59ffa improved the way we change ejectable media.
If for any reason the first "eject" didn't open the tray we
should return with error.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Move sharable PCI handling functions to domain_addr.[ch], and
change theirs prefix from 'qemu' to 'vir':
- virDomainPCIAddressAsString;
- virDomainPCIAddressBusSetModel;
- virDomainPCIAddressEnsureAddr;
- virDomainPCIAddressFlagsCompatible;
- virDomainPCIAddressGetNextSlot;
- virDomainPCIAddressReleaseSlot;
- virDomainPCIAddressReserveAddr;
- virDomainPCIAddressReserveNextSlot;
- virDomainPCIAddressReserveSlot;
- virDomainPCIAddressSetFree;
- virDomainPCIAddressSetGrow;
- virDomainPCIAddressSlotInUse;
- virDomainPCIAddressValidate;
The only change here is function names, the implementation itself
stays untouched.
Extract common allocation code from DomainPCIAddressSetCreate
into virDomainPCIAddressSetAlloc.
This uses the new QEMU_CAPS_HOST_PCI_MULTIDOMAIN capability when
present, for -devivce pci-assign, -device vfio-pci, and -pcidevice.
While creating tests for this new functionality, I noticed that the
xmls for two existing tests had erroneously specified an
until-now-ignored domain="0x0002", so I corrected those two tests, and
also added two failure tests to be sure that we alert users who
attempt to use a non-zero domain with a qemu that doesn't support it.
If a domain network interface that contains a <filterref> is modified
"live" using "virsh update-device --live", libvirtd would crash. This
was because the code supporting live update of an interface's
filterref was assuming that a filterref might be added or modified,
but didn't account for removing the filterref, resulting in a null
dereference of the filter name.
Introduced with commit 258fb278, which was first in libvirt v1.0.1.
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1093301
The check for a network being active during interface attach was being
done individually in several places (by both the lxc driver and the
qemu driver), but those places were too specific, leading to it *not*
being checked when allocating a connection/device from a macvtap or
hostdev network.
This patch puts a single check in networkAllocateActualDevice(), which
is always called before the any network interface is attached to any
type of domain. It also removes all the other now-redundant checks
from the lxc and qemu drivers.
NB: the following patches are prerequisites for this patch, in the
case that it is backported to any branch:
440beeb network: fix virNetworkObjAssignDef and persistence
8aaa5b6 network: create statedir during driver initialization
b9e9549 network: change location of network state xml files
411c548 network: set macvtap/hostdev networks active if their state
file exists
This fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=880483
Since it is an abbreviation, PCI should always be fully
capitalized or full lower case, never Pci.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Every caller checked the return value and logged an error
- one if no device with the specified MAC was found,
other if there were multiple devices matching the MAC address
(except for qemuDomainUpdateDeviceConfig which logged the same
message in both cases).
Move the error reporting into virDomainNetFindIdx, since in both cases,
we couldn't find one single match - it's just the error messages that
differ.
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change any method names with Usb, Pci or Scsi to use
USB, PCI and SCSI since they are abbreviations.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
For extracting hostdev codes from qemu_hostdev.c to common library, change qemu
specific cfg->relaxedACS handling to be a flag, and pass it to hostdev
functions.
Same logic of preparing/reattaching hostdevs could be used in attach/detach
hotplug places, so reuse hostdev interfaces to avoid duplicate, also for later
extracting general code to common library.
The qemu_bridge_filter.c file had some helpers for calling
the ebtablesXXX functions todo bridge filtering. The only
thing these helpers did was to overwrite the original error
message from the ebtables code. For added fun, the callers
of these helpers overwrote the errors yet again. For even
more fun, one of the helpers called another helper and
overwrite its errors too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Avoid the freeing of an array of zero file descriptors in case
of error. Initialize the array to -1 using memset.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
There might be some use cases, where user wants to prepare the host or
its environment prior to starting a network and do some cleanup after
the network has been shut down. Consider all the functionality that
libvirt doesn't currently have as an example what a hook script can
possibly do.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The code took into account only the global permissions. The domains now
support per-vm DAC labels and per-image DAC labels. Use the most
specific label available.
commit f094aaac changed qemuPrepareHostdevPCIDevices() such that it
may modify the "backend" (vfio vs. legacy kvm) setting in the
virHostdevDef. However, qemuDomainAttachHostPciDevice() (used by
hotplug) copies the backend setting into a local *before* calling
qemuPrepareHostdevPCIDevices(), and then later makes a decision based
on that pre-change value.
The result is that, if the backend had been set to "default" (i.e. not
specified in the config) and was later updated to "VFIO" by
qemuPrepareHostdevPCIDevices(), the qemu process' MacMemLock is not
increased (as is required for VFIO device assignment).
This patch delays making the local copy of backend until after its
potential modification.
This eliminates the misleading error message that was being logged
when a vfio hostdev hotplug failed:
error: unable to set user and group to '107:107' on '/dev/vfio/22':
No such file or directory
as documented in:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
Commit ee414b5d (pushed as a fix for Bug 1016511 and part of Bug
1025108) replaced the single call to
virSecurityManagerSetHostdevLabel() in qemuDomainAttachHostDevice()
with individual calls to that same function in each
device-type-specific attach function (for PCI, USB, and SCSI). It also
added a corresponding call to virSecurityManagerRestoreHostdevLabel()
in the error handling of the device-type-specific functions, but
forgot to remove the common call to that from
qemuDomainAttachHostDevice() - this resulted in a duplicate call to
virSecurityManagerRestoreHostdevLabel(), with the second occurrence
being after (e.g.) a PCI device has already been re-attached to the
host driver, thus destroying some of the device nodes / links that we
then attempted to re-label (e.f. /dev/vfio/22) and generating an error
log that obscured the original error.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
virProcessSetMaxMemLock() (which is a wrapper over prlimit(3)) expects
the memory size in bytes, but libvirt's domain definition (which was
being used by qemuDomainAttachHostPciDevice()) stores all memory
tuning parameters in KiB. This was being accounted for when setting
MaxMemLock at domain startup time (so cold-plugged devices would
work), but not for hotplug.
This patch simplifies the few lines that call
virProcessSetMemMaxLock(), and multiply the amount * 1024 so that
we're locking the correct amount of memory.
What remains a mystery to me is why hot-plug of a managed='no' device
would succeed (at least on my system) while managed='yes' would
fail. I guess in one case the memory was coincidentally already
resident and in the other it wasn't.
We were unconditionally removing the device from the host list, when it
should only be done on error.
This fixes USB collision detection when hotplugging the same device to
two guests.
If we hit a collision, we free the USB device while it is still part
of our temporary USBDeviceList. When the list is unref'd, the device
is free'd again.
Make the initial device freeing dependent on whether it is present
in the temporary list or not.
Similar to what Jiri did for cgroup setup/teardown in 05e149f94, push
it all into the device handler functions so we can do the necessary prep
work before claiming the device.
This also fixes hotplugging USB devices by product/vendor (virt-manager's
default behavior):
https://bugzilla.redhat.com/show_bug.cgi?id=1016511
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1029732
The BZ asked for the capability to change the number of queues used by
a virtio-net device while the device is in use. Because the number of
queues can only be set at the time the device is created, that isn't
possible. However, libvirt also shouldn't be silently reporting
success when someone tries to change the number of queues. So this
patch flags that as an error (just as attempts to change any of the
other virtio-specific parameters already do).
If a SCSI hostdev is included in an initial domain XML, without a
corresponding controller statement, one is created silently when the
guest is booted.
When hotplugging a SCSI hostdev, a presumption is that the controller
is already present in the domain either from the original XML, or via
an earlier hotplug.
[root@xxxxxxxx ~]# cat disk.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='3' unit='1088438288'/>
</source>
</hostdev>
[root@xxxxxxxx ~]# virsh attach-device guest01 disk.xml
error: Failed to attach device from disk.xml
error: internal error: unable to execute QEMU command 'device_add': Bus 'scsi0.0' not found
Since the infrastructure is in place, we can also create a controller
silently for use by the hotplugged hostdev device.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For systems without a PCI bus, attaching a SCSI controller fails:
[root@xxxxxxxx ~]# cat controller.xml
<controller type='scsi' model='virtio-scsi' index='0' />
[root@xxxxxxxx ~]# virsh attach-device guest01 controller.xml
error: Failed to attach device from controller.xml
error: XML error: No PCI buses available
A similar problem occurs with the detach of a controller:
[root@xxxxxxxx ~]# virsh detach-device guest01 controller.xml
error: Failed to detach device from controller.xml
error: operation failed: controller scsi:0 not found
The qemuDomainXXtachPciControllerDevice routines made assumptions
that any caller had a PCI bus. These routines now selectively calls
PCI functions where necessary, and assigns the device information
type to one appropriate for the bus in use.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
For attach/detach of controller devices, we rename the functions to
remove 'PCI' from their title. The actual separation of PCI-specific
operations will be handled in the next patch.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1025108
So far qemuSetupHostdevCGroup was called very early during hotplug, even
before we knew the device we were about to hotplug was actually
available. By calling the function later, we make sure QEMU won't be
allowed to access devices used by other domains.
Another important effect of this change is that hopluging USB devices
specified by vendor and product (but not by their USB address) works
again. This was broken since v1.0.5-171-g7d763ac, when the call to
qemuFindHostdevUSBDevice was moved after the call to
qemuSetupHostdevCGroup, which then used an uninitialized USB address.
This patch moves some code in the qemuDomainAttachSCSIDisk
function. The check for the existence of a PCI address assigned
to the SCSI controller was moved in order to be executed only
when needed. The PCI address of a controller is not necessary
if QEMU_CAPS_DEVICE is supported.
This fixes issues with the hotplug of SCSI disks on pseries guests.
This patch (and the two patches that precede it) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1005682
When libvirt was changed to delay the final cleanup of device removal
until the qemu process had signaled it with a DEVICE_DELETED event for
that device, the hostdev removal function
(qemuDomainRemoveHostDevice()) was written to properly handle the
removal of a hostdev that was actually an SRIOV virtual function
(defined with <interface type='hostdev'>). However, the function used
to search for a device matching the alias name provided in the
DEVICE_DELETED message (virDomainDefFindDevice()) would search through
the list of netdevs before hostdevs, so qemuDomainRemoveHostDevice()
was never called; instead the netdev function,
qemuDomainRemoveNetDevice() (which *doesn't* properly cleanup after
removal of <interface type='hostdev'>), was called.
(As a reminder - each <interface type='hostdev'> results in a
virDomainNetDef which contains a virDomainHostdevDef having a parent
type of VIR_DOMAIN_DEVICE_NET, and parent.data.net pointing back to
the virDomainNetDef; both Defs point to the same device info object
(and the info contains the device's "alias", which is used by qemu to
identify the device). The virDomainHostdevDef is added to the domain's
hostdevs list *and* the virDomainNetDef is added to the domain's nets
list, so searching either list for a particular alias will yield a
positive result.)
This function modifies the qemuDomainRemoveNetDevice() to short
circuit itself and call qemu DomainRemoveHostDevice() instead when the
actual device is a VIR_DOMAIN_NET_TYPE_HOSTDEV (similar logic to what
is done in the higher level qemuDomainDetachNetDevice())
Note that even if virDomainDefFindDevice() changes in the future so
that it finds the hostdev entry first, the current code will continue
to work properly.
This function was called in three places, and in each the call was
qualified by a slightly different conditional. In reality, this
function should only be called for a hostdev if all of the following
are true:
1) mode='subsystem'
2) type='pci'
3) there is a parent device definition which is an <interface>
(VIR_DOMAIN_DEVICE_NET)
We can simplify the callers and make them more consistent by checking
these conditions at the top ov qemuDomainHostdevNetConfigRestore and
returning 0 if one of them isn't satisfied.
The location of the call to qemuDomainHostdevNetConfigRestore() has
also been changed in the hot-plug case - it is moved into the caller
of its previous location (i.e. from qemuDomainRemovePCIHostDevice() to
qemuDomainRemoveHostDevice()). This was done to be more consistent
about which functions pay attention to whether or not this is one of
the special <interface> hostdevs or just a normal hostdev -
qemuDomainRemoveHostDevice() already contained a call to
networkReleaseActualDevice() and virDomainNetDefFree(), so it makes
sense for it to also handle the resetting of the device's MAC address
and vlan tag (which is what's done by
qemuDomainHostdevNetConfigRestore()).
Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The return value of virDomainControllerFind >=0 means that
the specific controller was found.
But some functions invoke it and treat 0 as not found.
This patch fix these incorrect invocation.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
allias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
depending memory to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
If there's no hard_limit set and domain uses VFIO we still must lock
the guest memory (prerequisite from qemu). Hence, we should compute
the amount to be locked from max_balloon.
If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Hotplugging a single SCSI device works, but adding additional ones
result in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Changing the index in the hotplug case
to a negative one so the next available index is used.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
Signed-off-by: Eric Blake <eblake@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
If an error occurs during qemuDomainAttachNetDevice after the macvtap
was created in qemuPhysIfaceConnect, the macvtap device gets left behind.
This patch adds code to the cleanup routine to delete the macvtap.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
I recently patches the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate. At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
cc: mprivozn@redhat.com
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
Since 0d70656afd, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
This adds both attachment and detachment support for scsi host
device.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat>
It's better to put the usb related codes into qemuDomainAttachHostUsbDevice
instead of qemuDomainAttachHostDevice.
And in the old qemuDomainAttachHostDevice, just stealing the "usb" from
driver->activeUsbHostdevs leaks the memory.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
The USB-specific cgroup setup had been inserted inline in
qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a
common cgroup setup function called for all hostdevs, so it makes sens
to put the usb-specific setup there and just rely on that function
being called.
The one thing I'm uncertain of here (and a reason for not pushing
until after release) is that previously hostdev->missing was checked
only when starting a domain (and cgroup setup for the device skipped
if missing was true), but with this consolidation, it is now checked
in the case of hotplug as well. I don't know if this will have any
practical effect (does it make sense to hotplug a "missing" usb
device?)
PCIO device assignment using VFIO requires read/write access by the
qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the
VFIO group number that the assigned device belongs to (and can be
found with the function virPCIDeviceGetVFIOGroupDev)
/dev/vfio/vfio can be accessible to any guest without danger
(according to vfio developers), so it is added to the static ACL.
The group device must be dynamically added to the cgroup ACL for each
vfio hostdev in two places:
1) for any devices in the persistent config when the domain is started
(done during qemuSetupCgroup())
2) at device attach time for any hotplug devices (done in
qemuDomainAttachHostDevice)
The group device must be removed from the ACL when a device it
"hot-unplugged" (in qemuDomainDetachHostDevice())
Note that USB devices are already doing their own cgroup setup and
teardown in the hostdev-usb specific function. I chose to make the new
functions generic and call them in a common location though. We can
then move the USB-specific code (which is duplicated in two locations)
to this single location. I'll be posting a followup patch to do that.
This isn't strictly speaking a bugfix, but I realized I'd gotten a bit
too verbose when I chose the names for
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_*. This shortens them all a bit.
<source type='bridge'> uses a helper application to do the necessary
TUN/TAP setup to use an existing network bridge, thus letting
unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid
programs at all, which would include this helper program. From
a security POV, any setuid helper needs to be run by libvirtd itself,
not QEMU.
This is what this patch does. libvirt now invokes the setuid helper,
gets the TAP fd and then passes it to QEMU in the normal manner.
The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the
XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VFIO requires all of the guest's memory and IO space to be lockable in
RAM. The domain's max_balloon is the maximum amount of memory the
domain can have (in KiB). We add a generous 1GiB to that for IO space
(still much better than KVM device assignment, where the KVM module
actually *ignores* the process limits and locks everything anyway),
and convert from KiB to bytes.
In the case of hotplug, we are changing the limit for the already
existing qemu process (prlimit() is used under the hood), and for
regular commandline additions of vfio devices, we schedule a call to
setrlimit() that will happen after the qemu process is forked.