Add support for the new approach to hotplugging vcpus via device_add
during qemu startup, which allows sparse vcpu topologies.
There are a few limitations imposed by qemu on the supported
configuration (roughly sketched in the code below):
- vcpu0 always needs to be present and not hotpluggable
- non-hotpluggable cpus need to be ordered at the beginning
- the order of the vcpus needs to be unique for every single hotpluggable
entity
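A minimal sketch of what these checks amount to, assuming a hypothetical
vcpu-definition array (struct and function names are illustrative, not
libvirt's actual ones):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical representation of one vcpu definition from the XML. */
    struct demo_vcpu {
        bool online;        /* enabled='yes' */
        bool hotpluggable;  /* hotpluggable='yes' */
        int order;          /* 0 means "no order specified" */
    };

    /* Check the qemu-imposed limitations listed above, assuming for
     * simplicity that every hotpluggable entity consists of one vcpu. */
    static bool
    demo_vcpus_valid(const struct demo_vcpu *vcpus, size_t nvcpus)
    {
        size_t i;
        size_t j;
        bool seen_hotpluggable = false;

        /* vcpu0 must always be present and not hotpluggable */
        if (nvcpus == 0 || !vcpus[0].online || vcpus[0].hotpluggable)
            return false;

        for (i = 0; i < nvcpus; i++) {
            /* non-hotpluggable cpus must all come before hotpluggable ones */
            if (vcpus[i].hotpluggable)
                seen_hotpluggable = true;
            else if (seen_hotpluggable)
                return false;

            /* order must be unique per hotpluggable entity */
            for (j = 0; j < i; j++) {
                if (vcpus[i].hotpluggable && vcpus[j].hotpluggable &&
                    vcpus[i].order != 0 && vcpus[i].order == vcpus[j].order)
                    return false;
            }
        }

        return true;
    }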
Qemu also doesn't allow querying the information necessary to start a VM
with the vcpus specified directly on the command line. Fortunately they
can be hotplugged during startup.
The new hotplug code uses the following approach:
- non-hotpluggable vcpus are counted and put into the -smp option
- qemu is started
- qemu is queried for the necessary information
- the configuration is checked
- the hotpluggable vcpus are hotplugged
- vcpus are started
This patch adds a lot of checking code and enables support for
specifying individual vcpu elements with qemu.
The vcpu order information is extracted only for hotpluggable entities,
while vcpu definitions belonging to the same hotpluggable entity all
need to share the order information.
We also can't overwrite it right away in the vcpu info detection code,
as the order is necessary to add the hotpluggable vcpus enabled on boot
in the correct order.
The helper will store the order information in places where we are
certain that it's necessary.
Introduce a new migration cookie flag that will be used for any
configuration that is not compatible with older libvirt versions lacking
support for the new vcpu hotplug approach. This will make sure that an
old libvirt does not attempt to reproduce the configuration incorrectly.
Individual vCPU hotplug requires us to track the state of every vCPU. To
allow this, add the following XML:
<domain>
  ...
  <vcpu current='2'>3</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
  </vcpus>
  ...
The 'enabled' attribute allows controlling the state of the vcpu,
'hotpluggable' controls whether a given vcpu can be hotplugged, and
'order' specifies the order in which to add the vcpus.
Similarly to devices, the guest may unplug a vCPU while libvirt is not
running. To avoid problems, refresh the vcpu state on reconnect; don't
touch the vcpu state otherwise.
Now that the monitor code gathers all the data, we can extract it into
the relevant places, either in the definition or in the private data of
a vcpu. Since only the thread id is broken for TCG guests, we can
extract the rest of the data and just skip assigning the thread id.
Should qemu ever allow cpu hotplug in TCG mode, this will make it work
eventually.
On the Power 8 platform the basic hotpluggable unit is a core, rather
than a thread as on the x86_64 family. This introduces most of the
complexity of the matching code and thus needs to be tested.
The test data were captured from in-order cpu hotplug and unplug
operations.
During review it was reported that adding 11 or more vcpus creates a
collision of prefixes in the monitor matching algorithm (one entity's
QOM path becoming a string prefix of another's). Add a test case to
verify that the problem won't happen.
As the combination algorithm is rather complex and ugly, it's necessary
to make sure it works properly. Add test suite infrastructure for
testing it, along with a basic test based on the x86_64 platform.
For hotplug purposes it's necessary to retrieve data using
query-hotpluggable-cpus, while the old query-cpus API reports thread IDs
and hotplug order.
This patch adds code that merges the data using a rather non-trivial
algorithm and fills it into the qemuMonitorCPUInfo structure so it can
be added to the appropriate place in the domain definition.
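A rough sketch of the matching idea, assuming hypothetical structures for
the two replies (field names are illustrative, not the actual monitor
code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* One entry from query-hotpluggable-cpus (hypothetical fields). */
    struct demo_hotplug_cpu {
        const char *qom_path;   /* e.g. "/machine/peripheral/vcpu1" */
        unsigned int vcpus;     /* number of threads provided by this entity */
    };

    /* One entry from query-cpus (hypothetical fields). */
    struct demo_cpu_entry {
        const char *qom_path;   /* QOM path of the vcpu thread */
        int tid;                /* host thread id */
    };

    struct demo_merged {
        int first_tid;          /* thread id of the entity's first thread */
        unsigned int nthreads;  /* how many query-cpus entries matched */
    };

    /* A query-cpus entry belongs to a hotpluggable entity when its QOM path
     * starts with the entity's path and the match ends at a component
     * boundary, so ".../vcpu1" does not accidentally match ".../vcpu10". */
    static bool
    demo_cpu_belongs_to(const struct demo_cpu_entry *entry,
                        const struct demo_hotplug_cpu *hp)
    {
        size_t len = strlen(hp->qom_path);

        if (strncmp(entry->qom_path, hp->qom_path, len) != 0)
            return false;

        return entry->qom_path[len] == '\0' || entry->qom_path[len] == '/';
    }

    /* Merge the two replies: for each hotpluggable entity collect the
     * thread information of the vcpu threads it provides. */
    static void
    demo_merge(const struct demo_hotplug_cpu *hp, size_t nhp,
               const struct demo_cpu_entry *cpus, size_t ncpus,
               struct demo_merged *out)
    {
        size_t i;
        size_t j;

        for (i = 0; i < nhp; i++) {
            out[i].first_tid = -1;
            out[i].nthreads = 0;

            for (j = 0; j < ncpus; j++) {
                if (!demo_cpu_belongs_to(&cpus[j], &hp[i]))
                    continue;
                if (out[i].nthreads++ == 0)
                    out[i].first_tid = cpus[j].tid;
            }
        }
    }

The component-boundary check in demo_cpu_belongs_to is what a plain prefix
match would miss once there are more than ten vcpus.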
Add support for retrieving information regarding hotpluggable cpu units
supported by qemu. Data returned by the command carries information
needed to figure out the granularity of hotplug, the necessary cpu type
name and the topology information.
Note that qemu doesn't specify any particular order of the entries, thus
it's necessary to sort them by socket_id, core_id and thread_id into the
order libvirt expects.
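For illustration, a qsort-style comparator implementing that ordering over
a hypothetical flattened entry (the struct is illustrative; only the field
names mirror the JSON keys):

    #include <stdlib.h>

    /* Hypothetical flattened entry from query-hotpluggable-cpus. */
    struct demo_topo_entry {
        int socket_id;
        int core_id;
        int thread_id;
    };

    static int
    demo_topo_cmp(const void *opaque1, const void *opaque2)
    {
        const struct demo_topo_entry *a = opaque1;
        const struct demo_topo_entry *b = opaque2;

        if (a->socket_id != b->socket_id)
            return a->socket_id < b->socket_id ? -1 : 1;
        if (a->core_id != b->core_id)
            return a->core_id < b->core_id ? -1 : 1;
        if (a->thread_id != b->thread_id)
            return a->thread_id < b->thread_id ? -1 : 1;
        return 0;
    }

    /* Usage: qsort(entries, nentries, sizeof(*entries), demo_topo_cmp); */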
To allow matching up the data returned by query-cpus with entries in the
query-hotpluggable-cpus reply for CPU hotplug, it's necessary to extract
the QOM path, as it's the only link between the two.
QEMU reports whether 'query-hotpluggable-cpus' is supported for a given
machine type. Extract and cache the information using the capability
cache.
When copying the capabilities for a new start of qemu, mask out the
presence of QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS if the machine type
doesn't support hotpluggable cpus.
As of qemu commit:

  commit a32ef3bfc12c8d0588f43f74dcc5280885bbdb30
  Author: Thomas Huth <thuth@redhat.com>
  Date:   Wed Jul 22 15:59:50 2015 +0200

      vl: Add another sanity check to smp_parse() function

  v2.4.0-952-ga32ef3b
a configuration where the maximum CPU count doesn't match the topology
is rejected; prior to that, only configurations where the topology would
contain more cpus than the maximum count were rejected. For example
(illustrative numbers), a maximum of 8 vcpus with a topology of
2 sockets x 2 cores x 1 thread (only 4 cpus) is now rejected.
Use QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS as a sufficiently recent witness
to avoid breaking old configs.
Prepare to extract more data by returning an array of structs rather than
just an array of thread ids. Additionally report fatal errors separately
from qemu not being able to produce data.
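A hedged sketch of what such an interface can look like; the type and the
error-reporting convention are illustrative, not the actual
qemuMonitorCPUInfo API:

    #include <stdbool.h>

    /* Illustrative per-vcpu record replacing a bare thread-id array. */
    struct demo_cpu_info {
        int tid;        /* host thread id, 0 if unknown */
        bool online;    /* currently enabled in qemu */
    };

    /* Returns the number of entries on success, storing them in *info.
     * Returns -1 on a fatal error (reported to the caller), or 0 with
     * *info set to NULL when qemu simply cannot provide the data, so the
     * two situations can be told apart. */
    int demo_get_cpu_info(struct demo_cpu_info **info);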
Use the full versions of the messages, instead of composing a base
message with what was updated; the change makes the messages properly
translatable, since different parts of a sentence might, for example,
need different declensions in some languages.
Turn various vshPrint() informative messages into vshPrintExtra(), so
they are not printed when quiet mode is requested; neither XML/info
outputs nor the results of commands are affected.
Also change the expected outputs of the virsh-undefine test, since virsh
is invoked in quiet mode there.
Some informative messages might still need to be converted (and thus
silenced when in quiet mode), but this is an improvement nonetheless.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1358179
vzDomainMigrateConfirm3Params is whitelisted. Otherwise we would need to
move removing the domain from the domain list from the perform step to
the confirm step. This would further imply adding a flag and a check
that migration is in progress, to prohibit mistakenly (or maliciously)
removing domains in the confirm step. The vz version of p2p would also
need to be fixed to include the confirm step.
One would also need to add means to clean up a pending migration
on client disconnect, as it would then have state spanning several API
calls.
On the other hand, the current version of the confirm step is totally
harmless, thus it is easier to whitelist it at the moment.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
This way we make naming consistent with the API calls and make
subsequent ACL checks possible (otherwise the ACL checking script would
discover name discrepancies).
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
The ACL check for the perform step should be in the API call itself to
make the ACL checking script pass. Thus we need to reorganize the code
to obtain the domain object in the perform API itself. Most of this is
straightforward; the only nuance is dropping locks during lengthy remote
operations.
The other motivation is to have only the perform step ACL check for
p2p migration, instead of checks in both begin and perform, which would
be the case if we left the ACL check in vzDomainMigratePerformStep.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
We need this to prepare the calls for ACL checks; otherwise the ACL
checking script will fail.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
This action deserves its own function, which makes the main API call
structure much cleaner.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
The original motivation is to expand API calls like start/stop etc. so
that the ACL checks can be added. But this patch has its own benefits:
1. functions like prlsdkStart/Stop use a common routine to wait for a
job without holding the domain lock. They become more self-contained
and do not return an intermediate PRL_RESULT.
2. vzDomainManagedSave does not update the cache twice.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1367259
The crash occurs because 'secrets' is dereferenced in this call:
if (qemuDomainSecretSetup(conn, priv, secinfo, disk->info.alias,
VIR_SECRET_USAGE_TYPE_VOLUME, NULL,
&src->encryption->secrets[0]->seclookupdef,
true) < 0)
(gdb) p *src->encryption
$1 = {format = 2, nsecrets = 0, secrets = 0x0, encinfo = {cipher_size = 0,
cipher_name = 0x0, cipher_mode = 0x0, cipher_hash = 0x0, ivgen_name = 0x0,
ivgen_hash = 0x0}}
(gdb) bt
priv=priv@entry=0x7fffc03be160, disk=disk@entry=0x7fffb4002ae0)
at qemu/qemu_domain.c:1087
disk=0x7fffb4002ae0, vm=0x7fffc03a2580, driver=0x7fffc02ca390,
conn=0x7fffb00009a0) at qemu/qemu_hotplug.c:355
Upon entry to qemuDomainAttachVirtioDiskDevice, src->encryption points
at a valid 'secret' buffer w/ nsecrets == 1; however, the call to
qemuDomainDetermineDiskChain calls virStorageFileGetMetadata and
eventually virStorageFileGetMetadataInternal, where src->encryption
is overwritten while probing the volume.
Commit id 'a48c7141' added code to virStorageFileGetMetadataInternal
to determine whether the disk/volume would use/need encryption and
allocated a meta->encryption. This overwrote an existing encryption
buffer already provided by the XML.
This patch adds a check for meta->encryption already being present
before allocating one and overwriting the existing buffer. It then
checks the existing encryption data to ensure the format the XML
provided for the disk matches the format read from the disk, and reports
an error if there is a mismatch.
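A simplified sketch of that guard, with hypothetical structure and
function names standing in for the real metadata code:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical, much-reduced stand-ins for the real structures. */
    struct demo_encryption {
        int format;     /* encryption format configured or detected */
    };

    struct demo_meta {
        struct demo_encryption *encryption;
    };

    /* Called while probing the image; 'detected' is the format found on
     * disk. */
    static int
    demo_set_encryption(struct demo_meta *meta, int detected)
    {
        if (!meta->encryption) {
            /* Nothing came from the XML: allocate and fill from the probe. */
            if (!(meta->encryption = calloc(1, sizeof(*meta->encryption))))
                return -1;
            meta->encryption->format = detected;
        } else if (meta->encryption->format != detected) {
            /* The XML already provided encryption data: don't overwrite it,
             * and error out when the configured format disagrees with what
             * the disk actually contains. */
            fprintf(stderr, "encryption format mismatch\n");
            return -1;
        }

        return 0;
    }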
For some unknown reason the original implementation of the <forwarder>
element only took advantage of part of the functionality in the
dnsmasq feature it exposes - it allowed specifying the ip address of a
DNS server to which *all* DNS requests would be forwarded, like this:
This is a frontend for dnsmasq's "server" option, which also allows
you to specify a domain that must be matched in order for a request to
be forwarded to a particular server. This patch adds support for
specifying the domain. For example:
<forwarder domain='example.com' addr='192.168.1.1'/>
<forwarder domain='www.example.com'/>
<forwarder domain='travesty.org' addr='10.0.0.1'/>
would forward requests for bob.example.com, ftp.example.com and
joe.corp.example.com all to the DNS server at 192.168.1.1, but would
forward requests for travesty.org and www.travesty.org to
10.0.0.1. And due to the second line, requests for www.example.com
and odd.www.example.com would be resolved by the libvirt network's own
DNS server (i.e. they wouldn't be immediately forwarded), even though
they also match 'example.com' - the match is given to the entry with
the longest matching domain. DNS requests not matching any of the
entries would be resolved by the libvirt network's own DNS server.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1331796
If you define a libvirt virtual network with one or more IP addresses,
it starts up an instance of dnsmasq. It has always been possible to
avoid dnsmasq's dhcp server (simply don't include a <dhcp> element),
but until now it wasn't possible to avoid having the DNS server
listening; even if the network has no <dns> element, a DNS server is
started using default settings.
This patch adds a new attribute to <dns>: enable='yes|no'. For
backward compatibility, it defaults to 'yes', but if you don't want a
DNS server created for the network, you can simply add:
<dns enable='no'/>
to the network configuration, and the next time the network is started
there will be no DNS server created (if there is a dhcp configuration,
dnsmasq will be started with "port=0", which disables the DNS server;
if there is no dhcp configuration, dnsmasq won't be started at all).
The new forward mode 'open' (<forward mode='open'/>) is just like
mode='route', except that no firewall rules are added to assure that
any traffic does or doesn't pass. It is assumed that either they aren't
necessary, or they will be set up outside the scope of libvirt.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=846810
==18324== 32 bytes in 1 blocks are still reachable in loss record 41 of 114
==18324== at 0x4C2C070: calloc (vg_replace_malloc.c:623)
==18324== by 0x4EA479B: virAlloc (viralloc.c:144)
==18324== by 0x4EA674A: virBitmapNewQuiet (virbitmap.c:77)
==18324== by 0x4EA67F7: virBitmapNew (virbitmap.c:106)
==18324== by 0x4EC777D: dnsmasqCapsNewEmpty (virdnsmasq.c:801)
==18324== by 0x4EC781B: dnsmasqCapsNewFromBuffer (virdnsmasq.c:815)
==18324== by 0x407CF4: mymain (networkxml2conftest.c:99)
==18324== by 0x409CF0: virTestMain (testutils.c:982)
==18324== by 0x4080EA: main (networkxml2conftest.c:136)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch fixes a bug which occurs when we check the bus and unit
number of a newly attached disk. We should do this check in a validate
callback, not in a PostParse callback, because in PostParse the
disk->info.addr.drive struct has not been initialized yet.
Move part of the code from domainPostParseCallback to
domainValidateCallback and part from devicesPostParseCallback to
deviceValidateCallback.
PostParse callbacks are for modifying data; validate callbacks are only
for checks.
While detaching/attaching a device in OpenStack, nova calls
vzDomainDetachDevice twice, because the update of the internal
configuration of the container comes a bit later than the update event.
As a result, we suffer from the second call trying to detach the same
device.
Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
If we are going to ignore the return value of a function that can raise
an error, it's not enough to use the ignore_value construct; we should
explicitly call virResetLastError.
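A small illustration of the pattern using the public API (the actual
patch applies it inside the vz driver; the helper name is made up):

    #include <libvirt/libvirt.h>
    #include <libvirt/virterror.h>

    /* Best-effort cleanup: we don't care whether destroying the domain
     * succeeds, but an ignored failure still leaves a thread-local error
     * set, so reset it to avoid reporting a stale error later from an
     * unrelated code path. */
    static void
    demo_cleanup(virDomainPtr dom)
    {
        if (virDomainDestroy(dom) < 0)
            virResetLastError();
    }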
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
First, make the logPrlEventErrorHelper function void and only
print information (if any) from an event.
Second, don't overwrite the original error with any errors we get
while parsing the event info.
Third, ignore PRL_ERR_NO_DATA entirely.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>