Silly this bug went unnoticed so long. At the beginning we try to
find the passed network in the list of network objects. If found,
it's locked and real work takes place. Then, in the end, the
network object is never unlocked.
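For illustration only, here is a minimal sketch of the pattern, not the actual driver code: every path taken after the lookup goes through a single cleanup label that drops the lock. The `net_obj` type and the `net_lock`/`net_unlock`/`net_find`/`net_activate` helpers are hypothetical stand-ins, and a plain flag replaces the real mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lockable network object. */
struct net_obj {
    const char *name;
    int locked;   /* 1 while held; real code would use a mutex */
    int active;
};

static void net_lock(struct net_obj *obj)
{
    assert(!obj->locked);
    obj->locked = 1;
}

static void net_unlock(struct net_obj *obj)
{
    assert(obj->locked);
    obj->locked = 0;
}

/* Returns the matching object LOCKED, or NULL if there is none. */
static struct net_obj *net_find(struct net_obj *list, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].name, name) == 0) {
            net_lock(&list[i]);
            return &list[i];
        }
    }
    return NULL;
}

/* Marks the network active; returns 0 on success, -1 on error. */
static int net_activate(struct net_obj *list, size_t n, const char *name)
{
    int ret = -1;
    struct net_obj *obj = net_find(list, n, name);

    if (!obj)
        return -1;        /* nothing was locked, nothing to undo */

    if (obj->active)
        goto cleanup;     /* the error path must unlock as well */

    obj->active = 1;
    ret = 0;

 cleanup:
    net_unlock(obj);      /* the step that was missing */
    return ret;
}
```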
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Okay, this is mainly for educational purposes, since it is called
from a single point only, with all the possible locks held. So
there's no way for another thread to hop in and do something wrong.
Nevertheless, we should not set a bad example.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We have this function networkObjFromNetwork() which for given
virNetworkPtr tries to find corresponding virNetworkObjPtr. If no
object is found, a nice error message is printed out:
no network with matching uuid '$uuid' ($name)
Let's improve the error message produced by networkLookupByUUID to
follow that logic.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1197600
So, libvirt uses a pid file to track the pid of each started qemu.
Whenever a domain is started, its pid is put into the corresponding
pid file. The pid file path is generated based on the domain name and
stored in the domain object internals. However, it's not stored in
the status XML and is therefore lost on daemon restart. Hence, later,
when the domain is being shut down, the daemon does not know which
pid file to unlink, and the pid file is left behind. To avoid this,
let's generate the pid file path again in qemuProcessReconnect().
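A rough sketch of the idea, under assumptions: the path is a pure function of the state directory and the domain name, so it can be recomputed on reconnect instead of being persisted. The helper names, the `dom_priv` struct and the "<dir>/<name>.pid" layout are illustrative, not the actual qemu driver code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<state_dir>/<name>.pid" (assumed layout); caller frees it. */
static char *build_pid_path(const char *state_dir, const char *name)
{
    assert(state_dir && name);

    size_t len = strlen(state_dir) + 1 + strlen(name) + strlen(".pid") + 1;
    char *path = malloc(len);

    if (path)
        snprintf(path, len, "%s/%s.pid", state_dir, name);
    return path;
}

/* Hypothetical per-domain private data. */
struct dom_priv {
    char *pidfile;
};

/* On reconnect, recompute the path if it was lost across a restart. */
static int reconnect_restore_pidfile(struct dom_priv *priv,
                                     const char *state_dir,
                                     const char *name)
{
    if (priv->pidfile)
        return 0;   /* still known, nothing to do */

    priv->pidfile = build_pid_path(state_dir, name);
    return priv->pidfile ? 0 : -1;
}
```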
Reported-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of checking defaultMode for every channel that has no mode
configured, test it only once outside of the channel loop. This fixes a
bug where, if all possible channels are for example set to insecure but
defaultMode is set to secure, we wouldn't auto-generate the TLS port.
This resulted in a failure while starting a guest.
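A simplified sketch of the reworked logic, assuming a hypothetical `chan_mode` enum and `port_needs` struct rather than the real definitions: the loop looks at explicit per-channel modes only, and defaultMode is evaluated exactly once afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical modes; MODE_ANY stands for "no mode configured". */
enum chan_mode { MODE_ANY, MODE_SECURE, MODE_INSECURE };

struct port_needs {
    bool plain;   /* a plain port must be allocated */
    bool tls;     /* a TLS port must be allocated */
};

static struct port_needs spice_port_needs(enum chan_mode default_mode,
                                          const enum chan_mode *channels,
                                          size_t n)
{
    struct port_needs need = { false, false };

    assert(channels || n == 0);

    /* Explicit per-channel settings only; defaultMode is not consulted. */
    for (size_t i = 0; i < n; i++) {
        if (channels[i] == MODE_SECURE)
            need.tls = true;
        else if (channels[i] == MODE_INSECURE)
            need.plain = true;
    }

    /* defaultMode is tested once, outside the loop. */
    switch (default_mode) {
    case MODE_SECURE:
        need.tls = true;
        break;
    case MODE_INSECURE:
        need.plain = true;
        break;
    case MODE_ANY:
        /* simplification: this sketch adds no requirement here */
        break;
    }
    return need;
}
```

With every channel explicitly insecure and defaultMode secure, both `plain` and `tls` come out true, which is the case described above.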
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143832
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We have two different places that need to be updated when touching the
code for allocating SPICE ports. Add a bool option to the
'qemuProcessSPICEAllocatePorts' function to switch between true and fake
allocation so we can also use this function in qemu_driver to generate
the native domain definition.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Since adding the support for scheduler policy settings in commit
8680ea97, there are two enums with the same information. That was
caused by rewriting the patch since the first draft.
Found thanks to clang, but there was no impact whatsoever.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The problem here was that when opening a channel, we were checking
whether the channel given is its alias (can't be NULL for a running
domain) or its name, which can be NULL (for example with spicevmc). In
case of such a domain qemuDomainOpenChannel() made the daemon crash.
STREQ_NULLABLE() is safe to use since the code in question is wrapped in
"if (name)" and is more readable, so use that instead of checking for
non-NULL "vm->def->channels[i]->target.name".
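As a sketch of why the NULL-tolerant comparison is safe here: `streq_nullable()` below is a stand-in written for this example (it treats two NULLs as equal and a NULL and a string as different), and the `channel` struct and `find_channel()` are hypothetical. Because the requested name is known to be non-NULL, a channel without a target name simply never matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a NULL-tolerant string comparison. */
static bool streq_nullable(const char *a, const char *b)
{
    if (!a || !b)
        return a == b;   /* equal only if both are NULL */
    return strcmp(a, b) == 0;
}

/* Hypothetical channel record: the target name may be NULL. */
struct channel {
    const char *alias;
    const char *target_name;
};

/* Returns the index of the channel matching `name`, or -1. */
static int find_channel(const struct channel *chans, size_t n,
                        const char *name)
{
    assert(name);   /* mirrors the enclosing "if (name)" */

    for (size_t i = 0; i < n; i++) {
        if (streq_nullable(chans[i].alias, name) ||
            streq_nullable(chans[i].target_name, name))
            return (int)i;
    }
    return -1;
}
```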
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The virStorageBackendISCSIFindPoolSources API only needs the 'host' name
in order to discover iSCSI pools; it returns the various device paths.
On input, it's also possible to further restrict a search by providing the
port attribute for the host element and the (undocumented) initiator element.
For example:
$ virsh find-storage-pool-sources-as iscsi
error: Failed to find any iscsi pool sources
error: invalid argument: hostname and device path must be specified for iscsi sources
$ virsh find-storage-pool-sources-as iscsi 192.168.122.1
<sources>
<source>
<host name='192.168.122.1' port='3260'/>
<device path='iqn.2013-12.com.example:iscsi-chap-lclpool'/>
</source>
</sources>
https://bugzilla.redhat.com/show_bug.cgi?id=1181062
According to the formatstorage.html description for <source> element
and "format" attribute: "All drivers are required to have a default
value for this, so it is optional."
As it turns out the disk backend did not choose a default value, so I
added a default of "msdos" if the source type is "unknown" as well as
updating the storage.html backend disk volume driver documentation to
indicate the default format is dos.
https://bugzilla.redhat.com/show_bug.cgi?id=1142631
This patch resolves a situation where the same "<target dev='$name'...>"
can be used for multiple disks in the domain.
While the $name is "mostly" advisory regarding the expected order that
the disk is added to the domain and not guaranteed to map to the device
name in the guest OS, it still should be unique enough such that other
domblk* type operations can be performed.
Without the patch, the domblklist will list the same Target twice:
$ virsh domblklist $dom
Target Source
------------------------------------------------
sda /var/lib/libvirt/images/file.qcow2
sda /var/lib/libvirt/images/file.img
Additionally, getting domblkstat, domblkerror, domblkinfo, and other block*
type calls will not be able to reference the second target.
Fortunately, hotplug disallows adding a "third" sda value:
$ qemu-img create -f raw /var/lib/libvirt/images/file2.img 10M
$ virsh attach-disk $dom /var/lib/libvirt/images/file2.img sda
error: Failed to attach disk
error: operation failed: target sda already exists
$
BUT, since 'sdb' doesn't exist, one would get the following on the same
hotplug attempt when using 'sdb' instead of 'sda':
$ virsh attach-disk $dom /var/lib/libvirt/images/file2.img sdb
error: Failed to attach disk
error: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'scsi0-0-1' for device
$
Since we cannot fix this issue at parsing time, the best that can be done so
as to not "lose" a domain is to make the check prior to starting the guest
with the results as follows:
$ virsh start $dom
error: Failed to start domain $dom
error: XML error: target 'sda' duplicated for disk sources '/var/lib/libvirt/images/file.qcow2' and '/var/lib/libvirt/images/file.img'
$
Running 'make check' found a few more instances in the tests where this
duplicated target dev value was being used. These also exhibited some
duplicated 'id=' values (negating the uniqueness argument of aliases) in
the corresponding .args file and of course the *xmlout version of a few
input XML files.
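The pre-start check can be pictured roughly as follows. This is a sketch with a hypothetical `disk` struct and function name, using a simple pairwise comparison, and is not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of a disk definition. */
struct disk {
    const char *dst;   /* <target dev='...'> */
    const char *src;   /* source path */
};

/*
 * Returns 0 if all targets are unique. Otherwise returns -1 and stores
 * the indexes of the first duplicated pair in *a and *b so the caller
 * can name both sources in the error message.
 */
static int check_unique_targets(const struct disk *disks, size_t n,
                                size_t *a, size_t *b)
{
    assert(a && b);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(disks[i].dst, disks[j].dst) == 0) {
                *a = i;
                *b = j;
                return -1;
            }
        }
    }
    return 0;
}
```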
NUMA enabled guest configuration explicitly specifies memory sizes for
individual nodes. Allowing the virDomainSetMemoryFlags API (and friends)
to change the total doesn't make sense as the individual node configs
are not updated in that case.
Forbid use of the API in case NUMA is specified.
Add VIR_VOL_XML_PARSE_OPT_CAPACITY flag to virStorageVolDefParseXML.
With this flag, no error is reported when the capacity is missing
if there is a backing store.
Instead of just looking at the output of fstat, call
virStorageFileGetMetadata to get the full capacity from
image headers.
Note that the capacity is probed unconditionally. The updateCapacity
bool parameter is ignored and will be removed in the following commit.
In virStorageVolCreateXML, add VIR_VOL_XML_PARSE_NO_CAPACITY
to the call parsing the XML of the new volume to make the capacity
optional.
If the capacity is omitted, use the capacity of the old volume.
We already do that for values that are less than the original
volume capacity.
If we combine the boot order on the command line with other
boot options, we prepend order= in front of it.
Instead of checking if the number of added arguments is between
0 and 2, separate the strings for boot order and options
and prepend boot order only if both strings are not empty.
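Sketched in isolation, with a hypothetical helper name and sample option strings, the rule reads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes the boot argument into `out`. "order=" is prepended only when
 * both the boot order and the other options are non-empty.
 * Returns 0 on success, -1 if there is nothing to emit or `out` is
 * too small.
 */
static int build_boot_arg(const char *order, const char *opts,
                          char *out, size_t outlen)
{
    bool have_order = order && *order;
    bool have_opts = opts && *opts;
    int n;

    assert(out && outlen > 0);

    if (have_order && have_opts)
        n = snprintf(out, outlen, "order=%s,%s", order, opts);
    else if (have_order)
        n = snprintf(out, outlen, "%s", order);
    else if (have_opts)
        n = snprintf(out, outlen, "%s", opts);
    else
        return -1;

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```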
Commit 6992994 started filling the listen attribute
of the parent <graphics> elements from type='network' listens.
When this XML is passed to UpdateDevice, parsing fails:
XML error: graphics listen attribute 10.20.30.40 must match
address attribute of first listen element (found none)
Ignore the address in the parent <graphics> attribute
when no type='address' listens are found,
the same way we ignore the address for the <listen> subelements
when parsing inactive XML.
The gluster volume name extraction code was copied from the XML parser
without changing the VIR_ERR_XML_ERROR error code. Use
VIR_ERR_CONFIG_UNSUPPORTED instead.
Similar to commit fdb80ed4f6 libvirtd
would crash if a gluster URI without path would be used in the backing
chain of a volume. The crash happens in the gluster specific part of the
parser that extracts the gluster volume name from the path.
Fix the crash by checking whether the path is NULL.
This patch does not contain a test case as it's not possible to test it
with the current infrastructure as the test suite would attempt to
contact the gluster server in the URI. I'm working on the test suite
addition but that will be post-release material.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1196528
In virNetworkDHCPHostDefParseXML an error is reported
when partialOkay == true, and none of ip, mac, name
were supplied.
Add the missing goto and error out in this case.
https://bugzilla.redhat.com/show_bug.cgi?id=1196503
We already check whether the host id is valid or not, add a jump
to forbid invalid host id.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Commit f7afeddc added code to report to systemd an array of interface
indexes for all tap devices used by a guest. Unfortunately it not only
didn't add code to report the ifindexes for macvtap interfaces
(interface type='direct') or the tap devices used by type='ethernet',
it ended up sending "-1" as the ifindex for each macvtap or hostdev
interface. This resulted in a failure to start any domain that had a
macvtap or hostdev interface (or actually any type other than
"network" or "bridge").
This patch does the following with the nicindexes array:
1) Modify qemuBuildInterfaceCommandLine() to only fill in the
nicindexes array if given a non-NULL pointer to an array (and modifies
the test jig calls to the function to send NULL). This is because
there are tests in the test suite that have type='ethernet' and still
have an ifname specified, but that device of course doesn't actually
exist on the test system, so attempts to call virNetDevGetIndex() will
fail.
2) Even then, only add an entry to the nicindexes array for
appropriate types, and do so for all appropriate types ("network",
"bridge", and "direct"), but only if the ifname is known (since that
is required to call virNetDevGetIndex()).
Previously this function relied on having ATTRIBUTE_NONNULL(1) in its
prototype rather than explicitly checking for a null
ifname. Unfortunately, ATTRIBUTE_NONNULL is just a hint to the
optimizer and code analyzers like Coverity, it doesn't actually check
anything at execution time, so the result was possible warnings from
Coverity, along with the possibility of null dereferences when ifname
wasn't available.
This patch removes the ATTRIBUTE_NONNULL from the prototype, and
checks ifname inside the function, logging an error if it's NULL (once
we've determined that the user really is trying to set a bandwidth).
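A minimal sketch of the ordering described here, with a hypothetical `bandwidth` struct and function name: the NULL ifname is only an error once we know a bandwidth was actually requested, and the check happens at run time rather than relying on an annotation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced bandwidth description. */
struct bandwidth {
    unsigned long long average_kbps;
};

/* Returns 0 on success, -1 on error. */
static int bandwidth_set(const char *ifname, const struct bandwidth *bw)
{
    if (!bw)
        return 0;   /* nothing requested, so a NULL ifname is harmless */

    if (!ifname) {
        /* explicit run-time check instead of a non-null hint */
        fprintf(stderr, "cannot set bandwidth: interface name unknown\n");
        return -1;
    }

    assert(ifname[0] != '\0');
    /* real code would configure traffic shaping on `ifname` here */
    return 0;
}
```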
libvirt was unconditionally calling virNetDevBandwidthClear() for
every interface (and network bridge) of a type that supported
bandwidth, whether it actually had anything set or not. This doesn't
hurt anything (unless ifname == NULL!), but is wasteful.
This patch makes sure that all calls to virNetDevBandwidthClear() are
qualified by checking that the interface really had some bandwidth
setup done, and checks for a null ifname inside
virNetDevBandwidthClear(), silently returning success if it is null
(as well as removing the ATTRIBUTE_NONNULL from that function's
prototype, since we can't guarantee that it is never null,
e.g. sometimes a type='ethernet' interface has no ifname as it is
provided on the fly by qemu).
If the qemu binary on x86 does not support lsi SCSI controller,
but it supports virtio-scsi, we reject the virtio-specific attributes
for no reason.
Move the default controller assignment before the check.
https://bugzilla.redhat.com/show_bug.cgi?id=1168849
https://bugzilla.redhat.com/show_bug.cgi?id=1183869
So, you've successfully started yourself a domain. And since you want
to use it on your host exclusively you are confident enough to
passthrough the host CPU model, like this:
<cpu mode='host-passthrough'/>
Then, after a while, you want to save the domain into a file (e.g.
virsh save dom dom.save). And here comes the trouble. The file consists
of two parts: Libvirt header (containing domain XML among other
things), and qemu migration data. Now, the domain XML in the header is
formatted using special flags (VIR_DOMAIN_XML_SECURE |
VIR_DOMAIN_XML_UPDATE_CPU | VIR_DOMAIN_XML_INACTIVE |
VIR_DOMAIN_XML_MIGRATABLE).
Then, on your way back from the bar, you think of changing something
in the XML in the saved file (we have a command for it after all), say
listen address for graphics console. So you successfully type in the
command:
virsh save-image-edit dom.save
Change all the bits, and exit the editor. But instead of success
you're left with a sad error message:
error: unsupported configuration: Target CPU model <null> does not
match source Pentium Pro
Sigh. Digging into the code you see lines, where we check for ABI
stability. The new XML you've produced is compared with the old one
from the saved file to see if qemu ABI will break or not. Wait, what?
We are using different flags to parse the XML you've provided so we
were just lucky it worked in some cases? Yep, that's right.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Well, it's not that we are formatting invalid XML, just XML that is
not as beautiful as it could be:
<cpu mode='host-passthrough'>
</cpu>
If there are no children, let's use the singleton element.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Well, so far there are no variables to free, no cleanup work needed on
an error, so bare 'return -1;' after each error is just okay. But this
will change in a while.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This API joins the following two lines:
char *s = virBufferContentAndReset(buf1);
virBufferAdd(buf2, s, -1);
into one:
virBufferAddBuffer(buf2, buf1);
With one exception: there's no re-indentation applied to @buf1.
The idea is, that in general both can have different indentation
(like the test I'm adding proves)
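To make the semantics concrete, here is a toy model of the behaviour: the source content is appended verbatim, with no re-indentation, and the source buffer ends up empty. The `strbuf` type and its functions are invented for this sketch and are not the virBuffer implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable string buffer (hypothetical). */
struct strbuf {
    char *data;
    size_t len;
};

/* Appends `s` to `b`; returns 0 on success, -1 on allocation failure. */
static int strbuf_add(struct strbuf *b, const char *s)
{
    size_t slen = strlen(s);
    char *tmp = realloc(b->data, b->len + slen + 1);

    if (!tmp)
        return -1;
    memcpy(tmp + b->len, s, slen + 1);   /* includes the trailing NUL */
    b->data = tmp;
    b->len += slen;
    return 0;
}

/* Appends the content of `src` to `dst` as-is and resets `src`. */
static int strbuf_add_buffer(struct strbuf *dst, struct strbuf *src)
{
    assert(dst != src);

    if (src->data && strbuf_add(dst, src->data) < 0)
        return -1;

    free(src->data);
    src->data = NULL;
    src->len = 0;
    return 0;
}
```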
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In commit cc41c648 I refactored qemuMonitorFindBalloonObjectPath, but
missed that there is a memory leak. The "nextpath" variable is
overwritten on every pass of the for loop, so we have to free it before
the next iteration.
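The shape of the fix, sketched with made-up helper names and a flat list of children instead of the real monitor walk: the string from the previous iteration is released before the variable is reused, and once more on the not-found exit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd "<parent>/<child>", or NULL on allocation failure. */
static char *join_path(const char *parent, const char *child)
{
    size_t len = strlen(parent) + 1 + strlen(child) + 1;
    char *path = malloc(len);

    if (path)
        snprintf(path, len, "%s/%s", parent, child);
    return path;
}

/* Returns the malloc'd path of the wanted child, or NULL if not found. */
static char *find_child_path(const char *root,
                             const char *const *children, size_t n,
                             const char *wanted)
{
    char *nextpath = NULL;

    assert(root && wanted);

    for (size_t i = 0; i < n; i++) {
        free(nextpath);   /* drop the previous iteration's string */
        nextpath = join_path(root, children[i]);
        if (!nextpath)
            return NULL;
        if (strcmp(children[i], wanted) == 0)
            return nextpath;   /* ownership passes to the caller */
    }

    free(nextpath);
    return NULL;
}
```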
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1151942
While the restriction doesn't have origin in any RFC, it matters
to us while constructing the dnsmasq config file (or command line
previously). For a better picture, this is what the corresponding
part of the network XML looks like:
<dns>
<forwarder addr='8.8.4.4'/>
<txt name='example' value='example value'/>
</dns>
And this is what the config file looks like then:
server=8.8.4.4
txt-record=example,example value
Now we can see why there can't be any commas in the TXT name.
They are used by dnsmasq to separate @name and @value.
Funny, we have it in the documentation, but the code (which was
pushed back in 2011) didn't reflect that.
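A sketch of the validation, with hypothetical function names, followed by the formatting step it protects:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 if `name` is usable as a TXT record name, -1 otherwise. */
static int validate_txt_name(const char *name)
{
    if (!name || !*name)
        return -1;
    if (strchr(name, ','))
        return -1;   /* a comma would be read as the name/value separator */
    return 0;
}

/* Formats one config line; returns 0 on success, -1 on error. */
static int format_txt_record(const char *name, const char *value,
                             char *out, size_t outlen)
{
    int n;

    assert(out && value);

    if (validate_txt_name(name) < 0)
        return -1;

    n = snprintf(out, outlen, "txt-record=%s,%s", name, value);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```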
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Making use of the ARCH_IS_S390 macro introduced with
e808357528
Signed-off-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Since s390 does not support USB, a USB controller should not be
created by default for a domain.
Also adjust the s390 test cases by removing USB device instances,
since USB devices are no longer created by default for s390.
Signed-off-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
This implements handling of the <backenddomain name=''/> parameter
introduced in the previous patch.
Works on Xen >= 4.3, because only there libxl supports setting backend
domain by name. Specifying backend domain by ID or UUID is currently not
supported.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
At least Xen supports backend drivers in another domain (aka "driver
domain"). This patch introduces an XML config option for specifying the
backend domain name for <disk> and <interface> devices. E.g.
<disk>
<backenddomain name='diskvm'/>
...
</disk>
<interface type='bridge'>
<backenddomain name='netvm'/>
...
</interface>
In the future, the same option will be needed for USB devices (hostdev
objects), but for now libxl doesn't have support for PVUSB.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
The function that parses the <forward> subelement of a network used to
fail/log an error if the network definition contained both a <pf>
element as well as at least one <interface> or <address> element. That
check was present because the configuration of a network should have
either one <pf>, one or more <interface>, or one or more <address>,
but never combinations of multiple kinds.
This caused a problem when libvirtd was restarted with a network
already active - when a network with a <pf> element is started, the
referenced PF (Physical Function of an SRIOV-capable network card) is
checked for VFs (Virtual Functions), and the <forward> is filled in
with a list of all VFs for that PF either in the form of their PCI
addresses (a list of <address>) or their netdev names (a list of
<interface>); the <pf> element is not removed though. When libvirtd is
restarted, it parses the network status and finds both the original
<pf> from the config, as well as the list of either <address> or
<interface>, fails the parse, and the network is not added to the
active list. This failure is often obscured because the network is
marked as autostart so libvirt immediately restarts it.
It seems odd to me that <interface> and <address> are stored in the
same array rather than keeping two separate arrays, and having
separate arrays would have made the check much simpler. However,
changing to use two separate arrays would have required changes in
more places, potentially creating more conflicts and (more
importantly) more possible regressions in the event of a backport, so
I chose to keep the existing data structure in order to localize the
change.
It appears that this problem has been in the code ever since support
for <pf> was added (0.9.10), but until commit
34cc3b2f10 (first in libvirt 1.2.4)
networks with interface pools were not properly marked as active on
restart anyway, so there is no point in backporting this patch any
further than that.
Later patches will need to access the full definition to check the
memory size, and thus the checking needs to be done after the whole
definition including devices is known.
For historical reasons data regarding NUMA configuration were split
between the CPU definition and numatune. We cannot do anything about the
XML still being split, but we certainly can at least store the relevant
data in one place.
This patch moves the NUMA stuff to the right place.
As virDomainNumatuneSet now doesn't allocate the virDomainNuma object
any longer it's not necessary to pass the pointer to a pointer to store
the object as it will not change any longer.
While touching the parameter definitions I've also changed the name of
the parameter to "numa".
Since our formatter now handles well if the config is allocated and not
filled we can safely always-allocate the NUMA config and remove the
ad-hoc allocation code.
This will help in later patches as the parser will be refactored to just
fill the data.
Move the existing virDomainDefNew to virDomainDefNewFull as it's setting
a few things in the conf and re-introduce virDomainDefNew as a function
without parameters for common use.
Do a content-aware check if formatting of the <numatune> element is
necessary. Later on the def->numa structure will be always present so we
cannot decide only on the basis whether it's allocated.
Shuffling around the logic allows us to simplify the code quite a bit.
As an additional bonus the change in the logic now reports an error if
automatic placement is selected and individual placement is configured.
Currently the code would exit without reporting an error as
virBitmapParse reports one only if it fails to parse the bitmap, whereas
the code was jumping to the error label even in case 0 cpus were
correctly parsed in the map.
It's easier to recalculate the number in the one place it's used than
to have a separate variable to track it. It will also help with moving
the NUMA code to the separate module.
Name it virNumaMemAccess and add it to conf/numa_conf.[ch]
Note that to avoid a circular dependency the type of the NUMA cell
memAccess variable was changed to int. It will be turned back later,
once the circular dependency no longer exists.
The mask was stored both as a bitmap and as a string. The string is used
for XML output only. Remove the string, as it can be reconstructed from
the bitmap.
The test change is necessary as the bitmap formatter doesn't "optimize"
using the '^' operator.
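As an illustration of what "reconstructed from the bitmap" means for the test output, here is a sketch of a range formatter over a single unsigned long. It is a stand-in written for this example, not the real bitmap formatter. A mask that was written as "0-7,^3" comes back as "0-2,4-7", because only plain ranges are produced.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats the set bits of `mask` as comma-separated ranges, e.g. "0-3,5".
 * Returns 0 on success, -1 if `out` is too small.
 */
static int format_bitmap(unsigned long mask, char *out, size_t outlen)
{
    const unsigned nbits = sizeof(mask) * CHAR_BIT;
    size_t used = 0;
    unsigned i = 0;

    assert(out);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (i < nbits) {
        unsigned start;
        int n;

        if (!((mask >> i) & 1UL)) {
            i++;
            continue;
        }

        /* extend the run while the next bit is also set */
        start = i;
        while (i + 1 < nbits && ((mask >> (i + 1)) & 1UL))
            i++;

        if (start == i)
            n = snprintf(out + used, outlen - used, "%s%u",
                         used ? "," : "", start);
        else
            n = snprintf(out + used, outlen - used, "%s%u-%u",
                         used ? "," : "", start, i);

        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        i++;
    }
    return 0;
}
```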
Rewrite the function to save a few local variables and reorder the code
to make more sense.
Additionally the ncells_max member of the virCPUDef structure is used
only for tracking allocation when parsing the numa definition, which can
be avoided by switching to VIR_ALLOC_N as the array is not resized
after initial allocation.
For weird historical reasons NUMA cells are added as a subelement of
<cpu> while the actual configuration is done in <numatune>.
This patch splits out the cell parser code from cpu config to NUMA
config. Note that the changes to the code are minimal just to make it
work and the function will be refactored in the next patch.
For a while now there are two places that gather information about NUMA
related guest configuration. While the XML can't be changed we can at
least store the data in one place in the definition.
Rename the numatune_conf.[ch] files to numa_conf as later patches will
move the rest of the definitions from the cpu definition to this one.
Not all machine types support all devices, device properties, backends,
etc. So until we create a matrix of [machineType, qemuCaps], let's just
filter out some capabilities before we return them to the consumer
(which is going to make decisions based on them straight away).
Currently, as qemu is unable to tell which capabilities are (not)
enabled for given machine types, it's us who has to hardcode the matrix.
One day maybe the hardcoding will go away and we can create the matrix
dynamically on the fly based on a few monitor calls.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
1. Delete all boot devices for VM instance
2. Find the first HDD from XML and set it as bootable
Now we support only one boot device and it should be HDD.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Not all files we want to find using virFileFindResource{,Full} are
generated when libvirt is built, some of them (such as RNG schemas) are
distributed with sources. The current API was not able to find source
files if libvirt was built in VPATH.
Both RNG schemas and cpu_map.xml are distributed in source tarball.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1179678
When migrating with storage, libvirt iterates over domain disks and
instructs qemu to migrate the ones we are interested in (shared, RO and
source-less disks are skipped). The disks are migrated in series. No
new disk is transferred until the previous one has been quiesced.
This is checked on the qemu monitor via the 'query-jobs' command. If the
disk has been quiesced, it has practically gone from copying its content
to the mirroring state, where all disk writes are mirrored to the other
side of the migration too. Having said that, there's one inherent error
in the design. The monitor command we use reports only active jobs. So
if the job fails for whatever reason, we will not see it anymore in the
command output. And this can happen fairly easily: just try to migrate
a domain with storage. If the storage migration fails (e.g. due to
ENOSPC on the destination) we resume the domain on the destination and
let it run on a partly copied disk.
The proper fix is what even the comment in the code says: listen for
qemu events instead of polling. If storage migration changes state an
event is emitted and we can act accordingly: either consider disk
copied and continue the process, or consider disk mangled and abort
the migration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has
completed (in qemuMonitorJSONHandleBlockJobImpl()). To illustrate,
the event looks something like this:
{"timestamp": {"seconds": 1423582694, "microseconds": 372666}, "event":
"BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len":
8412790784, "offset": 409993216, "speed": 8796093022207, "type":
"mirror", "error": "No space left on device"}}
If "len" does not equal "offset" it's considered an error, and we can
clearly see the "error" field filled in. However, later in the event
processing this case was handled no differently from the case of a job
being aborted via a separate API. It's time we start differentiating
these two because of the future work.
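The distinction can be sketched like this. The enum, struct and function names are hypothetical and the real handler works on the parsed event, but the decision itself follows the description above: a failed job is reported separately from one the user aborted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a finished block job. */
enum job_status { JOB_COMPLETED, JOB_FAILED, JOB_CANCELED };

/* Hypothetical, reduced view of the event payload. */
struct job_event {
    bool canceled;             /* job was aborted on request */
    unsigned long long len;
    unsigned long long offset;
    const char *error;         /* NULL when the event carries no error */
};

static enum job_status classify_job(const struct job_event *ev)
{
    assert(ev);

    if (ev->canceled)
        return JOB_CANCELED;

    /* an error string, or stopping short of "len", means failure */
    if (ev->error || ev->offset != ev->len)
        return JOB_FAILED;

    return JOB_COMPLETED;
}
```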
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, upon BLOCK_JOB_* event, disk->mirrorState is not updated
each time. The callback code handling the events checks if a blockjob
was started via our public APIs prior to setting the mirrorState.
However, some block jobs may be started internally (e.g. during
storage migration), in which case we don't bother with setting
disk->mirror (there's nothing we can set it to anyway), or other
fields. But it will come in handy if we update the mirrorState in these
cases too. The event wasn't delivered just for fun - we've started the
job after all.
So, in this commit, the mirrorState is set to whatever job status
we've obtained. Of course, there are some actions on some statuses
that we want to perform. But instead of if {} else if {} else {} ...
enumeration, let's move to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If 'virNumaGetHostNodeset()' fails, the error path will try to free
the uninitialized pointer mem_mask. Introduced by commit af2a1f058.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
PowerPC : Forbid NULL CPU model with 'host-model' mode in qemu command line.
This ensures that an XML such as following:
...
<cpu mode='host-model'>
<model fallback='allow'/>
</cpu>
...
will not generate a '-cpu host,compat=(null)' command line with qemu-system-ppc64.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
PowerPC : Explicitly associate 'qemu-system-ppc64' as the
default emulator for all 64-bit PowerPC guests ( both Big & Little Endian )
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1126762
Commit 43b67f introduced a deadlock issue when we use numatune
to change numa settings to a vm in session mode.
Jump to endjob instead of jump to cleanup.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
So, when building the '-numa' command line, the
qemuBuildMemoryBackendStr() function does quite a lot of checks to
choose the best backend, or to check if one is in fact needed. However,
it returned that a backend is needed even for this little fella:
<numatune>
<memory mode="strict" nodeset="0,2"/>
</numatune>
This can be guaranteed via CGroups entirely, there's no need to use
memory-backend-ram to let qemu know where to get memory from. Well, as
long as there's no <memnode/> element, which explicitly requires the
backend. Long story short, we wouldn't have to care, as qemu works
either way. However, the problem is migration (as always). Previously,
libvirt would have started qemu with:
-numa node,memory=X
in this case and restricted memory placement in CGroups. Today, libvirt
creates more complicated command line:
-object memory-backend-ram,id=ram-node0,size=X
-numa node,memdev=ram-node0
Again, one wouldn't find anything wrong with these two approaches.
Both work just fine. Unless you try to migrate from the older libvirt
to the newer one. These two approaches are, unfortunately, not
compatible. My suggestion is, in order to allow users to migrate, let's
use the older approach for as long as the newer one is not needed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Periodically my Coverity scan will return a checked_return failure
for libxlDomainShutdownThread call to libxlDomainStart. Followed the
libxlAutostartDomain example in order to check the status, emit a
message, and continue on.
Jumping to the cleanup label prior to starting the container failed to
properly clean up everything that is handled by virLXCProcessCleanup,
which is called if virLXCProcessStop is called on failure after the
container properly starts. Most importantly, prior to this patch none
of the stop/release hooks, host device reattachment, or network cleanup
(that is, the reverse of virLXCProcessSetupInterfaces) were run.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Modify the VIR_DEBUG message in virLXCProcessCleanup to make it clearer
about the path. Also add some more VIR_DEBUG messages in virLXCProcessStart
in order to help debug error flow.
https://bugzilla.redhat.com/show_bug.cgi?id=1176503
Move the two console checks (one for zero nconsoles present and the
other for an invalid console type) to earlier in the processing rather
than after performing some setup that has to be undone for what amounts
to an invalid configuration.
This resolves the above bug since it's not possible to have changed
the security labels when we cause the configuration check failure.
if (mgr == NULL || mgr->drv == NULL)
return ret;
This check isn't really necessary, security manager cannot be a NULL
pointer as it is either selinux (by default) or 'none', if no other driver is
set in the config. Even with no config file driver name yields 'none'.
The other hunk checks for domain's security model validity, but we should
also check devices' security model as well, therefore this hunk is moved into
a separate function which is called by virSecurityManagerCheckAllLabel that
checks both the domain's security model and devices' security model.
https://bugzilla.redhat.com/show_bug.cgi?id=1165485
Signed-off-by: Ján Tomko <jtomko@redhat.com>
We do have a check for valid per-domain security model, however we still
do permit an invalid security model for a domain's device (those which
are specified with <source> element).
This patch introduces a new function virSecurityManagerCheckAllLabel
which compares user specified security model against currently
registered security drivers. That being said, it also permits 'none'
being specified as a device security model.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165485
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Add an XML attribute to allow disabling merge of rx buffers
on the host:
<interface ...>
...
<model type='virtio'/>
<driver ...>
<host mrg_rxbuf='off'/>
</driver>
</interface>
https://bugzilla.redhat.com/show_bug.cgi?id=1186886
The enum converters are defined in the domain_conf.h (so
accessible widely across the code), but on the symbol layer, only
virDomainNetTypeToString was exposed. However, FromString variant
is going to be needed shortly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit b6a2828e introduced new functions to set process scheduler. There
is a small typo in ELSE path for systems where scheduler is not
available.
Also some of the definitions were introduced later in the kernel. For
example RHEL-5 is running on kernel 2.6.18, but SCHED_IDLE was introduced
in 2.6.23 [1] and SCHED_BATCH in 2.6.16 [1]. We should not count only on
existence of function sched_setscheduler(), we must also check for
existence of used macros as they might not be defined.
[1] see 'man 7 sched'
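A minimal sketch of guarding each policy macro separately is shown
here. The enum and function name are hypothetical, not the ones from
the commit, and on glibc SCHED_BATCH and SCHED_IDLE are typically only
visible when _GNU_SOURCE is defined before the first include.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* assumption: needed on glibc for SCHED_BATCH/IDLE */
#endif
#include <sched.h>

/* Hypothetical policy enum, independent of what the host headers offer. */
enum proc_sched_policy {
    PROC_SCHED_OTHER,
    PROC_SCHED_BATCH,
    PROC_SCHED_IDLE,
    PROC_SCHED_FIFO,
    PROC_SCHED_RR
};

/* Translate to the native constant, or return -1 when the headers we
 * were compiled against do not define that policy. */
int proc_sched_translate(enum proc_sched_policy policy)
{
    switch (policy) {
    case PROC_SCHED_OTHER:
        return SCHED_OTHER;
    case PROC_SCHED_BATCH:
#ifdef SCHED_BATCH
        return SCHED_BATCH;
#else
        return -1;
#endif
    case PROC_SCHED_IDLE:
#ifdef SCHED_IDLE
        return SCHED_IDLE;
#else
        return -1;
#endif
    case PROC_SCHED_FIFO:
        return SCHED_FIFO;
    case PROC_SCHED_RR:
        return SCHED_RR;
    }
    return -1;
}
```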
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
While the main storage driver code allows the flag
VIR_STORAGE_VOL_RESIZE_SHRINK to be set, none of the backend
drivers supports it. At the very least this can work
for plain file based volumes since we just ftruncate() them
to the new size. It does not work with qcow2 volumes, but we
can arguably delegate to qemu-img for error reporting for that
instead of second guessing this for ourselves:
$ virsh vol-resize --shrink /home/berrange/VirtualMachines/demo.qcow2 2G
error: Failed to change size of volume 'demo.qcow2' to 2G
error: internal error: Child process (/usr/bin/qemu-img resize /home/berrange/VirtualMachines/demo.qcow2 2147483648) unexpected exit status 1: qemu-img: qcow2 doesn't support shrinking images yet
qemu-img: This image does not support resize
See also https://bugzilla.redhat.com/show_bug.cgi?id=1021802
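For the plain file case mentioned above, the idea can be sketched as
below. This is only an illustration: RESIZE_SHRINK is a hypothetical
stand-in for VIR_STORAGE_VOL_RESIZE_SHRINK, the function name is made
up, and refusing a shrink with EINVAL when the flag is absent is an
assumption of this example.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for VIR_STORAGE_VOL_RESIZE_SHRINK. */
#define RESIZE_SHRINK (1u << 0)

/* Resize a plain file to new_size bytes with ftruncate(). Shrinking is
 * refused with EINVAL unless RESIZE_SHRINK is set.
 * Returns 0 on success, -1 with errno set on failure. */
int resize_plain_file(const char *path, off_t new_size, unsigned int flags)
{
    struct stat sb;
    int saved_errno;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    if (fstat(fd, &sb) < 0)
        goto error;

    if (new_size < sb.st_size && !(flags & RESIZE_SHRINK)) {
        errno = EINVAL;
        goto error;
    }

    if (ftruncate(fd, new_size) < 0)
        goto error;

    return close(fd);

 error:
    saved_errno = errno;   /* close() must not clobber the real error */
    close(fd);
    errno = saved_errno;
    return -1;
}
```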
The qemuDomainHelperGetVcpus attempted to report an error when the
vcpupids info was NULL. Unfortunately earlier code would clamp the
value of 'maxinfo' to 0 when nvcpupids was 0, so the error reporting
would end up being skipped.
This led to 'virsh vcpuinfo <dom>' just returning an empty list
instead of giving the user a clear error.
In a previous commit I fixed the incorrect handling of vcpu pids
for TCG mode QEMU:
commit b07f3d821d
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu Dec 18 16:34:39 2014 +0000
Don't setup fake CPU pids for old QEMU
The code assumes that def->vcpus == nvcpupids, so when we set up
fake CPU pids for old QEMU with nvcpupids == 1, we cause the
later code to read off the end of the array. This has fun results
like sched_setaffinity(0, ...) which changes libvirtd's own CPU
affinity, or even better sched_setaffinity($RANDOM, ...) which
changes the affinity of a random OS process.
The intent was that this would merely disable the ability to set
per-vCPU affinity. It should still have been possible to set VM
level host CPU affinity.
Unfortunately, when you set <vcpu cpuset='0-1'>4</vcpu>, the XML
parser will internally take this & initialize an entry in the
def->cputune.vcpupin array for every VCPU. IOW this is implicitly
being treated as
<cputune>
<vcpupin cpuset='0-1' vcpu='0'/>
<vcpupin cpuset='0-1' vcpu='1'/>
<vcpupin cpuset='0-1' vcpu='2'/>
<vcpupin cpuset='0-1' vcpu='3'/>
</cputune>
Even more fun, the faked cputune elements are hidden from view when
querying the live XML, because their cpuset mask is the same as the
VM default cpumask.
The upshot was that it was impossible to set VM level CPU affinity.
To fix this we must update qemuProcessSetVcpuAffinities so that it
only reports a fatal error if the per-VCPU cpu mask is different
from the VM level cpu mask.
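The relaxed check can be illustrated with a small sketch. It is not
the real qemuProcessSetVcpuAffinities code: the struct, the function
name and the use of a 64-bit integer as the CPU mask are simplifying
assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification: a CPU mask is a 64-bit bitmap. */
struct vcpu_pin {
    unsigned int vcpu;
    uint64_t cpumask;
};

/* Called when per-vCPU thread IDs are not available (e.g. TCG).
 * Pins equal to the VM level mask are the implicit ones generated from
 * <vcpu cpuset='...'> and can be ignored; only a pin that differs from
 * the VM level mask is a request we cannot honour.
 * Returns 0 if the configuration is acceptable, -1 otherwise. */
int check_vcpu_pins_without_pids(const struct vcpu_pin *pins, size_t npins,
                                 uint64_t vm_cpumask)
{
    for (size_t i = 0; i < npins; i++) {
        if (pins[i].cpumask != vm_cpumask)
            return -1;
    }
    return 0;
}
```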
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When initializing a libxl_domain_build_info struct with
libxl_domain_build_info_init(), VNC is enabled by default. As a
result, VMs configured with no graphics still have VNC enabled.
This behavior is a regression with respect to the legacy Xen driver.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Do not silently ignore its value. LibXL supports only one address, so
refuse multiple IPs.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
In order for QEMU vCPU (and other) threads to run with RT scheduler,
libvirt needs to take care of that so QEMU doesn't have to run privileged.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178986
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This function uses sched_setscheduler() function so it works with
processes and threads as well (even threads not created by us, which is
what we'll need in the future).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Prior to commit 7d5bf48474 (first appearing in libvirt 1.2.2), the
status XML of a domain's interface was missing a lot of important
information; mainly it just output the config of the interface, plus
the name of the tap device and qemu device alias. Commit 7d5bf48474
changed the status XML to include many important bits of information
that were required to make network "hook" scripts useful - bandwidth
information, vlan tag, the name of the bridge (or physical device in
the case of macvtap) that the tap/macvtap device was attached to - the
commit log for 7d5bf48474 has a very detailed explanation of the
change. For quick reference - in the example given there, prior to the
change, status XML looked like figure [C]:
<interface type='network'>
<source network='testnet' portgroup='admin'/>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
and after the change, it looked like figure [E]:
<interface type='direct'>
<source dev='p4p1_0' mode='bridge'/>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
You'll notice that bandwidth info, physdev, and macvtap mode have been
added, but the network and portgroup names are now missing - I didn't
think that this information was of any use once the needed
bandwidth/vlan/etc config had been pulled from the network/portgroup.
I was wrong.
A few months after that change a user on IRC asked what happened to
portgroup in the status XML and described how he used it (more or less
as a tag to decide what external information to use in a hook script
that was run at startup/migration time - see
http://wiki.libvirt.org/page/OVS_and_PVLANS ). At that time I planned
to make a patch to re-add portgroup, but life intervened as that was
just prior to a transatlantic move involving several weeks of
"vacation". During this time I somehow forgot to make the patch, and
also mistakenly remembered that I *had* made it.
Subsequent to this, as a part of mprivozn's work to add support for
network-specific hooks, I did re-add the output of the network name in
status XML, but once again completely forgot about portgroup. This was
in commit a3609121 (first appearing in libvirt 1.2.11). This made the
status XML from the above example look like this:
<interface type='direct'>
<source network='testnet' dev='p4p1_0' mode='bridge'/>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
*This* patch just adds the portgroup back to the status XML, so the
same example interface will look like this:
<interface type='direct'>
<source network='testnet' portgroup='admin'
dev='p4p1_0' mode='bridge'/>
<bandwidth>
<inbound average='1000' peak='5000' burst='1024'/>
<outbound average='128' peak='256' burst='256'/>
</bandwidth>
<target dev='macvtap0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
The result is that the status XML now contains all information about
how the interface is set up (bandwidth, physical device, tap device,
etc), in addition to pointers to its origin (the network and
portgroup).
virDomainGraphicsListenSetAddress() and
virDomainGraphicsListenSetNetwork() both set their respective char* to
NULL directly when asked to set it to NULL, which is okay as long as
it's already set to NULL. If these functions are ever called to clear
a listen object that has a valid string in address or network, it will
end up leaking the old value. Currently that doesn't happen, so this
is just a preemptive strike.
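A setter that is safe in both cases could look roughly like this. The
struct and function name are hypothetical simplifications of the
listen object, not the actual libvirt definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified listen object. */
struct listen_def {
    char *address;
};

/* Set or clear the address. The old string is always freed, so clearing
 * a listen object that already holds a value does not leak it.
 * Returns 0 on success, -1 on allocation failure (old value kept). */
int listen_set_address(struct listen_def *def, const char *address)
{
    char *copy = NULL;

    if (address) {
        size_t len = strlen(address) + 1;

        copy = malloc(len);
        if (!copy)
            return -1;
        memcpy(copy, address, len);
    }

    free(def->address);   /* free(NULL) is a no-op */
    def->address = copy;
    return 0;
}
```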
Prior to 0.9.4, libvirt only supported a single listen, and it had to
be an IP address:
<graphics listen='1.2.3.4' ..../>
Starting with 0.9.4, a graphics element could have a <listen>
subelement (actually the grammar supports multiples, but all of the
drivers only support a single <listen> per <graphics>), and that
listen element can be of type='address' or type='network'. For
type='address', <listen> also has an attribute called 'address' which
contains the IP address for listening:
<graphics ....>
<listen type='address' address='1.2.3.4' .../>
</graphics>
type can also be "network", and in that case listen will have a
"network" attribute which will contain the name of a libvirt
network:
<graphics ....>
<listen type='network' network='testnet' .../>
</graphics>
At domain start (or migrate) time, libvirt will attempt to
find an IP address associated with that network (e.g. the IP address
of the bridge device used by the network, or the physical device
listed in <forward dev='physdev'/>) and fill in that address in the
status XML:
<graphics ....>
<listen type='network' network='testnet' address='1.2.3.4' .../>
</graphics>
In the case that a <graphics> element has a <listen> subelement of
type='address', that listen subelement's "address" attribute is
backfilled into the parent graphics element's "listen" *attribute* for
backward compatibility (so that a management application unaware of
the separate <listen> element can still learn the listen
address). This backfill should be done with the IP learned from
type='network' as well, and that's what this patch does:
<graphics listen='1.2.3.4' ....>
<listen type='network' network='testnet' address='1.2.3.4' .../>
</graphics>
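As a rough sketch of the backfill rule, assuming a simplified data
model (the enum, struct and function name are made up for this
example): the legacy attribute takes the address of the first listen
element, whether it was configured directly or resolved from a
network.

```c
#include <stddef.h>

enum listen_type { LISTEN_ADDRESS, LISTEN_NETWORK };

/* Hypothetical, simplified <listen> element. */
struct listen_elem {
    enum listen_type type;
    const char *network;   /* set for type='network' */
    const char *address;   /* configured, or resolved at start time */
};

/* Value for the legacy 'listen' attribute of <graphics>, or NULL when
 * no address is known (e.g. a network that is not resolved yet). */
const char *graphics_legacy_listen(const struct listen_elem *listens,
                                   size_t nlistens)
{
    if (nlistens == 0)
        return NULL;

    switch (listens[0].type) {
    case LISTEN_ADDRESS:
    case LISTEN_NETWORK:   /* also backfilled, not only type='address' */
        return listens[0].address;
    }
    return NULL;
}
```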
This is a continuation of the fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1191016
In the event we're falling into the code that tries to create the file
in a forked environment (VIR_FILE_OPEN_FORK) we pass different mode bits,
but those are never set because the virFileOpenForceOwnerMode has a check
if the OPEN_FORCE_MODE bit is set before attempting to change the mode.
Since this is a special case it seems reasonable to set u+rw,g+rw,o
Rather than have a dummy waitpid loop and return of the failure status
from recvfd, adjust the logic to save the recvfd error & fd and then
in priority order:
- if waitpid failed, use that errno value
- waitpid succeeded, but if the child exited abnormally, report failure
(use EACCES to report as return failure, since either EACCES or EPERM is
what caused us to fall into the fork+setuid path)
- waitpid succeeded, but if the child reported non-zero status, report
failure (use the errno value that the child encoded into exit status)
- waitpid succeeded, but if recvfd failed, report recvfd_errno
- waitpid and recvfd succeeded, use the fd
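The priority order listed above can be written down as one small
decision function. This is a sketch only: the function name and
parameter layout are assumptions, and the real code works on its own
local variables rather than on arguments like these.

```c
#include <errno.h>
#include <sys/wait.h>

/* Decide what to report after the child has been reaped.
 *   wait_ret/wait_errno : return value and errno of waitpid()
 *   status              : raw status filled in by waitpid()
 *   recv_fd/recvfd_errno: saved result and errno of the fd receive
 * Returns the fd on success or a negative errno value on failure. */
int fork_open_result(int wait_ret, int wait_errno, int status,
                     int recv_fd, int recvfd_errno)
{
    if (wait_ret < 0)
        return -wait_errno;            /* waitpid itself failed */

    if (!WIFEXITED(status))
        return -EACCES;                /* child exited abnormally */

    if (WEXITSTATUS(status) != 0)
        return -WEXITSTATUS(status);   /* errno encoded by the child */

    if (recv_fd < 0)
        return -recvfd_errno;          /* child fine, fd passing failed */

    return recv_fd;
}
```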
NOTE: Original logic to retry the open and force owner mode was
"documented" as only being attempted if we had already tried opening
with the fork+setuid, but checked flags vs. VIR_FILE_OPEN_NOFORK which
is counter to how we would get to that point. So that code was removed.
https://bugzilla.redhat.com/show_bug.cgi?id=1191355
When we attempt to migrate a vm with a migrateuri that has no scheme:
# virsh migrate test4 --live qemu+ssh://lhuang/system --migrateuri 127.0.0.1
target libvirtd will crash because uri->scheme is NULL in
qemuMigrationPrepareDirect on this line:
if (STRNEQ(uri->scheme, "tcp") &&
Add a value check before this line. Also fix a bug like this in
doNativeMigrate, that could only happen when destination libvirtd
returned an incorrect URI.
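The kind of guard added here can be sketched as follows. The function
name is hypothetical, and only the "tcp" comparison visible in the
line quoted above is reproduced; the rest of the original condition is
left out.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 if the scheme is acceptable, -1 otherwise. A URI without
 * any scheme is rejected before it can reach strcmp(). */
int check_migrate_scheme(const char *scheme)
{
    if (scheme == NULL)
        return -1;

    if (strcmp(scheme, "tcp") != 0)
        return -1;

    return 0;
}
```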
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The function virDomainVcpuPinDel() used vcpupin_list as a local alias for
def->cputune.vcpupin, which made the code more readable.
However, the function later reallocs vcpupin_list. realloc() may free
the old block and return a new address; vcpupin_list is then updated,
but def->cputune.vcpupin still points to the old, freed address.
Thus,
1) When def->cputune.vcpupin, which was freed by realloc(), is referenced
afterwards, an INVALID READ occurs, and libvirtd may crash.
2) As no one uses vcpupin_list any more, and no one frees the block
realloc() returned, a memory leak occurs.
Part of the valgrind log is shown below:
==1837== Thread 15:
==1837== Invalid read of size 8
==1837== at 0x5367337: virDomainDefFormatInternal (domain_conf.c:18392)
which is : virBufferAsprintf(buf, "<vcpupin vcpu='%u' ",
def->cputune.vcpupin[i]->vcpuid);
==1837== by 0x536966C: virDomainObjFormat (domain_conf.c:18970)
==1837== by 0x5369743: virDomainSaveStatus (domain_conf.c:19166)
==1837== by 0x117B26DC: qemuDomainPinVcpuFlags (qemu_driver.c:4586)
==1837== by 0x53EA313: virDomainPinVcpuFlags (libvirt.c:9803)
==1837== by 0x14CB7D: remoteDispatchDomainPinVcpuFlags (remote_dispatch.h:6762)
==1837== by 0x14CC81: remoteDispatchDomainPinVcpuFlagsHelper (remote_dispatch.h:6740)
==1837== by 0x5464C30: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
==1837== by 0x546507A: virNetServerProgramDispatch (virnetserverprogram.c:307)
==1837== by 0x171B83: virNetServerProcessMsg (virnetserver.c:172)
==1837== by 0x171E6E: virNetServerHandleJob (virnetserver.c:193)
==1837== by 0x5318E78: virThreadPoolWorker (virthreadpool.c:145)
==1837== Address 0x12ea2870 is 0 bytes inside a block of size 16 free'd
==1837== at 0x4C291AC: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1837== by 0x52A3D14: virReallocN (viralloc.c:245)
==1837== by 0x52A3DFB: virShrinkN (viralloc.c:372)
==1837== by 0x52A3F57: virDeleteElementsN (viralloc.c:503)
==1837== by 0x533939E: virDomainVcpuPinDel (domain_conf.c:15405) // only reached when doReset is true
==1837== by 0x117B2642: qemuDomainPinVcpuFlags (qemu_driver.c:4573)
==1837== by 0x53EA313: virDomainPinVcpuFlags (libvirt.c:9803)
==1837== by 0x14CB7D: remoteDispatchDomainPinVcpuFlags (remote_dispatch.h:6762)
==1837== by 0x14CC81: remoteDispatchDomainPinVcpuFlagsHelper (remote_dispatch.h:6740)
==1837== by 0x5464C30: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
==1837== by 0x546507A: virNetServerProgramDispatch (virnetserverprogram.c:307)
==1837== by 0x171B83: virNetServerProcessMsg (virnetserver.c:172)
Steps to reproduce the problem:
1) use virDomainPinVcpuFlags() to pin a guest's vcpu to all the pcpus
of the host.
This patch uses def->cputune.vcpupin instead of vcpupin_list to do the
realloc() job, to avoid invalid read or memory leaking.
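The pattern of the fix can be illustrated with a reduced example. It
uses a plain int array and made-up names instead of the real vcpupin
structures; the point is only that the realloc() result is stored
through the owning structure, never through a local copy of the
pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for def->cputune. */
struct cputune {
    int *vcpupin;
    size_t nvcpupin;
};

/* Remove element idx. The owner's pointer and count are updated in
 * place, so no stale alias survives the realloc().
 * Returns 0 on success, -1 on a bad index. */
int vcpupin_delete(struct cputune *tune, size_t idx)
{
    int *tmp;

    if (idx >= tune->nvcpupin)
        return -1;

    memmove(tune->vcpupin + idx, tune->vcpupin + idx + 1,
            (tune->nvcpupin - idx - 1) * sizeof(*tune->vcpupin));
    tune->nvcpupin--;

    if (tune->nvcpupin == 0) {
        free(tune->vcpupin);
        tune->vcpupin = NULL;
        return 0;
    }

    tmp = realloc(tune->vcpupin, tune->nvcpupin * sizeof(*tune->vcpupin));
    if (tmp)   /* a failed shrink leaves the old block valid */
        tune->vcpupin = tmp;
    return 0;
}
```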
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Signed-off-by: Yue Wenyuan <yuewenyuan@huawei.com>
Export the required helpers and add backend code to hotplug RNG devices.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The helpers will be useful when implementing hotplug and coldplug of
random number generator devices.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
As the RNG device is using an -object as its backend, refactor the code
to use the JSON to commandline generator so that we can reuse the code
later in hotplug.
Move the alias name right after the object type for rng-egd backend so
that we can later use the JSON to commandline generator to create the
command line.
Libvirt didn't prefix the random number generator backend object alias
with any string thus the device alias and object alias were identical.
To avoid possible problems, rename the alias for the backend object and
tweak tests to comply with the change.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Rename qemuBuildRNGDeviceArgs to qemuBuildRNGDevStr and change the
return type so that it can be reused in the device hotplug code later.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This function is used to assign an alias for a RNG device. It will be
later reused when hotplugging RNGs.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When adding devices to the definition it's useful to check that the
devices don't reside on a conflicting address. This patch adds a helper
that iterates all device info and compares the addresses with the given
info.
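Such a helper might look roughly like the sketch below. It only
handles PCI-style addresses and uses hypothetical names; the real
device info covers more address types than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device address. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

bool pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
    return a->domain == b->domain &&
           a->bus == b->bus &&
           a->slot == b->slot &&
           a->function == b->function;
}

/* Iterate over all existing device addresses and report whether the
 * candidate address is already taken. */
bool pci_addr_in_use(const struct pci_addr *devs, size_t ndevs,
                     const struct pci_addr *candidate)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (pci_addr_equal(&devs[i], candidate))
            return true;
    }
    return false;
}
```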
commit a58e1cb4 didn't fix the bug if the security_default_confined is
not set to 1. We now clean up even if there is no seclabel defined or
the default one.
When defining and creating networks, we have been checking to make
sure there is only a single "default" portgroup, but haven't verified
that no two portgroups have the same name. We *do* check for multiple
definitions when updating the portgroups in an existing network
though.
This patch adds a check to networkValidate(), which is called when a
network is defined or created, to disallow duplicate names. It would
actually make sense to do this in the network XML parser (since it's
not really "something that might make sense but isn't supported by
this driver", but is instead "something that should never be
allowed"), but doing that carries the danger of causing errors when
rereading the config of existing networks when libvirtd is restarted
after an upgrade, and that would result in networks disappearing from
libvirt's list. (I'm thinking I should change the error to "XML_ERROR"
instead of "UNSUPPORTED", even though that's not the type of error
that networkValidate is intended for)
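The duplicate name check itself is simple; a sketch over a plain array
of names is given here. The function name and the flat array are
assumptions for illustration, since the real check walks the
portgroups of the network definition.

```c
#include <stddef.h>
#include <string.h>

/* Returns the first portgroup name that appears more than once, or
 * NULL if all names are unique. Quadratic, which is fine for the small
 * number of portgroups a network usually has. */
const char *find_duplicate_portgroup(const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(names[i], names[j]) == 0)
                return names[i];
        }
    }
    return NULL;
}
```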
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1115858
It is only usable for NETWORK and BRIDGE type interfaces.
Error out when trying to start a domain where the custom
tap device path is specified for interfaces of other types,
or when the daemon is not privileged.
Note that this cannot be checked at definition time, because
the comparison is against actual type.
https://bugzilla.redhat.com/show_bug.cgi?id=1147195
It is only supported for virtio adapters.
Silently drop it if it was specified for other models,
as is done for other virtio attributes.
Also mention this in the documentation.
https://bugzilla.redhat.com/show_bug.cgi?id=1147195
In each driver where we don't support managed save, return 0
instead of ERR_NO_SUPPORT, or -1 if the domain does not exist.
This avoids spamming daemon logs when 'virsh dominfo' is run.
https://bugzilla.redhat.com/show_bug.cgi?id=1095637
It is often helpful to know which versions of libvirt and QEMU
were present when a guest was first launched. Ensure this info
is written into the QEMU log file for each guest.