libvirt/docs/internals/oomtesting.html.in
Daniel P. Berrange 590029f672 Introduce new OOM testing support
The previous OOM testing support would re-run the entire "main"
method each iteration, failing a different malloc each time.
When a test suite has 'n' allocations, the number of repeats
requires is  (n * (n + 1) ) / 2.  This gets very large, very
quickly.

This new OOM testing support instead integrates at the
virtTestRun level, so each individual test case gets repeated,
instead of the entire test suite. This means the values of
'n' are orders of magnitude smaller.

The simple usage is

   $ VIR_TEST_OOM=1 ./qemuxml2argvtest
   ...
   29) QEMU XML-2-ARGV clock-utc                                         ... OK
       Test OOM for nalloc=36 .................................... OK
   30) QEMU XML-2-ARGV clock-localtime                                   ... OK
       Test OOM for nalloc=36 .................................... OK
   31) QEMU XML-2-ARGV clock-france                                      ... OK
       Test OOM for nalloc=38 ...................................... OK
   ...

the second lines reports how many mallocs have to be failed, and thus
how many repeats of the test will be run.

If it crashes, then running under valgrind will often show the problem

  $ VIR_TEST_OOM=1 ../run valgrind ./qemuxml2argvtest

When debugging problems it is also helpful to select an individual
test case

  $ VIR_TEST_RANGE=30 VIR_TEST_OOM=1 ../run valgrind ./qemuxml2argvtest

When things get really tricky, it is possible to request that just
specific allocs are failed. eg to fail allocs 5 -> 12, use

  $ VIR_TEST_RANGE=30 VIR_TEST_OOM=1:5-12 ../run valgrind ./qemuxml2argvtest

In the worse case, you might want to know the stack trace of the
alloc which was failed then VIR_TEST_OOM_TRACE can be set. If it
is set to 1 then it will only print if it thinks a mistake happened.
This is often not reliable, so setting it to 2 will make it print
the stack trace for every alloc that is failed.

  $ VIR_TEST_OOM_TRACE=2 VIR_TEST_RANGE=30 VIR_TEST_OOM=1:5-5 ../run valgrind ./qemuxml2argvtest
  30) QEMU XML-2-ARGV clock-localtime                                   ... OK
      Test OOM for nalloc=36 !virAllocN
  /home/berrange/src/virt/libvirt/src/util/viralloc.c:180
  virHashCreateFull
  /home/berrange/src/virt/libvirt/src/util/virhash.c:144
  virDomainDefParseXML
  /home/berrange/src/virt/libvirt/src/conf/domain_conf.c:11745
  virDomainDefParseNode
  /home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12646
  virDomainDefParse
  /home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12590
  testCompareXMLToArgvFiles
  /home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:106
  virtTestRun
  /home/berrange/src/virt/libvirt/tests/testutils.c:250
  mymain
  /home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:418 (discriminator 2)
  virtTestMain
  /home/berrange/src/virt/libvirt/tests/testutils.c:750
  ??
  ??:0
  _start
  ??:?
   FAILED

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-20 15:36:10 +00:00

214 lines
7.9 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Out of memory testing</h1>
<ul id="toc"></ul>
<p>
This page describes how to use the test suite todo out of memory
testing.
</p>
<h2>Building with OOM testing</h2>
<p>
Since OOM testing requires hooking into the malloc APIs, it is
not enabled by default. The flag <code>--enable-test-oom</code>
must be given to <code>configure</code>. When this is done the
libvirt allocation APIs will have some hooks enabled.
</p>
<pre>
$ ./configure --enable-test-oom
</pre>
<h2><a name="basicoom">Basic OOM testing support</a></h2>
<p>
The first step in validating OOM usage is to run a test suite
with full OOM testing enabled. This is done by setting the
<code>VIR_TEST_OOM=1</code> environment variable. The way this
works is that it runs the test once normally to "prime" any
static memory allocations. Then it runs it once more counting
the total number of memory allocations. Then it runs it in a
loop failing a different memory allocation each time. For every
memory allocation failure triggered, it expects the test case
to return an error. OOM testing is quite slow requiring each
test case to be executed O(n) times, where 'n' is the total
number of memory allocations. This results in a total number
of memory allocations of '(n * (n + 1) ) / 2'
</p>
<pre>
$ VIR_TEST_OOM=1 ./qemuxml2argvtest
1) QEMU XML-2-ARGV minimal ... OK
Test OOM for nalloc=42 .......................................... OK
2) QEMU XML-2-ARGV minimal-s390 ... OK
Test OOM for nalloc=28 ............................ OK
3) QEMU XML-2-ARGV machine-aliases1 ... OK
Test OOM for nalloc=38 ...................................... OK
4) QEMU XML-2-ARGV machine-aliases2 ... OK
Test OOM for nalloc=38 ...................................... OK
5) QEMU XML-2-ARGV machine-core-on ... OK
Test OOM for nalloc=37 ..................................... OK
...snip...
</pre>
<p>
In this output, the first line shows the normal execution and
the test number, and the second line shows the total number
of memory allocations from that test case.
</p>
<h3><a name="valgrind">Tracking failures with valgrind</a></h3>
<p>
The test suite should obviously *not* crash during OOM testing.
If it does crash, then to assist in tracking down the problem
it is worth using valgrind and only running a single test case.
For example, supposing test case 5 crashed. Then re-run the
test with
</p>
<pre>
$ VIR_TEST_OOM=1 VIR_TEST_RANGE=5 ../run valgrind ./qemuxml2argvtest
...snip...
5) QEMU XML-2-ARGV machine-core-on ... OK
Test OOM for nalloc=37 ..................................... OK
...snip...
</pre>
<p>
Valgrind should report the cause of the crash - for example a
double free or use of uninitialized memory or NULL pointer
access.
</p>
<h3><a name="stacktraces">Tracking failures with stack traces</a></h3>
<p>
With some really difficult bugs valgrind is not sufficient to
identify the cause. In this case, it is useful to identify the
precise allocation which was failed, to allow the code path
to the error to be traced. The <code>VIR_TEST_OOM</code>
env variable can be given a range of memory allocations to
test. So if a test case has 150 allocations, it can be told
to only test allocation numbers 7-10. The <code>VIR_TEST_OOM_TRACE</code>
variable can be used to print out stack traces.
</p>
<pre>
$ VIR_TEST_OOM_TRACE=2 VIR_TEST_OOM=1:7-10 VIR_TEST_RANGE=5 \
../run valgrind ./qemuxml2argvtest
5) QEMU XML-2-ARGV machine-core-on ... OK
Test OOM for nalloc=37 !virAllocN
/home/berrange/src/virt/libvirt/src/util/viralloc.c:180
virDomainDefParseXML
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:11786 (discriminator 1)
virDomainDefParseNode
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12677
virDomainDefParse
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12621
testCompareXMLToArgvFiles
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:107
virtTestRun
/home/berrange/src/virt/libvirt/tests/testutils.c:266
mymain
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:388 (discriminator 2)
virtTestMain
/home/berrange/src/virt/libvirt/tests/testutils.c:791
__libc_start_main
??:?
_start
??:?
!virAlloc
/home/berrange/src/virt/libvirt/src/util/viralloc.c:133
virDomainDiskDefParseXML
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:4790
virDomainDefParseXML
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:11797
virDomainDefParseNode
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12677
virDomainDefParse
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12621
testCompareXMLToArgvFiles
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:107
virtTestRun
/home/berrange/src/virt/libvirt/tests/testutils.c:266
mymain
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:388 (discriminator 2)
virtTestMain
/home/berrange/src/virt/libvirt/tests/testutils.c:791
__libc_start_main
??:?
_start
??:?
!virAllocN
/home/berrange/src/virt/libvirt/src/util/viralloc.c:180
virXPathNodeSet
/home/berrange/src/virt/libvirt/src/util/virxml.c:609
virDomainDefParseXML
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:11805
virDomainDefParseNode
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12677
virDomainDefParse
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12621
testCompareXMLToArgvFiles
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:107
virtTestRun
/home/berrange/src/virt/libvirt/tests/testutils.c:266
mymain
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:388 (discriminator 2)
virtTestMain
/home/berrange/src/virt/libvirt/tests/testutils.c:791
__libc_start_main
??:?
_start
??:?
!virAllocN
/home/berrange/src/virt/libvirt/src/util/viralloc.c:180
virDomainDefParseXML
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:11808 (discriminator 1)
virDomainDefParseNode
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12677
virDomainDefParse
/home/berrange/src/virt/libvirt/src/conf/domain_conf.c:12621
testCompareXMLToArgvFiles
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:107
virtTestRun
/home/berrange/src/virt/libvirt/tests/testutils.c:266
mymain
/home/berrange/src/virt/libvirt/tests/qemuxml2argvtest.c:388 (discriminator 2)
virtTestMain
/home/berrange/src/virt/libvirt/tests/testutils.c:791
__libc_start_main
??:?
_start
??:?
</pre>
<h3><a name="noncrash">Non-crash related problems</a></h3>
<p>
Not all memory allocation bugs result in code crashing. Sometimes
the code will be silently ignoring the allocation failure, resulting
in incorrect data being produced. For example the XML parser may
mistakenly treat an allocation failure as indicating that an XML
attribute was not set in the input document. It is hard to identify
these problems from the test suite automatically. For this, the
test suites should be run with <code>VIR_TEST_DEBUG=1</code> set
and then stderr analysed for any unexpected data. For example,
the XML conversion may show an embedded "(null)" literal, or the
test suite might complain about missing elements / attributes
in the actual vs expected data. These are all signs of bugs in
OOM handling. In the future the OOM tests will be enhanced to
validate that an error VIR_ERR_NO_MEMORY is returned for each
allocation failed, rather than some other error.
</p>
</body>
</html>