Document the use of Memcheck and add suppressions
gmarkall committed Jan 10, 2020
1 parent 88e3dad commit b4ee055
Showing 3 changed files with 152 additions and 0 deletions.
21 changes: 21 additions & 0 deletions contrib/valgrind-numba.supp
@@ -0,0 +1,21 @@
{
<llvmpy_get_cpu_name_cond>
Memcheck:Cond
fun:_ZN4llvm3sys14getHostCPUNameEv
fun:LLVMPY_GetHostCPUName
}

{
<llvmpy_get_cpu_name_value8>
Memcheck:Value8
fun:_ZN4llvm3sys14getHostCPUNameEv
fun:LLVMPY_GetHostCPUName
}

{
<openmp_init_cond>
Memcheck:Cond
fun:__intel_sse2_strrchr
fun:_ZN67_INTERNAL_45_______src_thirdparty_tbb_omp_dynamic_link_cpp_c306cade5__kmp12init_dl_dataEv
fun:__sti__$E
}
130 changes: 130 additions & 0 deletions docs/source/developer/debugging.rst
@@ -0,0 +1,130 @@
.. _developer-debugging:

==================
Notes on Debugging
==================

This section describes techniques that can be useful in debugging the
compilation and execution of generated code.

Memcheck
--------

Memcheck_ is a memory error detector implemented using Valgrind_. It is useful
for detecting memory errors in compiled code, particularly out-of-bounds
accesses and use-after-free errors. Buggy or miscompiled native code can
generate these kinds of errors. The `Memcheck documentation
<https://valgrind.org/docs/manual/mc-manual.html>`_ explains its usage; here, we
discuss only the specifics of using it with Numba.

.. _Memcheck: https://valgrind.org/docs/manual/mc-manual.html
.. _Valgrind: https://valgrind.org/

The Python interpreter and some of the libraries used by Numba can generate
false positives with Memcheck; see `this section of the manual
<https://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine>`_ for more
information on why false positives occur. False positives make it difficult to
tell when an actual error has occurred, so it is helpful to suppress the known
ones. This is done by supplying a suppressions file, which instructs Memcheck
to ignore errors that match the suppressions defined in it.

The CPython source distribution includes a suppressions file,
``Misc/valgrind-python.supp``. Using this file suppresses many spurious errors
caused by Python's memory allocation implementation. Additionally, the Numba
repository includes a suppressions file, ``contrib/valgrind-numba.supp``.
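
Each entry in a suppressions file is a brace-delimited block: a name, a
``Tool:Kind`` line, then one line per stack frame. As a rough illustration, the
following Python sketch lists the entries found in such a file. The function
name ``parse_suppressions`` is hypothetical, and the parser assumes this
simplified layout only; it does not cover every feature of the format described
in the Valgrind manual.

.. code-block:: python

   def parse_suppressions(text):
       """Return a list of (name, kind, frames) tuples from suppressions text.

       Assumes the simplified layout: '{', name, 'Tool:Kind', frames..., '}'.
       """
       entries = []
       current = None  # lines of the entry being read, or None outside braces
       for raw in text.splitlines():
           line = raw.strip()
           if not line or line.startswith("#"):
               continue  # skip blank lines and comments
           if line == "{":
               current = []
           elif line == "}":
               if current is not None and len(current) >= 2:
                   name, kind, *frames = current
                   entries.append((name, kind, frames))
               current = None
           elif current is not None:
               current.append(line)
       return entries


   example = """
   {
      <llvmpy_get_cpu_name_cond>
      Memcheck:Cond
      fun:_ZN4llvm3sys14getHostCPUNameEv
      fun:LLVMPY_GetHostCPUName
   }
   """

   for name, kind, frames in parse_suppressions(example):
       print(name, kind, len(frames))

A listing like this can help check which error kinds and call chains a
suppressions file is hiding before relying on it.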

To run the Python interpreter under Memcheck with both suppressions files,
invoke it with the following command::

   valgrind --tool=memcheck \
            --suppressions=${CPYTHON_SRC_DIR}/Misc/valgrind-python.supp \
            --suppressions=${NUMBA_SRC_DIR}/contrib/valgrind-numba.supp \
            python ${PYTHON_ARGS}

where ``${CPYTHON_SRC_DIR}`` is the location of the CPython source
distribution, ``${NUMBA_SRC_DIR}`` is the location of the Numba source
directory, and ``${PYTHON_ARGS}`` are the arguments to the Python interpreter.
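
When Memcheck runs are launched from a script, the same command line can be
assembled programmatically. The following is a minimal sketch: the helper name
``memcheck_command``, the example paths, and ``my_test.py`` are all
hypothetical, and only the ``valgrind`` options shown in the command above are
used.

.. code-block:: python

   import shlex
   from pathlib import PurePosixPath


   def memcheck_command(cpython_src, numba_src, python_args):
       """Build the argument list for running Python under Memcheck."""
       cpython_supp = PurePosixPath(cpython_src) / "Misc" / "valgrind-python.supp"
       numba_supp = PurePosixPath(numba_src) / "contrib" / "valgrind-numba.supp"
       return [
           "valgrind",
           "--tool=memcheck",
           f"--suppressions={cpython_supp}",
           f"--suppressions={numba_supp}",
           "python",
           *python_args,
       ]


   # Hypothetical locations and script name, for illustration only.
   cmd = memcheck_command("/path/to/cpython", "/path/to/numba", ["my_test.py"])
   print(shlex.join(cmd))

The resulting list can be passed to ``subprocess.run``; returning a list rather
than a single string avoids shell quoting problems with paths that contain
spaces.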

If there are errors, then messages describing them will be printed to standard
error. An example of an error is::

   ==77113==    at 0x24169A: PyLong_FromLong (longobject.c:251)
   ==77113==    by 0x241881: striter_next (bytesobject.c:3084)
   ==77113==    by 0x2D3C95: _PyEval_EvalFrameDefault (ceval.c:2809)
   ==77113==    by 0x21B499: _PyEval_EvalCodeWithName (ceval.c:3930)
   ==77113==    by 0x26B436: _PyFunction_FastCallKeywords (call.c:433)
   ==77113==    by 0x2D3605: call_function (ceval.c:4616)
   ==77113==    by 0x2D3605: _PyEval_EvalFrameDefault (ceval.c:3124)
   ==77113==    by 0x21B977: _PyEval_EvalCodeWithName (ceval.c:3930)
   ==77113==    by 0x21C2A4: _PyFunction_FastCallDict (call.c:376)
   ==77113==    by 0x2D5129: do_call_core (ceval.c:4645)
   ==77113==    by 0x2D5129: _PyEval_EvalFrameDefault (ceval.c:3191)
   ==77113==    by 0x21B499: _PyEval_EvalCodeWithName (ceval.c:3930)
   ==77113==    by 0x26B436: _PyFunction_FastCallKeywords (call.c:433)
   ==77113==    by 0x2D46DA: call_function (ceval.c:4616)
   ==77113==    by 0x2D46DA: _PyEval_EvalFrameDefault (ceval.c:3139)
   ==77113==
   ==77113== Use of uninitialised value of size 8
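
For long runs it can help to reduce output like the example above to a list of
functions and source locations. The following Python sketch does this with a
regular expression. The name ``parse_frames`` is hypothetical, and the pattern
is an assumption based only on the ``at``/``by`` lines shown here, so it may
need adjusting for other output.

.. code-block:: python

   import re

   # Matches lines such as:
   #   ==77113==    at 0x24169A: PyLong_FromLong (longobject.c:251)
   FRAME_RE = re.compile(
       r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: "
       r"(?P<func>.+?) \((?P<loc>[^()]*)\)$"
   )


   def parse_frames(output):
       """Return (function, location) pairs for each stack frame line."""
       frames = []
       for line in output.splitlines():
           match = FRAME_RE.match(line.strip())
           if match:
               frames.append((match["func"], match["loc"]))
       return frames


   sample = """\
   ==77113==    at 0x24169A: PyLong_FromLong (longobject.c:251)
   ==77113==    by 0x241881: striter_next (bytesobject.c:3084)
   ==77113==
   ==77113== Use of uninitialised value of size 8
   """

   for func, loc in parse_frames(sample):
       print(f"{func:20} {loc}")

Lines that are not stack frames, such as the error description, are ignored
rather than reported, so this only summarises where errors occurred.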

The traceback shows only the C call stack, which can make it difficult to
determine what the Python interpreter was doing at the time of the error. One
can learn more about the state of the stack by looking at the backtrace in GDB.
Launch ``valgrind`` with the additional argument ``--vgdb-error=0`` and attach
to the process using GDB as instructed by the output. Once an error is
encountered, GDB will stop at the error and the stack can be inspected.

GDB supports backtracing through the Python stack, but this requires symbols
that may not be easily available in your Python distribution. Without them, it
is still possible to determine some information about what was happening in
Python, but this depends on examining the backtrace closely. For example, in a
backtrace corresponding to the above error, we see items such as:

.. code-block::

   #18 0x00000000002722da in slot_tp_call (
       self=<_wrap_impl(_callable=<_wrap_missing_loc(func=<function at remote
       0x1cf66c20>) at remote 0x1d200bd0>, _imp=<function at remote 0x1d0e7440>,
       _context=<CUDATargetContext(address_size=64,
       typing_context=<CUDATypingContext(_registries={<Registry(functions=[<type
       at remote 0x65be5e0>, <type at remote 0x65be9d0>, <type at remote
       0x65bedc0>, <type at remote 0x65bf1b0>, <type at remote 0x8b78000>, <type
       at remote 0x8b783f0>, <type at remote 0x8b787e0>, <type at remote
       0x8b78bd0>, <type at remote 0x8b78fc0>, <type at remote 0x8b793b0>, <type
       at remote 0x8b797a0>, <type at remote 0x8b79b90>, <type at remote
       0x8b79f80>, <type at remote 0x8b7a370>, <type at remote 0x8b7a760>, <type
       at remote 0x8b7ab50>, <type at remote 0x8b7af40>, <type at remote
       0x8b7b330>, <type at remote 0x8b7b720>, <type at remote 0x8b7bf00>, <type
       at remote 0x8b7c2f0>, <type at remote 0x8b7c6e0>], attributes=[<type at
       remote 0x8b7cad0>, <type at remote 0x8b7cec0>, <type at remote
       0x8b7d2b0>, <type at remote 0x8b7d6a0>, <type at remote 0x8b7da90>,
       <t...(truncated),
       args=(<Builder(_block=<Block(parent=<Function(parent=<Module(context=<Context(scope=<NameScope(_useset={''},
       _basenamemap={}) at remote 0xbb5ae10>, identified_types={}) at remote
       0xbb5add0>, name='cuconstRecAlign$7',
       data_layout='e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64',
       scope=<NameScope(_useset={'',
       '_ZN08NumbaEnv5numba4cuda5tests6cudapy13test_constmem19cuconstRecAlign$247E5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE',
       '_ZN5numba4cuda5tests6cudapy13test_constmem19cuconstRecAlign$247E5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE'},
       _basenamemap={}) at remote 0x1d27bf10>, triple='nvptx64-nvidia-cuda',
       globals={'_ZN08NumbaEnv5numba4cuda5tests6cudapy13test_constmem19cuconstRecAlign$247E5ArrayIdLi1E1C7mutable7ali...(truncated),
       kwds=0x0)

We can see some of the arguments, in particular the names of the compiled
functions, e.g.::

   _ZN5numba4cuda5tests6cudapy13test_constmem19cuconstRecAlign$247E5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE

We can run this through ``c++filt`` to see a more human-readable representation::

   numba::cuda::tests::cudapy::test_constmem::cuconstRecAlign$247(
     Array<double, 1, C, mutable, aligned>,
     Array<double, 1, C, mutable, aligned>,
     Array<double, 1, C, mutable, aligned>,
     Array<double, 1, C, mutable, aligned>,
     Array<double, 1, C, mutable, aligned>)

which is the fully qualified name of a jitted function and the types with which
it was called.
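
Picking mangled names out of a long backtrace by eye is tedious, so the two
steps above can be scripted. The following Python sketch collects candidate
names from backtrace text and passes them through ``c++filt`` when it is
installed. The helper names ``find_mangled_names`` and ``demangle`` are
hypothetical, and the regular expression is an assumption that simply matches
the ``_ZN`` prefix seen in the names above.

.. code-block:: python

   import re
   import shutil
   import subprocess

   # Candidate mangled names: the "_ZN" prefix followed by identifier
   # characters, including the "$" that appears in the names above.
   MANGLED_RE = re.compile(r"_ZN[0-9A-Za-z_$]+")


   def find_mangled_names(text):
       """Return the unique candidate mangled names in text, sorted."""
       return sorted(set(MANGLED_RE.findall(text)))


   def demangle(names, tool="c++filt"):
       """Demangle names with c++filt; return them unchanged if it is missing."""
       if not names or shutil.which(tool) is None:
           return list(names)
       result = subprocess.run(
           [tool],
           input="\n".join(names) + "\n",
           capture_output=True,
           text=True,
           check=True,
       )
       return result.stdout.splitlines()


   backtrace = "fun:_ZN4llvm3sys14getHostCPUNameEv called from LLVMPY_GetHostCPUName"
   for name in demangle(find_mangled_names(backtrace)):
       print(name)

Names that GDB has cut short with ``...(truncated)`` are still matched, but
they are incomplete and will generally not demangle.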

1 change: 1 addition & 0 deletions docs/source/developer/index.rst
@@ -23,4 +23,5 @@ Developer Manual
hashing.rst
caching.rst
literal.rst
debugging.rst
roadmap.rst
