Wednesday, July 31, 2013

Python data structures in GDB

One of the neat use cases of the gdb-heap project is the ability to inspect Python data structures from within GDB. For some reason, the current version adapted for Ubuntu 12.04 doesn't seem to categorize memory using the data structures of Python's C runtime, so I set out to find out why.

I noticed that even after installing the Ubuntu 12.04 python-dbg package and loading the symbol file from /usr/lib/debug/usr/bin/python2.7, the Python data types do not always resolve with the stock python binary:

gdb --args python /tmp/tst.py

(gdb) run
[Ctrl-C]

None of the Python data types can be found:
(gdb) ptype PyObject
No symbol "PyObject" in current context.
(gdb) ptype PyVarObject
No symbol "PyVarObject" in current context.

...but with the debug build (gdb --args python-dbg /tmp/ac.py), they resolve:

(gdb) ptype PyObject
type = struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
}
(gdb) ptype PyVarObject
type = struct {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    Py_ssize_t ob_size;
}
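The same check can be scripted from GDB's embedded Python, which is also what gdb-heap runs on. This is only an illustrative sketch, not part of gdb-heap: the helper name missing_types and the script name check_types.py are hypothetical. It relies on gdb.lookup_type, which raises gdb.error (a subclass of RuntimeError) when a type has no debug info; the lookup function is passed in as an argument so the helper can also be exercised outside GDB.

```python
# check_types.py (hypothetical name): load with "(gdb) source check_types.py"
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def missing_types(names, lookup):
    """Return the subset of `names` that `lookup` cannot resolve.

    Inside GDB, pass gdb.lookup_type; it raises gdb.error, which derives
    from RuntimeError, when no debug info describes the type.
    """
    missing = []
    for name in names:
        try:
            lookup(name)
        except RuntimeError:
            missing.append(name)
    return missing


if gdb is not None:
    # Report which of the core CPython object types GDB cannot see
    print("unresolved: %s" % missing_types(["PyObject", "PyVarObject"], gdb.lookup_type))
```

After interrupting the program as above, I'd expect the stock python2.7 to report both names and python-dbg to report neither.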

With verbose mode enabled in gdb (set verbose on), I noticed that gdb reads in the debug symbols for the various Python modules with python2.7-dbg, but not with python2.7:

(gdb) set verbose on
(gdb) info types
Reading in symbols for ../Modules/symtablemodule.c...done.
Reading in symbols for ../Modules/zipimport.c...done.
Reading in symbols for ../Modules/_weakref.c...done.
Reading in symbols for ../Modules/_codecsmodule.c...done.
Reading in symbols for ../Modules/_sre.c...done.
Reading in symbols for ../Modules/pwdmodule.c...done.
Reading in symbols for ../Modules/errnomodule.c...done.

With the stock python2.7 binary, these module sources are never referenced; instead gdb reads in glibc sources such as these:

(gdb) set verbose on
(gdb) info types
Reading in symbols for bsearch.c...done.
Reading in symbols for ../sysdeps/x86_64/multiarch/init-arch.c...done.
Reading in symbols for ../sysdeps/x86_64/multiarch/cacheinfo.c...done.
Reading in symbols for wordcopy.c...done.
Reading in symbols for ../sysdeps/x86_64/multiarch/memmove.c...done.
Reading in symbols for ../sysdeps/x86_64/multiarch/rtld-memcmp.c...done.
Reading in symbols for ../sysdeps/unix/sysv/linux/x86_64/sigaction.c...done.
Reading in symbols for environ.c...done.
Reading in symbols for ../nptl/sysdeps/unix/sysv/linux/getpid.c...done.

Looking further, I noticed that the file names of the linked objects had changed. The xxx.ltrans.o files suggest that GCC link-time optimization (-flto/-fltrans) was enabled when Python was built for Ubuntu 12.04:


$ readelf -a /usr/lib/debug/usr/bin/python2.7 | grep FILE
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    43: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    48: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans0.o
   166: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans8.o
   196: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans9.o
   210: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans10.o
   326: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans11.o
   369: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans12.o
   400: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ccvMPsIN.ltrans13.o
.
.
.

$ readelf -a /usr/bin/python2.7-dbg | grep FILE
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    43: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    48: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS python.c
    49: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS getbuildinfo.c
    52: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS acceler.c
    55: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS grammar1.c
    58: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS listnode.c
    63: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS node.c
    67: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS parser.c
    77: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS parsetok.c
    81: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS bitset.c
    82: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS metagrammar.c
   112: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS firstsets.c
   115: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS grammar.c
   118: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS pgen.c
   148: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS myreadline.c
   151: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS tokenizer.c
   184: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS abstract.c
   217: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS boolobject.c
   229: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS bufferobject.c
   258: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS bytes_methods.c
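Rather than scanning the FILE entries by hand, a short script can flag the GCC LTO partition objects. This sketch assumes the symbol-table column layout shown above (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the ltransN.o naming pattern is taken from the listing, and the function names are hypothetical.

```python
import re

# GCC LTO partitions show up as e.g. "ccvMPsIN.ltrans0.o"
_LTRANS_RE = re.compile(r"\.ltrans\d+\.o$")


def file_symbols(readelf_text):
    """Extract the names of FILE symbols from `readelf -a` output."""
    names = []
    for line in readelf_text.splitlines():
        fields = line.split()
        # Symbol-table rows: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) >= 8 and fields[3] == "FILE":
            names.append(fields[7])
    return names


def looks_like_lto(names):
    """True if any FILE symbol is an LTO partition object."""
    return any(_LTRANS_RE.search(name) for name in names)
```

Run against the two listings above, looks_like_lto should be True for /usr/lib/debug/usr/bin/python2.7 and False for /usr/bin/python2.7-dbg, whose FILE symbols are ordinary .c names like python.c.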



Also, python-config doesn't show any such flag being used:

python-config --cflags --ldflags
-I/usr/include/python2.7 -I/usr/include/python2.7 -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security
-L/usr/lib/python2.7/config -lpthread -ldl -lutil -lm -lpython2.7 -Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions
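The flags recorded at configure time can also be queried from inside the interpreter through the standard sysconfig module. This is a minimal sketch with a hypothetical helper name, and which variables a distribution's build actually records is an assumption on my part; as with python-config, an LTO flag that the packaging passed some other way may not show up here.

```python
import sysconfig


def lto_flags(flags):
    """Return any -flto style options found in a compiler/linker flag string."""
    if not flags:
        return []
    return [flag for flag in flags.split() if flag.startswith("-flto")]


if __name__ == "__main__":
    # get_config_var returns None when a variable was not recorded
    for var in ("CFLAGS", "LDFLAGS", "PY_CFLAGS"):
        print("%s: %s" % (var, lto_flags(sysconfig.get_config_var(var))))
```

Given the python-config output above, I'd expect every list to come back empty on the stock Ubuntu 12.04 interpreter.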


I compiled my own version, and using -O2/-O3 on an Ubuntu 12.04 install didn't perform this optimization. If, however, you explicitly run CFLAGS="-flto" LDFLAGS="-flto" ./configure, link-time optimization is enabled and the xxx.ltrans.o files appear. For some reason, compiling with this flag removes the debugging symbols. More about link-time optimization is discussed here.

This finding suggests that inspecting Python data structures with GDB can only be done with the python-dbg binary or with your own build of Python compiled without link-time optimization. The latter seems preferable, since python-dbg changes Python's internal data structures and requires recompiling all C extension modules.
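To see why python-dbg needs recompiled extensions, compare the two PyVarObject layouts from the ptype output earlier. The ctypes sketch below mirrors them, modeling pointers as c_void_p and Py_ssize_t as c_ssize_t; the class names are hypothetical. The two extra list pointers at the front of the debug layout come from CPython's Py_TRACE_REFS option, which the 2.7 debug build turns on, and they shift ob_refcnt and every later field, so an extension compiled against one layout reads the wrong offsets in the other.

```python
import ctypes


class PyVarObjectRelease(ctypes.Structure):
    # PyVarObject in a regular (release) build
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),      # struct _typeobject *
        ("ob_size", ctypes.c_ssize_t),
    ]


class PyVarObjectTraceRefs(ctypes.Structure):
    # PyVarObject as reported by ptype in python-dbg (Py_TRACE_REFS)
    _fields_ = [
        ("_ob_next", ctypes.c_void_p),     # struct _object *
        ("_ob_prev", ctypes.c_void_p),     # struct _object *
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
    ]


if __name__ == "__main__":
    for cls in (PyVarObjectRelease, PyVarObjectTraceRefs):
        print("%s: size=%d, ob_refcnt offset=%d"
              % (cls.__name__, ctypes.sizeof(cls), cls.ob_refcnt.offset))
```

On x86_64 this should report sizes of 24 and 40 bytes, with ob_refcnt at offset 0 versus 16.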
