CodeViz - A call graph generation utility for C/C++

Translations of this page are available in Belarusian (provided by Webhostingrating), Ukrainian (provided by Galina Miklosic), Armenian (provided by Karen Mgebrova) and Polish (provided by Andrey Fomin).

At some stage in their programming career, everyone will need to read through a lot of code written by another programmer. An important part of program comprehension is building a picture of how the program is structured from a high-level view, and call graphs can be an invaluable aid when building this picture. They are particularly useful if the original programmer used clear function names.

This project provides the ability to generate call graphs to aid the task of understanding code. It uses a highly modular set of collection methods and can be adapted to support any language, although only C and C++ are currently supported. Each collection method has different advantages and disadvantages.

[Figure: Call graph of alloc_pages() in the Linux Kernel 2.6.12-rc2]

    gengraph -f alloc_pages -d 5 -t -s "buffered_rmqueue out_of_memory try_to_free_pages numa_node_id" -i "cpuset_zone_allowed" --output-font=Arial --output-type=png

To build and install CodeViz:

    tar -zxvf codeviz-1.0.3.tar.gz
    cd codeviz-1.0.3
    ./configure && make && make install

This will configure codeviz for use with a patched version of the 3.4.1 gcc compiler and install all the scripts.

Additional Dependencies

The graphs are rendered using dot, which is part of the GraphViz project. Install the package for your distribution or obtain it directly from the GraphViz project. The following is a rough list of packages you need to have installed:

Scripts
Generating cdepn Files for genfull

If the full.graph for the source you are interested in has already been created, you can skip this section. See ./graphs to see if a full.graph is available.

The cobjdump and cppobjdump collection methods (for C and C++ respectively) will generate adequate call graphs, but the information is a bit lacking. For example, the source file of a function declaration is unknown, and macros and inline functions will be totally missing. Ideally, the cdepn method should be used, but it requires a patched version of gcc and g++ to work. The patches and some scripts are available in the compilers/ directory. gcc 3.4.1 is automatically downloaded and patched for you as part of configure && make, so if you have used that, you can skip the rest of this section.

The patched version of gcc and g++ outputs a .cdepn file for every C and C++ file compiled. This .cdepn file contains information such as when functions are called, where they are declared and so on.

In versions of CodeViz prior to 1.0.3, three versions of gcc were supported: 2.95.4, 3.0.4 and 3.2.3. For the latest versions, only 3.4.1 is supported as it should be able to compile any application. It has been heavily tested with the Linux Kernel, so C support should be fine. C++ support is also there but has not been tested as extensively. Reports welcome.

In case you do not wish to use the configure script, this is how to install the compiler yourself. First, the source tar has to be downloaded. For those who have better things to do than read the gcc install doc, just do the following:

    cd compilers
    ncftpget ftp://ftp.gnu.org/pub/gnu/gcc/gcc-3.4.1/gcc-3.4.1.tar.gz
    ./install_gcc-3.4.1.sh

This script will untar gcc, patch it and install it to the supplied path. If no path is given, it will be installed to $HOME/gcc-graph.
I usually install it to /usr/local/gcc-graph with

    ./install_gcc-3.4.1.sh /usr/local/gcc-graph

If you seriously want to patch by hand, just read the script as it goes through each of the steps one at a time.

For now, we will presume a patched version of gcc and g++ is in $HOME/gcc-graph/. Most projects use the variable CC to decide which version of gcc to use. The handiest way to use the patched one is with something like

    make CC=$HOME/gcc-graph/bin/gcc CXX=$HOME/gcc-graph/bin/g++

Alternatively, adjust your path so that gcc-graph appears before the normal gcc. As each source file is compiled, the corresponding cdepn file will be created. In the case of building the Linux Kernel, the commands would be:

    make CC=$HOME/gcc-graph/bin/gcc bzImage
    make CC=$HOME/gcc-graph/bin/gcc modules

Similar methods will work for other projects, presuming that the Makefile uses the CC or CXX macros correctly to indicate the compiler to use. If it is a Makefile of your own type or it does not use proper macros, you may have to edit the Makefile yourself or else adjust your path to put gcc-graph first. For example, with bash, the following will work:

    PATH=$HOME/gcc-graph/bin:$PATH

When building, watch the compiler output to make sure the .cdepn files are being created.

Generating nccout files for genfull

An alternative to using a patched version of gcc is to use ncc (http://freshmeat.net/projects/ncc), which is a C compiler specifically designed for code browsing. It comes with its own navigation tool and is well worth checking out. CodeViz supports ncc with the cncc collection method (just as cdepn is for use with gcc), for C only. The really big thing going for the ncc collection method is that it can traverse function pointers. If you download and install ncc, use the cncc collection method if it is C code and function pointers are common.
Once ncc is installed, in the case of building the Linux Kernel, the commands would be:

    make -i CC='ncc -ncoo -ncfabs' bzImage
    make -i CC='ncc -ncoo -ncfabs' modules
    find . -name \*.nccout | xargs cat > code.map.nccout

Generating full.graph

Some full.graph files are provided with the tar in the downloads section. If the one you want is not available, read on.

To create a full.graph, the script genfull is used. Run genfull --help to see all options, but the easiest thing to do is run the script with no arguments in the top level source directory after a compile. A file full.graph will then be created in the top level source directory.

While it should be possible to put full.graph through dot and see the postscript file, it is recommended you do not try. A full graph is extremely large and unlikely to be rendered in a reasonable amount of time. One really should use the gengraph program to create smaller graphs.

Problems that might exist with full.graph

In more complex code, the full.graph may not be perfect. For example, there may be naming collisions where function names are duplicated between modules, or, if multiple binaries are being compiled, genfull will not distinguish between them. If you think this will be a problem, there are two steps you can take.

First, compare the graph generated by cdepn with the one generated by cobjdump. As cobjdump is analysing a binary, it is highly unlikely the graph is wrong; it just will have no information on inline functions or macros. With the Linux kernel, this test would look something like

    genfull -g cobjdump -o full.graph-objdump
    genfull -g cdepn -o full.graph-cdepn
    gengraph -t -d 5 -g full.graph-objdump -f kswapd -o kswapd-objdump.ps
    gengraph -t -d 5 -g full.graph-cdepn -f kswapd -o kswapd-cdepn.ps

This would generate two full.graphs and two call graphs of the function kswapd() which could be compared to make sure the cdepn graph is accurate. A similar method can be used for other projects.
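Comparing two rendered postscript graphs by eye can be tedious, so a textual comparison of the two full.graph files may be a useful first pass. The following is a minimal sketch only, not part of CodeViz. It assumes the full.graph files are dot-format text in which each call appears on its own line as "caller" -> "callee", optionally followed by attributes in square brackets; the helper names edges and only_in_second are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with CodeViz.
# Assumption: each call edge sits on one line as  "caller" -> "callee" [attrs];

# Print the normalised, sorted set of edges found in a graph file.
edges() {
    grep -- '->' "$1" |
        sed -e 's/\[.*$//' \
            -e 's/[";]//g' \
            -e 's/[[:space:]]*->[[:space:]]*/ -> /' \
            -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*$//' |
        sort -u
}

# Print edges that appear in the second graph file but not in the first.
# The body runs in a subshell so the temporary names do not leak.
only_in_second() (
    first=$(mktemp)
    second=$(mktemp)
    edges "$1" > "$first"
    edges "$2" > "$second"
    # comm -13 suppresses lines unique to the first file and lines common to both.
    comm -13 "$first" "$second"
    rm -f "$first" "$second"
)
```

With the files from the commands above, only_in_second full.graph-objdump full.graph-cdepn would list calls seen only by cdepn, which is where inline functions and macros are expected to show up. Swapping the arguments lists calls seen only in the binary, which are the ones worth a closer look.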
The second problem that may occur is where function names are duplicated between modules. In this case, the best course of action is to use the -s switch to genfull to limit which branches of the tree are examined. For example, in the Linux kernel there is an alloc_pages() function in mm/ and in drivers/char/drm. If one was examining the VM alone and naming collisions were expected to be a problem, genfull could be invoked as

    genfull -s "mm include/linux drivers/block arch/i386"

which would cover most of the functions of interest. In other projects, it will be a case of different libraries colliding with each other. For instance, with avifile, genfull with no arguments will create a horrible mess. Instead, the -s switch must be used to generate a full.graph for each part of the project. For example, the player would be graphed with

    genfull -s "player" -o full.graph-player

and each of the libraries would be graphed separately.

Generating Call Graphs

The script gengraph generates a call graph for a specified function based on the full.graph file. gengraph --man will provide all the information you need. The most important switch to note is -g. In the examples on this page it means different things to the two scripts: with genfull it selects the collection method (for example -g cdepn), while with gengraph it names the full.graph file to read (for example -g full.graph-cdepn). Once the script completes, a postscript file will be available which can be viewed with any postscript viewer. By default, the output filename will be functionname.ps.

If it takes a long time to generate a graph, it is usually a good idea to first limit its depth to something reasonable with -d. We'll take an example of graphing alloc_pages() with kernel 2.4.20.

Step 1: gengraph -f alloc_pages
Result: Taking way too long. Hit ctrl-c and limit the depth to something reasonable to get an idea of what is happening.

Step 2: gengraph -d 10 -f alloc_pages
Result: Output graph is massive, mainly with kernel stock functions of no interest. Use the -t switch to omit functions that are usually of no interest.
For other projects, edit the gengraph script and go to the line "sub generate_trimlist"; this function has the list of functions to "trim" when the -t switch is used.

Step 3: gengraph -t -d 10 -f alloc_pages
Result: Output graph is still massive, but a glance at the graph shows that a call to "shrink_cache()" is resulting in a massive graph below it that does not look like it is directly related to page allocation. Let's just show that function but not traverse it, using the -s switch.

Step 4: gengraph -t -d 10 -s "shrink_cache" -f alloc_pages
Result: Graph size is drastically reduced. Most of the remaining graph involves two functions, "try_to_free_pages_zone()" and "__free_pages_ok()". We'll not traverse try_to_free_pages_zone() and will ignore __free_pages_ok() altogether with the -i switch.

Step 5: gengraph -t -d 10 -s "shrink_cache try_to_free_pages_zone" -i "__free_pages_ok" -f alloc_pages
Result: Perfect, a nice graph which clearly shows what the important functions are in relation to just page allocation. Later, the branches that were not traversed in this graph can be graphed separately.

The bottom line is that the first graph is usually too large and needs to be cut down. How to pare it down is a combination of experience with the code and common sense. I find it usually helps to first limit the depth to 4, start ignoring functions that are obviously not of current interest and traverse them later. If listing individual functions to omit is not your thing, use the --show-re or --ignore-re switches to show or ignore functions based on a regular expression.

Daemon/Client Support

With a large input graph, the longest operation in the generation of the call graph is the reading of the input file. For comparison, to generate a small graph on the author's machine, it takes 4 seconds to read the input graph and 0.1 seconds to generate the output file. To address this, gengraph can run as a daemon if the -q (--daemon) switch is specified.
Use -v if you want to see what it is doing.

    gengraph -q -g /usr/src/linux-2.4.20-clean/full.graph

When this returns, the daemon is running. To generate a graph using the daemon, run

    gengraph -q -t -d 2 -f alloc_pages

Note the use of the -q switch, which says that gengraph should run as a client to the daemon instance. If you are bored, compare the difference in running times between normal gengraph and when it is used as a client :-) . To stop the daemon, do the following

    echo QUIT > /tmp/codeviz.pipe

and the daemon will shut down and clean up.

Post-Processing Options

Both genfull and gengraph support the use of post-processing steps. Currently, two are supported. The first is stack usage by a single function. This is x86 specific as it depends on object files regardless of the collection method used. This is mainly of benefit to the Linux kernel, as normal applications can expand their stack and do not need to worry about stack usage as much. The second module shows cumulative stack usage in gengraph between pairs of functions. This is really handy for showing the usage between a system call and a lower-level function to identify places where too much stack is used. See the man pages for genfull and gengraph for more information on the use of the post-processing options.

Generating Graphs for the Web

Gengraph is now suitable for use with CGI scripts. To generate GIF output instead of postscript, use the -w switch. How you choose to implement it is up to yourself but what I did was the following

There is no demo of this available because the webserver which hosts this project is a bit loaded. While I could run a demo, my popularity would take a bit of a dent.

Misc Notes

gcc 2.95.3, gcc 3.2.3, C++ support and reverse call graphs have *not* been extensively tested. Reports of success or failure, especially with C++, using any of the collection methods are appreciated. gcc 3.2.3 is not recommended unless you really have to use it, as the build and patching for it is a little flaky.
Use 3.0.4 unless there is a good reason not to.

Bugs and Feedback

Codeviz is largely unmaintained these days as it mostly does what I needed it to do. If you have specific queries, contact me directly.

Credits

The vast majority of this has been implemented by Mel Gorman (mel@csn.ul.ie). However, the diff to gcc and the original cdepn.pl that this project was originally based on were written by Martin Devera (Devik) (http://luxik.cdi.cz/~devik). They have since changed considerably to support other languages and be more flexible, but the original idea was his. Thanks, Martin. Encouragement and prodding to support ncc is courtesy of the author of ncc, Xanthakis Stelios (sxanth@ceid.upatras.gr). Support for gcc 3.3.2 and instructions on how to set up a cross-compiler were provided by Joel Soete. Finally, support for gcc 3.4.6 was provided by Michael Iatrou.

Download

codeviz-1.0.12.tar.gz
codeviz-1.0.11.tar.gz
codeviz-1.0.10.tar.gz
codeviz-1.0.9.tar.gz
codeviz-1.0.7.tar.gz
codeviz-1.0.6.tar.gz
codeviz-1.0.5.tar.gz
codeviz-1.0.3.tar.gz
codeviz-1.0.tar.gz
codeviz-0.99d.tar.gz
codeviz-0.99c.tar.gz
codeviz-0.99b.tar.gz
codeviz-0.99.tar.gz
codeviz-0.24.tar.gz
codeviz-0.23.tar.gz
codeviz-0.22.tar.gz
codeviz-0.21.tar.gz
codeviz-0.20.tar.gz
codeviz-0.19.tar.gz
codeviz-0.18.tar.gz
codeviz-0.17.tar.gz
codeviz-0.16.tar.gz
codeviz-0.15.tar.gz
codeviz-0.14.tar.gz
codeviz-0.13.tar.gz
codeviz-0.12.tar.gz
codeviz-0.11.tar.gz
codeviz-0.10.tar.gz
codeviz-0.9.tar.gz
codeviz-0.8.tar.gz
gengraph-0.7.tar.gz
gengraph-0.6 was not released
gengraph-0.5.tar.gz
gengraph-0.4.tar.gz
gengraph-0.3.tar.gz
gengraph-0.2.tar.gz

Changelog
---------

Version 1.0.2
  o Ditched support for multiple compilers, left with only 3.4.1
  o Vastly superior C++ support, multiple bugs fixed

Version 1.0.1
  o Support for gcc 3.3.2 removed, way too buggy
  o Error in gcc 3.2.3 installation script fixed up
  o Support for --font switch to specify what font to use for graphs
  o Support again available for HTML generation and --shighlight

Version 1.0
  o Final bit of macro recognition tweaking
  o Documentation updates

Version 0.99
  o Be consistent about the use of cdep or cdepn
  o Better header processing

Version 0.24
  o Added support for a --version switch

Version 0.23
  o Remove CPP support in the cxref method, it was just too delicate
  o Changed the C++ method for cdepn methods to use cdepn files, works well
  o CObjDump will now put " around labels with ::

Version 0.22
  o Bugs with the --plan switch fixed
  o Patch and installation for gcc 3.3.2 compiler fixed

Version 0.21
  o Added missing file

Version 0.20
  o Fixed bug with --plain usage
  o Fixed bug with SMP function name mangling with later 2.6 kernels
  o Calculate cumulative stack usage with --pp-cstack post-processing module

Version 0.19
  o Mainly code cleanups
  o Moved graph rendering to Output.pm that exports just renderGraph()
  o Moved printing functions to Format.pm
  o Moved IPC functions to IPC.pm
  o Moved remaining graph functions to Graph.pm
  o Fixed cobjdump for binary analysis
  o Added post-processing analysis to genfull to calculate stack usage
  o Display stack usage and highlight excessive use for gengraph

Version 0.18
  o Allow output of just the graph file without using dot
  o Support for templated base URLs for HTML image maps
  o Better support for source-highlight usages
  o Allow standard error to be redirected (useful to daemon mode)
  o Allow standard out to be redirected (useful to daemon mode)

Version 0.17
  o Major bug fixed that prevented genfull running
  o Support for gcc 3.3.2 (Joel Soete)
  o Cross-compile instructions (Joel Soete)

Version 0.16
  o Many bug fixes and cleanups related to the HTML rendering
  o Better handling of node attributes for code cleanup
  o Many code cleanups to reduce complexity, overall less code
  o Documentation updates

Version 0.15
  o Show location of a function call (--all-locs) (Mel)
  o Graph top-level functions based on regular expressions (Lehr + Mel)
  o Various web-page related options added (Lehr + Mel)

Version 0.14
  o More minor bugs
  o Support to show/ignore functions based on a regular expression (Lehr)
  o Add RPM spec file (Lehr)
  o Format nodes that are not traversed differently (Lehr)

Version 0.13
  o Bugfixes

Version 0.12
  o Graphs are now internally represented as DAGs, massive speedups
  o graph2vrml removed because it was not going anywhere useful
  o Daemon/Client support added
  o GIF support added for web pages
  o Proper checking for availability of dot
  o Various optimizations and speedups

Version 0.11
  o cdepn and cxref methods merged
  o cdepn method is MUCH more accurate and is able to determine files to ignore
  o Output printing module added, only cdepn uses it currently

Version 0.10
  o Avoid naming collisions where structure names match functions in cxrefdep
  o Improved name collision resolution
  o ncc support added for new method cncc which supports function pointers

Version 0.9
  o xref support added which understands macros
  o Minor bug fixes and cleanups

Version 0.8
  o Modular data collection so that many collection methods can be easily added.
    All collection methods are now perl libraries
  o C++ support added
  o Integrated all scripts together so that there are only two principal scripts
  o objdump support so that project does not depend on patched compiler
  o Patches to gcc updated to gcc-3.0.4
  o Patches added for gcc-2.95.3 and gcc-3.2.3
  o glibc workaround added for new version of glibc compiling gcc
  o Automated download, compile and patch scripts added for each compiler version

Version 0.7
  o Reverse Call Graph Support
  o Online man pages and documentation help

Version 0.5
  o Fix up case where graphs with similar function names sometimes get corrupted.
    Most of the time it would work out ok, but other times multiple edges or
    unrelated functions were displayed
  o Enforce that the call graph order matches the order in code perfectly. It was
    a very rare case that a depth first search of the call graph would give a
    misleading view of the code
  o Allow multiple functions to be specified to graph. This is really handy when
    a number of API functions map to a much smaller set and it is desirable to
    display all the API wrappers in one place
  o Small documentation fix

Version 0.4
  o The order of functions displayed is now in the same order as the source.
    Traversing the graph in depth-first search will be the same order in the code

Version 0.3
  o Fixed cdepn.pl to work with 2.5.x kernels
