21.6 Analyzing Dumped core Files
When your application dies with the "Segmentation fault" error (generated by the default SIGSEGV signal handler) and generates a core file, you can analyze the core file using gdb or a similar debugger to find out what caused the segmentation fault (or segfault).
21.6.1 Getting Ready to Debug
To debug the core file, you may need to recompile Perl and mod_perl so that their executables contain debugging symbols. Usually you have to recompile only mod_perl, but if the core dump happens in the libperl.so library and you want to see the whole backtrace, you will probably want to recompile Perl as well.
For example, sometimes people send this kind of backtrace to the mod_perl list:
#0 0x40448aa2 in ?? ()
#1 0x40448ac9 in ?? ()
#2 0x40448bd1 in ?? ()
#3 0x4011d5d4 in ?? ()
#4 0x400fb439 in ?? ()
#5 0x400a6288 in ?? ()
#6 0x400a5e34 in ?? ()
This kind of trace is absolutely useless, since you cannot tell where the problem happens from just looking at machine addresses. To preserve the debug symbols and get a meaningful backtrace, recompile Perl with -DDEBUGGING during the ./Configure stage (or with -Doptimize="-g", which, in addition to adding the -DDEBUGGING option, adds the -g option, which allows you to debug the Perl interpreter itself).
During make install, Apache strips all the debugging symbols. To prevent this, you should use the Apache --without-execstrip ./configure option. So if you configure Apache via mod_perl, you should do this:
panic% perl Makefile.PL USE_APACI=1 \
    APACI_ARGS='--without-execstrip' [other options]
Alternatively, you can copy the unstripped binary manually. For example, we did this to give us an Apache binary called httpd_perl that contains debugging symbols:
panic# cp apache_1.3.24/src/httpd /home/httpd/httpd_perl/bin/httpd_perl
Now the software is ready for a proper debug.
21.6.2 Creating a Faulty Package
The next stage is to create a package that aborts abnormally with a segfault, so you will be able to reproduce the problem and exercise the debugging technique explained here. Luckily, you can download Debug::DumpCore from CPAN, which does a very simple thing—it segfaults when called as:
use Debug::DumpCore;
Debug::DumpCore::segv( );
Debug::DumpCore::segv( ) calls a function, which calls another function, which dereferences a NULL pointer, which causes the segfault:
int *p;
p = NULL;
printf("%d", *p); // cause a segfault
For those unfamiliar with C programming, p is a pointer: a variable that holds a memory address. Setting it to NULL makes it point to address zero, which the operating system deliberately leaves inaccessible to the process, so dereferencing the NULL pointer through *p makes the operating system send the process a SIGSEGV signal. And that's exactly what we want.
Of course, you could instead use Perl's CORE::dump( ) function, which causes a core dump, but then you wouldn't get the nice long trace provided by Debug::DumpCore, which deliberately calls a few other functions before causing a segfault.
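To see the same crash outside of Perl and Apache, here is a minimal C sketch of the chain just described. It is an illustration under our own assumptions, not the real Debug::DumpCore source: the function names are borrowed from the gdb backtraces shown later in this section, and run_crash_in_child() is a hypothetical helper that performs the crash in a forked child process (with core files disabled through setrlimit()) so that the parent can report which signal killed it.

```c
#define _XOPEN_SOURCE 700  /* expose POSIX/XSI declarations under strict -std modes */
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative stand-ins for the functions named in the backtrace;
 * NOT the actual DumpCore.xs source. */
static void crash_now_for_real(const char *suicide_message)
{
    /* volatile stops the compiler from proving p is NULL and replacing
     * the load with something else, so a real read of address 0 happens. */
    int *volatile p = NULL;

    (void)suicide_message;
    printf("%d", *p); /* cause a segfault */
}

static void crash_now(const char *suicide_message, int attempt_num)
{
    (void)attempt_num;
    crash_now_for_real(suicide_message);
}

static void segv(void)
{
    crash_now("Cannot stand this life anymore", 42);
}

/* Hypothetical helper: run segv() in a child process and return the
 * number of the signal that terminated it, or -1 if it did not die
 * from a signal (or fork/waitpid failed). */
int run_crash_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit no_core = { 0, 0 };

        setrlimit(RLIMIT_CORE, &no_core);
        segv();
        _exit(0); /* not reached if the dereference faults */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux and other Unix systems, run_crash_in_child() would be expected to return SIGSEGV. The pointer is declared volatile because reading through a NULL pointer is undefined behavior in C, and an optimizing compiler that can see the pointer is NULL is free to replace the load with something else, such as a trap instruction that raises a different signal.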
21.6.3 Dumping the core File
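Now let's dump the core file from within the mod_perl server. Sometimes the program aborts abnormally via the SIGSEGV signal (a segfault), but no core file is dumped, and without the core file it's hard to find the cause of the problem unless you ran the program inside gdb or another debugger in the first place. In order to get the core file, the application must: have matching real and effective user and group IDs and not have switched IDs after it started (by default, many Unix systems refuse to dump core for such processes, which is one reason to start the server as a non-root user); be running in a directory that is writable by the process at the moment of the segfault (the current directory may change during the run; for example, Apache::Registry scripts change it to the directory of the script source); and be started from a shell whose core file size limit is large enough for the core file to be written (set with limit coredumpsize under csh and tcsh, or ulimit -c under sh and bash).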
Note that when you are running the program under a debugger like gdb, which traps the SIGSEGV signal, the core file will not be dumped. Instead, gdb allows you to examine the program stack and other things without having the core file.
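The requirements for getting a core file (a nonzero core file size limit, a writable current directory, and matching real and effective user and group IDs) can also be checked from inside a process. The C sketch below is a hypothetical illustration, not part of mod_perl or Apache. raise_core_limit() has roughly the effect of limit coredumpsize unlimited in tcsh (or ulimit -c unlimited in sh), except that it raises the soft RLIMIT_CORE limit only as far as the hard limit allows; core_dump_possible() tests the conditions it can see. It cannot detect a process that switched IDs earlier in its life, and it ignores system-wide settings such as Linux's /proc/sys/kernel/core_pattern, which can redirect core files away from the current directory.

```c
#define _XOPEN_SOURCE 700  /* expose POSIX/XSI declarations under strict -std modes */
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper (not part of mod_perl or Apache): raise the soft
 * core file size limit to the hard limit, which is RLIM_INFINITY when
 * the hard limit is unlimited. Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical helper: return 1 if the visible preconditions for dumping
 * core into the current directory hold, 0 otherwise. Each failed check
 * is reported on stderr. */
int core_dump_possible(void)
{
    struct rlimit rl;
    int ok = 1;

    if (getrlimit(RLIMIT_CORE, &rl) != 0 || rl.rlim_cur == 0) {
        fputs("core file size limit is 0\n", stderr);
        ok = 0;
    }
    /* access() tests permissions using the real UID and GID. */
    if (access(".", W_OK) != 0) {
        fputs("current directory is not writable\n", stderr);
        ok = 0;
    }
    if (getuid() != geteuid() || getgid() != getegid()) {
        fputs("real and effective UID/GID differ\n", stderr);
        ok = 0;
    }
    return ok;
}
```

A long-running process could, for instance, call raise_core_limit() early in its startup and log a warning whenever core_dump_possible() returns 0.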
First let's test that we get the core file from the command line (under tcsh):
panic% limit coredumpsize unlimited
panic% perl -MDebug::DumpCore -e 'Debug::DumpCore::segv( )'
Segmentation fault (core dumped)
panic% ls -l core
-rw------- 1 stas stas 954368 Jul 31 23:52 core
Indeed, we can see that the core file was dumped. Let's write a simple script that uses Debug::DumpCore, as shown in Example 21-9.
use strict;
use Debug::DumpCore ( );
use Cwd ( );    # import nothing; call Cwd::getcwd( ) by its full name

my $r = shift;    # the Apache request object
$r->send_http_header("text/plain");
my $dir = Cwd::getcwd( );
$r->print("The core should be found at $dir/core\n");
Debug::DumpCore::segv( );
In this script we load the Debug::DumpCore and Cwd modules. Then we acquire the request object and send the HTTP headers. Now we come to the real part—we get the current working directory, print out the location of the core file that we are about to dump, and finally call Debug::DumpCore::segv( ), which dumps the core file.
Before we run the script we make sure that the shell sets the core file size to be unlimited, start the server in single-server mode as a non-root user, and generate a request to the script:
panic% cd /home/httpd/httpd_perl/bin
panic% limit coredumpsize unlimited
panic% ./httpd_perl -X
# issue a request here
Segmentation fault (core dumped)
Our browser prints out:
The core should be found at /home/httpd/perl/core
And indeed the core file appears where we were told it would (remember that Apache::Registry scripts change their directory to the location of the script source):
panic% ls -l /home/httpd/perl/core
-rw------- 1 stas httpd 4669440 Jul 31 23:58 /home/httpd/perl/core
As you can see, it's a 4.7 MB core file. Notice that mod_perl was started as user stas, which has write permission for the directory /home/httpd/perl.
21.6.4 Analyzing the core File
First we start gdb, passing it the mod_perl executable and the core file:
panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core
To see the backtrace, we run the where command (bt does the same):
(gdb) where
#0 0x4039f781 in crash_now_for_real (suicide_message=0x403a0120 "Cannot stand this life anymore") at DumpCore.xs:10
#1 0x4039f7a3 in crash_now (suicide_message=0x403a0120 "Cannot stand this life anymore", attempt_num=42) at DumpCore.xs:17
#2 0x4039f824 in XS_Debug__DumpCore_segv (cv=0x84ecda0) at DumpCore.xs:26
#3 0x401261ec in Perl_pp_entersub () from /usr/lib/perl5/5.6.1/i386-linux/CORE/libperl.so
#4 0x00000001 in ?? ()
Notice that only the symbols from the DumpCore.xs file are available (plus Perl_pp_entersub from libperl.so), since Debug::DumpCore compiles itself with the -g flag by default. We cannot see the rest of the trace, however, because our Perl and mod_perl libraries and the Apache server were built without debugging symbols. We need to rebuild all of them with debugging symbols, as explained earlier in this chapter.
Then we repeat the process of starting the server, issuing a request, and getting the core file, after which we run gdb again against the executable and the dumped core file:
panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core
Now we can see the whole backtrace:
(gdb) bt
#0 0x40448aa2 in crash_now_for_real (suicide_message=0x404499e0 "Cannot stand this life anymore") at DumpCore.xs:10
#1 0x40448ac9 in crash_now (suicide_message=0x404499e0 "Cannot stand this life anymore", attempt_num=42) at DumpCore.xs:17
#2 0x40448bd1 in XS_Debug__DumpCore_segv (my_perl=0x8133b60, cv=0x861d1fc) at DumpCore.xs:26
#3 0x4011d5d4 in Perl_pp_entersub (my_perl=0x8133b60) at pp_hot.c:2773
#4 0x400fb439 in Perl_runops_debug (my_perl=0x8133b60) at dump.c:1398
#5 0x400a6288 in S_call_body (my_perl=0x8133b60, myop=0xbffff160, is_eval=0) at perl.c:2045
#6 0x400a5e34 in Perl_call_sv (my_perl=0x8133b60, sv=0x85d696c, flags=4) at perl.c:1963
#7 0x0808a6e3 in perl_call_handler (sv=0x85d696c, r=0x860bf54, args=0x0) at mod_perl.c:1658
#8 0x080895f2 in perl_run_stacked_handlers (hook=0x8109c47 "PerlHandler", r=0x860bf54, handlers=0x82e5c4c) at mod_perl.c:1371
#9 0x080864d8 in perl_handler (r=0x860bf54) at mod_perl.c:897
#10 0x080d2560 in ap_invoke_handler (r=0x860bf54) at http_config.c:517
#11 0x080e6796 in process_request_internal (r=0x860bf54) at http_request.c:1308
#12 0x080e67f6 in ap_process_request (r=0x860bf54) at http_request.c:1324
#13 0x080ddba2 in child_main (child_num_arg=0) at http_main.c:4595
#14 0x080ddd4a in make_child (s=0x8127ec4, slot=0, now=1028133659)
#15 0x080ddeb1 in startup_children (number_to_start=4) at http_main.c:4792
#16 0x080de4e6 in standalone_main (argc=2, argv=0xbffff514) at http_main.c:5100
#17 0x080ded04 in main (argc=2, argv=0xbffff514) at http_main.c:5448
#18 0x40215082 in __libc_start_main () from /lib/i686/libc.so.6
Reading the trace from bottom to top, we can see that it starts with Apache functions, moves on to the mod_perl and then Perl functions, and finally calls functions from the Debug::DumpCore package. At the top we can see the crash_now_for_real( ) function, which was the one that caused the segmentation fault; we can also see that the faulty code was at line 10 of the DumpCore.xs file. And indeed, if we look at that line number we can see the reason for the segfault—the dereferencing of the NULL pointer:
9: int *p = NULL;
10: printf("%d", *p); /* cause a segfault */
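Reading a trace by eye works well, but when you collect many backtraces it can help to locate the interesting frame mechanically: the innermost frame (lowest number) that carries a source FILE:LINE location is usually the first place to look. The C helper below, parse_frame(), is a hypothetical sketch, not part of gdb or mod_perl. It assumes each frame has been joined onto a single line in the format gdb prints above (gdb may wrap long frames onto continuation lines), and gdb's output format is not a stable interface.

```c
#include <stdio.h>
#include <string.h>

/* One backtrace frame as printed by gdb. */
struct frame {
    int  num;        /* frame number, 0 = innermost */
    char func[128];  /* function name */
    char file[128];  /* source file, if known */
    int  line;       /* source line, if known */
};

/* Hypothetical helper: parse one single-line gdb frame such as
 *   #0 0x40448aa2 in crash_now_for_real (...) at DumpCore.xs:10
 * Returns 1 if the frame carries a FILE:LINE location, 0 otherwise
 * (for example, frames shown as "?? ()" or "... from libfoo.so"). */
int parse_frame(const char *text, struct frame *f)
{
    const char *name, *at = NULL, *p;
    int consumed;

    if (sscanf(text, "#%d%n", &f->num, &consumed) != 1)
        return 0;

    /* gdb omits "0xADDR in" for some frames; accept both forms. */
    name = strstr(text, " in ");
    name = name ? name + 4 : text + consumed;
    if (sscanf(name, " %127[^ (]", f->func) != 1)
        return 0;

    /* gdb prints the location after the arguments, so use the last " at ". */
    for (p = strstr(text, " at "); p != NULL; p = strstr(p + 1, " at "))
        at = p;
    if (at == NULL)
        return 0;
    return sscanf(at + 4, "%127[^:]:%d", f->file, &f->line) == 2;
}
```

Calling parse_frame() on each frame of the full backtrace, starting from #0, and stopping at the first one for which it returns 1 would report crash_now_for_real at DumpCore.xs:10, the same line identified above.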
In our example, we knew what Perl script had caused the segmentation fault. In the real world, it is likely that you'll have only the core file, without any clue as to which handler or script has triggered it. The special curinfo gdb macro can help:
panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core
(gdb) source mod_perl-1.xx/.gdbinit
(gdb) curinfo
9:/home/httpd/perl/core_dump.pl
We start gdb as before. The mod_perl source tree contains a .gdbinit file with various useful gdb macros; we load them with gdb's source command, and when we run the curinfo macro we learn that the core was dumped while /home/httpd/perl/core_dump.pl was executing the code at line 9.
Two pieces of information are important for reproducing and resolving a problem: the filename and line number of the Perl code that was running when the fault occurred (the faulty call is Debug::DumpCore::segv( ) in our case) and the actual line where the segmentation fault occurred (the printf("%d", *p) call in the XS code). The former matters for reproducing the problem, since the same function might not fail when called from a different script (though that is not the case in our example, where dereferencing a NULL pointer always causes a segmentation fault).
21.6.5 Extracting the Backtrace Automatically
With the help of Debug::FaultAutoBT, you can try to get the backtrace extracted automatically, without any need for the core file. As of this writing, this CPAN module is very new and doesn't work on all platforms.
To use this module we simply add the following code in the startup file:
use Debug::FaultAutoBT;
use File::Spec::Functions;

my $tmp_dir = File::Spec::Functions::tmpdir( );
die "cannot find out a temp dir" if $tmp_dir eq '';

my $trace = Debug::FaultAutoBT->new(dir => $tmp_dir);
$trace->ready( );
This code tries to automatically figure out the location of the temporary directory, initializes the Debug::FaultAutoBT object with it, and finally uses the method ready( ) to set the signal handler, which will attempt to automatically get the backtrace. Now when we repeat the process of starting the server and issuing a request, if we look at the error_log file, it says:
SIGSEGV (Segmentation fault) in 29072 writing to the core file /tmp/core.backtrace.29072
And indeed the file /tmp/core.backtrace.29072 includes a backtrace similar to the one we extracted before, using the core file.
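The underlying idea, a SIGSEGV handler that records something to a per-process file in a temporary directory before the process dies, can be sketched in C. This is not how Debug::FaultAutoBT is implemented: install_fault_handler() and crash_child_with_handler() are hypothetical names, the handler writes only a fixed message (a real tool would write an actual backtrace, for example with glibc's backtrace_symbols_fd()), and the file name merely imitates the core.backtrace.PID pattern seen in the log above.

```c
#define _XOPEN_SOURCE 700  /* expose POSIX/XSI declarations under strict -std modes */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative sketch only; NOT Debug::FaultAutoBT's implementation.
 * The path is built before any fault, because snprintf() is not
 * async-signal-safe and must not be called from the handler. */
static char bt_path[512];

static void fault_handler(int sig)
{
    static const char msg[] = "SIGSEGV caught; a backtrace would go here\n";
    int fd = open(bt_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd >= 0) {
        ssize_t written = write(fd, msg, sizeof msg - 1);
        (void)written; /* nothing useful can be done on failure here */
        close(fd);
    }
    /* Restore the default action and re-raise, so the process still
     * terminates with SIGSEGV (and may still dump core). */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Hypothetical: install the handler; reports go to DIR/core.backtrace.PID. */
int install_fault_handler(const char *dir)
{
    struct sigaction sa;
    int n = snprintf(bt_path, sizeof bt_path, "%s/core.backtrace.%ld",
                     dir, (long)getpid());

    if (n < 0 || (size_t)n >= sizeof bt_path)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Hypothetical demo: fork a child that installs the handler and reads
 * through a NULL pointer. Returns the child's PID (or -1 on error) and
 * stores the signal that terminated it (or -1) in *sig. */
pid_t crash_child_with_handler(const char *dir, int *sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        int *volatile p = NULL;

        setrlimit(RLIMIT_CORE, &no_core); /* keep the demo core-free */
        if (install_fault_handler(dir) != 0)
            _exit(1);
        printf("%d", *p); /* cause a segfault */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *sig = WIFSIGNALED(status) ? WTERMSIG(status) : -1;
    return pid;
}
```

Because the handler restores the default action and re-raises the signal, the process still dies with SIGSEGV; the handler only leaves a record behind first. Keeping the handler to async-signal-safe calls such as open(), write(), close(), signal() and raise() is what makes it reasonably safe to run after memory has already been corrupted.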