GDB, or the GNU Project Debugger, allows you to debug a program while it is executing. GDB can start a program, stop it on specified conditions, examine the execution environment when the program is stopped and allow you to change things inside the running program itself.
Let’s get started by changing to the php-src directory:
$ cd ~/php-src
The .gdbinit script is provided with the PHP source distribution. If you copy this file to your home directory, gdb will load it every time it starts.
$ cp .gdbinit ~
Alternatively you can load the script into gdb at runtime:
(gdb) source /home/vagrant/php-src/.gdbinit
The .gdbinit script provides a number of commands within gdb that are helpful when debugging extensions.
To view the commands that become available once the .gdbinit script is loaded, try the following:
(gdb) help user-defined
Of particular interest is zbacktrace which enables one to backtrace from C to PHP.
To familiarise ourselves with GDB and its commands available for debugging we will step through a PHP PCRE extension test.
The run-tests.php script locates test files and then creates a unique shell script to execute each one in a pristine context. To avoid the complexity and overhead of dealing with GDB and fork/exec, we will run our test directly and bypass the run-tests.php script.
Now let’s load gdb and tell it we intend to run the phpt test file by executing the PHP CLI binary with the path to the test file as its argument:
$ gdb --args sapi/cli/php ext/pcre/tests/preg_match_basic.phpt
The PHP function we intend to examine is preg_match() and is defined in the ext/pcre/php_pcre.c file as a static PHP_FUNCTION wrapper around another static function php_do_pcre_match.
Our test file, ext/pcre/tests/preg_match_basic.phpt, evaluates a number of regular expressions against a string and extracts matches into a variable. The .phpt file defines a PHP script to run as input and the corresponding output data that it expects to see from a successful test.
$string = 'Hello, world. [*], this is \ a string';
Running the above should produce:
Here int(1) signifies that preg_match returned 1 for a match, followed by an array(1) of matched strings. In this case there is only one match, “Hello, ”.
Knowing that php_do_pcre_match is called, let’s tell gdb we would like to break on invocation of php_do_pcre_match and then run the test.
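A breakpoint location can be given as a function name, as a line number in the current source file, or as file:line. The examples below use the first two forms.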
(gdb) break php_do_pcre_match
After a few moments, you should see something similar to:
Breakpoint 1, php_do_pcre_match (execute_data=0x7ffff0214260, return_value=0x7ffff0214100,
Let’s use the zbacktrace command to find out where we came from.
The above output shows that we are inside the function that generates the output used as the comparison for the first assertion of our unit test. The first assertion in the preg_match_basic.phpt file tries to find matches for the regex “/^[hH]ello,\s/” against our subject string “Hello, world. [*], this is \ a string”.
Running next allows us to move line by line through the execution of the original source code. Enter next a few times until you reach a function invocation that looks like this:
In GDB, we can step inside any function call. The ‘next’ command steps over function calls by default, whereas ‘step’ follows the execution path into them.
Let’s step inside! For function calls that span multiple lines in the source file, you may have to call step a few times.
Now we are inside the lowest level of the extension that wraps around the PCRE library. Looking at the source for php_pcre.c, we see that php_do_pcre_match calls another function, php_pcre_match_impl, after parsing the arguments from the Zend VM. We know the string that’s the subject for evaluation, so let’s see if we can use the subject argument to php_pcre_match_impl to catch it.
In GDB a breakpoint can take a condition. Knowing that a string starting with the character ‘H’ is the subject for our unit tests, let’s try to break on that condition. To do so we’ll kill the currently running program, delete the existing breakpoint and create a new conditional one.
Delete the existing breakpoint:
(gdb) delete 1
Now let’s set a conditional breakpoint in the lower level function we stepped into before:
(gdb) break php_pcre_match_impl if subject[0] == 'H'
Breakpoint 2, php_pcre_match_impl (pce=0x1a98b70, subject=0x7ffff02028b8 "Hello, world. [*],
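GDB evaluates a breakpoint condition as an expression in the frame where the breakpoint sits. The breakpoint output shows subject as a pointer to characters, so a test on its first character has to dereference that pointer. A minimal C sketch of the same check follows; the helper name demo_subject_starts_with is hypothetical and is not part of PHP or GDB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PHP or GDB): mirrors the kind of
 * expression a breakpoint condition evaluates against `subject`. */
static int demo_subject_starts_with(const char *subject, char wanted)
{
    assert(subject != NULL); /* a NULL subject cannot be dereferenced */

    /* subject[0] reads the first character of the string. Comparing
     * `subject` itself with 'H' would compare the pointer value (an
     * address) with the number 72, not the text. */
    return subject[0] == wanted;
}
```

Written this way, the condition is true only for calls whose subject string begins with the wanted character.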
Let’s have a look at what’s happening inside this function and start execution line by line:
(gdb) info args
The info args output clearly shows the string we were looking for, and info locals shows the variables in scope for the current function. Now we can start moving through the code and find out how php_pcre_match_impl works. Looking at the source, we can see a call to the library function pcre_exec at line 688. Let’s break there.
First we create a new breakpoint, continue execution, and then step inside the pcre_exec call.
(gdb) break 688
Once we’re done debugging the pcre_exec call, we can use ‘finish’ to return from the current function and look at the results being passed back to the extension.
Now that we’re back from the call to count = pcre_exec(…), let’s look at the result.
Run till exit from #0 php_pcre_exec (argument_re=0x1a98a00, extra_data=0x1a98a80,
Excellent, so our library function pcre_exec correctly found 1 match for the regular expression against the subject and returned the result to the local variable count in the php_pcre_match_impl function.
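To read the count value more precisely, it helps to recall how the PCRE library documents the return of pcre_exec: a positive value is one more than the highest-numbered captured substring that was set, zero means the offsets vector was too small, and negative values are error codes, with -1 meaning no match. The sketch below encodes that reading in plain C without linking against PCRE; the demo_ names are hypothetical, and DEMO_PCRE_ERROR_NOMATCH is assumed to mirror PCRE_ERROR_NOMATCH from pcre.h.

```c
#include <assert.h>

/* Assumption: mirrors PCRE_ERROR_NOMATCH (-1) from pcre.h without including it. */
#define DEMO_PCRE_ERROR_NOMATCH (-1)

/* Hypothetical classification of a pcre_exec()-style return code. */
enum demo_outcome {
    DEMO_MATCHED,
    DEMO_MATCHED_OVECTOR_TOO_SMALL,
    DEMO_NO_MATCH,
    DEMO_ERROR
};

static enum demo_outcome demo_classify_count(int count)
{
    if (count > 0)
        return DEMO_MATCHED;
    if (count == 0)
        return DEMO_MATCHED_OVECTOR_TOO_SMALL;
    if (count == DEMO_PCRE_ERROR_NOMATCH)
        return DEMO_NO_MATCH;
    return DEMO_ERROR;
}

/* For a successful match, the whole-pattern match is counted first,
 * so the number of capture groups that were set is count - 1. */
static int demo_captured_groups(int count)
{
    assert(count > 0); /* only meaningful after a successful match */
    return count - 1;
}
```

Under this reading, count == 1 for the regex /^[hH]ello,\s/ means the whole pattern matched and no capture groups were set, which fits a pattern without parentheses.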
Let’s learn how to unpack and inspect some internal Zend Engine data types to look at the results being returned to PHP. First, let’s finish the current php_pcre_match_impl function and get back to our entry point, php_do_pcre_match.
The above message indicates we’ve returned to php_do_pcre_match. Let’s investigate the local variables available to us.
(gdb) info locals
We know that the third argument of preg_match takes a variable in which an array of the matched subpatterns is returned. The results are held in the subpats pointer, so let’s find out more about it.
First, let’s find out the type of subpats using the ptype command.
(gdb) ptype subpats
The above output indicates that subpats is a pointer to a _zval_struct that holds a zend_value type named value. What’s a zend_value? It’s the generic data type PHP uses internally for most of its scalars/primitives.
We can find out more information about zend_value using the same ptype command. Note that we can query either the type directly or a variable of that type; ptype zend_value would return the same information.
(gdb) ptype subpats->value
The above exposes the generic data structure PHP uses internally for holding values. We know the return type is an array, so we expect zend_array *arr to hold our information.
Next, we print the type of a zend_array.
(gdb) ptype subpats->value->arr
Aha! PHP uses a ‘bucket’ concept to store its array data. We expected 1 match from our unit test on the word ‘Hello’, so let’s find out how many items are in the subpats array.
(gdb) print subpats->value->arr->nNumOfElements
So there is one element in our array, presumably held in the arData pointer to a Bucket. Let’s find out more about the structure of a Bucket type by querying the type directly instead of the arData pointer.
(gdb) ptype Bucket
So a Bucket consists of a val and a key. A zval has a value that’s a zend_value, and we know the array result should contain a string for the subpattern match. Therefore subpats->value->arr->arData->val->value->str should be a zend_string with a len property and a char array val.
(gdb) print subpats->value->arr->arData->val->value->str->len
So our result has a length of 7. Let’s look at the result itself. GDB’s print command (p for short) has a number of additional and useful features. First, we can specify the output format, and for arrays we can pass a length using the @ operator.
(gdb) print/c subpats->value->arr->arData->val->value->str->val@7
You can also use the x command to inspect variables/memory:
(gdb) x subpats->value->arr->arData->val->value->str->val
And there it is, our match “Hello, ” being returned to PHP.
What have we learnt? A zval is just a struct that contains a zend_value, and a zend_value is a generic data type that can represent any PHP value. A PHP array consists of “bucket” structures (among other things), each of which ultimately holds another zval (i.e. a value of any type).
We have a zval -> zend_value -> array -> bucket -> zval -> zend_value -> str where our subpattern match is held.
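The chain above can be modelled in a few lines of plain C. This is a simplified sketch, not the engine’s real definitions: the demo_ type and function names are hypothetical, only the field names (value, arr, arData, nNumOfElements, val, key, str, len) follow the walkthrough, and the real structures carry more fields such as reference counts and type information.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the engine types (assumption: many real
 * fields, such as refcounts and type tags, are left out). */
typedef struct demo_string {
    size_t len;
    char val[]; /* flexible array member holding len bytes plus a NUL */
} demo_string;

struct demo_array;

typedef union demo_value { /* one slot, interpreted according to the type */
    long lval;
    double dval;
    demo_string *str;
    struct demo_array *arr;
} demo_value;

typedef struct demo_zval { demo_value value; } demo_zval;

typedef struct demo_bucket {
    demo_zval val;    /* the element itself, again a zval */
    unsigned long h;  /* numeric index or hash */
    demo_string *key; /* NULL for integer keys */
} demo_bucket;

typedef struct demo_array {
    demo_bucket *arData;
    unsigned int nNumOfElements;
} demo_array;

static demo_string *demo_string_new(const char *text)
{
    size_t len = strlen(text);
    demo_string *s = malloc(sizeof(*s) + len + 1);
    assert(s != NULL);
    s->len = len;
    memcpy(s->val, text, len + 1);
    return s;
}

/* Build a zval holding a one-element array whose element is a string. */
static demo_zval *demo_match_array_new(const char *match)
{
    demo_zval *zv = malloc(sizeof(*zv));
    demo_array *arr = malloc(sizeof(*arr));
    assert(zv != NULL && arr != NULL);
    arr->arData = malloc(sizeof(demo_bucket));
    assert(arr->arData != NULL);
    arr->arData[0].val.value.str = demo_string_new(match);
    arr->arData[0].h = 0;
    arr->arData[0].key = NULL;
    arr->nNumOfElements = 1;
    zv->value.arr = arr;
    return zv;
}

/* zval -> zend_value -> array -> bucket -> zval -> zend_value -> str */
static const demo_string *demo_first_element(const demo_zval *subpats)
{
    return subpats->value.arr->arData->val.value.str;
}

static void demo_match_array_free(demo_zval *zv)
{
    free(zv->value.arr->arData[0].val.value.str);
    free(zv->value.arr->arData);
    free(zv->value.arr);
    free(zv);
}
```

In C source, value and val are embedded members rather than pointers, so they are reached with `.` where the GDB session above writes `->` throughout.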
As a final note, it’s important to know that internal PHP functions appear in GDB’s symbol table under names prefixed with zif_. For example, we could have chosen to break on zif_preg_match instead of php_do_pcre_match initially. This would have stopped at the static function wrapper around php_do_pcre_match. Class methods are exposed under the zim_ prefix with the pattern zim_classname_methodname.
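These prefixes come from preprocessor token pasting: the macro used to declare a function glues the prefix onto the name before the compiler ever sees it. The sketch below imitates that idea with hypothetical DEMO_ macros. The engine’s own macros produce functions with a different signature (the breakpoint output earlier shows execute_data and return_value parameters), so the int (void) signature here is purely an assumption to keep the example small.

```c
#include <assert.h>

/* Simplified stand-ins for the engine macros: paste a prefix onto the name. */
#define DEMO_FN(name)                 zif_##name
#define DEMO_MN(classname, name)      zim_##classname##_##name
#define DEMO_FUNCTION(name)           int DEMO_FN(name)(void)
#define DEMO_METHOD(classname, name)  int DEMO_MN(classname, name)(void)

/* Expands to: int zif_demo_match(void) */
DEMO_FUNCTION(demo_match)
{
    return 1;
}

/* Expands to: int zim_DemoClass_run(void) */
DEMO_METHOD(DemoClass, run)
{
    return 2;
}

/* The expanded names are ordinary symbols, so code (and a debugger)
 * can refer to them directly. */
static int demo_call_generated(void)
{
    int sum = zif_demo_match() + zim_DemoClass_run();
    assert(sum == 3);
    return sum;
}
```

In a binary built from this sketch with debug symbols, the breakpoints would be set on zif_demo_match and zim_DemoClass_run, because those are the names the compiler actually saw.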