Security Issues: ctype_digit() Weirdness


Ask most PHP instructors about form data validation and you'll hear the conventional wisdom: avoid regular expressions whenever one of the PHP ctype_*() family of functions will do the job.  This handy set of functions is convenient and fast, and all of them have been in the language for ages.  Unlike the is_*() functions, which determine the data type of a variable, the ctype_*() family examines the contents of the variable.  There is a bit of weirdness surrounding ctype_digit(), however, of which you might not be aware.  In this article we examine this function's dirty little secret and show you how to avoid misleading results when performing form data validation.


So What About ctype_digit()?

A good starting point when discussing ctype_digit() is to have a look at its signature.  Consulting the documentation page, we come up with this:

ctype_digit(mixed $text): bool

What's important about this signature is that the data type of the variable to be examined is mixed.  This means that, at least theoretically, ctype_digit() is able to examine any variable and return TRUE or FALSE depending on whether or not its contents consist only of the digits 0 through 9.
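
To make the contrast with the is_*() functions concrete, here is a minimal sketch that uses string inputs only, which is how form data arrives.  The sample values are illustrative and not taken from any real form.

```php
<?php
// Form data arrives as strings, so every sample here is a string.
$samples = ['42', '99.99', 'abc', ''];

foreach ($samples as $sample) {
    printf(
        "%-8s is_int: %s  is_numeric: %s  ctype_digit: %s\n",
        var_export($sample, true),
        is_int($sample) ? 'Y' : 'N',      // type check: never true for a string
        is_numeric($sample) ? 'Y' : 'N',  // also accepts decimals such as 99.99
        ctype_digit($sample) ? 'Y' : 'N'  // true only if every character is 0-9
    );
}
```

Only '42' passes ctype_digit() here.  Note that the empty string fails as well, so an empty form field is never reported as digits.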

Using ctype_digit() for Form Data Validation

In the following example, let's assume that you're validating form data and that $_POST contains the following:

$_POST = [
    'id' => '1111',
    'age' => '49',
    'gender' => 'M',
    'amount' => '99.99',
    'life_universe_and_everything' => '42'
];

We then run the post data through a foreach() loop and validate each value using ctype_digit():

// Report, for each form field, whether its value consists only of digits
$ptn = "%30s : %s\n";
printf($ptn, 'Form Field', 'Only Digits');
foreach ($_POST as $key => $value) {
    printf($ptn, $key, (ctype_digit($value) ? 'Y' : 'N'));
}

The resulting output would appear as follows:

                    Form Field : Only Digits
                            id : Y
                           age : Y
                        gender : N
                        amount : N
  life_universe_and_everything : Y

OK, so far so good.  So ... what's the problem?

Houston, We Have a Problem!

What a developer might do next, prior to validation, is perform a routine bit of sanitization.  Here's how the sanitizing code might appear:

$id = $_POST['id'] ?? 0;
$age = $_POST['age'] ?? 0;
$gender = $_POST['gender'] ?? '';
$amount = $_POST['amount'] ?? 0.00;
$life_etc = $_POST['life_universe_and_everything'] ?? 0;

$_POST['id'] = (int) $id;
$_POST['age'] = (int) $age;
$_POST['gender'] = (in_array($gender, ['M','F','X'])) ? $gender : 'X';
$_POST['amount'] = (float) $amount;
$_POST['life_universe_and_everything'] = (int) $life_etc;

However, if we then run the sanitized data through the same loop as shown above, the result is slightly different, as seen here:

                    Form Field : Only Digits
                            id : Y
                           age : Y
                        gender : N
                        amount : N
  life_universe_and_everything : N

The last item should come back as containing only digits, as it's the number 42.  But, as you can see from the output, we now have a false negative.  Help!!!  What's going on?

ctype_digit's Dirty Little Secret

The answer to this perplexing problem can be found if we return to the PHP documentation page for ctype_digit().

As you can see from the documentation, although ctype_digit() accepts arguments of mixed data type, if the data type is int and the value is between -128 and 255 inclusive, it's treated as the ASCII code of a single character!  Any other int is treated as a string containing its decimal digits.  So, in the example shown just above, the id field, with a value of 1111, passed without a problem.  The life_universe_and_everything field, on the other hand, was treated as ASCII code 42, which happens to be the code for an asterisk (*).  Since an asterisk is not a digit, the return value from ctype_digit() was FALSE.
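
The rule described above can be sketched in plain PHP.  The function effective_ctype_string() below is a hypothetical helper written for this article, not part of PHP; it only mimics the documented handling of int arguments so that we can see which string ends up being examined.

```php
<?php
// Hypothetical helper (not part of PHP): returns the string that
// ctype_digit() is documented to examine when it is handed an int.
function effective_ctype_string(int $value): string
{
    if ($value >= -128 && $value <= 255) {
        // Treated as a single character code; negative values have 256 added.
        return chr($value < 0 ? $value + 256 : $value);
    }
    // Any other int is read as the string of its decimal digits.
    return (string) $value;
}

foreach ([1111, 49, 42] as $number) {
    $examined = effective_ctype_string($number);
    printf(
        "%4d is examined as %-6s : %s\n",
        $number,
        var_export($examined, true),
        ctype_digit($examined) ? 'Y' : 'N'
    );
}
```

With these three values, 1111 is examined as the string '1111', 49 as the character '1', and 42 as the character '*', which matches the Y, Y, N pattern seen in the sanitized output.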

Got It ... But What About Age?

But now we have a bigger problem: why did age return TRUE?  The answer to this again takes us back to the ASCII table.  The value for age was 49.  A look at the ASCII table shows us that 49 is the ASCII code for the character 1!  Accordingly, ctype_digit() returns a value of TRUE, as 1 is certainly a digit.
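
To see which small integers land on digit characters, here is a short sketch that walks the ASCII range 0 to 127 and keeps the codes whose character passes ctype_digit().  It assumes PHP 7.4 or later for the arrow function; the variable name $digitCodes is just for illustration.

```php
<?php
// Collect every ASCII code (0..127) whose character counts as a digit.
$digitCodes = array_values(array_filter(
    range(0, 127),
    fn (int $code): bool => ctype_digit(chr($code))
));

// Show each surviving code next to the character it stands for.
foreach ($digitCodes as $code) {
    printf("%d => '%s'\n", $code, chr($code));
}
```

Only the codes 48 through 57 survive, standing for the characters 0 through 9.  An age of 49 therefore passes by coincidence, while an age of 42 or 65 would not.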

Final Thoughts

When performing form data validation using any of the ctype_*() family of functions, especially ctype_digit(), it is best to leave the data present in $_POST (or $_GET) in its original state while performing validation.  The ctype_*() family works best with string data, despite the fact that it technically accepts any data type.  Note also that passing a non-string argument to these functions has been deprecated since PHP 8.1.  You can then perform sanitization after the validation has occurred.
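
As a minimal sketch of this validate-first, sanitize-second order, consider the following.  The helper digits_to_int() and the $post array are hypothetical stand-ins used for illustration; in a real form handler you would pass $_POST instead.

```php
<?php
// Hypothetical helper: validate the raw string first, cast only afterwards.
function digits_to_int(array $input, string $key): ?int
{
    $raw = $input[$key] ?? null;

    // Only a string made up entirely of the digits 0-9 is accepted.
    if (!is_string($raw) || !ctype_digit($raw)) {
        return null;
    }

    // Validation passed, so the cast is now safe to perform.
    return (int) $raw;
}

// Stand-in for $_POST, using the same sample values as earlier.
$post = [
    'id' => '1111',
    'age' => '49',
    'amount' => '99.99',
    'life_universe_and_everything' => '42',
];

var_dump(digits_to_int($post, 'life_universe_and_everything')); // int(42)
var_dump(digits_to_int($post, 'amount'));                       // NULL
```

The is_string() check is a deliberate design choice: it keeps integers from ever reaching ctype_digit(), so the ASCII-code interpretation described above cannot come into play.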