Suppose you have a PHP script where a user is prompted to enter a number. You then do something with that number: you increment it, perform some other math calculation on it, search the database for records with the ID # the user passed in as a query string, etc.
But what if your script is expecting a number and the user passes in something like apple instead? What if you were expecting the end-user to visit the URL www.example.com?id=5 but, instead, they went to the URL www.example.com?id=apple?
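You can't meaningfully do math on apple: depending on your PHP version, incrementing it or adding to it gives you an error, a warning, or a nonsense result. You can't look up the ID # apple in your database either ... or worse yet, a malicious person can pass in an SQL injection attack string that could actually destroy your database!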
Therefore, you always want to sanitize user input into the format you are expecting.
If you are expecting $variable to be an integer, then do $variable = intval($variable); and that will convert whatever $variable happens to be to its integer equivalent (a non-numeric string such as apple becomes 0). If you are expecting $variable to be a positive integer (e.g. an ID # in a database), then do $variable = abs(intval($variable)); (note that this can still produce 0). If you want to strip HTML tags from a string, you can use the strip_tags() PHP function.
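Here is a minimal sketch of those conversions, assuming the ID arrives in an array shaped like $_GET (which PHP fills from the query string). The helper name sanitize_id() is hypothetical, not a built-in PHP function. Note that it coerces bad input into a number rather than rejecting it.

```php
<?php
// Hypothetical helper: read "id" from a query-string array (e.g. $_GET)
// and coerce it to a non-negative integer. A missing "id" becomes 0.
function sanitize_id(array $query): int
{
    return abs(intval($query['id'] ?? 0));
}

// Simulated query strings, as PHP would populate $_GET for each URL
var_dump(sanitize_id(['id' => '5']));      // int(5)   (?id=5)
var_dump(sanitize_id(['id' => 'apple']));  // int(0)   (?id=apple)
var_dump(sanitize_id(['id' => '42abc']));  // int(42)  leading digits are kept
var_dump(sanitize_id(['id' => '-7']));     // int(7)   abs() drops the sign
var_dump(sanitize_id([]));                 // int(0)   no id at all

// strip_tags() removes HTML tags but keeps the text between them
var_dump(strip_tags('<b>Hello</b> <i>world</i>')); // string(11) "Hello world"
```

Because 0 is never a valid auto-increment ID, a caller can treat a result of 0 as "bad or missing input".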
PHP additionally has sanitization filters to ensure a string is properly formatted as an email address or a URL (the PHP manual lists the full set of filters). You can use them like this:
// To strip all characters except those that are permitted in email addresses
$email = filter_var($email, FILTER_SANITIZE_EMAIL);
You can also use validation filters, which return the value if a string is in a specific format and false if it isn't. For example:
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
echo 'This string looks like a valid email address.';
}
By using filter_var() with sanitization and validation filters, you can ensure that a string doesn't contain hidden, weird, or invalid characters, or have an invalid format that can screw up what you're expecting the string to look like.
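The sketch below contrasts the two kinds of filters on a few sample strings (the inputs are made up for illustration). It also uses FILTER_VALIDATE_INT with a min_range option, which rejects a non-numeric ID outright instead of silently converting it the way intval() does.

```php
<?php
// Sanitizing: FILTER_SANITIZE_EMAIL removes characters that are not allowed
// in email addresses (here, the space and the parentheses).
$dirty = 'da ni@exa(mple).com';
$clean = filter_var($dirty, FILTER_SANITIZE_EMAIL);
var_dump($clean); // string(16) "dani@example.com"

// Validating: FILTER_VALIDATE_EMAIL returns the string itself if it looks
// like an email address, or false if it does not.
var_dump(filter_var('dani@example.com', FILTER_VALIDATE_EMAIL)); // string(16) "dani@example.com"
var_dump(filter_var('not an email', FILTER_VALIDATE_EMAIL));     // bool(false)

// Validating an ID: only whole numbers >= 1 pass; everything else is false.
$options = ['options' => ['min_range' => 1]];
var_dump(filter_var('5', FILTER_VALIDATE_INT, $options));     // int(5)
var_dump(filter_var('apple', FILTER_VALIDATE_INT, $options)); // bool(false)
var_dump(filter_var('0', FILTER_VALIDATE_INT, $options));     // bool(false)
```

Since validation filters return the value itself on success, a valid result can be falsy (for example, int 0 when no min_range is set). When that matters, compare with `!== false` rather than relying on a plain `if`.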
Suppose you want to pass a string into MySQL. I've seen people write MySQL queries like this:
SELECT * FROM user_table WHERE string = '$string';
The problem with this is: what if $string contains a quote?! What if the value of $string is:
My name is 'Dani'
Then, you'd actually be running the SQL query:
SELECT * FROM user_table WHERE string = 'My name is 'Dani'';
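Notice that the quote before Dani closes the string early, leaving Dani'' outside of it. That would throw a MySQL syntax error. But it gets worse! What if the value of the string that the end-user passed into the form or URL is:
Dani'; DROP TABLE user_table; #
Then the query becomes:
SELECT * FROM user_table WHERE string = 'Dani'; DROP TABLE user_table; #';
If that query is sent through a function that allows multiple statements in one call (such as mysqli::multi_query()), you would then execute two SQL queries, because MySQL treats everything after the # as a comment:
SELECT * FROM user_table WHERE string = 'Dani';
DROP TABLE user_table;
The end-user could literally delete your entire table!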
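You can see the problem without touching a database. The minimal sketch below only builds the SQL text and prints it; nothing is executed. The function build_query_unsafely() is a hypothetical name for the unsafe "paste the input into the query" pattern.

```php
<?php
// Hypothetical helper showing the UNSAFE pattern: user input is interpolated
// directly into the SQL string, so any quotes in it become part of the SQL.
function build_query_unsafely(string $string): string
{
    return "SELECT * FROM user_table WHERE string = '$string';";
}

echo build_query_unsafely('Dani'), "\n";
// SELECT * FROM user_table WHERE string = 'Dani';

echo build_query_unsafely("My name is 'Dani'"), "\n";
// SELECT * FROM user_table WHERE string = 'My name is 'Dani'';

echo build_query_unsafely("Dani'; DROP TABLE user_table; #"), "\n";
// SELECT * FROM user_table WHERE string = 'Dani'; DROP TABLE user_table; #';
```

The input's own quote ends the SQL string literal, so whatever follows it is read by MySQL as SQL rather than as data.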
If you want to pass a string into MySQL, PHP's mysqli extension has a sanitization function, mysqli::real_escape_string(), that escapes potentially dangerous characters (such as quotes) in the string, so that you aren't susceptible to these types of hacks and the string stays within the quotes the way you intended it to. Use it anytime you need to pass a string into a MySQL query.
You can use it like this:
// New database connection
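$mysqli = new mysqli("localhost", "my_user", "my_password", "database_name");
// Set the connection's character set; real_escape_string() relies on it to escape correctly
$mysqli->set_charset("utf8mb4");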
// Value of the string (either via query string, form, some other unknown/unsanitized user input, etc.)
$string = "This is Dani's string";
// It's important to sanitize the string before using it in a query!
$string = $mysqli->real_escape_string($string);
$query = " SELECT string FROM user_table WHERE string = '$string' ";
// Execute the MySQL query
$result = $mysqli->query($query);
If you want to sanitize a string before it is echo'ed out to the web browser, you want to use htmlspecialchars(). You would do something such as:
$string = 'An apple & a banana is invalid HTML.';
// Converts the & (which is invalid HTML) to &amp;
echo htmlspecialchars($string);
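A slightly fuller sketch passes the flags explicitly: ENT_QUOTES converts single quotes as well as double quotes, and 'UTF-8' states the input encoding. The wrapper name escape_html() is hypothetical; it simply calls htmlspecialchars() with those arguments.

```php
<?php
// Hypothetical wrapper: escape text right before printing it into HTML.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_html('An apple & a banana is invalid HTML.'), "\n";
// An apple &amp; a banana is invalid HTML.

echo escape_html('<script>alert("hi")</script>'), "\n";
// &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;

echo escape_html("Dani's string"), "\n";
// Dani&#039;s string
```

The second example shows why this matters: an injected script tag is displayed as text instead of running in the visitor's browser.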
In conclusion, always sanitize any variable whose value you don't have 1000% control over (e.g. all user input). However, sanitization should always be the last step, right before the value is used in a database query, echo'ed to the screen, etc. You don't want to accidentally perform PHP-based calculations or manipulation on sanitized data, or you might wind up with unexpected results, depending on what you're trying to do.
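Here is a small sketch of those "unexpected results", using htmlspecialchars() as the sanitizer and a made-up string. Measuring or truncating the escaped text works on the entity characters, not on what the user actually typed.

```php
<?php
$raw = 'Tom & Jerry';
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // "Tom &amp; Jerry"

// Manipulating the escaped value gives surprising results:
var_dump(strlen($raw));            // int(11)
var_dump(strlen($escaped));        // int(15)  "&amp;" counts as 5 characters
var_dump(substr($escaped, 0, 6));  // string(6) "Tom &a"  a broken entity

// Manipulate the raw value first, then escape as the very last step:
var_dump(htmlspecialchars(substr($raw, 0, 6), ENT_QUOTES, 'UTF-8')); // string(10) "Tom &amp; "
```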