Hi everyone and thanks for reading,

Now that I've been getting my hands dirty with PHP a lot more, I've been starting to freelance and I'm wanting to make sure I have the right systematic approach to protecting data before I commit to any projects.

Whenever we accept user input either to be displayed somewhere on our site, or to be used in an SQL query (perhaps a keyword search or a login), the rule of thumb is: "NEVER TRUST USER INPUT".

Throughout my research the two functions that seem to keep coming up are mysql_real_escape_string() and htmlspecialchars(). The thing that is confusing me is how and when I should be using these.

For example if I have a login form, then surely I would only parse my data through mysql_real_escape_string() as the other function would be unnecessary?

One that I've not figured out conceptually yet is: lets say for example I am storing information about people, such as their name, home town, country and so on. Do I run all these $_POST variables through both commands? Or do I only run it through mysql_real_escape_string() and then when I'm echoing back OUT of the database, use htmlspecialchars()?

I hope I'm not rambling at this point but I'm just trying to figure out if I store the information after I run it through htmlspecialchars, or do I do that when I'm echoing back out of the database. I realise if I do this BEFORE, then if I export to plain text, I'll have a load of HTML characters that no-one will understand.

I know I've not mentioned addslashes() but I've been told that mysql_real_escape_string() is better than addslashes, and all you have to do is run the content through stripslashes() when you echo it back out.

Thanks for any help,


Anthony

First, php.net is a great place to start to learn about the functions and can be alot quicker than asking here ;)

One thing i will say, is never, ever use $_POST, $_GET, $_QUERY...etc directly in SQL queries or the like (you're asking for trouble if you do), always sanitize them first, if the value should only contain numbers, check this first and put them into their own variables. REGEX is ideal for this sort of job.

Always test your applications and try to find security holes in them, or ask a friend to do this as well before publishing it.

htmlspecialchars() will take characters like & and < and > and convert them to their HTML entities, such as &amp; &lt; &gt
in doing this, it means that people cant put HTML into the database and at best mess your styles, at worst add javascript which could well be malicious.
You can do this on either the input or the output, but since most times I assume you will be displaying to a HTML page it makes little difference, only when printing to plain text will it be better to do it on output..

mysql_real_escape_string() would be better to use over addslashes(), they do basically the same thing though, take a look here
stripslashes() will obviously remove any escaped characters from the strings, so this would be needed otherwise all the " and ' would show as \" and \'.

I usually use htmlspecialchars() on the way out of the database because they pose no threat to the database as long as you use mysql_real_escape_string() and other proper measures to ensure data security. I think this better preserves the data and eliminates possible future adjustments to the data. Another thing that I think is way underrated are sql transactions. For instance, to start a transaction you just run the sql command "begin;". At that point you can run any number of queries and none of them will be committed until you run the sql command "commit;". This allows you to run long scripts with many updates, deletes and inserts and literally have no database clean up in case something goes south, you just call "rollback;". It also makes it easy to test such scripts because you can cancel the transaction at the end by running the sql command "rollback;".

Thanks to the both of you for your tips and advice. I've now concluded that sanitizing the data with real_escape_string on the way in is essential, and stripping the slashes is needed on the way out.

Great link by the way,


Anthony

Guys I just have one more question. I made the following script (this is a watered down version just to show you generally what's happening):

if (get_magic_quotes_gpc()) {
   $name  = stripslashes($_POST["name"]);
   $email = stripslashes($_POST["email"]);
   $post  = stripslashes($_POST["post"]);
} else {
   $name  = $_POST["name"];
   $email = $_POST["email"];
   $post  = $_POST["post"];
}

$name = mysql_real_escape_string($name);
$email = mysql_real_escape_string($email); 
$post = mysql_real_escape_string($post);

echo "<p>Name: $name<br />Email: $email<br /> Post: $post<br />.</p>\r\n";

$result = mysql_query("INSERT INTO shouts (name, email, post) VALUES ('$name', '$email', '$post')", $connection);

I looked up the PHP manual for mysql_real_escape_string but it didn't really answer the following question so I'm hoping one of you can help again. If for example the $_POST data for name, email and post was: "test'ing", then the echo would show me this:

Name: \"test\'ing\". Email: \"test\'ing\". Post: \"test\'ing\".

That by my estimation is correct, however when I go to the database records, none of the quotation marks have been escaped! Is the function only meant to escape them upon execution to make sure no foul play occurs, or is there something wrong with my scripting and they are actually meant to appear escaped within the database?

Thanks again,

Anthony

I've figured out that addslashes() is a PHP command to escape slashes altogether, and that is why the slashes remain in the database. The mysql_real_escape_string() command is a MySQL command to specifically make sure that the SQL will execute properly without any quotes messing it up. After it executes properly, the data is put inside the database without any slashes.

Thanks to all the previous posters and the link below explains the whole situation completely, and in plain english:
http://www.phpbuilder.com/board/showthread.php?t=10330933

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.