Hello,

One of the following two code snippets uses htmlspecialchars(). Which of the two is correct?
Both snippets build a pagination section. I need to add security so users cannot SQL inject.
I am not using the http_build_query() function here because I have already built a pagination section with it and now want to build one without it. I am just learning different ways to build a pagination section: the old way and the new way.

Page Format 1: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&input_1=brute.com&lmt=1&pg=1

Page Format 2: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&col_2=email_domain&input_1=brute.com&input_2=brute.com&lmt=1&pg=1

$i = 0;
while($i<$total_pages)
{
    $i++;
    if($bool=='and' || $bool=='or')
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'col_2='.urlencode($col_2).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'input_2='.urlencode($input_2).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    else
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    if($i==$page)
    {
        echo "<a href=\"$serps_url\"><b>$i</b></a>";
    }
    else
    {
        echo "<a href=\"$serps_url\">$i</a>";
    }
}

Thank you.

    $i = 0;
    while($i<$total_pages)
    {
        $i++;
        if($bool=='and' || $bool=='or')
        {
            $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'col_2='.urlencode($col_2).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'input_2='.urlencode($input_2).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
        }
        else
        {
            $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
        }
        if($i==$page)
        {
            echo '<a href="' .htmlspecialchars($serps_url) .'">' ."<b>$i</b>" .'</a>';
        }
        else
        {
            echo '<a href="' .htmlspecialchars($serps_url) .'">' ."$i" .'</a>';
        }
    }

The URLs that you linked to for the two page formats are located at https://localhost/, which means that only you have access to them, since they're being served locally from your computer. Therefore, I can't tell what is on those pages.

You do want to HTML escape the contents of a URL, however. For example, I would do:

<a href="https://www.domain.com?foo=abc&amp;bar=def">Link</a>

and not:

<a href="https://www.domain.com?foo=abc&bar=def">Link</a>

Of course, the preferred way is to use http_build_query().
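
For reference, here is a minimal sketch of that approach. The parameter names and values are made up for illustration, and the separator is passed explicitly so the result does not depend on the server's arg_separator.output setting.

```php
<?php
// Hypothetical parameters, for illustration only.
$params = ['foo' => 'abc', 'bar' => 'd e&f'];

// http_build_query() URL-encodes every key and value and joins the pairs with "&".
$url = 'https://www.domain.com/?' . http_build_query($params, '', '&');

// htmlspecialchars() then turns each "&" into "&amp;" so the URL is safe inside an attribute.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Link</a>';

echo $link, "\n";
```

Note the two layers: the query string is URL encoded first, and the finished URL is HTML escaped only at the point where it is printed into the page.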

htmlspecialchars() does not protect against SQL injections, however. It has nothing to do with SQL queries or SQL injection. Instead, it protects against XSS attacks (JavaScript injection attacks) and other problems that have to do with how the web browser interprets your HTML code.

htmlspecialchars() works by escaping characters that have a special meaning in HTML code, such as < and > and " and &.
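
A quick sketch of what that escaping looks like in practice. The input string is invented for the example, and the flags and charset are passed explicitly so the behavior does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical user input containing all four special characters.
$raw = '<b>"Tom" & Jerry</b>';

// ENT_QUOTES escapes both double and single quotes; 'UTF-8' names the input charset.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;b&gt;&quot;Tom&quot; &amp; Jerry&lt;/b&gt;
```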

Protecting against SQL injections is important too. However, that's a different question/topic entirely, and involves protecting your SQL queries when you connect to a database. You can do that by escaping characters that have a special meaning in SQL queries, most notably quotes, or by using prepared statements so that user input is never interpolated into the query text.

Both forms of injection attacks (XSS injections that affect HTML code, and SQL injections that affect SQL queries) can be guarded against with the help of the filter_var() function. However, you would use different filters with filter_var() depending upon what you're trying to prevent from happening and what your needs are.
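
As a rough sketch of what "different filters" can mean: the raw values below are hypothetical stand-ins for $_GET['pg'], $_GET['lmt'] and a text field, and the range limits are arbitrary assumptions. Validating numbers this way complements prepared statements rather than replacing them.

```php
<?php
// Hypothetical raw inputs, standing in for values read from $_GET.
$raw_page  = '3';
$raw_limit = '10; DROP TABLE links';
$raw_text  = '<script>alert(1)</script>';

// FILTER_VALIDATE_INT returns the integer, or false when the input is not a valid integer in range.
$page  = filter_var($raw_page,  FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
$limit = filter_var($raw_limit, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1, 'max_range' => 100]]);

// Fall back to safe defaults when validation fails.
if ($page === false)  { $page = 1; }
if ($limit === false) { $limit = 1; }

// FILTER_SANITIZE_FULL_SPECIAL_CHARS behaves like htmlspecialchars() with ENT_QUOTES.
$escaped = filter_var($raw_text, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $page, ' ', $limit, ' ', $escaped, "\n";
```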

@Dani

I use prepared statements to prevent SQL injection.
My mistake. I meant that I want to learn URL encoding and escaping to prevent XSS attacks.
I know how to build a pagination section with the http_build_query() function, and now I need to properly learn how to build one with htmlspecialchars() to escape URLs and urlencode() where necessary. Therefore, if you do not mind, kindly fix my code and add comments so that I can learn from my mistakes. Future newbies will learn from your two minutes of effort too.
In short, add these where necessary:

htmlspecialchars()
htmlentities()
urlencode()
intval()
etc.

That way, I learn from a single example how to use more than one function properly.

Thanks

@Dani

Just in case you want to see the full context of my pagination section code, here it is:

<?php

//Page Formats: https://daniweb.com/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&input_1=brute.com&lmt=1&pg=1

//Page Formats: https://daniweb.com/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&col_2=email_domain&input_1=brute.com&input_2=brute.com&lmt=1&pg=1

//Report Error.
ini_set('display_errors',1);
ini_set('display_startup_errors',1);
error_reporting(E_ALL);

//Valid $_GET Items.
$tables = array('admin','admin_settings','links','taggings','affiliates','affiliates_settings','partners','partners_settings','sponsors','sponsors_settings','advertisers','advertisers_settings','members','members_settings','searchers','searchers_settings','users','users_settings');

$links_table_columns = array('id','date_and_time','domain','domain_email','word','phrase','exclusive_keywords','exclusive_keyphrases');

//Extract $_GETs.
$tbl = !EMPTY($_GET['tbl'])?strtolower($_GET['tbl']):'links'; //Default must be a quoted string, not a bare constant.
$input_1 = !EMPTY($_GET['input_1'])?$_GET['input_1']:die('Make your input for us to search!');
$input_2 = !EMPTY($_GET['input_2'])?$_GET['input_2']:null;
$col_1 = !EMPTY($_GET['col_1'])?strtolower($_GET['col_1']):die('Input MySql Column to search!');
$col_2 = !EMPTY($_GET['col_2'])?strtolower($_GET['col_2']):null;
$bool = !EMPTY($_GET['bool'])?strtolower($_GET['bool']):null;
$page = !EMPTY($_GET['pg'])?intval($_GET['pg']):1;
$limit = !EMPTY($_GET['lmt'])?intval($_GET['lmt']):1;
$offset = ($page*$limit)-$limit;

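//Table and column names cannot be bound as parameters, so whitelist them before they go into the SQL.
if(!in_array($tbl,$tables))
{
    die('Invalid Mysql Table!');
}

if(ISSET($col_2))
{
    if(!in_array($col_2,$links_table_columns))
    {
        die('Invalid Mysql Column!');
    }
}

if(!in_array($col_1,$links_table_columns))
{
    die('Invalid Mysql Column!');
}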

//Query DB.
mysqli_report(MYSQLI_REPORT_ERROR|MYSQLI_REPORT_STRICT);

$conn = mysqli_connect("localhost","root","","buzz"); //mysqli_connect("server","user","password","db");

mysqli_set_charset($conn,'utf8mb4');

if(mysqli_connect_errno())
{
    printf("Mysqli Connection Error: %s",mysqli_connect_error());
}

$stmt = mysqli_stmt_init($conn);

if($bool=='and')
{
    $input_1 = $_GET['input_1'];
    $input_2 = $_GET['input_2'];
    $sql_count = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? AND $col_2 = ?";
    $sql = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? AND $col_2 = ? LIMIT $limit OFFSET $offset";
}
elseif($bool=='or')
{
    $input_1 = $_GET['input_1'];
    $input_2 = $_GET['input_2'];
    $sql_count = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? OR $col_2 = ?";
    $sql = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? OR $col_2 = ? LIMIT $limit OFFSET $offset";
}
else
{
    $input_1 = $_GET['input_1'];
    $sql_count = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ?";
    $sql = "SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? LIMIT $limit OFFSET $offset";
}

if(!mysqli_stmt_prepare($stmt,$sql_count)) //Fetch All Matching Rows Number.
{
    echo 'Mysqli Error: ' .mysqli_stmt_error($stmt);
    echo '<br>';
    echo 'Mysqli Error No: ' .mysqli_stmt_errno($stmt);
}
else
{
    if($bool=='and' || $bool=='or')
    {
        mysqli_stmt_bind_param($stmt,"ss",$input_1,$input_2);
    }
    else
    {
        mysqli_stmt_bind_param($stmt,"s",$input_1);
    }

    mysqli_stmt_execute($stmt);
    mysqli_stmt_store_result($stmt); //Buffers the result set so that mysqli_stmt_num_rows() can count it.

    //Fetch Matching Rows Count.
    //mysqli_stmt_num_rows() has to come after mysqli_stmt_store_result().
    $rows_count = mysqli_stmt_num_rows($stmt);
    echo 'Total Result: ' .$rows_count;
    mysqli_stmt_free_result($stmt); //Is this really necessary ?

    if(!mysqli_stmt_prepare($stmt,$sql)) //Fetch Rows based on Row Limit per page.
    {
        echo 'Mysqli Error: ' .mysqli_stmt_error($stmt);
        echo '<br>';
        echo 'Mysqli Error No: ' .mysqli_stmt_errno($stmt);
    }
    else
    {
        if($bool=='and' || $bool=='or')
        {
            mysqli_stmt_bind_param($stmt,"ss",$input_1,$input_2);
        }
        else
        {
            mysqli_stmt_bind_param($stmt,"s",$input_1);
        }

        mysqli_stmt_execute($stmt);
        $result = mysqli_stmt_get_result($stmt);

        while($row = mysqli_fetch_array($result,MYSQLI_ASSOC))
        {
            $id = $row['id'];
            $domain = $row['domain'];
            $word = $row['word'];
            $phrase = $row['phrase'];

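            //Escape database values before printing them into HTML.
            echo htmlspecialchars((string)$id) .'<br>';
            echo htmlspecialchars((string)$domain) .'<br>';
            echo htmlspecialchars((string)$word) .'<br>';
            echo htmlspecialchars((string)$phrase) .'<br>';
            echo "<br>";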
        }
    }
}

mysqli_stmt_close($stmt);
mysqli_close($conn);

echo 'Total Pages: ' .$total_pages = ceil($rows_count/$limit);
echo '<br><br>';

$i = 0;
while($i<$total_pages)
{
    $i++;
    if($bool=='and' || $bool=='or')
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'col_2='.urlencode($col_2).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'input_2='.urlencode($input_2).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    else
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    if($i==$page)
    {
        echo '<a href="' .htmlspecialchars($serps_url) .'">' ."<b>$i</b>" .'</a>';
    }
    else
    {
        echo '<a href="' .htmlspecialchars($serps_url) .'">' ."$i" .'</a>';
    }
}

?>

Dani, if my code will not prevent an XSS attack, can you fix it without using http_build_query(), because I already know how to use that one?

Sorry if I was unclear. The second code snippet you posted, that uses urlencode() or intval() for every variable in the query string, and that uses htmlspecialchars() for every variable that you echo out in your HTML code, is on the right track. It's okay that you are not sanitizing $i when you spit it out, because it's clear that there's absolutely no possible way for it to ever be anything other than an integer, so you're already safe there.
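
To make that concrete, here is one way the pagination loop could be written without http_build_query(). This is only a sketch: page_url() and page_links() are hypothetical helper names, and the script path and parameter values are borrowed from Page Format 1 in the question.

```php
<?php
// Hypothetical helper: builds one pagination URL by hand (no http_build_query()).
function page_url(string $script, array $params, int $page): string
{
    $pairs = [];
    foreach ($params as $key => $value) {
        // Every key and value in the query string is URL encoded.
        $pairs[] = urlencode((string) $key) . '=' . urlencode((string) $value);
    }
    $pairs[] = 'pg=' . $page; // Typed as int, so no encoding is needed.
    return $script . '?' . implode('&', $pairs);
}

// Hypothetical helper: returns the HTML for all page links.
function page_links(string $script, array $params, int $total_pages, int $current): string
{
    $html = '';
    for ($i = 1; $i <= $total_pages; $i++) {
        // The finished URL is HTML escaped at the moment it is printed into the attribute.
        $href  = htmlspecialchars(page_url($script, $params, $i), ENT_QUOTES, 'UTF-8');
        $label = ($i === $current) ? "<b>$i</b>" : (string) $i;
        $html .= '<a href="' . $href . '">' . $label . '</a>';
    }
    return $html;
}

$params = ['tbl' => 'links', 'col_1' => 'domain', 'bool' => 'null', 'input_1' => 'brute.com', 'lmt' => 1];
echo page_links('/Pagination_TEMPLATE.php', $params, 2, 1), "\n";
```

Keeping URL building and HTML output in separate functions makes it harder to forget one of the two encoding steps.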

There are 3 simple rules to follow:

  1. ALL HTML code should always be HTML sanitized. All variables that are echo'ed out in HTML should always be sanitized with htmlspecialchars(), regardless of where they appear in your HTML code (within links, not within links, etc.).
  2. All query strings should always be URL encoded. You want to use urlencode() on every variable that appears within a query string.
  3. All URL paths should always be raw URL encoded. You want to use rawurlencode() on every variable that appears within a URL path.
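
The three rules applied together might look like the sketch below. The path segment, search term, and label are invented values, chosen because each contains characters that need encoding.

```php
<?php
$segment = 'my file.php';               // hypothetical URL path segment
$search  = 'cats & dogs';               // hypothetical query string value
$label   = 'Results for <cats & dogs>'; // hypothetical link text

// Rule 3: rawurlencode() for the path. Rule 2: urlencode() for the query string.
$url = '/files/' . rawurlencode($segment) . '?q=' . urlencode($search) . '&pg=' . 2;

// Rule 1: htmlspecialchars() on everything that is echoed into HTML.
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';

echo $html, "\n";
```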

htmlspecialchars() escapes the characters & < > " and converts them into &amp; &lt; &gt; &quot; (as of PHP 8.1 it also converts ' into &#039; by default). You always want to escape these characters in all of your HTML code, whether it's a variable or not. For example, <strong>This is my <code>!</strong> is invalid HTML, because <code> is interpreted as an HTML opening tag. Instead, the only way to represent this with valid HTML would be: <strong>This is my &lt;code&gt;!</strong>. The htmlspecialchars() function just makes it easier to convert those strings for you, so you can do something like: <strong><?php echo htmlspecialchars('This is my <code>!') ?></strong>. So you see, you need to escape your HTML whether it's a string you wrote out yourself, hard-coded into your page, or user input.

For people experienced at writing HTML, it just comes second nature for us to type out &amp;, etc. each time we write a character that needs sanitizing in HTML. There's only 4 to remember, after all! So we only have a need to use htmlspecialchars() when it's user input.

The exact same concepts hold true for urlencode() and rawurlencode(). Your URLs always need to be encoded. It's just that, for most people, it's well understood nowadays that you wouldn't expect to see super weird characters in the middle of a URL. For example, most people understand that there can be no such URL that looks like: https://www.dom&a"in.com/p@age.php. Therefore, it's pretty commonplace to use urlencode() and rawurlencode() only when parts of the URL are variables or user input, where you want to make sure nothing weird was able to sneak in there.
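
If the difference between the two functions is unclear, this small sketch shows it on one made-up value. The only difference visible here is how a space is encoded.

```php
<?php
$value = 'a b+c'; // hypothetical value containing a space and a plus sign

$query_form = urlencode($value);    // space becomes "+", as used in query strings
$path_form  = rawurlencode($value); // space becomes "%20", as used in URL paths

echo $query_form, "\n"; // prints: a+b%2Bc
echo $path_form, "\n";  // prints: a%20b%2Bc

// Both forms decode back to the original with their matching decode function.
$round_trip_ok = urldecode($query_form) === $value && rawurldecode($path_form) === $value;
```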

Here's an example of the attack you're trying to prevent:

Imagine you have the code:

<a href="<?php echo $url ?>">Link</a>

Now imagine if the value of $url is literally the string: My string "uses" quotes. You would be echo'ing out:

<a href="My string "uses" quotes.">Link</a>

The problem with this is that the web browser will interpret it as a link <a> with an href value of: My string because the value of the href runs literally from one " to the next ". So by using rawurlencode($url) we would instead be generating the HTML:

<a href="My%20string%20%22uses%22%20quotes.">Link</a>

The link might still be an invalid URL, but at least it doesn't generate any malformed HTML that could ultimately cause the entire webpage to be broken, or, worse, a malicious user could make the value of $url something that intentionally breaks the site in a way that works to their advantage.
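
Here is a small sketch of that "works to their advantage" case. The malicious value is a made-up example of input that closes the href attribute early and adds an event handler of its own.

```php
<?php
// Hypothetical malicious input: closes the href attribute and injects a new one.
$url = '" onmouseover="alert(1)';

// Unescaped, the browser sees an empty href plus an attacker-controlled attribute.
$unsafe = '<a href="' . $url . '">Link</a>';

// Escaped, the quotes become &quot; and the whole value stays inside href.
$safe = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Link</a>';

echo $unsafe, "\n"; // <a href="" onmouseover="alert(1)">Link</a>
echo $safe, "\n";   // <a href="&quot; onmouseover=&quot;alert(1)">Link</a>

// The harmless example from above, URL encoded.
$encoded = rawurlencode('My string "uses" quotes.');
```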

As for htmlentities(), it's not something you need to concern yourself with right now. Basically, it encodes every character that has a named HTML entity (accented letters, currency symbols, and so on) into its entity form, which your browser renders normally, whereas htmlspecialchars() only encodes the handful of characters that have the ability to break your HTML.
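
A short side-by-side sketch, using a made-up string with one accented letter. The bytes \xC3\xA9 are the UTF-8 encoding of "é", written that way so the example does not depend on how the file is saved.

```php
<?php
$text = "caf\xC3\xA9 <b>"; // "café <b>" in UTF-8

// htmlspecialchars() only touches the characters that are special in HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() additionally converts characters that have a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // the accented letter is left alone, the tag is escaped
echo $entities, "\n"; // prints: caf&eacute; &lt;b&gt;
```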
