@dani
I have been going through these two working code samples of yours and have some basic questions.
Code 1:
<?php
ini_set('display_errors',1);
ini_set('display_startup_errors',1);
error_reporting(E_ALL);
//Dan's Code.
//Code from: https://www.daniweb.com/programming/web-development/threads/538868/simplehtmldom-failing#post2291972
//Sitemap Protocol: https://www.sitemaps.org/protocol.html
// Initiate ability to manipulate the DOM and load that baby up
$doc = new DOMDocument();
$message = file_get_contents('https://www.daniweb.com/programming/web-development/threads/538868/simplehtmldom-failing#post2288453');
// https://www.php.net/manual/en/function.libxml-use-internal-errors.php
libxml_use_internal_errors(true);
// https://www.php.net/manual/en/domdocument.loadhtml.php
$doc->loadHTML($message, LIBXML_NOENT|LIBXML_COMPACT);
// https://www.php.net/manual/en/function.libxml-clear-errors.php
libxml_clear_errors();
// Fetch all <a> tags
$links = $doc->getElementsByTagName('a');
// If <a> tags exist ...
if ($links->length > 0)
{
    // For each <a> tag ...
    foreach ($links as $link)
    {
        $link->setAttribute('class', 'link-style');
    }
}
// Because we are actually manipulating the DOM, DOMDocument will add complete <html><body> tags we need to strip out
$message = str_replace(array('<body>', '</body>'), '', $doc->saveHTML($doc->getElementsByTagName('body')->item(0)));
?>
Code 2:
<?php
ini_set('display_errors',1);
ini_set('display_startup_errors',1);
error_reporting(E_ALL);
//Dan's Code.
//CODE FROM: https://www.daniweb.com/programming/web-development/threads/540121/how-to-extract-meta-tags-using-domdocument
$url = "https://www.daniweb.com/programming/web-development/threads/540013/how-to-find-does-not-contain-or-does-contain";
// https://www.php.net/manual/en/function.file-get-contents.php
$html = file_get_contents($url);
//https://www.php.net/manual/en/domdocument.construct.php
$doc = new DOMDocument();
// https://www.php.net/manual/en/function.libxml-use-internal-errors.php
libxml_use_internal_errors(true);
// https://www.php.net/manual/en/domdocument.loadhtml.php
$doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING);
// https://www.php.net/manual/en/function.libxml-clear-errors.php
libxml_clear_errors();
//EXTRACT METAS
// https://www.php.net/manual/en/domdocument.getelementsbytagname.php
$meta_tags = $doc->getElementsByTagName('meta');
// https://www.php.net/manual/en/domnodelist.item.php
if ($meta_tags->length > 0)
{
    // https://www.php.net/manual/en/class.domnodelist.php
    foreach ($meta_tags as $tag)
    {
        // https://www.php.net/manual/en/domelement.getattribute.php
        $name = $tag->getAttribute('name');
        $content = $tag->getAttribute('content');
        echo 'Name: ' . $name . '<br>';
        echo 'Content: ' . $content . '<br>';
    }
}
//EXAMPLE 1: EXTRACT TITLE
//CODE FROM: https://www.daniweb.com/programming/web-development/threads/540121/how-to-extract-meta-tags-using-domdocument
$title_tag = $doc->getElementsByTagName('title');
if ($title_tag->length > 0)
{
    // item(0) returns the first <title> element
    $title = $title_tag->item(0)->textContent;
    echo 'Title: ' . $title . '<br>';
}
?>
Q1.
In the first code, you call new DOMDocument() before file_get_contents(). In the second code, you do it the other way round. As far as I can tell the order does not matter, but which order is best practice, and does either one make the script run faster?
Q2.
In both codes, you wrote these lines (quoted here as they appear in the second code) ...
// https://www.php.net/manual/en/function.libxml-use-internal-errors.php
libxml_use_internal_errors(true);
// https://www.php.net/manual/en/domdocument.loadhtml.php
$doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING);
// https://www.php.net/manual/en/function.libxml-clear-errors.php
libxml_clear_errors();
... after new DOMDocument() and file_get_contents().
Do they have to be in this order, or can I put these three libxml lines before new DOMDocument() and file_get_contents()?
As far as I can tell the order does not matter, but which is best practice, and does it affect speed?
I would prefer to put them at the top. Is that OK?
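To make Q2 concrete, here is a minimal sketch of the ordering I have in mind. It assumes only libxml_use_internal_errors(true) moves to the top, since loadHTML() needs both $doc and the fetched HTML to exist first. The inline $html string is just a stand-in for the file_get_contents() result so the sketch runs without network access.

```php
<?php
// Hypothetical reordering for illustration: switch on internal error handling first.
libxml_use_internal_errors(true);

// Stand-in for file_get_contents($url); not the real page.
$html = '<html><head><title>Demo</title>'
      . '<meta name="description" content="Example"></head>'
      . '<body><a href="/x">x</a></body></html>';

$doc = new DOMDocument();

// loadHTML() stays here because it needs $doc and $html to exist.
$doc->loadHTML($html, LIBXML_COMPACT | LIBXML_NOERROR | LIBXML_NOWARNING);

// Clear whatever libxml buffered while parsing.
libxml_clear_errors();

$title = $doc->getElementsByTagName('title')->item(0)->textContent;
echo 'Title: ' . $title . PHP_EOL;
```

Whether this ordering is any better or worse than yours is exactly what I am asking.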
Q3.
In the first code, you call loadHTML() with these options ...
// https://www.php.net/manual/en/domdocument.loadhtml.php
$doc->loadHTML($message, LIBXML_NOENT|LIBXML_COMPACT);
... while in the second code you use a different set ...
// https://www.php.net/manual/en/domdocument.loadhtml.php
$doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING);
... why did you do it this way? What is the significance of the difference?
Q3A. What problems would I run into if I swapped the two sets of options?
Q3B. What is the reasoning behind your choice in each case?
Q3C. What is the real difference between the two sets of options?
Q3D. What do LIBXML_NOENT and LIBXML_COMPACT each mean?
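For reference on Q3, here is a minimal sketch of how I understand the two option sets: each LIBXML_* name is an integer constant, and | combines them into one bitmask that is passed as the second argument of loadHTML(). The variable names $flags_code1 and $flags_code2 are mine, purely for illustration. The sketch only shows which flags each set contains, not what each flag changes in the parser, which is what I am asking about.

```php
<?php
// Option sets copied from the two loadHTML() calls above.
$flags_code1 = LIBXML_NOENT | LIBXML_COMPACT;
$flags_code2 = LIBXML_COMPACT | LIBXML_NOERROR | LIBXML_NOWARNING;

// Report which individual flags are set in each combination.
$names = ['LIBXML_NOENT', 'LIBXML_COMPACT', 'LIBXML_NOERROR', 'LIBXML_NOWARNING'];
foreach ($names as $name) {
    $bit = constant($name);
    printf(
        "%-17s code 1: %-3s  code 2: %s\n",
        $name,
        ($flags_code1 & $bit) ? 'yes' : 'no',
        ($flags_code2 & $bit) ? 'yes' : 'no'
    );
}
```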
Q4. Is there anything else I need to know?