Howdy,
I have a basic HTML file containing data I need to extract. This is the markup for just one of the tables on the page:
<TABLE title="Left Magazine"class="dataTable" align="center" cellspacing="0" cellpadding="0"> <caption>Media Details</caption><THEAD><TR class="captionRow"><TH>Slot #</TH><TH>Attn</TH><TH>Status</TH><TH>In Drive</TH><TH>Label</TH><TH>Media Loads</TH><TH>Comment</TH></TR></THEAD> <TBODY>
<TR class="altRowColor" > <TD>1 </TD> <TD> </TD><TD>Full, Gen. 3 </TD> <TD> </TD> <TD>TMSWK2D</TD><TD> 9</TD> <TD> Poor write quality</TD> </TR>
<TR class="altRowColor" > <TD>2 </TD> <TD> </TD><TD>Full, Gen. 1 </TD> <TD> </TD> <TD>TMSWK2C</TD><TD> </TD> <TD>Read Only, Clean Tape</TD> </TR>
<TR class="altRowColor" > <TD>3 </TD> <TD> </TD><TD>Full, Gen. 3 </TD> <TD> </TD> <TD>TMSWK2B</TD><TD> 9</TD> <TD> Poor write quality</TD> </TR>
<TR class="altRowColor" > <TD>4 </TD> <TD> </TD><TD>Full, Gen. 3 </TD> <TD> </TD> <TD>TMSWK2A</TD><TD> 10</TD> <TD> Poor write quality</TD> </TR> </TBODY> </TABLE>
I need to extract the data from both of the "media information" tables, excluding the "In Drive" column. I have found some example code using the DOMDocument class. This kind of works, but it selects every table in the page, rather than just the two tables and fields I need:
<?php
/*** a new DOM object ***/
$dom = new DOMDocument();
/*** discard white space (set before loading; changing it afterwards does not affect an already parsed document) ***/
$dom->preserveWhiteSpace = false;
/*** load the html into the object ***/
$dom->loadHTMLFile('inventory_status.html');
/*** all tables, by tag name ***/
$tables = $dom->getElementsByTagName('table');
/*** get all rows from the first table ***/
$rows = $tables->item(0)->getElementsByTagName('tr');
/*** loop over the table rows ***/
foreach ($rows as $row)
{
    /*** get each column by tag name ***/
    $cols = $row->getElementsByTagName('td');
    /*** skip header rows, which hold th cells and so have no td to read ***/
    if ($cols->length < 3) {
        continue;
    }
    /*** echo the values ***/
    echo $cols->item(0)->nodeValue;
    echo $cols->item(1)->nodeValue;
    echo $cols->item(2)->nodeValue;
    echo '<hr />';
}
My first question is whether the DOM is the best/easiest way to parse this. Secondly, how could I modify the code to select only the relevant data?
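To make the second question concrete, here is a rough sketch of the kind of selection I have in mind, using DOMXPath on top of DOMDocument. It rests on assumptions: that the wanted tables can be recognised by their "Media Details" caption, that "In Drive" is always the fourth column, and that keying the result by the table's title attribute is useful. The function name extractMediaRows, the $skipColumn parameter, and the inline sample (a trimmed copy of the table above plus a dummy table) are made up for illustration, so treat this as a starting point rather than a finished solution.

```php
<?php
// Hypothetical sample: a trimmed copy of the table from the post, plus an
// unrelated table that should be ignored.
$html = <<<HTML
<html><body>
<TABLE class="other"><TR><TD>ignore me</TD></TR></TABLE>
<TABLE title="Left Magazine" class="dataTable"><caption>Media Details</caption>
<THEAD><TR class="captionRow"><TH>Slot #</TH><TH>Attn</TH><TH>Status</TH><TH>In Drive</TH><TH>Label</TH><TH>Media Loads</TH><TH>Comment</TH></TR></THEAD>
<TBODY>
<TR class="altRowColor"><TD>1 </TD><TD> </TD><TD>Full, Gen. 3 </TD><TD> </TD><TD>TMSWK2D</TD><TD> 9</TD><TD> Poor write quality</TD></TR>
<TR class="altRowColor"><TD>2 </TD><TD> </TD><TD>Full, Gen. 1 </TD><TD> </TD><TD>TMSWK2C</TD><TD> </TD><TD>Read Only, Clean Tape</TD></TR>
</TBODY></TABLE>
</body></html>
HTML;

// Returns [table title => list of rows], each row a list of trimmed cell
// texts with the column at index $skipColumn left out (assumed: 3 = "In Drive").
function extractMediaRows(string $html, int $skipColumn = 3): array
{
    $dom = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    // The HTML parser lower-cases element names, so the queries do too.
    $tables = $xpath->query('//table[caption[normalize-space()="Media Details"]]');
    foreach ($tables as $table) {
        $rows = [];
        // Only rows that contain td cells, which leaves out the th header row.
        foreach ($xpath->query('.//tr[td]', $table) as $tr) {
            $cells = [];
            $index = 0;
            foreach ($xpath->query('td', $tr) as $td) {
                if ($index !== $skipColumn) {
                    $cells[] = trim($td->nodeValue);
                }
                $index++;
            }
            $rows[] = $cells;
        }
        $result[$table->getAttribute('title')] = $rows;
    }
    return $result;
}

$data = extractMediaRows($html);
print_r($data);
```

For the real page, I assume the same function could be fed file_get_contents('inventory_status.html') instead of the inline sample. Matching on the caption is only one option; the title attribute or the dataTable class might turn out to be a more reliable hook.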
Cheers for your time