Hi all,
I have an HTML file, and I need to extract the name, address, and phone number from it.
The name is in <span class="name"></span>,
the address is in <span class="list_address"></span>, and
the phone number is in <span style="display:none" ID="phoneVal11"></span>.
I have used the following code to get the name, address, and phone number:
<?php
function get_name($file){
echo"In Set_Name<br>";
$h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])name\1\s*>\s*(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
$res = array();
array_push($res,$patterns[2]);
array_push($res,count($patterns[2]));
return $res;
}
function get_add($file){
echo"In Set_Add<br>";
$h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])list_address\1\s*>\s*(?:.*?\s+)*?\s*(.*?)\s*<\/span>/',$file,$patterns);
$res = array();
array_push($res,$patterns[2]);
array_push($res,count($patterns[2]));
return $res;
}
function get_phone_no($file){
echo"In Set_Phone<br>";
$h1count = preg_match_all('/<span\s+(?:.*?\s+)?ID=([\"])phoneVal[1-9].*\1\s*>(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
$res = array();
array_push($res,$patterns[2]);
array_push($res,count($patterns[2]));
return $res;
}
// Load the raw HTML of the page (the file name itself needs no strip_tags()/htmlentities()).
$file = file_get_contents("testfile2.html");
$name = get_name($file);
$add = get_add($file);
$phone = get_phone_no($file);
// get names
if($name[1] != 0){
echo "<br/>Names Found: $name[1]<ul>";
foreach($name[0] as $key => $val){
echo "<li>" . htmlentities($val) . "</li>";
}
echo "</ul>";
}else{
echo "<br/><div class=\"error\">No Names Found</div><br/>";
}
// get addresses
if($add[1] != 0){
echo "<br/>Addresses Found: $add[1]<ul>";
foreach($add[0] as $key => $val){
echo "<li>" . htmlentities($val) . "</li>";
}
echo "</ul>";
}else{
echo "<br/><div class=\"error\">No Addresses Found</div><br/>";
}
// get phone no.s
if($phone[1] != 0){
echo "<br/>Phone No.s Found: $phone[1]<ul>";
foreach($phone[0] as $key => $val){
echo "<li>" . htmlentities($val) . "</li>";
}
echo "</ul>";
}else{
echo "<br/><div class=\"error\">No Phone No.s Found</div><br/>";
}
?>
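For comparison, here is a minimal sketch of the same extraction using PHP's DOM extension (DOMDocument and DOMXPath) instead of regular expressions. It assumes the DOM extension is enabled and that the markup matches the sample spans in this post; extract_listings() and clean_text() are hypothetical helper names, and the demo uses a shortened copy of the sample HTML instead of reading testfile2.html.

```php
<?php
// Sketch: read the three fields with DOMDocument/DOMXPath instead of regexes.
// Assumption: markup looks like the samples in this post
// (class="name", class="list_address", ID="phoneVal...").

function clean_text(string $s): string
{
    // Collapse newlines and runs of spaces into single spaces.
    return trim(preg_replace('/\s+/', ' ', $s));
}

function extract_listings(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // scraped pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $result = ['names' => [], 'addresses' => [], 'phones' => []];

    // textContent includes the text of nested tags such as <a>.
    // Note: @class="name" only matches when "name" is the sole class.
    foreach ($xpath->query('//span[@class="name"]') as $span) {
        $result['names'][] = clean_text($span->textContent);
    }
    foreach ($xpath->query('//span[@class="list_address"]') as $span) {
        $result['addresses'][] = clean_text($span->textContent);
    }

    // Check both spellings in case the parser keeps the attribute as "ID".
    $phoneQuery = '//span[starts-with(@id, "phoneVal") or starts-with(@ID, "phoneVal")]';
    foreach ($xpath->query($phoneQuery) as $span) {
        $numbers = [];
        foreach ($span->childNodes as $child) {
            // The text nodes sit between the <br> elements: one number each.
            if ($child->nodeType === XML_TEXT_NODE && trim($child->textContent) !== '') {
                $numbers[] = clean_text($child->textContent);
            }
        }
        $result['phones'][] = implode(', ', $numbers);
    }
    return $result;
}

// Demo with a shortened copy of the sample markup (onClick attribute left out).
$html = <<<'HTML'
<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>
<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>
HTML;

print_r(extract_listings($html));
```

Because textContent concatenates the text of all descendants, the nested <a> in the name and the <br> in the address need no special handling; only the phone numbers are read node by node so the separate numbers stay apart.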
Now the problems are:
For names, I only get the name from span tags that have no other tag nested inside them. For example, this one fails:
<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm" onClick='setLSBCookie14(); this.href = "http://clicks.superpages.com/ct/clickThrough?SRC=promo17&target=SP&PN=1&FP=listings&S=NC&C=Insurance&CID=495050&PGID=yp452.8081.1220963148577.12010365560&channelId=sp16202148s&ACTION=log,red&LID=0136893005&relativePosition=14&FL=list&TL=profile&LOC=" + "http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm?SRC=promo17&C=Insurance&L=NC&lbp=1"'">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>
Here the output I get is </a>,
so I need a way to ignore this <a> tag and get only the text inside it.
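The regex can probably be fixed without a parser. In the pattern above, . does not match newlines (there is no s modifier), and the repeated (?:.*?\s+)*? group skips every line except the last one before </span>, so (.*?) only ever sees that last line, here </a>. A minimal sketch of a drop-in get_name() that captures the whole span body and then removes nested tags, assuming a name span never contains another <span> and no attribute value inside it contains a > character:

```php
<?php
// Hypothetical drop-in replacement for get_name(); returns [names, count]
// like the original. Assumes a name span never contains a nested <span>.
function get_name(string $html): array
{
    // /s lets "." match newlines, so the lazy (.*?) captures every line up to
    // the first </span>; /i makes tag and attribute names case-insensitive.
    $pattern = '/<span\b[^>]*\bclass=(["\'])name\1[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);

    $names = [];
    foreach ($matches[2] as $inner) {
        // Drop nested tags such as <a ...> and </a>.
        // Assumes no attribute value inside the span contains ">".
        $text = preg_replace('/<[^>]*>/', '', $inner);
        $names[] = trim(preg_replace('/\s+/', ' ', $text));
    }
    return [$names, count($names)];
}

// Demo with the sample from this post (onClick attribute left out).
$html = <<<'HTML'
<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>
HTML;

print_r(get_name($html));
```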
The problem with the address tags is that they contain commas and newlines, e.g.:
<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>
The output I get is only: NC 27101
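The address result probably has the same cause as the name problem: the repeated group in the pattern skips every line except the last one before </span>. A minimal sketch of a drop-in get_add() (same [addresses, count] return shape as the original) that captures the whole span body with the s modifier, turns tags like <br> into spaces, and collapses the newlines so the commas stay in place; it assumes an address span never contains a nested <span>:

```php
<?php
// Hypothetical drop-in replacement for get_add(); returns [addresses, count].
function get_add(string $html): array
{
    // /s: "." also matches newlines, so all lines up to </span> are captured.
    $pattern = '/<span\b[^>]*\bclass=(["\'])list_address\1[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);

    $addresses = [];
    foreach ($matches[2] as $inner) {
        // Replace tags such as <br> with a space, then join the lines.
        $text = preg_replace('/<[^>]*>/', ' ', $inner);
        $addresses[] = trim(preg_replace('/\s+/', ' ', $text));
    }
    return [$addresses, count($addresses)];
}

// Demo with the sample from this post.
$html = <<<'HTML'
<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>
HTML;

print_r(get_add($html));
```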
The problem with the phone number is that the span contains the phone number as well as a fax or cell number, so the output contains only the last number, e.g.:
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>
The output I get is: <br>(336) 896-1060 (fax)
I need both numbers.
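For the phone spans, one option is to capture the whole span body and split it on <br>, so every number is kept. This sketch of get_phone_no() assumes the numbers are always separated by <br> tags as in the sample, matches the id attribute case-insensitively, and joins the numbers of one listing with ", " so the existing output loop still prints one line per listing; that joining format is an arbitrary choice.

```php
<?php
// Hypothetical drop-in replacement for get_phone_no(); returns [phones, count]
// with one entry per listing, e.g. "(336) 722-1718, (336) 896-1060 (fax)".
function get_phone_no(string $html): array
{
    // /i matches ID="phoneVal14" as well as id="phoneVal14";
    // /s lets the lazy (.*?) run across lines up to </span>.
    $pattern = '/<span\b[^>]*\bid=(["\'])phoneVal\d+\1[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);

    $phones = [];
    foreach ($matches[2] as $inner) {
        $numbers = [];
        // Assumes the numbers are separated by <br>, <br/> or <br />.
        foreach (preg_split('/<br\s*\/?>/i', $inner) as $piece) {
            $piece = trim(strip_tags($piece));
            if ($piece !== '') {
                $numbers[] = $piece;
            }
        }
        $phones[] = implode(', ', $numbers);
    }
    return [$phones, count($phones)];
}

// Demo with the sample from this post.
$html = <<<'HTML'
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>
HTML;

print_r(get_phone_no($html));
```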
Please help, I have been stuck on this for two days and I am a beginner.
Thanks in advance