Hey guys,
I am writing a bot class to scrape some information from websites.
Here are the requirements:
- Specify a URL
- Check for a valid URL
- 'GET' the contents of the URL with cURL
- Check the MIME type and response status code
- Check for a special URL
- Parse special data
- Parse standard data
- Return the data as an array or JSON
Now I will explain each requirement:
Specify a URL: the class accepts a URL submitted through a form, which tells it what website to scrape.
Check for a valid URL: check that the URL is fully qualified (it includes a scheme and a host).
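A minimal sketch of the "fully qualified" check using PHP's built-in `filter_var()` and `parse_url()`. The function name `is_valid_url()` is hypothetical, and restricting the scheme to http/https is an assumption about what the bot should accept.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs that include a host.
function is_valid_url(string $url): bool
{
    // FILTER_VALIDATE_URL checks the overall syntax but also accepts other
    // schemes (ftp://, mailto:, ...), so the scheme is checked separately.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = (string) parse_url($url, PHP_URL_HOST);
    return in_array($scheme, ['http', 'https'], true) && $host !== '';
}

var_dump(is_valid_url('http://youtube.com/watch?v=9ha98h')); // bool(true)
var_dump(is_valid_url('youtube.com/watch?v=9ha98h'));        // bool(false)
```

Note that `FILTER_VALIDATE_URL` only accepts ASCII, so internationalized domain names would need converting first.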
'GET' the contents of the URL with cURL: perform a cURL GET request on the specified URL and return the website's contents.
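A minimal sketch of the cURL GET step, assuming PHP 7+ with the curl extension. The function name `fetch_url()`, the timeouts, and the user-agent string are placeholders, not values from the post. It returns the status code and Content-Type alongside the body, so the next check does not need a second request.

```php
<?php
// Hypothetical helper: GET a URL with cURL and return the body together with
// the status code and Content-Type needed for the next check.
// Requires PHP's curl extension.
function fetch_url(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects (e.g. http -> https)
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds
        CURLOPT_TIMEOUT        => 30,    // seconds
        CURLOPT_USERAGENT      => 'MyScraperBot/1.0', // placeholder name
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released automatically when $ch goes out of scope.
    return [
        'body'         => $body,
        'status'       => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'content_type' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // null if not sent
    ];
}
```

Usage would look like `$page = fetch_url('http://youtube.com/watch?v=9ha98h');`, after which `$page['status']` and `$page['content_type']` feed the status/MIME check.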
Check the MIME type and response status code: only allow certain response codes (e.g. 200) and the text/html MIME type.
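A sketch of the status/MIME check as a pure function, so it can be fed the values `curl_getinfo()` returns for `CURLINFO_HTTP_CODE` and `CURLINFO_CONTENT_TYPE`. The name `is_acceptable_response()` and the default whitelists (`[200]` and `['text/html']`, taken from the requirement) are assumptions.

```php
<?php
// Hypothetical helper: accept only whitelisted status codes and MIME types.
// Content-Type values such as "text/html; charset=UTF-8" carry parameters,
// so only the media type before the first ';' is compared.
function is_acceptable_response(
    int $status,
    ?string $contentType,
    array $allowedStatus = [200],
    array $allowedMimes = ['text/html']
): bool {
    if (!in_array($status, $allowedStatus, true)) {
        return false;
    }
    if ($contentType === null || $contentType === '') {
        return false; // the server sent no usable Content-Type
    }
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($mime, $allowedMimes, true);
}

var_dump(is_acceptable_response(200, 'text/html; charset=UTF-8')); // bool(true)
var_dump(is_acceptable_response(404, 'text/html'));                 // bool(false)
```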
Check for a special URL: YouTube URLs use query variables to serve content, for example: http://youtube.com/watch?v=9ha98h
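A sketch of special-URL detection with `parse_url()` and `parse_str()`, covering only YouTube. The function `detect_special()`, its return shape, and the handling of `www.`/`m.` prefixes and `youtu.be` short links are assumptions for illustration.

```php
<?php
// Hypothetical helper: decide whether a URL needs a special parser.
// Returns e.g. ['special' => 'youtube', 'video_id' => '...'] or null.
function detect_special(string $url): ?array
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $host = preg_replace('/^(www|m)\./', '', $host); // treat www./m. as the bare domain

    if ($host === 'youtube.com') {
        // The video is selected by the "v" query variable: /watch?v=ID
        parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
        if (isset($query['v']) && is_string($query['v']) && $query['v'] !== '') {
            return ['special' => 'youtube', 'video_id' => $query['v']];
        }
    } elseif ($host === 'youtu.be') {
        // Short links carry the ID in the path instead: youtu.be/ID
        $id = trim((string) parse_url($url, PHP_URL_PATH), '/');
        if ($id !== '') {
            return ['special' => 'youtube', 'video_id' => $id];
        }
    }
    return null; // not special: use the standard parser
}

echo json_encode(detect_special('http://youtube.com/watch?v=9ha98h')), "\n";
// {"special":"youtube","video_id":"9ha98h"}
```

The `special` value is lowercase so that `'Bot_Special_'.ucfirst(...)`, as used in the dispatch code, would yield `Bot_Special_Youtube`.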
Parse special data: parse the YouTube video or other content unique to that site.
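Once the video ID is known, some YouTube-specific data can be derived without another request. A sketch, assuming the widely used embed (`https://www.youtube.com/embed/ID`) and thumbnail (`https://img.youtube.com/vi/ID/hqdefault.jpg`) URL patterns; `youtube_data()` is a hypothetical name, and richer data such as the video title would need a further request (for example to YouTube's oEmbed endpoint).

```php
<?php
// Hypothetical helper: build YouTube-specific data from a video ID.
// The URL patterns are assumptions based on common usage, not a documented contract.
function youtube_data(string $videoId): array
{
    $id = rawurlencode($videoId); // guard against odd characters in the ID
    return [
        'video_id'  => $videoId,
        'embed'     => "https://www.youtube.com/embed/$id",
        'thumbnail' => "https://img.youtube.com/vi/$id/hqdefault.jpg",
    ];
}

echo youtube_data('9ha98h')['embed'], "\n"; // https://www.youtube.com/embed/9ha98h
```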
Parse standard data: retrieve the document's keywords, description, images, etc.
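A sketch of the standard-data step. It swaps in PHP's built-in DOM extension (`DOMDocument` plus `DOMXPath`) for the `find()`-style parser used in the post, so it runs without third-party code; the function name `parse_standard()` and the result layout (mirroring the post's `_return['images']` and `_return['meta']`) are assumptions.

```php
<?php
// Sketch: extract image sources and meta keywords/description with PHP's DOM extension.
function parse_standard(string $html): array
{
    $result = ['images' => [], 'meta' => []];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    // Note: loadHTML() assumes ISO-8859-1 unless the page declares a charset.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//img[@src]') as $img) {
        $result['images'][] = $img->getAttribute('src');
    }
    foreach (['keywords', 'description'] as $name) {
        // Attribute matching is case-sensitive, so name="Keywords" would be missed.
        $nodes = $xpath->query("//head/meta[@name='$name']");
        if ($nodes->length > 0) {
            $result['meta'][$name] = $nodes->item(0)->getAttribute('content');
        }
    }
    return $result;
}
```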
Return data as an array or JSON: the request may come from AJAX or a normal form, so I would like to be able to return either format.
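A sketch of returning either format. Both function names are hypothetical, and detecting AJAX through the `X-Requested-With` header is a heuristic assumption: jQuery sends it, but the plain `fetch()` API does not.

```php
<?php
// Hypothetical helper: guess whether the request came from AJAX.
function is_ajax_request(array $server): bool
{
    return isset($server['HTTP_X_REQUESTED_WITH'])
        && strtolower($server['HTTP_X_REQUESTED_WITH']) === 'xmlhttprequest';
}

// Hypothetical helper: return the scraped data as a PHP array or a JSON string.
function format_result(array $data, bool $asJson)
{
    // JSON_UNESCAPED_SLASHES keeps URLs readable ("/b.jpg" instead of "\/b.jpg").
    return $asJson ? json_encode($data, JSON_UNESCAPED_SLASHES) : $data;
}

$data = ['images' => ['/b.jpg'], 'meta' => ['description' => 'A test page']];
echo format_result($data, true), "\n";
// {"images":["/b.jpg"],"meta":{"description":"A test page"}}
```

In a request handler this would be called as `format_result($data, is_ajax_request($_SERVER))`, ideally sending a `Content-Type: application/json` header before echoing the JSON.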
I have most of this functionality complete, but I'm running into errors when I try to use static methods.
Problem:
I have a parent class with a parse method and subclasses for special URLs,
e.g. class Bot {} and class Bot_Special_Youtube extends Bot {}.
The child class also has a parse method. If the URL is special, we use the child class's parse method, which calls parent::parse().
This is where I'm having trouble.
Here is some code:
```php
// Find out if the url
// is a special case
$this->_find_special();
if ($this->is_special() !== false)
{
    $class = 'Bot_Special_'.ucfirst($this->_special);
    $class::parse();
}
else
{
    Bot::parse();
}
```
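Because parse() is declared static, neither `$class::parse()` nor `Bot::parse()` has an object for `$this` to refer to. One option, sketched below under assumptions, is to keep choosing the class by name but create an instance of it and call parse() on that instance. The class bodies and the `create_bot()` factory are hypothetical stand-ins, not the poster's code.

```php
<?php
// Minimal stand-ins for the post's classes (bodies are hypothetical).
class Bot
{
    protected $_url;

    public function __construct($url)
    {
        $this->_url = $url;
    }

    // Instance method: $this is available here.
    public function parse()
    {
        return ['parser' => get_class($this), 'url' => $this->_url];
    }
}

class Bot_Special_Youtube extends Bot
{
}

// Hypothetical factory: pick the class by name as the post does, but return
// an instance so parse() is called in object context.
function create_bot($url, $special)
{
    if ($special !== null && $special !== false) {
        $class = 'Bot_Special_' . ucfirst($special);
        if (class_exists($class)) {
            return new $class($url);
        }
    }
    return new Bot($url); // no (known) special case: standard parser
}

$bot = create_bot('http://youtube.com/watch?v=9ha98h', 'youtube');
echo get_class($bot), "\n"; // Bot_Special_Youtube
$data = $bot->parse();
```

The `class_exists()` check lets an unrecognized special case fall back to the standard `Bot` instead of failing with a fatal "class not found" error.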
I'm getting the error "Using $this when not in object context".
Here is the offending code:
```php
/**
 * Parses the returned content from url
 * into something useable
 *
 * @return array
 */
public static function parse()
{
    $images = $this->_document->find('img');
    foreach ($images as $image)
    {
        $this->_return['images'][] = $image->src;
    }
    $keywords = $this->_document->find('head meta[name=keywords]');
    $description = $this->_document->find('head meta[name=description]');
    if (isset($keywords))
    {
        $this->_return['meta']['keywords'] = $keywords->content;
        echo $keywords->content;
    }
    if (isset($description))
    {
        $this->_return['meta']['description'] = $description->content;
        echo $description->content;
    }
}
```
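The `static` keyword on parse() is what triggers the error: a static method runs without an object, so `$this` is unavailable inside it. A minimal sketch of the usual fix, assuming parse() can simply be an instance method: drop `static`, and have the child's override call `parent::parse()`, which keeps `$this` when called from an instance method. To stay self-contained it swaps the post's `find()` parser for PHP's built-in `DOMDocument` and only collects image sources; the `_html` and `_videoId` properties are hypothetical.

```php
<?php
class Bot
{
    protected $_html;
    protected $_return = [];

    public function __construct(string $html)
    {
        $this->_html = $html;
    }

    // No "static": parse() runs on an object, so $this works.
    public function parse(): array
    {
        $this->_return = ['images' => []]; // start fresh on every call
        if (trim($this->_html) === '') {
            return $this->_return; // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // tolerate messy HTML
        $doc->loadHTML($this->_html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        foreach ($doc->getElementsByTagName('img') as $img) {
            $this->_return['images'][] = $img->getAttribute('src');
        }
        return $this->_return;
    }
}

class Bot_Special_Youtube extends Bot
{
    protected $_videoId;

    public function __construct(string $html, string $videoId)
    {
        parent::__construct($html);
        $this->_videoId = $videoId;
    }

    public function parse(): array
    {
        // parent::parse() from an instance method keeps $this, so the
        // standard data is collected into this same object's $_return.
        parent::parse();
        $this->_return['video']['id'] = $this->_videoId;
        return $this->_return;
    }
}

$bot = new Bot_Special_Youtube('<html><body><img src="thumb.jpg"></body></html>', '9ha98h');
print_r($bot->parse());
```

Separately, the post loops over `find('img')` as a list; if `find('head meta[name=keywords]')` also returns a list, `isset($keywords)` is always true and `$keywords->content` would need to read from the first match instead.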
If you guys could help me, that would be amazing.
Thanks in advance.