http://www.tgreer.com/class_http_php.html
I've written what I think is a high-quality PHP class for screen-scraping external (or internal) web content. The class can cache scraped content for any number of seconds. For example, if you want to show stock market data on your site that you scrape from a third party, you can set it up so that your site hits the source site for fresh content no more than once every 5 minutes. The caching is transparent to your script; you don't have to manage it yourself.
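To make the caching idea concrete, here is a minimal sketch of time-based caching in plain PHP. It is not the class's actual code or API: the function name cached_fetch, its parameters, and the file-per-URL cache layout are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, NOT the class's real API: return the cached copy of
// $url if it is younger than $ttl seconds, otherwise fetch it again.
function cached_fetch(string $url, int $ttl, string $cacheDir, ?callable $fetcher = null): string
{
    // Default fetcher uses file_get_contents(); a callable can be passed in
    // instead, which makes the function easy to try without network access.
    $fetcher = $fetcher ?? function (string $u): string {
        $body = @file_get_contents($u);
        if ($body === false) {
            throw new RuntimeException("Could not fetch $u");
        }
        return $body;
    };

    // One cache file per URL (assumed layout), named after a hash of the URL.
    $cacheFile = rtrim($cacheDir, '/') . '/' . md5($url) . '.cache';

    // PHP caches stat() results, so clear them before checking the file age.
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Cache miss or stale copy: hit the source and store the fresh content.
    $body = $fetcher($url);
    file_put_contents($cacheFile, $body, LOCK_EX);
    return $body;
}
```

With a $ttl of 300, repeated calls for the same URL within 5 minutes are served from the cache file and the source site is only contacted once.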
The class includes a companion script named image_cache.php that can be used in the src attribute of img elements to cache images from external sites. This is useful if you want to incorporate an image from an external site that is regenerated on a regular basis, such as the stock charts that many websites regenerate every 60 seconds. You can use their image on your site, and with the caching feature you don't have to hit their site every time somebody hits yours.
The class also has two static methods that make it simple to extract data from HTML tables: one extracts a table into an array, the other into XML.
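For readers wondering what extracting a table into an array can look like, here is a rough sketch using PHP's DOM extension. It is not the class's own method: the name table_to_array and the choice of DOMDocument are assumptions for illustration, and the class may well use a different technique.

```php
<?php
// Hypothetical helper, NOT the class's real method: parse the first <table>
// in an HTML string into an array of rows, each an array of cell strings.
function table_to_array(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Scraped HTML is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    // Note: this also picks up rows of nested tables; fine for a sketch.
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only header and data cells; skip whitespace text nodes.
            if ($cell instanceof DOMElement
                && in_array(strtolower($cell->tagName), ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Each element of the returned array is one table row, so a scraped quotes table can be looped over with a plain foreach.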
The class can perform basic authentication, allowing you to scrape protected content. It also cloaks itself by sending the User-Agent of the user requesting your script, which lets you access content that might otherwise be blocked to non-browser agents.
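As a sketch of how these two features can be done with PHP's built-in HTTP stream wrapper: the helper below builds the options array that stream_context_create() expects. The function name build_http_options and its parameters are made up for this example and are not the class's API; the class itself may use sockets or another mechanism.

```php
<?php
// Hypothetical helper, NOT the class's real API: build HTTP context options
// that send basic-auth credentials and forward the visitor's User-Agent.
function build_http_options(?string $user, ?string $pass, array $server): array
{
    $headers = [];
    if ($user !== null && $pass !== null) {
        // Basic authentication is "user:password", base64-encoded.
        $headers[] = 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
    }

    // Forward the requesting browser's User-Agent; fall back to a fixed
    // string (an arbitrary choice here) when the visitor did not send one.
    $agent = $server['HTTP_USER_AGENT'] ?? 'PHP';
    // Strip CR/LF so a forwarded value cannot inject extra headers.
    $agent = str_replace(["\r", "\n"], '', $agent);

    return [
        'http' => [
            'method'     => 'GET',
            'header'     => $headers,
            'user_agent' => $agent,
            'timeout'    => 10, // seconds; arbitrary value for the sketch
        ],
    ];
}

// Usage (left commented out because it needs network access):
// $context = stream_context_create(build_http_options('user', 'secret', $_SERVER));
// $html = file_get_contents('http://example.com/protected/', false, $context);
```

Keep in mind that basic authentication only encodes the credentials and does not encrypt them, so it is best used over HTTPS.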
The article explains in detail how to use the class and is itself a good tutorial on many PHP techniques.
If you have any comments or questions about this class or the article, let's discuss them here. I'm always looking to learn more. I hope you find the class and the article useful.