Hello,

I need some help. I've got this script which scrapes the IMDB top 250 movies list. What I'm trying to do is add a search link next to the year bit.

I've got it partially working with str_replace, but it only adds the link to the first movie. See here. (Pay attention to the actual URL of the search links.)

So how would I make it add links to all the movies correctly? I was thinking preg_replace because I could use regex, but I have no idea how to use regex :confused:

Please help. Thanks :D Here's my script...

<?php
function get_inner_string($a,$b,$c) 
{ 
  $y = explode($b,$a); 
  $x = explode($c,$y[1]); 
  return $x[0]; 
} 

//Get Page 
$file = 'http://www.imdb.com/chart/top'; 

//Open Page 
$open_file = file_get_contents($file); 

//Find the list 
$find_ad = get_inner_string($open_file, '<i>For this top 250, only votes from regular voters are considered.</i>', 'The formula for calculating the Top Rated 250 Titles gives a <b>true Bayesian estimate</b>:'); 

//Add http://www.imdb.com/ to the URL's 
$new_page = str_replace('a href="/title/', 'a href="http://www.imdb.com/title/', $find_ad); 

//Find movie name 
$find_movie = get_inner_string($new_page, '/">', '</a>');

//Search URL
$search_url = '<a href="http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query=' . $find_movie . '"> Search </a>';

$replace_search = str_replace(')</font>', ') - ' . $search_url . '</font>', $new_page); 

echo $replace_search; 

?>

First, after $find_ad has been set, I would split the data with $list = explode("</tr>", $find_ad), then do a replacement on each item like so:

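// "#" is used as the delimiter because the pattern itself contains "/" (in </a>).
// ([^<>]+) captures the whole movie name rather than only its last character.
$list = preg_replace('#([^<>]+)(</a> \(\d{4}\))#', '\1\2 - <a href="http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query=\1"> Search </a>', $list);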

//then to wrap it up:
$new_page = '<table border="1" cellspacing="0" cellpadding="4" style="margin-right:30px;">';
foreach ($list as $item){
  $new_page .= $item . '</tr>';
}
$new_page .= "</table></p>";

The regex line should replace Movie Name</a> (year) with Movie Name</a> (year) - [YOUR LINK], though it's not tested. You could quite easily put the search link before the name instead, but you would have to incorporate the <a ...> tag into the regex.
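As a rough sketch of the same idea, here is a variant that uses preg_replace_callback so each movie name can be passed through urlencode() before it goes into the search URL. The two sample rows in $html are made up for illustration, and the real IMDB markup may well differ, so treat the pattern as an assumption to adjust rather than a tested solution.

```php
<?php
// Hypothetical sample rows standing in for the scraped list.
$html = '<font><a href="http://www.imdb.com/title/tt0111161/">The Shawshank Redemption</a> (1994)</font>'
      . '<font><a href="http://www.imdb.com/title/tt0068646/">The Godfather</a> (1972)</font>';

// Search URL taken from the original script.
$base = 'http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query=';

// Group 1: the movie name (text right before </a>).
// Group 2: the closing tag plus " (year)".
$pattern = '#([^<>]+)(</a> \(\d{4}\))#';

$result = preg_replace_callback($pattern, function ($m) use ($base) {
    // urlencode() makes the name safe inside a query string;
    // htmlspecialchars() makes the whole URL safe inside an HTML attribute.
    $link = '<a href="' . htmlspecialchars($base . urlencode($m[1])) . '"> Search </a>';
    return $m[1] . $m[2] . ' - ' . $link;
}, $html);

echo $result;
```

Because the callback runs once per match, every movie gets a link built from its own name, which is what the single str_replace call could not do.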

Hope you can get your head around all that.

Regular expressions are quite simple.

Instead of matching exact characters like in your str_replace() you match patterns.

The pattern is held between two delimiters, denoting the start and end of the pattern. In PHP the delimiter has to be a non-alphanumeric, non-backslash, non-whitespace character.
After the end delimiter come the modifiers (flags).

So a regular expression could be:

|a|

That matches just the letter a. The delimiter is |.
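To make the role of the delimiter concrete, here is a small sketch using preg_match. The subject string is invented for the example. The point is that whichever character you pick as the delimiter must be escaped if it also appears inside the pattern, otherwise PHP treats it as the end of the pattern.

```php
<?php
// Hypothetical subject string containing a "/" character.
$subject = 'Movie Name</a>';

// "|" as delimiter: the "/" in the pattern needs no escaping.
$withPipe = preg_match('|</a>|', $subject);

// "#" as delimiter: same idea, a common choice when matching HTML.
$withHash = preg_match('#</a>#', $subject);

// "/" as delimiter: the "/" inside the pattern must be escaped as "\/".
$withSlash = preg_match('/<\/a>/', $subject);

var_dump($withPipe, $withHash, $withSlash);
```

All three calls look for the same text; only the delimiter changes.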

Another example:

|ape|i

This matches the sequence of characters "ape". The "i" after the second delimiter is a modifier that means the match is case-insensitive, so "ape" can be in any case, e.g. "aPE" would be a match.
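A quick way to see the "i" modifier at work is to run the same pattern with and without it. This is only a small illustration, and the test strings are chosen for the example.

```php
<?php
// preg_match() returns 1 when the pattern is found and 0 when it is not.
$sensitive   = preg_match('|ape|', 'aPE');     // no modifier: case matters
$insensitive = preg_match('|ape|i', 'aPE');    // "i" modifier: case is ignored
$inside      = preg_match('|ape|i', 'Grapes'); // the match can be anywhere in the string

var_dump($sensitive, $insensitive, $inside);
```

Note that the last call also matches: the pattern does not have to cover the whole string unless you anchor it.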

There are special characters used to match patterns. The most used is the full stop, ".".

The full stop matches any single character (except a newline, by default).

So:

|a.e|i

Would match "aPE" as well as "abe" etc.
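Here is a small sketch that runs this pattern over a few made-up words with preg_grep, which returns only the array entries that match.

```php
<?php
// Sample words chosen for illustration.
$words = ['aPE', 'abe', 'a-e', 'ae', 'able'];

// Keep the entries containing "a", then exactly one character, then "e".
$matches = array_values(preg_grep('|a.e|i', $words));

print_r($matches);
```

"ae" is left out because the full stop needs exactly one character between the two letters, and "able" is left out because it has two.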

The other two most used characters are * and +.

* means 0 or more of the character to the left of it.
+ means 1 or more of the character to the left of it.

eg:

|a.*e|i

This would match "appe" or even "ae", since we can match 0 or more of ".". Since . is any character, there can be 0 or more of any character between a and e.
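The difference between * and + shows up most clearly on a string like "ae", where there is nothing between the two letters. The sketch below compares the two quantifiers on a few sample strings picked for the example.

```php
<?php
// Sample strings chosen for illustration.
$subjects = ['ae', 'appe', 'a e'];

$results = [];
foreach ($subjects as $s) {
    $results[$s] = [
        'star' => preg_match('|a.*e|i', $s), // zero or more characters between a and e
        'plus' => preg_match('|a.+e|i', $s), // at least one character between a and e
    ];
}

print_r($results);
```

Only the "plus" check on "ae" fails, because + requires at least one character where * accepts none.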

These are just the fundamentals; regex is very powerful. Take a look at:

http://www.regular-expressions.info/

It has a lot of information on regular expressions.
