A function which picks up, from a specified page, all images that are larger than 50 KB. The function returns an array containing each image's URL and its size in kilobytes. How do I start doing this? Do I use wget and exec, or is there an easier way? So first I need to download all the images, then analyze them and get the URL and size.

There is a nice function called file_get_contents which gets the contents of a URL and stores it in a variable; you can then write it to a file, process it into a MySQL database, etc. For example:

<?php
//first to specify the url
$url='http://images.daniweb.com/logo.gif';
//now to retrieve it
$imagedata=file_get_contents($url);
//now to save it
file_put_contents('image.gif',$imagedata);
//and image.gif will be in the same directory as your php file

And there you go. As simple as that.
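
One small note: file_get_contents() returns false when the request fails, so it can be worth checking the result before saving; a quick sketch:

<?php
$url='http://images.daniweb.com/logo.gif';
$imagedata=file_get_contents($url);
//file_get_contents() returns false on failure
if ($imagedata !== false) {
	file_put_contents('image.gif',$imagedata);
} else {
	echo 'could not download '.$url;
}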

You can also query the remote server for the response headers and check whether it provides Content-Length, something like:

<?php
$url = 'http://www.website.tld/image01.jpg';
$head = get_headers($url, 1); //1 = return the headers as an associative array
$length = isset($head['Content-Length']) ? $head['Content-Length'] : 0;
if($length < 50000)
{
	echo 'too small: ';
}
else
{
	echo 'ok: ';
}
echo $length;
echo "\n";
?>
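
Keep in mind that get_headers() sends a normal GET request by default, so the image body is still transferred. If you only want the headers you can switch the default method to HEAD first; a rough sketch (it assumes the server answers with a single Content-Length header):

<?php
//tell PHP's http wrapper to use HEAD instead of GET
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$head = get_headers('http://www.website.tld/image01.jpg', 1);
echo $head['Content-Length'];
?>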

And for the array you can use a loop; it's simple:

<?php
$url = array(
	'http://www.website.tld/image01.jpg',
	'http://www.website.tld/image02.jpg',
	'http://www.website.tld/image031.jpg'
	);

$result = array();
foreach($url as $key)
{
	$head = get_headers($key, 1); //associative headers
	$length = isset($head['Content-Length']) ? $head['Content-Length'] : 0;
	if($length >= 50000)
	{
		$result[$key] = $length;
	}
	
}

print_r($result);
?>

But you still need to grab all the image links from a specific page. Good work.
Bye :)
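
P.S. If you want to avoid writing a regex for that part, DOMDocument might help; a rough, untested sketch (http://www.website.tld is just a placeholder):

<?php
$page = file_get_contents('http://www.website.tld');
$dom = new DOMDocument();
//@ silences warnings about sloppy HTML
@$dom->loadHTML($page);
$srcs = array();
foreach ($dom->getElementsByTagName('img') as $img) {
	$srcs[] = $img->getAttribute('src');
}
print_r($srcs); //the src attribute of every img tag on the page
?>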

cwarn23, I know that function, but the problem is that we don't know the image names, and we need to download all the images, not only one.

If you want to download more than one image then perhaps a loop might be best. For example:

<?php
//first to specify the url
$links=array(
'http://images.daniweb.com/1a.jpg',
'http://images.daniweb.com/2c.jpg',
'http://images.daniweb.com/3d.jpg',
'http://images.daniweb.com/4h.jpg',
'http://images.daniweb.com/5f.jpg',
'http://images.daniweb.com/6e.jpg',
'http://images.daniweb.com/7d.jpg');
foreach ($links AS $url) {
//now to retrieve it
$imagedata=file_get_contents($url);
//now to save it
file_put_contents(basename($url),$imagedata);
//the downloaded images will be saved in the same directory as your php file
}
?>
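
And if you also want the 50 KB check from the original question, you can do it inside the same loop; strlen() on the downloaded data gives the size in bytes (rough sketch, reusing the $links array from above):

<?php
$result = array();
foreach ($links as $url) {
	$imagedata = file_get_contents($url);
	//strlen() on the raw data gives the size in bytes
	if ($imagedata !== false && strlen($imagedata) >= 50000) {
		file_put_contents(basename($url),$imagedata);
		$result[$url] = strlen($imagedata);
	}
}
print_r($result); //url => size in bytes for images of 50 KB or more
?>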

@siina
I was looking at cwarn23's code and I tried to mix it with mine (hope that's not a problem ^_^ and that it works for you). This will scan the link you set and build an array of the images greater than 50 KB:

<?php
$url = "http://www.website.tld"; # no ending slash
$data = file_get_contents($url);
$pattern = "/src=[\"']?([^\"']*?\.(png|jpg|gif))[\"']?/i"; # search for src attributes that point to images
preg_match_all($pattern, $data, $images);

function valid_url($u)
{
	if(preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $u))	{ return true; }
	else { return false; }
}

# print_r($images); # uncomment to check $images array

$result = array();
foreach($images[1] as $key)
{
	$link = $url . $key;
	if(valid_url($link) === true)
	{
		$head = get_headers($link, 1); # associative headers
		$length = isset($head['Content-Length']) ? $head['Content-Length'] : 0;
		if($length >= 50000)
		{
			$result[$link] = $length;
		}
	}
}

if(empty($result))
{
	echo 'no data';
}else{
	print_r($result); # array to use for retrieving images
}
?>

This script is not perfect because it will only search for img and object tags, not for images included via CSS, and you still have to handle relative paths, absolute paths, complete links and external images. Right now this example works only with absolute paths, i.e. <img src="/images/pic01.jpg" /> rather than <img src="../images/pic01.jpg" /> or <img src="http://a-website.tld/images/pic01.jpg" />.
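
If you need to cover those other cases, one rough approach is a small helper that turns whatever src holds into a full URL before calling get_headers(). The name make_absolute below is just made up for this sketch, and it does not resolve ../ segments:

<?php
function make_absolute($src, $base)
{
	# already a complete link
	if (preg_match('|^https?://|i', $src)) { return $src; }
	# absolute path, e.g. /images/pic01.jpg
	if (substr($src, 0, 1) === '/') { return rtrim($base, '/') . $src; }
	# relative path, e.g. images/pic01.jpg (../ is not resolved here)
	return rtrim($base, '/') . '/' . $src;
}

echo make_absolute('/images/pic01.jpg', 'http://www.website.tld'); # http://www.website.tld/images/pic01.jpg
?>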

siina,
1) First you need to fetch all the HTML content from the site URL using file_get_contents.
2) Then find all the image tags in the HTML source using preg_match_all.
3) Loop over the images array and use file_get_contents again to grab each image source and save it in your folder (see the sketch below).
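
Putting those three steps together, a rough sketch (www.website.tld is only a placeholder, and the img src values are assumed to already be complete URLs):

<?php
//1) grab the page HTML
$html = file_get_contents('http://www.website.tld');

//2) find the src attribute of every img tag
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);

//3) loop, download each image and keep the ones of 50 KB or more
$result = array();
foreach ($matches[1] as $src) {
	$imagedata = @file_get_contents($src);
	if ($imagedata !== false && strlen($imagedata) >= 50000) {
		file_put_contents(basename($src),$imagedata);
		$result[$src] = strlen($imagedata); //size in bytes
	}
}
print_r($result);
?>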
