I've been researching this all day and am having trouble finding something that works. I have installed Sphider (PHP/MySQL); it crawls my site successfully, gives me the URLs, and lets me search for any text on any page, but it doesn't pick up the image filenames in the <img> tags. What I need is a way to get a list of the image filenames in the <img> tags, so that I can search for one (for example IMAGE_1234.JPG) and find the URL of the page that image is on, out of perhaps 100 pages. Optionally, but not necessarily, I'd also like a list of all the images on the site with the full URL, for example:
mysite.com/aaa.html/image1.jpg
mysite.com/aaa.html/image491.jpg
mysite.com/bbb.html/image534.jpg
mysite.com/bbb.html/image123.jpg
What else I've tried:
simplehtmldom and another using curl (errors)
some programs on the web: they retrieve the pictures but don't give me a list
Google image search for my site: it only has about 5% of my photos or less, and old versions
So basically I'm looking for a simple spider script in PHP (my hosting service does not support Perl) to get a list of all my images and their paths.
All the html pages are in the top level directory, but each image is pulled from dozens of various subfolders, each containing about 20 pictures, and there are almost 3000 jpg files, so if I'm looking for a specific picture, I need an easy way to search for it, or click on a link to see it (optional). One easy way would be to download a copy of my entire site contents to my box and do a file search on the computer, but I'd like to have a search box on the actual main web page for this.
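To make the request concrete, here is a minimal sketch of the kind of script described above. It assumes PHP's bundled DOM extension is enabled on the host and that the pages can be read straight from the top-level directory instead of being crawled. The function names (extract_image_sources, build_image_index, load_pages) are made up for illustration and do not come from Sphider or any other library.

```php
<?php
// Hypothetical helper: return the src attribute of every <img> in an HTML string.
function extract_image_sources(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Hypothetical helper: map each lowercased image filename to the pages using it.
// $pages is [page name => HTML]; the result is [filename => [page name => src]].
function build_image_index(array $pages): array
{
    $index = [];
    foreach ($pages as $page => $html) {
        foreach (extract_image_sources($html) as $src) {
            $path = parse_url($src, PHP_URL_PATH);
            if (!is_string($path) || $path === '') {
                continue; // skip src values without a usable path
            }
            $index[strtolower(basename($path))][$page] = $src;
        }
    }
    return $index;
}

// Hypothetical helper: read every .html file in one directory (assumption:
// all pages sit in a single top-level folder, as in the description above).
function load_pages(string $dir): array
{
    $pages = [];
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $file) {
        $pages[basename($file)] = (string) file_get_contents($file);
    }
    return $pages;
}
```

A search box would then only need to lowercase the submitted filename and look it up, for example `$index = build_image_index(load_pages(__DIR__));` followed by `$index[strtolower($_GET['q'] ?? '')] ?? []`, which gives the pages that contain the image along with the src path used on each. Rebuilding the index on every request may be slow for about 3000 images, so caching the array to a file or a MySQL table is probably worth considering.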
Thanks...
frank754 0 Newbie Poster