Hey, so for my first project in my class, I need to build a crawler.
This is what I need it to do:
- crawl and follow all URLs on a domain (without going out to other websites)
- get the title of the page
- get the meta tag keywords of the page
- get the meta tag description of the page
- get the url of the page
- store all of this information in a MySQL database
- then follow the URLs on the page and do the same thing on those pages
- if possible, I would also like to get the full index of the page and store it in the MySQL database as well (for our next project we are going to write a script that searches what we crawled for keywords, but I can do that myself once I have the data).
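
For reference, here is a rough sketch of how the steps above could fit together. It is only one possible approach, written in Python with the standard library since the post does not name a language. The names `PageParser`, `fetch`, `crawl`, the `fetch_page` and `max_pages` parameters, and the `content` field are all made up for illustration, and "full index" is assumed here to mean the visible text of the page. It collects rows in memory instead of writing to MySQL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the title, meta keywords/description, links and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {"keywords": "", "description": ""}
        self.links = []
        self.text = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl that never leaves the start URL's domain."""
    domain = urlparse(start_url).netloc
    queue, seen, rows = deque([start_url]), {start_url}, []
    while queue and len(rows) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        rows.append({
            "url": url,
            "title": parser.title.strip(),
            "keywords": parser.meta["keywords"],
            "description": parser.meta["description"],
            "content": " ".join(parser.text),
        })
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            same_site = parts.scheme in ("http", "https") and parts.netloc == domain
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return rows
```

Calling `crawl("http://example.test/")` (a placeholder URL) would return a list of dictionaries, one per page. Each one could then be written to MySQL with a parameterized `INSERT` through whichever MySQL driver is used; the table and column layout is left open here. A real crawler would probably also want to respect `robots.txt` and pause between requests, which this sketch does not do.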
I'm not planning to create a public website or anything, just something for private testing. If you cannot help with this but you know of other scripts with similar features that I could use as a reference, that would be a huge help as well.