Hi everybody, I'm interested in creating a web crawler, but I can't really settle on what I'd like the program to do. It's more of an exercise in the technology, something I can expand over time to do new and interesting things.
I am proficient in Python, so I will naturally be using that language, along with the urllib2 module. I have some experience with it, and it is great for pulling a webpage's source code, which can then be parsed (a quick sketch of that is below the list).
So what we have so far:
- Python
- urllib2 module
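For reference, a minimal sketch of the fetching step might look like this. Note that urllib2 is Python 2 only (in Python 3 the same functionality lives in urllib.request); the URL and User-Agent string here are just placeholders.

```python
import urllib2

def fetch(url):
    # Send a custom User-Agent; some sites block urllib2's default one.
    # 'MyCrawler/0.1' is just a placeholder name.
    request = urllib2.Request(url, headers={'User-Agent': 'MyCrawler/0.1'})
    try:
        response = urllib2.urlopen(request, timeout=10)
        return response.read()
    except urllib2.URLError as e:
        print 'Failed to fetch %s: %s' % (url, e)
        return None

html = fetch('http://example.com')
```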
I will need to brush up on regex so that I can write the functions that parse the page source and extract all the URLs, roughly along the lines of the sketch below.
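As a rough starting point with the re module, something like this could pull href values out of the fetched source. The pattern is deliberately naive and will miss some link forms (unquoted attributes, relative-URL edge cases); a real HTML parser is more robust, but this shows the idea:

```python
import re

# Naive pattern: captures the value of href attributes. It won't handle
# every quoting style, but it's enough to get a crawler off the ground.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def extract_urls(html):
    return HREF_RE.findall(html)

links = extract_urls('<a href="http://example.com/page">a link</a>')
print links  # ['http://example.com/page']
```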
Now this is where my question really comes in: what kinds of things can I, or should I, use a web crawler to do?
Throw some really interesting ideas at me! Thanks!