Hi
Over the next few months I intend to learn Java so that I can build my own simple web crawler/spider. I have seen a few open-source spiders, but I would like to build my own if possible.
What I would like to ask is: how should I go about learning Java, and would building a simple spider be very hard?
My requirements for the spider are as follows:
- Go to the entered URL and gather all content from the site
- Collect the link structure

The app I am developing will need to build a structured sitemap of the specified URL.
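To give a sense of how much work the core of such a spider is, here is a minimal sketch, assuming Java 9+ (for `List.of`/`Map.of`). It does a breadth-first crawl that stays on the start URL's host and records, for each visited page, the links found on it. That page-to-links map is the raw material for a structured sitemap. The names `SimpleCrawler`, `crawl`, `extractLinks`, and the in-memory `site` map are hypothetical illustrations. The page fetcher is passed in as a function so the example runs offline. The regex-based link extraction is a deliberate simplification; a real crawler would more likely use an HTML parser (jsoup is a common choice) and `java.net.http.HttpClient` (Java 11+) for fetching.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {
    // Naive href extractor (assumption: good enough for a sketch; real HTML needs a parser).
    // Stops at a quote or '#', so fragments are dropped and "#top"-style links are skipped.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute, same-host http(s) links found in one page's HTML, without duplicates. */
    public static List<String> extractLinks(String pageUrl, String html, String host) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links such as "/about" or "post1" against the current page.
                URI resolved = URI.create(pageUrl).resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                boolean web = "http".equals(scheme) || "https".equals(scheme);
                String link = resolved.toString();
                if (web && Objects.equals(host, resolved.getHost()) && !links.contains(link)) {
                    links.add(link);
                }
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs (e.g. containing spaces).
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from startUrl, staying on the start host and visiting at most maxPages.
     * Returns a sitemap: each visited page (in visit order) mapped to the links found on it.
     */
    public static Map<String, List<String>> crawl(String startUrl,
                                                  Function<String, String> fetch,
                                                  int maxPages) {
        String host = URI.create(startUrl).getHost();
        Map<String, List<String>> sitemap = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && sitemap.size() < maxPages) {
            String url = queue.poll();
            if (sitemap.containsKey(url)) {
                continue; // already visited via another path
            }
            String html = fetch.apply(url); // null means "could not fetch"
            List<String> links = (html == null) ? List.of() : extractLinks(url, html, host);
            sitemap.put(url, links);
            for (String link : links) {
                if (!sitemap.containsKey(link)) {
                    queue.add(link);
                }
            }
        }
        return sitemap;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "site" standing in for real HTTP fetches.
        Map<String, String> site = Map.of(
                "http://example.com/", "<a href=\"/about\">About</a> <a href='/blog/'>Blog</a>",
                "http://example.com/about", "<a href=\"/\">Home</a> <a href=\"http://other.org/\">x</a>",
                "http://example.com/blog/", "<a href=\"post1\">Post</a> <a href=\"#top\">Top</a>",
                "http://example.com/blog/post1", "<p>No links here</p>");
        Map<String, List<String>> sitemap = crawl("http://example.com/", site::get, 100);
        sitemap.forEach((page, links) -> System.out.println(page + " -> " + links));
    }
}
```

Running `main` prints one line per visited page followed by its same-site outgoing links. The off-site `http://other.org/` link is filtered out. Swapping `site::get` for a function that performs real HTTP requests would turn this into a live crawler. Breadth-first order visits pages closer to the start URL first, which maps fairly naturally onto a sitemap's hierarchy.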
One final question: how would I go about building a browser add-on? What languages can add-ons be built in, and which browser is the best/easiest to develop for?
Thanks