I've been working on this web crawler and I've run into a problem. I can read the first URL and extract all the URLs from its HTML, but I can't seem to set up a looping structure that works.
This is basically what it does:
It searches through the HTML of the first URL.
It may find, say, 20 other URLs contained in that page.
How can I make it so that it continuously searches through the URLs it has found?
Here is the code I have so far, but it's not complete:
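import java.io.*;
import java.net.*;
import java.util.*;

public class CustomWebCrawler implements Runnable {
    ArrayList<URL> alCurrentSearches = new ArrayList<>();   // URLs still waiting to be searched
    ArrayList<URL> alAlreadySearched = new ArrayList<>();   // URLs that have been searched
    ArrayList<URL> alMatchingSearches = new ArrayList<>();  // URLs found so far
    Thread running;
    URL enteredURL;
    int count = 0;

    public CustomWebCrawler() {
    }

    public void start() {
        if (running == null) {
            running = new Thread(this);  // pass this so the thread executes run()
            running.start();
        }
    }

    public void stop() {
        if (running != null) {
            running = null;
        }
    }

    public void run() {
        // Only plain http URLs are handled
        if (enteredURL == null || enteredURL.getProtocol().compareTo("http") != 0) {
            running = null;
            return;
        }
        // try-with-resources closes the reader when the page has been read
        try (BufferedReader br = new BufferedReader(new InputStreamReader(enteredURL.openStream()))) {
            String inputText = "";
            while ((inputText = br.readLine()) != null) {
                // Looks at the last <a href="..."> on each line
                int first = inputText.lastIndexOf("<a href=");
                int end = inputText.indexOf("\">", first);
                if (first != -1 && end != -1) {
                    findURL(inputText, first, end);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void findURL(String text, int numFirst, int numEnd) {
        try {
            // Skip the 9 characters of <a href=" to reach the link itself
            String link = text.substring(numFirst + 9, numEnd);
            URL newURL = new URL(link);
            if (newURL.getProtocol().compareTo("http") == 0) {
                if (!(alMatchingSearches.contains(newURL))) {
                    alMatchingSearches.add(newURL);
                    System.out.println(newURL + "");
                }
            }
        } catch (MalformedURLException mue) {
            // Not an absolute URL (for example a relative link), so it is skipped
        }
    }
}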
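
One way the looping could be set up, as a minimal sketch rather than a definitive answer: keep a queue of URLs still to be searched (the role of alCurrentSearches) and a set of URLs already seen (the role of alAlreadySearched), and keep taking the next URL off the queue until it is empty. The names CrawlLoopSketch, crawl, linksOf, maxPages and fakeWeb are made up for illustration, and linksOf stands in for the page-reading code above so that the loop can be run without a network connection.

```java
import java.util.*;
import java.util.function.Function;

class CrawlLoopSketch {

    /**
     * Breadth-first crawl. Returns the URLs in the order they were visited.
     * linksOf is a hypothetical callback that returns the links found on one page.
     * maxPages is an assumed safety limit so the loop always terminates.
     */
    static List<String> crawl(String start,
                              Function<String, List<String>> linksOf,
                              int maxPages) {
        Deque<String> toVisit = new ArrayDeque<>();  // URLs waiting to be searched
        Set<String> seen = new HashSet<>();          // URLs already queued or searched
        List<String> visited = new ArrayList<>();    // result, in visiting order

        toVisit.add(start);
        seen.add(start);

        while (!toVisit.isEmpty() && visited.size() < maxPages) {
            String current = toVisit.poll();         // take the next URL off the queue
            visited.add(current);
            for (String link : linksOf.apply(current)) {
                // Set.add returns false when the link was already seen,
                // so each URL is queued at most once
                if (seen.add(link)) {
                    toVisit.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for real pages: URL -> links found on that page
        Map<String, List<String>> fakeWeb = new HashMap<>();
        fakeWeb.put("http://a", Arrays.asList("http://b", "http://c"));
        fakeWeb.put("http://b", Arrays.asList("http://a", "http://d"));
        fakeWeb.put("http://c", Arrays.asList("http://d"));

        List<String> order = crawl(
                "http://a",
                url -> fakeWeb.getOrDefault(url, Collections.emptyList()),
                100);
        System.out.println(order);  // prints [http://a, http://b, http://c, http://d]
    }
}
```

In the real crawler, linksOf would open the URL and return the links that findURL extracts. Storing the URLs as strings in a HashSet also avoids calling contains on a list of URL objects, which gets slower as the list grows.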