Scan web pages for attributes.

Tactical Fart 0 Newbie Poster

15 Years Ago

My current goal is to scan and parse an HTML page and get the attributes from its tags. Right now, I can take a page, scan everything token by token, and save the tokens to a String. What I want to do is pick out certain tags and strip them down to certain attributes. Here's an example (the following is only an illustration, not working code):

<img class="awesome" src="http://www.thebestsiteonthefaceoftheplanet.com/image.jpg">

I know that I have to use something like:

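// s is a java.util.Scanner; yahoo is a java.net.URL for the page being read
s = new Scanner(new BufferedReader(new InputStreamReader(yahoo.openStream())));
String stuff;
while (s.hasNextLine())   // hasNextLine() is the matching check for nextLine()
{
    stuff = s.nextLine();
    System.out.println(stuff);
}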

This will barrel through the source (the HTML file) line by line, save each line to the string "stuff", and print it.

Now I need to find a way to pick out a target tag and get certain attributes. I want to detect when an img tag is encountered and harvest the attributes inside it, separating them into different strings. Using the tag shown above, I want to find an <img> tag with the class attribute "awesome" and get its src attribute.
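As a rough illustration of that goal, here is a minimal sketch that uses only java.util.regex. The class name ImgAttributeSketch and the methods parseAttributes and findSrcByClass are made up for this example. It assumes attribute values are quoted, that no value contains a ">" character, and that the class attribute holds exactly one class name. Regular expressions are not a full HTML parser, so a dedicated parsing library such as jsoup would likely be more robust on real pages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class ImgAttributeSketch {
    // Matches a whole <img ...> tag; group 1 is the text between "<img" and ">".
    // Assumption: no attribute value contains a '>' character.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" or name='value'; group 1 = name, group 3 = value.
    // Assumption: every attribute value is quoted.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w-]+)\\s*=\\s*([\"'])(.*?)\\2");

    // Splits the attribute text of one tag into a name -> value map.
    static Map<String, String> parseAttributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            attrs.put(m.group(1).toLowerCase(), m.group(3));
        }
        return attrs;
    }

    // Collects the src of every <img> whose class attribute equals wantedClass.
    static List<String> findSrcByClass(String html, String wantedClass) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_TAG.matcher(html);
        while (m.find()) {
            Map<String, String> attrs = parseAttributes(m.group(1));
            if (wantedClass.equals(attrs.get("class")) && attrs.containsKey("src")) {
                sources.add(attrs.get("src"));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p>"
                + "<img class=\"awesome\" src=\"http://www.thebestsiteonthefaceoftheplanet.com/image.jpg\">"
                + "<img class=\"boring\" src=\"other.png\">";
        // prints [http://www.thebestsiteonthefaceoftheplanet.com/image.jpg]
        System.out.println(findSrcByClass(html, "awesome"));
    }
}
```

Because a tag can be split across several lines of the source, this sketch is meant to be run on the whole page text (for example, the lines appended into one String) rather than on each line separately.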

I think I'm making this more difficult than it needs to be. There might be a simpler way of doing this that I'm not seeing. Also, once I'm done, I need to do something else that's a bit more complex, but I'm taking baby steps right now. Does anyone know whether I should keep on this line of thinking, or is there a better way to do this?

java
