Hello
I am new to Java and I am looking for built-in ways to parse HTML. Can you suggest some classes in the Java Standard Library?
I have extended HTMLEditorKit.ParserCallback, but parsing a web page's source code literally takes 2 minutes or more! Some of that is down to my internet connection, which is only downloading at about 9kb per second right now, but never mind that.
Here is what I have done with HTMLEditorKit.ParserCallback (I have overridden the handleStartTag() and handleText() methods):
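import java.util.Vector;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// SearchResult, SearchEngine and View are my own classes (not shown here).
public class SearchResultGatherer extends HTMLEditorKit.ParserCallback implements Runnable
{
    /// Class Variables:
    int index = 0;
    private Vector<SearchResult> searchResults;
    private SearchEngine searchEngine;
    private View appView;
    private double searchTime;
    private int searchQuantity;

    /// Class Methods: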
    public SearchResultGatherer( Vector<SearchResult> _searchResults, SearchEngine _searchEngine, View _appView )
    {
        searchResults  = _searchResults;
        searchEngine   = _searchEngine;
        appView        = _appView;
        searchTime     = -1;
        searchQuantity = -1;

        Thread thisThread = new Thread( this );
        thisThread.start();
        // Maybe do
        // thisThread.invokeAndWait();
    }

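    // Called by the parser for every opening tag.
    @Override
    public void handleStartTag( HTML.Tag t, MutableAttributeSet a, int pos )
    {
        for ( int i = 0; i < searchEngine.targetElementInfo.length; i++ )
        {
            // targetElementInfo[i][0] is the tag name, [i][1] the expected attribute string
            if ( t.toString().equals( searchEngine.targetElementInfo[i][0] ) )
            {
                // Compares the string form of the whole attribute set
                if ( a.toString().equals( searchEngine.targetElementInfo[i][1] ) )
                {
                    searchEngine.targetElementIdentified = true;
                    return;
                }
                // Tag name matched but the attributes did not: for the second
                // entry, print what was found and accept the element anyway
                else if ( i == 1 )
                {
                    System.out.println( "Element = " + t.toString() );
                    System.out.println( "id = " + a.toString() );
                    searchEngine.targetElementIdentified = true;
                    return;
                }
            }
        }
    }
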
    // Called by the parser for every run of text between tags.
    @Override
    public void handleText( char[] data, int pos )
    {
        if ( searchEngine.targetElementIdentified )
        {
            System.out.println( data );
            // System.out.println( "String arg = " + new String( data ) );
            // System.out.println( "Int arg = " + pos );
            Object[] searchData = searchEngine.retrieveSearchData( data );
            searchTime     = Double.parseDouble( searchData[0].toString() );
            searchQuantity = Integer.parseInt( searchData[1].toString() );
            //searchEngine.targetElementIdentified = false;
        }
    }
.....
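
In case it helps anyone reproduce this, below is a minimal, self-contained sketch of the same callback pattern using only the standard library. It is not my actual class: TargetTextDemo, TargetTextCallback, extract() and the "resultStats" id are hypothetical names made up for illustration. It feeds an in-memory string to javax.swing.text.html.parser.ParserDelegator (so no network is involved) and matches the element by its id attribute through HTML.Attribute.ID rather than by comparing the string form of the whole attribute set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TargetTextDemo {

    /** Hypothetical callback: collects the text inside the first element with the given tag name and id. */
    static class TargetTextCallback extends HTMLEditorKit.ParserCallback {
        private final String tagName;
        private final String id;
        private final StringBuilder text = new StringBuilder();
        private boolean insideTarget = false;
        private boolean done = false;

        TargetTextCallback(String tagName, String id) {
            this.tagName = tagName;
            this.id = id;
        }

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (done || insideTarget) {
                return;
            }
            // Look up only the id attribute instead of stringifying every attribute.
            Object idValue = a.getAttribute(HTML.Attribute.ID);
            if (t.toString().equals(tagName) && id.equals(idValue)) {
                insideTarget = true;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (insideTarget) {
                text.append(data);
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            // Simplification: the first closing tag with the same name ends the capture,
            // so a nested element of the same type would cut it short.
            if (insideTarget && t.toString().equals(tagName)) {
                insideTarget = false;
                done = true;
            }
        }

        String getText() {
            return text.toString();
        }
    }

    /** Parses the given HTML string and returns the text of the matching element ("" if none). */
    static String extract(String html, String tagName, String id) {
        TargetTextCallback callback = new TargetTextCallback(tagName, id);
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.getText();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>skip me</p>"
                + "<div id=\"resultStats\">About 42 results</div></body></html>";
        System.out.println(extract(html, "div", "resultStats"));
    }
}
```

Running the parser on a string like this should make it easier to tell how much of the 2 minutes is the parsing itself and how much is the download.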