Hi,
I would like some help with a problem I have.
I want to write a program that tokenizes the text of an input file and creates a new file with all the words (one word per line).
The input file also contains numbers, HTML tags like <p id=#>, and Roman numerals like I. II. III., and I would like these to be left out of the output file.
In my code I have implemented the FileReader and FileWriter.
I also know that I may have to use StringTokenizer, but I don't know how to continue . . . :-(
Could anyone help me?
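import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StreamTokenizer;

public class TestStreamTokenizer
{
    private FileReader _fileReader;
    private FileWriter _fileWriter;
    private PrintWriter _printWriter;

    public static void main(String[] arg) throws IOException
    {
        new TestStreamTokenizer().testInOut(arg[0], arg[1]);
    }

    private void createReadWriteStreams(String inFName, String outFName) throws IOException
    {
        _fileReader = new FileReader(inFName);
        _fileWriter = new FileWriter(outFName);
        _printWriter = new PrintWriter(_fileWriter);
    }

    public void testInOut(String inFName, String outFName) throws IOException
    {
        createReadWriteStreams(inFName, outFName);
        StreamTokenizer tokenizer = new StreamTokenizer(_fileReader);
        tokenizer.eolIsSignificant(true);
        int nextTok = tokenizer.nextToken();
        while (StreamTokenizer.TT_EOF != nextTok)
        {
            // ........................
            // I don't know how to do this part???
            nextTok = tokenizer.nextToken(); // advance, otherwise the loop never ends
        }
        _printWriter.close(); // flush and close the output file
        _fileReader.close();
    }
}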
Thanks a lot
P.S.: My input file is attached.
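
One possible way to fill in the loop, offered as a minimal sketch and not a definitive answer: reset the StreamTokenizer syntax so that only ASCII letters form words, track whether the current token sits between '<' and '>', and drop a token made only of Roman-numeral letters when it is directly followed by a period. The class name WordExtractor, the method extractWords, and the sample string are hypothetical and used only for illustration; the attached input file was not available, so the rules are assumptions based on the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original program.
class WordExtractor {

    // Returns the words found in the input, skipping digits, anything
    // between '<' and '>', and Roman numerals followed by a period.
    static List<String> extractWords(Reader in) {
        StreamTokenizer tokenizer = new StreamTokenizer(in);
        tokenizer.resetSyntax();            // every character becomes "ordinary"
        tokenizer.wordChars('a', 'z');      // assumption: only ASCII letters
        tokenizer.wordChars('A', 'Z');      // are configured as word characters
        tokenizer.whitespaceChars(0, ' ');  // control characters and space

        List<String> words = new ArrayList<>();
        boolean insideTag = false;
        try {
            int tok;
            while ((tok = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
                if (tok == '<') {
                    insideTag = true;       // start of an HTML tag
                } else if (tok == '>') {
                    insideTag = false;      // end of an HTML tag
                } else if (tok == StreamTokenizer.TT_WORD && !insideTag) {
                    String word = tokenizer.sval;
                    if (word.matches("[IVXLCDM]+")) {
                        // Looks like a Roman numeral: peek at the next token.
                        if (tokenizer.nextToken() == '.') {
                            continue;       // "I." / "II." style numbering, drop it
                        }
                        tokenizer.pushBack(); // not numbering, keep the word
                    }
                    words.add(word);
                }
                // Digits and punctuation arrive as ordinary characters
                // and are ignored.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the attached file.
        String sample = "<p id=1>I. Hello world 42 times</p>\nII. Second line";
        for (String word : extractWords(new StringReader(sample))) {
            System.out.println(word);       // one word per line
        }
    }
}
```

In the original program the same loop could read from _fileReader and call _printWriter.println(word) instead of collecting into a list. Two caveats with this sketch: a sentence that ends in "I." loses the pronoun, and an apostrophe splits a word such as "don't" into two tokens.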