Hi all,
I'm learning Java through a class, and my instructor has asked me to write a word-count program using the Map interface. It should run from the command line and report the total number of words, the number of distinct words, and the distinct words themselves.
This is what I have right now:
import java.util.*;

public class WordCount {
    public static void main(String[] args) {
        // Defining total word count variable
        int totalCount = 0;
        // Setting up linked hash map so output will display words in order of appearance
        Map<String, Integer> textInput = new LinkedHashMap<String, Integer>();
        // Determining the number of distinct words and their frequencies of occurrence
        for (String a : args) {
            Integer freq = textInput.get(a);
            textInput.put(a, (freq == null) ? 1 : freq + 1);
        }
        // Determining the word count using the occurrence frequencies
        for (int value : textInput.values()) {
            if (value >= 2) {
                totalCount += value;
            } else {
                ++totalCount;
            }
        }
        // Determining correct grammar and printing word count results
        if (totalCount == 1) {
            // If there's only one word
            System.out.println("The total word count is " + totalCount + " word.");
            System.out.println("The word is " + textInput.keySet());
        } else if (totalCount == 0) {
            // If there are no words
            System.out.println("There are no words.");
        } else {
            // If there's more than one word
            System.out.println("The total word count is " + totalCount + " words.");
            System.out.println("There are " + textInput.size() + " different words.");
            System.out.println("The words are: " + textInput.keySet());
        }
    }
}
So far it works well except for one small hitch: it's case-sensitive, so words that differ only in letter case, such as "There" and "there", are counted as separate words.
Output using "This is a test sentence that this word count program needs to process" ("This" and "this" are the words to watch here):
The total word count is 13 words.
There are 13 different words.
The words are: [This, is, a, test, sentence, that, this, word, count, program, needs, to, process.]
Output using "This is a test sentence that This word count program needs to process" (both occurrences of "This" are capitalized here):
The total word count is 13 words.
There are 12 different words.
The words are: [This, is, a, test, sentence, that, word, count, program, needs, to, process.]
How can I make the program case-insensitive?
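For what it's worth, here is a minimal sketch of one idea I've been considering: since String keys in a Map are compared case-sensitively, lower-casing each word before using it as a key should make "This" and "this" land on the same entry. The class name CaseInsensitiveCount and the helper countWords are names I made up for illustration, and I'm assuming Java 8 or later for Map.merge. I'm not sure this is the best approach:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveCount {
    // Hypothetical helper: counts words, treating "This" and "this" as the same key.
    static Map<String, Integer> countWords(String[] words) {
        // LinkedHashMap keeps the words in order of first appearance
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            // Locale.ROOT keeps lower-casing independent of the machine's default locale
            String key = word.toLowerCase(Locale.ROOT);
            // Inserts 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(args);
        int total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        System.out.println("Total words: " + total);
        System.out.println("Distinct words: " + counts.size());
        System.out.println("Words: " + counts.keySet());
    }
}
```

One side effect of this sketch is that the words print in lower case ("this" instead of "This"), because the normalized form is what gets stored as the key.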
Thanks in advance!