Hey all!
I'm new here, and I have a couple of questions:
How do I find the average word length, and how do I print out the number of occurrences of each word length?
The output needs to look like this:
length frequency
------ ---------
1 3
2 13
3 24
4 13
5 10
6 2
7 5
8 3
9 1
10 3
11 2
> 23 0
------ ---------
Average length = 4.2
And also, how do I format this:
abcdefghijklm
27111103397242502185
nopqrstuvwxyz
2526502130357110120
to match this:
a b c d e f g h i j k l m
27 1 11 10 33 9 7 24 25 0 2 18 5
n o p q r s t u v w x y z
25 26 5 0 21 30 35 7 1 10 1 2 0
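One possible way to get aligned columns is to give every letter and every count the same fixed width with String.format, instead of concatenating the raw values. The following is only a minimal sketch: the class name LetterTable, the helper formatRows, and the column width of 4 are assumptions made for illustration, and the counts are copied from the desired output above.

```java
class LetterTable {
    // Builds two rows (letters, then counts) for indices [from, to).
    // Every cell is right-aligned in a 4-character column (assumed width).
    static String formatRows(int[] counts, int from, int to) {
        StringBuilder letters = new StringBuilder();
        StringBuilder numbers = new StringBuilder();
        for (int i = from; i < to; i++) {
            letters.append(String.format("%4c", (char) ('a' + i)));
            numbers.append(String.format("%4d", counts[i]));
        }
        return letters + "\n" + numbers + "\n";
    }

    public static void main(String[] args) {
        // Letter counts taken from the desired output in the post.
        int[] counts = {27, 1, 11, 10, 33, 9, 7, 24, 25, 0, 2, 18, 5,
                        25, 26, 5, 0, 21, 30, 35, 7, 1, 10, 1, 2, 0};
        System.out.print(formatRows(counts, 0, 13));   // a..m
        System.out.print(formatRows(counts, 13, 26));  // n..z
    }
}
```

If left-aligned cells are wanted instead, "%-4c" and "%-4d" should do that. In a class like the one posted, the same idea would presumably replace the plain += concatenation of the counts.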
I have a main class, but it's not relevant here. Here's my instantiable class, where all the calculations occur:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class TextStatistics {
    private Scanner in;
    private StringTokenizer ST;
    private int[] chr = new int[26];
    private int lines = 0;
    private int words = 0;
    private int characters = 0;
    private String temp;
    private String temp1 = "";
    private String temp2 = "";
    private String temp3 = "";
    private String temp4 = "";
    private char ch;
    private double avg = 0.0;
    private int count;

    public TextStatistics(String args) {
        try {
            in = new Scanner(new File(args));
            while (in.hasNext()) {
                temp = in.nextLine().toLowerCase();
                ST = new StringTokenizer(temp,
                        " , .;:'\"&!?-_\n\t12345678910[]{}()@#$%^*/+-");
                lines++;
                words += ST.countTokens();
                count = temp.length();
                characters = characters + count;
                for (int c = 0; c < temp.length(); c++) {
                    ch = temp.charAt(c);
                    if (ch >= 'a' && ch <= 'z') {
                        chr[ch - 'a']++;
                    }
                }
            }
        } catch (FileNotFoundException e) {
            System.err.print("TextStatistics: File " + args
                    + " cannot be found");
        }
        for (int c = 0; c < 13; c++) {
            temp1 += (char) (c + 'a');
            temp2 += chr[c];
        }
        for (int c = 13; c < 26; c++) {
            temp3 += (char) (c + 'a');
            temp4 += chr[c];
        }
    }

    public int getLines() {
        return lines;
    }

    public int getWords() {
        return words;
    }

    public int getCharacters() {
        return characters;
    }

    @Override
    public String toString() {
        return "=============================================================\n"
                + lines
                + " lines\n"
                + words
                + " words\n"
                + characters
                + " characters\n"
                + "---------------------------------------\n"
                + temp1
                + "\n"
                + temp2
                + '\n'
                + temp3
                + "\n"
                + temp4
                + "\n"
                + "---------------------------------------\n"
                + "length frequency\n"
                + "------ ---------\n"
                + "------ ---------\n"
                + "Average length = "
                + avg
                + "\n"
                + "=============================================================\n";
    }
}
Here's my current output:
Statistics for testfile.txt
=============================================================
11 lines
79 words
458 characters
---------------------------------------
abcdefghijklm
27111103397242502185
nopqrstuvwxyz
2526502130357110120
---------------------------------------
length frequency
------ ---------
------ ---------
Average length = 0.0
=============================================================
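The length table is empty and the average stays at 0.0 because the class only counts tokens and never looks at the length of each one. A rough sketch of one way to collect that information follows. The class name WordLengths, its methods, and the sample input are hypothetical and not part of the posted code; the cap of 23 is taken from the "> 23" row in the desired output, and skipping rows with a zero count is a guess based on that sample.

```java
import java.util.StringTokenizer;

class WordLengths {
    // Assumed cap: longer words share one overflow row, like the "> 23" row in the post.
    static final int MAX_LENGTH = 23;

    // freq[n] holds the number of words of length n (1..MAX_LENGTH);
    // freq[MAX_LENGTH + 1] holds the number of longer words.
    private final int[] freq = new int[MAX_LENGTH + 2];
    private int words = 0;
    private int totalLength = 0;

    // Tokenizes one line and records the length of every word in it.
    void addLine(String line, String delimiters) {
        StringTokenizer st = new StringTokenizer(line, delimiters);
        while (st.hasMoreTokens()) {
            int len = st.nextToken().length();
            freq[Math.min(len, MAX_LENGTH + 1)]++;
            totalLength += len;
            words++;
        }
    }

    // Mean word length; the cast avoids integer division.
    double average() {
        return words == 0 ? 0.0 : (double) totalLength / words;
    }

    // Builds the table; skipping empty rows is a guess based on the sample output.
    String table() {
        StringBuilder sb = new StringBuilder();
        sb.append("length frequency\n");
        sb.append("------ ---------\n");
        for (int n = 1; n <= MAX_LENGTH; n++) {
            if (freq[n] > 0) {
                sb.append(String.format("%6d %9d\n", n, freq[n]));
            }
        }
        sb.append(String.format("%6s %9d\n", "> " + MAX_LENGTH, freq[MAX_LENGTH + 1]));
        sb.append("------ ---------\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        WordLengths stats = new WordLengths();
        // Hypothetical input line and delimiter set, for illustration only.
        stats.addLine("the quick brown fox jumps over the lazy dog", " ,.;:!?\t");
        System.out.print(stats.table());
        System.out.printf("Average length = %.1f%n", stats.average());
    }
}
```

In the posted class, the same bookkeeping could probably live inside the existing while loop by walking the tokens with hasMoreTokens() and nextToken() rather than only calling countTokens(), and toString() could then append the table and format the average with "%.1f".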