I am writing a program that reads a text file chosen by the user and displays all of its words, excluding duplicates, in ascending order. The code runs and displays the words; the only problem is punctuation. I tested it with a sample from a fanfiction I wrote a long time ago, and the output includes the quotation marks as well as the three periods that I use to denote a pause in speech. For example, one of the sentences is "Jackson...help them!", and the sorted output contains '"Jackson...help' as a single entry. Hyphenated words such as 'color-blind' are also displayed as one word, with the - still in place, and clipped words such as goin' (short for going) are displayed as goin' instead of goin. I'm fairly certain the problem lies in my split statement, but I can't find it. Is the problem in the split statement, or elsewhere? Thanks in advance for any help.
import java.io.*;
import java.util.*;

public class ProblemOne {
    public static void main(String[] args) throws IOException {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter a text file, from which all words(excluding their duplicates) are sorted in alphabetical order." + "\nNote! You must input the ENTIRE FILE LOCATION." + "\nExample: C:/Users/Username/File Location/Filename.txt" );
        String text = input.nextLine();
        File file = new File(text);
        if (!file.exists()) {
            System.out.println("The input file you specified either does not exist or is not in the designated location.");
            return; // nothing to read, so stop here
        }
        System.out.println("After sorting, and removing duplicates, the words from the text sorted in alphabetical order are: ");
        BufferedReader wordRead = null;
        Set<String> noDuplicates = new TreeSet<String>(); // keeps each word once, in sorted order
        String[] words;
        try {
            wordRead = new BufferedReader(new FileReader(text));
            String nextSentence;
            while ((nextSentence = wordRead.readLine()) != null) {
                // the split statement I suspect
                words = nextSentence.split("[ \n \t \r . , ; : ' - ... ! ? ( ) { } ]");
                for (int i = 0; i < words.length; i++) {
                    noDuplicates.add(words[i]);
                }
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (wordRead != null) {
                wordRead.close(); // closes the reader, if it was opened
            }
        }
        List<String> noDuplicateWords = new ArrayList<String>(noDuplicates);
        Collections.sort(noDuplicateWords);
        for (String word : noDuplicateWords) {
            System.out.println(word);
        }
    }
}
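
To narrow this down, here is a minimal sketch that isolates the split step from the file handling. The names `SplitDemo`, `splitOriginal`, `splitOnNonLetters`, and `nonEmpty` are hypothetical and do not appear in the program above. As far as I can tell, the character class in the program never lists the double quote, and the ` - ` in the middle sits between two spaces, so the regex engine reads it as a range from space to space rather than as a literal hyphen. The second method is only one possible alternative (treat every run of non-letter characters as a separator), not a confirmed fix.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    // The character class from the program above, copied unchanged.
    static final String ORIGINAL = "[ \n \t \r . , ; : ' - ... ! ? ( ) { } ]";

    // Splits a line with the original pattern and drops empty tokens.
    static List<String> splitOriginal(String line) {
        return nonEmpty(line.split(ORIGINAL));
    }

    // Hypothetical alternative (an assumption, not the original code):
    // any run of characters that are not Unicode letters is a separator.
    static List<String> splitOnNonLetters(String line) {
        return nonEmpty(line.split("[^\\p{L}]+"));
    }

    // String.split can return empty strings (for example when the line
    // starts with a separator), so filter them out here.
    private static List<String> nonEmpty(String[] parts) {
        List<String> result = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "\"Jackson...help them!\" color-blind goin'";
        System.out.println("original pattern:   " + splitOriginal(sample));
        System.out.println("non-letter pattern: " + splitOnNonLetters(sample));
    }
}
```

With this plain-ASCII sample, the original pattern still leaves the double quote attached to Jackson and keeps color-blind in one piece, while the non-letter pattern yields only the bare words. If the file was saved from a word processor it may also contain typographic characters such as curly quotes or a single-character ellipsis, which the original class does not list either; that part is a guess about the sample file.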