You will create a program that determines the language of a given input file based on the words in that file. The structure of the input files will be one word per line (all lowercase letters). You should read these words in one LETTER at a time.
Each input file will contain about 100000 letters, and each file will be in a different language.
The way you are able to tell the languages apart is as follows:
English: The three most frequent letters are e, i, and a, in that order.
Danish: The three most frequent letters are e, r, and n, in that order.
Italian: The letters j, x, y do not exist in Italian (other than proper nouns, which are not present in the input files).
Your program MUST include 5 functions in addition to your main() function:
- A void function that takes an array of size 100000, where each element is a letter from the input file, and a second array of size 26 (the size of the alphabet), where each element holds the frequency of occurrence of one letter (how often that letter shows up in the first array). This function fills the second (occurrence) array. The occurrence should be a percentage, that is, the letter's count divided by the total number of letters, multiplied by 100. HINT: The following code will be helpful when trying to determine which array position in the second array to increment:
temp_char = a[i]; // array a is the array of size 100000, with all letters
b[temp_char - 'a']++; // b is the array of size 26
- A void function that will initialize an array of characters of size 26 so that element 0 = ‘a’, and so on until element 25 = ‘z’. HINT: The following code will be helpful:
alpha[i] = (char)(i + 97); // where alpha is the array of type char and i is the element index
// this type-casts the sum to a char; 97 is the ASCII decimal code for 'a'
- A void function that sorts two arrays in parallel, both of size 26. One array will have the occurrence of letters, the other will be the char array of the alphabet. You want to sort in decreasing order so that the highest frequency is the first element (in one array) and its corresponding letter is also the first element (in the other array).
- A value-returning function that returns the percentage of occurrence of a specific letter that is passed into the function. You will use this function to determine if the occurrence of j, x, y is zero. You may call it in other locations/situations if you wish.
- A void function that takes in the occurrence array and the alphabet array and determines which language the file contains, based on the above assumptions of the languages. It will print out to standard output the language that was determined; if it cannot determine the input to be one of the above languages then it should just print out “the language cannot be determined”.
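The first three functions in the list above could be sketched roughly as follows. This is one possible shape, not the required solution. The names fillOccurrence, initAlphabet, and sortParallel are made up for illustration, the extra count parameter is an assumption (added because a file may not hold exactly 100000 letters), and selection sort is just one simple choice of sorting method.

```cpp
#include <cassert>

const int ALPHABET_SIZE = 26;

// Fill freq[0..25] with the percentage of each letter among the first
// `count` entries of letters[]. Assumes every entry is a lowercase a-z.
// The `count` parameter is an assumption; the handout only names two arrays.
void fillOccurrence(const char letters[], int count, double freq[]) {
    assert(count >= 0);
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        freq[i] = 0.0;
    }
    for (int i = 0; i < count; i++) {
        freq[letters[i] - 'a']++;            // tally raw counts first
    }
    if (count == 0) {
        return;                              // avoid dividing by zero
    }
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        freq[i] = freq[i] * 100.0 / count;   // convert counts to percent
    }
}

// Set alpha[0] = 'a' through alpha[25] = 'z'.
void initAlphabet(char alpha[]) {
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        alpha[i] = static_cast<char>('a' + i);
    }
}

// Selection sort in decreasing order of frequency. Every swap in freq[]
// is mirrored in alpha[] so each letter stays beside its percentage.
void sortParallel(double freq[], char alpha[]) {
    for (int i = 0; i < ALPHABET_SIZE - 1; i++) {
        int maxIdx = i;
        for (int j = i + 1; j < ALPHABET_SIZE; j++) {
            if (freq[j] > freq[maxIdx]) {
                maxIdx = j;
            }
        }
        double tmpFreq = freq[i];
        freq[i] = freq[maxIdx];
        freq[maxIdx] = tmpFreq;
        char tmpChar = alpha[i];
        alpha[i] = alpha[maxIdx];
        alpha[maxIdx] = tmpChar;
    }
}
```

Calling fillOccurrence and initAlphabet before sortParallel matters in this sketch: the sort assumes that freq[i] and alpha[i] already describe the same letter.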
Your main function will open an input file and read the letters into an array. Call the functions above to first get the occurrence as a percent for each letter, and then determine the language used.
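The lookup and language-determination functions that main calls last might look something like the sketch below. Several things here are assumptions rather than requirements: the names occurrenceOf, languageName, and determineLanguage, the order in which the three rules are checked (English, then Danish, then Italian), and the helper languageName, which exists only so the decision can be checked separately from the printing.

```cpp
#include <cassert>
#include <iostream>
#include <string>

const int ALPHABET_SIZE = 26;

// Return the percentage stored for `letter`. The arrays may already be
// sorted, so the letter is located by searching alpha[], not by index math.
double occurrenceOf(char letter, const double freq[], const char alpha[]) {
    assert(letter >= 'a' && letter <= 'z');
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        if (alpha[i] == letter) {
            return freq[i];
        }
    }
    return 0.0;  // letter not found in alpha[]
}

// Hypothetical helper: decide the language from arrays that are already
// sorted in decreasing order. The rule order is an assumption.
std::string languageName(const double freq[], const char alpha[]) {
    if (alpha[0] == 'e' && alpha[1] == 'i' && alpha[2] == 'a') {
        return "English";
    }
    if (alpha[0] == 'e' && alpha[1] == 'r' && alpha[2] == 'n') {
        return "Danish";
    }
    if (occurrenceOf('j', freq, alpha) == 0.0 &&
        occurrenceOf('x', freq, alpha) == 0.0 &&
        occurrenceOf('y', freq, alpha) == 0.0) {
        return "Italian";
    }
    return "the language cannot be determined";
}

// The void function from the list: print the result to standard output.
void determineLanguage(const double freq[], const char alpha[]) {
    std::cout << languageName(freq, alpha) << std::endl;
}
```

Comparing a percentage to exactly 0.0 is safe in this sketch only because a letter that never appears keeps the value 0.0 it was initialized with.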
Also, include the following for testing:
• An if block based on a boolean flag
• If the bool is set to true it will print out the occurrence of each letter (sorted) using 2 columns: letters and frequency
• If the bool is set to false, this printing will not happen
• You will be setting the bool (manually, for each run).
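One way the flag-controlled printing could be arranged is sketched below. The names formatTable and printIfEnabled, the column width, and the two-decimal formatting are illustrative assumptions; the handout only asks for two columns behind an if block.

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

const int ALPHABET_SIZE = 26;

// Build the two-column table as text. Expects both arrays to be sorted
// already, so row i pairs alpha[i] with freq[i].
std::string formatTable(const double freq[], const char alpha[]) {
    assert(freq && alpha);
    std::ostringstream out;
    out << "Letter  Frequency (%)\n";
    out << std::fixed << std::setprecision(2);
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        out << alpha[i] << std::setw(12) << freq[i] << '\n';
    }
    return out.str();
}

// The if block driven by the manually set flag: print only when it is true.
void printIfEnabled(bool showTable, const double freq[], const char alpha[]) {
    if (showTable) {
        std::cout << formatTable(freq, alpha);
    }
}
```

In main, a constant such as `const bool SHOW_TABLE = true;` (a hypothetical name) could be passed as the first argument and changed by hand before each run.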
Make sure you use appropriate data types and that your submission follows the assignment guidelines.
Note: The input files may have 100000 letters, or they may have slightly fewer or slightly more. Therefore, you should use appropriate file-handling procedures.
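A bounded read loop is one way to cope with files of uncertain length. The sketch below is an assumption about how that might be done, not a prescribed method: the name readLetters, the capacity parameter, the choice to return -1 when the file cannot be opened, and the decision to skip anything that is not a lowercase letter (such as newlines) are all illustrative.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read the file one character at a time and store only lowercase letters.
// Stops at end of file or when `capacity` letters have been stored,
// whichever comes first, so the array is never overrun.
// Returns the number of letters stored, or -1 if the file cannot be opened.
int readLetters(const std::string& path, char letters[], int capacity) {
    assert(capacity >= 0);
    std::ifstream in(path.c_str());
    if (!in) {
        return -1;                     // open failed
    }
    int count = 0;
    char ch;
    while (count < capacity && in.get(ch)) {
        if (ch >= 'a' && ch <= 'z') {  // skip newlines and other characters
            letters[count] = ch;
            count++;
        }
    }
    return count;
}
```

The returned count, rather than a fixed 100000, would then be the total used when converting letter counts to percentages. If the array has exactly 100000 elements, any letters beyond that are simply not read in this sketch.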
Part A: Write the C++ source code, compile the program and fix syntax errors.
Part B: Run the program 4 times:
1) input file is input_1.txt and the bool flag is off
2) input file is input_2.txt and the bool flag is on
3) input file is input_3.txt and the bool flag is off
4) input file is input_4.txt and the bool flag is on