*Note: its not a fully functional hashTable for technical purposes
My priority is to get it to store the strings properly. I approached this a different
yet similar way to my other post and I have gotten much farther! The problem now is that my output prints duplicates each and table. More confounding is why only these words don't output properly. I will provide the source so you can try it out for yourself.
I compiled with visual c++ 2008 & 05 and received same output. My next goal would be to
implement it as an ADT, but for now solving this issue is crucial.
Once again thanks for any help I receive.

#include <iostream>
#include <string>
#include <fstream>
#include <cctype>
#include <iomanip>
using namespace std;
//function prototypes
int hashFunction(const char *key, int keyLenght, const int HTSIZE); 
void LinearProbing(int upindex, string words[], string item ,int noitems[]);
string StringToLower(string strToConvert);

int main()
{
	string getString;			//used to read the strings
	const int HTSIZE = 131;		//Holds up to 130 words
	string hashTable[HTSIZE];	//stores unique words
	const char *cphashTable;    //holds c-string value for processing
	int IndexStatusList[131];   //holds the number of times a word is inserted at its location
	ifstream DataIn;			//filestream object
	int size = 0;				//no use yet
	int lenght;					//stores lenght of a string
	int hIndex;					//stores the index in where to store a string

	//will be used as a constructor when implemented as ADT
	//initializes both arrays. The first for empty strings, the other for ints
	for(int c = 0; c < HTSIZE; c++)
	{
		hashTable[c] = " ";          
		IndexStatusList[c] = 0;		
	}

	DataIn.open("blah.txt");                  //Fail Safe test
	if(!DataIn)
	{
		cout <<"Error opening data file\n";
	}
	else //Sucess
	{
		while(DataIn >> getString)           //Gets the first string and reads until
		{									 //there is no more data to be read
			lenght = static_cast<int>(getString.length());	//gets lenght of a string to be used for processing
			cphashTable = getString.data();                 //returns a c-string copy of a string
			hIndex = hashFunction(cphashTable,lenght,HTSIZE); //computes the index. see def below
			LinearProbing(hIndex,hashTable, cphashTable, IndexStatusList);   //cphashTable //getString
		}																	 //tried both no difference in results
	}

	cout << "Count\t" << "Index\t" << "WORD\n";   //Title of Columns

	for(int i = 0; i < HTSIZE;i++)               // Outputs number of occurences followed by
	{											 // the location of the word in the hashtable				
		if(hashTable[i] != " ")					 // the content of that location e.g cat
		{
			cout << IndexStatusList[i] << "\t" << i << "\t" << hashTable[i] << endl ;
		}
	}

	return 0;
} //end of main
/* Def of Linear Probring
Places the item, in this case a string in the appropriate location in the array.
The location hIndex is calculated by the function hashFunction and then passed
to this function as an argument. If the location to where hIndex points is empty
in the hashTable insert the item there and update its number of currence:(parralel array)
StatusListIndex. While the location is occupied find next available location by incrementing
the current location to one until an available space is found. My current list is less than
130 elements so no need to worry about space at this time. Then insert item at this new location 
and update the IndexStatusList. You also need to check if at a certain location it is the same
instance of that item, to update its IndexStatusList. Eg, cat is inserted into location 1, 
IndexStatusList is 1. Then another cat is found just update the IndexStatusList by one rather than
inserting it in a new location. Cat still at loc 1 IndexStatusList is 2
*/
void LinearProbing(int upindex, string words[], string item ,int noitems[])
{
	item = StringToLower(item); //i tried this in hopes of being able to compare both
	             				//strings properly but im not so sure.
	if(item == words[upindex])
	{
		int inc = noitems[upindex]; //this section takes care of duplicates
		inc++;
		noitems[upindex] = inc;
	}
	else if(words[upindex] == " ")
	{
	words[upindex] = item;             //if location is empty
	noitems[upindex] = 1;
	}
	else
	{
		while(noitems[upindex] != 0)
		{
			upindex = ((upindex + 1) % 131);     //collision
		}
		words[upindex] = item;
		noitems[upindex] = 1;
	}


}
/*
Adds the values of each character in the string then % it with  HTSIZe which
is to be used as hIndex. There is no distinction between lowercase and uppercase
A is converted to the ascii equivalent of a
ex string ab = 97+98 then lets say htsize was 3
hIndex would then be 0.
*/
int hashFunction(const char *key, int keyLenght, const int HTSIZE)
{
	int sum = 0;
	for(int j =0; j < keyLenght ; j++)
	{
		sum = sum + static_cast<int>(tolower(key[j]));
		
	}
	return sum = sum % HTSIZE;
}
//change each element of the string to lower case
string StringToLower(string strToConvert)
{
	for(unsigned int i=0;i<strToConvert.length();i++)
	{
      strToConvert[i] = tolower(strToConvert[i]);
	}
   return strToConvert;//return the converted string
}

1. Let's suppose that you have a collision with word "table". The LinearProbing finds a new free location then places the word into this cell. But the next occurence of the word "table" gets collision again - and linearProbing finds the next free location, it does not check if the word "table" was placed in the hash table! That's why you have next occurences of the same word in resulting hash table. Don't forget to wrap around search index in a new (correct) linearProbing loop...
2. DataIn >> getString gets not a word as a sequence of letters but all chars until whitespace occured. That's why you have such items as "539," or "word" and "word.". Add a simple item preprocessing (use isalpha() from <cctype>, for example) to select true words only.
3. Use c_str() string class member functions, not data(), The last one does not garantee that it's C-string with zero byte terminator.
4. Add hash table overfow check NOW! Never ever say that "My current list is less than
130 elements so no need to worry about space at this time" !!!
5. It seems you are writing a program in C++. Where is a class (at least a struct) HashTable? Why you have wrote obviously C-style code in C++? Why linearProbe does not know about hash table capacity?..

I think you are capable to correct all these defects yourself.
Good luck!

1. Let's suppose that you have a collision with word "table". The LinearProbing finds a new free location then places the word into this cell. But the next occurence of the word "table" gets collision again - and linearProbing finds the next free location, it does not check if the word "table" was placed in the hash table! That's why you have next occurences of the same word in resulting hash table. Don't forget to wrap around search index in a new (correct) linearProbing loop...
2. DataIn >> getString gets not a word as a sequence of letters but all chars until whitespace occured. That's why you have such items as "539," or "word" and "word.". Add a simple item preprocessing (use isalpha() from <cctype>, for example) to select true words only.
3. Use c_str() string class member functions, not data(), The last one does not garantee that it's C-string with zero byte terminator.
4. Add hash table overfow check NOW! Never ever say that "My current list is less than
130 elements so no need to worry about space at this time" !!!
5. It seems you are writing a program in C++. Where is a class (at least a struct) HashTable? Why you have wrote obviously C-style code in C++? Why linearProbe does not know about hash table capacity?..

I think you are capable to correct all these defects yourself.
Good luck!

I seemed to have finally solved my problems!
1) My code WAS checking to see if the item was already in the table. The problem came in the logic. When a duplicate item was found, it had no choice but to think collision was occurring because it first tested to see if the location in my string array was empty but not equal to it.
2) I didn't specify, for the purpose of this program digits are to be taken as strings not integers.
3) I knew strings could be compared at a character per character basis however, i was trying to convert an array of strings which would subsequently point to the next string rather than a single character in that string. My mistake ^_^ so the need for c_str() or data() was removed from my code with a simple loop. I do agree with you using c_str() as to assert a null string terminator is included.
4)Its good habit to take care of any business such as enough memory allocated for a variable however, i was not required to do so since were short on time. That explains why linearprobing doesnt take HTSIZE as an argument.
5)My previous code was written in c++ only a few functions were in c-style.
6)My updated code is now an Abstract Data Type

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.