Hi
I need to store some data for a project I am working on, and I am trying to figure out the fastest way to access it. The data takes the form:
keyword (std::string) - list of ints
e.g.:
yellow - 2342 2312 8478 3827 9773 4837 2893 0983 478 2981, etc
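A minimal C++ sketch of reading one line in the format above. It assumes each line holds one keyword, then a lone `-`, then whitespace-separated ints; `parseLine` is a hypothetical helper name. With the stream's default decimal base, a token like `0983` is read as 983.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses "keyword - int int int ..." into keyword + ints.
// Returns false if the keyword or the "-" separator is missing.
// Reading stops at the first token that is not an int.
bool parseLine(const std::string& line, std::string& keyword, std::vector<int>& ints) {
    std::istringstream in(line);
    std::string dash;
    if (!(in >> keyword >> dash) || dash != "-") {
        return false;
    }
    ints.clear();
    int value;
    while (in >> value) {
        ints.push_back(value);  // keeps the ints in file order
    }
    return true;
}
```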
I need to store this information in a memory-based cache. Size isn't really an issue: the machine this will run on has 8 GB of RAM, of which I can freely use about 2 GB.
I am thinking a B-tree would be best, with the ints for each keyword stored in a vector, e.g.:
Btree node yellow:
vector<int>
vector<int>
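For comparison, the closest standard-library container to this tree idea is probably `std::map`, which keeps keys sorted in a balanced search tree (in common implementations a red-black tree, not a B-tree; the C++ standard library has no B-tree). A minimal sketch, where `KeywordTree` and `findInts` are hypothetical names:

```cpp
#include <map>
#include <string>
#include <vector>

// Keyword -> ints, kept sorted by keyword in an ordered tree.
using KeywordTree = std::map<std::string, std::vector<int>>;

// Returns a pointer to the ints stored for `keyword`, or nullptr if absent.
// find() performs O(log n) string comparisons.
const std::vector<int>* findInts(const KeywordTree& tree, const std::string& keyword) {
    auto it = tree.find(keyword);
    return it == tree.end() ? nullptr : &it->second;
}
```

With about 2000 keywords that works out to roughly 11 string comparisons per lookup.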
More background: this is a search engine previously written in Perl, which ran into some obvious limitations once it started getting more than 10 hits a second. It is being reimplemented in C++ to remove some of Perl's limitations.
The data was previously stored in a large hash of arrays:
$hash{yellow} = [2342, 2312, 8478, 3827, 9773, 4837, 2893];
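The most direct C++ translation of a Perl hash of arrays is probably `std::unordered_map<std::string, std::vector<int>>`, which, like a Perl hash, gives average constant-time lookup. A sketch assuming roughly 2000 keywords are known up front; `makeIndex` and `addKeyword` are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Keyword -> ints, the C++ analogue of a Perl hash of array refs.
using KeywordIndex = std::unordered_map<std::string, std::vector<int>>;

// Hypothetical helper: creates an empty index with buckets pre-allocated
// for the expected number of keywords, avoiding rehashes while loading.
KeywordIndex makeIndex(std::size_t expectedKeywords) {
    KeywordIndex index;
    index.reserve(expectedKeywords);
    return index;
}

// Hypothetical helper: the equivalent of $hash{keyword} = [ ... ];
// Replaces any ints already stored under that keyword.
void addKeyword(KeywordIndex& index, const std::string& keyword, std::vector<int> ints) {
    index[keyword] = std::move(ints);
}
```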
So data was retrieved pretty quickly.
I am looking at about 2000 keywords and about 3000 ints, which can repeat across keywords. When a request is made, I need to be able to pull all the ints for the requested keyword.
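Assuming 4-byte ints, even the worst case where each of the ~2000 keywords held all ~3000 ints is 2000 * 3000 * 4 bytes, about 24 MB, far below the 2 GB budget, so storing repeated ints separately per keyword seems fine. A sketch of the per-request lookup that hands back the stored ints by const reference instead of copying them (`intsFor` is a hypothetical name):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using KeywordIndex = std::unordered_map<std::string, std::vector<int>>;

// Returns all ints stored for `keyword` without copying them.
// An unknown keyword yields a reference to a shared, empty vector.
const std::vector<int>& intsFor(const KeywordIndex& index, const std::string& keyword) {
    static const std::vector<int> empty;  // initialized once, thread-safe since C++11
    auto it = index.find(keyword);
    return it != index.end() ? it->second : empty;
}
```

Once loading has finished and nothing modifies the map, several request threads can call this concurrently without locking, because they only perform const reads.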
I should be able to figure out the code without too many problems; I am just looking for recommendations on the most efficient way to store this data for rapid retrieval. I read that a B-tree is the quickest when dealing with strings or character arrays (I can convert the data from std::string to char[] if need be), and that linked lists are better for smaller amounts of data. I guess I could use map<std::string, vector<int>>, though I am not sure how efficient that would be.
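Since the data is loaded once and then only read, one more option that may be worth benchmarking against `std::map` is a sorted `std::vector` of (keyword, ints) pairs searched with `std::lower_bound`. It keeps entries contiguous in memory, which tends to be cache friendly; for the same reason a `std::vector` usually beats a linked list for walking the ints. A sketch, where `SortedIndex` is a hypothetical name:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::vector<int>>;

// Hypothetical read-only index: entries are sorted by keyword once, up front.
class SortedIndex {
public:
    explicit SortedIndex(std::vector<Entry> entries) : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end(),
                  [](const Entry& a, const Entry& b) { return a.first < b.first; });
    }

    // Binary search by keyword; returns nullptr if the keyword is absent.
    const std::vector<int>* find(const std::string& keyword) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), keyword,
            [](const Entry& e, const std::string& k) { return e.first < k; });
        if (it == entries_.end() || it->first != keyword) {
            return nullptr;
        }
        return &it->second;
    }

private:
    std::vector<Entry> entries_;
};
```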
So if anyone has recommendations on the best methods, please share :)
Thanks
Ben