A membership test on a list is a linear scan: the program starts at the beginning of the list and compares each element in turn to the value it is looking for, repeating this for every element until it finds a match or exhausts the list. A dictionary instead hashes the key, which sorts entries into small groups (buckets, for lack of a better term), so each lookup needs only a handful of comparisons. Because the timing code below performs one lookup per item added, the dictionary's total time grows about tenfold for each tenfold increase in data and stays fast; the list's time grows 100-fold or more for the same increase, since n lookups that each scan up to n elements is O(n^2) work. One would expect similar growth from a tenfold increase in the size of each stored item, since that increases the amount of data each comparison has to traverse. So you can't really decide between a list and a dictionary on the number of items alone; you have to consider the total amount of data being searched.
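To make that concrete, here is a rough sketch of what each membership test has to do. This is illustrative only (CPython's real dict uses open addressing, not per-slot bucket lists as modeled here), but it shows why one test is O(n) and the other is O(1) on average:

def list_contains(a_list, x):
    """Roughly what `x in a_list` must do: compare element by element."""
    for item in a_list:          # worst case visits every element: O(n)
        if item == x:
            return True
    return False

def build_buckets(items, num_buckets=64):
    """Toy hash table: spread items across buckets by hash value."""
    buckets = [[] for _ in range(num_buckets)]
    for item in items:
        buckets[hash(item) % num_buckets].append(item)
    return buckets

def buckets_contain(buckets, x):
    """Roughly what `x in a_dict` does: hash once, scan one small bucket."""
    return x in buckets[hash(x) % len(buckets)]   # O(1) on average

data = list(range(1000))
buckets = build_buckets(data)
assert list_contains(data, 999) and buckets_contain(buckets, 999)

The point is that the list test's cost grows with the total number of elements, while the bucket test's cost depends only on how many items happen to share a slot.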
import datetime

def add_to_list(num):
    """Insert num integers into a list, testing membership first."""
    a_list = []
    start_time = datetime.datetime.now()
    for ctr in range(num):
        ## look up each time to show the "cost" of lookups
        if ctr not in a_list:        # linear scan of the whole list
            a_list.append(ctr)
    print("elapsed time:", datetime.datetime.now() - start_time)

def add_to_dict(num):
    """Insert num integers into a dict, testing membership first."""
    a_dict = {}
    start_time = datetime.datetime.now()
    for ctr in range(num):
        ## look up each time to show the "cost" of lookups
        if ctr not in a_dict:        # hash lookup, O(1) on average
            a_dict[ctr] = ctr
    print("elapsed time:", datetime.datetime.now() - start_time)

for num in [1000, 10000, 100000]:
    print("list of %6d" % num, end=" ")
    add_to_list(num)
    print("dict of %6d" % num, end=" ")
    add_to_dict(num)
    print()
""" compare execute times for list vs. dictionary for different
lengths of input data
list of 1000 elapsed time: 0:00:00.012220
dict of 1000 elapsed time: 0:00:00.000264 Approx 61 times faster
list of 10000 elapsed time: 0:00:01.210761
dict of 10000 elapsed time: 0:00:00.003146 Approx 390 times
list of 100000 elapsed time: 0:02:30.698131
dict of 100000 elapsed time: 0:00:00.03296 Approx 4566.7 times
"""