I'm compiling an extremely large list of usernames, and I want to know which is a faster method of checking what is already in the list.
If anyone can give some insight as to how Python deals with each that would be much appreciated!
I'm compiling an extremely large list of usernames, and I want to know which is a faster method of checking what is already in the list.
If anyone can give some insight as to how Python deals with each that would be much appreciated!
If you want to check if the username is present, the easiest thing to do is:
usern=['sravan953','Dan08','SuperMetroid','vegaseat']
'SuperMetroid' in usern
"""
>> True
"""
Is that the most efficient for an extremely big list?
I really want to know what is going on behind the scenes..
And what would be fastest in Big O notation.
I don't know exactly what you want to compare, but here is a code which measures the time necessary to execute 1,000,000 times a dictionary lookup (the statement '7498' in D
)
from timeit import Timer
D = dict()
for i in range(10000):
D[str(i)] = i
print(Timer("'7498' in D", "from __main__ import D").timeit())
For your problem, I would choose a dictionary lookup over other methods.
Dictionary key searches are highly optimized, since Python itself uses dictionaries internally. There are entire articles published that recommend converting a long list into a dictionary for fast searches. Still faster than a list search even with the time it takes to convert.
I remember seeing one of these articles in:
http://code.activestate.com/recipes/langs/python/
I got curious, so I did a little test run ...
# use Python3's dictionary comprehension to
# speed up list searches
#
# the time it takes to create the dictionary
# will be regained after about 6 searches
# as list size increases this goes down to > 1 search
# tested with Python 3.1.1 vegaseat
import timeit
data = """\
Bill
Brutus
Daphne
Dorky
Al
Kristin
Cloe
Carlos
Pete
Pheobe
Jerry
Jack
Justin
John
Julie
Joe
Moe
Theo
Albert
Alberto
Pauline
Paula
Christopher
Gisela
Lamar
Donna
Demitrius
Frank
Heidi
Margot
Cindy
Doris
Harry
Larry
Dilbert
Mary
Robert
Sophia
Samuel
Candy
Tawny
Terry
Markus
Veronika
Norbert
Zoe
Udo"""
# create a list of names from the data
mylist = data.split('\n')
# create a dictionary with name:zero pairs from the list
mylist_d = {name:0 for name in mylist}
# search for 'Udo' is the last item in the list and dictionary
statement = "'Udo' in mylist"
setup = "from __main__ import mylist"
t = timeit.Timer(statement, setup)
# doing 1000000 passes (default) gives microseconds per pass
elapsed = t.timeit()
sf = "Code %-20s takes %0.3f micro-seconds/pass"
print( sf % (statement, elapsed))
statement = "'Udo' in mylist_d"
setup = "from __main__ import mylist_d"
t = timeit.Timer(statement, setup)
elapsed = t.timeit()
sf = "Code %-20s takes %0.3f micro-seconds/pass"
print( sf % (statement, elapsed))
print('-'*60)
statement = "{name:0 for name in mylist}"
setup = "from __main__ import mylist"
t = timeit.Timer(statement, setup)
elapsed = t.timeit()
sf = "Code %-20s takes %0.3f micro-seconds/pass"
print( sf % (statement, elapsed))
print('-'*60)
# optional tests to show the last 4 items in list and dictionary
print("Last 4 items in list and dictionary:")
print(mylist[-4:])
print(list(mylist_d.keys())[-4:])
"""my result -->
Code 'Udo' in mylist takes 2.193 micro-seconds/pass
Code 'Udo' in mylist_d takes 0.182 micro-seconds/pass
------------------------------------------------------------
Code {name:0 for name in mylist} takes 11.151 micro-seconds/pass
------------------------------------------------------------
Last 4 items in list and dictionary:
['Veronika', 'Norbert', 'Zoe', 'Udo']
['Dilbert', 'Julie', 'Al', 'Udo']
"""
We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.