This program uses Python module re for splitting a text file into words and removing some common punctuation marks. The word:frequency dictionary is then formed using try/except. In honor of 4th of July the text analyzed is National Anthem of USA (found via Google).
Word Frequency using Python
# another word frequency program, uses re
# tested with Python2.4.3 HAB
import re
# this one in honor of 4th July, or pick text file you have!!!!!!!
filename = 'NationalAnthemUSA.txt'
# create list of lower case words, \s+ --> match any whitespace(s)
# you can replace file(filename).read() with given string
word_list = re.split('\s+', file(filename).read().lower())
print 'Words in text:', len(word_list)
# create dictionary of word:frequency pairs
freq_dic = {}
# punctuation marks to be removed
punctuation = re.compile(r'[.?!,":;]')
for word in word_list:
# remove punctuation marks
word = punctuation.sub("", word)
# form dictionary
try:
freq_dic[word] += 1
except:
freq_dic[word] = 1
print 'Unique words:', len(freq_dic)
# create list of (key, val) tuple pairs
freq_list = freq_dic.items()
# sort by key or word
freq_list.sort()
# display result
for word, freq in freq_list:
print word, freq
kenmeck03 0 Newbie Poster
bumsfeld 413 Nearly a Posting Virtuoso
bipratikgoswami 0 Newbie Poster
masterofpuppets
mattp23 0 Newbie Poster
nawaf_ali 0 Newbie Poster
luisbeta04 0 Newbie Poster
sujit.shakya.3 0 Newbie Poster
Be a part of the DaniWeb community
We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.