So this is my first post, and I have only just begun using Python. One of my first assignments is to design a program that counts the most-used words in a given text file. In our case, we are using the Declaration of Independence.

Here is what I have so far. I think everything is fine up until the end, where I get confused. The problem seems to be with the my_dictionary statements at the bottom. Is there any way to sort it out?

Once again, I'm sorry if I sound terrible, but I've only just started this, so not everything is 100% accurate.

def word_freq(text_file):
	""" prints the most commonly used words in the given text file
	author

	INPUT
	    text_file: the name of a text file to analyze
	OUTPUT
	    printing the most frequently used words in the file
	"""
	f = open(text_file, 'r')
	contents = f.read()
	words = contents.split()
	for i in range(len(words)):
	    words[i] = words[i].lower()
	    words[i] = words[i].strip(',:.;')
	counter = dict()
	for i in range(len(words)):
	    if words[i] not in counter:
		    counter[words[i]] = 1
	    else:
		    counter[words[i]] += 1
	sorted_words = list(sorted(counter, key=counter.get, reverse=True))
	for w in sorted_words[0:30]:
		    print('freq:',counter[w],'word',w)
		    my_dictionary
		    my_dictionary[‘the’] = 0
	    else:
		    my_dictionary[‘the’] += 1

Any helpful tips or solutions would be greatly appreciated.
Thanks a bunch.

You are making some mistakes, and it can be done much more easily with some Python power.
You are trying to count with a dictionary twice; once is enough.
.strip(',:.;')
You should strip out more than just these characters.
An example:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'
>>>
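
If you would rather keep your split-and-strip loop, here is a minimal sketch of the same idea applied per word. The function name clean_words is made up for illustration. Note that strip only trims characters from the two ends of a word, so an apostrophe inside a word like "don't" survives.

```python
import string

def clean_words(text):
    """Lowercase each whitespace-separated token and trim punctuation from its ends.

    clean_words is a hypothetical helper name, not part of the original post.
    """
    cleaned = []
    for token in text.split():
        # strip() only removes characters at the start and end of the token
        word = token.lower().strip(string.punctuation)
        if word:  # skip tokens that were pure punctuation, e.g. "--"
            cleaned.append(word)
    return cleaned

print(clean_words("The quick:: brown! fox -- don't (stop)."))
```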

Here is another way, this time as a complete script.

import re
from collections import Counter

with open('text.txt') as f:
    text = f.read().lower()
words = re.findall(r'\w+', text)

print(Counter(words).most_common(4))

This uses the regex \w+, which does much the same as the code I showed above: it drops the special characters.
The counting is done by collections.Counter, which is new in Python 2.7 and later.
Counter also has a most_common method, which does what the name says.
This will show the 4 most common words in text.txt.
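
To try this without a file, here is a small sketch on an inline string (the sample sentence is made up). One thing to watch: \w+ also splits on apostrophes, so "cat's" is counted as "cat" plus a stray "s".

```python
import re
from collections import Counter

# Made-up sample text, standing in for the contents of text.txt
sample = "The cat and the hat. The cat's hat!"

# \w+ matches runs of letters, digits and underscores, so the
# apostrophe in "cat's" splits it into "cat" and "s"
words = re.findall(r'\w+', sample.lower())
counts = Counter(words)

print(words)
print(counts.most_common(1))
```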

If you want to eliminate certain common words like "the" and "and", etc., use a list.

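omit_words = ["the", "a", "and", "but", "i"]
# sorted_words and counter come from the original function above
for w in sorted_words[0:30]:
    if w not in omit_words:
        print('freq:', counter[w], 'word', w)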
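
The same filtering also works with Counter. This is only a sketch; top_words and its parameters are names I made up. It drops the omitted words before counting, so you still get n results back instead of losing some of the top 30 to the filter.

```python
from collections import Counter

def top_words(words, omit=(), n=30):
    """Return the n most common words, ignoring anything listed in omit.

    top_words, omit and n are hypothetical names used for illustration.
    """
    omit = set(omit)  # set membership tests are faster than list lookups
    counts = Counter(w for w in words if w not in omit)
    return counts.most_common(n)

print(top_words(['the', 'cat', 'the', 'hat', 'cat', 'a'], omit=['the', 'a'], n=2))
```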

Thanks for all the replies, got it working.

CLOSE THAT FILE!!! Also, did you notice the stray else at the end?
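
For what it's worth, here is one way the original function could look with both points addressed. It is a sketch, not the only fix: the with block closes the file automatically, the my_dictionary lines and the stray else are gone, and I made it return the results (the top parameter is my own addition) instead of printing them.

```python
def word_freq(text_file, top=30):
    """Return the most frequently used words in the given text file.

    The top parameter and the returned list of (word, count) pairs are
    assumptions of this sketch; the original function printed its results.
    """
    # The with block closes the file even if reading raises an error
    with open(text_file, 'r') as f:
        contents = f.read()

    counter = {}
    for raw in contents.split():
        word = raw.lower().strip(',:.;')
        if not word:
            continue  # token was nothing but punctuation
        # get() returns 0 for unseen words, so no if/else is needed
        counter[word] = counter.get(word, 0) + 1

    sorted_words = sorted(counter, key=counter.get, reverse=True)
    return [(w, counter[w]) for w in sorted_words[:top]]
```

Call it with the name of your own file and loop over the result to print each word with its frequency.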
