A simple way to find duplicate words in a text. The text is preprocessed first: punctuation marks are removed and all words are converted to lower case, so that "You" and "you," count as the same word.
Find duplicate words in a text (Python)
''' Count_find_duplicate_words101.py
find duplicate words in a text (preprocessed)
using Counter() from the Python module collections and set()
following a tip from raymondh
tested with Python27, IronPython27 and Python33 by vegaseat 24sep2013
'''
from string import punctuation
from collections import Counter
# sample text for testing
text = """\
If you see a turn signal blinking on a car with a southern license plate,
you may rest assured that it was on when the car was purchased."""
# preprocess text, remove punctuation marks and change to lower case
text2 = ''.join(c for c in text.lower() if c not in punctuation)
# text2.split() splits text2 at white spaces and returns a list of words
word_list = text2.split()
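# subtracting a count of 1 for each unique word leaves only the words that
# occur more than once, since Counter subtraction drops counts of zero or less
duplicate_word_list = sorted(Counter(word_list) - Counter(set(word_list)))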
# show result
print("Original text:")
print(text)
print('-'*72)
print("A list of duplicate words in the text:")
print(duplicate_word_list)
''' result ...
Original text:
If you see a turn signal blinking on a car with a southern license plate,
you may rest assured that it was on when the car was purchased.
------------------------------------------------------------------------
A list of duplicate words in the text:
['a', 'car', 'on', 'was', 'you']
'''
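If you also want to know how often each duplicate appears, one possible variation is to filter the Counter directly instead of subtracting. This is only a sketch that reuses the same preprocessing as the snippet above; the function name find_duplicates and the variable sample are made up for illustration and are not part of the original snippet.

```python
from collections import Counter
from string import punctuation


def find_duplicates(text):
    """Return a dict mapping each repeated word to its number of occurrences.

    find_duplicates is a hypothetical helper name used for this sketch.
    """
    # same preprocessing as above: lower case, punctuation removed
    cleaned = ''.join(c for c in text.lower() if c not in punctuation)
    counts = Counter(cleaned.split())
    # keep only the words that were seen more than once
    return {word: n for word, n in counts.items() if n > 1}


sample = """\
If you see a turn signal blinking on a car with a southern license plate,
you may rest assured that it was on when the car was purchased."""

# show each duplicate word with its count, in alphabetical order
for word, n in sorted(find_duplicates(sample).items()):
    print("%-5s %d" % (word, n))
```

The trade-off is small: the subtraction trick gives the sorted list of duplicates in one line, while the filtered dictionary keeps the counts around in case they are needed later.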