I have been trying to find the frequency distribution of nouns in a given text. If I do this:
text = "This ball is blue, small and extraordinary. Like no other ball."
token_text= nltk.word_tokenize(text)
tagged_sent = nltk.pos_tag(token_text)
nouns= []
for word,pos in tagged_sent:
if pos in ['NN',"NNP"]:
nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns
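the printed distribution looks something like this (the exact format depends on the NLTK version):

<FreqDist: 'ball': 1, 'ball.': 1>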
It counts "ball" and "ball." as separate words, since the tokenizer leaves the trailing period attached. So I went ahead and split the text into sentences before tokenizing the words:
text = "This ball is blue, small and extraordinary. Like no other ball."
sentences = nltk.sent_tokenize(text)
words = [nltk.word_tokenize(sent)for sent in sentences]
tagged_sent = [nltk.pos_tag(sent)for sent in words]
nouns= []
for word,pos in tagged_sent:
if pos in ['NN',"NNP"]:
nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns
It gives the following error:
Traceback (most recent call last):
File "C:\beautifulsoup4-4.3.2\Trial.py", line 19, in <module>
for word,pos in tagged_sent:
ValueError: too many values to unpack
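I think the problem is the shape of tagged_sent: it is now a list of lists, with one inner list of (word, tag) pairs per sentence, roughly like this:

[[('This', 'DT'), ('ball', 'NN'), ...], [('Like', 'IN'), ..., ('ball', 'NN'), ('.', '.')]]

so each iteration of the loop gets a whole sentence instead of a single (word, pos) pair.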
What am I doing wrong? Please help.