I'm a linguist (Python newbie) trying to use Python to help me with some simple NLP processing. I have extracted verb + preposition + VBG triples from the British National Corpus, so I have large csv files containing stuff like this:

plan,on,selling
criticised,for,getting
opened,after,receiving
were,in,identifying
visited,before,returning
recruited,from,including
attended,by,including
given,by,joining
...
...

The Python script below counts tokens within any given column (e.g., how many times the verb "prevent" occurs, or how many times the preposition "by" occurs).

import csv
out_stream = file('counted_test_file.csv', 'w')
x = csv.reader(open('test_file.csv', 'rb'))

count = {}

for verb, prep, vbg in x:
	if verb not in count:
		count[verb] = 0
	count[verb] += 1

for (key, val) in count.items():
	print>>out_stream, "%s,%s" % (key, val)

out_stream.close()

Now I'm trying to get this code to count all combinations (e.g., how many times does 'prevent' occur with 'from'). I tried the following variation, but this just counts preps (the second element in the csv file):

for verb, prep, vbg in x:
	if (verb and prep) not in count:
		count[(verb and prep)] = 0
	count[(verb and prep)] += 1

Any help would be greatly appreciated!

How about count[(verb,prep)] += 1 ?

Ahh, yes, count[(verb,prep)] += 1 worked. So simple. Thanks!
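
For anyone landing on this thread later, here is a minimal sketch of why the `(verb and prep)` version only counted prepositions and how the tuple key fixes it. In Python, `and` returns its second operand when the first is truthy, so `(verb and prep)` is simply `prep` for any non-empty verb. The function name `count_pairs` and the sample rows are made up for illustration, and the sketch assumes Python 3 (the script in the first post is Python 2).

```python
import csv
import io

def count_pairs(rows):
    """Count how often each (verb, prep) combination occurs."""
    count = {}
    for verb, prep, vbg in rows:
        key = (verb, prep)  # a tuple keeps both words in the key
        if key not in count:
            count[key] = 0
        count[key] += 1
    return count

# `and` returns its second operand when the first is truthy,
# so this expression is just the preposition.
assert ("plan" and "on") == "on"

# Stand-in for the real csv file (illustrative rows only).
sample = io.StringIO("plan,on,selling\nplan,on,buying\ngiven,by,joining\n")
pairs = count_pairs(csv.reader(sample))

# Write one "verb,prep,count" line per combination.
for (verb, prep), n in sorted(pairs.items()):
    print("%s,%s,%s" % (verb, prep, n))
```

To use it on the real data, pass `csv.reader(open('test_file.csv', newline=''))` instead of the in-memory sample.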

If you want to count a pair only when both verb and prep are already keys in the count dictionary created by the existing code:

from collections import defaultdict
total_found_dic = defaultdict(int) 
if (verb in count) and (prep in count):
     total_found_dic[(verb, prep)]  += 1
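
The fragment above has to sit inside a loop over the rows to do anything. A minimal sketch of how that might look, assuming `count` holds single-token counts for both verbs and prepositions (the script in the first post only stores verbs, so that part is an assumption) and using made-up rows in place of the csv reader:

```python
from collections import defaultdict

# Illustrative rows; in the real script these come from csv.reader.
rows = [
    ("plan", "on", "selling"),
    ("plan", "on", "buying"),
    ("given", "by", "joining"),
]

# Assumption: single-token counts for verbs AND prepositions.
count = defaultdict(int)
for verb, prep, vbg in rows:
    count[verb] += 1
    count[prep] += 1

# defaultdict(int) starts missing keys at 0, so no "not in" check is needed.
total_found_dic = defaultdict(int)
for verb, prep, vbg in rows:
    if (verb in count) and (prep in count):
        total_found_dic[(verb, prep)] += 1

print(dict(total_found_dic))
```

Because every verb and preposition in these rows is already in `count`, the membership test passes for each row here; it only filters anything when `count` was built from a different or smaller set of tokens.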

Note that you should test sub-words and print the results to see what happens. I doubt you are looking for "the", but as an example, whether searching for "the" also gives a hit for the word "they" depends on how the lookup is done: a dictionary key test matches whole keys only, while a substring test on a string would match.
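
A quick way to try the sub-word case yourself: with a dict, `in` compares whole keys, while on a plain string it performs a substring search. The words below are made up for the test.

```python
count = {"they": 3, "prevent": 1}

# Dict membership compares whole keys, so "the" does not match "they".
print("the" in count)    # False
print("they" in count)   # True

# Membership on a plain string is a substring search instead.
print("the" in "they")   # True
```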
