I'm a linguist (and a Python newbie) trying to use Python for some simple NLP tasks. I have extracted verb + preposition + VBG triples from the British National Corpus, so I have large CSV files containing rows like this:
plan,on,selling
criticised,for,getting
opened,after,receiving
were,in,identifying
visited,before,returning
recruited,from,including
attended,by,including
given,by,joining
...
...
The Python script below counts tokens within a given column (e.g., how many times the verb "prevent" occurs, or how many times the preposition "by" occurs):
import csv
out_stream = file('counted_test_file.csv', 'w')
x = csv.reader(open('test_file.csv', 'rb'))
count = {}
for verb, prep, vbg in x:
    if verb not in count:
        count[verb] = 0
    count[verb] += 1
for (key, val) in count.items():
    print>>out_stream, "%s,%s" % (key, val)
out_stream.close()
Now I'm trying to get this code to count all combinations (e.g., how many times 'prevent' occurs with 'from'). I tried the following variation, but it just counts the prepositions (the second element of each row):
for verb, prep, vbg in x:
    if (verb and prep) not in count:
        count[(verb and prep)] = 0
    count[(verb and prep)] += 1
Any help would be greatly appreciated!
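One way this could be approached, as a rough sketch rather than a definitive fix: in Python, `verb and prep` evaluates to `prep` whenever `verb` is a non-empty string, which would explain why only prepositions get counted. Using the tuple `(verb, prep)` as the dictionary key keeps both values. The sketch below assumes Python 3 and reads from an in-memory sample instead of `test_file.csv`; the names `sample`, `pair_counts`, and `out` are illustrative and not from the original script.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for test_file.csv (hypothetical sample rows in the
# same verb,prep,vbg layout as the data shown in the question).
sample = io.StringIO(
    "plan,on,selling\n"
    "given,by,joining\n"
    "attended,by,including\n"
    "given,by,including\n"
)

# `and` returns its second operand when the first is truthy, so this
# expression yields just the preposition, not a combination.
example_key = "given" and "by"

pair_counts = Counter()
for verb, prep, vbg in csv.reader(sample):
    # A tuple is hashable, so it can be used directly as a key.
    pair_counts[(verb, prep)] += 1

# Write one "verb,prep,count" row per combination.
out = io.StringIO()
writer = csv.writer(out)
for (verb, prep), n in pair_counts.items():
    writer.writerow([verb, prep, n])
```

The same tuple-key idea should carry over to the plain `count = {}` dictionary in the original script; `Counter` only saves the explicit "not in count" check. To count full triples, the key could be `(verb, prep, vbg)` instead.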