Dear all,
Maybe someone can help me find a solution to the following problem:
I have a list of patterns, each a tuple of length 5, e.g.
patterns = [('w1','X1','w1','Y1','w1'), ('w2','w2','X2','w2','Y2'), ('w2','X2','w2','Y2','w2')]
I want to go through all sentences in a text file (one sentence per line) and extract all occurrences of these patterns in each sentence. The fixed words (w1, w2) in each pattern have to match exactly; only the elements X1, X2, Y1, etc. are placeholders, because what I want to know is which words occur in those positions.
For each line in the file, I can check whether each fixed element of the pattern is in it. But how do I deal with the placeholders X and Y? I can't think of a way to solve this :/ Can anyone help me or point me in the right direction?
Thank you in advance!
Malinka
***
Example:
#tuples with patterns, the unknown element is an empty string ''
patterns = [('w1','','w3','','w5'), ('w7','w8','','w10',''), ('w8','','w10','','w12')]
sent1: w1 A w3 B w5 w6 w7 w8
sent2: w1 w2 w3 w4 w5 w6 w7 w8 C w10 D w12
#extracted patterns with new words instead of empty strings
extracted_patterns =
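
One possible way to handle the placeholders, sketched under some assumptions: treat the empty string '' as a wildcard (as in the example above), split each sentence on whitespace, and slide a window of the pattern's length over the tokens, comparing only the non-wildcard positions. The names `matches` and `extract_patterns` are made up for illustration, and whitespace tokenization is an assumption; punctuation attached to words would need extra handling.

```python
def matches(pattern, window, wildcard=""):
    """Return True if every non-wildcard element of `pattern`
    equals the token at the same position in `window`."""
    return len(pattern) == len(window) and all(
        p == wildcard or p == tok for p, tok in zip(pattern, window)
    )


def extract_patterns(sentence, patterns, wildcard=""):
    """Return a list of (matched_window, filler_words) pairs for one sentence.

    Assumption: tokens are separated by whitespace.
    """
    tokens = sentence.split()
    found = []
    for pattern in patterns:
        n = len(pattern)
        # Slide a window of length n over the sentence; the range is
        # empty when the sentence is shorter than the pattern.
        for start in range(len(tokens) - n + 1):
            window = tuple(tokens[start:start + n])
            if matches(pattern, window, wildcard):
                # Collect the words that landed in the wildcard positions.
                fillers = tuple(
                    tok for p, tok in zip(pattern, window) if p == wildcard
                )
                found.append((window, fillers))
    return found


if __name__ == "__main__":
    patterns = [
        ("w1", "", "w3", "", "w5"),
        ("w7", "w8", "", "w10", ""),
        ("w8", "", "w10", "", "w12"),
    ]
    sentences = [
        "w1 A w3 B w5 w6 w7 w8",
        "w1 w2 w3 w4 w5 w6 w7 w8 C w10 D w12",
    ]
    for sentence in sentences:
        for window, fillers in extract_patterns(sentence, patterns):
            print(window, fillers)
```

With the two example sentences, sent1 yields one match, ('w1','A','w3','B','w5') with fillers ('A','B'), and sent2 yields three: ('w1','w2','w3','w4','w5'), ('w7','w8','C','w10','D') and ('w8','C','w10','D','w12'). For a text file with one sentence per line, the same function could be called on each line in turn. Note that overlapping matches are all reported, which may or may not be what is wanted.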