Hi all,
I have been working on a project for two months, and I have come up with an algorithm that
combines R. Mitkov's robust, knowledge-poor algorithm for anaphora resolution with
several filters that are applied first to the XML file (a POS-tagged text) to eliminate the expletive (impersonal, non-anaphoric) "it", e.g. "it rains".
(I am working on a French text, so this is "il", as in "il pleut".)
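The filters mentioned above could start as a handful of surface patterns around French "il". Below is a minimal sketch, assuming each sentence is available as a list of lowercased word forms. The rule lists (weather and impersonal verbs, "il y a", "il est ADJ de/que") and the function name `is_expletive` are hypothetical examples, not a complete rule set.

```python
# Hypothetical, hand-picked rules for spotting an expletive (impersonal) "il".
WEATHER_OR_IMPERSONAL_VERBS = {"pleut", "pleuvait", "neige", "neigeait", "faut", "fallait"}
EXISTENTIAL_VERBS = {"a", "avait", "aura"}       # forms that follow "il y"
COPULAS = {"est", "était"}
COMPLEMENTIZERS = {"de", "d'", "que", "qu'"}     # depends on how your tokenizer splits elision


def is_expletive(words, i):
    """Return True if words[i] looks like an impersonal "il".

    `words` is assumed to be a list of lowercased word forms for one sentence.
    """
    if words[i] != "il":
        return False
    nxt = words[i + 1:i + 4]  # look at most three words ahead
    # Rule 1: "il pleut", "il faut", ...
    if nxt and nxt[0] in WEATHER_OR_IMPERSONAL_VERBS:
        return True
    # Rule 2: "il y a", "il y avait", ...
    if len(nxt) >= 2 and nxt[0] == "y" and nxt[1] in EXISTENTIAL_VERBS:
        return True
    # Rule 3: "il est difficile de ...", "il est clair que ..."
    if len(nxt) == 3 and nxt[0] in COPULAS and nxt[2] in COMPLEMENTIZERS:
        return True
    return False


for sentence in ["il pleut", "il y a un chat", "il mange une pomme"]:
    print(sentence, "->", is_expletive(sentence.split(), 0))
```

In the real pipeline this would only be called on tokens tagged "CL" or "PRO", and its result stored as the "expletive"/"non-expletive" label.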
My algorithm is structured as follows:
input: XML text with POS tagging
step 1: search for pronouns tagged "CL" or "PRO" and tag each one as
"expletive" or "non-expletive" by applying a list of rules
step 2: take the pronouns tagged "non-expletive" and
search for their antecedents by applying Mitkov's algorithm (10 rules
in all), which attributes scores to each candidate antecedent:
within a distance of 4 sentences (for a personal pronoun),
or within the same sentence (for a reflexive or possessive pronoun)
output: a list in which every line contains the pronoun, its position
in the XML text (sentence number), the chosen antecedent, and the number
of the sentence in which it was found
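For step 2, the candidate antecedents of one pronoun can be collected with a simple window over the sentences. A minimal sketch, assuming the text has already been read into a list of sentences, each a list of (word, tag) pairs. The noun tags "NC"/"NPP", the names `Candidate` and `find_candidates`, and the choice to look only to the left of the pronoun are assumptions; "within 4 sentences" is read as the pronoun's own sentence plus the 4 preceding ones, and for reflexive and possessive pronouns the search is restricted to the pronoun's own sentence.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    word: str
    tag: str
    sent: int  # sentence index (0-based)
    pos: int   # word index within the sentence (0-based)


# Hypothetical noun tags; replace with the ones your tagger actually uses.
NOUN_TAGS = {"NC", "NPP"}
LOOKBACK = 4  # number of preceding sentences searched for personal pronouns


def find_candidates(sentences: List[List[Tuple[str, str]]],
                    sent_no: int, pron_pos: int, personal: bool) -> List[Candidate]:
    """Collect nouns to the left of the pronoun at sentences[sent_no][pron_pos].

    Personal pronouns: own sentence plus up to LOOKBACK preceding sentences.
    Reflexive/possessive pronouns: own sentence only.
    """
    first = max(0, sent_no - LOOKBACK) if personal else sent_no
    result = []
    for s in range(first, sent_no + 1):
        for p, (word, tag) in enumerate(sentences[s]):
            if s == sent_no and p >= pron_pos:
                break  # stop at the pronoun itself
            if tag in NOUN_TAGS:
                result.append(Candidate(word, tag, s, p))
    return result


sentences = [
    [("Marie", "NPP"), ("dort", "V")],
    [("Le", "DET"), ("chat", "NC"), ("mange", "V")],
    [("Il", "CL"), ("dort", "V")],
]
print(find_candidates(sentences, 2, 0, personal=True))
```

Once a candidate has been chosen, each output line can be built from the pronoun, `sent_no`, and the chosen candidate's `word` and `sent` fields.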
So I am wondering how to handle the scoring.
Should I build a matrix of candidate antecedents for every pronoun (if that is
possible in Python)?
Every candidate antecedent will be attributed scores,
and at the end of each antecedent search for a non-expletive pronoun,
I have to add up the scores attributed to every candidate and pick
the one with the best score.
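A matrix is easy to build in plain Python as a list of rows: one row per candidate, one column per scoring rule. The sketch below assumes two toy indicators loosely modelled on Mitkov's referential distance and lexical reiteration (the score values are simplified, not his exact definitions), sums each row, and picks the highest total, breaking ties in favour of the most recent candidate. The names `Candidate`, `INDICATORS`, `score_matrix` and `pick_antecedent` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    word: str
    tag: str
    sent: int  # sentence index
    pos: int   # word index within the sentence


# Toy indicators (simplified, not Mitkov's exact values); each returns an int.
def referential_distance(cand, pron_sent, cands):
    gap = pron_sent - cand.sent
    return {0: 2, 1: 1, 2: 0}.get(gap, -1)  # closer candidates score higher


def lexical_reiteration(cand, pron_sent, cands):
    repeats = sum(1 for c in cands if c.word.lower() == cand.word.lower()) - 1
    return min(repeats, 2)  # repeated nouns score higher, capped at 2


INDICATORS = [referential_distance, lexical_reiteration]


def score_matrix(cands: List[Candidate], pron_sent: int) -> List[List[int]]:
    """One row per candidate, one column per indicator."""
    return [[ind(c, pron_sent, cands) for ind in INDICATORS] for c in cands]


def pick_antecedent(cands: List[Candidate], pron_sent: int) -> Optional[Candidate]:
    if not cands:
        return None
    totals = [sum(row) for row in score_matrix(cands, pron_sent)]
    # Highest total wins; ties go to the most recent candidate.
    best = max(range(len(cands)),
               key=lambda i: (totals[i], cands[i].sent, cands[i].pos))
    return cands[best]


cands = [Candidate("chat", "NC", 0, 1), Candidate("Marie", "NPP", 1, 0),
         Candidate("chat", "NC", 2, 3)]
print(score_matrix(cands, 3))
print(pick_antecedent(cands, 3))
```

Adding the remaining rules is a matter of appending functions to `INDICATORS`. Keeping the full matrix rather than only the totals makes it easy to see why a candidate won; with NumPy, `np.array(matrix).sum(axis=1)` gives the same row totals.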
Thank you for any advice you can give me on implementing this
algorithm.
P.S.
I have already read the two previous discussions about XML parsing,
but I was told that not all parsers allow detailed searches in the XML file. For example, if I want a
specific child of a node (the first direct child or the second child), this is possible with some parsers
but not with others.
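Python's standard-library `xml.etree.ElementTree` already supports this kind of access: an element can be indexed like a list to get its direct children, and `find`/`findall` accept a limited XPath subset, including positional predicates such as `w[2]` (1-based) and attribute tests such as `[@pos='CL']`. A minimal sketch, assuming a hypothetical layout with `<s>` elements for sentences, `<w>` elements for words and a `pos` attribute holding the tag; adapt the names to your file. For full XPath 1.0, the third-party lxml library provides an `xpath()` method on its elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: <s> = sentence, <w> = word, "pos" attribute = POS tag.
SAMPLE = """<text>
  <s n="1"><w pos="NPP">Marie</w><w pos="V">dort</w></s>
  <s n="2"><w pos="CL">Il</w><w pos="V">pleut</w></s>
</text>"""

root = ET.fromstring(SAMPLE)

first_sentence = root.find("s")            # first <s> child of the root
first_word = first_sentence[0]             # direct child by index (0-based)
second_word = first_sentence.find("w[2]")  # XPath position predicate (1-based)
clitics = root.findall("s/w[@pos='CL']")   # attribute test on grandchildren


def find_pronouns(root, tags=("CL", "PRO")):
    """Return (sentence number, word) pairs for every <w> whose pos is in tags."""
    found = []
    for s_no, sent in enumerate(root.iter("s"), start=1):
        for w in sent.findall("w"):
            if w.get("pos") in tags:
                found.append((s_no, w.text))
    return found


print(first_word.text, second_word.text)   # Marie dort
print([w.text for w in clitics])           # ['Il']
print(find_pronouns(root))                 # [(2, 'Il')]
```

To read a file instead of a string, `ET.parse("yourfile.xml").getroot()` returns the same kind of root element.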