I'm a complete beginner in the programming world, so forgive me for the basic questions. I'm trying to run Peter Norvig's spelling corrector from the Windows XP command line, but I'm having difficulties.
I have a text file of addresses with a number of misspellings, and I would like to use Norvig's script to correct them. I have created a file, 'big.txt', consisting of addresses with the correct spellings. It serves as the reference data that line 11 of the script reads. What I cannot figure out is how to give the script my input text file with the misspellings and have it generate an output file with the corrections.
Can someone show me what I need to change in the script to accept an input file and generate an output file? Secondly, how do you run all of this on the command line?
The following is Peter Norvig's spelling corrector:
    import re, collections

    def words(text): return re.findall('[a-z]+', text.lower())

    def train(features):
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        return model

    NWORDS = train(words(file('big.txt').read()))

    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    def edits1(word):
        splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes    = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
        replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
        inserts    = [a + c + b for a, b in splits for c in alphabet]
        return set(deletes + transposes + replaces + inserts)

    def known_edits2(word):
        return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

    def known(words): return set(w for w in words if w in NWORDS)

    def correct(word):
        candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
        return max(candidates, key=NWORDS.get)
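
To make the goal concrete, here is a rough sketch of the kind of wrapper I have in mind, not a confirmed solution. It assumes the script above is saved unchanged as spell.py (a file name I made up) in the same folder as big.txt. The wrapper name fix_addresses.py and the helpers correct_text and correct_file are also my own inventions, not part of Norvig's code. It passes every run of letters through correct() and leaves digits, punctuation and line breaks alone.

```python
import re
import sys


def correct_text(text, correct):
    """Return text with every run of letters passed through correct()."""
    def fix(match):
        word = match.group(0)
        # NWORDS only holds lowercase words, so look up the lowercase form.
        fixed = correct(word.lower())
        # Put back simple capitalisation (ALL CAPS or a leading capital).
        if word.isupper():
            return fixed.upper()
        if word[0].isupper():
            return fixed.capitalize()
        return fixed
    # Digits, punctuation and whitespace fall outside the pattern and stay as-is.
    return re.sub(r'[A-Za-z]+', fix, text)


def correct_file(in_path, out_path, correct):
    """Read in_path, correct its words, and write the result to out_path."""
    with open(in_path) as src:
        text = src.read()
    with open(out_path, 'w') as dst:
        dst.write(correct_text(text, correct))


if __name__ == '__main__':
    if len(sys.argv) == 3:
        # Assumption: Norvig's script is saved as spell.py next to this file.
        from spell import correct
        correct_file(sys.argv[1], sys.argv[2], correct)
    else:
        print('usage: python fix_addresses.py input.txt output.txt')
```

If something like this is right, I would expect to open a command prompt, cd into the folder holding spell.py, big.txt and my address file, and run "python fix_addresses.py addresses.txt corrected.txt" (the two .txt names are placeholders). This assumes python is on the PATH and that the current folder is the one with big.txt, since the script opens it by relative path. Note that the script itself uses file(), which only exists in Python 2.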