
Ranking Candidates

The second stage scores each correction. Let t be the typo and c range over a set C of candidate corrections. The most likely correction is then

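$\hat{c} = \operatorname{argmax}_{c \in C} \; P(c) \, P(t \mid c)$

where P(c) is the prior probability of the word c and P(t | c) is the probability that c is mistyped as t.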
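The ranking step can be sketched in a few lines of Python. This is only an illustration under assumptions: it takes the score of a candidate to be the product of its prior P(c) and a channel probability P(t | c), the usual noisy-channel reading of the argmax above. The function name `rank_candidates`, the two callables `prior` and `channel`, and all the numbers in the toy tables are hypothetical and are not taken from any corpus.

```python
def rank_candidates(typo, candidates, prior, channel):
    """Return the candidates sorted by P(c) * P(typo | c), best first.

    prior(c)         -> assumed estimate of P(c)
    channel(typo, c) -> assumed estimate of P(typo | c)
    """
    scored = [(prior(c) * channel(typo, c), c) for c in candidates]
    # Highest score first; the first element is the argmax.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Toy, made-up probabilities purely for illustration.
toy_prior = {"actress": 0.00003, "across": 0.0003, "acres": 0.00004}
toy_channel = {"actress": 0.0001, "across": 0.000009, "acres": 0.00003}

ranking = rank_candidates(
    "acress",
    list(toy_prior),
    toy_prior.get,
    lambda t, c: toy_channel[c],
)
best = ranking[0]
```

Passing the two probability models in as functions keeps the ranking independent of how P(c) and P(t | c) are actually estimated.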

The prior probability P(c) can be estimated by

Counting how often the word c appears in the corpus
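Normalising the count by dividing it by the number N of words in the corpus. Zero counts cause problems, because a word that never occurs in the corpus would get probability zero and could never be chosen, so we add 0.5 to every count (this is called "smoothing"). Having done this, we must compensate by adding 0.5 to the denominator for each word in the vocabulary, i.e. 0.5 V in total, where V is the number of words in the vocabulary, so that
$P(c) = \dfrac{\mathrm{freq}(c) + 0.5}{N + 0.5\,V}$
where freq(c) is the number of times c appears in the corpus.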
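The two estimation steps can be sketched as follows. This is a minimal illustration, assuming the corpus is available as a list of word tokens and that every corpus word belongs to the vocabulary; the function name `smoothed_prior`, the parameter `k`, and the six-word toy corpus are hypothetical.

```python
from collections import Counter


def smoothed_prior(corpus_words, vocabulary, k=0.5):
    """Estimate P(c) for every word c in the vocabulary.

    Adds k to each count and k * V to the denominator, where V is the
    vocabulary size. The result sums to 1 only if every corpus word is
    in the vocabulary (an assumption of this sketch).
    """
    counts = Counter(corpus_words)      # step 1: count occurrences
    n = len(corpus_words)               # N: number of words in the corpus
    v = len(vocabulary)                 # V: number of words in the vocabulary
    denominator = n + k * v
    # step 2: normalise the smoothed counts
    return {w: (counts[w] + k) / denominator for w in vocabulary}


corpus = "the cat sat on the mat".split()
vocab = set(corpus) | {"dog"}           # "dog" never occurs in the corpus
prior = smoothed_prior(corpus, vocab)
```

Here N = 6 and V = 6, so the denominator is 9: the unseen word "dog" gets 0.5/9 rather than zero, and "the", which occurs twice, gets 2.5/9.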

Mike Rosner
Mon Mar 15 12:22:51 MET 1999