The second stage scores each correction. Let t be the typo
and c range over a set C of candidate corrections. The most
likely correction is then
\[ \hat{c} = \arg\max_{c \in C} P(c) \, P(t \mid c) \]
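The selection step can be sketched in Python as follows. This is only an illustration: it assumes each candidate is scored by the product of its prior P(c) and a channel probability P(t|c), the usual noisy-channel form. The function name `best_correction`, the tables `PRIOR` and `CHANNEL`, and all the numbers in them are invented for the example and do not come from any corpus.

```python
def best_correction(typo, candidates, prior, channel):
    """Return the candidate c that maximises prior(c) * channel(typo, c)."""
    return max(candidates, key=lambda c: prior(c) * channel(typo, c))


# Hypothetical probabilities, chosen only to make the example concrete.
PRIOR = {"actress": 2e-5, "across": 3e-4, "acres": 1e-5}
CHANNEL = {
    ("acress", "actress"): 1e-4,
    ("acress", "across"): 9e-6,
    ("acress", "acres"): 3e-5,
}

best = best_correction(
    "acress",
    list(PRIOR),
    PRIOR.get,                      # prior(c)
    lambda t, c: CHANNEL[(t, c)],   # channel(t, c), i.e. P(t|c)
)
print(best)
```

With these made-up numbers the more frequent word wins even though its channel probability is lower, which shows why both factors need to be considered.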
The prior probability P(c) can be estimated by
- Counting how often the word c appears in the corpus
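- Normalising the count by dividing it by the number N
of words in the corpus.
Zero counts can cause problems, and so we add 0.5 to all counts
(this is called "smoothing"). Having done this, we must compensate
by adding 0.5 to the denominator for each word in the vocabulary,
that is 0.5*V in total, where V is the number of distinct words
in the vocabulary, so that
\[ P(c) = \frac{\mathrm{freq}(c) + 0.5}{N + 0.5 V} \]
where freq(c) is the number of times c appears in the corpus.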
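A minimal Python sketch of this smoothed estimate is given here. It assumes that 0.5 is added to every count and that 0.5 times the vocabulary size is added to N. The function name `smoothed_prior` and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter


def smoothed_prior(corpus_words, vocabulary):
    """Estimate P(c) for every word in the vocabulary with add-0.5 smoothing."""
    counts = Counter(corpus_words)   # how often each word appears in the corpus
    n = len(corpus_words)            # N: number of words in the corpus
    v = len(vocabulary)              # V: number of distinct vocabulary words
    denominator = n + 0.5 * v        # compensate for the 0.5 added to each count
    return {w: (counts[w] + 0.5) / denominator for w in vocabulary}


# Toy corpus; "dog" is in the vocabulary but never occurs in the corpus.
corpus = "the cat sat on the mat".split()
vocab = set(corpus) | {"dog"}
prior = smoothed_prior(corpus, vocab)

print(prior["the"])  # (2 + 0.5) / (6 + 0.5 * 6)
print(prior["dog"])  # (0 + 0.5) / (6 + 0.5 * 6), non-zero despite a zero count
```

Provided every corpus word is in the vocabulary, as here, the smoothed probabilities still sum to 1, which is the purpose of the correction to the denominator.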
Mike Rosner
Mon Mar 15 12:22:51 MET 1999