The Bayesian interpretation begins by considering
all possible classes, i.e. the set of all possible words.
Out of this universe, we want to choose that
word which is most probable given the observation that
we have, O = [ni], i.e.
$$\hat{w} = \arg\max_{w} P(w \mid O)$$
where $\arg\max_x f(x)$ means ``the x such that f(x) is maximised''.
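To make the argmax notation concrete, here is a minimal Python sketch. The candidate words and their probabilities are made-up toy values chosen only for illustration, and the helper name `argmax` is hypothetical. The sketch simply assumes the probabilities are already available.

```python
# Toy illustration of argmax. The candidate words and the probability
# values are invented for this example; they are not from any real model.
posterior = {"knee": 0.5, "neat": 0.2, "need": 0.2, "new": 0.1}


def argmax(candidates, f):
    """Return the x in candidates for which f(x) is largest."""
    return max(candidates, key=f)


# Pick the word whose (assumed) probability given the observation is highest.
best = argmax(posterior, posterior.get)
print(best)
```

The selection step itself is trivial once the probabilities are known.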
Problem. Whilst this is guaranteed to give us the
optimal word, it is not obvious how to make the equation operational:
for a given word w and a given observation O, we don't know how to compute $P(w \mid O)$ directly.