since we are maximising over all words, and the denominator is the same for every word. So ŵ, the most likely word, is
ŵ = argmax_w P(x | w) P(w)
where x is the observed input.
The two terms of this product have names:
P(w) is called the prior probability
P(x | w), the probability of the observed input x given the word w, is called the likelihood
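
As a rough illustration of choosing the word that maximises likelihood times prior, here is a minimal Python sketch. The candidate words and all probability values are made-up toy numbers, and the names prior, likelihood, most_likely_word and posterior are hypothetical, chosen only for this example. The normaliser in posterior is the sum over the toy candidates, standing in for the denominator.

```python
# Toy numbers for illustration only; they are not taken from any real corpus.
prior = {"actress": 0.4, "across": 0.5, "acres": 0.1}             # P(w)
likelihood = {"actress": 0.02, "across": 0.001, "acres": 0.005}   # P(x | w) for one fixed observation x


def most_likely_word(prior, likelihood):
    """Return the word w that maximises P(x | w) * P(w)."""
    return max(prior, key=lambda w: likelihood.get(w, 0.0) * prior[w])


def posterior(prior, likelihood):
    """Normalise the scores by their sum over the candidate words.

    The normaliser is the same for every w, so dividing by it
    rescales the scores without changing which word scores highest.
    """
    scores = {w: likelihood.get(w, 0.0) * prior[w] for w in prior}
    normaliser = sum(scores.values())
    return {w: s / normaliser for w, s in scores.items()}


best = most_likely_word(prior, likelihood)
post = posterior(prior, likelihood)
print(best)
print(max(post, key=post.get))
```

Both print statements name the same word, which is the point of dropping the denominator: it is constant across candidates, so it cannot change the argmax.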