Lab Session on Pig Latin using xfst

(due to Lauri Karttunen, Bonnie Webber, Mark Steedman)

Using the xfst tool, create a transducer for translating from English (or any other language) into Pig Latin.

There are different varieties of Pig Latin. Let's pick the simplest one, defined by the rule:

If the word begins with a consonant, the corresponding Pig Latin word
is the same except that the initial consonant has been moved to the end
of the word and suffixed with "ay".

For example, the sentence "pig latin is fun" corresponds to "igpay atinlay is unfay" in Pig Latin.

Your task is to write an xfst script, to be run with the command,

xfst -l FileName

that leaves on the stack a transducer with the following properties:

  1. The upper side language of the transducer is the universal language. That is, it contains all strings of any length.
  2. The transducer is unambiguous in the top-down direction. That is, each string in the upper language corresponds to one and only one string in the lower language.
  3. When you apply your transducer "downward" to an input line, xfst displays the corresponding Pig Latin translation for all words that begin with a consonant. Otherwise the output is the same as the input.
  4. Words consist of consonants and vowels.
  5. The consonants are b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z.
  6. The vowels are a, e, i, o, u.
  7. The whitespace characters space and tab are word separators. Thus the string "pig latin" contains two words.

Example

Here is an example of how things should work when you are done.

xfst -l piglatin.scr    (Execute piglatin.scr and wait for commands.)
xfst[1]: apply down     (Start applying the transducer downwards.)
apply down> pig
igpay
apply down> pig latin
igpay atinlay
apply down> the little brown fox jumped over the lazy dog
hetay ittlelay rownbay oxfay umpedjay over hetay azylay ogday
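
When testing your transducer, it can help to have an independent way to compute the expected output for a given input line. The following is a minimal sketch of such a reference model in Python. It is not xfst and not a solution to the exercise; the function names are made up for illustration. It assumes that a token counts as a word only if it consists entirely of the lowercase consonants and vowels listed above, and that any other token is passed through unchanged.

```python
import re

CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
VOWELS = set("aeiou")
LETTERS = CONSONANTS | VOWELS


def pig_latin_word(token):
    """Apply the rule to one token; anything that is not a word is returned unchanged."""
    # Assumption: a word is a non-empty token made only of the listed letters.
    is_word = token != "" and all(ch in LETTERS for ch in token)
    if is_word and token[0] in CONSONANTS:
        # Move the initial consonant to the end and suffix it with "ay".
        return token[1:] + token[0] + "ay"
    return token


def pig_latin_line(line):
    """Translate a line, keeping every space and tab exactly where it was."""
    # The capturing group makes re.split return the separators as list items too.
    pieces = re.split(r"([ \t])", line)
    return "".join(pig_latin_word(piece) for piece in pieces)
```

For instance, pig_latin_line("pig latin") returns "igpay atinlay", which agrees with the session shown above.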

Hints

You will find it convenient to start your script with some definitions, such as:

define Cons [ b ] ;
define Vowel [ a | e | i | o | u ] ;
define Ltr [ Cons | Vowel ]+ ;
define Limit [ " " | "\t" | .#. ] ;

Because the trick in the Pig Latin transformation is the treatment of word-initial consonants, it is better to start experimenting with just one consonant. Once you succeed in constructing a transducer that correctly maps "boa" into "oabay", the rest is easy.

Because you cannot literally "move" the initial consonant from the beginning to the end of the word, you need to think of movement in terms of two more primitive operations:

  1. Copy the initial consonant to the end of the word, adding "ay" after it.
  2. Delete the initial consonant.

You need the definition of Limit to refer to the beginning and end of the word.
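
The copy-then-delete decomposition can be tried out on plain strings before you express it in xfst. The sketch below is Python, not xfst, and only models the idea; all function names are hypothetical. It assumes a word is a token made entirely of lowercase letters, and it splits on space and tab, which plays roughly the role that Limit plays in the definitions above.

```python
import re

CONSONANTS = set("bcdfghjklmnpqrstvwxyz")
LETTERS = CONSONANTS | set("aeiou")


def starts_with_consonant(token):
    """True for a non-empty all-letter token whose first letter is a consonant."""
    return token != "" and all(ch in LETTERS for ch in token) and token[0] in CONSONANTS


def copy_initial(token):
    """Step 1: append a copy of the initial consonant plus "ay" (boa -> boabay)."""
    return token + token[0] + "ay" if starts_with_consonant(token) else token


def delete_initial(token):
    """Step 2: drop the initial consonant (boabay -> oabay)."""
    return token[1:] if starts_with_consonant(token) else token


def per_word(step, line):
    """Apply one step to every token between spaces, tabs, and the line edges."""
    # The capturing group keeps the separators in the result of re.split.
    return "".join(step(piece) for piece in re.split(r"([ \t])", line))


def translate(line):
    """Compose the two steps: copy first, then delete."""
    return per_word(delete_initial, per_word(copy_initial, line))
```

Here translate("boa") goes through the intermediate string "boabay" and ends at "oabay". The order matters: deleting first would discard the consonant before it could be copied.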

One more important hint:

[ ] -> a || b _ c ;
    maps the string "bc" to the infinite language [ b a* c ].
[. .] -> a || b _ c ;
    maps the string "bc" to the string "bac".

To insert just one instance of something, use [. .] in your replace expression.

Work with simple replacement and composition. Don't try using parallel replacement.

With just one consonant, your Pig Latin translator should have around 8 states and 44 arcs. With the full set of 21 consonants, the transducer has about 48 states and 1220 arcs.


Thanks to Bonnie Webber and Mark Steedman for suggesting this exercise.