Two-level phonology is a linguistic tool developed by computational linguists. Its primary use is in systems for natural language processing such as PC-KIMMO, a program recently published by SIL (Antworth 1990). This article describes the linguistic and computational basis of two-level phonology.[2]
Koskenniemi's two-level model comprises two components: a rules component, in which two-level phonological rules are implemented as finite state transducers, and a lexical component, or lexicon, which lists the morphemes of the language.
The two components work together to perform both generation (production) and recognition (parsing) of word forms. Our main interest in this article is the phonological formalism used by the two-level model, hereafter called two-level phonology. Two-level phonology traces its linguistic heritage to `classical' generative phonology as codified in The Sound Pattern of English (Chomsky and Halle 1968). The basic insight of two-level phonology is due to the phonologist C. Douglas Johnson (1972), who showed that the SPE theory of phonology could be implemented using finite state devices by replacing sequential rule application with simultaneous rule application. At its core, then, two-level phonology is a rule formalism, not a complete theory of phonology. The following sections of this article describe the mechanism of two-level rule application by contrasting it with rule application in classical generative phonology. It should be noted that Chomsky and Halle's theory of rule application became the focal point of much controversy during the 1970s, with the result that current theories of phonology differ significantly from classical generative phonology. The relevance of two-level phonology to current theory is an important issue, but one that will not be fully addressed here. Rather, the comparison of two-level phonology to classical generative phonology is done mainly for expository purposes, recognizing that while classical generative phonology has been superseded by subsequent theoretical work, it constitutes a historically coherent view of phonology that continues to influence current theory and practice.
One feature that two-level phonology shares with classical generative phonology is linear representation. That is, phonological forms are represented as linear strings of symbols. This is in contrast to the nonlinear representations used in much current work in phonology, namely autosegmental and metrical phonology (see Goldsmith 1990). On the computational side, two-level phonology is consistent with natural language processing systems that are designed to operate on linear orthographic input.
Second, generative phonological rules apply sequentially, that is, one after another, rather than applying simultaneously. This means that each rule creates as its output a new intermediate level of representation. This intermediate level then serves as the input to the next rule. As a consequence, the underlying form becomes inaccessible to later rules.
Third, generative phonological rules are ordered; that is, the description specifies the sequence in which the rules must apply. Applying rules in any other order may result in incorrect output.
As an example of a set of generative rules, consider the following rules:
Vowel Raising
1. e -> i / ___ C0 i

Palatalization
2. t -> c / ___ i

Rule 1 (Vowel Raising) states that e becomes (is rewritten as) i in the environment preceding C0i (where C stands for the set of consonants and C0 stands for zero or more consonants). Rule 2 (Palatalization) states that t becomes c preceding i. A sample derivation of forms to which these rules apply looks like this (where UR stands for Underlying Representation, SR stands for Surface Representation):[3]
UR:      temi
Rule 1:  timi
Rule 2:  cimi
SR:      cimi

Notice that in addition to the underlying and surface levels, an intermediate level has been created as the result of sequentially applying rules 1 and 2. The application of rule 1 produces the intermediate form timi, which then serves as the input to rule 2. Not only are these rules sequential, they are ordered, such that rule 1 must apply before rule 2. Rule 1 has a feeding relationship to rule 2; that is, rule 1 increases the number of forms that can undergo rule 2 by creating more instances of i. Consider what would happen if they were applied in the reverse order. Given the input form temi, rule 2 would do nothing, since its environment is not satisfied. Rule 1 would then apply to produce the incorrect surface form timi.
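To make the mechanics concrete, here is a minimal sketch (not from the article) of sequential, ordered rule application; the regular expressions encoding rules 1 and 2 are my own approximations, with [^aeiou] standing in for C:

```python
import re

# Sequential generative rule application: each rule rewrites the output
# of the previous one, creating an intermediate level of representation.
RULES = [
    (re.compile(r"e(?=[^aeiou]*i)"), "i"),  # 1. Vowel Raising: e -> i / ___ C0 i
    (re.compile(r"t(?=i)"), "c"),           # 2. Palatalization: t -> c / ___ i
]

def derive(underlying, rules=RULES):
    form = underlying
    for pattern, replacement in rules:
        form = pattern.sub(replacement, form)  # this rule's output feeds the next
    return form

print(derive("temi"))               # cimi (Raising feeds Palatalization)
print(derive("temi", RULES[::-1]))  # timi (the wrong order gives the wrong output)
```

Reversing the list reproduces the ordering failure described above: Palatalization finds no t before i in temi, and only then does Raising create one.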
Two-level rules differ from generative rules in the following ways. First, whereas generative rules apply in a sequential order, two-level rules apply simultaneously, which is better described as applying in parallel. Applying rules in parallel to an input form means that for each segment in the form all of the rules must apply successfully, even if only vacuously.
Second, whereas sequentially applied generative rules create intermediate levels of derivation, simultaneously applied two-level rules require only two levels of representation: the underlying or lexical level and the surface level. There are no intermediate levels of derivation. It is in this sense that the model is called two-level.
Third, whereas generative rules relate the underlying and surface levels by rewriting underlying symbols as surface symbols, two-level rules express the relationship between the underlying and surface levels by positing direct, static correspondences between pairs of underlying and surface symbols. For instance, instead of rewriting underlying a as surface b, a two-level rule states that an underlying a corresponds to a surface b. The two-level rule does not change a into b, so a is available to other rules. In other words, after a two-level rule applies, both the underlying and surface symbols still `exist.'
Fourth, whereas generative rules have access only to the current intermediate form at each stage of the derivation, two-level rules have access to both underlying and surface environments. Generative rules cannot `look back' at underlying environments or `look ahead' to surface environments. In contrast, the environments of two-level rules are stated as lexical-to-surface correspondences. This means that a two-level rule can easily refer to an underlying a that corresponds to a surface b, or to a surface b that corresponds to an underlying a. In generative phonology, the interaction between a pair of rules is controlled by requiring that they apply in a certain sequential order. In two-level phonology, rule interactions are controlled not by ordering the rules but by carefully specifying their environments as strings of two-level correspondences.
Fifth, whereas generative rewriting rules are unidirectional (that is, they operate only in an underlying to surface direction), two-level rules are bidirectional. Two-level rules can operate either in an underlying to surface direction (generation mode) or in a surface to underlying direction (recognition mode). Thus in generation mode two-level rules accept an underlying form as input and return a surface form, while in recognition mode they accept a surface form as input and return an underlying form. The practical application of bidirectional phonological rules is obvious: a computational implementation of bidirectional rules is not limited to generation mode to produce words; it can also be used in recognition mode to parse words.
A significant consequence of declarative programming is that programs in a declarative language such as Prolog can run bidirectionally. For example, consider the problem of converting Fahrenheit temperatures to Celsius temperatures, and vice-versa. An imperative program that does these operations must contain two separate procedures: one to convert Fahrenheit to Celsius and another to convert Celsius to Fahrenheit. A declarative program, however, will simply state the relationship between Fahrenheit and Celsius equivalents in such a way that a single function can accept as input a Fahrenheit temperature and return as output the Celsius equivalent or accept a Celsius temperature and return a Fahrenheit temperature. Thus many relationships are more appropriately represented by a declarative formalism than an imperative one. Two-level phonology, then, permits phonological rules to be implemented declaratively as static, two-level rules, rather than imperatively as dynamic, process rules.
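As a rough analogy in Python (an imperative language, so this only simulates the declarative style), a table of two-level correspondences can itself be run in either direction; the table contents come from the temi/cimi example below, while the function name and encoding are my own:

```python
from itertools import product

# The feasible pairs of the temi/cimi example, stated once as a static relation.
FEASIBLE = [("t", "t"), ("m", "m"), ("e", "e"), ("i", "i"), ("t", "c"), ("e", "i")]

def expand(form, side):
    # side=0: form is lexical, return surface candidates (generation mode);
    # side=1: form is surface, return lexical candidates (recognition mode).
    options = [[p[1 - side] for p in FEASIBLE if p[side] == seg] for seg in form]
    return ["".join(c) for c in product(*options)]

print(expand("temi", 0))  # generation:  ['temi', 'timi', 'cemi', 'cimi']
print(expand("cimi", 1))  # recognition: ['timi', 'time', 'temi', 'teme']
```

Unconstrained, the relation overgenerates in both directions; the point is only that a single declaration serves both modes, with the rules then filtering the candidates in the same way in either direction.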
UR:  t e m i
SR:  c i m i

Each pair of lexical and surface symbols is a correspondence pair. We refer to a correspondence pair with the notation <underlying symbol>:<surface symbol>, for instance e:i and m:m. There must be an exact one-to-one correspondence between the symbols of the underlying form and the symbols of the surface form. Deletion and insertion of symbols (explained in detail in the next section) are handled by positing correspondences with zero, a null segment. The two-level model uses a notation for expressing two-level rules that is similar to the notation linguists use for phonological rules. Corresponding to the generative rule for Palatalization (rule 2 above), here is the two-level rule for the t:c correspondence:
Palatalization
3. t:c <=> ___ @:i

This rule is a statement about the distribution of the pair t:c on the left side of the arrow with respect to the context or environment on the right side of the arrow. A two-level rule has three parts: the correspondence, the operator, and the environment. The correspondence part of rule 3 is the pair t:c, which is the correspondence that the rule sanctions. The operator part of rule 3 is the double-headed arrow. It indicates the nature of the logical relationship between the correspondence and the environment (thus it means something very different from the rewriting arrow -> of generative phonology). The <=> arrow is equivalent to the biconditional operator of formal logic and means that the correspondence occurs always and only in the stated context; that is, t:c is allowed if and only if it is found in the context ___@:i. In short, rule 3 is an obligatory rule. The environment part of rule 3 is everything to the right of the arrow. The long underline indicates the gap where the pair t:c occurs. Notice that even the environment part of the rule is specified as two-level correspondence pairs. The environment part of rule 3 requires further explanation. Instead of using a correspondence such as i:i, it uses the correspondence @:i. The @ symbol is a special `wildcard' symbol that stands for any phonological segment included in the description. In the context of rule 3, the correspondence @:i stands for all the feasible pairs in the description whose surface segment is i, in this case e:i and i:i. Thus by using the correspondence @:i, we allow Palatalization to apply in the environment of either a lexical e or lexical i. In other words, we are claiming that Palatalization is sensitive to a surface (phonetic) environment rather than an underlying (phonemic) environment. Thus rule 3 will apply to both underlying forms timi and temi to produce a surface form with an initial c.
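The biconditional can be stated directly as a check over aligned lexical and surface strings. This sketch is illustrative only (the function name and string encoding are mine; PC-KIMMO itself compiles such rules into finite state tables):

```python
def check_palatalization(lexical, surface):
    # Rule 3: t:c <=> ___ @:i
    # For every lexical t, the correspondence t:c must hold if and only if
    # the next *surface* symbol is i.
    for k, (l, s) in enumerate(zip(lexical, surface)):
        context = k + 1 < len(surface) and surface[k + 1] == "i"
        if l == "t" and (s == "c") != context:
            return False
    return True

print(check_palatalization("temi", "cimi"))  # True:  t:c before surface i
print(check_palatalization("temi", "timi"))  # False: context met, but t:c not used
print(check_palatalization("temi", "cemi"))  # False: t:c outside its context
```

The second and third calls show the two ways a biconditional rule can fail: the context is present but the correspondence is not used, or the correspondence is used outside its context.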
Corresponding to the generative rule for Raising (rule 1 above) is the following two-level rule for the e:i correspondence:
Vowel Raising
4. e:i <=> ___ C:C* @:i

(The asterisk in C:C* indicates zero or more instances of the correspondence C:C.) Similar to rule 3 above, rule 4 uses the correspondence @:i in its environment. Thus rule 4 states that the correspondence e:i occurs preceding a surface i, regardless of whether it is derived from a lexical e or i. Why is this necessary? Consider the case of an underlying form such as pememi. In order to derive the surface form pimimi, Raising must apply twice: once before a lexical i and again before a lexical e, both of which correspond to a surface i. Thus rule 4 will apply to both instances of lexical e, capturing the regressive spreading of Raising through the word. Applied in parallel, rules 3 and 4 work in concert to produce the correct output. For example,
UR:      t e m i
         | | | |
Rules:   3 4
         | | | |
SR:      c i m i

Conceptually, a two-level phonological description of a data set such as this can be understood as follows. First, the two-level description declares an alphabet of all the phonological segments used in the data in both underlying and surface forms, in the case of our example, t, m, c, e, and i. Second, the description declares a set of feasible pairs, which is the complete set of all underlying-to-surface correspondences of segments that occur in the data. The set of feasible pairs for these data is the union of the set of default correspondences, whose underlying and surface segments are identical (namely t:t, m:m, e:e, and i:i), and the set of special correspondences, whose underlying and surface segments are different (namely t:c and e:i). Notice that since the segment c occurs only as a surface segment in the feasible pairs, the description will disallow any underlying form that contains a c.
A minimal two-level description, then, consists of nothing more than this declaration of the feasible pairs. Since it contains all possible underlying-to-surface correspondences, such a description will produce the correct output form, but because it does not constrain the environments where the special correspondences can occur, it will also allow many incorrect output forms. For example, given the underlying form temi, it will produce the surface forms temi, timi, cemi, and cimi, of which only the last is correct.
Third, in order to restrict the output to only correct forms, we include rules in the description that specify where the special correspondences are allowed to occur. Thus the rules function as constraints or filters, blocking incorrect forms while allowing correct forms to pass through. For instance, rule 3 (Palatalization) states that a lexical t must be realized as a surface c when it precedes @:i; thus, given the underlying form temi it will block the potential surface output forms timi (because the surface sequence ti is prohibited) and cemi (because surface c is prohibited before anything except surface i). Rule 4 (Raising) states that a lexical e must be realized as a surface i when it precedes the sequence C:C @:i; thus, given the underlying form temi it will block the potential surface output forms temi and cemi (because the surface sequence emi is prohibited). Therefore of the four potential surface forms, three are filtered out; rules 3 and 4 leave only the correct form cimi.
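The generate-and-filter procedure just described can be sketched end to end. This is a deliberately naive enumeration, not PC-KIMMO's finite state implementation, and the helper names are my own (p is added to the alphabet for the pememi example):

```python
from itertools import product

# Feasible pairs, keyed by lexical segment; rules then filter the candidates.
FEASIBLE = {"t": "tc", "e": "ei", "m": "m", "i": "i", "p": "p"}
CONSONANTS = set("tmcp")

def candidates(lex):
    # Every surface string the feasible pairs alone would allow.
    return ["".join(c) for c in product(*(FEASIBLE[seg] for seg in lex))]

def palatalization(lex, surf):
    # Rule 3: t:c <=> ___ @:i  (t:c occurs iff the next surface symbol is i)
    for k, (l, s) in enumerate(zip(lex, surf)):
        ctx = k + 1 < len(surf) and surf[k + 1] == "i"
        if l == "t" and (s == "c") != ctx:
            return False
    return True

def raising(lex, surf):
    # Rule 4: e:i <=> ___ C:C* @:i  (e:i occurs iff a surface i follows,
    # possibly across consonants that surface unchanged)
    for k, (l, s) in enumerate(zip(lex, surf)):
        if l != "e":
            continue
        j = k + 1
        while j < len(surf) and surf[j] in CONSONANTS and lex[j] == surf[j]:
            j += 1
        ctx = j < len(surf) and surf[j] == "i"
        if (s == "i") != ctx:
            return False
    return True

def generate(lex):
    return [s for s in candidates(lex) if palatalization(lex, s) and raising(lex, s)]

print(generate("temi"))    # ['cimi']   -- temi, timi, cemi filtered out
print(generate("pememi"))  # ['pimimi'] -- Raising applies to both lexical e's
```

Of the four candidates for temi, the two rules jointly block all but cimi, exactly as described above; the pememi case shows the surface-sensitive environment @:i letting Raising spread through the word without any rule ordering.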
Two-level phonology facilitates a rather different way of thinking about phonological rules. We think of generative rules as processes that change one segment into another. In contrast, two-level rules do not perform operations on segments; rather, they state static constraints on correspondences between underlying and surface forms. Generative phonology and two-level phonology also differ in how they characterize relationships between rules. Rules in generative phonology are described in terms of their relative order of application and their effect on the input of other rules (the so-called feeding and bleeding relations). Thus the generative rule 1 for Raising precedes and feeds rule 2 for Palatalization. In contrast, rules in the two-level model are categorized according to whether they apply in lexical versus surface environments. So we say that the two-level rules for Raising and Palatalization are sensitive to a surface rather than underlying environment.
UR:  m a n + b i l i
SR:  m a m 0 0 i l i

Using process terminology, these forms exemplify phonological coalescence, whereby the sequence nb becomes m. Since in the two-level model a sequence of two underlying segments cannot correspond to a single surface segment, coalescence must be interpreted as simultaneous assimilation and deletion. Thus we need two rules: an assimilation rule for the correspondence n:m and a deletion rule for the correspondence b:0 (note that the morpheme boundary + is treated as a special symbol that is always deleted).
Nasal Assimilation
5. n:m <=> ___ +:0 b:@

Deletion
6. b:0 <=> @:m +:0 ___

Notice the interaction between the rules: Nasal Assimilation occurs in a lexical environment, namely a lexical b (which can correspond to either a surface b or 0), while Deletion occurs in a surface environment, namely a surface m (which could be the realization of either a lexical n or m). In this way the two rules interact with each other to produce the correct output. Insertion correspondences, where the lexical segment is 0, enable one to write rules for processes such as stress insertion, gemination, infixation, and reduplication. For example, Tagalog has a verbalizing infix <um> that attaches between the first consonant and vowel of a stem; thus the infixed form of bili is bumili. To account for this formation with two-level rules, we represent the underlying form of the infix <um> as the prefix X+, where X is a special symbol that has no phonological purpose other than standing for the infix. We then write a rule that inserts the sequence um in the presence of X+, which is deleted. Here is the two-level correspondence:
UR:  X + b 0 0 i l i
SR:  0 0 b u m i l i

and here is the two-level rule, which simultaneously deletes X and inserts um:
Infixation
7. X:0 <=> ___ +:0 C:C 0:u 0:m V:V

These examples involving deletion and insertion show that the invention of zero is just as important for phonology as it was for arithmetic. Without zero, two-level phonology would be limited to the most trivial phonological processes; with zero, the two-level model has the expressive power to handle complex phonological or morphological phenomena (though not necessarily with the degree of felicity that a linguist might desire).
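The role of zero in keeping the two levels aligned can be checked directly against the article's own correspondences (the variable names here are mine):

```python
# Zeros keep lexical and surface strings the same length, so every rule
# sees a strict one-to-one correspondence between the two levels.
coalescence = ("man+bili", "mam00ili")  # special pairs: n:m, +:0, b:0
infixation  = ("X+b00ili", "00bumili")  # special pairs: X:0, +:0, 0:u, 0:m

for lexical, surface in (coalescence, infixation):
    assert len(lexical) == len(surface)
    # the pronounced form is the surface string minus the null segment 0
    print(surface.replace("0", ""))
# prints mamili, then bumili
```

Zero is thus purely a bookkeeping device: it exists in the correspondence so that rules can refer to it, but it is erased from the form actually pronounced.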
Rule 3: Palatalization

        t  t  @  @
        c  @  i  @
    1:  3  2  1  1
    2:  3  2  0  1
    3.  0  0  1  0

Rule 4: Vowel Raising

        e  e  C  @  @
        i  @  C  i  @
    1:  4  2  1  1  1
    2:  4  2  3  1  1
    3:  4  2  1  0  1
    4.  0  0  5  0  0
    5.  0  0  0  1  0

Describing what these tables mean and how to construct them is beyond the scope of this article. Suffice it to say that while an ordinary, working linguist can learn to translate two-level rules into finite state tables, it does require motivation and a commitment of time. And what practical uses does PC-KIMMO have? Here are two:
Neither two-level phonology nor PC-KIMMO is the ultimate answer to the challenges of phonological description or computational word parsing. While phonological theory has advanced beyond the classical generative theory that two-level phonology grew out of, two-level phonology is still consistent with many generally accepted and widely practised views of phonology. In addition, its formalism for rule application provides an alternative to generative rule application that can be computationally implemented in practical natural language processing systems.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
_____, and Morris Halle. 1968. The sound pattern of English. New York: Harper and Row.
Goldsmith, John A. 1990. Autosegmental and metrical phonology. Oxford and Cambridge, MA: Basil Blackwell.
Johnson, C. Douglas. 1972. Formal aspects of phonological description. The Hague: Mouton.
Kaisse, Ellen M., and Patricia A. Shaw. 1985. On the theory of Lexical Phonology. Phonology Yearbook 2:1-30.
Kay, Martin. 1983. When meta-rules are not meta-rules. In Karen Sparck Jones and Yorick Wilks, eds., Automatic natural language parsing, 94-116. Chichester: Ellis Horwood Ltd. See pages 100-104.
Koskenniemi, Kimmo. 1983. Two-level morphology: a general computational model for word-form recognition and production. Publication No. 11. Helsinki: University of Helsinki Department of General Linguistics.
Kroeger, Paul. 1990. Lexical Phonology and the rebirth of the phoneme. Notes on Linguistics 50:11-24.
Shieber, Stuart M. 1986. An introduction to unification-based approaches to grammar. CSLI Lecture Notes No. 4. Stanford, CA: Center for the Study of Language and Information.

[2] I would like to thank those who read and commented on drafts of this paper: Gary Simons, Stuart Milliken, and David Payne.
[3] This made-up example is used for expository purposes. To make better phonological sense, the forms should have internal morpheme boundaries, for instance te+mi (otherwise there would be no basis for positing an underlying e). See the section below on the use of zero for how morpheme boundaries are handled.