Kenneth R. Beesley
Microlytics
Abstract: Two-Level rules are declarative constraints that describe morphological alternations, such as the y~ie alternation in the plural of some English nouns (spy~spies). This document describes a compiler, written in C, that converts two-level rules into deterministic, minimized finite-state transducers. It describes the format of two-level grammars, the rule formalism, and the user interface to the compiler. It also explains how the compiler can assist the user in the development of a two-level grammar.
1. What is it?
In his 1983 dissertation [1], Kimmo Koskenniemi proposed a declarative
system of constraints for describing morphological
alternations. Constraints of this type are known as two-level
rules. Two-level rules are similar to phonological rewrite-rules
introduced in the 1960s by Noam Chomsky and Morris Halle [2] in that
both types of rules denote regular relations [3, 4, 5]. Consequently,
they can be modeled by finite-state transducers. An important
difference is that rewrite-rules are applied in a serial order whereas
two-level rules are parallel constraints that have to be satisfied
simultaneously. The two-level formalism is expressive enough to
describe phenomena that in classical phonology require rule
ordering [6].
The compiler described in this document (TwoL Compiler) converts
two-level rules to deterministic, minimized finite-state
transducers. The first version of the compiler was written in
1985-1987 in Interlisp by Lauri Karttunen, Kimmo Koskenniemi, and
Ronald M. Kaplan. The current C version, based on Karttunen's 1989
Common Lisp implementation, was written by Lauri Karttunen, Todd
Yampol, and Kenneth R. Beesley (Microlytics) in consultation with
Ronald M. Kaplan at Xerox PARC in 1991-92. The compilation algorithm
and the rule formalism are as described in [8]. The compiler currently
works on Macintosh and UNIX operating systems. The interface to the
compiler described here is common to both versions. The transducers
produced with the compiler, combined with a lexicon, can be used for
word recognition and generation in systems such as Koskenniemi's
two-l program [1] and Evan Antworth's PC-KIMMO [7].
Section 2 of this document (Running the compiler) gives an
example of how to apply the compiler to an existing set of
rules. Readers who are unfamiliar with two-level grammars may wish to
start by first reading Section 3 (Two-level grammars) that
specifies the grammar format and the rule formalism in detail. Sections
4 (Error checking) and 5 (Rule testing) describe ways in
which the compiler can assist the rule writer in the development of a
two-level grammar. Section 6 (Compiler options) contains
further details about the user interface to the compiler. Appendix 1
lists the on-line help messages; Appendix 2 is an example of a
two-level grammar.
When you start the compiler (by double-clicking the icon on a
Macintosh or by typing 'twolc' in UNIX), it prints a
welcoming banner and waits for the user to type a command:
(1)
Let us start with 'help':
(2)
The first step towards getting the rules converted to transducers is
to have the compiler read them and verify that there are no syntactic
errors in the input. This is what the command 'read-grammar'
does, after prompting the user for the name of the input
file.
(4)
(5)
The commands 'show-rules' and
'show' display on the screen the resulting
transducers; 'show-rules' displays all the compiled
rules, and 'show' prompts for a rule name and displays
just the transducer for that rule.
(6)
Only some of the transitions are shown explicitly; the rest can be
inferred from the listing of equivalence classes at the bottom of the
table. It shows that N:m is in a class by itself, but
the p column also represents the transitions of
p:m. The equivalence class headed by a
contains most of the symbol pairs in the alphabet. Thus the number
1 in the first row of the a column
means that there is a transition from state 1 back to
itself labeled a; and because a stands
for the whole equivalence class (a b c ... x y z), the
transducer will react to all these symbol pairs in the same way.
The absence of a transition means that the transducer fails in that
state on that input. For example, if the transducer is in state
3, a p or p:m must
follow in the input string because the state is nonfinal and there are
no transitions for other input pairs.
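The application of such a tabular transducer can be sketched in a few lines of code. The following Python fragment is only an illustration, not part of the compiler: it transcribes the states, finality markers, and equivalence classes of the "N realized as m" table shown above and runs the transducer over sequences of lexical:surface pairs.

```python
# Illustrative sketch of applying the "N realized as m" transducer.
# States, finality, and transitions are transcribed from the table
# shown above; every symbol pair is first mapped to the representative
# pair of its equivalence class.

# Representative pair for each equivalence class.
CLASSES = {}
for c in "abcdefghijklmnoqrstuvwxyz":    # the big class headed by a:a
    CLASSES[(c, c)] = ("a", "a")
CLASSES[("p", "p")] = ("p", "p")         # p:p and p:m share a column
CLASSES[("p", "m")] = ("p", "p")
CLASSES[("N", "m")] = ("N", "m")         # N:m is in a class by itself
CLASSES[("N", "n")] = ("N", "n")

FINAL = {1: True, 2: True, 3: False}     # 1: and 2: final, 3. nonfinal
TRANS = {
    (1, ("a", "a")): 1, (1, ("p", "p")): 1,
    (1, ("N", "m")): 3, (1, ("N", "n")): 2,
    (2, ("a", "a")): 1, (2, ("N", "m")): 3, (2, ("N", "n")): 2,
    (3, ("p", "p")): 1,
}

def accepts(pairs):
    """Run the transducer over a sequence of (lexical, surface) pairs."""
    state = 1
    for pair in pairs:
        rep = CLASSES.get(pair)
        if rep is None or (state, rep) not in TRANS:
            return False        # no transition: the transducer fails
        state = TRANS[(state, rep)]
    return FINAL[state]

# kaNpat -> kammat, i.e. k a N:m p:m a t, satisfies the rule ...
print(accepts([("k","k"),("a","a"),("N","m"),("p","m"),("a","a"),("t","t")]))  # True
# ... but N:n directly before p violates "N realized as m":
print(accepts([("k","k"),("a","a"),("N","n"),("p","p"),("a","a"),("t","t")]))  # False
```

Note how the missing entry for p in state 2 and the nonfinality of state 3 implement the rejections discussed in the text.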
A conventional state diagram for the same transducer is shown below.
(7)
(8)
(9)
(10)
Both save-commands first prompt the user for the name of the output
file:
(11)
If the number of rules is not too large and the transducers are not
big, it may be advantageous to combine them into a single
transducer. This is what the command 'intersect'
does:
(13)
Intersected rules can be displayed and saved with the same commands as
the input automata:
(14)
(15)
Appendix 2 illustrates the use of the compiler on a larger and more
sophisticated set of rules.
The compiler expects as its input a two-level grammar. A
two-level grammar is a system of rules that constrain the surface
realization of lexical forms, hence the name two-level
rule. Two-level rules are parallel constraints; a pairing of a
lexical form with a surface form is valid only if it satisfies all the
rules. In principle, the grammar could consist of just a set of rules
and a specification of a pair alphabet, but in practice it is useful
to introduce some auxiliary notions. In this section we first describe
the general format of a two-level grammar and then focus on some
important details.
3.1 General format
A two-level grammar consists of sections that are headed by a
keyword. Every grammar must start with an Alphabet declaration
and end with Rules. In addition there may be three optional
sections headed by the keywords Diacritics, Sets, and
Definitions. In addition, comments can be freely inserted
anywhere. Anything that follows an exclamation mark, up to the end of
the line, is treated as a comment and ignored. If ! or
some other special symbol is used as an ordinary character, it must be
immediately preceded with a %.
The Alphabet section is a list of elements (single symbols,
complete or incomplete pairs) terminated with a semicolon. The rule
compiler needs to know all symbols in the lexical and surface alphabet
and all possible correspondences. It can deduce much of that
information directly from the rules. But if some symbol or correspondence
is not explicitly mentioned elsewhere, it must be declared in the
Alphabet section. For example:
(16)
The optional Diacritics section is a list of single lexical
symbols terminated with a semicolon. The compiler converts them to a
list of diacritic pairs with zero surface realization. Diacritics are
typically used as features that force or block the application of
particular rules. All rules that do not specifically refer to a
diacritic allow that diacritic to occur freely without affecting the
context that the rule specifies. For example, the declaration
(17)
The optional Sets declaration serves the same purpose as
distinctive features in generative phonology: it enables the user to
define classes of symbols that are equivalent in some respect. For
example, it is often useful to define a set of vowels and a set of
consonants:
(18)
(19)
(20)
The optional Definitions section allows the user to write
regular expressions to create and name two-level relations. For
example,
(21)
The Rules section is the last part of a two-level grammar:
(22)
3.2 Regular Expressions
Like the definitions, the correspondence and context parts of rules
are written as regular expressions of symbol pairs. Symbols that occur
as the first member of some pair constitute the lexical
alphabet, the set of second members is called the surface
alphabet. The total pair alphabet of a grammar is called the set
of feasible pairs. Because all symbols in two-level rules
denote pairs (for convenience x:x is written as
x), the regular expressions of a two-level grammar
designate equal-length relations rather than simple string
languages. The correspondence part typically consists of a single pair
but it can be a more complicated expression, for example, a union of
pairs. This section reviews the syntax of regular expressions
recognized by the compiler and the regular relations over symbol pairs
that they denote.
3.2.1 Brackets and Parentheses
Enclosing a regular expression in a pair of matching brackets yields
an equivalent expression: a:b, [a:b],
[[a:b]], etc. denote the relation consisting of the
single pair a:b. An empty pair of brackets,
[], denotes the empty relation that maps the
empty string to itself.
Round parentheses indicate optionality. (A) denotes
the union of the relation A with the empty
relation.
Curly brackets, { }, can be used as an alternative to
square brackets to form a union of regular relations. Union and other
Boolean operators are discussed in section 3.2.4.
If brackets or parentheses are used as ordinary symbols, they must be
prefixed with a percent sign; for example, %[ is the
relation that maps the left bracket to itself.
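The semantics of brackets and parentheses can be modeled concretely by representing a (finite) regular relation as a set of pair strings. The following Python fragment is an illustration of the definitions above, not the compiler's own representation:

```python
# Illustrative model: a finite regular relation is a set of pair
# strings, each pair string a tuple of (lexical, surface) pairs.

def sym(l, s):
    """The relation containing the single pair l:s."""
    return {((l, s),)}

# []: the empty relation, mapping the empty string to itself.
EMPTY = {()}

def optional(rel):
    """( A ): the union of A with the empty relation."""
    return rel | EMPTY

def union(*rels):
    """{ A | B | ... }: union of regular relations."""
    out = set()
    for r in rels:
        out |= r
    return out

a_b = sym("a", "b")
print(a_b == union(a_b))              # [a:b], [[a:b]] add nothing: True
print(optional(a_b) == a_b | EMPTY)   # (a:b) includes the empty pairing: True
```

The model is deliberately restricted to finite relations; the compiler itself of course works with automata, which also cover the infinite cases.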
3.2.2 Special symbols
The numeral 0 (zero) is special when it does not
constitute a part of some larger symbol (as in
007). It is the conventional two-level counterpart of
the epsilon symbol which stands for the empty string. But for
technical reasons 0 is different from epsilon in that
it is treated like any other symbol during the compilation of
two-level rules to transducers. In the two-level calculus, all regular
expressions must denote pairs of strings that are of the same length,
thus in a:0 both the lexical side of the relation,
a, and the surface side, 0, are
one-element strings. This is important because only equal-length
regular relations (see [5]) are closed under intersection and
negation, which are essential operations in the compilation of
two-level rules to transducers. The interpretation of zero as the
empty string takes place only in the application of the resulting
transducers to input strings. A zero that is prefixed with a percent
sign, %0, is interpreted as an ordinary numeral.
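The two interpretations of zero can be pictured as follows. During compilation 0 is an ordinary symbol, so the two sides of a pair string always have the same length; only when the resulting transducer is applied are the zeros dropped to obtain real strings. A small illustrative sketch:

```python
# Sketch of the two roles of 0: an ordinary symbol during compilation,
# the empty string at application time.

pairs = [("k", "k"), ("a", "a"), ("a", "0")]   # lexical kaa, a:0 deletes

lexical = "".join(l for l, s in pairs)          # zeros kept: equal length
surface_padded = "".join(s for l, s in pairs)   # "ka0", same length as "kaa"
surface = surface_padded.replace("0", "")       # zeros dropped when applied

print(lexical, surface_padded, surface)         # kaa ka0 ka
```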
In the two-level calculus, the question mark ? stands
for the union of all known and unknown feasible pairs; it is called
the other symbol. When a regular expression containing
? is compiled into an automaton, the interpretation of
? is dynamically updated to reflect the changes in the
pair alphabet. For example, in the course of converting the
sequence
(23)
(24)
The current implementation of the compiler assumes that the pair
alphabet is completely specified by the Alphabet declaration in
conjunction with the rules. When the total pair alphabet becomes
known, the compiler removes all remaining instances of
? from the automata. Consequently, the transducers
fail to apply to strings that contain unknown symbols. Because this
has proved inconvenient in practice, future versions of the compiler
may retain ? in output tables as a place holder for
new symbols.
For historical reasons, and for backward compatibility with older rule
systems, the compiler also recognizes the equal sign,
=, as a special any symbol. The interpretation
of = as a single symbol is the union of all feasible
pairs. It is like ? except that ? also
potentially includes unknown pairs. Symbol pairs in which
= appears on one side are interpreted in the same
manner as incomplete pairs (see next section). For example,
a: is equivalent to a:=.
The symbol # is commonly used to indicate a lexical
word boundary. Languages typically have rules that refer to the
beginning or the end of words. The compiler does not need to give any
special treatment to #, but other software that uses
the results of the rule compiler may have a built-in assumption that
words are flanked by #.
3.2.3 Incomplete pairs
It is possible to leave one side of a symbol pair unspecified. Such
one-sided pairs are interpreted as the union of all feasible pairs
which contain the specified symbol on the appropriate side. For
example, if k appears on the lexical side only in the
feasible pairs k, k:0,
k:v and k:j, then the one-sided symbol
k: is implicitly defined as the union of these four
relations. Similarly, :k denotes the union of all
feasible pairs that have k as the surface symbol.
The equal sign may be used as a filler in a partially specified pair:
k:= and =:k are equivalent to
k: and :k, respectively. It is also
legal, although possibly confusing, to use a plain colon to stand for
a completely unspecified pair, denoting the union of all feasible
pairs, which is what a lone = also means.
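The expansion of one-sided pairs can be sketched directly from an inventory of feasible pairs. The inventory below is the one assumed in the example above; the code is illustrative only:

```python
# Expanding one-sided pairs against an inventory of feasible pairs.
# The inventory is the one assumed in the text: k appears on the
# lexical side only in k, k:0, k:v, and k:j.

FEASIBLE = {("k", "k"), ("k", "0"), ("k", "v"), ("k", "j"), ("a", "a")}

def expand_lexical(l):
    """k: -> the union of all feasible pairs with l on the lexical side."""
    return {p for p in FEASIBLE if p[0] == l}

def expand_surface(s):
    """:k -> the union of all feasible pairs with s on the surface side."""
    return {p for p in FEASIBLE if p[1] == s}

print(sorted(expand_lexical("k")))      # the four k: pairs
print(expand_surface("k") == {("k", "k")})   # only k:k has surface k: True
```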
The last line is a prompt line where the user may type one of the available
commands. The compiler will keep prompting with a new command line until the
user quits by typing a control-C (UNIX) or command-Q (Macintosh).
******************************************************
* Two-Level Compiler 1.0 *
* created by *
* Lauri Karttunen, Todd Yampol, *
* Kenneth R. Beesley, and Ronald M. Kaplan. *
* Copyright (c) 1991-92 by the Xerox Corporation. *
* All Rights Reserved. *
******************************************************
Compiler commands: compile, install-binary, intersect,
list-rules, read-grammar, save-binary,
save-tabular, show, show-rules.
Switches: resolve, time, trace.
Information: ?, banner, help, storage, switches.
Command (C-Q = quit):
The two main commands are 'read-grammar' and
'compile'. To illustrate what they do, let us assume
that the user has the following mini-grammar stored as a file named
rules.txt:
Command (C-Q = quit): help
Typing "help" followed by the name of a command displays the on-line
help message for that command (Appendix 1 lists these messages).
(3)
! First example. Two rules for making the lexical form "kaNpan"
! realized as "kamman".
Alphabet
a b c d e f g h i j k l m N:m N:n n o p q r s t u v x y w z ;
Rules
"N realized as m" ! Before a lexical "p." Always, and only there.
N:m <=> _ p: ;
"p realized as m" ! After a surface "m". Always and only there.
p:m <=> :m _ ;
Note that the exclamation mark is a comment character; after seeing
!, the compiler ignores the rest of line. (More about
the format of the grammar in Section 3.)
As it processes the input, the compiler prints messages about its progress and
about errors it detects along the way. Here no errors occurred during
`read-grammar', so we are ready for the next step,
`compile':
Command (C-Q = quit): read-grammar
Input file: rules.txt
reading from 'rules.txt'...
Alphabet... Rules...
"N realized as m" "p realized as m"
Command (C-Q = quit):
The 'redo' command is useful if an error is detected
in the course of parsing or compiling. After correcting the problem,
the user can re-execute the 'read-grammar' and
'compile' commands on the current file by just
typing 'redo'.
Command (C-Q = quit): compile
Compiling "rules.txt"
Expanding... Analyzing...
Compiling rule components:
"N realized as m" "p realized as m"
Compiling rules:
"N realized as m"
N:m <=>
"p realized as m"
p:m <=>
Done.
Command (C-Q = quit):
Transducers are displayed on the screen as tables in which each row
represents a state and the columns represent transition functions for
a class of symbol pairs. In a transducer, all symbols are pairs but
conventionally we write just one symbol when the two members of the
pair are the same. Thus a and p in the
table above mean a:a and p:p,
respectively. The first member of a pair is a lexical symbol,
the second is a surface symbol. The rows are labeled with the
name of the state (an integer) followed by a colon if the state is
final and a period if it is nonfinal. At the head of each column is
the representative first member of a class of symbol pairs that have
the transitions indicated by the column. For example, the number
3 in the first row of the N:m column
in the above table means that there is a transition from state
1 (a final state) to state 3 (a
nonfinal state) labeled N:m.
Command (C-Q = quit): show
Rule name: "N realized as m"
a p N:m N:n
1: 1 1 3 2
2: 1 3 2
3. 1
Equivalence classes:
((a b c d e f g h i j k l m n o q r s t u v w x y z) (p p:m) (N:m)
(N:n))
Command (C-Q = quit):
This table is equivalent to the state diagram shown above.
Command (C-Q = quit): show "p realized as m"
a m p p:m
1: 1 2 1
2: 1 2 2
Equivalence classes:
((a b c d e f g h i j k l n o q r s t u v w x y z N:n) (m N:m) (p)
(p:m))
The command 'lex-test' prompts for a lexical string
and applies the transducers to it, returning any surface strings that
are generated and the mapping from the lexical to the surface string
as a listing of symbol pairs.
Command (C-Q = quit): list-rules
"N realized as m" 3 x 4
"p realized as m" 2 x 4
The command 'pair-test' prompts for both a lexical
string and a surface string; these paired strings, which must contain
an equal number of symbols, are then run through the transducers,
which will either accept or reject them.
Command (C-Q = quit): lex-test
Lexical string (q = quit): kaNpat
kammat
k
a
N:m
p:m
a
t
The results of the compilation can be saved in two formats:
'save-binary' and 'save-tabular'. The
command 'save-binary' produces a compact binary file
from which the automata can be reloaded to the compiler for testing or
intersection; 'save-tabular' makes a simple ASCII file
that can be read in by Kimmo Koskenniemi's two-l program for
morphological analysis. We expect that future versions of PC-KIMMO [7]
will also accept the same tabular format.
Command (C-Q = quit): pair-test
Lexical string (q = quit): kaNpat
Surface string (q = quit): kammat
k
a
N:m
p:m
a
t
ACCEPTED
Lexical string (q = quit): kaNpat
Surface string (q = quit): kampat
k
a
N:m
p
REJECTED: "p realized as m" fails in state 2.
The resulting tabular file looks like this:
Command (C-Q = quit): save-tabular
Output file: rules.aut
Writing "rules.aut"
Command (C-Q = quit):
(12)
ALPHABET
N a b c d e f g h i j k l m n o p q r s t u v w x y z
NULL 0
END
AUTOMATA
"N realized as m" 3 4
1: 1 1 3 2
2: 1 0 3 2
3. 0 1 0 0
"p realized as m" 2 4
1: 1 2 1 0
2: 1 2 0 2
ALIGNMENT
a a 1 1
b b 1 1
... [omitting alignments of b:b, c:c, etc.]
z z 1 1
N m 3 2
N n 4 1
p m 2 4
END
Transition tables in a tabular file are the same as those that
'show' and 'show-rules' display on the
screen except that 0 rather than a blank is used to
indicate that there is no transition for a symbol. Equivalence classes
are not explicitly shown. The alignment table at the end of the file
indicates for every pair in the alphabet which column in each
transducer applies to that pair. For example, the line
means that the transitions for the pair N:m are in the third
column of the first transducer and in the second column of the second
transducer.N m 3 2
The user has the option of selecting a set of rules to be intersected
or applying the operation to all the rules. The default is to
intersect all rules. Another option is to choose a name for the
result of the intersection. If the user simply hits a carriage return
in response to both prompts, as in the example above, the result is
called "Unnamed 1".
Command (C-Q = quit): intersect
Intersect all rules? (y/n) [y]:
Name the intersection:
Composing common alphabet...Expanding...Intersecting...
4 states, 6 equivalence classes, 15 arcs.
Command (C-Q = quit):
This single transducer describes the same relation as the original
rules in tandem, but we can no longer see which individual rule causes
the rejection of an incorrect pair:
Command (C-Q = quit): show-rules
"Unnamed 1"
a m p N:m N:n p:m
1: 1 3 1 2 4
2. 3
3: 1 3 2 4 3
4: 1 3 2 4
Equivalence classes:
((a b c d e f g h i j k l n o q r s t u v w x y z) (m) (p) (N:m)
(N:n) (p:m))
The intersection of two or more rule transducers typically
has more states than its components; in the worst case its size is the
product of the input sizes. The processing time required to run a
transducer during analysis or generation is fairly constant and
independent of the size; if it is feasible to intersect some rules,
the resulting system is faster.
Command (C-Q = quit): pair-test
Lexical string (q = quit): kaNpat
Surface string (q = quit): kampat
k
a
N:m
p
REJECTED: Rule "Unnamed 1" fails in state 2.
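The "rules in tandem" behaviour can be pictured as running every rule transducer in parallel over the same pair string: a pair string is valid only if all of them accept, which is logically the same as running their intersection. The sketch below uses deliberately simplified stand-ins for compiled rules (each merely forbids one pair); it illustrates the runtime intersection, not the compiler's actual machinery:

```python
# Runtime intersection: a pair string is accepted only if every rule
# "transducer" accepts it. The rules here are toy stand-ins.

def make_rule(forbidden):
    """A toy one-state checker that rejects any string containing a
    forbidden pair (a stand-in for a compiled rule transducer)."""
    def run(pairs):
        return all(p not in forbidden for p in pairs)
    return run

rule1 = make_rule({("N", "n")})   # toy rule: N may not surface as n
rule2 = make_rule({("p", "p")})   # toy rule: p may not surface as p

def tandem(rules, pairs):
    """Accept only if all rules accept: intersection at run time."""
    return all(rule(pairs) for rule in rules)

s = [("k", "k"), ("a", "a"), ("N", "m"), ("p", "m")]
print(tandem([rule1, rule2], s))                   # True
print(tandem([rule1, rule2], s + [("p", "p")]))    # False
```

An intersected transducer gives the same accept/reject decisions in a single pass, at the cost of losing the information about which component rule is responsible for a rejection.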
This declaration tells the compiler that a (=
a:a), b:0, and 0:c are
possible pairs. It does not constrain them in any way. If
b and b:0 are possible pairs and there
are no rules that require b to be realized only one
way in some contexts, then b can be deleted or
realized as b anywhere. The incomplete pair
:d tells the compiler that d is a
possible surface symbol but does not indicate what its lexical
counterparts are; e: stands for pairs that have an
e on the lexical side. Because incomplete pairs like
:d and e: in the Alphabet
section are meaningful only if there are complete pairs of that type
declared elsewhere, it is not necessary to mention them, although it
may be a useful reminder for the rule writer.
Alphabet
a b:0 0:c :d e: ;
tells the compiler that ':0 and @:0 are
possible pairs but are irrelevant to any rule that doesn't mention them
explicitly.
Diacritics
' @ ;
A symbol that has been defined as a set can be used in a rule to replace a
union. For example,
Sets
Vowel = a e i o u;
Cons = b c d f g h j k l m n p q r s t v x y w z ;
VclessStop = k p t ;
Liquid = l r ;
is equivalent to
k:0 <=> Vowel _ Vowel ;
Similarly, Vowel:0 means {a:0 | e:0 | i:0 | o:0
| u:0}, assuming that all five pairs occur somewhere in the
grammar (more about that in Section 3.6).
k:0 <=> {a | e | i | o | u} _ {a | e | i | o | u} ;
defines ClosedOffset as a relation consisting of a surface
consonant followed either by a second surface consonant or by a word
boundary on the lexical side. When ClosedOffset
is found in a rule, the compiler interprets it as a reference to a
relation it only needs to compute once. From the grammar writer's
point of view, definitions can be thought of and used like macros.
Definitions
ClosedOffset = :Cons {:Cons | #:} ;
Each rule must have (1) a name, written within double quotes,
(2) a lexical:surface correspondence like k:0
that is constrained by the rule, (3) a rule operator that
defines the nature of the constraint, (4) one or more contexts,
divided between a left and a right context by an
underscore, and, optionally (5) a where-clause that specifies
the assignment of values to variables (if any). Each context and the
optional where clause must be terminated with a
semicolon. Although the formalism does not require it, in practice it
is useful to include a couple of comment lines (starting with
!) to explain what the rule is intended to do. If a
semicolon, underscore or exclamation point is used in a rule as an
ordinary symbol, it must be prefixed with %, the
escape character. The same goes, of course, for the escape character
itself.
Rules
"Consonant gradation" ! Single k, p, and t are realized as
! zero, v, and d, respectively, in the
! beginning of a closed syllable.
Cx:Cy <=> h | Liquid | Vowel: _ Vowel ClosedOffset;
where Cx in (k p t)
Cy in (0 v d)
matched ;
"Geminate gradation" ! Geminate stops are shortened to
! single stops in a closed syllable.
Cx:0 <=> Cx _ Vowel ClosedOffset ;
where Cx in VclessStop ;
to a transducer from left to right, the first member of the
concatenation starts out just as other and becomes the union of
other with a:b and c:d as the
interpretation proceeds. An equivalent expression for the same
relation is
? a:b c:d
where the brackets and vertical bars mark the union. A question mark
prefixed with a percent sign, %?, is interpreted as a
question mark without a special meaning.
[ ? | a:b | c:d] a:b c:d
[] = ([]) | Because [] denotes the empty relation, ([]) is the union of the empty relation with itself. |
\[] = ? | The term complement of the empty relation consists of the union of all feasible pairs. |
~[] = ?+ | The complement of the empty relation is the set of all nonempty relations. |
$? = ?+ | Yet another way of describing the class of nonempty relations. |
$[] = ?* | The class of relations that contain the empty relation is the same as zero or more concatenations of all possible pairs. |
~$[] = ~[?*] | Two equivalent ways of describing the null relation, that is, a relation that does not contain anything at all, not even the pairing of an empty string with itself. |
A | ~$[] = A | The null relation is the identity element for union. |
A & ?* = A | The universal relation is the identity for intersection. |
A - ~$[] = A | The null relation is the right-sided identity for minus. |
A ~$[] = ~$[] | Concatenating anything to the null relation (on either side) yields the null relation. |
A [] = A | The empty relation is the identity for concatenation. |
?* ?* = ?* | Concatenating the universal relation to itself is vacuous. |
(A) ?* = ?* | Concatenating an optional element to the universal relation (on either side) is vacuous because (A) includes the empty relation. |
~[?+] ?* = ?* | Concatenating the complement of any nonempty relation to the universal relation (on either side) is vacuous for the same reason. |
In writing two-level rules, it is important to keep the last two equivalences in mind. Optional relations and negated nonempty relations at the outer edges of a context specification are meaningless because all contexts are extended by concatenating the universal relation to the beginning of the left context and to the end of the right context. A rule written as
(25)
A <=> B _ C ;
in fact means
(26)
A <=> ?* B _ C ?* ;
Table 1
Operator | Example | Translation
<= | a:b <= c _ d | a is always realized as b in the context c _ d.
=> | a:b => c _ d | a is realized as b only in the context c _ d.
<=> | a:b <=> c _ d | a is realized as b in c _ d and nowhere else.
/<= | a:b /<= c _ d | a is never realized as b in the context c _ d.
The meaning of simple two-level rules that do not involve multiple contexts or variables can be explained easily in terms of the regular expression calculus described in section 3.2.
The prohibition rule,
(27)
a:b /<= c _ d ;
disallows the appearance of a:b when preceded by c and followed by
d. A regular expression stating this condition is
(28)
~[?* c a:b d ?*]
Note the extension of the context with the universal relation on both sides.
The left-arrow rule
(29)
a:b <= c _ d ;
requires that a lexical a be realized as b when preceded by c and
followed by d. It fails if a lexical a has some other surface
realization in this context. A regular expression to that effect is
(30)
~[?* c [a: - a:b] d ?*]
There is an asymmetry in that the rule does not require a surface b in
this environment to have a lexical a as its counterpart. This shows up
in the expression [a: - a:b] that denotes the union of all pairs that
have a on the lexical side but something other than b on the surface
side.
The right arrow rule
(31)
a:b => c _ d ;
restricts the occurrence of a:b to contexts in which it is preceded by
c and followed by d. A regular expression for this constraint is a bit
more convoluted than in the case of /<= and <= rules. The first
disjunct below says that c does not precede a:b, the second disjunct
states that d does not follow a:b; the entire expression states that
neither one of these bad situations is allowed.
(32)
~[[~[?* c] a:b ?*] | [?* a:b ~[d ?*]]]
The double arrow rule
(33)
a:b <=> c _ d ;
is simply a combination of the left and right arrow rules. It requires
the realization of a as b in this context and allows it only
there. The corresponding regular expression is the conjunction of the
formulas expressing the <= and => aspects of the constraint.
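For concrete single-pair contexts, the prohibition semantics above, ~[?* c a:b d ?*], just says that the pair string contains no substring c a:b d. That condition can be checked directly, as in the following illustrative sketch (not the compiler's algorithm, which works on automata):

```python
# Checking the /<= condition ~[?* c a:b d ?*] by direct scanning.

def violates_prohibition(pairs, c, ab, d):
    """True if the pair string contains the forbidden substring c ab d."""
    for i in range(len(pairs) - 2):
        if pairs[i] == c and pairs[i + 1] == ab and pairs[i + 2] == d:
            return True
    return False

c, ab, d = ("c", "c"), ("a", "b"), ("d", "d")
ok  = [("c", "c"), ("a", "a"), ("d", "d")]   # a:a in c _ d: allowed
bad = [("c", "c"), ("a", "b"), ("d", "d")]   # a:b in c _ d: forbidden
print(not violates_prohibition(ok, c, ab, d))   # True
print(violates_prohibition(bad, c, ab, d))      # True
```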
3.4 Contexts
A two-level rule can have any number of contexts. In the compilation
of <= and /<= rules (and the
<= side of double arrow rules), multiple contexts
are treated conjunctively.
(34)
In => rules, multiple contexts are read
disjunctively.
(35)
(36)
(37)
The first a:b correspondence is allowed because it
precedes b on the surface side, as the first context
stipulates; the second a:b correspondence is allowed
because it follows an a on the lexical side, as
required by the second context. Deriving the correct transducer from
rules of this sort is quite complicated. See [8] for details.
Because two-level rules can refer to both lexical and surface
contexts, they can describe phenomena that in classical rewriting
systems require ordered rules. For example, the two rules in Example
(3) in section 2:
(3)
"p realized as m" ! After a surface "m". Always and only there.
p:m <=> :m _ ;
The numbers in the middle trace the state transitions in the
transducers (6) and (7) compiled from the two rules in section 2 as
they process the lexical-surface matches in this example. The
corresponding rewrite rules,
(38)
3.5 Variables
A two-level rule may contain any number of variables that take
as value a set of simple symbols or sets. Variables and the assignment
of values to them are specified in a where clause that follows
the last context of the rule. A where clause consists of (1)
the keyword where followed by (2) one or more variable
declarations, and (3) an optional keyword (matched or
freely) that specifies the mode of value assignment. A
variable declaration contains (1) a variable name, (2) the keyword
in, and (3) a range of values. The range may be either a set
name or a list closed in parentheses containing symbols or set
names. For example,
(39)
(40)
If the keyword matched is not present at the end of a
where clause, the assignment of values to variables is not
coordinated. In this case, assigning values freely to the two
variables would create 16 subrules with all possible pairings of
values from the two sets.
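The difference between matched and free assignment can be pictured as the difference between pairing the value lists positionally and taking all combinations. The four-element value sets below are hypothetical, chosen only to reproduce the counts mentioned above; this is an illustration, not the compiler's expansion routine:

```python
# Matched vs. free assignment of values to two rule variables.
from itertools import product

Cx = ["k", "p", "t", "s"]    # hypothetical variable ranges
Cy = ["0", "v", "d", "z"]

matched = list(zip(Cx, Cy))       # coordinated: 4 subrules (k~0, p~v, ...)
free    = list(product(Cx, Cy))   # uncoordinated: 16 subrules, all pairings

print(len(matched), len(free))    # 4 16
```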
Matched variables are also convenient when there is a dependence
between the correspondence and context parts of the rule. A case of
that sort is a rule that creates a geminate consonant at a morpheme
boundary:
(41)
Variables can also be used to encode dependences between left and
right parts of the context. For example, if HighLabial
is defined as
(42)
(43)
(44)
3.6 Inventory of feasible pairs
When a set is used as the range specification, the values are assigned
from the set in the order they appear in the set declaration. A more
complicated case is when a set name and an ordinary symbol appear in
the same value list. Assuming the set definitions
(45)
(46)
The compiler expands the "Back Harmony" rule to two
subrules:
(47)
One possibility would be to take the crossproduct of the two sets in
question: 0:a, 0:i4,
0:o, 0:u, I:a,
I:i4, I:o,
I:u. However, the compiler follows a more conservative
policy: every feasible pair must appear explicitly somewhere in the
grammar. Pairs that do not appear in the alphabet declaration or
in some rule are considered to be impossible. In the case at hand, it
turns out that four of the eight pairs appear elsewhere in the grammar
(0:i4, 0:u, I:i4,
I:u) but the other four do not. Consequently, the
compiler interprets the "Back Harmony" rule as
(48)
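The conservative policy can be sketched as a set intersection: instead of the full crossproduct of the two value ranges, the compiler keeps only those pairings that occur as feasible pairs somewhere in the grammar. The inventory below repeats the four pairs named in the text; the grammar itself is hypothetical:

```python
# Restricting variable pairings to feasible pairs (illustrative).

lexical_values = ["0", "I"]
surface_values = ["a", "i4", "o", "u"]

# Pairs that actually appear elsewhere in the (hypothetical) grammar:
FEASIBLE = {("0", "i4"), ("0", "u"), ("I", "i4"), ("I", "u")}

crossproduct = {(l, s) for l in lexical_values for s in surface_values}
subrule_pairs = crossproduct & FEASIBLE   # only the declared pairs survive

print(len(crossproduct), sorted(subrule_pairs))
```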
4. Error checking
4.1 Simple errors
If there is a syntactic error in the grammar, such as a missing colon
or an extra bracket, it will be detected and reported by
`read-grammar' when the rules and declarations are
parsed. The parser also performs some more sophisticated checks. It
makes sure that sets and relation names are defined before they are
used. It tries to determine that variables are used correctly. The
`compile' command is not available if syntactic errors
are detected in the parsing phase.
The `compile' routine makes further checks on
definitions and variable assignments. For example, it reports an error
if a term that has been defined as a relation appears on the upper or
lower side of a symbol pair. If errors are detected in the initial
stages, the compilation is aborted. When a rule has been compiled, the
resulting transducer is checked to make sure that it does not
completely block the appearance of some symbol pair or pairs. This
type of error occurs, for example, in the following rule:
(49)
(50)
(51)
(52)
Even if the compiler is not able to discover the reason for the error,
it will complain if some class of symbol pairs becomes completely
blocked by a rule.
4.2 Rule conflicts
Because two-level rules are statements about what is required,
permitted, or prohibited and these statements all have equal force, it
can easily happen that some rules are in conflict with one
another. There are two basic types of conflicts. In a left-arrow
conflict, two rules require a certain lexical symbol to be
realized in two different ways in the same context. In a
right-arrow conflict, different rules place contradictory
constraints on contexts in which a certain correspondence is
allowed.
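The two conflict types can be caricatured in a few lines of Python. Here rule contexts are approximated as finite sets of (left, right) pairs, whereas the compiler of course works with regular languages; the rules and context labels below are invented or simplified:

```python
# Rule = (lexical symbol, surface symbol, set of (left, right) contexts).
# Contexts are finite sets here; the compiler works with regular
# languages.  All rules below are invented or simplified.

def left_arrow_conflict(r1, r2):
    # The same lexical symbol is forced to two different surface
    # realizations in some shared context.
    (lex1, surf1, ctx1), (lex2, surf2, ctx2) = r1, r2
    return lex1 == lex2 and surf1 != surf2 and bool(ctx1 & ctx2)

def right_arrow_conflict(r1, r2):
    # The same pair is restricted by two rules to disjoint contexts.
    (lex1, surf1, ctx1), (lex2, surf2, ctx2) = r1, r2
    return (lex1, surf1) == (lex2, surf2) and not (ctx1 & ctx2)

# %+:k => c _ Vowel  vs.  %+:k => k _ Vowel : a right-arrow conflict
r1 = ("+", "k", {("c", "Vowel")})
r2 = ("+", "k", {("k", "Vowel")})
print(right_arrow_conflict(r1, r2))   # True

# a:o vs. a:' required in an overlapping context: a left-arrow conflict
r3 = ("a", "o", {("", "s @")})
r4 = ("a", "'", {("", "s @")})
print(left_arrow_conflict(r3, r4))    # True
```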
The following pair of sandhi rules for Sanskrit is an example of a
typical left-arrow conflict:
(53)
(54)
Some conflicts are more difficult to detect because they arise only
under certain assignments of values to variables. On closer
inspection, our Gemination rule, (37), discussed in section 3.5,
turns out to harbor a right-arrow conflict.
(41)
(55)
One option is to remove the two conflicting cases from the general
gemination rule and write a second rule especially for
%+:k:
(56)
4.3 Conflict Resolution
The compiler can find all left- and right-arrow conflicts before a
rule is compiled by comparing every one of its contexts and subrules
with all other contexts and rules in the grammar. Whether the compiler
does that depends on the setting of the `resolve' flag
that the user can toggle (section 6.2). The default is to check, report,
and, if possible, resolve all conflicts. For example, the problem with
the two sandhi rules, (53) in section 4.2, is duly noticed when the
compiler takes up the first rule:
(57)
In a
case like this one, if it is necessary to eliminate the conflict, the
rule writer has to decide which rule should have precedence in the
disputed case and amend the context specifications in the appropriate
way. There is no general principle that could be relied on to resolve
the matter. But there are left-arrow conflicts that can be resolved in
a principled way. A case in point is the conflict between the
"Consonant gradation" rule, (22) in section 3.1, and
the "Gradation of k between u/y", (43) in section
3.5;
(58)
(59)
(60)
What makes this case different from the conflict between the Sanskrit
rules is that the context of one rule is properly nested inside the
context of the other. Both rules have ClosedOffset at
the edge of the right context, a pair of identical vowels is a pair of
vowels, and the left end of the k:v context,
Cons, is a part of the universal relation that extends
the left side of the k:0 rule. The relation between
the two rule contexts is illustrated in Figure 4:
Viewed in this way, the k:v rule is obviously an
exception to a more general rule about k which in turn
is a subcase of a yet more general rule dealing with the alternation
of single stops in Finnish. One expects a specific rule to take
precedence over a general one, and that is the case here: in the
domain of the k:v rule, k should be
realized as v.
When the compiler sees a left-arrow conflict between two rules that
are related in this way, it does not report it as an error but
automatically gives precedence to the more specific rule. In the case
at hand, it reports:
(61)
(62)
Another common type of conflict that the compiler can resolve is
illustrated by the two contradictory subrules of the gemination rule
discussed in section 4.2:
(63)
(64)
(65)
(66)
(67)
(2) Right-arrow conflicts are always resolved.
5. Rule testing
The compiler provides two kinds of rule testing: 'lex-test'
and 'lex-test-file' apply the rule transducers to lexical
strings, generating surface strings; 'pair-test' and
'pair-test-file' apply the rule transducers to a lexical
string and a surface string, either accepting or rejecting the pair.
Both test modes can be valuable for debugging rules.
5.1 Applying rule transducers to lexical forms
Once rules have been read in using 'read-grammar' and
compiled using 'compile', they can be tested using
'lex-test'. Previously compiled rules that have been
saved to file using 'save-binary' can also be loaded
for testing using 'install-binary'.
'lex-test' prompts for a lexical string, applies the
rule transducers, and returns any surface strings that are generated.
The following example uses the rules.txt grammar in example (3).
When the user enters the lexical string kampan or
kaNpan, the rule transducers generate the surface
string kamman.
(68)
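The generation step that 'lex-test' performs can be approximated in Python. In this sketch (an illustration only; the second rule is a guessed two-level counterpart of the rewrite rule p -> m / m _), every pairing of the lexical string with feasible surface symbols is enumerated and kept if all rules accept it:

```python
from itertools import product

# Generation in the style of 'lex-test': enumerate all pairings of the
# lexical string with feasible surface symbols and keep those that every
# rule accepts.  Real rules are transducers; here each rule is a
# predicate over the whole pair sequence.

feasible = [("N", "m"), ("N", "n"), ("p", "m"), ("p", "p"),
            ("k", "k"), ("a", "a"), ("n", "n"), ("m", "m")]

def rule_N_to_m(pairs):
    # N:m <=> _ p:   -- N is realized as m iff a lexical p follows
    for i, (lex, surf) in enumerate(pairs):
        if lex == "N":
            before_p = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
            if (surf == "m") != before_p:
                return False
    return True

def rule_p_to_m(pairs):
    # p:m <=> :m _   -- p is realized as m iff a surface m precedes
    for i, (lex, surf) in enumerate(pairs):
        if lex == "p":
            after_m = i > 0 and pairs[i - 1][1] == "m"
            if (surf == "m") != after_m:
                return False
    return True

rules = [rule_N_to_m, rule_p_to_m]

def generate(lexical, rules):
    choices = [[s for (l, s) in feasible if l == c] for c in lexical]
    for surface in product(*choices):
        pairs = list(zip(lexical, surface))
        if all(rule(pairs) for rule in rules):
            yield "".join(surface)

print(list(generate("kaNpan", rules)))   # ['kamman']
print(list(generate("kampan", rules)))   # ['kamman']
```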
The list of lexical-surface pairs is helpful when the output string
is related to the input in a more complicated way than in the cases
we have seen so far. For example, a set of rules deriving past tense
forms for irregular verbs in English might relate buy
to bought in the following way:
(69)
The command 'lex-test-file' works exactly like
'lex-test' except that the rule transducers are applied
to lexical forms read in from a file. The user is prompted for the
input file and output file; specifying the output file '-'
(hyphen) causes the output to be written to the screen.
The input file for 'lex-test-file' should consist of
lexical strings, one per line. Blank lines are ignored.
(70)
5.2 Checking lexical-surface string pairs
Compiled rules can also be tested in 'pair-test' mode,
which prompts for both a lexical string and a surface string. If the
rule transducers allow the pairing of the two strings, the program prints
ACCEPTED; if the pairing of the strings is illegal,
it prints out REJECTED, the name of the violated rule,
and the state in which the rule failed. Because 'pair-test'
also prints out any initial substrings up to and including the point of
rejection, the exact input causing the failure is easily identified.
(71)
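The checking that 'pair-test' performs can be sketched as follows (an illustration only: rules are modeled as named predicates over the pair sequence rather than transducers, so unlike the real program no failing state number is available):

```python
# 'pair-test' in miniature: pair the strings symbol by symbol and report
# the first rule that rejects.  Rules are predicates here, so unlike the
# real program no failing state number can be reported.

def rule_N_to_m(pairs):
    # N:m <=> _ p:   -- N is realized as m iff a lexical p follows
    for i, (lex, surf) in enumerate(pairs):
        before_p = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
        if lex == "N" and (surf == "m") != before_p:
            return False
    return True

def pair_test(lexical, surface, rules):
    if len(lexical) != len(surface):
        return "ERROR: mismatched string lengths"
    pairs = list(zip(lexical, surface))
    for name, rule in rules:
        if not rule(pairs):
            return 'REJECTED: "%s"' % name
    return "ACCEPTED"

rules = [("N realized as m", rule_N_to_m)]
print(pair_test("kaNpat", "kammat", rules))   # ACCEPTED
print(pair_test("kaNpat", "kanpat", rules))   # REJECTED: "N realized as m"
```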
The 'pair-test' facility detects and reports various
input errors, including the absence of compiled transducers, mismatched
string lengths, undeclared alphabetic symbols, and illegal lexical-surface
pairs. 'pair-test' is especially valuable for finding
the cause of undergeneration, where some rule in the system is blocking
an analysis or generation that is considered valid.
The command 'pair-test-file' works exactly like
'pair-test' except that the input is read from a file.
The user is prompted for an input file and an output file. As with
'lex-test-file', specifying '-' (hyphen)
as the output file causes the output to be written to the screen.
The input file for 'pair-test-file' should contain
pairs of a lexical string followed on a separate line by a matching
surface string, with one string per line. Blank lines are ignored. As
with 'pair-test', the strings can optionally be
filled out with spaces to allow vertical alignment of individual
feasible pairs.
(72)
6. Compiler options
6.1 Binary output
The results of the compilation can be saved as simple ASCII tables
(`save-tabular') or in a more compact binary
representation (`save-binary'). The binary encoding
format is documented in [9]. If the data is saved in binary form, the
transducers can later be read back in from the file using the command
`install-binary' for intersecting or just for
displaying the tables on the screen. There is currently no
`install-tabular' command.
6.2 Switches
The compiler has three on-off switches that the user can toggle:
`resolve', `time', and `trace'.
The command `switches' reports on the current status:
(73)
6.2.1 Resolve
Turning off conflict resolution speeds up the compiler, but it is not
recommended unless the rule system is very large and already known to
be free of conflicts. Even if the `resolve' mode is
turned off, the compiler checks for gross errors of the type discussed
in 4.1. It can catch errors in a single rule, but it does not compare
rules against one another. Because of that, intersecting a set of
rules compiled with `resolve' turned off may fail, because the
intersection brings out any hidden rule conflicts.
6.2.2 Time
`time' prints out the execution time of
`read-grammar', `compile', and
`intersect' commands.
6.2.3 Trace
If the `trace' flag (= verbose mode) is turned on for
`read-grammar', the alphabet, rules and other sections
of the grammar are displayed on the screen as they are read
in. Regular expressions are printed with minimal bracketing as the
compiler has interpreted them.
With the `trace' flag off, `compile'
prints a status message as it moves from one compilation phase to the
next and the name of the rule it is working on. During the last stage,
when individual subrules are compiled, the correspondence part of the
subrule and the operator are displayed on the screen. If the rule is a
double-arrow rule, the left-arrow part is compiled first, then the
right-arrow part counting down context numbers, if there are more than
one; finally, the transducers for the two parts are intersected. For
example, when the compiler starts to work on the "Gradation of
k between u/y" rule in (59), which expands to one subrule
with two contexts (60), it prints
6.3 Storage
The `storage' command lists the amount of memory the
compiler is using for various internal data structures:
(74)
Acknowledgments
Thanks to Ronald M. Kaplan and Annie Zaenen for the many comments and
suggestions that helped to improve the content and the structure of
this paper.
References
[1] Koskenniemi, Kimmo. Two-level Morphology: A General
Computational Model for Word-Form Recognition and
Production. Department of General Linguistics. University of
Helsinki. 1983.
[2] Chomsky, Noam and Morris Halle. The Sound Pattern of
English. Harper and Row. New York. 1968.
[3] Johnson, C. Douglas. Formal Aspects of Phonological
Description. Mouton. The Hague. 1972.
[4] Kaplan, Ronald M. and Martin Kay. Phonological rules and
finite-state transducers [Abstract]. Linguistic Society of America
Meeting Handbook. Fifty-sixth Annual Meeting, December 27-30,
1981. New York.
[5] Ritchie, Graeme. Languages Generated by Two-Level Morphological
Rules. Computational Linguistics 18(1): 41-59. 1992.
[6] Karttunen, Lauri. Finite-State Constraints. In the Proceedings
of the International Conference on Current Issues in Computational
Linguistics, June 10-14, 1991. Universiti Sains Malaysia, Penang,
Malaysia. 1991. To appear in The Last Phonological Rule, ed. by
John Goldsmith, Chicago University Press.
[7] Antworth, Evan L. PC-KIMMO: a two-level processor for
morphological analysis. Occasional Publications in Academic
Computing No. 16, Summer Institute of Linguistics, Dallas,
Texas. 1990.
[8] Karttunen, Lauri, Kimmo Koskenniemi, and Ronald M. Kaplan. A
Compiler for Two-level Phonological Rules. In Dalrymple, M. et
al. Tools for Morphological Analysis. Center for the Study of
Language and Information. Stanford University. Palo Alto. 1987.
[9] Karttunen, Lauri. Binary Encoding Format for Finite-State
Networks. Technical Report SSL-90-17. Xerox Palo Alto Research
Center. Palo Alto. March 1990.
means that a must be realized as b in
the context c _ d and in the context e _
f. The corresponding /<= rule prohibits
the occurrence of a:b in both contexts.
a:b <= c _ d ;
      e _ f ;
says that a:b must occur either in the context
c _ d or in the context e _
f. The compilation of right-arrow rules with multiple
contexts is more complicated than the formula for the one-context rule
in section 3.3 suggests because the relation that is regulated may
itself be involved in context specifications. For example, the rule
a:b => c _ d ;
      e _ f ;
allows the realization of the lexical string aa as
bb:
a:b => _ :b ;
      a: _ ;
have the effect of realizing the lexical string kaNpan as
kamman when applied in parallel:
"N realized as m" ! Before a lexical "p". Always, and only there.
N:m <=> _ p: ;
would have to be applied sequentially to generate the same surface forms
(kaNpan --> kampan --> kamman). See [6] for more examples
and discussion of this aspect of the two-level rule formalism.
N -> m / _ p
p -> m / m _
In this rule, the two variables, Cx and
Cy, each have four possible values. The keyword
matched means that the assignment of values to the two
variables is done in tandem so that when Cx takes its
n'th value, Cy has its n'th value. The
compiler interprets this type of rule as a conjunction of four
independent subrules:
"Final Devoicing"
Cx:Cy <=> _ #: ;
where Cx in (b d c g)
Cy in (p t c2 k)
matched ;
Subrules are compiled individually and the resulting transducers are
intersected.
b:p <=> _ #: ;
d:t <=> _ #: ;
c:c2 <=> _ #: ;
g:k <=> _ #: ;
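The effect of matched can be pictured as positional pairing (zip) of the variable value lists, in contrast to the Cartesian product that would be used without the keyword. A short Python sketch, not the compiler's code:

```python
from itertools import product

# 'matched' pairs variable values positionally (zip); without it every
# combination would be generated (Cartesian product).  The template
# string is a stand-in for the rule skeleton.

def expand(template, cx_values, cy_values, matched):
    combos = (zip(cx_values, cy_values) if matched
              else product(cx_values, cy_values))
    return [template % pair for pair in combos]

template = "%s:%s <=> _ #: ;"
cx = ["b", "d", "c", "g"]
cy = ["p", "t", "c2", "k"]

for subrule in expand(template, cx, cy, matched=True):
    print(subrule)
```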
For example, the rule would realize lexical forms such as
big+er, picnic+ed, and
yak+ing as the appropriate English surface forms
bigger, picnicked, and
yakking (but... see section 4.2).
"Gemination"
%+:Cx <=> Cy _ Vowel ;
where Cx in (b k d f g k l m n p r s t v z)
Cy in (b c d f g k l m n p r s t v z)
matched ;
the following rule realizes k as v in
two contexts:
HighLabial = u y ;
Here no variable occurs in the correspondence part of the rule; the
compiler only needs to expand the context part:
"Gradation of k between u/y" ! k weakens to v between u's
! and y's.
k:v <=> Cons Vx _ Vx ClosedOffset ;
where Vx in HighLabial ;
Because the rule only contains one variable, the interpretation of
(43) is the same regardless of whether the keyword matched is
present or absent.
k:v <=> Cons u _ u ClosedOffset ;
Cons y _ y ClosedOffset ;
what is the meaning of the following harmony rule for Turkish?
BackVowel = a i4 o u ;
HiHarmony = 0 I ;
"Back Harmony"
Vx:Vy <=> :BackVowel :Cons* _ ;
where Vx in (HiHarmony E)
Vy in (BackVowel a)
matched ;
The second subrule can be compiled straightforwardly but the first
subrule raises the question of what interpretation should be assigned
to the pair
HiHarmony:BackVowel <=> :BackVowel :Cons* _ ;
E:a <=> :BackVowel :Cons* _ ;
which has a set name on both sides. The same problem also arises in
any simple rule that has a set name in the correspondence part.
HiHarmony:BackVowel
Note that the E:a pair is assumed to be valid even if
it does not appear anywhere else in the grammar because it arises
directly from the interpretation of the where clause. Because
the interpretation of "implicit definitions," such as
E:, :a,
HiHarmony:BackVowel, etc. depends on the inventory of
feasible pairs, it is important to keep in mind that some pairs can
arise as a result of interpreting where clauses.
0:i4 | 0:u | I:i4 | I:u <=> :BackVowel :Cons* _ ;
E:a <=> :BackVowel :Cons* _ ;
The problem with this rule is that the left context is completely
unspecified. This is not an error in rules that constrain the surface
realization of some lexical symbol, but this rule requires the
insertion of a surface element that corresponds to a zero on the
lexical side. The rule requires that there be a 0:? pair
before every vowel, even if the vowel is preceded by such a pair. This
creates the need for an infinite regress of inserted glottal stops
before every vowel. Consequently, the resulting transducer does not
allow any vowels at all:
"Glottal stop insertion" ! Insert '?' before a vowel.
! (This rule does not have the intended
! effect.)
0:%? <= _ Vowel ;
One equivalence class in the table has no transitions. This is a type
of error that the compiler anticipates:
"Glottal stop insertion"
b
1: 1
Equivalence classes:
((a e i o u) (b c d f g h j k l m n p q r s t v w x y z 0:?))
The correction that the user needs to make is to specify in the rule that one
insertion of a glottal stop is enough:
*** Error: Rule "Glottal stop insertion"
is defective. It disallows
a e i o u
Perhaps the rule requires an infinite progression of "0:?"
\0:%? denotes the union of all feasible pairs other than
0:%?.
0:%? <= \0:%? _ Vowel ;
The two rules appear to have different contexts, but in fact they overlap
in the case of lexical strings such as
"8b. Visarga after a (change preceding a)"
a:o <=> _ s: @ [ VoicedCons: | a: ] ;
"8c. Visarga after a (change following a)"
a:' <=> a: s: @ _ ;
According to the rule 8b, the middle a should be
realized as o, according to 8c it should be realized
as '. Because the rules are in conflict, lexical
forms containing such strings (if they exist in the language) do not
have any surface realization.
a s @ a s @ a
When the compiler expands the rule replacing Cx and
Cy with their matching values, it creates, among others, the
two subrules shown below:
"Gemination"
%+:Cx <=> Cy _ Vowel ;
where Cx in (b k d f g k l m n p r s t v z)
Cy in (b c d f g k l m n p r s t v z)
matched ;
The first subrule is for the inserted k in words like
frolicked, the second for words like
trekked. Because the morpheme boundary gets realized
as k in both cases, the rules are in conflict: one
allows the %+:k pair only in the context c _
Vowel, the second restricts it to k _
Vowel. The result is that frolic+ed and
trek+ed do not get realized at all unless the
right-arrow conflict is resolved.
%+:k <=> c _ Vowel ;
%+:k <=> k _ Vowel ;
It is also possible to leave the original rule as it is and let the
compiler take care of the problem. This is the topic of the next
section.
"K gemination"
%+:k <=> c | k _ Vowel ;
The message identifies the conflicting contexts, the pair that is
causing the problem, and the names of the rules involved.
"8b. Visarga after a (change preceding a)"
_ s: @ [VoicedCons: | a:] ;
a: s: @ _ ;
*** Warning: Unresolved <= conflict with respect to 'a:o' vs. 'a:''
between "8b. Visarga after a (change preceding a)"
and "8c. Visarga after a (change following a)"
The "Consonant gradation" rule expands to three subrules:
"Consonant gradation" ! Single k, p, and t are realized as
! zero, v, and d, respectively, in the
! beginning of a closed syllable.
Cx:Cy <=> h | Liquid | Vowel: _ Vowel ClosedOffset;
where Cx in (k p t)
Cy in (0 v d)
matched ;
"Gradation of k between u/y" ! k weakens to v between u's and y's
k:v <=> Cons Vx _ Vx ClosedOffset ;
where Vx in HighLabial ;
The k:v rule expands to
k:0 <=> h | Liquid | Vowel: _ Vowel ClosedOffset ;
p:v <=> h | Liquid | Vowel: _ Vowel ClosedOffset ;
t:d <=> h | Liquid | Vowel: _ Vowel ClosedOffset ;
The k:0 subrule of "Consonant
gradation" is in obvious conflict with "Gradation of k
between u/y" because lexical forms like pukun
satisfy the context specification of both rules. Should
k be deleted or realized as v in this
context?
k:v <=> Cons u _ u ClosedOffset ;
Cons y _ y ClosedOffset ;
If the more specific rule is a double-arrow rule, as is the case here,
the resolution can be done by modifying the correspondence part of the
general rule so that it also allows the realization required in the
exceptional case. The first subrule of "Consonant
gradation" is compiled as if it were
>>> Resolving a <= conflict with respect to 'k:0' vs. 'k:v'
between "Consonant gradation"
and "Gradation of k between u/y"
by giving precedence to "Gradation of k between u/y"
Weakening the general rule in this way when the specific rule has a
=> side does not lead to any problems. Because the
k:v rule does not allow k to be
realized as v outside its own domain, it does not
matter if the general rule allows k to be realized as
v or deleted in the wider context. If the specific
rule is only a left-arrow rule, the conflict is resolved by
subtracting its contexts from the contexts of the general rule.
k:0 | k:v <=> h | Liquid | Vowel: _ Vowel ClosedOffset ;
By their right-arrow side, both of these rules constrain the
realization of the morpheme boundary as k, but the
contexts in which they allow it are completely disjoint: if
%+ is preceded by c it is not preceded
by k, and vice versa. When the compiler sees a
conflict of this type, it takes the view that the rule writer does not
intend to completely prohibit the %+:k realization but
wants to allow it in both environments. In compiling the right-arrow
sides of the two subrules, the compiler amends the rules so that they
allow the %+:k pair in both contexts and reports:
%+:k <=> c _ Vowel ;
%+:k <=> k _ Vowel ;
Besides the left-arrow conflict discussed above, our "Consonant
gradation" rule is also involved in a right-arrow conflict
with the "Geminate gradation" rule we presented in
example (18), section 3.1:
>>> Resolving a => conflict with respect to '+:k'
within "Gemination"
One of the expansions of the Finnish geminate rule is
"Geminate gradation" ! Geminate stops are shortened to
! single stops in a closed syllable.
Cx:0 <=> Cx _ Vowel ClosedOffset ;
where Cx in VclessStop ;
The => side of this rule clashes with the first
subrule of the gradation rule for single stops which also constrains
the deletion of k. But because the contexts are
completely disjoint, the compiler eliminates the problem:
k:0 <=> k _ Vowel ClosedOffset ;
In dealing with rule conflicts the compiler relies on two principles:
>>> Resolving a => conflict with respect to 'k:0'
between "Consonant gradation"
and "Geminate gradation"
(1) Left-arrow conflicts are resolved if the context of one rule is
completely subsumed by the context of the other. The more specific
rule has precedence. Other left-arrow conflicts are reported.
Checking for conflicts and resolving them increases the compilation
time, up to 30% in large rule systems, but it can save the rule writer
from errors that take much longer to track down. It allows the option
of writing rules that express broad generalizations even though there
are specific exceptions. See Appendix 2 for more examples of conflict
resolution.
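Principle (1) can be caricatured in Python. Contexts are approximated here as finite sets of invented labels; the compiler of course decides subsumption over regular languages:

```python
# Principle (1) as a toy: if the specific rule's context set is wholly
# contained in the general rule's, the specific realization wins in its
# own contexts.  All context labels below are invented stand-ins for
# regular languages.

def resolve_left_arrow(general, specific):
    (gen_surf, gen_ctx), (spec_surf, spec_ctx) = general, specific
    if spec_ctx <= gen_ctx:             # specific subsumed by general
        return [(c, spec_surf if c in spec_ctx else gen_surf)
                for c in sorted(gen_ctx)]
    return None                         # unresolved: report to the user

# k:0 in a broad gradation context vs. k:v between u's
general = ("0", {"Va_aC", "Vu_uC", "h_VC"})
specific = ("v", {"Vu_uC"})
print(resolve_left_arrow(general, specific))
```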
For grammars that refer to word-boundary characters,
'#' is automatically added to each end of the lexical
string. 'lex-test' finds and reports various errors,
including undeclared lexical symbols and the absence of compiled
transducers to test. The 'lex-test' facility is
especially valuable for detecting overgeneration, when the rules
generate illegal surface strings.
Command (C-Q = quit): read-grammar rules.txt
reading from 'rules.txt' ...
Alphabet... Rules...
"N realized as m" "p realized as m"
Command (C-Q = quit): compile
--- output omitted ---
Done.
Command (C-Q = quit): lex-test
Lexical string (q = quit): kaNpan
kamman
k
a
N:m
p:m
a
n
Because 0's are suppressed in the lexical input and
the surface output for convenience and legibility, only the pairwise
listing shows the real mapping.
Lexical string ('q' = quit): buy`+{Vpast}
bought
#:0
b
u:o
y:u
0:g
`:0
+:h
{Vpast}:t
#:0
kaNpat
kaNmat
kampat
kaNkat
The lexical and surface strings must contain an equal number of symbols,
including 0 (the two-level epsilon) where appropriate.
To facilitate lining up the lexical and surface symbols, the lexical and
surface strings can optionally be filled out with spaces, which are
ignored. For grammars that refer to word-boundary symbols, the pair
#:0 is automatically added to both ends of the paired
strings.
Command (C-Q = quit): pair-test
Lexical string (q = quit): kaNpat
Surface string (q = quit): kammat
k
a
N:m
p:m
a
t
ACCEPTED
Lexical string (q = quit): kaNpat
Surface string (q = quit): kanpat
k
a
N:n
p
REJECTED: "N realized as m" fails in state 2.
b u y 0 ` + {Vpast}
b o u g 0 h t
b r i n g ` + {Vpast}
b r o u g 0 h t
f i 0 g h t ` + {Vpast}
f o u g h t 0 0 0
These are the default settings. The user can change the setting of a
switch by typing its name. `resolve' turns conflict
resolution off if it is on, and vice versa.
Command (C-Q = quit): switches
Conflict resolution is ON.
Timing is OFF.
Verbose mode is OFF.
Command (C-Q = quit):
when the left-arrow compilation starts, then > at
the start of the right-arrow compilation, followed by 2
and 1 as the two contexts are processed. When the
intersection begins, the line reads:
k:v <=
In the verbose mode, `compile' shows how rules that
contain variables are expanded with subrules and subcontexts. The list
of feasible pairs and implicit definitions for underspecified symbols
such as :Vowel are also shown. Before a rule is
compiled, the compiler shows the partitioning of the alphabet into
equivalence classes it has inferred by inspecting individual rule
components. It may turn out during the compilation that some of the
classes can be merged because their representative symbols turn out to
be equivalent after all. In that case, the adjusted classes are
displayed at the end. If the rule is a double-arrow rule, the
transducer tables for the <= and
=> compilation are shown separately before
intersection.
k:v <=> 2 1
The largest item on the list (Networks) is the space
occupied by transducers at the end of the compilation. When a new
grammar is read in, the previous data is expunged.
Command (C-Q = quit): storage
Memory in Use
Strings: 1979 bytes
RE cells: 3288 bytes
Rules: 388 bytes
Subrules: 476 bytes
Contexts: 340 bytes
Where clauses: 64 bytes
Alphabets: 2182 bytes
Networks: 21426 bytes
Total: 30143 bytes