The Document Company -  Xerox
Copyright © 1996 Xerox Corporation. All rights reserved.

Application of Finite-State Networks

This page explains how a finite-state network is applied to an input string to produce output strings. As discussed in the section on Finite-State Networks, we use the term network to cover both simple automata and transducers. In this discussion we focus on transducers. Because any simple automaton can be thought of as representing the identity relation on the language it encodes, the notion of application defined for transducers naturally subsumes simple automata as well.

Introduction

A transducer encodes a regular relation, a set of pairs of strings. We call the first member of the pair the upper, the second one the lower string. Every pair in the relation is represented in the transducer by a path, a sequence of arcs from the start state to a final state. Consider, for example, the network in Figure 1.

Figure 1: The relation [a b -> x]

The [a b -> x] relation consists of pairs of strings that are identical except that any instance of "ab" in the upper string corresponds to an "x" in the the lower string. Because the network contains loops, there is an infinite number of paths and, consequently, an infinite number of string pairs in the relation. For example, the pair consisting of "caba" and "cxa" is encoded by the path 0-?-0-a:x-2-b:0-0-a-1. The correspondence between the path and the two strings involved is illustrated in Figure 2.

Upper string:  c   a     b     a
Path:0-?-0-a:x-2-b:0-0-a-1
Lower string:  c     x         a
Figure 2: The path for the pair consisting of "caba" and "cxa"

The initial "c" in the upper string is matched by the unknown symbol, ?, because "c" is not in the sigma alphabet of the network. In a transducer, ? is treated as an identity pair: any symbol that is not in the sigma, paired with itself. Therefore, the ? also matches the "c" in the lower string. The first "a" in the upper string matches the upper symbol of the a:x pair. The lower symbol of the same pair matches "x" in the lower string. The "b" in the upper string matches upper symbol of the b:0 pair. This pair does not contribute anything to the lower string because the 0 is the epsilon symbol (the empty string). Finally, the "a" in the upper string matches the a transition, which we take here as an identity pair, a:a, thus getting a match for "a" on the lower side as well.

Application

The path in Figure 2 can be located by starting with either one of the two strings in question. The other member of the pair can be derived from the completed path. A transducer can thus be used to convert an input string to an output string. This is called applying the transducer. In a downward application the upper string serves as input and the output is generated from the lower side of the matching path or paths. An upward application maps a lower string to upper strings. We also use the term lookup for upward application and lookdown for downward application.

Application can be thought of as a procedure that incrementally constructs a path that

  1. consumes the entire input string; and
  2. leads from the start state to a final state in the network.
Let us consider the downward application with "caba" as the input string.

We start in state 0 of the network in Figure 1 looking for an arc that has as its label either c or some symbol pair with c as the upper symbol. Because c is not in the sigma (symbol alphabet) of the network, there are no such arcs. The only matching arc is the looping ? arc for unknown symbols. This brings us back to state 0, now looking for a match for the next input character, the first "a".

Now there are two matching arcs, the a arc to state 1 and the a:x to state 2. Both possibilities have to be explored. The first alternative, a to state 1, quickly leads to an impasse because there are no "b" arcs leaving state 1. Note that ? does not match "b" or any other symbol that belongs to the sigma. On the other hand, the a:x arc leading to state 2 allows the path to continue.

In state 2, we seek a match for "b" and find the b:0 arc leading back to state 0. At that point, only the last "a" in the input string needs to be consumed. Again, there are two alternatives, the a arc to state 1 and the a:x arc to state 2. Because the entire input is consumed, we do not need to look for any continuation; condition 1 is fulfilled either way. However, condition 2 requires that the path terminate in a final state. Since state 1 is final but state 2 is not, only the a arc to 1 results in a successful match for the entire input string.

Once a complete path has been built, we can generate the output string from the opposite side of the path, keeping in mind that ? labels count as identity pairs for the unknown symbol in question.

Zero, One, or Multiple Results

The application of a transducer may produce zero, one, or more results depending on the number of paths that match the input on the appropriate side. In the case of the transducer in Figure 1, downward application (lookdown) on the string "caba" yields only a single result, "cxa", because the path in Figure 1 is the only one in the transducer with "caba" on the upper side. On the other hand, upward application (lookup) on "cxa" produces two results "caba" and "cxa" because there is also another path, shown in Figure 3, with "cxa" on the lower side.

Upper string:  c   x   a
Path:0-?-0-x-0-a-1
Lower string:  c   x   a
Figure 3: The path for the pair consisting of "cxa" and "cxa"

The reason for the ambiguity is simple: any instance of "x" in the lower language of the [a b -> x] relation can be viewed either as the image of an "ab" in the corresponding upper language string or as an "x" that is mapped to "x".

If there are no paths that contain the input string on the appropriate side, the application produces a null result. For example, the transducer in Figure 1 does not allow "ab" to occur on the lower side. Lookup on strings like "ab", "caab", etc. fails at the "b" because the path cannot be completed. Note that state 1 in Figure 1 has no outgoing transition with a "b" on either side of the arc label.

Application as Composition

Application of a transducer to a string is a special case of a more general operation in the finite-state calculus, namely composition. For example, the downward application of the [a b -> x] transducer to "caba" discussed above is in effect the operation illustrated in Figure 4.

[[c a b a]   
.o.
[a b -> x]].2
Figure 3: Downward application of [a b -> x] to "caba"

The expression inside the [ ].2 part denotes the relation obtained by composing the language of [c a b a] (more precisely, the corresponding identity relation) with the relation denoted by [a b -> x]. The result of this composition consists of the pair "caba" and "cxa". The effect of the outermost .2 operator is to extract the lower side of that relation, yielding a set that contains just the string "cxa". Figure 4 illustrates the upward application of the same transducer to "cxa".

[[a b -> x]
.o.
  [c x a]].1
Figure 4: Upward application of [a b -> x] to "cxa"

Note that the order of the arguments to composition in Figure 4 is the opposite of Figure 3. Here the replacement transducer goes on top. The second difference is that the outermost operator, .1, extracts the upper language of the resulting relation. In this case, it is the set consisting of the strings "caba" and "cxa", as we discussed above.

It is useful to think of application in these general terms because it makes it easy to see that the input to application need not be a language representing a single string, it can be any language no matter how complex. For example, the downward application of [a b -> x] to the language denoted by [c [a b]* a] applies to an infinite number of strings producing the language [c x* a] in one step.

If is important to know not just the resulting language but the mapping of each output string to the corresponding input strings or strings, we may stop at the composition, omitting the last extraction operation (.1 or .2). This idea is exploited, for example, in the construction of lexical transducers where the input is a lexicon consisting of citation forms and morphological information and the transducer describes the regular morphological alternations of the language (Karttunen 1994). The result of applying the rules to the lexicon is a relation that associates each inflected form with its lemma.


For more detailed information please look at our pointers to finite-state literature. We welcome your comments and suggestions.
Copyright © The Document Company - Xerox 1997 . All rights reserved.
Written by Lauri Karttunen. Last modified on Wednesday, 22 October 1997.