The Document Company - Xerox 
Copyright © 1997 Xerox Corporation. All rights reserved. 

Xerox Finite-State Tool

Lauri Karttunen, Tamás Gaál, and André Kempe

Version 6.3.0
Abstract: Xerox finite-state tool is a general-purpose utility for computing with finite-state networks. It enables the user to create simple automata and transducers from text and binary files, regular expressions and other networks by a variety of operations. The user can display, examine and modify the structure and the content of the networks. The result can be saved as text or binary files.

Introduction

XFST is a general-purpose Unix application for computing with finite-state networks, based on long-term Xerox research initiated by Ronald M. Kaplan and Martin Kay at the Xerox Palo Alto Research Center in the early 1980s. It is a successor to two earlier interfaces of the same type: IFSM and FSC. The IFSM interface was created at PARC by Lauri Karttunen and Todd Yampol around 1990-92. The FSC tool was developed at XRCE in 1994-95 by Pasi Tapanainen. XFST combines the best features of its two predecessors. It includes many extensions to the regular expresson calculus introduced at XRCE in the last few years.

XFST can read finite-state networks from binary files and compile them from regular expressions and text files. The networks can be simple networks or finite-state transducers. They can be combined by means of a variety of operations, such as union and composition. The resulting networks can be saved into a binary or a text file. The user may apply a network to strings to determine whether the string is accepted by the network or to transform it to another string if the network is a transducer. XFST provides many ways to get information about a network and to inspect and modify its structure.

We introduce XFST with a brief tour that shows how the application is launched in Unix and some of the things that it can do. In the later sections we discuss the command set and other related issues in more depth. For help with regular expressions and other issues concerning the structure and application of finite-state networks, please consult our documentation on these special topics:

Running XFST

XFST is launched with the command 'xfst'. The launch command can be modified by a number of optional flags. For example, 'xfst -f scriptfile' executes the commands in scriptfile and exits upon completion. If no additional modifiers are given, the 'xfst' command starts the application in an interactive loop waiting for commands from the user:
> xfst

Copyright © Xerox Corporation 1997
Xerox Finite-State Tool, version 5.6.6
Type "help" to list all commands available or "help help" for further help.
xfst[0]:

The xfst[0]: prompt indicates that the xfst application is waiting for a command. The number 0 indicates that the network stack is empty.

The great majority of XFST commands are about adding networks to the stack, replacing some or all of the stack by the result of some operation, and saving the stack into a file. Another set of commands is concerned with the network currently on the top of the stack, that is, the network that was most recently added to the stack. We give a few simple examples of both types of commands.

We present two examples. In the first example, the user creates a network with the help of XFST and saves the result into a file. In the second example, we retrieve the network from the file to use in another operation. We then show how a sequence of commands can be run by a script and aliased to a single command.

Making and saving a network

A network can be added to the stack by loading a previously compiled network from a binary file or by compiling a new one from some text source. In either case, the network becomes the topmost one on the stack. In this example, we compile a network from a regular expression using the command 'read regex'. We type
xfst[0]: read regex [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
This expression denotes the language that consists of the ten decimal digits. Because 0 is a special symbol (epsilon) in a regular expression, it is necessary to prefix it here with , the escape character, to have it interpreted as a digit. The semicolon at the end of the line closes the regular expression. When the command is terminated with a carriage return, XFST responds:
2 states, 10 arcs, 10 words.

xfst[1]:
showing that the network representing this ten-word language consists of 2 states and 10 arcs. The new prompt, xfst[1]: shows that we now have one network on the stack. The command 'print net' displays the structure of the network on the screen:
xfst[1]: print net
Sigma:
0 1 2 3 4 5 6 7 8 9
Size: 10
Net: ["0" | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
Flags: deterministic, pruned, minimized, epsilon_free
Arity: 1
s0:   0 -> fs1, 1 -> fs1, 2 -> fs1, 3 -> fs1, 4 -> fs1, 5 -> fs1,
      6 -> fs1, 7 -> fs1, 8 -> fs1, 9 -> fs1.
fs1:  (no arcs)
xfst[1]:
Here the 'print net' command displays the states of the network: s0 (a non-final state), fs1 (a final state); and the labeled arcs leading from s0 to fs1. In addition, we see the symbol alphabet of the network (Sigma), the regular expression it was compiled from, and some characteristics of the network (Flags, Arity).

It is often convenient to give a network a name that can be used in a regular expression to refer to it. The command for that assignment is 'define':

xfst[1]: define Digit
xfst[0]:
The 'define' command requires at least one argument: the symbol that is being defined, here 'Digit'. If no further specification is given, the network on the top of the stack becomes the value of the defined symbol and is removed from the stack.

Another way to achieve exactly the same result is to include in the 'define' command a regular expression that denotes the desired language or relation. In this case, the single command

xfst[0]: define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
2 states, 10 arcs, 10 words.
xfst[0]:
would have exactly the same effect as the sequence of commands shown above. Here the stack remains empty. Note the closing semicolon that marks the end of the regular expression.

Once defined, the name 'Digit' can be used in regular expressions to represent the language in question. To give a simple example, let us construct a transducer that converts US numerals to the European format. In US numerals the comma is used as a separator, the period marks the beginning of the decimal part. In Europe the convention is the opposite. Thus "1,000.00" in the US corresponds to "1.000,00" in Europe. A transducer that does this conversion can be defined as follows, using the defined 'Digit' symbol:

xfst[0]: read regex  %. -> %, , %, -> %. || Digit _ Digit ;
4 states, 41 arcs, Circular.
xfst[1]:
This transducer represents the parallel replacement of "." by "," and "," by "." between two digits. Because period and comma are also special symbols, they must be prefixed with the percent sign when used as ordinary characters. Note the plain comma that separates the two replacement expressions.

To verify that the transducer does what it is supposed to do we can use the 'apply' command. Because transducers are bidirectional, we must specify the direction of application. In this case, it is 'down'; that is, the US representation is on the "upper" side of the transducer:

xfst[1]: apply down 1,234.99
1.234,99
xfst[1]:
The 'apply' command may also be used to take the input strings from a file instead of typing them directly. The file is processed line by line:
fst[1]: apply down < US-num.txt
Opening file US-num.txt...
apply down> 1,234.99
1.234,99
apply down> 0.5
0,5
apply down> 100,000,000
100.000.000
Closing file US-num.txt...
In order to have the transducer available in the future, we can save it to a file. The command 'save' writes all the networks currently on the stack into a single file. In this case, the file will contain just one network:
xfst[1]: save stack US-to-EU-num.fst
xfst[1]:
The command 'quit' exits from XFST and returns back to Unix:
xfst[1]: quit
>
However, instead of quitting, we move on to the next topic.

Loading and using a stored network

In this example, we load the network we just created from a file to the stack, add another network on the top of the first one, and then perform an operation to replace both of them with the result of that operation.

Let us first empty the stack with 'clear stack' and then load the network back from the file.

xfst[1]: clear stack
xfst[0]: load US-to-EU-num.fst
xfst[1]:
We will now create another network by compiling a simple network from the same little text file we already used above
xfst[1]: read text < US-num.txt
Opening file US-num.txt...

Closing file US-num.txt...
20 states, 21 arcs, 3 words.
xfst[2]:
The 'read text' command expects as its argument a name of a file containing a list of words, one entry per line. It compiles the word list into a network. The command 'print words' displays the content of the compiled word list.
xfst[1]: print words
1,234.99
100,000,000
0.5
xfst[2]:
As the prompt 'xfst[2]:' indicates, we now have two networks on the stack. We can get more information about the stack with the 'print stack' command:
fst[2]: print stack
0: 20 states, 21 arcs, 3 words.
1: 4 states, 41 arcs, Circular.
fst[2]:
The positions in the stack are numbered from 0 upwards. The net in position 0, the three-word network, is on top of the stack. Unary commands such as 'print net' and 'print words' apply to the top network on the stack, if not followed by some defined name. The net below it, in position 1, is the transducer converting US-style numbers to European-style numbers.

Instead of applying the transducer to strings one at the time, as we did before with the 'apply down' command, we can now transduce the entire list of words at once with the 'compose net' command:

fst[2]: compose net
20 states, 21 arcs, 3 words.
fst[1]:
The compose operation replaces the two networks on the stack by the result of the operation. Thus we now have just one network left. It is instructive to view its contents using the same 'print words' command as before.
fst[1]: print words
0<.:,>5
100<,:.>000<,:.>000
1<,:.>234<.:,>99
fst[1]:
The result of the composition is a transducer. It denotes a relation, a mapping from one regular language into another one. On its "upper side", the transducer has the three original US-style numbers, each mapped to a corresponding European-style on the "lower side" of the transducer. For the most part, the mapping is an identity relation because each digit is mapped to itself. The only difference is that periods are mapped to commas, and vice versa. The angle brackets indicate pairs of non-identical symbols: <.:,> indicates the correspondence of an upper-side period with a lower-side comma; similarly for <,:.>.

We can view the upper and lower languages of the relation independently. print upper-words displays the three US numbers; print lower-words shows what they have been transduced into:

fst[1]: print lower-words
0,5
100.000.000
1.234,99
fst[1]:
As before, can use the 'apply' commands to map strings on one side of the transducer to the corresponding strings on the other side. For example, 'apply up' maps any one of the three lower-side European numbers to the corresponding upper-side string.
fst[1]: apply up 0,5
0.5
We can also extract one of the other language from the relation. The command 'lower-side net' extracts from the transducer a simple automaton that contains just the three European numbers.
fst[1]: lower-side net
20 states, 21 arcs, 3 words.
fst[1]: print words
1.234,99
100.000.000
0,5
fst[1]:quit

Running XFST with a script

The examples we have given so far are about using XFST in an interactive mode, with the user typing commands on the prompt line. It is more convenient, for many purposes, to write a list of commands to be run in batch mode without any user interaction. To illustrate this possibility, we write a script that compiles the US-to-European transducer. and uses it to produce a file of European-style numbers from a file of US-style numbers.

A script is an ordinary text file that can be prepared with any text editor, such as Emacs. Here we create the script with the help of the simple Unix 'cat' command. The command 'cat > xfst.script' writes the rest of the text up to the closing ^D (control-D) into the file xfst.script.

cat > xfst.script

# A script for producing a file of European numbers
# from a file of US numbers.

echo >> Defining Digit
define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;

echo >> Defining number converter
read regex  %. -> %, , %, -> %. || Digit _ Digit ;

echo >> reading US numbers
read text < US-num.txt

echo >> Composing and extracting EU numbers
compose net
lower-side net

echo >> Creating EU-num.txt
write text EU-num.txt
system ls -l EU-num.txt

echo >> Done.
quit
^D
The lines beginning with # are comments that are ignored by xfst. The 'echo' command prints the rest of the line to the standard output when the script is executed. In complex scripts it is often useful to print progress reports during the execution.

Because the purpose of this script is to produce an output file, we verify at the end that the file has indeed been created. The 'system' command sends the rest of the line to the Unix operating system. If the file is in the directory, the script was run successfully

To execute the script, we invoke xfst with the -f flag:

> xfst -f xfst.script
 >> Defining Digit
2 states, 10 arcs, 10 words.
 >> Defining number converter
4 states, 41 arcs, Circular.
 >> reading US numbers
Opening file US-num.txt...
 
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
 >> Composing and extracting EU numbers
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.
 >> Creating EU-num.txt
-rw-r--r--   1 lk       xerox         25 May 30 13:09 EU-num.txt
 >> Done.
bye.
The -f flag causes xfst to exit after the script has been run. If the user wishes to take over at that point and continue running xfst in an interactive mode, he can call the script with -l flag instead of -f.

Another non-interactive way of running xfst is to include a sequence of xfst commands directly on the Unix command line. Each xfst command must be enclosed in double quotes and preceded by an -e flag. If the final -stop flag is omitted, xfst executes the commands and continues running in an interactive mode.

> xfst -e "define Digit [%0|1|2|3|4|5|6|7|8|9] ;" \
-e "read regex  %. -> %, , %, -> %. || Digit _ Digit;" \
-e "read text < US-num.txt" \
-e "compose net" \
-e "lower-side net" \
-e "write text EU-num.txt" -stop
Note the backslash at the end of each line. It signals to to the Unix operating system that the entire sequence of commands counts as a single command line.

Defining aliases

XFST allows the user to create simple names for more complex commands. For example,
alias dir system ls -l *.txt
creates a new XFST command 'dir' that has the same effect as 'system ls -l *.txt', that is, it sends to the Unix operating system the command 'ls -l *.txt'. The chosen alias must be a single word with no hyphens, underscores, or other special characters. The command 'print alias' lists all the current aliases and their definitions.

An alias can represent an arbitrary sequence of commands. To create such an alias, the user first types only the name to be defined.

xfst[0]: alias ConvertAndShow
XFST responds by prompting the user for commands. The list can be terminated by ^D (control-D) or by a special symbol, END;, with no extra whitespace around it:
alias> define Digit [%0|1|2|3|4|5|6|7|8|9] ;
alias> define Conv  %. -> %, , %, -> %. || Digit _ Digit ;
alias> read regex [@txt "US-num.txt" .o. Conv].l ;
alias> echo
alias> echo >> Here is the result:
alias> print words
alias> END;
With slightly different means, the above definition of the alias 'ConvertAndShow' produces the same end result as our previous examples: a network of European-style numbers obtained by transduction from a US-style source file. We define the conversion relation, Conv, by means of the earlier definition of Digit. Instead of the command 'read text' to compile the word list in US-num.txt, we use the regular expression operator @txt to bring about the same effect. Similarly, in place of the XFST commands 'compose net' and 'lower-side net', we employ the corresponding operators in the regular expression calculus: [@txt "US-num.txt" .o. Conv].l denotes the lower side language (.l) of the composition (.o.) of the US numbers with the conversion relation.

Now XFST interprets ConvertAndShow as a command to execute the defined set of commands:

xfst[0]: ConvertAndShow
2 states, 10 arcs, 10 words.
4 states, 41 arcs, Circular.
Opening file US-num.txt...
 
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.

 >> Here is the result:
1,234.99
100,000,000
0.5
xfst[1]:
A convenient way to extend the xfst command language is to place such alias definitions in a script file that is executed with with the -l flag as the application is launched. 

Overview of Commands

This section contains a general overview of xfst commands. A complete alphabetical list, with explanations, is found in the next section. 

Command Syntax

XFST commands are in general of the form '<command> <type or object>', where <command> specifies the operation to be performed and the second term, if any, gives some additional specification about the type of the operation or the object it applies to. For example, there are several variants of the 'print' command: 'print net', 'print sigma', 'print words', etc.

All display commands and all unary operations, such as 'lower-side net', apply to the network on the top of the stack. For example, 'print words' enumerates the paths of the topmost network only, although there may be others on the stack. Some commands, such as 'print net' and 'print words', can be followed by a name of network which has been given a name with the 'define' command, as shown below:

xfst[0]: read text
text> two
text> words
text> ^D
 
8 states, 8 arcs, 2 words.
fst[1]: define W2  
fst[0]: print words W2
two
words
xfst[0]:
Virtually all XFST commands can be abbreviated to a single word command. For example, the 'print' part of all print commands can be dropped. Thus 'sigma' as a command has the same effect as 'print sigma'. Similarly, 'regex' and 'read regex' are equivalent.

Short command names are convenient when one is working in an interactive mode. On the other hand, for scripts we recommend using the long commands because it tends to make the scripts more perspicuous.

Command Classes

The FST commands can be grouped into five classes. We describe first in general terms the five classes and list the commands in each class. A more detailed description of each command is given in an alphabetical listing in the next section.

Input/Output and Stack Commands

move networks onto and off the stack, save the stack or the top network into a file in various formats. We use the commands 'load' and 'save' for binary files, 'read' and 'write' for text files. 'read regex' and 'define' compile a network from a regular expression; 'read regex' pushes the result on the stack, 'define' assigns it to the chosen name.

The 'read' commands take the input from the console (stdin) unless the user specifies another source by means of < symbol and a file name. For example, 'read text < /usr/dict/words' compiles the list in /usr/dict/words.

Similarly, the 'write' commands send the output to the console (stdout) unless the command is followed by > and a name of a target file. It is also possible to send the output directly to a printer. For example, 'write text | lp' uses the | convention in Unix to pass the output to the 'lp' print command.

Binary
input/output 
load stack, save stack, load defined, save defined
Text
input/output 
read prolog, read spaced-text, read text, write prolog, write spaced-text, write text.
Regular
expressions 
define, read regex, undefine
Stack
commands 
clear stack, pop stack, push defined, turn stack, rotate stack

Display commands

give information about the content and properties of the network on the top of the stack, the language or relation it represents, the state of the system, and the available commands.

Like the 'write' commands in the previous section, the 'print' commands display the output on the console, unless the user directs it to a file or a printer. The 'print' commands do not have a corresponding input command.

Stack and network
information 
inspect net, print eqv-labels, print flags, print print labels, label-tally, print longest-string, print name, print net, print sigma, print sigma-tally, print sigma-word-tally, print size, print stack
System information 
print aliases, print file-info, print defined, print directory, print storage
Network
content 
print random-lower, print random-upper, print words, print lower-words, print upper-words, print nth-upper, print nth-lower, print num-upper, print num-lower
Apply network
to a string 
apply up, apply down
Help 
apropos, help
Other 
echo

Tests of network properties

Unary tests check whether the network on the stack has certain properties. Binary tests compare the languages of the two top network on the stack.

Unary 
test lower-bounded, test upper-bounded, test lower-universal, test upper-universal
Binary 
test equivalent, test overlap, test sublanguage

Operations on networks

produce a network that represents a new language or relation by modifying the top network, two top networks, or all the networks on the stack. The affected networks are removed from the stack and replaced by the result of the operation.

Property list commands modify or display the attribute-value pairs on the top network's property list. Maintenance commands perform various housekeeping operations of the top level network structure without changing the language or the relation it represents.

Unary
operations 
invert net, label net, lower-side net, multi char sigma net, negate net, one-plus net, paste net labels, reverse net, sigma net, single char sigma net, substitute defined, substitute label, substitute symbol, substring net, upper-side net, zero-plus net
Binary
operations 
crossproduct net, minus net
N-ary
operations 
compose net, concatenate net, intersect net, shuffle net, union net
Property
list 
add properties, edit properties, name net, read properties, write properties 
Maintenance 
complete net, determinize net, eliminate flag, epsilon-remove net, minimize net, prune net, sort net, compact sigma, optimize net, unoptimize net

System commands

change the internal configuration of XFST or interact with the Unix operating system.

Variables 
show, set
Aliases 
alias
System call 
system
Execute script 
source
Other 
quit
 



Alphabetical Index

This index consists of three parts. The first index lists all the currently implemented XFST commands. The second list describes the variables that the user can manipulate with the XFST 'set' command. The final part explains the correspondence between certain XFST commands and certain operators in the regular expression calculus. The XFST commands involved are the 'load' and 'read' commands and all commands that construct a new network by means of an operation on one or more networks on the stack.

List of Commands

This list gives a brief description of all the currently implemented XFST commands. This information is also available on-line in XFST with the 'help' command. For example, 'help add properties' prints out the description shown below.
add properties
add properties < source-file
reads attribute-value pairs from a text file and adds them to the property list of the top network on the stack. Each line in the source file must contain one attribute-value pair. For example,
COPYRIGHT: "Xerox Corporation"
DATE: "June 10, 1997"
VERSION: 9
The attributes, each terminated by a colon, must be strings (upper case recommended). The values may be strings or integers. String values must be enclosed in double quotes: "9" and 9 are considered as different values. If an attribute to be added already exists on the property list of the network, 'add properties' will reset it to a new value. See also: 'read properties', 'write-properties'.
alias
alias name
alias name command-sequence
'alias ' defines as a shorthand for the given sequence of XFST commands. For instance, the new command 'ls' can be created in the following way:
alias ls print directory
Alias may contain several commands. A new command 'lexr', for reading a lexicon, can be created in the following way:
fst[0]: alias lexr
alias> load lexicon.fst
alias> print properties
alias> END;
A multi-command alias may also be defined on one line as follows:
alias lexr load lexicon.fst; print properties
See also: 'print aliases'.
ambiguity net
converts the lexical transducer on the top of the stack into a transducer that maps each word on the lower-side of the network to itself and a sequence of alternative tags that the word has in the original transducer. Limitations: (1) the last symbol on each path is interpreted as a tag, (2) only one tag per word is allowed, (3) the network must be acyclic.
apply up
apply up line
apply up < source-file
Simulates the composition of the input string with the lower side of the top network on the stack and extracts the result from the upper side. If the command `apply up' is followed by a carriage return,the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode. See also the variable 'print-pairs'.
apply down
apply down line
apply down < source-file
Simulates the composition of the input string with the upper side of the top network on the stack and extracts the result from the lower side. If the command `apply down' is followed by a carriage return, the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode. See also the variable 'print-pairs'.
apropos text
'apropos text' prints brief information about the commands relating to the given text. If you need further information about a command, use the command 'help command-name'.
cleanup net
merges X:0 0:Y and 0:Y X:0 sequences into X:Y, if possible.
clear stack
Removes all networks on the stack.
compact sigma
all symbols that have the same distribution as the unknown symbol are eliminated from the alphabet and the corresponding redundant arcs are removed from the network.
complete net
extends the top network until it has a transition for every symbol in sigma in every state. All new transitions lead to a failure state, thus the language of the network is preserved.Complete is defined only for simple networks (arity = 1), not for transducers.
compose net
replaces the stack with the composition of all networks currently on the stack. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .o. B].
concatenate net
replaces the stack with the concatenation of all networks currently on the stack. If the consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A B]. In regular expressions, concatenation is indicated simply by whitespace.
crossproduct net
returns the crossproduct of the first two networks on the stack. They must not be transducers. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .x. B].
define symbol
define symbol regular-expression
binds the given symbol to a network compiled from the regular expression. Every subsequent occurrence of the symbol in a regular expression is interpreted as the name of the corresponding language or relation. To unbind the symbol use 'undefine '. If the regular expression is omitted, i.e. if the command is 'define ', the symbol is bound to the top net on the stack. The net is removed from the stack. See also: 'load defined', 'save defined', 'print defined', 'read regex', 'undefine', and 'recursive-define'.
determinize net
replaces the top network with an equivalent deterministic network. A deterministic net has at most one transition per label from each state.
echo text
prints the text, followed by a newline.
edit properties
'edit properties' allows the user to edit the property list of the top network on the stack. Properties may be added, deleted, and modified. Values must be integers or strings.
eliminate flag
replaces the top network on the stack with an equivalent network that encodes directly the constraints expressed by the given flag attribute.
epsilon-remove net
replaces the top network with an equivalent one that has no epsilon transitions.
help
help command-name
'help' lists the commands and the command classes. To get more information about a command, use the command 'help command'. It gives a longer explanations about the command and its usage. All possible arguments are listed and explained. For instance, the command 'help' (see above) may be used without arguments or it may have one argument that is a name of a command. The command name can be a name of the class of commands, for instance, 'help stack' prints all stack commands. If the given command name is not recognised, command help works like command 'apropos'.
inspect net
invokes a mode for interactively inspecting the top network on the stack.
intersect net
replaces the stack with the intersection of all networks currently on the stack. If the stack consists of two networks, defined as A and B, regardless of the order, the result is equivalent to compiling the regular expression [A & B].
invert net
exchanges the two sides of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.i]. 'invert net' has no effect on simple networks.
label net
replaces the top network on the stack with a network that represents the language or relation that includes all the strings and string pairs in the label alphabet of the original network. See also 'print labels', 'sigma net'.
load defined source-file
restores the definitions from a binary source file. The networks are assigned to the chosen names directly, without pushing them onto the stack. The file must be in the binary format, as produced by the 'save defined' command.
load stack < source-file
reads the networks in the given file, and pushes them onto the the stack. The file must be in the binary format, as produced by the 'save stack' command. If the file only contains a single network, it can also be loaded with the command 'read regex [@bin ];'.
lower-side net
extracts the lower language of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.l]. 'lower-side net' has no effect on simple networks.
minimize net
replaces the top network on the stack with an equivalent network that has a minimal number of states.
minus net
replaces the top two networks on the stack with a network containing all strings in the first network that are not in the second. If the two networks are defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A - B].
multi char sigma net
replaces the top network on the stack with a network that represents the language of the words of the original network. The words are composed into multi-character symbols. See also 'single char sigma net', 'print sigma', 'label net' 'sigma net'.
name net symbol
assigns the given name to the top network on the stack.
negate net
replaces the top network on the stack with a network that accepts all and only those strings rejected by the original. If the network is defined as A, the result is equivalent to compiling the regular expression [~A].
one-plus net
concatenates the top network any number of times with itself. If the network is defined as A, the result is equivalent to compiling the regular expression [A+].
optimize net
runs a heuristic algorithm that tries to reduce the number of arcs, possibly at the cost of introducing more states.
paste net labels
replaces the top network on the stack with a network that represents the same language as the original network. Along every path, symbols (labels) are pasted together into as few as possible multi-character symbols. Loops and branching are handled correctly. See also 'multi char sigma net' 'single char sigma net', 'print sigma', 'label net', 'sigma net'.
pop stack
removes the top network on the stack.
push defined
makes a copy of the network named by a previous 'define' command and pushes it onto the stack. 'push defined SYMBOL' has the same effect as 'read regex SYMBOL ;'.
print aliases
print aliases > target-file
prints each alias and the command sequence it encodes.
print defined
print defined > target-file
prints each defined symbol and the size of the network it stands for.
print directory
print directory filenames
print directory [filenames] > target-file
prints the content of the current directory in its entirety or the files that match the given pattern, such as '*.txt'.
print eqv-labels
print eqv-labels > target-file
prints the label equivalence classes of the networks on the stack. Labels that belong to the same equivalence class always appear together on arcs that lead to the same destination state.
print file-info
print file-info > target-file
displays info about the last opened binary file.
print flags
print flags > target-file
'print flags' prints information about the top network on the stack.
print labels
print labels symbol
print labels > target-file
prints the label alphabet of the top network on the stack (by default) or of the given defined network. The label alphabet includes all the symbols and symbol pairs that occur as arc labels in the network. See also 'print sigma', 'label net'.
print label-tally
print label-tally > target-file
tallies label frequencies in the top network on the stack.
print longest-string
print longest-string > target-file
prints the length of the longest path in the top net.
print lower-words
print lower-words name
print lower-words number
print lower-words name number
print lower-words > target-file
displays the words in the lower side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print lower-words > lower-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variables 'print-space' and 'quote-special'.
print name
print name > target-file
prints the name of the top network on the stack. If the net is unnamed, the hex address of the net is printed instead. Use the command 'name net' to give a name to the top network.
print net
print net symbol
print net > target-file
prints a text representation of the top network on the stack (by default) or the given definition. The output is directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command.
print nth-lower
print nth-lower name
print nth-lower number
print nth-lower name number
print nth-lower > target-file
displays the word on the lower side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-lower'.
print nth-upper
print nth-upper name
print nth-upper number
print nth-upper name number
print nth-upper > target-file
displays the word on the upper side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-upper'.
print num-lower word
print num-lower name word
print num-lower > target-file
displays the numbers of all paths in the top level network that have word on the lower side. This command only works on acyclic networks. The inverse command is 'print nth-lower'.
print num-upper word
print num-upper name word
print num-upper > target-file
displays the numbers of all paths in the top level network that have word on the upper side. This command only works on acyclic networks. The inverse command is 'print nth-upper'.
print random-lower
print random-lower number
print random-lower > target-file
generates 15 random words (by default) from the top network on the stack or any number specified in the command. The words are from the lower side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-lower' gives the same output as 'print random' and 'print random-upper'. See also the variable 'print-space'.
print random-upper
print random-upper number
print random-upper > target-file
generates 15 random words (by default) from the top network on the stack, or any number specified in the command. The words are from the upper side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-upper' gives the same output as 'print random' and 'print random-lower'. See also the variable 'print-space'.
print random-words
print random-words number
print random-words > target-file
generates 15 random words (by default) from the top network on the stack or any number specified in the command. If the network is a transducer, the output may contain symbol pairs. If the network is a simple automaton, rather than a transducer, 'print random-words' gives the same output as 'print random-lower' and 'print random-upper'. See also the variable 'print-space'.
print sigma
print sigma symbol
print sigma > target-file
prints the sigma alphabet of the top network on the stack (by default) or the given defined network. The sigma alphabet includes all the symbols that appear in the network, either as such or as a constituent of a symbol pair, and any symbols that have been excluded from the network by subtraction or complementation. See also 'print labels', 'sigma net'.
print sigma-tally
print sigma-tally > target-file
tallies sigma arc frequencies in the top network on the stack.
print sigma-word-tally
print sigma-word-tally upper
print sigma-word-tally lower
print sigma-word-tally > target-file
tallies sigma frequencies by path in the top network on the stack.
print size
print size symbol
print size > target-file
prints the size of the top network on the stack (by default) or the given definition.
print stack
print stack > target-file
displays the content of the stack.
print storage
print storage > target-file
displays memory usage
print upper-words
print upper-words name
print upper-words number
print upper-words name number
print upper-words > target-file
displays the words in the upper side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print upper-words > upper-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variables 'print-space' and 'quote-special'.
print words
print words name
print words number
print words name number
displays the paths in the top network or in the network assigned to the defined symbol. If a number is given, only that many paths are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print words > words.txt' If the network is a transducer, the output may contain symbol pairs. Otherwise, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variables 'print-space' and 'quote-special'.
prune net
removes all paths leading to non-final states in the top network on the stack.
quit
Exits the program.
read prolog
read prolog < source-file
reads the network(s) contained in the file and pushes them onto the stack. The file must be in the format produced by the 'write prolog' command. The the same effect can be produced with the command 'read regex [@pl ];'
read properties
read properties < source-file
reads a set of attribute-value pairs from a text file and resets the property list of the top network on the stack. See also: 'add properties', 'write properties'.
read regex
read regex < source-file
read regex regular expression
creates a network from a regular expression and and pushes it onto the stack. The file should contain one regular expression. If no filename is given the input is read from the console. The input expression must be terminated by a semicolon.
read spaced-text
read spaced-text source-file
reads a transducer or a net with multicharacter symbols from a file. See also function 'print words' and variable 'print-space'.
read text
read text < source-file
reads the net as text from a file. See also function 'print words' and variable 'print-space'.
reverse net
replaces the top net on the stack with one that accepts the reverse language.
rotate stack
Pushes the top element of the stack to the bottom.
save defined target-file
stores the networks for all defined symbols into a binary file. Use 'load defined' to restore the definitions.
save stack target-file
stores all the networks on the stack into the given file in a binary format. Use 'load stack' to load the file.
set variable value
sets a new value to the variable. The value may be an integer, ON or OFF, or a string. See variable for details. See also: 'show'.
show variable
shows the value of the variable. See also 'set'.
shuffle net
replaces the stack with a single network that accepts every string formed by shuffling together (interdigitanting) one string from each of the input languages. For instance, if Net1 accepts ab and Net2 accepts xy, then the shuffle of these nets will accept the following six strings: abxy axby axyb xaby xayb xyab.
sigma net
replaces the top network on the stack with a network that represents the language of the strings in the sigma alphabet of the original network. See also 'print sigma', 'label net'.
single char sigma net
replaces the top network on the stack with a network that represents the language of the strings in the sigma alphabet of the original network. Multi-character symbols are decomposed into single character ones. See also 'print sigma', 'label net' 'sigma net'.
sort net
Reorders the arcs of the top network in a canonical order.
source command file
executes a file of XFST commands. For example, 'source xfst.script' executes the commands in xfst.script.
substitute defined NAME for LABEL
replaces the top network on the stack with a network derived by splicing in the network associated with the given name for each arc that contains the label. If the network is a transducer, LABEL must not occur in a symbol pair.
substitute label list-of-labels for label
replaces the top network on the stack with a network derived by replacing every arc with the given label by an arc or arcs, each labeled with one of the substitute labels. If the list consists of the keyword NOTHING, all paths containing the given label are removed. The label and the substitutes may be single symbols or a symbol pairs. See also 'substitute symbol'.
substitute symbol list-of-symbols for symbol
replaces the top network on the stack with a network derived by replacing every arc whose label contains the given symbol by an arc or arcs whose label instead contains one of the substitute symbols. If the list consists of the keyword NOTHING, all paths containing the given symbol are removed. The symbol and the substitutes must be single symbols, not symbol pairs. See also 'substitute label'.
substring net
replaces the top network on the stack with a network that accepts every string that is a substring of some string in the language of the original network.
system system-command
'system command' executes the given operating system command(s), for instance, 'system ls'.
test equivalent
returns 1 (TRUE) if all networks on the stack contain the same language
test lower-bounded
returns 1 (TRUE) if the lower side of the top level network has no epsilon cycles.
test lower-universal
returns 1 (TRUE) if the lower side of the top level network represents the universal language.
test overlap
returns 1 (TRUE) if the languages of all networks on the stack have a non-empty intersection.
test sublanguage
returns 1 (TRUE) if the language of the i-th network is a sublanguage of the next (i+1-th) one on the stack. Starting at the topmost network (i=0). See also 'print stack' and 'turn stack'.
test upper-bounded
returns 1 (TRUE) if the upper side of the top level network has no epsilon cycles.
test upper-universal
returns 1 (TRUE) if the upper side of the top level network contains the universal language.
turn stack
reverses the order of the networks on the the stack. NOTE: Use the command 'reverse net' to create the reverse language of the top net.
undefine list of symbols
unbinds the symbol that was bound by the 'define' command. 'undefine symbol1 symbol2 symbol3 ...' etc. undefines all the listed symbols. 'undefine ALL' removes all the definitions.
union net
replaces the stack with the union of all networks currently on the stack.
unoptimize net
reverses the effect of 'optimize net'
upper-side net
extracts the upper language from the network on the top of the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.l]. 'upper-side net' has no effect on simple networks.
write prolog
write prolog > target-file
writes the network as Prolog-clauses. See also: 'read prolog'.
write properties
write properties symbol
write properties > target-file
prints the attribute-value pairs on the property list of the network. See also 'add properties', 'read properties'.
write spaced-text
write spaced-text > target-file
Stores the top network in a text format. This format is intended for transducers and networks with multi-character labels. Use 'read spaced-text' to read files in this format. Use 'write text' for simple networks with single-character labels. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal).
write text > target-file
stores the words in the top network, one per line. This format cannot be used for transducers or networks with multi-character labels. For such networks, use 'save-as-spaced-text' instead. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal).
zero-plus net
concatenates the top network any number of times with itself (as one-plus net also does) and adds to it the empty string. If the network is defined as A, the result is equivalent to compiling the regular expression [A*].

Variables

This is the list of variables that the user can set with the command 'set variable value' with the list of available values. The command 'show variable' displays the current value of the variable.
variable compose-flag-as-special (ON|OFF)
when ON, composition treats flag diacritics as special kind of of epsilon symbols. Flag arcs are like epsilon arcs in that they are not matched against any symbol in the other network but, unlike true epsilons, they are retained in the resulting network. When OFF flag diacritics are treated as ordinary symbols. Default is OFF.
variable directory (filename)
The working directory
variable mark-version (ON|OFF)
when ON, the current version of the program is saved on the property list of the network under the attribute TOOLVERSIONS when the network is saved into a file.
variable max-regex-errors (integer)
The maximum number of errors for regular expressions. When it is exceeded the rest of the input file is discarded.
variable minimal (ON|OFF)
when ON, network operations minimize the result. Minimization includes epsilon removal and determinization. Default is ON.
variable obey-flags (ON|OFF)
when ON, the constraints expressed as flag diacritics are taken into account by 'print [upper|lower]-words' and 'print random-[upper|lower]' commands. When OFF, flag diacritics are treated as ordinary symbols in listing the contents of a network. Default is ON.
variable name-nets (ON|OFF)
when ON, regex names networks. Default is OFF.
variable print-pairs (ON|OFF)
when ON,'apply up' and 'apply down' show both (UPPER and LOWER) sides of labels. Default is OFF.
variable print-sigma (ON|OFF)
when ON, the sigma is shown when a network is printed. Default is ON.
variable print-space (ON|OFF)
when ON, 'print word' and 'print random-word' insert a space between symbols. Default is OFF.
variable process-in-order (ON|OFF)
when ON, intersection and composition operations process networks in the same order as they appear in the stack. Default is OFF.
variable recursive-define (ON|OFF)
when ON, 'define ;' treats instances of in as referring to what is being defined. 'define A [a | A b];' yields the left-recursive language [a b*]. The right-recursive definition 'define B [a B | b];' assigns to B the language [a* b]. 'define C [a C b | a b];' makes C equivalent to [a+ b+], a weak approximation of the context-free language C -> a C b, C -> a b. Default is OFF.
variable quit-on-fail (ON|OFF)
when ON, the application exits with an error if a command is incorrectly spelled or cannot be executed because of a missing file or some other error. This is useful when the application is run by a shell script. Default is OFF.
variable quote-special (ON|OFF)
When ON literal instances of special characters are enclosed in double quotes: zero and question mark are printed as "0" and "?" to distinguish them from epsilon and the unknown symbol. Space, tab, and newline are printed as " ", "\t", and "\n", respectively, to distinguish them from each other. The affected commands are 'print words', 'print lower-words', and 'print upper-words'.
variable random-seed (integer)
seed of the random number generator. It cannot be inspected.
variable sort-arcs (ON|OFF)
when ON, the arcs of a network are sorted before the network is pushed onto the stack. Default is ON.
variable unicode (ON|OFF)
If the value is ON, two-byte characters are printed as four hex integers preceded by \u, if the value is OFF we try to display 16-bit characters directly. Default is OFF.
variable verbose (ON|OFF)
When ON more information is printed.

Commands <-> RE operators

All the XFST commands that produce a new network from one or more expressions on the stack and replace them with the result of the operation correspond to an operation in the regular expression calculus (see Syntax and Semantics of Regular Expressions). Some of the load and read commands also have a counterpart in the regular expression language.

The correspondence is summarized below along with a characterization of the language or relation that the result implements, described in terms of the corresponding regular expression. We also indicate for each command whether it applies to the top network (unary), the two top networks (binary), or to all networks on the stack (n-ary).

concatenate net (n-ary)
[A B] denotes the concatenation of all strings or string pairs in A with all strings or string pairs in B.
compose net (n-ary)
[A .o. B] denotes the composition of the relations A and B.
crossproduct net (binary)
[A .x. B] denotes the relation that maps every string of A to every string of B.
intersect net (n-ary)
[A & B] denotes the set of strings that belong to both A and B.
invert net (unary)
[A.i] denotes the relation that is the inverse of A; that is the upper and lower members of each string pair in the relation are switched around but the relation remains otherwise the same.
load stack
[@bin source-file] denotes the topmost network in the binary source-file. Note that @bin only retrieves one network, whereas 'load stack' pushes onto the stack all the networks in the file. If the name of the source file contains a period, a slash or some other special character, the name should be enclosed in double quotes.
lower-side net (unary)
[A.l] denotes the lower language of the relation A.
minus net (binary)
[A - B] denotes the set of strings that are in A but not in B.
negate net (unary)
[~A] denotes the set of all strings that are not in A.
one-plus net (unary)
[A+] denotes all the strings that are composed by concatenating strings of A with each other any number of times.
read prolog

read regex
read spaced-text
read text
have corresponding regular expression operators: @pl, @re, @stxt, and @txt, respectively, These unary operators are prefixed to the name of the source file. If the name contains a period, a slash or some other special character, the name must be enclosed in double quotes. For example,
read regex [@re regexfile] ;
compiles the regular expression in regexfile;
read regex [@txt "words.txt"] ;
has the same effect as
read text < words.txt
reverse net (unary)
[A.r] denotes the language or relation that is the reverse of A.
substitute symbol symbol_list for symbol (unary)
`[A, SYMBOL, SYMBOL_LIST] denotes the language or relation derived from A by replacing every instance of the given symbol by the union of the symbols on the list.
union net (n-ary)
[A | B] denotes the union of expressions A and B. All the strings and string pairs that belong to either A or B (or both) also belong to the union.
upper-side net (unary)
[A.u] denotes the upper language of the relation A.
zero-plus net (unary)
[A*] denotes the set of strings that consists the empty string and any number of concatenations of A with itself. [A*] is like [A+] except that it always includes the empty string.
N.B. There are many regular expression operators without an equivalent command in XFST. For example, the replace (->) and restriction (=>) operators do not correspond to any XFST command.



We welcome your comments and suggestions
Copyright © The Document Company - Xerox 1997. All rights reserved.
Written by Lauri Karttunen, Tamás Gaál, and André Kempe.
Last modified on Sunday, 25 October 1998.