Copyright © 1997 Xerox Corporation. All rights reserved.
Xerox Finite-State Tool
Lauri Karttunen, Tamás Gaál, and André Kempe
Version 6.3.0
Abstract: Xerox finite-state tool is a general-purpose
utility for computing with finite-state networks. It enables the user to
create simple automata and transducers from text and binary files, regular
expressions and other networks by a variety of operations. The user can
display, examine and modify the structure and the content of the networks.
The result can be saved as text or binary files.
Introduction
XFST is a general-purpose Unix application for computing with finite-state
networks, based on long-term Xerox research initiated by Ronald M. Kaplan
and Martin Kay at the Xerox Palo Alto Research Center in the early 1980s.
It is a successor to two earlier interfaces of the same type: IFSM and
FSC. The IFSM interface was created at PARC by Lauri Karttunen and Todd
Yampol around 1990-92. The FSC tool was developed at XRCE in 1994-95 by
Pasi Tapanainen. XFST combines the best features of its two predecessors.
It includes many extensions to the regular expresson calculus introduced
at XRCE in the last few years.
XFST can read finite-state networks from binary files and compile them
from regular expressions and text files. The networks can be simple networks
or finite-state transducers. They can be combined by means of a variety
of operations, such as union and composition. The resulting networks can
be saved into a binary or a text file. The user may apply a network to
strings to determine whether the string is accepted by the network or to
transform it to another string if the network is a transducer. XFST provides
many ways to get information about a network and to inspect and modify
its structure.
We introduce XFST with a brief tour that shows how the application is
launched in Unix and some of the things that it can do. In the later sections
we discuss the command set and other related issues in more depth. For
help with regular expressions and other issues concerning the structure
and application of finite-state networks, please consult our documentation
on these special topics:
Running XFST
XFST is launched with the command 'xfst'. The launch command can be modified
by a number of optional flags. For example, 'xfst -f scriptfile' executes
the commands in scriptfile and exits upon completion. If no additional
modifiers are given, the 'xfst' command starts the application in an interactive
loop waiting for commands from the user:
> xfst
Copyright © Xerox Corporation 1997
Xerox Finite-State Tool, version 5.6.6
Type "help" to list all commands available or "help help" for further
help.
xfst[0]:
The xfst[0]: prompt indicates that the xfst application is waiting
for a command. The number 0 indicates that the network stack is empty.
The great majority of XFST commands are about adding networks to the
stack, replacing some or all of the stack by the result of some operation,
and saving the stack into a file. Another set of commands is concerned
with the network currently on the top of the stack, that is, the network
that was most recently added to the stack. We give a few simple examples
of both types of commands.
We present two examples. In the first example, the user creates a network
with the help of XFST and saves the result into a file. In the second example,
we retrieve the network from the file to use in another operation. We then
show how a sequence of commands can be run by a script and aliased to a
single command.
Making and saving a network
A network can be added to the stack by loading a previously compiled network
from a binary file or by compiling a new one from some text source. In
either case, the network becomes the topmost one on the stack. In this
example, we compile a network from a regular expression using the command
'read regex'. We type
xfst[0]: read regex [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
This expression denotes the language that consists of the ten decimal
digits. Because 0 is a special symbol (epsilon) in a regular
expression, it is necessary to prefix it here with , the
escape character, to have it interpreted as a digit. The semicolon at
the end of the line closes the regular expression. When the command
is terminated with a carriage return, XFST responds:
2 states, 10 arcs, 10 words.
xfst[1]:
showing that the network representing this ten-word language consists of
2 states and 10 arcs. The new prompt, xfst[1]: shows that we now
have one network on the stack. The command 'print net' displays the structure
of the network on the screen:
xfst[1]: print net
Sigma:
0 1 2 3 4 5 6 7 8 9
Size: 10
Net: ["0" | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
Flags: deterministic, pruned, minimized, epsilon_free
Arity: 1
s0: 0 -> fs1, 1 -> fs1, 2 -> fs1, 3 -> fs1, 4 -> fs1, 5 -> fs1,
6 -> fs1, 7 -> fs1, 8 -> fs1, 9 -> fs1.
fs1: (no arcs)
xfst[1]:
Here the 'print net' command displays the states of the network: s0
(a non-final state), fs1 (a final state); and the labeled arcs
leading from s0 to fs1. In addition, we see the symbol
alphabet of the network (Sigma), the regular expression it was compiled
from, and some characteristics of the network (Flags, Arity).
It is often convenient to give a network a name that can be used in
a regular expression to refer to it. The command for that assignment is
'define':
xfst[1]: define Digit
xfst[0]:
The 'define' command requires at least one argument: the symbol that is
being defined, here 'Digit'. If no further specification is given, the
network on the top of the stack becomes the value of the defined symbol
and is removed from the stack.
Another way to achieve exactly the same result is to include in the
'define' command a regular expression that denotes the desired language
or relation. In this case, the single command
xfst[0]: define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
2 states, 10 arcs, 10 words.
xfst[0]:
would have exactly the same effect as the sequence of commands shown above.
Here the stack remains empty. Note the closing semicolon that marks the
end of the regular expression.
Once defined, the name 'Digit' can be used in regular expressions to
represent the language in question. To give a simple example, let us construct
a transducer that converts US numerals to the European format. In US numerals
the comma is used as a separator, the period marks the beginning of the
decimal part. In Europe the convention is the opposite. Thus "1,000.00"
in the US corresponds to "1.000,00" in Europe. A transducer that does this
conversion can be defined as follows, using the defined 'Digit' symbol:
xfst[0]: read regex %. -> %, , %, -> %. || Digit _ Digit ;
4 states, 41 arcs, Circular.
xfst[1]:
This transducer represents the parallel replacement of "." by "," and
"," by "." between two digits. Because period and comma are also
special symbols, they must be prefixed with the percent sign when used
as ordinary characters. Note the plain comma that separates the two
replacement expressions.
To verify that the transducer does what it is supposed to do we can
use the 'apply' command. Because transducers are bidirectional, we must
specify the direction of application. In this case, it is 'down'; that
is, the US representation is on the "upper" side of the transducer:
xfst[1]: apply down 1,234.99
1.234,99
xfst[1]:
The 'apply' command may also be used to take the input strings from a file
instead of typing them directly. The file is processed line by line:
fst[1]: apply down < US-num.txt
Opening file US-num.txt...
apply down> 1,234.99
1.234,99
apply down> 0.5
0,5
apply down> 100,000,000
100.000.000
Closing file US-num.txt...
In order to have the transducer available in the future, we can save it
to a file. The command 'save' writes all the networks currently on the
stack into a single file. In this case, the file will contain just one
network:
xfst[1]: save stack US-to-EU-num.fst
xfst[1]:
The command 'quit' exits from XFST and returns back to Unix:
xfst[1]: quit
>
However, instead of quitting, we move on to the next topic.
Loading and using a stored network
In this example, we load the network we just created from a file to the
stack, add another network on the top of the first one, and then perform
an operation to replace both of them with the result of that operation.
Let us first empty the stack with 'clear stack' and then load the network
back from the file.
xfst[1]: clear stack
xfst[0]: load US-to-EU-num.fst
xfst[1]:
We will now create another network by compiling a simple network from the
same little text file we already used above
xfst[1]: read text < US-num.txt
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
xfst[2]:
The 'read text' command expects as its argument a name of a file containing
a list of words, one entry per line. It compiles the word list into a network.
The command 'print words' displays the content of the compiled word list.
xfst[1]: print words
1,234.99
100,000,000
0.5
xfst[2]:
As the prompt 'xfst[2]:' indicates, we now have two networks on the stack.
We can get more information about the stack with the 'print stack' command:
fst[2]: print stack
0: 20 states, 21 arcs, 3 words.
1: 4 states, 41 arcs, Circular.
fst[2]:
The positions in the stack are numbered from 0 upwards. The net in position
0, the three-word network, is on top of the stack. Unary commands such
as 'print net' and 'print words' apply to the top network on the stack,
if not followed by some defined name. The net below it, in position 1,
is the transducer converting US-style numbers to European-style numbers.
Instead of applying the transducer to strings one at the time, as we
did before with the 'apply down' command, we can now transduce the entire
list of words at once with the 'compose net' command:
fst[2]: compose net
20 states, 21 arcs, 3 words.
fst[1]:
The compose operation replaces the two networks on the stack by the result
of the operation. Thus we now have just one network left. It is instructive
to view its contents using the same 'print words' command as before.
fst[1]: print words
0<.:,>5
100<,:.>000<,:.>000
1<,:.>234<.:,>99
fst[1]:
The result of the composition is a transducer. It denotes a relation, a
mapping from one regular language into another one. On its "upper side",
the transducer has the three original US-style numbers, each mapped to
a corresponding European-style on the "lower side" of the transducer. For
the most part, the mapping is an identity relation because each digit is
mapped to itself. The only difference is that periods are mapped to commas,
and vice versa. The angle brackets indicate pairs of non-identical symbols:
<.:,> indicates the correspondence of an upper-side period
with a lower-side comma; similarly for <,:.>.
We can view the upper and lower languages of the relation independently.
print upper-words displays the three US numbers; print lower-words
shows what they have been transduced into:
fst[1]: print lower-words
0,5
100.000.000
1.234,99
fst[1]:
As before, can use the 'apply' commands to map strings on one side of the
transducer to the corresponding strings on the other side. For example,
'apply up' maps any one of the three lower-side European numbers to the
corresponding upper-side string.
fst[1]: apply up 0,5
0.5
We can also extract one of the other language from the relation. The command
'lower-side net' extracts from the transducer a simple automaton that contains
just the three European numbers.
fst[1]: lower-side net
20 states, 21 arcs, 3 words.
fst[1]: print words
1.234,99
100.000.000
0,5
fst[1]:quit
Running XFST with a script
The examples we have given so far are about using XFST in an interactive
mode, with the user typing commands on the prompt line. It is more convenient,
for many purposes, to write a list of commands to be run in batch mode
without any user interaction. To illustrate this possibility, we write
a script that compiles the US-to-European transducer. and uses it to produce
a file of European-style numbers from a file of US-style numbers.
A script is an ordinary text file that can be prepared with any text
editor, such as Emacs. Here we create the script with the help of the simple
Unix 'cat' command. The command 'cat > xfst.script' writes the rest of
the text up to the closing ^D (control-D) into the file xfst.script.
cat > xfst.script
# A script for producing a file of European numbers
# from a file of US numbers.
echo >> Defining Digit
define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
echo >> Defining number converter
read regex %. -> %, , %, -> %. || Digit _ Digit ;
echo >> reading US numbers
read text < US-num.txt
echo >> Composing and extracting EU numbers
compose net
lower-side net
echo >> Creating EU-num.txt
write text EU-num.txt
system ls -l EU-num.txt
echo >> Done.
quit
^D
The lines beginning with # are comments that are ignored by xfst.
The 'echo' command prints the rest of the line to the standard output when
the script is executed. In complex scripts it is often useful to print
progress reports during the execution.
Because the purpose of this script is to produce an output file, we
verify at the end that the file has indeed been created. The 'system' command
sends the rest of the line to the Unix operating system. If the file is
in the directory, the script was run successfully
To execute the script, we invoke xfst with the -f flag:
> xfst -f xfst.script
>> Defining Digit
2 states, 10 arcs, 10 words.
>> Defining number converter
4 states, 41 arcs, Circular.
>> reading US numbers
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
>> Composing and extracting EU numbers
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.
>> Creating EU-num.txt
-rw-r--r-- 1 lk xerox 25 May 30 13:09 EU-num.txt
>> Done.
bye.
The -f flag causes xfst to exit after the script has been run.
If the user wishes to take over at that point and continue running xfst
in an interactive mode, he can call the script with -l flag instead
of -f.
Another non-interactive way of running xfst is to include a sequence
of xfst commands directly on the Unix command line. Each xfst command must
be enclosed in double quotes and preceded by an -e flag. If the
final -stop flag is omitted, xfst executes the commands and continues
running in an interactive mode.
> xfst -e "define Digit [%0|1|2|3|4|5|6|7|8|9] ;" \
-e "read regex %. -> %, , %, -> %. || Digit _ Digit;" \
-e "read text < US-num.txt" \
-e "compose net" \
-e "lower-side net" \
-e "write text EU-num.txt" -stop
Note the backslash at the end of each line. It signals to to the Unix operating
system that the entire sequence of commands counts as a single command
line.
Defining aliases
XFST allows the user to create simple names for more complex commands.
For example,
alias dir system ls -l *.txt
creates a new XFST command 'dir' that has the same effect as 'system ls
-l *.txt', that is, it sends to the Unix operating system the command 'ls
-l *.txt'. The chosen alias must be a single word with no hyphens, underscores,
or other special characters. The command 'print alias' lists all the current
aliases and their definitions.
An alias can represent an arbitrary sequence of commands. To create
such an alias, the user first types only the name to be defined.
xfst[0]: alias ConvertAndShow
XFST responds by prompting the user for commands. The list can be terminated
by ^D (control-D) or by a special symbol, END;, with no extra whitespace
around it:
alias> define Digit [%0|1|2|3|4|5|6|7|8|9] ;
alias> define Conv %. -> %, , %, -> %. || Digit _ Digit ;
alias> read regex [@txt "US-num.txt" .o. Conv].l ;
alias> echo
alias> echo >> Here is the result:
alias> print words
alias> END;
With slightly different means, the above definition of the alias 'ConvertAndShow'
produces the same end result as our previous examples: a network of European-style
numbers obtained by transduction from a US-style source file. We define
the conversion relation, Conv, by means of the earlier definition of Digit.
Instead of the command 'read text' to compile the word list in US-num.txt,
we use the regular expression operator @txt to bring about the
same effect. Similarly, in place of the XFST commands 'compose net' and
'lower-side net', we employ the corresponding operators in the regular
expression calculus: [@txt "US-num.txt" .o. Conv].l denotes the
lower side language (.l) of the composition (.o.) of the US numbers
with the conversion relation.
Now XFST interprets ConvertAndShow as a command to execute the defined
set of commands:
xfst[0]: ConvertAndShow
2 states, 10 arcs, 10 words.
4 states, 41 arcs, Circular.
Opening file US-num.txt...
Closing file US-num.txt...
20 states, 21 arcs, 3 words.
20 states, 21 arcs, 3 words.
>> Here is the result:
1,234.99
100,000,000
0.5
xfst[1]:
A convenient way to extend the xfst command language is to place such alias
definitions in a script file that is executed with with the -l
flag as the application is launched.
Overview of Commands
This section contains a general overview of xfst commands. A complete alphabetical
list, with explanations, is found in the next section.
Command Syntax
XFST commands are in general of the form '<command> <type or
object>', where <command> specifies the operation to be performed
and the second term, if any, gives some additional specification about
the type of the operation or the object it applies to. For example, there
are several variants of the 'print' command: 'print net', 'print sigma',
'print words', etc.
All display commands and all unary operations, such as 'lower-side net',
apply to the network on the top of the stack. For example, 'print words'
enumerates the paths of the topmost network only, although there may be
others on the stack. Some commands, such as 'print net' and 'print words',
can be followed by a name of network which has been given a name with the
'define' command, as shown below:
xfst[0]: read text
text> two
text> words
text> ^D
8 states, 8 arcs, 2 words.
fst[1]: define W2
fst[0]: print words W2
two
words
xfst[0]:
Virtually all XFST commands can be abbreviated to a single word command.
For example, the 'print' part of all print commands can be dropped. Thus
'sigma' as a command has the same effect as 'print sigma'. Similarly, 'regex'
and 'read regex' are equivalent.
Short command names are convenient when one is working in an interactive
mode. On the other hand, for scripts we recommend using the long commands
because it tends to make the scripts more perspicuous.
Command Classes
The FST commands can be grouped into five classes. We describe first in
general terms the five classes and list the commands in each class. A more
detailed description of each command is given in an alphabetical listing
in the next section.
-
Input/Output and Stack Commands
-
move networks onto and off the stack, save the stack or the top network
into a file in various formats. We use the commands 'load' and 'save' for
binary files, 'read' and 'write' for text files. 'read regex' and 'define'
compile a network from a regular expression; 'read regex' pushes the result
on the stack, 'define' assigns it to the chosen name.
The 'read' commands take the input from the console (stdin)
unless the user specifies another source by means of < symbol and a
file name. For example, 'read text < /usr/dict/words' compiles the list
in /usr/dict/words.
Similarly, the 'write' commands send the output to the console
(stdout) unless the command is followed by > and a name of a
target file. It is also possible to send the output directly to a
printer. For example, 'write text | lp' uses the | convention
in Unix to pass the output to the 'lp' print command.
Binary
input/output
|
load stack, save stack, load defined, save defined |
Text
input/output
|
read prolog, read spaced-text, read text, write prolog, write spaced-text,
write text. |
Regular
expressions
|
define, read regex, undefine |
Stack
commands
|
clear stack, pop stack, push defined, turn stack, rotate stack |
-
Display commands
-
give information about the content and properties of the network on the
top of the stack, the language or relation it represents, the state of
the system, and the available commands.
Like the 'write' commands in the previous section, the 'print' commands
display the output on the console, unless the user directs it to a file
or a printer. The 'print' commands do not have a corresponding input command.
Stack and network
information
|
inspect net, print eqv-labels, print flags, print print labels,
label-tally, print longest-string, print name, print net, print sigma,
print sigma-tally, print sigma-word-tally, print size, print stack |
System information
|
print aliases, print file-info, print defined, print directory, print
storage |
Network
content
|
print random-lower, print random-upper, print words, print lower-words,
print upper-words, print nth-upper, print nth-lower, print num-upper, print
num-lower |
Apply network
to a string
|
apply up, apply down |
Help
|
apropos, help |
Other
|
echo |
-
Tests of network properties
-
Unary tests check whether the network on the stack has certain properties.
Binary tests compare the languages of the two top network on the stack.
Unary
|
test lower-bounded, test upper-bounded, test lower-universal, test
upper-universal |
Binary
|
test equivalent, test overlap, test sublanguage |
-
Operations on networks
-
produce a network that represents a new language or relation by modifying
the top network, two top networks, or all the networks on the stack. The
affected networks are removed from the stack and replaced by the result
of the operation.
Property list commands modify or display the attribute-value pairs
on the top network's property list. Maintenance commands perform
various housekeeping operations of the top level network structure
without changing the language or the relation it represents.
Unary
operations
|
invert net, label net, lower-side net,
multi char sigma net,
negate net, one-plus net, paste net labels,
reverse net, sigma net,
single char sigma net,
substitute defined, substitute label,
substitute symbol, substring net, upper-side net,
zero-plus net |
Binary
operations
|
crossproduct net, minus net |
N-ary
operations
|
compose net, concatenate net, intersect net, shuffle net, union net |
Property
list
|
add properties, edit properties, name net, read properties, write
properties |
Maintenance
|
complete net, determinize net, eliminate flag, epsilon-remove net,
minimize net, prune net, sort net, compact sigma, optimize net, unoptimize
net |
-
System commands
-
change the internal configuration of XFST or interact with the Unix operating
system.
Variables
|
show, set |
Aliases
|
alias |
System call
|
system |
Execute script
|
source |
Other
|
quit |
Alphabetical Index
This index consists of three parts. The first index lists all the currently
implemented XFST commands. The second list describes the variables that
the user can manipulate with the XFST 'set' command. The final part explains
the correspondence between certain XFST commands and certain operators
in the regular expression calculus. The XFST commands involved are the
'load' and 'read' commands and all commands that construct a new network
by means of an operation on one or more networks on the stack.
List of Commands
This list gives a brief description of all the currently implemented XFST
commands. This information is also available on-line in XFST with the 'help'
command. For example, 'help add properties' prints out the description
shown below.
-
add properties
add properties < source-file
- reads attribute-value pairs from a text file and adds them to
the property list of the top network on the stack.
Each line in the source file must contain one attribute-value
pair. For example,
COPYRIGHT: "Xerox Corporation"
DATE: "June 10, 1997"
VERSION: 9
The attributes, each terminated by a colon, must be strings (upper
case recommended). The values may be strings or integers. String
values must be enclosed in double quotes: "9" and 9 are considered as
different values. If an attribute to be added already exists on the
property list of the network, 'add properties' will reset it to a new
value.
See also: 'read properties', 'write-properties'.
-
alias
alias name
alias name command-sequence
- 'alias ' defines as a shorthand
for the given sequence of XFST commands. For instance, the new command
'ls' can be created in the following way:
alias ls print directory
Alias may contain several commands. A new command 'lexr', for
reading a lexicon, can be created in the following way:
fst[0]: alias lexr
alias> load lexicon.fst
alias> print properties
alias> END;
A multi-command alias may also be defined on one line as follows:
alias lexr load lexicon.fst; print properties
See also: 'print aliases'.
-
ambiguity net
- converts the lexical transducer on the top of the stack into a
transducer that maps each word on the lower-side of the network to
itself and a sequence of alternative tags that the word has in the
original transducer. Limitations: (1) the last symbol on each path is
interpreted as a tag, (2) only one tag per word is allowed,
(3) the network must be acyclic.
-
apply up
apply up line
apply up < source-file
- Simulates the composition of the input string with the lower
side of the top network on the stack and extracts the result from the
upper side. If the command `apply up' is followed by a carriage
return,the program prompts the user with apply>, prints the result and
gives a prompt for a new input. Control-D exits from this mode. See
also the variable 'print-pairs'.
-
apply down
apply down line
apply down < source-file
- Simulates the composition of the input string with the upper
side of the top network on the stack and extracts the result from the
lower side. If the command `apply down' is followed by a carriage
return, the program prompts the user with apply>, prints the result
and gives a prompt for a new input. Control-D exits from this mode.
See also the variable 'print-pairs'.
-
apropos text
- 'apropos text' prints brief information about the commands
relating to the given text. If you need further information about a
command, use the command 'help command-name'.
-
cleanup net
- merges X:0 0:Y and 0:Y X:0 sequences into
X:Y, if possible.
-
clear stack
- Removes all networks on the stack.
-
compact sigma
- all symbols that have the same distribution as the unknown
symbol are eliminated from the alphabet and the corresponding
redundant arcs are removed from the network.
-
complete net
- extends the top network until it has a transition for every
symbol in sigma in every state. All new transitions lead to a failure
state, thus the language of the network is preserved.Complete is
defined only for simple networks (arity = 1), not for transducers.
-
compose net
- replaces the stack with the composition of all networks currently
on the stack. If the stack consists of two networks, defined as A and
B, with A on top of the stack, the result is equivalent to compiling
the regular expression [A .o. B].
-
concatenate net
- replaces the stack with the concatenation of all networks
currently on the stack. If the consists of two networks, defined as A
and B, with A on top of the stack, the result is equivalent to
compiling the regular expression [A B]. In regular expressions,
concatenation is indicated simply by whitespace.
-
crossproduct net
- returns the crossproduct of the first two networks on the
stack. They must not be transducers. If the stack consists of two
networks, defined as A and B, with A on top of the stack, the result
is equivalent to compiling the regular expression [A .x. B].
-
define symbol
define symbol regular-expression
- binds the given symbol to a network compiled from the regular
expression. Every subsequent occurrence of the symbol in a regular
expression is interpreted as the name of the corresponding language or
relation. To unbind the symbol use 'undefine '. If the regular
expression is omitted, i.e. if the command is 'define ', the
symbol is bound to the top net on the stack. The net is removed from the
stack.
See also: 'load defined', 'save defined', 'print defined', 'read regex',
'undefine', and 'recursive-define'.
-
determinize net
- replaces the top network with an equivalent deterministic
network. A deterministic net has at most one transition per label
from each state.
-
echo text
- prints the text, followed by a newline.
-
edit properties
- 'edit properties' allows the user to edit the property list of
the top network on the stack. Properties may be added, deleted,
and modified. Values must be integers or strings.
-
eliminate flag
- replaces the top network on the stack with an equivalent network
that encodes directly the constraints expressed by the given flag
attribute.
-
epsilon-remove net
- replaces the top network with an equivalent one that has no
epsilon transitions.
-
help
help command-name
- 'help' lists the commands and the command classes.
To get more information about a command, use the command 'help
command'. It gives a longer explanations about the command and its
usage. All possible arguments are listed and explained. For
instance, the command 'help' (see above) may be used without arguments
or it may have one argument that is a name of a command.
The command name can be a name of the class of commands, for instance,
'help stack' prints all stack commands. If the given command name
is not recognised, command help works like command 'apropos'.
-
inspect net
- invokes a mode for interactively inspecting the top network on
the stack.
-
intersect net
- replaces the stack with the intersection of all networks
currently on the stack. If the stack consists of two networks, defined
as A and B, regardless of the order, the result is equivalent to
compiling the regular expression [A & B].
-
invert net
- exchanges the two sides of the top network on the stack. If
the network is defined as A, the result is equivalent to compiling the
regular expression [A.i]. 'invert net' has no effect on simple
networks.
-
label net
- replaces the top network on the stack with a network that
represents the language or relation that includes all the strings
and string pairs in the label alphabet of the original network.
See also 'print labels', 'sigma net'.
-
load defined source-file
- restores the definitions from a binary source file. The
networks are assigned to the chosen names directly, without pushing
them onto the stack. The file must be in the binary format, as
produced by the 'save defined' command.
-
load stack < source-file
- reads the networks in the given file, and pushes them onto the
the stack. The file must be in the binary format, as produced by the
'save stack' command. If the file only contains a single network, it
can also be loaded with the command 'read regex [@bin
];'.
-
lower-side net
- extracts the lower language of the top network on the stack. If
the network is defined as A, the result is equivalent to compiling the
regular expression [A.l]. 'lower-side net' has no effect on simple
networks.
-
minimize net
- replaces the top network on the stack with an equivalent network
that has a minimal number of states.
-
minus net
- replaces the top two networks on the stack with a network
containing all strings in the first network that are not in the
second. If the two networks are defined as A and B, with A on top of
the stack, the result is equivalent to compiling the regular
expression [A - B].
-
multi char sigma net
- replaces the top network on the stack with a network that
represents the language of the words of the original network.
The words are composed into multi-character symbols. See also
'single char sigma net', 'print sigma', 'label net'
'sigma net'.
-
name net symbol
- assigns the given name to the top network on the stack.
-
negate net
- replaces the top network on the stack with a network that accepts
all and only those strings rejected by the original. If the network is
defined as A, the result is equivalent to compiling the regular
expression [~A].
-
one-plus net
- concatenates the top network any number of times with itself. If
the network is defined as A, the result is equivalent to compiling the
regular expression [A+].
-
optimize net
- runs a heuristic algorithm that tries to reduce the number of
arcs, possibly at the cost of introducing more states.
-
paste net labels
- replaces the top network on the stack with a network that
represents the same language as the original network.
Along every path, symbols (labels) are pasted together into
as few as possible multi-character symbols. Loops and branching
are handled correctly.
See also 'multi char sigma net' 'single char sigma net',
'print sigma', 'label net', 'sigma net'.
-
pop stack
- removes the top network on the stack.
-
push defined
- makes a copy of the network named by a previous 'define' command
and pushes it onto the stack. 'push defined SYMBOL' has the same effect
as 'read regex SYMBOL ;'.
-
print aliases
print aliases > target-file
- prints each alias and the command sequence it encodes.
-
print defined
print defined > target-file
- prints each defined symbol and the size of the network it stands for.
-
print directory
print directory filenames
print directory [filenames] > target-file
- prints the content of the current directory in its entirety or
the files that match the given pattern, such as '*.txt'.
-
print eqv-labels
print eqv-labels > target-file
- prints the label equivalence classes of the networks on the
stack. Labels that belong to the same equivalence class always appear
together on arcs that lead to the same destination state.
-
print file-info
print file-info > target-file
- displays info about the last opened binary file.
-
print flags
print flags > target-file
- 'print flags' prints information about the top network on the stack.
-
print labels
print labels symbol
print labels > target-file
- prints the label alphabet of the top network on the stack (by
default) or of the given defined network. The label alphabet includes
all the symbols and symbol pairs that occur as arc labels in the
network. See also 'print sigma', 'label net'.
-
print label-tally
print label-tally > target-file
- tallies label frequencies in the top network on the stack.
-
print longest-string
print longest-string > target-file
- prints the length of the longest path in the top net.
-
print lower-words
print lower-words name
print lower-words number
print lower-words name number
print lower-words > target-file
- displays the words in the lower side language of the top
network or in the network assigned to the defined symbol. If a number
is given, only that many words are printed, otherwise every word is
printed but not looping on any path more than once. The output can be
directed to a file by adding a greater-than symbol, >, and a name of a
target file to the end of the command. For example, 'print lower-words
> lower-words.txt'. If the network is a simple network rather than a
transducer, 'print lower-words', 'print words', and 'print
upper-words' give the same output. See also the variables
'print-space' and 'quote-special'.
-
print name
print name > target-file
- prints the name of the top network on the stack. If the net is
unnamed, the hex address of the net is printed instead. Use the
command 'name net' to give a name to the top network.
-
print net
print net symbol
print net > target-file
- prints a text representation of the top network on the stack
(by default) or the given definition. The output is directed to a
file by adding a greater-than symbol, >, and a name of a target file
to the end of the command.
-
print nth-lower
print nth-lower name
print nth-lower number
print nth-lower name number
print nth-lower > target-file
- displays the word on the lower side of the nth path in the top
level network or in the network assigned to the defined symbol. This
command only works on acyclic networks. The inverse command is 'print
num-lower'.
-
print nth-upper
print nth-upper name
print nth-upper number
print nth-upper name number
print nth-upper > target-file
- displays the word on the upper side of the nth path in the top
level network or in the network assigned to the defined symbol. This
command only works on acyclic networks. The inverse command is 'print
num-upper'.
-
print num-lower word
print num-lower name word
print num-lower > target-file
- displays the numbers of all paths in the top level network that
have word on the lower side. This command only works on acyclic
networks. The inverse command is 'print nth-lower'.
-
print num-upper word
print num-upper name word
print num-upper > target-file
- displays the numbers of all paths in the top level network that
have word on the upper side. This command only works on acyclic
networks. The inverse command is 'print nth-upper'.
-
print random-lower
print random-lower number
print random-lower > target-file
- generates 15 random words (by default) from the top network on
the stack or any number specified in the command. The words are from
the lower side language of the net. If the network is a simple
automaton, rather than a transducer, 'print random-lower' gives the
same output as 'print random' and 'print random-upper'. See also the
variable 'print-space'.
-
print random-upper
print random-upper number
print random-upper > target-file
- generates 15 random words (by default) from the top network on
the stack, or any number specified in the command. The words are from
the upper side language of the net. If the network is a simple
automaton, rather than a transducer, 'print random-upper' gives the
same output as 'print random' and 'print random-lower'. See also the
variable 'print-space'.
-
print random-words
print random-words number
print random-words > target-file
- generates 15 random words (by default) from the top network on the
stack or any number specified in the command. If the network is a
transducer, the output may contain symbol pairs. If the network is a
simple automaton, rather than a transducer, 'print random-words' gives
the same output as 'print random-lower' and 'print random-upper'. See
also the variable 'print-space'.
-
print sigma
print sigma symbol
print sigma > target-file
- prints the sigma alphabet of the top network on the stack (by
default) or the given defined network. The sigma alphabet includes all the
symbols that appear in the network, either as such or as a constituent
of a symbol pair, and any symbols that have been excluded from the
network by subtraction or complementation. See also 'print labels',
'sigma net'.
-
print sigma-tally
print sigma-tally > target-file
- tallies sigma arc frequencies in the top network on the stack.
-
print sigma-word-tally
print sigma-word-tally upper
print sigma-word-tally lower
print sigma-word-tally > target-file
- tallies sigma frequencies by path in the top network on the stack.
-
print size
print size symbol
print size > target-file
- prints the size of the top network on the stack (by default) or
the given definition.
-
print stack
print stack > target-file
- displays the content of the stack.
-
print storage
print storage > target-file
- displays memory usage
-
print upper-words
print upper-words name
print upper-words number
print upper-words name number
print upper-words > target-file
- displays the words in the upper side language of the top
network or in the network assigned to the defined symbol. If a number
is given, only that many words are printed, otherwise every word is
printed but not looping on any path more than once. The output can be
directed to a file by adding a greater-than symbol, >, and a name of a
target file to the end of the command. For example, 'print upper-words
> upper-words.txt'. If the network is a simple network rather than a
transducer, 'print lower-words', 'print words', and 'print
upper-words' give the same output. See also the variables
'print-space' and 'quote-special'.
-
print words
print words name
print words number
print words name number
- displays the paths in the top network or in the network
assigned to the defined symbol. If a number is given, only that many
paths are printed, otherwise every word is printed but not looping on
any path more than once. The output can be directed to a file by
adding a greater-than symbol, >, and a name of a target file to the
end of the command. For example, 'print words > words.txt' If the
network is a transducer, the output may contain symbol pairs.
Otherwise, 'print lower-words', 'print words', and 'print upper-words'
give the same output. See also the variables 'print-space' and
'quote-special'.
-
prune net
- removes all paths leading to non-final states in the top
network on the stack.
-
quit
- Exits the program.
-
read prolog
read prolog < source-file
- reads the network(s) contained in the file and pushes them onto
the stack. The file must be in the format produced by the 'write
prolog' command. The the same effect can be produced with the command
'read regex [@pl ];'
-
read properties
read properties < source-file
- reads a set of attribute-value pairs from a text file and
resets the property list of the top network on the stack.
See also: 'add properties', 'write properties'.
-
read regex
read regex < source-file
read regex regular expression
- creates a network from a regular expression and and pushes it
onto the stack. The file should contain one regular expression. If no
filename is given the input is read from the console. The input
expression must be terminated by a semicolon.
-
read spaced-text
read spaced-text source-file
- reads a transducer or a net with multicharacter symbols from a file.
See also function 'print words' and variable 'print-space'.
-
read text
read text < source-file
- reads the net as text from a file.
See also function 'print words' and variable 'print-space'.
-
reverse net
- replaces the top net on the stack with one that accepts the
reverse language.
-
rotate stack
- Pushes the top element of the stack to the bottom.
-
save defined target-file
- stores the networks for all defined symbols into a binary file.
Use 'load defined' to restore the definitions.
-
save stack target-file
- stores all the networks on the stack into the given file in a
binary format. Use 'load stack' to load the file.
-
set variable value
- sets a new value to the variable. The value may be an integer,
ON or OFF, or a string. See variable for details. See also:
'show'.
-
show variable
- shows the value of the variable. See also 'set'.
-
shuffle net
- replaces the stack with a single network that accepts every
string formed by shuffling together (interdigitanting) one string from
each of the input languages. For instance, if Net1 accepts ab and
Net2 accepts xy, then the shuffle of these nets will accept the
following six strings: abxy axby axyb xaby xayb xyab.
-
sigma net
- replaces the top network on the stack with a network that
represents the language of the strings in the sigma alphabet of the
original network. See also 'print sigma', 'label net'.
-
single char sigma net
- replaces the top network on the stack with a network that
represents the language of the strings in the sigma alphabet of the
original network. Multi-character symbols are decomposed into single
character ones. See also 'print sigma', 'label net' 'sigma net'.
-
sort net
- Reorders the arcs of the top network in a canonical order.
-
source command file
- executes a file of XFST commands. For example, 'source
xfst.script' executes the commands in xfst.script.
-
substitute defined NAME for LABEL
- replaces the top network on the stack with a network derived by
splicing in the network associated with the given name for each arc that
contains the label. If the network is a transducer, LABEL must not occur
in a symbol pair.
-
substitute label list-of-labels for label
- replaces the top network on the stack with a network derived by
replacing every arc with the given label by an arc or arcs, each labeled
with one of the substitute labels. If the list consists of the keyword
NOTHING, all paths containing the given label are removed. The label and
the substitutes may be single symbols or a symbol pairs.
See also 'substitute symbol'.
-
substitute symbol list-of-symbols for symbol
- replaces the top network on the stack with a network derived by
replacing every arc whose label contains the given symbol by an arc or
arcs whose label instead contains one of the substitute symbols. If the
list consists of the keyword NOTHING, all paths containing the given
symbol are removed. The symbol and the substitutes must be single symbols,
not symbol pairs. See also 'substitute label'.
-
substring net
- replaces the top network on the stack with a network that
accepts every string that is a substring of some string in the
language of the original network.
-
system system-command
- 'system command' executes the given operating system
command(s), for instance, 'system ls'.
-
test equivalent
- returns 1 (TRUE) if all networks on the stack contain the same
language
-
test lower-bounded
- returns 1 (TRUE) if the lower side of the top level network has
no epsilon cycles.
-
test lower-universal
- returns 1 (TRUE) if the lower side of the top level network
represents the universal language.
-
test overlap
- returns 1 (TRUE) if the languages of all networks on the stack
have a non-empty intersection.
-
test sublanguage
- returns 1 (TRUE) if the language of the i-th network is a
sublanguage of the next (i+1-th) one on the stack. Starting at the
topmost network (i=0). See also 'print stack' and 'turn stack'.
-
test upper-bounded
- returns 1 (TRUE) if the upper side of the top level network has
no epsilon cycles.
-
test upper-universal
- returns 1 (TRUE) if the upper side of the top level network
contains the universal language.
-
turn stack
- reverses the order of the networks on the the stack. NOTE: Use the
command 'reverse net' to create the reverse language of the top net.
-
undefine list of symbols
- unbinds the symbol that was bound by the 'define'
command. 'undefine symbol1 symbol2 symbol3 ...' etc. undefines all the
listed symbols. 'undefine ALL' removes all the definitions.
-
union net
- replaces the stack with the union of all networks currently on
the stack.
-
unoptimize net
- reverses the effect of 'optimize net'
-
upper-side net
- extracts the upper language from the network on the top of the
stack. If the network is defined as A, the result is equivalent to
compiling the regular expression [A.l]. 'upper-side net' has no
effect on simple networks.
-
write prolog
write prolog > target-file
- writes the network as Prolog-clauses. See also: 'read prolog'.
-
write properties
write properties symbol
write properties > target-file
- prints the attribute-value pairs on the property list of the
network. See also 'add properties', 'read properties'.
-
write spaced-text
write spaced-text > target-file
- Stores the top network in a text format. This format is intended
for transducers and networks with multi-character labels. Use
'read spaced-text' to read files in this format. Use 'write text'
for simple networks with single-character labels. Use 'save stack'
to store in a compact binary format. If no target file is specified,
output will be sent to standard output (usually the terminal).
-
write text > target-file
- stores the words in the top network, one per line. This format
cannot be used for transducers or networks with multi-character
labels. For such networks, use 'save-as-spaced-text' instead. Use
'save stack' to store in a compact binary format. If no target file is
specified, output will be sent to standard output (usually the
terminal).
-
zero-plus net
- concatenates the top network any number of times with itself (as
one-plus net also does) and adds to it the empty string. If
the network is defined as A, the result is equivalent to compiling the
regular expression [A*].
Variables
This is the list of variables that the user can set with the command 'set
variable value' with the list of available values. The command
'show variable' displays the current value of the variable.
- variable compose-flag-as-special (ON|OFF)
-
when ON, composition treats flag diacritics as special kind of
of epsilon symbols. Flag arcs are like epsilon arcs in that they
are not matched against any symbol in the other network but, unlike
true epsilons, they are retained in the resulting network. When OFF
flag diacritics are treated as ordinary symbols. Default is OFF.
- variable directory (filename)
-
The working directory
- variable mark-version (ON|OFF)
-
when ON, the current version of the program is saved on the property
list of the network under the attribute TOOLVERSIONS when the network is
saved into a file.
- variable max-regex-errors (integer)
-
The maximum number of errors for regular expressions. When it
is exceeded the rest of the input file is discarded.
- variable minimal (ON|OFF)
-
when ON, network operations minimize the result. Minimization
includes epsilon removal and determinization. Default is ON.
- variable obey-flags (ON|OFF)
-
when ON, the constraints expressed as flag diacritics are
taken into account by 'print [upper|lower]-words' and
'print random-[upper|lower]' commands. When OFF, flag diacritics
are treated as ordinary symbols in listing the contents of a
network. Default is ON.
- variable name-nets (ON|OFF)
-
when ON, regex names networks. Default is OFF.
- variable print-pairs (ON|OFF)
-
when ON,'apply up' and 'apply down' show both (UPPER and LOWER)
sides of labels. Default is OFF.
- variable print-sigma (ON|OFF)
-
when ON, the sigma is shown when a network is printed. Default is ON.
- variable print-space (ON|OFF)
-
when ON, 'print word' and 'print random-word' insert a space between symbols.
Default is OFF.
- variable process-in-order (ON|OFF)
-
when ON, intersection and composition operations process networks
in the same order as they appear in the stack. Default is OFF.
- variable recursive-define (ON|OFF)
-
when ON, 'define ;' treats instances of in
as referring to what is being defined. 'define A [a | A b];' yields
the left-recursive language [a b*]. The right-recursive definition
'define B [a B | b];' assigns to B the language [a* b].
'define C [a C b | a b];' makes C equivalent to [a+ b+], a weak
approximation of the context-free language C -> a C b, C -> a b.
Default is OFF.
- variable quit-on-fail (ON|OFF)
-
when ON, the application exits with an error if a command is
incorrectly spelled or cannot be executed because of a missing file
or some other error. This is useful when the application is run by
a shell script. Default is OFF.
- variable quote-special (ON|OFF)
-
When ON literal instances of special characters are enclosed in
double quotes: zero and question mark are printed as "0" and "?"
to distinguish them from epsilon and the unknown symbol. Space, tab,
and newline are printed as " ", "\t", and "\n", respectively,
to distinguish them from each other. The affected commands are
'print words', 'print lower-words', and 'print upper-words'.
- variable random-seed (integer)
-
seed of the random number generator. It cannot be inspected.
- variable sort-arcs (ON|OFF)
-
when ON, the arcs of a network are sorted before the network is
pushed onto the stack. Default is ON.
- variable unicode (ON|OFF)
-
If the value is ON, two-byte characters are printed as four hex
integers preceded by \u, if the value is OFF we try to display 16-bit
characters directly. Default is OFF.
- variable verbose (ON|OFF)
-
When ON more information is printed.
Commands <-> RE operators
All the XFST commands that produce a new network from one or more
expressions on the stack and replace them with the result of the
operation correspond to an operation in the regular expression
calculus (see Syntax
and Semantics of Regular Expressions). Some of the load and read
commands also have a counterpart in the regular expression language.
The correspondence is summarized below along with a characterization
of the language or relation that the result implements, described in terms
of the corresponding regular expression. We also indicate for each command
whether it applies to the top network (unary), the two top networks (binary),
or to all networks on the stack (n-ary).
-
concatenate net (n-ary)
-
[A B] denotes the concatenation of all strings or string pairs
in A with all strings or string pairs in B.
-
compose net (n-ary)
-
[A .o. B] denotes the composition of the relations A and B.
-
crossproduct net (binary)
-
[A .x. B] denotes the relation that maps every string of A to
every string of B.
-
intersect net (n-ary)
-
[A & B] denotes the set of strings that belong to both A and
B.
-
invert net (unary)
-
[A.i] denotes the relation that is the inverse of A; that is the
upper and lower members of each string pair in the relation are switched
around but the relation remains otherwise the same.
-
load stack
-
[@bin source-file] denotes the topmost network in the binary
source-file. Note that @bin only retrieves one network,
whereas 'load stack' pushes onto the stack all the networks in the file.
If the name of the source file contains a period, a slash or some other
special character, the name should be enclosed in double quotes.
-
lower-side net (unary)
-
[A.l] denotes the lower language of the relation A.
-
minus net (binary)
-
[A - B] denotes the set of strings that are in A but not in B.
-
negate net (unary)
-
[~A] denotes the set of all strings that are not in A.
-
one-plus net (unary)
-
[A+] denotes all the strings that are composed by concatenating
strings of A with each other any number of times.
-
read prolog
read regex
read spaced-text
read text
-
have corresponding regular expression operators: @pl, @re,
@stxt, and @txt, respectively, These unary operators
are prefixed to the name of the source file. If the name contains a period,
a slash or some other special character, the name must be enclosed in double
quotes. For example,
read regex [@re regexfile] ;
compiles the regular expression in regexfile;
read regex [@txt "words.txt"] ;
has the same effect as
read text < words.txt
-
reverse net (unary)
-
[A.r] denotes the language or relation that is the reverse of
A.
-
substitute symbol symbol_list for symbol (unary)
-
`[A, SYMBOL, SYMBOL_LIST] denotes the language or relation
derived from A by replacing every instance of the given symbol by the
union of the symbols on the list.
-
union net (n-ary)
-
[A | B] denotes the union of expressions A and B. All the strings
and string pairs that belong to either A or B (or both) also belong to
the union.
-
upper-side net (unary)
-
[A.u] denotes the upper language of the relation A.
-
zero-plus net (unary)
-
[A*] denotes the set of strings that consists the empty string
and any number of concatenations of A with itself. [A*] is like
[A+] except that it always includes the empty string.
N.B. There are many regular expression operators without an equivalent
command in XFST. For example, the replace (->) and restriction (=>) operators
do not correspond to any XFST command.
We welcome your
comments and suggestions.
Copyright © The Document Company - Xerox 1997. All rights reserved.
Written by Lauri Karttunen, Tamás Gaál,
and André Kempe.
Last modified on
Sunday, 25 October 1998.