[Prof. Ben-Avi]

SHRDLU - Detailed comments:

It is very important that we do not overestimate the significance of Terry Winograd's work in the way that ELIZA and PARRY were overestimated.

 

HETERARCHICAL THINKING

Terry Winograd's program is, in outline, three programs working in concert. What was new was the flexible manner of their organization. Programs may be combined serially, hierarchically, or heterarchically.

 

• In serial ("pass-oriented") combination, one program is run on the initial data and another then takes over, using the results output by its predecessor as its own data. (Typical of a compiler).

 

• In hierarchical combination, one program has overall control and the others are subordinate to it as mere subroutines in the service of the goals of the master program. The subordinate programs need not be directly accountable to the highest one, since there may be several hierarchical levels involved. But the flow of control passes in one direction only: downward. Consequently, although the responsibility for control is distributed throughout the system, in the sense that each lower level program has its own type of job to do, authorization must always derive from a higher level. Although subordinate programs can influence super-ordinate ones (in the sense that the super-ordinate's actions may depend upon information gleaned from the running of lower level ones), they cannot turn round and tell a super-ordinate module what to do. Moreover, they cannot communicate directly at all with programs lying "sideways" in a hierarchy (sister modules). Hierarchical control may be passed to the subordinate programs in a fixed or a variable order. Performance is naturally less rigid in the latter case, but even in such a case the overall decision as to which lower level module should be activated next lies with some higher level member of the hierarchy. Ultimately, all decisions directing performance are dependent on the highest level of all. A hierarchical assembly of routines is clearly more flexible than a strictly serial arrangement, since it is possible for the action (and activation) of the lower level units to be differentially directed by the master program, in light of the overall problem situation.

 

• In a heterarchical organization, however, the responsibility for control can be more equally distributed throughout the system, and internal communication is much increased. Programs that are related heterarchically can address or call upon each other either "up", "down", or "sideways." Moreover, they can do this at many different points in their (potentially independent) functioning. The human analogy is a group of intercommunicating specialists contributing their several skills to a coöperative enterprise, rather than a troop of servants each unquestioningly obeying their mistress -- or her butler, cook, and housekeeper.
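
The contrast is easily sketched. In the toy Python fragment below (entirely my own invention -- no resemblance to Winograd's actual code is intended), the hierarchical master calls its subroutines and they never answer back, whereas the heterarchical specialists hold references to one another and can consult a peer in mid-task:

```python
# Hierarchical: the master alone decides what runs next; control flows downward.
def master(sentence):
    tokens = tokenize(sentence)     # master calls down...
    tree = parse(tokens)            # ...and down again; the subroutines
    return evaluate(tree)           # never call the master, or each other

def tokenize(text):
    return text.split()

def parse(tokens):
    return {"head": tokens[0], "rest": tokens[1:]}

def evaluate(tree):
    return len(tree["rest"])

# Heterarchical: specialists hold references to peers and consult them mid-task.
class Semantics:
    def plausible(self, tree):
        return len(tree["rest"]) < 3    # an arbitrary stand-in for "makes sense"

class Syntax:
    def __init__(self, semantics):
        self.semantics = semantics      # a peer, not a boss
    def parse(self, tokens):
        tree = {"head": tokens[0], "rest": tokens[1:]}
        # mid-parse, ask the semantic peer whether the partial result makes sense
        if not self.semantics.plausible(tree):
            tree = {"head": tokens[0], "rest": []}    # backtrack and retrench
        return tree

syntax = Syntax(Semantics())
print(master("a b c"))                               # -> 2
print(syntax.parse("put the blue pyramid".split()))  # -> {'head': 'put', 'rest': []}
```

The essential point is the line in which Syntax consults Semantics before committing itself: control there flows "sideways", not merely downward.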

 

To be sure, coöperation can be incompetent, and heterarchy is not a magical guarantee of efficiency: the camel, it has been said, is a horse designed by a committee. Coöperating experts must not only be expert: they must normally also be guided by a clear idea of the overall goal if they are not to risk being sidetracked into irrelevancies. Sometimes this clear idea is shared by all members of the committee, but more often it is the responsibility of the chairman, to whom individual members continually send in reports; if the chairman has insufficient strategic control of the committee to guide it accordingly, what was to be a horse may turn out to be a camel. Occasionally, however, the requirements of the task are so restrictive as to keep the coöperative efforts within the bounds of relevancy, without anyone's having a global view of all aspects of the task.

 

In fruitful heterarchical coöperation, the specific problem conditions at a given time nicely determine which specialists will communicate with their fellows, and what type of communication (question, complaint, command, advice, criticism, answer, etc.) will be employed. The result is an enormous increase in the flexibility of performance (particularly in the sensitivity of control to context), as compared with serial or hierarchical arrangements.

 

The three programs that, broadly speaking, comprise Winograd's language-understanding system are concerned with grammar, semantics, and deduction.

• The first is a parsing system that embodies a particular theory of English grammar and uses it to recognize the syntactic structure of sentences.

• The second is a set of semantic programs dealing with meaning (whether of words, word groups, or whole sentences). This system is built around a collection of "semantic specialists" designed to interpret particular syntactic structures such as noun group, adjective group, preposition group, etc.

• And the third is a deductive system that can solve problems of various kinds in exploring the consequences of facts, planning actions, and answering questions, and that includes a body of knowledge about the specific universe of discourse chosen -- viz., the "table-top" world of blocks and pyramids pictured above.

 

None of these three programs has a global view of what the system as a whole is doing. Even the "Monitor" program, which is the master program in the trivial sense that it calls the basic parts of the system, has no such view; it does not really monitor what goes on, for this would require receiving continual reports on progress and adjusting the activities of the various specialist programs accordingly. Winograd points out that most of the communication between components is done directly, and the monitor is called only at the beginning and end of the understanding process. In other words, it decides when SHRDLU is to listen and when it is to answer, but does not help it to listen or answer intelligently. Where SHRDLU's understanding is concerned, it is as if no overall chairman wisely guides the committee, but each member "does his/her own thing" with constant help from his/her peers.

 

A supervisory chairman is unnecessary because the experts are "tailor-made" to the particular, highly limited, tasks undertaken by the system. This is not strictly true of the parsing program, which can parse very many English sentences. But, by the same token, the rules of syntax are so restrictive that the parser can readily recognize grammatically unacceptable constructions, so that these have no chance of passing unchecked. By contrast, the semantics and world knowledge provided by SHRDLU are grossly limited compared with human understanding (it knows only about blocks, and only very little about these), so that it is not possible for it to go wildly off track. A more knowledgeable program, with more varied types of expertise available to it, would need a genuine monitor with a global idea of what it was doing if its versatility were to be kept sensibly within bounds.

 

But SHRDLU is heterarchical rather than hierarchical in its functioning. Thus although Winograd describes the grammar program as "the main coordinator of the language understanding process", he also points out that the deductive system is "used by the program at all stages of the analysis, both to direct the parsing process and to deduce facts about the 'table-top' world." Moreover, each of the specialist semantic programs "has full power to use the deductive system, and can even call the grammar to do a special bit of parsing before going on with the semantic analysis." In short, to single out one component of the heterarchical system of programs as the crucial or overriding element would be invidious. In practice, the various systems coöperate in subtle ways so as concurrently to interpret the sentence presented to the machine.

 

As will be evident from the example to be described below, the deductive system is used both in the semantic evaluation that guides the parsing and in the program's answers and (simulated) motor actions. In essence, then, SHRDLU is a "theorem prover." Theorem-proving programs are aptly named: their function is to deduce theorems from an axiomatic base of knowledge by using strictly logical methods of reasoning. Some employ only very general methods, which are universally applicable in principle whether or not they are always sufficient in practice to solve the problem concerned; one example of such a technique is "resolution", whereby an assertion is shown to be a theorem by proving that its negation is inconsistent with the axioms. Others, including SHRDLU, rely also on specific problem-solving strategies that are appropriate to particular problems only. But in either case the reasoning is "logical", or deductive, in nature. Non-deductive methods of reasoning may follow strict rules -- for instance, the "paranoid" attributions of malevolence in Colby's PARRY -- but these rules are not rules of proof. In assessing the truth of a statement (whether input to them or thought up by them), their prime concern is whether it is probable, given the relevant evidence available, not whether it is strictly entailed (or, conversely, contradicted) by the existing knowledge of the system.
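
Resolution can be illustrated in a few lines of Python (a deliberately minimal sketch of propositional resolution only, not the first-order procedure used by real theorem provers):

```python
def resolve(c1, c2):
    """All resolvents of two clauses (clauses are frozensets of literals)."""
    out = []
    for (name, sign) in c1:
        if (name, not sign) in c2:
            out.append(frozenset((c1 - {(name, sign)}) | (c2 - {(name, not sign)})))
    return out

def refutes(clauses):
    """True if the empty clause is derivable, i.e. the set is contradictory."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolve(a, b):
                        if not r:          # the empty clause: contradiction
                            return True
                        new.add(r)
        if new <= clauses:                 # nothing further derivable
            return False
        clauses |= new

# Axioms: p, and (not-p or q), i.e. "p implies q".  Goal: prove q.
axioms = [frozenset({("p", True)}),
          frozenset({("p", False), ("q", True)})]
negated_goal = frozenset({("q", False)})   # assume the goal is false...
print(refutes(axioms + [negated_goal]))    # -> True: ...contradiction, so q holds
```

The method is indirect throughout: the assertion is never derived "forwards", but its denial is shown to be untenable.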

 

Unlike PARRY's, SHRDLU's thought processes are rigorously logical in form. When seeking an empty space on the table, for instance, in which to place a block it wishes to get rid of (as in item 1), its technique is deductively to work toward the goal of identifying a space that can be proved to fit the description "empty." It also has to prove that its hand is not already holding anything before it can pick up the big red block as requested. I'll show later that this logical purity has certain disadvantages, which would be more apparent were SHRDLU required to function in an "open" world (like yours and mine) in which whatever is not entailed by the known data cannot therefore be simply assumed not to be the case. (Think about it!!) SHRDLU's world is "closed" in the sense that the axioms are assumed to imply all the truths it needs to know.
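
The closed-world point can be made concrete with two invented query functions (the facts and relation names below are mine, purely for illustration):

```python
facts = {("on", "blue-pyramid", "table"), ("in", "green-block", "box")}

def closed_world(query):
    # whatever the stored facts do not entail is simply taken to be false
    return query in facts

def open_world(query):
    # in an open world, absence of proof is not proof of falsehood
    return True if query in facts else "unknown"

print(closed_world(("on", "red-block", "table")))   # False: merely assumed false
print(open_world(("on", "red-block", "table")))     # "unknown": nothing entails it either way
```

SHRDLU reasons like closed_world; in a world like ours, open_world is the honest answer.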

 

Winograd's system parses the input sentence from left to right as it goes along, anticipating the syntactic structure of the portion yet to be considered. It applies its semantic and environmental knowledge at appropriate points in the parsing procedure to eliminate ambiguities. If necessary, it can backtrack to weed out parsings that it discovers, after all, to have no sensible application in context. Usually, the backtracking is not blind but is guided by the specific nature of the difficulty; the requisite advice is embodied in the syntactic and semantic theorems themselves. Like a human being who does not regard a sentence as ambiguous on a particular occasion of its use, even though it may be so in principle, Winograd's system assigns only one parsing in cases where an alternative syntactic analysis might seem to be possible. Earlier parsing programs had to cope with ambiguity (if at all) by returning all possible parsings of the input string, being unable to distinguish the contextually correct version. SHRDLU is significantly more intelligent.

For instance, the correct interpretation of the potentially ambiguous sentence, "Put the blue pyramid on the block in the box", is achieved only because the deductive knowledge system can be called to check which of the alternative parsings makes sense in the current situation (cf. item 34). [Put the blue pyramid (on the block in the box) or Put (the blue pyramid on the block) (in the box).]
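
A hedged sketch of this consultation, with an invented world model standing in for SHRDLU's deductive system (object names and relations are mine, chosen only to make the two readings explicit):

```python
world = {("on", "blue-pyramid", "table"), ("in", "green-block", "box")}

def holds(rel, a, b):
    # stands in for a question put to the deductive system
    return (rel, a, b) in world

def interpret(sentence):
    # Reading 1: locational "on" -- the pyramid is already on the block,
    # so "on the block" qualifies the noun group "the blue pyramid"
    if holds("on", "blue-pyramid", "green-block"):
        return ("put", "the blue pyramid (which is) on the block", "into the box")
    # Reading 2: directional "on" -- "in the box" qualifies "the block"
    if holds("in", "green-block", "box"):
        return ("put", "the blue pyramid", "onto the block in the box")
    return None   # neither reading makes sense in this world

print(interpret("Put the blue pyramid on the block in the box"))
# -> ('put', 'the blue pyramid', 'onto the block in the box')
```

Since the pyramid here is not already on any block, the first reading fails the world-check and the directional reading is adopted -- just as in item 34.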

 

The grammar program has to identify the major clause as either declarative, imperative, or a question. It first parses the initial word as an imperative, and takes it to be the main verb. This is possible since "put" is classified in the dictionary as a verb in the infinitive form, and the parser has the information that such verbs in the initial position are usually imperatives. The imperative, declarative, and interrogative moods are basic theoretical elements in systemic grammar, which differs from transformational grammar in identifying basic syntactic elements largely by reference to the meanings conveyed rather than by reference to formal structure alone. As Winograd puts it, systemic grammar gives priority to the question "Which features of a syntactic structure are important to conveying meaning, and which are just a by-product of the symbol manipulations needed to produce the right word order?" This aspect of systemic grammar makes it particularly suitable for facilitating fruitful communication between a "syntactic" parsing program and a "semantic" program concerned primarily with meaning.

 

Having identified an imperative, the parser expects to meet a noun group -- and parses "the blue pyramid on the block" accordingly. The parser may be described as "expecting" a noun group because its knowledge about verbs includes the advice that it should look first for a noun group on having met a transitive imperative; but it does not rigidly assume the presence of a noun group, and so would be able to parse sentences like the imperative "Hammer harder!"

 

The typical noun group structure that the parser expects to find can be represented as follows:

 



DET ORD NUM ADJ* CLASF* NOUN Q*

 

 

This diagram shows the syntactic "slots" in the order in which they occur within the phrase; not all the slots need be filled, though there is almost always a NOUN and usually also a DETerminer (such as "the", "a", "her", or "Mary's"). The "*" sign means that there may be more than one word of the class concerned, so the parser should not overhastily assume that it can pass immediately to the next slot in the list. After the initial DETerminer comes the ORDinal slot, which may contain words like "next", "fourth", "tenth", or "last." The NUMber slot can contain either single words or phrases, such as "seven" or "more than two hundred." There may be several ADJectives, such as "big, red, beautiful." Equally, there may be more than one CLASsiFier -- that is, a noun currently being used as a quasi-adjective, to describe the main NOUN: as in "city fire hydrant." The main NOUN may be followed by one or more Qualifiers, as in "the man in the moon", or "the woman with red hair, who conducts the orchestra." Sometimes all the possible slots are filled within one noun group, as for instance in "the first three old red city fire hydrants without covers you can find." This is parsed as DET, ORD, NUM, ADJ, ADJ, CLASF, CLASF, NOUN, Q (preposition group), Q (clause).
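
The slot sequence can be turned into a toy matcher (the lexicon and control structure below are my own simplification, not PROGRAMMAR; qualifiers after the NOUN are omitted):

```python
# A hand-made lexicon mapping words to their slot classes.
LEX = {"the": "DET", "first": "ORD", "three": "NUM",
       "old": "ADJ", "red": "ADJ", "city": "CLASF", "fire": "CLASF",
       "hydrants": "NOUN"}

# Slot order with arities: "?" = at most one word, "*" = any number, "1" = required.
SLOTS = [("DET", "?"), ("ORD", "?"), ("NUM", "?"),
         ("ADJ", "*"), ("CLASF", "*"), ("NOUN", "1")]

def parse_noun_group(words):
    tags = []
    i = 0
    for slot, arity in SLOTS:
        if arity == "*":                              # zero or more words
            while i < len(words) and LEX.get(words[i]) == slot:
                tags.append(slot)
                i += 1
        elif i < len(words) and LEX.get(words[i]) == slot:
            tags.append(slot)                         # one word, consumed
            i += 1
        elif arity == "1":                            # the NOUN slot must be filled
            return None
    return tags if i == len(words) else None

print(parse_noun_group("the first three old red city fire hydrants".split()))
# -> ['DET', 'ORD', 'NUM', 'ADJ', 'ADJ', 'CLASF', 'CLASF', 'NOUN']
```

Note how the "*" slots oblige the matcher to keep testing before moving on -- exactly the caution the text recommends to the parser.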

 

It is because the parser expects to meet this structure when it looks for a noun group that the words "on the block" in our example are initially included by it within the noun group (that is, as attached to "the blue pyramid"). For "on the block" is a preposition group that in this position can function as a qualifier of the main noun.

 

However, one intuitively senses that "on the block" is ambiguous in our sample sentence. It might indeed be a phrase qualifying the object previously identified as the head of a noun group -- that is, the blue pyramid might already be sitting on the block. Alternatively, the preposition group might be using "on" in a directional rather than a locational sense, to say where the thing identified by the noun group is to be put. In the first case, the phrase "on the block" should be included within the noun group already identified, whereas in the second it should not. How does Winograd's system resolve this ambiguity?

 

Whenever a main section of a parse (such as a noun group) has been finished, the parsing program hands control over to the semantic program to see if the parse makes sense so far. The semantic program confirms, for instance, that "pyramid" is classified by the BLOCKS system of semantic markers as something which is a sort of physical object that can be manipulated by the robot (see Figures). The semantic program thus has no reason for rejecting this parsing as inherently nonsensical -- as it has with regard to the suggestion that tables can pick up things (cf. item 9). Up to and including the word "pyramid", then, the sentence causes no special difficulty.

 

But the potential ambiguity of "on the block" is recognized by the semantic specialist for "on", which knows that "on the block" can be used in a locational sense to qualify a noun such as "pyramid", but which does not know for sure whether it is being used in this sense on this occasion. In order to decide, the semantic specialist addresses the deductive program to ask for information: is the (one and only) blue pyramid currently on the block? If the deductive program replies that it is, the semantic program confirms the initial hypothesis of the parsing program, so that "on the block" is taken to be a qualifier of "the blue pyramid." If the deductive program replies that it is not, the parser backtracks to the point where the noun group has been identified simply as "the blue pyramid", and seeks another interpretation for the preposition group.

 

The interpretation corresponding to "onto the block (which is) in the box" is suggested next, and passed to the semantic program and thence to the deductive program for checking. Since this alternative implies interpreting the second preposition group ("in the box") as a qualifier of "the block", the world-model is consulted to confirm that there is one (and only one) block currently in the box.

 

It is important to realize that the ambiguity of "on" in this example is dependent on the presence of the subsequent -- and equally ambiguous -- preposition "in." That is, if the sentence had instead been "Put the blue pyramid on the block", then "on" would necessarily have been interpreted as "onto." The reason for this -- as systemic grammar makes explicit -- is that one may confidently expect an answer to the question "where is the thing to be put?", whereas one may or may not find an answer to the question "where is the thing now?" Since both "in" and "on" can be used to answer either of these where? questions, the original example was syntactically ambiguous. But if the program had found that there was no other preposition group after "on the block", its knowledge of systemic grammar would have led it to detach this phrase from the noun ("pyramid") and reparse it as associated rather with the verb, "Put."

 

(Strictly, there are three senses of "in" distinguished by Winograd's program. There is the directional sense, meaning "into." And there are two locational senses: it may mean "contained in", as in the phrase "in the box"; alternatively, it may mean "forming part of", as in the phrase "in the stack.")

 

In this example, it is the environmental context (the state of the world) that determines the interpretation of the input sentence. Sometimes, however, it is the verbal context that is crucial, as in item 5 (where "the pyramid" is assumed to refer to the pyramid just mentioned -- cf. item 2). The semantic program makes a habit of storing the immediately preceding discourse for use in cases where the reference of pronouns (such as "it" -- item 42) or of noun groups (as in item 5) is determined by verbal context. SHRDLU is more efficient in this regard than PARRY: for whereas PARRY merely places specific items such as "horseracing" on the expectancy list, for use in disambiguating likely pronominal references as in "What do you enjoy about it?", SHRDLU stores whole sentences. Moreover, SHRDLU, unlike PARRY, has access to information about the world and its own past actions in it. In general, problems of reference are settled by the semantic and deductive programs working coöperatively.
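
A toy sketch of such a discourse memory (the data structures are invented, and SHRDLU's own mechanism is considerably richer):

```python
history = []   # noun groups from the preceding discourse, newest last

def mention(noun_group):
    history.append(noun_group)

def resolve(phrase):
    if phrase == "it":
        return history[-1] if history else None   # crude: most recent mention
    if phrase.startswith("the "):
        head = phrase.split()[-1]                  # "the pyramid" -> "pyramid"
        for ng in reversed(history):               # newest matching mention wins
            if ng.endswith(head):
                return ng
    return None

mention("a small red pyramid")
mention("the big green block")
print(resolve("it"))            # -> a small red pyramid? No: the big green block
print(resolve("the pyramid"))   # -> a small red pyramid
```

Even this crude recency rule resolves the two cases in the text; the real system also lets the deductive program veto candidates that make no sense in the world.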

 

When the reference is determined by environmental information, the deductive program must access its internal model of the world and draw upon the knowledge stored there in order to interpret the sentence. References determined by the context of discourse, on the other hand, do not necessitate active recourse to the world model, but merely require that the linguistic "anchor" concerned be identified. Both types of reference may occur independently within a single sentence, as in item 3: "Find a block which is taller than the one you are holding and put it into the box"; sometimes both types of problem arise in connection with one and the same pronoun, as in "the one which . . ." in item 7: "Is at least one of them narrower than the one which I told you to pick up?"

 

Strategies for interpreting (and effecting) references to actual items and events are required not only to enable SHRDLU to engage with the world by acting within it, but also to provide the program with a belief system as opposed to a mere conceptual structure. This distinction underlies the familiar retort "If the cap fits, wear it!", the speaker having supplied an abstract conceptual structure while the hearer has the task of establishing its particular reference.

 

Naturally, SHRDLU's semantic specialist for "it" must know the full range of possibly relevant alternatives if it is reliably to mediate sensible interpretation of any given expression. If it assumed that "it" can refer only to a physical object, the program as a whole would be unable to interpret items 30 and 31. And since even human grammarians do not have a formal (as opposed to an unformalized, implicit) understanding of all uses of the pronoun "it", there are sentences involving this word that Winograd's system could not parse, even if the rest of the sentence presented no difficulties. Further, the program does not have the semantic information that "it" can be used (nominally rather than pronominally) to mean the chaser in a game of tag. Consequently, sentences employing these senses of the word would be obscure to SHRDLU.

 

Anyone who can interpret all occurrences of "it" within SHRDLU's conversation, not to mention more exotic uses, must have a linguistic competence at least as powerful as that incorporated in the program. Human understanding may differ in various ways from SHRDLU's, but it must be able to mediate comparable feats of comprehension in assigning sense and syntax to natural language. In short, the notion that understanding is a simple matter may be introspectively plausible, but must be firmly rejected. Understanding "it" is no less complex than interpreting remarks about law and order as indicative of a particular ideological viewpoint, or recognizing abandonment as a species of betrayal. This everyday linguistic achievement not only demands a flexible (heterarchical) interplay of one's knowledge about various matters, but typically involves a far richer knowledge base than that available to SHRDLU. Now let us look more closely at just what it is that SHRDLU knows.

 

WHAT SHRDLU KNOWS

 

One may broadly classify the content of SHRDLU's knowledge as of three types. First, there is very general knowledge about problem solving in the abstract -- what one might term "general problem-solving skills." This knowledge is implicit in the ad-hoc organization of the program as a whole, rather than being explicitly itemized in a distinguishable portion of the data base, and it is used by SHRDLU in all phases of thinking, whether of a primarily "linguistic" or "environmental" character.

 

One example of this first category of knowledge is deduction itself. We saw in the previous section that all stages of SHRDLU's understanding rely extensively on the deductive system. But one should not assume that SHRDLU's powers of deduction are as great as ours. For example, its reasoning cannot distinguish between knowing that it is false that something is the case and not knowing whether it is the case or not. The reason for this is that the PLANNER language in which the deductive programs are written embodies an inadequate concept of negation. However, successor PLANNER-like languages can deal more subtly with negation, so that successors of SHRDLU would not be similarly unable to distinguish between "No" and "Dunno." Consequently, if the information that bats are the only flying mammals were provided as a contingent fact about bats, SHRDLU's descendants (but not SHRDLU) could reply with a confident "No" to the question whether pigs have wings. Only if the information were coded at the semantic level, as part of the taxonomic meaning of "bat" and "pig", would SHRDLU get this right (cf. item 9).
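
The "No"/"Dunno" distinction amounts to three-valued answering, which can be sketched as follows (the facts are illustrative, not SHRDLU's):

```python
facts = {"bat is a mammal", "bat can fly"}
known_false = {"pig can fly"}

def answer(statement):
    if statement in facts:
        return "Yes"
    if statement in known_false:
        return "No"
    return "Dunno"   # not entailed either way; a strictly closed-world
                     # reasoner would collapse this case into "No"

print(answer("bat can fly"))   # -> Yes
print(answer("pig can fly"))   # -> No
print(answer("cow can fly"))   # -> Dunno
```

SHRDLU, lacking the third value, is forced to treat the last two questions alike.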

 

Again, SHRDLU can cope, rather stupidly I'm afraid, with only a few uses of modal words like "can" (see items 10 and 11). It cannot handle all questions concerning possibilities, nor can it reason sensibly (or even at all) about counterfactual conditionals, such as "How many eggs would you have been going to use in the cake if you hadn't learned your mother's recipe was wrong?" (Good grief!) or "What would you have done if the big red block had been in the box?" We shall see shortly that this is not because the grammar programs cannot parse conditionals and subjunctives: they can. As Winograd's comment to item 11 suggests, some of SHRDLU's failings in handling "can" and "must" could be ameliorated by giving it more power to analyze its own theorems in answering modal questions. And he points out that further improvement would result from a version of PLANNER that could temporarily move into a hypothetical world in answering questions, instead of having to consider only the world as it is. Indeed, more recent systems can do precisely this, and do so specifically in order to enable newer programs to answer counterfactual questions and questions about possibilities. Without going into details here, these examples of negative and modal reasoning should indicate how the development of increasingly powerful programs goes hand in hand with the development of more powerful programming languages. (PLANNER can be seen as a primitive PROLOG in many ways).

 

Some of SHRDLU's deductive inadequacies are due not to the programming language used by Winograd so much as to specific choices made by him for convenience's sake. For instance, SHRDLU's use of universal quantifiers (like "all" and "every") is limited by the fact that for the robot, "every" means "every one I know about". If told, as in item 14, that its human friend owns all the blocks that are not red, SHRDLU makes a mental note that if it ever wants to find out whether the friend owns a block, all it needs to do is establish that the block is not red. But suppose the block in question is in the next room? SHRDLU knows nothing about it, so cannot deduce that it is red. It cannot even say something like: "If it's red, then you own it", because it has no way of generalizing "all" to cover things that it knows nothing about. (A more intelligent program could ask the color of the block, but SHRDLU cannot.)

 

Winograd points out that since this is not a consequence of the basic deductive capacities or of the semantics, the system could be expanded so as to discuss genuinely universal statements.

 

Another general problem-solving skill is referring to an inner model of one's own intellectual capacities in planning appropriate action, and in backtracking to a previous point on realizing a mistake. Thus SHRDLU can conceptualize, or think about, picking up a big red block as distinguished from actually doing so, and can hypothetically consider a particular parsing of a phrase before committing itself finally to that interpretation.

 

Backtracking in a parsing problem is not blind, because Winograd wrote the PROGRAMMAR language so that it would note reasons for failure and suggest the best ways of recovering. However, the PLANNER language in which the BLOCKS world is manipulated does tend to encourage inefficient backtracking, and when SHRDLU fails to obey a command it does not know why it failed. Even so, some facility for backtracking is better than none at all -- unless one can write a program with the superhuman faculty of never making any mistakes, right?

 

In planning its moves, SHRDLU refers to its inner model of its own capacities. The distinction between thought and action in SHRDLU is signaled by the simple convention of omitting or including shriek-marks around the name of the operation concerned. For instance, if the program activates the function !MOVETO! then SHRDLU will move the specified object to the specified position; but if the program activates the function MOVETO then SHRDLU will merely think about how and whether to do so. In short, just as thinking about moving and moving are crucially different cognitive operations, so MOVETO and !MOVETO! are different functions for Winograd's program. By the simple expedient of systematically deleting all shriek-marks from the program one could reduce SHRDLU to a poor copy of Prince Hamlet, an ineffectual robot unable to translate even its most considered deliberations into actions in the real world. As it is, however, the decisive intelligence of SHRDLU's linguistic and block-moving performances is largely dependent on its inner ability to look before it leaps.
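
The convention can be mimicked in Python (moveto_bang stands in for !MOVETO!, since "!" is not legal in Python identifiers; the world model and feasibility check are invented for this sketch):

```python
world = {"pyramid": (0, 0)}   # a one-object world model, invented for illustration

def moveto(obj, pos):
    """Think about the move: inspect the world, report a plan, change nothing."""
    return {"object": obj, "from": world[obj], "to": pos, "feasible": True}

def moveto_bang(obj, pos):    # stands in for !MOVETO!
    """Actually perform the move, if the planned version is feasible."""
    plan = moveto(obj, pos)
    if plan["feasible"]:
        world[obj] = pos      # the only line in which the world is changed
    return plan

moveto("pyramid", (3, 4))
print(world["pyramid"])       # -> (0, 0): thinking about the move changed nothing
moveto_bang("pyramid", (3, 4))
print(world["pyramid"])       # -> (3, 4): the "shrieked" version acted
```

Deleting the "shrieked" variant everywhere would leave only the first, contemplative function -- the Hamlet robot of the text.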

 

As well as using a general model of one's abilities in planning, one often refers to a specific model of one's past achievements (and failures) in explaining one's actions and in learning how to do better. As the responses to items 25 to 29 show, SHRDLU shares the first of these capacities; indeed, unlike neurotics such as Colby's patient, who formed the original model for his neurotic program, SHRDLU has the complete goal-subgoal tree stored for explanatory purposes. If some goal or subgoal were inaccessible (repressed), the robot would either have to answer "I DON'T KNOW" to some of the questions in this part of the conversation, or would have to rationalize in some way so as to suggest a plausible reason, which might or might not be the real reason. The second capacity -- noting the purposive structure of one's activities so as to learn how to do better -- is not available to SHRDLU. SHRDLU does not have the insight into its own behavior that would be needed for such learning.
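
A goal-subgoal tree adequate for answering "why?" questions can be sketched in a few lines (the goal names are invented, and the real SHRDLU records much more than parent links):

```python
# Each recorded action points to the goal it served; the top-level
# command has no parent -- it was simply what the user asked for.
tree = {
    "clear-top-of-red-block": "pick-up-red-block",   # subgoal -> parent goal
    "move-green-pyramid":     "clear-top-of-red-block",
    "pick-up-red-block":      None,                  # top-level command
}

def why(action):
    parent = tree.get(action)
    if parent is None:
        return "BECAUSE YOU ASKED ME TO"
    return "BECAUSE I WANTED TO " + parent.upper().replace("-", " ")

print(why("move-green-pyramid"))   # names the parent subgoal
print(why("pick-up-red-block"))    # the top goal: the user's command
```

Answering items 25 to 29 needs nothing more than walking up this tree; learning from it, however, would require analyzing its structure, which SHRDLU cannot do.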

 

The second broad category of SHRDLU's knowledge is a rich store of specialized linguistic knowledge, both syntactic and semantic in nature, which can be deployed in any English parsing exercise irrespective of content and context. This knowledge is embodied in the grammar and semantic programs. It includes problem-solving skills or heuristics that are peculiarly linguistic -- incorporated, for example, in the aforementioned hints that a transitive imperative verb will very likely be followed by a noun group and that a manipulative imperative like "Put . . ." is more reliably accompanied by a preposition used in its directional than in its locational sense. The definitions of "clause", "noun group", "question", and other syntactic categories are themselves specialized programs for parsing these structures, and the equivalent semantic specialists know precisely how to check whether the tentative syntactic assignment makes sense. The linguistic programs also include definitions of words that are essential to any English dictionary, like "it", "on", "and", and "the", as well as a special procedure for dealing with common word endings, like "-ing", "-ed", and so on. Other words may be regarded as optional: ignorance of "pyramid", for instance, would not often reduce one to incoherence.

 

Although SHRDLU can converse sensibly only about pyramids and other inhabitants of the BLOCKS world, the program can parse sentences containing non-BLOCKS words like "eggs", "cake", "mother", and "recipe", provided that minimal relevant semantic information (such as that "mother" is an animate noun) is included within the definitions of the words in question. For instance, the program can parse (though not reply to) the sentence, "How many eggs would you have been going to use in the cake if you hadn't learned that your mother's recipe was wrong?" One should note not only the complex syntax of the verbs "use" and "learn" as employed in this sentence, but also the fact that the two final words determine the grammatical interpretation of the previous noun group "your mother's recipe." The linguistic distinctions that underlie correct parsing of this sentence, and which are not introspectively available to a person gossiping casually in a kitchen, are made explicit in the figure below (page *), which shows the final parsing produced by the program. (The grammatical abbreviations are explained in Winograd's text, though most of them are easily recognizable if examined in relation to the specific sentence being parsed; the plethora of brackets is a syntactic feature of the LISP programming language and is one of the reasons I don't use it.)

 

SHRDLU's grammatical knowledge enables it to parse many English sentences, although there are a few syntactic constructions it cannot handle (sometimes because human grammarians cannot handle them either). Some apparently simple words can be satisfactorily parsed by SHRDLU, but not subtly used or really understood, since their semantic force has not been fully programmed. For instance, SHRDLU can parse sentences containing the conjunctions "and" and "but", but it cannot understand the difference in meaning between them and so has to use (and interpret) them as equivalent to each other. Again, SHRDLU has no notion of the many different meanings of "and", whereby it can convey causal relation, temporal succession, social rank, and various other matters. These semantic subtleties are intuitively understood by us, but have not been clearly expressed in theoretical form. (Part of the meaning of "but" might be expressible by reference to R. C. Schank's semantic theory, described later.)

 

The third broad type of knowledge available to SHRDLU is semantic knowledge, of which there are two varieties. On the one hand, there is general semantic knowledge associated with the parsing procedure, such as the distinction between animate and inanimate nouns, and the meaning (or, as we have seen, part of the meaning) of words like "and", "but", and "or." On the other hand, SHRDLU has semantic information specific to the particular domain of discourse assumed in the cited conversation, namely, the BLOCKS world.

 

This domain-specific knowledge mediates relatively insightful discourse and action, as opposed to the blind parsing of isolated sentences like the query about the eggs in the cake. Just as Colby's neurotic knows only what it is explicitly told about "love" and "hate", so the only properties of blocks of which SHRDLU has any inkling are those explicitly mentioned in the BLOCKS-world description provided. It has already been remarked that SHRDLU knows very little about blocks, if compared with human beings. The importance of Winograd's program, however, is that it knows a good deal more than earlier programs did and is a wonderful front end for any micro-world.

 

((HOW MANY EGGS WOULD YOU HAVE BEEN GOING TO USE IN THE CAKE

IF YOU HADN'T LEARNED YOUR MOTHER'S RECIPE WAS WRONG)

(CLAUSE MAJOR QUEST NGQUES POLR2 ACTV OBJ1Q TRANS)

(((HOW MANY EGGS)

(NG QUEST HOWMANY NDEF NPL DET)

((HOW (QDET)) (MANY (QDET)) (EGGS (NOUN NPL)))) (WOULD (VB AUX MODAL QAUX)) ((YOU) (NG SUBJ DEF NS NPL) ((YOU (PRON NPL NS SUBJ OBJ)))) ((HAVE BEEN GOING TO USE) (VG MODAL NAGR (FUT PAST MODAL)) ((WOULD (VB AUX MODAL QAUX)) (HAVE (HAVE VB AUX INF TRANS)) (BEEN (AUX VB BE EN)) (GOING (VB ITRANS ING)) (TO (TO)) (USE (VB INF TRANS MVB)))) ((IN THE CAKE) (PREPG)

((IN (PLACE PREP PLACE))

((THE CAKE)

(NG OBJ DET NS DEF)

((THE (DET NPL NS DEF)) (CAKE (NOUN NS)))))) ((IF YOU HADN'T LEARNED YOUR MOTHER'S RECIPE WAS WRONG) (CLAUSE BOUND DECLAR ACTV TRANS) ((IF (BINDER))

((YOU) (NG SUBJ DEF NS NPL) ((YOU (PRON NPL NS SUBJ OBJ)))) ((HADN'T LEARNED) (VG VPL V3PS NEG (PAST PAST)) ((HADN'T (HAVE VB AUX TRANS PAST VPL V3PS VFS NEG))

(LEARNED (VB TRANS REPOB PAST EN MVB))))

((YOUR MOTHER'S RECIPE WAS WRONG)

(CLAUSE RSNG REPORT OBJ OBJ1 DECLAR BE INT)

(((YOUR MOTHER'S RECIPE)

(NG SUBJ NS DEF DET POSES)

(((YOUR MOTHER'S)

(NG SUBJ NS DEF DET POSES POSS)

(((YOUR) (NG SUBJ POSS)

((YOUR (PRON NPL NS SUBJ OBJ POSS)))) (MOTHER'S (NOUN NS POSS)))) (RECIPE (NOUN NS))))

((WAS) (VG V3PS VFS (PAST))

((WAS (AUX VB BE V3PS VFS PAST MVB))))

((WRONG) (ADJG Q COMP) ((WRONG (ADJ)))))))))))

 

Sample Parsing Produced by SHRDLU. Winograd, Understanding Natural Language, p. 175.

 

Natural language front-ends talk about their particular spheres of interest (and they are able actively to use their knowledge in conversation, as we have seen). The BLOCKS-world knowledge includes "environmental" information about the size, shape, color, and current and past positions of the various items in the world, information that would normally be heavily reliant on one's perception of the scene. It includes "practical" knowledge about what causes what and about how to manipulate things in the BLOCKS world, such as that picking up a block may necessitate moving something off the top and finding an empty place to put it in (cf. item 1). And it includes "semantic" information about the implications and interrelations of the various concepts employed in thinking about and acting in the BLOCKS world; thus SHRDLU's insight that tables cannot pick up bricks is ultimately derived from the semantics represented in the next figure, there being no pathway linking TABLE to ANIMATE (the program's general semantics forbids the verb "pick up" to take an inanimate noun as its subject). See fig. page *

 

The distinction between environmental, practical, and semantic information is somewhat forced, because one can only state "facts" about the environment (including "practical possibilities") in terms of one's conceptual scheme. So SHRDLU cannot conceive of picking up a table. If SHRDLU's basic BLOCKS semantics were altered so as to include TABLE within the category of MANIP-ulable PHYS-ical OB-jects, this currently unthinkable idea could then enter SHRDLU's head. In short, SHRDLU's thought and action are crucially dependent upon the program's inner model of the world and of its own capacities for effecting changes in the world.

 

The information coded in these inner representations is integral to SHRDLU's planning of actions, which occurs independently of any specific instructions from the human interlocutor (who usually would be quite incapable of articulating such instructions). The reason for this is that PLANNER (in which the deductive program and the BLOCKS world knowledge are written) is a "goal-directed" language, in which one can ask that a general type of goal be achieved without having to tell the program precisely how to achieve it; thus one can say "Pick up a big red block" and leave SHRDLU to figure out how to do so.
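The flavor of a goal-directed language can be conveyed by a small sketch. This is hypothetical illustrative Python, not PLANNER: the world model, the block names, and the two goal procedures are all invented. The point is that the caller states only the goal ("pick up B1"), and the solver derives the intermediate steps, including the subgoal of clearing the block's top.

```python
# Hypothetical sketch (not Winograd's code) of goal-directed problem solving
# in the spirit of PLANNER.  The caller states WHAT it wants; the solver
# works out HOW, recursively posing subgoals.

world = {"B1": ["B2"], "B2": [], "TABLE": ["B1"]}   # what sits on top of what
plan = []                                            # actions, in order

def clear_top(x):
    """Achieve the goal 'nothing is on x' by moving obstructions away."""
    for obstruction in list(world[x]):
        clear_top(obstruction)                       # subgoal, solved the same way
        world[x].remove(obstruction)
        world["TABLE"].append(obstruction)
        plan.append(("MOVE-TO-TABLE", obstruction))

def pickup(x):
    """Achieve the goal 'the hand holds x'."""
    clear_top(x)                                     # precondition, stated as a goal
    plan.append(("GRASP", x))

pickup("B1")   # "Pick up the block" -- no step-by-step instructions supplied
```

The human (or the calling program) never specifies that B2 must be moved first; that step emerges from the goal decomposition.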

 

SHRDLU's knowledge, whether of syntax, semantics, or the BLOCKS world, is embodied as "procedures", or miniprograms, rather than as passive items or theorems stored in the data base. PLANNER "theorems" are miniprograms that specify -- and, when activated, control -- the execution of a particular set of steps in the proof procedures available for solving problems. Similarly, the PROGRAMMAR definitions of "noun clause" and the like are programs for parsing noun clauses. Even the semantic definitions of words like "it", "and", or "if" are LISP programs specifying what to do when one encounters these words in interpreting a sentence, rather than assertions that their meaning is such-and-such.
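The contrast between a word's meaning as an assertion and as a procedure can be made concrete with a toy sketch. This is invented illustrative Python, not SHRDLU's LISP; the context structure and the resolution rule (most recent noun wins) are assumptions chosen for brevity.

```python
# Hypothetical sketch: the "definition" of 'it' as a procedure that runs
# during interpretation, consulting the discourse context for a referent,
# rather than a stored assertion about what 'it' means.

def meaning_of_it(context):
    """Procedural definition of 'it': return the most recently mentioned noun."""
    return context["recently_mentioned"][-1]

context = {"recently_mentioned": ["the box", "the red block"]}
referent = meaning_of_it(context)   # resolves to "the red block"
```

An assertional representation would instead store a static statement about "it" and leave the work of applying it to some separate interpreter.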

 

There are various advantages in representing knowledge procedurally rather than assertionally; indeed, at the time of creating SHRDLU, Winograd regarded procedural representation as more crucial than he does now. One should merely note that knowledge can be represented procedurally, as knowledge how to do something, as opposed to knowledge that something is the case.

SHRDLU's knowledge, as has already been remarked, is grossly limited as compared with the knowledge implicit in human responses of the most ordinary kind. For example, SHRDLU has no understanding of personal relations or of any facet of human affective psychology. Having no representation of the relevant human psychology, SHRDLU is incapable of personal intuition, however crude. But understanding language about people, as opposed to blocks, would require some such representation, and would be needed even to interpret apparently simple sentences like " 'They are good pencils,' said Janet." This is evident from a recent attempt to formalize the intellectual processes involved in the comprehension of children's stories.

 

Trivial, right? Understanding simple children's stories? WRONG!

 

 

Technical details - summary.

 

The natural language processing system SHRDLU was written by Terry Winograd as his doctoral research project at M.I.T. It was written in LISP and MICRO-PLANNER, a LISP-based programming language. The design of the system was based on the belief that to understand language, a program must deal in an integrated way with syntax, semantics, and reasoning. The basic viewpoint guiding its implementation was that meanings (of words, phrases, and sentences) can be embodied in procedural structures and that language is a way of activating appropriate procedures within the hearer. Thus, instead of representing knowledge about syntax and meaning as rules in a grammar or as patterns to be matched against the input, Winograd embodied the knowledge in SHRDLU in pieces of executable computer code. For example, the context-free rule saying that a sentence is composed of a noun phrase and a verb phrase,

 

S --> NP VP,

 

is embodied in the PROGRAMMAR procedure:

 

(PDEFINE SENTENCE

(((PARSE NP) NIL FAIL)

((PARSE VP) FAIL FAIL RETURN)))

 

When called, this program, named SENTENCE, uses independent procedures for parsing a noun phrase followed by a verb phrase. These, in turn, can call other procedures. The process FAILs if the required constituents are not found. With such special procedural representations for syntactic, semantic, and reasoning knowledge, SHRDLU was able to achieve unprecedented performance levels.
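The control structure of the SENTENCE procedure can be approximated in a short sketch. This is hypothetical illustrative Python, not PROGRAMMAR: the word classes and vocabulary are invented, and the rich feature annotations of the real parser are omitted. What it preserves is the shape of the rule-as-procedure: parse a noun phrase, FAIL if it is absent, then parse a verb phrase with the remaining words.

```python
# Hypothetical sketch of the idea behind (PDEFINE SENTENCE ...): the rule
# S -> NP VP as a procedure calling subordinate parsing procedures, each of
# which either consumes words or FAILs.  Vocabulary is invented.

DET, NOUN, VERB = {"the", "a"}, {"block", "pyramid"}, {"moved", "fell"}

def parse_np(words):
    """NP -> DET NOUN; return the remaining words, or None on failure."""
    if len(words) >= 2 and words[0] in DET and words[1] in NOUN:
        return words[2:]
    return None

def parse_vp(words):
    """VP -> VERB; return the remaining words, or None on failure."""
    if words and words[0] in VERB:
        return words[1:]
    return None

def parse_sentence(words):
    """SENTENCE -> NP VP, succeeding only if the whole input is consumed."""
    rest = parse_np(words)
    if rest is None:
        return False                 # first constituent FAILs
    rest = parse_vp(rest)
    return rest == []                # second constituent must finish the input

parse_sentence(["the", "block", "moved"])   # succeeds
parse_sentence(["the", "moved"])            # FAILs: no noun after the determiner
```

In SHRDLU itself, each such procedure can also consult semantic specialists mid-parse, which is where the power of the procedural embedding lies.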

 

SHRDLU operates within a small "table-top" domain so that it can have an extensive model of the structures and processes allowed in the domain. The program simulates the operation of a robot arm that manipulates toy blocks on a table. The system maintains an interactive dialogue with the user: It can accept statements and commands as well as answer questions about the state of its world and the reasons for its actions. The implemented system consists of four basic elements:

 

• a parser,

• a recognition grammar for English,

• programs for semantic analysis (to change a sentence into a sequence of commands to the robot or into a query of the database), and

• a problem solver (which knows about how to accomplish tasks in the blocks world).

 

Each procedure can make any checks on the sentence being parsed, perform any actions, or call on other procedures that may be required to accomplish its goal. For example, the VERB PHRASE procedure called above contains calls to functions that establish verb-subject agreement by searching through the entire derivation tree for other constituents while still in the middle of parsing the VP. SHRDLU's knowledge base includes a detailed model of the blocks world it manipulates, as well as a simple model of its own reasoning processes, so that it can explain its actions.

 

Reasoning in SHRDLU

 

SHRDLU's model of the world and reasoning about it are done in the MICRO-PLANNER programming language, which facilitates the representation of problem-solving procedures, allowing the user to specify his own heuristics and strategies for a particular domain. Knowledge about the state of the world is translated into MICRO-PLANNER assertions, and manipulative and reasoning knowledge is embodied in MICRO-PLANNER programs. For example, the input sentence "The pyramid is on the table" would be translated into an assertion of the form:

 

(ON PYRAMID TABLE)

 

or in PROLOG, for example:

 

is_on(pyramid, table).
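A database of such assertions, and the kind of pattern query that retrieves them, can be sketched briefly. This is hypothetical illustrative Python, not MICRO-PLANNER; the `"?"` variable convention and the stored facts are invented for the example.

```python
# Hypothetical sketch: a world model as a database of assertions, with a
# simple pattern query of the kind a goal statement performs.  "?" marks
# a variable that matches anything.

assertions = {("ON", "PYRAMID", "TABLE"), ("ON", "B1", "B2")}

def query(pattern):
    """Return every assertion matching the pattern; '?' matches any term."""
    return [fact for fact in assertions
            if len(fact) == len(pattern)
            and all(p == "?" or p == f for p, f in zip(pattern, fact))]

query(("ON", "PYRAMID", "TABLE"))   # the translated input sentence, found
query(("ON", "?", "TABLE"))         # "What is on the table?"
```

Questions from the user reduce to patterns of this kind, while statements add new tuples to the database.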

 

SHRDLU's problem solver consists of a group of "theorems" about the robot's environment and actions, represented as MICRO-PLANNER procedures. In operation, the theorem prover manipulates the state of the domain by running MICRO-PLANNER programs that perform the actions requested by the user. The main idea of PLANNER is to solve problems by means of specific procedures built into the problem statements themselves, as well as by applying general problem-solving rules. The advantage of these problem-specific rules, or heuristics, is that they can radically increase the efficiency of the process.

 

Furthermore, the problem statements are programs and thus can carry out actions in the problem-solving process. Thus, to put one block on another, there might be a MICRO-PLANNER program of the form:

 

(THGOAL (ON ?X ?Y)

(OR (ON-TOP ?X ?Y)

(AND (CLEAR-TOP ?X)

(CLEAR-TOP ?Y)

(PUT-ON ?X ?Y))))

 

This means that, if X is not already on Y, such a state can be achieved by clearing off everything that is stacked on top of X (so that the robot can move X), clearing off Y (so that X can be placed on top of Y), and then putting X on Y. The procedure resembles a predicate calculus theorem or a PROLOG clause, but there are important differences. The PLANNER procedure is a program, and its operators carry out actions. The THGOAL procedure finds an assertion in the database or proves it with other procedures. AND and OR are logical connectives. The crucial element is that though PLANNER may end up doing a proof, it does so only after checking some conditions that may make the proof trivial, or impossible, and it only performs the proof on relevant arguments, rather than checking all entities in the database as a blind theorem prover might. Moreover, no sharp distinction is drawn between proof by showing that a desired assertion is already true and proof by finding a sequence of actions (manipulating blocks) that will make the assertion true.
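The check-first, act-second pattern of the theorem above can be mirrored in a short sketch. This is hypothetical illustrative Python, not MICRO-PLANNER: the `on` relation, the block names, and the clearing strategy are all invented, but the control flow follows the theorem: succeed trivially if the goal already holds, otherwise clear both tops and perform the action.

```python
# Hypothetical sketch of the theorem above: first check whether the goal
# (ON X Y) is already true in the database; only if not, run the actions
# that make it true.

on = {("B2", "B1")}                      # B2 currently sits on B1
actions = []

def clear_top(x):
    """Move everything resting on x down to the table."""
    for above, below in list(on):
        if below == x:
            on.discard((above, below))
            on.add((above, "TABLE"))
            actions.append(("MOVE", above, "TABLE"))

def achieve_on(x, y):
    """Make (ON x y) true, by proof or by action."""
    if (x, y) in on:                     # the "proof" is trivial: already true
        return
    clear_top(x)                         # so the arm can move x
    clear_top(y)                         # so x can be placed on y
    on.add((x, y))
    actions.append(("PUT-ON", x, y))

achieve_on("B1", "B3")
```

Note that a second call to `achieve_on("B1", "B3")` would do nothing at all, which is exactly the "proof may turn out trivial" behavior the text describes.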

 

Grammar, Syntax, and Semantics

 

SHRDLU's grammar is based on the notion of systemic grammar, a system of choice networks that specifies the features of a syntactic unit, how the unit functions, and how it influences other units. Thus, a systemic grammar contains not only the constituent elements of a syntactic group but also higher level features such as mood, tense, and voice. To facilitate the analysis, the parsing process looks for syntactic units that play a major role in meaning, and the semantic programs are organized into groups of procedures that are applicable to a certain type of syntactic unit. In addition, the database definitions contain semantic markers that can be used by the syntactic programs to rule out grammatical but semantically incorrect sentences such as "The table picks up blocks." These markers are calls to semantic procedures that check for restrictions, for example, that only animate objects pick up things. These semantic programs can also examine the context of discourse to clarify meanings, establish pronoun referents, and initiate other semantically guided parsing functions.
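A semantic-marker check of this kind is easy to sketch. This is hypothetical illustrative Python, not SHRDLU's marker system: the marker names and the lexicon are invented, though ANIMATE and physical-object categories do echo the BLOCKS semantics described earlier.

```python
# Hypothetical sketch: semantic markers ruling out "The table picks up
# blocks" during parsing.  Each noun carries a set of markers; the verb
# "pick up" demands an ANIMATE subject.

markers = {
    "table": {"PHYSOB"},
    "block": {"PHYSOB", "MANIP"},
    "robot": {"PHYSOB", "ANIMATE"},
}

def subject_ok_for_pick_up(noun):
    """Semantic restriction attached to the verb 'pick up'."""
    return "ANIMATE" in markers.get(noun, set())

subject_ok_for_pick_up("robot")   # acceptable subject
subject_ok_for_pick_up("table")   # grammatical, but rejected semantically
```

Because the check runs during parsing rather than afterwards, a semantically hopeless reading can be abandoned before the parser wastes effort completing it.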

 

Parsing

 

To write SHRDLU's parser, Winograd first wrote a programming language, embedded in LISP, which he called PROGRAMMAR. This language supplies primitive functions for building systemically described syntactic structures. The theory behind the language is that basic programming methods, such as procedures, iteration, and recursion, are also basic to the cognitive process. Thus, a grammar can be implemented without additional programming paraphernalia; special syntactic items (such as conjunctions) are dealt with through calls to special procedures. PROGRAMMAR operates basically in a top-down, left-to-right fashion but uses neither true parallel processing nor a true backtracking strategy in dealing with multiple alternatives. It finds one parsing rather directly, since decisions at choice points are guided by the semantic procedures. By functionally integrating its knowledge of syntax and semantics, SHRDLU can avoid exploring alternative choices in an ambiguous situation. If the choice does fail, PROGRAMMAR has primitives for returning to the choice point with the reasons for the failure and informing the parser of the next best choice based on these reasons. This "directed backup" is far different from PLANNER's automatic backtracking in that the design philosophy of the parser is oriented toward making an original correct choice rather than establishing exhaustive backtracking.
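The difference between directed backup and blind backtracking can be suggested by a tiny sketch. This is hypothetical illustrative Python, not PROGRAMMAR: the word classes, the failure reason, and the repair rule are all invented. The point is only the control structure: a failed choice returns a reason, and the choice point uses that reason to pick the next alternative directly instead of enumerating all alternatives.

```python
# Hypothetical sketch of "directed backup": a failed parse choice reports
# WHY it failed, and the choice point uses that reason to select the next
# alternative, rather than trying every alternative blindly.

def try_as_verb(word):
    """Attempt one parse choice; return (success, reason-for-failure)."""
    if word.endswith("ing"):
        return False, "looks-participial"
    return True, None

def parse_word(word):
    ok, reason = try_as_verb(word)           # best first guess: a finite verb
    if ok:
        return "verb"
    # Directed backup: the failure reason selects the next choice directly.
    return "participle" if reason == "looks-participial" else "noun"

parse_word("moved")    # first choice succeeds
parse_word("moving")   # first choice fails; the reason points to the repair
```

A blind backtracker would instead discard the failed attempt wholesale and retry alternatives in a fixed order, learning nothing from the failure.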

 

The key to the system's successful operation is the interaction of PLANNER reasoning procedures, semantic analysis, and PROGRAMMAR. All three of these elements examine the input and help direct the parsing process. By making use of this multiple-source knowledge and programmed-in "hints" (heuristics), SHRDLU successfully dealt with language issues such as pronouns and referents.

 

Discussion

 

SHRDLU constituted a significant step forward in natural language processing research because of its attempts to combine models of human linguistic and reasoning methods in the language understanding process. Before SHRDLU, most AI language programs were linguistically simple; they used keyword and pattern-oriented grammars. Furthermore, even the more powerful grammar models used by linguists made little use of inference methods and semantic knowledge in the analysis of sentence structure. A union of these two techniques gives SHRDLU impressive results and makes it a more viable theoretical model of human language processing.

 

SHRDLU does have its problems, however. Like most existing natural language systems, SHRDLU cannot handle many of the more complex features of English. Some of the problem areas are agreement, dealing with hypotheses, and handling words like "the" and "and".

 

Yorick Wilks has argued that SHRDLU's power does not come from linguistic analysis but from the use of problem-solving methods in a simple, logical, and closed domain (the "table-top" world), thus eliminating the need to face some of the more difficult language issues. It seems doubtful that if SHRDLU were extended to a larger domain, it would be able to deal with these problems. Further, the level at which SHRDLU seeks to simulate the intermixing of knowledge sources typical of human reasoning is embedded in its processes rather than made explicit in its control structure, where it would be most powerful. Lastly, its problem solving is still highly oriented toward predicate calculus and limited in its use of inferential and heuristic data.