Summary of Narrative Experiment


This experiment was carried out as part of the BabyTalk Project. The aim of BabyTalk is to create systems that automatically generate English summaries of data related to a patient in a Neonatal Intensive Care Unit. The main purpose is to provide a decision-support aid for medical staff, who will be able to request a summary of the salient clinical events related to a patient within a specific time period.

Although generating such summaries requires a system to have considerable domain-specific knowledge, there are also more general principles related to narrative construction and temporal coherence that we believe apply across domains. In an evaluation of an early prototype system, we found that automatically generated narratives, compared to human-written summaries, suffer from a lack of temporal and narrative coherence. This arises because there are conflicting constraints that our narratives have to satisfy:

  1. A person reading the narrative needs to be able to reconstruct the temporal sequence in which the main events occurred;
  2. The narrative needs to highlight salient or more important events. These occupy more prominent positions in a paragraph, and precede less important events. Therefore, discourse order is not isomorphic to temporal order.

Because of the above constraints, we needed to collect data about the methods human authors use to indicate temporal shifts in a narrative discourse, and especially about how each event in a discourse is temporally related to a previously mentioned event. The model we are assuming is an extension of Reichenbach's theory of tense.


Our experiment aimed to collect a large corpus of narratives for use in empirical analysis using both standard statistical methods and machine learning techniques.


Five picture stories were selected from a set of picture stories created as teaching materials for British Sign Language signers and made available by the Higher Education Academy Languages and Linguistics Area Studies. Each story had the following structure:

  1. P1: introductory, scene-setting picture;
  2. P2 - P5: four pictures describing the main events in the story;
  3. P6: a concluding picture.

The pictures making up a story were time-stamped by the experimenters, in order to make their temporal order clear. In addition, a caption was added to each picture in order to give the main thrust of the event depicted. This aimed to control the thematic variation between experimental participants who were exposed to the same stories.


Versions of each story were created to represent levels of the following main variables:

  1. P6 position (2 levels): whether the concluding picture was placed first or last in the narrative;
  2. Event order (5 levels): the order of the remaining events in the narrative. Five different orders were created (e.g. P5,P1,P2,P3,P4). One of the orders was the original (i.e. P1,P2,P3,P4,P5,P6).

This yielded 2 × 5 = 10 conditions. A version of each of the 5 stories was created for each condition, giving a total of 50 different stories.
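The crossing of variables described above can be illustrated with a minimal Python sketch. Only two of the five event orders are stated in the text (the original order and P5,P1,P2,P3,P4); the other three orders listed here are hypothetical placeholders, and the names `STORIES`, `EVENT_ORDERS` and `build_version` are introduced purely for illustration.

```python
from itertools import product

STORIES = [1, 2, 3, 4, 5]          # the five picture stories
P6_POSITIONS = ["first", "last"]   # where the concluding picture is placed

EVENT_ORDERS = [
    ("P1", "P2", "P3", "P4", "P5"),  # original order (from the text)
    ("P5", "P1", "P2", "P3", "P4"),  # example given in the text
    ("P2", "P1", "P3", "P4", "P5"),  # hypothetical placeholder
    ("P3", "P4", "P1", "P2", "P5"),  # hypothetical placeholder
    ("P4", "P5", "P1", "P2", "P3"),  # hypothetical placeholder
]


def build_version(p6_position, event_order):
    """Return the full picture sequence for one condition."""
    body = list(event_order)
    if p6_position == "first":
        return ["P6"] + body
    return body + ["P6"]


# 2 P6 positions x 5 event orders = 10 conditions
conditions = list(product(P6_POSITIONS, EVENT_ORDERS))

# one version of each story per condition = 50 story versions
versions = [
    (story, build_version(position, order))
    for story in STORIES
    for position, order in conditions
]

print(len(conditions), len(versions))
```

With the original event order and P6 placed last, `build_version` returns the pictures in their original order, P1 to P6.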

Participants and procedure

The experiment had a between-groups design and was carried out online by volunteers. Each participant was asked to write a single story based on pictures. Each participant therefore saw one story in one of the conditions described above.

165 participants completed the experiment. They were assigned randomly to a condition (a story order) and a story. There were roughly equal numbers of participants in each condition.
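The text does not say how the random assignment was implemented. One common way to obtain roughly equal group sizes is to hand out shuffled blocks of all condition-story combinations, and the sketch below illustrates that technique only; it is not a description of the procedure actually used. The function name `assign_participants` and its parameters are assumptions.

```python
import random
from collections import Counter


def assign_participants(n_participants, n_conditions=10, n_stories=5, seed=0):
    """Assign participants to (condition, story) cells in shuffled blocks.

    Every cell is used once per block, so cell counts differ by at most one.
    This is an illustrative scheme, not the experimenters' documented method.
    """
    rng = random.Random(seed)
    cells = [(c, s) for c in range(n_conditions) for s in range(n_stories)]
    assignments = []
    while len(assignments) < n_participants:
        rng.shuffle(cells)          # new random order for each block
        assignments.extend(cells)   # append a copy of the shuffled block
    return assignments[:n_participants]


assignments = assign_participants(165)
per_cell = Counter(assignments)
per_condition = Counter(condition for condition, _ in assignments)
print(sorted(per_condition.values()))
```

With 165 participants and 50 cells, three full blocks are used and a fourth is cut short after 15 participants, so each cell receives either 3 or 4 participants.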

The procedure was as follows: after an introductory phase in which participants were introduced to the experiment and given instructions, they were shown the full picture story that they would write, together with the timestamps and captions, in its original order. Thus, participants knew the temporal order of events in their story. They were subsequently shown the order in which they would be required to write the story.

During the subsequent six stages, they were presented with the six pictures making up their story one by one, in the order determined for their experimental condition. At each stage, they wrote the part of the narrative pertaining to the picture presented. During these stages, the screen was divided into two. The bottom half was where they wrote the next part of the story; the top half showed the story as they had written it so far, with accompanying pictures. On submitting the new addition, the story in the top half of the screen was updated.

Preliminary results

NB: These results are preliminary, and based on a survey of the data. Statistical analysis is in progress, and this page will be updated as these results become available.

Excerpts from two narratives are shown below. Both are based on the same story, but were written in different orders.

Story 1 (order: P1, P2, P3, P4, P5, P6):

  1. It was Saturday morning at 9:15. Mary and John arrived home after getting last minute supplies for their home improvement project.
  2. After organizing their supplies, they got started on the bedroom. [...]
  3. Finally, at 13:30 they were ready to hang the new door. [...]
  4. The next step in their project was the kitchen. [...]
  5. It was nearly 7 PM when they finished the final touches on the bathroom [...]
  6. John and Mary rested up for a while, had a nice dinner and toasted their success. [...]

Story 2 (order: P6, P1, P2, P3, P4, P5):

  1. Marjorie and Derek finally finished moving into their new home in Surrey late in the evening. [...]
  2. They started almost 12 h previously moving boxes into the house [...]
  3. Crazily they decided it would be a good idea to give the spare room a quick freshen up with new paint [...]
  4. They even had to replace a new door on the bathroom [...]
  5. They also decided to change the doors in the kitchen before moving in their food and white goods. [...]
  6. Once all the DIY was done they started with the finishing touches [...]

A preliminary comparison of different conditions such as the above suggests that authors structure their narrative into temporally linear segments, each of which is explicitly anchored in time. For example, Story 1 above situates the narrative with respect to the time at which it started (9:15 am). This story seems to consist of a single, large segment, in which each event follows on from the previous one. The reference time of each event (in the sense of Reichenbach) is the event time of the previous one.

In Story 2, where the conclusion is the first event recounted, there is an explicit temporal anchoring for this first event ("late in the evening"). The next part of the story, consisting of a temporally linear sequence of events, is also explicitly anchored, this time with respect to the previous event (in sentence 2, via "12 h previously"). This situates the following segment of narrative in time relative to the first-mentioned event. Within this second temporal segment, events follow on from each other as in Story 1.

To the extent that these observations can be generalised, they suggest the following model. Once the order in which events are to be recounted has been determined, the events are split into segments consisting of temporally consecutive events. Each segment is related to the previous segment, while within each segment each event follows on from the previously mentioned one.
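The segmentation step of this model can be sketched in a few lines of Python. The sketch assumes that each event is identified by its temporal rank (1 for P1, 6 for P6) and that "temporally consecutive" means that an event is the immediate temporal successor of the event mentioned just before it. Both are simplifying assumptions, and the function name `split_into_segments` is introduced here for illustration only.

```python
from typing import List


def split_into_segments(discourse_order: List[int]) -> List[List[int]]:
    """Split events, listed in order of mention, into temporally linear segments.

    Each event is given by its temporal rank. A new segment starts whenever
    an event is not the immediate temporal successor of the previous one.
    """
    segments: List[List[int]] = []
    for event in discourse_order:
        if segments and event == segments[-1][-1] + 1:
            segments[-1].append(event)   # continues the current segment
        else:
            segments.append([event])     # temporal shift: open a new segment
    return segments


story_1 = [1, 2, 3, 4, 5, 6]   # P1 ... P6, original order
story_2 = [6, 1, 2, 3, 4, 5]   # P6 first, then P1 ... P5

print(split_into_segments(story_1))  # [[1, 2, 3, 4, 5, 6]]
print(split_into_segments(story_2))  # [[6], [1, 2, 3, 4, 5]]
```

Under these assumptions, Story 1 forms a single segment, while Story 2 splits into the concluding event followed by one linear segment, which matches the informal analysis of the two excerpts given above. Each segment boundary marks a point at which an explicit temporal anchor would be expected.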

Work in progress

We are currently investigating in more detail how temporal linking is achieved through the use of tenses and adverbials. Of particular interest is how the reference time of an event is computed (i.e. how a given event is marked relative to a salient, previously mentioned one).

The story corpus may prove to be a useful resource both for researchers interested in narrative and temporal structure, and for researchers in Natural Language Generation who are interested in the automatic generation of narratives. It is currently being annotated with the following information:

  1. the events in the story, corresponding to the pictures, with the time at which they happened;
  2. the temporal links between events, whether these are expressed using tenses such as the past perfect, or temporal adverbials;
  3. the grammatical and semantic information (tense/aspect and Aktionsart) associated with each sentence describing an event.
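To make the three annotation layers concrete, the following Python sketch shows one way such a record could be represented. The actual annotation format is not described in the text, so every class and field name here is hypothetical. The example assumes that the second sentence of Story 2 describes picture P1 and is linked to the first-mentioned event, P6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    picture: str                 # picture the event corresponds to, e.g. "P1"
    time: Optional[str] = None   # time at which the event happened, if known


@dataclass
class TemporalLink:
    source: str        # picture id of the event being located in time
    target: str        # picture id of the previously mentioned anchoring event
    marker_type: str   # "tense" or "adverbial"
    marker: str        # the linguistic expression carrying the link


@dataclass
class SentenceAnnotation:
    text: str
    event: Event
    tense: Optional[str] = None
    aspect: Optional[str] = None
    aktionsart: Optional[str] = None
    link: Optional[TemporalLink] = None


# Illustrative record for sentence 2 of Story 2 (field values are assumptions)
sentence = SentenceAnnotation(
    text="They started almost 12 h previously moving boxes into the house",
    event=Event(picture="P1"),
    tense="past",
    link=TemporalLink(
        source="P1",
        target="P6",
        marker_type="adverbial",
        marker="almost 12 h previously",
    ),
)

print(sentence.link.marker_type, sentence.link.target)
```

Fields left as `None` (the event time, aspect and Aktionsart) are deliberately unfilled here, since the text does not supply these values for this sentence.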

The corpus will be made publicly available once annotation is complete. It will also be exploited in studies using machine-learning techniques to extract rules about how to express temporal relations in different contexts.