2003 ARDA Workshop

Title: Graphical Annotation Toolkit for TimeML

Leader: James Pustejovsky, Department of Computer Science, Brandeis University

Workshop Team: James Pustejovsky, Brandeis University (Team Lead); Inderjeet Mani, MITRE Virginia (Co-Team Lead); Luc Belanger, University of Montreal; Robert Gaizauskas, University of Sheffield; Robert Knippen, Brandeis University; Marc Verhagen, Brandeis University; David Day, MITRE; Jon Schwarz, MITRE; Linda van Guilder, MITRE

Problem Definition

The purpose of this workshop is to address some specific annotation and user interface problems that arose in the context of the TERQAS workshop in 2002. In the previous workshop we focused on two efforts, reflecting the major deliverables of that contract:

  1. TimeML: Definition and design of a Metadata Standard for Markup of events, their temporal anchoring, and how they are related to each other in News articles.
  2. TIMEBANK: Creation of a gold standard corpus of 300 articles marked up for temporal expressions, events, and basic temporal relations, based on the specification of TimeML.

In addition to these major deliverables, several secondary milestones were achieved, including:

1. Creation of Algorithms for recognizing:

a. Temporal Expressions

b. Event Expressions

c. Times associated with Events

d. Ordering of Events and Times

2. Development of a Text Segmented Closure Algorithm

3. Creation of a Semi-graphical Annotation Tool

The last two of these milestones, the closure algorithm and the semi-graphical annotation tool, form the basis of the present proposal for a graphical annotation toolkit.

TimeML has already proven useful. Its advantages include:

1. It provides a robust markup framework for multiple domains and applications;

2. It is compliant and interoperable with emerging Semantic Web standards;

3. Algorithms built on it can be compared and measured against common TimeML-marked-up corpora, starting with TIMEBANK.

However, there are five major problems with annotating text to this standard with the currently available annotation tools, such as the Alembic WB:

1. Inconsistencies: Annotators frequently input inconsistent information.

2. Tag Density: (a) The annotation is very dense, mainly because of link tags; (b) it is hard for annotators to keep track of relationships mentally (see Figure 1, which describes the breakdown of elements in a typical document). Note that the number of links in that breakdown is only a fraction of the possible temporal links in the document, which grows quadratically with the number of events and time expressions.

3. Speed: The process is extremely slow, 1K/hour per annotator.

4. Utility: (a) Research communities carrying out other tasks need to adopt the standard; (b) density and annotation speed are obstacles to that adoption.

5. Invalid Annotation: Since the current annotation tools were not designed to produce XML, they occasionally produce invalid XML documents.
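The fifth problem is one that can be caught mechanically. As a minimal sketch, and not a description of any existing tool, a well-formedness check with Python's standard xml.etree.ElementTree parser might look like the following. The function name is_well_formed and the sample fragments are illustrative assumptions, and the check covers well-formedness only, not validity against the TimeML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments only; the attribute values are made up.
good = '<TimeML><EVENT eid="e1">left</EVENT> on <TIMEX3 tid="t1">Monday</TIMEX3></TimeML>'
bad = '<TimeML><EVENT eid="e1">left</TIMEX3></TimeML>'  # mismatched closing tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

A tool that serializes annotations through an XML library, rather than writing tags as raw strings, would avoid this class of error altogether.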

 


Figure 1: TimeML Density Information
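To make the quadratic growth of possible temporal links concrete, the following sketch counts the unordered pairs among the events and time expressions in a document, assuming at most one link per pair. The function name possible_links and the counts passed to it are hypothetical and are not taken from Figure 1.

```python
def possible_links(n_events: int, n_times: int) -> int:
    """Count unordered pairs among all events and time expressions.

    Assumes at most one temporal link per pair, so the result is n * (n - 1) / 2.
    """
    n = n_events + n_times
    return n * (n - 1) // 2


# Hypothetical document sizes, chosen only to show the growth rate.
print(possible_links(8, 2))    # 10 items  -> 45 pairs
print(possible_links(40, 10))  # 50 items  -> 1225 pairs
print(possible_links(80, 20))  # 100 items -> 4950 pairs
```

Doubling the number of annotated items roughly quadruples the number of candidate links, which is why annotators cannot be expected to consider every pair by hand.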

Workshop Goals

The goals of the present workshop are to:

  1. Create a graphical annotation tool for dense annotation tasks (such as TimeML);
  2. Embed an interactive closure algorithm into the annotation environment, which helps compute event and temporal relationships automatically.
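The idea behind the second goal can be illustrated with a deliberately simplified sketch. It is not the workshop's closure algorithm: it assumes a single "before" relation, whereas TimeML defines a richer set of temporal relations, and the names close_before and inconsistent are hypothetical. Given the links an annotator has entered, it infers the links that follow by transitivity and flags a contradiction when some item ends up before itself.

```python
from itertools import product


def close_before(links):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closed = set(links)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def inconsistent(closed):
    """A closed set is inconsistent if any item is before itself."""
    return any(a == b for a, b in closed)


# Hypothetical annotation: event e1 before time t1, t1 before e2, e2 before e3.
annotated = {("e1", "t1"), ("t1", "e2"), ("e2", "e3")}
closed = close_before(annotated)
inferred = closed - annotated

print(sorted(inferred))  # links the annotator did not have to enter
print(inconsistent(closed))
print(inconsistent(close_before(annotated | {("e3", "e1")})))
```

In this example three entered links yield three inferred ones, and adding a link that places e3 before e1 is reported as inconsistent. Run interactively after each new link, a step of this kind would address both the inconsistency and the speed problems listed in the Problem Definition.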

Workshop Deliverables

The workshop will generate a new toolkit for the annotation of text containing high-dependency markup. In particular, the Graphical Annotation Toolkit will enable the quick construction of new gold standard texts, overcoming the problems and shortcomings described in the Problem Definition above.

Deliverables:

1. GraphTool Annotation Toolkit: source code and executables, annotation environment, user manual;

2. Embedded Temporal Closure Algorithm, as a component of GraphTool: source code and executables.