DARPA Communicator Testbed


Log Standard Proposal (v12)


Introduction

This document is intended to establish standards for logfile contents and format. We will try to determine the smallest set of data that is sufficient to re-run a system and that also supports meaningful metrics. This set may vary depending on how much of the system is to be re-run and on what we would like to measure. In the process we will attempt to establish a standard format to which all logfiles can be converted (or in which they can be generated, although we foresee that at least a minimal amount of inferencing might be required to render the logs in this form). A goal of this document is to provide a standard that is flexible and general enough to be used in different domains.

In order to accomplish this goal, we propose an XML DTD which records the basic events in a Communicator-compliant system. These records can be annotated with type information indicating that a data element is "significant" from the point of view of annotators (and annotation tools).

To clarify we will consider the following (term definitions are by no means final and are open to suggestion):

The definition of "turn" requires special attention. In some accounts, a turn is an exchange between user and system. In a robust dialogue context, this definition fails to be adequate when the user or system barges in with follow-up information, etc., or when the dialogue involves more than two parties (a situation which we shouldn't rule out). We propose that the term "turn" in the context of these log files be reserved for the processing of a single participant's utterance (either user or system). This definition is not without its problems. For instance, it's not clear whether a call to the backend belongs at the end of the processing of a user's utterance (because it's the presentation of the utterance to the backend) or the beginning of the processing of the system's utterance (because it's the source of the system's response). We can currently think of nothing that this decision hinges on in the data analysis, and recommend that either interpretation be recognized at the moment.


Content

Here we will try to discuss the granularity of data to be logged in an end-to-end system. The contents of these bullets were derived mainly from the information needed by MITRE to do its own internal evaluation and will probably change as the perspectives of other sites are incorporated. Every log should contain enough information to determine the following (here input refers to the user sending information to the system and output refers to the system sending information to the user). Ideally, all this information should be extractable from the log file without any site-specific analysis. In this table, we describe the data to be logged, whether it's optional or obligatory, and how we propose to standardize access to the data:
 
Data | Obligatory | Standard access
Duration of session | yes | readable directly off the XML representation proposed below
Duration of turn (input or output) | yes | readable directly off the XML representation proposed below
Duration of generation of output (in a phone demo, the time the synthesizer takes to generate the audio file) | yes | see note 1
Duration of display of output (in a phone demo, how long it takes to play the audio file) | yes | see note 2
Duration of recognition of input (in a phone demo, how long it takes the recognizer to produce its hypotheses) | yes | see note 3
Duration of arbitrary operations | no | readable directly off the XML representation proposed below
Number of turns within a session | yes | readable directly off the XML representation proposed below
Number of sessions (in our current model each session is its own logfile) | yes | readable directly off the XML representation proposed below
The audio files corresponding to the user input and system output and their formats. The audio files should be stored and distributed with the logs, and the pathnames of these files should be relative to the log. | yes | accessed given an arbitrary search of the logged data (see the "audio_input" and "audio_output" values for the type attribute of the GC_DATA tag, as well as the "mime_type" attribute)
The text of the user input chosen by the system | yes | accessed given an arbitrary search of the logged data (see the "text_input" value for the type attribute of the GC_DATA tag)
The text of the system output | yes | accessed given an arbitrary search of the logged data (see the "text_output" value for the type attribute of the GC_DATA tag)
All possible input sentences (from the recognizer) up to a certain limit (TBD) (N/A to systems that use a word lattice) | no | accessed given an arbitrary search of the logged data (see the "text_input_hypothesis" value for the type attribute of the GC_DATA tag)
Indication of whether the parse succeeded | no | see note 4
The full input interpretation | no | accessed given an arbitrary search of the logged data

The elements which may pose minor complications are marked with a note number in the table above instead of a standard access method. Here we make tentative proposals for each of these:

  1. Duration of output generation. In a system where there is a single, obvious call to the synthesizer, this is simply the duration of that operation, but this is only one possible configuration. We propose that the "type" attribute be added to the GC_OPERATION element and that a "virtual" operation be generated by a postprocess phase with a distinguished type (say, "synthesis_duration"); alternatively, we could introduce a new XML element (say, GC_EVENT) reserved for these "virtual" events.
  2. Duration of output presentation. In the MIT system, this is an inference from notifications posted by the audio server (playing_has_begun, playing_has_ended; see the Communicator documentation for the MIT audio server). This could be handled similarly to output generation, or we could add optional start and end time attributes to the GC_DATA element which contains the audio file.
  3. Duration of recognition. Again, we propose to handle this similarly to output generation.
  4. Indication of whether the parse succeeded. Again, this is frequently an inference. We can insert a distinguished GC_DATA element (say, with a type of "input_parse_successful").
We believe that this sort of proposal will allow sites to gather data in the form they prefer, and augment it with sharable semantics in such a way that individual sites' data will retain its site-specific integrity.
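As an illustration of proposal 1, the postprocess phase could be sketched as follows. This is a minimal sketch in Python, not part of the proposal: it assumes the site knows which operation names correspond to synthesizer calls (the set SYNTHESIS_OPS and the operation name "synthesize" are hypothetical), and it appends one "virtual" GC_OPERATION of type "synthesis_duration" that spans those calls within a turn.

```python
import xml.etree.ElementTree as ET

# Hypothetical: operation names that a given site treats as synthesizer calls.
SYNTHESIS_OPS = {"synthesize"}

def add_virtual_synthesis_op(turn):
    """Append a virtual GC_OPERATION spanning the turn's synthesis operations."""
    ops = [op for op in turn.findall("GC_OPERATION")
           if op.get("name") in SYNTHESIS_OPS]
    if not ops:
        return None
    first = min(ops, key=lambda op: float(op.get("stime")))
    last = max(ops, key=lambda op: float(op.get("etime")))
    return ET.SubElement(turn, "GC_OPERATION", {
        "type": "synthesis_duration",
        "name": "synthesis_duration",      # hypothetical name for the virtual operation
        "server": first.get("server"),
        "location": first.get("location"),
        "turnid": turn.get("id"),
        "stime": first.get("stime"),       # start of the earliest synthesis call
        "etime": last.get("etime"),        # end of the latest synthesis call
    })

# Made-up turn containing a single synthesis call.
turn = ET.fromstring(
    '<GC_TURN id="1" stime="10.00" etime="12.00">'
    '<GC_OPERATION name="synthesize" server="synth" location="localhost:15000" '
    'turnid="1" stime="10.25" etime="10.75"/>'
    '</GC_TURN>')
virtual = add_virtual_synthesis_op(turn)
duration = float(virtual.get("etime")) - float(virtual.get("stime"))
```

An evaluation tool could then read the synthesis duration from the element with the distinguished type without knowing how a particular site names its synthesizer operations.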
 


Format

We believe that XML would be a good candidate language for this format for many reasons, among them the growing supply of viewers and editors and the variety of parsers available in many programming languages.

We propose that operations should be logged as single XML elements. For example:

<GC_OPERATION name="paraphrase_reply" server="nl" location="localhost:11000"
      turnid="-1" stime="941473394.66" etime="941473394.69" tidx="3">
   <GC_DATA key=":reply_string" dtype="string">
      Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
      system development. You may hang up or ask for help at any time. How can I
      help you?
   </GC_DATA>
</GC_OPERATION>


Since in our distributed architecture messages are sent asynchronously, and many events may occur before the completion of an operation, some caching (or post processing) will be necessary to log operations as single elements.
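The caching step might look like the following sketch. It assumes, purely for illustration, that the raw log yields separate "call" and "reply" records as Python dicts carrying a token index, an operation name, and a timestamp; this record layout, the function name, and the "synthesize" operation are hypothetical and do not describe the Hub's actual logging interface.

```python
import xml.etree.ElementTree as ET

def merge_operations(records):
    """Pair call/reply records and build one GC_OPERATION element per pair."""
    pending = {}    # (tidx, name) -> call record still waiting for its reply
    finished = []
    for rec in records:
        key = (rec["tidx"], rec["name"])
        if rec["kind"] == "call":
            pending[key] = rec
        elif rec["kind"] == "reply" and key in pending:
            call = pending.pop(key)
            finished.append(ET.Element("GC_OPERATION", {
                "name": call["name"],
                "server": call["server"],
                "location": call["location"],
                "turnid": str(call["turnid"]),
                "tidx": str(call["tidx"]),
                "stime": f'{call["time"]:.2f}',
                "etime": f'{rec["time"]:.2f}',
            }))
    return finished, pending

# Two calls are interleaved; only the first has received its reply so far.
records = [
    {"kind": "call", "tidx": 3, "name": "paraphrase_reply", "server": "nl",
     "location": "localhost:11000", "turnid": -1, "time": 941473394.66},
    {"kind": "call", "tidx": 4, "name": "synthesize", "server": "audio",
     "location": "localhost:15000", "turnid": -1, "time": 941473394.67},
    {"kind": "reply", "tidx": 3, "name": "paraphrase_reply", "time": 941473394.69},
]
finished, pending = merge_operations(records)
```

Operations left in the pending table when the session ends never completed; how those should be logged is a decision this sketch does not make.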

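Next we will try to define the main entities in the logfile and their formats. A DTD is also available which defines these terms and their relations. We will assume all time types will use a standard base time known as "the epoch", the number of milliseconds since January 1, 1970, 00:00:00 GMT. Note, however, that the example values in this document (such as 941473394.66) have the magnitude of seconds since the epoch, with the fraction carrying the sub-second part.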
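As a quick check on how the example timestamps relate to the epoch, the sketch below converts one of them to a calendar time. It assumes the value is in seconds since the epoch with a fractional part, which is what the magnitude of the examples suggests; the helper name to_utc is made up for illustration.

```python
from datetime import datetime, timezone

def to_utc(timestamp):
    """Interpret a log timestamp as seconds since the epoch (an assumption)."""
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)

# Timestamp taken from the GC_OPERATION example.
stamp = to_utc("941473394.66")
```

Under this assumption the value falls on November 1, 1999, which agrees with the date embedded in the log directory name used in the GC_MESSAGE example.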
 

GC_SESSION

 A session represents an interaction of a user with the system. In our current demo, this is equivalent to a phone call. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
id | We should attempt to determine a unique identifier for sessions. MIT's solution uses the format IP:process id:session counter. Process ids might not be trivial to obtain in every programming language and OS; however, "equivalent" data are usually available. | string | yes
stime | time when session started | milliseconds | yes
etime | time when session finished | milliseconds | yes
GC_TURN | see GC_TURN | GC_TURN | no

 

Example:

<GC_SESSION
    id="129.10.2.200:1010:3"
    stime="930254422.720000"
    etime="930254434.790000">
    ...
</GC_SESSION>

GC_TURN

 A turn is the processing of a single participant's utterance (either user or system), as discussed in the introduction. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
id | A unique identifier within each session | number | yes
stime | time when turn started | milliseconds | yes
etime | time when turn ended | milliseconds | yes
GC_OPERATION | see GC_OPERATION | GC_OPERATION | no
GC_MESSAGE | see GC_MESSAGE | GC_MESSAGE | no
GC_EVENT | see GC_EVENT | GC_EVENT | no

 

Example:

<GC_TURN
    id="-01"
    stime="930254422.720000"
    etime="930254424.790000">
    ...
</GC_TURN>
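The durations and counts that the Content section describes as "readable directly off the XML representation" can be computed with very little code. The following is a minimal sketch in Python using only the standard library; the second turn and its timestamps are made up for illustration, and the function names are not part of the proposal. Durations come out in whatever unit the timestamps use.

```python
import xml.etree.ElementTree as ET

LOG = """
<GC_LOG>
  <GC_SESSION id="129.10.2.200:1010:3" stime="930254422.720000" etime="930254434.790000">
    <GC_TURN id="0" stime="930254422.720000" etime="930254424.790000"/>
    <GC_TURN id="1" stime="930254425.000000" etime="930254434.500000"/>
  </GC_SESSION>
</GC_LOG>
"""

def duration(element):
    """Difference between an element's etime and stime attributes."""
    return float(element.get("etime")) - float(element.get("stime"))

def session_summary(session):
    """Collect session duration, turn count, and per-turn durations."""
    turns = session.findall("GC_TURN")
    return {
        "session_duration": duration(session),
        "turn_count": len(turns),
        "turn_durations": {turn.get("id"): duration(turn) for turn in turns},
    }

root = ET.fromstring(LOG)
summaries = [session_summary(s) for s in root.findall("GC_SESSION")]
session_count = len(summaries)
```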

GC_OPERATION

An operation is any command executed by the system within a turn. All operations can send and receive data, frames, or audio files. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
type | the type of operation being executed (specific values TBD) | string | no
turnid | the turn id that this operation was executed under | number | yes
stime | time when operation started | milliseconds | yes
etime | time when operation ended | milliseconds | yes
server | the name (according to the program file) of the server that executed the operation | string | yes
location | the server (real server name or IP address) and its port (server_name:port_number) | string | yes
name | the name of the operation | string | yes
tidx | the token index associated with the operation | number | no
reply_type | valid values of reply_type include normal, destroy, and error | string | no
reply_status | valid values are normal, error, destroy and asynchronous | string | no
type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no
type_end_task | indicates the end of the task | string | no
type_new_turn | valid values of type_new_turn are user and system | string | no
type_start_utt | valid values of type_start_utt include user, system and pacifier | string | no
type_end_utt | valid values of type_end_utt include user, system and pacifier | string | no
type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted for | string | no
GC_DATA | see GC_DATA | GC_DATA | no

 

Example:

<GC_OPERATION type_new_turn="system" name="paraphrase_reply" server="nl" location="localhost:11000"
      turnid="-1" stime="941473394.66" etime="941473394.69" tidx="3">
   <GC_DATA type_utt_text="system" key=":reply_string" dtype="string">
      Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
      system development. You may hang up or ask for help at any time. How can I
      help you?
   </GC_DATA>
</GC_OPERATION>
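An "arbitrary search of the logged data", as the Content section puts it, can be as simple as walking every GC_DATA element and filtering on one attribute. The sketch below does this for the example above, using the standard-library ElementTree parser; the helper name find_data is made up for illustration.

```python
import xml.etree.ElementTree as ET

OPERATION = """
<GC_OPERATION type_new_turn="system" name="paraphrase_reply" server="nl"
      location="localhost:11000" turnid="-1" stime="941473394.66"
      etime="941473394.69" tidx="3">
   <GC_DATA type_utt_text="system" key=":reply_string" dtype="string">
      Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
      system development. You may hang up or ask for help at any time. How can I
      help you?
   </GC_DATA>
</GC_OPERATION>
"""

def find_data(root, attribute, value):
    """Return the whitespace-normalized text of each GC_DATA whose attribute equals value."""
    return [" ".join((data.text or "").split())
            for data in root.iter("GC_DATA")
            if data.get(attribute) == value]

system_utterances = find_data(ET.fromstring(OPERATION), "type_utt_text", "system")
```

Because the search keys on the attribute rather than on the operation or key name, it does not depend on site-specific names such as "paraphrase_reply" or ":reply_string".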

GC_MESSAGE

Messages are items sent from a server to the hub, and their replies. In contrast to operations, messages are initiated by servers. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
type | the type of message being issued (specific values TBD) | string | no
turnid | the turn id that this message was issued under | number | yes
time | time when message issued | milliseconds | yes
server | the name of the server that issued the message | string | yes
location | the server (real server name or IP address) and its port (server_name:port_number) | string | yes
name | the name of the message | string | yes
direction | server_to_hub or hub_to_server | string | yes
tidx | the token index associated with the message | number | no
reply_type | valid values of reply_type include normal, destroy, and error | string | no
reply_status | valid values are normal, error, destroy and asynchronous | string | no
type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no
type_end_task | indicates the end of the task | string | no
type_new_turn | valid values of type_new_turn are user and system | string | no
type_start_utt | valid values of type_start_utt include user, system and pacifier | string | no
type_end_utt | valid values of type_end_utt include user, system and pacifier | string | no
type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted for | string | no
GC_DATA | see GC_DATA | GC_DATA | no

Example:

<GC_MESSAGE name="filelog" direction="server_to_hub" server="audio"
      location="localhost:15000" turnid="-1" time="941473396.48" tidx="6">
   <GC_DATA key=":synth_log_filename" dtype="string">
      /home/communicator/test/Travel-demo/../../logs/travel_cfone/19991101/001/
      travel_cfone-19991101-001-synth--01-001.wav
   </GC_DATA>
</GC_MESSAGE>

GC_EVENT

Examples of events are internal hub errors, locks, alarm expirations, alarm enabling/disabling, and alarm resets. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
etype | the name of the hub event (SYSTEM_ERROR, LOCK, etc.) | string | yes
turnid | the turn id under which this event occurred | number | yes
time | time when the event occurred | milliseconds | yes
name | the type of the event (operation) | string | yes
server | the name of the server that issued the message | string | no
location | the server (real server name or IP address) and its port (server_name:port_number) | string | no
tidx | the token index associated with the event | number | no
type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no
type_end_task | indicates the end of the task | string | no
type_new_turn | valid values of type_new_turn are user and system | string | no
type_start_utt | valid values of type_start_utt include user, system and pacifier | string | no
type_end_utt | valid values of type_end_utt include user, system and pacifier | string | no
type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted for | string | no
GC_DATA | see GC_DATA | GC_DATA | no

 

Example:

<GC_EVENT etype="LOCK" server="audio" location="localhost:15000" turnid="-1"
      time="941473396.19" name=":hub_get_session_lock" tidx="5"/>

GC_ANNOT

GC_ANNOT elements are tags containing human annotations, and as such are not present in the raw (unannotated) log files. GC_ANNOT is included in this specification to support folding human annotation files in with their associated log files. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
turnid | the turn id to which this annotation applies | number | no
tidx | the token index associated with the annotation | number | no
type_task_completion | human annotation indicating whether the task was successfully completed or not | string | no
GC_DATA | see GC_DATA | GC_DATA | no

 

Examples:

<GC_ANNOT type_task_completion="1"/>

<GC_ANNOT turnid="2" tidx="129">
   <GC_DATA type_utt_text="transcription" dtype="string">
      i'd like a flight from boston to san francisco
   </GC_DATA>
</GC_ANNOT>
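Folding an annotation file into its log could be sketched as follows. This is an illustration only: it assumes the annotations are available as standalone GC_ANNOT elements, attaches each one to the GC_TURN whose id matches its turnid, and falls back to the session when there is no match (the DTD allows GC_ANNOT under both). The function name, the session id "s1", and the key ":transcription" are hypothetical.

```python
import xml.etree.ElementTree as ET

def fold_annotations(session, annotations):
    """Attach each GC_ANNOT to its turn, or to the session if no turn matches."""
    turns = {turn.get("id"): turn for turn in session.findall("GC_TURN")}
    for annot in annotations:
        parent = turns.get(annot.get("turnid"), session)
        parent.append(annot)
    return session

# Made-up session with a single turn.
session = ET.fromstring(
    '<GC_SESSION id="s1" stime="0.0" etime="9.0">'
    '<GC_TURN id="2" stime="1.0" etime="2.0"/>'
    '</GC_SESSION>')
annotations = [
    ET.fromstring('<GC_ANNOT type_task_completion="1"/>'),
    ET.fromstring(
        '<GC_ANNOT turnid="2" tidx="129">'
        # The key value below is hypothetical.
        '<GC_DATA key=":transcription" type_utt_text="transcription" dtype="string">'
        "i'd like a flight from boston to san francisco"
        '</GC_DATA></GC_ANNOT>'),
]
fold_annotations(session, annotations)
```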

GC_DATA

A key/value pair. This datatype can be used to display the information involved in an operation, as well as to display the contents of a GC_FRAME or GC_LIST. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
key | the name of this data point | string | yes
turnid | the turn id that this operation was executed under | number | no
time | time stamp for this data point | milliseconds | no
type | valid values of type include audio_input, audio_output, text_input, text_output, text_input_hypothesis, and concept. See the Content section. | string | no
mime_type | the mime type of the data | string | no
direction | valid values are in and out | string | no
dtype | the data type - valid values include integer, string, etc. (full list of values TBD) | string | no
type_utt_text | valid values of type_utt_text are transcription, system and asr | string | no
type_error_msg | valid value is true | string | no
type_help_msg | valid value is true | string | no
GC_FRAME | see GC_FRAME | GC_FRAME | no
GC_LIST | see GC_LIST | GC_LIST | no

 

Example:

<GC_DATA key=":reply_string" dtype="string">
   Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
   system development. You may hang up or ask for help at any time. How can I
   help you?
</GC_DATA>

GC_FRAME

This structure would allow for recording of frames. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
frame_type | Galaxy frame type | string | no
name | the name of the frame | string | no
turnid | the turn id in which this frame appears | number | no
GC_DATA | see GC_DATA | GC_DATA | no

Example:

<GC_DATA key=":rec_scores">
   <GC_FRAME name="scores" frame_type="clause">
      <GC_DATA key=":acoustic_score" dtype="string">
         "-617.9270"
      </GC_DATA>
      <GC_DATA key=":ngram_score" dtype="string">
         "-17.4465"
      </GC_DATA>
      <GC_DATA key=":nwords" dtype="integer">
         8
      </GC_DATA>
      <GC_DATA key=":total_score" dtype="string">
         "-651.3735"
      </GC_DATA>
      <GC_DATA key=":nphones" dtype="integer">
         36
      </GC_DATA>
   </GC_FRAME>
</GC_DATA>

GC_LIST

This structure would allow for recording of lists. The elements in this table refer to the XML DTD.
 
Name | Description | Type | Required
name | the name of the list | string | no
turnid | the turn id in which this list appears | number | no
GC_DATA | see GC_DATA | GC_DATA | no

Example:

<GC_DATA key=":nbest_list">
   <GC_LIST name=":nbest_list">
      <GC_DATA key=":nbest_list[0]" dtype="string">
         can i get this american flight
      </GC_DATA>
      <GC_DATA key=":nbest_list[1]" dtype="string">
         can i get this american difference
      </GC_DATA>
      <GC_DATA key=":nbest_list[2]" dtype="string">
         can i did this american difference
      </GC_DATA>
      <GC_DATA key=":nbest_list[3]" dtype="string">
         can i get this american that flight
      </GC_DATA>
   </GC_LIST>
</GC_DATA>
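The nesting of GC_DATA, GC_FRAME and GC_LIST in the two preceding examples maps naturally onto dictionaries and lists. The sketch below is one possible reading in Python, not a prescribed one: a frame becomes a dict keyed by the "key" attribute, a list becomes a list, and a leaf is converted according to "dtype". Only the "integer" and "string" values are handled, since the full list of dtype values is still TBD; the inputs are abbreviated versions of the examples.

```python
import xml.etree.ElementTree as ET

def data_value(data):
    """Convert one GC_DATA element to a Python value."""
    frame = data.find("GC_FRAME")
    if frame is not None:
        return {item.get("key"): data_value(item)
                for item in frame.findall("GC_DATA")}
    items = data.find("GC_LIST")
    if items is not None:
        return [data_value(item) for item in items.findall("GC_DATA")]
    text = " ".join((data.text or "").split())
    if data.get("dtype") == "integer":
        return int(text)
    return text   # strings and any dtype this sketch does not know about

SCORES = """
<GC_DATA key=":rec_scores">
   <GC_FRAME name="scores" frame_type="clause">
      <GC_DATA key=":acoustic_score" dtype="string">
         "-617.9270"
      </GC_DATA>
      <GC_DATA key=":nwords" dtype="integer">
         8
      </GC_DATA>
   </GC_FRAME>
</GC_DATA>
"""

NBEST = """
<GC_DATA key=":nbest_list">
   <GC_LIST name=":nbest_list">
      <GC_DATA key=":nbest_list[0]" dtype="string">can i get this american flight</GC_DATA>
      <GC_DATA key=":nbest_list[1]" dtype="string">can i get this american difference</GC_DATA>
   </GC_LIST>
</GC_DATA>
"""

scores = data_value(ET.fromstring(SCORES))
hypotheses = data_value(ET.fromstring(NBEST))
```

Note that string values keep any quotation marks that appear in the log, as in the acoustic score above.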

Code support

MITRE volunteers to work with sites to produce the appropriate conversion tools from MIT logfiles to the proposed logfile standard. If more appropriate, we will produce a new logging module for the Hub which will simplify this process; however, we don't envision this to be necessary.


Document Type Definition (DTD)

Below we provide an XML DTD to define the above types.
 

<?xml version="1.0"?>

<!ELEMENT GC_LOG (GC_SESSION)*>
<!ATTLIST GC_LOG logfile_version CDATA #IMPLIED>

<!ELEMENT GC_SESSION ( GC_TURN | GC_ANNOT )*>
<!ATTLIST GC_SESSION id NMTOKEN #REQUIRED>
<!-- time could be defined as CDATA if we chose to use a non millisecond format -->
<!ATTLIST GC_SESSION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION etime NMTOKEN #REQUIRED>

<!ELEMENT GC_TURN ( GC_ANNOT | GC_OPERATION | GC_MESSAGE | GC_EVENT )*>
<!ATTLIST GC_TURN id NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN stime NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN etime NMTOKEN #REQUIRED>

<!ELEMENT GC_ANNOT (GC_DATA)*>
<!-- GC_ANNOT can have a sequence of one or more GC_DATA tags or it can be empty -->
<!ATTLIST GC_ANNOT type_task_completion CDATA #IMPLIED>
<!ATTLIST GC_ANNOT turnid NMTOKEN #IMPLIED>
<!ATTLIST GC_ANNOT tidx NMTOKEN #IMPLIED>

<!ELEMENT GC_OPERATION (GC_DATA)*>
<!ATTLIST GC_OPERATION type NMTOKENS #IMPLIED>
<!ATTLIST GC_OPERATION turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION server CDATA #REQUIRED>
<!ATTLIST GC_OPERATION location NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION name CDATA #REQUIRED>
<!ATTLIST GC_OPERATION tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_OPERATION reply_type CDATA #IMPLIED>
<!ATTLIST GC_OPERATION reply_status CDATA #IMPLIED>
<!ATTLIST GC_OPERATION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION etime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION type_start_task CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_end_task CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_prompt CDATA #IMPLIED>

<!ELEMENT GC_MESSAGE (GC_DATA)*>
<!ATTLIST GC_MESSAGE type NMTOKENS #IMPLIED>
<!ATTLIST GC_MESSAGE turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE server CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE location NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE name CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE direction NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_MESSAGE reply_type CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE reply_status CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE time NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE type_start_task CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_end_task CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_prompt CDATA #IMPLIED>

<!ELEMENT GC_EVENT (GC_DATA)*>
<!ATTLIST GC_EVENT etype NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT server CDATA #IMPLIED>
<!ATTLIST GC_EVENT location NMTOKEN #IMPLIED>
<!ATTLIST GC_EVENT time NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT name CDATA #REQUIRED>
<!ATTLIST GC_EVENT tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_EVENT type_start_task CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_end_task CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_prompt CDATA #IMPLIED>

<!ELEMENT GC_DATA ANY>
<!ATTLIST GC_DATA key CDATA #REQUIRED>
<!ATTLIST GC_DATA type NMTOKENS #IMPLIED>
<!ATTLIST GC_DATA mime_type NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA direction NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA dtype NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA time NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA turnid NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA type_utt_text CDATA #IMPLIED>
<!ATTLIST GC_DATA type_error_msg CDATA #IMPLIED>
<!ATTLIST GC_DATA type_help_msg CDATA #IMPLIED>

<!ELEMENT GC_FRAME (GC_DATA)*>
<!ATTLIST GC_FRAME frame_type NMTOKEN #IMPLIED>
<!ATTLIST GC_FRAME name CDATA #IMPLIED>
<!ATTLIST GC_FRAME turnid NMTOKEN #IMPLIED>

<!ELEMENT GC_LIST (GC_DATA)*>
<!ATTLIST GC_LIST name CDATA #IMPLIED>
<!ATTLIST GC_LIST turnid NMTOKEN #IMPLIED>
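A validating XML parser is the proper way to check a log against this DTD. Where one is not at hand, a lightweight check of the #REQUIRED attributes can catch the most common omissions. The sketch below uses Python's standard library; the REQUIRED table is hand-copied from the DTD above, the function name is made up, and the check does not cover content models or attribute types.

```python
import xml.etree.ElementTree as ET

# Attributes declared #REQUIRED in the DTD above, per element.
REQUIRED = {
    "GC_SESSION": {"id", "stime", "etime"},
    "GC_TURN": {"id", "stime", "etime"},
    "GC_OPERATION": {"turnid", "server", "location", "name", "stime", "etime"},
    "GC_MESSAGE": {"turnid", "server", "location", "name", "direction", "time"},
    "GC_EVENT": {"etype", "turnid", "time", "name"},
    "GC_DATA": {"key"},
}

def missing_required(root):
    """List (tag, missing attribute names) for every element lacking a required attribute."""
    problems = []
    for element in root.iter():
        missing = REQUIRED.get(element.tag, set()) - set(element.attrib)
        if missing:
            problems.append((element.tag, sorted(missing)))
    return problems

# The GC_EVENT attributes come from the example above; the GC_DATA child is
# made up and deliberately lacks its key attribute.
sample = ET.fromstring(
    '<GC_EVENT etype="LOCK" server="audio" location="localhost:15000" turnid="-1" '
    'time="941473396.19" name=":hub_get_session_lock" tidx="5">'
    '<GC_DATA dtype="string">no key here</GC_DATA>'
    '</GC_EVENT>')
problems = missing_required(sample)
```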