Galaxy Communicator Documentation:

MITRE Log Tools


MITRE is providing a number of support tools for the community to use for manipulating logs and preparing them for evaluation. These tools are useful for people generating logs from raw Hub logs, or writing their own XML. All these tools are written in Python.

This version of the log tools was originally distributed with GalaxyCommunicator 2.0.1p2; no changes were made for version 2.1. However, a set of errors was discovered in the final preparations for the 2000 evaluation which led to patches both for 2.0.1 and 2.1. For convenience, a set of reference documents has been included with this distribution; however, you are encouraged to consult MITRE's DARPA Communicator web site for the most current versions.


Background and common arguments

All the MITRE log analysis tools operate on sets of log directories. These log directories have an expected form, and every tool in the suite can be told where to find them and what form to expect.

The tools currently use the XML parser in the standard Python distribution, which isn't really fast enough. In the near future, we'll be looking into other XML parsers currently being produced by the Python XML SIG, which should yield considerably enhanced performance (as well as providing DTD validation). We apologize for the inconvenience.

Common command line

<GC_HOME>/contrib/MITRE/tools/bin/<tool_name> \
    [--raw_txt_pat regexp] [--raw_xml_pat regexp] [--ann_xml_pat regexp] [--human_xml_pat regexp] \
    [--force] [--write_file] [--print_output] [--write_cache] [--read_cache] \
    [--help] \
    [--start first_dir] [--end last_dir] log_root...
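
As a rough illustration of how the common command line fits together, here is a minimal Python sketch that models it with the standard argparse module. This is not the tools' own argument-handling code; the function name build_common_parser is hypothetical, and only the flags listed above are taken from this document (argparse supplies --help automatically).

```python
import argparse

def build_common_parser(tool_name):
    """Build a parser mirroring the common command line (illustrative only)."""
    parser = argparse.ArgumentParser(prog=tool_name)
    # Group 1: regular expressions overriding the default file names.
    for flag in ("--raw_txt_pat", "--raw_xml_pat",
                 "--ann_xml_pat", "--human_xml_pat"):
        parser.add_argument(flag, metavar="regexp")
    # Group 2: processing directives, all off by default.
    for flag in ("--force", "--write_file", "--print_output",
                 "--write_cache", "--read_cache"):
        parser.add_argument(flag, action="store_true")
    # Group 4: where to find the log directories.
    parser.add_argument("--start", metavar="first_dir")
    parser.add_argument("--end", metavar="last_dir")
    parser.add_argument("log_root", nargs="+")
    return parser

args = build_common_parser("xmlize").parse_args(
    ["--write_file", "--start", "19991123", "mylogs"])
print(args.write_file, args.start, args.log_root)
```

The sample invocation corresponds to one of the xmlize command lines used later in this document.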

We'll consider these arguments in groups.

Group 1: Directory structure

Each log directory must contain either a raw MIT text log, a raw XML log, or an annotated XML log. In addition, the tool suite recognizes a fourth distinguished file containing human annotations. The directory may have an arbitrary number of additional files, but it must contain no directories besides a directory called .cache, which is created by the tool suite under certain circumstances. The default names of the files in the log directory are:
 
File type | Perl regular expression | Command line argument for override
raw MIT text log | ^.+-\d{8,8}-\d{3,3}-hublog[.]txt$ | --raw_txt_pat
raw XML log | ^.+-\d{8,8}-\d{3,3}-hublog-raw[.]xml$ | --raw_xml_pat
annotated XML log | ^.+-\d{8,8}-\d{3,3}-hublog-annotated[.]xml$ | --ann_xml_pat
human annotation file | ^.+-\d{8,8}-\d{3,3}-hublog-human[.]xml$ or ^.+-\d{8,8}-\d{3,3}-human[.]xml$ | --human_xml_pat

The default pattern for the raw MIT text log is identical to what the Hub logger produces, so if you're using this tool suite to generate the raw and annotated XML, you don't have to worry about filenames, except to observe that your human annotations file should be generated by replacing hublog.txt in your raw MIT text log filename with hublog-human.xml.
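
To make the default naming concrete, the following Python sketch applies the default patterns listed above to a few file names and derives the human annotations file name from a raw MIT text log name. The helper names classify and human_name_for are hypothetical and are not part of the tool suite; only the regular expressions come from this document.

```python
import re

# Default patterns, copied from the list of default file names above.
DEFAULT_PATTERNS = {
    "raw_txt": [r"^.+-\d{8,8}-\d{3,3}-hublog[.]txt$"],
    "raw_xml": [r"^.+-\d{8,8}-\d{3,3}-hublog-raw[.]xml$"],
    "ann_xml": [r"^.+-\d{8,8}-\d{3,3}-hublog-annotated[.]xml$"],
    "human_xml": [r"^.+-\d{8,8}-\d{3,3}-hublog-human[.]xml$",
                  r"^.+-\d{8,8}-\d{3,3}-human[.]xml$"],
}

def classify(filename):
    """Return the file type whose default pattern matches, or None."""
    for file_type, patterns in DEFAULT_PATTERNS.items():
        if any(re.match(p, filename) for p in patterns):
            return file_type
    return None

def human_name_for(raw_txt_name):
    """Replace the trailing hublog.txt with hublog-human.xml."""
    return re.sub(r"hublog[.]txt$", "hublog-human.xml", raw_txt_name)

raw = "travel_cfone-19991101-004-hublog.txt"
print(classify(raw))                  # raw_txt
print(human_name_for(raw))            # travel_cfone-19991101-004-hublog-human.xml
print(classify(human_name_for(raw)))  # human_xml
print(classify("notes.txt"))          # None
```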

Overriding file names

If you are not using the Hub logger, you can use the arguments listed above to provide different regular expressions for the four file types. By convention, the file names follow a consistent pattern, and you can take advantage of this in your overrides. For instance, if you provide a pattern only for the raw MIT text log, and the tool finds the raw MIT text log but can't find the other files based on the default patterns, it will also search for files based on the pattern you provided for the raw MIT text log. So if your raw MIT text log pattern is ^mylog.text$, the tool will also accept mylog-raw.xml, mylog-annotated.xml and mylog-human.xml as the names of the other files if it can't find the defaults.

Group 2: Processing directives

Each analysis step has preconditions on its execution, and may produce a log as a result of operating.
 
Analysis step | Precondition | Produces
XML log generation | raw MIT text log | raw XML log conforming to the logfile standard
XML annotation | raw XML log conforming to the logfile standard (+ rules) | annotated XML log conforming to the logfile DMA standard
XML summarization | annotated XML log conforming to the logfile DMA standard (+ human annotation file conforming to the human annotation logfile standard, if annotations are not included in the annotated XML log) | -
XML log validation | annotated XML log conforming to the logfile DMA standard (+ human annotation file conforming to the human annotation logfile standard, if annotations are not included in the annotated XML log) | -
XML scoring | annotated XML log conforming to the logfile DMA standard (+ human annotation file conforming to the human annotation logfile standard, if annotations are not included in the annotated XML log) | -

If the preconditions of a step are not met, the tool suite is capable of backward chaining to generate the missing input. So if you ask for an annotation, but all you currently have is the raw MIT text log, the annotation tool will force the generation of the raw XML log (or, rather, an internal representation of it).
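
The backward chaining idea can be sketched in a few lines of Python. This is only a model of the behavior described here, not the tool suite's implementation: the names CHAIN, STEP_NAMES and plan are hypothetical, and the treatment of the force option (start from the earliest log form that is actually present) is an assumption about what chaining "as far as possible" means.

```python
# Log forms in the order in which they are derived from one another.
CHAIN = ["raw_txt", "raw_xml", "ann_xml"]
# The analysis step that produces each derived form.
STEP_NAMES = {"raw_xml": "XML log generation", "ann_xml": "XML annotation"}

def plan(target, available, force=False):
    """Return (starting form, steps to run) needed to obtain `target`.

    Without force, start from the nearest form that is present.
    With force, start from the earliest form that is present.
    """
    idx = CHAIN.index(target)
    candidates = range(0, idx + 1) if force else range(idx, -1, -1)
    for start in candidates:
        if CHAIN[start] in available:
            steps = [STEP_NAMES[form] for form in CHAIN[start + 1:idx + 1]]
            return CHAIN[start], steps
    raise ValueError("no usable log found for " + target)

# Only the raw MIT text log exists: both steps must run.
print(plan("ann_xml", {"raw_txt"}))
# A raw XML log was written earlier: only annotation is needed...
print(plan("ann_xml", {"raw_txt", "raw_xml"}))
# ...unless force is set, which restarts from the raw MIT text log.
print(plan("ann_xml", {"raw_txt", "raw_xml"}, force=True))
```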
 
Flag | Purpose
--force | Causes the tool to backward chain as far as possible, regardless of which intermediate files are present. So if you've provided a raw MIT text log, all the analysis steps will start with this file, no matter whether you've written out intermediate results.
--write_file | Causes the tool to write out intermediate results to an appropriately-named file, using the same naming generalizations that it uses to search for files. The default is not to write files.
--print_output | Causes the tool to print out each intermediate result to standard output. The default is not to print.
--read_cache | Causes the tool to read from a file in the .cache directory, if present. The default is not to consult the cache.
--write_cache | Causes the tool to write intermediate results to a file in the .cache directory. The default is not to write intermediate results to the cache.

The cache exists because of the current speed problems with the default Python XML parser. The cache uses a fairly simple XDR encoding of the XML tree structure. This cache takes a while to write (considerably longer than writing the XML files themselves), but currently seems noticeably faster to load than the raw XML (we haven't done timings, though). If you're doing a great deal of batch processing, it may be valuable to write the cache for the annotated XML log and read from the cache for subsequent validation, summarization and scoring. The cache may go away in the future, if the XML parsing speeds up sufficiently.

Group 3: --help

Prints out the command line and exits.

Group 4: Finding the log directories

The tools in this suite are capable of processing multiple log directories at the same time. Each tool can accept a sequence of directory names as arguments, and it will search hierarchically through those directories to find any directories which meet the qualifications for being log directories. As an example, assume your directory stack looks like this:

mylogs/
  19991120/
    000/
    001/
    002/
    notes/
  19991123/
    000/
    001/
    002/
    003/
    004/
    notes/
  19991124/
    000/
    001/
    002/
    notes/

Assume the notes/ directories don't contain logs, but all the other leaf directories do. If you want to produce raw XML for all the logs and write out the file, all you need to do is this:

% xmlize --write_file mylogs

which is equivalent to

% xmlize --write_file mylogs/19991120 mylogs/19991123 mylogs/19991124

The --start and --end command line arguments allow you to control the begin and end points when you provide a single directory root as an argument. The --start and --end values are treated as subdirectories of the root, and only directory names which are alphanumerically greater than or equal to the start and less than or equal to the end are processed. So if you just want to do the last two days in your directory stack, you could use the following command line:

% xmlize --write_file --start 19991123 mylogs

If you wanted to start with the final log on the first day and end with the first log on the final day, you could use the following command line:

% xmlize --write_file --start 19991120/002 --end 19991124/000 mylogs

If you're using the Hub logger, the alphanumeric order of the directory names in your directory stack is conveniently also the date order.
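
The effect of --start and --end on the example directory stack can be modelled with a short Python sketch. This is a guess at the comparison, not the tools' own code: the function in_range is hypothetical, and comparing only as many path components as the bound supplies (so that a bare day such as 19991123 covers every log under that day) is an assumption.

```python
def in_range(rel_dir, start=None, end=None):
    """Decide whether a log directory (relative to the root) is processed.

    rel_dir, start and end are '/'-separated paths relative to the log root.
    Each bound is compared against the same number of leading components.
    """
    parts = tuple(rel_dir.split("/"))
    if start is not None:
        lo = tuple(start.split("/"))
        if parts[:len(lo)] < lo:
            return False
    if end is not None:
        hi = tuple(end.split("/"))
        if parts[:len(hi)] > hi:
            return False
    return True

# The log directories from the example stack (notes/ directories omitted).
LOG_DIRS = (["19991120/%03d" % i for i in range(3)]
            + ["19991123/%03d" % i for i in range(5)]
            + ["19991124/%03d" % i for i in range(3)])

last_two_days = [d for d in LOG_DIRS if in_range(d, start="19991123")]
trimmed = [d for d in LOG_DIRS
           if in_range(d, start="19991120/002", end="19991124/000")]
print(len(last_two_days))      # 8
print(trimmed[0], trimmed[-1]) # 19991120/002 19991124/000
```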


XML log generation

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xmlize <common flags> [--default_user_version version_string] log_root...

This tool translates MIT logs into XML which conforms to the unannotated portion of the most recent version of the MITRE logfile standard proposal. The file <GC_HOME>/contrib/MITRE/tools/src/xmlizer/log_standard.dtd contains the XML DTD which the result of this utility conforms to. The resulting log can be printed or written to an appropriately-named file using the common flags. All status messages are written to stderr.

The --default_user_version argument allows you to add a version name to the GC_LOG tag; this is normally inherited from the program file if you use LOGFILE_VERSION:, but this allows you to provide a default version for this log if there is no LOGFILE_VERSION: in your program file. The XML annotation tool is sensitive to this version name; if it is present, the name must match one of the GC_LOG_VERSION tag contents specified in the rules file (described below), but if it is absent, no checking will be done.


XML annotation

Once you have a raw XML file, you may annotate it using the XML annotation tool. This tool uses a special rules file.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_annotate <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] log_root...

This tool produces an annotated XML document. The resulting log can be printed or written to an appropriately-named file using the common flags. The raw XML log in each directory must conform to the logfile standard. The rule_file is an XML file describing legal augmentations of the raw XML. You may pass in as many rule files as you wish; the first one whose version matches will be used. The resulting document conforms to the logfile DMA standard. Note that this operation also takes the command line arguments for XML log generation, in case backchaining is required. All status messages are written to stderr.

Here's an example of a rules file (this is actually the current rules file for MITRE's travel demo):

<RULES>
<GC_LOG_VERSION>travel, version 2.0 cfone</GC_LOG_VERSION>

<RULE>
  <GC_MESSAGE name="filelog">
    <GC_DATA key=":synth_log_filename"
             dtype="string"
             new:type_audio_file="system"/>
  </GC_MESSAGE>
</RULE>

<RULE>
  <GC_MESSAGE name="filelog">
    <GC_DATA key=":utt_log_filename"
             dtype="string"
             new:type_audio_file="user"/>
  </GC_MESSAGE>
</RULE>

<RULE occurrences="first">
  <GC_OPERATION name="speak_output"
                new:type_start_task="total"/>
</RULE>

<RULE occurrences="first">
  <GC_OPERATION name="nop"
                new:type_start_task="task">
    <GC_DATA key=":listening_has_begun"/>
  </GC_OPERATION>
</RULE>

<RULE occurrences="last">
  <GC_OPERATION name="nop"
                new:type_end_task="true">
    <GC_DATA key=":playing_has_ended"/>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="paraphrase_reply"
                new:type_new_turn="system"/>
</RULE>

<RULE>
  <GC_OPERATION name="nop"
                new:type_new_turn="user">
    <GC_DATA key=":listening_has_begun"/>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="speak_output">
    <GC_DATA key=":reply_string" dtype="string"
             new:type_utt_text="system"/>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="gather">
    <GC_DATA key=":parse_frame">
      <GC_FRAME>
        <GC_DATA key=":input_string" dtype="string"
          new:type_utt_text="asr"/>
      </GC_FRAME>
    </GC_DATA>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="nop" new:type_end_utt="user">
    <GC_DATA key=":recording_has_ended"/>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="nop" new:type_start_utt="system">
    <GC_DATA key=":playing_has_begun"/>
  </GC_OPERATION>
</RULE>

<RULE>
  <GC_OPERATION name="nop" new:type_end_utt="system">
    <GC_DATA key=":playing_has_ended"/>
  </GC_OPERATION>
</RULE>

</RULES>

Tags

We attempt to describe the rules file here.
 
Tag name | Description | Legal children
RULES | Toplevel tag in the rules file. Exactly one must appear. | RULE, GC_LOG_VERSION, OR
GC_LOG_VERSION | The content of this tag is the log file version this set of rules applies to. This must match the value of the logfile_version attribute of the GC_LOG tag in your raw XML log file, if that attribute is present. This tag may be repeated, in which case the rules are applied if at least one of the versions matches. | -
RULE | An individual rule. The immediate children of this tag must match tags which share an immediate parent. The children of this tag describe the pattern which must match, and the augmentations to apply. This tag may be repeated. It accepts the attributes occurrences="first" and occurrences="last", which apply to the first and last tags to match the pattern, respectively. | GC_TURN, GC_OPERATION, GC_MESSAGE, GC_EVENT, GC_DATA
OR | A disjunction of rules. The first rule that succeeds for a given tag in the raw XML log file satisfies the OR. | RULE

Both tags and attributes may have the special namespace prefixes new: (material to add) or not: (material to ensure is not present), which we'll discuss in a moment. First, let's review the behavior of the rule engine. The rule engine behaves as follows:

In the context of this behavior, the namespace prefixes have the following meaning:
 
Prefix | Tag meaning | Attribute meaning
new: | Add this tag, with its attributes and children, at the end of the list of children of the tag its parent matches. | Add this attribute-value pair to the tag which this attribute's tag matches.
not: | Fail if this tag matches an eligible child. | Fail if this attribute-value pair matches any attribute-value pair on the tag being checked.

For unordered tag children, the interpretation of the not: prefix is straightforward: if any child matches, fail. So the pattern <GC_DATA a="b"/><NOT:GC_DATA a="b"/> will always fail. However, for ordered tag children, the interpretation is a little more subtle: in this case, the eligible children are only those children between the nearest positive matches (or the beginning or end of the list of children). So the pattern <GC_OPERATION a="b"/><NOT:GC_OPERATION a="b"/> may very well match a sequence of children which has exactly one tag in it of the form <GC_OPERATION a="b"/>.

Note that for both prefixes, the closing tags must also bear the prefixes.

The annotator will warn you if you specify a "new" attribute it doesn't know about; it will not fail, however. For known attributes, on the other hand, the annotator enforces certain generalizations about what tags the attributes can be associated with. So attributes which implicitly refer to timestamps must be associated with timestamped tags (GC_OPERATION, GC_MESSAGE, GC_EVENT), while attributes which implicitly refer to data must be associated with GC_DATA tags. The annotator will print a notification if it encounters a rules file which violates these generalizations, and then exit.

A note about GC_TURN

Currently, the GC_TURN tag is created in the raw XML log as a result of the execution of the Hub builtin server operation builtin.increment_utterance. Because there's no requirement that you use this operation, we can't rely on GC_TURN tags. Accordingly, the subsequent summarization, validation and scoring steps rely not on GC_TURN but on the placement of the type_new_turn attribute that results from the annotation step. We anticipate that GC_TURN will eventually vanish from the log standard DTD.

Matching and copying attributes and data

The values of attributes, and tag content, in the RULE patterns are actually Perl regular expressions (which Python also recognizes). These expressions are required to match the entire attribute value or tag content. So if you want to tag any data element whose key starts with ":error" as a type_error_msg, you can write the following rule:

<RULE>
  <GC_DATA key=":error.*" new:type_error_msg="true"/>
</RULE>
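
The whole-value requirement corresponds to what Python's re.fullmatch does, as the following small sketch shows. The helper value_matches is hypothetical; whether the annotator itself calls re.fullmatch or anchors the pattern some other way is not stated in this document.

```python
import re

def value_matches(pattern, value):
    """True only if the pattern consumes the entire attribute value."""
    return re.fullmatch(pattern, value) is not None

print(value_matches(":error.*", ":error_string"))  # True
print(value_matches(":error", ":error_string"))    # False: trailing text left over
print(value_matches(":error.*", "x:error"))        # False: leading text left over
```

This is why the rule above needs the trailing .* in order to cover every key that starts with ":error".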

Perl regular expressions provide a grouping operation (using parentheses) which allows subsequent reference to the match by numeric index. Python provides an extension of this operation by which these groups can be named, as well as numbered. You can use this facility to copy information from one place in the pattern to another (this will be useful, for instance, to record values for the type_prompt attribute signalling a system reprompt). The Python syntax is as follows:

(?P<name>...)

The location to copy the matching text to can be specified using the "old:" namespace, referencing the pattern name. The scope of these names is the entire rule; it is an error to have duplicate names in the patterns, or to reference an undefined pattern name.

So let's say we record the prompted slot in our frames in the :prompted_slot key, as follows:

{c tts
   :output_string "What city would you like to depart from?"
   :prompted_slot "departure_city" }

We'd like to be able to copy the value of :prompted_slot to the value of type_prompt as follows:

<RULE>
  <GC_OPERATION>
    <GC_DATA key=":prompted_slot" dtype="string">
      (?P<slot_name>.*)
    </GC_DATA>
    <GC_DATA key=":output_string" dtype="string"
             new:type_prompt="old:slot_name"/>
  </GC_OPERATION>
</RULE>

The Python extension, regrettably, delimits the pattern name using angle brackets, which are (of course) special in XML. So we actually need to use XML entity references in the pattern:

<RULE>
  <GC_OPERATION>
    <GC_DATA key=":prompted_slot" dtype="string">
      (?P&lt;slot_name&gt;.*)
    </GC_DATA>
    <GC_DATA key=":output_string" dtype="string"
             new:type_prompt="old:slot_name"/>
  </GC_OPERATION>
</RULE>

So the XML reader translates the entities to their textual representations (open and close angle bracket), and the postprocess which digests the rules recognizes the resulting string as a delimited name. This general approach applies to any other characters in Perl regular expressions which are also special in XML.
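
The round trip from entity references to a named group can be reproduced with the standard Python library. The sketch below is illustrative only: it uses xml.etree.ElementTree rather than whatever parser the annotator uses, the helper resolve is hypothetical, and stripping the whitespace around the pattern text is an assumption about how the rule reader treats tag content.

```python
import re
import xml.etree.ElementTree as ET

# A fragment of the rule above, with the angle brackets written as entities.
RULE_FRAGMENT = """<GC_DATA key=":prompted_slot" dtype="string">
  (?P&lt;slot_name&gt;.*)
</GC_DATA>"""

# The XML reader turns &lt; and &gt; back into angle brackets.
pattern_text = ET.fromstring(RULE_FRAGMENT).text.strip()
print(pattern_text)  # (?P<slot_name>.*)

def resolve(attr_value, match):
    """Replace an 'old:name' reference with the text captured under name."""
    prefix = "old:"
    if attr_value.startswith(prefix):
        return match.group(attr_value[len(prefix):])
    return attr_value

# Match the pattern against the logged value of :prompted_slot.
match = re.fullmatch(pattern_text, "departure_city")
print(resolve("old:slot_name", match))  # departure_city
```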

Using order to conditionalize

Let's say you have two operations, start_speaking_A and start_speaking_B, and they can occur in any order, and the first one should indicate the start of system speech. Because annotations aren't added unless the entire rule is matched, you can use order to conditionalize the assignment of system speech onset, as follows:

<RULE>
    <GC_OPERATION name="start_speaking_A" new:type_start_utt="system"/>
    <GC_OPERATION name="start_speaking_B"/>
</RULE>

<RULE>
    <GC_OPERATION name="start_speaking_B" new:type_start_utt="system"/>
    <GC_OPERATION name="start_speaking_A"/>
</RULE>

Using negation

Consider a situation where the system reports beginning and end of audio output, but in some cases fails to report the end of audio output because a call hangup occurred. You can use not: to ensure that the hangup is recognized as the end of audio output just when there's no actual end reported as follows:

<RULE>
  <GC_MESSAGE name="audio_status" new:type_start_utt="system">
    <GC_DATA key=":playback_start"/>
  </GC_MESSAGE>
</RULE>

<RULE>
  <GC_MESSAGE name="audio_status">
    <GC_DATA key=":playback_start"/>
  </GC_MESSAGE>
  <GC_MESSAGE name="audio_status" new:type_end_utt="system">
    <GC_DATA key=":playback_end"/>
  </GC_MESSAGE>
</RULE>

<RULE>
  <GC_MESSAGE name="audio_status">
    <GC_DATA key=":playback_start"/>
  </GC_MESSAGE>
  <NOT:GC_MESSAGE name="audio_status">
    <GC_DATA key=":playback_end"/>
  </NOT:GC_MESSAGE>
  <GC_MESSAGE name="call" new:type_end_utt="system">
    <GC_DATA key=":hangup"/>
  </GC_MESSAGE>
</RULE>
 


XML human annotation stub generation

Sometimes human annotations, such as speech transcription, need to be hand-generated. We do not yet have a GUI tool which will generate these files for you. However, we do provide a stubber, which produces a "seed" containing the SR output as the default transcription, and a place for the judgment of task completion.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_human_stubber <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] \
    [--range [all_turns | overall_task | on_task]] log_root...

This tool produces a stub for human annotations from an annotated XML log. The annotated XML log must conform to the logfile DMA standard. The generated stub conforms to the human annotation logfile standard. This utility writes its stub to standard output (rather than saving it directly). All status messages are written to stderr. When you save the output of this utility, be sure that you respect the filename conventions. Although this utility, like all the others, operates on an entire repository, you'll almost certainly want to use it one log directory at a time.

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required.

Log ranges

This utility also takes the --range command line argument. This argument allows the user to specify what subinterval of the session this utility (and other utilities) applies to. The default is overall_task unless otherwise indicated for the tool.
 
Value | Description
all_turns | All annotated data starting with the first new turn (marked with the type_new_turn attribute) will be considered.
overall_task | Only annotated data between the overall start and end task markers (type_start_task=true, type_start_task=total, type_end_task=true, type_end_task=total), inclusive, will be considered. If a turn is underway when the start marker is encountered, the subset of that turn which follows the marker will be considered.
on_task | Only annotated data between the on_task start and end markers (type_start_task=true, type_start_task=task, type_end_task=true, type_end_task=task), inclusive, will be considered. If a turn is underway when the start marker is encountered, the subset of that turn which follows the marker will be considered.

When you write a human annotations file, you can specify what the annotation is anchored to by turnid or by tidx. Here's an example hand-generated human annotations file from a MITRE log:

<GC_LOG_ANNOTATIONS>
   <GC_SESSION>
      <GC_ANNOT type_task_completion="0"/>
      <GC_DATA type_utt_text="transcription" turnid="0" dtype="string">
         I'd like to fly to Newark
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="1" dtype="string">
         St. Louis
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="2" dtype="string">
         Could I leave from New Orleans
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="3" dtype="string">
         I'd like to leave from New Orleans
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="4" dtype="string">
         I'd like to leave at midnight on Friday
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="5" dtype="string">
         I'd like to leave around dinner on Friday
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="6" dtype="string">
         I want to leave around six _p_m on Friday
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="7" dtype="string">
         Goodbye
      </GC_DATA>
   </GC_SESSION>
</GC_LOG_ANNOTATIONS>

The value of the turnid attribute is anchored to the appropriate turn in the original XML file (and see the note about GC_TURN). The stub output for this same session would look like this:

<GC_LOG_ANNOTATIONS>
  <GC_SESSION>
    <GC_ANNOT type_task_completion="?"/>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="0" tidx="35">
      i+d like to fly to newark
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="1" tidx="81">
      say ends
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="2" tidx="127">
      could i will be from new orleans
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="3" tidx="158">
      i+d like to leave from new orleans
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="4" tidx="211">
      i+d like to leave around midnight on friday
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="5" tidx="261">
      i+d like to leave around dinner on friday
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="6" tidx="296">
      i want to leave at around six p m on friday
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" dtype="string" turnid="7" tidx="341">
      goodbye
    </GC_DATA>
  </GC_SESSION>
</GC_LOG_ANNOTATIONS>
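
Files in this format can be read with any XML parser. As a minimal sketch (not part of the tool suite), the following Python code pulls the transcriptions and the task completion judgment out of a shortened copy of the hand-generated file shown earlier. The function names are hypothetical, and the code assumes every transcription carries a turnid attribute; a file anchored only by tidx would need a different key.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the hand-generated annotations file shown earlier.
SAMPLE = """<GC_LOG_ANNOTATIONS>
  <GC_SESSION>
    <GC_ANNOT type_task_completion="0"/>
    <GC_DATA type_utt_text="transcription" turnid="0" dtype="string">
      I'd like to fly to Newark
    </GC_DATA>
    <GC_DATA type_utt_text="transcription" turnid="1" dtype="string">
      St. Louis
    </GC_DATA>
  </GC_SESSION>
</GC_LOG_ANNOTATIONS>"""

def read_transcriptions(xml_text):
    """Map each turnid to its transcription, with whitespace trimmed."""
    root = ET.fromstring(xml_text)
    result = {}
    for data in root.iter("GC_DATA"):
        if data.get("type_utt_text") == "transcription":
            result[int(data.get("turnid"))] = (data.text or "").strip()
    return result

def read_task_completion(xml_text):
    """Return the type_task_completion value, or None if there is no GC_ANNOT."""
    annot = ET.fromstring(xml_text).find("./GC_SESSION/GC_ANNOT")
    return None if annot is None else annot.get("type_task_completion")

print(read_transcriptions(SAMPLE))
print(read_task_completion(SAMPLE))
```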


Unifying automatic and human XML files

There's no requirement that human annotations be placed in a separate file; it's just sometimes easier to store the human annotations separately. All the tools in this package will operate reliably if no human annotations file is present; of course, in those circumstances, the appropriate human annotations need to be in the annotated XML file in order for summarization, validation and scoring to work appropriately. If, for some reason, you wish to merge a human annotations file into an annotated XML file, you can use this utility.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_unify <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] log_root...

This tool produces an annotated XML log which contains all the auxiliary human annotations found in a human annotations file in the same directory. The annotated XML log must conform to the logfile DMA standard. Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber).

As usual, in order to print out the result, use the --print_output common flag. In order to save the file, use the --write_file common flag. Note that the human annotations file is not removed. This utility adds the attribute human_annotations_included="1" to the output file, which causes utilities which use the annotated XML files to ignore the human annotations file.


XML summarization

Once you have your annotated XML file, you can summarize it using this tool. This tool will work for any conformant XML file, whether or not it was generated using the previous tools. If you have a separate human annotations file, it will be folded in here.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_summarize <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] \
    [--range [all_turns | overall_task | on_task]] log_root...

This tool summarizes an annotated XML log. The annotated XML log must conform to the logfile DMA standard. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber). This utility writes its summary to standard output. All status messages are written to stderr.

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required. In addition, this utility allows you to specify a log range.

Example

Here's an example of a MITRE log summarized using this method. Notice that this summary includes references to the audio files, which are annotated using the optional type_audio_file="user" and type_audio_file="system".

% xml_summarize --force --rule_base travel_2_0_rules.xml MITRE/19991101/004
Checking: MITRE/19991101/004
Reading raw Hub log: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...read.
Converting to XML: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...converted.
Reading rule file: travel_2_0_rules.xml
Ignoring Comment element
Ignoring Comment element
...read.
Applying rules: travel_2_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...resegmented.
Mon Nov 1 1999 at 17:09:57.40: Task-specific portion started.
Mon Nov 1 1999 at 17:10:08.83: Overall task started.
Mon Nov 1 1999 at 17:11:58.32: Task-specific portion and overall task ended.
Task completion status: not completed.
 
 

Turn 0 (system)
Mon Nov 1 1999 at 17:09:57.36 to Mon Nov 1 1999 at 17:09:57.38: New system turn began.

System said: Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for system development. You may hang up or ask for help at any time. How can I help you?
Mon Nov 1 1999 at 17:09:58.43: System started speaking.
System audio file: travel_cfone-19991101-004-synth--01-001.wav
Mon Nov 1 1999 at 17:10:08.66: System finished speaking.

Turn 1 (user)
Mon Nov 1 1999 at 17:10:08.83: New user turn began.

Mon Nov 1 1999 at 17:10:09.36: User started speaking.
User audio file: travel_cfone-19991101-004-000.wav
Mon Nov 1 1999 at 17:10:12.77: User finished speaking.
Recognizer heard: i+d like to fly to newark
User said: I'd like to fly to Newark

Turn 2 (system)
Mon Nov 1 1999 at 17:10:17.64 to Mon Nov 1 1999 at 17:10:17.66: New system turn began.

System said: What city does the flight depart from?
Mon Nov 1 1999 at 17:10:18.72: System started speaking.
System audio file: travel_cfone-19991101-004-synth-000-001.wav
Mon Nov 1 1999 at 17:10:20.83: System finished speaking.

Turn 3 (user)
Mon Nov 1 1999 at 17:10:21.03: New user turn began.

Mon Nov 1 1999 at 17:10:21.80: User started speaking.
User audio file: travel_cfone-19991101-004-001.wav
Mon Nov 1 1999 at 17:10:24.14: User finished speaking.
Recognizer heard: say ends
User said: St. Louis

Turn 4 (system)
Mon Nov 1 1999 at 17:10:30.03 to Mon Nov 1 1999 at 17:10:30.05: New system turn began.

System said: i am sorry i did not understand that
Mon Nov 1 1999 at 17:10:30.95: System started speaking.
System audio file: travel_cfone-19991101-004-synth-001-001.wav
Mon Nov 1 1999 at 17:10:33.43: System finished speaking.

Turn 5 (user)
Mon Nov 1 1999 at 17:10:33.61: New user turn began.

Mon Nov 1 1999 at 17:10:34.66: User started speaking.
User audio file: travel_cfone-19991101-004-002.wav
Mon Nov 1 1999 at 17:10:37.89: User finished speaking.
Recognizer heard: could i will be from new orleans
User said: Could I leave from New Orleans

Turn 6 (system)
Mon Nov 1 1999 at 17:10:44.22 to Mon Nov 1 1999 at 17:10:44.24: New system turn began.

System said: could you please repeat that
Mon Nov 1 1999 at 17:10:45.13: System started speaking.
System audio file: travel_cfone-19991101-004-synth-002-001.wav
Mon Nov 1 1999 at 17:10:46.91: System finished speaking.

Turn 7 (user)
Mon Nov 1 1999 at 17:10:47.14: New user turn began.

Mon Nov 1 1999 at 17:10:47.92: User started speaking.
User audio file: travel_cfone-19991101-004-003.wav
Mon Nov 1 1999 at 17:10:51.36: User finished speaking.
Recognizer heard: i+d like to leave from new orleans
User said: I'd like to leave from New Orleans

Turn 8 (system)
Mon Nov 1 1999 at 17:10:54.50 to Mon Nov 1 1999 at 17:10:54.53: New system turn began.

System said: Can you provide the approximate departure or arrival time?
Mon Nov 1 1999 at 17:10:55.96: System started speaking.
System audio file: travel_cfone-19991101-004-synth-003-001.wav
Mon Nov 1 1999 at 17:10:59.54: System finished speaking.

Turn 9 (user)
Mon Nov 1 1999 at 17:10:59.73: New user turn began.

Mon Nov 1 1999 at 17:11:00.51: User started speaking.
User audio file: travel_cfone-19991101-004-004.wav
Mon Nov 1 1999 at 17:11:04.49: User finished speaking.
Recognizer heard: i+d like to leave around midnight on friday
User said: I'd like to leave at midnight on Friday

Turn 10 (system)
Mon Nov 1 1999 at 17:11:10.75 to Mon Nov 1 1999 at 17:11:10.76: New system turn began.

System said: i am sorry i could not understand you
Mon Nov 1 1999 at 17:11:11.64: System started speaking.
System audio file: travel_cfone-19991101-004-synth-004-001.wav
Mon Nov 1 1999 at 17:11:13.99: System finished speaking.

Turn 11 (user)
Mon Nov 1 1999 at 17:11:14.17: New user turn began.

Mon Nov 1 1999 at 17:11:14.72: User started speaking.
User audio file: travel_cfone-19991101-004-005.wav
Mon Nov 1 1999 at 17:11:18.88: User finished speaking.
Recognizer heard: i+d like to leave around dinner on friday
User said: I'd like to leave around dinner on Friday

Turn 12 (system)
Mon Nov 1 1999 at 17:11:24.38 to Mon Nov 1 1999 at 17:11:24.40: New system turn began.

System said: please try rephrasing what you said
Mon Nov 1 1999 at 17:11:25.29: System started speaking.
System audio file: travel_cfone-19991101-004-synth-005-001.wav
Mon Nov 1 1999 at 17:11:27.43: System finished speaking.

Turn 13 (user)
Mon Nov 1 1999 at 17:11:27.62: New user turn began.

Mon Nov 1 1999 at 17:11:28.40: User started speaking.
User audio file: travel_cfone-19991101-004-006.wav
Mon Nov 1 1999 at 17:11:32.87: User finished speaking.
Recognizer heard: i want to leave at around six p m on friday
User said: I want to leave around six _p_m on Friday

Turn 14 (system)
Mon Nov 1 1999 at 17:11:38.85 to Mon Nov 1 1999 at 17:11:38.89: New system turn began.

System said: I have no information about a flight from New Orleans that depart to Newark that depart friday around 6 o'clock p m .
Mon Nov 1 1999 at 17:11:40.08: System started speaking.
System audio file: travel_cfone-19991101-004-synth-006-001.wav
Mon Nov 1 1999 at 17:11:47.26: System finished speaking.

Turn 15 (user)
Mon Nov 1 1999 at 17:11:47.46: New user turn began.

Mon Nov 1 1999 at 17:11:48.50: User started speaking.
User audio file: travel_cfone-19991101-004-007.wav
Mon Nov 1 1999 at 17:11:50.62: User finished speaking.
Recognizer heard: goodbye
User said: Goodbye

Turn 16 (system)
Mon Nov 1 1999 at 17:11:53.47 to Mon Nov 1 1999 at 17:11:53.49: New system turn began.

System said: Good bye. Thank you for using Mitre's Travel demonstration.
Mon Nov 1 1999 at 17:11:54.40: System started speaking.
System audio file: travel_cfone-19991101-004-synth-007-001.wav
Mon Nov 1 1999 at 17:11:58.32: System finished speaking.
 
 


XML log validation

Once you have your annotated XML file, you can validate it using this tool. This tool will work for any conformant XML file, whether or not it was generated using the previous tools. If you have a separate human annotations file, it will be folded in here.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_log_validate <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] log_root...

This tool validates an annotated XML log. The annotated XML log must conform to the logfile DMA standard. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber). This utility writes its summary to standard output. All status messages are written to stderr.

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required.

This utility now analyzes all turns. It reports which turns lie outside the task boundaries, as well as more specific information about how to locate misplaced elements.

Example

Here's an example of a MITRE log validated using this method. Notice that this log is not completely valid; we have not yet completed writing our rules to catch all the appropriate termination conditions.

% xml_log_validate --force --rule_base travel_2_0_rules.xml MITRE/19991101/004
Checking: MITRE/19991101/004
Reading raw Hub log: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...read.
Converting to XML: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...converted.
Reading rule file: travel_2_0_rules.xml
Ignoring Comment element
Ignoring Comment element
...read.
Applying rules: travel_2_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...resegmented.

#
# Log MITRE/19991101/004:
#

Global elements:
 

No errors found.

Possibly misplaced turns:

Turn 17 (turnid 8, tidx 365) is completely outside the overall task boundaries.

Turn 0 (system)
 

No errors found.

Turn 1 (user)
 

No errors found.

Turn 2 (system)
 

No errors found.

Turn 3 (user)
 

No errors found.

Turn 4 (system)
 

No errors found.

Turn 5 (user)
 

No errors found.

Turn 6 (system)
 

No errors found.

Turn 7 (user)
 

No errors found.

Turn 8 (system)
 

No errors found.

Turn 9 (user)
 

No errors found.

Turn 10 (system)
 

No errors found.

Turn 11 (user)
 

No errors found.

Turn 12 (system)
 

No errors found.

Turn 13 (user)
 

No errors found.

Turn 14 (system)
 

No errors found.

Turn 15 (user)
 

No errors found.

Turn 16 (system)
 

No errors found.

Turn 17 (user)

Error for one of the attributes type_start_utt="user":
At least one required.
 


XML scoring

Once you have your annotated XML file, you can score it using this tool. This tool will work for any conformant XML file, whether or not it was generated using the previous tools. If you have a separate human annotations file, it will be folded in here.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_score <common flags> \
    [--default_user_version version_string] [--format html | csv] [--rule_base rule_file] \
    [--range [all_turns | overall_task | on_task]] [--ref_postprocess script] log_root...

This tool scores an annotated XML log. The annotated XML log must conform to the logfile DMA standard. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber). This utility writes its score output to standard output. All status messages are written to stderr. The default format is HTML output; --format csv will produce comma-delimited output suitable for loading into a spreadsheet (no formula cells, unfortunately).

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required. In addition, this utility allows you to specify a log range. The default in this case is to report all intervals separately, rather than just the total task interval.

The scorer now produces all the automatically computable DMAs except word error rate; for that metric, the xml_nist_batch tool generates input for the NIST scoring package. Output is available as HTML tables or as comma-delimited flat files for spreadsheet input.

Postprocessing scripts

This utility allows you to specify a script to postprocess the user transcriptions via --ref_postprocess. This script should be a normal Unix executable, of any kind. It can be used to strip out transcription markup in preparation for scoring. The executable should loop, reading lines from stdin and writing the altered string to stdout, including a trailing newline. Be sure to flush stdout after each write. See the extraction of landmarks for examples.

If no postprocess is provided, any data corresponding to user transcriptions will be postprocessed according to the NIST transcription guidelines for the Communicator program. This postprocess removes any tokens delimited by square brackets ([ ]), since they are intended to indicate noise under the guidelines. If you provide your own postprocess, this step will be skipped, so be sure to duplicate this behavior in your own script.
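
As a rough illustration of the protocol described above, here is a minimal sketch of a --ref_postprocess script that duplicates the bracket-stripping default. It is written for a current Python 3 interpreter rather than the Python shipped at the time of this distribution. The function name strip_noise and the regular expression are our own assumptions about what counts as a bracket-delimited token; neither is taken from the tool suite.

```python
#!/usr/bin/env python3
"""Hypothetical --ref_postprocess filter: drops [bracketed] noise tokens."""
import re
import sys

# Assumption: a noise token is any run of text between "[" and the next "]".
NOISE = re.compile(r"\[[^\]]*\]")


def strip_noise(line):
    """Remove bracket-delimited tokens and collapse the leftover whitespace."""
    return " ".join(NOISE.sub(" ", line).split())


def main(stdin=sys.stdin, stdout=sys.stdout):
    # readline() returns "" only at end of input, so a blank input line
    # is answered with a blank output line instead of ending the loop.
    for line in iter(stdin.readline, ""):
        stdout.write(strip_noise(line) + "\n")
        stdout.flush()  # the caller expects one line back per line sent


if __name__ == "__main__":
    main()
```

Saved as an executable file, a script like this would be named on the command line with --ref_postprocess.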

Example

Here's the command line for scoring a set of MITRE logs, and the corresponding HTML tables which result. Note the use of multiple rule files: the two logs processed here use different program file formats and thus have different log versions.

% xml_score --force --default_user_version "travel, version 1" \
    --rule_base travel_1_0_rules.xml --rule_base travel_2_0_rules.xml MITRE
Checking: MITRE
Checking: MITRE/19991101
Checking: MITRE/19991101/004
Checking: MITRE/19990923
Checking: MITRE/19990923/000
Reading raw Hub log: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...read.
Converting to XML: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...converted.
Reading rule file: travel_1_0_rules.xml
...read.
Reading rule file: travel_2_0_rules.xml
Ignoring Comment element
Ignoring Comment element
...read.
Applying rules: travel_1_0_rules.xml
...wrong version.
Applying rules: travel_2_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...resegmented.
Reading raw Hub log: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...read.
Converting to XML: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...converted.
Applying rules: travel_1_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...resegmented.

Complete sessions (on task)

Session | Duration (secs) | Turns in interval | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19990923/000 | 58.71 | 8 | 1 | 0 | 5.35/0.1189 | 21 | 62 | 0.25 | 0
Overall | 58.71 | 8.00 | 1.00 | 0.00 | 5.35/0.1189 | 21 | 62 | 0.25 | 0.00

Complete sessions (total task)

Session | Duration (secs) | Turns in interval | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19990923/000 | 70.35 | 9 | 1 | 0 | 5.35/0.1189 | 21 | 92 | 0.20 | 0
Overall | 70.35 | 9.00 | 1.00 | 0.00 | 5.35/0.1189 | 21 | 92 | 0.20 | 0.00

Complete sessions (all turns)

Session | Turns in interval | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19990923/000 | 10 | 1 | 0 | 5.35/0.1189 | 21 | 92 | 0.20 | 0
Overall | 10.00 | 1.00 | 0.00 | 5.35/0.1189 | 21 | 92 | 0.20 | 0.00

Incomplete sessions (on task)

Session | Duration (secs) | Turns in interval | Error messages | Help messages | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19991101/004 | 120.92 | 1 | 0 | 0 | 0 | 30 | 0.00 | 0
Overall | 120.92 | 1.00 | 0.00 | 0.00 | 0 | 30 | 0.00 | 0.00

Incomplete sessions (total task)

Session | Duration (secs) | Turns in interval | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19991101/004 | 109.49 | 17 | 0 | 0 | 6.14/1.4842 | 47 | 105 | 0.22 | 0
Overall | 109.49 | 17.00 | 0.00 | 0.00 | 6.14/1.4842 | 47 | 105 | 0.22 | 0.00

Incomplete sessions (all turns)

Session | Turns in interval | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts
MITRE/19991101/004 | 18 | 0 | 0 | 6.14/1.4842 | 47 | 105 | 0.22 | 0
Overall | 18.00 | 0.00 | 0.00 | 6.14/1.4842 | 47 | 105 | 0.22 | 0.00

All sessions (on task)

Session | Task completed | Duration (secs) | Turns in interval | Mean user words per turn (noise stripped) | Mean system words per turn | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts | Mean system utterance duration (secs) | Mean system turn duration (secs) | Mean system turn silence (secs) | System turn silence pct
MITRE/19990923/000 | 1 | 58.71 | 8 | 5.25 | 15.50 | 1 | 0 | 5.35/0.1189 | 21 | 62 | 0.25 | 0 | 5.35 | 5.35 | 0.00 | 0.00
MITRE/19991101/004 | 0 | 120.92 | 1 | - | 30.00 | 0 | 0 | - | 0 | 30 | 0.00 | 0 | 10.23 | 10.23 | 0.00 | 0.00
Overall | 0.50 | 89.82 | 4.50 | 5.25 | 18.40 | 0.50 | 0.00 | 5.35/0.1189 | 21 | 92 | 0.20 | 0.00 | 6.33 | 6.33 | 0.00 | 0.00

All sessions (total task)

Session | Task completed | Duration (secs) | Turns in interval | Mean user words per turn (noise stripped) | Mean system words per turn | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts | Mean system utterance duration (secs) | Mean system turn duration (secs) | Mean system turn silence (secs) | System turn silence pct
MITRE/19990923/000 | 1 | 70.35 | 9 | 5.25 | 18.40 | 1 | 0 | 5.35/0.1189 | 21 | 92 | 0.20 | 0 | 6.33 | 6.33 | 0.00 | 0.00
MITRE/19991101/004 | 0 | 109.49 | 17 | 5.88 | 11.67 | 0 | 0 | 6.14/1.4842 | 47 | 105 | 0.22 | 0 | 3.97 | 3.97 | 0.00 | 0.00
Overall | 0.50 | 89.92 | 13.00 | 5.67 | 14.07 | 0.50 | 0.00 | 5.88/1.1692 | 68 | 197 | 0.21 | 0.00 | 4.82 | 4.82 | 0.00 | 0.00

All sessions (all turns)

Session | Task completed | Turns in interval | Mean user words per turn (noise stripped) | Mean system words per turn | Error messages | Help messages | Response latency (mean in secs/variance) | User words (noise stripped) | System words | Prompt percentage | Number of reprompts | Mean system utterance duration (secs) | Mean system turn duration (secs) | Mean system turn silence (secs) | System turn silence pct
MITRE/19990923/000 | 1 | 10 | 5.25 | 18.40 | 1 | 0 | 5.35/0.1189 | 21 | 92 | 0.20 | 0 | 6.33 | 6.33 | 0.00 | 0.00
MITRE/19991101/004 | 0 | 18 | 5.88 | 11.67 | 0 | 0 | 6.14/1.4842 | 47 | 105 | 0.22 | 0 | 3.97 | 3.97 | 0.00 | 0.00
Overall | 0.50 | 14.00 | 5.67 | 14.07 | 0.50 | 0.00 | 5.88/1.1692 | 68 | 197 | 0.21 | 0.00 | 4.82 | 4.82 | 0.00 | 0.00


Batch generation of NIST sclite input

Once you have your annotated XML file, you can generate input for the NIST sclite scoring tool using the xml_nist_batch tool. This tool will work for any conformant XML file, whether or not it was generated using the previous tools. If you have a separate human annotations file, it will be folded in here.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_nist_batch <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] \
    [--hyp_postprocess script] [--ref_postprocess script] [--wav_postprocess script] \
    [--outdir file_location] [--outprefix file_prefix] \
    [--range [all_turns | overall_task | on_task]] log_root...

This tool generates three files, given an annotated XML log. The annotated XML log must conform to the logfile DMA standard. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber).

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required. In addition, this utility allows you to specify a log range.

This utility saves three files: a speech hypothesis file, a speech transcription file, and (possibly) a file listing audio files. The --outprefix argument names the file prefix, if present. The default file prefix is sr. The three files are named:

These files are written to:

If any of the arguments --hyp_postprocess, --ref_postprocess, or --wav_postprocess is present, the hypothesis, transcription, or audio file pathname will be passed through the executable named by the corresponding argument. These executables are identical in form to the postprocessing scripts described above. The default for --ref_postprocess is the same as in xml_score; the default for the others is to do no postprocessing.

The hypothesis and transcription files can be passed to the sclite utility as follows:

% xml_nist_batch --force --default_user_version "travel, version 1" \
      --rule_base travel_1_0_rules.xml --rule_base travel_2_0_rules.xml \
      --range all_turns MITRE
Checking: MITRE
Checking: MITRE/19991101
Checking: MITRE/19991101/004
Checking: MITRE/19990923
Checking: MITRE/19990923/000
Reading raw Hub log: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...read.
Converting to XML: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...converted.
Reading rule file: travel_1_0_rules.xml
...read.
Reading rule file: travel_2_0_rules.xml
Ignoring Comment element
Ignoring Comment element
...read.
Applying rules: travel_1_0_rules.xml
...wrong version.
Applying rules: travel_2_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...resegmented.
Reading raw Hub log: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...read.
Converting to XML: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...converted.
Applying rules: travel_1_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...resegmented.
% sclite -r MITRE/sr.ref -h MITRE/sr.hyp -i rm
sclite: 2.2 TK Version 1.2
Begin alignment of Ref File: 'MITRE/sr.ref' and Hyp File: 'MITRE/sr.hyp'
    Alignment# 4 for speaker travel_cfone19990923000hublogannotated
    Alignment# 8 for speaker travel_cfone19991101004hublogannotated
 
 
 

                     SYSTEM SUMMARY PERCENTAGES by SPEAKER

,------------------------------------------------------------------------------------------------.
|                                          MITRE/sr.hyp                                          |
|------------------------------------------------------------------------------------------------|
| SPKR                                   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|----------------------------------------+-------------+-----------------------------------------|
| travel_cfone19990923000hublogannotated |    4     21 | 76.2   23.8    0.0    9.5   33.3  100.0 |
|----------------------------------------+-------------+-----------------------------------------|
| travel_cfone19991101004hublogannotated |    8     47 | 80.9   19.1    0.0    6.4   25.5   87.5 |
|================================================================================================|
| Sum/Avg                                |   12     68 | 79.4   20.6    0.0    7.4   27.9   91.7 |
|================================================================================================|
|                  Mean                  |  6.0   34.0 | 78.5   21.5    0.0    8.0   29.4   93.8 |
|                  S.D.                  |  2.8   18.4 |  3.3    3.3    0.0    2.2    5.5    8.8 |
|                 Median                 |  6.0   34.0 | 78.5   21.5    0.0    8.0   29.4   93.8 |
`------------------------------------------------------------------------------------------------'

Successful Completion
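
The percentages in the summary above can be checked by hand: the Err column is the sum of substitutions, deletions, and insertions divided by the number of reference words. The sketch below redoes that arithmetic. The report shown here does not print raw counts, so the counts in the sketch are back-computed from the percentages and should be read as an assumption.

```python
def error_rate(subs, dels, ins, ref_words):
    """Word error rate in percent: (S + D + I) / N * 100."""
    return 100.0 * (subs + dels + ins) / ref_words


# (substitutions, deletions, insertions, reference words) per speaker.
# Assumption: counts inferred from the percentages in the table above.
counts = {
    "travel_cfone19990923000hublogannotated": (5, 0, 2, 21),
    "travel_cfone19991101004hublogannotated": (9, 0, 3, 47),
}

for speaker, (s, d, i, n) in counts.items():
    print("%-40s %5.1f" % (speaker, error_rate(s, d, i, n)))

# The Sum/Avg row pools the counts; it does not average the two percentages.
totals = [sum(column) for column in zip(*counts.values())]
print("%-40s %5.1f" % ("Sum/Avg", error_rate(*totals)))
```

Pooling the counts gives the Sum/Avg figure, while the Mean row of the report is the plain average of the two per-speaker percentages.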


Extraction of landmarks

Once you have your annotated XML file, you can extract the values of arbitrary landmarks using the xml_extract_landmarks tool. This tool will work for any conformant XML file, whether or not it was generated using the previous tools. If you have a separate human annotations file, it will be folded in here.

Command line

<GC_HOME>/contrib/MITRE/tools/bin/xml_extract_landmarks <common flags> \
    [--default_user_version version_string] [--rule_base rule_file] \
    --landmark attr=val[:script] [--landmark attr=val[:script]]* \
    [--outdir file_location] [--outprefix file_prefix] log_root...

For each annotated XML log, this tool generates a file containing the specified landmarks. The annotated XML log must conform to the logfile DMA standard. If a human annotation file is present in the log directory, it must conform to the human annotation logfile standard (you can generate a stub for this file using the human stubber).

Note that this operation also takes the command line arguments for XML log generation and annotation, in case backchaining is required. In addition, this utility allows you to specify a log range.

You must specify at least one landmark. Each landmark is specified as shown in the command line; so if you want to extract all instances of system text, you'd specify --landmark type_utt_text=system. If you'd like the landmark postprocessed before it's written out, you can specify a postprocess executable.
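
To make the attr=val[:script] syntax concrete, the following sketch splits a landmark specification into its parts. It only illustrates the syntax: parse_landmark is a hypothetical helper, and the tool's own argument handling may differ (for example, in how it treats a colon inside the value).

```python
def parse_landmark(spec):
    """Split 'attr=val[:script]' into (attr, val, script or None)."""
    attr, sep, rest = spec.partition("=")
    # Assumption: the first colon after '=' separates value from script.
    val, _, script = rest.partition(":")
    if not (sep and attr and val):
        raise ValueError("expected attr=val[:script], got %r" % spec)
    return attr, val, script or None


print(parse_landmark("type_utt_text=system"))
print(parse_landmark("type_utt_text=transcription:/tmp/add_u.py"))
```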

These files are written to:

Each file is named <file_prefix>-<annotated_XML_filename>.txt. The default file prefix is landmark-<UTC_time>. Because the annotated XML filename is used to differentiate between sessions, this utility will probably not work well if the names of your annotated XML files are all the same (e.g., if you've overridden the defaults and named all your files annotated.xml).
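
One way to spot this problem before running the tool is to look for annotated XML files that share a basename. The sketch below is not part of the tool suite: find_name_clashes is a hypothetical helper, and the *-annotated.xml pattern is a guess based on the file names that appear in the examples in this document.

```python
import collections
import glob
import os


def find_name_clashes(paths):
    """Return {basename: [paths]} for basenames that occur more than once."""
    by_name = collections.defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


if __name__ == "__main__":
    # Assumption: annotated logs match this pattern somewhere under MITRE/.
    candidates = glob.glob("MITRE/**/*-annotated.xml", recursive=True)
    for name, found in sorted(find_name_clashes(candidates).items()):
        print("%s appears in: %s" % (name, ", ".join(sorted(found))))
```

If the helper reports nothing, each session's landmark file should get a distinct name.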

Here's an example of collecting alternating user and system text using this utility. The scripts add_u.py and add_s.py add the prefix "U: " and "S: ", respectively, to each line. The scripts are displayed at the end of the example.

% xml_extract_landmarks --landmark type_utt_text=transcription:/tmp/add_u.py \
    --landmark type_utt_text=system:/tmp/add_s.py --force \
    --default_user_version "travel, version 1" \
    --rule_base travel_1_0_rules.xml --rule_base travel_2_0_rules.xml \
    --outprefix landmark MITRE
Checking: MITRE
Checking: MITRE/19991101
Checking: MITRE/19991101/004
Checking: MITRE/19990923
Checking: MITRE/19990923/000
Reading raw Hub log: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...read.
Converting to XML: MITRE/19991101/004/travel_cfone-19991101-004-hublog.txt
...converted.
Reading rule file: travel_1_0_rules.xml
...read.
Reading rule file: travel_2_0_rules.xml
Ignoring Comment element
Ignoring Comment element
...read.
Applying rules: travel_1_0_rules.xml
...wrong version.
Applying rules: travel_2_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19991101/004/travel_cfone-19991101-004-hublog-human.xml
...resegmented.
Reading raw Hub log: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...read.
Converting to XML: MITRE/19990923/000/travel_cfone-19990923-000-hublog.txt
...converted.
Applying rules: travel_1_0_rules.xml
...succeeded.
Reading human XML annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...read.
Resegmenting...
Incorporating human annotations: MITRE/19990923/000/travel_cfone-19990923-000-hublog-human.xml
...resegmented.
% more MITRE/landmark-*
::::::::::::::
MITRE/landmark-travel_cfone-19990923-000-hublog-annotated.txt
::::::::::::::
S: Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
system development. You may hang up or ask for help at any time. How can I help
you?
U: i+d like an flight from boston to detroit
S: Can you provide the approximate departure or arrival time?
U: i+d like to leave at seven a m
S: north west flight 381 departs Boston at seven thirty a m  and arrives Detroit
 at nine forty nine a m . Is there something else I can do for you?
U: is that flight serve breakfast
S: That flight does not serve breakfast. What else would you like to know?
U: goodbye  now
S: I heard your words, but I can't understand them correctly.
::::::::::::::
MITRE/landmark-travel_cfone-19991101-004-hublog-annotated.txt
::::::::::::::
S: Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for
system development. You may hang up or ask for help at any time. How can I help
you?
U: I'd like to fly to Newark
S: What city does the flight depart from?
U: St. Louis
S: i am sorry i did not understand that
U: Could I leave from New Orleans
S: could you please repeat that
U: I'd like to leave from New Orleans
S: Can you provide the approximate departure or arrival time?
U: I'd like to leave at midnight on Friday
S: i am sorry i could not understand you
U: I'd to leave around dinner on Friday
S: please try rephrasing what you said
U: I want to leave around six _p_m on Friday
S: I have no information about a flight from New Orleans that depart to Newark t
hat depart friday around 6 o'clock p m .
U: Goodbye
S: Good bye. Thank you for using Mitre's Travel demonstration.
% more /tmp/add_u.py /tmp/add_s.py
::::::::::::::
add_u.py
::::::::::::::
#!/usr/bin/python

import string, sys

r = string.strip(sys.stdin.readline())
while r:
    print "U:", r
    sys.stdout.flush()
    r = string.strip(sys.stdin.readline())
::::::::::::::
add_s.py
::::::::::::::
#!/usr/bin/python

import string, sys

r = string.strip(sys.stdin.readline())
while r:
    print "S:", r
    sys.stdout.flush()
    r = string.strip(sys.stdin.readline())
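
The two scripts above use the print statement and the string module, so they will not run under Python 3. They also stop at the first blank input line, because the loop tests the stripped line rather than the raw one. A minimal sketch of a Python 3 equivalent follows; the name add_prefix and the PREFIX constant are our own, and this version has not been run against the tool suite.

```python
#!/usr/bin/env python3
import sys

PREFIX = "U:"  # use "S:" in the system-side copy of the script


def add_prefix(prefix, stdin, stdout):
    """Echo each input line with the prefix, one line out per line in."""
    # readline() returns "" only at end of input, so blank lines pass through.
    for line in iter(stdin.readline, ""):
        stdout.write("%s %s\n" % (prefix, line.strip()))
        stdout.flush()


if __name__ == "__main__":
    add_prefix(PREFIX, sys.stdin, sys.stdout)
```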


Last updated August 31, 2000