License / Documentation home / Help and feedback |
In order to illustrate how to assemble an end-to-end system, we've constructed an example which operates according to a preset sequence of messages. Below, we describe the configuration of servers, how to run them, and what they illustrate.
The file is a list of lists. Each inner list corresponds to the consequences of a single user input (either "typed input" or "spoken input") and is a sequence of frames. The name of each frame corresponds to a processing stage, and each processing stage recognizes a fixed set of key-value pairs. The steps and keys are listed in the table that follows the file contents:
( ( {c dialogue_output
       :frame {c greeting } }
    {c generator_output
       :output_string "Welcome to Communicator. How may I help you?" }
    {c synthesizer_output
       :sample_rate 8000
       :encoding_format "linear16"
       :num_samples 14520 } )

  ( {c audio_input
       :sample_rate 8000
       :encoding_format "linear16"
       :num_samples 16560 }
    {c text_input
       :input_string "I WANT TO FLY TO LOS ANGELES" }
    {c recognizer_output
       :input_string "I WANT TO FLY LOS ANGELES" }
    {c parser_output
       :frame {c flight
                 :destination "LOS ANGELES" } }
    {c dialogue_output
       :frame {c query_departure } }
    {c generator_output
       :output_string "Where are you traveling from?" }
    {c synthesizer_output
       :sample_rate 8000
       :encoding_format "linear16"
       :num_samples 9560 } )

  ( {c audio_input
       :sample_rate 8000
       :encoding_format "linear16"
       :num_samples 4580 }
    {c text_input
       :input_string "BOSTON" }
    {c recognizer_output
       :input_string "BOSTON" }
    {c parser_output
       :frame {c flight
                 :city "BOSTON" } }
    {c backend_query
       :sql_query "select airline, flight_number, departure_datetime from flight_table where departure_airport = 'BOS' and arrival_airport = 'LAX'" }
    {c backend_output
       :column_names ( "airline" "flight_number" "departure_datetime" )
       :nfound 2
       :values ( ( "AA" "115" "1144" )
                 ( "UA" "436" "1405" ) ) }
    {c dialogue_output
       :frame {c db_result
                 :column_names ( "airline" "flight_number" "departure_datetime" )
                 :tuples ( ( "AA" "115" "1144" )
                           ( "UA" "436" "1405" ) ) } }
    {c generator_output
       :output_string "American Airlines flight 115 leaves at 11:44 AM, and United flight 436 leaves at 2:05 PM" }
    {c synthesizer_output
       :sample_rate 8000
       :encoding_format "linear16"
       :num_samples 35068 } ) )
Step | Frame name | keys |
Audio gesture from the Audio server to Recognizer | audio_input | :sample_rate (integer), :encoding_format (string), :num_samples (integer) |
Text gesture from UI server to Parser | text_input | :input_string (string) |
From Recognizer to Parser | recognizer_output | :input_string (string) |
From Parser to Dialogue | parser_output | :frame (frame) |
Query from Dialogue to Backend | backend_query | :sql_query (string) |
Result of query to Backend | backend_output | :column_names (list of strings), :nfound (integer), :values (list of lists of strings) |
New message to user from Dialogue server to Generator | dialogue_output | :frame (frame) |
From Generator, to Synthesizer or UI | generator_output | :output_string (string) |
From Synthesizer to Audio | synthesizer_output | :sample_rate (integer), :encoding_format (string), :num_samples (integer) |
Each list is treated as a single set of actions. Each server works by looking for an action set which contains the data corresponding to the input it gets in the appropriate step. If it finds the appropriate input, and the action set also contains an appropriate output, it produces the output.
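The lookup just described can be sketched in Python. This is only a minimal illustration under stated assumptions, not the code of the toy servers: frames are modeled as plain dicts with a hypothetical "name" entry, only two action sets are reproduced (with their audio frames left out), and find_output is a name invented for this sketch.

```python
# Hypothetical model: each frame is a dict; "name" holds the frame name.
ACTION_SETS = [
    [  # system greeting
        {"name": "dialogue_output", ":frame": {"name": "greeting"}},
        {"name": "generator_output",
         ":output_string": "Welcome to Communicator. How may I help you?"},
    ],
    [  # first user turn
        {"name": "text_input",
         ":input_string": "I WANT TO FLY TO LOS ANGELES"},
        {"name": "recognizer_output",
         ":input_string": "I WANT TO FLY LOS ANGELES"},
        {"name": "parser_output",
         ":frame": {"name": "flight", ":destination": "LOS ANGELES"}},
        {"name": "dialogue_output", ":frame": {"name": "query_departure"}},
        {"name": "generator_output",
         ":output_string": "Where are you traveling from?"},
    ],
]


def find_output(action_sets, input_name, input_keys, output_name):
    """Return the output frame from the first action set whose input frame
    carries exactly the given key-value pairs, or None if nothing matches."""
    for action_set in action_sets:
        by_name = {frame["name"]: frame for frame in action_set}
        candidate = by_name.get(input_name)
        if candidate is None:
            continue
        data = {k: v for k, v in candidate.items() if k != "name"}
        # Both the input data and an appropriate output must be present.
        if data == input_keys and output_name in by_name:
            return by_name[output_name]
    return None


# A simulated Parser receives the recognizer's string and looks up its reply.
parsed = find_output(ACTION_SETS, "recognizer_output",
                     {":input_string": "I WANT TO FLY LOS ANGELES"},
                     "parser_output")
print(parsed[":frame"][":destination"])
```

An input that appears in no action set yields None, which corresponds to a server finding nothing to produce.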
;; The value of LOG_VERSION: will be used in
;; the annotation rules.
LOG_VERSION: "toy travel, version 1"

;; Use extended syntax (new in version 3.0).
PGM_SYNTAX: extended

;; This means that the log directory hierarchy
;; will start in the directory where the Hub is run.
LOG_DIR: .

;; Both audio and UI will be HUB clients, and
;; they will share a port.
SERVICE_TYPE: Audio
CLIENT_PORT: 2800
OPERATIONS: Play

SERVICE_TYPE: UI
CLIENT_PORT: 2800
OPERATIONS: Print

SERVER: Parser
HOST: localhost
PORT: 10000
OPERATIONS: Parse

SERVER: Dialogue
HOST: localhost
PORT: 18500
OPERATIONS: DoDialogue DoGreeting

SERVER: Generator
HOST: localhost
PORT: 16000
OPERATIONS: Generate

SERVER: Backend
HOST: localhost
PORT: 13000
OPERATIONS: Retrieve

SERVER: Recognizer
HOST: localhost
PORT: 11000
OPERATIONS: Recognize

SERVER: Synthesizer
HOST: localhost
PORT: 15500
OPERATIONS: Synthesize

SERVER: IOMonitor
HOST: localhost
PORT: 10050
OPERATIONS: ReportIO

;; We use four crucial functions in the Builtin server.
SERVER: Builtin
OPERATIONS: new_session end_session call_program nop hub_break

;; For logging, I will timestamp everything. Since
;; I'm also logging all the relevant keys, I really
;; don't need to timestamp, since they'll be added
;; automatically, but it's harmless and good practice.
TIMESTAMP: Play Print Parse DoDialogue DoGreeting Generate Retrieve \
Recognize Synthesize new_session end_session call_program \
FromAudio OpenAudioSession OpenTextSession \
FromRecognizer FromUI UserInput \
FromDialogue DBQuery FromSynthesizer

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; AUDIO INPUT
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; This first program handles input from the audio
;; server. The :host and :port and :call_id are
;; typically the information required to establish
;; a brokering connection.
PROGRAM: FromAudio

RULE: :host & :port & :call_id --> Recognizer.Recognize
IN: :host :port :encoding_format :sample_rate :call_id
LOG_IN: :host :port :encoding_format :sample_rate :call_id
OUT: none!

;; This program handles opening
;; an audio session. It marks audio available for the session
;; by using a key whose prefix is :hub_session_.
PROGRAM: OpenAudioSession

;; Notice that we use the nop dispatch function to
;; "host" a call to OUT: to set a session variable.
RULE: --> Builtin.nop
OUT: ($in(:audio_available session) 1)

;; Now we create a session. This is not technically
;; necessary, since the audio server creates a
;; session by virtue of how it connects using the
;; listener-in-Hub functionality.
RULE: --> Builtin.new_session

;; Finally, I kick off the system greeting.
RULE: --> Dialogue.DoGreeting

;; This program dispatches recognizer results to
;; the general program which handles user input.
PROGRAM: FromRecognizer

RULE: :input_string --> Builtin.call_program
IN: (:program "UserInput") :input_string
OUT: none!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; TEXT INPUT
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; This program handles opening
;; a text in/text out session. It creates a session
;; and kicks off the user greeting.
PROGRAM: OpenTextSession

RULE: --> Builtin.new_session

RULE: --> Dialogue.DoGreeting

;; This function relays the typed input to the
;; main body of the input processing.
PROGRAM: FromUI

RULE: :input_string --> Builtin.call_program
IN: (:program "UserInput") :input_string

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; MAIN INPUT BODY
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; This program handles the main input text processing.
;; It passes the result of the parsing to the dialogue
;; manager, and if there's a response, it relays it
;; to the output processing program.
PROGRAM: UserInput

RULE: :input_string --> IOMonitor.ReportIO
IN: (:utterance :input_string) (:who "user")
OUT: none!

RULE: :input_string --> Parser.Parse
IN: :input_string
OUT: :frame
LOG_IN: :input_string
LOG_OUT: :frame

;; We're not waiting for a reply, so the errors will be
;; signalled by new messages.
RULE: :frame --> Dialogue.DoDialogue
IN: :frame
LOG_IN: :frame
OUT: none!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; MAIN OUTPUT BODY
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; This program handles the response from the dialogue
;; manager which may take one of a number of
;; forms (database tuples to be described, or perhaps
;; an already-formatted frame).
PROGRAM: FromDialogue

RULE: :output_frame --> Generator.Generate
IN: :output_frame
OUT: :output_string
LOG_IN: :output_frame
LOG_OUT: :output_string

RULE: :output_string --> IOMonitor.ReportIO
IN: (:utterance :output_string) (:who "system")
OUT: none!

;; At this point, we need to decide whether to respond
;; using audio or not. We condition this on our Hub
;; session variable.
RULE: $in(:audio_available session) & :output_string --> Synthesizer.Synthesize
IN: :output_string
LOG_IN: :output_string
OUT: none!

RULE: ! $in(:audio_available session) & :output_string --> UI.Print
IN: :output_string
LOG_IN: :output_string
OUT: none!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; DB SUBQUERY
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; This program handles the server-to-server subdialogue
;; through which the dialogue manager queries the database.
PROGRAM: DBQuery

RULE: :sql_query --> Backend.Retrieve
IN: :sql_query
OUT: :column_names :nfound :values
LOG_IN: :sql_query
LOG_OUT: :column_names :nfound :values

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; AUDIO OUTPUT
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Finally, the synthesizer chooses to produce a new
;; message. We could also accomplish the same
;; result by returning a value from the synthesizer and
;; adding more rules to the FromDialogue program above.
PROGRAM: FromSynthesizer

RULE: :host & :port & :call_id --> Audio.Play
IN: :host :port :encoding_format :sample_rate :call_id :num_samples
LOG_IN: :host :port :encoding_format :sample_rate :call_id :num_samples
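As a rough illustration of how the last two rules of the FromDialogue program branch on the session variable, here is a small Python sketch. It models only the conditions of those two rules; the Hub's actual rule evaluation is considerably richer, and the function name choose_output_operation and the dict representation of messages and sessions are assumptions made for this example.

```python
def choose_output_operation(message, session):
    """Mimic the last two FromDialogue rules: synthesize speech if the
    session has audio available, otherwise print to the UI."""
    if ":output_string" not in message:
        return None  # neither rule's condition is satisfied
    if session.get(":audio_available"):
        return "Synthesizer.Synthesize"
    return "UI.Print"


# OpenAudioSession sets :audio_available to 1; OpenTextSession never sets it.
audio_session = {":audio_available": 1}
text_session = {}
message = {":output_string": "Where are you traveling from?"}

print(choose_output_operation(message, audio_session))
print(choose_output_operation(message, text_session))
```

Because the two conditions are mutually exclusive, exactly one of the two operations is chosen for any message that carries :output_string.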
% cd contrib/MITRE/demos/toy-travel
% ../../tools/bin/process_monitor toy-travel.config -- example.frames toy-travel.pgm &
If you select "Process Control --> Restart all", all the servers and the Hub will be started in order. Two additional process monitors will also be started, one each for the UI and Audio servers. Both of these servers use the interaction paradigm described for audio and text. They can be running and connected at the same time; the sessions will be distinguished appropriately.
Please note that in the case of audio simulation, the audio data being passed around is an array of randomly generated bytes. The computer's audio device is not involved, and you will not hear any audio played at any point.
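To make the simulation concrete, the following Python sketch produces a stand-in audio buffer of the kind described. It is not the Audio server's code: it assumes that "linear16" means 16-bit samples (2 bytes each), and fake_audio and duration_seconds are hypothetical helper names.

```python
import os


def fake_audio(num_samples, bytes_per_sample=2):
    """Return random bytes standing in for linear16 audio
    (assumed to be 16-bit samples, so 2 bytes each)."""
    return os.urandom(num_samples * bytes_per_sample)


def duration_seconds(num_samples, sample_rate):
    """Length of time the samples would cover if they were real audio."""
    return num_samples / sample_rate


# Values taken from the synthesizer_output frame of the greeting.
greeting = fake_audio(14520)
print(len(greeting))                  # 29040 bytes
print(duration_seconds(14520, 8000))  # 1.815 seconds
```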
In the log summary shown below, the timestamps for the start and end of speech are identical because our audio server doesn't send separate notifications for audio start and end. It also doesn't log the audio and report the logfile back to the Hub. A true audio server would do both of these things.
% cd contrib/MITRE/demos/toy-travel
% ../../tools/bin/process_monitor toy-travel.config -- example.frames toy-travel.pgm &
[...after running the example...]
% ../../tools/bin/xml_summarize --rule_base annotation_rules.xml sls/[datedir]/[sessionnum]
Checking: sls/20000804/018
Reading raw Hub log: sls/20000804/018/sls-20000804-018-hublog.txt
...read.
Converting to XML: sls/20000804/018/sls-20000804-018-hublog.txt
...converted.
Reading rule file: annotation_rules.xml
...read.
Applying rules: annotation_rules.xml
...succeeded.
Resegmenting...
...resegmented.
Fri Aug 4 2000 at 21:40:21.89: Task-specific portion and overall task ended.
Fri Aug 4 2000 at 21:40:16.54: New system turn began.
Fri Aug 4 2000 at 21:40:16.56 to Fri Aug 4 2000 at 21:40:16.58: System started speaking.
Fri Aug 4 2000 at 21:40:16.56 to Fri Aug 4 2000 at 21:40:16.58: System finished speaking.
System said: Welcome to Communicator. How may I help you?
Fri Aug 4 2000 at 21:40:17.78: New user turn began.
Fri Aug 4 2000 at 21:40:17.78: User started speaking.
Fri Aug 4 2000 at 21:40:17.78: User finished speaking.
Recognizer heard: I WANT TO FLY LOS ANGELES
Fri Aug 4 2000 at 21:40:17.93: New system turn began.
Fri Aug 4 2000 at 21:40:17.95 to Fri Aug 4 2000 at 21:40:17.96: System started speaking.
Fri Aug 4 2000 at 21:40:17.95 to Fri Aug 4 2000 at 21:40:17.96: System finished speaking.
System said: Where are you traveling from?
Fri Aug 4 2000 at 21:40:19.02: New user turn began.
Fri Aug 4 2000 at 21:40:19.02: User started speaking.
Fri Aug 4 2000 at 21:40:19.02: User finished speaking.
Recognizer heard: BOSTON
Fri Aug 4 2000 at 21:40:19.19: New system turn began.
Fri Aug 4 2000 at 21:40:19.23 to Fri Aug 4 2000 at 21:40:19.25: System started speaking.
Fri Aug 4 2000 at 21:40:19.23 to Fri Aug 4 2000 at 21:40:19.25: System finished speaking.
System said: American Airlines flight 115 leaves at 11:44 AM, and United flight 436 leaves at 2:05 PM
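The timestamps in a summary like this one can be compared programmatically. The Python sketch below is an illustration only: it assumes the timestamp layout shown in the output above and an English locale for the day and month names, and parse_stamp is a hypothetical helper, not part of the distribution.

```python
from datetime import datetime

# Layout assumed from the summary lines, e.g. "Fri Aug 4 2000 at 21:40:17.78".
STAMP_FORMAT = "%a %b %d %Y at %H:%M:%S.%f"


def parse_stamp(text):
    """Convert a summary timestamp into a datetime object."""
    return datetime.strptime(text, STAMP_FORMAT)


# Delay between the first user turn and the system turn that answers it.
user_turn = parse_stamp("Fri Aug 4 2000 at 21:40:17.78")
system_turn = parse_stamp("Fri Aug 4 2000 at 21:40:17.93")
print((system_turn - user_turn).total_seconds())  # 0.15
```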