Galaxy Communicator Tutorial:
Creating an End-to-End System
We're almost there. We can build servers, write Hub program files, handle
errors, set up broker backchannels, understand the design of UI elements,
log our activity, and keep track of multiple users over multiple interactions.
In this last lesson, we'll address a number of remaining details.
Selecting your servers
The toy travel demo illustrated a representative set of servers you might
need in your end-to-end system:
- An audio server which supports some sort of telephony hardware,
such as ComputerFone or Dialogic;
- A recognition server which maps audio samples into text strings;
- A parsing server which maps text strings into semantic frames;
- A dialogue manager server which incorporates each incoming semantic
frame into the current context and decides how to act on the resulting context;
- An information server, such as a relational database;
- A language generation server which maps outgoing frames into text
strings;
- A speech synthesis server which maps text strings into audio samples.
There are a wide range of variations on this configuration. For instance:
- In some cases, you may choose to "federate" audio, recognition and
synthesis in the same server. You might do this, for instance, because your
recognizer can only read input from the audio device (this is the case with
JSAPI-compliant recognizers in Java 1.2, due to limitations of Javasound
in that version of Java).
- You may have a separate server for mapping dialogue representations
into representations the information server can understand (an SQL generation
server, for instance).
- You may have a different information server, such as a script which
harvests information from Web pages, or you may have multiple information
servers, either for cacheing, redundancy, or to cover multiple domains.
- Your recognizer may return more structured output, such as n-best
lists or word lattices, or provide confidence measures along with its recognition
results.
- You may choose to separate the dialogue server's function of incorporation
into context from its function of choosing what to do with the resulting
context.
- You may choose to provide a user profile server, either for security
or user modeling or both, which might register a user by voice or by touchtone
input.
- You may choose to have multiple UI elements, such as a Web browser
and a telephone connection working in tandem.
The Galaxy Communicator architecture places no restrictions on any of
these configurations (although in its current form, it may accidently or
intentionally make some configurations easier than others).
We have released an initial version of an open source toolkit which contains
a number of available Communicator-compliant wrappers which are free and
available to use. In addition, a number of Communicator developers are also
distributing such software.
Contacting the Hub
One thing we haven't talked about in detail is how to set up servers
which contact the Hub. It's actually trivial. There's a command line argument
-contact_hub which is available by default to all Communicator-compliant
servers which allows you to specify host:port where a Hub has set
up the appropriate listener. You can find detailed documentation about this
in the Hub listener documentation.
The short version is that there are two steps.
First, you should use the CLIENT_PORT: program file directive to set up
the Hub listener, as we saw in our program file tutorial. Second, you
should start up the server using -contact_hub. As an example, here's
the command line to start up the Audio server from the toy travel demo:
% $DEMO_ROOT/bin/Audio -audio_data $GC_HOME/tutorial/toy-travel/short-example.frames
-contact_hub localhost:2800 -verbosity 0
The Hub program file has the appropriate listener set up:
SERVICE_TYPE: Audio
CLIENT_PORT: 2800
OPERATIONS: Play
Under normal operating circumstances, you should be able to connect as
many servers as you want to the Hub listener; there is no limit imposed.
If you want the listener locked to a particular session, you can use the
-session_id command line argument we talked about in the session lesson.
Guidelines
for writing servers and Hub programs
Always use the environment to write messages to the Hub
There are publicly available functions in the Galaxy Communicator API
which allow you to write messages to the Hub using the connection directly,
instead of the environment. But if you use call environments, you're always
guaranteed of passing along the correct session information. We recommend
always using the call environment.
Make sure to declare your dispatch functions and Hub operations
If you write a dispatch function in a server, you have to declare
it using GAL_SERVER_OP if you want the server to know about it. Similarly,
if you want the Hub to know that the server has a dispatch function, you
have to declare it in the appropriate OPERATIONS: directive entry.
Make sure your messages are distinct
The Hub programming language is very forgiving, in the sense that it
will ignore or pay attention to messages which invoke programs depending
on the keys in the message. So, for example, you could use main
for the name of every message you send to the Hub via GalSS_EnvWriteFrame()
or GalSS_EnvDispatchFrame(),
and use the keys in the message to decide what to do with them. However,
this strategy makes the program file extremely hard to read. We have learned,
from studying MIT's program files, that it's much more straightforward to
distinguish your messages at least by the server that sent them. Our toy travel example
demostrates this strategy.
Use the listener-in-Hub capability at least for your UI elements
The easiest way to manage UI connections to the Hub (that is, GUIs, audio
servers, and the like) is to have them contact the Hub instead of the other
way around. If you set things up using the listener-in-Hub capability, you'll
be able to have UI elements contact the Hub on an ad-hoc basis, and there
are simple hooks for keeping the sessions separate as well. You can use listener-in-Hub
for all your server connections, if you choose, but it's most desirable
for UI elements.
The Builtin server
The Builtin server is a special
server which is implemented as part of the Hub itself. This server is always
available, which means that it doesn't need to be declared in the Hub program
file. It has special access to the internals of the Hub operations, so that
it can be used to provide information about the Hub state (such as the servers
which are available, or the state of various namespaces). The Builtin server
is a grab bag of functionality, and we're not going to talk about most of
it. We'll concentrate here on a small number of potentially interesting and
relevant functions.
nop
The dispatch function nop is a no-op. It simply returns its
input frame. The reason you might want to use nop is that there
may be some times when you may want simply to update a key-value pair in
a namespace when a certain condition is met:
RULE: :output_parse == "PARSE_FAILED" --> Builtin.nop
OUT: (:parse_failed 1)
The Hub scripting language probably ought to provide a case where you
can omit the operation, but for now, using nop is the appropriate
strategy.
call_program
This dispatch function can be used to construct a new message to the
Hub. The name of the program should be passed in as the value of the :program
key. All the other key-value pairs passed to call_program will
appear in the new message. The new message will be processed like any other
new message; it will either invoke a program with the appropriate name, be
relayed to a server which supports an operation with the appropriate name,
or be discarded. This dispatch function will also return the result of the
executed program, if requested.
For example, the toy travel demo unifies the processing of its typed input
and the "output" of the Recognizer server by routing both to the same program
(named UserInput), as follows:
PROGRAM: FromRecognizer
RULE: :input_string --> Builtin.call_program
IN: (:program "UserInput") :input_string
OUT: none!
In this example, no result is requested, so none will be provided.
new_session
This dispatch function creates or resets a session. For creation, this
dispatch function is really not needed, because sessions are created automatically
when they're mentioned, but this dispatch function is crucial if you try
to reuse a session ID. For instance, you might be running a system which will
never have multiple simultaneous users, and you might be using the default
session as your session every time. This is not recommended. However,
if you must do this, new_session will reset the current session state and
start a new log file in the appropriate circumstances (i.e., if you're calling
new_session on this session for the second time, or later).
end_session
This dispatch function ends the current session. We discussed in the
session lesson how important
this is.
destroy
This dispatch function destroys the current token. We alluded to this
dispatch function when we talked about the special destroy! value
of OUT: in the program file
tutorial. This method of destroying the token is somewhat less efficient
than the OUT: value, but it's an available alternative.
hub_exit
This dispatch function causes the Hub to exit, in case you ever want
this sort of thing under program control.
Debugging strategies
There are a number of things you can do to make it easier for you to
understand what's going on (and what's going wrong) with your end-to-end
system. In this section, we describe some of them briefly.
Builtin.hub_break
This Builtin dispatch function suspends the execution of the Hub and
enters a loop where you can inspect the state of various namespaces. We
used this functionality to halt the Hub in our initial lesson on the toy travel demo.
If there's a Hub program rule that isn't getting fired and you think it should
be (or vice versa), you may find it useful to insert a call to Builtin.hub_break
at the appropriate point in your Hub program.
The -debug argument to the Hub
You can also access this same breakpoint behavior by using the -debug
Hub command line argument. This
argument will cause the Hub to break after it sends each new message. You
can shut this behavior off as the Hub is running if you choose to, by typing
a capital C at the breakpoint prompt.
Exploit your verbosity settings
As we described in the toy
travel demo lesson, the Hub and servers support 6 levels of verbose
output, from nothing (level 0) to 6 (everything). These levels can be controlled
by setting the GAL_VERBOSE
environment variable in your shell, or by using the -verbosity
command line arguments for the Hub and servers. The default is 3. If you
choose a level above 3, you'll get more information. Among the relevant values
are:
- At level 4, you'll get a full printout of the message traffic in
the servers instead of the truncated messages with ellipses that you usually
see.
- At level 6, you'll get the actual XDR encodings which are written
to and read from the servers and the Hub. It's not necessarily advisable
to use this level if you have broker connections, since the broker data encoding
will be dumped as well.
MODE: pedantic
In normal circumstances, the Hub will report errors in program files,
but a number of those errors will not cause the Hub to exit (for instance,
a reference to an undeclared operation). If you insert the directive entry
"MODE: pedantic" into your program
file, the Hub will always exit when it encounters these errors. This is a
good way to find typos in your server and operation names.
Fake desktop audio
If you're building a telephony application, we strongly recommend constructing
versions of your audio server to handle desktop audio, and one to handle
text I/O. Both of these should present as similar as possible a set of functionality
as the telephony server. MITRE hopes to make its open-source cross-platform
audio server available in the next release of the Open Source Toolkit.
Avoiding trouble
Consider using the -assert command line argument
If a server is starting up a listener for Hub connections (as opposed
to using the listener-in-Hub
capability), it will use the first port it finds available. If the requested
port isn't available, it will try the requested port + 1, and so on until
it finds a port it can start up on. This behavior can sometimes lead to unexpected
consequences. For instance, if you start up a server on port 6000 and there's
already a server running on that port, the new server will happily start
up on port 6001. If you don't notice this, your Hub may try to contact the
wrong server, or you might not be able to make a connection at all. One
way to avoid this is to use the -assert server command line argument,
which will force the server to exit if the requested port isn't available.
Don't forget about the initial token
You may remember that we talked about the initial
token when we talked about how to send new messages to the Hub. This
initial token can be specified using the INITIAL_TOKEN: directive entry
in the Hub program file. While we don't really recommend using the initial
token for anything besides global initializations and simple tests and examples,
you should be aware that you don't have complete control over the appearance
and content of the initial token. In particular, for historical reasons,
even if you don't specify INITIAL_TOKEN:, an initial token will be created
if there's a program named main. If your main program is
being invoked unexpectedly on a token with token index 0, you should check
to see if there is some accidental overlap between the keys you expect and
the keys in the default initial token.
Turn off the Hub pacifier
In all of our exercises, we've used the -suppress_pacifier command
line argument to the Hub. If this flag is not provided, the Hub will print
out a period (".") each second when it is idle. If you're trying to scan
back through the Hub output, many displays (X terminals and the process monitor,
for instance) will cause problems because they scroll to the end whenever
there's output. In the process monitor, you can press the "Pause" button;
but in general, if you're having problems with this, use -suppress_pacifier.
Builtin.increment_utterance and logging
You may recall when we talked about the content of the Hub log,
we told you to ignore the features :BEGIN_UTT and the reference
to -01 throughout the log. These elements have to do with the way
the Hub was originally designed to handle utterance boundaries. The Hub has
an utterance counter, which can be incremented using the Builtin dispatch
function increment_utterance. The original designers assumed that
a Communicator-compliant system would begin with a presentation to the user
(utterance -1), followed by a call to increment_utterance, and then
a call to increment_utterance each time the Hub program writer determined
an utterance boundary was reached.
The utterance counter has had three uses in the history of the Hub:
- It is used to index entries in a Hub-internal database which is
local to each session. You can use the STORE: directive entry to store
key-value pairs in a frame associated with the current utterance counter
in the current session, and use the RETRIEVE: directive entry to retrieve
key-value pairs from a frame associated with (by default) the previous
utterance counter in the current session. If you have a strong commitment
to stateless servers, you may find this functionality useful as a storage
device. However, it clearly depends on the utterance counter being incremented;
otherwise, you'd just be able to use the session namespace.
- The original MIT audio server used this counter to help name the
audio files logged for the given session. We believe there are better ways
to do this; in particular, we recommend that the audio server determine
its own file names, and report them to the Hub so that they can be logged
in the Hub log.
- Incrementing the utterance counter has the effect of ending an
utterance in the log and starting a new one. When we designed the XML DTD
for the XML version
of the Hub log, we introduced a tag called GC_TURN which correspond this
boundary. However, we subsequently moved to a different model of introducing
"meaningful" information into the log, through a system of annotation rules
which "decorate" existing tags with meaningful landmarks which indicate things
like the beginning and end of turns. This was because we didn't want people
to have to write their program files a particular way in order to guarantee
logs that could be scored. As a result, the GC_TURN tag is mostly ignored
in the log analysis process, with a single exception: it's possible to write
annotation rules which rely on GC_TURN as a scoping device, and under some
circumstances, it's difficult to write your rules if you don't have enough
scoping levels. This is a flaw which ought to be addressed eventually, but
has not been yet.
In summary, unless you want to use STORE: and RETRIEVE:, you can usually
ignore the utterance counter. But you should be aware of the issues involved.
Other historical artifacts
There are a few other elements which are worth mentioning which may surprise
you or can get you into trouble:
- The frame type (the c at the beginning of the frame's string
representation) is a relic of how frames were originally used (and are still
used in the MIT servers): as hierarchical representations of parses and semantic
interpretations. The c stands for "clause"; there are also predicates ("p")
and topics ("q" for quantificational element). Frames without the leading
type (e.g., { foo :bar 1 }) are interpreted as clauses. We only
use the clause type in the infrastructure proper.
- The Galaxy Communicator header include line for C code should always
be -I$GC_HOME/include, and the main header file should always
be #include <galaxy/galaxy_all.h>. This is because all the
Galaxy Communicator header files assume that the include lines end with the
include/ directory, and will include other files which are
referenced as, say, #include <galaxy/util.h>. This is a historical
idiosyncracy which will never be changed.
- Finally, you may wonder why the second argument of dispatch functions
is a void *, rather than a call environment pointer. This, again,
is a historical relic, but because of the vagaries of typing in C and the
base of installed code already finished, it's something that we can't change.
Summary
Congratulations! You've completed the Galaxy Communicator tutorial. Not
only should you have enough information to understand the remainder of the
documentation, you should also know enough by now to actually get started
building your own end-to-end system. Good luck!
Last updated August 8, 2002