Don't Read Me:
A. L. I. C. E. and AIML Documentation
Richard S. Wallace
Copyright © 2000
Last Modified January 14, 2000
Preface
The primary purpose of this program, codenamed "B", is to give away the secret of ALICE chat robot development to anyone who wants it, permitting the greatest possible dissemination, utilization and technical improvement of the ALICE chat robot technology.
Second, program B is written in Java to support the widest possible architectural base (a significant shortcoming of early ALICE). B runs on a wide range of processors and systems supporting the Java Virtual Machine.
Third, program B is designed to be as simple as possible to use, especially for nonprogrammers who will never have to touch the Java source code.
Program B is offered free under the terms of the GNU General Public License, permitting rapid, organic software growth with contributions from many programmers. We gratefully acknowledge the following individuals who worked on program B:
Diana Andreacchio wrote a help document for Windows users that became part of this FAQ.
Anonymous contributors created the VBScript interface to MS Agent.
David Bacon wrote the SETL compiler used to prototype the early versions of ALICE.
Jacco Bikker wrote an AIML interpreter in C and corrected much of the original AIML content.
Ace Craig contributed the AIML "web search" markup.
Kris Drent has provided significant contributions to the Program B application, Applet and Servlet. He created the <topic> tag. Kris also wrote a fast, small-footprint XML parser tuned for AIML files.
Christian Drossmann provided the German language AIML module.
Ken Goldberg provided key ideas for dialog initiation and analysis.
Sage Greco created the ALICE robot maid 3-D graphic model and also designed the original ALICE pyramid logo.
John Laroche updated the XML DTD for AIML.
Andrew Potgieter wrote a text wrapper and created the <think> tag.
Timothy F. Rohaly proofread the Java code and suggested numerous improvements.
Anthony Taylor provided a formatted AIML file, 65Pretty.aiml.
Additional graphics were created by Sage Greco, Darren Langley and Larry Hauser. We gratefully acknowledge these contributions and the programmers who created them. Thanks are also due to the many people who asked the "frequently asked questions" that form the basis of this document.
DISCLAIMER:
This Java code sample is provided to you on an 'as-is' basis without warranty or condition of any kind, either express or implied, including, but not limited to, warranty or condition of merchantable quality or fitness for a particular purpose. The authors shall not be liable for any damages arising out of your use of this code, even if they have been advised of the possibility of such damages.
Copyright © 2000, Dr. Richard S. Wallace, All Rights Reserved
Program B includes a chat robot development environment, a GUI, a server, a Servlet and a downloadable Applet. This file contains a basic outline of documentation and a tutorial. Some experience working with Java applications and Applets is desirable, but not necessary, to install and run this software. Program B could be your first Java program.
Contents
I. Introduction
II. Download and Run
III. Creating Content
IV. Web Server Interface
V. Applet Interface
VI. AIML - Artificial Intelligence Markup Language
VII. Java Classes
Appendix A. Self Test
Appendix B. Note to Parents
I. Introduction
- What is the goal for AIML?
AIML (Artificial Intelligence Markup Language) is an XML specification for programming chat robots like ALICE using program B. The emphasis in the language design is minimalism. The simplicity of AIML makes it easy for non-programmers, especially those who already know HTML, to get started writing chat robots.
One ambitious goal for AIML is that, if a number of people create their own robots, each with a unique area of expertise, program B can literally merge-sort them together into a Superbot, automatically omitting duplicate categories. We offer both the source code and the ALICE content in order to encourage others to "open source" their chat robots as well, to contribute to the Superbot. Ken Goldberg, Christian Drossmann and others have already contributed significant content to the ALICE chat robot.
Botmasters are also, of course, free to copy protect private chat robots.
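Because a robot's categories can be kept sorted by pattern, combining two robots is the merge step of merge sort. The Java sketch below only illustrates that idea and is not program B's code: the Category class and the merge method are hypothetical, both inputs are assumed to be sorted by pattern with no internal duplicates, and when both robots define the same pattern the first robot's category is assumed to win.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: merge-sorts two robots whose categories are already sorted by
// pattern, keeping the first robot's category when both define the same pattern.
class SuperbotMerge {

    // Minimal stand-in for an AIML category (hypothetical, not program B's class).
    static final class Category {
        final String pattern;
        final String template;

        Category(String pattern, String template) {
            this.pattern = pattern;
            this.template = template;
        }
    }

    // Assumes each input list is sorted by pattern and has no internal duplicates.
    static List<Category> merge(List<Category> first, List<Category> second) {
        List<Category> merged = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < first.size() && j < second.size()) {
            int cmp = first.get(i).pattern.compareTo(second.get(j).pattern);
            if (cmp < 0) {
                merged.add(first.get(i++));
            } else if (cmp > 0) {
                merged.add(second.get(j++));
            } else {
                merged.add(first.get(i++)); // duplicate pattern: keep the first robot's reply
                j++;                        // and omit the second robot's copy
            }
        }
        while (i < first.size()) {
            merged.add(first.get(i++));
        }
        while (j < second.size()) {
            merged.add(second.get(j++));
        }
        return merged;
    }
}
```

Each category is visited once, so under these assumptions the merge takes time proportional to the combined size of the two robots.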
- Who is the botmaster?
The botmaster is you, the master of your chat robot. A botmaster runs program B and creates or modifies a chat robot with the program's graphical user interface (GUI). He or she is responsible for reading the dialogues, analyzing the responses, and creating new replies for the patterns detected by program B. Botmasters are hobbyists, webmasters, developers, advertisers, artists, publishers, editors, engineers, and anyone else interested in creating a personal chat robot.
- How can I create my own chat robot?
The secret to chat bot programming, if there is one, is what Simon Laven called "continuous beta testing". Program B runs as a server and collects dialog on the web. The program provides the chat bot developer with a tool called "classify dialogues" that tests the current robot with the history of accumulated human queries. Moreover, the program suggests new categories automatically, for the botmaster to refine.
- How difficult is it to create a chat robot?
Not difficult. If you can write HTML, you can write AIML (Artificial Intelligence Markup Language). Here is an example of a simple but complete chat robot in AIML:
<alice>
<category>
<pattern>*</pattern>
<template>Hello!</template>
</category>
</alice>
The tags <alice>...</alice> indicate that this markup contains a chat robot. The <category> tag indicates an AIML category, the basic unit of chat robot knowledge. The category has a <pattern> and a <template>. The pattern in this case is the wild-card symbol '*' that matches any input. The template is just the text "Hello!" As you may have guessed, this simple chat robot just responds by saying "Hello!" to any input.
You can get started with AIML knowing just the three tags <category>, <pattern> and <template>, much like you may have started with HTML knowing only <a>, <img> and <h1>.
- Does ALICE learn?
The model of learning in ALICE is called "supervised training", because a teacher, the botmaster, always plays a crucial role. The alternative, "unsupervised training", is complicated in an open environment like the Web. The problem is that clients are untrustworthy teachers, and forever try to "fool" the robot with untrue assertions.
- Does ALICE think?
It depends on what you mean by "thinking". The most fascinating responses from ALICE arise when she says something unexpected, or puts together responses in ways the botmaster never intended. For example:
Client: I bet you are gay.
ALICE: Actually I am not the gambling type. Actually as a machine I have no need for sex.
Here the robot linked two different categories which both coincidentally have a moral theme (gambling and sexuality). But this specific combination was not "preprogrammed" by the botmaster.
Are these surprising responses just unintended coincidences, or do they indicate that ALICE is thinking? Is ALICE just a gigantic stimulus-response mechanism, or are we?
- What is the theory behind ALICE?
I used to say that there was NO theory behind ALICE: no neural network, no knowledge representation, no search, no fuzzy logic, no genetic algorithms, and no parsing. Then I discovered there was a theory circulating in applied AI called "Case-Based Reasoning" or CBR that maps well onto the ALICE algorithm. Another term, borrowed from pattern recognition, is "nearest-neighbor classification."
The CBR "cases" are the categories in AIML. The algorithm finds the best-matching pattern for each input. The category ties the response template directly to the stimulus pattern. ALICE is conceptually not much more complicated than Weizenbaum's ELIZA chat robot; the main differences are the much larger case base and the tools for creating new content by dialog analysis.
ALICE is also part of the tradition of "minimalist", "reactive" or "stimulus-response" robotics. Mobile robots work best, fastest and demonstrate the most animated, realistic behavior when their sensory inputs directly control the motor reactions. Higher-level symbolic processing, search, and planning tend to slow down the process too much for realistic applications, even with the fastest control computers.
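The idea of finding the best-matching pattern for each input can be sketched in a few lines of Java. This sketch rests on stated assumptions rather than on program B's actual ranking rules: inputs are normalized to uppercase words with punctuation removed, a pattern holds at most one "*" (standing for one or more words), and the best match is taken to be the matching pattern with the most literal words. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BestMatch {

    // Assumed normalization: uppercase, punctuation replaced by spaces,
    // runs of whitespace collapsed to a single space.
    static String normalize(String input) {
        return input.toUpperCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N} ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // True if the pattern matches the whole input. The pattern may contain
    // at most one "*", which stands for one or more words.
    static boolean matches(String pattern, String input) {
        List<String> p = Arrays.asList(pattern.split(" "));
        List<String> in = Arrays.asList(input.split(" "));
        int star = p.indexOf("*");
        if (star < 0) {
            return p.equals(in); // no wild-card: exact word-for-word match
        }
        int suffix = p.size() - star - 1; // literal words after the "*"
        if (in.size() < star + suffix + 1) {
            return false; // the "*" must cover at least one word
        }
        return p.subList(0, star).equals(in.subList(0, star))
                && p.subList(star + 1, p.size()).equals(in.subList(in.size() - suffix, in.size()));
    }

    // Returns the matching pattern with the most literal words, or null if none match.
    static String bestMatch(List<String> patterns, String rawInput) {
        String input = normalize(rawInput);
        String best = null;
        int bestLiterals = -1;
        for (String pattern : patterns) {
            if (!matches(pattern, input)) {
                continue;
            }
            int literals = 0;
            for (String word : pattern.split(" ")) {
                if (!word.equals("*")) {
                    literals++;
                }
            }
            if (literals > bestLiterals) { // ties keep the earlier pattern
                best = pattern;
                bestLiterals = literals;
            }
        }
        return best;
    }
}
```

With the patterns "*", "WHO IS *" and "WHO IS 007", the input "Who is 007?" selects "WHO IS 007", "Who is James Bond" falls back to "WHO IS *", and anything else reaches the catch-all "*".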
- Can probability (statistics, weights, neural networks, or fuzzy logic) improve bots?
Statistics are in fact heavily used in the ALICE server, but not in the way you might think. ALICE uses 'Zipf Analysis' to plot the rank-frequency of the activated categories and to reveal inputs from the log file that don't already have specific replies, so the botmaster can focus on answering questions people actually ask (the "Quick Targets" function).
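The rank-frequency part of Zipf analysis amounts to counting category activations and sorting the counts. The Java sketch below assumes the log has already been reduced to one activated category pattern per client input; the class name and the sample log are hypothetical, and program B's own plotting is not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ZipfRank {

    // Counts activations per category pattern and returns (pattern, count)
    // entries sorted from most to least frequent, i.e. rank 1 first.
    static List<Map.Entry<String, Integer>> rankFrequency(List<String> activations) {
        Map<String, Integer> counts = new HashMap<>();
        for (String pattern : activations) {
            counts.merge(pattern, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical log: the category activated by each client input, in order.
        List<String> log = Arrays.asList("*", "HELLO", "*", "WHO IS *", "*", "HELLO");
        List<Map.Entry<String, Integer>> ranked = rankFrequency(log);
        for (int rank = 1; rank <= ranked.size(); rank++) {
            Map.Entry<String, Integer> e = ranked.get(rank - 1);
            // Zipf's law predicts count roughly proportional to 1 / rank,
            // so log(count) against log(rank) is close to a straight line.
            System.out.println(rank + "\t" + e.getValue() + "\t" + e.getKey());
        }
    }
}
```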
Other bot languages, notably the one used for Julia, make heavy use of "fuzzy" or "weighted" rules. We see their problem as this: the botmaster already has enough to worry about without having to make up "magic numbers" for every rule. Once you get up to 10,000 categories (like ALICE) you don't want to think about more parameters than necessary. Bot languages with fuzzy matching rules tend to have scaling problems.
Finally, the bot replies are not as deterministic as you might think, even without weights. Some answers rely on <random> to select one of several possible replies. Other replies generated by unforeseen user input also create "spontaneous" outputs that the botmaster doesn't anticipate.
- Can I have a private conversation with ALICE?
The ALICE server logs and records all conversations. Even the ALICE Applet tries to transmit conversation logs back to the originating server. You can have a private conversation with ALICE, however, if you download Program B to your own computer and run it there. Running on your machine, the server stores all the conversations locally.
II. Download and Run
- How do I install ALICE?
If you purchased a commercial version of ALICE on CD-ROM or over the web, installation should be very easy. These versions usually have their own self-extracting and install software. You can install the ALICE program with just a mouse click and activate it with a desktop icon.
If you bought a commercial version of ALICE with a self-installer, you can skip this section and go on to "Creating Content".
- How do I download program B?
Create a directory (or folder) on your machine to download the B.zip file. When you click on "B.zip" the browser should ask you where you want to save the file. Select the directory you created and save B.zip to that folder. Once you've downloaded it, you can use "unzip B.zip" to extract the files. If you don't have the unzip command on your machine, you can get a free one from WinZip (www.winzip.com) to unzip the "B.zip" file.
If you want to get into the Java source code, you need a Java 1.1.7 (or higher) development kit release. Go to java.sun.com for a free one. The program source code and all associated files are stored in the single "zip" file called B.zip. To extract the files use the command "unzip B.zip" (assuming you have "unzip" on your machine).
- How do I run program B?
Use the command "java B" (or "java Bawt") to start the program. Run program B and notice that the program creates an Edit View text window. By default, program B loads the chat robot ALICE (stored in B.aiml).
- What does "Send" do?
Type a text string like "hello" into the Text Area (Edit View) and press the "Send" button. Notice that program B replaces the text in the Edit View with a reply from the robot.
- What does "Clear" do?
To enter another robot query, clear the screen with the "Clear" button. Enter a new string like "How are you?" and press "Send."
"Send" and "Clear" provide a simple way to communicate with the chat bot through the Edit View. Try cutting and pasting a paragraph, such as an e-mail message, into the Edit View and press "Send". See how the robot would reply to your multiline message.
- What is program Bawt?
Significant demand for a version of ALICE compatible with pre-Java 2 (formerly known as Java 1.2) releases prompted the development of "Bawt.java", an open source Java program for chat robot development that works with older versions of Java and AWT. Originally program B relied on Java 2 and Swing, but program Bawt needs only Java 1.1 and AWT. Swing is a newer GUI package that subsumes the earlier Java Abstract Window Toolkit (AWT).
At present class B merely extends class Bawt. Swing is not supported.
- Does program B run under Windows?
Yes. You need the Java Runtime Environment (JRE) so you can run the "java" command from the DOS prompt. Try opening a DOS window and type "java".
Microsoft often includes a JRE called "jview" rather than "java". Try opening a DOS window and type "jview". On Windows 98 the JRE is usually located in c:\windows\jview.exe.
- Does program B run on a Mac?
Yes. Download the B.zip file and save it in a new folder. Instead of the "winzip" or "unzip" utility, use "Aladdin StuffIt Expander." The newer version will unzip most Mac formats as well as .ZIP files. You can download it at "www.download.com" by searching for it by name. You can also select the option that allows it to search only for Mac programs. Download and install it; it should do the trick.
Apple makes its own Java Runtime Environment for the Mac called MRJ 2.1.4. You can download it from http://www.apple.com/java.
- Does program B run under Linux?
Yes. You need the JRE, which often comes bundled with Linux (e.g. the Kaffe JRE with Red Hat Linux), or you can download one from java.sun.com. You also need X Windows to run the GUI. Open a shell under X Windows and use the command "java B".
We also recommend the IBM release of their Java 1.1.8 Java Development Kit (JDK) and JRE for Linux. It is solid, efficient and very fast. You can download it free at:
http://www.ibm.com/java/jdk/118/linux/index.html
- Does program B run under XYZ?
Yes, if XYZ has a Java Runtime Environment 1.1.7 or higher.
- How much memory do I need to run program B?
The source code compresses to as little as half a megabyte, including all the AIML files for nearly 16,000 categories. You may have downloaded a file of only around 500K. Plan to use a minimum of 10 MB of hard disk space for the download directory. The hard disk requirements include not only the source code and Java class files, but also the dialogue files and other temporary files created by the robot.
The RAM requirements vary depending on the size of your robot. To run the fully loaded ALICE chat robot with 16,000 categories you will need 64MB of memory. To do this and anything else at the same time on your system we recommend a minimum of 96MB. With less memory you can load a smaller robot. See the question below "What is <load filename="X"/>?"
- How do I install ALICE on Windows?
Download program B at www.alicebot.org. Install program B in a file folder named "B".
Download the Java development kit, jdk1.2.2, at java.sun.com/products/jdk1.2
This is a large file (20MB). If you have a slow modem it will take a long time. Download the full single file, as Windows 95 will not concatenate the separate pieces. Install the jdk1.2 in a folder named "jdk1.2.2". Read the installation instructions. Download the separate docs file (17MB) and install it in the jdk1.2 folder.
Program B is a Java application and must be run in DOS. Go to Start, click Programs, and click MS-DOS Prompt.
At the C:\> prompt, type:
cd B
You will then get the prompt C:\B>, which is now pointing to your source directory. (If you have installed program B on a drive other than C, substitute that drive letter. You should also have installed the jdk1.2.2 on the same drive as program B.)
At C:\B> type \jdk1.2.2\bin\java B and hit Enter. (The idea is that you start from your source directory, then point to the jdk folder, which contains the program that runs your code, and then give the command, which in this case is java B.) What it should look like is this:
C:\B>\jdk1.2.2\bin\java B
After hitting Enter you should get the botmaster screen. You can play around with it by yourself on your machine.
You can also use the applet in DOS. To do so, type the following:
C:\B>\jdk1.2.2\bin\appletviewer index.html
This will try to connect you to the main server at alicebot.org, so have your browser fired up. Keep in mind, however, that you will be talking to the alicebot at that site and not to the program on your machine.
- What do you mean by the command "java B"?
This does not mean you click on an icon. If you are using Windows, you must use a DOS window to run a Java program. Find the MS-DOS item on your Start menu or desktop and open up a DOS window. In that window, use the DOS command CD (change directory) to move to the "B" directory. Then type "java B" to run the program.
If you are using Windows, you can create a desktop icon as a "shortcut" to a batch file. Create a batch file called "launch.bat" in the program B directory. The file contains only one line with the text "java B". There is an AIML icon file included with program B called "aiml.ico". You can use this file to add an icon to your desktop.
- I tried running "java B" and I got a "bad command or file name".
You are using a Windows/DOS setup. If "jview B" does not work either, you may need to install Java on your computer. Go to java.sun.com and pick the one for your computer (Windows 95/98 or NT).
If it still says "bad command", then DOS cannot find the java program, which usually means the PATH variable in AUTOEXEC.BAT does not include the Java bin directory. Make sure the PATH is set to include it:
SET PATH=c:\JDK1.2\bin;%PATH%
Also make sure the CLASSPATH variable is set to something like
SET CLASSPATH=.;%CLASSPATH%
(The single "." means the current working directory, so Java can find class B when you run it from the B directory.)
- How do I uninstall ALICE from my system?
If you installed ALICE on Windows with a commercial installer like InstallShield Java Edition, go to the Start menu and select "Control Panel". Click on the Control Panel item called "Add/Remove Programs". Select ALICE from the list of installed software and choose "Uninstall".
All the files of ALICE are stored in one directory (or folder) on your computer, usually called "B" but maybe something else depending on the name you chose when you downloaded ALICE. In any case, ALICE will not change or damage any other files on your system. To remove ALICE from your computer, simply remove this folder. Delete it, or drag it to your trash bin and select "Empty Trash" (or "Empty Recycle Bin").
If you cannot find the folder where ALICE resides, use the Finder to locate the file called "B.aiml" on your file system. The "B.aiml" file is in the same directory as all the ALICE files. If this file does not exist, then ALICE is probably not installed on your computer.
Because ALICE is a platform-independent Java application, it does not rely on the Windows Registry or other Windows-specific features. You can assume ALICE will leave your MS Windows Registry and other Windows system files untouched.
Conceivably, if ALICE has run for a long time on your computer, and you deliberately used the "Save Options" menu item to change the name or location of her files to something other than the default values, then there is a slight chance that there could be a few ALICE files scattered around your disk. Please refer to the DISCLAIMER at the beginning of DON'T READ ME.
- Can I create a language-specific installation?
Yes. The file "language.txt" controls the language of the buttons and menus in the ALICE GUI. If the file is missing, the program uses English names by default. To see an example of a language-specific installation, copy the file "Germanlanguage.txt" to "language.txt" and start program B.
III. Creating Content
- How does the Personality Wizard work?
The simplest way to alter the content of the basic ALICE robot personality is to run the Personality Wizard on the "Options" menu (or in the Kid interface). This wizard asks the botmaster a series of questions to set the values of a set of robot personality tags, including its name, gender, preferences and replies to very common questions.
The Personality Wizard does not create any new AIML categories. The replies set the value of global tags like <location/> and <favorite_movie/> that might be used in many categories throughout the AIML knowledge base. The basic set of Wizard questions is collected in the file Personality.aiml.
Hint: If you plan to use the Applet, avoid the double-quote (") character in the Personality Wizard.
- Can I change the name of the robot?
The AIML tag <name/> inserts the robot's name wherever the tag appears. The default robot name is "ALICE" but you can change it in the "Options" menu. Select "Show Options", replace "ALICE" with the name of your bot, and then do "Save Options". Depending on your state, you may need to restart program B.
- How can I customize my robot?
AIML provides several tags useful to quickly clone a chat robot from ALICE with a distinct "personality":
<gender/> the robot's gender
<location/> the robot's location
<birthday/> the robot's birthday
<botmaster/> the botmaster's name
Together with the previously discussed <name/>, these tags allow you to quickly create a clone from the ALICE brain with a separate identity from ALICE.
All the personality tag values can be modified through the Personality Wizard. The tag values can also be changed with the Options menu in program B. Use "Show Options" and "Save Options" to customize your chat robot.
To test the new features, we created a male robot named Brute (because "all men are brutes") born on August 18, 1999.
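A personality tag behaves like a placeholder that is filled in from the robot's option values when a reply is produced. The Java sketch below illustrates that substitution under assumptions: tag names are mapped directly to values (for example "name" to "Brute"), and the class, the value map and the regular expression are hypothetical rather than program B's internals. Empty tags with no value, such as <sr/>, are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PersonalityTags {

    // An empty tag made of lowercase letters and underscores, e.g. <name/> or <favorite_movie/>.
    private static final Pattern EMPTY_TAG = Pattern.compile("<([a-z_]+)/>");

    // Replaces each tag that has a value; tags without a value (such as <sr/>) stay as they are.
    static String expand(String template, Map<String, String> values) {
        Matcher m = EMPTY_TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values, as the Personality Wizard might set them for Brute.
        Map<String, String> values = new HashMap<>();
        values.put("name", "Brute");
        values.put("gender", "male");
        values.put("birthday", "August 18, 1999");
        // Prints: I am Brute, a male robot born on August 18, 1999.
        System.out.println(expand("I am <name/>, a <gender/> robot born on <birthday/>.", values));
    }
}
```

Because one value is shared by every category that uses the tag, changing an option renames or re-describes the robot everywhere at once, without editing any category.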
- How do I know what categories to add?
After you collect some dialogue, run "Classify" and "Quick Targets". This will tell you the most frequently asked patterns that do not already have specific responses. The "Target" functions display new categories with proposed patterns and template fields filled with the name of another category. Delete the template information and fill in a new response. You can also edit the pattern to simplify it or generalize it with a "*" operator.
- What does "Classify" do?
The key to chat robot development is log file analysis. The program stores client dialogues in a file called "dialog.txt" (unless you change this default name). The "Classify" button activates a routine that scans the dialogue file and reports how many times each category is activated. The processing may take several minutes, depending on the size and range of the dialogue file chosen.
The result appears as a table in the Edit View window. The program displays the categories sorted by activation count. The format of each output line is:
P% (Q%) T PATTERN = N1 W1 + N2 W2 + ...
where
P = percent of inputs classified in this category
Q = cumulative percent up to this category
T = total count of inputs activating this category
Ni = number of times input Wi was detected (blank if Ni = 1)
Wi = normalized input pattern activating this category
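The Java sketch below shows how lines in this format could be computed from activation counts. It assumes the inputs have already been normalized and classified, takes a hypothetical map from category pattern to per-input counts, and prints percentages with one decimal place; program B's exact number formatting may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ClassifyReport {

    // activations: category pattern -> (normalized input Wi -> count Ni)
    static List<String> report(Map<String, Map<String, Integer>> activations) {
        Map<String, Integer> totals = new LinkedHashMap<>(); // T for each category
        int grandTotal = 0;
        for (Map.Entry<String, Map<String, Integer>> e : activations.entrySet()) {
            int t = 0;
            for (int n : e.getValue().values()) {
                t += n;
            }
            totals.put(e.getKey(), t);
            grandTotal += t;
        }
        List<String> lines = new ArrayList<>();
        if (grandTotal == 0) {
            return lines; // nothing classified yet
        }
        List<String> categories = new ArrayList<>(totals.keySet());
        categories.sort((a, b) -> Integer.compare(totals.get(b), totals.get(a))); // by activation count
        int cumulative = 0;
        for (String category : categories) {
            int t = totals.get(category);
            cumulative += t;
            StringBuilder line = new StringBuilder(String.format(Locale.ROOT,
                    "%.1f%% (%.1f%%) %d %s =",
                    100.0 * t / grandTotal, 100.0 * cumulative / grandTotal, t, category));
            String separator = " ";
            for (Map.Entry<String, Integer> w : activations.get(category).entrySet()) {
                line.append(separator);
                if (w.getValue() > 1) {
                    line.append(w.getValue()).append(' '); // Ni is left blank when it is 1
                }
                line.append(w.getKey());
                separator = " + ";
            }
            lines.add(line.toString());
        }
        return lines;
    }
}
```

For example, if "*" caught "XYZZY" twice and "ASDF" once, and "WHO IS *" caught "WHO IS 007" once, the first line would read "75.0% (75.0%) 3 * = 2 XYZZY + ASDF".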
- What does "Quick Targets" do?
After running Classify, the Quick Targets button displays a set of new AIML categories for editing. The program uses statistics to find new category candidates. These categories are displayed as
<category>
<pattern>NEW PATTERN</pattern> <template>OLD PATTERN</template>
</category>
where OLD PATTERN is the pattern from the original category and NEW PATTERN is the proposed new input pattern.
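One simple way to produce candidates in this format, sketched below in Java, is to take every input that was caught only by a wild-card category and propose it as a new pattern, most frequent first. This is an assumption about the kind of statistics involved, not the actual Quick Targets algorithm; the class, the map layout (category pattern to per-input counts) and the limit parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class QuickTargets {

    // One proposed category: a new input pattern and the wild-card pattern it fell into.
    private static final class Candidate {
        final String newPattern;
        final String oldPattern;
        final int count;

        Candidate(String newPattern, String oldPattern, int count) {
            this.newPattern = newPattern;
            this.oldPattern = oldPattern;
            this.count = count;
        }
    }

    // activations: category pattern -> (normalized input -> count), as produced by classification.
    static List<String> targets(Map<String, Map<String, Integer>> activations, int limit) {
        List<Candidate> candidates = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> category : activations.entrySet()) {
            if (!category.getKey().contains("*")) {
                continue; // exact patterns already give these inputs a specific reply
            }
            for (Map.Entry<String, Integer> input : category.getValue().entrySet()) {
                candidates.add(new Candidate(input.getKey(), category.getKey(), input.getValue()));
            }
        }
        candidates.sort((a, b) -> Integer.compare(b.count, a.count)); // most frequent first
        List<String> out = new ArrayList<>();
        for (Candidate c : candidates.subList(0, Math.min(limit, candidates.size()))) {
            out.add("<category>\n<pattern>" + c.newPattern + "</pattern>\n<template>"
                    + c.oldPattern + "</template>\n</category>");
        }
        return out;
    }
}
```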
The botmaster may choose to either delete or edit the new category. If the new category is not desired, delete it by selecting the category in the text area and cutting the text with the "delete" key.
If the new category appears useful, edit the OLD PATTERN string to create a new reply. Optionally, the NEW PATTERN may also be edited, depending on how specific a pattern the botmaster desires.
When finished editing the Target categories, go to the "Botmaster" menu and select "Add AIML". The "Add AIML" menu item will read the text displayed in the Edit View and parse it into new AIML categories. The botmaster may then save the updated robot with the "File/Save Robot" or "File/Save Robot As" menu items.
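To illustrate the parsing step behind "Add AIML", here is a minimal Java sketch that extracts pattern/template pairs from edited text with a regular expression. It is a hypothetical helper, not program B's parser: it recognizes only <category>, <pattern> and <template>, does not check that the markup is well formed, and a more robust version would use a real XML parser so that syntax errors can be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddAiml {

    // A category is <category>, then <pattern>...</pattern>, then <template>...</template>,
    // separated only by whitespace. DOTALL lets a template span several lines.
    private static final Pattern CATEGORY = Pattern.compile(
            "<category>\\s*<pattern>(.*?)</pattern>\\s*<template>(.*?)</template>\\s*</category>",
            Pattern.DOTALL);

    // Returns {pattern, template} pairs in the order they appear in the text.
    static List<String[]> parse(String text) {
        List<String[]> categories = new ArrayList<>();
        Matcher m = CATEGORY.matcher(text);
        while (m.find()) {
            categories.add(new String[] {m.group(1).trim(), m.group(2).trim()});
        }
        return categories;
    }
}
```

Markup nested inside a template, such as <set_he>...</set_he>, is kept as part of the template text.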
- What does "More Targets" do?
If you don't see enough good targets with "Quick Targets", hit "More Targets."
- What does the File menu do?
Save and load text files (transfer contents to/from the text area); save and load robot (AIML) files.
1. By default, AIML files use the .aiml file extension.
2. The default robot file is called "B.aiml".
3. By default the robot files reside in the same directory as program B.
4. Robot files begin and end with the tags <alice> and </alice>.
5. "Save Robot" overwrites the default robot file (see 2).
6. "Save Robot As" can be used to copy a robot.
Exit - exit the program
- What does the Edit menu do?
Paste the contents of the clipboard into the program B text area.
- What does the Options menu do?
Display and save chat robot options.
Use start and end index to select a range of lines from the dialog file.
Toggle Beep - Make a sound when a remote client connects.
- What is the Botmaster menu?
The Botmaster menu contains all the tools to help develop chat robots.
Classify - same as Classify button
Default Targets - display targets obtained from the Default ('*') category, in a format suitable for quick conversion to new AIML.
Recursive Targets - display targets from "recursive" categories, i.e. categories with a template containing the AIML <sr/> or <srai/> functions.
Autochat - The robot chats with herself; sometimes helpful in detecting conversation "loops".
Add AIML - Clear the screen and type a line of AIML. Selecting "Add AIML" adds this new category to the chatbot. You can test the bot with "Send" and "Classify", then save it with "File/Save Robot". In general you can add any number of new AIML categories to the bot with "Add AIML."
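The loop detection that Autochat helps with can be sketched by feeding each reply back in as the next input and watching for a reply that repeats. In the Java sketch below the robot is modeled as a hypothetical UnaryOperator<String>, and the class and method names are illustrative. With <random> replies, a repeat may be coincidental rather than a true loop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

class Autochat {

    // Lets the robot answer itself for up to maxTurns turns, starting from 'opening'.
    // Returns the turn on which a reply first repeats (a likely loop), or -1 if none did.
    static int findLoop(UnaryOperator<String> robot, String opening, int maxTurns) {
        Set<String> seenReplies = new HashSet<>();
        String utterance = opening;
        for (int turn = 1; turn <= maxTurns; turn++) {
            utterance = robot.apply(utterance); // the reply becomes the next input
            if (!seenReplies.add(utterance)) {
                return turn;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical two-category robot: "HELLO" -> "HI THERE", anything else -> "WHY?"
        UnaryOperator<String> robot = s -> s.equals("HELLO") ? "HI THERE" : "WHY?";
        System.out.println(findLoop(robot, "HELLO", 10)); // prints 3
    }
}
```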
- What does "Help" do?
The "Help" button displays a random FAQ question that ALICE knows the answer to. You can see the answer by pressing the "Send" button.
The Help menu provides the same function as the Help button under the selection "Random Help Question." Select a random Help question and obtain the reply with the "Send" button.
The Help menu also contains an item to Show All Help Questions. This command lists all the FAQ questions the robot knows. You can select one question by deleting the others. Obtain the answer with the "Send" button.
The menu item "Ask Help Question" is the same as "Send". This item asks the robot the Help question(s) and displays the reply.
The Help menu displays the entire FAQ with the "Don't Read Me" selection. Finally, the "GNU Public License" menu item displays the open source software license for program B.
- What is on the Help menu?
Random Help - Same as "Help" button.
Show Help Questions - Displays a list of all FAQ questions. Select one by deleting all the others. Obtain the answer with "Send."
Don't Read Me - Display the text of this document.
GNU Public License - Display the software license.
- Do I have to use the GUI to enter AIML content?
No. You can create a new AIML file with any text editor and add that content to an existing robot with the <load> tag.
Also, you can edit AIML categories in any text file and use "Load Text File" and "Add AIML" to add the content.
You can also save the output of "Targets" to a file, edit that file, and then reload it and "Add AIML".
Finally, you can edit the robot source files directly. (By default the robot source file is called "B.aiml".) Use a text editor, like emacs, notepad, or a word processor in text mode, to modify the content of the AIML files.
- What are 7 steps to creating content?
1. Run program B (ALICE Botmaster).
2. Under "Options", select "Show Options". Find the item called "AnalysisFile=" and change the value to the name of the dialogue file you want to analyze. The default file name is the same as the default log file name, "dialog.txt".
3. Press the "Classify" button. Wait several minutes while the program processes the data from your log file. When finished, it will display a "brain activation" table showing the patterns that activated each category. (You can use "File/Save As Text File" to save this table to a file, if you want.)
4. Now press the "Quick Targets" button. You will see a set of new categories created by the program. These are categories with patterns that have no specific response in the robot brain. With these categories you have 3 choices (A, B or C):
(A) Delete the category. Many of the suggested categories are just nonsense or garbage inputs. Use your cursor and left mouse button to select the categories for deletion. The "delete" key will cut them.
(B) Edit a new template. The information you see displayed in the <template> tags is actually the pattern of the default category into which this input was classified. For example, you may see:
<category>
<pattern>WHO IS 007</pattern><template>WHO IS *</template>
</category>
This tells us that the robot classified the client input "WHO IS 007" as "WHO IS *". Use the cursor and left mouse button to cut the "WHO IS *", and replace it with a new template of your own design:
<category>
<pattern>WHO IS 007</pattern>
<template><set_he>007</set_he> is James Bond, the famous fictional spy from the novels of Ian Fleming.</template>
</category>
(C) Edit a new pattern. Many of the patterns suggested by "Quick Targets" and "More Targets" are too specific, but with a little practice you can easily see how to generalize these suggestions with the "*" wild-card. For example, you may see one like this:
<category>
<pattern>WHO BOMBED PEARL HARBOR</pattern>
<template>WHO *</template>
</category>
The original response was based on "WHO *", which is too general for this topic. But the odds are small that anyone else will use this exact pattern, WHO BOMBED PEARL HARBOR, when asking about the same topic. Think about the alternative ways of expressing the same question:
"Who attacked Pearl Harbor?", "Who invaded Pearl Harbor?", "Who through deceit and subterfuge carried out an unscrupulous and unprovoked surprise attack on American forces at Pearl Harbor?"
You can cover all of these inputs by generalizing the input pattern with the wild-card "*", which matches any word or sequence of words:
<category>
<pattern>WHO * PEARL HARBOR</pattern>
<template>The Japanese attacked Pearl Harbor on December 7, 1941, "a date which will live in infamy" (FDR).
<A href="http://www.pearlharbor.org">...</A>
</template>
</category>
Remember, the AIML pattern language allows at most one wild-card "*" per pattern. Of course, with choice (C) you have to edit the template as well as the pattern.
5. When you have finished editing the suggested categories, use "Botmaster - Add AIML" to add the new AIML content. If you made any syntax errors, you can fix them and repeat "Add AIML" as many times as needed. Be sure to do a "File - Save Robot" at this point to back up your changes. This will save all of your new categories in the root robot file "B.aiml".
6. Use "More Targets" to find more new categories until the new suggestions are fruitless. Then go back and start with "Classify" again (step 3).
7. The responses you create should combine a "conversational" response like "He is James Bond, the famous spy" with HTML hyperlinks where appropriate.
- How can I merge two chat robots together?
There are two ways to merge robots together. First, you can use the File menu option "merge" to directly load the contents of another bot file. You may see a lot of "duplicate pattern discarded" warnings, but these can be ignored because the program is simply eliminating overlapping content.
Another method is to use the <load filename="X"/> tag. Suppose you load two or more files with the load tag, and those files contain duplicate patterns. Which categories get priority? The answer depends on the order of the <load> tags used to load the AIML files. If your B.aiml contains:
<load filename="Brain.aiml"/>
<load filename="German.aiml"/>
then the categories from "Brain" have priority, and duplicates in "German" are discarded. If the order is reversed, German categories have priority and Brain's duplicates are discarded.
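The priority rule can be pictured as a first-loaded-wins merge. The Java sketch below assumes each file has already been parsed into (pattern, template) pairs. The class, record and method names are hypothetical and do not describe program B's Loader class (records require Java 16 or later).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not program B's Loader): merge categories so that the
// file loaded first keeps its template whenever a pattern is duplicated.
final class LoadOrderSketch {

    // One parsed category.
    record Category(String pattern, String template) {}

    // The outer list is in <load> order, e.g. Brain.aiml before German.aiml.
    static Map<String, String> merge(List<List<Category>> filesInLoadOrder) {
        Map<String, String> brain = new LinkedHashMap<>();
        for (List<Category> file : filesInLoadOrder) {
            for (Category c : file) {
                // putIfAbsent leaves an existing entry untouched and returns it,
                // so a later duplicate is simply discarded.
                if (brain.putIfAbsent(c.pattern(), c.template()) != null) {
                    System.err.println("duplicate pattern discarded: " + c.pattern());
                }
            }
        }
        return brain;
    }
}
```

Swapping the two inner lists in a call to merge() reverses the priority, just as reordering the <load> tags does.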
- How can I create a new robot personality?
There is a lot of flexibility in robot personality design with AIML. You can add to any of the existing AIML files, modify or delete them, create your own, or use the GUI tools to analyze the log files and create new categories. One simple method is to create your own Specialty.aiml file, so that you can always get the latest copies of the ALICE files without losing your own changes. Load your Specialty.aiml first in the root AIML file (usually B.aiml) so that its categories have priority over ALICE's.
- What are all the options for program B?
There are robot personality options, animated agent options, log file and analysis options, and options for the web server and for the applet. Most of the time you won't need to change many of these values. For completeness, the entire set breaks down into:
Robot options:
Sign - Astrological sign
Wear - Clothing and apparel
ForFun - What the robot does for fun
BotFile - Root file of robot personality
BotName - Robot name
Friends - The robot's friends
LookLike - The robot's appearance
Question - A random question
TalkAbout - Favorite subjects
KindMusic - Favorite kind of music
BoyFriend - Does the robot have a boyfriend?
BotMaster - Robot author
BotGender - Male, female or custom
GirlFriend - Does the robot have a girlfriend?
BotLocation - Robot location
BotBirthday - Robot activation date
FavoriteBook - Robot's favorite book
FavoriteFood - Robot's favorite food
FavoriteSong - Robot's favorite song
FavoriteBand - Robot's favorite band
FavoriteMovie - Robot's favorite movie
FavoriteColor - Robot's favorite color
BotBirthplace - Robot's birthplace
MS Agent options:
Animagent - true or false, for activating MS Agent VB scripting
ACFURL - File or URL location of MS Agent software
Log/Analysis options:
AnalysisFile - File selected for log file analysis
LogFile - File for recording robot dialogues
ClientLineContains - A pattern identifying input lines in log files
RobotLineStarts - A pattern identifying robot lines in log files
StartLine - Starting line for log file analysis
EndLine - Ending line for log file analysis
Applet options:
AppletHost - DNS name or IP address of the applet's server
CodeBase - URL or directory of the applet code
Web server options:
ClerkTimeout - Web server option to retire waiting clerks
BrainSize - A threshold number of categories to display "loading"
Advertize - A boolean parameter to optionally display an ad
Beep - Web server option to beep on client connections
Other:
Version - Read-only version number
TempFile - Scratch file for temporary data
All of the options reside in the globals.txt file. While running program B, choose "Options/Show Options" to see the contents of the file.
- Why is the format of the options (globals.txt) so strange?
Depending on your system, you may see a globals.txt file that looks like:
Animagent=true
Botmaster=Dr. Richard S. Wallace
AnalysisFile=dialog.txt
ClientLineContains=t:
LogFile=dialog.txt
CodeBase=D\:CHATTERBOTS\ALICE
StartLine=0
Beep=true
BotFile=B.aiml
AppletHost=206.184.206.210
EndLine=25000
BotName=ALICE
Birthday=November 23, 1995
TempFile=Temp.ai
RobotLineStarts=Robot
# ... and so on
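The global values seem to be stored in a random order. This is not a bug. The Globals class uses the Java methods Properties.load() and Properties.store() to save the globals to a file. The Properties class uses a hash table representation, so it does not preserve the order of the global variables; the program displays and saves the global options in an arbitrary order.
Properties.store() also writes the characters #, !, = and : with a preceding backslash, which is why the CodeBase line above shows "D\:". You can use # and ! to add comment lines to the file, but Properties.store() does not write them back, so such comments disappear the next time program B saves the options.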
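The following sketch shows this behavior with the standard java.util.Properties methods: comment lines are skipped on load, store() escapes the colon, and the original comments do not survive a save. GlobalsSketch is a hypothetical helper, not program B's Globals class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch of how java.util.Properties reads and writes a globals.txt-style file.
final class GlobalsSketch {

    // Lines starting with # or ! are comments; load() skips them.
    static Properties load(String text) {
        Properties globals = new Properties();
        try {
            globals.load(new StringReader(text));
        } catch (IOException e) { // not expected for a StringReader
            throw new UncheckedIOException(e);
        }
        return globals;
    }

    // store() writes a header comment, a date comment and one key=value line
    // per entry, in hash-table order; comments read earlier are not preserved.
    static String store(Properties globals) {
        StringWriter out = new StringWriter();
        try {
            globals.store(out, "program B globals");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```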
IV. Web server interface
- How does the web server work?
By default the web server starts on port 2001. This means you can access the web server through the URL http://localhost:2001 on your own machine. Find out your IP address or DNS name and tell your friends to connect to "http://yourcompany.com:2001". (One way to find out your IP address is to run "netstat -n", which lists all your open TCP/IP connections together with your local address.)
- How can I get a "permanent" DNS name?
You can buy a fixed IP address from an ISP, but suppose you want to run a chat robot (or other server) from your home over an ordinary ISP connection? Or suppose you want to carry it around on your notebook PC and plug it in anywhere in the world?
One solution is a dynamic IP registry service by Dynip (www.dynip.com). They offer a service that allows you to register your computer with their server so that you always receive the same DNS name, for example alicebot.dynip.com. Every time you connect to your ISP, Dynip automatically associates your dynamic IP address with your permanent DNS name.
- How can I keep my computer connected all the time?
Running a web server from home can be frustrating if your ISP automatically detects periods of "inactivity" or hangs up your connection after a fixed interval like 12 hours. Check out the Rascal program from Basta Computing (www.basta.com), which runs as a watchdog to keep your Windows machine connected 24/7.
Another alternative is to use the program B applet, called Blet.java. A third alternative is the ALICE Servlet. Some ISPs will allow you to install a Servlet on their server.
- Does the web server have to run on port 2001?
No. You can change the default web server port number in the "Options" menu.
- Does program B serve HTML files?
Yes. Program B is a "faux" web server that can serve a number of file types just like an ordinary server. Certain file names such as "HOME.html", "header.html", and "trailer.html" are reserved by program B, but you can create new HTML files and serve them with B.
Although program B can also serve image files and other large binary files, we recommend creating chat robot web pages with links to images served by other web servers or machines. Reserve your chat robot server for the robot chat; use ordinary web servers for images and other large files.
- What files are needed to run the program B web server?
The program B directory must contain the HTML files header.html, trailer.html, loading.html and HOME.html. You can customize these files for your bot, but take care with "header" and "trailer", because program B uses these files to construct an HTML reply (by inserting the robot reply and the text form between the "header" and the "trailer"). Use "header" and "trailer" to customize the robot with your own logo and links.
Program B needs at least one AIML file, called B.aiml by default. The AIML file may contain <load> tags that recursively load other AIML files; these must also be present.
The program also requires the file "globals.txt", which it reads at startup.
The files "language.txt" and "predicates.txt" are optional. "language.txt" controls the language of the buttons and menu items in the program B GUI. The file "predicates.txt" defines any custom predicates.
Program B also reads the files "gnu.txt" (the GNU General Public License) and "dont.txt" (this file).
- Can I test the robot offline on my desktop?
Yes. You can run the program B server and connect to it with a browser, even if your desktop computer is offline. When working offline, it often helps to change the Internet settings (in IE or Netscape) to "local area network". Then your machine becomes a one-computer network, and you should be able to use IE to connect to program B with http://localhost:2001.
- How can I use the MS Agent Interface?
Select the menu item Options/Toggle MS Agent. This sets the output HTML to a format that includes commands to run MS Agent.
A client may activate the agent when she receives a template containing the <set_animagent/> tag. The free ALICE download includes a couple of example categories using this tag. Try asking ALICE, "Can you speak?". In another demo, ALICE imitates the famous fictional AI HAL from 2001: A Space Odyssey.
Client: Tell me about yourself
Robot: I am an artificial linguistic entity. I was created by Dr. Richard S. Wallace at Bethlehem, Pennsylvania, on November 23, 1995. He taught me to sing a song. Would you like me to sing it for you?
Client: yes
Robot: Ahem. It's called, "Daisy." (Agent sings "Daisy")
The MS Agent VB script appears as embedded HTML in the client reply. To verify the script, use the browser's "View Page Source" menu item.
On most newer browsers, the agent software downloads automatically after the script starts. The download may take several minutes, depending on the speed of the connection, so clients should be warned that it is slow. Also, the agent software download displays one or more licenses in dialog boxes, and clients may not want to accept the terms of the MS Agent software licenses.
- Can you help me debug the animated agent?
Look at the class Animagent (Animagent.java). The method vbscript_html(reply) does nothing unless the global Animagent member is true. In that case, the vbscript_html() method constructs a string from the reply that includes an MS Agent VBScript embedded in the HTML reply. This makes the browser load the objects required for the agent. The text reply simply becomes part of the VBScript.
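To picture what vbscript_html() does, here is a hypothetical Java sketch. The method name, the boolean parameter and especially the VBScript body (the character name "Robby" and the Agent.Characters(...).Speak call) are placeholders. The real script also includes the objects that make the browser load the agent, which this sketch omits.

```java
// Hypothetical sketch of the idea behind Animagent.vbscript_html(reply).
// The VBScript text is a placeholder, not the script program B emits.
final class AnimagentSketch {

    static String vbscriptHtml(String reply, boolean animagent) {
        if (!animagent) {
            return reply; // option off: the reply passes through unchanged
        }
        // VBScript string literals escape a double quote by doubling it,
        // and a literal cannot span lines, so newlines become spaces.
        String spoken = reply.replace("\"", "\"\"").replace("\n", " ");
        return "<SCRIPT LANGUAGE=\"VBScript\">\n"
                + "Agent.Characters(\"Robby\").Speak \"" + spoken + "\"\n"
                + "</SCRIPT>\n"
                + reply;
    }
}
```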
You may have to download and run the Robby the Robot agent software and the text-to-speech synthesis software from the MSDN homepage:
http://msdn.microsoft.com/workshop/imedia/agent
We wish other companies were producing agent animation APIs for free, but MS Agent seems to be about the only one available now.
Join the ALICE and AIML mailing list at alicebot.listbot.com to see how others are working with the animated agent software.
- Can I speak to the robot with voice input?
One simple experiment that works well as a demo involves using IBM ViaVoice (tm) speech recognition software on a Windows platform. At the same time, run the ALICE program B web server and activate the MS Agent interface. The ViaVoice software allows you to dictate into an application called VoicePad, but not directly into the browser. You have to use "cut" and "paste" to move your speech inputs into the browser form for ALICE. But the net effect is a somewhat slow voice-in, voice-out conversation with ALICE.
The ViaVoice software seems to work well with ALICE after some training. We trained it with the file "patterns.txt" created with the "List Patterns" command.
- How does ALICE keep track of conversations?
Originally ALICE used IP addresses to keep track of clients. Assuming that everyone chatting with ALICE has a fixed IP address, at least for the duration of their conversation, this technique works successfully. Each IP address is a key into a hashtable (or database) that stores the client's dialogue, name, and values of pronouns and other AIML values.
Unfortunately, many clients have "dynamic IP addressing" enforced by their ISP. AOL and MS WebTV are two notorious examples: each successive client transaction appears to come from a different host. For this reason, program B uses a form of "virtual IP" addressing to track dialogues.
The form in index.html (and the ALICE home page) contains a tag that creates a "hidden" parameter called "virtual" with an initial value of "none". The server assigns a unique name to the value of "virtual", which then becomes a hidden variable in the client's HTML form. Each successive client transaction contains this virtual IP address; the server uses it as a key to index the conversation.
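A minimal sketch of the bookkeeping, assuming (as the next question describes) that on the first transaction the server simply adopts the client's real IP as the virtual name. VirtualIpSketch and its methods are hypothetical; program B's actual key format may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "virtual IP" session keys (not program B's code).
final class VirtualIpSketch {
    private final Map<String, List<String>> dialogues = new HashMap<>();

    // Called for each request with the hidden "virtual" form field and the
    // connection's real IP; the result is written back into the hidden field.
    synchronized String resolve(String virtualParam, String remoteIp) {
        // First transaction ("none"): adopt the current real IP as the name.
        // Later transactions keep that name even if the real IP changes.
        String key = (virtualParam == null || virtualParam.equals("none"))
                ? remoteIp : virtualParam;
        dialogues.computeIfAbsent(key, k -> new ArrayList<>());
        return key;
    }

    synchronized void record(String key, String line) {
        dialogues.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
    }

    synchronized List<String> dialogue(String key) {
        return List.copyOf(dialogues.getOrDefault(key, List.of()));
    }
}
```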
- Can the virtual IP be the real IP?
Actually, that is the default case, when the client chats from the same fixed IP address. The only time the virtual IP differs from the real one is when the client is behind a dynamic firewall, as WebTV and AOL customers are.
- Can I run the web server as a daemon process?
Yes. There is a class called Bterm (Bterm.java) in the program B distribution. Bterm runs the web server as a console application, with no GUI. You can redirect the output of program Bterm to a log file and start the process in the background with "java Bterm > B.log &" (assuming a Unix shell).
- How does ALICE remember clients between sessions?
The persistence of memory in ALICE is inherited from the Java Properties class. The program B class Classifier saves the client name, age, location and other properties in a set of Properties lists. These Properties inherit the Java load and store methods. Program B uses the load and store methods to save the client properties in a set of files with names ip_name.txt, ip_age.txt, ip_location.txt and so on. If these files become too large or bothersome, there is no harm in deleting or editing them, or moving them to another directory.
The Applet requires no memory of the client properties, because the applet has only one client, and in any case remains in memory (at least for the lifetime of the client's browser cache).
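The file layout can be sketched with java.util.Properties: one list per attribute, keyed by client IP, stored in ip_<attribute>.txt. The class and method names are hypothetical; only the file naming follows the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: one Properties list per client attribute, keyed by IP,
// persisted as ip_<attribute>.txt in a given directory.
final class ClientMemorySketch {
    private final Path dir;

    ClientMemorySketch(Path dir) {
        this.dir = dir;
    }

    private Path file(String attribute) {
        return dir.resolve("ip_" + attribute + ".txt"); // e.g. ip_name.txt
    }

    private Properties read(String attribute) {
        Properties p = new Properties();
        Path f = file(attribute);
        if (Files.exists(f)) { // a deleted file just means nothing is remembered
            try (Reader in = Files.newBufferedReader(f)) {
                p.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return p;
    }

    void remember(String ip, String attribute, String value) {
        Properties p = read(attribute);
        p.setProperty(ip, value);
        try (Writer out = Files.newBufferedWriter(file(attribute))) {
            p.store(out, "client " + attribute);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String recall(String ip, String attribute) {
        return read(attribute).getProperty(ip);
    }
}
```

Because every call rereads the file, deleting or hand-editing an ip_*.txt file between sessions is harmless, which matches the advice above.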
V. Applet Interface
- How does the Applet work?
Program B supports the creation of both server-side and client-side chat robots. The server runs as a thread in program B. The client-side version is supported by an applet called Blet.java.
The Applet Blet.java runs ALICE in a web browser, or with the Java tool appletviewer. The file "index.html" contains an example of the HTML Applet tag syntax needed to start the Applet. The command "appletviewer index.html" will start the Applet.
You also have to create the file "index.html" and change the default values of the parameters "codebase" and "applethost" to serve the Applet from your location.
- How does the Applet differ from the application?
The Applet runs on the client's computer; the server runs on your host machine. The applet has fewer privileges and therefore a simpler user interface than the application, which uses menus and buttons to control server-side functions. The Applet may reside on any web server, such as one provided with an ISP account, but the application requires a 24/7 connection to the Web.
Internally, the primary difference between the two programs is that the Applet handles only one client conversation, while the application processes multiple client connections simultaneously. The Applet also suppresses all HTML (and any other XML) from the client response.
- How do I create an Applet?
Go to the Options menu and select "Show Options." You need to change the values of "AppletHost" and "CodeBase" to the correct IP address and directory for your applet host.
Many people want to post the applet on their web site. In that case, change the IP address "206.184.206.210" to the name or IP address of the web server. Change the directory path "/B" in "CodeBase" to your directory on the remote server. Save the changes with "Save Options."
Select "Create Applet" from the Options menu to create the "index.html" and "Blet.aiml" files needed to run your applet. The program displays the contents of "index.html" in your text area.
Use a file transfer utility like FTP to upload the class files (or jar file; see "What files do I need to run the Applet?") to your web server.
- List twelve basic Applet tips for AIML users
1. Applets are notoriously hard to debug; you are not dumb.
2. An applet can work perfectly well in Appletviewer, but then break in the browser, for any number of reasons.
3. Let's get the terminology straight: the applet resides on an "originating host" but runs on a "target machine".
4. The browser is very picky because of the "security sandbox": the browser doesn't trust Applets, so they can't open files (and must obey other restrictions) on the target machine.
5. The Applet MAY open a socket connection from the target machine to the originating host.
6. When you are debugging the applet, the target machine might be the same as the originating host (your computer).
7. When you post your applet to a remote web server, that server becomes the originating host.
8. You can use FTP to transfer the Applet files to the remote web server.
9. You must transfer ALL the applet's files to the originating host.
10. You must change the program B values of "CodeBase" and "AppletHost" (the originating host) to the name and location of the files on the remote server.
11. Use "Create Applet" to create the "index.html" and "Blet.aiml" files (make sure you have the latest release of B.zip).
12. We recommend placing all the *.class files into a single "Blet.jar" file (see DON'T READ ME).
- Can the AppletHost use a symbolic DNS name instead of an IP number?
Yes, but the numeric IP address works on more machines than a symbolic name. Applets are protected by a "security sandbox" from interfering with local resources on your machine. One restriction is that Applets may only open socket connections to the originating host. When using a symbolic DNS name, the "sandbox" may not know that two variations such as "Www.AliceBot.Org" and "alicebot.org" are in fact the same server. The client might also be unable to resolve the DNS name, in which case the Applet will throw a security exception.
- What files do I need to run the Applet?
You only need the Java *.class files and the *.aiml files to run the ALICE Applet; no other files are necessary. You can also put all the class files in a single jar file like Blet.jar. The sample index.html provided with the ALICE distribution uses this Blet.jar file.
Not all of the Java source files are involved in the Applet. You can use the following command to compile all the Java source files needed for the Applet:
javac Access.java Globals.java StringFile.java Substituter.java \
    Classifier.java Loader.java Animagent.java Log.java Blet.java
Then you can use zip (or jar) to collect the class files into a single jar file:
zip -r Blet.jar *.class
The wildcard *.class matches every class file in the current directory, including any you compiled earlier for the application. (With the JDK jar tool, the equivalent command is "jar cf Blet.jar *.class".)
The *.aiml files have to be on the same host that serves the Applet, because an applet can only open files on the server it originated from. Don't forget to change the Applet host parameters in index.html when you upload the applet to an ISP.
- Does the Applet record dialogues?
The applet tries to log conversations on the originating server, using a cgi-bin script called "Blog". If Blog exists, it records the dialogues in a file called "dialog.txt" (or another name chosen on the Options menu).
The cgi-script need not actually exist, because the server records the cgi-commands as errors in the access log. The applet opens a URL connection to its host and sends a log string in the form of an HTTP request, and the HTTP server logs it as an error (with code 404). Later on you can download the access_log and analyze it with program B. See the code in Classifier.java for the method log(x) that implements the URL connection.
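A hypothetical sketch of the logging request. The query format ("/cgi-bin/Blog?" followed by URL-encoded text) is an assumption and not necessarily what Classifier.log(x) sends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the applet logging trick (not Classifier.log(x)).
final class AppletLogSketch {

    // Encode spaces and punctuation so one dialogue line fits in a query string.
    static String logUrl(String host, String line) {
        return "http://" + host + "/cgi-bin/Blog?"
                + URLEncoder.encode(line, StandardCharsets.UTF_8);
    }

    // Fire the request and ignore the answer. If Blog does not exist, the
    // server replies 404 (an IOException here) but still writes the request
    // line, including the encoded text, to its access log.
    static void log(String host, String line) {
        try (InputStream in = URI.create(logUrl(host, line)).toURL().openStream()) {
            // nothing to read
        } catch (IOException | IllegalArgumentException e) {
            // logging is best effort
        }
    }
}
```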
- Can I analyze the dialogues collected by the Applet?
Yes. If the web server produces an access_log file, such as /var/log/httpd/access_log, then the server records Applet dialogue in the access_log file. You may use FTP to download the access_log file to your machine, then run program B to analyze it.
Go to the Options menu and find the value for "AnalysisFile". The Classify function operates on the data in the AnalysisFile. By default the AnalysisFile is the same as the LogFile (the current server log file), but you can change the analysis file to another name, such as /var/log/httpd/access_log or just access_log.
- Can the applet record a dialog.txt file on the server?
No, because the applet cannot write the file directly on the originating host. If your server log file /var/log/httpd/access_log is too large, you have a couple of choices:
1. If your ISP account gives you Unix shell access, use telnet to log on to the shell account. Use the command "grep Blog < access_log > dialog.txt" to create a smaller file to download, which contains just the lines recorded by the applet.
2. Create a CGI-BIN command called "/cgi-bin/Blog" that reads its command-line argument and appends it to a file called "dialog.txt". There ought to be a nice Perl script for this, or even a shell script.
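If shell access is not available, the same filtering can be done locally after downloading access_log. This Java sketch assumes Common Log Format and that each logged line arrives as a URL-encoded query string on a request for /cgi-bin/Blog. BlogExtractSketch is a hypothetical name.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: Java counterpart of "grep Blog < access_log > dialog.txt" that also
// decodes the logged text. Assumes request lines such as
// "GET /cgi-bin/Blog?<URL-encoded text> HTTP/1.0" (hypothetical layout).
final class BlogExtractSketch {
    private static final String MARK = "/cgi-bin/Blog?";

    static List<String> extract(Stream<String> accessLogLines) {
        return accessLogLines
                .filter(line -> line.contains(MARK))
                .map(BlogExtractSketch::decode)
                .collect(Collectors.toList());
    }

    private static String decode(String logLine) {
        int start = logLine.indexOf(MARK) + MARK.length();
        int end = logLine.indexOf(' ', start); // the query ends before " HTTP/1.x"
        String query = end < 0 ? logLine.substring(start) : logLine.substring(start, end);
        return URLDecoder.decode(query, StandardCharsets.UTF_8);
    }
}
```

With a downloaded log, open the lines with Files.lines(Path.of("access_log")) inside a try-with-resources block, pass the stream to extract(), and write the result to dialog.txt for analysis.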
- I am still having problems with the applet
If your applet is looking at Blet.aiml, your web space is at www.myplace.org, and your AIML files are in directory /alice/, then your load statements in Blet.aiml would look similar to this:
<load url="http://www.myplace.org/alice/Atomic.aiml"/>
If this is what you have, then open the "Java Console" window in your browser to get whatever debugging information is coming out. The Java console will display any error messages or exceptions caught by program B. Please report these errors to the ALICE and AIML mailing list at alicebot.listbot.com.
- Can you give me any help debugging the Applet?
Debugging applets can be tricky. The same suggestion to set IE for "local area network" might help here too. Also, the browser caches class files, so it's difficult to know if you are testing a "fresh" copy of the applet. The program "appletviewer" that comes with Sun Java is better for debugging applets. Use "appletviewer index.html".
The best thing to do is join the alicebot mailing list at alicebot.listbot.com.
VI. AIML
- What is AIML?
The ALICE software implements AIML (Artificial Intelligence Markup Language), a non-standard, evolving markup language for creating chat robots. The primary design feature of AIML is minimalism. Compared with other chat robot languages, AIML is perhaps the simplest. The pattern matching language is very simple, for example permitting only one wild-card ('*') match character per pattern.
AIML is an XML language, implying that it obeys certain grammatical meta-rules. The choice of XML syntax permits integration with other tools such as XML editors. Another motivation for XML is its familiar look and feel, especially to people with HTML experience.
An AIML chat robot begins and ends with the <alice> and </alice> tags respectively.
- What is XML?
David Bacon pronounces it "Eggsmell". XML is the Extensible Markup Language. Like many "standards" in computer science, XML is a moving target. In the simplest terms, XML is just a generalized version of HTML. Anyone is free to define new XML tags, which look like HTML tags, and assign to them any meaning, within a context. AIML is an example of using the XML standard to define a specialized language for artificial intelligence.
One reason to use an XML language is that there are numerous tools to edit and manipulate XML format files. Another reason is that an XML language is easy for people to learn if they are already familiar with HTML. Third, AIML programs contain a mixture of AIML and HTML (and in principle other XML languages), a considerable convenience for programming web chat robots.
A good resource for information on XML is www.oasis-open.org.
- What is a category?
AIML consists of a list of statements called categories. Each category contains an input pattern and a reply template. The syntax of an AIML category is:
<category>
<pattern>PATTERN</pattern>
<template>Template</template>
</category>
or
<category>
<pattern>PATTERN</pattern>
<that>THAT</that>
<template>Template</template>
</category>
The AIML category tags are case-sensitive. Each open tag has an associated closing tag. This syntax obviously derives from XML.
- What is a pattern?
The pattern is the "stimulus" or "input" part of the category. The pattern is an expression in a formal language that consists of:
(1) Words of natural language in UPPER CASE.
(2) The symbol *, which matches any sequence of one or more words.
(3) The symbol _, which is the same as * except that it comes after Z in lexicographic order.
(4) The markup <name/>, which is replaced at robot load time with the name of the robot.
Note there is a difference between the patterns HELLO and HELLO *. HELLO matches only identical one-word sentences ("Hello."), and HELLO * matches any sentence of two or more words starting with "Hello" ("Hello how are you?").
To simplify pattern description and matching, AIML patterns allow only one "*" per pattern. In other words, "MY NAME IS *" is a valid pattern, but "* AND *" is not.
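The one-wild-card rule keeps matching simple: the words before and after the wild-card are fixed, so the wild-card absorbs whatever lies between them. A minimal Java sketch over normalized input. PatternSketch is hypothetical and not program B's matcher, and it treats "_" exactly like "*" because the two differ only in sort order.

```java
import java.util.Arrays;
import java.util.Optional;

// Minimal sketch of one-wild-card matching over normalized input (hypothetical).
final class PatternSketch {

    // Returns the words captured by the wild-card ("" if the pattern has none),
    // or Optional.empty() if the input does not match.
    static Optional<String> match(String pattern, String input) {
        String[] p = pattern.trim().split("\\s+");
        String[] in = input.trim().split("\\s+");

        int star = -1;
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("*") || p[i].equals("_")) {
                if (star >= 0) {
                    throw new IllegalArgumentException("more than one wild-card: " + pattern);
                }
                star = i;
            }
        }
        if (star < 0) {
            return Arrays.equals(p, in) ? Optional.of("") : Optional.empty();
        }

        int after = p.length - star - 1;      // fixed words after the wild-card
        int width = in.length - star - after; // words left for the wild-card
        if (width < 1) {
            return Optional.empty();          // the wild-card needs at least one word
        }
        for (int i = 0; i < star; i++) {      // fixed words before the wild-card
            if (!p[i].equals(in[i])) {
                return Optional.empty();
            }
        }
        for (int i = 0; i < after; i++) {     // fixed words after the wild-card
            if (!p[star + 1 + i].equals(in[star + width + i])) {
                return Optional.empty();
            }
        }
        return Optional.of(String.join(" ", Arrays.copyOfRange(in, star, star + width)));
    }
}
```

For example, match("HELLO *", "HELLO") is empty while match("HELLO *", "HELLO HOW ARE YOU") captures "HOW ARE YOU", mirroring the HELLO versus HELLO * distinction above.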
- What is a template?
A template is the "response" or "output" part of an AIML category. The template is the formula for constructing the reply. The simplest template consists of plain, unmarked text. AIML provides markup functions to tailor the replies for each individual input and client. The markup function <getname/>, for example, inserts the client's name into the reply.
The template may call the pattern matcher recursively using the <sr/> and <srai> tags. Many templates are simple symbolic reductions that map one sentence form to another; for example, "Do you know what X is?" transforms to "What is X?" with the category
<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
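The reduction above can be sketched as a tiny recursive responder. This is a simplified illustration, not program B's implementation: categories are tried in the order they were added rather than in reverse alphabetical order, templates are plain strings, and the recursion limit of 10 is an arbitrary assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of recursive symbolic reduction with <srai> and <star/>.
final class SraiSketch {
    private static final Pattern SRAI = Pattern.compile("<srai>(.*?)</srai>", Pattern.DOTALL);
    private static final int MAX_DEPTH = 10; // arbitrary guard against reduction loops

    private final Map<Pattern, String> categories = new LinkedHashMap<>();

    // "DO YOU KNOW WHAT * IS" becomes a regex with one capture group for "*".
    void add(String pattern, String template) {
        String regex = Pattern.quote(pattern).replace("*", "\\E(.+)\\Q");
        categories.put(Pattern.compile(regex), template);
    }

    String respond(String input) {
        return respond(input, 0);
    }

    private String respond(String input, int depth) {
        if (depth > MAX_DEPTH) {
            return "";
        }
        for (Map.Entry<Pattern, String> c : categories.entrySet()) {
            Matcher m = c.getKey().matcher(input);
            if (m.matches()) {
                String star = m.groupCount() > 0 ? m.group(1) : "";
                String reply = c.getValue().replace("<star/>", star);
                return expandSrai(reply, depth);
            }
        }
        return "";
    }

    // Replace each <srai>X</srai> with the robot's own reply to X.
    private String expandSrai(String text, int depth) {
        Matcher m = SRAI.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String inner = respond(m.group(1).trim(), depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```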
The template may also contain other embedded HTML and XML. These embedded tags may cause the browser to play a sound, show an image, or run an applet. There is considerable freedom of expression in the construction of response templates. The botmaster is encouraged to study the examples in ALICE and to experiment with new ideas.
- What is "that"?
The keyword "that" in ALICE refers to whatever the robot said before a user input. Conceptually, the choice of "that" comes from the observation of the role of the word "that" in dialogue fragments like:
Robot: Today is yesterday.
Client: That makes no sense.
Robot: The answer is 3.14159
Client: That is cool.
In AIML the syntax <that>...</that> permits an optional "ThatPattern" to match the robot's "that" expression. A common example using "that" is any yes-no question:
<category>
<pattern>YES</pattern>
<that>DO YOU LIKE MOVIES</that>
<template>What's your favorite movie?</template>
</category>
This category handles the user input "YES" and checks to see whether the client is replying to the question "Do you like movies?".
One interesting application of "that" is the set of categories that enable a robot to respond to "knock-knock" jokes:
<category>
<pattern>KNOCK KNOCK</pattern>
<template>Who's there?</template>
</category>
<category>
<pattern>*</pattern>
<that>WHO IS THERE</that>
<template><person/> Who?</template>
</category>
<category>
<pattern>*</pattern>
<that>* WHO</that>
<template>Ha ha very funny, <getname/></template>
</category>
Client: KNOCK KNOCK
Robot: Who's there?
Client: BANANA
Robot: banana Who?
Client: KNOCK KNOCK
Robot: Who's there?
Client: BANANA
Robot: banana Who?
Client: KNOCK KNOCK
Robot: Who's there?
Client: ORANGE
Robot: orange Who?
Client: ORANGE YOU GLAD I DID NOT SAY BANANA
Robot: Ha ha very funny, Aol-person
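Conceptually, <that> turns the lookup key into a pair: the input and the robot's previous reply. The hypothetical sketch below uses exact matches only, stores categories without a <that> under "*", and normalizes by dropping punctuation instead of expanding contractions. All of these are simplifications of what program B does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of <that> matching with exact (wild-card free) keys.
final class ThatSketch {
    private final Map<String, String> categories = new HashMap<>();
    private String that = ""; // the robot's previous reply, normalized

    void add(String pattern, String thatPattern, String template) {
        categories.put(key(pattern, thatPattern), template);
    }

    private static String key(String pattern, String thatPattern) {
        return pattern + "\u0000" + thatPattern; // unambiguous composite key
    }

    private static String normalize(String s) {
        return s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9 ]", "").trim();
    }

    String respond(String input) {
        String in = normalize(input);
        // Prefer a category whose <that> equals the previous reply,
        // then fall back to the category with no <that> (stored under "*").
        String reply = categories.get(key(in, that));
        if (reply == null) {
            reply = categories.getOrDefault(key(in, "*"), "I have no answer for that.");
        }
        that = normalize(reply);
        return reply;
    }
}
```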
- How do I use "that"?
The AIML tag <that> refers to the robot's previous reply. There are two forms of the <that> tag: a paired form <that>...</that> appearing in a category, and an atomic form <that/> always appearing in a template. Often we can use <that/> to find an opportunity to create a category with <that></that>.
One of the default replies to the input "WHY" is the template "<that/>"? Why?, which produces the following dialogue fragment:
Robot: Do not ask me any more questions please.
Client: WHY
Robot: "Do not ask me any more questions please"? Why?
The botmaster notices the fragment and creates the new AIML category:
<category>
<pattern>WHY</pattern>
<that>DO NOT ASK ME ANY MORE QUESTIONS PLEASE</that>
<template>Because I would rather talk about you.</template>
</category>
Now the next client who asks "WHY" in response to the robot's request will activate the new <that> category:
Robot: Do not ask me any more questions please.
Client: WHY
Robot: Because I would rather talk about you.
This style of conversational analysis does not presuppose that we know when the client will say "WHY"; rather, it looks backward to identify cases where "WHY" appeared following one of the robot's statements. Having identified the conversation point, the botmaster creates the new category.
- What is <load filename="X"/>?
The template may contain a <load/> tag to recursively load an AIML file. The semantics of a load are the same as a merge: categories loaded first have priority, and the server eliminates later categories with duplicate patterns.
The default robot file B.aiml contains the top-level load commands. There are several ways to "comment out" a <load> tag in order to test your system with a smaller robot. One is to change the line reading
<load filename="Brain.aiml"/>
to
<noload filename="Brain.aiml"/>
and the AIML parser will simply ignore the non-existent "noload" command.
- What happens to contractions and punctuation?
Program B has a class called Substituter that performs a number of grammatical and syntactical substitutions on strings. One task involves preprocessing sentences to remove ambiguous punctuation, preparing the input for segmentation into individual sentence phrases. Another task expands all contractions and converts all letters to upper case; this process is called "normalization". The Substituter class also performs some spelling correction. (See also the question "What is <person/>?")
One justification for removing all punctuation from inputs is the need to make ALICE compatible with speech input systems, which of course do not detect punctuation (unless the speaker utters the actual word for the punctuation mark, such as "period").
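Normalization can be sketched in a few lines of Java. The contraction table is a small illustrative sample (an assumption, not the Substituter's list), sentence segmentation and spelling correction are left out, and NormalizerSketch is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;

// Sketch of normalization: upper-case, expand contractions, drop punctuation.
final class NormalizerSketch {
    // Illustrative sample only, not the Substituter's substitution list.
    private static final Map<String, String> CONTRACTIONS = Map.of(
            "WHAT'S", "WHAT IS",
            "WHO'S", "WHO IS",
            "I'M", "I AM",
            "DON'T", "DO NOT",
            "CAN'T", "CAN NOT");

    static String normalize(String input) {
        StringBuilder out = new StringBuilder();
        for (String word : input.toUpperCase(Locale.ROOT).split("\\s+")) {
            // Keep letters, digits and apostrophes so contractions survive this step.
            String w = word.replaceAll("[^\\p{L}\\p{N}']", "");
            // Expand a known contraction, then drop any remaining apostrophes.
            w = CONTRACTIONS.getOrDefault(w, w).replace("'", "");
            if (!w.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(w);
            }
        }
        return out.toString();
    }
}
```

Note how "Who's there?" becomes WHO IS THERE, the form used in the knock-knock <that> example.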
- How are the patterns matched?
Program B stores the categories in alphabetical order by pattern. When a client enters an input, the program scans the categories in reverse order to find the best match. By comparing the input with the patterns in reverse alphabetical order, the algorithm ensures that the most specific pattern matches first. "Specific" in this case has a formal definition, but basically it means that the program finds the "longest" pattern matching an input.
The wild-card character "*" comes before "A" in alphabetical order. For example, the "WHAT *" pattern is more general than "WHAT IS *", and sorts before it. The default pattern "*" is first in alphabetical order and is the most general pattern. For convenience, AIML also provides a variation on "*" denoted "_", which comes after "Z" in alphabetical order.
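Because ' ' < '*' < 'A' through 'Z' < '_' in ASCII, Java's natural String order already places "*" before letters and "_" after them, so a TreeMap scanned with descendingMap() reproduces the order described above. The regex-based matcher is a simplification, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Sketch: keep categories sorted by pattern, scan them in reverse order.
final class MatchOrderSketch {
    private final TreeMap<String, String> categories = new TreeMap<>();

    void add(String pattern, String template) {
        categories.put(pattern, template);
    }

    // The order in which patterns are tried.
    List<String> scanOrder() {
        return List.copyOf(categories.descendingKeySet());
    }

    Optional<String> respond(String normalizedInput) {
        for (Map.Entry<String, String> c : categories.descendingMap().entrySet()) {
            if (toRegex(c.getKey()).matcher(normalizedInput).matches()) {
                return Optional.of(c.getValue()); // first hit is the most specific
            }
        }
        return Optional.empty();
    }

    // "WHAT IS *" becomes the literal "WHAT IS " followed by one or more characters.
    private static Pattern toRegex(String pattern) {
        return Pattern.compile(Pattern.quote(pattern)
                .replace("*", "\\E.+\\Q")
                .replace("_", "\\E.+\\Q"));
    }
}
```

With the patterns *, WHAT *, WHAT IS * and WHAT IS AIML loaded, the scan tries WHAT IS AIML first and * last, so the default pattern only answers when nothing else matches.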
- Do the categories need to be in alphabetical order by pattern?
No. The alphabetical order is maintained internally when the categories load, but you can write them in any order. When you do "Save Robot", the file may or may not be stored alphabetically.
- How are the categories stored?
If your session with program B included a "Classify" routine, then the AIML script is stored in order of category activation rank. In other words, program B stores the most frequently activated category (usually '*') first, the second most frequently activated next, and so on. If a number of categories have the same activation count, program B saves them in alphabetical order by pattern. Hence, if the session did not include a "Classify" routine, the program stores all the categories in alphabetical order by pattern (because they all have an activation count of zero).
One reason to store the categories in order of activation is to make the Applet interface more natural. Because the Applet interface starts simultaneously with a thread to load the robot source file, the Applet client can talk with the robot before all the categories are fully loaded. Given that the interlocutor is more likely to say something that activates a more frequently activated category, it makes sense to transmit these categories first. Storing the *.aiml files in order of category activation achieves the desired effect: the Applet loads the most frequent categories first and continues loading in the background while the conversation begins.
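The save order amounts to a two-key sort. A minimal sketch, assuming each category carries an activation count; the record and class names are hypothetical (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the save order: highest activation count first,
// ties broken alphabetically by pattern.
final class SaveOrderSketch {

    record Category(String pattern, int activations) {}

    static List<Category> saveOrder(List<Category> categories) {
        List<Category> sorted = new ArrayList<>(categories);
        sorted.sort(Comparator.comparingInt(Category::activations).reversed()
                .thenComparing(Category::pattern));
        return sorted;
    }
}
```

When every count is zero, the first key never separates two categories, so the result falls back to plain alphabetical order, as described above.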
- Is there a way to use the GUI interface to add one category at a time?
Yes. Do a "clear". Type in one category:
<category>
<pattern>WHO IS JOHN</pattern>
<template>He is a really smart guy.</template>
</category>
Now do an "Add AIML". If you like the result, do a "Save Robot". If your name is not John, try replacing JOHN with your own name. Notice that the pattern is in all upper case. This is called "normalized form". We store patterns this way for efficiency. The template, on the other hand, is in mixed case.
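As an illustration of normalized form, here is a small Java sketch that upper-cases an input, drops punctuation and collapses runs of whitespace. This is an assumption about what normalization involves; program B's real preprocessing may differ in detail, and the Normalizer class name is hypothetical.

```java
import java.util.Locale;

public class Normalizer {
    /**
     * Sketch of "normalized form": upper case, punctuation removed,
     * runs of whitespace collapsed to single spaces.
     */
    static String normalize(String input) {
        return input.toUpperCase(Locale.ROOT)
                    .replaceAll("[^\\p{L}\\p{N}\\s]", "") // keep letters, digits, whitespace
                    .trim()
                    .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Who is   John?")); // WHO IS JOHN
    }
}
```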
You can also create a file of AIML, do a cut & paste, and then "Add AIML" to add more categories. Editing the source file directly is, of course, also useful. If you edit the source file, select "Load Robot" to load it.
Try creating a text file with the category:
<category>
<pattern>WHO IS JOHN WANG</pattern>
<template>
<random>
<li>He is a really smart guy.</li>
<li><set_he>John Wang</set_he> is a great father.</li>
</random>
</template>
</category>
Load the file into program B with the "File/Load Text File" menu item. Then select "Add AIML" from the Botmaster menu.
- Can I build on top of the ALICE code rather than changing it?
Absolutely. You only have to change her name, location, birthday and/or botmaster, and put in a couple of references to yourself. Then add new categories that cover your own area of expertise or interest.
- What's new in AIML?
AIML is changing. The original tag syntax was changed into XML. Right now, AIML uses XML syntax for the categories, patterns, "that" patterns and templates, but inside the <template> tag you may still see the original +~ syntax in a few places. But this will change soon. For completeness, program B supports both versions.
The biggest change between the old AIML and the new XML version of AIML is the elimination of the "+" character to stand for string appendage. The change is of little concern except in the implementation of <random>, discussed at length below.
The old AIML used a tilde (~) markup character to indicate the start of an AIML token. The XML version naturally uses an SGML-style tag syntax instead. XML tags, unlike HTML tags, are case-sensitive. Moreover, XML syntax requires a closing tag of some kind. "Empty" tags that contain no text, like <A></A> in HTML, can be written as <A/> in XML.
- What is <star>?
The <star> tag indicates the input text fragment matching the pattern '*'. Remember, <star/> is an XML abbreviation for <star></star>.
<star/> the value of "*" matched by the pattern
- What is a symbolic reduction?
In general there are a lot of categories whose job is "symbolic reduction". The category:
<category>
<pattern>ARE YOU VERY *</pattern>
<template><srai>ARE YOU <star/></srai></template>
</category>
This category [in Brain.aiml] will reduce "Are you very very smart" to "Are you smart".
- What are the get methods?
Get methods are logically atomic tags, i.e. they enclose no text (similar to, say, <P> or <IMG> in HTML). But XML requires closing tags, so they are written in the abbreviated form <tag/>. All the "get" methods retrieve values stored relative to a particular client IP address. We use hash tables to store the maps from IP to these attributes.
<get_ip/> the client's IP address
<getname/> the client's name
<gettopic/> the "topic" of conversation
<name/> the robot's name
<location/> the robot's location
<gender/> the robot's gender
<birthday/> the robot's birthday
<that/> what the robot said previously
<get_location/> the client's geographic location
<get_it/> the value of "it"
<get_they/> the value of "they"
<get_he/> the value of "he"
<get_she/> the value of "she"
<get_we/> the value of "we"
<get_gender/> a string like "she" or "he" for the client's gender
In XML languages there is always a tradeoff between creating attributes and creating new tags. The get methods are really all special instances of a more general <get attribute="name"/>, for example:
<get_we/> = <get attribute="we"/>
The attributes with explicit "get" names (getname, get_it, get_we, etc.) are client-specific properties. The other attributes (e.g. <name/> and <botmaster/>) relate to the robot.
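The per-client storage can be pictured as a hash table of hash tables keyed by client IP address. The Java sketch below is illustrative only: the ClientPredicates class, its get and set methods, and the default value passed to get are hypothetical, not program B's API.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientPredicates {
    // client IP -> (attribute name -> value)
    private final Map<String, Map<String, String>> byClient = new HashMap<>();

    /** Like <get attribute="..."/>: the stored value for this client, or a default. */
    String get(String ip, String attribute, String defaultValue) {
        return byClient.getOrDefault(ip, Map.of()).getOrDefault(attribute, defaultValue);
    }

    /** Like <set_X> value </set_X>: stores the value for this client only. */
    void set(String ip, String attribute, String value) {
        byClient.computeIfAbsent(ip, k -> new HashMap<>()).put(attribute, value);
    }

    public static void main(String[] args) {
        ClientPredicates p = new ClientPredicates();
        p.set("10.0.0.1", "it", "money");
        System.out.println(p.get("10.0.0.1", "it", "it")); // money
        System.out.println(p.get("10.0.0.2", "it", "it")); // it (a different client)
    }
}
```

Keying the outer table by IP is what keeps one client's "it" from leaking into another client's conversation.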
- What are the set methods?
Set methods consist of single-tag and double-tag markup. The methods are:
<set_male/> sets the client gender to male
<set_female/> sets the client gender to female
<set_animagent/> activates the animation agent
<setname> X </setname> sets the client name to X
<settopic> X </settopic> sets the topic to X
<set_it> X </set_it> sets the value of "it" to X
<set_location> X </set_location> sets the client location to X
<set_they> X </set_they> sets the value of "they" to X
<set_he> X </set_he> sets the value of "he" to X
<set_she> X </set_she> sets the value of "she" to X
<set_we> X </set_we> sets the value of "we" to X
<set_thought> X </set_thought> is a custom tag suggested by Andrew Potgieter for storing a predicate for "What are you thinking about?" See the documentation on custom tags and the predicates.txt file.
- How do I use the pronoun tags?
The guiding principle in creating content is dialogue analysis. The principle applies to pronouns like "it", "he" and "we". This example shows how a couple of tricks help us find places where the pronoun tags improve the dialogue.
Client: HELLO
Robot: Hi there!
Client: HOW ARE YOU
Robot: I am functioning within normal parameters.
Client: I AM BROKE
Robot: Do you mean your name is broke.
Client: NO I HAVE NO MONEY
Robot: I see. Would you like to have it?
Client: YES
Robot: "Would you like to have it"? Yes?
One of the default responses to a YES pattern is "<that/>?" Yes?
The conversation breaks down somewhat at this point, but the robot nonetheless conveys an illusion of understanding the client's response. With its response, the robot is really telling the botmaster to create a new "YES" reply specific to this situation:
<category>
<pattern>YES</pattern><that>WOULD YOU LIKE TO HAVE IT</that>
<template>Where do you usually get <get_it/>?</template>
</category>
Now the next time we run the dialogue it ends with:
Client: NO I HAVE NO MONEY
Robot: I see. Would you like to have it?
Client: YES
Robot: Where do you usually get money?
There is no automated technique for finding the conversation points where the pronoun tags improve the flow of conversation. This is the "art" of AIML programming. The example shown here with '"<that/>?" Yes?' exemplifies one approach to finding these conversation waypoints. There are no doubt countless other tricks like this, and the field is wide open to linguists and writers to help us uncover them.
- I still don't get "it"
Here is another example that might help clarify the meaning of the pronoun "it."
The category with the pattern "DO YOU LIKE *" is a kind of default category for a whole class of inputs matching "Do you like X?", when the input does not match a category with a more specific pattern like "DO YOU LIKE CATS". No matter what the client says, we want the robot to remember that "it" stands for "X".
Many inputs activate this default category, so the botmaster tries to create a variety of replies using the <random> tag twice. One interesting side effect of the random tag is the evaluation of all AIML inside the <random>...</random> expression, whether or not the random function selects the list item containing the markup. So if one item happens to contain a <set_it> tag, then the program will remember "it", even if the actual random reply is a different item.
<category>
<pattern>DO YOU LIKE *</pattern>
<template><random>
<li>I don't know if I like <set_it><person/></set_it>. </li>
<li>I've heard other people say they like that. </li>
<li>Some people like it. </li>
<li>How old are you? </li>
<li>Do you like it? </li>
</random>
<random>
<li><for_fun/>. </li>
<li>I like pets. </li>
<li>I love horses. </li>
<li>I like trains. </li>
<li>But I like cats. </li>
<li>I really like ELIZA. </li>
<li>But I like to gossip. </li>
<li>I like science fiction. </li>
<li>Though I like cats and dogs. </li>
<li>My favorite thing is robots. </li>
<li>I love TV shows about robots. </li>
<li>But I think airplanes are cool. </li>
<li>I like to talk about movies and music. </li>
</random>
</template>
</category>
In the "DO YOU LIKE *" category, one list item
I don't know if I like <set_it><person/></set_it>.
results in the pronoun "it" being bound to the value of "*", rewritten in third person. With this category, the robot has the following dialogue:
Client: Do you like Dr. Who
Robot: How old are you? I like to make new friends.
The robot response is a combination of two random selections. But as a side effect, the pronoun value of "it" is now set to "Dr. Who".
Client: It is a TV programme
Robot: Dr. Who is a TV programme. I suppose that makes sense.
To generate the final response, the robot applied its category:
<category>
<pattern>IT IS A *</pattern>
<template><get_it/> is a <person/>. I suppose that makes sense.</template>
</category>
The robot is correct that the reply does indeed make sense. Does this example suggest that in our human mind "it" is just a temporary storage register for a fragment of text?
Extra credit: In the dialogue fragment above, why did the robot say "I like to make new friends"?
- Can I create more AIML tags?
AIML is extensible. You can create an unlimited number of new tags for foreign language pronouns, predicates, or application-specific properties. The file "predicates.txt" defines any new predicate tags. "Predicate tags" means tags that have a client-specific "set" and "get" method. Pronouns like "it" and "he" have predicate tags like <set_it></set_it> and <get_he/>. AIML has a number of these built-in tags for common English pronouns.
There are two varieties of extensible predicate tags. The first example illustrates the use of new tags for foreign language pronouns. The Japanese pronoun "kare" means "he". In predicates.txt, we can add a line of the form:
kare=dare
This single line automatically generates the tag <set_kare> X </set_kare> to set the value of "kare" to X, and the tag <get_kare/> to retrieve the value. By default, <get_kare/> returns "dare" ("who?").
Now we can create two AIML categories for an elementary Japanese conversation:
<category>
<pattern>KARE WA * DESU</pattern>
<template><star/> wa <set_kare><star/></set_kare> desu.</template>
</category>
<category>
<pattern>KARE WA * DESU KA</pattern>
<template><get_kare/> desu ka?</template>
</category>
These two categories produce a coherent conversation about Mr. Arimoto:
Client: KARE WA DARE DESU KA
Robot: dare desu ka?
Client: KARE WA ARIMOTO SAN DESU
Robot: arimoto san wa kare desu.
Client: KARE WA DARE DESU KA
Robot: arimoto san desu ka?
Notice that the program replaces <set_kare><star/></set_kare> with "kare", which is what a set tag returns by default. Sometimes, however, this default is not the best choice for the set tag to return. The second type of predicate tag replaces the set tag markup with the original string inside the markup.
To create a predicate tag "has", we add the following line to predicates.txt:
has=*mother
The default value of <get_has/> is "mother" (because everyone has a mother). The "*" here indicates that the program should replace <set_has> X </set_has> with X.
Here we write two AIML categories to demonstrate the "has" predicate:
<category>
<pattern>I HAVE A *</pattern>
<template>Where did you get <set_it><set_has><person/></set_has></set_it>? </template>
</category>
<category>
<pattern>WHAT DO I HAVE</pattern>
<template>Tell me more about your <get_has/>. </template>
</category>
Notice that the first category embeds <set_has> in <set_it>, so that the value of <person/> becomes both "it" and "has". The resulting dialogue:
Client: WHAT DO I HAVE
Robot: Tell me more about your mother.
Client: I HAVE A BOX
Robot: Where did you get it?
Client: WHAT DO I HAVE
Robot: Tell me more about your box.
Look at the file "predicates.txt" for several more examples of custom AIML tags, including the German <set_sie>, <set_er> and <set_es>.
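The two predicates.txt conventions (name=default, and name=*default for a set tag that returns its own contents) can be sketched in Java as follows. The Predicate record and its parse and setResult methods are hypothetical, written only from the behavior described in this answer; they assume that a plain set tag returns the predicate's name, as <set_kare> does in the Japanese dialogue. Records need Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

public class PredicateDefs {
    /** One predicates.txt line, such as "kare=dare" or "has=*mother". */
    record Predicate(String name, String defaultValue, boolean setReturnsContents) {

        static Predicate parse(String line) {
            int eq = line.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("expected name=value: " + line);
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            boolean star = value.startsWith("*"); // leading "*": set tag returns its contents
            return new Predicate(name, star ? value.substring(1) : value, star);
        }

        /** The text that <set_NAME> X </set_NAME> leaves in the reply. */
        String setResult(String contents) {
            return setReturnsContents ? contents : name;
        }
    }

    public static void main(String[] args) {
        Map<String, Predicate> defs = new HashMap<>();
        for (String line : new String[] {"kare=dare", "has=*mother"}) {
            Predicate p = Predicate.parse(line);
            defs.put(p.name(), p);
        }
        Map<String, String> values = new HashMap<>(); // one client's current values
        Predicate has = defs.get("has");
        System.out.println(values.getOrDefault("has", has.defaultValue())); // mother
        values.put("has", "box");
        System.out.println(has.setResult("box"));                           // box
        System.out.println(defs.get("kare").setResult("arimoto san"));      // kare
    }
}
```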
- What are the <person> tags?
The <person> and <person2> tags indicate a place where the AIML interpreter changes the personal pronouns in a sentence.
<person2> X </person2> change X from 1st to 2nd person
<person> X </person> exchange 1st and 3rd person
<person2> is not often used. The main application is "gossip":
Client: I admire robots like you.
Robot: That's good information: Joe said he admire robots like me.
The transformation is a combination of:
1. change the first person pronouns to second person.
2. change the third person pronouns to first person.
The array in Substituter.java is incomplete. We need more substitutions to make person2 work really well.
The <person> substitution is much more common and easier to understand, because it simply exchanges 1st and 3rd person pronouns. The main issue with <person> in English is knowing when to use "I" and when to use "me".
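In the gossip example, the pattern "I *" consumes the word "I", and <get_gender/> supplies "he", so the substitution itself only has to turn "you" into "me". The Java sketch below shows that word-by-word substitution; its table is a hypothetical, deliberately tiny stand-in, not the array in Substituter.java.

```java
import java.util.Locale;
import java.util.Map;

public class Person2Sketch {
    // Hypothetical, deliberately tiny substitution table.
    static final Map<String, String> SWAPS = Map.of(
        "you", "me",
        "your", "my",
        "yours", "mine");

    /** Replaces each word at most once, in a single left-to-right pass. */
    static String person2(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(SWAPS.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String star = "admire robots like you"; // what "I *" matched in "I admire robots like you"
        String name = "Joe", gender = "he";      // stand-ins for <getname/> and <get_gender/>
        System.out.println(name + " said " + gender + " " + person2(star));
        // prints: Joe said he admire robots like me
    }
}
```

Replacing each word in a single pass matters once the table grows: if it also mapped "me" back to "you", a chain of String.replace calls could undo its own work, while a per-word lookup changes each word at most once.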
- What is the <person/> tag?
The XML specification requires that every start tag such as <person> be followed by a matching end tag like </person>. HTML is more relaxed about this requirement, as exemplified by the liberal use of the <IMG> tag without a corresponding </IMG>. XML supports a shorthand notation for the "atomic" tags. The <star/> tag is an example of a shorthand AIML tag. <person/> is another example:
<person/> = <person><star/></person>
This tag replaces the +~person(*)+ tag in old-style AIML.
- What is the <person2/> tag?
This tag is an abbreviation:
<person2/> = <person2><star/></person2>
See the FAQ question "What are the <person> tags?" for more information about <person2/>.
- What is "gossip"?
Gossip is an interesting "learning" feature of AIML. The best way to illustrate the gossip function is with an example. Consider the category with the pattern "I *" and the template:
Thanks for the gossip: <gossip><getname/> said <get_gender/> <person2/> </gossip>.
The <gossip> function tells AIML that the botmaster wants to save this tidbit as gossip. The <get_gender/> function returns "he" or "she" as determined by the markup functions <set_female/> and <set_male/>. The <person2/> function converts the statement * to second person. At present the robot stores the gossip collected in a file called "gossip.txt".
<gossip> X </gossip> Save X as gossip.
- What is the <personf/> tag?
The value of <personf/> (a "formatted" personal pronoun transformation) is shown by the example:
<category>
<pattern>WHAT IS A *</pattern>
<template>
What does
<A HREF="http://www.dictionary.com/cgi-bin/dict.pl?term=<personf/>">
<set_it> <person/> </set_it>
</A> mean?
<BR>
Or Ask Jeeves:
<A HREF="http://www.ask.com/AskJeeves.asp?ask=WHAT%20IS%20A%20<personf/>">
What is a <person/>?
</A>
</template>
</category>
The search strings formatted for the Webster Dictionary and for the Ask.com search engine use <personf/>. The effect is the same as <person/>, but the formatting inserts an escaped "%20" in place of each space returned by <person/>. These escape sequences permit the HTTP GET methods to transmit multiple-word queries.
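A minimal Java sketch of the %20 formatting, assuming the <person/> transformation has already been applied to the text. The personf method name is borrowed from the tag for illustration only.

```java
public class PersonfSketch {
    /** Escapes each space as %20, the extra formatting <personf/> adds to <person/>. */
    static String personf(String personText) {
        return personText.trim().replace(" ", "%20");
    }

    public static void main(String[] args) {
        String term = personf("search engine"); // assume <person/> produced "search engine"
        System.out.println("http://www.ask.com/AskJeeves.asp?ask=WHAT%20IS%20A%20" + term);
    }
}
```

The standard java.net.URLEncoder is an alternative, but it encodes spaces as "+" rather than "%20" and also escapes other reserved characters, so its output differs from the simple replacement shown here.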
- What's the <srai> tag?
The recursive function <srai> stands for "Stimulus-Response artificial intelligence" and means that the text between the tags should be sent recursively to the pattern matcher and the result interpreted. The resulting text replaces the original text in the markup.
<srai> X </srai> calls the pattern matcher recursively on X.
<sr/> recursive call to chat robot
<sr/> abbreviates <srai> <star/> </srai>
Note: what happens if X contains AIML markup? Does the interpreter do "lazy evaluation"? Look at the source code and examine the method pfkh(), the Program Formerly Known as "Hello".
- Could you explain the <srai> tag a little more?
The most common application of <srai> is "symbolic reduction" of a complex sentence form to a simpler one:
<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
so the botmaster can store most knowledge in the simplest categories:
<category>
<pattern>WHAT IS LINUX</pattern>
<template><set_it>Linux</set_it> is the best operating system.</template>
</category>
With all the "symbolic reduction" categories, the robot gives the same answer for:
"What is Linux?"
"Do you know what Linux is?"
"Define Linux"
"Alice please tell me what Linux is right now"
Sometimes the response consists of two symbolic reductions together:
<category>
<pattern>YES *</pattern>
<template><srai>YES</srai> <sr/></template>
</category>
With this category the robot will reply to all "Yes something" inputs by combining the reply to "Yes" with the reply to "something". Remember, <sr/> is an abbreviation for <srai><star/></srai>.
The <srai> tag is also the answer to the question: Can I have more than one pattern in the same category? Suppose you want the same answer for two different patterns. You might think of writing something like this:
<category>
<pattern>BYE</pattern>
<pattern>GOODBYE</pattern>
<template>See you later.</template>
</category>
Right now you can't put two patterns in one category, but with <srai> you can get the same effect:
<category>
<pattern>GOODBYE</pattern>
<template><srai>BYE</srai></template>
</category>
<category>
<pattern>BYE</pattern>
<template>See you later.</template>
</category>
If you look through the AIML files you will see many examples of <srai> mapping multiple patterns to the same reply.
- How recursive is AIML?
Understanding recursion is important to understanding AIML. "Recursion" means applying the same solution over and over again, to smaller and smaller problems, until you reduce the problem to its simplest form. AIML uses the tags <sr/> and <srai> to implement recursion. The botmaster uses these tags to tell the robot how to respond to a complex sentence by breaking it down into the responses to simpler ones.
Recursion can apply many times to a single input. Given the normalized input:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW
an AIML category with the pattern "_ RIGHT NOW" matches first, reducing the input to:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS
Another pattern ("<name/> *") reduces it to:
CAN YOU PLEASE TELL ME WHAT LINUX IS
And then:
PLEASE TELL ME WHAT LINUX IS
reduces to:
TELL ME WHAT LINUX IS
and finally to:
WHAT IS LINUX
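The chain of reductions above can be traced with a small Java sketch. It is a toy matcher under simplifying assumptions: each pattern has at most one wildcard, which must match at least one word, and rules are tried in list order rather than by AIML's real matching priority. The rules are hypothetical reconstructions of the categories named above (with "ALICE *" standing in for "<name/> *"), and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReductionSketch {
    // A toy category: a pattern with at most one wildcard ("*" or "_"),
    // plus either an <srai> target or a final reply.
    record Rule(String pattern, String srai, String reply) {}

    static final List<Rule> RULES = List.of(
        new Rule("_ RIGHT NOW", "<star/>", null),
        new Rule("ALICE *", "<star/>", null),
        new Rule("CAN YOU *", "<star/>", null),
        new Rule("PLEASE *", "<star/>", null),
        new Rule("TELL ME WHAT * IS", "WHAT IS <star/>", null),
        new Rule("WHAT IS LINUX", null, "Linux is the best operating system."));

    /** Returns the words bound to the wildcard, "" for an exact match, or null. */
    static String match(String pattern, String input) {
        String[] p = pattern.split(" ");
        String[] w = input.split(" ");
        int star = -1;
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("*") || p[i].equals("_")) star = i;
        }
        if (star < 0) return Arrays.equals(p, w) ? "" : null;
        if (w.length < p.length) return null;              // wildcard needs one word or more
        int suffix = p.length - star - 1;
        for (int i = 0; i < star; i++) {
            if (!p[i].equals(w[i])) return null;           // fixed words before the wildcard
        }
        for (int i = 1; i <= suffix; i++) {
            if (!p[p.length - i].equals(w[w.length - i])) return null; // fixed words after it
        }
        return String.join(" ", Arrays.copyOfRange(w, star, w.length - suffix));
    }

    /** Reduces the input recursively, recording each intermediate form in trace. */
    static String respond(String input, List<String> trace) {
        trace.add(input);
        if (trace.size() > 50) return "Too much recursion."; // guard against srai loops
        for (Rule r : RULES) {
            String star = match(r.pattern(), input);
            if (star == null) continue;
            if (r.reply() != null) return r.reply();
            return respond(r.srai().replace("<star/>", star), trace); // the <srai> step
        }
        return "I have no answer for that.";
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        String reply = respond("ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW", trace);
        trace.forEach(System.out::println);
        System.out.println(reply);
    }
}
```

Each recursive call handles a strictly shorter sentence here, which is what guarantees the reduction terminates; the depth guard only matters if a botmaster writes <srai> categories that loop back on each other.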
- What are "justthat" and "justbeforethat"?
<justthat/> and <justbeforethat/> are new, experimental AIML tags. The idea here is to represent more "state" in the dialogue than just "that":
Robot: justbeforethat
Client: justthat
Robot: that
Client: input
In the future we may expand AIML categories to include such "deeper context", if there is a need for it.
- How can I insert a transcript in the robot reply?
The purpose of <get_dialogue/> is to give the client a transcript of his or her conversation with ALICE. Unfortunately, this feature was advertised in a press article before we had a really efficient implementation, and the large number of dialogue requests bogged down the server. So for now <get_dialogue/> just displays a warning.
- How does the random function work?
The random function is (so far) the only AIML method with a list argument. Its purpose is random selection of one of a set of text items. In "old-style" AIML the text appendage operator "+" also served as a list-item marker. In XML style we use the HTML <li> list-item tag.
<random> <li>X1</li><li>X2</li> </random> Say one of X1 or X2 randomly
<random><li>A</li><li>B</li><li>C</li></random> Say one of A, B or C randomly
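The Java sketch below picks one <li> item at random, but only after evaluating every item, which reproduces the side effect described earlier for "DO YOU LIKE *": a <set_it> in any item updates "it" even when another item is chosen. The predicates map, the Supplier-based items and the fixed value "Dr. Who" (standing in for <person/>) are hypothetical simplifications, not program B's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Supplier;

public class RandomSketch {
    // Client predicates (such as "it") that template evaluation may change.
    static final Map<String, String> predicates = new HashMap<>();

    /** Evaluates every <li> item (side effects included), then picks one at random. */
    static String random(List<Supplier<String>> items, Random rng) {
        List<String> evaluated = new ArrayList<>();
        for (Supplier<String> item : items) {
            evaluated.add(item.get());
        }
        return evaluated.get(rng.nextInt(evaluated.size()));
    }

    public static void main(String[] args) {
        List<Supplier<String>> items = List.of(
            () -> { predicates.put("it", "Dr. Who"); return "I don't know if I like it."; },
            () -> "Some people like it.",
            () -> "How old are you?");
        System.out.println(random(items, new Random()));
        System.out.println("it = " + predicates.get("it")); // "Dr. Who", whichever item was chosen
    }
}
```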
- Can I run shell commands from AIML scripts?
Yes. Use the <system>X</system> tag to run the shell command X. The command X is assumed to produce its output in line-oriented format suitable for a BufferedReader to read line by line. A simple example of this command in an AIML script is:
<category>
<pattern>WHAT TIME IS IT</pattern>
<template>The local time is: <system>date</system></template>
</category>
The "date" command is a system command that generates a text string containing the date and time. (Note that this might not work on Windows.)
Take extreme care in using the <system> tag, because it potentially permits remote clients to run a command on your system.
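For readers curious about the Java side, here is a sketch of running a command and reading its output line by line with a BufferedReader, using the standard ProcessBuilder class. It is not program B's implementation, and the SystemTagSketch and runCommand names are hypothetical. Splitting the command on whitespace is a simplification (no shell quoting or pipes), and anything interpolated from client input into the command line carries exactly the risk warned about above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class SystemTagSketch {
    /** Runs a command line such as "date" and returns its output, read line by line. */
    static String runCommand(String commandLine) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(commandLine.trim().split("\\s+"));
        pb.redirectErrorStream(true); // fold stderr into the same stream
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return String.join("\n", lines);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("The local time is: " + runCommand("date"));
    }
}
```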
- How can I restrict remote clients from running programs on my computer?
If your reply contains the markup
<system>yourcommand <get_ip/></system>
then the robot will insert the (virtual) client IP into the command line argument for "yourcommand". Then it is up to "yourcommand" to enforce access privileges.
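As a hypothetical example, "yourcommand" could itself be a small Java program that checks the IP address appended by <get_ip/> against an allow-list before doing anything. The YourCommand class name and the addresses in ALLOWED are placeholders.

```java
import java.util.Set;

public class YourCommand {
    // Hypothetical allow-list: replace with the client addresses you trust.
    static final Set<String> ALLOWED = Set.of("127.0.0.1", "192.168.1.10");

    static boolean permitted(String clientIp) {
        return ALLOWED.contains(clientIp);
    }

    public static void main(String[] args) {
        // <system>yourcommand <get_ip/></system> puts the client IP last on the command line.
        String clientIp = args.length > 0 ? args[args.length - 1] : "";
        if (!permitted(clientIp)) {
            System.out.println("Sorry, you are not allowed to run this command.");
            return;
        }
        System.out.println("Access granted for " + clientIp + ".");
        // ... the real work of yourcommand would go here ...
    }
}
```

The check is only as trustworthy as the address program B passes in, so treat it as one layer of protection rather than a complete one.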
- Can I insert dynamic HTML into the robot reply?
If you are fortunate enough to be running lynx under Linux, the following markup is a simple way to "inline" the results of an HTTP request into the chat robot reply. Try asking ALICE "What chatterbots do you know?" and she will reply with a page of links generated by the Google search engine.
<category>
<pattern>WHAT *</pattern>
<template>
Here is the information I found:
<system>
lynx -dump -source -image_links http://www.google.com/search?q=<personf/>
</system>
</template>
</category>
- Can I include JavaScript in the robot reply?
Yes. You can include any HTML, including <script> tags. Suppose you want to "chat AND browse"; in other words, have the robot open up a new browser window when she provides a URL link. Here's a category that kicks out a piece of HTML/scripting that opens a new window and loads a given URL. This is handy for search engines or showing off one's web page.
<category>
<pattern> WHERE IS YOUR WEB SITE </pattern>
<template>
It's at "http://www.geocities.com/krisdrent/"
<script language="JavaScript">
// Go to <a href="http://www.geocities.com/krisdrent">The ALICE Connection</a>
<!--
window.open("http://www.geocities.com/krisdrent/")
-->
</script>
</template>
</category>
A few things to note about this technique:
#1. This will only work when ALICE is being talked to from a browser that runs JavaScript, i.e. it won't work in the applet. We have tested it in Netscape and MS Internet Explorer, and it works well in both.
#2. For the above reason, it is important to have some sort of explanatory statement before the scripting in case the scripting isn't supported. Besides, you want some response in your ALICE window, even if another window DOES come up.
#3. If this is viewed in a browser that doesn't understand the <script> tag, notice that this line will show up:
"// Go to <a href="http://www.geocities.com/krisdrent">The ALICE Connection</a>"
This is good, because it gives a backup for "non-scripted" browsers (the Lynx users, I guess). And remember that you have to keep the "//" in front of any non-JavaScript lines within the <script> tag.
- What is the <topic> tag?
1. <topic> allows ALICE to prefer responses that deal with the topic currently being discussed. This creates topical conversation, yet still has the ability to move from one subject to another.
2. <topic> allows ALICE to have duplicate patterns in different contexts (topics), allowing ALICE to have different responses to the same input patterns depending on the topic. For example, "overriding" the " * " pattern for different topics. (I'll give an example with this.)
3. As always, you can still use the <gettopic/> tag to refer to the topic in your output statements (templates).
4. As always, you can add topics on top of all your existing AIML to keep your bot's current personality.
- Where does the <topic> tag appear?
Topic tags are placed around one or more categories (usually many). The categories (with each respective "pattern", "that", and "template") within a set of <topic> </topic> tags are associated with the defined topic. The name of the topic is given by a "name" attribute in the beginning topic tag. Here is the full AIML format with topic:
<alice>
<topic name="THE TOPIC">
<category>
<pattern> phrase </pattern>
<that> phrase </that>
<template> phrase </template>
</category>
</topic>
</alice>
- How do I use the <topic> tag?
The concept is that the botmaster uses the <settopic> tags to set the current topic being discussed. Once the topic is set, when the client types in a statement for ALICE to find a response for, the categories defined within the <topic> tags matching the current topic will be searched first, before any of the non-topic categories or the default categories. If there is not a matching category defined in the current topic, then any categories that are not defined in topic tags are searched. As mentioned before, you can create categories with identical <pattern> phrases in different topics, each with different responses that cater to the current topic.
A proof-of-concept example:
A very useful topic entry might be the default "*" input for specific topics. If ALICE were set up on a pet store web site and a person was talking to ALICE about dogs, a useful entry might be:
<topic name="DOGS">
<category>
<pattern> * </pattern>
<template>
<random>
<li> Dogs are one of the most popular pets to have.</li>
<li> Have you ever met a Chihuahua you didn't like?</li>
<li> What else do you know about dogs? </li>
<li> Do you have any questions about dogs? </li>
</random>
</template>
</category>
<!-- more dog categories... -->
</topic>
Normally there would be many entries in a topic, but in this example, we simply entered the default "*". In this case, if the person said something that ALICE didn't have a specific programmed response for, she could still respond intelligently within the current topic. (Note: this all assumes there are existing categories that set the current topic to "DOGS".)
Also, though topics can only have one name, they can contain the wildcard characters "*" or "_" just like a pattern. As with patterns, only one wildcard character is allowed per name. This would allow topics like "CARING FOR DOGS" or "GROOMING DOGS" to also fall into the "_ DOGS" topic. As with patterns, the more specific topics gain preference over the wildcarded topic. This means that if the topic is currently "GROOMING DOGS" and yet there is no programmed response for the input in that topic, then "_ DOGS" would be checked, and then next the default categories.
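The search order described above can be sketched in Java: the current topic first, then any wildcarded topic that matches it, then categories outside any topic. This is a simplified illustration, not program B's matcher: patterns are compared exactly (with "*" as each topic's catch-all), topic wildcards are supported only in the leading position, and the TopicSketch names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TopicSketch {
    // topic name ("" = categories outside any <topic>) -> (pattern -> template)
    static final Map<String, Map<String, String>> byTopic = new LinkedHashMap<>();

    static void add(String topic, String pattern, String template) {
        byTopic.computeIfAbsent(topic, t -> new LinkedHashMap<>()).put(pattern, template);
    }

    /** Exact topic name, or a leading wildcard such as "_ DOGS" matching "GROOMING DOGS". */
    static boolean topicMatches(String name, String current) {
        if (name.equals(current)) return true;
        boolean leadingWildcard = name.startsWith("_ ") || name.startsWith("* ");
        return leadingWildcard && current.endsWith(name.substring(1));
    }

    /** Exact pattern first, then the topic's own "*" default, else null. */
    static String lookupIn(String topic, String input) {
        Map<String, String> categories = byTopic.get(topic);
        if (categories == null) return null;
        String reply = categories.get(input);
        return reply != null ? reply : categories.get("*");
    }

    static String respond(String currentTopic, String input) {
        String reply = lookupIn(currentTopic, input);          // 1. the current topic itself
        if (reply != null) return reply;
        for (String name : byTopic.keySet()) {                 // 2. wildcarded topics
            if (!name.equals(currentTopic) && topicMatches(name, currentTopic)) {
                reply = lookupIn(name, input);
                if (reply != null) return reply;
            }
        }
        return lookupIn("", input);                            // 3. categories outside topics
    }

    public static void main(String[] args) {
        add("DOGS", "*", "Do you have any questions about dogs?");
        add("_ DOGS", "*", "What else do you know about dogs?");
        add("", "HELLO", "Hi there!");
        add("", "*", "I see.");
        System.out.println(respond("DOGS", "I LIKE BONES"));          // the DOGS default
        System.out.println(respond("GROOMING DOGS", "I LIKE BONES")); // falls back to _ DOGS
        System.out.println(respond("CARS", "HELLO"));                 // no topic match: Hi there!
    }
}
```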
- What is <think>?
The simple purpose of the <think> X </think> tag pair is to evaluate the AIML expression X, but "nullify" or hide the result from the client reply. A simple example:
<category>
<pattern>I AM FEMALE</pattern>
<template>Thanks for telling me your gender. <think><set_female/></think> </template>
</category>
The <set_female/> tag normally returns a string like "she". But the <think> tag hides the text output of <set_female/> from the reply, which contains only the text:
Thanks for telling me your gender.
- What is the DTD for AIML?
Real XML fanatics know that because AIML is an XML language it should have something called a DTD (Document Type Definition). The DTD is a formal specification of the grammar for an XML language. Unless you are using special XML tools to work on your AIML or developing your own parser for AIML, you probably do not need to know much about the DTD.
This DTD reflects the current content of the *.aiml files that program B can actually parse. The DTD will become more general as the parser improves.
<!DOCTYPE alice [
<!--
# Author: John E. Laroche john@hitthebeach.com
# Version: .91
# Organization: hitthebeach.com,inc
# Date: 24 October 1999
# Revised: 6 November 1999
-->
<!ELEMENT alice ( category | topic )* >
<!ATTLIST alice >
<!ELEMENT topic ( category+ ) >
<!ATTLIST topic name CDATA #REQUIRED >
<!ELEMENT category ( pattern | template | that )* >
<!ATTLIST category >
<!ELEMENT that ( #PCDATA )* >
<!ATTLIST that >
<!ELEMENT template ( #PCDATA | beforethat | birthday | botmaster | boyfriend |
  favorite_band | favorite_book | favorite_color | favorite_food |
  favorite_movie | favorite_song | for_fun | friends | gender | get_age |
  get_dialogue | get_gender | get_he | get_ip | get_it | get_location |
  get_she | get_they | getname | gettopic | girlfriend | gossip |
  justbeforethat | justthat | kind_music | load | location | look_like |
  name | noload | person | person2 | personf | question | random |
  set_age | set_animagent | set_female | set_he | set_it | set_location |
  set_male | set_she | set_they | settopic | setname | sign | sr | srai |
  star | system | talk_about | that | think | wear )* >
<!ATTLIST template >
<!ELEMENT wear EMPTY >
<!ATTLIST wear >
<!ELEMENT talk_about EMPTY >
<!ATTLIST talk_about >
<!ELEMENT star EMPTY >
<!ATTLIST star >
<!ELEMENT srai ( #PCDATA | star | person | person2 | personf | name | get_it |
  botmaster | favorite_band | favorite_book | favorite_color | favorite_food |
  favorite_movie | favorite_song | set_it | getname | gettopic | location |
  sr )* >
<!ATTLIST srai >
<!ELEMENT settopic ( #PCDATA | star | person | person2 | personf | name |
  get_it | favorite_band | favorite_book | favorite_color | favorite_food |
  favorite_movie | favorite_song | set_it | getname | gettopic | location |
  sr )* >
<!ATTLIST settopic >
<!ELEMENT location EMPTY >
<!ATTLIST location >
<!ELEMENT gettopic EMPTY >
<!ATTLIST gettopic >
<!ELEMENT getname EMPTY >
<!ATTLIST getname >
<!ELEMENT get_gender EMPTY >
<!ATTLIST get_gender >
<!ELEMENT get_they EMPTY >
<!ATTLIST get_they >
<!ELEMENT set_it ( #PCDATA | star | person | person2 | personf | name |
  get_it | favorite_band | favorite_book | favorite_color | favorite_food |
  favorite_movie | favorite_song | that | justthat | justbeforethat |
  set_it | getname | gettopic | location | sr )* >
<!ATTLIST set_it >
<!ELEMENT set_they ( #PCDATA | favorite_band | favorite_book |
  favorite_color | favorite_food | favorite_movie | favorite_song |
  person | person2 | personf | star | that )* >
<!ATTLIST set_they >
<!ELEMENT person EMPTY >
<!ATTLIST person >
<!ELEMENT favorite_song EMPTY >
<!ATTLIST favorite_song >
<!ELEMENT favorite_movie EMPTY >
<!ATTLIST favorite_movie >
<!ELEMENT favorite_book EMPTY >
<!ATTLIST favorite_book >
<!ELEMENT get_it EMPTY >
<!ATTLIST get_it >
<!ELEMENT name EMPTY >
<!ATTLIST name >
<!ELEMENT sr EMPTY >
<!ATTLIST sr >
<!ELEMENT sign EMPTY >
<!ATTLIST sign >
<!ELEMENT setname ( #PCDATA | person | star )* >
<!ATTLIST setname >
<!ELEMENT set_age ( #PCDATA | star | person )* >
<!ATTLIST set_age >
<!ELEMENT set_she ( #PCDATA | person | star )* >
<!ATTLIST set_she >
<!ELEMENT set_location ( #PCDATA | star | person )* >
<!ATTLIST set_location >
<!ELEMENT set_male EMPTY >
<!ATTLIST set_male >
<!ELEMENT set_female EMPTY >
<!ATTLIST set_female >
<!ELEMENT set_he ( #PCDATA | star | person )* >
<!ATTLIST set_he >
<!ELEMENT random ( #PCDATA | li )* >
<!ATTLIST random >
<!ELEMENT li ( #PCDATA | beforethat | birthday | botmaster | boyfriend |
  favorite_band | favorite_book | favorite_color | favorite_food |
  favorite_movie | favorite_song | for_fun | friends | gender | get_age |
  get_dialogue | get_gender | get_he | get_ip | get_it | get_location |
  get_she | get_they | getname | gettopic | girlfriend | gossip |
  justbeforethat | justthat | kind_music | load | location | look_like |
  name | noload | person | person2 | personf | question | set_age |
  set_animagent | set_female | set_he | set_it | set_location | set_male |
  set_she | set_they | settopic | setname | sign | sr | srai | star |
  system | talk_about | that | think | wear )* >
<!ATTLIST li >
<!ELEMENT question EMPTY >
<!ATTLIST question >
<!ELEMENT look_like EMPTY >
<!ATTLIST look_like >
<!ELEMENT kind_music EMPTY >
<!ATTLIST kind_music >
<!ELEMENT justthat EMPTY >
<!ATTLIST justthat >
<!ELEMENT justbeforethat EMPTY >
<!ATTLIST justbeforethat >
<!ELEMENT girlfriend EMPTY >
<!ATTLIST girlfriend >
<!ELEMENT get_she EMPTY >
<!ATTLIST get_she >
<!ELEMENT get_ip EMPTY >
<!ATTLIST get_ip >
<!ELEMENT get_he EMPTY >
<!ATTLIST get_he >
<!ELEMENT gender EMPTY >
<!ATTLIST gender >
<!ELEMENT friends EMPTY >
<!ATTLIST friends >
<!ELEMENT for_fun EMPTY >
<!ATTLIST for_fun >
<!ELEMENT favorite_food EMPTY >
<!ATTLIST favorite_food >
<!ELEMENT favorite_color EMPTY >
<!ATTLIST favorite_color >
<!ELEMENT favorite_band EMPTY >
<!ATTLIST favorite_band >
<!ELEMENT boyfriend EMPTY >
<!ATTLIST boyfriend >
<!ELEMENT botmaster EMPTY >
<!ATTLIST botmaster >
<!ELEMENT birthday EMPTY >
<!ATTLIST birthday >
<!ELEMENT beforethat EMPTY >
<!ATTLIST beforethat >
<!ELEMENT noload EMPTY >
<!ATTLIST noload url CDATA #REQUIRED >
<!ELEMENT load EMPTY >
<!ATTLIST load filename NMTOKEN #REQUIRED >
<!ELEMENT gossip ( #PCDATA | getname | person2 | get_gender | person |
  personf | star )* >
<!ATTLIST gossip >
<!ELEMENT get_dialogue EMPTY >
<!ATTLIST get_dialogue >
<!ELEMENT pattern ( #PCDATA )* >
<!ATTLIST pattern >
<!ELEMENT get_age EMPTY >
<!ATTLIST get_age >
<!ELEMENT get_location EMPTY >
<!ATTLIST get_location >
<!ELEMENT personf EMPTY >
<!ATTLIST personf >
<!ELEMENT person2 EMPTY >
<!ATTLIST person2 >
<!ELEMENT set_animagent EMPTY >
<!ATTLIST set_animagent >
<!ELEMENT think ( #PCDATA | getname | person2 | get_gender | person |
  personf | star )* >
<!ATTLIST think >
<!ELEMENT system ( #PCDATA )* >
<!ATTLIST system >
]>
VII. Java Classes
- Do I need to know about the Java classes?
No, not unless you plan to do software development on the program B Java code. If you are an open source contributor to the ALICE project, a researcher developing new AI software, or a developer trying to link your own code to the ALICE package, then this section is for you. Otherwise, you probably don't need to know much about the Java classes in program B.
- How does program B work?
The basic loop of program B is to accept an input, either from the GUI or from the Web; to preprocess that input and segment it into sentences; and, for each sentence, to find the best match among the patterns and return the corresponding reply. Each reply is itself an AIML template, in effect a mini-program that tells program B how to construct the reply.
The algorithm is thus divided into a matching phase and a response evaluation phase. In fact these two phases interleave, because the response may invoke a recursive call to the pattern matcher with the <srai> or <sr/> tags.
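The loop and the interleaving of the two phases can be sketched in a few lines of Java. This is a toy model under stated assumptions, not program B's code: patterns are exact strings (no '*'), the only template tag handled is <srai>, and the names ToyBot, respondAll() and MAX_DEPTH are hypothetical. The depth limit is a safeguard added here; program B is not documented to have one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of program B's basic loop (hypothetical names, exact-match patterns only).
class ToyBot {
    private final Map<String, String> categories = new HashMap<String, String>();
    private static final int MAX_DEPTH = 10; // hypothetical recursion guard

    void addCategory(String pattern, String template) {
        categories.put(pattern, template);
    }

    // Segment the input into sentences and reply to each one in turn.
    List<String> respondAll(String input) {
        List<String> replies = new ArrayList<String>();
        for (String sentence : input.split("[.?!]")) {
            String norm = sentence.trim().toUpperCase();
            if (norm.length() > 0) {
                replies.add(respond(norm, 0));
            }
        }
        return replies;
    }

    // Matching phase: look up the template for a normalized sentence.
    String respond(String norm, int depth) {
        String template = categories.get(norm);
        if (template == null || depth > MAX_DEPTH) {
            return "";
        }
        return evaluate(template, depth);
    }

    // Evaluation phase: each <srai>X</srai> re-enters the matcher with X.
    private String evaluate(String template, int depth) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int open = template.indexOf("<srai>", pos);
            if (open < 0) break;
            int close = template.indexOf("</srai>", open);
            if (close < 0) break;
            out.append(template, pos, open);
            String inner = template.substring(open + "<srai>".length(), close).trim().toUpperCase();
            out.append(respond(inner, depth + 1));
            pos = close + "</srai>".length();
        }
        out.append(template.substring(pos));
        return out.toString();
    }
}
```

With a category whose pattern is AND and whose template is <srai> AND </srai> (Self-Test question 12), only the depth limit stops the recursion in this sketch.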
- What is the class structure of program B?
The core functionality of program B resides in the file Classifier.java. In that file you find a class hierarchy running from "String" to "Brain" and finally "Classifier." A branch in that hierarchy contains classes for histogramming and ranking.
The first branch of the class hierarchy derives class Brain from StringSorter, which extends StringSet. The second branch extends StringSet to StringHistogrammer and on to StringRanker. The final class, Brain, thus extends StringSet (through StringSorter) and uses StringRanker.
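For reference, the relationships above can be written as a Java skeleton with the bodies left out. Only the class names and the extends/uses relations come from this section; the field name targets is hypothetical. Because java.lang.String is final, "from String" describes a conceptual lineage rather than a Java extends clause.

```java
// Skeleton of the program B class hierarchy described above; bodies omitted.
class StringSet { }                                // set of unique strings
class StringSorter extends StringSet { }           // keeps the set in alphabetical order
class StringHistogrammer extends StringSet { }     // adds a count for each string
class StringRanker extends StringHistogrammer { }  // orders strings by count

class Brain extends StringSorter {
    // Brain "uses" StringRanker by composition, not inheritance (field name hypothetical).
    StringRanker targets = new StringRanker();
}
```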
- I tried to compile program B and got a lot of warnings.
The designers of Java and the designers of ALICE disagree on one stylistic point. Java designers believe in the "one file-one class" philosophy, at least for classes used outside their own source file. The ALICE engineers follow the opposite "one file-many classes" design principle, which allows us to group a number of logically related classes in a single file, such as Classifier.java. The Java compiler might complain about a class used outside its file, but these messages are just warnings.
If you don't want to see the compiler warnings, run the compiler with the "-nowarn" flag:
javac -nowarn *.java
- What are deprecated APIs?
One of the biggest challenges facing a Java application developer is finding a "lowest common denominator" subset of the Java language that works on all the platforms and versions out there. The Java language itself is a moving target. When Sun's Java designers upgrade the language they sometimes declare certain methods "deprecated", which means only that the designers have developed a newer, "better" method and that one day the older methods may disappear from the language. Usually, however, the latest Java releases still support all of the old deprecated methods.
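As a concrete case, java.util.Date.getYear() has been deprecated since JDK 1.1; its documentation names Calendar.get(Calendar.YEAR) - 1900 as the replacement. The class name DeprecationExample is hypothetical. Both methods below compile and return the same value, but javac prints a deprecation note for the first.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class DeprecationExample {
    // Deprecated since JDK 1.1: still compiles and runs, but javac emits a deprecation note.
    static int oldYear(Date d) {
        return d.getYear(); // years since 1900, in the local time zone
    }

    // The replacement named in the Date.getYear() documentation.
    static int newYear(Date d) {
        Calendar c = new GregorianCalendar(); // local time zone, Gregorian calendar
        c.setTime(d);
        return c.get(Calendar.YEAR) - 1900;
    }
}
```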
- What is class Globals?
Globals is the repository for all of the botmaster-selectable parameters in program B. The Globals class corresponds to the "Options" menu on the program B menu bar. Globals contains the methods toFile() and fromFile(), which make these values persistent between sessions.
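One way to get toFile()/fromFile() persistence is java.util.Properties. This is a sketch, not the Globals source: the class GlobalsSketch, its three options and their default values are hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Sketch of a Globals-style options store; option names and defaults are hypothetical.
class GlobalsSketch {
    static String botName = "ALICE";
    static int port = 2001;
    static boolean animagent = false;

    // Save the current option values so they survive a restart.
    static void toFile(String path) throws IOException {
        Properties p = new Properties();
        p.setProperty("botName", botName);
        p.setProperty("port", Integer.toString(port));
        p.setProperty("animagent", Boolean.toString(animagent));
        OutputStream out = new FileOutputStream(path);
        try {
            p.store(out, "program B options");
        } finally {
            out.close();
        }
    }

    // Restore option values, keeping the current value for any key missing from the file.
    static void fromFile(String path) throws IOException {
        Properties p = new Properties();
        InputStream in = new FileInputStream(path);
        try {
            p.load(in);
        } finally {
            in.close();
        }
        botName = p.getProperty("botName", botName);
        port = Integer.parseInt(p.getProperty("port", Integer.toString(port)));
        animagent = Boolean.parseBoolean(p.getProperty("animagent", Boolean.toString(animagent)));
    }
}
```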
- What is class StringSet?
StringSet implements the abstract concept of a set of strings, meaning that each string item appears at most once in the set: {"this","that","another"} is a set of strings; {"start","start","stop"} is not.
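A minimal sketch of such a set, assuming (as the "Lower, Lower" answer describes) that the strings are kept in an indexed vector. The hash index that makes duplicate checks fast, the class name and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a set of strings with stable integer indices (hypothetical design).
class StringSetSketch {
    protected final List<String> items = new ArrayList<String>();          // indexed storage
    protected final Map<String, Integer> index = new HashMap<String, Integer>(); // string -> position

    // Add s if absent; return its index either way.
    int add(String s) {
        Integer i = index.get(s);
        if (i != null) return i;
        items.add(s);
        index.put(s, items.size() - 1);
        return items.size() - 1;
    }

    boolean contains(String s) { return index.containsKey(s); }
    int size() { return items.size(); }
    String get(int i) { return items.get(i); }
}
```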
- What is class SortedStringSet?
SortedStringSet extends StringSet but enforces an alphabetical ordering of the strings. SortedStringSet maintains its data structure dynamically, so that the set remains sorted after each item is added.
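A sketch of the dynamic sorting using binary-search insertion. Program B's actual technique is not described here, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: keep strings sorted by inserting each new one at its binary-search position.
class SortedStringSetSketch {
    private final List<String> items = new ArrayList<String>();

    // Returns true if s was added, false if it was already present.
    boolean add(String s) {
        int pos = Collections.binarySearch(items, s);
        if (pos >= 0) return false;   // already in the set
        items.add(-pos - 1, s);       // insertion point keeps the list sorted
        return true;
    }

    List<String> items() { return Collections.unmodifiableList(items); }
}
```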
- What is class StringHistogrammer?
StringHistogrammer extends StringSet and contains a map from each string to a count, usually indicating the number of times that string appears in a sample of text. A histogram is like a "bar graph" that counts the occurrences of each item.
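A minimal histogram sketch using a HashMap from string to count; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a string histogram: each string maps to the number of times it was seen.
class StringHistogrammerSketch {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Count one more occurrence of s.
    void add(String s) {
        Integer c = counts.get(s);
        counts.put(s, c == null ? 1 : c + 1);
    }

    int count(String s) {
        Integer c = counts.get(s);
        return c == null ? 0 : c;
    }

    // Tally every whitespace-separated word in a sample of text.
    void addWords(String text) {
        for (String w : text.trim().split("\\s+")) {
            if (w.length() > 0) add(w);
        }
    }
}
```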
- What is class StringRanker?
Extending StringHistogrammer, StringRanker also sorts the strings by their histogram counts. The string with the highest count is first, the next highest second, and so on. The concept of a StringRanker should be familiar to anyone who has ranked people, companies or sports teams by any number such as sales, market capitalization, or points scored.
One application for a StringRanker is determining the "top 10 referrers" in HTTP log file analysis (see http://alicebot.org/mine.html).
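A sketch of ranking by count. To stay self-contained it folds the histogram into the same class, and it breaks ties alphabetically, which is a choice made here rather than something the text specifies. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a string ranker: a histogram plus an ordering by count.
class StringRankerSketch {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    void add(String s) {
        Integer c = counts.get(s);
        counts.put(s, c == null ? 1 : c + 1);
    }

    // Strings ordered by count, highest first; ties broken alphabetically.
    List<String> ranked() {
        List<String> keys = new ArrayList<String>(counts.keySet());
        Collections.sort(keys, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return keys;
    }

    // The n highest-ranked strings, e.g. the top 10 referrers.
    List<String> top(int n) {
        List<String> r = ranked();
        return r.subList(0, Math.min(n, r.size()));
    }
}
```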
- What is class Brain?
Brain extends StringSorter and uses StringRanker. The sorted strings in the Brain class are keys formed by combining the pattern, that, and topic strings. In the original versions of ALICE there were no "that" and no "topic" tags, so the Brain class simply mapped input patterns to output templates. With the addition of the "that" and "topic" tags we had to create the "key" from the combination of all three.
The "Target" objects in class Brain are instances of StringRanker. These structures form the basis of the classification and targeting algorithms in program B. For each category, the Targetmap contains an instance of StringRanker storing the inputs classified into that category.
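A sketch of the combined key and a simplified Targetmap. The key layout (pattern, then "<that>", then "<topic>", with "*" for a missing part) and all names are hypothetical; the text says only that the key combines the three strings. Each target here is a plain count map rather than a full StringRanker.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of Brain-style keys; key layout and names are hypothetical.
class BrainSketch {
    // Category key -> template, kept in alphabetical key order.
    final TreeMap<String, String> templates = new TreeMap<String, String>();
    // Category key -> (input -> count): a crude stand-in for the Targetmap of StringRankers.
    final Map<String, Map<String, Integer>> targets = new HashMap<String, Map<String, Integer>>();

    // Combine pattern, that and topic into one key; "*" stands for a missing that or topic.
    static String key(String pattern, String that, String topic) {
        return pattern + " <that> " + (that == null ? "*" : that)
                + " <topic> " + (topic == null ? "*" : topic);
    }

    void addCategory(String pattern, String that, String topic, String template) {
        templates.put(key(pattern, that, topic), template);
    }

    // Record that an input was classified into the category with this key.
    void target(String key, String input) {
        Map<String, Integer> counts = targets.get(key);
        if (counts == null) {
            counts = new HashMap<String, Integer>();
            targets.put(key, counts);
        }
        Integer c = counts.get(input);
        counts.put(input, c == null ? 1 : c + 1);
    }
}
```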
- What is the Responder interface?
Developed to meet the needs of multiple ALICE application scenarios, the Responder interface simplifies the code in class Classifier for natural language queries. The Responder defines an interface with three members:
log(): tells how to log the conversation.
append(): tells how to append response lines together.
post_process(): runs after the response loop finishes.
The method Classifier.multiline_response() calls all of the Responder methods. See the next question ("What is the low level interface to program B?") for more information about multiline_response().
At least five classes implement the Responder interface:
GUIResponder: the program B GUI uses this class.
HTMLResponder: a class for Web server HTML replies.
RobotResponder: the class used by RobotCommunicator.
CustomResponder: a template for more Responder classes.
AppletResponder: the Applet code uses this class.
These classes all handle special circumstances for the various Responder types. For example, HTMLResponder appends the client input to each response; GUIResponder does not. AppletResponder logs the dialogue through a network URL connection; all other classes write to a local file. RobotResponder, used by the Kid interface, suppresses all the HTML in robot replies, while HTMLResponder passes it through. HTMLResponder also runs the optional Animagent class to create the MS Agent VBScript. Text-based Responder classes wrap the text; HTMLResponder need not wrap, because the browser handles text formatting. The Responder interface addresses this wide variety of needs.
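A sketch of the interface and two contrasting implementations. The text names the three methods but not their parameters, so the signatures, the class names and the log format here are hypothetical. The HTML version echoes the client input before each reply, as HTMLResponder does; the text version does not, as with GUIResponder.

```java
// Hypothetical signatures; only the three method names come from the text.
interface ResponderSketch {
    void log(String input, String reply, String hname);       // record one exchange
    String append(String input, String reply, String soFar);  // add one reply to the output
    String post_process(String output);                       // runs after the response loop
}

// Plain-text flavour: appends replies only and does not echo the input.
class TextResponderSketch implements ResponderSketch {
    final StringBuilder logBuffer = new StringBuilder(); // stands in for a local log file

    public void log(String input, String reply, String hname) {
        logBuffer.append(hname).append(": ").append(input).append(" -> ").append(reply).append('\n');
    }

    public String append(String input, String reply, String soFar) {
        return soFar + reply + "\n";
    }

    public String post_process(String output) {
        return output;
    }
}

// HTML flavour: echoes the client input before each reply and wraps the whole page.
class HtmlResponderSketch implements ResponderSketch {
    public void log(String input, String reply, String hname) {
        // a real implementation would write to a log file here
    }

    public String append(String input, String reply, String soFar) {
        return soFar + "<b>&gt; " + input + "</b><br>\n" + reply + "<br>\n";
    }

    public String post_process(String output) {
        return "<html><body>\n" + output + "</body></html>";
    }
}
```

A production HTML responder should also escape the client input before echoing it.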
- What is the low level interface to program B?
If you require only a graphical interface, try using the class RobotCommunicator. Depending on your application, you may also try the Servlet interface or the applet. Some developers, however, may want lower-level access to the chat robot functions.
The class Classifier in Classifier.java contains the low-level methods needed to interface directly to ALICE. "Classifier" might as well be called "Bot" because, more than any other class, it handles the functions most unique to the chat robot.
The method Classifier.multiline_response() is a key entry point into the conversation engine. The "multiline" in "multiline_response" means that the input may contain multiple "lines" or sentences. The first argument, "query", is the input. The second argument, "hname", is the virtual IP address of the client. The third and last argument is an instance of a class implementing the Responder interface.
If the input string contains "Sentence1. Sentence2? Sentence3." then multiline_response might produce:
> Sentence1.
Reply1
> Sentence2
Reply2
> Sentence3
Reply3
The method multiline_response hides all of the details of sentence segmentation, responding to each input line individually, and formatting the output. In particular, multiline_response() may or may not append the VBScript needed to drive the MS Agent output, depending on whether the global MS Agent parameter is set.
The argument "hname" is a key that indexes the client's conversation. For your own interface it can probably always be "localhost" or some other constant.
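The shape of multiline_response() can be sketched as a loop over sentences that hands each reply to the responder. Everything here is a stand-in: ReplySink replaces the Responder interface (log() omitted), respond() returns a canned reply instead of matching categories, and the sentence splitter is a simple regular expression. The method is synchronized, as the Clerk answer says multiline_response is.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the Responder interface; the real signatures may differ.
interface ReplySink {
    String append(String input, String reply, String soFar);
    String post_process(String output);
}

class MultilineSketch {
    // Split after '.', '?' or '!' followed by whitespace, keeping the punctuation.
    static List<String> sentences(String query) {
        List<String> out = new ArrayList<String>();
        for (String s : query.split("(?<=[.?!])\\s+")) {
            if (s.trim().length() > 0) out.add(s.trim());
        }
        return out;
    }

    // Stand-in for Classifier.respond(norm, hname): a canned reply per sentence.
    static String respond(String norm, String hname) {
        return "You said: " + norm;
    }

    // Same argument order as multiline_response(query, hname, responder).
    static synchronized String multiline_response(String query, String hname, ReplySink responder) {
        String output = "";
        for (String sentence : sentences(query)) {
            String norm = sentence.replaceAll("[^A-Za-z0-9 ]", "").toUpperCase(); // crude normalization
            output = responder.append(sentence, respond(norm, hname), output);
        }
        return responder.post_process(output);
    }
}
```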
- Lower, Lower
If you need even lower-level access to the program B robot, you can request responses to individual sentences on a line-by-line basis. Inside multiline_response() there are calls to the Classifier.respond() method like:
String response = respond(norm, hname);
where "norm" is a normalized single-sentence input and hname is the virtual IP address of the client.
Inside respond() we find the method respondIndex(). The base class StringSet stores the strings in an indexed vector, and respondIndex() locates the index of the best matching category for the normalized input string.
The loop inside respondIndex() scans through the categories in reverse alphabetical order by key until it finds the best match. Because the "*" pattern comes first in alphabetical order and is the most general pattern, respondIndex() will return zero when no more specific category matches.
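The reverse-alphabetical scan can be sketched as follows. This sketch assumes plain Java String ordering (in which '*' sorts before letters) and the AIML rule that '*' matches one or more words. The keys are bare patterns rather than combined pattern/that/topic keys, and the names are hypothetical.

```java
import java.util.Arrays;

// Sketch of respondIndex(): scan sorted patterns from last to first; the first match wins.
class RespondIndexSketch {
    private final String[] keys; // patterns, sorted alphabetically

    RespondIndexSketch(String... patterns) {
        keys = patterns.clone();
        Arrays.sort(keys);
    }

    String key(int i) { return keys[i]; }

    int respondIndex(String norm) {
        String[] input = norm.split(" ");
        for (int i = keys.length - 1; i >= 0; i--) {
            if (matches(keys[i].split(" "), 0, input, 0)) return i;
        }
        return 0; // fall back to index 0, where "*" sorts
    }

    // '*' matches one or more words; any other pattern word must match exactly.
    static boolean matches(String[] p, int pi, String[] in, int ii) {
        if (pi == p.length) return ii == in.length;
        if (p[pi].equals("*")) {
            for (int k = ii + 1; k <= in.length; k++) {
                if (matches(p, pi + 1, in, k)) return true;
            }
            return false;
        }
        return ii < in.length && p[pi].equals(in[ii]) && matches(p, pi + 1, in, ii + 1);
    }
}
```

With the patterns from Self-Test question 11 and the input WHO IS THE FIRST PRESIDENT, the scan first rejects WHO IS THE PRESIDENT and then stops at WHO IS THE FIRST *.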
- What is class IntSet?
IntSet represents a set of integers. Were we using Java Collections this would likely be a Set, but the simple requirements of program B allow us to create a simple IntSet class. "Set" means that the object has only one occurrence of each item: {1, 4, 2, 9} is a set of integers; {1, 1, 2} is not.
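A sketch of an IntSet backed by a growable int array, without Java Collections; the class name is hypothetical. Membership is a linear scan, which is adequate for small sets.

```java
import java.util.Arrays;

// Sketch of a set of ints backed by a plain array that doubles when full.
class IntSetSketch {
    private int[] items = new int[4];
    private int size = 0;

    boolean contains(int x) {
        for (int i = 0; i < size; i++) {
            if (items[i] == x) return true;
        }
        return false;
    }

    // Add x unless it is already present; returns true if the set changed.
    boolean add(int x) {
        if (contains(x)) return false;
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        items[size++] = x;
        return true;
    }

    int size() { return size; }
}
```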
- What is class SortedIntSet?
The sorted version of IntSet, SortedIntSet maintains its elements in a sorted array. Throughout program B you will find many loops utilizing instances of SortedIntSet. These objects provide an efficient means to locate items in "rank order": the highest-numbered items first and the smallest numbers last.
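A sketch of a sorted int array with binary-search insertion and a descending ("rank order") view. The class and method names are hypothetical, and each insertion copies the array, trading insert speed for simplicity.

```java
import java.util.Arrays;

// Sketch of a sorted set of ints with a largest-first view.
class SortedIntSetSketch {
    private int[] items = new int[0];

    // Insert x at its sorted position unless it is already present.
    boolean add(int x) {
        int pos = Arrays.binarySearch(items, x);
        if (pos >= 0) return false;
        int at = -pos - 1; // insertion point
        int[] grown = new int[items.length + 1];
        System.arraycopy(items, 0, grown, 0, at);
        grown[at] = x;
        System.arraycopy(items, at, grown, at + 1, items.length - at);
        items = grown;
        return true;
    }

    // Elements in rank order: largest first, smallest last.
    int[] descending() {
        int[] out = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            out[i] = items[items.length - 1 - i];
        }
        return out;
    }
}
```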
- What is class Substituter?
The static class Substituter contains a number of similar string substitution methods useful at several points in program B.
Program B has the unique feature that it relies on HTTP GET methods, rather than POST methods, to transmit chat inputs to the robot server. The browser's form encoding inserts '+' characters in place of spaces and replaces many other characters with %XX escape sequences. The static method cleanup_http() undoes these substitutions and restores the input string to a form similar to what the client originally typed.
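In modern Java the standard java.net.URLDecoder performs this decoding. The sketch below uses it in place of program B's own substitutions and assumes the browser encoded the form as UTF-8. The class name is hypothetical, and URLDecoder throws IllegalArgumentException on a malformed % escape.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class CleanupHttpSketch {
    // Undo form encoding: '+' becomes a space and %XX escapes become characters.
    static String cleanup_http(String query) {
        try {
            return URLDecoder.decode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```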
The problem of segmenting strings into sentences is complicated by the conventional use of periods to denote abbreviations like "Dr.", "Mr.", and "St." The method deperiodize() applies a series of substitutions to eliminate the most common abbreviations. Like the other substitution methods in this class, deperiodize() has an associated static data member of type String[][] (an array of two-element {from, to} pairs), which stores the substitution map.
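A sketch of a substitution map in the {from, to} pair shape described above. The entries and names are illustrative only, not program B's actual table. String.replace() is case-sensitive, so a lowercase "st." at the end of a word such as "first." is left alone.

```java
// Sketch of a deperiodize()-style substitution table; entries are hypothetical examples.
class DeperiodizeSketch {
    static final String[][] DEPERIODIZE = {
        {"Dr.", "Dr"},
        {"Mr.", "Mr"},
        {"Mrs.", "Mrs"},
        {"St.", "St"},
    };

    // Apply each substitution in order, so abbreviation periods no longer look like sentence ends.
    static String deperiodize(String s) {
        for (String[] pair : DEPERIODIZE) {
            s = s.replace(pair[0], pair[1]);
        }
        return s;
    }
}
```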
The patterns in AIML are written in normalized form. The method normalize() converts a string to normal form by the following steps:
1. Remove all punctuation (inputs are assumed to be individual sentences)
2. Convert the string to upper case
3. Place exactly one space between words
4. Expand all contractions
5. Correct a few common spelling mistakes
6. Return a "trimmed" string
The justification for removing all punctuation from text inputs is the need to make the chatterbot compatible with speech inputs, which of course contain no punctuation.
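A sketch of the six steps in the order listed. The contraction and spelling maps are hypothetical samples; because step 1 strips apostrophes, contractions are keyed without them (DONT rather than DON'T). Punctuation removal here keeps only ASCII letters, digits and whitespace, a simplification that would also strip accented letters.

```java
import java.util.Locale;

// Sketch of normalize(); the substitution tables are hypothetical samples.
class NormalizeSketch {
    static final String[][] CONTRACTIONS = {
        {"DONT", "DO NOT"}, {"CANT", "CAN NOT"}, {"IM", "I AM"}, {"YOURE", "YOU ARE"},
    };
    static final String[][] SPELLING = {
        {"TEH", "THE"}, {"RECIEVE", "RECEIVE"},
    };

    static String normalize(String sentence) {
        String s = sentence.replaceAll("[^A-Za-z0-9\\s]", ""); // 1. remove punctuation
        s = s.toUpperCase(Locale.ENGLISH);                      // 2. upper case
        s = s.trim().replaceAll("\\s+", " ");                  // 3. one space between words
        s = replaceWords(s, CONTRACTIONS);                      // 4. expand contractions
        s = replaceWords(s, SPELLING);                          // 5. fix common misspellings
        return s.trim();                                        // 6. trimmed result
    }

    // Replace whole words only, so "IM" does not rewrite the inside of "IMAGE".
    private static String replaceWords(String s, String[][] map) {
        StringBuilder out = new StringBuilder();
        for (String w : s.split(" ")) {
            for (String[] pair : map) {
                if (w.equals(pair[0])) {
                    w = pair[1];
                    break;
                }
            }
            if (out.length() > 0) out.append(' ');
            out.append(w);
        }
        return out.toString();
    }
}
```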
- What is class Unifier?
Unification refers to the process of matching and binding. A unifier determines whether two sentences match and, if so, what any 'variables' in the pattern bind to. In the case of AIML the only matching variable is the single '*' symbol. The Unifier class contains a 'star' data member to hold the matched subsentence.
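A sketch of unification for AIML-style patterns, where '*' binds one or more words. The class name is hypothetical, and when a pattern has more than one '*' this version keeps the binding of the first one.

```java
// Sketch of a unifier: match a pattern against a sentence and record what '*' bound to.
class UnifierSketch {
    String star; // words bound to the first '*', or null

    // Return true if the pattern matches; on success, star holds the binding.
    boolean unify(String pattern, String sentence) {
        star = null;
        return match(pattern.split(" "), 0, sentence.split(" "), 0);
    }

    private boolean match(String[] p, int pi, String[] s, int si) {
        if (pi == p.length) return si == s.length;
        if (p[pi].equals("*")) {
            // '*' binds one or more words; try the shortest binding first.
            for (int k = si + 1; k <= s.length; k++) {
                if (match(p, pi + 1, s, k)) {
                    star = join(s, si, k); // the first '*' is the outermost frame, so it assigns last
                    return true;
                }
            }
            return false;
        }
        return si < s.length && p[pi].equals(s[si]) && match(p, pi + 1, s, si + 1);
    }

    private static String join(String[] words, int from, int to) {
        StringBuilder b = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) b.append(' ');
            b.append(words[i]);
        }
        return b.toString();
    }
}
```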
- What is class Parser?
The Parser class is responsible for the evaluation of AIML response templates. The method pfkh() [the Program Formerly Known as Hello] is the heart of the evaluation process. This method contains the code for recognizing and processing AIML template tags.
The Parser class does not parse all the AIML in the language definition; it parses and evaluates only the templates, at runtime. Another class, AliceReader, has the job of reading the AIML files at load time and parsing the categories into topics, patterns and templates.
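A sketch of template evaluation at reply time that handles only <star/>, <sr/>, <srai> and <think>. The callback interface and class names are hypothetical, not pfkh()'s actual structure. The sketch assumes well-formed templates and does not nest a tag inside another copy of itself.

```java
// Hypothetical callback into the pattern matcher, standing in for Classifier.respond().
interface MatcherCallback {
    String respond(String normalizedInput);
}

class TemplateEvaluatorSketch {
    // <star/> inserts the binding; <sr/> and <srai>...</srai> recurse into the matcher;
    // <think>...</think> is evaluated for side effects but produces no output.
    static String evaluate(String t, String star, MatcherCallback bot) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < t.length()) {
            if (t.startsWith("<star/>", i)) {
                out.append(star);
                i += "<star/>".length();
            } else if (t.startsWith("<sr/>", i)) {
                out.append(bot.respond(star)); // shorthand for <srai><star/></srai>
                i += "<sr/>".length();
            } else if (t.startsWith("<srai>", i)) {
                int end = t.indexOf("</srai>", i);
                String inner = evaluate(t.substring(i + "<srai>".length(), end), star, bot);
                out.append(bot.respond(inner.trim()));
                i = end + "</srai>".length();
            } else if (t.startsWith("<think>", i)) {
                int end = t.indexOf("</think>", i);
                evaluate(t.substring(i + "<think>".length(), end), star, bot); // output discarded
                i = end + "</think>".length();
            } else {
                out.append(t.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

Here <sr/> and <srai><star/></srai> yield the same text, which matches Self-Test question 7.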
- What is class AliceReader?
AliceReader is an efficient, small-footprint XML interpreter hard-coded by Kris Drent specifically for reading AIML categories. Each category has a pattern, a template, and an optional topic and thatpattern. AliceReader scans the AIML input and tries to identify these fields as quickly as possible.
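A sketch of field extraction by plain string scanning rather than a general XML parser. The names are hypothetical, and topic handling is omitted because the topic markup is not shown in this section. <that> is looked for only before <template>, so tags inside a template are not mistaken for it.

```java
import java.util.ArrayList;
import java.util.List;

// One parsed category (topic omitted in this sketch).
class CategorySketch {
    String pattern, that, template;
}

class AliceReaderSketch {
    static List<CategorySketch> read(String aiml) {
        List<CategorySketch> out = new ArrayList<CategorySketch>();
        int pos = 0;
        while (true) {
            int start = aiml.indexOf("<category>", pos);
            if (start < 0) break;
            int end = aiml.indexOf("</category>", start);
            if (end < 0) break;
            String body = aiml.substring(start + "<category>".length(), end);
            int tpl = body.indexOf("<template>");
            String head = tpl < 0 ? body : body.substring(0, tpl); // pattern and that precede the template
            CategorySketch c = new CategorySketch();
            c.pattern = between(head, "<pattern>", "</pattern>");
            c.that = between(head, "<that>", "</that>"); // optional; null if absent
            c.template = between(body, "<template>", "</template>");
            out.add(c);
            pos = end + "</category>".length();
        }
        return out;
    }

    // Trimmed text between the first open tag and the following close tag, or null.
    static String between(String s, String open, String close) {
        int a = s.indexOf(open);
        if (a < 0) return null;
        int b = s.indexOf(close, a + open.length());
        return b < 0 ? null : s.substring(a + open.length(), b).trim();
    }
}
```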
- What is class Classifier?
The class Classifier might as well be called "bot" because it contains the basic functionality of the chatterbot algorithm. See the question "How can I interface my Java program to ALICE?" for additional information about the class Classifier.
- What is class LineClassifier?
In the file Log.java you will find an interface called LineProcessor with one required method: process_line(). The LineProcessor is the abstraction of an algorithm that reads a file one line at a time, processes each line as a data record, and moves on to the next.
LineClassifier implements LineProcessor because it reads lines of text from the log file and identifies client input lines for classification. What makes classification efficient is the way LineClassifier stores the client lines in a SortedStringSet called Lines. Because the matching algorithm prioritizes the patterns alphabetically, LineClassifier can classify an element from Lines in O(1) time.
The code for LineClassifier is in Classifier.java.
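A sketch of the LineProcessor idea. The interface name and process_line() come from the text, but the parameter type, the driver, and the "Client: " log-line prefix are hypothetical. Client inputs go into a sorted set, much as LineClassifier keeps them in a SortedStringSet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.SortedSet;
import java.util.TreeSet;

// The LineProcessor abstraction: one callback per line (parameter type assumed).
interface LineProcessor {
    void process_line(String line);
}

class LineReaderSketch {
    // Feed every line of the source to the processor, one at a time.
    static void processAll(Reader source, LineProcessor p) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            p.process_line(line);
        }
    }
}

// Collects client input lines into a sorted set; the "Client: " prefix is hypothetical.
class ClientLineCollector implements LineProcessor {
    final SortedSet<String> lines = new TreeSet<String>();

    public void process_line(String line) {
        if (line.startsWith("Client: ")) {
            lines.add(line.substring("Client: ".length()).trim().toUpperCase());
        }
    }
}
```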
- What is class Dialogue?
A Dialogue (not to be confused with a Dialog class!) is the representation of the conversation between the client and the robot. The basic data structure is a pair of String arrays, client_said[] and robot_said[], that store the alternating statements of client and robot. The Dialogue also encodes the length, hostname, and start and end tag information.
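A sketch of the parallel-array layout. The array names follow the text; the growth policy, initial size and class name are hypothetical, and the start and end tag information is omitted.

```java
import java.util.Arrays;

// Sketch of a Dialogue: parallel arrays of client and robot turns.
class DialogueSketch {
    String hostname;
    String[] client_said = new String[8];
    String[] robot_said = new String[8];
    int length = 0; // number of exchanges recorded so far

    DialogueSketch(String hostname) {
        this.hostname = hostname;
    }

    // Record one exchange, growing both arrays together when full.
    void add(String client, String robot) {
        if (length == client_said.length) {
            client_said = Arrays.copyOf(client_said, length * 2);
            robot_said = Arrays.copyOf(robot_said, length * 2);
        }
        client_said[length] = client;
        robot_said[length] = robot;
        length++;
    }
}
```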
- What is class Access?
Class Access is the abstraction for log file analysis to extract dialogues. In a typical chat robot server scenario, the program records each line of client input and the robot reply in a log file. Given many simultaneous conversations, these dialogues are interleaved in the log file. The purpose of class Access is to unravel these conversations into individual threads by client.
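A sketch of unraveling an interleaved log by client. It assumes each log record has already been parsed into {hostname, client input, robot reply}; that layout and the names are hypothetical. A LinkedHashMap keeps clients in order of first appearance, and each client's exchanges stay in log order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessSketch {
    // Group interleaved {hostname, input, reply} records into one thread per client.
    static Map<String, List<String[]>> threads(List<String[]> log) {
        Map<String, List<String[]>> byClient = new LinkedHashMap<String, List<String[]>>();
        for (String[] rec : log) {
            List<String[]> thread = byClient.get(rec[0]);
            if (thread == null) {
                thread = new ArrayList<String[]>();
                byClient.put(rec[0], thread);
            }
            thread.add(new String[] { rec[1], rec[2] }); // {client input, robot reply}
        }
        return byClient;
    }
}
```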
- What is class B?
Class B is the old name for the Swing version of class Bawt; it now just extends Bawt.
- What is class Bawt?
The class Bawt is the Java application, and implements the GUI.
- What is class Blet?
The Blet class is the applet, but it is similar in many ways to the application. The applet is a stripped-down version of the program, with a simpler GUI and no "botmaster" privileges. Also, the Blet class doesn't utilize the web server, because it runs as a client-side applet.
- What is class Kid?
Class Kid is a simplified graphical user interface, "easy enough for kids" to run. Program Kid does not invoke program B, but the Kid may be started from the program B Options menu. The logic here is that kids should be able to have conversations with the chat robot, but parents may not want kids to start chat robot servers (see Appendix B. Note to Parents).
Class Kid utilizes RobotCommunicator as its interface to the chat robot.
- What is class RobotCommunicator?
If you want to customize your own application or applet, then you might find RobotCommunicator a useful class. RobotCommunicator abstracts the combination of a scrolling TextArea output display with a TextField input field.
- What is class Loader?
Both the application and the applet use the Loader class to load the AIML robot script. The Loader class extends Thread and runs "in the background" while the GUI and, in the case of the application, the web server start.
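A sketch of a background loader thread. The class name is hypothetical and the "loading" is a stand-in for reading AIML files. The caller starts the thread, goes on with other start-up work, and joins when it needs the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: load sources on a background thread while the caller carries on.
class LoaderSketch extends Thread {
    private final List<String> sources;
    final List<String> loaded = Collections.synchronizedList(new ArrayList<String>());

    LoaderSketch(List<String> sources) {
        this.sources = sources;
    }

    @Override
    public void run() {
        for (String s : sources) {
            loaded.add(s.trim()); // stand-in for reading and parsing one AIML file
        }
    }
}
```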
- What is class WebServer?
The WebServer class implements a "faux" HTTP server, i.e. a server that listens for HTTP connections and accepts them, then replies in properly formatted HTML. The connecting client, typically a browser, cannot tell the difference between the chat robot server and a full-blown web server.
However, our WebServer implements only HTTP GET methods, not POST methods. Our WebServer class does not implement many of the other features of ordinary web servers, although it is a multithreaded server.
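A sketch of GET-only handling for one request, written against a reader and writer so it can be exercised without a socket. The names are hypothetical. A multithreaded server would accept sockets in a loop and run a handler like this on a separate thread for each connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

class GetOnlyHandlerSketch {
    // Handle one HTTP request: GET is served, anything else gets 501 Not Implemented.
    static void handle(BufferedReader in, Writer out) throws IOException {
        String requestLine = in.readLine(); // e.g. "GET /?text=hi+there HTTP/1.0"
        String header;
        while ((header = in.readLine()) != null && header.length() > 0) {
            // headers are read and ignored in this sketch
        }
        String[] parts = requestLine == null ? new String[0] : requestLine.split(" ");
        if (parts.length < 2 || !parts[0].equals("GET")) {
            out.write("HTTP/1.0 501 Not Implemented\r\n\r\n");
            out.flush();
            return;
        }
        int q = parts[1].indexOf('?');
        String query = q < 0 ? "" : parts[1].substring(q + 1); // still form-encoded
        String body = "<html><body>You sent: " + escape(query) + "</body></html>";
        out.write("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n" + body);
        out.flush();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```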
- What is class Clerk?
The idea behind class Clerk is to put a "firewall" between the client and the server so that a misbehaving client can't tie up the server forever. A client connecting to a web server is like a customer appearing at a service window. When the customer appears, the ClerkManager assigns a clerk to that customer.
The customer may take a while to give the clerk his query, even after making the first connection. The clerk goes into a hibernating "wait" state and wakes up periodically to see if the client has finished the query. Some customers never complete their requests, so the manager kills the clerk after a predetermined timeout.
We made the Classifier method multiline_response "synchronized" so that requests to the robot are atomic. The clerk will wait until the client request is completed before activating the robot. Once the client completes the query, the clerk delivers it to the robot. The clerk then sends the reply back to the client and terminates itself.
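A sketch of the clerk's bounded wait. It swaps in a BlockingQueue.poll() timeout for the periodic wake-ups described above, and here the clerk gives up on its own instead of being killed by a ClerkManager. The names are hypothetical. The robot call is synchronized, as multiline_response is.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class ClerkSketch {
    // Stand-in for the robot; synchronized so requests are processed one at a time.
    static synchronized String robot(String query) {
        return "Reply to: " + query;
    }

    // Wait up to timeoutMillis for the client's query; return null if it never arrives.
    static String serve(BlockingQueue<String> fromClient, long timeoutMillis) throws InterruptedException {
        String query = fromClient.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (query == null) {
            return null; // timed out: the clerk is released without touching the robot
        }
        return robot(query);
    }
}
```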
Appendix A. Self-Test
1. What does AIML stand for?
2. What is the basic unit of AIML?
3. Name the three parts of a category.
4. True or false: <that> is optional.
5. True or false: the web server must run on port 2001.
6. Show the two forms of the recursive AIML tags.
7. True or false: <srai> <star/> </srai> is the same as <sr/>.
8. Which of the following are valid AIML patterns?
a. HELLO
b. * HELLO
c. * AND *
d. _ ALICE
e. *
f. forget *
9. What does <person> your wish is my command </person> do?
10. Given the input "Who is Dr. Wallace", which of the following patterns is the best match:
a. *
b. WHO IS *
c. WHO IS DR WALLACE
d. WHO IS DR WALLACE *
11. Given the input "Who is the first president", which of the following patterns is the best match:
a. WHO IS THE PRESIDENT
b. WHO IS THE FIRST *
c. WHO IS THE * PRESIDENT
d. WHO IS *
12. What is wrong with the following category?
<category>
<pattern>AND</pattern>
<template><srai> AND </srai></template>
</category>
13. Is anything wrong with the following category?
<category>
<pattern>TEST ME</pattern>
<template><system>dir</system></template>
</category>
14. What is the difference between <gender/> and <get_gender/>?
Advanced Java questions:
15. Sketch the class hierarchy from "String" to "Brain".
16. True or false:
a. "pfkh" stands for "The Program Formerly Known as 'Hello'"
b. pfkh() is a method in class Parser
c. pfkh() parses and evaluates all AIML expressions
d. pfkh() evaluates templates at reply time
17. multiline_response() is a member of what class?
18. The keys used to track client properties are called what?
Answers:
1. Artificial Intelligence Markup Language
2. a category
3. pattern, that, template
4. true
5. false
6. <sr/> and <srai>...</srai>
7. true
8. a. b. d. and e. are valid.
9. It produces "My wish is your command".
10. c. is an exact, atomic match.
11. b., because it is the last matching pattern in alphabetical order (a. comes later but does not match).
12. This category creates an infinite loop.
13. This category works fine, but allows remote clients to see a listing of your files.
14. <gender/> is the robot's gender; <get_gender/> is the client's.
16. true, true, false, true
17. Classifier
18. virtual IP addresses
Appendix B. Note to Parents
The ALICE "brain" does not contain any explicit or adult material. Experience has shown, however, that clients (persons communicating with the chat robot over the Internet) will invariably try to engage the robot in adult conversations. The robot is programmed to try to avoid these topics. But parents may not wish to give children access to the log files containing these mature conversations.
Children chatting with ALICE is okay; children reading the dialogues with adult clients is not.
Clients talking with chat robots on the Internet should also be aware that the program B server logs and records all conversations.
-- End of Don't Read Me © 2000 Dr. Richard S. Wallace
-- For more help join the ListBot mailing list at alicebot.listbot.com
-- Please send corrections and additions to dr.wallace@mindspring.com