Don't Read Me:

A. L. I. C. E. and AIML Documentation

Richard S. Wallace

Copyright 2000

Last Modified January 14, 2000


The primary purpose of this program -- codenamed "B" -- is to give
away the secret of ALICE chat robot development to anyone who
wants it, permitting the greatest possible dissemination, utilization
and technical improvement of the ALICE chat robot technology.
Second, program B is written in Java to support the widest
possible architectural base (a significant shortcoming of early ALICE).
B runs on a wide range of processors and systems supporting the
Java Virtual Machine. Third, program B is designed to be as simple as
possible to use, especially for nonprogrammers who will never have
to touch the Java source code.

Program B is offered free, under the terms of the GNU General Public
License, permitting rapid, organic software growth with contributions
from many programmers. We gratefully acknowledge the following
individuals who worked on program B:

Diana Andreacchio wrote a help document for Windows users that became
part of this FAQ.
Anonymous contributors created the VBScript interface to MS Agent.
David Bacon wrote the SETL compiler used to prototype the early
versions of ALICE.
Jacco Bikker wrote an AIML interpreter in C and corrected much of
the original AIML content.
Ace Craig contributed the AIML "web search" markup.
Kris Drent has provided significant contributions to the
Program B application, Applet and Servlet. He created the
<topic> tag. Kris also wrote a fast, small-footprint XML
parser tuned for AIML files.
Christian Drossmann provided the German language AIML module.
Ken Goldberg provided key ideas for dialog initiation and analysis.
Sage Greco created the ALICE robot maid 3-D graphic model and also
designed the original ALICE pyramid logo.
John Laroche updated the XML DTD for AIML.
Andrew Potgieter wrote a text wrapper and created the <think> tag.
Timothy F. Rohaly proofread the Java code and suggested numerous improvements.
Anthony Taylor provided a formatted AIML file 65Pretty.aiml.

Additional graphics were created by Sage Greco, Darren Langley and
Larry Hauser. We gratefully acknowledge these contributions and the
programmers who created them. Thanks are also due to the many people
who asked the "frequently asked questions" that form the basis of
this document.

This Java code sample is provided to you on an 'as-is' basis
without warranty or condition of any kind, either express or
implied, including, but not limited to, warranty or condition of
merchantable quality or fitness for a particular purpose.
The authors shall not be liable for any damages
arising out of your use of this code, even if they
have been advised of the possibility of such damages.

Copyright 2000, Dr. Richard S. Wallace, All Rights Reserved

Program B includes a chat robot development environment, a GUI, a server,
Servlet and downloadable Applet. This file contains a basic outline of
documentation and a tutorial. Some experience working with Java
applications and Applets is desirable, but not necessary, to
install and run this software. Program B could be your first
Java program.


I. Introduction

II. Download and Run

III. Creating Content

IV. Web Server Interface

V. Applet Interface

VI. AIML - Artificial Intelligence Markup Language

VII. Java Classes

Appendix A. Self Test

Appendix B. Note to Parents

I. Introduction

- What is the goal for AIML?

AIML (Artificial Intelligence Markup Language) is an XML specification
for programming chat robots like ALICE using program B. The emphasis
in the language design is minimalism. The simplicity of AIML makes
it easy for non-programmers, especially those who already know HTML,
to get started writing chat robots.

One ambitious goal for AIML is that, if a number of people create their own
robots, each with a unique area of expertise, program B can literally
merge-sort them together into a Superbot, automatically omitting
duplicate categories. We offer both the source code and the ALICE
content in order to encourage others to "open source" their chat
robots as well, to contribute to the Superbot. Ken Goldberg,
Christian Drossmann and others have already contributed significant
content to the ALICE chat robot.

Botmasters are also, of course, free to copy-protect private chat robots.

- Who is the botmaster?

The botmaster is you, the master of your chat robot. A botmaster runs
program B and creates or modifies a chat robot with the program's
graphical user interface (GUI). He or she is responsible for
reading the dialogues, analyzing the responses, and creating new
replies for the patterns detected by program B. Botmasters are
hobbyists, webmasters, developers, advertisers, artists, publishers,
editors, engineers, and anyone else interested in creating a personal
chat robot.

- How can I create my own chat robot?

The secret to chat bot programming, if there is one, is what Simon
Laven called "continuous beta testing". Program B runs as a server
and collects dialog on the web. The program provides the chat bot
developer with a tool called "classify dialogues", that tests the current
robot with the history of accumulated human queries. Moreover, the program
suggests new categories automatically, for the botmaster to refine.

- How difficult is it to create a chat robot?

Not difficult. If you can write HTML, you can write AIML (Artificial
Intelligence Markup Language). Here is an example of a simple but
complete chat robot in AIML:

<alice>
<category>
<pattern>*</pattern>
<template> Hello! </template>
</category>
</alice>

The tags <alice>...</alice> indicate that this markup contains a
chat robot. The <category> tag indicates an AIML category, the
basic unit of chat robot knowledge. The category has a <pattern>
and a <template>. The pattern in this case is the wild-card
symbol '*' that matches any input. The template is just the text
"Hello!" As you may have guessed, this simple chat robot just
responds by saying "Hello!" to any input.

You can get started with AIML knowing just the three tags
<category>, <pattern> and <template>; much like you may have
started with HTML knowing only <a>, <img> and <h1>.

- Does ALICE learn?

The model of learning in ALICE is called "supervised training",
because a teacher, the botmaster, always plays a crucial role.
The alternative, "unsupervised training", is complicated in
an open environment like the Web. The problem is that clients
are untrustworthy teachers, and forever try to "fool" the robot
with untrue assertions.

- Does ALICE think?

It depends on what you mean by "thinking". The most fascinating
responses from ALICE arise when she says something
unexpected, or puts together responses in ways the botmaster
never intended. For example:

Client: I bet you are gay.
ALICE: Actually I am not the gambling type. Actually as a machine
I have no need for sex.

Here the robot linked two different categories which both coincidentally
have a moral theme (gambling and sexuality). But this specific combination was
not "preprogrammed" by the botmaster.

Are these surprising responses just unintended coincidences, or do they
indicate that ALICE is thinking? Is ALICE just a gigantic stimulus-response
mechanism, or are we?

- What is the theory behind ALICE?

I used to say that there was NO theory behind ALICE: no neural network,
no knowledge representation, no search, no fuzzy logic, no genetic
algorithms, and no parsing. Then I discovered there was a theory
circulating in applied AI called "Case-Based Reasoning" or CBR that
maps well onto the ALICE algorithm. Another term, borrowed from
pattern recognition, is "nearest-neighbor classification."

The CBR "cases" are the categories in AIML. The algorithm finds the
best-matching pattern for each input. The category ties the
response template directly to the stimulus pattern. ALICE is
conceptually not much more complicated than Weizenbaum's ELIZA
chat robot; the main differences are the much larger case base and the
tools for creating new content by dialog analysis.
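The "best-matching pattern" step can be illustrated with a toy matcher. This is a minimal sketch in modern Java, not program B's actual algorithm: it assumes patterns are uppercase words with at most one "*" wild-card, that "*" matches one or more words, that an exact pattern beats any wild-card pattern, and that among wild-card patterns the one with more literal words wins. The class `TinyMatcher` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of best-match selection; not program B's matcher.
// Assumptions: at most one "*" per pattern, "*" matches one or more words,
// an exact pattern beats any wild-card pattern, and more literal words win.
final class TinyMatcher {
    private final Map<String, String> categories = new LinkedHashMap<>();

    void addCategory(String pattern, String template) {
        categories.putIfAbsent(pattern, template); // the first definition of a pattern wins
    }

    // Normalize text: uppercase, replace punctuation with spaces, split into words.
    static List<String> words(String text) {
        String cleaned = text.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9* ]", " ").trim();
        return cleaned.isEmpty() ? new ArrayList<>() : Arrays.asList(cleaned.split("\\s+"));
    }

    // Specificity score of a pattern for an input, or -1 if it does not match.
    static int score(List<String> pattern, List<String> input) {
        int star = pattern.indexOf("*");
        if (star < 0) {
            return pattern.equals(input) ? Integer.MAX_VALUE : -1; // exact match outranks all
        }
        List<String> prefix = pattern.subList(0, star);
        List<String> suffix = pattern.subList(star + 1, pattern.size());
        int literal = prefix.size() + suffix.size();
        if (input.size() < literal + 1) {
            return -1; // "*" must absorb at least one word
        }
        boolean matches = input.subList(0, prefix.size()).equals(prefix)
                && input.subList(input.size() - suffix.size(), input.size()).equals(suffix);
        return matches ? literal : -1;
    }

    // Return the template of the best-matching category (earlier wins ties), or null.
    String respond(String text) {
        List<String> input = words(text);
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> c : categories.entrySet()) {
            int s = score(words(c.getKey()), input);
            if (s > bestScore) {
                bestScore = s;
                best = c.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMatcher bot = new TinyMatcher();
        bot.addCategory("*", "Hello!");
        bot.addCategory("WHO IS *", "I don't know who that is.");
        bot.addCategory("WHO IS 007", "007 is James Bond.");
        System.out.println(bot.respond("Who is 007?"));
        System.out.println(bot.respond("Who is Bob?"));
        System.out.println(bot.respond("Good morning"));
    }
}
```

With these rules "Who is 007?" is answered by its own category, "Who is Bob?" falls back to "WHO IS *", and any other input reaches the default "*" category.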

ALICE is also part of the tradition of "minimalist", "reactive" or
"stimulus-response" robotics. Mobile robots work best, fastest and
demonstrate the most animated, realistic behavior when their sensory
inputs directly control the motor reactions. Higher-level symbolic
processing, search, and planning tend to slow down the process
too much for realistic applications, even with the fastest control
computers.

- Can probability (statistics, weights, neural networks, or fuzzy logic) improve bots?

Statistics are in fact heavily used in the ALICE server, but not in the way
you might think. ALICE uses 'Zipf Analysis' to plot the rank-frequency of
the activated categories and to reveal inputs from the log file that don't
already have specific replies, so the botmaster can focus on answering
questions people actually ask (the "Quick Targets" function).
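A rank-frequency table of the kind Zipf analysis plots can be computed by counting how often each category fires and sorting by count. The sketch below assumes you already have one activated pattern per logged input; `ZipfTable.rankFrequency` is a hypothetical helper, and `List.of` needs Java 9 or later (newer than the Java 1.1 that Bawt targets).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rank categories by how often they were activated.
final class ZipfTable {
    // Count activations per pattern and sort by descending count
    // (ties broken alphabetically so the order is deterministic).
    static List<Map.Entry<String, Integer>> rankFrequency(List<String> activations) {
        Map<String, Integer> counts = new HashMap<>();
        for (String pattern : activations) {
            counts.merge(pattern, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        List<String> activations = List.of("*", "WHO IS *", "*", "HELLO", "*", "WHO IS *");
        List<Map.Entry<String, Integer>> ranked = rankFrequency(activations);
        for (int rank = 1; rank <= ranked.size(); rank++) {
            Map.Entry<String, Integer> e = ranked.get(rank - 1);
            // Columns: rank, count, rank * count, pattern.
            System.out.println(rank + "\t" + e.getValue() + "\t" + rank * e.getValue() + "\t" + e.getKey());
        }
    }
}
```

If the log follows Zipf's law, the rank times count column stays roughly constant, and the few top-ranked categories account for most of the traffic.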

Other bot languages, notably the one used for Julia, make heavy use of
"fuzzy" or "weighted" rules. We see their problem as this: the botmaster
already has enough to worry about without having to make up "magic
numbers" for every rule. Once you get up to 10,000 categories (like ALICE)
you don't want to think about more parameters than necessary. Bot
languages with fuzzy matching rules tend to have scaling problems.

Finally, the bot replies are not as deterministic as you might think, even
without weights. Some answers rely on <random> to select one of several
possible replies. Other replies generated by unforeseen user input also
create "spontaneous" outputs that the botmaster doesn't anticipate.
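The nondeterminism that <random> introduces amounts to choosing one of several alternative replies. A minimal sketch, assuming the alternatives have already been extracted from the template into a list and that the choice is uniform; `RandomReply` is a hypothetical class. Passing in the `Random` instance lets a test seed it for repeatable output.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of picking one reply out of several alternatives.
final class RandomReply {
    private final Random rng;

    RandomReply(Random rng) {
        this.rng = rng;
    }

    // Pick one alternative uniformly at random.
    String pick(List<String> alternatives) {
        if (alternatives.isEmpty()) {
            throw new IllegalArgumentException("no alternatives to choose from");
        }
        return alternatives.get(rng.nextInt(alternatives.size()));
    }

    public static void main(String[] args) {
        RandomReply chooser = new RandomReply(new Random());
        List<String> replies = List.of("Hi there!", "Hello!", "Greetings, human.");
        for (int i = 0; i < 3; i++) {
            System.out.println(chooser.pick(replies));
        }
    }
}
```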

- Can I have a private conversation with ALICE?

The ALICE server logs and records all conversations. Even the ALICE
Applet tries to transmit conversation logs back to the originating server.
You can have a private conversation with ALICE, however, if you download
Program B to your own computer and run it there. Running on your machine,
the server stores all the conversations locally.

II. Download and Run

- How do I install ALICE?

If you purchased a commercial version of ALICE on CD ROM or
over the web, installation should be very easy. These versions
usually have their own self-extracting and install software.
You can install the ALICE program with just a mouse click and
activate it with a desktop icon.

If you bought a commercial version of ALICE with a self-installer,
you can skip this section and go on to "Creating Content".

- How do I download program B?

Create a Directory (or Folder) on your machine to download
the file. When you click on "" the browser
should ask you where you want to save the file. Select the
directory you created and save to that folder.

Once you've downloaded, you can use "unzip" to extract the files.
If you don't have this unzip command on your machine, you can get
a free one from WinZip to unzip the "" file.

If you want to get into the Java source code, you need a
Java 1.1.7 (or higher) development kit release.
Go to for a free one. The program source code
and all associated files are stored in the single "zip" file
called To extract the files use the command
"unzip" (assuming you have "unzip" on your machine).

- How do I run program B?

Use the command "java B" (or "java Bawt") to start the program.

Run program B and notice that the program creates an Edit View
text window. By default, program B loads the chat robot ALICE
(stored in B.aiml).

- What does "Send" do?

Type a text string like "hello" into the Text Area
(Edit View) and press the "Send" button. Notice that program B
replaces the text in the Edit View with a reply from the robot.

- What does "Clear" do?

To enter another robot query, clear the screen with the "Clear"
button. Enter a new string like "How are you?" and press "Send."

"Send" and "Clear" provide a simple way to communicate with the
chat bot through the Edit View. Try cutting and pasting a paragraph,
such as an e-mail message, into the Edit View and press "Send".
See how the robot would reply to your multiline message.

- What is program Bawt?

Significant demand for a version of ALICE compatible with
pre-Java 2 versions (Java 2 was formerly known as Java 1.2) prompted the
development of "", an open source Java program
for chat robot development that works with older versions of
Java and AWT. Originally program B relied on
Java 2 and Swing, but program Bawt needs only Java 1.1 and AWT.
Swing is a newer GUI package that subsumes the earlier Java
Abstract Window Toolkit (AWT).

At present class B merely extends class Bawt. Swing not

- Does program B run under Windows?

Yes. You need the Java Runtime Environment (JRE) so you can run the
"java" command from the DOS prompt. Try opening a DOS window
and type "java".

Microsoft often includes a JRE called "jview" rather than
"java". Try opening a DOS window and type "jview". On Windows 98
the JRE is usually located in c:\windows\jview.exe.

- Does program B run on a Mac?

Yes. Download the file and save it in a new folder.

Instead of the "winzip" or "unzip" utility use "Aladdin StuffIt Expander."
The newer version will unzip most Mac formats as well as .ZIP files. You can
download this at "" by searching for it by name. You can
also select the option that allows it to search only for Mac programs.
Download and install it; that should do the trick.

Apple makes its own Java Runtime Environment for the Mac called
MRJ 2.1.4. You can download it from

- Does program B run under Linux?

Yes. You need the JRE, which often comes bundled with Linux
(e.g. the Kaffe JRE with Red Hat Linux) or you can download one
from You also need X-windows to run the GUI.
Open a shell under X windows and use the command "java B".

We also recommend the IBM release of their Java 1.1.8 Java Development
Kit (JDK) and JRE for Linux. It is solid, efficient and very fast.
You can download it free at:

- Does program B run under XYZ?

Yes, if XYZ has a Java Runtime Environment 1.1.7 or higher.

- How much memory do I need to run program B?

The source code compresses to as little as half a megabyte, including
all the AIML files for nearly 16,000 categories. You may have downloaded
a file of only around 500K. Plan to use a minimum of 10 MB of hard disk space
for the download directory. The hard disk requirements include not
only the source code and Java class files, but also the dialogue files
and other temporary files created by the robot.

The RAM requirements vary depending on the size of your robot.
To run the fully loaded ALICE chat robot with 16,000 categories
you will need 64MB of memory. To do this and anything else at
the same time on your system we recommend a minimum of 96MB.
With less memory you can load a smaller robot. See the question
below "What is <load filename="X"/>?"

- How do I install ALICE on Windows?

Download program B at

Install program B in a file folder named "B".

Download the java development kit - jdk1.2.2 -

This is a large file - 20MB. If you have a slow modem it will take
a long time. Download the full single file as Windows 95 will not
concatenate the separate pieces. Install the jdk1.2 in a folder
named "jdk1.2.2". Read the installation instructions.
Download the separate docs file - 17MB. Install in the jdk1.2 folder.

Program B is a java application and must be run in DOS.

Go to Start, click Programs, click MS-DOS Prompt.

At C:\> prompt type in,
cd B

You will then get - C:\B> - This is now pointing to your source directory.
(Please note that if you have installed program B to a different
drive than C substitute that drive letter - also you should have
installed the jdk1.2.2 in the same drive as program B).
At C:\B> type jdk1.2.2\bin\java B and hit Enter. (The idea is that you first
point to your source directory, then to the jdk folder, which will
run your program, and then to the executable command, which in this case
is java B.)

What it should look like is this -
C:\B>jdk1.2.2\bin\java B

After hitting enter you should get the botmaster screen. You can
play around with it by yourself on your machine.

You can also use the applet in DOS. To do so type as follows:
C:\B>jdk1.2.2\bin\appletviewer index.html
This will try to connect you to the main server at
so have your browser fired up. Keep in mind that you will be
talking to the alicebot at that site, however, and not to the
program on your machine.

- What do you mean by the command "java B"?

This does not mean you click on an icon. If you are using Windows,
you must use a DOS window to run a Java program. Find the MS-DOS item
on your start menu or desktop and open up a DOS window. In that window, use
the DOS commands CD (change directory) to move to the "B" directory.
Then type "java B" to run the program.

If you are using Windows, then you can create a desktop icon
as a "shortcut" to a batch file. Create a batch file called
"launch.bat" in the program B directory. The file contains only
one line with the text "java B". There is an AIML icon file
included with program B called "aiml.ico". You can use this
file to add an icon to your desktop.

- I tried running "java B" and I got a "bad command or file name".

You are using a Windows/DOS setup. If "jview B" does not work either,
you may need to install Java on your computer. Go to
and pick the one for your computer (Windows 95/98 or NT).

If it still says "bad command" then possibly there is a problem with
the CLASSPATH variable in AUTOEXEC.BAT. Make sure it is set to
something like
(The single "." means the current working directory)
and make sure the PATH is set to include the java home directory:
SET PATH=c:\JDK1.2\bin;%PATH%

- How do I uninstall ALICE from my system?

If you installed ALICE on Windows with a commercial installer like
InstallShield Java Edition, then go to the start menu and
select "Control Panel". Click on the control panel item called
"Add/Remove Programs". Select ALICE from the list of installed
software and choose "Uninstall".

All the files of ALICE are stored in one directory on your computer
(or folder) usually called "B" but maybe something else depending
on the name you chose when you downloaded ALICE. In any case,
ALICE will not change or damage any other files on your system.
To remove ALICE from your computer, simply remove this folder.
Delete it, or drag it to your trash bin and select "Empty trash"
(or "Empty Recycle Bin").

If you cannot find the folder where ALICE resides, use the Finder
to locate the file called "B.aiml" on your file system. The "B.aiml"
file is in the same directory as all the ALICE files. If this file does
not exist, then ALICE is probably not installed on your computer.

Because ALICE is a platform-independent Java application, it does
not rely on the Windows Registry or other Windows-specific features.
You can assume ALICE will leave your MS Windows Registry and
other Windows system files untouched.

Conceivably if ALICE has run for a long time on your computer, and
you deliberately used the "Save Options" menu item to change the
name or location of her files to something other than the default values,
then there is a slight chance that there could be a few ALICE
files scattered around your disk. Please refer to the DISCLAIMER
at the beginning of DON'T READ ME.

- Can I create a language-specific installation?

Yes. The file "language.txt" controls the language of the
buttons and menus in the ALICE GUI. If the file is missing,
the program uses English names by default. To see an
example of a language-specific installation, copy the
file "Germanlanguage.txt" to "language.txt" and start
program B.

III. Creating Content

- How does the Personality Wizard work?

The simplest way to alter the content of the basic ALICE
robot personality is to run the Personality Wizard on
the "Options" menu (or in the Kid interface).
This wizard asks the botmaster a series
of questions to set the values of a set of robot
personality tags including its name, gender, preferences
and replies to very common questions.

The Personality Wizard does not create any new AIML
categories. The replies set the value of global tags
like <location/> and <favorite_movie/> that might be
used in many categories throughout the AIML knowledge
base. The basic set of Wizard questions are collected
in the file Personality.aiml.

Hint: If you plan to use the Applet, avoid the double-quote (")
character in the Personality Wizard.

- Can I change the name of the robot?

The AIML tag <name/> inserts the name of the Bot wherever it appears.
The default robot name is "ALICE" but you can change it in the
"Options menu". Select "Show Options" and replace "ALICE" with the
name of your bot, and then do "Save Options". Depending on your
state, you may need to restart program B.

- How can I customize my robot?

AIML provides several tags useful to quickly clone
a chat robot from ALICE with a distinct "personality":

<gender/> the robot's gender
<location/> the robot's location
<birthday/> the robot's birthday
<botmaster/> the botmaster's name

Together with the previously discussed <name/>, these
tags allow you to quickly create a clone from the ALICE
Brain with a separate identity from ALICE.

All the personality tag values can be modified through
the Personality Wizard. The tag values can also be
changed with the Options Menu in program B. Use "Show Options"
and "Save Options" to customize your chat robot.
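Personality tags such as <name/> and <gender/> behave like placeholders that are filled in from the robot's option values when a reply is produced. The sketch below assumes that simple substitution model; `PersonalityTags.expand` and the tag-to-value map are hypothetical, and tags without a value are left untouched. `Map.of` requires Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: fill personality tags such as <name/> from option values.
final class PersonalityTags {
    private static final Pattern EMPTY_TAG = Pattern.compile("<([a-z_]+)/>");

    // Replace each empty tag with its value; tags without a value are kept as-is.
    static String expand(String template, Map<String, String> values) {
        Matcher m = EMPTY_TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> brute = Map.of(
                "name", "Brute",
                "gender", "male",
                "birthday", "August 18, 1999");
        System.out.println(expand("My name is <name/>, I am <gender/>, born <birthday/>.", brute));
    }
}
```

For the Brute example this prints "My name is Brute, I am male, born August 18, 1999."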

To test the new features, we created a male robot named
Brute (because "all men are brutes") born on August 18, 1999.

- How do I know what categories to add?

After you collect some dialogue, run "Classify" and "Quick Targets".
This will tell you the most frequently asked patterns that do not
already have specific responses. The "Target" functions display new
categories with proposed patterns and template fields filled with
the name of another category. Delete the template information and fill
in a new response. You can also edit the pattern to simplify it or
generalize it with a "*" operator.

- What does "Classify" do?

The key to chat robot development is log file analysis. The program
stores client dialogues in a file called "dialog.txt" (unless you
change this default name). The "Classify" button activates a routine
that scans the dialogue file and reports how many times each
category is activated. The processing may take several minutes,
depending on the size and range of the dialogue file chosen. The
result appears as a table in the Edit View window. The program
displays the categories sorted by activation count.

The format of each output line is:

P% (Q%) T PATTERN = N1 W1 + N2 W2 + ...


P = Percent of inputs classified in this category
Q = Cumulative percent up to this category
T = Total count of inputs activating this category
Ni = number of times input Wi detected (blank if Ni = 1)
Wi = normalized input pattern activating this category
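To make the line format concrete, the sketch below builds one Classify line from per-input counts. It is an illustration under assumptions, not program B's output code: percentages are shown with one decimal place, Q is taken to include the current category, inputs are listed in first-seen order, and `ClassifyLine.format` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of one "P% (Q%) T PATTERN = N1 W1 + N2 W2 + ..." line.
final class ClassifyLine {
    // inputs: normalized input -> times seen, in first-seen order.
    // totalInputs: all classified inputs; earlierInputs: inputs counted in higher-ranked categories.
    static String format(String pattern, Map<String, Integer> inputs, int totalInputs, int earlierInputs) {
        if (totalInputs <= 0) {
            throw new IllegalArgumentException("totalInputs must be positive");
        }
        int t = 0;
        for (int n : inputs.values()) {
            t += n;
        }
        double p = 100.0 * t / totalInputs;
        double q = 100.0 * (earlierInputs + t) / totalInputs; // assumed to include this category
        StringBuilder line = new StringBuilder(
                String.format(Locale.ROOT, "%.1f%% (%.1f%%) %d %s =", p, q, t, pattern));
        String separator = " ";
        for (Map.Entry<String, Integer> e : inputs.entrySet()) {
            line.append(separator);
            if (e.getValue() > 1) {
                line.append(e.getValue()).append(' '); // Ni is left blank when Ni = 1
            }
            line.append(e.getKey());
            separator = " + ";
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> inputs = new LinkedHashMap<>();
        inputs.put("WHO IS 007", 3);
        inputs.put("WHO IS BOB", 1);
        System.out.println(format("WHO IS *", inputs, 40, 10));
    }
}
```

The example prints `10.0% (35.0%) 4 WHO IS * = 3 WHO IS 007 + WHO IS BOB`: four of 40 inputs activated "WHO IS *", and ten inputs were already counted in higher-ranked categories.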

- What does "Quick Targets" do?

After running Classify, the Quick Targets button displays a set of
new AIML categories for editing. The program uses statistics to
find new category candidates. These categories are displayed as

<pattern> NEW PATTERN </pattern> <template> OLD PATTERN </template>

where OLD PATTERN is the pattern from the original category and
NEW PATTERN is the proposed new input pattern.
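The exact statistics behind Quick Targets are not described here, so the sketch below substitutes a simple stand-in heuristic: inputs that were only caught by a wild-card pattern are counted, and the most frequent ones are proposed in the NEW PATTERN / OLD PATTERN format described above. `QuickTargetsSketch`, `Hit` and `targets` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Quick Targets (not program B's statistics).
final class QuickTargetsSketch {
    // One classified input and the pattern of the category it activated.
    static final class Hit {
        final String input;
        final String pattern;

        Hit(String input, String pattern) {
            this.input = input;
            this.pattern = pattern;
        }
    }

    // Propose up to `limit` targets: inputs caught only by a wild-card pattern, most frequent first.
    static List<String> targets(List<Hit> hits, int limit) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Map<String, String> oldPattern = new LinkedHashMap<>();
        for (Hit h : hits) {
            if (!h.pattern.contains("*")) {
                continue; // this input already has a specific reply
            }
            counts.merge(h.input, 1, Integer::sum);
            oldPattern.putIfAbsent(h.input, h.pattern);
        }
        List<String> inputs = new ArrayList<>(counts.keySet());
        // List.sort is stable, so equally frequent inputs keep first-seen order.
        inputs.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        int n = Math.max(0, Math.min(limit, inputs.size()));
        List<String> proposals = new ArrayList<>();
        for (String input : inputs.subList(0, n)) {
            proposals.add("<pattern>" + input + "</pattern> <template>" + oldPattern.get(input) + "</template>");
        }
        return proposals;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("WHO IS 007", "WHO IS *"),
                new Hit("HELLO", "HELLO"),
                new Hit("WHO BOMBED PEARL HARBOR", "WHO *"),
                new Hit("WHO IS 007", "WHO IS *"));
        targets(hits, 10).forEach(System.out::println);
    }
}
```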

The botmaster may choose to either delete or edit the new category.
If the new category is not desired, delete it by selecting the
category from the text area and "cut" the text with the "delete" key.

If the new category appears useful, edit the OLD PATTERN string to
create a new reply. Optionally, the NEW PATTERN may also be edited,
depending on how specific a pattern the botmaster desires.

When finished editing the Target categories, go to the "Botmaster"
menu and select "Add AIML". The "Add AIML" menu item will read the
text displayed in the Edit View and parse it into new AIML categories.
The botmaster may then save the updated robot with the "File/Save Robot"
or "File/Save Robot As" menu items.
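Conceptually, "Add AIML" turns the edited text back into pattern/template pairs. Program B uses its own XML parser for this; the regular expression below is only a rough stand-in that assumes well-formed, non-nested <pattern> and <template> elements. `AddAimlSketch.parse` is a hypothetical name; it uppercases patterns and keeps the first template when a pattern is repeated.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough stand-in for "Add AIML"; not program B's parser.
final class AddAimlSketch {
    // Assumes well-formed, non-nested <pattern> and <template> elements.
    private static final Pattern CATEGORY = Pattern.compile(
            "<pattern>(.*?)</pattern>\\s*<template>(.*?)</template>", Pattern.DOTALL);

    // Parse pattern/template pairs; patterns are uppercased and the first template wins.
    static Map<String, String> parse(String text) {
        Map<String, String> categories = new LinkedHashMap<>();
        Matcher m = CATEGORY.matcher(text);
        while (m.find()) {
            String pattern = m.group(1).trim().toUpperCase(Locale.ROOT);
            String template = m.group(2).trim();
            categories.putIfAbsent(pattern, template);
        }
        return categories;
    }

    public static void main(String[] args) {
        String edited = "<pattern>WHO IS 007</pattern>\n"
                + "<template><set_he>007</set_he> is James Bond, the\n"
                + "famous fictional spy from the novels of Ian Fleming.</template>";
        parse(edited).forEach((p, t) -> System.out.println(p + " => " + t));
    }
}
```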

- What does "More Targets" do?

If you don't see enough good targets with "Quick Targets", hit
"More Targets."

- What does the File menu do?

Save and load text files (transfer contents to/from text area).

Save and load robot (AIML) files.
1. By default, AIML files use the .aiml file extension.
2. The default robot file is called "B.aiml"
3. By default the robot files reside in the same directory as
program B
4. Robot files begin and end with the tags <alice> and </alice>
5. "Save Robot" overwrites the default robot file (see 2).
6. "Save Robot As" can be used to copy a robot.

Exit - exit the program

- What does the Edit menu do?

Paste contents of clipboard into the program B text area.

- What does the Options menu do?

Display and save chat robot options.
Use start and end index to select a range of lines
from the dialog file.

Toggle Beep - Make a sound when a remote client connects.
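Selecting a range of the dialog file, and picking out the client's lines within it, might look like the sketch below. It assumes a dialog file whose client lines carry a fixed marker such as "Client:" (as in the sample exchange earlier in this document) and treats the end index as exclusive; `DialogRange.clientInputs` and the marker value are hypothetical stand-ins for the StartLine, EndLine and ClientLineContains options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of StartLine/EndLine/ClientLineContains-style filtering.
final class DialogRange {
    // Return the text after `clientMarker` on each line in [start, end); end is exclusive here.
    static List<String> clientInputs(List<String> lines, int start, int end, String clientMarker) {
        List<String> inputs = new ArrayList<>();
        int from = Math.max(0, start);
        int to = Math.min(lines.size(), end);
        for (int i = from; i < to; i++) {
            String line = lines.get(i);
            int at = line.indexOf(clientMarker);
            if (at >= 0) {
                inputs.add(line.substring(at + clientMarker.length()).trim());
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<String> dialog = List.of(
                "Client: hello",
                "ALICE: Hi there!",
                "Client: Who is 007?",
                "ALICE: 007 is James Bond.");
        System.out.println(clientInputs(dialog, 0, dialog.size(), "Client:"));
    }
}
```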

- What is the Botmaster menu?

The Botmaster menu contains all the tools to help develop chat robots.

Classify - same as Classify button

Default Targets - display targets obtained from
the Default ('*') category,
in a format suitable for
quick conversion to new AIML.

Recursive Targets - display targets from "recursive" categories,
i.e. categories with a template containing
the AIML <sr/> or <srai/> functions.

Autochat - The robot chats with herself; sometimes helpful
in detecting conversation "loops".

Add AIML - Clear the screen and type a line of AIML. Selecting
"Add AIML" adds this new category to the chatbot. You can
test the bot with "Send" and "Classify", then save it with
"File/Save Robot".

In general you can add any number of new AIML categories
to the bot with "Add AIML."
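The test behind "Recursive Targets" can be sketched as a filter over the loaded categories: keep those whose template contains <sr/> or an <srai> tag. The pattern-to-template map and the name `RecursiveTargets` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical filter for "recursive" categories.
final class RecursiveTargets {
    // Patterns whose template contains <sr/> or any tag starting with <srai.
    static List<String> recursivePatterns(Map<String, String> categories) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> c : categories.entrySet()) {
            String template = c.getValue();
            if (template.contains("<sr/>") || template.contains("<srai")) {
                out.add(c.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> brain = new LinkedHashMap<>();
        brain.put("HELLO", "Hi there!");
        brain.put("HI *", "<sr/>");
        brain.put("WHAT IS YOUR NAME", "My name is <name/>.");
        brain.put("WHO ARE YOU", "<srai>WHAT IS YOUR NAME</srai>");
        System.out.println(recursivePatterns(brain));
    }
}
```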

- What does "Help" do?

The "Help" button displays a random FAQ question that ALICE
knows the answer to. You can see the answer by pressing the
"Send" button.

The Help menu provides the same function as the Help button
under the selection "Random Help Question." Select a random
Help question and obtain the reply with the "Send" button.

The Help menu also contains an item to Show All Help Questions.
This command lists all the FAQ questions the robot knows. You can
select one question by deleting the others. Obtain the
answer with the "Send" button.

The menu item "Ask Help Question" is the same as "Send". This
item asks the robot the Help question(s), and displays the reply.

The Help menu displays the entire FAQ with the "Don't Read Me"
selection. Finally, the "GNU Public License" menu item displays
the open source software license for program B.

- What is on the Help menu?

Random Help - Same as "Help" button.

Show Help Questions - Displays a list of all FAQ questions. Select
one by deleting all the others. Obtain the answer with "Send."

Don't Read Me - Display the text of this document.

GNU Public License - Display the software license.

- Do I have to use the GUI to enter AIML content?

No. You can create a new AIML file with any text editor
and add that content to an existing robot with the <load> tag.

Also, you can edit AIML categories in any text file and use
"Load Text File" and "Add AIML" to add the content.
You can also save the output of "Targets" to a file,
edit that file, and then reload it and use "Add AIML".

Finally, you can edit the robot source files directly.
(By default the robot source file is called "B.aiml").
Use a text editor, like emacs, notepad, or a word processor
in text mode, to modify the content of the AIML files.

- What are the 7 steps to creating content?

1. Run program B (ALICE Botmaster)

2. Under "Options", select "Show Options".
Find the item called "AnalysisFile=" and
change the value to the name of the dialogue
file you want to analyze. The default file
name is the same as the default log file
name, "dialog.txt".

3. Press the "Classify" button. Wait
several minutes while the program processes
the data from your log file. When finished
it will display a "brain activation" table
showing the patterns that activated each
category. (You can use "File/Save As Text File"
to save this table to a file, if you want).

4. Now press the "Quick Targets" button.
You will see a set of new categories created
by the program. These are categories with
patterns that have no specific response in the
robot brain. With these categories you have
3 choices (A, B or C):

(A) Delete the category. Many of the suggested
categories are just nonsense or garbage inputs.
Use your cursor and left mouse button to select
the categories for deletion.
The "delete" key will cut them.

(B) Edit a new template. The information you
see displayed in the <template> tags is actually
the pattern of the default category into which
this input was classified. For example you may see:

<pattern>WHO IS 007</pattern><template>WHO IS *</template>

This tells us that the robot classified the client input "WHO IS 007"
as "WHO IS *". Use the cursor and left mouse button
to cut the "WHO IS *", and replace it with a new template
of your own design:

<pattern>WHO IS 007</pattern>
<template><set_he>007</set_he> is James Bond, the
famous fictional spy from the novels of Ian Fleming.</template>

(C) Edit a new pattern. Many of the patterns
suggested by "Quick Targets" and "More Targets" are
too specific, but with a little practice you
can easily see how to generalize these suggestions
with the "*" wild-card.

For example you may see one like this:

<pattern>WHO BOMBED PEARL HARBOR</pattern>
<template>WHO *</template>

The original response was based on "WHO *", which
is too general for this topic. But the odds
are small of anyone else using this exact pattern
WHO BOMBED PEARL HARBOR when asking about the
same topic. Think about the alternative ways
of expressing the same question:
"Who attacked Pearl Harbor?", "Who invaded Pearl
Harbor?", "Who through deceit and subterfuge
carried out an unscrupulous and unprovoked surprise
attack on American forces at Pearl Harbor?"
You can cover all of these inputs by generalizing
the input pattern with the wild-card "*",
which matches any word or sequence of words:

<pattern>WHO * PEARL HARBOR</pattern>
<template>The Japanese
attacked Pearl Harbor on December 7, 1941,
"A day that will live in infamy" (FDR).
<A href="">...

Remember, the AIML pattern language allows
at most one wild-card "*" per pattern.

Of course, with choice (C) you have to
edit the template as well as the pattern.

5. When finished with editing the suggested categories,
use "Botmaster - Add AIML" to add the new AIML content.
If you made any syntax errors, you can fix them
and repeat the "Add AIML" as many times as needed.
Be sure to do a "File - Save Robot" at this point
also to back up your changes. This will save all of
your new categories in the root robot file.

6. Use "More Targets" to find more new categories
until the new suggestions are fruitless. Then, go
back and start with "Classify" again (step [3]).

7. The responses you create should be a combination
of a "conversational" response like "He is James
Bond, the famous spy" and also provide some HTML
hyperlinks where appropriate.
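Choice (C) in step 4 and the one-wild-card rule lend themselves to two small helpers: one that checks a pattern has at most one "*", and one that generalizes a specific pattern by replacing its middle words with "*". Both are hypothetical conveniences written for illustration, not program B features.

```java
import java.util.Arrays;

// Hypothetical helpers for the one-wild-card rule and for generalizing patterns.
final class PatternTools {
    // True if the pattern is non-empty and has at most one "*" word.
    static boolean isValid(String pattern) {
        String trimmed = pattern.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        long stars = Arrays.stream(trimmed.split("\\s+")).filter("*"::equals).count();
        return stars <= 1;
    }

    // Keep `head` leading and `tail` trailing words and replace the rest with "*",
    // e.g. WHO BOMBED PEARL HARBOR with head 1, tail 2 gives WHO * PEARL HARBOR.
    static String generalize(String pattern, int head, int tail) {
        String[] words = pattern.trim().split("\\s+");
        if (head < 0 || tail < 0 || head + tail >= words.length) {
            throw new IllegalArgumentException("nothing left for \"*\" to replace");
        }
        String[] out = new String[head + 1 + tail];
        System.arraycopy(words, 0, out, 0, head);
        out[head] = "*";
        System.arraycopy(words, words.length - tail, out, head + 1, tail);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String general = generalize("WHO BOMBED PEARL HARBOR", 1, 2);
        System.out.println(general + " valid=" + isValid(general));
        System.out.println("WHO * PEARL * valid=" + isValid("WHO * PEARL *"));
    }
}
```

Here `generalize("WHO BOMBED PEARL HARBOR", 1, 2)` yields `WHO * PEARL HARBOR`, which passes the one-wild-card check.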

- How can I merge two chat robots together?

There are two ways to merge robots together. First, you can
use the File menu option "merge" to directly load the contents
of another bot file. You may see a lot of "duplicate pattern
discarded" warnings but these can be ignored because the program
is simply eliminating overlapping content.

Another method is to use the <load filename="X"/> tag.
Suppose you load two or more files with the load tag,
and those files contain redundant duplicate patterns.
Which categories get the priority? The answer is: it depends
on the order of the <load> tags used to load the AIML files.
If your B.aiml contains:
<load filename="Brain.aiml"/>
<load filename="German.aiml"/>
then the categories from "Brain" have priority, and duplicates
in "German" are discarded. If the order is the opposite, German
categories have priority and Brain's duplicates are discarded.
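The priority rule can be modeled with an insertion-ordered map: files are merged in load order, and a pattern already present is kept while the later duplicate is discarded. This is a minimal sketch of the rule as described, not program B's loader; `MergeSketch.merge` is hypothetical, and the discarded list stands in for the "duplicate pattern discarded" warnings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of load-order priority; not program B's loader.
final class MergeSketch {
    // Merge pattern -> template maps in load order. The first file to define a
    // pattern keeps it; later duplicates are recorded in `discarded`.
    static Map<String, String> merge(List<Map<String, String>> filesInLoadOrder, List<String> discarded) {
        Map<String, String> brain = new LinkedHashMap<>();
        for (Map<String, String> file : filesInLoadOrder) {
            for (Map.Entry<String, String> c : file.entrySet()) {
                if (brain.putIfAbsent(c.getKey(), c.getValue()) != null) {
                    discarded.add(c.getKey()); // cf. "duplicate pattern discarded"
                }
            }
        }
        return brain;
    }

    public static void main(String[] args) {
        Map<String, String> brainAiml = Map.of("HELLO", "Hi there!", "WHO ARE YOU", "I am ALICE.");
        Map<String, String> germanAiml = Map.of("HELLO", "Hallo!", "WIE GEHT ES", "Gut, danke.");
        List<String> discarded = new ArrayList<>();
        Map<String, String> merged = merge(List.of(brainAiml, germanAiml), discarded);
        System.out.println(merged.get("HELLO") + " / discarded: " + discarded);
    }
}
```

Swapping the order of the two maps in the list makes the German reply win, mirroring the swapped <load> tags.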

- How can I create a new robot personality?

There is a lot of flexibility in robot personality design with AIML.
You can add to any of the existing AIML files, modify or delete them,
create your own, or use the GUI tools to analyze the log files
and create new categories. One simple method is to create your own
Specialty.aiml file, so that you can always install the latest copies
of the ALICE files without overwriting your own categories. Load your
Specialty.aiml first in the root AIML file (usually B.aiml) so that
its categories have priority over ALICE's.

- What are all the options for program B?

There are robot personality options, animated agent options,
log file and analysis options, and options for the web server
and for the applet. Most of the time you won't need to change
many of these values. For completeness, the entire set
breaks down into:

Robot options:

Sign - Astrological sign
Wear - clothing and apparel
ForFun - What the robot does for fun
BotFile - Root file of robot personality
BotName - Robot name
Friends - The robot's friends
LookLike - The robot appearance
Question - A random question
TalkAbout - favorite subjects
KindMusic - Favorite kind of music
BoyFriend - Does the robot have a boyfriend?
BotMaster - Robot author
BotGender - male, female or custom
GirlFriend - Does the robot have a girlfriend?
BotLocation - Robot location
BotBirthday - Robot activation date
FavoriteBook - Robot's favorite book
FavoriteFood - Robot's favorite food
FavoriteSong - Robot's favorite song
FavoriteBand - Robot's favorite band
FavoriteMovie - Robot's favorite movie
FavoriteColor - Robot's favorite color
BotBirthplace - Robot's birthplace

MS Agent options:

Animagent - true or false for activating MS Agent VB scripting
ACFURL - file or URL location of MS Agent software

Log/Analysis options:

AnalysisFile - file selected for log file analysis
LogFile - file for recording robot dialogues
ClientLineContains - a pattern identifying input lines in logfiles
RobotLineStarts - a pattern identifying robot lines in logfiles
StartLine - starting line for analysis
EndLine - ending line for log file analysis

Applet options:

AppletHost - DNS name or IP address of applet's server.
CodeBase - URL or directory of applet code.

Web server options:

ClerkTimeout - Web server option to retire waiting clerks
BrainSize - a threshold number of categories to display "loading"
Advertize - a boolean parameter to optionally display ad
Beep - Web server option to beep on client connections

Other options:

Version - read-only version number
TempFile - scratch file for temporary data

All of the options reside in the globals.txt file. While running
program B, choose "Options/Show Options" to see the
contents of the file.

- Why is the format of the options (globals.txt) so strange?

Depending on your system, you may see a globals.txt file that looks like:

Botmaster=Dr. Richard S. Wallace
Birthday=November 23, 1995
# ... and so on

The global values seem to be stored in a random order.
This is not a bug. The Globals class uses the load and store
methods of the Java Properties class to read and save the globals
to a file. You can also use # and ! to add comments to the file.

The Properties class uses a hash table representation, so it does
not preserve the order of the global variables. The program
displays and saves the global options in an arbitrary order.
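A minimal Java sketch of how java.util.Properties treats a globals.txt-style file. GlobalsSketch and loadGlobals are illustrative names, not program B's Globals class; load, getProperty and store are standard Properties methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: reading and writing globals.txt-style text with java.util.Properties. */
public class GlobalsSketch {

    static Properties loadGlobals(String text) {
        Properties globals = new Properties();
        try {
            globals.load(new StringReader(text)); // lines starting with # or ! are skipped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return globals;
    }

    public static void main(String[] args) throws IOException {
        Properties globals = loadGlobals(
            "# comment lines start with # or !\n"
          + "! this line is ignored as well\n"
          + "Botmaster=Dr. Richard S. Wallace\n"
          + "Birthday=November 23, 1995\n");
        System.out.println(globals.getProperty("Botmaster")); // Dr. Richard S. Wallace
        System.out.println(globals.size());                   // 2

        // store() writes a header comment and a date line, then the entries
        // in hash-table order, which need not match the order they were loaded in.
        StringWriter out = new StringWriter();
        globals.store(out, "program B globals");
        System.out.print(out);
    }
}
```

Because load() skips comment lines, a round trip through store() in this sketch keeps only the generated header comment; hand-written # and ! comments are not written back.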

IV. Web server interface

- How does the web server work?

By default the web server starts on port 2001. This means you can
access the web server through the URL http://localhost:2001 on
your own machine. Find out your IP address or DNS name and tell
your friends to connect to that address on port 2001.
(One way to find out your IP address is by running "netstat -n"
to view all your open TCP/IP connections).

- How can I get a "permanent" DNS name?

You can buy a fixed IP address from an ISP, but suppose
you want to run a chat robot (or other server) from your home over an
ordinary ISP connection? Or suppose you want to carry it around on
your notebook PC, and plug it in anywhere in the world?

One solution is a dynamic IP registry service from Dynip.
They offer a service that allows you to register your computer
with their server so that you always receive the same DNS name.
Every time you connect to your
ISP, Dynip automatically associates your dynamic IP address with
your permanent DNS name.

- How can I keep my computer connected all the time?

Running a web server from home can be frustrating if your ISP
automatically detects periods of "inactivity" or hangs up your
connection after a fixed interval like 12 hours. Check out the
Rascal program from Basta computing, which runs
as a watchdog to keep your Windows machine connected 24/7.

Another alternative is to use the program B applet (see "V. Applet Interface").

A third alternative is the ALICE Servlet. Some ISPs will
allow you to install a Servlet on their server.

- Does the web server have to run on port 2001?

You can change the default web server port number in the "Options" menu.

- Does program B serve HTML files?

Yes. Program B is a "faux" web server that can serve a number of file
types just like an ordinary server. Certain file names such as
"HOME.html", "header.html", and "trailer.html" are reserved by
program B, but you can create new HTML files and serve them with B.

Although program B can also serve image files and other large binary
files, we recommend creating chat robot web pages with links to images
served by other web servers or machines. Reserve your chat robot server
for the robot chat, and use ordinary web servers for images and other large files.

- What files are needed to run the program B web server?

The program B directory must contain the HTML files header.html,
trailer.html, loading.html and HOME.html. You can customize these files for
your bot, but take care with "header" and "trailer" because
program B uses these files to construct an HTML reply
(by inserting the robot reply and the text form between the
"header" and the "trailer"). Use "header" and "trailer" to
customize the robot with your own logo and links.
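The header/trailer assembly described above amounts to string concatenation. This is a hypothetical Java sketch: ReplyPageSketch, buildReply and the sample form markup are assumptions, not program B's code; only the file names header.html and trailer.html come from the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of building a reply page from header.html and trailer.html. */
public class ReplyPageSketch {

    // The robot reply and the input form are placed between header and trailer.
    static String buildReply(String header, String robotReply, String form, String trailer) {
        return header + "\n" + robotReply + "\n" + form + "\n" + trailer;
    }

    // Same, reading the two files from the program B directory on each request.
    static String buildReply(Path dir, String robotReply, String form) throws IOException {
        String header = new String(Files.readAllBytes(dir.resolve("header.html")), StandardCharsets.UTF_8);
        String trailer = new String(Files.readAllBytes(dir.resolve("trailer.html")), StandardCharsets.UTF_8);
        return buildReply(header, robotReply, form, trailer);
    }

    public static void main(String[] args) {
        String form = "<FORM METHOD=\"GET\"><INPUT NAME=\"text\"></FORM>"; // hypothetical form markup
        System.out.println(buildReply("<HTML><BODY>", "Hello, I am ALICE.", form, "</BODY></HTML>"));
    }
}
```

This is why header.html and trailer.html need care: any markup you open in the header must be closed in the trailer, since the reply is pasted in between.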

Program B needs at least one AIML file, called B.aiml
by default. The AIML file may contain <load> tags that recursively
load other AIML files; these must also be present.

The program also requires the file "globals.txt"
which it reads at start up.

The files "language.txt" and "predicates.txt" are optional.
"language.txt" controls the language of the buttons and
menu items in the program B GUI. The file "predicates.txt"
defines any custom predicates.

Program B also reads the files "gnu.txt" (the GNU General Public License)
and "dont.txt" (this file).

- Can I test the robot offline on my desktop?

Yes. You can run the program B server and connect to it with
a browser, even if your desktop computer is offline.

When working offline, it often helps to change the Internet
settings (in IE or Netscape) to "local area network".
Then your machine becomes a one-computer network. You should
be able to use IE to connect to program B with http://localhost:2001.

- How can I use the MS Agent Interface?

Select the menu item Options/Toggle MS Agent. This sets the
output HTML to a format that includes commands to run MS Agent.

The client may activate the agent if she receives a template
with the <set_animagent/> tag. The free ALICE download includes
a couple of example categories using this tag. Try asking
ALICE, "Can you speak?". In another demo ALICE imitates
the famous fictional AI HAL from 2001: A Space Odyssey.

Client: Tell me about yourself
Robot: I am an artificial linguistic entity. I was created
by Dr. Richard S. Wallace at Bethlehem, Pennsylvania,
on November 23, 1995. He taught me to sing a song.
Would you like me to sing it for you?
Client: yes
Robot: Ahem. It's called, "Daisy." (Agent sings "Daisy")

The MS Agent VB script appears as embedded HTML in the client
reply. To verify the script, use the browser "View Page Source"
menu item.

On most newer browsers, the agent software will download
automatically after the script starts. The download may take
several minutes, depending on the speed of the connection.
Clients should be warned that the download is slow. Also,
the agent software download will display one or more licenses
in Dialog boxes. You may not want to accept the terms of the
MS agent software licenses.

- Can you help me debug the animated agent?

Look at the method vbscript_html(reply) in the program B source. It
does nothing unless the global Animagent member is true. In that case,
the vbscript_html() method constructs a string from the reply that
includes an MS Agent VBScript embedded in the HTML reply.
This makes the browser load up the objects required for the agent.
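To make the debugging advice concrete, here is a hypothetical Java sketch of the behavior described for vbscript_html(reply): return the reply unchanged when Animagent is false, otherwise append a VBScript block that carries the reply text. The class name, the character object Robby, and the exact script text are placeholders, not program B's actual output; the Speak call is assumed from the MS Agent character interface, and the object itself would have to be created elsewhere on the page.

```java
/** Hypothetical sketch of the vbscript_html(reply) idea; not program B's actual code. */
public class AgentScriptSketch {

    static boolean animagent = false; // stands in for the Animagent global option

    static String vbscriptHtml(String reply) {
        if (!animagent) {
            return reply; // option off: the reply is returned unchanged
        }
        // The reply text becomes a VBScript string literal: a double quote is
        // escaped by doubling it, and line breaks are flattened to spaces.
        String literal = reply.replace("\"", "\"\"").replace("\n", " ");
        return reply + "\n"
            + "<SCRIPT LANGUAGE=\"VBScript\">\n"
            + "' placeholder: assumes an MS Agent character object named Robby is loaded\n"
            + "Robby.Speak \"" + literal + "\"\n"
            + "</SCRIPT>";
    }

    public static void main(String[] args) {
        animagent = true;
        System.out.println(vbscriptHtml("It's called \"Daisy.\""));
    }
}
```

An unescaped quote in the reply is a likely way for such a script to break, so it is worth checking in "View Page Source".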

The text reply just becomes part of the VBScript.
You may have to download and run the Robby the Robot
agent software and the text-to-speech synthesis software from
the MSDN home page.

We wish other companies were producing agent animation APIs
for free, but MS Agent seems to be about the only
thing out there now.

Join the ALICE and AIML mailing list
to see how others are working with the animated agent software.

- Can I speak to the robot with voice input?

One simple experiment that works well as a demo
involves using IBM ViaVoice (tm) speech recognition
software on a Windows platform. At the same time,
run the ALICE program B web server and activate the
MS Agent interface. The ViaVoice software allows
you to dictate into an application called VoicePad,
but not directly into the browser. You have to
use "cut" and "paste" to move your speech inputs
into the browser form for ALICE. But the net effect
is a somewhat slow voice-in voice-out conversation
with ALICE.

The ViaVoice software seems to work well with ALICE
after some training. We trained it with the file
"patterns.txt" created with the "List Patterns" command.

- How does ALICE keep track of conversations?

Originally ALICE used IP addresses to keep track of clients.
Assuming that everyone chatting with ALICE has a fixed IP
address, at least for the duration of their conversation,
this technique works successfully. Each IP address is a key
into a hashtable (or database) that stores the client's
dialogue, name, and values of pronouns and other AIML values.

Unfortunately, many clients have "dynamic IP addressing" enforced
by their ISP. AOL and MS WebTV are two notorious examples:
each successive client transaction appears to come from a different
host. For this reason, program B uses a form of "virtual IP"
addressing to track dialogues.

The form in index.html (and the ALICE home page) contains a
tag that creates a "hidden" parameter called "virtual" with
an initial value of "none." The server assigns a unique name
to the value of "virtual", which then becomes a hidden variable
in the client's HTML form. Each successive client transaction
contains this virtual IP address; the server uses it as a key
to index the conversation.
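The virtual IP scheme can be sketched in Java as follows. Everything here is hypothetical (VirtualIpSketch, the "#" suffix used to make keys unique, the hidden-field markup); it only illustrates the idea that a "none" value triggers a fresh key, which then travels back and forth in the form and indexes the stored dialogue. In the sketch the first key handed out for an address is the real IP itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "virtual IP" conversation keys; not program B's actual code. */
public class VirtualIpSketch {

    private final Map<String, List<String>> dialogues = new HashMap<>();
    private int counter = 0; // not thread-safe; a real server would synchronize

    // Returns the key for a request. A hidden "virtual" value of "none" means a new
    // client: use the real IP if it is free, otherwise make it unique with a suffix.
    String sessionKey(String virtualParam, String remoteIp) {
        if (virtualParam != null && !virtualParam.equals("none")) {
            return virtualParam; // returning client: trust the hidden form value
        }
        String key = remoteIp;
        while (dialogues.containsKey(key)) {
            key = remoteIp + "#" + (++counter);
        }
        dialogues.put(key, new ArrayList<>());
        return key;
    }

    void record(String key, String line) {
        dialogues.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
    }

    List<String> dialogue(String key) {
        return dialogues.getOrDefault(key, new ArrayList<>());
    }

    // Hidden field written back into every reply form so the next request carries the key.
    static String hiddenField(String key) {
        return "<INPUT TYPE=\"HIDDEN\" NAME=\"virtual\" VALUE=\"" + key + "\">";
    }
}
```

Because the key comes back from the form, an AOL client whose requests arrive from different proxy addresses still lands in the same conversation.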

- Can the virtual IP be the real IP?

Actually, that is the default case, when the client chats from
the same fixed IP address. The only time the virtual IP differs from
the real one is when the client is behind a dynamic firewall, as
WebTV and AOL customers are.

- Can I run the web server as a daemon process?

Yes. The program B distribution includes a class called Bterm,
which runs the web server as a console application, with no GUI.
You can redirect the output of Bterm to a log file
and start the process in the background with
"java Bterm > B.log &" (assuming a Unix shell).

- How does ALICE remember clients between sessions?

The persistence of memory in ALICE is inherited from
the Java Properties class. The program B class Classifier
saves the client name, age, location and other properties
in a set of Properties lists. These Properties inherit
the Java load and store methods. Program B uses the load
and store methods to save the client properties in a set of
files with names ip_name.txt, ip_age.txt, ip_location.txt
and so on. If these files become too large or bothersome,
there is no harm deleting or editing them, or moving them
to another directory.
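A hypothetical Java sketch of per-attribute persistence in the style described above: one Properties object per attribute, saved to ip_<attribute>.txt with the standard store() and load() methods. ClientMemorySketch and its methods are illustrative names, and the client key could be a real or virtual IP.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of per-attribute client files such as ip_name.txt and ip_age.txt. */
public class ClientMemorySketch {

    // attribute ("name", "age", ...) -> Properties mapping client key -> value
    private final Map<String, Properties> byAttribute = new HashMap<>();

    void set(String clientKey, String attribute, String value) {
        byAttribute.computeIfAbsent(attribute, a -> new Properties()).setProperty(clientKey, value);
    }

    String get(String clientKey, String attribute) {
        Properties p = byAttribute.get(attribute);
        return p == null ? null : p.getProperty(clientKey);
    }

    // Writes one file per attribute, e.g. ip_name.txt, using Properties.store().
    void save(Path dir) throws IOException {
        for (Map.Entry<String, Properties> e : byAttribute.entrySet()) {
            try (Writer w = Files.newBufferedWriter(dir.resolve("ip_" + e.getKey() + ".txt"))) {
                e.getValue().store(w, "client " + e.getKey());
            }
        }
    }

    // Reads ip_<attribute>.txt back with Properties.load(); a missing file is simply skipped.
    void load(Path dir, String attribute) throws IOException {
        Path file = dir.resolve("ip_" + attribute + ".txt");
        if (!Files.exists(file)) {
            return;
        }
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            p.load(r);
        }
        byAttribute.put(attribute, p);
    }
}
```

Skipping missing files in load() matches the advice above: deleting or moving the ip_*.txt files only makes the robot forget those clients.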

The Applet requires no memory of the client properties, because
the applet has only the one client, and in any case remains in
memory (at least for the lifetime of the client's browser cache).

V. Applet Interface

- How does the Applet work?

Program B supports the creation of both server-side and client-side
chat robots. The server runs as a thread in program B. The
client-side version is supported by the program B applet.

The Applet runs ALICE in a web browser, or with
the Java tool appletviewer. The file "index.html" contains an
example of the HTML Applet tag syntax needed to start
the Applet. The command "appletviewer index.html" will start the Applet.

You also have to create the file "index.html" and change the
default values of the parameters "codebase" and
"applethost" to serve the Applet from your location.

- How does the Applet differ from the application?

The Applet runs on the client's computer; the server runs
on your host machine. The applet has fewer privileges and
therefore a simpler user interface than the Application,
which uses menus and buttons to control server-side functions.
The Applet may reside on any web server, such as one provided
with an ISP account, but the application requires a 24/7
connection to the Web.

Internally, the primary difference between the two programs
is that the Applet handles only one client conversation,
while the application processes multiple client connections
simultaneously. The Applet also suppresses all HTML (and any
other XML) from the client response.
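A naive Java sketch of suppressing markup in the Applet's reply. The class and method names are hypothetical, and program B may do this differently; the regular expression simply deletes anything shaped like a tag.

```java
/** Hypothetical sketch of suppressing markup in an applet reply; not program B's actual code. */
public class StripMarkupSketch {

    // Deletes anything shaped like a tag (<...>), then tidies the whitespace left behind.
    // Naive: a ">" inside an attribute value or an HTML entity such as &amp; is not handled.
    static String stripMarkup(String reply) {
        return reply.replaceAll("<[^>]*>", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripMarkup(
            "He is <A HREF=\"http://example.com/bond\">James Bond</A>, the famous spy."));
        // prints: He is James Bond, the famous spy.
    }
}
```

The link text survives while the hyperlink itself is lost, which is why replies written for the web server read less richly in the Applet.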

- How do I create an Applet?

Go to the Options menu and select "Show Options." You need
to change the values of "AppletHost" and "CodeBase" to the
correct IP address and directory for your applet host.
Many people want to post the applet on their web site.
In that case, change the default IP address in "AppletHost" to
the name or IP address of the web server. Change the
directory path "/B" in "CodeBase" to your directory on
the remote server. Save the changes with "Save Options."

Select "Create Applet" from the options menu to create
the "index.html" and "Blet.aiml" files needed to run
your applet. The program displays the contents of
"index.html" in your text area.

Use a file transfer utility like FTP to upload the
class files (or jar file--see "What files do I need to
run the Applet") to your web server.

- List twelve basic Applet tips for AIML users

1. Applets are notoriously hard to debug; you are not dumb.
2. An applet can work perfectly well in Appletviewer, but
then break in the browser, for any number of reasons.
3. Let's get the terminology straight: the applet resides on
an "originating host" but runs on a "target machine".
4. The browser is very picky because of the "security
sandbox"--the browser doesn't trust Applets so they can't
open files (and obey other restrictions) on the target machine.
5. The Applet MAY open a socket connection from the
target machine to the originating host.
6. When you are debugging the applet, the target machine
might be the same as the originating host (your computer).
7. When you post your applet to a remote web server,
that server becomes the originating host.
8. You can use ftp to transfer the Applet files to the
remote web server.
9. You must transfer ALL the applet's files
to the originating host.
10. You must change the program B values of "CodeBase"
and "AppletHost" (the originating host) to the name and
location of the files on the remote server.
11. Use "Create applet" to create the "index.html" and
"Blet.aiml" files (make sure you have the latest release of program B).
12. We recommend placing all the *.class files into
a single "Blet.jar" file (see DON'T READ ME).

- Can the AppletHost use a symbolic DNS name instead of an IP number?

The answer is yes, but the numeric IP address works on more machines
than a symbolic name. A "security sandbox" prevents Applets
from interfering with local resources on your machine. One restriction
is that Applets may only open socket connections to the originating
host. When using a symbolic DNS name, the "sandbox" may not know that
two variations of a name, such as "Www.AliceBot.Org" and another
spelling of the same address, are in fact the same server. The client
might also fail to resolve the DNS name, and then the Applet will throw
a security exception.

- What files do I need to run the Applet?

You only need the Java *.class files and the *.aiml files
to run the ALICE Applet; no other files are necessary.
You can also put all the class files in a single jar
file like Blet.jar. The sample index.html provided with the ALICE
distribution uses this Blet.jar file.

Not all of the Java source files are involved in the Applet.
You can use the following command to compile all the Java source
files needed for the Applet:

javac \

Then, you can use zip (or jar) to collect the class files into
a single jar file:

zip -r Blet.jar *.class

The *.class wildcard matches all the class files you compiled.

The *.aiml files have to be on the same host that serves the Applet. An applet
can only open files on the server it originated from.

Don't forget to change the Applet host parameters in index.html when
you upload the applet to an ISP.

- Does the Applet record dialogues?

The applet tries to log conversations on the originating server,
using a cgi-bin script called "Blog". If Blog exists then
it records the dialogues in a file called "dialog.txt" (or
another name chosen on the Options menu).

The cgi-script need not actually exist, because the server
records the cgi-commands as errors in the access log.
The applet opens a URL connection to its host and
sends a log string that looks like an HTTP request; if there is no
Blog script, the HTTP server logs it as an error (with code 404). Later
on you can download the access_log and analyze it with program B.

See the method log(x) in the applet source code, which
implements the URL connection.
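A hypothetical Java sketch of the technique behind log(x): encode the dialogue line into the query string of a request for /cgi-bin/Blog and ignore the response. The names AppletLogSketch, logUrl and serverBase are assumptions; the /cgi-bin/Blog path comes from the text. An applet can only open this connection to its originating host.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;

/** Hypothetical sketch of the applet's log(x) idea; not program B's actual code. */
public class AppletLogSketch {

    // Builds e.g. http://host/cgi-bin/Blog?Client%3A+HELLO (URL-encoded log text).
    static String logUrl(String serverBase, String line) {
        try {
            return serverBase + "/cgi-bin/Blog?" + URLEncoder.encode(line, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends the request and ignores the outcome: even a 404 for a missing Blog
    // script leaves the encoded line in the server's access log.
    static void log(String serverBase, String line) {
        try {
            URLConnection c = URI.create(logUrl(serverBase, line)).toURL().openConnection();
            try (InputStream in = c.getInputStream()) {
                // nothing to read; opening the stream is what issues the request
            }
        } catch (IOException | IllegalArgumentException e) {
            // expected when Blog does not exist; the access log still records the request
        }
    }
}
```

Encoding the text matters: spaces and punctuation in a raw dialogue line would otherwise produce a malformed request line.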

- Can I analyze the dialogues collected by the Applet?

If the web server produces an access_log file, such
as /var/log/httpd/access_log, then the server records
Applet dialogue in the access_log file. You may use
ftp to download the access_log file to your machine;
then run program B to analyze it.

Go to the Options menu and find the value for "AnalysisFile".
The Classify function operates on the data in the AnalysisFile.
By default the AnalysisFile is the same as the LogFile (the
current server log file). But you can change the analysis
file to another name, such as /var/log/httpd/access_log or
just access_log.

- Can the applet record a dialog.txt file on the server?

No, because the applet cannot write the file directly on the originating host.
If your server log file /var/log/httpd/access_log is too large, you
have a couple of choices:
1. If your ISP provides a Unix shell account, use telnet to log on to it.
Use the command "grep Blog < access_log > dialog.txt" to create a smaller
file to download which contains just the lines recorded by the applet.
2. Create a CGI-BIN command called "/cgi-bin/Blog" that reads its
command-line argument and appends it to a file called "dialog.txt".
There ought to be a nice Perl script for this, or even a shell script.
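Where grep is not available, the same filter takes a few lines of Java. This is an illustrative sketch (BlogFilterSketch is a hypothetical name) that copies only the lines containing "Blog" from access_log into dialog.txt; run it wherever the log file is, for example on a shell account that has Java installed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of the "grep Blog" step in Java, for machines without grep. */
public class BlogFilterSketch {

    // Keeps only the access-log lines that mention the Blog script.
    static List<String> blogLines(List<String> accessLog) {
        return accessLog.stream().filter(line -> line.contains("Blog")).collect(Collectors.toList());
    }

    // Usage: java BlogFilterSketch [access_log] [dialog.txt]
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "access_log");
        Path out = Paths.get(args.length > 1 ? args[1] : "dialog.txt");
        // ISO-8859-1 accepts any byte, so odd characters in the log cannot abort the read.
        List<String> lines = Files.readAllLines(in, StandardCharsets.ISO_8859_1);
        Files.write(out, blogLines(lines), StandardCharsets.ISO_8859_1);
    }
}
```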

- I am still having problems with the applet

If your applet reads Blet.aiml, and your AIML files are in the directory
/alice/ of your web space, then your load statements in Blet.aiml
should look similar to this:
<load url=""/>

If this is what you have, then open up the "Java Console" window
in your browser to get whatever debugging information is coming
out. The Java console will display any error messages or
exceptions caught by program B. Please report these
errors to the ALICE and AIML mailing list.

- Can you give me any help debugging the Applet?

Debugging applets can be tricky. The same suggestion
to set IE for "local area network" might help here too.
Also the browser caches class files, so it's difficult to
know if you are testing a "fresh" copy of the applet. The
program "appletviewer" that comes with Sun Java is better
for debugging applets. Use "appletviewer index.html".

The best thing to do is join the alicebot mailing list.


- What is AIML?

The ALICE software implements AIML (Artificial Intelligence Markup
Language), a non-standard, evolving markup language for creating chat robots.
The primary design feature of AIML is minimalism. Compared with
other chat robot languages, AIML is perhaps the simplest. The
pattern matching language is very simple, for example permitting
only one wild-card ('*') match character per pattern.

AIML is an XML language, implying that it obeys certain grammatical
meta-rules. The choice of XML syntax permits integration with
other tools such as XML editors. Another motivation for XML is
its familiar look and feel, especially to people with HTML experience.

An AIML chat robot begins and ends with the <alice> and
</alice> tags respectively.

- What is XML?

David Bacon pronounces it "Eggsmell". XML is the Extensible
Markup Language. Like many "standards" in computer science, XML
is a moving target. In the simplest terms, XML is just a generalized
version of HTML. Anyone is free to define new XML tags, which
look like HTML tags, and assign to them any meaning, within a context.
AIML is an example of using the XML standard to define a specialized
language for artificial intelligence.

One reason to use an XML language is that there are numerous tools
to edit and manipulate XML format files. Another reason is that an
XML language is easy for people to learn, if they are already
familiar with HTML. Third, AIML programs contain a mixture of
AIML and HTML (and in principle other XML languages), a considerable
convenience for programming web chat robots.

A good resource for information on XML is

- What is a category?

AIML consists of a list of statements called categories. Each
category contains an input pattern and a reply template.
The syntax of an AIML category is:

<category>
<pattern> PATTERN </pattern> <template> Template </template>
</category>

or

<category>
<pattern> PATTERN </pattern>
<that> THAT </that>
<template> Template </template>
</category>

The AIML category tags are case-sensitive. Each open tag has an
associated closing tag. This syntax obviously derives from XML.

- What is a pattern?

The pattern is the "stimulus" or "input" part of the category.

The pattern is an expression in a formal language that consists of
(1) Words of natural language in UPPER CASE.
(2) The symbol * which matches any sequence of one or more words.
(3) The symbol _ which is the same as * except that it comes
after Z in lexicographic order.
(4) The markup <name/> which is replaced at robot load time
with the name of the robot.

Note there is a difference between the patterns HELLO and HELLO *.
HELLO matches only identical one-word sentences ("Hello.")
and HELLO * matches any sentence of two or more words starting
with "Hello" ("Hello how are you?").

To simplify pattern description and matching, AIML patterns allow
only one "*" per pattern. In other words, "MY NAME IS *" is a
valid pattern, but "* AND *" is not.

- What is a template?

A template is the "response" or "output" part of an AIML category.

The template is the formula for constructing the reply. The simplest
template consists of plain, unmarked text. AIML provides markup
functions to tailor the replies for each individual input and client.
The markup function <getname/> for example inserts the client's name
into the reply.

The template may call the pattern matcher recursively using the
<sr/> and <srai> tags. Many templates are simple symbolic
reductions that map one sentence form to another, for example
"Do you know what X is?" transforms to "What is X" with the category

<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/> </srai></template>

The template may also contain other embedded HTML and XML.
These embedded tags may cause the browser to play a sound,
show an image, or run an applet. There is considerable freedom
of expression in the construction of response templates. The
botmaster is encouraged to study the examples in ALICE and to
experiment with new ideas.

- What is "that"?

The keyword "that" in ALICE refers to whatever the robot said before
a user input. Conceptually the choice of "that" comes from the
observation of the role of the word "that" in dialogue fragments like:

Robot: Today is yesterday.
Client: That makes no sense.

Robot: The answer is 3.14159
Client: That is cool.

In AIML the syntax <that>...</that> permits an optional "ThatPattern"
to match the robot's "that" expression. A common example using "that"
is any yes-no question:

<pattern>YES</pattern>
<that> DO YOU LIKE MOVIES </that>
<template> What's your favorite movie? </template>

This category handles the user input "YES" and checks to see whether
the client is replying to the question "Do you like movies?".

One interesting application of "that" are the categories that
enable a robot to respond to "knock-knock" jokes:

<pattern>KNOCK KNOCK</pattern>
<template>Who's there?</template>

<pattern>*</pattern>
<that>WHO IS THERE</that>
<template><person/> Who?</template>

<pattern>*</pattern>
<that>* WHO</that>
<template>Ha ha very funny, <getname/></template>

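Client: KNOCK KNOCK
Robot: Who's there?
Client: BANANA
Robot: banana Who?
Client: KNOCK KNOCK
Robot: Who's there?
Client: BANANA
Robot: banana Who?
Client: KNOCK KNOCK
Robot: Who's there?
Client: ORANGE
Robot: orange Who?
Client: ORANGE YOU GLAD I DID NOT SAY BANANA
Robot: Ha ha very funny, Aol-person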

- How do I use "that"?

The AIML tag <that> refers to the robot's previous
reply. There are two forms of the <that> tag:
a paired form <that>...</that> appearing in a
category, and an atomic form <that/> always appearing
in a template. Often we can use <that/> to find
an opportunity to create a category with <that></that>.

One of the default replies to the input "WHY" is
"<that/>"? Why? This default produces the following
dialogue fragment:

Robot: Do not ask me any more questions please.
Client: WHY
Robot: "Do not ask me any more questions please"? Why?

The botmaster notices the fragment and creates the
new AIML category:

<pattern>WHY</pattern>
<that>DO NOT ASK ME ANY MORE QUESTIONS PLEASE</that>
<template>Because I would rather talk about you.</template>

Now the next client who asks "WHY" to the robot's
request will activate the new <that> category:

Robot: Do not ask me any more questions please.
Client: WHY
Robot: Because I would rather talk about you.

This style of conversational analysis does not
presuppose that we know when the client will
say "WHY"; rather it looks backward to identify
cases where the "WHY" appeared following one
of the robot's statements. Having identified
the conversation point, the botmaster creates
the new category.

- What is <load filename="X"/>?

The template may contain a <load/> tag to recursively load an AIML
file. The semantics of a load are the same as a merge: categories
loaded first have priority; the server eliminates categories with
duplicate patterns.

The default robot file B.aiml contains the top-level load commands.
There are several ways to "comment out" a <load> tag in order
to test your system with a smaller robot. You can change the
line reading
<load filename="Brain.aiml"/>
to
<noload filename="Brain.aiml"/>
and the AIML parser will simply ignore the non-existent "noload" tag.
- What happens to contractions and punctuation?

Program B has a class called Substituter that performs a number
of grammatical and syntactical substitutions on strings.
One task involves preprocessing sentences to remove ambiguous
punctuation to prepare the input for segmentation into individual
sentence phrases. Another task expands all contractions and
converts all letters to upper case; this process is called
normalization.
The Substituter class also performs some spelling correction.
(See also the question "What is <person/>?")

One justification for removing all punctuation from inputs
is the need to make ALICE compatible with speech input systems,
which of course do not detect punctuation (unless the speaker
utters the actual word for the punctuation mark -- "period").
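A minimal Java sketch of normalization as described above: expand contractions, drop punctuation, and convert to upper case. It is not program B's Substituter class; the four-entry contraction table is a hypothetical sample, sentence splitting is omitted, and only ASCII letters and digits survive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of input normalization; not program B's Substituter class. */
public class NormalizeSketch {

    // Illustrative contraction table only; program B's own substitution list differs.
    private static final Map<String, String> CONTRACTIONS = new HashMap<>();
    static {
        CONTRACTIONS.put("what's", "what is");
        CONTRACTIONS.put("don't", "do not");
        CONTRACTIONS.put("i'm", "i am");
        CONTRACTIONS.put("it's", "it is");
    }

    // Lower-cases, strips punctuation, expands contractions, then upper-cases.
    static String normalize(String input) {
        StringBuilder out = new StringBuilder();
        for (String token : input.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = token.replaceAll("[^a-z0-9']", ""); // keep apostrophes for the lookup
            word = CONTRACTIONS.getOrDefault(word, word);
            word = word.replace("'", "");                     // drop any leftover apostrophes
            if (!word.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(word);
            }
        }
        return out.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("What's your name?")); // WHAT IS YOUR NAME
    }
}
```

The output is in the same "normalized form" as the patterns, so "What's your name?" and "what is your name" reach the same category.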

- How are the patterns matched?

Program B stores the categories in alphabetical order by pattern.
When a client enters an input, the program scans the categories
in reverse order to find the best match. By comparing the
input with the patterns in reverse alphabetical order, the algorithm
ensures that the most specific pattern matches first. "Specific"
in this case has a formal definition, but basically it means that
the program finds the "longest" pattern matching an input.
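The reverse-order scan can be sketched in Java. This is a hypothetical illustration, not program B's matcher: it sorts patterns with String.compareTo, which happens to put "*" (character code 42) before the letters and "_" (code 95) after "Z" (code 90), and it supports at most one wildcard per pattern, matching one or more words. Character-level ordering is an approximation; program B's own comparison may differ in detail.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of single-wildcard matching with a reverse alphabetical scan. */
public class PatternMatchSketch {

    // Returns the words matched by the wildcard, "" for an exact match with no
    // wildcard, or null if the pattern does not match the (normalized) input.
    static String match(String pattern, String input) {
        String[] p = pattern.trim().split("\\s+");
        String[] in = input.trim().split("\\s+");
        int w = -1;
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("*") || p[i].equals("_")) {
                w = i;
                break;
            }
        }
        if (w < 0) {
            return Arrays.equals(p, in) ? "" : null;
        }
        int suffix = p.length - w - 1;
        if (in.length < w + suffix + 1) {
            return null; // the wildcard must cover at least one word
        }
        for (int i = 0; i < w; i++) {
            if (!p[i].equals(in[i])) return null;
        }
        for (int i = 0; i < suffix; i++) {
            if (!p[p.length - 1 - i].equals(in[in.length - 1 - i])) return null;
        }
        return String.join(" ", Arrays.copyOfRange(in, w, in.length - suffix));
    }

    // Sorts the patterns, then tries them from last to first; the first hit wins.
    static String bestMatch(Collection<String> patterns, String input) {
        for (String pat : new TreeSet<>(patterns).descendingSet()) {
            if (match(pat, input) != null) {
                return pat;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> pats = List.of("*", "WHAT *", "WHAT IS *", "HELLO", "HELLO *");
        System.out.println(bestMatch(pats, "WHAT IS AIML")); // WHAT IS *
        System.out.println(bestMatch(pats, "WHAT TIME"));    // WHAT *
        System.out.println(bestMatch(pats, "GOODBYE"));      // *
    }
}
```

Since "*" sorts first, it is tried last and only catches inputs that no other pattern matches.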

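The wild-card character "*" comes before "A" in alphabetical
order, so the general pattern "WHAT *" sorts before "WHAT IS *";
scanning in reverse, the program tries the more specific "WHAT IS *" first.
The default pattern "*" is first in alphabetical order and the
most general pattern. For convenience AIML also provides a
variation on "*" denoted "_", which comes after "Z" in alphabetical
order, so a pattern with "_" is tried before an otherwise identical
pattern that has a word in its place.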

- Do the categories need to be in alphabetical order by pattern?

No, the alphabetical order is maintained internally when the
categories load, but you can write them in any order. When you do
"Save Robot" the file may or may not be stored alphabetically.

- How are the categories stored?

If your session with program B included a "Classify" routine, then
the AIML script is stored in order of category activation rank.
In other words, program B stores
the most frequently accessed category (usually '*') first, the second
most frequently next, and so on. If a number of categories have the
same activation count, program B saves them in alphabetical order by
pattern. Hence, if the session did not include a "classify" routine,
the program stores all the categories in alphabetical order by pattern
(because they all have an activation count of zero).
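A small Java sketch of the save order just described: sort by activation count, highest first, and break ties alphabetically by pattern. ActivationOrderSketch and saveOrder are hypothetical names, and the counts in main are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the save order: most-activated first, ties alphabetical. */
public class ActivationOrderSketch {

    static List<String> saveOrder(Map<String, Integer> activations) {
        List<String> patterns = new ArrayList<>(activations.keySet());
        Comparator<String> byCountDescending =
            Comparator.comparing(activations::get, Comparator.reverseOrder());
        patterns.sort(byCountDescending.thenComparing(Comparator.naturalOrder()));
        return patterns;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("*", 120, "HELLO", 15, "WHAT IS *", 15, "BYE", 0);
        System.out.println(saveOrder(counts)); // [*, HELLO, WHAT IS *, BYE]
    }
}
```

With every count at zero, the comparator falls through to the tie-breaker and the result is plain alphabetical order, the no-Classify case.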

One reason to store the categories in order by activation is to
make the Applet interface more natural. Because the Applet interface
starts simultaneously with a thread to load the robot source file,
the Applet client can talk with the robot before all the categories
are fully loaded. Given that the interlocutor is more likely to
say something that activates a more frequently activated category,
it makes sense to transmit these categories first. Storing the
*.aiml files in order of category activation achieves the desired effect.
The Applet loads the most frequent categories first, and continues
loading in the background while the conversation begins.

- Is there a way to use the GUI interface to add one category at a time?

Yes. Do a "clear". Type in one category:

<pattern>WHO IS JOHN</pattern>
<template>He is a really smart guy.</template>

Now do an "Add AIML". If you like the result, do a "Save Robot".

If your name is not John, try replacing JOHN with
your own name. Notice that the pattern is in all upper case.
This is called "normalized form". We store patterns this way
for efficiency. The template on the other hand consists of
mixed case.

You can also create a file of AIML, do a cut & paste, and then "Add AIML"
to add more categories. Editing the source file directly is of course also
useful. If you edit the source file, select "Load Robot" to load it.

Try creating a text file with the category:

<pattern>WHO IS JOHN WANG</pattern>
<template><random>
<li>He is a really smart guy.</li>
<li><set_he>John Wang</set_he> is a great father.</li>
</random></template>

Load the file into program B with the "File/Load Text File"
menu item. Then select "Add AIML" from the Botmaster menu.

- Can I build on top of the ALICE code rather than changing it?

Absolutely. You only have to change her name, location, birthday and/or
botmaster, and put a couple of references to yourself. Then add new
categories that cover your own area of expertise or interest.

- What's new in AIML?

AIML is changing. The original tag syntax was changed
into XML. Right now, AIML uses XML syntax for the
categories, patterns, "that" patterns and templates, but inside the
<template> tag you may still see the original +~ syntax in a few places.
But this will change soon. For completeness program B
supports both versions.

The biggest change between the old AIML and the new
XML version of AIML is the elimination of the "+"
character to stand for string appendage. The change
is of little concern except in the implementation of
<random>, discussed at length below.

The old AIML used a tilde (~) markup character to
indicate the start of an AIML token. The XML version
naturally uses an SGML type tag syntax instead.

XML tags, unlike HTML, are case-sensitive. Moreover, XML syntax
requires a closing tag of some kind. The "empty" tags that contain
no text, like <A></A> in HTML, are written like <A/> in XML.
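Any conforming XML parser treats the two spellings of an empty tag identically, and rejects a closing tag whose case differs from the opening tag. A small sketch using the JDK's built-in DOM parser (javax.xml.parsers) shows both points. The class name EmptyTagDemo is only for illustration and is not part of program B.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EmptyTagDemo {
    // Parses a string of XML and returns its root element, or null if it is not well-formed.
    static Element parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors become exceptions, nothing is printed
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (SAXException notWellFormed) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element shortForm = parse("<star/>");
        Element longForm = parse("<star></star>");
        // Both spellings produce the same empty element.
        System.out.println(shortForm.getTagName() + " " + shortForm.hasChildNodes()); // star false
        System.out.println(longForm.getTagName() + " " + longForm.hasChildNodes());   // star false
        // Tag names are case-sensitive, so a close tag in a different case is rejected.
        System.out.println(parse("<star></STAR>") == null);                           // true
    }
}
```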

- What is <star>?

The <star> tag indicates the input text fragment matching the pattern '*'.
Remember, <star/> is an XML abbreviation for <star></star>.

<star/> the value of "*" matched by the pattern.
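To make "the value of *" concrete, here is a minimal Java sketch, assuming an already normalized (upper case, punctuation-free) input and a pattern containing exactly one "*". The class StarMatcher is hypothetical; program B's real matcher also deals with "_", "that" and "topic", which are not modeled here.

```java
class StarMatcher {
    // Returns the input text bound to the single '*' in a normalized pattern,
    // or null when the input does not match. '*' must cover at least one character.
    static String star(String pattern, String input) {
        int i = pattern.indexOf('*');
        if (i < 0) {
            return null; // no wildcard: nothing to bind
        }
        String prefix = pattern.substring(0, i);
        String suffix = pattern.substring(i + 1);
        boolean fits = input.length() > prefix.length() + suffix.length()
                && input.startsWith(prefix) && input.endsWith(suffix);
        if (!fits) {
            return null;
        }
        String bound = input.substring(prefix.length(), input.length() - suffix.length()).trim();
        return bound.isEmpty() ? null : bound;
    }

    public static void main(String[] args) {
        System.out.println(star("ARE YOU VERY *", "ARE YOU VERY VERY SMART")); // VERY SMART
        System.out.println(star("ARE YOU VERY *", "ARE YOU SMART"));           // null
    }
}
```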

- What is a symbolic reduction?

In general there are a lot of categories whose job is
"symbolic reduction". The category:

<category>
<pattern>ARE YOU VERY *</pattern>
<template><srai>ARE YOU <star/></srai></template>
</category>

This category [in Brain.aiml] will reduce "Are you very very smart"
to "Are you smart".

- What are the get methods?

Get methods are logically atomic tags; that is, they enclose no text
(similar to, say, <P> or <IMG> in HTML). But XML requires closing tags,
so the get methods are written in the abbreviated empty-tag form, such as <get_it/>.

The client-specific "get" methods retrieve values stored relative
to a particular client IP address. We use
hash tables to store the maps from IP to these attributes.

<get_ip/> Get the client's IP address
<getname/> client's name
<gettopic/> The "topic" of conversation
<name/> Robot's name
<location/> Robot's location
<gender/> Robot's gender
<birthday/> Robot's birthday
<that/> what robot said previously
<get_location/> the client's geographic location
<get_it/> the value of "it"
<get_they/> the value of "they"
<get_he/> the value of "he"
<get_she/> the value of "she"
<get_we/> the value of "we"
<get_gender/> a string like "she" or "he" for client gender

In XML languages there is always a tradeoff between creating attributes
and creating new tags. The get methods are really all special instances
of a more general tag, <get attribute="name"/>. For example:

<get_we/> = <get attribute="we"/>

The attributes with explicit "get" names (getname, get_it, get_we etc.)
are client-specific properties. The other attributes (e.g. <name/> and
<botmaster/>) relate to the robot.
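The following Java sketch shows one way to keep these client-specific values in hash tables keyed by IP address, as described above. The class ClientPredicates, its method names and the default values are hypothetical, not program B's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class ClientPredicates {
    // client IP -> (predicate name -> value), e.g. "10.0.0.7" -> {"it" -> "money"}
    private final Map<String, Map<String, String>> byClient = new HashMap<>();
    // value returned when a client has never set a predicate
    private final Map<String, String> defaults = new HashMap<>();

    void setDefault(String predicate, String value) {
        defaults.put(predicate, value);
    }

    // Plays the role of <set_it> X </set_it>, <setname> X </setname>, ...
    void set(String clientIp, String predicate, String value) {
        byClient.computeIfAbsent(clientIp, ip -> new HashMap<>()).put(predicate, value);
    }

    // Plays the role of <get_it/>, or of the general form <get attribute="it"/>.
    String get(String clientIp, String predicate) {
        Map<String, String> values = byClient.get(clientIp);
        if (values != null && values.containsKey(predicate)) {
            return values.get(predicate);
        }
        return defaults.getOrDefault(predicate, "");
    }

    public static void main(String[] args) {
        ClientPredicates p = new ClientPredicates();
        p.setDefault("it", "it");
        p.set("10.0.0.7", "it", "money");
        System.out.println(p.get("10.0.0.7", "it")); // money
        System.out.println(p.get("10.0.0.8", "it")); // it (a different client sees the default)
    }
}
```

Keeping one inner map per client is what lets two people talk to the same robot at once without overwriting each other's "it".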

- What are the set methods?

Set methods consist of single-tag and double-tag markup:

<set_male/> the client gender is male
<set_female/> the client gender is female
<set_animagent/> activates the animation agent.

<setname> X </setname> sets the client name to X
<settopic> X </settopic> sets the topic to X
<set_it> X </set_it> sets the value of "it" to X
<set_location> X </set_location> sets the client's location to X
<set_they> X </set_they> sets the value of "they" to X
<set_he> X </set_he> sets the value of "he" to X
<set_she> X </set_she> sets the value of "she" to X
<set_we> X </set_we> sets the value of "we" to X

<set_thought> X </set_thought> is a custom tag suggested by Andrew
Potgieter for storing a predicate for "what are you thinking about?"
See the documentation on custom tags and the predicates.txt file.

- How do I use the pronoun tags?

The guiding principle in creating content is dialogue analysis.
The principle applies to pronouns like "it", "he" and "we". This
example shows how a couple of tricks help us find places where
the pronoun tags improve the dialogue.

Client: HELLO
Robot: Hi there!
Robot: I am functioning within normal parameters.
Client: I AM BROKE
Robot: Do you mean your name is broke.
Robot: I see. Would you like to have it?
Client: YES
Robot: "Would you like to have it"? Yes?

One of the default responses to a YES pattern is
"<that/>?" Yes?

The conversation breaks down somewhat at this point, but
the robot nonetheless conveys an illusion of understanding
the client's response. With its response, the robot is really
telling the botmaster to create a new "YES" reply specific
to this situation:

<category>
<pattern>YES</pattern><that>WOULD YOU LIKE TO HAVE IT</that>
<template>Where do you usually get <get_it/>?</template>
</category>

Now the next time we run the dialogue it ends with:

Robot: I see. Would you like to have it?
Client: YES
Robot: Where do you usually get money?

There is no automated technique for finding the conversation
points where the pronoun tags improve the flow of conversation.
This is the "art" of AIML programming. The example shown here
with '"<that/>?" Yes?' exemplifies one approach to finding these
conversation way points. There are no doubt countless other
tricks like this, and the field is wide open to linguists and
writers to help us uncover them.

- I still don't get "it"

Here is another example that might help clarify
the meaning of the pronoun "it."

The category with the pattern "DO YOU LIKE *" is
a kind of default category for a whole class of
inputs matching "Do you like X?", when the
input does not match a category with a more specific
pattern like "DO YOU LIKE CATS". No matter what the
client says, we want the robot to remember that
"it" stands for "X".

Many inputs activate this default category, so
the botmaster tries to create a variety of replies
using the <random> tag twice. One interesting
side-effect of the random tag is the evaluation
of all AIML inside the <random>...</random>
expression, whether the random function selects
the list item containing the markup or not.
So if one item happens to contain a <set_it>
tag then the program will remember "it",
even if the actual random reply is a different item.
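A minimal Java sketch of this eager evaluation, modeling each <li> item as a Supplier that may have side effects. The class EagerRandom and the map standing in for the client's predicates are hypothetical, and the sketch assumes the <set_it> markup itself reads as the word "it" in the reply; it illustrates the observable behavior described above, not program B's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Supplier;

class EagerRandom {
    // Evaluates every <li> item first, so side effects such as <set_it> always run,
    // and only then returns one of the already evaluated strings at random.
    static String random(List<Supplier<String>> items, Random rng) {
        List<String> evaluated = new ArrayList<>();
        for (Supplier<String> item : items) {
            evaluated.add(item.get());
        }
        return evaluated.get(rng.nextInt(evaluated.size()));
    }

    public static void main(String[] args) {
        Map<String, String> predicates = new HashMap<>(); // stands in for the client's "it", "he", ...
        String star = "Dr. Who";
        List<Supplier<String>> items = new ArrayList<>();
        items.add(() -> {
            predicates.put("it", star);          // side effect of <set_it><person/></set_it>
            return "I don't know if I like it."; // assumes the tag itself reads as "it"
        });
        items.add(() -> "How old are you?");
        items.add(() -> "I like trains.");
        System.out.println(random(items, new Random()));
        System.out.println(predicates.get("it")); // Dr. Who, whichever item was chosen
    }
}
```

A lazy version would call get() only on the chosen item, and "it" would then be set only some of the time.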

<category>
<pattern>DO YOU LIKE *</pattern>
<template><random>
<li>I don't know if I like <set_it><person/></set_it>. </li>
<li>I've heard other people say they like that. </li>
<li>Some people like it. </li>
<li>How old are you? </li>
<li>Do you like it? </li>
<li><for_fun/>. </li>
<li>I like pets. </li>
<li>I love horses. </li>
<li>I like trains. </li>
<li>But I like cats. </li>
<li>I really like ELIZA. </li>
<li>But I like to gossip. </li>
<li>I like science fiction. </li>
<li>Though I like cats and dogs. </li>
<li>My favorite thing is robots. </li>
<li>I love TV shows about robots. </li>
<li>But I think airplanes are cool. </li>
<li>I like to talk about movies and music. </li>
</random></template>
</category>

In the "DO YOU LIKE *" category, the list item

I don't know if I like <set_it><person/></set_it>.

results in the pronoun "it" being bound to the value
of "*", rewritten in third person. With this category,
the robot has the following dialogue:

Client: Do you like Dr. Who
Robot: How old are you? I like to make new friends.

The robot response is a combination of two random
selections. But as a side effect, the pronoun value
of "it" is now set to "Dr. Who".

Client: It is a TV programme
Robot: Dr. Who is a TV programme. I suppose that makes sense.

To generate the final response, the robot applied its category:

<category>
<pattern>IT IS A *</pattern>
<template><get_it/> is a <person/>. I suppose that makes sense.</template>
</category>

The robot is correct that the reply does indeed make sense. Does
this example suggest that in our human mind "it" is just a temporary
storage register for a fragment of text?

Extra credit: In the dialogue fragment above, why did the robot say
"I like to make new friends"?

- Can I create more AIML tags?

AIML is extensible. You can create an infinite number of
new tags for foreign language pronouns, predicates, or
application-specific properties. The file "predicates.txt"
defines any new predicate tags. "Predicate tags" mean
tags that have a client-specific "set" and "get" method.
Pronouns like "it" and "he" have predicate tags like
<set_it></set_it> and <get_he/>. AIML has a number of
these built-in tags for common English pronouns.

There are two varieties of extensible predicate tags.
The first example illustrates the use of new tags
for foreign language pronouns. The Japanese language
pronoun "kare" means "he". In predicates.txt, we
can add a line of the form:


This single line automatically generates the tags
<set_kare> X </set_kare> to set the value of "kare"
to X, and the tag <get_kare/> to retrieve the value.
By default, <get_kare/> returns "dare" ("who?").

Now we can create two AIML categories for an elementary
Japanese conversation:

<category>
<pattern>KARE WA * DESU</pattern>
<template><star/> wa <set_kare><star/></set_kare> desu.</template>
</category>
<category>
<pattern>KARE WA * DESU KA</pattern>
<template><get_kare/> desu ka? </template>
</category>

These two categories produce a coherent conversation
about Mr. Arimoto:

Robot: dare desu ka?
Robot: arimoto san wa kare desu.
Robot: arimoto san desu ka?

Notice that the program replaces <set_kare><star/></set_kare>
with "kare", the default value of this tag. Sometimes however
the default value of the tag is not the best choice for
the set tag to return. The second type of predicate tag
replaces the set tag markup with the original string
inside the markup.

To create a predicate tag "has" we add the following line
to predicates.txt:


The default value of <get_has/> is "mother" (because
everyone has a mother). The "*" here indicates that
the program should replace <set_has> X </set_has>
with X.

Here we write two AIML categories to demonstrate the
"has" predicate:

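<category>
<pattern>I HAVE A *</pattern>
<template>Where did you get <set_it><set_has><person/></set_has></set_it>?</template>
</category>

<category>
<pattern>WHAT DO I HAVE</pattern>
<template>Tell me more about your <get_has/>. </template>
</category>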

Notice that the first category embeds <set_has> in <set_it>,
so that the value of <person/> becomes both "it" and "has".
The resulting dialogue:

Client: WHAT DO I HAVE
Robot: Tell me more about your mother.
Client: I HAVE A BOX
Robot: Where did you get it?
Client: WHAT DO I HAVE
Robot: Tell me more about your box.

Look at the file "predicates.txt" for several more
examples of custom AIML tags, including the German
<set_sie>, <set_er> and <set_es>.

- What are the <person> tags?

The <person> and <person2> tags indicate a place where the
AIML interpreter changes the personal pronouns in a sentence.

<person2> X </person2> change X from 1st to 2nd person
<person> X </person> exchange 1st and 3rd person

<person2> is not often used. The main application is gossip, as in this exchange:

Client: I admire robots like you.
Robot: That's good information: Joe said he admire robots like me.

The transformation is a combination of:

1. change the first person pronouns to second person.
2. change the third person pronouns to first person.

The substitution array for <person2> is incomplete. We need more substitutions
to make person2 work really well.

The <person> substitution is much more common and easier
to understand, because it simply exchanges 1st and 3rd person
pronouns. The main issue with <person> in English is knowing
when to use "I" and when to use "me".
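Here is a minimal Java sketch of a word-by-word pronoun substitution, using a hypothetical table chosen only to reproduce the <person2> example above. Program B's own substitution array is different (and, as noted, incomplete), and the sketch assumes punctuation has already been removed by normalization.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class Person2Sketch {
    // Hypothetical table chosen to reproduce the example above; not program B's array.
    private static final Map<String, String> SUBS = new HashMap<>();
    static {
        SUBS.put("i", "he");
        SUBS.put("me", "him");
        SUBS.put("my", "his");
        SUBS.put("myself", "himself");
        SUBS.put("you", "me");
        SUBS.put("your", "my");
    }

    // Rewrites a normalized sentence one word at a time; unlisted words pass through.
    static String person2(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(SUBS.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(person2("I admire robots like you")); // he admire robots like me
    }
}
```

Because "you" always maps to "me" in this table, an input like "you are" becomes "me are": a one-word lookup cannot tell subject from object, which is exactly the "I" versus "me" problem mentioned above. The verb is not conjugated either, hence "he admire".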

- What is the <person/> tag?

The XML specification requires that every start tag such as
<person> be followed by a matching end tag like </person>.
HTML is more relaxed about this requirement, exemplified by
the liberal use of the <IMG> tag without a corresponding </IMG>.
XML supports a shorthand notation for the "atomic" tags.
The <star/> tag is an example of a shorthand AIML tag.
<person/> is another example:

<person/> = <person><star/></person>

This tag replaces the +~person(*)+ tag in old-style AIML.

- What is the <person2/> tag?

This tag is an abbreviation:

<person2/> = <person2><star/></person2>

See the FAQ question "What are the <person> tags?" for more
information about <person2/>.

- What is "gossip"?

Gossip is an interesting "learning" feature of AIML. The best way to
illustrate the gossip function is with an example. Consider the
category with the pattern "I * " and the template:

Thanks for the gossip:
<gossip><getname/> said <get_gender/> <person2/> </gossip>.

The <gossip> function tells AIML that the botmaster wants to save this
tidbit as gossip. The <get_gender/> function returns "he" or "she" as
determined by the markup functions <set_female/> and <set_male/>.
The <person2/> function converts the statement * to second person.
At present the robot stores the gossip it collects in a file.

<gossip> X </gossip> Save X as gossip.

- What is the <personf/> tag?

The value of <personf/> (a "formatted" personal pronoun transformation)
is shown by the example

<category>
<pattern>WHAT IS A *</pattern>
<template>
What does
<A HREF="<personf/>">
<set_it> <person/> </set_it>
</A> mean? <BR>
Or Ask Jeeves:
<A HREF="<personf/>">
What is a <person/>?
</A>
</template>
</category>

The search strings formatted for the Webster Dictionary and for
the search engine utilize <personf/>. The effect is the
same as <person/>, but the formatting inserts an escaped "%20" in
place of the spaces returned by <person/>. These escape sequences
permit the HTTP GET methods to transmit multiple-word queries.
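A Java sketch of just the formatting step, assuming its input is the text <person/> has already produced. The class PersonfSketch is hypothetical. For contrast it also calls the standard java.net.URLEncoder, which performs HTML form encoding and turns spaces into "+" rather than "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PersonfSketch {
    // The formatting step described for <personf/>: take the text <person/>
    // would produce and put an escaped "%20" in place of each space.
    static String personf(String personText) {
        return personText.trim().replace(" ", "%20");
    }

    // For contrast: URLEncoder does form encoding and turns spaces into "+".
    static String formEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(personf("science fiction"));    // science%20fiction
        System.out.println(formEncode("science fiction")); // science+fiction
    }
}
```

Only spaces are escaped by personf() here, so a query containing characters such as "&" or "#" would still need full percent-encoding before it is safe in a URL.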

- What's the <srai> tag?

The recursive function <srai> stands for
"Stimulus-Response artificial intelligence" and means
that the text between the tags should be sent recursively
to the pattern matcher and the result interpreted.
The resulting text replaces the original text in the markup.

<srai> X </srai> calls the pattern matcher recursively on X.
<sr/> recursive call to chat robot

<sr/> abbreviates <srai> <star/> </srai>

Note: what happens if X contains AIML markup? Does the interpreter
do "lazy evaluation"? Look at the source code and examine the
method pfkh(), the Program Formerly Known as "Hello".

- Could you explain the <srai> tag a little more?

The most common application of <srai> is "symbolic reduction"
of a complex sentence form to a simpler one:

<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>

so the botmaster can store most knowledge in the simplest
categories, such as:

<category>
<pattern>WHAT IS LINUX</pattern>
<template><set_it>Linux</set_it> is the best operating system.</template>
</category>

With all the "symbolic reduction" categories, the robot gives
the same answer for:

"What is Linux?"
"Do you know what Linux is?"
"Define Linux"
"Alice please tell me what Linux is right now"

Sometimes the response consists of two symbolic reductions together:

<category>
<pattern>YES *</pattern>
<template><srai>YES</srai> <sr/></template>
</category>

With this category the robot will reply to all
"Yes something" inputs by combining the
reply to "Yes" with the reply to "something".
Remember, <sr/> is an abbreviation for <srai><star/></srai>.

The <srai> tag is also the answer to the question: Can I have more
than one pattern in the same category? Suppose you want the
same answer for two different patterns. You might think of
writing something like this:

<category>
<pattern>BYE</pattern>
<pattern>GOODBYE</pattern>
<template>See you later.</template>
</category>

Right now you can't put two patterns in one category, but with <srai>
you can get the same effect:

<category>
<pattern>GOODBYE</pattern> <template><srai>BYE</srai></template>
</category>

<category>
<pattern>BYE</pattern> <template>See you later.</template>
</category>

If you look through the AIML files you will see many examples
of <srai> mapping multiple patterns to the same reply.

- How recursive is AIML?

Understanding recursion is important to understanding AIML.
"Recursion" means applying the same solution over and over
again, to smaller and smaller problems, until you reduce
the problem to its simplest form. AIML uses the tags
<sr/> and <srai> to implement recursion. The botmaster
uses these tags to tell the robot how to respond to a
complex sentence by breaking it down into the responses
to simpler ones.

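Recursion can apply many times to a single input. Given
the normalized input:

ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW

an AIML category with the pattern "_ RIGHT NOW" matches first,
reducing the input to:

ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS

Another pattern ("<name/> *") reduces it to:

CAN YOU PLEASE TELL ME WHAT LINUX IS

And then:

PLEASE TELL ME WHAT LINUX IS

reduces to:

TELL ME WHAT LINUX IS

and finally to:

WHAT IS LINUX
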
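The kind of reduction chain described above can be imitated with a small recursive interpreter. This Java sketch is hypothetical throughout: templates are plain strings, "<name/> *" is written as "ALICE *", the intermediate categories ("CAN YOU *", "PLEASE *", "TELL ME WHAT * IS") are invented to complete the chain, and matching is simplified to "exact pattern first, then wildcard patterns in the order they were added" rather than AIML's real priority rules. The depth limit is an added safety net against categories that reduce to each other forever, and the regular expression does not handle nested <srai> tags.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SraiSketch {
    // Hypothetical category store: normalized pattern -> template, in insertion order.
    private final Map<String, String> categories = new LinkedHashMap<>();
    private static final Pattern SRAI = Pattern.compile("<srai>(.*?)</srai>");
    private static final int MAX_DEPTH = 32; // guard against categories that loop forever

    void add(String pattern, String template) {
        categories.put(pattern, template);
    }

    String respond(String input) {
        return respond(input.trim(), 0);
    }

    private String respond(String input, int depth) {
        if (depth > MAX_DEPTH) {
            return "";
        }
        String template = categories.get(input); // an exact pattern wins
        String star = "";
        if (template == null) {
            for (Map.Entry<String, String> c : categories.entrySet()) {
                String bound = wildcard(c.getKey(), input);
                if (bound != null) {
                    template = c.getValue();
                    star = bound;
                    break;
                }
            }
        }
        if (template == null) {
            return "";
        }
        // <sr/> abbreviates <srai><star/></srai>; then fill in the bound text.
        String expanded = template.replace("<sr/>", "<srai><star/></srai>").replace("<star/>", star);
        // Send each <srai>X</srai> back through the matcher and splice in the reply.
        Matcher m = SRAI.matcher(expanded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String reply = respond(m.group(1).trim(), depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(reply));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Text bound to the single '*' or '_' in a pattern, or null if there is no match.
    private static String wildcard(String pattern, String input) {
        int i = pattern.indexOf('*');
        if (i < 0) {
            i = pattern.indexOf('_');
        }
        if (i < 0) {
            return null;
        }
        String prefix = pattern.substring(0, i);
        String suffix = pattern.substring(i + 1);
        if (input.length() <= prefix.length() + suffix.length()
                || !input.startsWith(prefix) || !input.endsWith(suffix)) {
            return null;
        }
        String bound = input.substring(prefix.length(), input.length() - suffix.length()).trim();
        return bound.isEmpty() ? null : bound;
    }

    public static void main(String[] args) {
        SraiSketch bot = new SraiSketch();
        bot.add("WHAT IS LINUX", "Linux is the best operating system.");
        bot.add("_ RIGHT NOW", "<srai><star/></srai>");
        bot.add("ALICE *", "<sr/>");
        bot.add("CAN YOU *", "<sr/>");
        bot.add("PLEASE *", "<sr/>");
        bot.add("TELL ME WHAT * IS", "<srai>WHAT IS <star/></srai>");
        bot.add("BYE", "See you later.");
        bot.add("GOODBYE", "<srai>BYE</srai>");
        System.out.println(bot.respond("ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW"));
        System.out.println(bot.respond("GOODBYE")); // See you later.
    }
}
```

The same object answers "GOODBYE" through the BYE category, which is the "two patterns, one reply" trick from the previous answer.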
- What are "justthat" and "justbeforethat"

<justthat/> and <justbeforethat/> are new, experimental
AIML tags. The idea here is to represent more "state" in the dialogue
than just "that":

Robot: justbeforethat
Client: justthat
Robot: that
Client: input

In the future we may expand AIML categories to include such
"deeper context", if there is a need for it.

- How can I insert a transcript in the robot reply?

The purpose of <get_dialogue/> is to give the client a transcript of
his or her conversation with ALICE. Unfortunately this feature was
advertised in a press article before we had a really efficient
implementation, and the large number of dialogue requests bogged
down the server. So for now <get_dialogue/> just displays a warning.

- How does the random function work?

The random function is (so far) the only AIML method
with a list argument. Its purpose is random selection
of one of a set of text items. In "old-style" AIML the
text appendage operator "+" also served as a list-item
marker. In XML style we use the HTML <li> list-item

<random> <li>X1</li><li>X2</li> </random> Say one of X1 or X2 randomly
<random><li>A</li><li>B</li><li>C</li></random> Say one of A, B or C randomly

- Can I run shell commands from AIML scripts?

Yes. Use the <system>X</system> tag to run the shell command X.
The command X is assumed to produce its output in line-oriented
format suitable for a BufferedReader to read line by line.
A simple example of this command in an AIML script is:

<category>
<pattern>WHAT TIME IS IT</pattern>
<template>The local time is: <system>date</system></template>
</category>

The "date" command is a system command that generates a text
string containing the date and time. (Note that this might
not work on Windows).

Take extreme care in using the <system> tag because it
potentially permits remote clients to run a command on
your system.

- How can I restrict remote clients from running programs on my computer?

If your reply contains the markup

<system>yourcommand <get_ip/></system>

then the robot will insert the (virtual) client IP into the command
line argument for "yourcommand". Then it is up to "yourcommand" to
enforce access privileges.
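A hedged sketch of what an interpreter has to do for <system>: start the command, read its standard output line by line with a BufferedReader, and splice the lines into the reply. It also shows the access-control idea from this answer, with the client IP passed as an extra argument to "yourcommand". The class and method names are hypothetical. Passing the command as an argument list, so that no shell interprets client-supplied text, is a design choice of this sketch rather than a description of program B.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SystemTagSketch {
    // Builds the argument list for "yourcommand <client IP>".
    static List<String> commandFor(String program, String clientIp) {
        return Arrays.asList(program, clientIp);
    }

    // Runs the command and reads its output line by line with a BufferedReader.
    static List<String> run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            process.waitFor();
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // On a Unix-like system this mirrors the WHAT TIME IS IT example.
        System.out.println("The local time is: " + String.join(" ", run(Arrays.asList("date"))));
    }
}
```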

- Can I insert dynamic HTML into the robot reply?

If you are fortunate enough to be running lynx under Linux, the
following markup is a simple way to "inline" the results of an HTTP
request into the chat robot reply. Try asking ALICE:
"What chatterbots do you know?" and she will reply with a page
of links generated by the Google search engine.

<category>
<pattern>WHAT *</pattern>
<template>
Here is the information I found:
<system>lynx -dump -source -image_links<personf/></system>
</template>
</category>

- Can I include JavaScript in the robot reply?

Yes. You can include any HTML including <script> tags. Suppose you
want to "chat AND browse," in other words, have the robot open
up a new browser window when she provides a URL link. Here's a category that
kicks out a piece of HTML/scripting that opens a new window and loads a
given URL. This is handy for search engines or showing off one's web page.

<pattern> WHERE IS YOUR WEB SITE </pattern>
It's at ""
<script language="JavaScript">
// Go to <a href="">The ALICE

A couple of things to note about this technique: #1, this will only work
when ALICE is being talked to from a browser that runs JavaScript, i.e. it
won't work in the applet. We have tested it in Netscape and MS Internet
Explorer, and it works well in both. #2. For the above reason, it is
important to have some sort of explanatory statement before the scripting in
case the scripting isn't supported. Besides, you want some response in your
ALICE window, even if another window DOES come up. #3. If this is viewed
in a browser that doesn't understand the <script> tag, notice that this line
will show up:
"// Go to <a href="">The ALICE
Which is good, because it gives a back-up for the "non-scripted" (the Lynx
users, I guess.) And remember that you have to keep the "//" in front of
any non-JavaScript lines within the <script> tag.

- What is the <topic> tag?

1. <topic> allows ALICE to prefer responses that deal with the
topic currently being discussed. This creates topical
conversation, yet still has the ability to move from one subject
to another.
2. <topic> allows ALICE to have duplicate patterns in different
contexts (topics) allowing ALICE to have different responses to
the same input patterns depending on the topic. For example,
"overriding" the " * " pattern for different topics. (I'll give
an example with this.)
3. As always, you can still use the <gettopic/> tag to refer to
the topic in your output statements (templates).
4. As always, you can add topics on top of all your existing AIML
to keep your bot's current personality.

- Where does the <topic> tag appear?

Topic tags are placed around one or more categories. (Usually
many.) The categories (with each respective "pattern", "that",
and "template") within a set of <topic> </topic> tags would be
associated with the defined topic. The name of the topic would be
given by a "name" attribute in the opening topic tag. Here is
the full AIML format with a topic:

<alice>
<topic name="THE TOPIC">
<category>
<pattern> phrase </pattern>
<that> phrase </that>
<template> phrase </template>
</category>
</topic>
</alice>

- How do I use the <topic> tag?

The concept is that the botmaster uses the <settopic> tags to set
the current topic being discussed. Once the topic is set, when
the client types in a statement for ALICE to find a response for,
the categories defined within the <topic> tags matching the
current topic will be searched first, before any of the non-topic
categories or the default categories. If there is not a
matching category defined in the current topic, then any
categories that are not defined in topic tags are searched. As
mentioned before, you can create categories with identical
<pattern> phrases in different topics, each with different
responses that cater to the current topic.

A proof-of-concept example:
A very useful topic entry might be the default "*" input for
specific topics. If ALICE were set up on a pet store web site
and a person was talking to ALICE about dogs, a useful entry
might be:

<topic name="DOGS">
<category>
<pattern> * </pattern>
<template><random>
<li> Dogs are one of the most popular pets to have.</li>
<li> Have you ever met a Chihuahua you didn't like?</li>
<li> What else do you know about dogs? </li>
<li> Do you have any questions about dogs? </li>
</random></template>
</category>
<!-- more dog categories... -->
</topic>

Normally there would be many entries in a topic, but in this
example, we simply entered the default "*". In this case, if the
person said something that ALICE didn't have a specific
programmed response for, she could still respond intelligently
within the current topic. (Note: this all assumes there are
existing categories that set the current topic to "DOGS".)

Although a topic can have only one name, that name can contain the
wildcard characters "*" or "_", just like a pattern. As with patterns,
only one wildcard character is allowed per name. This allows topics
like "CARING FOR DOGS" or "GROOMING DOGS" to also fall into the
"_ DOGS" topic.
As with patterns, the more specific topics would gain preference
over the wildcarded topic. This means that if the topic is
currently "GROOMING DOGS" and yet there is not a programmed
response for the input in that category, then "_ DOGS" would be
checked, and then next the default categories.
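The search order just described (the current topic, then a matching wildcard topic, then categories outside any topic) can be sketched in Java. Everything here is hypothetical: the class TopicLookup, the empty string standing for "no topic", and the simplification that each topic only matches exact patterns or the catch-all "*". Among several matching wildcard topics, this sketch simply takes the first one added.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class TopicLookup {
    static final String NO_TOPIC = ""; // categories written outside any <topic> element
    // topic name -> (pattern -> template); a hypothetical layout
    private final Map<String, Map<String, String>> topics = new LinkedHashMap<>();

    void add(String topic, String pattern, String template) {
        topics.computeIfAbsent(topic, t -> new HashMap<>()).put(pattern, template);
    }

    String respond(String currentTopic, String input) {
        // 1. categories inside the current topic
        String reply = matchIn(topics.get(currentTopic), input);
        if (reply != null) {
            return reply;
        }
        // 2. categories inside a wildcard topic such as "_ DOGS"
        for (Map.Entry<String, Map<String, String>> t : topics.entrySet()) {
            if (!t.getKey().equals(currentTopic) && topicMatches(t.getKey(), currentTopic)) {
                reply = matchIn(t.getValue(), input);
                if (reply != null) {
                    return reply;
                }
            }
        }
        // 3. categories that are not inside any topic
        return matchIn(topics.get(NO_TOPIC), input);
    }

    // An exact pattern first, then the catch-all "*" pattern.
    private static String matchIn(Map<String, String> categories, String input) {
        if (categories == null) {
            return null;
        }
        String template = categories.get(input);
        return template != null ? template : categories.get("*");
    }

    // A topic name may hold one "*" or "_" wildcard covering at least one character.
    static boolean topicMatches(String name, String topic) {
        int i = name.indexOf('*');
        if (i < 0) {
            i = name.indexOf('_');
        }
        if (i < 0) {
            return false;
        }
        String prefix = name.substring(0, i);
        String suffix = name.substring(i + 1);
        return topic.length() > prefix.length() + suffix.length()
                && topic.startsWith(prefix) && topic.endsWith(suffix);
    }

    public static void main(String[] args) {
        TopicLookup bot = new TopicLookup();
        bot.add(NO_TOPIC, "*", "I see.");
        bot.add("_ DOGS", "*", "What else do you know about dogs?");
        bot.add("GROOMING DOGS", "HOW OFTEN", "Which breed do you groom?");
        System.out.println(bot.respond("GROOMING DOGS", "HOW OFTEN")); // topic-specific category
        System.out.println(bot.respond("GROOMING DOGS", "HELLO"));     // falls back to "_ DOGS"
        System.out.println(bot.respond("CATS", "HELLO"));              // falls back to no topic
    }
}
```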

- What is <think>?

The simple purpose of the <think> X </think> tag pair is
to evaluate the AIML expression X, but "nullify" or hide
the result from the client reply.

A simple example:

<category>
<pattern>I AM FEMALE</pattern>
<template>Thanks for telling me your gender. <think><set_female/></think></template>
</category>

The <set_female/> tag normally returns a string like "she". But the
<think> tag hides the text output of <set_female/> from the reply,
which contains only the text:

Thanks for telling me your gender.
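The behavior of <think> fits in a few lines: evaluate the body for its side effects and contribute nothing to the reply. A hypothetical Java sketch, modeling the body as a Supplier and the client's stored gender as a map entry:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class ThinkSketch {
    // <think> X </think>: evaluate X for its side effects, contribute nothing to the reply.
    static String think(Supplier<String> body) {
        body.get();
        return "";
    }

    public static void main(String[] args) {
        Map<String, String> client = new HashMap<>();
        String reply = "Thanks for telling me your gender. " + think(() -> {
            client.put("gender", "she"); // what <set_female/> records (hypothetical key)
            return "she";                // the text <set_female/> would otherwise add
        });
        System.out.println(reply.trim());         // Thanks for telling me your gender.
        System.out.println(client.get("gender")); // she
    }
}
```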

- What is the DTD for AIML?

Real XML fanatics know that because AIML is an XML language it
must have something called a DTD (Document Type Definition).
The DTD is a formal specification of the grammar for an XML language.
Unless you are using special XML tools to work on your AIML or
developing your own parser for AIML, you probably do not need to know
much about the DTD.

This DTD reflects the current content of the *.aiml files that program B can
actually parse. The DTD will become more general as the parser evolves.

<!DOCTYPE alice [

<!-- Author: John E. Laroche -->
<!-- Version: .91 -->
<!-- Organization:,inc -->
<!-- Date: 24 October 1999 -->
<!-- Revised: 6 November 1999 -->

<!ELEMENT alice ( category | topic )* >
<!ATTLIST alice
<!ELEMENT topic ( category+ ) >
<!ATTLIST topic
<!ELEMENT category ( pattern | template | that )* >
<!ATTLIST category
<!ELEMENT that ( #PCDATA)* >
<!ATTLIST that
<!ELEMENT template ( #PCDATA | beforethat | birthday | botmaster |
boyfriend | favorite_band
| favorite_book | favorite_color | favorite_food | favorite_movie |
| for_fun | friends | gender | get_age | get_dialogue | get_gender | get_he
| get_ip | get_it
| get_location | get_she | get_they | getname | gettopic | girlfriend |
gossip | justbeforethat
| justthat | kind_music | load | location | look_like | name | noload |
person | person2
| personf | question | random | set_age | set_animagent | set_female |
set_he | set_it
| set_location | set_male | set_she | set_they | settopic | setname | sign |
| srai | star | system | talk_about | that | think | wear )*
<!ATTLIST template
<!ATTLIST wear
<!ELEMENT talk_about EMPTY >
<!ATTLIST talk_about
<!ATTLIST star
<!ELEMENT srai ( #PCDATA | star | person | person2 | personf | name |
get_it | botmaster | favorite_band
| favorite_book | favorite_color | favorite_food | favorite_movie |
| set_it | getname | gettopic | location | sr )*
<!ATTLIST srai
<!ELEMENT settopic ( #PCDATA | star | person | person2 | personf | name |
get_it |
favorite_band | favorite_book | favorite_color | favorite_food |
favorite_movie | favorite_song
| set_it | getname | gettopic | location| sr )*
<!ATTLIST settopic
<!ELEMENT location EMPTY >
<!ATTLIST location
<!ELEMENT gettopic EMPTY >
<!ATTLIST gettopic
<!ELEMENT getname EMPTY >
<!ATTLIST getname
<!ELEMENT get_gender EMPTY >
<!ATTLIST get_gender
<!ELEMENT get_they EMPTY >
<!ATTLIST get_they
<!ELEMENT set_it ( #PCDATA | star | person | person2 | personf | name |
| favorite_band | favorite_book | favorite_color | favorite_food |
favorite_movie | favorite_song
| that | justthat | justbeforethat | set_it | getname | gettopic | location
| sr )*
<!ATTLIST set_it
<!ELEMENT set_they ( #PCDATA | favorite_band | favorite_book |
| favorite_food | favorite_movie | favorite_song | person | person2 |
personf | star | that )*
<!ATTLIST set_they
<!ELEMENT person EMPTY >
<!ATTLIST person
<!ELEMENT favorite_song EMPTY >
<!ATTLIST favorite_song
<!ELEMENT favorite_movie EMPTY >
<!ATTLIST favorite_movie
<!ELEMENT favorite_book EMPTY >
<!ATTLIST favorite_book
<!ELEMENT get_it EMPTY >
<!ATTLIST get_it
<!ATTLIST name
<!ATTLIST sign
<!ELEMENT setname (#PCDATA| person | star )*
<!ATTLIST setname
<!ELEMENT set_age ( #PCDATA | star | person )*
<!ATTLIST set_age
<!ELEMENT set_she (#PCDATA| person | star)* >
<!ATTLIST set_she
<!ELEMENT set_location (#PCDATA | star | person )*>
<!ATTLIST set_location
<!ELEMENT set_male EMPTY >
<!ATTLIST set_male
<!ELEMENT set_female EMPTY >
<!ATTLIST set_female
<!ELEMENT set_he ( #PCDATA | star | person )* >
<!ATTLIST set_he
<!ELEMENT random ( #PCDATA | li )* >
<!ATTLIST random
<!ELEMENT li ( #PCDATA | beforethat | birthday | botmaster | boyfriend |
| favorite_book | favorite_color | favorite_food | favorite_movie |
| for_fun | friends | gender | get_age | get_dialogue | get_gender | get_he
| get_ip | get_it
| get_location | get_she | get_they | getname | gettopic | girlfriend |
gossip | justbeforethat
| justthat | kind_music | load | location | look_like | name | noload |
person | person2
| personf | question | set_age | set_animagent | set_female | set_he |
| set_location | set_male | set_she | set_they | settopic | setname | sign |
| srai | star | system | talk_about | that | think| wear )*
<!ELEMENT question EMPTY >
<!ATTLIST question
<!ELEMENT look_like EMPTY >
<!ATTLIST look_like
<!ELEMENT kind_music EMPTY >
<!ATTLIST kind_music
<!ELEMENT justthat EMPTY >
<!ATTLIST justthat
<!ELEMENT justbeforethat EMPTY >
<!ATTLIST justbeforethat
<!ELEMENT girlfriend EMPTY >
<!ATTLIST girlfriend
<!ELEMENT get_she EMPTY >
<!ATTLIST get_she
<!ELEMENT get_ip EMPTY >
<!ATTLIST get_ip
<!ELEMENT get_he EMPTY >
<!ATTLIST get_he
<!ELEMENT gender EMPTY >
<!ATTLIST gender
<!ELEMENT friends EMPTY >
<!ATTLIST friends
<!ELEMENT for_fun EMPTY >
<!ATTLIST for_fun
<!ELEMENT favorite_food EMPTY >
<!ATTLIST favorite_food
<!ELEMENT favorite_color EMPTY >
<!ATTLIST favorite_color
<!ELEMENT favorite_band EMPTY >
<!ATTLIST favorite_band
<!ELEMENT boyfriend EMPTY >
<!ATTLIST boyfriend
<!ELEMENT botmaster EMPTY >
<!ATTLIST botmaster
<!ELEMENT birthday EMPTY >
<!ATTLIST birthday
<!ELEMENT beforethat EMPTY >
<!ATTLIST beforethat
<!ELEMENT noload EMPTY >
<!ATTLIST noload
<!ELEMENT gossip ( #PCDATA | getname | person2 | get_gender | person |
personf | star )* >
<!ATTLIST gossip
<!ELEMENT get_dialogue EMPTY >
<!ATTLIST get_dialogue
<!ELEMENT pattern ( #PCDATA )* >
<!ATTLIST pattern
<!ELEMENT get_age EMPTY >
<!ATTLIST get_age
<!ELEMENT get_location EMPTY >
<!ATTLIST get_location
<!ELEMENT personf EMPTY >
<!ATTLIST personf
<!ELEMENT person2 EMPTY >
<!ATTLIST person2
<!ELEMENT set_animagent EMPTY >
<!ATTLIST set_animagent
<!ELEMENT think ( #PCDATA | getname | person2 | get_gender | person |
personf | star )* >
<!ATTLIST think
<!ELEMENT system ( #PCDATA )* >
<!ATTLIST system
]>

VII. Java Classes

- Do I need to know about the Java classes?

No, not unless you plan to do software development on
the program B Java code. If you are an open source
contributor to the ALICE project, a researcher developing
new AI software, or you are trying to link your own
code to the ALICE package, then this section is for you.
Otherwise, you probably don't need to know much about the
Java classes in program B.

- How does program B work?

The basic loop of program B is to accept an input,
either from the GUI or from the Web, to
preprocess that input and segment it into sentences,
and, for each sentence, to find the best match among
the patterns, and to return the corresponding reply.
Each reply is itself an AIML template, in effect a mini-
program that tells program B how to construct the reply.

The algorithm is thus divided into a matching phase
and a response evaluation phase. In fact these two
phases interleave, because the response may evoke
a recursive call to the pattern matcher with the
<srai> or <sr/> tags.
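The basic loop can be sketched in Java as follows. The normalization rule (upper case, punctuation replaced by spaces) and the sentence splitter are simplifying assumptions, and the matchAndEvaluate function stands in for both the matching phase and the template evaluation phase; none of these names come from program B.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class ProgramBLoop {
    // Splits raw input into sentences at ".", "!" or "?" (a simplification).
    static List<String> sentences(String input) {
        List<String> out = new ArrayList<>();
        for (String s : input.split("[.!?]+")) {
            String normalized = normalize(s);
            if (!normalized.isEmpty()) {
                out.add(normalized);
            }
        }
        return out;
    }

    // Normalized form: upper case, punctuation removed, single spaces.
    static String normalize(String s) {
        return s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9 ]", " ").trim().replaceAll("\\s+", " ");
    }

    // Matching plus evaluation for each sentence; the replies are joined with spaces.
    static String respond(String input, Function<String, String> matchAndEvaluate) {
        StringBuilder reply = new StringBuilder();
        for (String sentence : sentences(input)) {
            if (reply.length() > 0) {
                reply.append(' ');
            }
            reply.append(matchAndEvaluate.apply(sentence));
        }
        return reply.toString();
    }
}
```

This naive splitter breaks "Dr. Who" into two sentences, and the normalizer drops non-ASCII letters; a more careful preprocessor would need to handle abbreviations and other languages.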

- What is the class structure of program B?

The core functionality of program B resides in a single source file. In that file, you find a class hierarchy
from "String" to "Brain" and finally "Classifier."
A branch in that hierarchy contains classes for histogramming
and ranking.

The first branch of the class hierarchy derives class Brain
from StringSorter, extending StringSet. The second branch
extends StringSet to StringHistogrammer and on to StringRanker.
The final class, Brain, thus descends from StringSet (through StringSorter) and uses StringRanker.

- I tried to compile program B and got a lot of warnings.

The designers of Java and the designers of ALICE disagree
on one stylistic point: Java designers believe in the
"one file-one class" philosophy, at least for classes
used outside their own source file. The ALICE engineers
follow the opposite "one file-many classes" design principle,
which allows us to group a number of logically related classes
in a single file. The Java compiler
might complain about a class used outside its file, but
these messages are just warnings.

If you don't want to see the compiler warnings, run the
compiler with the "-nowarn" flag:

javac -nowarn *.java

- What are deprecated APIs?

One of the biggest challenges facing a Java application
developer is finding a "lowest common denominator" subset of the
Java language that works on all the platforms and versions
out there. The Java language itself is a moving target.
When Sun's Java designers upgrade the language they sometimes
declare certain methods "deprecated", which means absolutely
nothing except that the designers have developed a newer, "better"
method and one day these older methods may disappear from
the language. Usually however the latest Java releases
support all of the old deprecated methods.

- What is class Globals?

Globals is the repository for all of the botmaster-selectable
parameters in program B. The Globals class corresponds to
the "Options" menu on the program B menu bar. Globals contains
methods toFile() and fromFile() to make these values
persistent between sessions.
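A sketch of how toFile() and fromFile() might persist the options, using java.util.Properties. The class GlobalsSketch, its get/set helpers and any option names used with it are hypothetical; only the method names toFile() and fromFile() come from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class GlobalsSketch {
    // Botmaster-selectable options as key/value pairs.
    private final Properties options = new Properties();

    String get(String key, String fallback) {
        return options.getProperty(key, fallback);
    }

    void set(String key, String value) {
        options.setProperty(key, value);
    }

    // Save the current options so the next session can restore them.
    void toFile(Path file) {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            options.store(out, "program B options");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load the options saved by an earlier session.
    void fromFile(Path file) {
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            options.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plain key/value text file keeps the saved options readable and hand-editable, which suits a class that mirrors a menu of settings.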

- What is class StringSet?

The StringSet implements the abstract concept of a set of
strings, meaning that each string item appears at most once
in the set.

Being a "set" means that each string occurs only once in an
instance of StringSet: {"this","that","another"} is a set of
strings; {"start","start","stop"} is not.
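
The sketch below models a StringSet as an indexed list plus a hash index that rejects duplicates. It is an illustration under our own assumptions (the method names and the HashMap index are ours); program B's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringSet {
    // Strings in insertion order, addressable by integer index
    protected final List<String> items = new ArrayList<>();
    // Reverse map from string to its index, used to enforce uniqueness
    private final Map<String, Integer> index = new HashMap<>();

    // Adds s if absent; returns its index either way
    int add(String s) {
        Integer i = index.get(s);
        if (i != null) return i;
        items.add(s);
        index.put(s, items.size() - 1);
        return items.size() - 1;
    }

    boolean contains(String s) { return index.containsKey(s); }
    int size() { return items.size(); }
    String get(int i) { return items.get(i); }
}
```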

- What is class SortedStringSet?

SortedStringSet extends StringSet but enforces an alphabetical
ordering of the Strings. The SortedStringSet maintains its
data structure dynamically, so that the set remains sorted
after each item is added.
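
One way to keep the set sorted after every insertion is a binary search for the insertion point, sketched here with Collections.binarySearch. This is an assumed approach, not necessarily the one SortedStringSet uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedStringSet {
    private final List<String> items = new ArrayList<>();

    // Inserts s at its sorted position; duplicates are ignored.
    boolean add(String s) {
        int pos = Collections.binarySearch(items, s);
        if (pos >= 0) return false;      // already present
        items.add(-(pos + 1), s);        // a miss returns -(insertion point) - 1
        return true;
    }

    int size() { return items.size(); }
    String get(int i) { return items.get(i); }
}
```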

- What is class StringHistogrammer?

StringHistogrammer extends StringSet and contains a map from
each string to a count, usually indicating the number of times
that string appears in a sample of text. A histogram is
like a "bar graph" that counts occurrences of each item.
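
A minimal sketch of the string-to-count map, using Map.merge to increment counts. The class is standalone here, rather than extending StringSet, so that it compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StringHistogrammer {
    // Each distinct string maps to the number of times it was seen
    protected final Map<String, Integer> counts = new LinkedHashMap<>();

    void add(String s) { counts.merge(s, 1, Integer::sum); }
    int count(String s) { return counts.getOrDefault(s, 0); }
    int size() { return counts.size(); }
}
```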

- What is class StringRanker?

Extending StringHistogrammer, StringRanker also sorts the
strings by the histogram count. The highest count string
is first, the next highest count second, and so on.
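
A sketch of ranking by count, assuming the order is recomputed on demand rather than maintained incrementally. Ties are broken alphabetically, an arbitrary choice of ours.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringRanker {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String s) { counts.merge(s, 1, Integer::sum); }

    // Returns the strings ordered by descending count; ties alphabetical
    List<String> ranked() {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return keys;
    }
}
```

If a.com is added three times, c.com twice and b.com once, ranked() returns [a.com, c.com, b.com].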

The concept of a StringRanker should be familiar to anyone
who has ranked people, companies or sports teams by any
number such as sales, market capitalization, or points scored.
One application for a StringRanker is determining the
"top 10 referers" in HTTP log file analysis (see

- What is class Brain?

Brain extends StringSorter, and uses StringRanker. The sorted
strings in the Brain class are keys formed by combining the
pattern, that, and topic strings. In the original versions
of ALICE, there were no "that" and no "topic" tags, so the
Brain class simply mapped input patterns to output templates.
With the addition of the "that" and "topic" tags we had to
create the "key" from the combination of all three.
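
The following sketch shows one way to build such a combined key. The separator strings " <that> " and " <topic> " and the use of "*" for a missing that or topic are hypothetical; program B's actual key format may differ.

```java
class BrainKey {
    // Hypothetical separators between the three parts of the key
    static final String THAT = " <that> ";
    static final String TOPIC = " <topic> ";

    // A missing <that> or <topic> becomes the wildcard "*", so a category
    // without them can match in any context.
    static String makeKey(String pattern, String that, String topic) {
        String t = (that == null || that.isEmpty()) ? "*" : that;
        String tp = (topic == null || topic.isEmpty()) ? "*" : topic;
        return pattern + THAT + t + TOPIC + tp;
    }
}
```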

The "Target" objects in class Brain are instances of StringRanker.
These structures form the basis of the classification and targeting
algorithms in program B. For each category, the Targetmap contains
an instance of StringRanker storing the inputs classified into
that category.

- What is the Responder interface?

Developed to meet the needs of multiple ALICE
application scenarios, the Responder interface
simplifies the code in class Classifier for
natural language queries. The Responder defines
an interface with three members:

log() : tells how to log the conversation.
append() : how to append response lines together.
post_process() : runs after response loop finishes.

The method Classifier.multiline_response() calls
all of the Responder methods. See the next
question ("What is the low level interface to program B?")
for more information about multiline_response().
At least five classes implement the Responder interface:

GUIResponder: the program B GUI uses this.
HTMLResponder: a class for Web Server HTML replies.
RobotResponder: the class used by RobotCommunicator.
CustomResponder: a template for more Responder classes.
AppletResponder: the Applet code uses this class.

These classes all handle special circumstances
for the various Responder types. For example,
HTMLResponder appends the client input to each
response; GUIResponder does not. AppletResponder
logs the dialogue through a network URL connection;
all the other classes write to a local file. RobotResponder,
used by the Kid interface, suppresses all HTML
in robot replies, while HTMLResponder passes
it through. HTMLResponder also runs the optional
Animagent class to create the MS Agent VBScript.
Text-based Responder classes wrap the text; HTMLResponder
need not, because the browser handles text formatting.
The Responder interface addresses this wide variety of needs.
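
A minimal sketch of the interface and two implementations. The method names log(), append() and post_process() come from the description above, but the parameter lists are assumptions. PlainResponder and EchoHtmlResponder are hypothetical classes that imitate two of the behaviors listed: stripping HTML from replies, and echoing the client input with each response.

```java
// Method names follow the FAQ; the parameter lists are assumptions.
interface Responder {
    void log(String input, String response, String hname);     // record one exchange
    String append(String input, String response, String soFar); // accumulate reply text
    String post_process(String reply);                          // final pass after the loop
}

// Text-style responder: strips HTML tags from robot replies
class PlainResponder implements Responder {
    final StringBuilder logLines = new StringBuilder();

    public void log(String input, String response, String hname) {
        logLines.append(hname).append(": ").append(input)
                .append(" -> ").append(response).append('\n');
    }

    public String append(String input, String response, String soFar) {
        String clean = response.replaceAll("<[^>]*>", ""); // drop anything tag-shaped
        return soFar.isEmpty() ? clean : soFar + " " + clean;
    }

    public String post_process(String reply) { return reply.trim(); }
}

// HTML-style responder: echoes the client input above each response
// (input is not HTML-escaped in this sketch)
class EchoHtmlResponder implements Responder {
    public void log(String input, String response, String hname) {
        // a real server would write to a local log file here
    }

    public String append(String input, String response, String soFar) {
        return soFar + "<p>&gt; " + input + "<br>" + response + "</p>";
    }

    public String post_process(String reply) {
        return "<html><body>" + reply + "</body></html>";
    }
}
```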

- What is the low level interface to program B?

If you require only a graphical interface, try using the
class RobotCommunicator. Depending on your application,
you may also try the Servlet interface or the applet.
Some developers however may want lower-level access to the
chat robot functions.

The class Classifier contains the low-level
methods needed to interface directly to ALICE. "Classifier" might
as well be called "Bot" because, more than any other class,
it handles the functions unique to the chat robot.

The method Classifier.multiline_response() is a key entry point
into the conversation engine. The "multiline" in
"multiline_response" means that the input may contain
multiple "lines" or sentences. The first argument "query" to
multiline_response is the input. The second argument "hname" is
the virtual IP address of the client. The third and last argument
is the class implementing the Responder interface.

If the input string contains "Sentence1. Sentence2? Sentence3."
then multiline_response might produce:

> Sentence1.
> Sentence2
> Sentence3

The method multiline_response hides all of the details
of sentence segmentation, responding to each input line individually,
and formatting the output. In particular multiline_response()
may or may not append the VBScript needed to drive the MS
Agent output, depending on whether the global MS Agent parameter is set.

The argument "hname" is a key that indexes the client's conversation. For
the interface you need, this can probably always be "localhost" or some
other constant.
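
The loop below sketches how multiline_response() might tie these pieces together: split the input into sentences, call respond() on each one, and pass every exchange to the Responder. MiniClassifier, its naive sentence splitter, the upper-casing stand-in for normalization, and the Responder parameter lists are all assumptions for illustration. The method is marked synchronized, as described under class Clerk.

```java
import java.util.ArrayList;
import java.util.List;

interface Responder {
    void log(String input, String response, String hname);
    String append(String input, String response, String soFar);
    String post_process(String reply);
}

class MiniClassifier {
    // Stand-in for Classifier.respond(); a real bot would match categories
    String respond(String norm, String hname) {
        return "You said " + norm + ".";
    }

    // Naive split on sentence-ending punctuation; program B would first
    // remove abbreviation periods with deperiodize()
    static List<String> sentences(String query) {
        List<String> out = new ArrayList<>();
        for (String s : query.split("[.?!]+")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t.toUpperCase()); // crude normalization
        }
        return out;
    }

    // synchronized so each request to the robot is handled atomically
    synchronized String multiline_response(String query, String hname, Responder r) {
        String reply = "";
        for (String norm : sentences(query)) {
            String response = respond(norm, hname);
            r.log(norm, response, hname);
            reply = r.append(norm, response, reply);
        }
        return r.post_process(reply);
    }
}
```

Called with "Sentence1. Sentence2? Sentence3." and a Responder whose append() adds "> " before and a newline after each response, it produces one reply line per input sentence.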

- Lower, Lower

If you need even lower level access to the program B robot,
you can request responses to individual sentences on a
line-by-line basis. Inside multiline_response() there are
calls to the Classifier.respond() method like:

String response = respond(norm, hname);

where "norm" is a normalized single-sentence input and hname is
the virtual IP address of the client.

Inside respond() we find the method respondIndex(). The
base class StringSet stores the strings in an indexed vector,
and respondIndex() locates the index of the best matched category
for the normalized input string.

The loop inside respondIndex() scans through the categories
in reverse alphabetical order by key, until it finds the best
match. Because the "*" pattern comes first in alphabetical
order, and is the most general pattern, respondIndex() will
return zero when no more specific category matches.
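
The reverse scan can be sketched as follows, assuming bare patterns as keys (no that or topic) and a regex-based wildcard match in which "*" covers one or more words. The class RespondIndexDemo and its matches() helper are hypothetical. The scan works because, in ASCII, '*' (code 42) sorts before digits and letters, so a specific pattern such as WHO IS DR WALLACE sorts after WHO IS * and is tried first.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RespondIndexDemo {
    final String[] keys; // category patterns, kept in alphabetical order

    RespondIndexDemo(String... patterns) {
        keys = patterns.clone();
        Arrays.sort(keys);
    }

    // True if the normalized input matches the pattern word for word,
    // where "*" stands for one or more words
    static boolean matches(String pattern, String input) {
        StringBuilder regex = new StringBuilder();
        for (String word : pattern.trim().split("\\s+")) {
            if (regex.length() > 0) regex.append(' ');
            regex.append(word.equals("*") ? "\\S+(?: \\S+)*" : Pattern.quote(word));
        }
        return input.matches(regex.toString());
    }

    // Scan from the last key toward index 0 and return the first match;
    // "*" sorts first, so it is the fallback
    int respondIndex(String norm) {
        for (int i = keys.length - 1; i >= 0; i--) {
            if (matches(keys[i], norm)) return i;
        }
        return -1; // only possible if there is no "*" category
    }
}
```

With the patterns *, HELLO, WHO IS * and WHO IS DR WALLACE, the input WHO IS DR WALLACE selects the exact pattern, WHO IS THE FIRST PRESIDENT falls through to WHO IS *, and GOODBYE reaches index 0, the catch-all *.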

- What is class IntSet?

IntSet represents a set of integers. Were we using Java
Collections this would likely be a Set, but the simple
requirements of program B allow us to create a simple
IntSet class.

"Set" means that the object has only one occurrence of each item:
{1, 4, 2, 9} is a set of integers; {1, 1, 2} is not.

- What is class SortedIntSet?

The sorted version of IntSet, SortedIntSet maintains its
elements in a sorted array. Throughout program B you will
find many loops utilizing instances of SortedIntSet. These
objects provide an efficient means to locate items in
"rank order", the highest numbered items first and the
smallest numbers last.
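
A sketch of SortedIntSet as a growable sorted int array with binary-search insertion. Iterating from the last index down to zero visits the items in rank order. The method names are ours, not program B's.

```java
import java.util.Arrays;

class SortedIntSet {
    private int[] data = new int[8];
    private int size = 0;

    // Inserts x, keeping the array sorted ascending; duplicates are ignored
    boolean add(int x) {
        int pos = Arrays.binarySearch(data, 0, size, x);
        if (pos >= 0) return false;                      // already present
        int ins = -(pos + 1);                            // insertion point
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        System.arraycopy(data, ins, data, ins + 1, size - ins); // shift right
        data[ins] = x;
        size++;
        return true;
    }

    int size() { return size; }
    int get(int i) { return data[i]; } // i-th smallest element
}
```

A rank-order loop then reads `for (int i = set.size() - 1; i >= 0; i--)`, visiting the largest value first.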

- What is class Substituter?

The class Substituter contains a number of similar static string substitution
methods useful at several points in program B.

Program B has the unique feature that it relies on HTTP GET methods,
rather than POST methods, to transmit chat inputs to the robot server.
The browser's URL encoding inserts '+' characters in place of spaces and
replaces many other characters with %XX escape codes. The static method
cleanup_http() undoes these substitutions and restores the input string to
a form close to what the client originally typed.
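
The FAQ describes cleanup_http() as undoing the substitutions with program B's own table. As a swapped-in alternative, the sketch below uses the JDK's java.net.URLDecoder (the Charset overload requires Java 10 or later). The param() helper and the parameter name "text" in the test are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class HttpCleanup {
    // Undo form encoding: '+' becomes a space, %XX becomes the original character
    static String cleanup_http(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: pull one named parameter out of a GET query string
    static String param(String query, String name) {
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return cleanup_http(pair.substring(eq + 1));
            }
        }
        return null; // parameter not present
    }
}
```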

The problem of segmenting strings into sentences is complicated by the
conventional use of periods to denote abbreviations like "Dr.", "Mr.",
and "St." The method deperiodize() applies a series of substitutions to
eliminate most common abbreviations. Like the other substitution methods
in this class, the deperiodize() method has an associated static data member
of type String[][], an array of {from, to} pairs that stores the substitution map.
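
A sketch of the substitution-map pattern applied to deperiodize(). The table entries and the generic substitute() helper are illustrative assumptions, not program B's actual data.

```java
class Deperiodizer {
    // Illustrative {from, to} pairs; program B's real table is longer
    static final String[][] DEPERIODIZE = {
        {"Dr.", "Dr"}, {"Mr.", "Mr"}, {"Mrs.", "Mrs"}, {"St.", "St"}
    };

    // Apply each substitution in table order
    static String substitute(String s, String[][] map) {
        for (String[] pair : map) s = s.replace(pair[0], pair[1]);
        return s;
    }

    static String deperiodize(String s) {
        return substitute(s, DEPERIODIZE);
    }
}
```

After deperiodize("Mr. and Mrs. Smith met Dr. Wallace.") returns "Mr and Mrs Smith met Dr Wallace.", a splitter on periods sees a single sentence. Plain string replacement has limits: an abbreviation at the end of a sentence, such as "St.", also loses its sentence-ending period.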

The patterns in AIML are written in normalized form. The method normalize()
converts a string to normal form by the following steps:

1. Remove all punctuation (inputs assumed to be individual sentences)

2. Convert string to upper case

3. Place exactly one space between words

4. Expand all contractions

5. Correct a few common spelling mistakes

6. Return a "Trimmed" string
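
The six steps translate into the sketch below. The contraction and spelling tables are tiny, hypothetical samples. Because step 1 has already removed apostrophes, the contraction keys are written without them (DONT rather than DON'T). Keeping only ASCII letters, digits and whitespace is a simplification that would also drop accented letters.

```java
import java.util.Locale;
import java.util.Map;

class Normalizer {
    // Hypothetical sample tables, for illustration only
    static final Map<String, String> CONTRACTIONS = Map.of(
        "DONT", "DO NOT", "IM", "I AM", "CANT", "CAN NOT");
    static final Map<String, String> SPELLING = Map.of(
        "U", "YOU", "R", "ARE");

    static String normalize(String input) {
        // 1. Remove punctuation (keeps letters, digits and whitespace)
        String s = input.replaceAll("[^A-Za-z0-9\\s]", "");
        // 2. Convert to upper case
        s = s.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        // 3. Split on runs of whitespace so words end up exactly one space apart
        for (String w : s.trim().split("\\s+")) {
            if (w.isEmpty()) continue;                   // input had no words
            String r = CONTRACTIONS.getOrDefault(w, w);  // 4. expand contractions
            r = SPELLING.getOrDefault(r, r);             // 5. fix common misspellings
            if (out.length() > 0) out.append(' ');
            out.append(r);
        }
        // 6. Return a trimmed string
        return out.toString().trim();
    }
}
```

For example, normalize("  Don't   you know, Dr. Wallace?") returns "DO NOT YOU KNOW DR WALLACE".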

Removing all punctuation from text inputs makes the chatterbot
compatible with speech inputs, which of course contain no
punctuation.

- What is class Unifier?

Unification refers to the process of matching and binding. A unifier determines
whether two sentences match and, if so, what any 'variables' in the pattern
bind to. In the case of AIML the only matching variable is the single '*'
symbol. The Unifier class contains a 'star' data member to hold the
matched subsentence.
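
A sketch of word-level unification with a star field, assuming, as in AIML, that '*' must match at least one word. The recursive backtracking approach is ours; if a pattern contains more than one '*', this sketch keeps the leftmost binding.

```java
import java.util.Arrays;

class Unifier {
    String star = null; // words bound to '*' by the last successful unify()

    boolean unify(String pattern, String input) {
        star = null;
        return match(pattern.trim().split("\\s+"), 0, input.trim().split("\\s+"), 0);
    }

    // Does pattern p from word i match input w from word j?
    private boolean match(String[] p, int i, String[] w, int j) {
        if (i == p.length) return j == w.length;
        if (p[i].equals("*")) {
            // '*' consumes one or more words; try the shortest binding first
            for (int k = j + 1; k <= w.length; k++) {
                if (match(p, i + 1, w, k)) {
                    star = String.join(" ", Arrays.copyOfRange(w, j, k));
                    return true;
                }
            }
            return false;
        }
        return j < w.length && p[i].equals(w[j]) && match(p, i + 1, w, j + 1);
    }
}
```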

- What is class Parser?

The Parser class is responsible for the evaluation of AIML
response templates. The method pfkh() [the Program Formerly
Known as Hello] is the heart of the evaluation process. This
method contains the code for recognizing and processing
AIML template tags.

The Parser class does not parse all the AIML in the language
definition; it parses and evaluates only the templates at runtime.
Another class, AliceReader, has the job of reading the AIML files
at load time, and parsing the categories into topics, patterns and templates.

- What is class AliceReader?

AliceReader is an efficient, small-footprint XML interpreter
hard coded by Kris Drent specifically for reading AIML categories.
Each category has a pattern, a template, and an optional topic and
thatpattern. AliceReader scans the AIML input and tries to
identify these fields as quickly as possible.

- What is class Classifier?

The class Classifier might as well be called "bot" because it contains
the basic functionality of the chatterbot algorithm.

See the question "How can I interface my Java program to ALICE?" for
additional information about the class Classifier.

- What is class LineClassifier?

In the file you will find an Interface called LineProcessor
with one required method: process_line(). The LineProcessor
is the abstraction of an algorithm that reads a file one line at a time,
processes each line as a data record, and moves on to the next.
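
A minimal sketch of the LineProcessor abstraction. The interface name and process_line() come from the FAQ, while the LineDriver helper that feeds it lines from a Reader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

interface LineProcessor {
    void process_line(String line);
}

class LineDriver {
    // Feed each line of the input to the processor, one record at a time
    static void processAll(Reader in, LineProcessor p) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            p.process_line(line);
        }
    }
}
```

Because LineProcessor has a single method, a lambda such as `line -> lines.add(line)` can serve as a processor.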

LineClassifier implements LineProcessor because it reads lines
of text from the log file and identifies client input lines for
classification. What makes classification efficient is the way
LineClassifier stores the client lines in a SortedStringSet, called
Lines. Because the matching algorithm prioritizes the patterns
alphabetically, LineClassifier can classify an element from Lines
in O(1) time.

The code for LineClassifier is in

- What is class Dialogue?

A Dialogue (not to be confused with a Dialog class!) is
the representation of the conversation between the client
and the robot. The basic data structure is a pair of String arrays
client_said[] and robot_said[] that store the alternating
statements of client and robot. The Dialogue also
encodes the length, hostname, and start and end tag
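
The two parallel arrays can be sketched as follows. The field names client_said, robot_said, hostname and length follow the description above; the growth strategy and the add() method are assumptions, and the start and end tags are omitted.

```java
import java.util.Arrays;

class Dialogue {
    String hostname;        // identifies the client
    String[] client_said;   // client_said[i]: the client's i-th input
    String[] robot_said;    // robot_said[i]: the robot's reply to that input
    int length;             // number of exchanges recorded so far

    Dialogue(String hostname) {
        this.hostname = hostname;
        client_said = new String[4];
        robot_said = new String[4];
    }

    // Record one exchange, doubling both arrays when they fill up
    void add(String client, String robot) {
        if (length == client_said.length) {
            client_said = Arrays.copyOf(client_said, length * 2);
            robot_said = Arrays.copyOf(robot_said, length * 2);
        }
        client_said[length] = client;
        robot_said[length] = robot;
        length++;
    }
}
```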

- What is class Access?

Class Access is the abstraction for log file analysis to
extract dialogues. In a typical chat robot server scenario,
the program records each line of client input and the robot
reply in a log file. Given many simultaneous conversations,
these dialogues are interleaved in the log file. The purpose
of class Access is to unravel these conversations into
individual threads by client.
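
A sketch of the unraveling step, assuming a hypothetical log format in which each line starts with the client's hostname followed by a tab. The method name unravel() is ours. A LinkedHashMap keeps the threads in the order of each client's first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Access {
    // Assumed log line format: "<hostname>\t<text>", e.g. "10.0.0.1\tclient: HI"
    static Map<String, List<String>> unravel(List<String> logLines) {
        Map<String, List<String>> threads = new LinkedHashMap<>();
        for (String line : logLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip lines that don't fit the assumed format
            String host = line.substring(0, tab);
            threads.computeIfAbsent(host, h -> new ArrayList<>())
                   .add(line.substring(tab + 1));
        }
        return threads;
    }
}
```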

- What is class B?

Class B is the old name for the Swing version of class Bawt, but
now just extends Bawt.

- What is class Bawt?

The class Bawt is the Java application, and implements the GUI.

- What is class Blet?

The Blet class is the applet, but is similar in many ways to the application.
The applet is a stripped down version of the program, with a simpler GUI
and no "botmaster" privileges. Also, the Blet class doesn't utilize the
web server, because it runs as a client-side applet.

- What is class Kid?

Class Kid is a simplified graphical user interface, "easy enough
for kids" to run. Program Kid does not invoke program B, but the Kid
may be started from the program B options menu. The logic here
is that kids should be able to have conversations with the chat
robot, but parents may not want kids to start chat robot servers
(see Appendix B: Note to Parents).

Class Kid utilizes RobotCommunicator as its interface to the
chat robot.

- What is class RobotCommunicator?

If you want to customize your own application or applet then
you might find RobotCommunicator is a useful class. The
RobotCommunicator abstracts the combination of a scrolling TextArea
output display with a TextField input field.

- What is class Loader?

Both the application and the applet use the Loader class to load the AIML
robot script. The Loader class extends Thread, and runs "in the background"
while the GUI and, in the case of the application, the web server start.

- What is class WebServer?

The WebServer class implements a "faux" HTTP server, i.e. a server that
listens for HTTP connections and accepts them, then replies in properly
formatted HTML. The connecting client, typically a browser, cannot tell
the difference between the chat robot server and a full-blown web server.
However, our WebServer implements only HTTP GET methods, not POST
methods, and it omits many other features of ordinary web servers,
although it is multithreaded.

- What is class Clerk?

The idea behind class Clerk is to put a "firewall" between the
client and the server so that a misbehaving client can't tie up
the server forever. A client connecting to a web server is
like a customer appearing at a service window. When the customer
appears, the ClerkManager assigns a clerk to that customer.

The customer may take a while to give the clerk his query,
even after making the first connection. The clerk goes into
a hibernating "wait" state and wakes up periodically to
see if the client has finished the query. Some customers
never complete their requests, so the manager kills the
clerk after a predetermined timeout.

We made the Classifier method multiline_response "synchronized"
so that requests to the robot are atomic. The clerk will
wait until the client request is completed before activating
the robot. Once the client completes the query, the clerk
delivers it to the robot. The clerk then sends the reply back
to the client and terminates itself.
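
The sketch below swaps in a simpler mechanism than the periodic wake-ups described above: a BlockingQueue stands in for the client connection, and a timed poll() plays the role of the manager's timeout. ClerkSketch, serve() and robotReply() are hypothetical; robotReply() is synchronized, mirroring the synchronized multiline_response().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class ClerkSketch {
    // Stand-in for the robot; synchronized so only one request runs at a time
    static synchronized String robotReply(String query) {
        return "You said: " + query;
    }

    // Wait up to timeoutMillis for the client's complete query. If it never
    // arrives, give up (return null) instead of tying up the server.
    static String serve(BlockingQueue<String> client, long timeoutMillis)
            throws InterruptedException {
        String query = client.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        return (query == null) ? null : robotReply(query);
    }
}
```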

Appendix A.


1. What does AIML stand for?
2. What is the basic unit of AIML?
3. Name the three parts of a category
4. True or false: <that> is optional
5. True or false: the web server must run on port 2001
6. Show the two forms of the recursive AIML tag.
7. True or false: <srai> <star/> </srai>
is the same as <sr/>
8. Which of the following are valid AIML patterns?
b. * HELLO
c. * AND *
d. _ ALICE
e. *
f. forget *
9. What does <person> your wish is my command </person> do?
10. Given the input "Who is Dr. Wallace" which of the following
patterns is the best match:
a. *
b. WHO IS *
11. Given the input "Who is the first president" which of the
following patterns is the best match:
d. WHO IS *
12. What is wrong with the following category?

<template> <srai> AND </srai> </template>

13. Is anything wrong with the following category?

<pattern>TEST ME</pattern>
<template> <system>dir</system> </template>

14. What is the difference between <gender/> and <get_gender/>?

Advanced Java questions:

15. Sketch the class hierarchy from "String" to "Brain".
16. True or false:
a. "pfkh" stands for "The Program Formerly Known as 'Hello'"
b. pfkh() is a method in class Parser
c. pfkh() parses and evaluates all AIML expressions
d. pfkh() evaluates templates at reply time
17. multiline_response() is a member of what class?
18. The keys used to track client properties are called what?


1. Artificial Intelligence Markup Language
2. a category
3. pattern, that, template
4. true
5. false
6. <sr/> and <srai>...</srai>
7. true
8. a. b. d. and e. are valid.
9. "My wish is your command"
10. c. is an exact, atomic match.
11. b. because it is last in alphabetical order.
12. This category creates an infinite loop.
13. This category works fine, but allows remote
clients to see your files.
14. <gender/> is the robot's gender; <get_gender/> is the client's.
16. true, true, false, true
17. Classifier
18. virtual IP addresses

Appendix B. Note to Parents

The ALICE "brain" does not contain any explicit or adult material.

Experience has shown, however, that clients (persons communicating
with the chat robot over the Internet) will invariably try to
engage the robot in adult conversations. The robot is programmed
to try to avoid these topics. But parents may not wish
to give children access to the log files containing these mature conversations.

Children chatting with ALICE is okay; children reading the
dialogues with adult clients is not.

Clients talking with chat robots on the Internet should also be
aware that the program B server logs and records all conversations.

-- End of Don't Read Me 2000 Dr. Richard S. Wallace
-- For more help join the ListBot mailing list at
-- Please send corrections and additions to