How to program a text adventure in C
by Ruud Helderman
<r.helderman@hccnet.nl>
Licensed under
MIT License
13. The parser
Every text adventure has a
parser.
But there are parsers and parsers.
A simple ‘verb-noun’ parser
(like the one we have been using ever since chapter 2)
could be sufficient for a carefully designed adventure game.
However,
Infocom
has proven that a more advanced parser
really helps to make an enjoyable game.
It does not have to pass the
Turing test;
remember, it’s only a game.
But the parser should enable the player
to express his intentions in a more or less natural way.
Our current parser mainly consists of two lines of code,
tucked away in parsexec.c:
char *verb = strtok(input, " \n");
char *noun = strtok(NULL, "\n");
|
Alright, plus the sequence of strcmp calls to map verbs to commands,
and the functions in noun.c to map nouns to objects.
But that’s it.
This system has served us well for the past 11 chapters,
but it has its flaws.
- It only accepts simple commands of the form ‘verb noun’;
it does not understand sentences
with both a direct object and an indirect object, like put coin in box.
- It does accept multi-word objects (like silver coin),
but the
whitespace
between the words must be spot-on.
Our game rejects a double space between silver and coin.
- It is
case-sensitive;
the command “Go north” is not recognized
because of the capital ‘G’.
Writing a good parser is no trivial task,
but here I will give you a relatively simple approach.
We will define a
grammar
that consists of a list of patterns,
similar to (but much simpler than)
regular expressions.
Examples of patterns:
look around |
go A |
put A in B |
To parse the user’s input,
we will traverse the list of patterns from top to bottom,
trying to match the user’s input with each pattern in turn.
We will stop at the first match found.
For the sake of simplicity, we will not make use of
backtracking,
though this could be added later.
Uppercase letters are the
nonterminal symbols
in our grammar;
they match any tag (of any object).
Matching is
greedy;
when the parser has a choice between two different tags
(for example ‘silver coin’ and ‘silver’),
the longer tag will take precedence.
The matching tag can then be passed as parameter noun
to one of the execute functions.
For commands with more than one noun (introduced in the next chapter),
parameter passing becomes a bit unpractical.
For simplicity, I will use a
global variable
instead of parameters (despite the bad reputation of global variables).
The variable will be an array of string pointers.
The array has 26 elements; one for each (uppercase) letter in the alphabet.
That is sufficient for upto 26 different nonterminals in a single pattern.
For every nonterminal in a (matching) pattern,
a matching tag will be ‘captured’ by filling
the nonterminal’s array element with a pointer to that particular tag.
params[0] (the first array element) captures nonterminal ‘A’,
params[1] captures ‘B’, and so forth.
A simple macro can be used
to find the array element that belongs to a given nonterminal.
#define paramByLetter(letter) (params + (letter) - 'A')
|
Note: you may find the array length of 26 a bit over the top,
but it does save me the trouble of writing some
bounds checking
code to prevent a
buffer overflow
in case of a malformed pattern.
Now we have to think of a way to handle missing or unrecognized nouns.
Suppose the user makes a typo and enters “go kave”.
Question is, should this command match the pattern “go A” or not?
If we don’t, then the command will fail to match any pattern
and end up in a generic error handler, which will probably reply with
something like “I don’t know how to go kave”.
This takes away every opportunity
to improve these ‘negative responses’;
a reply “I don’t understand where you want to go”
already feels a lot more natural.
It would be best to maintain all replies concerning command go
inside function executeGo.
There are a few ways to achieve this, but the easiest one is
to allow nonterminals to match anything;
so not just a valid tag, but also total gibberish, blanks or just nothing.
This ‘invalid’ input will be captured as an empty string
(“”).
Having such a ‘loose’ nonterminal in the middle of a pattern
does complicate the pattern matching process; it would require
backtracking
to properly align the word ‘in’
when matching input “put foo in in box”
with pattern “put A in B”.
To keep things simple, we will only enable this loose matching
for nonterminals that occur at the very end of a pattern.
To be able to properly match the example above, we need two separate patterns:
- Pattern “put A in B” will match
the valid command put coin in box,
as well as the invalid commands
put coin in booox and put coin in.
Note that the nonterminal A only matches valid tags
(in this case coin).
- Pattern “put A” matches
all remaining invalid commands:
put coin,
put koin,
put koin in box,
put koin in booox and a bare put.
And also the initial example (put foo in in box).
The order of the patterns is vital here:
if “put A” would be on top (meaning it will be tried first),
then it would consume every put command;
a valid “put coin in box” would be considered invalid
because there is no tag “coin in box.”
The same goes for commands that appear in multiple forms, such as look.
- “look around”
- “look” (as an abbreviation of look around)
- “look A”
The first two patterns can be in any order,
but the third one must come last.
Time to put this into action.
We will discard the existing contents of module parsexec.c,
and replace it by a new implementation of function parseAndExecute
using a list of patterns
that should be able to match every command we have implemented so far.
Each pattern is tied to a function that executes the appropriate command.
parsexec.h |
- extern bool parseAndExecute(const char *input);
|
parsexec.c |
- #include <ctype.h>
- #include <stdbool.h>
- #include <stdio.h>
- #include "object.h"
- #include "misc.h"
- #include "match.h"
- #include "location.h"
- #include "inventory.h"
- #include "openclose.h"
- typedef struct
- {
- const char *pattern;
- bool (*function)(void);
- } COMMAND;
- static bool executeQuit(void)
- {
- return false;
- }
- static bool executeNoMatch(void)
- {
- const char *src = *params;
- int len;
- for (len = 0; src[len] != '\0' && !isspace(src[len]); len++);
- if (len > 0) printf("I don't know how to '%.*s'.\n", len, src);
- return true;
- }
- bool parseAndExecute(const char *input)
- {
- static const COMMAND commands[] =
- {
- { "quit" , executeQuit },
- { "look" , executeLookAround },
- { "look around" , executeLookAround },
- { "look at A" , executeLook },
- { "look A" , executeLook },
- { "examine A" , executeLook },
- { "go to A" , executeGo },
- { "go A" , executeGo },
- { "get A" , executeGet },
- { "drop A" , executeDrop },
- { "ask A" , executeAsk },
- { "give A" , executeGive },
- { "inventory" , executeInventory },
- { "open A" , executeOpen },
- { "close A" , executeClose },
- { "lock A" , executeLock },
- { "unlock A" , executeUnlock },
- { "A" , executeNoMatch }
- };
- const COMMAND *cmd;
- for (cmd = commands; !matchCommand(input, cmd->pattern); cmd++);
- return (*cmd->function)();
- }
|
Explanation:
- Line 1:
from ctype.h,
we will be using isspace (line 26),
which provides a more complete
whitespace
test than a simple comparison with
space
(it also matches
tab,
among others).
- Line 11-15:
type COMMAND is a structure that combines a pattern with a function.
- Line 17-29:
a few generic handlers.
- Line 26:
this loop calculates the length of the first word entered by the player;
we assume this is supposed to be the (unrecognized) verb.
- Line 27:
we use a special format specifier
%.*s
to print only the first len characters of src.
- Line 33-53:
the array of patterns.
- Line 38-40:
synonyms
are simple: just map different patterns to the same function.
- Line 52:
the last pattern should always be “A”;
this pattern matches anything and so it will pick up
everything that hasn’t been matched by preceding patterns.
- Line 55:
this loop scans though all patterns from top to bottom;
it stops at the first pattern that matches.
The ‘sure match’ on line 52 ensures
the loop will never pass the end of the array.
- Line 56:
the function that belongs to the matching pattern is called.
The hardest part is the implementation of function matchCommand.
But as you can see below, that too can be done in less than 100 lines of code.
match.h |
- #define MAX_PARAMS 26
- extern const char *params[];
- #define paramByLetter(letter) (params + (letter) - 'A')
- extern bool matchCommand(const char *src, const char *pattern);
|
match.c |
- #include <ctype.h>
- #include <stdbool.h>
- #include <string.h>
- #include "object.h"
- #include "misc.h"
- #include "match.h"
- const char *params[MAX_PARAMS];
- static const char *skipSpaces(const char *src)
- {
- while (isspace(*src)) src++;
- return src;
- }
- static const char *matchSpaces(const char *src)
- {
- return *src == '\0' || isspace(*src) ? skipSpaces(src) : NULL;
- }
- static const char *matchTerminal(const char *src, char terminal)
- {
- return terminal == ' ' ? matchSpaces(src) :
- tolower(*src) == tolower(terminal) ? src + 1 : NULL;
- }
- static const char *matchTag(const char *src, const char *tag, bool atEnd)
- {
- while (src != NULL && *tag != '\0')
- {
- src = matchTerminal(src, *tag++);
- }
- return atEnd && src != NULL && *skipSpaces(src) != '\0' ? NULL : src;
- }
- static const char *matchParam(const char *src, const char **par, bool loose)
- {
- const char *restOfSrc = loose ? src + strlen(src) : NULL;
- OBJECT *obj;
- for (obj = objs; obj < endOfObjs; obj++)
- {
- const char **tag;
- for (tag = obj->tags; *tag != NULL; tag++)
- {
- const char *behindMatch = matchTag(src, *tag, loose);
- if (behindMatch != NULL && strlen(*tag) > strlen(*par))
- {
- *par = *tag;
- restOfSrc = behindMatch;
- }
- }
- }
- if (**par == '\0')
- {
- *par = src;
- }
- return restOfSrc;
- }
- bool matchCommand(const char *src, const char *pattern)
- {
- const char **par;
- for (par = params; par < params + MAX_PARAMS; par++)
- {
- *par = "";
- }
- for (src = skipSpaces(src); src != NULL && *pattern != '\0'; pattern++)
- {
- src = isupper(*pattern)
- ? matchParam(src, paramByLetter(*pattern), pattern[1] == '\0')
- : matchTerminal(src, *pattern);
- }
- return src != NULL && *skipSpaces(src) == '\0';
- }
|
Explanation:
- Line 16-19:
function matchSpaces
tries to match a stretch of whitespace, either between two words,
or (possibly zero-width) at the end of the input string.
As with most of the functions in this module,
parsing starts at the character pointed to by parameter src;
the return value is either a pointer to the position in the input string
that immediately follows the match
(in this case, to the first non-whitespace character encountered;
this may be the NUL character denoting end-of-string),
or NULL, meaning no match was found
(in this case, if there was a non-whitespace character at the start position).
- Line 21-25:
function matchTerminal matches a particular character,
case-insensitively.
If the required character (parameter terminal) is a space,
the match is a stretch of (whitespace) characters, as explained above.
- Line 27-34:
function matchTag matches a particular word, case-insensitively.
If parameter atEnd is true,
then matching will only succeed if the word is last in the input string.
- Line 36-58:
function matchParam matches and captures a nonterminal.
All tags of all objects will be scanned to find the longest matching tag.
The array element pointed to by par
will be filled with a pointer to the matching tag (if any).
Set parameter loose to match a ‘loose’ nonterminal;
this will always consume the entire input string
(i.e. return a pointer to the end-of-string character),
regardless of whether there is a tag to satisfy this match.
- Line 53-56:
if no matching tag is found, then the remaining input is captured.
This can be used to included the non-matching input in an error message;
we do this in function executeNoMatch, defined in parsexec.c.
- Line 60-74:
function matchCommand matches an entire pattern,
capturing nonterminals into array params.
We adjust the implementations of the various commands
to make use of the new array params.
Sample output |
Welcome to Little Cave Adventure.
You are in an open field.
You see:
a silver coin
a burly guard
a cave entrance to the east
dense forest all around
--> get coin
You pick up a silver coin.
--> give coin
You give a silver coin to a burly guard.
--> go cave
You walk into the cave.
You are in a little cave.
You see:
an exit to the west
solid rock all around
a closed door to the south
a tiny key
--> get key
You pick up a tiny key.
--> go south
The door is closed.
--> open door
You open a closed door to the south.
--> go south
You walk through the door into a backroom.
You are in a backroom.
You see:
solid rock all around
an open door to the north
a wooden box
--> unlock box
You unlock a wooden box.
--> open box
You open a wooden box.
--> examine box
The box is open.
You see:
a gold coin
--> get gold
You get a gold coin from a wooden box.
--> quit
Bye!
|
inventory.h |
- extern bool executeGet(void);
- extern bool executeDrop(void);
- extern bool executeAsk(void);
- extern bool executeGive(void);
- extern bool executeInventory(void);
|
inventory.c |
- #include <stdbool.h>
- #include <stdio.h>
- #include "object.h"
- #include "misc.h"
- #include "match.h"
- #include "noun.h"
- #include "move.h"
- bool executeGet(void)
- {
- OBJECT *obj = getVisible("what you want to get", params[0]);
- switch (getDistance(player, obj))
- {
- case distSelf:
- printf("You should not be doing that to yourself.\n");
- break;
- case distHeld:
- printf("You already have %s.\n", obj->description);
- break;
- case distOverthere:
- printf("Too far away, move closer please.\n");
- break;
- case distUnknownObject:
- // already handled by getVisible
- break;
- default:
- if (obj->location != NULL && obj->location->health > 0)
- {
- printf("You should ask %s nicely.\n", obj->location->description);
- }
- else
- {
- moveObject(obj, player);
- }
- }
- return true;
- }
- bool executeDrop(void)
- {
- moveObject(getPossession(player, "drop", params[0]), player->location);
- return true;
- }
- bool executeAsk(void)
- {
- moveObject(getPossession(actorHere(), "ask", params[0]), player);
- return true;
- }
- bool executeGive(void)
- {
- moveObject(getPossession(player, "give", params[0]), actorHere());
- return true;
- }
- bool executeInventory(void)
- {
- if (listObjectsAtLocation(player) == 0)
- {
- printf("You are empty-handed.\n");
- }
- return true;
- }
|
Same thing for the module we added in the previous chapter.
openclose.h |
- extern bool executeOpen(void);
- extern bool executeClose(void);
- extern bool executeLock(void);
- extern bool executeUnlock(void);
|
openclose.c |
- #include <stdbool.h>
- #include <stdio.h>
- #include "object.h"
- #include "match.h"
- #include "reach.h"
- bool executeOpen(void)
- {
- OBJECT *obj = reachableObject("what you want to open", params[0]);
- if (obj != NULL) (*obj->open)();
- return true;
- }
- bool executeClose(void)
- {
- OBJECT *obj = reachableObject("what you want to close", params[0]);
- if (obj != NULL) (*obj->close)();
- return true;
- }
- bool executeLock(void)
- {
- OBJECT *obj = reachableObject("what you want to lock", params[0]);
- if (obj != NULL) (*obj->lock)();
- return true;
- }
- bool executeUnlock(void)
- {
- OBJECT *obj = reachableObject("what you want to unlock", params[0]);
- if (obj != NULL) (*obj->unlock)();
- return true;
- }
|
In location.c, the command look around is given its own function,
separate from the look command to inspect specific objects
(see lines 7-12).
location.h |
- extern bool executeLookAround(void);
- extern bool executeLook(void);
- extern bool executeGo(void);
|
location.c |
- #include <stdbool.h>
- #include <stdio.h>
- #include "object.h"
- #include "misc.h"
- #include "match.h"
- #include "noun.h"
- bool executeLookAround(void)
- {
- printf("You are in %s.\n", player->location->description);
- listObjectsAtLocation(player->location);
- return true;
- }
- bool executeLook(void)
- {
- OBJECT *obj = getVisible("what you want to look at", params[0]);
- switch (getDistance(player, obj))
- {
- case distHereContained:
- printf("Hard to see, try to get it first.\n");
- break;
- case distOverthere:
- printf("Too far away, move closer please.\n");
- break;
- case distNotHere:
- printf("You don't see any %s here.\n", params[0]);
- break;
- case distUnknownObject:
- // already handled by getVisible
- break;
- default:
- printf("%s\n", obj->details);
- listObjectsAtLocation(obj);
- }
- return true;
- }
- static void movePlayer(OBJECT *passage)
- {
- printf("%s\n", passage->textGo);
- if (passage->destination != NULL)
- {
- player->location = passage->destination;
- printf("\n");
- executeLookAround();
- }
- }
- bool executeGo(void)
- {
- OBJECT *obj = getVisible("where you want to go", params[0]);
- switch (getDistance(player, obj))
- {
- case distOverthere:
- movePlayer(getPassage(player->location, obj));
- break;
- case distNotHere:
- printf("You don't see any %s here.\n", params[0]);
- break;
- case distUnknownObject:
- // already handled by getVisible
- break;
- default:
- movePlayer(obj);
- }
- return true;
- }
|
Our game still accepts simple verb-noun commands only,
but the new parser does have the potential to accept
commands with more than one noun like ‘put coin in box’;
this will be demonstrated in the next chapter.
The new parser is a huge improvement over the original one,
but by today’s standards, it is still far from perfect.
For example, there is no structural way to
manipulate multiple objects with a single command
(‘get coin, key and lamp’)
or execute two or more commands in a row
(‘get key then go east’).
In the true sense of the word, my parser is not even a parser.
It is just a simple pattern matcher.
IMHO, it does its job sufficiently well for a simple adventure game.
Some of its flaws can be mitigated by adding more patterns.
But eventually, you will run into its limitations,
and you may want to move on to something more mature.
In that case, I would recommend a decent parser generator
(e.g. Yacc).
Please be aware that this is not an easy thing to do.
For now, this is outside the scope of this tutorial.
Next chapter: 14. Multiple nouns