Geeks With Blogs
Ulterior Motive Lounge UML Comics and more from Martin L. Shoemaker (The UML Guy),
Offering UML Instruction and Consulting for your projects and teams.
To understand the code behind Dee Jay, we first need to understand the basics of the M-SAPI speech recognition system. That means we need to understand three concepts:

  1. SpeechRecognitionEngine. This is the class that will listen for commands and phrases and fire events when it recognizes something. It's a very simple class, but we're not ready for it yet: before we can look at the SpeechRecognitionEngine, we need to look at Grammar.
  2. Grammar. This class describes a complete set of phrases and options that a SpeechRecognitionEngine will recognize. There are a number of ways to create a Grammar, ranging from simple strings to W3C Speech Recognition Grammar Specification (SRGS) documents. But for Dee Jay, we're going to concentrate on building a Grammar out of smaller elements, using the GrammarBuilder class.
  3. GrammarBuilder. This is a class that represents a subset of a Grammar; and that subset can itself have subsets, and so on.

GrammarBuilder is the focus of this post; and I find that it helps to understand GrammarBuilder if you think of it in relation to two standard design patterns: Decorator and Composite. Neither one precisely describes the design of GrammarBuilder, but they'll help you to think about how it works.

The Decorator Pattern

Decorator is a pattern that allows you to dynamically add new behavior to an existing object, as shown in Figure 1:

Figure 1: The Decorator Pattern

In this example, we have Things that DoStuff. At run time we want to make some Things also able to DoPlainStuff and others also able to DoFancyStuff. If we knew in advance which was which, we could solve this with Plain and Fancy subclasses of Thing; but what if we won't know when we first create a Thing whether it will be Plain or Fancy (or neither)?

Another solution would be to create a converter that converts a Thing to Plain or Fancy; but as we get more varieties and the number of converters grows, this can get cumbersome. And what if we later find a Thing which we want to do both Plain and Fancy stuff?

The Decorator Pattern says that the solution is not subclasses and subsubclasses and subsubsubclasses and a plethora of converters; rather, there is one base class (Base Thing in Figure 1) and two subclasses. One subclass is Thing itself; but the other is DecoratedThing, which isn't really a Thing at all. Instead, DecoratedThing contains a Base Thing; and any time someone asks DecoratedThing to DoStuff, it does so by asking its "inner Thing" to do the real work. And that "inner Thing" might be a real Thing, or it might be another DecoratedThing. The first DecoratedThing doesn't know, and doesn't care. It simply asks the inner Thing to do work.

And now we can define Plain Things by creating PlainDecorator, a subclass of DecoratedThing, and sticking a real Thing inside it. And we can define Fancy Things with FancyDecorator. And we could even stick a PlainDecorator inside a FancyDecorator. There's no limit.
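
The layering described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not code from Dee Jay: the class names follow Figure 1, but the method name do_stuff and the idea of returning a list of strings are my own assumptions, and for brevity each decorator adds its extra behavior inside do_stuff rather than exposing a separate DoPlainStuff or DoFancyStuff.

```python
class BaseThing:
    """Common interface: anything that can do_stuff()."""

    def do_stuff(self):
        raise NotImplementedError


class Thing(BaseThing):
    """The real Thing that does the actual work."""

    def do_stuff(self):
        return ["stuff"]


class DecoratedThing(BaseThing):
    """Wraps any BaseThing and forwards do_stuff() to it."""

    def __init__(self, inner):
        # The inner object may be a Thing or another DecoratedThing.
        self.inner = inner

    def do_stuff(self):
        return self.inner.do_stuff()


class PlainDecorator(DecoratedThing):
    """Adds plain behavior on top of whatever the inner object does."""

    def do_stuff(self):
        return self.inner.do_stuff() + ["plain stuff"]


class FancyDecorator(DecoratedThing):
    """Adds fancy behavior on top of whatever the inner object does."""

    def do_stuff(self):
        return self.inner.do_stuff() + ["fancy stuff"]


# A PlainDecorator stuck inside a FancyDecorator, as described above.
both = FancyDecorator(PlainDecorator(Thing()))
print(both.do_stuff())  # ['stuff', 'plain stuff', 'fancy stuff']
```

Notice that FancyDecorator never checks what it wraps. It simply asks its inner object to do the work and then adds its own part.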

Now GrammarBuilders aren't Decorators, though I thought they were at first. I thought that because they have some Decorator-like behavior, in that a GrammarBuilder can be defined or built out of smaller GrammarBuilders. There's a definite sense of layers within layers, much as with Decorator. (Why aren't GrammarBuilders Decorators? See below...)

The Composite Pattern

Composite is a pattern very similar to Decorator; but instead of adding new behavior to an existing thing, you define a thing that contains other similar things. The distinction between the two patterns is subtle, and is more in intention than in implementation: you could take Composite code and use it in a Decorator fashion, so the code differences are minor. But in Decorator you think about adding behavior, while in Composite you think about adding contents.

A typical example of Composite is shown in Figure 2:

Figure 2: The Composite Pattern

In this example, we have two varieties of Widgets (Plain and Fancy), and then a CompositeWidget; and all three are subclasses of a base Widget class, and can do whatever Widgets do. But the CompositeWidget contains 0 or more Widgets, which may themselves be Plain, Fancy, or Composite; and when asked to do its Widget stuff, it does so by asking each of its contained Widgets to do its Widget stuff.
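
Here is a minimal Python sketch of that arrangement. The class names come from Figure 2; the method name do_widget_stuff and the returned lists are placeholders I made up so that the delegation is visible.

```python
class Widget:
    """Base class: anything that can do Widget stuff."""

    def do_widget_stuff(self):
        raise NotImplementedError


class PlainWidget(Widget):
    def do_widget_stuff(self):
        return ["plain"]


class FancyWidget(Widget):
    def do_widget_stuff(self):
        return ["fancy"]


class CompositeWidget(Widget):
    """Holds 0 or more Widgets and delegates to each one in order."""

    def __init__(self, *children):
        self.children = list(children)

    def do_widget_stuff(self):
        results = []
        for child in self.children:
            # A child may be Plain, Fancy, or another Composite.
            results.extend(child.do_widget_stuff())
        return results


tree = CompositeWidget(
    PlainWidget(),
    CompositeWidget(FancyWidget(), PlainWidget()),
    CompositeWidget(),  # an empty composite is allowed, too
)
print(tree.do_widget_stuff())  # ['plain', 'fancy', 'plain']
```

Structurally this is very close to the Decorator sketch; the difference is that a CompositeWidget exists to hold contents, not to add behavior.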

GrammarBuilder isn't quite like Composite, either. Once a GrammarBuilder has been created, it really doesn't act like a collection with contents. Rather, it acts just as a single entity with a lot of rich detail.

The GrammarBuilder Class

So what does GrammarBuilder look like? Well, something like Figure 3:

Figure 3: GrammarBuilder and Friends

One look at Figure 3 will tell any UML-aware reader what's lacking for either the Decorator Pattern or the Composite Pattern: base classes! A GrammarBuilder is indeed made up of smaller pieces; but those smaller pieces don't have any common base classes. So GrammarBuilder may be inspired by one of these patterns, but it isn't implemented as either of them. (At least not publicly. If you dug inside, I suspect you would find something that looks a lot like Composite: a tree-like structure containing internal elements constructed from the external elements in Figure 3.)

Figure 3 shows that GrammarBuilder depends on itself and also on four other classes:

  1. String. This is simply the .NET string class. It represents one word or phrase the user might say.
  2. Choices. This class represents a choice between two or more alternate phrases. It is defined by the list of choices. Note that, somewhat like GrammarBuilder, Choices also depends on both string and GrammarBuilder. The alternates in a Choices list can be simple strings, or they can be more complex phrases built up through GrammarBuilders.
  3. SemanticResultKey. This takes an existing Grammar element (GrammarBuilder, Choices, string) and attaches a label to it so that you can find it as a member of a SemanticValue array after recognition. For instance, in Dee Jay, you could give the command "Play Graceland". I used SemanticResultKeys to define this command as "[Command][MusicKey]"; and then when I ask for [Command], M-SAPI returns "Play"; and when I ask for [MusicKey], M-SAPI returns "Graceland". By using SemanticResultKeys, you tell the SpeechRecognitionEngine how to parse your phrases for you automatically.
  4. SemanticResultValue. This element allows you to map a recognized phrase to a given bool, int, float, or string value. So for instance, you might map the word "score" to the number 20.

So a GrammarBuilder can be built from any of these classes, including another GrammarBuilder; and two GrammarBuilders can be combined to form a new GrammarBuilder, as can a GrammarBuilder and a string or a Choices. This may not be precisely the Composite Pattern, because there are no common base classes; but it sure is a form of composition.
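
To make "composition without a common base class" concrete, here is a toy model in Python. To be clear, this is not the M-SAPI API: Builder and Choices below are hypothetical stand-ins I wrote for illustration, the toy Choices holds plain strings only, and "Queue" and "Thriller" are made-up vocabulary alongside the "Play Graceland" command mentioned above.

```python
from itertools import product


class Choices:
    """Toy stand-in: one of several alternative phrases (strings only)."""

    def __init__(self, *alternatives):
        self.alternatives = list(alternatives)


class Builder:
    """Toy stand-in for a grammar builder: an ordered sequence of parts."""

    def __init__(self, *parts):
        self.parts = []
        for part in parts:
            # A nested Builder is flattened into its parts; strings and
            # Choices are stored as they are. No shared base class needed.
            if isinstance(part, Builder):
                self.parts.extend(part.parts)
            elif isinstance(part, (str, Choices)):
                self.parts.append(part)
            else:
                raise TypeError(f"unsupported part: {part!r}")

    def __add__(self, other):
        # Builder + Builder, Builder + str, and Builder + Choices all work.
        return Builder(self, other)

    def phrases(self):
        """Enumerate every phrase this builder would accept."""
        options = [
            part.alternatives if isinstance(part, Choices) else [part]
            for part in self.parts
        ]
        return [" ".join(words) for words in product(*options)]


commands = Builder(Choices("Play", "Queue")) + Choices("Graceland", "Thriller")
print(commands.phrases())
# ['Play Graceland', 'Play Thriller', 'Queue Graceland', 'Queue Thriller']
```

The type checks in the constructor do the job that a shared base class would do in a textbook Composite: they decide which kinds of parts are acceptable.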

To see a very simple pseudocode example of how GrammarBuilders can be used to build a Grammar, let's imagine a control with a background color and a foreground color; and let's further imagine that either color can only be red, green, or blue. Then our Grammar could be built like this:

// Define the color choices.
chcColors = Choices("Red", "Green", "Blue");

// Add the key, "Color".
keyColor = SemanticResultKey("Color", chcColors);

// Make a GrammarBuilder.
gbColor = GrammarBuilder(keyColor);

// Define the target choices.
chcTargets = Choices("Foreground", "Background");

// Add the key, "Target".
keyTarget = SemanticResultKey("Target", chcTargets);

// Make a GrammarBuilder.
gbTarget = GrammarBuilder(keyTarget);

// Make the combined GrammarBuilder.
gbCommands = gbTarget + gbColor;

Once converted into a Grammar, this GrammarBuilder will match any of the following phrases:

  • Foreground Red
  • Foreground Green
  • Foreground Blue
  • Background Red
  • Background Green
  • Background Blue

But it won't match any of these phrases:

  • Foreground Yellow
  • Foreground Color
  • Target Blue
  • Target Color
  • Target Earth
  • What?

Keep in mind that "Target" and "Color" are red herrings (so to speak) in these bad examples. "Target" and "Color" aren't recognized phrases in the Grammar; rather, they're keys to look up parts of the recognized result, as in the following bit of pseudo-code:

// Read the command pieces.
target = result.SemanticValues["Target"];
color = result.SemanticValues["Color"];
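
If it helps to see matching and key lookup working together, here is a small runnable simulation in Python. It is a toy model of the behavior described above, not the speech engine: the GRAMMAR table and the recognize function are my own inventions, and the matching is a plain word-by-word comparison on text rather than speech recognition.

```python
# Each slot pairs a semantic key with the words allowed in that position.
GRAMMAR = [
    ("Target", ["Foreground", "Background"]),
    ("Color", ["Red", "Green", "Blue"]),
]


def recognize(phrase, grammar=GRAMMAR):
    """Return {key: word} if the phrase matches the grammar, else None."""
    words = phrase.split()
    if len(words) != len(grammar):
        return None
    values = {}
    for word, (key, allowed) in zip(words, grammar):
        if word not in allowed:
            return None
        # The key labels the slot; it is never a spoken word itself.
        values[key] = word
    return values


result = recognize("Background Blue")
print(result["Target"], result["Color"])  # Background Blue
print(recognize("Target Blue"))           # None
```

The second call fails for exactly the reason given above: "Target" is a key for looking up part of the result, not a word in the Grammar.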

Where Next?

Now that we understand the basics of building a GrammarBuilder, we'll need to build a Grammar and recognize it. We'll look at how to do that when I get time to continue this series.
Dee Jay, Part 1: Decorating, composing, or encompassing?
Posted on Saturday, November 15, 2008 4:30 PM | .NET, M-SAPI
Copyright © Martin L. Shoemaker