Panlingua, by Chaumont Devin, May 10, 1998.

Chapter 6, Matching.

Basic Matching in Panlingua.

Recall that a thought is a verb and its dependents. In Panlingua, rather than matching texts character-by-character and word-by-word, entire phrases and thoughts can be matched to see if they mean the same thing.

Suppose we have a knowledge base represented entirely in Panlingua, and that it runs something like this:

Eggs cost $1.50 per dozen.
Coffee costs $1.50 per pound.
TV dinners cost $2.00 each.
Ice cream costs $2.50 per half gallon.
Fresh bread costs $1.50 per loaf.

What would this look like in Panlingua?

Recall that every word in Panlingua has a synlink and a lexlink. Let us disregard the lexlinks temporarily and focus upon the synlinks. Recall that in the syntactic plane the nodes and links of Panlingua have a Tinkertoy appearance. The Tinkertoy sticks correspond to synlinks, and the wheels correspond to words, or nodes. Every one of the sentences given above is a thought--namely a verb and its dependents. The sibling dependents all link together horizontally and the first dependent links vertically to the verb, creating the typical Panlingua down-and-right-branching effect. The general pattern is a series of L shapes, one L for each thought, and with the verb of the thought at the top of each L.

But what about the verbs at the tops of the L formations? They have to have synlinks too. Their regent may be a real word or it may be a dummy word placed to fill this requirement at the head of the array. And because all the verbs of the structure are of the same rank (all of them are dependents of the same regent), they are all siblings, which means that they will be joined by horizontal synlinks. As a result, Panlingua representations of long strings of sentences like the above would appear to have a verbal backbone with the dependents of these verbs hanging beneath them.
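The down-and-right-branching layout just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name Atom, the field names, the synlink type labels ("subject", "object", "root") and the helper functions are all invented here, a plain string stands in for the semnod a lexlink would point to, and each price phrase is collapsed into a single atom.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Atom:
    semnod: str                          # stand-in for the lexlink target
    syntype: str                         # synlink type (labels here are invented)
    first_dep: Optional["Atom"] = None   # vertical stick down to the first dependent
    next_sib: Optional["Atom"] = None    # horizontal stick to the next sibling


def chain(*atoms: Atom) -> Optional[Atom]:
    """Join atoms left to right as siblings and return the leftmost one."""
    for left, right in zip(atoms, atoms[1:]):
        left.next_sib = right
    return atoms[0] if atoms else None


def siblings(atom: Optional[Atom]) -> Iterator[Atom]:
    """Walk a horizontal run of siblings starting at atom."""
    while atom is not None:
        yield atom
        atom = atom.next_sib


def thought(verb: str, *deps: Atom) -> Atom:
    """One L shape: a verb with its dependents hanging beneath it."""
    return Atom(verb, "declarative", first_dep=chain(*deps))


# Two of the sentences above; each price is collapsed into one atom for brevity.
eggs = thought("cost", Atom("eggs", "subject"), Atom("$1.50 per dozen", "object"))
coffee = thought("cost", Atom("coffee", "subject"), Atom("$1.50 per pound", "object"))

# A dummy regent heads the array; the verbs beneath it form the backbone.
root = Atom("<dummy>", "root", first_dep=chain(eggs, coffee))

backbone = [verb.semnod for verb in siblings(root.first_dep)]
coffee_deps = [dep.semnod for dep in siblings(coffee.first_dep)]
```

Walking root.first_dep with siblings() visits the verbs one after another, which is the "verbal backbone"; walking a verb's first_dep visits that verb's own dependents.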

Now suppose that this Panlingua representation is held in an artificial intelligence that can interact with a user, and that the user asks the AI, "What does coffee cost?"

First the AI will parse the English entry to obtain yet another Panlingua representation. Then the AI will examine the Panlingua results and "see" that the newly-parsed structure represents a query. Then the AI will change the synlink type of the parsed query from "complex interrogative" to "declarative," so that no mismatch will occur because the sentences represented in the knowledge base are declarative. It will also change the lexlink of the word representing "what" so that it links to a wildcard semnod that will match anything. Once this has happened, both the query and the knowledge base are in appropriate Panlingua representation, and Panlingua matching can begin. The AI will take the verb of the query and compare it to the first verb of the knowledge base to see if these two atoms match. If they do not, the AI will skip ahead to the next verb (which is linked to the first by a synlink), and see if that one will match, and so on until a match is found or the AI reaches the end of the Panlingua representation. But if they do match (as they must, since both will be "cost"), it will then compare the dependents of both verbs to see if a match can be found among the dependents of the verb in the knowledge base for every dependent of the verb of the query. A Panlingua match occurs whenever a regent and all its dependents can be matched to another regent and its dependents without regard for the order in which the dependents occur.

If the search fails, the AI will respond with an "I don't know." If it succeeds, the AI will pass the subtree that matched the wildcard semnod, which will be the price, to the text generator, and the system will say, "$1.50 per pound."
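The search just described might be sketched as follows. This is a minimal sketch, not a definitive implementation: the names Atom, WILDCARD, atoms_match, match and find are invented for illustration, dependents are held in a Python list rather than linked by horizontal synlinks, the query is assumed to have already been converted to declarative form with the wildcard in place, and dependents are paired greedily with no backtracking.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WILDCARD = "*"  # hypothetical name for the wildcard semnod


@dataclass
class Atom:
    semnod: str
    syntype: str
    deps: List["Atom"] = field(default_factory=list)


def atoms_match(q: Atom, k: Atom) -> bool:
    """Two atoms match on equal synlink type and equal (or wildcard) semnod."""
    return q.syntype == k.syntype and q.semnod in (WILDCARD, k.semnod)


def match(q: Atom, k: Atom, bound: List[Atom]) -> bool:
    """True if q matches k; atoms of k that met a wildcard are added to bound."""
    if not atoms_match(q, k):
        return False
    found: List[Atom] = [k] if q.semnod == WILDCARD else []
    unused = list(k.deps)
    for qd in q.deps:                      # order of dependents is irrelevant
        for i, kd in enumerate(unused):
            trial: List[Atom] = []
            if match(qd, kd, trial):
                found.extend(trial)
                del unused[i]              # each dependent of k is used once
                break
        else:
            return False                   # a dependent of q found no partner
    bound.extend(found)
    return True


def find(query: Atom, verbs: List[Atom]) -> Optional[List[Atom]]:
    """Skip from verb to verb; return the wildcard bindings of the first match."""
    for verb in verbs:
        bound: List[Atom] = []
        if match(query, verb, bound):
            return bound
    return None


def priced(item: str, price: str) -> Atom:
    return Atom("cost", "declarative",
                [Atom(item, "subject"), Atom(price, "object")])


kb = [priced("eggs", "$1.50 per dozen"),
      priced("coffee", "$1.50 per pound"),
      priced("fresh bread", "$1.50 per loaf")]

# The coffee query, with its dependents deliberately in a different order.
coffee_query = Atom("cost", "declarative",
                    [Atom(WILDCARD, "object"), Atom("coffee", "subject")])
tea_query = Atom("cost", "declarative",
                 [Atom(WILDCARD, "object"), Atom("tea", "subject")])

coffee_answer = find(coffee_query, kb)   # bindings for the wildcard
tea_answer = find(tea_query, kb)         # None, i.e. "I don't know."
```

A text generator would receive the bound atom (here the price of coffee); a result of None corresponds to the "I don't know" reply.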

Needless to say, this kind of searching is exceedingly fast because instead of having to muddle through texts one character at a time, the AI can skip along from verb to verb to verb. Furthermore, instead of matching the characters of each word, it need only check to see if the semnod to which two words are linked is the same and whether the synlink types of the two words are the same.

Of course this is only a simple example designed to provide a basic idea of how Panlingua matching works. In a real system special algorithms would be in place to interpret queries like, "How much are eggs?" in such a way that the system would make sure a price was returned (since "how much" in this case would be assumed to mean a price), etc.

But it will be seen that even at the most basic level, Panlingua has killed two old birds with one stone. Not only can Panlingua search much faster, but it is also able to match on a thought rather than a word level, so that thoughts expressed differently will still match as long as their meanings are the same.

Fuzzy matching.

But what if, instead of having the AI say "I don't know" every time a less-than-perfect match is found, you would like it to give you more information? Panlingua provides several ways for determining fuzzy values. Suppose you asked the same AI, "What does day-old bread cost?" and the AI found that there was no entry for "day-old bread," but only one for "fresh bread." Using a simple matching function, instead of returning 0, meaning "perfect match," the function might return 1, meaning "one miss." And when compared with the returns on all the other matches, this value of 1 might turn out to be the lowest. The AI might then respond, "I don't know, but fresh bread costs $1.50 per loaf." This is a simple example of fuzzy matching.

To convert the fuzzy value obtained above to a percentage, divide the 1 by the number of atoms in the query, which happens to parse to four, and multiply by 100. The result is a 25% mismatch, or equivalently a 75% match.
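The miss count and the percentage derived from it can be worked through in code. The sketch below rests on several assumptions: the names Atom, size, misses and miss_percent are invented, "fresh" and "day-old" are modelled as dependents of "bread" with a made-up synlink type "modifier", and dependents are paired greedily by lowest miss count.

```python
from dataclasses import dataclass, field
from typing import List

WILDCARD = "*"  # hypothetical wildcard semnod


@dataclass
class Atom:
    semnod: str
    syntype: str
    deps: List["Atom"] = field(default_factory=list)


def size(a: Atom) -> int:
    """Number of atoms in the subtree headed by a."""
    return 1 + sum(size(d) for d in a.deps)


def atoms_match(q: Atom, k: Atom) -> bool:
    return q.syntype == k.syntype and q.semnod in (WILDCARD, k.semnod)


def misses(q: Atom, k: Atom) -> int:
    """Atoms of q with no counterpart in k; 0 means a perfect match."""
    if not atoms_match(q, k):
        return size(q)                      # the whole subtree is missed
    unused = list(k.deps)
    total = 0
    for qd in q.deps:
        scores = [misses(qd, kd) for kd in unused]
        if scores and min(scores) < size(qd):
            best = scores.index(min(scores))
            total += scores[best]           # partial or perfect partner found
            del unused[best]
        else:
            total += size(qd)               # no partner at all
    return total


def miss_percent(q: Atom, k: Atom) -> float:
    """Misses as a percentage of the atoms in the query."""
    return 100 * misses(q, k) / size(q)


price = lambda text: Atom(text, "object")
fresh_bread = Atom("cost", "declarative", [
    Atom("bread", "subject", [Atom("fresh", "modifier")]),
    price("$1.50 per loaf")])
eggs = Atom("cost", "declarative", [Atom("eggs", "subject"),
                                    price("$1.50 per dozen")])

# "What does day-old bread cost?" as four atoms: cost, what, bread, day-old.
query = Atom("cost", "declarative", [
    Atom(WILDCARD, "object"),
    Atom("bread", "subject", [Atom("day-old", "modifier")])])

bread_misses = misses(query, fresh_bread)
egg_misses = misses(query, eggs)
bread_percent = miss_percent(query, fresh_bread)
```

Under these assumptions the fresh bread entry scores one miss out of four query atoms, lower than the egg entry, so it is the entry the AI would offer in its hedged reply.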

More refined matching algorithms can be written to weight their mismatches according to rank. In other words, make each dependent missed count as 2, but make each dependent of a dependent missed count only for 1, etc. This exact approach is not recommended, but is provided only as an example.
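As a rough illustration of rank weighting, and with the same caveat that this exact scheme is only an example, a missed dependent of the verb can be charged 2 and anything deeper charged 1. The names Atom, weight, subtree_cost and weighted_misses are invented, and the charge of 3 for a mismatched regent is purely an assumption needed to make the function total.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    semnod: str
    syntype: str
    deps: List["Atom"] = field(default_factory=list)


def weight(depth: int) -> int:
    """Depth 1 (dependent of the verb) costs 2, deeper atoms cost 1.
    Depth 0 (the regent itself) costs 3, which is an assumption."""
    return max(1, 3 - depth)


def subtree_cost(a: Atom, depth: int) -> int:
    """Cost of missing a and everything beneath it."""
    return weight(depth) + sum(subtree_cost(d, depth + 1) for d in a.deps)


def same(q: Atom, k: Atom) -> bool:
    return q.syntype == k.syntype and q.semnod == k.semnod


def weighted_misses(q: Atom, k: Atom, depth: int = 0) -> int:
    """Like a plain miss count, but each miss is weighted by its rank."""
    if not same(q, k):
        return subtree_cost(q, depth)
    unused = list(k.deps)
    total = 0
    for qd in q.deps:
        worst = subtree_cost(qd, depth + 1)
        scores = [weighted_misses(qd, kd, depth + 1) for kd in unused]
        if scores and min(scores) < worst:
            best = scores.index(min(scores))
            total += scores[best]
            del unused[best]
        else:
            total += worst
    return total


bread_entry = Atom("cost", "declarative", [
    Atom("bread", "subject", [Atom("fresh", "modifier")]),
    Atom("$1.50 per loaf", "object")])
coffee_entry = Atom("cost", "declarative", [
    Atom("coffee", "subject"),
    Atom("$1.50 per pound", "object")])

# "Day-old bread costs ..." with the price left out of the comparison.
query = Atom("cost", "declarative", [
    Atom("bread", "subject", [Atom("day-old", "modifier")])])

near = weighted_misses(query, bread_entry)    # only "day-old" is missed
far = weighted_misses(query, coffee_entry)    # "bread" and "day-old" are missed
```

Missing only the modifier costs 1, while missing the subject together with its modifier costs 2 + 1 = 3, so mismatches nearer the verb are penalized more heavily.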

Yet another kind of fuzzy matching might take into account and weight ontological relationships. For example, suppose biscuits and bread both have hypernym links to "bakery products" in the ontology. Then if the user were to ask, "What do biscuits cost?" instead of just answering, "I don't know," it might answer, "I don't know, but fresh bread costs $1.50 per loaf." Etc.

Template Matching.

Another important kind of Panlingua matching is what I will here call "template matching." Using Panlingua it is possible to create templates against which many sentences can match. This is done again by matching in conjunction with the ontology. As an example, suppose we have a string of Panlingua templates, one of which is:

Truck drivers drive trucks.

And suppose further that in the ontology we have an entry:

Sam is a truck_driver

and we want "Sam drives trucks" to match "Truck drivers drive trucks." A matching function can do this if it is designed for template matching, because not only does it check to see if the semnods to which two atoms link are the same, but if they are not the same then it uses the ontology to see if the semnod to which the atom in the template is linked is a hypernym of the semnod to which the query word is linked. In this kind of matching, any word in structure A matches the corresponding word in structure B if the word in structure A is linked to the same semnod as the word in structure B, or else if the word in structure A is a hyponym of the corresponding word in structure B.

This kind of matching is very useful for searches into generic references in order to see if a sentence is of a recognized type or pattern because a few template representations will be able to match many test sentences.
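A template matcher of this kind might be sketched as below. Everything named here is an assumption made for illustration: the ontology is reduced to a dictionary HYPERNYM mapping each semnod to a single hypernym, singular and plural forms are taken to share a semnod, and the names Atom, is_or_under and template_match are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Toy ontology: each semnod points to one hypernym.
# "Sam is a truck_driver" comes from the text; the second entry is made up.
HYPERNYM: Dict[str, str] = {"Sam": "truck_driver", "truck_driver": "driver"}


@dataclass
class Atom:
    semnod: str
    syntype: str
    deps: List["Atom"] = field(default_factory=list)


def is_or_under(semnod: Optional[str], target: str) -> bool:
    """True if semnod is target or has target somewhere up its hypernym chain."""
    while semnod is not None:
        if semnod == target:
            return True
        semnod = HYPERNYM.get(semnod)
    return False


def template_match(q: Atom, t: Atom) -> bool:
    """q matches template t if each atom of q equals, or is a hyponym of,
    the corresponding atom of t, with dependents taken in any order."""
    if q.syntype != t.syntype or not is_or_under(q.semnod, t.semnod):
        return False
    unused = list(t.deps)
    for qd in q.deps:
        for i, td in enumerate(unused):
            if template_match(qd, td):
                del unused[i]
                break
        else:
            return False
    return True


# Template: "Truck drivers drive trucks."
template = Atom("drive", "declarative", [
    Atom("truck_driver", "subject"), Atom("truck", "object")])
# Test sentence: "Sam drives trucks."
sam = Atom("drive", "declarative", [
    Atom("Sam", "subject"), Atom("truck", "object")])
# A sentence about someone the ontology knows nothing about.
pat = Atom("drive", "declarative", [
    Atom("Pat", "subject"), Atom("truck", "object")])

sam_fits = template_match(sam, template)
pat_fits = template_match(pat, template)
reverse_fits = template_match(template, sam)
```

Only the hypernym direction is followed, so the sentence about Sam fits the template while the template does not fit the sentence about Sam.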

Matching Direction.

It is important to notice that Panlingua matching is directional. Thus the fact that sentence A matches sentence B does not necessarily mean that sentence B will match sentence A. As an example, "She parked the car" will match "She parked the car in the garage," but "She parked the car in the garage" will not match "She parked the car." This is because the fundamental matching rule for Panlingua reads as follows:

Subtree A matches subtree B if the regent of subtree A matches the regent of subtree B and the dependents of subtree A can all be matched among the dependents of subtree B.

And this rule must be applied recursively whenever dependents of dependents occur.
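The rule and its recursive application can be checked against the parking example. The sketch below is one possible reading: the names Atom and matches are invented, the synlink type labels are made up, "in the garage" is modelled as a dependent "in" with its own dependent "garage", and dependents are paired greedily.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    semnod: str
    syntype: str
    deps: List["Atom"] = field(default_factory=list)


def matches(a: Atom, b: Atom) -> bool:
    """Subtree a matches subtree b if the regents match and every dependent
    of a can be matched, recursively, among the dependents of b."""
    if a.semnod != b.semnod or a.syntype != b.syntype:
        return False
    unused = list(b.deps)
    for ad in a.deps:
        for i, bd in enumerate(unused):
            if matches(ad, bd):            # recursion handles deeper levels
                del unused[i]
                break
        else:
            return False                   # a has something b lacks
    return True                            # extra dependents of b are ignored


def parked(*extra: Atom) -> Atom:
    return Atom("park", "declarative",
                [Atom("she", "subject"), Atom("car", "object"), *extra])


short = parked()                                         # She parked the car.
long_ = parked(Atom("in", "location",
                    [Atom("garage", "object")]))         # ... in the garage.
other = parked(Atom("in", "location",
                    [Atom("street", "object")]))         # ... in the street.

short_in_long = matches(short, long_)
long_in_short = matches(long_, short)
long_in_other = matches(long_, other)
```

The shorter sentence matches the longer one but not the reverse, and the recursive step is what rejects the street sentence, whose dependents agree at the first level but differ one level down.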

Ignoring Markers.

Many Panlingua structures may require marker atoms to indicate things like tone of voice, emphasis, what to do in case of a match, etc. These can all be safely ignored where simple queries are being matched to knowledge bases because the queries will be free of these markers though the knowledge base may have them. Recall that in Panlingua matching, if structure B has all the dependents of structure A and more, then A will match B, because all that is required is that what is in A be found with the same syntax in B.

But care must be taken in cases where structure A may contain special dependents not present in structure B. For applications involving such cases special code will have to be written so that these additional dependents will be ignored when they are not part of the comparison to be performed.

Further Confirmation.

These matching properties, so humanlike in character, provide striking confirmation for the validity of our deductions about the existence and structural features of Panlingua.

Further research.

Needless to say, if even these cursory examinations of the matching properties of Panlingua can yield so much, then there may be astonishing new discoveries to be made in the areas of Panlingua-based matching and inference processing.