Panlingua, by Chaumont Devin, May 10, 1998.
Chapter 7, Panlingua Data Structures.
Any Panlingua-based system must have an ontology because Panlingua cannot work without one. Every Panlingua word is linked by a lexlink to a semnod in the ontology, and many Panlingua operations require the information contained in the semlinks of the ontology. And if a Panlingua-based system is to communicate with the outside world at all, then it must have a lexicon, because it is only by means of the lexicon that the atoms of Panlingua can be converted to the symbols of the outside world. These two structures will be basic, then, to almost any system involving Panlingua. But neither of these structures itself uses Panlingua.
But besides these two critical data structures, there are many more that are of importance to Panlingua-based systems, all of them represented in Panlingua. In this chapter I will attempt an examination of some of them.
The General Reference.
There are many things a fully functional linguistic apparatus needs to know that require more than a binary relation to be defined. An ontology can tell us simple things, like, "Roses are red," because only two words are involved, and the verb is a common ontological operator (semlink type). But for sentences like, "Soldiers shoot guns," "The sun rises in the east," etc., several binary relations will be required. These trivial facts about the world that every good computer must know should be stored in a Panlingua reference of the template variety. Recall that in Chapter 6 I described such Panlingua references under the heading, "Template Matching." Furthermore, this general reference should be handled dynamically in such a way that the system can learn. Thus, for example, if the general reference contains the sentence:
The sun rises in the east.
and later the system receives from a reliable source the information that:
Heavenly bodies rise in the east.
and a check of the ontology reveals that the sun is a heavenly body, then the original sentence should be upgraded to:
Heavenly bodies rise in the east.
and all versions of this sentence in the general reference that match the upgraded entry should be destroyed. Then, supposing there was no original entry stating that:
The moon rises in the east.
now, provided the ontology has an entry for
moon isa heavenly_body
the system will automatically know that the moon also rises in the east without being told.
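The upgrade-and-inherit behavior just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Panlingua implementation: each fact is flattened to a (subject, verb, place) tuple instead of a real Panlingua structure, the ontology is reduced to a dictionary of isa links, and the names ISA, is_a, subsumes, learn, and known are all hypothetical.

```python
# Hypothetical toy ontology: each key "isa" its value.
ISA = {
    "sun": "heavenly_body",
    "moon": "heavenly_body",
}

def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following isa links."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        term = ISA.get(term)
    return False

def subsumes(general, specific):
    """A template subsumes a fact when every slot is equal or more general."""
    return len(general) == len(specific) and all(
        is_a(s, g) for g, s in zip(general, specific)
    )

def learn(reference, new_fact):
    """Add new_fact and destroy the entries it makes redundant."""
    if any(subsumes(old, new_fact) for old in reference):
        return reference  # already covered by an existing template
    kept = [old for old in reference if not subsumes(new_fact, old)]
    return kept + [new_fact]

def known(reference, query):
    """True if some template in the reference covers the query."""
    return any(subsumes(template, query) for template in reference)

reference = [("sun", "rise", "east")]
reference = learn(reference, ("heavenly_body", "rise", "east"))
moon_known = known(reference, ("moon", "rise", "east"))
```

After the call to learn, the reference holds only the heavenly_body entry, and the question about the moon is answered through the ontology without any entry for the moon having been stored.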
Discourse Logs.
A Panlingua log should be kept of the last few sentences that have been communicated in discourse with each user and for each text or Panlingua structure being output or read. The system will use these to resolve problems of discourse such as what the conversation is about, who the last male subject was, etc.
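A discourse log of this kind might be sketched as a small bounded buffer. The sketch below is an assumption-laden illustration: the class DiscourseLog, its methods, and the idea of storing a subject, gender, and topic per sentence are all hypothetical stand-ins for whatever a real system would extract from the logged Panlingua structures.

```python
from collections import deque

class DiscourseLog:
    """Keeps only the last few sentences of one conversation or text."""

    def __init__(self, max_sentences=5):
        # A deque with maxlen silently drops the oldest entry when full.
        self.sentences = deque(maxlen=max_sentences)

    def record(self, subject, gender, topic):
        self.sentences.append(
            {"subject": subject, "gender": gender, "topic": topic}
        )

    def last_subject(self, gender):
        """Most recent subject of the given gender, or None if forgotten."""
        for entry in reversed(self.sentences):
            if entry["gender"] == gender:
                return entry["subject"]
        return None

    def current_topic(self):
        return self.sentences[-1]["topic"] if self.sentences else None

log = DiscourseLog(max_sentences=2)
log.record("John", "male", "fishing")
log.record("Mary", "female", "fishing")
who_is_he = log.last_subject("male")
```

One such log would be kept per user and per text, for example in a dictionary keyed by user name. Because the buffer is bounded, a subject mentioned too long ago simply can no longer be resolved.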
In addition to the general reference, the system may require specific encyclopedic information about places, people, events, and other things. These references can take various forms, including large files on disk. Not all the information the system can get at through them need be represented in Panlingua. For example, it may be desirable to display copies of original texts, graphics, sound and video clips, etc. Or when certain topics are referenced it may be desirable to have the system invoke some dedicated interactive process. All these things can easily be done in Panlingua by adding various marker atoms to the Panlingua structures of the reference. Instead of pointing to a semnod, the semnod identifier field of such atoms could be used to identify other data files, programs, etc.
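One way to picture such marker atoms is sketched below. Everything in it is hypothetical: the Atom class, the kind field that distinguishes ordinary atoms from markers, the RESOURCES table, and the file and program names are invented for illustration and do not describe the actual atom layout.

```python
from dataclasses import dataclass

# Hypothetical atom kinds; a real atom format may encode this differently.
SEMNOD, FILE, PROGRAM = "semnod", "file", "program"

@dataclass(frozen=True)
class Atom:
    kind: str   # ordinary atom or marker atom
    ident: int  # semnod identifier, reused by markers as a resource number

# Hypothetical table mapping marker identifiers to outside resources.
RESOURCES = {
    (FILE, 1): "original_text.txt",
    (PROGRAM, 1): "interactive_tour",
}

def resolve(atom):
    """Return what the identifier field of this atom refers to."""
    if atom.kind == SEMNOD:
        return ("semnod", atom.ident)
    return (atom.kind, RESOURCES[(atom.kind, atom.ident)])

ordinary = resolve(Atom(SEMNOD, 42))
marker = resolve(Atom(FILE, 1))
```

The point of the sketch is only that the same identifier field serves two purposes, so the rest of the structure need not change when a marker is added.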
An idiom reference would contain such idioms as, "Kick the bucket," "Turn green," etc. The parsing process would then go like this:
1. Parse into Panlingua.
2. Check to see whether the parsed subtree is an idiom.
3. Skip ahead if it is not.
4. Replace the idiom with its true meaning if it is, and mark the subtree to show that in the original text an idiom was employed.
And why can't this idiom check be made before parsing? Simple. "Raining cats and dogs," "Raining bloody cats and dogs," and "Raining cats and blankety-blank dogs" all contain the same idiom, which cannot be matched at a word level but only at a thought level, which requires that the match be made in Panlingua.
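The difference between word-level and thought-level matching can be illustrated with a rough sketch. It assumes, purely for illustration, that a parsed subtree is a (word, dependents) pair and that an idiom template matches whenever its own words and dependents are present, ignoring any extra modifiers. The names IDIOM, MEANING, matches, and resolve_idiom are hypothetical, and the "<idiom-marker>" dependent stands in for whatever marker a real system would attach.

```python
# A subtree is (word, [dependent subtrees]); this shape is an assumption.
IDIOM = ("rain", [("cat", []), ("dog", [])])
MEANING = ("rain", [("hard", [("<idiom-marker>", [])])])

def matches(template, tree):
    """True if every part of the template is found in the tree.

    Extra dependents in the tree, such as an inserted modifier,
    are ignored, which is what a word-by-word comparison cannot do.
    """
    t_word, t_deps = template
    word, deps = tree
    if t_word != word:
        return False
    return all(any(matches(t, d) for d in deps) for t in t_deps)

def resolve_idiom(tree):
    """Steps 2 to 4: replace a matching subtree, else leave it alone."""
    return MEANING if matches(IDIOM, tree) else tree

plain = ("rain", [("cat", []), ("dog", [])])
emphatic = ("rain", [("cat", [("bloody", [])]), ("dog", [])])
literal = ("rain", [("street", [])])
```

Both plain and emphatic resolve to the same marked meaning, while literal passes through unchanged. A fuller version would also have to decide what to do with the dropped modifier, which this sketch simply discards.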
Most people agree that something--namely alternate meanings when ambiguities occur--is lost in translation. However, anything can be marked. For example, "It was raining (idiom originally used for next word) hard." This can be accomplished using a marker dependent in Panlingua which retains the information that the original version used an idiom instead of a single word.
In fact anything that can be said can be represented in Panlingua, and once in Panlingua this information can be scanned at high speeds, and searching can be done with high selectivity and accuracy. Not only this, but for the computers of the future, many processors can be assigned, each one to a different Panlingua structure, so that split-second searches can be done over truly vast amounts of information.
But in the meantime, even with the personal computers of today, Panlingua implementations whose atoms are only seven bytes long can handle many billions of different semnods and words. As an example, to record a simple fact such as, "Soldiers shoot guns," requires only 21 bytes of computer memory (three atoms of seven bytes each), which means that nearly fifty thousand such facts could be stored in just one megabyte of RAM. And because many such items will be stored as templates, each one might match many sentences. For example, if the general reference says that "Soldiers shoot guns," and the ontology tells us that a corporal is a soldier, then it will be known by the system that corporals shoot guns. So the potential of Panlingua-based systems is indeed very large, and it is doubtful whether other types of systems could emulate this kind of performance.
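The storage figures above can be checked with a few lines of arithmetic. The only assumptions are that the three-word fact takes one seven-byte atom per word, which is what the 21-byte figure implies, and that a megabyte is taken as 1024 * 1024 bytes; the constant names are hypothetical.

```python
ATOM_BYTES = 7        # atom size stated in the text
ATOMS_PER_FACT = 3    # assumed: one atom per word of "Soldiers shoot guns"
MEGABYTE = 1024 * 1024

fact_bytes = ATOM_BYTES * ATOMS_PER_FACT
facts_per_megabyte = MEGABYTE // fact_bytes

print(fact_bytes)          # 21
print(facts_per_megabyte)  # 49932
```

With a decimal megabyte of 1,000,000 bytes the count comes to 47,619, so "nearly fifty thousand" holds under either convention.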
I must point out that all these facts about Panlingua-based systems (their versatility, speed, highly selective matching, and the sheer amount of information that can be stored) serve only to confirm what I have already asserted, namely that:
Panlingua is the unique basis upon which all truly functional automated linguistic systems must be built. All refinements of automated linguistic systems must converge upon Panlingua. No automated linguistic system can ever really be made to work without Panlingua. Etc.
And these are not just idle claims.