Disk Horse – an interactive conversation program

 

What should it do? – It should be fed simple conversations and be able to extract data from those conversations.

 

Example:  I tell the system, “I have a red car.”

 

From that statement there are many questions that the system should be able to answer.

 

1) Do I have a red car? – Yes.

2) What color is my car? – Red.

3) Is my car red? – Yes.

4) Is my car green? – No.

5) Do I have a red car? – Yes.

6) What do I have? – You have a red car.

7) Who has a red car? – You do. You have a red car.

8) Who has a green car? – I don’t know.

9) Is my car fast? – I don’t know. You did not say anything about fast.

10) Do I have a fast red car? – I don’t know. You have a red car, but I don’t know whether it is fast or slow.

 

I propose the following:

 

Understanding a conversation (and being able to converse) means being able to answer questions similar to the ten above once you have been fed a first statement. The first statement is cracked into some internal data storage, a database, and all the following statements are cracked into queries on that database. However, I will define understanding as the ability to converse, independent of the internal storage format.
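
To make the "cracked into a database" picture concrete, here is a minimal sketch, assuming an invented representation of facts as (owner, attribute, object) triples. All the names here (add_fact, has_fact, attribute_of) are illustrative, not a committed design; the point is only that the example questions become trivial lookups once the statement is stored.

```python
# Hypothetical internal storage: a set of (owner, attribute, object) facts.
facts = set()

def add_fact(owner, attribute, obj):
    facts.add((owner, attribute, obj))

def has_fact(owner, attribute, obj):
    # Answers yes/no questions like "Do I have a red car?"
    return (owner, attribute, obj) in facts

def attribute_of(owner, obj):
    # Answers "What color is my car?" -- returns the stored attribute,
    # or None if nothing was ever said about that object.
    for o, attr, thing in facts:
        if o == owner and thing == obj:
            return attr
    return None

# "I have a red car."
add_fact("you", "red", "car")

print(has_fact("you", "red", "car"))    # Q1: Do I have a red car? -> True
print(attribute_of("you", "car"))       # Q2: What color is my car? -> red
print(has_fact("you", "green", "car"))  # Q4: Is my car green? -> False
```

Note that this already hints at the distinction in questions 4 and 9: a lookup that fails could mean either "no" or "I don't know," and telling those apart needs class knowledge (red and green are both colors; fast is not).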

 

I believe that the first step to understanding speech is to collect a data set similar to the above, i.e., mini-conversations that display a logical or plausible set of conclusions that one should be able to infer from a small set of input statements.
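
One possible encoding of such a mini-conversation, as a sketch: a list of input statements plus the question/answer pairs a competent system should be able to derive from them. The field names are an assumption for illustration, not a fixed format.

```python
# A mini-conversation: statements fed in, and the Q/A pairs that
# should follow from them. This doubles as training data and test suite.
mini_conversation = {
    "statements": ["I have a red car."],
    "qa": [
        ("Do I have a red car?", "Yes"),
        ("What color is my car?", "red"),
        ("Is my car green?", "No"),
        ("Is my car fast?", "I don't know."),
    ],
}

def as_test_cases(convo):
    """Flatten a mini-conversation into (statements, question, expected) triples."""
    return [(convo["statements"], q, a) for q, a in convo["qa"]]

print(len(as_test_cases(mini_conversation)))  # 4 test cases from one statement
```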

 

What is the nature of a program that can do the discourse? I believe that it will have something similar to parsing so that it can extract sentence structure. It will need to have persistent state and be able to distinguish persistent from transient data. For example, it needs to remember that red is a color from session to session, but it should not necessarily remember that I have a red car. It probably needs to be able to classify: car is a noun, red is an adjective, red is also a color, cars are things that move, people ride in them, you can only fit about six people into a car at one time, they are too heavy for a person to lift, etc. In the process of discourse I will probably mix meta-information in with the information. By that I mean I will make statements intended to help the system classify. For example, “Red is a color,” or “Fast is the opposite of slow.”
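
The persistent/transient split can be sketched as two stores with different lifetimes, one cleared at the start of every session. Again, the names (persistent, transient, tell, new_session) are inventions for the example.

```python
# Class-level knowledge ("red is a color") survives between sessions;
# instance facts ("I have a red car") do not.
persistent = {"red": {"color", "adjective"}, "car": {"noun", "vehicle"}}
transient = set()   # cleared at the start of every session

def tell(fact):
    transient.add(fact)

def new_session():
    transient.clear()

tell(("you", "have", "red car"))
assert ("you", "have", "red car") in transient

new_session()
# The session fact is gone, but the classifications remain.
print(("you", "have", "red car") in transient)  # False
print("color" in persistent["red"])             # True
```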

 

I expect that the system will need to create templates, such as “All X are Y.” It will need to be able to count. However – and this is a big however – I think that the most important thing is that the code necessary to make this all work is probably too complicated for me to write. I postulate that none of the conclusions I listed will be difficult to derive. That is, I believe the total number of instructions executed in order to draw those conclusions may be small and tractable, BUT the total number of paths through the code is probably huge. If I am correct, then the most important aspect of this project is that I must not stoop to writing the code; rather, the design must include the aspect that the system writes its own code. The system that I build is the system that will grovel over the conversation fragments that I give it and attempt to generate the code that is capable of answering the questions.
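
As a toy version of the “All X are Y” template, assuming observations stored as (thing, class) pairs: propose the generalization only when every observed X is also an observed Y. In the real design this rule would be something the system evolves, not something hand-written.

```python
# Observations as (individual, class) pairs -- invented data for the demo.
observations = [("sparrow", "bird"), ("robin", "bird"),
                ("sparrow", "flier"), ("robin", "flier")]

def all_x_are_y(x_class, y_class, obs):
    """True if every individual observed to be an X is also observed to be a Y."""
    xs = {a for a, b in obs if b == x_class}
    ys = {a for a, b in obs if b == y_class}
    return bool(xs) and xs <= ys   # <= is the subset test

print(all_x_are_y("bird", "flier", observations))  # True
```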

 

Lest this be viewed as an intractable problem, let me point out the following model. The process of parsing a computer language can be characterized as follows. You have a huge table of triplets: a current state, an input symbol, and the resulting state. Parsing a sentence to see if it follows the rules of grammar simply involves walking through the input sentence, doing the state transitions from the table, and seeing whether the final state is accept or reject. Is it possible to grow a parser (fill in the table) for a language (FORTRAN) by giving the system a set of test sentences, both legal and not-quite-legal FORTRAN, and have the system create the proper tables? What if you seed the table with known useful genomes – for example, canned routines that parse out legal keywords or legal variable names?
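
The table-walking model in miniature, as a sketch: the grammar, state names, and part-of-speech symbols below are all made up for the demo (a "sentence" is DET ADJ* NOUN), but the machinery is exactly the triplet table described above.

```python
# Transition table: (current state, input symbol) -> resulting state.
# Missing entries mean the transition is illegal.
table = {
    ("start", "DET"): "after_det",
    ("after_det", "ADJ"): "after_det",   # any number of adjectives
    ("after_det", "NOUN"): "accept",
}

def parse(symbols):
    """Walk the table over the input; end anywhere but 'accept' = reject."""
    state = "start"
    for sym in symbols:
        state = table.get((state, sym), "reject")
        if state == "reject":
            break
    return state

print(parse(["DET", "ADJ", "NOUN"]))   # accept
print(parse(["NOUN", "DET"]))          # reject
```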

 

The point is simply this: it is possible to genetically grow data, and data in the form of a table as above can be treated as code. Very simple code (small tables) can do fairly complex stuff, such as parsing. Enormous tables may be able to do something as competent as NLP. It is plausible to believe that with the proper concern for hierarchy it may be possible for the system to tease out the structure (and the meaning) of a natural language. It is plausible to believe that a single word, once recognized, triggers a context – a cluster of related states that may be appropriate for dealing with that particular word. When the next word comes in, it again triggers certain contexts.
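
Here is a hedged sketch of "growing" such a table, using simple hill-climbing mutation rather than a full genetic algorithm: mutate random (state, symbol) -> state entries and keep any table that classifies the test sentences at least as well. The toy language (legal sentences are exactly DET NOUN) and all names are assumptions for the demo.

```python
import random

random.seed(0)   # deterministic demo

STATES = ["start", "mid", "accept", "reject"]
SYMBOLS = ["DET", "NOUN"]
# Test sentences with a legal/illegal verdict, as proposed for FORTRAN above.
TESTS = [(["DET", "NOUN"], True), (["NOUN", "DET"], False),
         (["DET"], False), (["NOUN"], False),
         (["DET", "NOUN", "NOUN"], False)]

def run(table, sentence):
    state = "start"
    for sym in sentence:
        state = table.get((state, sym), "reject")
    return state == "accept"

def fitness(table):
    """How many test sentences does this table classify correctly?"""
    return sum(run(table, s) == ok for s, ok in TESTS)

def mutate(table):
    child = dict(table)
    child[(random.choice(STATES), random.choice(SYMBOLS))] = random.choice(STATES)
    return child

best = {}                               # start from an empty table
for _ in range(2000):
    child = mutate(best)
    if fitness(child) >= fitness(best): # keep equal-or-better tables
        best = child

print(fitness(best), "of", len(TESTS), "test sentences classified correctly")
```

The acceptance rule guarantees the score never falls below the empty table's baseline; whether it reaches a perfect table depends on the random walk, which is exactly the kind of convergence question the rest of this note worries about.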

 

For example, the word “The” at the start of a sentence triggers the context: I have seen a definite article, I am now in the state of creating a noun phrase, I could see an adjective next, I am probably in the process of creating a subject for the sentence, etc.

 

The exact nature of this cluster of states is not particularly important. Think of the state generated by a word not as a single number, but rather as a cloud of points in some abstract state space. As other words come in, they generate their own clouds of state points, and compatible points between the words bind up with one another. Each of those bindings has the possibility of creating other state elements which, from our external viewpoint, may map to concepts such as “that was the noun that completed the noun phrase and is a good candidate for being the subject of the sentence – unless, of course, there was some long subordinate clause at the beginning of this sentence and the subject will be along sometime later.”
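
One concrete (and deliberately crude) reading of the cloud picture, as a sketch: each word contributes a set of candidate role points, and a binding fires when compatible points from different words are both present. Every role name here (det, np_start, subject_candidate, etc.) is an invention for the demo.

```python
# Each word spins off a "cloud" of candidate state points.
CLOUDS = {
    "the": {"det", "np_start"},
    "red": {"adj", "np_continue"},
    "car": {"noun", "np_end", "subject_candidate"},
}

def bind(words):
    """Pool the clouds; emit higher-level states when compatible points meet."""
    seen = set()
    for w in words:
        seen |= CLOUDS.get(w, set())
    bindings = set()
    if "np_start" in seen and "np_end" in seen:
        bindings.add("noun_phrase")          # determiner + closing noun bound up
        if "subject_candidate" in seen:
            bindings.add("subject")          # and the phrase may be the subject
    return bindings

print(sorted(bind(["the", "red", "car"])))   # ['noun_phrase', 'subject']
```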

 

I do not expect the system to be able to articulate the meaning of any of these state points. I also don’t plan to go inside the system and try to read what is in the table. I just feed the system test data and ask it to grovel through possible table values to see if it can find one that allows the entire system to correctly replicate all the conversational elements.

 

I do not expect any kind of magic convergence to happen. If necessary I can jam in required state values (“Look, you haven’t figured it out yet and I am tired of waiting. There are things called ‘nouns,’ there are things called ‘subjects,’ there are things called ‘words.’ A word is either a noun or it is not in the context of a single sentence; it may be either one in different contexts. Within the context of a single sentence I want you to assign a boolean value to the word indicating whether you think it is a noun or not. You must bind those states. I am going to slap you down whenever you fail to bind those states. Furthermore, given any single word, there is a certain percentage based on usage frequency that will hint at which way you should bind those words when you have no other context yet to guide you. I know, you haven’t a clue what I am talking about as I say all this; I am just thinking out loud. I will write some routines that will create named boolean states attached to words, with language probabilities attached to the words. YOU will use those routines as basic genetic material that you can adopt into other contexts for other uses, because I, from the outside, know that this is a pattern that will be useful to you in the processing of language.”)
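
An illustrative version of the routine described in that monologue: attach a named boolean state ("noun") to each word, seeded with a usage-frequency prior that guides the guess when no context is available yet. The probabilities below are invented for the example.

```python
# Hypothetical frequency-based priors: P(word is a noun), absent any context.
NOUN_PRIOR = {"car": 0.95, "red": 0.05, "fast": 0.10, "run": 0.40}

def tag_word(word, prior=NOUN_PRIOR, threshold=0.5):
    """Context-free first pass: (word, is_noun_guess, prior probability)."""
    p = prior.get(word, 0.5)   # unknown words: no opinion either way
    return (word, p >= threshold, p)

print([tag_word(w) for w in ["red", "car"]])
# [('red', False, 0.05), ('car', True, 0.95)]
```

Later context (the binding machinery above) would be free to override these guesses; the prior is only the hint to use when nothing else has bound yet.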

 

My main conviction is this: while any single word may have an enormous amount of information connected to it, the amount of information needed to link up and bind that information is not conceptually complicated; it is merely very context-dependent. This gives me some hope that a genetic scheme might be able to create the state cloud about a word (or a phrase) that can capture the meaning and the usage of the words.

 

As a sentence comes in, it is recognized as a sequence of words that spin off clouds of state, which bind up, giving a graph that captures the meaning of the sentence. The questions are similarly cracked into graphs that are interpreted (by the humans on the outside) as instructions to the system to grovel back over the graph of what was said and re-serialize portions of the graph as natural-language output.
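
A sketch of that final re-serialization step, assuming an invented graph encoding for “you have a red car” and hand-written answer templates (the real system would grow these, not hard-code them):

```python
# Hypothetical graph produced by binding the clouds for "I have a red car."
graph = {"owner": "you", "relation": "have", "object": "car",
         "attributes": ["red"]}

def answer_what_do_i_have(g):
    """Re-serialize the whole graph as natural language (Q6)."""
    attrs = " ".join(g["attributes"])
    return f"You {g['relation']} a {attrs} {g['object']}."

def answer_is_my(g, obj, attr):
    """Answer 'Is my OBJ ATTR?' (Q9). Distinguishing 'No' (green, excluded
    because the car is red) from 'I don't know' (fast, never mentioned)
    needs class knowledge that this toy omits."""
    if g["object"] != obj:
        return "I don't know."
    if attr in g["attributes"]:
        return "Yes."
    return f"I don't know. You did not say anything about {attr}."

print(answer_what_do_i_have(graph))       # You have a red car.
print(answer_is_my(graph, "car", "fast")) # I don't know. You did not say...
```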

 

I am under no illusions that this process will be either quick or easy. I am merely of the opinion that the complexity of NLP – which is true complexity, in the sense that enormous amounts of information, special cases, and exceptions must be handled – is not in any single local context particularly complicated. This gives me hope that even though the final code may be billions of lines long, it could actually run on a current-day computer, because only some small fraction of the code actually needs to be invoked in order to crack any single sentence.

 

I further maintain that even in the event of a total failure in the creation of a successful NLP system, one of the results of this project would be a database of conversations that could be used both as training data and as a test suite for any NLP system, because we would be creating conversational forms that impart information and then immediately test for the acquisition of that information.