Back in the days of Pen Windows this was one of the hot topics. Debating whether "ink" was going to be an important data type in the soon to be multi-billion dollar pen computing biz. In fact at one point in time after I left the pen group and was working in MS research on general compression algorithms, I assigned one of my junior engineers to consult with the pen group on a means of ink compression. Nothing much came of it as I recall. I mean, if the pen group really considered it important they would not have trusted some random dude in the research group do it. I believe we got the project so that some program manager over in the pen group could check off the item - "working on ink compression" - check!
Anyway I thought I'd report on a note taking system that I hacked out since then that does some interesting ink compression and is in my opinion a rather satisfying ink capture application.
You have a blank screen upon which you can draw. Works just like any paint package pencil, you scribble on the screen and it leaves lines. However after each stroke (time from pen down to pen up) the system "cleans up" the stroke you just drew. What this really means is that the system compresses the stroke and replaces the user drawn data with its reconstruction of the compressed data for the stroke.
There are two buttons off to the side. One is labeled "WRONG!" the other labeled "OOPS!" The purpose of the buttons is to allow the user to give additional feedback to the system to tell it how it is doing. The idea is that the default feed back which the user gives by just continuing to draw, is that everything is fine. "Wrong" means the system screwed up and "oops" means that the user screwed up and would like the last stroke to just be erased as if it never happened. The oops feedback in necessary because the system is engaged in learning from the supplied input and if you make a mistake in the input you would like to clear it out rather than have it corrupt the learning process.
The means for compressing strokes is trivial. In fact, I think it is surprisingly trivial, meaning that it did well even in places where I expected the behavior to be less than satisfactory. More on that later.
I keep an array, the dictionary, of normalized strokes. The normalization process is to take the stream of XY data that made up the original stroke, reduce or expand it to exactly 20 XY coordinate points and to scale the points so that they fit in a (-1,1) coordinate system. I do nothing fancy like interpolating between points, I just toss out all but 20 points or duplicate them if necessary to get to 20 points and then scale them.
Whenever the user inputs a new stroke, I normalize it and then look through my stroke dictionary to find the closest thing to the stroke that was just drawn. For the metric function for measuring "closeness" I just use the standard Euclidean distance - root sum of squares of the differences. I replace the ink on the screen showing the user's actual input with the stroke I found in the dictionary. Then the feedback comes.
If the user hits WRONG it means that the element in the dictionary was not a close enough match to what the user input. In that case we just take the original user input, add it to the dictionary (quantized down to the 20 points) and fix up the screen to match the new dictionary element.
If the user hits OOPS they have decided that they made a mistake so you just remove the stroke from the screen and do NO learning.
If the user draws another stroke this is takes as acceptance of the previous stroke, in which case the learning happens. Learning consists of averaging in the latest user input to the dictionary entry. For my system I used a 75% old to 25% new ratio. i.e.
Dic = .75*Dic + .25New
The result is that over time my strokes are averaged together and the results look cleaner than any single example of my handwriting.
The results are surprising. Notice that I do absolutely nothing to locate inflection points or cusps. You would think that sometimes when I draw a 3 the cusp would land on the 10th point and other times would happen to be the 12th point. If you average those together you should get mush that does not look much like a 3. That does not seem to be the case. Perhaps I have just not entered enough data to see this problem yet. On the other hand, if it did look bad, I would just hit the WRONG button and keep a new prototype to clear up the problem.
Is 20 XY points enough? Clearly not if you draw huge long scrawls without ever lifting your pen. 20 seemed to be enough for the sort of drawing I do.
The compression is difficult to measure. I am taking a stroke and reducing it to 4 numbers, the CenterX, CenterY, ScaleFactor, and DictionaryIndex. (Note: I don't have to count the 20 points that made up the stroke in the dictionary because that is a "fixed" overhead. It is part of the compression algorithm.) The typical stroke is 20 points or more so the compression is probably better than 10 to 1.
It is important to note that this is not really an appropriate compression method for a legit data type. By legit I mean one that I can exchange with someone else. My dictionary may not match your dictionary. However it is equally obvious that we could collect a lot of user data, nail down a common dictionary, do no more averaging of input stroke data and turn every WRONG click into an escape mechanism that keeps the newly input data for transfer.
In fact, as far as averaging in the new user data goes you probably want to have some limits on that anyway in the interests of stability. If I have a bunch of ink documents that are all full of dictionary indicies and the dictionary keeps drifting there is no assurance that the stored document will decompress and look like it did when it was stored away. Seems to me that the appropriate fix is to keep track of how often a single dictionary entry is hit and freeze it after it has had about 10 things averaged into it.
The one thing that this method does not do well is the following. Since I do no rotations, only scaling, I find that when I am doing line art, if I draw a line at one angle, it will replace it with one at a slightly different angle. The fix seems to be that you just keep hitting WRONG and build up a good collection of differently angled straight lines.
The other mildly annoying thing is that since the system is not recording starting XY and stopping XY the replacement line does not always exactly match those two things. IF what you are trying to draw is something like a network topology where you are carefully targeting line endpoints and the system is totally ignoring them is can be a bit disconcerting. Fortunately WRONG fixes that as well.
The fact is, the thing I am proudest of in this design is the UI. There is something very satisfying about punching the WRONG! Button. Usually the computer is actually cleaning up what I draw. It make is look nicer. That is really cool. It is replacing my strokes with something smoother. Hey! That is WRONG! As soon as you punch the button the screen reverts to exactly what I drew (well, what I drew quantized down to 20 points) The wrong button is nice and big. The OOPS! Button is small. Oops! The pen skipped and what I drew is crap. It found some random match in the database but I don't want to average that crap in and pollute my dictionary. My mistake. Little button. Oops! I certainly got to the point that I was hitting oops more often that I was hitting wrong but I kept the button size skewed for psychological reasons.
I liked this note taker so well, that I spun my wheels for days wondering whether it was worth learning C and learning how to build plug in modules for the Web browsers so that I could post my line drawings and cool "in my own handwriting" notes to the Web. I did finally answer that question in the negative. Of course it is not worth learning all that shit. Nothing is worth that much pain.
I could of course declare victory and be done with it. I have what I think is a quite reasonable ink capturing system. It would even not be terribly difficult to add the handwriting recognition component on top of this system, though of course it would still suck just like it did before.
However what I think is the next step is to use this as the basis for a more sophisticated recognition package.
As it stands, it is perfectly adequate for writing mathematical equations. In fact, it is better than adequate because it does actually clean up my handwriting. It lets me invent any new notation what so ever and also lets me draw those little pictures of a torus with a hole punched in it. (We mathematicians do that a lot, you know!)
Seems to me that if I am always drawing little pictures of a torus with a hole punched in it, the damn system ought to notice that and draw the whole picture right. I.e. What I have built cleans up single strokes. What I want should notice that I always draw rectangles with 3 strokes always in the same order. (This is the legacy of learning to draw the Kanji for "Mouth" using the proper Japanese stroke order.) I want the system to notice and clean up the higher order structure "rectangle" rather than just cleaning up the components.
When I was in the Pen Windows group I used to whine that the problem that we ought to be working on was 2D parsing. My thinking was that using the pen for the input of text is foolish because the typewriter is superior. What we should be using the pen to input is notations that are inherently 2 dimensional (Text is really one dimensional. One char sequenced after another.) There are many examples of 2D notations, math equations, electrical circuit diagrams, flowcharts, music notation, chemical bond diagrams, etc.
I can't say that I have made any progress on that problem, but I think I have now got the foundation upon which I want to build that system.
Continuous Flow Traffic
Many years ago, watching the Pasadena Police Dept Motorcycle Corp weaving in and out as they drove down the Rose Parade route. I got to wondering if it would not be feasible to eliminate the STOP and GO nature of traffic lights in the city. The idea is this. Instead of having a traffic light at the intersection, you have a moving spot light. As I drive up the the intersection, I try to get the spot light lined up right on the hood of my car. I whiz through the intersection synchronized with the light beam. The car behind me also has a light on his hood and he is forced to leave a gap between his car and mine that is just big enough to allow one car to blaze through the intersection (timed by his own light of course) in the perpendicular direction.
You would have to learn to IGNORE the traffic you see right in front of you and just match the light.
The draw back to this idea of course is that everyone better do it exactly right or you have high speed collisions. You not only have to do it right, you have to trust everyone else to do it exactly right.
So the next idea was to have a more fault tollerant system for synchronizing the flow through the intersection. The idea was this. Imagaine a huge pendulum swinging back and forth across the roadway right in front of the interesetion. We are talking megatonnage big here. Your job is to dirve through when the pendulum is up and out of your way. If you miss time it, BAMO the pendulum smashes you off the road. This means that the guy that screws up is the one that pays the price, instead of some innocent party.
It is a slightly better design, but the cost of getting all that tonnage swinging back and forth seems prohibitive.
It occurs to me that that in the future when we are no longer driving because the computer that is our car is driving itself we might in fact begin to trust the other drivers enough to do continuous flow through the intersections.
Just wanted to point out that when we go to smart cars with reaction times measured in microseconds that chatter back and forth among themselves so that they know exactly who is on the road and where not only do we get the benefit of not having to worry about drunk drivers but also we won't need to stop at intersections any more.
This is the second note I have made concerning smart self-driving vehicles in just a few weeks. Don't exactly know why that is on my brain lately. I suppose the main reason is that that I believe that we could actually build self driving cars today. I do not believe that there is any need to develop any more AI or do anything particularly fancy.
We might need to do a little work to the road way, like paint blue lines down the center of the lanes, The car would center up on the blue line and follow them. The blue lines could be interupted at semi regular intervals, just like the bar codes on products so that the car can actually read where it is, "Next Right is Green Lake Ave. Speed limit 25MPH"
Lane changing can be delt with by haveing criss crossing lines that are the "allowed" places for lane changes.
The remaining issues would seem to be a) knowing the location of the traffic around you. And b) noticing that something does not look right about the road (some dog has wandered onto it) and being able to stop in those situations.
The knowing about the traffic is mostly a matter of getting on the CB radio and talking to those around you.
"Breaker Breaker, This is Lil' Mazada. I'm in lane 2 westbound on the 520 exactly 37 dashed lines from the Lake Washington Blvd turn off. I am currently doing 53 MPH and today I seem to be able to accelerate at about .2 G going forward at this speed and can break at about .7 G. How's it hanging out there? Who's with me."
The communication chanel number could be written onto the road and changed every 50 yards or so so that you are always talking only to the cohort that you are driving with. Perhaps you are thinking from my CB radio analogy that this is some kind of difficult communication. Think of it this way. You can just write a long string of numbers all the way along the blue line and every car that rolls over the blue line is supposed to recite the numbers as it rolls over them.
All you have to do is just listen to your local cohort blather their numbers and you know exactly where everyone is and how fast they are moving. Every once in a while they slip in a string of numbers of where they plan to be going, for lane changes or turns, the digital equivalent of a turn signal or a break light.
The fact is, all the codeing of the speedlimit and the turnoffs etc can be done AFTER you paint the road with random digits. The painter car drives a stretch of road spraying in the random numbers. Then drives the road reading them off and this becomes the map that your car downloads before it hits the road.
None of this strikes me as particularly difficult. I suspect that the biggest problem is that I don't want my smart car driving on the road if there are old non-smart cars out there who won't talk back and tell me where they are.
That is where the dog detecting, sonar system kicks in. "Hey good buddies, I got some large chunk of moving junk up ahead of me here in lane 2. Its doing about 57 MPH and appears to be staying in my lane. I'm giving you a wide margin. You better answer this call NOW or I'm calling 911 on you. Over!"
How many people do we kill each year on the highways that we could fix with about a 1000 dollars per car of processor, memory, radio, sonar and a few tons of blue paint?
So why isn't NASA doing this? Why do we talk about spending billions on super conducting super colliders, and trips to MARS instead of Federal projects that will let me tell my car, "Go pick up the girls at school and bring them straight home." Do any of you have the slightest doubt that this could be designed, implemented, tested, and rolled out in 10 years or less?
Should I go whine to bill that he should start up a project like this?
Over and Out