Marlin’s Gottahack
– 1998 Nov 09
Ink Compression
---------------
Back in the days of Pen Windows this was one of the hot topics: debating whether "ink" was going to be an important data type in the soon-to-be multi-billion-dollar pen computing biz. In fact, at one point after I left the pen group and
was working in MS research on general compression algorithms, I assigned one of
my junior engineers to consult with the pen group on a means of ink
compression. Nothing much came of it as I recall. I mean, if the pen group really considered it important they would not have trusted some random dude in the research group to do it. I believe we got the project so that some program
manager over in the pen group could check off the item - "working on ink
compression" - check!
Anyway, I thought I'd report on a note-taking system that I hacked out since then, one that does some interesting ink compression and is, in my opinion, a rather satisfying ink capture application.
Implementation
--------------
You have a blank screen upon which you can draw. It works just like any paint package's pencil tool: you scribble on the screen and it leaves lines. However, after each stroke (the time from pen down to pen up) the system "cleans up" the stroke you just drew. What this really means is that the system compresses the stroke and replaces the user-drawn data with its reconstruction of the compressed data for the stroke.
There are two buttons off to the side. One is labeled "WRONG!", the other "OOPS!" The purpose of the buttons is to let the user give the system additional feedback on how it is doing. The idea is that the default feedback, which the user gives by just continuing to draw, is that everything is fine. "Wrong" means the system screwed up; "oops" means the user screwed up and would like the last stroke to just be erased as if it never happened. The oops feedback is necessary because the system is learning from the supplied input, and if you make a mistake in the input you would like to clear it out rather than have it corrupt the learning process.
The means for
compressing strokes is trivial. In fact, I think it is surprisingly trivial,
meaning that it did well even in places where I expected the behavior to be
less than satisfactory. More on that later.
I keep an array, the dictionary, of normalized strokes. The normalization process is to take the stream of XY data that made up the original stroke, reduce or expand it to exactly 20 XY coordinate points, and scale the points so that they fit in a (-1,1) coordinate system. I do nothing fancy like interpolating between points; I just toss out all but 20 points, or duplicate them if necessary to get to 20 points, and then scale them.
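Roughly, that normalization step looks like this (a sketch in Python, not the actual code; details such as scaling uniformly about the bounding-box center are my choice here):

    N_POINTS = 20

    def normalize_stroke(points):
        """points: list of (x, y) samples from pen-down to pen-up."""
        # Resample to exactly 20 points by picking evenly spaced indices --
        # drops points on long strokes, duplicates them on short ones,
        # no interpolation.
        n = len(points)
        resampled = [points[(i * n) // N_POINTS] for i in range(N_POINTS)]
        # Scale into the (-1, 1) square.  Here I center on the bounding box
        # and scale both axes by the same factor.
        xs = [x for x, _ in resampled]
        ys = [y for _, y in resampled]
        cx = (max(xs) + min(xs)) / 2.0
        cy = (max(ys) + min(ys)) / 2.0
        scale = max(max(xs) - min(xs), max(ys) - min(ys)) / 2.0 or 1.0
        return [((x - cx) / scale, (y - cy) / scale) for x, y in resampled]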
Whenever the user
inputs a new stroke, I normalize it and then look through my stroke dictionary
to find the closest thing to the stroke that was just drawn. For the metric
function for measuring "closeness" I just use the standard Euclidean
distance - root sum of squares of the differences. I replace the ink on the
screen showing the user's actual input with the stroke I found in the
dictionary. Then the feedback comes.
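The lookup itself is just a linear scan with that metric, something like this (again a sketch; it assumes the dictionary already has at least one entry, the very first stroke simply being added without a lookup):

    import math

    def stroke_distance(a, b):
        # Root sum of squared differences over the 20 normalized points.
        return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                             for (ax, ay), (bx, by) in zip(a, b)))

    def closest_entry(dictionary, stroke):
        # Linear scan for the nearest dictionary stroke.
        index = min(range(len(dictionary)),
                    key=lambda i: stroke_distance(dictionary[i], stroke))
        return index, dictionary[index]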
If the user hits
WRONG it means that the element in the dictionary was not a close enough match
to what the user input. In that case we just take the original user input, add
it to the dictionary (quantized down to the 20 points) and fix up the screen to
match the new dictionary element.
If the user hits
OOPS they have decided that they made a mistake so you just remove the stroke
from the screen and do NO learning.
If the user draws another stroke, this is taken as acceptance of the previous stroke, in which case the learning happens. Learning consists of averaging the latest user input into the dictionary entry. For my system I used a 75% old to 25% new ratio, i.e.

Dic = 0.75*Dic + 0.25*New
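So the three kinds of feedback come down to something like this (a sketch; the function names are only for illustration):

    def on_wrong(dictionary, user_stroke):
        # The match was bad: keep the user's (normalized) stroke as a new
        # prototype and redraw the screen from it.
        dictionary.append(user_stroke)
        return len(dictionary) - 1

    def on_oops(dictionary, index, user_stroke):
        # The user goofed: erase the stroke from the screen, learn nothing.
        pass

    def on_accept(dictionary, index, user_stroke):
        # The next stroke arrived, so the match is accepted: nudge the
        # dictionary entry 25% toward the new example.
        old = dictionary[index]
        dictionary[index] = [(0.75 * ox + 0.25 * nx, 0.75 * oy + 0.25 * ny)
                             for (ox, oy), (nx, ny) in zip(old, user_stroke)]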
The result is that
over time my strokes are averaged together and the results look cleaner than
any single example of my handwriting.
Results
-------
The results are
surprising. Notice that I do absolutely nothing to locate inflection points or
cusps. You would think that sometimes when I draw a 3 the cusp would land on the 10th point and other times on the 12th. If you average those together you should get mush that does not look much like a 3.
That does not seem to be the case. Perhaps I have just not entered enough data
to see this problem yet. On the other hand, if it did look bad, I would just hit
the WRONG button and keep a new prototype to clear up the problem.
Is 20 XY points
enough? Clearly not if you draw huge long scrawls without ever lifting your
pen. 20 seemed to be enough for the sort of drawing I do.
The compression is
difficult to measure. I am taking a stroke and reducing it to 4 numbers, the
CenterX, CenterY, ScaleFactor, and DictionaryIndex. (Note: I don't have to
count the 20 points that made up the stroke in the dictionary because that is a
"fixed" overhead. It is part of the compression algorithm.) The
typical stroke is 20 points or more so the compression is probably better than
10 to 1.
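In other words, a stored stroke is just a little record, and decompression is the dictionary stroke scaled and translated back into place. A sketch (field names are mine, only for illustration):

    from collections import namedtuple

    # One stroke on the page is just these four numbers.
    CompressedStroke = namedtuple('CompressedStroke', 'cx cy scale index')

    def decompress(dictionary, s):
        # Scale the normalized dictionary stroke back up and translate it
        # to where it was drawn.
        return [(s.cx + s.scale * x, s.cy + s.scale * y)
                for x, y in dictionary[s.index]]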
It is important to
note that this is not really an appropriate compression method for a legit data
type. By legit I mean one that I can exchange with someone else. My dictionary
may not match your dictionary. However it is equally obvious that we could
collect a lot of user data, nail down a common dictionary, do no more averaging
of input stroke data and turn every WRONG click into an escape mechanism that
keeps the newly input data for transfer.
In fact, as far as averaging in the new user data goes, you probably want to have some limits on that anyway in the interests of stability. If I have a bunch of ink documents that are all full of dictionary indices and the dictionary keeps drifting, there is no assurance that a stored document will decompress and look like it did when it was stored away. Seems to me that the appropriate fix is to keep track of how often a single dictionary entry is hit and freeze it after it has had about 10 things averaged into it.
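That freeze would only be a small change to the learning step, along these lines (a sketch of the proposed fix, not something the current program does):

    FREEZE_AFTER = 10

    def on_accept_with_freeze(dictionary, hit_counts, index, user_stroke):
        # Once an entry has had enough examples averaged in, stop touching it
        # so that old documents keep decompressing the same way.
        if hit_counts[index] >= FREEZE_AFTER:
            return
        hit_counts[index] += 1
        old = dictionary[index]
        dictionary[index] = [(0.75 * ox + 0.25 * nx, 0.75 * oy + 0.25 * ny)
                             for (ox, oy), (nx, ny) in zip(old, user_stroke)]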
The one thing that this method does not do well is the following. Since I do no rotations, only scaling, I find that when I am doing line art, if I draw a line at one angle, the system will replace it with one at a slightly different angle. The fix seems to be that you just keep hitting WRONG and build up a good collection of differently angled straight lines.
The other mildly annoying thing is that since the system is not recording the starting XY and stopping XY, the replacement line does not always exactly match those two things. If what you are trying to draw is something like a network topology, where you are carefully targeting line endpoints and the system is totally ignoring them, it can be a bit disconcerting. Fortunately WRONG fixes that as well.
The fact is, the thing I am proudest of in this design is the UI. There is something very satisfying about punching the WRONG! button. Usually the computer is actually cleaning up what I draw. It makes it look nicer. That is really cool. It is replacing my strokes with something smoother. Hey! That is WRONG! As soon as you punch the button the screen reverts to exactly what I drew (well, what I drew quantized down to 20 points). The WRONG button is nice and big. The OOPS! button is small. Oops! The pen skipped and what I drew is crap. It found some random match in the database but I don't want to average that crap in and pollute my dictionary. My mistake. Little button. Oops! I certainly got to the point that I was hitting oops more often than I was hitting wrong, but I kept the button size skewed for psychological reasons.
I liked this note taker so well that I spun my wheels for days wondering whether it was worth learning C and learning how to build plug-in modules for the Web browsers so that I could post my line drawings and cool "in my own handwriting" notes to the Web. I did finally answer that question in the negative. Of course it is not worth learning all that shit. Nothing is worth that much pain.
The Future
----------
I could of course declare victory and be done with it. I have what I think is a quite reasonable ink capturing system. It would not even be terribly difficult to add the handwriting recognition component on top of this system, though of course it would still suck just like it did before.
However, the next step, I think, is to use this as the basis for a more sophisticated recognition package.
As it stands, it is perfectly adequate for writing mathematical equations. In fact, it is better than adequate because it actually cleans up my handwriting. It lets me invent any new notation whatsoever and also lets me draw those little pictures of a torus with a hole punched in it. (We mathematicians do that a lot, you know!)
Seems to me that if I am always drawing little pictures of a torus with a hole punched in it, the damn system ought to notice that and draw the whole picture right. I.e., what I have built cleans up single strokes. What I want is a system that notices that I always draw rectangles with 3 strokes, always in the same order. (This is the legacy of learning to draw the Kanji for "Mouth" using the proper Japanese stroke order.) I want the system to notice and clean up the higher-order structure "rectangle" rather than just cleaning up the components.
When I was in the Pen Windows group I used to whine that the problem we ought to be working on was 2D parsing. My thinking was that using the pen for the input of text is foolish because the typewriter is superior. What we should be using the pen to input is notations that are inherently 2-dimensional. (Text is really one-dimensional: one character sequenced after another.) There are many examples of 2D notations: math equations, electrical circuit diagrams, flowcharts, music notation, chemical bond diagrams, etc.
I can't say that I
have made any progress on that problem, but I think I have now got the
foundation upon which I want to build that system.
Continuous Flow Traffic
-----------------------
Many years ago, watching the Pasadena Police Dept Motorcycle Corp weaving in and out as they drove down the Rose Parade route, I got to wondering whether it would be feasible to eliminate the STOP and GO nature of traffic lights in the city. The idea is this. Instead of having a traffic light at the intersection, you have a moving spotlight. As I drive up to the intersection, I try to get the spotlight lined up right on the hood of my car. I whiz through the intersection synchronized with the light beam. The car behind me also has a light on his hood, and he is forced to leave a gap between his car and mine that is just big enough to allow one car to blaze through the intersection (timed by his own light of course) in the perpendicular direction.
You would have to
learn to IGNORE the traffic you see right in front of you and just match the
light.
The drawback to this idea, of course, is that everyone had better do it exactly right or you have high-speed collisions. You not only have to do it right, you have to trust everyone else to do it exactly right.
So the next idea was to have a more fault-tolerant system for synchronizing the flow through the intersection. The idea was this. Imagine a huge pendulum swinging back and forth across the roadway right in front of the intersection. We are talking megatonnage big here. Your job is to drive through when the pendulum is up and out of your way. If you mistime it, BAMO, the pendulum smashes you off the road. This means that the guy who screws up is the one who pays the price, instead of some innocent party.
It is a slightly
better design, but the cost of getting all that tonnage swinging back and forth
seems prohibitive.
It occurs to me that in the future, when we are no longer driving because the computer that is our car is driving itself, we might in fact begin to trust the other drivers enough to do continuous flow through the intersections.
I just wanted to point out that when we go to smart cars with reaction times measured in microseconds, cars that chatter back and forth among themselves so that they know exactly who is on the road and where, not only do we get the benefit of not having to worry about drunk drivers, but we also won't need to stop at intersections any more.
But Seriously...
---------------
This is the second note I have made concerning smart self-driving vehicles in just a few weeks. Don't exactly know why that is on my brain lately. I suppose the main reason is that I believe we could actually build self-driving cars today. I do not believe that there is any need to develop any more AI or do anything particularly fancy.
We might need to do a little work to the roadway, like painting blue lines down the center of the lanes. The car would center up on the blue line and follow it. The blue lines could be interrupted at semi-regular intervals, just like the bar codes on products, so that the car can actually read where it is: "Next Right is Green Lake Ave. Speed limit 25 MPH."
Lane changing can be dealt with by having criss-crossing lines that are the "allowed" places for lane changes.
The remaining issues would seem to be a) knowing the location of the traffic around you, and b) noticing that something does not look right about the road (some dog has wandered onto it) and being able to stop in those situations.
The knowing about
the traffic is mostly a matter of getting on the CB radio and talking to those
around you.
"Breaker
Breaker, This is Lil' Mazada. I'm in lane 2 westbound on the 520 exactly 37
dashed lines from the Lake Washington Blvd turn off. I am currently doing 53
MPH and today I seem to be able to accelerate at about .2 G going forward at
this speed and can break at about .7 G. How's it hanging out there? Who's with
me."
The communication channel number could be written onto the road and changed every 50 yards or so, so that you are always talking only to the cohort that you are driving with. Perhaps you are thinking from my CB radio analogy that this is some kind of difficult communication. Think of it this way: you can just write a long string of numbers all the way along the blue line, and every car that rolls over the blue line is supposed to recite the numbers as it rolls over them.
All you have to do is listen to your local cohort blather their numbers and you know exactly where everyone is and how fast they are moving. Every once in a while they slip in a string of numbers saying where they plan to be going, for lane changes or turns: the digital equivalent of a turn signal or a brake light.
The fact is, all the coding of the speed limit and the turnoffs etc. can be done AFTER you paint the road with random digits. The painter car drives a stretch of road spraying in the random numbers, then drives the road reading them off, and this becomes the map that your car downloads before it hits the road.
None of this
strikes me as particularly difficult. I suspect that the biggest problem is
that I don't want my smart car driving on the road if there are old non-smart
cars out there who won't talk back and tell me where they are.
That is where the dog-detecting sonar system kicks in. "Hey good buddies, I got some large chunk of moving junk up ahead of me here in lane 2. It's doing about 57 MPH and appears to be staying in my lane. I'm giving you a wide margin. You better answer this call NOW or I'm calling 911 on you. Over!"
How many of the people we kill each year on the highways could we save with about 1000 dollars per car of processor, memory, radio, and sonar, plus a few tons of blue paint?
So why isn't NASA doing this? Why do we talk about spending billions on superconducting supercolliders and trips to Mars instead of Federal projects that will let me tell my car, "Go pick up the girls at school and bring them straight home"? Do any of you have the slightest doubt that this could be designed, implemented, tested, and rolled out in 10 years or less?
Should I go whine to Bill that he should start up a project like this?
Over and Out