Focus or lack thereof
In case you are wondering if Marlin has just slipped off into reliving the days when he was teaching a CS Algorithms course, well, you are partly right. However there is a reason that I am hacking through all of this seemingly random stuff. It was and is my hope to use this news letter as a forum for discussion of Computer Music Composition. Since everyone on the mailing list comes from differing backgrounds with respect to both CS and Music I am attempting to go over all the background material, algorithms and musical terms so that we have a common vocabulary for discussing this stuff. I want to know that everyone on the mailing list knows what Kohonen nets, Viterbi algorithms, and black board systems are.
I know that some of you were involved with the handwriting recognition project and are thus quite familiar with some of these CS techniques that are useful in the AI domain. I hope that you will bear with me as I go back through all of this stuff.
The reason that you are seeing more CS than music so far is that I am culling through previously half written CS class notes and articles and editing them up for the news letter. I am also doing some writing from scratch and expect to be doing a lot more of that. However, there should be no doubt in your mind that I am using this news letter as a tool to force myself to clean up a lot of the junk that has collected on my hard drive over the last decade or so. I can assure you that the week will come when I just can't think of anything to write and I will dip into by bin of poetry and fiction and fob some of that off on you.
I was inspired in this direction by one of the talks given at Microsoft Research. They invited in this dude, Dr. Manford Clynes, to give a talk on his enhanced music playback system. He is in the middle of giving his talk to about 50 people when he says, "Hey, I want to read you this poem that I wrote while I was in Japan."
Suddenly there we are in the twilight zone, "time - The fourth dimension - evolution ..." The audience fidgets wondering how long this is going to last.
I mean, why not? He had his captive audience, it was his talk, why not just launch into, "Hey check out this tune," or, "Here's some photos of me and the dog from our family vacation a few summers ago." It made me realize how constrained many of the forum that we frequent really are.
We are not so constrained here. Caveat Emptor.
Dynamic Programming - An intro
Dynamic programming is a name attached to a combination of two techniques (not algorithms) that help convert some minimization problems from a potentially exponential run time into a polynomial time, and possibly even into a linear time.
These tricks only work on a certain class of problems (problems where the minimum of some global big thing is obtained from a simple combination of minimums for a few small things thus a divide and conquer can work) but it is a large and useful class so the tricks are worth learning.
One classical minimization problem that dynamic programming solves is approximate string matching. In this problem, you assign a cost function (penalty)for every letter mismatch that occurs, another cost function for every letter that occurred in the search string but not in the text and yet another cost function for every letter that occurred in the text but did not have a matching letter in the search string. What you are trying to do is find the alignment that minimized the total penalty that you incur. At first glance, this problem is combinatorially hard, since you can line up nearly any letter with nearly any other letter, stretching the search string over any number of letters in the text string, or skipping letters in the search string. However as we will see later, it can actually be done in a manner that is not much more computationally inefficient than standard exact string matching.
One last word on approximate string matching before we go on. The string-matching problem is particularly important not only for its use in dealing with actual matching of text, but because many recognition problems such as speech or handwriting can be cast as approximate string matching problems. The problem of aligning a human played midi file with a score and the search through a tune database looking for a not entirely correctly remembered melody are other examples.
Before we tackle the classic problem we will take a brief digression and show how the techniques of dynamic programming can be formally applied to the "stupid" exponential runtime version of the Fibbonacci algorithm to get the standard linear runtime algorithm. This is overkill for the Fibbonacci problem, but it illustrates the techniques and helps render them obvious. The Fibbonacci problem is not a minimization problem, but the transform involves the two essential steps of dynamic programming.
Technique 1: Given a recursive problem, where F(bigproblem) is defined in terms of F(little problem) combined with F(little problem) you often are solving the same little problem over and over. You can typically eliminate the recursion and use an array to store the results of the sub problem and just reuse the answer.
Technique 2: given a large array of subproblems that you need to solve in order to solve the big problem, it may well be that by reorganizing the order in which you do things you can convert the large array to a small array to do the same thing. i.e. you may find that once you have used the results of the small subproblems to solve some mid size problem, you no longer need the small results any more so you can recycle the space they were using.
In summary, write the recursive function, then convert all the function calls to array calls and then reorder what you are doing to minimize the size of the array that you need to use.
Here is the recursive Fib program:
if n > 1 then
Fib = Fib(n-1) + Fib(n-2)
Fib = 1
Technique 1 says convert problem from recursion to using an array. Since Array values for large N are computed from array numbers smaller than N we will fill the array from 0 on up.
n = 100 ' or some big number
Fib(0) = 1
Fib(1) = 1
For i = 2 to n
Fib(n) = Fib(n-1) + Fib(n-2)
Sho nuff, no array element is used before it has been filled. We have converted the exponential problem to a linear one. However it still uses large space. Technique 2 says to observe that we never need an element more than 2 back so we can crank the array size down to two elements.
Fib(0 mod 2) = 1
Fib(1 mod 2) = 1
For i = 1 to n
Fib(i mod 2) = Fib((i - 1)mod 2) + Fib((i-2) mod 2)
You can eliminate the mod 2 using conventional techniques to toggle indices back and forth if you want.
So we have used the two techniques to convert from recursion to arrays then to trim down the array size.
One last point on the general method. In the Fib example I went immediately from the recursive function to the array based one that uses FOR loops. It may not always be obvious how you are supposed to do that with a more gnarly recursive function. The difficulty is often in the figuring out what order to use to fill in the arrays. There is actually a very easy reformulation of the recursive fib problem that technically still uses recursion but in reality shortcuts to using arrays. The advantage of this is that the transformation from the exponentially slow to the polynomially fast algorithm becomes a simple syntatic change. So here is Fib once again:
Dim FibA(1000) ' this is the array that shortcuts the recursion
' rename the original recursive function to Rfib so that it no longer directly calls itself
if n > 1 then
Fib = Fib(n-1) + Fib(n-2)
Fib = 1
' now write the new function that shortcuts the recursion with the array
if FibA(n) = 0 then ' I use the fact that fib in never 0 to flag uncomputed elements
FibA(n) = Rfib(n) ' compute new value using Real fib then stuff into array
Fib = FibA(n) ' now I know that FibA(n) has a value so return it
and finally be careful to initialize the array befor you call fib at the top level
For i = 1 to 1000: FibA(i) = 0: next I
This allows you to hack out the recursive function. Get it working for small n and then just wrap the old function with a layer that shortcuts to the array. This is not as efficient as what you get by a total rewrite that eliminates all the function overhead. But it is way easier to do, way easier to read, and let's face it, when yer just screwing around solving some combinatorial puzzle like the Brozz eggs who cares about function overhead.
Solution to the Brozz Egg Problem
As I said before, there was a puzzle alias at the soft and this one made its rounds. The debate that went around was all about whether binary search was right or not. I hacked out the dynamic program solution and showed that binary search was NOT correct. (Though to be fair it is not very far from being right)
I resurrected the puzzle for last weeks news letter so that I could launch this discussion of dynamic programming and the puzzle once again leaked out into MS email and had dozens of people working on it.
I was shocked to discover that this time the MS alias had quickly converged on a totally different solution from any that were discussed the first time the problem went around. Now it is possible that I gave the problem wrong in this second pass but I don't think so.
So first the latest and greatest solution: Fill the boiling kettle with 60 eggs. Pull one out every minute until you get the perfect egg. Immediately note the time and dump out all the water, you now have several perfectly cooked eggs and know exactly how long it takes to cook them. Expected time to solution = Proper cooking time.
Piece of cake! No programming let alone dynamic programming necessary.
So I will now restate the problem is such a was that you get to use the DP tools I just showed you.
Consider a binary tree with N nodes whose values are the numbers from 1 to N. The tree is arranged so that an in order traversal of the binary tree prints out the numbers from 1 to N in order. (In other words if a node has the value X then that node's left sub tree only has values less than X and that node's right sub tree only has values greater than X) Now, there are many such trees. We define the cost to a node to be the sum of all the parent nodes plus the value in the node itself. (this is the time it would take to cook a parent egg, discover whether it is raw or over cooked, make the binary decision and move to the new cooking time) We are looking for the tree that optimizes the average cost to each node. Since all the trees have exactly N nodes optimizing the average cost is the same as optimizing the total cost.
Now that you all are DP experts the code for this is pretty simple:
You need to be able to find the optimal layout for the nodes from a to b. You do this by looking a each possible split point and seeing what the cost is for that split and you pick the best split point. Here is the psuedo code
if a = b then ' first we do the trivial case
OptimalCost = a
elseif a > b then ' this just stops the recursion
OptimalCost = 0
else ' a < b - the normal case
BestSoFar = Cost of split at a
BestSplit = a
For I = a + 1 to b
NewCost = Cost of split at I - this will be recursive, calling OptimalCost
If NewCost < BestSoFar then ' we have a better solution
BestSoFar = NewCost
BestSplit = I
OptimalCost = BestSoFar
Save BestSplit somewhere so we can reconstruct the optimal tree
The handwaving was what is the cost of a tree from a to b split at some I
There are a total of (b-a + 1) nodes in this tree. The top of the tree has value I. You must pay the cost of I not only for the node I itself, but you must also pay the cost of I for every single child and grandchild of I thus the total cost of the tree from a to b is:
Cost(a,b,I) = (b - a + 1) * I + OptimalCost(a, I - 1) + OptimalCost(I + 1, b)
The other piece of handwaving that often causes trouble in dynamic programming is the saving the best split. You see, it is rare that all you care about is the one optimal value. What you usually want as well is the optimal set of decisions that led to the optimal value. What you typicall need to do is have another array that instead of keeping the optimal values, which is driving the recursive function, is keeping track of the decision that was made at each point. You then must walk this array to reconstrct the actual path in problem graph that led to the optimum value.
Before I show the real code, I will remark on one last sleezy trick. Normally I would need two arrays, one to keep track of the optimum value and one to keep track of the split point that led to that optimum value. However note that I only care about the function for cases when a is less than b. This means that I only use the upper triangular portion of a square array. Since I can fit two triangles into a square array I will pack both of the arrays that I need into one. I create one array, bsa() - BestScoreArray, and use the following convention. Given the problem from a to b where a < b I will store the optimum value at bsa(a,b) and I will store the split point at bsa(b,a)
Here is final VB code Including some really ratty code to print out a tree structure picture of the trees so that you can eyeball the shape of the tree.
It may appear that I made no effort to do step two of the DP process, that I made no effort to reduce the size of the matrix needed. This is not quite true. I spent many days trying to find some order in which I could fill the array so that I could toss out some of the information once it was no longer needed. Great effort expended but no success. I convinced myself that there is no size ruduction possible in this problem. Not all problems allow you to use step 1 and not all problems allow you to use step 2. Actual mileage may vary.
Brozz Egg VB code
Const n = 60
Dim bsa(n, n)
Private Sub Command1_Click() ' I just hard wire a button to calculate BestScore(1,60)
Print "clearing array"
For i = 1 To n: For j = 1 To n
bsa(i, j) = 0
Next j: Next I
Print "doin it"
t = BestScore(1, n)
Public Function BestScore(a, b)
If a > b Then
BestScore = 0
ElseIf a = b Then
BestScore = a
bsa(a, b) = a
Else ' a < b
If bsa(a, b) <> 0 Then
BestScore = bsa(a, b)
bs = (b - a + 1) * a + BestScore(a + 1, b) ' score for split at a
bi = a
For i = a + 1 To b
newscore = i * (b - a + 1) + BestScore(a, i - 1) + BestScore(i + 1, b)
If newscore < bs Then
bs = newscore
bi = i
BestScore = bs
bsa(a, b) = bs ' save the cost
bsa(b, a) = bi ' save the split point
Private Sub Command2_Click() ' will print the sub tree from a to b
a = Val(Text1.Text)
b = Val(Text2.Text)
For i = a To b
t$ = Str$(i)
ta = a: tb = b
While bsa(tb, ta) <> i
t$ = " " & t$
'Print "("; ta; ","; tb; ")="; bsa(tb, ta);
v = bsa(tb, ta)
If v < i Then
ta = v + 1
If v > i Then
tb = v - 1
I think I will save the approximate string match for another time.
If you actually hack in this code and look at the trees you will find that I was totally off base with the hint that I gave when I introduced the problem. Such is the nature of failing memory. I remembered that the tree was skewed from a simple balanced binary tree, but I had the skew in exactly the wrong direction. The skew is actually toward the shorter time eggs.
an example in pictures is worth a thousand words
Here is the perfectly balanced tree for the numbers from 1 to 7
1 3 5 7
Here is a higly skewed one that has an identical cost function.
3 5 7
In the balanced tree it costs you 4 + 2 + 1 to get to the 1 or seven points.
in the skew tree it costs you only 1 to get to 1 and you have added the cost of 1 to the six other paths. So in the skew tree you have shaved off 6 points off the 1-path and have added 1 to each of the six paths and the result is very different shaped trees with identical cost structure.
As you go up to higher numbers and particularly as you approach 2^(n-1) nodes which would allow a perfectly balanced tree, you find that just before you get to perfectly balanced you can win by skewing a small number up to the top of the tree, adding that small cost to every long path and simplifying the left sub tree. The optimal split point seems to vary back and forth from about 1/3 to 1/2 depending on how close you are to a power of 2. Here is the split points for the first several numbers.
So you see just before a power of 2, like 14 the tree is just about balanced with a split point mid way at 7, but right at the powers of 2 it skewes. Just for the record, the problem that I gave you to solve was the problem from 1 to 60 and 60 just happens to be just before the skew dislocation that happens at 63 and thus to add insult to injury the actual shape of the optimal tree for 60 is indistinguishable from a balanced binary tree contrary to my hint.
OK so I lied. That is just part of the thrills of a hacked out newsletter as opposed to one of those boreing old peer reviewed journals.
Beating Fib to death
A quick digression for those who have not seen this trick: What is the runtime of the recursive Fib algorithm? It is exponential. It takes Fib(n) steps to calculate Fib(n) What is the runtime of the one we listed above that just uses an array? It is linear. It takes n steps to calculate Fib(n). Way faster. Is it the best we can do. Nope, not even close. Here is the order lg(n) algorithm that kicks face.
Consider the following 2x2 matrix
if you multiply the vector (a.b) by that matrix you get the vector (b, a+b). Pretty cool, nee. If a and b happened to be two successive terms in the Fibonnaci series, the Matrix shifted you over one more in the series and gave you the next term. If I multiplied once more I would have (a+b,a+b+b) So if I want to know what the series looks like N terms out I can evaluate the vector equation
Now the speed up comes because we can raise to integer powers very fast the trick to that is to notice that you can multiply M by itself to get M^2. You can multiply M^2 by itself to get M^4. So you can quickly crank out M raised to any power of 2 and then using the binary representation of N you can pick out which powers of 2 you need to multiply together to get M^N. So you crank out M^N in lg(N) steps, hit the vector with the matrix and ta da!
If we ignore the problem of size of integer storage, and really we must since they get real long real fast, i.e. we change the problem from evaluating Fib(n) which has too many digits to write down for any decent sized n to evaluating Fib(n) mod 32 bits, and we set you the problem of calculating Fib(billion) you get that wonderful spread of execution times where the recursive one takes you several times past the heat death of the universe to run (not to mention having blown out the return stack long ago) the second one is done in seconds and the last one is done in milliseconds.
This trick works for any simple linearly recursive function and is the method used to show that linearly recursive functions are just the discrete analog of a linear differential equations. No surprise that the answers come out to be so similar.
I hear nothing from any of you. Actually that is not exactly true, luis recommends the movie "Lolita" playing at the egyptian and my dad sent me a song that he wrote.
This is not a problem. I can continue to churn on in this vein forever. I just wanted to let you all know that you are not the only one out there that contributes nothing.
Things I am thinking of for future issues:
Handwriting Input of Music Notation
Names, not random numbers.
Jan Ken Po - Paper, rock, scissors.
Quick and dirty Voroni Diagrams
the second half of Dynamic programming - string matching.
I will going back to Russia to look in on things there in another week or so. I may or may not interrupt the gottahack during that time. You been forewarned.