Marlin’s Gottahack – 1998 Sep 28 – PREV HOME NEXT

Genetic Algorithm Counterpoint


A couple years ago I did an experiment with counterpoint composition that has shaped some of my thinking about what I believe should be the structure of an automatic composition system. I will confess up front that counterpoint is probably the most mechanical (algorithmic) and most constrained of all composition system and thus lends itself most clearly to computer implementation, so one must be careful about trying to draw too many far reaching conclusions. None the less it did very well in the sub-domain of species one counterpoint and I think that the ideas could be easily extended.

I know that I have discussed this with some if not all of you already, but I have never written it up so here goes.



Counterpoint is an early form of music the evolved out of Gregorian chant. It reached its peak in the music of Palestrina in the 1500's and Bach in the 1600's to 1750. The basic thread of musical evolution in that time period was from music that was essentially melodically structured to music that was harmonically structured. In extremely broad strokes what happened was this.

1) Single melodies, chants, are written.

2) Simple 2 part parallel harmonization is invented. (second person sings same melody shifted up a third or shifted up a fifth. This means that the "harmony" is really a melody, the same melody, itself)

3) The harmonies evolve away from the parallel motion you get when you sing a melody against itself. The idea becomes one of having a second melody that fits in nicely with the first one. This is counterpoint - literally one point or note counter to another.

4) Counterpoint develops in several directions like having rhythmic variation between the two melodies or going beyond two voices to stacking up three, four, or five melodies all at once. This is polyphonic music which probably reached its peak with Palestrina.

5) In the next century there was a shift in thinking. In Palestrina's day you wrote a melody against a melody and if the harmony did not quite work out, tough, the voices had to work independently as melodies. By the time Bach was banging out counterpoint the thinking had shifted from polyphony to homophony, single melody with harmonic accompaniment. The basic shift was the harmony no longer sacrificed to melody, instead the harmony was king and if an inner voice had to cave into a harmonic demand and not be quite correct as an independent melody, tough.

Counterpoint and polyphony are essentially obsolete forms these days in the sense that basically since Bach's time harmony is king and independence of voice is not really a major consideration. However counterpoint is taught to all music majors, both for historical reasons and because there is value in learning how to weave independent voices together. I mean, if you can get your harmony AND your independent melody too, why not. Furthermore, among the theorists of music, it is believed that it is the invention of counterpoint that directly led to the invention (discovery?) of harmony. It is the reason that western music is so harmonically structured (rather than rhythmically structured as is Indian music, African music, an much of Asian music)



Counterpoint is taught by breaking it up into about 6 species. This is an instructional system only, meaning no significant music was ever written in the species, they are merely a graded path to help the student acquire the means to learn counterpoint. The simplest form, species one, eliminates all rhythmic variation. The melody consists of a stream of whole notes, one per measure. This melody is considered given by GOD and is called the cantus firmus (fixed voice) and your job as a student is to write a single melody also consisting of a single whole note per measure that fits with the CF.

You are given a pile of rules about how you are allowed to move the voice that you are writing. The rules look something like this:

First we define some terms.

skip - Any melodic step that is greater than a second

parallel motion - both voices move the same direction e.g. both go up

contrary motion - voices go in opposite directions.

Oblique motion - one voice stays fixed the other moves

1) Allowed melodic intervals - Everything up to a major sixth and the octave.

2) Forbidden melodic intervals - sevenths, or anything greater than an octave. Anything diminished or augmented.

3) Limits on successive skips - two skips in the same direction should not total more than an octave

4) Allowed harmonic intervals - unison, third, fifth, sixth, octave, octave + third, octave + fifth etc.

5) Forbidden harmonic intervals - second, fourth, seventh

6) Forbidden harmonic motion - you may NOT move parallel and land on a fifth or an octave.

There are more rules but these give the flavor. The above are quite mechanical. Allowed or NOT allowed.

The others are less mechanical. They look like this.

1) Contrary motion is better than Parallel motion.

2) Skips tend to imply a change of direction e.g. step down, skip up, step down

3) Voice crossing is tacky

In essence there are a pile of rules that are absolute and others that are things that are encouraged or discouraged.



After briefly considering a structure where the computer would compose a melody by choosing the next note from among the next allowed note and backtracking when you get into trouble, I choose instead use a genetic algorithm. I create random melodies, breed them with one another, score them based on how well they follow the rules and kill the weak melodies.

The genetic algorithm is of course just looking to minimize an objective function and in our case the objective function is just how well it follows the rules. I make no distinction between absolute rules and rules of encouragement, I just give harsh scores for absolute rules. Fore example, I set the cost of using a harmonic interval of a 4th (which is forbidden) at 1000 points. I set the cost of using parallel motion at 1 point, contrary motion is 0 points. Thus contrary is preferred to parallel but not by much.

Given that we are in the sub-domain of species one counterpoint, the genetic mutation an dcrossover is quite easy, when you breed two melodies you simply follow one for a while then switch to the other for a while and occasionally throw in a random note. (One of the problem I am thinking about is how to extend this model up to a rhythmically richer domain where I may wish to insert 2 notes where I once had one, but I don't want to screw up the timing and cause the back half of the melody to misalign with the CF.)



It worked great. Given short melodies, less than 20 notes, it would converge in a minute or so. The counterpoint sounded fine and was faster than I could do by hand.

What was nice about the GA structure is that I could easily add new rules and I could easily change the costs of violating any rule.

Having no "fixed" rules allows me to violate any of them at any time. For example I did the experiment, where I nailed down a single note. I.e. I forced the third note in the piece to be a C# which was dissonant with the cantus firmus and the GA was forced to find a solution that ran through that C#. Except for that one dissonant note, the piece was harmonically sound and it of course followed all the melodic rules.

Having a single scoring function allowed me to write my own solution to a counterpoint problem, score it and compare it to the one that the computer generated. In all cases, my solution was "inferior" - meaning that it had a higher score - to the one that the GA found.

Of course, sometimes, I liked my solution better, it sounded better to my ear, than the one that the machine generated. In those cases, I would compare scores, see why it had gone the way it did and then I would change the weights on some of the rules, or add new rules to force it to generate my solution. This gave me the feeling that what I was doing was modifying the score function so that it matched my aesthetic judgments.

There is an important point here. The scoring function could look at a CF and the solution and analyze it. It would tell me what it considered to be wrong (or right) about a composition. I could take a piece I was working, nail down the parts I liked and have it grind on the transition sections.

Overall, this is my model for how a computer generated music system should work. There is no "hardwired" logic, rather everything is done with weighted rules. The weightings can be changed to match any particular aesthetics. One can even imagine that one could do something like the following: You develop this system that is able to do automatic composition and then you train it in a style, say Mozart's. You do this by giving the system the melody for the Jupiter symphony and asking it to write the bass line, or any of the other parts. In each case you jiggle the weights of the system so that it produces the result most like the bass line that Mozart actually wrote.

Drawbacks - areas for further work


1) It was a toy domain. There was no consideration of the rhythmic aspects of composition.

2) There was no attempt to capture the notion of sequence. Sequence is the repetition of certain aspects of the music. For example the Duh duh duh DUUUUH of Beethovan's 5th is a rhythmic sequence repeated all through the first movement with different notes. You can also have intervalic sequence, where you may have notes in the pattern, up a second, down a third, up a second, and you want to repeat that pattern starting on different notes.

3) The problem in 2 shows up at different scales all through music when you create a FORM. For example a "classic ragtime" would have the structure AABBACCDD. It seems to me that if you are going to compose a ragtime piece it is appropriate to give the system the task of designing ABCD which you then output in the final form, rather than having it design AABACCDD. (the point being that the GA converges faster on shorter strings so rather than giving it a scoring function that includes the structure, you factor out the overall structure and have it design the pieces.)

Future Directions


I think that the main thing that I want to do next to further this work is to develop a language for representing the musical information so that one can write rules in a form that makes sense. I also want to keep the system closely tied to analysis as well as composition. What I mean by this is that it is fairly easy to write the code that will take 4 sections, ABCD, and output them as AABBACCDD. It is more work to analyze a piece and observe that structurally it is in the form AABBACCDD and therefor a rag. Nonetheless it seems to me that if you work only on compositional aspects and not analytical aspects you will get yourself into the problem that the system is not able to look at and learn from existing music.

I look at it in the light that composition is the function that takes you from a nebulous domain of "I want to write a rag" to a specific example of a rag. Analysis is in some sense the inverse of that function that takes you from a specific example to all the abstract representation, "Ah this is a rag and that is the trio section and I can see from having looked at a lot of rags that the trio section is almost always in a key 1 fifth away from the starting key of the rag."

At some point I will want to start passing code around with these news letters, but that will be a while off.

Book Recommendation


The Music Theory Handbook by Marjorie Merryman

It is a very good concise text on music theory. You can get it at amazon for $12. It will make some of the discussions of music easier if I can just refer you all to a section of the book rather than recapitulate in email. Here's the URL.


Solution to Puzzle #1


In the miscounted vote each person got exactly 6 points. How could a person get 6 points? It must be the sum of 3 numbers each of which is 1, 2 or 3. The only possibilities are 1+2+3 or 2+2+2.

However the 2+2+2 possibility could not occur because if one person collected three of the 2 votes, there are not enough 2 votes left for the other people. (Since each person gets to vote exactly one 3 one 2 and one 1 there are exactly 4 of each one of those votes)

So now we know that each person got exactly the same configuration of votes namely 1+2+3.

Since there were 3 liars we know that the honest person got three lies so his original score, 1+2+3 is now really 3+2+1 ie. he stays at 6 points.

Each of the other guys has two lies and one honest vote. If we mark the honest vote with a star we can look at the old vote and the new vote.

the guy ranked poor by the honest fellow got *1+2+3 -> 1+2+1 = 4

the guy ranked middle by the honest fellow got 1+*2+3 -> 3+2+1 = 6

the guy ranked high by the honest fellow got 1+2+*3 -> 3+2+3 = 8

So you can see that the guy that the honest fellow thought was top was the final winner of the competition, the honest fellow tied for second place with the fellow that he thought was mediocre, and last place goes to the fellow that the honest person did not like.

Next week


Some comments on how to implement genetic algorithms so that they consume less resources and run a little faster.

Also, I am thinking about writing a container application for these notes. (I built some toy note taking apps a few years ago and now that I am writing more regularly I think I might ressurect one of them)

jokes - I need some jokes for this thing. If I don't get any from you all, I just may have to tell you the one about the catholic ackolyte that I got from my sister. Till then here is a semi musical one:



Stallone, Van Damme and Schwartznegger are choosen to star in an Action/Music Movie. They figure with all these box office studs the thing is sure to be a success. So confident are they that they let each actor choose his own part.

Stallone say he is into the boy genius thing and wants to be Mozart. Claude likes the piano so he opts for being chopin. Of course Schwartznegger just says, "I'll be Bach!"