I have had a notion for some time that the process of education could benefit from a more scientific approach to teaching. This is not to claim that no one studies education scientifically, but rather that the actual system we use for teaching lacks the uniformity and focus that one could get from a "create a theory, experimentally test it, revise it" cycle.
For the most part, each teacher learns a topic, goes out to teach it, and discovers that there is quite a difference between knowing your subject and knowing how to present it to students with varying backgrounds, lacking either essential language skills or fluency with fundamental arithmetic. Material gets written into books, and who knows how useful it is?
The basic idea is that one should create teaching materials and testing materials. These could be any combination of text, slides, figures, animated diagrams, videos, etc., but what they should NOT be is a human standing in front of a class waving their arms and spontaneously rambling along. The reason is that we want to collect information, via testing, about the impact of the teaching material. Did it convey the meaning as intended? Do students understand some topic better after reading paragraph A? Do they understand it better after reading paragraph B? Is one paragraph better than the other at explaining the topic, as measured by tests?
We want experimental replicability, which is much easier to obtain if you are working from fixed material.
One way to apply this methodology would be as follows. Wouldn't it be nice if you could take physics from Richard Feynman? Or Russian literature from Vladimir Nabokov? These folks were acknowledged top teachers in their fields. Why not film or tape them, or have them write books, and give everyone access to their knowledge? The answer is of course that, while the educational industry has been slow to adopt video and audio formats, it has at least used books to broadcast the information. But it seems to me that what has not been done is to finish the scientific process. Does Feynman's book teach friction better than Sears and Zemansky? Did Feynman get to look at the results of testing and find out that his presentation was flawed because he assumed certain knowledge? With just a little feedback he could have said, "Oh yes, I forgot to tell you, you first need to understand…"
The concept is simply to close the loop on the creation of educational materials. You test them on various students and use the feedback from the testing to modify the material. This is nothing other than formalizing the evolution of educational materials in an effort to optimize them along all the lines of utility, entertainment, ease of access, etc.
The Internet provides a very interesting vehicle for doing this. The reason is simple. The Internet allows for broadcast distribution of educational materials at essentially no cost. (Of course, developing the materials costs money, but distribution to millions of people is essentially free.) Furthermore, the interactivity of the PC allows for the immediate collection of testing data following the consumption of the material. For the first time in history we have a vehicle that permits us to disseminate information to millions, and collect data back from them, essentially for free. This gives us a large enough audience to do some meaningful statistical analysis.
I would now like to suggest a mathematical model that I believe would be useful in the creation of a system of scientific teaching. It works like this:
Let us assume that the educational process is one of conveying certain topics to the student. These topics may have complex interdependencies: you can't really learn calculus until you learn algebra, which you can't learn until you have learned arithmetic. Let us also assume that any student comes to the process with some haphazard collection of topics that they have already learned.
The teaching system is going to attempt to a) evaluate the student's knowledge base of topics by means of test questions, and b) having created a model of the particular student's knowledge, choose to present educational material that is known from previous testing to be useful to students that come from a similar knowledge base.
For example, the system asks a few questions and quickly determines that the student does not know calculus, does not know algebra, appears to know the multiplication table, but does not quite have the hang of fractions. So start teaching fractions. Or, along the same lines, it asks a student what she would like to learn today (a question that ascertains what topic is currently motivating to the student). Once the topic is selected, say Russian literature, a little probing is done: have you read this? Have you read that? Oh, you did; what did you think was going on when Raskolnikov was first apprehended in this passage?
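To make that diagnostic walk concrete, here is a minimal sketch in Python. The prerequisite graph, the topic names, and the `knows` probe are all hypothetical stand-ins for what the real system would learn from testing data:

```python
# Hypothetical prerequisite graph: topic -> topics it depends on.
PREREQS = {
    "calculus": ["algebra"],
    "algebra": ["fractions"],
    "fractions": ["multiplication table"],
    "multiplication table": [],
}

def knows(student, topic):
    """Stand-in for a real probe; the actual system would ask test
    questions on `topic` and score the answers."""
    return topic in student["known_topics"]

def next_topic_to_teach(student, goal):
    """Walk down the prerequisite chain from the goal until we reach
    the deepest topic whose prerequisites are all already known."""
    for prereq in PREREQS[goal]:
        if not knows(student, prereq):
            return next_topic_to_teach(student, prereq)
    return goal

# A student who knows the multiplication table but not fractions:
student = {"known_topics": {"multiplication table"}}
print(next_topic_to_teach(student, "calculus"))  # -> "fractions"
```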
Humans are responsible for the creation of the educational material, reduced into chunks that each attempt to convey a single topic. They also create the questions that probe the knowledge base of the student, again attempting to create the perfect question that tests but a single topic.
The computer is responsible for sequencing everything: first testing and evaluating the student, then presenting appropriate educational material, then looping between those modes. Test, present, test to see if the presentation took; if not, present alternate material for the same topic, or move on.
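A sketch of that loop, assuming `test` and `present` are supplied by the rest of the system (the stubs below are purely illustrative):

```python
import random

def test(student, topic):
    """Stand-in probe: in reality, score a few questions on `topic`."""
    return topic in student["known"]

def present(student, material):
    """Stand-in presentation: a lesson 'takes' with some probability."""
    if random.random() < material["efficacy"]:
        student["known"].add(material["topic"])

def teach_topic(student, topic, alternates):
    """Test, present, re-test; if the presentation did not take,
    try alternate material for the same topic, else move on."""
    if test(student, topic):
        return True                      # already knows it
    for material in alternates:
        present(student, material)
        if test(student, topic):
            return True                  # the presentation took
    return False                         # move on; revisit later

student = {"known": set()}
lessons = [{"topic": "fractions", "efficacy": 0.5},
           {"topic": "fractions", "efficacy": 0.7}]
print(teach_topic(student, "fractions", lessons))
```

In the real system, every pass through this loop would also log which material was shown and whether the re-test succeeded; those logs are the raw data for everything that follows.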
There is one other responsibility that the computer has, and that is where the math and the statistics come in. I believe it should be the responsibility of the computer to determine the topics, both in the questions and in the educational materials. I will make the assumption that a single question is actually a complex beast that may require knowledge of several topics to get the correct answer. The educator who created the material will have some notion of what a question tests for and what the material conveys, but the computer will have access to millions of student answers and may have a different opinion of how the topics intertwine and interact.
I think that this is an important point. It is often easy for a human being who is creating educational materials to be oblivious to some of the concepts required. For example, a wonderful paragraph about fractions, written in Spanish, may have limited utility to an English-speaking student. Did the educator think to classify the language in which the material was written as a required concept? Similarly, there may be subtle biases, cultural or otherwise. For example, in simple arithmetic: 3 + 1 = 4, 5 + 3 = 8, 4 + 1 = 5. Are these all equally difficult concepts? They are NOT from the standpoint of a student who is just learning to do arithmetic on an abacus. (The hard problem is 4 + 1 = 5, because it requires subtracting four one-beads and bringing down a five-bead. The others all involve moving a single bead, or a class of beads, in a single direction.) If there is any validity to the claim that there are visual learners and audio learners, a proper set of questions and a pile of statistics will tease out that relationship and start to affect the sequencing of the information. Was the human author of the material aware of those conceptual class distinctions at the time of creating the material? Did the author know that there are some kids that learn their math on an abacus, or may be differently abled?
Basically the computer will have to solve the following math model. You have a set of questions, you have a ton of people's answers to those questions, and you know that each question embodies, hopefully, a single topic or, more likely, a handful of topics. You want to simultaneously solve the model to deduce a) how many topics are represented by the set of questions, b) which particular topics are embedded in each particular question, and c) what the knowledge vector is for each individual.
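One minimal way to write that model down (my formalization, consistent with the two-question analysis in "The Math" below): a student answers a question correctly if they know every topic it requires, and otherwise they guess:

$$\Pr(\text{student } s \text{ answers question } q \text{ correctly}) = k_{s,q} + G_q\,(1 - k_{s,q}), \qquad k_{s,q} = \prod_{t \in \tau(q)} k_{s,t}$$

where $\tau(q)$ is the (unknown) set of topics required by question $q$, $k_{s,t} \in \{0, 1\}$ indicates whether student $s$ knows topic $t$, and $G_q$ is the guessing probability for question $q$. Fitting $\tau$ and the $k_{s,t}$ to millions of observed answers is exactly items a) through c).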
My belief is that once you are able to solve that sort of model, it will not be difficult to create a system that does the test-present-test cycle. The system does a certain amount of randomized testing to determine the topics covered in the questions. Once it has an idea of what topics a question embodies, it uses those questions to evaluate an individual. It presents educational material that presumably fills a gap in that individual's knowledge. It then tests again to see both how many topics the material actually covers and how well, statistically, it actually modifies the student's knowledge of the topics.
Presumably the system can also provide feedback to the educators on the utility of the questions ("That is a good question: single topic, very solid test." Or, "This question appears to have too many topics in it and does not give fine discrimination value; for example, folks that don't know the topic seem to be able to guess the correct answer.") Likewise the system can prompt for other educational material. ("What can I show a student who consistently gets the topics in these questions wrong that will fix the problem?" Or, "I can find no question that tests the knowledge in this paragraph; can you supply one?")
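As an illustration of the first kind of feedback, one simple statistic the system could compute (all the data structures here are hypothetical) is how often students who lack the topic nonetheless answer correctly:

```python
def guessability(question, students, answers, knows_topic):
    """Fraction of topic-ignorant students who still got the question
    right; far above the nominal guess rate G means poor discrimination."""
    ignorant = [s for s in students if not knows_topic[s]]
    if not ignorant:
        return None
    right = sum(answers[(s, question)] for s in ignorant)
    return right / len(ignorant)

# Toy data: s2 and s3 don't know the topic, yet s2 answered correctly.
students = ["s1", "s2", "s3"]
knows_topic = {"s1": True, "s2": False, "s3": False}
answers = {("s1", "q7"): True, ("s2", "q7"): True, ("s3", "q7"): False}
print(guessability("q7", students, answers, knows_topic))  # -> 0.5
```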
Of course, one should not go overboard on the automation. It is perfectly reasonable to expect students to read materials, look at demos, and have questions, which they type in. The educators look at the FAQ, generate answers, and either embed those answers into the material, so that the original material is better in that it addresses the topic, or else they just create a new nugget of info and some questions that cover that topic.
What I am proposing can be looked at as a very modest extension of the programmed-learning texts that were created in the seventies as a result of some work at Bell Labs on designing materials to optimally train telephone operators. The format was: present material, test, branch to new material based on the result of the test. I took a yearlong music theory course entirely in that format. It was wonderful. It worked. The major difference between that and what I propose is that that material required a human to work out the sequencing and the branching, and it was all done without feedback from the use of the materials. I believe that with an Internet that allows the essentially free distribution of material and free collection of usage statistics, we could build tools that will make it much easier for educators to create this type of material, and we can make the material sensitive to individual differences in ways that were not possible with the creation of static texts.
What you should get out of all this is an educational system that intermixes all of the insights of the best and the brightest of the educators and strives continually to find the optimal path through all that material. The system models the student's knowledge all through the process and can modify the presentation on the fly to suit a student's mood or to fill in holes left by previous education or just by faulty memory. By being able to re-sequence the presentation on the fly, it is able to tailor the educational experience to the individual, similar to what any good tutor does. It should provide a much richer educational experience than can be obtained from any static work such as a book, a figure, a tape, or a video. Lastly, by leaving the sequencing of material to the machine, it relieves the author of the material from those sorts of sequencing issues.
Educational materials become little organisms that evolve in the sea of the web and compete for survival based upon how well they do at clearing up confusion as measured by the questions.
As a side note, I point out that the computer does not care in the least if a person's knowledge profile is HUGE. So what if it keeps track of the 100,000 things that you know so that it can continue your education? The result is that people are classified by what they know and what they don't know. You need an expert that understands both cosmology and the NT operating system? You can FIND that guy. You need someone that can translate medical documents about liver function from Japanese to English? You can find that guy.
I think that there could be great benefit in this knowledge. I also think that if there are any problems with this system, they will almost surely be political (civil liberties, invasion of privacy, BIG BROTHER concerns) rather than anything technical or educational in nature. It must be stressed from the beginning that an attempt to turn this system into a classification and testing system will distort the educational purpose; i.e., once it is known that the way to find the experienced medical technician you are looking for is by cruising the stats generated in this system, you will find people spoofing the system to pump up their stats in the hopes that they will be seen. I don't see this as any sort of major problem; I just point it out for completeness.
The Math
Known:
$G_A$, $G_B$ – the probability of getting questions A & B right when guessing. This could be 50% for a true/false question, or 25% for multiple choice, or some other number because some of the multiple choices are "obviously" wrong.
Unknown:
$K_{AB}$, $K_{A\bar{B}}$, $K_{\bar{A}B}$, $K_{\bar{A}\bar{B}}$ – the percent of the population that knows both A & B, knows only A, knows only B, or knows neither.
Note: as defined, the knowledge values are a partition of unity, i.e.
$$1 = K_{AB} + K_{A\bar{B}} + K_{\bar{A}B} + K_{\bar{A}\bar{B}}$$
Observable:
$T_{00}$, $T_{01}$, $T_{10}$, $T_{11}$ – the percent of the population that got both answers wrong, got A wrong but B right, got A right but B wrong, or got both right (the first index refers to question A, the second to question B; 0 means wrong, 1 means right).
Now we use our model (if someone knows the answer they will get it right, and if they don't know then they will guess) to express the observables in terms of the unknowns. Namely:
$$T_{11} = K_{AB} + G_B\,K_{A\bar{B}} + G_A\,K_{\bar{A}B} + G_A G_B\,K_{\bar{A}\bar{B}}$$
The way to read the above is: "The only way to get both answers right is that either the person knew both A & B, or knew only A but guessed B, or knew only B but guessed A, or knew neither but guessed both." Similarly one gets:
$$T_{01} = (1 - G_A)\left(K_{\bar{A}B} + G_B\,K_{\bar{A}\bar{B}}\right)$$
$$T_{10} = (1 - G_B)\left(K_{A\bar{B}} + G_A\,K_{\bar{A}\bar{B}}\right)$$
The term $(1 - G_A)$ is the percentage of people that will guess wrong on A. You read the first of these equations as: "The folks that got A wrong and B right did it the following way: first of all, all of them guessed at A and failed, hence the factor $(1 - G_A)$; and secondly, they either knew B directly or they guessed at B and got lucky." Lastly we have:
$$T_{00} = (1 - G_A)(1 - G_B)\,K_{\bar{A}\bar{B}}$$
Four equations, four unknowns, so we solve for the $K$s and get:
$$K_{\bar{A}\bar{B}} = \frac{T_{00}}{(1 - G_A)(1 - G_B)}$$
$$K_{\bar{A}B} = \frac{T_{01} - G_B\,(T_{01} + T_{00})}{(1 - G_A)(1 - G_B)}$$
$$K_{A\bar{B}} = \frac{T_{10} - G_A\,(T_{10} + T_{00})}{(1 - G_A)(1 - G_B)}$$
$$K_{AB} = T_{11} - T_{10}\,\frac{G_B}{1 - G_B} - T_{01}\,\frac{G_A}{1 - G_A} + T_{00}\,\frac{G_B}{1 - G_B}\cdot\frac{G_A}{1 - G_A}$$
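These closed forms are easy to check numerically. A quick sanity check in Python, with made-up values for the $K$s and $G$s:

```python
GA, GB = 0.25, 0.5
K_ab, K_aB, K_Ab, K_AB = 0.1, 0.2, 0.3, 0.4  # neither, only B, only A, both

# Forward model: knowledge plus guessing -> observed answer frequencies.
T11 = K_AB + GB * K_Ab + GA * K_aB + GA * GB * K_ab
T01 = (1 - GA) * (K_aB + GB * K_ab)
T10 = (1 - GB) * (K_Ab + GA * K_ab)
T00 = (1 - GA) * (1 - GB) * K_ab

# Inverse: the four solutions above recover the K's from the T's.
k_ab = T00 / ((1 - GA) * (1 - GB))
k_aB = (T01 - GB * (T01 + T00)) / ((1 - GA) * (1 - GB))
k_Ab = (T10 - GA * (T10 + T00)) / ((1 - GA) * (1 - GB))
k_AB = (T11 - T10 * GB / (1 - GB) - T01 * GA / (1 - GA)
        + T00 * (GB / (1 - GB)) * (GA / (1 - GA)))

print(k_ab, k_aB, k_Ab, k_AB)  # -> 0.1 0.2 0.3 0.4
```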
Now the way that this relates to the educational problem is this. Suppose you have two questions that are testing the same concept. If so, then you expect that everyone either knows both or knows neither, i.e. you expect both $K_{A\bar{B}}$ and $K_{\bar{A}B}$ to be zero. Suppose on the other hand that question A strictly contains question B; in other words, A is a more difficult question requiring that you know B and then some other things. In that case, you would expect that anyone who knows A can get B, but not the other way around, so $K_{A\bar{B}}$ would vanish but $K_{\bar{A}B}$ would be some non-zero number.
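Turned into code, that pairwise classification might look like this (the tolerance is an arbitrary placeholder for a proper statistical test):

```python
def relate(K_AB, K_Ab, K_aB, K_ab, tol=0.02):
    """Classify two questions from their estimated K values.
    K_Ab = knows only A; K_aB = knows only B."""
    if K_Ab < tol and K_aB < tol:
        return "A and B test the same concept"
    if K_Ab < tol:
        return "A contains B (knowing A implies knowing B)"
    if K_aB < tol:
        return "B contains A (knowing B implies knowing A)"
    return "A and B test largely independent concepts"

print(relate(K_AB=0.4, K_Ab=0.01, K_aB=0.25, K_ab=0.34))
# -> "A contains B (knowing A implies knowing B)"
```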
What this analysis leaves lacking is any comment on conditioning or analysis of the partials. That is, suppose your 4-answer multiple-choice question has an answer that is pretty obviously incorrect:
How far is the moon from the earth?
(a) 1 foot
(b) 235,000 miles
(c) 237,000 miles
(d) 239,000 miles
In this case, using a value for $G_A$ of strictly ¼ will probably not be correct. In theory, one corrects for this by measuring the correct value of $G_A$ for each question, but this assumes that you know whether or not someone knows the concepts.
Nonetheless, I think this clearly shows, without a great leap of faith, that given a huge collection of questions and people's answers to those questions, it will be possible to extract the relational hierarchy of the questions, i.e. the concepts. Once one has calculated the content of the questions, it is not particularly difficult to go back and look at an individual student and estimate their knowledge: "He got every question involving A right, but missed about ¾ of all those involving the more complicated concept B. Here is a student ready to learn B."
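That last readout is almost trivial to compute once the per-concept tallies exist. A toy version, with made-up counts and arbitrary thresholds:

```python
# concept -> (questions right, questions asked), for one student
results = {"A": (12, 12), "B": (3, 12)}

for concept, (right, asked) in results.items():
    accuracy = right / asked
    verdict = "knows it" if accuracy > 0.9 else "ready to learn it"
    print(f"{concept}: {accuracy:.0%} correct; {verdict}")
# A: 100% correct; knows it
# B: 25% correct; ready to learn it
```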
Addendum: On further reflection, my concern over knowing the correct value of $G_A$ should not be a major problem. I think it will fall out in a bootstrap sort of procedure. For example, in the question above, answer (a) is clearly wrong, and as a result it will not be chosen as frequently as the other wrong answers (b) and (c). This will show up clearly in the stats. Furthermore, if you have multiple questions that cover a given concept (which is desirable, since it is the only way to really discover whether someone knows a concept), you can fairly easily detect someone that does not know the concept. Once you know that, you look at the population that you are sure does not know the concept, and for any question you can look at the stats to see with what probability they chose each answer, INCLUDING the correct one. Thus one should be able to use initial values for $G_A$ to calculate the concepts in a question, use the concepts to identify the persons that don't know the concept, then use the distribution of the guesses to recalculate the probability of guessing for each question. By going back and forth it should converge.
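A toy version of that back-and-forth, with made-up data and a crude better-than-chance threshold standing in for a real statistical test (both questions here cover the same single concept):

```python
# answers[student][question] -> answered correctly?
answers = {
    "s1": {"q1": True,  "q2": True},   # actually knows the concept
    "s2": {"q1": False, "q2": True},   # got lucky guessing q2
    "s3": {"q1": False, "q2": False},
}
G = {"q1": 0.25, "q2": 0.25}           # naive 1/n starting point

for _ in range(5):
    # Step 1: use current G to classify who knows the concept
    # (here: scored well above chance on its questions).
    knows = {}
    for s, row in answers.items():
        score = sum(row.values()) / len(row)
        chance = sum(G[q] for q in row) / len(row)
        knows[s] = score > chance + 0.25
    # Step 2: re-measure each question's guess rate from the
    # students classified as NOT knowing the concept.
    ignorant = [s for s in answers if not knows[s]]
    if ignorant:
        for q in G:
            G[q] = sum(answers[s][q] for s in ignorant) / len(ignorant)

print(G)  # q2's guess rate comes out higher: ignorant students hit it
```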