How to discover a proof of the fundamental theorem of arithmetic.

The usual proof.

Here is a brief sketch of the proof of the fundamental theorem of arithmetic that is most commonly presented in textbooks.

1. First one introduces Euclid's algorithm, and shows that it leads to the following statement: for any two integers x and y there exist integers h and k such that hx+ky=(x,y), where (x,y) is the highest common factor of x and y.

2. Next, one deduces from this the result that if p is a prime and p divides ab then p divides a or p divides b. (Proof: if p does not divide a then (p,a)=1 so by the previous result we can find h and k such that hp+ka=1. Then hpb+kab=b. Since p divides ab and p obviously divides hpb, we deduce that p divides b.)

3. Then, one deduces from this, by an easy inductive argument, that if p divides a1...ak then p divides ai for some i.

4. Lastly, one takes a supposed minimal counterexample to the theorem. So let p1...pr=q1...qs, where the pi and qj are all primes and not the same ones up to a reordering. By minimality, no pi is equal to any qj (or we could divide through and get a smaller example). But p1 divides the product of the pi and hence the product of the qj. By Step 3, p1 divides some qj, which is nonsense as qj is a prime not equal to p1.

How such an argument might have been discovered.

If you did not know how to prove unique factorization, then what would possess you to define Euclid's algorithm, think of the clever trick to show Step 2 above, and put it all together? It seems that Euclid himself did something like that. (Although he did not write down a complete proof of unique factorization, he did get as far as Step 2, and must have known how to deduce from it the whole theorem.) Did he stumble on his ideas by accident, messing about with highest common factors and suddenly noticing what he could do with his results, did he start with the problem of unique factorization and develop his algorithm to deal with it, or was he simply an utter genius who could invent arguments of several steps all in one go? I am no historian, and do not even know whether the answer to this question is known. However, I hope to show that he could have been led naturally to his famous algorithm by first thinking about unique factorization. (I myself believe that he was led to it this way, but as I say this does not matter.)

In the spirit of Polya, I shall try to make as explicit as possible the rules of thumb that I assume are part of a standard mathematical armoury. I shall probably be more explicit than Euclid himself would have been, but this is not a problem as I am not trying to be historical, and in any case it is quite possible to apply mathematical rules of thumb without ever being aware of them - just as it is possible to speak grammatically without formulating the rules of grammar. I shall give far more detail than any human reader is likely to need, because I am also interested in how one might teach computers to do mathematics.

One guiding principle is so useful and well known that I shall state it before I even start.

Principle 1.

For questions to do with the positive integers, induction is a good idea. Very often, a good first line of an argument is, "Let us consider a minimal counterexample".

We can now get to work. Guided by Principle 1 above, we do indeed consider a minimal counterexample. What does this mean? Since it is a counterexample, it must be a pair of distinct prime factorizations of the same number. Since it is minimal, it should not be possible to find a smaller example. We must therefore consider an expression p1...pr=q1...qs, where the pi and qj are prime numbers, and they are not the same sequence up to a reordering. Now it is clear that we cannot just stare at this expression and feel certain that it must be false. After all, why shouldn't the two sides of the expression be equal if say the primes involved are very large? We have to find some reason for them not being equal, and we don't have much to go on.

Noticing that no pi is equal to any qj.

When you don't have much to go on, the following embarrassingly obvious principle is often useful.

Principle 2.

Write down what you do have to go on and see what you can deduce from it.

Is there anything we haven't used in this example? There is, because we have not yet used the minimality of the counterexample. How does one ever deduce anything from minimality? This question has an answer, which I shall formulate as another principle.

Principle 3.

If you want to exploit the minimality of an X, then think of methods for finding a smaller X. Given that a smaller X does not exist, any method you think of must fail. The fact that it fails has consequences that may be useful.

How might we pass from p1...pr=q1...qs to a smaller example? Well, both sides represent some positive integer n, so a more general question is how we can pass from a positive integer n to a smaller one. There are various methods, such as subtracting 1, or subtracting other numbers. Notice, however, that there is not a lot we can say about n-1. Let us be guided by Principle 2, and ask ourselves what we know about n. We could also use the following equally obvious principle.

Principle 4.

If you are trying to find an example of an X, then write down what makes an X an X.

In this case we are trying to find an integer smaller than n that can be factorized in two different ways. Now notice that n is not a specific integer like 1001, but is rather a hypothetical integer, and all we know about it is that it equals both p1...pr and q1...qs, where these are themselves hypothetical sequences of primes. This leads to another principle.

Principle 5.

If you are going to use a hypothetical object X to construct an object Y, then Y will itself be hypothetical and all you can use to construct Y is the hypotheses about X.

Putting together Principles 4 and 5 in our particular example, we see that we are trying to construct from n=p1...pr=q1...qs a smaller integer which can also be written as a product of primes in two different ways, and that all we have to help us is the two ways of writing n.

Principle 6.

Relax the constraints. For example, if you are asked to do two things at once, see if you can do one of them and then build up to the second.

We are trying to find an integer smaller than n which is a product of primes in two different ways. Can we at least use n to find a smaller integer which is a product of primes in one way? Notice that it is important here to use n in a genuine way. It is not good enough to say "what about the number 2?" So for this we apply Principle 5, and finally notice (of course actually we could have noticed this long ago, but I am trying to minimize the human skills I assume) that p2...pr is an integer smaller than n, constructed using the properties of n, with a prime factorization.

We now return to the harder problem - constructing a smaller integer with two prime factorizations. It is clear that using the property n=p1...pr will not be enough, as every integer has at least one factorization. It is therefore very natural to consider a second integer, constructed using the property n=q1...qs, namely the integer q2...qs.

We have got somewhere, since we now have an attempted construction of a smaller example. Indeed, we will have a smaller example if p2...pr=q2...qs. However, now (applying Principle 3) we remember that there is no smaller example. It follows that our attempt did not succeed. From this we deduce that p1 does not equal q1.

Let us review the above argument, guided by the following principle.

Principle 7.

Having reached a mathematical conclusion, examine it carefully to see which hypotheses were essential and which were incidental.

In this case, one sees readily that it did not matter that we considered p1 and q1. Indeed, since these are hypothetical primes about which we have said nothing, it is clear that there is no distinction between any of the pi or qj. Consequently, we can extend our above reasoning to achieve the following conclusion: no pi is equal to any qj.

[It is interesting to think about how a computer would have to be set up in order to apply the reasoning of the above paragraph. Perhaps it would regard {1,2,...,r} and {1,2,...,s} as indexing sets, taking care to note that these indexing sets are sets and nothing more. Or perhaps it would consider pi and qj in the first place rather than p1 and q1.]

Formulating the lemma p|ab => p|a or p|b.

We have made some definite progress, since we now know that no pi is equal to any qj. However, at this point we hit a brick wall: there still appears to be absolutely no reason for the two products to be distinct, or any obvious way to find a smaller example. In a situation like this, it is often a good idea to fall back on the following principle, which I divide into two subprinciples.

Principle 8.

a. Try to prove something more specific and thus easier.

b. Try to prove the simplest non-trivial special case you can find.

What does a special case mean for our example? It surely does not mean choosing a particular pair of products of primes and proving that they are distinct. After all, that is just a matter of numerical checking and it will tell us nothing. (Perhaps it is not obvious in advance that it will tell us nothing, since there might be interesting patterns in the digits of the numbers arising in the calculations. However, if we searched for such patterns we would not find them.)

Let us see where we stand and try to describe it precisely. If we are faced with the expression p1...pr=q1...qs and know that no pi is equal to any qj, then we feel there is nothing much to say because the statement is too general. If on the other hand we substitute values for r, s and all the pi and qj, then we can say something but we learn nothing because the statement is too specific. A natural next step is to try something in between. This had better be stated as a principle.

Principle 9.

If two attempted arguments fail for opposite reasons, then try to find an argument in between.

What could that mean? It is not difficult to say in this case. We had problems if we specified none of the primes, and we also had problems if we specified all of them. So let us try specifying some of them. The simplest way of doing so is to specify just one. So let us choose a value for p1. What value should we try? Principle 8b seems to tell us that we should try p1=2.

A simpler conjecture.

Doing this, we can instantly deduce that none of the qj is equal to 2. We now seem to have a genuinely new problem to think about, simpler than the original one. We must show that it is not possible for 2p2...pr to equal q1...qs if none of the qj is equal to 2. Why might this be?

Well, a minimally competent human mathematician can see the answer immediately: the left hand side is even and the right hand side is odd. How might a computer discover this? It could proceed as follows. First, it would reflect that it must use the information that p1=2, since that is all that separates us from the earlier, hopeless looking situation. Next, it would reflect that it must use the information that no qj is equal to 2, since if some qj did equal 2 (which as we know contradicts the minimality) then one could divide through and lose the information that one of the pi was equal to 2. It could then apply the following principle.

Principle 10.

Try to use as little information as possible. In other words, see how you get on with as few hypotheses as you can, and introduce extra ones only if they are necessary.

We have already picked out two hypotheses that must be used. Using just those leads to the following conjecture: if 2 is a factor of m and n is a product of primes not equal to 2, then m does not equal n.

Again, we human beings find this immediately obvious, because primes not equal to 2 are odd. But what could cause a computer to notice this? There are several possibilities, one of which is to appeal to Principle 1 again: consider a minimal counterexample. This will be an example of the form m=q1...qs, where m is even and each qj is a prime not equal to 2. The minimality tells us (by similar reasoning to our earlier minimality argument) that q1...qs-1 does not give rise to a counterexample. That is, it is not equal to an even integer. Let us write a=q1...qs-1. Then a is not divisible by 2, but aqs is. What does this tell us?

It is time for another principle.

Principle 11.

Don't forget what it is you are trying to prove.

In this case, I mean not the conjecture recently stated, but unique factorization. If unique factorization is true and 2 divides q1...qs, then 2 must equal one of the qj. In our case, the only way it can happen is if it equals qs. Can we prove this? We are asking the following question: if a is odd, qs is prime and aqs is even, does it follow that qs=2?

Principle 12.

Try the contrapositive.

All right, if qs is a prime not equal to 2 and a is odd, does it follow that aqs is odd? Well, what does it mean to say that a is odd? The definition is that it is not divisible by 2. However, another, more positive definition is that a can be written 2b+1 for some integer b. [This would be a very low-level thing for a computer to notice, but I don't immediately see how it would do so.] Let us substitute this in to the expression aqs, to obtain (2b+1)qs=2bqs+qs. It is now an easy exercise [I leave it to the reader to think about how the computer would do it] to notice that this is odd if and only if qs itself is odd. However, we know it is even, so we know that qs is even, and hence equal to 2 as required.

Back to formulating the main lemma.

We now review the above argument. We see that what we ended up needing to prove was that if aqs was even and a was odd, then qs was equal to 2. We did this by showing that qs was even, and deducing (from the fact that it was a prime) that it was 2. Thus, the statement that led to our success was the following: if a and b are odd, then ab is odd.

Having considered a special case, we now do the obvious thing.

Principle 13.

Once a special case is proved, try to generalize it.

We have had some success starting with the assumption that p1=2. We could now see if we can do something similar with the assumption that p1=3. We would find that we could: indeed, just as with 2, the statement we would end up needing to prove would be that if neither a nor b is divisible by 3, then ab is not divisible by 3. Inspired by the proof of this for 2, we would write a=2x+1 or 2x+2 and b=2y+1 or 2y+2, check four cases (only three if we are feeling clever) and discover that the statement is true. We could then go on and prove similar statements for 5,7 and so on.

We would then find ourselves faced with the following problem. For any specific prime p, we can prove, by a brute force argument, that if p|ab then p|a or p|b. However, our argument does not generalize. Still, we have at least formulated the lemma that we want to prove, and we know that it implies the whole theorem.

Proving the lemma p|ab => p|a or p|b.

To a mathematician who had not seen a proof of this statement, it would be a formidable problem to find one. Getting to the stage of formulating the lemma is definitely the easier part of proving the fundamental theorem of arithmetic. Once the lemma is formulated, we again find ourselves in the situation with which professional mathematicians are all too familiar: there seems to be absolutely no reason for the statement to be true in general, and yet one believes that it is.

This kind of despair can be very productive, because it often leads to Principle 2. If there is not much to go on, then in a sense one's moves are forced, and that can make finding a proof (if it exists) easier rather than harder. In this case, in a desperate attempt to strengthen our hypotheses, let us apply Principle 1 again. We would like to consider a minimal counterexample. Principle 3 then tells us to start with a prime p dividing a product ab while dividing neither of the factors, and to attempt to build out of it a smaller example.

In this case, it is not immediately obvious how to find a smaller example, so let us consider (guided by Principle 6) a more general problem: how can one construct any other example from a given one? That is, if p divides ab without dividing either a or b, how can one find any other example of a prime q dividing cd without dividing c or d? Let us try to do the simplest thing and just vary one of the numbers p, a or b. A moment's thought makes it clear that varying p is unlikely to be easy, so let us try to vary a.

We must convert a into a new number a' in such a way that p divides a'b and does not divide a'. Let us try adding something to a. Given that we know nothing about a except that it is not a multiple of p, if we want to preserve this property, then we ought to add a multiple of p. As an experiment, let us try a'=a+p. We find that a'b=(a+p)b=ab+pb which is a multiple of p, since ab is. We can obviously repeat this process, which shows that adding any multiple of p to a gives an integer a' such that p|a'b but p divides neither a' nor b.

Having had some success with addition, we could try multiplication. What happens if we replace a by ha? Well, we know that p divides hab, since it divides ab, but perhaps it divides ha. Of course, if p does not divide h, then we suspect that p will not divide ha either, but that is what we are trying to prove. But hang on! Our counterexample was supposed to be minimal, so as long as h is smaller than b we are all right. So we now have two methods of producing new examples: multiply by a number smaller than b, and add or subtract a multiple of p. The question now is: can we use these two processes to construct a smaller example?

Stumbling on Euclid's algorithm.

Just to make things absolutely clear, here is the problem we are now considering: given a prime number p and a positive integer a, can we construct a smaller positive integer a' from a by a combination of adding multiples of p and multiplying by positive integers less than b? The argument that allowed us to multiply no longer works if we have already changed a, so we should first multiply by some h between 1 and b-1 and then add tp for some t. Thus, we can reformulate the question more precisely: does there exist an integer combination a'=ha+tp which is positive and smaller than a, with h positive and smaller than b? Of course, t will have to be negative.

The most obvious thing we could do is try subtracting p from a. If a is greater than p then this works. Hence, on the assumption that we have a minimal counterexample, we know that a is less than p.

To make any more progress, we must use h. We are a little bit worried about the condition that h is less than b, so we would like to keep it as small as possible. On the other hand, we obviously need t to be non-zero, which forces ha to be greater than p. So a natural choice for h is the smallest integer with this property. We are thus led to the following question: let p be a prime, let 0 < a < p and let h be the smallest integer such that p < ha. (It is not possible for p to equal ha since p is a prime and a clearly cannot equal 1 if p does not divide b.) Is ha-p smaller than a?

Remembering Principle 12, we consider the contrapositive: if ha-p is greater than or equal to a, does it follow that h is not the smallest integer such that p < ha? Clearly the answer is that h is not the smallest if and only if p < (h-1)a. But if ha-p is greater than or equal to a, then, rearranging, (h-1)a is greater than or equal to p and hence greater than p (again because p is prime and a is not 1).

Have we suddenly finished the proof? Yes, provided that the h we chose is definitely less than b. Now we are assuming that p|ab, so we know that ab=sp for some positive integer s. Since p is prime, we know that s cannot equal 1. It follows that ab is at least as big as 2p. On the other hand, since a < p, it is clear that ha is not as big as 2p (either by the argument that ha-p < a < p or simply because the first time you go above p you certainly don't jump as far as 2p). Therefore, h < b as desired, and the proof is complete.

Just a reminder of why the proof is complete: we started with a minimal counterexample and have used it to construct a smaller counterexample. This contradiction proves the lemma, and with it the theorem.

I should perhaps remark that the algorithm implicit in the above proof is not quite the same as Euclid's algorithm, since we write p=ha-a' rather than p=qa+a'. Moreover, we pass from the pair (a,p) to the pair (a',p) rather than the pair (a',a). These differences make the algorithm less efficient for finding highest common factors, but they introduce ideas that lead easily to the usual version of the algorithm.

Converting the above thoughts into the usual proof.

As every mathematician knows, the order in which one thinks of a proof is not the order in which one writes it. These web pages are attempts to write proofs in their untidy, thinking order rather than their neat, logical order. This makes them more long-winded but perhaps easier to understand and memorize. Nevertheless, the process of reorganizing one's incoherent thoughts into a mathematics paper or textbook is an important one. Examining the above argument, one is naturally led to isolate Euclid's algorithm, as I hope to show.

The trouble with arguments that start, "Consider a minimal counterexample" is that they often hide an interesting inductive process, almost pretending that it doesn't exist. A curious computer would have been told the following principle.

Principle 14.

If you have proved an argument by deriving a contradiction from the existence of a minimal counterexample, try rewriting the proof by starting with any example, repeatedly applying the method you used to construct a smaller example and seeing where you end up.

In our proof, we started with a triple (p,a,b) and built a new triple (p,a',b), with 0 < a' < a and with a' of the form ha+tp. What happens if we repeat this process? The first thing to observe is that p and b are never altered, so let us forget about them for now. Where will the process of replacing a end? Well, the only extra facts that we used from time to time were that a was not equal to 1 and was not a multiple of p. It follows that if a is not a multiple of p then the process will end when we have created a new a that does equal 1. This new a is created from the original one by a succession of operations of the following form: multiply by some h and subtract some multiple of p. Doing two such operations results in doing a third: k(ha+tp)+up=(kh)a+(kt+s)p. Hence, we have managed to express 1 as an integer combination Ha+Tp for some integers H and T. This is exactly the consequence of Euclid's algorithm used in the usual proof.

Now recall why it was that a should not equal 1. It was because if p divides ab and a=1, then p certainly divides b. But the whole point of the integer combination Ha+Tp was that if p divides ab without dividing either a or b, then it also divides (Ha+Tp)b without dividing Ha+Tp or b. Now let us assume merely that p divides ab without dividing a. Then we can find H and T such that Ha+Tb=1, and we know that we will now find that p divides b if we continue the argument. Let us do so: we know that Ha+Tp=1 and p divides (Ha+Tp)b. Thus p divides b.

This is almost the argument in its standard form, but in order to prove that p divides (Ha+Tp)b we have used the fact that Ha+Tp was built in several steps from a. It is natural to ask whether we can prove it more directly, and as soon as we have asked that, we see that we can since p divides ab and p divides itself. We have recovered the trick used in step 2 of the usual proof.

We haven't quite formulated Euclid's algorithm, since we assumed that p was prime, but it is a short step from here to asking the question, "What would happen if we did the same process starting with an arbitrary pair of integers?" Given that we have now proved the fundamental theorem of arithemetic, we can quickly discover the answer, and then, if we feel like it, we can prove this answer by generalizing the proof for p prime and not using the fundamental theorem of arithmetic after all.