If f is continuous then it maps convergent sequences to convergent sequences.

A formal statement of the result to be proved.

Let f be a continuous function from R to R and let (x_n) be a convergent sequence. Then (f(x_n)) is a convergent sequence.

Remark: the proof carries over easily to continuous maps between metric spaces, and so does its method of discovery. I present this less general version so that it can be understood by those who have not yet met metric spaces.

Finding the proof.

1. Our eventual aim will be to show that the sequence (f(x_n)) is convergent. Let us write out the statement in full.

(i) (there exists y) (for all a > 0) (there exists N) (for all n > N) |f(x_n)-y| < a.

2. Our immediate challenge is to find y. Let us therefore write out our hypotheses in full as well.

(ii) (for all b > 0) (for all z) (there exists c> 0) (for all w) |w-z|< c implies |f(w)-f(z)| < b.

(iii) (there exists u) (for all d> 0) (there exists M) (for all m> M) |x_m-u|< d.

3. Now we start trying to match up parameters, which as always we do by focusing on the end of (i): the statement |f(x_n)-y| < a. Could either of our hypotheses be made to end that way (bearing in mind that we are allowed to make a specific choice for y)? Since (iii) does not involve the function f, the only possibility is (ii).

4. This immediately suggests renaming b as a. So now we are trying to match the following two statements.

(i) (there exists y) (for all a > 0) (there exists N) (for all n > N) |f(x_n)-y| < a.

(ii)' (for all a > 0) (for all z) (there exists c> 0) (for all w) |w-z|< c implies |f(w)-f(z)| < a.

5. We would now like to specialize (ii) to some particular value of z and find a good name for w. It is easy to see how to do this, but just to make it absolutely clear how to do it automatically (rather than using a tiny bit of `intelligence') let us use the technique of pretending that we have chosen y already. That is, we write

Let y be a real number, to be determined later.

and then replace (i) by the statement

(i)' (for all a > 0) (there exists N) (for all n > N) |f(x_n)-y| < a.

6. This tells us that when we choose z, we will want to do it in such a way that f(z)=y (because y is the only parameter that is fixed in the statement [|f(x_n)-y| < a]). So let us now write

Let z be a real number, to be determined later, and let y=f(z).

and revise (i)' to

(i)'' (for all a > 0) (there exists N) (for all n > N) |f(x_n)-f(z)| < a.

and (ii)' to

(ii)'' (for all a > 0) (there exists c> 0) (for all w) |w-z|< c implies |f(w)-f(z)| < a.

7. It is not immediately obvious how we should choose z, but it is clear that if our matching of the ends of (i)' and (ii)' is to work, then we need w to be renamed x_n.

(ii)''' (for all a > 0) (there exists c> 0) (for all x_n) |x_n-z|< c implies |f(x_n)-f(z)| < a.

8. A standard piece of logic tells us that we may replace [for all x_n P(x_n)] by [for all n P(x_n)], so let us do this.

(ii)'''' (for all a > 0) (there exists c> 0) (for all n) |x_n-z|< c implies |f(x_n)-f(z)| < a.

9. Let us apply the for-all-removing move to (i)'' by writing

Let a> 0.

so that (i)'' now becomes

(i)''' (there exists N) (for all n > N) |f(x_n)-f(z)| < a.

10. Of course, now we want to specialize (ii)'''' to this choice of a:

(ii)''''' (there exists c> 0) (for all n) |x_n-z|< c implies |f(x_n)-f(z)| < a.

11. When a hypothesis begins "there exists" then the standard move is to fix one. So pick a c that works. This leaves us with

(ii)'''''' (for all n) |x_n-z|< c implies |f(x_n)-f(z)| < a.

12. This tells us that in (i)''' it is enough to prove [|x_n-z|< c] rather than [|f(x_n)-f(z)| < a]. In other words, we can rewrite what we need to prove as

(i)'''' (there exists N) (for all n > N) |x_n-z|< c.

13. It looks as though we have gone as far as we can without using our other hypothesis, which was

(iii) (there exists u) (for all d> 0) (there exists M) (for all m> M) |x_m-u|< d.

14. Again, a hypothesis starts with "there exists", so let us remove it by picking such a u. This gives

(iii)' (for all d> 0) (there exists M) (for all m> M) |x_m-u|< d.

15. Now we would like to match with (i)''''. Looking at how the two statements end, it is clear that we should specialize d to c and rename m as n and M as N. That gives

(iii)'' (there exists N) (for all n> N) |x_n-u|< c.

16. We are still free to choose z. (Instead of remembering this, we could have put "there exists z" at the front of (i)''''.) It is now blindingly obvious that we should choose it to be u. The match between (iii)'' and (i)'''' is then perfect, and the proof is discovered.

Remark:

The discovery is much easier if one is asked to prove instead the more specific statement

Let f be a continuous function from R to R and let (x_n) be a sequence that converges to x. Then (f(x_n)) converges to f(x).

since then one is not asked to work out what (f(x_n)) converges to. Probably I made heavy weather of this above, and could have found the limit more quickly by introducing hypothesis (iii) at an earlier stage. If true, this merely illustrates that there is not always a single natural way to discover a proof. A computer program based on the above ideas could be set up so that it ran two or three times, introducing hypotheses in different orders, and then it could decide which of the resulting approaches was the most economical.

An alternative approach

Just to show what I mean about alternative approaches, here is a slightly sketchier indication of what a simple algorithm might do if it was asked to find some consequence of (ii) and (iii) without really knowing where it was going. Here, for convenience, are the statements again.

(ii) (for all b > 0) (for all z) (there exists c> 0) (for all w) |w-z|< c implies |f(w)-f(z)| < b.

(iii) (there exists u) (for all d> 0) (there exists M) (for all m> M) |x_m-u|< d.

1. As usual we fix a u and change (iii) to

(iii)' (for all d> 0) (there exists M) (for all m> M) |x_m-u|< d.

2. Our best chance for matching anything seems to be to fit [|w-z|< c] together with [|x_m-u|< d]. It therefore seems a good idea to specialize z by setting it equal to u. We'd also like c and d to represent the same parameter, so we should specialize (iii)' to the case d=c. This we cannot yet do as c is still a bound variable, and it depends on b. So to deal with this problem first remove "for all b> 0" from the beginning of (ii)' by declaring that b is to be chosen later. Doing this and specializing z we get

(ii)' (there exists c> 0) (for all w) |w-u|< c implies |f(w)-f(u)| < b.

3. Next we do what we always do when a hypothesis starts with "there exists": fix a c with the given property and write

(ii)'' (for all w) |w-u|< c implies |f(w)-f(u)| < b.

4. Now that we have a c we are ready to specialize d, converting (iii)' into

(iii)'' (there exists M) (for all m> M) |x_m-u|< c.

5. To make the match between [|x_m-u|< c] and [|w-u|< c] perfect we would like to replace "(for all w) P(w)" by "(for all n) P(x_n)". Doing this gives us

(ii)''' (for all n) |x_n-u|< c implies |f(x_n)-f(u)| < b.

6. Clearly we would be better off if m was renamed as n in (iii)'', giving

(iii)''' (there exists M) (for all n> M) |x_n-u|< c.

7. It is not hard (even for a computer) to insert (ii)''' into (iii)''' and deduce that

(iv) (there exists M) (for all n> M) |f(x_n)-f(u)| < b.

8. Now we are very close to what we want, which is

(i) (there exists y) (for all a > 0) (there exists N) (for all n > N) |f(x_n)-y| < a.

Obviously we should let y=f(u). What was u? Well, it was something that we were told existed (and in fact it is the limit of the x_n). As always, the proof starts by fixing an arbitrary a. Earlier we promised to choose b at some point. Clearly it should be chosen to be a. Finally, rename M as N and we are done.