Why easy analysis problems are easy

I am in the process of revising and greatly expanding this discussion. For now I will leave up what I had before, but soon I hope to have a better worked out argument for the following belief: the reason experienced mathematicians can come up with certain sorts of arguments in elementary analysis very easily is that these arguments can be generated by a simple algorithm. For now I have some examples of how such an algorithm would run. Later I will try to give a general description of what it would be doing, though a description of sorts is already implicit in my surrounding remarks.

The examples

If f is continuous then the inverse image under f of any open set is open.

If the inverse image under f of any open set is open, then f is continuous.

If f is continuous then it maps convergent sequences to convergent sequences.

A composition of continuous functions is continuous.

Why easy analysis problems are easy (old version)

If you have recently met epsilons and deltas for the first time, then you may find the problems you are asked to solve on examples sheets very hard. On the other hand, you will notice that your lecturers, supervisors etc. do not find them hard at all. Why is this? Partly, of course, because they have had much more practice. But what have they learnt from this practice? In this page I hope to give the answer, by describing a completely mechanical method for tackling analysis problems. I don't quite mean a method for solving them - after all, people make careers out of research in analysis - but rather, a method for stripping away the layers of quantifiers (things like, `for every x' or `there exists y') and getting to the heart of what one is actually asked to prove. Very often, this will turn out to be something extremely simple such as showing that for every positive integer N there is another positive integer M whose square root is at least ten times as big as N.

Needless to say, reading this will not make the problems suddenly easy, since even a totally mechanical procedure has to be learnt, and practised until it becomes fluent. However, I hope that it may help to speed up the process for some people, and show that you don't have to be a genius to understand analysis.

Because this is an HTML document, some of my notation will not be standard. In particular, I shall write (Ax) to mean `for all x' and (Ex) to mean `there exists x such that'. Also, instead of epsilon and delta I shall use the letters e and d. I write => for `implies' and =/=> for `does not imply'.

Let me begin with a brief description of the method in general, and then illustrate it with a few examples.

How to get to the heart of an analysis exercise.

Step 1.

Rewrite the question in terms of the basic definitions. For example, if you are asked to prove that a function f is not continuous, write out the definition (in terms of epsilons and deltas) of f being continuous.

Step 2.

If you are asked to prove the negation of what you have written down in Step 1, then apply a totally mechanical process (described below) that replaces a statement involving quantifiers by its negation.

Step 3.

You should now have a statement that looks something like this

(Av) (Ew) (Ax) (Ey) (Az) P(v,w,x,y,z)

or perhaps this

(Ev) (Aw) (Ex) (Ay) (Ez) P(v,w,x,y,z)

where P(v,w,x,y,z) is some statement involving the variables v,w,x,y and z. Of course, it is not necessary that the number of variables should be five, or that `there exists' should alternate with `for all'.

How do you get your head round such a statement? My advice is not to try to grasp it all at once. However, you should certainly examine the statement P(v,w,x,y,z) just to get some sort of a feel for what it says about the relationship between v,w,x,y and z.

Step 4.

Suppose, though, that you understand the statement P(v,w,x,y,z) and become bewildered only when it is preceded by all those quantifiers. What then? Surprisingly, it is possible to begin writing out a solution without having the faintest idea what is going on. In fact, let me illustrate this by starting to write out a proof of the first statement above, even though I do not know what the statement P(v,w,x,y,z) is. It goes like this.

Let v be arbitrary. [Of course, typically, v will not be totally arbitrary, but will be something like a positive real number, in which case instead of (Av) one would have written (A v > 0). However, the main point is that we have no control over v, since our argument is meant to be valid for every v.]

Given v, let us choose w to be f(v). [Here, the point is that we have to demonstrate that a suitable w exists, but it is allowed to depend on v. Thus, in effect, we must come up with a clever choice of a function f. If we cannot see in advance what would be a good choice for f, then we could continue the above sentence with the words `where f is a function that we shall specify later'.]

Now let x be arbitrary.

Let y be g(v,x). [There is no harm in letting y depend on w as well, but since w=f(v), this does not add generality. Once again, we can if we wish defer the choice of g until we have understood the statement P(v,w,x,y,z) better.]

But for every z, it can be shown that P(v,f(v),x,g(v,x),z). Indeed, ... [and here it does at last become important to know what P actually is]. Since v and x were chosen arbitrarily, the result is proved.

Notice that what we were required to do is completely clear and unambiguous: it was to find functions f (of v) and g (of v and x) such that P(v,f(v),x,g(v,x),z) is true for every choice of v, x and z. Such a function must be specified for each `there exists' in the statement, and it is allowed to depend only on the preceding variables. For example, if we are trying to prove the second statement (the one that begins with `there exists') then we must find a constant V (which can be thought of as a function of no variables), a function f (of w) and a function g (of w and y) with the property that P(V,w,f(w),y,g(w,y)) is true for every w and y.

The method illustrated.

Example 1.

Let me begin with a very simple example. I wish to prove that the sequence (1,0,1,0,1,0,...) does not converge. Let me set a_n to be 1 if n is odd and 0 if n is even. Then the statement that a_n converges to a can be written

(A e > 0) (E N) (A n > N) |a_n-a| < e

To say that a_n converges is to say that there exists a such that it converges to a. Thus, it is to say

(E a) (A e > 0) (E N) (A n > N) |a_n-a| < e

We want to prove the negation of this. In other words, we want to prove that

(A a) (E e > 0) (A N) (E n > N) |a_n-a|>e

From the general considerations above, we are trying to find a function f (of a) and g (of a and N) with the property that f(a) > 0, g(a,N) > N and |a_g(a,N)-a|>f(a) for every a and every N.

You may immediately see how to do this. In that case, pretend that it is hard for a moment, because when we look at more complicated examples it will help to have thought about how we tackled the simpler ones. Suppose, therefore, that you find it hard to think of a suitable function f. Let us ignore the difficulty for now and imagine that we have specified f. That means that we can restrict our attention to g. In order to specify g, we must say what value it takes for any given a and N. So let a and N be arbitrary. We are trying to choose m=g(a,N) with the property that m > N and |a_m-a|>f(a). Now a_m is 1 if m is odd and 0 if m is even. We are trying to get |a_m-a| to be at least as big as f(a), with m > N. What is the best choice for m? Well, we may as well make the difference |a_m-a| as big as possible. Since a_m only takes the values 0 and 1, it is easy to see how to do this. If a<1/2, then choose m > N to be odd, so that a_m=1. If a > 1/2, then choose m > N to be even, so that a_m=0. In either case, the difference |a_m-a| is at least 1/2. Hence, a possible choice for f is the constant function 1/2. (If you like to live dangerously, you could choose f to be as large as possible. In that case you would set f(a) to be a when a>1/2 and 1-a when a < 1/2.)

Now let us see how such a proof would actually be written out, since it is not usual to talk about functions f and g as above. I would write something like this.

Let a be arbitrary and set e=1/2. Given any natural number N, let m be an even number greater than N if a>1/2, and otherwise an odd number greater than N. In the first case, |a_m-a|=a>1/2 and in the second case |a_m-a|=1-a>1/2. In both cases, we are done.

Example 2

Define a much-loved function f by setting f(x)=sin(1/x) when x is not zero and f(0)=0. This function is not continuous at zero, but how can we prove it?

Well, the definition of f being continuous at x is

(A e > 0) (E d > 0) (A y) |y-x| < d => |f(y)-f(x)| < e

When x=0 and f(x)=0, as they are in our problem, this becomes

(A e > 0) (E d > 0) (A y) |y| < d => |f(y)| < e

Negating this sentence, we obtain the sentence

(E e > 0) (A d > 0) (E y) |y| < d =/=> |f(y)| < e

Now to say (in a mathematical context) that a statement p does not imply a statement q is to say that p is true and q is false. Hence, we can rewrite the last sentence as

(E e > 0) (A d > 0) (E y) |y| < d and |f(y)|>e

Now we must think a little bit, because we have got to the stage where we must choose something. In this case, our first task is to find a suitable e. As in the first example, let us proceed as though we do not have any idea what to choose. Again, the technique is to pretend we have chosen it, call it e, and remember that we can `cash it in' later (that is, fix on an actual value, such as 20, or 0.005) when we have a clearer idea of what we need it for.

The next step is to write, `Let d > 0 be arbitrary'. We are then trying to find y with two properties: |y|< d and |f(y)| > e. We have no control over d, so we must be able to find arbitrarily small values of y with the property that |sin(1/y)| is at least e. Since sin x lies between -1 and 1, it is clear that there is no hope of achieving this unless e<1. If e=1, then we are trying to find y with |y| < d and |sin(1/y)|>1, which can happen only if sin(1/y) is either 1 or -1. This, however, is possible. Indeed, all we have to do is let 1/y be an odd multiple of pi/2. Can we do this with |y| being smaller than d? Yes of course. Just choose n so large that 1/(n+1/2)pi is less than d and set this to be y. Notice that what we have done is choose a constant, 1 (the value of e) and a function of d (the value of y, which one could take, for example, as the reciprocal of the smallest (n+1/2)pi that exceeds 1/d).

The final proof now looks like this. Let e=1. Given d > 0, let n be the smallest integer such that (n+1/2)pi > 1/d and let y=1/(n+1/2)pi. Then sin(1/y)= 1 or -1, so |sin(1/y)|>1. Since d was arbitrary, we have shown that f is not continuous at zero.