If f is continuous then the inverse image under f of any open set is open.

A formal statement of the result to be proved.

Let X and Y be metric spaces and let f be a continuous function from X to Y. Then f^-1(U) is open for every open subset U of Y.

Finding the proof.

1. The conclusion we want is

(for every open subset U of Y) f^{-1}(U) is open

A standard move, which doesn't do much except make it easier to focus on the problem, is to say the following.

Let U be an open subset of Y.

Once we have done that, we regard U as fixed, but since it was arbitrary, if we can prove the result for "that U" then we have proved the result.

2. We can now revise the conclusion we want. It has become

f^-1(U) is open

3. By the definition of open sets, this expands to

(for all x in f^-1(U)) (there exists a > 0) (for all y in X) d(x,y)< a implies y is in f^-1(U)

4. Applying the for-all-removing standard move again, let us write a second line of proof.

Let x belong to f^-1(U).

Now x is regarded as fixed, and we have the additional hypothesis to work with, that x is in f^-1(U).

5. What we would now like to prove is

(there exists a > 0) (for all y in X) d(x,y)< a implies y is in f^-1(U)

6. Using the definition of f^-1(U) we can rewrite our hypothesis about x more simply as

f(x) is in U.

7. Similarly, we can rewrite the desired conclusion as

(i) (there exists a > 0) (for all y in X) d(x,y)< a implies f(y) is in U

8. Since there is no obvious further simplification, it is time to use more information that we are given in the problem. We have not used the fact that f is continuous, or that U is open. Let us write out these statements formally.

(ii) (for all u in X) (for all b > 0) (there exists c > 0) (for all v in X) d(u,v) < c implies d(f(u),f(v)) < b.

(iii) (for all z in U) (there exists d > 0) (forall w in Y) d(z,w) < d implies w is in U.

We would like to relate this to our desired conclusion using VERY simple techniques. Two of the most useful are (1) specializing a known fact that begins "(for all t)" to a particular value of t, and (2) changing the name we give to a bound variable when stating a known fact.

9. Can we use technique (1)? To do so, we would like to choose values for u, b or z in (ii) or (iii) in a way that might help prove (i).

10. Since a proof of (i) should end with a demonstration that f(y) is in U (subject to certain conditions of course), we should examine (ii) and (iii) for anything that might provide us with such a statement. Our best chance of doing this seems to be to use (iii) since it ends by showing that something is in U. Since w is a bound variable, the obvious thing to do is to try to replace it with f(y).

11. A first attempt yields us

(for all z in U) (there exists d > 0) (for all f(y) in Y) d(z,f(y)) < d implies f(y) is in U.

12. If P is any statement about elements of Y, then (for all f(y) in Y) P(f(y)) means (for all y in X) P(f(y)). Applying this, we obtain the statement

(iii)' (for all z in U) (there exists d > 0) (for all y in X) d(z,f(y)) < d implies f(y) is in U.

13. This statement now ends correctly, so now we should concentrate on the next bit in. In order to deduce (i) we'd like to replace the statement [d(z,f(y)) < d] by the statement [d(x,y) < a]. This we will be allowed to do if d(x,y) < a implies that d(z,f(y)) < d.

14. Do we have available any statement that contains anything of the form

d(x,y) < a implies d(z,f(y)) < d?

Yes we do. After a very short search we see that (ii) ends with a statement of that form. So this suggests that we should set z to be f(x) in (iii)'. We are allowed to do this only if f(x) is in U, but fortunately it is (step 6). We are also free to choose u, so u should be x (not y because y is not a free variable). Finally, the bound variable v should be renamed y.

15. Let's make these substitutions and see what we get. For good measure, we'll remind ourselves of (i).

(i) (there exists a > 0) (for all y in X) d(x,y)< a implies f(y) is in U

(ii)'' (for all b > 0) (there exists c > 0) (for all y in X) d(x,y) < c implies d(f(x),f(y)) < b.

(iii)'' (there exists d > 0) (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

16. Just as there is a for-all-removing move, so there is a there-exists-removing move, consisting of taking a statement (there exists t) P(t) and fixing a t with that property. Logically it doesn't do much, but let us apply the move to (iii)'' anyway. So we write

Fix a d> 0 with the property that

(iii)''' (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

17. Is there an obvious potentially useful choice for b? Well, recall that we wanted to replace [d(z,f(y)) < d] by [d(x,y) < a]. So far, we've replaced it by [d(f(x),f(y)) < d] and our strategy was to make that a consequence of [d(x,y) < a]. Can we make it a consequence of anything? Yes, if in (ii)'' we set b to be d.

18. This gives us

(ii)''' (there exists c > 0) (for all y in X) d(x,y) < c implies d(f(x),f(y)) < d.

19. Putting together (ii)''' and (iii)''' gives us

(there exists c > 0) (for all y in X) [d(x,y) < c implies d(f(x),f(y)) < d] AND [d(f(x),f(y)) < d implies f(y) is in U]

20. Recalling our strategy, we put the two implications together and obtain

(there exists c > 0) (for all y in X) d(x,y) < c implies f(y) is in U

21. Finally, we rename c as a, and obtain (i). The proof is discovered.

Writing up the proof

We would like to extract from the above process a reasonably concise write-up of the proof, one that wouldn't look out of place as the answer given if this were a question on an examples sheet. We already have a couple of lines to start us off:

(a) Let U be an open subset of Y.

(b) Let x be an element of f^-1(U).

In order to produce the rest of the write-up, what we shall do is pick out a few of the statements from steps 1-20, put them in a natural order, and leave out the thoughts that led us to choose certain specializations, useful names for bound variables and so on. The thinking behind the natural order is that, whereas above we spent some of our time working backwards from the conclusion, now we will present the proof as though we knew all along how it would go. This may not determine the order uniquely (for example, if we want to deduce a statement P from statements Q and R, then it may well not matter in which order we write Q and R). What we have is a partial ordering of the useful statements which we shall extend to a linear order.

A convention of normally written out proofs is that we do not need to write out basic definitions in full. So, for example, it will not be necessary to write (iii) in the proof. What about (iii)'? That is such a simple deduction that it too is not really needed - of course it will be true as it is simply the definition of "open" restricted to elements f(y) of the image of f. So we can probably get away with missing those two out and writing (iii)''. But recall from step 14 that to get from (iii)' to (iii)'' we needed to use the fact that f(x) is in U. So let us write out a further potential chunk of proof.

(*) Then f(x) is in U.

(**) Hence, (there exists d > 0) (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

A couple of improvements suggest themselves. Since (*) is not really a deduction from (b) it might be better to write

Secondly, although we didn't want to write out in full what it means for a set to be open, we did use the openness and should remind the reader of that. So we could write instead

(**) Hence, as U is open, (there exists d > 0) (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

If (**) does not end up immediately following (c), then we may wish to revise "Hence, as U is open," to "Since f(x) is in U and U is open." However, let's guess that in fact (**) is (d).

We therefore have

(a) Let U be an open subset of Y.

(b) Let x be an element of f^-1(U).

(d) Hence, as U is open, (there exists d > 0) (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

Now it is time for (ii), or one of its descendants, to put in an appearance. For similar reasons to those we applied when considering (iii), we end up using (ii)''' and writing

(e) But f is continuous, so (there exists c > 0) (for all y in X) d(x,y) < c implies d(f(x),f(y)) < d.

Notice that in between steps (d) and (e) we changed d from a bound variable to something we imagine as fixed. This is often done in write-ups and doesn't seem to matter.

The jump from 19 to 20 was also pretty simple, so instead of writing out 19, we can now imagine c as fixed and after (d) and (e) safely write

(f) Hence, if d(x,y) < c, f(y) is in U

That was another piece of writing that is sloppy in a common way - strictly speaking we ought to have written "for all y" but it was implicit.

We would then like to revise line (f) to make it clearer that we have shown that f^-1(U) is open. So we expand to

(f) Hence, if d(x,y) < c, f(y) is in U and hence y is in f^-1(U), as was wanted.

The write-up in full

(a) Let U be an open subset of Y.

(b) Let x be an element of f^-1(U).

(d) Hence, as U is open, (there exists d > 0) (for all y in X) d(f(x),f(y)) < d implies f(y) is in U.

(e) But f is continuous, so (there exists c > 0) (for all y in X) d(x,y) < c implies d(f(x),f(y)) < d.

(f) Hence, if d(x,y) < c, f(y) is in U and hence y is in f^-1(U), as was wanted.