Two definitions of `definition'

What does it mean to define a mathematical concept? As any experienced mathematician is aware, there is a small subtlety here, but it is not always made explicit, which is why I wish to highlight it.

Here are a few examples of mathematical definitions.

1. A positive integer is prime if it has exactly two factors.

2. A group is a set together with an associative binary operation such that there is an identity element and every element has an inverse.

3. Let X be a metric space. A subset Y of X is open if for every y in Y there exists δ > 0 such that x is in Y for every x in X with d(x,y) < δ.

4. A function f:X--> Y is an injection if no element of Y has more than one preimage. That is, f(x)=f(y) ==> x=y.

5. Let A be a set of positive integers and for each n let d_n=n^-1|A intersect {1,2,...,n}|. That is, d_n is the proportion of numbers up to n that belong to A. If d_n tends to a limit d as n tends to infinity then A is said to have density d. The lim sup of the d_n is called the upper density of A.

6. For html reasons write E for the empty set. Then the number 4 is the set {E,{E},{E,{E}},{E,{E},{E,{E}}}}.

7. Let x and y be mathematical objects. The ordered pair (x,y) is the set {x,{x,y}}.

8. A real number is a partition of the rational numbers into two non-empty sets A and B such that every element of A is less than every element of B.

9. A function from A to B is a subset F of the Cartesian product AxB with the property that for every a in A there is exactly one b in B such that (a,b) is in F.

10. The function f(x)=sin(x) is the function f:R--> R defined by f(x)=x-x³/3!+x⁵/5!-... .

Notice that these definitions are of two kinds. Definitions 1-5 are all straightforward, in the sense that a mathematical word is introduced and the definition tells us what it means. Definitions 6-10 take words we thought we understood already and redefine them in peculiar ways. If somebody says, `A function from A to B is a subset of the Cartesian product AxB with certain properties,' it is tempting to reply, `No it isn't.' Surely a function is more like a procedure for taking elements of A and unambiguously associating with them elements of B. Similarly, a real number isn't a partition of the rationals - it's more like a single object, a position on the number line. As for the number 4, it's just the positive integer that comes after 3, which comes after 2, which comes after 1, which is the first one. And isn't sin(x) something to do with trigonometry? Maybe it so happens that its value is given by a nice power series, but that is surely a theorem, rather than the definition, which is the ratio of the lengths of the opposite side and the hypotenuse of an appropriate triangle.
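To see just how peculiar definition 6 is, it can be imitated in a few lines of Python. This is only a sketch, and it rests on an assumption: that Python's frozenset is an adequate stand-in for a finite set whose elements are themselves finite sets. The names successor, E and literal_four are chosen here for illustration and are not standard.

```python
def successor(n):
    """Von Neumann successor: n together with n itself as one extra element."""
    return n | frozenset({n})

E = frozenset()            # the empty set, written E as in definition 6
zero = E
one = successor(zero)
two = successor(one)
three = successor(two)
four = successor(three)    # "the number that comes after 3"

# The set written out literally in definition 6.
literal_four = frozenset({
    E,
    frozenset({E}),
    frozenset({E, frozenset({E})}),
    frozenset({E, frozenset({E}), frozenset({E, frozenset({E})})}),
})

# Both descriptions give the same set, and it has four elements: 0, 1, 2 and 3.
same_set = (four == literal_four)
elements_are_predecessors = (four == frozenset({zero, one, two, three}))
size = len(four)
```

What one actually uses in the sketch is the successor operation, not the nested braces, which is rather the point of the complaint above.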

What is the point of definitions like 6-10? The answer is that they are artificial constructions which are useful from the point of view of providing mathematics with solid foundations. However, that is largely where their usefulness ends, and one should not make the silly mistake of thinking that they somehow reveal the `true essence' of the concept being defined. This would be too obvious to be worth mentioning were it not for the fashion of introducing these definitions as though they were at last uncovering such an essence. When a lecturer says, `log(x) is the integral of t^-1 from 1 to x,' and follows it up with a proof that log(x), thus defined, has all the familiar properties, this should be understood as an abbreviated way of saying something like, `For every positive real number c there is at most one continuous function L from the positive reals to R such that L(ab)=L(a)+L(b) for every a,b and such that L(c)=1. The integral I(x) of t^-1 from 1 to x has the first property, and increases with x. Hence by the intermediate value theorem there is a unique number e such that I(e)=1.'
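The two facts about I(x) used in that unpacked statement can at least be checked numerically. The following Python sketch proves nothing; it merely illustrates the claims under stated assumptions. The integral is approximated by the midpoint rule, and the function names, the number of steps, the starting interval [2,3] and the tolerance are all choices made here for illustration.

```python
import math

def I(x, steps=20_000):
    """Approximate the integral of 1/t from 1 to x (x > 0) by the midpoint rule."""
    h = (x - 1.0) / steps
    return h * sum(1.0 / (1.0 + (k + 0.5) * h) for k in range(steps))

def solve_I_equals_one(lo=2.0, hi=3.0, tol=1e-8):
    """Bisection for the number e with I(e) = 1.

    Relies on I being continuous and increasing, with I(lo) < 1 < I(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if I(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The first property: I(ab) = I(a) + I(b), up to the error of the approximation.
a, b = 2.5, 1.7
additive_gap = abs(I(a * b) - (I(a) + I(b)))

# The number e singled out by I(e) = 1, compared with the library constant.
e_estimate = solve_I_equals_one()
gap_from_math_e = abs(e_estimate - math.e)
```

Both gaps should be tiny, limited only by the accuracy of the midpoint rule and the bisection tolerance.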

To return to the examples above, what would be a natural way to define the ordered pair (x,y)? It isn't all that easy to say. We would like to say that it is the object you get by putting x first and y second, or that it is the set {x,y} `except that order matters'. Unfortunately, these attempts are a little vague. Another possibility, which is perhaps a little artificial, but less so than {x,{x,y}}, is to say that (x,y) is the function from {1,2} to {x,y} defined by f(1)=x and f(2)=y. The trouble with that is that later we may want to define functions in terms of Cartesian products and Cartesian products in terms of ordered pairs.

Despite these difficulties, we have no difficulty thinking about ordered pairs, and moreover had no difficulty thinking about them long before anybody told us the `correct' definition (at least if my experience is anything to go by). Why is this? The answer is that to use ordered pairs all one needs to know about them is the following axiom:

(x,y)=(z,w) if and only if x=z and y=w.

In a sense, this axiom is the true definition of an ordered pair. It would be nice if one could use it to produce a definition of the first type - something like that an ordered pairing is a class of objects (x,y) where x and y can be anything, such that the ordered-pair axiom holds. But mathematical etiquette demands that we say what those objects are, or at least prove that the axiom doesn't lead to inconsistencies. In the end, in order to do this one is forced to come up with some non-canonical construction that satisfies the axiom. But others could have done just as well. If I felt like it, I could define the ordered pair (x,y) to be {x,{x,{y}}}. (Proof: Let {x,{x,{y}}}={z,{z,{w}}}. x doesn't equal {x,{y}} or it would be an element of itself. Similarly for z and w. It is not possible for x to equal {z,{w}} and z to equal {x,{y}} or x is an element of an element of itself. So x must equal z and {x,{y}} must equal {z,{w}}={x,{w}}. It follows that either {y}={w} and hence y=w, or x={w}. In the second case, {x,{w}} has only one element, which then forces x to equal {y} as well, so again {y}={w} and hence y=w.)
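The interchangeability of these constructions can be tested mechanically on a small scale. The sketch below assumes that Python's frozenset may stand in for a finite set; since a frozenset cannot contain itself, the appeal to sets not being elements of themselves is built in. The helper names and the four-element universe are chosen here for illustration. An exhaustive check over so small a universe is evidence rather than a proof, but it does show the plain set {x,y} failing where the others succeed.

```python
from itertools import product

def fs(*elements):
    """Shorthand for a finite set whose elements are themselves finite sets."""
    return frozenset(elements)

def pair_a(x, y):       # the construction {x,{x,y}} of definition 7
    return fs(x, fs(x, y))

def pair_b(x, y):       # the alternative {x,{x,{y}}} proved correct above
    return fs(x, fs(x, fs(y)))

def kuratowski(x, y):   # the usual textbook choice {{x},{x,y}}
    return fs(fs(x), fs(x, y))

def unordered(x, y):    # the plain set {x,y}, which forgets the order
    return fs(x, y)

def satisfies_pair_axiom(pair, universe):
    """Check that (x,y)=(z,w) if and only if x=z and y=w, for all x,y,z,w in universe."""
    return all(
        (pair(x, y) == pair(z, w)) == (x == z and y == w)
        for x, y, z, w in product(universe, repeat=4)
    )

E = fs()
universe = [E, fs(E), fs(fs(E)), fs(E, fs(E))]

results = {
    "{x,{x,y}}": satisfies_pair_axiom(pair_a, universe),
    "{x,{x,{y}}}": satisfies_pair_axiom(pair_b, universe),
    "{{x},{x,y}}": satisfies_pair_axiom(kuratowski, universe),
    "{x,y}": satisfies_pair_axiom(unordered, universe),
}
```

Nothing that uses ordered pairs afterwards needs to know which of the three successful constructions was chosen.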

Similar remarks can be made about all of definitions 6-10. In each case there are some properties with which we are familiar from our `naive' experience of some concept. In order to reassure ourselves that we are being rigorous, we look for a construction that has those properties and which is isomorphic to any other construction that has the properties. Usually this can be done in many ways, and no way is more correct than any other - though it may have advantages of efficiency and mathematical neatness. Sometimes there are several natural constructions. For example, sin(x) can be defined as a power series, or as the inverse function of arcsin (itself defined via an appropriate integral) or as the unique solution to a certain differential equation etc. etc. Whichever definition one chooses, one must then prove that it has the properties one wants - such as being given by a power series, getting back to itself after you differentiate four times, being periodic with period some number (defined to be 2 pi), satisfying the addition law sin(a+b)=sin(a)cos(b)+cos(a)sin(b) (I assume that one defines cos at the same time) and so on.
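As a small illustration of `proving that it has the properties one wants', here is a numerical sanity check in Python that partial sums of the power series obey the addition law. It is a sketch under assumptions made here, namely that 30 terms are ample for arguments of modest size and that the sample points are representative; it is no substitute for the proof.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of x - x^3/3! + x^5/5! - ... with the given number of terms."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: -x^2 / ((2k+2)(2k+3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(x, terms=30):
    """Partial sum of 1 - x^2/2! + x^4/4! - ... with the given number of terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: -x^2 / ((2k+1)(2k+2)).
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

samples = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.0]

# Largest violation of sin(a+b) = sin(a)cos(b) + cos(a)sin(b) over the samples.
addition_law_gap = max(
    abs(sin_series(a + b)
        - (sin_series(a) * cos_series(b) + cos_series(a) * sin_series(b)))
    for a in samples
    for b in samples
)

# Agreement with the library sine, however the library chooses to compute it.
library_gap = max(abs(sin_series(x) - math.sin(x)) for x in samples)
```

Both gaps should be at the level of floating-point rounding error.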

Conclusion

It would be asking too much to suggest a change to normal mathematical parlance, but it might have been better if there had been different words for the two kinds of definition. However, since they are usually easily distinguishable, this is not a serious problem.

When meeting a definition of the second, artificial kind, the kind that one might call a `construction-definition', one should not accept it passively. Instead, one should pay careful attention to the basic properties possessed by whatever has just been constructed/defined, because it is these that are interesting rather than the definition itself. In a typical lecture course or textbook these properties will be found in the easyish propositions that immediately follow the definition.

Additional remarks and examples

I said that the two kinds of definition are `usually' easily distinguishable, but sometimes the distinction is blurred. A good example of this is the definition of homology groups. We do not start an algebraic topology course with a preconceived notion of homology, and are therefore inclined to accept whatever definition is thrown at us, however complicated it might seem. As a result, the subject seems to many people to be difficult.

It may not solve everything, but one way to navigate oneself through these difficulties is to follow the suggestion above and focus on the properties that homology groups are supposed to have. In particular, when calculating homology groups, far more useful than the definition will usually be the cluster of theorems that follow it - things like the Mayer-Vietoris sequence - that tell you the homology groups of an object X if it is built out of other objects whose homology groups you already know.

In fact, there is a system of axioms, known as the Eilenberg-Steenrod axioms, which tell you exactly what the properties are that are needed from homology, and, just as with definitions 6-10 above, there are many equivalent ways of satisfying those axioms. What is interesting about this situation is that the discovery of these axioms was a relatively late development in topology (it took place in the 1940s) and had a major clarifying effect on the subject.

Here are three more examples of artificial definitions, which are better thought of as constructions with certain properties.

11. An ordinal is a transitive set that is well-ordered by inclusion. (See here for a discussion of ordinals.)

12. A complex number is an ordered pair (a,b) of real numbers. (a,b)+(c,d)=(a+c,b+d) and (a,b)(c,d)=(ac-bd,ad+bc).

13. The hyperbolic plane is the set of all complex numbers with positive real part, with the Riemannian metric $|dz|/x$, where $x$ is the real part of $z$. (The main property of interest here is the large group of symmetries. It is usually made clear that this is a construction rather than a definition - one would normally call it the half-plane model of the hyperbolic plane rather than the hyperbolic plane itself. See here for a further discussion of this point.)
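Definition 12 translates almost word for word into code, which makes it a convenient place to see the difference between a construction and the properties it is built to have. In the Python sketch below, pairs are modelled as tuples and the names add, mul and i are chosen here for illustration. The properties checked are the ones one actually cares about: that the pair (0,1) squares to (-1,0), and that pairs of the form (a,0) multiply like real numbers.

```python
def add(p, q):
    """(a,b)+(c,d) = (a+c, b+d), as in definition 12."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """(a,b)(c,d) = (ac-bd, ad+bc), as in definition 12."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
i_squared = mul(i, i)                       # should play the role of -1

# Pairs (a,0) behave exactly like the real numbers a.
real_product = mul((2, 0), (3, 0))

# The construction agrees with Python's built-in complex type on a sample.
p, q = (1, 2), (3, -1)
agrees_with_builtin = (
    complex(*mul(p, q)) == complex(*p) * complex(*q)
    and complex(*add(p, q)) == complex(*p) + complex(*q)
)
```

Python's own complex type stores its two real components in whatever way its implementers found convenient, and nobody using it needs to know how.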

As has probably occurred to most people who have read this far, it is possible to classify construction-definitions further. One could defend the power-series definition of sin(x) by saying that, for each x, it picks out the unique real number that it is supposed to pick out. If you do that in a different way you don't get a different object, but just a different way of describing the same object. By contrast, if you decided to define a real number as an equivalence class of Cauchy sequences of rationals, you would be giving a genuinely different definition.

This distinction, valid though it may be, does not alter the point that when we work with functions like sin(x) and log(x) it is their basic properties that we tend to use. I don't care that sin(2) is approximately 0.90929743, but I do care that sin(a+b)=sin(a)cos(b)+cos(a)sin(b). It is because the power series has these properties that we know that it picks out the values it is meant to.

A logician might explain the distinction between the two sorts of definition as follows: the first kind specifies a system of axioms whereas the second provides a model for a system of axioms that is already given.