Euler’s Number

Leonhard Euler (1707 – 1783)

Part of the Maths and Science archive

Euler’s Number. The name is evocative. Leonhard Euler was one of the greatest Mathematicians and certainly one of the most prolific. As was typical in his time, Euler was a polymath, also making contributions to Astronomy, Engineering, Optics and what we would now call Physics. He produced deep results in a range of Mathematical fields and was innovative in both his ideas and terminology; being the first to introduce the fundamental concept of a function, to use the modern notation of trigonometric functions and to employ the Greek letter \Sigma to denote summation. This brief article cannot possibly do justice to the measure of his work in Mathematics, but the words of renowned contemporary Laplace provide some idea of the respect in which he was held:

Lisez Euler, lisez Euler, c’est notre maître à tous.[1]

So if a number is named after Euler, then it is likely to be pretty important [2]. This is indeed the case: Euler’s Number is probably the most important number in Mathematics, and I aim to give a flavour of why in this article. In order to do this, I need to lay some groundwork, some of which will be in areas that Euler himself advanced: functions, limits and differential calculus. But first some background.

Hail to e blithe constant [3]

Euler's Number

Euler’s number is written as e. Its decimal representation is a non-terminating, non-repeating number which begins:

2.718\hspace{2mm}281\hspace{2mm}828\hspace{2mm}459\hspace{2mm}045\hspace{2mm}235\hspace{2mm}360\hspace{2mm} 287\hspace{2mm}471\hspace{2mm}352\hspace{2mm}662\hspace{2mm}497\hspace{2mm}\ldots

Non-terminating means that the numbers in the decimal representation of e go on for ever. Non-repeating means that they never settle down to any pattern (e.g. a counter-example would be the repeating sequence: 461\hspace{2mm}874\hspace{2mm}618\hspace{2mm}746\hspace{2mm}187\ldots).

My recollection from school [4] was that e was introduced as the base of Natural Logarithms. Rather circularly, Natural Logarithms were defined as ones with a base of e. I am just about old enough to have come across tables of logarithms while at school, something that disappeared with the advent of electronic calculators shortly thereafter. Such tables were used to facilitate the multiplication of numbers, for reasons I will explain soon. However, let’s first of all consider the genesis of logarithms: raising numbers to powers, or exponentiation.

The Power of Exponentiation

Let’s start with something even more familiar, multiplication. This is nothing bar repeated addition. So:

3\times 4=4 + 4 + 4

or symmetrically:

3\times 4=3 + 3 + 3 + 3

More generally, for two numbers n and m, we define:

n\times m=\underbrace{\strut m + m + \ldots + m}_{\text{\normalfont n times}}

or again symmetrically:

n\times m=\underbrace{\strut n + n + \ldots + n}_{\text{\normalfont m times}}

Of course the \times is often elided in Mathematics; we write nm instead of n\times m.

Exponentiation is the next step in this ladder [5]. This consists of repeated multiplication. So we have:

3^4=3 \times 3 \times 3 \times 3

or more generally [6]:

n^m=\underbrace{\strut n \times n \times \ldots \times n}_{\text{\normalfont m times}}

In passing we will note some properties of exponentiation:

\begin{aligned} n^mn^p&=n^{m+p} \hspace{2cm} &(1) \\ \\\dfrac{n^m}{n^p}&=n^{m-p} \hspace{2cm} &(2) \\ \\(n^m)^p&=n^{mp} \hspace{2cm} &(3)\end{aligned}

We can see how (1) is derived from the definition of exponents easily as follows:

n^m\times n^p=\underbrace{\strut \underbrace{\strut n \times n \times \ldots \times n}_{\text{\normalfont m times}}\times\underbrace{\strut n \times n \times\ldots\times n}_{\text{\normalfont p times}}}_{\text{\normalfont m + p times}}

The other two identities are just as simply derived.

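(2) provides the insight that negative exponents are reciprocals and powers of reciprocals. Setting m=0 in (2) gives:

n^{-p}=n^{0-p}=\dfrac{n^0}{n^p}

So (recalling that anything to the power zero is equal to one):

n^{-p}=\dfrac{1}{n^p}

(1) also allows us to consider fractional exponents as follows:

n^{\frac{1}{2}}\times n^{\frac{1}{2}}=n^{\frac{1}{2}+\frac{1}{2}}=n^1=n

so that:

n^{\frac{1}{2}}=\sqrt{n}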

There are even ways to raise numbers to the power of Complex Numbers. However, now that we have some background on exponentiation under our belt, we can think about logarithms.

As Easy as Falling off a Log

Log Tables

Logarithms are defined in terms of exponentiation. Consider:

a^b = c

We then define the logarithm of c base a as follows:

\log_a(c) = b

I.e. the logarithm of c base a is the power to which a must be raised in order for it to equal c.

Some examples include:

\begin{aligned} 10^3 = 1000 &\longleftrightarrow \log_{10}(1000) = 3 \\ \\2^{10} = 1024 &\longleftrightarrow \log_2(1024) = 10 \\ \\7^8 = 5,764,801 &\longleftrightarrow \log_7(5,764,801) = 8 \end{aligned}

It should be noted that, though I am using Natural Numbers in the examples above, the definition of logarithms extends to fractions (aka Rational Numbers) and even – as we will see in a bit – numbers that cannot be expressed as fractions (aka Irrational Numbers)

We can now see why tables of logarithms used to be helpful. Supposing we wanted to multiply two largish numbers, say 123456789 and 987654321. Then:

\begin{aligned}&\log_{10}(123456789)\approx 8.091515\hspace{1mm}&\Rightarrow\hspace{1mm}&10^{8.091515}\approx 123456789 \\ \\&\log_{10}(987654321)\approx 8.994605\hspace{1mm}&\Rightarrow\hspace{1mm}&10^{8.994605}\approx 987654321\end{aligned}

Then we have that:

123456789\times 987654321 \approx 10^{8.091515} \times 10^{8.994605}

and by (1) above, we have:

123456789\times 987654321 \approx 10^{8.091515 + 8.994605}

Algorithmically, we take the logarithms of each multiplicand, add these and then raise 10 to the power of the resulting number. In this case:

10^{17.08612}\approx 1.21933\times 10^{17}

Which is a reasonable approximation to the actual answer, probably close enough for most practical purposes. As adding the logarithms is easier than multiplying two nine digit numbers (of course assuming that tables of logarithms have already been created), this approach historically saved labour.
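The log-table procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name multiply_via_logs and the places parameter (which imitates the limited precision of a printed table) are assumptions made for this sketch and are not part of the original article.

```python
import math


def multiply_via_logs(a: float, b: float, places: int = 6) -> float:
    """Approximate a * b by adding base-10 logarithms, as with log tables.

    `places` is a hypothetical parameter imitating the number of decimal
    places a printed table of logarithms would offer.
    """
    log_a = round(math.log10(a), places)  # "look up" the first logarithm
    log_b = round(math.log10(b), places)  # "look up" the second logarithm
    # Identity (1): 10^p * 10^q = 10^(p + q)
    return 10 ** (log_a + log_b)


approx = multiply_via_logs(123456789, 987654321)
exact = 123456789 * 987654321
print(f"via logarithms: {approx:.5e}")
print(f"exact product:  {exact:.5e}")
```

Rounding the logarithms is what makes the answer approximate; with more decimal places in the "table" the result moves closer to the exact product.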

Using both the definition of logarithms and the properties of exponents noted in (1)\ldots (3) above, we have:

\begin{aligned} \log _n(m \times p)&=\log _n(m)+\log _n(p) \hspace{2cm} &(4) \\ \\\log _n\left(\dfrac{m}{p}\right)&=\log _n(m)-\log _n(p) \hspace{2cm} &(5) \\ \\\log _n m^p&=p\log _n m \hspace{2cm} &(6)\end{aligned}

Natural Logarithms are ones where the base is Euler’s number, e. Hence if:

e^x = y

We then define the Natural Logarithm of y as follows [7]:

\ln(y) = \log_e(y) = x

I.e. the power to which we must raise e to obtain y.

Well so far, so good, but we haven’t really got any insight as yet as to how Euler’s Number arises, or why it is so special. To obtain this we need to first spend some time contemplating functions, slopes and slopes of functions.

The Proper Function of Man [8]

Functions are one of the central concepts in Mathematics. They embody a relationship between a set of inputs and a set of outputs. A function can be thought of as a set of instructions on how to transform each input into its output. There is the proviso that each input has just one output. Thus the mapping between a function’s inputs and its outputs can be one-to-one or many-to-one, but never one-to-many (see below).

Function types

A function (typically denoted by one of f,  g or h, when not a Greek letter) might take the set of Integers (\mathbb{Z}) to the same set by adding 1 to each input number. The way that the notation works, if we call this function f, we have:

f: \mathbb{Z} \rightarrow \mathbb{Z} : x \mapsto x+1

This may be read as “a function f maps Integers to other Integers, by the recipe of adding 1 to each input number”.

We can also write more concisely:

f(x) = x+1

To take a few examples, we have:

\begin{aligned}f(1) &= 2 \\ \\f(2) &= 3 \\ \\f(438) &= 439 \\ \\f(-2) &= -1 \\\\f(-6)&=-5\end{aligned}

Here we can see that the function is one-to-one, each output comes from only one input. Output x can only come from input x-1 and no other number.


We may in passing have also discovered the concept of the inverse of a function here and also stumbled across the fact that this only exists for one-to-one functions. The inverse of a function f is a second function, labeled f^{-1} which “undoes” the transformation enacted by f. So if f(x)=y then f^{-1}(y)=x.

Function Inverse

This also means that f^{-1}(f(x))=f(f^{-1}(x))=x.

In our example where f(x)=x+1 then f^{-1}(x)=x-1.

Let’s instead consider a different function (which we can also label f [9]) that also maps Integers to the same set, this time by squaring each input [10]. Here we have:

f: \mathbb{Z} \rightarrow \mathbb{Z} : x \mapsto x^2


f(x) = x^2

If we revisit our previous examples, we have:

\begin{aligned}f(1) &= 1 \\ \\f(2) &= 4 \\ \\f(438) &= 191,844 \\ \\f(-2) &= 4 \\ \\f(-6) &= 36\end{aligned}

This time, we can see that the function is many-to-one, for example f(2) = f(-2) = 4.
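The difference between the two example functions can be checked mechanically. The sketch below groups a sample of Integer inputs by the output they produce; the helper name preimages and the sample range are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def add_one(x: int) -> int:
    return x + 1


def square(x: int) -> int:
    return x * x


def preimages(func, inputs):
    """Group the inputs by the output that func assigns to them."""
    groups = defaultdict(list)
    for x in inputs:
        groups[func(x)].append(x)
    return dict(groups)


sample = range(-6, 7)  # a small, arbitrary sample of Integers

# One-to-one: every output in the sample comes from exactly one input.
one_to_one = all(len(xs) == 1 for xs in preimages(add_one, sample).values())
# Many-to-one: at least one output comes from several inputs.
many_to_one = any(len(xs) > 1 for xs in preimages(square, sample).values())

print(one_to_one, many_to_one)
print(preimages(square, sample)[4])  # the inputs that square to 4
```

A finite sample cannot prove that a function is one-to-one on all of the Integers, but a single repeated output is enough to show that squaring is many-to-one.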

So far, we have looked at discrete inputs and outputs [11]. If we keep the same recipe for our function, but change the input and output sets to both be the Real Numbers (\mathbb{R}), then our same function f(x)=x^2 can be visualised as a line on a graph:

f(x) = x^2

x – the input – is (unsurprisingly) plotted on the horizontal x-axis and f(x) – the output – is plotted on the vertical y-axis.

Not only are the input and output numbers continuous [12], the line generated by the function is smooth, i.e. it has no sudden changes in direction. We will be coming back to this concept of smoothness very shortly.

So far I have covered functions that are powers of x. We could also define functions where a number is raised to the power of x, so f(x)=10^x is a function. Later we will be considering a special function:

f(x) = e^x

After this very brief introduction to functions, we hopefully understand them enough to move on to the next concept, slopes, or more properly gradients.

Why is a Blotter like a Lazy Dog?

Warning! Right-angled triangle ahead.

The joke is rather dated (our writing tools have changed over the years), but:

A blotter is an ink-lined plane
An inclined plane is a slope up
A slow pup is a lazy dog

– Anonymous

Here we will consider slopes up and indeed down and tie this to some of the earlier work in the article.

The road sign above provides a starting point and also the suggestion that gradients are ratios. Let’s start with a very simple example, where a “hill” is 5 miles long and has a rise of half a mile [13]; this is shown in the left-hand diagram below (please note that vertical distances have been exaggerated to aid clarity):

Basic Gradient

The gradient of the left hand slope is defined as how much height is gained divided by how much distance it takes to gain this height. In our example we have:

\text{gradient} = \dfrac{\text{vertical rise}}{\text{horizontal distance}}=\dfrac{0.5}{5}=\dfrac{1}{10}=10\%

As our slope up is a straight line, the gradient is the same at any point. If we drew smaller triangles anywhere on our “hill”, the numerator and denominator of our gradient equation would change, but their ratio would remain constant. By way of contrast, consider the slope up in the right-hand diagram. Here we can see that the gradient changes as we go up. However, the average gradient is still the overall distance climbed divided by the overall horizontal distance covered. So in our example, the average gradient of both hills is the same [14].


The definition of gradient above should have some bells ringing from another area, Trigonometry. Let’s pause for a brief refresher on the basics of this. Consider a generic right-angled triangle as in the figure below:


Here the bottom left-hand angle has a value of \theta, the hypotenuse has length c, the adjacent side has length a and the opposite side has a length of b. We then have the following definitions:

\sin\theta = \dfrac{b}{c}\hspace{5mm}\Rightarrow\hspace{5mm} b = c\sin\theta

\cos\theta = \dfrac{a}{c}\hspace{5mm}\Rightarrow\hspace{5mm} a = c\cos\theta

\tan\theta = \dfrac{b}{a}\hspace{5mm}\Rightarrow\hspace{5mm} b = a\tan\theta

We can see that the definition of \tan\theta is the same as that of gradient above.

Now let’s consider just the first two miles of the climb for both of our examples as follows (the segment highlighted in red below):

Detailed Gradient

We can see that – consistent with our observation about smaller triangles above – the gradient of the left-hand “hill” is:

\text{gradient} = \dfrac{\text{vertical rise}}{\text{horizontal distance}}=\dfrac{0.200}{2}=\dfrac{1}{10}=10\%

However that of the right-hand “hill” is:

\text{gradient} = \dfrac{\text{vertical rise}}{\text{horizontal distance}}=\dfrac{0.163}{2}\approx 8.15\%

While the average gradient of both climbs is the same, the local details of gradient differ.

What if, instead of looking at the gradient across some subset of the five mile climb on the right, we wanted to assess the steepness of the slope at various points. How would we go about this?

Well using the smaller triangles (the ones with a base of 2 miles) above suggests one approach. Supposing we wanted to know the slope of our right-hand “hill” at the 2 mile mark. Well we could construct a triangle whose base started on the “hill” at this point and stretched a little way to the right, say half a mile (or 880 yards [15]). If we zoom in, this might look like the following (again recall that vertical distances have been exaggerated in the diagram):

Gradient Estimate

Our measurement of the gradient of this triangle is as follows:

\text{gradient} = \dfrac{\text{vertical rise}}{\text{horizontal distance}}=\dfrac{150}{880}\approx 17\%

This seems like a decent estimate of the slope at 2 miles, but we could improve on it. What if we made the triangle smaller, so that its base was a quarter of a mile, or 100 yards, or 1 yard, or an inch. Intuitively it would seem that the smaller the triangle, the closer its slope would be to the actual gradient two miles into the climb. Indeed it would seem evident that if a certain level of precision in measuring the gradient was required, this could be achieved simply by drawing a small enough triangle.

The last paragraph may have seemed a little hand-wavy, but it is actually based on rigorous Mathematics. Once more this was an area to which Euler contributed strongly, the concept of limits.

The Sky is the Limit

The introductory part of this section is adapted from my book on Group Theory and Particle Physics, Glimpses of Symmetry [16]

In antiquity, Zeno of Elea propounded a number of paradoxes to do with motion. Here I am going to conflate two of them [17], but hopefully offer a simplification at the same time.

Zeno's paradoxes

Some picture elements are © Randall Munroe of

Consider an arrow being loosed towards a target. My simplification of Zeno’s argument – contrary to all experience of the physical world – is that it will never get there. The argument is as follows:

  1. In order to reach the target, the arrow must first cover half of the distance. Leaving half the distance to be traversed.
  2. Next it must cover half of the remaining distance, or a quarter of the total distance. Leaving a quarter of the distance to be traversed.
  3. Next it must cover half of the remaining distance, or an eighth of the total distance. Leaving an eighth of the distance to be traversed.
  4. And so on…

Because this process can be extended indefinitely, the arrow will always be short of the target. The distance it is short by will decrease rapidly of course, but there will always be a small distance still to be covered.

Therefore the arrow will never reach the target.

There is actually no paradox here and the theory of limits deals with any messiness quite nicely. Again Euler was a major figure in this area of Mathematics. To introduce it, let’s consider the distance remaining to travel in the above example. This forms a sequence as follows:

1, \hspace{5mm}\dfrac{1}{2}, \hspace{5mm}\dfrac{1}{4}, \hspace{5mm}\dfrac{1}{8}, \hspace{5mm}\dfrac{1}{16}, \hspace{5mm}\ldots

If we say that 1 is the zeroth [18] term and \dfrac{1}{2} the first, then in general the nth term of this sequence is \dfrac{1}{2^n}.

What happens to \dfrac{1}{2^n} as n gets bigger and bigger? Well clearly it gets smaller and smaller. Suppose we are engaged in a game with someone, our adversary can name a number as small as they like and then we have to name a smaller one of the form \dfrac{1}{2^n}, where n is a Natural Number. Let’s see how a few rounds might play out:

Opponent’s Number | Our Choice of n | \dfrac{1}{2^n}
0.1               | 4               | 0.0625
0.01              | 7               | 0.0078125
0.001             | 10              | 0.0009765625

It is probably evident that we will always have the upper hand over our opponent. Indeed if we generalise by saying that the number they pick is denoted by \varepsilon, then we get to use logarithms again and only need to find n such that:

\begin{aligned}\dfrac{1}{2^n}&<\varepsilon \\ \\\log_2(2^{-n})&<\log_2\varepsilon \\ \\-n&<\log_2\varepsilon \\ \\n&>-\log_2\varepsilon\end{aligned}

(of course if 0<x<1 then \log_mx<0 for any base m>1)

So all we need to do to win our game is to pick n as the next Natural Number up from -\log_2\varepsilon.
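The winning strategy for this game can be written down directly. In the sketch below, the function name winning_n is hypothetical, and zero is counted as a Natural Number (in line with the zeroth term of the sequence); both are assumptions of the sketch.

```python
import math


def winning_n(epsilon: float) -> int:
    """Smallest n >= 0 such that 1 / 2**n < epsilon, for epsilon > 0."""
    if epsilon <= 0:
        raise ValueError("the opponent must name a positive number")
    # We need n > -log2(epsilon), so take the next whole number above it.
    return max(0, math.floor(-math.log2(epsilon)) + 1)


for epsilon in (0.1, 0.01, 0.001):
    n = winning_n(epsilon)
    print(epsilon, n, 1 / 2**n)
```

The loop replays the three rounds of the game tabulated earlier in this section.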

In situations like this, where n gets bigger and bigger without limit, we say that n tends to infinity (\infty). We can describe what happens to \dfrac{1}{2^n} as n tends to infinity by saying that \dfrac{1}{2^n} tends to zero. We can write this using some special notation:

\displaystyle\lim_{n\to\infty}\dfrac{1}{2^n}=0

This can be read as “the limit of \dfrac{1}{2^n} as n tends to infinity is zero”.

The sense is that, if we can get arbitrarily close to a given value (in this case 0) by taking a large enough term in the sequence, then as the sequence wends its way to infinity, it actually reaches the given value, rather than just approaching it. Although this is not a very rigorous way of putting things Mathematically, we are sort of saying:

\dfrac{1}{2^\infty}=0

We formalise this result in exactly the way we established in our game above. However, to be fully rigorous, we need to add in the technical requirement that not only can we find a number, n, that makes the difference between the limit and the sequence smaller than our \varepsilon, but that this also holds for all sequence members following on after n. We say:

\displaystyle\lim_{n\to\infty}\dfrac{1}{2^n}=0

If and only if, for any \varepsilon>0, we can find a value of n such that for all m\ge n:

\left|\dfrac{1}{2^m}-0\right|<\varepsilon

That is however small a number is selected we can find a value of \dfrac{1}{2^n} that is closer to zero than the number and that so are all later values \dfrac{1}{2^m}. By picking n>-\log_2\varepsilon we see that this assertion is true.

If we want to be more general, then if our sequence is a_0,a_1,a_2,a_3,\ldots then:

\displaystyle\lim_{n\to\infty}a_n=b

If and only if, for any \varepsilon>0, we can find a value of n such that for all m\ge n:

\left|a_m-b\right|<\varepsilon

Again the sense is that as n tends to infinity, a_n tends to b.

Before closing this introduction to limits, we will cover just one more concept. If the distance remaining to be covered in our version of Zeno’s paradox is the infinite sequence we have been working with above, the distance that has been covered is instead an infinite series. Infinite series involve adding terms, the ones in this case are:

\dfrac{1}{2}+\dfrac{1}{4}+\dfrac{1}{8}+\dfrac{1}{16}+\ldots

Using the notation that Euler himself introduced, we can write this as:

\displaystyle\sum_{n=1}^{\infty} \dfrac{1}{2^n}

which can be read as “the sum of \dfrac{1}{2^n} from n=1 to infinity”.

Formally, in terms of limits, what this means is:

\displaystyle\sum_{n=1}^{\infty} \dfrac{1}{2^n}=\lim_{m\to\infty}\sum_{n=1}^{m}\dfrac{1}{2^n}

The finite sum inside the limit on the right of the equals sign is called a partial sum. It captures the first m terms in our series. This gives us a mechanism to talk about a limit, by letting m tend to infinity.

A corollary of the limit we established for the sequence \dfrac{1}{2^n} is that:

\displaystyle\sum_{n=1}^{\infty} \dfrac{1}{2^n}=1

This is – in a nutshell – why motion is actually possible.
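To see why the series sums to 1, it can help to compute a few partial sums exactly. The sketch below uses exact fractions; the function name partial_sum is an assumption made for illustration. Each partial sum falls short of 1 by exactly \dfrac{1}{2^m}, which is the distance still to be covered by the arrow and which we have already seen tends to zero.

```python
from fractions import Fraction


def partial_sum(m: int) -> Fraction:
    """Exact value of 1/2 + 1/4 + ... + 1/2**m (the first m terms)."""
    return sum((Fraction(1, 2**n) for n in range(1, m + 1)), Fraction(0))


for m in (1, 2, 3, 10):
    covered = partial_sum(m)
    remaining = 1 - covered  # distance the arrow still has to travel
    print(m, covered, remaining)
```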

Having developed quite a bit of Mathematical machinery, much of it due to Euler himself, in the next two sections, we will begin to combine these concepts, first by considering the gradient of a function.

What difference does it make?

The introductory part of this section is also adapted from my book on Group Theory and Particle Physics, Glimpses of Symmetry [20]

In the section on gradients, we explored a way to find the slope of a “hill” at a given point using smaller and smaller triangles to increase precision. Some similarity between this process and the work we have just covered about limits is probably evident. Let’s drop our discussion of “hills” and be clear that what we want to be able to do is to determine the gradient of functions at specific points (we can think about drawing these functions on graph paper and measuring their slopes). Indeed, if our function is f(x), what we would like to do is to generate a second function, f^\prime(x), which captures the gradient at any point x [21]. Thus f^\prime(0) will be the slope of f(x) at the point 0 and f^\prime(75.8) will be the slope of f(x) at the point 75.8.

The function f^\prime(x) is called the derivative of f(x). The process of generating f^\prime(x) from f(x) is called differentiation and is part of the grand edifice of The Calculus as developed independently by both Newton and Leibniz in the late 1600s.

While f^\prime(x) itself is often used to denote the derivative, there are a couple of other ways it is shown. First:

\displaystyle f^\prime(x)=\dfrac{d}{dx}f(x)

We read this as “the derivative of f(x) with respect to x”, or just “df by dx” (pronounced “dee-eff by dee-ecks”).

Alternatively, if we set y=f(x):

\displaystyle f^\prime(x)=\dfrac{dy}{dx}

Similarly, we read this as “the derivative of y with respect to x”, or just “dy by dx”.

If we go back to our previous approach of drawing small triangles, but apply this to a function, f(x), then – for some small distance along the x-axis, \delta x – we have:

The differential as a limit

Using our previous definition of gradient and considering the red triangle above, our estimate of the slope at point x is given by:

\text{gradient} = \dfrac{\text{vertical rise}}{\text{horizontal distance}}= \dfrac{f(x+\delta x)-f(x)}{\delta x}

Further as we shrink the size of \delta x (and thereby the red triangle), we can appeal to our work on limits to define:

\displaystyle f^\prime(x)=\dfrac{d}{dx}f(x)= \lim_{\delta x\to 0}\dfrac{f(x+\delta x)-f(x)}{\delta x}

Let’s use this approach on a function we met earlier, f(x)=x^2. We then have:

\displaystyle \dfrac{f(x+\delta x)-f(x)}{\delta x}=\dfrac{(x+\delta x)^2-x^2}{\delta x}


\begin{aligned}\dfrac{(x+\delta x)^2-x^2}{\delta x} &= \dfrac{x^2+2x\delta x+(\delta x)^2-x^2}{\delta x} \\ \\&= \dfrac{2x\delta x+(\delta x)^2}{\delta x} \\ \\&= 2x+\delta x\end{aligned}


\displaystyle \dfrac{d}{dx}x^2 = \lim_{\delta x\to 0}(2x+ \delta x)= \lim_{\delta x\to 0}2x+ \lim_{\delta x\to 0}\delta x

As, rather obviously, \lim_{\delta x\to 0}\delta x=0, we have established that:

\displaystyle \dfrac{d}{dx}x^2 = 2x
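The limiting process can also be watched numerically. The sketch below evaluates the gradient estimate for f(x)=x^2 at x=3 with ever smaller values of \delta x; the function name difference_quotient and the choice of x=3 are assumptions made for this illustration. The estimates should approach 2\times 3=6, up to small floating-point rounding.

```python
def difference_quotient(f, x: float, dx: float) -> float:
    """Slope of the small triangle with base dx drawn at the point x."""
    return (f(x + dx) - f(x)) / dx


def square(x: float) -> float:
    return x * x


for dx in (1.0, 0.1, 0.01, 0.001):
    # Algebraically this equals 2x + dx, so it tends to 2x as dx shrinks.
    print(dx, difference_quotient(square, 3.0, dx))
```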

We can use similar arguments to show more generically that:

\displaystyle \dfrac{d}{dx}x^n = nx^{n-1}, where n\ge 1

I.e. to get the derivative of x^n you “bring the n down and reduce the power of \displaystyle x by one”, a nice simple rule [22]. We will be using this result in the next section.


We have defined the process of differentiation of a function, f(x), to produce its derivative, f^\prime(x). As f^\prime(x) is itself a function, we can also differentiate it [23] to obtain the following:

\displaystyle f^{\prime\prime}(x)=\dfrac{d}{dx}f^\prime(x)

If we want to write this in terms of our initial function, then:

\displaystyle f^{\prime\prime}(x)=\dfrac{d^2}{dx^2}f(x)

We can similarly form higher order derivatives such as \displaystyle \dfrac{d^3}{dx^3}f(x), or indeed \displaystyle \dfrac{d^n}{dx^n}f(x) for the \displaystyle n^{th} derivative. These can have a variety of physical meanings.

Rather than providing many other examples of the derivatives of functions, it is time to cut to the chase and ask a question about differentiation that will lead us to our ultimate goal.


Petri Dishes

As we mentioned when introducing the topic, a function f of a variable x is essentially a recipe for getting an output, f(x), from an input, x. What is the meaning of the derivative in this context? Well it depends. Suppose that x is a time input (it would typically be denoted by t rather than x in this case) and that the function yields how far something has travelled after a certain time, then the derivative of the function would be how fast the object is travelling (distance over time). Suppose that instead x is once more time but that f(x) is the number of bacteria in a petri dish. The derivative would then tell us how fast the population of bacteria is growing at a point in time.

This second example is germane as – all other things being equal [24] – the number of bacteria at a future point is determined by how many there are now. This is because most bacteria reproduce by binary fission; each cell splits in two and so the number of bacteria doubles with each new generation. So if we have 2,000 now, we will soon have 2\times 2,000 = 4,000.

In an idealised situation, we would have the following number of bacteria at each successive generation:

E. coli

Numerically, we have:

1,\hspace{5mm}2,\hspace{5mm}4,\hspace{5mm}8,\hspace{5mm}16,\hspace{5mm}32,\hspace{5mm}\ldots

We can see that the rate of growth of the culture is dependent on its size, the bigger it is, the faster it grows. Putting this Mathematically, the derivative of the function giving us the population is dependent on the population itself, or f^\prime\propto f, where \propto means “proportional to”.

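Returning to our idealised state, we could rewrite the number of bacteria over time as:

2^0,\hspace{5mm}2^1,\hspace{5mm}2^2,\hspace{5mm}2^3,\hspace{5mm}2^4,\hspace{5mm}2^5,\hspace{5mm}\ldots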

E. coli split as frequently as every 20 minutes. If we took our time input as being chunks of 20 minutes, then we could form a rough formula as follows:

f(t)=2^t

Where t=1 is 20 minutes, t=2 is 40 minutes and so on.

We seem to have created a function involving exponentiation, more on this shortly.
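The idealised doubling model can be sketched as follows. It assumes, purely for illustration, that the culture starts from a single cell and that every cell divides once per 20-minute step; the function name population is likewise hypothetical.

```python
def population(t: int) -> int:
    """Idealised cell count after t twenty-minute generations.

    Assumes a single starting cell and perfect doubling at every step.
    """
    return 2 ** t


for t in range(6):
    # Number of cells added during the next generation.
    growth = population(t + 1) - population(t)
    print(t * 20, "minutes:", population(t), "cells, next increase:", growth)
```

In this model the increase over each generation equals the current population, a discrete version of the observation that the rate of growth is proportional to the size of the culture.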

But now let’s abandon our bacteria before they engulf us and instead ask a more generic question. Can we find a function f(x) such that:

f^\prime(x)=f(x)

That is a function whose derivative is precisely the function itself (or equivalently, the slope at a point is the value of the function at the same point)?

Well there are any number of ways that we could approach this, let’s just try some things out in a naive manner. I’ll take as a starting place our observation above that:

\displaystyle \dfrac{d}{dx}x^n = nx^{n-1}

We can form a table using this result as follows:

f(x) | f^\prime(x)
1    | 0
x    | 1
x^2  | 2x
x^3  | 3x^2
x^4  | 4x^3

A pattern appears here, the derivative of a function on one row is a multiple of the function on the previous row. Let’s look at what happens when we differentiate the terms we have in the table above put together in an expression:

\begin{aligned} f(x)&=1+x+x^2+x^3+x^4 \\ \\f^\prime(x)&=0+1+2x+3x^2+4x^3\end{aligned}

Well that’s an interesting result. The derivative is close to the initial expression, but with all terms moving one place to the right. There is certainly a relationship between f and f^\prime. There are however a couple of problems:

  1. The coefficients of each term (the numbers we multiply each power of x by) don’t match up. They are all 1 in the first equation and are the sequence 0,1,2,3,4 in the second one.
  2. We have lost our x^4 term and so f^\prime is truncated

Can we fix these issues? Well to address the first point, we could try modifying our expression to be:

f(x)=1+x+\dfrac{x^2}{2}+\dfrac{x^3}{3}+\dfrac{x^4}{4}

What happens when we differentiate this? Well we get:

f^\prime(x)=0+1+x+x^2+x^3

At first sight, all seems good. However, having modified the definition of f(x), our derivative is still not equal to the original function. We have another trick that we need to play in order to make progress. Consider instead:

f(x)=1+x+\dfrac{x^2}{1\times 2}+\dfrac{x^3}{1\times 2\times 3}+\dfrac{x^4}{1\times 2\times 3\times 4}

What is the derivative of this?

f^\prime(x)=0+1+\dfrac{2x}{1\times 2}+\dfrac{3x^2}{1\times 2\times 3}+\dfrac{4x^3}{1\times 2\times 3\times 4}

Cancelling out the same terms in each numerator and denominator, we get:

f^\prime(x)=0+1+x+\dfrac{x^2}{1\times 2}+\dfrac{x^3}{1\times 2\times 3}

Which, if we drop the initial 0, is indeed the same as our latest definition of f(x), save that (problem 2) we have lost the term in x^4.
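This cancellation can be verified exactly by treating the expression as a list of coefficients. The sketch below is a minimal illustration using exact fractions; the names truncated_series and differentiate are assumptions, not anything from the original article.

```python
from fractions import Fraction
from math import factorial


def truncated_series(degree: int) -> list[Fraction]:
    """Coefficients of 1 + x + x^2/(1*2) + ... up to x^degree, lowest power first."""
    return [Fraction(1, factorial(n)) for n in range(degree + 1)]


def differentiate(coeffs: list[Fraction]) -> list[Fraction]:
    """Apply d/dx x^n = n x^(n-1) term by term and drop the vanished constant."""
    return [n * c for n, c in enumerate(coeffs)][1:]


f = truncated_series(4)
f_prime = differentiate(f)

# The derivative matches f except that the highest power has been lost.
print(f_prime == f[:-1])
```

The comparison against f[:-1] makes both observations at once: the coefficients now agree, and the top term is missing.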

Before moving on to address our second difficulty, a note about numbers like 1\times 2\times 3 \times 4, we call these factorials and they have a special notation, so:

4!=1\times 2\times 3\times 4

and more generally:

n!=1\times 2\times 3\times \ldots \times (n-1)\times n

Using this notation (and noting that 0! is defined to equal 1) we can write our latest version of f(x) as:

f(x)=\dfrac{x^0}{0!}+\dfrac{x^1}{1!}+\dfrac{x^2}{2!}+\dfrac{x^3}{3!}+\dfrac{x^4}{4!}

So what about our missing x^4 term in f(x)? Where could we get such a term from? Well if we added an x^5 term to f(x), then this would work, but we have just moved the problem along one, we now have no term in x^5 in the derivative. How to resolve this conundrum? What we need is a constant supply of higher powers of x to slot into place in the derivative.

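Well the concept of an infinite series as explored above comes to the rescue. Instead of the expression for f(x) cutting off, what if we define it as:

f(x)=1+x+\dfrac{x^2}{2!}+\dfrac{x^3}{3!}+\dfrac{x^4}{4!}+\ldots

or using Euler’s summation notation:

f(x)=\displaystyle\sum_{n=0}^{\infty}\dfrac{x^n}{n!}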

We then have [27]:

\begin{aligned}f^\prime(x)=\displaystyle\dfrac{d}{dx}\left(\sum_{n=0}^{\infty}\dfrac{x^n}{n!}\right)&=\sum_{n=0}^{\infty}\dfrac{d}{dx}\left(\dfrac{x^n}{n!}\right) \\ \\&=\sum_{n=1}^{\infty}\dfrac{nx^{n-1}}{n!}=\sum_{n=1}^{\infty}\dfrac{x^{n-1}}{(n-1)!}=f(x)\end{aligned}

So we have constructed at least one function f(x) with the property that f^\prime(x)=f(x). In fact it can be shown that, up to multiplication by a constant, there is only one such function. Because this function is so special, it has its own notation. We say:

\exp(x)=\displaystyle\sum_{n=0}^{\infty}\dfrac{x^n}{n!}

The choice of \exp(x), as well as the emergence of an exponential function (namely f(t)=2^t) in our discussions about bacteria, perhaps gives the game away. In fact what we have is:

\exp(x)=e^x=\displaystyle\sum_{n=0}^{\infty}\dfrac{x^n}{n!}

These equalities were proved directly by Euler again, though both Newton and Leibniz had come up with less direct and more convoluted proofs in the 1600s.

We call \exp(x), or equivalently e^x, The Exponential Function, employing a definite article to signify its uniqueness and importance.

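If we consider the value of \exp(1), we have:

e=\exp(1)=\displaystyle\sum_{n=0}^{\infty}\dfrac{1}{n!}=1+1+\dfrac{1}{2!}+\dfrac{1}{3!}+\dfrac{1}{4!}+\ldots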

Here we return to the beginning with a formula that expresses Euler’s Number as an infinite sum. Sometimes this is where people start to learn about e [28]. However we have taken a journey that has highlighted not just that Euler’s Number is important (something that could be deduced by its ubiquity in Mathematics and related disciplines) but why it is important and how it arises. It is from the self-reference of The Exponential Function that the power (no pun intended) of e arises and to which its many beautiful properties may be ascribed.
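The infinite sum for Euler’s Number converges quickly, which a short numerical experiment can show. The sketch below adds up the first few terms \dfrac{1}{n!} and compares the result with the value of e built into Python; the function name e_partial_sum is an assumption made for illustration.

```python
import math


def e_partial_sum(terms: int) -> float:
    """Sum of 1/n! for n = 0, 1, ..., terms - 1."""
    total = 0.0
    term = 1.0  # this is 1/0!
    for n in range(terms):
        total += term
        term /= n + 1  # turn 1/n! into 1/(n+1)!
    return total


for terms in (1, 2, 5, 10, 20):
    print(terms, e_partial_sum(terms))
print("math.e:", math.e)
```

Because the factorials in the denominators grow so rapidly, a handful of terms is already enough to reproduce the first several digits quoted at the start of the article.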

Closing Thoughts

\boldsymbol{\huge e^{i\pi} + 1 = 0}

There are many other things to be said about e and The Exponential Function. For example, several fundamental results arise by considering The Exponential Function acting on the set of Complex Numbers. Just one of these is the equation appearing above, which is often described as the most elegant in Mathematics and which ties together five of its most fundamental constants. It is of course named after its discoverer and called Euler’s Identity. This margin is too narrow to contain a proper derivation of Euler’s Identity [29], but this is covered more fully in The Equation.

I will close by again reemphasising the self-referential manner in which The Exponential Function and Euler’s Number arise. As in several areas of Mathematics, self-reference leads to the emergence of interesting features. Also some of the apparatus we have developed allows the Exponential Function to be applied to many different Mathematical objects. We have mentioned Complex Numbers above, we can even do things like raising Euler’s Number to the power of a matrix [30]. In developing theories relating to the Exponential Function, as in much of the rest of his work, Euler opened up new Mathematical vistas, terrain which intrepid Mathematicians explore to this very day. I hope in this piece that I have given some flavour of his unique vision.


The following people’s input is acknowledged:

  • Carey G. Butler, for pointing out a number of typographical errors.
  • JohnKathy Mapp, who noticed that I was starting many of my summations at n=1 where n=0 was what it should have been.
  • John S. Garavelli, who noticed I was erroneously using a + in equation (4) above, when a \times would have been more in keeping with the equation.
  • Caleb Fitzgerald, who pointed out a glitch in one of my examples of logarithms.
  • Jonathan Baker, who pointed out an error in equation (3).
  • Paul Macey, who straightened out my many-to-one and one-to-many text, which had got twisted.
  • Michael Aitken-Deacon, who pointed out that my initial example of a limit and my general definition lacked the appropriate rigour.

Of course any [other] errors and omissions remain the responsibility of the author.


Peter James Thomas




[1] Read Euler, read Euler, he is the master of us all.
[2] Euler has the distinction of having a second number also named after him. The other is the Euler–Mascheroni constant.
[3] Sorry Percy :-o.
[4] Many years ago and so perhaps not to be relied upon 100%.
[5] The one after exponentiation is called tetration. It consists of repeated exponentiation, but this is outside of the scope of this article.
[6] Here we note that, unlike multiplication, n^m is not in general the same as m^n. E.g. 3^2=9 and 2^3=8.
[7] In Pure Mathematics, the assumption is made that if \log is used with no explicit base, then the base is e and the meaning is Natural Logarithm. In other disciplines (and on many scientific calculators), the formulation \ln is used instead.
[8] I’m sure Jack London meant to add “or Woman”.
[9] f gets used almost as much as x in Mathematics.
[10] A point here is that while the output set is \mathbb{Z}, the function only maps to values greater than or equal to zero (any negative number is mapped to a positive one). There is a term for the subset of the output range that a function covers, but we won’t be going into this today.
[11] A set like the Integers is called discrete because there are gaps between its members; indeed there are an infinite number of values between any two Integers, say 1 and 2. If there are no gaps, the set is instead called continuous.
[12] I.e. with no gaps in them, see Note 11 above.
[13] The units are immaterial, kilometres would have worked just as well, or cubits for that matter. We will drop units soon enough in what follows.
[14] Of course it is the average gradient that appears on road signs, as hills are seldom nice straight lines.
[15] Recall that there are 1,760 yards in a mile.
[16] Specifically a section in Chapter 21 – SU(3) and the Meaning of Lie.
[17] The first relates to Achilles and a Tortoise having a race, where the Tortoise has a head start; by the time Achilles has reached the Tortoise’s starting point, the Tortoise has moved – and so on. This paradox relates to fractions of distance covered, but is more complicated than is strictly necessary to make the point. The second paradox relates to an arrow in flight; arguing that at any instant in time, the arrow occupies a point, so therefore it cannot be moving. This paradox is more to do with the nature of time as it pertains to motion. Here I have blended the two to make what I think is a simpler example.
[18] It makes the format of the generic term easier to start with zero.
[19] Here |x| stands for the modulus of x, which is effectively its absolute size. That is |-1|=|1|=1.
[20] Specifically a section in Chapter 20 – Power to Truth, which shares the same name. Any chance to reference a Smiths song is of course welcome.
[21] We can do this if the original function, f(x), is both continuous and smooth. Smooth has a specific technical meaning, which we won’t get into here. The general sense is not a million miles from the day-to-day meaning of the word.
[22] Somewhat sadly, schoolchildren are generally taught the rule before being shown why it works.
[23] Assuming that f^\prime(x) is smooth of course. We spoke before about smooth functions. In Mathematical terms, a smooth function is one that you can differentiate as many times as you want (possibly reaching 0 of course).
[24] Nutrients, space to grow, cells being immortal etc.
[25] We could of course rely upon our generic formula for the derivative, but if a function is equal to a constant, here 1, then it is a horizontal straight line when plotted on a graph. As with real hills, the gradient of a level line is zero.
[26] Again, we can see this result directly without recourse to our generic formula. If f(x)=x then the graph of f(x) is a straight line going up from the origin at 45°. The slope of such a line is a constant one as each increase in the horizontal axis is matched by the same increase in the vertical axis.
[27] We are able to make the last step in this chain of equalities for the same reason that we could drop 0 in earlier expressions, namely:

For n=0 we have \dfrac{d}{dx}x^0=\dfrac{d}{dx}1=0

which can also be dropped, allowing our sum to start from n=1, and clearly:

\displaystyle \sum_{n=0}^{\infty}\dfrac{x^{n}}{n!}=\sum_{n=1}^{\infty}\dfrac{x^{n-1}}{(n-1)!}
[28] The other, and perhaps less interesting, starting point is of course compound interest.
[29] With apologies to Pierre de Fermat of course.
[30] See a section of Glimpses of Symmetry, Chapter 20 – Power to Truth.

Text & Images: © Peter James Thomas 2018.
Published under a Creative Commons Attribution 4.0 International License.