Patterns patterns everywhere

21 Apr 2010 / 3 Nov 2014 / Peter James Thomas / business intelligence, Physics, Pure Mathematics, Statistics / chaos theory, eyjafjallajokull, non-linear systems, predictive analytics, vesuvius, volcano, weather prediction

Look at the beautiful shapes!

Introduction

Much of human scientific and technological progress over the span of recorded history has been related to discerning patterns. People noticed that the movements of the Sun and Moon were periodic, leading to models that ultimately changed our view of our place in the Universe. The apparently wandering trails swept out by the planets were later regularised by the work of Johannes Kepler and Tycho Brahe; an outstanding example of a simple idea explaining more complex observations.

In general Mathematics has provided a framework for understanding the world around us; perhaps most elegantly (at least in work that is generally accessible to the non-professional) in Newton’s Laws of Motion (which explained why Kepler and Brahe’s models for planetary movement worked). The simple formulae employed by Newton seemed to offer a precise set of rules governing everything from the trajectory of an arrow to the orbits of the planets and indeed galaxies; a triumph for the application of Mathematics to the natural world and surely one of humankind’s greatest achievements.

The Antikythera mechanism

For centuries it appeared that natural phenomena had simple principles underlying them, which were susceptible to description in the language of Mathematics. Sometimes (actually much more often than you might think) the Mathematics became complicated and precision was dropped in favour of – generally more than good enough – estimation; but philosophically Mathematics and the nature of things appeared to be inextricably interlinked. The Physicist and Nobel Laureate E.P. Wigner put this rather more eloquently:

The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.

Dihedral Group 3

In my youth I studied Group Theory, a branch of mathematics concerned with patterns and symmetry. The historical roots (no pun intended^[1]) of Group Theory are in the solvability of polynomial equations, but the relation with symmetry emerged over time; revealing an important linkage between geometry and algebra. While Group Theory is a part of Pure Mathematics (supposedly studied for its own intrinsic worth, rather than any real-world applications), its applications are actually manifold. Just one example is that groups lie (again no pun intended^[2]) at the heart of the Standard Model of Particle Physics.

However, two major challenges to this happy symbiosis between Mathematics and the Natural Sciences arose. One was an abrupt earthquake caused by Kurt Gödel in 1931. The other was more of a slowly rising flood, beginning in the 1880s with Henri Poincaré and (arguably) culminating with Ruelle, May and Yorke in 1977 (though with many other notables contributing both before and after 1977). The linkage between Mathematics and Science persists, but maybe some of the chains that form it have been weakened.

Potentially fallacious patterns

However, rather than this article becoming a dissertation on incompleteness theorems or (the rather misleadingly named) chaos theory, I wanted to return to something more visceral that probably underpins at least the beginnings of the long association of Mathematics and Science. Here I refer to people’s general view that things tend to behave the same way as they have in the past. As mentioned at the beginning of this article, the sun comes up each morning, the moon waxes and wanes each month, summer becomes autumn (fall) becomes winter becomes spring and so on. When you knock your coffee cup over it reliably falls to the ground and the contents spill everywhere. These observations about genuine patterns have served us well over the centuries.

It seems a very common human trait to look for patterns. Given the ubiquity of this, it is likely to have had some evolutionary benefit. Patterns are often there and are often useful – there is normally more traffic on the roads at 5pm on Fridays than on other days of the week. Government spending does (with the possible exception of current circumstances) generally go up in advance of an election. However such patterns may be less useful in other areas. While winter is generally colder than summer (in the Northern hemisphere), the average temperature and average rainfall in any given month vary a lot year-on-year. Nevertheless, even within this variability, we try to discern patterns to changes that occur in the weather.

Brrrrrrrrrrrrrrrrrrrrrrrrr

We may come to the conclusion that winters are less severe than when we were younger and thus impute a trend in gradually moderating winters; perhaps punctuated by some years that don’t fit what we assume is an underlying curve. We may take rolling averages to try to iron out local “noise” in various phenomena such as stock prices. This technique relies on the assumption that things change gradually. If the average July temperature has increased by 2°C in the last 100 years, then it may seem to make sense to assume that it will increase by the same 2°C ±0.2°C in the next 100 years. Some of the work I described earlier has rigorously proved that a lot of these human precepts are untrue in many important fields, not least weather prediction. The phrase long-term forecast has been shown to be an oxymoron. Many systems – even the simplest, even those which are apparently stable^[3] – can change rapidly and unpredictably and weather is one of them.
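As a rough illustration of how quickly a very simple system can defeat long-term forecasting, here is a minimal sketch using the logistic map, a standard textbook example from chaos theory. It is not taken from the work cited in this article; the function names, the starting value 0.2 and the nudge of 1e-10 are all assumptions chosen for the demonstration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def separation(x0, nudge, steps):
    """Absolute difference, step by step, between two nearby starting points."""
    first = logistic_orbit(x0, steps)
    second = logistic_orbit(x0 + nudge, steps)
    return [abs(a - b) for a, b in zip(first, second)]


# Two starting points that agree to ten decimal places (assumed values).
diffs = separation(0.2, 1e-10, 60)
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: difference {diffs[step]:.3e}")
```

The two runs track each other closely at first and then part company entirely, even though the rule generating them is a single line of arithmetic.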

Of course the rules state that you must have a picture of a strange attractor in any article referencing chaos theory - I do however get points for not using the word 'fractal' anywhere in the text!

For the avoidance of doubt I am not leaping into the general Climate Change debate here – except in the most general sense. Instead I am highlighting the often erroneous human tendency to believe that when things change they do so smoothly and predictably; that when a pattern shifts, it does so to something quite like the previous pattern. While this assumed smoothness is at the foundation of many of our most powerful models and techniques (for example the grand edifice of The Calculus), in many circumstances it is not a good fit for the choppiness seen in nature.

Obligatory topical section on volcanoes

The above observations about the occasionally illusory nature of patterns lead us to more current matters. I was recently reading an article about the Eyjafjallajokull eruption in The Economist. The piece is suffused with a search for patterns in the history of volcanic eruptions. Here are just a few examples:

Last time Eyjafjallajokull erupted, from late 1821 to early 1823, it also had quite viscous lava. But that does not mean it produced fine ash continuously all the time. The activity settled into a pattern of flaring up every now and then before dying back down to a grumble. If this eruption continues for a similar length of time, it would seem fair to expect something similar.
Previous eruptions of Eyjafjallajokull seem to have acted as harbingers of subsequent Katla [a nearby volcano] eruptions.
[However] Only two or three […] of the 23 eruptions of Katla over historical times (which in Iceland means the past 1,200 years or so) have been preceded by eruptions of Eyjafjallajokull.
Katla does seem to erupt on a semi-regular basis, with typical periods between eruptions of between 30 and 80 years. The last eruption was in 1918, which makes the next overdue.

Planes beware!

To be fair, The Economist did lace their piece with various caveats, for example the above-quoted “it would seem fair to expect”, but not all publications are so scrupulous. There is perhaps something comforting in all this numerology; maybe it gives us the illusion that we can make meaningful predictions about what a volcano will do next. Modern geologists have used a number of techniques to warn of imminent eruptions and these approaches have been successful and saved lives. However this is not the same thing as predicting that an eruption is likely in the next ten years solely because they normally occur every century and it is 90 years since the last one. Long-term forecasts of volcanic activity are as chimerical as long-term weather forecasts.

A little light analysis

Looking at another famous volcano, Vesuvius, I have put together the following simple chart.

The average period between eruptions is just shy of 14 years, but the pattern is anything but regular. If we expand our range a bit, we might ask how many eruptions occurred between 10 and 20 years after the previous one. The answer is just 9 of the 26^[4], or about 35%. Even if we expand our range to periods of calm lasting between 5 and 25 years (so 10 years of leeway on either side), we only capture 77% of eruptions. The standard deviation of the periods between recorded eruptions is a whopping 12.5 years; eruptions of Vesuvius are not regular events.
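For anyone wanting to repeat this sort of arithmetic on their own figures, a minimal sketch follows. The list of gaps is invented purely for illustration and is not the Vesuvius record behind the chart; the function name, the use of the sample standard deviation and the inclusive treatment of the range boundaries are likewise assumptions.

```python
from statistics import mean, stdev


def summarise_gaps(gaps, low, high):
    """Return (mean gap, sample standard deviation, share of gaps in [low, high])."""
    share = sum(low <= gap <= high for gap in gaps) / len(gaps)
    return mean(gaps), stdev(gaps), share


# Hypothetical gaps in years between events; NOT real eruption data.
example_gaps = [2, 4, 6, 8, 30]
average, spread, share = summarise_gaps(example_gaps, 5, 25)
print(f"mean {average}, standard deviation {spread:.1f}, share in range {share:.0%}")

# The proportion quoted in the text: 9 gaps out of 26.
print(f"{9 / 26:.1%}")
```

A standard deviation of the same order as the mean, as in the Vesuvius figures quoted above, is a warning sign that "the average gap" says little about when the next event will occur.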

One aspect of truly random distributions at first seems counterintuitive: their lumpiness. It might seem reasonable to assume that a random set of events would lead to a nicely spaced out distribution; maybe not a set of evenly-spaced points, but a close approximation to one. In fact the opposite is generally true; random distributions will have clusters of events close to each other and large gaps between them.

The above exhibit illustrates this point. It compares a set of pseudo-random numbers (the upper points) with a set of truly random numbers (the lower points)^[5]. There are some gaps in the upper distribution, but none are large and the spread is pretty even. By contrast in the lower set there are many large gaps (some of the more major ones being tagged a, …, h) and significant clumping^[6]. Which of these two distributions more closely matches the eruptions of Vesuvius? What does this tell us about the predictability of its eruptions?
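The lumpiness of independently scattered points can be checked with a small simulation. This is a sketch under stated assumptions rather than a reconstruction of the exhibit: the seed, the number of points and the helper name are arbitrary choices, and Python's built-in generator stands in for whatever source of randomness one prefers.

```python
import random
from statistics import mean, stdev


def sorted_gaps(points):
    """Distances between neighbouring points once they are put in order."""
    ordered = sorted(points)
    return [right - left for left, right in zip(ordered, ordered[1:])]


rng = random.Random(2010)  # fixed seed (arbitrary) so the run is repeatable
count = 10_000

# Points dropped independently and uniformly along a line of length `count`.
random_gaps = sorted_gaps(rng.uniform(0, count) for _ in range(count))
# The same number of points, but perfectly evenly spaced.
even_gaps = sorted_gaps(range(count))

ratio = stdev(random_gaps) / mean(random_gaps)
print(f"random: mean gap {mean(random_gaps):.2f}, std dev / mean = {ratio:.2f}")
print(f"random: largest gap is {max(random_gaps) / mean(random_gaps):.1f} times the mean")
print(f"even:   std dev of gaps = {stdev(even_gaps)}")
```

For independently scattered points the standard deviation of the gaps comes out close to the mean gap, which is the signature of the lower set in the exhibit (9.6 against 9.7, per the explanatory notes), whereas evenly spaced points have no spread in their gaps at all.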

The predictive analytics angle

As always in closing I will bring these discussions back to a business focus. The above observations should give people involved in applying statistical techniques to make predictions about the future some pause for thought. Here I am not targeting the professional statistician; I assume such people will be more than aware of potential pitfalls and possess much greater depth of knowledge than I do about how to avoid them. However many users of numbers will not have this background and we are all genetically programmed to seek patterns, even where none may exist. Predictive analytics is a very useful tool when applied correctly and when its findings are presented as a potential range of outcomes, complete with associated probabilities. Unfortunately this is not always the case.
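As a very simple illustration of presenting a range rather than a single number, the sketch below reports the 10th percentile, median and 90th percentile of some past values. The figures are invented, the function name is hypothetical, and the approach is purely descriptive: it assumes the future resembles the sample, which is exactly the assumption this article urges caution about.

```python
from statistics import median, quantiles


def outcome_range(history):
    """Return (10th percentile, median, 90th percentile) of past values."""
    deciles = quantiles(history, n=10)  # nine cut points: 10%, 20%, ..., 90%
    return deciles[0], median(history), deciles[-1]


# Hypothetical monthly order counts, invented purely for illustration.
monthly_orders = [112, 98, 130, 87, 145, 101, 93, 160, 120, 109, 75, 138]
low, middle, high = outcome_range(monthly_orders)
print(f"central estimate {middle}; most past months fell between {low:.0f} and {high:.0f}")
```

Reporting the interval alongside the central figure makes the spread visible to the reader instead of hiding it behind a single point estimate.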

It is worth noting that many business events can be just as unpredictable as volcanic eruptions. Trying to foresee the future with too much precision is going to lead to disappointment; to say nothing of being engulfed by lava flows.

But the model said…

Explanatory notes


[1]	The solvability of a polynomial equation concerns whether its roots can be expressed using radicals.

[2]	Lie groups lie at the heart of quantum field theory – an interesting lexicographical symmetry in itself.

[3]	Indeed it has been argued that non-linear systems are more robust in response to external stimuli than classical ones. The latter tend to respond to “jolts” in a smooth manner leading to a change in state. The former often will revert to their previous strange attractor. It has been postulated that evolution has taken advantage of this fact in demonstrably chaotic systems such as the human heart.

[4]	Here I include the – to date – 66 years since Vesuvius’ last eruption in 1944 and exclude the eruption in 1631 as there is no record of the preceding one.

[5]	For anyone interested, the upper set of numbers was generated using Excel’s RAND() function and the lower set consists of successive triplets of the decimal expansion of pi, e.g. 141, 592, 653 etc.

[6]	Again for those interested the average gap in the upper set is 10.1 with a standard deviation of 4.3; the figures for the lower set are 9.7 and 9.6 respectively.
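The construction described in notes [5] and [6] can be sketched as follows. Only the first 30 decimal places of pi are used here, so the output will not match the figures quoted in note [6], which were presumably based on a longer run of digits; the helper names are hypothetical and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# First 30 decimal places of pi (the digits after "3.").
PI_DECIMALS = "141592653589793238462643383279"


def triplets(digits):
    """Split a digit string into successive three-digit integers."""
    usable = len(digits) - len(digits) % 3  # ignore any leftover digits
    return [int(digits[i:i + 3]) for i in range(0, usable, 3)]


def sorted_gaps(values):
    """Distances between neighbouring values once they are put in order."""
    ordered = sorted(values)
    return [right - left for left, right in zip(ordered, ordered[1:])]


points = triplets(PI_DECIMALS)
gaps = sorted_gaps(points)
print(points[:3])
print(f"mean gap {mean(gaps):.1f}, standard deviation {stdev(gaps):.1f}")
print(f"smallest gap {min(gaps)}, largest gap {max(gaps)}")
```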

21 thoughts on “Patterns patterns everywhere”

Seth Grimes says:

22 Apr 2010 at 11:32 am

What you’re describing here — “random distributions will have clusters of events close to each other and large gaps between them” — is a Poisson process. If you want the quick-learn version, go to Wikipedia (http://en.wikipedia.org/wiki/Poisson_process). If you want the discursive, literary introduction, read Gravity’s Rainbow. That’s one of the things I did when I should have been studying the Wedderburn Theorem and the like in grad school.

- Peter Thomas says:
  
  22 Apr 2010 at 11:40 am
  
  Thanks Seth,
  
  I’ll take a look.
  
  I did Probability Theory at University rather than applied Statistics; and moved on to concentrate on Group and Number Theory as soon as the curriculum allowed!
  
  Peter
  
Daoud says:

22 Apr 2010 at 11:43 am

Your “truly random numbers” (from the decimal expansion of pi) are not truly random – they’re just pseudorandom numbers that happen to behave more randomly than the Excel ones, which are clearly rubbish. They’re pseudorandom since the sequence is deterministic – if you want truly random numbers you’d have to do something like measure a quantum system in a superposition of states.

- Peter Thomas says:
  
  22 Apr 2010 at 11:51 am
  
  Hi Daoud,
  
  Thanks for taking the time to comment.
  
  I had just this precise debate with an engineer recently – it’s a semantic one and depends on what you mean by random. Your definition is that it relates to a truly random event (and your example is perfect for this – radioactive decay would work well as a physical manifestation).
  
  However I was interested on what grounds you felt that my (admittedly rather artificial) choice of triplets in pi’s decimal expansion was inappropriate. Transcendental numbers have many interesting properties. Not a challenge, just genuinely interested in where you are coming from.
  
  Peter
  
  PS As mentioned in the article, I was a Group / Number Theoretician, not a Statistician.
  
Daoud says:

22 Apr 2010 at 12:00 pm

Hi Peter – well, I studied physics so I guess I have a different perspective on what random means! You’re right of course that it’s an argument about semantics – alternatively, we could say that decimal expansions of pi are more random than Excel’s random function.

Your use of the decimal expansion of pi was not inappropriate – however Excel’s random function is not a good example of a pseudorandom function – you’ll find plenty of sophisticated algorithms out there that are almost indistinguishable from true random numbers.

- Peter Thomas says:
  
  22 Apr 2010 at 12:08 pm
  
  No problem Daoud,
  
  I guess a subtext to the article was both the overlap and differences between Physicists and Mathematicians – maybe I should say Pure Mathematicians as we all know that Applied Mathematicians are really Physicists in disguise ;-).
  
  I know there are much better algorithms for pseudo-random number generation than the one implemented in Excel – however I do wonder how many Excel users realise just how bad RAND() is. However my point was to contrast a mainstream view of what a random distribution looks like (i.e. even in aggregate) and what a stochastic process actually looks like (i.e. lumpy). Hopefully I achieved this despite eliding some details that the cognoscenti might think important.
  
  As per the final paragraph, my points were more aimed at the casual user of statistics (people like myself maybe) than at the professional.
  
  Peter
  
Peter Thomas says:

22 Apr 2010 at 6:58 pm

This article is also syndicated on SmartDataCollective at http://smartdatacollective.com/Home/26224

Peter

Peter Thomas says:

23 Apr 2010 at 1:37 pm

Daoud,

Relative to our discussion on RAND() I was just sent the following.

Peter

Sid says:

28 Apr 2010 at 3:24 am

The answer is 42.

- Peter Thomas says:
  
  28 Apr 2010 at 7:28 am
  
  But what is the question?
  
Using historical data to justify BI investments – Part III « Peter Thomas – Award-winning Business Intelligence and Cultural Transformation Expert says:

16 May 2011 at 9:19 pm

[…] Patterns, patterns everywhere, I wrote about the dangers associated with making predictions about events are essentially […]

Analogies « Peter Thomas – Award-winning Business Intelligence and Cultural Transformation Expert says:

19 May 2011 at 11:48 pm

[…] other occasions I have posted overtly Mathematical articles such as Patterns, patterns everywhere, The triangle paradox and the final segment of my recently posted trilogy Using historical data to […]

Words fail me « Peter Thomas – Award-winning Business Intelligence and Cultural Transformation Expert says:

14 Aug 2011 at 11:11 am

[…] a more cogent review of predicting volcanic eruptions, see my earlier post, Patterns patterns everywhere. […]

Merel says:

12 Dec 2011 at 1:51 pm

Hello,

I want to thank you for your information about Vesuvius. I am writing a report about the volcano and you had the exact information I needed yet couldn’t find anywhere. Of course you get full credit and I didn’t copy anything.

Thanks again

- Peter Thomas says:
  
  12 Dec 2011 at 3:10 pm
  
  Hi Merel,
  
  Glad to be of assistance and good luck with the report.
  
  All the best
  
  Peter
  
Patterns patterns everywhere – The Sequel | Peter Thomas - Award-winning Business Intelligence and Cultural Transformation Expert says:

26 Jan 2014 at 10:18 pm

[…] in 2010 I posted a piece called Patterns patterns everywhere which used the entry point of various articles on a number of web-sites relating to the, then […]

Data Visualisation – A Scientific Treatment | Peter James Thomas says:

6 Nov 2014 at 2:47 pm

[…] Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere. […]

Ten Million Aliens – More musings on BI-ology | Peter James Thomas says:

14 Nov 2014 at 6:40 pm

[…] Patterns patterns everywhere […]

Data Visualization – A Scientific Treatment (Peter James Thomas) | Michael Sandberg's Data Visualization Blog says:

20 Nov 2014 at 1:03 pm

[…] Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere. […]

5 More Themes from a Chief Data Officer Forum | Peter James Thomas says:

17 Nov 2015 at 1:27 pm

[…] Patterns patterns everywhere […]

Toast | Peter James Thomas says:

1 Feb 2017 at 9:05 pm

[…] consistent with existing observations and predict new phenomena. However – as I explained in Patterns patterns everywhere – a theory is only as good as the latest set of evidence and some cherished scientific […]
