
Standard Deviation Explained

27/01/2010

This little write-up has been on my mind for a while. It’s nothing earth-shattering, but I feel that this simple topic in statistics is something students, at school and university alike, have found to be a fuzzy grey area.

The problem is that most textbooks on Mathematics are simply just that. More often than not, the authors write in an extremely formal register, at times seemingly to “show off” their intellect to their colleagues. This is where most Math texts start to fail, especially when the subject matter is being introduced for the very first time.

Even the Wikipedia entry on Standard Deviation can be daunting for some; it should be noted that this article is based on some content from this page.

As such, I will try my best to explain the concept of Standard Deviation (“SD”) as clearly as possible.

Let us consider three data sets, {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8}, each of which has an arithmetic mean of 7. The formula for evaluating the arithmetic mean of a set of N numbers x_1, ..., x_N is given below:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

In other words, all the numbers are summed together and the total is then divided by the number of elements, N. In the above example, N = 4, since there are four (4) elements in each set.
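As a quick check of the arithmetic, here is a minimal Python sketch of the mean calculation for the three sets. The function name `arithmetic_mean` is simply my own choice for illustration.

```python
def arithmetic_mean(values):
    """Sum all the elements, then divide by the number of elements (N)."""
    return sum(values) / len(values)


data_sets = [
    [0, 0, 14, 14],
    [0, 6, 8, 14],
    [6, 6, 8, 8],
]

for data in data_sets:
    # Each of the three sets should report a mean of 7.
    print(data, "mean =", arithmetic_mean(data))
```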

Their standard deviations are 7, 5, and 1, respectively, each evaluated using the following formula:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$

The above formula simply means the following:

  • Obtain the arithmetic mean of the ‘population’ data or set of numbers.
  • Subtract the mean from each element of the set and square the result.
  • Sum all of these squared terms.
  • Divide this ‘sum of squared terms’ by N.
  • Evaluate the square root of the final value.
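The five steps above can be sketched in Python as follows. This is only an illustration under my own naming (`population_sd` is not a standard function), with one line of code per step.

```python
import math


def population_sd(values):
    """Population standard deviation, following the five steps one by one."""
    n = len(values)
    mean = sum(values) / n                        # 1. arithmetic mean
    squared = [(x - mean) ** 2 for x in values]   # 2. subtract the mean, square
    total = sum(squared)                          # 3. sum the squared terms
    variance = total / n                          # 4. divide by N
    return math.sqrt(variance)                    # 5. take the square root


for data in ([0, 0, 14, 14], [0, 6, 8, 14], [6, 6, 8, 8]):
    # The three sets should give 7, 5 and 1 respectively.
    print(data, "SD =", population_sd(data))
```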

Note: You will often find a formula similar to the one above, with the difference that the ‘sum of the squared terms’ is divided by N-1 rather than N. N-1 is used when the SD is being estimated from a sample of data rather than evaluated over the entire population itself. We will be focussing upon the formula as shown above.
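To see the difference between the two divisors, here is a small sketch using Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by N-1. The variable names are my own.

```python
import statistics

data = [0, 6, 8, 14]

# Treat the data as the entire population: divide by N.
sd_population = statistics.pstdev(data)

# Treat the data as a sample drawn from a larger population: divide by N - 1.
sd_sample = statistics.stdev(data)

print("population SD:", sd_population)
print("sample SD:    ", sd_sample)
```

The sample figure comes out slightly larger, since the same sum of squared terms is divided by a smaller number.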

Following the steps listed above yields the SD of that particular set. In the above example, the third set has a much smaller standard deviation than the other two because its values are all close to 7.

Let’s consider another example: a data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20 is plotted below:

It can be seen that most of the data points are contained within plus or minus one standard deviation (±σ) from the mean (μ) of the set of data.

(It should be noted that we have referred to the arithmetic mean as “x bar” as well as μ. In the strictest sense, “x bar” is the ‘sample mean’ whilst μ is the mean of an entire population. In this context, the examples shown here are only of complete populations and not samples. More detail on this here.)

Without going into too much detail about the Standard Normal Distribution, I will now touch upon the Central Limit Theorem.

This simply states that the distribution of a sum of many independent, identically distributed random variables (each with a finite variance) tends towards the normal distribution. If a data distribution is approximately normal then about 68% of the values are within 1 standard deviation of the mean (mathematically, μ ± σ, where μ is the arithmetic mean), about 95% of the values are within two standard deviations (μ ± 2σ), and about 99.7% lie within 3 standard deviations (μ ± 3σ). This is known as the 68-95-99.7 rule, or the empirical rule.
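Both ideas can be checked numerically. The sketch below is my own illustration, not a formal proof: `fraction_within` uses the standard result that a normal distribution holds a fraction erf(k/√2) of its values within k standard deviations of the mean, while `simulated_fraction_within` adds up uniform random numbers and counts how many sums land inside the same band. The function names, the choice of 12 terms per sum, the number of trials and the seed are all assumptions made for the example.

```python
import math
import random


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


def simulated_fraction_within(k, trials=20000, terms=12, seed=0):
    """Sum `terms` uniform(0, 1) numbers many times and measure the fraction
    of those sums that fall within k SDs of the theoretical mean."""
    rng = random.Random(seed)
    mean = terms * 0.5            # each uniform(0, 1) value has mean 1/2
    sd = math.sqrt(terms / 12)    # and variance 1/12, so the sum has terms/12
    hits = 0
    for _ in range(trials):
        total = sum(rng.random() for _ in range(terms))
        if abs(total - mean) <= k * sd:
            hits += 1
    return hits / trials


for k in (1, 2, 3):
    print(k, "SD: normal =", round(fraction_within(k), 4),
          " simulated sums =", simulated_fraction_within(k))
```

The first figure on each line should reproduce the 68-95-99.7 rule, and the simulated figure should sit close to it even though the individual numbers being summed are uniform rather than normal.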

So, um…what is SD?
Right, ‘dumbing down’ everything I’ve just said above, the Standard Deviation is simply a measure of how widely the data points are spread about the mean of the entire set; in other words, of their volatility or uncertainty.

For example, the set {1,1,1,1} would have a SD of zero, simply because none of its elements deviates from the mean (μ = 4/4 = 1). Although a rather simplistic concept, it is a key concept in statistics that bears great importance in many fields.
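A short sketch of this, again using `statistics.pstdev` from Python's standard library. The second set, {0, 0, 2, 2}, is a hypothetical one I have added purely for contrast; it has the same mean of 1, but its elements do deviate from it.

```python
import statistics

constant = [1, 1, 1, 1]   # no element deviates from the mean of 1
spread = [0, 0, 2, 2]     # hypothetical set: same mean, every element 1 away

sd_constant = statistics.pstdev(constant)
sd_spread = statistics.pstdev(spread)

print("SD of", constant, "=", sd_constant)
print("SD of", spread, "=", sd_spread)
```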

In engineering, for example, the reported standard deviation of a group of repeated measurements is an indication of the precision of the instrument used to make the measurements. In CAM, SD and the Standard Normal Distribution are crucial concepts used in Concurrent Engineering/Set Theoretic.


About

For the past couple of years I lived in the UK, reading for a BEng (Hons) in Electronic and Computer Engineering at The University of Leeds and an MSc (Dist) in Mechatronics at King's College London.

My interests and hobbies include writing with Fountain Pens on various ink and paper, Swiss and German wristwatches, authoring articles in Mathematics, Physics, and Engineering, and Gundam modeling.

I have been following much Anime over the years as well as TV Shows with the likes of 24, Smallville, Dexter, and NCIS becoming favourites.