Once you’ve been doing statistics for a while, you tend to take descriptive statistics for granted…mostly because we all use stats programs that just take our raw data and do it for us.
But for all of you who are just starting out, a thorough understanding of descriptive statistics is absolutely essential. So this is a quick post that will start from the ground up on descriptive stats.
The best way to understand descriptive statistics is to follow a problem top-to-bottom, so let’s pretend that we are researchers who are interested in the average SAT scores of college freshmen.
We take a sample of 10 freshmen from the university in question (if you don’t know the difference between a sample and a population, go find that post). Below are their SAT scores.
Question: What is the average SAT score for this sample?
To answer this, we calculate the sample mean using the following equation:
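In standard notation, the sample mean can be written as below. The symbols are assumptions made for this sketch: x_i is the i-th score, N is the number of scores, and x̄ ("x-bar") is the sample mean.

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```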
In English: add all of the scores together, and then divide by the number of scores.
So our sample mean = 730.4
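If you want to see that arithmetic as code, here is a minimal Python sketch. The scores in it are made-up placeholders, not the sample from this post, and `sample_mean` is a hypothetical helper name.

```python
# Made-up SAT scores for illustration only (NOT the sample from this post).
scores = [700, 720, 740, 760, 780]


def sample_mean(values):
    """Add all the values together, then divide by how many there are."""
    return sum(values) / len(values)


mean = sample_mean(scores)
print(mean)  # 740.0
```

For the sample in this post, a mean of 730.4 across 10 scores implies the scores total 7,304 points.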
Next Question: How close is everyone to the average?
In order to determine this, we calculate simple deviations; in other words, we compute how far off from the mean each individual score is.
For example, the first participant scored 787 on the SAT, so we subtract the mean…
787 – 730.4 = 56.6
This means that he scored 56.6 points above the sample average on the SAT.
Now we do the same thing for each participant. (Hint: Tables are amazing!)
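Here is one way that table of simple deviations could be built in Python. This is only a sketch: the scores are made-up placeholders rather than the sample from this post.

```python
# Made-up SAT scores for illustration only (NOT the sample from this post).
scores = [700, 720, 740, 760, 780]

mean = sum(scores) / len(scores)  # 740.0

# Simple deviation = score minus the mean (positive means above average).
deviations = [score - mean for score in scores]

# Print a small table: one row per participant.
print(f"{'Score':>6} {'Deviation':>10}")
for score, dev in zip(scores, deviations):
    print(f"{score:>6} {dev:>+10.1f}")
```

The same subtraction gives the post's first participant their 56.6: 787 minus 730.4.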
Now that we have calculated simple deviations for each person…
Next Question: On average, how much do scores deviate from the mean in this sample?
That’s simple enough, right?
I mean, to get the average amount of deviation, you just take the average of all the simple deviations. Except, as you can see, there’s a problem:
The simple deviations for a sample always add up to zero, so their average is always zero too, and that tells us nothing about how spread out the scores are. There’s a mathematical proof that explains why this happens (the scores above the mean exactly cancel out the scores below it)…but you can just take my word for it.
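For the curious, a compact version of that argument is sketched below. It assumes the notation x_i for the i-th score, N for the number of scores, and x̄ for the sample mean, and it uses the fact that the scores sum to N times the mean (which is just the definition of the mean rearranged).

```latex
\sum_{i=1}^{N} (x_i - \bar{x})
  = \sum_{i=1}^{N} x_i - N\bar{x}
  = N\bar{x} - N\bar{x}
  = 0
```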
Anyway…the way we compensate for this issue is by squaring all of the simple deviations. A squared number is never negative, so the squared deviations can no longer cancel each other out and add up to zero.
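A quick Python sketch shows both halves of this: the raw deviations cancel out, and the squared ones do not. The scores are made-up placeholders, not the sample from this post.

```python
# Made-up SAT scores for illustration only (NOT the sample from this post).
scores = [700, 720, 740, 760, 780]

mean = sum(scores) / len(scores)
deviations = [score - mean for score in scores]

# The raw deviations cancel each other out.
print(sum(deviations))  # 0.0

# Squaring removes the minus signs, so nothing cancels any more.
squared_deviations = [dev ** 2 for dev in deviations]
sum_of_squares = sum(squared_deviations)
print(sum_of_squares)  # 4000.0
```

With messier real-world numbers, floating-point rounding can leave the first sum a hair away from zero, so compare it with a tolerance rather than expecting an exact 0.0.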
Now is where things get a little less intuitive, and a little more mathematical.
We now have everything we need to calculate the variance of the sample. This is the equation:
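In standard notation, the sample variance can be written as below. The symbols x_i (the i-th score) and x̄ (the sample mean) are assumptions made for this sketch; N and s² are the post's own symbols.

```latex
s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}
```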
The top of this equation is the sum of squares, also known as the sum of the squared deviations. In our example this is 17,190.4.
Now we divide by the number of participants (N), minus one…also known as degrees of freedom (df).
17,190.4/(10-1) = 1,910.04 = sample variance
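The same division can be checked in a few lines of Python. This sketch uses only the two numbers given in the post (the sum of squares and the sample size); the variable names are my own.

```python
# Values taken from the post: sum of squared deviations and sample size.
sum_of_squares = 17190.4
n = 10

# Degrees of freedom for a sample variance: N minus one.
degrees_of_freedom = n - 1

sample_variance = sum_of_squares / degrees_of_freedom
print(round(sample_variance, 2))  # 1910.04
```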
Because the variance is built from squared deviations, its units are squared too (squared SAT points), which makes it really hard to interpret in terms of our data. So we are going to convert it back to something we understand: the standard deviation.
s² = variance
s = standard deviation
To get from s² ⇒ s, just take the square root.
√1910.04 = 43.704
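To tie the steps together, here is a hedged end-to-end sketch in Python. It runs the whole chain (mean, sum of squares, variance, standard deviation) on made-up placeholder scores, not the sample from this post, and cross-checks the result against Python's built-in `statistics` module, which also divides by N minus one for sample statistics.

```python
import math
import statistics

# Made-up SAT scores for illustration only (NOT the sample from this post).
scores = [700, 720, 740, 760, 780]

n = len(scores)
mean = sum(scores) / n
sum_of_squares = sum((score - mean) ** 2 for score in scores)
variance = sum_of_squares / (n - 1)   # sample variance, s squared
sd = math.sqrt(variance)              # standard deviation, s

# Cross-check the hand calculation against the standard library.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))

print(round(sd, 2))  # 31.62

# The post's own final step: square root of the sample variance.
print(round(math.sqrt(1910.04), 3))  # 43.704
```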
What does this mean in English?
Roughly speaking, a typical SAT score in this sample sits about 43.7 points away from the sample mean.
And there are your basics! Now, you’re capable of doing all sorts of fun (basic) statistical calculations!