The latest addition to our team and our resident expert in data analysis and predictive modelling, Shaylon Stolk, shares her knowledge of statistics. With an understanding of these basic analysis tools, data can seem much less complex!
Most of us are familiar with the arithmetic mean, or average, of a set of data. This is simply the sum of the data divided by the number of points. If your dataset follows a normal distribution, the mean will be a representative, typical value for it.
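As a quick sketch, the mean can be computed directly or with Python's standard library (the readings below are made-up illustration values):

```python
from statistics import mean

readings = [12.0, 15.0, 14.0, 13.0, 16.0]  # illustrative data

# Sum of the data divided by the number of points
average = sum(readings) / len(readings)
print(average)         # 14.0
print(mean(readings))  # same result via statistics.mean
```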
If you have a data set with lots of outliers (numbers which fall far outside the expected values), a dataset with lots of variability, or a non-normal distribution, it may be better to take the median than the mean. Arrange your data points in ascending order by value, and take the middle number in the sequence; if there is an even number of points, take the mean of the two middle values.
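A small illustration of why the median resists outliers, using made-up values with one extreme point:

```python
from statistics import median

# Illustrative dataset with one large outlier (100)
values = [3, 5, 4, 6, 5, 100]

ordered = sorted(values)          # arrange in ascending order
print(ordered)                    # [3, 4, 5, 5, 6, 100]
print(median(values))             # 5.0 -- the middle of the sorted sequence
print(sum(values) / len(values))  # 20.5 -- the mean, dragged up by the outlier
```

With an even number of points, `median` averages the two middle values, here 5 and 5.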
Taking the mode of a group of values identifies which value occurs most frequently. In a normally distributed data set, the mode will be at, or extremely close to, the mean value.
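A one-line example of the mode on a small illustrative dataset:

```python
from statistics import mode

# Illustrative data: shoe sizes recorded in a survey
shoe_sizes = [7, 8, 8, 9, 8, 10, 7]
print(mode(shoe_sizes))  # 8 -- the value that occurs most frequently
```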
Standard Deviation and Error
Identifying the level and significance of variation in data trends is not particularly intuitive. To make these determinations, you’ll want to understand both the standard deviation and the standard error.
The standard deviation is a measure of 'normal' variability. A data set with a lot of variation will have a larger standard deviation than a data set with less variability. In a normal distribution, about 68% of your values should fall within one standard deviation of the mean. A value which falls more than 2 or 3 standard deviations from the mean can be considered an outlier.
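A sketch of checking how many points fall within one standard deviation of the mean, using illustrative data (`pstdev` is the population standard deviation; `stdev` would give the sample version):

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
m = mean(data)    # 5.0
s = pstdev(data)  # 2.0

# Count the values lying within one standard deviation of the mean
within_one = [x for x in data if abs(x - m) <= s]
print(len(within_one) / len(data))  # 0.75 for this small sample
```

Small samples will rarely hit the theoretical 68% exactly; the figure applies to an ideal normal distribution.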
Error describes the level of uncertainty in a set of results. Typically, the larger the sample size, the smaller the standard error. In significance testing, a p-value of 5% or lower is conventionally taken to indicate statistically robust results.
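The standard error of the mean is the sample standard deviation divided by the square root of the sample size, which is why larger samples give smaller errors. A minimal sketch with made-up measurements:

```python
from math import sqrt
from statistics import stdev

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # illustrative data

# Standard error of the mean: s / sqrt(n)
se = stdev(sample) / sqrt(len(sample))
print(se)
```

Because of the square root, quadrupling the sample size roughly halves the standard error.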