A pretty comprehensive guide to statistics for the non-technical reader. Rowntree focuses on the 'why' of quantitative methods so that the 'how' makes more sense.
Statistics has been developed as a way of making sense of collections of observations. It aims, in particular, to help us avoid jumping to conclusions. It reminds us to be cautious about the extent to which we can generalise from our always limited experience.
Likelihood/probability is central to the statistical view of the world.
Statistics helps us to look for reliable regularities and associations among things 'in general' and 'in the long run'. At the same time, however, it teaches us to be cautious about expecting these regularities to hold true of any specific individuals.
Chief concerns of statistics:
Statistical thinking is a way of recognising that our observations of the world can never be totally accurate; they are always somewhat uncertain.
Reliability of the generalisation will depend on how well the sample represents the population.
Variable: Any attribute or characteristic that will help us distinguish between one individual and another.
Some error is inevitable in statistics.
The size of these errors differs from one subject to another. In the experimental sciences they are likely to be minute; in the social sciences they are much larger.
Frequency Distribution: A record of how often (the frequency with which) each value was observed.
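A frequency distribution is just a count per value. As a minimal sketch, using invented scores (not data from the book), Python's `collections.Counter` does the tallying and a row of asterisks gives a crude bar chart:

```python
from collections import Counter

# Hypothetical scores, invented purely for illustration
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]

# Count how often each value was observed
freq = Counter(scores)

# Print value, frequency and a simple text bar, in ascending order of value
for value in sorted(freq):
    print(value, freq[value], "*" * freq[value])
```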
Central Tendency: The tendency of the observations to pile up around a particular value.
The more variable the values, the more dispersed they will be, so we are also looking for a measure of dispersion.
Inter-quartile range: The 'mini-range' between the first and third quartiles (Q3 - Q1), which covers the middle half of the values.
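As a sketch, Python's `statistics.quantiles` finds the quartiles of the small data set 111, 114, 117, 118, 120. Textbooks and software use slightly different quartile conventions, so another method may give slightly different figures from Python's default ('exclusive') method:

```python
import statistics

data = [111, 114, 117, 118, 120]

# quantiles(n=4) returns the three cut points: Q1, Q2 (the median) and Q3.
# The default method is "exclusive"; other conventions can differ slightly.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 112.5 117.0 119.0 6.5
```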
Standard Deviation: The most common measure of dispersion. The greater the dispersion, the bigger the deviations from the mean and the bigger the standard ('average') deviation.
Calculating the Standard Deviation (data set: 111, 114, 117, 118, 120)
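The calculation can be sketched step by step: find the mean, take each value's deviation from it, square the deviations, average them and take the square root. The book's worked table isn't reproduced here, so which divisor it uses is an assumption; this sketch shows both the population version (divide by n) and the usual sample estimate (divide by n - 1), plus the range for comparison:

```python
import math

data = [111, 114, 117, 118, 120]
n = len(data)

mean = sum(data) / n                      # 580 / 5 = 116.0
deviations = [x - mean for x in data]     # -5, -2, 1, 2, 4
squared = [d ** 2 for d in deviations]    # 25, 4, 1, 4, 16 (sum = 50)

# Dividing by n treats the data as a whole population;
# dividing by n - 1 gives the usual estimate from a sample.
sd_population = math.sqrt(sum(squared) / n)      # sqrt(10), about 3.16
sd_sample = math.sqrt(sum(squared) / (n - 1))    # sqrt(12.5), about 3.54

data_range = max(data) - min(data)               # 120 - 111 = 9
print(round(sd_population, 2), round(sd_sample, 2), data_range)  # 3.16 3.54 9
```

Either way, the standard deviation here is well under half the range of 9.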
Whatever the distribution, you'll find that the standard deviation never comes anywhere near the size of the range.
A distribution can be positively or negatively skewed:
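In a symmetrical distribution, all three measures of central tendency (mean, median and mode) are found in the same place, at or near the centre. In a skewed distribution they typically move apart: the mode stays under the peak, the mean is pulled furthest towards the long tail, and the median lies between them.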
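A small invented, positively skewed data set shows the three measures separating. This is a minimal sketch using Python's `statistics` module:

```python
import statistics

# Hypothetical, positively skewed data: mostly small values,
# with a long tail of a few large ones
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

mean = statistics.mean(data)      # pulled towards the long tail
median = statistics.median(data)  # middle of the ordered values
mode = statistics.mode(data)      # most frequent value (the peak)

print(mean, median, mode)  # mean = 4, median = 3, mode = 2
```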
The normal curve of distribution (the bell curve) is not 'normal' in the sense of 'usual'. Rather, 'norm' is used in the sense of a pattern or standard (idealised, perfect) against which we can compare real-world distributions.
Proportions under the normal curve
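The familiar proportions can be recovered from the standard normal curve with `statistics.NormalDist` (Python 3.8+). A minimal sketch of the area lying within 1, 2 and 3 standard deviations of the mean:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def proportion_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {proportion_within(k):.2%}")
```

This prints roughly 68.27%, 95.45% and 99.73%.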
z-values: The base-line of the distribution, measured off in units of the standard deviation on both sides of the mean.
Any value in a distribution can be converted into a z-value by subtracting the mean of the distribution and dividing the difference by the standard deviation.
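The conversion is the one-line formula z = (x - mean) / SD. A minimal sketch, assuming a hypothetical distribution with mean 50 and standard deviation 10:

```python
def z_value(x, mean, sd):
    """Express a raw value as standard-deviation units above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical distribution: mean 50, standard deviation 10
print(z_value(65, 50, 10))  # 1.5  (one and a half SDs above the mean)
print(z_value(35, 50, 10))  # -1.5 (one and a half SDs below the mean)
```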
Statistics: Figures derived from the sample (e.g. mean, median, mode, range, standard deviation, inter-quartile range). Represented by Roman letters (e.g. sample mean = x̄).
Parameters: The true mean, mode, etc. of the population. Represented by Greek letters (e.g. population mean = μ).
Sampling Variation: the variability from one sample to another.
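Sampling variation is easiest to see by simulation. This sketch assumes a hypothetical, normally distributed population (mean 100, SD 15), draws 1,000 samples of 25, and records each sample-mean. The means differ from sample to sample, and their own standard deviation comes out close to 15 / sqrt(25) = 3:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def draw_sample(n):
    """Draw n values from a hypothetical normal population (mean 100, SD 15)."""
    return [rng.gauss(100, 15) for _ in range(n)]

# Take many samples of the same size and record each sample-mean
sample_means = [statistics.mean(draw_sample(25)) for _ in range(1000)]

print(min(sample_means), max(sample_means))  # the means vary from sample to sample
print(statistics.stdev(sample_means))        # spread of the means, close to 3
```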
Standard Error (SE): The standard deviation of a sampling distribution (e.g. of the sample-means). Influenced by three factors:
The smaller the standard error, the more confident we can be that our sample-mean is close to the population-mean.
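In practice the standard error of a mean is usually estimated from a single sample as s / sqrt(n), where s is the sample standard deviation and n the sample size. A minimal sketch (the helper name `standard_error` is just for illustration):

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

print(round(standard_error([111, 114, 117, 118, 120]), 2))  # 3.54 / sqrt(5), about 1.58

# With the spread held fixed (SD = 15), quadrupling the sample size halves the SE
for n in (25, 100, 400):
    print(n, 15 / math.sqrt(n))  # 3.0, 1.5, 0.75
```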
Confidence Interval: The sample mean (x̄) ± a margin of error based on the standard error (SE) (e.g. with x̄ = 50 and SE = 1.5, x̄ ± 1 SE gives a range of 48.5 to 51.5, which is a 68% confidence interval).
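A sketch of the interval arithmetic, using the normal curve (via `statistics.NormalDist`) to turn a number of standard errors into a confidence level. The helper names are hypothetical:

```python
from statistics import NormalDist

def confidence_interval(mean, se, k):
    """Return (lower, upper) for mean plus or minus k standard errors."""
    return mean - k * se, mean + k * se

def coverage(k):
    """Approximate confidence level for +/- k SE, read off the normal curve."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

print(confidence_interval(50, 1.5, 1), f"{coverage(1):.0%}")        # (48.5, 51.5) 68%
print(confidence_interval(50, 1.5, 1.96), f"{coverage(1.96):.0%}")  # about (47.06, 52.94) 95%
```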
Test of Significance: Asking whether the difference between samples is big enough to signify a real difference between populations.
Null Hypothesis: An assumption that attempts to nullify the difference between two sample-means by suggesting that it is of no statistical significance. Through the research process the null hypothesis is 'under assault'.
Alternative Hypothesis: When we reject the null hypothesis, we replace it with an alternative hypothesis. The most usual alternative hypothesis is simply that the two population-means are not the same.
We reject the null hypothesis when the difference between samples signifies a real difference between the populations.
'Significant' does not necessarily imply 'interesting' or 'important'. Note also that it is not the social or human value of the difference we are looking at. We are only concerned with how big it is in terms of the standard error of such differences (and thus how likely it is to be repeated in future comparisons).
The bigger the difference, the more confidently we can reject the null hypothesis.
In significance testing there are two opposite risks:
The emphasis is on reducing Type I errors (rejecting the null hypothesis when it is actually true).
Critical region: an area of the theoretical distribution (usually in one or both tails) where a result makes the alternative hypothesis seem more acceptable than the null hypothesis.
z-test: the significance test discussed above.
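A minimal sketch of a two-tailed z-test for the difference between two sample-means. It assumes independent samples, so the standard error of the difference is sqrt(SE1² + SE2²); the figures are invented and the function name is illustrative only:

```python
import math
from statistics import NormalDist

def z_test(mean1, se1, mean2, se2):
    """Two-tailed z-test for the difference between two independent sample-means."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # standard error of the difference
    z = (mean1 - mean2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails beyond |z|
    return z, p

# Edge of the 5% two-tailed critical region, about 1.96
critical = NormalDist().inv_cdf(0.975)

# Hypothetical figures: sample-means 55 and 50, each with SE 1.5
z, p = z_test(55, 1.5, 50, 1.5)
print(round(z, 2), round(p, 3), abs(z) > critical)
```

Here z is about 2.36 and p about 0.018, so the result falls in the critical region and the null hypothesis would be rejected at the 5% level.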
t-test: used instead of the z-test when samples are smaller than 30.
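For small samples, the t statistic is computed much like z but compared against the t-distribution for the appropriate degrees of freedom. This sketch assumes two independent samples with roughly equal variances (the pooled-variance form of Student's t); the data are invented:

```python
import math
import statistics

def t_statistic(a, b):
    """Student's t for two independent samples, using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, na + nb - 2  # t and its degrees of freedom

# Hypothetical small samples
t, df = t_statistic([12, 14, 11, 15, 13], [10, 11, 9, 12, 8])
print(t, df)  # 3.0 8
```

A t-table gives about 2.31 as the 5% two-tailed critical value for 8 degrees of freedom, so t = 3.0 would count as significant. Python's standard library has no t-distribution, hence the table look-up.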
F-test: a single test (used in analysis of variance) whereby several samples can be compared all at once.
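The F-test compares the variation between the sample-means with the variation within the samples. A sketch of the standard one-way analysis-of-variance F ratio, with invented data:

```python
import statistics

def f_statistic(*groups):
    """One-way ANOVA: between-group variance divided by within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each value sits from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)  # F and its degrees of freedom

# Three hypothetical samples
f, dfs = f_statistic([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(round(f, 2), dfs)  # 19.0 (2, 6)
```

An F-table gives about 5.14 as the 5% critical value for (2, 6) degrees of freedom, so these sample-means would be judged significantly different.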
Correlation concerns the strength of the relationship between the values of two variables.
Regression Analysis: determines the nature of that relationship and enables us to make predictions from it.
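A least-squares regression line can be sketched directly from its defining formulas (slope = Sxy / Sxx, intercept = mean of y - slope × mean of x). The paired data and the helper name are invented for illustration:

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by least squares; return (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # predicted y when x = 6: 5.8
```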
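Correlation Coefficient (r): a figure between -1 and +1 whose size (ignoring the sign) is at its maximum of 1 when the correlation is perfect and shrinks towards 0 as the correlation weakens. The sign shows whether the relationship is positive or negative.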
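This sketch uses Pearson's product-moment formula, the most widely used way of calculating r; the data are invented and chosen to show a perfect positive, a perfect negative and a moderate correlation:

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))           # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))           # perfect negative: -1.0
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # moderate: 0.77
```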
How far we can trust a sample-r as an estimate of population-r will depend on two factors:
Correlation (a mathematical relationship) can never prove a causal connection. What it does do is give support to an explanation you can justify on logical grounds.
Image credits: Statistics Without Tears by Derek Rowntree
A quick read on how to design websites that users will enjoy and want to return to. The examples are pretty dated, but it's still worth a read given how short it is.
This book is about our compulsion, as designers, to attempt to solve every problem with a smartphone or laptop. Krishna lays out three principles to help us move beyond today's screen-obsessed world.