| Sample means | |||||
| It would be very nice if all the data we collected were distributed
normally; then we could use what we know about the normal distribution--
the patterns stated in the empirical rule-- to apply to the data.
Unfortunately not all the data we gather will be bell curve data.
For example, the probabilities of rolling a 1, 2, 3,
4, 5, or 6 on a die follow a uniform distribution. The probability
of finding x defects in a manufactured door follows a different
non-normal distribution.
However, if we take repeated large samples and average within each sample, the sample averages will be approximately normally distributed! This occurs even with samples from a non-normal distribution. This property is essential to statistical process control.
We start by taking a sample consisting of several measurements; we next find the average of those measurements, which is called a sample mean. The set of sample means from several such samples is our new data set, and it will be approximately normally distributed. From individual measurements we thus obtain new data based on subgroups: the means of the subgroups. |
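The claim that averages of non-normal data behave like bell-curve data can be checked with a small simulation. The following is a minimal sketch, not a prescribed procedure: the function name `sample_means`, the choice of 2000 samples of 30 die rolls each, and the fixed random seed are all assumptions made for illustration.

```python
import random
import statistics

def sample_means(num_samples, sample_size, rng):
    """Roll a fair die sample_size times per sample; return each sample's mean."""
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
means = sample_means(num_samples=2000, sample_size=30, rng=rng)

center = statistics.mean(means)   # should sit near 3.5, the mean of one die roll
spread = statistics.stdev(means)  # standard deviation of the sample means

# Fraction of sample means within 1 and within 3 standard deviations of the center.
within_1 = sum(abs(m - center) <= spread for m in means) / len(means)
within_3 = sum(abs(m - center) <= 3 * spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"within 1 sd: {within_1:.1%}, within 3 sd: {within_3:.1%}")
```

Individual die rolls are uniformly distributed, yet the fractions printed for the sample means should come out close to the empirical-rule figures.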
|||||
|
The fine print:
1. Sample means from samples of any size are normally distributed if the parent distribution is normal.
2. Sample means from large samples (n is 30 or more) are approximately normally distributed even if the parent distribution is non-normal. The parent distribution must be free of extreme skew or outliers.
The standard deviation of the sample means is given by σ/√n, where σ is the standard deviation of the individuals and n is the sample size. |
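The standard result for the spread of sample means is that their standard deviation equals the standard deviation of the individuals divided by the square root of the sample size. Below is a minimal sketch that compares this prediction with a simulation of die rolls; the helper name `simulated_sd_of_means`, the sample sizes 5 and 30, and the count of 5000 samples are illustrative assumptions.

```python
import math
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]
sigma = statistics.pstdev(FACES)  # population sd of a single fair die roll

def simulated_sd_of_means(sample_size, num_samples, rng):
    """Standard deviation of the means of num_samples simulated samples."""
    means = [
        statistics.mean([rng.choice(FACES) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed (arbitrary) for repeatability
results = {}
for n in (5, 30):
    predicted = sigma / math.sqrt(n)  # sd of individuals / sqrt(sample size)
    observed = simulated_sd_of_means(n, 5000, rng)
    results[n] = (predicted, observed)
    print(f"n={n}: predicted {predicted:.4f}, simulated {observed:.4f}")
```

The predicted and simulated values should agree closely, and both shrink as the sample size grows: larger samples give steadier averages.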
|||||
|
Also, it is always true that
the mean of all possible sample means is the same as the mean of the individuals. This holds for any distribution-- normal, uniform, skewed, anything.
The key idea of statistical process control: if we sample the product stream and compute the average of each sample, then those averages should follow the empirical rule. We would expect about 68% of the sample averages to fall within one standard deviation (of the sample means) of their mean, and less than 1% to fall more than 3 standard deviations from it. In such a case, variation in the sample means is due to chance and "noise", the small effects of random variation. If the sample means do not follow the empirical rule, then some other cause, not just chance variation, is affecting the process.
One note of caution: the sample means of small samples are only approximately normally distributed; this is allowed for in the numerical constants used in SPC. |
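The idea of flagging sample means that stray too far from the process mean can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the full SPC procedure: it assumes the process mean and the standard deviation of individual measurements are already known, it does not use the tabulated SPC constants mentioned above, and the function names and measurement values are hypothetical.

```python
import math
import statistics

def control_limits(process_mean, process_sd, sample_size, k=3.0):
    """Limits at k standard deviations of the sample mean around the process mean."""
    sd_of_means = process_sd / math.sqrt(sample_size)
    return process_mean - k * sd_of_means, process_mean + k * sd_of_means

def out_of_control(samples, process_mean, process_sd):
    """Return (index, sample mean) for each sample whose mean is outside the limits."""
    flagged = []
    for index, sample in enumerate(samples):
        lower, upper = control_limits(process_mean, process_sd, len(sample))
        xbar = statistics.mean(sample)
        if not lower <= xbar <= upper:
            flagged.append((index, xbar))
    return flagged

# Hypothetical subgroups of four measurements each (made-up numbers).
samples = [
    [10.1, 9.9, 10.0, 10.2],
    [9.8, 10.0, 10.1, 9.9],
    [10.6, 10.7, 10.5, 10.8],
]

lower, upper = control_limits(process_mean=10.0, process_sd=0.2, sample_size=4)
flagged = out_of_control(samples, process_mean=10.0, process_sd=0.2)
print(f"limits: {lower:.2f} to {upper:.2f}")
print("samples outside the limits:", flagged)
```

With these assumed values, the first two subgroup means lie inside the limits and the third lies above the upper limit, which is the kind of signal that suggests a cause other than chance variation.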
|||||