| Exercise 1. Take at least thirty samples from a production run, with sample size n = 5. It is always a good idea to plot the data first. In this case, make a histogram of the hardness measures, and state what the histogram has to say about the blade hardness after the process of heat treating and tempering twice. |
| Solution of exercise 1. A valid analysis depends on the quality of the data collection, which means that the data must be free from systematic bias. Random sampling guarantees that the data will reflect the population, at least over the long term. The simulation's sampling method is systematic within each batch, starting with a random choice for the first item. Thirty samples of 5 each are not the same as 150 individuals randomly chosen, but they are similar. |
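The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the batch size of 50 blades, the blade labels, and the helper name `systematic_sample` are hypothetical and do not come from the simulation itself.

```python
import random

def systematic_sample(batch, n, rng):
    """Pick n items from a batch at a fixed interval, starting at a random offset."""
    step = len(batch) // n
    if step < 1:
        raise ValueError("batch is smaller than the sample size")
    start = rng.randrange(step)  # random choice for the first item
    return [batch[start + i * step] for i in range(n)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical production run: 30 batches of 50 labelled blades each.
batches = [[f"batch{b}-blade{i}" for i in range(50)] for b in range(30)]

# One systematic sample of n = 5 from every batch.
samples = [systematic_sample(batch, 5, rng) for batch in batches]
total = sum(len(s) for s in samples)
print(len(samples), "samples,", total, "observations")
```

When the batch size is a multiple of n, as assumed here, every blade in a batch has the same chance of being chosen, because the random start is equally likely to fall on any position within the first interval.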
| Fig. 14. Frequency chart for hardness. |
| The frequency chart in figure 14 was used to create the histogram in figure 15. The histogram shows that the individual hardness measurements are strongly skewed to the left. The reasons for the skew bear investigation; there may be special causes present. The 4 blades that measured 55 are out of specification. See the work instructions. Further measurements will reveal whether skewness is a regular feature of the hardness measurements.
Since the distribution is not symmetrical, statistics that rely on a balanced distribution, such as capability ratios, are not reliable! The means of samples drawn from a skewed distribution will have a more balanced distribution, however. See the sampling distribution properties in the review pages. |
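A small simulation can illustrate why sample means are more balanced than the individual measurements. The sketch below does not use the exercise data: the hardness values and their weights are hypothetical, chosen only to give a left-skewed distribution, and `skewness` is a plain moment-based helper written for this example.

```python
import random

def skewness(data):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical left-skewed hardness distribution (NOT the exercise data).
values = [55, 56, 57, 58]
weights = [3, 7, 20, 70]

# Draw 2000 samples of size n = 5, then compare individuals with sample means.
samples = [rng.choices(values, weights=weights, k=5) for _ in range(2000)]
individuals = [x for s in samples for x in s]
means = [sum(s) / len(s) for s in samples]

skew_individuals = skewness(individuals)
skew_means = skewness(means)
print("skewness of individuals: ", round(skew_individuals, 2))
print("skewness of sample means:", round(skew_means, 2))
```

For independent observations, the skewness of the mean of n values shrinks by a factor of about the square root of n, so with n = 5 the means are still skewed, only less so.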
| Fig. 15. Histogram of 150 observations. |
| Exercise 2. Use a spreadsheet to generate descriptive statistics for the data. Point out the major features of the current production, based on the histogram and descriptive statistics. |
| Solution of exercise 2. The descriptive statistics in figure 16 were created using Excel's simple functions such as =average(C1:G30) and =max(C1:G30). Descriptive statistics are not much use alone; they need to be compared with information from other times or with historical levels. Even though the data are not a simple random sample, we can treat the 150 observations as if they were a random sample of the whole production run. In fact they are a systematic sample based on a random starting number. Each blade in a batch has an equal chance of being chosen, and the method of choosing is not likely to bias the data.
To use Excel's built-in descriptive statistics tool, the 5 data columns must be stacked into one column. Figure 16.1 shows the output of the tool. Built-in functions often display a degree of precision not warranted by the precision of the original measurements. For example, the mean is displayed to 8 decimal digits, while the original hardness was measured only to the nearest whole unit! Similarly, derived measures such as skewness displayed to 7 decimal places imply a ridiculous level of precision. How is one to use the difference in skewness between -0.7894517 and -0.7894518? Likewise, a confidence interval for the mean relies on a sampling distribution setting, which we have defeated by putting all the data into one large sample. The point is that complex automatic calculations can give inaccurate or meaningless statistics.
The main feature of the hardness data is its skewness, with most observations at 58 and a very few some distance away at 55. The reasons for these low measures would be worth investigating. Skewness in the individual measurements argues for larger sample sizes. Using n = 5 is common, but it is probably too small when the underlying distribution is skewed. |
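The stacking step and the point about unwarranted precision can also be sketched outside the spreadsheet. The Python example below is only an illustration: the three rows of readings are hypothetical stand-ins for the 30 rows in C1:G30, and `excel_style_skew` is a helper written here on the assumption that it follows the sample skewness formula documented for Excel's SKEW function.

```python
import statistics

# Hypothetical readings (NOT the exercise data): each row is one sample of n = 5.
rows = [
    [58, 58, 57, 58, 56],
    [58, 57, 58, 58, 58],
    [55, 58, 58, 57, 58],
]

# Stack the 5 columns into one column, column by column.
stacked = [x for column in zip(*rows) for x in column]

def excel_style_skew(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Hardness is read to the nearest whole unit, so report derived values
# to one extra decimal place at most.
summary = {
    "count": len(stacked),
    "mean": round(statistics.mean(stacked), 1),
    "median": statistics.median(stacked),
    "mode": statistics.mode(stacked),
    "min": min(stacked),
    "max": max(stacked),
    "stdev": round(statistics.stdev(stacked), 1),
    "skewness": round(excel_style_skew(stacked), 1),
}
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Rounding at the reporting step, not during the calculation, keeps the arithmetic exact while avoiding the display of digits that the measurements cannot support.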
| Figure 16. Descriptive statistics using Excel basic functions. |
| Figure 16.1. Descriptive statistics output using Excel's data analysis tools. |