Visualizing Variability: Histograms and Their Frequency Tables

A histogram is an important aid in visualizing the "demographics" of a dataset. It shows features that are not apparent from reading a table or list of data. A histogram is a grouped frequency chart that shows the distribution of the data. The term distribution describes the idea that some regions of a dataset may be densely populated while other regions contain few members.

A histogram is made from a frequency table. A single list of numbers is broken out into two columns, one containing divisions of the range of the data, and the other containing the number of values in each division. See the frequency chart below.

Each column of the frequency table is charted on its own axis. The horizontal axis contains the labels for the divisions, which are also called classes or bins. The vertical axis shows the number of members of each class, called the frequency. See the histogram below.

 

Frequency chart of hours-to-failure

Hour classes    Frequency
0-19            36
20-39           10
40-59           12
60-79           10
80-99           11
100-119         15
120-139         13

This frequency table shows seven classes in the left column, each having a class width of 20 hours. The right column holds the count of occurrences (the frequency) in each class. The class widths are kept equal because unequal class widths distort the true shape of the distribution.

The histogram at right is based on this frequency chart. The bars should be drawn without spaces between them if there is no gap between the upper boundary of one class and the lower boundary of the next.
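As a rough illustration of how the frequency table maps onto bars, the sketch below prints the hours-to-failure table as a sideways text histogram. The function name text_histogram and the "#" bar symbol are choices made for this example only, and the bars run horizontally rather than vertically as in the chart described above.

```python
# Frequency table copied from the hours-to-failure example in the text.
hours_to_failure = [
    ("0-19", 36),
    ("20-39", 10),
    ("40-59", 12),
    ("60-79", 10),
    ("80-99", 11),
    ("100-119", 15),
    ("120-139", 13),
]


def text_histogram(table, symbol="#"):
    """Return one line per class: the label, a bar of `symbol`, and the count.

    `text_histogram` is a hypothetical helper written for this example.
    """
    label_width = max(len(label) for label, _ in table)
    lines = []
    for label, count in table:
        # One symbol per observation, so bar length equals the frequency.
        lines.append(f"{label:>{label_width}} | {symbol * count} {count}")
    return lines


for line in text_histogram(hours_to_failure):
    print(line)
```

Each class gets one line, and classes are printed in order with none skipped, so the relative bar lengths show the shape of the distribution.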

#12 What are some facts about the time-to-failure that you can gather from the histogram?

Making a histogram

Frequency charts and histograms take one column of data and put the data into size classes. The electronics data was put into 7 classes of width 20 hours each. The classes must be of equal width, and there should be at least 5 classes; histograms usually have from 5 to 15 classes. A rule of thumb is to divide the range by 10 to get a trial class width. The class width should be a friendly number, however, because the class labels need to be understandable. Widths of 1, 2, 5, and 10 are good choices; a class width of 17 or 4.327 is not easy to follow or to work with.

Once the classes have been determined, make a frequency table of how many data items occur in each class. For chip production line 1, the data is 13, 13, 14, 14, 14, 14, 15, 15, 16, 17. The data naturally fits a class width of one, so the frequency table would be:

Class    Frequency
13       2
14       4
15       2
16       1
17       1
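The tally for line 1 can also be done by a short program. This is a minimal sketch, assuming whole-number data and a class width of one; the function name frequency_table is made up for this example.

```python
from collections import Counter

# Chip production line 1 data, as listed in the text.
line_1 = [13, 13, 14, 14, 14, 14, 15, 15, 16, 17]


def frequency_table(values):
    """Count whole-number values using a class width of one.

    Every class from the smallest to the largest value is listed,
    so a class with zero frequency would still appear in the table.
    """
    counts = Counter(values)
    return [(cls, counts[cls]) for cls in range(min(values), max(values) + 1)]


for cls, freq in frequency_table(line_1):
    print(cls, freq)
```

The printed pairs match the frequency table above: 13 occurs twice, 14 four times, 15 twice, and 16 and 17 once each.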

#13 Make a frequency table and histogram for each of chip packaging lines 1 and 2.

Excel and other statistical programs can create histograms from the raw data or from frequency tables. Each has features and quirks that you should be aware of before placing faith in the charts they produce. For example, Excel uses bin labels as the upper class boundary. The class 13 includes weights from over 12 through 13 oz! See the hours-to-failure chart above. Other programs base the class on a centerline plus or minus half the class width.
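The difference between the two conventions can be seen by assigning the same value to a class under each rule. The sketch below is only an illustration of the two rules as described above, not the actual code of Excel or any other program; the function names and the handling of a value that lies exactly on a boundary are assumptions made for this example.

```python
import bisect
import math


def upper_boundary_class(value, bin_labels):
    """Upper-boundary rule: a value belongs to the smallest label >= value.

    `bin_labels` must be sorted. Returns None when the value is larger
    than every label (how a real program treats that case may differ).
    """
    i = bisect.bisect_left(bin_labels, value)
    return bin_labels[i] if i < len(bin_labels) else None


def centered_class(value, width):
    """Centerline rule: a class covers its centerline plus or minus width/2.

    Returns the centerline. A value exactly on a shared boundary is put
    in the higher class here, which is an assumption of this sketch.
    """
    return math.floor(value / width + 0.5) * width


labels = [13, 14, 15, 16, 17]
print(upper_boundary_class(12.4, labels))  # class labeled 13
print(centered_class(12.4, 1))             # class centered on 12
```

A weight of 12.4 oz lands in the class labeled 13 under the upper-boundary rule but in the class centered on 12 under the centerline rule, so the same data can produce histograms shifted by up to one class.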

#14 Comparing the histograms you obtained, what can you say about chip packaging lines 1 and 2?

Rules of thumb and suggestions for making histograms.

1. Make a frequency table from a sorted list of the data or a tally sheet. Use the frequency table to make the histogram.
2. The number of classes should be at least five and probably not more than twenty.
3. The class boundary numbers should be easy to understand: multiples of 1, 2, 5, 10, etc.
4. An estimate of the class width can be obtained by dividing the data range by the number of classes. Round this estimate to produce reasonable class boundaries.
5. The class widths should be the same, except for "less than" and "more than" classes, which should be avoided.
6. All classes must be shown. Don't omit classes with zero frequency.
7. Make the bars join together without gaps if the classes join together, i.e., if the data is continuous.
8. When using a computer to generate histograms, observe carefully how the program organizes class boundaries.
9. Start the bar scale at zero if possible; otherwise, show a clear break to indicate the bar is not starting at zero.
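Rules 2 through 4 can be sketched as a small calculation: divide the range by the desired number of classes, then round the trial width to the nearest friendly number. The function names below are invented for this example, and "nearest of 1, 2, 5, or 10 times a power of ten" is one reasonable reading of "friendly", not the only one.

```python
import math


def friendly_class_width(data_min, data_max, n_classes=10):
    """Round range / n_classes to the nearest of 1, 2, 5, 10 times a power of ten."""
    trial = (data_max - data_min) / n_classes
    if trial <= 0:
        raise ValueError("data_max must be greater than data_min")
    magnitude = 10 ** math.floor(math.log10(trial))
    candidates = [m * magnitude for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda w: abs(w - trial))


def class_count(data_min, data_max, width):
    """Number of classes needed when boundaries sit on multiples of width."""
    return math.floor(data_max / width) - math.floor(data_min / width) + 1


# Chip production line 1 data runs from 13 to 17; ask for about 5 classes.
width = friendly_class_width(13, 17, n_classes=5)
print(width, class_count(13, 17, width))
```

For the line 1 data, the trial width is 0.8, which rounds to a class width of 1 and gives five classes. If the hours-to-failure data is assumed to span the full 0 to 139 range of its class labels, asking for 7 classes gives a trial width near 19.9, which rounds to the 20-hour width used in that table.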
