In Section 2, the main concern was with producing a table of data, for others to read, that communicates clearly the important patterns or messages in the data. In this section, the focus changes slightly. Your role will be that of the reader or user of the data in a table, and you will learn about approaches that make it easier for you to extract information from a table. However, manipulating tabular data into a form that makes it clearer to others will also, very often, make it clearer to
Author(s): The Open University

## Example 2.2: Early retirement from the National Health Service

A study was carried out to investigate various aspects of early retirement from the British National Health Service (NHS). In 1998–99, 5469 NHS employees from England and Wales were granted early retirement because

Can Table 2.4 be simplified further by pooling more rows or columns? Perhaps it might be, but there may well be a risk of losing some important or relevant information. So, before considering any further simplification, we shall look at adding information to the table, in the form of the r

In much of your statistical work, you will begin with a data set, often presented in the form of a table, and use the information in the table to produce diagrams and/or summary statistics that help in the interpretation of the data set. However, in practice, much interpretation of data sets can be done directly from an appropriate table of data, or by re-presenting the data in a rather different tabular form. Dealing with data in tables is the subject of this section and the next. By the time

## Activity 3 Exercise 1.1 Memory recall times

In a study of memory recall times, a series of stimulus words was shown to a subject on a computer screen. For each word, the subject was instructed to recall either a pleasa

In this section you have been introduced to the boxplot. This is a graphic that represents the key features of a set of data. A typical boxplot is shown in Figure 1.8.


## Activity 1 Drawing a boxplot: chondrite meteors

In this first section, you will learn how to construct a boxplot for a single set of data. The use of boxplots to compare two or more sets of data will then be discussed.


All materials included in this course are derived from content originated at the Open University.

Course image: Kjetil Korslien in Flickr made available under Creative Commons Attribution-NonCommercial 2.0 Licence.

Except for third party materials and otherwise

In this section, various ways of summarising certain aspects of a data set by a single number have been discussed. You have been introduced to two pairs of statistics for assessing location and dispersion. The median and interquartile range provide one pair of statistics, and the mean and standard deviation the other, each pair doing a similar job. As for the choice of which pair to use, there are pros and cons for either. You have seen that the median is a more resistant measure of location

It is worth noting that a special term is reserved for the square of the sample standard deviation: it is known as the sample variance.
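The relationship between the two statistics can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module, whose `stdev` and `variance` functions compute the sample versions (divisor n − 1). The data values are hypothetical, chosen only for illustration, and do not come from the course.

```python
import math
import statistics

# Hypothetical data sample, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(data)            # sample standard deviation
s_squared = statistics.variance(data)  # sample variance

# The sample variance is the square of the sample standard deviation
# (compared with isclose to allow for floating-point rounding).
print(math.isclose(s ** 2, s_squared))  # True
```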

## The sample variance

The sample variance of a data sample $x_1, x_2, \ldots, x_n$

5.6.1 Quartiles for the SIRDS data

For the 23 infants who survived SIRDS, the ordered birth weights are given in Table 9. The first quartile is

$q_L = x_{(\frac{1}{4}(23+1))} = x_{(6)} = 1.720$ kg.
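The position calculation for the first quartile can be sketched in code. This is an illustrative sketch only: the function names are hypothetical, the placeholder list stands in for the ordered birth weights of Table 9 (which are not reproduced here), and the sketch deliberately handles only the case where ¼(n + 1) is a whole number, as it is for n = 23.

```python
from fractions import Fraction


def lower_quartile_position(n):
    """Position of the first quartile in an ordered sample of size n: (n + 1) / 4."""
    return Fraction(n + 1, 4)


def lower_quartile(ordered):
    """Return the first quartile of an already ordered sample.

    Only whole-number positions are handled; other cases are outside
    the scope of this sketch.
    """
    position = lower_quartile_position(len(ordered))
    if position.denominator != 1:
        raise ValueError("position is not a whole number; not covered by this sketch")
    # Positions are counted from 1, Python indices from 0.
    return ordered[int(position) - 1]


# Placeholder ordered sample of size 23 (NOT the SIRDS birth weights)
placeholder = list(range(1, 24))
print(lower_quartile_position(23))  # 6
print(lower_quartile(placeholder))  # the 6th ordered value
```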

The third quartile is

qU = x (¾

5.5 Measures of dispersion

During the above discussion of suitable numerical summaries for a typical value (measures of location), you may have noticed that it was not possible to make any kind of decision about the relative merits of the sample mean and median without introducing the notion of the extent of variation of the data. In practice, this means that the amount of information contained in these measures, when taken in isolation, is not sufficient to describe the appearance of the data. A more informative numer

5.3 The mean

The second measure of location defined in this course for a collection of data is the mean. Again, to be precise, we are discussing the sample mean, as opposed to the population mean. This is what most individuals would understand by the word ‘average’. All the items in the data set are added together, giving the sample total. This total is divided by the number of items (the sample size).
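The calculation just described can be sketched in a few lines of Python. The function name `sample_mean` and the data values are hypothetical, introduced only for illustration.

```python
def sample_mean(values):
    """Return the sample mean: the sample total divided by the sample size."""
    if not values:
        raise ValueError("the sample mean is undefined for an empty data set")
    total = sum(values)          # the sample total
    return total / len(values)   # divided by the sample size


# Hypothetical data, for illustration only
items = [1.5, 2.0, 2.5, 3.0]
print(sample_mean(items))  # 2.25
```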


5 Numerical summaries

Histograms provide a quick way of looking at data sets, but they lose sight of individual observations and they tend to play down ‘intuitive feel’ for the magnitude of the numbers themselves. We may often want to summarize the data in numerical terms; for example, we could use a number to summarize the general level (or location) of the values and, perhaps, another number to indicate how spread out or dispersed they are. In this section you will learn about some numerical summaries

4.4 Histograms and scatterplots: summary

Two common graphical displays, most frequently used for continuous data (arising from measurements), have been introduced in this section. A histogram is in a sense a development of the idea of a bar chart. A set of continuous data is divided up into groups, the frequencies in the groups are found, and a histogram is produced by drawing vertical bars, without gaps between them, whose heights are proportional to the frequencies in the groups. You have seen that the shape of a histogram drawn f

4.3 Scatterplots: body weights and brain weights for animals

In our discussion of the data on body weights and brain weights for animals in section 1.7, we conjectured a strong relationship between these weights on the grounds that a large body might well need a large brain to run it properly. At that stage a ‘difficulty’ with the data was also suggested, but we did not say exactly what it was. It would, you might reasonably have thought, be useful to look at a scatterplot, but you will see the difficulty if you actually try to produce one. Did you

4 Histograms and scatterplots

In this section, two more kinds of graphical display are introduced – histograms in section 3.2 and scatterplots in section 3.3. Both are most commonly used with data that do not relate to separate categories, unlike pie charts and bar charts. However, as you will see, histograms do have something in common with bar charts. Scatterplots are a very common way of picturing the way in which two different quantities are related to each other.


3.8 Pie charts and bar charts: summary

Two common display methods for data relating to a set of categories have been introduced in this section. In a pie chart, the number in each category is proportional to the angle subtended at the centre of the circular chart by the corresponding ‘slice’. In a bar chart, the number in each category is proportional to the length of the corresponding bar. The bars may be arranged vertically or horizontally, though it is conventional to draw them vertically where the labelling of the chart ma