Contingency Tables 


 

We have so far shown how to relate continuous variables to other continuous variables or to categorical variables. We now turn to the case where we wish to examine the relationship between two categorical variables. For example, in the vlbw.sav data we may want to know how hospital mortality (death) varies with maternal alcohol consumption (drink). We do this by producing a cross-tabulation of death by drink. In SPSS, choose Statistics, Summarize, Crosstabs..., then place drink in the Row(s): box and death in the Column(s): box. To produce percentages, click on Cells..., select the Row option in the Percentages box, then click Continue and OK. You should get the output shown in Figure 59. Here you can see that 52.2% of the infants of mothers who drank whilst pregnant died, whereas 21.4% of the infants of mothers who did not drink whilst pregnant died. These data may also be represented graphically; see Figure 60.
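For readers working outside SPSS, the row percentages can be reproduced with a few lines of Python. This is only a sketch: the cell counts (6, 22, 35, 32) are taken from the odds quoted later in this section, and the dictionary labels and the function name `row_percentages` are illustrative, not SPSS names.

```python
# Observed counts, taken from the odds quoted later in the text (6/22 and 35/32).
# Rows: maternal drinking; columns: hospital mortality.
# The labels are illustrative and are not the SPSS value labels.
table = {
    "did not drink": {"died": 6, "survived": 22},
    "drank": {"died": 35, "survived": 32},
}


def row_percentages(counts):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row, cells in counts.items():
        total = sum(cells.values())
        result[row] = {col: 100.0 * n / total for col, n in cells.items()}
    return result


percentages = row_percentages(table)
for row, cells in percentages.items():
    print(row, {col: round(p, 1) for col, p in cells.items()})
# did not drink {'died': 21.4, 'survived': 78.6}
# drank {'died': 52.2, 'survived': 47.8}
```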


Figure 59


Figure 60

We would like to determine whether this difference is real or could have occurred by chance. In this case, our null hypothesis is that the proportion of deaths is the same in both drinking groups or, equivalently, that the proportion of drinkers is the same in each mortality group. We can calculate the expected cell values assuming this null hypothesis is correct. First we need some notation. We define the cells of a contingency table as follows:
 

 
               Variable A
Variable B     Group A1     Group A2     Total
Group B1       n11          n12          n1.
Group B2       n21          n22          n2.
Total          n.1          n.2          n = n..

Thus nij refers to the cell for row i and column j

n.j refers to the sum over all rows for column j

ni. refers to the sum over all columns for row i

n.. = n refers to the sum over all rows and all columns, or the grand total.

Overall, a proportion n.1/n of the subjects were in Group A1. If there is no difference across variable B, we would expect the same proportion (n.1/n) within both Group B1 and Group B2. From this we can calculate expected values for each of the cells. From the data shown in Figure 60 we have, for cell n11

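Expected value of n11, e11 = n1. x n.1/n = 28 x 41/95 = 12.08, where row 1 is taken to be the 28 mothers who did not drink and column 1 the 6 + 35 = 41 infants who died.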

Or in general:

Expected value for cell nij, eij = ni. x n.j / n, that is:

$$e_{ij} = \frac{n_{i.}\, n_{.j}}{n}$$
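The general formula is easy to check by hand or in code. The sketch below computes every expected count from the row and column totals. The row and column ordering (non-drinkers first, deaths first) is an assumption made for illustration and may differ from the ordering in the SPSS output; the counts themselves come from the odds quoted later in this section.

```python
def expected_counts(observed):
    """Expected cell counts under independence: e_ij = n_i. * n_.j / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: rows = (did not drink, drank), columns = (died, survived).
observed = [[6, 22], [35, 32]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 2) for e in row])
# [12.08, 15.92]
# [28.92, 38.08]
```

Note that the expected counts keep the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.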

We can get SPSS to produce these expected cell values for us. Choose Statistics, Summarize, Crosstabs..., then in the Cells option choose Expected as well, and continue as before. You should obtain a table as shown in Figure 61. We now need to compare the observed and expected values. The larger the differences between them, the stronger the evidence that mortality is associated with maternal alcohol consumption. We again need a critical value to tell us when the differences are so large that they are unlikely to be due to chance. For this we use the chi-square distribution. We form a test statistic, X2, with df equal to the product of one less than the number of rows (r) and one less than the number of columns (c). We calculate X2 as follows:

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$

with df = (r-1)(c-1).


Figure 61
 

So for our example:

$$X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = 7.64$$

with df = (2-1)(2-1) = 1

Using the chi-square tables in Appendix 4, we find the row with df = 1 and the column with P value = 0.05. This shows that X2 must be greater than 3.84 for us to reject the null hypothesis at the 5% level. We can of course do this within SPSS. Again choose Statistics, Summarize, Crosstabs..., then choose the Statistics... option and click on the Chi-square option. You should obtain the output shown in Figure 62. Here SPSS gives X2 = 7.64, 1 df, p = .0057. Since 7.64 exceeds 3.84, we reject the null hypothesis.
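The test can also be reproduced without SPSS. The sketch below computes X2 and its degrees of freedom for the observed table, then obtains the P value for the 1 df case from the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)). The function names are illustrative, the table layout is an assumption, and the P value helper is valid only for df = 1.

```python
import math


def chi_square_statistic(observed):
    """Return (X2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count e_ij
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 df only."""
    return math.erfc(math.sqrt(stat / 2.0))


# Assumed layout: rows = (did not drink, drank), columns = (died, survived).
observed = [[6, 22], [35, 32]]
stat, df = chi_square_statistic(observed)
p = p_value_one_df(stat)
reject = stat > 3.84  # 5% critical value for 1 df, as read from the tables

print(round(stat, 2), df, round(p, 4), reject)
# 7.64 1 0.0057 True
```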


Figure 62
 

The chi-square test may be applied to tables of dimensions other than 2x2. However, there are restrictions on its use. Ideally, all cells should have an expected value of 5 or more. Cells with an expected value of less than 5 are prone to cause a disproportionately large increase in X2 for slight differences from the observed value. For 2x2 tables with small expected values you may use Fisher's exact test, also given in SPSS; for other dimensions you should re-group some of the rows or columns so that the expected values are 5 or more.
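To show what Fisher's exact test does for a 2x2 table, here is a minimal sketch. It holds the row and column totals fixed, computes the hypergeometric probability of every possible table, and sums the probabilities of those tables that are no more likely than the observed one. This is one common two-sided convention; whether SPSS uses exactly this definition is not stated in these notes, so treat the choice as an assumption. The function name is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Assumed layout: rows = (did not drink, drank), columns = (died, survived).
p_exact = fisher_exact_two_sided(6, 22, 35, 32)
print(p_exact)
```

Because the expected values in our example all exceed 5, the exact test is not required here; it is shown only to illustrate the calculation.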

The 2x2 table may also be used to illustrate the odds ratio. If some event occurs with probability p, then the ratio p/(1-p) is called the odds. In our current example, the odds of mortality for non-drinkers are 0.214/0.786 (or 6/22) = 0.273; this is the same as dividing the area in the lower half of the non-drinkers bar by the upper half. The odds of mortality for drinkers are 0.522/0.478 (or 35/32) = 1.09. The ratio of these two odds is called the odds ratio. The closer this ratio is to 1, the smaller the difference between the two groups. In our example we have an odds ratio of 1.09/0.273 = 3.99.

Thus it seems that the odds of death for the infants of drinkers are approximately 4 times those for the infants of non-drinkers. SPSS will also give you this value, together with a 95% confidence interval. Choose Statistics, Summarize, Crosstabs..., then in the Statistics... option choose Risk to give the additional output shown in Figure 63.

Figure 63
 

SPSS in this instance has given us the inverse value of 0.24935. If we invert this value and its 95% confidence interval, we can say that in our study the odds of death for infants whose mothers drank were about 4 times those for infants whose mothers did not drink (95% CI 1.44, 11.14; p = 0.0057).

For the effect to be significant at the 5% level, the 95% CI must not include 1.
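The odds ratio and an approximate confidence interval can be checked in a few lines. The sketch below uses the standard log-scale (Woolf) approximation, in which the standard error of the log odds ratio is the square root of the sum of the reciprocals of the four cell counts. The notes do not say which method SPSS uses, so this choice is an assumption, although with the counts from the text it gives an interval close to the one quoted above. The function name and argument order are illustrative.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) with an approximate 95% CI (Woolf method).

    a, b: events and non-events in the first group
    c, d: events and non-events in the second group
    """
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Drinkers: 35 died, 32 survived; non-drinkers: 6 died, 22 survived.
odds_ratio, lower, upper = odds_ratio_with_ci(35, 32, 6, 22)
print(round(odds_ratio, 2), round(1 / odds_ratio, 5))
# 4.01 0.24935
print(lower, upper)
```

The unrounded odds ratio is 4.01 rather than 3.99 because the hand calculation above divides the already rounded odds 1.09 and 0.273. Its reciprocal matches the inverse value reported by SPSS.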

