The statistical methods described so far have assumed that the data are Normally distributed, or approximately so. When the data are clearly not Normally distributed, an alternative is to use non-parametric techniques. These techniques do not assume the data are Normally distributed, and thus none of the properties of the Normal distribution are applied directly. Instead, they tend to use the median and the ranks of the values. It is possible to use non-parametric methods on Normally distributed data, but this wastes the useful properties of the Normal distribution.
There are numerous non-parametric techniques, but this section covers only two. The first is analogous to the two-sample T-test, and the second is analogous to the paired T-test.
The Mann-Whitney U test
The following is a very simple description of the principle of the Mann-Whitney U test. Returning to the file height.sav, recall from the Descriptive Statistics chapter that the time variable is not Normally distributed. We have already seen that, given a large enough sample, the mean of this variable could still be assumed to have come from a Normal distribution, and thus a parametric test could be performed. For convenience, in this example we shall randomly choose 10 members of Class A and 10 members of Class B. Suppose we wish to compare the distributions of time between the 10 members of Class A and the 10 members of Class B, and we are not confident about assuming that the data are Normally distributed; we must then use a non-parametric test.
First, all the time values are ranked from 0 to 19. If Class A
tended to have lower values of time than Class B, we would
expect the Class A values to cluster towards the lower ranks. The
most extreme case is where all members of Class A are ranked 0 to
9 (Table 1). The opposite extreme occurs if
all members of Class B have lower time values than Class A, and is
represented by swapping the class columns in Table 1.
Table 1: The most extreme case, in which Class A holds ranks 0 to 9 and Class B holds ranks 10 to 19.
If there is no difference in the distributions of time between the two classes, then we would expect the ranks to fall in a completely random order across the classes (e.g. Table 2).
Table 2: An example of a random ordering of ranks across the two classes.
Under the null hypothesis it is assumed that any distribution of rankings is as likely as any other. The probability of obtaining the distribution of rankings observed or one more extreme (up to Table 1) is then calculated, under this null hypothesis. The resulting probability indicates how likely it is that the distribution observed could have occurred if there were no difference in the distribution of time between the two classes.
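The principle described above can be sketched in a few lines of Python. This is only an illustration, not the procedure SPSS uses internally: the function names are hypothetical, ranks start at 0 to match the text, tied values are given the average of their ranks (an assumption, since the text does not discuss ties), and "more extreme" is taken in both directions, which is also an assumption.

```python
from itertools import combinations


def rank_values(values):
    """Rank values from 0 upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Proportion of equally likely rank allocations at least as extreme
    (in either direction) as the one observed for group_a."""
    pooled = list(group_a) + list(group_b)
    ranks = rank_values(pooled)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    # Rank sum expected for group_a if ranks were allocated at random.
    expected = sum(ranks) * n_a / len(pooled)
    observed_deviation = abs(observed - expected)
    extreme = 0
    total = 0
    for allocation in combinations(ranks, n_a):
        total += 1
        if abs(sum(allocation) - expected) >= observed_deviation - 1e-9:
            extreme += 1
    return extreme / total
```

With 10 members in each class there are 184,756 possible allocations of ranks, so the enumeration is quick. Calling `exact_rank_sum_p(times_a, times_b)` on two lists of time values (hypothetical variable names) returns the proportion of allocations at least as extreme as the observed one.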
Using the real classes given in the height data, we obtain Table 3. To perform a Mann-Whitney U test in SPSS, click on Statistics, Nonparametric Tests, 2 Independent Samples… (Figure 1).
Table 3: Ranks of time for the two classes in the height data.
Now place time in the Test Variable List: box and class in the Grouping Variable: box, entering 1 and 2 in the relevant Define Groups… boxes. Then click on OK to give Figure 2.
It can be seen that if the distribution of time is similar in both classes, the probability of having the observed number or more of Class A with a time less than Class B is 0.0413. Thus there is evidence to suggest that the two classes have different distributions of time.
The Wilcoxon matched pairs test
This test is analogous to the paired T-test described earlier. The data
contained in headache.sav will be used to illustrate this test.
Here two values of headache clearing time were recorded for each
person: one for tablet A and the other for tablet B. As in the paired
T-test, the differences between the pairs of values are found. In the non-parametric
test, these differences are then ranked by their absolute value. For example,
if we randomly took 6 pairs of values from the cleara & clearb
columns, the most extreme events and an event indicating no difference are
illustrated in the table below.
Extreme Favoring A | No Favoring | Extreme Favoring B
In one extreme, the ranks associated with the positive differences sum to 0, so that
(0) - (1+2+3+4+5+6) = -21
represents the most extreme difference between the sums of the ranks associated with the positive and negative differences. In the lower part of the table, the other extreme gives a sum of 0 to the ranks associated with the negative differences. Now,
(1+2+3+4+5+6) - (0) = 21
represents the most extreme value in this direction.
For the example where the effects of A & B appear similar this difference is calculated as:
(5+3+2) - (1+4+6) = -1
Clearly, the closer the difference is to 0, the more similar are the effects of A & B, and the greater the difference, the more dissimilar are the effects of A & B. It can also be seen that the maximum difference depends upon the sample size. In this example, with 6 pairs, the most extreme situation led to a maximum difference of 21.
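The arithmetic above can be checked with a short Python sketch. The function names and the six differences are hypothetical, chosen only so that the ranks fall as in the worked example; the sketch assumes no zero differences and no tied absolute values, and takes the difference as the positive rank sum minus the negative rank sum.

```python
def signed_rank_difference(differences):
    """Sum of the ranks of positive differences minus the sum of the ranks
    of negative differences, ranking by absolute value from 1 upwards."""
    ordered = sorted(differences, key=abs)
    positive = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    negative = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return positive - negative


def max_difference(n):
    """Largest possible difference for n pairs: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


# Hypothetical differences whose positive values take ranks 2, 3 and 5.
similar = [-1, 2, 3, -4, 5, -6]
print(signed_rank_difference(similar))  # (5+3+2) - (1+4+6) = -1
print(max_difference(6))                # 1+2+3+4+5+6 = 21
```

The closed form n(n+1)/2 shows directly how the maximum difference grows with the number of pairs.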
The aim of the Wilcoxon matched pairs test is to find the probability of obtaining the difference in the summed ranks or a greater difference, assuming the effects of the two tablets are the same.
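One way to picture this probability is to list every possible assignment of signs to the ranks, each being equally likely if the two tablets have the same effect. The sketch below does this by brute force in Python. The function name is hypothetical, it assumes no zero or tied differences, and it counts extremes in both directions. A statistics package may use a large-sample approximation instead, so its reported value need not match this enumeration exactly.

```python
from itertools import product


def exact_signed_rank_p(n, observed_difference):
    """Proportion of the 2**n equally likely sign assignments whose rank-sum
    difference is at least as large in size as the observed difference."""
    ranks = range(1, n + 1)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        # Positive rank sum minus negative rank sum for this assignment.
        difference = sum(sign * rank for sign, rank in zip(signs, ranks))
        if abs(difference) >= abs(observed_difference):
            extreme += 1
    return extreme / total


# With 6 pairs only 2 of the 64 sign assignments reach a difference of 21.
print(exact_signed_rank_p(6, 21))  # 0.03125
```

For 17 pairs there are 131,072 sign assignments, which is still small enough to enumerate directly.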
Using all 17 cases from the headache.sav data, the Wilcoxon matched pairs test is performed in SPSS by clicking on Statistics, Nonparametric Tests, 2 Related Samples… Then, in a similar manner to the paired T-test, select both cleara & clearb, put them in the Test Pair(s) List: box (Figure 3), and click on OK to give the output shown in Figure 4. Here we find that the probability of obtaining such data, given that there is no difference in the effects of Tablets A & B, is 0.300. That is to say, there is no evidence to suggest a significant difference between the two tablets. Compare this with the results of the paired T-test performed earlier on the same data. The smaller p-value obtained using the parametric approach highlights the greater power to detect a difference that comes from utilizing the properties of the Normal distribution (where appropriate).