Using SPSS for One Way Analysis of Variance

This tutorial will show you how to use SPSS version 12 to perform a one-way, between-subjects analysis of variance and related post-hoc tests.

This tutorial assumes that you have:

Downloaded the standard class data set (click on the link and save the data file)
Started SPSS (click on Start | Programs | SPSS for Windows | SPSS 12.0 for Windows)
Loaded the standard data set

The one way analysis of variance (ANOVA) is an inferential statistical test that allows you to test if any of several means are different from each other. It assumes that the dependent variable has an interval or ratio scale, but it is often also used with ordinally scaled data.

In this example, we will test whether the response to the question "If you could not be a psychology major, which of these majors would you choose? (Math, English, Visual Arts, or History)" is related to the person's GPA. We will follow the standard steps for performing hypothesis tests:

Write the null hypothesis:
H₀: µ_Math = µ_English = µ_{Visual Arts} = µ_History
Where µ represents the mean GPA.
Write the alternative hypothesis:
H₁: not H₀
(Remember that the alternative hypothesis must be mutually exclusive and exhaustive of the null hypothesis.)
Specify the α level: α = .05
Determine the statistical test to perform: In this case, GPA is approximately ratio scaled, and we have multiple (4) groups, so the between-subjects ANOVA is appropriate.

Calculate the appropriate statistic:

SPSS assumes that the independent variable (technically a quasi-independent variable in this case) is represented numerically. In the sample data set, MAJOR is a string. So we must first convert MAJOR from a string variable to a numerical variable. See the tutorial on transforming a variable to learn how to do this. We need to automatically recode the MAJOR variable into a variable called MAJORNUM.
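
The recoding step can be pictured with a short Python sketch. It is only an illustration of the idea, not what SPSS runs internally: the function name auto_recode and the sample responses are hypothetical, and the sketch assumes that codes are handed out in ascending alphabetical order of the labels.

```python
def auto_recode(values):
    """Map each distinct string to a consecutive integer code (1, 2, 3, ...)."""
    # Assumption: codes follow the ascending alphabetical order of the labels.
    labels = sorted(set(values))
    codebook = {label: code for code, label in enumerate(labels, start=1)}
    return [codebook[v] for v in values], codebook


# Hypothetical responses, not the class data set.
major = ["Math", "English", "Visual Arts", "History", "English"]
majornum, codebook = auto_recode(major)

print(codebook)
print(majornum)
```

The codebook plays the role of the value labels: it lets you translate each number in MAJORNUM back to the original major when you read the output.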

Once you have recoded the independent variable, you are ready to perform the ANOVA. Click on Analyze | Compare Means | One-Way ANOVA:

The One-Way ANOVA dialog box appears:

In the list at the left, click on the variable that corresponds to your dependent variable (the one that was measured). Move it into the Dependent List by clicking on the upper arrow button. In this example, the GPA is the variable that we recorded, so we click on it and the upper arrow button:

Now select the (quasi) independent variable from the list at the left and click on it. Move it into the Factor box by clicking on the lower arrow button. In this example, the quasi-independent variable is the recoded variable from above, MAJORNUM:

Click on the Post Hoc button to specify the type of multiple comparison that you would like to perform. The Post Hoc dialog box appears:

Consult your statistics textbook to decide which post-hoc test is appropriate for you. In this example, I will use a conservative post-hoc test, the Tukey test. Click in the check box next to Tukey (not Tukey's-b):

Click on the Continue button to return to the One-Way ANOVA dialog box. Click on the Options button in the One-Way ANOVA dialog box. The One-Way ANOVA Options dialog box appears:

Click in the check boxes to the left of Descriptives (to get descriptive statistics), Homogeneity of Variance (to get a test of the assumption of homogeneity of variance) and Means plot (to get a graph of the means of the conditions):

Click on the Continue button to return to the One-Way ANOVA dialog box. In the One-Way ANOVA dialog box, click on the OK button to perform the analysis of variance. The SPSS output window will appear. The output consists of six major sections. First, the descriptive section appears:

For each dependent variable (e.g. GPA), the descriptives output gives the sample size, mean, standard deviation, minimum, maximum, standard error, and confidence interval for each level of the (quasi) independent variable. In this example, there were 7 people who responded that they would be a math major if they could not be a psychology major, and their mean GPA was 3.144, with a standard deviation of 0.496. There were 16 people who would be an English major if they could not be a psychology major, and their mean GPA was 2.937 with a standard deviation of 0.5788.
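
As a quick check on how the columns relate, the standard error of a mean is the standard deviation divided by the square root of the sample size. The sketch below applies that formula to the two groups quoted above. The function name is hypothetical, and the results should only be expected to approximate the Std. Error column, since the reported standard deviations are rounded.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# (n, SD) for the two groups quoted in the text above.
groups = {"Math": (7, 0.496), "English": (16, 0.5788)}

for name, (n, sd) in groups.items():
    print(f"{name}: SE = {standard_error(sd, n):.4f}")
```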

The Test of Homogeneity of Variances output tests H₀: σ²_Math = σ²_English = σ²_Art = σ²_History. Homogeneity of variance is an important assumption of the analysis of variance. To interpret this output, look at the column labeled Sig. This is the p value. If the p value is less than or equal to your α level for this test, then you can reject the H₀ that the variances are equal. If the p value is greater than the α level, then we fail to reject H₀. Failing to reject H₀ does not prove that the variances are equal, but it means we have little evidence against the homogeneity of variance assumption. In this example, the p value is .402. Because the p value is greater than the α level, we fail to reject H₀, so there is little evidence that the variances are unequal and the homogeneity of variance assumption may be reasonably satisfied.
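
For readers who want to see what a homogeneity test measures, here is a minimal sketch of Levene's W statistic. It assumes the test is Levene's test computed from absolute deviations around each group mean (the tutorial does not name the statistic, so treat this as an assumption). The scores are hypothetical, not the class data, and the sketch stops at the statistic because the p value requires the F distribution, which the Python standard library does not provide.

```python
from statistics import fmean


def levene_w(groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean of group i|
    z = []
    for g in groups:
        g_mean = fmean(g)
        z.append([abs(y - g_mean) for y in g])
    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])
    # A one-way ANOVA computed on the absolute deviations.
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical GPA scores for three groups.
scores = [
    [3.1, 3.6, 2.8, 3.4],
    [2.9, 2.2, 3.5, 3.0],
    [3.3, 2.7, 3.8, 3.0],
]
print(f"Levene W = {levene_w(scores):.3f}")
```

W is compared against an F distribution with k - 1 and N - k degrees of freedom. Groups with identical spread give W = 0, and larger values of W indicate more unequal spread.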

The ANOVA output gives us the analysis of variance summary table. There are six columns in the output:

Column Description

Unlabeled (Source of variance) The first column describes each row of the ANOVA summary table. It tells us that the first row corresponds to the between-groups estimate of variance (the estimate that measures the effect and error). The between-groups estimate of variance forms the numerator of the F ratio. The second row corresponds to the within-groups estimate of variance (the estimate of error). The within-groups estimate of variance forms the denominator of the F ratio. The final row describes the total variability in the data.

Sum of Squares The Sum of Squares column gives the sum of squares for each of the estimates of variance. Each sum of squares is the numerator of its variance estimate: dividing it by the corresponding degrees of freedom gives the mean square.

df The third column gives the degrees of freedom for each estimate of variance.

The degrees of freedom for the between-groups estimate of variance is given by the number of levels of the IV - 1. In this example there are four levels of the quasi-IV, so there are 4 - 1 = 3 degrees of freedom for the between-groups estimate of variance.

The degrees of freedom for the within-groups estimate of variance is calculated by subtracting one from the number of people in each condition / category and summing across the conditions / categories. In this example, there are 7 people in the Math category, so that category has 7 - 1 = 6 degrees of freedom. There are 16 people in the English category, so that category has 16 - 1 = 15 degrees of freedom. For art, there are 15 - 1 = 14 degrees of freedom. For history there are 7 - 1 = 6 degrees of freedom. Summing the dfs together, we find there are 6 + 15 + 14 + 6 = 41 degrees of freedom for the within-groups estimate of variance. The final row gives the total degrees of freedom, which is given by the total number of scores - 1. There are 45 scores, so there are 44 total degrees of freedom.

Mean Square The fourth column gives the estimates of variance (the mean squares). Each mean square is calculated by dividing the sum of squares by its degrees of freedom.

MS_{Between-groups} = SS_{Between-groups} / df_{Between-groups}
MS_{Within-groups} = SS_{Within-groups} / df_{Within-groups}

F The fifth column gives the F ratio. It is calculated by dividing mean square between-groups by mean square within-groups.
F = MS_{Between-groups} / MS_{Within-groups}

Sig. The final column gives the significance of the F ratio. This is the p value. If the p value is less than or equal to your α level, then you can reject H₀ that all the means are equal. In this example, the p value is .511, which is greater than the α level, so we fail to reject H₀. That is, there is insufficient evidence to claim that some of the means may be different from each other.
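
The arithmetic behind the summary table can be sketched in a few lines of Python. This is an illustrative sketch rather than SPSS's own procedure: the function names are hypothetical, the small data set is made up so the numbers are easy to follow by hand, and the Sig. column is left out because the p value requires the F distribution. Only the group sizes (7, 16, 15, 7) come from the example above.

```python
from statistics import fmean


def degrees_of_freedom(group_sizes):
    """Return (between, within, total) degrees of freedom."""
    df_between = len(group_sizes) - 1
    df_within = sum(n - 1 for n in group_sizes)
    df_total = sum(group_sizes) - 1
    return df_between, df_within, df_total


def one_way_anova(groups):
    """Build the between/within rows of the ANOVA summary table."""
    grand_mean = fmean([y for g in groups for y in g])
    means = [fmean(g) for g in groups]
    # Between-groups SS: squared deviation of each group mean from the
    # grand mean, weighted by the group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups SS: squared deviation of each score from its group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within, _ = degrees_of_freedom([len(g) for g in groups])
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Group sizes from the example: Math, English, Art, History.
print(degrees_of_freedom([7, 16, 15, 7]))

# Made-up scores for three groups, chosen for easy hand calculation.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(table)
```

For the made-up scores, both sums of squares equal 6, the degrees of freedom are 2 and 6, the mean squares are 3 and 1, and the F ratio is 3.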

We would write the F ratio as: The one-way, between-subjects analysis of variance failed to reveal a reliable effect of other major on GPA, F(3, 41) = 0.781, p = .511, MS_error = 0.292, α = .05.

The 3 is the between-groups degrees of freedom, 41 is the within-groups degrees of freedom, 0.781 is the F ratio from the F column, .511 is the value in the Sig. column (the p value), and 0.292 is the within-groups mean square estimate of variance.
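
If you report many F ratios, a small helper can assemble the string from the values in the summary table. The sketch below is one possible way to do it. The function names are hypothetical, and the formatting (three decimals, no leading zero on p and α) simply mirrors the sentence above.

```python
def drop_leading_zero(value, places):
    """Format a proportion without its leading zero, e.g. 0.511 -> '.511'."""
    return f"{value:.{places}f}".lstrip("0")


def report_f(df_between, df_within, f_ratio, p, ms_error, alpha=0.05):
    """Assemble the F-ratio report string from summary-table values."""
    return (
        f"F({df_between}, {df_within}) = {f_ratio:.3f}, "
        f"p = {drop_leading_zero(p, 3)}, "
        f"MS_error = {ms_error:.3f}, "
        f"α = {drop_leading_zero(alpha, 2)}"
    )


# Values taken from the ANOVA summary table described above.
report = report_f(3, 41, 0.781, 0.511, 0.292)
print(report)
```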

Decide whether to reject H₀: If the p value associated with the F ratio is less than or equal to the α level, then you can reject the null hypothesis that all the means are equal. In this case, the p value equals .511, which is greater than the α level (.05), so we fail to reject H₀.
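
The decision rule is simple enough to write as a one-line function. This is a minimal sketch with a hypothetical function name. Note that the comparison uses "less than or equal to", so a p value exactly equal to α leads to rejection.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# The p value from the Sig. column of the ANOVA table in this example.
decision = decide(0.511)
print(decision)
```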

When the F ratio is statistically significant, we need to look at the multiple comparisons output. Even though our F ratio is not statistically significant, we will look at the multiple comparisons to see how they are interpreted.

The Multiple Comparisons output gives the results of the Post-Hoc tests that you requested. In this example, I requested Tukey multiple comparisons, so the output reflects that choice. Different people have different opinions about when to look at the multiple comparisons output. One of the leading opinions is that the multiple comparison output is only meaningful if the overall F ratio is statistically significant. In this example, it is not statistically significant, so technically I should not check the multiple comparisons output.

The output includes a separate row for each level of the independent variable. In this example, there are four rows corresponding to the four levels of the quasi-IV. Let's consider the first row, the one with major equal to art. There are three sub-rows within this row. Each sub-row corresponds to one of the other levels of the quasi-IV. Thus, there are three comparisons described in this row:

Comparison	H₀	H₁
Art vs English	H₀: µ_Art = µ_English	H₁: µ_Art ≠ µ_English
Art vs History	H₀: µ_Art = µ_History	H₁: µ_Art ≠ µ_History
Art vs Math	H₀: µ_Art = µ_Math	H₁: µ_Art ≠ µ_Math

The second column in the output gives the difference between the means. In this example, the difference between the mean GPA of the people who would be art majors and those who would be English majors is 0.2532. The third column gives the standard error of that difference. The fourth column is the p value for the multiple comparison. If the p value is less than or equal to the α level, then you can reject the corresponding H₀. In this example, the p value for comparing the GPAs of people who would be art majors with those who would be English majors is .565, which is larger than the α level of .05, so we fail to reject H₀ that the mean GPA of the people who would be art majors is equal to the mean GPA of the people who would be English majors. In other words, the observed difference (0.2532) is not statistically reliable. The final two columns give you the 95% confidence interval for the difference.
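
The layout of the Multiple Comparisons table is easier to follow once you see how the rows are generated. The sketch below enumerates the comparisons for the four levels. The variable names are hypothetical, and the level labels are listed in the alphabetical order used in the example.

```python
from itertools import combinations, permutations

levels = ["Art", "English", "History", "Math"]

# Each unordered pair is one distinct comparison.
unique_pairs = list(combinations(levels, 2))

# The output lists every level against each of the others, so each
# distinct comparison appears twice (once in each direction).
sub_rows = list(permutations(levels, 2))

# The three sub-rows that belong to the first row (Art).
art_rows = [(i, j) for i, j in sub_rows if i == "Art"]

print(len(unique_pairs), "distinct comparisons")
print(len(sub_rows), "sub-rows in the output")
print(art_rows)
```

With four levels there are 6 distinct comparisons spread over 12 sub-rows, so when you read the table you only need to interpret each pair once.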

The next part of the SPSS output (shown above) summarizes the results of the multiple comparisons procedure. Often there are several subset columns in this section of the output. The means listed in each subset column are not statistically reliably different from each other. In this example, all four means are listed in a single subset column, so none of the means are reliably different from any of the other means. That is not to say that the means are not different from each other, but only that we failed to observe a difference between any of the means. This is consistent with the fact that we failed to reject the null hypothesis of the ANOVA.

The final part of the SPSS output is a graph showing the dependent variable (GPA) on the Y axis and the (quasi) independent variable (other major) on the X axis:

Because the quasi-independent variable is nominally scaled, the plot really should be a bar plot. Double click on the plot to invoke the SPSS Chart Editor:

In the Chart Editor, click on one of the data points:

In the Chart Editor, select Chart | Change Data Element Type | Simple Bar:

The new bar graph appears in the editor:

Make any other changes to the bar graph that you want. (See the tutorial on editing graphs if you don't remember how to make changes.)

Close the Chart Editor by selecting File | Close in the chart editor.