Ipsative Comparisons of Index Scores for the Canadian WAIS-III

Longman, R Stewart


The Canadian standardization of the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III; Wechsler, 1997a, 2001) provides factor-based index scores, giving an intermediate level of analysis between IQ scores and individual subtests. This article provides tables for comparing all indices to the mean index score, and for identifying the statistical significance and relative frequency of obtained differences. This simultaneous or ipsative approach can avoid some of the statistical and logical pitfalls of multiple pairwise comparisons, such as decreased interpretability and inflated risk of Type I errors.


L’Échelle d’Intelligence de Wechsler pour adultes – troisième édition (WAIS-III; Wechsler, 1997a; Wechsler, 2001) fournit des indicateurs basés sur des facteurs et donne une analyse de niveau intermédiaire entre le QI et les sous-tests individuels. Le présent article présente des tableaux permettant de comparer tous les indices à l’indicateur moyen et d’identifier la signification statistique et la fréquence relative des différences obtenues. La méthode simultanée ou ipsative que nous présentons peut permettre d’éviter certains pièges inhérents aux comparaisons multiples par paire, telles que la diminution de la possibilité d’interprétation et le risque exagéré d’erreurs de type 1.

With the introduction of the Wechsler Intelligence Scale for Children – Third Edition (WISC-III; Wechsler, 1991), and, subsequently, the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III; Wechsler, 1997a, 2001), factor-based index scores have been provided, reflecting relatively more homogeneous selections of skills and abilities than those making up the standard Verbal and Performance IQs. For the WAIS-III, Verbal IQ has been separated and expanded into the Verbal Comprehension (VCI) and Working Memory (WMI) indices, while the Performance IQ has been broadened into the Perceptual Organization (POI) and Processing Speed (PSI) indices. These indices combine the greater reliability of the IQ scores with the greater cognitive specificity associated with the subtest scores, producing scores that may avoid some of the pitfalls of interpreting subtest score profiles, namely, the poor reliability of these profiles (Livingston, Jennings, Reynolds, & Gray, 2003; McDermott, Fantuzzo, Glutting, Watkins, & Baggaley, 1992).

The WAIS-III manual (Wechsler, 2001) includes tables for interpretation of index score differences. Tables B.1 and B.2 provide values for comparing pairs of index scores for statistical significance and rarity of obtained differences between scores. The same approach is provided for the WISC-III. However, there are both statistical and logical objections to pairwise comparisons of multiple index scores, particularly as they are presented in the manual.

A statistical concern is that the values for determining statistical significance have not been corrected for multiple comparisons. For the six comparisons between the four index scores, the probability of making a Type I error (that is, spuriously reporting a difference between any pair of index scores when there are no true differences) is 62% if using the .15 significance level, and 26% when using the more stringent .05 significance level. This may explain the infrequency of WAIS-III profiles that do not show statistically significant differences between at least one pair of indices. If desired, this difficulty could be alleviated by using tables that provide corrected comparison values, as was done by Naglieri (1993) for the WISC-III.
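The 62% and 26% figures above can be reproduced with a short calculation. The sketch below is illustrative only: it treats the six pairwise tests as independent, which is a simplifying assumption (comparisons among four correlated indices are not strictly independent), and the function name is hypothetical.

```python
from math import comb


def familywise_error(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


# Four index scores give C(4, 2) = 6 possible pairwise comparisons.
n_pairs = comb(4, 2)

for alpha in (0.15, 0.05):
    rate = familywise_error(alpha, n_pairs)
    print(f"alpha = {alpha:.2f}: familywise error rate = {rate:.0%}")
```

Under the independence assumption, the uncorrected .15 level yields a familywise rate of about 62%, and the .05 level about 26%, matching the values quoted in the text.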

Even after correcting for multiple comparisons, however, there are interpretive issues. Pairwise tests identify when a pair of indices have scores that are not likely to be the same. The interpretation of a significant difference between two indices is likely to differ, though, depending on the relative standing of each score in the overall profile. For example, if the POI score is significantly higher than PSI, this may indicate a relative strength in POI, a relative weakness in PSI, or neither may meaningfully differ from the other two indices. Single comparisons may not provide the information to make these determinations. A better strategy for interpreting index scores should use information about the overall configuration.

These concerns are only important if the index score profile can provide clinical or diagnostic information. Fortunately for users of the Wechsler scales, there is evidence that profiles of specific index strengths or weaknesses are associated with particular clinical and normative groups. For example, a reduced PSI is a common consequence of traumatic brain injury (Donders, Tulsky, & Zhu, 2001; Langeluddecke & Lucas, 2003) or a dementing process (Wechsler, 1997b, pp. 145-153). For adults with a history of learning disabilities, a reduced WMI is common (Wechsler, 1997b, pp. 176-178). In the WAIS-III normative sample, five clusters were derived based on index score profiles (Donders, Zhu & Tulsky, 2001). Three clusters differed by overall score elevation, but the remaining two profiles were characterized by a relative strength or a weakness on the PSI, compared to the remaining indices. Thus, index score profiles may help to characterize performance of both clinical and normal groups, and, by inference, individuals.

The statistical and logical issues arising from multiple pairwise comparisons have previously been reviewed when interpreting profiles of subtest scores (e.g., Knight & Godfrey, 1984). In that context, the recommended procedure has evolved from comparing pairs of subtests (e.g., Field, 1960), with many comparisons and a marked risk of overinterpreting differences, to comparing subtest scores to the overall means of subtest scores, as recommended for the WAIS-III (Wechsler, 2001). This overall comparison to the mean of the individual’s scores is considered to be more reliable, less prone to overinterpretation, and easier to communicate than the result from multiple pairs of comparisons.

Kaufman (1990, p. 436) recommended a similar procedure for comparing three empirically derived factor scores from the WAIS-R, and provided significance values. Similarly, Naglieri (1993) recommended the same approach for comparing the four index scores from the WISC-III, and provided significance values to test differences between each index and the mean of the indices. Unfortunately, this approach does not seem widely used, despite both the statistical and conceptual advantages. One reason may be a lack of data for the abnormality of these difference scores, data that are available for the pairwise comparisons of both the WAIS-III and WISC-III.

Recent tables have presented both the statistical significance and abnormality of differences from the mean index score for the U.S. standardization of the WAIS-III (Longman, 2004). The Canadian standardization of the WAIS-III, however, is characterized by slightly lower index reliabilities and lower correlations between indices, features that make tabled values from the U.S. standardization somewhat liberal. Values that are derived from the Canadian normative sample will be more accurate and useful for clinicians in this country.


Davis (1959) provides a formula for calculating critical values required for statistical significance for an individual test compared to the overall mean of multiple tests. Following Kaufman (1990, p. 436) and Naglieri (1993), I calculated values for omnibus comparisons using overall significance levels of .15, .05, and .01. These significance values were chosen to reflect the values presented for both pairwise and ipsative comparisons in the manual for the WAIS-III (Wechsler, 2001), which correspond to the .15 value recommended for hypothesis generation by Davis (1959), and the .05 and .01 values for more stringent identification of strengths and weaknesses proposed by Kaufman (1990). As inspection of the standard errors of measurement did not show any evident age-related trends, values averaged across the full age range were used (Wechsler, 2001, p. 34).
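As a rough illustration of this kind of calculation, the sketch below computes the standard error of the difference between one index and the mean of all indices (the mean includes that index), and from it a critical value. It assumes measurement errors are independent across indices, and it assumes a Bonferroni adjustment as the way the overall significance level is protected; neither detail is confirmed by the text above. The SEM values are placeholders, not values from the Canadian manual.

```python
from math import sqrt
from statistics import NormalDist


def se_index_vs_mean(sems, i):
    """SE of (score_i - mean of all k scores), assuming independent errors.

    Var = SEM_i^2 * (k - 2) / k + sum(SEM_j^2) / k^2
    """
    k = len(sems)
    return sqrt(sems[i] ** 2 * (k - 2) / k + sum(s ** 2 for s in sems) / k ** 2)


def critical_value(sems, i, alpha, bonferroni=True):
    """Two-tailed critical difference for index i against the mean index."""
    k = len(sems)
    # Assumed correction: split the overall alpha across the k comparisons.
    per_test_alpha = alpha / k if bonferroni else alpha
    z = NormalDist().inv_cdf(1 - per_test_alpha / 2)
    return z * se_index_vs_mean(sems, i)


# Hypothetical SEMs for VCI, POI, WMI, PSI (placeholders only).
example_sems = [3.0, 4.0, 4.0, 5.0]
for alpha in (0.15, 0.05, 0.01):
    values = [round(critical_value(example_sems, i, alpha), 2) for i in range(4)]
    print(alpha, values)
```

Because each index contributes to the mean it is compared with, its own error variance is weighted by (k - 2)/k rather than counted in full, which is why these critical values are smaller than those for a comparison of two separate scores.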

In addition to the statistical significance of differences, the relative infrequency of differences between a specific index and the overall mean was calculated to give one-tailed 15th, 10th, 5th, 2nd, and 1st percentile values, identifying relatively common and relatively uncommon discrepancies. This requires calculating the standard deviation of the mean of the four index scores (using formula 5-3 in Nunnally, 1978, p. 153) and the correlation of that mean with each index score (using formula 5-7 in Nunnally, 1978, p. 166), and then calculating the abnormality of the difference between two correlated scores, using the formula from Payne and Jones (1957).
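The three steps can be sketched as follows. This is not a reproduction of the cited formulas; it uses the general expressions for the variance of a mean of correlated scores and the variance of a difference between two correlated scores, and it assumes that every index has a standard deviation of 15 and that difference scores are normally distributed. The correlation matrix is a placeholder, not the Canadian normative data.

```python
from math import sqrt
from statistics import NormalDist


def difference_sd(corr, i, sd=15.0):
    """SD of (index_i - mean of all indices) from an intercorrelation matrix."""
    k = len(corr)
    # Step 1: variance of the mean of k equally scaled, correlated scores.
    var_mean = sd ** 2 * sum(corr[a][b] for a in range(k) for b in range(k)) / k ** 2
    sd_mean = sqrt(var_mean)
    # Step 2: correlation of index i with that mean.
    r_im = (sd ** 2 * sum(corr[i][j] for j in range(k)) / k) / (sd * sd_mean)
    # Step 3: SD of the difference between two correlated scores.
    return sqrt(sd ** 2 + var_mean - 2 * r_im * sd * sd_mean)


def percentile_cutoff(corr, i, tail, sd=15.0):
    """One-tailed difference reached by only `tail` of the population."""
    return NormalDist().inv_cdf(1 - tail) * difference_sd(corr, i, sd)


# Hypothetical intercorrelations among VCI, POI, WMI, PSI (placeholders only).
example_corr = [
    [1.0, 0.6, 0.6, 0.4],
    [0.6, 1.0, 0.6, 0.5],
    [0.6, 0.6, 1.0, 0.5],
    [0.4, 0.5, 0.5, 1.0],
]
for tail in (0.15, 0.10, 0.05, 0.02, 0.01):
    print(tail, round(percentile_cutoff(example_corr, 3, tail), 1))
```

Lower intercorrelations enlarge the standard deviation of the difference, so larger discrepancies are needed to reach the same level of rarity.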

Results and Example

The standard errors and corresponding critical values for each index are presented in Table 1. In addition, given the evidence for specific lowering of PSI or WMI in certain populations (individuals with suspected traumatic brain injury or learning disabilities), critical values for a targeted comparison of one of these index scores against the overall mean are provided in Table 2. These use unprotected alpha levels of .05 and .01, and are for directional (one-tailed) tests. Finally, Table 3 indicates the 1st, 2nd, 5th, 10th, and 15th percentiles for one-tailed differences between a specific index score and the overall mean. These can be used to give an indication of the relative infrequency of an obtained difference.

As an example of how to use these values, consider a 21-year-old woman reporting memory and attentional difficulties after a car crash, referred for consideration of possible brain injury. She shows a VCI of 101, POI of 98, WMI of 93, and a PSI of 88, giving an overall mean index score of 95. For this individual, and the hypothesis of traumatic brain injury, comparison of the PSI against the overall mean is appropriate. In this case, the obtained value is 7, as compared to a critical value of 6.94 for the .05 level of statistical significance. None of her remaining indices differ from the overall mean. Use of pairwise comparisons at a .05 level indicates reliable differences between VCI and PSI (but does not state if either is unusually high or low). Table 3 indicates that this discrepancy, although statistically significant, is commonly found in more than 15% of the general population, while the pairwise discrepancy is found in the same direction in about 20% of the general population. Thus, this discrepancy is reliable yet not unusual, but ipsative analysis eases the comparison to the data for groups of adults with traumatic brain injury.
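The arithmetic in this example can be checked with a few lines of code. The scores and the critical value of 6.94 are taken from the paragraph above; the variable names are illustrative, and critical values for the other indices are not reproduced here because they come from tables not shown in this text.

```python
# Index scores from the worked example.
scores = {"VCI": 101, "POI": 98, "WMI": 93, "PSI": 88}

# Ipsative reference point: the mean of the person's own index scores.
mean_index = sum(scores.values()) / len(scores)

# Signed deviation of each index from that mean.
deviations = {name: score - mean_index for name, score in scores.items()}

# One-tailed .05 critical value for PSI, as quoted in the text.
PSI_CRITICAL_05 = 6.94

# Directional test: is PSI lower than the mean by at least the critical value?
psi_weakness = -deviations["PSI"] >= PSI_CRITICAL_05

print(mean_index)
print(deviations)
print(psi_weakness)
```

The mean index score is 95, PSI falls 7 points below it, and that difference exceeds the 6.94 threshold, in line with the interpretation given above.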


For interpretive purposes, describing differences between obtained index scores and the mean index score is likely to be more useful and easier to report than comparing pairs of indices. It is much easier to conceptualize and describe a profile as showing a relative strength or weakness on one or two indices, rather than trying to present an overall profile in terms of differences between specific pairs of indices. The values in Table 1 allow for such comparisons, thus providing better integration of overall results than the current practice of pairwise comparisons. This is also conceptually consistent with current strategies for interpretation of subtest strengths and weaknesses (Wechsler, 1997a, 2001).

The values provided in Tables 1 and 2 do not indicate the abnormality of specific differences, but rather their statistical improbability when there are no true differences. As can be seen from Table B.2 of the WAIS-III manual (Wechsler, 2001), large discrepancies between index scores, although extremely unlikely to occur by chance, may still be relatively frequent. Thus, test users are urged to temper interpretations based on statistical significance with information about the infrequency of these differences, as shown in Table 3. This may reduce overinterpretation of test results and direct attention to the most reliable (and hopefully, meaningful) differences. If this can be combined with information about specific cognitive profiles, it may increase the contribution of WAIS-III scores to diagnosis and conceptualization.

Canadian-specific norms on the WAIS-III also produce slightly different results than U.S. norms. The current tables of statistical significance and rarity indicate that slightly greater differences are needed for statistical significance (with values ranging from .14 to .62), with rather larger differences needed for an equivalent level of rarity (bearing in mind the differences resulting from one- vs. two-tailed tests). This reflects the slightly lower reliability but noticeably lower intercorrelations between index scores for the Canadian sample, and indicates why Canadian-specific values are more appropriate than simply using values from the American standardization.


Davis, F. B. (1959). Interpretation of differences among averages and individual test scores. Journal of Educational Psychology, 50, 162-170.

Donders, J., Tulsky, D. S., & Zhu, J. (2001). Criterion validity of new WAIS-III subtest scores after traumatic brain injury. Journal of the International Neuropsychological Society, 7, 892-898.

Donders, J., Zhu, J., & Tulsky, D. (2001). Factor index score patterns in the WAIS-III standardization sample. Assessment, 8, 193-203.

Field, J. G. (1960). Two types of tables for use with Wechsler’s Intelligence Scales. Journal of Clinical Psychology, 16, 3-7.

Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston, MA: Allyn & Bacon.

Knight, R. G., & Godfrey, H. P. D. (1984). Assessing the significance of differences between subtests on the Wechsler Adult Intelligence Scale – Revised. Journal of Clinical Psychology, 40, 808-810.

Langeluddecke, P. M., & Lucas, S. K. (2003). Wechsler Adult Intelligence Scale – Third Edition findings in relation to severity of brain injury in litigants. The Clinical Neuropsychologist, 17, 273-284.

Livingston, R. B., Jennings, E., Reynolds, C. R., & Gray, R. M. (2003). Multivariate analyses of the profile stability of intelligence tests: High for IQs, low to very low for subtest analyses. Archives of Clinical Neuropsychology, 18, 487-507.

Longman, R. S. (2004). Values for comparison of WAIS-III index scores to overall means. Psychological Assessment, 16, 323-325.

McDermott, P., Fantuzzo, J., Glutting, J., Watkins, M., & Baggaley, A. (1992). Illusions of meaning in the ipsative assessment of children’s ability. Journal of Special Education, 25, 504-526.

Naglieri, J. A. (1993). Pairwise and ipsative comparisons of WISC-III IQ and index scores. Psychological Assessment, 5, 113-116.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

Payne, R. W., & Jones, H. G. (1957). Statistics for the investigation of individual cases. Journal of Clinical Psychology, 13, 115-121.

Wechsler, D. (1991). Wechsler Intelligence Scale for Children-III. San Antonio, TX: The Psychological Corporation.

Wechsler, D. (1997a). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The Psychological Corporation.

Wechsler, D. (1997b). WAIS-III/WMS-III technical manual. San Antonio, TX: The Psychological Corporation.

Wechsler, D. (2001). Wechsler Adult Intelligence Scale-Third Edition. Canadian technical manual. Toronto, ON: Harcourt Canada.

Received December 18, 2003

Revised September 9, 2004

Accepted September 23, 2004


Calgary Health Region

I wish to thank Donald H. Saklofske for his discussion on this topic.

Correspondence concerning this article should be addressed to R. Stewart Longman, Department of Psychology, Foothills Hospital, 1403 29th Street N.W., Calgary, Alberta, Canada T2N 2T9 (E-mail: Stewart.Longman@calgaryhealthregion.ca).

Copyright Canadian Psychological Association Apr 2005
