The effect of skew on the magnitude of product-moment correlations
William P. Dunlap
Micceri (1989) examined 440 large-sample achievement and psychometric measures for nonnormality and found that 71.6% showed moderate to extreme skew. Some types of measures showed even greater likelihood of asymmetry; for example, 91.4% of criterion/mastery tests were extremely asymmetric, and 84% of what Micceri termed psychometric measures were at least moderately asymmetric. Although Micceri found other nonnormal distortions, such as multimodality and lumpiness, asymmetry was the most common contamination. An understanding of the impact of skew on statistical procedures, such as correlation and regression, is therefore of considerable importance. Consequently, in the remainder of the Introduction, we discuss expectations concerning the effects of transforming variables on reliability estimation, power in correlational research, and intercorrelations between transformed variables. It should be noted that although our treatment of those latter issues in the Introduction is statistical in nature, we are cognizant of construct measurement and construct validity issues relating to transformed variables and will return to such issues in the Discussion section.
Lancaster’s (1957) theorem has important implications regarding the effect of transforming variables on the intercorrelations among the transformed variables. If two variables X and Y have a bivariate normal distribution with correlation rho, and if X is transformed to X* and Y is transformed to Y*, the absolute value of the correlation of X* and Y* must be less than or equal to the absolute value of rho. Because correlations are invariant under linear transformation, a linear transformation leaves the correlation unchanged; according to the theorem, however, nonlinear transformation is virtually certain to lower the correlation of bivariate normal variables. The converse of Lancaster’s theorem, which has direct implications for the present research, is that if skewed variables could be converted to bivariate normality by transformation, the correlation between them should increase. With actual data, it is highly unlikely that one could find transformations that would exactly attain the bivariate normal goal. Instead, by correcting skew in each measure as much as possible, we would hope to make the resulting joint distribution closer to bivariate normality.
Calkins (1974) demonstrated the validity of Lancaster’s (1957) theorem by generating a large population of bivariate normal data, which were then skewed to various degrees by power transformations. He found that any amount of induced skew lowered the value of the resulting correlation below that of the original bivariate normal value. The least reduction in the size of r was found when the two distributions involved had the same direction of skew, and the greatest reduction in r was found with the largest amounts of skew, when the tails of the distributions were in opposite directions.
To show the extent of those relationships in ideal data, we generated the data for Figure 1, analogous to Calkins’s (1974) demonstration but using the lognormal distribution rather than a power transformation to induce skew in originally bivariate normal data. We first generated 10,000 bivariate normal observations, x and y, correlated .70, using the IMSL function RNNOF to generate independent standard normal variables x and w, from which y was computed as
y = rx + (1 - r^2)^(1/2) w,   (1)
the procedure used by Knapp and Swoyer (1967) to produce data with an intercorrelation of r. Various amounts of skew were produced in the two variables by first modifying their variances and then transforming to the lognormal distribution by

x* = e^x,   (2)
where e is the base of the natural logarithm. An equation for the skew of lognormal data is
skew = (e^V - 1)^(1/2) (e^V + 2),   (3)
where V is the variance (Aitchison & Brown, 1976; simplified by Dunlap, Chen, & Greer, 1994). To produce negative skew, we exponentiated the reflected variable and then reflected the result again to preserve the sign of the correlation:

x* = -e^(-x).   (4)

The standardized third moments and correlations were computed and are depicted in Figure 1. As can be seen, the maximum correlation occurs when both variables have no skew. The correlation is somewhat lower when both variables are skewed in the same direction, but it is dramatically lower when the skew is in opposite directions.
Relatedly, Carroll (1961) and Halperin (1986) pointed out that differences in the shapes of the marginal distributions of two variables will constrain the maximum value that the correlation can assume. The impact of differences in distribution shape in limiting the maximal value of a correlation can be most easily seen for the phi coefficient. Ferguson (1941), among others, worried that, if left uncorrected, such effects would result in difficult items grouping with other difficult items, and easy items grouping with easy items in factor analysis. This in turn would result in “difficulty factors” that were solely the result of different shapes of the marginal distributions.
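The constraint is easiest to see numerically for the phi coefficient. The sketch below uses the standard formula for the maximum phi attainable between two dichotomous items with passing proportions p1 and p2 (our illustration; the function name and example proportions are ours, not drawn from Ferguson's paper):

```python
import math

def max_phi(p1: float, p2: float) -> float:
    """Largest attainable phi between two dichotomous items with
    passing proportions p1 and p2 (standard result; 0 < p1, p2 < 1).
    Perfect association is possible only when the marginals match."""
    if p1 > p2:
        p1, p2 = p2, p1
    return math.sqrt((p1 * (1 - p2)) / (p2 * (1 - p1)))

# Equal difficulties permit a perfect phi of 1.0 ...
print(max_phi(0.5, 0.5))
# ... but a hard item (20% passing) paired with an easy one
# (80% passing) can correlate at most .25, however strong the
# underlying relationship.
print(round(max_phi(0.2, 0.8), 2))
```

Mismatched marginal shapes thus cap the correlation well below 1, which is the mechanism behind the spurious "difficulty factors" Ferguson described.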
Dunlap, Chen, and Greer (1994) showed analytically that skew lowered reliability for two general types of skewed distributions. Those distributions were the normal distribution raised to a power and the lognormal distribution, which can be returned to symmetry by a simple power transformation or by taking the logarithm, respectively. If skew in a measure lowers the reliability of that measure, it stands to reason that correcting the skew in a measure, thus improving its reliability, should result in improved correlations of that variable with other measures. It is well known (cf. Allen & Yen, 1979; Lord & Novick, 1968) that the validity coefficient cannot be higher than the square root of the product of the reliabilities of the two measures involved (where the reliability coefficient is defined as the correlation between observed scores on parallel measures).
Levine and Dunlap (1982, 1983) showed that the power of analysis of variance (ANOVA) was reduced with lognormal data, a distribution that may have considerable skew but may be corrected to normality with the log transformation. Because ANOVA can be thought of as a subset of problems in regression (see Cohen & Cohen, 1975), it seems likely that skewed data will also result in decreased power in regression problems. Power in testing correlations is directly related to the size of the correlation coefficient (see Cohen, 1988); therefore, it is likely that reducing skew will result in increased absolute magnitudes of correlations.
So we posit that if skewed data are transformed to gain greater symmetry, then the corresponding correlations with the corrected data will be larger. Correction of data toward symmetry can be accomplished by the power series of transformations recommended by Box and Cox (1964). A power transformation raises each data point to a power selected to reduce skew. For example, if the power were 0.5, the transformation would be the square root; if the power selected were -1.0, a reciprocal transformation would result. Powers near 0 result in transformations that approach the logarithmic transformation. A program solving for the power transformation that will minimize skew was described by Dunlap and Duffy (1974).
A problem, however, is that the power transformation is a nonlinear transformation. Nonlinear transformations of data, although they may produce greater symmetry, will also change the shape of the functional relationship between transformed variables. So, if a nonlinear relationship is made more linear by a power transformation, an additional increase in the correlation could be expected. On the other hand, if the power transformation makes the underlying relationship more curvilinear, the correlation may not increase as much as would be expected based on higher reliability alone, or the correlation may in fact decrease. We speculated that the former situation, where transformation to reduce skew would make the relationship more linear, is most likely with actual data. But the question is clearly an empirical one.
To address the complex issues regarding the effects of skew on the correlations among variables, we examined the effect of correcting skew in the present study using a data set with 20 measures and a fairly large sample size. Investigating statistical issues in real, as opposed to simulated, data sets has recently become a popular approach. For example, Sawilowsky and Blair (1992) used eight of Micceri’s (1989) distributions to investigate properties of the t test. Also, Raju, Pappas, and Williams (1989) conducted an empirical Monte Carlo study with a large real data base to determine the accuracy of correlation, covariance, and regression slope models for assessing validity generalization. Halperin (1986), in discussing the importance of symmetrical marginal distributions to correlation, provided two examples of correcting skew in actual data; in one case the correlation became markedly larger, whereas in the second, little change in correlation magnitude was seen. It seems clear that to investigate how important skew is to the magnitude of correlations and, perhaps more importantly, how correcting skew influences correlations and via what mechanisms, more experience in correcting skew with actual data is needed.
The Data Set
The data examined consisted of 20 measures describing the financial characteristics of the nation’s top companies, compiled from three business publications (Staff, 1989, Special issue; Staff, 1989, May 1; Staff, 1989, May 29). Measures included salaries, other compensation, sales, stock, earnings, and so forth; after deletion of cases with missing data, the sample size was 580. Because they were economic measures in monetary units, their distributions showed primarily positive skew, which in some cases was quite extreme (see Table 1). As is typically the case, the data showed the substantial leptokurtosis that normally accompanies skew.
A program from the BMDP statistical package (Dixon, 1983) was used to compute the skew in each of the measures. Next, we used a modified version of the Dunlap and Duffy (1974) program to correct the skew in the original data and to produce a new corrected data file. Then the correlation matrices of both the original and the corrected data were computed, again using the BMDP program package.
To investigate curvilinearity, we computed a quadratic partial correlation of each measure with each other measure, both before and after transformation to correct skew. This was done for any two variables X and Y by calculating the partial correlation between Y and X^2, controlling for X. That partial correlation measured the extent to which the original relationship, or the relationship after transformation, deviated from a linear relationship in the simplest nonlinear manner, a parabolic trend (see Cohen & Cohen, 1975).
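The quadratic partial correlation can be computed by correlating residuals, as sketched below (our implementation of the index just described; the simulated variables are hypothetical):

```python
import numpy as np

def quadratic_partial_r(x: np.ndarray, y: np.ndarray) -> float:
    """Partial correlation of y with x^2, controlling for x:
    regress y on x and x^2 on x, then correlate the two sets of
    residuals. Near 0 means no parabolic trend beyond the line."""
    def residuals(v, ctrl):
        A = np.column_stack([np.ones_like(ctrl), ctrl])
        beta, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ beta
    return np.corrcoef(residuals(y, x), residuals(x ** 2, x))[0, 1]

rng = np.random.default_rng(2)
x = rng.standard_normal(2000)
linear = 2 * x + rng.standard_normal(2000)            # purely linear
curved = x + 0.5 * x**2 + rng.standard_normal(2000)   # parabolic component

print(abs(quadratic_partial_r(x, linear)) < 0.1)  # near 0 for linear data
print(quadratic_partial_r(x, curved) > 0.3)       # detects the curvature
```

The index stays near 0 when the relation is linear and becomes substantial when a parabolic component is present.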
Shown in the first column of Table 1 is the power transformation required to reduce the skew in the original variables to 0 or near 0. The raw data were simply raised to that power to reduce skew. If 0 or negative scores were present in the untransformed data, a constant was first added to make all data positive. Power transformation values near 0 indicated that the log of the data would work well in reducing skew. For most of the data the log would have done a good approximate job. Exceptions were Variable 2, where the reciprocal of the square root was about correct; Variable 16, where the reciprocal was indicated; and Variables 18-20, which could be left as they were (raised to the power of 1) but showed very little skew in the first place.

[TABULAR DATA FOR TABLE 1 OMITTED]
Transforming the data to correct asymmetry reduced the skew to 0 (see Table 1; notice that the variables in Tables 1 and 2 have been ordered from greatest to least skew). Furthermore, the kurtosis, which was highly leptokurtic (above 0) for most of the skewed data, was reduced to near 0 for all but a few variables after transformation; the kurtosis of normal data is 0. The coefficients of variation (standard deviation/mean), which were often above 1 for the skewed data, were all reduced to less than 1 for the transformed data.
The correlations of each variable with each other variable before transformation were averaged and appear in the second column of Table 2. The average correlations after transformation are shown in Column 3, and the fourth column shows the difference between the average correlations before and after transformation. In all cases the difference was positive, indicating increased average correlation after correction of skew. To help summarize the impact of skew correction on correlation, we grouped the measures in terms of the absolute amount of skew into the following categories: 7 or more, 5 to 7, 3 to 5, and 0 to 3. The increases in the average correlations for those skew categories were, respectively, .0831, .0788, .0670, and .0262. It was therefore clear that the increase in correlation was directly related to the amount of skew corrected. Averaged across the 20 variables in Table 2, the correlation increased an average of 37.4%, a rather substantial amount. The correlation between the absolute value of skew and the change in average correlation as a result of skew correction was .60 (p = .0052), which further confirmed that correcting larger skew results in larger increases in the average correlation.

[TABULAR DATA FOR TABLE 2 OMITTED]
Examination of the quadratic partial correlations before and after transformation, as well as the change score (Columns 5, 6, and 7 of Table 2), showed that curvilinearity was sometimes increased by transformation (positive differences) and sometimes decreased. The average change was .015, which was not very different from no change, yet it still represented a slight increase in curvilinearity. The correlation between the curvilinearity change and the change in average r because of transformation was -.20, which, although not significant, suggested that reductions in curvilinearity were associated with larger changes in average correlation, as would be expected.
Lancaster’s (1957) theorem, which proved that correlations will be at their maximum when bivariate normality obtains, essentially predicted the findings of this study: that removing skew from measures via transformation will in general lead to larger correlations. Lancaster’s theorem, however, gave rather limited insight into why that should occur. The treatment of transformation with respect to correlation and prediction in the statistical literature almost always concerns one of two objectives. The first is detecting nonlinear relationships and attempting to make the relation more linear by transforming one or both variables. The second is ensuring that the residuals from the regression line have characteristics appropriate for parametric tests of significance. For example, Hamilton (1992, pp. 46, 55) examined the correlation between water use and household income, both for the raw data and after correcting positive skew in both variables. He carefully described the improvements in the distribution of residuals, yet he failed to point out that the correlation increased from .4177 to .4511 as a result of correcting skew.
In the present study, our goal was to use transformations simply to correct skew, rather than to improve linearity or to improve the distribution of residuals. One result of the nonlinear transformations required to correct skew was that nonlinearity became, in fact, slightly greater in a number of instances, even though the average correlations in all 20 cases increased. Therefore, we can safely conclude that the increase in correlation magnitude that we saw was not to any great extent a result of correcting curvilinearity. Instead, another potential mechanism for increasing a correlation between measures is to increase the reliability of the individual measures. Given normal theory, the correlation between measures after changes in the reliabilities of the two measures is

R_XY = r_XY [(R_XX R_YY) / (r_XX r_YY)]^(1/2),   (5)
where R_XY is the new correlation, r_XY is the old correlation, R_XX and R_YY are the new reliabilities, and r_XX and r_YY are the old reliabilities. It is clear that if the new reliabilities are greater than the old reliabilities, the resulting new correlation will be larger.
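Equation 5 is simple to apply. The following sketch (our illustration; the reliability and correlation values are hypothetical) shows the expected gain when transformation raises both reliabilities:

```python
import math

def corrected_r(r_xy: float, r_xx: float, r_yy: float,
                R_xx: float, R_yy: float) -> float:
    """Equation 5: the correlation expected after the reliabilities
    change from (r_xx, r_yy) to (R_xx, R_yy), under normal theory."""
    return r_xy * math.sqrt((R_xx * R_yy) / (r_xx * r_yy))

# If correcting skew raised both reliabilities from .70 to .85, a
# correlation of .40 would be expected to rise to roughly .49.
print(round(corrected_r(0.40, 0.70, 0.70, 0.85, 0.85), 2))
```

When the reliabilities are unchanged, the formula returns the original correlation, as it should.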
For measures that are composites of multiple items, increased reliability can be obtained (via the Spearman-Brown prophecy formula) by increasing the number of items. On the other hand, if a measure is skewed, the reliability of that measure will be improved by transformation to normality, as shown analytically for lognormal and power distributions and empirically for examples of actual data (Dunlap, Chen, & Greer, 1994). Therefore, at least some of the increases in average correlations noted earlier are likely the result of improved reliability in the constituent measures because of correction of skew. Using the lognormal model with various amounts of skew, we attempted to estimate the extent of increase that would result from increased reliability alone. Our estimates fell short of the actual increases (see Table 1), suggesting that there are direct effects of correcting skew on correlations over and above those based on increased reliability alone. Even with a large amount of skew in one variable, the maximum correlation occurs when the other variable has skew in the same direction, but of a somewhat lesser extent (see Figure 1). Substantially smaller correlations are seen when the data are skewed in opposite directions. Although the data set we studied had few examples of skew in opposite directions, one could anticipate that even greater improvement in correlation magnitude could be expected by transforming to symmetry in such cases.
What we recommend is that scientists working in areas of correlational research at least scan their measures for the presence of skew. If skew is not great, correcting for skew by means of transformation would be expected to yield only modest increases in the magnitudes of correlations; therefore it is debatable whether the increase is worth the loss in interpretation incurred by moving a step away from the variable as originally measured by using a transformed variable instead. Where the skew is great, however, and when the researcher is primarily concerned with prediction, much may be gained in the magnitudes of correlations by transforming to symmetry. In prediction contexts it is well recognized that the correlation coefficient itself is a direct index of practical gains in predictive efficiency (cf. Brogden, 1946; Cronbach & Gleser, 1965; Raju, Burke, & Normand, 1990).
Although the transformation of variables may be based on substantive reasons (cf. Cullen, Anderson, & Baker’s, 1986, tests of the theory of structural differentiation), researchers have frequently examined correlates of both transformed and untransformed variables to search for empirical regularities. For instance, there is a body of literature examining the relationship between organizational size and organizational performance in which size is defined either as the number of employees (e.g., Glisson & Martin, 1980) or as the log of the number of employees (e.g., Yasai-Ardekani, 1989). The search in that literature for empirical regularities between organizational size and performance is often at the expense of developing theories of organizational performance (Kimberly, 1976). The two operationalizations of organizational size, however, and the resulting correlations with organizational performance measures have assisted in clarifying such relationships (cf. Gooding & Wagner, 1985; Yasai-Ardekani, 1989). As summarized in Gooding and Wagner’s meta-analysis, correlations between the logarithm of the number of employees and organizational productivity are consistently greater (ranging from .50 to .69) than those between untransformed employee counts and organizational productivity (ranging from .31 to .53).
We believe that the present findings assist in explaining why researchers such as Gooding and Wagner (1985) obtained higher correlations with transformed than with nontransformed variables. Furthermore, from a theoretical perspective, the findings of Gooding and Wagner (1985) relating to the log of the number of employees suggest hypotheses concerning relative gains in organizational productivity as a function of workforce size. It is plausible that examinations of the difference between correlations based on transformed and on nontransformed variables in other domains will also lead to the development of similar hypotheses.
It should be pointed out that the extent to which relationships based on types of transformed variables other than those studied by Gooding and Wagner (1985) cross-validate and/or generalize across situations is an empirical question that remains to be answered. Relatedly, the degree to which relationships based on transformed sample data are good estimates of population correlations, particularly when the population is believed to be skewed, is also a question to be addressed.
The findings of the present study, together with the previous research concerning correction of skew on correlation, may also help us to understand the findings of Fowler (1987), who showed that for a variety of skewed distributions (exponential, lognormal, double exponential, half normal), the power of the test for Spearman’s rho may exceed the power for the test of Pearson’s r. It must be remembered that Spearman’s rho replaces the raw data with ranks and that ranks have a symmetric rectangular distribution. Replacing raw data with ranks will cause previously skewed distributions to become symmetrical; thus ranking can be considered another transformation to symmetry, which when applied to skewed data should cause the correlation to rise.
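Fowler's pattern is consistent with a quick simulation (ours, reusing the lognormal model from the introduction; the seed and variance are arbitrary): when both variables are heavily skewed, the rank-based coefficient exceeds the Pearson coefficient.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
n, r = 10_000, 0.70
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Heavy positive skew in both variables: lognormal with variance 4
# before exponentiation (cf. Equation 3 with V = 4).
xs, ys = np.exp(2 * x), np.exp(2 * y)

pear = pearsonr(xs, ys)[0]
spear = spearmanr(xs, ys)[0]
print(spear > pear)  # ranks restore symmetry, so rho exceeds r
```

Because ranks are unaffected by any monotone transformation, Spearman's rho here recovers essentially the association of the underlying symmetric variables, whereas Pearson's r is depressed by the induced skew.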
Before transforming variables to reduce skew, however, one must consider potential difficulties that might ensue (see, e.g., Games, 1983). If there exists a strong theoretical rationale for measuring the variable in a given way, one must question what may be lost in the construct represented after transforming the variable. Likewise, it must be clear that the relationship studied after transforming is a relationship between the transformed variables; it may be misleading if it is interpreted in terms of the original variables. Furthermore, one must be aware that finding a transformation that will correct skew in a sample, particularly a small one, may not be the same as the transformation required to correct skew in the population. In addition, the attempt to achieve symmetry by transformation in some instances, as seen earlier, may induce nonlinear relations among the resulting variables. It should also be clear that when a transformation toward symmetry does increase the value of a correlation, there is no guarantee that the correlation will be increased by that transformation in all future samples.
Against those potential problems, we must weigh the potential advantages to be gained by correcting skew, particularly if it is great. Almost all of our parametric statistical procedures are based on underlying models that assume normality, yet naturally occurring normally distributed variables are rare (Micceri, 1989). Thus, even though there is no guarantee that the resulting data will be normal, if a researcher transforms data to symmetry, one can reasonably expect the data to be closer to normal than if they had been left skewed. Thus one is more likely to approximate the underlying assumptions of the statistical procedure used.
The fact that little attention has been given to the effects of skew on correlation coefficients is in part responsible for the fact that little is known about the effects of skew on more complex procedures that use simple correlations as a starting point. We may anticipate that there are, for example, effects of skew on factor analysis, such as factors consisting of variables with similarly shaped marginal distributions, as we mentioned earlier. There should also be effects on various indices of internal consistency. One such index of internal consistency, standardized item alpha (cf. Cortina, 1993), uses the average correlation among items directly in computing alpha; if transformation makes the average correlation larger, higher internal consistency, by this measure, will result. At a minimum, skew and its effects on correlation and reliability should be a topic covered in courses on psychometrics and test development.
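The standardized alpha mechanism mentioned above can be illustrated directly (a sketch; the item count and inter-item correlations are hypothetical values of our choosing):

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized item alpha from the number of items k and the
    average inter-item correlation (cf. Cortina, 1993); this is the
    Spearman-Brown formula applied to the average correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# If correcting skew raised the average inter-item correlation of a
# 10-item scale from .25 to .30, standardized alpha would rise from
# about .77 to about .81.
print(round(standardized_alpha(10, 0.25), 3))
print(round(standardized_alpha(10, 0.30), 3))
```

Any transformation that increases the average inter-item correlation therefore raises this index of internal consistency mechanically.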
The authors would like to acknowledge support from the Robert E. Floweree Fund.
Aitchison, J., & Brown, J. A. C. (1976). The lognormal distribution. Cambridge, England: Cambridge University Press.
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society (Series B), 26, 211-243.
Brogden, H. E. (1946). On the interpretation of the correlation coefficient as a measure of predictive efficiency. Journal of Educational Psychology, 37, 65-76.
Calkins, D. S. (1974). Some effects of non-normal distribution shape on the magnitude of the Pearson product moment correlation coefficient. Interamerican Journal of Psychology, 8, 261-288.
Carroll, J. B. (1961). The nature of the data, or how to choose a correlation coefficient. Psychometrika, 26, 347-372.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. New York: Wiley.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Urbana, IL: University of Illinois Press.
Cullen, J. B., Anderson, K. S., & Baker, D. D. (1986). Blau’s theory of structural differentiation revisited: A theory of structural change or scale? Academy of Management Journal, 29, 203-229.
Dixon, W. J. (1983). BMDP statistical software. Berkeley, CA: University of California Press.
Dunlap, W. P., Chen, R. S., & Greer, T. (1994). Skew reduces test-retest reliability. Journal of Applied Psychology, 79, 310-313.
Dunlap, W. P., & Duffy, J. A. (1974). A computer program for determining optimal data transformations minimizing skew. Behavior Research Methods & Instrumentation, 6, 46-48.
Ferguson, G. A. (1941). The factorial interpretation of test difficulty. Psychometrika, 6, 323-329.
Fowler, R. L. (1987). Power and robustness in product-moment correlation. Applied Psychological Measurement, 11, 419-428.
Games, P. A. (1983). Curvilinear transformations of the dependent variable. Psychological Bulletin, 93, 382-387.
Glisson, C. A., & Martin, P. Y. (1980). Productivity and efficiency in human service organizations as related to structure, size and age. Academy of Management Journal, 23, 21-37.
Gooding, R. Z., & Wagner, J. A. (1985). A meta-analytic review of the relationship between size and performance: The productivity and efficiency of organizations and their subunits. Administrative Science Quarterly, 30, 462-481.
Halperin, S. (1986). Spurious correlations – causes and cures. Psychoneuroendocrinology, 11, 3-13.
Hamilton, L. C. (1992). Regression with graphics. Pacific Grove, CA: Brooks/Cole.
Kimberly, J. R. (1976). Organizational size and the structuralist perspective: A review, critique, and proposal. Administrative Science Quarterly, 21, 571-597.
Knapp, T. R., & Swoyer, V. H. (1967). Some empirical results concerning the power of Bartlett’s test of the significance of a correlation matrix. American Educational Research Journal, 4, 13-17.
Lancaster, H. O. (1957). Some properties of the bivariate normal distribution considered in the form of a contingency table. Biometrika, 44, 289-292.
Levine, D. W., & Dunlap, W. P. (1982). Power of the F-test with skewed data: Should one transform or not? Psychological Bulletin, 92, 272-280.
Levine, D. W., & Dunlap, W. P. (1983). Data transformation, power, and skew: A rejoinder to Games. Psychological Bulletin, 93, 596-599.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.
Raju, N. S., Burke, M. J., & Normand, J. (1990). A new approach for utility analysis. Journal of Applied Psychology, 75, 3-12.
Raju, N. S., Pappas, S., & Williams, C. P. (1989). An empirical Monte Carlo test of the accuracy of the correlation, covariance, and regression slope models for assessing validity generalization. Journal of Applied Psychology, 74, 901-911.
Sawilowsky, S. S., & Blair, R. C. (1992). A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111, 352-360.
Staff. (1989). The Business Week top 1000 [Special issue]. Business Week, pp. 163-296.
Staff. (1989, May 1). The Forbes 500’s annual directory. Forbes, pp. 173-396.
Staff. (1989, May 29). The power and the pay: The 800 best paid executives in America. Forbes, pp. 159-245.
Yasai-Ardekani, M. (1989). Effects of environmental scarcity and munificence on the relationship of context to organizational structure. Academy of Management Journal, 32, 131-156.
COPYRIGHT 1995 Heldref Publications