Cross-Cultural Generalizability of CBCL Syndromes Across Three Continents: From the USA and Holland to Australia [1]

Bernd G. Heubeck [2,3]

The study asked how well Achenbach’s 8-factor cross-informant model for the Child Behavior Checklist (Achenbach, 1991a, 1991b, 1991c) fits clinic data in the USA, Holland, and Australia. DeGroot et al.’s Dutch 8-factor model (DeGroot, Koot, & Verhulst, 1994) was also tested for its cross-cultural generalizability. Achenbach’s matched clinical sample data (N = 2110) were analyzed and contrasted with the previously reported Dutch findings (N = 2335), as well as a new data set collected on clinic referred children and adolescents in Australia (N = 2237). Confirmatory factor analyses supported the Dutch as much as the American model in the USA, Holland, and Australia. Although about 90% of items showed convergent validity across models and countries, the attention and especially the social problems factor received the least support. Most double loadings in the current models were not upheld. Instead, additional analyses discovered a number of unmodelled loadings including many cross-loadings. This led to the redefinition of the social problems factor as a mean aggression factor (with associated social problems) whereas the original aggression factor focuses on emotional acting out and the delinquent factor describes an evasive, covert type of antisocial behavior. Overall, most support was obtained for the withdrawn, somatic, anxious/depressed, thought problems, and aggressive factors.

KEY WORDS: CBCL; confirmatory factor analyses; clinical samples; USA; Holland; Australia.


The fundamental importance of our clinical constructs cannot be pointed out more clearly and dramatically than by Feinstein (1967) and repeated by Mezzich and Mezzich (1987, p. 34): “The diagnostic taxonomy establishes the patterns according to which clinicians observe, think, remember, and act.” Two taxonomies have dominated the last decade of this century: The DSM system based on clinical observation and reasoning (cf. American Psychiatric Association, 1994) and empirical, dimensional approaches as represented by the factors or syndromes derived from the Child Behavior Checklist (CBCL) and its offshoots (Achenbach, 1991a, 1991b, 1991c). Both taxonomies are in widespread use and exert a very pervasive influence on clinicians and researchers not only in the USA but around the world. However, they have also been criticized, have gone through changes, and are continuing to evolve in a process that is nowhere near completion. The focus of this paper is on the CBCL and the current taxonomy derived from it.

A mammoth amount of research has gone into the development of the empirical approach based on the CBCL (e.g., Achenbach, 1966, 1978; Achenbach, 1991a, 1991b, 1991c; Achenbach, Conners, Quay, Verhulst, & Howell, 1989; Achenbach & Edelbrock, 1978, 1979, 1981, 1983; Edelbrock & Costello, 1988). The earlier CBCL model described factors of child psychopathology that varied by sex and age group (cf. Achenbach & Edelbrock, 1983). Initially seen as a major strength of this approach, this developmental specificity made comparisons between sex and age groups difficult. Consequently the revision sought to establish factors that are common across these groups (Achenbach, 1991a). In addition, the revision attempted to integrate several rater perspectives, i.e., it demanded that factors or syndromes can be identified by at least two raters, if not three: by parents, teachers, and young people themselves (Achenbach, 1991a, 1991b, 1991c; see also Achenbach, McConaughy, & Howell, 1987). The resulting eight “cross-informant syndromes” form the core of the current empirical taxonomy (Achenbach, 1993). The fact that these syndromes can be observed across a wide age range, in males and females, and from three rater perspectives, represents a major strength of the taxonomy. In addition, there is some evidence that these syndromes can be identified in other countries as well (e.g., DeGroot, Koot, & Verhulst, 1994). Despite this success, the CBCL and its taxonomy have not been without their critics (e.g., Drotar, Stein, & Perrin, 1995; Macmann, Barnett, & Lopez, 1993) and therefore it is interesting to reexamine the process by which the syndrome scales were generated.

Achenbach (1991a) computed product–moment correlations and used principal component analysis with varimax rotation. The ratings obtained on the CBCLs consist of only three levels: never, sometimes, and often. Olsson (1979) showed that the treatment of short ordinal scales as interval scales can lead to serious distortions in the estimation of the correlation between two variables. Following Olsson (1979), the maximum likelihood estimation of the polychoric correlation is now regarded by many (e.g., Joreskog, 1990) as the better choice of statistic. Further, exploratory procedures like principal component analysis are nowadays regarded as appropriate in the first phase of instrument development. Once, however, a model has been established, confirmatory factor analysis (CFA) is seen as providing a more appropriate test of a model in a new sample (cf. Floyd & Widaman, 1995; Hull, Lehn, & Tedlie, 1991). Finally, Achenbach’s use of varimax rotation seems to be based on practical reasons rather than a strong theory about the underlying independence of different syndromes. In the generation of the 1991 scales only varimax rotations were used, leaving the question open as to whether oblique rotations may better represent the underlying factors.
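Olsson's point about short ordinal scales can be illustrated with a small simulation. The sketch below is not part of the analyses reported here. It assumes a latent bivariate normal pair with a correlation of .60 and two arbitrary, skewed cut points (0.5 and 1.5) that mimic a three-level never/sometimes/often rating; the sample size, seed, and function names are likewise hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def to_rating(z, cuts=(0.5, 1.5)):
    """Collapse a latent score into 0 (never), 1 (sometimes), or 2 (often)."""
    return sum(z > c for c in cuts)


def simulate(rho=0.60, n=20000, seed=1):
    """Return the correlation of the latent scores and of their 3-level ratings."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        latent_x.append(u)
        # Mix u and v so that the population correlation of the pair is rho.
        latent_y.append(rho * u + math.sqrt(1 - rho ** 2) * v)
    r_latent = pearson(latent_x, latent_y)
    r_rated = pearson([to_rating(z) for z in latent_x],
                      [to_rating(z) for z in latent_y])
    return r_latent, r_rated


r_latent, r_rated = simulate()
print(f"latent r = {r_latent:.2f}, three-level r = {r_rated:.2f}")
```

Under these assumptions the product-moment correlation computed on the three-level ratings comes out lower than the correlation of the latent scores, which is the kind of distortion a polychoric estimate is intended to correct.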

By the beginning of 1999 only two studies of the CBCL had been published that used a confirmatory factor analysis approach. Dedrick, Greenbaum, Friedman, Wetherington, and Knoff (1997) reported on a moderately sized (given the size of the model) sample of 631 children in the USA, and DeGroot et al. (1994) examined a substantial sample of 2335 Dutch children. Although both studies investigated the 8-factor cross-informant structure of CBCL ratings, Dedrick et al. also included a test of a 1-factor model. Based on Macmann et al.’s analysis of the CBCL as a one-dimensional measure (Macmann et al., 1993, p. 327), which provides “a global index of the relative intensity of informant concerns,” the 1-factor model may represent the most appropriate comparison to evaluate the fit of any more differentiated model. Macmann et al. also criticized the assignment of five items to two or three factors and Dedrick et al. found little support for this practice in their sample. In addition, the decision to assign aggression items to other syndromes if their loadings equal or exceed .3 (although loading .4 or higher on the aggression factor) meant that some misspecification was built into the model from the start. Some discriminant validity problems can thus be expected. Dedrick et al. did not investigate these at the item level, but instead asserted that the syndromes possess discriminant validity because their correlations were less than perfect.

Drotar et al. (1995) raised a number of other problems with the checklist, amongst them an unreflected use in different cultures. Although they point to research demonstrating the possibility that there are different thresholds for distress about particular problems in different cultures (cf. Weisz, Sigman, Weiss, & Mosk, 1993), the issue may be deeper, and not only concern mean differences, but also include differences in the very symptom constellations that are rated and by inference in the underlying syndromes. If, however, it could be demonstrated that the CBCL measures similar problems or syndromes across “…countries that differ in language, culture, and referral practices…” (DeGroot et al., 1994, p. 225), our ability to compare and use findings from studies in different countries would be enormously enhanced.

DeGroot et al. (1994) concluded that they had found supportive evidence for the cross-cultural generality of the CBCL cross-informant syndromes in a study of clinically referred children in Holland. They also used exploratory factor analyses to generate a Dutch model of CBCL factors, which shared 74 loadings with the American model, but assigned 37 items differently. Not only were they able to cross-validate this model in a second large sample of clinically referred children and adolescents, but they also showed in this cross-validation that the Dutch and the American model both provided an equally good fit to the Dutch data. The question of double loadings was not addressed in that study. In fact, the Dutch model exacerbated the problem by assigning not just five, but nine items to two factors each. Despite this drawback, the Dutch 8-factor model constitutes a major alternative to the US model, given the strength of its database and development. So far it has not been tested with American data, nor has any other test of the model been published so far.

Whether either the American or the Dutch model applies to Australian children and adolescents is also not known. Although Hensley (1988) reported norms for the CBCL in Sydney, Australia, these were based on the pre-1991 American syndrome structures. More importantly, no research has been published to date to demonstrate that either the pre-1991 or the new 1991 American CBCL syndromes are actually seen in clinics in Australia. A demonstration that the CBCL measures the same constructs in Australia as in the USA and Holland would go some way to reassure Australian practitioners and researchers that the CBCL is an appropriate instrument for use on this continent. Outside of Australia it would contribute to the further development of the global cross-cultural perspective on child psychopathology.

One other study was located that reported a confirmatory factor analysis of CBCL items (Berg, Fombonne, McGuire, & Verhulst, 1997). Unfortunately, only 43 items common to French and Dutch exploratory factors were subjected to CFA (N = 673). The study points to a possibly major issue with the thought problems syndrome in some cultures because the factor was not replicated at all in this study. In addition, DeGroot et al. (1994) reported a poor replication for the American social problem scale and some difficulties with the Dutch attention problem scale as well. Put together with some exploratory factor analyses (e.g., Doepfner, Schmeck, Berner, Lehmkuhl, & Poustka, 1994), these results question the assumption that all eight CBCL factors can be identified with equal clarity and stability across all western cultures.

The current study set out to test the US as well as the Dutch 8-factor model with clinically referred children and adolescents in Australia. As Achenbach (1991a) did not report a confirmatory factor analysis, an analysis of the US matched clinical data was also planned to (a) provide a common method basis for comparisons across countries and (b) examine the Dutch model with American data. In addition, the study was to compare the 8-factor model with the simpler 1-factor model and pay particular attention to the issue of discriminant validity and double loadings.



Australian Samples

With more than 3.5 million residents, Sydney is the largest city in Australia, which in turn has a total population of about 18 million people (only slightly larger than Holland). Sydney is the capital of New South Wales, which has about 6 million inhabitants. All the data for this study were collected in Sydney. However, clients from country regions of New South Wales were also serviced by some of the agencies as detailed later. Altogether, over 3000 CBCL records were collected during the period of 1983-1997. After excluding second raters of the same child and records with too much missing data, 2237 CBCLs were analysed, 643 from an agency called Arndell, 466 from Rivendell, 600 from Redbank, 450 from a Mental Health Service at Liverpool, and 78 from Hensley’s study (Hensley, 1988). The Arndell Child and Family Unit is a department of the Royal North Shore Hospital, offering tertiary level psychiatric outpatient, daypatient, and inpatient services. Most clients live in the Northern Sydney Health Region (up to 60% of referrals) whereas others travel from other metropolitan areas of Sydney (about 20%) as well as country areas (about 20% of referrals). The Department of Child, Adolescent, and Family Psychiatry at Redbank House is part of Westmead Hospital in the Western Sydney Health Region. It is a tertiary level service, providing outpatient, daypatient, and inpatient programs mainly to the Western Sydney Health Region, and to a lesser extent to the Wentworth area, other regions of Sydney, and country regions of NSW. The Rivendell Adolescent and Family Psychiatric Service at Concord offers tertiary level assessment and treatment services for adolescents on an outpatient, daypatient, and inpatient basis. Although a substantial section of the clientele is drawn from the local central Sydney area, Rivendell offers its services to all metropolitan areas and over half of its clientele usually comes from other areas of Sydney. In addition, services are provided to selected country regions of NSW and around 15% of clients in any one year may come from outside of Sydney. The Pediatric Mental Health Service at Liverpool is a specialized tertiary level unit offering outpatient assessment and treatment for infants, children, adolescents, and their families. The unit also provides consultation to other service providers, but does not offer an inpatient option. All clients resided within the South Western Sydney Area Health region, which mainly covers suburbs ranked low or very low in socioeconomic prestige. Hensley (1988) provided normative data for the CBCL based on interviews with 1300 Sydney parents. Her norms explicitly excluded 78 children (51 boys and 27 girls) who were assessed or treated or both assessed and treated by school counselors, psychologists, or psychiatrists. Their CBCL records were included in the current study, although they had no discernible impact on the results.

While 891 boys in the total sample were under 12 years old, the other 632 boys were 12 years or older. Only 263 girls under 12 years were included whereas 451 girls were 12 years or older. For boys the exact age distribution (n/age 4-17) was as follows: 70, 57, 100, 92, 129, 142, 154, 147, 175, 166, 127, 90, 66, and 8. For girls the exact numbers per age (4-18) were 26, 33, 17, 27, 45, 43, 43, 29, 93, 84, 103, 87, 71, 12, and 1. Mothers provided ratings for 90% of CBCLs, fathers for 5%, others for 3%, and for 2% this information was not recorded. Many forms did not include the occupational data required to estimate the socioeconomic status of the clients’ families. All that can be said from the information available is that families from a wide range of socioeconomic backgrounds used these services. Although the majority of participants were of Caucasian background, the information on ethnic background was too sketchy to provide exact figures. No claim of representativeness of the overall sample for clinic services in Sydney or New South Wales can be made. However, the large number and diversity of participants hopefully mitigated some of the possible selection biases.
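The composition of the Sydney sample can be cross-checked with a few sums. The sketch below uses only the agency counts and age distributions reported in this section; the variable names are illustrative.

```python
# Counts as reported in the text for the Sydney clinic sample.
agency_counts = {
    "Arndell": 643,
    "Rivendell": 466,
    "Redbank": 600,
    "Liverpool": 450,
    "Hensley (1988)": 78,
}
boys_by_age = [70, 57, 100, 92, 129, 142, 154, 147,   # ages 4-11
               175, 166, 127, 90, 66, 8]               # ages 12-17
girls_by_age = [26, 33, 17, 27, 45, 43, 43, 29,        # ages 4-11
                93, 84, 103, 87, 71, 12, 1]            # ages 12-18

YOUNGER = 8  # number of age levels below 12 (ages 4 through 11)

totals = {
    "agencies": sum(agency_counts.values()),
    "boys_under_12": sum(boys_by_age[:YOUNGER]),
    "boys_12_plus": sum(boys_by_age[YOUNGER:]),
    "girls_under_12": sum(girls_by_age[:YOUNGER]),
    "girls_12_plus": sum(girls_by_age[YOUNGER:]),
}
totals["all_children"] = sum(boys_by_age) + sum(girls_by_age)

for label, value in totals.items():
    print(f"{label}: {value}")
```

The age distributions reproduce the subtotals given in the text (891 and 632 boys, 263 and 451 girls), and both the agency counts and the age counts sum to the 2237 records analysed.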

The US Samples

Achenbach (1991a) performed his analyses in clinical samples of boys and girls at three age levels, 4-5, 6-11, and 12-18 years with Ns ranging from 292 to 1339 per sex/age group. These children and adolescents were seen in 52 different settings in eastern, southern, and mid-western USA. The services included a wide range of private and public psychology and psychiatry services. In order to compare clinic and nonclinic cases, Achenbach (1991a) formed samples of N = 2110 each, who were matched by sex and age, and as far as possible also by respondent, ethnicity, and SES. It was this matched clinic subsample data that was analyzed for the current study. It included 1032 boys and 1078 girls, with at least 48 subjects at every sex/age level, except for 17-year-old girls (N = 28) and 18-year-old boys and girls (total N = 24). Just over 74% of CBCLs were obtained from mothers, another 10% from fathers, 7.8% from others, and for the remainder this information was missing. About 3 out of 4 children were Caucasian, but for 6.4% this information was missing. Information about socioeconomic status was available for 92% of the sample, showing a broad distribution across the SES spectrum with a mean of 5.1 (SD = 2.4) on Hollingshead’s scale.

Dedrick et al.’s sample included 631 children and adolescents identified as suffering from severe emotional disturbances for a national adolescent and child treatment study (Dedrick et al., 1997). They came from six different US states, were mostly white (72.3%), and male (76.4%). Their ages ranged from 8 to 18 years, with a mean age of 14 years (SD = 2.4 years). Over half (55%) participated in special education programs for severely emotionally disturbed children whereas almost 45% resided in mental health facilities. Their socioeconomic background was not reported. Dedrick et al.’s findings are included in the current presentation to facilitate a direct comparison between studies (Dedrick et al., 1997).

West-European Sample

The Dutch data was collected at 25 mental health centers in the province of Zuid-Holland. Demographic details of the wider Dutch sample were reported in DeGroot et al. (1994), including a slightly larger number of girls than boys and an age range from 4 to 18 years (mean = 9.8 years). More than half (55%) of the respondents were mothers and 12% were fathers. The remaining CBCLs were answered by both parents or an adult custodian. About 93% of children were Caucasian. The mean SES of the total sample was average for Holland. The representativeness of the sample could not be established, but “to avoid selective biases as much as possible subjects were recruited from a diversity of sources…” and a broad distribution of demographic variables (DeGroot et al., 1994, p. 226). For the current investigation, the results based on the 2335 cases in the “validation sample” are included to facilitate the direct comparison between countries.

Models and Data Analyses

A major aim of the analyses was to achieve maximum comparability of results across studies. Therefore, only studies that examined all 85 cross-informant items and only models that had been tested previously, i.e., the 1-factor model (Dedrick et al., 1997) and the 8-factor model in its American and Dutch form, were considered (Achenbach, 1991a; Dedrick et al., 1997; DeGroot et al., 1994). The 8-factor model was tested in its correlated as well as uncorrelated form to clarify whether Achenbach’s correlated scales represent underlying factors that are also correlated (Achenbach, 1991a). In addition, the basis of analysis, namely a matrix of polychoric correlations, as well as the method of estimation, i.e., unweighted least-squares estimation, was held constant to avoid a possible method confound in comparing results across studies. Although Joreskog (1990) suggested the use of weighted least-squares estimation (WLS) for polychoric correlation matrices, the size of the models to be tested prohibited the computation of stable weight matrices. Both Dedrick et al. (1997) and DeGroot et al. (1994) used unweighted least-squares estimation (ULS) to overcome this problem. Their choice was supported by the findings of a Monte Carlo study conducted by Rigdon and Ferguson (1991), which showed that ULS estimation did not produce more biased parameter estimates than WLS did. Consequently, ULS estimation was chosen for the current study as well.

In the choice of fit indices, comparability with other studies was again a major criterion. The chi-square statistic is known to be strongly dependent on sample size (e.g., Marsh, Balla, & McDonald, 1988) and, although reported, was not used in the evaluations of model fit. DeGroot et al. (1994) reported the goodness of fit index (GFI), adjusted goodness of fit index (AGFI), and the root mean square residual (RMR). They too are affected by sample size, but are reported here so that the American and Australian findings can be compared with the Dutch results. The main criteria used to judge model fit included the normed fit index (BBI) proposed by Bentler and Bonett (1980), Bentler’s comparative fit index (CFI) (Bentler, 1990), a nonnormed index, TLI (Tucker & Lewis, 1973), and the root mean square error of approximation (RMSEA; Steiger & Lind, 1980). A recent Monte Carlo study of incremental fit indices by Marsh, Balla, and Hau (1996) supported the TLI and the CFI in the assessment of model fit. Dedrick et al. (1997) judged fit to be acceptable for models with CFI and TLI greater than .90 while RMSEA was less than .08. DeGroot et al. (1994, p. 229) implied that their results (GFI = .88 and AGFI = .88) reflected a limited “fit,” and assessed their RMR of .096 as “small.” Others have suggested that a GFI and AGFI > .90 and RMR < .05 characterize a relatively “good” model fit. As no criteria exist to determine precise cutoffs, interpretation of fit indices has to take into account a number of measures as well as the nature of the data and the model under examination. All computations were carried out using the PC versions of Prelis 2 and LISREL 8 (Joreskog & Sorbom, 1994).
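The incremental indices and the RMSEA named above can all be written as functions of the model and null-model chi-square values. The sketch below uses commonly cited textbook definitions, which are an assumption here: the text does not give formulas, LISREL 8 may compute the indices differently under ULS estimation, and conventions differ on whether the RMSEA uses N or N - 1. The function names and the input values are hypothetical and are not taken from Table I.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Return NFI (the text's BBI), TLI, CFI, and RMSEA from chi-square values."""
    nfi = (chi2_null - chi2_model) / chi2_null
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    cfi = 1.0 - d_model / d_null if d_null > 0 else 1.0
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"NFI": nfi, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}


def acceptable(ix):
    """Rule of thumb the text attributes to Dedrick et al. (1997)."""
    return ix["CFI"] > 0.90 and ix["TLI"] > 0.90 and ix["RMSEA"] < 0.08


# Hypothetical values for illustration only.
example = fit_indices(chi2_model=100.0, df_model=35,
                      chi2_null=1000.0, df_null=45, n=501)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
print("acceptable:", acceptable(example))
```

Because each index weighs the same chi-square information differently, a model can pass one cutoff and miss another, which is why the text recommends reading several measures together.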


Table I shows the models used, the data sets to which these models were applied, and the fit indices calculated for the current study (US Achenbach and Sydney) or reported previously (US Dedrick and Holland). The chi-square statistic of the null models varied between studies, obviously mainly as a function of sample size. The independence chi-square for the Dutch model was not reported and neither was a test of the 1-factor model.

Dedrick et al. (1997) found that the 1-factor model was not completely unfitting and analysis of Achenbach’s data for the current study showed very similar results (e.g., CFI and TLI = .85 and .84, respectively; BBI = .83 and .84, while RMSEA = .104 and .109, respectively). In Sydney, however, the fit of the 1-factor model was worse than in both of the American data sets (e.g., CFI, TLI, and BBI = .80, while RMSEA = .122).

Dedrick et al. (1997) reported a very poor fit for the uncorrelated 8-factor model. This finding was replicated in the current study for the Achenbach and the Sydney data using the US as well as the Dutch model (CFI, TLI, BBI < .38, while RMSEA > .21). However, when the model allowed for the substantial correlations between the underlying eight factors (ranging, for example, from .30 to .69 in the US model and data), the Dutch data showed a moderate fit, the Sydney data fit the US model as well as the Dutch model slightly better, and the American data showed the relatively best fit (CFI, TLI, BBI = .90, with RMSEA = .085). At the same time the size of the fit measures and the residuals demonstrated that the fit of these models was not exactly perfect and that it would be useful to examine the data in more detail.

One way of further scrutinizing the fit of the data to these models is by computing the loadings of items on the factors they are thought to express or represent. Table II shows the number of items for each syndrome that passed the conventional .3 criterion for convergent validity in the US and Dutch 8-factor models. The table also includes the number of items with loadings of .4 or higher because Achenbach (1991a) chose this higher threshold for the selection of items for the aggressive factor. Full details are reported in Appendix A for each of the hypothesized eight correlated factors in the US model as well as the Dutch model. Between 89% and 93% of items loaded above .3 on the factors they are meant to measure in the US model. The corresponding finding for the Dutch model showed 87% to 93% of items loading above .3 on their respective factors in different countries.
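The tallies described above amount to counting, per factor, the items whose loadings reach each threshold. A minimal sketch of that count follows; the factor names are borrowed from the text, but the item labels and loadings are hypothetical and are not values from Appendix A. Treating a loading exactly equal to a threshold as passing is also an assumption.

```python
def convergent_counts(loadings, thresholds=(0.3, 0.4)):
    """For each factor, count items whose loading meets each threshold."""
    summary = {}
    for factor, item_loadings in loadings.items():
        summary[factor] = {
            t: sum(1 for value in item_loadings.values() if value >= t)
            for t in thresholds
        }
    return summary


def percent_passing(loadings, threshold=0.3):
    """Percentage of all item-factor assignments that meet the threshold."""
    values = [v for items in loadings.values() for v in items.values()]
    return 100.0 * sum(v >= threshold for v in values) / len(values)


# Hypothetical loadings for illustration only.
hypothetical = {
    "withdrawn": {"item_a": 0.62, "item_b": 0.35, "item_c": 0.28},
    "somatic": {"item_d": 0.71, "item_e": 0.44},
}

counts = convergent_counts(hypothetical)
share = percent_passing(hypothetical)
print(counts)
print(f"{share:.0f}% of items load at or above .3")
```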

Examination of individual syndromes in the US model showed the best convergent validity for items measuring somatic complaints, anxious/depressed, and the aggressive syndrome. In each case only one out of four samples produced an item loading below the .3 criterion. The same syndromes showed the best convergent validity for individual items in the Dutch model. At the other end, a number of items on the social problems factor did not perform well under the US model, and the worst results were obtained for the attention syndrome. At least three items received loadings under .3 in different samples and in Sydney there were four items showing a lack of convergent validity. Under the Dutch model similar problems with three and four attention items were found.

Further examination revealed that 12 items were responsible for the reduction in convergent validity under the US model (items 1, 45, 55, 56e, 62, 63, 75, 80, 93, 101, 103, 105) and under the Dutch model (13, 17, 23, 31, 50, 55, 61, 64, 75, 80, 101, 105). There was an overlap of five nonperforming items between the two models. Five of the low loading items were assigned to more than one factor in the US model (1, 45, 62, 80, 103), and eight in the Dutch model (13, 17, 23, 31, 50, 61, 64, 80). Deletion of these items in the US Achenbach sample as well as the Sydney sample yielded correlations above .95 for the US model and above .92 for the Dutch model, between the shortened scales and the respective full length scales suggested by the models.

Discriminant validity was assessed in the US Achenbach sample as well as the Sydney clinic sample. Inspection of modification indices demonstrated a large number of potential cross-loadings as well as correlations between error variances. Exploratory factor analyses of eight factors in the US and Sydney clinic samples found no additional items loading (.3+) on the somatic complaints and anxious/depressed factors, one extra item on thought problems, two on the delinquent factor (three in the US sample), three more items on the withdrawn factor, three (in Sydney) and five (US) extra items on the aggressive factor, one in the US and five in Sydney on the attention factor, and another eight (Sydney) and eleven (US) loadings on the social problem factor.

These “new” loadings did not have a major impact on the interpretation of the withdrawn factor as they simply added that someone who rates high on the factor does not display restless behavior, does not show off, or talk too much. The additional loading of item 13 (confused) on the thought problem factor would also not be considered to change its basic meaning. Additional items on the attention problem factor included item 93 (talks too much) in Sydney and five items in the US that mainly describe the social correlates of attention problems (items 23, 25, 38, 48, 64).

Additional items on the delinquent factor showed that these children do not cling to adults (US and Sydney) and are secretive (Sydney). The aggressive factor showed additional loadings, which included crying, showing no guilt, and sulking in both countries. Restlessness and impulsiveness also received loadings from this factor in Sydney.

The majority of the cross-loadings described so far were in the .3-.4 range and would not impact in a major way on the interpretation of these factors. However, a new picture emerged from the exploratory factor analysis of the social problems factor. Only three items (25, 38, 48) on the original social problems factor were supported in the US as well as the Sydney sample. Eleven new items joined the factor in the US sample (16, 20, 21, 37, 57, 72, 81, 82, 94, 97, 106) and eight new items in the Sydney sample (16, 21, 34, 37, 57, 81, 82, 97), seven of them the same items as in the US solution. The highest loadings were found on items like attacks, fights, is mean, threatens, does not get along with others, and is not liked (range .4-.6). These cross-loadings raised the question of how distinct the newly defined social problem factor is from the delinquent and aggressive factors. The matrix of factor correlations showed that the factors are quite distinct. Correlations between the new social problems factor and the delinquent factor were low (.17 in the US, .26 in Sydney) as were correlations between the aggressive and delinquent factors (.20 in the US, .23 in Sydney). However, correlations between the new social problems factor and aggression were moderately high (.41 in the US and .45 in Sydney).


Although a number of studies have reported exploratory factor analyses of the CBCL in different countries, the many decisions that have to be made along the way (e.g., factor method, number of factors to be extracted, type of rotation, etc.) have meant that results were often not directly comparable. The current study employed exactly the same methodology (CFA) across countries to test five models that were identified a priori and found support for large sections of the Dutch and the US correlated 8-factor models. However, additional analyses also identified a number of misspecifications that should be considered in a revision of the model.

Both correlated 8-factor models demonstrated that they significantly improve measurement over the 1-factor model suggested by Macmann et al. (1993), thus countering criticism that the CBCL only measures overall level of parental concern. However, the uncorrelated 8-factor models did substantially worse than even the 1-factor model, thus strongly arguing against the use of varimax rotation in this area of inquiry. The basic strength of the 1-factor model needs to be recognized. This strength establishes a fairly high baseline (CFI, TLI, BBI of .84) and leaves relatively little room for further factors to improve fit before a ceiling is reached. Theoretically the 1-factor solution may represent a basic psychopathology factor, a higher order factor, or indiscriminate reporting by parents. Further study needs to address to what extent these interpretations apply, preferably involving some criteria outside the CBCL itself.

Despite the strength of the 1-factor model, fit indices like the CFI, TLI, and BBI rose to .90 when eight factors were specified (and potentially could rise even more after adjusting the model for misspecification, see later). Examination of convergent item validities found that about 90% of items loaded on the factors the models say they represent. More specifically, there was good support for the claim that the majority of items on six of the eight scales measure the factors they were designed to tap. Very important also is the finding that there was considerable consistency in these item loadings across the three countries. The withdrawn, somatic complaints, anxious/depressed, thought problems, delinquent, and aggressive behavior scales can thus be used with some confidence not just in the USA and Holland but in Australia as well. It should be clear though, that this conclusion is based on the convergent validity data. This means that practitioners who currently administer these scales can continue their use in the knowledge that the scale scores they compute will be highly correlated with any scale modified to adjust for the few low loading items. This recommendation only pertains to situations where individual scale scores are used to rank order children independent of their scores on other scales. It does not extend to other uses of the scales like the assessment of comorbidity or interpretation of the CBCL profile, which heavily depend on another criterion, namely discriminant validity (see later).

This study found less support for the CBCL attention factor. Given that 9 out of 14 items supposed to measure attention problems demonstrated low loadings in one or the other model, it may be most instructive to point out the items that did show cross-cultural generalizability, namely item 8 (concentrate), item 10 (sit still), and item 41 (impulsive), all with strong loadings in each country and model. Each of these items is also part of the Child Attention Profile (CAP; Edelbrock, 1988), which uses items from the Teacher Report Form of the CBCL. The CAP has a clear factor structure measuring inattention and over-activity and has been shown to be sensitive to stimulant drug effects (cf. Barkley, DuPaul, & McMurray, 1991). In view of the better performance of items on instruments derived from the CBCL, the maintenance of the original item composition on the parent form may turn out to be a procrustean bed that hampers further development. The CAP is not the only source that could assist the future clarification and development of this factor. DSM researchers who have embraced dimensional ideas have also contributed to the definition of two dimensions related to the AD/HD category, which they also call inattention and overactivity (cf. DuPaul et al., 1998; Gomez, Harvey, Quick, Sharer, & Harris, 1999). It seems as if future revisions of the CBCL could benefit from incorporating some of these advances.

The social problems factor needs a major reconceptualization. Achenbach (1993) observed that there is no clear counterpart for this factor in DSM, although at least 13 studies have reported similar factors previously. The US and the Dutch model overlap by only four items and only three of these performed well across models and countries (not get along, teased, and not liked). The three additional items in the Dutch model were supported across countries (feels persecuted, fights, and attacks). Berg et al. (1997) identified the same three core items as the current study as measuring the French-Dutch cross-cultural social problems factor. However, Doepfner et al. (1994) suggested that social problems and social withdrawal do not form separate factors and also reported substantial loadings of these three items on their aggressive factor. Additional exploratory factor analyses conducted in the current study supported the Dutch model of the factor more than the US model. Most importantly, they revealed a number of additional false negative items in the US model (mean, threatens, destroys, steals, etc.) in both the US and Sydney samples. Taken together, these results indicate a significant shift in the meaning of this factor from the original US model, which portrays an immature and clumsy child who does not get along with peers. The new factor paints the picture of a child who may be rejected, but who is mean, destructive, antisocial, and probably a bully.

Decreased convergent loadings on some items and additional loadings found in this study also suggest a slightly different emphasis in the interpretation of the delinquent and aggressive factors. The delinquent syndrome was characterized by lying, stealing, running away, truancy, and alcohol and drug use, in the US as well as Sydney. The Sydney data also showed a substantial loading for the secretive item. Taken together, this factor describes an evasive and often covert form of antisocial behavior. The aggressive factor always contained a large number of mood related items, e.g., jealous, stubborn, mood change, temper. The current study found significant additional loadings for crying and sulking on this factor in the US and in Sydney (as well as impulsiveness in Sydney), suggesting the interpretation that an emotion-regulation deficit may underlie this factor. Taken together, this means that there are three behavior problem factors measured on the CBCL: an emotional acting out factor, a mean, aggressive, and destructive factor, and an evasive, delinquent factor. Correlations ranging from .17 to .45 showed that the underlying factors are distinct. How do they relate to the literature? Cole and Zahn-Waxler (1992), for example, described the problem of emotional dysregulation in disruptive behavior disorders; Frick, O’Brien, Wootton, and McBurnett (1994) distinguished between impulsive conduct problems and callous/unemotional psychopathy; Patterson (1982) examined the overt-covert dimension of antisocial behavior; and Burns et al. (1997) factor analysed DSM symptoms of ODD and CD. How exactly the three CBCL factors just described relate to such conceptualizations will require more research.

Another issue addressed by the current study concerned the performance of items that are assigned double loadings in either model. Overall, there was little support for this practice in relation to the items currently assigned to more than one factor. In the US model none of the five items obtained substantial loadings on both factors they were meant to measure (or all three in the case of item 80). However, item 45 (nervous) and item 103 (sad) received substantial loadings across countries and models from the anxious/depressed factor. The scoring of several scales can thus be simplified by counting items on one scale only. Macmann et al. (1993) argued that items that need to be scored on several scales lack discriminant validity by definition and that the practice is undesirable. This line of reasoning assumes that there are clear diagnostic signs in child psychopathology, which are uniquely related to distinct conditions. Although an interesting ideal, the reality of child psychopathology may be different. Just as fever need not be dropped as a sign of many medical conditions, an item like confusion need not be dropped as a sign of attention as well as thought problems. What is important, though, is that the discriminant validity of the item is known and taken into account. A number of cross-loadings were found in the current study, which would improve model fit if incorporated into a revised version. Macmann et al. (1993) were also concerned that double scoring of items inflates correlations between scales. Although this is correct, this is not a problem of the model as such, but of the incorrect application or interpretation of statistics. The use of factor scores can easily overcome this problem in most research. In clinical practice with individual clients the issue usually only arises in the context of the CBCL profile, where considerable caution will continue to be necessary in the interpretation of intraindividual profile differences.
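The concern that double scoring inflates correlations between scales can be illustrated with a small simulation. The sketch below is a hypothetical example, not a reanalysis of the study data: it assumes two uncorrelated latent factors, five items unique to each scale, and two items that load on both factors and are counted on both scales. Items are simulated as continuous scores for simplicity, whereas CBCL items are rated 0-1-2. All names, loadings, and sample sizes are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_scale_correlations(n_children=5000, seed=0):
    """Return (r without double scoring, r with double scoring)."""
    rng = random.Random(seed)
    a_unique, b_unique, a_double, b_double = [], [], [], []
    for _ in range(n_children):
        # Two latent factors drawn independently, so they are uncorrelated.
        f_a, f_b = rng.gauss(0, 1), rng.gauss(0, 1)
        # Five items per scale, each loading .7 on its own factor only.
        sum_a = sum(0.7 * f_a + rng.gauss(0, 0.51 ** 0.5) for _ in range(5))
        sum_b = sum(0.7 * f_b + rng.gauss(0, 0.51 ** 0.5) for _ in range(5))
        # Two items loading .5 on both factors, scored on both scales.
        shared = sum(
            0.5 * f_a + 0.5 * f_b + rng.gauss(0, 0.5 ** 0.5) for _ in range(2)
        )
        a_unique.append(sum_a)
        b_unique.append(sum_b)
        a_double.append(sum_a + shared)
        b_double.append(sum_b + shared)
    return pearson(a_unique, b_unique), pearson(a_double, b_double)


if __name__ == "__main__":
    r_unique, r_double = simulate_scale_correlations()
    print(f"scale correlation without shared items: {r_unique:.2f}")
    print(f"scale correlation with shared items:    {r_double:.2f}")
```

Under these assumptions the scales built from unique items are nearly uncorrelated, as the factors are, while the double-scored scales show a clearly positive correlation. The inflation lies in the raw scale scores rather than in the factor model, which is in line with the argument that factor scores avoid the problem.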

Just as DeGroot et al. (1994) had found in Holland, comparison of the US and Dutch model showed similar (minimally better) fit to the data in the USA and Australia. The models share 74 loadings and both require some revision. Bringing together all findings in this study, it is clear that there is a strong core of items on the CBCL, which generalize well across models and countries. Any revision should preserve this core and improve model structure by taking convergent as well as discriminant validity equally into account. The current findings will hopefully contribute to such a revision, which could carry the CBCL and its associated taxonomy into the 21st century. However, further considerations should also enter into the process.

Firstly, the CBCL has kept the same items for the last 20 years (cf. Achenbach, 1978; Achenbach & Edelbrock, 1979). Although this constancy enabled an unprecedented accumulation of research findings that can be directly compared, it may have prevented a more dynamic development of the CBCL system by adapting items to newer insights from clinical studies. It appears that the attention syndrome may be a prime candidate for improvement through the addition of items that have already proven their worth in other studies. Secondly, the current study was limited in the sense that only a small number of models was tested. Other viable models include a two dimensional specification (e.g., internalizing and externalizing), a seven factor model (cf. Berg et al., 1997; Doepfner et al., 1994), or hierarchical models. The additional presentation of these models would have far exceeded the space limitations of a journal article, but any serious revision should include tests of these models as well. Thirdly, given the undeniable importance of different rater perspectives (cf. Achenbach et al., 1987), research with the Teacher Report Form and Youth Self-Report needs to be considered as well, just as Achenbach (1991a, 1991b, 1991c) did in the initial creation of the cross-informant syndromes. Fourthly, although the current study focused on the core syndromes that can be identified across sex and age groups (Achenbach, 1991a), there is a need to establish that any revision is also applicable in different sex and age groups. Finally, the support obtained in the current research for six of the eight CBCL syndromes should give researchers some confidence that these factors are measurable across countries as diverse as the USA, Holland, and Australia. After revision, eight syndromes may emerge as generalizable across these countries. Nonetheless, researchers need to remember that they are all so-called “Western” countries, and that further work is needed before the results can be generalized to Eastern, African, Latin, or Islamic nations.


Thanks are due to numerous people who have either assisted in the data collection, contributed data, administrative support, or advice to this study: Three anonymous reviewers, Prof. Tom Achenbach, Ms. Cathy Howell, Prof. Frank Verhulst, Prof. Joseph Rey, Dr. Nick Kowalenko, Mr. Henry Luiker, Dr. Beth Kotze, Dr. John Brennan, Ms. Julie Squires, Dr. Johanna Watson, Mr. Roberto Parada, Assoc. Prof. Bryanne Barnett, Mr. Stephen Matthey, Ms. Sherryl Davies, Ms. Michelle Willis, and Dr. Rae Hensley.

(1.) An earlier version of this paper was presented at the 14th World Congress on Psychosomatic Medicine of the International College of Psychosomatic Medicine, Cairns, 31.8.97-5.9.97.

(2.) Division of Psychology, School of Life Sciences, Faculty of Science, The Australian National University, Canberra, Australia.

(3.) Address all correspondence to Bernd G. Heubeck, Division of Psychology, School of Life Sciences, Faculty of Science, The Australian National University, Canberra, ACT 0200, Australia; e-mail:


Achenbach, T. M. (1966). The classification of children’s psychiatric symptoms: A factor analytic study. Psychological Monographs, 80 (No. 615).

Achenbach, T. M. (1978). The Child Behavior Profile: I. Boys aged 6-11. Journal of Consulting and Clinical Psychology, 46, 478-488.

Achenbach, T. M. (1991a). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991b). Manual for the Teacher’s Report Form and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991c). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1993). Empirically based taxonomy: How to use syndromes and profile types derived from the CBCL/4-18, TRF, and YSR. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., Conners, C. K., Quay, H. C., Verhulst, F. C., & Howell, C. T. (1989). Replication of empirically derived syndromes as a basis for taxonomy of child/adolescent psychopathology. Journal of Abnormal Child Psychology, 17, 299-323.

Achenbach, T. M., & Edelbrock, C. (1978). The classification of child psychopathology: A review and analysis of empirical efforts. Psychological Bulletin, 85, 1275-1301.

Achenbach, T. M., & Edelbrock, C. (1979). The Child Behavior Profile: II. Boys aged 12-16 and girls aged 6-11 and 12-16. Journal of Consulting and Clinical Psychology, 47, 223-233.

Achenbach, T. M., & Edelbrock, C. (1981). Behavioral problems and competencies reported by parents of normal and disturbed children aged four to sixteen. Monographs of the Society for Research in Child Development, 46(Serial No. 188).

Achenbach, T. M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist and Revised Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.

American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders, (4th ed.). Washington, DC: Author.

Barkley, R. A., DuPaul, G. J., & McMurray, M. B. (1991). Attention deficit disorder with and without hyperactivity: Clinical response to three dose levels of methylphenidate. Pediatrics, 87, 519-531.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238-246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Berg, I., Fombonne, E., McGuire, R., & Verhulst, F. (1997). A cross cultural comparison of French and Dutch disturbed children using the Child Behaviour Checklist (CBCL). European Child & Adolescent Psychiatry, 6, 7-11.

Burns, G. L., Walch, J. A., Patterson, D. R., Holte, C. S., Sommers-Flanagan, R., & Parker, C. M. (1997). Internal validity of the disruptive behavior disorder symptoms: Implications from parent ratings for a dimensional approach to symptom validity. Journal of Abnormal Child Psychology, 25, 307-319.

Cole, P. M., & Zahn-Waxler, C. (1992). Emotional dysregulation in disruptive behavior disorders. In D. Cicchetti & S. L. Toth (Eds.), Rochester symposium on developmental psychopathology Vol. 4: Developmental perspectives on depression. New York: University of Rochester Press.

Dedrick, R. F., Greenbaum, P. E., Friedman, R. M., Wetherington, C. M., & Knoff, H. M. (1997). Testing the structure of the Child Behavior Checklist/4-18 using confirmatory factor analysis. Educational and Psychological Measurement, 57, 306-313.

DeGroot, A., Koot, H. M., & Verhulst, F. C. (1994). Cross-cultural generalizability of the Child Behavior Checklist cross-informant syndromes. Psychological Assessment, 6, 225-230.

Doepfner, M., Schmeck, K., Berner, W., Lehmkuhl, G., & Poustka, F. (1994). Zur Reliabilitaet und faktoriellen Validitaet der Child Behavior Checklist–eine Analyse in einer klinischen und einer Feldstichprobe. Zeitschrift fuer Kinder- und Jugendpsychiatrie, 22, 189-205.

Drotar, D., Stein, R. E. K., & Perrin, E. C. (1995). Methodological issues in using the Child Behavior Checklist and its related instruments in clinical child psychology research. Journal of Clinical Child Psychology, 24, 184-192.

DuPaul, G. J., Anastopoulos, A. D., Power, T. J., Reid, R., McGoey, K. E., & Ikeda, M. J. (1998). Parent ratings of ADHD symptoms: Factor structure, normative data, and psychometric properties. Journal of Psychopathology and Behavioral Assessment, 20, 83-102.

Edelbrock, C. S. (1988). The Child Attention Profile. Unpublished manuscript.

Edelbrock, C., & Costello, A. J. (1988). Convergence between statistically derived behavior problem syndromes and child psychiatric diagnoses. Journal of Abnormal Child Psychology, 16, 219-231.

Feinstein, A. R. (1967). Clinical judgement. Huntington, NY: Krieger.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299.

Frick, P. J., O’Brien, B. S., Wootton, J. M., & McBurnett, K. (1994). Psychopathy and conduct problems in children. Journal of Abnormal Psychology, 103, 700-707.

Gomez, R., Harvey, J., Quick, C., Sharer, I., & Harris, G. (1999). DSM-IV AD/HD: Confirmatory factor models, prevalence, and gender and age differences based on parent and teacher ratings of Australian primary school children. Journal of Child Psychology and Psychiatry, 40, 265-274.

Hensley, V. R. (1988). Australian normative study of the Achenbach Child Behaviour Checklist. Australian Psychologist, 23, 371-382.

Hull, J. G., Lehn, D. A., & Tedlie, J. C. (1991). A general approach to testing multifaceted personality constructs. Journal of Personality and Social Psychology, 61, 932-945.

Joreskog, K. G. (1990). New developments in LISREL: Analysis of ordinal variables using polychoric correlations and weighted least squares. Quality and Quantity, 24, 387-404.

Joreskog, K. G., & Sorbom, D. (1994). LISREL 8 user’s reference guide. Chicago: Scientific Software International.

Macmann, G. M., Barnett, D. W., & Lopez, E. J. (1993). The Child Behavior Checklist/4-18 and related materials: Reliability and validity of syndromal assessment. School Psychology Review, 22, 322-333.

Marsh, H. W., Balla, J. R., & Hau, K. (1996). An evaluation of incremental fit indices: A clarification of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 315-353). Mahwah, NJ: Lawrence Erlbaum.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.

Mezzich, J. E., & Mezzich, A. C. (1987). Diagnostic classification systems in child psychopathology. In C. L. Frame & J. L. Matson (Eds.), Handbook of assessment in childhood psychopathology. New York: Plenum Press.

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443-460.

Rigdon, E. E., & Ferguson, C. E. (1991). The performance of the polychoric correlation coefficient and selected fitting functions in confirmatory factor analysis with ordinal data. Journal of Marketing Research, 28, 491-497.

Patterson, G. R. (1982). A social learning approach, Vol. 3: Coercive family process. Eugene, OR: Castalia.

Steiger, J. H., & Lind, J. C. (1980, May). Statistically-based tests for the number of common factors. Paper presented at the annual meeting of the Psychonomic Society, Iowa City, IA.

Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Weisz, J. R., Sigman, M., Weiss, B., & Mosk, J. (1993). Parent reports of behavioral and emotional problems among children in Kenya, Thailand, and the United States. Child Development, 64, 98-109.

COPYRIGHT 2000 Plenum Publishing Corporation

COPYRIGHT 2001 Gale Group