Assessment of complex sentence production in a narrative context

Gummersall, Dawn M

ABSTRACT: This study focused on eliciting and assessing complex sentence structure in a meaningful discourse context. The effect of clinician support (modeling of specific structures and practice with the structures) on the subsequent use of complex sentence structures in story retelling was studied. Experiment 1 was conducted with students demonstrating language abilities within normal limits. Because these initial results were promising, Experiment 2 was conducted with students demonstrating language impairments. The results of these experiments indicated, first, that amount of exposure to the stimulus is a critical variable affecting length and syntactic complexity in story retelling. Evidence further supported the use of modeling and practice of specific structures when assessing students’ syntactic skills. The assessment protocol used in these experiments was found to be useful for eliciting a large number and variety of complex syntactic structures in a meaningful context from children with and without language impairments.

KEY WORDS: syntax assessment, narrative, dynamic assessment, school-age children, language disorder

The comprehension of complex sentences is particularly important for classroom success and social development (Strong & Strong, 1999). As children progress through school, both teacher talk and textbook language become more complex (e.g., Nelson, 1988). In addition, children must be able to use complex sentence structure to communicate with increasing specificity. As Scott (1988a) stated, “The importance of complex language to the child cannot be overstated. Adult language consists of intricate weavings of meaning relationships, and these can never be adequately expressed in simple sentences” (p. 59).

Given the importance of complex sentence-structure skills to children’s academic and social development, speech-language pathologists need to identify children having difficulty with syntax so that they can provide effective intervention. However, both conversational language sampling and standardized testing may not provide evidence of the breadth and depth of a child’s syntactic skills. The range of structures that contribute to linguistic complexity, the vulnerability of these forms to contextual constraints, and the low frequency of occurrence of some structures present a considerable assessment challenge (Scott & Stokes, 1995).

Standardized testing is often criticized for fragmenting our understanding of a child’s abilities (e.g., Jitendra & Kameenui, 1993). Typically, such measures fail not only to provide information regarding the child’s readiness to learn new syntactic structures, but also to identify the types of clinical support that may be most helpful in achieving client change.

Standardized tests currently available for evaluating syntactic complexity include tasks such as sentence combining (e.g., Test of Language Development: Intermediate, second edition [TOLD-2, Hammill & Newcomer, 1988]); sentence generation, given a word (e.g., Clinical Evaluation of Language Fundamentals, third edition [CELF-3, Semel, Wiig, & Secord, 1994]); sentence assembly or ordering (e.g., CELF-3, Semel et al.); sentence completion or clozure (e.g., Test of Language Development: Primary, second edition [TOLD-P:2, Newcomer & Hammill, 1991]); sentence correction (e.g., Evaluating Communicative Competence [ECC, Simon, 1984]); nonimitative elicitation (e.g., Clark-Madison Test of Oral Language, Clark & Madison, 1981); and sentence imitation (e.g., TOLD-P:2, Newcomer & Hammill). For almost all of these published measures, children are asked to produce unrelated sentences in the absence of any meaningful context.

An exception to this pattern is a subtest of ECC (Simon, 1984), in which syntax is assessed in a storytelling context through responses to two tasks: first, to tell a story from a set of pictures; and second, to paraphrase a story told by the examiner about the same pictures. Simon’s stated objective for this subtest was to measure the extent of improvement following a model. Although most of the sentences presented in the modeled stories (eight sentences per story) are syntactically complex, there are a limited number and range of structures. Moreover, only mean length of utterance (MLU) is used for comparing performance on the two tasks. The analysis of specific syntactic structures is not addressed.

There is, then, a lack of information regarding the effects of clinician support (e.g., modeling and practice of specific structures) on the subsequent use of complex sentence structures in story retelling. To address the research problem, a complex sentence-structure assessment protocol was developed in which syntactic complexity was assessed in a meaningful discourse context (story retelling).

Experiment 1, conducted with students with normal language, included three conditions. For Condition 1, children listened to the story, then imitated each sentence, then retold the story. For Condition 2, children listened to the story, then listened again with pause time after each sentence (without imitating), then retold the story. For Condition 3, children listened to the story and then retold it. For Experiment 2, Condition 1 procedures were used with students exhibiting language impairments. We reasoned that if complex sentence imitation in a meaningful discourse context was found to enhance the frequency and variety of complex sentences used during story retelling, speech-language pathologists might have an efficient means of assessing complex syntax as well as a research-based approach for providing clinical support, namely, modeling and practice.


Based on the theoretical foundation laid by Vygotsky (1978), contemporary examiners have turned to dynamic- or interactive-assessment approaches to gain more useful evaluation information (e.g., Pena, Quinn, & Iglesias, 1992). When using this approach, the subject is viewed as a learner and active participant. The examiner not only assesses but also intervenes by providing a prompt-rich environment, seeking to estimate the extent to which positive change can be induced in the learner’s performance (e.g., Lidz, 1987; Olswang & Bain, 1996).

Lidz (1987) suggested that dynamic assessment should supplement other assessment approaches by beginning at the point where the child fails to independently use strategies to modify his or her performance. One model of dynamic assessment is called test-train-test assessment (e.g., Budoff & Gottlieb, 1976). Although Budoff and his colleagues focused on aspects of cognition assessment, this model also has relevance for language assessment. However, little is known concerning the types of training that are most clinically effective.

Typically, a prompt-free approach has been used in narrative-retelling tasks. The child listens to and views a story and then retells the story without picture support. A speech-language clinician then analyzes all aspects of language production (including syntactic complexity) from a transcript of the child’s story retelling (Hughes, McGillivray, & Schmidek, 1997; Strong, 1998; Strong & Shaver, 1991). However, we hypothesized that a complex sentence-imitation task plus picture cues could act as the training component of the narrative testing procedure by providing a model for the child to imitate. With this training, children could rehearse the story’s content, syntax, and sequence prior to retelling.

The limitations of using sentence imitation (elicited imitation) as an assessment procedure are well-documented (e.g., Fujiki & Brinton, 1987; Prutting & Connolly, 1976; Scott, 1988a). We emphasize that sentence-by-sentence imitation was used only as a training component in our research design, not as an assessment procedure. We asked whether complex sentence imitation, where the sentences to be imitated had meaningful relationships and were supported by a story’s pictures, would enhance complex syntax production during story retelling.


The analysis of students’ complex sentence structure while narrating provides valuable information regarding syntactic production. Research summarized by Milosky (1987) showed utterances to be longer and more complex in narration than in conversation. As children develop grammatical complexity, they can construct meaning relationships in narration more concisely and intricately. For example, adverbial subordinate clauses occur as children describe cause-and-effect relationships for story events; relative clauses occur as children describe characters; and nominal clauses occur as children relate story dialogue and use main verbs such as “thought” and “knew” to describe characters’ internal responses. The chronology of the story and locative detail are recounted through embedded prepositional phrases and adjectives. As children talk about characters’ actions and consequences, complex verb phrases are needed.

Research findings indicate that story-retelling tasks result in longer stories than story-generation tasks (Merritt & Liles, 1989). Additionally, Westby (as cited in Crais & Lorch, 1994) noted that highly structured stimuli, such as wordless picture books and films, result in more complex stories. Research findings also indicate that the linguistic complexity with which children retell stories is related to the linguistic complexity of the story they have read or heard. After eliciting language samples from children, Holloway (1986) had them retell stories under two conditions: (a) after having read aloud from a basal text and (b) after having read aloud from a text in which the language was matched for length and complexity to their own. She found that the language produced in the basal story retelling was less complex than that produced following the story that was matched for the children’s linguistic level. Evidence indicates, then, that in order to elicit more complex oral language, stimulus material must contain a level of linguistic complexity that is similar to or more advanced than the child’s own.

Two standardized tests for assessing narrative production are the Story Construction subtest of the Detroit Tests of Learning Aptitude (DTLA-3, Hammill, 1991) and DeAvila and Duncan’s Oral Production subtest of the Language Assessment Scales (as cited in Crais & Lorch, 1994). Both require students to produce stories (in the absence of a model) about selected topics, and obtained stories are scored for thematic content without specific attention to syntactic complexity. Crais and Lorch expressed concerns about the specificity of these measures for revealing distinct language-production difficulties.

One standardized procedure for assessing narrative production for which field test data are available is the Strong Narrative Assessment Procedure (SNAP, Strong, 1998). After listening to an audiotape and looking at a corresponding wordless picture book, students retell the story without picture support to a naive listener. Scoring does include specific attention to syntactic complexity; however, only neutral prompts are given to support the story retelling.

Attempts to include dynamic-assessment procedures in the assessment of narrative abilities have been made. For example, Gutierrez-Clellen and Quinn (1993) suggested a two-step procedure. First, the examiner collects spontaneous narrative samples in varying contexts and analyzes the child’s performance from these samples to determine whether the child used the rules appropriate to each narrative context. Then the examiner provides mediation by describing the contextual rules for different narrative situations. For example, the contextual rules for storytelling in school and test situations might be described as “talking like a book.” The child then practices “talking like a book” with verbal cues and modeling from the mediator. The frequency of cues and the child’s responses are recorded, and the mediated performance is compared to the prompt-free performance.


Syntactic complexity develops at the clause level by coordination or subordination. When coordinating or linking clauses, the clauses are related semantically and the syntactic status of both clauses is equal, as in, “He leaped out of the carrier, and he jumped on the nurse’s cap.” In subordination, clauses are embedded into or attached to a main clause (Scott, 1988a). There are three major types of clause subordination (adverbial, nominal, and relative):

Adverbial clause: So, while the girls snoozed in their sleeping bags, Fred got into the pizzas.

Nominal clause: Tori told her parents what had happened.

Relative clause: Once there was a frog whose name was Fred.

In the preschool and early elementary years, adverbial clauses of time (when) and reason (because) predominate. Nominal clauses used include that and wh (e.g., what, where) nominals. In this early period of syntactic development, relative clauses appear in the predicate (right branching) of sentences (Scott, 1988a), as in, “The vet gave Fred some pink medicine that was really yucky.”

Toward the later elementary years, use of subordination expands to include consequence (therefore), concession (though, even, if, unless), and manner (as). Also, the embedding of elements begins to occur in the subject (left branching) position, with nominal clauses as subjects and center-embedded relative clauses emerging (Scott & Stokes, 1995). Left-branching subordination tends to be later developing, likely because it disrupts the subject-verb-object format that is found in early developing syntax (Owens, 1992).

Nonfinite nominals (nominal clauses consisting of nonfinite elements such as the ing participle, ed participle, or infinitive; Leech & Svartvik [1975]) are also characteristic of later development, as in, “The girls saw Fred gobbling all the fries;” “Tori had a frog named Fred;” and “How to get some fries was all he could think about.” Other later-developing relative pronouns (whose, which, in which) appear. In addition, relative pronouns may be deleted, though understood, as in, “Tori brought the pink medicine [that] Fred hated.”

Loban (1976) reported that approximately 20% to 30% of sentences spoken by a 9-year-old contain a subordinate clause. Nominal, adverbial, and relative clauses are the major clause types, accounting for more than 90% of all subordination (Scott, 1988b). Nominal and adverbial clauses together account for 80% of the subordination, with each being nearly equally represented (Scott, 1988b). In narratives, relative clauses occur with 6% to 20% frequency, and in conversation, with 24% to 34% frequency (Scott, 1988b).


The research findings reviewed support the use of a narrative context for assessing children’s production of complex sentences as well as the use of clinician support during assessment to enhance children’s language production. Few studies were located in which authors used such strategies in narrative-retelling assessment tasks. The question not addressed in prior studies was whether clinician support in the form of sentence imitation plus picture cues enhances the syntactic complexity of children’s subsequent retelling of a story.


Purpose and Design

Three experimental conditions were defined, and subjects were randomly assigned to one of these conditions to form three groups:

Condition 1: Children (a) listen to and view a story, (b) listen to each sentence of the story and then imitate each with the pictures in view, and (c) retell the story in the absence of the pictures.

Condition 2: Children (a) listen to and view the story, (b) listen to each sentence of the story with the pictures in view (without imitating), and (c) retell the story in the absence of the pictures.

Condition 3: Children (a) listen to and view the story and (b) retell the story in the absence of the pictures.

Condition 1 was the assessment protocol of interest. That is, a sentence-imitation task was included to act as a training component. By interjecting a sentence-by-sentence imitation of the narrative after listening to the story and before story retelling, the child became an active participant and was able to practice the targeted language. Children were not given any indication of the correctness of their imitated responses; only general feedback (“You’re doing a good job”) was periodically provided to keep children on task.

Condition 2 was included as a control for the amount of story exposure. The child was exposed to the story twice prior to retelling it, as in Condition 1. However, the child did not imitate, but only listened during the sentence-by-sentence presentation, with a 4- to 5-second pause following each sentence.

Condition 3 was the traditional narrative assessment paradigm (no additional practice or listening) and was included as the control condition for comparison. The child listened to and viewed the story and then was asked to retell the story. The researchers reasoned that if no statistically or educationally significant differences existed between the mean syntactic complexity scores for Conditions 2 and 3, additional exposure to the story (in Conditions 1 and 2) could be eliminated with some confidence as the salient variable affecting performance, and the obtained mean scores for Conditions 1 and 3 could be compared with greater confidence. On the other hand, if the mean scores for Conditions 1 and 2 were quite similar, but those for Condition 3 were statistically or educationally significantly lower, then exposure to the story was an important variable and the comparison of performance between Conditions 1 and 2 was of interest.

The test-train-test model of dynamic assessment was used to conceptualize the experimental protocol. However, because the researchers did not have access to a parallel version of the stimulus story, subjects could not be pretested using one story, then trained and post-tested using a different, but equivalent, story. Consequently, findings for Condition 3 subjects represent pretest performance in this experimental design. Clearly, the procedures used represent a modified, and somewhat more limited, form of the test-train-test paradigm.


According to research findings, many changes in the use of complex syntactic structures occur in the mid-elementary years and later (Scott, 1988a). Therefore, the target population for the initial study was children 8 or 9 years of age (third grade) who had normal language abilities, as defined by no history of language intervention or special education enrollment, and receptive vocabulary within normal limits as measured by the Peabody Picture Vocabulary Test-Revised (PPVT-R, Dunn & Dunn, 1981). Based on obtained PPVT-R standard scores, each of 30 children was randomly assigned to one of three conditions (resulting in 10 children per condition). Specifically, the standard scores were ranked from high to low. The first three children were randomly assigned from the ordered ranking to one of the three assessment conditions to help ensure that receptive vocabulary scores were similar for the three conditions. Then the children obtaining the next highest three scores were randomly assigned to conditions, and so on. Descriptive statistics for age and PPVT-R scores are shown in Table 1.
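
The ranked, blocked assignment described above can be sketched as follows. This is a minimal illustration only: the function name `blocked_assignment`, the seed, and the standard scores are hypothetical and are not the study's data or software.

```python
import random


def blocked_assignment(scores, n_conditions=3, seed=None):
    """Rank children by score (high to low), then randomly assign the
    members of each consecutive block of n_conditions children to
    distinct conditions."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked), n_conditions):
        block = ranked[start:start + n_conditions]
        conditions = list(range(1, n_conditions + 1))
        rng.shuffle(conditions)  # random order of conditions within the block
        for child, condition in zip(block, conditions):
            assignment[child] = condition
    return assignment


# Hypothetical standard scores for 30 children (illustrative values only).
scores = {f"child_{i:02d}": 85 + i for i in range(30)}
assignment = blocked_assignment(scores, seed=1)
```

Because every block of three adjacent ranks contributes one child to each condition, the three groups end up with 10 children each and with similar score distributions.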


The researcher met initially with each of the children, in random order, to administer the PPVT-R. The researcher then met with each child a second time to administer one of the three narrative-assessment conditions. For each condition, the children were given standard instructions. All testing took place within a 3-week period. All responses to the sentence-imitation and story-retelling tasks were tape recorded to allow for later transcription and analysis.

Stimulus Materials

The stimulus story for the three conditions was Junk-Food Frog (Strong & Strong, 1995). This story, which is 46 sentences long (52 T-units, 486 words, 9.3 words per T-unit), consists of 40 complex and six compound sentences (Appendix). Each subordinate clause structure occurs twice in the story, once in the left-branching sentence position (before the main clause verb) and once in the right-branching sentence position (after the main clause verb). Coordinate clause structures (and, but, so) occur twice as well. The selection of the specific syntactic structures was based on developmental data from various sources (e.g., Scott, 1988a, 1988b; Scott & Stokes, 1995), thus strengthening the measure’s validity. Twenty-three original picture plates illustrate the story. The story was narrated by a professional narrator (male) and tape recorded in a recording studio.

Data Coding

A randomly selected identification number was placed on each child’s audiotape and the child’s name obscured, thus controlling for possible coder-expectancy effects and ensuring confidentiality. The tapes were placed in random order and transcribed (using standard English spelling) by the first researcher.

The second researcher also independently transcribed nine (30%) randomly selected tapes (three from each condition). For 85.5% of the transcribed sentences in the nine stories, the transcriptions of the first and second researchers were identical. For the other 14.5% of the sentences, there were minor differences that would have affected the scoring of one or more of the dependent measures. The two researchers then replayed and reviewed tapes for which any differences had occurred and discussed the transcription until agreement was reached.
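
A simple item-by-item agreement percentage of the kind reported above might be computed as in this sketch. The function name `percent_agreement` and the sample transcriptions are hypothetical, and the comparison assumes that both transcribers' sentences have already been aligned one to one.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of aligned items on which two coders are identical."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must have the same number of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical aligned transcriptions of four sentences.
first = ["the frog ate fries", "he got sick", "tori called the vet", "the end"]
second = ["the frog ate fries", "he got sick", "tori called a vet", "the end"]
agreement = percent_agreement(first, second)
```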

Transcripts of the story retellings were segmented into T-units (minimal terminable units) (Hunt, 1965). A T-unit is “one main clause with all the subordinate clauses attached to it. The number of subordinate clauses can, of course, be none” (Hunt, p. 20). All main clauses beginning with a coordinating conjunction were counted as separate T-units (Scott, 1988a). The second researcher checked the segmentation of all transcripts. The researchers reviewed any discrepancy in segmentation until agreement was reached.

There were two dependent measures of length (number of words and number of T-units) and four dependent measures of syntax: (a) words per T-unit, (b) number of subordinate clauses produced correctly (incorrect production was any variation that did not yield an acceptable complex sentence according to standard American English, as in, “The vet gave Fred some pink medicine what was very gross”), (c) the subordination index (average number of clauses per T-unit), and (d) number of different subordinate clauses produced correctly (“different” clauses were defined as use of different subordinate clause connectors, such as which or while). Words were counted using the rules outlined by Strong and Shaver (1991).
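
The six measures can be illustrated with a short sketch. It assumes that each T-unit has already been segmented and hand-coded, and that the subordination index counts one main clause per T-unit plus its correctly produced subordinate clauses. The `TUnit` class and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TUnit:
    words: int
    # Connectors of the correctly produced subordinate clauses in this T-unit.
    subordinate_connectors: list = field(default_factory=list)


def syntax_measures(t_units):
    """Compute the two length measures and four syntax measures."""
    n_t_units = len(t_units)
    n_words = sum(t.words for t in t_units)
    connectors = [c for t in t_units for c in t.subordinate_connectors]
    n_subordinate = len(connectors)
    return {
        "words": n_words,
        "t_units": n_t_units,
        "words_per_t_unit": n_words / n_t_units,
        "subordinate_clauses": n_subordinate,
        # Assumption: clauses = one main clause per T-unit + subordinate clauses.
        "subordination_index": (n_t_units + n_subordinate) / n_t_units,
        "different_subordinate_clauses": len(set(connectors)),
    }


# Hypothetical retelling of three T-units.
retelling = [TUnit(10, ["while"]), TUnit(8), TUnit(12, ["that", "while"])]
measures = syntax_measures(retelling)
```

In this example the retelling has 30 words in 3 T-units, so words per T-unit is 10.0; three subordinate clauses give a subordination index of 2.0, and two different connectors are counted.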

For all dependent measures, intercoder agreement was calculated, using point-to-point rater agreement checks (McReynolds & Kearns, 1983). Agreement ranged from 99.2% to 100%. The two researchers reviewed transcripts for which any differences occurred and resolved any disagreements.

Reliability of the Scores

In classic measurement theory (e.g., Nunnally, 1978), random error in measurement reduces the repeatability of assessments, and so their reliability. Three sources of random error in measurement are typically of concern: inconsistency in coding, elements of the test, and variations in the trait being assessed (Nunnally). As discussed, intercoder agreement was assessed and was at acceptable levels. The next question was how to estimate the reliability of the obtained scores as affected by any remaining sources of measurement error.

For a previous study of the stability of cohesion scores from children’s spoken narratives, Strong and Shaver (1991) used methods that are usually applied to paper-and-pencil measures to provide reasonable approximations. These methods include the test-retest method, internal analysis method, and parallel forms method (Lord & Novick, 1968). The test-retest and the parallel forms methods were rejected: the former because of the practice effects of retelling the same story and the latter because a parallel form was not available. The internal analysis method was selected, and Pearson product-moment correlation coefficients were calculated for all 30 subjects pooled, between scores for the first half and scores for the second half of the transcripts (separated by number of T-units into two halves). Whenever the total number of T-units was an odd number, the extra T-unit was included in the second half. The correlations between scores for the two halves of each retelling were corrected with the Spearman-Brown prophecy formula, which provided estimates of reliability coefficients for samples of the length obtained.
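
A split-half estimate of this kind can be sketched as below. The sketch assumes the standard Spearman-Brown correction for doubling test length, r_full = 2r / (1 + r); the function names and the sample data are hypothetical.

```python
import math


def split_halves(t_unit_scores):
    """Split per-T-unit scores into two halves; with an odd number of
    T-units, the extra T-unit goes to the second half."""
    mid = len(t_unit_scores) // 2
    return t_unit_scores[:mid], t_unit_scores[mid:]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-length correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical words-per-T-unit counts for one child's retelling.
first_half, second_half = split_halves([9, 7, 12, 8, 10])

# Hypothetical half-scores (e.g., number of words) for five children.
first_scores = [40, 55, 62, 48, 70]
second_scores = [44, 50, 66, 45, 75]
corrected = spearman_brown(pearson_r(first_scores, second_scores))
```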

The corrected reliability coefficients for number of words and number of subordinate clauses were large (r2 = .98 and .92, respectively). Corrected coefficients for words per T-unit and for the subordination index were moderate (r2 = .68 and .77, respectively). Although the low reliability of scores can obscure true differences between/among means, coefficients of .70 are, nevertheless, typically of sufficient magnitude for research purposes (Nunnally, 1978). Given the above moderate coefficients, these syntax scores were judged to be of adequate reliability for research purposes. The corrected coefficient for number of different subordinate clauses produced correctly was small (r2 = .39). Given the low magnitude, findings for this measure were interpreted with caution.

Reliability coefficients were not calculated for the specific subordinate clause structure scores. The frequency of occurrence of the individual subordinate clause structures was low. Because reliability tends to be a function of the number of items in a test (Nunnally, 1978), it is likely that the reliability coefficients for the scores used for these analyses were low. Consequently, findings from analyses for these structures should be interpreted with caution.

Data Analyses

Differences among the mean length and syntax scores for the three conditions were analyzed. Words per T-unit and clauses per T-unit are syntax metrics that are functions of story length (number of words divided by number of T-units and number of clauses divided by number of T-units, respectively).

Descriptive statistics were computed for each of the three conditions for the two story-length and four syntax measures. The practical significance of the results was assessed by computing standardized mean differences ($\mathrm{SMD} = (M_1 - M_2)/SD$) for each mean contrast for each dependent measure. Cohen’s (1988, pp. 25-27) standards of .2 as a small effect size, .5 as a medium effect size, and .8 as a large effect size were used as arbitrary though reasonable criteria to judge the magnitude of the SMDs.
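
As an illustration, a standardized mean difference and its Cohen label might be computed as follows. The text does not specify which standard deviation serves as the denominator; this sketch assumes the pooled within-group standard deviation, and the function names and scores are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(group_1, group_2):
    """(M1 - M2) / SD, using the pooled standard deviation (an assumption)."""
    m1, m2 = statistics.mean(group_1), statistics.mean(group_2)
    v1, v2 = statistics.variance(group_1), statistics.variance(group_2)
    n1, n2 = len(group_1), len(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


def cohen_label(smd):
    """Label the magnitude of an SMD using Cohen's .2/.5/.8 benchmarks."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "trivial"


# Hypothetical scores for two small groups.
smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
label = cohen_label(smd)
```

Here the means are 4 and 3 and the pooled standard deviation is 2, so the SMD is 0.5, a medium effect by these benchmarks.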

One-way analyses of variance (ANOVAs) were used to test the statistical significance among the means on the dependent measures for the main effect of condition membership. For any statistically significant main effect, the Tukey multiple-comparison technique was used to test each contrast of means for statistical significance (Hopkins & Anderson, 1973).
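
The F ratio for a one-way ANOVA can be computed by hand, as in the sketch below. The function name `one_way_anova_f` and the scores are hypothetical, and the sketch covers only the omnibus F ratio, not the p value or the Tukey follow-up contrasts.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: squared deviations from each group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three small groups.
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For these values both sums of squares equal 6, so the mean squares are 3 and 1 and the F ratio is 3.0 on 2 and 6 degrees of freedom.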

In supplementary analyses for the three conditions, the number of subjects who used at least one adverbial, nominal, and relative clause was compared. Also, bar charts were constructed to display the findings for the mean number of different subordinate connectors used and for the mean number of left- and right-branching subordinate clauses used.


Condition-Membership Differences

Length. Statistically significant differences among the means for the three conditions (Table 2) were found for both number of words [F(2, 29) = 7.94, p = .002] and number of T-units [F(2, 29) = 5.77, p = .008]. Post-hoc analyses revealed statistically significant differences for the mean contrasts for Condition 1 versus 3 and for Condition 2 versus 3, but not for the mean contrasts for Condition 1 versus 2. That is, the length means for Conditions 1 and 2 were statistically significantly larger than those obtained for Condition 3.

Syntax. Statistically significant differences among the means for the three conditions (Table 2) were obtained for words per T-unit [F(2, 29) = 4.68, p = .007], number of subordinate clauses produced correctly [F(2, 29) = 6.52, p = .005], subordination index [F(2, 29) = 4.96, p = .015], and number of different subordinate clauses produced correctly [F(2, 29) = 6.06, p = .007]. The latter finding was surprising, given the low reliability coefficient (r = .39) for that measure. Even with scores that lacked reliability, the difference was large enough to be statistically significant.

Post-hoc analyses revealed that for words per T-unit and number of subordinate clauses produced correctly, statistically significant differences were obtained for the mean contrasts for Condition 1 versus 3 and for Condition 2 versus 3, but not for the mean contrasts for Condition 1 versus 2, as was the case with the length measures. For the subordination index and for number of different subordinate clauses produced correctly, statistically significant differences were obtained between the mean scores for Conditions 1 and 3 only.

To address the practical significance of the results, SMDs for each mean contrast were computed for each measure (Table 2). All mean contrasts for Conditions 1 and 3 and for Conditions 2 and 3 yielded large and clinically important effect sizes by Cohen’s (1988) standards. Conversely, trivial to medium SMDs were obtained for the mean contrasts for Conditions 1 and 2.

Supplementary Analyses

Story-retelling error analysis. Only five subordinate clauses were produced incorrectly during story retelling by the students with normal language:

1. Incorrect adverbial connector (though/because): Though it smelled good to Fred, he thinked how to get out.

2. Incorrect adverbial connector (although/because): And although they were hungry, they ordered some pizza.

3. Incorrect nominal connector (how/that): And when they were eating outside, when Dad was serving burgers and fries, he peeked up to the window and wished how he could have all the fries.

4. Incorrect relative connector (what/that): The vet gave Fred some pink medicine what was very gross and told him not to eat as much junk food.

5. Abandoned adverbial clause (lacked a main clause): And when everybody turned their heads to see Tori’s mom bringing salad.

Errors were produced by subjects in Conditions 1 and 2; no errors were produced by subjects in Condition 3.

Differences Between Conditions 1 and 2 for Specific Subordinate Structures

Because the amount of exposure appeared to enhance children’s performance, we questioned whether the exposure created through sentence-by-sentence imitation (Condition 1) resulted in different performance than exposure created through sentence-by-sentence listening with added pause time (Condition 2). One-tailed t tests for independent means (p

For the numbers of adverbial, nominal, and relative clauses used, statistically significant differences between the means were not obtained (Table 3). However, medium effect sizes (SMDs = .6) were obtained for the numbers of nominal and relative clauses; the effect size was small for the number of adverbial clauses.

Statistical significance in a positive direction (i.e., Condition 1 means were higher than Condition 2 means) was obtained for the number of subjects using one or more adverbial clauses with although, while, and because; nominal clauses with that; and relative clauses with where (Table 4). Statistical significance in a negative direction (i.e., Condition 2 means were higher than Condition 1 means) was obtained for the number of subjects using one or more adverbial clauses with since and until. All effect sizes for these contrasts were large (.8 or greater). Although the mean differences were not statistically significant for the number of subjects using one or more subordinate clauses with when, after, as [adj.] as, even though, for [noun] to, whenever, and which, medium effect sizes were obtained (Table 4). For the number of subjects using one or more subordinate clauses with if, so that, before, unless, as, no connector, how to, what, that, and who, the mean differences were not statistically significant and the effect sizes were nil to small.

Comparisons Among the Three Conditions for Specific Syntactic Structures

The number of subjects who used at least one adverbial, nominal, and relative clause was determined. All subjects in Conditions 1 and 2, and nine subjects in Condition 3, produced at least one adverbial clause. Nine subjects in Conditions 1 and 2, and eight subjects in Condition 3, used at least one nominal clause. Eight subjects in Condition 1, six subjects in Condition 2, and three subjects in Condition 3 used at least one relative clause. These results were consistent with Scott’s (1988b) findings regarding the relative use of different clause types.

To examine the variety of subordinators used, the mean number of different subordinate clause connectors used in each condition was calculated (Figure 1). Condition 1 subjects produced the highest mean number of different connectors (the greatest variety) for all subordinate clause types. Condition 3 subjects produced the lowest mean number (the smallest variety) of different connectors.

Figure 2 was constructed to examine left- and right-branching of subordinate clause types by condition. Conditions 1 and 2 resulted in a higher mean number of left-branching subordinate clauses than Condition 3, particularly for adverbial and nominal clauses. Children produced a slightly higher mean number of left-branching clauses in Condition 1 than in Condition 2.


Overall, statistically and educationally significant differences were found between the means for Conditions 1 and 3 and between the means for Conditions 2 and 3. The mean scores for Conditions 1 and 2 were similar; those for Condition 3 were significantly lower. Conditions 1 and 2 resulted in higher incidences of subordination overall, a greater variety of subordinate clause connectors, and more left-branching clauses than Condition 3. The shared variable in Conditions 1 and 2, which differed from Condition 3, was amount of exposure.

When comparing Conditions 1 and 2, it was found that Condition 1 resulted in a greater variety of clause connectors and in more left-branching subordination than Condition 2. These findings indicate that clinical support (modeling and practice) had a positive effect above and beyond amount of exposure.


Purpose and Design

Experiment 2 explored the usefulness of the protocol for assessing complex syntax in students with language impairments. Because Condition 1 students from Experiment 1 had produced the largest variety and the most left-branching subordinate structures, we selected Condition 1 as the testing condition for Experiment 2. Based on findings from a prior study of story-retelling skills, in which students were not provided with additional clinical support (Strong, 1998), the researchers speculated that the students with language impairments (under Condition 1 testing circumstances) would retell the story using sentences that were less complex than those of Condition 1 students (with normal language) but more complex than those of Condition 3 students (with normal language).

Four students in the communicative disorders department served as research assistants. The second author provided instruction and practice on narrative assessment after the students had completed two senior-level classes on language assessment and intervention. The written and oral instructions focused on eliciting narrative samples using Condition 1 procedures and on transcribing and segmenting narrative samples. The audiotape, picture book, and standard instructions remained constant across all subjects as in Condition 1 for Experiment 1.


Twelve students with documented language impairments who were receiving language intervention services in the public schools were the subjects. Each student met the criterion for identification of students with language impairments as mandated by the Utah State Office of Education. That is, the student was considered language impaired if he or she performed at least 1 SD below the mean on two or more measures of oral expression or listening comprehension in one or more of three areas: morphology, syntax, and semantics. Diagnostic data had been collected by school district speech-language pathologists and were available in the school records.

These 12 students (two 8-year-olds, five 9-year-olds, four 10-year-olds, and one 11-year-old) had mild, moderate, or severe language impairments. All had IQ scores of 85 or better on a standardized measure (scores were available in the school district records) and they were not classified as intellectually handicapped, emotionally disturbed, or behaviorally disordered, although support services could be provided by learning-disabilities specialists. All had normal vision, hearing, and speech intelligibility as well as no history of organic disorders. Their diagnosis of language impairment was not attributable to cultural differences. The students’ mean age in months was 118.4 (SD = 12.0, median = 115, range = 103-139). Their PPVT-R mean score was 80 (SD = 15, median = 78, range = 42-101).


The story retellings were elicited, transcribed, segmented, and coded as described previously for Condition 1. All transcripts were reviewed by the second author to check for transcription and segmentation accuracy and the coding of dependent variables. Any disagreements were reviewed by both the second author and the research assistant who had done the testing and were resolved.

Data Analysis

The findings from the 12 students with language impairments were compared to findings from the 20 students with normal language who had participated in Conditions 1 and 3 from Experiment 1. Because the 12 students had not been matched for age and/or IQ to Experiment 1 students, tests of statistical significance were inappropriate for comparing the findings from the two experiments.

Moreover, the purpose of Experiment 2 was to examine whether the clinician-supported testing condition would elicit a variety of subordinate structures (both left- and right-branching) during story retelling similar to that produced by the students with normal language in Condition 1 of Experiment 1. Consequently, only descriptive statistics and bar charts are used here.


For Condition 1 students (with language impairments), the mean number of words per T-unit was 8.8 (SD = 1.2, median = 8.7, range = 7.1-10.9). The mean number of clauses per T-unit (subordination index) was 1.36 (SD = .2, median = 1.33, range = 1.14-1.72). As anticipated, these means for Condition 1 students (with language impairments) were slightly smaller than the means for Condition 1 students (with normal language) but larger than those obtained for Condition 3 students (with normal language) (see Table 2 for comparison means).
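The two measures reported above are simple ratios over a transcript that has been segmented into T-units. The Python sketch below shows one way to compute them for a single retelling, assuming each T-unit has already been coded with a word count and a clause count. The tuple representation, the function names, and the values are hypothetical and are not taken from the study's transcripts. The group statistics in the text would then presumably be the mean, SD, median, and range of such per-child values.

```python
# Each T-unit is coded as (word_count, clause_count).
# These values are hypothetical and are not study data.
t_units = [(7, 1), (12, 2), (9, 1), (10, 2), (7, 1)]


def words_per_t_unit(units):
    """Mean number of words per T-unit for one transcript."""
    return sum(words for words, _ in units) / len(units)


def subordination_index(units):
    """Mean number of clauses (main plus subordinate) per T-unit."""
    return sum(clauses for _, clauses in units) / len(units)


print(words_per_t_unit(t_units))
print(subordination_index(t_units))
```

Because every T-unit contains one main clause, a subordination index of 1.0 indicates no subordination at all, and values above 1.0 reflect the average number of subordinate clauses added per T-unit.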

All of the students with language impairments used some correct subordinate clauses. The mean number of correct subordinate clauses was 10.5 (SD = 7.2, median = 10, range = 4-27). Thirty-four percent of the 365 T-units produced included clause subordination; of these subordinate clauses, 55% were adverbial, 32% were nominal, and 13% were relative clauses, a finding that is consistent with that reported by Loban (1976) and Scott (1988b). A greater percentage of Condition 1 students (with language impairment and with normal language) used all three clause types than did Condition 3 students (normal language) (Figure 3).

The variety of specific clause types used was also examined. For adverbial clauses, the students with language impairments produced a mean of 4.4 (SD = 2.7, median = 4.5, range = 1-10). The adverbial clause connectors used, listed by frequency of use, were: when, because, if, before, while, so [that], even though, as, unless, and as [adj.] as. For nominal clauses, the mean was 2.0 (SD = 1.1, median = 2, range = 0-4). The nominal clause types, listed by frequency of use, were: no connector (as in dialogue), that, what, and [that] deletion. For relative clauses, the mean was .9 (SD = 1.0, median = 1, range = 0-3). The relative clause types, listed by frequency of use, were: that, who, and whose. The mean numbers of different subordinate clause connectors used correctly for Condition 1 students (with and without language impairments) were larger than the mean numbers for Condition 3 students with normal language (Figure 4).

For left- versus right-branching clause types, Condition 1 students (with language impairments) produced a mean number of left-branching clauses of 3.4 (SD = 4.2, median = 2, range = 0-13); for right-branching clauses, the mean was 7.1 (SD = 4.4, median = 6, range = 2-14). Because the distributions did not approximate a normal curve, medians were used for comparing Conditions 1 and 3 rather than means (Figure 5). For Condition 1 students (with and without language impairments), the medians were larger than the medians for Condition 3 students (with normal language).

Condition 1 students (with language impairments) produced 126 subordinate clauses correctly and only 15 subordinate clauses incorrectly. Examples of errors (based on acceptability in adult standard American English) are listed below by the four error types that occurred.

1. Incorrect subordinator (even that/even though): Even that they stayed clear up to the evening, there still was a lot of pizza left.

2. Addition of unnecessary subordinator: Tori warned Fred that not to eat so much junk food.

3. Addition of unnecessary coordinating conjunction: And Mom gave him some medicine, and while she rubbed his belly.

4. Abandoned adverbial clause (lacked a main clause): When the two girls that liked Fred showed him a funny picture.
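The counts reported in this section can be cross-checked with simple arithmetic. The Python sketch below uses only figures given in the text (12 students, a mean of 10.5 correct subordinate clauses per student, and 126 correct versus 15 incorrect clauses). The accuracy proportion is derived here for illustration and is not a figure reported by the authors; the variable names are likewise assumptions.

```python
students = 12
mean_correct = 10.5          # mean correct subordinate clauses per student
correct, incorrect = 126, 15  # totals reported for the group

# The group total implied by the reported mean should match the reported total.
total_correct_from_mean = students * mean_correct

# Proportion of attempted subordinate clauses that were produced correctly.
accuracy = correct / (correct + incorrect)

print(total_correct_from_mean)  # 126.0
print(f"{accuracy:.1%}")        # 89.4%
```

The reported mean and the reported total agree, and roughly nine of every ten subordinate clauses attempted by the students with language impairments were well formed.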

Overall, Condition 1 students (with language impairments) retold stories with syntactic complexity that was similar to, or slightly less than, that of Condition 1 students (with normal language) but greater than that of Condition 3 students (with normal language).


In Experiment 1 (students with normal language), we found that Conditions 1 and 2 (in which additional story exposure was the variable held in common) resulted in more subordination overall, a greater variety of subordinate structures, and more left-branching subordination than Condition 3 (the traditional listen-retell procedure). Condition 1 required active participation through sentence-by-sentence imitation. Condition 2 required simply that students listen to the story again, sentence-by-sentence, with a brief pause following each sentence. We speculated that although Condition 2 students were not actively imitating, they may have actively processed and silently reconstructed each sentence, thus comprehending and remembering the story better than Condition 3 students.

Inspection of the data from Conditions 1 and 2 (Experiment 1) revealed that Condition 1 held a small advantage over Condition 2. A larger number of Condition 1 students used at least one relative clause and, as a group, Condition 1 students used a greater variety of subordinators and more left-branching subordination.

In Experiment 2, we found that Condition 1 students (with language impairments) used more subordination overall, a greater variety of subordinators, and more left-branching subordination than Condition 3 students (with normal language). The specific information obtained regarding the amount, variety, and position of subordination was particularly useful for planning intervention and for documenting intervention gains.

Because the mean age of Condition 1 students (with language impairments) was 118.4 months (SD = 12) as compared to the mean age of 108.2 months (SD = 3.6) for Condition 3 students (with normal language), the findings reported here should be interpreted with caution. This age difference alone could explain part of the finding that students with language impairments in Condition 1 used more complex syntax than students with normal language in Condition 3. Nonetheless, the finding that a simple enhancement could affect the obtained syntax scores is an important one. This finding may indicate that children with language impairments do not have a competence-based syntax problem for the types of structures measured and that the traditional listen-retell narrative-elicitation task is not particularly sensitive for revealing their syntactic production potential.

Only replication can establish the validity and reliability of the findings from this study. Validity issues that need to be addressed are whether the mean scores reliably discriminate between children who are developing language normally and children who are demonstrating language impairments, and whether scores increase with increases in age. In addition, the findings obtained under these experimental conditions need to be compared to findings from spontaneous use of complex syntax in order to determine if this assessment measures children’s potential to use complex syntactic structures.

Of considerable interest is determining whether differences are observed for the same children for enhanced versus unenhanced narrative-retelling tasks (on different, but parallel and equated stimulus stories, in an order-controlled design). With this information, a measure of the child’s potential to use complex structures, given assistance, would be obtained. Because more is learned about the structures a student is on the threshold of learning, the protocol may be valuable for planning purposes once a child has already been diagnosed with a language impairment.

The nature of the training component also needs to be studied further to determine the most effective and efficient type of clinician support. For example, perhaps merely listening to the story a second time (without pauses or imitation) would enhance complex sentence use during retelling. Also, a cloze procedure, in which the child is required to supply the targeted clause, should be studied. It may be that subjects would process the specific language input to a greater degree than they do during sentence imitation. Finally, presentation of the story in different contexts (e.g., classroom and/or small group) prior to collecting the retelling may be beneficial as well.

In summary, the evidence from this study supported the inclusion of a simple task enhancement when assessing children’s syntactic skills in a narrative context. Imitation of the story, sentence-by-sentence, during a second exposure to the narrative resulted in performance that differed from that of children who did not imitate. A challenge that remains for researchers in this area is determining the most effective and efficient manner in which to provide assistance in assessment. Especially for the assessment of complex syntax, the development of increasingly valid and meaningful assessment procedures is crucial.


The authors wish to acknowledge Kathy Barclay, Marian Carlton, Debbie Theobald, and Elsha Young for their assistance with narrative sampling for Experiment 2. The authors are grateful to William Strong for his review of this manuscript and for his assistance in writing Junk-Food Frog, and to Willis Pitkin for serving as a linguistic consultant.


Budoff, M., & Gottlieb, J. (1976). Special-class EMR children mainstreamed: A study of an aptitude (learning potential) x treatment interaction. American Journal of Mental Deficiency, 81, 1-11.

Clark, J. B., & Madison, C. L. (1981). Clark-Madison Test of Oral Language. Austin, TX: Pro-Ed.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum Associates.

Crais, E. R., & Lorch, N. (1994). Oral narratives in school-age children. Topics in Language Disorders, 14(3), 13-28.

Dunn, L. M., & Dunn, L. M. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.

Fujiki, M., & Brinton, B. (1987). Elicited imitation revisited: A comparison with spontaneous language production. Language, Speech, and Hearing Services in Schools, 18, 301-311.

Gutierrez-Clellen, V. F., & Quinn, R. (1993). Assessing narratives in diverse cultural/linguistic populations: Clinical implications. Language, Speech, and Hearing Services in Schools, 24(1), 2-9.

Hammill, D. (1991). Detroit Tests of Learning Aptitude (DTLA-3). Austin, TX: Pro-Ed.

Hammill, D. D., & Newcomer, P. L. (1988). Test of Language Development: Intermediate (2nd ed.). Austin, TX: Pro-Ed.

Holloway, K. (1986). The effects of basal readers on oral language structures: A description of complexity. Journal of Psycholinguistic Research, 15, 141-151.

Hopkins, K. D., & Anderson, B. L. (1973). A guide to multiple-comparison techniques: Criteria for selecting the “method of choice.” Journal of Special Education, 7, 319-328.

Hughes, D., McGillivray, L., & Schmidek, M. (1997). Sourcebook for narrative language: Procedures for assessment. Eau Claire, WI: Thinking Publications.

Hunt, K. W. (1965). Grammatical structures written at three grade levels (NCTE Research Report No. 3). Urbana, IL: National Council of Teachers of English.

Jitendra, A. K., & Kameenui, E. J. (1993). Dynamic assessment as a compensatory assessment approach: A description and analysis. Remedial and Special Education, 14, 6-18.

Leech, G., & Svartvik, J. (1975). A communicative grammar of English. Essex, England: Longman.

Lidz, C. S. (Ed.). (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York: Guilford Press.

Loban, W. (1976). Language development: Kindergarten through grade 12. Urbana, IL: National Council of Teachers of English.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Baltimore, MD: University Park Press.

Merritt, D., & Liles, B. (1989). Narrative analysis: Clinical applications of story generation and story retelling. Journal of Speech and Hearing Disorders, 54(3), 429-438.

Milosky, L. M. (1987). Narratives in the classroom. Seminars in Speech and Language, 8(4), 329-343.

Nelson, N. W. (1988). Planning individualized speech and language intervention programs. Tucson, AZ: Communication Skill Builders.

Newcomer, P. L., & Hammill, D. D. (1991). Test of Language Development: Primary (2nd ed.). Austin, TX: Pro-Ed.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York:

Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414-423.

Owens, R. E., Jr. (1992). Language development: An introduction. New York: Macmillan.

Pena, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic methods to language assessment: A nonbiased procedure. The Journal of Special Education, 26(3), 269-280.

Prutting, C. A., & Connolly, J. E. (1976). Imitation: A closer look. Journal of Speech and Hearing Disorders, 41(3), 412-422.

Scott, C. M. (1988a). Producing complex sentences. Topics in Language Disorders, 8(2), 44-62.

Scott, C. M. (1988b). Spoken and written syntax. In M. A. Nippold (Ed.), Later language development (pp. 49-95). Boston, MA: A College-Hill Publication.

Scott, C. M., & Stokes, S. L. (1995). Measures of syntax in school-age children and adolescents. Language, Speech, and Hearing Services in Schools, 26(4), 309-319.

Semel, E. M., Wiig, E. H., & Secord, W. (1994). Clinical Evaluation of Language Fundamentals (3rd ed.). San Antonio, TX: Psychological Corp.

Simon, C. S. (1984). Evaluating Communicative Competence: A functional pragmatic procedure. Tucson, AZ: Communication Skill Builders.

Strong, C. J. (1998). The Strong Narrative Assessment Procedure (SNAP). Eau Claire, WI: Thinking Publications.

Strong, C. J., & Shaver, J. P. (1991). Stability of cohesion in the spoken narratives of language-impaired and normally developing school-aged children. American Speech-Language Hearing Association, 34, 95-111.

Strong, C. J., & Strong, W. (1995). Junk-food frog. Department of Communicative Disorders and Deaf Education, Utah State University, Logan.

Strong, C. J., & Strong, W. (1999). Strong rhythms and rhymes: Language and literacy development through sentence combining. Eau Claire, WI: Thinking Publications.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.

Received November 15, 1996. Accepted July 16, 1998.

Dawn M. Gummersall Carol J. Strong

Utah State University, Logan

Contact author: Carol J. Strong, Professor, Communicative Disorders and Deaf Education, Utah State University, 1000 Old Main Hill, Logan, UT 84322. Email:

Copyright American Speech-Language-Hearing Association Apr 1999
