Summary: Assessing Grammar and Assessing Vocabulary
Summary of Assessing Grammar by James E. Purpura (Cambridge Language Assessment series)
CHAPTER ONE
Differing notions of ‘grammar’ for assessment
Historically, grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed worthy of study to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.
What is meant by ‘grammar’ in theories of language?
Grammar and linguistics
Examining how grammar has been conceptualized in linguistics is important, given the different definitions of grammar that have been proposed over the years and the diverse ways in which these notions of grammar have influenced L2 educators.
When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language.
Form-based perspectives of language
Three syntactocentric, or form-based, theories of language in particular have provided grammatical insights to L2 teachers: traditional grammar, structural linguistics and transformational-generative grammar.
Form- and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights help explain which structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to those theories that study the interface between grammatical form and use.
Communication-based perspectives of language
Other theories have provided grammatical insights from a communication-based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within naturally occurring corpora.
What is pedagogical grammar?
A pedagogical grammar represents an eclectic but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex linguistic phenomena for pedagogical purposes. The more L2 teachers understand how the grammatical system works, the better they will be able to tailor this information to their specific instructional contexts.
CHAPTER TWO
Research on L2 grammar teaching, learning and assessment
I will discuss the research on L2 grammar teaching and learning and show how this research offers important insights for language teachers and testers wanting to assess L2 grammatical ability. I will also discuss the critical role that assessment has played in empirical inquiry on L2 grammar teaching and learning.
Research on L2 teaching and learning
Over the years, questions about how grammar should be taught and learned have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine whether students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students’ learning.
The SLA research looking at the role of grammar instruction in SLA might be categorized into three strands. One set of studies has looked at the relationship between the acquisition of L2 grammatical knowledge and different language-teaching methods. These are referred to as the comparative methods studies. A second set of studies has examined the acquisition of L2 grammatical knowledge through what Long and Robinson (1998) call a ‘non-interventionist’ approach to instruction. These studies have examined the degree to which grammatical ability could be acquired incidentally (while doing something else) or implicitly (without awareness), and not through explicit (with awareness) grammar instruction. A third set of studies has investigated the relationship between explicit grammar instruction and the acquisition of L2 grammatical ability. These are referred to as the interventionist studies, and are a topic of particular interest to language teachers and testers.
Comparative methods studies
The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century.
Non-interventionist studies
While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum.
Empirical studies in support of non-intervention
The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication.
Possible implications of fixed developmental order to language assessment
The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. For example, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage.
Problems with the use of development sequences as a basis for assessment
Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing.
Interventionist studies
Not all L2 educators agree with the non-interventionist position on grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so.
Empirical studies in support of intervention
Aside from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds, with several SLA researchers affirming that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction.
Research on instructional techniques and their effects on acquisition
Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).
Grammar processing and second language development
In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (e.g., gap-filling tasks) seem to measure explicit grammatical knowledge.
Implicit grammatical knowledge refers to ‘the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). It is unconscious and, in terms of processing time, is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it does not involve rule presentation or a request to focus on form in the input; rather, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form.
Implications for assessing grammar
The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.
CHAPTER THREE
The role of grammar in models of communicative language ability
In this chapter I will discuss the role that grammar plays in models of communicative competence. I will then endeavor to define grammar for assessment purposes. In this discussion I will describe in some detail the relationships among grammatical form, grammatical meaning and pragmatic meaning. Finally, I will present a theoretical model of grammar that will be used in this book as a basis for a model of grammatical knowledge.
The role of grammar in models of communicative competence
In sum, many different models of communicative competence have emerged over the years. The more recent depictions have presented much broader conceptualizations of communicative language ability.
What is meant by ‘grammar’ for assessment purposes?
Now with a better understanding of how grammar has been conceptualized in models of language ability, how might we define ‘grammar’ for assessment purposes? It should be obvious from the previous discussion that there is no one ‘right’ way to define grammar. In one testing situation the assessment goal might be to obtain information on students’ knowledge of linguistic forms in minimally contextualized sentences, while in another, it might be to determine how well learners can use linguistic forms to express a wide range of communicative meanings. Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes.
CHAPTER FOUR
Towards a definition of grammatical ability
What is meant by grammatical ability?
Having described how grammar has been conceptualized, we are now faced with the challenge of defining what it means to ‘know’ the grammar of a language so that it can be used to achieve some communicative goal. In other words, what does it mean to have ‘grammatical ability’?
Defining grammatical constructs
A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability. The many possible ways of interpreting what it means to ‘know grammar’ or to have ‘grammatical ability’ highlight the importance of defining key terms in language assessment. The same terms, used by different testers, can reflect a wide range of theoretical positions in the field of applied linguistics. In this book, I will use several theoretical terms from the domain of language testing, including knowledge, competence, ability, proficiency and performance. These concepts are abstract, not directly observable in tests and open to multiple definitions and interpretations. Therefore, before we use abstract terms such as knowledge or ability, we need to ‘construct’ a definition of them that will both suit our assessment goals and be theoretically viable. I will refer to these abstract, theoretical concepts generically as constructs or theoretical constructs.
Definition of key terms
Before continuing this discussion, it might be helpful if I clarified some of the key terms.
Grammatical knowledge
Knowledge refers to a set of informational structures that are built up through experience and stored in long-term memory.
Grammatical ability
Although some researchers have defined knowledge and ability similarly, I use these terms differently. ‘Knowledge’ refers to a set of informational structures available for use in long-term memory.
Grammatical ability is, then, the combination of grammatical knowledge and strategic competence; it is specifically defined as the capacity to realize grammatical knowledge accurately and meaningfully in testing or other language-use situations. Hymes (1972) distinguished between competence and performance, stating that communicative competence includes the underlying potential of realizing language ability in instances of language use, whereas language performance refers to the use of language in actual language events.
Grammatical performance
Grammatical performance is defined as the observable manifestation of grammatical ability in language use. In grammatical performance, the underlying grammatical ability of a test-taker may be masked by interactions with other attributes of the examinee or the test task.
Metalinguistic knowledge
Finally, metalanguage is the language used to describe a language. It generally consists of technical linguistic or grammatical terms (e.g., noun, verb).
What is ‘grammatical ability’ for assessment purposes?
The approach to the assessment of grammatical ability in this book is based on the following specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate but related component of language. Second, grammatical knowledge, along with strategic competence, constitutes grammatical ability. Third, grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts.
Knowledge of phonological or graphological form and meaning
Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system (with the exception of meaning-based orthographies such as Chinese characters) as they are used to convey meaning in testing or language-use situations. Phonological form includes the segmentals (i.e., vowels and consonants) and prosody (i.e., stress, rhythm, intonation contours, volume, tempo).
Knowledge of lexical form and meaning
Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation).
Knowledge of lexical meaning allows us to interpret and use words based on their literal meanings. Lexical meaning here does not encompass the suggested or implied meanings of words based on contextual, sociocultural, psychological or rhetorical associations.
Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language.
Knowledge of cohesive form and meaning
Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels.
Knowledge of information management form and meaning
Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures.
Knowledge of interactional form and meaning
Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication-management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions. For example, well . . . can signal disagreement, ya know or ah-huh can signal shared knowledge, and by the way can signal topic diversion. Communication-management strategies include a wide range of linguistic forms that serve to facilitate smooth interaction or to repair interaction when communication breaks down.
CHAPTER FIVE
Designing test tasks to measure L2 grammatical ability
How does test development begin? Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996).
What do we mean by ‘task’?
The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the express purpose of learning the target language. A task, then, is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circling the answer) response to input.
What are the characteristics of grammatical test tasks?
As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests.
Describing grammar test tasks
When language teachers and testers think of grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that they can vary on a number of dimensions. In designing grammar tests, we need to be familiar with a wide range of activities for eliciting grammatical performance.
Selected-response task types
Selected-response tasks present input in the form of an item, and test-takers are expected to select a response. Beyond this, all other task characteristics can vary. These tasks are usually scored dichotomously (right or wrong); however, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.
- The multiple-choice (MC) task
- Multiple-choice error identification task
- The discrimination task
- The noticing task
Limited-production task types
Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.
Limited-production tasks can be scored dichotomously or with partial credit. In other situations, they can be scored with a holistic or analytic rating scale. This method is useful if we wish to judge distinct aspects of grammatical ability at different levels of ability or mastery.
- The gap-filling task
- The short-answer task
- The dialogue (or discourse) completion task (DCT)
Extended-production task types
- The information-gap task (info-gap)
- The role-play and simulation tasks
CHAPTER SIX
Developing tests to measure L2 grammatical ability
The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts.
The quality of reliability
When we talk about ‘reliability’ in reference to a car, we all know what that means. A car is said to be reliable if it readily starts up every time we want to use it regardless of the weather, the time of day or the user. It is also considered reliable if the brakes never fail and the steering is consistently responsive. Together, these mechanical functions make the car’s performance anywhere from zero to one hundred percent reliable. By analogy, a test is reliable to the extent that it yields consistent scores across different administrations, test forms and raters.
The quality of construct validity
The second quality that all ‘useful’ tests possess is construct validity. Bachman and Palmer (1996) define construct validity as ‘the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure. Construct validity also has to do with the domain of generalization to which our score interpretations generalize’ (p. 21). Construct validity is clearly one of the most important qualities a test can possess: it tells us whether we are measuring what we intended to measure.
The quality of authenticity
A third quality of test usefulness is authenticity, a notion much discussed in language testing since the late 1970s, when communicative approaches to language teaching were first taking root. Building on these discussions, Bachman and Palmer (1996) refer to ‘authenticity’ as the degree of correspondence between the test-task characteristics and the TLU task characteristics.
The quality of Interactiveness
A fourth quality of test usefulness outlined by Bachman and Palmer (1996) is interactiveness. This quality refers to the degree to which the aspects of the test-taker’s language ability we want to measure (e.g., grammatical knowledge, language knowledge) are engaged by the test-task characteristics (e.g., the input, the response, and the relationship between input and response) based on the test constructs.
The quality of impact
Bachman and Palmer (1996) refer to the degree to which testing and test score decisions influence all aspects of society and the individuals within that society as test impact. Therefore, impact refers to the link between the inferences we make from scores and the decisions we make based on these interpretations. In terms of impact, most educators would agree that tests should promote positive test-taker experiences leading to positive attitudes (e.g., a feeling of accomplishment) and actions (e.g., studying hard).
The quality of practicality
Test practicality is not a quality of a test itself, but is a function of the extent to which we are able to balance the costs associated with designing, developing, administering, and scoring a test in light of the available resources (Bachman, personal communication, 2002).
Overview of grammar-test construction
There is no one ‘right’ way to develop a test, nor are there any recipes for ‘good’ tests that could generalize to all situations. Test development is often presented as a linear process consisting of a number of stages and steps. In reality, the process is anything but linear.
Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development. The design stage results in a design statement, which, according to Bachman and Palmer (1996, p. 88), should contain the following components:
• Purpose
• TLU domains and tasks
• Characteristics of test-takers
• Construct(s) to be measured
• Plan for evaluating usefulness
• Plan for managing resources
Specifying the scoring method
The scoring method provides an explicit description of the criteria for correctness and the exact procedures for scoring the response. Generally speaking, tasks can be scored objectively, where the scorer does not need to make any expert judgments in determining whether the answers are correct, or subjectively, where expert judgment is needed to judge performance.
• Scoring selected-response tasks
• Scoring limited-production tasks
• Scoring extended-production tasks
• Using scoring rubrics
• Grading
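The contrast between objective right/wrong scoring and partial-credit scoring can be made concrete with a short Python sketch. It is not taken from the book: the function names, the answer key, the gap-fill item and the credit values are all hypothetical, chosen only to show how a dichotomous key differs from a partial-credit table that rewards meaning even when form is inaccurate.

```python
from typing import Mapping, Sequence


def score_dichotomous(responses: Sequence[str], key: Sequence[str]) -> int:
    """Right/wrong scoring: one point for each response that matches the key exactly."""
    if len(responses) != len(key):
        raise ValueError("responses and key must be the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def score_partial_credit(response: str, credit_table: Mapping[str, float]) -> float:
    """Partial-credit scoring: look up the normalized response in a table of
    acceptable answers; any answer not in the table earns zero."""
    return credit_table.get(response.strip().lower(), 0.0)


# Hypothetical multiple-choice answer key and one test-taker's responses.
mc_key = ["b", "a", "d", "c"]
mc_responses = ["b", "c", "d", "c"]
print(score_dichotomous(mc_responses, mc_key))  # 3

# Hypothetical gap-fill item: "She ___ (go) to school yesterday."
# Assumed credit values: full credit for correct form and meaning, half credit
# for an answer that signals past time (meaning) with an incorrect form.
gap_fill_credit = {"went": 1.0, "goed": 0.5}
for answer in ["went", " Goed ", "go"]:
    print(answer.strip(), score_partial_credit(answer, gap_fill_credit))
```

Because both functions only compare responses against a fixed key or table, neither requires expert judgment, so both count as objective scoring in the sense described above; the partial-credit table simply encodes, in advance, how form and meaning are weighted.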
CHAPTER SEVEN
Illustrative tests of grammatical ability
In this chapter I will examine several examples of professionally developed language tests that measure grammatical ability. Some of these tests contain separate sections that are exclusively devoted to the assessment of grammatical ability, while others measure grammatical knowledge along with other components of language ability in the context of language use, that is, while test-takers are listening, speaking, reading or writing.
The First Certificate in English Language Test (FCE)
- Purpose
The First Certificate in English (FCE) exam was first developed by the University of Cambridge Local Examinations Syndicate (UCLES, now Cambridge ESOL) in 1939 and has been revised periodically ever since. The purpose of the FCE (Cambridge ESOL, 2001a) is to assess the general English language proficiency of learners as measured by their abilities in reading, writing, speaking, listening, and knowledge of the lexical and grammatical systems of English (Cambridge ESOL, 1995, p. 4).
- Construct definition and operationalization
According to the FCE Handbook (Cambridge ESOL, 2001a), the Use of English paper is designed to measure the test-takers’ ability to ‘demonstrate their knowledge and control of the language system by completing a number of tasks, some of which are based on specially written texts’ (p. 7).
- Measuring grammatical ability through language use
In addition to measuring grammatical ability in the Use of English paper of the test, the FCE measures grammatical ability in the writing and speaking sections. Language use in the writing paper is measured in the contexts of writing letters, articles, reports and compositions (Cambridge ESOL, 2001a, p. 7).
- The FCE and the qualities of test usefulness
In terms of the qualities of test usefulness, the FCE clearly gives priority to construct validity, especially as this relates to the measurement of grammatical ability as one component of English language proficiency. Given the purpose and uses of the FCE, however, a discrete, empirical relationship between the target language use tasks and the test tasks in the Use of English paper is difficult to determine from the published literature.
The Comprehensive English Language Test (CELT)
- Purpose
The Comprehensive English Language Test (CELT) (Harris and Palmer, 1970a, 1986) was designed to measure the English language ability of nonnative speakers of English.
- Construct definition and operationalization
According to the CELT Technical Manual (Harris and Palmer, 1970b), the structure subtest is intended to measure the students’ ‘ability to manipulate the grammatical structures occurring in spoken English’.
- The CELT and the qualities of test usefulness
In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of non native speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness.
Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.
The Community English Program (CEP) Placement Test
- Purpose
The Community English Program (CEP) Placement Test was first developed by students and faculty in the TESOL and Applied Linguistics Programs at Teachers College, Columbia University, in 2002, and is revised regularly. Unlike the previous tests reviewed, the CEP Placement Test is a theme-based assessment designed to measure the communicative language ability of learners entering the Community English Program, a low-cost, adult ESL program.
- Construct definition and operationalization
Given that the CEP is a theme-based ESL program, where language instruction is contextualized within a number of different themes throughout the different levels, the CEP Placement Test is also theme-based. The theme for the CEP Placement Test under review is ‘Cooperation and Competition’. This is not one of the themes students encounter in the program. In this test, all five test sections assess different aspects of language ability while exposing examinees to different aspects of the theme.
- Measuring grammatical ability through language use
In addition to the grammar section, the writing and speaking sections of the test also measure grammatical ability. The writing section consists of one 30-minute essay to be written on the theme of ‘cooperation and competition’.
- The CEP Placement Test and the qualities of test usefulness
In terms of the qualities of test usefulness, the developers of the grammar section of the CEP Placement Test prioritize construct validity, reliability and practicality. With regard to construct validity, the grammar section of this test was designed to measure both grammatical form and meaning on the sentential and discourse levels, sampling from a wide range of grammatical features. In this test, grammatical ability is measured by means of four tasks in the grammar section, one task in the writing section, and several tasks in the speaking section. In short, the CEP Placement Test measures both explicit and implicit knowledge of grammar. Placement decisions based on interpretations of the CEP Placement Test scores seem to be appropriate, as only a handful of misplacements are reported each term.
The reliability of the grammar-test scores was also considered a priority from the design stage of test development, as seen in the procedures for item development, test piloting and scoring. In an effort to promote consistency (and quick return of the results), the CEP Placement Test developers decided to use only multiple-choice tasks in the grammar section. This decision was based on the results of the pilot tests, where the use of limited-production grammar tasks showed inconsistent scoring results and put a strain on time resources.
CHAPTER EIGHT
Learning-oriented assessments of grammatical ability
Introduction
The language tests reviewed in the previous chapter involved the grammar sections from large-scale tests designed to measure global language proficiency, typically for academic purposes. Like other large-scale and often high-stakes tests, they were designed to make institutional decisions related to placement into or exit from a language program, screening for language proficiency or reclassification of school status based on whether a student had achieved the language skills necessary to benefit from instruction in the target language. These tests provide assessments for several components of language ability including, among others, aspects of grammatical knowledge.
What is learning-oriented assessment of grammar?
In reaction to conventional testing practices typified by large-scale, discrete-point, multiple-choice tests of language ability, several educators (e.g., Herman, Aschbacher and Winters, 1992; Short, 1993; Shohamy, 1995; Shepard, 2000) have advocated reforms so that assessment practices might better capture educational outcomes and be more consistent with classroom goals, curricula and instruction. The terms alternative assessment, authentic assessment and performance assessment have all been associated with calls for reform in both large-scale and classroom assessment contexts. Although alternative, authentic and performance assessment are often viewed as essentially the same, each emphasizes a slightly different aspect of the move away from conventional, discrete-point, standardized assessment.
Alternative assessment emphasizes an alternative to, and a rejection of, selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992).
Implementing learning-oriented assessment of grammar
Considerations from grammar-testing theory
The development procedures for constructing large-scale assessments of grammatical ability discussed in Chapter 6 are similar to those needed to develop learning-oriented assessments of grammar for classroom purposes with the exception that the decisions made from classroom assessments will be somewhat different due to the learning-oriented mandate of classroom assessment. Also, given the usual low-stakes nature of the decisions in classroom assessment, the amount of resources that needs to be expended is generally less than that required for large-scale assessment.
Implications for test design
In designing classroom-based, learning-oriented assessments, we need to provide a much more explicit depiction of the assessment mandate than we might do for large-scale assessments. This is because classroom assessment, especially in school contexts, has many interested stakeholders (e.g., students, teachers, parents, tutors, principals, school districts), who are likely to be held accountable for learning and who will use the assessment information to evaluate instructional outcomes and plan for further instruction.
Implications for operationalization
The operationalization stage of classroom-based, learning-oriented assessment is also similar to that of large-scale assessments. That is, the outcome should be a blueprint for the assessment, as described in Chapter 6. The learning mandate, however, will obviously affect the specification of test tasks, so that characteristics such as the setting, the rubrics or the expected response can be better aligned with instructional goals. For example, in classroom-based assessment, we may wish to collect information about grammatical ability during the course of instruction, and we may decide to evaluate performance by means of teacher observation reports, or we may wish to assess grammatical ability by means of informal oral interviews conducted over several days.
Learning-oriented assessment of grammar may be achieved by means of a wide array of data-gathering methods in classroom contexts. These obviously include conventional quizzes and tests containing selected response, limited-production and all sorts of extended-production tasks, as discussed earlier. These conventional methods provide achievement or diagnostic information to test-users, and can occur before, during or after instruction, depending on the assessment goals. They are often viewed as ‘separate’ from instruction in terms of their administration. These assessments are what most teachers typically call to mind when they think of classroom tests.
Planning for further learning
The usefulness of learning-oriented classroom assessment is to a great extent predicated upon the quality and explicitness of the information obtained and its relevance for further action. Research has shown, however, that the quality of feedback contributes more to further learning than its mere presence or absence (Bangert-Drowns et al., 1991).
Teachers have many options for presenting assessment results to students. They could present students with feedback in the form of a single overall test score, a score for each test component, scores referenced to a rubric, a narrative summary of teacher observations or a profile of scores showing development over time. Feedback can also be presented in a private conference with the individual student. In an effort to understand the effect of feedback on further learning, Butler (1987) presented test-takers with feedback from an assessment in one of three forms: (1) focused written comments that addressed criteria test-takers were aware of before the assessment; (2) grades derived from numerical scoring; and (3) grades and comments. Test-takers were then given two subsequent tasks, and significant gains were observed among those who received the detailed comments.
Considerations from L2 learning theory
Given that learning-oriented assessment involves the collection and interpretation of evidence about performance so that judgments can be made about further language development, learning-oriented assessment of grammar needs to be rooted not only in a theory of grammar testing or language proficiency, but also in a theory of L2 learning. What is striking in the literature is that models of language ability rarely refer to models of language learning, and models of language learning rarely make reference to models of language ability.
As we have seen, implementing grammar assessment with a learning mandate has implications for test construction. Some of these implications have already been discussed. However, implementing learning-oriented assessment of grammar is not only about task design and operationalization; teachers also need to consider how assessment relates to, and can help promote, grammar acquisition, as described by Van Patten (1996).
SLA processes – briefly revisited
As discussed in Chapter 2, research in SLA suggests that learning an L2 involves three simultaneously occurring processes: input processing (Van Patten, 1996), system change (Schmidt, 1990) and output processing (Swain, 1985; Lee and Van Patten, 2003). Input processing relates to how the learner understands the meaning of a new grammatical feature, or how form–meaning connections are made (Ellis, 1993; Van Patten, 1996). A critical first stage of acquisition is the conversion of input to ‘intake’. The second set of processes, system change, refers to how learners accommodate new grammatical forms into their interlanguage and how this change helps restructure their interlanguage so that it is more target-like (McLaughlin, 1987; De Keyser, 1998).
Assessing for intake
Van Patten and Cadierno (1993b) describe this critical first stage of acquisition as the process of converting input into ‘intake’. In language classrooms, considerable time is spent on determining if students have understood. As most teachers know, however, it is difficult to discern if their students have mapped meaning onto the form.
Assessing for intake requires that learners understand the target forms without having to produce them. This can be achieved by selected-response and limited-production tasks in which learners need to make form–meaning connections. The chapter presents three examples of interpretation tasks designed to assess for intake. (For additional examples of interpretation tasks, see Ellis, 1997; Lee and Van Patten, 2003; and Van Patten, 1996, 2003.)
Assessing to push restructuring
Once input has been converted into intake, the new grammatical feature is ready to be ‘accommodated’ into the learner’s developing linguistic system, causing a restructuring of the entire system (Van Patten, 1996). To initiate this process, teachers provide students with tasks that enable them to use the new grammatical forms in decreasingly controlled situations so they can incorporate these forms into their existing system of implicit grammatical knowledge.
Assessing for output processing
Although learners may have developed an explicit knowledge of the form and meaning of a new grammatical point, this does not necessarily mean they can access this knowledge automatically in spontaneous communication. In order for learners to produce unplanned, meaningful output in real time (i.e., speaking), they need to be able to tap into grammatical knowledge that is already an unconscious part of their developing system of language knowledge (Lee and VanPatten, 2003). Thus, to assess test-takers’ implicit knowledge of grammar (i.e., their ability to process output), test-takers need to be presented with tasks that ask them to produce language in real time, where the focus is more on the content being communicated, or on the completion of the task, than on the application of explicit grammar rules.
An illustration of learning-oriented assessment
Let us now turn to an illustration of a learning-oriented achievement test of grammatical ability.
Making assessment learning-oriented
The On Target achievement tests were designed with a clear learning mandate. The content of the tests had to be strictly aligned with the content of the curriculum. This obviously had several implications for the test design and its operationalization. From a testing perspective, the primary purpose of the Unit 7 achievement test was to measure the students’ explicit as well as their implicit knowledge of grammatical form and meaning on both the sentence and discourse levels.
While the TLU domain was limited to the use of the present perfect tense to discuss life achievements, the constructs and tasks included in the test ranged from simple to complex. For example, the first gap-filling grammar task was intended only to assess the test-takers’ explicit knowledge of morphosyntactic form, and the pronunciation task focused only on their explicit knowledge of phonological form. The second grammar task was slightly more complex in that it aimed to measure the test-takers’ ability to use these forms to communicate literal and intended meanings based on more extensive input.
CHAPTER NINE
Challenges and new directions in assessing grammatical ability
Introduction
Research and theory related to the teaching and learning of grammar have made significant advances over the years. In applied linguistics, our understanding of language has been vastly broadened by the work of corpus-based and communication-based approaches to language study, and this research has found its way into recent pedagogical grammars.
Also, our conceptualization of language proficiency has shifted from an emphasis on linguistic form to one on communicative language ability and communicative language use, which has, in turn, led to a de-emphasis on grammatical accuracy and a greater concern for communicative effectiveness.
The state of grammar assessment
In the last fifty years, language testers have dedicated a great deal of time to discussing the nature of language proficiency and the testing of the four skills, the qualities of test usefulness (e.g., reliability, authenticity), the relationships between test-taker or task characteristics and performance, and numerous statistical procedures for examining data and providing evidence of test validity. In all of these discussions, very little has been said about the assessment of grammatical ability, and, unsurprisingly, until recently not much has changed in this area since the 1960s.
In recent years, the assessment of grammatical ability has taken an interesting turn in certain situations. Grammatical ability has been assessed in the context of language use under the rubric of testing speaking or writing. This has led, in some cases, to examinations in which grammatical knowledge is no longer included as a separate and explicit component of communicative language ability in the form of a separate subtest. In other words, only the students’ implicit knowledge of grammar alongside other components of communicative language ability (e.g., topic, organization, register) is measured. Having discussed how grammar assessment has evolved over the years, I will discuss in the next section some ongoing issues and challenges associated with assessing grammar.
Challenge 1: Defining grammatical ability
One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism.
Challenge 2: Scoring grammatical ability
A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures.
Challenge 3: Assessing meanings
The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language.
Challenge 4: Reconsidering grammar-test tasks
The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching.
Challenge 5: Assessing the development of grammatical ability
The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind.
Final remarks
Despite loud claims in the 1970s and 1980s by a few influential SLA researchers that instruction, and in particular explicit grammar instruction, had no effect on language learning, most language teachers around the world never really gave up grammar teaching. Furthermore, these claims have instigated an explosion of empirical research in SLA, the results of which have made a compelling case for the effectiveness of certain types of both explicit and implicit grammar instruction. This research has also highlighted the important role that meaning plays in learning grammatical forms.
In the same way, most language teachers and SLA researchers around the world have never really given up grammar testing. Admittedly, some have been perplexed as to how grammar assessment could be compatible with a communicative language teaching agenda, and many have relied on assessment methods that do not necessarily meet the current standards of test construction and validation.
My aim in this book, therefore, has been to provide language teachers, language testers and SLA researchers with a practical framework, firmly based in research and theory, for the design, development and use of grammar assessments. I have tried to show how grammar plays a critical role in teaching, learning and assessment. I have also presented a model of grammatical knowledge, including both form and meaning, that could be used for test construction and validation. I then showed how L2 grammar tests can be constructed, scored and used to make decisions about test-takers in both large-scale and classroom contexts. Finally, in this last chapter, I have discussed some of the challenges we still face in constructing useful grammar assessments. My hope is that this volume will not only help language teachers, testers and SLA researchers develop better grammar assessments for their respective purposes, but instigate research and continued discussion on the assessment of grammatical ability and its role in language learning.
Summary Assessing Vocabulary from book Language Assessment by John Read
CHAPTER ONE
The Place Of Vocabulary In Language Assessment
Introduction
At first glance, it may seem that assessing the vocabulary knowledge of second language learners is both necessary and reasonably straightforward. It is necessary in the sense that words are the basic building blocks of language, the units of meaning from which larger structures such as sentences, paragraphs and whole texts are formed.
Vocabulary assessment seems straightforward in the sense that word lists are readily available to provide a basis for selecting a set of words to be tested. In addition, there is a range of well-known item types that are convenient to use for vocabulary testing.
Recent trends in language testing
However, scholars in the field of language testing have a rather different perspective on vocabulary-test items of the conventional kind. Such items fit neatly into what language testers call the discrete-point approach to testing. This involves designing tests to assess whether learners have knowledge of particular structural elements of the language: word meanings, word forms, sentence patterns, sound contrasts and so on.
An important example of this perspective is Bachman and Palmer's (1996) book Language Testing in Practice, a comprehensive and influential volume on language-test design and development. Following Bachman's (1990) earlier work, the authors see the purpose of language testing as being to allow us to make inferences about learners' language ability, which consists of two components: language knowledge and strategic competence.
Three dimensions of vocabulary assessment
Up to this point, I have outlined two contrasting perspectives on the role of vocabulary in language assessment. One point of view is that it is perfectly sensible to write tests that measure whether learners know the meaning and usage of a set of words, taken as independent semantic units. The other view is that vocabulary must always be assessed in the context of a language-use task, where it interacts in a natural way with other components of language knowledge.
Discrete - embedded
The first dimension focuses on the construct which underlies the assessment instrument. In language testing, the term construct refers to the mental attribute or ability that a test is designed to measure.
A discrete test takes vocabulary knowledge as a distinct construct, separated from other components of language competence.
An embedded vocabulary measure is one that contributes to the assessment of a larger construct. I have already given an example of such a measure when I referred to Bachman and Palmer's task of writing a proposal for the improvement of university admissions procedures.
Selective - comprehensive
The second dimension concerns the range of vocabulary to be included in the assessment. A conventional vocabulary test is based on a set of target words selected by the test-writer, and the test-takers are assessed according to how well they demonstrate their knowledge of the meaning or use of those words. This is what I call a selective vocabulary measure. The target words may either be selected as individual words and then incorporated into separate test items, or alternatively the test-writer first chooses a suitable text and then uses certain words from it as the basis for the vocabulary assessment.
Context-independent - context-dependent
The role of context, which is an old issue in vocabulary testing, is the basis for the third dimension. Traditionally contextualisation has meant that a word is presented to test-takers in a sentence rather than as an isolated element. From a contemporary perspective, it is necessary to broaden the notion of context to include whole texts and, more generally, discourse.
The issue of context dependence also arises with cloze tests, in which words are systematically deleted from a text and the test-takers' task is to write a suitable word in each blank space.
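To make "systematically deleted" concrete, here is a minimal sketch of one common way to build a cloze passage: fixed-ratio deletion of every nth word. The function name `make_cloze` and its parameters `every_nth` and `skip_first` are hypothetical, chosen only for illustration. The sketch assumes whitespace tokenization, leaves an opening stretch of text intact as lead-in context, and keeps trailing punctuation in the passage instead of the answer key. Published cloze procedures differ in their deletion rates and scoring.

```python
def make_cloze(text: str, every_nth: int = 7, skip_first: int = 0):
    """Build a fixed-ratio cloze passage from plain text (illustrative sketch).

    Splits on whitespace, leaves the first `skip_first` words intact as
    lead-in context, then replaces every `every_nth` word with a numbered
    blank. Trailing punctuation stays in the passage, not in the key.
    Returns (gapped_passage, answer_key).
    """
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    passage, key = [], []
    for i, word in enumerate(text.split()):
        pos = i - skip_first + 1  # 1-based position after the lead-in
        if pos > 0 and pos % every_nth == 0:
            core = word.rstrip(".,;:!?")
            key.append(core)
            passage.append(f"({len(key)}) ______{word[len(core):]}")
        else:
            passage.append(word)
    return " ".join(passage), key


gapped, answers = make_cloze("The cat sat on the mat.", every_nth=3)
print(gapped)   # The cat (1) ______ on the (2) ______.
print(answers)  # ['sat', 'mat']
```

Because deletion here is purely mechanical, whether a blank can be filled from the sentence alone or only from the wider text depends on which words happen to fall at the deletion points. This is one reason context dependence is an issue for cloze tests.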
CHAPTER TWO
The Nature Of Vocabulary
Introduction
Before we start to consider how to test vocabulary, it is necessary first to explore the nature of what we want to assess. Our everyday concept of vocabulary is dominated by the dictionary. We tend to think of it as an inventory of individual words, with their associated meanings. This view is shared by many second language learners, who see the task of vocabulary learning as a matter of memorising long lists of L2 words, and their immediate reaction when they encounter an unknown word is to reach for a bilingual dictionary.
What is a word?
A basic assumption in vocabulary testing is that we are assessing knowledge of words. But the word is not an easy concept to define, either in theoretical terms or for various applied purposes. There are some basic points that we need to spell out from the start. One is the distinction between tokens and types, which applies to any count of the words in a text.
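A small counting example shows the token/type distinction. Tokens are all the running words in a text. Types are the distinct word forms among them. The sketch below is illustrative only and rests on its own assumptions. A "word" is a run of letters, optionally with one internal apostrophe. Case is ignored. Inflected forms such as *mat* and *mats* count as separate types, so this is a count of word forms, not of lemmas or word families.

```python
import re
from collections import Counter


def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (tokens, types) for a text.

    Assumption: a word is a run of letters (optionally with one internal
    apostrophe), compared case-insensitively; each distinct form is a type.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return len(words), len(Counter(words))


print(count_tokens_and_types("The cat sat on the mat. The mat was flat."))
# (10, 7): 'the' occurs three times and 'mat' twice
```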
What about larger lexical items?
The second major point about vocabulary is that it consists of more than just single words. For a start, there are the phrasal verbs (get […] available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments' (p. 110).
What does it mean to know a lexical item?
The other seven assumptions cover various aspects of what is meant by knowing a word:
• Knowing a word means knowing the degree of probability of encountering that word in speech or print. For many words we also know the sort of words most likely to be found associated with the word.
• Knowing a word implies knowing the limitations on the use of the word according to variations of function and situation.
What is vocabulary ability?
Mention of the term construct brings me back to the main theme I developed in Chapter 1, which was that scholars with a specialist interest in vocabulary teaching and learning have a rather different perspective from language testers on the question of how - and even whether - to assess vocabulary. My three dimensions of vocabulary assessment represent one attempt to incorporate the two perspectives within a single framework.
The context of vocabulary use
Traditionally in vocabulary testing, the term context has referred to the sentence or utterance in which the target word occurs. For instance, in a multiple-choice vocabulary item, it is normally recommended that the stem should consist of a sentence containing the word to be tested.
Vocabulary knowledge and fundamental processes
The second component in Chapelle's (1994) framework of vocabulary ability is the one that has received the most attention from applied linguists and second language teachers. Chapelle outlines four dimensions of this component:
Vocabulary size: This refers to the number of words that a person knows.
Lexicon organization: This concerns the way in which words and other lexical items are stored in the brain.
Metacognitive strategies for vocabulary use
This is the third component of Chapelle's definition of vocabulary ability, and is what Bachman (1990) refers to as 'strategic competence'. The strategies are employed by all language users to manage the ways that they use their vocabulary knowledge in communication. Most of the time, we operate these strategies without being aware of it. It is only when we have to undertake unfamiliar or cognitively demanding communication tasks that the strategies become more conscious.
CHAPTER THREE
Research on vocabulary acquisition and use
Introduction
The focus of this chapter is on research in second language vocabulary acquisition and use. There are three reasons for reviewing this research in a book on vocabulary assessment. The first is that the researchers are significant users of vocabulary tests as instruments in their studies.
Systematic vocabulary learning
Given the number of words that learners need to know if they are to achieve any kind of functional proficiency in a second language, it is understandable that researchers on language teaching have been interested in evaluating the relative effectiveness of different ways of learning new words.
Assessment Issues
As for the assessment implications, the design of tests to evaluate how well students have learned a set of new words is straightforward, particularly if the learning task is restricted to memorising associations between an L2 word and its L1 meaning. It simply involves presenting the test-takers with one word and asking them to supply the other-language equivalent. However, as Ellis and Beaton (1993b) note, it makes a difference whether they are required to translate into or out of their own language.
Incidental vocabulary learning
The term incidental often causes problems in the discussion of research on this kind of vocabulary acquisition.
Research with native speakers
The first step in investigating this kind of vocabulary acquisition was to obtain evidence that it actually happened. Teams of reading researchers in the United States (Jenkins, Stein and Wysocki, 1984; Nagy, Herman and Anderson, 1985; Nagy, Anderson and Herman, 1987) undertook a series of studies with native-English-speaking school children.
Second language research
Now, how about incidental learning of second language vocabulary? In a study that predates the L1 research in the US, Saragi, Nation and Meister (1978) gave a group of native speakers of English the task of reading Anthony Burgess's novel A Clockwork Orange, which contains a substantial number of Russian-derived words that function as an argot used by the young delinquents who are the main characters in the book.
Assessment issues
Now, what are the testing issues that arise from this research on incidental vocabulary acquisition? One concerns the need for a pre-test. A basic assumption made in these studies is that the target words are not known by the subjects. To some extent, it is possible to rely on teachers' judgements or word-frequency counts to select words that a particular group of learners are unlikely to know, but it is preferable to have some more direct evidence. The use of a pre-test allows the researchers to select, from a set of potential target words, those that none of the subjects are familiar with.
Questions about lexical inferencing
This topic is not purely a pedagogical concern. Inferencing by learners is of great interest in second language acquisition research, and there are several types of empirical studies that are relevant here. In reviewing this work, I find it helpful to start with five questions that seem to follow a logical sequence:
1 What kind of contextual information is available to readers to help them in guessing the meaning of unknown words in texts?
2 Are such clues normally available to the reader in natural, unedited texts?
3 How well do learners infer the meaning of unknown words without being specifically trained to do so?
4 Is strategy training an effective way of developing learners' lexical inferencing skills?
5 Does successful inferencing lead to acquisition of the words?
Assessment issues
As in any test-design project, we first need to be clear about what the purpose of a lexical inferencing test is. The literature reviewed above indicates at least three possible purposes:
1 to conduct research on the processes that learners engage in when they attempt to infer the meaning of unknown words;
2 to evaluate the success of a program to train learners to apply lexical inferencing strategies; or
3 to assess learners on their abilities to make inferences about unknown words.
Communication strategies
When compared with the amount of research on ways that learners cope with unknown words they encounter in their reading, there has been less investigation of the vocabulary difficulties they face in expressing themselves through speaking and writing. However, within the field of second language acquisition, there is an active tradition of research on communication strategies.
CHAPTER FOUR
Research on vocabulary assessment
Introduction
In the previous chapter, we saw how tests play a role in research on vocabulary within the field of second language acquisition (SLA). Now we move on to consider research in the field of language testing, where the focus is not so much on understanding the processes of vocabulary learning as on measuring the level of vocabulary knowledge and abilities that learners have reached.
Objective testing
The history of vocabulary assessment in the twentieth century is very much associated with the development of objective testing, especially in the United States. Objective tests are ones in which the learning material is divided into small units, each of which can be assessed by means of a test item with a single correct answer that can be specified in advance.
Multiple-choice vocabulary items
Although the multiple-choice format is one of the most widely used methods of vocabulary assessment, both for native speakers and for second language learners, its limitations have also been recognized for a long time.
Validating tests of vocabulary knowledge
Writers on first language reading research over the years (Kelley and Krey, 1934; Farr, 1969; Schwartz, 1984) have pointed out that, in addition to the many variations of the multiple-choice format, a wide range of test items and methods have been used for measuring vocabulary knowledge. Kelley and Krey (cited in Farr, 1969: 34) identified 26 different methods in standardized US vocabulary and reading tests.
Measuring vocabulary size
Let me first sketch some educational situations in which consideration of vocabulary size is relevant and where the research has been undertaken.
• Reading researchers have long been interested in estimating how many words are known by native speakers of English as they grow from childhood through the school years to adult life.
• Estimates of native-speaker vocabulary size at different ages provide a target - though a moving one, of course - for the acquisition of vocabulary by children entering school with little knowledge of the language used as the medium of instruction.
• International students undertaking upper secondary or university education through a new medium of instruction simply do not have […]
What counts as a word?
What counts as a word is an issue that I discussed in Chapter 2. The larger estimates of vocabulary size for native speakers tend to be calculated on the basis of individual word forms, whereas more conservative estimates take word families as the units to be measured. Remember that a word family consists of a base word together with its inflected and derived forms that share the same meaning.
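A small illustration of why the choice of unit matters: the Python sketch below counts the same set of known words first as individual word forms and then as word families. The form-to-family mapping (`FAMILY_OF`) and the function name are hypothetical, made up for this example rather than taken from the book or from a published word-family list.

```python
# Hypothetical toy mapping from word forms to the base word of their family.
# A real study would use an established word-family list instead.
FAMILY_OF = {
    "teach": "teach", "teaches": "teach", "taught": "teach",
    "teaching": "teach", "teacher": "teach", "teachers": "teach",
    "happy": "happy", "happier": "happy", "happily": "happy",
    "happiness": "happy",
}

def count_units(known_forms):
    """Return (number of word forms, number of word families) in known_forms.

    Forms missing from FAMILY_OF raise KeyError in this sketch.
    """
    forms = set(known_forms)
    families = {FAMILY_OF[form] for form in forms}
    return len(forms), len(families)

known = ["teach", "taught", "teacher", "happy", "happiness"]
print(count_units(known))  # (5, 2)
```

The same knowledge yields five units when counted as forms but only two when counted as families, which is why form-based estimates come out larger.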
How do we choose which words to test?
For practical reasons it is impossible to test all the words that the native speaker of a language might know. Researchers have typically started with a large dictionary and then drawn a sample of words representing, say, 1 per cent (1 in 100) of the total dictionary entries. The next step is to test how many of the selected words are known by a group of subjects.
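The extrapolation step implied here is simple arithmetic: the proportion of sampled words that are known is taken to hold for the whole dictionary. The sketch below assumes the sample is representative (for example, every 100th entry for a 1 per cent sample); the function name and the dictionary and score figures are hypothetical.

```python
def estimate_vocabulary_size(total_entries, sample_size, words_known):
    """Scale the proportion of sampled words known up to the whole dictionary.

    Assumes the sample is a random or evenly spaced draw from the dictionary,
    so the proportion known in the sample stands in for the full list.
    """
    if not 0 < sample_size <= total_entries:
        raise ValueError("sample_size must be between 1 and total_entries")
    if not 0 <= words_known <= sample_size:
        raise ValueError("words_known must be between 0 and sample_size")
    proportion_known = words_known / sample_size
    return round(proportion_known * total_entries)

# Hypothetical figures: a 50,000-entry dictionary sampled at 1 per cent
# gives 500 test words; a learner who knows 180 of them is credited with:
print(estimate_vocabulary_size(50_000, 500, 180))  # 18000
```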
How do we find out whether the selected words are known?
Once a sample of words has been selected, it is necessary to find out, by means of some kind of test, whether each word is known. In studies of vocabulary size, the criterion for knowing a word is usually quite liberal, because of the large number of words that need to be covered in the time available for testing. The following test formats have been commonly used:
- multiple-choice items of various types;
- matching of words with synonyms or definitions;
- supplying an L1 equivalent for each L2 target word;
- the checklist (or yes-no) test, in which test-takers simply indicate whether they know each word or not.
Assessing quality of vocabulary knowledge
Whatever the merits of vocabulary-size tests, one limitation is that they can give only a superficial indication of how well any particular word is known. In fact this criticism has long been applied to many objective vocabulary tests, not just those that are designed to estimate the total vocabulary size. Dolch and Leeds (1953) analyzed the vocabulary subtests of five major reading and general achievement test batteries for American school children and found that
How to measure it?
How to conceptualize it?
The Dolch and Leeds (1953) test items with which I introduced this section of the chapter essentially assess precision of knowledge: do the test-takers know the specific meaning of each target word, rather than just having a vague idea about it? This represents one way to define quality of knowledge, but it assumes that each word has only one meaning to be precisely known.
The role of context
Whether we can separate vocabulary from other aspects of language proficiency is obviously relevant to the question of what the role of context is in vocabulary assessment. In the early years of objective testing, many vocabulary tests presented the target words in isolation, in lists or as the stems of multiple-choice items. Such tests were considered to be pure measures of vocabulary knowledge.
Cloze tests as vocabulary measures
A standard cloze test consists of one or more reading passages from which words are deleted according to a fixed ratio (e.g. every seventh word). Each deleted word is replaced by a blank of uniform length, and the task of the test takers is to write a suitable word in each space.
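A fixed-ratio deletion of this kind is easy to automate. The following Python sketch (the function and parameter names are hypothetical) deletes every `ratio`-th whitespace-separated word, keeps any trailing punctuation visible, and returns the answer key alongside the gapped passage. It ignores refinements a test writer might want, such as leaving an opening stretch of text intact.

```python
PUNCTUATION = ".,;:!?\"')"

def make_cloze(text, ratio=7, start=None, blank="________"):
    """Build a fixed-ratio cloze by deleting every `ratio`-th word.

    `start` is the 1-based position of the first deletion; by default it
    equals `ratio`. Returns (gapped passage, list of deleted words).
    """
    if ratio < 2:
        raise ValueError("ratio must be at least 2")
    if start is None:
        start = ratio
    gapped, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position >= start and (position - start) % ratio == 0:
            core = word.rstrip(PUNCTUATION)      # the word to be restored
            answers.append(core)
            gapped.append(blank + word[len(core):])  # keep trailing punctuation
        else:
            gapped.append(word)
    return " ".join(gapped), answers

text = "The cat sat on the mat and looked out of the window at the birds in the garden."
passage, key = make_cloze(text, ratio=7)
print(key)                           # ['and', 'the']
print(make_cloze(text, ratio=6)[1])  # ['mat', 'window', 'garden']
```

Running the same passage with `ratio=6` and `ratio=7` blanks entirely different words, a small-scale view of the point Alderson (1979) made about fixed-ratio deletion.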
The standard cloze
Let us first look at the standard, fixed-ratio cloze. A popular way of exploring the validity of cloze tests in the 1970s was to correlate them with various other types of tests. In numerous studies the cloze correlated highly with 'integrative' tests such as dictation or composition writing and at a rather lower level with more 'discrete-point' tests of vocabulary, grammar and phonology.
The rational cloze
Although Oller has consistently favored the standard fixed-ratio format as the most valid form of the cloze procedure for assessing second language proficiency, other scholars have argued for a more selective approach to the deletion of words from the text. In his research, Alderson (1979) found that a single text could produce quite different tests depending on whether you deleted, say, every eighth word rather than every sixth.
The multiple-choice cloze
One variation is to use multiple-choice items in a cloze test rather than the standard blanks to be filled in. Porter (1976) and Ozete (1977) argued that the standard format requires writing ability, whereas the multiple-choice version makes it more a measure of reading comprehension. Jonz (1976) pointed out that a multiple-choice cloze could be marked more objectively because it controlled the range of responses that the test-takers could give. In addition, he considered that providing response options made the test more student-centered, or 'learner-friendly' as we might say these days.
The C-test
At first glance the C-test, in which a series of short texts are prepared for testing by deleting the second half of every second word, may seem to be the version of the cloze procedure that is the least promising as a specific measure of vocabulary. For one thing, its creators intended that it should assess general proficiency in the language, particularly for selection and placement purposes, and that the deletions should be a representative sample of all the elements in the text (Klein-Braley, 1985; 1997: 63-66).
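The deletion rule itself can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Klein-Braley's specification: words are split on whitespace, the "half" deleted from an odd-length word is taken to be the larger one, one-letter words are left intact, and the rule is applied to the whole passage. The function names are hypothetical.

```python
PUNCTUATION = ".,;:!?\"')"

def damage_word(word):
    """Keep the first half of a word and replace the rest with underscores.

    Assumption: for odd-length words the larger half is deleted,
    so 'skill' (5 letters) becomes 'sk___'. Returns (damaged, deleted part).
    """
    keep = len(word) // 2
    return word[:keep] + "_" * (len(word) - keep), word[keep:]

def make_c_test(text):
    """Delete the second half of every second word in a short text.

    Returns the damaged text and the deleted word endings (the answer key).
    Trailing punctuation stays visible; one-letter words are left intact.
    """
    damaged, answers = [], []
    for index, word in enumerate(text.split()):
        core = word.rstrip(PUNCTUATION)
        trailing = word[len(core):]
        if index % 2 == 1 and len(core) >= 2:  # the 2nd, 4th, 6th ... word
            gapped, ending = damage_word(core)
            damaged.append(gapped + trailing)
            answers.append(ending)
        else:
            damaged.append(word)
    return " ".join(damaged), answers

passage, key = make_c_test("Reading is a skill that learners develop over many years.")
print(passage)  # Reading i_ a sk___ that lear____ develop ov__ many ye___.
print(key)      # ['s', 'ill', 'ners', 'er', 'ars']
```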
CHAPTER FIVE
Vocabulary Tests: Four case studies
The four tests are:
1. The Vocabulary Levels Test;
2. The Eurocentres Vocabulary Size Test (EVST);
3. The Vocabulary Knowledge Scale (VKS); and
4. The Test of English as a Foreign Language (TOEFL).
The Vocabulary Levels Test
The Vocabulary Levels Test was devised by Paul Nation at Victoria University of Wellington in New Zealand in the early 1980s as a simple instrument for classroom use by teachers, to help them develop a suitable vocabulary teaching and learning programme for their students.
The design of the test
The test is in five parts, representing five levels of word frequency in English: the first 2000 words, 3000 words, 5000 words, the University Word Level (beyond 5000 words) and 10,000 words.
According to Nation (1990: 261), the 2000- and 3000-word levels contain the high-frequency words that all learners need to know in order to function effectively in English.
• Validation
• New versions
The Eurocentres Vocabulary Size Test
Like the Vocabulary Levels Test, the Eurocentres Vocabulary Size Test (EVST) makes an estimate of a learner's vocabulary size using a graded sample of words covering numerous frequency levels. Another distinctive feature of the EVST is that it is administered by computer rather than as a pen-and-paper test. Let us now look at the test from two perspectives: first as a placement instrument and then as a measure of vocabulary size.
• The EVST as a placement test
• The EVST as a measure of vocabulary size
The Vocabulary Knowledge Scale
Evaluation of the instrument
Paribakht and Wesche have been careful to make modest claims for their instrument: 'Its purpose is not to estimate general vocabulary knowledge, but rather to track the early development of specific words in an instructional or experimental situation' (Wesche and Paribakht, 1996: 33). They have obtained various kinds of evidence in their research for its reliability and validity as a measure of incidental vocabulary acquisition (Wesche and Paribakht, 1996: 31-33).
The Test of English as a Foreign Language
The Test of English as a Foreign Language, or TOEFL, is administered in 180 countries and territories to more than 900,000 candidates. Like other tests produced by the Educational Testing Service (ETS), TOEFL relies on sophisticated statistical analyses and testing technology in order to ensure its quality as a measuring instrument and its efficient administration to such large numbers of test-takers. Until recently, all the items in the basic TOEFL test were of the multiple-choice type.
• The original vocabulary items
From its beginning in 1964 until the mid-1970s, TOEFL consisted of five sections: listening comprehension, English structure, vocabulary, reading comprehension and writing ability. There were two types of vocabulary-test item, which were labelled sentence completion and synonym matching.
• The 1976 revision
In his study involving students in Peru, Chile and Japan, Pike found that, although the existing Vocabulary section of the test correlated highly (at 0.88 to 0.95) with the Reading Comprehension section, the new Words in Context items had even higher correlations (0.94 to 0.99) with the reading section of the experimental test.
• Towards more contextualised testing
At a conference convened by the TOEFL Program in 1984, a number of applied linguists were invited to present critical reviews of the extent to which TOEFL could be considered a measure of communicative competence.
• Vocabulary in the 1995 version
The results of this in-house research provided support for recommendations from the TOEFL Committee of Examiners, an advisory body of scholars from outside ETS, that vocabulary knowledge should be assessed in a more integrative manner.
• The current situation
The latest development in the story occurred in 1998, with the introduction of a computerised version of TOEFL. In most countries candidates now take the test at an individual testing station, sitting at a computer and using the mouse to record their responses to the items presented to them on the screen. For the reading test, the passages appear in a frame on the left side of the screen and the test items are shown one by one on the right side. Vocabulary items have been retained but in a different form from before.
CHAPTER SIX
The design of discrete vocabulary tests
Introduction
The discussion of vocabulary-test design in the first part of this chapter is based on the framework for language-test development presented in Bachman and Palmer's (1996) book Language Testing in Practice. Since the full framework is too complex to cover here, I have chosen certain key steps in the test-development process as the basis for a discussion of important issues in the design of discrete vocabulary tests in particular. In the second part of the chapter, I offer a practical perspective on the development of vocabulary tests by means of two examples.
Test Purpose
Following Bachman and Palmer's framework, an essential first step in language-test design is to define the purpose of the test. It is important to clarify what the test will be used for because, according to testing theory, a test is valid to the extent that we are justified in drawing conclusions from its results.
We can identify three uses for language tests: for research, for making decisions about learners and for making decisions about language programmes.
Construct definition
Bachman and Palmer (1996: 117-120) state that there are two approaches to construct definition: syllabus-based and theory-based. A syllabus-based definition is appropriate when vocabulary assessment takes place within a course of study, so that the lexical items and the vocabulary skills to be assessed can be specified in relation to the learning objectives of the course.
Receptive and productive vocabulary
This distinction between receptive and productive vocabulary is one that is accepted by scholars working on both first and second language vocabulary development, and it is often referred to by the alternative terms passive and active. As Melka (1997) points out, though, there are still basic problems in conceptualising and measuring the two types of vocabulary, in spite of a lengthy history of research on the subject.
Characteristics of the test input
The design of test tasks is the next step in test development, according to Bachman and Palmer's model. In this chapter, I focus on just two aspects of task design: characteristics of the input and characteristics of the expected response.
Selection of target words
Based on such findings, Nation (1990: Chapter 2) proposes that, for teaching and learning purposes, a broad three-way division can be made into high-frequency, low-frequency and specialised vocabulary. The high-frequency category in English consists of 2000 word families, which form the foundation of the vocabulary knowledge that all proficient users of the language must have acquired. On the other hand, low-frequency vocabulary as a whole is of much less value to learners.
Presentation of words
- Words in isolation
As with other decisions in test design, the question of how to present selected words to the test-takers needs to be related to the purpose of the assessment.
- Words in context
For other purposes, the presentation of target words in some context is desirable or necessary. In discrete, selective tests, the context most commonly consists of a sentence in which the target word occurs, but it can also be a paragraph or a longer text containing a whole series of target words.
Characteristics of the expected response
- Self-report vs. verifiable response
In some testing situations, it is appropriate to ask the learners to assess their own lexical knowledge. In Chapter 5, we saw how the EVST and the VKS draw on self-report by the test-takers, although both instruments also incorporate a way of checking how valid the responses are as measures of vocabulary knowledge.
- Monolingual vs. bilingual testing
The last design consideration concerns the language of the test itself. Whereas in a monolingual test format only the target language is used, a bilingual one employs both the target language (L2) and the learners' own language (L1).
Practical examples
Classroom progress tests
The purpose of my class tests is generally to assess the learners' progress in vocabulary learning and, more specifically, to give them an incentive to keep studying vocabulary on a regular basis.
Matching items
There are some aspects of the design of this item type which are worth noting:
- The reason for adding one or two extra definitions is to avoid a situation where the learner knows four of the target words and can then get the fifth definition correct by process of elimination, without actually knowing what the word means.
- Assuming that the focus of the test is on knowledge of the target words, the definitions should be easy for the learners to understand. Thus, as a general principle, they should be composed of higher-frequency vocabulary than the words to be tested and should not be written in an elliptical style that causes comprehension problems.
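The reasoning in the first point can be made concrete. Suppose a learner matches every known word correctly and then assigns any remaining words at random to the unused definitions; the chance of getting an unknown word right is then one divided by the number of unused definitions. The Python sketch below (the function name is hypothetical) shows how one or two extra definitions turn a free point into a guess.

```python
from fractions import Fraction

def chance_unknown_word_correct(n_words, n_definitions, n_known):
    """Chance of matching an unknown word correctly in a matching item.

    Assumes the learner first matches every known word correctly and then
    assigns the unknown words at random to the definitions still unused.
    """
    if n_definitions < n_words:
        raise ValueError("need at least as many definitions as target words")
    if not 0 <= n_known < n_words:
        raise ValueError("n_known must leave at least one unknown word")
    unused_definitions = n_definitions - n_known
    return Fraction(1, unused_definitions)

for extra in (0, 1, 2):
    print(extra, chance_unknown_word_correct(5, 5 + extra, 4))
# 0 1
# 1 1/2
# 2 1/3
```

With five words, five definitions and four words known, the last match is certain; one extra definition halves that chance and two reduce it to a third.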
Completion items
Completion, or blank-filling, items consist of a sentence from which the target word has been deleted and replaced by a blank. As in the contextualised matching format above, the function of the sentence is to provide a context for the word and perhaps to cue a particular use of it.
Generic test items
In an individualised vocabulary programme, these generic items offer a practical alternative to having separate tests for each learner in the class. The same item types could also be used more conventionally, with target words provided by the teacher, in a class where the learners have all studied the same vocabulary.
The word-associates test
The new starting point was the concept of word association. The standard word-association task involves presenting subjects with a set of stimulus words one by one and asking them to say the first related word that comes into their head.
CHAPTER SEVEN
Comprehensive measures of vocabulary
Introduction
Comprehensive measures are particularly suitable for assessment procedures in which vocabulary is embedded as one component of the measurement of a larger construct, such as communicative competence in speaking, academic writing ability or listening comprehension. However, we cannot simply say that all comprehensive measures are embedded ones, because they can also be used on a discrete basis.
Measures of test input
In reading and listening tests we have to be concerned about the nature of the input text. At least two questions can be asked:
- Is it at a suitable level of difficulty that matches the ability range of the test-takers?
- Does it have the characteristics of an authentic text, especially if it has been produced or modified for use in the test?
Here we are specifically interested in the extent to which information about the vocabulary of the text can help to provide answers to these questions.
Readability
In L1 reading research, the basic concept used in the analysis of texts is readability, which refers to the various aspects of a text that are likely to make it easy or difficult for a reader to understand and enjoy. During the twentieth century a whole educational enterprise grew up in the United States devoted to devising and applying formulas to predict the readability of English texts for native-speaker readers in terms of school grade or age levels (for a comprehensive review, see Klare, 1984).
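As one example of the kind of formula mentioned here (the passage does not name a specific one), the Flesch-Kincaid grade level estimates a US school grade as 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The Python sketch below applies that formula to counts supplied by the caller; the function name and the counts in the example are hypothetical, and syllable counting, the error-prone part in practice, is deliberately left out.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level computed from raw counts.

    Counting syllables reliably is left to the caller; this function
    only applies the formula.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Hypothetical counts: a 100-word passage with 5 sentences and 140 syllables.
print(round(flesch_kincaid_grade(100, 5, 140), 2))  # 8.73
```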
Listenability of spoken texts
Much more work has been done on the comprehensibility of written texts than of spoken language. Whereas the term readability is now very well established, its oral equivalent, listenability, has had only limited currency. However, it seems reasonable to expect that spoken texts used for the assessment of listening comprehension vary in the demands that they place on listeners in ways comparable to the demands made of readers by different kinds of written language.
Measures of learner production
Most of this section is concerned with statistical measures of writing, because there is more published research on that topic, but I also consider measures of spoken production, as well as the more qualitative approach to assessment represented by the use of rating scales to judge performance.
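One simple statistic of this kind is the type-token ratio, a measure of lexical variation: the number of different word forms (types) divided by the total number of running words (tokens). The Python sketch below uses a deliberately crude tokeniser and is an illustration rather than a standard implementation. The ratio tends to fall as a text gets longer, so it is only comparable across samples of similar length.

```python
import re

def type_token_ratio(text):
    """Distinct word forms (types) divided by running words (tokens).

    Tokenisation is deliberately crude: lower-cased alphabetic strings,
    keeping internal apostrophes (e.g. "don't").
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)

sample = "The cat saw the dog and the dog saw the cat."
print(round(type_token_ratio(sample), 3))  # 0.455  (5 types / 11 tokens)
```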
CHAPTER EIGHT
Further development in vocabulary assessment
Introduction
In the main body of this chapter, I want to review some current areas of work on second language vocabulary, which will provide additional evidence for my view that a wider perspective is required, and then explore the implications for further developments in vocabulary assessment in the future.
The identification of lexical units
One basic requirement for any work on vocabulary is good-quality information about the units that we are dealing with. In this section of the chapter, I first review the current state of word-frequency lists and then take up the question of how we might deal with multi-word lexical items in vocabulary assessment.
The vocabulary of informal speech
The vocabulary of speech is the second area of vocabulary study that has received less attention than it should have, as indicated by the fact that perhaps the most frequently cited research study is still the one conducted by Schonell et al. (1956) on the spoken vocabulary of Australian workers.
The social dimension of vocabulary use
In addition, vocabulary knowledge and use are typically thought of in psycholinguistic terms, which minimises the existence of social variation among learners, apart from the fact that they undertake various courses of study, pursue different careers and have a range of personal interests.
For assessment purposes, the education domain is obviously an area of major concern, especially when there is evidence that learners from particular social backgrounds lack the opportunity to acquire the vocabulary they need for academic study.
References:
Purpura, James E. 2004. Assessing Grammar. United Kingdom: Cambridge University Press.
Read, John. 2000. Assessing Vocabulary. United Kingdom: Cambridge University Press.
