Language Assessment

Jumat, 15 Mei 2020

Assignment 11
Summary Assessing Grammar and Assessing Vocabulary

Summary Assessing Grammar from book language assessment by James E. Purpura

CHAPTER ONE

Differing notions of ‘grammar’ for Assessment

Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.

What is meant by ‘grammar’ in theories of language?

Grammar and linguistics

This is important given the different definitions and conceptualizations of grammar that have been proposed over the years, and the diverse ways in which these notions of grammars have influenced L2 educators.

When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language.

Form-based perspectives of language

Several syntactocentric, or form-based, theories of language have provided grammatical insights to L2 teachers. There are three: traditional grammar, structural linguistics and transformational-generative grammar.

Form- and use-based perspectives of language

The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights provide information to explain what structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to those theories that study grammatical form and use interfaces.

Communication-based perspectives of language

Other theories have provided grammatical insights from a communication- based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within natural-occurring corpora.

What is pedagogical grammar?

A pedagogical grammar represents an eclectic, but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex, linguistic phenomena for pedagogical purposes. The more L2 teachers understand how the grammatical system works, the better they will be able to tailor this information to their specific instructional contexts.

CHAPTER TWO

Research on L2 grammar teaching, learning and assessment

I will discuss the research on L2 grammar teaching and learning and show how this research has important insights for language teachers and testers wanting to assess L2 grammatical ability. Similarly, I will discuss the critical role that assessment has played in empirical inquiry on L2 grammar teaching and learning.

Research on L2 teaching and learning

Over the years, several of the questions mentioned above have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine if students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students

The SLA research looking at the role of grammar instruction in SLA might be categorized into three strands. One set of studies has looked at the relationship between the acquisition of L2 grammatical knowledge and different language-teaching methods. These are referred to as the comparative methods studies. A second set of studies has examined the acquisition of L2 grammatical knowledge through what Long and Robinson (1998) call a ‘non-interventionist’ approach to instruction. These studies have examined the degree to which grammatical ability could be acquired incidentally (while doing something else) or implicitly (without awareness), and not through explicit (with awareness) grammar instruction. A third set of studies has investigated the relationship between explicit grammar instruction and the acquisition of L2 grammatical ability. These are referred to as the interventionist studies, and are a topic of particular interest to language teachers and testers.

Comparative methods studies

The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century.

Non-interventionist studies

While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum.

Empirical studies in support of non-intervention

The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, Rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication.

Possible implications of fixed developmental order to language assessment

The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. First of all, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage.

Problems with the use of development sequences as a basis for assessment

Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing.

Interventionist studies

Not all L2 educators are in agreement with the non-interventionist position to grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain,1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so.

Empirical studies in support of intervention

A side from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds with several SLA researchers affirming that efforts to teach L2 grammar typically results in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction.

Research on instructional techniques and their effects on acquisition

Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).

Grammar processing and second language development

In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (i.e., gap-filling tasks) seem to measure explicit grammatical knowledge.

Implicit grammatical knowledge

refers to ‘the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). In terms of processing time, it is unconscious and is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it does not involve rule presentation or a request to focus on form in the input; rather, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form.

Implications for assessing grammar

The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.

CHAPTER THREE

The role of grammar in models of communicative language ability

In this chapter I will discuss the role that grammar plays in models of communicative competence. I will then endeavor to define grammar for assessment purposes. In this discussion I will describe in some detail the relationships among grammatical form, grammatical meaning and pragmatic meaning. Finally, I will present a theoretical model of grammar that will be used in this book as a basis for a model of grammatical knowledge

The role of grammar in models of communicative competence

In sum, many different models of communicative competence have emerged over the years. The more recent depictions have presented much broader conceptualizations of communicative language ability;

What is meant by ‘grammar’ for assessment purposes?

Now with a better understanding of how grammar has been conceptualized in models of language ability, how might we define ‘grammar’ for assessment purposes? It should be obvious from the previous discussion that there is no one ‘right’ way to define grammar. In one testing situation the assessment goal might be to obtain information on students’ knowledge of linguistic forms in minimally contextualized sentences, while in another, it might be to determine how well learners can use linguistic forms to express a wide range of communicative meanings. Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes.

CHAPTER FOUR

Towards a definition of grammatical ability

What is meant by grammatical ability?

Having described how grammar has been conceptualized, we are now faced with the challenge of defining what it means to ‘know’ the grammar of a language so that it can be used to achieve some communicative goal. In other words, what does it mean to have ‘grammatical ability’?

Defining grammatical constructs

A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability. The many possible ways of interpreting what it means to ‘know grammar’ or to have ‘grammatical ability’ highlight the importance in language assessment of defining key terms. Some of the same terms used by different testers reflect a wide range of theoretical positions in the field of applied linguistics. In this book, I will use several theoretical terms from the domain of language testing. These include knowledge, competence, ability, proficiency and performance, to name a few. These concepts are abstract, not directly observable in tests and open to multiple definitions and interpretations. Therefore, before we use abstract terms such as knowledge or ability, we need to ‘construct’ a definition of them that will both suit our assessment goals and be theoretically viable. I will refer to these abstract, theoretical concepts generically as constructs or theoretical constructs.

Definition of key terms

Before continuing this discussion, it might be helpful if I clarified some of the key terms.

Grammatical knowledge

Knowledge refers to a set of informational structures that are built up through experience and stored in long-term memory.

Grammatical ability

Although some researchers have defined knowledge and ability similarly, I use these terms differently. ‘Knowledge’ refers to a set of informational structures available for use in long-term memory.

Grammatical ability is, then, the combination of grammatical knowledge and strategic competence; it is specifically defined as the capacity to realize grammatical knowledge accurately and meaningfully in testing or other language-use situations. Hymes (1972) distinguished between competence and performance, stating that communicative competence includes the underlying potential of realizing language ability in instances of language use, whereas language performance refers to the use of language in actual language events.

Grammatical performance

grammatical performance is defined as the observable manifestation of grammatical ability in language use. In grammatical performance, the underlying grammatical ability of a test-taker may be masked by interactions with other attributes of the examinee or the test task

Metalinguistic knowledge

Finally, metalanguage is the language used to describe a language. It generally consists of technical linguistic or grammatical terms (e.g., noun,verb).

What is ‘grammatical ability’ for assessment purposes?

The approach to the assessment of grammatical ability in this book is based on several specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate, but related, component of language. A second is that grammatical knowledge, along with strategic competence, constitutes grammatical ability. A third is that grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts.

Knowledge of phonological or graphological form and meaning

Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system, with the exception of meaning-based orthographies such as Chinese characters, as they are used to convey meaning in testing or language-use situations. Phonological form includes the segmentals (i.e., vowels and consonants) and prosody (i.e., stress, rhythm, intonation contours, volume, tempo).

Knowledge of lexical form and meaning

Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), count ability (e.g., people) or part of speech (e.g., relate, relation).

Knowledge of lexical meaning allows us to interpret and use words based on their literal meanings. Lexical meaning here does not encompass the suggested or implied meanings of words based on contextual, sociocultural, psychological or rhetorical associations. Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language.

Knowledge of cohesive form and meaning

Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and Express cohesion on both the sentence and the discourse levels.

Knowledge of information management form and meaning

Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures.

Knowledge of interactional form and meaning

Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions. For example, well . . . can signal disagreement, ya know or ah-huh can signal shared knowledge, and by the way can signal topic diversion. Conversation-management strategies include a wide range of linguistic forms that serve to facilitate smooth interaction or to repair interaction when communication breaks down.

CHAPTER FIVE

Designing test tasks to measure L2 grammatical ability

How does test development begin? Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996).

What do we mean by "task"?

The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the intent purpose of learning the target language. A task then is any activity (i.e., short answers, role-plays) as long as it involves a linguistic or non-linguistic (circle the answer) response to input.

What are the characteristics of grammatical test tasks?

As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests.

Describing grammar test tasks

For grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. In designing grammar tests, we need to be familiar with a wide range of activities to elicit grammatical performance.

Selected-response task types

Selected-response tasks present input in the form of an item, and test-takers are expected to select the response. Other than that, all other task characteristics can vary. However, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.

The multiple-choice (MC) task
Multiple-choice error identification task
The discrimination task
The noticing task

Limited-production task types

Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.

In other situations, limited-production tasks can be scored with a holistic or analytic rating scale. This method is useful if we wish to judge distinct aspects of grammatical ability with different levels of ability or mastery.

The gap-filling task
The short-answer task
The dialogue (or discourse) completion task (DCT)
Extended-production tasks
The information-gap task (info-gap)
The role-play and simulation tasks

CHAPTER SIX

Developing tests to measure L2 grammatical ability

The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts.

The quality of reliability

When we talk about ‘reliability’ in reference to a car, we all know what that means. A car is said to be reliable if it readily starts up every time we want to use it regardless of the weather, the time of day or the user. It is also considered reliable if the brakes never fail, and the steering is consistently responsive. These mechanical functions, working together, make the car’s performance anywhere from zero to one hundred percent reliable.

The quality of construct validity

The second quality that all ‘useful’ tests possess is construct validity. Bachman and Palmer (1996) define construct validity as ‘the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure. Construct validity also has to do with the domain of generalization to which our score interpretations generalize’ (p. 21). Construct validity is clearly one of the most important qualities a test can possess. It tells us if we are measuring what we had intended to measure

The quality of authenticity

A third quality of test usefulness is authenticity, a notion much discussed in language testing since the late 1970s, when communicative approaches to language teaching were first taking root. Building on these discussions, Bachman and Palmer (1996) refer to ‘authenticity’ as the degree of correspondence between the test-task characteristics and the TLU task characteristics.

The quality of Interactiveness

A fourth quality of test usefulness outlined by Bachman and Palmer (1996) is interactiveness. This quality refers to the degree to which the aspects of the test-taker’s language ability we want to measure (e.g., grammatical knowledge, language knowledge) are engaged by the test-task characteristics (e.g, the input response, and relationship between the input and response) based on the test constructs.

The quality of impact

Bachman and Palmer (1996) refer to the degree to which testing and test score decisions influence all aspects of society and the individuals within that society as test impact. Therefore, impact refers to the link between the inferences we make from scores and the decisions we make based on these interpretations. In terms of impact, most educators would agree that tests should promote positive test-taker experiences leading to positive attitudes (e.g., a feeling of accomplishment) and actions (e.g., studying hard).

The quality of practicality

Test practicality is not a quality of a test itself, but is a function of the extent to which we are able to balance the costs associated with designing, developing, administering, and scoring a test in light of the available resources (Bachman, personal communication, 2002).

Overview of grammar-test construction

As a result, there is no one ‘right’ way to develop a test; nor are there any recipes for ‘good’ tests that could generalize to all situations. Test development is often presented as a linear process consisting of a number of stages and steps. In reality, the process is anything but linear.

Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development. According to Bachman and Palmer (1996, p. 88), this document should contain the following components:

• Purpose
• TLU domains and tasks
• Characteristics of test-takers
• Construct(s) to be measured
• Plan for evaluating usefulness
• Plan for managing resources

Specifying the scoring method

Scoring method provides an explicit description of the criteria for correctness and the exact procedures for scoring the response. Generally speaking, tasks can be scored objectively, where the scorer does not need to make any expert judgments in determining if the answers are correct, or subjectively, where expert judgment is needed to judge performance.

• Scoring selected-response tasks
• Scoring limited-production tasks
• Scoring extended-production tasks
• Using scoring rubrics
• Grading

CHAPTER SEVEN

Illustrative tests of grammatical ability

In this chapter I will examine several examples of professionally developed language tests that measure grammatical ability. Some of these tests contain separate sections that are exclusively devoted to the assessment of grammatical ability, while others measure grammatical knowledge along with other components of language ability in the context of language use – that is while test-takers are listening, speaking, reading or writing.

The First Certificate in English Language Test (FCE)

Purpose

The First Certificate in English (FCE) exam was first developed by the University of Cambridge Local Examinations Syndicate (UCLES, now Cambridge ESOL) in 1939 and has been revised periodically ever since. The purpose of the FCE (Cambridge ESOL, 2001a) is to assess the general English language proficiency of learners as measured by their abilities in reading, writing, speaking, listening, and knowledge of the lexical and grammatical systems of English (Cambridge ESOL, 1995, p. 4).

Construct definition and operationalization

According to the FCE Handbook (Cambridge ESOL, 2001a), the Use of English paper is designed to measure the test-takers’ ability to ‘demonstrate their knowledge and control of the language system by completing a number of tasks, some of which are based on specially written texts’ (p. 7).

Measuring grammatical ability through language use

In addition to measuring grammatical ability in the Use of English paper of the test, the FCE measures grammatical ability in the writing and speaking sections. Language use in the writing paper is measured in the contexts of writing letters, articles, reports and compositions (Cambridge ESOL, 2001a, p. 7).

The FCE and the qualities of test usefulness

The qualities of test usefulness, the FCE clearly gives priority to construct validity, especially as this relates to the measurement of grammatical ability as one component of English language proficiency. The purpose and uses of the FCE, the establishment of a discrete, empirical relationship between the target language use tasks and the test tasks in the Use of English paper of the test is difficult to determine from the published literature.

The Comprehensive English Language Test (CELT)

Purpose The Comprehensive English Language Test (CELT) (Harris and Palmer, 1970a, 1986) was designed to measure the English language ability of nonnative speakers of English..

Construct definition and operationalization

According to the CELT Technical Manual (Harris and Palmer, 1970b), the structure subtest is intended to measure the students’ ‘ability to manipulate the grammatical structures occurring in spoken English.

The CELT and the qualities of test usefulness

In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of non native speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness.

Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.

The Community English Program (CEP) Placement Test

Purpose

The Community English Program (CEP) Placement Test was first developed by students and faculty in the TESOL and Applied Linguistics Programs at Teachers College, Columbia University, in 2002, and is revised regularly. Unlike the previous tests reviewed, the CEP Placement Testis a theme-based assessment designed to measure the communicative language ability of learners entering the Community English Program, a low-cost, adult ESL

Construct definition and operationalization

Given that the CEP is a theme-based ESL program, where language instruction is contextualized within a number of different themes throughout the different levels, the CEP Placement Test is also theme-based. The theme for the CEP Placement Test under review is ‘Cooperation and Competition’. This is not one of the themes students encounter in the program. In this test, all five test sections assess different aspects of language ability while exposing examinees to different aspects of the theme.

Measuring grammatical ability through language use

In addition to measuring grammatical ability in the grammar section, grammatical ability is also measured in the writing and speaking sections of the test. The writing section consists of one 30-minute essay to bewritten on the theme of ‘cooperation and competition’.

The CEP Placement Test and the qualities of test usefulness

In terms of the qualities of test usefulness, the developers of the grammar section of the CEP Placement Test prioritize construct validity, reliability and practicality. With regard to construct validity, the grammar section of this test was designed to measure both grammatical form and meaning on the sentential and discourse levels, sampling from a wide range of grammatical features. In this test, grammatical ability is measured by means of four tasks in the grammar section, one task in the writing section, and by several tasks in the speaking section. In short, the CEP. Placement Test measures both explicit and implicit knowledge of grammar. Placement decisions based on interpretations of the CEP Placement Test scores seem to be appropriate as only a handful of misplacements are reported each term.

The reliability of the grammar-test scores was also considered a priority from the design stage of test development as seen in the procedures for item development, test piloting and scoring. In an effort to promote consistency (and quick return of the results), the CEP Placement Test developers decided to use only multiple-choice tasks in the grammar section. This decision was based on the results of the pilot tests, where the use of limited production grammar tasks showed inconsistent scoring results and put a strain on time resources.

CHAPTER EIGHT

Learning-oriented assessments of grammatical ability

Introduction

The language tests reviewed in the previous chapter involved the grammar sections from large-scale tests designed to measure global language proficiency, typically for academic purposes. Like other large-scale and often high-stakes tests, they were designed to make institutional decisions related to placement into or exit from a language program, screening for language proficiency or reclassification of school status based on whether a student had achieved the language skills necessary to benefit from instruction in the target language. These tests provide assessments for several components of language ability including, among others, aspects of grammatical knowledge.

What is learning-oriented assessment of grammar?

In reaction to conventional testing practices typified by large-scale, discrete-point, multiple-choice tests of language ability, several educators (e.g., Herman, Aschbacher and Winters, 1992; Short, 1993; Shohamy, 1995; Shepard, 2000) have advocated reforms so that assessment practices might better capture educational outcomes and might be more consistent with classroom goals, curricula and instruction. The termalternative assessment, authentic assessment and performance assessment have all been associated with calls for reform to both large-scale and classroom assessment contexts. While alternative, authentic and performance assessment are all viewed to be essentially the same, they emphasize slightly different aspects of a move away from conventional, discrete-point, standardized assessment.

Alternative assessment emphasizes an alternative to and rejection of selected-response, timed and one shot approaches to assessment, whether they occur in large scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992).

Implementing learning-oriented assessment of grammar

Considerations from grammar-testing theory

The development procedures for constructing large-scale assessments of grammatical ability discussed in Chapter 6 are similar to those needed to develop learning-oriented assessments of grammar for classroom purposes with the exception that the decisions made from classroom assessments will be somewhat different due to the learning-oriented mandate of classroom assessment. Also, given the usual low-stakes nature of the decisions in classroom assessment, the amount of resources that needs to be expended is generally less than that required for large-scale assessment.

Implications for test design In designing classroom-based, learning-oriented assessments, we need to provide a much more explicit depiction of the assessment mandate than we might do for large-scale assessments. This is because classroom assessment, especially in school contexts, has many interested stakeholders (e.g. students, teachers, parents, tutors, principals, school districts), who are likely to be held accountable for learning and who will use the assessment information to evaluate instructional outcomes and plan for further instruction.

Implications for operationalization

The operationalization stage of classroom-based, learning-oriented assessment is also similar to that of large-scale assessments. That is, the outcome should be a blueprint for the assessment, as described in Chapter 6. The learning mandate, however, will obviously affect the specification of test tasks so that characteristics such as the setting, the rubrics or the expected response can be better aligned with instructional goals. For example, in classroom-based assessment, we may wish to collect information about grammar ability during the course of instruction, and we may decide to evaluate performance by means of teacher observationreports, or we may wish to assess grammatical ability by means of informal oral interviews conducted over several day.

Learning-oriented assessment of grammar may be achieved by means of a wide array of data-gathering methods in classroom contexts. These obviously include conventional quizzes and tests containing selected response, limited-production and all sorts of extended-production tasks, as discussed earlier. These conventional methods provide achievement or diagnostic information to test-users, and can occur before, during or after instruction, depending on the assessment goals. They are often viewed as ‘separate’ from instruction in terms of their administration. These assessments are what most teachers typically call to mind when they think of classroom tests.

Planning for further learning

The usefulness of learning-oriented, classroom assessment is to a great extent predicated upon the quality and explicitness of information obtained and its relevance for further action. Research has shown, however, that the quality of feedback contributes more to further learning than the actual presence or absence of it (Bangert-Downs et al., 1991).

Teachers have many options for presenting assessment results to students. They could present students with feedback in the form of a single overall test score, a score for each test component, scores referenced to a rubric, a narrative summary of teacher observations or a profile of scores showing development over time. Feedback can also be presented in a private conference with the individual student. In an effort to understand the effect of feedback on further learning, Butler (1987) presented test takers with feedback from an assessment in one of three forms: (1) focused written comments that addressed criteria test-takers were aware of before the assessment; (2) grades derived from numerical scoring; and (3) grades and comments. Test-takers were then given two subsequent tasks, and significant gains were observed with those who received thedetailed coments.

Considerations from L2 learning theory

Given that learning-oriented assessment involves the collection and interpretation of evidence about performance so that judgments can be made about further language development, learning-oriented assessment of grammar needs to be rooted not only in a theory of grammar testing or language proficiency, but also in a theory of L2 learning. What is striking in the literature is that models of language ability rarely refer to models of language learning, and models of language learning rarely make reference to models of language ability.

As we have seen, implementing grammar assessment with a learning mandate has implications for test construction. Some of these implications have already been discussed. However, implementing learning oriented assessment of grammar is not only about task design and operationalization, teachers also need to consider how assessment relates to and can help promote grammar acquisition, as described by Van Patten (1996).

SLA processes – briefly revisited

As discussed in Chapter 2, research in SLA suggests that learning an L2 involves three simultaneously occurring processes: input processing (Van Patten, 1996), system change (Schmidt, 1990) and output processing (Swain, 1985; Lee and Van Patten, 2003). Input processing relates to how the learner understands the meaning of a new grammatical feature or how form–meaning connections are made (Ellis, 1993; Van Patten, 1996). A critical first stage of acquisition is the conversion of input to ‘intake’. The second set of processes, system change, refers to how learners accommodate new grammatical forms into their inter language and how this change helps restructure their inter language so that it is more target like (McLaughlin, 1987; De Keyser, 1998).

Assessing for intake

Van Patten and Cadierno (1993b) describe this critical first stage of acquisition as the process of converting input into ‘intake’. In language classrooms, considerable time is spent on determining if students have understood. As most teachers know, however, it is difficult to discern if their students have mapped meaning onto the form.

Assessing for intake requires that learners understand the target forms, but do not produce them themselves. This can be achieved by selected response and limited-production tasks in which learners need to make form meaning connections. Three examples of interpretation tasks designed to assess for intake are presented below. (For additional examples of interpretation tasks, see Ellis, 1997; Lee and Van Patten, 2003; and Van Patten, 1996, 2003.).

Assessing to push restructuring

Once input has been converted into intake, the new grammatical feature is ready to be ‘accommodated’ into the learner’s developing linguistic system, causing a restructuring of the entire system (Van Patten, 1996). To initiate this process, teachers provide students with tasks that enable them to use the new grammatical forms in decreasingly controlled situations so they can incorporate these forms into their existing system of implicit grammatical knowledge.

Assessing for output processing

Although learners may have developed
an explicit knowledge of the form and meaning of a new grammatical point, this does not necessarily mean they can access this knowledge automatically in spontaneous communication. In order for learners to produce unplanned, meaningful output in real time (i.e., speaking), they need to be able to tap into grammatical knowledge that is already an unconscious part of their developing system of language knowledge (Lee and VanPatten, 2003). Thus, to assess the test takers’ implicit knowledge of grammar (i.e., their ability to process output), test-takers need to be presented with tasks that ask them to produce language in real time, where the focus is more on the content being communicated or on the completion of the task than on the application of explicit grammar rules.

Illustrative of learning-oriented assessment

Let us now turn to an illustration of a learning-oriented achievement test of grammatical ability.

Making assessment learning-oriented

The On Target achievement tests were designed with a clear learning mandate. The content of the tests had to be strictly aligned with the content of the curriculum. This obviously had several implications for the test design and its operationalization. From a testing perspective, the primary purpose of the Unit 7 achievement test was to measure the students’ explicit as well as their implicit knowledge of grammatical form and meaning on both the sentence and discourse levels.

While the TLU domain was limited to the use of the present perfect tense to discuss life achievements, the constructs and tasks included in the test were both simple and complex. For example, the first gap-filling grammar task was intended only to assess the test-takers’ explicit knowledge of morphosyntactic form and the pronunciation task focused only on their explicit knowledge of phonological form. The second grammar task was slightly more complex in that it aimed to measure the test-takers’ ability to use these forms to communicate literal and intended meanings based on more extensive input.

CHAPTER NINE

Challenges and new directions inassessing grammatical ability

Introduction

Research and theory related to the teaching and learning of grammar have made significant advances over the years. In applied linguistics, our understanding of language has been vastly broadened with the work of corpus-based and communication-based approaches to language study, and this research has made pathways into recent pedagogical grammars.
Also, our conceptualization of language proficiency has shifted from an emphasis on linguistic form to one on communicative language ability and communicative language use, which has, in turn, led to a demphasis on grammatical accuracy and a greater concern for communicative effectiveness.

The state of grammar assessment

In the last fifty years, language testers have dedicated a great deal of time to discussing the nature of language proficiency and the testing of the four skills, the qualities of test usefulness (i.e., reliability, authenticity), the relationships between test-taker or task characteristics and performance, and numerous statistical procedures for examining data and providing evidence of test validity. In all of these discussions, very little has been said about the assessment of grammatical ability, and unsurprisingly, until recently, not much has changed since the 1960s.

In recent years, the assessment of grammatical ability has taken an interesting turn in certain situations. Grammatical ability has been assessed in the context of language use under the rubric of testing speaking or writing. This has led, in some cases, to examinations in which grammatical knowledge is no longer included as a separate and explicit component of communicative language ability in the form of a separate subtest. In other words, only the students’ implicit knowledge of grammar alongside other components of communicative language ability (e.g., topic, organization, register) is measured. Having discussed how grammar assessment has evolved over the years, I will discuss in the next section some ongoing issues and challenges associated with assessing grammar.

Challenge 1: Defining grammatical ability

One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism.

Challenge 2: Scoring grammatical ability

A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures.

Challenge 3: Assessing meanings

The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language.

Challenge 4: Reconsidering grammar-test tasks

The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching.

Challenge 5: Assessing the development of grammatical ability

The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind.

Final remarks

Despite loud claims in the 1970s and 1980s by a few influential SLA researchers that instruction, and in particular explicit grammar instruction, had no effect on language learning, most language teachers around the world never really gave up grammar teaching. Furthermore, these claims have instigated an explosion of empirical research in SLA, the results of which have made a compelling case for the effectiveness of certain types of both explicit and implicit grammar instruction. This research has also highlighted the important role that meaning plays in learning grammatical forms.

In the same way, most language teachers and SLA researchers around the world have never really given up grammar testing. Admittedly, some have been perplexed as to how grammar assessment could be compatible with a communicative language teaching agenda, and many have relied on assessment methods that do not necessarily meet the current standards of test construction and validation.

My aim in this book, therefore, has been to provide language teachers, language testers and SLA researchers with a practical framework, firmly based in research and theory, for the design, development and use of grammar assessments. I have tried to show how grammar plays a critical role in teaching, learning and assessment. I have also presented a model of grammatical knowledge, including both form and meaning, that could be used for test construction and validation. I then showed how L2 grammar tests can be constructed, scored and used to make decisions about test-takers in both large-scale and classroom contexts. Finally, in this last chapter, I have discussed some of the challenges we still face in constructing useful grammar assessments. My hope is that this volume will not only help language teachers, testers and SLA researchers develop better grammar assessments for their respective purposes, but instigate research and continued discussion on the assessment of grammatical ability and its role in language learning.

Summary Assessing Vocabulary from book Language Assessment by John Read

CHAPTER ONE

The Place Of Vocabulary In Language Assessment

Introduction

At first glance, it may seem that assessing the vocabulary knowledge of second language learners is both necessary and reasonably straightforward. It is necessary in the sense that words are the basic building blocks of language, the units of meaning from which larger structures such as sentences, paragraphs and whole texts are formed.
Vocabulary assessment seems straightforward in the sense that word lists are readily available to provide a basis for selecting a set of words to be tested. In addition, there is a range of well-known item types that are convenient to use for vocabulary testing.

Recent trends in language testing

However, scholars in the ®eld of language testing have a rather different perspective on vocabulary-test items of the conventional kind. Such items ®t neatly into what language testers call the discretepoint approach to testing. This involves designing tests to assess whether learners have knowledge of particular structural elements of the language: word meanings, word forms, sentence patterns, sound contrasts and so on..

Bachman and Palmer's (1996) book Language Testing in Practice, which is a comprehensive and in¯uential volume on language-test design and development. Following Bachman's (1990) earlier work, the authors see the purpose of language testing as being to allow us to make inferences about learners' language ability, which consists of two components. One is language knowledge and the other is strategic competence.

Three dimensions of vocabulary assessment

Up to this point, I have outlined two contrasting perspectives on the role of vocabulary in language assessment. One point of view is that it is perfectly sensible to write tests that measure whether learners know the meaning and usage of a set of words, taken as independent semantic units. The other view is that vocabulary must always be assessed in the context of a language-use task, where it interacts in a natural way with other components of language knowledge.

Discrete - embedded

The first dimension focuses on the construct which underlies the assessment instrument. In language testing, the term construct refers to the mental attribute or ability that a test is designed to measure
a discrete test takes vocabulary knowledge as a distinct construct, separated from other components of language competence.
an embedded vocabulary measure is one that contributes to the assessment of a larger construct. I have already given an example of such a measure, when I referred to Bachman and Palmer's task of writing a proposal for the improvement of university admissions procedures.

Selective - comprehensive

The second dimension concerns the range of vocabulary to be included in the assessment. A conventional vocabulary test is based on a set of target words selected by the test-writer, and the test-takers are assessed according to how well they demonstrate their knowledge of the meaning or use of those words. This is what I call a selective vocabulary measure. The target words may either be selected as individual words and then incorporated into separate test items, or alternatively the test-writer first chooses a suitable text and then uses certain words from it as the basis for the vocabulary assessment.

Context-independent - context-dependent

The role of context, which is an old issue in vocabulary testing, is the basis for the third dimension. Traditionally contextualisation has meant that a word is presented to test-takers in a sentence rather than as an isolated element. From a contemporary perspective, it is necessary to broaden the notion of context to include whole texts and, more generally, discourse.
The issue of context dependence also arises with cloze tests, in which words are systematically deleted from a text and the testtakers' task is to write a
suitable word in each blank space.

CHAPTER TWO

The Nature Of Vocabulary

Introduction

Before we start to consider how to test vocabulary, it is necessary first to explore the nature of what we want to assess. Our everyday concept of vocabulary is dominated by the dictionary. We tend to think of it as an inventory of individual words, with their associated meanings. This view is shared by many second language learners, who see the task of vocabulary learning as a matter of memorising long lists of L2 words, and their immediate reaction when they encounter an unknown word is to reach for a bilingual dictionary.

What is a word?

A basic assumption in vocabulary testing is that we are assessing knowledge of words. But the word is not an easy concept to define, either in theoretical terms or for various applied purposes. There are some basic points that we need to spell out from the start. One is the distinction between tokens and types, which applies to any count of the worlds in a text.

What about larger lexical Itemst

The second major point about vocabulary is that it consists of more than just single words. For a start, there are the phrasal verbs (get available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments' (p. 110).

What does it mean to know a lexical item?

The other seven assumptions cover various aspects of what is meant by knowing a word:

• Knowing a word means knowing the degree of probability of encountering that word in speech or print. For many words we also know the sort of words most likely to be found associated with the word.
• Knowing a word implies knowing the limitations on the use of the word according to variations of function and situation.

What ia vocabulary ability?

Mention of the term construct brings me back to the main theme I developed in Chapter 1, which was that scholars with a specialist interest in vocabulary teaching and learning have a rather different perspective from language testers on the question of how - and even whether - to assess vocabulary. My three dimensions of vocabulary assessment represent one attempt to incorporate the two perspectives within a single framework.

The context of vocabulary use

Traditionally in vocabulary testing, the term context has referred to the sentence or utterance in which the target word occurs. For in-stance, in a multiple-choice vocabulary item, it is normally recommended that the stem should consist of a sentence containing the word to be tested.

Vocabulary knowledge and fundamental processes

The second component in Chapelle's (1994) framework of vocabulary ability is the one that has received the most attention from applied linguists and second language teachers. Chapelle outines four dimensions of this component:

Vocabulary size: This refers to the number of words that a person knows.

Lexicon organization: This concerns the way in which words and other lexical items are stored in the brain.

Metacognitive strategies for vocabulary use

This is the third component of Chapelle's definition of vocabulary ability, and is what Bachman (1990) refers to as 'strategic competence'. The strategies are employed by all language users to manage the ways that they use their vocabulary knowledge in communication. Most of the time, we operate these strategies without being aware of it. It is only when we have to undertake unfamiliar or cognitively demanding communication tasks that the strategiès become more conscious.

CHAPTER THREE

Research on vocabulary acquisition and use

Introduction

The focus of this chapter is on research in second language vocabu- lary acquisition and use. There are three reasons for reviewing this research in a book on vocabulary assessment. The first is that the researchers are significant users of vocabulary tests as instruments in their studies.
Systematic vocabulary learning
Given the number of words that learners need to know if they are to achieve any kind of functional proficiency in a second language, it is understandable that researchers on language teaching have been in- terested in evaluating the relative effectiveness of different ways of learning new words.

Assessment Issues

As for the assessment implications, the design of tests to evaluate how well students have learned a set of new words is straightforward, particularly if the learning task is restricted to memorising association between an L2 word and its L1 meaning. It simply involves presenting the test-takers with one word and asking them to supply the other-language equivalent. However, as Ellis and Beaton (1993b) note, it makes a difference whether they are required to translate into or out of their own language
Incidental vocabulary learning
The term incidental often causes problems in the discussion of research on this kind of vocabulary acquisition. In practice it usually isciation between an L2 word and its L1 meaning. It simply involves presenting the test-takers with one word and asking them to supply the other-language equivalent.

Research with native speakers

The first step in investigating this kind of vocabulary acquisition was to obtain evidence that it actually happened. Teams of reading researchers in the United States (Jenkins, Stein and Wysocki, 1984; Nagy, Herman and Anderson, 1985; Nagy, Anderson and Herman, 1987) undertook a series of studies with native-English-speaking school children.

Second language research

Now, how about incidental learning of second language vocabulary? In a study that predates the LI research in the US, Saragi, Nation and Meister (1978) gave a group of native speakers of English the task of reading Anthony Burgess's novel A Clockwork Orange, which contains a substantial number of Russian-derived words functions as an argot used by the young delinquents who are the main characters in the book.

Assessment issues

Now, what are the testing issues that arise from this research on incidental vocabulary acquisition? One concerns the need for a pretest. A basic assumption made in these studies is that the target words are not known by the subjects. To some extent, it is possible to rely on teachers' judgements or word-frequency counts to select words that a particular group of learners are unlikely to know, but it is preferable to have some more direct evidence. The use of a pre-test allows the researchers to select from a set of potential target words ones that none of the subjects are familiar with.

Questions about lexical inferencing

This topic is not purely a pedagogical concern. Inferencing by learners is of great interest in second language acquisition research and there are several types of empirical studies that are relevant here. In re-viewing this work, I find it helpful to start with five questions that seem to follow a logical sequence:

1 What kind of contextual information is available to readers to help them in guessing the meaning of unknown words in texts?
2 Are such clues normally available to the reader in natural, unedited texts?
3 How do well learners infer the meaning of unknown words without being specifically trained to do so?
4 Is training strategy an effective way of developing learners' lexical inferencing skills?
5 Does successful inferencing lead to acquisition of the words?

Assessment issues

As in any test-design project, we first need to be clear about what the purpose of a lexical inferencing test is. The literature I have just reviewed above indicated at least three possible purposes:

1 to conduct research on the processes that learners engage in when they attempt to enter the meaning of unknown words;
2 to evaluate the success of a program to train learners to apply lexical inferencing strategies; or
3 to assess learners on their abilities to make inferences about unknown words.

Communication strategies

When compared with the amount of research on ways that learners cope with unknown words they encounter in their reading, there has been less investigation of the vocabulary difficulties they face in expressing themselves through speaking and writing. However, within the field of second language acquisition, there is an active tradition of research on communication strategies.

CHAPTER FOUR

Research on vocabulary assessment

Introduction

In the previous chapter, we see how tests play a role in research on vocabulary within the field of second language acquisition (SLA). Now we move on to consider research in the field of language testing. where the focus is not so much on understanding the processes of vocabulary learning as on measuring the level of vocabulary knowledge and abilities that learners have reached.

Objective testing

The history of vocabulary assessment in the twentieth century is very much associated with the development of objective testing, especially in the United States. Objective tests are ones in which the learning material is divided into small units, each of which can be assessed by means of a test item with a single correct answer that can be specified in advance.

Multiple-choice vocabulary items

Although the multiple-choice format is one of the most widely used methods of vocabulary assessment, both for native speakers and for second language learners, its limitations have also been recognized for a long time.

Validating tests of vocabulary knowledge

Writers on first language reading research over the years (Kelley and Krey, 1934; Farr, 1969: Schwartz, 1984) have pointed out that, in addition to the various variations of the multiple-choice format, a wide range of test items and methods have been used for measuring vocabulary knowledge. Kelley and Krey (cited in Farr, 1969: 34) identified 26 different methods in standardized US vocabulary and reading tests.

Measuring vocabulary size

Let me first sketch some educational situations in which consideration of vocabulary size is relevant and where the research has been undertaken.

• Reading researchers have long been interested in estimating how many words are known by native speakers of English as they grow from childhood through the school years to adult life
• Estimates of native-speaker vocabulary size at different ages provide a target - though a moving one, of course - for the acquisitions of vocabulary by children entering school with little knowledge of the language used as the medium of instruction.
• International students undertaking upper secondary or university education through a new medium of instruction simply do not have discussion I use the two sets of terms interchangeably.
• Reading researchers have long been interested in estimating how many words are known by native speakers of English as they grow from childhood through the school years to adult life
• Estimates of native-speaker vocabulary size at different ages provide a target - though a moving one, of course - for the acquisitions of vocabulary by children entering school with little knowledge of the language used as the medium of instruction.

What counts as a word?

is an issue that I discussed in Chapter 2. The larger estimates of vocabulary sizes for native speakers tend to be calculated on the basis of individual word forms, whereas more conservative estimates take word families as the units to be measured. Remember that a word family consists of a base word together with its inflected and derived forms that share the same meaning.

How do we choose which words to test?

For practical reasons it is impossible to test all the words that the native speaker of a language might know. Researchers have typically started with a large dictionary and then drawn a sample of words representing, say, 1 per cent (1 in 100) of the total dictionary entries. The next step is to test how many of the selected words are known by a group of subjects.

How do we find out what the selected words are known?

Once a sample of words has been selected, it is necessary to find out - by means of some kind of test whether each word is known. In studies of vocabulary size, the criterion for knowing a word is usually quite liberal, because of the large number of words that need to be covered in the time available for testing. The following test formats have been commonly used:

multiple-choice items of various types; matching of words with synonyms or definitions3;
supplying an Ll equivalent to each L2 target word;
The checklist (or yes-no) test, in which test-takers simply indicate whether they know the word or not.

Assessing quality of vocabulary knowledge

Whatever the merits of vocabulary-size tests, one limitation is that they can give only a superficial indication of how well any particular word is known. In fact this criticism has long been applied to many objective vocabulary tests, not just those that are designed to estimate the total vocabulary size. Dolch and Leeds (1953) analyzed the vocabulary subtests of five major reading and general achievement test batteries for American school children and found that

How to measure it?

How to conceptualize it? The Dolch and Leeds (1953) test items with which I introduced this section of the chapter essentially assessing precision of knowledge: do the test-takers know the specific meaning of each target word, rather than just having a vague idea about it? This represents one way to define quality of knowledge, but it assumes that each word has only one meaning to be precisely known.

The role of context

Whether we can separate vocabulary from other aspects of language proficiency is obviously relevant to the question of what the role of context is in vocabulary assessment. In the early years of objective testing, many vocabulary tests are presented in the target words in isolation. in lists or as the stems of multiple-choice items. It was considered that such tests were pure measures of vocabulary knowledge

Cloze tests as vocabulary measures

A standard cloze test consists of one or more reading passages from which words are deleted according to a fixed ratio (e.g. every seventh word). Each deleted word is replaced by a blank of uniform length, and the task of the test takers is to write a suitable word in each space.

The standard cloze

Let us first look at the standard, fixed-ratio cloze. A popular way of exploring the validity of cloze tests in the 1970s was to correlate them with various other types of tests. In numerous studies the cloze correlated highly with 'integrative' tests such as dictation or composition writing and at a rather lower level with more 'discrete-point' tests of vocabulary, grammar and phonology

The rational cloze

Although Oller has consistently favored the standard fixed-ratio format as the most valid form of the cloze procedure for assessing second language proficiency, other scholars have argued for a more selective approach to the deletion of words from the text. In his research, Alderson (1979) found that a single text could produce quite different tests depending on whether you deleted, say, every eighth word rather than every sixth.

The multiple-choice cloze

Choice items in a cloze test rather than the standard blanks to be filled in. Porter (1976) and Ozete (1977) argued that the standard format re-quires writing ability, whereas the multiple-choice version makes it more a measure of reading comprehension Jonz (1976) pointed out that a multiple-choice cloze could be marked more objectively because it controlled the range of responses that the test-takers could give In addition, he considered that providing response options made the test more student-centered - or 'learner-friendly, 'as we might say these days.

The C-test

At first glance the C-test-in which a series of short texts are prepared for testing by deleting the second half of every second word - may seem to be the version of the cloze procedure that is the least processing as a specific measure of vocabulary. For one thing, its creators intended that it should assess general proficiency in the language, particularly for selection and placement purposes, and that the representation should be a sample of all the elements in the text (Klein-Braley, 1985; 1997 : 63-66).

CHAPTER FIVE

Vocabulary Tests: Four case studies

The four tests are:
1. The voluntary level test;
2. The Eurocentres Vocabulary Size Test (EVST);
3. The Vocabulary Knowledge scale (VKS); and
4. The test of English as a Foreign Language (TOEFL)

The Vocabulary Levels Test

The vocabulary levels Test was devised by paul Nation at victoria University of Wellington I in New Zealand in the early 1980s as a simple instrument for classroom use by teachers in order to help the develop a suitable vocabulary teaching.and leaming programme for their students

The design of the test

The test is in five parts, representing five levels of word frequency in English: the first 2000 words, 3000 words, 5000 words, the University word level (beyond 5000 words) and 10,000
According to Nation (1990:261), the 2000- and 3000-word levels contain the high-frequency i words that all leamers need to kmow in order to function effectively in English.

• Validation
• New versions

The Eurocentres Vocabulary Size Test

Like the Vocabulary Levels Test, the Eurocentres Vocabulary Size Test (EVST) makes an estimate of a learner's vocabulary size using a graded sample of words covering numerous frequency levels. Another distinctive feature of the EVST is that it is administered by computer rather than as a pen-and-paper test. Let us now look at the test from two perspec tives: first as a placement instrument and then as a measure of vocabulary size.

• The EVST as a placement test
• The EVST as a measure of vocabulary size

Evaluation of the instrument

Paribakht and Wesche have been careful to make modest claims for their instrument: 'Its purpose is not to estimate general vocabulary knowledge, but rather to track the early development of specific words in an instructional or experimental situation' (Wesche and Paribakht, 1996: 33). They have obtained various kinds of evidence in their research for its reliability and validity as a measure of incidental vocabulary acquisition (Wesche and Paribakht, 1996: 31-33).

The Test of English as a Foreign Language

The Test of English as a Foreign Language, or TOEFL, is administered in 180 countries and territories to more than 900,000 candidates. Like other ETS tests, TOEFL relies on sophisticated statistical analyses and testing technology in order to ensure its quality as a measuring instrument and its efficient administration to such large numbers of test-takers. Until recently, all the items in the basic TOEFL test have been of this type.

• The original vocabulary items

From its beginning in 1964 until the mid-1970s, TOEFL consisted of five sections: listening comprehension, English structure, vocabulary, reading comprehension and writing ability.There were two types of vocabulary-test item, which were labelled sentence completion and synonym matching.

• The 1976 revision

In his study involving students in Peru, Chile and Japan, Plke found that, although the existing Vocabulary section of the test correlated highly (at 0.88 to 0.95) with the Reading Comprehension section, the new Words in Context items had even higher correlations (0.94 to 0.99) with the reading section of the experimental test.

• Towards more contextualised testing

At a conference convened by the TOEFL Program in 1984, a number of applied linguists were invited to present critical reviews of the extent to which TOEFL could be considered a measure of com municative competence.

• Vocbulary in the 1995 version

The results of this in-house research provided support for recommen dations from the TOEFL Committee of Examiners, an advisory body of scholars from outside ETS, that vocabulary knowledge should be assessed in a more integrative manner.

• The current situation

The latest development in the story occurred in 1998, with the intro duction of a computerised version of TOEFL. In most countries candidates now take the test at an individual testing station, sitting at a computer and using the mouse to record their responses to the items presented to them on the screen. For the reading test, the passages appear in a frame on the left side of the screen and the test items are shown one by one on the right side. Vocabulary items have been retained but in a different form from before.

CHAPTER SIX

The design of discrete vocabulary tests
Introduction

The discussion of vocabulary-test design in the first part of this chapter is based on the framework for language-test development i presented in Bachman and Palmer's (1996) book Language Testing in Practice. Since the full framework is too complex to cover here, I have chosen certain key steps in the test-development process as the basis for a discussion of important issues in the design of discrete vocabu lary tests in particular. In the second part of the chapter, I offer ai practical perspective on the development of vocabulary tests by means of two examples

Test Purpose

Following Bachman and Palmer's framework, an essential first step in language-test design is to define the purpose of the test. It is important to clarify what the test will be used for because, according to testing theory, a test is valid to the extent that we are justified in drawingi conclusons from its results.

we can identify three uses for language tests: for research, making decisions about learners and making decisions about language programmes.

Construct definition

Bachman and Palmer (1996: 117-120) state that there are two approaches to construct definition: syllabus-based and theory-based. A syllabus-based definition is ap propriate when vocabulary assessment takes place within a course of study, so that the lexical items and the vocabulary skills to be assessed can be specified in relation to the learning objectives of the course.

Receptive and productive vocabulary

This distinction between receptive and productive vocabulary is one that is accepted by scholars working on bothI first and second language vocabulary development, and it is often referred to by the alternative terms passive and active. As Melka (1997) points out, though, there are still basic problems in conceptualising and measuring the two types of vocabulary, in spite of a lengthy history of research on the subject.

Characteristics of the test input

The design of test tasks is the next step in test development, according to Bachman and Palmer's model. In this chapter, I focus on just two aspects of task design: characteristics of the input and characteristics of the expected response.

Selection of target words

Based on such findings, Nation (1990: Chapter 2) proposes that, for teaching and learning purposes, a broad three-way division can bei made Into high-frequency, low-frequency and specialised vocabulary. The hlgh-frequency category In English consists of 2000 word families, which form the foundation of the vocabulary knowledge that i all proficient users of the language must have acquired.On the other hand, low-frequency vocabulary as a whole is of much less value to learners

Presentation of words

Words in isolation

As with other decisions in test design, the question of how to present selected words to the test-takers needs to be related to the purpose of the assessment.

Words in context

For other purposes, the presentation of target words in some context i is desirable or necessary. In discrete, selective tests, the context most commonly consists of a sentence in which the target word occurs, buti it can also be a paragraph or a longer text containing a whole series of target words.

Characteristics of the expected response

Self-report vs. verifiable response

In some testing situations, it is appropriate to ask the learners to assess their own lexical knowledge. In Chapter 5, we saw how the EVST and the VKS draw on self-report by the test-takers, although both instruments also incorporate a way of checking how valid the responses are as measures of vocabulary knowledge.

Monolingual vs. Bilingual testing

last design consideration concerns the language of the test itself. Whereas in a monolingual test format only the target language is i used, a billingual one employs both the target language (L2) and the learners' own language (Ll).

Practical examples

Classroom progress tests

The purpose of my class tests is generally to assess the learners' progress in vocabulary learning and, more specifically, to give them an incentive to keep studying vocabulary on a regular basis.

Matching items

There are some aspects of the design of this item type which arei worth noting:

The reason for adding one or two extra definitions is to avoid a situation where the learner knows four of the target words and can then get the fifth definition correct by process of elimination, without actually knowing what the word means.
Assuming that the focus of the test is on knowledge of the target words, the definitions should be easy for the learners to understand. Thus, as a general principle, they should be composed of higher frequency vocabulary than the words to be tested and should not be written In an ellptical style that causes comprehension problems.

Completion items

Completion, or blank-filling, items consist of a sentence from which the target word has been deleted and replaced by a blank. As in the contextualised matching format above, the function of the sentence is to provide a context for the word and perhaps to cue a particular use of it.

Generic test items

In an individualised vocabulary programme, these generic items offer a practical alternative to having separate tests for each learner in the class. The same item types could also be used more convention. ally, with target words provided by the teacher, in a class where they learners have all studied the same vocabulary.

The word-associates test

The new starting point was the concept of word association. The standard word-association task involves presenting subjects with a set of stimulus words one by one and asking them to say the first related word that comes into their head.

CHAPTER SEVEN

Comprehensive measures of vocabulary

Introduction

Comprehensive measures are particularly suitable for assessment procedures in which vocabulary is embedded as one component of the measurement of a larger construct, such as communicative com petence in speaking, academic writing ability or listening comprehension. However, we cannot simply say that all comprehensive measures are embedded ones, because they can also be used on ai discrete basis.

Measures of test input

In reading and listening tests we have to be concerned about thei nature of the input text. At least two questions can be asked:

Is it at a suitable level of difficulty that matches the ability range of the test-takers?
Does it have the characteristics of an authentic text, especially if it has been produced or modified for use in the test? Here we are specifically interested in the extent to which informa tion about the vocabulary of the text can help to provide answers to these questions.

Rediability

In Ll reading research. the basic concept used in the analysis of textsi is readabillty, which refers to the various aspects of a text that arei likely to make it easy or difficult for a reader to understand and enjoy. During the twentieth century a whole educational enterprise grew up in the United States devoted to devising and applying formulas toi predict the readability of English texts for native-speaker readers in terms of school grade or age levels (for a comprehensive review, see Klare, 1984).

Listenability of spoken texts

Much more work has been done on the comprehensibility of written texts than of spoken language. Whereas the term readability is now very well established, its oral equivalent, listenabillty, has had only limited currency. However, it seems reasonable to expect that spoken texts used for the assessment of listening comprehension vary in the demands that they place on listeners in comparable ways to thei demands made of readers by different kinds of written language.

Measures of learner production

Most of this section is concerned with statistical measures of writing, because there is more published research on that topic, but l also consider measures i of spoken production, as well as the more qualitative approach to assessment represented by the use of rating scales to judgei performance.

CHAPTER EIGHT

Further development in vocabulary assessment

Introduction

In the main body of this chapter, I want to review i some current areas of work on second language vocabulary. which will provide additional evidence for my view that a wider perspective is required. and then explore the implications for further develop ments in vocabulary assessment for the future.

The identification of lexical units

One basic requirement for any work on vocabulary is good quality information about the units that we are dealing with. In this section of the chapter, I first review the current state of word-frequency listsi and then take up the question of how we might deal with multi-word Iexical items in vocabulary assessment.

The vocabulary of informal speech

The vocabulary speech is the second area of vocabulary study that has received less attention than it should have, as indicated by the fact that perhaps the most frequently cited research study is the one conducted by Schonell et al. (1956) in the 1950s on the spoken vocabulary of Australian workers.

The social dimension of vocabulary use

In addition, vocabulary knowledge and use are typically thought of in psycholinguistic terms, which minimises the existence of social variation among learners, apart from the fact i that they undertake various courses of study, pursue different careers i and have a range of personal interests.

For assessment purposes, the education domain is obviously an area of major concern, especially when there is evidence that learners i from particular social backgrounds lack the opportunity to acquire the vocabulary they need for academic study.

References:

Purpura, E. James. 2004. Assessing Grammar. United Kingdom: Cambridge University Press.

Read, John. 2000. Assessing Vocabulary. United Kingdom: Cambridge University Press.

Assignment 10
Summary Assessing Reading and Assessing Writing from book Language Assessment principle and classroom practice by Douglas Brown

ASSESSING READING
(Page 185-217)

In foreign language learning, reading is likewise a skilL that teachers simply expect learners to acquire.. Basic, beginning-level textbooks in a foreign language presuppose a student'S reading ability ifonly because it's a book that is the medium. Most formal tests use the written word' as a stimulus for test-taker response; even oral interviews may require reading performance for certain tasks. Reading, arguably the most essential skill for success in all educational contexts, remaiiis a skill of Paramount importance as we create assessments of general language ability.

As we consider a number of different types or genres of written texts, the components of reading ability, and specific tasks that are commonly used in the assessment of reading, let's not forget the unobservable nature of reading. Like listening, one cannot see the process of reading, nor can one observe a specific product of reading.

TYPES (GENRES) OF READING

Each type or genre of written text has its own set of governing rules and conventions. A reader must be able to anticipate those conventions in order to process meaning efficiently. With an extraordinary number of genres present in any literate culture, the· reader's ability to process texts must be very sophisticated. Consider the
following abridged list of common genres, which ultimately form part of the specifications for assessments of reading ability.

Genres of reading

1. Academic reading

general interest articles (in magazines, newspapers, etc.)
technical reports (e.g., lab reports), professional journal articles
reference material (dictionaries, etc.)
textbooks, theses
essays, papers
test directionseditorials and opinion writing.

2. Job-related reading

messages (e.g., phone messages)
letters/emai Is
memos (e.g., interoffice)
reports (e.g., job evaluations, project reports)
schedules, labels, signs, announcements
forms, applications, questionnaires
financial documents (bills, invoices, etc.)
directories (telephone, office, etc.)
manuals, directions

3. Personal reading

newspapers and magazines

letters, emails, greeting cards, invitations
messages, notes, lists
schedules (train, bus, plane, etc.)
recipes, menus, maps, calendars
advertisements (commercials, want ads)
novels, short stories, jokes, drama, poetry
financial documents (e.g., checks, tax forms, loan .applications)
forms, questionnaires, medical reports, immigration documents
comic strips, cartoons

When we realize that this list is omy the beginning, it is easy to see how overwhelming it is to learn to read in a foreign language. The genre of a text enables readers to apply certain schemata that will assist them in extracting appropriate meaning. If for example; reatlers know that a text is a recipe, they will expect a certain arrangement of information (ingredients) and will know to search for a sequential, order of directions. Efficient readers also have to know what their purpose is in reading a text, the strategies for accomplishing that purpose, and how to retain the information.

MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING

The micro- and macroskills below represent the spectrum of possibilities for objectives in the assessment of reading comprehension.

Micro-and macroskills for reading comprehension

Microskills

Discriminate among the distinctive graphemes and orthographic patterns of English.
Retain chunks of language of different lengths in short-term memory.
Process writing at an efficient rate of speed to suit the purpose.
Recognize a core of words, and interpret word order patterns and their sign ificance.
Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), patterns, rules, and elliptical forms.
Recognize that a particular meaning may be expressed in different grammatical forms.
Recognize cohesive devices in written discourse ahd their role in signaling the relationsh ip between and among clauses.

Macroskills

Recognize the rhetorical forms of written discourse and their significance for interpretation.
Recognize the communicative functions of written texts, according to form and purpose.
Infercontext that is not explicit by using background knowledge.
From described events, ideas, etc., infer links and connections between events, deduce causes and effects, and detect such relations as mai n idea, supporting idea, new information, given information, generalization, and exemple ification.
Distinguish between literal and implied meanings.
Detect culturally specific references and interpret them in a context of the appropriate cultural schemata.
Develop and use a battery of reading strategies, such as scanning and skimming, detecting discourse markers, guessing the meaning of words from context, and activating schemata for the interpretation of texts.

Some principal strategies for reading comprehension

Identify your purpose in reading a text.
Apply spelling rules and conventions for bottom-up decoding.
Use lexical analysis (prefixes, roots/suffixes, etc.) to determine meaning.
Guess at meaning (of words, idioms, etc.) when you aren't certain.
Skim the text for the gist and for main ideas.
Scan the text for specific information (names, dates, key words).
Use silent reading techniques for rapid processing.
Use marginal notes, outlines, charts, or semantic map for understanding
and retaining information.
Distinguish between literal and implied meanings.
Capitalize on discourse markers to process relationships.

TYPES OF READING

Perceptive. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here; except with some differing terminology to capture the uniqueness of reading. Perceptive reading tasks involve attending to the components of larger stretches of discourse: letters, words,punctuation, and other graphemic symbols. Bottom-up processing is implied.
Selective. This category is largely an artifact of assessment formats. In order to ascertain one's reading recognition of lexical, grammatical, or discourse features of language within a very short stretch of language, c'ertain typical tasks are used: picture-cued tasks, matching, true/false, multiple-choice, etc.
Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text. That is, reading is a process of negotiating meaning; the reader brings to the text a set of schemata for understanding it, and intake is the product of that interaction.
Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to an<:t including professional articles, essays, technical reports, short stories, and books. (It should be noted that reading research commonly refers to "extensive reading" as longer stretches of discourse, such as iong articles and books that are usually read outside classroom hour.

DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING

At the beginning level of reading a second language lies a set of tasks that are fundamental and basic: recognition of. alphabetic symbols, capitalized and lowercase letters, punctuation, word and grapheme-phoneme correspondences. Such tasks of perception are often referred to as literacy tasks, implying that the learner is in the early stages of becoming "literate." Assessment of basic reading skills may be carried out in a number of different ways.

Reading Aloud

The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.

Written Response

The same stimuli are presented, and the test-taker's task is' to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the testtaker's response must be carefully treated. If an error occurs, make sure you determine its source; what might be assumed to be a writing error, for example, may actually be a reading error, and vice versa.

Multiple-Choice

Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching. Here are somee possibilities.

Picture-Cued Items

Test-takers are shown a picture, such as the one on the next page, along vlith a written text and are given one of a number of possible tasks to perform.

DESIGNING ASSESSMENT TASKS: SELECTIVE READING

Just above the rudimentary skill level of perception of letters and words is a category in which the test designer focuses on formal aspects oflanguage (lexical, grammatical, and a few discourse features). This category includes what many incorrectly

think of as testing "vocabulary and grammar."

Multiple-Choice (for Form-Focused Criteria)

By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly. The most straightforward .multiple-choice items may have little context, but might serve as a vocabulary or granitnar check.

Matching Tasks

At this selective level of reading, the test-taker's task is simply to respond correctly, which makes matching an appropriate format. The most frequently appearing criterion

in matching procedures is vocabulary.

Editing Tasks

Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. The TOEFL and many "other tests employ this technique with the argument that it not only focuses on grammar but also, introduces a simulation of the authentic task of editing, or discerning errors in written passages. Its authenticity may be supported if you consider proofreading as a real-world skill that is being tested. Here is a typical set of examples of editing.

Picture-Cued Tasks

In the previous section we looked at picture-cued tasks for perceptive recognition of symbols and words. Pictures and photographs may be equally well utilized for examining ability at the selective level. Several types of picture-cued methods are commonly used.

Test-takers read a sentence or passage and choose one of four pictures that is being described.The sentence (or sentences) at this level is more complex.
Test-takers read a series of sentences or definitions, each describing a labeledpart of a picture or diagram. Their task is to identify each labeled item. In the following diagram, test-takers do not necessarily know each term, but by reading the definition they are able to make an identification.

Gap-Filling Tasks

Many of the multiple-choice tasks described above can be converted into gap-filling, or "fill-in-the-blank,"'items in which the test-taker's response is to write a word or phrase. An extension of simple gap-fIlling tasks is to create sentence completion items where test-takers read part of a sentence and then complete it by writing a phrase.

DESIGNING ASSESSMENT TASKS: INTERACTIVE READING

Interactive tasks may therefore imply a little more focus on top-down processing than on bottom-up. Texts are a little longer, from a paragraph to as much as a page or so in the case of ordinary prose. Charts, graphs, and other graphics may be somewhat complex in their format.

Cloze Tasks

One of the most popular types of reading assessment task is the cloze procedure. The word cloie was coined by educational psychologists to capture the Gestalt psychological cohcept of "closure," that is, the ability to fill in gaps in an incomplete image (visual; auditory, or cognitive) and supply (from background schemata) " omitted details.

In written language, a sentence with a word left out showd have enough context that a reader can close that gap with a calculated guess, using linguistic expectancies (formal schemata), background experience (content schemata), and some strategic competence. Based on tIus assumption, cloze tests were developed for native language readers and defended as an appropriate gauge of reading ability.

Impromptu Reading Plus Comprehension Questions

If cloze testing is the most-researched procedure for assessing reading, the traditional "Read a passage and answer some questions" technique is undoubtedly the oldest and the most common. Virtually every profiCiency test uses the format, and one would rarely consider assessing reading without some component of the assessment involving impromptu reading and responding to questions. Notice that this set of questions, based on a 250-word passage, covers the comprehension of these features:

main idea (topic)
expressions/idioms/phrases in context
inference (implied detail)
grammatical features
detail (scanning for a specifically stated detail)
excluding facts not written (unstated details)
supporting idea(s)
vocabulary in context.

These specifications, and the questions that exemplify them, are not just a string of "straight" comprehension questions that follow the thread of the passage.The questions represent a sample of the test specifications for TOEFL readfug passages, which are derived from research on a variety of abilities good readers exhibit. Notice that many of them are consistent with strategies of effective reading: skimming for main idea, scanning for details, guessing word meanings from context, inferencing, using , discourse markers, etc.

Short-Answer Tasks

Multiple-choice items are difficult to construct and validate, and classroom teachers rarely have tinle in their busy schedules to design such a test. A popular alternative to multiple-choice questions following reading passages is the age old short- answer format. A reading passage is presented, and the test-taker reads questions that must be answered in a sentence or two.

Editing (Longer Texts)

Several advantages are gainednin the longer format.

First, authenticity is increased. The likelihood that students in English classrooms will read connected prose of a page or two is greater than the likelihood of their encountering the contrived format of unconnected sentences.

Second, the task simulates proofreading one's own essay, where it is imperative to find and correct errors. And third, if the test is connected to a specific curriculum (such as placement into one of several writing courses), the test designer. can draw up specifications for a number of grammatical and rhetorical categories that match the content of the courses. Content validity is thereby supported, and along with it the face validity of a task in which students are willing to invest.

Scanning

Scanning is a strategy used by all readers to fmd relevant information in a text. Assessment of scanning is carried out by presenting test-takers with a text (prose or something in a chart or graph format) and requiring rapid identification of relevant bits of information.

Ordering Tasks

Students always enjoy the activity of receivirtg little strips of paper, each with a sentence on it, anti assembling them irito a story, sometimes called the "strip story" technique. Variations on this can serve :as an assessment of overall global understanding of a story and of the cohesive devices that signal the order of events or ideas.

Information transfer: Reading charts, maps, graphs, and diagrams

Every educated person must be able to comprehend charts, maps, graphs, calendars, diagrams, and the like. Converting such nonverbal input into comprehensible intake requires not only an understanding of the graphic and verbal conventions of the medium but also a linguistic ability to interpret that information to someone else. Reading a map implies understanding the conventions of map graphics, but it is often accompanied by telling someone where to turn, how far to go, etc. Scanning a menu requires an ability to understand the structure of most menus as well as the capacity to give an order when the time comes. Interpreting the numbers on a stock market report involves the interaction of understanding the numbers and of conveying that understanding to others.

This implies a process of information transfer from one skill to another: in this case, from reading verbal and nonverbal information to speaking/writing.

DESIGNING ASSESSMENT TASKS: EXTENSIVE READING

Extensive reading involves somewhat longer texts than we have been dealing with up to this point. Journal articles, technical reports, longer essays, short stories, and books fall into this category. The reason for placing such reading into a separate category is that reading of this type of discourse almost always involves a focus on meaning using mostly top-down processing, with only occasional use of a targeted bottom-up strategy.

Another complication in assessing extensive reading is that the expected

response from the reader is likely to involve as much written (or sometimes oral) performance as reading.

Before examining,a few tasks that have proved to be useful in assessing extensive reading, it is essential to note that a number of the tasks described in previous categories can apply here. Among them are

impromptu reading plus comprehension questions,
short-answer tasks,
editing,
scanning,
ordering,
information transfer, and
interpretation (discussed under graphics),

Skimniing Tasks

Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader. Of course skimming can apply to texts of less than one page, so it would be wise not to confine this type of task just to extensive texts.

Summarizing and Responding

One of the most common means of assessing extensive reading is to ask the test-taker to write a summary of the text.

Note-Taking and Outlining

Finally, a reader's comprehension of extensive texts may be assessed through an evaluation of a process of note-taldng and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. Their utility is in the strategic training that learners gain in retaining information through marginal notes that highlight key information or organizational outlines that put supporting ideas into a visually manageable framework. A teacher,perhaps in one on one conferences-with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

ASSESSING WRITING

(Page 218-250)

Not many centuries ago, writing was a skill that was the exclusive domain of scribes and scholars in educational or religious institutions. Almost every aspect of everyday life for "common" people was carried out orally. Business transactions, records, legal I documents, political and military agreements-all were written by specialists whose vocation it was to render language into the written word.

In the field of second language teaching, only a half-century ago experts were saying that writing was primarily a convention for recording speech and for reinforcing grammatical and lexical features of language. Now we understand the uniqueness ofwriting as a skill with its own features and conventions. We also fully understand the difficulty of learning to write "welltl in any language, even in our own native language.

GENRES OF WRITTEN LANGUAGE

1. Academic writing

papers and general subject reports
essays, compositions
academically focused journals

2. Job-related writing

messages (e.g., phone messages)
letters/emails
memos (e.g., interoffice)

3. Personal writing

letters, emails, greeting cards, invitations
messages, notes
calendar entries, shopping lists, reminders.

TYPES OF WRITING PERFORMANCE

Four categories ofwritten performance that capture the range ofwritten production are considered here. Each category resembles the categories defmed for the other three skills, but these categories, as always, reflect the uniqueness of the skill area.

Imitative. To produce written language, the learner must attain skills in the funqamental, basic tasks of.writing letters, words, punctuation, and very brief sentences. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system. It is a level at which learners are trying to master the mechanics of writing.
Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to'the length of a: sentence.
Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs. Tasks respond to pedagogical directives, lists of criteria, outlines, and other guidelines. Genres of writing include brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and interpretations of charts or graphs.
Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis. Writers focus on achieving a purpose, organizing and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactiC and lexical variety, and in many cases, engaging in the process of mUltiple drafts to achieve a fmal product. Focus on grammatical form is limited to occasional editing or proofreading of a draft.

MICRO- AND MACROSKII.IS OF WRITING

We turn once again to a taxonomy of micro- and macroskills that will assist you in defining the ultimate criterion of an assessment procedure. The earlier microskills apply more appropriately to imitative and intensive types of writing task, while the macroskills are essential for the successful mastery of responsive and extensive writing.

Microskills

Produce graphemes and orthographic patterns of English.
Produce writing at an efficient rate of speed to suit the purpose.
Produce an acceptable core of words and use appropriate word order
patterns.
Use acceptable grammatical systems (e.g., tense, agreement, pluralization), patterns, and rules.
Express a particular meaning in different grammatical forms.
Use cohesive devices in written discourse.

Macroskills

Use the rhetorical forms and conventions of written discourse.
Appropriately accomplish the communicative functions of written texts according to form and purpose.
Convey links and connections between events, and communicate such
relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
Distinguish between literal and implied meanings when writing.
Correctly convey culturally specific references in the context of the written text.
Develop and use a battery of writing strategies.

DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING .

With the recent worldwide emphasis on teaching English at young ages, it is tempting to assume that every English learner knows how to handwrite the Roman alphabet. Such is not the case. Many beginning-level English learners, from young children to older adults, need basic training in and assessment of imitative writing: the rudiments offorming letters, words, and simple sentences.

Tasks in [Hand] Writing Letters, Words, and Punctuation

First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. Handwriting has the potential of becoming a lost art as even very young children are more and more likely to use a keyboard to produce writing.

A few of the more common types are :

1. Copying.

2. Listening cloze selection tasks.

3. Picture-cued tasks.

4. Form completion tasks.

5. Converting numbers and abbreviations to words.

Spelling Tasks and Detecting Phoneme-Grapheme Correspondences

A number of task types are in popular use to assess the ability to spell words correctly and to process phoneme-grapheme correspondences.

1. Spelling tests.

2. Picture-cued tasks.

3. Multiple-choice techniques.

4. Matching phonetic symbols.

DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING

This next level ofwriting is what second language teacher training n1anuals have for decades called controlled writing. It may also be thought of as_ form-focused writing, grammar writing, or simply guided writing. A good deal of writing at this level is display writing as opposed to real writing.

Dictation and Dicto-Comp

Dictation was described as an assessment of the integration of listening and writing, but it was clear that the primary skill being assessed is listening. Because of its response mode, however, it deserves a second mention in this chapter. Dictation is simply the rendition in writing of what one hears aurally, so it could be classified as an imitative type of writing, especially since a proportion of the testtaker's performance centers on correct spelling.

A form of controlled writing related to dictation is a dicto-comp. Here, a paragraph is read at normal speed, usually two or three times;then the teacher asks students to rewrite the paragraph from the best oftheir recollection.

Grammatical Transformation Tasks

In the heyday of structural paradigms of language teaching with slot-filler ,techniques and slot substitution drills, the practice ofmaking grammatical transformations orally or in writing-was very popular. To this day, language teachers have also used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:

Picture-Cued Tasks

A variety of picture-cued controlled tasks have been used in English classrooms around the world.

1. Short sentences.

2. Picture description

3. Picture sequence description.

Vocabulary Assessment Tasks

Most vocabulary study is carried out through reading. A number of assessments of reading recognition ofvocabulary were discussed in the previous chapter: multiplechoice techniques, matching, picture-cued identification, cloze techniques, guessing the meaning of a word in context, etc. The major techniques used to assess vocabulary are (a) defiding and (b) using a word in a sentence.

Vocabulary assessment is clearly form-focused in the above tasks, but the procedures are creatively linked by means of the target word, its collocations, and its morphological variants.

Ordering Tasks

One task at the sentence level may appeal to those who are fond ofword games and puzzles: ordering (or reordering) a scralnbled set of words into a correct sentence. Here is the way the item format appears.

Short-Answer and Sentence Completion Tasks

Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses.

ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING

Responsive writing creates the opportunity for test-takers to offer an array of 'possible creative responses within a pedagogic11 or assessment framework: test-takers are "responding" to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. Criteria now· begin to include the discourse and rhetorical conventions of paragraph structure and of connecting two or three such paragraphs in texts oflimited length.

The genres of text that are typically addressed here are

short reports (with structured formats and conventions);
responses to the reading of an article or story;
summaries of articles or stories;
brief narratives or descriptions; and
interpretations of graphs, tables, and charts.

Both responsive and extensive writing tasks are the subject of some classic, widely debated assessment issues that take on a distinctly different flavor from those at the lower-end production of writing.

Authenticity. Authenticity is a trait that is given special attention
Scoring. Scoring is the thorniest issue at these fmal two stages of writing.
Time. Yet another assessment issue surrounds the unique nature of writing

DESIGNING ASSESSMENT TASKS: RESPONSIVE AND EXTENSIVE WRITING

In this section we consider both responsive and extensive writing tasks. They will -be regarded here as a continuum of possibilities ranging from lower-end tasks whose complexity exceeds those in the previous category of intensive or controlled writing, through more open-ended tasks such as writing short reports, essays, summaries, and responses. up to texts of several pages or more.

paraphrasing

One of the more difficult concepts for second language learners to grasp is paraphrasing.The initial step in teaching paraphrasing is to ensure that learners understand the importance of. paraphrasing: to say something in one's own words, to avoid plagiarizing, to offer some variety in expression. With those possible motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually .not more.

Guided Question and Answer

Another lower-order task in this type of writing, which has the pedagogical benefit of guiding a learner without dictating the form of the output, is a guided questionand-answer format in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. In the writing of a narrative that the teacher has already covered in a class discussion

A variation on using guided questions is to prompt the test-taker to write from an outline. The outline may be self-created from earlier reading and/or discussion, or, 'Yhich is less desirable, be provided by the teacher or test administrator.

Paragraph Construction Tasks

The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads.You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:

Topic sentence writing.
Topic development within a paragraph.
Development ofmain and supportihg ideas across paragraphs.

Strategic Options

Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- to two-paragraph one or an extensive one of several pages. A number .of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies offreewriting, outlining, drafting, and revising.

1. Attending to task.

2. Attending to genre.

TEST OF WRITTEN ENGLISH (TWE®)

The TWE is in the category of a timed impromptu test in that test-takers are under a 30-minute time limit and are not able to prepare ahead of time for the topic that will appear.

SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING

At responsive and extensive levels of writing, three major approaches to scoring writing performance are commonly used by test designers: holistic, primary trait, and analytical. In the first method, a single score is assigned to an essay, which represents a reader's general overall assessment. Primary trait scoring is a variation of the holistic method in that the achievement of the primary purpose, or trait, of an essay is the only factor rated.

Holistic Scoring

Each point on a holistic scale is given a, systematic set of descriptors, and the reader-evaluator matches an overall impression with the descriptors to arrive at a score. Descriptors usually (but not always) follow a prescribed pattern. For example, the first descriptor across all score c;ategories may address the quality of task achievement, the second may deal with organization, the third with grammatical or rhetorical considerations, and' so on.

Primary Trait Scoring

A second method of scoring, prilnary trait, focuses on "how well students can write within a narrowly defmed range of discourse" (Weigle, 2002, p. 110).This type of scoring en1phasizes the task at hand and assigns a score based on the effectiveness of the text's achieving that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function.

Analytic Scoring

Analytic scoring of compositions offers writers a little more washback than a single holistic or primary trait score. Scores in five or six major elements will help to call the writers' attention to areas of needed improvement. Practicality is lowered in that more time is required for teachers to attend to details within each of the categories in order to render a fmal score or grade, but ultimately students receive more information about their writing

BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING

Formal testing carries with it the burden of designing a practical and reliable instrument that assesses its intended criterion accurately. To accomplish that mission, designers of writing tests are charged with the task of providing as "objective" a scoring procedure as possible, and one that in many cases can be easily interpreted by agents beyond the learner. HolistiC, primary trait, and analytic scoring all satisfy those ends.Yet beyond mathematically calculated scores lies a rich domain of assessment in which a developing writer is coached from stage to stage in a process of building a storehouse of writing skills.

Most writing specialists agree that the best way to teach writing is a hands-on approach that stimulates student output and then generates a series ofse1f-assessments, peer editing and reviSion, and teacher response and conferencing (Raimes, 1991, 1998; Reid, 1993; Seow, 2002).

Assessing Initial Stages of the Process of Composing

Following are some guidelines for assessing the initial stages (the frrst draft or two) of a written composition. These guidelines are generic for self, peer,. and teacher , responding. Each assessor will need to modify the list according to the level of the learner, the context, and the purpose in responding.

The teacher-assessor's role is; as a guide, a facilitator, and an ally; therefore, assessment at this stage of writing heeds to be as positive as possible to encourage the writer.

Assessing Later Stages of the Process of Composing

Once the writer has determined and clarified his or her purpose and plan, and has completed at least one or perhaps two drafts, the focus shifts toward "fme tuning" the expression with a view toward a final revision.

Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers.

This chapter completes the cycle of considering the assessment of all of the four skills of listening, speaking, reading, and writing. As you contemplate using some of the assessment techniques that have been suggested, I think you can now fully appreciate two significant overarching guidelines for designing an effective assessment procedure:

It is virtually impossible to isolate anyone of the four skills without the involvement of at least one other mode of performance. Don't underestimate the power of the integration ofskills in assessments designed to target a single skill area.
The variety ofassessment techniques and item types and tasks is virtually infmite in that there is always some possibility for creating a unique variation.

References:

Brown, H. Douglas. 2004. Language Assessment: Principle and Classroom Practices. New York: Pearson Education