Thursday, May 14, 2020

Assignment 9
Summary of Assessing Listening and Assessing Speaking from the book Language Assessment: Principles and Classroom Practices by H. Douglas Brown

ASSESSING LISTENING
(pages 112-139)

OBSERVING THE PERFORMANCE OF THE FOUR SKILLS

Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. When you propose to assess someone's ability in one or a combination of the four skills, you assess that person's competence, but you observe the person's performance.

So, one important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test. As with any attempt at measurement, it is your obligation as a teacher to triangulate your measurements: consider at least two (or more) performances and/or contexts before drawing a conclusion. That could take the form of one or more of the following designs:

  • several tests that are combined to form an assessment
  • a single test with multiple test tasks to account for learning styles and performance
  • in-class and extra-class graded work
  • alternative forms of assessment (e.g., journal, portfolio, conference, observation, self-assessment, peer-assessment).

Multiple measures will always give you a more reliable and valid assessment than a single measure. A second principle is one that we teachers often forget: we must rely as much as possible on observable performance in our assessments of students. Observable means being able to see or hear the performance of the learner (the senses of touch, taste, and smell don't apply very often to language testing!). You observe only the result of the meaningful input in the form of spoken or written output, just as you observe the result of the wind by noticing trees waving back and forth.

THE IMPORTANCE OF LISTENING

Listening has often played second fiddle to its counterpart, speaking. But it is rare to find just a listening test. One reason for this emphasis is that listening is often implied as a component of speaking. How could you speak a language without also listening? In addition, the overtly observable nature of speaking renders it more empirically measurable than listening.

Every teacher of language knows that one's oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one's listening comprehension ability. But of even further impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. In a typical day, we do measurably more listening than speaking (with the exception of one or two of your friends who may be nonstop chatterboxes!).

We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom. In this chapter, we will begin with basic principles and types of listening, then move to a survey of tasks that can be used to assess listening.

BASIC TYPES OF LISTENING

As with all effective tests, designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance.

From the stages of the listening process we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.

  1. Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
  2. Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
  3. Selective. Processing stretches of discourse such as short monologues for several minutes in order to "scan" for certain information. The purpose of such performance is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
  4. Extensive. Listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist, for the main idea, and making inferences are all part of extensive listening.

MICRO- AND MACROSKILLS OF LISTENING

A useful way of synthesizing the above two lists is to consider a finite number of micro- and macroskills implied in the performance of listening comprehension. Richards's (1983) list of microskills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives. Together, the micro- and macroskills below (adapted from Richards, 1983) provide 17 different objectives to assess in listening.

Microskills
  1. Discriminate among the distinctive sounds of English.
  2. Retain chunks of language of different lengths in short-term memory.
  3. Recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information.
  4. Recognize reduced forms of words.
  5. Distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance.
  6. Process speech at different rates of delivery.
  7. Process speech containing pauses, errors, corrections, and other performance variables.
  8. Recognize grammatical word classes (nouns, verbs, etc.).
  9. Detect sentence constituents and distinguish between major and minor constituents.
  10. Recognize that a particular meaning may be expressed in different grammatical forms.
  11. Recognize cohesive devices in spoken discourse.
Macroskills
  1. Recognize the communicative functions of utterances, according to situations, participants, goals.
  2. Infer situations, participants, goals using real-world knowledge.
  3. From events, ideas, and so on, described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
  4. Distinguish between literal and implied meanings.
  5. Use facial, kinesic, body language, and other nonverbal clues to decipher meanings.
  6. Develop and use a battery of listening strategies, such as detecting keywords, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof.

DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING

Once you have determined objectives, your next step is to design the tasks, including making decisions about how you will elicit performance and how you will expect the test-taker to respond. The focus in this section is on the microskills of intensive listening.

Recognizing Phonological and Morphological Elements

A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language. A classic test task gives a spoken stimulus and asks test-takers to identify the stimulus from two or more choices.

Paraphrase Recognition

The next step up the scale of listening comprehension microskills is the recognition of words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING

A question-and-answer format can provide some interactivity in these lower-end listening tasks. The test-taker's response is the appropriate answer to a question. Such an item might test, for example, recognition of the wh-question how much and its appropriate response.

DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING

A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.

Listening Cloze

Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. The cloze procedure is most commonly associated with reading only. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word.
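
To make the mechanics concrete, here is a minimal sketch of the generic every-nth-word deletion in Python (the function name and the blank marker are illustrative choices of this summary; Brown describes only the procedure):

    def make_cloze(passage, nth=7, blank="______"):
        # Split the passage into words and delete every nth one,
        # keeping the deleted words as the answer key.
        words = passage.split()
        answers = []
        for i in range(nth - 1, len(words), nth):
            answers.append(words[i])
            words[i] = blank
        return " ".join(words), answers

    gapped, key = make_cloze(
        "The test consists of a passage in which every seventh "
        "word is deleted and the test-taker supplies a word.")
    print(gapped)  # the passage with a blank at every seventh word
    print(key)     # ['in', 'and']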

One potential weakness of listening cloze techniques is that they may simply become reading comprehension tasks. Test-takers who are asked to listen to a story with periodic deletions in the written version may not need to listen at all, yet may still be able to respond with the appropriate word or phrase.

Other listening cloze tasks may focus on a grammatical category such as verb tenses, articles, two-word verbs, prepositions, or transition words/phrases. Notice two important structural differences between listening cloze tasks and standard reading cloze.
Listening cloze tasks should normally use an exact-word method of scoring, in which you accept as a correct response only the actual word or phrase that was spoken and consider other appropriate words as incorrect.
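
A minimal sketch of that exact-word method, continuing the cloze example above (ignoring case and surrounding whitespace is an assumption of this sketch, not a requirement Brown states):

    def score_exact_word(responses, answer_key):
        # Count a response as correct only when it matches the word
        # actually spoken; appropriate synonyms count as incorrect.
        return sum(given.strip().lower() == expected.lower()
                   for given, expected in zip(responses, answer_key))

    print(score_exact_word(["in", "but"], ["in", "and"]))  # 1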

Information Transfer

Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.

The objective of this task is to test prepositions and prepositional phrases of location (at the bottom, on top of, around, along with larger, smaller), so other words and phrases such as back yard, yesterday, last few seeds, and scare away are supplied only as context and need not be tested.

Sentence Repetition

Sentence repetition is far from a flawless listening assessment task. Buck (2001, p. 79) noted that such tasks "are not just tests of listening, but tests of general oral skills." Further, this task may test only recognition of sounds, and it can easily be contaminated by lack of short-term memory ability, thus invalidating it as an assessment of comprehension alone.

DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING

Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening. As we gradually move along the continuum from smaller to larger stretches of language, and from micro- to macroskills of listening, the probability of using more extensive listening tasks increases.

Dictation

Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread.

Dictations have been used as assessment tools for decades. Some readers still cringe at the thought of having to render a correctly spelled, verbatim version of a paragraph or story recited by the teacher.

The difficulty of a dictation task can be easily manipulated by the length of the word groups (or bursts, as they are technically called), the length of the pauses, the speed at which the text is read, and the complexity of the discourse, grammar, and vocabulary used in the passage. Scoring is another matter. Depending on your context and purpose in administering a dictation, you will need to decide on scoring criteria for several possible kinds of errors:
  • Spelling error only, but the word appears to have been heard correctly
  • Spelling and/or obvious misrepresentation of a word, illegible word
  • Grammatical error (for example, the test-taker hears I can't do it but writes I can do it)
  • Skipped word or phrase
  • Permutation of words
  • Additional words not in the original
  • Replacement of a word with an appropriate synonym
Dictation seems to provide a reasonably valid method for integrating listening and writing skills and for tapping into the cohesive elements of language implied in short passages.
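
To illustrate how some of these error categories might be detected mechanically, here is a rough sketch using Python's standard difflib; the category labels are shorthand for the list above, and spelling-only errors would still need a separate check (for example, edit distance on each replaced pair):

    from difflib import SequenceMatcher

    def diff_dictation(original, transcription):
        # Align the transcription against the original word by word
        # and label each discrepancy as skipped, added, or replaced.
        orig = original.lower().split()
        trans = transcription.lower().split()
        errors = []
        for op, i1, i2, j1, j2 in SequenceMatcher(
                None, orig, trans).get_opcodes():
            if op == "delete":
                errors.append(("skipped", orig[i1:i2]))
            elif op == "insert":
                errors.append(("added", trans[j1:j2]))
            elif op == "replace":
                errors.append(("replaced", orig[i1:i2], trans[j1:j2]))
        return errors

    print(diff_dictation("I can't do it", "I can do it"))
    # [('replaced', ["can't"], ['can'])]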

Communicative Stimulus-Response Tasks

In these tasks, the test-taker listens to a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions. The monologues, lectures, and brief conversations used in such tasks are sometimes a little contrived, and certainly the subsequent multiple-choice questions don't mirror communicative, real-life situations. But with some care and creativity, one can create reasonably authentic stimuli, and in some rare cases the response mode (as shown in one example below) actually approaches complete authenticity.

Authentic Listening Tasks

Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance.

There is no such thing as a purely communicative or a purely authentic test, as Buck (2001, p. 92) stated: "Every test requires some communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target-language tasks, and no test is completely authentic."

ASSESSING SPEAKING
(pages 140-184)

All of these issues will be addressed in this chapter as we review types of spoken language and micro- and macroskills of speaking, then outline numerous tasks for assessing speaking.

BASIC TYPES OF SPEAKING

In Chapter 6, we cited four categories of listening performance assessment tasks. A similar taxonomy emerges for oral production.

  1. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance.
  2. Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements-intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best.
  3. Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts.
  4. Interactive. The difference between responsive and interactive speaking is in the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships.
  5. Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and story-telling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.


MICRO- AND MACROSKILLS OF SPEAKING

A similar list of speaking skills can be drawn up for the same purpose: to serve as a taxonomy of skills from which you will select one or several that will become the objective(s) of an assessment task. The microskills refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker's focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options. The micro- and macroskills total roughly 16 different objectives to assess in speaking.

Microskills
  1. Produce differences among English phonemes and allophonic variants. 
  2. Produce chunks of language of different lengths.
  3. Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
  4. Produce reduced forms of words and phrases.
  5. Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
  6. Produce fluent speech at different rates of delivery.
  7. Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
  8. Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
  9. Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
  10. Express a particular meaning in different grammatical forms.
  11. Use cohesive devices in spoken discourse.
Macroskills
  1. Appropriately accomplish communicative functions according to situations, participants, and goals.
  2. Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
  3. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
  4. Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
  5. Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you. 

DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING

You may be surprised to see the inclusion of simple phonological imitation in a consideration of assessment of oral production. After all, endless repeating of words, phrases, and sentences was the province of the long-since-discarded Audiolingual Method, and in an era of communicative language teaching, many believe that nonmeaningful imitation of sounds is fruitless. Such opinions have faded in recent years as we discovered that an overemphasis on fluency can sometimes lead to the decline of accuracy in speech. And so we have been paying more attention to pronunciation, especially suprasegmentals, in an attempt to help learners be more comprehensible.

An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production).

PHONEPASS TEST 

The PhonePass findings could signal an increase in the future use of repetition and read-aloud procedures for the assessment of oral production. Because a test-taker's output is completely controlled, scoring using speech-recognition technology becomes achievable and practical. As researchers uncover the constructs underlying both repetition/read-aloud tasks and oral production in all its complexities, we will have access to more comprehensive explanations of why such simple tasks appear to be reliable and valid indicators of very complex oral production proficiency.

DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING 

At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are "cued" tasks in that they lead the test taker into a narrow band of possibilities.

Directed Response Tasks 

In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output. 

Read-Aloud Tasks 

Intensive reading-aloud tasks include reading beyond the sentence level up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test-taker's oral production is controlled. Because of the results of research on the PhonePass test, reading aloud may actually be a surprisingly strong indicator of overall oral production ability.

Sentence/Dialogue Completion Tasks and Oral Questionnaires 

Another technique for targeting intensive aspects of language requires test-takers to read dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then as the tape, teacher, or test administrator produces one part orally, the test-taker responds. 

Picture-Cued Tasks

One of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident. Here is an example of a picture-cued elicitation of the production of a simple minimal pair. 

Opinions about paintings, persuasive monologues, and directions on a map create a more complicated scoring problem. More demand is placed on the test administrator to make calculated judgments, in which case a modified form of a scale such as the one suggested for evaluating interviews (below) could be used; a minimal scoring sketch follows the list:
  • grammar
  • vocabulary
  • comprehension
  • fluency
  • pronunciation
  • task (accomplishing the objective of the elicited task)
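
One minimal way to operationalize such a scale in Python follows; the six categories come from the list above, but the 0-5 range and the equal weighting are assumptions of this sketch, not Brown's prescription:

    CATEGORIES = ["grammar", "vocabulary", "comprehension",
                  "fluency", "pronunciation", "task"]

    def rubric_score(ratings, scale_max=5):
        # Average equally weighted ratings (0..scale_max) across the
        # six categories into a single percentage.
        total = sum(ratings[c] for c in CATEGORIES)
        return 100.0 * total / (len(CATEGORIES) * scale_max)

    print(rubric_score({"grammar": 4, "vocabulary": 3,
                        "comprehension": 5, "fluency": 3,
                        "pronunciation": 4, "task": 5}))  # 80.0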

Translation (of Limited Stretches of Discourse)

Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language.

DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING 

Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.

Question and Answer 

Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like "What is this called in English?" to complex questions like "What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?" The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.

Giving Instructions and Directions 

We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories.

Paraphrasing 

Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase of the sentence. The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers' ability to practice the conversational art of conciseness by reducing the output/input ratio.

TEST OF SPOKEN ENGLISH (TSE) 

Somewhere straddling responsive, interactive, and extensive speaking tasks lies another popular commercial oral production assessment, the Test of Spoken English (TSE). The TSE is a 20-minute audiotaped test of oral language ability within an academic or professional environment. TSE scores are used by many North American institutions of higher education to select international teaching assistants. The scores are also used for selecting and certifying health professionals such as physicians, nurses, pharmacists, physical therapists, and veterinarians.

The following content specifications for the TSE represent the discourse and pragmatic contexts assessed in each administration:
  1. Describe something physical.
  2. Narrate from presented material.
  3. Summarize information of the speaker's own choice.
  4. Give directions based on visual materials.
  5. Give instructions.
  6. Give an opinion.
  7. Support an opinion.
  8. Compare/contrast.
  9. Hypothesize.
  10. Function "interactively."
  11. Define. 

DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING

The final two categories of oral production assessment (interactive and extensive speaking) include tasks that involve relatively long stretches of interactive discourse (interviews, role plays, discussions, games) and tasks of equally long duration that involve less interaction (speeches, telling longer stories, and extended explanations and translations). The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.

Interview 

When "oral production assessment" is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit downjn a direct face-to face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension. 

Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:
1. Warm-up
2. Level check
3. Probe
4. Wind-down

The success of an oral interview will depend on
  • clearly specifying administrative procedures of the assessment (practicality),
  • focusing the questions and probes on the purpose of the assessment (validity),
  • appropriately eliciting an optimal amount and quality of oral production from the test-taker (biased for best performance), and
  • creating a consistent, workable scoring system (reliability). 

Role Play

Role playing is a popular pedagogical activity in communicative language-teaching classes. Within the constraints set forth by the guidelines, it frees students to be somewhat creative in their linguistic output. In some versions, role play allows some rehearsal time so that students can map out what they are going to say. And it has the effect of lowering anxieties as students can, even for a few moments, take on the persona of someone other than themselves.

Discussions and Conversations

As formal assessment devices, discussions and conversations with and among students are difficult to specify and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as 
  • topic nomination, maintenance, and termination; 
  • attention getting, interrupting, floor holding, control; 
  • clarifying, questioning, paraphrasing;
  • comprehension signals (nodding, "uh-huh," "hmm," etc.);
  • negotiating meaning;
  • intonation patterns for pragmatic effect;
  • kinesics, eye contact, proxemics, body language; and
  •  politeness, formality, and other sociolinguistic factors. 

Games

Among informal assessment devices are a variety of games that directly involve language production.

ORAL PROFICIENCY INTERVIEW (OPI) 

The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century, the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on the Teaching of Foreign Languages (ACTFL). The latter, a professional society for research on foreign language instruction and assessment, has now become the principal body for promoting the use of the OPI. The OPI is widely used across dozens of languages around the world. Only certified examiners are authorized to administer the OPI; certification workshops are available, at costs of around $700 for ACTFL members, through ACTFL at selected sites and conferences throughout the year.

Bachman (1988, p. 149) also pointed out that the validity of the OPI simply cannot be demonstrated because it confounds abilities with elicitation procedures in its design, and it provides only a single rating, which has no basis in either theory or research.

DESIGNING ASSESSMENT TASKS: EXTENSIVE SPEAKING

Extensive speaking tasks involve complex, relatively lengthy stretches of discourse. They are frequently variations on monologues, usually with minimal verbal interaction. 

Oral Presentations 

In the academic and professional arenas, it would not be uncommon to be called on to present a report, a paper, a marketing plan, a sales idea, a design of a new product, or a method. A summary of oral assessment techniques would therefore be incomplete without some consideration of extensive speaking tasks. Once again the rules for effective assessment must be invoked: (a) specify the criterion, (b) set appropriate tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring procedures. And once again, scoring is the key assessment challenge.

For oral presentations, a checklist or grid is a common means of scoring or evaluation. Holistic scores are tempting to use for their apparent practicality, but they may obscure the variability of performance across several subcategories, especially the two major components of content and delivery. Following is an example of a checklist for a prepared oral presentation at the intermediate or advanced level of English.
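
As a rough illustration of why subscores matter, here is a minimal sketch in Python; the checklist items are hypothetical stand-ins rather than Brown's actual checklist, and the point is only that separate content and delivery subscores preserve information a single holistic score would obscure:

    # Hypothetical checklist items grouped under the two major
    # components of an oral presentation.
    CHECKLIST = {
        "content": ["purpose is clear",
                    "main points are supported",
                    "material suits the audience"],
        "delivery": ["speech is fluent and audible",
                     "eye contact is maintained",
                     "visual aids are used effectively"],
    }

    def subscores(checked):
        # Proportion of items checked off within each component.
        return {part: sum(checked[item] for item in items) / len(items)
                for part, items in CHECKLIST.items()}

    marks = {"purpose is clear": True,
             "main points are supported": True,
             "material suits the audience": True,
             "speech is fluent and audible": False,
             "eye contact is maintained": False,
             "visual aids are used effectively": True}
    print(subscores(marks))
    # {'content': 1.0, 'delivery': 0.333...}; a holistic average
    # (0.67) would hide the weak delivery component.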

Picture-Cued Story-Telling 

One of the most common techniques for eliciting oral production is through visual pictures, photographs, diagrams, and charts. We have already looked at this elicitation device for intensive tasks, but at this level we consider a picture or a series of pictures as a stimulus for a longer story or description.

Retelling a Story, News Event 

In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task discussed above (pages 161-162) in that it is a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.

Translation (of Extended Prose) 

Translation of words, phrases, or short sentences was mentioned under the category of intensive speaking. Here, longer texts are presented for the test-taker to read in the native language and then translate into English. Those texts could come in many forms: dialogue, directions for assembly of a product, a synopsis of a story or play or movie, directions on how to find something on a map, and other genres. The advantage of translation is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Saturday, May 2, 2020

Assignment 8
Summary of Beyond Tests: Alternatives in Assessment

BEYOND TESTS: ALTERNATIVES IN
ASSESSMENT

The defining characteristics of the various alternatives in assessment that have been commonly used across the profession were aptly summed up by Brown and Hudson (1998, pp. 654-655). Alternatives in assessment:
  1. require students to perform, create, produce, or do something;
  2. use real-world contexts or simulations;
  3. are nonintrusive in that they extend the day-to-day classroom activities;
  4. allow students to be assessed on what they normally do in class every day;
  5. use tasks that represent meaningful instructional activities;
  6. focus on processes as well as products;
  7. tap into higher-level thinking and problem-solving skills;
  8. provide information about both the strengths and weaknesses of students;
  9. are multiculturally sensitive when properly administered;
  10. ensure that people, not machines, do the scoring, using human judgment;
  11. encourage open disclosure of standards and rating criteria; and
  12. call upon teachers to perform new instructional and assessment roles.

THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK

The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. Tests, especially large-scale standardized tests, tend to be one-shot performances that are timed, multiple-choice, decontextualized, and norm-referenced, and that foster extrinsic motivation. On the other hand, tasks like portfolios, journals, and self-assessment are
  • open-ended in their time orientation and format,
  • contextualized to a curriculum,
  • referenced to the criteria (objectives) of that curriculum, and
  • likely to build intrinsic motivation.
Formal standardized tests are almost by definition highly practical, reliable instruments. They are designed to minimize time and money on the part of test designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives such as portfolios, conferencing with students on drafts of written work, or observations of learners over time all require considerable time and effort on the part of the teacher and the student.

PERFORMANCE-BASED ASSESSMENT

O'Malley and Valdez Pierce (1996) considered performance-based assessment to be a subset of authentic assessment. In other words, not all authentic assessment is performance-based. One could infer that reading, listening, and thinking have many authentic manifestations, but since they are not directly observable in and of themselves, they are not performance-based. According to O'Malley and Valdez Pierce (p. 5), the following are characteristics of performance assessment:
  1. Students make a constructed response.
  2. They engage in higher-order thinking, with open-ended tasks.
  3. Tasks are meaningful, engaging, and authentic.
  4. Tasks call for the integration of language skills.
  5. Both process and product are assessed.
  6. Depth of a student's mastery is emphasized over breadth.
Performance-based assessment needs to be approached with caution. It is tempting for teachers to assume that if a student is doing something, then the process has fulfilled its own goal and the evaluator needs only to make a mark in the grade book that says "accomplished" next to a particular competency. In reality, performances as assessment procedures need to be treated with the same rigor as traditional tests. This implies that teachers should
  • state the overall goal of the performance,
  • specify the objectives (criteria) of the performance in detail,
  • prepare students for performance in stepwise progressions,
  • use a reliable evaluation form, checklist, or rating sheet,
  • treat performances as opportunities for giving feedback and provide that feedback systematically, and
  • if possible, utilize self- and peer-assessments judiciously.

PORTFOLIOS

According to Genesee and Upshur (1996), a portfolio is "a purposeful collection of students' work that demonstrates ... their efforts, progress, and achievements in given areas" (p. 99). Portfolios include materials such as
  • essays and compositions in draft and final forms;
  • reports, project outlines;
  • poetry and creative prose;
  • artwork, photos, newspaper or magazine clippings;
  • audio and/or video recordings of presentations, demonstrations, etc.;
  • journals, diaries, and other personal reflections;
  • tests, test scores, and written homework exercises;
  • notes on lectures; and
  • self- and peer-assessments, comments, evaluations, and checklists.
Gottlieb (1995) suggested a developmental scheme for considering the nature and purpose of portfolios, using the acronym CRADLE to designate six possible attributes of a portfolio:
  1. Collecting
  2. Reflecting
  3. Assessing
  4. Documenting
  5. Linking
  6. Evaluating
The advantages of engaging students in portfolio development have been extolled in a number of sources (Genesee & Upshur, 1996; O'Malley & Valdez Pierce, 1996; Brown & Hudson, 1998; Weigle, 2002). A synthesis of those characteristics gives us a number of potential benefits. Portfolios
  • foster intrinsic motivation, responsibility, and ownership,
  • promote student-teacher interaction with the teacher as facilitator,
  • individualize learning and celebrate the uniqueness of each student,
  • provide tangible evidence of a student's work,
  • facilitate critical thinking, self-assessment, and revision processes,
  • offer opportunities for collaborative work with peers, and
  • permit assessment of multiple dimensions of language learning.
At the same time, care must be taken lest portfolios become a haphazard pile of "junk" the purpose of which is a mystery to both teacher and student. Portfolios can fail if objectives are not clear, if guidelines are not given to students, if systematic periodic review and feedback are not present, and so on. Sometimes the thought of asking students to develop a portfolio is a daunting challenge, especially for new teachers and for those who have never created a portfolio on their own. Successful portfolio development will depend on following a number of steps and guidelines.
  1. State objectives clearly.
  2. Give guidelines on what materials to include.
  3. Communicate assessment criteria to students.
  4. Designate time within the curriculum for portfolio development.
  5. Establish periodic schedules for review and conferencing.
  6. Designate an accessible place to keep portfolios.
  7. Provide positive, washback-giving final assessments.

JOURNALS

A journal is a log (or "account") of one's thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written with little attention to structure, form, or correctness. Learners can articulate their thoughts without the threat of those thoughts being judged later (usually by the teacher). Sometimes journals are rambling sets of verbiage that represent a stream of consciousness with no particular point, purpose, or audience. Fortunately, models of journal use in educational practice have sought to tighten up this style of journal in order to give them some focus (Staton et al., 1987). The result is the emergence of a number of overlapping categories or purposes in journal writing, such as the following:
  • language-learning logs
  • grammar journals
  • responses to readings
  • strategies-based learning logs
  • self-assessment reflections
  • diaries of attitudes, feelings, and other affective factors
  • acculturation logs
It is important to turn the advantages and potential drawbacks of journals into positive general steps and guidelines for using journals as assessment instruments. The following steps are not coincidentally parallel to those cited above for portfolio development:
  1. Sensitively introduce students to the concept of journal writing.
  2. State the objective(s) of the journal.
  3. Give guidelines on what kinds of topics to include.
  4. Carefully specify the criteria for assessing or grading journals.
  5. Provide optimal feedback in your responses.
  6. Designate appropriate time frames and schedules for review.
  7. Provide formative, washback-giving final comments.

CONFERENCES AND INTERVIEWS

Conferences are not limited to drafts of written work. Including portfolios and journals discussed above, the list of possible functions and subject matter for conferencing is substantial:
  • commenting on drafts of essays and reports
  • reviewing portfolios
  • responding to journals
  • advising on a student's plan for an oral presentation
  • assessing a proposal for a project
  • giving feedback on the results of performance on a test
  • clarifying understanding of a reading
  • exploring strategies-based options for enhancement or compensation
  • focusing on aspects of oral production
  • checking a student's self-assessment of a performance
  • setting personal goals for the near future
  • assessing general progress in a course
Discussions of alternatives in assessment usually encompass one specialized kind of conference: an interview. This term is intended to denote a context in which a teacher interviews a student for a designated assessment purpose. Interviews may have one or more of several possible goals, in which the teacher
  • assesses the student's oral production,
  • ascertains a student's needs before designing a course or curriculum,
  • seeks to discover a student's learning styles and preferences,
  • asks a student to assess his or her own performance, and
  • requests an evaluation of a course.

OBSERVATIONS

All teachers, whether they are aware of it or not, observe their students in the classroom almost constantly. Virtually every question, every response, and almost every nonverbal behavior is, at some level of perception, noticed. All those intuitive perceptions are stored as little bits and pieces of information about students that can form a composite impression of a student's ability. Without ever administering a test or a quiz, teachers know a lot about their students. In fact, experienced teachers are so good at this almost subliminal process of assessment that their estimates of a student's competence are often highly correlated with actual independently administered test scores. (See Acton, 1979, for an example.)

Designing a system for observing is no simple task. Recording your observations can take the form of anecdotal records, checklists, or rating scales. Anecdotal records should be as specific as possible in focusing on the objective of the observation, but they are so varied in form that to suggest a format here would be counterproductive. Their very purpose is more note-taking than record-keeping. The key is to devise a system that maintains the principle of reliability as closely as possible. Checklists are a viable alternative for recording observation results. Some checklists of student classroom performance, such as the COLT observation scheme devised by Spada and Frohlich (1995), are elaborate grids referring to such variables as
  • whole-class, group, and individual participation,
  • content of the topic,
  • linguistic competence (form, function, discourse, sociolinguistic),
  • materials being used, and
  • skill (listening, speaking, reading, writing).

SELF- AND PEER-ASSESSMENTS

Self-assessment derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of successful acquisition of any set of skills.

Peer-assessment appeals to similar principles, the most obvious of which is cooperative learning. Many people go through a whole regimen of education from kindergarten up through a graduate degree and never come to appreciate the value of collaboration in learning, the benefit of a community of learners capable of teaching each other something. Peer-assessment is simply one arm of a plethora of tasks and procedures within the domain of learner-centered and collaborative education.
  • Types of Self- and Peer-Assessment
It is important to distinguish among several different types of self- and peer-assessment and to apply them accordingly. I have borrowed from widely accepted classifications of strategic options to create five categories of self- and peer-assessment:
(1) direct assessment of performance, (2) indirect assessment of performance, (3) metacognitive assessment, (4) assessment of socioaffective factors, and (5) student self-generated tests.

  1. Assessment of [a specific] performance. In this category, a student typically monitors him- or herself, in either oral or written production, and renders some kind of evaluation of performance. The evaluation takes place immediately or very soon after the performance. Thus, having made an oral presentation, the student (or a peer) fills out a checklist that rates performance on a defined scale. Or perhaps the student views a video-recorded lecture and completes a self-corrected comprehension quiz. A journal may serve as a tool for such self-assessment. Peer editing is an excellent example of direct assessment of a specific performance.
  2. Indirect assessment of [general] competence. Indirect self- or peer-assessment targets larger slices of time with a view to rendering an evaluation of general ability, as opposed to one specific, relatively time-constrained performance. The distinction between direct and indirect assessments is the classic competence-performance distinction. Self- and peer-assessments of performance are limited in time and focus to a relatively short performance.
  3. Metacognitive assessment [for setting goals]. Some kinds of evaluation are more strategic in nature, with the purpose not just of viewing past performance or competence but of setting goals and maintaining an eye on the process of their pursuit. Personal goal-setting has the advantage of fostering intrinsic motivation and of providing learners with that extra-special impetus from having set and accomplished one's own goals. Strategic planning and self-monitoring can take the form of journal entries, choices from a list of possibilities, questionnaires, or cooperative (oral) pair or group planning.
  4. Socioaffective assessment. Yet another type of self- and peer-assessment comes in the form of methods of examining affective factors in learning. Such assessment is quite different from looking at and planning linguistic aspects of acquisition. It requires looking at oneself through a psychological lens and may not differ greatly from self-assessment across a number of subject-matter areas or for any set of personal skills.
  5. Student-generated tests. A final type of assessment that is not usually classified strictly as self- or peer-assessment is the technique of engaging students in the process of constructing tests themselves. The traditional view of what a test is would never allow students to engage in test construction, but student-generated tests can be productive, intrinsically motivating, autonomy-building processes.

  • Guidelines for Self- and Peer-Assessment
Self- and peer-assessment are among the best possible formative types of assessment and possibly the most rewarding, but they must be carefully designed and administered for them to reach their potential. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully.

  1. Tell students the purpose of the assessment. Self-assessment is a process that many students, especially those in traditional educational systems, will initially find quite uncomfortable.
  2. Define the task(s) clearly. If you are offering a rating sheet or questionnaire, the task is not complex, but an open-ended journal entry could leave students perplexed about what to write. Guidelines and models will be of great help in clarifying the procedures.
  3. Encourage impartial evaluation of performance or ability. One of the greatest drawbacks to self-assessment is the threat of subjectivity.
  4. Ensure beneficial washback through follow-up tasks. It is not enough to simply toss a self-checklist at students and then walk away. Systematic follow-up can be accomplished through further self-analysis, journal reflection, written feedback from the teacher, conferencing with the teacher, purposeful goal-setting by the student, or any combination of the above.


  • A Taxonomy of Self- and Peer-Assessment Tasks

An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with the other alternatives in assessment analyzed in this chapter. Practicality can achieve a moderate level with such procedures as checklists and questionnaires, while reliability risks remaining at a low level, given the variation within and across learners. Once students accept the notion that they can legitimately assess themselves, face validity can be raised from what might otherwise be a low level. Adherence to course objectives will maintain a high degree of content validity. Authenticity and washback both have very high potential because students are centering on their own linguistic needs and are receiving useful feedback.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Friday, April 10, 2020

Assignment 7
Summary of Standards-Based Assessment

STANDARDS-BASED ASSESSMENT

A standardized test is an assessment instrument for which there are uniform procedures for administration, design, scoring, and reporting; it is also a procedure that, through repeated administrations and ongoing research, demonstrates criterion and construct validity. But a third, and perhaps the most important, element of standardized testing is the presupposition of an accepted set of standards on which to base the procedure. A history of standardized testing in the United States reveals that during most of the decades in the middle of the twentieth century, standardized tests enjoyed a popularity and growth that was almost unchallenged.

Toward the end of the twentieth century, such claims began to be challenged on all fronts (see Medina & Neill, 1990; Kohn, 2000), and at the vanguard of those challenges were the teachers of those millions of children.

ELD STANDARDS

The process of designing and conducting appropriate periodic reviews of ELD standards involves dozens of curriculum and assessment specialists, teachers, and researchers (Fields, 2000; Kuhlman, 2001).

Standards-setting is a global challenge. In many non-English-speaking countries, English is now a required subject, starting as early as the first grade in some countries and by the seventh grade in virtually every country worldwide. In Japan and Korea, for example, a "communicative" curriculum in English is required from third grade onward.

ELD ASSESSMENT

The development of standards obviously implies the responsibility for correctly assessing their attainment. As standards-based education became more accepted in the 1990s, many school systems across the United States found that the standardized tests of past decades were not in line with newly developed standards. Thus began the interactive process not only of developing standards but also of creating standards-based assessments.

The process of administering a comprehensive, valid, and fair assessment of ELD students continues to be perfected. Stringent budgets within departments of education worldwide predispose many in decision-making positions to rely on traditional standardized tests for ELD assessment, but rays of hope lie in the exploration of more student-centered approaches to learner assessment. Stack, Stack, and Fern (2002), for example, reported on a portfolio assessment system in the San Francisco Unified School District called the Language and Literacy Assessment Rubric (LALAR), in which multiple forms of evidence of students' work are collected. Teachers observe students year-round and record their observations on scannable forms.

CASAS AND SCANS

At the higher levels of education (colleges, community colleges, adult schools, language schools, and workplace settings), standards-based assessment systems have also had an enormous impact. The Comprehensive Adult Student Assessment System (CASAS), for example, is a program designed to provide broadly based assessments of ESL curricula across the United States. CASAS assessment instruments are used to measure functional reading, writing, listening, and speaking skills, as well as higher-order thinking skills. CASAS scaled scores report learners' language ability levels in employment and adult life-skills contexts.

A similar set of standards compiled by the U.S. Department of Labor, now known as the Secretary's Commission on Achieving Necessary Skills (SCANS), outlines competencies necessary for language in the workplace. The competencies cover language functions in terms of
  • resources (allocating time, materials, staff, etc.),
  • interpersonal skills, teamwork, customer service, etc.,
  • information processing, evaluating data, organizing files, etc.,
  • systems (e.g., understanding social and organizational systems), and
  • technology use and application.

TEACHER STANDARDS

Kuhlman (2001) emphasized the importance of teacher standards in three domains:

1. linguistics and language development
2. culture and the interrelationship between language and culture
3. planning and managing instruction

Professional teaching standards have also been the focus of several committees in the international association of Teachers of English to Speakers of Other Languages (TESOL).

TESOL's standards committee advocates performance-based assessment of teachers for the following reasons:
  • Teachers can demonstrate the standards in their teaching.
  • Teaching can be assessed through what teachers do with their learners in their classrooms or virtual classrooms (their performance).
  • This performance can be detailed in what are called "indicators": examples of evidence that the teacher can meet a part of a standard.
  • The processes used to assess teachers need to draw on complex evidence of performance. In other words, indicators are more than simple "how to" statements.
  • Performance-based assessment of the standards is an integrated system. It is neither a checklist nor a series of discrete assessments.
  • Each assessment within the system has performance criteria against which the performance can be measured.
  • Performance criteria identify to what extent the teacher meets the standard.
  • Student learning is at the heart of the teacher's performance.

THE CONSEQUENCES OF STANDARDS-BASED AND STANDARDIZED TESTING

The widespread global acceptance of standardized tests as valid procedures for assessing individuals in many walks of life brings with it a set of consequences that fall under the category of consequential validity. Standardized tests offer high levels of practicality and reliability and are often supported by impressive construct validation studies.
  • Test Bias
It is no secret that standardized tests involve a number of types of test bias. That bias comes in many forms: language, culture, race, gender, and learning styles (Medina & Neill, 1990). The National Center for Fair and Open Testing, in its bimonthly newsletter Fair Test, every year offers dozens of instances of claims of test bias from teachers, parents, students, and legal consultants. For example, reading selections in standardized tests may use a passage from a literary piece that reflects a middle-class, white, Anglo-Saxon norm.
  • Test-Driven Learning and Teaching 
Yet another consequence of standardized testing is the danger of test-driven learning and teaching. Test-driven learning is a worldwide issue. In Japan, Korea, and Taiwan, to name just a few countries, students approaching their last year of secondary school focus obsessively on passing the year-end college entrance examination, a major section of which is English (Kuba, 2002).

ETHICAL ISSUES: CRITICAL LANGUAGE TESTING

Shohamy (1997, p. 2) further defines the issue: "Tests represent a social technology deeply embedded in education, government, and business; as such they provide the mechanism for enforcing power and control."

The issues of critical language testing are numerous:
  • Psychometric traditions are challenged by interpretive, individualized procedures for predicting success and evaluating ability.
  • Test designers have a responsibility to offer multiple modes of performance to account for varying styles and abilities among test-takers. 
  • Tests are deeply embedded in culture and ideology.
  • Test-takers are political subjects in a political context.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Jumat, 03 April 2020

Assignment 6

The chart depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.

Modes of elicitation and response

Elicitation mode:

Oral (students listen)

The teacher gives instructions orally, using the whiteboard or visual aids, and also shows or draws an image.

Written (students read)

The teacher provides a text or short story and asks students to pay attention to how the story works and to express it in their own words. Then, the teacher prepares some questions on the text.

Response mode

Oral:
Students will try to explain what is happening in the picture.

Written:
Students discuss the questions on the text in pairs within their groups, after which they report back what was discussed. If a student is reluctant to talk, his or her partner helps write the answers down before they are commented on by other groups.

Kamis, 26 Maret 2020

Assignment 5

Summary Designing Classroom Language Tests

A. Test Types
1. Language Aptitude Tests

One type of test-although admittedly not a very common one-predicts a person's success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language.

2. Proficiency Tests

A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL) produced by the Educational Testing Service.

3. Placement Tests

Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.

The English as a Second Language Placement Test (ESLPT) at San Francisco State University has three parts. In Part I, students read a short article and then write a summary essay. In Part II, students write a composition in response to an article. Part III is multiple-choice: students read an essay and identify grammar errors in it. The maximum time allowed for the test is three hours.

4. Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention.

A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner's production. After multiple listenings, the administrator produces a checklist of errors in five separate categories, each of which has several subcategories. The main categories include:

  • stress and rhythm,
  • intonation,
  • vowels,
  • consonants, and
  • other factors. 


5. Achievement Tests

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question.

Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. An effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course. This washback contributes to the formative nature of such tests.
The specifications for an achievement test should be determined by

  • the objectives of the lesson, unit, or course being assessed,
  • the relative importance (or weight) assigned to each objective,
  • the tasks employed in classroom lessons during the unit of time,
  • practicality issues, such as the time frame for the test and turnaround time, and
  • the extent to which the test structure lends itself to formative washback. 


B. Some Practical Steps to Test Construction

1. Assessing Clear, Unambiguous Objectives

In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test.

2. Drawing Up Test Specifications

Test specifications for classroom use can be a simple and practical outline of your test.

3. Devising Test Tasks

You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe).

4. Designing Multiple-Choice Test Items

In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:
  • The technique tests only recognition knowledge.
  • Guessing may have a considerable effect on test scores.
  • The technique severely restricts what can be tested.
  • It is very difficult to write successful items.
  • Washback may be harmful.
  • Cheating may be facilitated. 

The two principles that stand out in support of multiple-choice formats are, of course, practicality and reliability.

Since there will be occasions when multiple-choice items are appropriate, consider the following four guidelines for designing multiple-choice items for classroom-based and large-scale situations (adapted from Gronlund, 1998, pp. 60-75, and J. D. Brown, 1996, pp. 54-57):

  • Design each item to measure a specific objective.
  • State both stem and options as simply and directly as possible.
  • Make certain that the intended answer is clearly the only correct one.
  • Use item indices to accept, discard, or revise items (a minimal sketch of two such indices follows this list).
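
Two indices commonly used for that last guideline are item facility (the proportion of test-takers who answer an item correctly) and item discrimination (how sharply an item separates high scorers from low scorers). The Python sketch below is a minimal illustration with hypothetical data, not code from the book or from Gronlund; the top-third/bottom-third split is one common convention.

# Item facility (IF): proportion of test-takers answering an item correctly.
# Item discrimination (ID): proportion correct in the top third minus the
# proportion correct in the bottom third, ranked by total test score.

def item_facility(responses):
    # responses: list of 0/1 scores on one item, one per test-taker
    return sum(responses) / len(responses)

def item_discrimination(item_scores, total_scores):
    # item_scores: 0/1 scores on one item; total_scores: whole-test scores,
    # both lists in the same test-taker order
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    third = len(order) // 3
    low, high = order[:third], order[-third:]
    p_high = sum(item_scores[i] for i in high) / len(high)
    p_low = sum(item_scores[i] for i in low) / len(low)
    return p_high - p_low

# Hypothetical data: nine students' scores on one item and on the whole test.
item = [1, 1, 0, 1, 0, 1, 1, 0, 0]
total = [95, 88, 42, 90, 55, 76, 81, 38, 60]
print(item_facility(item))               # about 0.56: neither too easy nor too hard
print(item_discrimination(item, total))  # 1.0 here; closer to 1.0 discriminates better

An item with very low or very high facility, or with weak discrimination, is a candidate to be revised or discarded.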


C. Scoring, Grading, and Giving Feedback 

1. Scoring

As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items within each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills, with some attention to reading and writing.
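
As a minimal illustration (the weights and point values below are hypothetical, not taken from the book), a weighted scoring plan for such an integrated-skills test could be computed like this in Python:

# Relative weights emphasize listening and speaking, as the course does.
weights = {"listening": 0.30, "speaking": 0.30, "reading": 0.20, "writing": 0.20}
raw     = {"listening": 18, "speaking": 25, "reading": 14, "writing": 8}    # points earned
maximum = {"listening": 20, "speaking": 30, "reading": 20, "writing": 10}   # points possible

# Each section contributes its percentage score scaled by its weight.
score = sum(weights[s] * raw[s] / maximum[s] for s in weights) * 100
print(f"{score:.1f}%")  # 82.0%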

2. Grading

Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on.
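
That naive mapping is easy to express in code; the sketch below (cutoffs below "B" are assumed by analogy, since the text does not list them) shows why the scheme seems simple, even though the wording "your first thought might be" hints that grading is rarely this mechanical.

def letter_grade(percent):
    # Absolute scale: 90-100 = A, 80-89 = B, and so on down to F.
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= cutoff:
            return grade
    return "F"

print(letter_grade(82.0))  # B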

3. Giving Feedback

A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. Washback is achieved when students can, through the testing experience, identify their areas of success and challenge.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Jumat, 20 Maret 2020

Assignment  4
Summary of the principles of language assessment: practicality, reliability, validity, authenticity, and washback.


  • Practicality

An effective test is practical. This means that it

  • is not excessively expensive,
  • stays within appropriate time constraints,
  • is relatively easy to administer, and
  • has a scoring/evaluation procedure that is specific and time-efficient.


A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.


  • Reliability

If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test.
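
One common way to quantify this kind of consistency is the test-retest method: give the test twice and correlate the two sets of scores. A minimal Python sketch with hypothetical scores (not from the book):

import statistics

def pearson(x, y):
    # Pearson correlation coefficient between two score lists.
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

first  = [78, 85, 62, 90, 71]   # scores on the first administration
second = [75, 88, 65, 92, 70]   # same students, second administration
print(f"r = {pearson(first, second):.2f}")  # r = 0.97; close to 1.0 = high reliability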


  • Student-Related Reliability 

The most common learner-related issue in reliability is caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).


  • Rater Reliability 

Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores of the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases.

Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness.
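
For categorical ratings, one standard way to check agreement between two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below uses hypothetical ratings; kappa itself is not discussed in this summary's source text.

from collections import Counter

def cohens_kappa(rater1, rater2):
    # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Two teachers rating the same ten essays on a three-level scale.
r1 = ["high", "mid", "mid", "low", "high", "mid", "low", "high", "mid", "low"]
r2 = ["high", "mid", "low", "low", "high", "mid", "low", "mid", "mid", "low"]
print(f"kappa = {cohens_kappa(r1, r2):.2f}")  # kappa = 0.70; 1.0 = perfect agreement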


  • Test Administration Reliability

Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of unreliability caused by the conditions of the test administration.


  • Test Reliability

Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit.


  • Validity

By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). We will look at these five types of evidence below.

  • Content-Related Evidence
  • Criterion-Related Evidence
  • Construct-Related Evidence
  • Consequential Validity
  • Face Validity



  • Authenticity

A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.


  • Washback

A facet of consequential validity, discussed above, is "the effect of testing on teaching and learning" (Hughes, 2003, p. 1), otherwise known among language-testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test.

"Cram" courses and "teaching to the test" are examples of such washback. Another form of washback that occurs more in classroom assessment is the information thAt "wa,shes back" to students in the form of useful diagnoses of strengths and weaknesses.Washback also includes the effects ofan assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in was~back effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Assignment 3
Analysis of the three principles of language assessment (practicality, reliability, and validity)

Practice Questions for the Junior High School National Exam in English (Latihan Soal Ujian Nasional Bahasa Inggris SMP)

A. Topic: Announcement
Questions 1-2 are based on the following text.

To All English Literature Class Students
(1) Exam will be held on Wednesday, October 23rd 2014, at 10 a.m. in room 143. Please read A Clean and Well-Lighted Place by E. Hemingway and Queenie by Alice Munro as they will be the subject materials. Copies of both are available in my locker, feel free to grab them!
(5) It will be an open book exam. You can bring books as many as you can, but no internet and cell phones! Cheaters will be expelled!
Professor Smith

1. To whom the announcement is made?
A. Students of English Literature Class
B. Professor Smith
C. E. Hemingway and Alice Munro
D. The readers

2. Which statement is correct based on the text?
A. Students can't open their books in the exam.
B. Using internet from the cell phones is allowed.
C. Prof. Smith put copies of subject materials inside her locker.
D. The exam will be held at 8 a.m.

B. Topic: Notice
Questions 3 to 5 are based on a notice that appeared as an image in the original post (not reproduced here).

3. The notice above means.....
A. wash hand before work
B. wash hand before eat
C. use the shop for washing hand
D. wash hand in the bathroom

4. The statement based on the text is
A. The notice is for parents.
B. The notice is for students.
C. The notice is for employees.
D. The notice is for the guest.

5. Where you can find the notice?
A. In the market
B. In the class
C. In the company
D. In the street

C. Topic: Narrative
Questions 6-9 are based on the following text.

(1) A hen was so jealous of her friend, a goose, who could lay golden eggs. The farmer was very caring with his goose and he always fed the goose the best corn and wheat, as opposed to other livestock which were provided only with the usual fodder.
(5) "The goose is so lucky. She lays golden eggs so she gets more affection from our master," said the hen.
(7) One day, the farmer came into the barn and said greedily, "I wonder if there are many golden eggs inside the goose." Then, he captured the goose, took her to his home, and butchered her. The hen, knowing the fate of the poor goose, said, "I am so lucky! If only I had laid golden eggs, I would have been dead instead of the goose!"

6. What is the appropriate title for the text?
A. The Goose
B. The Jealous Hen
C. The Greedy Farmer
D. The Pursuit of Golden Eggs

7. What is the main idea of paragraph 3?
A. The hen was grateful because of her luck
B. The goose laid more eggs
C. The farmer butchered the hen
D. The hen was jealous with the goose

8. Which of the following statement is NOT mentioned in the text?
A. The goose was dead in the end.
B. The farmer always feed the goose the best food.
C. The goose was the farmer's favorite.
D. The hen never felt unlucky.

"Then, he captured the goose, took her to his home, and butchered her."
9. What is the closest meaning of the underlined word ("butchered")?
A. Intimidated
B. Fed
C. Cared
D. Slaughtered

D. Topic: Advertisement
Questions 10-11 are based on an advertisement that appeared as an image in the original post (not reproduced here).

10. What is advertised from the text above?
A. Tea
B. Beverages
C. Perfume
D. Drinks

"The Most Treasured Name in Perfume".
11. What is the meaning of the underlined phrase?
A. Most valuable
B. Most expensive
C. Most helpful
D. Most desired

Analysis of practicality, reliability, and validity


  • Practicality

This test meets the practicality criterion: the questions are economical and require no funding from students, they are easy to follow and easily understood by students, they stay within appropriate time limits, and the test procedures are specific and time-efficient.


  • Reliability

Reliability is an index that shows the extent to which an instrument is a dependable and consistent measure. A test is said to be reliable if it gives consistent, stable results whenever it is used. These questions can indicate the stability of ninth-grade junior high school students' performance, which means the test can be used to measure students' level of knowledge in English.


  • Validity

Validation here concerns how we determine whether or not the questions can distinguish groups of test-takers on the aspects being measured. The validity of the questions here shows an index of discrimination, distinguishing high-ability test-takers from low-ability test-takers.

References:
https://blog.ruangguru.com/latihan-soal-ujian-nasional-bahasa-inggris-smp-dan-pembahasannya

Minggu, 01 Maret 2020

Assignment 2
1. Explain types and objectives of assessment (achievement, diagnostic, placement, proficiency, and aptitude tests)

  • The purpose of an achievement assessment/test is to determine whether course objectives have been met and skills acquired by the end of a period of instruction. Achievement tests should be limited to particular material addressed in a curriculum within a particular time frame. Achievement tests are summative because they are administered at the end of a unit/term of study. They analyze the extent to which students have acquired language that has already been taught.
  • The purpose of a diagnostic test is to diagnose specific aspects of a language. These tests offer a checklist of features for the teacher to use in discovering difficulties. Diagnostic tests should elicit information on what students need to work on in the future; therefore the test will typically offer more detailed, subcategorized information on the learner. For example, a writing diagnostic test would first elicit a writing sample from the students. Then, the teacher would identify the organization, content, spelling, grammar, or vocabulary of their writing. Based on that identification, the teacher would know which student needs should receive special focus.
  • The purpose of a placement test is to place a student into a particular level or section of a language curriculum or school. It usually includes a sampling of the material to be covered in the various courses in a curriculum. A student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult. Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, multiple-choice, and gap-filling formats. One example of a placement test is the English as a Second Language Placement Test (ESLPT) at San Francisco State University.
  • The purpose of a proficiency test is to test global competence in a language. It tests overall ability regardless of any training test-takers have previously had in the language. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and listening comprehension. One example of a standardized proficiency test is the TOEFL.
  • The purpose of an aptitude test is to predict a person's success prior to exposure to the foreign language. According to John Carroll and Stanley Sapon (the authors of the MLAT), language aptitude tests do not indicate whether or not an individual can learn a foreign language; they indicate how well an individual can learn a foreign language in a given amount of time and under given conditions. In other words, this test is done to determine how quickly and easily a learner can learn a language in a language course or language training program.

2. Identify issues in language assessment


  • Behaviorism

In the middle of the 20th century, teaching and testing were influenced by behaviorism. Testing focused on sentence structure, translation from L1 to L2, and grammar and vocabulary items.

  • Integrative approach

The integrative approach refers to a test that seeks to integrate knowledge of the systematic components of language (pronunciation, grammar, vocabulary) with an understanding of context (McNamara, 2000). In an integrative test, language is not viewed as discrete components isolated from their context. According to Heaton (1989), the integrative approach does not separate the skills as discrete-point tests do, but requires students to use more than one skill simultaneously.

  • Communicative language testing

A communicative language test is a test of learners' language performance in meaningful, real-life situations. The test addresses not only the learners' competence, that is, what they know about the language and about how to use it, but also their performance, that is, the extent to which they are able to actually demonstrate that competence in meaningful, real-life situations.

  • Performance-based assessment

Performance-based assessment measures students' ability to apply the skills and knowledge learned from a unit or units of study. Typically, the task challenges students to use their higher-order thinking skills to create a product or complete a process (Chun, 2010).

3. Identify hot topics relating to classroom-based assessment

Gardner groups student capabilities into eight broad categories (each student's unique learning style is a combination of these intelligences):
  • Logical/mathematical (uses numbers effectively)
  • Visual/spatial (is artistically or spatially perceptive)
  • Bodily/kinesthetic (excels at tasks that require physical movement)
  • Musical (perceives and/or expresses musical forms and patterns)
  • Linguistic (uses words effectively)
  • Interpersonal (responds well to others)
  • Intrapersonal (is reflective and inner-directed)
  • Naturalist (makes distinctions in the natural world)

According to Mueller (2008), traditional assessment refers to selecting a response and mainly measures students' recall of acquired information. This can be done through multiple-choice tests, cloze tests, true-false tests, matching, and the like. Students typically choose an answer or recall information to complete the assessment.

Authentic assessment is a form of task that requires learners to demonstrate meaningful performance in the real world, which is the application of the essence of knowledge and skills. Authentic assessment emphasizes the ability of learners to demonstrate their knowledge in a tangible and meaningful way. Assessment activities are not just asking or tapping knowledge that has been known to learners, but the real performance of knowledge that has been mastered.

A computer-based test is a test conducted using a computer, with items typically in multiple-choice (objective) format. For certain conditions, we can use this test to assess students.

References:

www.google.com/amp/s/thejoyoflanguageassessment.wordpress.com/2012/12/19/kind-of-test/amp/
https://www.slideshare.net/mobile/RahilaKhan6/assessments-concepts-and-issues
http://www.proenglishteacher.com/2015/04/asesmen-pengertian-asesmen-asesmen.html?m=1

Minggu, 23 Februari 2020

The definitions of test, measurement, assessment, and evaluation, each from two different authors:
  • According to Riduwan (2006: 37), tests are a series of questions used to measure the skills, knowledge, intelligence, abilities, or talents that individuals or groups have.
  • According to Rusli Lutan (2000: 21), tests are instruments used to obtain information about an individual or an object.
  • According to Cangelosi (1995: 21), measurement is the process of data collection through empirical observation used to gather information relevant to a determined purpose. In this case, teachers assess students' achievements by reading or observing what students do, by observing their performance, by hearing what they say, and by using senses such as seeing, hearing, touching, smelling, and feeling.
  • According to Wiersma & Jurs (1990), measurement is a numerical assessment of the facts of objects to be measured by certain criteria or units.
  • According to Djemari Mardapi (1999: 8), assessment is the activity of interpreting or describing measurements.
  • According to Palomba and Banta (1999), assessment is the systematic collection, review, and use of information about educational programs undertaken for the purpose of improving student learning and development.
  • According to Frey, Barbara A., and Susan W. Alman (2003), evaluation is the systematic process of collecting, analyzing, and interpreting information to determine the extent to which pupils are achieving instructional objectives.
  • According to Mehrens & Lehmann (1978: 5), evaluation is a process of planning, obtaining, and providing indispensable information to make alternative decisions.

The definitions of formative, summative, formal, and informal assessment:
  • According to Winkel, formative evaluation is the use of tests during the ongoing learning process, allowing students and teachers to get feedback on progress.
  • According to Winkel, summative evaluation is the use of tests at the end of a particular teaching period, covering some or all of the lesson units taught in a semester, or even an entire field of study.
  • Formal assessments are systematic, preplanned methods of testing students that are used to determine how well students have learned the material that is being taught in the classroom. In other words, formal assessments provide a way to know what the students know.
  • Informal assessments are those spontaneous forms of assessment that can easily be incorporated into day-to-day classroom activities and that measure students' performance and progress. Informal assessments are content- and performance-driven.

References
http://amrinasr.blogspot.com/2016/06/pengukuran-measurement-penilaian.html?m=1
https://httadityachandra.blogspot.com/2016/08/pengertian-pengukuran-measurement.html
https://www.google.com/amp/s/imammalik11.wordpress.com/2015/01/10/penilaian-formatif-dan-sumatif/amp/
https://abdao.wordpress.com/2015/07/18/formal-and-informal-assessments/
https://study.com/academy/lesson/formal-assessment-examples-types-quiz.html
