Friday, 20 March 2020

Assignment 4
Summary: the principles of language assessment include practicality, reliability, validity, authenticity, and washback.


  • Practicality

An effective test is practical. This means that it

  • is not excessively expensive,
  • stays within appropriate time constraints,
  • is relatively easy to administer, and
  • has a scoring/evaluation procedure that is specific and time-efficient.


A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.


  • Reliability

If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test.


  • Student-Related Reliability 

The most common learner-related issue in reliability is caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).


  • Rater Reliability 

Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores of the same test, possibly for lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases.

Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness.


  • Test Administration Reliability

Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of unreliability caused by the conditions of the test administration.


  • Test Reliability

Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit.


  • Validity

By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). We will look at these five types of evidence below.

  • Content-Related Evidence
  • Criterion-Related Evidence
  • Construct-Related Evidence
  • Consequential Validity
  • Face Validity



  • Authenticity

A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.


  • Washback

A facet of consequential validity, discussed above, is "the effect of testing on teaching and learning" (Hughes, 2003, p. 1), otherwise known among language-testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test.

"Cram" courses and "teaching to the test" are examples of such washback. Another form of washback that occurs more in classroom assessment is the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.

References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.
