
PERFORMANCE ASSESSMENT: AN OLD FRIEND REDISCOVERED


Chapter Objectives
As a result of studying the material presented in this chapter, reflecting on that material, and completing the learning exercises presented at the end of the chapter, you will:

1.    Master content knowledge:
a.    Understand the roles of objectivity and subjectivity in performance assessment.
b.    Know the kinds of achievement targets that can be reflected in performance assessments.
c.    List three basic parts and nine design decisions that comprise the steps in performance assessment development.
d.    Specify classroom assessment context factors to consider in deciding whether or when to use performance assessments.
e.    State considerations in sampling student performance via performance exercises.
f.     Know specific guidelines for the construction of performance exercises and scoring schemes.
g.    Specify ways to bring students into the performance assessment process as a teaching strategy.

2.    Be able to use that knowledge to reason as follows:
a.    Identify the kinds of skills and products that can form the basis of performance assessment exercises.
b.    Transform those important learnings into quality exercises and scoring criteria.

3.    Become proficient at the following skills:
a.    Be able to carry out the steps in designing performance assessments.
b.    Evaluate previously developed performance assessments to determine their quality.

4.    Be able to create quality products of the following sorts:
a.    Performance assessments that meet standards of quality.

5.    Attain the following affective outcomes:
a.    Understand the importance of knowing about sound performance assessment.
b.    Value performance assessment as a viable option for use in the classroom.
c.    See performance assessment as a valuable instructional tool in which students can and should be full partners.
d.    Regard performance assessment with caution, valuing the need to adhere to rigorous standards of quality in development and use.
The education community has discovered performance assessment methodology! Across the land, we are in a frenzy to learn about and use this "new" assessment discovery. Performance assessments involve students in activities that require the demonstration of certain skills and/or the creation of specified products. As a result, this assessment methodology permits us to tap many of the complex educational outcomes we value that cannot be translated into paper and pencil tests.
With performance assessments, we observe students while they are performing, or we examine the products created, and we judge the level of proficiency demonstrated. As with essay assessments, we use these observations to make subjective judgments about the level of achievement attained. Those evaluations are based on comparisons of student performance to preset standards of excellence.
For example, a primary-grade teacher might watch a student interacting with classmates and draw inferences about that child's level of development in social interaction skills. If the levels of achievement are clearly defined in terms the observer can easily interpret, then the teacher, observing carefully, can derive information from watching that will aid in planning strategies to promote further social development. Thus, this is not an assessment where answers are counted right or wrong. Rather, like the essay test, we rely on teacher judgment to place the student's performance somewhere on a continuum of achievement levels ranging from very low to very high.
From a completely different context, a middle school science teacher might examine "mousetrap cars" built by students to determine if certain principles of energy utilization have been followed. Mousetrap cars are vehicles powered by one snap of a trap. The object is to see who can design a car that travels the farthest by converting that amount of energy into forward motion. When the criteria are clear, the teacher can help students understand why the winning car goes the farthest.
Performance assessment methodology has arrived on the assessment scene with a flash of brilliance unprecedented since the advent of selected response test formats earlier in this century. For many reasons, this "new discovery" has struck a chord among educators at all levels. Recently popular applications carry such labels as authentic assessments, alternative assessments, exhibitions, demonstrations, and student work samples, among others.
These kinds of assessment are seen as providing high-fidelity or authentic assessments of student achievement (Wiggins, 1989). Proponents contend that, just as high-fidelity music provides accurate representations of the original music, so too can performance assessments provide accurate reproductions of complex achievements under performance circumstances that stretch into life beyond school. However, some urge great caution in our rush to embrace this methodology, because performance assessment brings with it great technical challenges. They correctly point out that this is a very difficult methodology to develop and use well (Dunbar, Koretz, & Hoover, 1991).

CHAPTER ROADMAP
As with the other forms of assessment, there are three critical contexts within which your knowledge of performance assessment methodology will serve you well. First, you will design and develop performance assessments for use in your classroom in the future, if you are not doing so already. The quality of those assessments, obviously, is in your hands.
Further, the education literature, published textbooks, instructional materials, and published test materials are all beginning to include more and more examples of performance assessments, which you might consider for use in your classroom. Once again, you are the gatekeeper. Only you can check these for quality and appropriateness for use in your context.
And finally, as with other forms of assessment, you may find yourself on the receiving end of a performance assessment. Obviously, it is in your best interest to be sure these are as sound as they can be—as sound as you would want them if you were to use them to evaluate your own students. Be a critical consumer: If you find flaws in the quality of these assessments, be diplomatic, but call the problems to the attention of those who will evaluate your performance.
To prepare you to fulfill these responsibilities, our journey will begin with an explanation of the basic elements of a performance assessment, illustrated with simple examples; we will continue by examining the role of subjectivity in this form of assessment; and then we will analyze the kinds of achievement targets performance assessments can serve to reflect.
Further, as we travel, we will do the following:
      Complete a detailed analysis of the assessment development process, including specific recommendations for actions on your part that can help you avoid the many pitfalls to sound assessment that accompany this alternative.
      Address strategies for devising the criteria by which to evaluate performance, suggestions for developing quality exercises to elicit the performance to be evaluated, and ideas for making and recording judgments of proficiency.
      Explore the integration of performance assessment with instruction, not as a concluding section at the end of the chapter, but as an integral part of the entire presentation on performance assessment methodology.

[Figure 8-1: Aligning Achievement Targets and Assessment Methods. A matrix crossing the four assessment methods (selected response, essay, performance assessment, personal communication) with the five achievement targets (know, reason, skill, product, affect); the performance assessment cells are shaded.]

In this way, we will be able to see the great power of this methodology in the class­room: its ability to place students in charge of their own academic well-being.
As you proceed through this chapter, keep the big picture in mind. The shaded cells in Figure 8-1 show the material we will be addressing herein.
Also as we proceed, be advised that this is just part one of a multipart treatment of performance assessment methodology. It is intended only to reveal the basic structure of performance assessments. The remaining parts are included in Part 3, on classroom applications. After we cover the basic structure and development of these assessments in this chapter, we will return with many more examples in Chapter 10, on assessing reasoning in all of its forms, and in Chapter 11, on using performance assessments to evaluate many skills and products. Your understanding of performance assessment methodology is contingent upon studying, reflecting upon, and applying material covered in all three chapters.

A Note of Caution
While I tend to be a proponent of performance assessment methodology because of its great potential to reflect complex and valuable outcomes of education, I urge caution on two fronts.
First, please understand that there is nothing new about performance assessment methodology. This is not some kind of radical invention recently fabricated by opponents of traditional tests to challenge the testing industry. Rather, it is a proven method of evaluating human characteristics that has been in use for decades (Lindquist, 1951), for centuries, maybe even for eons. For how long have we selected our leaders, at least in part, on the basis of our observations of and judgments about their performance under pressure? Further, this is a methodology that has been the focus of sophisticated research and development both in educational settings and in the workplace for a very long time (Berk, 1986).
Besides, anyone who has taught knows that teachers routinely observe students and make judgments about their proficiency. Admittedly, some of those applications don't meet accepted standards of assessment quality (Stiggins & Conklin, 1992). But we know performance assessment is common in the classroom and we know how to do it well. Our challenge is to make the assessment meet the standards.
Virtually every bit of research and development done in education and business over the past decades leads to the same conclusion: performance assessment is a complex way to assess. It requires that users prepare and conduct their assessments in a thoughtful and rigorous manner. Those unwilling to invest the time and energy needed to do it well are better off assessing in some other way.
Second, performance assessment methodology is not the panacea some advocates seem to think it is. It is neither the savior of the teacher, nor the key to assessing the "real" curriculum. It is just one tool capable of providing an effective and efficient means of assessing some—but not all—of the most highly valued outcomes of our educational process. As a result, it is a valuable tool indeed. But it is not the be-all and end-all of the educational assessment process. For this reason, it is critical that we keep this form of assessment in balance with the other alternatives.
Although performance assessment is complex, and requires care to use well, it certainly does hold the promise of bringing teachers, students, and instructional leaders into the assessment equation in unprecedented ways. But the cost of reaching this goal can be high. You must meet the considerable challenge of learning how to develop and use sound performance assessments. This will not be easy! There is nothing here you cannot master, but don't take this methodology lightly—we're not talking about "assessment by guess and by gosh" here. There is no place in performance assessment for "intuitions" or ethereal "feelings" about the achievement of students. It is not acceptable for a teacher to claim to just "know" a student can do it. Believable evidence is required. This is neither a mystical mode of assessment, nor are the keys to its proper use a mystery. It takes thorough preparation and meticulous attention to detail to attain appropriate levels of performance assessment rigor.

Just as with other modes of assessment, there are rules of evidence for sound performance assessment. Remember, sound assessments do the following:
      arise from clear and appropriate achievement targets
      serve a clearly articulated purpose
      rely on proper assessment methods
      sample performance appropriately
      control for all relevant sources of extraneous interference

Adhere to these rules of evidence when developing performance assessment and you can add immeasurably to the quality and utility of your classroom assessments of student achievement. Violate those rules—which is very easy to do in this case!—and you place your students at risk.

Time for Reflection
As you have seen, performance assessments are based on observation and judgment. Can you think of instances outside of the school setting where this mode of assessment comes into play? In the context of hobbies? In work settings? In other contexts? Please list five or six examples.

PERFORMANCE ASSESSMENT AT WORK IN THE CLASSROOM
To appreciate the extremely wide range of possible applications of performance assessment, we need to explore the many design alternatives that reside within this methodology. I will briefly describe and illustrate those now, and will then show you how one professor put this design framework to work in her very productive learning environment.

An Overview of the Basic Components
We initiate the creation of performance assessments just as we initiated the development of paper and pencil tests as described in the previous two chapters: We start with a plan or blueprint. As with selected response and essay assessments, the performance assessment plan includes three components. In this case, each component contains three specific design decisions within it.
First, the performance assessment developer must clarify the performance to be evaluated. Second, performance exercises must be prepared. And third, systems must be devised for scoring and recording results.
The immense potential of this form of assessment becomes apparent when we consider all of the design options available within this three-part structure. Let's explore these options.

Part 1: Clarifying Performance. Under this heading, the user has the freedom to select from a nearly infinite range of achievement target possibilities. We can focus performance assessments on particular targets by making three specific design decisions, addressing the kind of performance to be assessed, identifying who will be assessed, and specifying performance criteria.

Nature of Performance. This first design decision requires that we answer these basic questions: How will successful achievement manifest itself? Where will the evidence of proficiency most easily be found?
Performance might take the form of a particular set of skills or behaviors that students must demonstrate. In this case, we watch students "in process," or while they are actually doing something, and we evaluate the quality of their performance. The example given earlier of the primary-grade teacher observing the youngster in interaction with other students illustrates this kind of performance assessment. In that instance, success manifests itself in the actions of the student.
On the other hand, we also can define performance in terms of a particular kind of product to be created, such as the mousetrap car. In this application, the standards of performance would reflect important attributes of an energy-efficient car. The teacher would examine the car and determine the extent to which the criteria of efficiency had been met. Evidence of success is found in the attributes of the car as a product.
Some contexts permit or may even require the observation and evaluation of both skill and product. For example, you might watch a student operate a computer (skill) and evaluate the final product (the resulting program or other document) to evaluate that student's success in hitting both key parts of that achievement target.

Time for Reflection
Based on your experience as a student and/or teacher, can you think of additional classroom contexts where it might be relevant to assess both process and product?

Focus of the Assessment. To address this design decision, we must understand that performance assessments need not focus only on individual student behaviors. They also can apply to the observation of and judgment about the performance of students functioning in a group.
In these times of cooperative learning, the evaluation of teamwork can represent a very important and useful application. Two kinds of observations are worthy of consideration. One focuses on the group interaction behaviors. The observer tracks the manner in which the group works as a whole. The group is the unit of analysis of performance, if you will. A sample performance assessment question might be, Given a problem to solve, does the group exhibit sound group problem-solving skills?
The other form of observation focuses on individual behaviors in a group context and summarizes those across individuals. For example, observers might tally and/or evaluate instances of aggressive and dangerous playground behavior. These can be very informative and useful assessments.
Performance Criteria. Once we have decided on the performance and performer upon which to focus, the real work begins. Attention then shifts to (a) specifying in writing all key elements of performance, and (b) defining a performance continuum for each element so as to depict in writing what performance is like when it is of very poor quality, when it is outstanding, and at all key points in between. These key elements or dimensions of performance are called the performance criteria.
In terms of the two examples we have been discussing, performance criteria answer the questions, What are the desirable social interaction behaviors for a primary-grade student? What are the specific attributes of an energy-efficient mousetrap car?

Clear and appropriate performance criteria are critical to sound performance assessment. When we can provide sound criteria, we are in for an easy and productive application of this methodology. Not only will we be focused on the expected outcomes, but with clearly articulated performance criteria, as you shall see, both students and teachers share a common language in which to converse about those expectations.

Time for Reflection
When you are evaluating a movie, what criteria do you apply? How about a restaurant? Write down criteria you think should be used as the basis for evaluating a teacher.

Part 2: Developing Exercises. In designing performance exercises, we must think of ways to cause students to perform in a manner that will reveal their level of proficiency. How can we cause them to produce and present a sample product for us to observe and evaluate, for example? In this facet of performance assessment design, we decide the nature of the exercises, the number of exercises needed, and the actual instructions to be given to performers.

Nature of Exercises. Performance assessment offers two exercise choices that, once again, reveal the rich potential of this methodology. Specifically, there are two ways to elicit performance for purposes of evaluation.
One option is to present a structured exercise in which you provide respondents with a predetermined and detailed set of instructions as to the kind of performance desired. They are completely aware of the assessment, they reflect upon and prepare for this assignment, and then they provide evidence of their proficiency. For example, they might be asked to prepare to give a certain kind of speech, perform some kind of athletic activity, write a term paper, or build a mousetrap car.
But also be advised that performance assessment offers another option not available with any other form of assessment. You can observe and evaluate some kinds of performance during naturally occurring classroom events and gather useful information about "typical" student performance. For example, the primary-grade teacher interested in the social interaction skills of one student obviously would disrupt the entire assessment by instructing the student, "Go interact with that group over there, so I can evaluate your ability to get along." Such an exercise would run completely counter to the very essence of the assessment. Rather, the teacher would want to stand back and watch the student's behavior unfold naturally in a group context to obtain usable information. Assessments that are based on observation and judgment allow for this possibility, while others do not. Just try to be unobtrusive with a true/false test!
It may have become apparent to you also that you can combine observations derived from administration of structured exercises and from the context of naturally occurring events to generate corroborating information about the same achievement target. For example, an English teacher might evaluate writing proficiency gathered in response to a required assignment and in the context of student daily writing journals done for practice. The combined information might provide insights about specific student needs.

Time for Reflection
Can you think of an instance outside of school in which observation of naturally occurring performance serves as the basis for an assessment? In the context of a hobby? In a work setting? In some other context? What is observed and judged, and by whom?

Content of Exercises. The final exercise-related design component is the actual content of the exercise. Like the essay exercise discussed in Chapter 7, instructions for structured exercises should include the kind(s) of achievement to be demonstrated, conditions under which that demonstration is to take place, and standards of quality to be applied in evaluating performance. Here is a simple example of a complete exercise:
Achievement: Your four-person team is to do the research required to prepare a group presentation on the dwellings and primary food sources of the Native American tribe you have selected.

Conditions: The manner in which you carry out the background research and divide up responsibilities within the group is up to you. The focus of the evaluation will be your presentation. (Note that process will be evaluated not in the doing of background research or preparation, but in the giving of the presentation.)

Standards: Your presentation will be evaluated according to the criteria we develop together in class, dealing with content (scope, organization, and accuracy) and delivery (use of learning aids, clarity, and interest value for the audience).
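
For readers who find a schematic helpful, the three instruction components can be captured in a small data record. The sketch below is purely illustrative Python of my own; the class and field names are not part of the methodology, and it simply encodes the example above:

from dataclasses import dataclass

@dataclass
class PerformanceExercise:
    """One structured exercise: what to demonstrate, under what
    conditions, and against which standards it will be judged."""
    achievement: str        # the target to be demonstrated
    conditions: str         # the circumstances of the demonstration
    standards: list[str]    # the criteria applied in evaluation

presentation = PerformanceExercise(
    achievement=("Research and deliver a four-person group presentation on "
                 "the dwellings and primary food sources of a selected "
                 "Native American tribe."),
    conditions=("Teams organize their own background research and division "
                "of labor; only the presentation itself is evaluated."),
    standards=[
        "content: scope", "content: organization", "content: accuracy",
        "delivery: use of learning aids", "delivery: clarity",
        "delivery: interest value for the audience",
    ],
)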

Number of Exercises. Once the nature of the exercise is determined, you must de­cide how many exercises are needed. This is a sampling issue. How many examples of student performance are enough? As discussed, you must decide how many exercises are needed to provide a representative sample of all the important questions you could have asked given infinite time. If you want to know if students can speak French, how many times do they have to speak for you to be reasonably certain you could predict how well they would do given one more chance? How many samples of writing must you see to be confident drawing conclusions about writing proficiency? In fact, the answers to these questions are a function of the assessment context. To answer them, we must consider several factors, including the reasons for assessment and other issues. We will review these factors later in the chapter.
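
Because this sampling question has a statistical core, a toy simulation can sharpen the intuition. In the sketch below (Python; the proficiency value and noise level are invented for illustration, not drawn from the chapter), each observed rating is the student's true proficiency plus task-to-task variation, and the spread of the averaged estimate shrinks as the number of exercises grows:

import random
import statistics

def observed_mean(true_skill: float, n_exercises: int, noise: float = 1.0) -> float:
    """Average rating across n exercises; each observation is the
    student's true proficiency plus task-to-task variation."""
    return statistics.mean(random.gauss(true_skill, noise) for _ in range(n_exercises))

random.seed(1)
for n in (1, 4, 16):
    estimates = [observed_mean(true_skill=3.0, n_exercises=n) for _ in range(1000)]
    print(f"{n:2d} exercises: spread of estimates = {statistics.stdev(estimates):.2f}")

The spread of the estimates falls roughly with the square root of the number of exercises, which is the statistical reason that broader targets demand more exercises before conclusions become dependable.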

Part 3: Scoring and Recording Results. Once performance has been clarified and exercises developed, procedures for managing results must be specified.

Level of Detail of Results. First, the user must select one of two kinds of scores to generate from the assessment. Option one is to evaluate performance analytically, making independent judgments about each of the performance criteria separately. In this case, performance is profiled in terms of individual ratings. Option two is called holistic scoring. In this case, one overall judgment is made about performance that combines all criteria into one evaluation. The choice is a function of the manner in which assessment results are to be used. Some uses require the high-resolution microscope of the analytical system, while others require the less precise but also less costly holistic process.
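
The distinction is easy to state in code. In this minimal sketch (Python; the criteria and scale are invented, and the average merely stands in for the rater's single global judgment), the analytic option preserves a profile while the holistic option collapses it:

def analytic_profile(ratings: dict[str, int]) -> dict[str, int]:
    """Analytic scoring: report each criterion's rating separately."""
    return dict(ratings)

def holistic_score(ratings: dict[str, int]) -> float:
    """Holistic scoring: one overall judgment across all criteria.
    (An average stands in here for the rater's single global rating.)"""
    return sum(ratings.values()) / len(ratings)

speech = {"content": 4, "organization": 3, "delivery": 5}
print(analytic_profile(speech))   # profile: {'content': 4, 'organization': 3, 'delivery': 5}
print(holistic_score(speech))     # one judgment: 4.0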

Recording Procedures. Second, designers must select a specific method for transforming performance criteria into usable information through a system of recording procedures. Once again, the great flexibility of performance assessment methodology comes through. Users have many choices here, too (a brief code sketch follows the list):
      checklists of desired attributes present or absent in performance
      various kinds of performance rating scales
      anecdotal records, which capture written descriptions of and judgments about performance
      mental records, which capture images and records of performance in the memory of the evaluator for later recall and use (to be used cautiously!)
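
As promised, here is a brief sketch of the first three record types (Python; the field names are my own shorthand, and mental records are deliberately omitted because they leave nothing written to inspect):

from dataclasses import dataclass

@dataclass
class ChecklistRecord:
    """A desired attribute marked present or absent."""
    attribute: str
    present: bool

@dataclass
class RatingRecord:
    """A criterion judged on a defined scale, e.g., 1 (poor) to 5 (excellent)."""
    criterion: str
    rating: int

@dataclass
class AnecdotalRecord:
    """A written description of, and judgment about, observed performance."""
    note: str

observations = [
    ChecklistRecord(attribute="cites sources", present=True),
    RatingRecord(criterion="clarity of delivery", rating=4),
    AnecdotalRecord(note="Paused to invite questions and handled them confidently."),
]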

Identifying the Rater. And finally, performance assessment users must decide who will observe and evaluate performance. In most classroom contexts, the most natural choice is the teacher. Since performance evaluators must possess a clear vision of the desired achievement and be capable of the rigorous application of the performance criteria, who could be more qualified than the teacher?
Just be advised that you have other choices. You might rely on some outside expert to come to the classroom and participate. Or you might rely on the students to conduct self-assessments or to evaluate each other's performance.
The instructional potential of preparing students to apply performance criteria in a rigorous manner to their own work should be obvious. I will address this application in greater detail throughout the chapter.

Time for Reflection
As a student, have you ever been invited to observe and evaluate the skill or product performance of other students? What did you observe? What criteria did you use? Were you trained to assess? What was that experience like for you?

Summary of Basic Components. Figure 8-2 lists the nine design decisions faced by any performance assessment developer. Also included are the design options available within each decision.

Design Factors and Options

1.    Clarifying Performance
      Nature of performance: behavior to be demonstrated; product to be created
      Focus of the assessment: individual performance; group performance
      Performance criteria: reflect key aspects of the specific target

2.    Developing Exercises
      Nature of exercises: structured assignment; naturally occurring events
      Content of exercises: defines target, conditions, and standards
      Number of exercises: a function of purpose, target, and available resources

3.    Scoring and Recording Results
      Level of detail of results: holistic; analytical
      Recording procedures: checklist; rating; anecdotal record; mental record
      Identifying the rater: teacher; outside expert; student self-evaluation; student peer evaluation

Figure 8-2
Performance Assessment Design Framework
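
One way to hold all nine decisions in mind at once is to spell the design space out as a data structure. The sketch below (Python; the names are my own shorthand for Figure 8-2, not terminology from the methodology) enumerates each decision and its options, then fills in a plan for the mousetrap car example from earlier in the chapter:

from dataclasses import dataclass
from enum import Enum

class PerformanceKind(Enum):
    BEHAVIOR = "behavior to be demonstrated"
    PRODUCT = "product to be created"

class Focus(Enum):
    INDIVIDUAL = "individual performance"
    GROUP = "group performance"

class ExerciseKind(Enum):
    STRUCTURED = "structured assignment"
    NATURAL = "naturally occurring events"

class Detail(Enum):
    HOLISTIC = "holistic"
    ANALYTICAL = "analytical"

class Recording(Enum):
    CHECKLIST = "checklist"
    RATING = "rating"
    ANECDOTAL = "anecdotal record"
    MENTAL = "mental record"

class Rater(Enum):
    TEACHER = "teacher"
    OUTSIDE_EXPERT = "outside expert"
    SELF = "student self-evaluation"
    PEER = "student peer evaluation"

@dataclass
class AssessmentPlan:
    # Part 1: clarifying performance
    performance_kind: PerformanceKind
    focus: Focus
    criteria: list[str]            # key elements of the specific target
    # Part 2: developing exercises
    exercise_kind: ExerciseKind
    exercise_content: str          # target, conditions, and standards
    n_exercises: int               # a function of purpose, target, and resources
    # Part 3: scoring and recording results
    detail: Detail
    recording: Recording
    rater: Rater

mousetrap_car = AssessmentPlan(
    performance_kind=PerformanceKind.PRODUCT,
    focus=Focus.INDIVIDUAL,
    criteria=["energy conversion", "distance traveled", "sound construction"],
    exercise_kind=ExerciseKind.STRUCTURED,
    exercise_content="Build a car powered by one snap of a mousetrap.",
    n_exercises=1,
    detail=Detail.ANALYTICAL,
    recording=Recording.RATING,
    rater=Rater.TEACHER,
)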

ENSURING THE QUALITY OF PERFORMANCE ASSESSMENTS
If we are to apply the design framework shown in Figure 8-2 productively, we need to understand where the pitfalls to sound performance assessment hide. For instance, if we are not careful, problems can arise from the inherently subjective nature of performance assessment. Other problems can arise from trying to use this methodology in places where it doesn't belong.

Subjectivity in Performance Assessment
Professional judgment guides every aspect of the design and development of any performance assessment. For instance, as the developer and/or user of this method, you establish the achievement target to be assessed using input about educational priorities expressed in state and local curricula, your text materials, and the opinions of experts in the field. You interpret all of these factors and you decide what will be emphasized in your classroom—based on professional judgment.
Further, you select the assessment method to be used to reflect that target. Based on your vision of the valued outcomes and your sense of the assessment options available to you, you make the choices. This certainly qualifies as a matter of professional judgment.
In the classroom, typically you create the assessment, either selecting from among some previously developed options or generating it by yourself. If you generate it yourself, you choose whether to involve students or other parties in that design process. In the case of performance assessment, the first design issue to be faced is that of devising performance criteria, those detailed descriptions of success that will guide both assessment and instruction. This translation of vision into criteria is very much a matter of professional judgment.
So is the second design decision you must make: formulating performance exercises, the actual instructions to respondents that cause them to either demonstrate certain skills or create some tangible product, so their performance can be observed and evaluated. And finally, of course, this observation and evaluation process is subjective too. Every step along the way is a matter of your professional and subjective judgment.
For decades, the assessment community has afforded performance assessment the status of second-class citizenship because of the potentially negative impacts of all of this subjectivity. The possibility of bias due to subjective judgment has rendered this methodology too risky for many.
More recently, however, we have come to understand that carefully trained performance assessment users, who invest the clear thinking and developmental resources needed to do a good job, can use this methodology effectively. Indeed, many of the increasingly complex achievement targets that we ask students to hit today demand that we use performance assessments and use them well. In short, we now know that we have no choice but to rely on subjective performance assessment in certain contexts. So we had better do our homework as an education community!
Here I must insert as strong a warning as any presented anywhere in this book: In your classroom, you will set the standards of assessment quality. It is your vision that will be translated into performance criteria, exercises, and records of student achievement. For this reason, it is not acceptable for you to hold a vision that is wholly a matter of your personal opinion about what it means to be academically successful. Rather, your vision must have the strongest possible basis in the collective academic opinions of experts in the discipline within which you assess and of colleagues and associates in your school, district, and community.
Systematic assessment of student performance of the wrong target is as much a waste of time as a haphazard assessment of the proper target. The only way to prevent this is for you to be in communication with those who know the right target and the most current best thinking about that target and for you to become a serious student of their standards of academic excellence. Strive to know the skills and products that constitute maximum proficiency in the disciplines you assess.

Time for Reflection
What specific sources can teachers tap to be sure they understand skill and performance outcomes?

Matching Method to Target
As the meaning of academic excellence becomes clear, it will also become clear whether or when performance assessment is, in fact, the proper tool to use. While the range of possible applications of this methodology is broad, it is not infinitely so. Performance assessment can provide dependable information about student achievement of some, but not all, kinds of valued outcomes. Let's examine the matches and mismatches with the five kinds of outcomes we have been discussing: knowledge, reasoning, skills, products, and affect.

Assessing Knowledge. If the objective is to determine if students have mastered a body of knowledge through memorization, observing performance or products may not be the best way to assess. Three difficulties can arise in this context, one related to potential sampling errors, another to issues of assessment efficiency, and a third related to the classroom assessment and instructional decision-making context.
Consider, for example, asking students to participate in a group discussion conducted in Spanish as a means of assessing mastery of vocabulary and rules of grammar. While this is an apparently authentic assessment, it might lead you to an incorrect conclusion. First, the students will naturally choose to use vocabulary and syntax with which they are most comfortable and confident. Thus they will naturally select biased samples of all possible vocabulary and usage.
Second, if this is an assessment of the level of knowledge mastery of a large number of students, the total assessment will take a great deal of time. This may cause you to collect too small a sample of the performance of each individual, leading to undependable results. Given this achievement target, it would be much more efficient from an achievement sampling point of view to administer a simple objectively scored vocabulary and grammar test. Then, once you are confident that the foundational knowledge has been mastered and the focus of instruction turns to real-world applications, you might turn to the group discussion performance assessment.
Consider this same issue from a slightly different perspective. If you use the performance assessment as a reflection of the knowledge mastery target, it will be difficult to decide how to help the student who fails to perform well. It will not be clear what went wrong. Is the problem a lack of knowledge of the vocabulary and grammar, and/or an inability to pronounce words, and/or anxiety about the public nature of the demonstration? Since all three are hopelessly confounded with one another, it becomes difficult to decide on a proper course of action. Thus once again, given this target and this context, performance assessment may not be the best choice.
When the knowledge to be memorized is to be sampled in discrete elements, a selected response format is best. When larger structures of knowledge are the target, the essay format is preferable. Both of these options offer more control over the material assessed.
However, if your assessment goal is to determine if students have gained control over a body of knowledge through the proper and efficient use of reference materials, then performance assessment might work well. For instance, you might give students the exercise of finding a number of facts about a given topic and observe the manner in which they attack the problem, applying performance criteria related to the process of using particular library reference services and documents. A checklist of proper steps might serve as the basis for scoring and recording results of the assessment.
Or, you might ask for a written summary of those facts, which might be evaluated on rating scales in terms of the speed with which it was generated, the efficiency of the search, and the accuracy and thoroughness of the summary. Observation and judgment might play a role here.

Assessing Reasoning. Performance assessment also can provide an excellent means of assessing student reasoning and problem-solving proficiencies. Given complex problems to solve, students must engage in thinking and reasoning processes that include several steps. While we cannot directly view the thought processes, we can use various kinds of proxy measures as the basis for drawing inferences about the reasoning carried out.
For example, we might give chemistry students unidentified substances to identify and watch how they go about setting up the apparatus and carrying out the study. The criteria might reflect the proper order of activities. Those who reason well will follow the proper sequence and succeed. Those whose reasoning is flawed will go awry. Some might classify this as a selected response test: students identify the substance correctly or they do not; right or wrong. While that is true in one sense, think about how much richer and more useful the results are when the assessment is conceived and carried out as a performance assessment—especially when the student fails to identify the substance accurately. A comparison of the reasoning actually carried out with the reasoning spelled out in the performance criteria will be very revealing, and instructionally relevant.
Performance assessments structured around products created by students also can provide insight into the reasoning process. The resulting product itself is a reflection of sound or unsound reasoning during its development. One simple example might be the production of a written research report by students who carried out the above experiment. That report would reflect and provide evidence of their problem-solving capabilities.
Another example of a product-based performance assessment would be the physics challenge of building a tower out of toothpicks that will hold a heavy load. One performance criterion certainly will be the amount of weight it can hold. But others might focus on whether the builder adhered to appropriate engineering principles. The product-based performance assessment can help reveal the ability to apply those principles.
In fact, the thoughtful development and use of this performance assessment can help students achieve such a problem-solving goal. For example, what if you gave students two towers built purposely to hold vastly different amounts of weight? They might be told to analyze each in advance of the load-bearing experiment to predict which would hold more. Further, you might ask them to defend their prediction with specific design differences. After the experiment reveals the truth, the students are more likely to be able to infer how to build strong towers. In essence, the problem-solving criteria will have been made clear to them.

Assessing Skills. The great strength of performance assessment methodology lies in its ability to ask students to perform in certain ways and to provide a dependable means of evaluating that performance. Most communication skills fall in this category, as do all forms of performing, visual, and industrial arts. The observation of students in action can be a rich and useful source of information about their attainment of very important forms of skill achievement. We will review many examples of these as our journey continues.

Assessing Products. Herein lies the other great strength of performance assessment. There are occasions when we ask students to create complex achievement-related products. The quality of those products indicates the creator's level of achievement. If we develop sound performance criteria that reflect the key attributes of these products and learn to apply those criteria well, performance assessment can serve us as both an efficient and effective tool. Everything from written products, such as term papers and research reports, to the many forms of art and craft products can be evaluated in this way. Again, many examples will follow.

Assessing Affect. To the extent that we can draw inferences about attitudes, values, interests, motivational dispositions, and/or academic self-concept based either on the actions of students or on what we see in the products they create, then performance assessment can assist us here, too.

However, I urge caution. Remember, sound performance assessment requires strict adherence to a preestablished set of rules of evidence. Sound assessments must do the following:
·         Reflect a clear target—We must thoroughly understand and develop sound definitions of the affective targets to be assessed.
·         Serve a clearly articulated purpose—We must know precisely why we are assessing and what we intend to do with the result, which is especially tricky in the case of affective outcomes.
·         Rely on a proper method—The performance must present dependable information to us about affect.
·         Sample the target appropriately—We must collect enough evidence of affect to give us confidence in our conclusions.
·         Control for extraneous interference—The potential sources of bias in our judgments about student attitudes, values, interests, and so on must be understood and neutralized in the context of our assessments.

When applying these standards of quality to the assessment of achievement outcomes—those content-based targets we are trained to teach—it becomes somewhat easier to see the translations. That is, hopefully, we have immersed ourselves far enough in a particular field of study to attain a complete understanding of its inherent breadth and scope. We should know when a sample of test items, a set of essay exercises, a particular performance assessment, or a product evaluation captures the meaning of academic success.
When it comes to affective outcomes, however, most of us have had much less experience with and therefore are much less comfortable with their meaning, depth, and scope. That means successfully assessing them demands careful and thoughtful preparation.
We can watch students in action and/or examine the things they create and infer about their affective states. But we can do this only if we have a clear and practiced sense of what it is we are looking for and why we are assessing it. I will address these issues in depth in Chapter 12.

Summary of Target Matches. There are many important educational outcomes that can be translated into performance assessments. That is, if we prepare carefully, we can develop performance criteria and devise exercises to sample the following:
      use of reference material to acquire knowledge
      application of that knowledge in a variety of problem-solving contexts
      proficiency in a range of skill arenas
      ability to create different kinds of products
      feelings, attitudes, values, and other affective characteristics

In fact, the only target for which performance assessment is not recommended is the assessment of simple elements or complex components of subject matter knowledge to be mastered through memorization. Selected response and essay formats work better here.

DEVELOPING PERFORMANCE ASSESSMENTS
As with selected response and essay assessments, we develop performance assessments in three steps. Each step corresponds to one of the three basic design components introduced earlier. Developers must specify the performance to be evaluated, devise exercises to elicit the desired behavior, and develop a method for making and recording judgments.
Unlike other forms of assessment, however, this form permits flexibility in the order in which these parts are developed. But before we consider those issues, we will review context factors we should consider in deciding whether or when to adopt performance assessment methods.

Context Factors
Clearly, the prime factors to consider in the assessment selection process are the appropriateness of the achievement target for your students and the match of performance assessment methodology to that target. You must also ask yourself certain practical questions when deciding if performance assessment is the right choice for your particular context. These questions are posed in Figure 8-3.

Approximating the Best. Remember, while performance assessment can be used to assess reasoning, we also can use selected response and essay formats to tap this kind of outcome. In addition, while performance assessment is the best option for measuring attainment of skill and product outcomes, again, we can use selected response and essay to assess student mastery of important prerequisites of effective performance. In this sense, they represent approximations of the best.
Further, as you will see in Chapter 9, sometimes we can gain insight into achievement by having students talk through hypothetical performance situations. Admittedly, these are second best when compared to the real thing. But they can provide useful information.
These "proxy" measures might come into play when we seek to size up a group of students very quickly for instructional planning purposes. In such a case, we might need only group performance information, so we could sample a few students from the group and assess just a few aspects of the performance of each. By combining information across students, we generate a profile of achievement that indicates group achievement strengths and needs. We can use such group information to plan instruction. Under these circumstances, it may be unnecessary to use costly, full-blown performance assessments of every student. Rather, we might turn to simpler, more efficient paper and pencil or personal communication–based approximations of the best assessment to get what we need.

In effect, you can use proxies to delve into the prerequisites of skill and product performance to determine individual student needs. Remember that the building blocks of competence include knowledge to be mastered and reasoning power—both of which can be assessed with methods that fall short of actual performance assessment.

      Do you have the expertise required to develop clear and appropriate criteria? Don't take this too lightly. If you have not developed a deep sense of important outcomes in the field and therefore don't have a highly differentiated vision of the target(s), performance assessment can present a very frustrating challenge. Understand the implications of teaching students to hit the wrong target! Solicit some help, an outside opinion, just to verify the appropriateness of your assessment. For instance, find a colleague or maybe a small team of partners to work with.

      Are your students able to perform in the ways required? Be sure there are no physical and/or emotional handicaps that preclude students from being able to do the work required. Primary among the possible performance inhibitors may be evaluation anxiety in those assessment contexts requiring public displays of proficiency.

      What is the purpose for the assessment? If high-stakes decisions hang on the assessment results, such as promotion, graduation, a critical certification of mastery, or the like, you need to be prepared to invest the time and energy sufficient to produce confident results. Such critical assessments require a higher degree of confidence than do periodic examinations that measure current student proficiency levels in an ongoing classroom situation.

      How many students will you assess? The more students you must assess, the more carefully you must think through where and how you are going to find the time required to assess them all. There are many creative ways to economize, such as sharing the work with other trained and qualified judges, like your students, among others.

      What is the scope of the achievement target to be assessed? Scope influences two things: the number of exercises needed and the amount of time over which you sample performance. If the scope is narrow and the time frame short (e.g., the focus of one day's lesson), few exercises will be needed to sample it well. Broader targets (e.g., a semester's worth of material) require more exercises and demand that you spread your sample of exercises out over an extended period.

      Is the target simple or complex? Complex targets require more exercises to cover the full range of applications. For example, we cannot label a student a competent or incompetent writer on the basis of one exercise, no matter what that exercise requires. Writing is complex, taking many forms and occurring in many different kinds of contexts. If the target is complex, exercises must sample enough relevant forms and contexts to lead to confident inferences about competence.

      Are the materials required to perform successfully available in school and/or at home? Anticipate what specific materials students will need to perform the task at hand. School resources vary greatly, as do resources available to students at home. Be sure all have an equal opportunity to succeed before you proceed.

      What resources do you have at your disposal to conduct the observation and scoring required by your assessment? Obviously, observing and evaluating students or their products is a labor-intensive activity that demands much time and effort. If there has been one deterrent to the broader use of this methodology, it is the time required to do it well. Teachers often get trapped into thinking that all that work must automatically fall on their shoulders. This is not so. Other resources can include the principal (!), teacher aides, parents, outside experts, colleagues, and last but by no means least, students themselves, evaluating their own or each other's performance. Think about the instructional implications of involving them in the process. But remember, you must train them to apply the criteria dependably.

      Has the author of your textbook or workbook, a colleague, or someone else already developed sound performance criteria and associated exercises for you to adopt and use? Verify the quality, train yourself to apply the criteria dependably, and these ready-made assessments can save a great deal of development time. Also, consider revising them to fit your needs more closely.

Figure 8-3
Practical Questions to Consider When Deciding Whether to Use Performance Assessment

Not only can proxy measures serve as a means of such formative assessment, but to the extent that you involve your students in the assessment process, they also can introduce students to the various prerequisites before they need to put them together.
Further, any time resources are too limited to permit a full-blown performance assessment, we might be forced to think of alternatives that come as close as we can get to the real thing, given our resources. While the resulting paper and pencil or personal communication assessments will fall short of perfection, if they are thoughtfully developed, they may give us enough information to serve our needs.
If you do decide to use approximations, however, never lose sight of their limitations: understand the outcomes they do and do not reflect.

Time for Reflection
Can you remember a paper and pencil test you have taken that was a proxy measure for an achievement target that would have been more completely or precisely assessed with a performance assessment? How close did it come to the real thing, in your opinion?

The Order of Development
In explaining the basic three-part design framework, I began by specifying performance, then turned to exercises, then scoring practices. However, this is not the only possible or viable order of development for performance assessments.
For instance, we might begin developing a performance assessment by creating rich and challenging exercises. If we can present complex but authentic, real-world problems for students to solve, then we can see how they respond during pilot test administrations and devise clear criteria for evaluating the achievement of subsequent performers.
On the other hand, it is difficult to devise exercises to elicit outcomes unless and until we have specified precisely what those outcomes are. In this case, we could start by selecting the target, translate it into performance criteria, and then develop performance rating procedures. Then, given a clear vision of the desired performance, we can devise exercises calculated to elicit samples of performance to which we can then apply the criteria.
In a sense, we have a chicken-or-egg dilemma here. We can't plan to evaluate performance until we know what that performance is—but neither can we solicit performance properly until we know how we're going to evaluate it! Which comes first, the performance criteria or the exercises?
As luck and good planning would have it, you can take your choice. Which you choose is a function of your level of understanding of the valued outcomes to be assessed.

When You Know What to Look For. Those who begin the performance assessment development process with a strong background in the area to be evaluated probably possess a highly refined vision of the target and can develop performance criteria out of that vision. If you begin with that kind of firm grounding, you may be able simply to sit at your desk and spell out each of the key elements of sound performance. With little effort, you may be able to translate each key element into different levels of achievement proficiency in clear, understandable terms. If you have sufficient pedagogical knowledge in your area(s) of expertise, you can use the procedures discussed in this chapter to carry out the necessary professional reflection, spell out your performance criteria, and transform those into scoring and recording procedures. Then you will be ready to devise exercises to elicit the performance you want.

When You Have a Sense of What to Look For. However, not everyone is ready to jump right in, in this manner. Sometimes we have a general sense of the nature of the performance but are less clear on the specific criteria. For example, you might want your students to "write a term paper," but not have a clear sense about the standards of quality you want to apply.
When this happens, you need a different starting place. One option is to give students a general term paper assignment and use the resulting papers—that is, actual samples of student work—as the basis for defining specific criteria. You can select a few high-quality and a few low-quality papers to compare as a means of generating clear and appropriate performance criteria. One way to do this is to sort them into three or four piles of general quality, ranging from poor to excellent, then carefully and thoughtfully analyze why the papers differ. Why do some work, while others don't? In those differences are hidden the performance criteria you seek.

The major shortcoming of starting with general exercises, of course, is that it puts students in the unenviable position of trying to perform well—write a good term paper, for example—without a clear sense of what good performance is supposed to look like. But remember, you need do this only once. From then on, you will always have well-developed criteria in hand to share with students in advance.

You can avoid this problem if you can recover copies of previous term papers. The key is to find contrasting cases, so you can compare them. They needn't come from your current students. Or, if you are assessing a demonstrated skill, perhaps you can find videotapes of past performance, or can locate students practicing and observe them. One excellent way to find the right performance criteria, your vision of the meaning of academic success, is by "student watching." You can derive the critical elements of student success from actual samples of student work. But to take advantage of this option, you first need to get them performing somehow. That may mean starting with the development of exercises.

When you’re Uncertain about what to look for. Other times, you may have only the vaguest sense of what you want students to know and be able to do within a particular discipline. In these instances, you can use performance assessment exer­cises to help identify the truly important learning. Here's how this works:
Begin by asking yourself, what kinds of real-world challenges do I want students to be able to handle? What are some sample problems I hope students would be able to solve? Using creative brainstorming, you and your colleagues can create and collect numerous written illustrative sample exercises. When you have as­sembled enough such exercises to begin to zero in on what they are sampling, then step back from the array of possibilities and ask, what are the important skills that seem to cross all or many of these problems? Or, if products are to result, What do all or many of the products seem to have in common? In short, ask, what are these exercises really getting at? In other words, we draw inferences about the underlying meaning of success by examining various examples of how that success is likely to manifest itself in real-world problems. Out of these generalizations, we can draw relevant and ex­plicit performance criteria.
One thing I like about this strategy is the fact that the resulting performance criteria are likely to be usable for a number of similar exercises. Good criteria gen­eralize across tasks. They are likely to represent generally important dimensions of sound performance, not just those dimensions that relate to one narrowly defined task. They capture and convey a significant, generalizable portion of the meaning of academic success.
Having acknowledged these various options in the order of assessment development, I will now outline a simple performance assessment development sequence starting with the criteria, adding in the exercises, and concluding with the development of scoring and recording procedures. You can mix and match these parts and use the development of one part to help you solve problems in the development of another part. This is the art of performance assessment development.

Phase 1: Defining Performance
Our goal in defining the term performance as used in the context of performance assessment is to describe the important skills to be demonstrated and/or the important attributes of the product to be created. While performance assessments also include evaluations of or judgments about the level of proficiency demonstrated, our basic challenge is to describe the underlying basis of our evaluations.
More specifically, in designing performance assessments, we work to find a vocabulary to use in communicating with each other and with our students about the meaning of successful performance. The key assessment question comes down to this: Do you, the teacher, know what you are looking for in performance? But the more important instructional question is this: Do you know the difference between successful and unsuccessful performance, and can you convey that difference in meaningful terms to your students? Remember, students can hit any target they can see and that holds still for them. In performance assessment contexts, the target is defined in terms of the performance criteria.

Shaping Your Vision of Success. As I have said repeatedly, the most effective way to be able to answer these two questions in the affirmative is to be a master of the skills and products that reflect the valued academic outcomes in your classroom. Those who teach drama, music, physical education, second languages, computer operations, or other skill-based disciplines are prepared to assess well only when they possess a refined vision of the critical skills involved. Those who instruct students to create visual art, crafts, and various written products face both the teaching and assessment challenges with greatest competence and confidence when they are masters at describing the high-quality product to the neophyte.
Connoisseurs can recognize outstanding performance when they see it. They know a good restaurant when they find it. They can select a fine wine. They know which movies deserve thumbs up, which Broadway plays are worth their ticket price. And connoisseurs can describe why they have evaluated any of these as outstanding. It is their stock in trade. However, because the evaluation criteria may vary somewhat from reviewer to reviewer, their judgments may not always agree. In restaurants, wines, movies, and plays, the standards of quality may be a matter of opinion. But that's what makes interesting reading in newspapers and magazines.
Teachers are very much like these connoisseurs, in that they must be able to recognize and describe outstanding performance. But there are important differences between connoisseurs and teachers.
Not only can well-prepared teachers visualize and explain the meaning of success, but they can impart that meaning to others so as to help them become outstanding performers. In short, they are teachers, not just critics.
In most disciplines, there are agreed-upon skills and products that proficient performers must master. The standards of excellence that characterize our definitions of high-quality performance are always those held by experts in the field of study in question. Outstanding teachers have immersed themselves in those discipline-based meanings of proficiency and understand them thoroughly. Even when there are differences of opinion about the meaning of outstanding performance in a particular discipline, well-prepared teachers understand those differences and are capable of revealing them to their students.
It is this depth of understanding that must be captured in our performance expectations so it can be conveyed to students through instruction, example, and practice. Because they must be shared with students, our performance criteria cannot exist only in the intellect of the assessor. They must be translated into words and examples for all to see. And they must be capable of forming the basis of our judgments when we record the results of our assessments.
Finding Help in Shaping Your Vision. In this regard, since we now have nearly a decade of significant discipline-based performance assessment research and development behind us, many fields of study have already developed outstanding examples of sound criteria for critical performance. Examples include writing proficiency, foreign language, mathematics, and physical education. The most accessible source of information about these developments is the national association of teachers in each discipline. Nearly every such association has advanced written standards of student achievement in its field of study within the past five years. Any association that has not yet completed that work is conducting such studies now and will have them completed soon. I will provide examples of these in Part 3.
Not only will these associations probably have completed at least some of this work themselves, but they likely know others who have engaged in developing performance standards in their field. These may include university researchers and/or state departments of education. Check with your reference librarian for a directory of associations to learn how to contact those of interest to you.
Many contend that most of the important advances in the development of new assessment methods, including performance assessments, conducted over the past decade have been made by assessment departments of state departments of education. For this reason, it may be useful to contact your state personnel to see if they have either completed development of performance criteria in your discipline or know of other states that have. Again, I will share examples of these later.
And finally, consider the possibility that your local district or school curriculum development process may have resulted in the creation of some new performance assessments. Or perhaps a colleague, completely unbeknownst to you, has developed an evaluation of a kind of performance that is of interest to you, too. You will never know unless you ask. At the very least, you may find a partner or even a small team to work with you in your performance assessment development.

Six Steps in Developing Your Own Criteria. If you must develop performance criteria yourself, you must carry out a thoughtful task or product analysis. That means you must look inside the skills or products of interest and find the active ingredients. In most cases, this is not complicated.
A professor colleague of mine decided to develop a performance assessment of her own teaching proficiency. The assessment would focus on the critical skills in the presentation of a class on assessment. Further, she decided to engage her students in the process of devising those criteria--to assure that they understood what it means to teach effectively. Let me describe how that went.
Please note that the examples presented in this description are real. They came directly from the work of the actual class depicted in this story. They are not intended to represent exemplary work or the best possible representation of the attributes discussed and should not be regarded as such. They are merely presented as illustrations from real classroom activities.

Step 1: Reflective Brainstorming. The process of developing criteria reflecting teaching proficiency began with a brainstorming session. The professor talked with her students a few moments about why it is important to understand how to provide sound instruction in assessment methods, and then asked, what do you think might be some of the factors that contribute to effective instruction? As ideas came forth, she listed the suggestions on the board, trying to capture the essence of each with the briefest possible label.

That list looked something like this:
know the subject
use humor
organized
enthusiasm
fresh ideas
relevant content
clear objectives
be interactive
use visuals well
be interesting
appropriate pacing
believe in material covered
professional
credible information
poised
flexible
on schedule
good support materials
appropriate text
monitor student needs
voice loud, clear, varied
comfortable environment
refreshments!
material connected
challenging
personalize communication
effective time management
in control 

From time to time, the teacher would dip into her own reservoir of ideas about effective teaching and offer a suggestion, just to prime the pump a bit.
As the list grew, the flow of ideas began to dry up--the brainstorming process slowed. When it did, she asked another question: What specific behaviors could teachers engage in that would help them be effective—what could they do to maximize the chances of making instruction work? The list grew until everyone agreed that it captured most of what really is important. The entire process didn't take more than ten minutes.

Time for Reflection
Think about your experience as a student and/or teacher. What other keys to effective teaching can you think of?

Step 2: Condensing. Next, she told them that they needed to be able to take advantage of all of these excellent suggestions to evaluate her class and her effectiveness. However, given the immense list they had just brainstormed, they just wouldn't have time to evaluate on the basis of all those criteria. They would have to boil them down to the truly critical factors. She asked how they might do that.
Some students thought they should review the list and pick out the most critical entries, to concentrate on those first. Others suggested that they try to find a smaller number of major categories within which to group elements on the long list. To reach these goals, the professor asked two questions: Which of the things listed here on the board are most crucial? Or, what super categories might we place the individual entries in, to get a shorter list?
At this point in the development process, it became important to keep the list of super categories as short as possible. She asked the class if they could narrow it down to four or five—again capturing the essence of each category with the briefest possible label. (These super category headings need to represent truly important aspects of sound performance, because they form the basis for the performance criteria, as you will see.)
Here are the five super categories the students came up with after about five minutes of reflection and discussion:
Content
Organization
Delivery
Personal characteristics
Classroom environment


Time for Reflection
Based on the list presented above, supplemented with your additions, what other super categories would you suggest?

Remember, the goal throughout this entire activity is to build a vocabulary both students and teacher can use to converse with each other about performance. This is why it is important to engage students in the process of devising criteria, even if you know going in what criteria you want used. When you share the stage with your students, they get to play a role in defining success and in choosing a language to describe it that they understand, thus connecting them to their target. (Please reread that sentence. It is one of the most important in the entire book. It argues for empowering students, the central theme of this work.)

Step 3: Defining. Next, class members collaborated in generating definitions of each of the five chosen super categories, or major dimensions of effective teaching. The professor assigned responsibility for writing a concise definition of key dimensions to groups of students, one dimension per group. She advised them to consider the elements in the original brainstormed list by reviewing it and finding those smaller elements subsumed within each super category. This would help them find the critical words they needed to describe their dimension. When each group completed its draft, a spokesperson read their definition to the class and all were invited to offer suggestions for revising them as needed.
Here are some of the definitions they composed:

Content: appropriateness of presentation of research, theory, and practical applications related to the topic of assessment; appropriateness of course objectives.

Organization: appropriateness of the order in which material is presented in terms of aiding learning.

Delivery: deals with the presentation and interaction patterns in terms of conveying material and helping students learn it.

Personal characteristics: appropriateness of the personal manner of the instructor in relating to the material, the students, and the interaction between the two.

Class environment: addresses all physical aspects of the learning atmosphere and setting that are supportive of both students and teacher.
The group work, sharing, and revision took about twenty minutes.

Step 4: Contrasting. With labels and definitions for key performance dimensions in hand, they turned to the next challenge: finding words and examples to describe the range of possible performance within each dimension. They had to find ways to communicate with each other about what teaching looks like when it is very ineffective and how that changes as it moves toward outstanding performance. By establishing a sense of the underlying continuum of performance for each dimension of effective teaching (that is, by sharing a common meaning of proficiency ranging from a complete lack of it to total proficiency), they could observe any teaching and communicate about where that particular example should be rated on each key dimension.
In preparation for this activity, the professor dug up brief, ten-minute videos of two teachers in action, one faltering badly and the other hitting on all cylinders. She showed these videos to her students and asked, What makes one class work well while the other fails? What do you see that makes them different in terms of the five key dimensions defined earlier? They rewound and reviewed the examples several times while defining those differences for each dimension. This activity always helps participants zero in on how to describe performance, good and bad, in clear, understandable language. (Regardless of the performance for which criteria are being developed, my personal experience has been that the most effective method of articulating the meaning of sound and unsound performance is that of very carefully studying vastly contrasting cases. These developers used this method to great advantage to define the basis for their performance criteria.)

Step 5: Describing Success. As the students began to become clear on the language and examples needed to describe performance, they searched for ways to capture and quantify their judgments, such as by mapping their continuum descriptions onto rating scales or checklists. (We'll learn more about this in the section below on scoring and recording.) The class decided to develop three-point rating scales to reflect their thinking. Figure 8-4 presents some of these scales. This phase of the work took about an hour.

Time for Reflection
See if you can devise a three-point rating scale for one or two of the other criteria defined above.

Step 6: Revising and Refining. The professor was careful to point out that, when they arrived at a definition of academic success—whether as a set of performance criteria, rating scales, or whatever form it happened to take—the work was not yet done. They needed to practice applying their new standards to some teaching samples to see if they really fit—to see if they might need to more precisely define key aspects of performance. We can learn a general lesson from this: performance criteria should never be regarded as "finished." Rather, with time and experience in applying our standards to actual samples of student work, our vision of the meaning of success will grow and change. We will sharpen our focus. As this happens, we are obliged to adjust our performance expectations to reflect our most current sense of the keys to academic success.

Note the Benefits. I hope you realized that the entire performance criteria development sequence we just reviewed represents far more than just preparation to assess dependably. This sequence almost always involves participants in serious, highly motivated questioning, probing, and clarifying. In fact, assessment and instruction are indistinguishable when teachers involve their students in the process of identifying performance criteria.

Another Useful and Important Application. However, for various reasons, you may not wish to involve your students. Perhaps the students are too young to comprehend the criteria or the process. Or perhaps the target requires the development and application of highly technical or complex criteria that would be out of reach of the students. I have seen student involvement work productively as early as the third grade for some simple targets. But it may not always be appropriate.

Content

3    outcomes clearly articulated
     challenging and provocative content
     highly relevant content on assessment for teachers

2    some stated outcomes
     content somewhat interesting and engaging
     of some relevance to the classroom

1    intended outcome not stated
     content boring
     irrelevant to teachers and the classroom

Delivery

3    flow and pace move well
     humor used
     checks for clarity regularly
     feedback used to adjust
     extensive interaction with students

2    pacing acceptable some of the time
     material and/or delivery somewhat disjointed
     some checking for clarity
     some student participation

1    pacing too slow or fast
     delivery disconnected
     much dead time
     no interaction--one-person show
     no checking for clarity

Figure 8-4
Sample Three-Point Rating Scales for Evaluating Instruction

When this happens, at least consider another option for carrying out this same set of activities: Rather than engaging your students as your partners, devise criteria with a group of colleagues. If you do, you may argue about what is really important in performance. You might disagree about the proper language to use to describe performance. And you may fight with each other about key differences between sound and unsound performance. But I promise you these will be some of the most engaging and productive faculty meetings of your life. And out of that process might come long-term partners in the performance assessment process.
Even if everyone doesn't agree in the end, each of you will have reflected deeply on, and be able to defend, the meaning of student success in your classroom. We all need that kind of reflection regularly.

Summary of the Six Steps. However, if it comes down to you devising your own performance criteria, you can rely on variations of these steps, listed again in Figure 8-5. And remember, when students are partners in carrying out these six steps, you and your students join together in a learning community.
These activities can provide clear windows into the meaning of academic success—they can give us the words and examples we need to communicate about that meaning. I urge you to share those words with all who have a vested interest in student success, most notably, with your students themselves. This, then, is the art of developing performance criteria.

Step 1: Begin by reflecting on the meaning of excellence in the performance arena that is of interest to you. Be sure to tap your own professional literature, texts, and curriculum materials for insights, too. And don't overlook the wisdom of your colleagues and associates as a resource. Talk with them! Include students as partners in this step, too. Brainstorm your own list of key elements. You don't have to list them all in one sitting. Take some time to let the list grow.

Step 2: Categorize the many elements so that they reflect your highest priorities. Keep the list as short as possible while still capturing the essence of performance.

Step 3: Define each key dimension in clear, simple language.

Step 4: Find some actual performance to watch or examples of products to study. If this step can include the thoughtful analysis of a number of contrasting cases (an outstanding term paper and a very weak one, a flowing and accurate jump shot in basketball and a poor one, a student who functions effectively in a group and one who is repeatedly rejected, and so on), so much the better.

Step 5: Use your clearest language and your very best examples to spell out in words and pictures each point along the various continuums of performance you use to define the important dimensions of the achievement to be assessed.

Step 6: Try out your performance criteria to see if they really do capture the essence of performance; fine-tune them to state as precisely as possible what it means to succeed. Let this fine tuning go on as needed for as long as you teach.

Figure 8-5
Steps in Devising Performance Criteria

Attributes of Sound Criteria. Quellmalz (1991), writing in a special issue of a professional journal devoted to performance assessment, provides us with a simple list of standards against which to compare our performance criteria in order to judge their quality. She points out that effective performance criteria do the following:
1.    Reflect all of the important components of performance--the milestones in target attainment.
2.    Apply appropriately in contexts and under conditions in which performance naturally occurs.
3.    Represent dimensions of performance that trained evaluators can apply consistently to a set of similar tasks (i.e., not be exercise specific).
4.    Are developmentally appropriate for the examinee population.
5.    Are understandable and usable by all participants in the performance assessment process, including teachers, students, parents, and the community.
6.    Link assessment results directly into the instructional decision making process.
7.    Provide a clear and understandable means of documenting and communicating about student growth over time.

I would expand this list to include one additional standard: The development of performance criteria should be seen as an opportunity to teach. Students should play a role in the development of performance criteria whenever possible.

Figure 8-6 details rating scales that depict two key dimensions of good writing, organization and voice. Note the simple, yet clear and specific nature of the communication about important dimensions of good writing. With these kinds of criteria in hand, we definitely can help students become better performers.

Phase 2: Designing Performance Exercises
Performance assessment exercises, like selected response test items and essay exercises, frame the challenge for the respondent and set the conditions within which that challenge is to be met. Thus, they are a clear and explicit reflection of the desired outcomes. Like essay exercises, sound performance assessment exercises outline a complete problem for the respondent: achievement to be demonstrated, conditions of the demonstration, and standards of quality to be applied.
As specified earlier in this chapter, we face three basic design considerations when dealing with exercises in the context of performance assessment. We must determine the following:

1.    The nature of the exercise(s), whether structured exercises or naturally occurring events

Organization

5 The organization enhances and showcases the central idea or theme. The order, structure, or presentation is compelling and moves the reader through the text.
      Details seem to fit where they're placed; sequencing is logical and effective.
      An inviting introduction draws the reader in and a satisfying conclusion leaves the reader with a sense of resolution.
      Pacing is very well controlled; the writer delivers needed information at just the right moment, then moves on.
      Transitions are smooth and weave the separate threads of meaning into one cohesive whole.
      Organization flows so smoothly the reader hardly thinks about it.

3 The organizational structure is strong enough to move the reader from point to point without undue confusion.
      The paper has a recognizable introduction and conclusion. The introduction may not create a strong sense of anticipation; the conclusion may not leave the reader with a satisfying sense of resolution.
      Sequencing is usually logical; it may sometimes be too obvious, or otherwise ineffective.
      Pacing is fairly well controlled, though the writer sometimes spurts ahead too quickly or spends too much time on the obvious.
      Transitions often work well; at times, though, connections between ideas are fuzzy or call for inferences.
      Despite a few problems, the organization does not seriously get in the way of the main point or storyline.

1 The writing lacks a clear sense of direction. Ideas, details, or events seem strung together in a random, haphazard fashion--or else there is no identifiable internal structure at all. More than one of the following problems is likely to be evident:
      The writer has not yet drafted a real lead or conclusion.
      Transitions are not yet clearly defined; connections between ideas seem confusing or incomplete.
      Sequencing, if it exists, needs work.
      Pacing feels awkward, with lots of time spent on minor details or big, hard-to-follow leaps from point to point.
      Lack of organization makes it hard for the reader to get a grip on the main point or storyline.

Figure 8-6
Sample Rating Scales for Writing (Reprinted from "Linking Writing Assessment and Instruction," in Creating Writers (pp. 104-106) by V. Spandel and R. J. Stiggins, 1990, White Plains, NY: Longman. Copyright 1990 by Longman. Reprinted by permission of Longman.)

2.    The specific content of structured exercises, defining the tasks to be carried out by performers

3.    The number of exercises needed to provide a sufficient sample of performance

We will now delve into each in some detail.

5    The writer speaks directly to the reader in a way that is individualistic, expressive, and engaging. Clearly, the writer is involved in the text and is writing to be read.
      The paper is honest and written from the heart. It has the ring of conviction.
      The language is natural yet provocative; it brings the topic to life.
      The reader feels a strong sense of interaction with the writer and senses the person behind the words.
      The projected tone and voice give flavor to the writer's message and seem very appropriate for the purpose and audience.

3    The writer seems sincere, but not genuinely engaged, committed, or involved. The result is pleasant and sometimes even personable, but short of compelling.
      The writing communicates in an earnest, pleasing manner. Moments here and there amuse, surprise, delight, or move the reader.
      Voice may emerge strongly on occasion, then retreat behind general, vague, tentative, or abstract language.
      The writing hides as much of the writer as it reveals.
      The writer seems aware of an audience, but often weighs words carefully, stands at a distance, and avoids risk.

1 The writer seems indifferent, uninvolved, or distanced from the topic and/or the audience. As a result, the writing is flat, lifeless, or mechanical; depending on the topic, it may be overly technical or jargonistic. More than one of the following problems is likely to be evident:
      The reader has a hard time sensing the writer behind the words. The writer does not seem to reach out to an audience, or make use of voice to connect with that audience.
      The writer speaks in a kind of monotone that tends to flatten all potential highs and lows of the message.
      The writing communicates on a functional level, with no apparent attempt to move or involve the reader.
      The writer is not yet sufficiently engaged or at home with the topic to take risks or share him/herself.

Figure 8-6 (Continued)
Sample Rating Scales for Writing

Nature of Exercises. The decision about whether to rely on structured exercises, naturally occurring events, or some combination of the two should be influenced by several factors related to the outcome(s) to be assessed and the environment within which the assessment is to be conducted.

Focus of Assessment. Structured exercises and naturally occurring events can help us get at slightly different targets. When a pending performance assessment is announced in advance and students are given instructions as to how to prepare, we intend to maximize their motivation to perform well. In fact, we often try to encourage best possible performance by attaching a grade or telling students that observers from outside the classroom (often parents) will watch them perform. When we take these steps and build the assessment around structured exercises, we set our conditions up to assess students' best possible performance, under conditions of maximum motivation to do well—a very important outcome.
However, sometimes our objective is not to see the student's "best possible" performance. Rather, what we wish is "typical" performance, performance under conditions of the students' regular, everyday motivation. For example, we want students to adhere to safety rules in the woodworking shop or the science lab all the time (under conditions of typical motivation), not just when they think we are evaluating them (maximum motivation to perform well). Observation during naturally occurring classroom events can allow us to get at the latter.
From an assessment quality control point of view, we still must be clear about our purpose. And, explicit performance criteria are every bit as important here. But our assessment goal is to be watching closely as students behave spontaneously in the performance setting.

Time for Reflection
Identify a few achievement targets you think might be most effectively as­sessed through the unobtrusive observation of naturally occurring events. In your experience as a teacher or student, have you ever been assessed in this way? When?

Time Available to Assess. In addition to motivational factors, there also are practical considerations to bear in mind in deciding whether to use structured or naturally occurring events. One is time. If normal events of the classroom afford you opportunities to gather sound evidence of proficiency without setting aside special time for the presentation of structured exercises and associated observations, then take advantage of the naturally occurring instructional event. The dividend will be time saved from having to devise exercises and present and explain them.

Natural Availability of Evidence. Another practical matter to consider in your choice is the fact that classrooms are places just packed full of evidence of student proficiency. Think about it--teachers and students spend more time together than do the typical parent and child or husband and wife! Students and teachers live in a world of constant interaction in which both are watching, doing, talking, and learning. A teacher's greatest assessment tool is the time spent with students.
This permits the accumulation of bits of evidence--for example, corroboration of past inferences about student proficiency and/or evidence of slow, gradual growth—over extended periods of time, and makes for big samples. It offers opportunities to detect patterns, to double check, and to verify.

Spontaneous Assessment. Everything I have said about performance assessment up to this point has depicted it as rational, structured, and preplanned. But teachers know that assessment is sometimes spontaneous. The unexpected classroom event or the briefest of unanticipated student responses can provide the ready observer with a new glimpse into student competence. Effective teachers see things. They file those things away. They accumulate evidence of proficiency. They know their students. No other assessor of student achievement has the opportunity to see students like this over time.
But beware! These kinds of spontaneous performance assessments based on on-the-spot, sometimes unobtrusive observations of naturally occurring events are fraught with as many dangers of misassessment as any other kind of performance assessment. Even in these cases, we are never absolved from adhering to the basic principles of sound assessment: clear target, clear purpose, proper method, sound sample, and controlled interference.
You must constantly ask yourself: What did I really see? Am I drawing the right conclusion based on what I saw? How can I capture the results of this spontaneous assessment for later use (if necessary or desirable)? Anecdotal notes alone may suffice. The threats to sound assessment never leave us. So by all means, take advantage of the insights provided by classroom time spent together with your students. But as a practical classroom assessment matter, do so cautiously. Create a written record of your assessment whenever possible.

Time for Reflection
Can you think of creative, realistic ways to establish dependable records of the results of spontaneous performance assessments that happen during an instructional day?

Exercise Content. Like well-developed essay exercises, sound structured performance assessment exercises explain the challenge to the respondent and set them up to succeed if they can, by doing the following:
      identifying the specific kind(s) of performance to be demonstrated
      detailing the context and conditions within which proficiency is to be demonstrated
      pointing the respondent in the direction of a good response by identifying the standards to be applied in evaluating performance

Here is a simple example:
Achievement: You are to apply your knowledge of energy converted to motion and simple principles of mechanics by building a mousetrap car.

Conditions: Using materials provided in class and within the time limits of four class periods, please design and diagram your plan, construct the car itself, and prepare to explain why you included the design features you chose.

Standards: Your performance will be evaluated in terms of the specific standards we set in class, including the clarity of your diagrammed plan, the performance and quality of your car, and your presentation explaining its design features. If you have questions about these instructions or the standards of quality, let me know.

In this way, sound exercises frame clear and specific problems to solve.
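To see how the three parts fit together, here is a minimal sketch, in Python, of the mousetrap car exercise above captured as a simple record; the field names are an illustrative convention of mine, not a required format:

    # The three parts of a structured performance exercise, stored so that
    # each element can be shared with students in advance.
    exercise = {
        "achievement": "Apply knowledge of energy converted to motion and "
                       "simple principles of mechanics by building a mousetrap car.",
        "conditions": "Materials provided in class; four class periods; "
                      "design and diagram a plan, build the car, prepare to explain it.",
        "standards": [
            "clarity of the diagrammed plan",
            "performance and quality of the car",
            "presentation explaining design features",
        ],
    }

    # A complete exercise answers all three questions; a missing part
    # leaves students guessing about the target.
    for part, detail in exercise.items():
        print(part, "->", detail)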
In a comprehensive discussion of the active ingredients of sound performance assessment exercises, Baron (1991) offers us clear and thought-provoking guidance. I quote and paraphrase below at length from her work because of the richness of her advice. Baron urges that we ask important questions about the nature of the assessment:

      "If a group of curriculum experts in my field and a group of educated citizens in my community were to use my assessment tasks as an indicator of my edu­cational values, would I be pleased with their conclusions? And would they?" (p. 307)
      "When students prepare for my assessment tasks and I structure my curriculum and pedagogy to enable them to be successful on these tasks, do I feel assured that they will be making progress toward becoming genuine or authentic read­ers, mathematicians, writers, historians, problem solvers, etc.?" (p. 308)
      "Do my tasks clearly communicate my standards and expectations to my stu­dents?" (p. 308)
      Is performance assessment the best method to use given what I want my stu­dents to know and be able to do?
      “Are some of my tasks rich and integrative, requiring students to make connec­tions and forge relationships among various aspects of the curriculum?" (p. 310)
      "Are my tasks structured to encourage students to access their prior knowledge and skills when solving problems?" (p. 310)
      "Do some tasks require students to work together in small groups to solve com­plex problems?" (p. 311)
      Do some of my tasks require that my students sustain their efforts over a period of time (perhaps even an entire term!) to succeed?
      Do some tasks offer students a degree of freedom to choose the course of ac­tion--to design and carry out the investigations—they will take to solve the prob­lem?
      "Do my tasks require self-assessment and reflection on the part of students?" (p. 312)
      "Are my tasks likely to have personal meaning and value to my students?" (p. 313)
      "Are they sufficiently challenging for the students?" (p. 313)
      "Do some of my tasks provide problems that are situated in real-world contexts and are they appropriate for the age group solving them?" (p. 314)

These guidelines define the art of developing sound performance exercises.

Time for Reflection
What might a performance assessment exercise look like that could test your skill in developing a high-quality performance assessment? What specific in­gredients would you include in the exercise?

The Number of Exercises—Sampling Considerations. How do we know how many exercises are needed within an assessment to give us confidence that we are drawing dependable conclusions about student proficiency? This is a particularly troubling issue in the context of performance assessment, because the amount of time required to administer, observe, and score any single exercise can be so long. Sometimes, we feel it is impossible to employ a number of exercises because of time and workload constraints.
However, this view can lead to problems. Consider writing assessment, for example. Because writing takes so many forms and takes place in so many contexts, defining proficiency is very complex. As a result, proficiency in one writing context may not predict proficiency in another. We understand that a proper sample of writing proficiency, one that allows generalizations to the entire performance domain, must include exercises calling for various kinds of writing, such as narrative, expository, and persuasive. Still, however, we find large-scale writing assessments labeling students as writers or nonwriters on the basis of a single twenty- to sixty-minute writing sample (Bond & Roeber, 1993). Why? Because that's all the assessment resources will permit!
Sampling always involves tradeoffs between the quality of resulting information and the cost of collecting it. Few have the resources needed to gather the perfect sample of student performance. We all compromise. The good news for you as a teacher is that you must compromise less than the large-scale assessor, primarily because you have more time with your students. This is precisely why I feel that the great strength and future of performance assessment lies in the classroom, not in large-scale standardized testing.
In the classroom, it is often helpful to define sampling as the purposeful collection of a number of bits of information about student achievement gathered over time. When gathered and summarized carefully, these bits of insight can form a representative sample of performance that can lead to confident conclusions about student achievement.
Unfortunately, there are no hard and fast rules to follow in determining how many exercises are needed to yield dependable conclusions. That means we must once again speak of the art of classroom assessment. I will share a sample decision rule with you now that depicts the artistic judgment in this case, and then I will review specific factors you can consider when exercising your judgment.
The sampling decision rule is this: You know you have presented enough exercises and gathered enough instances of student performance when you can predict with a high degree of confidence how well the student would do on the next one. Part of performance assessment sampling is science and part of it is art. Dealing first with the science, the more systematic part, one challenge is to gather samples of student performance under all or most of the circumstances in which they will be expected to perform over the long haul. Let me illustrate from life.
Let's say we want to assess for the purpose of certifying the competence of commercial airline pilots. One specific skill we want them to demonstrate, among others, is the ability to land the plane safely. So we take candidates up on a bright, sunny, calm day and ask them to land the plane--clearly an authentic performance assessment. Let's say all pilots do an excellent job of landing. Are you ready to certify them?
If your answer is yes, I don't want you screening the pilots hired by the airlines on which I fly. Our assessment only reflected one narrow set of circumstances within which we expect our pilots to be competent. What if it's night, not bright, clear daylight? A strange airport? Windy? Raining? An emergency? These represent realities within which pilots must operate routinely. So the proper course of action in sampling performance for certification purposes is to analyze relevant variables and put them together in various combinations to see how each candidate performs. At some point, the array of samples of landing proficiency (gathered under various conditions) combine to lead us to a conclusion that the skill of landing safely has or has not been mastered.
This example frames your performance assessment sampling challenge, too. How many "landings" must you see under what kinds of conditions to feel confident your students can perform according to your standards? The science of such sampling is to have thought through the important conditions within which performance is to be sampled. The art is to use your resources creatively to gather enough different instances under varying conditions to bring you and the student to a confident conclusion about proficiency.
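To make the more systematic side of this concrete, here is a minimal sketch, in Python, of how the pilot example's relevant variables could be combined into candidate observation conditions; the condition lists are illustrative assumptions only:

    import itertools

    # Illustrative condition variables from the pilot-certification example.
    light = ["daylight", "night"]
    weather = ["calm", "windy", "raining"]
    airport = ["familiar", "unfamiliar"]

    # Each combination is one circumstance under which landing skill
    # could be observed: 2 x 3 x 2 = 12 combinations in all.
    for conditions in itertools.product(light, weather, airport):
        print(conditions)

Resources rarely permit observing every combination, so the art lies in choosing a subset varied enough to support a confident conclusion.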
In this context, I'm sure you can understand why you must consider the seriousness of the decision to be made in planning your sample. Some decisions bear greater weight than others. These demand assessments that sample both more deeply and more broadly to give you confidence in the decision that results—certifying a student as competent for purposes of high school graduation, for example. On the other hand, some decisions leave more room to err. They allow you to reconsider the decision later, if necessary, at no cost to the student—for example, assessing a student's ability to craft a complete sentence during a unit of instruction on sentence construction. When the target is narrow and the time frame brief, we need to sample fewer instances of performance.
Figure 8-7 identifies four factors to take into account in making sampling decisions in any particular performance assessment context. Even within the guidelines these provide, however, the artistic sampling decision rule is this: You know how confident you are. If you are quite certain you have enough evidence, draw your conclusion and act upon it.
But your professional challenge is to follow the rules of sound assessment and gather enough information to minimize the chance that you are wrong. The conservative position to take in this case is to err in the direction of oversampling to raise your level of confidence.
If you feel uncertain about the conclusion you might draw regarding the achievement of a particular student, you have no choice but to gather more information. To do otherwise is to place the well-being of that student in jeopardy.
·         The reason(s) for the assessment. The more critical the decision, the more sure you must be and the more information you should gather; a simple daily instructional decision that can be reversed tomorrow if necessary requires less confidence, and therefore a smaller sample of performance, than a high school graduation decision.
·         The scope of the target. The broader the scope, the more different instances of performance we must sample.
·         The amount of information provided by the response to one exercise. Exercises can be written to produce very large samples of work, providing a great deal of information about proficiency; when we use these, we may need fewer exercises.
·         The resources available for observing and evaluating. Put simply, the bigger your labor force, the more assessment you can conduct per unit of time. This may be something you can really take advantage of. Always remain aware of all the shoulders over which you can spread the performance assessment workload: the principal, teacher aides, colleagues, parents, outside experts, students.

Figure 8-7
Considerations in Performance Assessment Sampling


Time for Reflection
Based on your experience as a student, can you identify a skill achievement target that you think would take several exercises to sample appropriately, and another that you think could be sampled with only one, or very few, exercises? What are the most obvious differences between these two targets?


Phase 3: Scoring and Recording Results

Three design issues demand our attention at this stage of performance assessment development, if we are to make the entire plan come together:
1.    the level of detail we need in assessment results
2.    the manner in which results will be recorded
3.    who will do the observing and evaluating

These are straightforward decisions, if we approach them with a clear target and a clear sense of how the assessment results are to be used.

Level of Detail of Results. We have two choices in the kinds of scores or results we derive from our observations and judgments: holistic and analytical. Both require explicit performance criteria. That is, we are never absolved from responsibility for having articulated the meaning of academic performance in clear and appropriate terms.
However, the two kinds of scoring procedures use the criteria in different ways. We can either (a) score analytically and make our judgments by considering each key dimension of performance or criterion separately, thus analyzing performance in terms of each of its elements, or (b) make our judgments holistically by considering all of the criteria simultaneously, making one overall evaluation of performance. The former provides a high-resolution picture of performance but takes more time and effort to accomplish. The latter provides a more general sense of performance but is much quicker.
Your choice of score type will turn on how you plan to use the results (whether you need precise detail or a general picture) and the resources you have available to conduct the assessment (whether you have time to evaluate analytically).
Some assessment contexts demand analytical evaluation of student performance. No matter how hard you try, you will not be able to diagnose student needs based on holistic performance information. You will never be able to help students understand and learn to replicate the fine details of sound performance by teaching them to score holistically.
But on the other hand, it is conceivable that you may find yourself involved in an assessment where you must evaluate the performance of hundreds of students with few resources on hand, too few to score analytically. Holistic scoring may be your only option.
(As a personal aside, I must say that I am minimizing my own use of holistic scoring as a single, stand-alone judgment of overall performance. I see few applications for such a score in the classroom. Besides, I have begun to question the meaning of such scores. I have participated in some writing assessments in which students whose analytical profiles of performance (including six different rating scales) were remarkably different ended up with the same holistic score. That gives me pause to wonder about the real meaning and interpretability of holistic scores. I have begun to think holistic scores mask the kind of more detailed information needed to promote classroom-level student growth. It may be that, in the classroom, the benefits of quick scoring are not worth the costs of sacrificing such valuable assessment information.)
When a holistic score is needed, it is best obtained by summing analytical scores, simply adding them together. Or, if your vision of the meaning of academic success suggests that some analytical scales are more important than others, they can be assigned a higher weight (by multiplying by a weighting factor) before summing. However, a rational basis for determining the weights must be spelled out in advance.
It may also be acceptable to add a rating scale that reflects "overall impression" to a set of analytical score scales, if the user can define how the whole is more than the sum of the individual parts of performance.
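As a minimal sketch of the arithmetic, assume (hypothetically) three analytical scales scored 1 to 5, with content judged twice as important as the other dimensions; in Python, the weighted holistic score is then:

    # Hypothetical analytical ratings for one student, each on a 1-5 scale.
    analytical_ratings = {"content": 4, "organization": 5, "delivery": 3}

    # Illustrative weights; the rational basis for any real weights must
    # be spelled out in advance, as noted above.
    weights = {"content": 2.0, "organization": 1.0, "delivery": 1.0}

    # Multiply each rating by its weight, then sum.
    holistic = sum(analytical_ratings[d] * weights[d] for d in analytical_ratings)
    print(holistic)  # (4)(2.0) + (5)(1.0) + (3)(1.0) = 16.0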

Recording Results. Performance assessors have the freedom to choose from among a wonderful array of ways to record results for later communication. These include checklists, rating scales, anecdotal records, and mental record keeping. Each of these is described in Table 8-1 in terms of definition, principal strength, and chief limitation.

Table 8-1
Options for Recording Performance Judgments

Checklists
    Definition: List of key attributes of good performance, checked present or absent.
    Strength: Quick; useful with a large number of criteria.
    Limitation: Results can lack depth.

Rating scales
    Definition: Performance continuum mapped onto a numerical scale ranging from low to high.
    Strength: Can record judgment and rationale with one rating.
    Limitation: Can demand extensive, expensive development and training for raters.

Anecdotal records
    Definition: Student performance is described in detail in writing.
    Strength: Can provide rich portraits of achievement.
    Limitation: Time consuming to read, write, and interpret.

Mental records
    Definition: Assessor stores judgments and/or descriptions of performance in memory.
    Strength: Quick and easy way to record.
    Limitation: Difficult to retain accurate recollections, especially as time passes.

Note that checklists, rating scales, and anecdotal records all store information that is descriptive in terms of the performance criteria. That is, each element of performance checked, rated, or written about must relate to our judgments about student performance on established key dimensions.
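To illustrate how the three written options store criterion-referenced information, here is a brief sketch, with entirely hypothetical criteria and judgments, of each record as a simple data structure in Python:

    # Checklist: each key attribute of good performance marked present or absent.
    checklist = {
        "clear introduction": True,
        "logical sequencing": True,
        "smooth transitions": False,
    }

    # Rating scale: a judgment placed on a numerical continuum per dimension.
    ratings = {"organization": 4, "voice": 3}  # 1 (low) to 5 (high)

    # Anecdotal record: a detailed written description tied to the criteria.
    anecdotal = ("Introduction draws the reader in; transitions between the "
                 "second and third sections are abrupt and call for inferences.")

    # Every entry refers back to an established dimension of performance.
    print(checklist["smooth transitions"])  # False: attribute judged absent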
In using mental record keeping, we can store either ratings or images of actual performance. I included it in this list to provide an opportunity to urge caution when using this notoriously undependable storage system! Most often, it is not a good idea to rely on our mental records of student achievement. When we try to remember such things, there are five things that can happen, and four of them are bad. The one good possibility is that we might retain an accurate recollection of performance. The bad things are that we could do any or all of the following:
      forget, losing that recollection forever
      remember the performance but ascribe it to the wrong student
      unconsciously allow the memory to change over time due to our observations of more recent performance
      retain a memory of performance that serves as a filter through which we see and interpret all subsequent performance, thus biasing our judgments inappropriately

The chances of these problems occurring increase the longer we try to maintain accurate mental records and the more complex these records are.
For the reasons listed above, I urge you to limit your use of this filing system to no more than a day or two at most and to very limited targets. If you must retain the record of performance longer than that, write it down—as a checklist, a set of rating scales, or an anecdotal record!
Checking for Errors in Judgment. Subjective scoring—a prospect that raises the anxiety of any assessment specialist—is the hallmark of performance assessment. I hope by now you see why it is that we in the assessment community urge caution as the education community moves boldly to embrace this option. It is fraught with potential danger and must be treated with great care.

We already have discussed many ways to assure that our subjective assessment process is as objective as it can be:
      Be mindful of the purpose for assessing
      Be crystal clear about the target
      Articulate the key elements of good performance in explicit performance criteria
      Share those criteria with students in terms they understand
      Learn to apply those criteria in a consistent manner
      Double check to be sure bias does not creep into the assessment process.

Testing for Bias. There is a simple way to check for bias in your performance evaluations. Remember, bias occurs when factors other than the kind of achievement being assessed begin to influence our judgments, such as the gender, age, ethnic heritage, appearance, or prior academic record of the examinee. You can determine the degree of objectivity of your ratings by comparing them with the judgments of another trained and qualified evaluator who independently observes and evaluates the same student performance with the intent of applying the same criteria. If, after observing and evaluating performance, two independent judges generally agree on the level of proficiency demonstrated, then we have evidence that the results reflect student proficiency. But if the judges come to significantly different conclusions, they obviously have applied different standards. We have no way of knowing which is the most accurate estimate of true student achievement. Under these circumstances the accuracy of the assessment must be called into question and the results set aside until the reasons for those differences have been thoroughly explained.
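A minimal sketch of such an objectivity check, assuming two raters who have independently scored the same eight performances on a 1-to-5 scale (all numbers hypothetical), might look like this in Python:

    # Independent ratings of the same eight student performances.
    rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
    rater_b = [4, 2, 5, 2, 3, 3, 5, 3]

    # Count pairs that agree within one point; the tolerance is itself a
    # judgment the raters must make and defend, not a fixed rule.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= 1)
    percent = 100 * agreements / len(rater_a)
    print(f"{percent:.0f}% agreement")  # 88%: only the last pair differs by two

When agreement is low, the first place to look is the clarity of the performance criteria themselves.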

Time for Reflection
It's tempting to conclude that it is unrealistic to gather corroborating judgments in the classroom—to double check ratings. But can you think of any helpers who might assist you in your classroom by playing the role of second rater of a performance assessment? For each, what would it take to involve them productively? What benefits might arise from their involvement?

Practical Ways to Find Help. While this test of objectivity, or of evaluator agreement, promises to help us check an important aspect of performance assessment quality, it seems impractical for classroom use for two reasons: it's often difficult to come up with a qualified second rater, or we lack the time and expertise required to compare evaluations.
In fact, however, this process need not take so much time. You need not check all of your judgments for objectivity. Perhaps a qualified colleague could double check just a few—just to see if your ratings are on target.
Further, it doesn't take a high degree of technical skill to do this. Have someone who is qualified rate some student performance you already have rated, and then sit down for a few minutes and talk about any differences. If the performance to be evaluated is a product students created, have your colleague evaluate a few. If it's a skill, videotape a few. Apply your criteria to one and check for agreement. Do you both see it about the same way? If so, go on to the next one. If not, try to resolve differences, adjusting your performance criteria as needed.
Please understand that my goal here is not to have you carry out this test of objectivity every time you conduct a performance assessment. Rather, I want you to understand the spirit of this test of your objectivity. An important part of the art of classroom performance assessment is the ability to sense when your performance criteria are sufficiently explicit that another judge would be able to use them effectively, if called upon to do so. Further, from time to time it is a good idea to actually check whether you and another rater really do agree in applying your criteria.
On those occasions, however, when you are conducting very important performance assessments that have significant impact on students (i.e., for promotion decisions, graduation decisions, and the like), you absolutely must at least have a sample of your ratings double checked by an independent rater. In these instances, remember you do have access to other available evaluators: colleagues in your school or district, your building administrative staff, support teachers, curriculum personnel, experts from outside the field of education (when appropriate), retired teachers in your community, qualified parents, and others.
In addition, sharing your criteria with your students and teaching them to apply those standards consistently can provide you with useful insights. You can be assured that they will tell you which criteria they don't understand.
Just remember, all raters must be trained to understand and apply your standards. Never assume that they are qualified to evaluate performance on the basis of prior experience if that experience does not include training in using the criteria you employ in your classroom. Have them evaluate some samples to show you they can do it. If training is needed, it very often does not take long. Figure 8-8 presents steps to follow when training raters. Remember, once they're trained, your support raters are allies forever. Just think of the benefits to you if you have a pool of trained evaluators ready to share the workload.

More about Students as Partners
Imagine what it would mean if your helpers—your trained and qualified evaluators of process and/or product—were your students. Not only could they be participants in the kind of rater training spelled out in Figure 8-8, but they might even be partners in the process of devising the performance criteria themselves. And, once trained, what if they took charge of training some additional students, or perhaps trained their parents to be qualified raters, too?

      Have trainees review and discuss the performance criteria. Provide clarification as needed.
      Give them a sample of work to evaluate that is of known quality to you (i.e., which you already have rated), but not to your trainees.
      Check their judgments against yours, reviewing and discussing any differences in terms of the specifics of the performance criteria.
      Give them another sample of work of known quality to evaluate.
      Compare their judgments to yours again, noting and discussing differences.
      Repeat this process until your trainees converge on your standards, as evidenced by a high degree of agreement with your judgments.
      You and the trainees evaluate a sample of work of unknown quality. Discuss any differences.
      Repeat this process until you have confidence in your new partner(s) in the evaluation process.
  
Figure 8-8
Steps in Training Raters of Student Performance
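For those who track training rounds electronically, the "repeat until agreement" loop in Figure 8-8 can be expressed directly. The sketch below, again in Python, assumes numeric scores and exact-match agreement; the scores and the 80 percent cutoff are arbitrary stand-ins for whatever standard you set.

# A minimal sketch of the convergence check in Figure 8-8, assuming each
# training round yields a trainee's scores on work samples you have
# already rated. Names, data, and the threshold are illustrative.

AGREEMENT_THRESHOLD = 0.80  # assumed cutoff; set whatever standard you trust

def round_agreement(your_scores, trainee_scores):
    """Exact-agreement rate for one round of training samples."""
    matches = sum(1 for y, t in zip(your_scores, trainee_scores) if y == t)
    return matches / len(your_scores)

# Hypothetical rounds: the trainee's judgments converge on yours over time.
rounds = [
    ([3, 4, 2, 5], [2, 4, 3, 5]),  # round 1: 50% agreement
    ([4, 3, 5, 2], [4, 3, 4, 2]),  # round 2: 75% agreement
    ([5, 2, 3, 4], [5, 2, 3, 4]),  # round 3: 100% agreement
]

for i, (yours, trainees) in enumerate(rounds, start=1):
    rate = round_agreement(yours, trainees)
    print(f"Round {i}: {rate:.0%} agreement")
    if rate >= AGREEMENT_THRESHOLD:
        print("Trainee has converged; move on to samples of unknown quality.")
        break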

The pool of available helpers begins to grow as more participants begin to internalize the meaning of success in your classroom.
Without question, the best and most appropriate way to integrate performance assessment and instruction is to be absolutely certain that the important performance criteria serve as the goals and objectives of the instruction. As we teach students to understand and demonstrate key dimensions of performance, we prepare them to achieve the targets we value. We prepare in sound and appropriate ways to be held accountable for student learning when we are clear and public about our performance criteria, and when we do all in our power to be sure students have the opportunity to learn to hit the target.
In addition, we can make performance assessment an integral part of the teaching and learning process by involving students in assessment development and use:
      Share the performance criteria with students at the beginning of the unit of instruction.
      Collaborate with students in keeping track of which criteria have been covered and which are yet to come.
      Involve students in creating a prominent visual display of important performance criteria for bulletin boards.
      Engage students in the actual development of performance exercises.
      Engage students in comparing contrasting examples of performance, some of which reflect high-quality work and some of which do not (perhaps as part of a process of developing performance criteria).
      Involve students in the process of transforming performance criteria into checklists, rating scales, and other recording methods.
      Have students evaluate their own and each other's performance, one on one and/or in cooperative groups.
      Have students rate performance and then conduct studies of how much agreement (i.e., objectivity) there was among student judges; see if the degree of agreement increases as students become more proficient as performers and as judges (a sketch of such a study follows this list).
      Have students reflect in writing on their own growth over time with respect to specified criteria.
      Have students set specific achievement goals in terms of specified criteria and then keep track of their own progress.
      Store several samples of each student's performance over time, either as a portfolio or on videotape, if appropriate, and have students compare old performance to new and discuss in terms of specific ratings.
      Have students predict their performance, criterion by criterion, and then check actual evaluations to see if their predictions are accurate.
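As promised in the list above, here is one way to run an agreement study among student judges. This Python sketch assumes each judge rated the same performances on a numeric scale; the judges' names and scores are hypothetical.

# A minimal sketch of an agreement study among several student judges,
# assuming each judge rated the same set of performances on a numeric
# scale. All names and scores here are hypothetical.

from itertools import combinations

def mean_pairwise_agreement(ratings_by_judge):
    """Average exact-agreement rate across every pair of judges.
    `ratings_by_judge` maps judge name -> list of scores, one per performance."""
    rates = []
    for a, b in combinations(ratings_by_judge.values(), 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        rates.append(matches / len(a))
    return sum(rates) / len(rates)

# Early in the unit vs. after practice with the criteria (hypothetical data)
early = {"Ana": [2, 4, 3], "Ben": [3, 4, 2], "Cai": [2, 3, 3]}
later = {"Ana": [3, 4, 2], "Ben": [3, 4, 2], "Cai": [3, 4, 3]}

print(f"Agreement early in the unit: {mean_pairwise_agreement(early):.0%}")
print(f"Agreement after practice:    {mean_pairwise_agreement(later):.0%}")

Rising agreement from early to later in the unit is evidence that students are internalizing the performance criteria, exactly the outcome we want.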

Time for Reflection
Have you ever been involved in any of these ways of assessing your own performance—as a partner with your teacher? If so, what was the experience like for you?

These activities will help increase students' control of their own academic well-being and will remove the mystery that too often surrounds the meaning of success in the classroom.

Barriers to Sound Performance Assessment
There are many things in the design and development of performance assessments that can cause a student's real achievement to be misrepresented. Many of the potential problems and remedies are summarized in Table 8-2.


CHAPTER SUMMARY: THOUGHTFUL DEVELOPMENT YIELDS SOUND ASSESSMENTS

This chapter has been about the great promise of performance assessment. However, the presentation has been tempered with the need to develop and use this option cautiously. Performance assessment, like other methods, brings with it specific rules of evidence. We must all strive to meet those rigorous standards.
We began with an overview of the three steps in developing performance assessments: clarifying performance (dealing with the nature and focus of the achievement to be assessed), developing exercises (dealing with the nature, content, and number of exercises), and scoring (dealing with kinds of scores, recording results, and identifying and training the evaluator). As we covered each step, we discussed how students could become full partners in performance assessment design, development, and use. The result will be better performers.

Table 8-2
Sources of Problems in Performance Assessment and Their Remedies

Source of Problems: Inadequate vision of the target
Remedy: Seek training and help needed to clarify the vision. Collaborate with others in this process.

Source of Problems: Wrong method for the target
Remedy: Stick to process and product targets when using performance assessment.

Source of Problems: Incorrect performance criteria
Remedy: Compare contrasting cases to zero in on key differences. Tap into sources of quality criteria devised by others.

Source of Problems: Unclear performance criteria
Remedy: Study samples of performance more carefully. Seek qualified expertise whenever necessary.

Source of Problems: Poor-quality exercises
Remedy: Think about and specify the achievement to be demonstrated, the conditions, and the standards to be applied.

Source of Problems: Inadequate sample of exercises
Remedy: Define the domain to be sampled as precisely as possible. Gather as much evidence as you need to corroborate your judgments.

Source of Problems: Too little time to evaluate
Remedy: Add trained evaluators; they are available!

Source of Problems: Untrained evaluators
Remedy: Use clear criteria and examples of performance as a starting point in training them.

Source of Problems: Inappropriate scoring method selected (holistic vs. analytical)
Remedy: Understand the relationship between holistic and analytical scoring and assessment purpose.

Source of Problems: Poor record keeping
Remedy: Strive for accurate written records of performance judgments. Don't depend on memory.

Source of Problems: Keeping the criteria and performance assessment process a mystery to students
Remedy: Don't. Share and explain the criteria with students from the start.

To assure quality, we discussed the need to understand the role of subjectivity in performance assessment. We also analyzed the match between performance assessment and the five kinds of achievement targets, concluding that strong matches can be developed for mastery of knowledge through reference materials, reasoning, skills, and products. We discussed the key context factors to consider in selecting this methodology for use in the classroom, centering mostly on the importance of having in place the necessary expertise and resources.
We devised six practical steps for formulating sound performance criteria, urging collaboration with students and/or colleagues in the process. We set standards for sound exercises, including the need to identify the achievement to be demonstrated, the conditions of the demonstration, and the standards of quality to be applied. And finally, we spelled out scoring options, suggesting that analytical evaluation of student work is likely to be most productive, especially when students are trained to apply standards of quality to their own and each other's work.
As the decade of the 1990s unfolds, we will come to rely more and more on performance assessment methodology as the basis for our evaluation of student
achievement, and as a means of integrating assessment and instruction. Let us strive for the highest quality, most rigorous assessments our resources will allow.

EXERCISES TO ADVANCE YOUR LEARNING
Knowledge Outcomes
1.    Memorize the three basic parts and nine design decisions that guide the performance assessment development process.
2.    List the aspects of performance assessment design that require professional judgment and the dangers of bias associated with each.
3.    Specify the kinds of achievement targets that can be transformed into the per­formance assessment format and those that cannot.
4.    Identify the factors to take into account in considering use of the performance assessment option.
5.    Describe the key considerations in devising a sound sample of performance exercises.
6.    Memorize the six steps in the design of performance criteria and the basic ingredients of sound exercises.
7.    In your own words, list as many ways as you can to bring students into the performance assessment process as partners.

Reasoning Outcomes
1.    Find an example of a performance assessment previously developed by you or others and evaluate it. Using the framework provided in this chapter, analyze the underlying structure of the assessment and evaluate each part to see if standards of quality have been met. Write a complete analysis of the assessment, detailing what you would do to improve it, if necessary.

Skill Outcomes
1.    Select a unit of instruction from the material you teach, will teach, or have studied as a student, that includes skill or product outcomes. Go through the process of devising a performance assessment for one of those outcomes, including performance criteria, exercises, and a scoring and recording scheme.

Product Outcomes
1.    Evaluate the assessment you created in the exercise above in terms of the attributes of sound assessment discussed in this and earlier chapters. How did you do?

Affective Outcomes
1.    Some have argued that performance assessments are too fraught with potential bias due to evaluator subjectivity to justify the attention they are receiving these days. Do you agree? Why?
2.    Throughout the chapter, I argue that the assessment development procedures outlined here will help teachers who use performance assessments to connect those assessments directly to their instruction. Having completed the chapter, do you agree? Why?
3.    I also argue that the assessment development and use procedures suggested herein, while apparently very labor intensive, could save you valuable teaching time in the long run. Do you agree? Why?
