Chapter
Objectives
As a result of studying the material presented in
this chapter, reflecting on that material, and completing the learning
exercises presented at the end of the chapter, you will:
1.
Master content knowledge:
a.
Understand the roles of objectivity
and subjectivity in performance assessment.
b.
Know the kinds of achievement targets
that can be reflected in performance assessments.
c.
List three basic parts and nine design
decisions that comprise the steps in performance assessment development.
d.
Specify classroom assessment context
factors to consider in deciding whether or when to use performance assessments.
e.
State considerations in sampling
student performance via performance exercises.
f.
Know specific guidelines for the
construction of performance exercises and scoring schemes.
g.
Specify ways to bring students into
the performance assessment process as a teaching strategy.
2.
Be able to use that knowledge to
reason as follows:
a.
Identify the kinds of skills and
products that can form the basis of performance assessment exercises.
b.
Transform those important learnings into
quality exercises and scoring criteria.
3.
Become proficient at the following skills:
a.
Be able to carry out the steps in
designing performance assessments.
b.
Evaluate previously developed
performance assessments to determine their quality.
4.
Be able to create quality products of
the following sorts:
a.
Performance assessments that meet
standards of quality.
5.
Attain the following affective outcomes:
a.
Understand the importance of knowing
about sound performance assessment.
b.
Value performance assessment as a viable
option for use in the classroom.
c.
See performance assessment as a
valuable instructional tool in which students can and should be full partners.
d.
Regard performance assessment with
caution, valuing the need to adhere to rigorous standards of quality in
development and use.
The education community has
discovered performance assessment methodology! Across the land, we are in a
frenzy to learn about and use this "new" assessment discovery.
Performance assessments involve students in activities that require the
demonstration of certain skills and/or the creation of specified products. As
a result, this assessment methodology permits us to tap many of the complex
educational outcomes we value that cannot be translated into paper and pencil
tests.
With performance assessments, we observe students
while they are performing, or we examine the products created, and we judge the
level of proficiency demonstrated. As with essay assessments, we use these
observations to make subjective judgments about the level of achievement
attained. Those evaluations are based on comparisons of student performance to
preset standards of excellence.
For example, a primary-grade
teacher might watch a student interacting with classmates and draw inferences
about that child's level of development in social interaction skills. If the
levels of achievement are clearly defined in terms the observer can easily
interpret, then the teacher, observing carefully, can derive information from
watching that will aid in planning strategies to promote further social development.
Thus, this is not an assessment where answers are counted right or wrong.
Rather, like the essay test, we rely on teacher judgment to place the student's
performance somewhere on a continuum of achievement levels ranging from very
low to very high.
From a completely different
context, a middle school science teacher might examine "mousetrap
cars" built by students to determine if certain principles of energy
utilization have been followed. Mousetrap cars are vehicles powered by one snap
of a trap. One object is to see who can design a car that can travel the
farthest by converting that amount of energy into forward motion. When the
criteria are clear, the teacher can help students understand why the winning
car goes furthest.
Performance assessment
methodology has arrived on the assessment scene with a flash of brilliance
unprecedented since the advent of selected response test formats earlier in
this century. For many reasons, this "new discovery" has struck a
chord among educators at all levels. Recently popular applications carry such
labels as authentic assessments, alternative assessments, exhibitions,
demonstrations, and student work samples, among others.
These kinds of assessment are
seen as providing high-fidelity or authentic assessments of student
achievement (Wiggins, 1989). Proponents contend that, just as high-fidelity
music provides accurate representations of the original music, so too can
performance assessments provide accurate reproductions of complex achievements
under performance circumstances that stretch into life beyond school. However,
some urge great caution in our rush to embrace this methodology, because
performance assessment brings with it great technical challenges. They
correctly point out that this is a very difficult methodology to develop and
use well (Dunbar, Koretz, & Hoover, 1991).
CHAPTER
ROADMAP
As with the other forms of
assessment, there are three critical contexts within which your knowledge of
performance assessment methodology will serve you well. First, you will design
and develop performance assessments for use in your classroom in the future, if
you are not doing so already. The quality of those assessments, obviously, is
in your hands.
Further, the education
literature, published textbooks, instructional materials, and published test
materials are all beginning to include more and more examples of
performance assessments, which you might consider for use in your classroom.
Once again, you are the gatekeeper. Only you can check these for quality and appropriateness
for use in your context.
And finally, as with other forms
of assessment, you may find yourself on the receiving end of a performance
assessment. Obviously, it is in your best interest to be sure these are as
sound as they can be—as sound as you would want them if you were to use them to
evaluate your own students. Be a critical consumer: If you find flaws in the
quality of these assessments, be diplomatic, but call the problems to the
attention of those who will evaluate your performance.
To prepare you to fulfill these
responsibilities, our journey will begin with an explanation of the basic
elements of a performance assessment, illustrated with simple examples; we
continue by examining the role of subjectivity in this form of assessment; and
then we will analyze the kinds of achievement targets performance assessments
can serve to reflect.
Further, as we travel, we will do
the following:
•
Complete a detailed analysis of the
assessment development process, including specific recommendations for actions
on your part that can help you avoid the many pitfalls to sound assessment that
accompany this alternative.
Figure 8-1
Aligning Achievement Targets and Assessment Methods
(A matrix of the achievement targets Know, Reason, Skill, Product, and Affect crossed with the assessment methods Selected Response, Essay, Performance Assessment, and Personal Communication. Shaded cells mark the material addressed in this chapter.)
•
Address strategies for devising the criteria
by which to evaluate performance, suggestions for developing quality exercises
to elicit performance to be evaluated, and ideas for making and recording
judgments of proficiency.
•
Explore the integration of performance
assessment with instruction, not as a concluding section at the end of the
chapter, but as an integral part of the entire presentation on performance
assessment methodology.
In this way, we will be able to
see the great power of this methodology in the classroom: its ability to place
students in charge of their own academic well-being.
As you proceed through this
chapter, keep the big picture in mind. The shaded cells in Figure 8-1 show the
material we will be addressing herein.
Also as we proceed, be advised
that this is just part one of a multipart treatment of performance assessment
methodology. It is intended only to reveal the basic structure of performance
assessments. The remaining parts are included in Part 3, on classroom
applications. After we cover the basic structure and development of these
assessments in this chapter, we will return with many more examples in Chapter
10, on assessing reasoning in all of its forms, and in Chapter 11, on using
performance assessments to evaluate many skills and products. Your understanding
of performance assessment methodology is contingent upon studying, reflecting
upon, and applying material covered in all three chapters.
A
Note of Caution
While I tend to be a proponent of
performance assessment methodology because of its great potential to reflect
complex and valuable outcomes of education, I urge caution on two fronts.
First, please understand that
there is nothing new about performance assessment methodology. This is not
some kind of radical invention recently fabricated by opponents of traditional
tests to challenge the testing industry. Rather, it is a proven method of
evaluating human characteristics that has been in use for decades (Lindquist,
1951), for centuries, maybe even for eons. For how long have we selected our
leaders, at least in part, on the basis of our observations of and judgments
about their performance under pressure? Further, this is a methodology that has
been the focus of sophisticated research and development both in educational
settings and in the workplace for a very long time (Berk, 1986).
Besides, anyone who has taught
knows that teachers routinely observe students and make judgments about their
proficiency. Admittedly, some of those applications don't meet accepted
standards of assessment quality (Stiggins & Conklin, 1992). But we know
performance assessment is common in the classroom and we know how to do it
well. Our challenge is to make the assessment meet the standards.
Virtually every bit of research
and development done in education and business over the past decades leads to
the same conclusion: performance assessment is a complex way to assess. It
requires that users prepare and conduct their assessments in a thoughtful and
rigorous manner. Those unwilling to invest the time and energy needed to do it well
are better off assessing in some other way.
Second, performance assessment
methodology is not the panacea some advocates seem to think it is. It is
neither the savior of the teacher, nor the key to assessing the
"real" curriculum. It is just one tool capable of providing an
effective and efficient means of assessing some—but not all—of the most highly
valued outcomes of our educational process. As a result, it is a valuable tool
indeed. But it is not the be-all and end-all of the educational assessment
process. For this reason, it is critical that we keep this form of assessment
in balance with the other alternatives.
Although performance assessment
is complex, and requires care to use well, it certainly does hold the promise
of bringing teacher, students, and instructional leaders into the assessment
equation in unprecedented ways. But the cost of reaching this goal can be high.
You must meet the considerable challenge of learning how to develop and use
sound performance assessments. This will not be easy! There is nothing here you
cannot master, but don't take this methodology lightly—we're not talking about
"assessment by guess and by gosh" here. There is no place in
performance assessment for "intuitions" or ethereal "feelings"
about the achievement of students. It is not acceptable for a teacher to claim
to just “know” a student can do it. Believable evidence is required. This is
neither a mystical mode of assessment, nor are the keys to its proper use a
mystery. It takes thorough preparation and meticulous attention to detail to
attain appropriate levels of performance assessment rigor.
Just as with other modes of
assessment, there are rules of evidence for sound performance assessment.
Remember, sound assessments do the following:
•
arise from clear and appropriate
achievement targets
•
serve a clearly articulated purpose
•
rely on proper assessment methods
•
sample performance appropriately
•
control for all relevant sources of
extraneous interference
Adhere to these rules of evidence
when developing performance assessment and you can add immeasurably to the
quality and utility of your classroom assessments of student achievement.
Violate those rules—which is very easy to do in this case!—and you place your
students at risk.
Time for Reflection
As you have seen, performance
assessments are based on observation and judgment. Can you think of instances
outside of the school setting where this mode of assessment comes into play? In
the context of hobbies? In work settings? In other contexts? Please list five
or six examples.
PERFORMANCE
ASSESSMENT AT WORK IN THE CLASSROOM
To appreciate the extremely wide
range of possible applications of performance assessment, we need to explore
the many design alternatives that reside within this methodology. I will
briefly describe and illustrate those now, and will then show you how one
professor put this design framework to work in her very productive learning
environment.
An
Overview of the Basic Components
We initiate the creation of
performance assessments just as we initiated the development of paper and
pencil tests as described in the previous two chapters: We start with a plan or
blueprint. As with selected response and essay assessments, the performance
assessment plan includes three components. In this case, each component
contains three specific design decisions within it.
First, the performance assessment
developer must clarify the performance to be evaluated. Second, performance
exercises must be prepared. And third, systems must be devised for scoring and
recording results.
The immense potential of this
form of assessment becomes apparent when we consider all of the design options
available within this three-part structure. Let's explore these options.
Part 1:
Clarifying Performance. Under this heading, the user
has the freedom to select from a nearly infinite range of achievement target
possibilities. We can focus performance assessments on particular targets by
making three specific design decisions, addressing the kind of performance to
be assessed, identifying who will be assessed, and specifying performance
criteria.
Nature of
Performance. This first design decision requires
that we answer these basic questions: How will successful achievement manifest
itself? Where will the evidence of proficiency most easily be found?
Performance might take the form
of a particular set of skills or behaviors that students must demonstrate. In
this case, we watch students "in process," or while they are actually
doing something, and we evaluate the quality of their performance. The example
given earlier of the primary-grade teacher observing the youngster in
interaction with other students illustrates this kind of performance
assessment. In that instance, success manifests itself in the actions of the
student.
On the other hand, we also can
define performance in terms of a particular kind of product to be created, such
as the mousetrap car. In this application, the standards of performance would
reflect important attributes of an energy-efficient car. The teacher would
examine the car and determine the extent to which the criteria of efficiency
had been met. Evidence of success is found in the attributes of the car as a
product.
Some contexts permit or may even
require the observation and evaluation of both skill and product. For example,
you might watch a student operate a computer (skill) and evaluate the final
product (the resulting program or other document) to evaluate that student's
success in hitting both key parts of that achievement target.
Time for Reflection
Based on your experience as a
student and/or teacher, can you think of additional classroom contexts where it
might be relevant to assess both process and product?
Focus of the
Assessment. To address this design decision, we
must understand that performance assessments need not focus only on individual
student behaviors. They also can apply to the observation of and judgment about
the performance of students functioning in a group.
In these times of cooperative
learning, the evaluation of teamwork can represent a very important and useful
application. Two kinds of observations are worthy of consideration. One focuses
on the group interaction behaviors. The observer tracks the manner in which the
group works as a whole. The group is the unit of analysis of performance, if
you will. A sample performance assessment question might be, Given a problem to
solve, does the group exhibit sound group problem-solving skills?
The other form of observation
focuses on individual behaviors in a group context and summarizes those across
individuals. For example, observers might tally and/or evaluate the nature of
instances of aggressive and dangerous playground behavior. These can be
very informative and useful assessments.
Performance Criteria. Once we have
decided on the performance and performer upon which to focus, the real work
begins. Attention then shifts to (a) specifying in writing all key elements of
performance, and (b) defining a performance continuum for each element so as
to depict in writing what performance is like when it is of very poor quality,
when it is outstanding, and at all key points in between. These key elements or
dimensions of performance are called the performance criteria.
In terms of the two examples we
have been discussing, performance criteria answer the questions, What are the
desirable social interaction behaviors for a primary-grade student? What are
the specific attributes of an energy-efficient mousetrap car?
Clear and appropriate performance
criteria are critical to sound performance assessment. When we can provide
sound criteria, we are in for an easy and productive application of this
methodology. Not only will we be focused on the expected outcomes, but with
clearly articulated performance criteria, as you shall see, both students and
teachers share a common language in which to converse about those expectations.
Time for Reflection
When you are evaluating a movie,
what criteria do you apply? How about a restaurant? Write down criteria you
think should be used as the basis for evaluating a teacher.
Part 2:
Developing Exercises. In designing performance exercises,
we must think of ways to cause students to perform in a manner that will reveal
their level of proficiency. How can we cause them to produce and present a
sample product for us to observe and evaluate, for example? In this facet of
performance assessment design, we decide the nature of the exercises, the
number of exercises needed, and the actual instructions to be given to
performers.
Nature of
Exercises. Performance assessment offers two exercise
choices that, once again, reveal the rich potential of this methodology.
Specifically, there are two ways to elicit performance for purposes of
evaluation.
One option is to present a
structured exercise in which you provide respondents with a predetermined and
detailed set of instructions as to the kind of performance desired. They are
completely aware of the assessment, they reflect upon and prepare for this
assignment, and then they provide evidence of their proficiency. For example,
they might be asked to prepare to give a certain kind of speech, perform some
kind of athletic activity, write a term paper, or build a mousetrap car.
But also be advised that
performance assessment offers another option not available with any other form
of assessment. You can observe and evaluate some kinds of performance during
naturally occurring classroom events and gather useful information about
"typical" student performance. For example, the primary-grade teacher
interested in the social interaction skills of one student obviously would
disrupt the entire assessment by instructing the student, "Go interact
with that group over there, so I can evaluate your ability to get along."
Such an exercise would run completely counter to the very essence of the
assessment. Rather, the teacher would want to stand back and watch the
student's behavior unfold naturally in a group context to obtain usable
information. Assessments that are based on observation and judgment allow for this
possibility, while others do not. Just try to be unobtrusive with a true/false
test!
It may have become apparent to
you also that you can combine observations derived from administration of
structured exercises and from the context of naturally occurring events to
generate corroborating information about the same achievement target. For
example, an English teacher might evaluate writing proficiency gathered in
response to a required assignment and in the context of student daily writing
journals done for practice. The combined information might provide insights
about specific student needs.
Time for Reflection
Can you think of an instance
outside of school in which observation of naturally occurring performance
serves as the basis for an assessment?
In the context of a hobby? In a work setting? In some other context? What is
observed and judged, and by whom?
Content of
Exercises. The final exercise-related design component is
the actual content of the exercise. Like the essay exercise discussed in
Chapter 7, instructions for structured exercises should include the kind(s) of
achievement to be demonstrated, conditions under which that demonstration is to
take place, and standards of quality to be applied in evaluating performance.
Here is a simple example of a complete exercise:
Achievement:
Your four-person team is to do the research required to prepare a group
presentation on the dwellings and primary food sources of the Native American
tribe you have selected.
Conditions:
The manner in which you carry out the background research and divide up
responsibilities within the group is up to you. The focus of the evaluation
will be your presentation. (Note that the process of doing the background
research and preparation will not be evaluated, but the process of giving the
presentation will be.)
Standards:
Your presentation will be evaluated according to the criteria we develop
together in class, dealing with content (scope, organization, and accuracy) and
delivery (use of learning aids, clarity, and interest value for the audience).
Number of
Exercises. Once the nature of the exercise is determined,
you must decide how many exercises are needed. This is a sampling issue. How
many examples of student performance are enough? As discussed, you must decide
how many exercises are needed to provide a representative sample of all the
important questions you could have asked given infinite time. If you want to
know if students can speak French, how many times do they have to speak for you
to be reasonably certain you could predict how well they would do given one
more chance? How many samples of writing must you see to be confident drawing
conclusions about writing proficiency? In fact, the answers to these questions
are a function of the assessment context. To answer them, we must consider
several factors, including the reasons for assessment and other issues. We will
review these factors later in the chapter.
Part 3:
Scoring and Recording Results. Once performance has
been clarified and exercises developed, procedures for managing results must be
specified.
Level of
Detail of Results. First, the user must select one of
two kinds of scores to generate from the assessment. Option one is to evaluate
performance analytically, making independent judgments about each of the
performance criteria separately. In this case, performance is profiled in terms
of individual ratings. Option two is called holistic scoring. In this case, one
overall judgment is made about performance that combines all criteria into one
evaluation. The choice is a function of the manner in which assessment results
are to be used. Some uses require the high-resolution microscope of the
analytical system, while others require the less precise but also less costly
holistic process.
Recording
Procedures. Second, designers must select a specific method
for transforming performance criteria into usable information through a system
of recording procedures. Once again, the great flexibility of performance assessment
methodology comes through. Users have many choices here, too:
•
checklists of desired attributes
present or absent in performance
•
various kinds of performance rating
scales
•
anecdotal records, which capture
written descriptions of and judgments about performance
•
mental records, which capture images
and records of performance in the memory of the evaluator for later recall and
use (to be used cautiously!)
Identifying
the Rater. And finally, performance assessment users must
decide who will observe and evaluate performance. In most classroom contexts,
the most natural choice is the teacher. Since performance evaluators must
possess a clear vision of the desired achievement and be capable of the
rigorous application of the performance criteria, who could be more qualified
than the teacher?
Just be advised that you have
other choices. You might rely on some outside expert to come to the classroom
and participate. Or you might rely on the students to conduct self-assessments
or to evaluate each other's performance.
The instructional potential of
preparing students to apply performance criteria in a rigorous manner to their
own work should be obvious. I will address this application in greater detail
throughout the chapter.
Time for Reflection
As a student, have you ever been
invited to observe and evaluate the skill or product performance of other
students? What did you observe? What criteria did you use? Were you trained to
assess? What was that experience like for you?
Summary
of Basic Components. Figure 8-2 lists the nine
design decisions faced by any performance assessment developer. Also included
are the design options available within each decision.
Design Factors and Options

1. Clarifying Performance
   Nature of performance: behavior to be demonstrated, or product to be created
   Focus of the assessment: individual performance, or group performance
   Performance criteria: reflect key aspects of the specific target

2. Developing Exercises
   Nature of exercises: structured assignment, or naturally occurring events
   Content of exercises: defines target, conditions, and standards
   Number of exercises: a function of purpose, target, and available resources

3. Scoring and Recording Results
   Level of detail of results: holistic or analytical
   Recording procedures: checklist, rating scale, anecdotal record, or mental record
   Identifying the rater: teacher, outside expert, student self-evaluation, or student peer evaluation
Figure 8-2
Performance Assessment Design Framework
ENSURING
THE QUALITY OF PERFORMANCE ASSESSMENTS
If we are to apply the design framework shown in
Figure 8-2 productively, we need to understand where the pitfalls to sound
performance assessment hide. For instance, if we are not careful, problems can
arise from the inherently subjective nature of performance assessment. Other
problems can arise from trying to use this methodology in places where it
doesn't belong.
Subjectivity
in Performance Assessment
Professional judgment guides every aspect of the
design and development of any performance assessment. For instance, as the
developer and/or user of this method, you establish the achievement target to
be assessed using input about educational priorities expressed in state and
local curricula, your text materials, and the opinions of experts in the
field. You interpret all of these factors and you decide what will be
emphasized in your classroom—based on professional judgment.
Further, you select the
assessment method to be used to reflect that target. Based on your vision of
the valued outcomes and your sense of the assessment options available to you,
you make the choices. This certainly qualifies as a matter of professional
judgment.
In the classroom, typically you
create the assessment, either selecting from among some previously developed
options or generating it by yourself. If you generate it yourself, you choose
whether to involve students or other parties in that design process. In the
case of performance assessment, the first design issue to be faced is that of
devising performance criteria, those detailed descriptions of success that will
guide both assessment and instruction. This translation of vision into criteria
is very much a matter of professional judgment.
So is the second design decision
you must make: formulating performance exercises, the actual instructions to
respondents that cause them to either demonstrate certain skills or create some
tangible product, so their performance can be observed and evaluated. And
finally, of course, this observation and evaluation process is subjective too.
Every step along the way is a matter of your professional and subjective
judgment.
For decades, the assessment community
has afforded performance assessment the status of second-class citizenship
because of the potentially negative impacts of all of this subjectivity. The
possibility of bias due to subjective judgment has rendered this methodology
too risky for many.
More recently, however, we have
come to understand that carefully trained performance assessment users, who
invest the clear thinking and developmental resources needed to do a good job,
can use this methodology effectively. Indeed, many of the increasingly complex
achievement targets that we ask students to hit today demand that we use
performance assessments and use them well. In short, we now know that we have
no choice but to rely on subjective performance assessment in certain contexts.
So we had better do our homework as an education community!
Here I must insert as strong a
warning as any presented anywhere in this book: In your classroom, you will set
the standards of assessment quality. It is your vision that will be translated
into performance criteria, exercises, and records of student achievement. For
this reason, it is not acceptable for you to hold a vision that is wholly a
matter of your personal opinion about what it means to be academically
successful. Rather, your vision must have the strongest possible basis in the
collective academic opinions of experts in the discipline within which you
assess and of colleagues and associates in your school, district, and
community.
Systematic assessment of student
performance of the wrong target is as much a waste of time as a haphazard
assessment of the proper target. The only way to prevent this is for you to be
in communication with those who know the right target and the most current best
thinking about that target and for you to become a serious student of their
standards of academic excellence. Strive to know the skills and products that
constitute maximum proficiency in the disciplines you assess.
Time for Reflection
What specific sources can
teachers tap to be sure they understand skill and performance outcomes?
Matching
Method to Target
As the meaning of academic excellence becomes
clear, it will also become clear whether or when performance assessment is, in
fact, the proper tool to use. While the range of possible applications of this
methodology is broad, it is not infinitely so. Performance assessment can
provide dependable information about student achievement of some, but not all,
kinds of valued outcomes. Let's examine the matches and mismatches with the
five kinds of outcomes we have been discussing: knowledge, reasoning, skills,
products, and affect.
Assessing
Knowledge.
If the objective is to determine if students have mastered a body of knowledge
through memorization, observing performance or products may not be the best way
to assess. Three difficulties can arise in this context, one related to
potential sampling errors, another to issues of assessment efficiency, and a third
related to the classroom assessment and instructional decision-making context.
Consider, for example, asking
students to participate in a group discussion conducted in Spanish as a means
of assessing mastery of vocabulary and rules of grammar. While this is an
apparently authentic assessment, it might lead you to incorrect conclusions.
First, the students will naturally choose to use vocabulary and syntax with
which they are most comfortable and confident. Thus they will naturally select
biased samples of all possible vocabulary and usage.
Second, if this is an assessment
of the level of knowledge mastery of a large number of students, the total
assessment will take a great deal of time. This may cause you to collect too
small a sample of the performance of each individual, leading to undependable
results. Given this achievement target, it would be much more efficient from an
achievement sampling point of view to administer a simple objectively scored
vocabulary and grammar test. Then, once you are confident that the foundational
knowledge has been mastered and the focus of instruction turns to real-world
applications, you might turn to the group discussion performance assessment.
Consider this same issue from a
slightly different perspective. If you use the performance assessment as a
reflection of the knowledge mastery target, it will be difficult to decide how
to help the student who fails to perform well. It will not be clear what went
wrong. Is the problem a lack of knowledge of the vocabulary and grammar, and/or
an inability to pronounce words, and/or anxiety about the public nature of the
demonstration? Since all three are hopelessly confounded with one another, it
becomes difficult to decide on a proper course of action. Thus once again,
given this target and this context, performance assessment may not be the best
choice.
When the knowledge to be
memorized is to be sampled in discrete elements, a selected response format is
best. When larger structures of knowledge are the target, the essay format is
preferable. Both of these options offer more control over the material
assessed.
However, if your assessment goal
is to determine if students have gained control over a body of knowledge
through the proper and efficient use of reference materials, then performance
assessment might work well. For instance, you might give students the exercise
of finding a number of facts about a given topic and observe the manner in
which they attack the problem, applying performance criteria related to the
process of using particular library reference services and documents. A
checklist of proper steps might serve as the basis for scoring and recording
results of the assessment.
Or, you might ask for a written
summary of those facts, which might be evaluated on rating scales in terms of
the speed with which it was generated, the efficiency of the search, and the
accuracy and thoroughness of the summary. Observation and judgment might play
a role here.
Assessing
Reasoning. Performance assessment also can
provide an excellent means of assessing student reasoning and problem-solving
proficiencies. Given complex problems to solve, students must engage in
thinking and reasoning processes that include several steps. While we cannot
directly view the thought processes, we can use various kinds of proxy measures
as the basis for drawing inferences about the reasoning carried out.
For example, we might give
chemistry students unidentified substances to identify and watch how they go
about setting up the apparatus and carrying out the study. The criteria might
reflect the proper order of activities. Those who reason well will follow the
proper sequence and succeed. Those whose reasoning is flawed will go awry. Some
might classify this as a selected response test: students identify the
substance correctly or they do not; right or wrong. While that is true in one
sense, think about how much richer and more useful the results are when the
assessment is conceived and carried out as a performance assessment—especially
when the student fails to identify the substance accurately. A comparison of
the reasoning actually carried out with the reasoning spelled out in the
performance criteria will be very revealing, and instructionally relevant.
Performance assessments
structured around products created by students also can provide insight into
the reasoning process. The resulting product itself is a reflection of sound or
unsound reasoning during its development. One simple example might be the
production of a written research report by students who carried out the above
experiment. That report would reflect and provide evidence of their
problem-solving capabilities.
Another example of a
product-based performance assessment would be the physics challenge of building
a tower out of toothpicks that will hold a heavy load. One performance
criterion certainly will be the amount of weight it can hold. But others might focus
on whether the builder adhered to appropriate engineering principles. The
product-based performance assessment can help reveal the ability to apply
those principles.
In fact, the thoughtful
development and use of this performance assessment can help students achieve
such a problem-solving goal. For example, what if you gave students two towers
built purposely to hold vastly different amounts of weight? They might be told
to analyze each in advance of the load-bearing experiment to predict which would
hold more. Further, you might ask them to defend their prediction with specific
design differences. After the experiment reveals the truth, the students are
more likely to be able to infer how to build strong towers. In essence, the
problem-solving criteria will have been made clear to them.
Assessing
Skills. The great strength of performance assessment
methodology lies in its ability to ask students to perform in certain ways and
to provide a dependable means of evaluating that performance. Most communication
skills fall in this category, as do all forms of performing, visual, and
industrial arts. The observation of students in action can be a rich and useful
source of information about their attainment of very important forms of skill
achievement. We will review many examples of these as our journey continues.
Assessing
Products. Herein lies the other great strength
of performance assessment. There are occasions when we ask students to create
complex achievement-related products. The quality of those products indicates
the creator's level of achievement. If we develop sound performance criteria
that reflect the key attributes of these products and learn to apply those
criteria well, performance assessment can serve us as both an efficient and
effective tool. Everything from written products, such as term papers and
research reports, to the many forms of art and craft products can be evaluated
in this way. Again, many examples will follow.
Assessing
Affect. To the extent that we can draw inferences about
attitudes, values, interests, motivational dispositions, and/or academic
self-concept based either on the actions of students or on what we see in the
products they create, then performance assessment can assist us here, too.
However, I urge caution. Remember, sound
performance assessment requires strict adherence to a preestablished set of
rules of evidence. Sound assessments must do the following:
•
Reflect a clear target—We must thoroughly understand and develop sound definitions of the affective targets to be assessed.
•
Serve a clearly articulated purpose—We must know precisely why we are assessing and what it is we intend to do with the result—especially tricky in the case of affective outcomes.
•
Rely on a proper method—The performance must present dependable information to us about affect.
•
Sample the target appropriately—We must collect enough evidence of affect to give us confidence in our conclusions.
•
Control for extraneous interference—The potential sources of bias in our judgments about student attitudes, values, interests, and so on must be understood and neutralized in the context of our assessments.
When applying these standards of
quality to the assessment of achievement outcomes—those content-based targets
we are trained to teach—it becomes somewhat easier to see the translations.
That is, hopefully, we have immersed ourselves far enough in a particular field
of study to attain a complete understanding of its inherent breadth and scope.
We should know when a sample of test items, a set of essay exercises, a
particular performance assessment, or a product evaluation captures the meaning
of academic success.
When it comes to affective
outcomes, however, most of us have had much less experience with and therefore
are much less comfortable with their meaning, depth, and scope. That means
successfully assessing them demands careful and thoughtful preparation.
We can watch students in action
and/or examine the things they create and infer about their affective states.
But we can do this only if we have a clear and practiced sense of what it is
we are looking for and why we are assessing it. I will address these issues in
depth in Chapter 12.
Summary
of Target Matches. There are many important educational
outcomes that can be translated into performance assessments. That is, if we
prepare carefully, we can develop performance criteria and devise exercises to
sample the
following:
•
use of reference material to acquire
knowledge
•
application of that knowledge in a
variety of problem-solving contexts
•
proficiency in a range of skill arenas
•
ability to create different kinds of
products
•
feelings, attitudes, values, and other
affective characteristics
In fact, the only target for which performance
assessment is not recommended is the assessment of simple elements or complex
components of subject matter knowledge to be mastered through memorization.
Selected response and essay formats work better here.
DEVELOPING
PERFORMANCE ASSESSMENTS
As with selected response and essay assessments,
we develop performance assessments in three steps. Each step corresponds to one
of the three basic design components introduced earlier. Developers must
specify the performance to be evaluated, devise exercises to elicit the
desired behavior, and develop a method for making and recording judgments.
Unlike other forms of assessment,
however, this form permits flexibility in the order in which these parts are
developed. But before we consider those issues, we will review context factors
we should consider in deciding whether or when to adopt performance assessment
methods.
Context
Factors
Clearly, the prime factors to consider in the
assessment selection process are the appropriateness of the achievement target
for your students and the match of performance assessment methodology to that
target. You must also ask yourself certain practical questions when deciding if
performance assessment is the right choice for your particular context. These
questions are posed in Figure 8-3.
Approximating
the Best. Remember, while performance
assessment can be used to assess reasoning, we also can use selected response
and essay formats to tap this kind of outcome. In addition, while performance
assessment is the best option for measuring attainment of skill and product
outcomes, again, we can use selected response and essay to assess student
mastery of important prerequisites of effective performance. In this sense,
they represent approximations of the best.
Further, as you will see in
Chapter 9, sometimes we can gain insight into achievement by having students
talk through hypothetical performance situations. Admittedly, these are second
best when compared to the real thing. But they can provide useful information.
These "proxy" measures
might come into play when we seek to size up a group of students very quickly
for instructional planning purposes. In such a case, we might need only
group performance information, so we could sample a few students from the group
and assess just a few aspects of the performance of each. By combining
information across students, we generate a profile of achievement that
indicates group achievement strengths and needs. We can use such group
information to plan instruction. Under these circumstances, it may be
unnecessary to use costly, full-blown performance assessments of every student.
Rather, we might turn to simpler, more efficient paper and pencil or personal
communication–based approximations of the best assessment to get what we need.
In effect, you can use proxies
to delve into the prerequisites of skill and product performance to determine
individual student needs. Remember that the building blocks of competence
include knowledge to be mastered and reasoning power—both of which can be
assessed with methods that fall short of actual performance assessment.
•
Do you have the expertise required to develop clear and appropriate criteria?
Don't take this too lightly. If you have not developed a deep sense of
important outcomes in the field and therefore don't have a highly differentiated
vision of the target(s), performance assessment can present a very frustrating
challenge. Understand the implications of teaching students to hit the wrong
target! Solicit some help—an outside opinion—just to verify the
appropriateness of your assessment. For instance, find a colleague or maybe a
small team of partners to work with.
•
Are your students able to perform in the ways required?
Be sure there are no physical and/or emotional handicaps that preclude being
able to do the work required. Primary among the possible performance
inhibitors may be evaluation anxiety in those assessment contexts requiring
public displays of proficiency.
•
What is the purpose for the assessment? If high-stakes
decisions hang on the assessment results, such as promotion, graduation, a
critical certification of mastery, or the like, you need to be prepared to
invest the time and energy sufficient to produce results in which you can be
confident. Such critical assessments require a higher degree of confidence than
do periodic examinations that measure current student proficiency levels in an
ongoing classroom situation.
•
How many students will you assess? The more students
you must assess, the more carefully you must think through where or how you
are going to find the time required to assess them all. There are many creative
ways to economize, such as sharing the work with other trained and qualified
judges, like your students, among others.
•
What is the scope of the achievement target to be assessed?
Scope influences two things: the number of exercises needed and the amount of
time over which you sample performance. If the scope is narrow and the time
frame short (e.g., the focus of one day's lesson), few exercises will be needed
to sample it well. Broader targets (e.g., a semester's worth of material), on
the other hand, require more exercises and demand that you spread your sample
of exercises out over an extended period.
•
Is
the target simple or complex? Complex targets
require more exercises to cover the full range of applications. For example, we
cannot label a student a competent or incompetent writer on the basis of one
exercise, no matter what that exercise requires. Writing is complex, taking
many forms and occurring in many different kinds of contexts. If the target is
complex, exercises must sample enough relevant forms and contexts to lead to
confident inferences about competence.
•
Are the materials required to perform successfully available in school and/or at
home? Anticipate what specific materials students will
need to perform the task at hand. School resources vary greatly, as do
resources available to students at home. Be sure all have an equal opportunity
to succeed before you proceed.
•
What resources do you have at your disposal to conduct the observation and
scoring required by your assessment? Obviously, observing
and evaluating students or their products is a labor-intensive activity that
demands much time and effort. If there has been one deterrent to the broader
use of this methodology, it is the time required to do it well. Teachers often
get trapped into thinking that all that work must automatically fall on their
shoulders. This is not so. Other resources can include the principal (!),
teacher aides, parents, outside experts, colleagues, and last but by no means
least, students themselves, evaluating their own or each other's performance.
Think about the instructional implications of involving them in the process.
But remember, you must train them to apply the criteria dependably.
•
Has the author of your textbook or workbook, a colleague, or someone else already
developed sound performance criteria and associated exercises for you to adopt and
use? Verify their quality and train yourself
to apply the criteria dependably, and these ready-made assessments can save a
great deal of development time. Also, consider revising them to more closely
fit your needs.
Not only can proxy measures serve as a
means of such formative assessment, but to the extent that you involve your
students in the assessment process, they also can introduce students to the
various prerequisites before they need to put them together.
Further, any time resources are
too limited to permit a full-blown performance assessment, we might be forced
to think of alternatives; to come as close as we can to the real thing, given
our resources. While the resulting paper and pencil or personal communication
assessments will fall short of perfection, if they are thoughtfully developed,
they may give us enough information to serve our needs.
If you do decide to use
approximations, however, never lose sight of their limitations: understand the
outcomes they do and do not reflect.
Time for Reflection
Can you remember a paper and
pencil test you have taken that was a proxy measure for an achievement target
that would have been more completely or precisely assessed with a performance
assessment? How close did it come to the real thing, in your opinion?
The
Order of Development
In explaining the basic three-part design
framework, I began by specifying performance, then turned to exercises, then
scoring practices. However, this is not the only possible or viable order of
development for performance assessments.
For instance, we might begin
developing a performance assessment by creating rich and challenging exercises.
If we can present complex but authentic, real-world problems for students to
solve, then we can see how they respond during pilot test administrations and
devise clear criteria for evaluating the achievement of subsequent
performers.
On the other hand, it is
difficult to devise exercises to elicit outcomes unless and until we have
specified precisely what those outcomes are. In this case, we could start by
selecting the target, translate it into performance criteria, and then develop
performance rating procedures. Then, given a clear vision of the desired performance,
we can devise exercises calculated to elicit samples of performance to which we
can then apply the criteria.
In a sense, we have a
chicken-or-egg dilemma here. We can't plan to evaluate performance until we
know what that performance is—but neither can we solicit performance properly
until we know how we're going to evaluate it! Which comes first, the
performance criteria or the exercises?
As luck and good planning would
have it, you can take your choice. Which you choose is a function of your level
of understanding of the valued outcomes to be assessed.
When
You Know What to Look For. Those who begin the
performance assessment development process with a strong background in the area
to be evaluated probably possess a highly refined vision of the target and can
develop performance criteria out of that vision. If you begin with that kind
of firm grounding, you may be able simply to sit at your desk and spell out
each of the key elements of sound performance. With little effort, you may be
able to translate each key element into different levels of achievement
proficiency in clear, understandable terms. If you have sufficient pedagogical
knowledge in your area(s) of expertise, you can use the procedures discussed in
this chapter to carry out the necessary professional reflection, spell out
your performance criteria, and transform those into scoring and recording
procedures. Then you will be ready to devise exercises to elicit the performance
you want.
When
You Have a Sense of What to Look For. However, not
everyone is ready to jump right in in this manner. Sometimes we have a general
sense of the nature of the performance but are less clear on the specific
criteria. For example, you might want your students to "write a term
paper," but not have a clear sense about the standards of quality you want
to apply.
When this happens, you need a
different starting place. One option is to give students a general term paper
assignment and use the resulting papers—that is, actual samples of student
work—as the basis for defining specific criteria. You can select a few
high-quality and a few low-quality papers to compare as a means of generating
clear and appropriate performance criteria. One way to do this is to sort them
into three or four piles of general quality, ranging from poor to excellent,
then carefully and thoughtfully analyze why the papers differ. Why do some
work, while others don't? In those differences are hidden the performance
criteria you seek.
The major shortcoming of starting
with general exercises, of course, is that it puts students in the unenviable
position of trying to perform well—write a good term paper, for example—without
a clear sense of what good performance is supposed to look like. But remember,
you need do this only once. From then on, you will always have well-developed
criteria in hand to share with students in advance.
You can avoid this problem if you
can recover copies of previous term papers. The key is to find contrasting
cases, so you can compare them. They needn't come from your current students.
Or, if you are assessing a demonstrated skill, perhaps you can find videotapes
of past performance, or can locate students practicing and observe them. One
excellent way to find the right performance criteria, your vision of the
meaning of academic success, is by "student watching." You can derive
the critical elements of student success from actual samples of student work. But
to take advantage of this option, you first need to get them performing
somehow. That may mean starting with the development of exercises.
When
You’re Uncertain About What to Look For. Other times,
you may have only the vaguest sense of what you want students to know and be
able to do within a particular discipline. In these instances, you can use
performance assessment exercises to help identify the truly important learning.
Here's how this works:
Begin by asking yourself, what
kinds of real-world challenges do I want students to be able to handle? What
are some sample problems I hope students would be able to solve? Using creative
brainstorming, you and your colleagues can create and collect numerous written
illustrative sample exercises. When you have assembled enough such exercises
to begin to zero in on what they are sampling, then step back from the array of
possibilities and ask, what are the important skills that seem to cross all or
many of these problems? Or, if products are to result, What do all or many of
the products seem to have in common? In short, ask, what are these exercises
really getting at? In other words, we draw inferences about the underlying
meaning of success by examining various examples of how that success is likely
to manifest itself in real-world problems. Out of these generalizations, we can
draw relevant and explicit performance criteria.
One thing I like about this
strategy is the fact that the resulting performance criteria are likely to be
usable for a number of similar exercises. Good criteria generalize across
tasks. They are likely to represent generally important dimensions of sound
performance, not just those dimensions that relate to one narrowly defined task.
They capture and convey a significant, generalizable portion of the meaning of
academic success.
Having acknowledged these various
options in the order of assessment development, I will now outline a simple
performance assessment development sequence starting with the criteria, adding
in the exercises, and concluding with the development of scoring and recording
procedures. You can mix and match these parts and use the development of one
part to help you solve problems in the development of another part. This is the
art of performance assessment development.
Phase
1: Defining Performance
Our goal in defining the term
performance as used in the context of performance assessment is to describe the
important skills to be demonstrated and/or the important attributes of the
product to be created. While performance assessments also include evaluations
of or judgments about the level of proficiency demonstrated, our basic
challenge is to describe the underlying basis of our evaluations.
More specifically, in designing
performance assessments, we work to find a vocabulary to use in communicating
with each other and with our students about the meaning of successful
performance. The key assessment question comes down to this: Do you, the
teacher, know what you are looking for in performance? But the more important
instructional question is this: Do you know the difference between successful
and unsuccessful performance and can you convey that difference in meaningful
terms to your students? Remember, students can hit any target they can see and
that holds still for them. In performance assessment contexts, the target is
defined in terms of the performance criteria.
Shaping
Your Vision of Success. As I have said repeatedly, the
most effective way to be able to answer these two questions in the affirmative
is to be a master of the skills and products that reflect the valued academic
outcomes in your classroom. Those who teach drama, music, physical education,
second languages, computer operations, or other skill-based disciplines, are
prepared to assess well only when they possess a refined vision of the critical
skills involved. Those who instruct students to create visual art, crafts or films,
and various written products face both the teaching and assessment challenges
with greatest competence and confidence when they are masters at describing the
high-quality product to the neophyte.
Connoisseurs can recognize
outstanding performance when they see it. They know a good restaurant when they
find it. They can select a fine wine. They know which movies deserve thumbs up,
which Broadway plays are worth their ticket price. And connoisseurs can
describe why they have evaluated any of these as outstanding. It is their
stock in trade. However, because the evaluation criteria may vary somewhat from
reviewer to reviewer, their judgments may not always agree. In restaurants,
wines, movies, and plays, the standards of quality may be a matter of opinion.
But, that's what makes interesting reading in newspapers and magazines.
Teachers are very much like these
connoisseurs, in that they must be able to recognize and describe outstanding
performance. But there are important differences between connoisseurs and
teachers.
Not only can well-prepared
teachers visualize and explain the meaning of success, but they can impart
that meaning to others so as to help them become outstanding performers. In
short, they are teachers, not just critics.
In most disciplines, there are
agreed-upon skills and products that proficient performers must master. The
standards of excellence that characterize our definitions of high-quality
performance are always those held by experts in the field of study in question.
Outstanding teachers have immersed themselves in understanding those
discipline-based meanings of proficiency and they understand them thoroughly.
Even when there are differences of opinion about the meaning of outstanding performance
in a particular discipline, well-prepared teachers understand those differences
and are capable of revealing them to their students.
It is this depth of understanding
that must be captured in our performance expectations so it can be conveyed to
students through instruction, example, and practice. Because they must be
shared with students, our performance criteria cannot exist only in the
intellect of the assessor. They must be translated into words and examples for
all to see. And they must be capable of forming the basis of our judgments
when we record the results of our assessments.
Finding Help in Shaping Your
Vision. In this regard, since we now have nearly a decade of significant new
discipline-based performance assessment research and development behind us,
many fields of study already have developed outstanding examples of sound
criteria for critical performance. Examples include writing proficiency,
foreign language, mathematics, and physical education. The most accessible
source of information about these developments is the national association of
teachers in each discipline. Nearly every such association has advanced written
standards of student achievement in their field of study within the past five
years. Any association that has not completed that work by now is conducting
such studies at this time and will have them completed soon. I will provide
examples of these in Part 3.
Not only will these associations
probably have completed at least some of this work themselves, but they likely
know others who have engaged in developing performance standards in their
field. These may include university researchers and/or state departments of
education. Check with your reference librarian for a directory of associations
to learn how to contact those of interest to you.
Many contend that most of the
important advances in the development of new assessment methods, including
performance assessments, conducted over the past decade have been made by
assessment departments of state departments of education. For this reason, it
may be useful to contact your state personnel to see if they have either
completed development of performance criteria in your discipline or know of
other states that have. Again, I will share examples of these later.
And finally, consider the
possibility that your local district or school curriculum development process
may have resulted in the creation of some new performance assessment. Or
perhaps a colleague, completely unbeknownst to you, developed an evaluation of
a kind of performance that is of interest to you, too. You will never know
unless you ask. At the very least, you may find a partner or even a small team
to work with you in your performance assessment development.
Six
Steps in Developing Your Own Criteria. If you must
develop performance criteria yourself, you must carry out a thoughtful task or
product analysis. That means you must look inside the skills or products of
interest and find the active ingredients. In most cases, this is not
complicated.
A professor associate of mine
decided to develop a performance assessment of her own teaching proficiency.
The assessment would focus on the critical skills in the presentation of a
class on assessment. Further, she decided to engage her students in the
process of devising those criteria--to assure that they understand what it
means to teach effectively. Let me describe how that went.
Please note that the examples
presented in this description are real. They came directly from the work of the
actual class depicted in this story. They are not intended to represent
exemplary work or the best possible representation of the attributes discussed
and should not be regarded as such. They are merely presented as illustrations
from real classroom activities.
Step
1: Reflective Brainstorming. The process of
developing criteria reflecting teaching proficiency began with a brainstorming
session. The professor talked with her students a few moments about why it is
important to understand how to provide sound instruction in assessment
methods, and then asked, what do you think might be some of the factors that
contribute to effective instruction? As ideas came forth, she listed the
suggestions on the board, trying to capture the essence of each
with the briefest possible label.
That list looked something like
this:
know the subject
use humor
organized
enthusiasm
fresh ideas
relevant content
clear objectives
be interactive
use visuals well
be interesting
appropriate pacing
believe in material covered
professional
credible information
poised
flexible
on schedule
good support materials
appropriate text
monitor student needs
voice loud, clear, varied
comfortable environment
refreshments!
material connected
challenging
personalize communication
effective time management
in control
From time to time, the teacher
would dip into her own reservoir of ideas about effective teaching and offer a
suggestion, just to prime the pump a bit.
As the list grew, the flow of ideas began to dry
up--the brainstorming process slowed. When it did, she asked another question:
What specific behaviors could teachers engage in that would help them be
effective—what could they do to maximize the chances of making instruction
work? The list grew until everyone agreed that it captured most of what really
is important. The entire process didn't take more than ten minutes.
Time for Reflection
Think about your experience as a
student and/or teacher. What other keys to effective teaching can you think of?
Step
2: Condensing. Next, she told them that they needed
to be able to take advantage of all of these excellent suggestions to evaluate
her class and her effectiveness. However, given the immense list they had just
brainstormed, they just wouldn't have time to evaluate on the basis of all
those criteria. They would have to boil them down to the truly critical
factors. She asked how they might do that.
Some students thought they should
review the list and pick out the most critical entries, to concentrate on those
first. Others suggested that they try to find a smaller number of major
categories within which to group elements on the long list. To reach these
goals, the professor asked two questions: Which of the things listed here on
the board are most crucial? Or, what super categories might we place the individual
entries in, to get a shorter list?
At this point in the development
process, it became important to keep the list of super categories as short as
possible. She asked the class if they could narrow it down to four or
five—again capturing the essence of the category with the briefest possible
label. (These super category headings need to represent truly important aspects
of sound performance, because they form the basis for the performance criteria,
as you will see.)
Here are the five super
categories the students came up with after about five minutes of reflection and
discussion:
Content
Organization
Delivery
Personal characteristics
Classroom environment
Time for Reflection
Based on the list presented
above, supplemented with your additions, what other super categories would you
suggest?
Remember, the goal throughout
this entire activity is to build a vocabulary both students and teacher can use
to converse with each other about performance. This is why it is important to
engage students in the process of devising criteria, even if you know going in
what criteria you want used. When you share the stage with your students, they
get to play a role in defining success and in choosing a language to describe it
that they understand, thus connecting them to their target. (Please reread that
sentence. It is one of
the most important in the entire book. It argues for empowering students, the
central theme of this work.)
Step
3: Defining. Next, class members collaborated in
generating definitions of each of the five chosen super categories, or major
dimensions of effective teaching. The professor assigned responsibility for
writing a concise definition of key dimensions to groups of students, one
dimension per group. She advised them to consider the elements in the original
brainstormed list by reviewing it and finding those smaller elements subsumed
within each super category. This would help them find the critical words they
needed to describe their dimension. When each group completed its draft, a
spokesperson read their definition to the class and all were invited to offer
suggestions for revising them as needed.
Here are some of the definitions
they composed:
Content: appropriateness of presentation of research, theory, and practical
applications related to the topic of assessment; appropriateness of course
objectives.
Organization: appropriateness of the order in which material is presented in
terms of aiding learning.
Delivery: deals with the presentation and interaction patterns in terms of
conveying material and helping students learn it.
Personal characteristics: appropriateness of the personal manner of the
instructor in relating to the material, the students, and the interaction
between the two.
Classroom environment: addresses all physical aspects of the learning
atmosphere and setting that are supportive of both students and teacher.
The group work, sharing, and revision took about
twenty minutes.
Step
4: Contrasting. With labels and definitions for key
performance dimensions in hand, they turned to the next challenge: finding
words and examples to describe the range of possible performance within each
dimension. They had to find ways to communicate with each other about what
teaching looks like when it is very ineffective and how that changes as it
moves toward outstanding performance. By establishing a sense of the underlying
continuum of performance for each dimension of effective teaching (that is, to
share a common meaning of proficiency ranging from a complete lack of it to
totally proficient), they can observe any teaching and communicate about where
that particular example should be rated on each key dimension.
In preparation for this activity,
the professor dug up brief, ten-minute videos of two teachers in action, one
faltering badly and the other hitting on all cylinders. She showed these videos
to her students and asked the question, What makes one class work well while
the other fails? What do you see that makes them different in terms of the
five key dimensions defined earlier? They rewound and reviewed the examples
several times while defining those differences for each dimension. This
activity always helps participants zero in on how to describe performance, good
and bad, in clear, understandable language. (Regardless of the performance for
which criteria are being developed, my personal experience has been that the
most effective method of articulating the meaning of sound and unsound
performance is that of very carefully studying vastly contrasting cases. These
developers used this method to great advantage to define the basis for their
performance criteria.)
Step
5. Describing Success. As the students began to become
clear on the language and examples needed to describe performance, they
searched for ways to capture and quantify their judgments, such as by mapping
their continuum descriptions onto rating scales or checklists. (We'll learn
more about this in the section below on scoring and recording.) The class
decided to develop three-point rating scales to reflect their thinking. Figure
8-4 presents some of these scales. This phase of the work took about an hour.
Time for Reflection
See if you can devise a
three-point rating scale for one or two of the other criteria defined above.
Step
6. Revising and Refining. The professor was
careful to point out that, when they arrived at a definition of academic success—whether
as a set of performance criteria, rating scales, or whatever form it happened
to take—the work was not yet done. They needed to practice applying their new
standards to some teaching samples to see if they really fit—to see if they
might need to more precisely define key aspects of performance. We can learn a
general lesson from this: performance criteria should never be regarded as
"finished." Rather, with time and experience in applying our
standards to actual samples of student work, our vision of the meaning of
success will grow and change. We will sharpen our focus. As this happens, we
are obliged to adjust our performance expectations to reflect our most current
sense of the keys to academic success.
Note the
Benefits. I hope you realized that the entire performance
criteria development sequence we just reviewed represents far more than just a
preparation to assess dependably. This sequence almost always involves
participants in serious, highly motivated questioning, probing, and clarifying.
In fact, assessment and instruction are indistinguishable when teachers
involve their students in the process of identifying performance criteria.
Another Useful
and Important Application. However, for various reasons,
you may not wish to involve your students. Perhaps the students are too young
to comprehend the criteria or the process. Or perhaps the target requires the
development and application of highly technical or complex criteria that would
be out of reach of the students. I have seen student involvement work
productively as early as the third grade for some simple targets. But it may
not always be appropriate.
Content
3: outcomes clearly articulated; challenging and provocative content; highly
relevant content on assessment for teachers
2: some stated outcomes; content somewhat interesting and engaging; of some
relevance to the classroom
1: intended outcome not stated; content boring; irrelevant to teachers and the
classroom

Delivery
3: flow and pace move well; humor used; checks for clarity regularly; feedback
used to adjust; extensive interaction with students
2: pacing acceptable some of the time; material and/or delivery somewhat
disjointed; some checking for clarity; some student participation
1: pacing too slow or fast; delivery disconnected; much dead time; no
interaction--one-person show; no checking for clarity

Figure 8-4
Sample Rating Scales for Evaluating Instruction
When this happens, at least
consider another option for carrying out this same set of activities: Rather
than engaging your students as your partners, devise criteria with a group of
colleagues. If you do, you may argue about what is really important in performance.
You might disagree about the proper language to use to describe performance.
And you may fight with each other about key differences between sound and
unsound performance. But I promise you these will be some of the most engaging
and productive faculty meetings of your life. And out of that process might
come long-term partners in the performance assessment process.
Even if everyone doesn't agree in
the end, each of you will have reflected deeply on, and be able to defend, the
meaning of student success in your classroom. We all need that kind of
reflection regularly.
Summary
of the Six Steps. However, if it comes down to you
devising your own
performance criteria, you can rely on variations
of these steps, listed again in Figure 8-5. And remember, when students are
partners in carrying out these six steps, you and your students join together
in a learning community.
These activities can provide clear windows into the meaning of academic
success—they can give us the words and examples we need to communicate about
that meaning. I urge you to share those words with all who have a vested
interest in student success, most notably, with your students themselves.
This, then, is the art of developing performance criteria.

Step 1. Begin by reflecting on the meaning of excellence in the performance
arena that is of interest to you. Be sure to tap your own professional
literature, texts, and curriculum materials for insights, too. And don't
overlook the wisdom of your colleagues and associates as a resource. Talk with
them! Include students as partners in this step, too. Brainstorm your own list
of key elements. You don't have to list them all in one sitting. Take some time
to let the list grow.

Step 2. Categorize the many elements so that they reflect your highest
priorities. Keep the list as short as possible while still capturing the
essence of performance.

Step 3. Define each key dimension in clear, simple language.

Step 4. Find some actual performance to watch or examples of products to study.
If this step can include the thoughtful analysis of a number of contrasting
cases – an outstanding term paper and a very weak one, a flowing and accurate
jump shot in basketball and a poor one, a student who functions effectively in
a group and one who is repeatedly rejected, and so on – so much the better.

Step 5. Use your clearest language and your very best examples to spell out in
word and picture each point along the various continuums of performance you use
to define the important dimensions of the achievement to be assessed.

Step 6. Try your performance criteria to see if they really do capture the
essence of performance; fine-tune them to state as precisely as possible what
it means to succeed. Let this fine-tuning go on as needed for as long as you
teach.

Figure 8-5
Six Steps in Developing Performance Criteria
Attributes
of Sound Criteria. Quellmalz (1991), writing in a
special issue of a professional journal devoted to performance assessment, provides
us with a simple list of standards against which to compare our performance
criteria in order to judge their quality. She points out that effective
performance criteria do the following:
1.
Reflect all of the important
components of performance--the milestones in target attainment.
2.
Apply appropriately in contexts and
under conditions in which performance naturally occurs.
3.
Represent dimensions of performance
that trained evaluators can apply consistently to a set of similar tasks
(i.e., not be exercise specific).
4.
Are developmentally appropriate for
the examinee population.
5.
Are understandable and usable by all
participants in the performance assessment process, including teachers,
students, parents, and the community.
6.
Link assessment results directly into
the instructional decision making process.
7.
Provide a clear and understandable
means of documenting and communicating about student growth over time.
I would expand this list to include one additional
standard: The development of performance criteria should be seen as an
opportunity to teach. Students should play a role in the development of
performance criteria whenever possible.
Figure 8-6 details rating scales
that depict two key dimensions of good writing, organization and voice. Note
the simple, yet clear and specific nature of the communication about important
dimensions of good writing. With these kinds of criteria in hand, we
definitely can help students become better performers.
Phase
2: Designing Performance Exercises
Performance assessment exercises, like selected
response test items and essay exercises, frame the challenge for the respondent
and set the conditions within which that challenge is to be met. Thus, they are
a clear and explicit reflection of the desired outcomes. Like essay exercises,
sound performance assessment exercises outline a complete problem for the
respondent: achievement to be demonstrated, conditions of the demonstration,
and standards of quality to be applied.
As specified earlier in this
chapter, we face three basic design considerations when dealing with exercises
in the context of performance assessment. We must determine the following:
1.
The nature of the exercise(s), whether structured exercises or naturally
occurring events

Organization
5 The organization enhances and showcases the central idea or theme. The
order, structure, or presentation is compelling and moves the reader through
the text.
•
Details seem to fit where they're
placed; sequencing is logical and effective.
•
An inviting introduction draws the
reader in and a satisfying conclusion leaves the reader with a sense of
resolution.
•
Pacing is very well controlled; the
writer delivers needed information at just the right moment, then moves on.
•
Transitions are smooth and weave the
separate threads of meaning into one cohesive whole.
•
Organization flows so smoothly the
reader hardly thinks about it.
3 The organizational structure is strong enough to move the reader from
point to point without undue confusion.
•
The paper has a recognizable
introduction and conclusion. The introduction may not create a strong sense of
anticipation; the conclusion may not leave the reader with a satisfying sense
of resolution.
•
Sequencing is usually logical; it may
sometimes be too obvious, or otherwise ineffective.
•
Pacing is fairly well controlled,
though the writer sometimes spurts ahead too quickly or spends too much time on
the obvious.
•
Transitions often work well; at times
though, connections between ideas are fuzzy or call for inferences.
•
Despite a few problems, the
organization does not seriously get in the way of the main point or storyline.
1 The writing lacks a clear sense of direction. Ideas, details or events
seem strung together in a random, haphazard fashion--or else there is no
identifiable internal structure at all. More than one of the following problems
is likely to be evident:
•
The writer has not yet drafted a
real lead or conclusion.
•
Transitions are not yet clearly
defined; connections between ideas seem confusing or incomplete.
•
Sequencing, if it exists, needs work.
•
Pacing feels awkward, with lots of
time spent on minor details or big, hard-to-follow leaps from point to point.
•
Lack of organization makes it hard for
the reader to get a grip on the main point or storyline.
Figure 8-6
Sample Rating Scales for Writing (Reprinted from "Linking Writing Assessment
and Instruction," in Creating Writers (pp. 104-106) by V. Spandel and R. J.
Stiggins, 1990, White Plains, NY: Longman. Copyright 1990 by Longman.
Reprinted by permission of Longman.)
2.
The specific content of structured
exercises, defining the tasks to be carried out by performers
3.
The number of exercises needed to
provide a sufficient sample of performance
We will now delve into each in some detail.
Voice
5 The writer speaks directly to
the reader in a way that is individualistic, expressive, and engaging. Clearly,
the writer is involved in the text and is writing to be read.
•
The paper is honest and written from
the heart. It has the ring of conviction.
•
The language is natural yet
provocative; it brings the topic to life.
•
The reader feels a strong sense of
interaction with the writer and senses the person behind the words.
•
The projected tone and voice give
flavor to the writer's message and seem very appropriate for the purpose and
audience.
3 The writer seems sincere, but not genuinely
engaged, committed, or involved. The result is pleasant and sometimes even
personable, but short of compelling.
•
The writing communicates in an
earnest, pleasing manner. Moments here and there amuse, surprise, delight, or
move the reader.
•
Voice may emerge strongly on occasion,
then retreat behind general, vague, tentative, or abstract language.
•
The writing hides as much of the
writer as it reveals.
•
The writer seems aware of an audience,
but often weighs words carefully, stands at a distance, and avoids risk.
1 The writer seems indifferent,
uninvolved, or distanced from the topic and/or the audience. As a result, the
writing is flat, lifeless, or mechanical; depending on the topic, it may be
overly technical or jargonistic. More than one of the following problems is
likely to be evident:
•
The reader has a hard time sensing the
writer behind the words. The writer does not seem to reach out to an audience,
or make use of voice to connect with that audience.
•
The writer speaks in a kind of
monotone that tends to flatten all potential highs and lows of the message.
•
The writing communicates on a functional
level with no apparent attempt to move or involve the reader.
•
The writer is not yet sufficiently
engaged or at home with the topic to take risks or share him/herself.
Figure 8-6 (Continued)
Sample Rating
Scales for Writing
Nature
of Exercises. The decision about whether to rely
on structured exercises, naturally occurring events, or some combination of the
two should be influenced by several factors related to the outcome(s) to be
assessed and the environment within which the assessment is to be conducted.
Focus
of Assessment. Structured exercises and naturally
occurring events can help us get at slightly different targets. When a pending
performance assessment is announced in advance and students are given
instructions as to how to prepare, we intend to maximize their motivation to
perform well. In fact, we often try to encourage best possible performance by
attaching a grade or telling students that observers from outside the classroom
(often parents) will watch them perform. When we take these steps and build the
assessment around structured exercises, we set our conditions up to assess
students' best possible performance, under conditions of maximum motivation to
do well—a very important outcome.
However, sometimes our objective
is not to see the student's "best possible" performance. Rather,
what we wish is "typical" performance, performance under conditions
of the students' regular, everyday motivation. For example, we want students to
adhere to safety rules in the woodworking shop or the science lab all the time
(under conditions of typical motivation), not just when they think we are
evaluating them (maximum motivation to perform well). Observation during
naturally occurring classroom events can allow us to get at the latter.
From an assessment quality
control point of view, we still must be clear about our purpose. And, explicit
performance criteria are every bit as important here. But our assessment goal
is to be watching closely as students behave spontaneously in the performance
setting.
Time for Reflection
Identify a few achievement
targets you think might be most effectively assessed through the unobtrusive
observation of naturally occurring events. In your experience as a teacher or
student, have you ever been assessed in this way? When?
Time
Available to Assess. In addition to motivational
factors, there also are practical considerations to bear in mind in deciding
whether to use structured or naturally occurring events. One is time. If normal
events of the classroom afford you opportunities to gather sound evidence of
proficiency without setting aside special time for the presentation of
structured exercises and associated observations, then take advantage of the
naturally occurring instructional event. The dividend will be time saved from
having to devise exercises and present and explain them.
Natural
Availability of Evidence. Another practical
matter to consider in your choice is the fact that classrooms are places just
packed full of evidence of student proficiency. Think about it--teachers and
students spend more time together than do the typical parent and child or
husband and wife! Students and teachers live in a world of constant interaction
in which both are watching, doing, talking, and learning. A teacher's greatest
assessment tool is the time spent with students.
This permits the accumulation of
bits of evidence--for example, corroboration of past inferences about student
proficiency and/or evidence of slow, gradual growth—over extended periods of
time, and makes for big samples. It offers opportunities to detect patterns,
to double check, and to verify.
Spontaneous
Assessment. Everything I have said about
performance assessment up to this point has depicted it as rational,
structured, and preplanned. But teachers know that assessment is sometimes
spontaneous. The unexpected classroom event or the briefest of unanticipated
student responses can provide the ready observer with a new glimpse into
student competence. Effective teachers see things. They file those things away.
They accumulate evidence of proficiency. They know their students. No other
assessor of student achievement has the opportunity to see students like this
over time.
But beware! These kinds of
spontaneous performance assessments based on on-the-spot, sometimes
unobtrusive observations of naturally occurring events are fraught with as many
dangers of misassessment as any other kind of performance assessment. Even in
these cases, we are never absolved from adhering to the basic principles of
sound assessment: clear target, clear purpose, proper method, sound sample, and
controlled interference.
You must constantly ask yourself:
What did I really see? Am I drawing the right conclusion based on what I saw?
How can I capture the results of this spontaneous assessment for later use (if
necessary or desirable)? Anecdotal notes alone may suffice. The threats to
sound assessment never leave us. So by all means, take advantage of the
insights provided by classroom time spent together with your students. But as a
practical classroom assessment matter, do so cautiously. Create a written
record of your assessment whenever possible.
Time for Reflection
Can you think of creative,
realistic ways to establish dependable records of the results of spontaneous
performance assessments that happen during an instructional day?
Exercise
Content. Like well-developed essay exercises,
sound structured performance assessment exercises explain the challenge to the
respondent and set them up to succeed if they can, by doing the following:
•
identifying the specific kind(s) of
performance to be demonstrated
•
detailing the context and conditions
within which proficiency is to be demonstrated
•
pointing the respondent in the
direction of a good response by identifying the standards to be applied in
evaluating performance
Here is a simple example:
Achievement:
You are to apply your knowledge of energy converted to motion and simple principles
of mechanics by building a mousetrap car.
Conditions:
Using materials provided in class and within the time limits of four class
periods, please design and diagram your plan, construct the car itself, and
prepare to explain why you included the design features you chose.
Standards:
Your performance will be evaluated in terms of the specific standards we set
in class, including the clarity of your diagrammed plan, the performance and
quality of your car, and your presentation explaining its design features. If
you have questions about these instructions or the standards of quality, let me
know.
In this way, sound exercises frame clear and
specific problems to solve.
In a comprehensive discussion of
the active ingredients of sound performance assessment exercises, Baron (1991)
offers us clear and thought-provoking guidance. I quote and paraphrase below at
length from her work because of the richness of her advice. Baron urges that we
ask important questions about the nature of the assessment:
•
"If a group of curriculum experts
in my field and a group of educated citizens in my community were to use my
assessment tasks as an indicator of my educational values, would I be pleased
with their conclusions? And would they?" (p. 307)
•
"When students prepare for my
assessment tasks and I structure my curriculum and pedagogy to enable them to
be successful on these tasks, do I feel assured that they will be making
progress toward becoming genuine or authentic readers, mathematicians,
writers, historians, problem solvers, etc.?" (p. 308)
•
"Do my tasks clearly communicate
my standards and expectations to my students?" (p. 308)
•
Is performance assessment the best
method to use given what I want my students to know and be able to do?
•
“Are some of my tasks rich and
integrative, requiring students to make connections and forge relationships
among various aspects of the curriculum?" (p. 310)
•
"Are my tasks structured to
encourage students to access their prior knowledge and skills when solving
problems?" (p. 310)
•
"Do some tasks require students
to work together in small groups to solve complex problems?" (p. 311)
•
Do some of my tasks require that my
students sustain their efforts over a period of time (perhaps even an entire
term!) to succeed?
•
Do some tasks offer students a degree
of freedom to choose the course of action--to design and carry out the
investigations—they will take to solve the problem?
•
"Do my tasks require
self-assessment and reflection on the part of students?" (p. 312)
•
"Are my tasks likely to have
personal meaning and value to my students?" (p. 313)
•
"Are they sufficiently
challenging for the students?" (p. 313)
•
"Do some of my tasks provide
problems that are situated in real-world contexts and are they appropriate for
the age group solving them?" (p. 314)
These guidelines define the art of developing
sound performance exercises.
Time for Reflection
What might a performance
assessment exercise look like that could test your skill in developing a
high-quality performance assessment? What specific ingredients would you
include in the exercise?
The
Number of Exercises—Sampling Considerations.
How do we know how many exercises are needed within an assessment to give us
confidence that we are drawing dependable conclusions about student
proficiency? This is a particularly troubling issue in the context of
performance assessment, because the amount of time required to administer, observe,
and score any single exercise can be so long. Sometimes, we feel it is
impossible to employ a number of exercises because of time and workload
constraints.
However, this view can lead to
problems. Consider writing assessment, for example. Because writing takes so many
forms and takes place in so many contexts, defining proficiency is very
complex. As a result, proficiency in one writing context may not predict
proficiency in another. We understand that a proper sample of writing
proficiency, one that allows generalizations to the entire performance domain,
must include exercises calling for various kinds of writing, such as narrative,
expository, and persuasive. Still, however, we find large-scale writing
assessments labeling students as writers or nonwriters on the basis of a
single twenty- to sixty-minute writing sample (Bond & Roeber, 1993). Why?
Because that's all the assessment resources will permit!
Sampling always involves
tradeoffs between quality of resulting information and the cost of collecting
it. Few have the resources needed to gather the perfect sample of student
performance. We all compromise. The good news for you as a teacher is that you
must compromise less than the large-scale assessor primarily because you have
more time with your students. This is precisely why I feel that the great
strength and future of performance assessment lies in the classroom, not in
large-scale standardized testing.
In the classroom, it is often helpful to define
sampling as the purposeful collection of a number of bits of information about
student achievement gathered over time. When gathered and summarized carefully,
these bits of insight can form a representative sample of performance that can
lead to confident conclusions about student achievement.
Unfortunately, there are no hard
and fast rules to follow in determining how many exercises are needed to yield
dependable conclusions. That means we must once again speak of the art of
classroom assessment. I will share a sample decision rule with you now that
depicts the artistic judgment in this case, and then I will review specific
factors you can consider when exercising your judgment.
The sampling decision rule is
this: You know you have presented enough exercises and gathered enough
instances of student performance when you can predict with a high degree of
confidence how well the student would do on the next one. Part of performance
assessment sampling is science and part of it is art. Dealing first with the
science, the more systematic part, one challenge is to gather samples of
student performance under all or most of the circumstances in which they will
be expected to perform over the long haul. Let me illustrate from life.
Let's say we want to assess for
the purpose of certifying the competence of commercial airline pilots. One
specific skill we want them to demonstrate, among others, is the ability to
land the plane safely. So we take candidates up on a bright, sunny, calm day
and ask them to land the plane---clearly an authentic performance assessment.
Let's say all pilots do an excellent job of landing. Are you ready to certify
them?
If your answer is yes, I don't
want you screening the pilots hired by the airlines on which I fly. Our
assessment only reflected one narrow set of circumstances within which we
expect our pilots to be competent. What if it's night, not bright, clear daylight?
A strange airport? Windy? Raining? An emergency? These represent realities
within which pilots must operate routinely. So the proper course of action in
sampling performance for certification purposes is to analyze relevant
variables and put them together in various combinations to see how each
candidate performs. At some point, the array of samples of landing proficiency
(gathered under various conditions) combine to lead us to a conclusion that
the skill of landing safely has or has not been mastered.
This example frames your
performance assessment sampling challenge, too. How many "landings"
must you see under what kinds of conditions to feel confident your students can
perform according to your standards? The science of such sampling is to have
thought through the important conditions within which performance is to be
sampled. The art is to use your resources creatively to gather enough different
instances under varying conditions to bring you and the student to a confident
conclusion about proficiency.
In this context, I'm sure you can
understand why you must consider the seriousness of the decision to be made in
planning your sample. Some decisions bear greater weight than others. These
demand assessments that sample both more deeply and more broadly to give you
confidence in the decision that results—such as certifying a student as
competent for purposes of high school graduation, for example. On the other
hand, some decisions leave more room to err. They allow you to reconsider the
decision later, if necessary, at no cost to the student—for example, assessing
a student's ability to craft a complete sentence during a unit of instruction
on sentence construction. When the target is narrow and the time frame brief,
we need to sample fewer instances of performance.
Figure 8-7 identifies four
factors to take into account in making sampling decisions in any particular performance
assessment context. Even within the guidelines these provide, however, the
artistic sampling decision rule is this: You know how confident you are. If you
are quite certain you have enough evidence, draw your conclusion and act upon
it.
But your professional challenge
is to follow the rules of sound assessment and gather enough information to
minimize the chance that you are wrong. The conservative position to take in
this case is to err in the direction of oversampling to raise your level of
confidence.
If you feel uncertain about the conclusion you might draw regarding the
achievement of a particular student, you have no choice but to gather more
information. To do otherwise is to place the well-being of that student in
jeopardy.
•
The reason(s) for the assessment. The more critical the decision, the more sure
you must be and the more information you should gather; a simple daily
instructional decision that can be reversed tomorrow if necessary requires less
confidence and therefore a smaller sample of performance than a high school
graduation decision.
•
The scope of the target. The broader the scope, the more different instances of
performance we must sample.
•
The amount of information provided by the response to one exercise. Exercises
can be written to produce very large samples of work, providing a great deal of
information about proficiency; when we use these, we may need fewer exercises.
•
The resources available for observing and evaluating. Put simply, the bigger
your labor force, the more assessment you can conduct per unit of time. This
may be something you can really take advantage of. Always remain aware of all
the shoulders over which you can spread the performance assessment workload:
the principal, teacher aides, colleagues, parents, outside experts, students.
Figure 8-7
Considerations
in Performance Assessment Sampling
Time for Reflection
Based on your experience as a
student, can you identify a skill achievement target that you think would take
several exercises to sample appropriately, and another that you think could be
sampled with only one, or very few, exercises? What are the most obvious
differences between these two targets?
Phase
3: Scoring and Recording Results
Three design issues demand our attention at this
stage of performance assessment development, if we are to make the entire plan
come together:
1.
the level of detail we need in
assessment results
2.
the manner in which results will be
recorded
3.
who will do the observing and
evaluating
These are straightforward decisions, if we
approach them with a clear target and a clear sense of how the assessment
results are to be used.
Level
of Detail of Results. We have two choices in the
kinds of scores or results we derive from our observations and judgments:
holistic and analytical. Both require explicit performance criteria. That is,
we are never absolved from responsibility for having articulated the meaning of
academic performance in clear and appropriate terms.
However, the two kinds of scoring
procedures use the criteria in different ways. We can either (a) score
analytically and make our judgments by considering each key dimension of
performance or criterion separately, thus analyzing performance in terms of each
of its elements, or (b) make our judgments holistically by considering all of
the criteria simultaneously, making one overall evaluation of performance. The
former provides a high-resolution picture of performance but takes more time
and effort to accomplish. The latter provides a more general sense of performance
but is much quicker.
Your choice of score type will
turn on how you plan to use the results (whether you need precise detail or a
general picture) and the resources you have available to conduct the assessment
(whether you have time to evaluate analytically).
Some assessment contexts demand
analytical evaluation of student performance. No matter how hard you try, you
will not be able to diagnose student needs based on holistic performance information.
You will never be able to help students understand and learn to replicate the
fine details of sound performance by teaching them to score holistically.
But on the other hand, it is
conceivable that you may find yourself involved in an assessment where you must
evaluate the performance of hundreds of students with few resources on hand,
too few resources to score analytically. Holistic may be your only option.
(As a personal aside, I must say
that I am minimizing my own use of holistic scoring as a single, stand-alone
judgment of overall performance. I see few applications for such a score in the
classroom. Besides, I have begun to question the meaning of such scores. I have
participated in some writing assessments in which students whose analytical
profiles of performance (including six different rating scales) were remarkably
different ended up with the same holistic score. That gives me pause to wonder
about the real meaning and interpretability of holistic scores. I have begun to
think holistic scores mask the kind of more detailed information needed to
promote classroom-level student growth. It may be that, in the classroom, the
benefits of quick scoring are not worth the costs of sacrificing such valuable
assessment information.)
When a holistic score is needed,
it is best obtained by summing analytical scores, simply adding them together.
Or, if your vision of the meaning of academic success suggests that some
analytical scales are more important than others, they can be assigned a higher
weight (by multiplying by a weighting factor) before summing. However, a
rational basis for determining the weights must be spelled out in advance.
It may also be acceptable to add
a rating scale that reflects "overall impression" to a set of
analytical score scales, if the user can define how the whole is equal to more
than the sum of the individual parts of performance.
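For teachers who keep score records electronically, a small illustrative sketch may help make the arithmetic concrete. The sketch below (written in Python and not part of the original discussion) shows one way analytical ratings might be combined into a weighted holistic score; the dimension names, ratings, and weights are hypothetical, and any real weights would have to be justified and announced in advance.

# A minimal sketch, assuming hypothetical dimensions and weights:
# combining analytical ratings into a weighted holistic score.

analytical_ratings = {        # each rating on a 1-5 scale, for example
    "content": 4,
    "organization": 5,
    "delivery": 3,
}

weights = {                   # relative importance, decided before scoring
    "content": 2.0,           # content counts double in this illustration
    "organization": 1.0,
    "delivery": 1.0,
}

# Weighted sum of the analytical ratings...
holistic_raw = sum(analytical_ratings[d] * weights[d] for d in analytical_ratings)

# ...optionally rescaled back to the 1-5 range for reporting.
holistic_scaled = holistic_raw / sum(weights.values())

print(f"Raw weighted total: {holistic_raw}")                  # 16.0
print(f"Holistic score (1-5 scale): {holistic_scaled:.1f}")   # 4.0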
Recording
Results. Performance assessors have the
freedom to choose from among a wonderful array of ways to record results for
later communication. These include checklists, rating scales, anecdotal
records, and mental record keeping. Each of these is described in Table 8-1 in
terms of definition, principal strength, and chief limitation.
Table 8-1
Options for Recording Performance Judgments

Checklists
Definition: List of key attributes of good performance checked present or absent.
Strength: Quick; useful with a large number of criteria.
Limitation: Results can lack depth.

Rating scales
Definition: Performance continuum mapped onto a numerical scale ranging from low to high.
Strength: Can record judgment and rationale with one rating.
Limitation: Can demand extensive, expensive development and training for raters.

Anecdotal records
Definition: Student performance is described in detail in writing.
Strength: Can provide rich portraits of achievement.
Limitation: Time consuming to read, write, and interpret.

Mental records
Definition: Assessor stores judgments and/or descriptions of performance in memory.
Strength: Quick and easy way to record.
Limitation: Difficult to retain accurate recollections, especially as time passes.
Note that checklists, rating
scales, and anecdotal records all store information that is descriptive in
terms of the performance criteria. That is, each element of performance
checked, rated, or written about must relate to our judgments about student
performance on established key dimensions.
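For those who keep such records electronically, the brief sketch below (in Python, and not part of the original discussion) illustrates one way a written record might be organized so that every rating and note stays keyed to a named criterion; the student, task, criteria, and scale are invented for illustration only.

# A minimal sketch, assuming hypothetical criteria and a 1-3 scale:
# a written record in which each entry is tied to a named criterion.
from datetime import date

record = {
    "student": "J. Smith",                  # hypothetical student
    "task": "oral presentation",
    "date": date(2024, 5, 3).isoformat(),
    "ratings": {                            # criterion -> rating on a 1-3 scale
        "organization": 3,
        "delivery": 2,
    },
    "notes": {                              # anecdotal notes, also keyed by criterion
        "delivery": "Pacing rushed in the middle; strong close.",
    },
}

# Because every rating and note is keyed to a criterion, the record can later
# be summarized, compared with earlier records, or shared with the student.
for criterion, rating in record["ratings"].items():
    print(f"{criterion}: {rating}")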
In using mental record keeping, we can store
either ratings or images of actual performance. I included it in this list to
provide an opportunity to urge caution when using this notoriously undependable
storage system! Most often, it is not a good idea to rely on our mental records
of student achievement. When we try to remember such things, there are five
things that can happen and four of them are bad. The one good possibility is
that we might retain an accurate recollection of performance. The bad things
are that we could do any or all of the following:
•
forget, losing that recollection
forever
•
remember the performance but ascribe
it to the wrong student
•
unconsciously allow the memory to
change over time due to our observations of more recent performance
•
retain a memory of performance that
serves as a filter through which we see and interpret all subsequent
performance, thus biasing our judgments inappropriately
The chances of these problems occurring increase
the longer we try to maintain accurate mental records and the more complex
these records are.
For the reasons listed above, I
urge you to limit your use of this filing system to no more than a day or two
at most and to very limited targets. If you must retain the record of performance
longer than that, write it down—as a checklist, a set of rating scales, or an
anecdotal record!
Checking for Errors in Judgment.
Subjective scoring—a prospect that raises the anxiety of any assessment
specialist—is the hallmark of performance assessment. I hope by now you see why
it is that we in the assessment community urge caution as the education
community moves boldly to embrace this option. It is fraught with potential
danger and must be treated with great care.
We already have discussed many ways to assure that
our subjective assessment process is as objective as it can be:
•
Be mindful of the purpose for
assessing
•
Be crystal clear about the target
•
Articulate the key elements of good
performance in explicit performance criteria
•
Share those criteria with students in
terms they understand
•
Learn to apply those criteria in a
consistent manner
•
Double check to be sure bias does not
creep into the assessment process.
Testing
for Bias. There is a simple way to check for
bias in your performance evaluations. Remember, bias occurs when factors other
than the kind of achievement being assessed begin to influence our judgments,
such as the gender, age, ethnic heritage, appearance, or prior academic record
of the examinee. You can determine the degree of objectivity of your ratings by
comparing them with the judgments of another trained and qualified evaluator
who independently observes and evaluates the same student performance with the
intent of applying the same criteria. If, after observing and evaluating
performance, two independent judges generally agree on the level of proficiency
demonstrated, then we have evidence that the results reflect student
proficiency. But if the judges come to significantly different conclusions,
they obviously have applied different standards. We have no way of knowing
which is the most accurate estimate of true student achievement. Under these
circumstances the accuracy of the assessment must be called into question and
the results set aside until the reasons for those differences have been
thoroughly explained.
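For readers who want to quantify such a comparison, the brief sketch below (in Python, and not part of the original discussion) shows one simple way two raters' judgments might be compared; the ratings are invented, and reporting both exact agreement and agreement within one point is merely one reasonable choice, not a prescribed procedure.

# A minimal sketch, assuming two raters scored the same eight performances
# on a 5-point scale using the same criteria.
rater_a = [4, 3, 5, 2, 4, 4, 3, 5]
rater_b = [4, 3, 4, 2, 5, 4, 2, 5]

pairs = list(zip(rater_a, rater_b))
exact = sum(1 for a, b in pairs if a == b)
adjacent = sum(1 for a, b in pairs if abs(a - b) <= 1)

print(f"Exact agreement:  {exact / len(pairs):.0%}")     # 62%
print(f"Within one point: {adjacent / len(pairs):.0%}")  # 100%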
Time for Reflection
It's tempting to conclude that it
is unrealistic to gather corroborating judgments in the classroom—to double
check ratings. But can you think of any helpers who might assist you in your
classroom by playing the role of second rater of a performance assessment? For
each, what would it take to involve them productively? What benefits might
arise from their involvement?
Practical
Ways to Find Help. While this test of objectivity, or
of evaluator agreement, promises to help us check an important aspect of
performance assessment quality, it seems impractical for classroom use for two
reasons: it's often difficult to come up with a qualified second rater, or we
lack the time and expertise required to compare evaluations.
In fact, however, this process
need not take so much time. You need not check all of your judgments for
objectivity. Perhaps a qualified colleague could double check just a few—just
to see if your ratings are on target.
Further, it doesn't take a high
degree of technical skill to do this. Have someone who is qualified rate some
student performance you already have rated, and then sit down for a few minutes
and talk about any differences. If the performance to be evaluated is a product
students created, have your colleague evaluate a few. If it's a skill,
videotape a few. Apply your criteria to one and check for agreement. Do you
both see it about the same way? If so, go on to the next one. If not, try to
resolve differences, adjusting your performance criteria as needed.
Please understand that my goal
here is not to have you carry out this test of objectivity every time you
conduct a performance assessment. Rather, I want you to understand the spirit
of this test of your objectivity. An important part of the art of classroom
performance assessment is the ability to sense when your performance criteria
are sufficiently explicit that another judge would be able to use them
effectively, if called upon to do so. Further, from time to time it is a good
idea to actually check whether you and another rater really do agree in
applying your criteria.
On those occasions, however, when
you are conducting very important performance assessments that have
significant impact on students (i.e., for promotion decisions, graduation
decisions, and the like), you absolutely must at least have a sample of your
ratings double checked by an independent rater. In these instances, remember
you do have access to other available evaluators: colleagues in your school or
district, your building administrative staff, support teachers, curriculum personnel,
experts from outside the field of education (when appropriate), retired teachers
in your community, qualified parents, and others.
In addition, sharing your
criteria with your students and teaching
them to apply those standards consistently can provide you with useful
insights. You can be assured that they will tell you which criteria they don't
understand.
Just remember, all raters must be
trained to understand and apply your standards. Never assume that they are
qualified to evaluate performance on the basis of prior experience if that
experience does not include training in using the criteria you employ in your
classroom. Have them evaluate some samples to show you they can do it. If
training is needed, it very often does not take long. Figure 8-8 presents steps
to follow when training raters. Remember, once they're trained, your support
raters are allies forever. Just think of the benefits to you if you have a pool
of trained evaluators ready to share the workload.
More
about Students as Partners
Imagine what it would mean if your helpers—your
trained and qualified evaluators of process and/or product—were your students.
Not only could they be participants in the kind of rater training spelled out
in Figure 8-8, but they might even be partners in the process of devising the
performance criteria themselves. And, once trained, what if they took charge of
training some additional students, or perhaps trained their parents to be
qualified raters, too?
The pool of available helpers begins to grow as more participants begin to internalize the meaning of success in your classroom.
•
Have trainees review and discuss the performance criteria. Provide clarification as needed.
•
Give them a sample of work to evaluate whose quality is known to you (i.e., work you already have rated) but not to your trainees.
•
Check their judgments against yours, reviewing and discussing any differences in terms of the specifics of the performance criteria.
•
Give them another sample of work of known quality to evaluate.
•
Compare their judgments to yours again, noting and discussing differences.
•
Repeat this process until your trainees converge on your standards, as evidenced by a high degree of agreement with your judgments.
•
You and the trainees evaluate a sample of work of unknown quality. Discuss any differences.
•
Repeat this process until you have confidence in your new partner(s) in the evaluation process.
Figure 8-8
Steps in Training Raters of Student Performance
Without question, the best and
most appropriate way to integrate performance assessment and instruction is to
be absolutely certain that the important performance criteria serve as the
goals and objectives of the instruction. As we teach students to understand and
demonstrate key dimensions of performance, we prepare them to achieve the
targets we value. We prepare in sound and appropriate ways to be held
accountable for student learning when we are clear and public about our performance
criteria, and when we do all in our power to be sure students have the opportunity
to learn to hit the target.
In addition, we can make
performance assessment an integral part of the teaching and learning process
by involving students in assessment development and use:
•
Share the performance criteria with
students at the beginning of the unit of instruction.
•
Collaborate with students in keeping
track of which criteria have been covered and which are yet to come.
•
Involve students in creating prominent
visual displays of important performance criteria for bulletin boards.
•
Engage students in the actual
development of performance exercises.
•
Engage students in comparing contrasting
examples of performance, some of which reflect high-quality work and some of
which do not (perhaps as part of a process of developing performance criteria).
•
Involve students in the process of
transforming performance criteria into checklists, rating scales, and other
recording methods.
•
Have students evaluate their own and
each other's performance, one on one and/or in cooperative groups.
•
Have students rate performance and
then conduct studies of how much agreement (i.e., objectivity) there was among
student judges; see if the degree of agreement increases as students become more
proficient as performers and as judges (a simple way to tally such agreement is sketched just after this list).
•
Have students reflect in writing on
their own growth over time with respect to specified criteria.
•
Have students set specific achievement
goals in terms of specified criteria and then keep track of their own progress.
•
Store several samples of each
student's performance over time, either as a portfolio or on videotape, if
appropriate, and have students compare old performance to new and discuss in
terms of specific ratings.
•
Have students predict their
performance criterion by criterion, and then check actual evaluations to see if
their predictions are accurate.
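If students conduct the agreement studies suggested above, the same kind of tally can be extended to a whole panel of student judges and repeated across rating rounds to see whether agreement rises as they become more proficient. The sketch below is a hypothetical illustration; the judges' scores are invented.

```python
# Illustrative sketch only: average pairwise exact agreement among several
# student judges, tracked across two hypothetical rating rounds.
from itertools import combinations

def mean_pairwise_agreement(ratings_by_judge):
    """Average percent of identical ratings across every pair of judges."""
    pair_scores = []
    for judge_a, judge_b in combinations(ratings_by_judge, 2):
        matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
        pair_scores.append(100 * matches / len(judge_a))
    return sum(pair_scores) / len(pair_scores)

# Each inner list holds one judge's 1-5 ratings of the same four performances.
round_one = [[3, 2, 4, 3], [2, 2, 5, 3], [3, 3, 4, 2]]
round_two = [[3, 2, 4, 3], [3, 2, 4, 3], [3, 2, 5, 3]]

print("Round 1 agreement:", round(mean_pairwise_agreement(round_one), 1))  # 33.3
print("Round 2 agreement:", round(mean_pairwise_agreement(round_two), 1))  # 83.3
```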
Time
for Reflection
Have
you ever been involved in any of these ways of assessing your own
performance—as a partner with your teacher? If so, what was the experience like
for you?
These activities will help
increase students' control of their own academic wellbeing and will remove the
mystery that too often surrounds the meaning of success in the classroom.
Barriers
to Sound Performance Assessment
There are many things in the design and
development of performance assessments that can cause a student's real
achievement to be misrepresented. Many of the potential problems and remedies
are summarized in Table 8-2.
CHAPTER
SUMMARY: THOUGHTFUL DEVELOPMENT YIELDS SOUND ASSESSMENTS
This chapter has been about the great promise of
performance assessment. However, the presentation has been tempered with the
need to develop and use this option cautiously. Performance assessment, like
other methods, brings with it specific rules of evidence. We must all strive
to meet those rigorous standards.
We began with an overview of the
three steps in developing performance assessments: clarifying performance
(dealing with the nature and focus of the achievement to be assessed),
developing exercises (dealing with the nature, content, and number of exercises),
and scoring (dealing with kinds of scores, recording results, and identifying
and training the evaluator). As we covered each step, we discussed how students
could become full partners in performance assessment design, development, and
use. The result will be better performers.
Table 8-2

Source of Problems: Inadequate vision of the target
Remedy: Seek training and help needed to clarify the vision. Collaborate with others in this process.

Source of Problems: Wrong method for the target
Remedy: Stick to process and product targets when using performance assessment.

Source of Problems: Incorrect performance criteria
Remedy: Compare contrasting cases to zero in on key differences. Tap into sources of quality criteria devised by others.

Source of Problems: Unclear performance criteria
Remedy: Study samples of performance more carefully. Seek qualified expertise whenever necessary.

Source of Problems: Poor-quality exercises
Remedy: Think about and specify the achievement to be demonstrated, the conditions, and the standards to be applied.

Source of Problems: Inadequate sample of exercises
Remedy: Define the domain to be sampled as precisely as possible. Gather as much evidence as your resources will allow to corroborate your judgments.

Source of Problems: Too little time to evaluate
Remedy: Add trained evaluators – they are available!

Source of Problems: Untrained evaluators
Remedy: Use clear criteria and examples of performance as a starting point in training them.

Source of Problems: Inappropriate scoring method selected (holistic vs. analytical)
Remedy: Understand the relationship between holistic and analytical scoring and assessment purpose.

Source of Problems: Poor record keeping
Remedy: Strive for accurate written records of performance judgments. Don't depend on memory.

Source of Problems: Keeping the criteria and performance assessment process a mystery to students
Remedy: Don't. Share the criteria with students and involve them as partners in the assessment process.
To assure quality, we discussed
the need to understand the role of subjectivity in performance assessment. We
also analyzed the match between performance assessment and the five kinds of
achievement targets, concluding that strong matches can be developed for
mastery of knowledge through reference materials, reasoning, skills, and
products. We discussed the key context factors to consider in selecting this
methodology for use in the classroom, centering mostly on the importance of
having in place the necessary expertise and resources.
We devised six practical steps
for formulating sound performance criteria, urging collaboration with students
and/or colleagues in the process. We set standards for sound exercises,
including the need to identify the achievement to be demonstrated, the
conditions of the demonstration, and the standards of quality to be applied.
And finally, we spelled out scoring options, suggesting that analytical
evaluation of student work is likely to be most productive, especially when students
are trained to apply standards of quality to their own and each other's work.
As the decade of the 1990s
unfolds, we will come to rely more and more on performance assessment
methodology as the basis for our evaluation of student
achievement, and as a means of integrating
assessment and instruction. Let us strive for the highest quality, most
rigorous assessments our resources will allow.
EXERCISES
TO ADVANCE YOUR LEARNING
Knowledge Outcomes
1.
Memorize the three basic parts and
nine design decisions that guide the performance assessment development
process.
2.
List the aspects of performance
assessment design that require professional judgment and the dangers of bias
associated with each.
3.
Specify the kinds of achievement
targets that can be transformed into the performance assessment format and
those that cannot.
4.
Identify the factors to take into
account in considering use of the performance assessment option.
5.
Describe the key considerations in
devising a sound sample of performance exercises.
6.
Memorize the six steps in the design
of performance criteria and the basic ingredients of sound exercises.
7.
In your own words, list as many ways
as you can to bring students into the performance assessment process as
partners.
Reasoning
Outcomes
1.
Find an example of a performance
assessment previously developed by you or others and evaluate it. Using the
framework provided in this chapter, analyze the underlying structure of the
assessment and evaluate each part to see if standards of quality have been met.
Write a complete analysis of the assessment, detailing what you would do to
improve it, if necessary.
Skill
Outcomes
1.
Select a unit of instruction from the material
you teach, will teach, or have studied as a student, that includes skill or
product outcomes. Go through the process of devising a performance assessment
for one of those outcomes, including performance criteria, exercises, and a
scoring and recording scheme.
Product
Outcomes
1.
Evaluate the assessment you created in
the exercise above in terms of the attributes of sound assessment discussed in
this and earlier chapters. How did you do?
Affective Outcomes
1.
Some have argued that performance
assessments are too fraught with potential bias due to evaluator subjectivity
to justify the attention they are receiving these days. Do you agree? Why?
2.
Throughout the chapter, I argue that
the assessment development procedures outlined here will help teachers who use
performance assessments to connect those assessments directly to their
instruction. Having completed the chapter, do you agree? Why?
3.
I also argue that the assessment
development and use procedures suggested herein, while apparently very labor
intensive, could save you valuable teaching time in the long run. Do you
agree? Why?