Wednesday / June 28

Judgment-Free Teacher Evaluation?

Contributed by Jim Popham

In almost all 50 states, teachers will soon be subjected to annual, high-stakes evaluations of their instructional competence. Unlike previous teacher evaluations that were aimed at improving teachers’ instructional skills, these new teacher evaluations are much more likely to lead to a teacher’s dismissal. America’s teachers are, with good reason, concerned.

The trouble is that state officials have swung from one extreme to the other. Stung by criticisms about the subjectivity of previous teacher-evaluation systems, they have instituted what they claim are wholly objective evaluations based on quantitative data. Such by-the-numbers, supposedly scientific teacher evaluations, however, are destined to fail for two reasons.

One of those reasons is the enormous diversity in teachers’ instructional situations. Teachers differ in what they teach, whom they teach, the effectiveness of their students’ previous instruction, and a host of other salient educational variables. To quantitatively evaluate a state’s teachers as though they were functioning in identical instructional settings is flat-out foolish. Yet many of today’s judgment-free teacher evaluations attempt to do precisely that.

The second obstacle faced by teacher evaluators is the variety and quality of evidence used to arrive at a judgment of a teacher’s quality. The most common kinds of evidence employed to evaluate teachers are students’ scores on standardized or teacher-made tests, classroom observations of teachers in action, administrative ratings, and students’ evaluations of their teachers. But the quality of those different sorts of evidence varies profoundly from state to state, district to district, school to school, and even within a particular school.

Nonetheless, many of our nation’s current teacher-evaluation procedures mistakenly assume that the performances of a teacher’s students on one kind of test are the same as their performances would be on similar tests. Yet many of the standardized tests currently being employed to judge teachers have not been demonstrated to be suitable for this significant task. That is, most of those tests are unaccompanied by any evidence indicating that they can distinguish between well-taught and badly taught students.

Striking differences can also be found in the quality of evaluative data drawn from classroom observations, administrative ratings, and student evaluations. To illustrate, we often see profound differences in the way that classroom observers have been trained to carry out their observations. Properly trained observers who rely on a research-rooted observation system, and who observe sufficient numbers of a teacher’s classes, can provide compelling evaluative evidence. Badly trained observers who rely on a makeshift observation system to observe teachers only a few times are apt to provide evaluators with shoddy evidence.

Teacher-evaluation systems that attempt to resolve these two problems by using preformed numerical templates will inaccurately evaluate far more teachers than is necessary. To cope sensibly with such diversities, properly trained and carefully monitored human beings must supply nuanced judgments that fairly and accurately address those diversities.

What’s involved in any commonsense approach to teacher evaluation is not dramatically unlike what we now see in many states’ recently refurbished teacher evaluations—but with one important exception. Most states’ current teacher evaluations attempt to rely on a cookie-cutter, quantitative process in which human judgment plays a minor or nonexistent role.

Will reliance on the judgments of properly prepared evaluators, even if those judgments are transparent and well documented, lead to the errorless evaluation of teachers? Of course not; as with the evaluation of most workers, especially those engaged in complex endeavors, mistakes will be made. But because judgmentally rooted appraisals can cope better with anomalies in teachers’ instructional settings as well as variations in the worthiness of evaluative evidence, a commonsense judgmental approach is the only defensible way to evaluate our nation’s teachers.



W. James Popham, professor emeritus at the University of California Graduate School of Education and Information Studies, has spent the bulk of his educational career as a teacher at UCLA. He is the author of Evaluating America’s Teachers and the creator of the Corwin Teacher Evaluation Consulting Program.

