A growing number of school districts have adopted a system called value-added modeling to answer that question, provoking battles from Washington to Los Angeles — with some saying it is an effective method for increasing teacher accountability, and others arguing that it can give an inaccurate picture of teachers’ work.
The system calculates the value teachers add to their students’ achievement, based on changes in test scores from year to year and how the students perform compared with others in their grade.
People who analyze the data, making a few statistical assumptions, can produce a list ranking teachers from best to worst.
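To make the idea concrete, here is a minimal sketch of the calculation described above. It is not any district's or vendor's actual model: the teacher names, the scores, and the helper functions `value_added` and `rank_teachers` are all hypothetical. It assumes the simplest possible definition, a teacher's average year-to-year score gain minus the grade-wide average gain.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, last_year_score, this_year_score)
scores = [
    ("Teacher X", 60, 72), ("Teacher X", 70, 80),
    ("Teacher Y", 65, 70), ("Teacher Y", 75, 82),
    ("Teacher Z", 80, 83), ("Teacher Z", 50, 56),
]

def value_added(rows):
    """Average gain per teacher, relative to the grade-wide average gain."""
    gains = defaultdict(list)
    for teacher, before, after in rows:
        gains[teacher].append(after - before)
    # Grade-wide average over every student's gain
    grade_mean = mean(g for teacher_gains in gains.values() for g in teacher_gains)
    return {t: mean(gs) - grade_mean for t, gs in gains.items()}

def rank_teachers(rows):
    """Teachers ordered from highest to lowest estimated value added."""
    va = value_added(rows)
    return sorted(va, key=va.get, reverse=True)

print(rank_teachers(scores))
```

Real value-added models add many statistical adjustments on top of this, which is where the "few statistical assumptions" come in.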
Use of value-added modeling is exploding nationwide. Hundreds of school systems, including those in Chicago, New York and Washington, are already using it to measure the performance of schools or teachers. Many more are expected to join them, partly because the Obama administration has prodded states and districts to develop more effective teacher-evaluation systems than traditional classroom observation by administrators.
A report released this month by several education researchers warned that the value-added methodology can be unreliable.
“If these teachers were measured in a different year, or a different model were used, the rankings might bounce around quite a bit,” said Edward Haertel, a Stanford professor who was a co-author of the report. “People are going to treat these scores as if they were reflections on the effectiveness of the teachers without any appreciation of how unstable they are.”
Other experts disagree.
William L. Sanders, a senior research manager for a North Carolina company, SAS, that does value-added estimates for districts in North Carolina, Tennessee and other states, said that “if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers.”
Dr. Sanders helped develop value-added methods to evaluate teachers in Tennessee in the 1990s. Their use spread after the 2002 No Child Left Behind law required states to test in third to eighth grades every year, giving school districts mountains of test data that are the raw material for value-added analysis.
Even critics acknowledge that the method can be more accurate for rating schools than the system now required by federal law, which compares test scores of succeeding classes, for instance this year’s fifth graders with last year’s fifth graders.
But when the method is used to evaluate individual teachers, many factors can lead to inaccuracies. Different people crunching the numbers can get different results, said Douglas N. Harris, an education professor at the University of Wisconsin, Madison. For example, two analysts might rank teachers in a district differently if one analyst took into account certain student characteristics, like which students were eligible for free lunch, and the other did not.
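A toy illustration of Professor Harris's point, using made-up numbers: the two teachers, the gains, and the functions `rank_raw` and `rank_adjusted` are hypothetical, and the "adjustment" here is just a comparison against the average gain of students with the same free-lunch status, which is far simpler than what real analysts do.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, eligible_for_free_lunch, score_gain)
records = [
    ("A", False, 10), ("A", False, 10), ("A", False, 10), ("A", True, 4),
    ("B", True, 6), ("B", True, 6), ("B", True, 6), ("B", False, 12),
]

def rank_raw(rows):
    """Analyst 1: rank teachers by plain average gain."""
    by_teacher = defaultdict(list)
    for teacher, _, gain in rows:
        by_teacher[teacher].append(gain)
    return sorted(by_teacher, key=lambda t: mean(by_teacher[t]), reverse=True)

def rank_adjusted(rows):
    """Analyst 2: compare each gain with the average for that lunch-status group."""
    by_group = defaultdict(list)
    for _, free_lunch, gain in rows:
        by_group[free_lunch].append(gain)
    group_mean = {g: mean(v) for g, v in by_group.items()}
    by_teacher = defaultdict(list)
    for teacher, free_lunch, gain in rows:
        by_teacher[teacher].append(gain - group_mean[free_lunch])
    return sorted(by_teacher, key=lambda t: mean(by_teacher[t]), reverse=True)

print(rank_raw(records), rank_adjusted(records))
```

With this invented data the two analysts put the same two teachers in opposite order, even though both are looking at identical test scores.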
Millions of students change classes or schools each year, so teachers can be evaluated on the performance of students they have taught only briefly, after students’ records were linked to them in the fall.
In many schools, students receive instruction from multiple teachers, or from after-school tutors, making it difficult to attribute learning gains to a specific instructor. Another problem is known as the ceiling effect. Advanced students can score so highly one year that standardized state tests are not sensitive enough to measure their learning gains a year later.
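The ceiling effect is easy to see with a little arithmetic. The sketch below assumes a hypothetical test that tops out at 100 points and two invented students who each truly learn the same amount; `MAX_SCORE`, `observed`, and `measured_gain` are illustrative names, not part of any real scoring system.

```python
MAX_SCORE = 100  # hypothetical ceiling of the test

def observed(true_score):
    """The test cannot report anything above its maximum score."""
    return min(true_score, MAX_SCORE)

def measured_gain(true_before, true_growth):
    """Gain the test reports, given where a student started and how much they grew."""
    return observed(true_before + true_growth) - observed(true_before)

# Both students learn the same amount (10 points of true growth).
average_student = measured_gain(60, 10)   # starts mid-range
advanced_student = measured_gain(97, 10)  # starts near the ceiling

print(average_student, advanced_student)
```

The advanced student's reported gain is much smaller than the average student's, purely because the test runs out of room.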
In Houston, a district that uses value-added methods to allocate teacher bonuses, Darilyn Krieger said she had seen the ceiling effect as a physics teacher at Carnegie Vanguard High School.
“My kids come in at a very high level of competence,” Ms. Krieger said.
After she teaches them for a year, most score highly on a state science test but show little gain, so her bonus is often small compared with those of other teachers, she said.
The Houston Chronicle reports teacher bonuses each year in a database, and readers view the size of the bonus as an indicator of teacher effectiveness, Ms. Krieger said.
“I have students in class ask me why I didn’t earn a higher bonus,” Ms. Krieger said. “I say: ‘Because the system decided I wasn’t doing a good enough job. But the system is flawed.’ ”
This year, the federal Department of Education’s own research arm warned in a study that value-added estimates “are subject to a considerable degree of random error.”
And last October, the Board on Testing and Assessment of the National Academies, a panel of 13 researchers led by Dr. Haertel, wrote to Education Secretary Arne Duncan warning of “significant concerns” that the Race to the Top grant competition was placing “too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals.”
“Value-added methodologies should be used only after careful consideration of their appropriateness for the data that are available, and if used, should be subjected to rigorous evaluation,” the panel wrote. “At present, the best use of VAM techniques is in closely studied pilot projects.”
Despite those warnings, the Department of Education made states with laws prohibiting linkages between student data and teachers ineligible to compete in Race to the Top, and it designed its scoring system to reward states that use value-added calculations in teacher evaluations.
“I’m uncomfortable with how fast a number of states are moving to develop teacher-evaluation systems that will make important decisions about teachers based on value-added results,” said Robert L. Linn, a testing expert who is an emeritus professor at the University of Colorado, Boulder.
“They haven’t taken caution into account as much as they need to,” Professor Linn said.
Indeed they haven't.
I wish the writers at the op-ed page of the Times would read the news section of the paper and learn how the VAM jive they are cheering for is full of problems.
And I REALLY wish the know-it-alls in the administration would do the same thing.
More on this tomorrow.
I think maybe VAM should be applied to the Obama administration's own data.
And the revenue and readership numbers of the Times.