Thursday, September 13, 2012

Dilemma in Chicago

The current situation in Chicago between the Mayor’s office and the teachers' union regarding test scores being included in teacher evaluations is representative of an argument that is taking place all over the country.  Policy makers believe that higher test scores are reflective of good teaching and thus highly desirable as part of an accountability package, while teachers see standardized test scores as reflecting so much more than what actually happens in a classroom--much of it beyond their control.

The problem is that both sides are referencing a research device (in the form of standardized tests) about which most people understand very little.

A standardized test is a way of reducing some current state of things to a set of numbers that can be compared and analyzed more easily than words. For example, all the scores from the first time a reading test is administered serve as a statistical approximation of the current state of that population. A few students will answer a single item correctly, a few more two items, and at the other end a few will have answered all the items correctly, and a few will have missed only one or two. Somewhere in the middle of the score range will be found the majority of students, which when viewed graphically represent a curve.

Within the resulting curve several things can then be understood, such as where each student falls in the overall mix, or how a population of students changes over time when the test (or a comparable test) is administered at a later date. The nice thing about such instruments is that they represent a relatively cheap, unobtrusive way for approximating some very complex things.

But a standardized test by design is limited to approximating an overall status quo, and then showing where students, schools, etc., fit in relation to the rest. When a researcher uses an instrument in that manner, the results can be deeply informative, and the proper use of such is (and should continue to be) invaluable. Note that I said "proper" use.

Consider what happens when the purpose changes from research (the only proper use) to accountability (which is nowhere to be found in the design parameters of such an instrument): you risk making an instrument designed to approximate the status quo the basis for teaching and instruction. Let me say that again in slightly different terms: a policy maker who attaches accountability of any type to a standardized test score risks focusing teaching and learning on reproducing the untenable present, which was the very thing the policy maker wanted to change in the first place! The irony of making a standardized test the primary agent of change is frankly laughable--fomenting change was never in its purpose or design.

Consider the implications from a teacher's perspective. If your instruction is based upon a test, which is an approximation of the status quo, no matter how hard you teach you will find very little opportunity to actually move the system, since the test isn't about what might someday be, but about what is! Thus the political gesture that was intended to change the status quo serves to reify it instead. This problem is compounded by one of the worst things to happen to the tests used for accountability purposes, which is the establishment of a passing score (ironically named, in many instances, "meeting" the standard). That cut score further defines success not at the highest, most aspirational level of content contained on the test, but somewhere in the middle. Thus a teacher is deemed to be successful when they have taught the vast majority of their students content that is somewhere in the midst of where students are currently performing. That hardly seems to me a worthwhile definition of success.

But the frustration continues. In order to build a standardized test that will perform as designed, items may come from a set of standards, but the only ones that are included are those that have good discriminatory power. An item that all students answer correctly or incorrectly may tell a great deal about what was or was not taught or learned, but fails to help understand anything about differences in a population. An item that half the students will answer correctly (and, conversely, half incorrectly) provides just the sort of discriminatory power required for such tests, and when grouped together with other items that behave similarly helps to parse students into piles. Those piles, as was mentioned, go from those that answered no items correctly all the way up to those that answered them all correctly. Items that all students answered correctly or incorrectly don't contribute to that parsing and are eliminated.
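One way to see why such items drop out, offered as a minimal sketch rather than a description of how any particular test is actually built: if an item is scored 1 for correct and 0 for incorrect, the spread of scores on that item is p(1 - p), where p is the share of students answering it correctly. The function name and the sample shares below are illustrative assumptions, and real test construction relies on additional statistics.

```python
def item_variance(p_correct):
    """Variance of a right/wrong (1/0) item when a share p_correct answer correctly."""
    return p_correct * (1.0 - p_correct)


# Hypothetical shares of students answering an item correctly.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"share correct = {p:.2f}  spread = {item_variance(p):.4f}")
```

The spread is largest when half the students answer correctly and falls to zero when everyone answers the same way, which is why an item that all students get right (or wrong) cannot separate them.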

A curriculum based on test items that were picked because they have good discriminatory power, and a curriculum based on what and how students learn will not and should not resemble one another. In fact, if the test serves as the basis for the curriculum--which it does in so many of our states at present--no matter how hard a teacher teaches or how much effort struggling students put into their studies, they will focus on all the wrong things and success will continue to elude them.

All because policy makers failed to ask basic questions about the nature of the key instrument they selected for determining success. They have, in effect, placed the present state of things as the definition of future success and now pretend to something else entirely--that high standards are just around the corner if only teachers would just do their jobs.

Which brings me to my final point--the argument that if the tests are so basic then students ought to at least be able to perform on that content and failure must be the result of poor teaching. The fallacy here goes back to the design. A standardized test is designed to parse students, not answer questions about what was taught or learned. Any assumption that a standardized test can be used to make such inferences requires a standardized test score to transmogrify into something that was never a part of the design. A standardized test score can certainly be used to show where a student or school ranks among others, but it is absolutely silent as to the cause behind that ranking.

As long as standardized testing remains the basis for school and teacher accountability, the successful schools we want for our students will continue to elude us, no matter how much effort we put into solutions. Teachers in Chicago--if they are like teachers in so many other places--are sick of looking like apologists for saying that standardized test scores shouldn't be a part of their evaluations, while fully accepting the idea of teacher accountability. Policy makers are genuinely concerned that schools don't seem to be getting better. Tools exist that were designed to accomplish what each side desires. It's time we opted for them and acknowledged that the current tool isn't up to the task.

Friday, December 16, 2011

Stethoscopes and Education

What would happen if a doctor used only a single instrument to make a diagnosis about a patient? A stethoscope that finds a regular heartbeat and lungs free of pneumonia is incapable of offering up an opinion when it comes to diabetes, for which the proper tool would be a glucose test. A blood pressure cuff can say a great deal about whether or not someone has high blood pressure, but says nothing about whether or not a patient needs an antibiotic to treat an infection.

And so it goes.

Should any one instrument be brought forward and presented as the only instrument a doctor should use, the results would be catastrophic. Patients that were clearly unhealthy would be deemed as being just fine. Patients identified through the instrument as needing attention would be at the mercy of an uninformed opinion as to what to do next, resulting in massive inefficiency and less than stellar outcomes. Even worse, when an intervention works it will not be because it was selected as the best intervention given the overall needs of the patient, but because someone made a lucky guess.

Think of the old adage regarding test taking that when you don’t know the answer just mark C. The reason is pretty simple: assuming four choices, if you always chose C, odds are 25% of your C responses would in fact land on the correct answers, since most tests randomize the response pattern. The point is that if you guess consistently you stand a chance of hitting on the right answer often enough that you might fool somebody into thinking it's something other than luck.
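The 25% figure can be checked with a small simulation. This is only a sketch under stated assumptions: four choices per item, an answer key in which each choice is equally likely to be correct, and a hypothetical function name chosen for illustration.

```python
import random


def always_c_hit_rate(num_items, seed=0):
    """Share of items answered correctly by marking C on every item.

    Assumes four choices per item and an answer key in which each
    choice is equally likely to be the correct one.
    """
    rng = random.Random(seed)
    answer_key = [rng.choice("ABCD") for _ in range(num_items)]
    hits = sum(1 for correct in answer_key if correct == "C")
    return hits / num_items


rate = always_c_hit_rate(100_000)
print(f"Always marking C was right on {rate:.1%} of items")
```

Over a large number of items the share lands close to one in four, with no knowledge of the content involved at all.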

Or consider a second adage: a broken clock tells the correct time twice a day. If the clock was frozen at 6:00 and you happened to glance at it twice that day, once at 6:00 am and once at 6:00 pm, you might indeed think that the clock was accurate and doing its job. In fact, you could act on what you perceived as being accurate information and, because you happened to look at just the right time, your actions would be seen as successful because of the clock even though the clock had nothing to do with it.

A one-instrument system cannot, by definition, work. When it appears to, it is either because we guessed consistently over time and as a result hit on the right answer at least some of the time, or we got lucky in that our actions were right even though the instrument was in fact broken.

Some remarkable things happen the instant a second instrument can be added to the mix. For example, the result of the first instrument can be called into question, as when a stethoscope alone declares a person healthy while a glucose test suggests just the opposite. Or an outcome can be shown to be the result of lucky guessing and not a good decision making process. Or the first instrument can be shown to be flawed or even broken.

We are trapped in a paradigm in education that accepts and even celebrates a one-instrument system. We rely on standardized testing as that single instrument, and while it appears in a number of forms (e.g., end of year tests, end of course tests, formative assessment) they are all based upon a similar set of assumptions. Our love affair with the standardized test is a long one and shows no signs of subsiding.

As a result, consider how far from the goal of a high quality personalized education that places us. Our over-reliance on a single instrument means that we lack any real basis for offering much personalization, and are left to come up with generic approaches that we hope fit as many of our students as possible. When/if those approaches work, we are left to presume that it had to do with something other than luck, and yet we have no way to prove or even know that. As a result, not knowing if a success was the result of our actions or blind luck, success is in no way scalable.

Monday, October 24, 2011

The policy problem

What if a government agency asked you to build a building that was guaranteed to fall down, but because it was the law you built it anyway?

Furthermore, what if that same government agency threatened your job if you failed to prop up the mess that got built every time it started to fall down? And yet again, what if that same agency now threatened to find the individual workers who worked on what was a bad design and hold them accountable for succeeding in spite of a design that all but guaranteed failure?

Making rising test scores the goal of education creates just such a system. Test scores were designed and intended to be a check on the system--and that is the limit of their use and their promise. Anything beyond that pretends that a test can magically transform itself into something beyond itself, something as illogical as it is dangerous when it comes to our students.

Monday, October 3, 2011

Article just published

An article I wrote comparing teaching to the test in schools to studying for an eye exam was just published in the October 2011 edition of The School Administrator. Click here to open the article.

Tuesday, September 27, 2011

The paradigm of school reform

For years now "reformers" in the form of educators and policy makers alike have operated under the assumption that they have the ability to fix this thing called education that they and others find profoundly broken. Such a paradigm brings with it a way of thinking about education that has considerable consequences.

For example, in both policy and practice, our actions over the past twenty-five years have taken the form of "anything has to be better than this," since something so radically broken can't possibly get any worse and at least this is something. That has led to policies that created massive systems and infrastructure around ideas and tools that were never proven to produce the results dictated in the policies. It meant we stopped trusting educators, turning the reins of education over to business leaders and other non-educators as if a business model could finally get it right. It meant we entrusted policy makers that have never taught a child with setting policy for that activity. And it now means that all of us have become addicted to test scores and accountability formulas that have almost zero capacity to signal what must be done to make a school better, because they were too short-sighted to consider that there might be a better way.

Make no mistake about it--at no point in the history of education has there been a moment where a careful look at education wouldn't show significant room for improvement (the same is true for any organization). But consider the very different reactions when you compare one approach that condemns American education and demands reform, with one that approaches each and every educator and asks the simple question: "what could be done to improve the quality of education for your students?" The first approach requires a process of condemnation, gathering agreement that the condemnation is justified, putting forth a proposal for change that will then compete with other proposals as the thing that can best fix the problem, and then a process of implementation that will be completed just about the time the next set of solutions are being presented as the new thing that will fix the mess. What a waste of energy that could be put to better use.

The second approach asks that you accept what you find in the status quo (so neither good nor bad, just as is), and determine a handful of things you could do to make it better. The steps to such a process are: look around, decide what to do, and do it. That doesn't require years to implement, it doesn't require a policy change, and everyone already has the tools to do it.

Imagine if we asked every school principal for the three or four things that they believed could improve their ability to help students achieve at very high levels. Imagine if we also insisted that these things be practical--as in, these things have to work within your budget, you have to work with your current staff, etc. I've had the chance to ask a great many principals that very question insisting on those parameters and every single one of them has been able to offer an immediate answer.

But that answer is almost always followed by the articulation of a set of constraints that are likely to prevent those things from happening, coming almost entirely from the policies and requirements imposed in the name of reform.

If we would stop operating under the paradigm of "reform" for the changes needed in schools we might realize that it is our policies that really deserve the level of energy demanded of the word "reform." If we realized that virtually every teacher and principal has ideas and a professional understanding as to how to make education for their students just a little better, and we held schools accountable for implementing those ideas, we could start tomorrow and see improvements almost immediately. Instead, our reform paradigm insists that teachers and principals really don't know or they would have done it already, that the system remains broken in spite of billions of dollars of investment and lots of hand-wringing by policy makers, and that the broken system has nothing to do with policy makers having spent and legislated stupidly.

Thus the argument to maintain the reform paradigm is really an argument to continue down a very expensive path that to date has accomplished almost nothing in spite of the level of energy and investment. The two great successes that reformists claim occurred on their watch have to do with the increased inclusion of students who historically struggle and increased attention to closing achievement gaps in minority populations, but ironically the argument can be made that this had more to do with broader trends in society. One can, I believe, make the argument that these things would have happened with or without education reform, and that the best the reform movement can claim is to have sped up the process.

What the paradigm of reform has not been able to produce is a great surge in national achievement or a grand improvement in the educational outcomes for children when that was its very promise.

It is high time we thought seriously about that.

Thursday, August 25, 2011

Of interest right now

I watched some YouTube videos recently of a DARPA competition to see whose robot could drive a car on a 60 mile course without outside aid. The cars were equipped with the latest in sensors, teams of some of the smartest scientists in the world oversaw each competitor, and some of the best programmers anywhere in the world were brought in to write the code that would become the brains for each robot. The track and every obstacle had to be extremely well mapped, the maps were programmed into the massive computers, and away they went.

The winning car completed the course in around four hours, cost a whole bunch of money to develop, and walked away with a big prize for the effort. While the lessons learned are invaluable to DARPA, the effort remains at the prototype stage and will be years from having a commercial purpose. After all, if we're talking about putting human beings into cars that can drive on their own, lives are at stake and we have to be careful. The success experienced by the winner occurred within a closed system where every possible obstacle was known beforehand, which hardly describes what happens when we drive out in the real world.

I wish we had put the same degree of thought into the data systems that have become so pervasive in schools. Numerous systems have been developed in recent years that gather all of a school's data under one roof, and then offer up the promise of driving instruction such that the result is an increase in achievement.

What disturbs me about that promise is the arrogance in thinking that the answers for something as complex as individual student needs can be programmed into a computer as if learning occurs in some kind of controlled environment. We are just now capable of creating a computer model of driving that is light years behind what a student in driver's ed can accomplish in the real world, and we think that we have the capacity to model individual student learning, something infinitely more complex? DARPA had around a hundred entries for their competition and all fell far short of replicating behavior that most of us do almost as a reflex. We have millions of students, none of whom are entirely alike, each with needs that are uniquely theirs. If we can't yet replicate a single, linear, well-defined task in a controlled environment with that amount of brainpower, why would we think we can model teaching and learning?

Systems, policies, and practices that support teachers in providing for those unique needs stand a better chance of offering an improvement or two for every student than systems that standardize an offering against an algorithm, no matter how complex the algorithm. It is shameful that we seem willing to relegate teachers to automatons within such systems, when the teacher is the only system that stands a chance of actually working.

Tuesday, August 9, 2011

What if the real standard isn't even in the test?

Here's an amazing thought given all the efforts behind school reform: what if the actual standard that we have in our minds for what students should accomplish in school isn't even on the test?

Here's the simple reality: it doesn't have to be.

Standardized tests work when they can show the distribution of students within a domain. We currently define the domain via a set of standards, but the only rule a test designer has to follow in terms of the standards in order to build a standardized test is that the items come from those standards. However, the most important criterion for a standardized test has little to do with the standards. Rather, the most important criterion is that each item contribute to understanding the distribution of students.

Items that contribute to that understanding are items that roughly half (say 40-60%, generally) of the students taking a test will answer incorrectly. A single item can then divide the students into two piles, a second into three, and so on.
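The screening described above can be sketched in a few lines. This is a minimal illustration, not how any actual testing program selects items: the response data, the function names, and the use of the rough 40-60% window as a hard cutoff are all assumptions made for the example.

```python
from collections import Counter


def share_incorrect(responses, item):
    """Share of students who missed the given item (1 = right, 0 = wrong)."""
    return sum(1 for student in responses if student[item] == 0) / len(responses)


def screen_items(responses, low=0.4, high=0.6):
    """Indexes of items missed by roughly half the students (between low and high)."""
    num_items = len(responses[0])
    return [i for i in range(num_items) if low <= share_incorrect(responses, i) <= high]


def piles(responses, items):
    """Count students by the number of kept items they answered correctly."""
    return Counter(sum(student[i] for i in items) for student in responses)


# Hypothetical data: 4 students (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]

kept = screen_items(responses)
print("items kept:", kept)
print("piles by score:", dict(sorted(piles(responses, kept).items())))
```

In this made-up data the first item (everyone right) and the last item (everyone wrong) are screened out, and the two remaining items sort the four students into three piles by score.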

But consider what that means for those things that might matter most to us. I certainly want my kids to achieve a high standard, but in order for items to be included only about half the kids can answer an item representing a high standard correctly. But what if it isn't taught so no one gets it right? If that is the case then the item won't contribute to our understanding of differences among students since it showed them all to be the same--they all missed it. In standardized test mode that item would be tossed as being useless for the purpose behind the test.

But let's say that just such an item managed to make it through somehow. To see the impact of such an item, imagine that all the items were lined up from the easiest item on the test to the most difficult. If a student took the items in that order, we could imagine them starting out strong and then at some point beginning to struggle, and eventually reaching a point where they answer the rest of the items incorrectly. Among those answered incorrectly would most likely be the item representing the standard we actually care about.

What makes such an item particularly useless in state testing programs is the fact that schools are judged by how many of their students answer a certain number of items correctly. So long as they do, the school is declared as having done their job, and to the degree they do not they are judged as missing the boat. Remembering that students will typically answer items correctly only up to a point, and then answer the remainder incorrectly, a school can in fact be declared 100% successful when 100% of their students miss the real standard as represented in that item but answer just enough items to get over the established hump.
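A small worked example makes the arithmetic concrete. It is a sketch built on invented numbers: a 10-item test ordered from easiest to hardest, a cut score of 6, four hypothetical students, and the last item standing in for the standard we actually care about. The function names are illustrative only.

```python
def answers_in_difficulty_order(num_items, num_correct):
    """Right/wrong pattern for a student who answers the easiest items correctly
    and misses every item after that (items ordered easiest to hardest)."""
    return [1] * num_correct + [0] * (num_items - num_correct)


def school_summary(students, cut_score):
    """Return (share of students at or above the cut score,
    share answering the last, hardest item correctly)."""
    passing = sum(1 for s in students if sum(s) >= cut_score) / len(students)
    hardest_right = sum(s[-1] for s in students) / len(students)
    return passing, hardest_right


# Hypothetical school: a 10-item test, a cut score of 6, and four students
# who each answer between 6 and 9 of the easiest items correctly.
students = [answers_in_difficulty_order(10, n) for n in (6, 7, 8, 9)]
passing, hardest_right = school_summary(students, cut_score=6)
print(f"meeting the standard: {passing:.0%}")
print(f"answering the hardest item correctly: {hardest_right:.0%}")
```

Every student in this invented school clears the cut score, and not one of them answers the final item correctly.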

Can you think of a stranger world? One in which the very goal of reform which was high standards for all may now be deploying an accountability measure that either doesn't contain the actual goal, or if it does, positions it in such a way that it doesn't actually matter?