Formative Thoughts on Assessment and Evaluation

My philosophy of assessment and education has been primarily forged by my coaching experience and deepened by my study in Education. Yogi Berra offered a pithy quote that summarizes my thoughts nicely: “it ain’t over ’til it’s over.” I believe learning, of any skill, process, or concept, to be a continuum. We all start somewhere, and end somewhere – the role of an educator is to help students progress as far as possible in the time that we’re given. To accomplish that goal, we use three types of assessment. Diagnostic assessment is used to determine the starting point of students; formative assessment is completed daily to set goals, work towards them, and deliver feedback for the next day; summative assessment only occurs at the end of a course. Summative assessment, in this model, is not part of the learning process – it’s a way of summarizing learning when reporting to outside agencies. In a strictly regulated learning environment, like our public education system, summative assessment needs to be completed more often to satisfy the supervising outside agencies.

During pre-internship, I used all three types of assessment, but I didn’t design them all. Most of the summative assessment I implemented was created by one of my co-ops. For diagnostic tools, I used quick-writes, brainstorms, and think-pair-shares. The summative tools I used included tests, quizzes, presentations, and written narratives. Most of the tools I used were formative: worksheets, individual and small-group conversations, monitoring sheets and anecdotal records, peer feedback with rubrics, defence-for-credit forms, written reflections, Kahoot, and class discussions.

My implementation of assessment tools was most effective in my English A30 class, for several reasons. The most important factor was my pre-planning. I was told to do a unit, so I made a unit plan. I built in assessment pieces, and used them to rework my plan as the unit progressed. In comparison, I planned my math classes day by day; I didn’t plan assessment far in advance, and this limited my effectiveness. I was also given significantly more freedom to do my English unit plan however I wanted – I was more strictly bound to teach math traditionally. In a traditional math classroom, descriptive feedback doesn’t really exist, and almost all assessment is summative.

I directly involved my students in the assessment process in my English class. We built a rubric together, they marked each other using it, they submitted defence-for-credit forms arguing for the mark they chose, and they had the chance to omit one of two marks for the unit at their discretion. However, when I tried to involve my math students, even in a minor way, I was shot down. They worked on a math riddle, and I asked them whether they’d prefer it to be formative or summative (in student language). Because they found it easy, they requested a summative grade. However, since that math class had more than one section, and the other sections didn’t do the riddle, it couldn’t be included in the gradebook of just one section. School policy actually seemed to purposefully limit differentiated instruction (DI).

My main English project was fairly open. Students had to create their own myth, but the presentation format was up to them: they could submit a written copy or present to the class. One group decided to film a video. This allowed students to express their learning in a medium they found comfortable while staying tied to the ELA curriculum. In math classes, almost every piece of work we did could be done individually or in a small group. Timelines for assignments and tests were relaxed; students would frequently come back on another day and finish work during lunch or during their tutorial periods. I scribed for a student with broken fingers, and spent time reading questions aloud to students who had difficulty making sense of word problems.

Pre-internship felt like a snippet of internship – I went in, started teaching, met students, and formed relationships. Then, after three weeks, I was whisked out, so I didn’t have to deal with the consequences of all the mistakes I made. It felt like a chance to make all kinds of mistakes without being stuck with them for four months in the fall. I hopefully got a lot of the bugs out of my system (though, of course, other bugs will surface). The biggest barriers I predict for internship will be the restrictions placed on me as a math teacher. If my internship school is like my pre-internship school, I won’t be able to design my own summative assessment, all evaluation will be standardized across the district, and I’ll be expected to teach topics in a particular order, following the textbook.

I don’t want to spend the last four months of 2015 frustrated and champing at the bit. My plan is to approach variety from a slow and steady perspective: get to know my co-op, build trust, and make small changes as I go. If I can develop mathematical curiosity in my students and facilitate meaningful discussions around mathematical topics, I’ll be happy. Overcoming fear of math is one of the essential steps in building conceptual understanding. If I have to do some workbook problems along the way, then I’ll prepare myself for that, mentally.

3 Key Learnings

Take it slow: In the span of four days, I had my English class design a rubric, analyse a myth, complete a group presentation, provide their peers with feedback, argue for a mark, and offer suggestions to improve the rubric. It was too many new ideas, too fast. I need to scaffold their learning more: use a simpler project for their first round of peer feedback, let them design a rubric that I use to mark them, and save arguing for grades until later in the semester. Overall, students enjoyed the experience, but I want more than simple enjoyment – I want deep learning.

Diagnostics catch my assumptions: I collected several quick-writes from students, and I learned so much. Several students didn’t know their peers; group work was the first time some students had socially interacted with them; some students were frustrated by the idea of Aboriginal myths because they had already taken Native Studies 30; some students requested adaptations for their learning that the co-op knew nothing about. I did diagnosis by rote – now that I’ve experienced its value, I plan to do much more.

Specific direction is required for open assessment: I want students to have variety in their assessment and several ways to approach assignments. Being vague does not achieve this goal; it especially confuses and frustrates students who are trained to anticipate what a teacher is thinking and to meet unspoken expectations. For an open assignment to be effective, I need to give even more clear, explicit direction.

Although my philosophy on assessment and evaluation hasn’t changed much this year, my language when discussing it and my approach to achieving it have. It’s beautiful to be able to consistently learn so much. It makes the future look so much brighter, because my learning will have no end.

Behaviourism, Not Behaviour

Behaviourism: A psychological approach that values observable events over unobservable mental states.

Education Example: We set behavioural objectives, so that we can assess observable evidence of students meeting the objective. We use verbs like “create”, “recite”, and “draw” rather than “know”, “think”, or “understand.”

Behaviour: The way something moves, functions, or reacts.

Education Example: Normally a judgement that determines whether behaviour falls within social norms or outside them. We often punish for “bad behaviour” (speaking out, truancy).

As educators, we set behavioural objectives, and sometimes punish for bad behaviour. The former is deified by pedagogical experts, the latter condemned. That they share etymological roots confuses the issue for all students, both in public schools and in Teacher Education programs.

The breakdown of the argument:

  • We teach courses based on a curriculum.
  • Grades represent students’ performance relative to that curriculum.
  • Objectives are how we scaffold student progress towards curricular achievement.
  • We phrase objectives behaviourally so we can assess student progress.
  • Behaviour issues directly related to curricular content are rare.
  • Reducing grades for behaviour issues dilutes the meaning of the grade – it no longer reflects curricular knowledge.

Education and knowledge do not exist in a vacuum, separated from context – they cannot. So when we’re told to separate grades from behaviour, I understand. I understand the curricular argument, and I don’t disagree. However, schools are socializing agents – they shape the beliefs, values, and behaviours of students. We expect our education system to prepare socially aware and engaged citizens, but we give no structure to support that. Disclosure: my Canadian Armed Forces (CAF) experience predisposes me to regulation and centralized management. When we expect schools to regulate social norms but give no central planning to achieve that, it’s left to the discretion of each school district, perhaps each school. Each principal? Each teacher?

I don’t think that’s necessarily bad – I can think of many worse alternatives. But if we care so much about behaviour outcomes, why haven’t we designed a behaviour curriculum? Are we too busy sticking our heads in the sand, denying that schools are socializing our youth?

Simplicity vs. Objectivity: The Crackling Tension of Assessment Tools

Teachers could be compared to a piece of rope. We’re hardy and resilient, but stretched thin over long hours. We’re a necessary piece of equipment that supports a lot of people, but we’re rarely given a second thought unless we fail. But consider the plight of the rope chosen for tug of war. Two groups of people come along one day, pick up either end of the rope, and begin to pull with all their might. The poor rope in the centre has no choice but to bear all of their tension. For the tensions inherent in teaching, see this text; for an analysis of how easy it is to break even a strong rope, see this article.

The tension I want to focus on in this post is between making assessment tools simple (as argued by Davies, 2011) and making them objective (as demanded by many students and stakeholders). To clarify the terms: a simple assessment tool is easy to read and understand – clear, concise, and brief. An objective assessment tool gives the same result no matter who uses it – the perspective of the assessor does not affect the result of the assessment.

Most people involved in education fall somewhere in the middle of this rope’s span. The simplest assessment tool is a blank piece of paper – the assessor fills in whatever they want. It’s entirely subjective, completely simple, and not very useful. The most objective tool removes the need for the assessor – it could be filled out by a rock, and you’d get the same results. But nobody wants an assessor who has no opinions, so we need a bit more subjectivity.

You might be asking yourself, have I crossed my ropes? Aren’t there two tensions here, one between simplicity and complexity, the other between objectivity and subjectivity? Although the tension could be set up as such, it’s more meaningful to braid the ropes into one. Allow me to demonstrate:

You start with a blank piece of paper. Beautifully simple, but terribly subjective – the assessor can write whatever they want! So you decide to give the assessor some criteria. They now have categories they need to assess within. Objectivity increases, but simplicity decreases – you have words on your page now.

Next, you decide you need some way to convert this into grades, otherwise the board will be upset (an entirely different tension). So you throw down a scale (say, one to four) for each category. But you’ll need criteria to distinguish between the levels of the scale, so you write criteria for each level of each category. Your assessment tool is becoming more objective, but simplicity is rapidly vanishing. You have a full-scale rubric on your hands now.
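To make the buildup concrete, here’s a minimal sketch of what such a full-scale rubric might look like written out as a data structure – the category names and criteria below are placeholders of my own invention, not a recommended rubric:

```python
# A minimal sketch of a full-scale rubric as a data structure.
# Category names and criteria are placeholders, not a real rubric.
rubric = {
    "Clarity of argument": {
        1: "Main idea is absent or unclear",
        2: "Main idea is present but underdeveloped",
        3: "Main idea is clear and supported",
        4: "Main idea is clear, well supported, and insightful",
    },
    "Use of evidence": {
        1: "No supporting evidence",
        2: "Evidence present but loosely connected",
        3: "Relevant evidence, mostly well connected",
        4: "Relevant evidence, woven tightly into the argument",
    },
}

# The cost in simplicity is countable: every category multiplies the
# criteria an assessor must read by the number of scale levels.
total_criteria = sum(len(levels) for levels in rubric.values())
print(total_criteria)  # 8 criteria for just two categories
```

Even at two categories and a four-point scale, the assessor already has eight criteria to hold in mind – and real rubrics rarely stop at two categories.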

Thus the tension, as I’ve established it. But let’s take a step back – let’s go back to the blank page with a few categories written on it. If we have a strong assessor, who knows what they’re talking about, would descriptive feedback under those headings be enough? Would it be the most useful formative feedback to receive? We only started losing our simplicity and devaluing our assessor when we brought in the grades. Maybe we can stick to a page with some categories for all assessment. Maybe evaluation can be determined by the students after they’ve received lots of feedback from their assessment – they can decide what’s important and how to assign grades based on their performance. Maybe grades can become authentic.

Week 2 – The Graded Deceit

Polymaths are dead.

One person cannot be an expert in many fields – knowledge and terminology have become too specialized. We don’t have enough time to study multiple fields deeply.

Since we cannot fully understand the world around us, we put faith in things we don’t understand: we use cellphones without understanding coding; we drive cars without understanding engines; we bank online without understanding cryptography; we watch Netflix without understanding ISPs and streaming routes.

We place faith in mathematics too. We trust that an 89% is better than a 65%. We believe that a bell curve actually represents our world. And, despite mountains of evidence to the contrary, most of us believe that school grades measure something intrinsic to a student.

Because we cannot all understand everything, we have faith in a contradiction. We believe that all humans are equal and we believe that all students can be ranked, perfectly, by intelligence.

We can be very specific in our ranking. We can say, “Johnny got an 83.3% this semester.” We can also be very unspecific. We can say, “Johnny passed this semester.” The first option ranks Johnny on a 1,000-point scale – we give a grade and claim it is within one one-thousandth of Johnny’s ability. He’s an 83.3% – not 83.4%, not 83.2%. The second option ranks Johnny on a two-point scale – he either passed or failed.

Let me put it mathematically – grades are an optimization problem. Society demands a ranking of its children. The more specific the ranking, the less accurately it reflects a child’s ability. As you get less specific, the ranking becomes more accurate (a one-point scale says that Johnny is a child. That’s it. Not pass or fail; he just is.) As teachers, we need to find the optimization point for our students and for our communities.

[Figure: The rate of change (steepness of curve) is not fixed; thus, there are no labels on the horizontal axis.]
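To illustrate the optimization, here’s a minimal sketch of one way the tradeoff could be modelled – assuming, purely for illustration, that a measured grade is a student’s “true” ability plus Gaussian noise. The noise model, its parameters, and the binning are my assumptions, not a claim about how real grading behaves:

```python
import random

def agreement_rate(levels, noise_sd=5.0, trials=10_000):
    """Estimate how often a noisy grade lands in the same bin as a
    student's 'true' ability when 0-100 is cut into `levels` equal
    bins. The Gaussian noise model is purely illustrative."""
    bin_width = 100 / levels
    hits = 0
    for _ in range(trials):
        true_ability = random.uniform(0, 99.999)         # hypothetical ability
        measured = random.gauss(true_ability, noise_sd)  # noisy measurement
        measured = min(99.999, max(0.0, measured))       # clamp to the scale
        if int(true_ability // bin_width) == int(measured // bin_width):
            hits += 1
    return hits / trials

# Finer scales agree with "truth" less often: specificity costs accuracy.
for levels in (1, 2, 10, 1000):
    print(levels, round(agreement_rate(levels), 3))
```

Under these assumptions, a one-point scale is always “right,” a two-point scale is right most of the time, and a 1,000-point scale almost never lands a student in their true bin – which is the shape of the curve above.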

The Beginnings of Assessment

I’d been thinking about assessment and evaluation this week, before I went to the first ECS 410 class. Specifically, in a mathematics classroom, I was considering how to administer re-tests, thinking in terms of both tests and assignments. What I had in mind, before the class, was to offer several rewrites, perhaps as many as required, but with the potential gain from each rewrite decreasing. I also considered implementing “peer-coaching periods,” in which students would spend a class helping each other prepare for re-tests. That way, those skilled in certain topics could share (and expand) their knowledge through coaching, while those who needed help would get one-on-one assistance. It would also free me to circulate and give more personalized feedback.
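As a sketch of that pre-class idea, here’s one way the decreasing-gain rule could have been formalized – the blending rule and the decay factor are placeholders I invented for illustration, not a worked-out policy:

```python
def rewrite_grade(original, retest_scores, decay=0.75):
    """Blend each rewrite into the grade, with each successive attempt
    able to recover a shrinking fraction of the remaining gap.
    The decay factor is an illustrative placeholder."""
    grade = original
    for attempt, score in enumerate(retest_scores, start=1):
        recoverable = max(0, score - grade)        # only improvements count
        grade += recoverable * decay ** attempt    # gain shrinks per attempt
    return grade

print(rewrite_grade(60, [90]))      # 82.5 - first rewrite recovers 75% of the gap
print(rewrite_grade(60, [90, 90]))  # ~86.7 - a second rewrite adds much less
```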

I still like the idea of peer coaching – but ECS 410, in just one class, has made me question my re-testing policy. Specifically, why should rewrites have decreasing returns? And how traditional are my concepts of assignments and tests in this scheme – should they be the element revised, rather than the re-testing principle? I need to think on this.

During the Math OCRE on Friday, we had a principal from Saskatchewan in as a guest speaker. He described the way some of his school’s math courses are run – specifically, the Workplace and Apprenticeship Pathway. Essentially, his school’s program worked like this:

1) Students are given a workbook/chapter/section (reference material of some sort) and teach themselves a concept.

2) Working in groups is strictly forbidden – students must work alone with the material. To work with others is considered cheating.

3) Students write a short test. They are required to pass every question on the test. If they are unsuccessful, they are given verbal feedback on which material was incorrect. They must study more, then reattempt an identical test.

4) When they have successfully completed a test, they are given the next unit of reference material, and the process repeats.

I have several issues with this. The first and largest is the lack of teamwork. To call collaboration cheating, especially in a course designed to prepare the student for a Workplace or an Apprenticeship, is incredibly naïve. Secondly, I find the use of a single identical test troubling. To paraphrase the principal: if they can answer this question, they have demonstrated their understanding of the outcome, and have thus completed that curricular objective successfully. I don’t believe that a single question can demonstrate outcome completion – what about choice and application? Decision making and problem solving? One question to demonstrate competence and understanding is not very holistic.

I’m excited for this class. I think I have a lot to learn about assessment and evaluation, and I can’t wait to do so. It matters to me, it will matter to my students – what more can you ask for from a university Education credit?