It had been a long day. It was the kind of day where, as an administrator, you feel that proverbial fires are being set on purpose to test your patience and commitment as a leader.
Despite the challenges, a support staff member and I still had to conduct a post-conference with a teacher. I often encourage at least two individuals to conduct each teacher observation. We had completed an observation the previous day and agreed to write up our notes that night and compare them the next day. That comparison never happened, due to the challenges that the day of the post-conference presented.
As we sat down with the teacher for the post-conference, we began to list the strengths and areas of concern we had noted. What became painfully clear was that the strengths and areas for improvement each of us saw were polar opposites.
Our interrater reliability was poor.
This inconsistency hurt our teacher's ability to grow. Ultimately, our students suffered by not receiving the best instruction possible… simply because we did not take the time to ensure we were on the same page.
Interrater reliability is the degree to which the raters of an observation reach consensus in the ratings they provide.
Throughout my four years as an administrator, I have tried to find ways to ensure that interrater reliability exists among all staff within our building. This includes teachers, who are encouraged to conduct observations of each other and of themselves. What has become abundantly clear is how pivotal a role technology can play in making this goal possible.
The following are three ways to achieve interrater reliability with the assistance of technology.
1. Analyze the Game Tape
Interrater reliability is often affected by how raters remember an observation. It is common for a post-conference to produce three separate interpretations of the same event:
Observer #1: “I do remember the student stating that they did not understand the second step to solving that problem. I took note of this.”
Observer #2: "I thought the student said they needed help completing the second step, not that they did not understand what the step was."
Teacher: "No, the student asked for help… I think."
Eliminate this confusion by recording your observations. Plenty of technologies allow an observation to be recorded. Once recorded, the video can be uploaded to a central location, such as cloud storage, where playback is easily accessible and can be viewed on demand.
Eliminate misinterpretations and increase your interrater reliability by simply pulling up the game tape.
Suggested Technology: The SWIVL device is a great tool for recording observations; videos can be easily uploaded to cloud storage and viewed by anyone with the appropriate username and password. It even allows comments to be added at specific points in the video.
2. Practice, Practice, Practice
The key to reaching a high level of interrater reliability is simply to practice. That means conducting random observations throughout the year with the sole purpose of determining how strong the reliability is among stakeholders.
The benefit of these practice rounds is that the pressure of an observation is lifted when everyone understands its function is simply to achieve fidelity in using a rubric.
The first step is to conduct an observation and record it (as mentioned in step one). Next, have administrators, teachers, and paraprofessionals view the recording of the observation.
Each person will use the same observation rubric when viewing the recording. The important part is that each person completes the rubric independently.
The final step is for all participants to bring their completed observations together and compare notes.
This activity provides a forum for an open and honest dialogue about what each stakeholder should be looking for in an observation. What you should find is that, despite the use of a standard observation form, everyone interprets the information differently. With enough practice, you will begin to see the interrater reliability increase dramatically.
Suggested Technology: Google Docs remains an easy way to compare notes in a central document. Simply type the rubric into a Google Doc, and everyone who has access can view feedback in real time as the comparison of the observation is shared.
3. Throw Quantitative Measures Into the Mix
The numbers don't lie. After completing these activities, you can use raw data to determine whether interrater reliability truly exists.
Comparing notes, reflecting on what to look for during observations, and practicing are extremely valuable activities. However, it is just as important to use an interrater reliability calculator to determine how much cohesion actually exists.
For at least two observations every month, use an observation rubric that contains a rating scale. Take the ratings assigned by each observer and run them through an interrater reliability calculator. This tool can easily be created and provides an objective measure of the level of agreement among the stakeholders in your school.
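If you are curious what such a calculator does under the hood, here is a minimal sketch in Python, assuming two observers who scored the same ten rubric items on a 1–4 scale (the observer names and sample ratings below are hypothetical). It reports simple percent agreement alongside Cohen's kappa, a standard statistic that corrects for the agreement two raters would reach by chance.

```python
# A minimal interrater reliability calculator for two observers.
# Assumes both observers scored the same rubric items on a 1-4 scale;
# the sample ratings below are hypothetical.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of rubric items on which both observers gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same rating at random.
    expected = sum(
        (counts_a[r] / n) * (counts_b[r] / n)
        for r in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Ratings for ten rubric items from two observers (hypothetical data).
observer_1 = [3, 4, 2, 3, 3, 4, 2, 1, 3, 4]
observer_2 = [3, 4, 3, 3, 2, 4, 2, 1, 3, 3]

print(f"Percent agreement: {percent_agreement(observer_1, observer_2):.0%}")
print(f"Cohen's kappa:     {cohens_kappa(observer_1, observer_2):.2f}")
```

A kappa near 1 indicates strong agreement, while a value near 0 means the raters agree no more often than chance would predict.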
Suggested Technology: A Short Video on Interrater Reliability. This video walks through just one way to create an interrater reliability calculator; the method is simple to use and provides the data needed to verify your level of cohesion.
Teacher growth is the foundation of student learning. To achieve this growth, stakeholders must ensure they are on the same page when providing the feedback they deem necessary to improve practice.
A conscious effort by educators to achieve interrater reliability increases the likelihood of improved performance by both teachers and students. It also eliminates the confusion and resentment that can result from competing perceptions.