Editor’s Note: The study to which the author refers – “Evaluating the Content and Quality of Next Generation Assessments” – can be found here: https://edexcellence.net/publications/evaluating-the-content-and-quality-of-next-generation-assessments
I’ve always loved assessment and understood its power, so when I saw that the Fordham Institute was planning a study of assessments, I immediately sent an email to find out how I could be a part of the project. After sending my resume, I was pleasantly surprised to find I had made the cut! I served on the team that reviewed 5th grade math assessments. This experience rates as one of the best professional development opportunities I have had. We participated in a few webinars where we learned about the methodology and received training from the testing companies, whose sessions introduced us to how each test was set up. After the webinars, we came together in Washington, DC to learn more about the work into which we were about to delve.
The Fordham study evaluated the content and quality of three next-generation assessments and one state-level assessment in English Language Arts and Mathematics. Assessment items were analyzed for grades 5 and 8, which were chosen because they represent the capstone grades for elementary and middle school. High school assessments were analyzed in a separate but related study. The assessments chosen for the study were Smarter Balanced, PARCC, ACT Aspire, and the Massachusetts Comprehensive Assessment System (MCAS). Smarter Balanced, PARCC, and ACT Aspire were selected because they are the new kids on the assessment block, specifically designed to assess the Common Core State Standards. MCAS was chosen because it is widely regarded as one of the best state assessments available.
The methodology of the study was new, and several criteria were used to determine the alignment of content and quality. In mathematics, five criteria were analyzed, and each criterion received one of four scores: Excellent Match, Good Match, Limited/Uneven Match, or Weak Match. First, we analyzed each item for alignment to the standard it claimed to assess; if we didn’t agree with the match provided by the assessment, we supplied the standard we found to be better aligned. In math, this information was used to determine the percentage of the test assessing the Major Work of the grade and the percentage of items assessing off-grade-level content. On average across their items, both PARCC and Smarter Balanced received a Good Match, while MCAS received a Limited/Uneven Match and ACT Aspire received a Weak Match.
Next, we determined which aspects of Rigor each item assessed: conceptual understanding, fluency/skill, or application. We also indicated whether an item was a combined item, meaning it assessed more than one aspect of Rigor. This information was used to determine the balance of Rigor in the assessment. While looking at application, we also noted whether the application problems had shallow contexts. In mathematics application items, you want to see problems with both depth and multiple entry points for students; problems that allow students to simply pull out the numbers and solve are considered to have shallow contexts. Rich contexts are important for students because life does not hand us tidy math problems to solve but instead asks us to make sense of the context. This was especially noticeable in the 5th grade PARCC math assessment. For this criterion, MCAS received an Excellent Match, while PARCC, Smarter Balanced, and ACT Aspire each received a Good Match.
We looked at the Standards for Mathematical Practice next. Here we were only looking to see whether the items that assessed the mathematical practices also assessed content. All of the assessments reviewed received an Excellent Match on this criterion. As reviewers, we gave feedback to Fordham on all the criteria, but this particular part of the review concerned us, since it was essentially an all-or-nothing score. My hope for the next study would be either to look at the alignment of the items to the mathematical practices or to look at the balance of the practices across the assessments.
For criterion four, we analyzed the Depth of Knowledge (DOK) of each item using Webb’s Depth of Knowledge framework, which allows educators to analyze the cognitive demand required of students and use that information to ensure alignment to the Standards. This information was used to determine how well the distribution of DOK across the assessment items matched the distribution called for by the Standards. Too many low-DOK items could lower the score, but so could too many high-DOK questions; it is important for assessment items to match the cognitive demand the educational standards call for. With too many low-DOK items we are not challenging our students enough, and with too many high-DOK questions we may be setting our students up to be unsuccessful. MCAS received an Excellent Match; PARCC and Smarter Balanced both received a Good Match, and ACT Aspire received a Limited/Uneven Match.
Finally, we determined the item types and assessed item quality, looking for both operational quality and editorial quality. For operational quality, for example, we checked for mathematical accuracy and whether unintended answers were possible. All of the assessments included a variety of item types. For this criterion, ACT Aspire and MCAS both received an Excellent Match; PARCC received a Good Match, and Smarter Balanced received a Limited/Uneven Match.
I was energized by participating in this work. For several years I have worked with my districts and schools to analyze their classroom assessments for quality and alignment to educational standards. Being involved in the project brought my thinking to a new level. I would highly encourage readers to read the report and then make a commitment either to increasing the quality of assessments in their districts, schools, and classrooms or to improving the quality of their state assessments. In any case, use the criteria in the report to begin looking at assessments and talking to your colleagues, your administration, your state-level personnel, or your representatives. Your choice to advocate for high-quality assessments makes an impact for students. Students deserve assessments that are worthy of their time, and we deserve assessments that yield valid evidence about our students.