Sorry to Burst Your Bubble Sheet! Is it Time to Rethink High-Stakes Testing to Fit Your Institutional Needs?

Concurrent Session 3

Session Materials

Brief Abstract

This presentation will focus on evaluating existing technology and workflows for high-stakes testing, piloting a new testing paradigm, and conducting further iterative evaluation to support continuous improvement of practice and, ultimately, student success.


As DoIT’s Coordinator of Learning Analytics, Dr. Penniston leverages institutional, academic, and learning analytics to inform course redesigns and improve student engagement and success. He collaborates closely with DoIT’s Analytics and Business Intelligence team, the Office of Institutional Research and Decision Support (IRADS), and of course, faculty, including recipients of learning analytics 'mini-grants,' which are part of UMBC’s Learning Analytics Community of Practice. Tom has been involved in education for over two decades, teaching students ranging in age and skill level from early elementary to doctoral, in both domestic and international settings (including as a Peace Corps Volunteer in Moldova). He has worked in online and blended learning in different capacities for 15 of those years as an instructor, builder, and administrator. In addition to writing analytics-related DoIT News stories, Tom has co-authored book chapters and peer-reviewed publications, and presents at conferences, including EDUCAUSE and the American Educational Research Association (AERA). Tom earned his PhD through UMBC’s Language, Literacy, and Culture program, and has extensive experience with quantitative, qualitative, and mixed-methods designs.

Extended Abstract

Our institution, like many within the higher education community, has been using bubble-sheet testing technology seemingly since the advent of the #2 pencil. From a practical standpoint, this tool has not only grown increasingly antiquated but has also become a resource drain for our Division of Information Technology, particularly during high-volume testing periods throughout academic terms. Our division has been responsible not only for scanning completed bubble sheets through specialized machines but also for storing the physical sheets when instructors choose not to return them to students.

Despite adjustments to student assessment amidst the pivot to online instruction during the pandemic, faculty still need practical means to deploy and score high-stakes tests as a component of their pedagogy, particularly in high-enrollment gateway STEM courses. Moving fully virtual during the pandemic accentuated this need, as faculty en masse deployed their assessments online. Many struggled with concerns about academic integrity, weighing their reluctance to use monitoring software, which some perceived as technically arduous, an invasion of privacy, or both, against the practical limitations of our Chegg-friendly world. With the return to fully face-to-face instruction as the standard mode of content delivery, the question of how best to support standardized testing has only become more pronounced.

Our existing hardware-based scoring solution also failed to provide critical insights, at scale, regarding student success. Although we have been able to generate individual assessment-level reports with testing specifics, including traditional item analyses, there was no reasonable, readily adoptable means to pull these data into our warehouse and add academic and demographic context for statistical modeling and reporting without dedicating significant human resources to developing a complex internal extract, transform, and load (ETL) process. In short, the existing technology served our contemporary testing needs in much the same way an abacus would address modern accounting needs.

All of these issues, including the need for valid measures of student learning, for academic integrity, and for scalable, data-informed decision making, led our institution to launch a pilot in Spring 2020 and then adopt an alternative solution in Fall 2021. This migration has not only reduced the staff time devoted to test scoring but has also allowed us to analyze data to describe and infer the impacts of test design on student outcomes, specifically DFW rates. The most consequential finding from this analysis is the relative impact of overall course design on DFW rates. Future analyses may benefit from more diverse testing pools spanning additional courses, departments, and colleges. It may also be possible to combine these test data with LMS assessment data for a more comprehensive view of the relationship between course outcomes and various indicators of assessment design. Systematizing an ETL process to pull these data into our warehouse would help streamline these analyses.

The broader conversation concerning high-stakes testing and academic integrity continues to move forward. Spurred by campus need and encouraged by the promise of our iterative pilot, evaluation, and messaging approach to change management, we have gathered additional data to inform the potential adoption of an in-person campus testing center.

We will use a variety of strategies to engage session participants interactively throughout the presentation, including embedded questions and polls leveraging audience responseware to highlight the content’s relevance for the community. Attendees will learn about trends in formative and summative assessment; tools and approaches for high-stakes testing; and means of change management through monitoring, evaluation, and messaging in online and hybrid learning environments, which they can take away from the session and apply at their own institutions.