Accelerate Assessment Validity by Avoiding Common Mistakes in Assessment Item Writing

Concurrent Session 1


Brief Abstract

In this presentation, participants will discover common mistakes in writing assessment items that can interfere with a test's ability to accurately measure student achievement of learning outcomes. They will also discuss and practice corresponding best practices that lead to better, more valid items.

Presenters

Eric Orton is an instructional designer with BYU's Division of Continuing Education, working on BYU Online semester-based courses. He previously worked in instructional design and faculty development at both Boise State University and the University of Iowa. He holds a B.S. in Elementary Education from BYU and a Master’s in Instructional Technology from Utah State University. Outside of work, Eric can be found hitting softballs, hunting and fishing, or skiing, depending on the season.

Extended Abstract

Despite the growing use of authentic, alternative assessment strategies, traditional assessments still play important roles in many online and blended courses. When using these assessments, faculty must be concerned with the validity of the items. Do students’ answers actually show that they have achieved the learning outcomes the items are intended to assess?

Assessment item validity is damaged when factors other than mastery of the desired learning outcome (or construct) allow students to answer an item correctly at a rate that varies significantly from what random guessing would predict. Validity is also damaged when students who have actually met the learning outcomes answer items incorrectly at an unexpected rate. Both effects are forms of construct irrelevant variance (CIV).

Much as a poker player looks for “tells” in the speech and behavior of opponents, students who lack the knowledge needed to confidently respond to an assessment item may look for “tells” in the question and the answer choices that can point them to the correct answer. We call this behavior testwiseness, and it is one of the most common causes of CIV. Faculty and instructional designers who understand best practices for item writing can avoid such “tells,” making it harder for students to answer items correctly simply by being testwise. This improves the validity of assessments in showing whether students have actually achieved the desired learning outcomes.

By the same token, test writers sometimes unduly reduce students’ ability to answer an item correctly by unwittingly obscuring the learning outcome with irrelevant factors. Just as any business must account for some amount of “overhead” that is not directly related to its business goals, any test requires students to process some information that is not directly related to the learning outcome. This informational “overhead,” or cognitive load, is what allows students to understand what is being asked or to navigate the test in general. Some items, however, can present a cognitive overload, creating CIV by introducing information, constructs, or references that are both irrelevant to the learning outcome and unnecessary to the test. Much of this cognitive overload can be eliminated when faculty and instructional designers focus on making test items and formatting as intuitive as possible and on removing irrelevant language and cultural barriers.

This presentation will review common mistakes in writing assessment items. It will also explain corresponding best practices that will lead to better, more valid items.

Attendees will learn to write assessment items that are less subject to testwiseness, avoid cognitive overload, and are thus more valid. They will leave with access to slides, posted on the conference website, containing overviews of the common mistakes and corresponding best practices presented, along with authentic examples of each.

Level of Participation:

Attendees will examine and discuss assessment items drawn from actual courses. They will collaborate in small groups to critique these examples according to the principles presented and propose improvements to the items. Attendees will also have the opportunity to work on improving assessment items from their own courses, receive feedback from peers, and share the improvements they have made with the larger group.

Session Goals:

  1. Define “construct irrelevant variance.”

  2. Explain several common mistakes in writing assessment items that can contribute to construct irrelevant variance.

  3. Write assessment items that reduce the potential for construct irrelevant variance resulting from testwise student behaviors and excess cognitive load.

  4. Increase the validity of assessment items by employing item-writing best practices.