UIC TA Handbook - Test Development
By Julian J. Szucko
UIC Office of Testing Services
Classroom Exams
In developing classroom exams, as in all other testing situations, the two most important considerations are reliability and validity. Reliability refers to the consistency with which a test measures some construct, that is, the extent to which scores would remain the same if the identical test or an equivalent form were administered on different occasions. Validity refers to the extent to which a test measures what it is designed to measure. A valid mathematics test, for example, measures mathematical ability, not reading skills, spelling ability, or other unrelated skills. Reliability is usually expressed quantitatively as a correlation coefficient. Though validity can also be measured quantitatively through a test's correlation with an external index of performance, classroom exams are usually evaluated through their content validity: how closely the material on the test parallels the content and skills taught in the course.
The task of every test developer is to maximize validity. It is important, however, to note that validity is dependent on reliability. You cannot have a valid test unless that test is reliable.
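As an illustration of reliability expressed as a correlation coefficient, the sketch below computes the Pearson correlation between scores from two administrations of the same test. The function name `pearson_r` and the score lists are hypothetical, chosen only for this example; other reliability estimates exist, and this is one simple way to look at it.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary on both administrations")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five students on two occasions
first = [78, 85, 62, 90, 71]
second = [80, 83, 65, 92, 70]
reliability = pearson_r(first, second)
print(f"test-retest correlation: {reliability:.2f}")
```

A value close to 1 indicates that students kept roughly the same relative standing on both occasions; a value near 0 indicates that the scores were inconsistent.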
The first decision, then, is what kind of test to administer. Most tests fall into two broad categories: multiple-choice exams and their variants, including true/false formats, where the test taker must recognize a correct response; and essay exams and their variants, including fill-in-the-blank items, where the test taker must generate a correct response. The first table at the end of this section lists some of the advantages and disadvantages of these two formats.
In general, multiple choice exams are more reliable than essay exams. However, there are situations in which the testing objectives can best be met by an essay exam. Though the focus here will be primarily on multiple choice exams, most of these guidelines are applicable to essay exams.
Developing a Test
Careful planning is the first important consideration in effective test development. Writing exam items will be easier if you keep track of the concepts that you cover in each class period. If you create a table with content areas along the side and applicable thinking skills along the top, you can create a test blueprint that specifies the relative importance of each content-behavior area. Table 2 provides an example. The classification of thinking skills in this table is based on Bloom's taxonomy (Bloom, 1956).
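A blueprint of this kind can be sketched as a small table of weights. In the example below, the content areas, the three thinking-skill columns, the weights, and the function name `allocate_items` are all hypothetical; they are not taken from Table 2. The sketch simply converts relative weights into a number of items per cell.

```python
# Hypothetical blueprint: rows are content areas, columns are thinking skills.
# Each cell holds the relative weight (here in percent) of that content-behavior area.
blueprint = {
    "Reliability": {"Knowledge": 10, "Comprehension": 10, "Application": 10},
    "Validity": {"Knowledge": 10, "Comprehension": 15, "Application": 15},
    "Item writing": {"Knowledge": 5, "Comprehension": 10, "Application": 15},
}

def allocate_items(blueprint, total_items):
    """Turn relative weights into an item count for each cell."""
    total_weight = sum(w for row in blueprint.values() for w in row.values())
    return {
        area: {
            skill: round(total_items * weight / total_weight)
            for skill, weight in row.items()
        }
        for area, row in blueprint.items()
    }

plan = allocate_items(blueprint, 40)
for area, row in plan.items():
    print(area, row)
```

With other weights or test lengths, rounding can make the cell counts add up to slightly more or fewer items than requested, so the totals should be checked and adjusted by hand.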
In addition to following the above blueprint to ensure validity, some simple guidelines will help maximize reliability:
- When writing exam questions, use simple, precise, and unambiguous language. A student's performance should reflect his or her knowledge of the subject, not his or her ability to decipher the meaning of the question.
- The structure and syntax should promote ease of understanding.
- Avoid trick questions.
- Use unambiguous wording.
- Be precise.
- Exclude extraneous or irrelevant information.
- Each item should be based on a single, clearly defined concept rather than on an obscure or unimportant detail.
- Do not include irrelevant or superfluous information in the question. Such information makes the question more difficult to understand.
- Write questions that cover the appropriate range of thinking skills, not just recall. Multiple choice questions should not be limited to rote learning.
- Avoid stating the question in a negative form:
- Negatively phrased questions are often misread and are more difficult to understand.
- Do not use double negatives. Following a negative stem with a negative alternative benefits only the students who are able to follow the logical complexity of such items.
- All questions should be independent.
- Avoid questions that require knowledge of the answer to other questions.
- Take care not to give away answers to one question in any of the others.
- Do not give away clues to the right answer:
- Test-wise students quickly learn that correct answers are often:
- a) longer
- b) either more qualified or more general
- c) worded in familiar language, or
- d) a grammatical extension of the stem
- Students also learn that incorrect answers often:
- a) are listed first or last
- b) contain extreme words
- c) contain unexpected technical terms, and
- d) include unreasonable statements
- Avoid using "all of the above" as an answer selection. Recognizing one wrong option will eliminate this alternative while recognizing two correct options identifies this as the answer.
- It is acceptable to vary the number of alternatives on the items. There is no psychometric advantage to having a uniform number. Three plausible distractors are better than four implausible ones.
- Try to minimize the effects of irrelevant personality factors by advising all students that it is to their advantage to answer every question.
- Avoid systematic patterns for correct responses.
- Check all alternatives for typos. Typos are most frequently left in the distractors.
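One way to avoid systematic patterns in the position of correct responses is to shuffle the alternatives of each item at random and then tally where the keys ended up. The sketch below assumes a hypothetical item format (a correct answer plus a list of distractors) and a hypothetical helper named `shuffle_alternatives`; it is an illustration, not a prescribed procedure.

```python
import random
from collections import Counter

def shuffle_alternatives(correct, distractors, rng):
    """Return the shuffled options and the index of the correct one."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return options, options.index(correct)

# Hypothetical items: (correct answer, distractors)
items = [
    ("Paris", ["Lyon", "Nice", "Lille"]),
    ("7", ["5", "6", "8"]),
    ("Mercury", ["Venus", "Mars"]),  # the number of alternatives may vary
]

rng = random.Random()  # use random.Random(1), for example, for a reproducible form
keyed = [shuffle_alternatives(correct, distractors, rng) for correct, distractors in items]

# Tally how often each position holds the key, to spot accidental patterns
position_counts = Counter(key for _, key in keyed)
print(position_counts)
```

On a short test a random shuffle can still produce runs or lopsided counts by chance, so the tally is worth a quick look before the exam is printed.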
Scoring of multiple choice exams is a simple process, best accomplished by submitting the scannable test forms to the Test Scoring Office. Scoring essay exams, on the other hand, is not a trivial matter. The following guidelines for scoring essay exams may increase reliability:
- Avoid identifying students' answer sheets. Scores should reflect the content of the answer, not bias that may be introduced by your knowledge of the student.
- Grade each question across all examinees before moving on to the next question. This will reduce the scoring error introduced when an answer on one question influences the score of another.
- Try to establish anchors for the entire range of scores. Ideally, a representative sample of answers needs to be read before assigning any scores.
Though not strictly part of the scoring process, following a few additional guidelines may further increase the reliability of your test:
- Your instructions should clearly state what students should do if they don't know an answer. If you want students to demonstrate even minimal knowledge, instruct them to try answering every question rather than leaving an answer blank. Without such instructions, answers will be influenced by personality factors rather than knowledge. More timid students may skip the question while bolder students will try to "bluff."
- Avoid giving students choices of questions to answer. You want everyone to take the same test. Reliability is difficult enough to attain without complicating the situation with what are essentially multiple test forms.
Despite your best efforts, you cannot judge the adequacy of your test until it has been administered and you have analyzed the responses. The analysis of your students' test results will help you assign grades and gauge the performance of your students. In addition, an analysis of the answers given for individual questions will provide suggestions for how weak questions may be improved. It can also suggest content areas that are not receiving adequate coverage in class.
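A question-by-question analysis can be sketched with two commonly used statistics, neither of which is named in this handbook: item difficulty (the proportion of students answering correctly) and an upper-lower discrimination index (the proportion correct in the top-scoring half minus the proportion correct in the bottom-scoring half). The data and the function name `item_analysis` below are hypothetical, and the half-and-half split is only one of several conventions.

```python
def item_analysis(scores):
    """scores: one list per student, with 1 = correct and 0 = incorrect per item."""
    n_students = len(scores)
    if n_students < 2:
        raise ValueError("need at least two students")
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    # Rank students by total score, then split into upper and lower halves
    # (with an odd number of students, the middle student is left out)
    order = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    half = n_students // 2
    upper, lower = order[:half], order[-half:]
    results = []
    for j in range(n_items):
        difficulty = sum(row[j] for row in scores) / n_students
        p_upper = sum(scores[i][j] for i in upper) / half
        p_lower = sum(scores[i][j] for i in lower) / half
        results.append({
            "item": j + 1,
            "difficulty": difficulty,
            "discrimination": p_upper - p_lower,
        })
    return results

# Hypothetical responses: six students, four items
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
results = item_analysis(scores)
for row in results:
    print(row)
```

In this made-up data, everyone answers item 1 correctly, so it does not separate stronger from weaker students, while item 4 is answered correctly more often by the lower-scoring half. A negative value like that usually suggests the question is ambiguous or miskeyed and should be reviewed.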