Interpreting the Index of Discrimination
The index of discrimination is a
useful measure of item quality whenever the purpose of a test is to produce a
spread of scores, reflecting differences in student achievement, so that
distinctions may be made among the performances of examinees. This is likely to
be the purpose of norm-referenced tests.
For the subset of
criterion-referenced tests known as mastery model tests, we desire that all
examinees score as high as possible. We do not wish to distinguish among
examinees who score at mastery level and therefore are not interested in
maximizing test score variance. In such cases the index of discrimination is
not useful and other measures, such as sensitivity to instruction, are used to
judge item quality.
A basic consideration in evaluating
the performance of a normative test item is the degree to which the item
discriminates between high achieving students and low achieving students.
Literally dozens of indices have been developed to express the discriminating
ability of test items. Most empirical studies have shown that nearly identical
sets of items are selected regardless of the indices of discrimination used. A
common conclusion is to use the index which is the easiest to compute and
interpret.
Such an index of discrimination is
shown on the item analysis reports available from the Scoring Office. This
index of discrimination is simply the difference between the percentage of high
achieving students who got an item right and the percentage of low achieving
students who got the item right. The high and low achieving students are
usually defined as the upper and lower twenty-seven percent of the students
based on the total examination score. This difference in percentages is
expressed as a whole number as a matter of convenience.
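
The computation described above can be sketched in Python. This is only an illustration, not the Scoring Office's actual procedure: the function name `discrimination_index`, its parameters, and the sample data are assumptions made for the example, and ties at the group boundaries are broken arbitrarily.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower index of discrimination as a whole-number percentage difference.

    total_scores: total examination score for each student
    item_correct: 1 if the student answered the item correctly, else 0
    fraction:     share of students placed in each of the upper and lower groups
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("total_scores and item_correct must have the same length")
    group_size = max(1, round(n * fraction))

    # Rank students by total score, highest first (ties keep their input order).
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]

    pct_upper = 100 * sum(item_correct[i] for i in upper) / group_size
    pct_lower = 100 * sum(item_correct[i] for i in lower) / group_size
    return round(pct_upper - pct_lower)


# Hypothetical data for ten students (illustration only).
scores = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(discrimination_index(scores, item))  # 67
```

With ten students, each group holds three examinees: all three in the upper group and one of three in the lower group answered correctly, so the index is 100 minus 33, or 67 after rounding.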
A useful rule of thumb in
interpreting the index of discrimination is to compare it with the maximum
possible discrimination for an item. The maximum possible discrimination is a
function of item difficulty. When half or fewer of the combined upper and lower
groups answered the item correctly, the maximum possible
discrimination is the sum of the percentages of the upper and lower groups who
answered the item correctly. For example, if 30% of the upper group and 10% of
the lower group answered the item correctly, the maximum possible
discrimination is 30 plus 10, or 40. This maximum possible discrimination would
occur when 40% of the upper group and none of the lower group answered the item
correctly.
Note that the actual discrimination
of the example is 20. It might be said that the discriminating efficiency of
the item, which is the ratio of the actual discrimination to the possible
discrimination, is 50%. See Item A in Table 1.
When more than half of the combined
upper and lower groups answered an item correctly, the maximum
possible discrimination is 200 minus the sum of the percentages of the upper
and lower groups who answered the item correctly. For example, if 96% of the
upper group and 84% of the lower group answered the item correctly, the maximum
possible discrimination for the item would be 200 minus 180 (96 plus 84), or
20. Since the actual index of discrimination for the item is 96 minus 84, or
12, the discriminating efficiency of the item is 12/20 or 60%. See Item B in
Table 1.
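
The two rules of thumb can be combined into a short Python sketch. The function names `max_discrimination` and `discriminating_efficiency` are hypothetical, and the sketch assumes the upper and lower groups are the same size, so that "half of the combined groups" corresponds to the two percentages summing to 100.

```python
def max_discrimination(pct_upper, pct_lower):
    """Maximum possible discrimination for an item, given the percent correct
    in the upper and lower groups (assumes equal-sized groups)."""
    total = pct_upper + pct_lower
    # Half or fewer of the combined groups correct means total <= 100.
    return total if total <= 100 else 200 - total


def discriminating_efficiency(pct_upper, pct_lower):
    """Actual discrimination as a percentage of the maximum possible."""
    actual = pct_upper - pct_lower
    possible = max_discrimination(pct_upper, pct_lower)
    if possible == 0:
        return None  # everyone or no one answered correctly; efficiency is undefined
    return 100 * actual / possible


# The two worked examples from the text.
print(max_discrimination(30, 10), discriminating_efficiency(30, 10))  # 40 50.0
print(max_discrimination(96, 84), discriminating_efficiency(96, 84))  # 20 60.0
```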
It is important to recognize that an
item which half of the students answer correctly has the highest possible
discriminating potential. Consider an item which 80% of the upper group and 20%
of the lower group answer correctly. According to the rule of thumb for items
answered correctly by half or fewer of the students, the maximum possible
discrimination of the item is 80 plus 20, or 100. Since the index of
discrimination of the item is 80 minus 20, or 60, the discriminating efficiency
is 60%. See Item C in Table 1. As the difficulty of an item moves away from this
midpoint, so that either more or fewer than half of the combined upper and
lower groups answer the item correctly, the maximum possible discrimination
decreases from 100. The lower limit of the maximum possible discrimination is
zero when all of the combined upper and lower groups, or none of them, answer
an item correctly.
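
To see how the ceiling on discrimination changes with difficulty, the rule of thumb can be tabulated across the whole difficulty range. This is a rough sketch under the same equal-group-size assumption; `max_discrimination_at` is a hypothetical helper name, and `difficulty_pct` here means the percent of the combined upper and lower groups answering correctly.

```python
def max_discrimination_at(difficulty_pct):
    """Maximum possible discrimination when difficulty_pct percent (0 to 100)
    of the combined upper and lower groups answer the item correctly."""
    total = 2 * difficulty_pct  # upper % plus lower %, for equal-sized groups
    return min(total, 200 - total)


# The ceiling rises to 100 at 50 percent correct and falls to 0 at either extreme.
for pct in range(0, 101, 10):
    print(pct, max_discrimination_at(pct))
```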
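Table 1. Discrimination values for the example items

Item   Upper %   Lower %   Index of         Maximum possible   Discriminating
       correct   correct   discrimination   discrimination     efficiency
A        30        10          20                 40                50%
B        96        84          12                 20                60%
C        80        20          60                100                60%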
The techniques discussed above
enable one to determine the upper limit of the index of discrimination. In most
practical situations, determining a lower limit for the index of discrimination
is not a problem, since the most discriminating items are selected from the
available item pool. The practical rule is the higher the discrimination, the
better.
However, there are a number of
techniques which may be used to determine a lower limit below which the index
of discrimination is not significantly different from zero. The first, and most
tedious, would be to determine the statistical significance of the difference
between two proportions, that is, the difference between the proportion of the
upper group who answered the item correctly and the proportion of the lower
group who answered the item correctly.
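
One common way to carry out this first method is a two-proportion z-test with a pooled standard error. The sketch below is an illustration under that assumption; the text does not say which test was actually used, and the function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(correct_upper, n_upper, correct_lower, n_lower):
    """z statistic and one-sided p-value for the upper-group proportion
    exceeding the lower-group proportion (pooled standard error)."""
    p_upper = correct_upper / n_upper
    p_lower = correct_lower / n_lower
    pooled = (correct_upper + correct_lower) / (n_upper + n_lower)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    if se == 0:
        raise ValueError("everyone or no one answered correctly; the test is undefined")
    z = (p_upper - p_lower) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 135 students per group (27% of 500),
# 81 correct in the upper group and 54 correct in the lower group.
z, p_value = two_proportion_z_test(81, 135, 54, 135)
print(round(z, 2), p_value < 0.025)
```

Doing this by hand for every item is what makes the method tedious; in code it is a single call per item.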
A second method would be to use a
specially prepared table such as the one in Appendix A of Julian
Stanley's Measurement in Today's Schools (fourth edition,
Prentice-Hall, 1964). Table A-5 (pp. 353-355) indicates the level at which an item
can be considered sufficiently discriminating in terms of numbers of persons.
The number of persons must be converted to a proportion before relating it to
the index of discrimination given on the item analysis report. Use of this
table is convenient and gives values appropriate for 2, 3, 4, or 5 option
items.
A third method of determining the statistical
significance of the index of discrimination would be to compute its standard
error. This might be accomplished by doing an item analysis on two samples of a
large group. The reliability of the index of discrimination may be determined
by correlating the pairs of values from the two item analyses. The rule may
then be applied that the index of discrimination must be more than twice as
large as the standard error in order for the index to be significantly
different from zero at the 2.5 percent (one-tailed) level of significance. Experience with
University College final examinations has shown that the standard error
technique and the use of Stanley's table result in the establishment of almost
identical criteria for testing the significance of the index of discrimination
when item analyses are based on 500 students. Comparable criteria will also be
developed by applying the technique of determining the statistical significance
of the difference between two proportions, when the items have difficulty
indices of approximately 50.
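
The third method might be sketched as follows. The text does not say how the standard error is obtained from the reliability, so this sketch assumes the classical standard error of measurement, SD times the square root of (1 minus r), where r is the correlation between the two sets of index values. That formula, the function names, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def index_standard_error(d_sample_1, d_sample_2):
    """Assumed standard error of the index: SD * sqrt(1 - reliability), where
    reliability is the correlation of the index values from the two samples."""
    reliability = pearson_r(d_sample_1, d_sample_2)
    sd = statistics.stdev(list(d_sample_1) + list(d_sample_2))
    return sd * math.sqrt(max(0.0, 1 - reliability))


def exceeds_twice_standard_error(d_values, standard_error):
    """Apply the rule that the index must be more than twice the standard error."""
    return [d > 2 * standard_error for d in d_values]


# Hypothetical index values for the same four items in two samples.
d1 = [10, 20, 30, 40]
d2 = [12, 18, 33, 37]
se = index_standard_error(d1, d2)
print(round(se, 2), exceeds_twice_standard_error([3, 12], se))
```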