## An A+ Solution to Grade Inflation

**November 16, 1998**

*Figure 1. "True" versus GPA ranking for raw (top) and modified (bottom) GPA at fictitious ABC College.*

**Barry A. Cipra**

Asking undergraduates' advice on grading policy is a bit like consulting a fox on hen-house security. Nonetheless, it's at least an interesting academic exercise, as shown by a three-student team from Harvey Mudd College in this year's Mathematical Contest in Modeling.

Harvey Mudd math majors Aaron Archer, Andrew Hutchings, and Brian Johnson shared top honors for their analysis of a system for ranking students at fictitious ABC College, where the average grade has been inflated to an A-. The team presented their results at a special session for the MCM at the SIAM annual meeting in Toronto. Also honored (but not present) at the session was the team of Nicholas Weininger, Tamas Nemeth-Csori, and Paul Cantrell from Macalester College, for their analysis of the competition's "continuous" problem (see article on page 4 in this issue).

Grade inflation is caused by professors being overly lenient in assigning grades. One of the major complaints against it is that grade inflation tends to create a logjam of straight-A students, making grade point average (GPA) all but useless for differentiating among talented students. Compounding that is another problem: Professors vary in leniency, making it possible for a weak student who picks the right profs to look better than a good student with bad luck. Both problems can occur in the absence of grade inflation (for example, if faculty simply gave all their students a C in every course), but reality tends to favor inflated grading.

The MCM teams that chose the grade inflation problem were asked to devise a method for ranking students that would give the dean at ABC College some confidence in awarding generous scholarships to the "top" 10% of the class. The teams were instructed to design data sets for testing their methods and to discuss limitations of their methods.

The Harvey Mudd team concluded that the dean's problem may have no good solution, but that computing adjusted GPAs could help solve a corresponding problem at the other end: identifying the bottom 10% of students. The modelers assumed that each student at ABC College has an inherent, overall "quality," *q*, along with specific aptitudes for particular subjects (e.g., math). Courses, in turn, have varying degrees of "difficulty," which translates into the amount of spread in grades. (An "easy" course, for example, is one in which all students get the same grade, regardless of how much or how little work they do; a "hard" course is one that tends to distinguish the good from the lazy.) Finally, the team assumed that professors are variably "harsh" in their grading practices, but that a professor's harshness simply shifts each student's grade by a constant amount, say three-quarters of a letter grade. (They chose the term "harshness" over "leniency," Archer claims, "to minimize notational confusion"---they used *l* as an index of summation.)

The grade that student *i* would receive in course *j* taught by professor *k* was assumed to be a straightforward function of (*q_i* + *c_{i,j}*) *d_j* − *h_k*, where *q_i* is student *i*'s overall quality, *c_{i,j}* his or her aptitude for course *j*, which is of difficulty *d_j*, and *h_k* is professor *k*'s harshness. The immediate goal was to estimate professors' harshness, based on their grading "histories," and then to compute a modified GPA for each student, adjusting grades and weighting them according to the estimated harshnesses. The overall goal was to obtain a modified-GPA ranking that reflected the "true" ranking by quality.
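The adjustment pipeline can be sketched in a few lines. The function names below are illustrative, and the harshness estimator (a professor's mean deviation from the overall mean grade) is a plausible assumption, not necessarily the estimator the Harvey Mudd team actually used:

```python
import statistics

def grade(q, c, d, h):
    # The article's grade model: (q_i + c_{i,j}) * d_j - h_k
    return (q + c) * d - h

def estimate_harshness(records):
    # records: list of (student, professor, grade) tuples.
    # Assumed estimator: a professor's harshness is the overall mean
    # grade minus that professor's mean grade, so harsh graders get
    # positive estimates.
    overall = statistics.mean(g for _, _, g in records)
    by_prof = {}
    for _, k, g in records:
        by_prof.setdefault(k, []).append(g)
    return {k: overall - statistics.mean(gs) for k, gs in by_prof.items()}

def modified_gpa(records, h_hat):
    # Add each professor's estimated harshness back onto the raw grade,
    # then average per student to get the adjusted GPA.
    by_student = {}
    for i, k, g in records:
        by_student.setdefault(i, []).append(g + h_hat[k])
    return {i: statistics.mean(gs) for i, gs in by_student.items()}
```

Ranking students by `modified_gpa` rather than raw average is then the modified-GPA ranking the team evaluated.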

The Harvey Mudd modelers generated data using normal distributions for quality and aptitude and beta distributions for difficulty and harshness. Their simulations followed 500 students through 16 classes, with 40 students per class. They considered three related measures of how well or poorly the modified-GPA ranking compared with the true ranking: (1) the number of misassigned scholarships, (2) a "scholarship injustice" metric, and (3) a "scaled error" metric (which took all students into account).
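Of the three measures, only the misassigned-scholarship count follows directly from the article's description; the team's exact "injustice" and "scaled error" formulas are not given. The sketch below implements the first measure, plus a plausible stand-in for a scaled error (mean absolute rank displacement), purely as an assumption:

```python
def misassigned(true_scores, gpa_scores, frac=0.10):
    # Count students in the true top `frac` who miss the GPA top `frac`,
    # i.e., scholarships awarded to the wrong students.
    n = len(true_scores)
    k = max(1, int(n * frac))
    top_true = set(sorted(range(n), key=lambda i: -true_scores[i])[:k])
    top_gpa = set(sorted(range(n), key=lambda i: -gpa_scores[i])[:k])
    return len(top_true - top_gpa)

def scaled_rank_error(true_scores, gpa_scores):
    # Assumed stand-in for the team's "scaled error": mean absolute
    # displacement between each student's true rank and GPA rank.
    n = len(true_scores)
    def ranks(xs):
        return {i: r for r, i in enumerate(sorted(range(n), key=lambda j: -xs[j]))}
    rt, rg = ranks(true_scores), ranks(gpa_scores)
    return sum(abs(rt[i] - rg[i]) for i in range(n)) / n
```

Computing both measures for the raw-GPA and modified-GPA rankings against the simulated true qualities reproduces the kind of comparison the team reports.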

The simulations indicated that the modified-GPA approach was ineffective with regard to scholarships: The number of misassigned scholarships actually increased in four of five trials, and the scholarship injustice metric varied widely in relative performance. The scaled error, however, was significantly smaller for the modified-GPA ranking than for the raw-GPA ranking. Moreover, plots of quality vs. GPA rank were considerably more linear for the modified approach, at least for the lower-ranking students (see Figure 1).

"If the administration seeks to accurately rank the top tier of students, it must realize that a bloated aggregate GPA from excessively lenient grading can quickly lead to a situation where no amount of calculations and statistics can recover the desired information about the intrinsic quality of the students," the Harvey Mudd team concludes. But there may be a mathematically better way to peg those who belong on academic probation.

*Barry A. Cipra is a mathematician and writer based in Northfield, Minnesota.*