“Three things are certain: Death, taxes and life-altering tests.”
These tests can determine whether a person advances to the next grade in school – or whether they get into the university of their choice.
“The better the score, the better set you are for life,” says UNCG’s Dr. Randall Penfield. “It’s a gatekeeper.”
Will the test be fair? Will you be treated equally to the other students around you – or students around the country and the world? How can you tell it’s a level playing field?
Researchers are working to ensure the tests – and every item on them – are as fair as humanly possible, using sophisticated statistical methodologies. Penfield, an innovator in the field, is dedicated to the cause.
“I’m a little bit of an inventor,” he says. “I invent better methods – and using these methods leads to increased fairness.”
A major focus of his department is an area called psychometrics, which deals with how to create tests of people’s knowledge and how to evaluate the quality of the scores those tests generate.
Statistical evaluation can reveal odd differences between groups for a test item. One region vs. another. One gender vs. another. One ethnicity vs. another.
The term is “differential item functioning” or “DIF.” It’s the extent to which a test item functions differently for different groups of test-takers.
“The numbers may tell you there’s something funky going on – for whatever reason, it’s favoring one group over another,” Penfield says. “But you don’t know why exactly that is. The important step is to go back to the content to see why.”
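The kind of group comparison described above is often done with a Mantel-Haenszel analysis: examinees are matched on total score, then each item is checked to see whether one group still outperforms the other within matched groups. The sketch below is a minimal illustration, not Penfield’s own method, and all the numbers in it are invented for demonstration.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one test item across score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    A ratio near 1.0 suggests the item behaves similarly for both
    groups once overall ability (total score) is held constant.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Invented counts: examinees matched on total score, then split by group.
strata = [
    (30, 20, 28, 22),   # low scorers
    (45, 15, 40, 20),   # middle scorers
    (55, 5, 50, 10),    # high scorers
]

ratio = mantel_haenszel_odds_ratio(strata)
# A ratio well above (or below) 1.0 flags the item for content review.
print(round(ratio, 2))
```

As Penfield notes, the statistic only raises the flag; deciding *why* the item favors one group still requires going back to the item’s content.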
What sets Penfield’s work apart is his decade-long focus on performance assessment. Until recently, most algorithms for detecting bias were built for multiple-choice tests. His research looks to the next era.
This could be, for example, a standardized test essay question that is scored across a range of categories. Or it could be a complex math problem, with various factors or steps receiving a score.
An important consideration in performance-based assessments is who’s doing the grading.
“With performance-based assessments, humans are not always reliable,” he explains. “Different graders may give different scores to the same writing sample.”
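Grader consistency of the sort Penfield describes is commonly quantified with agreement statistics such as Cohen’s kappa, which discounts the agreement two graders would reach by chance alone. The sketch below is a generic illustration of that idea, with invented rubric scores; it is not drawn from Penfield’s research.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters.

    Returns (observed - expected) / (1 - expected), where
    'expected' is the agreement predicted from each rater's
    score distribution alone.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented scores: two graders rating the same ten essays on a 0-4 rubric.
rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 3, 4, 2, 1, 2, 3, 3, 2, 4]

print(round(cohens_kappa(rater_a, rater_b), 2))
```

Here the two graders agree on 6 of 10 essays, yet kappa is noticeably lower than 0.6, because some of that agreement would occur by chance. Low kappa values are exactly the human-reliability problem that motivates automated scoring.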
Computer grading is the future, like it or not, he says. Artificial intelligence is here, and it’s cost-efficient.
But while computers are reliable in that they will give you the same score every time, their findings are only as valid as the thinking behind the programs they run.
“We have to ensure that the algorithms used to conduct the scoring are functioning in a way that does not introduce unintended biases in the scores,” says Penfield. “This will take ongoing research.”
He is confident that researchers and those creating tests are up to the task.
“So much effort is going into ensuring the test is as fair as it can possibly be.”
This post was adapted from a UNCG Research Magazine story written by Mike Harris.
Photography by Mike Dickens