Back in 2018, Pete Fussey, a sociology professor at the University of Essex, began studying how police in London used facial-recognition systems to look for suspects on the street. Over the next two years, he accompanied Metropolitan Police officers in their vans as they surveilled different pockets of the city using mounted cameras and facial-recognition software.
Fussey made two important discoveries on those trips, which he laid out in a 2019 study. First, the facial-recognition system was woefully inaccurate: of the 42 computer-generated matches that came through across the six deployments he joined, just eight, or 19%, turned out to be correct.
Second, and more disturbing, most of the time police officers assumed the facial-recognition system was probably correct. “I remember people saying, ‘If we’re not sure, we should just assume it’s a match,’” he says. Fussey called the phenomenon “deference to the algorithm.”
This deference is a problem, and it isn’t unique to police. In education, ProctorU sells software that monitors students taking exams on their home computers, using machine-learning algorithms to look for signs of cheating such as suspicious gestures, reading from notes, or another face appearing in the room. The Alabama-based company recently investigated how colleges were using its AI software and found that just 11% of test sessions flagged by the AI as suspicious were double-checked by the school or testing authority.
That is despite the fact that, by the company’s own admission, the software is sometimes wrong. It can inadvertently flag a student as suspicious for rubbing their eyes, for instance, or because of an unusual sound in the background, like a dog barking. In February, a teenager taking a remote exam was wrongly accused of cheating by a competing provider because she looked down to think during the test, according to a New York Times report.
Meanwhile, in recruitment, nearly all Fortune 500 companies use resume-filtering software to parse the flood of applications they receive every day. But a recent study from Harvard Business School found that millions of qualified job seekers were being rejected at the first stage of the process because they didn’t meet the criteria set by the software.
What unites these examples is the fallibility of artificial intelligence. Such systems have ingenious mechanisms — usually a neural network that’s loosely inspired by the workings of the human brain — but they also make mistakes, which often only reveal themselves in the hands of customers.
Companies that sell AI systems are notorious for touting accuracy rates in the high 90s without mentioning that those figures come from lab settings, not the wild. Last year, for instance, a study in Nature found that dozens of AI models claiming to detect Covid-19 in scans couldn’t actually be used in hospitals because of flaws in their methodology and models.
The answer isn’t to stop using AI systems, but rather to hire more humans with special expertise to babysit them. In other words, shift some of the excess trust we’ve placed in AI back onto humans, and reorient our focus toward a hybrid of humans and automation. (In consultancy parlance, this is sometimes called “augmented intelligence.”)
—Bloomberg