Artificial intelligence’s next big bet is ‘fake’ data


Microsoft Corp said recently it would stop selling software that guesses a person’s mood by looking at their face. The reason: It could be discriminatory. Computer vision software, which is used in self-driving cars and facial recognition, has long had issues with errors that come at the expense of women and people of colour. Microsoft’s decision to halt the system entirely is one way of dealing with the problem.
But there’s another, novel approach that tech firms are exploring: training AI on “synthetic” images to make it less biased.
The idea is a bit like training pilots. Instead of practicing only in unpredictable, real-world conditions, most trainee pilots spend hundreds of hours in flight simulators designed to cover a broad array of scenarios they could encounter in the air.
A similar approach is being taken to train AI, which relies on carefully labelled data to work properly. Until recently, the software used to recognise people has been trained on thousands or millions of images of real people, but that can be time-consuming, invasive, and neglectful of large swathes of the population.
Now many AI makers are using fake or “synthetic” images to train computers on a broader array of people, skin tones, ages and other features, essentially flipping the notion that fake data is bad. Used properly, synthetic data could not only make software more trustworthy but also upend the economics of data as the “new oil.”
In 2015, Simi Lindgren came up with the idea for a website called Yuty to sell beauty products for all skin types. She wanted to use AI to recommend skin care products by analysing selfies, but training a system to do that accurately was difficult. A popular database of 70,000 licensed faces from Flickr wasn’t diverse or inclusive enough. It showed facial hair on men, but not on women, and she says there weren’t enough melanin-rich — that is, darker-skinned — women to accurately detect their various skin conditions like acne or fine lines.
She tried crowdsourcing and got just under 1,000 photos of faces from her network of friends and family. But even that wasn’t enough.
Lindgren’s team then decided to create their own data to plug the gap. The answer was something called GANs. Generative adversarial networks, or GANs, are a type of neural network system introduced in 2014 by Ian Goodfellow, an AI researcher now at Alphabet Inc’s DeepMind. They work by pitting two networks against each other: a generator produces new faces, and a discriminator tries to spot the fakes, so the system effectively tries to fool itself, and then humans, with new faces. You can test your own ability to tell a fake face from a real one on a website set up by academics at the University of Washington, built using a type of GAN.
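To make that adversarial loop concrete, here is a minimal, illustrative sketch in PyTorch. It is not the system Yuty used and works on toy two-dimensional data rather than faces; the network sizes, learning rates and step counts are arbitrary assumptions for illustration. The point is the mechanism: the generator learns to produce samples the discriminator cannot distinguish from real ones, and both improve by competing.

```python
# Toy generative adversarial network (GAN) in PyTorch.
# The "real" data is a simple 2-D Gaussian; image GANs use the same
# adversarial idea with convolutional networks and far larger datasets.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n=128):
    # "Real" samples: points drawn from a Gaussian centred at (2, 2).
    return torch.randn(n, 2) * 0.5 + 2.0

generator = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 2),            # maps random noise -> fake 2-D samples
)
discriminator = nn.Sequential(
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 1),            # outputs a real-vs-fake logit
)

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    # Train the discriminator: label real data 1, generated data 0.
    real = real_batch()
    fake = generator(torch.randn(128, 8)).detach()
    d_loss = loss_fn(discriminator(real), torch.ones(128, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(128, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to make the discriminator say "real".
    fake = generator(torch.randn(128, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(128, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# After training, the generator produces synthetic samples that resemble
# the real distribution; scaled up, this is how photorealistic synthetic
# faces are made.
print(generator(torch.randn(5, 8)))
```

In practice, once such a generator is trained, a team can sample from it as many times as needed, which is how a few thousand real photos can be turned into a much larger and more balanced synthetic dataset.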
Lindgren used the method to create hundreds of thousands of photorealistic images and says she ended up with “a balanced dataset of diverse people, with diverse skin tones and diverse concerns.”

—Bloomberg
