A drawback frequently cited for the current class of deep learning techniques helping fuel the AI wave is that they require a lot of data to work. But how much data is enough?
“I would say pretty much any business that has tens or hundreds of thousands of customer interactions has enough scale to start thinking about using these sorts of things,” Jeff Dean, a senior fellow at Google, said in an on-stage interview at the VB Summit in Berkeley, California. “If you only have ten examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Dean knows a thing or two about deep learning — he’s the head of the Google Brain team, a group of researchers focused on a wide-ranging set of problems in computer science and artificial intelligence. He’s been working with neural networks since the 1990s, when he wrote his undergraduate thesis on artificial neural networks.
In his view, machine learning techniques have an opportunity to impact every industry, though the pace at which that happens will vary by industry.
There are still plenty of hurdles to clear before a company can turn the data it has into machine intelligence, though. To be useful for machine learning, data needs to be processed, which can take time and (at least at first) significant human intervention.
“There’s a lot of work in machine learning systems that is not actually machine learning,” Dean said. “And so you still have to do a lot of that. You have to get the data together, maybe you have to have humans label examples and then you have to write some data processing pipeline to produce the dataset that you will then do machine learning on.”
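The non-ML work Dean describes — gathering records, attaching human labels, and running a processing pipeline to produce a training dataset — can be sketched in a few lines. The field names, cleaning rules, and labels below are illustrative assumptions, not anything from Google's systems.

```python
# Hypothetical sketch of a pre-training data pipeline: join raw
# customer-interaction records with human-supplied labels, dropping
# anything malformed or unlabeled. All field names are invented.

def build_dataset(raw_records, labels):
    """Return clean, labeled examples ready for model training."""
    dataset = []
    for record in raw_records:
        text = record.get("text", "").strip()
        record_id = record.get("id")
        if not text or record_id not in labels:
            continue  # skip malformed or still-unlabeled records
        # Minimal normalization stands in for a real processing step.
        dataset.append({"text": text.lower(), "label": labels[record_id]})
    return dataset

raw = [
    {"id": 1, "text": "  Where is my order?  "},
    {"id": 2, "text": ""},                       # malformed: empty text
    {"id": 3, "text": "Cancel my subscription"}, # no human label yet
]
human_labels = {1: "shipping", 2: "other"}

print(build_dataset(raw, human_labels))
```

Even in this toy version, only one of three raw records survives to the dataset — a reminder that, as Dean says, much of the work happens before any learning does.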
One way Google is looking to make building machine learning systems easier is by using machine learning itself to pick the right model for a given problem. It’s a tough problem that is nowhere near solved, but Dean said early work is promising.
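The simplest stand-in for this "learning to learn" idea is an automated search over model configurations that keeps whichever scores best. (Google's published work uses a learned controller to propose architectures; the random search, search space, and scoring function below are invented purely for illustration.)

```python
# Toy illustration of automated model selection: randomly sample model
# configurations and keep the best-scoring one. In real systems the
# score would come from training and validating each candidate model.

import random

# Invented search space of model hyperparameters.
SEARCH_SPACE = {
    "layers": [2, 4, 8],
    "units": [32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
}

def evaluate(config):
    """Stand-in for 'train this model and measure validation accuracy'."""
    return config["layers"] * config["units"] * (1 - config["learning_rate"])

def random_search(trials=20, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config

print(random_search())
```

Random search is only a caricature of the approach; the point is that choosing the model becomes another optimization problem rather than a human judgment call.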
For example, a neural network architecture discovered through machine learning posted state-of-the-art results on the ImageNet image classification benchmark earlier this year, and Google-owned DeepMind just published a paper about a version of AlphaGo that appeared to master the game solely through self-play.