Don’t buy the ‘big data’ hype, says cofounder of Google Brain

The thing about big data is that it’s supposed to be “big.” But massive data sets aren’t essential for A.I. innovation, says Andrew Ng, founder and CEO of startup Landing AI and cofounder of Google Brain, the tech giant’s A.I. and deep-learning team.

Currently, a typical A.I. system is developed by feeding it large amounts of data. The system then learns patterns in that data and draws useful connections: customer data, for example, can reveal consumer preferences, leading to more tailored advertising, better recommendations, and more sales.
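For a rough illustration of that pattern-finding step, here is a minimal sketch in Python with entirely made-up purchase data and a simple item-similarity heuristic (an illustration of the general idea, not any particular company's system):

```python
import numpy as np

# Hypothetical user-item purchase matrix: rows are customers,
# columns are products, 1 = purchased. Illustrative data only.
purchases = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
])

# Item-item cosine similarity: products bought by the same
# customers score as similar to each other.
norms = np.linalg.norm(purchases, axis=0, keepdims=True)
similarity = (purchases.T @ purchases) / (norms.T @ norms)

# Recommend for customer 0: score unpurchased items by their
# similarity to items the customer already bought.
customer = purchases[0]
scores = similarity @ customer
scores[customer == 1] = -np.inf  # don't re-recommend owned items
print("Recommended product index:", int(np.argmax(scores)))
```

The catch, as the next paragraph notes, is that heuristics like this sharpen only as the purchase matrix grows.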

But with this type of A.I. development, the only way to improve the system is to throw more data at it, and only a handful of companies in a few countries have either sufficient data or the resources to acquire more.


Those constraints limit the potential for innovation, as up-and-coming firms are outcompeted by giants with better A.I. systems trained on huge data sets. The same can be said of countries vying to become A.I. leaders. Kai-Fu Lee, author of AI Superpowers: China, Silicon Valley, and the New World Order, has called China the “Saudi Arabia of data,” since its tech firms can leverage gigantic amounts of data produced from its massive population.

As Lee noted in 2019, “People [in China] are ordering takeout 10 times more than the U.S. They’re doing mobile payment 50 times more than the U.S. They’re ordering shared bicycles 300 times more than the U.S. And all that data can be fed into an engine to make money as well as to improve user experience.”

Ng’s short response to that argument is, “Don’t buy into that hype.” The machine-learning expert said that “there are lots of opportunities in lots of sectors for startups, entrepreneurs, and for big companies to go and create new innovations,” even without the massive data sets needed for current A.I. development.

Ng, who’s also cofounder of massive open online course provider Coursera and former chief scientist at Chinese search giant Baidu, made the observation in an interview that aired during Fortune’s 40 Under 40 China event on Thursday.

Ng’s argument is that vast data sets have little application outside of consumer Internet companies. “Maybe you have tons of web search data, or tons of econ data. But that econ data is not very helpful for finding defective parts in a fashion line, or not very helpful for understanding medical records.”

Instead, the next frontier for A.I. will be building algorithms that work with much smaller data sets. Ng cited the example of a system trained to recognize scratched smartphones on an assembly line: there is no data set of “a million pictures of scratched smartphones” that could be used to train it.
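One common small-data tactic is to start from a model pretrained on a large generic image corpus and fine-tune it on a few dozen labeled photos, stretched further with augmentation. A minimal sketch, assuming PyTorch and torchvision (an illustration of the general approach, not Ng's specific recipe):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Start from ImageNet-pretrained features and retrain only the
# final layer on a small scratched-vs.-unscratched phone dataset.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False              # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)  # scratched / not scratched

# Aggressive augmentation multiplies a tiny labeled set.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.ToTensor(),
])

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# A standard training loop over the small augmented dataset would go here.
```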

Yet a big challenge to embracing a more “data-centric” perspective on A.I., in which the quality of data matters more than the quantity, is standardization. Ng noted that different people, even two inspectors in the same factory, will label the same data completely differently, making a small data set hard to use.
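Before training on such a set, teams often quantify that disagreement. A minimal sketch, assuming scikit-learn and entirely invented inspector labels, using Cohen's kappa (a standard agreement statistic, not something Ng prescribes here):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical defect labels from two factory inspectors on the
# same 10 parts (1 = defective). Names and data are invented.
inspector_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
inspector_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Cohen's kappa measures agreement beyond chance; a low score
# flags the labeling inconsistency Ng describes before any
# model is trained on the data.
kappa = cohen_kappa_score(inspector_a, inspector_b)
print(f"Inter-annotator agreement (kappa): {kappa:.2f}")
```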

Progress on this front will help smooth A.I. adoption in sectors still at a “relatively early stage of development,” such as health care, manufacturing, and agriculture, and help realize the $13 trillion in additional value from A.I. that the McKinsey Global Institute has forecast.

Ng thinks that A.I. can do more than just help with consumer software. “The value of A.I. in these traditional industries will turn out to be even greater than the value unlocked potentially in [the consumer Internet].”
