Learned Index Structures–Why ML is a Foundational Technology

Posted on: December 18, 2017 at 02:29 AM

A recent paper published by Google Research provides more evidence that machine learning is a foundational technology that is as significant as electrical power, relational databases, or the internet.

The Case for Learned Index Structures by Kraska, Beutel, et al. examines what happens if one uses ML techniques to optimize classic data structures such as B-trees, hash maps, and Bloom filters. The Google team proposes learned indexes, which rebuild these fundamental data structures from the bottom up based on the specific characteristics of a particular dataset. Historically, this was impractical because no one would hand-implement a new index for each new application or dataset. Machine learning, however, allows us to build indexes that are tuned to each individual application.
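To make the idea concrete, here is a minimal sketch of what "learning" an index can look like. It is my own toy illustration, not the architecture from the paper: it assumes numeric, distinct keys, fits a single least-squares line that maps a key to its position in a sorted list, records the worst prediction error seen during fitting, and then searches only within that error window. The class name `LinearLearnedIndex` and its methods are made up for this example.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line predicts a key's position in a sorted list."""

    def __init__(self, keys):
        # Assumption: keys are numeric; duplicates are dropped for simplicity.
        self.keys = sorted(set(keys))
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")

        # Ordinary least squares fit of position as a function of key.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var_key = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(self.keys))
        self.slope = cov / var_key if var_key else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Worst-case prediction error over the stored keys. Every stored key
        # is guaranteed to sit within this distance of its predicted slot.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, rounded and clamped to a valid position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted list, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the guess.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([42, 3, 4096, 15, 8, 108, 16, 23])
print(index.lookup(23), index.lookup(5), index.max_err)
```

The better the model fits the distribution of the keys, the smaller the error window and the less searching remains. That is the sense in which an index can be tuned to a particular dataset: evenly spaced keys give a window of almost nothing, while skewed keys like the ones above give a wide window, and a richer model would be needed to narrow it.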

The tech industry has had decades to optimize memory, cache, and CPU hardware to make classical indexes efficient. As such, it will likely take several years of R&D to bring learned indexes into production. Even so, this is already a promising research direction that points to something new. Much faster, more efficient, and customized data indexes mean new data stores, new applications, and eventually, new businesses.

This is the aspect of machine learning specifically (and AI more generally) that excites me most. I get the impression that a lot of the popular press about AI has focused on (a) fears of an AI Singularity or a takeover by artificial general intelligence, (b) consumer-facing voice assistants (such as Alexa, Siri, Cortana) and their related smart speakers, or (c) industry pushes like bots by Facebook. In contrast, papers like this show how AI has the potential to transform technology behind the scenes, which eventually makes new technologies, products, and companies possible. Walmart could never have become the giant it did without foundational building blocks like the internal combustion engine, trucks, the interstate highway system, and relational databases. In the 2030s, what will be the new Walmarts enabled by far more efficient data structure indexes?

(h/t to former Microsoft exec Steven Sinofsky for highlighting this paper.)