Forget the AI Race. Let’s Invest in a Data Grid for AI.
This is a letter published by Palantir’s CTO, Akash Jain. Many shareholders have not read it; if you haven’t, you should, because it will help you understand Palantir’s philosophical approach to data. You can watch a video analysis of it here.
Akash “Aki” Jain is President of Palantir USG, Inc., where he focuses on Artificial Intelligence, USG Technical Engagement, Enterprise Data Management, and Cloud Architecture.
Private Sector Perspective — Neil Armstrong’s small step for man in 1969 was a symbolic resolution to the Cold War’s most visible global security power struggle: The Space Race. As victor, the U.S. proved its technological superiority, leading the Soviets to largely concede the space domain to the U.S. More broadly, the U.S. ability to come from behind demonstrated the underlying strength of its economic, technological, and scientific systems.
Today, we are in another pursuit of technological superiority, one that has been dubbed the artificial intelligence (AI) race. However, unlike the lunar landing, the so-called “AI Race” has no clearly defined finish line. We know we have an immediate competitor (China), but how will we know if, and when, we have won? The ambiguity around this question is why I believe we need to forget the notion of a singular AI Race and instead focus our efforts on building a data infrastructure to tackle any AI challenge.
For the U.S. to position itself in a place of technological strength for decades to come, we must create systems that allow for steady, continuous, and trustworthy progress on AI. Andrew Ng’s analogy of AI as electricity helps us see that we’re in the infancy of AI’s potential. While AI is substantially more complex than electricity, it is similar in that it is a powerful enabling technology, not an end in itself. What we are missing is a way to make enduring, scalable, and reliable use of that technology.
When introduced in the 19th century, incandescent bulbs were revolutionary. At first, only those consumers who had their own electricity generator could make use of them, limiting their seemingly endless potential. Thomas Edison realized that he could sell many more lightbulbs if there was an easy way for anyone to receive electricity. Without the electrical grid, we would never have seen the widespread adoption of the incandescent bulb and the subsequent surge of innovation to create many other electric devices. We’ve been caught up designing individual AI/ML models (“lightbulbs”), but we don’t have a unified infrastructure to serve as our modern-day electrical grid equivalent.
One particularly compelling reason to invest in such an infrastructure is that AI models aren’t something that can be sprinkled around to make any project better. Most AI models deployed today are quite brittle: they have been developed and trained for a specific use case under specific conditions. Once deployed, model performance and quality may degrade quickly as the data environment evolves. Further, throwing a model at an adjacent problem without re-training it usually does not work.
To give an illustrative example, an AI model for identifying pathologies in X-ray films could not be repurposed at another hospital because the radiology films produced by different machines differed, and that was for a nearly identical use case. To harness the value currently available from AI, it is critical that models are continually provided with appropriate training data and feedback so they can improve. We must create a holistic AI and data environment, a “grid,” that works around AI model brittleness by making it easy to re-train and evaluate models and to share training data (within appropriate security, data protection, and usage boundaries).
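To make the brittleness point concrete, here is a minimal sketch of how a shared infrastructure might flag that a model's incoming data no longer resembles its training data. Everything in it is an assumption for illustration: the function names, the threshold of 0.5, and the sample intensity values are hypothetical, and a real system would compare many features with more robust statistics.

```python
from statistics import mean, stdev


def drift_score(reference, incoming):
    """Shift of the incoming mean, measured in reference standard deviations."""
    spread = stdev(reference)
    shift = abs(mean(incoming) - mean(reference))
    if spread == 0:
        # No variation in the reference data: any shift at all counts as drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def needs_retraining(reference, incoming, threshold=0.5):
    """Flag a model for re-evaluation when the data has shifted past the threshold."""
    return drift_score(reference, incoming) > threshold


# Hypothetical average film intensities from machines at two hospitals.
hospital_a = [0.42, 0.45, 0.40, 0.44, 0.43, 0.41]
hospital_b = [0.61, 0.58, 0.63, 0.60, 0.59, 0.62]

if needs_retraining(hospital_a, hospital_b):
    print("Data has shifted: re-train and re-evaluate before deploying.")
```

A check like this does not fix a brittle model. It only signals when re-training and fresh evaluation are needed, which is the kind of routine feedback a grid would make easy.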
The U.S. Government will struggle to retain the lead in AI because this infrastructure does not exist. Academics, government researchers, and private companies are off in silos building incredible AI/ML capabilities — but like a solitary lightbulb, they are illuminating but a single room in a single house at a time. For the U.S. to maintain technological superiority, we must build the data infrastructure that will allow entire skyscrapers to be illuminated. We must also enable new innovations – not just “lightbulbs,” but “toasters,” “televisions,” and beyond. We must have a means for scaling existing capabilities in the real world and encouraging the development of new ones. And we must do so in a way that stays true to our democratic values. Let’s invest in a data grid for AI.
Just as governments play a role in enforcing standards related to electrical current flow, the U.S. Government has a role to play in establishing our own “grid.” Investing in a data grid for AI is a strategic move that will not only produce a near-term spike in innovation but will also allow for sustained, step-by-step advancement in the long term. There are several key components we’ll need to get right:
1. Design for iteration, not stagnation: AI systems are learning systems that require constant iteration and feedback. We must build our infrastructure so that it can evolve. Just as electrical grids today are flexible enough to support a variety of energy sources, from solar power to coal-fired generators, so too must our AI infrastructure empower AI firms and Government programs to adapt to various systems. And, just as grids can surge resources in response to demand, we should build in connectivity, which in turn will allow us to discover and build towards emerging demand and to refine and further develop new capabilities.
2. Create an AI deployment infrastructure: Government consumers should have an easy access point to discover, evaluate, and deploy potential AI/ML solutions and training data, monitor algorithm performance, and capture and save any feedback.
3. Adopt open data standards: Just as we have standards for voltage, we need standards for the format, quality, and curation of data, systems, and APIs.
4. Fund an AI training data library: AI/ML models depend upon quality data to train and test. Large, diverse datasets help mitigate algorithmic biases, and our Government is best positioned to conduct quality assurance on this data and enable appropriate access to it. By building out this training data library thoughtfully, instead of via ad hoc, disconnected efforts, our Government can both spur AI development and ensure that training data sets are curated ethically and transparently.
5. Keep our grid secure: We must protect our electricity grid from hackers – similarly, we must ensure that our AI training data, algorithms, and deployment infrastructure are secure.
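As a rough illustration of components 2 and 3 above, the sketch below shows what a single access point for discovering models, recording evaluations, and capturing feedback could look like, alongside a simple check of dataset records against a shared standard. All names here (`Catalog`, `ModelEntry`, `validate_dataset`, and the required fields) are hypothetical and are not drawn from any existing Government or vendor system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed metadata standard: fields every shared dataset record must carry.
REQUIRED_DATASET_FIELDS = {"name", "format", "license", "provenance"}


def validate_dataset(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are missing or empty, sorted by name."""
    return sorted(f for f in REQUIRED_DATASET_FIELDS if not record.get(f))


@dataclass
class ModelEntry:
    name: str
    task: str
    scores: List[float] = field(default_factory=list)   # evaluation history
    feedback: List[str] = field(default_factory=list)   # notes from users


class Catalog:
    """A single place to discover, evaluate, and give feedback on models."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        self._models[entry.name] = entry

    def discover(self, task: str) -> List[str]:
        return [m.name for m in self._models.values() if m.task == task]

    def record_evaluation(self, name: str, score: float) -> None:
        self._models[name].scores.append(score)

    def record_feedback(self, name: str, note: str) -> None:
        self._models[name].feedback.append(note)

    def latest_score(self, name: str) -> Optional[float]:
        scores = self._models[name].scores
        return scores[-1] if scores else None


catalog = Catalog()
catalog.register(ModelEntry(name="xray-v1", task="radiology"))
catalog.record_evaluation("xray-v1", 0.91)
catalog.record_feedback("xray-v1", "Accuracy drops on films from newer machines.")
```

The value of such a catalog lies less in the code than in the agreement behind it: once every model and dataset is described the same way, any program can find, compare, and monitor what already exists before building something new.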
By following these guidelines, in conjunction with existing calls to adhere to strong AI ethics principles and standards, we can invest in an AI infrastructure that not only enables the occasional “incandescent bulb” but also empowers an entire generation with access to enabling technologies that will buoy innovation as more potential use cases are discovered.
By investing in a grid, we can unlock the enormous potential of AI development and ensure our technological, economic, democratic, and military superiority for decades to come.