As society hurtles into an era of AI exploration, the need for methods to guarantee the capabilities (and limitations) of generative AI systems is more pressing than ever. Insufficient insight into when and why those capabilities manifest remains an ongoing challenge.
DARPA has been a long-term investor in AI research and development. With the influx of large language models, the agency continues to invest in areas that show promise in filling the fundamental gaps between state-of-the-art systems and national security applications, including the Defense Department's mission-critical needs.
As generative AI accelerates decision-making, the agency seeks to develop mathematical foundations for assessing the technology and providing the guarantees needed to deploy it safely and effectively across the DoD and society.
The Artificial Intelligence Quantified (AIQ) program seeks to develop technology to assess and understand the capabilities of AI to enable guaranteed performance. Researchers will test the hypothesis that mathematical foundations, combined with advances in measurement and modeling, can guarantee, in a quantified way, an AI system's capabilities, when they will or will not manifest, and why.
Generalization is also key. Current AI evaluation focuses on giving AI systems quizzes like those we would give a person. However, there is no guarantee that the answers would stay the same under even simple rewordings of a question, let alone in real-world applications. That is, we want guarantees about generalization, and mathematics is required for that.
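As an illustration only, and not a description of AIQ's methodology, the hypothetical Python sketch below shows the kind of brittleness the paragraph above describes: a simple check of whether a system's answer survives rewordings of the same question. The `consistency_score` function, the `brittle_model` stand-in, and the example questions are all invented for this sketch.

```python
def consistency_score(model, question: str, paraphrases: list[str]) -> float:
    """Fraction of paraphrases for which the model's answer matches its
    answer to the original question. `model` is any callable mapping a
    prompt string to an answer string (a stand-in for a real system)."""
    reference = model(question).strip().lower()
    matches = sum(1 for p in paraphrases if model(p).strip().lower() == reference)
    return matches / len(paraphrases)

# Toy stand-in "model" that only answers correctly for one exact wording.
def brittle_model(prompt: str) -> str:
    return "Paris" if prompt == "What is the capital of France?" else "Lyon"

score = consistency_score(
    brittle_model,
    "What is the capital of France?",
    ["Which city is France's capital?", "Name the capital city of France."],
)
print(f"Consistency under rewording: {score:.0%}")  # 0% for this brittle model
```

A score like this only measures sensitivity to rewording on a fixed set of paraphrases; it does not provide the kind of mathematical generalization guarantee the program is pursuing.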
Through AIQ, DARPA will work closely with partners at the National Institute of Standards and Technology (NIST) and the DoD to ensure that when AI systems are deployed in high-stakes situations, their performance can be predicted with confidence.