Defense Advanced Research Projects Agency

Tagged Content List

Data Analysis at Massive Scales

Extracting information and insights from massive datasets; "big data"; "data mining"

A rapidly increasing percentage of the world’s population is connected to the global information environment. At the same time, the information environment is enabling social interactions that are radically changing how and at what rate information spreads. Both nation-states and nonstate actors have increasingly drawn upon this global information environment to promote their beliefs and further related goals.
Networks within the United States and abroad face increasingly broad-spectrum cyber threats from numerous actors and novel attack vectors. Malicious activity also crosscuts organizational boundaries, as nefarious actors use networks with less protection to pivot into networks containing key assets. Detection of these threats requires adjustments to network and host sensors at machine speed. Additionally, the data required to detect these threats may be distributed across devices and networks. In all of these cases, the threat actors are using technology to perpetrate their attacks and hide their activities and movement, both physical and virtual, inside DoD, commercial, and Internet Access Provider (IAP) networks.
Understanding the complex and increasingly data-intensive world around us relies on the construction of robust empirical models, i.e., representations of real, complex systems that enable decision makers to predict behaviors and answer “what-if” questions. Today, construction of complex empirical models is largely a manual process requiring a team of subject matter experts and data scientists.
Department of Defense (DoD) operators and analysts collect and process copious amounts of data from a wide range of sources to create and assess plans and execute missions. However, depending on context, much of the information that could support DoD missions may be implicit rather than explicitly expressed. Having the capability to automatically extract operationally relevant information that is only referenced indirectly would greatly assist analysts in efficiently processing data.
Deep Purple aims to advance the modeling of complex dynamic systems using new information-efficient approaches that make optimal use of data and known physics at multiple scales. The program is investigating next-generation deep learning approaches that use not only high-throughput multimodal scientific data from observations and controlled experiments (including behaviors such as phase transitions and chaos), but also the known science of such systems at whatever scales it exists.