Defense Advanced Research Projects Agency
Tagged Content List

Supervised Autonomy

Automated capabilities with human supervision ("human in the loop")

Showing 4 results for Autonomy + Language
The United States Government has an interest in developing and maintaining a strategic understanding of events, situations, and trends around the world, across a variety of domains. The information used to build this understanding comes from many disparate sources, spans a variety of genres and data types, and arrives as a mixture of structured and unstructured data. Unstructured data can include text or speech in English and many other languages, as well as images, video, and other sensor information.
Expanded global access to diverse means of communication is producing more information, in more languages, more quickly than ever before. The volume of information DoD encounters, the speed at which it arrives, and the diversity of languages and media through which it is communicated make identifying and acting on relevant information a serious challenge. At the same time, DoD needs to communicate with non-English-speaking local populations in foreign countries, but doing so is currently costly and difficult.
Department of Defense (DoD) operators and analysts collect and process copious amounts of data from a wide range of sources to create and assess plans and to execute missions. However, depending on context, much of the information that could support DoD missions may be implicit rather than explicitly stated. The ability to automatically extract operationally relevant information that is referenced only indirectly would greatly help analysts process data efficiently.
The U.S. Government operates globally and frequently encounters so-called "low-resource" languages for which no automated human language technology capability exists. Historically, developing technology for automated exploitation of foreign-language materials has required protracted effort and a large investment in data. Current methods can require multiple years and tens of millions of dollars per language, mostly to construct translated or transcribed corpora.