Program Summary
The U.S. Government operates globally and frequently encounters so-called “low-resource” languages for which no automated human language technology capability exists. Historically, development of technology for automated exploitation of foreign language materials has required protracted effort and a large data investment. Current methods can require multiple years and tens of millions of dollars per language—mostly to construct translated or transcribed corpora. As a result, human language technology systems exist primarily for languages in widespread use or in high demand. With more than 7,000 languages in the world and the difficulty of predicting the next language for which technology will be needed, universal human language technology coverage by existing means is an unattainable goal.
The goal of the Low Resource Languages for Emergent Incidents (LORELEI) Program is to dramatically advance the state of computational linguistics and human language technology to enable rapid, low-cost development of capabilities for low-resource languages. Because even perfect translation would still leave analysts with too much material to use effectively, LORELEI research will not focus solely on machine translation. While LORELEI technologies may include partially or fully automated speech recognition and/or machine translation, the overall goal will be not to translate foreign language material into English but to provide situational awareness by identifying elements of information in foreign language and English sources, such as topics, names, events, sentiment and relationships.
To accomplish this, the LORELEI program will develop human language technology that eliminates the current reliance on huge, manually translated, manually transcribed or manually annotated corpora by leveraging language-universal resources, projecting from related-language resources and fully exploiting a broad range of language-specific resources. The technologies resulting from LORELEI research will be capable of supporting situational awareness based on low-resource foreign language sources within an extremely short time frame – starting as soon as 24 hours after a new language requirement emerges.
LORELEI technology is expected to be applicable to any incident in which a sudden need emerges for assimilation of information by U.S. Government entities about a region of the world where low-resource languages are frequently used in formal and/or informal media. LORELEI capabilities will be exercised to provide situational awareness based on information from any language, in support of emergent missions such as humanitarian assistance/disaster relief, peacekeeping or infectious disease response.
The LORELEI program will hold an Industry Day on November 13, 2014.