Interview with Joe Olive, Program Manager, IPTO (11/09)
Johanna Jones: We're here today with Joe Olive, who is a program manager with the Information Processing Techniques Office, or IPTO. You manage a program called GALE, or Global Autonomous Language Exploitation. What is the goal of the program?
Dr. Joe Olive: The goal of the program is to provide relevant information to our war fighters regardless of the language.
Johanna: And why is this important today?
Dr. Olive: When we put our war fighters in countries where they cannot understand their surroundings, it is important to translate what is going on in the media and deliver it to them.
Johanna: How did you come up with the idea for this GALE program?
Dr. Olive: DARPA has been involved in language research for many, many years, over three decades. GALE is just an extension of that work. There have been other programs dealing with language processing, speech processing, and machine translation. The nice thing is that this was the right time to make tremendous advances in this field, and I think we've done so.
Johanna: What are some of the challenges you have been trying to solve with this program, especially in gathering all this information and distilling the data?
Dr. Olive: Well, there are two aspects to the work. The first is the ability to translate both voice and text. Translation requires a tremendous amount of work. One of the problems, of course, is that there are many words in English that mean the same thing, and the same is true of other languages. So you always have to pick the right word in translation.
And then GALE has a second component called "distillation." Distillation is essentially a very sophisticated search engine, unlike the engines you see on the Internet today, which are really what we call "bag of words": if the words match, you associate the query with the answer.
The search engine in GALE is really concerned with the function of the words themselves. For example, if you ask, "What did somebody say about somebody else?" you want to know what the first person actually said, not what was said about them. The reverse, what others said about that person, would of course be irrelevant information.
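To make the contrast concrete, here is a minimal Python sketch, not GALE's actual method. It assumes each statement has already been reduced to a hypothetical `Statement` record with `speaker` and `target` fields; the names and sentences are invented for illustration. A bag-of-words match on "What did alice say about bob?" returns every statement that mentions both names, while a role-aware match keeps only the one in which alice is the speaker and bob is the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Hypothetical record: who spoke, whom the remark was about, and what was said.
    speaker: str
    target: str
    text: str


def bag_of_words_match(query_terms: set, stmt: Statement) -> bool:
    # A hit whenever every query word appears somewhere in the statement,
    # regardless of the role each word plays.
    words = {stmt.speaker.lower(), stmt.target.lower(), *stmt.text.lower().split()}
    return query_terms <= words


def role_aware_match(speaker: str, target: str, stmt: Statement) -> bool:
    # A hit only when the roles line up: `speaker` is talking about `target`.
    return stmt.speaker == speaker and stmt.target == target


statements = [
    Statement("alice", "bob", "bob is a reliable partner"),
    Statement("bob", "alice", "alice missed the meeting"),
]

# Query: "What did alice say about bob?"
bow_hits = [s for s in statements if bag_of_words_match({"alice", "bob"}, s)]
role_hits = [s for s in statements if role_aware_match("alice", "bob", s)]

print(len(bow_hits))   # 2: both statements mention both names
print(len(role_hits))  # 1: only alice's remark about bob
```

In a real system the roles would have to be recovered from the text itself; the sketch simply assumes that step is already done.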
Johanna: How do you actually distill this information?
Dr. Olive: We first analyze the text or the speech and decompose it. We do what we used to do in grammar school and high school, that is, construct trees of the sentences. Then we look at the functions of the words themselves. For example, we have ways of analyzing what we call the agent, the patient, and the act itself, and on top of that you may have temporal information and location information.
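As a rough illustration of reading roles off a sentence tree, the Python sketch below walks a hand-built parse of "The minister met the envoy in Geneva yesterday." It is a toy, not the program's analyzer: the tree uses Penn Treebank-style labels (`NP`, `VP`, `PP`, `NP-TMP`), and the extraction rules (the first noun phrase is the agent, the verb is the act, the object noun phrase is the patient, an "in"/"at" phrase gives the location, a temporal noun phrase gives the time) are simplifying assumptions that only hold for simple active-voice sentences.

```python
# A parse-tree node is (label, child, child, ...); a leaf is (tag, word).
Tree = tuple


def leaves(node: Tree) -> list:
    # Collect the words under a node, left to right.
    label, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        return [rest[0]]
    return [word for child in rest for word in leaves(child)]


def extract_frame(sentence: Tree) -> dict:
    # Toy rules for a simple active-voice clause (S -> NP VP).
    frame = {"agent": None, "act": None, "patient": None, "time": None, "location": None}
    for child in sentence[1:]:
        if child[0] == "NP" and frame["agent"] is None:
            frame["agent"] = " ".join(leaves(child))
        elif child[0] == "VP":
            for part in child[1:]:
                label, words = part[0], leaves(part)
                if label.startswith("VB"):
                    frame["act"] = " ".join(words)
                elif label == "NP" and frame["patient"] is None:
                    frame["patient"] = " ".join(words)
                elif label == "NP-TMP":
                    frame["time"] = " ".join(words)
                elif label == "PP" and words[0] in {"in", "at"}:
                    frame["location"] = " ".join(words[1:])
    return frame


tree = ("S",
        ("NP", ("DT", "the"), ("NN", "minister")),
        ("VP",
         ("VBD", "met"),
         ("NP", ("DT", "the"), ("NN", "envoy")),
         ("PP", ("IN", "in"), ("NP", ("NNP", "Geneva"))),
         ("NP-TMP", ("NN", "yesterday"))))

print(extract_frame(tree))
```

Once sentences are reduced to frames like this, a query such as "who met the envoy, and where?" can be answered by matching fields rather than by matching words.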
Johanna: What is the accuracy of this technology?
Dr. Olive: Again, we really have to talk about two technologies. Within the search we have two parameters. One is recall, which means how many of the relevant items that were there you actually retrieved. The other is precision, because you certainly do not want irrelevant information. As a matter of fact, irrelevant information usually tells users that the technology is not very good, and they get discouraged even though it is extremely useful to them.
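In standard information-retrieval terms, precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. A minimal Python sketch with made-up document IDs; returning 0.0 when a set is empty is one common convention, assumed here.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    # Items the system returned that are actually relevant.
    hits = len(retrieved & relevant)
    # Precision: share of returned items that are relevant (irrelevant ones lower it).
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant items in the collection that were found.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}   # documents that actually answer the query
retrieved = {2, 3, 5}     # documents the engine returned
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Returning more documents can raise recall, but every irrelevant one it sweeps in lowers precision, which is the trade-off described above.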
But overall, when we measure this, we actually compete against humans doing the same task on the same kind of database. On many queries we're getting two to three times the output of humans, although on some queries we're not as good as humans.
Now, as far as actual translation accuracy goes, we have worked on Arabic for three years now, and we are beginning to reach the level of what we call "first pass human translation."
Most translation houses do a first pass, and then a second pass in which somebody edits and improves what the first translator did. We're not at the second-pass level yet, but we are matching first-pass translation.
Johanna: So will humans always still have to be in the loop in this kind of process?
Dr. Olive: Well, that depends on what you need the application for. If you just want to understand what's going on in the field, no. But if you want to include it in a report, at least as of now, humans will have to be in the loop. And of course, if you require actionable information, it's highly recommended that a human look at what the machine did anyway.
Johanna: What makes you passionate every day about GALE and coming in and trying to solve these problems?
Dr. Olive: What was most interesting about working on GALE was that the community was going down a very monolithic path toward machine translation. Getting enough resources made it possible to examine many alternative paths, some of which people said had already been tried and had failed. And basically the reply was, "Well, try again, and try it the right way with the right resources." One of the reasons we have succeeded so well in machine translation is that we've brought in many different ways of looking at the problem and integrated them into one methodology that does the translation. Humans, when they listen to speech, employ many techniques.
They don't just listen to the words. They listen to the words in context. They form hypotheses about which word is going to come next. Those are the kinds of techniques we have put into practice within the program.
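One standard way for a machine to form a hypothesis about the next word is an n-gram language model. The Python sketch below is a bare-bones bigram model trained on three invented sentences. It illustrates the general idea only and is not the model GALE used; real systems add smoothing, longer contexts, and far larger corpora.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    # For each word, count which words follow it; "<s>" marks sentence start.
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    # Most frequent follower of `prev`, or None for an unseen word.
    followers = counts.get(prev.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "the convoy left the base",
    "the convoy reached the city",
    "the convoy stopped",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))     # convoy
print(predict_next(model, "bridge"))  # None
```

In a recognizer or translator, scores like these are combined with acoustic or translation evidence, so a word that fits the context can win over one that merely sounds or looks similar.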
Johanna: So how does this program tie into your other program, MADCAT, or Multilingual Automatic Document Classification, Analysis and Translation, which translates foreign-language texts into transcripts?
Dr. Olive: GALE dealt with two media, speech and text, but the text was computer text, or what we would term text encoded in Unicode. There is a third medium, which is printed or handwritten text. So MADCAT tries to address the problem of deciphering handwritten text as well as printed text. Handwritten text, of course, is both extremely important and extremely difficult.
Johanna: So what are your next challenges here at DARPA in collecting this information and translating it into something useful?
Dr. Olive: Obviously we still have some way to go in improving, number one, our Arabic translation, but more importantly our Chinese translation. But DoD is interested in many more languages than just these two. One of our challenges is that we spent a lot of time developing these languages, and we have to cut that time short and find methods to develop machine translation for new languages quickly. Also, many languages do not have a body of data of what we call "parallel corpora," that is, a corpus with English on one side and the foreign language on the other. We have to develop methodology to build translation for these languages quickly, with as little data as possible, and with high accuracy.
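A parallel corpus is often distributed as two plain-text files with one sentence per line, where line i of the English file translates line i of the foreign-language file. The Python sketch below pairs such line lists into sentence pairs; the file layout, the `load_parallel` helper, and the Spanish placeholder sentences are assumptions for illustration, not a format described in the interview.

```python
def load_parallel(english_lines, foreign_lines):
    # Line i of one side must translate line i of the other.
    if len(english_lines) != len(foreign_lines):
        raise ValueError("sides differ in length; the corpus is misaligned")
    pairs = []
    for en, fx in zip(english_lines, foreign_lines):
        en, fx = en.strip(), fx.strip()
        # Skip pairs where either side is blank.
        if en and fx:
            pairs.append((en, fx))
    return pairs


english = ["Good morning.\n", "\n", "Where is the station?\n"]
spanish = ["Buenos días.\n", "\n", "¿Dónde está la estación?\n"]
print(load_parallel(english, spanish))
```

With real files, each list would come from calling `readlines()` on the corresponding file. The length check matters because a single missing line silently shifts every later pair out of alignment.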
Johanna: What makes DARPA such a unique place to work?
Dr. Olive: At DARPA you are bounded only by your own ideas. You're really not bounded by much else. If you have really good ideas, you can find the resources to execute them, and those resources are so important for getting things done the right way.
