• Mind's Eye

    Army scouts are commonly tasked with covertly entering uncontrolled areas, setting up a temporary observation post, and then performing persistent surveillance for 24 hours or longer.  But what if, instead of sending scouts on high-risk missions, the military could deploy taskable smart cameras?  A truly "smart" camera would be able to describe in words everything it sees and reason about what it cannot see.  These devices could be instructed to report only on activities of interest, which would increase the relevance of incoming data to users.  Thus, smart cameras could permit a single scout to monitor multiple observation posts from a safe location.

    The enabling technology for such a smart camera is machine-based visual intelligence.  The Mind's Eye program seeks to develop a capability for visual intelligence by automating two steps: learning generally applicable, generative representations of the actions between objects in a scene directly from visual inputs, and then reasoning over those learned representations.

    A key distinction between this research and the state of the art in machine vision is one of focus.  Machine vision has made continual progress in recognizing a wide range of objects and their properties, which might be thought of as the nouns in the description of a scene.  Mind's Eye aims to add the perceptual and cognitive underpinnings for recognizing and reasoning about the verbs in those scenes, enabling a more complete narrative of the action in the visual experience.
