CAREER: Structure from Recognition, Building Visual Worlds

  • Nistér, David (PI)

Grants and Contracts Details


Project Summary

Imagine a computer vision system that takes continuously streaming video as input and opportunistically builds a visual description of the world around it, starting from a very agnostic state where it reconstructs everything from scratch, and progressing towards a state where it is mostly recognizing new input as something it has seen before. This is the dream driving this proposal. The envisioned system will appear in a diverse set of applications, such as wearable computing as an assistant for fully sighted as well as visually impaired users, general human-computer interaction, monitoring television streams, surveillance, robotics, navigation, smart cars, mapping and 3D reconstruction.

The intellectual merit of this proposal is the exploration of a symbiosis between geometry and recognition, aiming at entirely automatic visual 3D reconstruction of an environment. The goal is higher reliability, along with the ability to cover a larger scope than currently afforded by the state of the art. For example, it is envisioned that a user wears a camera in order to reconstruct a whole school, building interior, airport, city center or ultimately even the entire planet. It is argued that a symbiosis between geometry and recognition is essential to accomplish this goal.

The broader impacts include the development of a certificate program in computer vision and graphics. The certificate program will reach undergraduates earlier in their college careers and include them in current research. It will also attract high school students interested in vision and graphics through mentoring and outreach activities. Integrated with the proposed research and the certificate program, the PI will explore application of the computer vision system to assisting visually impaired users. This will be carried out in collaboration with Dr. Melody Carswell, professor in psychology with access to the visually impaired community.
In the interest of performing practically relevant research, the PI's team will embark on the endeavor of building a computer vision system that automatically extracts a human-life-size three-dimensional visual world description from video. The results will inform the team's theoretical research and vice versa. The goal is a system that can improve the visual description over time in a trustworthy and robust manner, incorporating large flexibility, while utilizing the constraints and previously gathered knowledge necessary to approach human performance. In this endeavor, we are more concerned with the system's ability to eventually arrive at an organized result that makes overall sense than with, for example, very high localization precision. In the context of aiding visually impaired users, the PI's team will explore how to distill visual information to pass over an alternative information channel such as a haptic interface. In particular, the team will explore how to best empower the user with close control over the computer vision algorithms. The team will also involve undergraduates in research through a course with open-ended questions in this area, intended to evolve into a yearly competition including other schools. Important work in this direction has been carried out by computer vision researchers. However, the amount of work is small in comparison to the number of attempted applications where human visual performance has to be beaten, rather than approached, for computer vision to be useful. By focusing first on applications where computer vision is likely to be useful, fairly mature technology such as stereo vision or reading printed text can perhaps already be applied in a useful manner.
An inspiring goal is then to gradually progress towards more challenging tasks such as reading handwritten text, finding items at the supermarket, recognition in clutter, general scene interpretation, assisting general navigation and eventually even real-time tasks such as hitting a baseball, playing tennis or driving a car.
Effective start/end date: 1/1/06 to 12/31/06

