Semantic Representations for Multi-Viewpoint Multimodal Geolocation

  • Jacobs, Nathan (PI)

Grants and Contracts Details


As NGA and other government agencies collect ever more data, it becomes increasingly difficult for analysts to keep up. Requiring analysts to reason across multiple data modalities and/or viewpoints adds considerable complexity to this task, resulting in either significantly slower analysis or analysts skipping data entirely. The proposed work seeks to automate the aggregation of multimodal data, thereby enabling analysts to more easily access and analyze the data relevant to an operation. One can imagine boosting the performance of both human- and algorithmically-performed tasks (such as object detection and change detection) by automatically aggregating relevant data across modalities.

To keep this work grounded and ensure that success is measurable, we focus on the problem of UAS localization. An accurate georeferenced pose of a UAS is critical for obtaining the geospatial positions of targets recognized in sensor data, but GPS-denied environments and inaccurate GPS make this problem much more difficult. Having an up-to-date base map (e.g., a satellite image of the area) enables an alternative way of estimating an accurate georeferenced pose: registration against the map. However, one does not always have access to up-to-date base maps. Furthermore, if we can utilize base maps derived from drastically different modalities and perspectives than the sensors on board the UAS, then we can dramatically broaden the utility of these methods.

Our proposed work will enable geolocating UAS systems in (possibly out-of-date) base maps from different modalities by leveraging recent developments in deep learning. In other words, we seek to develop algorithms capable of learning common, cross-modal, semantic representations linking UAS sensor data and base maps, enabling localization that leverages all available data sources.
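To make the registration idea above concrete, the sketch below shows how a georeferenced pose estimate can follow from matched points between a UAS frame and a base map: given a few keypoint correspondences, a 2D affine transform fit by least squares projects any UAS pixel into map (geo) coordinates. This is a minimal illustration, not the proposed method; the function names and the toy correspondences are hypothetical.

```python
import numpy as np

def fit_affine(uas_pts, map_pts):
    """Least-squares 2D affine transform mapping UAS image coordinates
    to georeferenced base-map coordinates, from matched keypoints."""
    n = len(uas_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = uas_pts   # rows for the map x-coordinate
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = uas_pts   # rows for the map y-coordinate
    A[1::2, 5] = 1.0
    b = np.asarray(map_pts, dtype=float).ravel()
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)

def to_map(T, pt):
    """Project a UAS pixel into base-map (geo) coordinates."""
    return T @ np.append(pt, 1.0)

# Toy example: four matched landmarks relating the two frames
# (hypothetical numbers: a 0.5 scale plus a (10, 20) offset).
uas = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
geo = np.array([[10, 20], [60, 20], [10, 70], [60, 70]], dtype=float)
T = fit_affine(uas, geo)
print(to_map(T, np.array([50.0, 50.0])))  # frame center -> [35. 45.]
```

In practice the correspondences would come from a feature matcher, and a robust estimator (e.g., RANSAC) would reject outliers, but the core step of solving for a geometric transform is the same.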
We envision investigating modalities such as synthetic-aperture radar (SAR), electro-optical (EO), audio, and magnetic data, as well as varying perspectives (e.g., nadir vs. oblique views for imagers). As opposed to algorithms that learn low-level visual/3D features, which are unlikely to generalize across modalities, we will investigate algorithms capable of extracting higher-level semantic representations (e.g., road networks, water sources) across modalities. We will also target architectures capable of learning to use the 3D layout of the scene as the UAS navigates the environment.
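The intuition behind semantic (rather than low-level) representations can be sketched in a few lines: if both a UAS view and each base-map tile are reduced to a layout of semantic classes (road, water, building, ...), a simple descriptor over those classes can match across modalities even when raw pixel appearance differs. The descriptor, class labels, and tiles below are all hypothetical toy stand-ins for the learned representations the project targets.

```python
import numpy as np

def semantic_descriptor(label_map, num_classes=4):
    """Normalized histogram over semantic classes (e.g., road, water,
    building, vegetation) -- a crude modality-agnostic descriptor."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / counts.sum()

def localize(query_labels, tile_labels, num_classes=4):
    """Index of the base-map tile whose semantic layout best matches
    the UAS view, by cosine similarity in descriptor space."""
    q = semantic_descriptor(query_labels, num_classes)
    best, best_sim = -1, -np.inf
    for i, tile in enumerate(tile_labels):
        t = semantic_descriptor(tile, num_classes)
        sim = (q @ t) / (np.linalg.norm(q) * np.linalg.norm(t) + 1e-9)
        if sim > best_sim:
            best, best_sim = i, sim
    return best

# Toy example: three 8x8 tiles, each dominated by one semantic class.
tiles = [np.full((8, 8), c) for c in range(3)]
query = tiles[1].copy()  # UAS view over tile 1 ...
query[0, :] = 2          # ... with one row of mislabeled pixels
print(localize(query, tiles))  # -> 1
```

A learned system would replace the hand-built histogram with embeddings trained so that co-located SAR, EO, audio, or magnetic observations map to nearby points, but the matching step retains this nearest-neighbor structure.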
Effective start/end date: 6/1/20 – 5/31/22

