Abstract
Geometrically accurate and semantically expressive map representations have proven invaluable for robot deployment and task planning in unknown environments. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments still presents open challenges, mainly due to its computational requirements. FindAnything is an open-world mapping framework that incorporates vision-language information into dense volumetric submaps. By fusing vision-language features, FindAnything combines geometric and open-vocabulary semantic information for a higher level of scene understanding. It stores open-vocabulary information efficiently by aggregating features at the object level: pixelwise vision-language features are aggregated over eSAM segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from open-vocabulary queries to 3D geometry that scales in memory as well as compute. FindAnything performs on par with the state of the art in terms of semantic accuracy while being substantially faster and more memory-efficient, allowing its deployment in large-scale environments and on resource-constrained devices, such as MAVs. We show that the real-time capabilities of FindAnything make it useful for downstream tasks, such as autonomous MAV exploration in a simulated Search and Rescue scenario.
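The object-level aggregation described above can be illustrated with a toy sketch: per-pixel vision-language features are averaged over each segment mask, so each object is stored as a single descriptor that can later be ranked against a text-query embedding by cosine similarity. This is a minimal illustration under assumed shapes, not the actual FindAnything implementation; the function names and array layout are hypothetical.

```python
import numpy as np

def aggregate_segment_features(pixel_features, segment_masks):
    """Average per-pixel features over each segment mask.

    pixel_features: (H, W, D) array of vision-language features.
    segment_masks:  list of (H, W) boolean masks, one per segment.
    Returns an (n_segments, D) array of unit-norm object descriptors,
    so only one vector per object needs to be stored in the map.
    """
    descriptors = []
    for mask in segment_masks:
        feats = pixel_features[mask]           # (n_pixels, D)
        mean = feats.mean(axis=0)
        descriptors.append(mean / np.linalg.norm(mean))
    return np.stack(descriptors)

def query_objects(descriptors, text_embedding):
    """Rank stored object descriptors against a text-query embedding
    by cosine similarity; higher scores indicate better matches."""
    t = text_embedding / np.linalg.norm(text_embedding)
    return descriptors @ t                     # (n_segments,) similarities
```

In a real pipeline the masks would come from a segmenter such as eSAM and the text embedding from a vision-language model; the highest-scoring descriptors then point back to the 3D geometry stored in the corresponding object-centric submaps.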
FindAnything produces accurate volumetric reconstructions that can be queried with natural language prompts. Ideal for non-technical users!
Thanks to its design, objects are decomposed into their smallest constituent parts, allowing for fine-grained queries!
FindAnything can be used for downstream tasks, such as autonomous exploration in a Search and Rescue scenario! Here, exploration is modulated by a natural language query (in this case "bed"), which allows the robot to prioritize exploring certain areas of the environment.
Video Presentation
BibTeX
@article{laina2025findanything,
title={FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment},
author={Laina, Sebasti{\'a}n Barbas and Boche, Simon and Papatheodorou, Sotiris and Schaefer, Simon and Jung, Jaehyung and Leutenegger, Stefan},
journal={arXiv preprint arXiv:2504.08603},
year={2025}
}