LINGO-Space

LINGO-Space: Language-Conditioned Incremental Grounding for Space

Korea Advanced Institute of Science and Technology
AAAI 2024
*Indicates the corresponding author

Abstract

We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (LINGO-Space) that accurately identifies a probabilistic distribution of the space being referred to and, leveraging configurable polar distributions, incrementally updates it given subsequent referring expressions. Our evaluations show that estimation using polar distributions enables a robot to ground locations successfully across 20 table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referred space. Finally, we demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. Code and videos are available at https://lingo-space.github.io.

Overall Architecture

The overall architecture of LINGO-Space on a tabletop manipulation task. Given a composite instruction, a graph generator provides a scene graph. A semantic parser decomposes the instruction into a structured form of relation-embedding tuples $r^{(i)}$, where $i \in \{1, \dots, M\}$. Finally, a spatial-distribution estimator incrementally updates a probabilistic distribution of locations satisfying the spatial constraints encoded in the embedding tuples.
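The incremental update described above can be illustrated with a minimal sketch. It is not the LINGO-Space estimator, which is learned from data. It only assumes that each parsed relation can be summarized by a polar likelihood around a reference object (a Gaussian over distance and a von Mises-style term over angle), and that the belief over a tabletop grid is refined by multiplying in one relation at a time and renormalizing. The function names, the grid, and all parameter values are hypothetical.

```python
import numpy as np


def polar_likelihood(xs, ys, center, mean_dist, dist_std, mean_angle, kappa):
    """Unnormalized likelihood of each grid point, in polar coordinates about `center`.

    Assumption: a Gaussian over radial distance times a von Mises-style
    term over angle. This is a hand-written stand-in, not the learned model.
    """
    dx = xs - center[0]
    dy = ys - center[1]
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)
    radial = np.exp(-0.5 * ((r - mean_dist) / dist_std) ** 2)
    # Peaks at 1 when theta == mean_angle; larger kappa means a narrower cone.
    angular = np.exp(kappa * (np.cos(theta - mean_angle) - 1.0))
    return radial * angular


def incremental_grounding(xs, ys, relations):
    """Start from a uniform belief and fold in one relation at a time."""
    belief = np.full(xs.shape, 1.0 / xs.size)
    history = []
    for rel in relations:
        belief = belief * polar_likelihood(xs, ys, **rel)
        total = belief.sum()
        if total <= 0.0:
            raise ValueError("relations are mutually inconsistent on this grid")
        belief = belief / total  # renormalize so the grid sums to 1
        history.append(belief)
    return history


# Hypothetical 1 m x 1 m tabletop discretized at 1 cm resolution.
axis = np.linspace(0.0, 1.0, 101)
xs, ys = np.meshgrid(axis, axis, indexing="xy")

# Hypothetical parser output for "right of object A and above object B".
relations = [
    dict(center=(0.3, 0.5), mean_dist=0.2, dist_std=0.1, mean_angle=0.0, kappa=4.0),
    dict(center=(0.7, 0.3), mean_dist=0.2, dist_std=0.1, mean_angle=np.pi / 2, kappa=4.0),
]

history = incremental_grounding(xs, ys, relations)
final = history[-1]
iy, ix = np.unravel_index(np.argmax(final), final.shape)
best = (float(xs[iy, ix]), float(ys[iy, ix]))
print("most likely placement:", best)
```

Each entry of `history` is the belief after one more referring expression, which mirrors the per-phrase distributions shown in the figures on this page. Multiplying likelihoods is one simple way to combine constraints, chosen here for brevity.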

Qualitative results

Task: LINGO-Space composite

Task: LINGO-Space far-unseen

Task: CLIPort packing-unseen-google

Task: SREM comp-one-step-unseen-colors

Quantitative results

Table 1. Evaluation (success score) on our 4 benchmark tasks with new predicates: close and far.

Table 2. Evaluation (success score) on 4 benchmark tasks with multiple referring expressions. Numbers in parentheses are results reported in the literature.

Real-world demonstration

Real-world demonstration of space grounding using a quadruped robot. (a) is the given instruction; (b) shows the output phrases of the semantic parser; (c) and (d) are the spatial distributions for the first and second phrases; (e) is the final spatial distribution; (f) and (g) are the robot’s navigation trajectories from different views.

BibTeX

@inproceedings{kim2024lingo,
  title={LINGO-Space: Language-Conditioned Incremental Grounding for Space},
  author={Kim, Dohyun and Oh, Nayoung and Hwang, Deokmin and Park, Daehyung},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={9},
  pages={10314--10322},
  year={2024}
}