ConceptACT: Episode-Level Concepts for Sample-Efficient Robotic Imitation Learning
Jakob Karalus, Friedhelm Schwenker
- Year
- 2026
- Access
- Open access
Abstract
Imitation learning enables robots to acquire complex manipulation skills from human demonstrations, but current methods rely solely on low-level sensorimotor data while ignoring the rich semantic knowledge humans naturally possess about tasks. We present ConceptACT, an extension of Action Chunking with Transformers that leverages episode-level semantic concept annotations during training to improve learning efficiency. Unlike language-conditioned approaches that require semantic input at deployment, ConceptACT uses human-provided concepts (object properties, spatial relationships, task constraints) exclusively during demonstration collection, adding minimal annotation burden. We integrate concepts using a modified transformer architecture in which the final encoder layer implements concept-aware cross-attention, supervised to align with human annotations. Through experiments on two robotic manipulation tasks with logical constraints, we demonstrate that ConceptACT converges faster and achieves superior sample efficiency compared to standard ACT. Crucially, we show that architectural integration through attention mechanisms significantly outperforms naive auxiliary prediction losses or language-conditioned models. These results demonstrate that properly integrated semantic supervision provides powerful inductive biases for more efficient robot learning.
Keywords
Related papers
Real-Time Obstacle Avoidance for Manipulators and Mobile Robots
Oussama Khatib
1986
A Mathematical Introduction to Robotic Manipulation
Richard M. Murray, Zexiang Li, Shankar Sastry
2017
Robot dynamics and control
Mark W. Spong
1989
A tutorial on visual servo control
Seth Hutchinson, Gregory D. Hager, Peter Corke
1996