The Talking Robot: Distortion-Robust Acoustic Models for Robot-Robot Communication
Hanlong Li, Karishma Kamalahasan, Jiahui Li, Kazuhiro Nakadai, Shreyas Kousik
- Year
- 2026
- Access
- Open access
Abstract
We present Artoo, a learned acoustic communication system for robots that replaces hand-designed signal processing with end-to-end co-trained neural networks. Our system pairs a lightweight text-to-speech (TTS) transmitter (1.18M parameters) with a conformer-based automatic speech recognition (ASR) receiver (938K parameters), jointly optimized through a differentiable channel. Unlike human speech, robot-to-robot communication is paralinguistics-free: the system need not preserve timbre, prosody, or naturalness, only maximize decoding accuracy under channel distortion. Through a three-phase co-training curriculum, the TTS transmitter learns to produce distortion-robust acoustic encodings that surpass the baseline under noise, achieving 8.3% CER at 0 dB SNR. The entire system requires only 2.1M parameters (8.4 MB) and runs in under 13 ms end-to-end on a CPU, making it suitable for deployment on resource-constrained robotic platforms.
Keywords
Related papers
The Organization of Behavior
D. O. Hebb
2005
Fractional Brownian Motions, Fractional Noises and Applications
Benoît B. Mandelbrot, John W. Van Ness
1968
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi +7 more
2021
A guide to deep learning in healthcare
Andre Esteva, Alexandre Robicquet, Bharath Ramsundar +7 more
2018