Great progress in robot learning, driven by transformers and natural language.
Researchers at Google have developed the Robotics Transformer 1 (RT-1), a machine learning model designed to help robots learn from large and diverse datasets.
RT-1, built on a transformer architecture, takes as input a short history of images from a robot’s camera together with a task description expressed in natural language, and directly outputs tokenized actions such as motor commands.
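To make the action outputs token-compatible, RT-1 discretizes each continuous action dimension into 256 uniform bins. Below is a minimal sketch of that kind of action tokenization, assuming illustrative variable names and action ranges; the bin count matches the paper, but this is not the authors’ implementation.

```python
import numpy as np

NUM_BINS = 256  # per-dimension bin count used in the RT-1 paper

def tokenize_action(action, low, high):
    """Map continuous action values to integer token ids via uniform binning."""
    action = np.clip(action, low, high)
    # Normalize to [0, 1], then scale to bin indices 0..NUM_BINS-1.
    frac = (action - low) / (high - low)
    return np.minimum((frac * NUM_BINS).astype(int), NUM_BINS - 1)

def detokenize_action(tokens, low, high):
    """Recover approximate continuous values from token ids (bin centres)."""
    return low + (tokens + 0.5) / NUM_BINS * (high - low)

# Hypothetical example: a 3-DoF end-effector position delta in [-0.1, 0.1] m.
low, high = np.full(3, -0.1), np.full(3, 0.1)
action = np.array([0.05, -0.02, 0.0])
tokens = tokenize_action(action, low, high)
print(tokens)                                # e.g. [192  102  128]
print(detokenize_action(tokens, low, high))  # close to the original action
```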
RT-1 is trained on a real-world robotics dataset of 130,000 episodes covering over 700 tasks, and exhibits improved zero-shot generalisation to new tasks and environments compared with previous techniques.
RT-1 also compresses its image tokens with a TokenLearner module, which adaptively selects soft combinations of tokens based on their importance, yielding a more than 2.4x inference speed-up.
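A minimal sketch of that style of soft token compression is below. It assumes random weights standing in for TokenLearner’s learned attention network, and mirrors RT-1’s reported shapes (81 image tokens reduced to 8); it is an illustration of the idea, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def token_learner(tokens, num_out=8, w=None):
    """Compress (n, d) input tokens to (num_out, d) via soft attention weights.

    Each output token is a weighted combination of all inputs, with the
    per-output weights normalized (softmax) over the n input positions.
    """
    n, d = tokens.shape
    if w is None:
        w = rng.normal(size=(d, num_out))  # stands in for a learned network
    logits = tokens @ w                    # (n, num_out) selection scores
    attn = np.exp(logits - logits.max(axis=0))
    attn /= attn.sum(axis=0, keepdims=True)
    return attn.T @ tokens                 # (num_out, d) compressed tokens

image_tokens = rng.normal(size=(81, 512))  # e.g. a 9x9 patch grid of features
compressed = token_learner(image_tokens)
print(compressed.shape)                    # (8, 512)
```

Because the downstream transformer attends over 8 tokens per image rather than 81, its per-step compute drops sharply, which is where the inference speed-up comes from.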