Robots often struggle to learn new tasks because explicitly defining rewards (what the robot should or should not do) is challenging. However, humans can easily tell good behavior from bad just by watching short video clips of a robot performing a task. Preference learning leverages this intuition: from a small set of human preference labels, a robot can learn new tasks efficiently even without an explicitly defined reward. This project investigates how efficiently a useful reward signal can be learned from a small set of human-labeled video comparisons. Starting from demonstrations of a "right-handed" drawer-opening task, you will collect labeled preference examples to quickly adapt the robot to an unseen, mirrored "left-handed" drawer task. The learned preference model (following the PEBBLE approach) will be integrated into an existing off-policy RL algorithm, enabling rapid policy fine-tuning.
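To make the idea concrete, here is a minimal sketch of PEBBLE-style reward learning from pairwise preferences in PyTorch: a small network predicts per-step rewards, and a Bradley-Terry cross-entropy loss fits it to human labels over pairs of trajectory segments. All names (`RewardNet`, `preference_loss`, the segment shapes) are illustrative assumptions, not the project's actual codebase.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a state-action vector to a scalar reward estimate."""
    def __init__(self, obs_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def preference_loss(reward_net, seg_a, seg_b, pref):
    """Bradley-Terry loss over a batch of segment pairs.

    seg_a, seg_b: (batch, steps, obs_act_dim) trajectory segments.
    pref: (batch,) labels, 1.0 if the human preferred segment A, else 0.0.
    """
    # Sum predicted per-step rewards into a return for each segment.
    ret_a = reward_net(seg_a).sum(dim=1).squeeze(-1)
    ret_b = reward_net(seg_b).sum(dim=1).squeeze(-1)
    # P(A preferred) = sigmoid(return_A - return_B); cross-entropy vs. label.
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, pref)

# Tiny smoke run on random data in place of real labeled clips.
net = RewardNet(obs_act_dim=8)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
seg_a = torch.randn(4, 10, 8)
seg_b = torch.randn(4, 10, 8)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = preference_loss(net, seg_a, seg_b, labels)
loss.backward()
opt.step()
```

In the full pipeline, the trained `reward_net` would replace the environment reward inside the off-policy RL algorithm, so the policy fine-tunes against the human-derived signal rather than a hand-designed one.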
This is an ideal project for students interested in human-robot interaction, RL, and reward learning. You will gain hands-on experience with state-of-the-art methods and directly explore how minimal human guidance improves robotic learning efficiency. Where applicable, your results can be published as a short benchmark note or as an appendix to an existing paper.
For more details or to apply, feel free to contact me directly via email or in person.
You should be comfortable with PyTorch and a Python-based simulation framework; familiarity with basic RL is a plus. You should also be willing to collect and annotate short video clips using the provided tools. The project provides a prebuilt 3D-printed WidowX arm, ready-to-use MuJoCo simulation environments, baseline RL implementations, and all necessary computational resources (though bringing your own GPU is a plus).
Frequency | Weekday | Time | Format / Place | Period
---|---|---|---|---
by appointment | n.V. (by arrangement) | | | 13.10.2025-06.02.2026
Module | Course | Requirements
---|---|---
39-M-Inf-P Projekt | Projekt | Ungraded examination
Student information
The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) they assign the students.
Enable rapid adaptation of a simulated robot to a new task using only minimal human preference feedback, building on state-of-the-art human-in-the-loop RL methods.