392234 Project: Teaching Robots Through Human Preferences: Rapid Task Adaption with a 3D-printed Arm (Pj) (WiSe 2025/2026)

Contents, comment

Robots often struggle with new tasks because explicitly defining rewards, that is, specifying what the robot should or shouldn't do, is challenging. Humans, however, can easily tell good behavior from bad just by watching short video clips of a robot performing a task. Preference learning leverages this intuition: from a small set of such human preferences, a robot can efficiently learn new tasks without an explicitly defined reward. This project investigates how efficiently a useful reward signal can be learned from a small set of human-labeled video comparisons. Starting from demonstrations of a "right-handed" drawer-opening task, you will collect labeled preference examples to quickly adapt the robot to an unseen, mirrored "left-handed" drawer task. The learned preference model (following the PEBBLE approach) will be integrated into an existing off-policy RL algorithm, allowing rapid policy fine-tuning.
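To make the idea of learning a reward from pairwise comparisons more concrete, here is a minimal sketch in plain Python. It uses the Bradley-Terry preference model with a cross-entropy loss, which is the formulation PEBBLE builds on. Everything else is an assumption made for illustration: the linear reward model, the two-dimensional toy state features, the simulated "human" labels, and all function names are hypothetical and are not part of the project's code base.

```python
import math
import random


def segment_features(segment):
    """Sum each state feature over the time steps of one segment."""
    return [sum(step[k] for step in segment) for k in range(len(segment[0]))]


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b.

    With a linear reward r(s) = w . s, the summed reward of a segment
    equals w . (sum of its state features).
    """
    fa, fb = segment_features(seg_a), segment_features(seg_b)
    diff = sum(w * (a - b) for w, a, b in zip(weights, fa, fb))
    diff = max(-30.0, min(30.0, diff))  # clamped for numerical safety
    return 1.0 / (1.0 + math.exp(-diff))


def train_reward_model(pairs, labels, dim, lr=0.1, epochs=200):
    """Fit linear reward weights by SGD on the cross-entropy loss.

    labels[i] is 1.0 if the first segment of pairs[i] was preferred,
    and 0.0 otherwise.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(weights, seg_a, seg_b)
            fa, fb = segment_features(seg_a), segment_features(seg_b)
            for k in range(dim):
                # Gradient of the cross-entropy w.r.t. w_k is (p - y) * (fa_k - fb_k).
                weights[k] -= lr * (p - y) * (fa[k] - fb[k])
    return weights


def make_toy_data(n_pairs=50, seg_len=5, seed=0):
    """Simulate preference labels from a hidden 'true' reward (assumption)."""
    rng = random.Random(seed)
    true_w = [1.0, -0.5]  # hypothetical ground-truth reward weights
    pairs, labels = [], []
    for _ in range(n_pairs):
        seg_a = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(seg_len)]
        seg_b = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(seg_len)]
        ret_a = sum(w * f for w, f in zip(true_w, segment_features(seg_a)))
        ret_b = sum(w * f for w, f in zip(true_w, segment_features(seg_b)))
        pairs.append((seg_a, seg_b))
        labels.append(1.0 if ret_a > ret_b else 0.0)
    return pairs, labels


if __name__ == "__main__":
    pairs, labels = make_toy_data()
    weights = train_reward_model(pairs, labels, dim=2)
    print("learned reward weights:", weights)
```

In the project itself, the reward model would more plausibly be a small neural network trained in PyTorch on real annotated clips, and its output would replace the environment reward inside the off-policy RL algorithm. The sketch only shows the shape of the loss that turns pairwise labels into a reward signal.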

This project is well suited to students interested in human-robot interaction, RL, and reward learning. You will gain practical experience with state-of-the-art methods and directly explore how minimal human guidance improves the efficiency of robot learning. Where applicable, your results can be published as a short benchmark note or as an appendix to an existing paper.

For more details or to apply, feel free to contact me directly via email or in person.

Requirements for participation, required level

You should be comfortable with PyTorch and a Python-based simulation framework; familiarity with basic RL is a plus. You should also be willing to collect and annotate short video clips using the provided tools. The project provides you with a prebuilt 3D-printed WidowX arm, ready-to-use MuJoCo simulation environments, baseline RL implementations, and all necessary computational resources (though bringing your own GPU is a plus).

Teaching staff

Dates

Frequency Weekday Time Format / Place Period  
by appointment n.V.   13.10.2025-06.02.2026

Subject assignments

Module: 39-M-Inf-P Projekt
Course: Projekt
Requirements: Ungraded examination
Student information

The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) they assign the students.


Enable rapid adaptation of a simulated robot to a new task using only minimal human preference feedback based on state-of-the-art human-in-the-loop RL methods.

No eLearning offering available
Address:
WS2025_392234@ekvv.uni-bielefeld.de
This address can be used by teaching staff, their secretaries' offices, and the individuals in charge of course data maintenance to send emails to the course participants. IMPORTANT: All sent emails must be activated. Wait for the activation email and follow the instructions given there.
If the reference number is used for several courses during the semester, use the following alternative address to reach the participants of exactly this course: VST_568320566@ekvv.uni-bielefeld.de
Last update basic details/teaching staff:
Sunday, June 15, 2025 
Last update times:
Sunday, June 15, 2025 
Last update rooms:
Sunday, June 15, 2025 
Type(s) / SWS (hours per week per semester)
project (Pj) / 2
Department
Faculty of Technology
Links to this course
If you want to set links to this course page, please use one of the following links. Do not use the link shown in your browser!
The following link includes the course ID and is always unique:
https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=568320566
ID
568320566