Pretrained "reward models" (PRMs) promise to estimate task progress directly from observations (e.g., videos), which could reduce the need for hand-designed rewards in reinforcement learning (RL). In this project, you will evaluate an existing PRM on a simulated robotic manipulation task: does it track progress and predict success zero-shot, and where does it fail? If the signal is usable, you will design a simple policy improvement experiment that uses the reward model as-is (zero-shot), for example by ranking, filtering, or weighting trajectories, and compare against naive supervised baselines. The project is simulation-only and focuses on careful evaluation and clean experimentation. You will be given a mature codebase for the simulation, training loops, and utilities. Requirements: solid Python skills and comfort working with existing ML code and datasets. Prior RL / imitation learning experience is helpful but not required.
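To make the "ranking / filtering" idea concrete, here is a minimal sketch of using a reward model's score as-is to select trajectories for supervised training. Everything here is illustrative: the `Trajectory` container, the `score` callable standing in for the PRM, and the choice to score only the final observation are assumptions, not part of the provided codebase.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical container: per-step observations (e.g., frames) and actions."""
    observations: list
    actions: list


def final_progress(traj, score):
    """Zero-shot progress estimate: the reward model's score of the last
    observation. Other aggregations (mean score, score slope over time)
    are equally plausible choices to evaluate."""
    return score(traj.observations[-1])


def filter_top_k(trajectories, score, k):
    """Rank trajectories by predicted progress and keep the top-k,
    e.g., as a filtered dataset for a behavior-cloning baseline."""
    ranked = sorted(trajectories, key=lambda t: final_progress(t, score),
                    reverse=True)
    return ranked[:k]


# Toy usage with a stand-in "reward model" that just reads off a scalar
# observation; a real PRM would consume images or video clips.
trajs = [
    Trajectory(observations=[0.1, 0.4], actions=[0, 1]),
    Trajectory(observations=[0.2, 0.9], actions=[1, 0]),
    Trajectory(observations=[0.0, 0.3], actions=[0, 0]),
]
best = filter_top_k(trajs, score=lambda obs: obs, k=2)
```

A trajectory-weighting variant would instead keep all data and weight each sample's loss by (a monotone function of) the same score; comparing both against an unfiltered baseline is exactly the kind of clean experiment the project asks for.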
| Frequency | Weekday | Time | Format / Place | Period |
|---|---|---|---|---|
| by appointment | by arrangement | | | 13.04.–24.07.2026 |
| Module | Course | Requirements | Student information |
|---|---|---|---|
| 39-M-Inf-P Project | Project | Ungraded examination | |
The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) to assign to the students.