Pretrained "reward models" (PRMs) promise to estimate task progress directly from observations (e.g., videos), which could reduce the need for hand-designed rewards in reinforcement learning (RL). In this project, you will evaluate an existing PRM on a simulated robotic manipulation task: does it track progress and predict success zero-shot, and where does it fail? If the signal is usable, you will design a simple policy improvement experiment that uses the reward model as-is (zero-shot), for example by ranking, filtering, or weighting trajectories, and compare against naive supervised baselines. The project is simulation-only and focuses on careful evaluation and clean experimentation. You will be given a mature codebase covering the simulation, training loops, and utilities. Requirements: solid Python skills and comfort working with existing ML code and datasets. Prior RL / imitation learning experience is helpful but not required.
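To make the "ranking, filtering, or weighting" idea concrete, here is a minimal sketch of one such zero-shot policy improvement step: rank trajectories by a PRM score, keep the top fraction, and turn the scores into weights for weighted behavior cloning. The `prm_score` function below is a hypothetical stand-in (the real project would call the given pretrained model), and `keep_frac` is an illustrative hyperparameter, not something specified in the project description.

```python
import numpy as np

rng = np.random.default_rng(0)

def prm_score(frames):
    # Hypothetical stand-in for a pretrained reward model (PRM).
    # In the actual project this would be the PRM's progress/success
    # estimate for the trajectory; here we mock it with a frame average.
    return float(np.mean(frames))

def filter_and_weight(trajectories, keep_frac=0.5):
    """Rank trajectories by PRM score, keep the top `keep_frac` fraction,
    and return their indices plus softmax weights that could be used
    for reward-weighted behavior cloning."""
    scores = np.array([prm_score(t) for t in trajectories])
    order = np.argsort(scores)[::-1]          # best-scoring first
    k = max(1, int(len(trajectories) * keep_frac))
    kept = order[:k]
    # Softmax over the kept scores (shifted for numerical stability).
    w = np.exp(scores[kept] - scores[kept].max())
    w /= w.sum()
    return kept, w

# Toy data: 8 "trajectories" of 10 frames with 4 features each.
trajectories = [rng.random((10, 4)) for _ in range(8)]
kept, weights = filter_and_weight(trajectories, keep_frac=0.5)
```

A naive supervised baseline for comparison would simply clone all trajectories with uniform weights; the experiment then asks whether the PRM-derived selection and weighting improves the resulting policy.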
| Frequency | Day | Time | Format / Location | Period |
|---|---|---|---|---|
| by arrangement | n.v. (by arrangement) | | | 13.04.–24.07.2026 |
| Module | Course | Assessment |
|---|---|---|
| 39-M-Inf-P Projekt | Project | ungraded examination |
Study information
The binding module descriptions contain further information, including on the "assessments" and their requirements. Where several forms of assessment are possible, the respective lecturers decide which applies.