Pretrained "reward models" (PRMs) promise to estimate task progress directly from observations (e.g., videos), which could reduce the need for hand-designed rewards in reinforcement learning (RL). In this project, you will evaluate an existing PRM on a simulated robotic manipulation task: does it track progress and predict success zero-shot, and where does it fail? If the signal is usable, you will design a simple policy improvement experiment that uses the reward model as-is (zero-shot), for example by ranking, filtering, or weighting trajectories, and compare against naive supervised baselines. The project is simulation-only and focuses on careful evaluation and clean experimentation. You will be given a mature codebase for the simulation, training loops, and utilities. Requirements: solid Python skills and comfort working with existing ML code and datasets. Prior RL / imitation learning experience is helpful but not required.
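To make the "ranking / filtering" idea concrete, here is a minimal sketch of using a reward model's score as-is to select trajectories for supervised training. Everything here is illustrative: the `Trajectory` container, the `score` callable standing in for the PRM, and the choice to score only the final observation are assumptions, not part of the provided codebase.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical container: per-step observations (e.g., frames) and actions."""
    observations: list
    actions: list


def final_progress(traj, score):
    """Zero-shot progress estimate: the reward model's score of the last
    observation. Other aggregations (mean score, score slope over time)
    are equally plausible choices to evaluate."""
    return score(traj.observations[-1])


def filter_top_k(trajectories, score, k):
    """Rank trajectories by predicted progress and keep the top-k,
    e.g., as a filtered dataset for a behavior-cloning baseline."""
    ranked = sorted(trajectories, key=lambda t: final_progress(t, score),
                    reverse=True)
    return ranked[:k]


# Toy usage with a stand-in "reward model" that just reads off a scalar
# observation; a real PRM would consume images or video clips.
trajs = [
    Trajectory(observations=[0.1, 0.4], actions=[0, 1]),
    Trajectory(observations=[0.2, 0.9], actions=[1, 0]),
    Trajectory(observations=[0.0, 0.3], actions=[0, 0]),
]
best = filter_top_k(trajs, score=lambda obs: obs, k=2)
```

A trajectory-weighting variant would instead keep all data and weight each sample's loss by (a monotone function of) the same score; comparing both against an unfiltered baseline is exactly the kind of clean experiment the project asks for.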
| Frequency | Weekday | Time | Format / Place | Period |
|---|---|---|---|---|
| by appointment | by arrangement | | | 13.04.–24.07.2026 |
| Module | Course | Requirements | Student information |
|---|---|---|---|
| 39-M-Inf-P Project | Project | Ungraded examination | |
The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) to assign to the students.