- Short Description
The project compares different learning algorithms, specifically CMA-ES (covariance matrix adaptation evolution strategy) and Bayesian optimization, with respect to performance (and potentially user experience). For the evaluation, the Pepper ball-in-cup setup described in the scenario will be used together with an external camera tracking setup that determines the cost of a rollout. One motivation for this project is robot skill learning from inexperienced users. Usually, these optimization approaches take quantitative feedback (as obtained from the camera setup). Within the scope of this project, they should be extended to cope with qualitative, relational feedback from users, which expresses a binary preference relation between a pair of consecutive trials (an extension toward dueling bandit/preference learning approaches).
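The relational feedback described above can be illustrated with a minimal sketch. It is not CMA-ES, Bayesian optimization, or a dueling bandit algorithm, only a plain stochastic hill climber that never sees a cost value and relies solely on a binary "is the new trial better than the previous one?" answer. The functions `rollout_cost`, `prefers_new`, and `preference_search`, the two-dimensional parameter vector, and the target values are all hypothetical; in the project the answer would come from a human watching Pepper, not from a hidden cost function.

```python
import random


def rollout_cost(params):
    """Hypothetical stand-in for the camera-based cost of one rollout."""
    target = [0.3, -0.2]  # assumed optimum, for illustration only
    return sum((p - t) ** 2 for p, t in zip(params, target))


def prefers_new(prev_params, new_params):
    """Simulated user: True if the new trial looks better than the previous one.

    The learner only receives this binary answer, never the cost itself.
    """
    return rollout_cost(new_params) < rollout_cost(prev_params)


def preference_search(start, n_trials=200, step=0.1, seed=0):
    """Hill climbing driven purely by pairwise preferences."""
    rng = random.Random(seed)
    current = list(start)
    for _ in range(n_trials):
        # Perturb the current parameters with Gaussian noise.
        candidate = [p + rng.gauss(0.0, step) for p in current]
        # Keep the candidate only if the (simulated) user prefers it.
        if prefers_new(current, candidate):
            current = candidate
    return current


start = [1.0, 1.0]
final = preference_search(start)
print("start cost:", rollout_cost(start))
print("final cost:", rollout_cost(final))
```

Because a candidate is accepted only when it is preferred, the hidden cost can never increase, which makes this a convenient baseline for checking a feedback interface before plugging in a proper preference learning method.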
Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the ekVV will only be regarded as an expression of interest; it will not secure team membership.
Please get in touch with the supervisors for information on the application procedure.
-Requirements and Goals of this project:
Get familiar with optimization algorithms (CMA-ES and Bayesian optimization) and evaluate which performs best on the given task.
Implement a setup that interactively optimizes the robot’s skill learning with user feedback using Bayesian optimization.
Extend the existing framework to a dueling preference learning approach (with binary preference relations) and compare this approach to the standard approaches. Evaluate the framework in a user study.
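One possible way to organize the comparison named in these goals is a small harness that gives every optimizer the same rollout budget and the same random seeds, then reports the mean and standard deviation of the best cost found. The sketch below is an assumption about how such an evaluation could look, not the project's framework. `toy_cost`, `random_search`, `hill_climb`, and `compare` are hypothetical names, and the two optimizers are simple stand-ins that would be replaced by actual CMA-ES and Bayesian optimization implementations.

```python
import random
import statistics


def toy_cost(params):
    # Hypothetical stand-in for the tracked cost of a rollout (lower is better).
    return sum((p - 0.5) ** 2 for p in params)


def random_search(cost_fn, budget, rng, dim=2):
    # Stand-in optimizer 1: uniform samples in [-1, 1], one rollout per sample.
    return min(
        cost_fn([rng.uniform(-1.0, 1.0) for _ in range(dim)])
        for _ in range(budget)
    )


def hill_climb(cost_fn, budget, rng, dim=2, step=0.1):
    # Stand-in optimizer 2: Gaussian perturbations, keep only improvements.
    current = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    current_cost = cost_fn(current)  # the initial rollout counts toward the budget
    for _ in range(budget - 1):
        candidate = [p + rng.gauss(0.0, step) for p in current]
        candidate_cost = cost_fn(candidate)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
    return current_cost


def compare(optimizers, cost_fn, budget=50, n_runs=20):
    # Identical budget and seeds for every optimizer keep the comparison fair.
    results = {}
    for name, optimizer in optimizers.items():
        costs = [
            optimizer(cost_fn, budget, random.Random(seed))
            for seed in range(n_runs)
        ]
        results[name] = (statistics.mean(costs), statistics.stdev(costs))
    return results


results = compare(
    {"random_search": random_search, "hill_climb": hill_climb}, toy_cost
)
for name, (mean_cost, std_cost) in results.items():
    print(f"{name}: best cost {mean_cost:.4f} +/- {std_cost:.4f}")
```

On the real robot each call to the cost function is one physical rollout, so the budget would be far smaller than in simulation, and a user study would add measures such as perceived effort alongside the cost statistics.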
Required skills:
- Machine learning, Python
Module | Course | Requirements | |
---|---|---|---|
39-M-Inf-GP Grundlagenprojekt Intelligente Systeme | Group project | Ungraded examination | Student information |
The binding module descriptions contain further information, including details on the "Leistungen" (requirements) and what they entail. If several "Leistungsformen" (forms of assessment) are possible, the respective instructors decide which one applies.
-Scenario: Naive users teach Pepper to catch a ball in a cup. The scenario underlying the proposed Master's thesis involves the humanoid robot platform Pepper learning the game of skill “ball in cup” (bilboquet); see https://www.youtube.com/watch?v=jkaRO8J_1XI
-Problem Statement: How can non-expert users provide Pepper with adequate feedback about its skill success?
-Idea: Compare a qualitative-relational teaching approach to a quantitative feedback teaching approach.
-Research Question: Does a quantitative approach facilitate faster robot learning and/or easier human teaching?