The goal of multimodal affect recognition is to predict affect (i.e., a basic sense of feeling) from multimodal data such as videos, which contain audio, visual, and textual modalities. State-of-the-art methods for affect recognition increasingly rely on deep learning. These methods, however, often use pre-extracted features for each modality. By pre-extracting features, models miss out on the representation learning that deep learning affords. In this project, you will implement multimodal models using an end-to-end approach (i.e., methods that operate on raw data instead of pre-extracted features) and compare this approach to existing multimodal methods that use pre-extracted features.
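To make the end-to-end idea concrete, here is a minimal, illustrative PyTorch sketch of a model that learns its own representations from raw audio, raw frames, and raw token ids, and fuses them for affect classification. The encoder choices, tensor shapes, and fusion by concatenation are assumptions for demonstration only, not the architecture prescribed by the project.

```python
# Illustrative sketch only: a minimal end-to-end multimodal affect model in PyTorch.
# All layer sizes and input shapes below are assumed for demonstration purposes.
import torch
import torch.nn as nn


class EndToEndAffectModel(nn.Module):
    """Learns modality representations from raw inputs instead of pre-extracted features."""

    def __init__(self, vocab_size=10000, num_classes=7, hidden=128):
        super().__init__()
        # Audio: 1-D convolutions over the raw waveform (batch, 1, samples).
        self.audio_enc = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(32, hidden, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # Visual: 2-D convolutions over a raw RGB frame (batch, 3, H, W); a full video
        # would additionally be pooled over time.
        self.visual_enc = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, hidden, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Text: learned embeddings over transcript token ids (batch, seq_len), mean-pooled.
        self.text_emb = nn.Embedding(vocab_size, hidden)
        # Late fusion by concatenation, followed by a classifier over affect labels.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, waveform, frame, token_ids):
        a = self.audio_enc(waveform)              # (batch, hidden)
        v = self.visual_enc(frame)                # (batch, hidden)
        t = self.text_emb(token_ids).mean(dim=1)  # (batch, hidden)
        return self.classifier(torch.cat([a, v, t], dim=-1))


if __name__ == "__main__":
    model = EndToEndAffectModel()
    logits = model(
        torch.randn(2, 1, 16000),                 # 1 s of raw audio at 16 kHz
        torch.randn(2, 3, 64, 64),                 # one raw RGB frame per clip
        torch.randint(0, 10000, (2, 20)),          # 20 transcript token ids
    )
    print(logits.shape)                            # torch.Size([2, 7])
```

A feature-based baseline, by contrast, would replace the three encoders with fixed, pre-extracted feature vectors and train only the fusion and classification layers.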
Required skills:
• Python
• PyTorch preferred (TensorFlow is also ok)
• Experience implementing deep learning models
In case this proposal does not attract enough students for a team project, I would adapt it (in a reduced/modified form) into
• an individual project, or
• a project for only 2 students.
| Module | Course | Requirements |
|---|---|---|
| 39-M-Inf-GP Grundlagenprojekt Intelligente Systeme | further project | Ungraded examination |
Student information
The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) to assign to the students.