392219 Machine Learning in Bioinformatics (Lecture) (Winter Semester 2016/2017)

This course was cancelled!

Content, comments

Soft Computing for Bioinformatics: Data Mining with Application to Flow Cytometry Data Using R

Big Data is a growing challenge across technological and disease domains, for both fundamental research and clinical applications. Flow cytometry data is one example: the technology was developed 50 years ago, and the complexity of its output has recently increased significantly. A series of lectures and hands-on tutorials will use flow cytometry data as the foundation for illustrating how researchers have approached recent challenges in clustering, classification and data processing through the development of an ecosystem of about 40 R/Bioconductor packages.

This lecture series has three goals: 1) to introduce fundamental concepts in machine learning and data mining, 2) to motivate research questions and challenges that arise in flow cytometry, an important field of bioinformatics, and 3) to give participants some familiarity with the open source software environment R as an analysis tool for flow cytometry (FCM) data as they explore the fundamental concepts of taking their data to diagnosis and discovery.

Participants will learn how to work from within R to interact with the file system, explore their data using various visualizations, perform simple automated pre-processing tasks, quickly generate quality assurance reports that identify potential technical issues with the data, and automate simple analysis strategies. In addition, several dimension reduction techniques and supervised and unsupervised clustering techniques (e.g., based on k-means, spectral clustering, Gaussian mixture models, density, binning, and t-distributed stochastic neighbor embedding) will be discussed and used, along with advanced analysis options. Students will come away with a better understanding of bioinformatics approaches to data analysis using this widely used statistical computing language, and will be able to write their own R scripts to explore and analyze data and to automate tasks.

High performance computing using R, reproducible research, and the packaging of R software for easy use by non-programming collaborators will also be explored.

Modules will include:

Data Mining Theory

  • Fundamental concepts of machine learning / data mining. Example machine learning problem. How and when to use machine learning? What are important formalizations of learning problems? What are the relevant steps for the learning pipeline?
  • Classification and regression. What is a classification problem? What is a regression problem? Exemplary classification / regression models such as linear regression, kNN, and Gaussian processes, and their differences in modelling the data.
  • Overfitting and regularization. The bias-variance dilemma and the role of regularization. Effects of boosting and bagging.
  • Unsupervised learning. How to formalize unsupervised problems? Examples of successful models such as dimensionality reduction, sparse coding, independent component analysis, and clustering.
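
As an illustration of the clustering topic named above, the following is a minimal sketch of k-means (Lloyd's algorithm). It is not course material: the course itself works in R, whereas this sketch uses plain Python with no libraries, and the function names (`kmeans`, `nearest_index`, `centroid`, `squared_distance`) and the choice to pass the starting centroids explicitly are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point (ties: lowest index)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, initial_centroids, max_iterations=100):
    """Run Lloyd's algorithm from the given starting centroids.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # nothing moved, so assignments are stable
            break
        centroids = updated
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels
```

The result depends on the starting centroids: Lloyd's iteration only finds a local optimum, so practical implementations usually run it from several starting points and keep the best result.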

R-Tutorial

  • Fundamentals of R programming: R main data and control structures, plots
  • Statistics with R: probability distributions, confidence intervals, p values, simple tests

Analysis of Flow Cytometry data

  • Introduction to Flow Cytometry Analysis in R. In this lecture, students will be introduced to the framework for high-throughput data analysis within R and to its advantages and disadvantages, illustrated with examples from recent papers.
  • Exploring FCM data in R. Students will gain hands-on experience working with real flow cytometry data within R.
  • Preprocessing and Quality Assurance of FCM Data. Students will gain hands-on experience with exploratory data analysis and pre-processing, including normalization.
  • Supervised clustering. A combination of lecture and hands-on learning will introduce supervised clustering approaches.
  • Unsupervised clustering and classification. A combination of lecture and hands-on learning will introduce a wide spectrum of clustering approaches as they have been used for flow cytometry data analysis, and will cover how to evaluate the results of these tools for biomarker identification and patient diagnosis.
  • Using R for high performance computing and reproducible research, and how to deploy interactive software for wet-lab end users. Big data requires big computers, so the use of R within a high performance computing framework will be explored. Students will be introduced to the importance of reproducible research in data analysis and to the mechanisms available within R for integrating it into their own work. Finally, much bioinformatics work involves collaboration with scientists who are not programmers, and ways to enable these vital interactions will be explored.

Instructors

Subject assignments

Module: 39-Inf-AKS Anwendungen Kognitiver Systeme
Course: Maschinelles Lernen im Web or Modern Data Analysis or Softcomputing für die Bioinformatik
Requirements: graded examination

The binding module descriptions contain further information, including details of the "requirements" and what they entail. Where several "forms of assessment" are possible, the respective instructors decide which one applies.


Number registered: 23
This is the number of students who have saved the course in their timetable. In parentheses: the number of users registered via guest accounts.
Last change to basic data/instructors: Thursday, 23 February 2017
Last change to times: Thursday, 23 February 2017
Last change to rooms: Thursday, 23 February 2017
Type(s) / weekly contact hours (SWS): Lecture (V) / 2
Language: This course is taught entirely in English
Institution: Technische Fakultät (Faculty of Technology)
Link to this course (uses the course ID and is always unique):
https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=78460631