392219 Machine Learning in Bioinformatics (V) (WiSe 2016/2017)

This course has been cancelled!

Contents, comment

Soft Computing for Bioinformatics: Data mining with application in Flow Cytometry Data using R

Big Data is a growing challenge across technological and disease domains, for both fundamental research and clinical applications. One such example is flow cytometry (FCM) data. Although the technology was developed 50 years ago, the complexity of its output has increased significantly in recent years. A series of lectures and hands-on tutorials will use flow cytometry data as the foundation to illustrate how some of the recent challenges in clustering, classification and data processing have been approached through the development, by researchers, of an ecosystem of about 40 R/Bioconductor packages.

This lecture series has three goals: 1) to introduce fundamental concepts in machine learning and data mining, 2) to motivate research questions and challenges which arise in an important field of bioinformatics, flow cytometry, and 3) to give participants some familiarity with the open source software environment R as an analysis tool for FCM data as they explore the fundamental concepts of taking their data to diagnosis and discovery.

Participants will learn how to interact with the file system from within R, explore their data using various visualizations, perform simple automated pre-processing tasks, quickly generate quality assurance reports to identify potential technical issues with the data, and automate simple analysis strategies. In addition, several dimension reduction techniques and supervised and unsupervised clustering techniques (e.g., based on k-means, spectral clustering, Gaussian mixture models, density, binning, t-distributed stochastic neighbor embedding) will be discussed and used, along with advanced analysis options. Students will come away with a better understanding of bioinformatics approaches to data analysis using this widely used statistical computing language, and will be able to write their own R scripts to explore and analyze data and to automate tasks.

High performance computing using R, reproducible research, and the packaging of R software for easy use by non-programming collaborators will also be explored.
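
As a rough illustration of the k-means clustering mentioned above, the following minimal sketch separates two synthetic event populations by a single intensity value. It is written in Python purely for illustration (the course itself uses R), and the function name `kmeans_1d` and the "dim"/"bright" intensities are hypothetical, not course material or real FCM data.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns the sorted centroids."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct data points.
    centroids = rng.sample(values, k)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


# Synthetic, made-up intensities: a "dim" and a "bright" population.
data_rng = random.Random(42)
dim = [data_rng.gauss(100.0, 10.0) for _ in range(200)]
bright = [data_rng.gauss(500.0, 30.0) for _ in range(200)]

centroids = kmeans_1d(dim + bright, k=2)
print(centroids)
```

With two well-separated populations the centroids should settle near the two population means; real cytometry data is multi-dimensional and usually needs transformation and pre-processing first.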

Modules will include:

Data Mining Theory

  • Fundamental concepts of machine learning / data mining. Example machine learning problem. How and when to use machine learning? What are important formalizations of learning problems? What are the relevant steps for the learning pipeline?
  • Classification and regression. What is a classification problem? What is a regression problem? Exemplary classification / regression models such as linear regression, kNN and Gaussian processes, and their differences in modelling the data.
  • Overfitting and regularization. The bias-variance dilemma and the role of regularization. Effects of boosting and bagging.
  • Unsupervised learning. How to formalize unsupervised problems? Examples of successful models such as dimensionality reduction, sparse coding, independent component analysis, and clustering.
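
To make the classification topic above concrete, here is a minimal sketch of a k-nearest-neighbour (kNN) classifier, again in Python for illustration only. The function `knn_predict` and the toy training points are assumptions made for this example and are not taken from the course.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    train: list of ((x, y), label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy, made-up training set with two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.9, 4.1)))
```

kNN stores the training data and defers all work to prediction time, which is one of the modelling differences from a parametric model such as linear regression.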

R-Tutorial

  • Fundamentals of R programming: R main data and control structures, plots
  • Statistics with R: probability distributions, confidence intervals, p-values, simple tests
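
As a small worked example of the confidence intervals listed above, the sketch below computes a normal-approximation interval for a sample mean. It is written in Python for illustration; the helper `mean_confidence_interval` and the sample values are hypothetical. For small samples a t-distribution based interval, such as the one R's `t.test` reports, would be the more appropriate choice.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    # Half-width: critical value times the standard error of the mean.
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width


# Made-up measurements, for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
low, high = mean_confidence_interval(sample)
print(low, high)
```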

Analysis of Flow Cytometry data

  • Introduction to Flow Cytometry Analysis in R. Through this lecture, students will be introduced to the framework for high throughput data analysis within R, and to its advantages and disadvantages, through illustrative examples from recent papers.
  • Exploring FCM data in R. Students will gain hands-on experience working with real flow cytometry data within R.
  • Preprocessing and Quality Assurance of FCM Data. Students will gain hands-on experience with exploratory data analysis and pre-processing, including normalization.
  • Supervised clustering. A combination of lecture and hands-on learning will introduce supervised clustering approaches.
  • Unsupervised clustering and classification. A combination of lecture and hands-on learning will introduce a wide spectrum of clustering approaches as they have been used for flow cytometry data analysis, and show how to evaluate the results of these tools for biomarker identification and patient diagnosis.
  • Using R for high performance computing and reproducible research, and how to deploy interactive software for wet-lab end users. Big data requires big computers, and this module explores how to use R within a high performance computing framework. Students will be introduced to the importance of reproducible research in data analysis and how to integrate it into their own work through mechanisms available within R. Finally, much bioinformatics work involves collaboration with scientists who are not programmers, and ways to enable these vital interactions will be explored.

Teaching staff

Subject assignments

Module: 39-Inf-AKS Anwendungen Kognitiver Systeme (Applications of Cognitive Systems)
Course: Maschinelles Lernen im Web or Modern Data Analysis or Softcomputing für die Bioinformatik
Requirements: Graded examination

The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) they assign the students.


No more requirements
No eLearning offering available
Address:
WS2016_392219@ekvv.uni-bielefeld.de
This address can be used by teaching staff, their secretaries' offices and the individuals in charge of course data maintenance to send emails to the course participants. IMPORTANT: All sent emails must be activated. Wait for the activation email and follow the instructions given there.
If the reference number is used for several courses during the semester, use the following alternative address to reach the participants of exactly this course: VST_78460631@ekvv.uni-bielefeld.de
Last update basic details/teaching staff:
Thursday, February 23, 2017 
Last update times:
Thursday, February 23, 2017 
Last update rooms:
Thursday, February 23, 2017 
Type(s) / SWS (hours per week per semester)
lecture (V) / 2
Language
This lecture is taught in English
Department
Faculty of Technology
Links to this course
If you want to set links to this course page, please use one of the following links. Do not use the link shown in your browser!
The following link includes the course ID and is always unique:
https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=78460631
ID
78460631