Soft Computing for Bioinformatics: Data Mining with Application to Flow Cytometry Data Using R
Big Data is a growing challenge across technological and disease domains, for both fundamental research and clinical applications. Flow cytometry (FCM) data is one such example: although the technology was developed 50 years ago, the complexity of its output has recently increased significantly. A series of lectures and hands-on tutorials will use flow cytometry data as the foundation to illustrate how recent challenges in clustering, classification and data processing have been approached through the development, by researchers, of an ecosystem of about 40 R/Bioconductor packages.
This lecture series has three goals: 1) to introduce fundamental concepts in machine learning and data mining, 2) to motivate research questions and challenges that arise in flow cytometry, an important field of bioinformatics, and 3) to give participants some familiarity with the open source software environment R as an analysis tool for FCM data as they explore the fundamental concepts of taking their data to diagnosis and discovery.
Participants will learn how to interact with the file system from within R, explore their data using various visualizations, perform simple automated pre-processing tasks, quickly generate quality assurance reports to identify potential technical issues with the data, and automate simple analysis strategies. In addition, several dimension reduction techniques and supervised and unsupervised clustering techniques (e.g., based on k-means, spectral clustering, Gaussian mixture models, density, binning, t-distributed stochastic neighbor embedding), along with advanced analysis options, will be discussed and used. Students will come away with a better understanding of bioinformatics approaches to data analysis using this widely used statistical computing language, and will be able to write their own R scripts to explore and analyze data as well as automate tasks.
High-performance computing using R, reproducible research, and the packaging of R software so that non-programming collaborators can use it easily will also be explored.
Modules will include:
Data Mining Theory
R-Tutorial
Analysis of Flow Cytometry data
| Module | Course | Requirements | |
|---|---|---|---|
| 39-Inf-AKS Anwendungen Kognitiver Systeme | Maschinelles Lernen im Web or Modern Data Analysis or Softcomputing für die Bioinformatik | Graded examination | Student information |
The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) they assign the students.