Soft Computing for Bioinformatics: Data Mining with Application to Flow Cytometry Data Using R
Big data is a growing challenge across technological and disease domains, for both fundamental research and clinical applications. Flow cytometry (FCM) data is one such example: the technology was developed 50 years ago, but the complexity of its output has increased significantly in recent years. A series of lectures and hands-on tutorials will use flow cytometry data as the foundation for illustrating how researchers have approached some of the recently introduced challenges in clustering, classification and data processing by developing an ecosystem of about 40 R/Bioconductor packages. This lecture series has three goals: 1) to introduce fundamental concepts in machine learning and data mining, 2) to motivate research questions and challenges that arise in an important field of bioinformatics, flow cytometry, and 3) to give participants some familiarity with the open-source software environment R as an analysis tool for FCM data as they explore the fundamental concepts of taking their data to diagnosis and discovery. Participants will learn how, from within R, to interact with the file system, explore their data using various visualizations, perform simple automated pre-processing tasks, quickly generate quality assurance reports to identify potential technical issues with the data, and automate simple analysis strategies. In addition, several dimension reduction, supervised and unsupervised clustering techniques (e.g., based on k-means, spectral clustering, Gaussian mixture models, density, binning, and t-distributed stochastic neighbor embedding) will be discussed and used, along with advanced analysis options. Students will come away with a better understanding of bioinformatics approaches to data analysis using this widely used statistical computing language, and will be able to write their own R scripts to explore and analyze data as well as to automate tasks.
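Of the clustering techniques listed above, k-means is the simplest to state: it alternates between assigning each event to its nearest cluster centre and moving each centre to the mean of the events assigned to it, stopping once nothing changes. The course itself works in R; the following is only a minimal, language-neutral sketch in Python, assuming two hypothetical, well-separated synthetic 2-D populations in place of real FCM measurements. The function `kmeans` and its parameters are illustrative and are not part of any R/Bioconductor package.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's k-means on a list of (x, y) tuples (illustrative sketch).

    Returns (centroids, labels). The k initial centroids are distinct points
    drawn at random; the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each event to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned events.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        # Converged: neither assignments nor centroids changed.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical synthetic "events": two populations centred at (1, 1) and (5, 5).
data_rng = random.Random(1)
cloud_a = [(data_rng.gauss(1.0, 0.2), data_rng.gauss(1.0, 0.2)) for _ in range(50)]
cloud_b = [(data_rng.gauss(5.0, 0.2), data_rng.gauss(5.0, 0.2)) for _ in range(50)]
events = cloud_a + cloud_b

centroids, labels = kmeans(events, k=2)
print(sorted(centroids))
```

Because the result depends on the random initial centres, practical implementations usually run several restarts and keep the solution with the lowest within-cluster sum of squares; the fixed `seed` here only makes this single run reproducible.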
High-performance computing with R, reproducible research, and packaging R software for easy use by non-programming collaborators will also be explored.
Modules will include:
Data Mining Theory
R-Tutorial
Analysis of Flow Cytometry data
Module | Course | Assessment
---|---|---
39-Inf-AKS Applications of Cognitive Systems | Machine Learning on the Web, Modern Data Analysis, or Soft Computing for Bioinformatics | Graded examination
Study information
The binding module descriptions contain further information, including details on the assessments and their requirements. Where several forms of assessment are possible, the respective lecturers decide which one applies.