In comparative genomics, a common preprocessing step is to represent each genome as a sequence of blocks, usually encoded as numbers, where each number corresponds to a gene family. Before any comparative analysis can be carried out, genes must therefore be grouped into families, a step typically performed with algorithms based on sequence similarity.
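As an illustration of this preprocessing, the sketch below assumes a deliberately naive family detection: genes whose pairwise similarity reaches a hypothetical threshold are merged by single-linkage clustering (a union-find structure), and each genome is then rewritten as a sequence of signed family numbers, with the sign encoding the strand. The similarity scores, the threshold and the name `assign_families` are illustrative assumptions; real pipelines use more careful clustering.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: a genome is a list of (gene_name, strand) pairs
# in chromosome order, with strand +1 or -1.
Genome = List[Tuple[str, int]]


class UnionFind:
    """Disjoint-set forest used to merge similar genes into families."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_families(
    genomes: List[Genome],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[int]]:
    """Single-linkage clustering on similarity >= threshold, then encode each
    genome as a list of signed family numbers (1, 2, ... in order of appearance)."""
    uf = UnionFind()
    for genome in genomes:
        for gene, _ in genome:
            uf.find(gene)  # register every gene, including singletons
    for (a, b), score in similarity.items():
        if score >= threshold:
            uf.union(a, b)

    family_id: Dict[str, int] = {}
    encoded_genomes: List[List[int]] = []
    for genome in genomes:
        encoded = []
        for gene, strand in genome:
            root = uf.find(gene)
            if root not in family_id:
                family_id[root] = len(family_id) + 1
            encoded.append(strand * family_id[root])
        encoded_genomes.append(encoded)
    return encoded_genomes


if __name__ == "__main__":
    genome_a = [("a1", +1), ("a2", +1), ("a3", -1)]
    genome_b = [("b1", +1), ("b2", -1), ("b3", +1)]
    scores = {("a1", "b2"): 0.9, ("a2", "b3"): 0.8, ("a3", "b1"): 0.75, ("a1", "b3"): 0.2}
    print(assign_families([genome_a, genome_b], scores, threshold=0.5))
    # → [[1, 2, -3], [3, -1, 2]]
```

The result depends strongly on the threshold: a lower cutoff can merge unrelated genes into one family, a higher one can split true families, which is one of the motivations for avoiding this step altogether.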
Recently, a different approach called family-free has been proposed: the family detection step is no longer performed as a separate stage but is instead incorporated into the comparative analysis itself. This approach can be applied in several areas; in this project we are interested in genome rearrangements, large-scale evolutionary events that change a genome by moving or reversing large DNA blocks, possibly to other positions or even to other chromosomes.
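In the family-free setting the input is the pairwise similarity between genes, with no family labels. A minimal sketch, assuming similarity scores are already available (for example from sequence alignment) and using a hypothetical cutoff, stores them as a weighted bipartite graph between the genes of two genomes, often called a gene similarity graph. A gene may keep several candidate partners, and choosing among them is left to the comparative analysis. The function name `gene_similarity_graph` is illustrative.

```python
from typing import Dict, List, Tuple


def gene_similarity_graph(
    genes_a: List[str],
    genes_b: List[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> Dict[str, List[Tuple[str, float]]]:
    """Weighted bipartite graph: each gene of genome A is mapped to its
    candidate partners in genome B, strongest similarity first."""
    graph: Dict[str, List[Tuple[str, float]]] = {g: [] for g in genes_a}
    set_b = set(genes_b)
    for (a, b), score in similarity.items():
        # keep only edges between the two genomes that pass the cutoff
        if a in graph and b in set_b and score >= threshold:
            graph[a].append((b, score))
    for edges in graph.values():
        edges.sort(key=lambda edge: -edge[1])
    return graph


if __name__ == "__main__":
    scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.6, ("a2", "b3"): 0.7, ("a2", "b1"): 0.1}
    print(gene_similarity_graph(["a1", "a2"], ["b1", "b2", "b3"], scores, 0.5))
    # → {'a1': [('b1', 0.9), ('b2', 0.6)], 'a2': [('b3', 0.7)]}
```

Here `a1` keeps two candidate partners instead of being forced into a single family.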
The Double Cut and Join (DCJ) operation, introduced in 2005, is the most studied model for genome rearrangements: it can simulate several classical rearrangement operations, yet it gives rise to a simple combinatorial model in which the rearrangement distance can be computed in linear time.
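For two genomes with the same gene content, each gene present exactly once, the DCJ distance can be read off the adjacency graph, whose vertices are the adjacencies and telomeres of both genomes and whose edges join vertices that share a gene extremity: d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. The following is a minimal sketch of that computation under these assumptions; the chromosome encoding (a list of signed integers plus a circularity flag) and the function names are illustrative choices, not taken from any particular tool.

```python
from typing import Dict, List, Set, Tuple

Extremity = Tuple[int, str]          # (gene, "t") for the tail, (gene, "h") for the head
Chromosome = Tuple[List[int], bool]  # (signed genes in order, is_circular)


def adjacencies(genome: List[Chromosome]) -> List[Tuple[Extremity, ...]]:
    """Return the adjacencies (two extremities) and telomeres (one extremity)."""
    result: List[Tuple[Extremity, ...]] = []
    for genes, circular in genome:  # chromosomes are assumed to be non-empty
        ext: List[Extremity] = []
        for g in genes:
            # a gene on the + strand is read tail -> head, on the - strand head -> tail
            if g > 0:
                ext += [(g, "t"), (g, "h")]
            else:
                ext += [(-g, "h"), (-g, "t")]
        # the right extremity of each gene is adjacent to the left extremity of the next
        for i in range(1, len(ext) - 1, 2):
            result.append((ext[i], ext[i + 1]))
        if circular:
            result.append((ext[-1], ext[0]))
        else:
            result.append((ext[0],))
            result.append((ext[-1],))
    return result


def dcj_distance(genome_a: List[Chromosome], genome_b: List[Chromosome]) -> int:
    """DCJ distance N - (C + I/2) for genomes with the same genes, one copy each."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    # every extremity lies in exactly one vertex of each genome
    where_a: Dict[Extremity, int] = {x: i for i, v in enumerate(adj_a) for x in v}
    where_b: Dict[Extremity, int] = {x: i for i, v in enumerate(adj_b) for x in v}
    if where_a.keys() != where_b.keys():
        raise ValueError("genomes must contain the same genes")
    n_genes = len(where_a) // 2

    seen_a: Set[int] = set()
    seen_b: Set[int] = set()
    cycles = odd_paths = 0
    # every B-vertex touches some A-vertex, so starting from A-vertices covers all components
    for start in range(len(adj_a)):
        if start in seen_a:
            continue
        seen_a.add(start)
        stack = [("A", start)]
        n_vertices = degree_sum = 0
        while stack:
            side, idx = stack.pop()
            vertex = adj_a[idx] if side == "A" else adj_b[idx]
            n_vertices += 1
            degree_sum += len(vertex)  # one edge per extremity in the vertex
            for x in vertex:
                if side == "A":
                    other, seen, where = "B", seen_b, where_b
                else:
                    other, seen, where = "A", seen_a, where_a
                nxt = where[x]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((other, nxt))
        n_edges = degree_sum // 2
        if n_edges == n_vertices:   # connected, max degree 2, |E| = |V|: a cycle
            cycles += 1
        elif n_edges % 2 == 1:      # a path with an odd number of edges
            odd_paths += 1
    # the number of odd paths is always even, so integer division is exact
    return n_genes - (cycles + odd_paths // 2)


if __name__ == "__main__":
    # a single inversion of gene 2
    print(dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))  # → 1
```

Each vertex of the adjacency graph is visited once, so the traversal runs in time linear in the number of genes, in line with the linear-time bound stated above.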
In 2014, Martinez et al. proposed the Family-Free DCJ (FFDCJ) model, which applies the family-free approach to the DCJ model. The underlying problem was shown to be NP-hard, but medium-sized instances can be solved through an integer linear programming (ILP) formulation.
In this project, we want to develop the FFDCJ model further and test it on simulated and real data, in order to assess how well it performs as a tool for phylogenetic reconstruction, orthology assignment, and ancestral reconstruction.
The project has two main objectives: i) to extend the theoretical FFDCJ model with new operations, namely insertions, deletions, and duplications, for which results are already known in the original (non-family-free) DCJ model; ii) to run tests on real and simulated data in order to evaluate these extensions.
Module | Course type | Assessment
---|---|---
39-M-Inf-P_BI Projekt Bioinformatik | Project | Ungraded examination
Study information
The binding module descriptions contain further information, including on the "assessments" and their requirements. Where several "forms of assessment" are possible, the respective instructors decide which one applies.