Identifying pathogenic genes in virus strains at a glance
International project to help determine genetic abnormalities
When new viruses or bacteria spread to humans, it is essential to clarify their special characteristics as quickly as possible. For example, why is the coronavirus resistant to common drugs? In the future, new Big Data technology can help to identify the characteristics of new strains of viruses and bacteria in a short time. It does this by comparing the genome of a single organism with the genomes of all the strains of a species. This procedure can also be used for more highly developed organisms such as mammals. The new project ‘Pangaia’ at Bielefeld University is investigating how the masses of data used in this process can be ordered and analysed for use in biomedicine. The university is one of eleven project partners from Europe, North America, and Japan. The EU is funding the project with 1.14 million euros over three years.
Conventional genome comparisons work pairwise. ‘In these cases, we compare only two genomes with each other; differences and similarities are relatively easy to identify on the computer,’ says Professor Dr Jens Stoye from the Faculty of Technology, who is taking part in Pangaia with his Genome Informatics research group. ‘With the new approach, we can compare one genome to thousands of other genomes in a single step.’ Researchers call this exploration of the genetic repertoire of a population ‘pangenomics’.
‘Until now, the problem with computer-assisted pangenomics has been the lack of transparency caused by the mass of data,’ says Professor Dr Alexander Schönhuth from the Faculty of Technology, who has been head of the Genome Data Science working group since January 2020. He is coordinating Bielefeld’s Pangaia sub-project. Like Jens Stoye, he and his group are carrying out research at Bielefeld University’s Center for Biotechnology (CeBiTec).
Genetic data are represented by the letters A, C, G, and T. These stand for the nucleotides, the building blocks of the genetic material. Genomes can be made up of billions of these information units. To make them easier to compare, they can be displayed next to each other as ‘letter chains’. This traditional sequence-based representation is widespread today. ‘But with hundreds of comparison genomes, it takes a great deal of time to analyse step by step how the genome under investigation differs from each of the comparison genomes,’ says Schönhuth.
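The step-by-step comparison of ‘letter chains’ described here can be illustrated with a minimal Python sketch. It assumes that the sequences are already aligned and of equal length, which real genomes generally are not; the sequences, the strain names, and the function name `differing_positions` are invented for illustration and do not come from the Pangaia project.

```python
def differing_positions(query: str, reference: str) -> list[int]:
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (q, r) in enumerate(zip(query, reference)) if q != r]


# Invented toy data: one genome under investigation, three comparison genomes.
query = "ACGTACGT"
comparison_genomes = {
    "strain_a": "ACGTACGA",
    "strain_b": "ACCTACGT",
    "strain_c": "ACGTACGT",
}

# One full pass over the query per comparison genome: the work grows with
# every genome that is added, which is the bottleneck described in the text.
pairwise = {
    name: differing_positions(query, seq)
    for name, seq in comparison_genomes.items()
}
print(pairwise)
```

With three short sequences this is instant, but each additional comparison genome requires another complete pass over the genome under investigation.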
‘The new technology enables a simultaneous, integrated analysis of many strains of the same organism. These can be viruses, bacteria, and sometimes even higher organisms,’ explains Jens Stoye. ‘This makes it possible to highlight the similarities and differences between the individual members. In the case of pathogens, it is often even possible to understand and predict the processes that led to the development of particularly infectious strains.’ The technology can also be used to detect hereditary diseases in humans or to determine which mutations in a tumour have led to strong, abnormal growth.
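One way to picture a simultaneous analysis is to index all strains once and then check the genome under investigation against that index in a single pass. The sketch below uses a simple index of short subsequences (k-mers) as a stand-in; this is an assumption made for illustration, not the graph-based method the project is developing, and the sequences and function names are invented.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(strains: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of strains that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in strains.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def classify(query: str, index: dict[str, set[str]], n_strains: int, k: int):
    """Split the query's k-mers into novel ones and ones shared by all strains."""
    novel, core = [], []
    for kmer in kmers(query, k):
        carriers = index.get(kmer, set())
        if not carriers:
            novel.append(kmer)   # seen in none of the known strains
        elif len(carriers) == n_strains:
            core.append(kmer)    # seen in every known strain
    return novel, core


# Invented toy data.
strains = {"strain_a": "ACGTAC", "strain_b": "ACGTTC", "strain_c": "ACGTAG"}
index = build_index(strains, k=3)
novel, core = classify("ACGTAA", index, n_strains=len(strains), k=3)
print("novel:", novel)
print("shared by all strains:", core)
```

The index is built once for the whole collection, so the genome under investigation is scanned a single time regardless of how many strains are included.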
The full name of the Pangaia project is ‘Pan-genome Graph Algorithms and Data Integration’. It will run from January 2020 to December 2023. The European Union is funding Pangaia through its Horizon 2020 research framework programme, and the University of Milan (Italy) is coordinating the project. Besides Bielefeld University, the other partners are the Netherlands Organization for Scientific Research (NWO), Comenius University Bratislava (Slovakia), the biotech companies Geneton (Slovakia) and Illumina Cambridge (Great Britain), the Institut Pasteur (France), Simon Fraser University (Canada), the University of Tokyo (Japan), and Cornell University and Pennsylvania State University (both USA).
Prof. Dr Alexander Schönhuth, Bielefeld University
Faculty of Technology
Phone: +49 521 106-3362