- Short Description
The Web contains a large number (billions) of tables (e.g., HTML tables, spreadsheet documents). Many of these tables contain structured information that could be extracted and added to a knowledge base. Such a knowledge base can then support important tasks such as search and question answering. For the extraction to work, the content of a table needs to be understood and represented in terms of an ontology.
Given a set of tables extracted from the Web and a large knowledge base (DBpedia), the task of the project is i) to align entries from tables to entries in the knowledge base, ii) to build hypotheses about what a table expresses (in the form of an RDF graph pattern), iii) to find the best hypothesis, and finally, iv) to populate a knowledge base given the information extracted from the tables.
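The four steps can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the knowledge base is a small in-memory set of triples standing in for DBpedia, alignment is an exact label lookup, a hypothesis is restricted to a single predicate linking the first column to the second (the graph pattern "?col0 predicate ?col1"), and the best hypothesis is simply the one supported by the most rows. All names (KB, LABELS, TABLE, align, hypotheses, populate) are hypothetical and chosen for illustration.

```python
from collections import Counter

# Toy stand-in for DBpedia: a set of (subject, predicate, object) triples.
KB = {
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
    ("dbr:Germany", "dbo:largestCity", "dbr:Berlin"),
    ("dbr:France", "dbo:capital", "dbr:Paris"),
    ("dbr:France", "dbo:largestCity", "dbr:Paris"),
    ("dbr:Australia", "dbo:capital", "dbr:Canberra"),
    ("dbr:Australia", "dbo:largestCity", "dbr:Sydney"),
}

# Assumed label index: surface string -> knowledge base entity.
LABELS = {
    "Germany": "dbr:Germany", "Berlin": "dbr:Berlin",
    "France": "dbr:France", "Paris": "dbr:Paris",
    "Australia": "dbr:Australia", "Canberra": "dbr:Canberra",
    "Italy": "dbr:Italy", "Rome": "dbr:Rome",
}

# A two-column table as it might be extracted from a Web page.
TABLE = [
    ("Germany", "Berlin"),
    ("France", "Paris"),
    ("Australia", "Canberra"),
    ("Italy", "Rome"),
]


def align(cell):
    """Step i: map a cell to an entity by exact label match (None if unknown)."""
    return LABELS.get(cell.strip())


def hypotheses(rows, kb):
    """Step ii: count, per predicate p, the rows for which (col0, p, col1) is in kb."""
    support = Counter()
    for subj, obj in rows:
        if subj is None or obj is None:
            continue
        for s, p, o in kb:
            if s == subj and o == obj:
                support[p] += 1
    return support


def populate(rows, predicate, kb):
    """Step iv: emit triples implied by the chosen hypothesis but missing from kb."""
    return [
        (s, predicate, o)
        for s, o in rows
        if s is not None and o is not None and (s, predicate, o) not in kb
    ]


aligned = [(align(a), align(b)) for a, b in TABLE]
support = hypotheses(aligned, KB)
# Step iii: pick the hypothesis with the highest row support.
best_predicate, _ = support.most_common(1)[0]
new_triples = populate(aligned, best_predicate, KB)

print(best_predicate)
print(new_triples)
```

In this toy setting three rows support dbo:capital and only two support dbo:largestCity, so the capital hypothesis is selected and the Italy/Rome row yields a new triple. A real system would need fuzzy entity linking, richer graph patterns, and a more careful scoring function than a raw support count.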
An influential paper on table understanding is "Understanding Tables on the Web" (2012) by Jingjing Wang, Bin Shao, Haixun Wang, and Kenny Q. Zhu (Microsoft Research Asia).
- Required skills (e.g. mandatory courses, if required)
- Programming skills are required (e.g., Perl, Python, Java). In a group of several students, however, conceptual and implementation work can be distributed among the group members.
- Knowledge of Semantic Web technologies (RDF, SPARQL) is a plus, but can be acquired during the project.
Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the ekVV will be regarded only as an expression of interest; it does not secure a place on a team.
Please get in touch with the supervisors for information on the application procedure.