- Short Description
The Web contains billions of tables (e.g., HTML tables, spreadsheet documents). Many of these tables contain structured information that could be extracted and added to a knowledge base. Such a knowledge base can then support important tasks such as search and question answering. To extract this information, the content of a table needs to be understood and represented in terms of an ontology.
In the previous year, within an <a href="https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=92380254">ISY project</a>, students developed and implemented a set of basic tasks that are necessary prerequisites for table understanding. These basic tasks create basic hypotheses that can then be consumed by higher-level tasks. For example, given <a href="dbpedia.org">DBpedia</a> as a knowledge base and a table cell value "Barack Obama", a task creates the hypothesis that the politician Barack Obama (who is known to DBpedia) is mentioned in that cell. Given a cell value "Aug 4, 1961" in the same table row where Barack Obama is thought to be mentioned, another task might create the hypothesis that this cell mentions the birthday of Barack Obama. Higher-level tasks would then generate hypotheses about rows, columns, or the entire table. Data mining and machine learning techniques will be applied to reach that goal.
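To make the idea of basic hypotheses more concrete, the following is a minimal Python sketch of the two example tasks described above. It is not the implementation from the ISY project: the toy knowledge base <code>TOY_KB</code>, the <code>Hypothesis</code> class, and all function names are assumptions made for illustration, and exact label matching stands in for real entity linking against DBpedia. Date parsing with <code>%b</code> assumes an English locale.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Toy stand-in for a knowledge base such as DBpedia (hypothetical layout).
TOY_KB = {
    "dbr:Barack_Obama": {
        "rdfs:label": "Barack Obama",
        "dbo:birthDate": "1961-08-04",
    },
}


@dataclass(frozen=True)
class Hypothesis:
    """A basic hypothesis about a single table cell (illustrative only)."""
    row: int
    col: int
    kind: str                      # "entity" or "property"
    value: str                     # entity identifier or property name
    subject: Optional[str] = None  # entity the property refers to, if any


def entity_hypotheses(table: List[List[str]], kb: dict) -> List[Hypothesis]:
    """Hypothesize that a cell mentions an entity if it equals a known label."""
    label_index = {facts["rdfs:label"].lower(): ent for ent, facts in kb.items()}
    hyps = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            entity = label_index.get(cell.strip().lower())
            if entity is not None:
                hyps.append(Hypothesis(r, c, "entity", entity))
    return hyps


def normalize_date(text: str) -> Optional[str]:
    """Return an ISO date string if the text parses as a date, else None."""
    for fmt in ("%b %d, %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def property_hypotheses(table, kb, entity_hyps) -> List[Hypothesis]:
    """For each hypothesized entity, check whether other cells in the same
    row match a value the knowledge base already stores for that entity."""
    hyps = []
    for eh in entity_hyps:
        facts = kb[eh.value]
        for c, cell in enumerate(table[eh.row]):
            if c == eh.col:
                continue
            candidate = normalize_date(cell) or cell.strip()
            for prop, known in facts.items():
                if prop != "rdfs:label" and known == candidate:
                    hyps.append(
                        Hypothesis(eh.row, c, "property", prop, subject=eh.value)
                    )
    return hyps


table = [["Barack Obama", "Aug 4, 1961"], ["Unknown Person", "Jan 1, 1900"]]
entities = entity_hypotheses(table, TOY_KB)
properties = property_hypotheses(table, TOY_KB, entities)
```

In this sketch, the first row yields one entity hypothesis for the first cell and one property hypothesis (<code>dbo:birthDate</code>) for the second cell, while the second row yields none. A higher-level task could then aggregate such cell-level hypotheses, for example to hypothesize that a whole column contains birth dates.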
The basic tasks were executed on one million Web tables, resulting in an enriched table corpus. The tasks and the results are described in the paper <a href="https://pub.uni-bielefeld.de/publication/2913458">Towards a Large Corpus of Richly Annotated Web Tables for Knowledge Base Population</a>. The data has been made available <a href="https://pub.uni-bielefeld.de/data/2912802">online</a> as well.
The task of the project is to i) develop and implement these higher-level tasks by building on the basic tasks, thereby realizing table understanding, ii) execute these tasks on real data, iii) extract information from tables and extend a knowledge base, and iv) evaluate the correctness of the tasks and the extracted information. Possibly, v) further basic tasks will need to be implemented or existing basic tasks improved.
Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the ekVV will only be regarded as an expression of interest; it will not secure team membership.
Please get in touch with the supervisors for information on the application procedure.