
392268 ISY Project: Information Extraction from Web Tables (Pj) (summer semester 2018)

Content, comments

- Short Description
The Web contains a large number (billions) of tables (e.g., HTML tables, spreadsheet documents). Many of these tables contain structured information that could be extracted and added to a knowledge base. Given such a knowledge base, important tasks such as search and question answering can be supported. To do so, the content of a table needs to be understood and represented in terms of an ontology.

In the previous year, within an <a href="https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=92380254">ISY project</a>, students developed and implemented a set of basic tasks that are a necessary prerequisite for table understanding. These basic tasks create basic hypotheses that can then be consumed by higher-level tasks. For example, given <a href="dbpedia.org">DBpedia</a> as a knowledge base and a table cell value "Barack Obama", one task creates the hypothesis that the politician Barack Obama (who is known to DBpedia) is mentioned in that cell. Given a cell value "Aug 4, 1961" in the same table row, another task might create the hypothesis that this cell states Barack Obama's date of birth. Higher-level tasks then generate hypotheses about rows, columns, or the entire table. Data mining and machine learning techniques will be applied to reach that goal.
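The two kinds of cell-level hypotheses described above can be illustrated with a minimal sketch. It is not the project's implementation: the in-memory `TOY_KB` dictionary stands in for DBpedia, matching is by exact label only, and the names `Hypothesis`, `entity_hypotheses`, `birth_date_hypotheses`, and `parse_date` are hypothetical helpers introduced here for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy stand-in for a knowledge base such as DBpedia (assumption: exact-label lookup).
TOY_KB = {
    "Barack Obama": {"uri": "dbr:Barack_Obama", "birthDate": "1961-08-04"},
}

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
DATE_RE = re.compile(r"^([A-Z][a-z]{2}) (\d{1,2}), (\d{4})$")


@dataclass(frozen=True)
class Hypothesis:
    row: int
    col: int
    kind: str   # "entity" or a property name such as "dbo:birthDate"
    value: str  # entity URI or normalized literal


def parse_date(text: str) -> Optional[str]:
    """Normalize strings like 'Aug 4, 1961' to ISO format, else return None."""
    match = DATE_RE.match(text.strip())
    if match is None or match.group(1) not in MONTHS:
        return None
    month, day, year = MONTHS[match.group(1)], int(match.group(2)), int(match.group(3))
    return f"{year:04d}-{month:02d}-{day:02d}"


def entity_hypotheses(table: List[List[str]]) -> List[Hypothesis]:
    """Basic task 1: a cell whose text matches a KB label may mention that entity."""
    hyps = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            entry = TOY_KB.get(cell.strip())
            if entry is not None:
                hyps.append(Hypothesis(r, c, "entity", entry["uri"]))
    return hyps


def birth_date_hypotheses(table: List[List[str]],
                          entities: List[Hypothesis]) -> List[Hypothesis]:
    """Basic task 2: a date in the same row that equals the entity's known
    birth date may state that birth date."""
    by_uri = {entry["uri"]: entry for entry in TOY_KB.values()}
    hyps = []
    for ent in entities:
        known = by_uri[ent.value]["birthDate"]
        for c, cell in enumerate(table[ent.row]):
            if c != ent.col and parse_date(cell) == known:
                hyps.append(Hypothesis(ent.row, c, "dbo:birthDate", known))
    return hyps


table = [
    ["Name", "Born"],
    ["Barack Obama", "Aug 4, 1961"],
    ["Unknown Person", "Jan 1, 1900"],
]
entities = entity_hypotheses(table)
birth_dates = birth_date_hypotheses(table, entities)
```

In this sketch, a higher-level task would consume `entities` and `birth_dates`, for example to hypothesize that the second column as a whole expresses birth dates. A real system would need fuzzy label matching and disambiguation instead of the exact dictionary lookup assumed here.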

The basic tasks were executed on one million Web tables, resulting in an enriched table corpus. The tasks and the results are published as a paper: <a href="https://pub.uni-bielefeld.de/publication/2913458">Towards a Large Corpus of Richly Annotated Web Tables for Knowledge Base Population</a>. The data was made available <a href="https://pub.uni-bielefeld.de/data/2912802">online</a> as well.

The task of the project is to i) develop and implement these higher-level tasks by building on the basic tasks, thereby realizing table understanding, ii) execute these tasks on real data, iii) extract information from tables and extend a knowledge base, and iv) evaluate the correctness of the tasks and the extracted information. Possibly, v) further basic tasks will need to be implemented or existing basic tasks improved.

Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the eKVV will only be regarded as an expression of interest; it does not secure a place on a team.
Please get in touch with the supervisors for information on the application procedure.

Participation requirements, necessary prior knowledge

Required skills:

  • Programming skills are required (e.g., Perl, Python, Java, ...). However, in a group of several students, conceptual and implementation work can be distributed among the group members.
  • Knowledge of Semantic Web technologies (RDF, SPARQL) is a plus, but can be acquired during the project.
  • Experience with and knowledge of data mining and machine learning are a plus, but can be acquired during the project.

Lecturers

Dates

Frequency | Day | Time | Location | Period
by arrangement | by arrangement | | | 09.04.2018-20.07.2018


Exams

  • None found

Subject assignments

Module | Course | Assessment
39-M-Inf-GP Grundlagenprojekt Intelligente Systeme | Group project | ungraded examination

The binding module descriptions contain further information, including on the assessments ("Leistungen") and their requirements. If several forms of assessment are possible, the respective lecturers decide which one applies.

Specification of requirements
No specifications available