392268 ISY Project: Information Extraction from Web Tables (Pj) (SoSe 2018)

Contents, comment

- Short Description
The Web contains a large number (billions) of tables (e.g., HTML tables, spreadsheet documents). Many of these tables contain structured information that could be extracted and added to a knowledge base. Given such a knowledge base, important tasks such as search and question answering can be supported. To do so, the content of a table needs to be understood and represented in terms of an ontology.

In the previous year, within an <a href="https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=92380254">ISY project</a>, students developed and implemented a set of basic tasks that are a necessary prerequisite for table understanding. These basic tasks create low-level hypotheses that can then be consumed by higher-level tasks. For example, given <a href="dbpedia.org">DBpedia</a> as a knowledge base and a table cell with the value "Barack Obama", one task creates the hypothesis that the cell mentions the politician Barack Obama (an entity known to DBpedia). Given a cell value "Aug 4, 1961" in the same row as the cell hypothesized to mention Barack Obama, another task might create the hypothesis that this cell states Barack Obama's birth date. Higher-level tasks would then generate hypotheses about rows, columns, or the entire table. Data mining and machine learning techniques will be applied to reach that goal.
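The two basic tasks in the example above might be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not the implementation from the earlier ISY project: a tiny in-memory dictionary (`TOY_KB`) stands in for DBpedia, entity linking is reduced to exact label matching, and only a few date formats are recognized. The names `TOY_KB`, `Hypothesis`, `entity_hypotheses`, `property_hypotheses`, and `parse_date` are hypothetical; the `dbr:`/`dbo:`/`rdfs:` prefixes follow DBpedia's naming for resources, ontology terms, and labels.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical toy knowledge base standing in for DBpedia:
# entity URI -> {property: value}. A real system would query DBpedia instead.
TOY_KB = {
    "dbr:Barack_Obama": {
        "rdfs:label": "Barack Obama",
        "dbo:birthDate": date(1961, 8, 4),
    },
}


@dataclass
class Hypothesis:
    row: int
    col: int
    kind: str                      # "entity" or "property"
    value: str                     # entity URI or property name
    subject: Optional[str] = None  # property hypotheses: entity the row is about


def entity_hypotheses(table):
    """Basic task 1: guess which KB entity a cell mentions (exact label match)."""
    hyps = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for uri, facts in TOY_KB.items():
                if cell.strip().lower() == facts["rdfs:label"].lower():
                    hyps.append(Hypothesis(r, c, "entity", uri))
    return hyps


def parse_date(text):
    """Try a few date formats; return a date, or None if none match."""
    for fmt in ("%b %d, %Y", "%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def property_hypotheses(table, entity_hyps):
    """Basic task 2: in a row with a linked entity, guess which KB property
    another cell of that row states (here: date-valued properties only)."""
    hyps = []
    for eh in entity_hyps:
        facts = TOY_KB[eh.value]
        for c, cell in enumerate(table[eh.row]):
            if c == eh.col:
                continue
            parsed = parse_date(cell)
            if parsed is None:
                continue
            for prop, kb_value in facts.items():
                if isinstance(kb_value, date) and kb_value == parsed:
                    hyps.append(Hypothesis(eh.row, c, "property", prop, subject=eh.value))
    return hyps


table = [["Barack Obama", "Aug 4, 1961", "Honolulu"]]
ents = entity_hypotheses(table)
props = property_hypotheses(table, ents)
for h in ents + props:
    print(h)
```

For the sample row, this yields an entity hypothesis for column 0 and a `dbo:birthDate` property hypothesis for column 1; "Honolulu" produces nothing because the toy KB holds no matching fact. A realistic linker would use fuzzy label matching and return ranked candidates rather than exact hits.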

The basic tasks were executed on one million Web tables, resulting in an enriched table corpus. The tasks and results were published in the paper <a href="https://pub.uni-bielefeld.de/publication/2913458">Towards a Large Corpus of Richly Annotated Web Tables for Knowledge Base Population</a>. The data was also made available <a href="https://pub.uni-bielefeld.de/data/2912802">online</a>.

The goal of the project is to i) develop and implement these higher-level tasks on top of the basic tasks, thereby realizing table understanding, ii) execute these tasks on real data, iii) extract information from tables and use it to extend a knowledge base, and iv) evaluate the correctness of the tasks and of the extracted information. If needed, the project may also v) implement further basic tasks or improve existing ones.
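One way a higher-level task could build on the basic tasks is sketched below: cell-level property hypotheses in a column are combined by a simple majority vote into a column-level hypothesis, which is then applied to linked rows whose fact is still missing from the knowledge base, yielding candidate triples. This is a hypothetical illustration, not a method prescribed by the project; the function names, the `min_support` threshold, the `ex:` prefix, and the sample rows are all assumptions.

```python
from collections import Counter, defaultdict


def column_hypotheses(prop_hyps, min_support=0.5):
    """Higher-level task: for each column, pick the KB property that most of
    its cell-level hypotheses agree on (simple majority vote).

    prop_hyps: iterable of (row, col, prop) cell-level property hypotheses.
    Returns {col: (prop, support)}, keeping only columns whose winning
    property accounts for at least `min_support` of that column's votes.
    """
    votes = defaultdict(Counter)
    for _row, col, prop in prop_hyps:
        votes[col][prop] += 1
    result = {}
    for col, counter in votes.items():
        prop, count = counter.most_common(1)[0]
        support = count / sum(counter.values())
        if support >= min_support:
            result[col] = (prop, support)
    return result


def new_triples(table, row_entities, col_hyps, known):
    """Propose (subject, property, value) triples the KB does not have yet.

    row_entities: {row: entity URI} from an entity-linking basic task.
    known: set of (subject, property) pairs the KB already has a value for.
    """
    triples = []
    for row, subject in row_entities.items():
        for col, (prop, _support) in col_hyps.items():
            if (subject, prop) not in known:
                triples.append((subject, prop, table[row][col]))
    return triples


# Hypothetical example: rows 0 and 1 matched existing KB facts, and row 1
# also received a competing hypothesis, which the vote outweighs.
table = [
    ["Barack Obama", "Aug 4, 1961"],
    ["Person B", "Jan 1, 1950"],
    ["Person C", "Feb 2, 1960"],
]
row_entities = {0: "dbr:Barack_Obama", 1: "ex:PersonB", 2: "ex:PersonC"}
prop_hyps = [(0, 1, "dbo:birthDate"), (1, 1, "dbo:birthDate"), (1, 1, "dbo:deathDate")]
known = {("dbr:Barack_Obama", "dbo:birthDate"), ("ex:PersonB", "dbo:birthDate")}

cols = column_hypotheses(prop_hyps)
print(cols[1][0])
print(new_triples(table, row_entities, cols, known))
```

Here column 1 is labeled `dbo:birthDate` with a support of 2/3, and only row 2 yields a new candidate triple, since the knowledge base already has birth dates for the other two entities. Such candidates would still need to be checked as part of the evaluation step iv) before being added to the knowledge base.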

Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the ekVV is regarded only as an expression of interest; it does not secure team membership.
Please get in touch with the supervisors for information on the application procedure.

Requirements for participation, required level

Required skills:

Programming skills are required (e.g., Perl, Python, Java, ...). However, in a group of several students, conceptual and implementation work can be distributed among the group members.
Knowledge of Semantic Web technologies (RDF, SPARQL) is a plus, but can be acquired during the project.
Experience with and knowledge of data mining and machine learning are a plus, but can be acquired during the project.

Teaching staff

Dates

Frequency	Weekday	Time	Format / Place	Period


Subject assignments

Module	Course	Requirements
39-M-Inf-GP Basic Project Intelligent Systems (Grundlagenprojekt Intelligente Systeme)	Group project	Ungraded examination

The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. Where a module description mentions more than one kind of assignment, the respective member of the teaching staff decides which task(s) to assign to the students.

No further requirements

No eLearning offering available

Address:: SS2018_392268@ekvv.uni-bielefeld.de; This address can be used by teaching staff, their secretary's offices, and the individuals in charge of course data maintenance to send emails to the course participants. IMPORTANT: All sent emails must be activated. Wait for the activation email and follow the instructions given there.; If the reference number is used for several courses in the same semester, use the following alternative address to reach the participants of exactly this course: VST_123314055@ekvv.uni-bielefeld.de

Last update basic details/teaching staff:: Thursday, February 1, 2018
Last update times:: Thursday, February 1, 2018
Last update rooms:: Thursday, February 1, 2018

Type(s) / SWS (hours per week per semester): project (Pj) / 4
Department: Faculty of Technology
Links to this course: To link to this course page, use the following permanent link, which includes the course ID and is always unique (not the link shown in your browser): https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=123314055
ID: 123314055
