392268 ISY Project: Information Extraction from Web Tables (Pj) (SoSe 2018)

Contents, comment

- Short Description
The Web contains a large number (billions) of tables (e.g., HTML tables, spreadsheet documents). Many of these tables contain structured information that could be extracted and added to a knowledge base. Given such a knowledge base, important tasks such as search and question answering can be supported. To do so, the content of a table needs to be understood and represented in terms of an ontology.

In the previous year, within an <a href="https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=92380254">ISY project</a>, students developed and implemented a set of basic tasks that are a necessary prerequisite for table understanding. These basic tasks create basic hypotheses that can then be consumed by higher-level tasks. For example, given <a href="dbpedia.org">DBpedia</a> as a knowledge base and a table cell value "Barack Obama", one task creates the hypothesis that the politician Barack Obama (who is known to DBpedia) is mentioned in that cell. Given a cell value "Aug 4, 1961" in the same table row, another task might create the hypothesis that this cell states Barack Obama's date of birth. Higher-level tasks would then generate hypotheses about rows, columns, or the entire table. Data mining and machine learning techniques will be applied to reach that goal.
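
As a rough illustration of what such cell-level hypotheses could look like, here is a minimal Python sketch. It is not the implementation from the earlier ISY project: the in-memory `TOY_KB` dictionary stands in for DBpedia, and the names `Hypothesis`, `link_entity`, `parse_date`, and `annotate_row` are hypothetical, chosen only for this example.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Toy stand-in for a knowledge base such as DBpedia (assumed layout, not the real API).
TOY_KB = {
    "Barack_Obama": {
        "label": "Barack Obama",
        "type": "Politician",
        "birthDate": date(1961, 8, 4),
    },
}

MONTHS = {m: i + 1 for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}


@dataclass(frozen=True)
class Hypothesis:
    row: int
    col: int
    kind: str                      # "entity" or "property"
    value: str                     # KB entity id or property name
    subject: Optional[str] = None  # entity a property hypothesis refers to


def link_entity(cell: str) -> Optional[str]:
    """Return the id of the KB entity whose label equals the cell text (case-insensitive)."""
    needle = cell.strip().lower()
    for entity_id, facts in TOY_KB.items():
        if facts["label"].lower() == needle:
            return entity_id
    return None


def parse_date(cell: str) -> Optional[date]:
    """Parse dates written like 'Aug 4, 1961'; return None for anything else."""
    match = re.fullmatch(r"([A-Za-z]{3}) (\d{1,2}), (\d{4})", cell.strip())
    if match is None or match.group(1).lower() not in MONTHS:
        return None
    try:
        return date(int(match.group(3)), MONTHS[match.group(1).lower()], int(match.group(2)))
    except ValueError:  # e.g. "Feb 31, 2000"
        return None


def annotate_row(row_index: int, cells: List[str]) -> List[Hypothesis]:
    """Create entity hypotheses first, then property hypotheses for the entities found in the row."""
    hypotheses: List[Hypothesis] = []
    entities: List[str] = []
    for col, cell in enumerate(cells):
        entity_id = link_entity(cell)
        if entity_id is not None:
            hypotheses.append(Hypothesis(row_index, col, "entity", entity_id))
            entities.append(entity_id)
    for col, cell in enumerate(cells):
        parsed = parse_date(cell)
        if parsed is None:
            continue
        for entity_id in entities:
            if TOY_KB[entity_id].get("birthDate") == parsed:
                hypotheses.append(
                    Hypothesis(row_index, col, "property", "birthDate", subject=entity_id))
    return hypotheses
```

Calling `annotate_row(0, ["Barack Obama", "Aug 4, 1961"])` yields one entity hypothesis for the first cell and one `birthDate` hypothesis for the second. A real task would presumably use fuzzy matching and a proper knowledge-base lookup instead of exact label comparison.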

The basic tasks were executed on one million Web tables, resulting in an enriched table corpus. The tasks and the results are published in the paper <a href="https://pub.uni-bielefeld.de/publication/2913458">Towards a Large Corpus of Richly Annotated Web Tables for Knowledge Base Population</a>. The data was made available <a href="https://pub.uni-bielefeld.de/data/2912802">online</a> as well.

The goals of the project are to i) develop and implement these higher-level tasks by building on the basic tasks, thereby realizing table understanding, ii) execute these tasks on real data, iii) extract information from tables and extend a knowledge base, and iv) evaluate the correctness of the tasks and the extracted information. Possibly, v) further basic tasks will need to be implemented or existing basic tasks improved.
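
To make goals i) and iv) slightly more concrete, the following Python sketch shows one possible higher-level task and one possible evaluation measure. Both are assumptions for illustration only: the majority-vote rule, the `min_support` threshold, and the function names `column_hypothesis` and `precision_recall` are not taken from the project itself.

```python
from collections import Counter
from typing import Hashable, List, Optional, Set, Tuple


def column_hypothesis(cell_types: List[Optional[str]],
                      min_support: float = 0.5) -> Optional[str]:
    """Derive a column-level type from cell-level type hypotheses by majority vote.

    cell_types holds one entry per cell: the hypothesised entity type,
    or None if no basic task produced a hypothesis for that cell.
    """
    typed = [t for t in cell_types if t is not None]
    if not typed:
        return None
    label, count = Counter(typed).most_common(1)[0]
    # Accept the majority type only if enough cells of the whole column support it.
    return label if count / len(cell_types) >= min_support else None


def precision_recall(predicted: Set[Hashable],
                     gold: Set[Hashable]) -> Tuple[float, float]:
    """Compare extracted facts against a manually checked gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For instance, a column whose cells were typed as `["Politician", "Politician", None, "City"]` would be labelled `Politician` under the default threshold, and extracted facts could be scored as sets of (subject, property, value) triples.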

Please note that the teams will be selected by the supervisors on the basis of short applications that students are expected to send to them. Registering for the project in the ekVV will only be regarded as an expression of interest; it will not secure a place on a team.
Please get in touch with the supervisors for information on the application procedure.

Requirements for participation, required level

Required skills:

  • programming skills are required (e.g., Perl, Python, Java, ...). However, in a group of several students, conceptual and implementation work can be distributed among the group members.
  • knowledge of Semantic Web technologies (RDF, SPARQL) is a plus, but can be acquired during the project.
  • experience with and knowledge of data mining and machine learning are a plus, but can be acquired during the project.

Teaching staff

Dates

Frequency: by appointment (n.V.)
Period: 09.04.-20.07.2018


Subject assignments

Module: 39-M-Inf-GP Grundlagenprojekt Intelligente Systeme
Course: Gruppenprojekt
Requirements: Ungraded examination

The binding module descriptions contain further information, including specifications on the "types of assignments" students need to complete. In cases where a module description mentions more than one kind of assignment, the respective member of the teaching staff will decide which task(s) they assign the students.


No more requirements
No eLearning offering available
Registered number: 4
This is the number of students who have stored the course in their timetable. The number in brackets is the number of users registered via guest accounts.
Last update basic details/teaching staff:
Thursday, February 1, 2018 
Last update times:
Thursday, February 1, 2018 
Last update rooms:
Thursday, February 1, 2018 
Type(s) / SWS (hours per week per semester)
project (Pj) / 4
Department
Faculty of Technology
Links to this course
If you want to set links to this course page, please use one of the following links. Do not use the link shown in your browser!
The following link includes the course ID and is always unique:
https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=123314055
ID
123314055