Text to Knowledge

Recent estimates suggest that up to 90% of data on the Web and in enterprises is unstructured, e.g., natural language text. Information extraction (IE) systems discover structured information in such text (e.g., they convert news articles into database entries listing the extracted named entities, relations, dates, etc.). Structured information in turn enables much richer querying and data mining, e.g., using semantic reasoning. Beyond understanding human-produced text, we are also interested in helping people find the (textual) information they are looking for, or could be interested in, including predicting what they will (want to) read.

The overarching objective of our machine reading research group is to build systems that make sense of human-produced text.

Topics we have recently been working on in this area include the following:

Relation extraction for knowledge base population
Keyphrase extraction
Text similarity and categorization
Prediction of news adoption over social media

For these research topics, we have been using classical machine learning methods as well as various representation learning methods, and we are currently working on ways to integrate external (structured) knowledge into neural (unstructured) text processing methods.

In the recent past, we have also worked on information retrieval tasks, in particular federated web search, resource selection for IE, knowledge extraction from social media, and user disagreement modeling.

In this research and related projects, we regularly make use of more generic software tools produced by other teams within IDLab, e.g., LimeDS and Tengu.

Staff

Chris Develder, Bart Dhoedt, Thomas Demeester, Erik Mannens, Azarakhsh Jalalvand

Researchers

Lucas Sterckx, Giannis Bekoulis, Cedric De Boom, Johannes Deleu, Laurent Mertens, Gerald Haesendonck, Ben De Meester, Martin Van Brabant

Projects

ICON Steamer

ICON Providence

Google DNI Fund: Providence+

Key publications

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Knowledge base population using semantic label propagation", Knowledge-Based Syst., 10 May 2016.

C. De Boom, S. Van Canneyt, T. Demeester and B. Dhoedt, "Representation learning for very short texts using weighted word embedding aggregation", Pattern Recognition Lett., 2016.

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen and C. Develder, "Predicting relevance based on assessor disagreement: Analysis and practical applications for search evaluation", Inf. Retr., Nov. 2015, pp. 1-29.

S. Van Canneyt, S. Schockaert and B. Dhoedt, "Categorizing events using spatio-temporal and user features from Flickr", Information Sciences, vol. 328, Jan. 2016, pp. 76-96.

P. Barrio, L. Gravano and C. Develder, "Ranking deep web text collections for scalable information extraction", in Proc. 24th ACM Int. Conf. Information and Knowledge Management (CIKM 2015), Melbourne, Australia, 19-23 Oct. 2015.

T. Demeester, T. Rocktäschel and S. Riedel, "Lifted rule injection for relation embeddings", in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016.