
Ekaterini Ioannou

Software Technology and Network Applications Laboratory

Department of Electronic & Computer Engineering
Technical University of Crete
University Campus
73100, Crete, HELLAS

Email: ioannousoftnet.gr

Entity Linkage for Heterogeneous, Uncertain, and Volatile Data

Ekaterini Ioannou
Ph.D. thesis, University of Hannover, Hannover, Germany, April 2011.
pdf,   slides


A plethora of collections is nowadays created by merging data from a variety of applications and information sources. These sources often use different identifiers for data that describe the same real-world object, for example an artist, a conference, or an organization. Most of the many existing entity linkage approaches were not designed for the characteristics of modern applications and Web data. These characteristics include data heterogeneity due to the lack of uniform standards, uncertainty resulting from imperfections in the extraction process or from the reliability of the sources, and the volatile nature of the data due to constant modifications through interactions with users or external applications.

This dissertation introduces a novel methodology to address the entity linkage problem for heterogeneous, uncertain, and volatile data. The methodology is based on a probabilistic linkage database, which is able to simultaneously capture the entities from the original collection data and the possible linkages between entities, as these are generated by a number of the existing entity linkage techniques.
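
As a rough illustration only, and not the dissertation's actual design, the idea of keeping entities and their possible linkages side by side could be sketched as follows. The class name `LinkageStore`, its methods, and the sample probabilities are all hypothetical, and the sketch assumes each linkage is an independent pairwise probability that two entities describe the same real-world object.

```python
from dataclasses import dataclass, field


@dataclass
class LinkageStore:
    """Toy store (hypothetical): entities plus uncertain 'same object' links."""

    entities: dict = field(default_factory=dict)  # entity id -> attribute dict
    linkages: dict = field(default_factory=dict)  # frozenset({id_a, id_b}) -> probability

    def add_entity(self, entity_id, **attributes):
        """Register an entity from the original collection."""
        self.entities[entity_id] = attributes

    def add_linkage(self, id_a, id_b, probability):
        """Record that id_a and id_b may describe the same object."""
        if id_a == id_b:
            raise ValueError("a linkage needs two distinct entities")
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        self.linkages[frozenset((id_a, id_b))] = probability

    def possible_matches(self, entity_id, threshold=0.0):
        """Return (other id, probability) pairs, most probable first."""
        matches = []
        for pair, probability in self.linkages.items():
            if entity_id in pair and probability >= threshold:
                (other,) = pair - {entity_id}
                matches.append((other, probability))
        return sorted(matches, key=lambda match: match[1], reverse=True)


# Example data (invented for illustration).
store = LinkageStore()
store.add_entity("e1", name="J. Smith")
store.add_entity("e2", name="John Smith")
store.add_entity("e3", name="Jon Smyth")
store.add_linkage("e1", "e2", 0.9)
store.add_linkage("e1", "e3", 0.4)
```

Keeping the linkages as probabilities, rather than merging entities up front, leaves the decision about which entities to treat as the same object to query time.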

The probabilistic linkage database consists of two main components. The first is related to efficient query processing. The proposed query mechanism not only considers the entities and the probabilistic linkages but also handles the uncertainty present in them.

The second component is related to the processing of the entity data for generating probabilistic linkages between entities. In order to handle the heterogeneity and the volatile nature of the data, this part focuses on incremental and adaptive techniques that consider not only the available textual information but also its inferred semantics. Both the effectiveness and the efficiency of the introduced algorithms are illustrated through an experimental evaluation that involves real-world data.


@phdthesis{ioannou2011,
     author = {Ekaterini Ioannou},
     title = {Entity linkage for heterogeneous, uncertain, and volatile data},
     school = {University of Hannover},
     year = {2011}
}

Last modified: April 2011