Ekaterini Ioannou
Software Technology and Network Applications Laboratory
Department of Electronic & Computer Engineering
Technical University of Crete
University Campus
73100, Crete, HELLAS
Emails: ioannou ![]() EkateriniIoannou ![]()
Efficient Semantic-Aware Detection of Near Duplicate Resources
Ekaterini Ioannou, Odysseas Papapetrou, Dimitrios Skoutas, Wolfgang Nejdl
In Proceedings of the 7th Extended Semantic Web Conference (ESWC), 30 May - 03 June 2010, Heraklion, Greece.
pdf, presentation, news articles dataset

Abstract
Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and redundancy, and to increase the diversity in the information provided to the user. In this paper, we introduce an approach for efficient semantic-aware near duplicate detection, by combining an indexing scheme for similarity search with the RDF representations of the resources. We provide a probabilistic analysis for the correctness of the suggested approach, which allows applications to configure it for satisfying their specific quality requirements. Our experimental evaluation on the RDF descriptions of real-world news articles from various news agencies demonstrates the efficiency and effectiveness of our approach.

Bibtex
@inproceedings{conf/esws/IoannouPSN10,
  author = {Ekaterini Ioannou and Odysseas Papapetrou and Dimitrios Skoutas and Wolfgang Nejdl},
  title = {Efficient Semantic-Aware Detection of Near Duplicate Resources},
  booktitle = {ESWC},
  year = {2010},
  pages = {136-150}
}
Last modified: April 2011