"Structure and Value Synopses for XML Data Graphs"

by Neoklis Polyzotis and Minos Garofalakis.
Proceedings of VLDB'2002, Hong Kong, China, August 2002, pp. 466-477.



Abstract

All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we extend our earlier work on structural XSketch synopses and we propose an (augmented) XSketch synopsis model that exploits localized stability and value-distribution summaries (e.g., histograms) to accurately capture the complex correlation patterns that can exist between and across path structure and element values in the data graph. We develop a systematic XSketch estimation framework for complex path expressions with value predicates and we propose an efficient heuristic algorithm based on greedy forward selection for building an effective XSketch for a given amount of space (which is, in general, an NP-hard optimization problem). Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach.

[ camera-ready paper (pdf) (ps.gz) | Alkis' talk slides (ppt.gz) ]


Copyright © 2002, VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.