Evaluating Recursive Queries in Distributed Databases

Wolfgang Nejdl, Stefano Ceri, and Gio Wiederhold

Abstract

In this paper, we study the execution of logic queries in a distributed database environment. We assume that each local database system can execute logic queries, and we design methods for the efficient execution of queries requiring data from multiple sites. Conventional optimization strategies which are well-known in the field of distributed databases, such as the early evaluation of selection conditions and the clustering of processing to manipulate and exchange large sets of tuples, are redefined in view of the additional difficulties due to logic queries, in particular to recursive rules.

In order to allow efficient processing of these logic queries we present several program transformation techniques which attempt to minimize distribution costs based on the idea of semi-joins and generalized semi-joins in conventional databases. Although local computation of semi-joins is not possible for the general case, we indicate classes of programs for which these transformations succeed in producing set-oriented computation. We describe processes evaluating the recursive program in a distributed network and develop an efficient method for testing the termination of the computation. Finally we compare our approach with sequential as well as dataflow-oriented evaluation. Datalog is assumed as logic programming language and paradigm.

Keywords: Deductive databases, distributed databases, distributed query processing, query optimization, recursive query processing, semijoin

A preliminary version of the full paper is available as a postscript file .