-----------------------------------------------------------------------------
                   Query Optimization in the Deep Web
-----------------------------------------------------------------------------
Andrea Cali'                          Davide Martinenghi
Brunel University                     Politecnico di Milano
-----------------------------------------------------------------------------
Abstract

The term Deep Web refers to the data content that is created dynamically as
the result of a specific search on the Web. In this respect, such content
resides outside web pages, and is only accessible through interaction with
the web site – typically via HTML forms. It is believed that the size of the
Deep Web is several orders of magnitude larger than that of the so-called
Surface Web, i.e., the web that is accessible and indexable by search
engines.

Usually, data sources accessible through web forms are modeled by relations
that require certain fields to be selected – i.e., some fields in the form
need to be filled in. These requirements are commonly referred to as access
limitations in that access to data can only take place according to given
patterns. Besides data accessible through web forms, access limitations may
also occur i) in legacy systems where data scattered over several files are
wrapped as relational tables, and ii) in the context of Web services, where
similar restrictions arise from the distinction between input parameters and
output parameters. In such contexts, computing the answer to a user query
cannot be done as in a traditional database; instead, a query plan is needed
that provides the best answer possible while complying with the access
limitations.

In these talks, we illustrate the semantics of answers to queries over data
sources under access limitations and present techniques for query answering
in this context. We show different techniques to optimize query answering
both at the time of the query plan generation and at the time of the
execution of the query plan.
We analyze the influence on query answering of integrity constraints on the
sources, of the kind usually found in database schemata. We present
prototype systems aimed at querying the Deep Web, and show their
achievements.
-----------------------------------------------------------------------------
Thursday, 10 June 2010
11:30
Room N7
Faculty of Engineering - Università Roma Tre
Via Vasca Navale, 79
00146 Roma
Directions: http://atzeni.dia.uniroma3.it/accesso/index.html
-----------------------------------------------------------------------------
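
As a rough illustration of the access limitations described in the abstract, the following Python sketch models two sources whose first field must be filled in before any data is returned, and extracts all obtainable tuples by repeatedly feeding known values back into the sources. The relation names (concert, album), the toy data, the domain labels and all function names are assumptions made up for this example. The naive fixpoint shown here is only one simple way to comply with access patterns, and is not necessarily the technique presented in the talks.

```python
from itertools import product

# Hypothetical toy data; in a real setting each source is a web form, not a list.
DATA = {
    "concert": [("Rome", "Alice"), ("Milan", "Bob")],
    "album": [("Alice", "First Light"), ("Bob", "Night Drive"), ("Carol", "Hidden")],
}
# Access pattern per relation: "i" = field must be filled in, "o" = field is returned.
PATTERNS = {"concert": "io", "album": "io"}
# Abstract domain of each field, so a value is only reused in fields of the same kind.
DOMAINS = {"concert": ("city", "artist"), "album": ("artist", "title")}


def access(relation, bindings):
    """Simulate submitting a form: bindings maps each input position to a value."""
    pattern = PATTERNS[relation]
    inputs = {i for i, mode in enumerate(pattern) if mode == "i"}
    if set(bindings) != inputs:
        raise ValueError(f"{relation} needs values for positions {sorted(inputs)}")
    return [t for t in DATA[relation] if all(t[i] == v for i, v in bindings.items())]


def obtainable_tuples(initial_values):
    """Naive fixpoint: access sources with known values until no new access is possible."""
    known = {dom: set(vals) for dom, vals in initial_values.items()}
    extracted = {rel: set() for rel in DATA}
    tried = set()  # (relation, input values) pairs already submitted
    changed = True
    while changed:
        changed = False
        for rel, pattern in PATTERNS.items():
            positions = [i for i, mode in enumerate(pattern) if mode == "i"]
            candidates = [sorted(known.get(DOMAINS[rel][i], set())) for i in positions]
            for combo in product(*candidates):
                if (rel, combo) in tried:
                    continue
                tried.add((rel, combo))
                changed = True
                for t in access(rel, dict(zip(positions, combo))):
                    extracted[rel].add(t)
                    # Every returned value becomes usable as input elsewhere.
                    for i, value in enumerate(t):
                        known.setdefault(DOMAINS[rel][i], set()).add(value)
    return extracted


def titles_for_city(city):
    """Album titles of artists playing in the given city, using only legal accesses."""
    extracted = obtainable_tuples({"city": [city]})
    artists = {artist for c, artist in extracted["concert"] if c == city}
    return sorted(title for artist, title in extracted["album"] if artist in artists)


print(titles_for_city("Rome"))  # ['First Light']
```

Starting from the constant "Rome", the sketch can reach Alice's album but never the tuple for Carol, since no access ever returns the value "Carol" that the album source requires as input. This is the sense in which a plan yields the best answer possible under the limitations rather than the answer a traditional database would give.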