[EBOOK] Advanced Metasearch Engine Technology PDF Download

Computers

Advanced Metasearch Engine Technology

Book Details:

Author : Weiyi Meng
Publisher : Morgan & Claypool Publishers
Release : 2011
ISBN : 1608451925
Pages : 130 pages

Download or read book Advanced Metasearch Engine Technology written by Weiyi Meng and published by Morgan & Claypool Publishers. This book was released on 2011 with total page 130 pages. Available in PDF, EPUB and Kindle. Book excerpt: Among the search tools currently on the Web, search engines are the most well known thanks to the popularity of major search engines such as Google and Yahoo!. While extremely successful, these major search engines do have serious limitations. This book introduces large-scale metasearch engine technology, which has the potential to overcome the limitations of the major search engines. Essentially, a metasearch engine is a search system that supports unified access to multiple existing search engines by passing the queries it receives to its component search engines and aggregating the returned results into a single ranked list. A large-scale metasearch engine has thousands or more component search engines. While metasearch engines were initially motivated by their ability to combine the search coverage of multiple search engines, there are also other benefits such as the potential to obtain better and fresher results and to reach the Deep Web. The following major components of large-scale metasearch engines will be discussed in detail in this book: search engine selection, search engine incorporation, and result merging. Highly scalable and automated solutions for these components are emphasized. The authors make a strong case for the viability of the large-scale metasearch engine technology as a competitive technology for Web search. Table of Contents: Introduction / Metasearch Engine Architecture / Search Engine Selection / Search Engine Incorporation / Result Merging / Summary and Future Research

Computers

Advanced Metasearch Engine Technology

Book Details:

Author : Weiyi Meng
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018435
Pages : 117 pages

Download or read book Advanced Metasearch Engine Technology written by Weiyi Meng and published by Springer Nature. This book was released on 2022-05-31 with total page 117 pages. Available in PDF, EPUB and Kindle. The book excerpt and table of contents are the same as for the Morgan & Claypool edition listed above.

Computers

Peer to Peer Data Management

Book Details:

Author : Karl Aberer
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018478
Pages : 138 pages

Download or read book Peer to Peer Data Management written by Karl Aberer and published by Springer Nature. This book was released on 2022-05-31 with total page 138 pages. Available in PDF, EPUB and Kindle. Book excerpt: This lecture systematically introduces the problem of managing large data collections in peer-to-peer systems. Search over large datasets has always been a key problem in peer-to-peer systems, and the peer-to-peer paradigm has incited novel directions in the field of data management. This resulted in many novel peer-to-peer data management concepts and algorithms for supporting data management tasks in a wider sense, including data integration, document management and text retrieval. The lecture covers four different types of peer-to-peer data management systems that are characterized by the type of data they manage and the search capabilities they support. The first type are structured peer-to-peer data management systems, which support structured query capabilities for standard data models. The second type are peer-to-peer data integration systems for querying heterogeneous databases without requiring a common global schema. The third type are peer-to-peer document retrieval systems that enable document search based both on the textual content and the document structure. Finally, we introduce semantic overlay networks, which support similarity search on information represented in hierarchically organized and multi-dimensional semantic spaces. Topics that go beyond data representation and search are summarized at the end of the lecture. Table of Contents: Introduction / Structured Peer-to-Peer Databases / Peer-to-peer Data Integration / Peer-to-peer Retrieval / Semantic Overlay Networks / Conclusion

Computers

Methods for Mining and Summarizing Text Conversations

Book Details:

Author : Giuseppe Carenini
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 303101880X
Pages : 120 pages

Download or read book Methods for Mining and Summarizing Text Conversations written by Giuseppe Carenini and published by Springer Nature. This book was released on 2022-06-01 with total page 120 pages. Available in PDF, EPUB and Kindle. Book excerpt: Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods to extract information from conversational data, and to provide natural language summaries of the data. The book begins with an overview of basic concepts, such as the differences between extractive and abstractive summaries, and metrics for evaluating the effectiveness of summarization and various extraction tasks. It also describes some of the benchmark corpora used in the literature. The book introduces extraction and mining methods for performing subjectivity and sentiment detection, topic segmentation and modeling, and the extraction of conversational structure. It also describes frameworks for conducting dialogue act recognition, decision and action item detection, and extraction of thread structure. There is a specific focus on performing all these tasks on conversational data, such as meeting transcripts (which exemplify synchronous conversations) and emails (which exemplify asynchronous conversations). Very recent approaches to deal with blogs, discussion forums and microblogs (e.g., Twitter) are also discussed. The second half of this book focuses on natural language summarization of conversational data. It gives an overview of several extractive and abstractive summarizers developed for emails, meetings, blogs and forums. It also describes attempts at building multi-modal summarizers. Last but not least, the book concludes with thoughts on topics for further development. Table of Contents: Introduction / Background: Corpora and Evaluation Methods / Mining Text Conversations / Summarizing Text Conversations / Conclusions / Final Thoughts

Computers

Probabilistic Databases

Book Details:

Author : Dan Suciu
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018796
Pages : 164 pages

Download or read book Probabilistic Databases written by Dan Suciu and published by Springer Nature. This book was released on 2022-05-31 with total page 164 pages. Available in PDF, EPUB and Kindle. Book excerpt: Probabilistic databases are databases where the value of some attributes or the presence of some records is uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases, by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called the lineage expression: every relational query can be evaluated this way, but the data complexity depends dramatically on the query being evaluated, and can be #P-hard. The book also discusses some advanced topics in probabilistic data management such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases. Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques
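
To make the notion of a tuple-independent table concrete, here is a minimal Python sketch. It is not taken from the book: the table `SIGHTINGS`, the helper names, and the example query are all hypothetical. It assumes every tuple is present independently with its stated probability, and it compares a closed-form computation for a simple existence query with brute-force enumeration of all possible worlds.

```python
from itertools import product
from math import isclose, prod

# Hypothetical tuple-independent table: each (row, probability) pair is
# present independently of the others. Rows here are just city names.
SIGHTINGS = [("Oslo", 0.5), ("Oslo", 0.2), ("Rome", 0.9)]


def prob_exists_formula(table, pred):
    """P(some matching tuple is present) = 1 - product of (1 - p_i)."""
    return 1.0 - prod(1.0 - p for row, p in table if pred(row))


def prob_exists_worlds(table, pred):
    """Same probability, computed by summing over all 2^n possible worlds."""
    total = 0.0
    for world in product([False, True], repeat=len(table)):
        # Probability of this world: p for present tuples, 1 - p for absent.
        weight = prod(p if present else 1.0 - p
                      for (_, p), present in zip(table, world))
        if any(present and pred(row)
               for (row, _), present in zip(table, world)):
            total += weight
    return total


def is_oslo(city):
    return city == "Oslo"


print(prob_exists_formula(SIGHTINGS, is_oslo))
print(prob_exists_worlds(SIGHTINGS, is_oslo))
```

Both functions should agree (about 0.6 for the Oslo query), but the formula touches each tuple once while the enumeration grows exponentially with the table size. This only illustrates why independence assumptions can make some queries cheap to evaluate; it does not cover the safe-query or lineage machinery the excerpt mentions.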

Computers

Uncertain Schema Matching

Book Details:

Author : Avigdor Gal
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018451
Pages : 85 pages

Download or read book Uncertain Schema Matching written by Avigdor Gal and published by Springer Nature. This book was released on 2022-05-31 with total page 85 pages. Available in PDF, EPUB and Kindle. Book excerpt: Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. Schema matching is one of the basic operations required by the process of data and schema integration, and thus has a great effect on its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. Although schema matching research has been ongoing for over 25 years, more recently a realization has emerged that schema matchers are inherently uncertain. Since 2003, work on the uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management. This lecture presents various aspects of uncertainty in schema matching within a single unified framework. We introduce basic formulations of uncertainty and provide several alternative representations of schema matching uncertainty. Then, we cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles, and top-K matchings, and analyze them in this context. We conclude with a set of real-world applications. Table of Contents: Introduction / Models of Uncertainty / Modeling Uncertain Schema Matching / Schema Matcher Ensembles / Top-K Schema Matchings / Applications / Conclusions and Future Work

Computers

On Uncertain Graphs

Book Details:

Author : Arijit Khan
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018605
Pages : 80 pages

Download or read book On Uncertain Graphs written by Arijit Khan and published by Springer Nature. This book was released on 2022-05-31 with total page 80 pages. Available in PDF, EPUB and Kindle. Book excerpt: Large-scale, highly interconnected networks, which are often modeled as graphs, pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data due to a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database and data mining communities. Many classical algorithms such as reachability and shortest path queries become #P-complete and, thus, more expensive over uncertain graphs. Moreover, various complex queries and analytics are also emerging over uncertain networks, such as pattern matching, information diffusion, and influence maximization queries. In this book, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, as well as the complexities and algorithmic advances in uncertain graph processing in the context of both classical and emerging graph queries and analytics. We emphasize the current challenges and highlight some future research directions.

Computers

Query Processing over Incomplete Databases

Book Details:

Author : Yunjun Gao
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 303101863X
Pages : 106 pages

Download or read book Query Processing over Incomplete Databases written by Yunjun Gao and published by Springer Nature. This book was released on 2022-06-01 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: Incomplete data is part of life and almost all areas of scientific studies. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive questions on surveys; sensors fail, resulting in the loss of certain readings; publicly viewable satellite map services have missing data in many mobile applications; and in privacy-preserving applications, the data is incomplete deliberately in order to preserve the sensitivity of some attribute values. Query processing is a fundamental problem in computer science, and is useful in a variety of applications. In this book, we mostly focus on the query processing over incomplete databases, which involves finding a set of qualified objects from a specified incomplete dataset in order to support a wide spectrum of real-life applications. We first elaborate the three general kinds of methods of handling incomplete data, including (i) discarding the data with missing values, (ii) imputation for the missing values, and (iii) just depending on the observed data values. For the third method type, we introduce the semantics of k-nearest neighbor (kNN) search, skyline query, and top-k dominating query on incomplete data, respectively. In terms of the three representative queries over incomplete data, we investigate some advanced techniques to process incomplete data queries, including indexing, pruning as well as crowdsourcing techniques.

Computers

Similarity Joins in Relational Database Systems

Book Details:

Author : Nikolaus Augsten
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018516
Pages : 106 pages

Download or read book Similarity Joins in Relational Database Systems written by Nikolaus Augsten and published by Springer Nature. This book was released on 2022-05-31 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs, for which the similarity is computed, are very different from each other. Filters exploit this property to improve the performance of similarity joins. A filter preprocesses the input data sets and produces a set of candidate pairs. The distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
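
The filter-then-verify pattern described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the book's algorithm: it uses a simple length filter (the difference in string lengths is a lower bound on the edit distance) rather than the prefix, size, positional, or partitioning filters the excerpt names, and the function names and sample strings are hypothetical.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_join(left, right, tau):
    """Return pairs within edit distance tau, plus the candidate count.

    Filter step: |len(a) - len(b)| is a lower bound on the edit distance,
    so pairs whose lengths differ by more than tau cannot qualify.
    Verify step: the expensive distance is computed on candidates only.
    """
    candidates = [(a, b) for a, b in product(left, right)
                  if abs(len(a) - len(b)) <= tau]
    matches = [(a, b) for a, b in candidates if edit_distance(a, b) <= tau]
    return matches, len(candidates)


LEFT = ["kitten", "database", "join"]
RIGHT = ["sitting", "databases", "joint", "tree"]
matches, n_candidates = similarity_join(LEFT, RIGHT, tau=1)
print(matches, n_candidates)
```

With these sample inputs, the filter leaves 6 of the 12 possible pairs for verification. The sketch still enumerates every pair to apply the filter; practical systems would use an index or sorting so that non-candidates are never generated at all.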

Computers

Perspectives on Business Intelligence

Book Details:

Author : Raymond T. Ng
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018486
Pages : 151 pages

Download or read book Perspectives on Business Intelligence written by Raymond T. Ng and published by Springer Nature. This book was released on 2022-05-31 with total page 151 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the 1980s, traditional Business Intelligence (BI) systems focused on the delivery of reports that describe the state of business activities in the past, answering questions like "How did our sales perform during the last quarter?" A decade later, there was a shift to more interactive content that presented how the business was performing at the present time, answering questions like "How are we doing right now?" Today, the focus of BI users is on looking into the future: "Given what I did before and how I am currently doing this quarter, how will I do next quarter?" Furthermore, fuelled by the demands of Big Data, BI systems are going through a time of incredible change. Predictive analytics, high volume data, unstructured data, social data, mobile, consumable analytics, and data visualization are all examples of demands and capabilities that have become critical within just the past few years, and are growing at an unprecedented pace. This book introduces research problems and solutions on various aspects central to next-generation BI systems. It begins with a chapter on an industry perspective on how BI has evolved, and discusses how game-changing trends have drastically reshaped the landscape of BI. One of the game changers is the shift toward the consumerization of BI tools. As a result, for BI tools to be successfully used by business users (rather than IT departments), the tools need a business model, rather than a data model. One chapter of the book surveys four different types of business modeling. However, even with the existence of a business model for users to express queries, the data that can meet the needs are still captured within a data model. The next chapter on vivification addresses the problem of closing the gap, which is often significant, between the business and the data models. Moreover, Big Data forces BI systems to integrate and consolidate multiple, and often wildly different, data sources. One chapter gives an overview of several integration architectures for dealing with the challenges that need to be overcome. While the book so far focuses on the usual structured relational data, the remaining chapters turn to unstructured data, an ever-increasing and important component of Big Data. One chapter on information extraction describes methods for dealing with the extraction of relations from free text and the web. Finally, BI users need tools to visualize and interpret new and complex types of information in a way that is compelling and intuitive, yet accurate. The last chapter gives an overview of information visualization for decision support and text.

Computers

Data Management in the Cloud

Book Details:

Author : Divyakant Agrawal
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018958
Pages : 120 pages

Download or read book Data Management in the Cloud written by Divyakant Agrawal and published by Springer Nature. This book was released on 2022-05-31 with total page 120 pages. Available in PDF, EPUB and Kindle. Book excerpt: Cloud computing has emerged as a successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is used. This success has seen a proliferation in the number of applications that are being deployed in various cloud platforms. There has also been an increase in the scale of the data generated as well as consumed by such applications. Scalable database management systems form a critical part of the cloud infrastructure. The attempt to address the challenges posed by the management of big data has led to a plethora of systems. This book aims to clarify some of the important concepts in the design space of scalable data management in cloud computing infrastructures. Some of the questions that this book aims to answer are: which systems are appropriate for a specific set of application requirements, what the research challenges in data management for the cloud are, and what is novel in the cloud for database researchers. We also aim to address one basic question: whether cloud computing poses new challenges in scalable data management or is just a reincarnation of old problems. We provide a comprehensive background study of state-of-the-art systems for scalable data management and analysis. We also identify important aspects in the design of different systems and the applicability and scope of these systems. A thorough understanding of current solutions and a precise characterization of the design space are essential for clearing the "cloudy skies of data management" and ensuring the success of DBMSs in the cloud, thus emulating the success enjoyed by relational databases in traditional enterprise settings. 
Table of Contents: Introduction / Distributed Data Management / Cloud Data Management: Early Trends / Transactions on Co-located Data / Transactions on Distributed Data / Multi-tenant Database Systems / Concluding Remarks

Computers

Transaction Processing on Modern Hardware

Book Details:

Author : Mohammad Sadoghi
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018702
Pages : 122 pages

Download or read book Transaction Processing on Modern Hardware written by Mohammad Sadoghi and published by Springer Nature. This book was released on 2022-05-31 with total page 122 pages. Available in PDF, EPUB and Kindle. Book excerpt: The last decade has brought groundbreaking developments in transaction processing. This resurgence of an otherwise mature research area has been spurred by the diminishing cost per GB of DRAM, which allows many transaction processing workloads to be entirely memory-resident. This shift demanded a pause to fundamentally rethink the architecture of database systems. The data storage lexicon has now expanded beyond spinning disks and RAID levels to include the cache hierarchy, memory consistency models, cache coherence and write invalidation costs, NUMA regions, and coherence domains. New memory technologies promise fast non-volatile storage and expose uncharted trade-offs for transactional durability, such as exploiting byte-addressable hot and cold storage through persistent programming that promotes simpler recovery protocols. In the meantime, the plateauing single-threaded processor performance has brought massive concurrency within a single node, first in the form of multi-core, and now with many-core and heterogeneous processors. The exciting possibility to reshape the storage, transaction, logging, and recovery layers of next-generation systems on emerging hardware has prompted the database research community to vigorously debate the trade-offs between specialized kernels that narrowly focus on transaction processing performance vs. designs that permit transactionally consistent data accesses from decision support and analytical workloads. In this book, we aim to classify and distill the new body of work on transaction processing that has surfaced in the last decade to guide researchers and practitioners through this intricate research subject.

Computers

Databases on Modern Hardware

Book Details:

Author : Anastasia Ailamaki
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031018583
Pages : 101 pages

Download or read book Databases on Modern Hardware written by Anastasia Ailamaki and published by Springer Nature. This book was released on 2022-06-01 with total page 101 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data management systems enable various influential applications from high-performance online services (e.g., social networks like Twitter and Facebook or financial markets) to big data analytics (e.g., scientific exploration, sensor networks, business intelligence). As a result, data management systems have been one of the main drivers for innovations in the database and computer architecture communities for several decades. Recent hardware trends require software to take advantage of the abundant parallelism existing in modern and future hardware. The traditional design of the data management systems, however, faces inherent scalability problems due to its tightly coupled components. In addition, it cannot exploit the full capability of the aggressive micro-architectural features of modern processors. As a result, today's most commonly used server types remain largely underutilized, leading to a huge waste of hardware resources and energy. In this book, we shed light on the challenges that arise when running a DBMS on modern multicore hardware. We divide the material into two dimensions of scalability: implicit/vertical and explicit/horizontal. The first part of the book focuses on the vertical dimension: it describes the instruction- and data-level parallelism opportunities in a core coming from the hardware and software side. In addition, it examines the sources of under-utilization in a modern processor and presents insights and hardware/software techniques to better exploit the microarchitectural resources of a processor by improving cache locality at the right level of the memory hierarchy. The second part focuses on the horizontal dimension, i.e., scalability bottlenecks of database applications at the level of multicore and multisocket multicore architectures. It first presents a systematic way of eliminating such bottlenecks in online transaction processing workloads, which is based on minimizing unbounded communication, and shows several techniques that minimize bottlenecks in major components of database management systems. Then, it demonstrates the data and work sharing opportunities for analytical workloads, and reviews advanced scheduling mechanisms that are aware of nonuniform memory accesses and alleviate bandwidth saturation.

Computers

Data Intensive Workflow Management

Book Details:

Author : Daniel Oliveira
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031018729
Pages : 161 pages

Download or read book Data Intensive Workflow Management written by Daniel Oliveira and published by Springer Nature. This book was released on 2022-06-01 with total page 161 pages. Available in PDF, EPUB and Kindle. Book excerpt: Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually present a considerable number of activities and activations (i.e., tasks associated with activities) and may need a long time for execution. Due to the continuous need to store and process data efficiently (making them data-intensive workflows), high-performance computing environments allied to parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large-commodity computing clusters connected using high-speed communications switches and networks. The main advantage of DISC frameworks is that they support and grant efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, the execution of workflows in cloud and DISC environments raises many challenges such as scheduling workflow activities and activations, managing produced data, collecting provenance data, etc. Several existing approaches deal with the challenges mentioned earlier. Thus, there is a real need for understanding how to manage these workflows and the various big data platforms that have been developed and introduced. As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data. In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in single-site and multi-site clouds, taking advantage of provenance. Afterward, we move on to workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of the workflow using frameworks such as Apache Spark and its extensions.

Computers

Big Data Integration

Book Details:

Author : Xin Luna Dong
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018532
Pages : 178 pages

Download or read book Big Data Integration written by Xin Luna Dong and published by Springer Nature. This book was released on 2022-05-31 with total page 178 pages. Available in PDF, EPUB and Kindle. Book excerpt: The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources are very dynamic, and the number of data sources is also rapidly exploding. Third, data sources are extremely heterogeneous in their structure and content, exhibiting considerable variety even for substantially similar entities. Fourth, the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This book explores the progress that has been made by the data integration community on the topics of schema alignment, record linkage and data fusion in addressing these novel challenges faced by big data integration. Each of these topics is covered in a systematic way: first starting with a quick tour of the topic in the context of traditional data integration, followed by a detailed, example-driven exposition of recent innovative techniques that have been proposed to address the BDI challenges of volume, velocity, variety, and veracity. Finally, it presents emerging topics and opportunities that are specific to BDI, identifying promising directions for the data integration community.

Computers

Database Repairs and Consistent Query Answering

Book Details:

Author : Leopoldo Bertossi
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018834
Pages : 105 pages

Download or read the book Database Repairs and Consistent Query Answering, written by Leopoldo Bertossi and published by Springer Nature. This book was released on 2022-05-31 with a total of 105 pages. Available in PDF, EPUB and Kindle. Book excerpt: Integrity constraints are semantic conditions that a database should satisfy in order to be an appropriate model of external reality. In practice, and for many reasons, a database may not satisfy those integrity constraints, and for that reason it is said to be inconsistent. However, and most likely, a large portion of the database is still semantically correct, in a sense that has to be made precise. After having provided a formal characterization of consistent data in an inconsistent database, the natural problem emerges of extracting that semantically correct data, as query answers. The consistent data in an inconsistent database is usually characterized as the data that persists across all the database instances that are consistent and minimally differ from the inconsistent instance. Those are the so-called repairs of the database. In particular, the consistent answers to a query posed to the inconsistent database are those answers that can be simultaneously obtained from all the database repairs. As expected, the notion of repair requires an adequate notion of distance that allows for the comparison of databases with respect to how much they differ from the inconsistent instance. On this basis, the minimality condition on repairs can be properly formulated. In this monograph we present and discuss these fundamental concepts, different repair semantics, algorithms for computing consistent answers to queries, and also complexity-theoretic results related to the computation of repairs and to consistent query answering.
Table of Contents: Introduction / The Notions of Repair and Consistent Answer / Tractable CQA and Query Rewriting / Logically Specifying Repairs / Decision Problems in CQA: Complexity and Algorithms / Repairs and Data Cleaning
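The notions of repair and consistent answer described in the excerpt can be illustrated with a small sketch. It is not taken from the book: it assumes one particular setting (a single relation with one key constraint, and repairs obtained by deleting tuples only), and the function names, the `employee` relation, and its rows are hypothetical. Under these assumptions, each repair keeps exactly one tuple per key value, and a consistent answer is one returned by the query on every repair.

```python
from itertools import product


def repairs_under_key(instance, key_index=0):
    """Enumerate tuple-deletion repairs of `instance` for a single key constraint.

    Assumption: the only integrity constraint is that the column at
    `key_index` is a key. A minimal repair then keeps exactly one tuple
    from each group of tuples sharing the same key value.
    """
    groups = {}
    for row in instance:
        groups.setdefault(row[key_index], []).append(row)
    # Sort for a deterministic enumeration order.
    ordered_groups = [sorted(rows) for _, rows in sorted(groups.items())]
    return [frozenset(choice) for choice in product(*ordered_groups)]


def consistent_answers(instance, query, key_index=0):
    """Return the answers that `query` yields on every repair."""
    answer_sets = [set(query(repair))
                   for repair in repairs_under_key(instance, key_index)]
    return set.intersection(*answer_sets)


# Hypothetical inconsistent instance: "page" violates the key on name.
employee = {
    ("page", 5000),
    ("page", 8000),
    ("smith", 3000),
    ("stowe", 7000),
}


def all_rows(db):
    """Query returning every (name, salary) tuple."""
    return set(db)


def names(db):
    """Query projecting on the name column."""
    return {name for name, _salary in db}


if __name__ == "__main__":
    print(len(repairs_under_key(employee)))
    print(sorted(consistent_answers(employee, all_rows)))
    print(sorted(consistent_answers(employee, names)))
```

In this toy instance there are two repairs, one per choice of salary for "page". Neither "page" tuple survives in both repairs, so neither is a consistent answer to the full-tuple query, while the name "page" is a consistent answer to the projection because it appears in every repair. Enumerating repairs explicitly is exponential in the number of conflicting key groups, so this is only a way to see the definition at work, not a practical algorithm.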

Computers

Deep Web Query Interface Understanding and Integration

Book Details:

Author : Eduard C. Dragut
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031018893
Pages : 150 pages

Download or read the book Deep Web Query Interface Understanding and Integration, written by Eduard C. Dragut and published by Springer Nature. This book was released on 2022-05-31 with a total of 150 pages. Available in PDF, EPUB and Kindle. Book excerpt: There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches. This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high quality integrated query interfaces automatically. The following technical issues are discussed in detail in this book: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration. Table of Contents: Introduction / Query Interface Representation and Extraction / Query Interface Clustering and Categorization / Query Interface Matching / Query Interface Attribute Integration / Query Interface Integration / Summary and Future Research