Download or read book P2P Techniques for Decentralized Applications written by Esther Pacitti and published by Springer Nature. This book was released on 2022-06-01 with total page 90 pages. Available in PDF, EPUB and Kindle. Book excerpt: As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. Thus, they are well suited for large-scale data sharing in distributed environments. Most of the existing P2P approaches for data sharing rely on either structured networks (e.g., DHTs) for efficient indexing, or unstructured networks for ease of deployment, or some combination. However, these approaches have some limitations, such as lack of freedom for data placement in DHTs, and high latency and high network traffic in unstructured networks. To address these limitations, gossip protocols which are easy to deploy and scale well, can be exploited. In this book, we will give an overview of these different P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications. Table of Contents: P2P Overlays, Query Routing, and Gossiping / Content Distribution in P2P Systems / Recommendation Systems / Top-k Query Processing in P2P Systems
Download or read book P2P Techniques for Decentralized Applications written by Esther Pacitti and published by Morgan & Claypool Publishers. This book was released on 2012-04-15 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. Thus, they are well suited for large-scale data sharing in distributed environments. Most of the existing P2P approaches for data sharing rely on either structured networks (e.g., DHTs) for efficient indexing, or unstructured networks for ease of deployment, or some combination. However, these approaches have some limitations, such as lack of freedom for data placement in DHTs, and high latency and high network traffic in unstructured networks. To address these limitations, gossip protocols which are easy to deploy and scale well, can be exploited. In this book, we will give an overview of these different P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications. Table of Contents: P2P Overlays, Query Routing, and Gossiping / Content Distribution in P2P Systems / Recommendation Systems / Top-k Query Processing in P2P Systems
Download or read book Peer to Peer Computing written by Quang Hieu Vu and published by Springer Science & Business Media. This book was released on 2009-10-20 with total page 330 pages. Available in PDF, EPUB and Kindle. Book excerpt: Peer-to-peer (P2P) technology, or peer computing, is a paradigm that is viewed as a potential technology for redesigning distributed architectures and, consequently, distributed processing. Yet the scale and dynamism that characterize P2P systems demand that we reexamine traditional distributed technologies. A paradigm shift that includes self-reorganization, adaptation and resilience is called for. On the other hand, the increased computational power of such networks opens up completely new applications, such as in digital content sharing, scientific computation, gaming, or collaborative work environments. In this book, Vu, Lupu and Ooi present the technical challenges offered by P2P systems, and the means that have been proposed to address them. They provide a thorough and comprehensive review of recent advances on routing and discovery methods; load balancing and replication techniques; security, accountability and anonymity, as well as trust and reputation schemes; programming models and P2P systems and projects. Besides surveying existing methods and systems, they also compare and evaluate some of the more promising schemes. The need for such a book is evident. It provides a single source for practitioners, researchers and students on the state of the art. For practitioners, this book explains best practice, guiding selection of appropriate techniques for each application. For researchers, this book provides a foundation for the development of new and more effective methods. For students, it is an overview of the wide range of advanced techniques for realizing effective P2P systems, and it can easily be used as a text for an advanced course on Peer-to-Peer Computing and Technologies, or as a companion text for courses on various subjects, such as distributed systems, and grid and cluster computing.
Download or read book Data Processing on FPGAs written by Jens Teubner and published by Springer Nature. This book was released on 2022-05-31 with total page 104 pages. Available in PDF, EPUB and Kindle. Book excerpt: Roughly a decade ago, power consumption and heat dissipation concerns forced the semiconductor industry to radically change its course, shifting from sequential to parallel computing. Unfortunately, improving performance of applications has now become much more difficult than in the good old days of frequency scaling. This is also affecting databases and data processing applications in general, and has led to the popularity of so-called data appliances—specialized data processing engines, where software and hardware are sold together in a closed box. Field-programmable gate arrays (FPGAs) increasingly play an important role in such systems. FPGAs are attractive because the performance gains of specialized hardware can be significant, while power consumption is much less than that of commodity processors. On the other hand, FPGAs are way more flexible than hard-wired circuits (ASICs) and can be integrated into complex systems in many different ways, e.g., directly in the network for a high-frequency trading application. This book gives an introduction to FPGA technology targeted at a database audience. In the first few chapters, we explain in detail the inner workings of FPGAs. Then we discuss techniques and design patterns that help mapping algorithms to FPGA hardware so that the inherent parallelism of these devices can be leveraged in an optimal way. Finally, the book will illustrate a number of concrete examples that exploit different advantages of FPGAs for data processing. Table of Contents: Preface / Introduction / A Primer in Hardware Design / FPGAs / FPGA Programming Models / Data Stream Processing / Accelerated DB Operators / Secure Data Processing / Conclusions / Bibliography / Authors' Biographies / Index
Download or read book Transaction Processing on Modern Hardware written by Mohammad Sadoghi and published by Springer Nature. This book was released on 2022-05-31 with total page 122 pages. Available in PDF, EPUB and Kindle. Book excerpt: The last decade has brought groundbreaking developments in transaction processing. This resurgence of an otherwise mature research area has spurred from the diminishing cost per GB of DRAM that allows many transaction processing workloads to be entirely memory-resident. This shift demanded a pause to fundamentally rethink the architecture of database systems. The data storage lexicon has now expanded beyond spinning disks and RAID levels to include the cache hierarchy, memory consistency models, cache coherence and write invalidation costs, NUMA regions, and coherence domains. New memory technologies promise fast non-volatile storage and expose unchartered trade-offs for transactional durability, such as exploiting byte-addressable hot and cold storage through persistent programming that promotes simpler recovery protocols. In the meantime, the plateauing single-threaded processor performance has brought massive concurrency within a single node, first in the form of multi-core, and now with many-core and heterogeneous processors. The exciting possibility to reshape the storage, transaction, logging, and recovery layers of next-generation systems on emerging hardware have prompted the database research community to vigorously debate the trade-offs between specialized kernels that narrowly focus on transaction processing performance vs. designs that permit transactionally consistent data accesses from decision support and analytical workloads. In this book, we aim to classify and distill the new body of work on transaction processing that has surfaced in the last decade to navigate researchers and practitioners through this intricate research subject.
Download or read book Semantics Empowered Web 3 0 written by Amit Sheth and published by Springer Nature. This book was released on 2022-05-31 with total page 159 pages. Available in PDF, EPUB and Kindle. Book excerpt: After the traditional document-centric Web 1.0 and user-generated content focused Web 2.0, Web 3.0 has become a repository of an ever growing variety of Web resources that include data and services associated with enterprises, social networks, sensors, cloud, as well as mobile and other devices that constitute the Internet of Things. These pose unprecedented challenges in terms of heterogeneity (variety), scale (volume), and continuous changes (velocity), as well as present corresponding opportunities if they can be exploited. Just as semantics has played a critical role in dealing with data heterogeneity in the past to provide interoperability and integration, it is playing an even more critical role in dealing with the challenges and helping users and applications exploit all forms of Web 3.0 data. This book presents a unified approach to harness and exploit all forms of contemporary Web resources using the core principles of ability to associate meaning with data through conceptual or domain models and semantic descriptions including annotations, and through advanced semantic techniques for search, integration, and analysis. It discusses the use of Semantic Web standards and techniques when appropriate, but also advocates the use of lighter weight, easier to use, and more scalable options when they are more suitable. The authors' extensive experience spanning research and prototypes to development of operational applications and commercial technologies and products guide the treatment of the material. Table of Contents: Role of Semantics and Metadata / Types and Models of Semantics / Annotation -- Adding Semantics to Data / Semantics for Enterprise Data / Semantics for Services / Semantics for Sensor Data / Semantics for Social Data / Semantics for Cloud Computing / Semantics for Advanced Applications
Download or read book Answering Queries Using Views written by Foto Afrati and published by Springer Nature. This book was released on 2022-11-10 with total page 229 pages. Available in PDF, EPUB and Kindle. Book excerpt: The topic of using views to answer queries has been popular for a few decades now, as it cuts across domains such as query optimization, information integration, data warehousing, website design, and, recently, database-as-a-service and data placement in cloud systems. This book assembles foundational work on answering queries using views in a self-contained manner, with an effort to choose material that constitutes the backbone of the research. It presents efficient algorithms and covers the following problems: query containment; rewriting queries using views in various logical languages; equivalent rewritings and maximally contained rewritings; and computing certain answers in the data-integration and data-exchange settings. Query languages that are considered are fragments of SQL, in particular, select-project-join queries, also called conjunctive queries (with or without arithmetic comparisons or negation), and aggregate SQL queries.
Download or read book Data Cleaning written by Venkatesh Ganti and published by Springer Nature. This book was released on 2022-05-31 with total page 69 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
Download or read book Big Data Integration written by Xin Luna Dong and published by Springer Nature. This book was released on 2022-05-31 with total page 178 pages. Available in PDF, EPUB and Kindle. Book excerpt: The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources are very dynamic, and the number of data sources is also rapidly exploding. Third, data sources are extremely heterogeneous in their structure and content, exhibiting considerable variety even for substantially similar entities. Fourth, the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This book explores the progress that has been made by the data integration community on the topics of schema alignment, record linkage and data fusion in addressing these novel challenges faced by big data integration. Each of these topics is covered in a systematic way: first starting with a quick tour of the topic in the context of traditional data integration, followed by a detailed, example-driven exposition of recent innovative techniques that have been proposed to address the BDI challenges of volume, velocity, variety, and veracity. Finally, it presents merging topics and opportunities that are specific to BDI, identifying promising directions for the data integration community.
Download or read book Querying Graphs written by Angela Bonifati and published by Springer Nature. This book was released on 2022-06-01 with total page 166 pages. Available in PDF, EPUB and Kindle. Book excerpt: Graph data modeling and querying arises in many practical application domains such as social and biological networks where the primary focus is on concepts and their relationships and the rich patterns in these complex webs of interconnectivity. In this book, we present a concise unified view on the basic challenges which arise over the complete life cycle of formulating and processing queries on graph databases. To that purpose, we present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model—the pre-dominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook for future developments. Our presentation is self-contained, covering the relevant topics from: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.
Download or read book Information and Influence Propagation in Social Networks written by Wei Chen and published by Springer Nature. This book was released on 2022-05-31 with total page 161 pages. Available in PDF, EPUB and Kindle. Book excerpt: Research on social networks has exploded over the last decade. To a large extent, this has been fueled by the spectacular growth of social media and online social networking sites, which continue growing at a very fast pace, as well as by the increasing availability of very large social network datasets for purposes of research. A rich body of this research has been devoted to the analysis of the propagation of information, influence, innovations, infections, practices and customs through networks. Can we build models to explain the way these propagations occur? How can we validate our models against any available real datasets consisting of a social network and propagation traces that occurred in the past? These are just some questions studied by researchers in this area. Information propagation models find applications in viral marketing, outbreak detection, finding key blog posts to read in order to catch important stories, finding leaders or trendsetters, information feed ranking, etc. A number of algorithmic problems arising in these applications have been abstracted and studied extensively by researchers under the garb of influence maximization. This book starts with a detailed description of well-established diffusion models, including the independent cascade model and the linear threshold model, that have been successful at explaining propagation phenomena. We describe their properties as well as numerous extensions to them, introducing aspects such as competition, budget, and time-criticality, among many others. We delve deep into the key problem of influence maximization, which selects key individuals to activate in order to influence a large fraction of a network. Influence maximization in classic diffusion models including both the independent cascade and the linear threshold models is computationally intractable, more precisely #P-hard, and we describe several approximation algorithms and scalable heuristics that have been proposed in the literature. Finally, we also deal with key issues that need to be tackled in order to turn this research into practice, such as learning the strength with which individuals in a network influence each other, as well as the practical aspects of this research including the availability of datasets and software tools for facilitating research. We conclude with a discussion of various research problems that remain open, both from a technical perspective and from the viewpoint of transferring the results of research into industry strength applications.
Download or read book Veracity of Data written by Laure Berti-Équille and published by Springer Nature. This book was released on 2022-05-31 with total page 141 pages. Available in PDF, EPUB and Kindle. Book excerpt: On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, multimedia-sharing platforms, etc.). Conflicting information, rumors, erroneous and fake content can be easily spread across multiple sources, making it hard to distinguish between what is true and what is not. This book gives an overview of fundamental issues and recent contributions for ascertaining the veracity of data in the era of Big Data. The text is organized into six chapters, focusing on structured data extracted from texts. Chapter 1 introduces the problem of ascertaining the veracity of data in a multi-source and evolving context. Issues related to information extraction are presented in Chapter 2. Current truth discovery computation algorithms are presented in details in Chapter 3. It is followed by practical techniques for evaluating data source reputation and authoritativeness in Chapter 4. The theoretical foundations and various approaches for modeling diffusion phenomenon of misinformation spreading in networked systems are studied in Chapter 5. Finally, truth discovery computation from extracted data in a dynamic context of misinformation propagation raises interesting challenges that are explored in Chapter 6. This text is intended for a seminar course at the graduate level. It is also to serve as a useful resource for researchers and practitioners who are interested in the study of fact-checking, truth discovery, or rumor spreading.
Download or read book Data Intensive Workflow Management written by Daniel Oliveira and published by Springer Nature. This book was released on 2022-06-01 with total page 161 pages. Available in PDF, EPUB and Kindle. Book excerpt: Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually present a considerable number of activities and activations (i.e., tasks associated with activities) and may need a long time for execution. Due to the continuous need to store and process data efficiently (making them data-intensive workflows), high-performance computing environments allied to parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large-commodity computing clusters connected using high-speed communications switches and networks. The main advantage of DISC frameworks is that they support and grant efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, the execution of workflows in cloud and DISC environments raise many challenges such as scheduling workflow activities and activations, managing produced data, collecting provenance data, etc. Several existing approaches deal with the challenges mentioned earlier. This way, there is a real need for understanding how to manage these workflows and various big data platforms that have been developed and introduced. As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data. In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in a single site and multi-site clouds taking advantage of provenance. Afterward, we go towards workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of the workflow using frameworks such as Apache Spark and its extensions.
Download or read book Community Search over Big Graphs written by Xin Huang and published by Springer Nature. This book was released on 2022-05-31 with total page 188 pages. Available in PDF, EPUB and Kindle. Book excerpt: Communities serve as basic structural building blocks for understanding the organization of many real-world networks, including social, biological, collaboration, and communication networks. Recently, community search over graphs has attracted significantly increasing attention, from small, simple, and static graphs to big, evolving, attributed, and location-based graphs. In this book, we first review the basic concepts of networks, communities, and various kinds of dense subgraph models. We then survey the state of the art in community search techniques on various kinds of networks across different application areas. Specifically, we discuss cohesive community search, attributed community search, social circle discovery, and geo-social group search. We highlight the challenges posed by different community search problems. We present their motivations, principles, methodologies, algorithms, and applications, and provide a comprehensive comparison of the existing techniques. This book finally concludes by listing publicly available real-world datasets and useful tools for facilitating further research, and by offering further readings and future directions of research in this important and growing area.
Download or read book Similarity Joins in Relational Database Systems written by Nikolaus Augsten and published by Springer Nature. This book was released on 2022-05-31 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs, for which the similarity is computed, are very different from each other. Filters exploit this property to improve the performance of similarity joins. A filter preprocesses the input data sets and produces a set of candidate pairs. The distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
Download or read book Skylines and Other Dominance Based Queries written by Apostolos N. Papadopoulos and published by Springer Nature. This book was released on 2022-06-01 with total page 134 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book is a gentle introduction to dominance-based query processing techniques and their applications. The book aims to present fundamental as well as some advanced issues in the area in a precise, but easy-to-follow, manner. Dominance is an intuitive concept that can be used in many different ways in diverse application domains. The concept of dominance is based on the values of the attributes of each object. An object dominates another object if is better than . This goodness criterion may differ from one user to another. However, all decisions boil down to the minimization or maximization of attribute values. In this book, we will explore algorithms and applications related to dominance-based query processing. The concept of dominance has a long history in finance and multi-criteria optimization. However, the introduction of the concept to the database community in 2001 inspired many researchers to contribute to the area. Therefore, many algorithmic techniques have been proposed for the efficient processing of dominance-based queries, such as skyline queries, -dominant queries, and top- dominating queries, just to name a few.
Download or read book Perspectives on Business Intelligence written by Raymond T. Ng and published by Springer Nature. This book was released on 2022-05-31 with total page 151 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the 1980s, traditional Business Intelligence (BI) systems focused on the delivery of reports that describe the state of business activities in the past, such as for questions like "How did our sales perform during the last quarter?" A decade later, there was a shift to more interactive content that presented how the business was performing at the present time, answering questions like "How are we doing right now?" Today the focus of BI users are looking into the future. "Given what I did before and how I am currently doing this quarter, how will I do next quarter?" Furthermore, fuelled by the demands of Big Data, BI systems are going through a time of incredible change. Predictive analytics, high volume data, unstructured data, social data, mobile, consumable analytics, and data visualization are all examples of demands and capabilities that have become critical within just the past few years, and are growing at an unprecedented pace. This book introduces research problems and solutions on various aspects central to next-generation BI systems. It begins with a chapter on an industry perspective on how BI has evolved, and discusses how game-changing trends have drastically reshaped the landscape of BI. One of the game changers is the shift toward the consumerization of BI tools. As a result, for BI tools to be successfully used by business users (rather than IT departments), the tools need a business model, rather than a data model. One chapter of the book surveys four different types of business modeling. However, even with the existence of a business model for users to express queries, the data that can meet the needs are still captured within a data model. The next chapter on vivification addresses the problem of closing the gap, which is often significant, between the business and the data models. Moreover, Big Data forces BI systems to integrate and consolidate multiple, and often wildly different, data sources. One chapter gives an overview of several integration architectures for dealing with the challenges that need to be overcome. While the book so far focuses on the usual structured relational data, the remaining chapters turn to unstructured data, an ever-increasing and important component of Big Data. One chapter on information extraction describes methods for dealing with the extraction of relations from free text and the web. Finally, BI users need tools to visualize and interpret new and complex types of information in a way that is compelling, intuitive, but accurate. The last chapter gives an overview of information visualization for decision support and text.