EBookClubs

Read Books & Download eBooks Full Online

Book Using Peer Support to Reduce Fault tolerant Overhead in Distributed Shared Memories

Download or read book Using Peer Support to Reduce Fault tolerant Overhead in Distributed Shared Memories written by G. C. Hunt and published by . This book was released on 1996 with total page 14 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Proceedings of the Fifth International Symposium on Assessment of Software Tools

Download or read book Proceedings of the Fifth International Symposium on Assessment of Software Tools written by Ez Nahouraii and published by Institute of Electrical & Electronics Engineers (IEEE). This book was released on 1997 with total page 172 pages. Available in PDF, EPUB and Kindle. Book excerpt: Focuses on component-based software engineering, which emphasizes the construction of application systems from a combination of existing software assets and specifically-produced items. The 19 selected papers explore such aspects as the model-integrated development of complex applications, a compiler for composition, a procurement-centric model, metrics and risk, experimenting in industry settings, components and objects, managing risk, and commercial off-the-shelf and open systems. No subject index. Annotation copyrighted by Book News, Inc., Portland, OR.

Book Fault tolerant Distributed Shared Memories

Download or read book Fault tolerant Distributed Shared Memories written by Larry Brown and published by . This book was released on 1993 with total page 474 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Fault Tolerance in Distributed Shared Memory

Download or read book Fault Tolerance in Distributed Shared Memory written by Samir Muranjan and published by . This book was released on 1997 with total page 104 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book A Fault tolerant Coherence Protocol for Distributed Shared Memory Systems

Download or read book A Fault tolerant Coherence Protocol for Distributed Shared Memory Systems written by Pallavi K. Ramam and published by . This book was released on 1998 with total page 436 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Fault Tolerance for Main memory Applications in the Cloud

Download or read book Fault Tolerance for Main memory Applications in the Cloud written by Tuan Anh Cao and published by . This book was released on 2013 with total page 312 pages. Available in PDF, EPUB and Kindle. Book excerpt: Advances in hardware have enabled many long-running applications to execute entirely in main memory. With the emergence of cloud computing, thousands of machines could be made available to deploy such applications with lowered operational and maintenance costs. While achieving substantially better performance, these applications have encountered new challenges in achieving fault tolerance; i.e., to ensure durability in the event of a crash. In addition, many of these applications, such as massively multiplayer online games, main-memory OLTP systems, main-memory search engine and deterministic transaction processing systems, must sustain extremely high update rates - often hundreds of thousands of updates per second. They also demand extremely high throughput (e.g. scientific simulation) or low latency (e.g. massively multiplayer online games). To support these demanding requirements, these applications have increasingly turned to database techniques. In this dissertation, we propose an approach to provide fault tolerance for main-memory applications without introducing excessive overhead or latency spikes. First, we evaluate the applicability of existing checkpoint recovery techniques developed for main-memory DBMS. We use massively multiplayer online games (MMOs) as our motivating example. In particular, we show how to adapt consistent checkpointing techniques developed for main-memory databases to MMOs. Furthermore, we provide a thorough simulation model and evaluation of six recovery strategies. Based on our results, we argue that not all state-of-the-art checkpoint recovery techniques are equally suited for low-latency and high-throughput applications such as MMOs. 
These algorithms either use locks or large synchronous copy operations, which hurt throughput and latency, respectively. Next, we take advantage of frequent points of consistency in many of these applications to develop novel checkpoint recovery algorithms that trade additional space in main memory for significantly lower overhead and latency. Compared to previous work, our new algorithms do not require any locking or bulk copies of the application state. Our experimental evaluation shows that one of our new algorithms attains nearly constant latency and reduces overhead by more than an order of magnitude for low to medium update rates. Additionally, in a heavily loaded main-memory transaction processing system, it still reduces overhead by more than a factor of two. Finally, we present BRRL, a library for making distributed main-memory applications fault tolerant. BRRL is optimized for cloud applications with frequent points of consistency that use data-parallelism to avoid complex concurrency control mechanisms. BRRL differs from existing recovery libraries by providing a simple table abstraction and using schema information to optimize checkpointing.
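The idea of trading extra memory for checkpoints without locks or bulk copies can be illustrated with a small sketch. This is not the dissertation's algorithm and not the BRRL API; the class `ShadowCheckpointer` and all of its method names are hypothetical, and the sketch is single-threaded. It assumes that the flush of a frozen update log finishes before the next point of consistency.

```python
class ShadowCheckpointer:
    """Hypothetical sketch: two update logs replace a bulk copy of the state."""

    def __init__(self, size):
        self.app = [0] * size      # live application state
        self.logs = [{}, {}]       # extra memory: two per-period update logs
        self.current = 0           # index of the log that receives new writes
        self.durable = [0] * size  # stands in for the checkpoint on stable storage

    def update(self, index, value):
        # Each write goes to the live state and to the current log.
        self.app[index] = value
        self.logs[self.current][index] = value

    def point_of_consistency(self):
        # Freeze the current log and redirect new writes to the other one.
        frozen = self.current
        self.current = 1 - self.current
        return frozen

    def flush(self, frozen):
        # Persist only the entries touched in the frozen period.
        for index, value in self.logs[frozen].items():
            self.durable[index] = value
        self.logs[frozen].clear()

    def recover(self):
        # After a crash, restart from the last flushed consistent state.
        self.app = list(self.durable)
        self.logs = [{}, {}]
        self.current = 0


state = ShadowCheckpointer(4)
state.update(0, 10)
state.update(1, 20)
frozen = state.point_of_consistency()
state.update(0, 99)   # arrives while the checkpoint is being written
state.flush(frozen)
state.recover()       # simulated crash: the later update to cell 0 is lost
```

After the simulated crash, the state is the one captured at the point of consistency, and the update that arrived during the flush is discarded.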

Book Fault Tolerance Methods of Rollback Recovery

Download or read book Fault Tolerance Methods of Rollback Recovery written by Stanford University. Computer Systems Laboratory and published by . This book was released on 1997 with total page 57 pages. Available in PDF, EPUB and Kindle. Book excerpt: This paper describes the latest methods of rollback recovery for fault-tolerant distributed shared memory (DSM) multiprocessors. This report discusses (1) the theoretical issues that rollback recovery addresses, (2) the 3 major classes of methods for recovery, and (3) the relative merits of each class.

Book A Shared bus Shared memory Distributed Processing System for Fault Tolerance Laboratory Experimentation

Download or read book A Shared bus Shared memory Distributed Processing System for Fault Tolerance Laboratory Experimentation written by Bryan Chris Rickertsen and published by . This book was released on 1979 with total page 158 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Synchronization and Fault Tolerance Techniques in Concurrent Shared Memory Systems

Download or read book Synchronization and Fault Tolerance Techniques in Concurrent Shared Memory Systems written by Sahil Dhoked and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Mutual exclusion is one of the most commonly used techniques to handle contention in concurrent systems. Traditionally, mutual exclusion algorithms have been designed under the assumption that a process does not fail while acquiring/releasing a lock or while executing its critical section. However, failures do occur in real life, potentially leaving the lock in an inconsistent state. This gives rise to the problem of recoverable mutual exclusion (RME) that involves designing a mutual exclusion (ME) algorithm that can tolerate failures, while maintaining safety and liveness properties. With the recent development of NVRAM (non-volatile random-access memory) technologies, there is renewed interest in the RME problem. The NVRAM technology is a combination of the low latency of traditional random-access memory with the high persistence of disk storage media. NVRAMs can be used to provide near-instantaneous recovery to many problems including the RME problem. This work describes techniques for designing efficient algorithms to solve the RME problem under two different failure models, independent failure model and system-wide failure model, depending on whether processes fail independently or simultaneously. Additionally, especially for systems with low memory capacity, this work describes fault-tolerant techniques for reclaiming memory, in case there is no built-in support for garbage collection. The primary measure of an RME algorithm is its performance. Performance of any ME algorithm, including an RME algorithm, is measured by the number of remote memory references (RMRs) made by a process—for acquiring and releasing a lock as well as recovering the lock structure after a failure. 
Loosely speaking, it represents the number of expensive shared memory instructions. In this work, two models of RMR computation are considered: (a) the CC model, and (b) the DSM model. The results mentioned in this work are applicable to both of these computation models. For the independent failure model, this work presents a framework that transforms any algorithm that solves the RME problem into an algorithm whose performance (in terms of RMRs) can simultaneously adapt to (a) the number of processes competing for the lock, as well as (b) the number of failures that have occurred in the recent past, while maintaining the correctness and performance properties of the underlying RME algorithm. Assume that, for n processes, the RMR complexity of the underlying RME algorithm is R(n). Then, this framework yields an RME algorithm for which the RMR complexity is given by $O(\min\{\ddot{c}, \sqrt{F + 1}, R(n)\})$, where $\ddot{c}$ denotes the point contention (number of active processes) and F denotes the number of failures in the recent past. The system-wide failure model is a special case of the independent failure model that assumes that failures only occur simultaneously. For example, a power outage is a real life example of such a failure. This model makes a stronger assumption than just multiple independent failures. This assumption is leveraged with enhanced RME algorithms presented under this model. For the system-wide failure model, this work presents optimal RME algorithms (and related transformations) whose worst-case performance yields an O(1) RMR complexity. The fault-tolerant memory reclamation algorithm provides novel techniques to bound the worst-case space complexity of RME algorithms. The techniques used are general enough that they may also be employed to bound the space complexity of other RME algorithms. Its RMR complexity is merely an additive factor of O(1).
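To make the adaptive bound concrete, the short sketch below evaluates the expression inside the O(...) for given inputs. The function name `adaptive_rmr_bound` is hypothetical, constant factors are ignored, and the sketch assumes that the square root covers F + 1, which the excerpt's typesetting leaves ambiguous.

```python
import math


def adaptive_rmr_bound(point_contention, recent_failures, base_rmr):
    """Evaluate min{c, sqrt(F + 1), R(n)}, ignoring constant factors.

    point_contention: number of active processes (c in the excerpt)
    recent_failures:  number of failures in the recent past (F)
    base_rmr:         RMR complexity R(n) of the underlying RME algorithm
    """
    return min(point_contention, math.sqrt(recent_failures + 1), base_rmr)


# With 8 active processes, 3 recent failures and R(n) = 5,
# the failure term sqrt(3 + 1) = 2 is the smallest of the three.
print(adaptive_rmr_bound(8, 3, 5.0))  # prints 2.0
```

With no recent failures the failure term is 1, so the bound collapses to a constant whatever the contention or R(n).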

Book The Consensus Power of Shared memory Distributed Systems

Download or read book The Consensus Power of Shared memory Distributed Systems written by Eric Ruppert and published by . This book was released on 2000 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: In many asynchronous distributed systems, processes communicate by accessing objects in a shared memory. The ability of systems to solve problems in a fault-tolerant manner depends on the types of objects provided. Here, the wait-free model of fault-tolerance is used: non-faulty processes must run correctly even if other processes experience halting failures. The consensus problem, where processes begin with private inputs and must agree on one of them, has played a central role in analysing the power of distributed systems. This thesis studies the ability of different types of objects to solve consensus. An object type has consensus number 'n' if it can be used (with read/write registers) to solve consensus among 'n' processes but not among 'n'+1 processes. Conditions are given that are necessary and sufficient for an object type to have consensus number 'n'. This characterization applies to two large classes of objects: readable objects and read-modify-write (RMW) objects. An object is readable if processes can read its state without changing the state. For a RMW object, all operations update the state and then return the previous state of the object. When the type is of bounded size, the characterization may be used to decide the question "Does the type 'T' have consensus number 'n'?", which is undecidable for arbitrary types. The characterization is also used to show that different readable and RMW types with consensus number 'n' cannot be used in combination to solve consensus for 'n'+1 processes. Ordinarily, processes may access only one object in shared memory at a time. 
This thesis also studies how much the consensus number of a type increases in the multi-object and transactional models, where processes can perform operations on up to 'm' of the objects in a single atomic action. These models are much more convenient for programmers to use, since they guarantee that certain blocks of operations will be executed without interruptions from other processes. This thesis establishes bounds on the consensus numbers of multi-objects and transactional objects as a function of 'm' and the consensus numbers of the corresponding single-access types.
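As an illustration of how an RMW object together with read/write registers can solve consensus, here is a minimal sketch of the classic two-process protocol built on a test-and-set object. It is a textbook construction rather than one taken from the thesis. The class names are hypothetical, and a `threading.Lock` only simulates the atomicity that hardware would provide for the RMW operation.

```python
import threading


class TestAndSet:
    """RMW object: sets the flag and returns its previous value atomically."""

    def __init__(self):
        self._guard = threading.Lock()  # simulates hardware atomicity
        self._flag = False

    def test_and_set(self):
        with self._guard:
            previous = self._flag
            self._flag = True
            return previous


class TwoProcessConsensus:
    """Consensus for process ids 0 and 1 using one test-and-set object."""

    def __init__(self):
        self._tas = TestAndSet()
        self._proposals = [None, None]  # one read/write register per process

    def decide(self, pid, value):
        # Announce the proposal before competing, so the loser can read it.
        self._proposals[pid] = value
        if not self._tas.test_and_set():
            return value                  # first to set the flag: own value wins
        return self._proposals[1 - pid]   # otherwise adopt the winner's value


consensus = TwoProcessConsensus()
decisions = [None, None]


def run(pid, value):
    decisions[pid] = consensus.decide(pid, value)


threads = [threading.Thread(target=run, args=(pid, value))
           for pid, value in enumerate(["a", "b"])]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both entries of decisions now hold the same proposed value.
```

The winner writes its proposal before calling `test_and_set`, so the value is already in the register when the loser reads it. The same trick fails for three processes, because a loser cannot tell which of the other two won; that is why test-and-set is known to have consensus number 2.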

Book On the Relative Power of Shared Objects in Fault tolerant Distributed Systems

Download or read book On the Relative Power of Shared Objects in Fault tolerant Distributed Systems written by Wai Kau Lo and published by . This book was released on 1997 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: A fundamental question in distributed computing is to determine whether a given set of "base" shared object types can be used to implement a new type. In this thesis we study this problem in a fault-tolerant setting, where implementations must work even if some of the processes that share the objects may crash. An implementation is t-resilient, if it tolerates the crash of t processes; it is wait-free, if it is $(n - 1)$-resilient, where n is the number of processes. This thesis makes two contributions. The first concerns the classification of shared object types according to their ability to support wait-free implementations. A wait-free hierarchy assigns object types to levels in $\{1,2,\ldots\}$ such that, using only objects of any type assigned to level n, in conjunction with registers, we can implement an object of any type in a wait-free manner in a system of n processes. Such a hierarchy is robust if, in a system of n processes, it is not possible to implement objects of types at level n in a wait-free manner, using any number and combination of objects of types that are below level n. We show that, if nondeterministic types are allowed, then the only robust wait-free hierarchy is the trivial one, which lumps all types into level one. One important and useful object type is consensus, because consensus objects and registers alone can be used to implement objects of any type. The second contribution of the thesis concerns the ability of object types to support one-resilient implementations of the type consensus. 
Specifically, we study the relationship between the one-resilient implementability of consensus objects for n processes and that for $n - 1$ processes, for every $n \ge 3.$ On the one hand, the following is shown for n = 3: there exists a deterministic type that can be used to implement a one-resilient consensus object for three, but not two, processes. On the other hand, for every $n \ge 4$, we show that given any set ${\cal B}$ of object types, there is a one-resilient implementation of a consensus object for n processes using ${\cal B}$ if and only if there is a one-resilient implementation of a consensus object for $n - 1$ processes using ${\cal B}.$