Optimizing Memory Systems for High Efficiency in Computing Clusters

Book Details:

Author : Wenjie Liu
Publisher :
Release : 2022
ISBN :
Pages : 0 pages

Download or read book Optimizing Memory Systems for High Efficiency in Computing Clusters written by Wenjie Liu. This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: DRAM-based memory systems suffer from increasingly severe row buffer interference, which causes significant performance degradation and power consumption. With DRAM scaling, the overheads of row buffer interference become even worse due to higher row activation and precharge latency. Clusters have been a prevalent and successful computing framework for processing large amounts of data due to their distributed and parallelized working paradigm. A task submitted to a cluster is typically divided into a number of subtasks, which are assigned to different work nodes that run the same code but process different, equally sized portions of the dataset. Because the nodes are heterogeneous, they finish their subtasks at different rates, and stragglers can easily slow down the entire job. With increasing problem complexity, more irregular applications are deployed on high-performance clusters due to the parallel working paradigm, and they yield irregular memory access behaviors across nodes. However, the irregularity of memory access behaviors has not been comprehensively studied, which results in low utilization of the integrated hybrid memory system composed of stacked DRAM and off-chip DRAM. This dissertation presents our research results on the three challenges above in order to optimize the memory system for high efficiency in computing clusters. Details are as follows: To address low row buffer utilization caused by row buffer interference, we propose the Row Buffer Cache (RBC) architecture to efficiently mitigate row buffer interference overheads.
At the core of the RBC architecture, DRAM pages with good locality are cached and thereby escape row buffer interference. Such an RBC architecture significantly reduces the overheads caused by row activation and precharge, and thus improves overall system performance and energy efficiency. We evaluate RBC using SPEC CPU2006 on a DDR4 memory, comparing it against the commodity baseline memory system and the state-of-the-art methods DICE and Bingo. Results show that RBC improves memory performance by up to 2.24X (16.1% on average) and reduces overall memory energy by up to 68.2% (23.6% on average) in single-core simulations. In multi-core simulations, RBC increases performance by up to 1.55X (16.7% on average) and reduces energy by up to 35.4% (21.3% on average). Compared with the state-of-the-art methods, RBC outperforms DICE and Bingo by 8% and 5.1% on average in the single-core scenario, and by 10.1% and 4.7% in the multi-core scenario. To relax the straggling effect observed in clusters, we aim to speed up straggling work nodes, and thereby the overall processing, by leveraging the exhibited performance variation. We propose StragglerHelper, which conveys the memory access characteristics experienced by the forerunner to the stragglers so that the stragglers can be sped up through accurately informed memory prefetching. A Progress Monitor is deployed to supervise the progress of each work node and to relay the memory access patterns of the forerunner to straggling nodes. Our evaluation with SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes shows that StragglerHelper improves the execution time of stragglers by up to 99.5% (61.4% on average), contributing to an overall improvement for the entire cluster of up to 46.7% (9.9% on average) compared to the baseline cluster.
To address the performance difference in irregular applications, we devise a novel method called Similarity-Managed Hybrid Memory System (SM-HMS), which improves hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify the memory access similarity, the memory access behaviors of each node are vectorized, and the distance between two vectors is used as the memory access similarity. The calculated similarity is used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, the sliding window section and the outlier section. The shared memory access behaviors guide replacement in the sliding window section, while the outlier section is managed with an LRU policy. Our evaluation with a set of irregular applications on various clusters of up to 256 nodes shows that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon, and Hybrid2 in job finish time reduction by up to 58.6%, 56.7%, and 31.3% (46.1%, 41.6%, and 19.3% on average), respectively. SM-HMS also achieves up to 98.6% (91.9% on average) of the ideal hybrid memory system performance.
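
The similarity measure described in the excerpt (vectorize each node's memory access behavior, then use the distance between two vectors) can be illustrated with a minimal Python sketch. The feature choice (a normalized per-region access histogram), the Euclidean distance, the toy traces, and every function name below are assumptions made for illustration; the excerpt does not say how SM-HMS builds its vectors or which distance it uses.

```python
import math
from collections import Counter

def access_vector(addresses, region_size, num_regions):
    """Normalized per-region access histogram for one node's address trace.
    Hypothetical feature choice, not taken from the dissertation."""
    counts = Counter((addr // region_size) % num_regions for addr in addresses)
    total = len(addresses) or 1
    return [counts.get(r, 0) / total for r in range(num_regions)]

def distance(u, v):
    """Euclidean distance between two access vectors; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy traces for three nodes (made-up addresses).
node_a = [0, 64, 128, 4096, 4160, 0, 64]
node_b = [0, 64, 128, 4096, 4160, 64, 0]   # same regions as node_a, different order
node_c = [8192, 8256, 12288, 12352, 8192, 8256, 12288]

vec_a = access_vector(node_a, region_size=4096, num_regions=4)
vec_b = access_vector(node_b, region_size=4096, num_regions=4)
vec_c = access_vector(node_c, region_size=4096, num_regions=4)

d_ab = distance(vec_a, vec_b)  # nodes touching the same regions equally often
d_ac = distance(vec_a, vec_c)  # nodes touching disjoint regions
print(f"d(a, b) = {d_ab:.3f}, d(a, c) = {d_ac:.3f}")
```

In this toy setup, nodes A and B end up with identical vectors and node C is far from both, so a scheme of this kind would share access behavior between A and B but not with C.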

Computers

High Performance Memory Systems

Book Details:

Author : Haldun Hadimioglu
Publisher : Springer Science & Business Media
Release : 2011-06-27
ISBN : 1441989870
Pages : 298 pages

Download or read book High Performance Memory Systems written by Haldun Hadimioglu and published by Springer Science & Business Media. This book was released on 2011-06-27 with total page 298 pages. Available in PDF, EPUB and Kindle. Book excerpt: The State of Memory Technology: Over the past decade there has been rapid growth in the speed of microprocessors. CPU speeds are approximately doubling every eighteen months, while main memory speed doubles about every ten years. The International Technology Roadmap for Semiconductors (ITRS) study suggests that memory will remain on its current growth path. The ITRS short- and long-term targets indicate continued scaling improvements at about the current rate by 2016. This translates to bit densities increasing at two times every two years until the introduction of 8 gigabit dynamic random access memory (DRAM) chips, after which densities will increase four times every five years. A similar growth pattern is forecast for other high-density chip areas and high-performance logic (e.g., microprocessors and application specific integrated circuits (ASICs)). In the future, molecular devices, 64 gigabit DRAMs and 28 GHz clock signals are targeted. Although densities continue to grow, we still do not see significant advances that will improve memory speed. These trends have created a problem that has been labeled the Memory Wall or Memory Gap.
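
To make the two doubling rates quoted in the excerpt concrete, the short Python sketch below compounds them over a decade. The ten-year horizon and the helper name `growth_factor` are assumptions chosen for illustration; only the doubling periods come from the excerpt.

```python
def growth_factor(years, doubling_period_years):
    """Speed-up after `years` if speed doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

years = 10                          # illustrative horizon (assumption)
cpu = growth_factor(years, 1.5)     # CPU speed doubles every eighteen months
mem = growth_factor(years, 10.0)    # memory speed doubles about every ten years
gap = cpu / mem                     # how much further the CPU has pulled ahead

print(f"CPU x{cpu:.1f}, memory x{mem:.1f}, gap x{gap:.1f}")
# prints: CPU x101.6, memory x2.0, gap x50.8
```

Under these rates the processor-to-memory speed ratio widens by roughly a factor of fifty per decade, which is the gap the excerpt calls the Memory Wall.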

Computer engineering

Memory Management and Optimization Using Distributed Shared Memory Systems for High Performance Computing Clusters

Book Details:

Author : Niraj Upadhayaya
Publisher :
Release : 2006
ISBN :
Pages : 214 pages

Download or read book Memory Management and Optimization Using Distributed Shared Memory Systems for High Performance Computing Clusters written by Niraj Upadhayaya. This book was released on 2006 with total page 214 pages. Available in PDF, EPUB and Kindle.

Computers

Optimizing HPC Applications with Intel Cluster Tools

Book Details:

Author : Alexander Supalov
Publisher : Apress
Release : 2014-10-09
ISBN : 1430264977
Pages : 291 pages

Download or read book Optimizing HPC Applications with Intel Cluster Tools written by Alexander Supalov and published by Apress. This book was released on 2014-10-09 with total page 291 pages. Available in PDF, EPUB and Kindle. Book excerpt: Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface (MPI) and OpenMP for multi-threading to achieve the ultimate goal of high performance at low power consumption on enterprise-class workstations and compute clusters. The book focuses on optimization for clusters consisting of the Intel® Xeon processor, but the optimization methodologies also apply to the Intel® Xeon Phi™ coprocessor and heterogeneous clusters mixing both architectures. Besides the tutorial and reference content, the authors address and refute many myths and misconceptions surrounding the topic. The text is augmented and enriched by descriptions of real-life situations.

Content aware Memory Systems for High performance Energy efficient Data Movement

Book Details:

Author : Shibo Wang
Publisher :
Release : 2017
ISBN :
Pages : 173 pages

Download or read book Content aware Memory Systems for High performance Energy efficient Data Movement written by Shibo Wang. This book was released on 2017 with total page 173 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Power dissipation and limited memory bandwidth are significant bottlenecks in virtually all computer systems, from datacenters to mobile devices. The memory subsystem is responsible for a significant and growing fraction of the total system energy due to data movement throughout the memory hierarchy. These energy and performance problems become more severe as emerging data-intensive applications place a larger fraction of the data in memory, and require substantial data processing and transmission capabilities. As a result, it is critical to architect novel, energy- and bandwidth-efficient memory systems and data access mechanisms for future computer systems. Existing memory systems are largely oblivious to the contents of the transferred or stored data. However, the transmission and storage costs of data with different contents often differ, which creates new possibilities to reduce the attendant data movement overheads. This dissertation investigates both content aware transmission and storage mechanisms in conventional DRAM systems, such as DDRx, and emerging memory architectures, such as Hybrid Memory Cube (HMC). Content aware architectural techniques are developed to improve the performance and energy efficiency of the memory hierarchy. The dissertation first presents a new energy-efficient data encoding mechanism based on online data clustering that exploits asymmetric data movement costs. One promising way of reducing the data movement energy is to design the interconnect such that the transmission of 0s is considerably cheaper than that of 1s.
Given such an interconnect with asymmetric transmission costs, data movement energy can be reduced by encoding the transmitted data such that the number of 1s in each transmitted codeword is minimized. In the proposed coding scheme, the transmitted data blocks are dynamically grouped into clusters based on the similarities between their binary representations. Each cluster has a center with a bit pattern close to those of the data blocks that belong to that cluster. Each transmitted data block is expressed as the bitwise XOR between the nearest cluster center and a sparse residual with a small number of 1s. The data movement energy is minimized by sending the sparse residual along with an identifier that specifies which cluster center to use in decoding the transmitted data. At runtime, the proposed approach continually updates the cluster centers based on the observed data to adapt to phase changes. By dynamically learning and adjusting the cluster centers, the Hamming distance between each data block and the nearest cluster center can be significantly reduced. As a result, the total number of 1s in the transmitted residual is lowered, leading to substantial savings in data movement energy. The dissertation then introduces content aware refresh - a novel DRAM refresh method that reduces the refresh rate by exploiting the unidirectional nature of DRAM retention errors: assuming that a logical 1 and 0 respectively are represented by the presence and absence of charge, 1-to-0 failures dominate the retention errors. As a result, in a DRAM system that uses a block error correcting code (ECC) to protect memory from errors, blocks with fewer 1s exhibit a lower probability of encountering an uncorrectable error. Such blocks can attain a specified reliability target with a refresh rate lower than what is required for a block with all 1s. 
Leveraging this key insight, and without compromising memory reliability, the proposed content aware refresh mechanism refreshes memory blocks with fewer 1s less frequently. In the proposed content-aware refresh mechanism, the refresh rate of a refresh group - a group of DRAM rows refreshed together - is decided based on the worst case ECC block in that group, which is the block with the greatest number of 1s. In order to keep the overhead of tracking multiple refresh rates manageable, multiple refresh groups are dynamically arranged into one of a predefined number of refresh bins and refreshed at the same rate. To reduce the number of refresh operations, both the refresh rates of the bins and the refresh group-to-bin assignments are adaptively changed at runtime. By tailoring the refresh rate to the actual content of a memory block rather than assuming a worst case data pattern, the proposed content aware refresh technique effectively avoids unnecessary refresh operations and significantly improves the performance and energy efficiency of DRAM systems. Finally, the dissertation examines a novel HMC power management solution that enables energy-efficient HMC systems with erasure codes. The key idea is to encode multiple blocks of data in a single coding block that is distributed among all of the HMC modules in the system, and to store the resulting check bits in a dedicated, always-on HMC. The inaccessible data that are stored in a sleeping HMC module can be reconstructed by decoding a subset of the remaining memory blocks retrieved from other active HMCs, rather than waiting for the sleeping HMC module to become active. A novel data selection policy is used to decide which data to encode at runtime, significantly increasing the probability of reconstructing otherwise inaccessible data. The coding procedure is optimized by leveraging the near memory computing capability of the HMC logic layer.
This approach makes it possible to tolerate the latency penalty incurred when switching an HMC between active and sleep modes, thereby enabling a power-capped HMC system."--Pages xi-xiv.
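
The XOR-residual coding step described in the excerpt above (send the identifier of the nearest cluster center plus a sparse residual) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dissertation's implementation: the 8-bit block width, the three fixed cluster centers, and the function names are hypothetical, and the online updating of cluster centers and the cost of sending the identifier are left out.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def encode(block, centers):
    """Return (center_id, residual), where residual = block XOR nearest center."""
    center_id = min(range(len(centers)), key=lambda i: hamming(block, centers[i]))
    return center_id, block ^ centers[center_id]

def decode(center_id, residual, centers):
    """Recover the original block from the identifier and the residual."""
    return residual ^ centers[center_id]

# Hypothetical, fixed 8-bit cluster centers (the dissertation learns them online).
centers = [0b00000000, 0b11110000, 0b11111111]
blocks = [0b11110001, 0b00000100, 0b11101111]

raw_ones = sum(bin(b).count("1") for b in blocks)
residual_ones = 0
for block in blocks:
    cid, residual = encode(block, centers)
    assert decode(cid, residual, centers) == block  # coding is lossless
    residual_ones += bin(residual).count("1")

print(f"1s sent without coding: {raw_ones}, with coding: {residual_ones}")
```

Because the number of 1s in the residual equals the Hamming distance to the chosen center, picking the nearest center directly minimizes the costly bits on an interconnect where 1s are more expensive than 0s.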

Technology & Engineering

Fast Efficient and Predictable Memory Accesses

Book Details:

Author : Lars Wehmeyer
Publisher : Springer Science & Business Media
Release : 2006-09-08
ISBN : 140204822X
Pages : 263 pages

Download or read book Fast Efficient and Predictable Memory Accesses written by Lars Wehmeyer and published by Springer Science & Business Media. This book was released on 2006-09-08 with total page 263 pages. Available in PDF, EPUB and Kindle. Book excerpt: Speed improvements in memory systems have not kept pace with the speed improvements of processors, leading to embedded systems whose performance is limited by the memory. This book presents design techniques for fast, energy-efficient and timing-predictable memory systems that achieve high performance and low energy consumption. In addition, the use of scratchpad memories significantly improves the timing predictability of the entire system, leading to tighter worst case execution time bounds.

Computers

High Performance Computing on Complex Environments

Book Details:

Author : Emmanuel Jeannot
Publisher : John Wiley & Sons
Release : 2014-04-10
ISBN : 1118712072
Pages : 512 pages

Download or read book High Performance Computing on Complex Environments written by Emmanuel Jeannot and published by John Wiley & Sons. This book was released on 2014-04-10 with total page 512 pages. Available in PDF, EPUB and Kindle. Book excerpt: With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has drastically changed. It is important to provide a comprehensive study on how to use such machines written by specialists of the domain. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications varying from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing.
• Covers cutting-edge research in HPC on complex environments, following an international collaboration of members of the ComplexHPC
• Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
• Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency

Computers

HPC Green IT

Book Details:

Author : Ralf Gruber
Publisher : Springer Science & Business Media
Release : 2010-03-15
ISBN : 3642017894
Pages : 230 pages

Download or read book HPC Green IT written by Ralf Gruber and published by Springer Science & Business Media. This book was released on 2010-03-15 with total page 230 pages. Available in PDF, EPUB and Kindle. Book excerpt: Making the most efficient use of computer systems has rapidly become a leading topic of interest for the computer industry and its customers alike. However, the focus of these discussions is often on single, isolated, and specific architectural and technological improvements for power reduction and conservation, while ignoring the fact that power efficiency as a ratio of performance to power consumption is equally influenced by performance improvements and architectural power reduction. Furthermore, efficiency can be influenced on all levels of today’s system hierarchies from single cores all the way to distributed Grid environments. To improve execution and power efficiency requires progress in such diverse fields as program optimization, optimization of program scheduling, and power reduction of idling system components for all levels of the system hierarchy. Improving computer system efficiency requires improving system performance and reducing system power consumption. To research and reach reasonable conclusions about system performance we need to not only understand the architectures of our computer systems and the available array of code transformations for performance optimizations, but we also need to be able to express this understanding in performance models good enough to guide decisions about code optimizations for specific systems. This understanding is necessary on all levels of the system hierarchy from single cores to nodes to full high performance computing (HPC) systems, and eventually to Grid environments with multiple systems and resources.

Memory Systems for High performance Computing the Capacity and Reliability Implications

Book Details:

Author : Darko Živanovič
Publisher :
Release : 2018
ISBN :
Pages : 144 pages

Download or read book Memory Systems for High performance Computing the Capacity and Reliability Implications written by Darko Živanovič. This book was released on 2018 with total page 144 pages. Available in PDF, EPUB and Kindle. Book excerpt: Memory systems are significant contributors to the overall power requirements, energy consumption, and the operational cost of large high-performance computing (HPC) systems. Limitations of main memory systems in terms of latency, bandwidth and capacity can significantly affect the performance of HPC applications, and can have a strong negative impact on system scalability. In addition, errors in the main memory system can have a strong impact on the reliability, accessibility and serviceability of large-scale clusters. This thesis studies capacity and reliability issues in modern memory systems for high-performance computing. The choice of main memory capacity is an important aspect of high-performance computing memory system design. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional DIMMs, 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now. We analyze memory capacity requirements of important HPC benchmarks and applications. The study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step towards the adoption of this novel technology in the HPC domain. For HPC domains where large memory capacities are required, we propose scaling-in of HPC applications to reduce energy consumption and the running time of a batch of jobs. We also propose upgrading the per-node memory capacity, which enables a greater degree of scaling-in and additional energy savings.
The memory system is one of the main causes of hardware failures. In each generation, the DRAM chip density and the amount of memory in systems increase, while the DRAM technology process is constantly shrinking. Therefore, we could expect that DRAM failures could have a serious impact on the reliability of future systems. This thesis studies DRAM errors observed on a production HPC system during a period of two years. We clearly distinguish between two different approaches for the DRAM error analysis: categorical analysis and the analysis of error rates. The first approach compares the errors at the DIMM level and partitions the DIMMs into various categories, e.g. based on whether they did or did not experience an error. The second approach is to analyze the error rates, i.e., to present the total number of errors relative to other statistics, typically the number of MB-hours or the duration of the observation period. We show that although DRAM error analysis may be performed with both approaches, they are not interchangeable and can lead to completely different conclusions. We further demonstrate the importance of providing statistical significance and presenting results that have practical value and real-life use. We show that various widely-accepted approaches for DRAM error analysis may provide data that appear to support an interesting conclusion, but are not statistically significant, meaning that they could merely be the result of chance. We hope the study of methods for DRAM error analysis presented in this thesis will become a standard for any future analysis of DRAM errors in the field.
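
The difference between the two analysis approaches named in the excerpt can be seen on a toy example. The Python sketch below computes a categorical statistic (the fraction of memory modules that saw at least one error) and an error rate (errors per MB-hour) for two groups of modules. All data are made up for illustration, and the class and function names are hypothetical; nothing here comes from the thesis's measurements.

```python
from dataclasses import dataclass

@dataclass
class Dimm:
    group: str         # hypothetical grouping, e.g. by manufacturer
    capacity_mb: int
    hours: float       # observation time
    errors: int        # errors logged during the observation time

def fraction_with_errors(dimms):
    """Categorical view: share of modules that experienced at least one error."""
    return sum(1 for d in dimms if d.errors > 0) / len(dimms)

def errors_per_mb_hour(dimms):
    """Rate view: total errors relative to total MB-hours observed."""
    total_errors = sum(d.errors for d in dimms)
    total_mb_hours = sum(d.capacity_mb * d.hours for d in dimms)
    return total_errors / total_mb_hours

# Made-up fleet: group A has two mildly faulty modules, group B one very noisy module.
fleet = [
    Dimm("A", 4096, 1000.0, 1), Dimm("A", 4096, 1000.0, 1),
    Dimm("A", 4096, 1000.0, 0), Dimm("A", 4096, 1000.0, 0),
    Dimm("B", 4096, 1000.0, 500), Dimm("B", 4096, 1000.0, 0),
    Dimm("B", 4096, 1000.0, 0), Dimm("B", 4096, 1000.0, 0),
]
group_a = [d for d in fleet if d.group == "A"]
group_b = [d for d in fleet if d.group == "B"]

for name, group in (("A", group_a), ("B", group_b)):
    print(name, fraction_with_errors(group), errors_per_mb_hour(group))
```

With these invented numbers the categorical view ranks group A as worse (half of its modules saw errors), while the rate view ranks group B as worse (one module produced most of the errors), which is one way the two approaches can lead to opposite conclusions.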

Computers

Fault Tolerance Techniques for High Performance Computing

Book Details:

Author : Thomas Herault
Publisher : Springer
Release : 2015-07-01
ISBN : 3319209434
Pages : 325 pages

Download or read book Fault Tolerance Techniques for High Performance Computing written by Thomas Herault and published by Springer. This book was released on 2015-07-01 with total page 325 pages. Available in PDF, EPUB and Kindle. Book excerpt: This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

Computers

Software Optimization for High performance Computing

Book Details:

Author : Kevin R. Wadleigh
Publisher : Prentice Hall Professional
Release : 2000
ISBN : 9780130170088
Pages : 414 pages

Download or read book Software Optimization for High performance Computing written by Kevin R. Wadleigh and published by Prentice Hall Professional. This book was released on 2000 with total page 414 pages. Available in PDF, EPUB and Kindle. Book excerpt: The hands-on guide to high-performance coding and algorithm optimization. This hands-on guide to software optimization introduces state-of-the-art solutions for every key aspect of software performance - both code-based and algorithm-based. Two leading HP software performance experts offer comparative optimization strategies for RISC and for the new Explicitly Parallel Instruction Computing (EPIC) design used in Intel IA-64 processors. Using many practical examples, they offer specific techniques for:
• Predicting and measuring performance - and identifying your best optimization opportunities
• Storage optimization: cache, system memory, virtual memory, and I/O
• Parallel processing: distributed-memory and shared-memory (SMP and ccNUMA)
• Compilers and loop optimization
• Enhancing parallelism: compiler directives, threads, and message passing
• Mathematical libraries and algorithms
Whether you're a developer, ISV, or technical researcher, if you need to optimize high-performance software on today's leading processors, one book delivers the advanced techniques and code examples you need: Software Optimization for High Performance Computing.

Memory System Optimizations for Customized Computing From Single Chip to Datacenter

Book Details:

Author : Yu-Ting Chen
Publisher :
Release : 2016
ISBN :
Pages : 314 pages

Download or read book Memory System Optimizations for Customized Computing From Single Chip to Datacenter written by Yu-Ting Chen. This book was released on 2016 with total page 314 pages. Available in PDF, EPUB and Kindle. Book excerpt: Energy efficiency is one of the key considerations for various systems, from handheld devices to servers in a data center. Application-specific accelerators can provide 10 - 1000X energy-efficiency improvement over general-purpose processors through customization and by exploiting the application parallelism. The design of the memory system is key to improving performance and energy efficiency for both accelerators and processors. However, even with customization and acceleration, the computation power of a single server is still limited and cannot support the needs of large-scale data processing and analytics. Therefore, the second goal of this dissertation is to provide customization support in the in-memory cluster computing system for such big data applications. The first part of this dissertation investigates the design and optimization of the memory system. Our goal is to design a high-performance and energy-efficient memory system that supports both general-purpose processors and accelerator-rich architectures (ARAs). We propose a hybrid cache architecture and corresponding optimizations for processor caches. We also provide an optimal algorithm to synthesize the ARA memory system. In the second part of this dissertation, we focus on improving the performance of an important domain, the DNA sequencing pipeline, which combines heavy computational demand with big data characteristics. We adopt the in-memory cluster computing framework, Spark, to provide scalable speedup while providing hardware acceleration support in the cluster. With such a system, we can reduce the time of the sequence alignment process from tens of hours to 32 minutes.

Computers

Recent Advances in the Message Passing Interface

Book Details:

Author : Rainer Keller
Publisher : Springer Science & Business Media
Release : 2010-09-02
ISBN : 3642156452
Pages : 320 pages

Download or read book Recent Advances in the Message Passing Interface written by Rainer Keller and published by Springer Science & Business Media. This book was released on 2010-09-02 with total page 320 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the proceedings of the 17th European MPI User's Group Meeting on Recent Advances in the Message Passing Interface held in Stuttgart in September 2010.

High performance computing for solving large sparse systems Optical diffraction tomography as a case of study

Book Details:

Author : Gloria Ortega López
Publisher : Universidad Almería
Release : 2015-04-14
ISBN : 8416027587
Pages : 182 pages

Download or read book High performance computing for solving large sparse systems Optical diffraction tomography as a case of study written by Gloria Ortega López and published by Universidad Almería. This book was released on 2015-04-14 with total page 182 pages. Available in PDF, EPUB and Kindle. Book excerpt: This thesis, entitled "High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study", investigates the computational issues related to the resolution of linear systems of equations which come from the discretization of physical models described by means of Partial Differential Equations (PDEs). These physical models are conceived for the description of the spatio-temporal behavior of some physical phenomena f(x, y, z, t) in terms of their variations (partial derivatives) with respect to the independent variables of the phenomena. There is a wide variety of discretization methods for PDEs. Two of the most well-known methods are the Finite Difference Method (FDM) and the Finite Element Method (FEM). Both methods result in an algebraic description of the model that can be translated into a linear system of equations of type (Ax = b), where A is a sparse matrix (a high percentage of zero elements) whose size depends on the required accuracy of the modeled phenomena. This thesis begins with the algebraic description of the model associated with the physical phenomena, and the work herein has been focused on the design of techniques and computational models that allow the resolution of these linear systems of equations. The main interest of this study is especially focused on models which require a high level of discretization and usually generate matrices, A, which are highly sparse and large.
The literature characterizes these types of problems by their highly demanding computational requirements (because of their fine degree of discretization) and the sparsity of the matrices involved, suggesting that these kinds of problems can only be solved using High Performance Computing techniques and architectures. One of the main goals of this thesis is to research the possible alternatives which allow the implementation of routines to solve large and sparse linear systems of equations using High Performance Computing (HPC). The use of massively parallel platforms (GPUs) allows the acceleration of these routines, because they have several advantages for vectorial computation schemes. On the other hand, the use of distributed memory platforms allows the resolution of problems defined by matrices of enormous size. Finally, the combination of both techniques, distributed computation and multi-GPUs, will allow faster resolution of interesting problems in which large and sparse matrices are involved. In this line, one of the goals of this thesis is to supply the scientific community with implementations based on multi-GPU clusters to solve sparse linear systems of equations, which are key in many scientific computations. The second part of this thesis is focused on a real physical problem of Optical Diffraction Tomography (ODT) based on holographic information. ODT is a non-damaging technique which allows the extraction of the shapes of objects with high accuracy. Therefore, this technique is very suitable for the in vivo study of real specimens, microorganisms, etc., and it also makes the investigation of their dynamics possible. A preliminary physical model based on a bidimensional reconstruction of the seeding particle distribution in fluids was proposed by J. Lobera and J.M. Coupland.
However, its high computational cost (in both memory requirements and runtime) made the use of HPC techniques necessary to extend the implementation to a three-dimensional model. In the second part of this thesis, the implementation and validation of this physical model for the case of three-dimensional reconstructions is carried out. This implementation requires the resolution of large and sparse linear systems of equations. Thus, some of the algebraic routines developed in the first part of the thesis have been used to implement computational strategies capable of solving the problem of 3D reconstruction based on ODT.
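
As a rough illustration of how a finite-difference discretization leads to a sparse system Ax = b, the sketch below builds the tridiagonal matrix of a 1D Poisson problem in compressed sparse row (CSR) form and solves it with the conjugate gradient method. Everything here is an assumption made for illustration: the excerpt does not name a model problem, a storage format, or a solver, and the thesis targets multi-GPU clusters rather than the serial, pure-Python code shown. The function names are hypothetical.

```python
def poisson_1d_csr(n):
    """Build the n x n finite-difference matrix (-1, 2, -1) in CSR form.

    Returns (data, indices, indptr); only nonzero entries are stored.
    """
    data, indices, indptr = [], [], [0]
    for i in range(n):
        if i > 0:
            data.append(-1.0)
            indices.append(i - 1)
        data.append(2.0)
        indices.append(i)
        if i < n - 1:
            data.append(-1.0)
            indices.append(i + 1)
        indptr.append(len(data))
    return data, indices, indptr


def csr_matvec(A, x):
    """Multiply a CSR matrix by a dense vector, touching only nonzeros."""
    data, indices, indptr = A
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - Ax for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


n = 50
A = poisson_1d_csr(n)
b = [1.0] * n
x = conjugate_gradient(A, b)
```

The matrix stores 3n - 2 nonzeros instead of n² entries, and the solver only ever needs matrix-vector products, which is the kind of operation that parallel platforms such as GPUs are typically used to accelerate.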

Computers

A Primer on Compression in the Memory Hierarchy

Book Details:

Author : Somayeh Sardashti
Publisher : Morgan & Claypool Publishers
Release : 2015-12-01
ISBN : 1627057048
Pages : 88 pages

Download or read book A Primer on Compression in the Memory Hierarchy written by Somayeh Sardashti and published by Morgan & Claypool Publishers. This book was released on 2015-12-01 with total page 88 pages. Available in PDF, EPUB and Kindle. Book excerpt: This synthesis lecture presents the current state-of-the-art in applying low-latency, lossless hardware compression algorithms to cache, memory, and the memory/cache link. There are many non-trivial challenges that must be addressed to make data compression work well in this context. First, since compressed data must be decompressed before it can be accessed, decompression latency ends up on the critical memory access path. This imposes a significant constraint on the choice of compression algorithms. Second, while conventional memory systems store fixed-size entities like data types, cache blocks, and memory pages, these entities will suddenly vary in size in a memory system that employs compression. Dealing with variable size entities in a memory system using compression has a significant impact on the way caches are organized and how to manage the resources in main memory. We systematically discuss solutions in the open literature to these problems. Chapter 2 provides the foundations of data compression by first introducing the fundamental concept of value locality. We then introduce a taxonomy of compression algorithms and show how previously proposed algorithms fit within that logical framework. Chapter 3 discusses the different ways that cache memory systems can employ compression, focusing on the trade-offs between latency, capacity, and complexity of alternative ways to compact compressed cache blocks. Chapter 4 discusses issues in applying data compression to main memory and Chapter 5 covers techniques for compressing data on the cache-to-memory links. 
This book should help a skilled memory system designer understand the fundamental challenges in applying compression to the memory hierarchy and introduce him/her to the state-of-the-art techniques in addressing them.
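
To make the point about variable-size entities concrete, here is a minimal sketch of a lossless base-plus-delta encoding for one cache block. It is not taken from the book: the 64-byte block of eight 64-bit words, the one-byte tag, and the function names are all assumptions chosen for illustration, and real hardware schemes differ in detail. The idea it shows is that blocks whose words are numerically close to each other (one form of value locality) shrink, while others do not, so compressed blocks no longer share a single size.

```python
import struct

WORDS_PER_BLOCK = 8  # assumed block: eight unsigned 64-bit words (64 bytes)


def compress_block(words):
    """Encode a block as base + 1-byte deltas when possible, else store raw.

    The first byte is a tag: 1 = base-plus-delta, 0 = uncompressed.
    """
    assert len(words) == WORDS_PER_BLOCK
    base = words[0]
    deltas = [w - base for w in words]
    if all(-128 <= d <= 127 for d in deltas):
        # 1 tag byte + 8-byte base + eight signed 1-byte deltas = 17 bytes
        return b"\x01" + struct.pack("<Q", base) + struct.pack("<8b", *deltas)
    # 1 tag byte + eight raw 8-byte words = 65 bytes
    return b"\x00" + struct.pack("<8Q", *words)


def decompress_block(blob):
    """Invert compress_block; this step sits on the read path."""
    if blob[0] == 1:
        base = struct.unpack_from("<Q", blob, 1)[0]
        deltas = struct.unpack_from("<8b", blob, 9)
        return [base + d for d in deltas]
    return list(struct.unpack_from("<8Q", blob, 1))


# Pointer-like values that differ by small strides compress well.
similar = [0x7FFF0000 + 8 * i for i in range(WORDS_PER_BLOCK)]
# Unrelated values fall back to the uncompressed form.
scattered = [(i + 1) * 0x0123456789ABCDEF % 2**64 for i in range(WORDS_PER_BLOCK)]

small = compress_block(similar)
large = compress_block(scattered)
```

Because `small` and `large` have different lengths, a cache or memory built on such an encoding has to track where each block starts and how much space it occupies, which is the organizational problem the excerpt describes.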

Computer storage devices

The Memory System

Book Details:

Author : Bruce Jacob
Publisher : Morgan & Claypool Publishers
Release : 2009
ISBN : 159829587X
Pages : 78 pages

Download or read book The Memory System written by Bruce Jacob and published by Morgan & Claypool Publishers. This book was released on 2009 with total page 78 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book introduces the reader to the most important details of the memory system. It targets both computer scientists and computer engineers in industry and in academia. Roughly speaking, computer scientists are the users of the memory system and computer engineers are the designers of the memory system. Both can benefit tremendously from a basic understanding of how the memory system really works.

Computers

Euro Par 2003 Parallel Processing

Book Details:

Author : Harald Kosch
Publisher : Springer
Release : 2004-06-01
ISBN : 3540452095
Pages : 1324 pages

Download or read book Euro Par 2003 Parallel Processing written by Harald Kosch and published by Springer. This book was released on 2004-06-01 with total page 1324 pages. Available in PDF, EPUB and Kindle. Book excerpt: Euro-Par Conference Series. The European Conference on Parallel Computing (Euro-Par) is an international conference series dedicated to the promotion and advancement of all aspects of parallel and distributed computing. The major themes fall into the categories of hardware, software, algorithms, and applications. This year, new and interesting topics were introduced, like Peer-to-Peer Computing, Distributed Multimedia Systems, and Mobile and Ubiquitous Computing. For the first time, we organized a Demo Session showing many challenging applications. The general objective of Euro-Par is to provide a forum promoting the development of parallel and distributed computing both as an industrial technique and an academic discipline, extending the frontiers of both the state of the art and the state of the practice. The industrial importance of parallel and distributed computing is supported this year by a special Industrial Session as well as a vendors’ exhibition. This is particularly important as currently parallel and distributed computing is evolving into a globally important technology; the buzzword Grid Computing clearly expresses this move. In addition, the trend to a mobile world is clearly visible in this year’s Euro-Par. The main audience for and participants at Euro-Par are researchers in academic departments, industrial organizations, and government laboratories. Euro-Par aims to become the primary choice of such professionals for the presentation of new results in their specific areas. Euro-Par has its own Internet domain with a permanent Web site where the history of the conference series is described: http://www.euro-par.org. 
The Euro-Par conference series is sponsored by the Association for Computing Machinery (ACM) and the International Federation for Information Processing (IFIP).