EBookClubs

Read Books & Download eBooks Full Online

Book Memory System Optimizations for Customized Computing: From Single Chip to Datacenter

Download or read book Memory System Optimizations for Customized Computing From Single Chip to Datacenter written by Yu-Ting Chen and published by . This book was released on 2016 with total page 314 pages. Available in PDF, EPUB and Kindle. Book excerpt: Energy efficiency is a key consideration for systems ranging from handheld devices to servers in a data center. Application-specific accelerators can provide a 10-1000X energy-efficiency improvement over general-purpose processors through customization and by exploiting application parallelism. For both accelerators and processors, the design of the memory system is key to improving performance and energy efficiency. However, even with customization and acceleration, the computation power of a single server is limited and cannot meet the needs of large-scale data processing and analytics; the second goal of this dissertation is therefore to provide customization support in an in-memory cluster computing system for such big data applications. The first part of this dissertation investigates the design and optimization of the memory system. Our goal is to design a high-performance, energy-efficient memory system that supports both general-purpose processors and accelerator-rich architectures (ARAs). We propose a hybrid cache architecture and corresponding optimizations for processor caches, and we provide an optimal algorithm to synthesize the ARA memory system. In the second part of this dissertation, we focus on improving the performance of an important domain, the DNA sequencing pipeline, which demands enormous computation together with big data characteristics. We adopt the in-memory cluster computing framework Spark to provide scalable speedup while supporting hardware acceleration in the cluster. With this system, we reduce the time of the sequence alignment process from tens of hours to 32 minutes.

Book Customizable Computing

Download or read book Customizable Computing written by Yu-Ting Chen and published by Springer Nature. This book was released on 2022-05-31 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of computation has been the main concern of the research community and industry. The large energy-efficiency gap between general-purpose processors and application-specific integrated circuits (ASICs) motivates the exploration of customizable architectures, where one can adapt the architecture to the workload. In this Synthesis lecture, we present an overview of recent developments in energy-efficient customizable architectures, including customizable cores and accelerators, on-chip memory customization, and interconnect optimization. In addition to discussing the general techniques and classifying the different approaches used in each area, we highlight and illustrate some of the most successful design examples in each category and discuss their impact on performance and energy efficiency. We hope that this work captures the state of the art in research and development on customizable architectures and serves as a useful reference for further research, design, and implementation toward large-scale deployment in future computing systems.

Book Advanced Memory Optimization Techniques for Low Power Embedded Processors

Download or read book Advanced Memory Optimization Techniques for Low Power Embedded Processors written by Manish Verma and published by Springer Science & Business Media. This book was released on 2007-06-20 with total page 192 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book proposes novel memory hierarchies and software optimization techniques for the optimal utilization of memory hierarchies. It presents a wide range of optimizations, progressively increasing in the complexity of analysis and of memory hierarchies. The final chapter covers optimization techniques for applications consisting of multiple processes found in most modern embedded devices.

Book Custom Memory Management Methodology

Download or read book Custom Memory Management Methodology written by Francky Catthoor and published by Springer Science & Business Media. This book was released on 2013-03-09 with total page 352 pages. Available in PDF, EPUB and Kindle. Book excerpt: The main intention of this book is to give an impression of the state of the art in system-level memory management (data transfer and storage) for complex data-dominated real-time signal and data processing applications. The material is based on research at IMEC in this area in the period 1989-1997. In order to deal with the stringent timing requirements and the data-dominated characteristics of this domain, we have adopted a target architecture style and a systematic methodology to make the exploration and optimization of such systems feasible. Our approach is also heavily application-driven, which is illustrated by several realistic demonstrators, partly used as red-thread examples in the book. Moreover, the book addresses only the steps above the traditional high-level synthesis (scheduling and allocation) or compilation (traditional or ILP-oriented) tasks. The latter are mainly focused on scalar or scalar-stream operations and data where the internal structure of the complex data types is not exploited, in contrast to the approaches discussed here. The proposed methodologies are largely independent of the level of programmability in the data-path and controller, so they are valuable for the realisation of both hardware and software systems. Our target domain consists of signal and data processing systems that deal with large amounts of data.

Book High Performance Memory Systems

Download or read book High Performance Memory Systems written by Haldun Hadimioglu and published by Springer Science & Business Media. This book was released on 2003-10-31 with total page 314 pages. Available in PDF, EPUB and Kindle. Book excerpt: The State of Memory Technology. Over the past decade there has been rapid growth in the speed of microprocessors. CPU speeds are approximately doubling every eighteen months, while main memory speed doubles about every ten years. The International Technology Roadmap for Semiconductors (ITRS) study suggests that memory will remain on its current growth path. The ITRS short- and long-term targets indicate continued scaling improvements at about the current rate by 2016. This translates to bit densities increasing two times every two years until the introduction of 8-gigabit dynamic random access memory (DRAM) chips, after which densities will increase four times every five years. A similar growth pattern is forecast for other high-density chip areas and high-performance logic (e.g., microprocessors and application-specific integrated circuits (ASICs)). In the future, molecular devices, 64-gigabit DRAMs and 28 GHz clock signals are targeted. Although densities continue to grow, we still do not see significant advances that will improve memory speed. These trends have created a problem that has been labeled the Memory Wall or Memory Gap.
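The doubling periods quoted in this excerpt (CPU speed every eighteen months, memory speed every ten years) are enough to reproduce the Memory Wall arithmetic. A short back-of-the-envelope sketch, using only those two rates:

```python
# Compound-growth illustration of the "Memory Wall", using the doubling
# periods quoted in the excerpt: CPU ~18 months, main memory ~10 years.

def growth(years, doubling_period_years):
    """Speedup factor after `years` at one doubling per period."""
    return 2 ** (years / doubling_period_years)

decade_cpu = growth(10, 1.5)   # CPU speed multiplier over one decade
decade_mem = growth(10, 10.0)  # memory speed multiplier over one decade
gap = decade_cpu / decade_mem  # how much the CPU/memory gap widens
```

Under these rates, processor speed grows by roughly two orders of magnitude per decade (about 100x) while memory speed merely doubles, so the processor/memory gap widens by about 50x every ten years, which is exactly the divergence the excerpt labels the Memory Wall.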

Book Innovations in the Memory System

Download or read book Innovations in the Memory System written by Rajeev Balasubramonian and published by Morgan & Claypool Publishers. This book was released on 2019-09-10 with total page 153 pages. Available in PDF, EPUB and Kindle. Book excerpt: This is a tour through recent and prominent works regarding new DRAM chip designs and technologies, near data processing approaches, new memory channel architectures, techniques to tolerate the overheads of refresh and fault tolerance, security attacks and mitigations, and memory scheduling. The memory system will soon be a hub for future innovation. While conventional memory systems focused primarily on high density, other memory system metrics like energy, security, and reliability are grabbing modern research headlines. With processor performance stagnating, it is also time to consider new programming models that move some application computations into the memory system. This, in turn, will lead to feature-rich memory systems with new interfaces. The past decade has seen a number of memory system innovations that point to this future where the memory system will be much more than dense rows of unintelligent bits.

Book Semiconductor Memories and Systems

Download or read book Semiconductor Memories and Systems written by Andrea Redaelli and published by Woodhead Publishing. This book was released on 2022-06-07 with total page 364 pages. Available in PDF, EPUB and Kindle. Book excerpt: Semiconductor Memories and Systems provides a comprehensive overview of the current state of semiconductor memory at the technology and system levels. After an introduction on market trends and memory applications, the book focuses on mainstream technologies, illustrating their current status, challenges and opportunities, with special attention paid to scalability paths. Technologies discussed include static random access memory (SRAM), dynamic random access memory (DRAM), non-volatile memory (NVM), and NAND flash memory. Embedded memory requirements and system-level needs for storage class memory are also addressed. Each chapter covers physical operating mechanisms, fabrication technologies, and the main challenges to scalability. Finally, the work reviews the emerging trends for storage class memory, mainly focusing on the advantages and opportunities of phase change based memory technologies.
- Features contributions from experts from leading companies in semiconductor memory
- Discusses physical operating mechanisms, fabrication technologies and paths to scalability for current and emerging semiconductor memories
- Reviews primary memory technologies, including SRAM, DRAM, NVM and NAND flash memory
- Includes emerging storage class memory technologies such as phase change memory

Book Resilient On-chip Memory Design in the Nano Era

Download or read book Resilient On-chip Memory Design in the Nano Era written by Abbas Banaiyanmofrad and published by . This book was released on 2015 with total page 219 pages. Available in PDF, EPUB and Kindle. Book excerpt: Aggressive technology scaling in the nano-scale regime makes chips more susceptible to failures. This causes multiple reliability challenges in the design of modern chips, including manufacturing defects, wear-out, and parametric variations. With the growing number, size, and hierarchy of on-chip memory blocks in emerging computing systems, the reliability of the memory sub-system becomes an increasingly challenging design issue. The limitations of existing resilient memory design schemes motivate us to consider new approaches that treat scalability, interconnect-awareness, and cost-effectiveness as major design factors. In this thesis, we propose approaches to resilient on-chip memory design in computing systems ranging from traditional single-core processors to emerging many-core platforms. We classify our proposed approaches into five main categories: 1) flexible and low-cost approaches to protect cache memories in single-core processors against permanent faults and transient errors, 2) scalable fault-tolerant approaches to protect last-level caches with non-uniform cache access in chip multiprocessors, 3) interconnect-aware cache protection schemes in network-on-chip architectures, 4) relaxed memory resiliency for approximate computing applications, and 5) system-level design space exploration, analysis, and optimization for redundancy-aware on-chip memory resiliency in many-core platforms. We first propose a flexible fault-tolerant cache (FFT-Cache) architecture for SRAM-based on-chip cache memories in single-core processors operating at near-threshold voltages.
We then extend the technique proposed in FFT-Cache to protect a shared last-level cache (LLC) with Non-Uniform Cache Access (NUCA) in chip multiprocessor (CMP) architectures, proposing REMEDIATE, which leverages a flexible fault-remapping technique while considering the implications of different remapping heuristics in the presence of cache banking, non-uniform latency, and the interconnection network. We further extend REMEDIATE with RESCUE, whose main goal is a design trend (aggressive voltage scaling plus cache over-provisioning) that uses different fault-remapping heuristics with a scalable implementation for shared multi-bank LLCs in CMPs to reduce power, exploring a large multi-dimensional design space and performing multiple sensitivity analyses. To address multi-bit upsets, we propose a low-cost technique that leverages embedded erasure coding (EEC) to tackle soft errors as well as hard errors in the data caches of both a high-performance and an embedded processor. Considering the non-trivial effect of the interconnection fabric on memory resiliency in network-on-chip (NoC) platforms, we then propose a novel fault-tolerant scheme that leverages the interconnection network to protect the LLC cache banks against permanent faults. During an LLC access to a faulty area, the network detects and corrects the faults, returning fault-free data to the requesting core. In another approach, we propose CoDEC, a co-design approach to error coding of cache and interconnect in many-core architectures that reduces the cost of error protection compared to conventional methods. By proposing a system-wide error coding scheme, CoDEC guarantees end-to-end protection of LLC data blocks throughout the on-chip network. The tradeoffs among reliability, output fidelity, performance, and energy available in emerging error-resilient applications in the approximate computing era motivate us to consider application-awareness in resilient memory design.
The key idea is to exploit the intrinsic tolerance of such applications to some level of error, relaxing memory guard-banding to reduce design overheads. As an exemplar we propose Relaxed-Cache, in which we relax the definition of a faulty block depending on the number and location of faulty bits in an SRAM-based cache to save energy. In this part of the thesis, we aim at cross-layer characterization and optimization of on-chip memory resiliency across the system stack. Our first contribution toward this approach focuses on the scalability of memory resiliency as a system-level design methodology for scalable fault-tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for effective shared-redundancy management toward cost-efficient fault-tolerance of on-chip memory blocks. Each cluster represents a group of cores that have access to shared redundancy resources for protection of their memory blocks.
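The erasure-coding idea behind the EEC technique mentioned in this abstract can be illustrated with the simplest member of the family, a generic single-parity XOR code (our own toy example, not the dissertation's scheme): one parity word per group of data words is enough to reconstruct any single erased word.

```python
# Generic single-parity erasure code: the parity word is the XOR of all
# data words, so any one erased word equals the XOR of the survivors
# and the parity.
from functools import reduce

def parity(words):
    return reduce(lambda a, b: a ^ b, words, 0)

data = [0xDEAD, 0xBEEF, 0xCAFE, 0x1234]
p = parity(data)

# Simulate erasing word 2, then recover it from the survivors plus parity.
survivors = data[:2] + data[3:]
recovered = parity(survivors + [p])
assert recovered == data[2]
```

A single parity word corrects only erasures (known-location losses); tolerating unknown-location soft errors, as the abstract describes, requires detection alongside the code, which is where the design trade-offs the thesis explores come in.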

Book Memory System Optimizations for Energy and Bandwidth Efficient Data Movement

Download or read book Memory System Optimizations for Energy and Bandwidth Efficient Data Movement written by Mahdi Nazm Bojnordi and published by . This book was released on 2016 with total page 189 pages. Available in PDF, EPUB and Kindle. Book excerpt: Since the early 2000s, power dissipation and memory bandwidth have been two of the most critical challenges limiting the performance of computer systems, from data centers to smartphones and wearable devices. Data movement between the processor cores and the storage elements of the memory hierarchy (including the register file, cache levels, and main memory) is the primary contributor to power dissipation in modern microprocessors. As a result, the energy and bandwidth efficiency of the memory hierarchy is of paramount importance to designing high-performance, energy-efficient computer systems. This research explores a new class of energy-efficient computer architectures that aim to minimize data movement and improve memory bandwidth efficiency. We investigate the design of domain-specific ISAs and hardware/software interfaces, develop physical structures and microarchitectures for energy-efficient memory arrays, and explore novel architectural techniques for leveraging emerging memory technologies (e.g., resistive RAM) in energy-efficient memory-centric accelerators. This dissertation first presents a novel, energy-efficient data exchange mechanism using synchronized counters. The key idea is to represent information by the delay between two consecutive pulses on a set of wires connecting the data arrays to the cache controller. This time-based data representation makes the number of state transitions on the interconnect independent of the bit patterns, and significantly lowers the activity factor on the interconnect.
Unlike conventional parallel or serial data communication, however, the transmission time of the proposed technique grows exponentially with the number of bits in each transmitted value. This problem is addressed by limiting the data blocks to a small number of bits to avoid a significant performance loss. A viable hardware implementation of the proposed mechanism is presented that incurs negligible area and delay overheads. The dissertation then examines the first fully programmable DDRx controller, which enables application-specific optimizations for energy- and bandwidth-efficient data movement between the processor and main memory. DRAM controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. These optimizations must satisfy different system requirements, which complicates memory controller design. A promising way of improving the versatility and energy efficiency of these controllers is to make them programmable, a proven technique that has seen wide use in other control tasks ranging from DMA scheduling to NAND flash and directory control. Unfortunately, the stringent latency and throughput requirements of modern DDRx devices have rendered such programmability largely impractical, confining DDRx controllers to fixed-function hardware. The proposed programmable controller employs domain-specific ISAs with associative search instructions, and carefully partitions tasks between specialized hardware and firmware to meet all the requirements of high-performance DRAM management. Finally, this dissertation presents the memristive Boltzmann machine, a novel hardware accelerator that leverages in situ computation with RRAM technology to eliminate unnecessary data movement in combinatorial optimization and deep learning workloads.
The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems and training deep machine learning models on massive datasets. Regrettably, the required all-to-all communication among the processing units limits the performance of the Boltzmann machine on conventional memory architectures. The proposed accelerator exploits the electrical properties of RRAM to realize in situ, fine-grained parallel computation within the memory arrays, thereby eliminating the need to exchange data between the memory cells and the computational units. Two classical optimization problems, graph partitioning and Boolean satisfiability, and a deep belief network application are mapped onto the proposed hardware. (Abstract, pages viii-x.)

Book Artificial Intelligence and Hardware Accelerators

Download or read book Artificial Intelligence and Hardware Accelerators written by Ashutosh Mishra and published by Springer Nature. This book was released on 2023-03-15 with total page 358 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book explores new methods, architectures, tools, and algorithms for Artificial Intelligence Hardware Accelerators. The authors have structured the material to simplify readers’ journey toward understanding the aspects of designing hardware accelerators, complex AI algorithms, and their computational requirements, along with the multifaceted applications. Coverage focuses broadly on the hardware aspects of training, inference, mobile devices, and autonomous vehicle (AV)-based AI accelerators.

Book Optimizing Processor Architectures for Warehouse-scale Computers

Download or read book Optimizing Processor Architectures for Warehouse-scale Computers written by Grant Edward Ayers and published by . This book was released in 2019. Available in PDF, EPUB and Kindle. Book excerpt: Our society is becoming increasingly integrated with and reliant upon services hosted in large-scale datacenters. These services interact with the lives of billions of people, in large part because they offer unprecedented, near-instantaneous access to information, acceleration of communication and transactions, and power to influence knowledge and sentiment, and because they have promoted significant economic growth. As more services and features come online, the complexity and amount of data they generate and process is continually increasing, and as a result the demands of software systems have never been greater. In contrast, hardware scaling is not keeping up. The end of Dennard scaling near the turn of the century quickly pushed chips against the power wall, causing frequency and single-threaded performance scaling to plateau. Process scaling, or Moore's Law, is also in decline, as observed in a slowdown of cost scaling over the past several years. With these slowdowns, we no longer enjoy the exponential performance benefits that device scaling enabled for decades, and increasing instruction and data working sets further challenge on-chip capabilities. As a result, we are at a watershed moment where future performance sustainability will be driven more by computer architecture than by process technology. This dissertation is about improving computer performance in spite of slowed hardware scaling and accelerated software scaling. It focuses on server-class, general-purpose CPUs, which are the workhorses of the warehouse-scale computers that host the majority of our online activities today.
To this end, we present contributions that identify and measure performance challenges in important workloads today, provide immediate short-term optimizations that alleviate a portion of these challenges, and finally offer long-term optimizations, methodologies, and strategies for sustaining performance scaling into the future. We focus on both the instruction and data latency bottlenecks, which constitute the majority of CPU stalls. We first present a detailed microarchitecture and memory subsystem analysis of Google's Web Search, one of the largest and most popular services in the world today. This study shows that stalls from memory latency present an opportunity to more than double performance. It also quantifies significant differences between the hardware performance of large-scale workloads like search and the traditional software benchmarks that have historically driven CPU design. We evaluate two opportunities to readjust the memory hierarchy to better support search: a rebalancing of on-chip cache and compute resources, and the introduction of a latency-optimized L4 cache to target shared heap accesses. These optimizations combine to yield between 27% and 38% performance improvement. We next focus on the CPU instruction front-end, which contributes as much as one third of all CPU stalls. We show that, in a large fleet, instruction cache misses are caused by a long tail of millions of unique instructions, which suggests the need for larger caches. However, recognizing that on-chip storage is not scaling and that larger caches increase access latencies, we choose to address instruction availability via prefetching. Specifically, we propose a profile-driven software code prefetcher that can eliminate up to 96% of instruction cache misses with very little execution overhead. Finally, we consider the CPU data back-end, which is the largest single contributor to stalls.
Data working set sizes have long outpaced cache capacities, so data prefetching is well studied. However, despite decades of research on prefetchers, only simple designs are present in modern systems today, primarily because prefetcher proposals do not adequately balance generality and cost. One key reason is that relatively little is known about the dominant memory access patterns of important workloads. To that end, we show that access patterns can be extracted directly from programs via dataflow analysis instead of being estimated by indirect methods, and we propose dataflow-based memory pattern analysis tools that can inform us about the capabilities and limitations of current prefetchers as well as guide future prefetcher designs. We evaluate the accuracy and timeliness of a dataflow-informed prefetcher and show that it consistently outperforms hardware prefetchers many times over, providing a much better design point in the landscape of generality and cost.
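As a small, hypothetical illustration of memory-pattern analysis over an address trace (far simpler than the dissertation's dataflow tools): even a frequency count of successive-address deltas reveals the dominant stride that the simple hardware prefetchers mentioned above exploit.

```python
# Toy memory-pattern analysis: find the dominant stride in an address trace.
from collections import Counter

def dominant_stride(trace):
    """Return (stride, coverage): the most common delta between
    consecutive addresses and the fraction of the trace it explains."""
    deltas = Counter(b - a for a, b in zip(trace, trace[1:]))
    stride, count = deltas.most_common(1)[0]
    return stride, count / (len(trace) - 1)

# A sequential walk over 64-byte cache lines with one irregular jump.
trace = [0x1000 + 64 * i for i in range(50)] + [0x9000]
stride, coverage = dominant_stride(trace)
```

A stride prefetcher covers exactly the `coverage` fraction of this toy trace; the point of the dissertation's dataflow approach is to characterize the irregular remainder, which counting deltas cannot explain.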

Book Perspectives of System Informatics

Download or read book Perspectives of System Informatics written by Andrei Voronkov and published by Springer. This book was released on 2015-04-20 with total page 429 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book contains the thoroughly refereed papers from the 9th International Ershov Informatics Conference, PSI 2014, held in St. Petersburg, Russia, in June 2014. The 17 revised full papers, 11 revised short papers, and 2 system and experimental papers presented in this book were carefully reviewed and selected from 80 submissions. The volume also contains 5 keynote talks which cover a range of hot topics in computer science and informatics. The papers cover various topics related to the foundations of program and system development and analysis, programming methodology and software engineering and information technologies.

Book Power Optimization of Embedded Memory Systems Via Data Remapping

Download or read book Power Optimization of Embedded Memory Systems Via Data Remapping written by Krishna V. Palem and published by . This book was released on 2002 with total page 24 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Autonomous Driving Algorithms and Its IC Design

Download or read book Autonomous Driving Algorithms and Its IC Design written by Jianfeng Ren and published by Springer Nature. This book was released on 2023-08-09 with total page 306 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the rapid development of artificial intelligence and the emergence of various new sensors, autonomous driving has grown in popularity in recent years. The implementation of autonomous driving requires new sources of sensory data, such as cameras, radars, and lidars, and the algorithm processing requires a high degree of parallel computing. In this regard, traditional CPUs have insufficient computing power, while DSPs are good at image processing but lack sufficient performance for deep learning. Although GPUs are good at training, they are too “power-hungry,” which can affect vehicle performance. Therefore, this book looks to the future, arguing that custom ASICs are bound to become mainstream. With the goal of IC design for autonomous driving, this book discusses the theory and engineering practice of designing future-oriented autonomous driving SoC chips. The content is divided into thirteen chapters: the first chapter introduces readers to the current challenges and research directions in autonomous driving; Chapters 2–6 focus on algorithm design for perception and planning control; Chapters 7–10 address the optimization of deep learning models and the design of deep learning chips; Chapters 11–12 cover autonomous driving software architecture design; and Chapter 13 discusses 5G applications in autonomous driving. This book is suitable for undergraduates, graduate students, and engineering technicians who are interested in autonomous driving.

Book Computerworld

Download or read book Computerworld written by and published by . This book was released on 1975-03-19 with total page 44 pages. Available in PDF, EPUB and Kindle. Book excerpt: For more than 40 years, Computerworld has been the leading source of technology news and information for IT influencers worldwide. Computerworld's award-winning Web site (Computerworld.com), twice-monthly publication, focused conference series and custom research form the hub of the world's largest global IT media network.

Book High Performance Computing in Finance

Download or read book High Performance Computing in Finance written by M. A. H. Dempster and published by CRC Press. This book was released on 2018-02-21 with total page 637 pages. Available in PDF, EPUB and Kindle. Book excerpt: High-performance computing (HPC) delivers higher computational performance to solve problems in science, engineering and finance. There are various HPC resources available for different needs, ranging from cloud computing, which can be used without much expertise or expense, to more tailored hardware such as Field-Programmable Gate Arrays (FPGAs) or D-Wave’s quantum computer systems. High-Performance Computing in Finance is the first book that provides a state-of-the-art introduction to HPC for finance, capturing both academically and practically relevant problems.

Book In Memory Data Management

Download or read book In Memory Data Management written by Hasso Plattner and published by Springer Science & Business Media. This book was released on 2011-03-08 with total page 245 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last 50 years the world has been completely transformed through the use of IT. We have now reached a new inflection point. Here we present, for the first time, how in-memory computing is changing the way businesses are run. Today, enterprise data is split into separate databases for performance reasons. Analytical data resides in warehouses, synchronized periodically with transactional systems. This separation makes flexible, real-time reporting on current data impossible. Multi-core CPUs, large main memories, cloud computing and powerful mobile devices are serving as the foundation for the transition of enterprises away from this restrictive model. We describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business. The book is intended for university students, IT professionals and IT managers, but also for senior management who wish to create new business processes by leveraging in-memory computing.