[EBOOK] Hardware Techniques To Improve The Performance Of The Processor Memory Interface PDF Download

Hardware Techniques to Improve the Performance of the Processor memory Interface

Book Details:

Author : Doug Burger
Publisher :
Release : 1998
ISBN :
Pages : 434 pages

Download or read book Hardware Techniques to Improve the Performance of the Processor memory Interface written by Doug Burger and published by . This book was released on 1998 with total page 434 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Computers

Computer Organization and Design

Book Details:

Author : David A. Patterson
Publisher : Elsevier
Release : 2004-08-07
ISBN : 0080502571
Pages : 690 pages

Download or read book Computer Organization and Design written by David A. Patterson and published by Elsevier. This book was released on 2004-08-07 with total page 690 pages. Available in PDF, EPUB and Kindle. Book excerpt: This best selling text on computer organization has been thoroughly updated to reflect the newest technologies. Examples highlight the latest processor designs, benchmarking standards, languages and tools. As with previous editions, a MIPs processor is the core used to present the fundamentals of hardware technologies at work in a computer system. The book presents an entire MIPS instruction set—instruction by instruction—the fundamentals of assembly language, computer arithmetic, pipelining, memory hierarchies and I/O. A new aspect of the third edition is the explicit connection between program performance and CPU performance. The authors show how hardware and software components--such as the specific algorithm, programming language, compiler, ISA and processor implementation--impact program performance. Throughout the book a new feature focusing on program performance describes how to search for bottlenecks and improve performance in various parts of the system. The book digs deeper into the hardware/software interface, presenting a complete view of the function of the programming language and compiler--crucial for understanding computer organization. A CD provides a toolkit of simulators and compilers along with tutorials for using them. For instructor resources click on the grey "companion site" button found on the right side of this page.This new edition represents a major revision. New to this edition:* Entire Text has been updated to reflect new technology* 70% new exercises.* Includes a CD loaded with software, projects and exercises to support courses using a number of tools * A new interior design presents defined terms in the margin for quick reference * A new feature, "Understanding Program Performance" focuses on performance from the programmer's perspective * Two sets of exercises and solutions, "For More Practice" and "In More Depth," are included on the CD * "Check Yourself" questions help students check their understanding of major concepts * "Computers In the Real World" feature illustrates the diversity of uses for information technology *More detail below...

Computers

High Performance Memory Systems

Book Details:

Author : Haldun Hadimioglu
Publisher : Springer Science & Business Media
Release : 2011-06-27
ISBN : 1441989870
Pages : 298 pages

Download or read book High Performance Memory Systems written by Haldun Hadimioglu and published by Springer Science & Business Media. This book was released on 2011-06-27 with total page 298 pages. Available in PDF, EPUB and Kindle. Book excerpt: The State of Memory Technology Over the past decade there has been rapid growth in the speed of micropro cessors. CPU speeds are approximately doubling every eighteen months, while main memory speed doubles about every ten years. The International Tech nology Roadmap for Semiconductors (ITRS) study suggests that memory will remain on its current growth path. The ITRS short-and long-term targets indicate continued scaling improvements at about the current rate by 2016. This translates to bit densities increasing at two times every two years until the introduction of 8 gigabit dynamic random access memory (DRAM) chips, after which densities will increase four times every five years. A similar growth pattern is forecast for other high-density chip areas and high-performance logic (e.g., microprocessors and application specific inte grated circuits (ASICs)). In the future, molecular devices, 64 gigabit DRAMs and 28 GHz clock signals are targeted. Although densities continue to grow, we still do not see significant advances that will improve memory speed. These trends have created a problem that has been labeled the Memory Wall or Memory Gap.

Computers

Computer Organization and Design MIPS Edition

Book Details:

Author : David A. Patterson
Publisher : Newnes
Release : 2013-09-30
ISBN : 0124078869
Pages : 1137 pages

Download or read book Computer Organization and Design MIPS Edition written by David A. Patterson and published by Newnes. This book was released on 2013-09-30 with total page 1137 pages. Available in PDF, EPUB and Kindle. Book excerpt: Computer Organization and Design, Fifth Edition, is the latest update to the classic introduction to computer organization. The text now contains new examples and material highlighting the emergence of mobile computing and the cloud. It explores this generational change with updated content featuring tablet computers, cloud infrastructure, and the ARM (mobile computing devices) and x86 (cloud computing) architectures. The book uses a MIPS processor core to present the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies and I/O.Because an understanding of modern hardware is essential to achieving good performance and energy efficiency, this edition adds a new concrete example, Going Faster, used throughout the text to demonstrate extremely effective optimization techniques. There is also a new discussion of the Eight Great Ideas of computer architecture. Parallelism is examined in depth with examples and content highlighting parallel hardware and software topics. The book features the Intel Core i7, ARM Cortex-A8 and NVIDIA Fermi GPU as real-world examples, along with a full set of updated and improved exercises. This new edition is an ideal resource for professional digital system designers, programmers, application developers, and system software developers. It will also be of interest to undergraduate students in Computer Science, Computer Engineering and Electrical Engineering courses in Computer Organization, Computer Design, ranging from Sophomore required courses to Senior Electives. Winner of a 2014 Texty Award from the Text and Academic Authors Association Includes new examples, exercises, and material highlighting the emergence of mobile computing and the cloud Covers parallelism in depth with examples and content highlighting parallel hardware and software topics Features the Intel Core i7, ARM Cortex-A8 and NVIDIA Fermi GPU as real-world examples throughout the book Adds a new concrete example, "Going Faster," to demonstrate how understanding hardware can inspire software optimizations that improve performance by 200 times Discusses and highlights the "Eight Great Ideas" of computer architecture: Performance via Parallelism; Performance via Pipelining; Performance via Prediction; Design for Moore's Law; Hierarchy of Memories; Abstraction to Simplify Design; Make the Common Case Fast; and Dependability via Redundancy Includes a full set of updated and improved exercises

Computers

Computer Organization and Design RISC V Edition

Book Details:

Author : David A. Patterson
Publisher : Morgan Kaufmann
Release : 2017-05-12
ISBN : 0128122765
Pages : 700 pages

Download or read book Computer Organization and Design RISC V Edition written by David A. Patterson and published by Morgan Kaufmann. This book was released on 2017-05-12 with total page 700 pages. Available in PDF, EPUB and Kindle. Book excerpt: The new RISC-V Edition of Computer Organization and Design features the RISC-V open source instruction set architecture, the first open source architecture designed to be used in modern computing environments such as cloud computing, mobile devices, and other embedded systems. With the post-PC era now upon us, Computer Organization and Design moves forward to explore this generational change with examples, exercises, and material highlighting the emergence of mobile computing and the Cloud. Updated content featuring tablet computers, Cloud infrastructure, and the x86 (cloud computing) and ARM (mobile computing devices) architectures is included. An online companion Web site provides advanced content for further study, appendices, glossary, references, and recommended reading. - Features RISC-V, the first such architecture designed to be used in modern computing environments, such as cloud computing, mobile devices, and other embedded systems - Includes relevant examples, exercises, and material highlighting the emergence of mobile computing and the cloud

Technology & Engineering

Computer Architecture Techniques for Power Efficiency

Book Details:

Author : Stefanos Kaxiras
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031017218
Pages : 207 pages

Download or read book Computer Architecture Techniques for Power Efficiency written by Stefanos Kaxiras and published by Springer Nature. This book was released on 2022-06-01 with total page 207 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time, architects have been successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these costs is the inexorable increase in power dissipation and power density in processors. Power dissipation issues have catalyzed new topic areas in computer architecture, resulting in a substantial body of work on more power-efficient architectures. Power dissipation coupled with diminishing performance gains, was also the main cause for the switch from single-core to multi-core architectures and a slowdown in frequency increase. This book aims to document some of the most important architectural techniques that were invented, proposed, and applied to reduce both dynamic power and static power dissipation in processors and memory hierarchies. A significant number of techniques have been proposed for a wide range of situations and this book synthesizes those techniques by focusing on their common characteristics. Table of Contents: Introduction / Modeling, Simulation, and Measurement / Using Voltage and Frequency Adjustments to Manage Dynamic Power / Optimizing Capacitance and Switching Activity to Reduce Dynamic Power / Managing Static (Leakage) Power / Conclusions

Computers

Computer Organization and Design ARM Edition

Book Details:

Author : David A. Patterson
Publisher : Morgan Kaufmann
Release : 2016-05-06
ISBN : 0128018356
Pages : 724 pages

Download or read book Computer Organization and Design ARM Edition written by David A. Patterson and published by Morgan Kaufmann. This book was released on 2016-05-06 with total page 724 pages. Available in PDF, EPUB and Kindle. Book excerpt: The new ARM Edition of Computer Organization and Design features a subset of the ARMv8-A architecture, which is used to present the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies, and I/O. With the post-PC era now upon us, Computer Organization and Design moves forward to explore this generational change with examples, exercises, and material highlighting the emergence of mobile computing and the Cloud. Updated content featuring tablet computers, Cloud infrastructure, and the ARM (mobile computing devices) and x86 (cloud computing) architectures is included. An online companion Web site provides links to a free version of the DS-5 Community Edition (a free professional quality tool chain developed by ARM), as well as additional advanced content for further study, appendices, glossary, references, and recommended reading. - Covers parallelism in depth with examples and content highlighting parallel hardware and software topics - Features the Intel Core i7, ARM Cortex-A53, and NVIDIA Fermi GPU as real-world examples throughout the book - Adds a new concrete example, "Going Faster," to demonstrate how understanding hardware can inspire software optimizations that improve performance by 200X - Discusses and highlights the "Eight Great Ideas" of computer architecture: Performance via Parallelism; Performance via Pipelining; Performance via Prediction; Design for Moore's Law; Hierarchy of Memories; Abstraction to Simplify Design; Make the Common Case Fast; and Dependability via Redundancy. - Includes a full set of updated exercises

Technology & Engineering

Chip Multiprocessor Architecture

Book Details:

Author : Kunle Olukotun
Publisher : Morgan & Claypool Publishers
Release : 2007-12-01
ISBN : 1598291238
Pages : 154 pages

Download or read book Chip Multiprocessor Architecture written by Kunle Olukotun and published by Morgan & Claypool Publishers. This book was released on 2007-12-01 with total page 154 pages. Available in PDF, EPUB and Kindle. Book excerpt: Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because it is only possible to extract a limited amount of parallelism from a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems. Compounding these problems is the simple fact that with the immense numbers of transistors available on today's microprocessor chips, it is too costly to design and debug ever-larger processors every year or two. CMPs avoid these problems by filling up a processor die with multiple, relatively simpler processor cores instead of just one huge core. The exact size of a CMP's cores can vary from very simple pipelines to moderately complex superscalar processors, but once a core has been selected the CMP's performance can easily scale across silicon process generations simply by stamping down more copies of the hard-to-design, high-speed processor core in each successive chip generation. In addition, parallel code execution, obtained by spreading multiple threads of execution across the various cores, can achieve significantly higher performance than would be possible using only a single core. While parallel threads are already common in many useful workloads, there are still important workloads that are hard to divide into parallel threads. The low inter-processor communication latency between the cores in a CMP helps make a much wider range of applications viable candidates for parallel execution than was possible with conventional, multi-chip multiprocessors; nevertheless, limited parallelism in key applications is the main factor limiting acceptance of CMPs in some types of systems. After a discussion of the basic pros and cons of CMPs when they are compared with conventional uniprocessors, this book examines how CMPs can best be designed to handle two radically different kinds of workloads that are likely to be used with a CMP: highly parallel, throughput-sensitive applications at one end of the spectrum, and less parallel, latency-sensitive applications at the other. Throughput-sensitive applications, such as server workloads that handle many independent transactions at once, require careful balancing of all parts of a CMP that can limit throughput, such as the individual cores, on-chip cache memory, and off-chip memory interfaces. Several studies and example systems, such as the Sun Niagara, that examine the necessary tradeoffs are presented here. In contrast, latency-sensitive applications - many desktop applications fall into this category - require a focus on reducing inter-core communication latency and applying techniques to help programmers divide their programs into multiple threads as easily as possible. This book discusses many techniques that can be used in CMPs to simplify parallel programming, with an emphasis on research directions proposed at Stanford University. To illustrate the advantages possible with a CMP using a couple of solid examples, extra focus is given to thread-level speculation (TLS), a way to automatically break up nominally sequential applications into parallel threads on a CMP, and transactional memory. This model can greatly simplify manual parallel programming by using hardware - instead of conventional software locks - to enforce atomic code execution of blocks of instructions, a technique that makes parallel coding much less error-prone. Contents: The Case for CMPs / Improving Throughput / Improving Latency Automatically / Improving Latency using Manual Parallel Programming / A Multicore World: The Future of CMPs

Computer architecture

Chip Multiprocessor Architecture

Book Details:

Author : Oyekunle Ayinde Olukotun
Publisher : Morgan & Claypool Publishers
Release : 2007
ISBN : 159829122X
Pages : 155 pages

Download or read book Chip Multiprocessor Architecture written by Oyekunle Ayinde Olukotun and published by Morgan & Claypool Publishers. This book was released on 2007 with total page 155 pages. Available in PDF, EPUB and Kindle. Book excerpt: Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because it is only possible to extract a limited amount of parallelism from a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems. After a discussion of the basic pros and cons of CMPs when they are compared with conventional uniprocessors, this book examines how CMPs can best be designed to handle two radically different kinds of workloads that are likely to be used with a CMP: highly parallel, throughput-sensitive applications at one end of the spectrum, and less parallel, latency-sensitive applications at the other. Throughput-sensitive applications, such as server workloads that handle many independent transactions at once, require careful balancing of all parts of a CMP that can limit throughput, such as the individual cores, on-chip cache memory, and off-chip memory interfaces. Several studies and example systems, such as the Sun Niagara, that examine the necessary tradeoffs are presented here. In contrast, latency-sensitive applications - many desktop applications fall into this category - require a focus on reducing inter-core communication latency and applying techniques to help programmers divide their programs into multiple threads as easily as possible. This book discusses many techniques that can be used in CMPs to simplify parallel programming, with an emphasis on research directions proposed at Stanford University. To illustrate the advantages possible with a CMP using a couple of solid examples, extra focus is given to thread-level speculation (TLS), a way to automatically break up nominally sequential applications into parallel threads on a CMP, and transactional memory. This model can greatly simplify manual parallel programming by using hardware - instead of conventional software locks - to enforce atomic code execution of blocks of instructions, a technique that makes parallel coding much less error-prone. Book jacket.

Computers

Computer Architecture Techniques for Power efficiency

Book Details:

Author : Stefanos Kaxiras
Publisher : Morgan & Claypool Publishers
Release : 2008
ISBN : 1598292080
Pages : 220 pages

Download or read book Computer Architecture Techniques for Power efficiency written by Stefanos Kaxiras and published by Morgan & Claypool Publishers. This book was released on 2008 with total page 220 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time, architects have been successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these costs is the inexorable increase in power dissipation and power density in processors. Power dissipation issues have catalyzed new topic areas in computer architecture, resulting in a substantial body of work on more power-efficient architectures. Power dissipation coupled with diminishing performance gains, was also the main cause for the switch from single-core to multi-core architectures and a slowdown in frequency increase. This book aims to document some of the most important architectural techniques that were invented, proposed, and applied to reduce both dynamic power and static power dissipation in processors and memory hierarchies. A significant number of techniques have been proposed for a wide range of situations and this book synthesizes those techniques by focusing on their common characteristics.

Technology & Engineering

Power Aware Design Methodologies

Book Details:

Author : Massoud Pedram
Publisher : Springer Science & Business Media
Release : 2007-05-08
ISBN : 0306481391
Pages : 533 pages

Download or read book Power Aware Design Methodologies written by Massoud Pedram and published by Springer Science & Business Media. This book was released on 2007-05-08 with total page 533 pages. Available in PDF, EPUB and Kindle. Book excerpt: Power Aware Design Methodologies was conceived as an effort to bring all aspects of power-aware design methodologies together in a single document. It covers several layers of the design hierarchy from technology, circuit logic, and architectural levels up to the system layer. It includes discussion of techniques and methodologies for improving the power efficiency of CMOS circuits (digital and analog), systems on chip, microelectronic systems, wirelessly networked systems of computational nodes and so on. In addition to providing an in-depth analysis of the sources of power dissipation in VLSI circuits and systems and the technology and design trends, this book provides a myriad of state-of-the-art approaches to power optimization and control. The different chapters of Power Aware Design Methodologies have been written by leading researchers and experts in their respective areas. Contributions are from both academia and industry. The contributors have reported the various technologies, methodologies, and techniques in such a way that they are understandable and useful.

Technology & Engineering

A Primer on Hardware Prefetching

Book Details:

Author : Babak Falsafi
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031017439
Pages : 54 pages

Download or read book A Primer on Hardware Prefetching written by Babak Falsafi and published by Springer Nature. This book was released on 2022-06-01 with total page 54 pages. Available in PDF, EPUB and Kindle. Book excerpt: Since the 1970’s, microprocessor-based digital platforms have been riding Moore’s law, allowing for doubling of density for the same area roughly every two years. However, whereas microprocessor fabrication has focused on increasing instruction execution rate, memory fabrication technologies have focused primarily on an increase in capacity with negligible increase in speed. This divergent trend in performance between the processors and memory has led to a phenomenon referred to as the “Memory Wall.” To overcome the memory wall, designers have resorted to a hierarchy of cache memory levels, which rely on the principal of memory access locality to reduce the observed memory access time and the performance gap between processors and memory. Unfortunately, important workload classes exhibit adverse memory access patterns that baffle the simple policies built into modern cache hierarchies to move instructions and data across cache levels. As such, processors often spend much time idling upon a demand fetch of memory blocks that miss in higher cache levels. Prefetching—predicting future memory accesses and issuing requests for the corresponding memory blocks in advance of explicit accesses—is an effective approach to hide memory access latency. There have been a myriad of proposed prefetching techniques, and nearly every modern processor includes some hardware prefetching mechanisms targeting simple and regular memory access patterns. This primer offers an overview of the various classes of hardware prefetchers for instructions and data proposed in the research literature, and presents examples of techniques incorporated into modern microprocessors.

Technology & Engineering

Low Power Processors and Systems on Chips

Book Details:

Author : Christian Piguet
Publisher : CRC Press
Release : 2018-10-03
ISBN : 142003720X
Pages : 392 pages

Download or read book Low Power Processors and Systems on Chips written by Christian Piguet and published by CRC Press. This book was released on 2018-10-03 with total page 392 pages. Available in PDF, EPUB and Kindle. Book excerpt: The power consumption of microprocessors is one of the most important challenges of high-performance chips and portable devices. In chapters drawn from Piguet's recently published Low-Power Electronics Design, this volume addresses the design of low-power microprocessors in deep submicron technologies. It provides a focused reference for specialists involved in systems-on-chips, from low-power microprocessors to DSP cores, reconfigurable processors, memories, ad-hoc networks, and embedded software. Low-Power Processors and Systems on Chips is organized into three broad sections for convenient access. The first section examines the design of digital signal processors for embedded applications and techniques for reducing dynamic and static power at the electrical and system levels. The second part describes several aspects of low-power systems on chips, including hardware and embedded software aspects, efficient data storage, networks-on-chips, and applications such as routing strategies in wireless RF sensing and actuating devices. The final section discusses embedded software issues, including details on compilers, retargetable compilers, and coverification tools. Providing detailed examinations contributed by leading experts, Low-Power Processors and Systems on Chips supplies authoritative information on how to maintain high performance while lowering power consumption in modern processors and SoCs. It is a must-read for anyone designing modern computers or embedded systems.

Computers

Computer System Design

Book Details:

Author : Michael J. Flynn
Publisher : John Wiley & Sons
Release : 2011-08-08
ISBN : 1118009916
Pages : 271 pages

Download or read book Computer System Design written by Michael J. Flynn and published by John Wiley & Sons. This book was released on 2011-08-08 with total page 271 pages. Available in PDF, EPUB and Kindle. Book excerpt: The next generation of computer system designers will be less concerned about details of processors and memories, and more concerned about the elements of a system tailored to particular applications. These designers will have a fundamental knowledge of processors and other elements in the system, but the success of their design will depend on the skills in making system-level tradeoffs that optimize the cost, performance and other attributes to meet application requirements. This book provides a new treatment of computer system design, particularly for System-on-Chip (SOC), which addresses the issues mentioned above. It begins with a global introduction, from the high-level view to the lowest common denominator (the chip itself), then moves on to the three main building blocks of an SOC (processor, memory, and interconnect). Next is an overview of what makes SOC unique (its customization ability and the applications that drive it). The final chapter presents future challenges for system design and SOC possibilities.

Technology & Engineering

Designing Embedded Processors

Book Details:

Author : Jörg Henkel
Publisher : Springer Science & Business Media
Release : 2007-07-27
ISBN : 1402058691
Pages : 551 pages

Download or read book Designing Embedded Processors written by Jörg Henkel and published by Springer Science & Business Media. This book was released on 2007-07-27 with total page 551 pages. Available in PDF, EPUB and Kindle. Book excerpt: To the hard-pressed systems designer this book will come as a godsend. It is a hands-on guide to the many ways in which processor-based systems are designed to allow low power devices. Covering a huge range of topics, and co-authored by some of the field’s top practitioners, the book provides a good starting point for engineers in the area, and to research students embarking upon work on embedded systems and architectures.

Performance and Power Optimizations in Chip Multiprocessors for Throughput aware Computation

Book Details:

Author : Augusto J. Vega
Publisher :
Release : 2014
ISBN :
Pages : 155 pages

Download or read book Performance and Power Optimizations in Chip Multiprocessors for Throughput aware Computation written by Augusto J. Vega and published by . This book was released on 2014 with total page 155 pages. Available in PDF, EPUB and Kindle. Book excerpt: The so-called "power (or power density) wall" has caused core frequency (and single-thread performance) to slow down, giving rise to the era of multi-core/multi-thread processors. For example, the IBM POWER4 processor, released in 2001, incorporated two single-thread cores into the same chip. In 2010, IBM released the POWER7 processor with eight 4-thread cores in the same chip, for a total capacity of 32 execution contexts. The ever increasing number of cores and threads gives rise to new opportunities and challenges for software and hardware architects. At software level, applications can benefit from the abundant number of execution contexts to boost throughput. But this challenges programmers to create highly-parallel applications and operating systems capable of scheduling them correctly. At hardware level, the increasing core and thread count puts pressure on the memory interface, because memory bandwidth grows at a slower pace --phenomenon known as the "bandwidth (or memory) wall". In addition to memory bandwidth issues, chip power consumption rises due to manufacturers' difficulty to lower operating voltages sufficiently every processor generation. This thesis presents innovations to improve bandwidth and power consumption in chip multiprocessors (CMPs) for throughput-aware computation: a bandwidth-optimized last-level cache (LLC), a bandwidth-optimized vector register file, and a power/performance-aware thread placement heuristic. In contrast to state-of-the-art LLC designs, our organization avoids data replication and, hence, does not require keeping data coherent. Instead, the address space is statically distributed all over the LLC (in a fine-grained interleaving fashion). The absence of data replication increases the cache effective capacity, which results in better hit rates and higher bandwidth compared to a coherent LLC. We use double buffering to hide the extra access latency due to the lack of data replication. The proposed vector register file is composed of thousands of registers and organized as an aggregation of banks. We leverage such organization to attach small special-function "local computation elements" (LCEs) to each bank. This approach --referred to as the "processor-in-regfile" (PIR) strategy-- overcomes the limited number of register file ports. Because each LCE is a SIMD computation element and all of them can proceed concurrently, the PIR strategy constitutes a highly-parallel super-wide-SIMD device (ideal for throughput-aware computation). Finally, we present a heuristic to reduce chip power consumption by dynamically placing software (application) threads across hardware (physical) threads. The heuristic gathers chip-level power and performance information at runtime to infer characteristics of the applications being executed. For example, if an application's threads share data, the heuristic may decide to place them in fewer cores to favor inter-thread data sharing and communication. In such case, the number of active cores decreases, which is a good opportunity to switch off the unused cores to save power. It is increasingly harder to find bulletproof (micro-)architectural solutions for the bandwidth and power scalability limitations in CMPs. Consequently, we think that architects should attack those problems from different flanks simultaneously, with complementary innovations. This thesis contributes with a battery of solutions to alleviate those problems in the context of throughput-aware computation: 1) proposing a bandwidth-optimized LLC; 2) proposing a bandwidth-optimized register file organization; and 3) proposing a simple technique to improve power-performance efficiency.

Technology & Engineering

The Memory System

Book Details:

Author : Bruce Jacob
Publisher : Morgan & Claypool Publishers
Release : 2009-07-08
ISBN : 1598295888
Pages : 77 pages

Download or read book The Memory System written by Bruce Jacob and published by Morgan & Claypool Publishers. This book was released on 2009-07-08 with total page 77 pages. Available in PDF, EPUB and Kindle. Book excerpt: Today, computer-system optimization, at both the hardware and software levels, must consider the details of the memory system in its analysis; failing to do so yields systems that are increasingly inefficient as those systems become more complex. This lecture seeks to introduce the reader to the most important details of the memory system; it targets both computer scientists and computer engineers in industry and in academia. Roughly speaking, computer scientists are the users of the memory system and computer engineers are the designers of the memory system. Both can benefit tremendously from a basic understanding of how the memory system really works: the computer scientist will be better equipped to create algorithms that perform well and the computer engineer will be better equipped to design systems that approach the optimal, given the resource limitations. Currently, there is consensus among architecture researchers that the memory system is "the bottleneck," and this consensus has held for over a decade. Somewhat inexplicably, most of the research in the field is still directed toward improving the CPU to better tolerate a slow memory system, as opposed to addressing the weaknesses of the memory system directly. This lecture should get the bulk of the computer science and computer engineering population up the steep part of the learning curve. Not every CS/CE researcher/developer needs to do work in the memory system, but, just as a carpenter can do his job more efficiently if he knows a little of architecture, and an architect can do his job more efficiently if he knows a little of carpentry, giving the CS/CE worlds better intuition about the memory system should help them build better systems, both software and hardware. Table of Contents: Primers / It Must Be Modeled Accurately / ...\ and It Will Change Soon