Paul Brett - Published Papers & Patents

Posts

The Forgotten ‘Uncore’: On the Energy-Efficiency of Heterogeneous Cores

Abstract Heterogeneous multicore processors (HMPs), consisting of cores with different performance/power characteristics, have been proposed to deliver higher energy efficiency than symmetric multicores. This paper investigates the opportunities and limitations in using HMPs to gain energy-efficiency. Unlike previous work focused on server systems, we focus on the client workloads typically seen in modern end-user devices. Further, beyond considering core power usage, we also consider the ‘uncore’ subsystem shared by all cores, which in modern platforms, is an increasingly important contributor to total SoC power....

HeteroMates: Providing High Dynamic Power Range on Client Devices using Heterogeneous Core Groups

Abstract This paper presents HeteroMates, a solution that uses heterogeneous processors to extend the dynamic power/performance range of client devices. By using a mix of different processors, HeteroMates offers both high performance and reduced power consumption. The solution uses core groups as the abstraction that groups a small number of heterogeneous cores to form a single execution unit. Group heterogeneity is exposed as multiple heterogeneity (H) states, an interface similar to the P-state interface already used for frequency scaling....

QuickIA: Exploring Heterogeneous Architectures on Real Prototypes

Abstract Over the last decade, homogeneous multi-core processors emerged and became the de-facto approach for offering high parallelism, high performance and scalability for a wide range of platforms. We are now at an interesting juncture where several critical factors (smaller form factor devices, power challenges, need for specialization, etc) are guiding architects to consider heterogeneous chips and platforms for the next decade and beyond. Exploring heterogeneous architectures is challenging since it involves re-evaluating architecture options, OS implications and application development....

Extending the Dynamic Power Range of Client Devices using Heterogeneous Processors

Abstract The ubiquity of handhelds is causing an unprecedented increase in the range of performance demands imposed on mobile platforms, and at the same time, battery life and energy efficiency remain critical concerns. Yet modern processors are typically designed to meet only one, not both, of these two conflicting goals: to offer high performance vs. provide power savings. This work explores an approach in which heterogeneous processors, i.e., a mix of different cores, are used to extend the dynamic power/performance range of client devices....

Access: Smart Scheduling for Asymmetric Cache CMPs

Abstract In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require significant amount of cache space. In order to take advantage of this phenomenon, we explore the use of asymmetric last-level caches in a CMP platform....

Bridging functional heterogeneity in multicore architectures

Abstract Heterogeneous processors that mix big high performance cores with small low power cores promise excellent single-threaded performance coupled with high multi-threaded throughput and higher performance-per-watt. A significant portion of the commercial multicore heterogeneous processors are likely to have a common instruction set architecture (ISA). However, due to limited design resources and goals, each core is likely to contain ISA extensions not yet implemented in the other core. Therefore, such heterogeneous processors will have inherent functional asymmetry at the ISA level and face significant software challenges....

Hardware Support for Cross-Layer PMU Arbitration

Abstract Intel processors offer PerfMon, a set of hardware events and counters that may be programmed in a number of ways for a variety of uses. Traditionally used for application optimization, we are seeing novel nascent uses throughout the software stack: in operating systems, virtualization hypervisors, and even BIOS firmware. Conflict for these counters has already been observed, and is likely to worsen. We posit the need for hardware features to allow “reservation” of and exclusive access to hardware counters, and describe a prototype system to solve the problem....

The 48-core SCC processor: the programmer's view

Abstract The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures....

Operating System Support for Overlapping-ISA Heterogeneous Multi-core Architectures

Abstract A heterogeneous processor consists of cores that are asymmetric in performance and functionality. Such a design provides a cost-effective solution for processor manufacturers to continuously improve both single-thread performance and multi-thread throughput. This design, however, faces significant challenges in the operating system, which traditionally assumes only homogeneous hardware. This paper presents a comprehensive study of OS support for heterogeneous architectures in which cores have asymmetric performance and overlapping, but non-identical instruction sets....

Operating System Support for Shared-ISA Asymmetric Multi-core Architectures

Abstract Current trends in multi-core processor implementation scale by duplicating a single core design many times in a package; however, this approach can cause inefficient utilization of resources, such as die space and power. Recent research has proposed asymmetric cores as an alternative solution. This paper explores the design space for asymmetric multi-core architectures, and presents a case study and prototype of one design in which cores implement overlapping, but nonidentical instruction sets....