Paul Brett - Published Papers & Patents

QuickIA: Exploring Heterogeneous Architectures on Real Prototypes

Abstract Over the last decade, homogeneous multi-core processors emerged and became the de facto approach for offering high parallelism, high performance and scalability for a wide range of platforms. We are now at an interesting juncture where several critical factors (smaller form factor devices, power challenges, need for specialization, etc.) are guiding architects to consider heterogeneous chips and platforms for the next decade and beyond. Exploring heterogeneous architectures is challenging since it involves re-evaluating architecture options, OS implications and application development....

Extending the Dynamic Power Range of Client Devices using Heterogeneous Processors

Abstract The ubiquity of handhelds is causing an unprecedented increase in the range of performance demands imposed on mobile platforms, and at the same time, battery life and energy efficiency remain critical concerns. Yet modern processors are typically designed to meet only one of two conflicting goals: offering high performance or providing power savings. This work explores an approach in which heterogeneous processors, i.e., a mix of different cores, are used to extend the dynamic power/performance range of client devices....
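One common way to exploit a mix of big and small cores is to move a workload between them as demand changes. The following is a hypothetical illustration of such a policy, not the mechanism evaluated in the paper: `choose_core`, `simulate`, and both utilization thresholds are assumptions. Using separate up and down thresholds (hysteresis) keeps the workload from bouncing between cores when utilization hovers near a single cut-off.

```python
# Hypothetical big/small core switching policy with hysteresis.
# The thresholds are illustrative assumptions, not values from the paper.

UP_THRESHOLD = 0.85    # move to the big core when the small core is this busy
DOWN_THRESHOLD = 0.30  # move back to the small core when the big core is this idle


def choose_core(current: str, utilization: float) -> str:
    """Return "big" or "small" given the current core and its utilization (0..1)."""
    if current == "small" and utilization >= UP_THRESHOLD:
        return "big"
    if current == "big" and utilization <= DOWN_THRESHOLD:
        return "small"
    # Between the two thresholds the workload stays where it is.
    return current


def simulate(samples: list[float], start: str = "small") -> list[str]:
    """Apply choose_core to a sequence of utilization samples; return the core per step."""
    core = start
    trace = []
    for u in samples:
        core = choose_core(core, u)
        trace.append(core)
    return trace
```

For example, `simulate([0.2, 0.9, 0.6, 0.5, 0.25, 0.5])` yields `['small', 'big', 'big', 'big', 'small', 'small']`: the workload moves up at 0.9 and returns to the small core only once utilization falls to 0.25.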

Access: Smart Scheduling for Asymmetric Cache CMPs

Abstract In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require a significant amount of cache space. In order to take advantage of this phenomenon, we explore the use of asymmetric last-level caches in a CMP platform....
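The excerpt does not spell out the scheduling policy. As a hypothetical sketch, assuming each application has a measured cache-sensitivity score (for instance, how much its miss rate drops when given more last-level cache), a greedy scheduler could pair the most cache-sensitive applications with the cores that have the largest cache share. The function `place`, its inputs, and the greedy rule are illustrative assumptions, not the paper's scheduler.

```python
def place(apps: dict[str, float], cores: dict[str, int]) -> dict[str, str]:
    """Greedily pair cache-sensitive applications with large-cache cores.

    apps:  application name -> cache sensitivity (higher = benefits more from cache)
    cores: core name -> last-level cache capacity available to that core, in KB
    """
    if len(apps) > len(cores):
        raise ValueError("this sketch places at most one application per core")
    # Most cache-sensitive applications first.
    by_sensitivity = sorted(apps, key=apps.get, reverse=True)
    # Cores with the largest cache share first.
    by_cache = sorted(cores, key=cores.get, reverse=True)
    # zip stops at the shorter list, so any spare cores are simply left unused.
    return dict(zip(by_sensitivity, by_cache))
```

With hypothetical inputs `place({"db": 12.0, "web": 3.5, "batch": 0.2}, {"c0": 512, "c1": 4096, "c2": 1024})`, the result is `{"db": "c1", "web": "c2", "batch": "c0"}`: the application that gains most from cache lands on the core with the largest cache.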

Bridging functional heterogeneity in multicore architectures

Abstract Heterogeneous processors that mix big high performance cores with small low power cores promise excellent single-threaded performance coupled with high multi-threaded throughput and higher performance-per-watt. A significant portion of the commercial multicore heterogeneous processors are likely to have a common instruction set architecture (ISA). However, due to limited design resources and goals, each core is likely to contain ISA extensions not yet implemented in the other core. Therefore, such heterogeneous processors will have inherent functional asymmetry at the ISA level and face significant software challenges....
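One way an OS can bridge such functional asymmetry is fault-and-migrate: a thread runs on any core until it executes an instruction its current core does not implement, and the resulting fault triggers a migration to a core that does. The sketch below models only that placement decision; the core names, the extension sets (including the "avx" label), and `run_instruction` are hypothetical, and it is not presented as the paper's exact mechanism.

```python
# Hypothetical machine: every core implements the base ISA, but only the big
# core implements an optional vector extension (labelled "avx" for illustration).
CORES: dict[str, set[str]] = {
    "big0": {"base", "avx"},
    "small0": {"base"},
    "small1": {"base"},
}


class NoCapableCore(Exception):
    """Raised when no core implements the required extension."""


def run_instruction(core: str, ext: str) -> str:
    """Return the core that ends up executing an instruction needing `ext`."""
    if ext in CORES[core]:
        return core  # supported locally: no fault, no migration
    # Models the fault path: on real hardware an unsupported instruction traps
    # (e.g. an invalid-opcode exception), and the OS migrates the thread to a
    # core whose extension set includes `ext`.
    for candidate, extensions in CORES.items():
        if ext in extensions:
            return candidate
    raise NoCapableCore(ext)
```

A real implementation would also have to decide when to migrate the thread back to a small core, since staying on the big core after the extension is no longer in use wastes power.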

Hardware Support for Cross-Layer PMU Arbitration

Abstract Intel processors offer PerfMon, a set of hardware events and counters that may be programmed in a number of ways for a variety of uses. Although these counters have traditionally been used for application optimization, we are seeing novel, nascent uses throughout the software stack: in operating systems, virtualization hypervisors, and even BIOS firmware. Conflict over these counters has already been observed and is likely to worsen. We posit the need for hardware features to allow “reservation” of and exclusive access to hardware counters, and describe a prototype system to solve the problem....
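A minimal sketch of what counter "reservation" could look like, assuming a table that records which software layer owns each counter. The abstract places enforcement in hardware; here the class `PmuArbiter`, the layer names, and the four-counter default are hypothetical, and ownership is only tracked in software to illustrate the exclusive-access semantics.

```python
from typing import Optional


class CounterBusy(Exception):
    """Raised when a counter is already reserved by another layer."""


class PmuArbiter:
    """Tracks which software layer owns each performance counter (hypothetical)."""

    def __init__(self, num_counters: int = 4) -> None:
        # owner[i] is the layer holding counter i, or None if the counter is free.
        self.owner: list[Optional[str]] = [None] * num_counters

    def reserve(self, counter: int, layer: str) -> None:
        """Give `layer` exclusive use of `counter`; re-reserving by the owner is a no-op."""
        current = self.owner[counter]
        if current is not None and current != layer:
            raise CounterBusy(f"counter {counter} is held by {current}")
        self.owner[counter] = layer

    def release(self, counter: int, layer: str) -> None:
        """Free `counter`; only the layer that reserved it may release it."""
        if self.owner[counter] != layer:
            raise PermissionError(f"{layer} does not own counter {counter}")
        self.owner[counter] = None

    def free_counters(self) -> list[int]:
        """Indices of counters that no layer currently owns."""
        return [i for i, o in enumerate(self.owner) if o is None]
```

For example, after a hypervisor reserves counter 0 and the OS reserves counter 1, an application asking for counter 0 gets `CounterBusy` and can fall back to one of `free_counters()` (here `[2, 3]`) instead of silently corrupting another layer's measurements.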

The 48-core SCC processor: the programmer's view

Abstract The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures....

Operating System Support for Overlapping-ISA Heterogeneous Multi-core Architectures

Abstract A heterogeneous processor consists of cores that are asymmetric in performance and functionality. Such a design provides a cost-effective solution for processor manufacturers to continuously improve both single-thread performance and multi-thread throughput. This design, however, faces significant challenges in the operating system, which traditionally assumes only homogeneous hardware. This paper presents a comprehensive study of OS support for heterogeneous architectures in which cores have asymmetric performance and overlapping, but non-identical instruction sets....

Operating System Support for Shared-ISA Asymmetric Multi-core Architectures

Abstract Current trends in multi-core processor implementation scale by duplicating a single core design many times in a package; however, this approach can cause inefficient utilization of resources, such as die space and power. Recent research has proposed asymmetric cores as an alternative solution. This paper explores the design space for asymmetric multi-core architectures, and presents a case study and prototype of one design in which cores implement overlapping, but nonidentical instruction sets....

Using OS Observations to Improve Performance in Multicore Systems

Abstract Today's operating systems don't adequately handle the complexities of multicore processors. Architectural features confound existing OS techniques for task scheduling, load balancing, and power management. This article shows that the OS can use data obtained from dynamic runtime observation of task behavior to ameliorate performance variability and more effectively exploit multicore processor resources. The authors' research prototypes demonstrate the utility of observation-based policy. Authors (Intel Corporation): Rob Knauerhase, Paul Brett, Tong Li, Barbara Hohlt, Scott Hahn...
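As an illustration of observation-based policy, assume the OS samples each task's last-level cache misses (say, per thousand instructions) at run time. A simple placement rule can then spread memory-intensive tasks across cores: sort tasks by observed miss rate and assign each to the core with the smallest accumulated miss rate, a greedy heuristic in the style of longest-processing-time scheduling. This is a hypothetical sketch; `balance` and the choice of metric are assumptions rather than the authors' prototypes.

```python
import heapq


def balance(miss_rates: dict[str, float], num_cores: int) -> list[list[str]]:
    """Assign tasks to per-core run queues so observed miss rates are spread evenly.

    miss_rates: task name -> observed cache miss rate (higher = more memory-intensive)
    Returns one list of task names per core.
    """
    # Min-heap of (accumulated miss rate, core index).
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    queues: list[list[str]] = [[] for _ in range(num_cores)]
    # Place the most memory-intensive tasks first.
    for task in sorted(miss_rates, key=miss_rates.get, reverse=True):
        load, core = heapq.heappop(heap)  # least-loaded core so far
        queues[core].append(task)
        heapq.heappush(heap, (load + miss_rates[task], core))
    return queues
```

With hypothetical observations `{"a": 9.0, "b": 7.0, "c": 2.0, "d": 1.0}` on two cores, `balance` returns `[["a", "d"], ["b", "c"]]`, keeping the two heaviest tasks apart so they do not contend for the same cache.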

An Analysis of Performance Interference Effects in Virtual Environments

Abstract Virtualization is an essential technology in modern datacenters. Despite advantages such as security isolation, fault isolation, and environment isolation, current virtualization techniques do not provide effective performance isolation between virtual machines (VMs). Specifically, hidden contention for physical resources impacts performance differently in different workload configurations, causing significant variance in observed system throughput. To this end, characterizing workloads that generate performance interference is important in order to maximize overall utility....
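One common starting point for characterizing interference is a slowdown ratio: a workload's runtime when co-located with another VM divided by its runtime in isolation. The sketch below computes that ratio for every victim/aggressor pair and reports each victim's most harmful neighbor. The function names and the runtimes used in the example are hypothetical placeholders, not measurements from the paper.

```python
def slowdown_matrix(isolated: dict[str, float],
                    paired: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Slowdown of each victim when co-located with each aggressor.

    isolated: workload -> runtime (seconds) when running alone
    paired:   (victim, aggressor) -> victim's runtime when sharing the host
    A value of 1.0 means no interference; 1.5 means the victim ran 50% longer.
    """
    return {pair: runtime / isolated[pair[0]] for pair, runtime in paired.items()}


def worst_neighbor(victim: str, slowdowns: dict[tuple[str, str], float]) -> str:
    """Aggressor that causes the largest slowdown for `victim`."""
    candidates = {agg: s for (v, agg), s in slowdowns.items() if v == victim}
    return max(candidates, key=candidates.get)
```

For instance, with isolated runtimes `{"web": 10.0, "db": 20.0}` and a paired run where "web" takes 13.0 s next to "db", the slowdown for that pair is 1.3; a placement policy could use such a matrix to avoid co-locating the most harmful pairs.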