Paul Brett's Published Papers

The 48-core SCC processor: the programmer's view

The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures. The hardware has been described elsewhere. In this paper, we describe the programmer's view of this chip. In particular we describe RCCE: the native message passing model created for the SCC processor.

Timothy G. Mattson, Rob F. Van der Wijngaart, Michael Riepen, Thomas Lehnig, Paul Brett, Werner Haas, Patrick Kennedy, Jason Howard, Sriram Vangal, Nitin Borkar, Greg Ruhl, Saurabh Dighe
SC10

Operating System Support for Overlapping-ISA Heterogeneous Multi-core Architectures

A heterogeneous processor consists of cores that are asymmetric in performance and functionality. Such a design provides a cost-effective solution for processor manufacturers to continuously improve both single-thread performance and multi-thread throughput. This design, however, faces significant challenges in the operating system, which traditionally assumes only homogeneous hardware. This paper presents a comprehensive study of OS support for heterogeneous architectures in which cores have asymmetric performance and overlapping, but non-identical instruction sets. Our algorithms allow applications to transparently execute and fairly share different types of cores. We have implemented these algorithms in the Linux 2.6.24 kernel and evaluated them on an actual heterogeneous platform. Evaluation results demonstrate that our designs efficiently manage heterogeneous hardware and enable significant performance improvements for a range of applications.

Tong Li, Paul Brett, Rob Knauerhase, David Koufaty, Dheeraj Reddy and Scott Hahn
HPCA 2010 [PDF]

Operating System Support for Shared-ISA Asymmetric Multi-core Architectures

Current trends in multi-core processor implementation scale by duplicating a single core design many times in a package; however, this approach can cause inefficient utilization of resources, such as die space and power. Recent research has proposed asymmetric cores as an alternative solution. This paper explores the design space for asymmetric multi-core architectures, and presents a case study and prototype of one design in which cores implement overlapping, but nonidentical instruction sets.

We propose fault-and-migrate, which enables the OS to manage hardware asymmetries transparently to applications. Our mechanism traps the fault when a core executes an unsupported instruction, migrates the faulting thread to a core that supports the instruction, and allows the OS to migrate it back when load balancing is necessary. We have also developed three approaches to emulate future asymmetric processors using current hardware. Preliminary evaluation shows that fault-and-migrate enables applications to execute transparently and incurs less than 4% overhead for a SPEC CPU2006* benchmark.

Tong Li, Paul Brett, Barbara Hohlt, Rob Knauerhase, Sean D. McElderry, and Scott Hahn
WIOSCA 2008 [PDF]
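
The fault-and-migrate mechanism summarized in the abstract above can be illustrated with a small user-level simulation. This is only a sketch under stated assumptions, not the paper's kernel implementation: the names `Core`, `Thread`, `UnsupportedInstruction`, and `run` are hypothetical, instructions are modeled as plain strings, and migrating a thread back for load balancing is not modeled.

```python
from dataclasses import dataclass


class UnsupportedInstruction(Exception):
    """Stands in for the hardware fault raised on an unsupported instruction."""


@dataclass
class Core:
    name: str
    isa: frozenset  # hypothetical: names of the instructions this core supports

    def execute(self, instr):
        # A real core would raise a hardware fault; here we raise an exception.
        if instr not in self.isa:
            raise UnsupportedInstruction(instr)


@dataclass
class Thread:
    program: list  # sequence of instruction names
    pc: int = 0    # index of the next instruction to execute


def run(thread, cores, start):
    """Run the thread to completion and return the names of the cores it visited."""
    current = start
    visited = [current.name]
    while thread.pc < len(thread.program):
        instr = thread.program[thread.pc]
        try:
            current.execute(instr)
            thread.pc += 1
        except UnsupportedInstruction:
            # Trap the fault and look for a core that supports the instruction.
            target = next((c for c in cores if instr in c.isa), None)
            if target is None:
                raise  # no core can execute it, so the fault is fatal
            current = target  # migrate, then retry the same instruction
            visited.append(current.name)
    return visited


small = Core("small", frozenset({"add", "load"}))
big = Core("big", frozenset({"add", "load", "sse4"}))
thread = Thread(["add", "sse4", "add"])
path = run(thread, [small, big], start=small)
```

In this toy run the thread starts on the small core, faults on the instruction only the big core supports, and finishes on the big core without the program itself changing.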

Using OS Observations to Improve Performance in Multicore Systems

Today's operating systems don't adequately handle the complexities of multicore processors. Architectural features confound existing OS techniques for task scheduling, load balancing, and power management. This article shows that the OS can use data obtained from dynamic runtime observation of task behavior to ameliorate performance variability and more effectively exploit multicore processor resources. The authors' research prototypes demonstrate the utility of observation-based policy.

Rob Knauerhase, Paul Brett, Barbara Hohlt, Tong Li, Scott Hahn
IEEE Micro May/June 2008 [PDF]

An Analysis of Performance Interference Effects in Virtual Environments

Virtualization is an essential technology in modern datacenters. Despite advantages such as security isolation, fault isolation, and environment isolation, current virtualization techniques do not provide effective performance isolation between virtual machines (VMs). Specifically, hidden contention for physical resources impacts performance differently in different workload configurations, causing significant variance in observed system throughput. To this end, characterizing workloads that generate performance interference is important in order to maximize overall utility.

In this paper, we study the effects of performance interference by looking at system-level workload characteristics. On a physical host, we allocate two VMs, each of which runs a sample application chosen from a wide range of benchmark and real-world workloads. For each combination, we collect performance metrics and runtime characteristics using an instrumented Xen hypervisor. Through subsequent analysis of the collected data, we identify clusters of applications that generate certain types of performance interference. Furthermore, we develop mathematical models to predict the performance of a new application from its workload characteristics. Our evaluation shows our techniques were able to predict performance with an average error of approximately 5%.

Younggyun Koh, Rob C. Knauerhase, Paul Brett, Mic Bowman, Zhihua Wen, Calton Pu
ISPASS 2007 [PDF]

Virtualization In The Enterprise

We present how an enterprise IT organization sees virtualization in the enterprise and how it can be applied. We look at key enterprise services and applications used within Intel's IT department and examine the issues associated with virtualizing servers within the context of those services. We demonstrate that virtual machine (VM) isolation does not extend to performance isolation as we show how applications running in separate VMs can significantly interfere with each other. Enterprise services depend on host characteristics like available cycles, platform configurations, and on proximity to other services. We define a taxonomy of these dependencies derived from our study. Next, we describe uses of Intel virtualization technology (Intel VT) that we are investigating. The ability to run multiple operating systems (OSes) is of great interest in our design environment where highly specialized tools are tied closely to OS versions. The ability to checkpoint, suspend, resume, and migrate VMs is very useful when we run long simulations. The ability to allocate VMs at the location of choice opens up other possible use cases, such as network monitoring, security monitoring, and content distribution. We see this capability also enabling safe yet realistic experimentation, as a way to extend virtualization into clients. Finally, we present a real case study applying virtualization to enterprise IT problems. This virtualization program achieved higher server utilization, made it easier to manage datacenter assets, and reduced the consumption of datacenter resources (floor space, power, etc.), as well as simplified server releases through standardization.

Jeff Sedayao, Cheng-Chee Koh, Mic Bowman, Robert Knauerhase, Sanjay Rungta, John Vicente, Julia Palmer, Patrick Fabian, Paul Brett, Justin Richardson
Intel Technical Journal August 2006 [PDF]

Monitoring Internet Connectivity using PlanetLab

This paper explores one company's use of PlanetLab for a real application. Intel Corporation is a global enterprise with many Internet "DMZs" and thousands of customers around the world who use them. Intel needs to monitor the quality of service received through these Internet connections from many parts of the world. Doing this with available commercial services or by implementing monitoring systems in rented data center space across the globe would be expensive and relatively inflexible. PlanetLab presents a relatively inexpensive and flexible platform for global scale monitoring but poses significant challenges in developing, deploying, and managing such a widely distributed application in an environment where node availability and connectivity can change rapidly. We implemented the global DMZ monitor using PlanetLab nodes and the Distributed Service Management Toolkit (DSMT). DSMT provides a way to distribute code for an application and manage it despite node outages, moving the application to geographically appropriate nodes when nodes become unavailable. We position graphs to allow us to correlate data to either geographically local events or Internet-wide events. Connectivity events are propagated using the PSEPR eventing system. Our experience with this implementation has shown that it can detect Internet connectivity problems. Future work includes using different protocols such as HTTP for monitoring and extending DSMT services to monitor other conditions.

Sanjay Rungta, Alex Rentzis, Jeff Sedayao, Robert Adams, Paul Brett
NOMS 2006 [PDF]

Scalable Management

Modern computing environments, such as enterprise data centers, Grids, and PlanetLab, introduce distributed services to address scalability, locality, and reliability. Web Services (WS), in particular, improve decoupling, decentralization, and autonomicity within distributed systems. Unfortunately, scale and decentralization introduce additional problems in distributed services management, such as deployment, monitoring, and lifecycle maintenance.

In this paper, we propose a new approach to management of large scale distributed services, based on three artifacts: scalable publish-subscribe eventing, scalable WS-based deployment, and model-based management. We demonstrate that these techniques improve the manageability of services. In this way we enable service developers to focus on the development of service functionality rather than on management features.

Robert Adams, Paul Brett, Subu Iyer, Dejan S. Milojicic, Sandro Rafaeli, Vanish Talwar
ICAC 2005 [PDF]

A Shared Global Event Propagation And Storage System To Enable Next-Generation Distributed Services

The construction of highly reliable planetary-scale distributed services in the unreliable Internet environment entails significant challenges. Our research focuses on the use of loose binding among service components as a means to deploy distributed services at scale. An event-based publish/subscribe messaging infrastructure is the principal means through which we implement loose binding. A unique property of the messaging infrastructure is that it is built on a collection of off-the-shelf instant messaging servers running on PlanetLab. Using this infrastructure we have successfully constructed long-running services (such as a PlanetLab node status service) with more than 2000 components.

Paul Brett, Rob Knauerhase, Mic Bowman, Robert Adams, Aroon Nataraj, Jeff Sedayao, Michael Spindel
Worlds 2004 [PDF]
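
The loose binding described in the abstract above rests on publish/subscribe messaging: components exchange events through named channels and never call each other directly. The following in-process sketch illustrates only that idea. It is an assumption-laden toy, not the paper's infrastructure, which is built on instant messaging servers running on PlanetLab. The `EventBus` class, the channel name, and the event fields are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for a publish/subscribe event service."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel name -> list of callbacks

    def subscribe(self, channel, callback):
        """Register a callback to receive every event published on the channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, event):
        """Deliver the event to each subscriber and return how many received it."""
        callbacks = list(self._subscribers.get(channel, ()))
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = EventBus()
status_log = []

# The consumer knows only the channel name, not who publishes to it.
bus.subscribe("node-status", status_log.append)

# The publisher knows only the channel name, not who is listening.
delivered = bus.publish("node-status", {"node": "example-node", "state": "up"})
unheard = bus.publish("unused-channel", {"note": "no subscribers"})
```

Because publisher and subscriber share only a channel name, either side can be replaced or restarted without the other being reconfigured, which is the property the abstract calls loose binding.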

Securing the PlanetLab Distributed Testbed: How to Manage Security in an Environment with No Firewalls, with All Users Having Root, and No Direct Physical Control of Any System

PlanetLab is a globally distributed network of hosts designed to support the deployment and evaluation of planetary scale applications. Support for planetary applications development poses several security challenges to the team maintaining PlanetLab. The planetary nature of PlanetLab mandates nodes distributed across the globe, far from the physical control of the team. The application development requirements force every user to have access to the equivalent of root on each machine, and use of firewalls is discouraged. If an account is compromised, PlanetLab administrators need a way to track the actions of users on the nodes. If an entire node is compromised, then the administrators need a way to regain control despite the lack of physical access. Encryption was built into PlanetLab to ensure confidentiality and integrity of system downloads. A special reset packet, combined with keeping a boot CD in the machine, enables PlanetLab system administrators to remotely regain control of machines if they are compromised and return the nodes to a known safe state. The Linux VServer implementation is used to provide root access to PlanetLab users for development purposes while isolating users from each other. A network abstraction layer provides accounting of traffic and allows safe access to raw sockets. These mechanisms have proven very useful in managing PlanetLab. After a compromise of large numbers of PlanetLab hosts, control of the PlanetLab network was regained in 10 minutes. The compromise spawned a review of PlanetLab security, which pointed out a number of flaws. The need for a central site for maintaining PlanetLab was cited as a key weakness. Future work includes distributing the functions of PlanetLab's central administrative database and improving integrity checks.

Paul Brett, Mic Bowman, Jeff Sedayao, Robert Adams, Rob C. Knauerhase, Aaron Klingaman
LISA 2004 [PDF]