Prof. Mattan Erez Receives NSF CAREER Award
Professor Mattan Erez has received an NSF CAREER Award for his research on "Architectural Mechanisms for Cooperative Reliability." Dr. Erez's research focuses on hardware and software techniques, as well as programming model considerations, for enabling flexible and dynamic soft error protection.
The goal of the research is to enable cooperative protection schemes and maximize efficiency by blurring the lines between the hardware and software control of soft error protection. Professor Erez will investigate techniques that allow the hardware designer, software system, and programmer to make optimal decisions for their usage model. The aim is to make error tolerance a first-class optimization option in order to allow users to "pay for the error tolerance they need, rather than overpay for what they might need." Applicable scenarios will be studied and analyzed to gauge the advantages to overall system design, effective memory capacity, and performance and power. Professor Erez will also develop methods for the programmer to express the trade-off between protection and hardware resources explicitly, yet abstractly and intuitively. The approach is based on a new mechanism called "limited guaranteed precision," which allows a programmer to selectively protect costly or sensitive portions of a computation.
To achieve long-term impact, the project will develop materials that train developers to realize the benefits of treating reliability as a first-class application property. The education plan revolves around course modules, problems, and demonstrations at levels ranging from popular talks and mini-talks suitable for middle-school and high-school students, through lower- and upper-division undergraduate course modules, to in-depth graduate-level study. The project will also introduce students who are not computer scientists or engineers to scientific computing and systems and train them, thereby increasing US high-end computing competitiveness. The outcome of this research can impact related fields, industry, and society at large by maintaining advances in computational tools for science and engineering.