Complex silicon devices increasingly control critical systems where safety and reliability are key concerns. Silicon technology is subject to numerous failure modes, which can be broadly classified into soft-error effects (due to natural radiation) and lifetime effects (e.g. electromigration, NBTI, HCI), the latter resulting from semi-permanent changes in circuit or transistor parameters. A large body of academic work analyzes these effects in isolation and proposes specific mitigation schemes. In industry, however, analyzing the reliability of a system requires considering all of these failure modes together, including how they propagate through the system and produce user-visible effects. There are no consistent tools or methodologies to address this problem. As a result, reliability analysis in industry is often done under pessimistic assumptions, by simply summing failure rates in relatively basic spreadsheets. Current methodologies cannot cope with the diversity of technology failure modes, increased design sizes and the complex relationships between consumers and suppliers of electronic components.
The root cause of the problem is that there is no standard format for sharing information about the reliability of silicon components. Individual companies develop internal models for specific classes of failures, but these cannot be shared across the industry. This makes it difficult for system integrators to accurately characterize the failures of a silicon-intensive system. This lack of consistency is an impediment to design automation and has been highlighted by the International Technology Roadmap for Semiconductors:
“The expertise required to drive reliability engineering closer to product design requires the use of advanced engineering CAD tools. Although modeling and simulation support have been used to a limited degree, support is generally sporadic or inconsistent and does not provide the focus required for an effective reliability engineering”
“Another challenge is the lack of commonality in methodologies and models being used in DFR tools... EDA companies do not have consensus for DFR requirements and specifications that drive toward flexible tools, which might appeal to a broad customer base.”
The RIIF proposal is a new standardized modeling language that enables the creation of general reliability models for silicon devices. The language can model both transient and permanent errors at the cell level and makes it possible to combine these low-level models into models for complex components such as micro-controllers, SoCs and even full systems. A standardized format for sharing reliability information is a critical first step toward EDA tools that can analyze and optimize the reliability of general-purpose designs.
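To make the idea of combining low-level models concrete, the following is a minimal Python sketch of hierarchical failure-rate composition. It is not RIIF syntax and does not reflect the proposal's actual constructs: the class names (FailureMode, Component), the derating field and all numeric values are hypothetical and chosen only for illustration. The sketch contrasts the simple spreadsheet-style sum of raw failure rates with a composition in which each failure mode is scaled by the fraction of failures assumed to propagate to the component's outputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureMode:
    """One failure mechanism of a cell or block (hypothetical model)."""
    name: str
    fit: float              # raw rate in FIT (failures per 10^9 device-hours)
    derating: float = 1.0   # assumed fraction of raw failures that become visible


@dataclass
class Component:
    """A block with its own failure modes and optional sub-components."""
    name: str
    modes: List[FailureMode] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)

    def raw_fit(self) -> float:
        # Pessimistic view: every raw failure counts, summed over the hierarchy.
        own = sum(m.fit for m in self.modes)
        return own + sum(c.raw_fit() for c in self.children)

    def effective_fit(self) -> float:
        # Derated view: each mode is scaled by its propagation fraction.
        own = sum(m.fit * m.derating for m in self.modes)
        return own + sum(c.effective_fit() for c in self.children)


# Illustrative values only; not measured data.
sram = Component("sram", modes=[FailureMode("soft_error", 1000.0, derating=0.25)])
logic = Component("logic", modes=[FailureMode("electromigration", 50.0)])
soc = Component("soc", children=[sram, logic])

print(f"raw sum:       {soc.raw_fit():.1f} FIT")
print(f"effective sum: {soc.effective_fit():.1f} FIT")
```

Adding the rates assumes the failure modes are independent and that any single failure causes a component failure. A real model would also need to capture redundancy, time-dependent rates and environmental parameters, which this sketch deliberately leaves out.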
A study group consisting of partners from industry and academia has been created under the IEEE TTTC. The group is working to define the modeling needs across a broad industrial base (e.g. automotive, networking, energy, aerospace) and to ensure that the RIIF proposal meets these needs. The group is developing a library of worked examples and standardizing a new language for specifying and modeling the reliability and safety of such complex, silicon-intensive systems. The proposed workshop will play a key role in this process.
The workshop will consist primarily of invited presentations from industry highlighting the challenges associated with modeling the reliability of electronic systems. This will be balanced with a poster session which will highlight the latest research and developments in reliability modeling. A panel session will create debate and will be structured to be lively and to engage the audience.
Approximately 40-50 participants are expected for this workshop: about one third active members of the study group, one third industrial participants at DATE (including individuals already working on the RELY and RESCAR 2.0 projects), and one third DATE participants from academia. Co-locating this workshop with DATE is highly beneficial, as many academic partners working in the field of reliable systems will already be present and the workshop will give them exposure to real-world challenges in reliability modeling.