SEU Abridged Discussion

<p style="text-align: justify;"><span data-contrast="none">A Single Event Upset (SEU), or more generally a Single Event Effect (SEE), is an effect seen both in space and on the ground: even though a system has passed all quality-related testing and verification, it can still experience random faults that are typically recovered by a reset or power cycle.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">First, you need to answer how much pain it would cause if your application fails. Some more concrete examples follow.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><ul style="text-align: justify;"><li><span data-contrast="none">An AI-powered robotic arm rotates wildly and destroys the assembly line</span></li><li>An autonomous vehicle fails to recognize a pedestrian on the crosswalk and doesn't stop</li><li>An accelerated trading platform buys at full speed while the market crashes</li></ul><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;">The question is not whether this will happen, but when and how often.&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Of course, this is an unavoidable topic if you're already bound by safety standards like ISO 26262 or IEC 61508, or if your customer has already asked you about FIT.</span><span 
data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;<br /><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic1.jpg" width="697" height="235" /></p><p style="text-align: justify;">&nbsp;<br /><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic2.jpg" width="484" height="315" /></p><p style="text-align: justify;"><span data-contrast="none">&nbsp; &nbsp;</span> <span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;&nbsp;</span> <span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><h2 style="text-align: 
justify;"><span style="font-size: 14pt;">Real-World Examples&nbsp;</span></h2><p style="text-align: justify;"><span data-contrast="none">To drive the point home, the following links cover lessons learned that have already cost millions of dollars and/or lives.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><a href="https://www.bbc.com/future/article/20221011-how-space-weather-causes-computer-errors"><span data-contrast="none">BBC feature: SEU is to blame for your electronic woes, and Mr. Gates is vindicated of all the blue screens</span></a><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><a href="https://www.bbc.com/news/business-46999443"><span data-contrast="none">Google Engineer manhandled SEU</span></a><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><a 
href="https://en.wikipedia.org/wiki/Qantas_Flight_72"><span data-contrast="none">The tragedy of Qantas Flight 72</span></a><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><a href="https://en.wikipedia.org/wiki/Electronic_voting_in_Belgium#%3A~%3Atext%3DThe%20official%20explanation%20was%20%22the%2Csystem%20did%20not%20protect%20against"><span data-contrast="none">Belgium's party-biased e-voting machine</span></a><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><a href="https://en.wikipedia.org/wiki/Takata_Corporation#%3A~%3Atext%3DOn%20June%2025%2C%202017%2C%20Takata%2Cwas%20possible%20for%20its%20survival"><span data-contrast="none">Takata&nbsp;</span></a><span data-contrast="none">airbag or&nbsp;</span><a href="https://www.bbc.com/news/world-us-canada-65578475"><span data-contrast="none">Arc automotive recall</span></a><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Of course, the list goes on, with&nbsp;</span><a 
href="https://support.huawei.com/enterprise/de/knowledge/EKB1000085293"><span data-contrast="none">Huawei&nbsp;</span></a><span data-contrast="none">and&nbsp;</span><a href="https://www.networkworld.com/article/3122864/cisco-says-router-bug-could-be-result-of-cosmic-radiation-seriously.html"><span data-contrast="none">Cisco&nbsp;</span></a><span data-contrast="none">both listing SEU as a known issue and as a standard debugging step. It was a common assumption in 1998 that ~20% of all "cannot be reproduced" errors were attributable to SEU. But if this is so serious, why haven't the major IC vendors talked about it?&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Most of these folks take a "you don't ask, I don't tell" approach. This is exactly why high-reliability applications spell out the requirements in specifications such as ISO 26262. 
You can further verify that the major players (or at least some of them) are doing their homework by checking the booking schedules of the various accelerated radiation test facilities.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;">Sources of Radiation&nbsp;</span></h2><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic3.jpg" width="471" height="312" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">The sun emits plenty of radiation, although Earth's magnetic field shields us from most of the harm. On top of that, package material can be a serious contributor as well. As for exactly how the radiation interacts with an IC, I will leave that to the physicists and radiation engineers to worry about. As a designer or engineer, you only need to know it's there and it's real. Then we need to figure out how to deal with it.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">So hopefully, by now, I have convinced you that the effects are real and that some of us need to take care of them. 
Next, we'll dive into the topic of FIT.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;">Failures in Time &ndash; FIT&nbsp;</span></h2><p style="text-align: justify;"><span data-contrast="none">1 FIT = 1 failure in a billion device-hours. In other words, a device rated at 1 FIT fails on average once every 114,155 years or so. Quality engineers will often mention MTTF (Mean Time To Failure) and MTBF (Mean Time Between Failures). Although one error in 114,155 years sounds negligible, an unmitigated processor with a commonly assumed FIT of 50,000 would see a fault every 2.28 years. And that is one single processor; a system will likely have memories and many other ICs. A common rule of thumb is 1,000 FIT per million gates for an unmitigated 90nm ASIC. 
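</span></p><p style="text-align: justify;">The arithmetic above is easy to sanity-check. The following is a minimal Python sketch rather than a qualified reliability tool: the function names are my own, it assumes a constant failure rate and 8,760 hours per year, and the 5-million-gate ASIC is a purely hypothetical example.</p>
<pre>
```python
HOURS_PER_YEAR = 8760            # 365 days x 24 h (assumed convention)
DEVICE_HOURS_PER_FIT = 1e9       # 1 FIT = 1 failure per 10^9 device-hours


def mttf_years(fit):
    """Mean time to failure in years, assuming a constant failure rate."""
    return DEVICE_HOURS_PER_FIT / fit / HOURS_PER_YEAR


def hours_between_fleet_failures(fit_per_system, systems):
    """Average hours between failures anywhere in a fleet of identical systems."""
    return DEVICE_HOURS_PER_FIT / (fit_per_system * systems)


processor_fit = 50_000                  # unmitigated processor, figure from the text
asic_fit = 1_000 * 5                    # hypothetical 5-Mgate ASIC at 1,000 FIT/Mgate
system_fit = processor_fit + asic_fit   # FIT rates of independent parts add up

print(round(mttf_years(1)))                  # 114155
print(round(mttf_years(processor_fit), 2))   # 2.28
print(round(mttf_years(system_fit), 2))      # 2.08
print(hours_between_fleet_failures(processor_fit, 10_000))  # 2.0
```
</pre>
<p style="text-align: justify;">Because the failure rates of independent parts simply add, every extra IC in the system and every extra system in the field shortens the expected time to the next fault.</p><p style="text-align: justify;"><span data-contrast="none">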
And if you expect to build just 10,000 systems, the numbers add up fast, and the accumulated failure rate should be enough to send a chill down any quality engineer's spine.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><p style="text-align: justify;" aria-level="1"><strong>Real Numbers to Put Things into Perspective</strong><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">AMD-Xilinx is the poster child in the SEU field when it comes to doing due diligence and openly publishing the results. We're interested in the Device Reliability Report&nbsp;</span><a href="https://www.xilinx.com/support/documentation/user_guides/ug116.pdf"><span data-contrast="none">UG 116,&nbsp;</span></a><span data-contrast="none">which is updated every six months.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">First, take a look at the figure below. 
Some FIT numbers are already provided.&nbsp;&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">As engineers, it's our job to drive down the FIT during normal life, that is, the region not covered by the bold red letters in the diagram.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic4.jpg" width="478" height="353" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Table 19 should list the various SEU-related FIT numbers for CRAM (configuration RAM). For those unfamiliar with AMD FPGAs, an AMD FPGA is an SRAM-based reconfigurable gate array device, so the user design must first be loaded into the configuration memory for the device to function as expected.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Look at the first three rows, all at the 90nm technology node but spanning four product families. As you can see, the LANSCE neutron cross-section per bit differs from family to family. 
Moreover, the last two columns vary by up to 2.6x between Virtex-4 and Spartan-3E/3A.&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">The lesson here is not to assume that your 90nm design will have the same superb numbers as AMD's. They put in a lot of work to reverse the technology trend and drive down the FIT. Otherwise, with each technology shrink, one would expect that it takes less energy to upset a cell and that a single strike can knock out multiple cells.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">It's also important to note that they're the only major player I know of that provides real-world data. 
If you merely follow the JESD89 conversion, you'll see that there is still quite a gap between the derived number and the real-world number.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic6.jpg" /></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">The main takeaway here is that it's&nbsp;</span>inevitable that an IC will eventually fall to SEU, and the more advanced the node, the worse the problem gets unless you do something about it.&nbsp;</p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;" data-contrast="none">SEU Family Tree</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></h2><p style="text-align: justify;"><span data-contrast="none">The family can be broken down into two categories: </span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><ul style="text-align: justify;"><li data-leveltext="" data-font="Symbol" data-listid="2" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="1" data-aria-level="1"><span data-contrast="none">Hard/destructive&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="2" data-aria-level="1"><span data-contrast="none">Non-destructive</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li></ul><p style="text-align: justify;"><span data-contrast="none">Hard errors are rare but catastrophic, while soft (non-destructive) errors are more common.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none"><img 
style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic7.png" width="517" height="273" /></span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Before we move on, note that this table should be used only as a baseline reference. For example, FPGAs are also vulnerable to SET; it's just that the probability of an SEU is so much higher than that of an SET that we should focus on SEU.&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">In addition, some of the errors will be vendor-specific or even device-specific. 
Do a quick search on FPGA and SEL, and you'll see that not all vendors and parts are vulnerable to SEL.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;" aria-level="1"><span data-contrast="none">Non-Destructive SEE</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">SEU (Single Event Upset): a general upset that results in a functional error</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">MBU (Multiple Bit Upset): any single strike that knocks out more than one bit</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">MCU (Multiple Cell Upset): like MBU, but not limited to bits</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Most ECC deployments are based on SECDED (Single Error Correct, Double Error Detect). 
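</span></p><p style="text-align: justify;">To make the single-correct, double-detect behaviour concrete, here is a toy sketch in Python. It is not any vendor's implementation: it assumes an extended Hamming code over just 4 data bits (production memories typically protect much wider words), and the names <code>encode</code>, <code>decode</code> and <code>upset</code> are my own.</p>
<pre>
```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (extended Hamming code).

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
    word with parity bits at positions 1, 2 and 4.
    """
    data = [(nibble // 2**i) % 2 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = data
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = sum(word[1:]) % 2      # makes the total number of ones even
    return word


def decode(word):
    """Return (status, nibble) the way a simple SECDED decoder would."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position     # XOR of the positions holding a one
    odd_parity = sum(word) % 2 == 1
    if odd_parity:
        # Odd overall parity: the decoder assumes exactly one flipped bit.
        word[syndrome] ^= 1          # syndrome 0 points at the parity bit itself
        status = "corrected"
    elif syndrome != 0:
        status = "double error detected"
    else:
        status = "clean"
    nibble = word[3] + 2 * word[5] + 4 * word[6] + 8 * word[7]
    return status, nibble


def upset(word, positions):
    """Flip the bits at the given positions, as a single strike might."""
    hit = list(word)
    for position in positions:
        hit[position] ^= 1
    return hit


stored = encode(0b1011)                      # 0b1011 is decimal 11
print(decode(stored))                        # ('clean', 11)
print(decode(upset(stored, [5])))            # ('corrected', 11)
print(decode(upset(stored, [5, 6]))[0])      # double error detected
print(decode(upset(stored, [3, 5, 6])))      # ('corrected', 12)
```
</pre>
<p style="text-align: justify;">The last case is the dangerous one: three flipped bits look like a single-bit error, so the decoder reports a successful correction and hands back the wrong data.</p><p style="text-align: justify;"><span data-contrast="none">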
Therefore, aliasing can occur if a single upset flips more than two bits in the same word. Refer to the following generalized diagram; you're safe at 180nm, but at 65nm, ~20% of upsets result in an MBU. </span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">At 40nm, you're close to 40%, and SECDED is effectively useless. It's also true that the results can vary significantly depending on foundry, cell layout, etc., but if you're unaware of this issue, don't expect the device vendors to tell you about it voluntarily.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic8.jpg" width="359" height="256" /></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><strong><span data-contrast="none">SEFI (Single Event Functional Interrupt)</span></strong><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Anything that knocks out the device function 
can be put into this bucket. For example, if an upset knocks out your PCI bridge such that the device drops a packet and no longer responds to subsequent requests, it's a SEFI.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;<br /><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><strong><span data-contrast="none">SET (Single Event Transient)</span></strong><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">A picture is worth a thousand words. 
Try to imagine this glitch on the reset or clock line. On the other hand, a glitch on a data input that doesn't coincide with a clock edge likely isn't going to result in any error.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic9.jpg" width="545" height="134" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><strong><span data-contrast="none">SED (Single Event Disturb)</span></strong><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Effectively just a catch-all for anything not covered by the other family members.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;" aria-level="1"><span data-contrast="none">Destructive Errors</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Most of these errors are self-explanatory by name. 
The common symptom is a damaged device which typically manifests itself with an additional current draw that can't be recovered even after a power cycle.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><ul style="text-align: justify;"><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="3" data-aria-level="1"><span data-contrast="none">SHE (Single Event Hard Error)&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="4" data-aria-level="1"><span data-contrast="none">SEL (Single Event Latch-up)&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li 
data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="5" data-aria-level="1"><span data-contrast="none">SESB (Single Event Snap-Back)&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="6" data-aria-level="1"><span data-contrast="none">SEB (Single Event Burn-out)</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="7" 
data-aria-level="1"><span data-contrast="none">SEGR (Single Event Gate Rupture)</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="8" data-aria-level="1"><span data-contrast="none">SEDR (Single Event Dielectric Rupture)</span></li></ul><p style="text-align: justify;" aria-level="1">&nbsp;</p><h2 style="text-align: justify;" aria-level="1"><span style="font-size: 14pt;">Not All Errors Are Created Equal</span></h2><p style="text-align: justify;"><span data-contrast="none">After reviewing the SEU family tree, you should quickly see that not all errors are equal. Some errors significantly impact the system, while others simply flush out without the system even experiencing a blip. SEUPI (Single Event Upset Probability Impact) is a derating factor typically applied to the raw upset rate to estimate how many upsets actually result in observed errors.</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;">Choices</span></h2><p style="text-align: justify;"><span data-contrast="none">As engineers, we have many techniques at our disposal to drive down FIT. Note that some of the techniques discussed apply only to IC design and some only to system or module design, while others can be deployed by everyone. You'll also notice that many techniques can target the same failure. The choice is yours. It comes down to area, resources, speed, power, and all the other constraints you have to weigh before choosing the option that best suits you.</span></p><p style="text-align: justify;"><span data-contrast="none">For example, if you're an IC designer and need to protect memory integrity, you're limited to methods such as ECC, parity, scrubbing, interleaving, cell type, etc. 
But as a system or module developer, you must deploy redundancy and software mitigation schemes such as SWIFT or simply log and optionally reset the module.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;" data-contrast="none">Mitigation Techniques</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></h2><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic10.jpg" width="545" height="397" /></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic11.png" width="541" height="276" /></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic12.png" width="586" height="289" /></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">Now you see numerous techniques that we have at our disposal. For full tables, please follow up with me. 
But to demonstrate that even a simple technique requires plenty of expertise, we'll dive into one of them.</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><h2 style="text-align: justify;" aria-level="1"><span style="font-size: 14pt;" data-contrast="none">TMR - Triple Modular Redundancy</span></h2><p style="text-align: justify;"><span data-contrast="none">Three times the power consumption, area, resources, and complexity always equate to a thousand times the happiness. IEC61508 also provides several variations of redundancy.</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span></p><p style="text-align: justify;"><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic13.jpg" width="533" height="215" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Now let's assume one of your 2oo3 voter channels is out due to an SEU, but the error stays in the system because it's part of the one-hot 
finite state machine (FSM). The system now becomes up to twice as vulnerable to subsequent failure.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic14.jpg" width="591" height="122" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Let's look at another widely touted LTMR (Local TMR) scheme that protects flip-flops below.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p>&nbsp;<br /><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic15.jpg" width="571" height="225" /></span></p><p style="text-align: justify;"><span data-contrast="none">&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;"><span data-contrast="none">This is a shift register design that protects only flip-flops. 
The essence of the comparison is that the red line (Chain) is effectively unchanged, although the blue line (TMR flip-flop) drops significantly. In an FPGA, the number of configuration bits that set a flip-flop's content may pale in comparison to the number of configuration bits that set the routing. Therefore, applying redundancy without sufficient knowledge may be worse than doing nothing at all.</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><h2 style="text-align: justify;" aria-level="1"><span style="font-size: 14pt;">Mitigation Schemes Deployed by IC and System Providers</span></h2><p style="text-align: justify;"><span data-contrast="none">After all that talk, let's review what the major IC players have done. It's important to know that each IC house holds this deck of cards very close to its chest, and much of the published data may be outdated, but it still shows that the SEU issue is not to be overlooked.</span></p><p style="text-align: justify;"><span data-contrast="none">First, IBM's S/390 G5 added 20~30% more logic to protect its execution units and main process. 
Fujitsu claimed to have 80% of the 20,000 latches in its SPARC64 covered, with parity added to the ALU and checks performed on the multipliers and dividers.</span></p><p style="text-align: justify;"><span data-contrast="none">Xilinx went a step further with its Zynq family. It completely separates PL (programmable logic) and PS (processing system) power, so you now have redundancy. It also provides two Arm real-time processors, so you can deploy schemes like lockstep, along with other software and design-flow techniques, to further drive down FIT. A lot of background work has been done so that ASIL-C can be achieved on a single part.</span></p><p style="text-align: justify;"><span data-contrast="none">And if you google Huawei and SEU, NTT's 10G-OPON ONU+SEU, or Broadcom's WP SE-BSD-WP100, you'll see that system folks outside the automotive world have long been dealing with SEU. 
Cisco even had a radiation test team that listed SEU as a standard&nbsp;</span><a href="https://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/116135-trouble-6500-parity-00.html"><span data-contrast="none">debugging step.</span></a></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 14pt;">Verify as Soon as Possible</span></h2><p style="text-align: justify;"><span data-contrast="none">The design must inevitably be verified. A mitigated design only cranks up the pain meter. If you code in RTL, you can't escape writing test benches. And no, you don't get to check the "done" box just by calling SystemVerilog's randomization functions.</span></p><p style="text-align: justify;"><span data-contrast="none">And are we done after the simulation? Life can't be that easy, can it?</span></p><p style="text-align: justify;"><span data-contrast="none">The general rule of thumb is still applicable. 
The earlier you catch a bug, the less hair you'll pull out.</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><h2 style="text-align: justify;" aria-level="1"><strong><span style="font-size: 14pt;" data-contrast="none">Simulation</span></strong></h2><p style="text-align: justify;"><span data-contrast="none">We know we can't cover all cases, so getting the most bang for the buck is essential. Hit the clock or reset lines if they're also triplicated. Hit high-value logic so you can quickly validate how robust the code is. The list goes on, but you get the idea.</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><p style="text-align: justify;" aria-level="1"><strong>Fault Injection</strong></p><p style="text-align: justify;"><span data-contrast="none">FPGA emulation or FPGA designs provide designers with a quick and easy way to inject errors into the design on a bit-by-bit basis. 
And if you know your configuration well enough, you may be able to back-trace the failing bit to your design.</span></p><p style="text-align: justify;" aria-level="1">&nbsp;</p><p style="text-align: justify;" aria-level="1"><strong>Accelerated Radiation Testing</strong></p><p style="text-align: justify;"><span data-contrast="none">This is the closest you can get to real-world data collection. Even if you're doing this for ISO26262 and following JESD89, you still have to think through all the nitty-gritty details, especially if you've never done it before.</span></p><p style="text-align: justify;"><span data-contrast="none">Let me toss out a few questions for you.</span></p><ul style="text-align: justify;"><li data-leveltext="" data-font="Symbol" data-listid="2" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="9" data-aria-level="1"><span data-contrast="none">What beam will you use? Alpha, proton, neutron, x-ray, focused laser beam?</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="10" data-aria-level="1"><span data-contrast="none">Will this be a dynamic or static test?</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="11" 
data-aria-level="1"><span data-contrast="none">What type of failure should you be looking for? Are you sure you can catch it and recognize it?</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li><li data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="12" data-aria-level="1"><span data-contrast="none">What monitoring system would you deploy?</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></li></ul><p style="text-align: justify;"><span data-contrast="none">To provide you with a real-life example, let's look again at the AMD-Xilinx test setup for a "dynamic test".</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:0,&quot;335551620&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:259}">&nbsp;</span></p><p><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic16.jpg" width="370" height="208" /></p><p style="text-align: justify;">&nbsp;</p><p><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic17.jpg" width="369" 
height="288" /></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">Two setups are shown here; one is in open air while the other is in a vacuum.</span></p><p><span data-contrast="none"><img style="display: block; margin-left: auto; margin-right: auto;" src="https://kradminasset.s3.ap-south-1.amazonaws.com/ExpertViews/Chenpic18.png" width="339" height="251" /></span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span data-contrast="none">The discussion of beam selection, test setup, and all the gotchas is a giant topic in and of itself. 
This is why some major commercial folks even have their own dedicated radiation test teams.</span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;"><span style="font-size: 10pt;"><em>This article was contributed by our expert <a href="https://www.linkedin.com/in/chen-wei-tseng/" target="_blank" rel="noopener">Chen Wei Tseng</a></em></span><br />&nbsp;</p><p style="text-align: justify;">&nbsp;</p><h3 style="text-align: justify;"><span style="font-size: 18pt;">Frequently Asked Questions Answered by Chen Wei Tseng</span></h3><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 12pt;">1. What mitigation strategies or techniques can be employed to reduce the impact of SEU and power cycles on electronic systems?</span></h2><p style="text-align: justify;"><span data-contrast="auto">The first goal is to reduce FIT rather than to reset or power cycle the system. Resetting or power cycling brings the system down and interrupts user operation, and in critical applications such as electric cars, that can be a matter of life or death.</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 12pt;">2. 
How do safety standards align with industry practices and experiences regarding SEU, and do they evolve to address emerging challenges in this area?</span></h2><p style="text-align: justify;"><span data-contrast="auto">As you can tell, many applications already have. I'll append the pictures here for your reference. Most of the standards direct designers to refer to "experts", as the effect can differ dramatically depending on the IC technology node, vendor, foundry, etc. Knowledge in this field is lacking here in Asia. But of course, as the course has already mentioned, applications that can result in significant damage will also be a concern.</span></p><p style="text-align: justify;"><span data-contrast="auto">I've listed examples from Huawei and Cisco in the piece. I'm also aware of systems in the financial world (accelerated trading platforms), gaming machines, AI applications, etc. that, although not bound by the standards, are very interested in this topic because they've already been burned.</span></p><p style="text-align: justify;"><span data-contrast="auto">pic</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 12pt;" data-contrast="auto">3. </span><span style="font-size: 12pt;">How do high-reliability applications account for the potential occurrence of SEU in their system design and architecture?</span></h2><p style="text-align: justify;"><span data-contrast="auto">Through design mitigation; engineers must first know the FIT (failures in time) rate in order to deploy the correct mitigation. 
</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><h2 style="text-align: justify;"><span style="font-size: 12pt;" data-contrast="auto">4. </span><span style="font-size: 12pt;">Are there any advancements or innovations in package materials that aim to improve SEU resistance?</span></h2><p style="text-align: justify;"><span data-contrast="auto">As far as I'm aware, no. This is why the burden is now on the designer to deal with it.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</span></p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;">&nbsp;</p><p style="text-align: justify;">&nbsp;</p>
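<p style="text-align: justify;">To make the TMR and fault injection discussion earlier in this article a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the author's method or any vendor's flow: the names <code>vote</code>, <code>flip</code>, and <code>WIDTH</code> are hypothetical, the register is assumed to be 8 bits wide, and an upset is assumed to stay in a channel until something repairs it. The sketch models a bitwise 2oo3 voter and injects upsets bit by bit to count how many escape the voter.</p>

```python
from itertools import combinations

WIDTH = 8  # hypothetical register width, chosen only for illustration


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2oo3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


def flip(value: int, bit: int) -> int:
    """Model a single event upset by inverting one bit."""
    return value ^ (1 << bit)


def single_upset_failures(golden: int) -> int:
    """Inject every possible single-bit upset; count wrong voter outputs."""
    failures = 0
    for channel in range(3):
        for bit in range(WIDTH):
            copies = [golden, golden, golden]
            copies[channel] = flip(copies[channel], bit)
            if vote(*copies) != golden:
                failures += 1
    return failures


def double_upset_failures(golden: int) -> int:
    """Inject a second upset while the first one is still unrepaired."""
    failures = 0
    for ch1, ch2 in combinations(range(3), 2):
        for bit1 in range(WIDTH):
            for bit2 in range(WIDTH):
                copies = [golden, golden, golden]
                copies[ch1] = flip(copies[ch1], bit1)
                copies[ch2] = flip(copies[ch2], bit2)
                if vote(*copies) != golden:
                    failures += 1
    return failures


if __name__ == "__main__":
    golden = 0b10110010  # arbitrary reference value
    print("single upsets escaping the voter:", single_upset_failures(golden))
    print("double upsets escaping the voter:", double_upset_failures(golden))
```

<p style="text-align: justify;">Under these assumptions, every single upset is masked, while 24 of the 192 double-upset combinations escape: exactly the cases where the second upset hits the same bit in a different channel. That is the situation described for the one-hot FSM, where the first error is never flushed out and the voter is left with no margin on that bit.</p>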
KR Expert - Chen Wei Tseng
