Permanent-fault tolerance in CNN accelerators

This work has explored the possibility of drastically reducing the supply voltage of activation memories in CNN inference accelerators to save energy. Underscaling the supply voltage causes permanent bitcell faults that degrade CNN accuracy. To address this, the work has proposed two low-cost microarchitectural mechanisms based on flipping and patching. Both mechanisms follow from a characterization study that identified how different fault patterns in activation memories affect accuracy.

Flip-and-Patch is based on the observation that activations with faults in the most significant byte (High-Order or HO activations) largely degrade the accuracy of CNN applications. In addition, a small number of activations with faults in both the most and the least significant bytes (Low- and High-Order or L&HO activations) also compromise the accuracy of some CNNs. The proposed approach combines two techniques. First, a word-flipping mechanism deals with HO activations. Second, a patching mechanism deals with L&HO activations.

Flip technique

The aim of this technique is to minimize the weight of faulty bits in HO activations. Assuming a little-endian data representation, N-bit HO activations only contain faults in bit positions N/2 to N−1. After flipping an activation, the bit at position i moves to position N−1−i. Flipping therefore turns HO activations into LO activations, which significantly reduces the impact of a fault on the magnitude of the activation.
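The bit mapping above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `flip` and `load_with_fault` are hypothetical, faults are modeled as stuck-at cells described by a mask, and the activation is treated as an unsigned 8-bit value. It is not the hardware design itself.

```python
def flip(word: int, n_bits: int = 8) -> int:
    """Reverse the bit order: the bit at position i moves to n_bits-1-i."""
    out = 0
    for i in range(n_bits):
        if (word >> i) & 1:
            out |= 1 << (n_bits - 1 - i)
    return out


def load_with_fault(value: int, stuck_mask: int, stuck_vals: int,
                    n_bits: int = 8, flipped: bool = False) -> int:
    """Write `value` to a word with stuck-at cells and read it back.

    Assumption: cells selected by `stuck_mask` ignore written data and
    always return the corresponding bit of `stuck_vals`.
    """
    stored = flip(value, n_bits) if flipped else value
    stored = (stored & ~stuck_mask) | (stuck_vals & stuck_mask)
    return flip(stored, n_bits) if flipped else stored


# An HO word: the cell at bit position 7 is stuck at 1.
value = 0b00010110  # 22
print(load_with_fault(value, 0x80, 0x80, flipped=False))  # 150 (error of 128)
print(load_with_fault(value, 0x80, 0x80, flipped=True))   # 23  (error of 1)
```

In this example the physical fault stays at cell 7, but after flipping it corrupts logical bit 0 instead of logical bit 7, so the worst-case magnitude error drops from 128 to 1.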

Example of applicability of the flipping technique to an 8-bit HO activation

To differentiate between HO (flipped) activations and the remaining (non-flipped) activations, the proposed design includes an f control bit associated with each activation. This bit is set, for different Vdd levels, during post-fabrication testing, before the device is deployed for operation in the field. The design therefore adds 2:1 muxes, controlled by the f bit, to select between an original activation, which requires no modification, and a flipped one.
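As a rough sketch of how the f bit could be derived and used, the following model classifies a word from its fault mask and applies the 2:1 selection. The function names and the fault-mask input are assumptions made for illustration; the text does not specify how the test procedure reports faults, and the flip on the write side is likewise assumed here so that the read-side flip restores the original bit order.

```python
def flip(word: int, n_bits: int = 8) -> int:
    """Reverse the bit order of an n_bits-wide word."""
    return sum(((word >> i) & 1) << (n_bits - 1 - i) for i in range(n_bits))


def classify(fault_mask: int, n_bits: int = 8) -> str:
    """Classify a memory word by where its faulty cells are located."""
    half = n_bits // 2
    lo_mask = (1 << half) - 1   # bit positions 0 .. N/2-1
    hi_mask = lo_mask << half   # bit positions N/2 .. N-1
    has_lo = bool(fault_mask & lo_mask)
    has_hi = bool(fault_mask & hi_mask)
    if has_hi and has_lo:
        return "L&HO"
    if has_hi:
        return "HO"
    if has_lo:
        return "LO"
    return "fault-free"


def f_bit(fault_mask: int, n_bits: int = 8) -> int:
    """f = 1 only for HO words, which are the ones that benefit from flipping."""
    return 1 if classify(fault_mask, n_bits) == "HO" else 0


def write_path(value: int, f: int, n_bits: int = 8) -> int:
    """Assumed write-side counterpart: store the word flipped when f is set."""
    return flip(value, n_bits) if f else value


def read_path(raw: int, f: int, n_bits: int = 8) -> int:
    """2:1 mux on the read side: pass the word through or flip it back."""
    return flip(raw, n_bits) if f else raw


# One f bit per word and per Vdd level; the fault masks here are made up.
fault_maps = {0.54: [0x00, 0x80, 0x03], 0.60: [0x00, 0x00, 0x00]}
f_bits = {vdd: [f_bit(m) for m in masks] for vdd, masks in fault_maps.items()}
print(f_bits)
```

Because the set of faulty cells depends on the supply voltage, the sketch keeps a separate list of f bits for each Vdd level, mirroring the per-voltage configuration described above.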

Patch technique

The flipping technique does not remove the impact of L&HO activations on CNN accuracy. The patch technique adds a tiny cache, referred to as the patching cache, that stores the original (fault-free) value of such activations.

Like the f bits of the flipping approach, the patch mechanism requires a p control bit per activation to indicate whether a requested activation is to be found in the patching cache. The 2:1 muxes are widened to 4:1 muxes to select between the original activation, the flipped activation, and the patched one. Reads from the patching cache are performed cycle by cycle at word (activation) granularity. Latches are therefore needed before the muxes to hold the activations already read until the whole block can be sent to the PE array.
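The combined read path can be sketched behaviorally as follows. This is a simplified model, not the proposed hardware: the class name `FlipAndPatchMemory` is hypothetical, faults are assumed to be stuck-at cells given by per-word masks, the patching cache is modeled as an unbounded dictionary rather than a tiny fixed-size structure, and timing (cycle-by-cycle cache reads and the latches) is ignored.

```python
def flip(word: int, n_bits: int = 8) -> int:
    """Reverse the bit order of an n_bits-wide word."""
    return sum(((word >> i) & 1) << (n_bits - 1 - i) for i in range(n_bits))


class FlipAndPatchMemory:
    """Behavioral model of an activation memory with f and p control bits."""

    def __init__(self, fault_masks, stuck_values, n_bits=8):
        self.n_bits = n_bits
        self.fault_masks = fault_masks
        self.stuck_values = stuck_values
        half = n_bits // 2
        lo = (1 << half) - 1
        hi = lo << half
        # f: faults only in the upper half (HO). p: faults in both halves (L&HO).
        self.f = [int(bool(m & hi) and not (m & lo)) for m in fault_masks]
        self.p = [int(bool(m & hi) and bool(m & lo)) for m in fault_masks]
        self.cells = [0] * len(fault_masks)
        self.patch_cache = {}  # assumption: capacity limits are not modeled

    def write(self, addr: int, value: int) -> None:
        if self.p[addr]:
            self.patch_cache[addr] = value  # keep a fault-free copy
        word = flip(value, self.n_bits) if self.f[addr] else value
        m = self.fault_masks[addr]
        # Faulty cells ignore the written data and keep their stuck value.
        self.cells[addr] = (word & ~m) | (self.stuck_values[addr] & m)

    def read(self, addr: int) -> int:
        raw = self.cells[addr]
        # Mux select driven by (p, f): patched, flipped, or original word.
        if self.p[addr]:
            return self.patch_cache[addr]
        if self.f[addr]:
            return flip(raw, self.n_bits)
        return raw


# Four words: fault-free, HO, L&HO, and LO (stuck-at-1 cells in each case).
masks = [0x00, 0x80, 0x81, 0x01]
mem = FlipAndPatchMemory(masks, stuck_values=masks)
for addr in range(4):
    mem.write(addr, 22)
print([mem.read(addr) for addr in range(4)])  # [22, 23, 22, 23]
```

In this model the L&HO word is read back exactly from the patching cache, while the HO word (after flipping) and the LO word each keep only a low-order error of 1.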

Experimental results

Experimental results have shown that, compared to a conventional CNN accelerator supplied at a safe voltage of 0.6 V, an enhanced accelerator supplied at 0.54 V with Flip-and-Patch reduces the average energy consumption of activation memories by 10.5%. It does so while maintaining the original (fault-free) accuracy, with a negligible impact on system performance (less than 0.05% for every application).