Permanent-fault tolerance in CNN accelerators

This work explores drastically reducing the supply voltage of activation memories in CNN inference accelerators to save energy. To address the impact on CNN accuracy of the permanent bitcell faults caused by supply voltage underscaling, it proposes two low-cost microarchitectural mechanisms based on flipping and patching. These mechanisms follow from a characterization study that identified how different fault patterns in activation memories affect accuracy.

Flip-and-Patch is based on the observation that activations with faults in the most significant byte (High-Order or HO activations) largely degrade the accuracy of CNN applications. In addition, a small number of activations with faults in both the most and the least significant bytes (Low and High-Order or L&HO activations) also compromise the accuracy of some CNNs. The proposed approach combines two techniques: a word flipping mechanism to deal with HO activations, and a patching approach to deal with L&HO activations.

Flip technique

The aim of this technique is to minimize the weight of faulty bits in HO activations. Assuming a little-endian data representation, N-bit HO activations only contain faults in bit positions N/2 to N−1. Flipping an activation moves the bit in the i-th position to the (N−1−i)-th position. Flipping thus turns HO activations into LO activations, which significantly reduces the impact of a fault on the magnitude of the activation.
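As a minimal sketch of the bit reversal described above, the following Python models a memory word with stuck-at bitcells and compares the read-back error with and without flipping. The stuck-at fault model, the helper names `flip` and `store_and_load`, and the example values are assumptions made for illustration; they are not taken from the proposed hardware.

```python
def flip(word: int, n_bits: int = 8) -> int:
    """Reverse bit order: bit i moves to position n_bits - 1 - i."""
    out = 0
    for i in range(n_bits):
        if (word >> i) & 1:
            out |= 1 << (n_bits - 1 - i)
    return out


def store_and_load(word: int, stuck_at_0: int, stuck_at_1: int) -> int:
    """Model a faulty memory word: stuck bitcells override the written value."""
    return (word & ~stuck_at_0) | stuck_at_1


value = 0b1011_0101        # activation to store (181)
stuck_at_0 = 0b1000_0000   # hypothetical fault: physical bit 7 stuck at 0
stuck_at_1 = 0b0000_0000   # no stuck-at-1 cells in this example

# Stored as-is: the fault hits the most significant bit of the activation.
plain = store_and_load(value, stuck_at_0, stuck_at_1)

# Stored flipped, then flipped back on read: the same physical fault
# now corrupts the least significant bit of the activation.
flipped = flip(store_and_load(flip(value), stuck_at_0, stuck_at_1))

error_plain = abs(value - plain)      # 128
error_flipped = abs(value - flipped)  # 1
```

In this example the stuck bitcell costs 128 when the activation is stored as-is, but only 1 when it is stored flipped and flipped back on read.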

Example of applicability of the flipping technique to an 8-bit HO activation

To differentiate between HO (flipped) activations and the remaining (non-flipped) activations, the proposed design includes an f control bit associated with each activation. This bit is set, for different Vdd levels, during post-fabrication testing, before the device is deployed in the field. 2:1 muxes, controlled by the f bit, are therefore needed to select between the original activation, which requires no modification, and the flipped one.
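The way f bits could be derived from a post-fabrication fault map can be sketched as follows. This is only an illustration under stated assumptions: the per-word fault masks, the `classify`, `compute_f_bits`, and `mux2` helpers, and the split of an 8-bit word into two 4-bit halves are all hypothetical; only the two voltage levels come from the text.

```python
def classify(fault_mask: int, n_bits: int = 8) -> str:
    """Label a word by which half of its bit positions holds faulty bitcells."""
    half = n_bits // 2
    low_faults = fault_mask & ((1 << half) - 1)
    high_faults = fault_mask >> half
    if low_faults and high_faults:
        return "L&HO"
    if high_faults:
        return "HO"
    if low_faults:
        return "LO"
    return "fault-free"


def compute_f_bits(fault_maps_by_vdd: dict, n_bits: int = 8) -> dict:
    """Per Vdd level, set f = True only for words classified as HO."""
    return {
        vdd: [classify(mask, n_bits) == "HO" for mask in masks]
        for vdd, masks in fault_maps_by_vdd.items()
    }


def mux2(f: bool, original: int, flipped: int) -> int:
    """2:1 mux: the f bit selects the flipped activation."""
    return flipped if f else original


# Hypothetical fault maps (one mask per word) found by testing at two Vdd levels.
fault_maps = {
    "0.60V": [0b0000_0000, 0b0000_0000, 0b0000_0000, 0b0000_0000],
    "0.54V": [0b0100_0000, 0b0000_0000, 0b0000_0010, 0b1000_0001],
}
f_table = compute_f_bits(fault_maps)
```

With these made-up masks, only the first word at the lower voltage gets its f bit set; the last word has faults in both halves and is left to the patch technique.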


Patch technique

The flipping technique does not remove the impact of L&HO activations on CNN accuracy. The patch technique addresses them with a tiny cache, referred to as the patching cache, that stores the original (fault-free) value of such activations.

Like the f bits of the flipping approach, the patch mechanism requires a p control bit per activation to determine whether a requested activation is to be found in the patching cache. The 2:1 muxes are widened to 4:1 muxes to select among the original activation, the flipped activation, and the patched one. Reads from the patching cache are performed cycle by cycle at word (activation) granularity. Latches are therefore needed before the muxes to hold the read activations temporarily before the block is sent to the PE array.
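The combined read and write paths can be illustrated with a small behavioral model. It is a sketch under several assumptions, not the actual design: faults are modeled as stuck-at cells, the patching cache is an unbounded dictionary rather than a tiny cache, p is assumed to take priority over f in the mux, and the class and method names are hypothetical.

```python
def flip(word: int, n_bits: int = 8) -> int:
    """Reverse bit order: bit i moves to position n_bits - 1 - i."""
    return int(format(word, f"0{n_bits}b")[::-1], 2)


class FlipAndPatchMemory:
    """Behavioral model of an activation memory with f and p control bits."""

    def __init__(self, stuck_at_0, stuck_at_1, n_bits=8):
        self.n_bits = n_bits
        self.sa0, self.sa1 = list(stuck_at_0), list(stuck_at_1)
        self.cells = [0] * len(self.sa0)
        self.patch_cache = {}  # simplification: unbounded dict, not a tiny cache
        half = n_bits // 2
        self.f, self.p = [], []
        for s0, s1 in zip(self.sa0, self.sa1):
            faults = s0 | s1
            low, high = faults & ((1 << half) - 1), faults >> half
            self.f.append(bool(high) and not low)    # HO only: flip
            self.p.append(bool(high) and bool(low))  # L&HO: patch

    def write(self, addr: int, value: int) -> None:
        if self.p[addr]:
            self.patch_cache[addr] = value  # keep the fault-free copy
        stored = flip(value, self.n_bits) if self.f[addr] else value
        # Stuck bitcells override whatever was written.
        self.cells[addr] = (stored & ~self.sa0[addr]) | self.sa1[addr]

    def read(self, addr: int) -> int:
        raw = self.cells[addr]
        # 4:1 mux sketch; p is assumed to take priority over f.
        if self.p[addr]:
            return self.patch_cache[addr]
        if self.f[addr]:
            return flip(raw, self.n_bits)
        return raw


# Hypothetical fault map: word 0 fault-free, word 1 HO, word 2 L&HO.
mem = FlipAndPatchMemory(
    stuck_at_0=[0b0000_0000, 0b1000_0000, 0b1000_0001],
    stuck_at_1=[0, 0, 0],
)
for addr in range(3):
    mem.write(addr, 0b1011_0101)  # 181
readback = [mem.read(addr) for addr in range(3)]
```

Here word 0 reads back unchanged, word 1 is an HO activation that reads back with an error of 1 thanks to flipping, and word 2 is an L&HO activation recovered exactly from the patching cache.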


Experimental results

Experimental results show that, compared to a conventional CNN accelerator supplied at a safe voltage of 0.6 V, an enhanced accelerator supplied at 0.54 V with Flip-and-Patch reduces the average energy consumption of activation memories by 10.5%, while maintaining the original (fault-free) accuracy with a negligible impact on system performance (less than 0.05% for every application).