Abstract:
Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. In particular, resistive random-access memory (RRAM) technology provides a promising hardware substrate for PIM accelerators, because it supports efficient in-situ vector-matrix multiplications (VMMs) with high-density RRAM crossbar arrays. However, such accelerators suffer from frequent and energy-intensive analog-to-digital (A/D) conversions, which severely limit their performance. This paper proposes a new PIM architecture that efficiently accelerates deep learning tasks by minimizing the required A/D conversions with neural approximated peripheral circuits. By characterizing the dataflows of state-of-the-art PIM architectures, we first propose a new dataflow that extends shift-and-add (S+A) operations into the analog domain before the final A/D conversion, which remarkably reduces the number of A/D conversions required for a dot product. We then elaborate a neural approximation method to design both accumulation circuits (S+A) and quantization circuits (ADC) using RRAM crossbar arrays. Finally, we apply them to build an RRAM-based PIM accelerator, Neural-PIM, on the proposed analog dataflow and evaluate its system-level performance. Evaluations on different DNN benchmarks demonstrate that Neural-PIM improves energy efficiency by 5.36x (1.73x) and speeds up throughput by 3.43x (1.59x) without losing accuracy, compared to state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).