Satyabrata Sarangi
Abstract: Data compression is essential in reducing high storage and communication costs for a wide range of systems and applications. Canonical Huffman coding plays a pivotal role in several compression standards. This work presents energy-efficient and high-throughput canonical Huffman codec designs on a fine-grain many-core array, an Intel i7-4850HQ, and an Nvidia GT 750M GPU executing the Calgary, Canterbury, Artificial, and Large corpus benchmarks. The many-core encoder implementations, which exploit task-level parallelism and an optimized codebook-generation kernel, yield a scaled energy efficiency that is 26.6× and 5.3× greater than that of the i7 and the GT 750M, respectively. Moreover, the encoder implementations on the many-core array result in a scaled throughput per chip area that is 65× and 3.4× greater on average than that of the i7 and the GT 750M, respectively, while offering a compression ratio improvement of up to 22% over independent block-based parallel encoding on GPUs. The many-core decoder implementations, which use a bit-parallel architecture, achieve a scaled throughput per chip area that is 891× and 7× greater on average than that of the i7 and the GT 750M, respectively. Furthermore, the many-core decoder implementations result in a scaled energy efficiency that is 149.5×, 3.9×, and 2.5× greater on average than that of the i7, the GT 750M, and an Intel MAX 10 FPGA, respectively. This work also presents DeepScaleTool, a tool for accurate estimation of deep-submicron technology scaling, built by modeling and curve-fitting data published by a leading commercial fabrication company for silicon fabrication technology generations from 130 nm to 7 nm for the key parameters of area, delay, and energy. The i7, FPGA, and GPU results are scaled to 32 nm (the fabrication technology of the many-core array) using scaling data from DeepScaleTool to account for differences in technology generations.
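As background for the codec designs summarized above, the following is a minimal C sketch of the canonical codeword assignment step that a canonical Huffman encoder performs once per-symbol code lengths are known. It is illustrative only: the function and variable names (canonical_assign, MAX_LEN, and so on) are assumptions for this sketch and do not correspond to the codebook-generation kernel implemented in this work.

/* Minimal sketch of canonical Huffman codeword assignment (illustrative only).
 * Given each symbol's code length, symbols of equal length receive consecutive
 * codewords, so a decoder can rebuild the codebook from the lengths alone. */
#include <stdio.h>
#include <stdint.h>

#define MAX_LEN 15  /* assumed maximum code length */

static void print_bits(uint32_t v, int n)
{
    for (int i = n - 1; i >= 0; i--)
        putchar(((v >> i) & 1u) ? '1' : '0');
}

/* len[s] is the Huffman code length of symbol s (0 = symbol unused). */
static void canonical_assign(const uint8_t *len, uint32_t *code, int nsyms)
{
    int count[MAX_LEN + 1] = {0};
    uint32_t next[MAX_LEN + 1] = {0};

    for (int s = 0; s < nsyms; s++)
        count[len[s]]++;
    count[0] = 0;  /* unused symbols get no codeword */

    /* First codeword of each length: shift left past all shorter codes. */
    uint32_t c = 0;
    for (int l = 1; l <= MAX_LEN; l++) {
        c = (c + (uint32_t)count[l - 1]) << 1;
        next[l] = c;
    }

    /* Assign consecutive codewords within each length, in symbol order. */
    for (int s = 0; s < nsyms; s++)
        if (len[s] != 0)
            code[s] = next[len[s]]++;
}

int main(void)
{
    /* Example: symbols A..D with a valid Huffman length set {2, 1, 3, 3}. */
    uint8_t len[4] = {2, 1, 3, 3};
    uint32_t code[4];

    canonical_assign(len, code, 4);
    for (int s = 0; s < 4; s++) {
        printf("symbol %c  length %d  code ", 'A' + s, (int)len[s]);
        print_bits(code[s], len[s]);
        putchar('\n');
    }
    return 0;  /* expected: B=0, A=10, C=110, D=111 */
}

Because codewords are fully determined by the code lengths, only the length array needs to be stored or communicated between encoder and decoder, which is what makes the canonical form attractive for parallel codec implementations.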