HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

by Qingcheng Xiao, et al.

Tensor computations overwhelm traditional general-purpose computing devices because of the large volume of data and the number of operations they involve. They call for a holistic solution that combines hardware acceleration with software mapping. Hardware/software (HW/SW) co-design optimizes the hardware and software in concert and produces high-quality solutions. The co-design flow faces two main challenges. First, tensor computation can be partitioned in multiple ways, each with a different impact on performance and energy efficiency; moreover, the hardware part must be implementable with the intrinsic functions of spatial accelerators. It is hard for programmers to identify and analyze these partitioning methods manually. Second, the overall design space, composed of HW/SW partitioning, hardware optimization, and software optimization, is huge and must be explored efficiently. To this end, we propose HASCO, an agile co-design approach that provides efficient HW/SW solutions for dense tensor computation. We use tensor syntax trees as the unified IR, on which we build a two-step approach to identify partitioning methods. For each method, HASCO explores the hardware and software design spaces. Because these explorations have distinct objectives and evaluation costs, we propose different algorithms for them: a multi-objective Bayesian optimization algorithm for hardware optimization, and heuristic and Q-learning algorithms for software optimization. Experiments demonstrate that HASCO achieves a 1.25X to 1.44X latency reduction through HW/SW co-design compared with developing the hardware and software separately.
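In a multi-objective hardware exploration, the result is not a single best design but a set of Pareto-optimal trade-offs, for example between latency and energy. The sketch below illustrates only that Pareto-filtering step over candidate accelerator configurations; it does not implement the Bayesian optimization components (surrogate model, acquisition function) of HASCO's algorithm. The `DesignPoint` type, its fields, and the candidate names and numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """A hypothetical accelerator configuration with its estimated costs."""
    name: str
    latency_ms: float  # lower is better
    energy_mj: float   # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if `a` is no worse than `b` in both objectives and strictly better in at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.energy_mj <= b.energy_mj
    strictly_better = a.latency_ms < b.latency_ms or a.energy_mj < b.energy_mj
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only non-dominated points (the latency/energy trade-off curve), preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not results from HASCO.
    candidates = [
        DesignPoint("pe8x8", 4.0, 2.0),
        DesignPoint("pe16x16", 2.0, 3.0),
        DesignPoint("pe16x16_bigbuf", 2.5, 3.5),  # dominated by pe16x16
        DesignPoint("pe32x32", 1.0, 6.0),
    ]
    for p in pareto_front(candidates):
        print(p.name)  # prints pe8x8, pe16x16, pe32x32 (one per line)
```

Under these assumptions, a Bayesian optimizer would repeatedly propose new configurations, evaluate them with a cost model or simulator, and report this non-dominated set as its current answer, leaving the final trade-off choice to the designer.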


SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices

Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with...

Learned Hardware/Software Co-Design of Neural Accelerators

The use of deep learning has grown at an exponential rate, giving rise t...

Cycle-Accurate Evaluation of Software-Hardware Co-Design of Decimal Computation in RISC-V Ecosystem

Software-hardware co-design solutions for decimal computation can provid...

Enumerating Hardware-Software Splits with Program Rewriting

A core problem in hardware-software codesign is in the sheer size of the...

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many domains including m...

Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization

Apache TVM (Tensor Virtual Machine), an open source machine learning com...

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow

With the cross-fertilization of applications and the ever-increasing sca...
