
FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator
Recent works demonstrated the promise of using resistive random access m...

Kudu: An Efficient and Scalable Distributed Graph Pattern Mining Engine
This paper proposes Kudu, a general distributed execution engine with a ...

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Tensor computations overwhelm traditional general-purpose computing devi...

IntersectX: An Efficient Accelerator for Graph Mining
Graph pattern mining applications try to find all embeddings that match ...

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
Deep Neural Networks (DNNs) have achieved extraordinary performance in v...

Low-Cost Floating-Point Processing in ReRAM for Scientific Computing
We propose ReFloat, a principled approach for low-cost floating-point pr...

DwarvesGraph: A High-Performance Graph Mining System with Pattern Decomposition
Graph mining tasks, which focus on extracting structural information fro...

ReversiSpec: Reversible Coherence Protocol for Defending Transient Attacks
The recent works such as InvisiSpec, SafeSpec, and CleanupSpec, among o...

A Lightweight Isolation Mechanism for Secure Branch Predictors
Recently exposed vulnerabilities reveal the necessity to improve the sec...

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices
Deep neural network (DNN) has emerged as the most important and popular ...

A Comprehensive Evaluation of RDMA-enabled Concurrency Control Protocols
Online transaction processing (OLTP) applications require efficient dis...

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
With the emergence of a spectrum of high-end mobile devices, many applic...

Heterogeneity-Aware Asynchronous Decentralized Training
Distributed deep learning training usually adopts AllReduce as the sync...

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology ...

Non-structured DNN Weight Pruning Considered Harmful
Large deep neural network (DNN) models pose the key challenge to energy ...

Hop: Heterogeneity-Aware Decentralized Training
Recent work has shown that decentralized algorithms can deliver superior...

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
With the rise of artificial intelligence in recent years, Deep Neural Ne...

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers
To facilitate efficient embedded and hardware implementations of deep ne...

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
Recurrent Neural Networks (RNNs) are becoming increasingly important for...

An Efficient Framework for Implementing Persist Data Structures on Remote NVM
The byte-addressable Non-Volatile Memory (NVM) is a promising technology...

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
Hardware accelerations of deep learning systems have been extensively in...

VIBNN: Hardware Acceleration of Bayesian Neural Networks
Bayesian Neural Networks (BNNs) have been proposed to address the proble...

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
Large-scale deep neural networks (DNNs) are both compute and memory inte...

GraphR: Accelerating Graph Processing Using ReRAM
This paper presents GRAPHR, the first ReRAM-based graph processing accel...

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
With recent advancing of Internet of Things (IoTs), it becomes very attr...
Xuehai Qian