Gu-Yeon Wei

research

∙ 06/13/2023

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

We introduce a method that dramatically reduces fine-tuning VRAM require...

0 Yuji Chai, et al. ∙

research

∙ 06/09/2023

S^3: Increasing GPU Utilization during Generative Inference for Higher Throughput

Generating texts with a large language model (LLM) consumes massive amou...

0 Yunho Jin, et al. ∙

research

∙ 05/04/2023

CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning

The emergence of the Internet of Things (IoT) has resulted in a remarkab...

0 Sai Qian Zhang, et al. ∙

research

∙ 05/02/2023

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems

As computing hardware becomes more specialized, designing environmentall...

0 Mariam Elgamal, et al. ∙

research

∙ 02/21/2023

MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

Deep learning recommendation systems serve personalized content under di...

0 Samuel Hsia, et al. ∙

research

∙ 01/26/2023

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

The ability to accurately predict deep neural network (DNN) inference pe...

0 Yuji Chai, et al. ∙

research

∙ 01/26/2023

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

On-device machine learning (ML) inference can enable the use of private ...

0 Maximilian Lam, et al. ∙

research

∙ 12/01/2022

Architectural Implications of Embedding Dimension during GCN on CPU and GPU

Graph Neural Networks (GNNs) are a class of neural networks designed to ...

0 Matthew Adiletta, et al. ∙

research

∙ 05/13/2022

Impala: Low-Latency, Communication-Efficient Private Deep Learning Inference

This paper proposes Impala, a new cryptographic protocol for private inf...

0 Woo-Seok Choi, et al. ∙

research

∙ 05/06/2022

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge

Autonomous machines (e.g., vehicles, mobile robots, drones) require soph...

0 Tianyu Jia, et al. ∙

research

∙ 03/05/2022

Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference

Multiparty computation approaches to secure neural network inference tra...

0 Maximilian Lam, et al. ∙

research

∙ 03/01/2022

Specialized Accelerators and Compiler Flows: Replacing Accelerator APIs with a Formal Software/Hardware Interface

Specialized accelerators are increasingly used to meet the power-perform...

0 Bo-Yuan Huang, et al. ∙

research

∙ 01/21/2022

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration

The design of heterogeneous systems that include domain specific acceler...

0 Georgios Zacharopoulos, et al. ∙

research

∙ 11/17/2021

Early DSE and Automatic Generation of Coarse Grained Merged Accelerators

Post-Moore's law area-constrained systems rely on accelerators to delive...

0 Iulian Brumar, et al. ∙

research

∙ 09/02/2021

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories

Repeated off-chip memory accesses to DRAM drive up operating power for d...

0 Lillian Pentecost, et al. ∙

research

∙ 06/18/2021

Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

The memory wall bottleneck is a key challenge across many data-intensive...

0 Mohammad Mehdi Sharifi, et al. ∙

research

∙ 06/10/2021

Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix

We show that aggregated model updates in federated learning may be insec...

0 Maximilian Lam, et al. ∙

research

∙ 05/18/2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

Deep learning recommendation systems must provide high quality, personal...

0 Udit Gupta, et al. ∙

research

∙ 05/03/2021

Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models

This work analyzes how attention-based Bidirectional Long Short-Term Mem...

0 Coleman Hooper, et al. ∙

research

∙ 02/05/2021

Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots

Building domain-specific architectures for autonomous aerial robots is c...

2 Srivatsan Krishnan, et al. ∙

research

∙ 01/29/2021

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Neural personalized recommendation models are used across a wide variety...

0 Mark Wilkening, et al. ∙

research

∙ 11/28/2020

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Transformer-based language models such as BERT provide significant accur...

10 Thierry Tambe, et al. ∙

research

∙ 10/28/2020

Chasing Carbon: The Elusive Environmental Footprint of Computing

Given recent algorithm, software, and hardware innovation, computing has...

0 Udit Gupta, et al. ∙

research

∙ 10/10/2020

Cross-Stack Workload Characterization of Deep Recommendation Systems

Deep learning based recommendation systems form the backbone of most per...

0 Samuel Hsia, et al. ∙

research

∙ 05/31/2020

Cheetah: Optimizations and Methods for Privacy-Preserving Inference via Homomorphic Encryption

As the application of deep learning continues to grow, so does the amoun...

0 Brandon Reagen, et al. ∙

research

∙ 01/13/2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development

The current trend for domain-specific architectures (DSAs) has led to re...

0 Paul Whatmough, et al. ∙

research

∙ 01/08/2020

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Neural personalized recommendation is the corner-stone of a wide collect...

0 Udit Gupta, et al. ∙

research

∙ 12/10/2019

SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads

In recent years, there have been tremendous advances in hardware accelera...

0 Yuan Yao, et al. ∙

research

∙ 11/30/2019

A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM

We present a new algorithm for training neural networks with binary acti...

0 Siming Ma, et al. ∙

research

∙ 10/02/2019

MLPerf Training Benchmark

Machine learning is experiencing an explosion of software and hardware s...

6 Peter Mattson, et al. ∙

research

∙ 09/29/2019

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference

Conventional hardware-friendly quantization methods, such as fixed-point...

0 Thierry Tambe, et al. ∙

research

∙ 07/24/2019

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

Training deep learning models is compute-intensive and there is an indus...

31 Gu-Yeon Wei, et al. ∙

research

∙ 05/24/2019

Learning Low-Rank Approximation for CNNs

Low-rank approximation is an effective model compression technique to no...

0 Dongsoo Lee, et al. ∙

research

∙ 05/24/2019

Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks

Model compression techniques, such as pruning and quantization, are beco...

0 Se Jung Kwon, et al. ∙

research

∙ 05/14/2019

Network Pruning for Low-Rank Binary Indexing

Pruning is an efficient model compression technique to remove redundancy...

0 Dongsoo Lee, et al. ∙

research

∙ 02/15/2018

Cloud No Longer a Silver Bullet, Edge to the Rescue

This paper takes the position that, while cognitive computing today reli...

0 Yuhao Zhu, et al. ∙

research

∙ 11/13/2017

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deploy...

0 Brandon Reagen, et al. ∙
