Di He

05/24/2023

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Recent studies have discovered that Chain-of-Thought prompting (CoT) can...

Guhao Feng, et al.

03/23/2023

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

Current endpointing (EP) solutions learn in a supervised framework, whic...

Do June Min, et al.

02/14/2023

A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests

Recently, subgraph GNNs have emerged as an important direction for devel...

Bohang Zhang, et al.

02/12/2023

3D Molecular Generation via Virtual Dynamics

Structure-based drug design, i.e., finding molecules with high affinitie...

Shuqi Lu, et al.

02/03/2023

Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

We propose a new class of linear Transformers called FourierLearner-Tran...

Krzysztof Marcin Choromanski, et al.

01/23/2023

Rethinking the Expressive Power of GNNs via Graph Biconnectivity

Designing expressive Graph Neural Networks (GNNs) is a central topic in ...

Bohang Zhang, et al.

01/15/2023

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Designing an efficient yet deployment-friendly 3D backbone to handle spa...

Haiyang Wang, et al.

10/09/2022

Online Training Through Time for Spiking Neural Networks

Spiking neural networks (SNNs) are promising brain-inspired energy-effic...

Mingqing Xiao, et al.

10/04/2022

Rethinking Lipschitz Neural Networks for Certified L-infinity Robustness

Designing neural networks with bounded Lipschitz constant is a promising...

Bohang Zhang, et al.

10/04/2022

One Transformer Can Understand Both 2D & 3D Molecular Data

Unlike vision and language data which usually has a unique format, molec...

Shengjie Luo, et al.

06/09/2022

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Adversarial examples, which are usually generated for specific inputs wi...

Huishuai Zhang, et al.

06/04/2022

Is L^2 Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

The Physics-Informed Neural Network (PINN) approach is a new and promisi...

Chuwei Wang, et al.

05/26/2022

Your Transformer May Not be as Powerful as You Expect

Relative Positional Encoding (RPE), which encodes the relative distance ...

Shengjie Luo, et al.

04/13/2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

We present an efficient method of pretraining large-scale autoencoding l...

Payal Bajaj, et al.

03/09/2022

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

This technical note describes the recent updates of Graphormer, includin...

Yu Shi, et al.

02/28/2022

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

This technical note describes the recent updates of Graphormer, includin...

Yu Shi, et al.

02/22/2022

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

While end-to-end models have shown great success on the Automatic Speech...

Jinhan Wang, et al.

02/16/2022

HousE: Knowledge Graph Embedding with Householder Parameterization

The effectiveness of knowledge graph embedding (KGE) largely depends on ...

Rui Li, et al.

11/02/2021

Can Vision Transformers Perform Convolution?

Several recent studies have demonstrated that attention-based networks, ...

Shanda Li, et al.

10/13/2021

Boosting the Certified Robustness of L-infinity Distance Nets

Recently, Zhang et al. (2021) developed a new neural network architectur...

Bohang Zhang, et al.

06/23/2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

The attention module, which is a crucial component in Transformer, canno...

Shengjie Luo, et al.

06/15/2021

First Place Solution of KDD Cup 2021 OGB Large-Scale Challenge Graph-Level Track

In this technical report, we present our solution of KDD Cup 2021 OGB La...

Chengxuan Ying, et al.

06/09/2021

Do Transformers Really Perform Bad for Graph Representation?

The Transformer architecture has become a dominant choice in many domain...

Chengxuan Ying, et al.

05/31/2021

Adversarial Training with Rectified Rejection

Adversarial training (AT) is one of the most effective strategies for pr...

Tianyu Pang, et al.

05/10/2021

How could Neural Networks understand Programs?

Semantic understanding of programs is a fundamental problem for programm...

Dinglan Peng, et al.

03/09/2021

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

Wav2vec-C introduces a novel representation learning technique combining...

Samik Sadhu, et al.

02/27/2021

Transformers with Competitive Ensembles of Independent Mechanisms

An important development in deep learning from the earliest MLPs has bee...

Alex Lamb, et al.

02/25/2021

LazyFormer: Self Attention with Lazy Update

Improving the efficiency of Transformer-based language pre-training is a...

Chengxuan Ying, et al.

02/18/2021

Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder

Many real-world applications use Siamese networks to efficiently match t...

Shuqi Lu, et al.

02/16/2021

Revisiting Language Encoding in Learning Multilingual Representations

Transformer has demonstrated its great power to learn contextual word re...

Shengjie Luo, et al.

02/10/2021

Towards Certifying ℓ_∞ Robustness using Neural Networks with ℓ_∞-dist Neurons

It is well-known that standard neural networks, even with a high classif...

Bohang Zhang, et al.

01/31/2021

CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

Accurate and robust prediction of patient's response to drug treatments ...

Di He, et al.

10/09/2020

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

An unsolved fundamental problem in biology and ecology is to predict obs...

Di He, et al.

09/07/2020

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Normalization plays an important role in the optimization of deep neural...

Tianle Cai, et al.

08/04/2020

Taking Notes on the Fly Helps BERT Pre-training

How to make unsupervised language pre-training more efficient and less r...

Qiyu Wu, et al.

07/24/2020

Transferred Discrepancy: Quantifying the Difference Between Representations

Understanding what information neural networks capture is an essential p...

Yunzhen Feng, et al.

06/28/2020

Rethinking Positional Encoding in Language Pre-training

How to explicitly encode positional information into neural networks is ...

Guolin Ke, et al.

06/28/2020

Rethinking the Positional Encoding in Language Pre-training

How to explicitly encode positional information into neural networks is ...

Guolin Ke, et al.

06/10/2020

MC-BERT: Efficient Language Pre-Training via a Meta Controller

Pre-trained contextual representations (e.g., BERT) have become the foun...

Zhenhui Xu, et al.

05/12/2020

Invertible Image Rescaling

High-resolution digital images are usually downscaled to fit various dis...

Mingqing Xiao, et al.

02/17/2020

Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural...

Jinhua Zhu, et al.

02/12/2020

On Layer Normalization in the Transformer Architecture

The Transformer is widely used in natural language processing tasks. To ...

Ruibin Xiong, et al.

01/08/2020

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Adversarial training is one of the most popular ways to learn robust mod...

Runtian Zhai, et al.

11/19/2019

Defective Convolutional Layers Learn Robust CNNs

Robustness of convolutional neural networks has recently been highlighte...

Tiange Luo, et al.

10/25/2019

Fast Structured Decoding for Sequence Models

Autoregressive sequence models achieve state-of-the-art performance in d...

Zhiqing Sun, et al.

09/27/2019

On the Anomalous Generalization of GANs

Generative models, especially Generative Adversarial Networks (GANs), ha...

Jinchen Xuan, et al.

09/15/2019

Hint-Based Training for Non-Autoregressive Machine Translation

Due to the unparallelizable nature of the autoregressive factorization, ...

Zhuohan Li, et al.

08/25/2019

Multilingual Neural Machine Translation with Language Clustering

Multilingual neural machine translation (NMT), which translates multiple...

Xu Tan, et al.

07/28/2019

Representation Degeneration Problem in Training Natural Language Generation Models

We study an interesting problem in training neural network-based models ...

Jun Gao, et al.

06/06/2019

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

The Transformer architecture is widely used in natural language processi...

Yiping Lu, et al.
