Yuan Cao

research

∙ 06/24/2023

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective

Graph neural networks (GNNs) have pioneered advancements in graph repres...

0 Wei Huang, et al. ∙

research

∙ 06/20/2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

We study the implicit bias of batch normalization trained by gradient de...

2 Yuan Cao, et al. ∙

research

∙ 06/08/2023

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

Large Language Models (LLMs) have been applied in the speech domain, oft...

0 Mingqiu Wang, et al. ∙

research

∙ 05/30/2023

Grammar Prompting for Domain-Specific Language Generation with Large Language Models

Large language models (LLMs) can learn to perform a wide range of natura...

0 Bailin Wang, et al. ∙

research

∙ 05/20/2023

Can Public Large Language Models Help Private Cross-device Federated Learning?

We study (differentially) private federated learning (FL) of language mo...

0 Boxin Wang, et al. ∙

research

∙ 05/17/2023

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Language models are increasingly being deployed for general problem solv...

0 Shunyu Yao, et al. ∙

research

∙ 04/21/2023

Learn to Cluster Faces with Better Subgraphs

Face clustering can provide pseudo-labels to the massive unlabeled face ...

0 Yuan Cao, et al. ∙

research

∙ 03/31/2023

Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

Gradient regularization, as described in <cit.>, is a highly effective t...

0 Xuran Meng, et al. ∙

research

∙ 03/15/2023

The Benefits of Mixup for Feature Learning

Mixup, a simple data augmentation method that randomly mixes two data po...

0 Difan Zou, et al. ∙

research

∙ 02/24/2023

MUX-PLMs: Pre-training Language Models with Data Multiplexing

Data multiplexing is a recently proposed method for improving a model's ...

0 Vishvak Murahari, et al. ∙

research

∙ 02/09/2023

Binarized Neural Machine Translation

The rapid scaling of language models is motivating research using low-bi...

0 Yichi Zhang, et al. ∙

research

∙ 02/08/2023

SimCGNN: Simple Contrastive Graph Neural Network for Session-based Recommendation

Session-based recommendation (SBR) problem, which focuses on next-item p...

0 Yuan Cao, et al. ∙

research

∙ 12/20/2022

AnyTOD: A Programmable Task-Oriented Dialog System

We propose AnyTOD, an end-to-end task-oriented dialog (TOD) system with ...

0 Jeffrey Zhao, et al. ∙

research

∙ 12/16/2022

Speech Aware Dialog System Technology Challenge (DSTC11)

Most research on task oriented dialog modeling is based on written text ...

0 Hagen Soltau, et al. ∙

research

∙ 12/03/2022

Fast Online Hashing with Multi-Label Projection

Hashing has been widely researched to solve the large-scale approximate ...

0 Wenzhe Jia, et al. ∙

research

∙ 10/13/2022

Knowledge-grounded Dialog State Tracking

Knowledge (including structured knowledge such as schema and ontology, a...

8 Dian Yu, et al. ∙

research

∙ 10/06/2022

ReAct: Synergizing Reasoning and Acting in Language Models

While large language models (LLMs) have demonstrated impressive capabili...

7 Shunyu Yao, et al. ∙

research

∙ 08/21/2022

Multiple Descent in the Multiple Random Feature Model

Recent works have demonstrated a double descent phenomenon in over-param...

5 Xuran Meng, et al. ∙

research

∙ 05/09/2022

Unsupervised Slot Schema Induction for Task-oriented Dialog

Carefully-designed schemas describing how to collect and annotate dialog...

2 Dian Yu, et al. ∙

research

∙ 05/09/2022

Building Machine Translation Systems for the Next Thousand Languages

In this paper we share findings from our effort to build practical machi...

8 Ankur Bapna, et al. ∙

research

∙ 05/02/2022

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

Label smoothing is ubiquitously applied in Neural Machine Translation (N...

1 Bowen Liang, et al. ∙

research

∙ 04/08/2022

Show, Don't Tell: Demonstrations Outperform Descriptions for Schema-Guided Task-Oriented Dialogue

Building universal dialogue systems that can seamlessly operate across m...

1 Raghav Gupta, et al. ∙

research

∙ 03/17/2022

Nearest Neighbor Classifier with Margin Penalty for Active Learning

As deep learning becomes the mainstream in the field of natural language...

32 Yuan Cao, et al. ∙

research

∙ 03/15/2022

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

Multilingual neural machine translation models are trained to maximize t...

2 Yong Cheng, et al. ∙

research

∙ 02/14/2022

Benign Overfitting in Two-layer Convolutional Neural Networks

Modern neural networks often have great expressive power and can be trai...

0 Yuan Cao, et al. ∙

research

∙ 01/21/2022

Description-Driven Task-Oriented Dialog Modeling

Task-oriented dialogue (TOD) systems are required to identify key inform...

1 Jeffrey Zhao, et al. ∙

research

∙ 01/09/2022

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Achieving universal translation between all human language pairs is the ...

8 Aditya Siddhant, et al. ∙

research

∙ 12/31/2021

Benign Overfitting in Adversarially Robust Linear Classification

"Benign overfitting", where classifiers memorize noisy training data yet...

3 Jinghui Chen, et al. ∙

research

∙ 10/28/2021

Understanding How Encoder-Decoder Architectures Attend

Encoder-decoder networks with attention have proven to be a powerful way...

29 Kyle Aitken, et al. ∙

research

∙ 10/13/2021

SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

Zero/few-shot transfer to unseen services is a critical challenge in tas...

9 Harrison Lee, et al. ∙

research

∙ 10/06/2021

Efficient and Private Federated Learning with Partially Trainable Networks

Federated learning is used for decentralized training of machine learnin...

8 Hakim Sidahmed, et al. ∙

research

∙ 09/19/2021

Towards Zero-Label Language Learning

This paper explores zero-label learning in Natural Language Processing (...

17 Zirui Wang, et al. ∙

research

∙ 08/31/2021

Effective Sequence-to-Sequence Dialogue State Tracking

Sequence-to-sequence models have been applied to a wide variety of NLP t...

3 Jeffrey Zhao, et al. ∙

research

∙ 08/25/2021

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

Adaptive gradient methods such as Adam have gained increasing popularity...

17 Difan Zou, et al. ∙

research

∙ 08/24/2021

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

With recent progress in joint modeling of visual and textual representat...

6 Zirui Wang, et al. ∙

research

∙ 06/07/2021

A Comprehensive Survey on Image Dehazing Based on Deep Learning

The presence of haze significantly reduces the quality of images. Resear...

10 Jie Gui, et al. ∙

research

∙ 05/17/2021

Construction and enumeration of left dihedral codes satisfying certain duality properties

Let 𝔽_q be the finite field of q elements and let D_2n=⟨ x,y| x^n=1, y^2...

0 Yuan Cao, et al. ∙

research

∙ 04/28/2021

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Modern machine learning systems such as deep neural networks are often h...

4 Yuan Cao, et al. ∙

research

∙ 02/27/2021

Improving Longer-range Dialogue State Tracking

Dialogue state tracking (DST) is a pivotal component in task-oriented di...

9 Ye Zhang, et al. ∙

research

∙ 02/18/2021

Echo State Speech Recognition

We propose automatic speech recognition (ASR) models inspired by echo st...

11 Harsh Shrivastava, et al. ∙

research

∙ 01/04/2021

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

We consider a one-hidden-layer leaky ReLU network of arbitrary width tra...

3 Spencer Frei, et al. ∙

research

∙ 11/11/2020

Towards NNGP-guided Neural Architecture Search

The predictions of wide Bayesian neural networks are described by a Gaus...

6 Daniel S. Park, et al. ∙

research

∙ 10/28/2020

The geometry of integration in text classification RNNs

Despite the widespread application of recurrent neural networks (RNNs) a...

13 Kyle Aitken, et al. ∙

research

∙ 10/23/2020

Rapid Domain Adaptation for Machine Translation with Monolingual Data

One challenge of machine translation is how to quickly adapt to unseen d...

5 Mahdis Mahdieh, et al. ∙

research

∙ 10/21/2020

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Most undeciphered lost languages exhibit two characteristics that pose s...

3 Jiaming Luo, et al. ∙

research

∙ 10/12/2020

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

Massively multilingual models subsuming tens or even hundreds of languag...

3 Zirui Wang, et al. ∙

research

∙ 10/01/2020

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

We analyze the properties of gradient descent on convex surrogates for t...

1 Spencer Frei, et al. ∙

research

∙ 07/19/2020

An explicit expression for Euclidean self-dual cyclic codes of length 2^k over Galois ring GR(4,m)

For any positive integers m and k, existing literature only determines t...

0 Yuan Cao, et al. ∙

research

∙ 05/29/2020

Agnostic Learning of a Single Neuron with Gradient Descent

We consider the problem of learning the best-fitting single neuron as me...

4 Spencer Frei, et al. ∙

research

∙ 05/11/2020

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Over the last few years two promising research directions in low-resourc...

0 Aditya Siddhant, et al. ∙
