
Towards NNGP-guided Neural Architecture Search
The predictions of wide Bayesian neural networks are described by a Gaus...

The geometry of integration in text classification RNNs
Despite the widespread application of recurrent neural networks (RNNs) a...

Rapid Domain Adaptation for Machine Translation with Monolingual Data
One challenge of machine translation is how to quickly adapt to unseen d...

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior
Most undeciphered lost languages exhibit two characteristics that pose s...

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Massively multilingual models subsuming tens or even hundreds of languag...

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins
We analyze the properties of gradient descent on convex surrogates for t...

An explicit expression for Euclidean self-dual cyclic codes of length 2^k over Galois ring GR(4,m)
For any positive integers m and k, existing literature only determines t...

Agnostic Learning of a Single Neuron with Gradient Descent
We consider the problem of learning the best-fitting single neuron as me...

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation
Over the last few years two promising research directions in low-resourc...

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling
We show that the sum of the implicit generator log-density log p_g of a ...

Echo State Neural Machine Translation
We present neural machine translation (NMT) models inspired by echo stat...

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
This paper proposes a hierarchical, fine-grained and interpretable laten...

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and autoregressive prosody prior
Recent neural text-to-speech (TTS) models with fine-grained latent featu...

Towards Understanding the Spectral Bias of Deep Learning
An intriguing phenomenon observed during training neural networks is the...

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
A recent line of research on deep learning focuses on the extremely over...

Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks
We study the sample complexity of learning one-hidden-layer convolutiona...

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks
The skip-connections used in residual networks have become a standard ar...

On self-duality and hulls of cyclic codes over F_2^m[u]/〈 u^k〉 with oddly even length
Let F_2^m be a finite field of 2^m elements, and R=F_2^m[u]/〈 u^k〉=F_2^m...

Video Prediction for Precipitation Nowcasting
Video prediction, which aims to synthesize new consecutive frames subseq...

Construction and enumeration for self-dual cyclic codes of even length over F_2^m + uF_2^m
Let F_2^m be a finite field of cardinality 2^m, R=F_2^m+uF_2^m (u^2=0) a...

An efficient method to construct self-dual cyclic codes of length p^s over F_p^m+uF_p^m
Let p be an odd prime number, F_p^m be a finite field of cardinality p^m...

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
We introduce our efforts towards building a universal neural machine tra...

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B
In this paper we propose a novel neural approach for automatic decipherm...

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
We study the training and generalization of deep neural networks (DNNs) ...

Explicit representation for a class of Type 2 constacyclic codes over the ring F_2^m[u]/〈 u^2λ〉 with even length
Let F_2^m be a finite field of cardinality 2^m, λ and k be integers sati...

Managing Recurrent Virtual Network Updates in Multi-Tenant Datacenters: A System Perspective
With the advent of software-defined networking, network configuration th...

Umbrella: Enabling ISPs to Offer Readily Deployable and Privacy-Preserving DDoS Prevention Services
Defending against distributed denial of service (DDoS) attacks in the In...

AccFlow: Defending Against the Low-Rate TCP DoS Attack in Wireless Sensor Networks
Because of the open nature of the Wireless Sensor Networks (WSN), the De...

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Lingvo is a Tensorflow framework offering a complete solution for collab...

Self-dual binary [8m, 4m]-codes constructed by left ideals of the dihedral group algebra F_2[D_8m]
Let m be an arbitrary positive integer and D_8m be a dihedral group of o...

A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Empirical studies show that gradient based methods can learn deep neural...

An explicit representation and enumeration for self-dual cyclic codes over F_2^m+uF_2^m of length 2^s
Let F_2^m be a finite field of cardinality 2^m and s a positive integer....

An explicit representation and enumeration for negacyclic codes of length 2^kn over Z_4+uZ_4
In this paper, an explicit representation and enumeration for negacyclic...

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
We study the problem of training deep neural networks with Rectified Lin...

Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
End-to-end Speech Translation (ST) models have many potential advantages...

Hierarchical Generative Modeling for Controllable Speech Synthesis
This paper proposes a neural end-to-end text-to-speech (TTS) model which...

High Temperature Structure Detection in Ferromagnets
This paper studies structure detection problems in high temperature ferr...

Training Deeper Neural Machine Translation Models with Transparent Attention
While current state-of-the-art NMT models, such as RNN seq2seq and Trans...

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Adaptive gradient methods are workhorses in deep learning. However, the ...

A class of repeated-root constacyclic codes over F_p^m[u]/〈 u^e〉 of Type 2
Let F_p^m be a finite field of cardinality p^m where p is an odd prime, ...

Matrix-product structure of constacyclic codes over finite chain rings F_p^m[u]/〈 u^e〉
Let m,e be positive integers, p a prime number, F_p^m be a finite field ...

Negacyclic codes over the local ring Z_4[v]/〈 v^2+2v〉 of oddly even length and their Gray images
Let R=Z_4[v]/〈 v^2+2v〉=Z_4+vZ_4 (v^2=2v) and n be an odd positive intege...

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Neural Machine Translation (NMT) is an end-to-end learning approach for ...

Local and Global Inference for High Dimensional Nonparanormal Graphical Models
This paper proposes a unified framework to quantify local and global inf...
Yuan Cao