Generation of plausible yet incorrect factual information, termed halluc...
Sound event detection (SED) often suffers from the data deficiency probl...
In future B5G/6G broadband communication systems, non-linear signal dist...
We present a scalable method to build a high quality instruction followi...
Knowledge graph embeddings (KGE) have been extensively studied to embed ...
In recent years, self-supervised learning (SSL) has emerged as a popular...
Information extraction, e.g., attribute value extraction, has been exten...
We present a new task setting for attribute mining on e-commerce product...
Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Expert...
In recent years, large pre-trained language models (LLMs) have demonstra...
Recent work has shown that fine-tuning large pre-trained language models...
Recent work in multilingual translation advances translation quality sur...
Hate speech detection is complex; it relies on commonsense reasoning, kn...
Deep face recognition has achieved great success due to large-scale trai...
Multilingual pre-trained models are known to suffer from the curse of mu...
Large language models, which are often trained for hundreds of thousands...
Automatic extraction of product attributes from their textual descriptio...
Self-supervised learning (SSL) learns knowledge from a large amount of u...
We describe a method to jointly pre-train speech and text in an encoder-...
All-MLP architectures have attracted increasing interest as an alternati...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot le...
Do language models have beliefs about the world? Dennett (1995) famously...
This paper focuses on developing energy-efficient online data processing...
Multilingual neural machine translation (MNMT) learns to translate multi...
In this paper, we describe our end-to-end multilingual speech translatio...
Pretraining and multitask learning are widely used to improve the speech...
Procedural fairness has been a public concern, which leads to controvers...
Mobile edge computing (MEC) has recently become a prevailing technique t...
Multi-head attention has each of the attention heads collect salient inf...
Is bias amplified when neural machine translation (NMT) models are optim...
Multilingual Transformer improves parameter efficiency and crosslingual ...
Multilingual machine translation has attracted much attention recently d...
Mobile edge computing (MEC) is a promising paradigm to accommodate the i...
This article introduces subbagging (subsample aggregating) estimation ap...
Multilingual neural machine translation has shown the capability of dire...
Active learning (AL) algorithms may achieve better performance with fewe...
We propose an effective approach to utilize pretrained speech and text m...
Fuzzing is one of the most effective techniques to identify potential sof...
The Transformer model has achieved state-of-the-art performance in many ...
Recent work demonstrates the potential of multilingual pretraining of cr...
Can one build a knowledge graph (KG) for all products in the world? Know...
Recent studies have demonstrated the cross-lingual alignment ability of ...
Product catalogs are valuable resources for eCommerce websites. In the ca...
Generally, it is common that cited papers are earlier than citing papers...
There has been recent success in pre-training on monolingual data and fi...
The integration of mobile edge computing (MEC) and wireless power transf...
This paper demonstrates that multilingual denoising pre-training produce...
Posterior collapse plagues VAEs for text, especially for conditional tex...
Most sequence-to-sequence (seq2seq) models are autoregressive; they gene...