We present Kosmos-2.5, a multimodal literate model for machine reading o...
We explore how continued pre-training on domain-specific corpora influen...
The increasing volume of log data produced by software-intensive systems...
Large language models (LLMs) have recently garnered significant interest...
In this work, we propose Retentive Network (RetNet) as a foundation arch...
Scaling sequence length has become a critical demand in the era of large...
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enablin...
Music representation learning is notoriously difficult for its complex h...
Recent studies have shown that dual encoder models trained with the sent...
ELECTRA, the generator-discriminator pre-training framework, has achieve...
Modern systems produce a large volume of logs to record run-time status ...
Large Language Models (LLMs) are popular for their impressive abilities,...
A big convergence of language, multimodal perception, action, and world ...
Position modeling plays a critical role in Transformers. In this paper, ...
Pre-trained models have achieved remarkable success in natural language ...
Large Transformers have achieved state-of-the-art performance across man...
In this paper, we elaborate upon recipes for building multilingual repre...
Named entity recognition (NER) suffers from the scarcity of annotated tr...
A big convergence of model architectures across language, vision, speech...
Sparse Mixture of Experts (MoE) has received great interest due to its...
Foundation models have received much attention due to their effectivenes...
The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pr...
As more and more pre-trained language models adopt on-cloud deployment, ...
Sparse mixture of experts provides larger model capacity while requiring...
In this paper, we propose a simple yet effective method to stabilize ext...
Knowledge-Enhanced Models have developed a diverse set of techniques for ...
The poor performance of the original BERT for sentence semantic similari...
This report describes Microsoft's machine translation systems for the WM...
While pre-trained language models have achieved great success on various...
Compared to monolingual models, cross-lingual models usually require a m...
In this paper, we introduce ELECTRA-style tasks to cross-lingual languag...
While pretrained encoders have achieved success in various natural langu...
Large pre-trained models have achieved great success in many natural lan...
Fine-tuning pre-trained cross-lingual language models can transfer task-...
The cross-lingual language models are typically pretrained with masked l...
We generalize deep self-attention distillation in MiniLM (Wang et al., 2...
Commonsense explanation generation aims to empower the machine's sense-m...
Despite the success of generative pre-trained language models on a serie...
Document layout analysis usually relies on computer vision models to und...
Pre-training techniques have been verified successfully in a variety of ...
We present TableBank, a new image-based table detection and recognition ...
In this paper, we introduce a novel natural language generation task, te...
In this paper, we study a novel task that learns to compose music from n...
Sentence scoring and sentence selection are two main steps in extractive...
An intuitive way for a human to write paraphrase sentences is to replace...
Open domain response generation has achieved remarkable progress in rece...