Shi Dong

research

∙ 06/04/2023

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Reinforcement learning from human feedback (RLHF) has emerged as a relia...

0 Banghua Zhu, et al. ∙

research

∙ 05/19/2023

Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

A centerpiece of the ever-popular reinforcement learning from human feed...

0 Wanqiao Xu, et al. ∙

research

∙ 12/24/2022

Inclusive Artificial Intelligence

Prevailing methods for assessing and comparing generative AIs incentiviz...

0 Dilip Arumugam, et al. ∙

research

∙ 11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...

0 Wanqiao Xu, et al. ∙

research

∙ 07/07/2022

A unified interpretable intelligent learning diagnosis framework for smart education

Intelligent learning diagnosis is a critical engine of smart education, ...

1 Zhifeng Wang, et al. ∙

research

∙ 03/23/2022

Online Encrypted Skype Identification Based on an Updating Mechanism

The machine learning algorithm is gaining prominence in traffic identifi...

0 Shi Dong, et al. ∙

research

∙ 02/10/2021

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

We design a simple reinforcement learning agent that, with a specificati...

0 Shi Dong, et al. ∙

research

∙ 09/23/2020

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

In this paper, we propose Ensemble Learning models to identify factors c...

0 Shi Dong, et al. ∙

research

∙ 12/13/2019

Provably Efficient Reinforcement Learning with Aggregated States

We establish that an optimistic variant of Q-learning applied to a finit...

0 Shi Dong, et al. ∙

research

∙ 11/26/2019

Summarizing CPU and GPU Design Trends with Product Data

Moore's Law and Dennard Scaling have guided the semiconductor industry f...

0 Yifan Sun, et al. ∙

research

∙ 11/18/2019

Comments on the Du-Kakade-Wang-Yang Lower Bounds

Du, Kakade, Wang, and Yang recently established intriguing lower bounds ...

0 Benjamin Van Roy, et al. ∙

research

∙ 05/12/2019

On the Performance of Thompson Sampling on Logistic Bandits

We study the logistic bandit, in which rewards are binary with success p...

10 Shi Dong, et al. ∙

research

∙ 10/15/2018

MGSim + MGMark: A Framework for Multi-GPU System Research

The rapidly growing popularity and scale of data-parallel workloads dema...

0 Yifan Sun, et al. ∙

research

∙ 05/30/2018

An Information-Theoretic Analysis for Thompson Sampling with Many Actions

Information-theoretic Bayesian regret bounds of Russo and Van Roy captur...

0 Shi Dong, et al. ∙

research

∙ 05/30/2018

An Information-Theoretic Analysis of Thompson Sampling for Large Action Spaces

Information-theoretic Bayesian regret bounds of Russo and Van Roy captur...

0 Shi Dong, et al. ∙

