A mixed policy to improve performance of language models on math problems

07/17/2023
by   Gang Chen, et al.

When solving math problems, most language models use a sampling strategy to predict the next word according to its conditional probability. During a math reasoning step, this sampling may generate a wrong answer. Because math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In particular, we propose a two-level token exploration policy: the abstract level explores the next token probabilistically, and the second level is deterministic. Specifically, the abstract-level policy decides by probability sampling whether the next token is an operator or an operand, while the second level greedily selects the next token with the highest score. We test our method on the GSM8K dataset with the GPT-2 model and demonstrate a performance gain of more than 2%. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
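
The two-level policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The toy vocabulary, the operator set, the function name `mixed_policy_step`, and the choice to use each category's total softmax mass as its sampling probability are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy vocabulary; the operator set is an assumption for illustration.
VOCAB = ["1", "2", "3", "+", "-", "*", "/", "="]
OPERATORS = {"+", "-", "*", "/", "="}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_policy_step(logits, rng=random):
    """Pick the next token with a two-level policy.

    Level 1 (stochastic): sample the category, operator or operand.
    Assumption: a category's probability is the total softmax mass of its tokens.
    Level 2 (deterministic): take the highest-scoring token inside that category.
    """
    probs = softmax(logits)
    operator_ids = [i for i, tok in enumerate(VOCAB) if tok in OPERATORS]
    operand_ids = [i for i, tok in enumerate(VOCAB) if tok not in OPERATORS]

    # Level 1: probability sampling over the two abstract categories.
    p_operator = sum(probs[i] for i in operator_ids)
    chosen_ids = operator_ids if rng.random() < p_operator else operand_ids

    # Level 2: greedy selection within the sampled category.
    best = max(chosen_ids, key=lambda i: logits[i])
    return VOCAB[best]
```

Calling `mixed_policy_step(logits, random.Random(0))` with one score per vocabulary entry returns either the best-scoring operator or the best-scoring operand. Randomness enters only through the category choice, so exploration is limited to two candidates per step instead of the whole vocabulary.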


