TransPolymer: a Transformer-based Language Model for Polymer Property Predictions

09/03/2022
by Changwen Xu, et al.

Accurate and efficient prediction of polymer properties is of great significance in polymer development and design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate polymer functions. Recently, Transformer models, equipped with attention mechanisms, have exhibited superior performance in various natural language processing tasks. However, such methods have not been investigated in polymer science. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Owing to our proposed polymer tokenizer with chemical awareness, TransPolymer can learn representations directly from polymer sequences. The model learns expressive representations by pretraining on a large unlabeled dataset, followed by finetuning on downstream datasets covering various polymer properties. TransPolymer achieves superior performance on all eight datasets and significantly surpasses the other baselines on most downstream tasks. Moreover, the improvement of the pretrained TransPolymer over the supervised TransPolymer and other language models underscores the benefits of pretraining on large unlabeled data for representation learning. Experimental results further demonstrate the important role of the attention mechanism in understanding polymer sequences. We highlight this model as a promising computational tool for promoting rational polymer design and understanding structure-property relationships from a data science perspective.
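The sketch below illustrates the pretrain-then-finetune workflow the abstract describes, using Hugging Face Transformers with a RoBERTa-style encoder trained with a masked-language-modeling objective and then reused under a regression head. The vocabulary size, model dimensions, masking setup, and the PolymerRegressor head are illustrative assumptions for a minimal example, not the authors' released implementation or tokenizer.

import torch
from torch import nn
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaModel

# 1) Masked-language-model pretraining on unlabeled polymer sequences
#    (assumed setup: RoBERTa-style encoder, ~15% of tokens masked).
config = RobertaConfig(vocab_size=265, hidden_size=768, num_hidden_layers=6,
                       num_attention_heads=12, max_position_embeddings=514)
mlm_model = RobertaForMaskedLM(config)

# input_ids / labels would come from a chemistry-aware tokenizer applied to
# unlabeled polymer sequences; random ids stand in for real data here.
input_ids = torch.randint(5, 265, (2, 64))
labels = input_ids.clone()
mlm_out = mlm_model(input_ids=input_ids, labels=labels)
mlm_out.loss.backward()  # standard MLM objective

# 2) Finetuning: reuse the pretrained encoder and attach a regression head
#    for a downstream polymer-property dataset (hypothetical head).
class PolymerRegressor(nn.Module):
    def __init__(self, encoder: RobertaModel):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # predict from the first (<s>) token

regressor = PolymerRegressor(mlm_model.roberta)  # reuse pretrained weights
preds = regressor(input_ids)                     # shape: (batch, 1)
loss = nn.MSELoss()(preds.squeeze(-1), torch.zeros(2))

In this sketch the downstream loss is a plain mean-squared error against dummy targets; in practice the targets would be measured property values (e.g., conductivity or glass transition temperature) from the labeled downstream datasets.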


Related research

06/17/2021 · Do Large Scale Molecular Language Representations Capture Important Structural Information?
Predicting chemical properties from the structure of a molecule is of gr...

06/29/2020 · Knowledge-Aware Language Model Pretraining
How much knowledge do pretrained language models hold? Recent research o...

08/12/2021 · AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Transformer-based pretrained language models (T-PTLMs) have achieved gre...

04/07/2020 · Byte Pair Encoding is Suboptimal for Language Model Pretraining
The success of pretrained transformer language models in natural languag...

08/17/2022 · DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation
Machine learning assisted modeling of the inter-atomic potential energy ...

08/30/2023 · Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction
Recently, the remarkable capabilities of large language models (LLMs) ha...

02/19/2023 · Learning Language Representations with Logical Inductive Bias
Transformer architectures have achieved great success in solving natural...
