Anonymous Pattern Molecular Fingerprint and its Applications on Property Identification
Molecular fingerprints are important cheminformatics tools that map molecules into a vector space according to characteristics such as functional groups, atom sequences, and other topological structures. In this paper, we investigate a novel molecular fingerprint, Anonymous-FP, which captures the underlying interactions formed by small-, medium-, and large-scale links within a molecule. Specifically, the possible inherent atom chains are sampled from each molecule and extended in a specific anonymous pattern. The fingerprint is then encoded with the natural language processing technique PV-DBOW. We evaluate Anonymous-FP on molecular property identification, where it shows advantages including rich information content, strong experimental performance, and full structural significance. The experiments show that the scale of the atom chain and the manner of anonymization each significantly affect the overall representation ability of Anonymous-FP. In general, a typical scale of r = 8 improves performance on a range of real-world molecules; in particular, accuracy can reach above 93% on all NCI datasets.
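The sampling and anonymization steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the sampling procedure or the exact anonymous pattern, so the sketch assumes uniform random walks over the molecular graph and the common "anonymous walk" convention, in which each atom is replaced by the index of its first appearance in the chain. It also assumes that the scale r counts the atoms in a chain. The function names, parameters, and the toy ethanol graph are hypothetical.

```python
import random


def sample_atom_chain(adjacency, start, length, rng):
    """Sample a chain of up to `length` atoms by a uniform random walk (assumed scheme)."""
    chain = [start]
    while len(chain) < length:
        neighbors = adjacency[chain[-1]]
        if not neighbors:  # isolated atom: the walk cannot continue
            break
        chain.append(rng.choice(neighbors))
    return chain


def anonymize(chain):
    """Replace each atom by the index of its first appearance in the chain."""
    first_seen = {}
    pattern = []
    for atom in chain:
        if atom not in first_seen:
            first_seen[atom] = len(first_seen)
        pattern.append(first_seen[atom])
    return tuple(pattern)


def molecule_to_document(adjacency, length=8, walks_per_atom=10, seed=0):
    """Turn one molecule into a list of anonymous-pattern 'words'."""
    rng = random.Random(seed)
    words = []
    for atom in sorted(adjacency):
        for _ in range(walks_per_atom):
            chain = sample_atom_chain(adjacency, atom, length, rng)
            words.append("-".join(str(i) for i in anonymize(chain)))
    return words


# Hypothetical toy input: heavy-atom graph of ethanol, C0 - C1 - O2.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
document = molecule_to_document(ethanol, length=4, walks_per_atom=2)
```

Each molecule becomes a "document" of pattern words, which is the input shape that a PV-DBOW implementation expects (for example, gensim's `Doc2Vec` with `dm=0`). Note that this plain anonymization discards atom types; whether and how Anonymous-FP retains them is not stated in the abstract.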