string2string: A Modern Python Library for String-to-String Algorithms

04/27/2023
by Mirac Suzgun, et al.

We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes both traditional algorithmic solutions and recent neural approaches to problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis, along with several visualization tools and metrics that facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. It also wraps existing efficient and widely used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, where appropriate. Overall, the library aims to provide broader coverage and greater flexibility than existing string libraries. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string.
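To make two of the classical algorithms named above concrete, here is a minimal, dependency-free sketch of the Wagner-Fischer dynamic program for edit distance and of Knuth-Morris-Pratt search. It is an illustration only and does not use or reproduce the string2string API: the function names `edit_distance` and `kmp_search` are hypothetical, and unit costs for insertion, deletion, and substitution are an assumption.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program (unit costs assumed)."""
    # prev[j] is the distance between the source prefix processed so far and target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_char != t_char)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def kmp_search(pattern: str, text: str) -> int:
    """Index of the first occurrence of pattern in text, or -1 (Knuth-Morris-Pratt)."""
    if not pattern:
        return 0
    # failure[i] is the length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, reusing the failure table instead of backtracking.
    k = 0
    for i, char in enumerate(text):
        while k > 0 and char != pattern[k]:
            k = failure[k - 1]
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(kmp_search("abc", "xxabcxx"))
```

The edit-distance sketch keeps only two rows of the table, so memory grows with the length of `target` rather than with the product of both lengths. The search sketch runs in time linear in the combined length of pattern and text.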
