Merging Sorted Lists of Similar Strings

08/19/2022
by Gene Myers, et al.

Merging T sorted, non-redundant lists containing M elements in total into a single sorted, non-redundant result of size N ≥ M/T is a classic problem, typically solved in practice in O(M log T) time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an O(M log T + S) worst-case method that performs no more character comparisons than S, the sum of the lengths of all the strings, and another O(M log(T/e̅) + S) method that becomes progressively more efficient as a function of the fraction of equal elements e̅ = M/N between input lists, reaching linear time when the lists are all identical. The methods perform favorably in practice versus an alternate formulation based on a trie.
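For orientation, the classic heap-based baseline that the abstract starts from can be sketched as follows. This is a minimal illustration of the standard O(M log T) priority-queue merge only, not the authors' improved methods: it keeps no auxiliary information in the heap nodes, so every comparison rescans strings from their first character. The names `merge_unique` and `e_bar` are hypothetical, chosen for this example.

```python
import heapq
from typing import Iterator, List


def merge_unique(lists: List[List[str]]) -> Iterator[str]:
    """Merge sorted, duplicate-free lists into one sorted, duplicate-free stream.

    Baseline heap merge (assumed here for illustration): one heap entry per
    non-empty input list, so each of the M elements costs O(log T) heap work.
    """
    # Each entry is (current string, list index, position in that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    last = None  # most recently emitted string, used to drop duplicates
    while heap:
        s, i, pos = heap[0]
        if s != last:
            yield s
            last = s
        nxt = pos + 1
        if nxt < len(lists[i]):
            # Replace the top entry with the next element of the same list.
            heapq.heapreplace(heap, (lists[i][nxt], i, nxt))
        else:
            # This list is exhausted; remove its entry.
            heapq.heappop(heap)


def e_bar(lists: List[List[str]]) -> float:
    """Return M/N: total input elements divided by distinct output elements."""
    m = sum(len(lst) for lst in lists)
    n = sum(1 for _ in merge_unique(lists))
    return m / n


if __name__ == "__main__":
    inputs = [["acgt", "acgta", "ttga"], ["acgt", "ccat"], ["acgta", "ttga"]]
    print(list(merge_unique(inputs)))
    print(e_bar(inputs))
```

When all T lists are identical, M = T·N, so e̅ = T and log(T/e̅) = 0, which is the linear-time extreme named in the abstract. When no element is shared, e̅ = 1 and the bound reduces to the familiar O(M log T + S).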


