Constrained Optimal Querying: Huffman Coding and Beyond
Huffman coding is well known to be useful in decision problems where the goal is to minimize the average number of freely chosen queries needed to determine an unknown random variable. When the allowed queries are constrained, however, the original Huffman coding no longer applies. In this paper, we propose a general model for such problems and two coding schemes: one Huffman-based, and the other called GBSC (Greedy Binary Separation Coding). We prove the optimality of GBSC by induction on a binary decision tree, which implies that GBSC is at least as good as Shannon coding. We then compare the algorithms based on these two codes on two problems, DNA detection and 1-player Battleship, and find both to be reasonable approximation algorithms: the Huffman-based algorithm gives an expected length 1.1 times the true optimum in the DNA detection problem, and GBSC yields an average number of queries 1.4 times the theoretical optimum in 1-player Battleship.
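As background for the unconstrained case mentioned above, the following is a minimal sketch of classical Huffman coding, not the paper's Huffman-based scheme or GBSC. It assumes any yes/no query on the outcome set is allowed, so each codeword length equals the number of queries needed for that outcome. The function names `huffman_code_lengths` and `expected_queries` and the example distribution are illustrative assumptions, not taken from the paper.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return the Huffman codeword length for each outcome.

    With unconstrained yes/no queries, the length of an outcome's
    codeword is the number of queries needed to identify it.
    """
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry: (subtree probability, tiebreak, outcome indices in subtree)
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two least likely subtrees; every outcome inside
        # them moves one level deeper in the decision tree.
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths


def expected_queries(probs):
    """Average number of queries under the Huffman decision tree."""
    lengths = huffman_code_lengths(probs)
    return sum(p * l for p, l in zip(probs, lengths))


if __name__ == "__main__":
    # Hypothetical distribution over four outcomes.
    probs = [0.5, 0.25, 0.125, 0.125]
    print(huffman_code_lengths(probs))
    print(expected_queries(probs))
```

In constrained settings such as those the abstract describes, a merge step like the one above may correspond to a query that is not permitted, which is why the unmodified construction no longer applies.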