Approximate Queries and Representations for Large Data Sequences
Many new database application domains, such as experimental sciences and medicine, are characterized by large sequences as their main form of data. Using an approximate representation can significantly reduce the required storage and search space. A good choice of representation can also support a broad new class of approximate queries needed in these domains. These queries are concerned with application-dependent features of the data, as opposed to the actual sampled points. We introduce a new notion of generalized approximate queries and a general divide-and-conquer approach that supports them. This approach uses families of real-valued functions as an approximate representation. We present an algorithm for realizing our technique, and the results of applying it to medical cardiology data. (An extended version is available as Tech Report CS-95-03, Dept. of Computer Science, Brown University. http://cs.brown.edu/research/pubs/techreports/reports/CS-95-03.html)
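The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the function family is straight lines through segment endpoints, and that a segment is split at its worst-fitting sample whenever the error exceeds a tolerance. The names `approximate`, `count_peaks`, and `tol` are hypothetical and chosen for this example.

```python
from typing import List, Sequence, Tuple

# (start index, end index, slope, intercept) of one linear piece.
Segment = Tuple[int, int, float, float]


def approximate(ys: Sequence[float], lo: int, hi: int, tol: float) -> List[Segment]:
    """Divide-and-conquer piecewise-linear approximation of ys[lo..hi].

    Assumption for this sketch: each piece is the line through its two
    endpoint samples, and tol is a non-negative absolute error bound.
    """
    if hi == lo:
        return [(lo, hi, 0.0, float(ys[lo]))]
    slope = (ys[hi] - ys[lo]) / (hi - lo)
    intercept = ys[lo] - slope * lo

    # Find the interior sample that the line fits worst.
    worst, worst_err = lo, 0.0
    for i in range(lo + 1, hi):
        err = abs(ys[i] - (slope * i + intercept))
        if err > worst_err:
            worst, worst_err = i, err

    # Conquer: the line is good enough, keep it as a single piece.
    if worst_err <= tol:
        return [(lo, hi, slope, intercept)]
    # Divide: split at the worst sample and recurse on both halves.
    return approximate(ys, lo, worst, tol) + approximate(ys, worst, hi, tol)


def count_peaks(segments: List[Segment]) -> int:
    """A feature query answered from the representation alone:
    count places where a rising piece is followed by a falling piece."""
    return sum(
        1
        for left, right in zip(segments, segments[1:])
        if left[2] > 0 and right[2] < 0
    )


samples = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
pieces = approximate(samples, 0, len(samples) - 1, tol=0.1)
peaks = count_peaks(pieces)
```

Here the query about peaks is evaluated on the stored line segments rather than on the sampled points, which is the kind of application-dependent feature query the abstract describes. A real system would likely choose a richer function family and an error measure suited to the domain.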