Testing frequency distributions in a stream
We study how to verify specific frequency distributions when observing a stream of N data items drawn from a universe of n distinct items. We introduce the relative Fréchet distance to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertions only and sliding windows. We present a Tester for a certain class of functions which, given f and a frequency function g defined by the stream, decides with high probability whether f is close to g or far from g. If f is uniform, we show an Ω(n) space lower bound. If f decreases fast enough, we use only O(log^2 n·loglog n) space. The analysis relies on the Spacesaving algorithm <cit.> and on sampling the stream.
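For readers unfamiliar with the Spacesaving summary that the analysis builds on, the following is a minimal sketch of its standard counter-replacement rule, assuming an insertion-only stream. It is not the paper's Tester: the function name `space_saving`, the parameter `k` (number of counters kept) and the example stream are illustrative assumptions, and the linear scan for the minimum counter is a simplification of the usual constant-time data structure.

```python
def space_saving(stream, k):
    """Keep at most k counters; each stored count overestimates the true count."""
    counters = {}  # item -> estimated count
    for x in stream:
        if x in counters:
            # Monitored item: count it exactly.
            counters[x] += 1
        elif len(counters) < k:
            # Free slot: start monitoring x.
            counters[x] = 1
        else:
            # No free slot: evict an item with the smallest count and let x
            # inherit that count plus one (this is the source of overestimation).
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters


# Hypothetical example stream over the universe {a, b, c, d}.
stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]
summary = space_saving(stream, k=2)
```

The counts always sum to the stream length, so every estimate exceeds the true count by at most N/k, and any item occurring more than N/k times is guaranteed to be monitored. This is what makes such a summary useful when the target frequency function decreases quickly.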