A Quick Note on Sufficient Statistics

Siya
1 min read · Apr 13, 2021

We all know how to compute the mean and standard deviation of static data. But what happens when data arrives as a stream? You might argue that we can still compute the mean and standard deviation of the records received so far. Makes perfect sense!

But is that “sufficient”? Does the current mean give us the real picture of the data's mean? Are we estimating the correct distribution?

The idea of sufficient statistics says yes, as long as we keep the right summary of the data!

Figure: A sample taken from the entire population

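A statistic is sufficient for a parameter if it carries all the information the sample contains about that parameter: once the statistic is known, the individual data points tell us nothing more about how the data was generated.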
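To make the definition concrete, here is the standard textbook case, added as an illustration under the assumption that the data are i.i.d. normal. The Fisher-Neyman factorization theorem says T is sufficient for θ exactly when the joint density factors into a part that touches the data only through T and a part that does not depend on θ. Expanding the normal likelihood shows that the data enter only through the sum and the sum of squares, so the count n together with those two sums is all we need for the mean and variance.

```latex
f(x_1,\dots,x_n \mid \theta) = g\big(T(x_1,\dots,x_n), \theta\big)\, h(x_1,\dots,x_n)

% For x_1,\dots,x_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2):
f(x_1,\dots,x_n \mid \mu, \sigma^2)
  = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big)
  = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\tfrac{1}{2\sigma^2}\Big[\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\Big]\Big)

% Hence T = \big(\sum_i x_i, \sum_i x_i^2\big) is sufficient for (\mu, \sigma^2), with h \equiv 1, and
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\Big)
```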

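Hence any estimate computed from a sufficient statistic, such as the sample mean, loses nothing compared with the same estimate computed from the full sample. This is what makes streaming work: the summary can be updated record by record while the raw records are discarded.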
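A minimal Python sketch of this idea for a stream of numbers, assuming the data are roughly normal so that (n, Σx, Σx²) is the summary worth keeping. The class name StreamingGaussianStats is hypothetical, chosen only for illustration. Each record updates three numbers and can then be thrown away, yet the mean and variance match a batch computation over the whole list.

```python
import math
import statistics


class StreamingGaussianStats:
    """Hypothetical helper: keeps only (n, sum of x, sum of x^2),
    the sufficient statistics for the mean and variance of normal data."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        # O(1) work per record; the raw value itself is not stored.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mean(self):
        return self.sum_x / self.n

    def variance(self):
        # Sample variance (n - 1 denominator), computed from the summary alone.
        if self.n < 2:
            raise ValueError("need at least two records")
        m = self.mean()
        return (self.sum_x2 - self.n * m * m) / (self.n - 1)

    def std(self):
        return math.sqrt(self.variance())


stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = StreamingGaussianStats()
for x in stream:
    stats.update(x)

print(stats.mean())  # 5.0
# Same answer as a batch computation over the full list:
print(math.isclose(stats.variance(), statistics.variance(stream)))  # True
```

One caveat on the design: the Σx² form can lose precision through cancellation when the variance is small relative to the mean, so production code often uses Welford's algorithm instead, which tracks equivalent information (count, running mean, sum of squared deviations) in a numerically more stable form.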

Application:

Stream clustering

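Running k-means clustering on streaming data relies on the same idea. Each cluster center is updated incrementally from the samples seen so far: keeping a count and a running sum for each cluster is enough to recompute its center, so past points do not need to be stored.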
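A minimal Python sketch of such an update. It uses the sequential (online) k-means variant, assumed here for illustration since the post does not name a specific algorithm; the class name OnlineKMeans and the caller-supplied seed centers are hypothetical. Each cluster keeps only a count and a running sum of its points, and the center is recomputed as sum divided by count.

```python
import math


class OnlineKMeans:
    """Hypothetical sequential k-means: each cluster stores only a count and
    a per-dimension running sum, which is sufficient to recover its mean."""

    def __init__(self, initial_centers):
        # Assumption: each seed center is treated as the cluster's first point.
        self.counts = [1 for _ in initial_centers]
        self.sums = [list(c) for c in initial_centers]

    def center(self, j):
        # Cluster mean = running sum / count.
        return [s / self.counts[j] for s in self.sums[j]]

    def nearest(self, point):
        # Index of the closest current center (Euclidean distance).
        return min(range(len(self.sums)),
                   key=lambda j: math.dist(point, self.center(j)))

    def update(self, point):
        # Assign the new point, then fold it into that cluster's summary.
        j = self.nearest(point)
        self.counts[j] += 1
        self.sums[j] = [s + p for s, p in zip(self.sums[j], point)]
        return j


model = OnlineKMeans([(0.0, 0.0), (10.0, 10.0)])
for p in [(1.0, 1.0), (9.0, 11.0), (0.0, 2.0), (11.0, 9.0)]:
    model.update(p)

print(model.center(1))  # [10.0, 10.0]
```

Because points are never reassigned once they are folded in, the final clusters depend on the seeds and on arrival order; the count-and-sum summary is sufficient for each cluster's mean, not for re-running the clustering from scratch.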
