This article is a little demonstration of Bisecting K-means. We are all familiar with K-means; let's see what bisecting the data does.

So, the well-known problem of centroid initialization in K-means has many solutions; one of them is bisecting the data points. The main goal of the K-means algorithm is to reduce the SSE (sum of squared errors, i.e., the sum of squared distances from each data point to its closest centroid), and bisecting aims for the same. It breaks the data into 2 clusters, then breaks the largest cluster (or the cluster with the maximum SSE) into 2 clusters again, and so on until K clusters have been produced.
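The splitting loop described above can be sketched as follows. This is a minimal illustration using scikit-learn's `KMeans` for each binary split; the function names (`sse`, `bisecting_kmeans`) and the toy data are my own, not from any particular library.

```python
# A minimal sketch of bisecting K-means: repeatedly split the cluster
# with the largest SSE until k clusters remain.
import numpy as np
from sklearn.cluster import KMeans

def sse(points):
    """Sum of squared distances from points to their mean (centroid)."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

def bisecting_kmeans(X, k, random_state=0):
    clusters = [X]  # start with all points in a single cluster
    while len(clusters) < k:
        # pick the cluster with the maximum SSE to bisect
        worst = int(np.argmax([sse(c) for c in clusters]))
        target = clusters.pop(worst)
        # split it into 2 with standard K-means
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(target)
        for label in (0, 1):
            clusters.append(target[km.labels_ == label])
    centroids = np.array([c.mean(axis=0) for c in clusters])
    return clusters, centroids

# toy data: three well-separated blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ((0, 0), (5, 5), (0, 5))])
clusters, centroids = bisecting_kmeans(X, k=3)
print(len(clusters), centroids.shape)  # 3 clusters, centroids of shape (3, 2)
```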

Now, the centroids of the finally produced clusters can be used as the initialization points for the standard K-means algorithm. These points ought to serve better than random initialization because the clusters they come from already have low SSE.
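With scikit-learn this seeding step is just passing the centroid array as `init`. A minimal sketch, with hard-coded centroids standing in for the output of the bisecting step:

```python
# Seed standard K-means with precomputed centroids: passing an array as
# init (with n_init=1) skips random initialization entirely.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ((0, 0), (5, 5), (0, 5))])

# centroids as they might come out of a bisecting step (illustrative values)
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

km = KMeans(n_clusters=3, init=centroids, n_init=1).fit(X)
print(km.inertia_)  # inertia_ is the total SSE of the final clustering
```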

By recording the sequence of splits, we can also recover a hierarchy of the clusters.

Here is a little visualization, taken from the textbook:

Here is a little notebook showing the difference in SSE between using K-means directly and using Bisecting K-means.

There is an open issue to add the algorithm to the scikit-learn library: