Learner Objectives
At the conclusion of this micro assignment, participants should be able to:
Implement a priority queue using a binary heap
Implement hierarchical clustering using a priority queue

Prerequisites
Before starting this micro assignment, participants should be able to:
Write Markdown and code cells in Jupyter Notebook
Implement k-means clustering

Hierarchical Clustering
A problem with k-means clustering is that we may not know what k is (though we could try several values and compare the resulting cluster quality). Another way to cluster, which avoids that parameter (but has issues of its own), is called hierarchical clustering. A hierarchical clustering is a binary tree, with higher-up branches connecting subtrees that are less similar, and lower-down branches connecting subtrees that are more similar.

Hierarchical clustering builds the tree bottom up. Each object starts in its own cluster (a leaf node, denoted with a square in the above diagram). We then find the closest pair of clusters and make them the children of a newly formed cluster (an inner node, denoted with a diamond in the above diagram). That new cluster is then treated like the rest of the clusters (leaves and inner nodes). We repeat the process:
1. Find the closest pair of clusters (which could be leaves or inner nodes)
   A. Make them children of a new cluster (inner node)
2. Repeat until all that's left is a single cluster (the root)
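In this assignment you will implement the priority queue yourself as a binary heap. Purely to illustrate the overall loop, here is a minimal sketch that instead uses Python's built-in heapq module, represents each cluster as a plain dict, and measures distance between centroids with Euclidean distance; the function name hierarchical_cluster and the dict fields are assumptions for illustration, not the required design.

```python
import heapq
import math

def centroid_distance(c1, c2):
    """Euclidean distance between two centroids."""
    return math.dist(c1, c2)

def hierarchical_cluster(points):
    """Bottom-up (agglomerative) clustering of a list of points.

    Each cluster is a dict with a centroid, a size (number of leaves),
    and references to its children (None for leaf clusters).
    Returns the root cluster of the resulting tree.
    """
    # Every point starts as its own leaf cluster.
    clusters = {i: {"centroid": tuple(p), "size": 1, "children": None}
                for i, p in enumerate(points)}
    next_id = len(points)
    active = set(clusters)          # ids of clusters not yet merged away

    # Priority queue of (distance, id_a, id_b) for every current pair.
    heap = []
    ids = list(active)
    for a_pos, a in enumerate(ids):
        for b in ids[a_pos + 1:]:
            heapq.heappush(heap, (centroid_distance(
                clusters[a]["centroid"], clusters[b]["centroid"]), a, b))

    while len(active) > 1:
        dist, a, b = heapq.heappop(heap)
        if a not in active or b not in active:
            continue                # stale entry: one side was already merged
        # Merge a and b into a new inner-node cluster.
        ca, cb = clusters[a], clusters[b]
        size = ca["size"] + cb["size"]
        centroid = tuple((ca["size"] * xa + cb["size"] * xb) / size
                         for xa, xb in zip(ca["centroid"], cb["centroid"]))
        clusters[next_id] = {"centroid": centroid, "size": size,
                             "children": (ca, cb)}
        active -= {a, b}
        # Push distances from the new cluster to all remaining clusters.
        for other in active:
            heapq.heappush(heap, (centroid_distance(
                centroid, clusters[other]["centroid"]), next_id, other))
        active.add(next_id)
        next_id += 1

    return clusters[next_id - 1]
```

Note that heap entries go stale once one of their clusters has been merged; rather than removing them eagerly, the sketch simply skips them when they are popped. For example, hierarchical_cluster([(0, 0), (0.1, 0.2), (5, 5), (5.1, 4.9)]) first merges each tight pair and then joins the two resulting clusters under the root.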
When we form a cluster, we need to be able to compute its distance to the other clusters (leaves and inner nodes) so that we can determine which pair is closest. Distances involving one or two clusters can be computed in a number of ways. For this assignment we will simply use the centroid of a cluster as a representative point for computing the cluster's distance. We define the centroid of a cluster to be the weighted average of the centroids of all leaves in its subtree; this is the same as the mean of all the original points in that subtree, and it can be computed incrementally when merging by taking the size-weighted average of the two children's centroids.
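As a quick worked illustration of that weighted average (a hypothetical example, not part of the required code), merging a single leaf with a three-leaf cluster weights the larger cluster's centroid three times as heavily:

```python
# Hypothetical example: merge a leaf at (0, 0) (size 1) with a cluster of
# three leaves whose centroid is (4, 8). The merged centroid is the
# size-weighted average of the two centroids.
size_a, centroid_a = 1, (0.0, 0.0)
size_b, centroid_b = 3, (4.0, 8.0)
merged_size = size_a + size_b
merged_centroid = tuple((size_a * xa + size_b * xb) / merged_size
                        for xa, xb in zip(centroid_a, centroid_b))
print(merged_centroid)  # (3.0, 6.0) -- the mean of all four underlying points
```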