t-SNE + K-Means: Data Visualization and Clustering
K-Means Algorithm
I recently used the K-Means algorithm for clustering in my work. While reviewing my notes on the topic, I decided to publish them online :)
K-Means is a clustering algorithm that assigns each sample to exactly one of $K$ clusters.
How does the K-Means algorithm work?
Data Preparation
The K-Means algorithm relies on distance calculations, so the data should be normalized to prevent features with larger scales from dominating the results. This normalization can be done with the Scikit-Learn library as follows.
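A minimal sketch of this step, assuming `StandardScaler` (zero mean, unit variance per feature) is the normalizer you want. The toy matrix `X` is made up for illustration and is not data from this post:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is in the thousands, column 1 is in single digits.
# Without scaling, Euclidean distances would be driven almost entirely by column 0.
X = np.array([
    [1000.0, 1.0],
    [2000.0, 2.0],
    [3000.0, 3.0],
    [4000.0, 4.0],
])

# Fit the scaler on the data and transform it in one step.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column now has mean ~0 and standard deviation ~1.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

After scaling, both columns contribute comparably to the distances K-Means computes. `MinMaxScaler` from the same module is an alternative if you would rather map each feature to a fixed range.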