Original Data (High-D Space)
t-SNE Embedding (2D)
qᵢⱼ = (1 + ||yᵢ - yⱼ||²)⁻¹ / Σₖ≠ₗ (1 + ||yₖ - yₗ||²)⁻¹
Student-t distribution in low-D space
KL Divergence (Cost Function)
Pairwise Similarities (P matrix)
pⱼ|ᵢ = exp(-||xᵢ-xⱼ||²/2σᵢ²) / Σₖ≠ᵢ exp(-||xᵢ-xₖ||²/2σᵢ²)
Gaussian similarity in high-D space
Perplexity
30
Effective neighbors
Low perplexity → local structure
High perplexity → global structure
Algorithm
0
Iteration
Learning Rate
Statistics
Points
0
KL Div
--
Clusters
0
σ range
--
Quick Examples
How it works: t-SNE preserves local neighborhood structure by matching pairwise similarities. It uses Gaussian kernels in high-D and heavy-tailed Student-t in low-D to avoid crowding. The KL divergence cost is minimized via gradient descent.