How does the DBSCAN clustering algorithm work?

The biggest secrets behind a clustering algorithm

“If you can’t explain it simply, you don’t understand it well enough.” - Albert Einstein
Scikit Learn — Demo of DBSCAN clustering algorithm
Dense and Sparse region


Core point:

Border point:

Noise point:
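The three point types named above (core, border, noise) can be told apart by counting how many neighbours a point has within eps. The following is a minimal sketch rather than a reference implementation. It assumes Euclidean distance and that a point counts as its own neighbour (the convention scikit-learn's `min_samples` follows). The helper name `classify_points`, the sample data, and the parameter values are made up for illustration.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # neighbours[i, j] is True when j lies within eps of i (i itself included).
    neighbours = dists <= eps
    # Core point: at least min_pts points (itself included) within eps.
    is_core = neighbours.sum(axis=1) >= min_pts
    # Border point: not core, but at least one core point lies within eps.
    near_core = (neighbours & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))

# Illustrative data: a tight group, one point on its fringe, one far away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.2, 1.0], [9.0, 9.0]])
kinds = classify_points(X, eps=1.5, min_pts=4)
print(kinds)
```

With these values the four corner points are core, the point at (2.2, 1.0) is a border point because it only reaches one core point, and the point at (9.0, 9.0) is noise.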

Density Edge:

Density Connected Points:

DBSCAN algorithm:
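To make the procedure concrete, here is a small from-scratch sketch of the usual DBSCAN loop: pick an unvisited core point, start a cluster, and keep expanding through the neighbours of every core point reached. It is an illustration only, not scikit-learn's implementation. It assumes Euclidean distance, computes all pairwise distances up front (fine for small data only), and uses the hypothetical name `dbscan_sketch`.

```python
import numpy as np
from collections import deque

NOISE = -1

def dbscan_sketch(X, eps, min_pts):
    """Return one cluster label per row of X; -1 marks noise (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighbour lists include the point itself.
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])

    labels = np.full(n, NOISE)
    cluster = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster
        queue = deque([i])  # holds core points whose neighbours still need visiting
        while queue:
            p = queue.popleft()
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    # Core points keep expanding; border points stop here.
                    if is_core[q]:
                        queue.append(q)
        cluster += 1
    return labels

# Illustrative data: two small groups and one isolated point.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
labels = dbscan_sketch(X, eps=3, min_pts=2)
print(labels)
```

Points never reached from any core point keep the label -1, which matches how scikit-learn reports noise.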


How to choose Min Points?

How to determine eps?

4th nearest neighbor of p is p4
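A common heuristic for picking eps is to compute, for every point, the distance to its k-th nearest neighbour (k = 4 in the caption above), sort those distances, and look for the "knee" where they start to rise sharply. The sketch below only computes the sorted k-distances. It assumes Euclidean distance, and the helper name `k_distances` and the random data are illustrative. Reading the knee off a plot is left to the reader.

```python
import numpy as np

def k_distances(X, k=4):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest other point.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)

rng = np.random.default_rng(0)
# Illustrative data: one dense blob plus a few scattered points.
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)), rng.uniform(-5, 5, size=(5, 2))])
kd = k_distances(X, k=4)
print(kd[:3], kd[-3:])
```

Plotting `kd` against its index gives the k-distance graph. A value of eps near the knee is a reasonable starting point, not a guarantee of good clusters.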

When does DBSCAN work well?

When does it not work well?

Time and Space complexity:


Simple Overview:

from sklearn.cluster import DBSCAN
from sklearn import metrics
import numpy as np

# Load the data as an array of shape (n_samples, n_features).
# The small array below is only placeholder data; replace it with your own.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

clustering = DBSCAN(eps=3, min_samples=2).fit(X)

# Store the labels assigned by DBSCAN (-1 marks noise points)
labels = clustering.labels_

# Identify which points are core points
core_samples = np.zeros_like(labels, dtype=bool)
core_samples[clustering.core_sample_indices_] = True

# Count the clusters, ignoring the noise label if present
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

# Measure clustering quality with the Silhouette Coefficient
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X, labels))

Implementation on real-world data:

And now, your journey begins!
