New Jersey Institute of Technology Unsupervised Data Mining Clustering Worksheet
Description
Part 1
Generate a set S of 500 points (vectors) in 2-dimensional
Euclidean space. Use the Euclidean distance to measure
the distance between any two points. Write a program to find all
the outliers in S and print them. If there are no
outliers, your program should say so. Use any programming
language of your choice (specify the programming language you
use in the project).
https://careerfoundry.com/en/blog/data-analytics/how-to-find-outliers/
Next, remove the outliers from S, and call the resulting set S’.
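The assignment does not prescribe an outlier rule, so the following is only a minimal Python sketch of one possible approach. It assumes a point is an outlier when its mean Euclidean distance to its k nearest neighbours exceeds the mean score plus z standard deviations. The names generate_points, knn_score and find_outliers, the Gaussian test data, and the defaults k=5 and z=3.0 are all illustrative assumptions, not part of the assignment.

```python
import math
import random
import statistics

def generate_points(n=500, seed=0):
    """Return n random 2-D points drawn from a standard Gaussian (assumed data)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

def knn_score(points, k=5):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def find_outliers(points, k=5, z=3.0):
    """Indices of points whose k-NN score exceeds mean + z * standard deviation."""
    scores = knn_score(points, k)
    cutoff = statistics.mean(scores) + z * statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]

S = generate_points(500, seed=0)
outlier_idx = find_outliers(S)
if outlier_idx:
    for i in outlier_idx:
        print("outlier:", S[i])
else:
    print("No outliers found.")

# S' is S with the flagged points removed.
drop = set(outlier_idx)
S_prime = [p for i, p in enumerate(S) if i not in drop]
```

Other rules (for example an interquartile-range test on each coordinate) would fit the same structure; only find_outliers would change.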
Part 2
(1) Write a program that implements the hierarchical
agglomerative clustering algorithm taught in
class to cluster the points in S’ into k clusters, where
k is a user-specified parameter value.
(2) Repeat Part 1 and (1) above on two additional
different datasets.
Notes on the hierarchical agglomerative clustering algorithm
In determining the distance of two clusters, you should
consider the following definitions respectively:
• the distance between the nearest two points in the two
clusters,
• the distance between the farthest two points in the two
clusters,
• the average distance between points in the two clusters,
• the distance between the centers of the two clusters.
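The four definitions above are commonly called single, complete, average and centroid linkage. The following Python sketch shows how they might plug into a naive agglomerative loop. It is not necessarily the version taught in class: the function names, the LINKAGES table and the tie-breaking (first closest pair found) are assumptions made for illustration.

```python
import math
from itertools import combinations

def single(a, b):
    """Distance between the nearest two points of clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def complete(a, b):
    """Distance between the farthest two points of clusters a and b."""
    return max(math.dist(p, q) for p in a for q in b)

def average(a, b):
    """Average distance over all cross-cluster point pairs."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid(a, b):
    """Distance between the centers (coordinate means) of a and b."""
    ca = [sum(c) / len(a) for c in zip(*a)]
    cb = [sum(c) / len(b) for c in zip(*b)]
    return math.dist(ca, cb)

LINKAGES = {"single": single, "complete": complete,
            "average": average, "centroid": centroid}

def agglomerative(points, k, linkage="single"):
    """Start with one cluster per point and merge the closest pair until k remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dist = LINKAGES[linkage]
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

# Small usage example on six hand-picked points.
demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for name in LINKAGES:
    print(name, agglomerative(demo, 2, name))
```

This version recomputes every cluster distance at each merge, which keeps the code short but is slow on 500 points; caching a distance matrix would be the usual next step.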
Use the definition that yields the best performance, where
performance is measured by the silhouette coefficient.
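As a rough illustration of the selection step, the sketch below computes the mean silhouette coefficient of a clustering and picks the best-scoring linkage. For each point, a is its mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the score is (b - a) / max(a, b). The names silhouette and best_linkage, the convention that a singleton cluster scores 0, and the input format (a dictionary from linkage name to a list of clusters) are assumptions for this example.

```python
import math

def silhouette(clusters):
    """Mean silhouette coefficient; clusters is a list of lists of points."""
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total, n = 0.0, 0
    for ci, cluster in enumerate(clusters):
        for pi, p in enumerate(cluster):
            n += 1
            if len(cluster) == 1:
                continue  # assumed convention: a singleton contributes 0
            # a: mean distance to the other points in the same cluster
            a = sum(math.dist(p, q) for qi, q in enumerate(cluster)
                    if qi != pi) / (len(cluster) - 1)
            # b: smallest mean distance to the points of another cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n

def best_linkage(results):
    """results maps a linkage name to its clusters; return (name, score) of the best."""
    scores = {name: silhouette(c) for name, c in results.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Usage: compare a sensible grouping with a deliberately poor one.
good = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
bad = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]
print(best_linkage({"good": good, "bad": bad}))
```

Scores lie between -1 and 1, with higher values indicating tighter, better-separated clusters, so the linkage with the highest mean score would be the one to report.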