1. Implement the k-means clustering algorithm in either Java or Python.
• The program should be executable with at least 3 parameters: the name of the dataset file, k, and the name of the output file.
• The output file should contain numerical class labels (formatted as one number per row) for all the records in the test dataset, and report the sum of squared errors (SSE) and the silhouette coefficient in the last row.
• You only need to handle numerical attributes (categorical attributes are not required).

Answer:
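The two measures the output file must report are conventionally defined as follows, where mu_j is the centroid (mean) of cluster C_j, C_(i) is the cluster that contains point x_i, n is the number of records, and s(i) is taken as 0 when x_i is the only member of its cluster:

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2

a(i) = \frac{1}{|C_{(i)}| - 1} \sum_{x_l \in C_{(i)},\; l \ne i} \lVert x_i - x_l \rVert,
\qquad
b(i) = \min_{C_j \ne C_{(i)}} \frac{1}{|C_j|} \sum_{x_l \in C_j} \lVert x_i - x_l \rVert

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

S lies between -1 and 1; values close to 1 mean points sit much closer to their own cluster than to the nearest other cluster.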

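The following Python code (written for a Jupyter notebook) implements k-means from scratch on the two-column dataset xclara.csv and then compares the result with scikit-learn's KMeans: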

# Jupyter-only magic that renders plots inline; delete this line in a plain .py script
%matplotlib inline

from copy import deepcopy

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')

# Import the dataset
data = pd.read_csv('xclara.csv')
print(data.shape)
data.head()

# Get the two feature columns as an (n, 2) array and plot them
f1 = data['V1'].values
f2 = data['V2'].values
X = np.array(list(zip(f1, f2)))
plt.scatter(f1, f2, c='black', s=7)

# Number of clusters
k = 3

# Initial centroids: k distinct data points picked at random.
# (Random coordinates can fall far from every point and leave a cluster
# empty; its mean is then NaN and the loop below never terminates.)
rng = np.random.default_rng()
C = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
print(C)

# To store the value of the centroids when they update
C_old = np.zeros(C.shape)

# Cluster labels (0, 1, ..., k - 1) of every point
clusters = np.zeros(len(X), dtype=int)

# Euclidean distance between a and b along axis ax
# (ax=None gives the norm of the whole difference array)
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Error = distance between the new centroids and the old centroids
error = dist(C, C_old, None)

# Loop until the centroids stop moving (error becomes zero)
while error != 0:
    # Assign each point to its closest centroid
    for i in range(len(X)):
        distances = dist(X[i], C)
        clusters[i] = np.argmin(distances)

    # Store the old centroid values
    C_old = deepcopy(C)

    # Move each centroid to the mean of the points assigned to it;
    # an empty cluster keeps its previous centroid
    for i in range(k):
        points = X[clusters == i]
        if len(points) > 0:
            C[i] = np.mean(points, axis=0)

    error = dist(C, C_old, None)

# Compare with scikit-learn's implementation (requires scikit-learn)
from sklearn.cluster import KMeans

# Initialize KMeans with the same k as above
kmeans = KMeans(n_clusters=k, n_init=10)

# Fit to the data and predict the clusters
labels = kmeans.fit_predict(X)

# Get the cluster centers
C_sk = kmeans.cluster_centers_

# The data has only two features (V1, V2), so plot in 2-D
plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=labels, s=7)
plt.scatter(C_sk[:, 0], C_sk[:, 1], marker='*', c='#050505', s=1000)
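The notebook code above does not yet meet the assignment's interface: it hard-codes the file name and k, writes no output file, and reports neither SSE nor the silhouette coefficient. A minimal command-line sketch that does might look like the following. It assumes the dataset is a CSV file with a header row and treats every numeric column (including an ID column, if one exists) as a feature. The file name kmeans.py, the fixed seed, the 300-iteration cap, and the "SSE=... silhouette=..." layout of the last row are illustrative choices, not part of the assignment.

```python
import sys

import numpy as np
import pandas as pd


def kmeans(X, k, max_iter=300, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def sse(X, labels, centroids):
    # Sum over all points of the squared distance to their own centroid
    return float(((X - centroids[labels]) ** 2).sum())


def silhouette(X, labels):
    # Mean of s(i) = (b - a) / max(a, b) over all points; O(n^2) time, O(n) memory
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; reporting 0 is an assumption
    s = np.zeros(len(X))
    for i in range(len(X)):
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))  # distances from point i
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = d[same].sum() / (n_same - 1)  # d[i] is 0, so this excludes point i
        b = min(d[labels == c].mean() for c in clusters if c != labels[i])
        m = max(a, b)
        s[i] = (b - a) / m if m > 0 else 0.0
    return float(s.mean())


def main(argv):
    if len(argv) < 4:
        sys.exit(f"usage: {argv[0]} DATASET_FILE K OUTPUT_FILE")
    data_file, k, out_file = argv[1], int(argv[2]), argv[3]
    # Assumes a header row; keeps numerical attributes only
    X = pd.read_csv(data_file).select_dtypes(include="number").to_numpy(dtype=float)
    labels, centroids = kmeans(X, k)
    with open(out_file, "w") as f:
        for label in labels:
            f.write(f"{label}\n")  # one class label per row
        # Last row: SSE and silhouette coefficient (this layout is an assumption)
        f.write(f"SSE={sse(X, labels, centroids):.4f} "
                f"silhouette={silhouette(X, labels):.4f}\n")


if __name__ == "__main__":
    main(sys.argv)
```

Run it as python kmeans.py xclara.csv 3 output.txt. The silhouette loop compares every point with every other point, which is fine for a few thousand records but slow for much larger files; the vectorized assignment step also builds an n-by-k-by-d array, trading memory for speed.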
