1. Implement the k-means clustering algorithm in either Java or Python.
• The program should be executable with at least 3 parameters: the name of the dataset file, k, and the name of the output file.
• The output file should contain numerical class labels (one number per row) for all the records in the test dataset, and report the sum of squared errors (SSE) and the silhouette coefficient in the last row.
• You only need to handle numerical attributes (categorical attributes are not required).

Question

hayliebell7943

29-02-2020
Engineering

answered

DayyanKhan · Answer 1 · 2020-03-01T11:39:18+01:00

Answer:

The Python code for this question is as follows (written for a Jupyter notebook; it clusters the two numeric columns V1 and V2 of xclara.csv):

# Jupyter magic: render plots inline (notebook only)
%matplotlib inline
from copy import deepcopy
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')

# Importing the dataset
data = pd.read_csv('xclara.csv')
print(data.shape)
data.head()

# Getting the values and plotting them
f1 = data['V1'].values
f2 = data['V2'].values
X = np.array(list(zip(f1, f2)))
plt.scatter(f1, f2, c='black', s=7)

# Number of clusters
k = 3

# X coordinates of random centroids (randint needs integer bounds)
C_x = np.random.randint(0, int(np.max(X)) - 20, size=k)
# Y coordinates of random centroids
C_y = np.random.randint(0, int(np.max(X)) - 20, size=k)
C = np.array(list(zip(C_x, C_y)), dtype=np.float32)
print(C)

# To store the previous centroids whenever they are updated
C_old = np.zeros(C.shape)
# Cluster labels (0, 1, 2), one per point
clusters = np.zeros(len(X), dtype=int)

# Euclidean distance between a and b along axis ax
# (ax=None returns a single norm over all entries)
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Error func. - distance between new centroids and old centroids
error = dist(C, C_old, None)

# Loop runs until the centroids stop moving (error becomes zero)
while error != 0:
    # Assigning each point to its closest centroid
    for i in range(len(X)):
        distances = dist(X[i], C)
        clusters[i] = np.argmin(distances)
    # Storing the old centroid values
    C_old = deepcopy(C)
    # Finding the new centroids by taking the mean of the assigned points
    for i in range(k):
        points = [X[j] for j in range(len(X)) if clusters[j] == i]
        if points:  # keep the old centroid if no points were assigned to it
            C[i] = np.mean(points, axis=0)
    error = dist(C, C_old, None)
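The question asks for the sum of squared errors (SSE), which the loop above does not compute. A minimal sketch of one way to get it from the points, their labels, and the final centroids follows; the function name and the tiny dataset are made up for illustration and are not part of the answer.

```python
def sum_squared_error(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[int(label)]  # labels may be stored as floats
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hand-checkable example (made-up data, not from xclara.csv)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
labels = [0, 0, 1]
centroids = [(0.0, 1.0), (10.0, 10.0)]
print(sum_squared_error(points, labels, centroids))  # 1 + 1 + 0 = 2.0
```

With the variables from the loop above, the call would presumably be `sum_squared_error(X, clusters, C)`.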

# Initializing KMeans from scikit-learn, with the same k as above
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=k)
# Fitting with inputs
kmeans = kmeans.fit(X)
# Predicting the clusters
labels = kmeans.predict(X)
# Getting the cluster centers
C = kmeans.cluster_centers_
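The silhouette coefficient requested in the question can be computed from the points and their labels alone. The sketch below uses the usual definition, s = (b - a) / max(a, b), averaged over all points, where a is the mean distance to the other members of a point's own cluster and b is the lowest mean distance to any other cluster. It is a plain-Python illustration with a hypothetical function name and made-up data, and it is quadratic in the number of points.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(int(label), []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[int(label)]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members (the distance to itself is 0)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != int(label)
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs (made-up data)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
print(good > bad)  # the natural grouping scores higher
```

When scikit-learn is already in use, as it is here, `sklearn.metrics.silhouette_score(X, labels)` and the fitted model's `kmeans.inertia_` (its SSE) give the same quantities without custom code.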

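# Plotting the points colored by cluster, with the centroids as stars
# (X has only two columns, V1 and V2, so a 2D scatter plot is used)
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, s=7)
ax.scatter(C[:, 0], C[:, 1], marker='*', c='#050505', s=1000)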
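The question also requires the program to take the dataset file, k, and the output file as parameters, and to write one label per row with the SSE and silhouette coefficient in the last row. A rough sketch of that wrapper is shown below. The function names, the script name `kmeans.py`, and the exact wording of the last row are assumptions, since the question does not fix them.

```python
import argparse


def parse_args(argv=None):
    """Read the three required parameters: dataset file, k, output file."""
    parser = argparse.ArgumentParser(description="k-means clustering")
    parser.add_argument("dataset", help="path to the input dataset file")
    parser.add_argument("k", type=int, help="number of clusters")
    parser.add_argument("output", help="path to the output file")
    return parser.parse_args(argv)


def write_results(path, labels, sse, silhouette):
    """Write one numeric label per row, then SSE and silhouette in the last row."""
    with open(path, "w") as out:
        for label in labels:
            out.write(f"{int(label)}\n")
        # Last-row layout is an assumption; adapt it to the grader's format
        out.write(f"SSE: {sse} Silhouette: {silhouette}\n")


# Equivalent to running: python kmeans.py xclara.csv 3 output.txt
args = parse_args(["xclara.csv", "3", "output.txt"])
print(args.dataset, args.k, args.output)
```

In a real script, `parse_args()` would be called with no arguments so that it reads `sys.argv`, and `args.dataset` and `args.k` would replace the hard-coded `'xclara.csv'` and `k = 3` used above.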