homework7_ZhankunLuo

Homework 7

Zhankun Luo

PUID: 0031195279

Email: [email protected]

Fall-2018-ECE-59500-009

Instructor: Toma Hentea

Function

plot_cluster.m

function [] = plot_cluster(X2, bel, theta)
plot(X2(1,bel==1),X2(2,bel==1),'r.',...
X2(1,bel==2),X2(2,bel==2),'g*',X2(1,bel==3),X2(2,bel==3),'bo',...
X2(1,bel==4),X2(2,bel==4),'cx',X2(1,bel==5),X2(2,bel==5),'md',...
X2(1,bel==6),X2(2,bel==6),'yp',X2(1,bel==7),X2(2,bel==7),'ks')
hold on, plot(theta(1,:), theta(2,:),'k+')
end

k_means.m

Hard Clustering: k-means

homework7_ZhankunLuo
homework7_ZhankunLuo

Where
d(xi,θj(t))=[xiθj(t)]T[xiθj(t)]i=1Nuij(t1)d(xi,θj(t))θj=2i=1Nuij(t1)[θj(t)xi]=0 d(x_i, \theta_j(t))=[x_i - \theta_j(t)]^T[x_i - \theta_j(t)]\\ \sum_{i=1}^{N} u_{ij}(t-1) \frac{\partial d(x_i, \theta_j(t))}{\partial \theta_j} = 2\sum_{i=1}^{N} u_{ij}(t-1) [\theta_j(t) - x_i] = 0\\
because θj(t)\theta_j(t) is not related to ii
θj(t)=i=1Nuij(t1)xii=1Nuij(t1) \theta_{j}(t) = \frac {\sum_{i=1}^{N} u_{ij}(t-1)x_i}{\sum_{i=1}^{N} u_{ij}(t-1)}\\
We have

u_{ij}(t - 1) = (bel==j) % 1xN vector
\sum u_{ij}(t - 1) * x_i = sum(X'.*((bel==j)'*ones(1,l))) % 1xm vector
\sum u_{ij} = sum(bel==j)

Thus

% in function k_means.m
theta(:,j)=sum(X'.*((bel==j)'*ones(1,l))) / sum(bel==j);

drawbacks

  1. Different initial partitions may lead k-means to produces different final clusterings, each one corresponding to a different local minimum.
  2. Knowledge of the number of clusters m is required a priori.
  3. k-means is sensitive to outliers and noise.
  4. k-means is not suitable for data with nominal (categorical) coordinates.

code

function [theta,bel,J]=k_means(X,theta)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
%  [theta,bel,J]=k_means(X,theta)
% This function implements the k-means algorithm, which requires
% as input the number of clusters underlying the data set. The algorithm
% starts with an initial estimation of the cluster representatives and
% iteratively tries to move them into regions that are "dense" in data
% vectors, so that a suitable cost function is minimized. This is
% achieved by performing (usually) a few passes on the data set. The
% algorithm terminates when the values of the cluster representatives
% remain unaltered between two successive iterations.
%
% INPUT ARGUMENTS:
%  X:       lxN matrix, each column of which corresponds to
%           an l-dimensional data vector.
%  theta:   a matrix, whose columns contain the l-dimensional (mean)
%           representatives of the clusters.
%
% OUTPUT ARGUMENTS:
%  theta:   a matrix, whose columns contain the final estimations of
%           the representatives of the clusters.
%  bel:     N-dimensional vector, whose i-th element contains the
%           cluster label for the i-th data vector.
%  J:       the value of the cost function (sum of squared Euclidean
%           distances of each data vector from its closest parameter
%           vector) that corresponds to the  estimated clustering.
%
% (c) 2010 S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[l,N]=size(X);
[l,m]=size(theta);
e=1;
iter=0;
while(e~=0)
    iter=iter+1;
    theta_old=theta;
    dist_all=[];
    for j=1:m
        dist=sum(((ones(N,1)*theta(:,j)'-X').^2)'); % sum of l dimensions
        dist_all=[dist_all; dist];
    end
    [q1,bel]=min(dist_all); % get min value of every col of dist_all: mxN
    J=sum(min(dist_all));
    % bel =
    % 1     2     1
    % (bel==1)=
    % 1     0     1 % choose 1st class
    for j=1:m
        if(sum(bel==j)~=0)
            theta(:,j)=sum(X'.*((bel==j)'*ones(1,l))) / sum(bel==j);
        end
    end
    e=sum(sum(abs(theta-theta_old)));
end

fuzzy_c_means.m

Fuzzy Clustering

homework7_ZhankunLuo

In fcm.m, define
d(xi,θj(t))=[xiθj(t)]T[xiθj(t)]i=1Nuij(t1)d(xi,θj(t))θj=2i=1Nuij(t1)[θj(t)xi]=0 d(x_i, \theta_j(t))=[x_i - \theta_j(t)]^T[x_i - \theta_j(t)]\\ \sum_{i=1}^{N} u_{ij}(t-1) \frac{\partial d(x_i, \theta_j(t))}{\partial \theta_j} = 2\sum_{i=1}^{N} u_{ij}(t-1) [\theta_j(t) - x_i] = 0\\
because θj(t)\theta_j(t) is not related to ii
θj(t)=i=1Nuij(t1)xii=1Nuij(t1) \theta_{j}(t) = \frac {\sum_{i=1}^{N} u_{ij}(t-1)x_i}{\sum_{i=1}^{N} u_{ij}(t-1)}\\

code

function [theta,U,obj_fun]=fuzzy_c_means(X,m,q)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FUNCTION
%  [w, U, obj_fun]=fuzzy_c_means(X,m,q)
% This function applies the FCM algorithm by calling the corresponding
% MATLAB function, called "fcm". 
%
% INPUT ARGUMENTS:
%  X:   lxN matrix, each column of which corresponds to
%       an l-dimensional data vector.
%  m:   the number of clusters.
%  q:   the fuzzifier.
% 
% OUTPUT ARGUMENTS:
%  theta:   lxm matrix, each column of which corresponds to
%           a cluster representative, after the convergence of the
%           algorithm. 
%  U:       Nxm matrix containing in its i-th row
%           the ``grade of membership'' of the data vector xi to the m
%           clusters.  
%  obj_fun: a vector, whose t-th coordinate is the value of the cost
%           function, J, for the clustering produced at the t-th teration.
%
% (c) 2010 S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X=X';
options(1)=q;
% options(1): Exponent for the partition matrix U. Default: 2.0
% options(2): Maximum number of iterations. Default: 100
% options(3): Minimum amount of improvement. Default: 1e-5
% options(4): Information displayed during iteration. Default: 1.
[theta,U,obj_fun] = fcm(X,m,options);
theta=theta';
U=U';

where function: [center, U, obj_fcn] = fcm(data, cluster_n)

[center, U, obj_fcn] = fcm(data, cluster_n)

  • applies the fuzzy c-means clustering method to a given data set.

The input arguments of this function are

  • data:

    data set to be clustered;

    each row is a sample data point

  • cluster_n:

    number of clusters (greater than one)

The output arguments of this function are

  • center:

    matrix of final cluster centers where each row provides the center coordinates

  • U:

    final fuzzy partition matrix (or membership function matrix)

  • obj_fcn:

    values of the objective function during iterations

Exercise

Exercise 7.5.1

  1. Generate and plot a data set X2X_2, that consists of 300 2-dimensional stemming from the normal distribution with mean m1=[0,0]Tm_1 = [0, 0]^T and covariance matrix equal to the 2 x 2 identity matrix.

    repeat step I of Example 7.4.2

    randn('seed', 0);
    m = [0 0; 4 0; 0 4; 5 4];
    S(:,:,1) = eye(2);
    S(:,:,2) = [1.0 .2; .2 1.5];
    S(:,:,3) = [1.0 .4; .4 1.1];
    S(:,:,4) = [.3 .2; .2 .5];
    n_points = 300 * ones(1,4); % Number of points per group
    
  2. Apply the k-means algorithm for m=2,3m = 2, 3 and random initialization if θj\theta_j's, and draw conclusions.

exercise_751.m

% Exercise 7.5.1
% NOTE: this example generates a total of six figures.
close('all');
clear
% 1. To generate and plot X2, work as in Example 7.4.2, where now the means
% of the Gaussians are different. The plot shows that X3 contains four
% clearly separated compact clusters.
randn('seed', 0);
m = [0 0; 4 0; 0 4; 5 4];
S(:,:,1) = eye(2);
S(:,:,2) = [1.0 .2; .2 1.5];
S(:,:,3) = [1.0 .4; .4 1.1];
S(:,:,4) = [.3 .2; .2 .5];
n_points = 300 * ones(1,4); % Number of points per group
X2=[];
for i=1:4
    X2=[X2; mvnrnd(m(i,:),S(:,:,i),n_points(i))];
end
X2=X2'; % X3: lxN=2x1200
figure(1), plot(X2(1,:),X2(2,:),' b.')
figure(1), axis equal

% 2. To apply the k-means algorithm, for m = 4 and random initialization
m = 4;
[l, N] = size(X2);
rand('seed', 0)
theta_ini = rand(l, m);
[theta, bel, J] = k_means(X2, theta_ini);
% Plot X3, using different colors for points from different clusters
figure(2), plot_cluster(X2, bel, theta)
figure(2), axis equal

%  3. Work as in step 2, for m = 2
m = 2;
[l, N] = size(X2);
rand('seed', 0)
theta_ini = rand(l, m);
[theta, bel, J] = k_means(X2, theta_ini);
figure(3), plot_cluster(X2, bel, theta)
figure(3), axis equal

% 4. Work as in step 2, for m = 3
m = 3;
[l, N]=size(X2);
rand('seed', 0)
theta_ini = rand(l, m);
[theta,bel,J] = k_means(X2, theta_ini);
figure(4), plot_cluster(X2, bel, theta)
figure(4), axis equal
    
% 5. Work as in step 2 with different theta_ini set1
m = 4;
[l, N] = size(X2);
theta_ini = [-2 -2; -2.1 -2.1; -2 -2.2; -2.1 -2.2]';
[theta, bel, J] = k_means(X2, theta_ini);
figure(5), plot_cluster(X2, bel, theta)
figure(5), axis equal

% 6. Work as in step 2 with different theta_ini set2
m = 4;
[l,N] = size(X2);
rand('seed', 0)
theta_ini = rand(l, m);
theta_ini(:, m) = [5 5];
[theta, bel, J] = k_means(X2, theta_ini);
figure(6), plot_cluster(X2, bel, theta)
figure(6), axis equal

Result

  1. original data
    homework7_ZhankunLuo
  2. m = 4, theta_ini = rand(l, m)
    homework7_ZhankunLuo
  3. m = 2, theta_ini = rand(l, m)
    homework7_ZhankunLuo
  4. m = 3, theta_ini = rand(l, m)
    homework7_ZhankunLuo
  5. m = 4, theta_ini = [-2 -2; -2.1 -2.1; -2 -2.2; -2.1 -2.2]’
    homework7_ZhankunLuo
  6. m = 4, theta_ini = rand(l, m), theta_ini(:, m) = [12 12];
    homework7_ZhankunLuo

Conclusion

  1. When m=2,3,4m = 2, 3, 4, divisions of clusters would be different

    So, for different mm, it could get different divisions.

  2. Sometimes when θj(0),(j=1,...,m)\theta_j(0), (j =1, ... , m) are not set properly,

    Division would be trapped in local best value,

    Thus, it would not get global best value ( which is unique for mm = certain value)

Exercise 7.5.9

  1. Apply the FCM on the data set X3X_3 generated in Example 7.5.1 for q=2,q=10q = 2, q = 10 and q=25q = 25.
  2. Define and plot the three corresponding hard clustering, as discussed, as discussed previously.
  3. Compare the uiju_{ij} parameters and the θj\theta_j's for the three cases and draw conclusions.

exercise_759.m

% Exercise 7.5.9
% NOTE: this example generates a total of 4 figures: 1 original + 3 cases
close('all');
clear
% 1. To generate and plot X2, work as in Example 7.4.2, where now the means
% of the Gaussians are different. The plot shows that X3 contains four
% clearly separated compact clusters.
randn('seed', 0);
mu = [0 0; 4 0; 0 4; 5 4];
S(:,:,1) = eye(2);
S(:,:,2) = [1.0 .2; .2 1.5];
S(:,:,3) = [1.0 .4; .4 1.1];
S(:,:,4) = [.3 .2; .2 .5];
n_points = 300 * ones(1,4); % Number of points per group
m = 4; % cluster number
X2=[];
for i=1:4
    X2=[X2; mvnrnd(mu(i,:),S(:,:,i),n_points(i))];
end
X2=X2'; % X3: lxN=2x1200
[l, N] = size(X2); % l: deminsion   N: point number
rand('seed', 0)
theta_ini = rand(l, m);
figure(1), plot(X2(1,:),X2(2,:),' b.')
figure(1), axis equal

% 2. To apply the k-means algorithm, for q = 2 and random initialization
q = 2;
[theta, U, obj_fun] = fuzzy_c_means(X2, m, q);
[U_max, bel] = max(U');
% Plot X3, using different colors for points from different clusters
figure(2), hold on, plot_cluster(X2, bel, theta)
figure(2), axis equal

%  3. Work as in step 2, for q = 10
q = 10;
[theta, U, obj_fun] = fuzzy_c_means(X2, m, q);
[U_max, bel] = max(U');
figure(3), hold on, plot_cluster(X2, bel, theta)
figure(3), axis equal

% 4. Work as in step 2, for q = 25
q = 25;
[theta, U, obj_fun] = fuzzy_c_means(X2, m, q);
[U_max, bel] = max(U');
figure(4), hold on, plot_cluster(X2, bel, theta)
figure(4), axis equal    

Result

  1. original data
    homework7_ZhankunLuo
  2. q = 2
    homework7_ZhankunLuo
  3. q = 10
    homework7_ZhankunLuo
  4. q = 25
    homework7_ZhankunLuo
>> exercise_759
% q = 2
Iteration count = 1, obj. fcn = 4374.311623
Iteration count = 2, obj. fcn = 3344.540096
Iteration count = 3, obj. fcn = 3335.790961
Iteration count = 4, obj. fcn = 3304.365018
Iteration count = 5, obj. fcn = 3195.008604
Iteration count = 6, obj. fcn = 2884.099872
Iteration count = 7, obj. fcn = 2364.754387
Iteration count = 8, obj. fcn = 2037.567145
Iteration count = 9, obj. fcn = 1834.607024
Iteration count = 10, obj. fcn = 1606.009571
Iteration count = 11, obj. fcn = 1483.160811
Iteration count = 12, obj. fcn = 1454.601969
Iteration count = 13, obj. fcn = 1450.174378
Iteration count = 14, obj. fcn = 1449.552804
Iteration count = 15, obj. fcn = 1449.463887
Iteration count = 16, obj. fcn = 1449.450440
Iteration count = 17, obj. fcn = 1449.448266
Iteration count = 18, obj. fcn = 1449.447888
Iteration count = 19, obj. fcn = 1449.447816
Iteration count = 20, obj. fcn = 1449.447801
Iteration count = 21, obj. fcn = 1449.447798
% q = 10
Iteration count = 1, obj. fcn = 14.151425
Iteration count = 2, obj. fcn = 0.051120
Iteration count = 3, obj. fcn = 0.050908
Iteration count = 4, obj. fcn = 0.050800
Iteration count = 5, obj. fcn = 0.050680
Iteration count = 6, obj. fcn = 0.050538
Iteration count = 7, obj. fcn = 0.050373
Iteration count = 8, obj. fcn = 0.050195
Iteration count = 9, obj. fcn = 0.049981
Iteration count = 10, obj. fcn = 0.049709
Iteration count = 11, obj. fcn = 0.049374
Iteration count = 12, obj. fcn = 0.048968
Iteration count = 13, obj. fcn = 0.048510
Iteration count = 14, obj. fcn = 0.048036
Iteration count = 15, obj. fcn = 0.047543
Iteration count = 16, obj. fcn = 0.047002
Iteration count = 17, obj. fcn = 0.046502
Iteration count = 18, obj. fcn = 0.046041
Iteration count = 19, obj. fcn = 0.045677
Iteration count = 20, obj. fcn = 0.045404
Iteration count = 21, obj. fcn = 0.045249
Iteration count = 22, obj. fcn = 0.045160
Iteration count = 23, obj. fcn = 0.045097
Iteration count = 24, obj. fcn = 0.045053
Iteration count = 25, obj. fcn = 0.045025
Iteration count = 26, obj. fcn = 0.045009
Iteration count = 27, obj. fcn = 0.044999
% q = 25
Iteration count = 1, obj. fcn = 1.678658
Iteration count = 2, obj. fcn = 0.000000
Iteration count = 3, obj. fcn = 0.000000

Conclusion

  1. When q=2,10,25q = 2, 10, 25, divisions of clusters are different

    So, for different qq, it could get different divisions.

  2. When q=25q = 25, its division has great difference to the original division

    So, we need choose proper qq to certain clustering problems.