000 03200nam a22004453a 4500
001 UPMIN-00003300442
003 UPMIN
005 20230202172209.0
008 230202b |||||||| |||| 00| 0 eng d
040 _aDLC
_cUPMin
_dupmin
041 _aeng
090 0 _aLG993.5 2009
_bA64 M67
100 _aMoreno, Iresh Granada.
_92090
245 2 _aA modified k-means clustering algorithm with Mahalanobis distance for clustering incomplete data sets /
_cIresh Granada Moreno.
260 _c2009
300 _a94 leaves.
502 _aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2009
520 3 _aCluster analysis is the art of finding groups in data such that objects in the same group are similar to each other, whereas objects in different groups are as dissimilar as possible. The most commonly used clustering algorithm is k-means with Euclidean distance. However, this distance function neglects the covariance among the variables when calculating distances. To account for this issue, the Mahalanobis distance is used. However, the occurrence of missing values is inevitable, and clustering such a data set is impossible. Existing methods for treating missing values, such as case deletion and mean imputation, are prone to producing erroneous conclusions because they significantly reduce the data set or impute unreliable estimates. To avoid these problems, the two most essential elements of the k-means clustering algorithm, allocation and representation, were modified. Allocation, which was defined by the Mahalanobis distance, was modified to compute distances between two vectors and to compute variances when some values are unknown. Representation, which was defined by the arithmetic mean, was modified to estimate the mean when a given attribute has one or more unknown values. The proposed algorithm was applied to the Iris and Bupa incomplete data sets simulated under MCAR and MAR assumptions with different levels of missing values. Under MAR, case deletion has the highest cluster recovery at 5% of the samples. However, it was outperformed by the proposed algorithm as the occurrence of missing values in the sample increased. In general, the modified k-means with Mahalanobis distance outperformed the other algorithms when applied to both data sets.
610 _aPhilippine Eagle Foundation.
_92091
610 _aPhilippine Eagle Foundation
_zDavao City
_zPhilippines.
_92092
650 1 7 _aClustering.
_9366
650 1 7 _aK-means clustering.
_92093
650 1 7 _aMahalanobis distance.
_92094
650 1 7 _aClustering algorithm.
_91300
650 1 7 _aData sets.
_91992
650 1 7 _aModified algorithm.
_92095
650 1 7 _aIncomplete data.
_92096
650 1 7 _aMissing values.
_9990
650 1 7 _aIris data base.
_92097
650 1 7 _aBUPA data base.
_92098
650 1 7 _aCluster analysis.
_92099
650 1 7 _aAdjusted Rand Index.
_92100
650 1 7 _aMultivariate techniques.
_92101
650 1 7 _aMAR (Missing at random).
_92102
650 1 7 _aMCAR (Missing completely at random).
_92103
658 _aUndergraduate Thesis
_cAMAT200
905 _aFi
905 _aUP
942 _2lcc
_cTHESIS
999 _c2269
_d2269