MARC details
000 - LEADER |
fixed length control field |
03200nam a22004453a 4500 |
001 - CONTROL NUMBER |
control field |
UPMIN-00003300442 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
UPMIN |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20230202172209.0 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
230202b |||||||| |||| 00| 0 eng d |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
DLC |
Transcribing agency |
UPMin |
Modifying agency |
upmin |
041 ## - LANGUAGE CODE |
Language code of text/sound track or separate title |
eng |
090 #0 - LOCALLY ASSIGNED LC-TYPE CALL NUMBER (OCLC); LOCAL CALL NUMBER (RLIN) |
Classification number (OCLC) (R) ; Classification number, CALL (RLIN) (NR) |
LG993.5 2009 |
Local cutter number (OCLC) ; Book number/undivided call number, CALL (RLIN) |
A64 M67 |
100 ## - MAIN ENTRY--PERSONAL NAME |
Personal name |
Moreno, Iresh Granada. |
9 (RLIN) |
2090 |
245 #2 - TITLE STATEMENT |
Title |
A modified k-means clustering algorithm with Mahalanobis distance for clustering incomplete data sets / |
Statement of responsibility, etc. |
Iresh Granada Moreno. |
260 ## - PUBLICATION, DISTRIBUTION, ETC. |
Date of publication, distribution, etc. |
2009 |
300 ## - PHYSICAL DESCRIPTION |
Extent |
94 leaves. |
502 ## - DISSERTATION NOTE |
Dissertation note |
Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2009 |
520 3# - SUMMARY, ETC. |
Summary, etc. |
Cluster analysis is the art of finding groups in data in such a way that objects in the same group are similar to each other, whereas objects in different groups are as dissimilar as possible. The most commonly used clustering algorithm is k-means with Euclidean distance. However, this distance function neglects the covariance among the variables when calculating distances. To account for this, the Mahalanobis distance is used. Missing values are nevertheless inevitable in practice, and a data set that contains them cannot be clustered directly. Existing methods for treating missing values, such as case deletion and mean imputation, are prone to producing erroneous conclusions because they significantly reduce the data set or impute unreliable estimates. To avoid these problems, the two most essential elements of the k-means clustering algorithm, allocation and representation, were modified. Allocation, which was defined by the Mahalanobis distance, was modified to compute distances between two vectors, and to compute variances, when some values are unknown. Representation, which was defined by the arithmetic mean, was modified to estimate the mean when one or more values of a given attribute are unknown. The proposed algorithm was applied to incomplete Iris and BUPA data sets simulated under MCAR and MAR assumptions with different levels of missing values. Under MAR, case deletion had the highest cluster recovery when missing values occurred in 5% of the samples. However, it was clearly outperformed by the proposed algorithm as the occurrence of missing values in the sample increased. In general, the modified k-means with Mahalanobis distance outperformed the other algorithms when applied to both data sets. |
610 ## - SUBJECT ADDED ENTRY--CORPORATE NAME |
Corporate name or jurisdiction name as entry element |
Philippine Eagle Foundation. |
9 (RLIN) |
2091 |
610 ## - SUBJECT ADDED ENTRY--CORPORATE NAME |
Corporate name or jurisdiction name as entry element |
Philippine Eagle Foundation |
Geographic subdivision |
Davao City |
-- |
Philippines. |
9 (RLIN) |
2092 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Clustering. |
9 (RLIN) |
366 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
K-means clustering. |
9 (RLIN) |
2093 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Mahalanobis distance. |
9 (RLIN) |
2094 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Clustering algorithm. |
9 (RLIN) |
1300 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Data sets. |
9 (RLIN) |
1992 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Modified algorithm. |
9 (RLIN) |
2095 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Incomplete data. |
9 (RLIN) |
2096 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Missing Values. |
9 (RLIN) |
990 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Iris data base. |
9 (RLIN) |
2097 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
BUPA data base. |
9 (RLIN) |
2098 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Cluster analysis. |
9 (RLIN) |
2099 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Adjusted Rand Index. |
9 (RLIN) |
2100 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Multivariate techniques. |
9 (RLIN) |
2101 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
MAR (Missing at random). |
9 (RLIN) |
2102 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
MCAR (Missing completely at random). |
9 (RLIN) |
2103 |
658 ## - INDEX TERM--CURRICULUM OBJECTIVE |
Main curriculum objective |
Undergraduate Thesis |
Curriculum code |
AMAT200 |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) |
a |
Fi |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) |
a |
UP |
942 ## - ADDED ENTRY ELEMENTS (KOHA) |
Source of classification or shelving scheme |
Library of Congress Classification |
Koha item type |
Thesis |