Modified principal feature analysis (MPFA) as a feature selection algorithm for clustering large data sets with missing values.
Material type: Text
Item type | Current library | Collection | Call number | Status | Date due | Barcode |
---|---|---|---|---|---|---|
 | University Library General Reference | Room-Use Only | LG993.5 2010 A64 A48 | Not For Loan | | 3UPML00012572 |
 | University Library Archives and Records | Preservation Copy | LG993.5 2010 A64 A48 | Not For Loan | | 3UPML00033354 |
College of Science and Mathematics
Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2010
Clustering is a technique for placing objects into groups such that objects within the same group exhibit a high degree of similarity, while objects from different groups exhibit a high degree of dissimilarity. Unfortunately, high-dimensional datasets often contain unimportant features that can adversely affect the performance of clustering algorithms. Feature selection has emerged as a reduction technique that retains only the important features of the data, and it is commonly applied in preparation for clustering. However, the use of clustering and feature selection is limited to complete datasets. This study modified Principal Feature Analysis to handle missing values. Modified Principal Feature Analysis (MPFA) makes use of all the available information in the data. MPFA was compared with case deletion, mean imputation, and KNN imputation, which are common methods of handling missing values. In general, MPFA reduced the datasets with a very low percentage of retention, and the resulting clusterings were of low quality. Moreover, compared with the existing approaches, MPFA exhibited the least satisfactory performance. This is attributed to the inappropriate use of correlation and an erroneous choice of datasets. The competing approaches were further applied to an actual incomplete dataset, and a similar ranking of performance was observed.
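The three baseline approaches named in the abstract (case deletion, mean imputation, and KNN imputation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the use of `None` as the missing-value marker, the root-mean-square distance over shared features, and the sample data are all assumptions made for illustration, and every column is assumed to have at least one observed value.

```python
import math

MISSING = None  # assumed marker for a missing entry (illustrative choice)


def case_deletion(rows):
    """Drop every row that has at least one missing value."""
    return [row for row in rows if all(v is not MISSING for v in row)]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is MISSING else v for j, v in enumerate(row)]
            for row in rows]


def _distance(a, b):
    """Root-mean-square difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return math.inf  # no common features, so the rows are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_imputation(rows, k=2):
    """Fill each missing value with the column mean over the k nearest donor rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not MISSING:
                continue
            # Candidate donors: other rows that observe feature j.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not MISSING
            )
            nearest = [value for dist, value in donors[:k] if dist != math.inf]
            if nearest:  # leave the entry missing if no usable donor exists
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical toy data: four objects, three features, two missing entries.
data = [
    [1.0, 2.0, 3.0],
    [1.0, None, 3.0],
    [10.0, 20.0, 30.0],
    [2.0, 4.0, None],
]

complete_only = case_deletion(data)
mean_filled = mean_imputation(data)
knn_filled = knn_imputation(data, k=1)
```

Case deletion shrinks the dataset, whereas the two imputation methods keep every object and replace the missing entries with estimates, so a clustering algorithm can then be run on the complete result.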