Modified principal feature analysis (MPFA) as a feature selection algorithm for clustering large data sets within missing values.

By:

Aguelo, Renz Marion Y

Material type: Text
Language: English
Publication details: 2010
Description: 63 leaves

Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2010

Abstract: Clustering is a technique for placing objects into groups such that objects within the same group exhibit a high degree of similarity, while objects from different groups exhibit a high degree of disparity. Unfortunately, high-dimensional datasets often contain unimportant features that can adversely affect the performance of clustering algorithms. Feature selection has emerged as a reduction technique that retains only the important features of the data, and it is commonly applied in preparation for clustering. However, the use of clustering and feature selection is limited to complete datasets. This study modified Principal Feature Analysis in order to handle missing values. Modified Principal Feature Analysis (MPFA) makes use of all the available information in the data. MPFA was compared with case deletion, mean imputation, and KNN imputation, which are common methods of handling missing values. In general, MPFA reduced the datasets with a very low percentage of retention, and the clustering results on the reduced datasets were of low quality. In comparison with the existing approaches, MPFA also exhibited the least satisfactory performance. This is attributed to an inappropriate use of correlation and an erroneous choice of datasets. The competing approaches were further applied to actual incomplete data, and a similar ranking of performance was observed.
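Illustrative note: the three baseline methods named in the abstract (case deletion, mean imputation, and KNN imputation) can be sketched as below. This is a minimal sketch and not the thesis's implementation. The function names, the use of None to mark a missing value, the distance computed over shared observed features, and the default k are all assumptions made for illustration. The sketch also assumes every column has at least one observed value.

```python
import math

def is_missing(v):
    # Assumption: a missing entry is encoded as None or NaN.
    return v is None or (isinstance(v, float) and math.isnan(v))

def case_deletion(rows):
    """Keep only the rows that have no missing entries."""
    return [list(r) for r in rows if not any(is_missing(v) for v in r)]

def mean_imputation(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not is_missing(r[j])]
        means.append(sum(observed) / len(observed))
    return [[means[j] if is_missing(v) else v for j, v in enumerate(r)]
            for r in rows]

def knn_imputation(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k
    nearest rows that observe it. Distance uses only the features that
    are observed in both rows (an illustrative choice)."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if not is_missing(x) and not is_missing(y)]
        if not shared:
            return math.inf
        # Root mean squared difference, so rows with different
        # amounts of overlap remain comparable.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if is_missing(v):
                # Candidate donors: every other row that observes feature j.
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and not is_missing(other[j])
                )
                nearest = [value for _, value in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled

# Hypothetical toy data: four objects, three features, two missing entries.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, None],
    [5.2, 6.2, 7.0],
]
complete_cases = case_deletion(data)
mean_filled = mean_imputation(data)
knn_filled = knn_imputation(data, k=1)
```

Case deletion discards every object with a missing entry, so it shrinks the dataset. The two imputation methods keep all objects but replace the missing entries with estimates. Any of the three outputs is a complete dataset that can then be passed to a feature selection or clustering step.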


Holdings
Item type	Current library	Collection	Call number	Status	Date due	Barcode
Thesis	University Library General Reference	Room-Use Only	LG993.5 2010 A64 A48	Not For Loan		3UPML00012572
Thesis	University Library Archives and Records	Preservation Copy	LG993.5 2010 A64 A48	Not For Loan		3UPML00033354


College of Science and Mathematics

