Hierarchical clustering for mixed dataset based on variance and entropy / Luchie Marie A. Labayan.

By:

Labayan, Luchie Marie A

Material type: Text

Language: English

Publication details: 2009

Description: 87 leaves

Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2009

Abstract: Hsu's coefficient for mixed data was modified in terms of the aggregation technique and the entropy-based distance function to efficiently cluster mixed datasets. Six dissimilarity coefficients are proposed, which use variance for numerical attributes and entropy measures for categorical attributes: Shannon's weighted entropy, Havrda-Charvat's structural α-entropy, and Jensen-Shannon divergence. The type of data has a significant effect on the clustering produced, as do the aggregation function used and the entropy measure employed. For data whose categorical values have no level of similarity, the proposed dissimilarity coefficients that are closely related and generated similar dendrograms are those which have the same aggregation function. Based on performance, the proposed dissimilarity coefficients that used De Carvalho's dissimilarity measure as aggregation function produced better clustering solutions. On the other hand, for data whose categorical values have different degrees of similarity, only the proposed dissimilarity coefficients that used Shannon's entropy weighted by the distance in the distance hierarchy deviated from the group. The proposed dissimilarity coefficients that used De Carvalho's extension of Ichino and Yaguchi's dissimilarity as aggregation function worked well in clustering. The six proposed dissimilarity coefficients performed better with mixed data than the existing dissimilarity measures for mixed data.
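For readers unfamiliar with the entropy measures named in the abstract, the following is a minimal sketch of their textbook definitions for a categorical attribute. It is not the thesis's dissimilarity coefficients: the record does not give the weighting, the distance hierarchy, or the aggregation functions, so none of those are reproduced here. The function names, the base-2 logarithm, and the normalization of the Havrda-Charvat entropy are assumptions made for illustration.

```python
import math
from collections import Counter


def distribution(values):
    """Relative frequency of each category in a list of observed values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def shannon_entropy(p):
    """Shannon entropy in bits: sum of p_i * log2(1 / p_i)."""
    return sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)


def havrda_charvat_entropy(p, alpha):
    """Structural alpha-entropy: (sum of p_i^alpha - 1) / (2^(1 - alpha) - 1).

    Assumes the normalization under which a fair coin has entropy 1.
    Defined for alpha > 0 and alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return (sum(v ** alpha for v in p.values()) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, M = (P + Q) / 2."""
    categories = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in categories}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


# Hypothetical categorical attribute observed in two clusters.
cluster_a = distribution(["red", "red", "blue", "blue"])
cluster_b = distribution(["red", "red", "red", "blue"])
divergence = jensen_shannon_divergence(cluster_a, cluster_b)
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 (identical distributions) and 1 (distributions with no category in common), which makes it convenient to combine with a normalized numerical term.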

Holdings
Item type	Current library	Collection	Call number	Status	Date due	Barcode
Thesis	University Library Theses	Room-Use Only	LG993.5 2009 A64 L32	Not For Loan		3UPML00012369
Thesis	University Library Archives and Records	Preservation Copy	LG993.5 2009 A64 L32	Not For Loan		3UPML00032663

Browsing College of Science and Mathematics shelves, Shelving location: Theses, Collection: Room-Use Only

LG993.5 2008 F62 T37	A major practicum report for Lanao Foundation Dairy Processing Plant, in Bangaan, Sultan Naga Dimaporo, Lanao del Norte
LG993.5 2008 F62 V54	A major practicum report for Virginia Food, Incorporated Cogon-Canamuca, Compostela, Cebu
LG993.5 2009 A64 G83	Genetic algorithm application on the nurse scheduling problem in Davao Medical Center
LG993.5 2009 A64 L32	Hierarchical clustering for mixed dataset based on variance and entropy
LG993.5 2009 A64 L46	Genetic algorithm with shuffled frog leaping algorithm for the University course timetabling problem
LG993.5 2009 A64 M36	Modified Rao diversity coefficient considering species dissimilarity with respect to biological characteristics at the class level
LG993.5 2009 A64 M67	A modified k-means clustering algorithm with Mahalanobis distance for clustering incomplete data sets

