Approaches in handling missing values in randomly amplified polymorphic DNA (RAPD) analysis / Mabele Palmes Malagamba.
Material type: Text
| Item type | Current library | Collection | Call number | Status | Date due | Barcode |
|---|---|---|---|---|---|---|
| | University Library Theses | Room-Use Only | LG993.5 2006 A64 M36 | Not For Loan | | 3UPML00011647 |
| | University Library Archives and Records | Preservation Copy | LG993.5 2006 A64 M36 | Not For Loan | | 3UPML00021976 |
Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2006
Randomly amplified polymorphic DNA (RAPD) experiments produce large amounts of data. Hierarchical clustering is commonly used to construct a dendrogram from such data. However, RAPD data usually contain missing values caused by experimental errors, which even at a low rate can be a major drawback for computing similarity and applying clustering methods. Thus, missing values were treated with case deletion and data imputation. This study focuses on approaches to handling missing values in RAPD data and their effects on the construction of a phylogenetic tree. It compares existing techniques for handling missing values, such as zero replacement and K-nearest neighbor (KNN) imputation, and develops an alternative approach for obtaining similarity indices that accommodates incomplete data sets. The results present the modified similarity coefficients and comparative experiments on the methods for handling missing values. In general, KNN outperformed zero replacement and the modified similarity coefficients at almost all levels of degradation. However, at a low rate of missing values, the modified similarity coefficients outperformed both KNN and zero replacement. Among the clustering algorithms, single linkage appeared the most stable across levels of degradation, average linkage performed fairly, and complete linkage gave the worst results because of its low recovery at all levels of degradation. The Jaccard and Sorensen-Dice similarity coefficients performed similarly. Thus, the impact of missing values depends on the hierarchical clustering algorithm used, and the performance of an approach to handling missing values depends on the rate of missing values.
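As a rough illustration of the quantities named in the abstract, the sketch below computes the standard Jaccard and Sorensen-Dice coefficients on binary band profiles and contrasts two ways of dealing with a missing score. The pairwise skipping of missing positions is only an assumed stand-in for a coefficient that tolerates incomplete data; the record does not give the thesis's actual modified coefficients. The names `jaccard`, `dice`, and `zero_replace`, and the use of `None` for a missing band, are hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

# Band score: 1 = band present, 0 = band absent, None = missing (assumed encoding).
Band = Optional[int]


def _counts(x: Sequence[Band], y: Sequence[Band]) -> Tuple[int, int, int]:
    """Count shared bands (a) and bands unique to x (b) or y (c).

    Positions where either profile is missing are skipped. This pairwise
    skipping is an assumption for illustration, not the thesis's method.
    """
    a = b = c = 0
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue
        if xi == 1 and yi == 1:
            a += 1
        elif xi == 1:
            b += 1
        elif yi == 1:
            c += 1
    return a, b, c


def jaccard(x: Sequence[Band], y: Sequence[Band]) -> float:
    """Jaccard coefficient a / (a + b + c); 0.0 if no bands are comparable."""
    a, b, c = _counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(x: Sequence[Band], y: Sequence[Band]) -> float:
    """Sorensen-Dice coefficient 2a / (2a + b + c); 0.0 if no bands are comparable."""
    a, b, c = _counts(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def zero_replace(x: Sequence[Band]) -> List[int]:
    """Zero replacement: treat every missing score as an absent band."""
    return [0 if v is None else v for v in x]
```

For two profiles such as `[1, 1, 0, 1, None]` and `[1, 0, 0, 1, 1]`, calling `jaccard` on the raw profiles ignores the last position, while calling it on `zero_replace` of the first profile counts that position as a mismatch, so the two treatments yield different similarity values and can therefore lead to different dendrograms.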