Simón, Alejandro
(2020)
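*Métodos bayesianos para comparar el funcionamiento de algoritmos sobre un conjunto de datos médicos.* (Bayesian methods for comparing the performance of algorithms on a medical data set.)
[Bachelor's thesis (Trabajo Fin de Grado)]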


## Abstract

One of the greatest challenges we face when adapting clustering algorithms to immune disorder diagnoses is selecting hyperparameters for unsupervised clustering algorithms in a way that is optimal for the problem under study. In this work we address this challenge by proposing a statistical assessment model that allows the empirical comparison of algorithms, an essential step in heuristic optimization. The statistical assessment adapts the Bayesian procedure proposed in [7] to compare the performance of algorithms on several test problems. Until now, researchers in the field of statistical assessment have relied on null hypothesis statistical tests. Lately, however, concerns about their use [5, 6] have emerged and, in many fields, other (Bayesian) alternatives are being considered. In this project, we propose a Bayesian analysis based on the Plackett-Luce model over rankings, which allows several algorithms to be considered at the same time. The main advantage of the proposed method is that it directly answers questions such as "what is the marginal probability that a given clustering algorithm is the best one?". Furthermore, because the analysis is Bayesian, it naturally provides information about the uncertainty that remains after the data have been observed.

To test the proposed approach, we carry out two experiments. In the first, we use controlled scenarios to show, as a sanity check, that the model indeed provides the information we are looking for. To do so, instead of using actual rankings of algorithms, we simulate them by sampling from a probabilistic model defined over permutations; in particular, we consider a Mallows model with Kendall's distance. In this way, we can set the number of algorithms and instances simulated and, more importantly, we can obtain the true marginal probabilities associated with the first position. The second experiment shows how the procedure can be applied to an actual comparison of algorithms in a real-life setting. We adapt clustering algorithms to immune disorder diagnoses and compare their performance, under different hyperparameter choices, at detecting flares and remission periods in lupus patients' records. Specifically, the clustering algorithms we apply are K-Means, Hierarchical Clustering and DBSCAN. To estimate the marginal probability that a given clustering algorithm is the best one, we resort to a Bayesian analysis based on the Plackett-Luce model applied to rankings, which allows us to determine the best combination of hyperparameters and clustering technique for detecting outbreaks in immune disorder diagnoses.

The document is organized as follows. In Chapter 2, we motivate the Bayesian analysis approach. In Chapter 3, we present the Bayesian model and define and detail the mathematical concepts needed to follow the project; at the end of Chapter 3, we run a synthetic test to show that the model works as expected. In Chapter 4, we apply the Plackett-Luce model to a real-life problem, specifically the detection of flares and remission periods in lupus patients' records. Finally, in Section 5, we draw the main conclusions of the project.
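The synthetic sanity check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's own code: the function names are hypothetical, and the sampler uses the repeated insertion method, a standard way to draw from a Mallows model with Kendall's distance, which may differ from the author's choice. The dispersion is parameterized as `phi` in (0, 1], an assumption made here for convenience. Under this model the true probability that the item at reference position i (counting from 0) is ranked first is proportional to `phi**i`, so the simulated frequencies can be checked against it.

```python
import random
from collections import Counter


def sample_mallows(n_items, phi, rng):
    """Draw one ranking from a Mallows model with Kendall's distance.

    The reference ranking is 0, 1, ..., n_items - 1 and phi in (0, 1]
    is the dispersion (phi = 1 gives the uniform distribution).
    Uses the repeated insertion method (an assumption of this sketch).
    """
    ranking = []
    for i in range(n_items):
        # Item i may be inserted at positions 0..i; inserting it at
        # position j creates (i - j) inversions, hence weight phi**(i - j).
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, i)
    return ranking


def true_first_place_probs(n_items, phi):
    """Exact marginal probability that each item is ranked first."""
    weights = [phi ** i for i in range(n_items)]
    total = sum(weights)
    return [w / total for w in weights]


def empirical_first_place_probs(n_items, phi, n_samples, seed=0):
    """Monte Carlo estimate of the same probabilities from simulated rankings."""
    rng = random.Random(seed)
    counts = Counter(
        sample_mallows(n_items, phi, rng)[0] for _ in range(n_samples)
    )
    return [counts[i] / n_samples for i in range(n_items)]


if __name__ == "__main__":
    exact = true_first_place_probs(4, 0.5)
    estimate = empirical_first_place_probs(4, 0.5, n_samples=20000)
    for item, (p, q) in enumerate(zip(exact, estimate)):
        print(f"item {item}: exact={p:.3f} simulated={q:.3f}")
```

With four simulated algorithms and `phi = 0.5`, the exact first-place probabilities are 8/15, 4/15, 2/15 and 1/15. In a sanity check of this kind, the posterior produced by a Bayesian Plackett-Luce analysis of the simulated rankings would be compared against such known values.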

Item Type: | Bachelor's thesis (Trabajo Fin de Grado) |
---|---|
Directors: | Carpio, Ana |
Uncontrolled Keywords: | Algorithms; Data analysis; Bayesian statistics |
Subjects: | Sciences > Mathematics; Sciences > Mathematics > Mathematical analysis; Sciences > Statistics > Mathematical statistics |
Degree Title: | Double degree in Computer Engineering and Mathematics |
ID Code: | 73604 |
Deposited On: | 15 Jul 2022 08:42 |
Last Modified: | 03 Aug 2022 06:43 |
