Aprendizaje por refuerzo: Fundamentos teóricos del algoritmo AlphaZero e implementación (Reinforcement learning: Theoretical foundations of the AlphaZero algorithm and implementation)

Maurel Serrano, Alberto

Files

MAUREL SERRANO 60108_ALBERTO_MAUREL_SERRANO_Aprendizaje_por_refuerzo_Fundamentos_teoricos_del_Algoritmo_AlphaZero_e_implementacion_784051_607075157.pdf (2.92 MB)

Publication Date

2021

Authors

Maurel Serrano, Alberto

Advisors (or tutors)

Palomino Tarjuelo, Miguel

Verdejo López, José Alberto

Abstract

In 2016, the DeepMind team surprised the world by creating an artificial intelligence able to play Go at a superhuman level, beating one of the most decorated players in history. However, AlphaGo was a complex algorithm that required a great deal of computing power. A year later, AlphaZero was published. The beauty of this algorithm lay not only in its lower computational requirements and its applicability to more games, but also in the elegance with which it combined its components to outperform every other algorithm up to that point. The aim of this work is to explain how AlphaZero works. To that end, we first introduce the basic theoretical notions of reinforcement learning and neural networks, and then the specific details of the algorithm. In addition, a reduced version of the algorithm is implemented and trained to play Tic-Tac-Toe and Connect 4, and the results obtained are analyzed.

Description

Bachelor's thesis (Trabajo de Fin de Grado) for the Double Degree in Computer Engineering and Mathematics, Facultad de Informática UCM, Departamento de Sistemas Informáticos y Computación, academic year 2020-21.
