Preface

Author

Saúl Díaz Infante, David González Sánchez

Published

May 30, 2024

These notes are based on Bertsekas’ course at MIT; see all lectures and the other resources for a more complete understanding.

1 Outline

The textbook for Chapter 1 is Bertsekas’ book [1]. Chapters 2 and 3 are adapted from Sutton and Barto’s book [5, Ch. 3, Ch. 4]. For applications and broader connections with machine learning, we refer to [3]. We also recommend a handbook of algorithms [6]. For applications with implemented code, we follow the books [2,4]. Source code for the multi-armed bandit algorithms is available at https://github.com/terrence-ou/Reinforcement-Learning-2nd-Edition-Notes-Codes.git

[1] D.P. Bertsekas, Dynamic programming and optimal control. Vol. I, 3rd ed., Athena Scientific, Belmont, MA, 2005.
[5] R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction, 2nd ed., MIT Press, Cambridge, MA, 2018.
[3] S.L. Brunton, J.N. Kutz, Data-driven science and engineering, Cambridge University Press, Cambridge, 2019.
[6] C. Szepesvári, Algorithms for reinforcement learning, Springer, Cham, 2022.
[2]
[4] J. Stachurski, Dynamic programming volume 1, GitHub repository, 2024.

1.1 Bibliography