Preface
These notes are based on Bertsekas' course at MIT; see the lectures and other resources from that course to complete your understanding.
1 Outline
The textbook for Chapter 1 is Bertsekas' book [1]. Chapters 2 and 3 are adapted from Sutton and Barto's book [5, Ch. 3 and Ch. 4]. For applications and broader connections with machine learning, we refer to [3]. We also recommend a handbook of algorithms [6]. For applications with implemented code, we follow the books [2,4]. Source code for the multi-armed bandit algorithms is available at https://github.com/terrence-ou/Reinforcement-Learning-2nd-Edition-Notes-Codes.git
[1]
D.P. Bertsekas, Dynamic programming and optimal control, Vol. I, 3rd ed., Athena Scientific, Belmont, MA, 2005.
[5]
R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction, 2nd ed., MIT Press, Cambridge, MA, 2018.
[3]
S.L. Brunton, J.N. Kutz, Data-driven science and engineering, Cambridge University Press, Cambridge, 2019.
[6]
C. Szepesvári, Algorithms for reinforcement learning, Springer, Cham, 2022.
[2]
[4]
J. Stachurski, Dynamic programming, Vol. 1, GitHub repository, 2024.