Project proposal

Author

SDIV

Formulation and reinforcement learning solution of a sequential decision problem.

Requirements

The sequence of decisions must be based on a finite Markov decision process (MDP).
- The state and action spaces must be finite and discrete.
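
As a minimal sketch of what "finite and discrete" means in practice, the tables below describe a toy MDP and check that every state-action pair has a valid probability distribution over next states. The state names, action names, probabilities, rewards, and the helper `is_valid_finite_mdp` are all hypothetical and only serve as an illustration.

```python
import math

# Hypothetical toy MDP: every name and number here is illustrative.
STATES = ["low", "high", "done"]
ACTIONS = ["wait", "search"]

# TRANSITION[(s, a)] maps each reachable next state to its probability.
TRANSITION = {
    ("low", "wait"): {"low": 0.9, "high": 0.1},
    ("low", "search"): {"low": 0.2, "high": 0.8},
    ("high", "wait"): {"high": 0.5, "done": 0.5},
    ("high", "search"): {"low": 0.1, "done": 0.9},
    ("done", "wait"): {"done": 1.0},
    ("done", "search"): {"done": 1.0},
}

# REWARD[(s, a)] is the expected immediate reward of action a in state s.
REWARD = {
    ("low", "wait"): 0.0,
    ("low", "search"): -1.0,
    ("high", "wait"): 1.0,
    ("high", "search"): 2.0,
    ("done", "wait"): 0.0,
    ("done", "search"): 0.0,
}


def is_valid_finite_mdp(states, actions, transition, reward):
    """Return True if the tables describe a well-formed finite MDP."""
    for state in states:
        for action in actions:
            key = (state, action)
            # Every state-action pair needs a transition row and a reward.
            if key not in transition or key not in reward:
                return False
            probs = transition[key]
            # Next states must be known states with non-negative mass.
            if any(s not in states or p < 0.0 for s, p in probs.items()):
                return False
            # Each row must be a probability distribution.
            if not math.isclose(sum(probs.values()), 1.0):
                return False
    return True


print(is_valid_finite_mdp(STATES, ACTIONS, TRANSITION, REWARD))
```

A check like this can be run before any learning algorithm, so formulation errors surface early.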

Deadline: December 08, 2024, 23:59:00

Sub-products

Deadlines
Stage 01	November 20, 2024, 23:59
Stage 02	December 15, 2024, 23:59
Stage 03	December 18, 2024, 23:59

Stage 01: Quarto book with MDP formulation

The page must contain the report, following the template rl_bookdown_prg.qmd:
- Introduction
- Formulation of the Markov decision process
- Model dynamics
- Description and justification of the Cost (reward)
- Justification of the actions
The report must include:
- Figures that illustrate the behavior of the following elements:
  1. Policy
  2. Reward
  3. Value function evaluated for one state-action pair and transition.
  4. Environmental model
- References via bibtex.
- Output compilation for HTML and PDF formats.
- The compiled version must be hosted on GitHub or Quarto Pub.
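
For the value-function figure listed above, one possible starting point is a one-step lookahead for a single state-action pair, that is, the sum over next states s' of p(s') * (r(s') + gamma * v(s')). The sketch below assumes dictionary-based tables. The function name `action_value` and all numbers are hypothetical.

```python
def action_value(next_state_probs, rewards, state_values, gamma=0.9):
    """Return the one-step lookahead value of one state-action pair.

    Computes the sum over next states s' of
    p(s') * (r(s') + gamma * v(s')).
    """
    return sum(
        prob * (rewards[s_next] + gamma * state_values[s_next])
        for s_next, prob in next_state_probs.items()
    )


# Hypothetical numbers for a single pair (state, action).
next_state_probs = {"low": 0.2, "high": 0.8}  # p(s' | s, a)
rewards = {"low": -1.0, "high": 2.0}          # r(s, a, s')
state_values = {"low": 0.0, "high": 10.0}     # current estimate of v(s')

q_value = action_value(next_state_probs, rewards, state_values, gamma=0.5)
print(round(q_value, 3))
```

Evaluating this quantity over a grid of discount factors or value estimates gives data that can be plotted directly in the report.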

Stage 02: Python code Implementation

Only code that runs without errors will be accepted.
Code must follow the PEP 8 style guide.
All functions must include docstrings.
Extras:
Packaging and documentation: 200 extra XP.
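
To illustrate the style requirements, here is a sketch of a PEP 8-formatted function with a docstring. It implements a single tabular Q-learning update, chosen only as an example. The function name, the dictionary layout of `q_table`, and the NumPy-style docstring sections are assumptions, not a prescribed format.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place.

    Parameters
    ----------
    q_table : dict
        Maps each state to a dict of action values.
    state, action : hashable
        The state-action pair being updated.
    reward : float
        Reward observed after taking ``action`` in ``state``.
    next_state : hashable
        State observed after the transition.
    alpha : float
        Learning rate.
    gamma : float
        Discount factor.

    Returns
    -------
    float
        The updated value of ``q_table[state][action]``.
    """
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state table used only to exercise the function.
q_table = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 2.0}}
new_value = q_learning_update(q_table, "s0", "a", reward=1.0,
                              next_state="s1", alpha=0.5, gamma=0.5)
print(new_value)
```

Tools such as flake8 or pycodestyle can check PEP 8 compliance automatically before submission.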

Stage 03: Video Presentation

A video of at most 20 minutes, uploaded to YouTube, presenting the results and insights of your project.

Suggested project list:

Reinforcement learning simulation of the Tic-Tac-Toe game with the SARSA or Q-learning algorithms [2]
The movement of a Recycling Robot [2]
The replacement of a bus engine [5]; see [6, dp.pdf, p. 130]
Optimal inventories; see [6, dp.pdf, p. 147]
Multi-Armed Bandits [2]
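
As a rough idea of the scale of the simplest suggestion, the multi-armed bandit can be sketched in a few lines with an epsilon-greedy strategy and sample-average estimates. The function name, the `pull` callable interface, and the Bernoulli arms below are hypothetical choices, not part of the assignment.

```python
import random


def epsilon_greedy_bandit(pull, n_arms, steps=1000, epsilon=0.1, seed=0):
    """Estimate arm values with an epsilon-greedy strategy.

    ``pull`` is a callable that takes an arm index and returns a reward.
    Returns the sample-average value estimates and the pull counts.
    """
    rng = random.Random(seed)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniform random arm
        else:
            # exploit: arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = pull(arm)
        counts[arm] += 1
        # Incremental form of the sample average.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
TRUE_MEANS = [0.2, 0.5, 0.8]
env_rng = random.Random(1)
estimates, counts = epsilon_greedy_bandit(
    lambda arm: 1.0 if env_rng.random() < TRUE_MEANS[arm] else 0.0,
    n_arms=len(TRUE_MEANS),
)
print(estimates, counts)
```

The other suggested projects need a full state space, so they call for SARSA or Q-learning rather than this stateless estimate.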

Project list
Project	Author	Reference	GitHub Repo
Dynamic Portfolio Analysis	GABRIEL MIRANDA GAMEZ	[1, Sec. 4.3]	[gh_page](https://gabo-always-learning.quarto.pub/project-dpa/)
Analyzing Learned Markov Decision Processes using Model Checking for Providing Tactical Advice in Professional Soccer	JOSE ITALO SANCHEZ BERMUDEZ	[3]	[gh_page](https://italosanchezb.github.io/Proyecto-Final-RL/)
An MDP model for the collective behavior in vaccination campaigns	IRASEMA PEDROZA MEZA		[GitHub](https://github.com/IrasemaPM/Proyecto_psi)
Inventory model for perishable foods	DAVID PEÑA PERALTA	[6]	[gh_page](https://dust1920.github.io/InventoryManagement/)
An inventory model	JAZMIN SARAHI FLORES GOMEZ	[4]	[gh_page](https://flordejazmin.github.io/Proyectoinventario/)

Configuration to build in Spanish

Adapt it to your project according to your .png files and other resources.

_quarto.yml

project:
  type: book
  output-dir: _book

website:
  favicon: FCFMLOGO.png
  reader-mode: true
  search:
    location: sidebar
    type: overlay
  comments:
    hypothesis: true

book:
  title: "Análisis comparativo del desempeño en métodos para el pronóstico de series temporales"
  reader-mode: true
  language: es
  date: "02/14/2024"
  output-file: "Tesis_JSLG"
  # image: logofcfm.png
  # cover-image: FCFMLOGO.png
  sharing: [twitter, facebook]
  downloads: [pdf, epub]
  # favicon: logofcfm.png
  sidebar:
  #  logo: LOGO50.png
    style: floating
    collapse-level: 2
    border: true
    search: true
  open-graph: true
  twitter-card: true
  #repo-url: https://github.com/Jennlg/Tesis
  repo-actions: [edit, issue, source]
  page-navigation: true
  chapters:
    - index.qmd
    - intro.qmd
    - objetivos.qmd

    - part: 'Preliminares'
      chapters:
        - tconjuntos.qmd
        - probabilidad.qmd
        - estadistica.qmd
        - procesos.qmd
    - part: 'Series de tiempo'
      chapters:
        - series.qmd
    - part: 'Redes neuronales'
      chapters:
        - redes.qmd
    - part: estudio.qmd
      chapters:
        - metodologia.qmd
        - confirmados.qmd
        - muertes.qmd
    - conclusiones.qmd

    - references.qmd

comments:
    hypothesis: true

bibliography: references.bib

format:
  html:
    theme:
      dark: darkly
      light: cerulean
    highlight-style: a11y
    lang: es
    html-math-method: mathjax
    grid:
      sidebar-width: 300px
      body-width: 900px
      margin-width: 300px
      gutter-width: 1.5rem
    code-copy: true
    code-fold: true
  pdf:
    lang: es
    include-in-header:
      - packa.tex
    template-partials:
      - before-body.tex
    documentclass: scrreprt
    papersize: us-letter
    #titlegraphic: FCFMLOGO.png
    institution: Universidad Autónoma de Chiapas
    email: jennifer.lopez67@unach.mx
    keep-tex: true
  epub:
    cover-image: FCFMLOGO.png
editor: visual

The following .tex file is also needed in the root folder:

\usepackage{upgreek}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz} % needed for the tikzpicture used in \dashedbox
\newcommand{\dashedbox}[1]{
  \begin{tikzpicture}
    \node[draw, dashed, rounded corners=5pt, inner sep=10pt] {
      \begin{minipage}{0.8\textwidth} % Sets the width of the minipage
        #1
      \end{minipage}
    };
  \end{tikzpicture}
}

References

[1]

D.P. Bertsekas, Dynamic programming and optimal control, Vol. I, 3rd ed., Athena Scientific, Belmont, MA, 2005.

[2]

E. Bilgin, Mastering reinforcement learning with python: Build next-generation, self-learning models using reinforcement learning techniques and best practices, Packt Publishing, 2020.

[3]

T. Decroos, L. Bransen, J.V. Haaren, J. Davis, VAEP: An Objective Approach to Valuing On-the-Ball Actions in Soccer (Extended Abstract), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. (2020) 4696–4700.

[4]

D. Levhari, L.J. Mirman, The great fish war: An example using a dynamic Cournot-Nash solution, Essays in the Economics of Renewable Resources, L.J. Mirman and D.F. Spulber (Eds.), North-Holland. (1982) 243–258.

[5]

J. Rust, Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher, Econometrica. 55 (1987) 999.

[6]

J. Stachurski, Dynamic programming, volume 1, GitHub Repository. (2024).