Project proposal

Author

SDIV

Formulation and reinforcement learning solution of a sequential decision problem.

Bases

  • The sequence of decisions must be modeled as a finite Markov decision process (MDP).
    • The state and action spaces must be finite and discrete.
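As a rough illustration of what "finite and discrete" means in practice, the sketch below stores a toy MDP as NumPy arrays and checks that it is well formed. The two-state, two-action numbers and the helper name `is_finite_mdp` are assumptions made only for this example; they are not part of the assignment.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (all numbers are illustrative).
# P[a, s, s2] = probability of moving from state s to state s2 under action a.
P = np.array([
    [[0.9, 0.1],
     [0.2, 0.8]],
    [[0.5, 0.5],
     [0.0, 1.0]],
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])


def is_finite_mdp(P, R):
    """Return True if (P, R) describe a well-formed finite MDP."""
    n_actions, n_states, n_next = P.shape
    shapes_ok = n_states == n_next and R.shape == (n_states, n_actions)
    # Every row P[a, s, :] must be a probability distribution.
    probs_ok = bool(np.all(P >= 0) and np.allclose(P.sum(axis=2), 1.0))
    return shapes_ok and probs_ok


print(is_finite_mdp(P, R))
```

Storing the model as dense arrays only works because both spaces are finite, which is why the requirement above matters for the tabular methods used later.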

Deadline: December 08, 2024, 23:59:00

Sub-products

Deadlines
Stage 01: November 20, 2024, 23:59
Stage 02: December 15, 2024, 23:59
Stage 03: December 18, 2024, 23:59

Stage 01: Quarto book with MDP formulation

  • The page must contain the report, following the template rl_bookdown_prg.qmd:

    • Introduction
    • Formulation of the Markov decision process
    • Model dynamics
    • Description and justification of the cost (reward)
    • Justification of the actions
  • The report must include:

    • Figures illustrating the behavior of the following elements:

      1. Policy
      2. Reward
      3. Value function evaluated for one state-action pair and transition.
      4. Environmental model
    • References via BibTeX.

    • Output compilation for HTML and PDF formats.

    • The compiled version must be published on GitHub or Quarto Pub.
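For the value-function figure listed above, one possible starting point is a one-step Bellman backup for a single state-action pair. The sketch below is only an illustration: the function name `q_backup`, the two successor states, and all numbers are hypothetical.

```python
import numpy as np


def q_backup(p_next, rewards, v, gamma):
    """One-step Bellman backup for a single state-action pair.

    p_next[s2]  : probability of landing in s2 from the chosen (s, a)
    rewards[s2] : reward received on the transition to s2
    v[s2]       : current estimate of the value of s2
    """
    p_next, rewards, v = map(np.asarray, (p_next, rewards, v))
    # Expected reward plus discounted value over all successor states.
    return float(np.sum(p_next * (rewards + gamma * v)))


# Hypothetical numbers for one state-action pair with two successor states.
q = q_backup(p_next=[0.9, 0.1], rewards=[1.0, 0.0], v=[0.0, 10.0], gamma=0.9)
print(q)
```

Evaluating this for a range of `gamma` values or reward settings gives data that can be plotted directly.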

Stage 02: Python code Implementation

  • Only code that runs without errors will be accepted.
  • Code must follow the PEP 8 style guide.
  • All functions must include docstrings.
  • Extras:
    • Packaging and documentation: 200 extra XP.
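To make the style requirements concrete, here is a minimal sketch of a PEP 8 formatted function with a docstring. It implements a standard tabular Q-learning update; the function name, parameter names, default values, and the NumPy-style docstring layout are choices made for this example, not requirements of the course.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place.

    Parameters
    ----------
    q_table : numpy.ndarray
        Array of shape (n_states, n_actions) with action-value estimates.
    state, action : int
        Indices of the visited state and the action taken.
    reward : float
        Reward observed after taking the action.
    next_state : int
        Index of the state reached.
    alpha : float
        Step size (learning rate).
    gamma : float
        Discount factor.

    Returns
    -------
    float
        The temporal-difference error used in the update.
    """
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return float(td_error)


# Usage on a hypothetical 2-state, 2-action table.
q_table = np.zeros((2, 2))
td_error = q_learning_update(q_table, state=0, action=1, reward=1.0,
                             next_state=1)
print(td_error, q_table[0, 1])
```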

Stage 03: Video Presentation

A video uploaded to YouTube, at most 20 minutes long, presenting the results and insights of your project.

Suggested project list:

  1. Reinforcement learning simulation of the tic-tac-toe game with the SARSA or Q-learning algorithm [2]
  2. The movement of a Recycling Robot [2]
  3. The replacement of a bus engine [5] (see dp.pdf, p. 130 [6])
  4. Optimal inventories (see dp.pdf, p. 147 [6])
  5. Multi-Armed Bandits [2]
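As a flavor of the simplest item on this list, the following sketch simulates an epsilon-greedy agent on a multi-armed bandit with Gaussian rewards. The arm means, step count, exploration rate, and the function name `run_bandit` are all assumptions chosen for illustration.

```python
import numpy as np


def run_bandit(true_means, n_steps=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Gaussian multi-armed bandit."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)          # sample-average value estimates
    counts = np.zeros(n_arms, dtype=int)  # number of pulls per arm
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))   # explore: uniform random arm
        else:
            arm = int(np.argmax(estimates))   # exploit: current best estimate
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical three-armed bandit; the last arm has the highest mean reward.
estimates, counts = run_bandit([0.0, 0.5, 2.0])
print(estimates.round(2), counts)
```

The same loop structure (choose an action, observe a reward, update an estimate) carries over to the SARSA and Q-learning projects, with states added.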

Project list

| Project | Author | Reference | GitHub Repo |
|---|---|---|---|
| Dynamic Portfolio Analysis | GABRIEL MIRANDA GAMEZ | [Sec. 4.3, 1] | gh_page: https://gabo-always-learning.quarto.pub/project-dpa/ |
| Analyzing Learned Markov Decision Processes using Model Checking for Providing Tactical Advice in Professional Soccer | JOSE ITALO SANCHEZ BERMUDEZ | [3] | gh_page: https://italosanchezb.github.io/Proyecto-Final-RL/ |
| An MDP model for the collective behavior in vaccination campaigns | IRASEMA PEDROZA MEZA | | GitHub: https://github.com/IrasemaPM/Proyecto_psi |
| Inventory model for perishable foods | DAVID PEÑA PERALTA | [6] | gh_page: https://dust1920.github.io/InventoryManagement/ |
| An inventory model | JAZMIN SARAHI FLORES GOMEZ | [4] | gh_page: https://flordejazmin.github.io/Proyectoinventario/ |

Configuration for building in Spanish

Adapt it to your project according to your .png files and other sources.

_quarto.yml
project:
  type: book
  output-dir: _book

website:
  favicon: FCFMLOGO.png
  reader-mode: true
  search:
    location: sidebar
    type: overlay
  comments:
    hypothesis: true

book:
  title: "Análisis comparativo del desempeño en métodos para el pronóstico de series temporales"
  reader-mode: true
  language: es
  date: "02/14/2024"
  output-file: "Tesis_JSLG"
  # image: logofcfm.png
  # cover-image: FCFMLOGO.png
  sharing: [twitter, facebook]
  downloads: [pdf, epub]
  # favicon: logofcfm.png
  sidebar:
  #  logo: LOGO50.png
    style: floating
    collapse-level: 2
    border: true
    search: true
  open-graph: true
  twitter-card: true
  #repo-url: https://github.com/Jennlg/Tesis
  repo-actions: [edit, issue, source]
  page-navigation: true
  chapters:
    - index.qmd
    - intro.qmd
    - objetivos.qmd

    - part: 'Preliminares'
      chapters:
        - tconjuntos.qmd
        - probabilidad.qmd
        - estadistica.qmd
        - procesos.qmd
    - part: 'Series de tiempo'
      chapters:
        - series.qmd
    - part: 'Redes neuronales'
      chapters:
        - redes.qmd
    - part: estudio.qmd
      chapters:
        - metodologia.qmd
        - confirmados.qmd
        - muertes.qmd
    - conclusiones.qmd

    - references.qmd

comments:
    hypothesis: true

bibliography: references.bib

format:
  html:
    theme:
      dark: darkly
      light: cerulean
    highlight-style: a11y
    lang: es
    html-math-method: mathjax
    grid:
      sidebar-width: 300px
      body-width: 900px
      margin-width: 300px
      gutter-width: 1.5rem
    code-copy: true
    code-fold: true
  pdf:
    lang: es
    include-in-header:
      - packa.tex
    template-partials:
      - before-body.tex
    documentclass: scrreprt
    papersize: us-letter
    #titlegraphic: FCFMLOGO.png
    institution: Universidad Autónoma de Chiapas
    email: jennifer.lopez67@unach.mx
    keep-tex: true
  epub:
    cover-image: FCFMLOGO.png
editor: visual

The following .tex file is also needed in the root folder:

\usepackage{upgreek}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz} % required by the tikzpicture environment in \dashedbox
\newcommand{\dashedbox}[1]{
  \begin{tikzpicture}
    \node[draw, dashed, rounded corners=5pt, inner sep=10pt] {
      \begin{minipage}{0.8\textwidth} % Sets the width of the minipage
        #1
      \end{minipage}
    };
  \end{tikzpicture}
}

References

[1] D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, third ed., Athena Scientific, Belmont, MA, 2005.
[2]
[3] T. Decroos, L. Bransen, J.V. Haaren, J. Davis, VAEP: An Objective Approach to Valuing On-the-Ball Actions in Soccer (Extended Abstract), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (2020) 4696–4700.
[4] D. Levhari, L.J. Mirman, The great fish war: An example using a dynamic Cournot-Nash solution, in: L.J. Mirman, D.F. Spulber (Eds.), Essays in the Economics of Renewable Resources, North-Holland (1982) 243–258.
[5]
[6] J. Stachurski, Dynamic Programming, Volume 1, GitHub repository (2024).