Reinforcement Learning

RL:

  • figure out which actions lead to good or bad results (using reward feedback)
  • choose the actions that lead to good results

Adaptive Dynamic Programming:

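  • Learn the model (transition & reward functions) from observed experience: a supervised learning problem
  • Policy evaluation: compute state values using the learned model

TD’s advantage over ADP:

  • no need to learn a model (model-free): values are updated directly from observed transitions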
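
A minimal sketch contrasting the two approaches, assuming a fixed policy, a small discrete state space, and a reward recorded on each observed transition. The function names (learn_model, evaluate_policy, td0), the parameter values, and the sample data are illustrative assumptions, not part of these notes.

```python
from collections import defaultdict


def learn_model(transitions):
    """ADP step 1: estimate P(s'|s) and R(s) by counting (s, r, s_next) samples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, r, s_next in transitions:
        counts[s][s_next] += 1
        reward_sum[s] += r
        visits[s] += 1
    P = {s: {s2: c / visits[s] for s2, c in nxt.items()}
         for s, nxt in counts.items()}
    R = {s: reward_sum[s] / visits[s] for s in visits}
    return P, R


def evaluate_policy(P, R, gamma=0.9, sweeps=200):
    """ADP step 2: iterative policy evaluation on the learned model."""
    V = defaultdict(float)  # states never left (e.g. terminals) stay at 0
    for _ in range(sweeps):
        for s in P:
            V[s] = R[s] + gamma * sum(p * V[s2] for s2, p in P[s].items())
    return dict(V)


def td0(transitions, alpha=0.1, gamma=0.9, passes=500):
    """TD(0): update values straight from samples, with no P or R estimated."""
    V = defaultdict(float)
    for _ in range(passes):
        for s, r, s_next in transitions:
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return dict(V)


# Hypothetical experience: A -> B (reward 0), then B -> end (reward 1).
samples = [("A", 0.0, "B"), ("B", 1.0, "end")]

P, R = learn_model(samples)
V_adp = evaluate_policy(P, R)
V_td = td0(samples)
print(V_adp)
print(V_td)
```

On this toy data both methods should arrive at roughly the same values; the difference is that td0 never builds P or R, while the ADP path needs both before it can evaluate anything.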