REINFORCE

Reinforcement Learning

[RL] Policy Gradient(REINFORCE)

Policy Gradient 개념과 대표적인 알고리즘인 REINFORCE 내용을 정리하고자 한다. 1. 개념 (1) Monte Carlo methods Monte Carlo methods are a group of methods that samples randomly from a distribution to eventually localize to some solution. In the context of RL, we use monte carlo methods to estimate the reward by averaging the rewards over many episodes of interaction with the environment. (출처) (2) Policy Gradient methods P..

Fine애플
'REINFORCE' 태그의 글 목록