Reinforcement Learning
[RL] Introduction to Multi-Armed Bandits (1)
This post summarizes Multi-Armed Bandits (MAB), one of the topics in Reinforcement Learning (paper link). The Multi-Armed Bandit (MAB) problem is a toy problem that models sequential decision tasks in which the learner must simultaneously exploit existing knowledge and explore unknown actions to gain knowledge for the future: the exploration-exploitation tradeoff (source).
0. Introduction: Scope and Motivation
1) Example Multi-arm..
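
As a rough illustration of the exploration-exploitation tradeoff in the definition above, here is a minimal sketch of a simulated bandit. The Bernoulli reward model, the epsilon-greedy rule, and every name in it (`run_epsilon_greedy`, `true_means`, `epsilon`) are assumptions made for this example and do not come from the post or the linked paper.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit played with an epsilon-greedy rule.

    true_means: hypothetical success probability of each arm (unknown to the learner).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random to gain knowledge.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bernoulli reward: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

Setting `epsilon` to 0 makes the learner purely greedy, so it can lock onto whichever arm looked good first. Setting it to 1 makes the learner explore forever and never use what it has learned. Values in between trade one off against the other.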