The Pessimism Heuristic for Contextual Bandits
A contextual bandit is a game played by seeing a context, selecting an action relevant to that context, and then observing a reward. It is often used as a simple model of a (short-term reward) recommender system. The user’s interest is revealed in the co...
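The see-context, select-action, observe-reward loop described above can be sketched in a few lines of Python. This is only a toy illustration: the names `sample_context`, `uniform_policy`, `reward_fn`, and `play`, the three-action setup, and the match-the-context reward are all assumptions made for the example and do not come from the post.

```python
import random

NUM_ACTIONS = 3  # hypothetical number of items that could be recommended


def sample_context(rng):
    """Toy context: an integer standing in for what is known about the user."""
    return rng.randrange(NUM_ACTIONS)


def uniform_policy(context, rng):
    """Placeholder policy: ignores the context and picks an action at random."""
    return rng.randrange(NUM_ACTIONS)


def reward_fn(context, action):
    """Assumed toy reward: 1.0 if the action matches the context, else 0.0."""
    return 1.0 if action == context else 0.0


def play(num_rounds, policy, seed=0):
    """Run the bandit game and return a log of (context, action, reward)."""
    rng = random.Random(seed)
    log = []
    for _ in range(num_rounds):
        context = sample_context(rng)        # 1. see a context
        action = policy(context, rng)        # 2. select an action
        reward = reward_fn(context, action)  # 3. observe reward for that action only
        log.append((context, action, reward))
    return log


if __name__ == "__main__":
    for context, action, reward in play(5, uniform_policy):
        print(f"context={context} action={action} reward={reward}")
```

Note that the log records the reward only for the action that was actually taken; the rewards of the other actions are never seen, which is what separates a bandit from ordinary supervised learning.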
ml4interact.hashnode.dev · 3 min read