The Pessimism Heuristic for Contextual Bandits
Nov 28, 2025 · 3 min read · A contextual bandit is a game played by seeing a context, selecting an action relevant to that context, and then observing a reward. It is often used as a simple model of a (short-term reward) recommender system. The user’s interest is revealed in the co...
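
The context, action, reward loop described above can be sketched in a few lines of Python. This is only an illustration of the interaction protocol, not the pessimism heuristic itself. The context names, action names, reward probabilities, and the uniform-random policy are all hypothetical stand-ins chosen for the example.

```python
import random

# Hypothetical toy setup: two user types (contexts) and two items (actions).
CONTEXTS = ["sports_fan", "home_cook"]
ACTIONS = ["sports_article", "cooking_article"]
MATCHES = {("sports_fan", "sports_article"), ("home_cook", "cooking_article")}


def uniform_policy(context, rng):
    """Placeholder policy: ignores the context and picks an action at random."""
    return rng.choice(ACTIONS)


def toy_reward(context, action, rng):
    """Assumed reward model: a click (1) is likelier when the item matches the user."""
    click_prob = 0.8 if (context, action) in MATCHES else 0.1
    return 1 if rng.random() < click_prob else 0


def play_round(policy, rng):
    """One round of the game: see a context, select an action, observe a reward."""
    context = rng.choice(CONTEXTS)
    action = policy(context, rng)
    reward = toy_reward(context, action, rng)
    return context, action, reward


def run(n_rounds, seed=0):
    """Play n_rounds and return the logged (context, action, reward) triples."""
    rng = random.Random(seed)
    return [play_round(uniform_policy, rng) for _ in range(n_rounds)]


if __name__ == "__main__":
    log = run(1000)
    print("average reward:", sum(r for _, _, r in log) / len(log))
```

Only the reward of the chosen action is logged in each round. The learner never sees what the other action would have earned, which is what separates a bandit from ordinary supervised learning.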




