What is the exploration vs exploitation trade off in reinforcement learning?

A key challenge that arises in reinforcement learning (RL) is the trade-off between exploration and exploitation. This challenge is unique to RL and doesn’t arise in supervised or unsupervised learning.

Exploration is any action that lets the agent discover new features about the environment, while exploitation is capitalizing on knowledge already gained. If the agent continues to exploit only past experiences, it is likely to get stuck in a suboptimal policy. On the other hand, if it continues to explore without exploiting, it might never find a good policy.

An agent must find the right balance between the two so that it can discover the optimal policy that yields the maximum rewards.