
Evaluation and Metrics

Evaluating agent performance is crucial for understanding how well the agent is learning.

📊 Main Metrics

Total Reward

The sum of all rewards received in an episode.

Average Reward

The total reward divided by the number of steps.

Success Rate

The percentage of episodes that end successfully.
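A minimal sketch of how these three metrics can be computed, assuming episodes are logged as a list of dicts with the per-step rewards and a success flag (the `episodes` structure here is hypothetical, not something Ants Saga provides):

```python
# Hypothetical episode log: per-step rewards plus a success flag per episode.
episodes = [
    {"rewards": [-1, -1, 10], "success": True},
    {"rewards": [-1, -1, -1, -1], "success": False},
]

for i, ep in enumerate(episodes):
    total_reward = sum(ep["rewards"])                    # Total Reward
    average_reward = total_reward / len(ep["rewards"])   # Average Reward per step
    print(f"episode {i}: total={total_reward}, average={average_reward:.2f}")

# Success Rate: fraction of episodes that ended successfully.
success_rate = sum(ep["success"] for ep in episodes) / len(episodes)
print(f"success rate: {success_rate:.0%}")
```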

📈 Learning Curves

Reward vs Episodes

  • X-axis - Number of episodes
  • Y-axis - Average reward
  • Trend - Should increase over time
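A minimal plotting sketch, assuming NumPy and Matplotlib are installed; `episode_rewards` is synthetic data standing in for whatever per-episode returns your training loop records. A moving average makes the upward trend easier to see through the noise:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic rewards that drift upward over 500 episodes (replace with real logs).
episode_rewards = np.random.normal(loc=np.linspace(-5, 10, 500), scale=3.0)

# Smooth with a moving average so the trend stands out.
window = 20
smoothed = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")

plt.plot(episode_rewards, alpha=0.3, label="raw reward")
plt.plot(np.arange(window - 1, len(episode_rewards)), smoothed, label="moving average")
plt.xlabel("Episode")          # X-axis: number of episodes
plt.ylabel("Average reward")   # Y-axis: average reward
plt.legend()
plt.show()
```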

Example in Ants Saga

  • Episode 1-100 - Average reward: -5
  • Episode 100-500 - Average reward: 0
  • Episode 500+ - Average reward: 10+

🎯 Key Performance Indicators

Convergence

When the agent's performance stabilizes and stops improving.

Sample Efficiency

How many samples (environment interactions) the agent needs before it reaches good performance.

Stability

How consistent the agent's performance is across different runs.
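A rough sketch of how convergence and stability can be checked numerically, assuming you have one learning curve per random seed (the `curves` array below is synthetic, and the thresholds are arbitrary examples):

```python
import numpy as np

# Hypothetical learning curves: one row per seed, one column per episode.
curves = np.random.normal(loc=np.linspace(-5, 10, 500), scale=2.0, size=(5, 500))

# Convergence: the average reward of the last 50 episodes barely differs
# from the 50 episodes before that.
recent, earlier = curves[:, -50:].mean(), curves[:, -100:-50].mean()
converged = abs(recent - earlier) < 0.5
print(f"converged: {converged} (last 50: {recent:.2f}, previous 50: {earlier:.2f})")

# Stability: a small spread of final performance across seeds means runs are consistent.
final_per_seed = curves[:, -50:].mean(axis=1)
print(f"final reward: {final_per_seed.mean():.2f} ± {final_per_seed.std():.2f} across seeds")
```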

📊 Visualization Tools

TensorBoard

  • Real-time monitoring
  • Multiple metrics
  • Comparison between runs
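A minimal logging sketch using `SummaryWriter` from `torch.utils.tensorboard` (requires PyTorch); the values logged here are dummies standing in for your agent's real episode statistics, and the run directory name is just an example:

```python
import random
from torch.utils.tensorboard import SummaryWriter  # ships with PyTorch

# One writer per run; point TensorBoard at the parent directory to compare runs:
#   tensorboard --logdir runs/
writer = SummaryWriter(log_dir="runs/ants_saga_demo")

for episode in range(500):
    # Dummy values standing in for the agent's real episode statistics.
    total_reward = random.gauss(episode / 50 - 5, 3)
    success = 1.0 if total_reward > 0 else 0.0
    writer.add_scalar("reward/total", total_reward, episode)   # real-time curve
    writer.add_scalar("episode/success", success, episode)     # success indicator

writer.close()
```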

Custom Plots

  • Learning curves
  • Reward distribution
  • Action frequency
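A sketch of the last two plots with Matplotlib, using synthetic data in place of real logs (the action names are hypothetical): a wide or bimodal reward histogram hints at inconsistent behaviour, and a single dominant action bar may mean the agent stopped exploring.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic logs: per-episode rewards and the actions taken across episodes.
episode_rewards = np.random.normal(loc=5, scale=3, size=500)
actions = np.random.choice(["up", "down", "left", "right"], size=10_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Reward distribution.
ax1.hist(episode_rewards, bins=30)
ax1.set_xlabel("Episode reward")
ax1.set_ylabel("Count")

# Action frequency.
labels, counts = np.unique(actions, return_counts=True)
ax2.bar(labels, counts)
ax2.set_xlabel("Action")
ax2.set_ylabel("Times chosen")

plt.tight_layout()
plt.show()
```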

๐Ÿ” Debugging Performanceโ€‹

Common Issuesโ€‹

  • No learning - Check reward function
  • Unstable - Reduce learning rate
  • Slow convergence - Increase exploration

Best Practices

  • Monitor multiple metrics
  • Use multiple random seeds
  • Compare with baselines
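A small sketch of the last two practices combined, assuming you have already evaluated the trained agent and a random-action baseline over several seeds (the numbers below are placeholders):

```python
import numpy as np

# Hypothetical final returns from 5 seeds each; replace with your own results.
agent_returns = np.array([9.8, 11.2, 10.1, 8.7, 10.5])
random_baseline = np.array([-4.9, -5.3, -4.6, -5.1, -5.0])

print(f"agent:    {agent_returns.mean():.1f} ± {agent_returns.std():.1f}")
print(f"baseline: {random_baseline.mean():.1f} ± {random_baseline.std():.1f}")
# The agent should clearly beat the baseline, and its spread across seeds
# should be small relative to that gap.
```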

📚 Further Reading