Evaluation and Metrics

Evaluating agent performance is crucial to understanding how well agents are learning.

📊 Main Metrics

Total Reward

The sum of all rewards received in an episode.

Average Reward

The total reward divided by the number of steps in the episode.

Success Rate

The percentage of episodes that end successfully.
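
The three metrics above can be computed directly from logged episodes. The following is a minimal sketch, assuming each episode is stored as a list of per-step rewards plus a success flag. The `Episode` class and its field names are hypothetical and not part of Ants Saga.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical log record: per-step rewards and whether the episode succeeded."""
    rewards: List[float]
    success: bool


def total_reward(episode: Episode) -> float:
    # Sum of all rewards received in the episode.
    return sum(episode.rewards)


def average_reward(episode: Episode) -> float:
    # Total reward divided by the number of steps (0.0 for an empty episode).
    steps = len(episode.rewards)
    return total_reward(episode) / steps if steps else 0.0


def success_rate(episodes: List[Episode]) -> float:
    # Percentage of episodes that ended successfully.
    if not episodes:
        return 0.0
    return 100.0 * sum(e.success for e in episodes) / len(episodes)


# Placeholder data, for illustration only.
episodes = [
    Episode(rewards=[-1.0, -1.0, 10.0], success=True),
    Episode(rewards=[-1.0, -1.0, -1.0, -1.0], success=False),
]
```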

📈 Learning Curves

Reward vs Episodes

X-axis - Number of episodes
Y-axis - Average reward
Trend - Should increase over time
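
Raw per-episode rewards are usually noisy, so the curve is often smoothed before plotting. One possible sketch uses a trailing moving average. The window size and the reward values here are placeholders chosen for illustration.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing mean over up to `window` most recent values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Placeholder per-episode rewards, for illustration only.
episode_rewards = [-5.0, -3.0, -4.0, 0.0, 2.0, 1.0]
curve = moving_average(episode_rewards, window=3)
```

The resulting `curve` can be plotted against the episode number, for example with matplotlib's `plt.plot`, to check that the trend increases over time.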

Example in Ants Saga

Episodes 1-100 - Average reward: -5
Episodes 100-500 - Average reward: 0
Episodes 500+ - Average reward: 10+
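
A summary like the one above could be produced by averaging per-episode rewards over episode ranges. The sketch below assumes rewards are stored in a plain list and uses non-overlapping, 1-indexed ranges. The function name and the synthetic data are hypothetical and are not real Ants Saga results.

```python
from typing import Dict, List, Optional, Tuple


def average_by_range(
    episode_rewards: List[float],
    ranges: List[Tuple[int, Optional[int]]],
) -> Dict[str, float]:
    """Mean reward per 1-indexed, inclusive episode range; end=None means open-ended."""
    summary = {}
    for start, end in ranges:
        chunk = episode_rewards[start - 1:end]
        label = f"{start}+" if end is None else f"{start}-{end}"
        if chunk:  # skip ranges with no recorded episodes
            summary[label] = sum(chunk) / len(chunk)
    return summary


# Synthetic rewards, for illustration only (not real Ants Saga results).
rewards = [-5.0] * 100 + [0.0] * 400 + [10.0] * 100
summary = average_by_range(rewards, [(1, 100), (101, 500), (501, None)])
```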

🎯 Key Performance Indicators

Convergence

When the agent's performance stabilizes and stops improving.
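
One simple way to check for convergence is to compare the mean reward of the most recent window of episodes with the window before it. This is only a heuristic sketch. The function name, window size, and tolerance are assumptions that would need tuning for a real task.

```python
from typing import List, Optional


def convergence_episode(
    rewards: List[float], window: int = 100, tolerance: float = 0.5
) -> Optional[int]:
    """Return the first 1-indexed episode at which the mean reward of the last
    `window` episodes differs from the mean of the `window` episodes before it
    by at most `tolerance`, or None if that never happens."""
    for end in range(2 * window, len(rewards) + 1):
        previous = rewards[end - 2 * window:end - window]
        recent = rewards[end - window:end]
        gap = abs(sum(recent) / window - sum(previous) / window)
        if gap <= tolerance:
            return end
    return None


# Placeholder rewards: a steady improvement followed by a plateau.
rewards = [float(i) for i in range(10)] + [10.0] * 10
plateau_at = convergence_episode(rewards, window=3, tolerance=0.1)
```

Note that a plateau only means performance stopped changing. An agent that never learned anything would also pass this check, so the level of the plateau still has to be inspected.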

Sample Efficiency

How many samples (interactions with the environment) the agent needs to reach a given level of performance.
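
Sample efficiency can be made concrete by counting the environment steps used before the agent first reaches a target reward. The sketch below assumes episode lengths and rewards are logged as parallel lists. The function name and threshold are hypothetical.

```python
from typing import List, Optional


def steps_to_threshold(
    episode_lengths: List[int],
    episode_rewards: List[float],
    threshold: float,
) -> Optional[int]:
    """Cumulative environment steps used by the end of the first episode whose
    reward reaches `threshold`, or None if it is never reached."""
    steps_so_far = 0
    for length, reward in zip(episode_lengths, episode_rewards):
        steps_so_far += length
        if reward >= threshold:
            return steps_so_far
    return None


# Placeholder logs, for illustration only.
lengths = [50, 40, 30]
rewards = [-5.0, 2.0, 12.0]
steps_needed = steps_to_threshold(lengths, rewards, threshold=10.0)
```

A single lucky episode can cross the threshold early, so applying the same check to a smoothed reward series may give a more reliable number.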

Stability

How consistent the agent's performance is across different runs.
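
A common way to quantify stability is to report the mean and standard deviation of a final score across runs. The following sketch assumes one final score per run. The values are placeholders.

```python
import statistics
from typing import Dict, List


def summarize_runs(final_rewards: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of one final score per run."""
    if len(final_rewards) < 2:
        raise ValueError("need at least two runs to measure spread")
    return {
        "mean": statistics.mean(final_rewards),
        "std": statistics.stdev(final_rewards),
    }


# Placeholder final scores from three runs, for illustration only.
run_summary = summarize_runs([9.0, 10.0, 11.0])
```

A small standard deviation relative to the mean suggests the agent behaves consistently from run to run.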

📊 Visualization Tools

TensorBoard

Real-time monitoring
Multiple metrics
Comparison between runs
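
Logging to TensorBoard typically comes down to calling `add_scalar(tag, value, step)` on a writer object. The sketch below is written against any object with that method, so it runs without TensorBoard installed. `RecordingWriter`, `log_episode`, and the `eval/` tag prefix are hypothetical names used only for illustration.

```python
from typing import Dict, List, Protocol, Tuple


class ScalarWriter(Protocol):
    """Anything with a TensorBoard-style add_scalar method."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


class RecordingWriter:
    """In-memory stand-in for a real writer, so the example runs anywhere."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.calls.append((tag, scalar_value, global_step))


def log_episode(writer: ScalarWriter, episode: int, metrics: Dict[str, float]) -> None:
    # One scalar per metric, with the episode number as the step.
    for name, value in metrics.items():
        writer.add_scalar(f"eval/{name}", value, episode)


writer = RecordingWriter()
log_episode(writer, episode=3, metrics={"total_reward": 8.0, "success_rate": 50.0})
```

With PyTorch installed, the stand-in could be swapped for `SummaryWriter(log_dir=...)` from `torch.utils.tensorboard`. Writing each run to its own subdirectory and pointing `tensorboard --logdir` at the parent directory lets the runs be compared side by side.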

Custom Plots

Learning curves
Reward distribution
Action frequency
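
The data behind the reward distribution and action frequency plots can be prepared with the standard library alone. This is a sketch under assumed inputs. The action names and bin width are placeholders.

```python
from collections import Counter
from typing import Dict, Hashable, List


def action_frequency(actions: List[Hashable]) -> Dict[Hashable, float]:
    """Fraction of steps on which each action was chosen."""
    total = len(actions)
    if total == 0:
        return {}
    counts = Counter(actions)
    return {action: count / total for action, count in counts.items()}


def reward_histogram(rewards: List[float], bin_width: float) -> Dict[float, int]:
    """Count rewards per bin; each key is the lower edge of a bin of width `bin_width`."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = Counter((reward // bin_width) * bin_width for reward in rewards)
    return dict(sorted(bins.items()))


# Placeholder data, for illustration only.
freq = action_frequency(["left", "left", "right", "stay"])
hist = reward_histogram([-5.0, -3.0, 0.0, 10.0, 12.0], bin_width=5.0)
```

Both dictionaries map naturally onto bar charts, for example by passing the keys and values to matplotlib's `plt.bar`.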

🔍 Debugging Performance

Common Issues

No learning - Check the reward function
Unstable training - Try reducing the learning rate
Slow convergence - Try increasing exploration

Best Practices

Monitor multiple metrics
Use multiple random seeds
Compare with baselines
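
The last two practices can be combined by evaluating the agent and a random baseline over the same set of seeds. The sketch below uses a toy two-armed bandit as a stand-in for a real environment. The policy interface, the bandit, and `greedy_policy` (standing in for a trained agent) are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, List

# A policy maps a random generator to an action index (hypothetical interface).
Policy = Callable[[random.Random], int]

# Toy two-armed bandit standing in for a real environment: arm 1 pays more on average.
ARM_MEANS = [0.0, 1.0]


def run_episode(policy: Policy, rng: random.Random, steps: int = 50) -> float:
    """Total reward of one episode on the toy bandit."""
    total = 0.0
    for _ in range(steps):
        action = policy(rng)
        total += rng.gauss(ARM_MEANS[action], 1.0)
    return total


def evaluate(policy: Policy, seeds: List[int], episodes: int = 20) -> List[float]:
    """Mean episode reward per seed, so results can be compared across seeds."""
    scores = []
    for seed in seeds:
        rng = random.Random(seed)  # separate generator per seed for reproducibility
        scores.append(statistics.mean(run_episode(policy, rng) for _ in range(episodes)))
    return scores


def random_policy(rng: random.Random) -> int:
    return rng.randrange(len(ARM_MEANS))  # baseline: uniform random action


def greedy_policy(rng: random.Random) -> int:
    return 1  # stand-in for a trained agent: always picks the better arm


seeds = [0, 1, 2, 3, 4]
baseline_scores = evaluate(random_policy, seeds)
agent_scores = evaluate(greedy_policy, seeds)
```

Reporting the per-seed scores for both policies makes it clear whether the agent beats the baseline consistently or only on some seeds.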
