Agents and Environments
The fundamental concepts of Reinforcement Learning revolve around the interaction between agents and environments. Here you'll learn the key definitions.
Agent
An agent is the entity that learns and makes decisions in an environment. In MLVisual, our agents are the ants that learn to collect food.
Characteristics of an Agent
- Learns through experience
- Makes decisions based on the current state
- Receives rewards for its actions
- Improves performance over time
Example in Ants Saga
In our project, the ant is the agent that must:
- Decide which direction to move
- Learn to avoid obstacles
- Maximize food collection
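A minimal sketch of how "decide" and "learn from reward" fit into one object. It assumes a hypothetical `AntAgent` class that keeps a running average of the reward for each (state, action) pair and picks actions epsilon-greedily. It is not MLVisual's implementation, and it only tracks immediate reward. Value-based methods such as Q-learning (listed under Methods) also account for future reward.

```python
import random
from collections import defaultdict


class AntAgent:
    """Hypothetical tabular agent: learns the average reward of each action in each state."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of exploring with a random action
        self.rng = random.Random(seed)
        self.values = defaultdict(float)    # (state, action) -> estimated average reward
        self.counts = defaultdict(int)      # (state, action) -> number of times tried

    def choose_action(self, state):
        # Explore occasionally; otherwise pick the action with the highest estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(state, a)])

    def learn(self, state, action, reward):
        # Incremental sample average: V <- V + (r - V) / n
        key = (state, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


agent = AntAgent(['move_up', 'move_down', 'move_left', 'move_right'], epsilon=0.0)
agent.learn((0, 0), 'move_right', 1.0)   # moving right from (0, 0) found food
print(agent.choose_action((0, 0)))       # -> move_right
```

Setting `epsilon=0.0` makes the agent purely greedy, which keeps the usage example deterministic. In training, a small positive `epsilon` lets the ant keep trying directions it currently rates poorly.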
Environment
The environment is the world the agent interacts with. It includes everything the agent can observe and act upon.
Characteristics of an Environment
- Provides states - Information about the current situation
- Receives actions - From the agent
- Returns rewards - Feedback on action quality
- Transitions - Changes state based on actions
Example in Ants Saga
Our environment includes:
- Map layout - Walls, food, obstacles
- Physics - Movement, collisions
- Rules - How the world behaves
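To make map layout, physics, and rules concrete, here is a minimal sketch of a grid environment. It assumes a hypothetical `GridEnvironment` class with a `reset()` / `step(action)` interface (a common RL convention, not necessarily MLVisual's API). The reward values (+1 for food, -0.01 per step) are illustrative assumptions.

```python
# Hypothetical grid environment for illustration; not MLVisual's actual code.
MOVES = {'move_up': (0, 1), 'move_down': (0, -1), 'move_left': (-1, 0),
         'move_right': (1, 0), 'stay': (0, 0)}


class GridEnvironment:
    def __init__(self, width, height, walls, food, start=(0, 0)):
        # Map layout: grid size, wall cells, food cells, and the ant's start cell.
        self.width, self.height = width, height
        self.walls = set(walls)
        self.initial_food = set(food)
        self.start = start
        self.reset()

    def reset(self):
        # Restore the initial layout and return the initial state.
        self.ant = self.start
        self.food = set(self.initial_food)
        self.collected = 0
        return self.state()

    def state(self):
        return {'ant_position': self.ant, 'food_positions': sorted(self.food),
                'food_collected': self.collected}

    def step(self, action):
        # Physics: apply the move unless it leaves the map or hits a wall.
        dx, dy = MOVES[action]
        x, y = self.ant[0] + dx, self.ant[1] + dy
        if 0 <= x < self.width and 0 <= y < self.height and (x, y) not in self.walls:
            self.ant = (x, y)
        # Rules (assumed values): +1 for eating food, a small cost for every other step.
        reward = -0.01
        if self.ant in self.food:
            self.food.remove(self.ant)
            self.collected += 1
            reward = 1.0
        done = not self.food    # the episode ends once all food is collected
        return self.state(), reward, done


env = GridEnvironment(3, 3, walls=[(1, 0)], food=[(0, 1)])
print(env.step('move_right'))   # blocked by the wall: ant stays at (0, 0), reward -0.01
print(env.step('move_up'))      # reaches the food: reward 1.0, done is True
```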
States
A state is a snapshot of the environment at a specific moment. Ideally, it contains all the information the agent needs to make decisions; in practice, the agent may see only part of it (see partial observability below).
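State Representation
- Discrete - A finite set of possible states (e.g., cells of a grid)
- Continuous - States take real values (e.g., exact x, y coordinates), so the set of possible states is infinite
- Partially observable - The agent sees only part of the state
- Fully observable - The agent sees the entire state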
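The difference between full and partial observability can be sketched with a hypothetical `observe` helper: the full state lists every food position, while the ant only "sees" food within a small radius of itself. The radius and the square (Chebyshev-distance) field of view are assumptions for illustration.

```python
def observe(state, radius=1):
    """Hypothetical partial observation: keep only food within `radius` cells of the ant."""
    ax, ay = state['ant_position']
    visible = [(x, y) for (x, y) in state['food_positions']
               if max(abs(x - ax), abs(y - ay)) <= radius]   # square field of view
    return {'ant_position': (ax, ay), 'visible_food': visible}


full_state = {'ant_position': (2, 2), 'food_positions': [(3, 3), (7, 1)]}
print(observe(full_state))   # {'ant_position': (2, 2), 'visible_food': [(3, 3)]}
```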
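Example in Ants Saga
# Example state snapshot; the coordinates and values are illustrative
state = {
    'ant_position': (3, 4),                   # (x, y) grid cell of the ant
    'food_positions': [(1, 2), (5, 7)],       # cells that still hold food
    'obstacle_positions': [(2, 2), (4, 6)],   # cells the ant cannot enter
    'ant_health': 100,
    'food_collected': 5
}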
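Tabular methods such as Q-learning use the state as a lookup key, but a Python dict that contains lists is not hashable. A minimal sketch, assuming a hypothetical `state_key` helper, converts a state like the one above into a nested tuple and sorts the position lists so the order of food or obstacles does not matter:

```python
def state_key(state):
    """Hypothetical helper: convert a state dict into a hashable, order-independent tuple."""
    items = []
    for name in sorted(state):            # fixed key order
        value = state[name]
        if isinstance(value, list):
            value = tuple(sorted(value))  # position lists become sorted tuples
        items.append((name, value))
    return tuple(items)


state = {'ant_position': (3, 4), 'food_positions': [(5, 7), (1, 2)], 'food_collected': 5}
q_table = {state_key(state): 0.0}   # works; using the dict `state` itself raises TypeError
```

A key like this grows with every field you include, so tabular agents usually keep only the fields that matter for the decision.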
Actions
An action is a decision the agent can make. The set of all possible actions is called the action space.
Action Types
- Discrete - A finite set of actions (left, right, up, down)
- Continuous - Actions take real values from a range (e.g., move 0.5 units left)
- Multi-dimensional - Several action components chosen at once (e.g., a direction and a speed)
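The three action types can be illustrated by sampling one action of each kind with Python's `random` module. The [-1.0, 1.0] range and the (direction, speed) pairing are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)   # seeded so the samples are reproducible
directions = ['left', 'right', 'up', 'down']

# Discrete: one choice from a finite set.
discrete_action = rng.choice(directions)

# Continuous: a real-valued displacement within an assumed range of [-1.0, 1.0] units.
continuous_action = rng.uniform(-1.0, 1.0)

# Multi-dimensional: several components chosen together (direction and speed).
multi_action = (rng.choice(directions), rng.uniform(0.0, 1.0))

print(discrete_action, continuous_action, multi_action)
```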
Example in Ants Saga
# Each action maps to a (dx, dy) displacement on the grid
actions = {
    'move_up': (0, 1),
    'move_down': (0, -1),
    'move_left': (-1, 0),
    'move_right': (1, 0),
    'stay': (0, 0)
}
Agent-Environment Loop
The learning process follows this cycle:
- Agent observes the current state
- Agent chooses an action based on its policy
- Environment receives the action
- Environment transitions to a new state
- Environment provides a reward
- Agent updates its policy based on the reward
- Repeat until learning is complete
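The cycle above can be sketched end to end in a tiny, self-contained setting. Assumptions for illustration: a hypothetical 1-D corridor (cells 0 to 4, food at cell 4) stands in for the Ants Saga map, and the agent uses an epsilon-greedy policy with a tabular Q-learning update. This is one possible learning rule, not necessarily the one MLVisual uses. Comments mark which step of the loop each part performs.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: cells 0..4, food at cell 4, the ant starts at cell 0.
ACTIONS = [-1, +1]   # move left, move right
GOAL = 4


def env_step(position, action):
    """Environment: transition to a new state and return (next_state, reward, done)."""
    next_position = min(max(position + action, 0), GOAL)   # walls at both ends
    if next_position == GOAL:
        return next_position, 1.0, True
    return next_position, -0.01, False


def run(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # (state, action) -> estimated discounted return
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent observes the state and chooses an action (epsilon-greedy policy).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            # Environment receives the action, transitions, and provides a reward.
            next_state, reward, done = env_step(state, action)
            # Agent updates its estimates (Q-learning update).
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = run()
# Greedy action in each non-goal cell (+1 means move right, toward the food).
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)])
```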
Learning Objectives
Goal
The agent learns a policy that maximizes the expected cumulative reward it receives over time.
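In most formulations, "cumulative reward" means the discounted return G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ..., where the discount factor γ (between 0 and 1) makes rewards received sooner count more. A minimal sketch of the computation for one finished episode, assuming a hypothetical `discounted_return` helper:

```python
def discounted_return(rewards, gamma=0.9):
    """Hypothetical helper: G_0 = r_1 + gamma * r_2 + gamma**2 * r_3 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):   # work backwards: G_t = r_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g


# Two wasted steps (-0.01 each) before finding food (+1), with gamma = 0.9:
print(discounted_return([-0.01, -0.01, 1.0]))   # about 0.791 (-0.01 - 0.009 + 0.81)
```

Working backwards avoids computing powers of γ explicitly and gives the return for every earlier step along the way.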
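Methods
- Value-based - Learn value functions and act greedily with respect to them (e.g., Q-learning)
- Policy-based - Learn the policy directly (e.g., REINFORCE)
- Actor-Critic - Combine both: an actor learns the policy while a critic learns a value function that evaluates the actor's actions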
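For reference, the core update of each method family can be written compactly. Here α is a step size, γ the discount factor, r_{t+1} the reward received after acting in state s_t, G_t the discounted return from step t, π_θ a policy with parameters θ, and V_w a value function with parameters w. These are the common one-step forms; textbook episodic versions also multiply the policy updates by γ^t, which many implementations omit.

```latex
\begin{aligned}
&\text{Q-learning:}   && Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\big] \\
&\text{REINFORCE:}    && \theta \leftarrow \theta + \alpha\, G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
&\text{Actor-Critic:} && \delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t), \\
&                     && w \leftarrow w + \alpha_w\, \delta_t\, \nabla_w V_w(s_t), \qquad
                         \theta \leftarrow \theta + \alpha_\theta\, \delta_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{aligned}
```

In the actor-critic form, the critic's TD error δ_t replaces the full return G_t, which typically lowers the variance of the policy update at the cost of some bias.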